Optimize CGC computations #13

lkdvos · 2024-03-16T13:21:08Z

This PR adds some optimizations for computing CGCs:

- Add a special case for Clebsch-Gordan coefficients when one of the sectors is trivial, i.e. $a \otimes I \rightarrow a$ or $I \otimes a \rightarrow a$. In these cases the CGCs are just identity matrices, so no computation is needed.
- Change the way `lower_weight_CGC!` works: solve the equations using `qr!` instead of `pinv`, and build the right-hand side differently.
- Purge values that are almost zero from the final result to save memory on disk and in RAM.
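
As a rough illustration of the first and third points (not the code in this PR), here is a minimal sketch. It assumes a CGC for $a \otimes I \rightarrow a$ is stored as a dense `d × 1 × d` array indexed by `(m1, m2, m3)`; the helper names `trivial_cgc` and `purge_small!` and the tolerance `1e-14` are hypothetical and chosen only for this example.

```julia
using LinearAlgebra

# Hypothetical helper: CGC for a ⊗ I → a, with dim(a) = d and dim(I) = 1.
# With the middle index fixed to 1, the coefficients form the d × d identity.
function trivial_cgc(::Type{T}, d::Integer) where {T}
    C = zeros(T, d, 1, d)
    for m in 1:d
        C[m, 1, m] = one(T)
    end
    return C
end

# Hypothetical helper: set entries with magnitude below `tol` to exactly zero,
# so that fewer nonzero values need to be stored afterwards.
function purge_small!(A::AbstractArray; tol::Real=1e-14)
    for i in eachindex(A)
        if abs(A[i]) < tol
            A[i] = zero(eltype(A))
        end
    end
    return A
end

C = trivial_cgc(Float64, 3)                      # no linear solve required
A = purge_small!([1.0 1e-17; -3e-16 0.5])        # tiny entries become 0.0
```

The case $I \otimes a \rightarrow a$ would be analogous with a `1 × d × d` array. The appropriate tolerance depends on the accuracy of the solver, so the value above is only a placeholder.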

codecov · 2024-03-16T13:26:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.76%. Comparing base (4107f08) to head (ad9aea9).

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
+ Coverage   96.30%   96.76%   +0.45%     
==========================================
  Files           7        7              
  Lines         623      649      +26     
==========================================
+ Hits          600      628      +28     
+ Misses         23       21       -2


benchmark/benchmarks.jl

lkdvos · 2024-03-17T18:26:01Z

I've added benchmarks for an improved version of the CGC computations, which now uses QR instead of SVD to solve the linear problems, along with some in-place optimizations. I'll post the results here when they are done, and then look at how to incorporate everything.

github-actions · 2024-03-17T18:26:57Z

benchmark/benchmarks.jl

+
+using ThreadPinning
+ThreadPinning.pinthreads(:cores)
+ThreadPinning.threadinfo(;blas=true, hints=true)


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

ThreadPinning.threadinfo(;blas=true, hints=true)

ThreadPinning.threadinfo(; blas=true, hints=true)

github-actions · 2024-03-17T18:26:57Z

benchmark/benchmarks.jl

+    f
+end
+
+save("benchmark_results.png", f; px_per_unit=2)


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

save("benchmark_results.png", f; px_per_unit=2)

save("benchmark_results.png", f; px_per_unit=2)

lkdvos · 2024-03-19T08:08:04Z

The results are in:

https://gist.github.com/lkdvos/4030a328fc12ea3c939d0b5ccec3ad4a

Looks like a serious upgrade, so I'll clean up this PR and implement everything nicely.

Jutho · 2024-03-21T14:36:55Z

Looking good. I am wondering which of the changes contributes most to the nice speedup. Threading over the fusion degeneracy `N123` is no longer possible now that storage is recycled when building the right-hand side, but I assume threading happens at the higher level anyway.

lkdvos · 2024-03-21T14:49:45Z

I think that using the threads for BLAS might be the best option here. Even so, it would not be hard to define the auxiliary vectors one loop lower and reuse them only across the different `m3`s. In any case, the biggest speedup comes from `qr!` and `ldiv!`, from what I could find. Afterwards I changed how the arrays are built, since they are needed in a dense format anyway.
I think it is probably harder to find the optimal threading strategy over a wide variety of cases and machines, so I would say that the current approach is quite nice.
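
To make the `qr!`/`ldiv!` point concrete, here is a minimal sketch on a small, made-up square system. It is not the set of equations solved in `lower_weight_CGC!`; it only contrasts a `pinv`-based solve (which goes through an SVD and builds a dense pseudoinverse) with an in-place QR factorization followed by an in-place solve.

```julia
using LinearAlgebra

# Made-up, well-conditioned system A * x = b (assumption for illustration only).
A = [4.0 1.0 0.0;
     1.0 3.0 1.0;
     0.0 1.0 2.0]
x_true = [1.0, -2.0, 0.5]
b = A * x_true

# Reference approach: explicit pseudoinverse, computed via an SVD.
x_pinv = pinv(A) * b

# QR approach: `qr!` overwrites its argument with the factorization,
# so a copy is factorized here to keep `A` intact.
F = qr!(copy(A))

# `ldiv!` overwrites the right-hand side with the solution,
# avoiding a separate allocation for the result.
x_qr = copy(b)
ldiv!(F, x_qr)
```

In a loop over many right-hand sides, the factorization `F` and the buffer `x_qr` could be reused, which is where the in-place variants are expected to help most.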

Special case computation of CGC with trivial sectors

b2c5aad

Add benchmarks

464c4df

github-actions bot reviewed Mar 17, 2024


lkdvos added 4 commits March 17, 2024 19:14

Small updates to benchmarks

036c190

Formatter

1667c9f

don't use MKL

cf599f8

make threadinfo print

c56842c

github-actions bot reviewed Mar 17, 2024


lkdvos added 2 commits March 18, 2024 19:17

Make benchmarks more reasonable

a9c3cd4

Add benchmark results

ca35e4a

lkdvos changed the title ~~Optimize CGC computation with trivial sectors~~ Optimize CGC computations Mar 19, 2024

lkdvos added 5 commits March 19, 2024 09:32

incorporate CGC optimizations

4d963da

Remove some allocations

414502e

Add purging of almost-zeros

07b1e22

remove benchmark stuff (stored in gist)

6445cc5

remove vscode files

ad9aea9

lkdvos requested a review from Jutho March 19, 2024 09:23

lkdvos added the enhancement New feature or request label Mar 20, 2024

lkdvos linked an issue Mar 20, 2024 that may be closed by this pull request

Performance check for SU(>4) #9

Closed

lkdvos merged commit 8aabdc9 into master Mar 22, 2024
11 of 12 checks passed

lkdvos deleted the optimizations branch March 22, 2024 07:53
