
Optimize CGC computations #13

Merged: 13 commits, merged Mar 22, 2024


lkdvos (Member) commented Mar 16, 2024

This PR adds some optimizations for computing CGCs:

  • Add a special case for Clebsch-Gordan coefficients when one of the sectors is trivial, i.e. $a \otimes I \rightarrow a$ or $I \otimes a \rightarrow a$. In these cases the CGCs are just identity matrices, so no computation is needed.
  • Change how `lower_weight_CGC!` works: solve the linear equations with `qr!` instead of `pinv`, and build the right-hand side differently.
  • Purge near-zero values from the final result to save memory, both on disk and in RAM.
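A minimal sketch of the first and third bullets, assuming the CGCs are stored as dense 4-index arrays of shape (dim a, dim b, dim c, N), with N the fusion multiplicity. `trivial_right_CGC`, `trivial_left_CGC` and `purge_small!` are hypothetical helpers for illustration, not the package's API, and the default purge tolerance is an arbitrary choice.

```julia
# Hypothetical helper: CGC for a ⊗ I → a, stored as (dim a, 1, dim a, 1).
# The trivial sector is one-dimensional and the fusion is unique, so the
# coefficients reduce to the identity on the a-index: C[m, 1, m', 1] = δ(m, m').
function trivial_right_CGC(d::Integer, ::Type{T}=Float64) where {T<:Number}
    C = zeros(T, d, 1, d, 1)
    for m in 1:d
        C[m, 1, m, 1] = one(T)
    end
    return C
end

# Hypothetical helper for I ⊗ a → a, stored as (1, dim a, dim a, 1).
function trivial_left_CGC(d::Integer, ::Type{T}=Float64) where {T<:Number}
    C = zeros(T, 1, d, d, 1)
    for m in 1:d
        C[1, m, m, 1] = one(T)
    end
    return C
end

# Set entries whose magnitude is below `atol` to exactly zero, so numerical
# noise is not kept as stored values (the default tolerance is an assumption).
function purge_small!(A::AbstractArray{T};
                      atol::Real=sqrt(eps(real(T)))) where {T<:Union{AbstractFloat,Complex{<:AbstractFloat}}}
    for i in eachindex(A)
        if abs(A[i]) < atol
            A[i] = zero(T)
        end
    end
    return A
end

# Usage: no linear algebra is needed for the trivial cases.
C = trivial_right_CGC(3)
A = [1.0, 1e-12, -3e-15, 0.5]
purge_small!(A)
```

After purging, `count(!iszero, A)` gives the number of entries a sparse storage format would actually have to keep; how much memory this saves in practice depends on the storage format used.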


codecov bot commented Mar 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.76%. Comparing base (4107f08) to head (ad9aea9).

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
+ Coverage   96.30%   96.76%   +0.45%     
==========================================
  Files           7        7              
  Lines         623      649      +26     
==========================================
+ Hits          600      628      +28     
+ Misses         23       21       -2     


benchmark/benchmarks.jl: ten outdated review threads, all resolved
lkdvos (Member, Author) commented Mar 17, 2024

I've added benchmarks for an improved version of the CGC computations, which now uses QR instead of SVD (via `pinv`) to solve the linear problems, along with some in-place optimizations. I'll post the results here once they finish, and then work out how to integrate everything.


using ThreadPinning
ThreadPinning.pinthreads(:cores)
ThreadPinning.threadinfo(;blas=true, hints=true)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- ThreadPinning.threadinfo(;blas=true, hints=true)
+ ThreadPinning.threadinfo(; blas=true, hints=true)

f
end

save("benchmark_results.png", f; px_per_unit=2)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- save("benchmark_results.png", f; px_per_unit=2)
+ save("benchmark_results.png", f; px_per_unit=2)

lkdvos (Member, Author) commented Mar 19, 2024

The results are in:
(Image: SUN_CGC_benchmarks)
https://gist.github.com/lkdvos/4030a328fc12ea3c939d0b5ccec3ad4a

Looks like a serious upgrade, so I'll clean up this PR and implement everything nicely.

@lkdvos lkdvos changed the title from "Optimize CGC computation with trivial sectors" to "Optimize CGC computations" Mar 19, 2024
@lkdvos lkdvos requested a review from Jutho March 19, 2024 09:23
@lkdvos lkdvos added the enhancement New feature or request label Mar 20, 2024
@lkdvos lkdvos linked an issue Mar 20, 2024 that may be closed by this pull request
Jutho (Member) commented Mar 21, 2024

Looking good. I am wondering which of the changes contributes most to the nice speedup. Threading over the fusion degeneracy N123 is no longer possible with the storage recycling for building the right-hand side, but I assume threading happens at a higher level anyway.

lkdvos (Member, Author) commented Mar 21, 2024

I think that using the threads for BLAS might actually be best here, and even so, it would not be hard to define the auxiliary vectors one loop lower and reuse them only for the different m3 values. In any case, from what I could find, the biggest speedup comes from `qr!` and `ldiv!`; I then changed how the arrays are built, since they need to be in dense format at the end anyway.
I think it is probably harder to find the optimal threading strategy over a wide variety of cases/machines, so I would say that the current approach is quite nice.
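To make the `pinv` versus `qr!`/`ldiv!` comparison concrete, here is a hedged sketch on a generic dense linear system; it is not the actual `lower_weight_CGC!` code. It assumes a square, nonsingular coefficient matrix and several right-hand sides (standing in for the different m3 values), and `solve_pinv` and `solve_qr_many!` are hypothetical names. The QR variant factorizes once in place and reuses one preallocated right-hand-side buffer, which is the kind of storage recycling discussed in this thread.

```julia
using LinearAlgebra

# Baseline: solve A x = b through the SVD-based pseudo-inverse,
# which materializes an explicit matrix pinv(A).
solve_pinv(A::AbstractMatrix, b::AbstractVector) = pinv(A) * b

# QR variant (hypothetical helper): factorize A once, in place, then solve for
# every column of B, reusing one preallocated buffer as the right-hand side.
# Assumes A is square and nonsingular; A is overwritten by `qr!`.
function solve_qr_many!(X::AbstractMatrix{T}, A::Matrix{T},
                        B::AbstractMatrix{T}) where {T<:LinearAlgebra.BlasFloat}
    F = qr!(A)                          # in-place Householder QR, no copy of A
    buf = Vector{T}(undef, size(A, 1))  # auxiliary vector, allocated once
    for j in axes(B, 2)
        copyto!(buf, view(B, :, j))     # rebuild the right-hand side in place
        ldiv!(F, buf)                   # overwrite buf with the solution
        X[:, j] .= buf
    end
    return X
end

# Usage on a small nonsingular system with two right-hand sides.
A0 = [4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0]
B = [1.0 0.0; 2.0 1.0; 3.0 -1.0]
X = solve_qr_many!(similar(B), copy(A0), B)
```

`pinv` computes a full SVD and forms an explicit matrix, whereas the QR route needs only one factorization plus a triangular solve per right-hand side. The shared buffer is also why threading over that inner loop becomes awkward: each thread would need its own copy of `buf`.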

@lkdvos lkdvos merged commit 8aabdc9 into master Mar 22, 2024
11 of 12 checks passed
@lkdvos lkdvos deleted the optimizations branch March 22, 2024 07:53
Labels: enhancement (New feature or request)
Projects: none yet
Development: successfully merging this pull request may close these issues:
  • Performance check for SU(>4)
2 participants