Optimize CGC computations #13
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
+ Coverage   96.30%   96.76%   +0.45%
==========================================
  Files           7        7
  Lines         623      649      +26
==========================================
+ Hits          600      628      +28
+ Misses         23       21       -2
I've added benchmarks for an improved version of the CGC computations, which now uses QR instead of SVD to solve the linear problems, along with some in-place optimizations. I'll post the results here once they are done, and then work out how to add everything.
benchmark/benchmarks.jl
Outdated

using ThreadPinning
ThreadPinning.pinthreads(:cores)
ThreadPinning.threadinfo(;blas=true, hints=true)
[JuliaFormatter] reported by reviewdog 🐶
- ThreadPinning.threadinfo(;blas=true, hints=true)
+ ThreadPinning.threadinfo(; blas=true, hints=true)
benchmark/benchmarks.jl
Outdated
f
end

save("benchmark_results.png", f; px_per_unit=2)
[JuliaFormatter] reported by reviewdog 🐶
- save("benchmark_results.png", f; px_per_unit=2)
+ save("benchmark_results.png", f; px_per_unit=2)
The results are in. It looks like a serious upgrade, so I'll clean up this PR and implement everything properly.
Looking good. I am wondering which of the changes contributes most to the nice speedup. Threading over the fusion degeneracy
I think that using the threads for BLAS might be the best option here. Even so, it would not be hard to define the auxiliary vectors one loop lower and re-use them only for the different m3s. In any case, the biggest speedup comes from the
This PR adds some optimizations for computing CGCs:
lower_weight_CGC! works: solve equations using qr! instead of pinv, and build rhs differently.
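
To illustrate the kind of change described above, here is a minimal sketch of solving a small linear problem once with pinv (SVD-based) and once with qr!. It is not the code from this PR: the matrix A, the right-hand side b, and all variable names are hypothetical, and the sketch assumes A has full column rank, in which case both approaches give the same least-squares solution.

```julia
using LinearAlgebra

# Hypothetical overdetermined system A * x = b (not taken from the PR).
A = [1.0 2.0;
     3.0 4.0;
     5.0 6.0]
x_true = [1.0, -2.0]
b = A * x_true

# SVD-based approach: form the pseudoinverse explicitly, then multiply.
x_pinv = pinv(A) * b

# QR-based approach: factorize in place, then solve.
# qr! overwrites its argument, so a copy is passed to keep A intact here.
F = qr!(copy(A))
x_qr = F \ b

println(x_pinv ≈ x_qr)
```

For a rank-deficient matrix the two approaches are not interchangeable: pinv returns the minimum-norm solution, while an unpivoted QR factorization does not, so this substitution only makes sense where the systems are known to be well-posed.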