Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add disk-cached CGC coefficients #8

Merged
merged 35 commits into from
Jan 18, 2024
Merged

Add disk-cached CGC coefficients #8

merged 35 commits into from
Jan 18, 2024

Conversation

lkdvos
Copy link
Member

@lkdvos lkdvos commented Nov 27, 2023

This is my attempt at providing a thread-safe and performant way of having cached coefficients saved on disk.
Importantly, I avoid the use of JLD2, which should be slightly faster but less portable.
Additionally, this provides the ability to precompute a large number of CGCs in parallel, which can be useful on clusters.

Thoughts and comments welcome.

@lkdvos lkdvos requested a review from maartenvd November 27, 2023 17:03
Copy link

codecov bot commented Nov 27, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (baff207) 97.92% compared to head (ce49f95) 96.57%.

Files Patch % Lines
src/caching.jl 89.28% 9 Missing ⚠️
src/sector.jl 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master       #8      +/-   ##
==========================================
- Coverage   97.92%   96.57%   -1.35%     
==========================================
  Files           5        6       +1     
  Lines         529      613      +84     
==========================================
+ Hits          518      592      +74     
- Misses         11       21      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maartenvd
Copy link
Collaborator

I think it looks good! How slow is it to re-open the file every time a new key is needed? Would it make sense to hold an open handle to the file?

@lkdvos
Copy link
Member Author

lkdvos commented Dec 4, 2023

I can run some benchmarks next week maybe. It seems to be quite fast and not so important, especially since by default the RAM cache keeps 1e5 elements stored, so every CGC is only loaded from disk once. Opening the file compared to computing it seems to be faster anyways.
In any case, I think this would really benefit from some additional use of the symmetries of the CGC's, for now it quite quickly fills up a big chunk of memory.
Somehow it would also be nice to use a format that is slightly more portable instead of the serialization, I think for running things on clusters etc this would be nice.

* using JLD2 to make files relocateable
* always write new CGCs to disk
    * reading/writing is now necessarily blocking
* `precompute_disk_data` now immediately writes results to disk
    * saves intermediate progress
    * reduces memory usage
* `cache_info()` now gives more information

TODO:
- [ ] Update documentation/README
- [ ] Profile SU(>4) for hanging issues
- [ ] Use CGC symmetries to reduce memory usage (?)
- [ ] Check for near-zero entries in CGCs
-> this would remove cached files every time tests are run
now splits the cache file into a directory structure, where each combination of s1 x s2 gets its own file.
This way, once the file is written, it can be read in parallel, and by multiple processes, without any locking.
This should also make it easier to clean up the cache, as the directory is now more human-readable as well
@lkdvos
Copy link
Member Author

lkdvos commented Jan 10, 2024

I still want to add the functionality to only store CGCs on disk for s1 x s2 if s1 < s2, to reduce the amount of storage space that is required, but I think this is already an improvement for caching

Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
@lkdvos lkdvos force-pushed the disk-cache branch 2 times, most recently from a0f89af to 9e10313 Compare January 17, 2024 15:03
@lkdvos lkdvos force-pushed the disk-cache branch 2 times, most recently from c65ee90 to 01e5a98 Compare January 17, 2024 16:38
Copy link
Member

@Jutho Jutho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great; some small final comments.

LocalPreferences.toml Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
src/SUNRepresentations.jl Outdated Show resolved Hide resolved
Remove the CGC cache for ``SU(N)`` with eltype `T` from disk.
"""
function clear_disk_cache!(N, T=Float64)
fldrname = joinpath(CGC_CACHE_PATH, string(N), string(T))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe expect the behaviour that, if T is not specified, it removes all CGCs for N, irrespective of T, so essentially rm(joinpath(CGC_CACHE_PATH, string(N)), recursive=true).

src/caching.jl Show resolved Hide resolved
src/caching.jl Outdated Show resolved Hide resolved
test/caching.jl Show resolved Hide resolved
@lkdvos lkdvos merged commit 8342007 into master Jan 18, 2024
18 of 20 checks passed
@lkdvos lkdvos deleted the disk-cache branch January 18, 2024 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants