Reapply reproducibility test improvements #3435

Closed
charleskawczynski opened this issue Nov 13, 2024 · 4 comments
@charleskawczynski
Member

#3433 reverted #3430 and #3425; we should reapply those PRs, this time correctly and more robustly. I think it is wise to apply the structural changes first, and then build more on top of them.

Some important features I remember are:

  • Our current export_nc is not GPU-compatible, so we should switch to using only the HDF5 files, as our restarts do
  • There was a lot of room for improving names
  • Adding all of the cases made it difficult to maintain the MSE tables accurately and thoroughly
  • Can we somehow add unit tests for some of these functions? Or better document the (few) pieces that need to coordinate?
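
As a rough illustration of the unit-testing idea in the last bullet, here is a minimal sketch using Julia's `Test` standard library. The `mse` helper below is hypothetical and written only for this example; it is not the function used in the reproducibility tests, whose actual names and signatures are not shown in this thread.

```julia
using Test

# Hypothetical helper (an assumption for illustration, not the repository's code):
# mean squared error between two arrays of identical size.
function mse(a::AbstractArray, b::AbstractArray)
    size(a) == size(b) ||
        throw(DimensionMismatch("arrays must have the same size"))
    return sum(abs2, a .- b) / length(a)
end

@testset "mse (hypothetical helper)" begin
    # Identical inputs give zero error.
    @test mse([1.0, 2.0], [1.0, 2.0]) == 0.0
    # (3^2 + 4^2) / 2 = 12.5
    @test mse([0.0, 0.0], [3.0, 4.0]) == 12.5
    # Mismatched sizes are rejected instead of being silently broadcast.
    @test_throws DimensionMismatch mse([1.0], [1.0, 2.0])
end
```

Small pure functions like this can be tested without running a full simulation, which may be one way to cover the pieces that need to coordinate.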

charleskawczynski self-assigned this on Nov 13, 2024
@akshaysridhar
Member

akshaysridhar commented Nov 18, 2024

@charleskawczynski There seems to be another issue with our longrun builds (and their interaction with the reproducibility test paths). See https://buildkite.com/clima/climaatmos-gpulongruns/builds/434#_ for an example.

The stack trace is copied below. As a result of this issue, the post-processing artifacts are not generated, although they can still be produced manually offline from the nc datasets:

ERROR: LoadError: IOError: readdir("/central/scratch/esm/slurm-buildkite/climaatmos-main"): no such file or directory (ENOENT)
Stacktrace:
 [1] uv_error
   @ ./libuv.jl:100 [inlined]
 [2] readdir(dir::String; join::Bool, sort::Bool)
   @ Base.Filesystem ./file.jl:869
 [3] readdir
   @ ./file.jl:862 [inlined]
 [4] sorted_dataset_folder(; dir::String)
   @ Main /scratch/clima/slurm-buildkite/climaatmos-gpulongruns/434/climaatmos-gpulongruns/reproducibility_tests/reproducibility_utils.jl:14
 [5] sorted_dataset_folder
   @ /scratch/clima/slurm-buildkite/climaatmos-gpulongruns/434/climaatmos-gpulongruns/reproducibility_tests/reproducibility_utils.jl:13 [inlined]
 [6] latest_comparable_paths(; n::Int64, root_path::String, ref_counter_PR::Int64)
   @ Main /scratch/clima/slurm-buildkite/climaatmos-gpulongruns/434/climaatmos-gpulongruns/reproducibility_tests/reproducibility_utils.jl:77
 [7] latest_comparable_paths()
   @ Main /scratch/clima/slurm-buildkite/climaatmos-gpulongruns/434/climaatmos-gpulongruns/reproducibility_tests/reproducibility_utils.jl:68
 [8] top-level scope
   @ /scratch/clima/slurm-buildkite/climaatmos-gpulongruns/434/climaatmos-gpulongruns/examples/hybrid/driver.jl:155
in expression starting at /scratch/clima/slurm-buildkite/climaatmos-gpulongruns/434/climaatmos-gpulongruns/examples/hybrid/driver.jl:146
Saving profiler information
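
The trace shows `readdir` throwing ENOENT because the directory passed to `sorted_dataset_folder` does not exist on that machine. One possible way to make such a lookup tolerant of a missing directory is sketched below. This is only an assumption-laden illustration: `sorted_dataset_folder_safe` is a hypothetical name, sorting by modification time is a guess at the intended ordering, and the actual fix in #3443 may take a different approach.

```julia
# Hypothetical sketch; not the code in reproducibility_utils.jl.
# Returns the subdirectories of `dir` sorted by modification time
# (oldest first), or an empty vector when `dir` does not exist.
function sorted_dataset_folder_safe(; dir::String)
    # Guard against ENOENT: `readdir` throws if `dir` is missing.
    isdir(dir) || return String[]
    # `join = true` returns paths prefixed with `dir`.
    paths = filter(isdir, readdir(dir; join = true))
    return sort(paths; by = mtime)
end
```

A caller could then treat an empty result as "no comparable datasets available" and skip the comparison rather than aborting the driver.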

@charleskawczynski
Member Author

This should be fixed by #3443.

@akshaysridhar
Member

Noted, thanks!

@charleskawczynski
Member Author

Closed by #3513, #3510, #3507, #3502, #3500, #3497, #3496, #3493
