Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncate data to regrid only necessary dates #717

Merged
merged 1 commit into from
Apr 13, 2024
Merged

Conversation

anastasia-popova
Copy link
Contributor

@anastasia-popova anastasia-popova commented Apr 1, 2024

Purpose

This PR aims to truncate the data sets that are regridded in the coupler driver to significantly reduce processing time and memory

To-do

Content


  • I have read and checked the items on the review checklist.

@anastasia-popova anastasia-popova marked this pull request as draft April 1, 2024 19:03
src/Regridder.jl Outdated Show resolved Hide resolved
src/Regridder.jl Outdated Show resolved Hide resolved
@anastasia-popova anastasia-popova linked an issue Apr 2, 2024 that may be closed by this pull request
@anastasia-popova anastasia-popova marked this pull request as ready for review April 2, 2024 20:14
src/Regridder.jl Outdated Show resolved Hide resolved
src/Regridder.jl Outdated Show resolved Hide resolved
Copy link
Collaborator

@LenkaNovak LenkaNovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, @anastasia-popova ! This is going to be super helpful. I just had a couple of small suggestions.

experiments/AMIP/coupler_driver.jl Outdated Show resolved Hide resolved
test/regridder_tests.jl Outdated Show resolved Hide resolved
src/Regridder.jl Show resolved Hide resolved
src/Regridder.jl Outdated Show resolved Hide resolved
src/Regridder.jl Outdated Show resolved Hide resolved
# 2. start is early, end is within datafine
# 3. both start and end within datafile
# 4. start is in datafile end is not
# 5. both start and end are later than in datafile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests! Could we also add a test that the sst data is correctly indexed (as well as the time)? I see from running this locally that sst has the correct dimensions, but it could be good to test that original_sst[index_of_start_date] == truncated_sst[1]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this need my function to return the start index? It is all calculated in the function so I don't think there would be a way to know what the index_of_start_date is other than just looking at the index of the first value in trucated_sst defeating the purpose of the test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you could make it return it optionally.

@anastasia-popova anastasia-popova force-pushed the ap/regrid branch 2 times, most recently from b3037e1 to bc50a43 Compare April 8, 2024 16:07
@anastasia-popova anastasia-popova force-pushed the ap/regrid branch 4 times, most recently from 66b8ad2 to a49cc15 Compare April 11, 2024 21:03
Copy link
Member

@juliasloan25 juliasloan25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! I just went through and left a final review, it's all minor stuff :)

artifacts/artifact_funcs.jl Outdated Show resolved Hide resolved
artifacts/artifact_funcs.jl Outdated Show resolved Hide resolved
src/Regridder.jl Outdated Show resolved Hide resolved
test/regridder_tests.jl Outdated Show resolved Hide resolved
test/regridder_tests.jl Outdated Show resolved Hide resolved
@anastasia-popova anastasia-popova force-pushed the ap/regrid branch 2 times, most recently from dcdf9e3 to e0c12eb Compare April 11, 2024 23:22
Copy link
Member

@juliasloan25 juliasloan25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! If Lenka doesn't have any additional comments, feel free to merge :)

Copy link
Collaborator

@LenkaNovak LenkaNovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very neat, thanks, @anastasia-popova. As I mentioned in the comments, it would be good to include a test that checks the truncation of the variable (in your case "SST"), as well as the dates. I propose splitting truncate_dataset, which allows us to separate the functionality to 1) find the date indices and 2) perform the actual truncation. This will clarify the functions' scope and make the separate functions more broadly useful. :)

src/Regridder.jl Outdated
ds = NCDataset(datafile, "r")
dates = ds["time"][:]

# if the simulation start date is before our first date in the dataset
Copy link
Collaborator

@LenkaNovak LenkaNovak Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please split these if else statements (L661 - L694) into a separate function (e.g. find_idx_bounding_dates)? This way this can be used for different contexts, and it'll be easier to test.

sst_data = Regridder.truncate_dataset(sst_data_all, "test", REGRID_DIR, date0, time_start, time_end, comms_ctx)
ds_truncated = NCDataset(sst_data, "r")
new_dates = ds_truncated["time"][:]

Copy link
Collaborator

@LenkaNovak LenkaNovak Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great test for checking the dates in the truncated dataset, but it would still be good to add a test that checks that the variable in the truncated dataset is aligned with the new truncated dates. If we split truncate_dataset as suggested above, we can add something like:

all_data = ds["SST"][:, :, :]
new_data = ds_truncated["SST"][:, :, :]
first_index = find_idx_bounding_dates(...)
@test new_data[:,:,1] ≈ all_data data[:,:,first_index]  

Copy link
Collaborator

@LenkaNovak LenkaNovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's perfect, thank you, @anastasia-popova! No change in output as expected, and the initialization of our target GPU AMIP runs is reduced by 5mins (which can be 20-30% of the standard tests we use for development)! Great job!

@anastasia-popova anastasia-popova force-pushed the ap/regrid branch 2 times, most recently from d9a0a81 to 0ced3c7 Compare April 13, 2024 19:23
@anastasia-popova anastasia-popova merged commit fcb4232 into main Apr 13, 2024
8 of 9 checks passed
@anastasia-popova anastasia-popova deleted the ap/regrid branch April 13, 2024 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remap only the necessary dates
3 participants