-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Truncate data to regrid only necessary dates #717
Conversation
5532588
to
8d6fc35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job, @anastasia-popova ! This is going to be super helpful. I just had a couple of small suggestions.
test/regridder_tests.jl
Outdated
# 2. start is early, end is within datafine | ||
# 3. both start and end within datafile | ||
# 4. start is in datafile end is not | ||
# 5. both start and end are later than in datafile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests! Could we also add a test that the sst data is correctly indexed (as well as the time)? I see from running this locally that sst has the correct dimensions, but it could be good to test that original_sst[index_of_start_date] == truncated_sst[1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this need my function to return the start index? It is all calculated in the function so I don't think there would be a way to know what the index_of_start_date is other than just looking at the index of the first value in trucated_sst defeating the purpose of the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you could make it return it optionally.
b3037e1
to
bc50a43
Compare
66b8ad2
to
a49cc15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! I just went through and left a final review, it's all minor stuff :)
dcdf9e3
to
e0c12eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me! If Lenka doesn't have any additional comments, feel free to merge :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very neat, thanks, @anastasia-popova. As I mentioned in the comments, it would be good to include a test that checks the truncation of the variable (in your case "SST"), as well as the dates. I propose splitting truncate_dataset
, which allows us to separate the functionality to 1) find the date indices and 2) perform the actual truncation. This will clarify the functions' scope and make the separate functions more broadly useful. :)
src/Regridder.jl
Outdated
ds = NCDataset(datafile, "r") | ||
dates = ds["time"][:] | ||
|
||
# if the simulation start date is before our first date in the dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please split these if else statements (L661 - L694) into a separate function (e.g. find_idx_bounding_dates
)? This way this can be used for different contexts, and it'll be easier to test.
sst_data = Regridder.truncate_dataset(sst_data_all, "test", REGRID_DIR, date0, time_start, time_end, comms_ctx) | ||
ds_truncated = NCDataset(sst_data, "r") | ||
new_dates = ds_truncated["time"][:] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great test for checking the dates in the truncated dataset, but it would still be good to add a test that checks that the variable in the truncated dataset is aligned with the new truncated dates. If we split truncate_dataset
as suggested above, we can add something like:
all_data = ds["SST"][:, :, :]
new_data = ds_truncated["SST"][:, :, :]
first_index = find_idx_bounding_dates(...)
@test new_data[:,:,1] ≈ all_data data[:,:,first_index]
cd82864
to
cee1484
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's perfect, thank you, @anastasia-popova! No change in output as expected, and the initialization of our target GPU AMIP runs is reduced by 5mins (which can be 20-30% of the standard tests we use for development)! Great job!
d9a0a81
to
0ced3c7
Compare
0ced3c7
to
575a545
Compare
Purpose
This PR aims to truncate the data sets that are regridded in the coupler driver to significantly reduce processing time and memory
To-do
Content