Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time averaging and FieldSlicer for NetCDFOutputWriter #1040

Merged
merged 36 commits into from
Oct 14, 2020

Conversation

ali-ramadhan
Copy link
Member

@ali-ramadhan ali-ramadhan commented Oct 13, 2020

This PR refactors the NetCDFOutputWriter to use FieldSlicer (which cleans it up a bit) and adds support for time-averaging for NetCDF. All the existing NetCDF tests pass (of which there are quite a few).

I also added a test for strided windowed time averaging of horizontal averages for NetCDFOutputWriter. Oceananigans solves ∂c/∂t = - λ(x, y, z) c where λ(x, y, z) = x + (1 - y)^2 + tanh(z) which is independent exoponential decay at every grid point so you can analytically compute what the output of the horizontal average and the strided windowed time average should be. Thankfully the test passes 🎉

I also reorganized test_output_writers.jl quite a bit. I think it's big enough that it should be split up into multiple files but I'll leave this for a future PR since it would make reviewing this PR's diff difficult.

Would be nice if NetCDF accepted a named tuple for outputs and had a less clunky interface than just dicts for everything. Might have to wait for a future PR though...

X-Ref: Alexander-Barth/NCDatasets.jl#105

Resolves #876

@ali-ramadhan ali-ramadhan requested a review from glwagner October 13, 2020 09:54
@codecov
Copy link

codecov bot commented Oct 13, 2020

Codecov Report

Merging #1040 into master will decrease coverage by 30.30%.
The diff coverage is 29.57%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1040       +/-   ##
===========================================
- Coverage   54.75%   24.44%   -30.31%     
===========================================
  Files         160      158        -2     
  Lines        3866     3681      -185     
===========================================
- Hits         2117      900     -1217     
- Misses       1749     2781     +1032     
Impacted Files Coverage Δ
src/AbstractOperations/AbstractOperations.jl 50.00% <ø> (ø)
src/Diagnostics/Diagnostics.jl 100.00% <ø> (ø)
src/Oceananigans.jl 66.66% <ø> (ø)
src/OutputWriters/OutputWriters.jl 33.33% <ø> (-33.34%) ⬇️
src/OutputWriters/field_slicer.jl 0.00% <ø> (-53.34%) ⬇️
src/OutputWriters/jld2_output_writer.jl 0.00% <ø> (-90.57%) ⬇️
src/OutputWriters/netcdf_output_writer.jl 0.00% <0.00%> (-81.71%) ⬇️
src/OutputWriters/windowed_time_average.jl 0.00% <ø> (-84.62%) ⬇️
src/Fields/abstract_field.jl 48.97% <100.00%> (-9.03%) ⬇️
src/Utils/pretty_time.jl 96.55% <100.00%> (+14.19%) ⬆️
... and 108 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9cfb568...1d8c70c. Read the comment docs.

previous :: Float64
verbose :: Bool
end

"""
NetCDFOutputWriter(model, outputs; filename, iteration_interval=nothing, time_interval=nothing,
NetCDFOutputWriter(model, outputs; filepath, iteration_interval=nothing, time_interval=nothing,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In JLD2OutputWriter we have two arguments: dir, and filename. This is combined internally into a filepath. I suppose the reasoning is that its common to have multiple output writers (which you'd want to save data into the same directory), but different names. Having two keyword arguments makes that slightly more convenient. I'm ok with using filepath instead if there's a good argument. But I think both output writers should have the same interface?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looks like JLD2OutputWriter uses "prefix":

JLD2OutputWriter(model, outputs; prefix,

The distinction is that prefix is appended with various endings (eg .jld2, or sometimes _part$n.jld2) to complete the file name.

Happy to change JLD2OutputWriter to conform to a style we think is clearest --- let's discuss.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think a friendly interface would use filename, but allow the user to give only the first "part" of the file name, leaving out .jld2 or .nc --- which are already made obvious by the name of the object (NetCDF* or JLD2*). This has the advantage that this keyword argument doesn't need to be changed if switching between output writers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree a shared interface would be good, but I think the reason JLD2OutputWriter has both a dir and filename is to support file splitting with automatic file naming.

The filepath should include both the directory and filename so I think having two arguments is kind of clunky. Ideally both would just use a filepath.

I guess NetCDFOutputWriter gets away with just a filepath because it doesn't support file splitting so the filename never changes during the course of a simulation.

We could modify the JLD2OutputWriter to just take in a filepath. Two possible solutions are:

  1. File splitting could be supported by injecting part1, part2, etc. appropriately into the filepath, or
  2. via another kwarg filename_pattern that's a function filename_pattern(n::Int)::String that returns the filename for part n.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A third alternative would be to remove the ability to split files, but I think this is a bad idea. While there is no file size limit for HDF5 and we typically don't split files since our files aren't huge, there will be users in the future who will run huge models and need this functionality due to memory limits (or even filesystem limits).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess NetCDFOutputWriter gets away with just a filepath because it doesn't support file splitting so the filename never changes during the course of a simulation.

Do we plan on supporting file splitting for NetCDFOutputWriter in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we should. We should continue discussion in #884 (where it seems we already liked filepath haha).

Copy link
Member

@glwagner glwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Some small comments about the interface but it doesn't have to be addressed in this PR. We need docs on setting up WindowedTimeAverage.

@glwagner
Copy link
Member

Does AveragedField and ComputedField work seamlessly with NetCDFOutputWriter now? If so, we can nuke average.jl and computation.jl.

@ali-ramadhan
Copy link
Member Author

Does AveragedField and ComputedField work seamlessly with NetCDFOutputWriter now? If so, we can nuke average.jl and computation.jl.

Yes good point. I'll nuke them and probably just need to do some %s/Average/AveragedField/gc.

@ali-ramadhan ali-ramadhan merged commit c12ecfe into master Oct 14, 2020
@ali-ramadhan ali-ramadhan deleted the ar/update-netcdf branch October 14, 2020 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Computation has result at wrong location
2 participants