
Clean up cache #2244

Closed
wants to merge 15 commits into from

Conversation

@Sbozzolo Sbozzolo (Member) commented Oct 17, 2023:

I went ahead and did a large first round of refactoring of the cache. This is what I have done:

  • Removed the items that can be trivially removed from the cache
  • Organized the cache by giving it a clear nested structure [see also note 1 below] and by trying to establish ownership of the various quantities. The cache is now much simpler. It looks like:
p = (;
        simulation,
        atmos,
        core,
        sfc_setup,
        params,
        precomputed,
        scratch,
        hyperdiff,
        rayleigh_sponge,
        viscous_sponge,
        precipitation,
        subsidence,
        large_scale_advection,
        edmf_coriolis,
        forcing,
        non_orographic_gravity_wave,
        orographic_gravity_wave,
        radiation_mode,
        turbconv,
        ghost_buffer,
        Δt = dt,
    )
  • All the temporary variables are now in scratch.
  • Note that it is no longer possible to access variables like ᶜu directly from p; they now live in p.precomputed. I am not super excited about this change and its impact on existing code. The ideal change would be to break precomputed down and make it more modular, but precomputed is one of the most involved components of the current cache (hyperdiff also picks up some conditional quantities depending on EDMF).
  • Defined a standard pattern to obtain the cache for a given component (see the sketch after this list). For all the fields in AtmosModel except two, the pattern is now something like precipitation = precipitation_cache(Y, atmos), with dispatch over the details of the submodel used internally (for instance, precipitation_cache(Y, atmos) = precipitation_cache(Y, atmos, atmos.precip_model)). The two items that do not fit this pattern are: (1) radiation, where more arguments are needed depending on the model; (2) turbconv_model, which currently uses parsed_args for "imex_edmf_turbconv", "imex_edmf_gm", and so on (inside dycore_equations_deprecated, which is why I didn't bother fixing it too much).
  • Defined a generate_limiters_func! that returns limiters_func! depending on the model (I also did this to remove the limiter from the cache).
  • Moved the details of the numerics inside the AtmosModel, with a new AtmosNumerics type.
  • Removed radiation_model (now everything is in radiation_mode).
  • Cleaned up (a little bit) the hack for dummy surfaces.
  • Removed unused variables here and there.
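For concreteness, here is a minimal sketch of the constructor-with-dispatch pattern described above (generate_limiters_func! follows the same idea). NoPrecipitation and Microphysics0Moment stand in for whatever precipitation models the codebase actually defines, and the cached field is purely illustrative:

    # Generic entry point: dispatch on the submodel stored in AtmosModel.
    precipitation_cache(Y, atmos) = precipitation_cache(Y, atmos, atmos.precip_model)

    # No precipitation: nothing to cache.
    precipitation_cache(Y, atmos, ::NoPrecipitation) = (;)

    # A microphysics scheme that needs a cached source term (field name illustrative).
    precipitation_cache(Y, atmos, ::Microphysics0Moment) =
        (; ᶜS_ρq_tot = similar(Y.c.ρ))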

Easier steps that can be taken next:

  • sfc_setup is used exclusively in update_surface_conditions! (in set_precomputed_quantities), so it could be passed there directly instead of living in the cache.
  • simulation is pretty much only used for start_date for the insolation (plus job_id in the perf/post-processing).

Difficult steps that can be taken next:

  • precomputed_quantities is involved, especially with all the EDMF stuff. The precomputed and hyperdiff tuples break the idea of ownership because they contain all sorts of variables that logically belong to other fields. This could be refactored, but it requires thinking about the mechanism for (pre)computing variables.
  • Move all the named tuples to structs (see also [1])

Open questions:

  • core contains ᶜΦ, ᶠgradᵥ_ᶜΦ, ᶜρ_ref, ᶜp_ref, ᶜT, ᶜf, ∂ᶜK∂ᶠu₃_data. Is core the best name? Should all these variables be here?
  • T_ref is hardcoded to FT(255). Is this correct?
  • Can we remove this block?
    # TODO: In the following increments to ᶜK, we actually need to add
    # quantities of the form ᶜρaχ⁰ / ᶜρ⁰ and ᶜρaχʲ / ᶜρʲ to ᶜK, rather than
    # quantities of the form ᶜρaχ⁰ / ᶜρ and ᶜρaχʲ / ᶜρ. However, we cannot
    # compute ᶜρ⁰ and ᶜρʲ without first computing ᶜts⁰ and ᶜtsʲ, both of
    # which depend on the value of ᶜp, which in turn depends on ᶜts. Since
    # ᶜts depends on ᶜK (at least when the energy_form is TotalEnergy), this
    # means that the amount by which ᶜK needs to be incremented is a
    # function of ᶜK itself. So, unless we run a nonlinear solver here, this
    # circular dependency will prevent us from computing the exact value of
    # ᶜK. For now, we will make the anelastic approximation ᶜρ⁰ ≈ ᶜρʲ ≈ ᶜρ.
    # add_sgs_ᶜK!(ᶜK, Y, ᶜρa⁰, ᶠu₃⁰, turbconv_model)
    # @. ᶜK += Y.c.sgs⁰.ρatke / Y.c.ρ
    # TODO: We should think more about these increments before we use them.
    end
  • Is this code correct and relevant for all the configurations?
    if eltype(ᶜcoord) <: Geometry.LatLongZPoint
        # Spherical geometry: latitude-dependent Coriolis parameter.
        Ω = CAP.Omega(params)
        ᶜf = @. 2 * Ω * sind(ᶜcoord.lat)
    else
        # Planar geometry: constant f-plane Coriolis frequency.
        f = CAP.f_plane_coriolis_frequency(params)
        ᶜf = map(_ -> f, ᶜcoord)
    end
    ᶜf = @. CT3(Geometry.WVector(ᶜf))
  • Can we remove this hack?
    # TODO: Remove this hack
    sfc_temp_C3 = Fields.Field(C3{FT}, Spaces.level(face_space, half)), # ρ_flux_χ
  • Is there a more explicit way to achieve what this code is doing? (This pattern is common throughout the code, and I think it is just checking the energy form; see the sketch after this list.)
    if :ρθ in propertynames(Y.c)
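On that last question, a possible more explicit spelling, assuming the energy forms mentioned above (TotalEnergy and the ρθ-based one, here called PotentialTemperature) and a hypothetical helper name:

    # Hypothetical helper: state the intent instead of probing Y.c's fields.
    uses_potential_temperature(atmos) = atmos.energy_form isa PotentialTemperature

    if uses_potential_temperature(atmos)
        # ρθ-specific branch
    end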

Cleaning up and organizing the cache more neatly while still using NamedTuples led to a 5-10 % overall improvement in latency (which compounds with the ClimaCore fix for mutable spaces). Moving the cache from a NamedTuple to a struct of NamedTuples bought an additional 5-10 % improvement in latency. As a result, merging this PR reduces the compile time for jobs in the CI by up to 9 minutes.

As an experiment, I turned one of the NamedTuples inside the cache into a struct (the core one). That led to a percent-level improvement in latency: not enough to stand out above the experimental noise, but enough to give another hint that our NamedTuples are not the best for compilation.
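As a sketch of that experiment, the core NamedTuple (with the fields listed under the open questions above) could become a concrete struct along these lines, where the type parameters are placeholders that keep each field concretely typed:

    struct CoreCache{PHI, GRAD, RHO, P, T, F, K}
        ᶜΦ::PHI
        ᶠgradᵥ_ᶜΦ::GRAD
        ᶜρ_ref::RHO
        ᶜp_ref::P
        ᶜT::T
        ᶜf::F
        ∂ᶜK∂ᶠu₃_data::K
    end

Access at call sites stays the same (e.g. p.core.ᶜT), so only the construction site changes.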

[1] There's some lost Julia knowledge about (named) tuples with many elements being slow. We should probably just start avoiding large named tuples.
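A quick, illustrative way to probe the effect this footnote alludes to (all names invented for the example):

    # Build a wide NamedTuple and compare the first (compiling) call with the second.
    wide = NamedTuple(Symbol(:x, i) => float(i) for i in 1:100)
    g(nt) = nt.x1 + nt.x100
    @time g(wide)   # first call: dominated by inference/compilation
    @time g(wide)   # second call: runtime only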

Closes #2217

CC: @charleskawczynski @LenkaNovak @szy21

Todo:

  • Two EDMFX tests are timing out. Figure out what's happening

@Sbozzolo Sbozzolo (Member, Author) commented:

I am a little stumped about what to do with the tests that time out. I cannot reproduce the problem on my laptop or on the cluster, so I am extrapolating some guesses. I have possibly narrowed the problem down to an inference issue in the EDMFX precomputed variables: the tests time out because Julia cannot even finish compiling the code.

set_edmf_precomputed_quantities! is indeed a big and complex function, but I don't understand what change led to this explosion in inference time on the cluster.
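One way to chase an inference blow-up like this is SnoopCompile's deep inference snooping; a sketch, assuming Y, p, and t are in scope and that this is the right argument list for the function:

    using SnoopCompileCore
    tinf = @snoopi_deep set_edmf_precomputed_quantities!(Y, p, t)

    using SnoopCompile  # the analysis tools live in the full package
    inference_triggers(tinf)  # call sites where runtime dispatch forced fresh inference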

@szy21 szy21 (Member) commented Oct 18, 2023:

All EDMF cases call the same set_edmf_precomputed_quantities!, so it is also confusing that only two of them time out.

@Sbozzolo Sbozzolo (Member, Author) commented:

> All EDMF cases call the same set_edmf_precomputed_quantities!, so it is also confusing that only two of them time out.

On closer inspection, all the EDMF cases are affected (with compile- and run-time slowdowns).

@Sbozzolo Sbozzolo force-pushed the gb/clean_cache branch 2 times, most recently from c92f859 to f7b4d6a on October 18, 2023, 22:12
@charleskawczynski charleskawczynski (Member) left a comment:

Can we break this PR down into smaller, easily mergeable parts? My suggestion for some good peel-off PRs would be:

  • Remove comms_ctx from p, and get it from the space instead (see the sketch after this list).
  • Rename param_set to params (this will be pretty big on its own). Also, I think we use the name params for some other data structure elsewhere, so we'll need to look out for name collisions and decide how to resolve them.
  • Remove the do_dss bool and compute it from the space.
  • Struct-based changes. I might even suggest breaking this PR down into smaller parts.
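A sketch of the first suggestion, assuming ClimaComms.context can be queried from a ClimaCore space (directly or via its topology); the exact accessor should be checked against the actual API:

    using ClimaCore, ClimaComms
    space = axes(Y.c)                      # the space underlying the cell-centered state
    comms_ctx = ClimaComms.context(space)  # instead of carrying comms_ctx in p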

@Sbozzolo Sbozzolo (Member, Author) commented Oct 19, 2023:

> Rename param_set to params (this will be pretty big on its own). Also, I think we use the name params for some other data structure elsewhere, so we'll need to look out for name collisions and decide how to resolve them.

param_set was set to be exactly what we call params elsewhere (if I follow the chain correctly), so I renamed it to be more consistent across the codebase.

> Can we break this PR down into smaller, easily mergeable parts?

Most of the commits should work as they are, and only a couple of them implement more than one change, so it should be easy to peel off what we want. However, I first want to finalize the larger change (fixing the compilation problems).

bors bot added a commit that referenced this pull request on Oct 23, 2023:

2259: Clean up: remove underused quantities r=Sbozzolo a=Sbozzolo

Peel off #2244.

Co-authored-by: Gabriele Bozzola <[email protected]>
@Sbozzolo Sbozzolo force-pushed the gb/clean_cache branch 2 times, most recently from 2a3bee7 to e7fdca9 on October 27, 2023, 22:37
@Sbozzolo Sbozzolo (Member, Author) commented:

Closed in favor of #2287

@Sbozzolo Sbozzolo closed this Oct 30, 2023
Linked issue: Replace cache namedtuple with explicit struct (#2217)