
Clean up cache #2244

Closed
wants to merge 15 commits into from

Conversation

@Sbozzolo Sbozzolo (Member) commented Oct 17, 2023:

I went ahead and did a large first round of refactoring of the cache. This is what I have done:

  • Removed the items that can be trivially removed from the cache
  • Organized the cache by giving it a clear nested structure [see also note 1 below] and by trying to establish ownership of the various quantities. The cache is now much simpler. It looks like:
p = (;
        simulation,
        atmos,
        core,
        sfc_setup,
        params,
        precomputed,
        scratch,
        hyperdiff,
        rayleigh_sponge,
        viscous_sponge,
        precipitation,
        subsidence,
        large_scale_advection,
        edmf_coriolis,
        forcing,
        non_orographic_gravity_wave,
        orographic_gravity_wave,
        radiation_mode,
        turbconv,
        ghost_buffer,
        Δt = dt,
    )
  • All the temporary variables are now in scratch.
  • Note that it is no longer possible to access variables like ᶜu directly from p; they now live in p.precomputed. I am not super excited about this change and its impact on existing code. The ideal change would be to break precomputed down and make it more modular, but precomputed is one of the most involved components of the current cache (hyperdiff also picks up some conditional quantities depending on EDMF).
  • Defined a standard pattern to obtain the cache for a given component (see the sketch after this list). For all the fields in AtmosModel except two, the pattern is now something like precipitation = precipitation_cache(Y, atmos), with dispatch over the details of the submodel used internally (for instance, precipitation_cache(Y, atmos) = precipitation_cache(Y, atmos, atmos.precip_model)). The two items that do not fit this pattern are: (1) radiation, where more arguments are needed depending on the model; (2) turbconv_model, which currently uses parsed_args for "imex_edmf_turbconv", "imex_edmf_gm", and so on (inside dycore_equations_deprecated, which is why I didn't bother fixing it too much).
  • Defined a generate_limiters_func! that returns limiters_func! depending on the model (I also did this to remove the limiter from the cache).
  • Moved the details of the numerics inside the AtmosModel, with a new AtmosNumerics type.
  • Removed radiation_model (now everything is in radiation_mode).
  • Cleaned up (a little bit) the hack for dummy surfaces.
  • Removed unused variables here and there.
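For concreteness, here is a minimal sketch of the constructor-with-dispatch pattern described above (generate_limiters_func! follows the same idea). NoPrecipitation and Microphysics0Moment stand in for whatever precipitation models the codebase actually defines, and the cached field is purely illustrative:

    # Generic entry point: dispatch on the submodel stored in AtmosModel.
    precipitation_cache(Y, atmos) = precipitation_cache(Y, atmos, atmos.precip_model)

    # No precipitation: nothing to cache.
    precipitation_cache(Y, atmos, ::NoPrecipitation) = (;)

    # A microphysics scheme that needs a cached source term (field name illustrative).
    precipitation_cache(Y, atmos, ::Microphysics0Moment) =
        (; ᶜS_ρq_tot = similar(Y.c.ρ))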

Easier steps that can be taken next:

  • sfc_setup is used exclusively in update_surface_conditions! (in set_precomputed_quantities), so it could be passed there directly instead of living in the cache.
  • simulation is pretty much only used for start_date for the insolation (plus job_id in the perf/post-processing).

Difficult steps that can be taken next:

  • precomputed_quantities is involved, especially with all the EDMF stuff. The precomputed and hyperdiff tuples break the idea of ownership because they contain all sorts of variables that logically belong to other fields. This could be refactored, but it requires thinking about the mechanism for (pre)computing variables.
  • Move all the named tuples to structs (see also [1])

Open questions:

  • core contains ᶜΦ, ᶠgradᵥ_ᶜΦ, ᶜρ_ref, ᶜp_ref, ᶜT, ᶜf, ∂ᶜK∂ᶠu₃_data. Is core the best name? Should all these variables be here?
  • T_ref is hardcoded to FT(255). Is this correct?
  • Can we remove this block?
    # TODO: In the following increments to ᶜK, we actually need to add
    # quantities of the form ᶜρaχ⁰ / ᶜρ⁰ and ᶜρaχʲ / ᶜρʲ to ᶜK, rather than
    # quantities of the form ᶜρaχ⁰ / ᶜρ and ᶜρaχʲ / ᶜρ. However, we cannot
    # compute ᶜρ⁰ and ᶜρʲ without first computing ᶜts⁰ and ᶜtsʲ, both of
    # which depend on the value of ᶜp, which in turn depends on ᶜts. Since
    # ᶜts depends on ᶜK (at least when the energy_form is TotalEnergy), this
    # means that the amount by which ᶜK needs to be incremented is a
    # function of ᶜK itself. So, unless we run a nonlinear solver here, this
    # circular dependency will prevent us from computing the exact value of
    # ᶜK. For now, we will make the anelastic approximation ᶜρ⁰ ≈ ᶜρʲ ≈ ᶜρ.
    # add_sgs_ᶜK!(ᶜK, Y, ᶜρa⁰, ᶠu₃⁰, turbconv_model)
    # @. ᶜK += Y.c.sgs⁰.ρatke / Y.c.ρ
    # TODO: We should think more about these increments before we use them.
    end
  • Is this code correct and relevant for all the configurations?
    if eltype(ᶜcoord) <: Geometry.LatLongZPoint
        # Spherical geometry: latitude-dependent Coriolis parameter.
        Ω = CAP.Omega(params)
        ᶜf = @. 2 * Ω * sind(ᶜcoord.lat)
    else
        # Planar geometry: constant f-plane Coriolis frequency.
        f = CAP.f_plane_coriolis_frequency(params)
        ᶜf = map(_ -> f, ᶜcoord)
    end
    ᶜf = @. CT3(Geometry.WVector(ᶜf))
  • Can we remove this hack?
    # TODO: Remove this hack
    sfc_temp_C3 = Fields.Field(C3{FT}, Spaces.level(face_space, half)), # ρ_flux_χ
  • Is there a more explicit way to achieve what this code is doing? (This pattern is common throughout the code, and I think it is just checking the energy form; see the sketch after this list.)
    if :ρθ in propertynames(Y.c)
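On that last question, a possible more explicit spelling, assuming the energy forms mentioned above (TotalEnergy and the ρθ-based one, here called PotentialTemperature) and a hypothetical helper name:

    # Hypothetical helper: state the intent instead of probing Y.c's fields.
    uses_potential_temperature(atmos) = atmos.energy_form isa PotentialTemperature

    if uses_potential_temperature(atmos)
        # ρθ-specific branch
    end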

Cleaning up and organizing the cache more neatly while still using NamedTuples led to a 5-10 % overall improvement in latency (which compounds with the ClimaCore fix for mutable spaces). Moving the cache from a NamedTuple to a struct of NamedTuples bought an additional 5-10 % improvement in latency. As a result, merging this PR reduces the compile time for jobs in the CI by up to 9 minutes.

As an experiment, I turned one of the NamedTuples inside the cache into a struct (the core one). That led to a percent-level improvement in latency: not enough to stand out above the experimental noise, but enough to give another hint that our NamedTuples are not the best for compilation.
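As a sketch of that experiment, the core NamedTuple (with the fields listed under the open questions above) could become a concrete struct along these lines, where the type parameters are placeholders that keep each field concretely typed:

    struct CoreCache{PHI, GRAD, RHO, P, T, F, K}
        ᶜΦ::PHI
        ᶠgradᵥ_ᶜΦ::GRAD
        ᶜρ_ref::RHO
        ᶜp_ref::P
        ᶜT::T
        ᶜf::F
        ∂ᶜK∂ᶠu₃_data::K
    end

Access at call sites stays the same (e.g. p.core.ᶜT), so only the construction site changes.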

[1] There's some lost Julia knowledge about (named) tuples with many elements being slow. We should probably just start avoiding large named tuples.
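A quick, illustrative way to probe the effect this footnote alludes to (all names invented for the example):

    # Build a wide NamedTuple and compare the first (compiling) call with the second.
    wide = NamedTuple(Symbol(:x, i) => float(i) for i in 1:100)
    g(nt) = nt.x1 + nt.x100
    @time g(wide)   # first call: dominated by inference/compilation
    @time g(wide)   # second call: runtime only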

Closes #2217

CC: @charleskawczynski @LenkaNovak @szy21

Todo:

  • Two EDMFX tests are timing out. Figure out what's happening

@Sbozzolo Sbozzolo (Member, Author) commented:

I am a little stumped about what to do with the tests that time out. I cannot reproduce the problem on my laptop or on the cluster, so I am extrapolating some guesses. I have possibly narrowed the problem down to an inference issue in the EDMFX precomputed variables: the tests time out because Julia cannot even finish compiling the code.

set_edmf_precomputed_quantities! is indeed a big and complex function, but I don't understand what change led to this explosion in inference time on the cluster.
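One way to chase an inference blow-up like this is SnoopCompile's deep inference snooping; a sketch, assuming Y, p, and t are in scope and that this is the right argument list for the function:

    using SnoopCompileCore
    tinf = @snoopi_deep set_edmf_precomputed_quantities!(Y, p, t)

    using SnoopCompile  # the analysis tools live in the full package
    inference_triggers(tinf)  # call sites where runtime dispatch forced fresh inference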

@szy21 szy21 (Member) commented Oct 18, 2023:

All EDMF cases call the same set_edmf_precomputed_quantities!, so it is also confusing that only two of them time out.

@Sbozzolo Sbozzolo (Member, Author) commented:

> All EDMF cases call the same set_edmf_precomputed_quantities!, so it is also confusing that only two of them time out.

On closer inspection, all the EDMF cases are affected (with compile- and run-time slowdowns).

@Sbozzolo Sbozzolo force-pushed the gb/clean_cache branch 2 times, most recently from c92f859 to f7b4d6a on October 18, 2023, 22:12
@charleskawczynski charleskawczynski (Member) left a comment:

Can we break this PR down into smaller, easily mergeable parts? My suggestion for some good peel-off PRs would be:

  • Remove comms_ctx from p, and get it from the space instead (see the sketch after this list).
  • Rename param_set to params (this will be pretty big on its own). Also, I think we use the name params for some other data structure elsewhere, so we'll need to look out for name collisions and decide how to resolve them.
  • Remove the do_dss bool and compute it from the space.
  • Struct-based changes. I might even suggest breaking this PR down into smaller parts.
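A sketch of the first suggestion, assuming ClimaComms.context can be queried from a ClimaCore space (directly or via its topology); the exact accessor should be checked against the actual API:

    using ClimaCore, ClimaComms
    space = axes(Y.c)                      # the space underlying the cell-centered state
    comms_ctx = ClimaComms.context(space)  # instead of carrying comms_ctx in p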

@Sbozzolo Sbozzolo (Member, Author) commented Oct 19, 2023:

> Rename param_set to params (this will be pretty big on its own). Also, I think we use the name params for some other data structure elsewhere, so we'll need to look out for name collisions and decide how to resolve them.

param_set was set to be exactly what we call params elsewhere (if I follow the chain correctly), so I renamed it to be more consistent across the codebase.

> Can we break this PR down into smaller, easily mergeable parts?

Most of the commits should work as they are, and only a couple of them implement more than one change, so it should be easy to peel off what we want. However, I first want to finalize the larger change (fixing the compilation problems).

bors bot added a commit that referenced this pull request on Oct 23, 2023:

2259: Clean up: remove underused quantities r=Sbozzolo a=Sbozzolo

Peel off #2244.

Co-authored-by: Gabriele Bozzola <[email protected]>
@Sbozzolo Sbozzolo force-pushed the gb/clean_cache branch 2 times, most recently from 2a3bee7 to e7fdca9 on October 27, 2023, 22:37
@Sbozzolo Sbozzolo (Member, Author) commented:

Closed in favor of #2287

@Sbozzolo Sbozzolo closed this Oct 30, 2023
Linked issue: Replace cache namedtuple with explicit struct (#2217)