add coupler summary table #706

Merged 3 commits on May 11, 2024
47 changes: 47 additions & 0 deletions .buildkite/benchmarks/README.md
@@ -0,0 +1,47 @@
## ClimaCoupler Benchmarks Pipeline

### Purpose
The benchmarks pipeline provides concrete comparisons between analogous
simulations with different setups and on different architectures.
This lets us compare metrics such as performance and allocations between
atmosphere-only and coupled runs, and between CPU and GPU.

This pipeline is triggered manually rather than on a schedule, so that we
can monitor these metrics after specific code changes.

### Simulation Setups
#### All simulations
- Timestep: 120 seconds
- Horizontal resolution: 30 spectral elements (~110 km)
- Vertical resolution: 63 levels
- Config setup duplicated from ClimaAtmos.jl v0.23.0
[gpu_aquaplanet_diagedmf.yml](https://github.com/CliMA/ClimaAtmos.jl/blob/v0.23.0/config/gpu_configs/gpu_aquaplanet_diagedmf.yml),
with minor tweaks

#### CPU ClimaAtmos with diagnostic EDMF
- Atmosphere-only simulation
- Run on 64 CPU threads

#### CPU AMIP with diagnostic EDMF
- ClimaAtmos coupled to ClimaLand bucket model, with prescribed sea surface
temperature and sea ice
- Run on 64 CPU threads

#### GPU ClimaAtmos with diagnostic EDMF
- Atmosphere-only simulation
- Run on 4 A100 GPUs sharing 1 node

#### GPU AMIP with diagnostic EDMF
- ClimaAtmos coupled to ClimaLand bucket model, with prescribed sea surface
temperature and sea ice
- Run on 4 A100 GPUs sharing 1 node

### Comparison Metrics
- Simulated years per day (SYPD): The number of years of simulation time we
can run in 1 day of walltime.
- CPU simulation object allocations: The allocations, in GB, of the simulation
object, which contains everything needed to run the simulation.
In the atmosphere-only case, this is the `AtmosSimulation` object.
In the coupled case, this is the `CoupledSimulation` object, which includes
all of the component models, coupler fields, and auxiliary objects. More
information on this object can be found in the `Interfacer` docs.
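
As a hedged illustration of the SYPD metric defined above, the sketch below converts a simulated duration and a measured walltime into simulated years per day. The function name `sypd`, the 365-day year, and the walltime used in the usage note are assumptions for illustration; none of them is taken from the benchmark scripts.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: a 365-day model year


def sypd(simulated_seconds, walltime_seconds):
    """Simulated years per day of walltime (hypothetical helper)."""
    if walltime_seconds <= 0:
        raise ValueError("walltime_seconds must be positive")
    simulated_years = simulated_seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)
    walltime_days = walltime_seconds / SECONDS_PER_DAY
    return simulated_years / walltime_days
```

For example, a 12-hour simulation that finished in a hypothetical 600 seconds of walltime would give `sypd(12 * 3600, 600)`, which is 72/365, or about 0.197 SYPD.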
108 changes: 108 additions & 0 deletions .buildkite/benchmarks/pipeline.yml
@@ -0,0 +1,108 @@
agents:
  queue: clima
  slurm_time: 24:00:00
  modules: common

env:
  JULIA_NVTX_CALLBACKS: gc
  OPENBLAS_NUM_THREADS: 1
  OMPI_MCA_opal_warn_on_missing_libcuda: 0
  SLURM_KILL_BAD_EXIT: 1
  SLURM_GRES_FLAGS: "allow-task-sharing"
  BENCHMARK_CONFIG_PATH: "config/benchmark_configs"

steps:
  - label: "init :GPU:"
    key: "init_gpu_env"
    command:
      - echo "--- Instantiate experiments/AMIP"
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.instantiate(;verbose=true)'
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.precompile()'
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.status()'

      - echo "--- Instantiate test env"
      - "julia --project=test/ -e 'using Pkg; Pkg.develop(path=\".\")'"
      - "julia --project=test/ -e 'using Pkg; Pkg.instantiate(;verbose=true)'"
      - "julia --project=test/ -e 'using Pkg; Pkg.precompile()'"
      - "julia --project=test/ -e 'using Pkg; Pkg.status()'"

      - echo "--- Download artifacts"
      - "julia --project=artifacts -e 'using Pkg; Pkg.instantiate(;verbose=true)'"
      - "julia --project=artifacts -e 'using Pkg; Pkg.precompile()'"
      - "julia --project=artifacts -e 'using Pkg; Pkg.status()'"
      - "julia --project=artifacts artifacts/download_artifacts.jl"

    agents:
      slurm_gpus: 1
      slurm_cpus_per_task: 8
    env:
      JULIA_NUM_PRECOMPILE_TASKS: 8
      JULIA_MAX_NUM_PRECOMPILE_FILES: 50

  - wait

  - group: "CPU benchmarks"
    steps:
      - label: "CPU ClimaAtmos with diagnostic EDMF"
        key: "climaatmos_diagedmf"
        command: "srun julia --color=yes --project=test/ test/component_model_tests/climaatmos_standalone/atmos_driver.jl --config_file $BENCHMARK_CONFIG_PATH/climaatmos_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/climaatmos/climaatmos_diagedmf_artifacts/*"
        env:
          BUILD_HISTORY_HANDLE: ""
          CLIMACOMMS_DEVICE: "CPU"
        agents:
          slurm_ntasks_per_node: 64
          slurm_nodes: 1
          slurm_mem_per_cpu: 4GB

      - label: "CPU AMIP with diagnostic EDMF"
        key: "amip_diagedmf"
        command: "srun julia --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $BENCHMARK_CONFIG_PATH/amip_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/amip/amip_diagedmf_artifacts/*"
        env:
          BUILD_HISTORY_HANDLE: ""
          CLIMACOMMS_DEVICE: "CPU"
        agents:
          slurm_ntasks_per_node: 64
          slurm_nodes: 1
          slurm_mem_per_cpu: 4GB

  - group: "GPU benchmarks"
    steps:
      - label: "GPU ClimaAtmos with diagnostic EDMF"
        key: "gpu_climaatmos_diagedmf"
        command: "srun julia --threads=3 --color=yes --project=test/ test/component_model_tests/climaatmos_standalone/atmos_driver.jl --config_file $BENCHMARK_CONFIG_PATH/gpu_climaatmos_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/climaatmos/gpu_climaatmos_diagedmf_artifacts/*"
        agents:
          slurm_gpus_per_task: 1
          slurm_cpus_per_task: 4
          slurm_ntasks: 4
          slurm_mem: 16GB

      - label: "GPU AMIP with diagnostic EDMF"
        key: "gpu_amip_diagedmf"
        command: "srun julia --threads=3 --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $BENCHMARK_CONFIG_PATH/gpu_amip_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/amip/gpu_amip_diagedmf_artifacts/*"
        agents:
          slurm_gpus_per_task: 1
          slurm_cpus_per_task: 4
          slurm_ntasks: 4
          slurm_mem: 16GB

  - group: "Generate output table"
    steps:
      - label: "Compare AMIP/Atmos-only with diagnostic EDMF"
        key: "compare_amip_climaatmos_amip_diagedmf"
        command: "julia --color=yes --project=experiments/AMIP/ experiments/AMIP/user_io/benchmarks.jl --cpu_run_name_coupled amip_diagedmf --cpu_run_name_atmos climaatmos_diagedmf --gpu_run_name_coupled gpu_amip_diagedmf --gpu_run_name_atmos gpu_climaatmos_diagedmf --mode_name amip --build_id $BUILDKITE_BUILD_NUMBER"
        artifact_paths: "experiments/AMIP/output/compare_amip_climaatmos_amip_diagedmf/*"
        depends_on:
          - "climaatmos_diagedmf"
          - "amip_diagedmf"
          - "gpu_climaatmos_diagedmf"
          - "gpu_amip_diagedmf"

      - label: ":envelope: Slack report: CPU/GPU AMIP/Atmos-only table"
        depends_on:
          - "compare_amip_climaatmos_amip_diagedmf"
        command:
          - slack-upload -c "#coupler-report" -f experiments/AMIP/output/compare_amip_climaatmos_amip_diagedmf/table.txt -m txt -n compare_amip_climaatmos_amip_diagedmf_table -x "Coupler CPU/GPU Comparison Table"
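
The table step only runs after the four benchmark steps it names in `depends_on`, so a typo in any key would leave it waiting on a step that does not exist. The sketch below is one way to sanity-check that wiring offline, and to confirm the GPU request adds up to the 4 GPUs described in the README. The step keys and Slurm values are transcribed from the pipeline above; the helper names `undefined_dependencies` and `total_gpus` are hypothetical and not part of Buildkite or this repository.

```python
# Step keys and depends_on lists transcribed from the benchmarks pipeline.
STEPS = [
    {"key": "init_gpu_env"},
    {"key": "climaatmos_diagedmf"},
    {"key": "amip_diagedmf"},
    {"key": "gpu_climaatmos_diagedmf"},
    {"key": "gpu_amip_diagedmf"},
    {
        "key": "compare_amip_climaatmos_amip_diagedmf",
        "depends_on": [
            "climaatmos_diagedmf",
            "amip_diagedmf",
            "gpu_climaatmos_diagedmf",
            "gpu_amip_diagedmf",
        ],
    },
    # The Slack report step has a label but no key.
    {"depends_on": ["compare_amip_climaatmos_amip_diagedmf"]},
]


def undefined_dependencies(steps):
    """Return the sorted depends_on entries that match no step key."""
    defined = {step["key"] for step in steps if "key" in step}
    referenced = {dep for step in steps for dep in step.get("depends_on", [])}
    return sorted(referenced - defined)


def total_gpus(slurm_ntasks, slurm_gpus_per_task):
    """GPUs requested by a step: one allocation per Slurm task."""
    return slurm_ntasks * slurm_gpus_per_task
```

With the values above, `undefined_dependencies(STEPS)` returns an empty list, and `total_gpus(4, 1)` gives 4 for each GPU benchmark step.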
7 changes: 3 additions & 4 deletions .buildkite/longruns/pipeline.yml
@@ -11,7 +11,6 @@ env:
JULIA_MAX_NUM_PRECOMPILE_FILES: 100
GKSwstype: 100
SLURM_KILL_BAD_EXIT: 1

CONFIG_PATH: "config/longrun_configs"

timeout_in_minutes: 1440
@@ -300,11 +299,11 @@ steps:

  # DYAMOND AMIP: 1 day (convection resolving)
  - label: "GPU AMIP SUPERFINE: dyamond_target"
- key: "gpu_dyamond_target"
+ key: "gpu_longrun_amip_dyamond"
  command:
  - echo "--- Run simulation"
- - "julia --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $CONFIG_PATH/gpu_dyamond_target.yml"
- artifact_paths: "experiments/AMIP/output/amip/gpu_dyamond_target_artifacts/*"
+ - "julia --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $CONFIG_PATH/gpu_longrun_amip_dyamond.yml"
+ artifact_paths: "experiments/AMIP/output/amip/gpu_longrun_amip_dyamond_artifacts/*"
  agents:
  queue: clima
  slurm_mem: 20GB
5 changes: 2 additions & 3 deletions .buildkite/pipeline.yml
@@ -13,9 +13,8 @@ env:
  GKSwstype: 100
  SLURM_KILL_BAD_EXIT: 1

- CONFIG_PATH: "config/model_configs"
+ CONFIG_PATH: "config/ci_configs"
  PERF_CONFIG_PATH: "config/perf_configs"
- MPI_CONFIG_PATH: "config/mpi_configs"

  timeout_in_minutes: 240

@@ -81,7 +80,7 @@ steps:
  steps:
  - label: "MPI Regridder unit tests"
  key: "regridder_mpi_tests"
- command: "srun julia --color=yes --project=test/ test/mpi_tests/regridder_mpi_tests.jl --config_file $MPI_CONFIG_PATH/regridder_mpi.yml"
+ command: "srun julia --color=yes --project=test/ test/mpi_tests/regridder_mpi_tests.jl --config_file $CONFIG_PATH/regridder_mpi.yml"
  timeout_in_minutes: 20
  env:
  CLIMACORE_DISTRIBUTED: "MPI"
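
To make the effect of this change concrete, the sketch below expands the `--config_file` argument against the environment block as it stands after the edit. It uses Python's `string.Template` as a stand-in for shell variable expansion, which is an assumption made for illustration: a real shell would silently expand an undefined variable to an empty string, whereas `Template.substitute` raises `KeyError`, which makes a leftover `$MPI_CONFIG_PATH` reference easy to spot. The helper name `expand` is hypothetical.

```python
from string import Template

# Environment after this change: MPI_CONFIG_PATH is no longer defined.
ENV = {
    "CONFIG_PATH": "config/ci_configs",
    "PERF_CONFIG_PATH": "config/perf_configs",
}


def expand(argument, env=ENV):
    """Expand $VAR references; raises KeyError for an undefined variable."""
    return Template(argument).substitute(env)
```

Here `expand("$CONFIG_PATH/regridder_mpi.yml")` yields `config/ci_configs/regridder_mpi.yml`, while the old `$MPI_CONFIG_PATH/regridder_mpi.yml` form raises `KeyError`.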
2 changes: 2 additions & 0 deletions Project.toml
@@ -4,6 +4,7 @@ authors = ["CliMA Contributors <[email protected]>"]
  version = "0.0.1"

  [deps]
+ CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
  ClimaComms = "3a4d1b5c-c61d-41fd-a00a-5873ba7a1b0d"
  ClimaCore = "d414da3d-4745-48bb-8d80-42e94e092884"
  ClimaCoreTempestRemap = "d934ef94-cdd4-4710-83d6-720549644b70"
@@ -23,6 +24,7 @@ Thermodynamics = "b60c26fb-14c3-4610-9d3e-2d17fe7ff00c"
  ClimaComms = "0.5.6"
  ClimaCore = "0.13"
  ClimaCoreTempestRemap = "0.3"
+ CUDA = "5"
  Dates = "1"
  DocStringExtensions = "0.8, 0.9"
  JLD2 = "0.4"
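
Plain compat entries such as `CUDA = "5"` follow Julia's caret semantics: the leftmost nonzero component is held fixed, so `"5"` admits any 5.x.y release but not 6.0.0. The sketch below models that rule for the simple specifiers seen in this file (bare versions and comma-separated lists). It is an illustrative approximation, not Pkg's resolver, and it does not handle tilde, equality, or hyphen-range specifiers. The function names are hypothetical.

```python
def parse_version(text):
    """Turn "5.3" into (5, 3, 0), padding missing components with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def caret_range(spec):
    """Return (lower, upper) with lower inclusive and upper exclusive."""
    parts = [int(p) for p in spec.strip().lstrip("^").split(".")]
    lower = tuple(parts + [0] * (3 - len(parts)))
    # Bump the first nonzero component, or the last one given if all are zero.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = [0, 0, 0]  # every component before `bump` is zero by construction
    upper[bump] = parts[bump] + 1
    return lower, tuple(upper)


def satisfies(version, compat):
    """True if `version` falls inside any comma-separated caret range."""
    v = parse_version(version)
    ranges = (caret_range(spec) for spec in compat.split(","))
    return any(lower <= v < upper for lower, upper in ranges)
```

Under this model, `satisfies("5.3.0", "5")` holds and `satisfies("6.0.0", "5")` does not, while `"0.8, 0.9"` accepts both the 0.8.x and 0.9.x series.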
18 changes: 18 additions & 0 deletions config/benchmark_configs/amip_diagedmf.yml
@@ -0,0 +1,18 @@
FLOAT_TYPE: "Float32"
anim: false
atmos_config_file: "config/benchmark_configs/climaatmos_diagedmf.yml"
atmos_config_repo: "ClimaCoupler"
dt_cpl: 120
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
job_id: "amip_diagedmf"
land_albedo_type: "map_temporal"
mode_name: "amip"
mono_surface: false
monthly_checkpoint: false
run_name: "amip_diagedmf"
start_date: "19790301"
t_end: "12hours"
turb_flux_partition: "CombinedStateFluxes"
use_coupler_diagnostics: false
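
The `atmos_config_file` and `atmos_config_repo` keys together identify which ClimaAtmos configuration the coupled run uses. A minimal sketch of one plausible resolution rule follows: choose a root directory per repository name and join the relative file path onto it. The helper name `resolve_atmos_config` and the root directories are hypothetical; the actual lookup in the coupler driver is not shown in this diff.

```python
from pathlib import PurePosixPath

# Hypothetical checkout locations, used only for illustration.
REPO_ROOTS = {
    "ClimaAtmos": PurePosixPath("/repos/ClimaAtmos.jl"),
    "ClimaCoupler": PurePosixPath("/repos/ClimaCoupler.jl"),
}


def resolve_atmos_config(config_file, config_repo, repo_roots=REPO_ROOTS):
    """Join a repo-relative config path onto the chosen repository root."""
    if config_repo not in repo_roots:
        known = ", ".join(sorted(repo_roots))
        raise ValueError(f"unknown atmos_config_repo {config_repo!r}; expected one of: {known}")
    return repo_roots[config_repo] / config_file
```

With the values from this config, `resolve_atmos_config("config/benchmark_configs/climaatmos_diagedmf.yml", "ClimaCoupler")` points at the copy of the atmosphere config kept in the ClimaCoupler repository rather than one shipped with ClimaAtmos.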
30 changes: 30 additions & 0 deletions config/benchmark_configs/climaatmos_diagedmf.yml
@@ -0,0 +1,30 @@
FLOAT_TYPE: "Float32"
approximate_linear_solve_iters: 2
dt: 120secs
dt_cloud_fraction: 1hours
dt_rad: 1hours
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
dz_bottom: 30.0
dz_top: 3000.0
edmfx_detr_model: "Generalized"
edmfx_entr_model: "Generalized"
edmfx_nh_pressure: true
edmfx_sgs_diffusive_flux: true
edmfx_sgs_mass_flux: true
edmfx_upwinding: first_order
h_elem: 30
idealized_insolation: false
implicit_diffusion: true
job_id: "climaatmos_diagedmf"
moist: equil
output_default_diagnostics: false
precip_model: 0M
prognostic_tke: true
rad: allskywithclear
surface_setup: DefaultMoninObukhov
t_end: 12hours
toml: [toml/diagnostic_edmfx_box.toml]
turbconv: diagnostic_edmfx
z_elem: 63
z_max: 55000.0
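
The time settings in this config are strings with a unit suffix (`120secs`, `1hours`, `12hours`). As a rough check on how much work the benchmark does, the sketch below converts such strings to seconds and counts timesteps. The parser is a hypothetical stand-in that covers only the suffixes visible in this PR (`secs`, `hours`, `days`); it is not ClimaAtmos's own time parser.

```python
import re

UNIT_SECONDS = {"secs": 1, "hours": 3600, "days": 86400}
_TIME_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(secs|hours|days)$")


def to_seconds(text):
    """Convert a string such as "12hours" to seconds (hypothetical parser)."""
    match = _TIME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time string: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]


def step_count(duration, dt):
    """Number of steps of length `dt` that fit in `duration`."""
    return round(to_seconds(duration) / to_seconds(dt))
```

With `t_end: 12hours` and `dt: 120secs`, `step_count("12hours", "120secs")` gives 360 atmosphere steps, and `step_count("1hours", "120secs")` gives 30 steps between radiation updates (`dt_rad`).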
18 changes: 18 additions & 0 deletions config/benchmark_configs/gpu_amip_diagedmf.yml
@@ -0,0 +1,18 @@
FLOAT_TYPE: "Float32"
anim: false
atmos_config_file: "config/benchmark_configs/gpu_climaatmos_diagedmf.yml"
atmos_config_repo: "ClimaCoupler"
dt_cpl: 120
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
job_id: "gpu_amip_diagedmf"
land_albedo_type: "map_temporal"
mode_name: "amip"
mono_surface: false
monthly_checkpoint: false
run_name: "gpu_amip_diagedmf"
start_date: "19790301"
t_end: "12hours"
turb_flux_partition: "CombinedStateFluxes"
use_coupler_diagnostics: false
30 changes: 30 additions & 0 deletions config/benchmark_configs/gpu_climaatmos_diagedmf.yml
@@ -0,0 +1,30 @@
FLOAT_TYPE: "Float32"
approximate_linear_solve_iters: 2
dt: 120secs
dt_cloud_fraction: 1hours
dt_rad: 1hours
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
dz_bottom: 30.0
dz_top: 3000.0
edmfx_detr_model: "Generalized"
edmfx_entr_model: "Generalized"
edmfx_nh_pressure: true
edmfx_sgs_diffusive_flux: true
edmfx_sgs_mass_flux: true
edmfx_upwinding: first_order
h_elem: 30
idealized_insolation: false
implicit_diffusion: true
job_id: "gpu_climaatmos_diagedmf"
moist: equil
output_default_diagnostics: false
precip_model: 0M
prognostic_tke: true
rad: allskywithclear
surface_setup: DefaultMoninObukhov
t_end: 12hours
toml: [toml/diagnostic_edmfx_box.toml]
turbconv: diagnostic_edmfx
z_elem: 63
z_max: 55000.0
Original file line number Diff line number Diff line change
@@ -4,12 +4,12 @@ dt_cpl: 50
  dt_save_state_to_disk: "0.5days"
  dt_save_to_sol: "0.5days"
  energy_check: false
- job_id: "gpu_dyamond_target"
+ job_id: "gpu_longrun_amip_dyamond"
  land_albedo_type: "map_temporal"
  mode_name: "amip"
  mono_surface: false
  monthly_checkpoint: false
- run_name: "gpu_dyamond_target"
+ run_name: "gpu_longrun_amip_dyamond"
  start_date: "19790301"
  t_end: "1days"
  turb_flux_partition: "CombinedStateFluxes"
4 changes: 2 additions & 2 deletions experiments/AMIP/Manifest.toml
@@ -2,7 +2,7 @@

  julia_version = "1.10.2"
  manifest_format = "2.0"
- project_hash = "c00c8204c76db2774e82408096e51d91be9ef6bf"
+ project_hash = "36cae8e3da41534867db0a0941600724a7517b72"

  [[deps.ADTypes]]
  git-tree-sha1 = "016833eb52ba2d6bea9fcb50ca295980e728ee24"
@@ -401,7 +401,7 @@ uuid = "d934ef94-cdd4-4710-83d6-720549644b70"
  version = "0.3.14"

  [[deps.ClimaCoupler]]
- deps = ["ClimaAtmos", "ClimaComms", "ClimaCore", "ClimaCoreTempestRemap", "ClimaLand", "ClimaParams", "Dates", "DocStringExtensions", "Insolation", "JLD2", "NCDatasets", "Plots", "SciMLBase", "StaticArrays", "Statistics", "SurfaceFluxes", "TempestRemap_jll", "Thermodynamics"]
+ deps = ["CUDA", "ClimaComms", "ClimaCore", "ClimaCoreTempestRemap", "Dates", "DocStringExtensions", "JLD2", "NCDatasets", "Plots", "SciMLBase", "StaticArrays", "Statistics", "SurfaceFluxes", "TempestRemap_jll", "Thermodynamics"]
  path = "../.."
  uuid = "4ade58fe-a8da-486c-bd89-46df092ec0c7"
  version = "0.0.1"
8 changes: 8 additions & 0 deletions experiments/AMIP/cli_options.jl
@@ -78,6 +78,10 @@ function argparse_settings()
  help = "Device type to use [`auto` (default) `CPUSingleThreaded`, `CPUMultiThreaded`, `CUDADevice`]"
  arg_type = String
  default = "auto"
+ "--use_coupler_diagnostics"
+ help = "Boolean flag indicating whether to compute and output coupler diagnostics [`true` (default), `false`]"
+ arg_type = Bool
+ default = true
  # ClimaAtmos specific
  "--surface_setup"
  help = "Triggers ClimaAtmos into the coupled mode [`PrescribedSurface` (default)]" # retained here for standalone Atmos benchmarks
@@ -89,6 +93,10 @@ function argparse_settings()
  help = "Type of albedo model. [`ConstantAlbedo` (default), `RegressionFunctionAlbedo`, `CouplerAlbedo`]"
  arg_type = String
  default = "CouplerAlbedo"
+ "--atmos_config_repo"
+ help = "The repository containing the ClimaAtmos configuration file to use [`ClimaAtmos` (default), `ClimaCoupler`]"
+ arg_type = String
+ default = "ClimaAtmos"
  # ClimaLand specific
  "--land_albedo_type"
  help = "Access land surface albedo information from data file. [`function`, `map_static`, `map_temporal`]"
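
For readers less familiar with Julia's ArgParse table syntax, the two options added in this file can be mirrored with Python's standard `argparse` module. This is an analogue written for illustration, not the project's code: the option names, defaults, and allowed values come from the diff above, while `parse_bool` and `build_parser` are hypothetical helpers. An explicit boolean parser is used because `type=bool` in Python would treat the string `"false"` as true.

```python
import argparse


def parse_bool(text):
    """Parse "true"/"false" (case-insensitive) into a bool."""
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Analogue of the two new coupler options")
    parser.add_argument(
        "--use_coupler_diagnostics",
        type=parse_bool,
        default=True,
        help="Whether to compute and output coupler diagnostics",
    )
    parser.add_argument(
        "--atmos_config_repo",
        choices=("ClimaAtmos", "ClimaCoupler"),
        default="ClimaAtmos",
        help="Repository containing the ClimaAtmos configuration file",
    )
    return parser
```

Parsing an empty argument list yields the defaults (`True` and `"ClimaAtmos"`); the benchmark configs in this PR correspond to `--use_coupler_diagnostics false --atmos_config_repo ClimaCoupler`.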