CAM-ML: A Fork of The Community Atmosphere Model Implementing a Machine Learning Convection Parameterisation
This fork of the CAM model implements a new convection parameterisation, YOG. The parameterisation is a neural network trained on high-resolution cloud-resolving simulations in the SAM model, as described in:
- Yuval, J., O'Gorman, P.A. Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions. Nat Commun 11, 3295 (2020). DOI: 10.1038/s41467-020-17142-3
- Yuval, J., O'Gorman, P.A., Hill, C.N. Use of Neural Networks for Stable, Accurate and Physically Consistent Parameterization of Subgrid Atmospheric Processes With Good Performance at Reduced Precision. Geophysical Research Letters, 48, e2020GL091363 (2021). DOI: 10.1029/2020GL091363
The work is contained in the CAM-ML branch, which is based on the cam_cesm2_1_rel_60 tag.
Note that the doc folder and README_EXTERNALS are not updated; they are artifacts from the fork that we are keeping so that we can merge back to CESM later.
Information specific to running this fork of CAM within CESM is included in this README.
This section describes how to use CAM-ML as the atmospheric component in a normal CESM run. In its current build pipeline, the code assumes that it is run on NCAR's supercomputer Derecho.
Clone a copy of CESM from git and check out the cesm2.1.5 tag on which this work is based:
git clone https://github.com/escomp/cesm.git my_cesm_sandbox_2_1
cd my_cesm_sandbox_2_1/
git checkout cesm2.1.5
To use this model in a CESM run you need to modify the Externals.cfg file in the main CESM directory to replace the CAM entry with:
[cam]
branch = CAM-ML
protocol = git
repo_url = https://github.com/m2lines/CAM-ML
local_path = components/cam
externals = Externals_CAM.cfg
required = True
This will pull the CAM-ML branch of this repo in as the CAM component.
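Before fetching the externals it can help to confirm that the edit landed in the right section. The following is a minimal sketch, assuming Externals.cfg uses plain `key = value` lines under `[section]` headers as in the entry above; `cam_external_value` is a hypothetical helper and is not part of manage_externals.

```shell
# Hypothetical helper: print the value of a key from the [cam] section
# of an Externals.cfg-style file. Not part of manage_externals.
cam_external_value() {
    cfg_file=$1
    key=$2
    awk -v key="$key" '
        # A line starting with "[" opens a new section; remember whether it is [cam].
        /^\[/ { in_cam = ($0 == "[cam]"); next }
        in_cam {
            split($0, kv, "=")
            k = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim whitespace around the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim whitespace around the value
                print v
                exit
            }
        }
    ' "$cfg_file"
}

# Example (run from the CESM root directory):
#   cam_external_value Externals.cfg branch
#   cam_external_value Externals.cfg repo_url
```

If the first call prints CAM-ML and the second prints the m2lines URL, the entry is in place.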
You can now run, from within the CESM root directory,
./manage_externals/checkout_externals
to fetch the external components.
Note: If you want to change the externals, or have made a mistake in this step, you have to delete the newly created components folder (or the respective subfolders therein) in the base directory before you rerun checkout_externals.
Details on creating a case can be found here on the NCAR website. For this work we are using the gate III testcase which can be set up by running:
./create_newcase --case <path_to_testcase_directory> --compset FSCAM --res T42_T42 --user-mods-dir ../../components/cam/cime_config/usermods_dirs/scam_gateIII --project NCGD0054
from <cesm_root>/cime/scripts/.
The <testcase_directory> should be a separate directory outside of the code directory, to avoid cluttering up the local repository.
Run ./case.setup for the case setup and creation of namelist files. Optionally, you can also run ./check_case to check everything is ok. Once this has been done, edit user_nl_cam for the case as detailed below. This is a CAM namelist generated from the default for the case.
Add the following lines:
- deep_scheme = 'off'
- yog_scheme = 'on'
  If running a comparison to the ZM scheme, also add run_deep_comp = 'on'.
- yog_nn_weights = '<PATH/TO/WEIGHTS.nc>'
  The path to the NN weights. There are some weights in src/physics/cam/ of this repository (CAM-ML) which can be used, or they can be generated from the standalone model.
- SAM_sounding = '<PATH/TO/SAM/SOUNDING.nc>'
  The path to the SAM sounding for the NN. This file is generated using the sounding_to_netcdf.py script in the resources of the standalone NN code, and it can simply be copied to a suitable place.
All the paths have to be absolute, as they will be used when the code is run on the compute nodes.
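Because a relative path only fails later on the compute nodes, it may be worth checking the paths when the namelist is edited. The sketch below appends the settings listed above to a namelist file and refuses relative paths. `append_yog_namelist` is a hypothetical helper, not something shipped with CAM-ML or CIME, and it does not add the optional ZM comparison setting.

```shell
# Hypothetical helper: append the YOG settings to a CAM user namelist,
# rejecting any path that is not absolute.
append_yog_namelist() {
    nl_file=$1
    weights=$2
    sounding=$3
    for p in "$weights" "$sounding"; do
        case "$p" in
            /*) ;;  # starts with "/", so it is an absolute path
            *)  echo "error: '$p' is not an absolute path" >&2
                return 1 ;;
        esac
    done
    # Nothing is written unless both paths passed the check above.
    cat >> "$nl_file" <<EOF
deep_scheme = 'off'
yog_scheme = 'on'
yog_nn_weights = '$weights'
SAM_sounding = '$sounding'
EOF
}

# Example (run from the case directory, with your own absolute paths):
#   append_yog_namelist user_nl_cam '<PATH/TO/WEIGHTS.nc>' '<PATH/TO/SAM/SOUNDING.nc>'
```

The helper only checks the form of the paths; it does not verify that the files exist or are readable from the compute nodes.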
We can then run ./case.build to build the test case, and ./case.submit to submit the job to the scheduler.
Note: By default, CESM will place outputs and logs/restart files on Derecho in /glade/derecho/scratch/<user>/<case>/, and then move them over to /glade/derecho/scratch/<user>/archive/<case>/.
To place all output with logs in archive/<case>, switch 'short term archiving' on by editing env_run.xml in the case directory to change DOUT_S from FALSE to TRUE. You can do this quickly by running ./xmlchange DOUT_S=TRUE.
In fact, you will find output files in all three places: /glade/derecho/scratch/<user>/<case>/, /glade/derecho/scratch/<user>/archive/<case>/ (unless the run has failed and was not moved to archive), and <testcase_directory>.
The run will create a bld and a run directory in the output case folder on /glade/derecho/scratch/<user>/. The bld directory contains build files, logs and executables. The run directory contains the run logs, namelists and output NetCDF files.
A successful model run will have the line
******* END OF MODEL RUN *******
at the bottom of the run log file for the atmosphere run, i.e., atm.log.<stuff>.gz.
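The check for the end-of-run marker can be scripted. This is a minimal sketch, assuming the atmosphere log is gzip-compressed as the file name suggests; `run_succeeded` is a hypothetical helper name, and it searches the whole log rather than only the last line.

```shell
# Hypothetical helper: succeed (exit status 0) if a gzipped atmosphere log
# contains the end-of-run marker, fail (non-zero status) otherwise.
run_succeeded() {
    log_file=$1
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero-match result from being treated as a hard error.
    count=$(gzip -dc "$log_file" | grep -c "END OF MODEL RUN" || true)
    [ "${count:-0}" -gt 0 ]
}

# Example (substitute the actual log file name):
#   run_succeeded '/glade/derecho/scratch/<user>/<case>/run/atm.log.<stuff>.gz' \
#       && echo "run completed" || echo "run did not complete"
```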
A model run will also generate timing files in a subdirectory of the caseroot directory (<testcase_directory>/timing), unless you have disabled this by running ./xmlchange CHECK_TIMING=FALSE. There are other timing files in the output directories on scratch, but this one is the most useful.
Further information on this can be found in the Timers and timing section of the CIME documentation.
CAM Documentation - https://ncar.github.io/CAM/doc/build/html/index.html
CAM6 namelist settings - http://www.cesm.ucar.edu/models/cesm2/settings/current/cam_nml.html
Please see the wiki for information.
Contributions to the repository are welcome from members of M2Lines and ICCS.
Open tickets can be viewed at Issues.
To contribute, find a relevant issue or open a new one and assign yourself to work on it. Then create a branch in which to add your contribution and open a pull request. Once it is ready, assign a reviewer and request a code review. Merging should only be performed once a reviewer has approved the changes.
Interested contributors from outside M2Lines are invited to comment on issues to propose solutions.