MPAS-Workflow
=============
A tool for cycling forecast and data assimilation experiments with the MPAS-Atmosphere model and the
MPAS-JEDI data assimilation package.
# starting a cycling experiment on the Cheyenne HPC
---------------------------------------------------
log in to Cheyenne
mkdir -p /fresh/path/for/submitting/experiments
cd /fresh/path/for/submitting/experiments
module load git
git clone https://github.com/NCAR/MPAS-Workflow
cd MPAS-Workflow
modify configuration files as needed
source env-setup/cheyenne.csh
OR
source env-setup/cheyenne.sh
./drive.csh
# It is required to set the work/run directories in $HOME/.cylc/global.rc as follows:
[hosts]
    [[localhost]]
        work directory = /glade/scratch/USERNAME/cylc-run
        run directory = /glade/scratch/USERNAME/cylc-run
        [[[batch systems]]]
            [[[[pbs]]]]
                job name length maximum = 236
# It is recommended to also set 'job name length maximum' to a large value
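The USERNAME placeholder in the settings above has to be replaced with your own login name. A minimal Python sketch of that substitution follows; the function name render_global_rc and the fallback to the USER environment variable are assumptions made for illustration and are not part of the workflow. It only prints the text and does not write $HOME/.cylc/global.rc.

```python
import os

# Same settings as listed above, with USERNAME turned into a format field.
GLOBAL_RC_TEMPLATE = """\
[hosts]
    [[localhost]]
        work directory = /glade/scratch/{user}/cylc-run
        run directory = /glade/scratch/{user}/cylc-run
        [[[batch systems]]]
            [[[[pbs]]]]
                job name length maximum = 236
"""


def render_global_rc(user):
    """Return the global.rc text with the user name filled in (hypothetical helper)."""
    return GLOBAL_RC_TEMPLATE.format(user=user)


# Fall back to the literal placeholder when USER is not set in the environment.
print(render_global_rc(os.environ.get("USER", "USERNAME")))
```

Review the printed text before copying it into $HOME/.cylc/global.rc, especially if that file already contains other settings.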
# Configuration Files (config/)
-------------------------------
The files in this directory describe the configuration for the entire workflow. Some files are
designed to be modified by users, and others mostly by developers.
## top-level configuration (config/*.csh)
-----------------------------------------
builds.csh: describes the build directories for critical applications
modeldata.csh: static model-space data file structure, including mesh-specific partition files,
fixed ensemble forecast members for deterministic experiments, first guess files for the first cycle
of an experiment, surface variable update files (sst and xice), and common static.nc file(s) to be
used across all cycles.
environment.csh: run-time environment used across compiled executables and python scripts
experiment.csh: primary control knobs for individualizing experiments
filestructure.csh: workflow file structure
job.csh: job queue settings
obsdata.csh: static observation-space data file structure
tools.csh: initializes python tools for workflow task management
verification.csh: post-processing and verification script descriptions
## MPAS-specific configuration (config/mpas/)
---------------------------------------------
mpas/variables.csh: model/analysis variables used to generate YAML files for MPAS-JEDI applications
mpas/$MPASGridDescriptor/mesh.csh: mesh-specific options that affect the workflow and application
behaviors
mpas/$MPASGridDescriptor/job.csh: job durations and processor usages
In the above, MPASGridDescriptor describes the meshes that are used in the Variational application.
See config/experiment.csh for more information.
## main driver: drive.csh
-------------------------
Creates a new cylc suite file, then runs it. There are options at the top of this file for begin/end
dates and various kinds of workflows with and without verification. The CriticalPathType determines
whether the verification is performed concurrently with and depends on the critical path (Normal),
or as an independent post-processing diagnostic step (Bypass). The Reanalysis and Reforecast
settings of CriticalPathType are two variations of "partial cycling", where the current cycle does
not depend on the previous cycle. Reanalysis is used to perform the CyclingDA task on each cycle without
re-running forecasts. This requires the CyclingFC output files to already be present in the
experiment directory, which might be added manually outside of the workflow. Reforecast is used to
perform forecasts from an existing set of analysis states, which are stored in the CyclingDA
directory.
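The partial-cycling requirements described above can be summarized as a lookup table. The Python sketch below is illustrative only: the names PREEXISTING_INPUTS and missing_inputs are hypothetical, and the empty entries for Normal and Bypass mean only that this description does not state a requirement for them.

```python
# Directories that must already hold output before the workflow starts,
# per CriticalPathType, as far as the description above states them.
PREEXISTING_INPUTS = {
    "Normal": (),                  # no requirement stated
    "Bypass": (),                  # no requirement stated
    "Reanalysis": ("CyclingFC",),  # CyclingDA runs on existing forecasts
    "Reforecast": ("CyclingDA",),  # forecasts start from existing analyses
}


def missing_inputs(critical_path_type, existing_dirs):
    """Return the required directories that are absent from existing_dirs."""
    if critical_path_type not in PREEXISTING_INPUTS:
        raise ValueError(f"unknown CriticalPathType: {critical_path_type!r}")
    required = PREEXISTING_INPUTS[critical_path_type]
    return [d for d in required if d not in existing_dirs]


print(missing_inputs("Reanalysis", {"CyclingDA"}))  # ['CyclingFC']
print(missing_inputs("Reforecast", {"CyclingDA"}))  # []
```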
## templated workflow components
--------------------------------
These scripts serve as templates for multiple workflow components. The actual components are
generated by performing sed substitution within SetupWorkflow.csh and AppAndVerify.csh. Here we give
a brief summary of the templating for each script.
PrepJEDI.csh: substitutes relevant sections in the yaml file for all MPAS-JEDI applications.
Templated w.r.t. the application type (e.g., Variational, HofX) and application name (e.g.,
3denvar). Prepares namelist.atmosphere, streams.atmosphere, stream_list.atmosphere.*. Links
required static files and graph info files that describe MPI partitioning.
PrepVariational.csh: further modifies the application yaml file(s) for the Variational application
Variational.csh: used in the CyclingDA cylc task; executes the mpasjedi_variational and
mpasjedi_eda applications. Templated w.r.t. the background state prefix and directory. Reads output
states from a CyclingFC task, as coded in SetupWorkflow.csh.
CleanVariational.csh: used to generate CleanCyclingDA.csh, which cleans CyclingDA working
directories in order to reduce experiment disk resource requirements.
forecast.csh: used to generate all forecast scripts, e.g., CyclingFC.csh and ExtendedMeanFC.csh,
which perform mpas_atmosphere forecasts across a templated time range with state output at a
templated interval. Presently only takes analyses as initial conditions, which have the
ANFilePrefix and are produced by either CyclingDA or RTPPInflation. self_icStatePrefix could be
templated in order to enable forecasts from other kinds of states, like cold-start files.
HofX.csh: used to generate all HofX* scripts, e.g., HofXBG.csh, HofXMeanFC.csh, HofXEnsMeanBG.csh,
which run the mpasjedi_hofx3d application. Templated w.r.t. the input state directory and prefix,
allowing it to read any forecast state written through the "da_state" stream.
CleanHofx.csh: used to generate CleanHofX*.csh scripts, which clean HofX* working directories
in order to reduce experiment disk resource requirements.
verifyobs.csh: used to generate scripts that verify observation-database output from HofX* and
CyclingDA tasks.
verifymodel.csh: used to generate scripts that verify model forecast states with respect to GFS
analyses.
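The workflow generates these components with sed. As a rough illustration of the same idea, the Python sketch below fills one template text several times with different values. The placeholder names (STATE_DIR_PLACEHOLDER, STATE_PREFIX_PLACEHOLDER), the template lines, and the substituted values are all hypothetical and are not taken from SetupWorkflow.csh or AppAndVerify.csh.

```python
def render_template(template_text, substitutions):
    """Replace every placeholder with its value, like a chain of sed 's/a/b/g'."""
    rendered = template_text
    for placeholder, value in substitutions.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


# Hypothetical template body; the real HofX.csh differs.
hofx_template = (
    "set stateDir = STATE_DIR_PLACEHOLDER\n"
    "set statePrefix = STATE_PREFIX_PLACEHOLDER\n"
)

# One template, several generated scripts (values are made up for illustration).
components = {
    "HofXBG.csh": {
        "STATE_DIR_PLACEHOLDER": "CyclingFC",
        "STATE_PREFIX_PLACEHOLDER": "bg",
    },
    "HofXMeanFC.csh": {
        "STATE_DIR_PLACEHOLDER": "ExtendedMeanFC",
        "STATE_PREFIX_PLACEHOLDER": "fc",
    },
}

for script_name, substitutions in components.items():
    print(f"--- {script_name} ---")
    print(render_template(hofx_template, substitutions), end="")
```

Plain string replacement is enough here because the placeholders do not overlap. Regular-expression features of sed are not reproduced.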
## non-templated workflow components
------------------------------------
These scripts are used as-is without sed substitution.
MeanBackground.csh: calculates the mean of ensemble background states
MeanAnalysis.csh: calculates the mean of ensemble analysis states
RTPPInflation.csh: performs Relaxation To Prior Perturbation (RTPP) inflation, taking as input two
ensembles, one each of background states and analysis states.
GenerateABEInflation.csh: generates Adaptive Background Error Inflation (ABEI) factors based on
all-sky IR brightness temperature H(x_mean) and H_clear(x_mean) from GOES-16 ABI and Himawari-8 AHI.
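For readers unfamiliar with RTPP, the usual formulation relaxes each analysis perturbation toward the corresponding background perturbation, x_a' <- alpha * x_b' + (1 - alpha) * x_a', while keeping the analysis mean. The Python sketch below shows only that arithmetic on tiny lists. It is not the code that RTPPInflation.csh runs, and the name rtpp_inflate, the coefficient name alpha, and the list-of-lists layout are assumptions.

```python
def ensemble_mean(members):
    """Element-wise mean over ensemble members (each member is a list of floats)."""
    n_members = len(members)
    return [sum(values) / n_members for values in zip(*members)]


def rtpp_inflate(background, analysis, alpha):
    """Relax analysis perturbations toward background perturbations.

    alpha = 0 leaves the analysis ensemble unchanged; alpha = 1 restores the
    full background spread around the analysis mean.
    """
    bg_mean = ensemble_mean(background)
    an_mean = ensemble_mean(analysis)
    inflated = []
    for bg_member, an_member in zip(background, analysis):
        inflated.append([
            an_mean[i]
            + alpha * (bg_member[i] - bg_mean[i])
            + (1.0 - alpha) * (an_member[i] - an_mean[i])
            for i in range(len(an_member))
        ])
    return inflated


background = [[0.0], [4.0]]   # two members, one state variable, mean 2.0
analysis = [[2.0], [4.0]]     # mean 3.0, smaller spread than the background
print(rtpp_inflate(background, analysis, alpha=0.5))  # [[1.5], [4.5]]
```

The inflated ensemble keeps the analysis mean (3.0 in this example), while its spread lies between the analysis and background spreads.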
## MPAS-JEDI application configuration files
--------------------------------------------
config/applicationBase/*.yaml: MPAS-JEDI application-specific YAML templates
config/ObsPlugs/variational/*.yaml: observation yaml stubs that get plugged into all Variational
applications, e.g., 3denvar and eda_3denvar
config/ObsPlugs/hofx/*.yaml: same, but for HofX
## application-specific MPAS-Atmosphere configuration files
-----------------------------------------------------------
e.g., namelist.atmosphere, streams.atmosphere, and stream_list.atmosphere.*
config/mpas/forecast/*: *FC (forecast) tasks
config/mpas/hofx/*: HofX* tasks
config/mpas/rtpp/*: RTPPInflation task
config/mpas/variational/*: CyclingDA task
## python tools that perform aspects of the workflow
----------------------------------------------------
tools/advanceCYMDH.py: time-stepping used to figure out dates relative to an arbitrary input date
tools/memberDir.py: generates an ensemble member directory string, dependent on experiment- and
application-specific inputs
tools/nSpaces.py: generates a string consisting of the requested number of spaces. Used for
controlling indentation of some yaml components
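To make the date-stepping and indentation helpers concrete, here is a minimal sketch of what such tools might do. It assumes dates are 10-digit YYYYMMDDHH strings and that offsets are given in hours. The function names advance_cymdh and n_spaces are hypothetical, and the real scripts' command-line interfaces are not reproduced.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y%m%d%H"  # assumed layout: YYYYMMDDHH


def advance_cymdh(cymdh, hours):
    """Shift a YYYYMMDDHH date string by a signed number of hours."""
    date = datetime.strptime(cymdh, DATE_FORMAT)
    return (date + timedelta(hours=hours)).strftime(DATE_FORMAT)


def n_spaces(count):
    """Return a string made of `count` spaces, e.g. for yaml indentation."""
    return " " * count


print(advance_cymdh("2018041500", 6))    # 2018041506
print(advance_cymdh("2018030100", -6))   # 2018022818
print(n_spaces(4) + "- name: example")   # yaml item indented by four spaces
```

Delegating the arithmetic to datetime handles month and year boundaries without any hand-written calendar logic.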
## some useful cylc commands
----------------------------
# Print a list of active suites
cylc scan
# Open an X-window GUI showing the status of all active suites. Double-click an individual suite in
# order to see detailed information. From there it is easy to perform actions on the entire suite or
# individual tasks, e.g., hold, resume, kill, trigger.
cylc gscan
# Trigger all tasks in a suite with a particular STATUS. Examples: failed, submit-failed
cylc trigger SUITENAME '*.*:STATUS'
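If you re-trigger tasks often, the command above can be assembled programmatically. The Python sketch below only builds and prints the command line. It does not call cylc, and the helper name trigger_command and the suite name MySuite are hypothetical.

```python
import shlex


def trigger_command(suite_name, status):
    """Build the argument list for 'cylc trigger SUITENAME *.*:STATUS'."""
    return ["cylc", "trigger", suite_name, f"*.*:{status}"]


for status in ("failed", "submit-failed"):
    # shlex.join quotes the glob so the shell does not expand it.
    print(shlex.join(trigger_command("MySuite", status)))
```

The argument list can be passed to subprocess.run directly, in which case no shell quoting is needed.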
## a note about disk management
-------------------------------
This workflow includes automated deletion of some intermediate files. That behavior can be modified
in the Clean*.csh scripts (e.g., CleanVariational.csh). If data storage is still a problem, it is
recommended to remove the Cycling* directories of an experiment after all desired verification has
completed. The model- and observation-space statistical summary files are orders of magnitude
smaller than the full model states and instrument feedback files.