UFS-dev PR#118 #113

Merged
merged 13 commits into from
Feb 2, 2024

Conversation

DeniseWorthen and others added 7 commits October 5, 2023 10:42
* update CMEPS

* update CMEPS w/ fix for error condition

* turning off regional_atmaq_debug: time-out issue on cheyenne.intel

* update cmeps build to implement check4nans feature using module stored in cdeps repo

* initial sw fix branch

* make pio rearranger=box the default for all tests

* update configurations for ocnalb changes
…ommunity#1906)

* update CMEPS w/ fix for error condition

* turning off regional_atmaq_debug: time-out issue on cheyenne.intel

* update cmeps build to implement check4nans feature using module stored in cdeps repo

* make pio rearranger=box the default for all tests

* update configurations for ocnalb changes

* set the ocean albedo limit in config and use it to set logical flag

* albdir and albdif can be set as non-std values in nems.configure; they default to 0.07 and 0.06 if not set

* swap signs for latent heat flux used by HYCOM

* test latent flux change for hycom

* remove mean prefix for fields imported to FV3atm from CICE or CMEPS

* rename fv3atm export fields for snow and rain as inst

* add cpld_bmark_p8_iau test

* set up cpld_gfsv17 c96 iau test

* add tests/cpld_control_gfsv17_iau

* remove cpld_bmark_p8_iau

* sync MOM input template with global-workflow

* add a diag_table that works with fms fix
…community#1915)

* establish mrfd

* bug fixed for mraerosol

* updated upp

* sync up CICE/CMEPS/MOM6/Stoch
…e-opening ufs-community#1943) (ufs-community#1947)

* Allow instantaneous SW and LW fluxes to be used when cpllnd=.true. (re-opening ufs-community#1943) (ufs-community#1947)
@grantfirl
Collaborator Author

Expected failures for ufs-community#1894:

All cpld and datm tests will need new baselines because of field name changes in the mediator restart files

@grantfirl
Collaborator Author

Expected failures for ufs-community#1915:

003 cpld_control_gfsv17_iau_intel failed in check_result
014 cpld_bmark_p8_intel failed in check_result
023 control_flake_intel failed in check_result
025 control_CubedSphereGrid_parallel_intel failed in check_result
026 control_latlon_intel failed in check_result
027 control_wrtGauss_netcdf_parallel_intel failed in check_result
029 control_c192_intel failed in check_result
030 control_c384_intel failed in check_result
031 control_c384gdas_intel failed in check_result
032 control_stochy_intel failed in check_result
034 control_lndp_intel failed in check_result
035 control_iovr4_intel failed in check_result
036 control_iovr5_intel failed in check_result
037 control_p8_intel failed in check_result
039 control_qr_p8_intel failed in check_result
041 control_decomp_p8_intel failed in check_result
042 control_2threads_p8_intel failed in check_result
043 control_p8_lndp_intel failed in check_result
044 control_p8_rrtmgp_intel failed in check_result
045 control_p8_mynn_intel failed in check_result
046 merra2_thompson_intel failed in check_result
047 regional_control_intel failed in check_result
049 regional_control_qr_intel failed in check_result
051 regional_decomp_intel failed in check_result
052 regional_2threads_intel failed in check_result
055 regional_2dwrtdecomp_intel failed in check_result
056 regional_wofs_intel failed in check_result
057 rap_control_intel failed in check_result
058 regional_spp_sppt_shum_skeb_intel failed in check_result
059 rap_decomp_intel failed in check_result
060 rap_2threads_intel failed in check_result
062 rap_sfcdiff_intel failed in check_result
063 rap_sfcdiff_decomp_intel failed in check_result
065 hrrr_control_intel failed in check_result
066 hrrr_control_qr_intel failed in check_result
067 hrrr_control_decomp_intel failed in check_result
068 hrrr_control_2threads_intel failed in check_result
071 rrfs_v1beta_intel failed in check_result
076 control_ras_intel failed in check_result
078 control_p8_faster_intel failed in check_result
079 regional_control_faster_intel failed in check_result
106 regional_spp_sppt_shum_skeb_dyn32_phy32_intel failed in check_result
107 rap_control_dyn32_phy32_intel failed in check_result
108 hrrr_control_dyn32_phy32_intel failed in check_result
109 hrrr_control_qr_dyn32_phy32_intel failed in check_result
110 rap_2threads_dyn32_phy32_intel failed in check_result
111 hrrr_control_2threads_dyn32_phy32_intel failed in check_result
112 hrrr_control_decomp_dyn32_phy32_intel failed in check_result
119 rap_control_dyn64_phy32_intel failed in check_result
127 hafs_regional_atm_intel failed in check_result
137 hafs_global_multiple_4nests_atm_intel failed in check_result
138 hafs_global_multiple_4nests_atm_qr_intel failed in check_result
139 hafs_regional_specified_moving_1nest_atm_intel failed in check_result
168 control_atmwav_intel failed in check_result
169 atmaero_control_p8_intel failed in check_result
170 atmaero_control_p8_rad_intel failed in check_result
171 atmaero_control_p8_rad_micro_intel failed in check_result
176 control_stochy_gnu failed in check_result
177 control_ras_gnu failed in check_result
178 control_p8_gnu failed in check_result
179 control_flake_gnu failed in check_result
180 rap_control_gnu failed in check_result
181 rap_decomp_gnu failed in check_result
182 rap_2threads_gnu failed in check_result
184 rap_sfcdiff_gnu failed in check_result
185 rap_sfcdiff_decomp_gnu failed in check_result
187 hrrr_control_gnu failed in check_result
188 hrrr_control_qr_gnu failed in check_result
189 hrrr_control_2threads_gnu failed in check_result
190 hrrr_control_decomp_gnu failed in check_result
193 rrfs_v1beta_gnu failed in check_result
209 rap_control_dyn32_phy32_gnu failed in check_result
210 hrrr_control_dyn32_phy32_gnu failed in check_result
211 hrrr_control_qr_dyn32_phy32_gnu failed in check_result
212 rap_2threads_dyn32_phy32_gnu failed in check_result
213 hrrr_control_2threads_dyn32_phy32_gnu failed in check_result
214 hrrr_control_decomp_dyn32_phy32_gnu failed in check_result
221 rap_control_dyn64_phy32_gnu failed in check_result

@grantfirl grantfirl marked this pull request as ready for review January 30, 2024 22:21
@mkavulich
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: RT
[RT] Repo location: /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1627331177/20240130222511/ufs-weather-model
[RT] Error: Test 001 cpld_control_p8_mixedmode_intel FAIL Tries: 2
[RT] Error: Test 002 cpld_control_gfsv17_intel FAIL Tries: 2
[RT] Error: Test 004 cpld_control_p8_intel FAIL Tries: 2
[RT] Error: Test 006 cpld_control_qr_p8_intel FAIL Tries: 2
[RT] Error: Test 008 cpld_2threads_p8_intel FAIL Tries: 2
[RT] Error: Test 009 cpld_decomp_p8_intel FAIL Tries: 2
[RT] Error: Test 010 cpld_mpi_p8_intel FAIL Tries: 2
[RT] Error: Test 011 cpld_control_ciceC_p8_intel FAIL Tries: 2
[RT] Error: Test 012 cpld_control_c192_p8_intel FAIL Tries: 2
[RT] Error: Test 014 cpld_bmark_p8_intel FAIL Tries: 2
[RT] Error: Test 016 cpld_control_noaero_p8_intel FAIL Tries: 2
[RT] Error: Test 017 cpld_control_nowave_noaero_p8_intel FAIL Tries: 2
[RT] Error: Test 018 cpld_debug_p8_intel FAIL Tries: 2
[RT] Error: Test 019 cpld_debug_noaero_p8_intel FAIL Tries: 2
[RT] Error: Test 020 cpld_control_noaero_p8_agrid_intel FAIL Tries: 2
[RT] Error: Test 021 cpld_control_c48_intel FAIL Tries: 2
[RT] Error: Test 022 cpld_control_p8_faster_intel FAIL Tries: 2
[RT] Error: Test 023 cpld_control_pdlib_p8_intel FAIL Tries: 2
[RT] Error: Test 026 cpld_debug_pdlib_p8_intel FAIL Tries: 2
[RT] Error: Test 027 control_flake_intel FAIL Tries: 2
[RT] Error: Test 029 control_CubedSphereGrid_parallel_intel FAIL Tries: 2
[RT] Error: Test 030 control_latlon_intel FAIL Tries: 2
[RT] Error: Test 031 control_wrtGauss_netcdf_parallel_intel FAIL Tries: 2
[RT] Error: Test 033 control_c192_intel FAIL Tries: 2
[RT] Error: Test 034 control_c384_intel FAIL Tries: 2
[RT] Error: Test 035 control_c384gdas_intel FAIL Tries: 2
[RT] Error: Test 036 control_stochy_intel FAIL Tries: 2
[RT] Error: Test 038 control_lndp_intel FAIL Tries: 2
[RT] Error: Test 039 control_iovr4_intel FAIL Tries: 2
[RT] Error: Test 040 control_iovr5_intel FAIL Tries: 2
[RT] Error: Test 041 control_p8_intel FAIL Tries: 2
[RT] Error: Test 043 control_qr_p8_intel FAIL Tries: 2
[RT] Error: Test 045 control_decomp_p8_intel FAIL Tries: 2
[RT] Error: Test 046 control_2threads_p8_intel FAIL Tries: 2
[RT] Error: Test 047 control_p8_lndp_intel FAIL Tries: 2
[RT] Error: Test 048 control_p8_rrtmgp_intel FAIL Tries: 2
[RT] Error: Test 049 control_p8_mynn_intel FAIL Tries: 2
[RT] Error: Test 050 merra2_thompson_intel FAIL Tries: 2
[RT] Error: Test 051 regional_control_intel FAIL Tries: 2
[RT] Error: Test 053 regional_control_qr_intel FAIL Tries: 2
[RT] Error: Test 055 regional_decomp_intel FAIL Tries: 2
[RT] Error: Test 056 regional_2threads_intel FAIL Tries: 2
[RT] Error: Test 059 regional_2dwrtdecomp_intel FAIL Tries: 2
[RT] Error: Test 060 regional_wofs_intel FAIL Tries: 2
[RT] Error: Test 061 rap_control_intel FAIL Tries: 2
[RT] Error: Test 062 regional_spp_sppt_shum_skeb_intel FAIL Tries: 2
[RT] Error: Test 063 rap_decomp_intel FAIL Tries: 2
[RT] Error: Test 064 rap_2threads_intel FAIL Tries: 2
[RT] Error: Test 066 rap_sfcdiff_intel FAIL Tries: 2
[RT] Error: Test 067 rap_sfcdiff_decomp_intel FAIL Tries: 2
[RT] Error: Test 069 hrrr_control_intel FAIL Tries: 2
[RT] Error: Test 070 hrrr_control_qr_intel FAIL Tries: 2
[RT] Error: Test 071 hrrr_control_decomp_intel FAIL Tries: 2
[RT] Error: Test 072 hrrr_control_2threads_intel FAIL Tries: 2
[RT] Error: Test 075 rrfs_v1beta_intel FAIL Tries: 2
[RT] Error: Test 080 control_ras_intel FAIL Tries: 2
[RT] Error: Test 082 control_p8_faster_intel FAIL Tries: 2
[RT] Error: Test 083 regional_control_faster_intel FAIL Tries: 2
[RT] Error: Test 111 regional_spp_sppt_shum_skeb_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 112 rap_control_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 113 hrrr_control_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 114 hrrr_control_qr_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 115 rap_2threads_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 116 hrrr_control_2threads_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 117 hrrr_control_decomp_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 124 rap_control_dyn64_phy32_intel FAIL Tries: 2
[RT] Error: Test 132 hafs_regional_atm_intel FAIL Tries: 2
[RT] Error: Test 142 hafs_global_multiple_4nests_atm_intel FAIL Tries: 2
[RT] Error: Test 143 hafs_global_multiple_4nests_atm_qr_intel FAIL Tries: 2
[RT] Error: Test 144 hafs_regional_specified_moving_1nest_atm_intel FAIL Tries: 2
[RT] Error: Test 154 datm_cdeps_control_cfsr_intel FAIL Tries: 2
[RT] Error: Test 156 datm_cdeps_control_gefs_intel FAIL Tries: 2
[RT] Error: Test 157 datm_cdeps_iau_gefs_intel FAIL Tries: 2
[RT] Error: Test 158 datm_cdeps_stochy_gefs_intel FAIL Tries: 2
[RT] Error: Test 159 datm_cdeps_ciceC_cfsr_intel FAIL Tries: 2
[RT] Error: Test 160 datm_cdeps_bulk_cfsr_intel FAIL Tries: 2
[RT] Error: Test 161 datm_cdeps_bulk_gefs_intel FAIL Tries: 2
[RT] Error: Test 162 datm_cdeps_mx025_cfsr_intel FAIL Tries: 2
[RT] Error: Test 163 datm_cdeps_mx025_gefs_intel FAIL Tries: 2
[RT] Error: Test 164 datm_cdeps_multiple_files_cfsr_intel FAIL Tries: 2
[RT] Error: Test 165 datm_cdeps_3072x1536_cfsr_intel FAIL Tries: 2
[RT] Error: Test 166 datm_cdeps_gfs_intel FAIL Tries: 2
[RT] Error: Test 167 datm_cdeps_debug_cfsr_intel FAIL Tries: 2
[RT] Error: Test 168 datm_cdeps_control_cfsr_faster_intel FAIL Tries: 2
[RT] Error: Test 173 control_atmwav_intel FAIL Tries: 2
[RT] Error: Test 174 atmaero_control_p8_intel FAIL Tries: 2
[RT] Error: Test 175 atmaero_control_p8_rad_intel FAIL Tries: 2
[RT] Error: Test 176 atmaero_control_p8_rad_micro_intel FAIL Tries: 2
[RT] Error: Test 181 control_stochy_gnu FAIL Tries: 2
[RT] Error: Test 182 control_ras_gnu FAIL Tries: 2
[RT] Error: Test 183 control_p8_gnu FAIL Tries: 2
[RT] Error: Test 184 control_flake_gnu FAIL Tries: 2
[RT] Error: Test 185 rap_control_gnu FAIL Tries: 2
[RT] Error: Test 186 rap_decomp_gnu FAIL Tries: 2
[RT] Error: Test 187 rap_2threads_gnu FAIL Tries: 2
[RT] Error: Test 189 rap_sfcdiff_gnu FAIL Tries: 2
[RT] Error: Test 190 rap_sfcdiff_decomp_gnu FAIL Tries: 2
[RT] Error: Test 192 hrrr_control_gnu FAIL Tries: 2
[RT] Error: Test 193 hrrr_control_qr_gnu FAIL Tries: 2
[RT] Error: Test 194 hrrr_control_2threads_gnu FAIL Tries: 2
[RT] Error: Test 195 hrrr_control_decomp_gnu FAIL Tries: 2
[RT] Error: Test 198 rrfs_v1beta_gnu FAIL Tries: 2
[RT] Error: Test 215 rap_control_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 216 hrrr_control_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 217 hrrr_control_qr_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 218 rap_2threads_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 219 hrrr_control_2threads_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 220 hrrr_control_decomp_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 227 rap_control_dyn64_phy32_gnu FAIL Tries: 2
[RT] Error: Test 235 cpld_control_p8_gnu FAIL Tries: 2
[RT] Error: Test 236 cpld_control_nowave_noaero_p8_gnu FAIL Tries: 2
[RT] Error: Test 237 cpld_debug_p8_gnu FAIL Tries: 2
[RT] Error: Test 238 cpld_control_pdlib_p8_gnu FAIL Tries: 2
[RT] Error: Test 239 cpld_debug_pdlib_p8_gnu FAIL Tries: 2
[RT] Error: Test 240 datm_cdeps_control_cfsr_gnu FAIL Tries: 2
[RT] Log file shows failures.
[RT] Please obtain logs from /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1627331177/20240130222511/ufs-weather-model

@grantfirl
Collaborator Author

@mkavulich It looks like the failures are all expected according to the UFS PRs. I'll start the BL creation.
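The check described above (every failure in the RT notification is accounted for by the expected-failure lists) can be sketched in Python. This is a minimal sketch, not the project's tooling: the function names are hypothetical, and the two line shapes are assumed from the comments in this thread. Test numbers differ between the lists, so the comparison is by test name only, and the `cpld_` and `datm_` prefixes are treated as expected because of the mediator restart field name changes noted for ufs-community#1894.

```python
import re

# Line shapes assumed from the comments in this thread; names are hypothetical.
EXPECTED_RE = re.compile(r"^\s*\d+\s+(\S+)\s+failed in check_result")
RT_RE = re.compile(r"\[RT\] Error: Test \d+ (\S+) FAIL")


def expected_names(text):
    """Collect test names from 'NNN <name> failed in check_result' lines."""
    return {m.group(1) for line in text.splitlines()
            if (m := EXPECTED_RE.match(line))}


def failed_names(text):
    """Collect test names from '[RT] Error: Test NNN <name> FAIL' lines."""
    return {m.group(1) for line in text.splitlines()
            if (m := RT_RE.search(line))}


def unexpected_failures(expected_text, rt_text,
                        expected_prefixes=("cpld_", "datm_")):
    """Return failed tests that are neither listed nor covered by a prefix."""
    expected = expected_names(expected_text)
    return sorted(
        name for name in failed_names(rt_text)
        if name not in expected and not name.startswith(expected_prefixes)
    )
```

Calling `unexpected_failures(expected_text, rt_text)` with the pasted expected-failure list and the pasted notification returns the names that still need an explanation; an empty list means every failure was anticipated.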

@mkavulich
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: BL
[BL] Repo location: /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1627331177/20240131154510/ufs-weather-model
Please make changes and add the following label back: hera-intel-BL

@grantfirl
Collaborator Author

@mkavulich I failed to notice that ufs-community#1916 requires new input data. The new test cpld_control_gfsv17_iau_intel failed to run. Can you please stage the input data on Hera (I apparently lack permissions):

cp /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20221101/FV3_input_data/INPUT_L127_mx100/fv_increment*.nc /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model/input_data/20230920/FV3_input_data/INPUT_L127_mx100/

I'm going to commit the failed log. Once the input files are staged, we can create a baseline for the new test only and recommit a successful log. That way, the committed logs will show that every BL creation succeeded, even if not all in the same run.

@mkavulich
Collaborator

Sounds like a good plan. I have staged the new data. I also added group write permissions in that directory for the future (I still can't figure out why my umask isn't working as expected; I think I'll open a ticket with the Hera helpdesk).

Should I move the new baselines that were created so far to main-20240131?

@grantfirl
Collaborator Author

/scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1627331177/20240131154510/ufs-weather-model

Yes, please, just in case.

@grantfirl
Collaborator Author

@mkavulich Looks like we hit another snag. I ran into trouble trying to do the individual test, so I just tried to recreate all baselines. We need to do the following:

Stage the following files from /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20221101 to /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model/input_data/20230920:
MOM6_IC/mom6_increment.nc
FV3_input_data/INPUT_L127_mx100.v2.sfc/fv_increment3.nc
FV3_input_data/INPUT_L127_mx100.v2.sfc/fv_increment6.nc
FV3_input_data/INPUT_L127_mx100.v2.sfc/fv_increment9.nc
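
A minimal Python sketch of that staging step, assuming the four relative paths listed above and read/write access to both trees. The helper name `stage_files` and the dry-run default are illustrative choices, not part of the RT scripts. It creates any missing destination directory before copying.

```python
import shutil
from pathlib import Path

# Relative paths exactly as listed in the comment above.
FILES = [
    "MOM6_IC/mom6_increment.nc",
    "FV3_input_data/INPUT_L127_mx100.v2.sfc/fv_increment3.nc",
    "FV3_input_data/INPUT_L127_mx100.v2.sfc/fv_increment6.nc",
    "FV3_input_data/INPUT_L127_mx100.v2.sfc/fv_increment9.nc",
]


def stage_files(src_root, dst_root, rel_paths, dry_run=True):
    """Copy each relative path from src_root to dst_root.

    Hypothetical helper: fails before copying anything if a source file is
    missing, and creates destination directories as needed. With
    dry_run=True it only returns the planned destination paths.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    missing = [rel for rel in rel_paths if not (src_root / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in source tree: {missing}")
    staged = []
    for rel in rel_paths:
        dst = dst_root / rel
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_root / rel, dst)  # copy2 keeps timestamps
        staged.append(dst)
    return staged
```

Run it once with the default `dry_run=True` to review the planned destinations, then again with `dry_run=False` to copy.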

We also hit the disk quota, so we need to nuke some stuff. I would suggest nuking the new baselines that we just created (let's just re-run the tag after we clear space). We also need to clear out /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model/run and make sure we're only keeping the last 2 sets of baselines.

Once these things are done, let's add back the hera-intel-BL label.
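
The "keep only the last 2 sets of baselines" rule could be sketched as follows. This assumes baseline directories are named `main-YYYYMMDD` (as in `main-20240131`, mentioned earlier in this thread) and sit directly under one root directory; the function names and the dry-run default are hypothetical, and anything that does not match the naming pattern is left alone.

```python
import re
import shutil
from pathlib import Path

# Assumed naming scheme: main-YYYYMMDD, e.g. main-20240131.
BASELINE_RE = re.compile(r"^main-(\d{8})$")


def stale_baselines(baseline_root, keep=2):
    """Return dated baseline directories older than the newest `keep` sets."""
    root = Path(baseline_root)
    dated = sorted(
        (d for d in root.iterdir()
         if d.is_dir() and BASELINE_RE.match(d.name)),
        key=lambda d: BASELINE_RE.match(d.name).group(1),
    )
    return dated[:-keep] if keep > 0 else dated


def prune_baselines(baseline_root, keep=2, dry_run=True):
    """Delete stale baselines; with dry_run=True only report their names."""
    removed = []
    for d in stale_baselines(baseline_root, keep):
        if not dry_run:
            shutil.rmtree(d)
        removed.append(d.name)
    return removed
```

Sorting on the eight-digit date string works because `YYYYMMDD` orders lexicographically the same way it orders chronologically.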

@mkavulich
Collaborator

mkavulich commented Feb 1, 2024

Crap, sorry about the disk quota thing. I forgot I had disabled the automatic cleanup for debugging, I will re-enable that for the automated runs when I get a chance. I cleaned out all but the last couple runs (and the latest failed baseline), should have plenty of space now for the next while.

Do we not need to copy the whole directory /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20221101/FV3_input_data/INPUT_L127_mx100.v2.sfc/? Right now that directory does not exist in /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model/input_data/20230920/FV3_input_data/

Edit: I copied over that whole directory just in case, and started the newer baseline.

@grantfirl
Collaborator Author

Crap, sorry about the disk quota thing. I forgot I had disabled the automatic cleanup for debugging, I will re-enable that for the automated runs when I get a chance. I cleaned out all but the last couple runs (and the latest failed baseline), should have plenty of space now for the next while.

Do we not need to copy the whole directory /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20221101/FV3_input_data/INPUT_L127_mx100.v2.sfc/? Right now that directory does not exist in /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model/input_data/20230920/FV3_input_data/

Ya, if it doesn't exist, let's make/copy a directory there.

@mkavulich
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: BL
[BL] Repo location: /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1627331177/20240201201518/ufs-weather-model
Please make changes and add the following label back: hera-intel-BL

Collaborator

@mkavulich mkavulich left a comment

New baseline has been moved to baselines/main-20240131

@grantfirl grantfirl merged commit 220d636 into NCAR:main Feb 2, 2024
@grantfirl grantfirl mentioned this pull request Feb 2, 2024
39 tasks
@mkavulich
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: BL
[BL] Repo location: /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1627331177/20240131203009/ufs-weather-model
Please make changes and add the following label back: hera-intel-BL
