Update github workflow #805

Closed
wants to merge 10 commits into from

DusanJovic-NOAA
Collaborator

@DusanJovic-NOAA DusanJovic-NOAA commented Mar 20, 2024

Description

  • Update the gcc compiler version from 11 to 12.
  • Add 'spack clean --all' to reduce the size of the GitHub cache file. The cache file is currently about 3.5 GB; after this change it is about 150-200 MB.
  • Split the GCC workflow into two steps: the first step builds only the Spack stack, and the second step builds fv3atm. Currently the Spack stack is built for each combination of compiler/mpi/cmake_flags, even though the stack does not depend on the CMake flags (those are fv3atm-specific flags).
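
The saving from the split can be illustrated with a small shell sketch. The matrix values below are hypothetical placeholders, not the entries of the real fv3atm workflow. The only point is that a cache key built from compiler and MPI alone collapses many fv3atm build jobs onto a few Spack stacks.

```shell
# Hypothetical matrix values; the real workflow's entries may differ.
compilers="gcc-12"
mpis="mpich openmpi"
cmake_flags="-DOPTION_A=ON -DOPTION_B=ON -DOPTION_C=ON"

# Key for the Spack stack: depends only on compiler and MPI.
spack_key() { printf 'spack-%s-%s\n' "$1" "$2"; }

# Emit one Spack key per fv3atm build job (one job per matrix combination).
list_spack_keys() {
  for c in $compilers; do
    for m in $mpis; do
      for f in $cmake_flags; do
        spack_key "$c" "$m"   # $f is deliberately not part of the key
      done
    done
  done
}

total_jobs=$(list_spack_keys | wc -l | tr -d ' ')
unique_stacks=$(list_spack_keys | sort -u | wc -l | tr -d ' ')
echo "fv3atm build jobs: $total_jobs, distinct Spack stacks: $unique_stacks"
```

With these placeholder values, six build jobs share two Spack stacks, so building the stack in its own step avoids four redundant stack builds.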

Issue(s) addressed

Testing

How were these changes tested? GitHub CI
What compilers / HPCs was it tested with?
Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
Have the ufs-weather-model regression tests been run? On what platform?

  • Will the code updates change regression test baseline? If yes, why? Please show the baseline directory below.
  • Please commit the regression test log files in your ufs-weather-model branch

Dependencies

N/A

@DusanJovic-NOAA DusanJovic-NOAA marked this pull request as draft March 20, 2024 14:35
@DusanJovic-NOAA DusanJovic-NOAA marked this pull request as ready for review March 20, 2024 14:36
@DusanJovic-NOAA DusanJovic-NOAA changed the title [DRAFT] Update github workflow Update github workflow Mar 21, 2024
Contributor

@AlexanderRichert-NOAA AlexanderRichert-NOAA left a comment

This all looks good to me. Just out of curiosity what's the motivation for going from gcc 11 to 12?

@DusanJovic-NOAA
Collaborator Author

This all looks good to me. Just out of curiosity what's the motivation for going from gcc 11 to 12?

If I remember correctly, gcc 11 had some issues with mpich.

@DusanJovic-NOAA
Collaborator Author

I see this warning:

Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/cache@v3, actions/upload-artifact@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

here: https://github.com/NOAA-EMC/fv3atm/actions/runs/8375122977

@AlexanderRichert-NOAA
Contributor

All of those actions are currently at version 4.x, so changing @v3 to @v4 should make the warnings go away.
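
A minimal sketch of that version bump as a shell filter, assuming the workflow references the three actions exactly as they are spelled in the warning. The snippet it runs on is a made-up example, not the real fv3atm workflow file, and the release notes of each action are worth checking before applying a major-version bump.

```shell
# Rewrite @v3 to @v4 for the three actions named in the deprecation warning.
# Other actions and other version tags are left untouched.
bump_actions() {
  sed -E 's#(actions/(checkout|cache|upload-artifact))@v3#\1@v4#g'
}

# Demo on a hypothetical workflow fragment.
printf '%s\n' \
  '      - uses: actions/checkout@v3' \
  '      - uses: actions/cache@v3' \
  '      - uses: actions/upload-artifact@v3' | bump_actions
```

The filter reads from standard input, so it can be tried on a copy of a workflow file before anything is edited in place.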

@DusanJovic-NOAA
Collaborator Author

This PR will probably create conflicts with #752. @AlexanderRichert-NOAA I do not know which PR we'll be merging first but feel free to combine the changes from this PR with your 'Add unit testing' PR.

@AlexanderRichert-NOAA
Contributor

@DusanJovic-NOAA I'll take a look, thanks

@DusanJovic-NOAA
Collaborator Author

@AlexanderRichert-NOAA did you have a chance to look into potential conflicts in your #752 PR? Or should we merge this PR now as it is?

@AlexanderRichert-NOAA
Contributor

Yes, thanks. I don't foresee any problem with merging this first.

@DusanJovic-NOAA
Collaborator Author

Unfortunately, the workflow is failing, this time with:

CMake Error at /home/runner/work/fv3atm/fv3atm/spack-develop/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/fms-2023.04-cmrjkxgqmcz74sjpou2qtdgwag7mavuq/lib/cmake/fms/fms-config.cmake:11 (message):
  File or directory referenced by variable FMS_INSTALL_PREFIX does not exist
  !
Call Stack (most recent call first):
  /home/runner/work/fv3atm/fv3atm/spack-develop/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/fms-2023.04-cmrjkxgqmcz74sjpou2qtdgwag7mavuq/lib/cmake/fms/fms-config.cmake:52 (set_and_check)
  ci/CMakeLists.txt:58 (find_package)
  CMakeLists.txt:18 (include)

The previous run from this branch finished successfully: https://github.com/NOAA-EMC/fv3atm/actions/runs/8375122977

@DusanJovic-NOAA
Collaborator Author

I tried to reproduce the build failure in the GitHub workflow by building the Spack stack on my desktop computer, but I couldn't. It's a different OS (Red Hat vs. Ubuntu), with slightly different compiler and MPI versions. Building the Spack stack failed first while building the ip library: CI specifies ip@develop, which needs LAPACK, which I do not have installed. The model currently uses ip/4.3.0, not ip/develop. I also noticed that the FMS version compiled in the workflow stack is 2023.04, while the model currently uses 2023.02.01. The ESMF versions also differ: 8.4.3 in CI vs. 8.5.0, which the model currently needs. It's strange that in CI some libraries are newer than what the model currently uses while others are older.

We should still try to use the same versions everywhere, just for consistency.

However, none of these differences explains why compilation fails in the GitHub workflow.

@AlexanderRichert-NOAA
Contributor

How recently was the cache rebuilt? That's the only thing I can think of... Can you try printing the value of that variable that it's complaining about? Or if nothing else, maybe cat /home/runner/work/fv3atm/fv3atm/spack-develop/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/fms-2023.04-*/lib/cmake/fms/fms-config.cmake and see if that yields any clues. Let me know if you'd like me to dive into this further.
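
One way to do that inspection without printing the whole file is sketched below. It takes the install root as an argument and shows only the get_filename_component line of each matching fms-config.cmake. The demo directory and the hash suffix "abc123" are placeholders standing in for the CI install tree, not real paths.

```shell
# Print the prefix-defining line from every fms-config.cmake under an install root.
show_prefix_line() {
  root=$1
  for f in "$root"/fms-*/lib/cmake/fms/fms-config.cmake; do
    [ -e "$f" ] || continue      # the glob matched nothing
    echo "== $f"
    grep -n 'get_filename_component' "$f"
  done
}

# Demo with a throwaway tree; "abc123" is a made-up hash suffix.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/fms-2023.04-abc123/lib/cmake/fms"
echo 'get_filename_component(PACKAGE_PREFIX_DIR "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE)' \
  > "$demo_root/fms-2023.04-abc123/lib/cmake/fms/fms-config.cmake"
show_prefix_line "$demo_root"
```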

@DusanJovic-NOAA
Collaborator Author

When I look at the fms-config.cmake in the install tree on my desktop, which should be either identical or very similar to the one in the github cache, I see:

$ more ./lib/cmake/fms/fms-config.cmake

####### Expanded from @PACKAGE_INIT@ by configure_package_config_file() #######
####### Any changes to this file will be overwritten by the next CMake run ####
####### The input file was FMSConfig.cmake.in                            ########

get_filename_component(PACKAGE_PREFIX_DIR "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE)

macro(set_and_check _var _file)
  set(${_var} "${_file}")
  if(NOT EXISTS "${_file}")
    message(FATAL_ERROR "File or directory ${_file} referenced by variable ${_var} does not exist !")
  endif()
endmacro()

. . .


set(FMSVersion "${PACKAGE_VERSION}")
set_and_check(FMS_INSTALL_PREFIX "${PACKAGE_PREFIX_DIR}")

Variable FMS_INSTALL_PREFIX is set to be equal to PACKAGE_PREFIX_DIR, which is set to ${CMAKE_CURRENT_LIST_DIR}/../../../

CMAKE_CURRENT_LIST_DIR should be .../fms-2023.04-*/lib/cmake/fms and PACKAGE_PREFIX_DIR (and FMS_INSTALL_PREFIX) should then be .../fms-2023.04-*

I do not understand how this file (fms-config.cmake) can exist when the install prefix three levels above it does not.
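
The path arithmetic described above can be checked with a short shell sketch. It roughly mirrors the "../../../" resolution using cd and pwd, which is an approximation of what CMake does rather than its actual implementation, and the directory name "fms-2023.04-demo" is a placeholder.

```shell
# Resolve <dir>/../../.. to an absolute path, roughly mirroring the
# get_filename_component(... "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE) call.
package_prefix() {
  (cd "$1/../../.." && pwd)
}

# Throwaway tree with the same lib/cmake/fms layout as the install tree.
demo=$(mktemp -d)
mkdir -p "$demo/fms-2023.04-demo/lib/cmake/fms"

prefix=$(package_prefix "$demo/fms-2023.04-demo/lib/cmake/fms")
echo "$prefix"
[ -d "$prefix" ] && echo "prefix exists"
```

Whenever the config file itself exists, this resolved directory exists too, so a failing existence check suggests that the variable being checked does not hold this path at all.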

@DusanJovic-NOAA
Collaborator Author

I added cat ...../fms-config.cmake to the GitHub workflow file and I see:

####### Expanded from @PACKAGE_INIT@ by configure_package_config_file() #######
####### Any changes to this file will be overwritten by the next CMake run ####
####### The input file was FMSConfig.cmake.in                            ########

get_filename_component(PACKAGE_${CMAKE_FIND_PACKAGE_NAME}_COUNTER_1 "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE)

....

which differs from the equivalent fms-config.cmake file on Hera, for example:

$ head /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/unified-env-rocky8/install/intel/2021.5.0/fms-2023.04-ywa3zzf/lib/cmake/fms/fms-config.cmake 

####### Expanded from @PACKAGE_INIT@ by configure_package_config_file() #######
####### Any changes to this file will be overwritten by the next CMake run ####
####### The input file was FMSConfig.cmake.in                            ########

get_filename_component(PACKAGE_PREFIX_DIR "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE)

macro(set_and_check _var _file)
  set(${_var} "${_file}")
  if(NOT EXISTS "${_file}")

That first line should set PACKAGE_PREFIX_DIR, which is later used to set FMS_INSTALL_PREFIX.

I do not know why this differs between the workflow cache and Hera. It could be an FMS issue, a Spack issue, or a CMake issue; I'm not sure.
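
A quick check for this condition could look like the sketch below, which greps a config file for the expected PACKAGE_PREFIX_DIR assignment. The two test files reproduce the two variants quoted above, and the file names good.cmake and bad.cmake are made up for the demo. The CI error text also prints no path between "File or directory" and "referenced by variable", which would be consistent with the checked variable being empty.

```shell
# Return 0 if the file sets PACKAGE_PREFIX_DIR via get_filename_component, 1 otherwise.
defines_package_prefix() {
  grep -q 'get_filename_component(PACKAGE_PREFIX_DIR ' "$1"
}

tmp=$(mktemp -d)
# Line as seen on Hera.
echo 'get_filename_component(PACKAGE_PREFIX_DIR "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE)' \
  > "$tmp/good.cmake"
# Line as seen in the workflow cache.
echo 'get_filename_component(PACKAGE_${CMAKE_FIND_PACKAGE_NAME}_COUNTER_1 "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE)' \
  > "$tmp/bad.cmake"

for f in "$tmp/good.cmake" "$tmp/bad.cmake"; do
  if defines_package_prefix "$f"; then
    echo "$(basename "$f"): PACKAGE_PREFIX_DIR is set"
  else
    echo "$(basename "$f"): PACKAGE_PREFIX_DIR is NOT set"
  fi
done
```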

@DusanJovic-NOAA
Collaborator Author

Looks like this is an issue with the specific version of cmake (3.29.1) currently used by the GitHub workflow:

https://gitlab.kitware.com/cmake/cmake/-/issues/25827
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9420

microsoft/vcpkg#37968

@AlexanderRichert-NOAA
Contributor

Hm that's unfortunate, good find though. Unless there's another version of cmake available in the runners, like under /usr/local somewhere, then maybe the way to go is to build it through Spack, which takes a few minutes, but at least it'll get cached. I think that would only require removing cmake from the spack external find call.

@DusanJovic-NOAA
Collaborator Author

Or we can just wait for GitHub to update their VMs with the latest version, which I hope they'll do soon; we can then remove the step that installs cmake from a tar file. Three weeks ago this worked just fine, which means that in the meantime version 3.29.1 was released and GitHub picked it up. We should also add a status message to our CMakeLists.txt to print the version of cmake used, which is useful in situations like this.
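
As a workflow-step analogue of that status message (not the CMakeLists.txt change itself, which would presumably print CMake's own CMAKE_VERSION variable), a shell sketch could report the version and flag the release identified in this thread. The function names are made up for illustration, and only the exact version 3.29.1 is flagged because that is the only version the thread names.

```shell
# Extract the version number from the first line of `cmake --version` output.
parse_cmake_version() {
  sed -n '1s/^cmake version //p'
}

# Flag only the exact release this thread identified as problematic.
is_affected() {
  [ "$1" = "3.29.1" ]
}

# Report the version if cmake is on PATH; do nothing otherwise.
if command -v cmake >/dev/null 2>&1; then
  v=$(cmake --version | parse_cmake_version)
  echo "CMake version: $v"
  if is_affected "$v"; then
    echo "warning: CMake $v is the version this thread found problematic" >&2
  fi
fi
```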

This was referenced Apr 17, 2024
@DusanJovic-NOAA
Collaborator Author

Merged via #811

@DusanJovic-NOAA DusanJovic-NOAA deleted the update_ci branch May 13, 2024 12:03
Development

Successfully merging this pull request may close these issues.

GitHub workflow run fails
4 participants