General config and run_reframe.sh for local and EESSI stack #200

Status: Open. Wants to merge 2 commits into base: main.
17 changes: 17 additions & 0 deletions CI/hortense_EESSI_ss/ci_config.sh
@@ -0,0 +1,17 @@
# Configurable items
if [ -z "${REFRAME_ARGS}" ]; then
    REFRAME_ARGS="--tag CI --tag 1_node|2_nodes --system hortense:cpu_rome_256gb"
fi

if [ -z "${UNSET_MODULEPATH}" ]; then
    export UNSET_MODULEPATH=False
    module --force purge
fi

if [ -z "${USE_EESSI_SOFTWARE_STACK}" ]; then
    export USE_EESSI_SOFTWARE_STACK=True
fi

if [ -z "${RFM_CONFIG_FILES}" ]; then
    export RFM_CONFIG_FILES="/dodrio/scratch/users/vsc46128/vsc_hortense.py"
fi
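
Every setting above is guarded by [ -z ... ], so it can be overridden from the calling environment. A minimal usage sketch, assuming run_reframe.sh selects this file via EESSI_CI_SYSTEM_NAME set to the directory name (an assumption about the CI wiring, in line with the reviewer's surf_snellius invocation further down):

# Run with the defaults defined above
EESSI_CI_SYSTEM_NAME=hortense_EESSI_ss ./run_reframe.sh

# Or override a default for one run, e.g. to target another partition
REFRAME_ARGS="--tag CI --system hortense:cpu_milan" \
    EESSI_CI_SYSTEM_NAME=hortense_EESSI_ss ./run_reframe.sh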
16 changes: 16 additions & 0 deletions CI/hortense_local_ss/ci_config.sh
@@ -0,0 +1,16 @@
# Configurable items
if [ -z "${REFRAME_ARGS}" ]; then
    REFRAME_ARGS="--tag CI --tag 1_node|2_nodes --system hortense:cpu_rome_256gb"
fi

if [ -z "${UNSET_MODULEPATH}" ]; then
    export UNSET_MODULEPATH=False
fi

if [ -z "${USE_EESSI_SOFTWARE_STACK}" ]; then
    export USE_EESSI_SOFTWARE_STACK=False
fi

if [ -z "${RFM_CONFIG_FILES}" ]; then
    export RFM_CONFIG_FILES="/dodrio/scratch/users/vsc46128/vsc_hortense.py"
fi
26 changes: 18 additions & 8 deletions CI/run_reframe.sh
@@ -50,11 +50,14 @@ fi
if [ -z "${EESSI_TESTSUITE_BRANCH}" ]; then
EESSI_TESTSUITE_BRANCH='v0.4.0'
fi
if [ -z "${EESSI_CVMFS_REPO}" ]; then
export EESSI_CVMFS_REPO=/cvmfs/software.eessi.io
fi
if [ -z "${EESSI_VERSION}" ]; then
export EESSI_VERSION=2023.06
if [ -z "${USE_EESSI_SOFTWARE_STACK}" ] | [ $USE_EESSI_SOFTWARE_STACK == "True" ]; then
@casparvl (Collaborator) commented on Nov 13, 2024:

I'm getting:

$ EESSI_CI_SYSTEM_NAME=surf_snellius ./run_reframe.sh
Running CI on host int5
/tmp/EESSI-test-suite/CI
./run_reframe.sh: line 53: [: ==: unary operator expected

Also, when USE_EESSI_SOFTWARE_STACK was empty, it didn't get set by default (which seems to me is what you were trying to do here). The reason is that you're using a single | instead of ||.

Both can be resolved by:

Suggested change:
- if [ -z "${USE_EESSI_SOFTWARE_STACK}" ] | [ $USE_EESSI_SOFTWARE_STACK == "True" ]; then
+ if [ -z "${USE_EESSI_SOFTWARE_STACK}" ] || [ "$USE_EESSI_SOFTWARE_STACK" == "True" ]; then
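
To make the failure mode concrete, here is a small standalone bash sketch (not part of the PR) contrasting the single | with the suggested || plus quoting:

#!/usr/bin/env bash
unset USE_EESSI_SOFTWARE_STACK

# Buggy form: a single '|' turns this into a pipeline of two test commands, so only the
# exit status of the second test counts. With the variable empty and unquoted, that test
# expands to '[ == "True" ]', which is exactly the "unary operator expected" error above.
if [ -z "${USE_EESSI_SOFTWARE_STACK}" ] | [ $USE_EESSI_SOFTWARE_STACK == "True" ]; then
    echo "not reached while the variable is empty"
fi

# Suggested form: '||' is a logical OR between the two tests, and quoting keeps the empty
# expansion a valid empty string, so the default is applied as intended.
if [ -z "${USE_EESSI_SOFTWARE_STACK}" ] || [ "${USE_EESSI_SOFTWARE_STACK}" == "True" ]; then
    export USE_EESSI_SOFTWARE_STACK=True
fi

echo "USE_EESSI_SOFTWARE_STACK=${USE_EESSI_SOFTWARE_STACK}"   # prints: USE_EESSI_SOFTWARE_STACK=True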

export USE_EESSI_SOFTWARE_STACK=True
Review comment (Collaborator):

It would be good to print some more of these variables in the section that also prints the reframe config etc. Makes things easier to debug.
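
A hypothetical sketch of such a printout, reusing only variable names that this PR introduces or already uses (exact placement next to the existing config output is left open):

echo "Using EESSI software stack: ${USE_EESSI_SOFTWARE_STACK}"
echo "Unsetting MODULEPATH: ${UNSET_MODULEPATH}"
echo "EESSI CVMFS repo: ${EESSI_CVMFS_REPO:-<not set>}"
echo "ReFrame config files: ${RFM_CONFIG_FILES}"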

if [ -z "${EESSI_CVMFS_REPO}" ]; then
export EESSI_CVMFS_REPO=/cvmfs/software.eessi.io
fi
if [ -z "${EESSI_VERSION}" ]; then
export EESSI_VERSION=2023.06
fi
fi
if [ -z "${RFM_CONFIG_FILES}" ]; then
export RFM_CONFIG_FILES="${TEMPDIR}/test-suite/config/${EESSI_CI_SYSTEM_NAME}.py"
@@ -73,6 +76,9 @@ if [ -z "${REFRAME_TIMEOUT}" ]; then
# This will prevent multiple ReFrame runs from piling up and exceeding the quota on our Magic Castle clusters
export REFRAME_TIMEOUT=1430m
fi
if [ -z "${UNSET_MODULEPATH}" ]; then
export UNSET_MODULEPATH=True
fi

# Create virtualenv for ReFrame using system python
python3 -m venv "${TEMPDIR}"/reframe_venv
@@ -93,9 +99,13 @@ git clone ${EESSI_CLONE_ARGS}
export PYTHONPATH="${PYTHONPATH}":"${TEMPDIR}"/test-suite/

# Start the EESSI environment
unset MODULEPATH
eessi_init_path="${EESSI_CVMFS_REPO}"/versions/"${EESSI_VERSION}"/init/bash
source "${eessi_init_path}"
if [ $UNSET_MODULEPATH == "True" ]; then
Review comment (Collaborator):

Suggested change:
- if [ $UNSET_MODULEPATH == "True" ]; then
+ if [ "$UNSET_MODULEPATH" == "True" ]; then

unset MODULEPATH
fi
if [ $USE_EESSI_SOFTWARE_STACK == "True" ]; then
Review comment (Collaborator):

Suggested change:
- if [ $USE_EESSI_SOFTWARE_STACK == "True" ]; then
+ if [ "$USE_EESSI_SOFTWARE_STACK" == "True" ]; then

eessi_init_path="${EESSI_CVMFS_REPO}"/versions/"${EESSI_VERSION}"/init/bash
source "${eessi_init_path}"
fi

Review comment (Collaborator):

Should we have an alternative to load a local module environment? Might not be needed on your systems. On our system we'd have to load a meta-module (e.g. 2023, or 2024) to make a software environment available.
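
One possible shape for such an alternative, sketched under the assumption of a new LOCAL_MODULE_ENV variable; the meta-module name is the reviewer's example and none of this is in the PR:

if [ "${USE_EESSI_SOFTWARE_STACK}" != "True" ] && [ -n "${LOCAL_MODULE_ENV}" ]; then
    # e.g. LOCAL_MODULE_ENV=2023 on a system that exposes its software via a yearly meta-module
    module load "${LOCAL_MODULE_ENV}"
fi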

# Needed in order to make sure the reframe from our TEMPDIR is first on the PATH,
# prior to the one shipped with the 2021.12 compat layer
41 changes: 26 additions & 15 deletions config/vsc_hortense.py
@@ -2,6 +2,7 @@
# https://docs.vscentrum.be/en/latest/gent/tier1_hortense.html
#
# authors: Samuel Moors (VUB-HPC), Kenneth Hoste (HPC-UGent)
import os

from reframe.core.backends import register_launcher
from reframe.core.launchers import JobLauncher
@@ -21,6 +22,16 @@ def command(self, job):
        return ['mympirun', '--hybrid', str(job.num_tasks_per_node)]


eessi_cvmfs_repo = os.getenv('EESSI_CVMFS_REPO', None)
if eessi_cvmfs_repo is not None:
    # EESSI software stack in use: purge the site modules before the EESSI init and use plain mpirun
    prepare_eessi_init = 'module --force purge'
    launcher = 'mpirun'
    mpi_module = ''
else:
    # Local software stack: no purge, launch through the site's vsc-mympirun module
    prepare_eessi_init = ''
    launcher = 'mympirun'
    mpi_module = 'vsc-mympirun'

site_configuration = {
'systems': [
{
@@ -32,13 +43,13 @@ def command(self, job):
{
'name': 'cpu_rome_256gb',
'scheduler': 'slurm',
'prepare_cmds': [common_eessi_init()],
'prepare_cmds': [prepare_eessi_init, common_eessi_init()],
'access': hortense_access + ['--partition=cpu_rome'],
'environs': ['default'],
'descr': 'CPU nodes (AMD Rome, 256GiB RAM)',
'max_jobs': 20,
'launcher': 'mympirun',
'modules': ['vsc-mympirun'],
'launcher': launcher,
'modules': [mpi_module],
'processor': {
'num_cpus': 128,
'num_sockets': 2,
@@ -64,13 +75,13 @@ def command(self, job):
{
'name': 'cpu_rome_512gb',
'scheduler': 'slurm',
'prepare_cmds': [common_eessi_init()],
'prepare_cmds': [prepare_eessi_init, common_eessi_init()],
'access': hortense_access + ['--partition=cpu_rome_512'],
'environs': ['default'],
'descr': 'CPU nodes (AMD Rome, 512GiB RAM)',
'max_jobs': 20,
'launcher': 'mympirun',
'modules': ['vsc-mympirun'],
'launcher': launcher,
'modules': [mpi_module],
'processor': {
'num_cpus': 128,
'num_sockets': 2,
@@ -96,13 +107,13 @@ def command(self, job):
{
'name': 'cpu_milan',
'scheduler': 'slurm',
'prepare_cmds': [common_eessi_init()],
'prepare_cmds': [prepare_eessi_init, common_eessi_init()],
'access': hortense_access + ['--partition=cpu_milan'],
'environs': ['default'],
'descr': 'CPU nodes (AMD Milan, 256GiB RAM)',
'max_jobs': 20,
'launcher': 'mympirun',
'modules': ['vsc-mympirun'],
'launcher': launcher,
'modules': [mpi_module],
'processor': {
'num_cpus': 128,
'num_sockets': 2,
@@ -128,13 +139,13 @@ def command(self, job):
{
'name': 'gpu_rome_a100_40gb',
'scheduler': 'slurm',
'prepare_cmds': [common_eessi_init()],
'prepare_cmds': [prepare_eessi_init, common_eessi_init()],
'access': hortense_access + ['--partition=gpu_rome_a100_40'],
'environs': ['default'],
'descr': 'GPU nodes (A100 40GB)',
'max_jobs': 20,
'launcher': 'mympirun',
'modules': ['vsc-mympirun'],
'launcher': launcher,
'modules': [mpi_module],
'processor': {
'num_cpus': 48,
'num_sockets': 2,
@@ -172,13 +183,13 @@ def command(self, job):
{
'name': 'gpu_rome_a100_80gb',
'scheduler': 'slurm',
'prepare_cmds': [common_eessi_init()],
'prepare_cmds': [prepare_eessi_init, common_eessi_init()],
'access': hortense_access + ['--partition=gpu_rome_a100_80'],
'environs': ['default'],
'descr': 'GPU nodes (A100 80GB)',
'max_jobs': 20,
'launcher': 'mympirun',
'modules': ['vsc-mympirun'],
'launcher': launcher,
'modules': [mpi_module],
'processor': {
'num_cpus': 48,
'num_sockets': 2,
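
For reference, a hedged sketch of how the new config and variables fit together in a manual run outside the CI wrapper; the config path is the default hard-coded in the ci_config.sh files above, and the tags and system mirror REFRAME_ARGS (the usual -c/--checkpath pointing at the EESSI test-suite checks is omitted):

# Point ReFrame at the new Hortense configuration
export RFM_CONFIG_FILES="/dodrio/scratch/users/vsc46128/vsc_hortense.py"

# For the EESSI software stack: export EESSI_CVMFS_REPO (as run_reframe.sh does) so the
# config selects the 'mpirun' launcher, then initialise EESSI. Leave it unset to fall
# back to the local stack with 'mympirun' and the vsc-mympirun module.
export EESSI_CVMFS_REPO=/cvmfs/software.eessi.io
source "${EESSI_CVMFS_REPO}/versions/2023.06/init/bash"

reframe --tag CI --tag '1_node|2_nodes' --system hortense:cpu_rome_256gb -r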