ADD: early pytest on slurm
matbun committed Oct 18, 2023
1 parent 416c879 commit a718b7a
Showing 6 changed files with 55 additions and 69 deletions.
18 changes: 16 additions & 2 deletions README.md
@@ -96,11 +96,25 @@ adding the `dev` extra:
pip install -e .[dev]
```

To **run tests** on itwinai package:
#### Test with `pytest`

To run tests on itwinai package:

```bash
# Activate env
micromamba activate ./.venv-pytorch # or ./.venv-tf

pytest -v tests/
pytest -v -m "not slurm" tests/
```

However, some tests are intended to be executed only on an HPC system
where SLURM is available. They are marked with the "slurm" tag. To also
run those tests, use the dedicated job script:

```bash
sbatch tests/slurm_tests_startscript

# Upon completion, check the output:
cat job.err
cat job.out
```
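Recent pytest versions warn (`PytestUnknownMarkWarning`) when tests use a custom marker such as `slurm` without registering it. A minimal sketch of one way to register it, assuming a `conftest.py` at the repo root (the project may equally declare it in `pytest.ini` or `pyproject.toml`):

```python
# conftest.py (hypothetical location): register the custom "slurm" marker
# so pytest recognizes it and `-m slurm` / `-m "not slurm"` select cleanly
# without unknown-marker warnings.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "slurm: marks tests that need a SLURM scheduler (run via sbatch)",
    )
```
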
32 changes: 32 additions & 0 deletions tests/slurm_tests_startscript
@@ -0,0 +1,32 @@
#!/bin/bash

# general configuration of the job
#SBATCH --job-name=PrototypeTest
#SBATCH --account=intertwin
#SBATCH --mail-user=
#SBATCH --mail-type=ALL
#SBATCH --output=job.out
#SBATCH --error=job.err
#SBATCH --time=00:30:00

# configure node and process count on the CM
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-node=4

# SBATCH --exclusive

# gres options have to be disabled for deepv
#SBATCH --gres=gpu:4

# load modules
ml --force purge
ml Stages/2023 StdEnv/2023 NVHPC/23.1 OpenMPI/4.1.4 cuDNN/8.6.0.163-CUDA-11.7 Python/3.10.4 HDF5 libaio/0.3.112 GCC/11.3.0

# shellcheck source=/dev/null
source ~/.bashrc

# from repo's root dir
srun micromamba run -p ./.venv-pytorch pytest -v -m slurm tests/
2 changes: 1 addition & 1 deletion tests/torch/distribtued_decorator.py
@@ -15,7 +15,7 @@
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

from itwinai.backend.torch.trainer import distributed
from itwinai.torch.trainer import distributed


class Net(nn.Module):
9 changes: 5 additions & 4 deletions tests/torch/test_distribtued_training.py
@@ -7,18 +7,19 @@
@pytest.mark.slurm
def test_distributed_decorator():
"""Test function decorator. Needs torchrun cmd."""
cmd = ("micromamba run -p ./ai/.venv-pytorch "
cmd = ("micromamba run -p ./.venv-pytorch "
"torchrun --nnodes=1 --nproc_per_node=2 --rdzv_id=100 "
"--rdzv_backend=c10d --rdzv_endpoint=localhost:29400 "
"tests/backend/torch/distribtued_decorator.py")
"tests/torch/distribtued_decorator.py")
subprocess.run(cmd.split(), check=True)


@pytest.mark.skip(reason="TorchTrainer not implemented yet")
@pytest.mark.slurm
def test_distributed_trainer():
"""Test vanilla torch distributed trainer. Needs torchrun cmd."""
cmd = ("micromamba run -p ./ai/.venv-pytorch "
cmd = ("micromamba run -p ./.venv-pytorch "
"torchrun --nnodes=1 --nproc_per_node=2 --rdzv_id=100 "
"--rdzv_backend=c10d --rdzv_endpoint=localhost:29400 "
"tests/backend/torch/torch_dist_trainer.py")
"tests/torch/torch_dist_trainer.py")
subprocess.run(cmd.split(), check=True)
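The tests above share one pattern: `subprocess.run(cmd.split(), check=True)`. With `check=True`, a non-zero exit status from the launched `torchrun` process raises `CalledProcessError`, so the pytest test fails loudly instead of silently passing. A minimal sketch of the pattern (the helper name is illustrative, not part of the repo):

```python
import subprocess


def run_or_fail(cmd: str) -> None:
    """Run a space-separated command; raise CalledProcessError on failure.

    Note: a naive str.split() would mangle quoted arguments; the torchrun
    invocations above contain none, so splitting on whitespace is safe here.
    """
    subprocess.run(cmd.split(), check=True)
```
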
2 changes: 1 addition & 1 deletion tests/torch/torch_dist_trainer.py
@@ -11,7 +11,7 @@
from torch.utils.data import DataLoader
from torchvision import transforms, datasets

from itwinai.backend.torch.trainer import TorchTrainer
from itwinai.torch.trainer import TorchTrainer


class Net(nn.Module):
61 changes: 0 additions & 61 deletions tests/torch/torch_dist_trainer2.py

This file was deleted.
