Commit 2af7cb7 ("minors")
KasperSkytte committed Nov 10, 2023 (1 parent: 62f7c53)
1 changed file: docs/slurm/request.md (7 additions, 6 deletions)

The `sbatch` command is in many cases the best way to use SLURM. An example batch script:

```bash
#!/usr/bin/bash -l
#SBATCH --job-name=minimap2test
#SBATCH --output=job_%j_%x.txt
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
# ... (remaining #SBATCH options and commands collapsed in the diff view) ...
minimap2 -t 10 database.fastq input.fastq > out.file
```

???+ Important
    The `bash -l` in the top "shebang" line is required for the compute nodes to be able to load conda environments correctly.
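
A minimal sketch of what this enables inside the batch script, assuming a hypothetical conda environment named `minimap2` has already been created:

```bash
# because of `bash -l` in the shebang, the login shell initialises conda on the
# compute node, so environments can be activated directly in the batch script
conda activate minimap2
minimap2 --version
```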

Submit the batch script to the SLURM job queue using `sbatch minimap2test.sh`, and it will start once the requested resources are available (also taking into account your past usage, the priorities of other jobs, etc.; all three job submission commands do that). If you set the `--mail-user` and `--mail-type` arguments, you should get a notification email once the job starts and finishes, with additional details such as how many resources you actually used compared to what you requested. This is essential information for future jobs to avoid overbooking. As the job is handled in the background by the SLURM daemons on the individual compute nodes, you won't see any output in the terminal; it is instead written to the file defined by `--output`. The line `--output=job_%j_%x.txt` above results in an output file in the current working directory named `job_<jobid>_minimap2test.txt`, since `%j` expands to the job ID and `%x` to the job name. To follow along, use for example `tail -f job_123_minimap2test.txt`.
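
A sketch of the full submit-and-monitor workflow, assuming the script above and a hypothetical job ID of 123:

```bash
sbatch minimap2test.sh             # prints: Submitted batch job 123
squeue -u $USER                    # check the job's state (PENDING/RUNNING) in the queue
tail -f job_123_minimap2test.txt   # follow the job's output while it runs
```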


## Most essential options
There are plenty of options for the SLURM job submission commands, but below are the most important ones for our current setup and common use cases. If you need anything else, you can start with the [SLURM cheatsheet](https://slurm.schedmd.com/pdfs/summary.pdf), or else refer to the SLURM documentation for the individual commands [`srun`](https://slurm.schedmd.com/srun.html), [`salloc`](https://slurm.schedmd.com/salloc.html), and [`sbatch`](https://slurm.schedmd.com/sbatch.html).
If you need to use one or more GPUs you need to specify `--partition=biocloud-gpu` and set `--gres=gpu:x`, where `x` refers to the number of GPUs you need. Please don't do CPU work on the `biocloud-gpu` partition unless you also need a GPU.
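
A minimal sketch of the relevant batch script for a GPU job, assuming a hypothetical single-GPU run (job name, CPU count, and memory are placeholders to adjust):

```bash
#!/usr/bin/bash -l
#SBATCH --job-name=gputest
#SBATCH --partition=biocloud-gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G

# list the GPU(s) allocated to the job, then run the actual GPU workload
nvidia-smi
```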

## How many resources should I request for my job(s)?
Exactly how many resources your job(s) need(s) is something you have to experiment with and learn over time from past experience. It's important to do a bit of experimentation before submitting large jobs to obtain a qualified guess, since utilization of the allocated resources across the cluster is ultimately based on people's own assessments alone. Below are some tips regarding CPU and memory.

### CPUs/threads
In general, the number of CPUs you book only affects how long the job takes to finish. Since most tools don't use 100% of each and every allocated thread throughout the run (due to, for example, I/O delays, internal thread communication, and single-threaded job steps), our partitions are set with an **oversubscription factor of 1.5** (not yet, just preparing docs for it) to optimize resource utilization. This means that SLURM will in total allocate more CPUs than there are physical cores or hyper-threads on each compute node. For example, SLURM will allocate up to 288 CPUs across all SLURM jobs on a compute node with 192 threads. The number of threads is thus not a hard limit, unlike the physical amount of memory: SLURM will never exceed the maximum physical memory of each compute node. Instead, jobs are killed if they exceed their allocated amount of memory, or are not allowed to start in the first place.
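
One way to gauge whether a finished job actually used the CPUs it booked is SLURM's accounting data, assuming accounting is enabled on the cluster (the job ID below is hypothetical):

```bash
# TotalCPU (consumed CPU time) compared to AllocCPUS multiplied by Elapsed
# gives a rough estimate of the job's CPU efficiency
sacct -j 123 --format=JobID,JobName,AllocCPUS,Elapsed,TotalCPU
```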

### Memory
Requesting a sensible maximum amount of memory is important to avoid crashing jobs. It's generally best to **allocate more memory** than you expect to need, so that the job doesn't crash and the resources already spent on it don't go to waste when they could have been used for something else. To obtain a qualified guess you can start the job based on an initial expectation and set a job time limit of maybe 5-10 minutes, just to see whether it crashes by exceeding the allocated resources; if not, you will see the maximum memory usage for the job in the email notification report. Then adjust accordingly and submit again with 10-15% more than the observed maximum. Different steps of a workflow will in many cases unavoidably need more memory than others, so it might be a good idea to either split the job into multiple jobs or use workflow tools that support cluster execution, for example [snakemake](https://snakemake.readthedocs.io/en/stable/executing/cluster.html).
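
A sketch of such a trial run, showing only the directives to adjust in a batch script like the one above (the 40G guess is a placeholder):

```bash
# short trial run: it either crashes early if the memory guess is too low, or the
# email report shows the maximum memory used so far; then resubmit with 10-15% headroom
#SBATCH --time=00:10:00
#SBATCH --mem=40G
```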

Our compute nodes have plenty of memory, but some tools require lots of it. If you know that your job is going to use a lot of memory, say 1TB, you might as well also request more CPUs, since your job will likely occupy a full compute node alone, and you can then finish the job faster. This of course depends on which compute node your job is allocated to, so you might want to request resources on specific compute nodes using the `nodelist` option; refer to the [hardware overview](../index.md).
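
A sketch of the relevant directives for such a large-memory job, assuming a hypothetical node name `bio-node01` (look up the actual node names and sizes in the hardware overview):

```bash
# pin the job to a specific large-memory node and request extra CPUs while occupying it
#SBATCH --nodelist=bio-node01
#SBATCH --mem=1000G
#SBATCH --cpus-per-task=64
```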
