diff --git a/docs/slurm_intro.md b/docs/slurm_intro.md
index b210bbd..d2e996e 100644
--- a/docs/slurm_intro.md
+++ b/docs/slurm_intro.md
@@ -5,69 +5,190 @@
 ```
 
 ```{instructor-note}
-- Approx timing: 13.00-14.20 (10 mn break)
+- Approx timing: 13.00-14.20 (10 min break)
 - Theory
 ```
 
+```{info}
+- For **this course**, we use the **material on this page**.
+- A SLURM introduction can otherwise be found here:
+```
+
+## The compute nodes
+
+When you are logged in, you are on a login node.
+There are two types of nodes:
+
+Type          |Purpose
+--------------|--------------------------
+Login node    |Submit jobs to the compute nodes, do light tasks. You share 2 cores and 15 GB RAM with the active users within your project
+Compute nodes |Do the heavy calculations, driven either by scripts or by an interactive session
+
+Bianca contains hundreds of nodes, each isolated from the others and from the Internet.
+
+```mermaid
+
+    graph TB
+
+    Node1 -- interactive --> SubGraph2Flow
+    Node1 -- sbatch --> SubGraph2Flow
+    subgraph "Snowy"
+    SubGraph2Flow(calculation nodes)
+        end
+
+    thinlinc -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
+    terminal/thinlinc -- usr --> Node1
+    terminal -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
+    Node1 -- usr-sensXXX + 2FA + no VPN ----> SubGraph1Flow
+
+    subgraph "Bianca"
+    SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
+    private -- interactive --> calcB(calculation nodes)
+    private -- sbatch --> calcB
+        end
+
+    subgraph "Rackham"
+    Node1[Login] -- interactive --> Node2[calculation nodes]
+    Node1 -- sbatch --> Node2
+        end
+```
+
 ## Slurm, sbatch, the job queue
 
-- Problem: 1000 users, 500 nodes, 10k cores
-- Need a queue:
+- Problem: _1000 users, 300 nodes, 5000 cores_
+- We need a **queue**:
+
+  - [Slurm](https://slurm.schedmd.com/) is a job scheduler
+
+- You define **jobs** to be run on the compute nodes and send them to the queue.
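Once jobs are in the queue, you can inspect them from the login node. A minimal sketch (the `squeue` command only exists on a machine with Slurm installed, so it is guarded here and harmless elsewhere):

```shell
# Sketch: list your own queued and running jobs from the login node.
# squeue is only present where Slurm is installed, so guard the call.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER"
else
    echo "Slurm is not available on this machine"
fi
```

On the cluster itself you would simply run `squeue -u $USER`; `squeue` without flags shows the whole queue.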
+
+### Jobs
+- Job = what happens during the booked time
+- Described in
+  - a script file or
+  - the command line (which takes priority over the script)
+- The definition of a job:
+  - Slurm parameters (**flags**)
+  - Load software modules
+  - (Navigate in the file system)
+  - Run program(s)
+  - (Collect output)
+  - ... and more
+
+```{admonition} Some keywords
+  - A program may run _serially_ and then needs only ONE _compute thread_, which occupies 1 core, a physical unit of the CPU on the node.
+  - You should most often just book 1 core. If you require more than 7 GB of RAM, you can allocate more cores and you will get multiples of 7 GB.
+  - A program may run in _parallel_ and then needs either several _threads_ or several _tasks_, both occupying several cores.
+  - If you need all 128 GB of RAM (actually 112) or all 16 cores for your job, book a complete node.
+```
+
+### Slurm parameters
+- 1 mandatory setting for jobs:
+  - Which compute project? (`-A`)
+- 3 settings you really should set:
+  - Which queue or partition? (`-p`)
+    - ``core`` for most jobs, and the **default**
+    - ``node`` for larger jobs
+    - ``devcore`` and ``devel`` for short development jobs and tests
+  - How many cores? (`-n`)
+    - up to 16 for a core job
+  - How long at most? (`-t`)
+- If in doubt:
+  - `-p core`
+  - `-n 1`
+  - `-t 10-00:00:00`
+
+### The queue
+
+- How does the queue work? Let's look graphically at the jobs presently running.
 
 ![Image](./img/queue1.png)
 
-- x-axis: cores, one thread per core
-- y-axis: time
+
+- *x-axis: cores, one thread per core*
+- *y-axis: time*

-- [Slurm](https://slurm.schedmd.com/) is a jobs scheduler
-- Plan your job and but in the slurm job batch (sbatch)
-  `sbatch ` or
-  `sbatch `
-- Easiest to schedule *single-threaded*, short jobs
+- We already see some holes where we may fit jobs!
+- Let's see which types of jobs can fit!
 
 ![Image](./img/queue2.png)
+
+
+- 4 one-core jobs can run immediately (or a 4-core wide job).
+
+  - *The jobs are too long to fit at cores 9-13.*
+
 ![Image](./img/queue3.png)
- - The jobs are too long to fit in core number 9-13. +- A 5-core job has to wait.* -- Right: A 5-core job has to wait. + - *Too long to fit in cores 9-13 and too wide to fit in the last cores.* - - Too long to fit in cores 9-13 and too wide to fit in the last cores. +- Easiest to schedule *single-threaded*, short jobs -## Jobs -- Job = what happens during booked time -- Described in a Bash script file - - Slurm parameters (**flags**) - - Load software modules - - (Move around file system) - - Run programs - - (Collect output) -- ... and more - -## Slurm parameters -- 1 mandatory setting for jobs: - - Which compute project? (`-A`) - - For example, if your project is named ``NAISS 2017/1-334`` you specify ``-A naiss2017-1-234`` -- 3 settings you really should set: - - Type of queue? (`-p`) - - core, node, (for short development jobs and tests: devcore, devel) - - How many cores? (`-n`) - - up to 16 (20 on Rackham) for core job - - How long at most? (`-t`) -- If in doubt: - - -`p core` - - -`n 1` - - `-t 7-00:00:00` +```{tip} + + - You don't see the queue graphically, however. + - But, overall: + - short and narrow jobs will start fast + - test and development jobs can get use of specific development nodes if they are shorter than 1 hour and uses up to two nodes. + - waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node + +### Core-hours + +- Remember that you are charged CPU-hours according to booked #cores x hours +- Example 1: 60 hours with 2 cores = 120 CPU-hours +- Example 2: 12 hours with a full node = 192 hours + - Waste of resources unless you have a parallel program using all cores or need all the memory, e.g. 
128 GB per node
+
+### Choices
+- To work interactively with your data, or to develop or test
+  - Run an **interactive session**
+  - ``$ interactive ...``
+- If you _don't_ need any live interaction with your workflow/analysis/simulation
+  - Send your job to the Slurm job batch (sbatch)
+  - `$ sbatch ` or
+  - `$ sbatch `
+
+```mermaid
+flowchart TD
+  UPPMAX(What to run on which node?)
+  operation_type{What type of operation/calculation?}
+  interaction_type{What type of interaction?}
+  login_node(Work on login node)
+  interactive_node(Work on interactive node)
+  calculation_node(Schedule for calculation node)
+
+  UPPMAX-->operation_type
+  operation_type-->|light,short|login_node
+  operation_type-->|heavy,long|interaction_type
+  interaction_type-->|Direct|interactive_node
+  interaction_type-->|Indirect|calculation_node
+```
 
-![Image](./img/queue1.png)
+### What kind of compute work are you doing?
+- Compute bound
+  - you use mainly CPU power
+  - does the software support threads or MPI?
+    - **Threads/OpenMP** are rather often supported. **Use several cores!**
+    - **MPI** (Message Passing Interface) allows for inter-node jobs, but is seldom supported by bioinformatics software. **You could use several nodes!**
+- Memory bound
+  - the bottleneck is memory: allocating, copying and duplicating data
+  - use more cores, up to 1 full node, perhaps using a "fat" (high-memory) node.
 
-- Where should it run? (`-p node` or `-p core`)
-- Use a whole node or just part of it?
-  - 1 node = 20 cores (16 on Bianca & Snowy)
-  - 1 hour walltime = 20 core hours = expensive
-  - Waste of resources unless you have a parallel program or need all the memory, e.g.
128 GB per node
-- Default value: core
+
+```{admonition} "Slurm Cheat Sheet"
+
+  - ``-A`` project number
+  - ``-t`` wall time
+  - ``-n`` number of cores
+  - ``-N`` number of nodes (can only be used if your code is parallelized with MPI)
+  - ``-p`` partition
+    - ``core`` is the default and works for jobs narrower than 16 cores
+    - ``node`` can be used if you need the whole node and its memory
+```
 
 ### Walltime at the different clusters
 
@@ -75,6 +196,348 @@
 - Snowy: 30 days
 - Bianca: 10 days
+
+## Interactive jobs
+- Most work runs best as submitted batch jobs, but e.g. development needs responsiveness
+- Interactive jobs are high-priority but limited in `-n` and `-t`
+- They quickly give you a job and log you in to the compute node
+- They require the same Slurm parameters as other jobs
+- Log in to the compute node
+  - `$ interactive ...`
+- Log out with `Ctrl-D` or `logout`
+
+- To use an interactive node, in a terminal, type:
+
+```bash
+interactive -A [project name] -p core -n [number_of_cores] -t [session_duration]
+```
+
+For example:
+
+```bash
+interactive -A sens2023598 -p core -n 2 -t 8:0:0
+```
+
+This starts an interactive session using project `sens2023598`
+that uses 2 cores and has a maximum duration of 8 hours.
+
+```{tip}
+
+  ![copy-paste](./img/copy_paste.PNG)
+```
+
+### Try interactive and run RStudio
+
+```{admonition} Copied to [intermediate/rstudio.md](intermediate/rstudio.md)
+
+  One may consider linking to that page :-)
+```
+
+We recommend using at least two cores for RStudio, and to get those resources, you should start an interactive job.
+
+```{admonition} Type-along
+  Use **ThinLinc**
+
+  - Start an **interactive session** on a compute node (2 cores)
+    - If you already have an interactive session going on, use that.
-    - If you don't find it, do
+
+      ``$ squeue``
+
+    - find your session and ssh to it, like:
+
+      ``$ ssh sens2023598-b9``
+
+  - ``$ interactive -A sens2023598 -p devcore -n 2 -t 60:00``
+
+  - Once the interactive job has begun, you need to load the needed modules, even if you had loaded them before on the login node
+  - Check which node you are on:
+
+    `$ hostname`
+
+  - Also try:
+
+    `$ srun hostname`
+
+    - This gives as many output lines as the number of cores (tasks) you allocated.
+    - How many in this case?
+
+  - If the name before ``.bianca.uppmax.uu.se`` ends with bXX, you are on a compute node!
+    - The login node is named ``sens2023598-bianca``
+    - You can also probably see this information in your prompt, like:
+      ``[bjornc@sens2023598-b9 ~]$``
+
+  - Load an RStudio module and an R_packages module (if you don't load one, you will have to stick with R/3.6.0), and run "rstudio" from there.
+
+    `$ ml R_packages/4.2.1`
+
+    `$ ml RStudio/2022.07.1-554`
+
+  - **Start rstudio**, keeping the terminal active (`&`)
+
+    `$ rstudio &`
+
+  - Slow to start? That depends on:
+    - the number of packages
+    - whether you saved a lot of data in your RStudio workspace, which is read during start-up.
+
+  - **Quit RStudio**!
+  - **Log out** from the interactive session with `Ctrl-D` or `logout` or `exit`
+```
+
+## Job scripts (batch)
+
+- Batch scripts can be written in any scripting language. We will use Bash
+- Make the first line `#!/bin/bash`
+  - It is good practice to end that line with ``-l``, to reload a fresh environment with no modules loaded.
+  - This makes sure that you don't enable other software or versions that may interfere with what you want to do in the job.
+- Before the job content, add the batch flags on lines starting with the keyword `#SBATCH`, like:
+  - ``#SBATCH -t 2:00:00``
+  - ``#SBATCH -p core``
+  - ``#SBATCH -n 3``
+- Lines starting with `#` are ignored by `bash`, so the file can also run as an ordinary Bash script
+- if running the script with the command `sbatch
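Putting the flags above together, here is a minimal sketch of a complete batch script. The project name `sens2023598` and module version are taken from the examples earlier on this page; `my_analysis.R` is a made-up file name:

```shell
# Create a minimal Slurm batch script (a sketch of the structure described above).
cat > my_job.sh << 'EOF'
#!/bin/bash -l
#SBATCH -A sens2023598
#SBATCH -p core
#SBATCH -n 2
#SBATCH -t 2:00:00

# Load modules inside the job, even if they were loaded on the login node
ml R_packages/4.2.1

# Run the program (my_analysis.R is a made-up script name)
Rscript my_analysis.R
EOF

# On the cluster you would now submit it with:
#   sbatch my_job.sh
```

Because the `#SBATCH` lines are comments to Bash, the same file can also be run directly as an ordinary shell script.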