From 161dabe96c033c83935a11912013204c292ca1e1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Claremar?=
<70746791+bclaremar@users.noreply.github.com>
Date: Sun, 14 Jan 2024 16:13:19 +0100
Subject: [PATCH] slurm_intro.md bianca course input
---
docs/slurm_intro.md | 560 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 509 insertions(+), 51 deletions(-)
diff --git a/docs/slurm_intro.md b/docs/slurm_intro.md
index b210bbd..d2e996e 100644
--- a/docs/slurm_intro.md
+++ b/docs/slurm_intro.md
@@ -5,69 +5,190 @@
```
```{instructor-note}
-- Approx timing: 13.00-14.20 (10 mn break)
+- Approx timing: 13.00-14.20 (10 min break)
- Theory
```
+```{info}
+- For **this course**, we use the **material on this page**.
+- A SLURM introduction can otherwise be found here:
+```
+
+## The compute nodes
+
+When you are logged in, you are on a login node.
+There are two types of nodes:
+
+Type |Purpose
+------------|--------------------------
+Login node |Start jobs for worker nodes, do easy things. You share 2 cores and 15 GB RAM with active users within your project
+Compute nodes |Do hard calculations, either from scripts or an interactive session
+
+Bianca contains hundreds of nodes, each of which is isolated from the others and from the Internet.
+
+```mermaid
+
+ graph TB
+
+ Node1 -- interactive --> SubGraph2Flow
+ Node1 -- sbatch --> SubGraph2Flow
+ subgraph "Snowy"
+ SubGraph2Flow(calculation nodes)
+ end
+
+ thinlinc -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
+ terminal/thinlinc -- usr --> Node1
+ terminal -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
+ Node1 -- usr-sensXXX + 2FA + no VPN ----> SubGraph1Flow
+
+ subgraph "Bianca"
+ SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
+ private -- interactive --> calcB(calculation nodes)
+ private -- sbatch --> calcB
+ end
+
+ subgraph "Rackham"
+ Node1[Login] -- interactive --> Node2[calculation nodes]
+ Node1 -- sbatch --> Node2
+ end
+```
+
## Slurm, sbatch, the job queue
-- Problem: 1000 users, 500 nodes, 10k cores
-- Need a queue:
+- Problem: _1000 users, 300 nodes, 5000 cores_
+- We need a **queue**:
+
+ - [Slurm](https://slurm.schedmd.com/) is a job scheduler
+
+- You define **jobs** to be run on the compute nodes and therefore sent to the queue.
+
+### Jobs
+- Job = what happens during booked time
+- Described in
+ - a script file or
+ - the command-line (priority over script)
+- The definitions of a job:
+ - Slurm parameters (**flags**)
+ - Load software modules
+ - (Navigate in file system)
+ - Run program(s)
+ - (Collect output)
+ - ... and more
+
+
+```{info} "Some keywords"
+ - A program may run _serially_ and then needs only ONE _compute thread_, which occupies 1 core, a physical unit of the CPU on the node.
+ - You should most often book just 1 core. If you require more than 7 GB of memory, you can allocate more cores and you will get multiples of 7 GB.
+ - A program may run in _parallel_ and then needs either several _threads_ or several _tasks_, both occupying several cores.
+ - If you need all 128 GB RAM (actually 112) or all 16 cores for your job, book a complete node.
+```
+
+### Slurm parameters
+- 1 mandatory setting for jobs:
+ - Which compute project? (`-A`)
+- 3 settings you really should set:
+ - Type of queue or partition? (`-p`)
+ - ``core`` for most jobs (the **default**)
+ - ``node`` for larger jobs
+ - ``devcore``, ``devel`` for short development jobs and tests
+ - How many cores? (`-n`)
+ - up to 16 for a core job
+ - How long at most? (`-t`)
+- If in doubt:
+ - `-p core`
+ - `-n 1`
+ - `-t 10-00:00:00`
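+
+As a hedged sketch of how these flags combine on the command line: the project ID `sens2023598` is the example project used in the interactive section of this page, and `my_job.sh` is a hypothetical script name, not a file that ships with the course.
+
+```bash
+# Assumption: my_job.sh is a placeholder for your own job script.
+# Book 1 core in the core partition for at most 10 minutes (minutes:seconds).
+sbatch -A sens2023598 -p core -n 1 -t 10:00 my_job.sh
+```
+
+Flags given on the command line like this take priority over the same flags set inside the script.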
+
+### The queue
+
+- How does the queue work?
+
+- Let's look graphically at jobs presently running.
![Image](./img/queue1.png)
-- x-axis: cores, one thread per core
-- y-axis: time
+
+- *x-axis: cores, one thread per core*
+- *y-axis: time*
-- [Slurm](https://slurm.schedmd.com/) is a jobs scheduler
-- Plan your job and but in the slurm job batch (sbatch)
- `sbatch ` or
- `sbatch `
-- Easiest to schedule *single-threaded*, short jobs
+- We see some holes where we may fit jobs already!
+- Let's see which types of jobs can fit!
![Image](./img/queue2.png)
+
+
+
+- *4 one-core jobs can run immediately (or a 4-core wide job).*
+
+ - *The jobs are too long to fit in cores 9-13.*
+
![Image](./img/queue3.png)
-- Left: 4 one-core jobs can run immediately (or a 4-core wide job).
+
- - The jobs are too long to fit in core number 9-13.
+- *A 5-core job has to wait.*
-- Right: A 5-core job has to wait.
+ - *Too long to fit in cores 9-13 and too wide to fit in the last cores.*
- - Too long to fit in cores 9-13 and too wide to fit in the last cores.
+- Easiest to schedule *single-threaded*, short jobs
-## Jobs
-- Job = what happens during booked time
-- Described in a Bash script file
- - Slurm parameters (**flags**)
- - Load software modules
- - (Move around file system)
- - Run programs
- - (Collect output)
-- ... and more
-
-## Slurm parameters
-- 1 mandatory setting for jobs:
- - Which compute project? (`-A`)
- - For example, if your project is named ``NAISS 2017/1-334`` you specify ``-A naiss2017-1-234``
-- 3 settings you really should set:
- - Type of queue? (`-p`)
- - core, node, (for short development jobs and tests: devcore, devel)
- - How many cores? (`-n`)
- - up to 16 (20 on Rackham) for core job
- - How long at most? (`-t`)
-- If in doubt:
- - -`p core`
- - -`n 1`
- - `-t 7-00:00:00`
+```{tip}
+
+ - You cannot see the queue graphically, however.
+ - But, overall:
+ - short and narrow jobs will start fast
+ - test and development jobs can make use of dedicated development nodes if they are shorter than 1 hour and use at most two nodes.
+ - booking a whole node is a waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
+```
+
+### Core-hours
+
+- Remember that you are charged CPU-hours according to the number of booked cores x hours
+- Example 1: 60 hours with 2 cores = 120 CPU-hours
+- Example 2: 12 hours with a full node (16 cores) = 192 CPU-hours
+ - Waste of resources unless you have a parallel program using all cores or need all the memory, e.g. 128 GB per node
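+
+The two examples can be checked with plain Bash integer arithmetic. This is a minimal sketch, assuming 16 cores per node as stated for Bianca earlier on this page:
+
+```bash
+# Charged CPU-hours = booked cores x booked hours
+cores=2; hours=60
+echo "$(( cores * hours )) CPU-hours"   # Example 1: prints "120 CPU-hours"
+
+# Assumption: a full node means all 16 cores are booked
+cores=16; hours=12
+echo "$(( cores * hours )) CPU-hours"   # Example 2: prints "192 CPU-hours"
+```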
+
+### Choices
+- Work interactively with your data or develop or test
+ - Run an **Interactive session**
+ - ``$ interactive ...``
+- If you _don't_ need any live interaction with your workflow/analysis/simulation
+ - Submit your job to the Slurm batch queue (sbatch)
+ - `$ sbatch ` or
+ - `$ sbatch `
+
+```mermaid
+flowchart TD
+ UPPMAX(What to run on which node?)
+ operation_type{What type of operation/calculation?}
+ interaction_type{What type of interaction?}
+ login_node(Work on login node)
+ interactive_node(Work on interactive node)
+ calculation_node(Schedule for calculation node)
+
+ UPPMAX-->operation_type
+ operation_type-->|light,short|login_node
+ operation_type-->|heavy,long|interaction_type
+ interaction_type-->|Direct|interactive_node
+ interaction_type-->|Indirect|calculation_node
+```
-![Image](./img/queue1.png)
+### What kind of compute work are you doing?
+- Compute bound
+ - you use mainly CPU power
+ - does the software support threads or MPI?
+ - **Threads/OpenMP** are quite often supported. **Use several cores!**
+ - **MPI** (Message Passing Interface) allows for inter-node jobs but is seldom supported by bioinformatics software. **You could use several nodes!**
+- Memory bound
+ - if the bottlenecks are allocating memory, copying/duplicating
+ - use more cores up to 1 node, perhaps using a "fat" node.
-- Where should it run? (`-p node` or `-p core`)
-- Use a whole node or just part of it?
- - 1 node = 20 cores (16 on Bianca & Snowy)
- - 1 hour walltime = 20 core hours = expensive
- - Waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
-- Default value: core
+```{admonition} "Slurm Cheat Sheet"
+
+ - ``-A`` project number
+ - ``-t`` wall time
+ - ``-n`` number of cores
+ - ``-N`` number of nodes (can only be used if your code is parallelized with MPI)
+ - ``-p`` partition
+ - ``core`` is default and works for jobs narrower than 16 cores
+ - ``node`` can be used if you need the whole node and its memory
+```
+
### Walltime at the different clusters
@@ -75,6 +196,348 @@
- Snowy: 30 days
- Bianca: 10 days
+
+## Interactive jobs
+- Most work is most effective as submitted jobs, but e.g. development needs responsiveness
+- Interactive jobs are high-priority but limited in `-n` and `-t`
+- Quickly gives you a job and logs you in to the compute node
+- Require same Slurm parameters as other jobs
+- Log in to compute node
+ - `$ interactive ...`
+- Log out with `Ctrl-D` or `logout`
+
+- To use an interactive node, in a terminal, type:
+
+```bash
+interactive -A [project name] -p core -n [number_of_cores] -t [session_duration]
+```
+
+For example:
+
+```bash
+interactive -A sens2023598 -p core -n 2 -t 8:0:0
+```
+
+This starts an interactive session using project `sens2023598`
+that uses 2 cores and has a maximum duration of 8 hours.
+
+```{tip}
+
+ ![copy-paste](./img/copy_paste.PNG)
+```
+
+### Try interactive and run RStudio
+
+```{note} "Copied to [intermediate/rstudio.md](intermediate/rstudio.md)"
+
+ One may consider linking to that page :-)
+```
+
+We recommend using at least two cores for RStudio. To get those resources, you should start an interactive job.
+
+```{example} "Type-along"
+ Use **ThinLinc**
+
+ - Start an **interactive session** on a compute node (2 cores)
+ - If you already have an interactive session going on, use that.
+ - If you don't find it, do
+
+ ``$ squeue``
+
+ - find your session, ssh to it, like:
+
+ ``$ ssh sens2023598-b9``
+
+ - ``$ interactive -A sens2023598 -p devcore -n 2 -t 60:00``
+
+
+ - Once the interactive job has begun, you need to load the modules you need, even if you had loaded them before on the login node
+ - You can check which node you are on:
+
+ `$ hostname`
+
+ - Also try:
+
+ `$ srun hostname`
+
+ - This will give several output lines, matching the number of cores you allocated.
+ - How many in this case?
+
+ - If the name before ``.bianca.uppmax.uu.se`` ends with ``bXX``, you are on a compute node!
+ - The login node has ``sens2023598-bianca``
+ - You can probably also see this information in your prompt, like:
+ ``[bjornc@sens2023598-b9 ~]$``
+
+ - Load an RStudio module and an R_packages module (if not loading R you will have to stick with R/3.6.0) and run "rstudio" from there.
+
+ `$ ml R_packages/4.2.1`
+
+ `$ ml RStudio/2022.07.1-554`
+
+
+ - **Start rstudio**, keeping terminal active (`&`)
+
+ `$ rstudio &`
+
+ - Slow to start?
+ - Depends on:
+ - number of packages
+ - whether you save a lot of data in your RStudio workspace, which has to be read during start-up.
+
+ - **Quit RStudio**!
+ - **Log out** from interactive session with `Ctrl-D` or `logout` or `exit`
+```
+
+
+## Job scripts (batch)
+
+- Batch scripts can be written in any scripting language. We will use Bash.
+- Make `#!/bin/bash` the first line of the script
+ - It is good practice to end the line with ``-l`` to reload a fresh environment with no modules loaded.
+ - This ensures that you don't enable other software or versions that may interfere with what you want to do in the job.
+- Before the job content, add the batch flags starting the lines with the keyword `#SBATCH`, like:
+ - ``#SBATCH -t 2:00:00``
+ - ``#SBATCH -p core``
+ - ``#SBATCH -n 3``
+- Lines starting with `#` are comments to `bash`, so the `#SBATCH` lines are ignored and the file can also run as an ordinary bash script
+- if running the script with the command `sbatch