## Objectives
Teaching goals:

Schedule (45 minutes):
When you are logged in, you are on a login node. There are two types of nodes:
| Type | Purpose |
| --- | --- |
| Login node | Start jobs for the worker nodes, do easy things. You share 2 cores and 15 GB RAM with the active users within your project. |
| Compute nodes | Do hard calculations, either from scripts or in an interactive session. |
Bianca contains hundreds of nodes, each of which is isolated from the others and from the Internet.

```mermaid
graph TB

  Node1 -- interactive --> SubGraph2Flow
  Node1 -- sbatch --> SubGraph2Flow
  subgraph "Snowy"
  SubGraph2Flow(calculation nodes)
  end

  thinlinc -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
  terminal/thinlinc -- usr --> Node1
  terminal -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
  Node1 -- usr-sensXXX + 2FA + no VPN ----> SubGraph1Flow

  subgraph "Bianca"
  SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
  private -- interactive --> calcB(calculation nodes)
  private -- sbatch --> calcB
  end

  subgraph "Rackham"
  Node1[Login] -- interactive --> Node2[calculation nodes]
  Node1 -- sbatch --> Node2
  end
```
### Bianca standard nodes

Details about the compute nodes

### Storage
We need a queue:

You define jobs to be run on the compute nodes; these jobs are therefore sent to the queue.

### Some keywords

- which project to use (`-A`)
- which partition/queue type (`-p`)
    - `core` for most jobs, and the default!
    - `node` for larger jobs
    - development partitions for short test jobs (`devcore`, `devel`)
- how many cores (`-n`)
- how long, at most (`-t`)

Example:

```
-p core
-n 1
-t 10-00:00:00
```
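Put together, a submission using these flags could look like this (a sketch; `my_job.sh` is a placeholder script name, and the project ID is the one used in the exercises below):

```bash
$ sbatch -A sens2023598 -p core -n 1 -t 10-00:00:00 my_job.sh
```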
**Tip**

**See also**
```bash
$ interactive <flags> ...
```

```bash
$ sbatch <flags> <program>
```

or

```bash
$ sbatch <job script>
```
```mermaid
flowchart TD
  UPPMAX(What to run on which node?)
  operation_type{What type of operation/calculation?}
  interaction_type{What type of interaction?}
  login_node(Work on login node)
  interactive_node(Work on interactive node)
  calculation_node(Schedule for calculation node)

  UPPMAX-->operation_type
  operation_type-->|light,short|login_node
  operation_type-->|heavy,long|interaction_type
  interaction_type-->|Direct|interactive_node
  interaction_type-->|Indirect|calculation_node
```
## Slurm Cheat Sheet
- `-A` project number
- `-t` wall time
- `-n` number of cores
- `-N` number of nodes (can only be used if your code is parallelized with MPI)
- `-p` partition
    - `core` is the default and works for jobs narrower than 16 cores
    - `node` can be used if you need a whole node and its memory
- An interactive session takes the same flags, typically `-n` and `-t`:

```bash
$ interactive ...
```

- Logout with `<Ctrl>-D` or `logout`

To use an interactive node, in a terminal, type `interactive`, followed by the project, the number of cores, and the wall time. For example, this starts an interactive session using project `sens2023598` that uses 2 cores and has a maximum duration of 8 hours:
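```bash
$ interactive -A sens2023598 -n 2 -t 8:00:00
```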
**Tip**

We recommend using at least two cores for RStudio, and to get those resources, you should start an interactive job.

**Type-along**

- Use ThinLinc.
- Start an interactive session on a compute node (2 cores).
- If you already have an interactive session going on, use that. List your jobs:

```bash
$ squeue
```

Find your session, and `ssh` to it, like:

```bash
$ ssh sens2023598-b9
```

- Otherwise, start a new session:

```bash
$ interactive -A sens2023598 -p devcore -n 2 -t 60:00
```

Once the interactive job has begun, you need to load the needed modules, even if you had loaded them before on the login node.

You can check which node you are on:
```bash
$ hostname
```

Also try:

```bash
$ srun hostname
```

If the name before `.bianca.uppmax.uu.se` ends with `bXX`, you are on a compute node; the login node is instead called `sens2023598-bianca`.

You can also probably see this information in your prompt, like:

```
[bjornc@sens2023598-b9 ~]$
```
Load an RStudio module and an R_packages module (if you do not load R_packages, you will have to stick with R/3.6.0) and run `rstudio` from there:

```bash
$ ml R_packages/4.2.1
$ ml RStudio/2022.07.1-554
```

Start RStudio, keeping the terminal active (`&`):

```bash
$ rstudio &
```
Depends on:
Quit RStudio!

Log out from the interactive session with `<Ctrl>-D`, `logout`, or `exit`.
- Put `#!/bin/bash` in the top line.
    - Add `-l` to reload a fresh environment with no modules loaded.
- Give the Slurm flags on lines starting with `#SBATCH`, like:

```bash
#SBATCH -t 2:00:00
#SBATCH -p core
#SBATCH -n 3
```

- Lines starting with `#` will be ignored by `bash`, so the file can also run as an ordinary bash script.
- When submitted with `sbatch <script>`, the `#SBATCH` lines will be interpreted as Slurm flags.

**Type-along**

- Make a job script called `jobscript.sh` and place it in your `~` folder.

**Tip**
```bash
#!/bin/bash

#SBATCH -A sens2023598       # Project ID

#SBATCH -p devcore           # Asking for cores (for test jobs, as opposed to multiple nodes)

#SBATCH -n 1                 # Number of cores

#SBATCH -t 00:10:00          # Ten minutes

#SBATCH -J Template_script   # Name of the job

# go to some directory

cd /proj/sens2023598/
pwd -P

# load software modules

module load bioinfo-tools
module list

# do something

echo Hello world!
```
Run it:

```bash
$ sbatch jobscript.sh
```
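Since lines starting with `#` are plain comments to `bash`, this same file would also run as an ordinary script with `bash jobscript.sh`; it is `sbatch` that interprets the `#SBATCH` lines as flags. To check whether the submitted job is waiting or running, you can, for example, list only your own jobs (standard Slurm usage):

```bash
$ squeue -u $USER
```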
### Do you need more resources?

Do you need more memory than 128 GB, or GPUs?

- `-C mem256GB` allocates a fat node with 256 GB RAM
- `-C mem512GB` allocates a fat node with 512 GB RAM
- `-C gpu` allocates a GPU node
- `-p node` must be used when allocating these nodes

```bash
interactive -A <proj> -n 3 -C gpu --gres=gpu:1 -t 01:10:00
```
### Some limits

- `squeue` — quick info about jobs in the queue
- `jobinfo` — detailed info about jobs
- `finishedjobinfo` — summary of finished jobs
- `jobstats` — efficiency of booked resources (see the example after this list)
    - use `eog` to watch the `png` output files
- `bianca_combined_jobinfo`
+ +Slurm Cheat Sheet
+-A
project number-t
wall time-n
number of cores-N
number of nodes (can only be used if your code is parallelized with MPI)-p
partitioncore
is default and works for jobs narrower than 16 coresnode
can be used if you need the whole node and its memory-C mem256GB
allocate a fat node with 256 GB RAM-C mem512GB
allocate a fat node with 512 GB RAM-C gpu
### Batch jobs

```bash
sbatch <jobscript with all #SBATCH options>
sbatch <options that will be prioritized over the options within the job script> <jobscript>
```

Example:

```bash
sbatch -t 60:00 -p devcore -n 2 job.sh
```

### Interactive

```bash
interactive -A <project> <other options if not using default settings>
```
## How does the queue work?

Let's look graphically at the jobs presently running.

y-axis: time

*(figure: the jobs presently running; y-axis: time)*

We see some holes where we may fit jobs already!

4 one-core jobs can run immediately (or a 4-core wide job).

*(figures: the same schedule with short, narrow jobs fitted into the holes)*

A 5-core job has to wait.

Easiest to schedule single-threaded, short jobs.
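To see where your own jobs stand in this scheduling, Slurm can report its estimated start times for queued jobs (a standard `squeue` option):

```bash
$ squeue -u $USER --start
```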
**Tip**

The goal of this exercise is to make sure you know how to start an interactive session.

Why not do all your work in interactive sessions? Because it is an inefficient use of your core hours: an interactive session means that you use a calculation node with low efficiency, since only irregularly will you use such a node to its full capacity.

- Copy the folder `/proj/sens2023598/workshop/slurm` to your home folder and make the necessary changes, creating a job script `my_bio_workflow.sh`, for example, with the content below; or copy the `/proj/sens2023598/workshop/slurm/my_bio_workflow.sh` file and modify it:

```bash
$ cd ~
$ cp /proj/sens2023598/workshop/slurm/my_bio_workflow.sh .
```

- Edit `my_bio_workflow.sh` and add the SBATCH commands:
and add the SBATCH commands#!/bin/bash
+#SBATCH -A sens2023598
+#SBATCH -J workflow
+#SBATCH -t 01:00:00
+#SBATCH -p core
+#SBATCH -n 2
+
+
+cd ~
+mkdir -p myworkflow
+cd myworkflow
+
+module load bioinfo-tools
+
+# load samtools
+module load samtools/1.17
+
+# copy and example BAM file
+cp -a /proj/sens2023598/workshop/data/ERR1252289.subset.bam .
+
+# index the BAM file
+samtools index ERR1252289.subset.bam
+
+# load the GATK module
+module load GATK/4.3.0.0
+
+# make symbolic links to the hg38 genomes
+ln -s /sw/data/iGenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.* .
+
+# create a VCF containing inferred variants
+gatk HaplotypeCaller --reference genome.fa --input ERR1252289.subset.bam --intervals chr1:100300000-100800000 --output ERR1252289.subset.vcf
+
+# use snpEFF to annotate variants
+module load snpEff/5.1
+java -jar $SNPEFF_ROOT/snpEff.jar eff hg38 ERR1252289.subset.vcf > ERR1252289.subset.snpEff.vcf
+
+# compress the annotated VCF and index it
+bgzip ERR1252289.subset.snpEff.vcf
+tabix -p vcf ERR1252289.subset.snpEff.vcf.gz
+
Make the job script executable and submit it.
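A minimal way to do both (using the script name from the steps above):

```bash
$ chmod +x my_bio_workflow.sh   # make the job script executable
$ sbatch my_bio_workflow.sh     # submit the job to the queue
```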
When the job has finished, check the efficiency of the booked resources with `jobstats`.
## Keypoints