diff --git a/admin/getting-access/index.html b/admin/getting-access/index.html index 44a8c502b..d0c544a44 100644 --- a/admin/getting-access/index.html +++ b/admin/getting-access/index.html @@ -3357,7 +3357,8 @@
Welcome to the user documentation of the BIH high-performance computing (HPC) cluster, also called HPC 4 Research. The BIH HPC cluster is managed by CUBI (Core Unit Bioinformatics). This documentation is maintained by BIH CUBI and the user community. It is a living document that you can update and add to. See How-To: Contribute to this Document for details.
The global table of contents is on the left, the one of the current page is on the right.
Additional resources
Read the following set of pages (in order) to learn how to get access and connect to the cluster.
Acknowledging BIH HPC Usage
Acknowledge usage of the cluster in your manuscript as \"Computation has been performed on the HPC for Research/Clinic cluster of the Berlin Institute of Health\". Please add your publications using the cluster to this list.
"},{"location":"#news-maintenance-announcements","title":"News & Maintenance Announcements","text":"hpc-mem-5
with 4 TB of RAM./fast
on all non-transfer nodes.See Maintenance for a detailed list of current, planned, and previous maintenance and update work.
"},{"location":"#technical-details","title":"Technical Details","text":"If you are interested in how this HPC cluster is set up on a technical level, we got you covered. There is an entire section on this.
"},{"location":"#documentation-structure","title":"Documentation Structure","text":"The documentation is structured as follows:
Access to the BIH HPC cluster is conceptually based on user groups (also known as labs or units) and projects. Users have a relatively limited storage quota within their private home folder and store big data primarily within their group's work space or in project folders. Projects are collaborative efforts involving multiple PIs/groups and are allocated separate storage space on the cluster.
Independent group leaders at BIH/Charit\u00e9/MDC can request a group on the cluster and name group members. The work group leader (the group PI) bears the responsibility for the group's members and ensures that cluster policies and etiquette are followed. In brief: Fair usage rules apply and the cluster is not to be abused for unethical or illegal purposes. Major and/or continued violations may lead to exclusion of the entire group.
The group leader may also name one delegate (typically an IT-savvy Post-Doc) who is thereby allowed to take decisions about cluster usage and work group management on behalf of the group leader. The above-mentioned responsibilities stay with the group leader.
Note
All cluster users are members of exactly one primary work group. This affiliation is usually defined by real life organisational structures within Charit\u00e9/BIH/MDC. Leaders of independent research groups (PIs) can apply for a new cluster work group as follows:
Important
Changes to an existing group (adding new users, changes in resources, etc.) can only be requested by group leaders and delegates.
"},{"location":"admin/getting-access/#form-new-group","title":"Form: New Group","text":"Example values are given in curly braces.
# Group \"ag-{doe}\"\nGroup leader/PI: {John Doe}\nDelegate [optional]: {Max Mustermann}\nPurpose of cluster usage [short]: {RNA-seq analysis in colorectal cancer}\n\nRequired resources:\n- Tier 1 storage: {1 TB}\n- Tier 1 scratch: {10 TB}\n- Tier 2 storage: {10 TB}\n\n# Users\n## User 1\n- first name: {John}\n- last name: {Doe}\n- affiliation: {Charit\u00e9, Department of Oncology}\n- institute email: {john.doe@charite.de}\n- user has account with\n - [ ] BIH\n - [x] Charite\n - [ ] MDC\n- BIH/Charit\u00e9/MDC user name: {doej}\n\n## User 2\n[etc.]\n
"},{"location":"admin/getting-access/#form-add-user-to-group","title":"Form: Add User to Group","text":"Example values are given in curly braces.
# New user of AG {Doe}\n- first name: {Mia}\n- last name: {Smith}\n- affiliation: {Charit\u00e9, Department of Oncology}\n- institute email: {mia.smith@charite.de}\n- user has account with\n - [ ] BIH\n - [x] Charite\n - [ ] MDC\n- BIH/Charit\u00e9/MDC user name: {smithm}\n
Notes
Projects are secondary user groups to enable:
Project creation can be initiated by group leaders and group delegates as follows:
Important
Changes to an existing project (adding new users, changes in resources, etc.) can only be requested by project owners and delegates. Please send us cluster user names for adding new project members.
"},{"location":"admin/getting-access/#form","title":"Form","text":"Example values are given in curly braces.
# Project \"{doe-dbgap-rna}\"\nProject owner: {John Doe}, {doej_c}\nDelegate [optional]: {Max Mustermann}, {musterm_c}\nPurpose of cluster usage [short]: {RNA-seq data from dbGAP}\n\nRequired resources:\n- Tier 1 storage: {0 TB}\n- Tier 2 storage: {1 TB}\n\nAdditional members (cluster user names):\n- {sorgls_c}\n- ...\n
Notes
This page documents the current and known upcoming maintenance windows.
"},{"location":"admin/maintenance/#login-compute-and-storage-maintenance-december-13-14-2022","title":"Login, Compute and Storage Maintenance, December 13-14, 2022","text":"All informationand updates regarding maintenance will be circulated on our forum https://hpc-talk.cubi.bihealth.org/c/announcements/5.
"},{"location":"admin/maintenance/#login-compute-and-storage-maintenance-march-22-23-2022","title":"Login, Compute and Storage Maintenance, March 22-23, 2022","text":"All COMPUTE nodes and STORAGE resources won't be reachable!
All nodes will be running in RESERVATION mode. This means you are still able to schedule new jobs on these nodes if their potential/allowed runtime does not extend into the maintenance window (Tuesday and Wednesday, March 22 and 23, all-day). For example, if you submit a job that can run up to 7 days after March 15 then the job will remain in \"pending/PD\" state giving the explanation of \"all nodes being reserved or unavailable\".
Issues of today's maintenance:
/tmp
on login nodescephfs-2
switches (Tier 2 storage, not relevant for most users)IMPORTANT
Progress Thread on hpc-talk
"},{"location":"admin/maintenance/#drmaa-deprecation-march-2-2022","title":"DRMAA Deprecation, March 2, 2022","text":"scontrol show job JOBID
and sacct -j JOBID
.snakemake --profile=cubi-v1
instead of snakemake --drmaa \"...\"
.rule myrule:\n # ...\n threads: 8\n resources:\n time=\"12:00:00\",\n memory=\"8G\",\n # ...\n
SchedulerParameters+=bf_max_job_user=50
: backfill scheduler only considers 50 jobs of each user. This mitigates an issue with some users having too many jobs and thus other users' jobs don't get ahead in the queueEnforcePartLimits=ALL
: jobs that don't fit into their partition are rejectedDependencyParameters=kill_invalid_depend
: jobs that have dependencies set that cannot be fulfilled will be killedlocaltmp
Resource, January 31, 2022","text":"localtmp
resource for local storage above 100MB./tmp
using Linux namespaces/cgroups. This greatly improves the reliability of cleaning up after jobs. (Technically, this is implemented using the Slurm job_container/tmpfs) plugin.Gres
) \"localtmp\". In the future this will become a requirement. Also see Slurm: Temporary Files.hpc-login-1.cubi.bihealth.org
hpc-login-2.cubi.bihealth.org
hpc-portal.cubi.bihealth.org
hpc-transfer-1.cubi.bihealth.org
hpc-transfer-2.cubi.bihealth.org
hpc-gpu-{5..7}
. 28.08.5
.The GPFS storage system has been upgraded to the latest version to make it compatible with Enterprise Linux version 8.
"},{"location":"admin/maintenance/#slurm-upgrade-to-21080-september-8-2021","title":"Slurm upgrade to21.08.0
, September 8, 2021","text":"Slurm has been upgraded to version 21.08.0
.
All servers/nodes won't be reachable!
All nodes will be running in reservation mode. This means you are still able to schedule new jobs on these nodes if their potential/allowed runtime does not extend into the maintenance window (Tuesday and Wednesday, September 7 and 8, all-day). For example, if you submit a job that can run up to 7 days after August 30 then the job will remain in \"pending/PD\" state giving the explanation of \"all nodes being reserved or unavailable\".
If you already have a job running on any node that goes beyond September 7, 12:00 am (00:00), this job will die.
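The reservation rule described above comes down to simple arithmetic: a job can only start if its time limit ends before the reservation begins. The following is a minimal sketch that computes the largest whole-hour limit that still fits. The helper name `hours_until` is hypothetical (it is not part of Slurm), and GNU `date` is assumed.

```shell
# Hypothetical helper (not part of Slurm): print the number of whole hours
# between two timestamps. Assumes GNU date, which understands "-d".
hours_until() {
    local from_s to_s
    from_s=$(date -u -d "$1" +%s)
    to_s=$(date -u -d "$2" +%s)
    echo $(( (to_s - from_s) / 3600 ))
}

# Example: submitting on August 30 at noon; the reservation starts on
# September 7 at 00:00.
hours=$(hours_until "2021-08-30 12:00" "2021-09-07 00:00")

# Keep one hour of safety margin and print a matching srun call.
echo "srun --pty --time=$(( hours - 1 )):00:00 bash -i"
```

Passing the resulting value to `--time` keeps the job's latest possible end before the maintenance window, so the scheduler does not have to hold it back until after the reservation.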
"},{"location":"admin/maintenance/#renaming-of-gpu-high-memory-machines-scheduler-changes-september-7-2021","title":"Renaming of GPU & High Memory Machines & Scheduler Changes, September 7, 2021","text":"The GPU machines med030[1-4]
have been renamed to hpc-gpu-[1-4]
. The high memory machines med040[1-4]
have been renamed to hpc-mem-[1-4]
. It will probably take us some time to update all places in the documentation.
Further, the long
partition has been changed to allow jobs with a maximum running time of 14 days.
staging
partition, August 31, 2021","text":"We have installed 36 new nodes (in BETA mode) in the cluster called hpc-node-[1-36]
. They have 48 cores (thus 96 hardware threads) each and have 360GiB of main memory available (for the hardware nerds, it's Intel(R) Xeon(R) Gold 6240R CPUs at 2.40GHz, featuring the cascadelake
architecture).
Right now, they are only available in the staging
partition. After some testing we will move them to the other partitions. We'd like to ask you to test them as well and report any issues to hpc-helpdesk@bih-charite.de. The nodes have been set up identically to the existing med0xxx
nodes. We do not expect big changes but the nodes might not be as stable as other ones.
Here is how you can reach them.
hpc-login-1 # srun --immediate=5 --pty --time=24:00:00 --partition=staging bash -i\n[...]\nhpc-cpu-1 #\n
Note that I'm specifying a maximal running time of 24h so the scheduler will end the job after 24 hours which is before the upcoming maintenance reservation begins. By default, the scheduler allocates 28 days to the job which means that the job cannot end before the reservation and will be scheduled to start after it. See Reservations / Maintenances for more information about maintenance reservations.
"},{"location":"admin/maintenance/#reservation-maintenance-display-on-login-august-30-2021","title":"Reservation / Maintenance Display on Login, August 30, 2021","text":"User will now be notified on login about maintenance, for example:
NOTE: scheduled maintenance(s)\n\n 1: 2021-09-07 00:00:00 to 2021-09-09 00:00:00 ALL nodes\n\nSlurm jobs will only start if they do not overlap with scheduled reservations.\nMore information:\n\n - https://bihealth.github.io/bih-cluster/slurm/reservations/\n - https://bihealth.github.io/bih-cluster/admin/maintenance/\n
"},{"location":"admin/maintenance/#update-to-job-sumission-script-august-23-2021","title":"Update to Job Sumission Script, August 23, 2021","text":"The srun
command will now behave as if --immediate=60
has been specified by default. It explains how to override this behaviour and possible reasons for job scheduling to fail within 60 seconds (reservations and full cluster).
We upgrade from 20.11.2
to 20.11.8
which contains some fixes for bugs that our users actually stumbled over. The change should be non-intrusive as it's only a patch-level update.
The following servers won't be reachable:
These nodes are running in reservation mode now. This means you are still able to schedule new jobs on these nodes if their potential/allowed runtime does not extend into the maintenance window (Tuesday, August 3, all-day). For example, if you submit a job that can run up to 7 days after July 26 then the job will remain in \"pending/PD\" state giving the explanation of \"all nodes being reserved or unavailable\". If you have a job running on any of the aforementioned nodes that goes beyond August 3, 12:00 am (00:00), this job will die. We do not expect the remaining nodes to be affected. However, there remains a minor risk of unexpected downtime of other nodes.
"},{"location":"admin/maintenance/#server-reorganization-july-13-2021","title":"Server reorganization, July 13, 2021","text":"Affected servers are:
If you have a job running on any of the aforementioned nodes that goes beyond June 22, 6am, this job will die. We put a so-called Slurm reservation in place for the maintenance period. Any job that is submitted before the maintenance and whose end time (start time + max running time) is not before the start of the maintenance will not be scheduled, with the message ReqNodeNotAvail, Reserved for maintenance.
Affected servers are:
HPC 4 Research
Note
This task is currently being planned. No schedule has been fixed yet.
Note
This task is currently being planned. No schedule has been fixed yet.
/fast
that currently points to /data/gpfs-1
on HPC 4 Research./data
instead of /fast
everywhere, e.g., /data/users/$NAME
etc.Time: 6am-12am
/fast
file system will be re-mounted to /data/gpfs-1
./fast
becomes a symbolic link to /data
on all of the cluster.hpc-login-1.cubi.bihealth.org
and login-2...
instead of hpc-login-{1,2}
.hpc-transfer-{1,2}
which will be replaced by transfer-1.research.hpc.bihealth.org
and transfer-2...
.med010[1-3]
and med012[5-6]
.On June 3, we need to perform a network maintenance at 8 am.
If everything goes well, there might be a short delay in network packets and connections will survive. In this case, the maintenance will end at 8:30 am.
Otherwise, the maintenance will finish by noon.
"},{"location":"admin/maintenance/#cluster-maintenance-with-downtime-june-16","title":"Cluster Maintenance with Downtime: June 16","text":"We need to schedule a full cluster downtime on June 16.
"},{"location":"admin/maintenance/#slurm-migration","title":"Slurm Migration","text":"We will switch to the Slurm workload scheduler (from the legacy SGE). The main reason is that Slurm allows for better scheduling of GPUs (and has loads of improvements over SGE), but the syntax is a bit different. Currently, our documentation is in an transient state. We are currently extending our Slurm-specific documentation.
SSH Key Management has switched to using Charite and MDC ActiveDirectory servers. You need to upload all keys by the end of April 2020.
Schedule
Feb 4, 2020:
Keys are now also taken from central MDC/Charite servers. You do not need to contact us any more to update your keys (we cannot accelerate the process at MDC).May 1, 2020:
Keys are now only taken from central MDC/Charite servers. You must upload your keys to central servers by then.Affected systems:
hpc-transfer-1
hpc-transfer-2
hpc-login-2
The compute nodes are non-critical as we are taking them out of the queues now.
"},{"location":"admin/maintenance/#centos-76-upgrade-january-29-february-5","title":"CentOS 7.6 Upgrade, January 29, February 5","text":"Starting monday 03.09.2018 we will be performing rolling update of the cluster from CentOS 7.4 to CentOS 7.5. Since update will be performed in small bunches of nodes, the only impact you should notice is smaller number of nodes available for computation.
Also, for around two weeks, you can expect that your jobs can hit both CentOS 7.4 & CentOS 7.5 nodes. This should not impact you in any way, but if you encounter any unexpected behavior of the cluster during this time, please let us know.
At some point we will have to update the transfer, and login nodes. We will do this also in parts, so the you can switch to the other machine.
Key dates are:
18.09.2018 - hpc-login-1 & hpc-transfer-1 will not be available, and you should switch to hpc-login-2 & hpc-transfer-2 respectively.
25.09.2018 - hpc-login-2 & hpc-transfer-2 will not be available, and you should switch to hpc-login-1 & hpc-transfer-1 respectively.
Please also note that the non-invasive maintenance we announced for this weekend has been canceled, so the cluster will operate normally.
In case of any concerns, issues, do not hesitate to contact us via hpc-admin@bih-charite.de, or hpc-helpdesk@bih-charite.de.
"},{"location":"admin/maintenance/#june-18-2018-0600-1500","title":"June 18, 2018, 0600-1500","text":"Due to tasks we need to perform on BIH cluster, we have planned maintenance:
During maintenance we will perform several actions:
During maintenance the whole cluster will not be usable, this includes:
The maintenance window is quite long, since we are dependent on an external vendor. However, we will recover services as soon as possible.
We will keep you posted on the status of services during the maintenance.
"},{"location":"admin/maintenance/#march-16-18-2018-mdc-it","title":"March 16-18, 2018 (MDC IT)","text":"MDC IT has a network maintenance from Friday, March 16 18:00 hours until Sunday March 18 18:00 hours.
This will affect connections to the cluster but no connections within the cluster.
"},{"location":"admin/maintenance/#january-17-2018-complete","title":"January 17, 2018 (Complete)","text":"STATUS: complete
The first aim of this window is to upgrade the cluster to CentOS 7.4 to patch against the Meltdown/Spectre vulnerabilities. For this, the login and transfer nodes have to be rebooted.
The second aim of this window is to reboot the file server to mitigate some NFS errors. For this, the SGE master has to be stopped for some time.
"},{"location":"admin/maintenance/#planprogress","title":"Plan/Progress","text":"(since January 2010)
This page describes strictly enforced policies valid on the BIH HPC clusters.
The aim of the HPC systems is to support the users in their scientific work and relies on their cooperation. First and foremost, the administration team enforces state of the art IT security and reliability practices through their organizational and operational processes and actions. We kindly ask users to follow the Cluster Etiquette described below to allow for fair use and flexible access to the shared resources. Beyond this, policies are introduced or enforced only when required to ensure non-restrictive access to the resources themselves. Major or recurrent breaches of policies may lead to exclusion from service.
We will update this list of policies over time. Larger changes will be announced through the mailing list.
"},{"location":"admin/policies/#cluster-etiquette","title":"Cluster Etiquette","text":"getent passswd $USER
to find out the user's office contact details).conda
, archive management tools such as tar
, (un)zip
, or gzip
. You should probably only run screen
/tmux
and maybe a text editor there.hpc-transfer-1
and hpc-transfer-2
.In the case of violations marked with a shield () administration reserves the right to remove write and possibly read permission to the given locations. Policies marked with a robot () are automatically enforced.
home
, work
, and scratch
volume). You can request an increase by an email to hpc-helpdesk@bih-charite.de for groups and projects.home
10k files, 1GB spacework
2M files, 1TB spacescratch
20M files, 200TB spacehpc-users
and mode is u=rwx,go=
; POSIX ACLs are prohibited. This policy is automatically enforced every 5 minutes.u=rwx,g=rwxs,o=
; POSIX ACLs are prohibited. This policy is automatically enforced every 5 minutes.scratch/BIH_TRASH
after 14 days (by mtime
) over night. Trash directories will be removed after 14 further days.touch
on files in scratch
and subsequently bumping the mtime
./tmp
). In the case that users need to delete files that they can access but not update/delete, administration will either give write permissions to the Unix group of the work group or project or change the owner to the owner/delegate of this group. This can occur in a group/project directory of a user who has left the organization. In the case that a user leaves the organization, the owner/delegate of the hosting group can request getting access to the user's files with the express agreement of this user./tmp
in Slurm-controlled jobs. This will enforce that Slurm can clean up after you.Network connections are a topic important in security. In the case of violations marked with a shield () administration reserves the right to terminate connections without notice and perform other actions.
screen
and tmux
are only allowed to run on the head nodes. They will be terminated automatically on the compute nodes.srun
).~/.ssh/authorized_keys
file but their usage is discouraged.~/.bashrc
Guide","text":"You can find the current default content of newly created user homes in /etc/skel.bih
:
hpc-login-1:~$ head /etc/skel.bih/.bash*\n==> /etc/skel.bih/.bash_logout <==\n# ~/.bash_logout\n\n==> /etc/skel.bih/.bash_profile <==\n# .bash_profile\n\n# Get the aliases and functions\nif [ -f ~/.bashrc ]; then\n . ~/.bashrc\nfi\n\n# User specific environment and startup programs\n\nPATH=$PATH:$HOME/.local/bin:$HOME/bin\n\n==> /etc/skel.bih/.bashrc <==\n# .bashrc\n\n# Source global definitions\nif [ -f /etc/bashrc ]; then\n . /etc/bashrc\nfi\n\n# Uncomment the following line if you don't like systemctl's auto-paging feature:\n# export SYSTEMD_PAGER=\n
"},{"location":"best-practice/env-modules/","title":"Custom Environment Modules","text":"This document contains a few tips for helping you using environment modules more effectively. As the general online documentation is lacking a bit, we also give the most popular commands here.
"},{"location":"best-practice/env-modules/#how-does-it-work","title":"How does it Work?","text":"Environment modules are descriptions of software packages. The module
command is provided which allows the manipulation of environment variables such as PATH
, MANPATH
, etc., such that programs are available without passing the full path. Environment modules also allow specifying dependencies between packages and conflicting packages (e.g., when the same binary is available in two packages). Further, environment modules allow the parallel installation of different software versions and then using software \"a la carte\" in your projects.
List currently loaded modules:
$ module list\n
Show all available modules
$ module avail\n
"},{"location":"best-practice/env-modules/#loadingunloading-modules","title":"Loading/Unloading Modules","text":"Load one module, make sure to use a specific version to avoid ambiguities.
$ module load Jannovar/0.16-Java-1.7.0_80\n
Unload one module
$ module unload Jannovar\n
Unload all modules
$ module purge\n
"},{"location":"best-practice/env-modules/#getting-help","title":"Getting Help","text":"Get help for environment modules
$ module help\n
Get help for a particular environment module
$ module help Jannovar/0.16-Java-1.7.0_80\n
"},{"location":"best-practice/env-modules/#using-your-own-module-files","title":"Using your own Module Files","text":"You can also create your own environment modules. Simply create a directory with module files and then use module use
for using the modules from the directory tree.
$ module use path/to/modules\n
"},{"location":"best-practice/env-modules/#faq-why-bash-module-command-not-found","title":"FAQ: Why -bash: module: command not found
?","text":"On the login nodes, the module
command is not installed. You should not run any computations there, so why would you need environment modules there? ;)
meg-login2$ module\n-bash: module: command not found\n
Use srun --pty bash -i
to get to one of the compute nodes.
You will certainly find yourself using a set of programs regularly without it being part of the core cluster installation, e.g., SAMtools, or Python 3. Just putting the appropriate module load
lines in your ~/.bashrc
will generate warnings when logging into the login node. It is thus recommended to use the following snippet for loading modules automatically on logging into a compute node:
case \"${HOSTNAME}\" in\n *login-*)\n ;;\n *)\n # load Python3 environment module\n module load Python/3.4.3-foss-2015a\n\n # Define path for temporary directories, don't forget to cleanup!\n # Also, this will only work after /fast is available.\n export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp\n ;;\nesac\n
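An alternative to matching on the host name is to check whether the `module` command exists at all before calling it. The following is a minimal sketch of that idea for `~/.bashrc`; the function name `load_modules_if_available` is hypothetical, and the module version is simply the one used in the snippet above.

```shell
# Hypothetical helper for ~/.bashrc: call "module load" only on machines
# where the module command is defined (e.g., compute nodes) and silently
# do nothing elsewhere (e.g., login nodes).
load_modules_if_available() {
    if command -v module >/dev/null 2>&1; then
        module load "$@"
    fi
    return 0
}

# Same module as in the example above; adjust to your needs.
load_modules_if_available Python/3.4.3-foss-2015a
```

This variant does not depend on the naming scheme of the login nodes, at the cost of also staying silent on any other machine that lacks the `module` command.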
"},{"location":"best-practice/project-structure/","title":"Project File System Structure","text":"Under Construction
This guide was written for the old GPFS file system and is in the process of being updated.
"},{"location":"best-practice/project-structure/#general-aims","title":"General Aims","text":"Mostly, you can separate the files in your projects/pipelines into one of the following categories:
Ideally, scripts and documentation are independent of a given project and can be separated from the rest. Configuration is project-dependent and small and mostly does not contain any sensitive information (such as genotypes that allow for reidentification of donors). In most cases, data might be large and is either also stored elsewhere or together with scripts and configuration can be regenerated easily.
There is no backup of work
and scratch
The cluster GPFS file system /fast
is not appropriate for keeping around single \"master\" copies of data. You should have a backup and archival strategy for your valuable \"master\" copy data.
In addition, you might need project-specific \"wrapper\" scripts that just call your project-independent script with the correct paths for your project. These scripts rather fall into the \"configuration\" category and should then live together with your configuration.
"},{"location":"best-practice/project-structure/#data","title":"Data","text":"Temporary files
You really should keep temporary files in a temporary directory, set the environment variable TMPDIR
appropriately and automatically clean them up (see Useful Tips: Temporary Files)
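As a minimal sketch of this advice, the following creates a private temporary directory, points `TMPDIR` at it, and removes it automatically when the shell or script exits. The variable name `JOB_TMP` and the `myjob` prefix are illustrative assumptions, not cluster conventions.

```shell
# Create a private temporary directory below the current $TMPDIR
# (falling back to /tmp if TMPDIR is unset).
JOB_TMP=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXXXXXX")

# Remove the directory and its contents when the shell/script exits.
trap 'rm -rf "$JOB_TMP"' EXIT

# Tools that honour TMPDIR will now write their temporary files here.
export TMPDIR="$JOB_TMP"

# Example use: write an intermediate file into the temporary directory.
echo "intermediate data" > "$TMPDIR/example.txt"
```

Using `trap ... EXIT` means the cleanup also runs when the script fails part-way, which is the case that is easiest to forget.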
But how can we put this into practice? Below, we give some examples of how to do this. Note that for simplicity's sake we put all scripts and configuration into one directory/repository contrary to the best practices above. This is for educational purposes only and you should strive for reusable scripts where it makes sense and separate scripts and configuration.
We will limit this to simple Bash scripts for educational purposes. You should be able to easily adapt this to your use cases.
Thus, the aim is to separate the data from the non-data part of the project such that we can put the non-data part of the project into a separate location and under version control. We call the location for non-data part of the project the home location of your project and the location for the data part of the project the work location of your project.
Overall, we have three options:
Create the work directory and copy the input files into work/input
.
$ mkdir -p project/work/input\n$ cp /data/cephfs-1/work/projects/cubit/tutorial/input/* project/work/input\n
Creating the home space. We initialize a Git repository, properly configure the .gitignore
file and add a README.md
file.
$ mkdir -p project/home\n$ cd project/home\n$ cat <<EOF >.gitignore\n*~\n.*.sw?\nEOF\n$ cat <<EOF >README.md\n# Example Project\n\nThis is an example project with config/scripts linked into work location.\nEOF\n$ git init\n$ git add .gitignore README.md\n$ git commit -m 'Initial project'\n
We then create a simple script for executing the mapping step and a configuration file that gives the path to the index and list of samples to process.
$ mkdir scripts\n$ cat <<\"EOF\" >scripts/run-mapping.sh\n#!/bin/bash\n\n# Unofficial Bash strict mode, see:\n# http://redsymbol.net/articles/unofficial-bash-strict-mode/\nset -euo pipefail\n\n# Get directory to bash file, see\n# https://stackoverflow.com/a/4774063/84349\nSCRIPTPATH=\"$( cd \"$(dirname \"$0\")\" ; pwd -P )\"\n\n# Helper function to print help to stderr.\nhelp()\n{\n >&2 echo \"Run Mapping Step\"\n >&2 echo \"\"\n >&2 echo \"run-mapping.sh [-c config.sh] [-h]\"\n}\n\n# Parse command line arguments into bash variables.\nCONFIG=\nwhile getopts \"hc:\" arg; do\n case $arg in\n h)\n help\n exit\n ;;\n c)\n CONFIG=$OPTARG\n ;;\n esac\ndone\n\n# Print the executed commands.\nset -x\n\n# Load default configuration, then load configuration file if any was given.\nsource $SCRIPTPATH/../config/default-config.sh\nif [[ -n \"$CONFIG\" ]]; then\n source $CONFIG\nfi\n\n# Create output directory.\nmkdir -p output\n\n# Actually perform the mapping. This assumes that you have\n# made the bwa and samtools commands available, e.g., using conda.\nfor sample in $SAMPLES; do\n bwa mem \\\n $BWA_INDEX \\\n input/${sample}_R1.fq.gz \\\n input/${sample}_R2.fq.gz \\\n | samtools sort \\\n -o output/${sample}.bam \\\n /dev/stdin\ndone\n\nEOF\n$ chmod +x scripts/run-mapping.sh\n$ mkdir -p config\n$ cat <<\"EOF\" >config/default-config.sh\nBWA_INDEX=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/hs37d5/hs37d5.fa\nSAMPLES=\nEOF\n$ cat <<\"EOF\" >config/project-config.sh\n# BWA_INDEX comes from default configuration already\nSAMPLES=test\nEOF\n
This concludes the basic project setup. Now, to the symlinks:
$ cd ../work\n$ ln -s ../home/scripts ../home/config .\n
And, to the execution...
$ ./scripts/run-mapping.sh -c config/project-config.sh\n[...]\n
"},{"location":"best-practice/project-structure/#example-link-data-into-home-option-21","title":"Example: Link Data Into Home (Option 2.1).","text":"We can reuse the project up to the statement \"This concludes the basic project setup\" in the example for option 1.
Then, we can do the following:
$ cd ../work\n$ mkdir -p output\n\n$ cd ../home\n$ cat <<\"EOF\" >>.gitignore\n\n# Ignore all data\ninput/\nwork/\noutput/\nEOF\n$ git add .gitignore\n$ git commit -m 'Ignoring data file in .gitignore'\n$ ln -s ../work/input ../work/output .\n
And we can execute everything in the home directory.
$ ./scripts/run-mapping.sh -c config/project-config.sh\n[...]\n
"},{"location":"best-practice/project-structure/#example-wrapper-scripts-in-home-option-22","title":"Example: Wrapper Scripts in Home (Option 2.2)","text":"Again, we can reuse the project up to the statement \"This concludes the basic project setup\" in the example for option 1.
Then, we do the following:
$ cd ../work\n$ cat <<\"EOF\" >do-run-mapping.sh\n#!/bin/bash\n\n../home/scripts/run-mapping.sh \\\n -c ../home/config/project-config.sh\nEOF\n$ chmod +x do-run-mapping.sh\n
Note that the do-run-mapping.sh
script could also go into the project-specific Git repository and be linked into the work directory.
Finally, we can run our pipeline:
$ cd ../work\n$ ./do-run-mapping.sh\n[...]\n
"},{"location":"best-practice/screen-tmux/","title":"Screen and Tmux Best Pratice","text":"The program screen
allows you to detach your session from your current login session. So in case you get disconnected your screen session will stay alive.
Hint
You have to reconnect to screen on the machine on which you started it. We thus recommend starting it only on the login nodes and not on a compute node.
"},{"location":"best-practice/screen-tmux/#start-and-terminat-a-screen-session","title":"Start and terminate a screen session","text":"You start a new screen
session by
$ screen\n
When you are in a screen session you can terminate it with $ exit\n
after which it is gone."},{"location":"best-practice/screen-tmux/#detach-a-screen-session","title":"Detach a screen session","text":"If you want to detach your screen session press Ctrl+a d
To list all your screen sessions run
$ screen -ls\n\nThere is a screen on:\n 2441.pts-1.med0236 (Detached)\n1 Socket in /var/run/screen/S-kbentel.\n
"},{"location":"best-practice/screen-tmux/#reattach-screen-session","title":"Reattach screen session","text":"To reattach a screen session run
$ screen -r screen_session_id\n
If you do not know the screen_session_id
you can get it with screen -ls
, e.g. 2441.pts-1.med0236
in the example above. You do not have to type the whole screen_session_id
only as much as is necessary to identify it uniquely. In case there is only one screen session detached it is enough to run screen -r
Sometimes it is necessary to kill a detached screen session. This is done with the command
$ screen -X -S screen_session_id quit\n
"},{"location":"best-practice/screen-tmux/#multiple-windows-in-a-screen-session","title":"Multiple windows in a screen session","text":"It is possible to have multiple windows in a screen session. So suppose you are logged into a screen session, these are the relevant shortcuts
new win: Ctrl+a c\nnext/previous win: Ctrl+a n/p\n
To terminate a window just enter
$ exit\n
"},{"location":"best-practice/screen-tmux/#configuration-file","title":"Configuration file","text":"Here is a sensible screen configuration. Save it as ~/.screenrc
.
screenrc
"},{"location":"best-practice/screen-tmux/#fix-a-broken-screen-session","title":"Fix a broken screen session","text":"In case your screen session doesn't write to the terminal correctly, i.e. the formatting of the output is broken, you can fix it by typing to the terminal:
$ tput smam\n
"},{"location":"best-practice/software-craftmanship/","title":"General Software Craftmanship","text":"Computer software, or simply software, is a generic term that refers to a collection of data or computer instructions that tell the computer how to work, in contrast to the physical hardware from which the system is built, that actually performs the work. -- Wikipedia: Software
As you will most probably never have contact with the HPC system hardware, everything you interact with on the HPC is software. This includes all of your scripts, your configuration files, programs installed by you or the administrators, and all of your data.
This should also answer the question why you should care about software and why you should try to create and use software of at least a minimal quality.
Software craftsmanship is an approach to software development that emphasizes the coding skills of the software developers themselves. -- Wikipedia: Software Craftmanship
This Wiki page is not meant to give you an introduction to creating good software but rather to collect a (growing) list of easy-to-use and high-impact points for improving software quality. It also provides pointers to resources elsewhere on the internet.
"},{"location":"best-practice/software-craftmanship/#use-version-control","title":"Use Version Control","text":"Use a version control system for your configuration and your code. Full stop. Modern version control systems are Git and Subversion.
Every user should have their own Git/Subversion checkout. Otherwise you are inviting a large number of problems.
"},{"location":"best-practice/software-craftmanship/#document-your-code","title":"Document Your Code","text":"This includes
Document where you got things from, how to re-download, etc. E.g., put a README file into each of your data top level directories.
"},{"location":"best-practice/software-craftmanship/#use-checksums","title":"Use Checksums","text":"Use MD5 or other checksums for your data. For example, md5sum
and hashdeep
are useful utilities for computing and checking them:
md5sum
How-To (tools such as sha256sum
work the same...)
hashdeep
How-To
Use some system for managing your workflows. These systems support you by
Snakemake is a popular workflow management system widely used in Bioinformatics. A minimal approach is using Makefiles.
"},{"location":"best-practice/software-craftmanship/#understand-bash-and-shell-exit-codes","title":"Understand Bash and Shell Exit Codes","text":"If you don't want to use a workflow management system, e.g., for one-step jobs, you should at least understand Bash job management and exit codes. For example, you can use if/then/fi
in Bash together with exit codes to:
if [[ ! -e file.md5 ]]; then\n md5sum file >file.md5 \\\n || rm -f file.md5\nfi\n
Also, learn about the unofficial Bash strict mode.
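As a rough illustration of how exit codes drive such checks, the sketch below writes a checksum, verifies it with md5sum -c, and then shows what the pipefail option from the strict mode changes. The directory and the variable names are made up for the demo, and GNU md5sum and bash are assumed to be available.

```shell
# Work in a throwaway directory so the demo leaves nothing behind.
demo_dir=$(mktemp -d)
printf 'example data\n' > "$demo_dir/file"

# Same pattern as above: only keep file.md5 if md5sum succeeded.
if [ ! -e "$demo_dir/file.md5" ]; then
    (cd "$demo_dir" && md5sum file > file.md5) || rm -f "$demo_dir/file.md5"
fi

# md5sum -c exits with 0 on a match and with non-zero on a mismatch,
# so it can be used directly as the condition of if/then/fi.
if (cd "$demo_dir" && md5sum -c --quiet file.md5); then
    checksum_status=ok
else
    checksum_status=corrupt
fi
echo "checksum status: $checksum_status"

# Without pipefail, a pipeline reports only the exit code of its last command,
# so the failure of `false` goes unnoticed.
plain=$(bash -c 'false | true; echo $?')
strict=$(bash -c 'set -o pipefail; false | true; echo $?')
echo "exit code without pipefail: $plain, with pipefail: $strict"

rm -rf "$demo_dir"
```

The two bash -c calls run in child processes, so the pipefail setting does not leak into the shell that runs the example.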
"},{"location":"best-practice/software-installation-with-conda/","title":"Software Installation with Conda","text":""},{"location":"best-practice/software-installation-with-conda/#conda","title":"Conda","text":"Users do not have the rights to install system packages on the BIH HPC cluster. For the management of bioinformatics software we therefore recommend using the conda package manager. Conda provides software in different \u201cchannels\u201d and one of those channels contains a huge selection of bioinformatics software (bioconda). Generally packages are pre-compiled and conda just downloads the binaries from the conda servers.
You are in charge of managing your own software stack, but conda makes it easy to do so. We will provide you with a description on how to install conda and how to use it. Of course there are many online resources that you can also use. Please find a list at the end of the document.
Also note that some system-level software is managed through environment modules.
"},{"location":"best-practice/software-installation-with-conda/#premise","title":"Premise","text":"When you logged into the cluster, please make sure that you also executed srun
to log into a computation node and perform the software installation there.
hpc-login-1:~$ srun --mem=5G --pty bash -i\nhpc-cpu-123:~$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\nhpc-cpu-123:~$ bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/work/miniconda\nhpc-cpu-123:~$ eval \"$($HOME/work/miniconda/bin/conda shell.bash hook)\"\nhpc-cpu-123:~$ conda init\nhpc-cpu-123:~$ conda config --set auto_activate_base false\n
This will install conda to $HOME/work/miniconda
. You can change the path to your liking, but please note that your $HOME
folder has limited space. The work
subfolder however has a bigger quota. More about this here.
To make bioinformatics software available, we have to add the bioconda
and some other channels to the conda configuration:
hpc-cpu-123:~$ conda config --add channels bioconda\nhpc-cpu-123:~$ conda config --add channels defaults\nhpc-cpu-123:~$ conda config --add channels conda-forge\n
"},{"location":"best-practice/software-installation-with-conda/#installing-software-with-conda","title":"Installing software with conda","text":"Installing packages with conda is straight forward:
hpc-cpu-123:~$ conda install <package>\n
This will install a package into the conda base environment. We will explain environments in detail in the next section. To search for a package, e.g. to find the correct name in conda or if it exists at all, issue the command:
hpc-cpu-123:~$ conda search <string>\n
To choose a specific version (conda will install the latest version that is compatible with the current installed Python version), you can provide the version as follows:
hpc-cpu-123:~$ conda install <package>=<version>\n
Please note that new conda installs may ship with a recently updated Python version to which not all packages have been adapted yet. E.g., if you find out that some packages don't work after starting out with/upgrading to Python 3.8, simply try to downgrade Python to 3.7 with conda install python=3.7
.
Hint
As resolving the dependency tree of an installation candidate can take a lot of time in Conda, especially when you are installing software from an environment.yaml
file, an alternative resolver has been presented that you can use to install software into your Conda environment. The time savings are immense and an installation that took more than an hour can be resolved in seconds.
Simply run
hpc-cpu-123:~$ conda install mamba\n
With that, you can install software into your environment using the same syntax as for Conda:
hpc-cpu-123:~$ mamba install <package>\n
"},{"location":"best-practice/software-installation-with-conda/#creating-an-environment","title":"Creating an environment","text":"Conda lets you create environments, such that you can test things in a different environment or group your software. Another common use case is to have different environments for the different Python versions. Since conda is Python-based, conflicting packages will mostly struggle with the Python version.
By default, conda will install packages into its root environment. Please note that software that does not depend on Python and is installed in the root environment is available in all other environments.
To create a Python 2.7 environment and activate it, issue the following commands:
hpc-cpu-123:~$ conda create -n py27 python=2.7\nhpc-cpu-123:~$ conda activate py27\n(py27) hpc-cpu-123:~$\n
From now on, conda will install packages into the py27
environment when you issue the install
command. To switch back to the root environment, simply deactivate the py27
environment:
(py27) hpc-cpu-123:~$ conda deactivate\nhpc-cpu-123:~$\n
But of course, as Python 2.7 is not supported any more by the Python Software Foundation, you should switch over to Python 3 already!
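Environments can also be described in an environment.yaml file so that they can be recreated later. The following is a minimal sketch: the environment name mapping, the pinned Python version, and the package list are assumptions chosen for illustration, not a recommendation from this documentation.

```shell
# Hypothetical project directory; adjust to your own layout.
project_dir=$(mktemp -d)

# Pin the Python version and list the channels explicitly so that the
# environment can be recreated later. Name and packages are only examples.
cat > "$project_dir/environment.yaml" <<'EOF'
name: mapping
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10
  - samtools
EOF

cat "$project_dir/environment.yaml"

# To create and activate the environment (not executed here):
#   conda env create -f "$project_dir/environment.yaml"
#   conda activate mapping
```

Keeping the file in your project's Git repository documents which software versions an analysis was run with.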
"},{"location":"best-practice/temp-files/","title":"Temporary Files","text":"Temporary Files and Slurm
See Slurm: Temporary Files for information on how Slurm controls access to local temporary storage.
Often, it is necessary to use temporary files, i.e., write something out in the middle of your program, read it in again later, and then discard these files. For example, samtools sort
has to write out chunks of sorted read alignments so that it can sort files larger than main memory.
TMPDIR
","text":"Traditionally, in Unix, the environment variable TMPDIR
is used for storing the location of the temporary directory. When undefined, usually /tmp
is used.
Generally, there are two locations where you could put temporary files:
/data/cephfs-1/home/users/$USER/scratch/tmp
-- inside your scratch folder on the CephFS file system; this location is available from all cluster nodes/tmp
-- on the local node's temporary folder; this location is only available on the node itself. The slurm scheduler uses Linux namespaces such that every job gets its private /tmp
even when run on the same node.scratch/tmp
","text":"Use CephFS-based TMPDIR
Generally, set up your environment to use /data/cephfs-1/home/users/$USER/scratch/tmp
as filling the local disk of a node with forgotten files can cause a lot of problems.
Ideally, you append the following to your ~/.bashrc
to use /data/cephfs-1/home/users/$USER/scratch/tmp
as the temporary directory. This will also create the directory if it does not exist. Further, it will create one directory per host name which prevents too many entries in the temporary directory.
export TMPDIR=$HOME/scratch/tmp/$(hostname)\nmkdir -p $TMPDIR\n
Prepending this to your job scripts is also recommended as it will ensure that the temporary directory exists.
"},{"location":"best-practice/temp-files/#tmpdir-and-the-scheduler","title":"TMPDIR
and the scheduler","text":"On the older nodes, the local disk is a relatively slow spinning disk; on the newer nodes, it is a relatively fast SSD. Further, the local disk is independent from the CephFS file system, so I/O volume to it does not affect the network or any other job on other nodes. Please note that by default, Slurm will not change your environment variables. This includes the environment variable TMPDIR
.
Slurm will automatically clean up temporary files in a job's /tmp
on the local file system when the job terminates. To automatically clean up temporary directories on the shared file system, use the following tip.
You can use the following code at the top of your job script to set TMPDIR
to the location in your home directory and get the directory automatically cleaned when the job is done (regardless of successful or erroneous completion):
# First, point TMPDIR to the scratch in your home as mktemp will use this\nexport TMPDIR=$HOME/scratch/tmp\n# Second, create another unique temporary directory within this directory\nexport TMPDIR=$(mktemp -d)\n# Finally, set up the cleanup trap\ntrap \"rm -rf $TMPDIR\" EXIT\n
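To see the effect of the cleanup trap without submitting a job, the pattern can be tried in a child bash process. This is a sketch under the assumption that a throwaway directory stands in for $HOME/scratch/tmp; the variable names base, job_tmp, and cleaned are made up for the demo.

```shell
# Stand-in for $HOME/scratch/tmp so that the demo does not touch real data.
base=$(mktemp -d)

# Run a miniature "job script" in a child bash process and capture the
# temporary directory it reports.
job_tmp=$(TMPDIR="$base" bash -c '
    export TMPDIR=$(mktemp -d)          # unique directory below the base
    trap "rm -rf $TMPDIR" EXIT          # same cleanup trap as above
    touch "$TMPDIR/intermediate.dat"    # pretend to write temporary data
    echo "$TMPDIR"                      # report the directory that was used
')

# After the child process has exited, the trap should have removed the directory.
if [ -d "$job_tmp" ]; then
    cleaned=no
else
    cleaned=yes
fi
echo "job used $job_tmp, cleaned up: $cleaned"
rmdir "$base"
```

Because the trap string is double-quoted, $TMPDIR is expanded when the trap is installed, so later changes to TMPDIR do not affect which directory is removed.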
"},{"location":"connecting/connecting-windows/","title":"Connecting via SSH on Windows","text":""},{"location":"connecting/connecting-windows/#install-ssh-client-for-windows","title":"Install SSH Client for Windows","text":"We recommend to use the program MobaXterm on Windows. MobaXterm is a software that allows you to connect to an SSH server, much like PuTTy, but also maintains your SSH key.
Alternative SSH Clients for Windows
For transferring data from/to Windows, we recommend using WinSCP. Install the latest version from here: https://winscp.net/eng/download.php
On the Login
screen of WinSCP create a new login by selecting New Site
.
Fill in the following parameters:
File protocol
: SFTP
Host name
: hpc-transfer-1.cubi.bihealth.org
or hpc-transfer-2.cubi.bihealth.org
User name
: your user name
Go to Advanced
> SSH
> Authentication
> Authentication parameters
> Private key file
and select your private ssh key file (in .ppk
format).
Press Ok
then Save
.
Press Login
to connect. It will ask for your private key passphrase, if you set one up.
If you need to convert your private ssh key file to the .ppk
format, on the WinSCP login screen go to Tools
> PuTTYgen
and follow the steps here: https://docs.acquia.com/cloud-platform/manage/ssh/sftp-key/
Click on Session
.
Click on SSH
.
In Basic SSH settings, enter a hostname (hpc-login-X.cubi.bihealth.org
, where X
is 1 or 2), check Specify username and enter your username in the text field. Select the tab Advanced SSH settings, check Use private key and select your private SSH key file (possible choices are described in the next two figures).
Select the id_rsa
file generated in Linux OR
select the id_rsa.ppk
file generated in Windows with MobaXterm.
Afterwards hit the OK button and MobaXterm will connect.
The session will be stored automatically and you can establish new connections later on, or also multiple ones at the same time, if you like.
"},{"location":"connecting/connecting/","title":"Connecting to HPC 4 Research","text":"HPC 4 Research is only available via the Charit\u00e9, MDC, and BIH internal networks. VPN access requires additional measures which are described in Connecting from External Networks.
There are two primary methods for interacting with BIH HPC:
This part of the documentation only describes direct console access via SSH. For information regarding the web portal, please read OnDemand Portal. In case you're not familiar with SSH, you should probably start via the web portal or (if you are determined to learn) read through our SSH basics page.
"},{"location":"connecting/connecting/#in-brief","title":"In brief","text":"Follow these steps to connect to BIH HPC via the command line:
Connect to one of the two login nodes.
# Charite Users\n$ ssh user_c@hpc-login-1.cubi.bihealth.org\n$ ssh user_c@hpc-login-2.cubi.bihealth.org\n\n# MDC Users\n$ ssh user_m@hpc-login-1.cubi.bihealth.org\n$ ssh user_m@hpc-login-2.cubi.bihealth.org\n
Hint
There are two login nodes, hpc-login-1
and hpc-login-2
. There are two for redundancy reasons. Please do not perform big file transfers or an sshfs
mount via the login nodes. For this purpose, we have hpc-transfer-1
and hpc-transfer-2
.
Please also read Advanced SSH for more custom scenarios of connecting to BIH HPC. If you are using a Windows PC to access BIH HPC, please read Connecting via SSH on Windows.
Allocate resources on a computation node using Slurm. Do not compute on the login node!
# Start interactive shell on computation node\n$ srun --pty bash -i\n
Bonus: Configure your SSH client on Linux and Mac or Windows.
tl;dr
SSH-Based Access:
# Interactive login (choose one)\nssh username@hpc-login-1.cubi.bihealth.org\nssh username@hpc-login-2.cubi.bihealth.org\nsrun --pty bash -i\n\n# File Transfer (choose one)\nscp local/file username@hpc-transfer-1.cubi.bihealth.org:remote/file\nsftp username@hpc-transfer-2.cubi.bihealth.org:remote/file local/file\n\n# Interactive login into the transfer nodes (choose one)\nssh username@hpc-transfer-1.cubi.bihealth.org\nssh username@hpc-transfer-2.cubi.bihealth.org\n
Your username for accessing the cluster is composed of your username at your primary organization (Charit\u00e9/MDC) and a suffix:
<Charite username>_c -> doej_c
<MDC username>_m -> jdoe_m
Please read Connecting from External Networks
"},{"location":"connecting/connecting/#i-have-problems-connecting","title":"I have problems connecting","text":"Please read Debugging Connection Problems
"},{"location":"connecting/connection-problems/","title":"Debugging Connection Problems","text":"When you encounter problems with the login to the cluster although we indicated that you should have access, depending on the issue, here is a list of how to solve the problem:
"},{"location":"connecting/connection-problems/#im-getting-a-connection-refused","title":"I'm getting a \"connection refused\"","text":"The full error message looks as follows:
ssh: connect to host hpc-login-1.cubi.bihealth.org port 22: Connection refused\n
This means that your computer could not open a network connection to the server.
<DEST>
):ifconfig\ntraceroute <DEST>\n
ipconfig\ntracepath <DEST>\n
You're logging into BIH HPC cluster! (login-1)\n\n ***Your account has not been granted cluster access yet.***\n\n If you think that you should have access, please contact\n hpc-helpdesk@bih-charite.de for assistance.\n\n For applying for cluster access, contact hpc-helpdesk@bih-charite.de.\n\nuser@login-1's password:\n
Hint
This is the most common error, and the main cause for this is a wrong username. Please take a couple of minutes to read the What is my username?!
If you encounter this message although we told you that you have access and you checked the username as mentioned above, please write to hpc-helpdesk@bih-charite.de, always indicating the message you get and a detailed description of what you did.
"},{"location":"connecting/connection-problems/#im-getting-a-passphrase-prompt","title":"I'm getting a passPHRASE prompt","text":"You're logging into BIH HPC cluster! (login-1)\n\n *** It looks like your account has access. ***\n\n Login is based on **SSH keys only**, if you are getting a password prompt\n then please contact hpc-helpdesk@bih-charite.de for assistance.\n\nEnter passphrase for key '/home/USER/.ssh/id_rsa':\n
Here you have to enter the passphrase that was used for encrypting your private key. Read SSH Basics for further information on what is going on here.
"},{"location":"connecting/connection-problems/#i-can-connect-but-i-get-a-password-prompt","title":"I can connect, but I get a passWORD prompt","text":"You're logging into BIH HPC cluster! (login-1)\n\n *** It looks like your account has access. ***\n\n Login is based on **SSH keys only**, if you are getting a password prompt\n then please contact hpc-helpdesk@bih-charite.de for assistance.\n\nuser@login-1's password:\n
This is different from a passPHRASE prompt
Please see I'm getting a passPHRASE prompt for more information.
When you encounter this message during a login attempt, there is an issue with your SSH key. In this case, please connect with increased verbosity to the cluster (ssh -vvv ...
) and mail the output and a detailed description to hpc-helpdesk@bih-charite.de.
This page describes how to connect to the BIH HPC from external networks (e.g., another university or from your home). The options differ depending on your home organization and are described in detail below.
Getting Help with VPN and Gateway Nodes
Please note that the VPNs and gateway nodes are maintained by the central IT departments of Charite/MDC. BIH HPC IT cannot assist you with problems with these servers. Authoritative information and documentation is provided by the central IT departments as well.
SSH Key Gotchas
You should use separate SSH key pairs for your workstation, laptop, home computer etc. As a reminder, you will have to register the SSH keys with your home IT organization (MDC or Charite). When using gateway nodes, please make sure to use SSH key agents and agent forwarding (ssh
flag \"-A
\").
Use the following command to perform a proxy jump via the MDC SSH gateway (ssh1
aka jail1
) when connecting to a login node. Note that for logging into the jail, the <MDC_USER>
is required.
$ ssh -J <MDC_USER>@ssh1.mdc-berlin.de <HPC_USER>@hpc-login-1.cubi.bihealth.org\n
Note
Please note that the cluster login is independent of access to the MDC jail node ssh1.mdc-berlin.de.
You can find the instructions for getting MDC VPN access here in the MDC intranet below the \"VPN\" heading. Please contact helpdesk@mdc-berlin.de for getting VPN access.
Install the VPN client and then start it. Once VPN has been activated you can SSH to the HPC just as from your workstation.
$ ssh user_m@hpc-login-1.cubi.bihealth.org\n
"},{"location":"connecting/from-external/#charite-users","title":"Charit\u00e9 Users","text":"Access to BIH HPC from external networks (including Eduroam) requires a Charit\u00e9 VPN connection with special access permissions.
"},{"location":"connecting/from-external/#general-charite-vpn-access","title":"General Charit\u00e9 VPN Access","text":"You need to apply for general Charit\u00e9 VPN access if you haven't done so already. The form can be found in the Charite Intranet and contains further instructions. Charit\u00e9 IT Helpdesk can help you with any questions.
"},{"location":"connecting/from-external/#zusatzantrag-b","title":"Zusatzantrag B","text":"Special permissions form B is also required for HPC access. You can find Zusatzantrag B in the Charit\u00e9 intranet. Fill it out and send it to the same address as the general VPN access form above.
Once you have been granted VPN access, start the client and connect to VPN. You will then be able to connect from your client in the VPN just as you do from your workstation.
$ ssh jdoe_c@hpc-login-1.cubi.bihealth.org\n
"},{"location":"connecting/from-external/#charite-vdi-not-recommended","title":"Charit\u00e9 VDI (Not recommended)","text":"Alternative to using Zusatzantrag B, you can also get access to the Charit\u00e9 VDI (Virtual Desktop Infrastructure). Here, you connect to a virtual desktop computer which is in the Charit\u00e9 network. From there, you can connect to the BIH HPC system.
You need to apply for extended VPN access to be able to access the BIH VDI. The form can be found here. It is important to tick Dienst(e), enter HTTPS and as target view.bihealth.org
. Please write to helpdesk@charite.de with the request to access the BIH VDI.
When the access has been set up, follow the instructions on client configuration for Windows, after logging in to the BIH VDI.
"},{"location":"connecting/ssh-basics/","title":"SSH Basics","text":""},{"location":"connecting/ssh-basics/#what-is-ssh","title":"What is SSH?","text":"SSH stands for S ecure Sh ell. It is a software that allows to establish a user-connection to a remote UNIX/Linux machine over the network and remote-control it from your local work-station.
Let's say you have an HPC cluster with hundreds of machines somewhere in a remote data-center and you want to connect to those machines to issue commands and run jobs. Then you would use SSH.
"},{"location":"connecting/ssh-basics/#getting-started","title":"Getting Started","text":""},{"location":"connecting/ssh-basics/#installation","title":"Installation","text":"Simply install your distributions openssh-client
package. You should be able to find plenty of good tutorials online. On Windows you can consider using MobaXterm (recommended) or Putty.
Let's call your local machine the client and the remote machine you want to connect to the server.
You will usually have some kind of connection information, like a hostname, IP address and perhaps a port number. Additionally, you should also have received your user-account information stating your user-name, your password, etc.
Follow the instructions below to establish a remote terminal-session.
If you are on Linux
Open a terminal and issue the following command while replacing all the <...>
fields with the actual data:
# default port\nssh <username>@<hostname-or-ip-address>\n\n# non-default port\nssh <username>@<hostname-or-ip-address> -p <port-number>\n
If you are on Windows
Start putty.exe
, go into the Session
category and fill out the form, then click the Connect
button. PuTTY also lets you save the connection information in different profiles so you don't have to memorize and retype all fields every time you want to connect.
When you connect to a remote machine via SSH, you will be prompted for your password. This will happen every single time you connect and can feel a bit repetitive at times, especially if you feel that your password is hard to memorize. For those who don't want to type in their password every single time they connect, SSH keys are an alternative way of authentication.
Instead of being prompted for a password, SSH will simply use the key to authenticate. As this key file should be device specific, this also increases the security of the login process.
You can generate a new key by issuing:
client:~$ ssh-keygen -t ed25519\n\n# 1. Choose file in which to save the key *(leave blank for default)*\n# 2. Choose a passphrase of at least five characters\n
"},{"location":"connecting/ssh-basics/#how-do-ssh-keys-work","title":"How do SSH-Keys work?","text":"An SSH key consists of two files, one private and one public key. The public key is installed on remote machines and can only be validated with the matching private key, which is stored on client computers. During the login process this is achieved via public-key cryptography.
Traditionally the algorithm used for this was RSA. Recently elliptic curve cryptography has been developed as a more secure and more performant alternative. We recommend the ed25519
type of SSH key.
The security problem with SSH keys is that anyone with access to the private key has full access to all machines that have the public key installed. Losing the key or having it compromised in another way poses a serious security threat. Therefore, it is best to secure the private key with a passphrase. This passphrase is needed to unlock and use the private key.
Once you have your key-pair generated, you can easily change the passphrase of that key by issuing:
client:~$ ssh-keygen -p\n
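The key handling described above can be tried on a throwaway key pair without touching your real keys. The sketch below assumes that OpenSSH's ssh-keygen is installed; the file name and both passphrases are made up, and passing passphrases on the command line is done here only to keep the demo non-interactive (do not do this for real keys, since command lines can be visible to other users).

```shell
key_dir=$(mktemp -d)
key_file="$key_dir/id_ed25519_demo"    # hypothetical file name for the demo
key_check=skipped

# Generate a key pair non-interactively: -f file, -N passphrase, -C comment.
if command -v ssh-keygen >/dev/null 2>&1 &&
   ssh-keygen -q -t ed25519 -f "$key_file" -N 'old passphrase' -C 'demo key' 2>/dev/null; then
    # Change the passphrase: -P gives the old one, -N the new one.
    ssh-keygen -q -p -f "$key_file" -P 'old passphrase' -N 'new passphrase' >/dev/null 2>&1
    # Re-derive the public key from the private key using the new passphrase
    # and compare it with the stored .pub file (key type and key data only).
    derived=$(ssh-keygen -y -f "$key_file" -P 'new passphrase' 2>/dev/null | cut -d' ' -f1,2)
    stored=$(cut -d' ' -f1,2 "$key_file.pub")
    if [ -n "$stored" ] && [ "$derived" = "$stored" ]; then
        key_check=ok
    else
        key_check=failed
    fi
fi
echo "key check: $key_check"
rm -rf "$key_dir"
```

Changing the passphrase only re-encrypts the private key file; the public key stays the same, which is why it does not need to be registered again.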
"},{"location":"connecting/ssh-basics/#ssh-agent","title":"SSH-Agent","text":"In order to avoid having to type the passphrase of the key every time we want to use it, the key can be loaded into an SSH-Agent.
For instance, if you have connected to a login-node via PuTTY and want to unlock your private key in order to be able to access cluster nodes, you can configure the SSH-Agent.
client:~$ source <(ssh-agent)\n
(The above command will load the required environment variables of the SSH-Agent into your shell environment, effectively making the agent available for your consumption.)
Next, you can load your private key:
client:~$ ssh-add\n
(You will be prompted for the passphrase of the key)
You can verify that the agent is running and your key is loaded by issuing:
client:~$ ssh-add -l\n# 'l' as in list-all-loaded-keys\n
(The command should print at least one key, showing the key-size, the hash of the key-fingerprint and the location of the file in the file-system.)
Since all home-directories are shared across the entire cluster and you created your key-pair inside your home-directory, your public key (which is also in your home-directory) is automatically installed on all other cluster nodes, immediately. Try connecting to any cluster node. It should not prompt you for a password.
There is nothing you have to do to \"unload\" or \"lock\" the key-file. Simply disconnect.
"},{"location":"connecting/advanced-ssh/linux/","title":"Connecting via SSH on Unix","text":""},{"location":"connecting/advanced-ssh/linux/#activating-your-key-in-the-ssh-key-agent","title":"Activating your Key in the SSH Key Agent","text":"Note
The big Linux distributions automatically manage ssh-agent for you and unlock your keys at login time. If this doesn't work for you, read on.
ssh-agent
caches your SSH keys so that you do not need to type your passphrase every time it is used. Activate it by making sure ssh-agent
runs in the background and add your key:
$ eval \"$(ssh-agent -s)\"\n$ ssh-add\n
or if you chose a custom key name, specify the file like so:
$ ssh-add ~/.ssh/mdc_id_rsa\n
"},{"location":"connecting/advanced-ssh/linux/#macos","title":"MacOS","text":"If you run into problems that your key is not accepted when connecting from MacOS, please use:
$ ssh-add --apple-use-keychain\n
"},{"location":"connecting/advanced-ssh/linux/#configure-ssh-client","title":"Configure SSH Client","text":"You can define a personal SSH configuration file to make connecting to the cluster more comfortable by reducing the typing necessary by a lot. Add the following lines to the file ~/.ssh/config
. Replace USER_NAME
with your cluster user name. You can also adapt the Host naming as you like.
Host bihcluster\n HostName hpc-login-1.cubi.bihealth.org\n User USER_NAME\n\nHost bihcluster2\n HostName hpc-login-2.cubi.bihealth.org\n User USER_NAME\n
Now you can type the following (and you don't have to remember the host name of the login node any more).
$ ssh bihcluster\n
This configuration works if you are inside Charit\u00e9, the Charit\u00e9 VPN, or MDC.
"},{"location":"connecting/advanced-ssh/linux/#mdc-users-jail-node","title":"MDC users: Jail node","text":"If you have an MDC user account and want to connect from the outside, you can use the following ~/.ssh/config
lines to set up a ProxyJump via the MDC SSH jail.
Host mdcjail\n HostName ssh1.mdc-berlin.de\n User MDC_USER_NAME\n
Now you can run
$ ssh -J mdcjail bihcluster\n
If you are always connecting from outside the internal network, you can also add a permanent ProxyJump to the SSH configuration like so:
Host bihcluster\n HostName hpc-login-1.cubi.bihealth.org\n User USER_NAME\n ProxyJump mdcjail\n
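Before relying on such a configuration, you can ask ssh which settings it would actually use for a host alias. The sketch below writes the entries from this section to a throwaway file (standing in for ~/.ssh/config) and queries it with ssh -G, which prints the effective options without opening a connection. It assumes a reasonably recent OpenSSH client; USER_NAME and MDC_USER_NAME are the placeholders from above, and the variable names are made up.

```shell
cfg=$(mktemp)    # throwaway file standing in for ~/.ssh/config
cat > "$cfg" <<'EOF'
Host mdcjail
    HostName ssh1.mdc-berlin.de
    User MDC_USER_NAME

Host bihcluster
    HostName hpc-login-1.cubi.bihealth.org
    User USER_NAME
    ProxyJump mdcjail
EOF

resolved_host=""
resolved_jump=""
if command -v ssh >/dev/null 2>&1; then
    # -F selects the configuration file, -G prints the resolved options.
    settings=$(ssh -G -F "$cfg" bihcluster 2>/dev/null)
    resolved_host=$(printf '%s\n' "$settings" | awk '$1 == "hostname" { print $2 }')
    resolved_jump=$(printf '%s\n' "$settings" | awk '$1 == "proxyjump" { print $2 }')
fi
echo "bihcluster -> ${resolved_host:-unknown} via ${resolved_jump:-none}"
rm -f "$cfg"
```

Against your real configuration, ssh -G bihcluster (without -F) performs the same check.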
"},{"location":"connecting/advanced-ssh/linux/#connecting-with-another-computerlaptop","title":"Connecting with another computer/laptop","text":"If you need to connect to the cluster from another computer than the one that contains the SSH keys that you submitted for the cluster login, you have two possibilities.
~/.ssh/id_rsa
) to the second computer into the same location.
Danger
Do not leave the key on any USB stick. Delete it after file transfer. This is sensitive data. Make sure that the files are readable only by you.
$ cd ~/.ssh\n$ chmod go-rwx id_rsa*\n$ ssh-add id_rsa\n
"},{"location":"connecting/advanced-ssh/linux/#file-system-mount-via-sshfs","title":"File System mount via sshfs","text":"$ sshfs <USERNAME>@hpc-transfer-1.cubi.bihealth.org:/ <MOUNTPOINT>\n
hpc-transfer-1:
follows the structure <host>:<directory>
starting in the user home.
<MOUNTPOINT>
must be an empty but existing and readable directory on your local computer.
Make sure you have both OSXFUSE and SSHFS installed. You can get both from here: https://osxfuse.github.io/ or the most recent version via Homebrew:
$ brew cask install osxfuse; brew install sshfs; brew link --overwrite sshfs\n
The last command is optional and unlinks any pre-existing links to older versions of sshfs. Now you can run $ sshfs -o follow_symlinks <USERNAME>@hpc-transfer-1<X>.cubi.bihealth.org:<directory_relative_to_Cluster_root> <MOUNTPOINT> -o volname=<BIH-FOLDER> -o allow_other,noapplexattr,noappledouble\n
"},{"location":"connecting/advanced-ssh/linux/#x11","title":"X11","text":"Do you really need to run a graphical application on the cluster?
Please note that running more complex Java applications, such as IGV, may not be very efficient because of the connection speed. In most cases you can run them on your local workstation and mount the data via SSHFS.
Connect to one of the login nodes using X11 forwarding:
$ ssh -X -C -t <USERNAME>@hpc-login-1.cubi.bihealth.org\n
Once you get a login prompt, you can use the srun
command with the --x11
parameter to open an X11 session to a cluster node:
$ srun --pty --x11 bash\n
And finally you can start your X11 application, e.g.:
$ xterm\n
After a while Visual Terminal should start:
"},{"location":"connecting/advanced-ssh/overview/","title":"Advanced SSH usage","text":"Here we describe custom scenarios for using SSH to connect to BIH HPC. To keep it consise, this section is divided into separate documents for
Danger
Mounting ssh on Windows is currently discouraged since relevant software is outdated (see also hpc-talk). Also, in most cases it is not really necessary to have a constant mount. For normal data transfer please use WinSCP instead.
Once WinSshFS is started, an icon will be added to your taskbar:
Left-clicking that icon will bring up a window. If not, right click the taskbar icon, select Show Manager
and click Add
in the menu.
Fill out the marked fields:
hpc-transfer-1.cubi.bihealth.org
PrivateKey
. Select the id_rsa
private key, not the .ppk
format that is provided by PuTTY. Enter the passphrase that you used to secure your key. Then click Save
and then Mount
.
Open the explorer. A new drive with the name you gave should show up:
Finished!
"},{"location":"connecting/advanced-ssh/windows/#connecting-via-mdc-jail-node","title":"Connecting via MDC Jail Node","text":"This requires an active MDC account!
In addition to the steps above, click on the tab Network settings
.
ssh1.mdc-berlin.de
and in the field User your MDC username. Do you really need to run a graphical application on the cluster?
Please note that running more complex Java applications, such as IGV, may not be very efficient because of the connection speed. In most cases, you can run them on your local workstation and access the cluster data by mounting it via SSHFS.
Start MobaXterm. It should automatically fetch your saved PuTTY sessions, as shown in the screenshot below:
Connect to one of the login nodes by double-clicking the saved profile, and then use the srun --pty --x11 bash
command to start an X11 session on one of the nodes:
Finally, start an X11 application (the example below starts Visual Terminal):
"},{"location":"connecting/generate-key/linux/","title":"Generating an SSH Key in Linux","text":"~/.ssh/id_xxx.pub
is present.
$ ssh-keygen -t ed25519 -C \"your_email@example.com\"\n
What is a key passphrase?
You should set a passphrase when generating your key pair. It is used for encrypting your private key in case it is stolen or lost. When using the key for login, you will have to enter the passphrase. Many desktop environments offer ways to automatically unlock your key on login.
Read SSH Basics for more information.
The whole session should look something like this:
host:~$ ssh-keygen -t ed25519 -C \"your_email@example.com\"\nGenerating public/private ed25519 key pair.\nEnter file in which to save the key (/home/USER/.ssh/id_ed25519): \nCreated directory '/home/USER/.ssh'.\nEnter passphrase (empty for no passphrase):\nEnter same passphrase again: \nYour identification has been saved in /home/USER/.ssh/id_ed25519.\nYour public key has been saved in /home/USER/.ssh/id_ed25519.pub.\nThe key fingerprint is:\nSHA256:Z6InW1OYt3loU7z14Kmgy87iIuYNr1gJAN1tG71D7Jc your_email@example.com\nThe key's randomart image is:\n+--[ED25519 256]--+\n|.. . . o |\n|. . . + + |\n|. . = . . |\n|. . +oE. |\n|. So= o o |\n| . . . * = + + |\n| + o + B o o .|\n| oo+. .B + + . |\n|.ooooooo*. . |\n+----[SHA256]-----+\n
The file content of ~/.ssh/id_ed25519.pub
should look something like this:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFzuiaSVD2j5y6RlFxOfREB/Vbd+47ABlxF7du5160ZH your_email@example.com\n
"},{"location":"connecting/generate-key/linux/#submit-your-key","title":"Submit Your Key","text":"As a next step, submit your SSH key using the link that matches your affiliation:
Prerequisite: Installing an SSH Client
Please install an SSH client for Windows first.
"},{"location":"connecting/generate-key/windows/#generate-the-key","title":"Generate the Key","text":"Click on Tools
and MobaKeyGen (SSH key generator)
In the section Parameters make sure to set the following properties:
RSA
(this is the SSH-2
protocol)4096
If all is set, hit the Generate button.
During generation, move the mouse cursor around in the blank area.
When finished, make sure to protect your generated key with a passphrase. Save the private and public key. The default name under Linux for the public key is id_rsa.pub
and id_rsa
for the private key, but you can name them however you want (the .pub
is NOT automatically added). Note that in the whole cluster wiki we will use this file naming convention. Also note that the private key will be stored in Putty format (.ppk
, this extension is added automatically).
What is your key's passphrase?
You should set a passphrase when generating your private key. This passphrase is used for encrypting your private key, which protects it in case the private key file is stolen or lost. When using the key for login, you will have to enter the passphrase (or enter it once when you first load the key into the SSH key agent). Note that the passphrase prompt does not come from the cluster (and is thus unrelated to it) but from your local computer.
Also see SSH Basics for more information.
The seemingly random text in the text box is your public key in the format in which it has to be submitted to MDC and Charite (links for this step below). Copy this text and paste it into the SSH key submission web service of your institution.
Store the private key additionally in the OpenSSH format. To do so, click Conversions
and select Export OpenSSH key
. To be consistent, give the file the same name as your .ppk
private key file above (just without the .ppk
).
To summarize, you should end up with three files:
id_rsa.pub
The public key file. It is not required if you copy and submit the SSH public key as described above and in the links below.
id_rsa.ppk
This file is only needed if you plan to use PuTTY.
id_rsa
This is your private key and the single most important file for accessing the cluster. It will be added to the sessions in MobaXterm and WinSSHFS (if required).
As a next step, submit your SSH key using the link that matches your affiliation:
As of February 2020, SSH key submission is no longer accepted via email. Instead, use the process outlined here.
For any help, please contact helpdesk@charite.de (as this site is maintained by Charite GB IT).
"},{"location":"connecting/submit-key/charite/#charite-zugangsportal","title":"Charite Zugangsportal","text":"Keys are submitted in the Charite Zugangsportal. As of Feb 4, you have to use the \"test\" version for this.
Go to zugang.charite.de and login.
Follow through the login page until you reach the main menu (it's tedious but we believe in you ;) Click the \"SSH Keys\" button.
Paste your SSH key (starting with ssh-rsa
) and ending with the label (usually your email, e.g., john.doe@charite.de
) into the box (1) and press append (2). By default, the key can be found in the file ~/.ssh/id_rsa.pub
in Linux. If you generated the key in Windows, please paste the copied key from the text box. Repeat as necessary. Optionally, go back to the main menu (3) when done.
If you have generated your SSH key with PuTTY, right-click the .ppk file and choose \"Edit with PuTTYgen\" in the context menu. Enter your passphrase. Then copy the SSH key from the upper box (already highlighted in blue).
Check if the key has been added
After you clicked append
, your key will be printed back to you (as shown in the blurred picture above).
If your key is not printed back to you, then adding the SSH key to zugang.charite.de was not successful. In this case, please contact helpdesk@charite.de for assistance, as they (Charite GB IT) maintain that system and it is out of our (BIH HPC IT) control.
Once your key has been added, it will take a few minutes for the changes to go live.
"},{"location":"connecting/submit-key/mdc/","title":"Submitting an SSH Key to MDC","text":"For MDC users, SSH keys are submitted through the MDC PersDB interface (see below). PersDB is not maintained by BIH HPC IT but by MDC IT.
Warning
The SSH keys are only activated over night (but automatically). This is out of our control. Contact helpdesk@mdc-berlin.de for more information.
"},{"location":"connecting/submit-key/mdc/#detour-using-mdc-vmware-view-to-get-into-mdc-intranet","title":"Detour: Using MDC VMWare View to get into MDC Intranet","text":"In case you are not within the MDC network, connect to MDC VMWare view first and use the web browser in the Windows session.
~/.ssh/id_rsa.pub
into the clipboard window. Ensure that the whole file content is there (it should end with your email address). If you generated the key in Windows, please paste the copied key from the text box.
Thus, you will only be able to connect the next day.
Bask in the glory of having completed this process.
"},{"location":"cubit/","title":"Overview","text":"The static data installation can be found at /data/cephfs-1/work/projects/cubit/18.12/static_data
.
The static data directory contains a sub-directory for the genomes, the precomputed index files for several different popular mapping tools and associated annotation (GFF and GTF files) from Ensembl and GENCODE for each of the available genomes. The top-level directory structure is as follows:
static_data/
annotations
app_support
db
exome_panel
exon_list
precomputed
reference
The following Ensembl and GENCODE versions corresponding to the indicated reference genomes will be made available on the cluster.
Database | Version | Reference Genome
Ensembl | 65 | NCBIM37 (Ensembl release corresponding to GENCODE M1)
Ensembl | 67 | NCBIM37 (Ensembl release for sanger mouse genome assembly)
Ensembl | 68 | GRCm38 (Ensembl release for sanger mouse genome assembly)
Ensembl | 74 | GRCh37 (Ensembl release for GENCODE 19)
Ensembl | 75 | GRCh37 (Latest release for GRCh37)
Ensembl | 79 | GRCh38 (Ensembl release for GENCODE 22)
Ensembl | 80 | GRCh38 (Ensembl release corresponding to GENCODE 22)
Ensembl | 80 | GRCm38 (Ensembl release corresponding to GENCODE M1)
GENCODE | M1 | NCBIM37 (No gff3 file)
GENCODE | M5 | GRCm38
GENCODE | 19 | current for GRCh37
GENCODE | 22 | current for GRCh38
The annotation files associated with the indicated genomes can be accessed in the following directories:
static_data/annotation\n\u251c\u2500\u2500 ENSEMBL\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 65\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 67\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 68\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCm38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 74\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 75\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 79\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 80\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCm38\n\u2514\u2500\u2500 GENCODE\n \u251c\u2500\u2500 19\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n \u251c\u2500\u2500 22\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n \u251c\u2500\u2500 M1\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n \u2514\u2500\u2500 M5\n \u2514\u2500\u2500 GRCm38\n
"},{"location":"cubit/app-support/","title":"Cubit Static Data: Application Support","text":"The static_data/app_support
directory contains all data files that are shipped with a software package installed in cubit. For blast
this is not complete and more databases can be added upon request.
static_data/app_support\n\u251c\u2500\u2500 blast\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 variable\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 nt\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 refseq_protein\n\u251c\u2500\u2500 Delly\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.6.5\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.6.7\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.1\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.3\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.7.5\n\u251c\u2500\u2500 GATK_bundle\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2.8\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u251c\u2500\u2500 Jannovar\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.14\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.16\n\u251c\u2500\u2500 kraken\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.10.5-cubi20160426\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 bacvir\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 minikraken_20141208\n\u251c\u2500\u2500 Oncotator\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 v1_ds_Jan262015\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 1000genome_db\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 achilles\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cancer_gene_census\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ccle_by_gene\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ccle_by_gp\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 clinvar\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cosmic\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cosmic_fusion\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cosmic_tissue\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dbNSFP_ds\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dbsnp\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dna_repair_genes\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 esp6500SI_v2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 esp6500SI_v2_coverage\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 familial\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 gencode_out2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 gencode_xrefseq\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hgnc\n\u2502\u00a0\u00a0 
\u251c\u2500\u2500 mutsig\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 oreganno\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 override_lists\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ref_hg\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 simple_uniprot\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 so_terms\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 tcgascape\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 tumorscape\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 uniprot_aa_annotation\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 uniprot_aa_xform\n\u2514\u2500\u2500 SnpEff\n \u2514\u2500\u2500 4.1\n \u2514\u2500\u2500 data\n \u251c\u2500\u2500 GRCh37.75\n \u251c\u2500\u2500 GRCh38.79\n \u251c\u2500\u2500 GRCm38.79\n \u251c\u2500\u2500 hg19\n \u251c\u2500\u2500 hg38\n \u2514\u2500\u2500 mm10\n
"},{"location":"cubit/databases/","title":"Databases","text":"The file formats in the static_data/db
folder are mostly .vcf
or .bed
files. We provide the following databases:
The directory structure is as follows:
static_data/db\n\u251c\u2500\u2500 COSMIC\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 v72\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u251c\u2500\u2500 dbNSFP\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2.9\n\u251c\u2500\u2500 dbSNP\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b128\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b142\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b144\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 b147\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n\u251c\u2500\u2500 DGV\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2015-07-23\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u251c\u2500\u2500 ExAC\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 release0.3\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 release0.3.1\n\u251c\u2500\u2500 giab\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NA12878_HG001\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NISTv2.19\n\u251c\u2500\u2500 goldenpath\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 variable\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u251c\u2500\u2500 SangerMouseGenomesProject\n\u2502 \u2514\u2500\u2500 REL-1211-SNPs_Indels\n\u2502 \u251c\u2500\u2500 mm9\n\u2502 \u2514\u2500\u2500 NCBIM37\n\u2514\u2500\u2500 UK10K_cohort\n \u2514\u2500\u2500 REL-2012-06-02\n
"},{"location":"cubit/exomes-panels/","title":"Exomes and Panels","text":"These exome panel data are proprietary and downloaded after registration. In case you want to use them, be sure you have access to them by creating an account at Agilent or Roche to not run into legal trouble.
static_data/exome_panel\n\u251c\u2500\u2500 Agilent\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SureSelect_Human_All_Exon_V4\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SureSelect_Human_All_Exon_V5\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SureSelect_Human_All_Exon_V6\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 SureSelect_Mouse_All_Exon_V1\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2514\u2500\u2500 Roche\n \u2514\u2500\u2500 SeqCap_EZ_MedExome\n \u2514\u2500\u2500 GRCh37\n
"},{"location":"cubit/exon-lists/","title":"Exon Lists","text":"Here we provide exon lists for some human genome assemblies in the .bed
-file format. Each file exists with the original coordinates contained and as a version with 10 bp padded on each site (suffix: _plus_10bp.bed
). The folder structure is self-explanatory:
static_data/exon_list\n\u251c\u2500\u2500 CCDS\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 15\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 18\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hg38\n\u2514\u2500\u2500 ENSEMBL\n \u251c\u2500\u2500 74\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n \u2514\u2500\u2500 75\n \u2514\u2500\u2500 GRCh37\n
"},{"location":"cubit/index-files/","title":"Precomputed Index Files","text":"Index files for
have been precomputed. The index corresponding to each genome is stored in the following directory structure with the above mentioned reference genomes as subfolders (listed here only for Bowtie/1.1.2
, same subfolders for the remaining programs):
static_data/precomputed\n\u251c\u2500\u2500 Bowtie\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 1.1.2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 danRer10\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dm6\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ecoli\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCm38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg18\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm10\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 phix\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 sacCer3\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 UniVec\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 UniVec_Core\n\u251c\u2500\u2500 Bowtie2\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2.2.5\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n\u251c\u2500\u2500 BWA\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.12\n\u2502\u00a0\u00a0 \u2502 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.7.15\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n\u2514\u2500\u2500 STAR\n \u2514\u2500\u2500 2.4.1d\n \u2514\u2500\u2500 default\n \u00a0\u00a0 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n
"},{"location":"cubit/references/","title":"Reference Sequences","text":""},{"location":"cubit/references/#ncbi-mouse-reference-genome-assemblies","title":"NCBI mouse reference genome assemblies","text":"We provide the NCBI mouse reference assembly used by the Sanger Mouse Genomics group for NCBIM37 and GRCm38. This is a reliable source where the appropriate contigs have already been selected by experts. NCBIM37 is annotated with Ensembl release 67 and GRCm38 with Ensembl release 68.
"},{"location":"cubit/references/#ucsc-mouse-reference-genome-assemblies","title":"UCSC mouse reference genome assemblies","text":"The assembly sequence is in one file per chromosome and is available for mm9 and mm10. We concatenated all the chromosome files to one final fasta file for each genome assembly.
"},{"location":"cubit/references/#ncbi-human-reference-genome-assemblies","title":"NCBI human reference genome assemblies","text":"The assembly sequence is in one file per chromosome and is available for hg18, hg19 and hg38. We concatenated all the chromosome files to one final fasta file for each genome assembly. Additionally, in the subfolder chromosomes
we keep the chromosome fasta files separately for hg18 and hg19.
The following directory structure indicates the available genomes. Where there isn't a name for the data set, either the source (e.g. sanger - from the Sanger Mouse Genomes project) or the download date is used to name the sub-directory.
static_data/reference\n\u251c\u2500\u2500 danRer10\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 dm6\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 ecoli\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GCA_000005845.2_ASM584v2\n\u251c\u2500\u2500 genomemedley\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 1\n\u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 g1k_phase1\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 g1k_phase2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hs37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hs37d5\n\u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hs38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hs38a\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hs38DH\n\u251c\u2500\u2500 GRCm38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 sanger\n\u251c\u2500\u2500 hg18\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 hg38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 mm10\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 sanger\n\u251c\u2500\u2500 phix\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 illumina\n\u251c\u2500\u2500 sacCer3\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 UniVec\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 9\n\u2514\u2500\u2500 UniVec_Core\n \u2514\u2500\u2500 9\n
"},{"location":"help/faq/","title":"Frequently Asked Questions","text":""},{"location":"help/faq/#where-can-i-get-help","title":"Where can I get help?","text":"Please see the section Connection Problems.
"},{"location":"help/faq/#connecting-to-the-cluster-takes-a-long-time","title":"Connecting to the cluster takes a long time.","text":"The most probable cause for this is a conda installation which defaults to loading the (Base) environment on login. To disable this behaviour you can run:
$ conda config --set auto_activate_base false\n
You can also run the bash shell in verbose mode to find out exactly which command is slowing down login:
$ ssh user@hpc-login-1.cubi.bihealth.org bash -iv\n
"},{"location":"help/faq/#what-is-the-difference-between-max-and-bih-cluster-what-is-their-relation","title":"What is the difference between MAX and BIH cluster? What is their relation?","text":"Administrativa
Requests for both systems are handled separately, depending on the user's affiliation with research/service groups.
Hardware and Systems
Bioinformatics Software
packet_write_wait: Connection to XXX : Broken pipe
\". How can I fix this?","text":"Try to put the following line at the top of your ~/.ssh/config
.
ServerAliveInterval 30\n
This will make ssh
send an empty network packet to the server after 30 seconds without traffic. This prevents network hardware from considering your connection unused or broken and terminating it.
If the problem persists, please report it to hpc-helpdesk@bih-charite.de.
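The keep-alive setting can also be applied from the command line. The following is only a minimal sketch, not an official tool: the helper name ensure_keepalive is hypothetical, and it assumes that a single global ServerAliveInterval line at the top of the file is what you want.

```shell
# Hypothetical helper (not part of any cluster tooling): put the keep-alive
# option at the top of an SSH client config unless it is already set.
ensure_keepalive() {
    config_file=${1:-$HOME/.ssh/config}
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    # Leave the file alone if an active ServerAliveInterval line exists.
    if grep -qiE '^[[:space:]]*ServerAliveInterval[[:space:]=]' "$config_file"; then
        return 0
    fi
    tmp_file=$(mktemp) || return 1
    { printf 'ServerAliveInterval 30\n'; cat "$config_file"; } > "$tmp_file"
    # Overwrite in place so that the existing file permissions are kept.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}
```

Calling ensure_keepalive without an argument edits ~/.ssh/config. Passing a path lets you try it on a copy of your configuration first.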
"},{"location":"help/faq/#my-job-terminated-before-being-done-what-happened","title":"My job terminated before being done. What happened?","text":"First of all, look into your job logs. In the case that the job was terminated by Slurm (e.g., because it ran too long), you will find a message like this at the bottom. Please look at the end of the last line in your log file.
slurmstepd: error: *** JOB <your job id> ON med0xxx CANCELLED AT 2020-09-02T21:01:12 DUE TO TIME LIMIT ***\n
This indicates that you need to increase the --time
limit of your sbatch
command.
slurmstepd: error: Detected 2 oom-kill event(s) in step <your job id>.batch cgroup.\nSome of your processes may have been killed by the cgroup out-of-memory handler\n
This indicates that your job tried to use more memory than has been allocated to it. Also see Slurm Scheduler: Memory Allocation.
Otherwise, you can use sacct -j JOBID
to read the information that the job accounting system has recorded for your job. A job that was canceled (indicated by CANCELLED
) by the Slurm job scheduler looks like this (ignore the COMPLETED
step that is just some post-job step added by Slurm automatically).
# sacct -j _JOBID_\n JobID JobName Partition Account AllocCPUS State ExitCode\n------------ ---------- ---------- ---------- ---------- ---------- --------\n_JOBID_ snakejob.+ medium hpc-ag-xx+ 4 TIMEOUT 0:0\n_JOBID_.bat+ batch hpc-ag-xx+ 4 CANCELLED 0:15\n_JOBID_.ext+ extern hpc-ag-xx+ 4 COMPLETED 0:0\n
Use the --long
flag to see all fields (and probably pipe it into less
as: sacct -j JOBID --long | less -S
). Things to look out for:
MaxRSS
)?Elapsed
)?Note that --long
does not show all fields. For example, the following tells us that the given job was above its elapsed time which caused it to be killed.
# sacct -j _JOBID_ --format Timelimit,Elapsed\n Timelimit Elapsed\n---------- ----------\n 01:00:00 01:00:12\n 01:00:13\n 01:00:12\n
Use man sacct
, sacct --helpformat
, or see the Slurm Documentation for options for the --format
field of sacct
.
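The comparison of Timelimit and Elapsed can be scripted. This is a rough sketch under stated assumptions: the function names are hypothetical, and only the HH:MM:SS and D-HH:MM:SS forms are handled, so values such as UNLIMITED are rejected.

```shell
# Convert a Slurm duration (HH:MM:SS or D-HH:MM:SS) to seconds.
# Any other form makes the function fail with a non-zero status.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; ok = 1 }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; ok = 1 }
        END { exit !ok }'
}

# Succeed when the elapsed time has reached the time limit.
# Arguments: TIMELIMIT ELAPSED (as printed by sacct).
hit_time_limit() {
    limit=$(slurm_time_to_seconds "$1") || return 2
    elapsed=$(slurm_time_to_seconds "$2") || return 2
    [ "$elapsed" -ge "$limit" ]
}
```

With the values from the listing above, hit_time_limit 01:00:00 01:00:12 succeeds, which matches the TIMEOUT state reported for the job.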
This is most probably caused by your job being allocated insufficient memory. Please see the memory part of the answer to My job terminated before being done. What happened?
"},{"location":"help/faq/#how-can-i-create-a-new-project","title":"How can I create a new project?","text":"You can create a project if you are either a group leader of an AG or a delegate of an AG. If this is the case, please follow these instructions.
"},{"location":"help/faq/#i-cannot-create-pngs-in-r","title":"I cannot create PNGs in R","text":"For using the png
method, you need to have an X11 session running. This might be the case if you logged into a cluster node using srun --x11
if configured correctly but is not the case if you submitted a batch job. The solution is to use xvfb-run
(xvfb = X11 virtual frame-buffer).
Here is the content of an example script:
$ cat img.R\n#!/usr/bin/env Rscript\n\npng('cars.png')\ncars <- c(1, 3, 6, 4, 9)\nplot(cars)\ndev.off()\n
Here, it fails without X11:
$ ./img.R\nError in .External2(C_X11, paste(\"png::\", filename, sep = \"\"), g$width, :\n unable to start device PNG\nCalls: png\nIn addition: Warning message:\nIn png(\"cars.png\") : unable to open connection to X11 display ''\nExecution halted\n
Here, it works with xvfb-run
:
$ xvfb-run ./img.R\nnull device\n 1\n$ ls\ncars.png foo.png img.R Rplots.pdf\n
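If the same script is run both interactively and in batch jobs, a small wrapper can choose between the two cases. This is a sketch only: the name run_with_display is hypothetical, and it assumes that a non-empty DISPLAY variable means a usable X11 session, which is not guaranteed.

```shell
# Hypothetical wrapper: run a command directly when DISPLAY is set,
# otherwise run it under xvfb-run (the X11 virtual frame-buffer).
run_with_display() {
    if [ -n "${DISPLAY:-}" ]; then
        "$@"
    else
        xvfb-run "$@"
    fi
}
```

For the example above, this would be called as run_with_display ./img.R.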
"},{"location":"help/faq/#my-jobs-dont-get-scheduled","title":"My jobs don't get scheduled","text":"You can use scontrol show job JOBID
to get the details displayed about your jobs. In the example below, we can see that the job is in the PENDING
state. The Reason
field tells us that the job was not scheduled because the specified dependency was never fulfilled. You can find a list of all job reason codes in the Slurm squeue
documentation.
JobId=863089 JobName=pipeline_job.sh\n UserId=holtgrem_c(100131) GroupId=hpc-ag-cubi(5272) MCS_label=N/A\n Priority=1 Nice=0 Account=(null) QOS=normal\n JobState=PENDING Reason=DependencyNeverSatisfied Dependency=afterok:863087(failed)\n Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n RunTime=00:00:00 TimeLimit=08:00:00 TimeMin=N/A\n SubmitTime=2020-05-03T18:57:34 EligibleTime=Unknown\n AccrueTime=Unknown\n StartTime=Unknown EndTime=Unknown Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-05-03T18:57:34\n Partition=debug AllocNode:Sid=hpc-login-1:28797\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=(null)\n NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=1,node=1,billing=1\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)\n Command=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/pipeline_job.sh\n WorkDir=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export\n StdErr=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out\n StdIn=/dev/null\n StdOut=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out\n Power=\n MailUser=(null) MailType=NONE\n
If you see a Reason=ReqNodeNotAvail,_Reserved_for_maintenance
then also see Reservations / Maintenances.
For GPU jobs also see \"My GPU jobs don't get scheduled\".
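To pull a single field out of the scontrol listing, a small filter is enough. The sketch below uses a hypothetical helper named job_field and assumes that the value contains no spaces, which holds for JobState and Reason in job listings but not for every field.

```shell
# Hypothetical filter: print the value of one KEY=VALUE field from
# scontrol output read on standard input. Values with spaces are cut
# at the first space.
job_field() {
    tr -s ' ' '\n' | awk -F= -v key="$1" '
        $1 == key { print substr($0, length(key) + 2); exit }'
}
```

A typical call would be scontrol show job JOBID | job_field Reason, which prints DependencyNeverSatisfied for the job shown above.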
"},{"location":"help/faq/#my-gpu-jobs-dont-get-scheduled","title":"My GPU jobs don't get scheduled","text":"There are only four GPU machines in the cluster (with four GPUs each, hpc-gpu-1 to hpc-gpu-4). Please inspect first the number of running jobs with GPU resource requests:
hpc-login-1:~$ squeue -o \"%.10i %20j %.2t %.5D %.4C %.10m %.16R %.13b\" \"$@\" | grep hpc-gpu- | sort -k7,7\n 1902163 ONT-basecalling R 1 2 8G hpc-gpu-1 gpu:tesla:2\n 1902167 ONT-basecalling R 1 2 8G hpc-gpu-1 gpu:tesla:2\n 1902164 ONT-basecalling R 1 2 8G hpc-gpu-2 gpu:tesla:2\n 1902166 ONT-basecalling R 1 2 8G hpc-gpu-2 gpu:tesla:2\n 1902162 ONT-basecalling R 1 2 8G hpc-gpu-3 gpu:tesla:2\n 1902165 ONT-basecalling R 1 2 8G hpc-gpu-3 gpu:tesla:2\n 1785264 bash R 1 1 1G hpc-gpu-4 gpu:tesla:2\n
This indicates that there are two free GPUs on hpc-gpu-4.
Second, inspect the node states:
hpc-login-1:~$ sinfo -n hpc-gpu-[1-4]\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST\ndebug* up 8:00:00 0 n/a\nmedium up 7-00:00:00 0 n/a\nlong up 28-00:00:0 0 n/a\ncritical up 7-00:00:00 0 n/a\nhighmem up 14-00:00:0 0 n/a\ngpu up 14-00:00:0 1 drng hpc-gpu-4\ngpu up 14-00:00:0 3 mix med[0301-0303]\nmpi up 14-00:00:0 0 n/a\n
This tells you that hpc-gpu-1 to hpc-gpu-3 have jobs running (\"mix\" indicates that there are free resources, but these are only CPU cores, not GPUs). hpc-gpu-4 is shown to be in \"draining state\". Let's look at what's going on there.
hpc-login-1:~$ scontrol show node hpc-gpu-4\nNodeName=hpc-gpu-4 Arch=x86_64 CoresPerSocket=16\n CPUAlloc=2 CPUTot=64 CPULoad=1.44\n AvailableFeatures=skylake\n ActiveFeatures=skylake\n Gres=gpu:tesla:4(S:0-1)\n NodeAddr=hpc-gpu-4 NodeHostName=hpc-gpu-4 Version=20.02.0\n OS=Linux 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020\n RealMemory=385215 AllocMem=1024 FreeMem=347881 Sockets=2 Boards=1\n State=MIXED+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A\n Partitions=gpu\n BootTime=2020-06-30T20:33:36 SlurmdStartTime=2020-07-01T09:31:51\n CfgTRES=cpu=64,mem=385215M,billing=64\n AllocTRES=cpu=2,mem=1G\n CapWatts=n/a\n CurrentWatts=0 AveWatts=0\n ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s\n Reason=deep power-off required for PSU [root@2020-07-17T13:21:02]\n
The \"State\" attribute indicates that the node has jobs running but is currently being \"drained\" (it accepts no new jobs). The \"Reason\" states that it has been scheduled for power-off for maintenance of the power supply unit.
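Counting the allocated GPUs per node by eye gets tedious with many jobs. The following sketch sums them up from the squeue listing shown earlier. The function name is hypothetical, and it assumes that the last two columns are the node name and a request of the form gpu:TYPE:COUNT or gpu:COUNT, as in that listing.

```shell
# Hypothetical filter: sum the GPUs requested by running jobs per node.
# Reads squeue lines on standard input; the node is expected in the
# second-to-last column and the GPU request in the last column.
gpus_in_use_per_node() {
    awk '{
        n = split($NF, gres, ":")
        if (n >= 2 && gres[1] == "gpu") used[$(NF - 1)] += gres[n]
    }
    END { for (node in used) print node, used[node] }' | sort
}
```

Piping the listing above into this function would report 4 GPUs in use on hpc-gpu-1 to hpc-gpu-3 and 2 on hpc-gpu-4, in line with the two free GPUs noted there.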
"},{"location":"help/faq/#when-will-my-job-be-scheduled","title":"When will my job be scheduled?","text":"You can use the scontrol show job JOBID
command to inspect the scheduling information for your job. For example, the following job is scheduled to start at 2022-09-19T07:53:29
(StartTime
) and will be terminated if it does not stop before 2022-09-19T15:53:29
(EndTime
). For further information, it was submitted at 2022-09-15T12:24:57
(SubmitTime
) and has been last considered by the scheduler at 2022-09-19T07:53:15
(LastSchedEval
).
# scontrol show job 4225062\nJobId=4225062 JobName=C2371_2\n UserId=user_c(133196) GroupId=hpc-ag-group(1030014) MCS_label=N/A\n Priority=805 Nice=0 Account=hpc-ag-group QOS=normal\n JobState=PENDING Reason=QOSMaxCpuPerUserLimit Dependency=(null)\n Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n RunTime=00:00:00 TimeLimit=08:00:00 TimeMin=N/A\n SubmitTime=2022-09-15T12:24:57 EligibleTime=2022-09-15T12:24:57\n AccrueTime=2022-09-15T12:24:57\n StartTime=2022-09-19T07:53:29 EndTime=2022-09-19T15:53:29 Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-09-19T07:53:15 Scheduler=Main\n Partition=medium AllocNode:Sid=hpc-login-1:557796\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=(null)\n NumNodes=1-1 NumCPUs=25 NumTasks=25 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=25,mem=150G,node=1,billing=25\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=150G MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)\n Command=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/GS_wrapy/wrap_y0_VP_2371_GS_chunk2_C02.sh\n WorkDir=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims\n StdErr=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/E2371_2.txt\n StdIn=/dev/null\n StdOut=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/slurm-4225062.out\n Power=\n
"},{"location":"help/faq/#my-jobs-dont-run-in-the-partition-i-expect","title":"My jobs don't run in the partition I expect","text":"You can see the partition that your job runs in with squeue -j JOBID
:
hpc-login-1:~$ squeue -j 877092\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 877092 medium snakejob holtgrem R 0:05 1 med0626\n
See Job Scheduler for information about the partition's properties and how jobs are routed to partitions. You can force jobs to run in a particular partition by specifying the --partition
parameter, e.g., by adding --partition=medium
or -p medium
to your srun
and sbatch
calls.
This is probably answered by the answer to My jobs don't run in the partition I expect.
"},{"location":"help/faq/#how-can-i-mount-a-network-volume-from-elsewhere-on-the-cluster","title":"How can I mount a network volume from elsewhere on the cluster?","text":"You cannot.
"},{"location":"help/faq/#how-can-i-make-workstationserver-files-available-to-the-hpc","title":"How can I make workstation/server files available to the HPC?","text":"You can transfer files to the cluster through Rsync over SSH or through SFTP to the hpc-transfer-1
or hpc-transfer-2
node.
Do not transfer files through the login nodes. Large file transfers through the login nodes can cause performance degradation for the users with interactive SSH connections.
"},{"location":"help/faq/#how-can-i-circumvent-invalid-instruction-signal-4-errors","title":"How can I circumvent \"invalid instruction\" (signal 4) errors?","text":"Make sure that software is compiled with \"sandy bridge\" optimizations and no later one. E.g., use the -march=sandybridge
argument to the GCC/LLVM compiler executables.
If you absolutely need it, there are some boxes with more recent processors in the cluster (e.g., Haswell architecture). Look at the /proc/cpuinfo
files for details.
Please check whether there might be other jobs waiting in front of you! The following squeue
call will show the allocated GPUs of jobs in the gpu
queue. This is done by specifying a format string and using the %b
field.
squeue -o \"%.10i %9P %20j %10u %.2t %.10M %.6D %10R %b\" -p gpu\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(R TRES_PER_NODE\n 872571 gpu bash user1 R 15:53:25 1 hpc-gpu-3 gpu:tesla:1\n 862261 gpu bash user2 R 2-16:26:59 1 hpc-gpu-4 gpu:tesla:4\n 860771 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1\n 860772 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1\n 860773 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1\n 860770 gpu kidney.job user3 R 4-03:23:08 1 hpc-gpu-1 gpu:tesla:1\n 860766 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-3 gpu:tesla:1\n 860767 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-1 gpu:tesla:1\n 860768 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-1 gpu:tesla:1\n
In the example above, user1 has one job with one GPU running on hpc-gpu-3, user2 has one job running with 4 GPUs on hpc-gpu-4, and user3 has 7 jobs in total running on different machines with one GPU each.
"},{"location":"help/faq/#how-can-i-access-graphical-user-interfaces-such-as-for-matlab-on-the-cluster","title":"How can I access graphical user interfaces (such as for Matlab) on the cluster?","text":"-X
for Linux/Mac ssh
srun --pty --x11 bash -i
(instead of srun --pty bash -i
).Also see:
This is sometimes useful, e.g., for monitoring the CPU/GPU usage of your job interactively.
No Computation Outside of Slurm
Do not perform any computation outside of the scheduler as (1) this defeats the purpose of the scheduling system and (2) administration is not aware of it and might kill your jobs.
The answer is simple, just SSH into this node.
hpc-login-1:~$ ssh hpc-cpu-xxx\n
"},{"location":"help/faq/#why-am-i-getting-multiple-nodes-to-my-job","title":"Why am I getting multiple nodes to my job?","text":"Classically, jobs on HPC systems are written in a way that they can run on multiple nodes at once, using the network to communicate. Slurm comes from this world and when allocating more than one CPU/core, it might allocate them on different nodes. Please use --nodes=1
to force Slurm to allocate them on a single node.
You can select the CPU architecture by using the -C
/--constraint
flag to sbatch
and srun
. The following are available (as detected by the Linux kernel):
ivybridge
(96 nodes, plus 4 high-memory nodes)haswell
(16 nodes)broadwell
(112 nodes)skylake
(16 nodes, plus 4 GPU nodes) You can specify constraints with OR such as --constraint=haswell|broadwell|skylake
. You can see the assignment of architectures to nodes using the sinfo -o \"%8P %.5a %.10l %.6D %.6t %10f %N\"
command. This will also display node partition, availability etc.
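On the command line, the constraint has to be quoted, because an unquoted | would be read by the shell as a pipe. The following is a minimal sketch; the function name newer_cpu_shell is hypothetical, the architecture names are the ones listed above.

```shell
# Hypothetical helper: start an interactive shell on any node that has
# one of the listed CPU architectures.
newer_cpu_shell() {
    # The single quotes keep the shell from treating | as a pipe.
    srun --constraint='haswell|broadwell|skylake' --pty bash -i
}
```

In a batch script, the same constraint can be given as an #SBATCH --constraint= line, where the quoting is not needed.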
No worries!
As documented in the Storage Locations section, each user/project/group has three storage volumes: A small home
, a larger work
and a large (but temporary) scratch
. There are limits on the size of these volumes. You get a nightly warning email if you are over the soft limit, and you will not be able to write any more data once you exceed the hard limit. When you log in to the login nodes, the quotas and current usage are displayed to you.
Please note that not all files will be displayed when using ls
. You have to add the -a
parameter to also show files and directories starting with a dot. Often, users are confused when these dot directories take up all of their home
quota.
Use the following command to list all files and directories in your home:
hpc-login-1:~$ ls -la ~/\n
For more information on how to keep your home directory clean and avoid quota warnings, please read Home Folder Quota.
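To see which entries (hidden ones included) actually use the space, a sketch like the following can help. The function name home_usage is hypothetical; find, du, and sort are standard tools.

```shell
# Hypothetical helper: disk usage of every direct child of the home
# directory, hidden entries included, sizes in KiB, largest entry last.
home_usage() {
    # -mindepth 1 -maxdepth 1 selects the direct children only.
    find ~ -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -n
}
```

Running this on large directories can take a while, so it is best used in an interactive Slurm session rather than on a login node.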
"},{"location":"help/faq/#im-getting-a-disk-quota-exceeded-error","title":"I'm getting a \"Disk quota exceeded\" error.","text":"Most probably you are running into the same problem as described above: Help, I'm getting a Quota Warning Email!
"},{"location":"help/faq/#environment-modules-dont-work-and-i-get-module-command-not-found","title":"Environment modules don't work and I get \"module: command not found\"","text":"First of all, ensure that you are on a compute node and not on one of the login nodes. One common reason is that the system-wide Bash configuration has not been loaded, try to execute source /etc/bashrc
and then re-try using module
. In the case that the problem persists, please contact hpc-helpdesk@bih-charite.de.
All users get their home directory set up using skeleton files. These file names start with a dot .
and are hidden when you type ls
; you have to type ls -a
to see them. You can find the current skeleton in /etc/skel.bih
and inspect the content of the Bash related files as follows:
hpc-login-1:~$ head /etc/skel.bih/.bash*\n==> /etc/skel.bih/.bash_logout <==\n# ~/.bash_logout\n\n==> /etc/skel.bih/.bash_profile <==\n# .bash_profile\n\n# Get the aliases and functions\nif [ -f ~/.bashrc ]; then\n . ~/.bashrc\nfi\n\n# User specific environment and startup programs\n\nPATH=$PATH:$HOME/.local/bin:$HOME/bin\n\n==> /etc/skel.bih/.bashrc <==\n# .bashrc\n\n# Source global definitions\nif [ -f /etc/bashrc ]; then\n . /etc/bashrc\nfi\n\n# Uncomment the following line if you don't like systemctl's auto-paging feature:\n# export SYSTEMD_PAGER=\n
There actually are a couple more files by default. The original copy in /etc/skel.bih
might slightly change over time during improvements but we will not touch your home directory in an unsolicited way at any time!
hpc-login-1:~$ tree -a /etc/skel.bih/\n/etc/skel.bih/\n\u251c\u2500\u2500 .bash_logout\n\u251c\u2500\u2500 .bash_profile\n\u251c\u2500\u2500 .bashrc\n\u251c\u2500\u2500 .screenrc\n\u2514\u2500\u2500 .vimrc\n
"},{"location":"help/faq/#my-program-crashes-what-should-i-do","title":"My program crashes! What should I do?","text":"Have a look at our How-To: Debug Software and How-To: Debug Software on HPC Systems guides!
But it works on my workstation!
Yes, please also refer to these guides on possible approaches to find the problem.
"},{"location":"help/faq/#my-r-sessions-is-always-crashing","title":"My R session is always crashing!","text":"We are aware of issues for macOS users that are related to locales. You might see something like the following in your R session or logs.
Note that this is not specific to macOS as such; it is caused by macOS using a locale that R does not understand.
Try using export LC_ALL=C
before calling R. If the problem goes away, you can add the following lines to your ~/.bashrc
file to make this permanent. Note that this is actually a good idea so your locale settings are fixed.
# Fix all locale settings to \"C\" for robustness.\nexport LC_ALL=C\n
ChatGPT on Unix Locales
Let's see what ChatGPT 3.5 tells us about Unix locales.
Me: Please explain the concept of Unix locales to me.
Certainly! Unix locales are a set of environment variables and settings that define how various aspects of software and user interactions should be handled to accommodate different languages, regions, and cultural conventions. They play a crucial role in enabling software to adapt to local preferences for displaying text, formatting numbers, handling dates, and more.
In a Unix-like operating system (such as Linux or macOS), locales provide a way to customize how programs behave based on the user's language, country, and cultural norms. Here are some of the key components of Unix locales:
Language: The primary language used for messages, menus, prompts, and system output. For instance, if you set your locale to \"en_US\", messages and interfaces will be in English (United States).
Region: The country or region that defines conventions for aspects like date and time formatting, number formatting (decimal point or comma, thousands separator, etc.), and currency symbols. For example, \"en_GB\" represents English (Great Britain).
Character Encoding: Specifies how characters are encoded, which is vital for correctly displaying and handling text in various scripts and languages. Common encodings include UTF-8 for Unicode text.
Collation: Defines how strings should be sorted. Different languages have different sorting orders for characters, and the locale determines the appropriate collation rules.
Numeric and Date Formats: Specifies how numbers, dates, and times should be formatted. Different regions have different conventions for date formats (dd/mm/yyyy vs. mm/dd/yyyy) and numerical representations (commas vs. periods as decimal points).
Currency Symbols: Defines the symbol or abbreviation used for representing currency. This is important for currency formatting in software.
Unix locales are typically identified by a combination of language and region codes, like \"en_US\" for American English or \"fr_FR\" for French (France). These locales are stored in files on the system, and they contain the necessary rules and data for proper localization.
Users and administrators can set the system's default locale, and individual users can override this setting in their environment. Here are some common commands used to manage locales:
locale
: Displays the current locale settings.locale -a
: Lists all available locales on the system.export LC_ALL=locale_code
: Temporarily sets the locale for the current session.update-locale locale_code
: Changes the system-wide default locale.Programs that are designed to be locale-aware use these settings to display messages, format numbers and dates, and handle other language-dependent features. This allows software to be more user-friendly and relevant to different linguistic and cultural contexts.
In summary, Unix locales provide a framework for adapting software behavior to match the language, region, and cultural preferences of users, ensuring a more inclusive and globally accessible computing experience.
"},{"location":"help/faq/#which-cuda-version-is-installed","title":"Which CUDA version is installed?","text":"For this, connect to the node you want to query (via SSH but do not perform any computation via SSH!)
hpc-login-1:~$ ssh hpc-gpu-1\nhpc-gpu-1:~$ yum list installed 2>/dev/null | grep cuda.x86_64\ncuda.x86_64 10.2.89-1 @local-cuda\nnvidia-driver-latest-dkms-cuda.x86_64 3:440.64.00-1.el7 @local-cuda\n
"},{"location":"help/faq/#can-i-use-docker-on-the-cluster","title":"Can I use Docker on the Cluster?","text":"No, as Docker essentially gives you access as the root user.
However, you can use Apptainer (formerly Singularity) to run containers (and even many Docker containers if they are \"properly built\"). Also see Using Apptainer (with Docker Images).
"},{"location":"help/faq/#how-can-i-copy-data-between-the-max-cluster-mdc-network-and-bih-hpc","title":"How can I copy data between the MAX Cluster (MDC Network) and BIH HPC?","text":"The MAX cluster is the HPC system of the MDC. It is located in the MDC network. The BIH HPC is located in the BIH network.
In general, connections can only be initiated from the MDC network to the BIH network. The reverse does not work. In other words, you have to log into the MAX cluster and then initiate your file copies to or from the BIH HPC from there. E.g., use rsync -avP some/path user_m@hpc-transfer-1.cubi.bihealth.org:/another/path
to copy files from the MAX cluster to BIH HPC and rsync -avP user_m@hpc-transfer-1.cubi.bihealth.org:/another/path some/path
to copy data from the BIH HPC to the MAX cluster.
In general, connections can only be initiated from the Charite network to the BIH network. The reverse does not work. In other words, you have to be on a machine inside the Charite network and then initiate your file copies to or from the BIH HPC from there. E.g., use rsync -avP some/path user_c@hpc-transfer-1.cubi.bihealth.org:/another/path
to copy files from the Charite network to BIH HPC and rsync -avP user_c@hpc-transfer-1.cubi.bihealth.org:/another/path some/path
to copy data from the BIH HPC to your machine in the Charite network.
As of December 3, 2020 we have established a policy to limit you to 512 files and 128MB of RAM. Further, you are limited to using the equivalent of one core. This limit is enforced for all processes originating from an SSH session and the limit is enforced on all jobs. This was done to prevent users from thrashing the head nodes or using SSH based sessions for computation.
"},{"location":"help/faq/#slurm-complains-about-execve-no-such-file-or-directory","title":"Slurm complains about execve
/ \"No such file or directory\"","text":"This means that the program that you want to execute does not exist. Consider the following example:
[user@hpc-login-1 ~]$ srun --time 2-0 --nodes=1 --ntasks-per-node=1 \\\n --cpus-per-task=12 --mem 96G --partition staging --immediate 5 \\\n --pty bash -i\nslurmstepd: error: execve(): 5: No such file or directory\nsrun: error: hpc-cpu-2: task 0: Exited with exit code 2\n
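Can you spot the problem? Slurm tried to execute 5 as the program. The reason is that some long arguments, such as --immediate
, take an optional value, and such a value is only recognized when attached with an equal sign, as in --arg=value
. The simplest rule is to always use the equal sign with Slurm: instead of writing --mem 96G --partition staging --immediate 5
, write --mem=96G --partition=staging --immediate=5
.
Arguments with a mandatory value, such as --mem, also accept the space-separated form, so only the optional-value case fails. Using the equal sign throughout avoids having to remember which is which.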
"},{"location":"help/faq/#slurmstepd-says-that-hwloc_get_obj_below_by_type-fails","title":"slurmstepd
says that hwloc_get_obj_below_by_type
fails","text":"You can ignore the following problem:
slurmstepd: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5\nslurmstepd: error: task[0] unable to set taskset '0x0'\n
This is a minor failure related to Slurm and cgroups. Your job should run through successfully despite this error (that is more of a warning for end-users).
"},{"location":"help/faq/#how-can-i-share-filescollaborate-with-users-from-another-work-group","title":"How can I share files/collaborate with users from another work group?","text":"Please use projects as documented here. Projects were created for this particular purpose.
"},{"location":"help/faq/#whats-the-relation-of-charite-mdc-and-cluster-accounts","title":"What's the relation of Charite, MDC, and cluster accounts?","text":"For HPC 4 Research either an active and working Charite or MDC account is required (that is, you can login e.g., into email.charite.de or mail.mdc-berlin.de). The system has a separate meta directory that is used for the authorization of users (in other words, whether the user is active, has access to the system, and which groups the user belongs to). Charite and MDC accounts map to accounts <Charite user name>_c
and <MDC user name>_m
in this meta directory. In the case that a user has both Charite and MDC accounts, these are completely separate entities in the meta directory. For authentication (verifying that a user has access to an account), the Charite and MDC account systems (MS Active Directory) are used. Authentication currently only uses the SSH keys deposited into Charite (via zugang.charite.de) and MDC (via MDC persdb). Users have to obtain a suitable Charite/MDC account via Charite and MDC central IT departments and upload their SSH keys through the host organization systems on their own. The hpc-helpdesk process is then used for getting their accounts set up on the HPC 4 Research system (the home/work/scratch shares being set up), becoming part of the special hpc-users
group that controls access to the system and organizing users into work groups and projects.
The process of submitting keys to Charite and MDC is documented in the \"Connecting\" section.
"},{"location":"help/faq/#how-do-charitemdccluster-accounts-interplay-with-vpn-and-the-mdc-jail-node","title":"How do Charite/MDC/Cluster accounts interplay with VPN and the MDC jail node?","text":"Charite users have to obtain a VPN account with the appropriate VPN access permissions, i.e., Zusatzantrag B as documented here. For Charite VPN, as for all Charite IT systems, users must use their Charite user name (e.g., jdoe
and not jdoe_c
).
MDC users either have to use MDC VPN or the MDC jail node, as documented here. For MDC VPN and jail node, as for all MDC IT systems, users must use their MDC user name (e.g., jdoe
and not jdoe_m
).
For help with VPN or jail node, please contact the central Charite or MDC helpdesks as appropriate.
Only when connecting from the host organizations' VPN or from the host organizations' jail node, the users use the HPC 4 Research user name that is jdoe_c
or jdoe_m
and not jdoe
!
BIH HPC IT does not have the resources to offer such a service to normal users.
In particular, for privacy sensitive data this comes with a large number of strings attached to fulfill all regulatory requirements. If you need to exchange such data then you need to contact the central IT departments of your home organisation:
If your data is not privacy sensitive or you can guarantee strong encryption of the data then the Gigamove service of RWTH Aachen might come in handy:
You can login via Charite/MDC credentials (or most German academic institutions) and store up to 1TB of data at a time in the account with each file having up to 100GB.
As a note, Charite GB IT has a (German) manual on how to use 7-Zip with AES256 and strong passwords for encrypting data such that it is fit for transfer over unencrypted channels. You can find it here (Charite Intranet only) at point 2.12.
The key point is using a strong password (e.g. with the pwgen
utility), creating an encrypted file with AES256 encryption, using a distinct password for each recipient, and exchanging the password over a second channel (SMS or voice phone). Note that the central manual remains the ground truth of information and this FAQ entry may not reflect the current process recommended by GB IT if it changes without us noticing.
Can you solve the question yourself?
Please try to solve the question yourself with this manual and Google.
If the problem turns out to be hard, we're happy to help.
This page describes how to write a good help request ticket.
There are more specific questions for common issues given below.
"},{"location":"help/good-tickets/#problems-connecting-to-the-cluster","title":"Problems Connecting to the Cluster","text":"ifconfig
on Linux/Mac, ipconfig
on Windows)?ssh-add -l
and add -vvv
to the SSH command that fails for you.scontrol show job <jobid>
or sacct --long -j <jobid>
of your job.Getting Help
Our helpdesk can be reached via email to hpc-helpdesk@bih-charite.de. Please read our guide on how to write good tickets first.
Please also use the handy figure below on general problem resolution.
But before contacting the helpdesk, try to get help in the HPC Talk BIH HPC user self-help forum!
"},{"location":"help/helpdesk/#helpdesk-scope","title":"Helpdesk Scope","text":"Our helpdesk can support you in the following areas:
We will try our best to resolve these issues. Please note that all other questions can only be answered in a \"best effort way\".
"},{"location":"help/helpdesk/#helpdesk-non-scope","title":"Helpdesk Non-Scope","text":"The following topics are out of scope for the BIH HPC Helpdesk:
We're happy to see if we can help when there is a concrete problem with the software, e.g.,
Another community-driven possibility to get help is our \u201cHPC Talk\u201d forum. After this manual, it should be the first place to consult.
https://hpc-talk.cubi.bihealth.org/
Its main purpose is to serve as a FAQ, so with time and more people participating, you will more likely find an answer to your question. We also use it to make announcements and give an up-to-date status of current problems with the cluster, so it is worth logging in every once in a while. It is also a great first place to look at if you're experiencing problems with the cluster. Maybe it's a known issue.
Although users can also answer questions, our admins participate on a regular basis.
"},{"location":"how-to/connect/gpu-nodes/","title":"How-To: Connect to GPU Nodes","text":"The cluster has seven nodes with four Tesla V100 GPUs each: hpc-gpu-{1..7}
and one node with 10 A40 GPUs: hpc-gpu-8
.
Connecting to a node with GPUs is easy. You request one or more GPUs by adding a generic resources flag to your Slurm job submission via srun
or sbatch
. - --gres=gpu:tesla:COUNT
will request NVIDIA V100 GPUs. - --gres=gpu:a40:COUNT
will request NVIDIA A40 GPUs. - --gres=gpu:COUNT
will request any available GPUs.
Your job will be automatically placed in the Slurm gpu
partition and allocated a number of COUNT
GPUs.
Info
Fair use rules apply. As GPU nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
Interactive Use of GPU Nodes is Discouraged
While interactive computation on the GPU nodes is convenient, it makes it very easy to forget a job after your computation is complete and let it run idle. While your job is allocated, it blocks the allocated GPUs and other users cannot use them although you might not be actually using them. Please prefer batch jobs for your GPU jobs over interactive jobs.
Furthermore, interactive GPU jobs are currently limited to 24 hours. We will monitor the situation and adjust that limit to optimize GPU usage and usability.
Please also note that allocation of GPUs through Slurm is mandatory, in other words: Using GPUs via SSH sessions is prohibited. The scheduler is not aware of manually allocated GPUs and this interferes with other users' jobs.
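Since batch jobs are preferred, a GPU request could look like the following in a batch script. This is a minimal sketch: the job name, time, and memory values are placeholders, and the gpu partition and the gpu:tesla resource name are the ones described on this page.

```shell
#!/bin/bash
# Sketch of a GPU batch script; job name, time and memory are placeholders.
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:tesla:1
#SBATCH --time=01:00:00
#SBATCH --mem=8G

# Slurm sets CUDA_VISIBLE_DEVICES for the allocated GPUs.
# Outside of a GPU job the variable is unset.
echo CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}

# List the GPUs visible to this job if the NVIDIA tools are installed.
if command -v nvidia-smi > /dev/null; then
    nvidia-smi -L
fi
```

Submit the script with sbatch. The job releases its GPU as soon as the script finishes, which avoids the idle allocations described above.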
"},{"location":"how-to/connect/gpu-nodes/#usage-example","title":"Usage example","text":""},{"location":"how-to/connect/gpu-nodes/#preparation","title":"Preparation","text":"We will set up a miniconda installation with pytorch
for testing the GPU. If you already have this set up, you can skip this step.
hpc-login-1:~$ srun --pty bash\nhpc-cpu-1:~$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\nhpc-cpu-1:~$ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/work/miniconda3\nhpc-cpu-1:~$ source ~/work/miniconda3/bin/activate\nhpc-cpu-1:~$ conda create -y -n gpu-test pytorch cudatoolkit=10.2 -c pytorch\nhpc-cpu-1:~$ conda activate gpu-test\nhpc-cpu-1:~$ python -c 'import torch; print(torch.cuda.is_available())'\nFalse\nhpc-cpu-1:~$ exit\nhpc-login-1:~$\n
The False
shows that CUDA is not available on the node but that is to be expected. We're only warming up!
Let us now allocate a GPU. The Slurm scheduler will properly allocate GPUs for you and set up the environment variables that tell CUDA which devices are available. The following dry run shows these environment variables (and that they are not available on the login node).
hpc-login-1:~$ export | grep CUDA_VISIBLE_DEVICES\nhpc-login-1:~$ srun --gres=gpu:tesla:1 --pty bash\nhpc-gpu-1:~$ export | grep CUDA_VISIBLE_DEVICES\ndeclare -x CUDA_VISIBLE_DEVICES=\"0\"\nhpc-gpu-1:~$ exit\nhpc-login-1:~$ srun --gres=gpu:tesla:2 --pty bash\nhpc-gpu-1:~$ export | grep CUDA_VISIBLE_DEVICES\ndeclare -x CUDA_VISIBLE_DEVICES=\"0,1\"\n
As you see, you can also reserve multiple GPUs. If we were to open two concurrent connections (e. g. in a screen
) to the same node when allocating one GPU each, the allocated GPUs would be non-overlapping. Note that any two jobs are isolated using Linux cgroups (\"container\" technology) so you cannot accidentally use a GPU of another job.
Now to the somewhat boring part where we show that CUDA actually works.
hpc-login-1:~$ srun --gres=gpu:tesla:1 --pty bash\nhpc-gpu-1:~$ nvcc --version\nnvcc: NVIDIA (R) Cuda compiler driver\nCopyright (c) 2005-2019 NVIDIA Corporation\nBuilt on Wed_Oct_23_19:24:38_PDT_2019\nCuda compilation tools, release 10.2, V10.2.89\nhpc-gpu-1:~$ source ~/work/miniconda3/bin/activate\nhpc-gpu-1:~$ conda activate gpu-test\nhpc-gpu-1:~$ python -c 'import torch; print(torch.cuda.is_available())'\nTrue\n
Note
If scheduling a GPU fails, consider explicitly requesting the GPU partition via --partition gpu
(or #SBATCH --partition gpu
).
Also make sure to read the FAQ entry \"I have problems connecting to the GPU node! What's wrong?\" if you encounter problems.
"},{"location":"how-to/connect/gpu-nodes/#bonus-1-who-is-using-the-gpus","title":"Bonus #1: Who is using the GPUs?","text":"Use squeue
to find out about currently queued jobs (the egrep
only keeps the header and entries in the gpu
partition).
hpc-login-1:~$ squeue | egrep -iw 'JOBID|gpu'\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 33 gpu bash holtgrem R 2:26 1 hpc-gpu-1\n
"},{"location":"how-to/connect/gpu-nodes/#bonus-2-is-the-gpu-running","title":"Bonus #2: Is the GPU running?","text":"To find out how active the GPU nodes actually are, you can connect to the nodes (without allocating a GPU; you can do this even if the node is full) and then use nvidia-smi
.
hpc-login-1:~$ ssh hpc-gpu-1 bash\nhpc-gpu-1:~$ nvidia-smi\nFri Mar 6 11:10:08 2020\n+-----------------------------------------------------------------------------+\n| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |\n|-------------------------------+----------------------+----------------------+\n| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n|===============================+======================+======================|\n| 0 Tesla V100-SXM2... Off | 00000000:18:00.0 Off | 0 |\n| N/A 62C P0 246W / 300W | 16604MiB / 32510MiB | 99% Default |\n+-------------------------------+----------------------+----------------------+\n| 1 Tesla V100-SXM2... Off | 00000000:3B:00.0 Off | 0 |\n| N/A 61C P0 270W / 300W | 16604MiB / 32510MiB | 100% Default |\n+-------------------------------+----------------------+----------------------+\n| 2 Tesla V100-SXM2... Off | 00000000:86:00.0 Off | 0 |\n| N/A 39C P0 55W / 300W | 0MiB / 32510MiB | 0% Default |\n+-------------------------------+----------------------+----------------------+\n| 3 Tesla V100-SXM2... Off | 00000000:AF:00.0 Off | 0 |\n| N/A 44C P0 60W / 300W | 0MiB / 32510MiB | 4% Default |\n+-------------------------------+----------------------+----------------------+\n\n+-----------------------------------------------------------------------------+\n| Processes: GPU Memory |\n| GPU PID Type Process name Usage |\n|=============================================================================|\n| 0 43461 C python 16593MiB |\n| 1 43373 C python 16593MiB |\n+-----------------------------------------------------------------------------+\n
"},{"location":"how-to/connect/gpu-nodes/#fair-share-fair-use","title":"Fair Share / Fair Use","text":"Note that allocating a GPU makes it unavailable for everyone else, so please behave nicely and be cooperative. If you see someone blocking the GPU nodes for a long time, first find out who it is. You can type getent passwd USER_NAME
on any cluster node to see their email address (and work phone number if added). Send a friendly email; most likely they blocked the node accidentally. If you cannot resolve the issue (e.g., the user is not reachable), then please contact hpc-helpdesk@bih-charite.de.
The cluster has 4 high-memory nodes with 1.5 TB of RAM. You can connect to these nodes using the highmem
SLURM partition (see below). Jobs allocating more than 200 GB of RAM are automatically routed to the highmem
nodes.
Info
Fair use rules apply. As high-memory nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
The cluster has four high-memory nodes that can be used:
hpc-login-1:~$ sinfo -p highmem\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST \nhighmem up 14-00:00:0 3 idle med040[1-4] \n
To connect to one of them, simply allocate more than 200GB of RAM in your job.
hpc-login-1:~$ srun --pty --mem=300GB bash -i\nmed0401:~$\n
You can also pick one of the hostnames:
hpc-login-1:~$ srun --pty --mem=300GB --nodelist=med0403 bash -i\nmed0403:~$\n
After successful login, you can see that you are in the \"highmem\" partition:
med0403:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \n[...]\n 270 highmem bash holtgrem R 1:25 1 med0403 \n
"},{"location":"how-to/misc/contribute/","title":"How-To: Contribute to this Document","text":"Click on the edit link at the top of each page as shown below.
Please Contribute!
This guide is far from complete. Please feel free to contribute, e.g., refer to How-To: Contribute to this Document.
Please make sure that you have read How-To: Debug Software as a general primer.
As debugging is hard enough already, it makes one wonder how to do this on the HPC system in batch mode. Here is a list of pointers.
"},{"location":"how-to/misc/debug-at-hpc/#attempt-1-run-it-interactively","title":"Attempt 1: Run it interactively!","text":"First of all, you can of course get an interactive session using srun --pty bash -i
and then run your program interactively. Make sure to allocate appropriate memory and cores for your purpose. You might also want to first start a screen
or tmux
session on the login node such that network interruptions to the login node don't harm your hard debugging work!
Does the program work correctly if you do this? If yes, and it only fails when run in batch mode, consider the following behaviour of the scheduler.
The scheduler takes your resource requirements and tries to find a free slot. Once it has found a free slot, it will attempt to run the program. This differs from running it interactively mainly in the standard input, output, and error streams.
/dev/null
such that no input is read. You can change this with the --input=
flag to specify a file.--output=
. You can use certain wildcards to make the output (but also the input files) depend on things like the job ID or job name.mkdir
it in the job script itself.Please refer to the sbatch documentation for details.
If your program fails without leaving any log file or any other trace, make sure that the path to the output file exists. To the best of the author's knowledge, there is no way to tell apart a crash because this path does not exist and a program failure (except maybe for the running time of 0 seconds and memory usage of 0 bytes).
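One way to avoid the missing-path problem is to create the log directory before submitting. The following is a sketch under stated assumptions: the function name submit_with_logs, the logs directory, and the script name job.sh are hypothetical placeholders; %x (job name) and %j (job ID) are sbatch file name patterns.

```shell
# Hypothetical wrapper: make sure the log directory exists, then submit.
submit_with_logs() {
    # Create the directory up front so that the output path exists.
    mkdir -p logs || return 1
    # %x expands to the job name and %j to the job ID.
    # job.sh is a placeholder for your own job script.
    sbatch --output=logs/%x-%j.out --error=logs/%x-%j.err job.sh
}
```

Writing standard output and standard error to separate files is a design choice; leaving out --error folds both streams into the --output file.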
"},{"location":"how-to/misc/debug-at-hpc/#attempt-2-inspect-the-logs","title":"Attempt 2: Inspect the logs","text":"Do you see any exception in your log files? If not, continue.
If your job is canceled by scancel
or stopped because it exhausted its maximal running time or allocated resources, then you will find a note in the last line of your error output log (usually folded into the standard output). Please note that if the previous output line did not include a line ending, the message might be at the very end of the last line.
The message will look similar to:
slurmstepd: error: *** JOB <your job id> ON med0xxx CANCELLED AT 2020-09-02T21:01:12 DUE TO TIME LIMIT ***\n
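If you have many log files, you may want to search them for this message automatically. The following is only a sketch: the function name is made up for illustration, and the marker string is taken from the example message above. It searches within each line instead of only at the line start, because the message may be glued to the end of an unterminated output line.

```python
def find_slurm_cancellations(log_text):
    # Collect all slurmstepd error messages found anywhere inside a line.
    marker = 'slurmstepd: error: ***'
    hits = []
    for line in log_text.splitlines():
        position = line.find(marker)
        if position != -1:
            # Keep only the message, dropping unrelated output before it.
            hits.append(line[position:])
    return hits
```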
"},{"location":"how-to/misc/debug-at-hpc/#attempt-3-increase-loggingprinting","title":"Attempt 3: Increase logging/printing","text":"Ideally, you can add one or more --verbose
/-v
flags to your program to increase verbosity. See how far your program gets, see where it fails. This attempt will be greatly helped by reproducible running on a minimal working example.
sattach
","text":"You can use sattach
for attaching your terminal to your running job. This way, you can perform an interactive inspection of the commands.
You can combine this with one of the following attempts that use debuggers, e.g., to get a pdb
debugger at an important position of your program. However, please note that pdb
and ipdb
will stop the program's execution if the standard input stream is at end of file (which /dev/null
is and this is used by default in sbatch
jobs).
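One way to avoid this is to enter the debugger only when standard input is a terminal. The sketch below assumes this is the behaviour you want; both helper names are hypothetical and not part of pdb.

```python
import pdb
import sys


def stdin_is_interactive(stream=None):
    # True only if the stream is attached to a terminal.
    stream = sys.stdin if stream is None else stream
    try:
        return stream.isatty()
    except (AttributeError, ValueError):
        # A missing or closed stdin counts as non-interactive.
        return False


def breakpoint_if_interactive():
    if stdin_is_interactive():
        # Stop in the frame of the caller rather than inside this helper.
        pdb.Pdb().set_trace(sys._getframe(1))
    else:
        print('stdin is not a terminal, skipping pdb', file=sys.stderr)
```

In a plain sbatch job, breakpoint_if_interactive() would then print a note to the error log and continue instead of stopping the program.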
Log into the node that your program runs on either using srun --pty --nodelist=NODE
or using ssh
. Please note that you should never run computationally intensive work when logging into the node directly. You can then use all activity inspection tips from How-To: Debug Software.
After having logged into the node running your program, you can of course also attach to the program with gdb -p PID
or cgdb -p PID
.
Here are some final remarks:
Please Contribute!
This guide is far from complete. Please feel free to contribute, e.g., refer to How-To: Contribute to this Document.
Software development in general or even debugging of software are very broad topics. As such, we will not be able to handle them here comprehensively. Rather, we will give a tour de force on practical and minimal approaches of debugging of software. Here, debugging refers to the process of locating errors in your program and removing them.
Origin of the term debugging
The terms \"bug\" and \"debugging\" are popularly attributed to Admiral Grace Hopper in the 1940s. While she was working on a Mark II computer at Harvard University, her associates discovered a moth stuck in a relay and thereby impeding operation, whereupon she remarked that they were \"debugging\" the system. However, the term \"bug\", in the sense of \"technical error\", dates back at least to 1878 and Thomas Edison (see software bug for a full discussion).
-- Wikipedia: Debugging
If we forget for a moment everything known about software engineering, programming roughly works in the following cycle:
You run your program. In the case of failure, you need to remove the problem until the program runs through. You then start implementing the next change or feature. But how do you actually locate the problem? Let us walk through a couple of steps.
"},{"location":"how-to/misc/debug-software/#step-1-find-out-that-there-is-an-error","title":"Step 1: Find out that there is an error","text":"This might seem trivial but let us think about this for a moment. For this
You could make this step a bit more comfortable by writing a little checker script that compares expected and actual output.
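Such a checker could look like the following minimal sketch, which uses the difflib module from the Python standard library. The function names are illustrative assumptions.

```python
import difflib


def diff_outputs(expected_lines, actual_lines):
    # Return a unified diff as a list of lines; an empty list means equal output.
    return list(difflib.unified_diff(
        expected_lines, actual_lines,
        fromfile='expected', tofile='actual', lineterm=''))


def outputs_match(expected_text, actual_text):
    # Compare two program outputs line by line.
    return not diff_outputs(expected_text.splitlines(), actual_text.splitlines())
```

Printing the result of diff_outputs shows you exactly which lines differ, which is often the first hint at where the error is.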
"},{"location":"how-to/misc/debug-software/#step-2-reproduce-your-error","title":"Step 2: Reproduce your error","text":"You will have to find out how often or regularly the problem occurs. Does the problem occur on all inputs or only specific ones? Does it occur with all parameters? Make sure that you can reproduce the problem, otherwise the problem will be hard to track down.
Discard randomness
In most applications, true randomness is neither required nor used in programs. Rather, pseudo random number generators are used that are usually seeded with a special value. In many cases, the current time is used which makes it hard to reproduce problems. Rather, use a fixed seed, e.g., by calling srand(42)
in C. You could also make this a parameter of your program, but make sure that you can fix all pseudo randomness in your program so you can deterministically reproduce its behaviour.
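The same idea in Python, as a small sketch: the function name and the default seed are assumptions for illustration. A private random.Random instance with a fixed seed is used so that other code drawing random numbers does not disturb the sequence.

```python
import random


def draw_sample(n, seed=42):
    # A private generator with a fixed seed yields the same values on every run.
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]
```

Exposing seed as a parameter lets you rerun a failing case with exactly the pseudo random numbers that triggered it.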
Try to find a minimal input set on which you can produce your problem. For example, you could use samtools view FILE.bam chr1:90,000-100,000
to cut out regions from a BAM file. The next step is to nail down the problem. Ideally, you can deactivate or comment out parts of your program that are irrelevant to the problem.
This will allow you to get to the problematic point in your program quicker and make the whole debugging exercise easier on yourself.
"},{"location":"how-to/misc/debug-software/#interlude-what-we-have-up-to-here","title":"Interlude: What we have up to here","text":"We can now
If you reached the points above, you have probably cut the time to resolve the problem by 90% already.
Let us now consider a few things that you can do from here to find the source of your problems.
"},{"location":"how-to/misc/debug-software/#method-1-stare-at-your-source-code","title":"Method 1: Stare at your source code","text":"Again, this is trivial, but: look at your code and try to follow through what it does with your given input. This is nicely complemented with the following methods. ;-)
There is a class of tools to help you in doing this, so-called static code analysis tools. They analyze the source code for problematic patterns. The success and power of such analysis tools tends to correlate strongly with how strictly typed the targeted programming language is. E.g., there are very powerful tools for Java and C/C++. However, there is also some useful tool support out there for dynamic languages such as Python.
Here is a short list of pointers to static code analysis tools (feel free to extend the list):
The simplest approach is to use print
statements (or similar) to print the current line or value of parameters. While sometimes frowned upon, this certainly is one of the most robust ways to see what is happening in your program. However, beware that too much output might slow down your program or actually make your problem disappear in the case of subtle threading/timing issues (sometimes referred to as \"Heisenbugs\").
Standard output vs. error
Classically, Linux/Unix programs can print back to the user's terminal in two ways: standard output and standard error. By convention, logging should go to stderr. The standard error stream also has the advantage that writing to it has a more direct effect. In contrast to stdout, which is usually set up to be (line) buffered (you will only see output after the next newline character), stderr is unbuffered.
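In Python, a debug print to stderr could look like the sketch below. The helper name log is an assumption; flush=True is passed so that the message appears immediately regardless of how the stream is buffered.

```python
import sys


def log(message, stream=None):
    # Write a debug message to stderr (or another stream) and flush right away.
    stream = sys.stderr if stream is None else stream
    print(message, file=stream, flush=True)
```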
"},{"location":"how-to/misc/debug-software/#look-at-tophtop","title":"Look attop
/htop
","text":"The tools top
and htop
are useful tools for inspecting the activity on the current computer. The following parameters are useful (and are actually also available as key strokes when they are running).
-c -- show the programs' command lines
-u USER -- show the processes of the user
You can exit either tool by pressing q or Ctrl-C.
Use the man
, Luke!
Besides searching the internet for a unix command, you can also read its manual page by running man TOOL
. If this does not work, try TOOL --help
to see its builtin help function. Also, doing an internet search for \"man tool\" might help.
strace
","text":"The program strace
allows you to intercept the calls of your program to the kernel. The kernel is needed for actions such as accessing the network or file system. Thus, strace is not so useful if your program gets stuck in \"user land\", but it can be useful to see which files the program is accessing.
Pro-Tip: if you move the selection line of htop
to a process then you can strace the program by pressing s
.
lsof
","text":"The lsof
program lists all open files with the processes that are accessing them. This is useful for seeing which files your program has opened.
You can even build a progress bar with lsof, although that requires sudo
privileges which you might not have on the system that you are using.
Pro-Tip: if you move the selection line of htop
to a process then you can list the open files by pressing l
.
There are more ways of inspecting your program, here are some:
perf
Let us now enter the world of interactive debuggers. Integrated development environments (IDEs) generally consist of an editor, a compiler/interpreter, and an interactive/visual debugger. Usually, they have a debugger program at their core that can also be used on the command line.
"},{"location":"how-to/misc/debug-software/#old-but-gold-gdb","title":"Old but gold:gdb
","text":"On Unix systems, a widely used debugger is gdb
the GNU debugger. gdb
is a command line program and if you are not used to it, it might be hard to use. However, here are some pointers on how to use it:
The commands in interactive mode include:
quit or Ctrl-D -- exit the debugger
b file.ext:123 -- set a breakpoint in file.ext on line 123
r -- run the program
p var_name -- print the value of the variable var_name
display var_name -- print the value of the variable var_name every time execution stops
l -- print the source code around the current line (multiple calls will show the next 10 lines or so, and so on)
l 123 -- print lines around line 123
f -- show information about the current frame (that is the current source location)
bt -- show the backtrace (that is all functions above the current one)
n -- step to the next line
s -- step into function calls
finish -- run the current function until it returns
help -- get more help
You can call your program directly with command line arguments using cgdb [cgdb-args] --args path/to/program -- [program-args]. You can also attach to running programs using cgdb -p PID once you have found out the process ID to attach to using htop or ps.
Pro-tip: use cgdb
for an easier to use version that displays the source code in split screen and stores command line histories over sessions.
You can get a simple REPL (read-execute-print loop) at virtually any position in your program by adding:
import pdb; pdb.set_trace()\n
You will get a prompt at the current position and can issue several commands including:
quit or Ctrl-D -- exit the debugger
p var_name -- print the variable named var_name
f -- show information about the current frame (that is the current source location)
bt -- show the backtrace (that is all functions called above the current one)
continue -- continue running
help -- get more help
Pro-tip: use import ipdb; ipdb.set_trace() (after installing the ipdb package, of course) to get an IPython-based prompt that is much more comfortable to use.
Here is a free bonus pro-tip: learn how to use version control, e.g., Git. This will allow you to go back to previous versions without problems and see current changes to your source code.
Combine the pro tip on using version control (learn Git already!) with this one: learn how to write automated tests. This will allow you to quickly narrow down problematic changes in your version control history.
Again, testing is another topic altogether, so here are just some links to testing frameworks to get you started:
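To give an impression of what an automated test looks like, here is a minimal sketch using the unittest module from the Python standard library. The function under test, reverse_complement, is a made-up example and not part of any tool mentioned here.

```python
import unittest


def reverse_complement(seq):
    # Return the reverse complement of a DNA sequence made of A, C, G, T.
    pairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(pairs[base] for base in reversed(seq))


class ReverseComplementTest(unittest.TestCase):

    def test_simple_sequence(self):
        self.assertEqual(reverse_complement('AACG'), 'CGTT')

    def test_empty_sequence(self):
        self.assertEqual(reverse_complement(''), '')
```

If you save this as a file, you can run the tests with python -m unittest followed by the file name. Running them after each change tells you quickly which commit introduced a problem.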
The following web resources can serve as a starting point on how to use debuggers.
We provide a user forum using the Discourse software at
First of all, visit the website for the first time: https://hpc-talk.cubi.bihealth.org
You will then be directed to our Single-Sign-On Page.
Use the appropriate button for your host organisation (MDC / Charite), i.e., the one to which your cluster account belongs.
Then log in with the usual credentials of your host organisation.
Clicked wrong organisation?
If you accidentally clicked the wrong institution then you need to clear your browser history up to the point where you clicked (e.g., for the last hour).
You will be shown the following screen after the first login.
You can proceed with reading the notification or skip it. The site is mostly self-explanatory. Let us point you at a couple of interesting things for first steps.
Here you can set up your preferences
Use the \"New Topic\" button to create a new topic. Set a meaningful title, select a suitable category (we will update the list of categories over time), and write down your question or discussion item. Finally, click \"Create Topic\" to create the new topic.
You will be directed to the page with your new topic.
You can enable email notifications to receive emails if someone answers.
"},{"location":"how-to/misc/hpc-talk/#disabling-browser-notifications","title":"Disabling Browser Notifications","text":"In your settings, you will find an option to disable browser notifications in this browser.
Or you can use the do not disturb button.
"},{"location":"how-to/misc/hpc-talk/#closing-remarks","title":"Closing Remarks","text":"We established the HPC Talk forum as a self-help forum for users. Alas, there is a number of such sites out there already that are populated by more users.
How does HPC Talk fit in?
We think it is most useful for asking questions and discussing points that are directly related to the BIH HPC system.
What alternatives do I have?
For example:
Obtaining File Boxes
At the moment, file boxes are only available to members of core facilities (e.g., genomics, bioinformatics, or metabolomics) for exchanging files with their collaboration partners. Currently, HPC users cannot use the file box mechanism on their own.
BIH HPC IT provides a file exchange server to be used by the BIH core facilities and their users. The server is located in the BIH DMZ in Buch. Users authenticate using their Charite/BIH (user@CHARITE
) or MDC accounts (user@MDC-BERLIN
). File exchange is organized using \"file boxes\", directories created on the server to which selected users are granted access. Access control list maintenance is done with audit-trails (\"Revisionssicherheit\") and the file access itself is also logged to comply with data protection standards.
Access from Charite Network
Clients in the Charite network (IP ranges 141.x.x.x
and 10.x.x.x
) must connect through the Charite proxy (http://proxy.charite.de:8080
). Depending on the client software that you are using, you might have to configure the proxy.
File boxes are created by the core facilities (e.g., the genomics facilities at Charite and MDC). The facility members also organize the access control. Please talk to your core facility contact on file exchange.
External users must obtain a Charite or MDC account first. Account creation is handled by the core facilities that the external user is a customer of.
"},{"location":"how-to/service/file-exchange/#file-access","title":"File Access","text":"Generally, you will be given a URL to your file box similar to https://file-exchange.bihealth.org/<file-box-id>/
. The files are served over an encrypted connection using WebDAV (which uses HTTPS).
The following describes how to access the files in the box from different platforms.
"},{"location":"how-to/service/file-exchange/#from-linux","title":"From Linux","text":"We describe how to access the files on the command line using the lftp
program. The program is preinstalled on the BIH cluster (and the MDC cluster) and you should be able to just install it with yum install lftp
on CentOS/Red Hat or apt-get install lftp
on Ubuntu/Debian.
When using lftp
, you have to add some configuration first:
# cat >>~/.lftprc <<\"EOF\"\nset ssl:verify-certificate no\nset ftp:ssl-force yes\nEOF\n
In case that you want to access the files using a graphical user interface, search Google for \"WebDAV\" and your operating system or desktop environment. File browsers such as Nautilus and Thunar have built-in WebDAV support.
"},{"location":"how-to/service/file-exchange/#connecting","title":"Connecting","text":"First, log into the machine that has lftp
installed. The login nodes of the BIH cluster do not have it installed but all compute and file transfer nodes have it. Go to the data download location.
host:~$ mkdir -p ~/scratch/download_dir\nhost:~$ cd ~/scratch/download_dir\n
Next, start lftp
. You can open the connection using open -u <user>@<DOMAIN> https://file-exchange.bihealth.org/<file-box-id>/
(NB: there is a trailing slash) where
<user> is your user name, e.g., holtgrem,
<DOMAIN> is either MDC-BERLIN or CHARITE, and
<file-box-id> is the file box ID from the URL provided to you.
When prompted, use your normal Charite/MDC password to log in.
host:download_dir$ lftp\nlftp :~> open -u holtgrem@CHARITE https://file-exchange.bihealth.org/c62910b3-c1ba-49a5-81a6-a68f1f15aef6\nPassword:\ncd ok, cwd=/c62910b3-c1ba-49a5-81a6-a68f1f15aef6\nlftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6>\n
"},{"location":"how-to/service/file-exchange/#browsing-data","title":"Browsing Data","text":"You can find a full reference of lftp
on the lftp man page. You could also use help COMMAND
on the lftp prompt. For example, to look at the files of the server for a bit...
lftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> ls\ndrwxr-xr-x -- /\ndrwxr-xr-x -- dir\n-rw-r--r-- -- file1\nlftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> find\n./\n./dir/\n./dir/file2\n./file1\n
"},{"location":"how-to/service/file-exchange/#downloading-data","title":"Downloading Data","text":"To download all data use mirror
, e.g. with -P 4
to use four download threads.
lftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> mirror .\nTotal: 2 directories, 3 files, 0 symlinks\nNew: 3 files, 0 symlinks\nlftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> exit\nhost:download_dir$ tree\n.\n\u251c\u2500\u2500 dir\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 file2\n\u251c\u2500\u2500 file1\n\u2514\u2500\u2500 file.txt\n\n1 directory, 3 files\n
Ignoring gnutls_record_recv
errors.
A common error to see is mirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.
. You can just ignore this.
To upload data, you can use mirror -R .
which is essentially the \"reverse\" of the mirror command.
lftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> mirror -R\nmirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.\nmirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.\nmirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.\nTotal: 2 directories, 3 files, 0 symlinks\nModified: 3 files, 0 symlinks\n4 errors detected\n
"},{"location":"how-to/service/file-exchange/#from-windows","title":"From Windows","text":"We recommend to use WinSCP for file transfer.
After starting WinSCP, you will see a window titled Login
. Just paste the URL (e.g., https://file-exchange.bihealth.org/c62910b3-c1ba-49a5-81a6-a68f1f15aef6/
) of the file box into the Host name
entry field. In this case, the fields File protocol
etc. will be filled automatically. Next, enter your user name as user@CHARITE
or user@MDC-BERLIN
(the capitalization of the part behind the @
is important). The window should now look similar to the one below.
Proxy Configuration on Charite Network
If you are on the Charite network then you have to configure the proxy. Otherwise, you have to skip this step.
Click Advanced
and a window titled Advanced Site Settings
will pop up. Here, select Connection / Proxy
in the left side. Select HTTP
for the Proxy type
. Then, enter proxy.charite.de
as the Proxy host name
and set the Port number
to 8080
. The window should now look as below. Then, click OK
to apply the proxy settings.
Finally, click Login
. You can now transfer files between the file exchange server and your local computer using drag and drop between WinSCP and your local Windows File Explorer. Alternatively, you can use the two-panel view of WinSCP to transfer files as described here.
On Mac, you can also use lftp
as described above in From Linux. You can find install instructions here online.
Proxy Configuration on Charite Network
If you are on the Charite network then you must have configured the proxy appropriately. Otherwise, you have to skip this step.
You can find the proxy settings in your System Preferences
in the Network
section, in the Advanced
tab of your network (e.g., WiFi
).
If you want to use a graphical interface then we recommend the usage of Cyberduck. After starting the program, click Open Connection
on the top left, then select WebDAV (HTTPS)
and fill out the form as follows. Paste the file box URL into the server field and use your login name (user@CHARITE
or user@MDC-BERLIN
) with your usual password.
If you need to perform access through a graphical user interface on your Mac, please contact hpc-helpdesk@bihealth.org for support.
"},{"location":"how-to/service/file-exchange/#security","title":"Security","text":"The file exchange server has the fail2ban
software installed and configured (Charite, MDC, and BIH IPs are excluded from this).
If you enter your user name or password incorrectly more than 5 times within 10 minutes, your machine will be banned for one hour. This means that someone else who shares your IP address, as seen by the file exchange server, can get you blocked. This can happen if you are in the same home or university network with NAT or if you are behind a proxy. In this case you get a \"connection refused\" error and should try again in one hour.
"},{"location":"how-to/software/apptainer/","title":"Using Apptainer (with Docker Images)","text":"Note
Singularity is now Apptainer! While Apptainer provides a singularity
alias for backwards compatibility, it is recommended to adapt all workflows to use the new binary apptainer
.
Apptainer (https://apptainer.org/) is a popular alternative to docker because it does not need to run as a privileged user. Apptainer can run Docker images out-of-the-box by converting them to the apptainer image format. The following guide gives a quick dive into using docker images with apptainer.
Build on your workstation, run on the HPC
Building images using Apptainer requires root privileges. We cannot give you these permissions on the BIH HPC. Thus, you will have to build the images on your local workstation (or anywhere where you have root access). You can then run the built images on the BIH HPC.
This is also true for the --writable
flag. Apparently it needs root permissions which you don't have on the cluster.
Link ~/.apptainer to ~/work/.apptainer
Because you only have a quota of 1 GB in your home directory, you should symlink ~/.apptainer
to ~/work/.apptainer
.
host:~$ mkdir -p ~/work/.apptainer && ln -sr ~/work/.apptainer ~/.apptainer\n
In case you already have an apptainer directory:
host:~$ mv ~/.apptainer ~/work/.apptainer && ln -sr ~/work/.apptainer ~/.apptainer\n
Run a bash in a docker image:
host:~$ apptainer shell docker://godlovedc/lolcow\n
Run a command in a docker image:
host:~$ apptainer exec docker://godlovedc/lolcow echo \"hello, hello!\"\n
Run a bash in a docker image, enable access to the cuda driver (--nv) and mount a path (--bind or -B):
host:~$ apptainer shell --nv --bind /path_on_host/:/path_inside_container/ docker://godlovedc/lolcow\n
"},{"location":"how-to/software/apptainer/#some-caveats-and-notes","title":"Some Caveats and Notes","text":"Caveats
Notes
APPTAINERENV_
: host:~$ APPTAINERENV_HELLO=123 apptainer shell docker://godlovedc/lolcow echo $HELLO\n
apptainer shell
or apptainer exec
uses the host caller's cwd as its cwd, not the one set in the Dockerfile. One can change this by setting --pwd
. The easiest variant to run a docker image available via a docker hub is by specifying its url. This causes apptainer to download the image and convert it to an apptainer image:
host:~$ apptainer run docker://godlovedc/lolcow\n
or to open a shell inside the image
host:~$ apptainer shell docker://godlovedc/lolcow\n
Furthermore, similar to docker, one can pull (and convert) a remote image with the following call:
host:~$ apptainer pull docker://godlovedc/lolcow\n
In case your registry requires authentication you can provide it via a prompt by adding the option --docker-login
:
host:~$ apptainer pull --docker-login docker://ilumb/mylolcow\n
or by setting the following environment variables:
host:~$ export APPTAINER_DOCKER_USERNAME=ilumb\nhost:~$ export APPTAINER_DOCKER_PASSWORD=<redacted>\nhost:~$ apptainer pull docker://ilumb/mylolcow\n
More details can be found in the Apptainer documentation.
"},{"location":"how-to/software/apptainer/#option-2-converting-docker-images","title":"Option 2: Converting Docker Images","text":"Another option is to convert your docker image into the Apptainer/Singularity image format. This can be easily done using the docker images provided by docker2singularity.
To convert the docker image docker_image_name
to the apptainer image apptainer_image_name
one can use the following command line. The output image will be located in output_directory_for_images
.
host:~$ docker run -v /var/run/docker.sock:/var/run/docker.sock -v /output_directory_for_images/:/output --privileged -t --rm quay.io/singularity/docker2singularity --name apptainer_image_name docker_image_name\n
The resulting image can then directly be used as image:
host:~$ apptainer exec apptainer_image_name.sif bash\n
"},{"location":"how-to/software/apptainer/#conversion-compatibility","title":"Conversion Compatibility","text":"Here are some tips for making Docker images compatible with Apptainer taken from docker2singularity:
~/.bashrc
, ~/.profile
, etc.ENTRYPOINT
instruction set pointing to the command line interface to your pipeline.CMD
- rely only on ENTRYPOINT
.ENTRYPOINT docker run -i -t --entrypoint /bin/bash bids/example
.--read-only --tmpfs /run --tmpfs /tmp parameters
(this emulates the read-only behavior of Apptainer).USER
instruction set.from the official website: \"Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis\"
"},{"location":"how-to/software/cell-ranger/#installation","title":"installation","text":"requires registration before download from here
to unpack Cell Ranger, its dependencies and the cellranger
script:
cd /data/cephfs-1/home/users/$USER/work\nmv /path/to/cellranger-3.0.2.tar.gz .\ntar -xzvf cellranger-3.0.2.tar.gz\n
"},{"location":"how-to/software/cell-ranger/#reference-data","title":"reference data","text":"will be provided in /data/cephfs-1/work/projects/cubit/current/static_data/app_support/cellranger
add a file slurm.template
to /data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/slurm.template
with the following contents:
#!/usr/bin/env bash\n#\n# Copyright (c) 2016 10x Genomics, Inc. All rights reserved.\n#\n# =============================================================================\n# Setup Instructions\n# =============================================================================\n#\n# 1. Add any other necessary Slurm arguments such as partition (-p) or account\n# (-A). If your system requires a walltime (-t), 24 hours (24:00:00) is\n# sufficient. We recommend you do not remove any arguments below or Martian\n# may not run properly.\n#\n# 2. Change filename of slurm.template.example to slurm.template.\n#\n# =============================================================================\n# Template\n# =============================================================================\n#\n#SBATCH -J __MRO_JOB_NAME__\n#SBATCH --export=ALL\n#SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__\n#SBATCH --signal=2\n#SBATCH --no-requeue\n#SBATCH --partition=medium\n#SBATCH --time=24:00:00\n### Alternatively: --ntasks=1 --cpus-per-task=__MRO_THREADS__\n### Consult with your cluster administrators to find the combination that\n### works best for single-node, multi-threaded applications on your system.\n#SBATCH --mem=__MRO_MEM_GB__G\n#SBATCH -o __MRO_STDOUT__\n#SBATCH -e __MRO_STDERR__\n\n__MRO_CMD__\n
note: on newer cellranger versions, slurm.template
needs to go to /data/cephfs-1/home/users/$USER/work/cellranger-XX/external/martian/jobmanagers/
if that hasn't been done yet, you can use cellranger mkfastq
(details to be added)
count
)","text":"create a script run_cellranger.sh
with these contents (consult the documentation for help):
#!/bin/bash\n\n/data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/cellranger count \\\n --id=sample_id \\\n --transcriptome=/data/cephfs-1/work/projects/cubit/current/static_data/app_support/cellranger/refdata-cellranger-${species}-3.0.0\\\n --fastqs=/path/to/fastqs \\\n --sample=sample_name \\\n --expect-cells=n_cells \\\n --jobmode=slurm \\\n --maxjobs=100 \\\n --jobinterval=1000\n
and then submit the job via
sbatch --ntasks=1 --mem-per-cpu=4G --time=8:00:00 -p medium -o cellranger.log run_cellranger.sh\n
"},{"location":"how-to/software/cell-ranger/#cluster-support-sge-outdated","title":"cluster support SGE (outdated)","text":"add a file sge.template
to /data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/sge.template
with the following contents:
# =============================================================================\n# Template\n# =============================================================================\n#\n#$ -N __MRO_JOB_NAME__\n#$ -V\n#$ -pe smp __MRO_THREADS__\n#$ -cwd\n#$ -P medium\n#$ -o __MRO_STDOUT__\n#$ -e __MRO_STDERR__\n#$ -l h_vmem=__MRO_MEM_GB_PER_THREAD__G\n#$ -l h_rt=08:00:00\n\n#$ -m a\n#$ -M user@email.com\n\n__MRO_CMD__\n
and submit the job via
qsub -cwd -V -pe smp 1 -l h_vmem=8G -l h_rt=24:00:00 -P medium -m a -j y run_cellranger.sh\n
"},{"location":"how-to/software/jupyter/","title":"How-To: Run Jupyter","text":"SSH Tunnels Considered Harmful
Please use our Open OnDemand Portal for running Jupyter notebooks!
The information below is still accurate. However, many users find it tricky to get SSH tunnels working correctly. A considerable number of parts are involved and you have to get each step 100% correct. Helpdesk cannot support you in problems with SSH tunnels that are caused by incorrect usage.
"},{"location":"how-to/software/jupyter/#what-is-jupyter","title":"What is Jupyter","text":"Project Jupyter is a networking protocol for interactive computing that allows the user to write and execute code in a large number of different programming languages. The most used client is Jupyter Notebook, which can be encountered in various forms all over the web. Its basic principle is a document consisting of different cells, each of which contains either code (executed in place) or documentation (written in markdown). This allows one to handily describe the processed workflow.
"},{"location":"how-to/software/jupyter/#setup-and-running-jupyter-on-the-cluster","title":"Setup and running Jupyter on the cluster","text":"Install Jupyter on the cluster (via conda, by creating a custom environment)
hpc-cpu-x:~$ conda create -n jupyter jupyter\nhpc-cpu-x:~$ conda activate jupyter\n
(If you want to work in a language other than python, you can install more Jupyter language kernels, see the kernel list)
Now you can start the Jupyter server session (you may want to do this in a screen
& srun --pty bash -i
session as jupyter keeps running while you are doing computations)
hpc-cpu-x:~$ jupyter notebook --no-browser\n
Check the port number (usually 8888
) in the output and remember it for later:
[I 23:39:40.860 NotebookApp] The Jupyter Notebook is running at:\n[I 23:39:40.860 NotebookApp] http://localhost:8888/\n
By default, Jupyter will create an access token (a link stated in the output) to protect your notebook against unauthorized access, which you have to save and enter in the accessing browser. You can change this to password-based authentication via jupyter notebook password
. If you are running multiple servers on one or more nodes, you can separate them by changing the port number by adding --port=$PORT
.
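When several servers share a node, picking a value for --port by hand can collide with a port that is already taken. The following Python sketch is not part of the original how-to: the helper name find_free_port is made up for illustration, and it simply asks the operating system for a currently unused port. There is a short window in which another process could claim the port before Jupyter starts, so treat it as a convenience rather than a guarantee.

```python
import socket


def find_free_port(host='127.0.0.1'):
    # Binding to port 0 asks the kernel to pick any currently unused TCP port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == '__main__':
    # Print the port so it can be handed to: jupyter notebook --no-browser --port=<value>
    print(find_free_port())
```

Run it on the compute node where the Jupyter server will start, since the port has to be free there and not on your workstation.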
This is slightly trickier as we have to create an SSH connection/tunnel with potentially multiple hops in between. The easiest way is probably to configure your .ssh/config
to automatically route your connection via the login node (and possibly MDC jail). This is described in our Advanced SSH config documentation.
In short, add these lines to ~/.ssh/config
(replace curly parts):
Host bihcluster\n user {USER_NAME}\n HostName hpc-login-2.cubi.bihealth.org\n\nHost hpc-cpu*\n user {USER_NAME}\n ProxyJump bihcluster\n
For MDC users outside the MDC network:
Host mdcjail\n HostName ssh1.mdc-berlin.de\n User {MDC_USER_NAME}\n\nHost bihcluster\n user {USER_NAME}\n HostName hpc-login-2.cubi.bihealth.org\n ProxyJump mdcjail\n\nHost hpc-cpu*\n user {USER_NAME}\n ProxyJump bihcluster\n
Check that this config is working by connecting like this: ssh hpc-cpu-1
. Please note that you cannot use any resources on this node without a valid Slurm session.
Now set up a tunnel for your running Jupyter session:
workstation:~$ ssh -N -f -L 127.0.0.1:8888:localhost:{PORT} hpc-cpu-x\n
The port of your Jupyter server is usually 8888
. The cluster node srun
has sent you to determines the last argument. You should now be able to connect to your Jupyter server by typing localhost:8888
in your web browser (see the note about token and password above).
It can and will happen that you lose the connection, either due to network problems or because your computer shuts down. This is not a problem at all and you will not lose data; just reconnect to your session. If your notebooks are also losing connection (you will see a colorful remark in the top right corner), reconnect and click the colorful button. If this does not work, your work is still not lost as all cells that have been executed are automatically saved anyway. Copy all unexecuted cells (those are only saved periodically) and reload the browser page (after reconnecting) with F5
. (You can also open a copy of the notebook in another tab; just be aware that there may be synchronisation problems.)
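The tunnel command has three moving parts: the local port on your workstation, the port Jupyter listens on, and the compute node. As an illustration only (the function name tunnel_command is hypothetical and not part of the cluster tooling), this Python sketch assembles the same ssh invocation from those three values so that it is clear which argument comes from where.

```python
import shlex


def tunnel_command(node, remote_port=8888, local_port=8888):
    # -N: run no remote command, -f: go to background, -L: forward a local port.
    forward = f'127.0.0.1:{local_port}:localhost:{remote_port}'
    return ['ssh', '-N', '-f', '-L', forward, node]


if __name__ == '__main__':
    # Jupyter listening on port 8889 on node hpc-cpu-1, reached via local port 8888.
    print(shlex.join(tunnel_command('hpc-cpu-1', remote_port=8889)))
```

The node name is whatever host srun placed your session on, and remote_port is the port printed in the Jupyter start-up output.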
There are two independent steps in ending a session:
Canceling the SSH tunnel
workstation:~$ ps aux | grep \"$PORT\"\n
This will give you something like this:
user 54 0.0 0.0 43104 784 ? Ss 15:06 0:00 ssh -N -f -L 127.0.0.1:8888:localhost:8888 hpc-cpu-x\nuser 58 0.0 0.0 41116 1024 tty1 S 15:42 0:00 grep --color=auto 8888\n
from which you need the process ID (here 54
)
workstation:~$ kill -9 $PID\n
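Reading the process ID off the ps listing by eye is easy to get wrong, because the grep process itself also matches the port number. A minimal Python sketch of the same lookup, assuming the standard eleven-column ps aux layout shown above (the function name tunnel_pids is made up for this example):

```python
def tunnel_pids(ps_output, port):
    # Return PIDs of ssh port-forwarding processes that mention the given port.
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # ps aux prints 11 columns; the last is the command
        if len(fields) < 11:
            continue
        command = fields[10]
        if command.startswith('ssh ') and ' -L ' in command and f':{port}' in command:
            pids.append(int(fields[1]))
    return pids


SAMPLE = '''user 54 0.0 0.0 43104 784 ? Ss 15:06 0:00 ssh -N -f -L 127.0.0.1:8888:localhost:8888 hpc-cpu-x
user 58 0.0 0.0 41116 1024 tty1 S 15:42 0:00 grep --color=auto 8888
'''

if __name__ == '__main__':
    print(tunnel_pids(SAMPLE, 8888))
```

Only the ssh line is reported; the grep line is skipped because its command does not start with ssh.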
Shutdown the Jupyter server
Open the Jupyter session, cancel the process with {Ctrl} + {C} and confirm {y}. Make sure you saved your notebooks beforehand (though auto-save catches most things).
"},{"location":"how-to/software/jupyter/#advanced","title":"Advanced","text":"If anyone has figured out, the following might also be interesting (please add):
Because the GPU nodes med030[1-4]
each have four GPUs, we can train a model using multiple GPUs in parallel. This How-To gives an example with Keras 2.2.4 together with TensorFlow. Finally, it gives some hints on how to submit a job on the cluster.
Hint
With TensorFlow > 2.0 and newer Keras versions, the multi_gpu_model
is deprecated and you have to use the MirroredStrategy
.
We need to import the multi_gpu_model
function from keras.utils
and pass our actual model (e.g., a Sequential Keras model) into it. In general, Keras automatically detects the available GPUs (gpus=None
). This does not seem to work on our system, so we have to specify the number of GPUs, e.g., two with gpus=2
. We put this in a try/except block so that it will also work on CPUs.
from keras.utils import multi_gpu_model\n\ntry: \n model = multi_gpu_model(model, gpus=2) \nexcept:\n pass\n
That's it!
Please read here on how to submit jobs to the GPU nodes.
"},{"location":"how-to/software/keras/#conda-environment","title":"Conda environment","text":"All this was tested with the following conda environment:
name: cuda channels: \n- conda-forge\n- bioconda\n- defaults\ndependencies:\n- keras=2.2.4\n- python=3.6.7\n- tensorboard=1.12.0\n- tensorflow=1.12.0\n- tensorflow-base=1.12.0\n- tensorflow-gpu=1.12.0\n
"},{"location":"how-to/software/matlab/","title":"How-To: Use Matlab","text":"Note
This information is outdated and will soon be removed.
GNU Octave as Matlab alternative
Note that GNU Octave is an Open Source alternative to Matlab. While both packages are not 100% compatible, Octave is an alternative that does not require any license management. Further, you can easily install it yourself using Conda.
Want to use the Matlab GUI?
Make sure you understand X forwarding as outlined in this FAQ entry.
You can also use Open OnDemand Portal to run Matlab.
"},{"location":"how-to/software/matlab/#pre-requisites","title":"Pre-requisites","text":"You have to register with hpc-helpdesk@bih-charite.de for requesting access to the Matlab licenses. Afterwards, you can connect to the High-Memory using the license_matlab_r2016b
resource (see below).
BIH has a license of Matlab R2016b for 16 seats and various licensed packages (see below). To display the available licenses:
hpc-login-1:~$ scontrol show lic\nLicenseName=matlab_r2016b\n Total=16 Used=0 Free=16 Remote=no\n
Matlab is installed on all of the compute nodes:
# The following is VITAL so the scheduler allocates a license to your session.\nhpc-login-1:~$ srun -L matlab_r2016b:1 --pty bash -i\nmed0127:~$ scontrol show lic\nLicenseName=matlab_r2016b\n Total=16 Used=1 Free=15 Remote=no\nmed0127:~$ module avail\n----------------- /usr/share/Modules/modulefiles -----------------\ndot module-info null\nmodule-git modules use.own\n\n----------------------- /opt/local/modules -----------------------\ncmake/3.11.0-0 llvm/6.0.0-0 openmpi/3.1.0-0\ngcc/7.2.0-0 matlab/r2016b-0\nmed0127:~$ module load matlab/r2016b-0\nStart matlab without GUI: matlab -nosplash -nodisplay -nojvm\n Start matlab with GUI (requires X forwarding (ssh -X)): matlab\nmed0127:~$ matlab -nosplash -nodisplay -nojvm\n < M A T L A B (R) >\n Copyright 1984-2016 The MathWorks, Inc.\n R2016b (9.1.0.441655) 64-bit (glnxa64)\n September 7, 2016\n\n\nFor online documentation, see http://www.mathworks.com/support\nFor product information, visit www.mathworks.com.\n\n\n Non-Degree Granting Education License -- for use at non-degree granting, nonprofit,\n educational organizations only. 
Not for government, commercial, or other organizational use.\n\n>> ver\n--------------------------------------------------------------------------------------------\nMATLAB Version: 9.1.0.441655 (R2016b)\nMATLAB License Number: 1108905\nOperating System: Linux 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 x86_64\nJava Version: Java is not enabled\n--------------------------------------------------------------------------------------------\nMATLAB Version 9.1 (R2016b)\nBioinformatics Toolbox Version 4.7 (R2016b)\nGlobal Optimization Toolbox Version 3.4.1 (R2016b)\nImage Processing Toolbox Version 9.5 (R2016b)\nOptimization Toolbox Version 7.5 (R2016b)\nParallel Computing Toolbox Version 6.9 (R2016b)\nPartial Differential Equation Toolbox Version 2.3 (R2016b)\nSignal Processing Toolbox Version 7.3 (R2016b)\nSimBiology Version 5.5 (R2016b)\nStatistics and Machine Learning Toolbox Version 11.0 (R2016b)\nWavelet Toolbox Version 4.17 (R2016b)\n>> exit\n
"},{"location":"how-to/software/matlab/#running-matlab-ui","title":"Running MATLAB UI","text":"For starting the Matlab with GUI, make sure that your client is running a X11 server and you connect with X11 forwarding enabled (e.g., ssh -X hpc-login-1.cubi.bihealth.org
from the Linux command line). Then, make sure to use srun -L matlab_r2016b:1 --pty --x11 bash -i
for connecting to a node with X11 forwarding enabled.
client:~$ ssh -X hpc-login-1.cubi.bihealth.org\n[...]\nhpc-login-1:~ $ srun -L matlab_r2016b:1 --pty --x11 bash -i\n[...]\nmed0203:~$ module load matlab/r2016b-0\nStart matlab without GUI: matlab -nosplash -nodisplay -nojvm\n Start matlab with GUI (requires X forwarding (ssh -X)): matlab\nmed0203:~$ matlab\n[UI will start]\n
Forcing a start in text mode can be done (as printed after module load
): matlab -nosplash -nodisplay -nojvm
.
Also see this FAQ entry.
"},{"location":"how-to/software/matlab/#see-available-matlab-licenses","title":"See Available Matlab Licenses","text":"You can use scontrol show lic
to see the currently available MATLAB licenses. E.g., here I am running an interactive shell in which I have requested 1 of the 16 MATLAB licenses, so 15 more remain.
$ scontrol show lic\nLicenseName=matlab_r2016b\n Total=16 Used=1 Free=15 Remote=no\n
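If you want to check the number of free licenses from a script before submitting jobs, the output above is simple enough to parse. The following Python sketch is an illustration under the assumption that the output keeps the LicenseName= line followed by key=value pairs as shown; the function name parse_licenses is hypothetical, and real output may contain additional fields.

```python
import re


def parse_licenses(text):
    # Map each license name to a dict of its key=value fields.
    licenses = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('LicenseName='):
            current = line.split('=', 1)[1]
            licenses[current] = {}
        elif current and line:
            for key, value in re.findall(r'(\w+)=(\S+)', line):
                licenses[current][key] = int(value) if value.isdigit() else value
    return licenses


SAMPLE = '''LicenseName=matlab_r2016b
    Total=16 Used=1 Free=15 Remote=no
'''

if __name__ == '__main__':
    info = parse_licenses(SAMPLE)
    print(info['matlab_r2016b']['Free'])
```

In practice you would feed it the text captured from scontrol show lic instead of the SAMPLE string.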
"},{"location":"how-to/software/matlab/#a-working-example","title":"A Working Example","text":"Get a checkout of our MATLAB example. Then, look around at the contents of this repository.
hpc-login-1:~$ git clone https://github.com/bihealth/bih-cluster-matlab-example.git\nhpc-login-1:~$ cd bih-cluster-matlab-example\nhpc-login-1:~$ cat job_script.sh\n#!/bin/bash\n\n# Logging goes to directory slurm_log\n#SBATCH -o slurm_log/%x-%J.log\n# Keep current environment variables\n#SBATCH --export=ALL\n# Name of the script\n#SBATCH --job-name MATLAB-example\n# Allocate 4GB of RAM per node\n#SBATCH --mem 4G\n# Maximal running time of 2 hours\n#SBATCH --time 02:00:00\n# Allocate one Matlab license\n#SBATCH -L matlab_r2016b:1\n\nmodule load matlab/r2016b-0\n\nmatlab -r example\n$ cat example.m\n% Example Hello World script for Matlab.\n\ndisp('Hello world!')\ndisp('Thinking...')\n\npause(10)\n\ndisp(sprintf('The square root of 2 is = %f', sqrt(2)))\nexit\n
For submitting the script, you can do the following
hpc-login-1:~$ sbatch job_script.sh\n
This will submit a job with one Matlab license requested. If you were to submit 17 of these jobs, then at least one of them would have to wait until a license becomes free.
Matlab License Server
Note that there is a Matlab license server running on the server that will check whether 16 or fewer Matlab sessions are currently running. If a Matlab session is running but this is not made known to the scheduler via -L matlab_r2016b
then this can lead to scripts crashing as not enough licenses are available. If this happens to you, double-check that you have specified the license requirements correctly and notify hpc-helpdesk@bih-charite.de in case of any problems. We will try to sort out the situation then.
This article describes how to build and run an OpenMPI program. We will build a simple C program that uses the OpenMPI message passing interface and run it in parallel. You should be able to go from here with other languages and more complex programs. We will use a simple Makefile for building the software.
"},{"location":"how-to/software/openmpi/#loading-openmpi-environment","title":"Loading OpenMPI Environment","text":"First, load the OpenMPI package.
hpc-login-1:~$ srun --pty bash -i\nmed0127:~$ module load openmpi/4.0.3-0\n
Then, check that the installation works:
med0127:~$ ompi_info | head\n Package: Open MPI root@med0127 Distribution\n Open MPI: 4.0.3\n Open MPI repo revision: v4.0.3\n Open MPI release date: Mar 03, 2020\n Open RTE: 4.0.3\n Open RTE repo revision: v4.0.3\n Open RTE release date: Mar 03, 2020\n OPAL: 4.0.3\n OPAL repo revision: v4.0.3\n OPAL release date: Mar 03, 2020\n
"},{"location":"how-to/software/openmpi/#building-the-example","title":"Building the example","text":"Next, clone the OpenMPI example project from GitHub.
med0127:~$ git clone git@github.com:bihealth/bih-cluster-openmpi-example.git\nmed0127:~$ cd bih-cluster-openmpi-example/src\n
Makefile
.PHONY: default clean\n\n# configure compilers\nCC=mpicc\nCXX=mpicxx\n# configure flags\nCCFLAGS += $(shell mpicc --showme:compile)\nLDFLAGS += $(shell mpicc --showme:link)\n\ndefault: openmpi_example\n\nopenmpi_example: openmpi_example.o\n\nclean:\n rm -f openmpi_example.o openmpi_example\n
openmpi_example.c
#include <stdio.h>\n#include <mpi.h>\n\nint main(int argc, char** argv) {\n // Initialize the MPI environment\n MPI_Init(NULL, NULL);\n\n // Get the number of processes\n int world_size;\n MPI_Comm_size(MPI_COMM_WORLD, &world_size);\n\n // Get the rank of the process\n int world_rank;\n MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);\n\n // Get the name of the processor\n char processor_name[MPI_MAX_PROCESSOR_NAME];\n int name_len;\n MPI_Get_processor_name(processor_name, &name_len);\n\n // Print off a hello world message\n printf(\"Hello world from processor %s, rank %d\"\n \" out of %d processors\\n\",\n processor_name, world_rank, world_size);\n\n // Finalize the MPI environment.\n MPI_Finalize();\n\n return 0;\n}\n
run_mpi.sh
#!/bin/bash\n\n# Example job script for (single-threaded) MPI programs.\n\n# Generic arguments\n\n# Job name\n#SBATCH --job-name openmpi_example\n# Maximal running time of 10 min\n#SBATCH --time 00:10:00\n# Allocate 1GB of memory per node\n#SBATCH --mem 1G\n# Write logs to directory \"slurm_log\"\n#SBATCH -o slurm_log/slurm-%x-%J.log\n\n# MPI-specific parameters\n\n# Run 64 tasks (threads/on virtual cores)\n#SBATCH --ntasks 64\n\n# Make sure to source the profile.d file (not available on head nodes).\nsource /etc/profile.d/modules.sh\n\n# Load the OpenMPI environment module to get the runtime environment.\nmodule load openmpi/4.0.3-0\n\n# Launch the program.\nmpirun -np 64 ./openmpi_example\n
The next step is building the software
med0127:~$ make\nmpicc -c -o openmpi_example.o openmpi_example.c\nmpicc -pthread -Wl,-rpath -Wl,/opt/local/openmpi-4.0.3-0/lib -Wl,--enable-new-dtags -L/opt/local/openmpi-4.0.3-0/lib -lmpi openmpi_example.o -o openmpi_example\nmed0127:~$ ls -lh\ntotal 259K\n-rw-rw---- 1 holtgrem_c hpc-ag-cubi 287 Apr 7 23:29 Makefile\n-rwxrwx--- 1 holtgrem_c hpc-ag-cubi 8.5K Apr 8 00:15 openmpi_example\n-rw-rw---- 1 holtgrem_c hpc-ag-cubi 760 Apr 7 23:29 openmpi_example.c\n-rw-rw---- 1 holtgrem_c hpc-ag-cubi 2.1K Apr 8 00:15 openmpi_example.o\n-rwxrwx--- 1 holtgrem_c hpc-ag-cubi 1.3K Apr 7 23:29 run_hybrid.sh\n-rwxrwx--- 1 holtgrem_c hpc-ag-cubi 663 Apr 7 23:35 run_mpi.sh\ndrwxrwx--- 2 holtgrem_c hpc-ag-cubi 4.0K Apr 7 23:29 sge_log\n
The software will run outside of the MPI environment -- but in a single process only, of course.
med0127:~$ ./openmpi_example\nHello world from processor med0127, rank 0 out of 1 processors\n
"},{"location":"how-to/software/openmpi/#running-openmpi-software","title":"Running OpenMPI Software","text":"All of the arguments are already in the run_mpi.sh
script.
med0127:~$ sbatch run_mpi.sh\n
Explanation of the OpenMPI-specific arguments
--ntasks 64
: run 64 processes in the MPI environment.
Let's look at the Slurm log file, e.g., in slurm_log/slurm-openmpi_example-3181.log
.
med0124:~$ cat slurm_log/slurm-openmpi_example-*.log\nHello world from processor med0133, rank 6 out of 64 processors\nHello world from processor med0133, rank 25 out of 64 processors\nHello world from processor med0133, rank 1 out of 64 processors\nHello world from processor med0133, rank 2 out of 64 processors\nHello world from processor med0133, rank 3 out of 64 processors\nHello world from processor med0133, rank 7 out of 64 processors\nHello world from processor med0133, rank 9 out of 64 processors\nHello world from processor med0133, rank 12 out of 64 processors\nHello world from processor med0133, rank 13 out of 64 processors\nHello world from processor med0133, rank 15 out of 64 processors\nHello world from processor med0133, rank 16 out of 64 processors\nHello world from processor med0133, rank 17 out of 64 processors\nHello world from processor med0133, rank 18 out of 64 processors\nHello world from processor med0133, rank 23 out of 64 processors\nHello world from processor med0133, rank 24 out of 64 processors\nHello world from processor med0133, rank 26 out of 64 processors\nHello world from processor med0133, rank 27 out of 64 processors\nHello world from processor med0133, rank 31 out of 64 processors\nHello world from processor med0133, rank 0 out of 64 processors\nHello world from processor med0133, rank 4 out of 64 processors\nHello world from processor med0133, rank 5 out of 64 processors\nHello world from processor med0133, rank 8 out of 64 processors\nHello world from processor med0133, rank 10 out of 64 processors\nHello world from processor med0133, rank 11 out of 64 processors\nHello world from processor med0133, rank 14 out of 64 processors\nHello world from processor med0133, rank 19 out of 64 processors\nHello world from processor med0133, rank 20 out of 64 processors\nHello world from processor med0133, rank 21 out of 64 processors\nHello world from processor med0133, rank 22 out of 64 processors\nHello world from processor med0133, rank 
28 out of 64 processors\nHello world from processor med0133, rank 29 out of 64 processors\nHello world from processor med0133, rank 30 out of 64 processors\nHello world from processor med0134, rank 32 out of 64 processors\nHello world from processor med0134, rank 33 out of 64 processors\nHello world from processor med0134, rank 34 out of 64 processors\nHello world from processor med0134, rank 38 out of 64 processors\nHello world from processor med0134, rank 39 out of 64 processors\nHello world from processor med0134, rank 42 out of 64 processors\nHello world from processor med0134, rank 44 out of 64 processors\nHello world from processor med0134, rank 45 out of 64 processors\nHello world from processor med0134, rank 46 out of 64 processors\nHello world from processor med0134, rank 53 out of 64 processors\nHello world from processor med0134, rank 54 out of 64 processors\nHello world from processor med0134, rank 55 out of 64 processors\nHello world from processor med0134, rank 60 out of 64 processors\nHello world from processor med0134, rank 62 out of 64 processors\nHello world from processor med0134, rank 35 out of 64 processors\nHello world from processor med0134, rank 36 out of 64 processors\nHello world from processor med0134, rank 37 out of 64 processors\nHello world from processor med0134, rank 40 out of 64 processors\nHello world from processor med0134, rank 41 out of 64 processors\nHello world from processor med0134, rank 43 out of 64 processors\nHello world from processor med0134, rank 47 out of 64 processors\nHello world from processor med0134, rank 48 out of 64 processors\nHello world from processor med0134, rank 49 out of 64 processors\nHello world from processor med0134, rank 50 out of 64 processors\nHello world from processor med0134, rank 51 out of 64 processors\nHello world from processor med0134, rank 52 out of 64 processors\nHello world from processor med0134, rank 56 out of 64 processors\nHello world from processor med0134, rank 57 out of 64 
processors\nHello world from processor med0134, rank 59 out of 64 processors\nHello world from processor med0134, rank 61 out of 64 processors\nHello world from processor med0134, rank 63 out of 64 processors\nHello world from processor med0134, rank 58 out of 64 processors\n
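With 64 lines of interleaved output it is hard to see at a glance how the ranks were distributed across nodes. As a small illustration (not part of the example repository; the function name ranks_per_host is made up), this Python sketch counts the ranks per host from log text in the format printed by openmpi_example.

```python
import re
from collections import Counter

# Matches the message printed by openmpi_example.c
LINE = re.compile(r'Hello world from processor (\S+), rank (\d+) out of (\d+) processors')


def ranks_per_host(log_text):
    # Count how many MPI ranks reported from each host.
    counts = Counter()
    for match in LINE.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


SAMPLE = '''Hello world from processor med0133, rank 6 out of 64 processors
Hello world from processor med0134, rank 32 out of 64 processors
Hello world from processor med0133, rank 1 out of 64 processors
'''

if __name__ == '__main__':
    for host, count in sorted(ranks_per_host(SAMPLE).items()):
        print(host, count)
```

Applied to the full log above, the counts per host should add up to the 64 tasks requested from Slurm.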
"},{"location":"how-to/software/openmpi/#running-hybrid-software-mpimultithreading","title":"Running Hybrid Software (MPI+Multithreading)","text":"In some cases, you want to mix multithreading (e.g., via OpenMP) with MPI to run one process with multiple threads that then can communicate via shared memory. Note that OpenMPI will let processes on the same node communicate via shared memory anyway, so this might not be necessary in all cases.
The file run_hybrid.sh
shows how to run an MPI job with 8 processes and 4 threads each.
Note well that memory is allocated on a per-slot (thus per-thread) basis!
run_hybrid.sh
#!/bin/bash\n\n# Example job script for multi-threaded MPI programs, sometimes\n# called \"hybrid\" MPI computing.\n\n# Generic arguments\n\n# Job name\n#SBATCH --job-name openmpi_example\n# Maximal running time of 10 min\n#SBATCH --time 00:10:00\n# Allocate 1GB of memory per node\n#SBATCH --mem 1G\n# Write logs to directory \"slurm_log\"\n#SBATCH -o slurm_log/slurm-%x-%J.log\n\n# MPI-specific parameters\n\n# Run 8 tasks (threads/on virtual cores)\n#SBATCH --ntasks 8\n# Allocate 4 CPUs per task (cores/threads)\n#SBATCH --cpus-per-task 4\n\n# Make sure to source the profile.d file (not available on head nodes).\nsource /etc/profile.d/modules.sh\n\n# Load the OpenMPI environment module to get the runtime environment.\nmodule load openmpi/4.0.3-0\n\n# Launch the program.\nmpirun -n 8 ./openmpi_example\n
We changed the following
Let's look at the log output:
# cat slurm_log/slurm-openmpi_example-3193.log\nHello world from processor med0133, rank 1 out of 8 processors\nHello world from processor med0133, rank 3 out of 8 processors\nHello world from processor med0133, rank 2 out of 8 processors\nHello world from processor med0133, rank 6 out of 8 processors\nHello world from processor med0133, rank 0 out of 8 processors\nHello world from processor med0133, rank 4 out of 8 processors\nHello world from processor med0133, rank 5 out of 8 processors\nHello world from processor med0133, rank 7 out of 8 processors\n
Each process can now launch 4 threads (e.g., by defining export OMP_NUM_THREADS=4
before the program call).
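Instead of hard-coding the thread count, a launcher can derive it from the job allocation. Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given; the Python sketch below reads it and also shows the total CPU count for the hybrid example. The function names threads_per_task and total_cpus are assumptions made for this illustration.

```python
import os


def threads_per_task(env=None, default=1):
    # Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set for the job.
    env = os.environ if env is None else env
    value = env.get('SLURM_CPUS_PER_TASK', '')
    return int(value) if value.isdigit() and int(value) > 0 else default


def total_cpus(ntasks, cpus_per_task):
    # Each MPI task gets its own set of CPUs, so the allocation is the product.
    return ntasks * cpus_per_task


if __name__ == '__main__':
    print(threads_per_task({'SLURM_CPUS_PER_TASK': '4'}))
    print(total_cpus(8, 4))
```

For run_hybrid.sh this means 8 tasks with 4 CPUs each, i.e. 32 CPUs in total. Outside of a Slurm job the helper falls back to a single thread.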
This page gives an end-to-end example of how to build and install Gromacs as an example for managing complex scientific software installs in user land. You don't have to learn or understand the specifics of Gromacs. We use it as an example as there are some actual users on the BIH cluster. However, installing it is out of scope of BIH HPC administration.
Gromacs is a good example as it is a sufficiently complex piece of software. Quite a bit of configuration is done on the command line and there is no current software package of it in the common RPM repositories. However, it is quite well-documented and easy to install for scientific software so there is a lot to be learned.
"},{"location":"how-to/software/scientific-software/#related-documents","title":"Related Documents","text":"We will perform the following steps:
Makefile
s)
Many scientific software packages will have more dependencies. If the dependencies are available as CentOS Core or EPEL packages (such as zlib), HPC IT administration can install them. Otherwise, you will have to install them on your own.
Warning
Do not perform the compilation on the login nodes but go to a compute node instead.
"},{"location":"how-to/software/scientific-software/#downloading-and-extracting-software","title":"Downloading and Extracting Software","text":"This is best done in your scratch
directory as we don't have to keep these files around for long. Note that the files in your scratch
directory will automatically be removed after 2 weeks. You can also use your work
directory here.
hpc-login-1:~$ srun --pty bash -i\nmed0127:~$ mkdir $HOME/scratch/gromacs-install\nmed0127:~$ cd $HOME/scratch/gromacs-install\nmed0127:~$ wget http://ftp.gromacs.org/pub/gromacs/gromacs-2018.3.tar.gz\nmed0127:~$ tar xf gromacs-2018.3.tar.gz\nmed0127:~$ ls gromacs-2018.3\nadmin cmake COPYING CTestConfig.cmake INSTALL scripts src\nAUTHORS CMakeLists.txt CPackInit.cmake docs README share tests\n
So far so good!
"},{"location":"how-to/software/scientific-software/#perform-the-configure-step","title":"Perform the Configure Step","text":"This is the most critical step. Most scientific C/C++ software has a configure step that allows for, e.g., disabling and enabling features or setting installation paths. Here, you can configure the software depending on your needs and environment. Also, it is the easiest step to mess up.
The Gromacs documentation is actually quite good but the author had problems following it to the letter. Gromacs recommends creating an MPI and a non-MPI build but the precise way did not work. This installation creates two flavours for Gromacs 2018.3, but in a different way than the Gromacs documentation proposes.
First, here is how to configure the non-MPI flavour. Gromacs wants a modern compiler, so we load gcc
. We will need to note down the precise version we used so later we can load it for running Gromacs with the appropriate libraries. We will install Gromacs into $HOME/work/software
, which is appropriate for user-installed software, but it could also go into a group or project directory. Note that we install the software into your work directory as software installations are quite large and might go above your home quota. Also, software installations are usually not precious enough to waste resources on snapshots and backups. Also note that we force Gromacs to use AVX_256
for SIMD support (Intel Sandy Bridge architecture) to not get unsupported CPU instruction errors.
med0127:~$ module load gcc/7.2.0-0 cmake/3.11.0-0\nmed0127:~$ module list\nCurrently Loaded Modulefiles:\n 1) gcc/7.2.0-0 2) cmake/3.11.0-0\nmed0127:~$ mkdir gromacs-2018.3-build-nompi\nmed0127:~$ cd gromacs-2018.3-build-nompi\nmed0127:~$ cmake ../gromacs-2018.3 \\\n -DGMX_BUILD_OWN_FFTW=ON \\\n -DGMX_MPI=OFF \\\n -DGMX_SIMD=AVX_256 \\\n -DCMAKE_INSTALL_PREFIX=$HOME/work/software/gromacs/2018.3\n
Second, here is how to configure the MPI flavour. Note that we are also enabling the openmpi
module. We will also need the precise version here so we can later load the correct libraries. Note that we install the software into the directory gromacs-mpi
but switch off shared library building as recommended by the Gromacs documentation.
med0127:~$ cd ..\nmed0127:~$ module load openmpi/4.0.3-0\nmed0127:~$ module list\nCurrently Loaded Modulefiles:\n 1) gcc/7.2.0-0 2) cmake/3.11.0-0 3) openmpi/4.0.3-0\nmed0127:~$ mkdir gromacs-2018.3-build-mpi\nmed0127:~$ cd gromacs-2018.3-build-mpi\nmed0127:~$ cmake ../gromacs-2018.3 \\\n -DGMX_BUILD_OWN_FFTW=ON \\\n -DGMX_MPI=ON \\\n -DGMX_SIMD=AVX_256 \\\n -DCMAKE_INSTALL_PREFIX=$HOME/work/software/gromacs-mpi/2018.3 \\\n -DCMAKE_C_COMPILER=$(which mpicc) \\\n -DCMAKE_CXX_COMPILER=$(which mpicxx) \\\n -DBUILD_SHARED_LIBS=off\n
"},{"location":"how-to/software/scientific-software/#perform-the-build-and-install-steps","title":"Perform the Build and Install Steps","text":"This is simple, using -j 32
allows us to build with 32 threads. If something goes wrong: meh, the \"joys\" of compiling C software.
Getting Support for Building Software
BIH HPC IT cannot provide support for compiling scientific software. Please contact the appropriate mailing lists or forums for your scientific software. You should contact the BIH HPC IT helpdesk only if you are sure that the problem is with the BIH HPC cluster. You should try to resolve the issue on your own and with the developers of the software that you are trying to build/use.
For the no-MPI version:
med0127:~$ cd ../gromacs-2018.3-build-nompi\nmed0127:~$ make -j 32\n[...]\nmed0127:~$ make install\n
For the MPI version:
med0127:~$ cd ../gromacs-2018.3-build-mpi\nmed0127:~$ make -j 32\n[...]\nmed0127:~$ make install\n
"},{"location":"how-to/software/scientific-software/#create-environment-modules-files","title":"Create Environment Modules Files","text":"For Gromacs 2018.3, the following is appropriate. You should be able to use this as a template for your environment module files:
med0127:~$ mkdir -p $HOME/local/modules/gromacs\nmed0127:~$ cat >$HOME/local/modules/gromacs/2018.3 <<\"EOF\"\n#%Module\nproc ModulesHelp { } {\n puts stderr {\n Gromacs molecular simulation toolkit (non-MPI version)\n\n - http://www.gromacs.org\n }\n}\n\nmodule-whatis {Gromacs molecular simulation toolkit (non-MPI)}\n\nset root /data/cephfs-1/home/users/YOURUSER/work/software/gromacs/2018.3\n\nprereq gcc/7.2.0-0\n\nconflict gromacs\nconflict gromacs-mpi\n\nprepend-path LD_LIBRARY_PATH $root/lib64\nprepend-path LIBRARY_PATH $root/lib64\nprepend-path MANPATH $root/share/man\nprepend-path PATH $root/bin\nsetenv GMXRC $root/bin/GMXRC\nEOF\n
med0127:~$ mkdir -p $HOME/local/modules/gromacs-mpi\nmed0127:~$ cat >$HOME/local/modules/gromacs-mpi/2018.3 <<\"EOF\"\n#%Module\nproc ModulesHelp { } {\n puts stderr {\n Gromacs molecular simulation toolkit (MPI version)\n\n - http://www.gromacs.org\n }\n}\n\nmodule-whatis {Gromacs molecular simulation toolkit (MPI)}\n\nset root /data/cephfs-1/home/users/YOURUSER/work/software/gromacs-mpi/2018.3\n\nprereq openmpi/4.0.3-0\nprereq gcc/7.2.0-0\n\nconflict gromacs\nconflict gromacs-mpi\n\nprepend-path LD_LIBRARY_PATH $root/lib64\nprepend-path LIBRARY_PATH $root/lib64\nprepend-path MANPATH $root/share/man\nprepend-path PATH $root/bin\nsetenv GMXRC $root/bin/GMXRC\nEOF\n
With the next command, make your local modules files path known to the environment modules system.
med0127:~$ module use $HOME/local/modules\n
You can verify the result:
med0127:~$ module avail\n\n------------------ /data/cephfs-1/home/users/YOURUSER/local/modules ------------------\ngromacs/2018.3 gromacs-mpi/2018.3\n\n-------------------- /usr/share/Modules/modulefiles --------------------\ndot module-info null\nmodule-git modules use.own\n\n-------------------------- /opt/local/modules --------------------------\ncmake/3.11.0-0 llvm/6.0.0-0 openmpi/3.1.0-0\ngcc/7.2.0-0 matlab/r2016b-0 openmpi/4.0.3-0\n
"},{"location":"how-to/software/scientific-software/#interlude-convenient-module-use","title":"Interlude: Convenient module use
","text":"You can add this to your ~/.bashrc
file to always execute the module use
after login. Note that module
is not available on the login or transfer nodes, the following should work fine:
med0127:~$ cat >>~/.bashrc <<\"EOF\"\ncase \"${HOSTNAME}\" in\n hpc-login-*|hpc-transfer-*)\n ;;\n *)\n module use $HOME/local/modules\n ;;\nesac\nEOF\n
Note that the paths chosen above are sensible but arbitrary. You can install any software anywhere you have permission to -- somewhere in your user and group home, maybe a project home makes most sense on the BIH HPC, no root permissions required. You can also place the module files anywhere, as long as the module use
line is appropriate.
As a best practice, you could use the following locations:
$HOME/work/software as a root to install software to
$HOME/work/software/$PKG/$VERSION for installing a given software package in a given version
$HOME/work/software/modules as the root for modules to install
$HOME/work/software/modules/$PKG/$VERSION for the module file to load the software in a given version
$HOME/work/software/modules.sh as a Bash script to contain the line module use $HOME/work/software/modules
chmod ug=rX,o= $GROUP/work/software, the upper case X is essential to only set +x on directories and not files):
$GROUP/work/software as a root to install software to
$GROUP/work/software/$PKG/$VERSION for installing a given software package in a given version
$GROUP/work/software/modules as the root for modules to install
$GROUP/work/software/modules/$PKG/$VERSION for the module file to load the software in a given version
$GROUP/work/software/modules.sh as a Bash script to contain the case Bash snippet from above but with module use $GROUP/work/software/modules
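The layout is regular enough to compute from a root directory, a package name and a version. The Python sketch below spells it out under the assumption that module files live below the modules root, mirroring the package and version of the install prefix; the function name software_layout and the example root path are hypothetical.

```python
from pathlib import Path


def software_layout(root, pkg, version):
    # Derive the suggested paths for one package version below a software root.
    root = Path(root)
    return {
        'install_prefix': root / pkg / version,
        'modules_root': root / 'modules',
        'module_file': root / 'modules' / pkg / version,
        'modules_script': root / 'modules.sh',
    }


if __name__ == '__main__':
    layout = software_layout('/home/user/work/software', 'gromacs', '2018.3')
    for name, path in layout.items():
        print(name, path)
```

The same function covers the group setup by passing the group's work/software directory as root.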
Every time you want to use Gromacs, you can now do
med0127:~$ module load gcc/7.2.0-0 gromacs/2018.3\n
or, if you want to have the MPI version:
med0127:~$ module load gcc/7.2.0-0 openmpi/4.0.3-0 gromacs-mpi/2018.3\n
"},{"location":"how-to/software/scientific-software/#launching-gromacs","title":"Launching Gromacs","text":"Something along the lines of the following job script should be appropriate. See How-To: Build Run OpenMPI Programs for more information.
#!/bin/bash\n\n# Example job script for (single-threaded) MPI programs.\n\n# Generic arguments\n\n# Job name\n#SBATCH --job-name gromacs\n# Maximal running time of 10 min\n#SBATCH --time 00:10:00\n# Allocate 1GB of memory per node\n#SBATCH --mem 1G\n# Write logs to directory \"slurm_log/<name>-<job id>.log\" (dir must exist)\n#SBATCH --output slurm_log/%x-%J.log\n\n# MPI-specific parameters\n\n# Launch on 8 nodes (== 8 tasks)\n#SBATCH --ntasks 8\n# Allocate 4 CPUs per task (== per node)\n#SBATCH --cpus-per-task 4\n\n# Load the OpenMPI and GCC environment module to get the runtime environment.\nmodule load gcc/7.2.0-0\nmodule load openmpi/4.0.3-0\n\n# Make custom environment modules known. Alternative, you can \"module use\"\n# them in the session you use for submitting the job.\nmodule use $HOME/local/modules\nmodule load gromacs-mpi/2018.3\n\n# Launch the program on 8 nodes and tell Gromacs to use 4 threads for each\n# invocation.\nexport OMP_NUM_THREADS=4\nmpirun -n 8 gmx_mpi mdrun -deffnm npt_1000\n
med0127:~$ mkdir slurm_log\nmed0127:~$ sbatch job_script.sh\nSubmitted batch job 3229\n
"},{"location":"how-to/software/tensorflow/","title":"How-To: Setup TensorFlow","text":"TensorFlow is a package for deep learning with optional support for GPUs. You can find the original TensorFlow installation instructions here.
This article describes how to set up TensorFlow with GPU support using Conda. This how-to assumes that you have just connected to a GPU node via srun --mem=10g --partition=gpu --gres=gpu:tesla:1 --pty bash -i
(for Tesla V100 GPUs; for A40 GPUs use --gres=gpu:a40:1
). Note that you will need to allocate \"enough\" memory, otherwise your python session will be Killed
because of too little memory. You should read the How-To: Connect to GPU Nodes tutorial for an explanation of how to do this.
This tutorial assumes that conda has been set up as described in Software Management.
"},{"location":"how-to/software/tensorflow/#create-conda-environment","title":"Create conda environment","text":"We recommend that you install mamba first with conda install -y mamba
and use this C++ reimplementation of the conda command
as follows.
$ conda create -y -n python-tf tensorflow-gpu\n$ conda activate python-tf\n
Let us verify that we have Python and TensorFlow installed. You might get different versions; you can pin them at install time with conda create -y -n python-tf python==3.9.10 tensorflow-gpu==2.6.2
$ python --version\nPython 3.9.10\n$ python -c 'import tensorflow; print(tensorflow.__version__)'\n2.6.2\n
We thus end up with an installation of Python 3.9.10 with TensorFlow 2.6.2.
"},{"location":"how-to/software/tensorflow/#run-tensorflow-example","title":"Run TensorFlow Example","text":"Let us now see whether TensorFlow has recognized our GPU correctly.
$ python\n>>> import tensorflow as tf\n>>> print(\"TensorFlow version:\", tf.__version__)\nTensorFlow version: 2.6.2\n>>> print(tf.config.list_physical_devices())\n[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n
Yay, we can proceed to run the Quickstart Tutorial.
>>> mnist = tf.keras.datasets.mnist\n>>> (x_train, y_train), (x_test, y_test) = mnist.load_data()\n>>> x_train, x_test = x_train / 255.0, x_test / 255.0\n>>> model = tf.keras.models.Sequential([\n... tf.keras.layers.Flatten(input_shape=(28, 28)),\n... tf.keras.layers.Dense(128, activation='relu'),\n... tf.keras.layers.Dropout(0.2),\n... tf.keras.layers.Dense(10)\n... ])\n>>> predictions = model(x_train[:1]).numpy()\n>>> predictions\narray([[-0.50569224, 0.26386747, 0.43226188, 0.61226094, 0.09630793,\n 0.34400576, 0.9819117 , -0.3693726 , 0.5221357 , 0.3323232 ]],\n dtype=float32)\n>>> tf.nn.softmax(predictions).numpy()\narray([[0.04234391, 0.09141268, 0.10817807, 0.12951255, 0.07731011,\n 0.09903987, 0.18743432, 0.04852816, 0.11835073, 0.09788957]],\n dtype=float32)\n>>> loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n>>> loss_fn(y_train[:1], predictions).numpy()\n2.3122327\n>>> model.compile(optimizer='adam',\n... loss=loss_fn,\n... metrics=['accuracy'])\n>>> model.fit(x_train, y_train, epochs=5)\n2022-03-09 17:53:47.237997: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)\nEpoch 1/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.2918 - accuracy: 0.9151\nEpoch 2/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.1444 - accuracy: 0.9561\nEpoch 3/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.1082 - accuracy: 0.9674\nEpoch 4/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.0898 - accuracy: 0.9720\nEpoch 5/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.0773 - accuracy: 0.9756\n<keras.callbacks.History object at 0x154e81360190>\n>>> model.evaluate(x_test, y_test, verbose=2)\n313/313 - 0s - loss: 0.0713 - accuracy: 0.9785\n[0.0713074803352356, 0.9785000085830688]\n>>> probability_model = tf.keras.Sequential([\n... model,\n... tf.keras.layers.Softmax()\n... 
])\n>>> probability_model(x_test[:5])\n<tf.Tensor: shape=(5, 10), dtype=float32, numpy=\narray([[1.2339272e-06, 6.5599060e-10, 1.0560590e-06, 5.9356184e-06,\n 5.3691075e-12, 1.4447859e-07, 5.4218874e-13, 9.9996936e-01,\n 1.0347234e-07, 2.2147648e-05],\n [2.9887938e-06, 6.8461006e-05, 9.9991941e-01, 7.2003731e-06,\n 2.9751782e-13, 8.2818183e-08, 1.4307782e-06, 2.3203837e-13,\n 4.7433215e-07, 2.9504194e-14],\n [1.8058477e-06, 9.9928612e-01, 7.8716243e-05, 3.9140195e-06,\n 3.0842333e-05, 9.4537208e-06, 2.2774333e-05, 4.5549971e-04,\n 1.1015874e-04, 6.9138093e-07],\n [9.9978787e-01, 3.0206781e-08, 2.8528208e-05, 8.5581682e-08,\n 1.3851340e-07, 2.3634559e-06, 1.8480707e-05, 1.0153375e-04,\n 1.1583331e-07, 6.0887167e-05],\n [6.4914235e-07, 2.5808356e-08, 1.8225538e-06, 2.3215563e-09,\n 9.9588013e-01, 4.6049720e-08, 3.8903639e-07, 2.9772724e-05,\n 4.3141077e-07, 4.0867776e-03]], dtype=float32)>\n>>> exit()\n
"},{"location":"how-to/software/tensorflow/#writing-tensorflow-slurm-jobs","title":"Writing TensorFlow Slurm Jobs","text":"Writing Slurm jobs using TensorFlow is as easy as creating the following scripts.
tf_script.py
#!/usr/bin/env python\n\nimport tensorflow as tf\nprint(\"TensorFlow version:\", tf.__version__)\nprint(tf.config.list_physical_devices())\n\nmnist = tf.keras.datasets.mnist\n\n(x_train, y_train), (x_test, y_test) = mnist.load_data()\nx_train, x_test = x_train / 255.0, x_test / 255.0\n\n\nmodel = tf.keras.models.Sequential([\n tf.keras.layers.Flatten(input_shape=(28, 28)),\n tf.keras.layers.Dense(128, activation='relu'),\n tf.keras.layers.Dropout(0.2),\n tf.keras.layers.Dense(10)\n])\n\npredictions = model(x_train[:1]).numpy()\nprint(predictions)\n\nprint(tf.nn.softmax(predictions).numpy())\n\n# ... and so on ;-)\n
tf_job.sh
#!/usr/bin/bash\n\n#SBATCH --job-name=tf-job\n#SBATCH --mem=10g\n#SBATCH --partition=gpu\n#SBATCH --gres=gpu:tesla:1\n\nsource $HOME/work/miniconda3/bin/activate\nconda activate python-tf\n\npython tf_script.py &>tf-out.txt\n
And then calling
$ sbatch tf_job.sh\n
You can find the results in tf-out.txt
after completion.
$ cat tf-out.txt \n2022-03-09 18:05:54.628846: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA\nTo enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n2022-03-09 18:05:56.999848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30988 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:18:00.0, compute capability: 7.0\nTensorFlow version: 2.6.2\n[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n[[-0.07757086 0.04676083 0.9420195 -0.59902835 -0.26286742 -0.392514\n 0.3231195 -0.17169198 0.3480805 0.37013203]]\n[[0.07963609 0.09017922 0.22075593 0.04727634 0.06616627 0.05812084\n 0.11888511 0.07248258 0.12188996 0.12460768]]\n
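A job that silently falls back to the CPU still occupies a GPU allocation, so it can help to check the device list at the top of tf_script.py. The following is a minimal sketch and not part of the original tutorial: the helper name require_gpu is hypothetical, and it only assumes that each device object has a device_type attribute, as in the output shown above.

```python
import sys


def require_gpu(devices):
    '''Return the GPU entries of a device list, or exit with an error message.

    devices is expected to look like the result of
    tf.config.list_physical_devices(): objects with a device_type attribute.
    '''
    gpus = [d for d in devices if getattr(d, 'device_type', None) == 'GPU']
    if not gpus:
        # sys.exit with a string prints it to stderr and sets exit status 1,
        # so Slurm reports the job as failed instead of running on the CPU.
        sys.exit('No GPU visible; was the job submitted with a --gres option?')
    return gpus
```

In tf_script.py this would be called as require_gpu(tf.config.list_physical_devices()) right after importing TensorFlow.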
"},{"location":"hpc-tutorial/episode-0/","title":"First Steps: Episode 0","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm?"},{"location":"hpc-tutorial/episode-0/#prerequisites","title":"Prerequisites","text":"This tutorial assumes familiarity with Linux/Unix operating systems. It also assumes that you have already connected to the cluster. We have collected some links to tutorials and manuals on the internet.
"},{"location":"hpc-tutorial/episode-0/#legend","title":"Legend","text":"Before we start with our first steps tutorial, we would like to introduce the following convention that we use throughout the series:
$ Commands are prefixed with a little dollar sign\n
While file paths are highlighted like this: /data/cephfs-1/work/projects/cubit/current
.
After connecting to the cluster, you are located on a login node. To get to your first compute node, type srun --time 7-00 --mem=8G --cpus-per-task=8 --pty bash -i
which will launch an interactive Bash session on a free remote node running up to 7 days, enabling you to use 8 cores and 8 GB of memory. Typing exit
will bring you back to the login node.
hpc-login-1$ srun -p long --time 7-00 --mem=8G --cpus-per-task=8 --pty bash -i\nhpc-cpu-1$ exit\n$\n
See? That was easy!
"},{"location":"hpc-tutorial/episode-0/#preparation","title":"Preparation","text":"In preparation for our first steps tutorial series, we would like you to install the software for this tutorial. In general, users on the cluster manage their own software with the help of conda. If you haven't done so yet, please follow the instructions in installing conda first. The only prerequisite is that you are able to log into the cluster. Also make sure that you are logged in to a compute node using srun -p medium --time 1-00 --mem=4G --cpus-per-task=1 --pty bash -i
.
Now we will create a new environment, so as to not interfere with your current or planned software stack, and install into it all the software that we need during the tutorial. Run the following commands:
$ conda create -n first-steps python=3 snakemake bwa delly samtools gatk4\n$ conda activate first-steps\n(first-steps) $\n
"},{"location":"hpc-tutorial/episode-1/","title":"First Steps: Episode 1","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm? This is part one of the \"First Steps\" BIH Cluster Tutorial. Here we will build a small pipeline with alignment and variant calling. The premise is that you have the tools installed as described in Episode 0. For this episode, please make sure that you are on a compute node. As a reminder, the command to access a compute node with the required resources is
$ srun --time 7-00 --mem=8G --cpus-per-task=8 --pty bash -i\n
"},{"location":"hpc-tutorial/episode-1/#tutorial-input-files","title":"Tutorial Input Files","text":"We will provide you with some example FASTQ files, but you can use your own if you like. You can find the data here:
/data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz
/data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz
First, you should create a folder where the output of this tutorial will go. It would be good to have it in your work
directory in /data/cephfs-1/home/users/$USER
, because it is faster and there is more space available.
(first-steps) $ mkdir -p /data/cephfs-1/home/users/$USER/work/tutorial/episode1\n(first-steps) $ pushd /data/cephfs-1/home/users/$USER/work/tutorial/episode1\n
Quotas / File System limits
/data/cephfs-1/home/users/$USER
. The reason for this is that nightly snapshots and backups are created for this directory which are precious resources.
/data/cephfs-1/home/users/$USER/work
. The limits are much higher here but no snapshots or backups are available.
/data/cephfs-1/home/users/$USER/scratch
. However, files placed here are automatically removed after 2 weeks. This is only appropriate for files during download or temporary files.
In general it is advisable to have a proper temporary directory available. You can create one in your ~/scratch
folder and make it available to the system.
(first-steps) $ export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp\n(first-steps) $ mkdir -p $TMPDIR\n
"},{"location":"hpc-tutorial/episode-1/#using-the-cubit-static-data","title":"Using the Cubit Static Data","text":"The static data is located in /data/cephfs-1/work/projects/cubit/current/static_data
. For our small example, the required reference genome and index can be found at:
/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta
/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta
Let's align our data:
(first-steps) $ bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n /data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz \\\n| samtools view -b \\\n| samtools sort -O BAM -T $TMPDIR -o aln.bam\n\n(first-steps) $ samtools index aln.bam\n
"},{"location":"hpc-tutorial/episode-1/#perform-structural-variant-calling","title":"Perform Structural Variant Calling","text":"And do the structural variant calling:
(first-steps) $ delly call \\\n -g /data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta \\\n aln.bam\n
Note that delly will not find any variants.
"},{"location":"hpc-tutorial/episode-1/#small-variant-calling-snv-indel","title":"Small Variant Calling (SNV, indel)","text":"And now for the SNP calling (this step will take ~ 20 minutes):
(first-steps) $ gatk HaplotypeCaller \\\n -R /data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta \\\n -I aln.bam \\\n -ploidy 2 \\\n -O test.GATK.vcf\n
"},{"location":"hpc-tutorial/episode-1/#outlook-more-programs-and-static-data","title":"Outlook: More Programs and Static Data","text":"So this is it! We used the tools that we installed previously, accessed the reference data and ran a simple alignment and variant calling pipeline. You can access a list of all static data through this wiki, follow this link to the Static Data. You can also have a peek via:
(first-steps) $ tree -L 3 /data/cephfs-1/work/projects/cubit/current/static_data | less\n
"},{"location":"hpc-tutorial/episode-2/","title":"First Steps: Episode 2","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm? Welcome to the second episode of our tutorial series!
Once you are logged in to the cluster, you have the possibility to distribute your jobs to all the nodes that are available. But how can you do this easily? The key command to this magic is sbatch
. This tutorial will show you how you can use this efficiently.
sbatch
Command","text":"So what is sbatch
doing for you?
You use the sbatch
command in front of the script you actually want to run. sbatch
then puts your job into the job queue. The job scheduler looks at the current status of the whole system and will assign the first job in the queue to a node that is free in terms of computational load. If all machines are busy, yours will wait. But your job will sooner or later get assigned to a free node.
We strongly recommend using this process for starting your computationally intensive tasks because you will get the best performance for your job and the whole system won't be disturbed by jobs that are locally blocking nodes. Thus, everybody using the cluster benefits.
You may have noticed that you run sbatch
with a script, not with regular commands. The reason is that sbatch
only accepts bash scripts. If you give sbatch
a normal shell command or binary, it won't work. This means that we have to put the command(s) we want to use in a bash script. A skeleton script can be found at /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_job.sh
The content of the file:
#!/bin/bash\n\n# Set a name for the job (-J or --job-name).\n#SBATCH --job-name=tutorial\n\n# Set the file to write the stdout and stderr to (if -e is not set; -o or --output).\n#SBATCH --output=logs/%x-%j.log\n\n# Set the number of cores (-c or --cpus-per-task).\n#SBATCH --cpus-per-task=8\n\n# Force allocation of the two cores on ONE node.\n#SBATCH --nodes=1\n\n# Set the total memory. Units can be given in T|G|M|K.\n#SBATCH --mem=8G\n\n# Optionally, set the partition to be used (-p or --partition).\n#SBATCH --partition=medium\n\n# Set the expected running time of your job (-t or --time).\n# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS\n#SBATCH --time=30:00\n\nexport TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp\nmkdir -p ${TMPDIR}\n
The lines starting with #SBATCH
are actually setting parameters for a sbatch
command, so #SBATCH --job-name=tutorial
is equal to sbatch --job-name=tutorial
. Slurm will create a log file with a file name composed of the job name (%x
) and the job ID (%j
), e.g. logs/tutorial-XXXX.log
. It will not automatically create the logs
directory, so we need to do this manually first. Here, we emphasize the importance of the log files! They are the first place to look if anything goes wrong.
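To make the equivalence between #SBATCH lines and command-line options concrete, here is a small Python sketch. It is an illustration only: the function names are made up for this example, the job ID 1234 is invented, and the parsing is simpler than what Slurm itself does (for instance, it does not stop at the first regular command in the script). It collects the options from a script and expands the %x and %j placeholders of the log file name.

```python
def sbatch_options(script_text):
    '''Collect the options given on the #SBATCH lines of a job script.'''
    options = []
    for line in script_text.splitlines():
        line = line.strip()
        if line.startswith('#SBATCH'):
            # Everything after the marker is what you would pass to sbatch.
            options.extend(line[len('#SBATCH'):].split())
    return options


def expand_log_name(pattern, job_name, job_id):
    '''Expand the two placeholders used in this tutorial: %x and %j.'''
    return pattern.replace('%x', job_name).replace('%j', str(job_id))


script = '''#!/bin/bash
#SBATCH --job-name=tutorial
#SBATCH --output=logs/%x-%j.log
#SBATCH --cpus-per-task=8
'''

# Prints the command line that is equivalent to the directives above.
print('sbatch ' + ' '.join(sbatch_options(script)) + ' submit_job.sh')
# Prints logs/tutorial-1234.log
print(expand_log_name('logs/%x-%j.log', 'tutorial', 1234))
```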
To start now with our tutorial, create a new tutorial directory with a log directory, e.g.,
(first-steps) $ mkdir -p /data/cephfs-1/home/users/$USER/work/tutorial/episode2/logs\n
and copy the wrapper script to this directory:
(first-steps) $ pushd /data/cephfs-1/home/users/$USER/work/tutorial/episode2\n(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_job.sh .\n(first-steps) $ chmod u+w submit_job.sh\n
Now open this file and copy the same commands we executed in the last tutorial to this file.
To keep it simple, we will put everything into one script. This is perfectly fine because the alignment and indexing are sequential. But there are two steps that could be run in parallel, namely the structural variant calling and the small variant calling, because they don't depend on each other. We will learn how to do that in a later tutorial. Your file should look something like this:
#!/bin/bash\n\n# Set a name for the job (-J or --job-name).\n#SBATCH --job-name=tutorial\n\n# Set the file to write the stdout and stderr to (if -e is not set; -o or --output).\n#SBATCH --output=logs/%x-%j.log\n\n# Set the number of cores (-c or --cpus-per-task).\n#SBATCH --cpus-per-task=8\n\n# Force allocation of the two cores on ONE node.\n#SBATCH --nodes=1\n\n# Set the total memory. Units can be given in T|G|M|K.\n#SBATCH --mem=8G\n\n# Optionally, set the partition to be used (-p or --partition).\n#SBATCH --partition=medium\n\n# Set the expected running time of your job (-t or --time).\n# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS\n#SBATCH --time=30:00\n\nexport TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp\nmkdir -p ${TMPDIR}\n\nBWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\nREF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\nbwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n $BWAREF \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz \\\n| samtools view -b \\\n| samtools sort -O BAM -T $TMPDIR -o aln.bam\n\nsamtools index aln.bam\n\ndelly call -g \\\n $REF \\\n aln.bam\n\ngatk HaplotypeCaller \\\n -R $REF \\\n -I aln.bam \\\n -ploidy 2 \\\n -O test.GATK.vcf\n
Let's run it (make sure that you are in the tutorial/episode2
directory!):
(first-steps) $ sbatch submit_job.sh\n
And wait for the response which will tell you that your job was submitted and which job id number it was assigned. Note that sbatch
only tells you that the job has been submitted, but nothing about whether it has started or finished. You won't get any response at the terminal when the job finishes. It will take approximately 20 minutes to finish the job.
You'll probably want to see how your job is doing. You can get a list of your jobs using:
(first-steps) $ squeue --me\n
Note that logins are also considered as jobs.
Identify your job by the <JOBID>
(1st column) or the name of the script (3rd column). The most likely states you will see (5th column of the table):
PD
: pending, waiting to be scheduled
R
: running
In the 8th column you can see that your job is very likely running on a different machine than the one you are on!
Do not use Slurm and watch
or loops
The watch
command is a useful tool for running commands in a loop every N
seconds. For example, on your workstation you could do watch 'ping -c 3 google.com'
to execute three network pings to Google every two seconds.
\ud83d\udc4e Using watch
or manual loops in a cluster environment can have bad effects when querying Slurm or the shared file system. Both are shared resources and \"expensive\" queries should not be run in loops. For Slurm, this includes running squeue
. The same would be true for running squeue -i
which performs an internal loop.
\ud83d\udc4d Use the Slurm query commands only when you actually need the output. If you run them in an (implicit or explicit) loop, then do so only for a short time and don't leave this open in a screen.
Get more information about your job by passing the job ID:
(first-steps) $ sstat <JOBID>\n
And of course, watch what the logs are telling you:
(first-steps) $ tail -f logs/tutorial-<JOBID>.log\n
There will be no notification when your job is done, so it is best to watch the squeue --me
command. To do so, there is a Linux command watch
that you give a command to execute every few seconds. This is useful for looking for changes in the output of a command. The seconds between two executions can be set with the -n
option. It is best to use -n 60
to minimize unnecessary load on the file system:
(first-steps) $ watch -n 60 squeue --me\n
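If you prefer a script over watch, the same idea can be sketched in Python. This is a hedged example, not a cluster-provided tool: the function names are hypothetical, and it assumes that squeue is on your PATH. In line with the warning above, it polls at most once per minute and gives up after a fixed number of polls instead of looping forever.

```python
import subprocess
import time


def my_job_ids():
    '''Return the IDs of your pending or running jobs using one squeue call.'''
    out = subprocess.run(
        ['squeue', '--me', '--noheader', '--format=%i'],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def wait_for_job(job_id, list_jobs=my_job_ids, interval=60, max_polls=60,
                 sleep=time.sleep):
    '''Poll at most max_polls times; return True once the job left the queue.'''
    for _ in range(max_polls):
        if str(job_id) not in list_jobs():
            return True
        # Wait between polls so that Slurm is queried only rarely.
        sleep(interval)
    return False
```

Calling wait_for_job with the job ID printed by sbatch returns True as soon as the job no longer shows up in squeue --me, and False if it is still listed after max_polls checks.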
If for some reason your job is hanging, you can delete your job using scancel
with your job-ID: (first-steps) $ scancel <job-ID>\n
"},{"location":"hpc-tutorial/episode-2/#job-queues","title":"Job Queues","text":"The cluster has a special way of organizing itself, and by telling the cluster how long and with which priority you want your jobs to run, you can help it in this. There is a system set up on the cluster where you can enqueue your jobs to so-called partitions. Partitions have different priorities and allow different running times. To get to know what partitions are available, and how to use them properly, we highly encourage you to read the cluster queues wiki page.
"},{"location":"hpc-tutorial/episode-3/","title":"First Steps: Episode 3","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm? In this episode we will discuss how we can parallelize steps in a pipeline that are not dependent on each other. In the last episode we saw a case (the variant calling) that could have been potentially parallelized.
We will take care of that today. Please note that we are not going to use the sbatch
command we learned earlier. Thus, this tutorial will run on the same node where you execute the script. We will introduce you to Snakemake, a tool with which we can model dependencies and run things in parallel. In the next tutorial we will learn how to submit the jobs with sbatch
and Snakemake combined.
For those who know make
already, Snakemake will be familiar. You can think of Snakemake as a bunch of dedicated bash scripts that you can make dependent on each other. Snakemake will start the next script when a previous one finishes, and potentially it will run things in parallel if the dependencies allow.
Snakemake can get confusing, especially if the project gets big. This tutorial will only cover the very basics of this powerful tool. For more, we highly recommend digging into the Snakemake documentation:
Every Snakemake run requires a Snakefile
file. Create a new folder inside your tutorial folder and copy the skeleton:
(first-steps) $ mkdir -p /data/cephfs-1/home/users/${USER}/work/tutorial/episode3\n(first-steps) $ pushd /data/cephfs-1/home/users/${USER}/work/tutorial/episode3\n(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/Snakefile .\n(first-steps) $ chmod u+w Snakefile\n
Your Snakefile
should look as follows:
rule all:\n input:\n 'snps/test.vcf',\n 'structural_variants/test.vcf'\n\nrule alignment:\n input:\n '/data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz',\n '/data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz',\n output:\n bam='alignment/test.bam',\n bai='alignment/test.bam.bai',\n shell:\n r\"\"\"\n export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp\n mkdir -p ${{TMPDIR}}\n\n BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n ${{BWAREF}} \\\n {input} \\\n | samtools view -b \\\n | samtools sort -O BAM -T ${{TMPDIR}} -o {output.bam}\n\n samtools index {output.bam}\n \"\"\"\n\nrule structural_variants:\n input:\n 'alignment/test.bam'\n output:\n 'structural_variants/test.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n delly call -o {output} -g ${{REF}} {input}\n \"\"\"\n\nrule snps:\n input:\n 'alignment/test.bam'\n output:\n 'snps/test.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n gatk HaplotypeCaller \\\n -R ${{REF}} \\\n -I {input} \\\n -ploidy 2 \\\n -O {output}\n \"\"\"\n
Let me explain. The content resembles the same steps we took in the previous tutorials. Although every step has its own rule (alignment, snp calling, structural variant calling), we could instead have written everything in one rule. It is up to you to design your rules! Note that the rule names are arbitrary and not mentioned anywhere else in the file.
But there is one primary rule: the rule all
. This is the kickoff rule that makes everything run.
As you might have noticed, every rule has three main parameters: input
, output
and shell
. input
defines the files that are going into the rule, output
those that are produced when executing the rule, and shell
is the bash script that processes input
to produce output
.
Rule all
does not have any output
or shell
, it uses input
to start the chain of rules. Note that the input files of this rule are the output files of rule snps
and structural_variants
. The input of those rules is the output of rule alignment
. This is how Snakemake processes the rules: It looks for rule all
(or a rule that just has input
files) and figures out how it can create the required input files with other rules by looking at their output
files (the input
files of one rule must be the output
files of another rule). In our case it traces the workflow back to rule snps
and structural_variants
as they have the matching output files. They depend in return on the alignment, so the alignment
rule must be executed, and this is the first thing that will be done by Snakemake.
There are also some peculiarities about Snakemake:
input
or output
as is done in rule alignment
with the output files.
input
and output
files in the script by writing {input}
or {output}
.
{output.bam}
${{VAR}}
instead of ${VAR}
but not Snakemake internal variables like {input}
or {output}
structural_variants
we cheat a bit because delly does not produce output files if it can't find variants.
touching
(i.e., creating) the required output file. Snakemake has a function for doing so (call touch()
on the filename).
But Snakemake can do more. It is able to parse the paths of the output files and set wildcards if you want. For this your input (and output) file names have to follow a parsable scheme. In our case they do! Our FASTQ files, our only initial input files, start with test
. The output of the alignment as well as the variant calling is also prefixed test
. We now can modify the Snakemake file accordingly, by exchanging every occurrence of test
in each input
or output
field with {id}
(note that you could also give a different name for your variable). Only the input of rule all should not be touched, otherwise Snakemake would not know which value this variable should have. Your Snakefile
should look now like this:
rule all:\n input:\n 'snps/test.vcf',\n 'structural_variants/test.vcf'\n\nrule alignment:\n input:\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R1.fq.gz',\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R2.fq.gz',\n output:\n bam='alignment/{id}.bam',\n bai='alignment/{id}.bam.bai',\n shell:\n r\"\"\"\n export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp\n mkdir -p ${{TMPDIR}}\n\n BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n ${{BWAREF}} \\\n {input} \\\n | samtools view -b \\\n | samtools sort -O BAM -T ${{TMPDIR}} -o {output.bam}\n\n samtools index {output.bam}\n \"\"\"\n\nrule structural_variants:\n input:\n 'alignment/{id}.bam'\n output:\n 'structural_variants/{id}.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n delly call -o {output} -g ${{REF}} {input}\n \"\"\"\n\nrule snps:\n input:\n 'alignment/{id}.bam'\n output:\n 'snps/{id}.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n gatk HaplotypeCaller \\\n -R ${{REF}} \\\n -I {input} \\\n -ploidy 2 \\\n -O {output}\n \"\"\"\n
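How a requested file name determines the value of {id} can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, each wildcard is matched greedily by any non-empty text, and a wildcard name may appear only once per pattern. Snakemake's real matching has more features.

```python
import re


def match_wildcards(pattern, path):
    '''Match path against a pattern such as alignment/{id}.bam.

    Returns a dict of wildcard values, or None if the path does not fit.
    '''
    regex = ''
    # Splitting on the braces yields literal text and wildcard names in turn,
    # for example: 'alignment/', 'id', '.bam'.
    for i, part in enumerate(re.split('[{]([A-Za-z_]+)[}]', pattern)):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            regex += '(?P<' + part + '>.+)'
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


# Rule all asks for snps/test.vcf, so rule snps runs with id set to test.
print(match_wildcards('snps/{id}.vcf', 'snps/test.vcf'))
# A path that does not fit the pattern gives None.
print(match_wildcards('snps/{id}.vcf', 'alignment/test.bam'))
```

This also shows why rule all keeps the concrete file names: they are the starting point from which the wildcard values of all other rules are derived.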
Before we finally run this, we can make a dry run. Snakemake will show you what it would do:
(first-steps) $ snakemake -n\n
If everything looks green, you can run it for real. We provide it two cores to allow two single-threaded jobs to be run simultaneously:
(first-steps) $ snakemake -j 2\n
"},{"location":"hpc-tutorial/episode-4/","title":"First Steps: Episode 4","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm? In the last episodes we learned about distributing a job among the cluster nodes using sbatch
and how to automate and parallelize our pipeline with Snakemake. We are lucky that those two powerful commands can be combined. What is the result? You will have an automated pipeline with Snakemake that uses sbatch
to distribute jobs among the cluster nodes instead of running only on the same node.
The best thing is that we can reuse our Snakefile
as it is and just write a wrapper script to call Snakemake. We run the script and the magic will start.
First, create a new folder for this episode:
(first-steps) $ mkdir -p /data/cephfs-1/home/users/${USER}/work/tutorial/episode4/logs\n(first-steps) $ pushd /data/cephfs-1/home/users/${USER}/work/tutorial/episode4\n
And copy the wrapper script to this folder as well as the Snakefile (you can also reuse the one with the adjustments from the previous episode):
(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_snakejob.sh .\n(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/Snakefile .\n(first-steps) $ chmod u+w submit_snakejob.sh Snakefile\n
The Snakefile
is already known to you but let me explain the wrapper script submit_snakejob.sh
:
#!/bin/bash\n\n# Set a name for the job (-J or --job-name).\n#SBATCH --job-name=tutorial\n\n# Set the file to write the stdout and stderr to (if -e is not set; -o or --output).\n#SBATCH --output=logs/%x-%j.log\n\n# Set the number of cores (-c or --cpus-per-task).\n#SBATCH --cpus-per-task=2\n\n# Force allocation of the two cores on ONE node.\n#SBATCH --nodes=1\n\n# Set the total memory. Units can be given in T|G|M|K.\n#SBATCH --mem=1G\n\n# Optionally, set the partition to be used (-p or --partition).\n#SBATCH --partition=medium\n\n# Set the expected running time of your job (-t or --time).\n# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS\n#SBATCH --time=30:00\n\n\nexport TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp\nexport LOGDIR=logs/${SLURM_JOB_NAME}-${SLURM_JOB_ID}\nmkdir -p $LOGDIR\n\neval \"$($(which conda) shell.bash hook)\"\nconda activate first-steps\n\nset -x\n\nsnakemake --profile=cubi-v1 -j 2 -k -p --restart-times=2\n
At the beginning you see the #SBATCH
lines that introduce the parameters used when you provide this script to sbatch
as described in the second episode. Please make sure that the logs
folder exists before starting the run! We then set and export the TMPDIR
and LOGDIR
variables. Note that LOGDIR
has a subfolder named $SLURM_JOB_NAME-$SLURM_JOB_ID
that will be created for you. Snakemake will store its logfiles for this very Snakemake run in this folder. The next new thing is set -x
. This prints every command that is executed within the script; in a batch job, this output ends up in the job's log file. This is useful for debugging.
Finally, the Snakemake call takes place. With the --profile
option we define that Snakemake uses the Snakemake profile at /etc/xdg/snakemake/cubi-v1
. The profile will take care of creating appropriate calls to sbatch
and interpret the following settings from your Snakemake rule:
threads: the number of threads to execute the job on
memory, given with a unit suffix of k, M, G, or T. You can specify EITHER
resources.mem/resources.mem_mb: the memory to allocate for the whole job, OR
resources.mem_per_thread: the memory to allocate for each thread.
resources.time: the running time of the rule, in a syntax supported by Slurm, e.g. HH:MM:SS or D-HH:MM:SS
resources.partition: the partition to submit your job into (Slurm will pick a fitting partition for you by default)
resources.nodes: the number of nodes to schedule your job on (defaults to 1 and you will want to keep that value unless you want to use MPI)
The other options to snakemake have the following meaning:
-j 2: run at most two jobs at the same time
-k: keep going even if a rule execution fails
-p: print the executed shell commands
--restart-times=2: restart failing jobs up to two times
It is now time to update your Snakefile such that it actually specifies the resources mentioned above:
rule all:\n input:\n 'snps/test.vcf',\n 'structural_variants/test.vcf'\n\nrule alignment:\n input:\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R1.fq.gz',\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R2.fq.gz',\n output:\n bam='alignment/{id}.bam',\n bai='alignment/{id}.bam.bai',\n threads: 8\n resources:\n mem='8G',\n time='12:00:00',\n shell:\n r\"\"\"\n export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp\n mkdir -p ${{TMPDIR}}\n\n BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n ${{BWAREF}} \\\n {input} \\\n | samtools view -b \\\n | samtools sort -O BAM -T ${{TMPDIR}} -o {output.bam}\n\n samtools index {output.bam}\n \"\"\"\n\nrule structural_variants:\n input:\n 'alignment/{id}.bam'\n output:\n 'structural_variants/{id}.vcf'\n threads: 1\n resources:\n mem='4G',\n time='2-00:00:00',\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n delly call -o {output} -g ${{REF}} {input}\n \"\"\"\n\ndef snps_mem(wildcards, attempt):\n mem = 2 * attempt\n return '%dG' % mem\n\nrule snps:\n input:\n 'alignment/{id}.bam'\n output:\n 'snps/{id}.vcf'\n threads: 1\n resources:\n mem=snps_mem,\n time='04:00:00',\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n gatk HaplotypeCaller \\\n -R ${{REF}} \\\n -I {input} \\\n -ploidy 2 \\\n -O {output}\n \"\"\"\n
We thus configure the resource consumption of the rules as follows:
alignment with 8 threads and up to 8 GB of memory in total, with a running time of up to 12 hours,
structural_variants with one thread and up to 4 GB of memory, with a running time of up to 2 days,
snps with one thread and a running time of up to four hours. Instead of passing a static amount of memory, we pass a resource callable. The attempt parameter will be passed a value of 1 on the initial invocation. If variant calling with the GATK HaplotypeCaller fails, then Snakemake will retry the job and attempt will have an incremented value on each invocation (2 on the first retry and so on). Thus, with --restart-times=2, we try to do small variant calling with 2, 4, and 6 GB.
Finally, run the script:
(first-steps) $ sbatch submit_snakejob.sh\n
If you watch squeue --me
now, you will see that the jobs are distributed to the system:
(first-steps) $ squeue --me\n
Please refer to the Snakemake documentation for more details on using Snakemake, in particular on how to use the cluster configuration to specify resource requirements on a per-rule basis.
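The retry behaviour of the snps rule can be checked outside of Snakemake. The following is a minimal sketch in plain Python. It reuses the snps_mem callable from the Snakefile and assumes that --restart-times=2 leads to at most three attempts (the initial run plus two restarts). The helper memory_schedule and its restart_times parameter are hypothetical names introduced here only for illustration.

```python
def snps_mem(wildcards, attempt):
    # Same callable as in the Snakefile: 2 GB per attempt.
    mem = 2 * attempt
    return '%dG' % mem


def memory_schedule(restart_times):
    # Hypothetical helper: attempt starts at 1 and grows by one per retry,
    # so a job is run at most restart_times + 1 times.
    return [snps_mem(None, attempt) for attempt in range(1, restart_times + 2)]


if __name__ == '__main__':
    # Memory requested on the initial run and on each restart.
    print(memory_schedule(2))
```

Under these assumptions, raising the value to --restart-times=3 would extend the schedule by one more step of 8 GB.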
"},{"location":"misc/external-resources/","title":"External Resources","text":""},{"location":"misc/external-resources/#basic-linux","title":"Basic Linux","text":"The BIH HPC uses CentOS Linux. A basic understanding of Linux is required. Even better, you should already have intermediate to advanced Linux/Unix skills.
BIH HPC IT cannot provide you with basic Unix training. Please ask your home organization (e.g., Charite or MDC) to provide you with basic Linux training.
That said, here are some resources that we find useful:
"},{"location":"misc/external-resources/#internet-tutorials","title":"Internet Tutorials","text":"There is a large number of Linux tutorials online including:
GOBLET has a number of Bioinformatics-focused tutorials. This includes
Some software is provided by HPC Administration based on the criteria that it is:
Currently, this includes:
On the GPU node, this also includes a recent NVIDIA CUDA version.
To see which software is available, use module avail
on a compute node (this will not work on login nodes):
$ module avail\n--------------------- /opt/local/modules ---------------------\ncmake/3.11.0-0 llvm/6.0.0-0\ngcc/7.2.0-0 openmpi/4.0.3-0\n
To load software, use module load
. This will adjust the environment variables accordingly, in particular update PATH
such that the executables are available.
$ which gcc\n/bin/gcc\n$ module load gcc/7.2.0-0\n$ which gcc\n/opt/local/gcc-7.2.0-0/bin/gcc\n
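Why the same command resolves to a different file after loading a module can be illustrated with a small, self-contained sketch. It assumes that loading a module places its bin directory in front of the existing PATH entries, which matches the output shown above but is not taken from the module system's implementation. The names make_tool, resolve, and mytool are hypothetical.

```python
import os
import shutil
import stat
import tempfile


def make_tool(directory, name):
    # Create an empty file and mark it executable so that it can be found.
    path = os.path.join(directory, name)
    open(path, 'w').close()
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(name, directories):
    # Search the directories in order; the first match wins, as in a shell.
    return shutil.which(name, path=os.pathsep.join(directories))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as system_bin, \
            tempfile.TemporaryDirectory() as module_bin:
        make_tool(system_bin, 'mytool')
        make_tool(module_bin, 'mytool')
        # Before loading: only the system directory is searched.
        print(resolve('mytool', [system_bin]))
        # After loading: the module directory comes first and takes precedence.
        print(resolve('mytool', [module_bin, system_bin]))
```

The system-wide executable is not removed; it is only shadowed for as long as the module directory precedes it in PATH.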
Problems with executing module
?
See the corresponding FAQ entry in the case that you get a -bash: module: command not found
when calling module
.
The BIH Cluster is a valuable resource. It has been used to support the publications listed below.
Hollunder, B., Ostrem, J.L., Sahin, I.A., Rajamani, N., Oxenford, S., Butenko, K., Neudorfer, C., Reinhardt, P., Zvarova, P., Polosan, M., Akram, H., Vissani, M., Zhang, C., Sun, B., Navratil, P., Reich, M.M., Volkmann, J., Yeh, F.-C., Baldermann, J.C., Dembek, T.A., Visser-Vandewalle, V., Alho, E.J.L., Franceschini, P.R., Nanda, P., Finke, C., K\u00fchn, A.A., Dougherty, D.D., Richardson, R.M., Bergman, H., DeLong, M.R., Mazzoni, A., Romito, L.M., Tyagi, H., Zrinzo, L., Joyce, E.M., Chabardes, S., Starr, P.A., Li, N., Horn, A., 2024. Mapping dysfunctional circuits in the frontal cortex using deep brain stimulation. Nat. Neurosci. 1\u201314. doi: 10.1038/s41593-024-01570-1
"},{"location":"misc/publication-list/#2022","title":"2022","text":"Kossen T, Hirzel MA, Madai VI, Boenisch F, Hennemuth A, Hildebrand K, Pokutta S, Sharma K, Hilbert A, Sobesky J, Galinovic I, Khalil AA, Fiebach JB and Frey D. Toward Sharing Brain Images: Differentially Private TOF-MRA Images With Segmentation Labels Using Generative Adversarial Networks. Frontiers in Artificial Intelligence. 5 (2022). issn: 2624-8212. doi: 10.3389/frai.2022.813842
"},{"location":"misc/publication-list/#2021","title":"2021","text":"Li, N., Hollunder, B., Baldermann, J. C., Kibleur, A., Treu, S., Akram, H., Al-Fatly, B., Strange, B. A., Barcia, J. A., Zrinzo, L., Joyce, E. M., Chabardes, S., Visser-Vandewalle, V., Polosan, M., Kuhn, J., K\u00fchn, A. A., & Horn, A. (2021). A Unified Functional Network Target for Deep Brain Stimulation in Obsessive-Compulsive Disorder. Biological Psychiatry. doi: 10.1016/j.biopsych.2021.04.006
Bressem KK, Vahldiek JL, Adams L, Niehues SM, Haibel H, Rodriguez VR, Torgutalp M, Protopopov M, Proft F, Rademacher J, Sieper J, Rudwaleit M, Hamm B, Makowski MR, Hermann KG, Poddubnyy D. Deep learning for detection of radiographic sacroiliitis: achieving expert-level performance. Arthritis Res Ther. 2021 Apr 8;23(1):106. doi: 10.1186/s13075-021-02484-0
Kossen T, Subramaniam P, Madai VI, Hennemuth A, Hildebrand K, Hilbert A, Sobesky J, Livne M, Galinovic I, Khalil AA, Fiebach JB, Frey D. Synthesizing anonymized and labeled TOF-MRA patches for brain vessel segmentation using generative adversarial networks. Computers in Biology and Medicine. 2021 Apr 131,104254. doi: 10.1016/j.compbiomed.2021.104254
Paraskevopoulou S., K\u00e4fer S., Zirkel F., Donath A., Petersen M., Liu S., Zhou X., Drosten C., Misof B., Junglen S. (2021). \"Viromics of extant insect orders unveil the evolution of the flavi-like superfamily.\" Virus Evolution 2021 Mar 30. doi: 10.1093/ve/veab030
Thomas Krannich, W Timothy J White, Sebastian Niehus, Guillaume Holley, Bjarni V Halld\u00f3rsson, Birte Kehr, Population-scale detection of non-reference sequence variants using colored de Bruijn graphs, Bioinformatics, 2021, btab749, doi: 10.1093/bioinformatics/btab749
Julia Markowski, Rieke Kempfer, Alexander Kukalev, Ibai Irastorza-Azcarate, Gesa Loof, Birte Kehr, Ana Pombo, Sven Rahmann, Roland F Schwarz, GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data, Bioinformatics, Volume 37, Issue 19, 1 October 2021, Pages 3128\u20133135. doi: 10.1093/bioinformatics/btab238
"},{"location":"misc/publication-list/#2020","title":"2020","text":"Kr\u00fctzfeldt LM, Schubach M, Kircher M. The impact of different negative training data on regulatory sequence predictions. PLoS One. 2020 Dec 1;15(12):e0237412. doi: 10.1371/journal.pone.0237412.
Klotz-Noack K, Klinger B, Rivera M, Bublitz N, Uhlitz F, Riemer P, L\u00fcthen M, Sell T, Kasack K, Gastl B, Ispasanie SSS, Simon T, Janssen N, Schwab M, Zuber J, Horst D, Bl\u00fcthgen N, Sch\u00e4fer R, Morkel M, Sers C. SFPQ Depletion Is Synthetically Lethal with BRAFV600E in Colorectal Cancer Cells. Cell Rep. 2020 Sep 22;32(12):108184. doi: 10.1016/j.celrep.2020.108184.
Kleinert, P., Martin, B., & Kircher, M. (2020). \"HemoMIPs\u2014Automated analysis and result reporting pipeline for targeted sequencing data.\" PLOS Computational Biology, 16(6), e1007956. doi: 10.1371/journal.pcbi.1007956
Ehmke, N.; Cusmano-Ozog, K.; Koenig, R.; Holtgrewe, M.; Nur, B.; Mihci, E.; Babcock, H.; Gonzaga-Jauregui, C.; Overton, J. D.; Xiao, J.; et al. Biallelic Variants in KYNU Cause a Multisystemic Syndrome with Hand Hyperphalangism. Bone 2020, 115219. doi: 10.1016/j.bone.2019.115219.
Niehus, S.; J\u00f3nsson, H.; Sch\u00f6nberger, J.; Bj\u00f6rnsson, E.; Beyter, D.; Eggertsson, H.P.; Sulem, P.; Stef\u00e1nsson, K.; Halld\u00f3rsson, B.V.; Kehr, B. PopDel identifies medium-size deletions jointly in tens of thousands of genomes. bioRxiv 2020, 10.1101/740225 doi: 10.1101/740225
Gordon, M. G., Inoue, F., Martin, B., Schubach, M., Agarwal, V., Whalen, S., ... & Kreimer, A. (2020). \"lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements.\" Nature Protocols, 15(8), 2387-2412. doi: 10.1038/s41596-020-0333-5
Paraskevopoulou S., Pirzer F., Goldmann N., Schmid J., Corman V.M., Gottula L.T., Schroeder S., Rasche A., Muth D., Drexler J.F., Heni A.C., Eibner G.J., Page R.A., Jones T.C., M\u00fcller M.A., Sommer S., Glebe D., and Drosten C. (2020). \"Mammalian deltavirus without hepadnavirus coinfection in the neotropical rodent Proechimys semispinosus.\" Proceedings of the National Academy of Sciences 2020 Jul 28;117(30):17977-17983. doi: 10.1073/pnas.2006750117.
"},{"location":"misc/publication-list/#2019","title":"2019","text":"Kircher, M., Xiong, C., Martin, B. et al. \"Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution.\" Nat Commun 10, 3583 (2019). doi: 10.1038/s41467-019-11526-w
Stefanovski L, Triebkorn P, Spiegler A, Diaz-Cortes M-A, Solodkin A, Jirsa V, McIntosh RA and Ritter P (2019). \"Linking Molecular Pathways and Large-Scale Computational Modeling to Assess Candidate Disease Mechanisms and Pharmacodynamics in Alzheimer's Disease.\" Front. Comput. Neurosci.. 13:54. doi: 10.3389/fncom.2019.00054
Boeddrich A., Babila J.T., Wiglenda T., Diez L., Jacob M., Nietfeld W., Huska M.R., Haenig C., Groenke N., Buntru A., Blanc E., Meier J.C., Vannoni E., Erck C., Friedrich B., Martens H., Neuendorf N., Schnoegl S., Wolfer DP., Loos M., Beule D., Andrade-Navarro M.A., Wanker E.E. (2019). \"The Anti-amyloid Compound DO1 Decreases Plaque Pathology and Neuroinflammation-Related Expression Changes in 5xFAD Transgenic Mice.\" Cell Chem Biol. 2019 Jan 17;26(1):109-120.e7. doi: 10.1016/j.chembiol.2018.10.013.
Fountain, M.D., Oleson, D.S., Rech, M.E., Segebrecht, L., Hunter, J.V., McCarthy, J.M., Lupo, P.J., Holtgrewe, M., Mora, R., Rosenfeld, J.A., Isidor, B., Le Caignec, C., Saenz, M.S., Pedersen, R.C., Morgen, T.M., Pfotenhauer, J.P., Xia, F., Bi, W., Kang, S.-H.L., Patel, A., Krantz, I.D., Raible, S.E., Smith, W.E., Cristian, I., Tori, E., Juusola, J., Millan, F., Wentzensen, I.M., Person, R.E., K\u00fcry, S., B\u00e9zieau, S., Uguen, K., F\u00e9rec, C., Munnich, A., van Haelst, M., Lichtenbelt, K.D., van Gassen, K., Hagelstrom, T., Chawla, A., Perry, D.L., Taft, R.J., Jones, M., Masser-Frye, D., Dyment, D., Venkateswaran, S., Li, C., Escobar, L.F., Horn, D., Spillmann, R.C., Pe\u00f1a, L., Wierzba, J., Strom, T.M., Parent, I., Kaiser, F.J., Ehmke, N., Schaaf, C.P. (2019). \"Pathogenic variants in USP7 cause a neurodevelopmental disorder with speech delays, altered behavior, and neurologic anomalies.\" Genet. Med. 2019 Jan 25. doi: 10.1038/s41436-019-0433-1
Holtgrewe,M., Messerschmidt,C., Nieminen,M. and Beule,D. (2019) DigestiFlow: from BCL to FASTQ with ease. Bioinformatics, 10.1093/bioinformatics/btz850.
K\u00e4fer S., Paraskevopoulou S., Zirkel F., Wieseke N., Donath A., Petersen M., Jones T.C., Liu S., Zhou X., Middendorf M., Junglen S., Misof B., Drosten C. (2019). \"Re-assessing the diversity of negative strand RNA viruses in insects.\" PLOS Pathogens 2019 Dec 12. doi: 10.1371/journal.ppat.1008224
K\u00fchnisch,J., Herbst,C., Al\u2010Wakeel\u2010Marquard,N., Dartsch,J., Holtgrewe,M., Baban,A., Mearini,G., Hardt,J., Kolokotronis,K., Gerull,B., et al. (2019) Targeted panel sequencing in pediatric primary cardiomyopathy supports a critical role of TNNI3. Clin Genet, 96, 549\u2013559. https://doi.org/10.1111/cge.13645
Marklewitz M., Dutari L.C., Paraskevopoulou S., Page R.A., Loaiza J.R., Junglen S. (2019). \"Diverse novel phleboviruses in sandflies from the Panama Canal area, Central Panama.\" Journal of General Virology 2019 May 3. doi: 10.1099/jgv.0.001260
Quade,A., Thiel,A., Kurth,I., Holtgrewe,M., Elbracht,M., Beule,D., Eggermann,K., Scholl,U.I. and H\u00e4usler,M. (2019) Paroxysmal tonic upgaze: A heterogeneous clinical condition responsive to carbonic anhydrase inhibition. European Journal of Paediatric Neurology, 10.1016/j.ejpn.2019.11.002.
"},{"location":"misc/publication-list/#2018","title":"2018","text":"Blanc, E., Holtgrewe, M., Dhamodaran, A., Messerschmidt, C., Willimsky, G., Blankenstein, T., Beule, D. (2018). \"Identification and Ranking of Recurrent Neo-Epitopes in Cancer\". bioRxiv. 2018/389437, 2018. doi: 10.1101/389437
Brandt, R., Uhlitz, F., Riemer, P., Giesecke, C., Schulze, S., El-Shimy, I.A., Fauler, B., Mielke, T., Mages, N., Herrmann, B.G., Sers, C., Bl\u00fcthgen, N., Morkel, M. (2018). \"Cell type-dependent differential activation of ERK by oncogenic KRAS or BRAF in the mouse intestinal epithelium\". bioRxiv. 2018/340844. doi: 10.1101/340844.
Holtgrewe, M., Knaus, A., Hildebrand, G., Pantel, J.-T., Rodriguesz de los Santos, M., Neveling, K., Goldmann, J., Schubach, M., J\u00e4ger, M., Couterier, M., Mundlos, S., Beule, D., Sperling, K., Krawitz, P. (2018). \"Multisite de novo mutations in human offspring after paternal exposure to ionizing radiation\", Nature Scientific Reports. 2018 Oct 2;8(1):14611. doi: 10.1038/s41598-018-33066-x.
Kircher M., Xiong C., Martin B, Schubach M, Inoue F, Bell R.JA., Costello J.F., Shendure J., Ahituv N. (2018). \"Saturation mutagenesis of disease-associated regulatory elements.\" bioRxiv (2018): 505362. doi: 10.1101/505362
PCAWG Transcriptome Core Group, Calabrese, C., Davidson, N.R., Fonseca, N.A., He, Y., Kahles, A., Lehmann, K.-V., Liu, F., Shiraishi, Y., Soulette, C.M., Urban, L., Demircio\u011flu, D., Greger, L., Li, S., Liu, D., Perry, M.D., Xiang, L., Zhang, F., Zhang, J., Bailey, P., Erkek, S., Hoadley, K.A., Hou, Y., Kilpinen, H., Korbel, J.O., Marin, M.G., Markowski, J., Nandi, T., Pan-Hammarstr\u00f6m, Q., Pedamallu, C.S., Siebert, R., Stark, S.G., Su, H., Tan, P., Waszak, S.M., Yung, C., Zhu, S., PCAWG Transcriptome Working Group, Awadalla, P., Creighton, C.J., Meyerson, M., Ouellette, B.F.F., Wu, K., Yang, H., ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Network, Brazma, A., Brooks, A.N., G\u00f6ke, J., R\u00e4tsch, G., Schwarz, R.F., Stegle, O., Zhang, Z. (2018). \"Genomic basis for RNA alterations revealed by whole-genome analyses of 27 cancer types\". bioRxiv. 2018/183889. doi: 10.1101/183889
Guneykaya D., Ivanov A., Hernandez D.P., Haage V., Wojtas B., Meyer N., Maricos M., Jordan P., Buonfiglioli A., Gielniewski B., Ochocka N., C\u00f6mert, C., Friedrich, C., Artiles, L. S., Kaminska, B., Mertins, P., Beule, D., Kettenmann, H. (2018). \"Transcriptional and translational differences of microglia from male and female brains\", Cell reports. 2018 Sep 4;24(10):2773-83. doi: 10.1016/j.celrep.2018.08.001.
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. (2018). \"CADD: predicting the deleteriousness of variants throughout the human genome\", Nucleic Acids Res. 2018 Oct 29. doi: 10.1093/nar/gky1016.
Salatzki J., Foryst-Ludwig A., Bentele K., Blumrich A., Smeir E., Ban Z., Brix S., Grune J., Beyhoff N., Klopfleisch R., Dunst S., Surma, M.A., Klose, C., Rothe, M., Heinzel, F.R., Krannich, A., Kershaw, E.E., Beule, D., Schulze, P.C., Marx, N., Kintscher, U. (2018). \"Adipose tissue ATGL modifies the cardiac lipidome in pressure-overload-induced left ventricular failure\", PLoS genetics. 2018 Jan 10;14(1):e1007171. doi: 10.1371/journal.pgen.100717.
Schubach M., Re M., Robinson P.N., Valentini G. (2017) \"Imbalance-aware machine learning for predicting rare and common disease-associated non-coding variants\", Scientific reports 7:1, 2959. doi: 10.1038/s41598-017-03011-5.
Schubert M., Klinge, B., Kl\u00fcnemann M., Sieber A., Uhlitz F., Sauer S., Garnett M., Bl\u00fcthgen N., Saez-Rodriguez J. (2018). \"Perturbation-response genes reveal signaling footprints in cancer gene expression\". Nature Communications. 9: 20, 2018. doi: 10.1038/s41467-017-02391-6
"},{"location":"misc/publication-list/#2017","title":"2017","text":"Euskirchen, P., Bielle, F., Labreche, K., Kloosterman, W.P., Rosenberg, S., Daniau, M., Schmitt, C., Masliah-Planchon, J., Bourdeaut, F., Dehais, C., et al. (2017). Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol 1\u201313. doi: 10.1007/s00401-017-1743-5
Euskirchen, P., Radke, J., Schmidt, M.S., Heuling, E.S., Kadikowski, E., Maricos, M., Knab, F., Grittner, U., Zerbe, N., Czabanka, M., et al. (2017). Cellular heterogeneity contributes to subtype-specific expression of ZEB1 in human glioblastoma. PLOS ONE 12, e0185376. doi: 10.1371/journal.pone.0185376
Mattei D., Ivanov A., Ferrai C., Jordan P., Guneykaya D., Buonfiglioli A., Schaafsma W., Przanowski P., Deuther-Conrad W., Brust P., Hesse S., Patt, M., Sabri, O., Ross, T.L., Eggen, B.J.L., Boddeke E.W.G.M., Kaminska, B., Beule, D., Pombo, A., Kettenmann, H., Wolf, S.A. (2017). \"Maternal immune activation results in complex microglial transcriptome signature in the adult offspring that is reversed by minocycline treatment.\" Translational psychiatry. 2017 May;7(5):e1120. doi: 10.1038/tp.2017.80.
Mamlouk, S., Childs, L. H., Aust, D., Heim, D., Melching, F., Oliveira, C., Wolf, T., Durek, P., Schumacher, D., Bl\u00e4ker, H., von Winterfeld, M., Gastl, B., M\u00f6hr, K., Menne, A., Zeugner, S., Redmer, T., Lenze, D., Tierling, S., M\u00f6bs, M., Weichert, W., Folprecht, G., Blanc, E., Beule, D., Sch\u00e4fer, R., Morkel, M., Klauschen, F., Leser, U. and Sers, C. (2017). \"DNA copy number changes define spatial patterns of heterogeneity in colorectal cancer\", Nature Communications. 2017; 8, p. 14093. doi: 10.1038/ncomms14093.
Messerschmidt, C., Holtgrewe, M. and Beule, D. (2017). \"HLA-MA: simple yet powerful matching of samples using HLA typing results\". Bioinformatics. 28, pp. 2592\u20132599. doi: 10.1093/bioinformatics/btx132.
Kammertoens, T., Friese, C., Arina, A., Idel, C., Briesemeister, D., Rothe, M., Ivanov, A., Szymborska, A., Patone, G., Kunz, S., Sommermeyer, D., Engels, B., Leisegang, M., Textor, A., Fehling, H. J., Fruttiger, M., Lohoff, M., Herrmann, A., Yu, H., Weichselbaum, R., Uckert, W., H\u00fcbner, N., Gerhardt, H., Beule, D., Schreiber, H. and Blankenstein, T. (2017). \"Tumour ischaemia by interferon-\u03b3 resembles physiological blood vessel regression\". Nature. 545(7652), pp. 98\u2013102. doi: 10.1038/nature22311.
Schulze Heuling, E., Knab, F., Radke, J., Eskilsson, E., Martinez-Ledesma, E., Koch, A., Czabanka, M., Dieterich, C., Verhaak, R.G., Harms, C., et al. (2017). Prognostic Relevance of Tumor Purity and Interaction with MGMT Methylation in Glioblastoma. Mol. Cancer Res. 15, 532\u2013540. doi: 10.1158/1541-7786.MCR-16-0322
Yaakov, G., Lerner, D., Bentele, K., Steinberger, J., Barkai, N., Bigger, J., Maisonneuve, E., Gerdes, K., Lewis, K., Dhar, N., McKinney, J. D., Gefen, O., Balaban, N. Q., Jayaraman, R., Balaban, N. Q., Merrin, J., Chait, R., Kowalik, L., Leibler, S., Balaban, N. Q., Allison, K. R., Brynildsen, M. P., Collins, J. J., Nathan, C., Lewis, K., Glickman, M. S., Sawyers, Knoechel, B., Welch, A. Z., Gibney, P. A., Botstein, D., Koshland, D. E., Levy, S. F., Ziv, N., Siegal, M. L., Stewart-Ornstein, J., Weissman, J. S., El-Samad, H., Gasch, A. P., Weinert, T., Hartwell, L., Weinert, T. A., Hartwell, L. H., Lisby, M., Rothstein, R., Mortensen, U. H., Lisby, M., Mortensen, U. H., Rothstein, R., Domkin, V., Thelander, L., Chabes, A., Hendry, J. A., Tan, G., Ou, J., Boone, C., Brown, G. W., Berry, D. B., Gasch, A. P., Lynch, M., Nishant, K. T., Serero, A., Jubin, C., Loeillet, S., Legoix-Ne, P., Nicolas, A. G., Huh, W. K., Janke, C., Lee, S. E., Blecher-Gonen, R., Martin, M., Cherry, J. M., McKenna, A., DePristo, M. A., Lawrence, M., Obenchain, V., Ye, K., Schulz, M. H., Long, Q., Apweiler, R., Ning, Z., Layer, R. M., Chiang, C., Quinlan, A. R., Hall, I. M., Faust, G. G., Hall, I. M., Boeva, V., Boeva, V., Li, H., Koren, A., Soifer, I. and Barkai, N. (2017). \"Coupling phenotypic persistence to DNA damage increases genetic diversity in severe stress\". Nature Ecology & Evolution. 1(1), pp. 497\u2013500. doi: 10.1038/s41559-016-0016.
Uhlitz, F., Sieber, A., Wyler, E., Fritsche-Guenther, R., Meisig, J., Landthaler, M., Klinger, B., Bl\u00fcthgen, N. (2017). \"An immediate-late gene expression module decodes ERK signal duration\". Molecular Systems Biology. 13: 928, 2017. doi: 10.15252/msb.20177554.
"},{"location":"misc/publication-list/#theses","title":"Theses","text":""},{"location":"misc/publication-list/#2019_1","title":"2019","text":"Schumann F. (2019). \"Establishing a pipeline for stable mutational signature detection and evaluation of variant filter effects\". Freie Universit\u00e4t Berlin. Bachelor Thesis, Bioinformatics.
"},{"location":"misc/publication-list/#2018_1","title":"2018","text":"Borgsm\u00fcller N. (2018). \"Optimization of data processing in GC-MS metabolomics\", Technische Universit\u00e4t Berlin. Master Thesis, Biotechnology.
Kuchenbecker, S.-L. (2018). \"Analysis of Antigen Receptor Repertoires Captured by High Throughput Sequencing\". Freie Universit\u00e4t Berlin. PhD Thesis, Dr. rer. nat. URN:NBN: urn:nbn:de:kobv:188-refubium-22171-8
Schubach M. (2018). \"Learning the Non-Coding Genome\", Freie Universit\u00e4t Berlin. PhD Thesis, Dr. rer. nat. URN:NBN: urn:nbn:de:kobv:188-refubium-23332-7
"},{"location":"misc/publication-list/#posters","title":"Posters","text":""},{"location":"misc/publication-list/#2018_2","title":"2018","text":"Roskosch, S., Hald\u00f3rsson B., Kehr, B. (2018). \"PopDel: Population-Scale Detection of Genomic Deletions\" ECCB 2018. Poster.
White T., Kehr B. (2018). \"Comprehensive extraction of structural variations from long-read DNA sequences\" WABI 2018. Poster.
"},{"location":"misc/publication-list/#2017_1","title":"2017","text":"Schubach M., Re R., Robinson P.N., Valentini G. (2017). \"Variant relevance prediction in extremely imbalanced training sets\" ISMB/ECCB 2017. Poster.
White T., Kehr B. (2017). \"Improving long-read mapping with simple lossy sequence transforms\" ISMB/ECCB 2017. Poster.
"},{"location":"ondemand/interactive/","title":"OnDemand: Interactive Sessions","text":"Interactive sessions allow you to start and manage selected apps. Depending on the app they run as servers or GUIs. Selecting My Interactive Sessions
in the top menu will direct you to the overview of currently running sessions. The left-hand panel provides a shortcut to start a new session of one of the provided apps.
Each running interactive session is listed. Each card corresponds to one session. The title of each card provides the name, allocated resources and the current status. Furthermore, detailed information and links are available:
Don't hit reload in your apps
Please note that the portal will use the authentication mechanisms of the apps to ensure that nobody except for you can connect to the session. This means that hitting the browser's \"reload\" button in your app will most likely not work.
Just go back to the interactive session list and click on the \"connect\" button.
"},{"location":"ondemand/interactive/#session-directories","title":"Session Directories","text":"The portal software will create a folder ondemand
in your home directory. Inside, it will create session directories for each started interactive job. For technical reasons, these folders have very long names, for example:
$HOME/ondemand/data/sys/dashboard/batch_connect/sys/ood-bih-rstudio-server/output/e40e03b3-11ca-458a-855b-98e6f148c99a/
This follows the pattern:
$HOME/ondemand/data/sys/dashboard/batch_connect/sys/${application name}/output/${job UUID}
The job identifier used is not the Slurm job ID but an identifier internal to OnDemand. Inside this directory you will find log files and a number of scripts that are used to start your job.
If you need to debug any interactive job, start here. Also, the helpdesk will need the path to this folder to help you with interactive jobs.
You can find the name of the latest output folder with the following command:
$ ls -lhtr $HOME/ondemand/data/sys/dashboard/batch_connect/sys/${application name}/output | tail -n 1\n
For example, for RStudio Server:
$ ls -lhtr $HOME/ondemand/data/sys/dashboard/batch_connect/sys/ood-bih-rstudio-server/output | tail -n 1\n
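The same lookup can be scripted, for example to print the path before contacting the helpdesk. This is a minimal sketch that assumes the directory layout from the RStudio Server example above; the function name latest_session_dir is hypothetical.

```python
from pathlib import Path


def latest_session_dir(output_dir):
    # Return the most recently modified sub-directory, or None if there is none.
    candidates = [p for p in Path(output_dir).iterdir() if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == '__main__':
    # Assumed location, copied from the RStudio Server example.
    output_dir = (Path.home() / 'ondemand' / 'data' / 'sys' / 'dashboard'
                  / 'batch_connect' / 'sys' / 'ood-bih-rstudio-server' / 'output')
    if output_dir.is_dir():
        print(latest_session_dir(output_dir))
```

Like the ls call, this relies on modification times, so the result is the session directory that was written to most recently.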
Prevent Home From Filling Up
You should probably move ~/ondemand
to your work volume with the following:
$ mv ~/ondemand ~/work/ondemand\n$ ln -sr ~/work/ondemand ~/ondemand\n
Make sure to delete potential interactive sessions and to log out from the OnDemand Portal first. Otherwise, the ~/ondemand
folder is constantly recreated and the symlink will just be created within this folder as ~/ondemand/ondemand
and thus not be used as intended.
Also, clear out ~/work/ondemand/*
from time to time but take care that you don't remove the directory of any running job.
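Before clearing anything, it can help to list which session directories have not been touched for a while. The following sketch only reports candidates and deletes nothing. It assumes the session directory layout shown in the example path on this page, and the name stale_session_dirs as well as the 30-day threshold are hypothetical choices. Whether a listed directory belongs to a running job still has to be checked by hand.

```python
import time
from pathlib import Path

# Assumed layout below the ondemand folder: <app>/output/<session directory>.
SESSION_PATTERN = 'data/sys/dashboard/batch_connect/sys/*/output/*'


def stale_session_dirs(root, max_age_days, now=None):
    # Collect session directories whose modification time is older than the cutoff.
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 3600
    return sorted(p for p in Path(root).glob(SESSION_PATTERN)
                  if p.is_dir() and p.stat().st_mtime < cutoff)


if __name__ == '__main__':
    root = Path.home() / 'work' / 'ondemand'
    if root.is_dir():
        for path in stale_session_dirs(root, max_age_days=30):
            print(path)
```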
This description of starting an RStudio session is a showcase for starting other interactive apps as well.
To start the session, please go to Interactive Apps
in the top menu bar and select RStudio Server
or click RStudio Server
in the left-hand panel.
Allocate appropriate resources and click Launch
.
An info card for the RStudio Server will be added to My Interactive Sessions
, and during start, it will change its state from Queued
to Starting
to Running
. Depending on the app, resources allocated and current cluster usage, this will take a couple of seconds.
When in the final state (Running
), one can directly connect to the RStudio Server to get an interactive session by clicking Connect to RStudio Server
:
To use the OnDemand portal with a specific R installation including a stable set of custom packages, you can use a conda environment from the cluster as an R source.
For this you may first need to create this conda environment including your R version of choice and all necessary packages. Specific installations of, e.g., Python from conda can be used similarly in other interactive apps.
channels:\n - conda-forge\n - bioconda\n - defaults\ndependencies:\n - r-base\n - r-essentials\n - r-devtools\n - bioconductor-deseq2\n - r-tidyverse\n - r-rmarkdown\n - r-knitr\n - r-dt\n
Some packages (e.g. several single-cell RNA-seq analysis tools) are only available from GitHub and not on CRAN/Bioconductor. There are two ways to install such packages into a conda environment.
1) Install from inside R \\[easier option, but not pure conda\\]
* First set up the conda env, ideally including all dependencies for the desired package from GitHub (and do include r-devtools)
* Then within R run `devtools::install_github('owner/repo', dependencies=F, upgrade=F, lib='/path/to/conda/env-name/lib/R/library')`
    * If you don't have all dependencies already installed, you will have to omit dependencies=F and risk a mix of conda and native R installed packages (or redo the conda env).
    * install_github involves a build process and still needs a bit of memory, so this might crash on the default `srun --pty bash -i` shell
2) Build packages into a local conda channel \\[takes longer, but pure conda\\]
This approach is mostly taken from the answers given [here](https://stackoverflow.com/questions/52061664/install-r-package-from-github-using-conda). These steps must be taken _before_ building the final env used with RStudio
* Use `conda skeleton cran https://github.com/owner/repo [--git-tag vX.Y]` to generate build files
    * conda skeleton only works for repositories with a release/version tag. If the package you want to install does not have that, you either need to create a fork and add such a tag, or find a fork that already did that. Downloading the code directly from GitHub and building the package from that is also possible, but you will then need to manually set up the `meta.yaml` and `build.sh` files that conda skeleton would create.
    * If there is more than one release tag, do specify which one you want, as it may not automatically take the most recent one.
    * If any R packages from Bioconductor are dependencies, conda will not find them during the build process. You will need to change the respective entries in the `meta.yaml` file created by conda skeleton. For example, change `r-deseq2` to `bioconductor-deseq2`
* Build the package with `conda build --R= [--use-local] r-`
    * You need to specify the same R version used in the final conda env
    * If the GitHub package has additional dependencies from GitHub, build those first and then add `--use-local` so the build process can find them.
    * The build process definitely needs more memory than the default `srun --pty bash -i` shell. It also takes quite a bit of time (much longer than installing through devtools::install_github)
* Finally, add the packages (+versions) you built to the environment definition (i.e. yaml file) and create the (final) conda environment. Don't forget to tell conda to use locally built packages (either supply `--use-local` or add `- local` to the channel list in the yaml file)

Starting the RStudio session via the OnDemand portal works almost as described above (see Example 1). However, you do have to select `miniconda` as R source and provide the path to your miniconda installation and (separated by a colon) the name of the (newly created) conda environment you want to use.
Additional notes:
.libPaths()
entries and therefore a link to your previous conda installation. Creating a new project cleans .libPaths()
to only the env specified in setting up the Rstudio session.
Status / Stability
OnDemand Support is currently in beta phase on the BIH HPC. In case of any issues, please send an email to hpc-helpdesk@bih-charite.de.
To allow for better interactive work, BIH HPC administration has set up an Open OnDemand (OOD) portal web server.
You can find the OnDemand Portal for HPC 4 Research at:
OOD allows you to access cluster resources using a web-based graphical interface in addition to traditional SSH connections. You can then connect to jobs running graphical applications either to virtual desktops (such as Matlab) or to web apps (such as Jupyter and RStudio Server).
The following figure illustrates this.
The primary way to access the cluster continues to be SSH, which has several advantages. Because the cluster is based on Linux servers, \"native\" access offers more features and, through its lower complexity, higher stability. However, we all like to have the option of a graphical interface, at least from time to time.
The main features are:
The first prerequisite is to have a cluster account already (see Getting Access). Once you have done your first SSH connection to the cluster successfully you can start using the portal. For this you perform the following steps:
_c
) then please use the \"Charit\u00e9 - Universit\u00e4tmedizin Berlin\" button, for MDC Accounts please use the \"Max Delbr\u00fcck Center Berlin\" button. Clicked the Wrong Login Button?
If you clicked the wrong button then please clear your cookies to force a logout of the system.
"},{"location":"ondemand/overview/#prepare-ondemand-folder","title":"Prepare OnDemand Folder","text":"The ondemand
folder is automatically created in your home directory, and the OnDemand service searches for this folder in your home directory, i.e. it has to stay there. But as the quota in the home directory is very limited, you can easily hit the hard quota which might prevent you from working on the cluster.
To prevent this, move the ~/ondemand
folder to the ~/work
folder and create a symlink for the now dislocated ~/ondemand
folder:
hpc-login-1:~$ mv ~/ondemand ~/work/ondemand\nhpc-login-1:~$ ln -sr ~/work/ondemand ~/ondemand\n
Important
Make sure to delete potential interactive sessions and to log out of the OnDemand Portal first. Otherwise, the ~/ondemand
folder is constantly recreated and the symlink will be just created within this folder as ~/ondemand/ondemand
and thus not be used as intended.
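The relocation described above can be wrapped in a small guarded function. This is only a sketch: the function name `relocate_ondemand` is made up for illustration, and it assumes a POSIX shell plus the GNU `ln -sr` used in the commands above. It refuses to run if either folder is missing or if a copy already exists in `~/work`.

```shell
# Hypothetical helper (not part of the cluster setup): wraps the two
# documented commands with checks so that running it twice does no harm.
relocate_ondemand() {
    # Already relocated: ~/ondemand is a symlink, nothing to do.
    if [ -L ~/ondemand ]; then
        echo 'already relocated'
        return 0
    fi
    # Both the source folder and the ~/work folder have to exist.
    if [ ! -d ~/ondemand ] || [ ! -d ~/work ]; then
        echo 'missing ~/ondemand or ~/work' >&2
        return 1
    fi
    # Refuse to overwrite an earlier copy in ~/work.
    if [ -e ~/work/ondemand ]; then
        echo 'refusing to overwrite ~/work/ondemand' >&2
        return 1
    fi
    # Same two steps as in the documented commands.
    mv ~/ondemand ~/work/ondemand && ln -sr ~/work/ondemand ~/ondemand
}
```

As with the manual commands, log out of the portal and delete interactive sessions before calling `relocate_ondemand`.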
Problems with Open OnDemand?
First try to log out and log in again. Next, try to clear all cookies for the domain hpc-portal.cubi.bihealth.org
. Finally, try the Help > Restart Web Server
link to restart the per-user nginx (PUN) server.
You will then be redirected to the dashboard screen.
Here you have access to the following actions. We will not go into detail of all of them and expect them to be self-explanatory.
Important
Please note that when using the portal you are acting as your HPC user. Follow standard best practices. Consider carefully what you do as you would from the command line (e.g., don't use the portal to browse the web from the cluster).
Outdated
This document is only valid for the old, third-generation file system and will be removed soon. Quotas of our new CephFS storage are communicated via the HPC Access web portal.
Accessing the quota report by selecting Files
and then Quotas
in the top menu will provide you with a detailed list of all quotas for directories that you are assigned to.
There are two types of quotas: for (a) size of and (b) number of files in a directory.
Every row in the table corresponds to a directory that you have access to. This includes your home directory (fast/users
) as well as the group directory of your lab (fast/groups
) and possible projects (fast/projects
) (if any). Quotas are not directly imposed on these directories but on the home
, scratch
and work
subdirectories that each subdirectory of the aforementioned directories has (for a detailed explanation see Storage and Volumes).
The following list explains the columns of the table:
/
) and substituting the underscores with a slash in the (users|groups|projects)_
and _(home|scratch|work)
substring. The corresponding path for name fast/users_stolpeo_c_home
would be /fast/users/stolpeo_c/home
.
BIH HPC IT provides access to high-performance compute (HPC) cluster systems. A cluster system bundles a high number of nodes and in the case of HPC, the focus is on performance (in contrast to high-availability clusters).
"},{"location":"overview/architecture/#hpc-4-research","title":"HPC 4 Research","text":""},{"location":"overview/architecture/#cluster-hardware","title":"Cluster Hardware","text":"Users don't connect to nodes directly but rather create interactive or batch jobs to be executed by the cluster job scheduler Slurm.
As common with HPC systems, users cannot directly access the compute nodes but rather connect to so-called head nodes. The BIH HPC system provides the following head nodes:
login-1
and login-2
that accept SSH connections and are meant for low intensity, interactive work such as editing files, running screen/tmux sessions, and logging into the compute nodes. Users should run no computational tasks and no large-scale data transfer on these nodes.transfer-1
and transfer-2
also accept SSH connections. Users should run all large-scale data transfer through these nodes.
After registration and client configuration, users will typically connect to the HPC system through the login nodes:
local:~$ ssh -l jdoe_c hpc-login-1.cubi.bihealth.org\nhpc-login-1:~$\n
Subsequently, they might submit batch jobs to the cluster for execution through the Slurm scheduling system or open interactive sessions:
hpc-login-1:~$ sbatch job_script.sh\nhpc-login-1:~$ srun --pty bash -i\nmed0104:~$\n
"},{"location":"overview/for-the-impatient/","title":"Overview","text":""},{"location":"overview/for-the-impatient/#bih-hpc-4-research","title":"BIH HPC 4 Research","text":"BIH HPC 4 Research is located in the BIH data center in Buch and connected via the BIH research network. Connections can be made from Charite, MDC, and BIH networks. The cluster is open for users with either Charite or MDC accounts after getting access through the gatekeeper process. The system has been designed to be suitable for the processing of human genetics data from research contexts (and of course data without data privacy concerns such as public and mouse data).
"},{"location":"overview/for-the-impatient/#cluster-hardware-and-scheduling","title":"Cluster Hardware and Scheduling","text":"The cluster consists of the following major components:
hpc-login-1
and hpc-login-2
(for interactive sessions only),hpc-transfer-1
and hpc-transfer-2
,hpc-cpu-{1..228}
hpc-mem-{1..5}
,hpc-gpu-{1..7}
and 1 node with 10x A40 GPUs (!) hpc-gpu-8
,/fast
,This is shown by the following picture:
"},{"location":"overview/for-the-impatient/#differences-between-workstations-and-clusters","title":"Differences Between Workstations and Clusters","text":"The differences include:
srun
to go to a compute node.srun
to go to a compute node you might end up on a different host./tmp
./fast
directory is shared throughout the cluster which contains your home, group home, and project directories.root
or sudo
permissions on the cluster.sbatch
) over calling programs interactively.
NB: the following might sound a bit harsh but is written with everyone's best intentions in mind (we actually like you, our user!). This addresses a lot of suboptimal (yet not dangerous, of course) points we observed in our users.
IT IS
IT IS NOT
sudo
.Once logged into the cluster through the login nodes, the Slurm scheduler needs to be used to submit computing jobs. In Slurm nomenclature, cluster compute nodes are assigned to one or more partitions. Submitted jobs are assigned to nodes according to the partition's configuration.
"},{"location":"overview/job-scheduler/#partitions","title":"Partitions","text":"The BIH HPC has the partitions described below. The cluster focuses on life science applications and not \"classic HPC\" with numerical computations using MPI. Thus, all partitions except for mpi
only allow reserving resources on one node. This makes the cluster easier to use as users don't have to explicitly specify this limit when submitting their jobs.
standard
","text":"Jobs are submitted to the standard
partition by default. From there, the scheduler will route the jobs to their actual partition using the routing rule set described below. You can override this routing by explicitly assigning a partition (but this is discouraged).
gpu
queue.highmem
queue.debug
, short
, medium
, and long
depending on their configured maximal running time. The partitions are evaluated in the order given above and the first fitting partition will be used.
debug
","text":"This partition is for very short jobs that should be executed quickly, e.g., for tests. The job running time is limited to one hour and at most 128 cores can be used per user but the jobs are submitted with highest priority.
debug
--time 01:00:00
short
","text":"This partition is for jobs running only a few hours. The priority of short jobs is high and many cores can be used at once to reward users for splitting their jobs into smaller parts.
short
--time 04:00:00
medium
","text":"This partition is for jobs running for multiple days. Users can only allocate the equivalent of 4 nodes.
medium
--time 7-00:00:00
long
","text":"This partition is for long-running tasks. Only one node can be reserved for so long to discourage really long-running jobs and encourage users to split their jobs into smaller parts.
long
--time 14-00:00:00
gpu
","text":"Jobs requesting GPU resources are automatically assigned to the gpu
partition.
The GPU nodes are only part of the gpu
partition so they are not blocked by normal compute jobs. Maximum run time is relatively high (14 days) to allow for longer training jobs. Contact hpc-helpdesk@bih-charite.de for assistance if you have longer running jobs that you really cannot make run any shorter.
Info
Fair use rules apply. As GPU nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
gpu
$count
GPUs: -p gpu --gres=gpu:$card:$count
(card=tesla
or card=a40
), maximum run time: --time 14-00:00:00
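The argument pattern above can be assembled by a small helper that rejects unknown card types before anything is submitted. This is a sketch only: the function name `gpu_args` is hypothetical, and it checks nothing beyond the two card names given above and a positive count (in particular, it does not know how many GPUs a node actually has).

```shell
# Hypothetical helper: print the partition and --gres arguments for a GPU job.
# Usage: gpu_args CARD COUNT
gpu_args() {
    # Only the two card types named above are accepted.
    case $1 in
        tesla|a40) ;;
        *) echo 'card must be tesla or a40' >&2; return 1 ;;
    esac
    # The count has to be a positive integer without leading zeros.
    case $2 in
        ''|0*|*[!0-9]*) echo 'count must be a positive integer' >&2; return 1 ;;
    esac
    # Both values were validated above, so they contain no whitespace.
    echo --partition=gpu --gres=gpu:$1:$2
}
```

For example, `gpu_args tesla 2` prints `--partition=gpu --gres=gpu:tesla:2`, which can be pasted into an `sbatch` or `srun` call.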
highmem
","text":"Jobs requesting more than 200 GB of RAM are automatically routed to the highmem
partition.
The high memory nodes are only part of the highmem
partition so they are not blocked by normal compute jobs. Maximum run time is relatively high (14 days) to allow for longer jobs. Contact hpc-helpdesk@bih-charite.de for assistance if you have longer running jobs that you really cannot make run any shorter.
Info
Fair use rules apply. As high-memory nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
highmem
-p highmem
, maximum run time: --time 14-00:00:00
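The routing from the standard partition can be approximated with a few comparisons. The following is a sketch, not the scheduler's actual rule set: the function name `route_partition` is hypothetical, the time limits are the ones stated in the partition descriptions above converted to minutes, and checking GPUs before memory is an assumption. It ignores per-user core limits and the 14-day cap on the gpu and highmem partitions.

```shell
# Hypothetical helper: guess the partition for a job submitted to standard.
# Usage: route_partition GPUS MEM_GB MINUTES
route_partition() {
    if [ $# -ne 3 ]; then
        echo 'usage: route_partition GPUS MEM_GB MINUTES' >&2
        return 2
    fi
    # Accept plain non-negative integers only, so the tests below are safe.
    for value; do
        case $value in
            ''|*[!0-9]*) echo 'arguments must be non-negative integers' >&2; return 2 ;;
        esac
    done
    if [ $1 -gt 0 ]; then echo gpu              # any GPU request
    elif [ $2 -gt 200 ]; then echo highmem      # more than 200 GB of RAM
    elif [ $3 -le 60 ]; then echo debug         # up to 1 hour
    elif [ $3 -le 240 ]; then echo short        # up to 4 hours
    elif [ $3 -le 10080 ]; then echo medium     # up to 7 days
    elif [ $3 -le 20160 ]; then echo long       # up to 14 days
    else echo 'no partition fits' >&2; return 1
    fi
}
```

Calling `route_partition 0 8 90` prints `short`: a job without GPUs, with 8 GB of RAM and a 90-minute limit is too long for debug but fits the four-hour limit.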
mpi
","text":"Jobs are not routed automatically to the mpi
partition; you have to explicitly request the partition. This is the only partition in which more than one node can be allocated to a job.
You can submit multi-node jobs into the mpi
partition. Maximum run time is relatively high (14 days) to allow for longer jobs. Don't abuse this. Contact hpc-helpdesk@bih-charite.de for assistance if you have longer running jobs that you really cannot make run any shorter.
mpi
-p mpi
, maximum run time: --time 14-00:00:00
critical
","text":"Jobs are not routed into critical
automatically and the partition has to be selected manually.
This partition is for time-critical jobs with deadlines. As long as the cluster is not very busy, requests for critical jobs will be granted most of the time. However, do not use this partition without arranging with hpc-helpdesk as killing jobs will be used as the ultima ratio in case of such policy violations.
critical
--time 7-00:00:00
We currently provide you only with Ganglia for monitoring the cluster status.
"},{"location":"overview/monitoring/#using-ganglia","title":"Using Ganglia","text":"Go to the following address and login with your home organization (Charite or MDC):
Ganglia does not know about Slurm
Ganglia will not show you anything about the Slurm job scheduling system. If a job uses a whole node but uses no CPUs then this will be displayed as unused in Ganglia. However, Slurm would not schedule another job on this node.
You will be shown a screen like the one below. This allows you to get a good idea of what is going on on the HPC.
By default you will be shown the cluster usage of the last day. You can quickly switch to other reporting periods, such as the last two or four hours.
In the first row of pictures you see the number of total CPUs (actually hardware threads), number of hosts seen as up and down by Ganglia, and cluster load/utilization. You will then see the overall cluster load, memory usage, CPU usage, and network utilization across the selected time period.
Linux load is not intuitive
Note that the technical details behind Linux load are not very intuitive. It incorporates much more than just the CPU usage. You can find a quite comprehensive treatment of Linux Load here.
We are using a fast shared storage system and almost no local storage (except in /tmp
). Also, almost no jobs use MPI or other heavy network communication. Thus, the network utilization is a good measure of the I/O on the cluster.
Below, you can drill down into various metrics and visualize them historically. Just try it out and find your way around, you cannot break anything. Sadly, there is no good documentation of Ganglia online.
"},{"location":"overview/monitoring/#aggregate-gpu-utilization-visualization","title":"Aggregate GPU Utilization Visualization","text":"Ganglia allows you to obtain metrics in several interesting and useful ways. If you click on \"Aggregate Graphs\" then you could enter the following values to get an overview of the live GPU utilization.
Aggregate GPU Utilization
hpc-gpu-.*
gpu._util
Stacked
Hide legend
Then click Create Graph
.
If a GPU is fully used, it will contribute 100 points on the vertical axis. See above for an example, and here is a direct link:
No mounting on the cluster itself.
For various technical and security-related reasons, users cannot mount anything on the cluster nodes. For mounting the cluster storage on your computer, please read Connecting: SSHFS Mounts.
This document gives an overview of the nodes and volumes on the cluster.
"},{"location":"overview/storage/#cluster-layout","title":"Cluster Layout","text":""},{"location":"overview/storage/#cluster-nodes","title":"Cluster Nodes","text":"The following groups of nodes are available to cluster users. There are a number of nodes that are invisible to non-admin staff, hosting the queue master and monitoring tools and providing backup storage for key critical data, but these are not shown here.
hpc-login-{1,2}
hpc-login-{1,2}.cubi.bihealth.org
med0101..0124,0127
med0133..0164
med0201..0264
med0301..0304
med0401..0405
special purpose/high-memory machinesmed0401
and med0402
med0403
and med0404
med0405
gpu
)med0601..0616
med0618..0633
med0701..0764
The cluster has 2.1 PB of legacy fast storage, currently available at /fast
, as well as 1.6 PB of next-generation fast storage, available at /data/cephfs-1
. Additionally 7.4 PB of slower \"Tier 2\" storage is available at /data/cephfs-2
. Storage is provided by a Ceph storage cluster and designed for massively parallel access from an HPC system. In contrast to \"single server\" NFS systems, the system can provide large bandwidth to all cluster nodes in parallel as long as large data volumes are read and written in relatively \"few\" files.
Storage is split into three sections:
home
-- small, persistent, and safe storage, e.g., for documents and configuration files (default quota of 1 GB).work
-- larger and persistent storage, e.g., for your large data files (default quota of 1 TB).scratch
-- large and non-persistent storage, e.g., for temporary files; files are automatically deleted after 2 weeks (default quota of 10 TB; deletion not implemented yet).
Each user, group, and project has one or more of these sections, e.g., for users:
/data/cephfs-1/home/users/$NAME
/data/cephfs-1/home/users/$NAME/work
/data/cephfs-1/home/users/$NAME/scratch
See Storage and Volumes: Locations for more information.
"},{"location":"slurm/background/","title":"Introduction to Scheduling","text":"As explained elsewhere in more detail, an HPC cluster consists of multiple computers connected via a network and working together. Multiple users can use the system simultaneously to do their work. This means that the system needs to join multiple computers (nodes) to provide a coherent view of them and at the same time partition the system to allow multiple users to work concurrently.
user 1 user 2 ...\n\n .---. .---. .---. .---.\n | J | | J | | J | | J |\n | o | | o | | o | | o | ...\n | b | | b | | b | | b |\n | 1 | | 2 | | 3 | | 4 |\n '---' '---' '---' '---'\n\n.------------------------------------------.\n| Cluster Scheduler |\n'------------------------------------------'\n\n.----------. .------------. .------------.\n| multiple | | separate | | computers |\n'----------' '------------' '------------'\n
"},{"location":"slurm/background/#interlude-partitioning-single-computers","title":"Interlude: Partitioning Single Computers","text":"Overall, this partitioning is not so different from how your workstation or laptop works. Most likely, your computer (or even your smartphone) has multiple processors (or cores). You can run multiple programs on the same computer and the fact that (a) there is more than one core and (b) there is more than one program running is not known to the running programs (unless they explicitly communicate with each other). Different programs can explicitly take advantage of the multiple processor cores. The main difference is that you normally use your computer in an interactive fashion (you perform an action and expect an immediate reaction).
Even with a single processor (and core), your computer manages to run more than one program at the same time. This is done with the so-called time-slicing approach where the operating system lets each program run in turn for a short time (a few milliseconds). A program with a higher priority will get more time slices than one with a lower (e.g., your audio player has real-time requirements and you will hear artifacts if it is starved for compute resources). Your operating system protects programs from each other by creating an address space for each. When two programs are running, the value of the memory at any given position in one program is independent from the value in the other program. Your operating system offers explicit functionality for sharing certain memory areas that two programs can use to exchange data efficiently.
Similarly, file permissions with Unix users/groups or Unix/Windows ACLs (access control lists) are used to isolate users from each other. Programs can share data by accessing the same file if they can both access it. There are special files called sockets that allow for network-like inter-process communication but of course two programs on the same computer can also connect (virtually) via the computer network (no data will actually go through a cable).
"},{"location":"slurm/background/#interlude-resource-types","title":"Interlude: Resource Types","text":"As another diversion, let us consider how Unix manages its resources. This is important to understand when requesting resources from the scheduler later on.
First of all, a computer might offer a certain feature such as a specific hardware platform or special network connection. Examples for this on the BIH HPC are specific Intel processor generations such as haswell
or the availability of Infiniband networking. You can request these with so-called constraints; they are not allocated to specific jobs.
Second, there are resources that are allocated to specific jobs. The most important resources here are:
Generally, once a resource has been allocated to one job, it is not available to another. This means if you allocate more resources to your job than you actually need (overallocation) then those resources are not available to other jobs (whether they are your jobs or those of other users). This will be explained further below.
Another example of resource allocation are licenses. The BIH HPC has a few Matlab 2016b licenses that users can request. As long as a license is allocated to one job, it is unavailable to another.
"},{"location":"slurm/background/#nodes-sockets-processors-cores-threads","title":"Nodes, Sockets, Processors, Cores, Threads","text":"Regarding compute resources, Slurm differentiates between:
In most cases, you will use one compute node only. When using more than one node, you will need to use some form of message passing, e.g., MPI, so processes on different nodes can communicate. On a single node you would mostly use single- or multi-threaded processes, or multiple processes.
Above: Slurm's nomenclature for sockets, processors, cores, and threads (from Slurm Documentation).
Co-locating processes/threads on the same socket has certain implications that are mostly useful for numerical applications. We will not go into further detail here. Slurm provides many different ways to specify the \"pinning\" of processes to specific locations. If you need this feature, we trust that you find sufficient explanation in the Slurm documentation.
Usually, you would allocate multiple cores (a term Slurm uses synonymously with processors) on a single node (allocation on a single node is the default).
"},{"location":"slurm/background/#how-scheduling-works","title":"How Scheduling Works","text":"Slurm is an acronym for \"Simple Linux Utility for Resource Management\" (note that the word \"scheduler\" does not occur here). Actually, one classically differentiates between the managing of resources and the scheduling of jobs that use them. The resource manager allocates resources according to a user's request for a job and ensures that there are no conflicts. If the required resources are not available, the scheduler puts the user's job into a queue. Later, when the requested resources become available the scheduler assigns them to the job and runs it. In the following, both resource allocation and the running of the job are described as being done by the scheduler.
The interesting case occurs when there are not enough resources available for at least two jobs submitted to the scheduler. The scheduler has to decide how to proceed. Consider the simplified case of only scheduling cores. Each job will request a number of cores. The scheduler will then generate a scheduling plan that might look as follows.
core\n ^\n4 | |---job2---|\n3 | |---job2---|\n2 | |---job2---|\n1 | |--job1--|\n +--------------------------> t time\n 5 1 1 2\n 0 5 0\n
job1
has been allocated one core and job2
has been allocated two cores. When job3
, requesting one core, is submitted at t = 5, it has to wait at least until job1
is finished. If job3
requested two or more cores, it would have to wait at least until job2
also finished.
We can now ask several questions, including the following:
Also see the Slurm Frequently Asked Questions.
Please note that even if all jobs were known at the start of time, scheduling is still a so-called NP-complete problem. Entire computer science journals and books are dedicated only to scheduling. Things get more complex in the case of online scheduling, in which new jobs can appear at any time. In practice, Slurm does a fantastic job with its heuristics but it heavily relies on parameter tuning. HPC administration is constantly working on optimizing the scheduler settings. Note that you can use the --format
option to the squeue
command to request that it shows you information about job scheduling (in particular, see the %S
field, which will show you the expected start time for a job, assuming Slurm has calculated it). See man squeue
for details. If you observe inexplicable behavior, please notify us at hpc-helpdesk@bih-charite.de
.
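As a sketch of the `--format` usage mentioned above, the following function lists pending jobs together with their expected start time. The function name `show_start_times` is made up for illustration; the field selection (`%i` job ID, `%P` partition, `%t` state, `%S` start time) is just one possible choice, and the fallback branch only exists so that the snippet also runs on machines without Slurm.

```shell
# Hypothetical helper: list pending jobs with their expected start time.
show_start_times() {
    if command -v squeue >/dev/null 2>&1; then
        # %S is the actual or expected start time, if Slurm has calculated it.
        squeue --states=PENDING --format='%.10i %9P %.2t %S'
    else
        echo 'squeue not found; run this on a cluster login node'
    fi
}
```

Jobs for which Slurm has not yet calculated a start time typically show `N/A` in the last column.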
In Slurm, the nodes of a cluster are split into partitions. Nodes are assigned to one or more partitions (see the Job Scheduler section for details). Jobs can also be assigned to one or more partitions and are executed on nodes of the given partition.
In the BIH HPC, partitions are used to stratify jobs of certain running times and to provide different quality of service (e.g., maximal number of CPU cores available to a user for jobs of a certain running time and size). The partitions gpu
and highmem
provide special hardware (the nodes are not assigned to other partitions) and the mpi
partition allows MPI-parallelism and the allocation of jobs to more than one node. The Job Scheduler provides further details.
This page contains assorted Slurm commands and Bash snippets that should be helpful.
man
pages!
$ man sinfo\n$ man scontrol\n$ man squeue\n# etc...\n
interactive sessions
hpc-login-1:~$ srun --pty bash\nmed0740:~$ echo \"Hello World\"\nmed0740:~$ exit\n
batch submission
hpc-login-1:~$ sbatch script.sh\nSubmitted batch job 2\nhpc-login-1:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 27 debug script.s holtgrem R 0:06 1 med0703\n
listing nodes
$ sinfo -N\nNODELIST NODES PARTITION STATE\nmed0740 1 debug* idle\nmed0741 1 debug* down*\nmed0742 1 debug* down*\n\n$ scontrol show nodes\nNodeName=med0740 Arch=x86_64 CoresPerSocket=8\n CPUAlloc=0 CPUTot=32 CPULoad=0.06\n AvailableFeatures=(null)\n[...]\n\n$ scontrol show nodes med0740\nNodeName=med0740 Arch=x86_64 CoresPerSocket=8\n CPUAlloc=0 CPUTot=32 CPULoad=0.06\n AvailableFeatures=(null)\n ActiveFeatures=(null)\n Gres=(null)\n NodeAddr=med0740 NodeHostName=med0740 Version=20.02.0\n OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020\n RealMemory=1 AllocMem=0 FreeMem=174388 Sockets=2 Boards=1\n State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A\n Partitions=debug\n BootTime=2020-03-05T00:54:15 SlurmdStartTime=2020-03-05T16:23:25\n CfgTRES=cpu=32,mem=1M,billing=32\n AllocTRES=\n CapWatts=n/a\n CurrentWatts=0 AveWatts=0\n ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s\n
queue states
$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n
node resources
$ sinfo -o \"%20N %10c %10m %25f %10G \"\n
additional resources such as GPUs
$ sinfo -o \"%N %G\"\n
listing job details
$ scontrol show job 225\nJobId=225 JobName=bash\n UserId=XXX(135001) GroupId=XXX(30069) MCS_label=N/A\n Priority=4294901580 Nice=0 Account=(null) QOS=normal\n JobState=FAILED Reason=NonZeroExitCode Dependency=(null)\n Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=130:0\n RunTime=00:16:27 TimeLimit=14-00:00:00 TimeMin=N/A\n SubmitTime=2020-03-23T11:34:26 EligibleTime=2020-03-23T11:34:26\n AccrueTime=Unknown\n StartTime=2020-03-23T11:34:26 EndTime=2020-03-23T11:50:53 Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-23T11:34:26\n Partition=gpu AllocNode:Sid=hpc-login-1:1918\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=med0301\n BatchHost=med0301\n NumNodes=1 NumCPUs=2 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=2,node=1,billing=2\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)\n Command=bash\n WorkDir=XXX\n Power=\n TresPerNode=gpu:tesla:4\n MailUser=(null) MailType=NONE\n
host:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \n 1177 medium bash jweiner_ R 4-21:52:24 1 med0127 \n 1192 medium bash jweiner_ R 4-07:08:40 1 med0127 \n 1209 highmem bash mkuhrin_ R 2-01:07:17 1 med0402 \n 1210 gpu bash hilberta R 1-10:30:34 1 med0304 \n 1213 long bash schubacm R 1-09:42:27 1 med0127 \n 2401 gpu bash ramkem_c R 1-05:14:53 1 med0303 \n 2431 medium ngs_mapp holtgrem R 1-05:01:41 1 med0127 \n 2437 critical snakejob holtgrem R 1-05:01:34 1 med0135 \n 2733 debug bash schubacm R 7:36:42 1 med0127 \n 3029 critical ngs_mapp holtgrem R 5:59:07 1 med0127 \n 3030 critical snakejob holtgrem R 5:56:23 1 med0134 \n 3031 critical snakejob holtgrem R 5:56:23 1 med0137 \n 3032 critical snakejob holtgrem R 5:56:23 1 med0137 \n 3033 critical snakejob holtgrem R 5:56:23 1 med0138 \n 3034 critical snakejob holtgrem R 5:56:23 1 med0138 \n 3035 critical snakejob holtgrem R 5:56:20 1 med0139 \n 3036 critical snakejob holtgrem R 5:56:20 1 med0139 \n 3037 critical snakejob holtgrem R 5:56:20 1 med0140 \n 3038 critical snakejob holtgrem R 5:56:20 1 med0140 \n 3039 critical snakejob holtgrem R 5:56:20 1 med0141 \n 3040 critical snakejob holtgrem R 5:56:20 1 med0141 \n 3041 critical snakejob holtgrem R 5:56:20 1 med0142 \n 3042 critical snakejob holtgrem R 5:56:20 1 med0142 \n 3043 critical snakejob holtgrem R 5:56:20 1 med0143 \n 3044 critical snakejob holtgrem R 5:56:20 1 med0143 \n 3063 long bash schubacm R 4:12:37 1 med0127 \n 3066 long bash schubacm R 4:11:47 1 med0127 \n 3113 medium ngs_mapp holtgrem R 1:52:33 1 med0708 \n 3118 medium snakejob holtgrem R 1:50:38 1 med0133 \n 3119 medium snakejob holtgrem R 1:50:38 1 med0703 \n 3126 medium snakejob holtgrem R 1:50:38 1 med0706 \n 3127 medium snakejob holtgrem R 1:50:38 1 med0144 \n 3128 medium snakejob holtgrem R 1:50:38 1 med0144 \n 3133 medium snakejob holtgrem R 1:50:35 1 med0147 \n 3134 medium snakejob holtgrem R 1:50:35 1 med0147 \n 3135 medium snakejob holtgrem R 1:50:35 1 med0148 \n 3136 
medium snakejob holtgrem R 1:50:35 1 med0148 \n 3138 medium snakejob holtgrem R 1:50:35 1 med0104 \n
host:~$ squeue -o \"%.10i %9P %20j %10u %.2t %.10M %.6D %10R %b\"\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(R TRES_PER_NODE\n 1177 medium bash jweiner_m R 4-21:52:22 1 med0127 N/A\n 1192 medium bash jweiner_m R 4-07:08:38 1 med0127 N/A\n 1209 highmem bash mkuhrin_m R 2-01:07:15 1 med0402 N/A\n 1210 gpu bash hilberta_c R 1-10:30:32 1 med0304 gpu:tesla:4\n 1213 long bash schubacm_c R 1-09:42:25 1 med0127 N/A\n 2401 gpu bash ramkem_c R 1-05:14:51 1 med0303 gpu:tesla:1\n 2431 medium ngs_mapping holtgrem_c R 1-05:01:39 1 med0127 N/A\n 2437 critical snakejob.ngs_mapping holtgrem_c R 1-05:01:32 1 med0135 N/A\n 2733 debug bash schubacm_c R 7:36:40 1 med0127 N/A\n 3029 critical ngs_mapping holtgrem_c R 5:59:05 1 med0127 N/A\n 3030 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0134 N/A\n 3031 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0137 N/A\n 3032 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0137 N/A\n 3033 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0138 N/A\n 3034 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0138 N/A\n 3035 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0139 N/A\n 3036 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0139 N/A\n 3037 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0140 N/A\n 3038 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0140 N/A\n 3039 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0141 N/A\n 3040 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0141 N/A\n 3041 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0142 N/A\n 3042 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0142 N/A\n 3043 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0143 N/A\n 3044 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0143 N/A\n 3063 long bash schubacm_c R 4:12:35 1 med0127 N/A\n 3066 long bash schubacm_c R 4:11:45 1 med0127 N/A\n 3113 medium ngs_mapping holtgrem_c R 1:52:31 1 med0708 N/A\n 3118 medium 
snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0133 N/A\n 3119 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0703 N/A\n 3126 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0706 N/A\n 3127 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0144 N/A\n 3128 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0144 N/A\n 3133 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0147 N/A\n 3134 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0147 N/A\n 3135 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0148 N/A\n 3136 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0148 N/A\n 3138 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0104 N/A\n
host:~$ sinfo\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST \ndebug* up 8:00:00 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \ndebug* up 8:00:00 8 mix med[0104,0127,0133-0135,0703,0706,0708] \ndebug* up 8:00:00 10 alloc med[0137-0144,0147-0148] \ndebug* up 8:00:00 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \nmedium up 7-00:00:00 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \nmedium up 7-00:00:00 8 mix med[0104,0127,0133-0135,0703,0706,0708] \nmedium up 7-00:00:00 10 alloc med[0137-0144,0147-0148] \nmedium up 7-00:00:00 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \nlong up 28-00:00:0 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \nlong up 28-00:00:0 8 mix med[0104,0127,0133-0135,0703,0706,0708] \nlong up 28-00:00:0 10 alloc med[0137-0144,0147-0148] \nlong up 28-00:00:0 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \ncritical up 7-00:00:00 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \ncritical up 7-00:00:00 8 mix med[0104,0127,0133-0135,0703,0706,0708] \ncritical up 7-00:00:00 10 alloc med[0137-0144,0147-0148] \ncritical up 7-00:00:00 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \nhighmem up 14-00:00:0 1 mix med0402 \nhighmem up 14-00:00:0 3 idle med[0401,0403-0404] \ngpu up 14-00:00:0 2 mix med[0303-0304] \ngpu up 14-00:00:0 2 idle med[0301-0302] \n
"},{"location":"slurm/commands-sacct/","title":"Slurm Command: sacct
","text":"Perform queries to the Slurm accounting information.
Representative Example
hpc-login-1:~$ sacct -j 1607103\n JobID JobName Partition Account AllocCPUS State ExitCode\n------------ ---------- ---------- ---------- ---------- ---------- --------\n1607103 wgs_sv_an+ medium 1 PENDING 0:0\n
The sacct
command displays information from the Slurm accounting service. The Slurm scheduler only knows about active or completing (very recently active) jobs. The accounting system also knows about currently running jobs so it is the more robust way to query information about jobs. However, not all information is available to the accounting system, so scontrol show job
and squeue
provide more information about current and pending jobs.
Slurm Documentation: sacct
Please also see the official Slurm documentation on sacct.
"},{"location":"slurm/commands-sacct/#important-arguments","title":"Important Arguments","text":"Also see all important arguments of the sbatch
command.
--jobs
-- The job(s) to query for.--format
-- Define attributes to retrieve.--long
-- Get a lot of information from the database; consider piping the output into | less -S
.sacct
over scontrol
and squeue
.sattach
","text":"The sattach
command allows you to connect the standard input, output, and error streams of a running job step to your current terminal session.
Representative Example
hpc-login-1:~$ sattach 12345.0\n[...output of your job...]\nmed0211:~$ [Ctrl-C]\nhpc-login-1:~$\n
Press Ctrl-C
to detach from the current session. Please note that you will have to give the job ID as well as the step ID. In most cases, simply append \".0\"
to your job ID.
Slurm Documentation: sattach
Please also see the official Slurm documentation on sattach.
"},{"location":"slurm/commands-sattach/#important-arguments","title":"Important Arguments","text":"--pty
-- Execute task zero in pseudo terminal.--verbose
-- Increase verbosity of sattach
.sbatch
","text":"The sbatch
command allows you to put a job into the scheduler's queue to be executed at a later time.
Representative Example
# Execute job.sh in partition medium with 4 threads and 4GB of RAM total for a\n# running time of up to one day.\nhpc-login-1:~$ sbatch --partition=medium --mem=4G --ntasks 4 --time=1-00 job.sh\nSubmitted batch job JOB_ID\n
The command will create a batch job and add it to the queue to be executed at a later point in time.
Slurm Documentation: sbatch
Please also see the official Slurm documentation on sbatch.
"},{"location":"slurm/commands-sbatch/#important-arguments","title":"Important Arguments","text":"--array
-- Submit jobs as array jobs. Also see the section on array jobs below.--nodes
-- The number of nodes to allocate. This is only given here as an important argument as the maximum number of nodes allocatable to any partition but mpi
is set to one (1). This is done as there are few users on the BIH HPC that actually use multi-node parallelism. Rather, most users will use multi-core parallelism and might forget to limit the number of nodes, which causes inefficient allocation of resources.--cpus-per-task
-- This corresponds to the number of CPU cores allocated to each task.--mem
-- The memory to allocate for the job. As you can define minimal and maximal number of tasks/CPUs/cores, you could also specify --mem-per-cpu
and get more flexible scheduling of your job.--gres
-- Generic resource allocation. On the BIH HPC, this is only used for allocating GPUs, e.g., with --gres=gpu:tesla:2
, a user could allocate two NVIDIA Tesla GPUs on the same host (use a40
instead of tesla
for the A40 GPUs).--licenses
-- On the BIH HPC, this is used for the allocation of MATLAB 2016b licenses only.--partition
-- The partition to run in. Also see the Job Scheduler section.--time
-- Specify the running time, see man sbatch
or the official Slurm documentation on srun for supported formats. Please note that the DRMAA API only accepts the hours:minutes
format.--dependency
-- Specify dependencies on other jobs, e.g., using --dependency afterok:JOBID
to only execute if the job with ID JOBID
finished successfully or --dependency after:JOBID
to wait for a job to finish regardless of its termination status.--constraint
-- Require one or more features from your node. On the BIH HPC, the processor generation is defined as a feature on the nodes, e.g., haswell
, or special networking such as infiniband
. You can have a look at /etc/slurm/slurm.conf
for all configured features.--output
-- The path to the output log file (by default joining stdout and stderr, see the man page on --error
on how to redirect stderr separately). Various placeholders are available; see the \"filename pattern\" section of man sbatch
or the official Slurm documentation on srun.--mail-type=<type>
-- Send out notifications by email when an event occurs. Use FAIL
to get emails when your job fails. Also see the documentation of sbatch in the Slurm manual.--mail-user=<email>
-- The email address to send to. Must end in @charite.de
, @mdc-berlin.de
, or @bih-charite.de
.Ensure your --output
directory exists!
In the case that the path to the log/output file does not exist, the job will just fail. scontrol show job ID
will report JobState=FAILED Reason=NonZeroExitCode
. Regrettably, no further information is displayed to you as the user. Always check that the path to the directories in StdErr
and StdOut
exists when checking scontrol show job ID
.
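A minimal sketch of guarding against the missing-directory failure described above. The directory name logs and the script name job.sh are placeholders; %j is the sbatch filename pattern for the job ID. The guard around sbatch only exists so that the snippet can be tried outside of the cluster.

```shell
# Create the log directory before submitting; sbatch will not create it.
log_dir=logs            # hypothetical directory name
mkdir -p "$log_dir"

if command -v sbatch >/dev/null 2>&1; then
    # %j is replaced by the job ID (see the filename pattern section of man sbatch).
    sbatch --output="$log_dir/%j.out" job.sh
else
    echo 'sbatch not found; run this on a cluster login node' >&2
fi
```

Putting the mkdir -p call into the same wrapper script that calls sbatch keeps the two steps from drifting apart.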
--job-name
Also see the section Slurm Job Scripts on how to embed the sbatch
parameters in #SBATCH
lines.
If you have many (say, more than 10) similar jobs (e.g., when performing a grid search), you can also use array jobs. However, you should also consider whether it would make sense to increase the running time of each job, e.g., to at least ~10 minutes.
You can submit array jobs by specifying -a EXPR
or --array EXPR
where EXPR
is a range or a list (of course, you can also add this as an #SBATCH
header in your job script). For example:
hpc-login-1 ~# sbatch -a 1-3 grid_search.sh\nhpc-login-1 ~# sbatch -a 1,2,5-10 grid_search.sh\n
This will submit grid_search.sh
with certain variables set:
SLURM_ARRAY_JOB_ID
-- the ID of the first jobSLURM_ARRAY_TASK_ID
-- the index of the job in the arraySLURM_ARRAY_TASK_COUNT
-- number of submitted jobs in arraySLURM_ARRAY_TASK_MAX
-- highest job array index valueSLURM_ARRAY_TASK_MIN
-- lowest job array index valueUsing array jobs has several advantages:
Also see Slurm documentation on job arrays.
For example, if you submit sbatch --array=1-3 grid_search.sh
and Slurm responds with Submitted batch job 36
then the script will be run three times with the following parameters set:
SLURM_JOB_ID=36\nSLURM_ARRAY_JOB_ID=36\nSLURM_ARRAY_TASK_ID=1\nSLURM_ARRAY_TASK_COUNT=3\nSLURM_ARRAY_TASK_MAX=3\nSLURM_ARRAY_TASK_MIN=1\n\nSLURM_JOB_ID=37\nSLURM_ARRAY_JOB_ID=36\nSLURM_ARRAY_TASK_ID=2\nSLURM_ARRAY_TASK_COUNT=3\nSLURM_ARRAY_TASK_MAX=3\nSLURM_ARRAY_TASK_MIN=1\n\nSLURM_JOB_ID=38\nSLURM_ARRAY_JOB_ID=36\nSLURM_ARRAY_TASK_ID=3\nSLURM_ARRAY_TASK_COUNT=3\nSLURM_ARRAY_TASK_MAX=3\nSLURM_ARRAY_TASK_MIN=1\n
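The variables listed above are typically used to select one parameter per array task. The following is a minimal sketch of what a grid_search.sh could look like; the parameter grid and the variable names learning_rates, task_id, and learning_rate are made up for illustration and are not taken from an actual script.

```shell
#!/bin/bash
#SBATCH --array=1-3
#SBATCH --time=10:00

# Hypothetical parameter grid; one entry per array task.
learning_rates=(0.01 0.1 1.0)

# Outside of Slurm the variable is unset, so default to the first task.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Array task IDs start at 1 here, Bash array indices start at 0.
learning_rate=${learning_rates[task_id - 1]}

echo "task ${task_id}: learning_rate=${learning_rate}"
```

Keeping the grid inside the script means that the range given to --array has to match the number of entries.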
"},{"location":"slurm/commands-sbatch/#notes","title":"Notes","text":"sbatch
are governed by resource allocations, in particular:sbatch
jobs have a maximal running time set,sbatch
jobs have a maximal memory and number of cores set, andscontrol show job JOBID
.scancel
","text":"Terminate a running Slurm job.
Representative Example
hpc-login-1:~$ scancel 1703828\nhpc-login-1:~$\n
This command allows you to terminate one or more running jobs (of course, non-superusers can only terminate their own jobs).
Slurm Documentation: scancel
Please also see the official Slurm documentation on scancel.
"},{"location":"slurm/commands-scontrol/","title":"Slurm Command:scontrol
","text":"The scontrol
command allows you to query detailed information from the scheduler and to manipulate objects. Object manipulation is less important for normal users.
Representative Example
hpc-login-1:~$ scontrol show job 1607103\nJobId=1607103 JobName=wgs_sv_annotation\n UserId=holtgrem_c(100131) GroupId=hpc-ag-cubi(5272) MCS_label=N/A\n Priority=748 Nice=0 Account=(null) QOS=normal\n [...]\nhpc-login-1:~$ scontrol show node med02[01-32]\nNodeName=med0201 Arch=x86_64 CoresPerSocket=8\n CPUAlloc=0 CPUTot=32 CPULoad=0.01\n AvailableFeatures=ivybridge,infiniband\n ActiveFeatures=ivybridge,infiniband\n [...]\nhpc-login-1:~$ scontrol show partition medium\nPartitionName=medium\n AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL\n AllocNodes=ALL Default=NO QoS=medium\n DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO\n [...]\n
This command allows you to query all information for an object from Slurm, e.g., jobs, nodes, or partitions. The command also accepts ranges of jobs and hosts. It is most useful to get the information of one or a few objects from the scheduler.
Slurm Documentation: scontrol
Please also see the official Slurm documentation on scontrol.
"},{"location":"slurm/commands-scontrol/#important-sub-commands","title":"Important Sub commands","text":"scontrol show job
-- Show details on jobs.scontrol show partition
-- Show details on partitions.scontrol show node
-- Show details on nodes.scontrol help
-- Show help.scontrol
-- Start an interactive scontrol shell / REPL (read-eval-print loop).scontrol
can only work on jobs that are pending (in the queue), running, or in \"completing\" state.sacct
command.sinfo
","text":"The sinfo
command allows you to query the current cluster status.
Representative Example
hpc-login-1:~$ sinfo\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST\n[...]\nmedium up 7-00:00:00 10 drain* med[0101-0103,0125-0126,0128-0132]\nmedium up 7-00:00:00 1 down* med0243\nmedium up 7-00:00:00 31 mix med[0104,0106-0122,0124,0133,0232-0233,0237-0238,0241-0242,0244,0263-0264,0503,0506]\nmedium up 7-00:00:00 5 alloc med[0105,0123,0127,0239-0240]\nmedium up 7-00:00:00 193 idle med[0134-0164,0201-0231,0234-0236,0245-0262,0501-0502,0504-0505,0507-0516,0601-0632,0701-0764]\n[...]\nhpc-login-1:$ sinfo --summarize\nPARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST\ndebug* up 8:00:00 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\nmedium up 7-00:00:00 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\nlong up 28-00:00:0 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\ncritical up 7-00:00:00 25/141/10/176 med[0101-0164,0501-0516,0601-0632,0701-0764]\nhighmem up 14-00:00:0 1/2/1/4 med[0401-0404]\ngpu up 14-00:00:0 3/0/1/4 med[0301-0304]\nmpi up 14-00:00:0 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n
This command summarizes the state of nodes by different criteria (e.g., by partition or globally).
Slurm Documentation: sinfo
Please also see the official Slurm documentation on sinfo.
"},{"location":"slurm/commands-sinfo/#important-arguments","title":"Important Arguments","text":"Also see all important arguments of the sinfo
command.
--summarize
-- Summarize the node state by partition.--nodes
-- Select the nodes to show the status for, e.g., display the status of all GPU nodes with sinfo -n med030[1-4]
.The most important node states are:
down
-- node is marked as offlinedraining
-- node will not accept any more jobs but has jobs running on itdrained
-- node will not accept any more jobs and has no jobs running on it, but is not offline yetidle
-- node is ready to run jobsallocated
-- node is fully allocated (e.g., CPU, RAM, or GPU limit has been reached)mixed
-- node is running jobs but there is space for moresqueue
","text":"The squeue
command allows you to view currently running and pending jobs.
Representative Example
hpc-login-1:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 1583165 highmem 20200702 usr PD 0:00 1 (DependencyNeverSatisfied)\n 1605901 critical variant_ holtgrem PD 0:00 1 (DependencyNeverSatisfied)\n 1605902 critical variant_ holtgrem PD 0:00 1 (Dependency)\n 1605905 critical variant_ holtgrem PD 0:00 1 (DependencyNeverSatisfied)\n 1605916 critical wgs_sv_c holtgrem PD 0:00 1 (Dependency)\n 1607103 medium wgs_sv_a holtgrem PD 0:00 1 (DependencyNeverSatisfied)\n[...]\n
Slurm Documentation: squeue
Please also see the official Slurm documentation on squeue.
"},{"location":"slurm/commands-squeue/#important-arguments","title":"Important Arguments","text":"--nodelist
-- Only display jobs running on certain nodes (e.g., GPU nodes).--format
-- Define the format to print, see man squeue
for details. See below for a format string that includes the jobid, partition, job name, user name, job status, running time, number of nodes, number of CPU cores, and allocated GPUs.The following aliases in ~/.bashrc
will allow you to print a long and informative squeue
output with sq
, pipe it into less with sql
, get only your jobs (adjust the alias
to your account) using sqme
and pipe that into less with sqmel
.
alias sq='squeue -o \"%.10i %9P %60j %10u %.2t %.10M %.6D %.4C %10R %b\" \"$@\"'\nalias sql='sq \"$@\" | less -S'\nalias sqme='sq -u YOURUSER_c_or_m \"$@\"'\nalias sqmel='sqme \"$@\" | less -S'\n
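Note that Bash aliases do not take parameters: anything typed after an alias is simply appended to the expanded text. For sql and sqmel this means that extra arguments end up after less -S instead of being passed to squeue. A possible alternative, sketched here with the same format string and the same placeholder user name, is to use shell functions; the variable name SQ_FORMAT is an assumption made for this example.

```shell
# Same helpers as above, written as functions so that arguments reach squeue.
SQ_FORMAT='%.10i %9P %60j %10u %.2t %.10M %.6D %.4C %10R %b'

sq()    { squeue -o "$SQ_FORMAT" "$@"; }
sql()   { sq "$@" | less -S; }
sqme()  { sq -u YOURUSER_c_or_m "$@"; }   # adjust to your account
sqmel() { sqme "$@" | less -S; }
```

With these definitions, a call such as sql -p gpu passes -p gpu to squeue before the output is piped into less.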
"},{"location":"slurm/commands-srun/","title":"Slurm Command: srun
","text":"The srun
command allows you to run a command now.
Representative Example
hpc-login-1:~$ srun --pty bash -i\nmed0201:~$\n
The command will perform a resource allocation with the scheduler (and wait until it has allocated the requested resources) first. Most importantly, you can specify the --pty
argument which will connect the job's standard output, error, and input to your current terminal. This allows you to run interactive jobs such as shells with srun --pty bash -i
.
Slurm Documentation: srun
Please also see the official Slurm documentation on srun.
"},{"location":"slurm/commands-srun/#important-arguments","title":"Important Arguments","text":"Also see all important arguments of the sbatch
command.
--pty
-- Connect current terminal to the job's stdout/stderr/stdin.--x11
-- Setup X11 forwarding.--immediate
-- Immediately terminate if the resources to run the job are not available, do not wait.--test-only
-- Don't run anything, but only estimate when the job would be scheduled.srun
are governed by resource allocations, in particular:srun
jobs have a maximal running time set,srun
jobs have a maximal memory and number of cores set, andscontrol show job JOBID
.In the sections Slurm Quickstart and Slurm Cheat Sheet, we have seen that sinfo
and squeue
allow for the compact display of partition/node and job information. In contrast, scontrol show job <id>
and scontrol show partition <id>
and scontrol show node <id>
show comprehensive information that quickly gets hard to comprehend for multiple entries.
Now you might ask: is there anything in between? And: yes, there is.
You can tune the output of sinfo
and squeue
using parameters, in particular by providing format strings. All of this is described in the man pages of the commands that you can display with man sinfo
and man squeue
on the cluster.
sinfo
Output","text":"Notable arguments of sinfo
are:
-N, --Node
-- uncompress the usual lines and display one line per node and partition.-s, --summarize
-- compress the node state, more compact display.-R, --list-reasons
-- for nodes that are not up, display reason string provided by admin.-o <fmt>, --format=<fmt>
-- use format string for display.The most interesting argument is -o/--format
. The man page lists the following values that are used when using other arguments. In other words, many of the display modifications could also be applied with -o/--format
.
default \"%#P %.5a %.10l %.6D %.6t %N\"\n--summarize \"%#P %.5a %.10l %.16F %N\"\n--long \"%#P %.5a %.10l %.10s %.4r %.8h %.10g %.6D %.11T %N\"\n--Node \"%#N %.6D %#P %6t\"\n--long --Node \"%#N %.6D %#P %.11T %.4c %.8z %.6m %.8d %.6w %.8f %20E\"\n--list-reasons \"%20E %9u %19H %N\"\n--long --list-reasons\n \"%20E %12U %19H %6t %N\"\n
The best way to learn more about this is to play around with sinfo -o
, starting out with one of the format strings above. Details about the format strings are described in man sinfo
. Some remarks here:
%<num><char>
displays the value represented by <char>
padded with spaces to the right such that a width of <num>
is reached,%.<num><char>
displays the value represented by <char>
padded with spaces to the left such that a width of <num>
is reached, and%#<char>
displays the value represented by <char>
padded with spaces to the max length of the value represented by <char>
(this is a \"virtual\" value, used internally only, you cannot use this and you will have to place an integer here).For example, to create a grouped display with reasons for being down use:
hpc-login-1:~$ sinfo -o \"%10P %.5a %.10l %.16F %40N %E\"\nPARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST REASON\ndebug* up 8:00:00 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\ndebug* up 8:00:00 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\nmedium up 7-00:00:00 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\nmedium up 7-00:00:00 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\nlong up 28-00:00:0 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\nlong up 28-00:00:0 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\ncritical up 7-00:00:00 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\ncritical up 7-00:00:00 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\nhighmem up 14-00:00:0 0/4/0/4 med[0401-0404] none\ngpu up 14-00:00:0 3/1/0/4 med[0301-0304] none\n
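The padding rules described above behave much like the width specifiers of printf. The following sketch mimics them without needing Slurm; the correspondence is an analogy for illustration, not how sinfo is implemented. The precision (.10) reproduces the truncation visible in the NODELIST column above.

```shell
# Slurm '%10P'  ~ printf '%-10.10s' (left-justified, padded on the right)
# Slurm '%.10l' ~ printf '%10.10s'  (right-justified, padded on the left)
# The '.10' precision truncates long values to the field width.
line=$(printf '[%-10.10s] [%10.10s]' medium 8:00:00)
echo "$line"    # prints: [medium    ] [   8:00:00]
```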
"},{"location":"slurm/format-strings/#tuning-squeue-output","title":"Tuning squeue
Output","text":"The standard squeue output might yield the following
hpc-login-1:~$ squeue | head\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 3149 medium variant_ holtgrem PD 0:00 1 (Dependency)\n 1177 medium bash jweiner_ R 6-03:32:41 1 med0127\n 1192 medium bash jweiner_ R 5-12:48:57 1 med0127\n 1210 gpu bash hilberta R 2-16:10:51 1 med0304\n 1213 long bash schubacm R 2-15:22:44 1 med0127\n 2401 gpu bash ramkem_c R 2-10:55:10 1 med0303\n 3063 long bash schubacm R 1-09:52:54 1 med0127\n 3066 long bash schubacm R 1-09:52:04 1 med0127\n 3147 medium ngs_mapp holtgrem R 1-03:13:42 1 med0148\n
Looking at man squeue
, we learn that the default format strings are:
default \"%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R\"\n-l, --long \"%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R\"\n-s, --steps \"%.15i %.8j %.9P %.8u %.9M %N\"\n
This looks a bit wasteful. Let's cut down on the padding of the job ID and expand on the job name and remove some right paddings.
hpc-login-1:~$ squeue -o \"%.6i %9P %30j %.10u %.2t %.10M %.6D %R %b\" | head\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 3149 medium variant_calling holtgrem_c PD 0:00 1 (Dependency)\n 1177 medium bash jweiner_m R 6-03:35:55 1 med0127\n 1192 medium bash jweiner_m R 5-12:52:11 1 med0127\n 1210 gpu bash hilberta_c R 2-16:14:05 1 med0304\n 1213 long bash schubacm_c R 2-15:25:58 1 med0127\n 2401 gpu bash ramkem_c R 2-10:58:24 1 med0303\n 3063 long bash schubacm_c R 1-09:56:08 1 med0127\n 3066 long bash schubacm_c R 1-09:55:18 1 med0127\n 3147 medium ngs_mapping holtgrem_c R 1-03:16:56 1 med0148\n
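If you settle on a format string, you do not have to type it every time: squeue reads a default format from the SQUEUE_FORMAT environment variable (sinfo has SINFO_FORMAT for the same purpose). A minimal sketch, assuming you want the format from the example above as your default and that you place the line in ~/.bashrc:

```shell
# Hypothetical ~/.bashrc snippet: make the tuned format the default for squeue.
export SQUEUE_FORMAT='%.6i %9P %30j %.10u %.2t %.10M %.6D %R %b'
# An explicit -o/--format on the command line still takes precedence.
```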
"},{"location":"slurm/format-strings/#displaying-resources","title":"Displaying Resources","text":"Now display how many of our internal projects still exist.
hpc-login-1:~$ squeue -o \"%.6i %9P %30j %.10u %.2t %.10M %.6D %10R %s\" | head\n
The next steps are (TODO):
This page describes how to create SLURM job scripts.
SLURM job scripts look as follows. On the top you have lines starting with #SBATCH
. These appear as comments to bash scripts. These lines are interpreted by sbatch
in the same way as command line arguments. That is, when later submitting the script with sbatch my-job.sh
you can either have the parameter to the sbatch
call or in the file.
Multi-Node Allocation in Slurm
Classically, jobs on HPC systems are written in a way that they can run on multiple nodes at once, using the network to communicate. Slurm comes from this world and when allocating more than one CPU/core, it might allocate them on different nodes. Please use --nodes=1
to force Slurm to allocate them on a single node.
Creating the Script
host:example$ cat >my-job.sh <<\"EOF\"\n#!/bin/bash\n#\n#SBATCH --job-name=this-is-my-job\n#SBATCH --output=output.txt\n#\n#SBATCH --ntasks=1\n#SBATCH --nodes=1\n#SBATCH --time=10:00\n#SBATCH --mem-per-cpu=100M\n\ndate\n\nhostname\n>&2 echo \"Hello World\"\n\nsleep 1m\n\ndate\nEOF\n
Also see the SLURM Rosetta Stone for more options.
Submit, Look at Queue & Result
host:example$ sbatch my-job.sh \nSubmitted batch job 315\nhost:example$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \n 315 debug this-is- holtgrem R 0:40 1 med0127 \nhost:example$ sleep 2m\nhost:example$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \nhost:example$ cat output.txt \nWed Mar 25 13:30:56 CET 2020\nmed0127\nHello World\nWed Mar 25 13:31:56 CET 2020\n
"},{"location":"slurm/memory-allocation/","title":"Memory Allocation","text":"Memory allocation is one of the topics that users find confusing most often. This section first gives some technical background and then explains how to implement this properly with Slurm on the BIH HPC.
"},{"location":"slurm/memory-allocation/#technical-background","title":"Technical Background","text":"Technical Background Summary
Main memory used to be one of the most important topics when programming, as computers had so little. There is the infamous quote \"640KB ought to be enough for anybody\", wrongly attributed to Bill Gates, which refers to the fact that early computers could only address that amount of memory. In MS DOS, one had to use special libraries for a program to use more memory. Today, computers are very fast and memory is plentiful and people can (rightfully) forget about memory allocation ... as long as they don't use \"much\" memory by today's standards.
The Linux operating system differentiates between the following types of memory:
Note that above we are talking about processes, not Slurm jobs yet. Let us look at this in detail:
Each program uses some kind of memory management. For example, in C the malloc
and free
functions manually allocate and free memory while in Java, R, and Python, memory allocation and release is done automatically using a concept called garbage collection. Each program starts with a certain virtual memory size, that is the amount of memory it can address, say 128MB. When the program allocates memory, the memory allocation mechanism will check whether it has sufficient space left. If not, it will request an increase in virtual memory from the operating system, e.g., to 256MB. If this fails then the program can try to handle the error, e.g., terminate gracefully, but many programs will just panic and stop. Otherwise, the program will get access to more memory and happily continue to run.
However, programs can allocate humongous amounts of virtual memory and only use a little. Memory is organized in \"pages\" (classically these are 4096 bytes each, but can be larger using so-called \"huge page\" features). The operating system tracks which memory pages are actually used by a process. The total size of these pages is called the resident set size: the amount of memory that is actually currently used by a program. Programs can also mark pages as unused again, thus freeing resident memory and can also decrease their virtual memory.
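To see the difference between virtual and resident memory for a concrete process, you can look at the figures the Linux kernel reports in /proc. The following sketch reads them for the current shell; it assumes a Linux system (such as the cluster nodes) and silently does nothing elsewhere. The variable names are chosen for this example only.

```shell
# Virtual size (VmSize) and resident set size (VmRSS) of the current shell.
status_file=/proc/$$/status   # Linux only
if [ -r "$status_file" ]; then
    vm_size_kb=$(awk '/^VmSize:/ {print $2}' "$status_file")
    vm_rss_kb=$(awk '/^VmRSS:/ {print $2}' "$status_file")
    echo "virtual: ${vm_size_kb} kB, resident: ${vm_rss_kb} kB"
fi
```

The resident value is usually much smaller than the virtual one, which illustrates the point made above.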
In some cases it is still interesting to use swap memory. Here, the contents of resident memory are copied to disk by the operating system. This process is completely transparent to the program; the data remains available at the original positions in the virtual memory! However, accessing it will take some time as it must be read back into main memory from the disk. In this way, it was possible for a computer with 4MB of RAM and a disk of 100MB to run programs that used 8MB. Of course, this was only really usable for programs that ran in the background. One could really feel the latency if a graphical program was using swapped memory (you could actually hear the hard drive working). Today, swap storage is normally only relevant when you put your computer into hibernation. Given the large main memory on the cluster nodes, their small local hard drives (just used for loading the operating system), and the extreme slowness involved in using swapped memory, the BIH HPC nodes have no swap memory allocated.
Most HPC users will also use shared memory, at least implicitly. Whenever a program uses fork
to create a subprocess (BTW, this is not a thread), the program can choose to \"copy\" its current address space. The second process then has access to the same memory as the parent process in a copy-on-write fashion. This allows, for example, pre-loading a database, and also allows the use of already loaded library code by the child process as well. If the child process writes to the copy-on-write memory of the parent, the relevant memory page will be copied and attributed to the child.
Two or more processes can share the same memory explicitly. This is usually used for inter-process communication but the Bowtie program uses it for sharing the memory of indices. For example, the Python multiprocessing
module will use this, including if you have two MPI processes running on the same host.
Memory is also separated into segments, the most interesting ones are heap and stack memory. For compiled languages, memory can be allocated on either. For C, an int
variable will be allocated on the stack. Every time you call a function, a stack frame is created in memory to hold the local variables and other information for the duration of the function execution. The stack thus grows through function calls made by your program and shrinks when the functions return. The stack size for a process is limited (by ulimit -s
) and a program that goes too deep (e.g., via infinite recursion) will be terminated by the operating system if it exceeds this limit. Again in C, int * ptr = (int *)malloc(10 * sizeof(int));
will allocate memory for one variable (an integer pointer) on the stack and memory for 10 integers on the heap. When the function returns, the ptr
variable on the stack will be freed but to free the array of integers, you'd have to call free(ptr)
. If the memory is not freed then this constitutes a memory leak, but that is another topic.
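The stack size limit mentioned above can be inspected from the shell. A small sketch; the value is reported in kilobytes, or as unlimited if no limit is set, and the variable name is chosen for this example only.

```shell
# Soft stack size limit of the current shell, in kilobytes (or 'unlimited').
stack_limit=$(ulimit -s)
echo "stack size limit: ${stack_limit}"
```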
Other relevant segments are code, where the compiled code lives, and data, where static data such as strings displayed to the user are stored. As a side note, in interpreted languages such as R or Python, the code and data segments will refer to the code and data of Python while the actual program text will be on the heap.
"},{"location":"slurm/memory-allocation/#interlude-memory-in-java","title":"Interlude: Memory in Java","text":"Memory in Java Summary
-XX:MaxHeapSize=<size>
(e.g., <size>=2G
) for your program and only tune the other parameters if neededJava's memory management provides for some interesting artifacts. When running simple Java programs, you will never run into this but if you need to use gigabytes of memory in Java then you will have to learn a bit about Java memory management. This is the case when running GATK programs, for example.
As different operating systems handle memory management differently, the Java virtual machine does its own memory management to provide a consistent interface. The following three settings are important in governing memory usage of Java:
-Xmx<size>
/-XX:MaxHeapSize=<size>
-- the maximal Java heap size-Xms<size>
/-XX:InitialHeapSize=<size>
-- the initial Java heap size-Xss<size>
/-XX:ThreadStackSize=<size>
-- maximal stack size available to a Java thread (e.g., the main thread)Above, <size>
is a memory specification, either in bytes or with a suffix, e.g., 80M
, or 1G
.
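When running Java inside a Slurm job, the heap limit has to fit into the memory allocated to the job, with some room left for the Java VM itself. The following sketch derives a heap size from the allocation. It assumes the job was submitted with --mem, in which case Slurm exports the allocation in megabytes as SLURM_MEM_PER_NODE. The 80 % share, the 4096 MB fallback, the function name heap_mb_for, and your-tool.jar are illustrative assumptions, not recommendations from this documentation.

```shell
# Compute a Java heap limit (in MB) that leaves headroom below the allocation.
# The 80 % share is an assumption for illustration, not a site recommendation.
heap_mb_for() {
    local alloc_mb=$1
    echo $(( alloc_mb * 80 / 100 ))
}

# Inside a job submitted with --mem, SLURM_MEM_PER_NODE holds the allocation
# in megabytes; the 4096 fallback only applies outside of Slurm.
heap_mb=$(heap_mb_for "${SLURM_MEM_PER_NODE:-4096}")

# your-tool.jar is a placeholder.
echo "java -XX:MaxHeapSize=${heap_mb}m -jar your-tool.jar"
```

How much headroom a given program needs is best found by trial and error.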
On startup, Java does roughly the following:
Memory freed by the Java garbage collector can be re-used by other Java objects (rss remains the same) or be freed in the operating system (rss decreases). The Java VM program itself will also consume memory on the OS stack but that is negligible.
Overall, the Java VM needs to store in main memory:
In the BIH HPC context, the following is recommended to:
Memory Allocation in Slurm Summary
--mem=<size>
(e.g., <size>=3G
) to allocate memory per nodesrun
and batch sbatch
jobs are governed by Slurm memory allocationOur Slurm configuration uses Linux cgroups to enforce a maximum amount of resident memory. You simply specify it using --mem=<size>
in your srun
and sbatch
command.
In the (rare) case that you provide a more flexible number of threads (Slurm tasks) or GPUs, you could also look into --mem-per-cpu
and --mem-per-gpu
. The official Slurm sbatch manual is quite helpful, as is man sbatch
on the cluster command line.
Slurm (or rather Linux via cgroups) will track all memory used by all processes started by your job. If each process works independently (e.g., you put the output through a pipe prog1 | prog2
) then the amount of memory consumed will at any given time be the sum of the RSS of both processes at that time. If your program uses fork
, which uses memory in a copy-on-write fashion, the shared memory is of course only counted once. Note that Python's multiprocessing does not use copy on write: its data will be explicitly copied and consume additional memory. Refer to the Scipy/Numpy/Pandas etc. documentation on how to achieve parallelism without copying too much data.
The amount of virtual memory that your program can reserve is only \"virtually\" unlimited (pun not intended). However, in practice, the operating system will not like you allocating more than physically available. If your program attempts to allocate more memory than requested via Slurm, your program will be killed.
This is reported to you in the Slurm job output log as something like:
slurmstepd: error: Detected 1 oom-kill event(s) in step <JOB ID>.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.\n
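To avoid such out-of-memory kills, it helps to know the peak resident memory of your program before choosing a value for --mem. The following is a minimal sketch, assuming a Python program on Linux; the helper name peak_rss_mb is hypothetical and the 50 MB allocation only serves as a demonstration.

```python
import resource
import sys


def peak_rss_mb():
    '''Return the peak resident set size of this process in MB.

    On Linux, ru_maxrss is reported in kilobytes (on macOS it is bytes).
    '''
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == 'darwin' else 1024
    return peak / divisor


# Demonstration only: build a 50 MB bytes object so the pages are really touched.
data = b'x' * (50 * 1024 * 1024)
print(f'peak RSS: {peak_rss_mb():.0f} MB')
```

The reported value only covers the current process. Child processes count towards the Slurm limit as well, so treat the number as a lower bound for your allocation.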
You can inspect the amount of memory available on each node in total with sinfo --format \"%.10P %.10l %.6D %.6m %N\"
, as shown below.
$ sinfo --format \"%.10P %.10l %.6D %.6m %N\"\n PARTITION TIMELIMIT NODES MEMORY NODELIST\n debug* 8:00:00 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n medium 7-00:00:00 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n long 28-00:00:0 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n critical 7-00:00:00 176 128722 med[0101-0164,0501-0516,0601-0632,0701-0764]\n highmem 14-00:00:0 4 515762 med[0401-0404]\n gpu 14-00:00:0 4 385215 med[0301-0304]\n mpi 14-00:00:0 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n
"},{"location":"slurm/memory-allocation/#memorycpu-accounting-in-slurm","title":"Memory/CPU Accounting in Slurm","text":"Memory Accounting in Slurm Summary
sacct -j JOBID --format=JobID,MaxRSS
to display the RSS usage of your program
sacct -j JOBID --format=Elapsed,AllocCPUs,TotalCPU
to display information about CPU usage
While Slurm runs your job, it collects information about the job such as the running time, exit status, and memory usage. This information is available through the scheduling system via the squeue
and scontrol
commands, but only while the job is pending execution, executing, or currently completing. After job completion, the information is only available through the Slurm accounting system.
You can query information about jobs, e.g., using sacct
:
$ sacct -j 1607166\n JobID JobName Partition Account AllocCPUS State ExitCode\n------------ ---------- ---------- ---------- ---------- ---------- --------\n1607166 snakejob.+ critical 16 COMPLETED 0:0\n1607166.bat+ batch 16 COMPLETED 0:0\n1607166.ext+ extern 16 COMPLETED 0:0\n
This shows that the job with ID 1607166
with a job name starting with snakejob.
has been run in the critical
partition, been allocated 16 cores and had an exit code of 0:0
. For technical reasons, there is a batch
and an extern
sub step. Actually, Slurm makes it possible to run various steps in one batch as documented in the Slurm documentation.
The sacct
command has various command-line options that you can read about via man sacct
or in the Slurm documentation. We can use --brief
/-b
to show only a brief summary.
$ sacct -j 1607166 --brief\n JobID State ExitCode\n------------ ---------- --------\n1607166 COMPLETED 0:0\n1607166.bat+ COMPLETED 0:0\n1607166.ext+ COMPLETED 0:0\n
Similarly, you can use --long
to display extended information (see the manual for the displayed columns). Very long report lines can be piped into less -S
for easier display. You can fine-tune the information to display with a format string to --format
:
$ sacct -j 1607166 --format=JobID,ReqMem,MaxRSS,Elapsed,TotalCPU,AllocCPUS\n JobID ReqMem MaxRSS Elapsed TotalCPU AllocCPUS\n------------ --------- ---------- ---------- ---------- ----------\n1607166 60Gn 13:07:31 7-16:21:29 16\n1607166.bat+ 60Gn 4314560K 13:07:31 7-16:21:29 16\n1607166.ext+ 60Gn 0 13:07:31 00:00.001 16\n
From this command, we can read that we allocated 60 GB of memory per node (suffix n
, here Gn
for gigabytes per node) and the maximum RSS is reported as 4.3GB. You can use this information to fine-tune your memory allocations. As a side remark, a suffix c
indicates the memory per core (e.g., that could be 60Gc
).
Further, the program ran for 13 hours and 7 minutes with 16 allocated CPU cores and consumed a total of 7 days, 16 hours, and 21 minutes of CPU time. Thus, a total of 11,061 CPU minutes were spent in 787 minutes of wall-clock time. This yields an overall empirical degree of parallelism of about 11061 / 787 = 14, and a parallel efficiency of 14 / 16 = 88%. The discussion of parallel efficiency is a topic not covered here.
However, you can use the awk
script below to compute the empirical parallelism (EmpPar
) and the parallel efficiency (ParEff
). The script also displays the difference between requested memory and used RSS (DiffMEM
). The script can be found here.
$ sacct -j 1607166 --format=JobID,ReqMem,MaxRSS,Elapsed,TotalCPU,AllocCPUS \\\n | awk -f quick-sacct.awk\n JobID ReqMem MaxRSS Elapsed TotalCPU AllocCPUS EmpPar ParEff DiffMEM\n------------ ---------- ---------- ---------- ---------- ---------- --------- -------- --------\n1607166 60Gn 13:07:31 7-16:21:29 16 0.00 0.00 -\n1607166.bat+ 60Gn 4314560K 13:07:31 7-16:21:29 16 14.05 0.88 55.89\n1607166.ext+ 60Gn 0 13:07:31 00:00.001 16 0.00 0.00 -\n
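The same numbers can be reproduced by hand. The sketch below is not the awk script mentioned above; it is a small Python illustration with hypothetical function names that parses the Elapsed and TotalCPU strings printed by sacct and derives the empirical parallelism and the parallel efficiency.

```python
def slurm_time_to_seconds(value):
    '''Convert a Slurm time string (7-16:21:29, 13:07:31, 00:00.001) to seconds.'''
    days = 0
    if '-' in value:
        day_part, value = value.split('-', 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(':')]
    while len(parts) < 3:
        parts.insert(0, 0.0)  # pad missing hours (and minutes)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def parallel_stats(elapsed, total_cpu, alloc_cpus):
    '''Return (empirical parallelism, parallel efficiency).'''
    wall = slurm_time_to_seconds(elapsed)
    cpu = slurm_time_to_seconds(total_cpu)
    emp_par = cpu / wall
    return emp_par, emp_par / alloc_cpus


# Values taken from the batch step of job 1607166 shown above.
emp_par, par_eff = parallel_stats('13:07:31', '7-16:21:29', 16)
print(f'EmpPar={emp_par:.2f} ParEff={par_eff:.2f}')  # EmpPar=14.05 ParEff=0.88
```

A parallel efficiency well below 1 suggests that fewer cores could have been requested for the job.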
"},{"location":"slurm/overview/","title":"Scheduling Overview","text":"The BIH HPC uses the Slurm scheduling system for resource allocation. This section of the manual attempts to give an overview of what scheduling is and how to use the Slurm scheduler. For more detailed information, you will have to refer to the Slurm website and the Slurm man pages (e.g., by entering man sbatch
or man srun
on the HPC terminal's command line).
For a quick introduction and hands-on examples, please see the manual sections
Also, make sure that you are aware of our How-To: Debug Software and How-To: Debug Software on HPC Systems guides in the case that something goes wrong.
"},{"location":"slurm/overview/#annotated-contents","title":"Annotated Contents","text":"srun
-- running parallel jobs now
sbatch
-- submission of batch jobs
scancel
-- stop/kill jobs
sinfo
-- display information about the Slurm cluster
squeue
-- information about pending and running jobs
scontrol
-- detailed information (and control)
sacct
-- access Slurm accounting information (pending, running, and past jobs)
Many other facilities run Slurm clusters and make their documentation available on the internet. We list some that we found useful below. However, be aware that Slurm is a highly configurable and extensible system. Other sites may have different configurations and plugins enabled than we have (or might even have written custom plugins that are not available at BIH). In any case, it's always useful to look \"\u00fcber den Tellerrand\" (German for looking beyond the rim of your own plate).
man
Pages - web versions of Unix manual (man
) pages.
Create an interactive bash session (srun
will run bash in real-time, --pty
connects its stdout
and stderr
to your current session).
hpc-login-1:~$ srun --pty bash -i\nmed0740:~$ echo \"Hello World\"\nHello World\nmed0740:~$ exit\nhpc-login-1:~$\n
Note that you probably want a longer running time for your interactive jobs. With the --time option, your jobs can run for up to 28 days. This will make your job be routed automatically into the long
partition as it is the only one that can fit your job.
hpc-login-1:~$ srun --pty --time 28-00 bash -i\nmed0740:~$\n
Pro-Tip: Using Bash aliases for quick access.
hpc-login-1:~$ alias slogin=\"srun --pty bash -i\"\nhpc-login-1:~$ slogin\nmed0740:~$ exit\nhpc-login-1:~$ cat >>~/.bashrc <<\"EOF\"\n# Useful aliases for logging in via Slurm\nalias slogin=\"srun --pty bash -i\"\nalias slogin-x11=\"srun --pty --x11 bash -i\"\nEOF\n
Create an interactive R session on the cluster (assuming conda is active and the environment my-r
is created, e.g., with conda create -n my-r r
).
hpc-login-1:~$ conda activate my-r\nhpc-login-1:~$ srun --pty R\nR version 3.6.2 (2019-12-12) -- \"Dark and Stormy Night\"\nCopyright (C) 2019 The R Foundation for Statistical Computing\n[...]\nType 'demo()' for some demos, 'help()' for on-line help, or\n'help.start()' for an HTML browser interface to help.\nType 'q()' to quit R.\n\n\n> Sys.info()[\"nodename\"]\n nodename\n\"med0740\"\n> q()\nSave workspace image? [y/n/c]:\nhpc-login-1:~$\n
Create an interactive iPython session on the cluster (assuming conda is active and the environment my-python
is created, e.g., with conda create -n my-python python=3 ipython
).
hpc-login-1:~$ conda activate my-python\nhpc-login-1:~$ srun --pty ipython\nPython 3.8.2 | packaged by conda-forge | (default, Mar 5 2020, 17:11:00)\nType 'copyright', 'credits' or 'license' for more information\nIPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.\n\nIn [1]: import socket; socket.gethostname()\nOut[1]: 'med0740'\n\nIn [2]: exit\nhpc-login-1:~$\n
Allocate 4 cores (default is 1 core), and a total of 4GB of RAM on one node (alternatively use --mem-per-cpu
to set RAM per CPU); sbatch
accepts the same argument.
hpc-login-1:~$ srun --cpus-per-task=4 --nodes=1 --mem=4G --pty bash\nmed0740:~$ export | grep SLURM_CPUS_ON_NODE\n4\nmed0740:~$ your-parallel-script --threads 4\n
Submit a shell script to the cluster in batch mode (sbatch
schedules the job for later execution).
hpc-login-1:~$ cat >job-script.sh <<\"EOF\"\n#!/bin/bash\necho \"Hello, I'm running on $(hostname) and it's $(date)\"\nEOF\nhpc-login-1:~$ sbatch job-script.sh\nSubmitted batch job 7\n\n# Some time later:\nhpc-login-1:~$ cat slurm-7.out\nHello, I'm running on med0740 and it's Fri Mar 6 07:36:42 CET 2020\nhpc-login-1:~$\n
"},{"location":"slurm/reservations/","title":"Reservations / Maintenances","text":"Hint
Read this in particular if you want to know why your job does not get scheduled and you see Reason=ReqNodeNotAvail,_Reserved_for_maintenance
in scontrol show job
.
Administration registers maintenances with the Slurm scheduler as so-called reservations. You can see the current reservations with scontrol show reservation
. The following is a scheduled reservation affecting ALL nodes of the cluster.
# scontrol show reservation\nReservationName=root_13 StartTime=2021-09-07T00:00:00 EndTime=2021-09-09T00:00:00 Duration=2-00:00:00\n Nodes=hpc-cpu-[1-36],med[0101-0116,0201-0264,0301-0304,0401-0404,0501-0516,0601-0632,0701-0764]\n NodeCnt=236 CoreCnt=5344 Features=(null) PartitionName=(null)\n Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES TRES=cpu=10176\n Users=root Groups=(null) Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a\n MaxStartDelay=(null)\n
You will also be notified when logging into the login nodes, e.g.,
--\n ***NOTE: 1 scheduled maintenance(s)***\n\n 1: 2021-09-07 00:00:00 to 2021-09-09 00:00:00 ALL nodes\n\nYou jobs do not start because of \"Reserved_for_maintenance\"?\nSlurm jobs will only start if they do not overlap with scheduled reservations.\nMore information:\n\n - https://bihealth.github.io/bih-cluster/slurm/reservations/\n - https://bihealth.github.io/bih-cluster/admin/maintenance/\n--\n
"},{"location":"slurm/reservations/#what-is-the-effect-of-a-reservation","title":"What is the Effect of a Reservation?","text":"Maintenance reservations will block the affected nodes (or even the whole cluster) for jobs. If there is a maintenance in one week then your job must have an end time before the reservation starts. By this, the job gives a guarantee to the scheduler that it will not interfer with the maintenance reservation.
For example, scontrol show job JOBID
might report the following
JobId=4011580 JobName=snakejob\n UserId=USER(UID) GroupId=GROUP(GID) MCS_label=N/A\n Priority=1722 Nice=0 Account=GROUP QOS=normal\n JobState=PENDING Reason=ReqNodeNotAvail,_Reserved_for_maintenance Dependency=(null)\n Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n RunTime=00:00:00 TimeLimit=28-00:00:00 TimeMin=N/A\n SubmitTime=2021-08-30T09:01:01 EligibleTime=2021-08-30T09:01:01\n AccrueTime=2021-08-30T09:01:01\n StartTime=2021-09-09T00:00:00 EndTime=2021-10-07T00:00:00 Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-30T10:20:40\n Partition=long AllocNode:Sid=172.16.35.153:5453\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=(null)\n NumNodes=1-1 NumCPUs=8 NumTasks=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=8,mem=4G,node=1,billing=8\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=4G MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)\n Power=\n NtasksPerTRES:0\n
Look out for the Reason
line:
Reason=ReqNodeNotAvail,_Reserved_for_maintenance\n
This job is scheduled to run up to 4 weeks and has been submitted on 2021-08-30.
Right now, the following reservation is scheduled:
# scontrol show reservation\nReservationName=root_13 StartTime=2021-09-07T00:00:00 EndTime=2021-09-09T00:00:00 Duration=2-00:00:00\n Nodes=hpc-cpu-[1-36],med[0101-0116,0201-0264,0301-0304,0401-0404,0501-0516,0601-0632,0701-0764]\n NodeCnt=236 CoreCnt=5344 Features=(null) PartitionName=(null)\n Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES TRES=cpu=10176\n Users=root Groups=(null) Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a\n MaxStartDelay=(null)\n
Thus, the scheduler decided to set a StartTime
of the job to 2021-09-09T00:00:00
, which is the end time of the reservation. Effectively, the job is forced to run outside the maintenance reservation.
You can resolve this by using a --time=
parameter to srun
or sbatch
such that the job ends before the maintenance reservation starts.
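As a rough illustration of how to pick such a value, the sketch below computes the longest running time that still ends before the reservation starts. The function name max_time_limit and the safety margin of 10 minutes are assumptions made for this example; the timestamps are taken from the scontrol output above.

```python
from datetime import datetime


def max_time_limit(now, reservation_start, safety_minutes=10):
    '''Return the longest --time value (D-HH:MM:SS) ending before the reservation.'''
    seconds = int((reservation_start - now).total_seconds()) - safety_minutes * 60
    if seconds <= 0:
        return None  # the job cannot finish before the reservation starts
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f'{days}-{hours:02d}:{minutes:02d}:{secs:02d}'


# SubmitTime of the job and StartTime of the reservation from the example above.
submit = datetime.fromisoformat('2021-08-30T09:01:01')
start = datetime.fromisoformat('2021-09-07T00:00:00')
print(max_time_limit(submit, start))  # 7-14:48:59
```

The result can be passed to the --time= parameter. Keep in mind that a job waiting in the queue starts later than its submission time, so a shorter limit is usually the safer choice.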
Rosetta Stone?
The Rosetta Stone is a stone slab that carries the same text in Egyptian hieroglyphs and ancient Greek. This was key for deciphering Egyptian hieroglyphs in the 19th century. Nowadays, the term is often used to label translation tables such as the one below.
The table below shows some SGE commands and their Slurm equivalents.
User Command SGE Slurm remote login qrsh/qlogin
srun --pty bash
run interactively N/A srun --pty program
submit job qsub script.sh
sbatch script.sh
delete job qdel job-id
scancel job-id
job status by job id N/A squeue --job job-id
detailed job status qstat -u '*' -j job-id
sstat job-id
job status of your jobs qstat
squeue --me
job status by user qstat -u user
squeue -u user
hold job qhold job-id
scontrol hold job-id
release job qrls job-id
scontrol release job-id
queue list qconf -sql
scontrol show partitions
node list qhost
sinfo -N
OR scontrol show nodes
cluster status qhost -q
sinfo
show node resources N/A sinfo -o \"%n %G\"
Job Specification SGE Slurm script directive marker #$
#SBATCH
(run in queue) -q queue
-p queue
allocated nodes N/A -N min[-max]
allocate cores -pe smp count
-n count
limit running time -l h_rt=time
-t days-hh:mm:ss
redirect stdout -o file
-o file
redirect stderr -e file
-e file
combine stdout/stderr -j yes
-o without -e
copy environment -V
--export=ALL\\|NONE\\|variables
email notification -m abe
--mail-type=events
send email to -M email
--mail-user=email
job name -N name
--job-name=name
restart job -r yes|no
--requeue|--no-requeue
working directory -wd path
--workdir
run exclusively -l exclusive
--exclusive
OR --shared
allocate memory -l h_vmem=size
--mem=mem
OR --mem-per-cpu=mem
wait for job -hold_jid jid
--depend state:job
select target host -l hostname=host1\\|host2
--nodelist=nodes
AND/OR --exclude
allocate GPU -l gpu=1
--gres=gpu:tesla:count
or --gres=gpu:a40:count
"},{"location":"slurm/snakemake/","title":"Snakemake with Slurm","text":"This page describes how to use Snakemake with Slurm.
"},{"location":"slurm/snakemake/#prerequisites","title":"Prerequisites","text":"source miniconda/bin/activate
.
We first create a new environment snakemake-slurm
and activate it. We need the snakemake
package for this.
host:~$ conda create -y -n snakemake-slurm snakemake\n[...]\n#\n# To activate this environment, use\n#\n# $ conda activate snakemake-slurm\n#\n# To deactivate an active environment, use\n#\n# $ conda deactivate\nhost:~$ conda activate snakemake-slurm\n(snakemake-slurm) host:~$\n
"},{"location":"slurm/snakemake/#snakemake-workflow-setup","title":"Snakemake Workflow Setup","text":"We create a workflow and ensure that it works properly with multi-threaded Snakemake (no cluster submission here!)
host:~$ mkdir -p snake-slurm\nhost:~$ cd snake-slurm\nhost:snake-slurm$ cat >Snakefile <<\"EOF\"\nrule default:\n input: \"the-result.txt\"\n\nrule mkresult:\n output: \"the-result.txt\"\n shell: r\"sleep 1m; touch the-result.txt\"\nEOF\nhost:snake-slurm$ snakemake --cores=1\n[...]\nhost:snake-slurm$ ls\nSnakefile the-result.txt\nhost:snake-slurm$ rm the-result.txt\n
"},{"location":"slurm/snakemake/#snakemake-and-slurm","title":"Snakemake and Slurm","text":"You have two options:
snakemake --profile=cubi-v1
and the Snakemake resource configuration as shown below. STRONGLY PREFERRED
snakemake --cluster='sbatch ...'
command.
Did you notice that we sneaked in a sleep 1m
? In a second terminal session, we can see that the job has indeed been submitted to Slurm.
host:~$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 325 debug snakejob holtgrem R 0:47 1 med0127\n
"},{"location":"slurm/snakemake/#threads-resources","title":"Threads & Resources","text":"The cubi-v1
profile (stored in /etc/xdg/snakemake/cubi-v1
on all cluster nodes) supports the following specification in your Snakemake rule:
threads
: the number of threads to execute the job on
resources.mem
/resources.mem_mb
: the memory to allocate for the whole job, OR resources.mem_per_thread
: the memory to allocate for each thread.
resources.time
: the running time of the rule, in a syntax supported by Slurm, e.g. HH:MM:SS
or D-HH:MM:SS
resources.partition
: the partition to submit your job into (Slurm will pick a fitting partition for you by default)
resources.nodes
: the number of nodes to schedule your job on (defaults to 1
and you will want to keep that value unless you want to use MPI)
You will need Snakemake >=7.0.2 for this.
Here is how to call Snakemake:
# snakemake --profile=cubi-v1 -j1\n
To set rule-specific resources:
rule myrule:\n threads: 1\n resources:\n mem='8G',\n time='04:00:00',\n input: # ...\n output: # ...\n shell: # ...\n
You can combine this with Snakemake resource callables, of course:
def myrule_mem(wildcards, attempt):\n mem = 2 * attempt\n return '%dG' % mem\n\nrule snps:\n threads: 1\n resources:\n mem=myrule_mem,\n time='04:00:00',\n input: # ...\n output: # ...\n shell: # ...\n
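To see what the callable above requests, it can be evaluated outside of Snakemake. This is only a sketch: it assumes that Snakemake passes the attempt number starting at 1 and increases it on each retry, which in turn requires that retries are enabled when calling Snakemake. The wildcards argument is unused here, so None is passed in its place.

```python
def myrule_mem(wildcards, attempt):
    # Request 2 GB on the first attempt and 2 GB more on every retry.
    mem = 2 * attempt
    return '%dG' % mem


# wildcards are not used by this callable, so None is passed for illustration.
for attempt in (1, 2, 3):
    print(attempt, myrule_mem(None, attempt))
```

With this scheme, a rule that fails because of too little memory is retried with 4G and then 6G.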
"},{"location":"slurm/snakemake/#custom-logging-directory","title":"Custom logging directory","text":"By default, slurm will write log files into the working directory of snakemake, which will look like slurm-$jobid.out
.
To change this behaviour, the environment variable SBATCH_DEFAULTS
can be set to re-route the --output
parameter. If you want to write your files into slurm_logs
with a filename pattern of $name-$jobid
for instance, consider the following snippet for your submission script:
#!/bin/bash\n#\n#SBATCH --job-name=snakemake_main_job\n#SBATCH --ntasks=1\n#SBATCH --nodes=1\n#SBATCH --time=48:10:00\n#SBATCH --mem-per-cpu=300M\n#SBATCH --output=slurm_logs/%x-%j.log\n\nmkdir -p slurm_logs\nexport SBATCH_DEFAULTS=\" --output=slurm_logs/%x-%j.log\"\n\ndate\nsrun snakemake --use-conda -j1 --profile=cubi-v1\ndate\n
The name of the snakemake slurm job will be snakemake_main_job
, and the jobs spawned from it will be named after the corresponding rule in the Snakefile.
This section describes how Slurm handles temporary files on the local disk.
Temporary Files Best Practices
See Best Practices: Temporary Files for information how to use temporary files effectively.
"},{"location":"slurm/temporary-files/#slurm-behaviour","title":"Slurm Behaviour","text":"Our Slurm configuration has the following behaviour.
"},{"location":"slurm/temporary-files/#environment-variable-tmpdir","title":"Environment Variable TMPDIR","text":"Slurm itself will by default not change the TMPDIR
environment variable but retain the variable's value from the srun
or sbatch
call.
/tmp
Directories","text":"The only place where users can write data to on local storage of the compute nodes is /tmp
.
Storage is a consumable shared resource as the storage used by one job cannot be used by another job. It is thus critical that Slurm cleans up after each job such that all space on the local node is available to the next job. This is done using the job_container/tmpfs Slurm plugin.
This plugin creates a so-called Linux namespace for each job and creates a bind mount of /tmp
to a location on the local storage. This mount is only visible to the currently running job, and each job, even of the same user, gets its own /tmp
. After a job terminates, Slurm will remove the directory and all of its content.
There is a notable exception. If you use ssh
to connect to a node rather than using srun
or sbatch
, you will see the system /tmp
directory and can also write to it. This usage of storage is not tracked and consequently you can circumvent the Slurm quota management. Using /tmp
in this fashion (i.e., outside of Slurm-controlled jobs) is prohibited. If it cannot be helped (e.g., if you need to run some debugging application that needs to create FIFO or socket files) then keep usage of /tmp
outside of Slurm jobs below 100MB.
localtmp
","text":"Enforcing localtmp
Gres
From January 31, we will enforce the allocated storage in /tmp
on the local disk with quotas. Jobs writing to /tmp
beyond the quota in the job allocation will not function properly and probably crash with \"out of disk quota\" messages.
Slurm tracks the available local storage above 100MB on nodes in the localtmp
generic resource (aka Gres). The resource is counted in steps of 1MB, such that a node with 350GB of local storage would look as follows in scontrol show node
:
hpc-login-1 # scontrol show node hpc-cpu-1\nNodeName=hpc-cpu-1 Arch=x86_64 CoresPerSocket=24\n [...]\n Gres=localtmp:350K\n [...]\n CfgTRES=cpu=96,mem=360000M,billing=96,gres/localtmp=358400\n [...]\n
Each job is automatically granted 100MB of storage on the local disk which is sufficient for most standard programs. If your job needs more temporary storage then you should either
$HOME/scratch
volume (see Best Practices: Temporary Files)
localtmp
generic resource (described here)
You can allocate the resource with --gres=localtmp:SIZE
where SIZE
is given in MB.
hpc-login-1 # srun --gres=localtmp:100k --pty bash -i\nhpc-cpu-1 # scontrol show node hpc-cpu-1\nNodeName=hpc-cpu-1 Arch=x86_64 CoresPerSocket=24\n [...]\n Gres=localtmp:250K\n [...]\n CfgTRES=cpu=96,mem=360000M,billing=96,gres/localtmp=358400\n [...]\n AllocTRES=cpu=92,mem=351G,gres/localtmp=102400\n [...]\n
The first output tells us about the resource configured to be available to user jobs and the last line shows us that 100k=102400
MB of local storage are allocated.
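The relation between the suffixed value and the number of MB can be sketched as follows. The helper gres_count_to_mb is hypothetical and only handles the K suffix that appears in the outputs above, where it acts as a factor of 1024.

```python
def gres_count_to_mb(value):
    '''Convert a localtmp count such as 100k or 350K to MB (K means 1024).'''
    value = value.strip().lower()
    if value.endswith('k'):
        return int(value[:-1]) * 1024
    return int(value)


print(gres_count_to_mb('100k'))  # 102400
print(gres_count_to_mb('350K'))  # 358400
```

Both results match the gres/localtmp values shown in AllocTRES and CfgTRES above.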
You can also see the used resources in the details of your job:
scontrol show job 14848\nJobId=14848 JobName=example.sh\n [...]\n TresPerNode=gres:localtmp:100k\n
"},{"location":"slurm/x11/","title":"Slurm and X11","text":"Make sure to connect to the login node with X11 forwarding.
host:~$ ssh -X -l user_c hpc-login-1.cubi.bihealth.org\n
Once connected to the login node, pass the --x11
flag.
hpc-login-1:~$ srun --pty --x11 xterm\n
"},{"location":"storage/home-quota/","title":"Keeping your home folder clean","text":"We set quite restrictive quotas for user homes, but in exchange you get file system snapshots and mirroring. Your home folder should therefore only be used for scripts, your user config, and other small files. Everything else should be stored in the work
or scratch
subdirectories, which effectively link to your group's shared storage space. This document describes some common pitfalls and how to circumvent them.
Hint
The tilde character (~
) is shorthand for your home directory.
Various programs tend to deposit large folders in a user's home directory and can quickly use up your allotted storage quota. These include:
~/.local/lib/python*
~/R/x86_64-pc-linux-gnu-library
~/ondemand
Please note that directories whose names start with a dot are not shown by the normal ls
command; use ls -a
to list them. You can search your home folder for large directories like so:
$ du -shc ~/.* ~/* --exclude=.. --exclude=.\n
You should move these locations to your work
folder and create symbolic links in their place. Conda installations should be installed in work
from the very beginning as they do not react well to being moved around.
Here is an example for the .local
folder.
$ mv ~/.local ~/work/.local\n$ ln -s ~/work/.local ~/.local\n
"},{"location":"storage/home-quota/#temporary-files","title":"Temporary Files","text":"Another usual culprit is the hidden .cache
directory which contains temporary files. This folder can be moved to the scratch
volume in a similar manner as described above.
$ mv ~/.cache ~/scratch/.cache\n$ ln -s ~/scratch/.cache ~/.cache\n
Important
Files placed in your scratch
directory will be automatically removed after 2 weeks. Do not place any valuable files in there.
Please use hpc-transfer-1
and hpc-transfer-2
for moving large amounts of files. This not only leaves the compute nodes available for actual computation, but also avoids the risk of your jobs being killed by Slurm. You should also use tmux
to not risk connection loss during long running transfers.
Define source and target location and copy contents. Please replace the parts in curly brackets with your actual folder names. It is important to end paths with a trailing slash (/
) as this is interpreted by rsync
as \u201call files in this folder\u201d.
$ SOURCE=/data/gpfs-1/work/projects/{my_project}/\n$ TARGET=/data/cephfs-2/unmirrored/projects/{my_project}/\n$ rsync -ahP --stats --dry-run $SOURCE $TARGET\n
Remove the --dry-run
flag to start the actual copying process.
Important
File ownership information will be lost during this process. This is due to non-root users not being allowed to change ownership of arbitrary files. If this is a problem for you, please contact our admins again after completing this step.
Perform a second rsync
to check if all files were successfully transferred. Paranoid users might want to add the --checksum
flag to rsync
or use hashdeep
. Please note the flag --remove-source-files
which will do exactly as the name suggests, but leaves empty directories behind.
$ rsync -ahX --stats --remove-source-files --dry-run $SOURCE $TARGET\n
--dry-run
flag to start the actual deletion.
$ find $SOURCE -type f | wc -l\n0\n$ rm -r $SOURCE\n
Warning
When defining your SOURCE location, do not use the *
wildcard character. It will not match hidden (dot) files and will leave them behind. It's better to use a trailing slash which matches \u201cAll files in this folder\u201d.
All files within your own work directory can be transferred as follows. Please replace parts in curly braces with your cluster user name.
$ SOURCE=/data/gpfs-1/work/users/{username}/\n$ TARGET=/data/cephfs-1/home/users/{username}/work/\n$ rsync -ahP --stats --dry-run $SOURCE $TARGET\n
Note
The --dry-run
flag lets you check that rsync is working as expected without copying any files. Remove it to start the actual transfer.
Perform a second rsync
to check if all files were successfully transferred. Paranoid users might want to add the --checksum
flag or use hashdeep
. Please note the flag --remove-source-files
which will do exactly as the name suggests, but leaves empty directories behind.
$ rsync -ahP --stats --remove-source-files --dry-run $SOURCE $TARGET\n
$ find $SOURCE -type f | wc -l\n0\n
Conda installations tend not to react well to moving their main folder from its original location. There are numerous ways around this problem which are described here.
A simple solution we can recommend is this:
Install a fresh version of conda or mamba in your new work folder. Don't forget to first remove the conda init block in ~/.bashrc
.
$ nano ~/.bashrc\n$ conda init\n$ conda config --set auto_activate_base false\n
You can then use your new conda to export your old environments by specifying a full path like so:
$ conda env export -p /fast/work/user/$USER/miniconda/envs/<env_name> -f <env_name>.yaml\n
If you run into errors it might be better to also use the --no-builds
flag. Finally, re-create your old environments from the yaml files:
$ conda env create -f {environment.yml}\n
Outdated
This document is only valid for the old, third-generation file system and will be removed soon. Quotas of our new CephFS storage are communicated via the HPC Access web portal.
As described elsewhere, all data in your user, group, and project volumes is subject to quotas. This page quickly shows how to query for the current usage of data volume and file counts for your user, group, and projects.
"},{"location":"storage/querying-storage/#query-for-user-data-and-file-usage","title":"Query for User Data and File Usage","text":"The file /etc/bashrc.gpfs-quota
contains some Bash functions that you can use for querying the quota usage. This file is automatically sourced in all of your Bash sessions.
For querying your user's data and file usage, enter the following command:
# bih-gpfs-quota-user holtgrem_c\n
You will get a report as follows. As soon as usage reaches 90%, the data/file usage will be highlighted in yellow. If you pass 99%, the data/file usage will be highlighted in red.
=================================\nQuota Report for: user holtgrem_c\n=================================\n\n DATA quota GR- FILES quota GR-\nENTITY NAME FSET USED SOFT HARD ACE USED SOFT HARD ACE\n------- ---------- ------- ----- ---- ----- ----- --- ----- ---- ----- ----- ---\nusers holtgrem_c home 103M 10% 1.0G 1.5G - 2.5k 25% 10k 12k -\nusers holtgrem_c work 639G 62% 1.0T 1.1T - 1.0M 52% 2.0M 2.2M -\nusers holtgrem_c scratch 42G 0% 200T 220T - 207k 0.1% 200M 220M -\n[...]\n
"},{"location":"storage/querying-storage/#query-for-group-data-and-file-usage","title":"Query for Group Data and File Usage","text":"# bih-gpfs-report-quota group ag_someag\n=================================\nQuota Report for: group ag_someag\n=================================\n\n DATA quota GR- FILES quota GR-\nENTITY NAME FSET USED SOFT HARD ACE USED SOFT HARD ACE\n------- ---------- ------- ----- ---- ----- ----- --- ----- ---- ----- ----- ---\ngroups ag_someag home 0 0% 1.0G 1.5G - 4 0% 10k 12k -\ngroups ag_someag work 349G 34% 1.0T 1.5T - 302 0% 2.0M 2.2M -\ngroups ag_someag scratch 0 0% 200T 220T - 1 0% 200M 220M -\n\n[...]\n
"},{"location":"storage/querying-storage/#query-for-project-data-and-file-usage","title":"Query for Project Data and File Usage","text":"# bih-gpfs-report-quota project someproj\n==================================\nQuota Report for: project someproj\n==================================\n\n DATA quota GR- FILES quota GR-\nENTITY NAME FSET USED SOFT HARD ACE USED SOFT HARD ACE\n------- ---------- ------- ----- ---- ----- ----- --- ----- ---- ----- ----- ---\ngroups someproj home 0 0% 1.0G 1.5G - 4 0% 10k 12k -\ngroups someproj work 349G 34% 1.0T 1.5T - 302 0% 2.0M 2.2M -\ngroups someproj scratch 0 0% 200T 220T - 1 0% 200M 220M -\n\n[...]\n
"},{"location":"storage/scratch-cleanup/","title":"Automated Cleanup of Scratch","text":"The scratch
space is automatically cleaned up nightly with the following mechanism.
scratch
folder are created and retained for 3 days.
Warning
We specifically use the mtime
attribute to determine if files in scratch should be cleaned up. Copying or downloading files to scratch while preserving the original mtime
might lead to unexpected results.
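To check which of your files are affected, you can compare their mtime against a cutoff yourself. The following is a minimal sketch, not the cleanup mechanism itself; the function name files_older_than is hypothetical, and the default of 14 days is an assumption based on the two-week retention stated elsewhere in this documentation.

```python
import os
import time


def files_older_than(root, days=14, now=None):
    '''Yield paths below root whose mtime lies more than the given days in the past.'''
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    yield path
            except FileNotFoundError:
                continue  # the file vanished while scanning


if __name__ == '__main__':
    # Assumption: your scratch folder is reachable as ~/scratch.
    for path in files_older_than(os.path.expanduser('~/scratch')):
        print(path)
```

Files that were copied with their original mtime preserved show up in this list right after the copy, which is exactly the situation the warning above describes.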
This document describes the fourth iteration of the file system structure on the BIH HPC cluster. It was made necessary because the previous file system was no longer supported by the manufacturer, and we have since switched to distributed Ceph storage.
Important
For now, the old, third-generation file system is still mounted at /fast
. It will be decommissioned soon; please consult this document describing the migration process!
There are the following three entities on the cluster:
Each user, group, and project can have storage folders in different locations.
"},{"location":"storage/storage-locations/#data-types-and-storage-tiers","title":"Data Types and Storage Tiers","text":"Files stored on the HPC fall into one of three categories:
Home folders store programs, scripts, and user config, i.\u00a0e. long-lived and very important files. Loss of this data requires redoing manual work (like programming).
Work folders store data of potentially large size which has a medium life time and is important. Examples are raw sequencing data and intermediate results that are to be kept (e.\u00a0g. sorted and indexed BAM files). Work data requires time-consuming actions to be restored, such as downloading large amounts of data or long-running computation.
Scratch folders store temporary files with a short lifetime (e.\u00a0g. unsorted BAM files). Scratch data is created to be removed eventually.
Ceph storage comes in two types which differ in their I/O speed, total capacity, and cost. They are called Tier 1 and Tier 2 and sometimes hot storage and warm storage. In the HPC filesystem they are mounted in /data/cephfs-1
and /data/cephfs-2
.
Storage quotas are imposed in these locations to restrict the maximum size of folders. Amount and utilization of quotas is communicated via the HPC Access web portal.
"},{"location":"storage/storage-locations/#home-directories","title":"Home Directories","text":"Location: /data/cephfs-1/home/
Only users have home directories on Tier 1 storage. This is the starting point when starting a new shell or SSH session. Important config files are stored here as well as analysis scripts and small user files. Home folders have a strict storage quota of 1\u00a0GB.
"},{"location":"storage/storage-locations/#work-directories","title":"Work Directories","text":"Location: /data/cephfs-1/work/
Groups and projects have work directories on Tier 1 storage. User home folders contain a symlink to their respective group's work folder. Files shared within a group/project are stored here as long as they are in active use. Work folders are generally limited to 1\u00a0TB per group. Project work folders are allocated on an individual basis.
"},{"location":"storage/storage-locations/#scratch-space","title":"Scratch Space","text":"Location: /data/cephfs-1/scratch/
Groups and projects have scratch space on Tier 1 storage. User home folders contain a symlink to their respective group's scratch space. Meant for temporary, potentially large data e.\u00a0g. intermediate unsorted or unmasked BAM files, data downloaded from the internet etc. Scratch space is generally limited to 10\u00a0TB per group. Projects are allocated scratch on an individual basis. Files in scratch will be automatically removed 2 weeks after their creation.
"},{"location":"storage/storage-locations/#tier-2-storage","title":"Tier 2 Storage","text":"Location: /data/cephfs-2/
This is where big files go when they are not in active use. Groups are allocated 10 TB of Tier 2 storage by default. File quotas here can be significantly larger as space is much cheaper and more abundant than on Tier 1.
Note
Tier 2 storage is currently not accessible from HPC login nodes.
"},{"location":"storage/storage-locations/#overview","title":"Overview","text":"Tier Function Path Default Quota 1 User home/data/cephfs-1/home/users/<user>
1 GB 1 Group work /data/cephfs-1/work/groups/<group>
1 TB 1 Group scratch /data/cephfs-1/scratch/groups/<group>
10 TB 1 Project work /data/cephfs-1/work/projects/<project>
On request 1 Project scratch /data/cephfs-1/scratch/projects/<project>
On request 2 Group /data/cephfs-2/unmirrored/groups/<group>
10 TB 2 Project /data/cephfs-2/unmirrored/projects/<project>
On request 2 Group /data/cephfs-2/mirrored/groups/<group>
On request 2 Project /data/cephfs-2/mirrored/projects/<project>
On request"},{"location":"storage/storage-locations/#snapshots-and-mirroring","title":"Snapshots and Mirroring","text":"Snapshots are incremental copies of the state of the data at a particular point in time. They provide safety against various \"Oops, did I just delete that?\" scenarios, meaning they can be used to recover lost or damaged files. Depending on the location and Tier, CephFS creates snapshots at different frequencies and with different retention plans.
Location Path Retention policy Mirrored User homes/data/cephfs-1/home/users/
Hourly for 48 h, daily for 14 d yes Group/project work /data/cephfs-1/work/
Four times a day, daily for 5 d no Group/project scratch /data/cephfs-1/scratch/
Daily for 3 d no Group/project mirrored /data/cephfs-2/mirrored/
Daily for 30 d, weekly for 16 w yes Group/project unmirrored /data/cephfs-2/unmirrored/
Daily for 30 d, weekly for 16 w no Some parts of Tier 1 and Tier 2 snapshots are also mirrored into a separate fire compartment within the data center. This provides an additional layer of protection, i.\u00a0e. against physical damage to the servers.
"},{"location":"storage/storage-locations/#accessing-snapshots","title":"Accessing Snapshots","text":"To access snapshots simply navigate to the .snap/
sub-folder of the respective location. This special folder exists on all levels of the CephFS file hierarchy, so even in your user home directory. Inside you will find one folder per snapshot created and in those a complete replica of the respective folder at the time of snapshot creation.
For example:
/data/cephfs-1/home/.snap/<some_snapshot>/users/<your_user>/
same as:/data/cephfs-1/home/users/<your_user>/.snap/<some_snapshot>
/data/cephfs-1/work/.snap/<some_snapshot>/groups/<your_group>/
/data/cephfs-2/unmirrored/.snap/<some_snapshot>/projects/<your_project>/
Here is a simple example of how to restore a file:
$ cd /data/cephfs-2/unmirrored/groups/cubi/.snap/scheduled-2024-03-11-00_00_00_UTC/\n$ ls -l\nimportant_file.txt\n$ cp important_file.txt /data/cephfs-2/unmirrored/groups/cubi/\n
"},{"location":"storage/storage-locations/#technical-implementation","title":"Technical Implementation","text":""},{"location":"storage/storage-locations/#tier-1","title":"Tier 1","text":"/data/cephfs-1
/data/cephfs-2
Important
We will remove access to /fast
on most cluster nodes following September 30th.
Files on the cluster's main storage /data/gpfs-1
aka. /fast
will move to a new file system. That includes users' home directories, work directories, and work-group directories. Once files have been moved to their new locations, /fast
will be retired.
Simultaneously we will move towards a more unified naming scheme for project and group folder names. From now on, all such folder names shall be in kebab-case. This is Berlin after all. Group folders will also be renamed, removing the \"ag_\" prefix.
Details about the move will be communicated via the cluster mailing list and the user forum. For technical help, please consult the Data Migration Tips and tricks.
"},{"location":"storage/storage-migration/#why-is-this-happening","title":"Why is this happening?","text":"/fast
is based on a high performance proprietary hardware (DDN) & file system (GPFS). The company selling it has terminated support which also means buying replacement parts will become increasingly difficult.
There are two file systems set up to replace /fast
, named Tier 1 and Tier 2 after their difference in I/O speed:
/fast
ever was, but it only has about 75\u00a0% of its usable capacity. The Hot storage Tier 1 is reserved for files requiring frequent random access, user homes, and scratch. Tier 2 (Warm storage) should be used for everything else. Both file systems are based on the open-source, software-defined Ceph storage platform and differ in the type of drives used. Tier 1 or Cephfs-1 uses NVMe SSDs and is optimized for performance, Tier 2 or Cephfs-2 uses traditional hard drives and is optimized for cost.
So these are the three terminologies in use right now:
/data/cephfs-1
/data/cephfs-2
More information about CephFS can be found here.
"},{"location":"storage/storage-migration/#new-file-locations","title":"New file locations","text":"Naturally, paths are going to change after files move to their new location. Due to the increase in storage quality options, there will be some more folders to consider.
"},{"location":"storage/storage-migration/#users","title":"Users","text":"/data/cephfs-1/home/users/<user>
/data/cephfs-1/work/groups/<doe>/users/<user>
/data/cephfs-1/scratch/groups/<doe>/users/<user>
Important
User work
& scratch
spaces are now part of the user's group folder. This means, groups need to coordinate internally to distribute their allotted quota according to each user's needs.
The implementation is done via symlinks created by default when the user account is moved to its new destination:
~/work -> /data/cephfs-1/work/groups/<group>/users/<user>
~/scratch -> /data/cephfs-1/scratch/groups/<group>/users/<user>
/data/cephfs-1/work/groups/<group>
/data/cephfs-1/scratch/groups/<group>
/data/cephfs-2/unmirrored/groups/<group>
/data/cephfs-1/work/projects/<project>
/data/cephfs-1/scratch/projects/<project>
Space on Tier 1 is limited. Your colleagues, other cluster users, and admins will be very grateful if you use it only for files you actively need to perform read/write operations on. This means main project storage should probably always be on Tier 2 with workflows to stage subsets of data onto Tier 1 for analysis.
These examples are based on our experience of processing diverse NGS datasets. Your mileage may vary but there is a basic principle that remains true for all projects.
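The basic principle (main storage on Tier 2, staging subsets onto Tier 1 for analysis) can be sketched as follows. This is only an illustration under stated assumptions: the directories `tier1` and `tier2` are hypothetical stand-ins for the real project folders below `/data/cephfs-1/work/projects/` and `/data/cephfs-2/unmirrored/projects/`, and `wc -l` is a placeholder for an actual analysis tool.

```shell
# Stand-in directories (hypothetical); on the cluster these would be
# the project folders on Tier 2 and Tier 1.
cd $(mktemp -d)
mkdir -p tier2/raw tier2/results tier1
echo 'read data' > tier2/raw/sample1.fastq
echo 'read data' > tier2/raw/sample2.fastq

# 1. Stage only the subset needed for the current analysis onto Tier 1.
cp tier2/raw/sample1.fastq tier1/

# 2. Run the analysis on Tier 1 (wc -l is a placeholder for a real tool).
wc -l < tier1/sample1.fastq > tier1/sample1.counts.txt

# 3. Copy the results back to Tier 2 and free the Tier 1 space.
cp tier1/sample1.counts.txt tier2/results/
rm -r tier1/*
```

The raw data never leaves Tier 2 in this sketch; Tier 1 only holds the staged copy and the results for as long as they are actively needed.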
"},{"location":"storage/storage-migration/#dna-sequencing-wes-wgs","title":"DNA sequencing (WES, WGS)","text":"Typical Whole Genome Sequencing data of a human sample at 100x coverage requires about 150 GB of storage, Whole Exome Sequencing files occupy between 6 and 30 GB. These large files require considerable I/O resources for processing, in particular for the mapping step. A prudent workflow for these kind of analysis would therefore be the following:
fastqs
) from the Tier 2 location to Tier 1. seqtk
is your friend!fastq
files from Tier 2 to Tier 1. Run your scripts on the whole dataset, and copy the results (bam
or cram
files) back to Tier 2.Tip
Don't forget to use your scratch
area for transient operations, for example to sort your bam
file after mapping. More information on how to efficiently set up your temporary directory here.
Analysis of RNA expression datasets is typically a long and iterative process, where the data must remain accessible for a significant period. However, there is usually no need to keep raw data files and mapping results available once the gene & transcript counts have been generated. The count files are much smaller than the raw data or the mapped data, so they can live longer on Tier 1.
A typical workflow would be:
fastq
files from Tier 2 to Tier 1.salmon
or STAR
, and store the results on Tier 2.R
, using tximport
and DESeq2
or featureCounts
& edgeR
, for example.R
objects) and the output of salmon
, STAR
, or any mapper/aligner of your choice to Tier 2.Tip
If using STAR
, don't forget to use your scratch
area for transient operations. More information on how to efficiently set up your temporary directory here
The analysis workflow of bulk RNA & single cell datasets is conceptually similar: Large raw files need to be processed once and only the outcome of the processing (gene count matrices) is required for downstream analysis. Therefore, a typical workflow would be:
fastq
files from Tier 2 to Tier 1.Cell Ranger
or alevin-fry
, perform count matrix QC and store the results on Tier 2.seurat
, scanpy
, or Loupe Browser
.There is no obvious workflow that covers most use cases for machine learning. However,
/fast
to CephFS","text":"Best practices and tools will be provided.
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":"Welcome to the user documentation of the BIH high-performance computing (HPC) cluster, also called HPC 4 Research. The BIH HPC cluster is managed by CUBI (Core Unit Bioinformatics). This documentation is maintained by BIH CUBI and the user community. It is a living document that you can update and add to. See How-To: Contribute to this Document for details.
The global table of contents is on the left, the one of the current page is on the right.
Additional resources
Read the following set of pages (in order) to learn how to get access and connect to the cluster.
Acknowledging BIH HPC Usage
Acknowledge usage of the cluster in your manuscript as \"Computation has been performed on the HPC for Research/Clinic cluster of the Berlin Institute of Health\". Please add your publications using the cluster to this list.
"},{"location":"#news-maintenance-announcements","title":"News & Maintenance Announcements","text":"hpc-mem-5
with 4 TB of RAM./fast
on all non-transfer nodes.See Maintenance for a detailed list of current, planned, and previous maintenance and update work.
"},{"location":"#technical-details","title":"Technical Details","text":"If you are interested in how this HPC cluster is set up on a technical level, we got you covered. There is an entire section on this.
"},{"location":"#documentation-structure","title":"Documentation Structure","text":"The documentation is structured as follows:
Access to the BIH HPC cluster is conceptually based on user groups (also known as labs or units) and projects. Users have a relatively limited storage quota within their private home folder and store big data primarily within their group's work space or in project folders. Projects are collaborative efforts involving multiple PIs/groups and are allocated separate storage space on the cluster.
Independent group leaders at BIH/Charit\u00e9/MDC can request a group on the cluster and name group members. The work group leader (the group PI) bears the responsibility for the group's members and ensures that cluster policies and etiquette are followed. In brief: Fair usage rules apply and the cluster is not to be abused for unethical or illegal purposes. Major and/or continued violations may lead to exclusion of the entire group.
The group leader may also name one delegate (typically an IT-savvy Post-Doc) who is thereby allowed to take decisions about cluster usage and work group management on behalf of the group leader. The above-mentioned responsibilities stay with the group leader.
Note
All cluster users are members of exactly one primary work group. This affiliation is usually defined by real life organisational structures within Charit\u00e9/BIH/MDC. Leaders of independent research groups (PIs) can apply for a new cluster work group as follows:
Important
Changes to an existing group (adding new users, changes in resources, etc.) can only be requested by group leaders and delegates.
"},{"location":"admin/getting-access/#form-new-group","title":"Form: New Group","text":"Example values are given in curly braces.
# Group \"ag-{doe}\"\nGroup leader/PI: {John Doe}\nDelegate [optional]: {Max Mustermann}\nPurpose of cluster usage [short]: {RNA-seq analysis in colorectal cancer}\n\nRequired resources:\n- Tier 1 storage: {1 TB}\n- Tier 1 scratch: {10 TB}\n- Tier 2 storage: {10 TB}\n\n# Users\n## User 1\n- first name: {John}\n- last name: {Doe}\n- affiliation: {Charit\u00e9, Department of Oncology}\n- institute email: {john.doe@charite.de}\n- user has account with\n - [ ] BIH\n - [x] Charite\n - [ ] MDC\n- BIH/Charit\u00e9/MDC user name: {doej}\n\n## User 2\n[etc.]\n
"},{"location":"admin/getting-access/#form-add-user-to-group","title":"Form: Add User to Group","text":"Example values are given in curly braces.
# New user of AG {Doe}\n- first name: {Mia}\n- last name: {Smith}\n- affiliation: {Charit\u00e9, Department of Oncology}\n- institute email: {mia.smith@charite.de}\n- user has account with\n - [ ] BIH\n - [x] Charite\n - [ ] MDC\n- BIH/Charit\u00e9/MDC user name: {smithm}\n
Notes
Projects are secondary user groups to enable:
Project creation can be initiated by group leaders and group delegates as follows:
Important
Changes to an existing project (adding new users, changes in resources, etc.) can only be requested by project owners and delegates. Please send us cluster user names for adding new project members.
"},{"location":"admin/getting-access/#form","title":"Form","text":"Example values are given in curly braces.
# Project \"{doe-dbgap-rna}\"\nProject owner: {John Doe}, {doej_c}\nDelegate [optional]: {Max Mustermann}, {musterm_c}\nPurpose of cluster usage [short]: {RNA-seq data from dbGAP}\n\nRequired resources:\n- Tier 1 work: {0 TB}\n- Tier 1 scratch: {0 TB}\n- Tier 2 storage: {1 TB}\n\nAdditional members (cluster user names):\n- {sorgls_c}\n- ...\n
Notes
This page documents the current and known upcoming maintenance windows.
"},{"location":"admin/maintenance/#login-compute-and-storage-maintenance-december-13-14-2022","title":"Login, Compute and Storage Maintenance, December 13-14, 2022","text":"All informationand updates regarding maintenance will be circulated on our forum https://hpc-talk.cubi.bihealth.org/c/announcements/5.
"},{"location":"admin/maintenance/#login-compute-and-storage-maintenance-march-22-23-2022","title":"Login, Compute and Storage Maintenance, March 22-23, 2022","text":"All COMPUTE nodes and STORAGE resources won't be reachable!
All nodes will be running in RESERVATION mode. This means you are still able to schedule new jobs on these nodes if their potential/allowed runtime does not extend into the maintenance window (Tuesday and Wednesday, March 22 and 23, all-day). For example, if you submit a job that can run up to 7 days after March 15 then the job will remain in \"pending/PD\" state giving the explanation of \"all nodes being reserved or unavailable\".
Issues of today's maintenance:
/tmp
on login nodescephfs-2
switches (Tier 2 storage, not relevant for most users)IMPORTANT
Progress Thread on hpc-talk
"},{"location":"admin/maintenance/#drmaa-deprecation-march-2-2022","title":"DRMAA Deprecation, March 2, 2022","text":"scontrol show job JOBID
and sacct -j JOBID
.snakemake --profile=cubi-v1
instead of snakemake --drmaa \"...\"
.rule myrule:\n # ...\n threads: 8\n resources:\n time=\"12:00:00\",\n memory=\"8G\",\n # ...\n
SchedulerParameters+=bf_max_job_user=50
: backfill scheduler only considers 50 jobs of each user. This mitigates an issue with some users having too many jobs and thus other users' jobs don't get ahead in the queueEnforcePartLimits=ALL
: jobs that don't fit into their partition are rejectedDependencyParameters=kill_invalid_depend
: jobs that have dependencies set that cannot be fulfilled will be killedlocaltmp
Resource, January 31, 2022","text":"localtmp
resource for local storage above 100MB./tmp
using Linux namespaces/cgroups. This greatly improves the reliability of cleaning up after jobs. (Technically, this is implemented using the Slurm job_container/tmpfs) plugin.Gres
) \"localtmp\". In the future this will become a requirement. Also see Slurm: Temporary Files.hpc-login-1.cubi.bihealth.org
hpc-login-2.cubi.bihealth.org
hpc-portal.cubi.bihealth.org
hpc-transfer-1.cubi.bihealth.org
hpc-transfer-2.cubi.bihealth.org
hpc-gpu-{5..7}
. 28.08.5
.The GPFS storage system has been upgraded to the latest version to make it compatible with Enterprise Linux version 8.
"},{"location":"admin/maintenance/#slurm-upgrade-to-21080-september-8-2021","title":"Slurm upgrade to21.08.0
, September 8, 2021","text":"Slurm has been upgraded to version 21.08.0
.
All servers/nodes won't be reachable!
All nodes will be running in reservation mode. This means you are still able to schedule new jobs on these nodes if their potential/allowed runtime does not extend into the maintenance window (Tuesday and Wednesday, September 7 and 8, all-day). For example, if you submit a job that can run up to 7 days after August 30 then the job will remain in \"pending/PD\" state giving the explanation of \"all nodes being reserved or unavailable\".
If you already have a job running on any nodes that goes beyond September 7, 12:00 am (00:00 Uhr), this job will die.
"},{"location":"admin/maintenance/#renaming-of-gpu-high-memory-machines-scheduler-changes-september-7-2021","title":"Renaming of GPU & High Memory Machines & Scheduler Changes, September 7, 2021","text":"The GPU machines med030[1-4]
have been renamed to hpc-gpu-[1-4]
. The high memory machines med040[1-4]
have been renamed to hpc-mem-[1-4]
. It will probably take us some time to update all places in the documentation.
Further, the long
partition has been changed to allow jobs with a maximum running time of 14 days.
staging
partition, August 31, 2021","text":"We have installed 36 new nodes (in BETA mode) in the cluster called hpc-node-[1-36]
. They have 48 cores (thus 96 hardware threads) each and have 360GiB of main memory available (for the hardware nerds, it's Intel(R) Xeon(R) Gold 6240R CPUs at 2.40GHz, featuring the cascadelake
architecture).
Right now, they are only available in the staging
partition. After some testing we will move them to the other partitions. We'd like to ask you to test them as well and report any issues to hpc-helpdesk@bih-charite.de. The nodes have been setup identically to the existing med0xxx
nodes. We do not expect big changes but the nodes might not be as stable as other ones.
Here is how you can reach them.
hpc-login-1 # srun --immediate=5 --pty --time=24:00:00 --partition=staging bash -i\n[...]\nhpc-cpu-1 #\n
Note that I'm specifying a maximal running time of 24h so the scheduler will end the job after 24 hours which is before the upcoming maintenance reservation begins. By default, the scheduler allocates 28 days to the job which means that the job cannot end before the reservation and will be scheduled to start after it. See Reservations / Maintenances for more information about maintenance reservations.
"},{"location":"admin/maintenance/#reservation-maintenance-display-on-login-august-30-2021","title":"Reservation / Maintenance Display on Login, August 30, 2021","text":"User will now be notified on login about maintenance, for example:
NOTE: scheduled maintenance(s)\n\n 1: 2021-09-07 00:00:00 to 2021-09-09 00:00:00 ALL nodes\n\nSlurm jobs will only start if they do not overlap with scheduled reservations.\nMore information:\n\n - https://bihealth.github.io/bih-cluster/slurm/reservations/\n - https://bihealth.github.io/bih-cluster/admin/maintenance/\n
"},{"location":"admin/maintenance/#update-to-job-sumission-script-august-23-2021","title":"Update to Job Sumission Script, August 23, 2021","text":"The srun
command will now behave as if --immediate=60
has been specified by default. It explains how to override this behaviour and possible reasons for job scheduling to fail within 60 seconds (reservations and full cluster).
We upgraded from 20.11.2
to 20.11.8
which contains some fixes for bugs that our users actually stumbled over. The change should be non-intrusive as it's only a patch-level update.
The following servers won't be reachable:
These nodes are running in reservation mode now. This means you are still able to schedule new jobs on these nodes if their potential/allowed runtime does not extend into the maintenance window (Tuesday, August 3, all-day). For example, if you submit a job that can run up to 7 days after July 26 then the job will remain in \"pending/PD\" state giving the explanation of \"all nodes being reserved or unavailable\". If you have a job running on any of the aforementioned nodes that goes beyond August 3, 12:00 am (00:00 Uhr), this job will die. We do not expect the remaining nodes to be affected. However, there remains a minor risk of unexpected downtime of other nodes.
"},{"location":"admin/maintenance/#server-reorganization-july-13-2021","title":"Server reorganization, July 13, 2021","text":"Affected servers are:
If you have a job running on any of the aforementioned nodes that goes beyond June 22, 6am, this job will die. We have put a so-called Slurm reservation in place for the maintenance period. Any job that is scheduled before the maintenance and whose end time (start time + max running time) is not before the start of the maintenance will not be scheduled with the message ReqNodeNotAvail, Reserved for maintenance.
Affected servers are:
HPC 4 Research
Note
This task is currently being planned. No schedule has been fixed yet.
Note
This task is currently being planned. No schedule has been fixed yet.
/fast
that currently points to /data/gpfs-1
on HPC 4 Research./data
instead of /fast
everywhere, e.g., /data/users/$NAME
etc.Time: 6am-12am
/fast
file system will be re-mounted to /data/gpfs-1
./fast
becomes a symbolic link to /data
on all of the cluster.hpc-login-1.cubi.bihealth.org
and login-2...
instead of hpc-login-{1,2}
.hpc-transfer-{1,2}
which will be replaced by transfer-1.research.hpc.bihealth.org
and transfer-2...
.med010[1-3]
and med012[5-6]
.On June 3, we need to perform a network maintenance at 8 am.
If everything goes well, there might be a short delay in network packets and connections will survive. In this case, the maintenance will end at 8:30 am.
Otherwise, the maintenance will finish by noon.
"},{"location":"admin/maintenance/#cluster-maintenance-with-downtime-june-16","title":"Cluster Maintenance with Downtime: June 16","text":"We need to schedule a full cluster downtime on June 16.
"},{"location":"admin/maintenance/#slurm-migration","title":"Slurm Migration","text":"We will switch to the Slurm workload scheduler (from the legacy SGE). The main reason is that Slurm allows for better scheduling of GPUs (and has loads of improvements over SGE), but the syntax is a bit different. Currently, our documentation is in an transient state. We are currently extending our Slurm-specific documentation.
SSH Key Management has switched to using Charite and MDC ActiveDirectory servers. You need to upload all keys by the end of April 2020.
Schedule
Feb 4, 2020:
Keys are now also taken from central MDC/Charite servers. You do not need to contact us any more to update your keys (we cannot accelerate the process at MDC).May 1, 2020:
Keys are now only taken from central MDC/Charite servers. You must upload your keys to central servers by then.Affected systems:
hpc-transfer-1
hpc-transfer-2
hpc-login-2
The compute nodes are non-critical as we are taking them out of the queues now.
"},{"location":"admin/maintenance/#centos-76-upgrade-january-29-february-5","title":"CentOS 7.6 Upgrade, January 29, February 5","text":"Starting monday 03.09.2018 we will be performing rolling update of the cluster from CentOS 7.4 to CentOS 7.5. Since update will be performed in small bunches of nodes, the only impact you should notice is smaller number of nodes available for computation.
Also, for around two weeks, you can expect that your jobs can hit both CentOS 7.4 & CentOS 7.5 nodes. This should not impact you in any way, but if you encounter any unexpected behavior of the cluster during this time, please let us know.
At some point we will have to update the transfer and login nodes. We will also do this in parts, so that you can switch to the other machine.
Key dates are:
18.09.2018 - hpc-login-1 & hpc-transfer-1 will not be available, and you should switch to hpc-login-2 & hpc-transfer-2 respectively.
25.09.2018 - hpc-login-2 & hpc-transfer-2 will not be available, and you should switch to hpc-login-1 & hpc-transfer-1 respectively.
Please also note that the non-invasive maintenance announced for this weekend has been canceled, so the cluster will operate normally.
In case of any concerns or issues, do not hesitate to contact us via hpc-admin@bih-charite.de, or hpc-helpdesk@bih-charite.de.
"},{"location":"admin/maintenance/#june-18-2018-0600-1500","title":"June 18, 2018, 0600-1500","text":"Due to tasks we need to perform on BIH cluster, we have planned maintenance:
During maintenance we will perform several actions:
During maintenance the whole cluster will not be usable, this includes:
The maintenance window is quite long, since we are dependent on an external vendor. However, we will recover services as soon as possible.
We will keep you posted on the status of services during maintenance.
"},{"location":"admin/maintenance/#march-16-18-2018-mdc-it","title":"March 16-18, 2018 (MDC IT)","text":"MDC IT has a network maintenance from Friday, March 16 18:00 hours until Sunday March 18 18:00 hours.
This will affect connections to the cluster but no connections within the cluster.
"},{"location":"admin/maintenance/#january-17-2018-complete","title":"January 17, 2018 (Complete)","text":"STATUS: complete
The first aim of this window is to upgrade the cluster to CentOS 7.4 to patch against the Meltdown/Spectre vulnerabilities. For this, the login and transfer nodes have to be rebooted.
The second aim of this window is to reboot the file server to mitigate some NFS errors. For this, the SGE master has to be stopped for some time.
"},{"location":"admin/maintenance/#planprogress","title":"Plan/Progress","text":"(since January 2010)
This page describes strictly enforced policies valid on the BIH HPC clusters.
The aim of the HPC systems is to support the users in their scientific work and relies on their cooperation. First and foremost, the administration team enforces state of the art IT security and reliability practices through their organizational and operational processes and actions. We kindly ask users to follow the Cluster Etiquette described below to allow for fair use and flexible access to the shared resources. Beyond this, policies are introduced or enforced only when required to ensure non-restrictive access to the resources themselves. Major or recurrent breaches of policies may lead to exclusion from service.
We will update this list of policies over time. Larger changes will be announced through the mailing list.
"},{"location":"admin/policies/#cluster-etiquette","title":"Cluster Etiquette","text":"getent passswd $USER
to find out the user's office contact details).conda
, archive management tools such as tar
, (un)zip
, or gzip
. You should probably only run screen
/tmux
and maybe a text editor there.hpc-transfer-1
and hpc-transfer-2
.In the case of violations marked with a shield () administration reserves the right to remove write and possibly read permission to the given locations. Policies marked with a robot () are automatically enforced.
home
, work
, and scratch
volume). You can request an increase by an email to hpc-helpdesk@bih-charite.de for groups and projects.home
10k files, 1GB spacework
2M files, 1TB spacescratch
20M files, 200TB spacehpc-users
and mode is u=rwx,go=
; POSIX ACLs are prohibited. This policy is automatically enforced every 5 minutes.u=rwx,g=rwxs,o=
; POSIX ACLs are prohibited. This policy is automatically enforced every 5 minutes.scratch/BIH_TRASH
after 14 days (by mtime
) over night. Trash directories will be removed after 14 further days.touch
on files in scratch
and subsequently bumping the mtime
./tmp
). In the case that users need to delete files that they can access but not update/delete, administration will either give write permissions to the Unix group of the work group or project or change the owner to the owner/delegate of this group. This can occur in a group/project directory of a user who has left the organization. In the case that a user leaves the organization, the owner/delegate of the hosting group can request getting access to the user's files with the express agreement of this user./tmp
in Slurm-controlled jobs. This will enforce that Slurm can clean up after you.Network connections are a topic important in security. In the case of violations marked with a shield () administration reserves the right to terminate connections without notice and perform other actions.
screen
and tmux
are only allowed to run on the head nodes. They will be terminated automatically on the compute nodes.srun
).~/.ssh/authorized_keys
file but their usage is discouraged.~/.bashrc
Guide","text":"You can find the current default content of newly created user homes in /etc/skel.bih
:
hpc-login-1:~$ head /etc/skel.bih/.bash*\n==> /etc/skel.bih/.bash_logout <==\n# ~/.bash_logout\n\n==> /etc/skel.bih/.bash_profile <==\n# .bash_profile\n\n# Get the aliases and functions\nif [ -f ~/.bashrc ]; then\n . ~/.bashrc\nfi\n\n# User specific environment and startup programs\n\nPATH=$PATH:$HOME/.local/bin:$HOME/bin\n\n==> /etc/skel.bih/.bashrc <==\n# .bashrc\n\n# Source global definitions\nif [ -f /etc/bashrc ]; then\n . /etc/bashrc\nfi\n\n# Uncomment the following line if you don't like systemctl's auto-paging feature:\n# export SYSTEMD_PAGER=\n
"},{"location":"best-practice/env-modules/","title":"Custom Environment Modules","text":"This document contains a few tips for helping you using environment modules more effectively. As the general online documentation is lacking a bit, we also give the most popular commands here.
"},{"location":"best-practice/env-modules/#how-does-it-work","title":"How does it Work?","text":"Environment modules are descriptions of software packages. The module
command is provided which allows the manipulation of environment variables such as PATH
, MANPATH
, etc., such that programs are available without passing the full path. Environment modules also allow specifying dependencies between packages and conflicting packages (e.g., when the same binary is available in two packages). Further, environment modules allow the parallel installation of different software versions and then using software \"a la carte\" in your projects.
List currently loaded modules:
$ module list\n
Show all available modules
$ module avail\n
"},{"location":"best-practice/env-modules/#loadingunloading-modules","title":"Loading/Unloading Modules","text":"Load one module, make sure to use a specific version to avoid ambiguities.
$ module load Jannovar/0.16-Java-1.7.0_80\n
Unload one module
$ module unload Jannovar\n
Unload all modules
$ module purge\n
"},{"location":"best-practice/env-modules/#getting-help","title":"Getting Help","text":"Get help for environment modules
$ module help\n
Get help for a particular environment module
$ module help Jannovar/0.16-Java-1.7.0_80\n
"},{"location":"best-practice/env-modules/#using-your-own-module-files","title":"Using your own Module Files","text":"You can also create your own environment modules. Simply create a directory with module files and then use module use
for using the modules from the directory tree.
$ module use path/to/modules\n
"},{"location":"best-practice/env-modules/#faq-why-bash-module-command-not-found","title":"FAQ: Why -bash: module: command not found
?","text":"On the login nodes, the module
command is not installed. You should not run any computations there, so why would you need environment modules there? ;)
meg-login2$ module\n-bash: module: command not found\n
Use srun --pty bash -i
to get to one of the compute nodes.
You will certainly find yourself regularly using a set of programs that is not part of the core cluster installation, e.g., SAMtools or Python 3. Just putting the appropriate module load
lines in your ~/.bashrc
will generate warnings when logging into the login node. It is thus recommended to use the following snippet for loading modules automatically on logging into a compute node:
case \"${HOSTNAME}\" in\n *login-*)\n # Do nothing on the login nodes (e.g., hpc-login-1).\n ;;\n *)\n # load Python3 environment module\n module load Python/3.4.3-foss-2015a\n\n # Define path for temporary directories, don't forget to clean up!\n # Also, this will only work after /fast is available.\n export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp\n ;;\nesac\n
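The host name check above can be wrapped into a small helper so that it can be tried out without logging in anywhere. This is only a minimal sketch: the function name `is_login_node` is hypothetical, and the pattern assumes that login node names contain `login-` (as in `hpc-login-1`).

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the given host name looks like a login node.
# Assumption: login node names contain "login-" (e.g., hpc-login-1).
is_login_node()
{
    case "$1" in
        *login-*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Only touch the environment when we are NOT on a login node.
if is_login_node "${HOSTNAME:-$(hostname)}"; then
    echo "login node: skipping module setup"
else
    echo "compute node: module setup would go here"
fi
```

In a `~/.bashrc`, the `module load` lines would replace the second `echo`.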
"},{"location":"best-practice/project-structure/","title":"Project File System Structure","text":"Under Construction
This guide was written for the old GPFS file system and is in the process of being updated.
"},{"location":"best-practice/project-structure/#general-aims","title":"General Aims","text":"Mostly, you can separate the files in your projects/pipelines into one of the following categories:
Ideally, scripts and documentation are independent of a given project and can be separated from the rest. Configuration is project-dependent, small, and mostly does not contain any sensitive information (such as genotypes that allow for reidentification of donors). Data might be large and, in most cases, is either also stored elsewhere or can easily be regenerated using the scripts and configuration.
There is no backup of work
and scratch
The cluster GPFS file system /fast
is not appropriate for keeping around single \"master\" copies of data. You should have a backup and archival strategy for your valuable \"master\" copy data.
In addition, you might need project-specific \"wrapper\" scripts that just call your project-independent script with the correct paths for your project. These scripts rather fall into the \"configuration\" category and should then live together with your configuration.
"},{"location":"best-practice/project-structure/#data","title":"Data","text":"Temporary files
You really should keep temporary files in a temporary directory, set the environment variable TMPDIR
appropriately and automatically clean them up (see Useful Tips: Temporary Files)
But how can we put this into practice? Below, we give some examples of how to do this. Note that for simplicity's sake we put all scripts and configuration into one directory/repository, contrary to the best practices above. This is for educational purposes only and you should strive for reusable scripts where it makes sense and separate scripts and configuration.
We will limit this to simple Bash scripts for educational purposes. You should be able to easily adapt this to your use cases.
Thus, the aim is to separate the data from the non-data part of the project such that we can put the non-data part of the project into a separate location and under version control. We call the location for non-data part of the project the home location of your project and the location for the data part of the project the work location of your project.
Overall, we have three options:
Creating the work directory and copying the input files into work/input
.
$ mkdir -p project/work/input\n$ cp /data/cephfs-1/work/projects/cubit/tutorial/input/* project/work/input\n
Creating the home space. We initialize a Git repository, properly configure the .gitignore
file and add a README.md
file.
$ mkdir -p project/home\n$ cd project/home\n$ cat <<EOF >.gitignore\n*~\n.*.sw?\nEOF\n$ cat <<EOF >README.md\n# Example Project\n\nThis is an example project with config/scripts linked into work location.\nEOF\n$ git init\n$ git add .gitignore README.md\n$ git commit -m 'Initial project'\n
We then create a simple script for executing the mapping step and a configuration file that gives the path to the index and the list of samples to process.
$ mkdir scripts\n$ cat <<\"EOF\" >scripts/run-mapping.sh\n#!/bin/bash\n\n# Unofficial Bash strict mode, see:\n# http://redsymbol.net/articles/unofficial-bash-strict-mode/\nset -euo pipefail\n\n# Get directory to bash file, see\n# https://stackoverflow.com/a/4774063/84349\nSCRIPTPATH=\"$( cd \"$(dirname \"$0\")\" ; pwd -P )\"\n\n# Helper function to print help to stderr.\nhelp()\n{\n >&2 echo \"Run Mapping Step\"\n >&2 echo \"\"\n >&2 echo \"run-mapping.sh [-c config.sh] [-h]\"\n}\n\n# Parse command line arguments into bash variables.\nCONFIG=\nwhile getopts \"hc:\" arg; do\n case $arg in\n h)\n help\n exit\n ;;\n c)\n CONFIG=$OPTARG\n ;;\n esac\ndone\n\n# Print the executed commands.\nset -x\n\n# Load default configuration, then load configuration file if any was given.\nsource $SCRIPTPATH/../config/default-config.sh\nif [[ -n \"$CONFIG\" ]]; then\n source \"$CONFIG\"\nfi\n\n# Create output directory.\nmkdir -p output\n\n# Actually perform the mapping. This assumes that you have\n# made the bwa and samtools commands available, e.g., using conda.\nfor sample in $SAMPLES; do\n bwa mem \\\n $BWA_INDEX \\\n input/${sample}_R1.fq.gz \\\n input/${sample}_R2.fq.gz \\\n | samtools sort \\\n -o output/${sample}.bam \\\n /dev/stdin\ndone\n\nEOF\n$ chmod +x scripts/run-mapping.sh\n$ mkdir -p config\n$ cat <<\"EOF\" >config/default-config.sh\nBWA_INDEX=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/hs37d5/hs37d5.fa\nSAMPLES=\nEOF\n$ cat <<\"EOF\" >config/project-config.sh\n# BWA_INDEX comes from default configuration already\nSAMPLES=test\nEOF\n
This concludes the basic project setup. Now, to the symlinks:
$ cd ../work\n$ ln -s ../home/scripts ../home/config .\n
And, to the execution...
$ ./scripts/run-mapping.sh -c config/project-config.sh\n[...]\n
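The symlink layout of this option can be checked with a dry run that only creates directories and links. The following is a minimal sketch in a throw-away directory; the variable names `root`, `scripts_target`, and `config_target` are made up for illustration, and no mapping is performed.

```shell
#!/bin/bash

# Build the layout in a throw-away directory (paths are illustrative only).
root=$(mktemp -d)
mkdir -p "$root/project/home/scripts" "$root/project/home/config" \
         "$root/project/work/input"

# Link scripts and configuration from the home location into the work location.
cd "$root/project/work" || exit 1
ln -s ../home/scripts ../home/config .

# Resolve the links to confirm that they point back into the home location.
scripts_target=$(cd scripts && pwd -P)
config_target=$(cd config && pwd -P)
echo "scripts -> $scripts_target"
echo "config  -> $config_target"
```

Because the links are relative, the whole `project` directory can be moved without breaking them.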
"},{"location":"best-practice/project-structure/#example-link-data-into-home-option-21","title":"Example: Link Data Into Home (Option 2.1).","text":"We can reuse the project up to the statement \"This concludes the basic project setup\" in the example for option 1.
Then, we can do the following:
$ cd ../work\n$ mkdir -p output\n\n$ cd ../home\n$ cat <<\"EOF\" >>.gitignore\n\n# Ignore all data\ninput/\nwork/\noutput/\nEOF\n$ git add .gitignore\n$ git commit -m 'Ignoring data file in .gitignore'\n$ ln -s ../work/input ../work/output .\n
And we can execute everything in the home directory.
$ ./scripts/run-mapping.sh -c config/project-config.sh\n[...]\n
"},{"location":"best-practice/project-structure/#example-wrapper-scripts-in-home-option-22","title":"Example: Wrapper Scripts in Home (Option 2.2)","text":"Again, we can reuse the project up to the statement \"This concludes the basic project setup\" in the example for option 1.
Then, we do the following:
$ cd ../work\n$ cat <<\"EOF\" >do-run-mapping.sh\n#!/bin/bash\n\n../home/scripts/run-mapping.sh \\\n -c ../home/config/project-config.sh\nEOF\n$ chmod +x do-run-mapping.sh\n
Note that the do-run-mapping.sh
script could also go into the project-specific Git repository and be linked into the work directory.
Finally, we can run our pipeline:
$ cd ../work\n$ ./do-run-mapping.sh\n[...]\n
"},{"location":"best-practice/screen-tmux/","title":"Screen and Tmux Best Pratice","text":"The program screen
allows you to detach your session from your current login session. So in case you get disconnected your screen session will stay alive.
Hint
You have to reconnect to screen on the machine on which you started it. We thus recommend starting it only on the login nodes and not on a compute node.
"},{"location":"best-practice/screen-tmux/#start-and-terminat-a-screen-session","title":"Start and terminat a screen session","text":"You start a new screen
session by
$ screen\n
When you are in a screen session you can terminate it with $ exit\n
so it's gone then."},{"location":"best-practice/screen-tmux/#detach-a-screen-session","title":"Detach a screen session","text":"If you want to detach your screen session, press Ctrl+a d
To list all your screen sessions run
$ screen -ls\n\nThere is a screen on:\n 2441.pts-1.med0236 (Detached)\n1 Socket in /var/run/screen/S-kbentel.\n
"},{"location":"best-practice/screen-tmux/#reattach-screen-session","title":"Reattach screen session","text":"To reattach a screen session run
$ screen -r screen_session_id\n
If you do not know the screen_session_id
you can get it with screen -ls
, e.g. 2441.pts-1.med0236
in the example above. You do not have to type the whole screen_session_id
, only as much as is necessary to identify it uniquely. If there is only one detached screen session, it is enough to run screen -r
Sometimes it is necessary to kill a detached screen session. This is done with the command
$ screen -X -S screen_session_id quit\n
"},{"location":"best-practice/screen-tmux/#multiple-windows-in-a-screen-session","title":"Multiple windows in a screen session","text":"It is possible to have multiple windows in a screen session. So suppose you are logged into a screen session, these are the relevant shortcuts
new win: Ctrl+a c\nnext/previous win: Ctrl+a n/p\n
To terminate a window just enter
$ exit\n
"},{"location":"best-practice/screen-tmux/#configuration-file","title":"Configuration file","text":"Here is a sensible screen configuration. Save it as ~/.screenrc
.
screenrc
"},{"location":"best-practice/screen-tmux/#fix-a-broken-screen-session","title":"Fix a broken screen session","text":"In case your screen session doesn't write to the terminal correctly, i.e. the formatting of the output is broken, you can fix it by typing to the terminal:
$ tput smam\n
"},{"location":"best-practice/software-craftmanship/","title":"General Software Craftmanship","text":"Computer software, or simply software, is a generic term that refers to a collection of data or computer instructions that tell the computer how to work, in contrast to the physical hardware from which the system is built, that actually performs the work. -- Wikipedia: Software
As you will most probably never have contact with the HPC system hardware, everything you interact with on the HPC is software. All of your scripts, your configuration files, programs installed by you or administration, and all of your data.
This should also answer the question of why you should care about software and why you should try to create and use software of at least a minimal quality.
Software craftsmanship is an approach to software development that emphasizes the coding skills of the software developers themselves. -- Wikipedia: Software Craftsmanship
This Wiki page is not meant to give you an introduction to creating good software but rather to collect a (growing) list of easy-to-use and high-impact points to improve software quality. Also, it provides pointers to resources elsewhere on the internet.
"},{"location":"best-practice/software-craftmanship/#use-version-control","title":"Use Version Control","text":"Use a version control system for your configuration and your code. Full stop. Modern version control systems are Git and Subversion.
Every user should have their own Git/Subversion checkout. Otherwise you are inviting a large number of problems.
"},{"location":"best-practice/software-craftmanship/#document-your-code","title":"Document Your Code","text":"This includes
Document where you got things from, how to re-download them, etc. E.g., put a README file into each of your top-level data directories.
"},{"location":"best-practice/software-craftmanship/#use-checksums","title":"Use Checksums","text":"Use MD5 or other checksums for your data. For example, md5sum
and hashdeep
are useful utilities for computing and checking them:
md5sum
How-To (tools such as sha256sum
work the same...)hashdeep
How-ToUse some system for managing your workflows. These systems support you by
Snakemake is a popular workflow management system widely used in Bioinformatics. A minimal approach is using Makefiles.
"},{"location":"best-practice/software-craftmanship/#understand-bash-and-shell-exit-codes","title":"Understand Bash and Shell Exit Codes","text":"If you don't want to use a workflow management system, e.g., for one-step jobs, you should at least understand Bash job management and exit codes. For example, you can use if/then/fi
in Bash together with exit codes to:
if [[ ! -e file.md5 ]]; then\n md5sum file >file.md5 \\\n || rm -f file.md5\nfi\n
Also, learn about the unofficial Bash strict mode.
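As a sketch of how exit codes can guard follow-up steps, the following self-contained example writes a checksum only if `md5sum` succeeds and then verifies it with `md5sum -c`. It assumes GNU coreutils; the file names and the temporary working directory are made up for illustration.

```shell
#!/bin/bash

# Work in a throw-away directory; all file names here are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "example data" >file

# Create the checksum file only if it does not exist yet; remove the
# (possibly incomplete) checksum file again if md5sum failed.
if [[ ! -e file.md5 ]]; then
    md5sum file >file.md5 \
    || rm -f file.md5
fi

# md5sum -c exits with 0 only if the file matches its recorded checksum.
if md5sum -c --quiet file.md5; then
    status=ok
else
    status=corrupt
fi
echo "checksum status: $status"
```

The same `if command; then ... else ... fi` pattern works for any program that reports failure through a non-zero exit code.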
"},{"location":"best-practice/software-installation-with-conda/","title":"Software Installation with Conda","text":""},{"location":"best-practice/software-installation-with-conda/#conda","title":"Conda","text":"Users do not have the rights to install system packages on the BIH HPC cluster. For the management of bioinformatics software we therefore recommend using the conda package manager. Conda provides software in different \u201cchannels\u201d and one of those channels contains a huge selection of bioinformatics software (bioconda). Generally packages are pre-compiled and conda just downloads the binaries from the conda servers.
You are in charge of managing your own software stack, but conda makes it easy to do so. We will provide you with a description on how to install conda and how to use it. Of course there are many online resources that you can also use. Please find a list at the end of the document.
Also note that some system-level software is managed through environment modules.
"},{"location":"best-practice/software-installation-with-conda/#premise","title":"Premise","text":"When you logged into the cluster, please make sure that you also executed srun
to log into a computation node and perform the software installation there.
hpc-login-1:~$ srun --mem=5G --pty bash -i\nhpc-cpu-123:~$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\nhpc-cpu-123:~$ bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/work/miniconda\nhpc-cpu-123:~$ eval \"$($HOME/work/miniconda/bin/conda shell.bash hook)\"\nhpc-cpu-123:~$ conda init\nhpc-cpu-123:~$ conda config --set auto_activate_base false\n
This will install conda to $HOME/work/miniconda
. You can change the path to your liking, but please note that your $HOME
folder has limited space. The work
subfolder however has a bigger quota. More about this here.
To make bioinformatics software available, we have to add the bioconda
and some other channels to the conda configuration:
hpc-cpu-123:~$ conda config --add channels bioconda\nhpc-cpu-123:~$ conda config --add channels defaults\nhpc-cpu-123:~$ conda config --add channels conda-forge\n
"},{"location":"best-practice/software-installation-with-conda/#installing-software-with-conda","title":"Installing software with conda","text":"Installing packages with conda is straight forward:
hpc-cpu-123:~$ conda install <package>\n
This will install a package into the conda base environment. We will explain environments in detail in the next section. To search for a package, e.g. to find the correct name in conda or if it exists at all, issue the command:
hpc-cpu-123:~$ conda search <string>\n
To choose a specific version (by default, conda installs the latest version that is compatible with the currently installed Python version), you can provide the version as follows:
hpc-cpu-123:~$ conda install <package>=<version>\n
Please note that new conda installs may ship with a recently updated Python version and not all packages might have been adapted. E.g., if you find out that some packages don't work after starting out with or upgrading to Python 3.8, simply try to downgrade Python to 3.7 with conda install python=3.7
.
Hint
As resolving the dependency tree of an installation candidate can take a lot of time in Conda, especially when you are installing software from an environment.yaml
file, an alternative resolver called mamba is available that you can use to install software into your Conda environment. The time savings are immense: an installation that took more than an hour can be resolved in seconds.
Simply run
hpc-cpu-123:~$ conda install mamba\n
With that, you can install software into your environment using the same syntax as for Conda:
hpc-cpu-123:~$ mamba install <package>\n
"},{"location":"best-practice/software-installation-with-conda/#creating-an-environment","title":"Creating an environment","text":"Conda lets you create environments, such that you can test things in a different environment or group your software. Another common use case is to have different environments for the different Python versions. Since conda is Python-based, conflicting packages will mostly struggle with the Python version.
By default, conda will install packages into its root environment. Please note that software that does not depend on Python and is installed in the root environment is available in all other environments.
To create a Python 2.7 environment and activate it, issue the following commands:
hpc-cpu-123:~$ conda create -n py27 python=2.7\nhpc-cpu-123:~$ conda activate py27\n(py27) hpc-cpu-123:~$\n
From now on, conda will install packages into the py27
environment when you issue the install
command. To switch back to the root environment, simply deactivate the py27
environment:
(py27) hpc-cpu-123:~$ conda deactivate\nhpc-cpu-123:~$\n
But of course, as Python 2.7 is not supported any more by the Python Software Foundation, you should switch over to Python 3 already!
"},{"location":"best-practice/temp-files/","title":"Temporary Files","text":"Temporary Files and Slurm
See Slurm: Temporary Files for information on how Slurm controls access to local temporary storage.
Often, it is necessary to use temporary files, i.e., write something out in the middle of your program, read it in again later, and then discard these files. For example, samtools sort
has to write out chunks of sorted read alignments so that it can sort files larger than main memory.
TMPDIR
","text":"Traditionally, in Unix, the environment variable TMPDIR
is used for storing the location of the temporary directory. When undefined, usually /tmp
is used.
Generally, there are two locations where you could put temporary files:
/data/cephfs-1/home/users/$USER/scratch/tmp
-- inside your scratch folder on the CephFS file system; this location is available from all cluster nodes/tmp
-- on the local node's temporary folder; this location is only available on the node itself. The Slurm scheduler uses Linux namespaces such that every job gets its own private /tmp
even when run on the same node.scratch/tmp
","text":"Use CephFS-based TMPDIR
Generally, set up your environment to use /data/cephfs-1/home/users/$USER/scratch/tmp
as filling the local disk of a node with forgotten files can cause a lot of problems.
Ideally, you append the following to your ~/.bashrc
to use /data/cephfs-1/home/users/$USER/scratch/tmp
as the temporary directory. This will also create the directory if it does not exist. Further, it will create one directory per host name which prevents too many entries in the temporary directory.
export TMPDIR=$HOME/scratch/tmp/$(hostname)\nmkdir -p $TMPDIR\n
Prepending this to your job scripts is also recommended as it will ensure that the temporary directory exists.
"},{"location":"best-practice/temp-files/#tmpdir-and-the-scheduler","title":"TMPDIR
and the scheduler","text":"In the older nodes, the local disk is a relatively slow spinning disk; in the newer nodes, it is a relatively fast SSD. Further, the local disk is independent of the CephFS file system, so I/O volume to it does not affect the network or any other job on other nodes. Please note that by default, Slurm will not change your environment variables. This includes the environment variable TMPDIR
.
Slurm will automatically remove temporary files in a job's /tmp
on the local file system when the job terminates. To automatically clean up temporary directories on the shared file system, use the following tip.
You can use the following code at the top of your job script to set TMPDIR
to the location in your home directory and get the directory automatically cleaned when the job is done (regardless of successful or erroneous completion):
# First, point TMPDIR to the scratch in your home as mktemp will use this\nexport TMPDIR=$HOME/scratch/tmp\n# Second, create another unique temporary directory within this directory\nexport TMPDIR=$(mktemp -d)\n# Finally, set up the cleanup trap\ntrap \"rm -rf $TMPDIR\" EXIT\n
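To see the effect of the cleanup trap without submitting a job, the same three steps can be run in a subshell. This is a minimal sketch that uses a throw-away base directory created with `mktemp` instead of `$HOME/scratch/tmp`; the variable names `base` and `job_tmp` are only used for illustration.

```shell
#!/bin/bash

# Stand-in for $HOME/scratch/tmp so that the example does not touch your home.
base=$(mktemp -d)

# The subshell plays the role of the job script; its EXIT trap fires when the
# subshell ends. It prints the directory it used so that we can inspect it.
job_tmp=$(
    export TMPDIR=$base
    TMPDIR=$(mktemp -d)
    export TMPDIR
    # Single quotes: $TMPDIR is expanded when the trap runs, not when it is set.
    trap 'rm -rf "$TMPDIR"' EXIT
    touch "$TMPDIR/intermediate.txt"
    echo "$TMPDIR"
)

if [[ -d "$job_tmp" ]]; then
    echo "temporary directory still exists: $job_tmp"
else
    echo "temporary directory was cleaned up: $job_tmp"
fi

# The base directory is empty again, so rmdir succeeds.
rmdir "$base"
```

The trap also fires when the subshell ends with an error, which is what makes this pattern useful in job scripts.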
"},{"location":"connecting/connecting-windows/","title":"Connecting via SSH on Windows","text":""},{"location":"connecting/connecting-windows/#install-ssh-client-for-windows","title":"Install SSH Client for Windows","text":"We recommend to use the program MobaXterm on Windows. MobaXterm is a software that allows you to connect to an SSH server, much like PuTTy, but also maintains your SSH key.
Alternative SSH Clients for Windows
For transferring data from/to Windows, we recommend using WinSCP. Install the latest version from here: https://winscp.net/eng/download.php
On the Login
screen of WinSCP create a new login by selecting New Site
.
Fill in the following parameters:
File protocol
: SFTP
Host name
: hpc-transfer-1.cubi.bihealth.org
or hpc-transfer-2.cubi.bihealth.org
User name
: your user nameGo to Advanced
> SSH
> Authentication
> Authentication parameters
> Private key file
and select your private ssh key file (in .ppk
format).
Press Ok
then Save
.
Press Login
to connect. It will ask for your private key passphrase, if you set one up.
If you need to convert your private ssh key file to the .ppk
format, on the WinSCP login screen go to Tools
> PuTTYgen
and follow the steps here: https://docs.acquia.com/cloud-platform/manage/ssh/sftp-key/
Click on Session
.
Click on SSH
.
In Basic SSH settings, enter a hostname (hpc-login-X.cubi.bihealth.org
, where X
is 1 or 2), check Specify username and enter your username in the text field. Select the tab Advanced SSH settings, check Use private key and select your private SSH key file (the possible choices are described in the next two figures).
Select the id_rsa
file generated in Linux OR
select the id_rsa.ppk
file generated in Windows with MobaXterm.
Afterwards hit the OK button and MobaXterm will connect.
The session will be stored automatically and you can establish new connections later on, or also multiple ones at the same time, if you like.
"},{"location":"connecting/connecting/","title":"Connecting to HPC 4 Research","text":"HPC 4 Research is only available via the Charit\u00e9, MDC, and BIH internal networks. VPN access requires additional measures which are described in Connecting from External Networks.
There are two primary methods for interacting with BIH HPC:
This part of the documentation only describes direct console access via SSH. For information regarding the web portal, please read OnDemand Portal. In case you're not familiar with SSH, you should probably start via the web portal or (if you are determined to learn) read through our SSH basics page.
"},{"location":"connecting/connecting/#in-brief","title":"In brief","text":"Follow these steps to connect to BIH HPC via the command line:
Connect to one of the two login nodes.
# Charite Users\n$ ssh user_c@hpc-login-1.cubi.bihealth.org\n$ ssh user_c@hpc-login-2.cubi.bihealth.org\n\n# MDC Users\n$ ssh user_m@hpc-login-1.cubi.bihealth.org\n$ ssh user_m@hpc-login-2.cubi.bihealth.org\n
Hint
There are two login nodes, hpc-login-1
and hpc-login-2
. There are two for redundancy reasons. Please do not perform big file transfers or an sshfs
mount via the login nodes. For this purpose, we have hpc-transfer-1
and hpc-transfer-2
.
Please also read Advanced SSH for more custom scenarios of connecting to BIH HPC. If you are using a Windows PC to access BIH HPC, please read Connecting via SSH on Windows.
Allocate resources on a computation node using Slurm. Do not compute on the login node!
# Start interactive shell on computation node\n$ srun --pty bash -i\n
Bonus: Configure your SSH client on Linux and Mac or Windows.
tl;dr
SSH-Based Access:
# Interactive login (choose one)\nssh username@hpc-login-1.cubi.bihealth.org\nssh username@hpc-login-2.cubi.bihealth.org\nsrun --pty bash -i\n\n# File Transfer (choose one)\nscp local/file username@hpc-transfer-1.cubi.bihealth.org:remote/file\nscp username@hpc-transfer-2.cubi.bihealth.org:remote/file local/file\n\n# Interactive login into the transfer nodes (choose one)\nssh username@hpc-transfer-1.cubi.bihealth.org\nssh username@hpc-transfer-2.cubi.bihealth.org\n
Your username for accessing the cluster is composed of your username at your primary organization (Charit\u00e9/MDC) and a suffix:
<Charite username>_c -> doej_c
<MDC username>_m -> jdoe_m
Please read Connecting from External Networks
"},{"location":"connecting/connecting/#i-have-problems-connecting","title":"I have problems connecting","text":"Please read Debugging Connection Problems
"},{"location":"connecting/connection-problems/","title":"Debugging Connection Problems","text":"When you encounter problems with the login to the cluster although we indicated that you should have access, depending on the issue, here is a list of how to solve the problem:
"},{"location":"connecting/connection-problems/#im-getting-a-connection-refused","title":"I'm getting a \"connection refused\"","text":"The full error message looks as follows:
ssh: connect to host hpc-login-1.cubi.bihealth.org port 22: Connection refused\n
This means that your computer could not open a network connection to the server.
<DEST>
):ifconfig\ntraceroute <DEST>\n
ipconfig\ntracert <DEST>\n
You're logging into BIH HPC cluster! (login-1)\n\n ***Your account has not been granted cluster access yet.***\n\n If you think that you should have access, please contact\n hpc-helpdesk@bih-charite.de for assistance.\n\n For applying for cluster access, contact hpc-helpdesk@bih-charite.de.\n\nuser@login-1's password:\n
Hint
This is the most common error, and the main cause for this is a wrong username. Please take a couple of minutes to read the What is my username?!
If you encounter this message although we told you that you have access and you checked the username as mentioned above, please write to hpc-helpdesk@bih-charite.de, always indicating the message you get and a detailed description of what you did.
"},{"location":"connecting/connection-problems/#im-getting-a-passphrase-prompt","title":"I'm getting a passPHRASE prompt","text":"You're logging into BIH HPC cluster! (login-1)\n\n *** It looks like your account has access. ***\n\n Login is based on **SSH keys only**, if you are getting a password prompt\n then please contact hpc-helpdesk@bih-charite.de for assistance.\n\nEnter passphrase for key '/home/USER/.ssh/id_rsa':\n
Here you have to enter the passphrase that was used for encrypting your private key. Read SSH Basics for further information of what is going on here.
"},{"location":"connecting/connection-problems/#i-can-connect-but-i-get-a-password-prompt","title":"I can connect, but I get a passWORD prompt","text":"You're logging into BIH HPC cluster! (login-1)\n\n *** It looks like your account has access. ***\n\n Login is based on **SSH keys only**, if you are getting a password prompt\n then please contact hpc-helpdesk@bih-charite.de for assistance.\n\nuser@login-1's password:\n
This is different from the passPHRASE prompt
Please see I'm getting a passPHRASE prompt for more information.
When you encounter this message during a login attempt, there is an issue with your SSH key. In this case, please connect with increased verbosity to the cluster (ssh -vvv ...
) and mail the output and a detailed description to hpc-helpdesk@bih-charite.de.
This page describes how to connect to the BIH HPC from external networks (e.g., another university or from your home). The options differ depending on your home organization and are described in detail below.
Getting Help with VPN and Gateway Nodes
Please note that the VPNs and gateway nodes are maintained by the central IT departments of Charite/MDC. BIH HPC IT cannot assist you with problems with these services. Authoritative information and documentation is provided by the central IT departments as well.
SSH Key Gotchas
You should use separate SSH key pairs for your workstation, laptop, home computer etc. As a reminder, you will have to register the SSH keys with your home IT organization (MDC or Charite). When using gateway nodes, please make sure to use SSH key agents and agent forwarding (ssh
flag \"-A
\").
Use the following command to perform a proxy jump via the MDC SSH gateway (ssh1
aka jail1
) when connecting to a login node. Note that for logging into the jail, the <MDC_USER>
is required.
$ ssh -J <MDC_USER>@ssh1.mdc-berlin.de <HPC_USER>@hpc-login-1.cubi.bihealth.org\n
Note
Please note that the cluster login is independent of access to the MDC jail node ssh1.mdc-berlin.de.
You can find the instructions for getting MDC VPN access here in the MDC intranet below the \"VPN\" heading. Please contact helpdesk@mdc-berlin.de for getting VPN access.
Install the VPN client and then start it. Once VPN has been activated you can SSH to the HPC just as from your workstation.
$ ssh user_m@hpc-login-1.cubi.bihealth.org\n
"},{"location":"connecting/from-external/#charite-users","title":"Charit\u00e9 Users","text":"Access to BIH HPC from external networks (including Eduroam) requires a Charit\u00e9 VPN connection with special access permissions.
"},{"location":"connecting/from-external/#general-charite-vpn-access","title":"General Charit\u00e9 VPN Access","text":"You need to apply for general Charit\u00e9 VPN access if you haven't done so already. The form can be found in the Charite Intranet and contains further instructions. Charit\u00e9 IT Helpdesk can help you with any questions.
"},{"location":"connecting/from-external/#zusatzantrag-b","title":"Zusatzantrag B","text":"Special permissions form B is also required for HPC access. You can find Zusatzantrag B in the Charit\u00e9 intranet. Fill it out and send it to the same address as the general VPN access form above.
Once you have been granted VPN access, start the client and connect to VPN. You will then be able to connect from your client in the VPN just as you do from your workstation.
$ ssh jdoe_c@hpc-login-1.cubi.bihealth.org\n
"},{"location":"connecting/from-external/#charite-vdi-not-recommended","title":"Charit\u00e9 VDI (Not recommended)","text":"As an alternative to Zusatzantrag B, you can also get access to the Charit\u00e9 VDI (Virtual Desktop Infrastructure). Here, you connect to a virtual desktop computer which is in the Charit\u00e9 network. From there, you can connect to the BIH HPC system.
You need to apply for extended VPN access to be able to access the BIH VDI. The form can be found here. It is important to tick Dienst(e), enter HTTPS and as target view.bihealth.org
. Please write to helpdesk@charite.de with the request to access the BIH VDI.
When the access has been set up, follow the instructions on client configuration for Windows, after logging in to the BIH VDI.
"},{"location":"connecting/ssh-basics/","title":"SSH Basics","text":""},{"location":"connecting/ssh-basics/#what-is-ssh","title":"What is SSH?","text":"SSH stands for Secure Shell. It is software that allows you to establish a user connection to a remote UNIX/Linux machine over the network and remote-control it from your local workstation.
Let's say you have an HPC cluster with hundreds of machines somewhere in a remote data-center and you want to connect to those machines to issue commands and run jobs. Then you would use SSH.
"},{"location":"connecting/ssh-basics/#getting-started","title":"Getting Started","text":""},{"location":"connecting/ssh-basics/#installation","title":"Installation","text":"Simply install your distribution's openssh-client
package. You should be able to find plenty of good tutorials online. On Windows you can consider using MobaXterm (recommended) or Putty.
Let's call your local machine the client and the remote machine you want to connect to the server.
You will usually have some kind of connection information, like a hostname, IP address and perhaps a port number. Additionally, you should also have received your user-account information stating your user-name, your password, etc.
Follow the instructions below to establish a remote terminal-session.
If you are on Linux
Open a terminal and issue the following command while replacing all the <...>
fields with the actual data:
# default port\nssh <username>@<hostname-or-ip-address>\n\n# non-default port\nssh <username>@<hostname-or-ip-address> -p <port-number>\n
If you are on Windows
Start putty.exe
, go into the Session
category and fill out the form, then click the Connect
button. Putty also allows you to save the connection information in different profiles so you don't have to memorize and retype all fields every time you want to connect.
When you connect to a remote machine via SSH, you will be prompted for your password. This will happen every single time you connect and can feel a bit repetitive at times, especially if you feel that your password is hard to memorize. For those who don't want to type in their password every single time they connect, SSH keys are an alternative way of authentication.
Instead of being prompted for a password, SSH will simply use the key to authenticate. As this key file should be device specific, this also increases the security of the login process.
You can generate a new key by issuing:
client:~$ ssh-keygen -t ed25519\n\n# 1. Choose file in which to save the key *(leave blank for default)*\n# 2. Choose a passphrase of at least five characters\n
"},{"location":"connecting/ssh-basics/#how-do-ssh-keys-work","title":"How do SSH-Keys work?","text":"An SSH key consists of two files, one private and one public key. The public key is installed on remote machines and can only be validated with the matching private key, which is stored on client computers. During the login process this is achieved via public-key cryptography.
Traditionally the algorithm used for this was RSA. Recently elliptic curve cryptography has been developed as a more secure and more performant alternative. We recommend the ed25519
type of SSH key.
The security problem with SSH keys is that anyone with access to the private key has full access to all machines that have the public key installed. Losing the key or having it compromised in another way poses a serious security threat. Therefore, it is best to secure the private key with a passphrase. This passphrase is needed to unlock and use the private key.
Once you have your key-pair generated, you can easily change the passphrase of that key by issuing:
client:~$ ssh-keygen -p\n
"},{"location":"connecting/ssh-basics/#ssh-agent","title":"SSH-Agent","text":"In order to avoid having to type the passphrase of the key every time we want to use it, the key can be loaded into an SSH-Agent.
For instance, if you have connected to a login-node via Putty and want to unlock your private key in order to be able to access cluster nodes, you can configure the SSH-Agent:
client:~$ source <(ssh-agent)\n
(The above command will load the required environment variables of the SSH-Agent into your shell environment, effectively making the agent available for your consumption.)
Next, you can load your private key:
client:~$ ssh-add\n
(You will be prompted for the passphrase of the key)
You can verify that the agent is running and your key is loaded by issuing:
client:~$ ssh-add -l\n# 'l' as in list-all-loaded-keys\n
(The command should print at least one key, showing the key-size, the hash of the key-fingerprint and the location of the file in the file-system.)
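The check above can also be scripted. The following is a minimal sketch, assuming the OpenSSH tools ssh-add and ssh-agent are installed; the function name ensure_agent is made up for illustration and is not part of OpenSSH or the cluster setup. It relies on the exit status of ssh-add -l (0: keys are loaded, 1: an agent is reachable but holds no keys, 2: no agent could be contacted).

```shell
# ensure_agent: hypothetical helper, not part of OpenSSH or the cluster setup.
# Starts an agent only when none can be contacted, then reports the state.
ensure_agent() {
    status=0
    ssh-add -l >/dev/null 2>&1 || status=$?
    case "$status" in
        0) echo "agent reachable, key(s) loaded" ;;
        1) echo "agent reachable, no keys loaded yet: run ssh-add" ;;
        *) # no agent reachable: start one and import its environment variables
           eval "$(ssh-agent -s)" >/dev/null
           echo "new agent started: run ssh-add" ;;
    esac
}
```

Calling ensure_agent directly (not inside a subshell) keeps the agent's environment variables in your current shell, so a following ssh-add talks to the same agent.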
Since all home-directories are shared across the entire cluster and you created your key-pair inside your home-directory, your public-key (which is also in your home-directory) is automatically installed on all other cluster nodes, immediately. Try connecting to any cluster node. It should not prompt you for a password.
There is nothing you have to do to \"unload\" or \"lock\" the key-file. Simply disconnect.
"},{"location":"connecting/advanced-ssh/linux/","title":"Connecting via SSH on Unix","text":""},{"location":"connecting/advanced-ssh/linux/#activating-your-key-in-the-ssh-key-agent","title":"Activating your Key in the SSH Key Agent","text":"Note
The big Linux distributions automatically manage ssh-agent for you and unlock your keys at login time. If this doesn't work for you, read on.
ssh-agent
caches your SSH keys so that you do not need to type your passphrase every time it is used. Activate it by making sure ssh-agent
runs in the background and then adding your key:
$ eval \"$(ssh-agent -s)\"\n$ ssh-add\n
or if you chose a custom key name, specify the file like so:
$ ssh-add ~/.ssh/mdc_id_rsa\n
"},{"location":"connecting/advanced-ssh/linux/#macos","title":"MacOS","text":"If you run into problems that your key is not accepted when connecting from MacOS, please use:
$ ssh-add --apple-use-keychain\n
"},{"location":"connecting/advanced-ssh/linux/#configure-ssh-client","title":"Configure SSH Client","text":"You can define a personal SSH configuration file to make connecting to the cluster more comfortable and to greatly reduce the typing necessary. Add the following lines to your ~/.ssh/config
file. Replace USER_NAME
with your cluster user name. You can also adapt the Host naming as you like.
Host bihcluster\n HostName hpc-login-1.cubi.bihealth.org\n User USER_NAME\n\nHost bihcluster2\n HostName hpc-login-1.cubi.bihealth.org\n User USER_NAME\n
Now you can type the following (and you don't have to remember the host name of the login node any more).
$ ssh bihcluster\n
This configuration works if you are inside Charit\u00e9, the Charit\u00e9 VPN, or MDC.
"},{"location":"connecting/advanced-ssh/linux/#mdc-users-jail-node","title":"MDC users: Jail node","text":"If you have an MDC user account and want to connect from the outside, you can use the following ~/.ssh/config
lines to set up a ProxyJump via the MDC SSH jail.
Host mdcjail\n HostName ssh1.mdc-berlin.de\n User MDC_USER_NAME\n
Now you can run
$ ssh -J mdcjail bihcluster\n
If you are always connecting from outside the internal network, you can also add a permanent ProxyJump to the SSH configuration like so:
Host bihcluster\n HostName hpc-login-1.cubi.bihealth.org\n User USER_NAME\n ProxyJump mdcjail\n
"},{"location":"connecting/advanced-ssh/linux/#connecting-with-another-computerlaptop","title":"Connecting with another computer/laptop","text":"If you need to connect to the cluster from another computer than the one that contains the SSH keys that you submitted for the cluster login, you have two possibilities.
~/.ssh/id_rsa
) to the second computer into the same location.
Danger
Do not leave the key on any USB stick. Delete it after the file transfer. The private key is sensitive data. Make sure that the files are only readable by you.
$ cd ~/.ssh\n$ chmod go-rwx id_rsa*\n$ ssh-add id_rsa\n
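A quick way to confirm that only you can access the copied key is to inspect the permission string reported by ls. This is a minimal sketch; key_is_private is a hypothetical helper name, and the check only looks at the classic group/other permission bits (it does not evaluate ACLs).

```shell
# key_is_private: hypothetical helper. Succeeds only if neither the group
# nor other users have any permission on the given file.
key_is_private() {
    # characters 5-10 of the mode string (e.g. "-rw-------") are the
    # group and other permission bits
    perms=$(ls -ld -- "$1" 2>/dev/null | cut -c5-10)
    [ "$perms" = "------" ]
}
```

For example, `key_is_private ~/.ssh/id_rsa || echo "key is readable by others"` prints a warning when the permissions are too open.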
"},{"location":"connecting/advanced-ssh/linux/#file-system-mount-via-sshfs","title":"File System mount via sshfs","text":"$ sshfs <USERNAME>@hpc-transfer-1.cubi.bihealth.org:/ <MOUNTPOINT>\n
hpc-transfer-1:
follows the structure <host>:<directory>
starting in the user home.
<MOUNTPOINT>
must be an empty but existing and readable directory on your local computer.
Make sure you have both OSXFUSE and SSHFS installed. You can get both from here: https://osxfuse.github.io/ or the most recent version via Homebrew:
$ brew cask install osxfuse; brew install sshfs; brew link --overwrite sshfs\n
The last command is optional and overwrites any pre-existing links to older versions of sshfs. Now you can run
$ sshfs -o follow_symlinks <USERNAME>@hpc-transfer-1<X>.cubi.bihealth.org:<directory_relative_to_Cluster_root> <MOUNTPOINT> -o volname=<BIH-FOLDER> -o allow_other,noapplexattr,noappledouble\n
"},{"location":"connecting/advanced-ssh/linux/#x11","title":"X11","text":"Do you really need to run a graphical application on the cluster?
Please note that running more complex Java applications, such as IGV, may not be very efficient because of the connection speed. In most cases you can run them on your local workstation and access the cluster data by mounting it via SSHFS.
Connect to one of the login nodes using X11 forwarding:
$ ssh -X -C -t <USERNAME>@hpc-login-1.cubi.bihealth.org\n
Once you get a login prompt, you can use the srun
command with the --x11
parameter to open an X11 session on a cluster node:
$ srun --pty --x11 bash\n
And finally you can start your X11 application, e.g.:
$ xterm\n
After a while Visual Terminal should start:
"},{"location":"connecting/advanced-ssh/overview/","title":"Advanced SSH usage","text":"Here we describe custom scenarios for using SSH to connect to BIH HPC. To keep it concise, this section is divided into separate documents for
Danger
Mounting ssh on Windows is currently discouraged since relevant software is outdated (see also hpc-talk). Also, in most cases it is not really necessary to have a constant mount. For normal data transfer please use WinSCP instead.
Once WinSshFS is started, an icon will be added to your taskbar:
Left-clicking that icon will bring up a window. If not, right click the taskbar icon, select Show Manager
and click Add
in the menu.
Fill out the marked fields:
hpc-transfer-1.cubi.bihealth.org
PrivateKey
. Select the id_rsa
private key, not the .ppk
format that is provided by PuTTY. Enter the passphrase that you used to secure your key.
Then click Save
and then Mount
.
Open the explorer. A new drive with the name you gave should show up:
Finished!
"},{"location":"connecting/advanced-ssh/windows/#connecting-via-mdc-jail-node","title":"Connecting via MDC Jail Node","text":"This requires an active MDC account!
Additional to the steps above, click on the tab Network settings
.
ssh1.mdc-berlin.de
and in the field User your MDC username.
Do you really need to run a graphical application on the cluster?
Please note that running more complex Java applications, such as IGV, may not be very efficient because of the connection speed. In most cases you can run them on your local workstation and access the cluster data by mounting it via SSHFS.
Start MobaXterm. It should automatically fetch your saved Putty sessions, as shown in the screenshot below:
Connect to one of the login nodes by double-clicking the saved profile, and then use the srun --pty --x11 bash
command to start an X11 session on one of the nodes:
Finally, start your X11 application (the example below starts Visual Terminal):
"},{"location":"connecting/generate-key/linux/","title":"Generating an SSH Key in Linux","text":"~/.ssh/id_xxx.pub
is present.
$ ssh-keygen -t ed25519 -C \"your_email@example.com\"\n
What is a key passphrase?
You should set a passphrase when generating your key pair. It is used for encrypting your private key in case it is stolen or lost. When using the key for login, you will have to enter the passphrase. Many desktop environments offer ways to automatically unlock your key on login.
Read SSH Basics for more information.
The whole session should look something like this:
host:~$ ssh-keygen -t ed25519 -C \"your_email@example.com\"\nGenerating public/private ed25519 key pair.\nEnter file in which to save the key (/home/USER/.ssh/id_ed25519): \nCreated directory '/home/USER/.ssh'.\nEnter passphrase (empty for no passphrase):\nEnter same passphrase again: \nYour identification has been saved in /home/USER/.ssh/id_ed25519.\nYour public key has been saved in /home/USER/.ssh/id_ed25519.pub.\nThe key fingerprint is:\nSHA256:Z6InW1OYt3loU7z14Kmgy87iIuYNr1gJAN1tG71D7Jc your_email@example.com\nThe key's randomart image is:\n+--[ED25519 256]--+\n|.. . . o |\n|. . . + + |\n|. . = . . |\n|. . +oE. |\n|. So= o o |\n| . . . * = + + |\n| + o + B o o .|\n| oo+. .B + + . |\n|.ooooooo*. . |\n+----[SHA256]-----+\n
The file content of ~/.ssh/id_ed25519.pub
should look something like this:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFzuiaSVD2j5y6RlFxOfREB/Vbd+47ABlxF7du5160ZH your_email@example.com\n
"},{"location":"connecting/generate-key/linux/#submit-your-key","title":"Submit Your Key","text":"As a next step, you need to submit the SSH key. Use these links:
Prerequisite: Installing an SSH Client
Please install an SSH client for Windows first.
"},{"location":"connecting/generate-key/windows/#generate-the-key","title":"Generate the Key","text":"Click on Tools
and MobaKeyGen (SSH key generator)
In the section Parameters make sure to set the following properties:
RSA
(this is the SSH-2
protocol)
4096
If all is set, hit the Generate button.
During generation, move the mouse cursor around in the blank area.
When finished, make sure to protect your generated key with a passphrase. Save the private and public key. The default name under Linux for the public key is id_rsa.pub
and id_rsa
for the private key, but you can name them however you want (the .pub
is NOT automatically added). Note that in the whole cluster wiki we will use this file naming convention. Also note that the private key will be stored in Putty format (.ppk
, this extension is added automatically).
What is your key's passphrase?
You should set a passphrase when generating your private key. This passphrase is used for encrypting your private key to protect it in case the key file is stolen or lost. When using the key for login, you will have to enter it (or the first time you load it into the SSH key agent). Note that the passphrase prompt does not come from the cluster (and is thus unrelated to it) but from your local computer.
Also see SSH Basics for more information.
The gibberish in the textbox is your public key in the format in which it has to be submitted to the MDC and Charite (links for this step below). Thus, copy this text and paste it into the SSH key submission web service of your institution.
Store the private key additionally in the OpenSSH format. To do so, click Conversions
and select Export OpenSSH key
. To be consistent, give the file the same name as your .ppk
private key file above (just without the .ppk
).
To summarize, you should end up with three files:
id_rsa.pub
The public key file. It is not required if you copy and submit the SSH public key as described above and in the links below.
id_rsa.ppk
This file is only needed if you plan to use Putty.
id_rsa
This is your private key and the most important file for accessing the cluster. It will be added to the sessions in MobaXterm and WinSSHFS (if required).
As a next step, you need to submit the SSH key. Use these links:
As of February 2020, SSH key submission is no longer accepted via email. Instead, use the process outlined here.
For any help, please contact helpdesk@charite.de (as this site is maintained by Charite GB IT).
"},{"location":"connecting/submit-key/charite/#charite-zugangsportal","title":"Charite Zugangsportal","text":"Keys are submitted in the Charite Zugangsportal. As of Feb 4, you have to use the \"test\" version for this.
Go to zugang.charite.de and login.
Follow through the login page until you reach the main menu (it's tedious but we believe in you ;) Click the \"SSH Keys\" button.
Paste your SSH key (starting with ssh-rsa
) and ending with the label (usually your email, e.g., john.doe@charite.de
) into the box (1) and press append (2). By default, the key can be found in the file ~/.ssh/id_rsa.pub
in Linux. If you generated the key in Windows, please paste the copied key from the text box. Repeat as necessary. Optionally, go back to the main menu (3) when done.
If you have generated your SSH key with PuTTY, you must right-click on the ppk-file, then choose \"Edit with PuTTYgen\" in the right-click menu. Enter your passphrase. Then copy the SSH key out of the upper box (already highlighted in blue).
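Before pasting, it can help to make sure that the file you are about to submit really is a public key and not, by accident, the private key. The following is a minimal sketch of a plausibility check; looks_like_pubkey is a hypothetical helper name, the list of accepted key types is an assumption and not the portal's actual rule set, and the check only inspects the text format of the first line, not the cryptographic validity of the key.

```shell
# looks_like_pubkey: hypothetical helper. Succeeds if the first line of the
# file has the usual "<type> <base64> [comment]" shape of an OpenSSH public key.
looks_like_pubkey() {
    awk '
        NR == 1 && NF >= 2 &&
        $1 ~ "^(ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp(256|384|521))$" &&
        $2 ~ "^[A-Za-z0-9+/]+=*$" { ok = 1 }
        END { exit !ok }
    ' "$1"
}
```

For example, `looks_like_pubkey ~/.ssh/id_rsa.pub && echo "looks fine"` prints a confirmation for a regular public key file, while a private key file (which starts with a BEGIN line) is rejected.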
Check if the key has been added
After you clicked append
, your key will be printed back to you (as shown in the blurred picture above).
If your key is not printed back to you, then adding the SSH key to zugang.charite.de was not successful. In this case please contact helpdesk@charite.de for assistance, as they (Charite GB IT) maintain that system and it is out of our (BIH HPC IT) control.
Once your key has been added, it will take a few minutes for the changes to go live.
"},{"location":"connecting/submit-key/mdc/","title":"Submitting an SSH Key to MDC","text":"For MDC users, SSH keys are submitted through the MDC PersDB interface (see below). PersDB is not maintained by BIH HPC IT but by MDC IT.
Warning
The SSH keys are only activated overnight (but automatically). This is out of our control. Contact helpdesk@mdc-berlin.de for more information.
"},{"location":"connecting/submit-key/mdc/#detour-using-mdc-vmware-view-to-get-into-mdc-intranet","title":"Detour: Using MDC VMWare View to get into MDC Intranet","text":"In case you are not within the MDC network, connect to MDC VMWare view first and use the web browser in the Windows session.
~/.ssh/id_rsa.pub
into the clipboard window. Ensure that the entire file content is there (it should end with your email address). If you generated the key in Windows, please paste the copied key from the text box.
Thus, you will only be able to connect the next day.
Bask in the glory of having completed this process.
"},{"location":"cubit/","title":"Overview","text":"The static data installation can be found at /data/cephfs-1/work/projects/cubit/18.12/static_data
.
The static data directory contains a sub-directory for the genomes, the precomputed index files for several different popular mapping tools and associated annotation (GFF and GTF files) from Ensembl and GENCODE for each of the available genomes. The top-level directory structure is as follows:
static_data/
annotations
app_support
db
exome_panel
exon_list
precomputed
reference
The following Ensembl and GENCODE versions corresponding to the indicated reference genomes will be made available on the cluster.
Database | Version | Reference Genome
Ensembl | 65 | NCBIM37 (Ensembl release corresponding to GENCODE M1)
Ensembl | 67 | NCBIM37 (Ensembl release for sanger mouse genome assembly)
Ensembl | 68 | GRCm38 (Ensembl release for sanger mouse genome assembly)
Ensembl | 74 | GRCh37 (Ensembl release for GENCODE 19)
Ensembl | 75 | GRCh37 (Latest release for GRCh37)
Ensembl | 79 | GRCh38 (Ensembl release for GENCODE 22)
Ensembl | 80 | GRCh38 (Ensembl release corresponding to GENCODE 22)
Ensembl | 80 | GRCm38 (Ensembl release corresponding to GENCODE M1)
GENCODE | M1 | NCBIM37 (No gff3 file)
GENCODE | M5 | GRCm38
GENCODE | 19 | current for GRCh37
GENCODE | 22 | current for GRCh38
The annotation files associated with the indicated genomes can be accessed in the following directories:
static_data/annotation\n\u251c\u2500\u2500 ENSEMBL\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 65\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 67\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 68\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCm38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 74\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 75\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 79\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 80\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCm38\n\u2514\u2500\u2500 GENCODE\n \u251c\u2500\u2500 19\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n \u251c\u2500\u2500 22\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n \u251c\u2500\u2500 M1\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n \u2514\u2500\u2500 M5\n \u2514\u2500\u2500 GRCm38\n
"},{"location":"cubit/app-support/","title":"Cubit Static Data: Application Support","text":"The static_data/app_support
directory contains all data files that are shipped with a software package installed in cubit. For blast
this is not complete and more databases can be added upon request.
static_data/app_support\n\u251c\u2500\u2500 blast\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 variable\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 nt\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 refseq_protein\n\u251c\u2500\u2500 Delly\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.6.5\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.6.7\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.1\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.3\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.7.5\n\u251c\u2500\u2500 GATK_bundle\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2.8\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u251c\u2500\u2500 Jannovar\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.14\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.16\n\u251c\u2500\u2500 kraken\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.10.5-cubi20160426\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 bacvir\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 minikraken_20141208\n\u251c\u2500\u2500 Oncotator\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 v1_ds_Jan262015\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 1000genome_db\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 achilles\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cancer_gene_census\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ccle_by_gene\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ccle_by_gp\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 clinvar\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cosmic\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cosmic_fusion\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cosmic_tissue\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dbNSFP_ds\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dbsnp\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dna_repair_genes\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 esp6500SI_v2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 esp6500SI_v2_coverage\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 familial\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 gencode_out2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 gencode_xrefseq\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hgnc\n\u2502\u00a0\u00a0 
\u251c\u2500\u2500 mutsig\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 oreganno\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 override_lists\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ref_hg\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 simple_uniprot\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 so_terms\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 tcgascape\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 tumorscape\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 uniprot_aa_annotation\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 uniprot_aa_xform\n\u2514\u2500\u2500 SnpEff\n \u2514\u2500\u2500 4.1\n \u2514\u2500\u2500 data\n \u251c\u2500\u2500 GRCh37.75\n \u251c\u2500\u2500 GRCh38.79\n \u251c\u2500\u2500 GRCm38.79\n \u251c\u2500\u2500 hg19\n \u251c\u2500\u2500 hg38\n \u2514\u2500\u2500 mm10\n
"},{"location":"cubit/databases/","title":"Databases","text":"The file formats in the static_data/db
folder are mostly .vcf
or .bed
files. We provide the following databases:
The directory structure is as follows:
static_data/db\n\u251c\u2500\u2500 COSMIC\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 v72\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u251c\u2500\u2500 dbNSFP\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2.9\n\u251c\u2500\u2500 dbSNP\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b128\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b142\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 b144\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 b147\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh38\n\u251c\u2500\u2500 DGV\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2015-07-23\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n\u251c\u2500\u2500 ExAC\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 release0.3\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 release0.3.1\n\u251c\u2500\u2500 giab\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NA12878_HG001\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NISTv2.19\n\u251c\u2500\u2500 goldenpath\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 variable\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u251c\u2500\u2500 SangerMouseGenomesProject\n\u2502 \u2514\u2500\u2500 REL-1211-SNPs_Indels\n\u2502 \u251c\u2500\u2500 mm9\n\u2502 \u2514\u2500\u2500 NCBIM37\n\u2514\u2500\u2500 UK10K_cohort\n \u2514\u2500\u2500 REL-2012-06-02\n
"},{"location":"cubit/exomes-panels/","title":"Exomes and Panels","text":"These exome panel data are proprietary and downloaded after registration. In case you want to use them, be sure you have access to them by creating an account at Agilent or Roche to not run into legal trouble.
static_data/exome_panel\n\u251c\u2500\u2500 Agilent\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SureSelect_Human_All_Exon_V4\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SureSelect_Human_All_Exon_V5\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 SureSelect_Human_All_Exon_V6\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 SureSelect_Mouse_All_Exon_V1\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 NCBIM37\n\u2514\u2500\u2500 Roche\n \u2514\u2500\u2500 SeqCap_EZ_MedExome\n \u2514\u2500\u2500 GRCh37\n
"},{"location":"cubit/exon-lists/","title":"Exon Lists","text":"Here we provide exon lists for some human genome assemblies in the .bed
-file format. Each file exists in a version with the original coordinates and in a version padded by 10 bp on each side (suffix: _plus_10bp.bed
). The folder structure is self-explanatory:
static_data/exon_list\n\u251c\u2500\u2500 CCDS\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 15\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 18\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hg38\n\u2514\u2500\u2500 ENSEMBL\n \u251c\u2500\u2500 74\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 GRCh37\n \u2514\u2500\u2500 75\n \u2514\u2500\u2500 GRCh37\n
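To illustrate what the 10 bp padding means, the following is a minimal sketch that pads intervals of a tab-separated BED file with awk. It is not necessarily how the provided files were generated; pad_bed is a hypothetical helper name, and clamping the start coordinate at 0 is an assumption (the end coordinate is not checked against the chromosome length).

```shell
# pad_bed: hypothetical helper. Reads tab-separated BED from stdin and widens
# every interval by N bp on each side (default 10), never going below 0.
pad_bed() {
    awk -v pad="${1:-10}" '
        BEGIN { FS = OFS = "\t" }
        /^(track|browser|#)/ { print; next }   # pass header lines through
        {
            $2 = ($2 > pad) ? $2 - pad : 0     # clamp start at 0
            $3 = $3 + pad
            print
        }'
}

printf 'chr1\t100\t200\nchr1\t5\t50\n' | pad_bed 10
```

The second input interval starts closer than 10 bp to the chromosome start, so its padded start becomes 0 instead of a negative number.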
"},{"location":"cubit/index-files/","title":"Precomputed Index Files","text":"Index files for
have been precomputed. The index corresponding to each genome is stored in the following directory structure with the above mentioned reference genomes as subfolders (listed here only for Bowtie/1.1.2
, same subfolders for the remaining programs):
static_data/precomputed\n\u251c\u2500\u2500 Bowtie\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 1.1.2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 danRer10\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 dm6\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ecoli\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 GRCm38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg18\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hg38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm10\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 phix\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 sacCer3\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 UniVec\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 UniVec_Core\n\u251c\u2500\u2500 Bowtie2\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 2.2.5\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n\u251c\u2500\u2500 BWA\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 0.7.12\n\u2502\u00a0\u00a0 \u2502 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 0.7.15\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n\u2514\u2500\u2500 STAR\n \u2514\u2500\u2500 2.4.1d\n \u2514\u2500\u2500 default\n \u00a0\u00a0 \u2514\u2500\u2500 [see Bowtie/1.1.2]\n
"},{"location":"cubit/references/","title":"Reference Sequences","text":""},{"location":"cubit/references/#ncbi-mouse-reference-genome-assemblies","title":"NCBI mouse reference genome assemblies","text":"We provide the NCBI mouse reference assembly used by the Sanger Mouse Genomics group for NCBIM37 and GRCm38. This is a reliable source where the appropriate contigs have already been selected by experts. NCBIM37 is annotated with Ensembl release 67 and GRCm38 with Ensembl release 68.
"},{"location":"cubit/references/#ucsc-mouse-reference-genome-assemblies","title":"UCSC mouse reference genome assemblies","text":"The assembly sequence is in one file per chromosome and is available for mm9 and mm10. We concatenated all the chromosome files to one final fasta file for each genome assembly.
"},{"location":"cubit/references/#ncbi-human-reference-genome-assemblies","title":"NCBI human reference genome assemblies","text":"The assembly sequence is in one file per chromosome is available for hg18, hg19 and hg38. We concatenated all the chromosome files to one final fasta file for each genome assembly. Additionally, in the subfolder chromosomes
we keep the chromosome fasta files separately for hg18 and hg19.
The following directory structure indicates the available genomes. Where there isn't a name for the data set, either the source (e.g. sanger - from the Sanger Mouse Genomes project) or the download date is used to name the sub-directory.
static_data/reference\n\u251c\u2500\u2500 danRer10\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 dm6\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 ecoli\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 GCA_000005845.2_ASM584v2\n\u251c\u2500\u2500 genomemedley\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 1\n\u251c\u2500\u2500 GRCh37\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 g1k_phase1\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 g1k_phase2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hs37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hs37d5\n\u251c\u2500\u2500 GRCh38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hs38\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 hs38a\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 hs38DH\n\u251c\u2500\u2500 GRCm38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 sanger\n\u251c\u2500\u2500 hg18\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 hg19\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 hg38\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 mm10\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 mm9\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 NCBIM37\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 sanger\n\u251c\u2500\u2500 phix\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 illumina\n\u251c\u2500\u2500 sacCer3\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ucsc\n\u251c\u2500\u2500 UniVec\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 9\n\u2514\u2500\u2500 UniVec_Core\n \u2514\u2500\u2500 9\n
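The directory layout above can be turned into paths programmatically. This is a minimal sketch; reference_dir and the STATIC_DATA variable are hypothetical names introduced for illustration, and the default root is the static data location given in the Overview. The files inside each directory are not listed in this document, so the sketch only resolves and lists the directory.

```shell
# Root of the static data installation (default taken from the Overview page).
STATIC_DATA=${STATIC_DATA:-/data/cephfs-1/work/projects/cubit/18.12/static_data}

# reference_dir: hypothetical helper that builds
# <root>/reference/<genome>/<source> from the layout shown above.
reference_dir() {
    printf '%s/reference/%s/%s\n' "$STATIC_DATA" "$1" "$2"
}

dir=$(reference_dir GRCh37 hs37d5)
if [ -d "$dir" ]; then
    ls "$dir"
else
    echo "not found (are you on the cluster?): $dir" >&2
fi
```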
"},{"location":"help/faq/","title":"Frequently Asked Questions","text":""},{"location":"help/faq/#where-can-i-get-help","title":"Where can I get help?","text":"Please see the section Connection Problems.
"},{"location":"help/faq/#connecting-to-the-cluster-takes-a-long-time","title":"Connecting to the cluster takes a long time.","text":"The most probable cause for this is a conda installation which defaults to loading the (Base) environment on login. To disable this behaviour you can run:
$ conda config --set auto_activate_base false\n
You can also run the bash shell in verbose mode to find out exactly which command is slowing down login:
$ ssh user@hpc-login-1.cubi.bihealth.org bash -iv\n
"},{"location":"help/faq/#what-is-the-difference-between-max-and-bih-cluster-what-is-their-relation","title":"What is the difference between MAX and BIH cluster? What is their relation?","text":"Administrativa
Requests for both systems are handled separately, depending on the user's affiliation with research/service groups.
Hardware and Systems
Bioinformatics Software
packet_write_wait: Connection to XXX : Broken pipe
\". How can I fix this?","text":"Try to put the following line at the top of your ~/.ssh/config
.
ServerAliveInterval 30\n
This will make ssh
send an empty network packet to the server whenever no data has been received for 30 seconds. This prevents network hardware from considering your connection unused or broken and terminating it.
If the problem persists, please report it to hpc-helpdesk@bih-charite.de.
"},{"location":"help/faq/#my-job-terminated-before-being-done-what-happened","title":"My job terminated before being done. What happened?","text":"First of all, look into your job logs. In the case that the job was terminated by Slurm (e.g., because it ran too long), you will find a message like this at the bottom. Please look at the end of the last line in your log file.
slurmstepd: error: *** JOB <your job id> ON med0xxx CANCELLED AT 2020-09-02T21:01:12 DUE TO TIME LIMIT ***\n
This indicates that you need to adjust the --time
limit of your sbatch
command.
slurmstepd: error: Detected 2 oom-kill event(s) in step <your job id>.batch cgroup.\nSome of your processes may have been killed by the cgroup out-of-memory handler\n
This indicates that your job tries to use more memory than has been allocated to it. Also see Slurm Scheduler: Memory Allocation
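Both limits are set when the job is submitted. The following is a minimal sketch of a batch script that requests them explicitly; the job name and all values are assumptions for illustration and have to be adapted to your workload.

```shell
#!/bin/bash
#SBATCH --job-name=example        # hypothetical job name
#SBATCH --time=04:00:00           # wall-clock limit; Slurm cancels the job when it is exceeded
#SBATCH --mem=16G                 # memory per node; exceeding it triggers the oom-kill handler
#SBATCH --nodes=1                 # keep all allocated cores on one node
#SBATCH --cpus-per-task=4

# Slurm exports SLURM_CPUS_PER_TASK inside the job; fall back to 1 elsewhere.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${threads} thread(s) on $(hostname)"
```

Submit the script with sbatch and raise --time or --mem if the log shows one of the two messages above.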
Otherwise, you can use sacct -j JOBID
to read the information that the job accounting system has recorded for your job. A job that was canceled (indicated by CANCELLED
) by the Slurm job scheduler looks like this (ignore the COMPLETED
step that is just some post-job step added by Slurm automatically).
# sacct -j _JOBID_\n JobID JobName Partition Account AllocCPUS State ExitCode\n------------ ---------- ---------- ---------- ---------- ---------- --------\n_JOBID_ snakejob.+ medium hpc-ag-xx+ 4 TIMEOUT 0:0\n_JOBID_.bat+ batch hpc-ag-xx+ 4 CANCELLED 0:15\n_JOBID_.ext+ extern hpc-ag-xx+ 4 COMPLETED 0:0\n
Use the --long
flag to see all fields (and probably pipe it into less
as: sacct -j JOBID --long | less -S
). Things to look out for:
MaxRSS
)?Elapsed
)?Note that --long
does not show all fields. For example, the following tells us that the given job exceeded its time limit, which caused it to be killed.
# sacct -j _JOBID_ --format Timelimit,Elapsed\n Timelimit Elapsed\n---------- ----------\n 01:00:00 01:00:12\n 01:00:13\n 01:00:12\n
Use man sacct
, sacct --helpformat
, or see the Slurm Documentation for options for the --format
field of sacct
.
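To compare the Timelimit and Elapsed columns in a script, the duration strings have to be converted to seconds first. The helper functions below are a sketch under the assumption that both values have the form [D-]HH:MM:SS or MM:SS; the function names are hypothetical and values such as UNLIMITED are not handled.

```shell
# Convert a Slurm duration ([D-]HH:MM:SS or MM:SS) to seconds.
# Non-numeric values such as UNLIMITED are not handled in this sketch.
slurm_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '{
        d = 0; h = 0; m = 0; s = 0
        if (NF == 4)      { d = $1; h = $2; m = $3; s = $4 }
        else if (NF == 3) { h = $1; m = $2; s = $3 }
        else if (NF == 2) { m = $1; s = $2 }
        else              { exit 1 }
        print d * 86400 + h * 3600 + m * 60 + s
    }'
}

# Succeeds (exit status 0) when the elapsed time reached the time limit.
# Arguments: TIMELIMIT ELAPSED
hit_time_limit() {
    [ "$(slurm_to_seconds "$2")" -ge "$(slurm_to_seconds "$1")" ]
}

# Values taken from the sacct example above.
hit_time_limit 01:00:00 01:00:12 && echo "job ran into its time limit"
```

The two arguments can be taken from the first line of sacct -j JOBID --format Timelimit,Elapsed --noheader.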
This is most probably caused by your job being allocated insufficient memory. Please see the memory part of the answer to My job terminated before being done. What happened?
"},{"location":"help/faq/#how-can-i-create-a-new-project","title":"How can I create a new project?","text":"You can create a project if you are either a group leader of an AG or a delegate of an AG. If this is the case, please follow these instructions.
"},{"location":"help/faq/#i-cannot-create-pngs-in-r","title":"I cannot create PNGs in R","text":"For using the png
method, you need to have an X11 session running. This might be the case if you logged into a cluster node using srun --x11
if configured correctly but is not the case if you submitted a batch job. The solution is to use xvfb-run
(xvfb = X11 virtual frame-buffer).
Here is the content of an example script:
$ cat img.R\n#!/usr/bin/env Rscript\n\npng('cars.png')\ncars <- c(1, 3, 6, 4, 9)\nplot(cars)\ndev.off()\n
Here, it fails without X11:
$ ./img.R\nError in .External2(C_X11, paste(\"png::\", filename, sep = \"\"), g$width, :\n unable to start device PNG\nCalls: png\nIn addition: Warning message:\nIn png(\"cars.png\") : unable to open connection to X11 display ''\nExecution halted\n
Here, it works with xvfb-run
:
$ xvfb-run ./img.R\nnull device\n 1\n$ ls\ncars.png foo.png img.R Rplots.pdf\n
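If the same script is used both in interactive sessions with X11 and in batch jobs, a small wrapper can decide whether xvfb-run is needed. This is only a sketch; the function name run_headless is hypothetical.

```shell
# Run a command under xvfb-run only when no X11 display is available.
run_headless() {
    if [ -n "${DISPLAY:-}" ]; then
        "$@"                      # a display exists (e.g. via srun --x11)
    elif command -v xvfb-run >/dev/null 2>&1; then
        xvfb-run -a "$@"          # -a picks a free display number
    else
        echo "xvfb-run not found; running without a display" >&2
        "$@"
    fi
}
```

In a job script you would then call run_headless ./img.R instead of ./img.R.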
"},{"location":"help/faq/#my-jobs-dont-get-scheduled","title":"My jobs don't get scheduled","text":"You can use scontrol show job JOBID
to get the details displayed about your jobs. In the example below, we can see that the job is in the PENDING
state. The Reason
field tells us that the job was not scheduled because the specified dependency was never fulfilled. You can find a list of all job reason codes in the Slurm squeue
documentation.
JobId=863089 JobName=pipeline_job.sh\n UserId=holtgrem_c(100131) GroupId=hpc-ag-cubi(5272) MCS_label=N/A\n Priority=1 Nice=0 Account=(null) QOS=normal\n JobState=PENDING Reason=DependencyNeverSatisfied Dependency=afterok:863087(failed)\n Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n RunTime=00:00:00 TimeLimit=08:00:00 TimeMin=N/A\n SubmitTime=2020-05-03T18:57:34 EligibleTime=Unknown\n AccrueTime=Unknown\n StartTime=Unknown EndTime=Unknown Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-05-03T18:57:34\n Partition=debug AllocNode:Sid=hpc-login-1:28797\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=(null)\n NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=1,node=1,billing=1\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)\n Command=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/pipeline_job.sh\n WorkDir=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export\n StdErr=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out\n StdIn=/dev/null\n StdOut=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out\n Power=\n MailUser=(null) MailType=NONE\n
If you see a Reason=ReqNodeNotAvail,_Reserved_for_maintenance
then also see Reservations / Maintenances.
For GPU jobs also see \"My GPU jobs don't get scheduled\".
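The output of scontrol show job is long. As a sketch, the following filter keeps only the fields that matter for scheduling questions; the function name job_fields and the selection of fields are assumptions, not part of Slurm.

```shell
# Print only the scheduling-related fields of "scontrol show job" output read from stdin.
job_fields() {
    tr -s ' ' '\n' | awk -F= '
        $1 == "JobState" || $1 == "Reason" ||
        $1 == "Dependency" || $1 == "StartTime" { print }'
}
```

Use it as scontrol show job JOBID | job_fields. For the job above this prints JobState=PENDING, Reason=DependencyNeverSatisfied, Dependency=afterok:863087(failed) and StartTime=Unknown, one per line.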
"},{"location":"help/faq/#my-gpu-jobs-dont-get-scheduled","title":"My GPU jobs don't get scheduled","text":"There are only four GPU machines in the cluster (with four GPUs each, hpc-gpu-1 to hpc-gpu-4). Please inspect first the number of running jobs with GPU resource requests:
hpc-login-1:~$ squeue -o \"%.10i %20j %.2t %.5D %.4C %.10m %.16R %.13b\" \"$@\" | grep hpc-gpu- | sort -k7,7\n 1902163 ONT-basecalling R 1 2 8G hpc-gpu-1 gpu:tesla:2\n 1902167 ONT-basecalling R 1 2 8G hpc-gpu-1 gpu:tesla:2\n 1902164 ONT-basecalling R 1 2 8G hpc-gpu-2 gpu:tesla:2\n 1902166 ONT-basecalling R 1 2 8G hpc-gpu-2 gpu:tesla:2\n 1902162 ONT-basecalling R 1 2 8G hpc-gpu-3 gpu:tesla:2\n 1902165 ONT-basecalling R 1 2 8G hpc-gpu-3 gpu:tesla:2\n 1785264 bash R 1 1 1G hpc-gpu-4 gpu:tesla:2\n
This indicates that there are two free GPUs on hpc-gpu-4.
Second, inspect the node states:
hpc-login-1:~$ sinfo -n hpc-gpu-[1-4]\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST\ndebug* up 8:00:00 0 n/a\nmedium up 7-00:00:00 0 n/a\nlong up 28-00:00:0 0 n/a\ncritical up 7-00:00:00 0 n/a\nhighmem up 14-00:00:0 0 n/a\ngpu up 14-00:00:0 1 drng hpc-gpu-4\ngpu up 14-00:00:0 3 mix med[0301-0303]\nmpi up 14-00:00:0 0 n/a\n
This tells you that hpc-gpu-1 to hpc-gpu-3 have jobs running (\"mix\" indicates that there are free resources, but these are only CPU cores, not GPUs). hpc-gpu-4 is shown to be in \"draining state\". Let's look at what's going on there.
hpc-login-1:~$ scontrol show node hpc-gpu-4\nNodeName=hpc-gpu-4 Arch=x86_64 CoresPerSocket=16\n CPUAlloc=2 CPUTot=64 CPULoad=1.44\n AvailableFeatures=skylake\n ActiveFeatures=skylake\n Gres=gpu:tesla:4(S:0-1)\n NodeAddr=hpc-gpu-4 NodeHostName=hpc-gpu-4 Version=20.02.0\n OS=Linux 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020\n RealMemory=385215 AllocMem=1024 FreeMem=347881 Sockets=2 Boards=1\n State=MIXED+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A\n Partitions=gpu\n BootTime=2020-06-30T20:33:36 SlurmdStartTime=2020-07-01T09:31:51\n CfgTRES=cpu=64,mem=385215M,billing=64\n AllocTRES=cpu=2,mem=1G\n CapWatts=n/a\n CurrentWatts=0 AveWatts=0\n ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s\n Reason=deep power-off required for PSU [root@2020-07-17T13:21:02]\n
The \"State\" attribute indicates the node has jobs running but is currently being \"drained\" (accepts no new jobs). The \"Reason\" gives that it has been scheduled for power-off for maintenance of the power supply unit.
"},{"location":"help/faq/#when-will-my-job-be-scheduled","title":"When will my job be scheduled?","text":"You can use the scontrol show job JOBID
command to inspect the scheduling information for your job. For example, the following job is scheduled to start at 2022-09-19T07:53:29
(StartTime
) and will be terminated if it does not stop before 2022-09-19T15:53:29
(EndTime
). For further information, it was submitted at 2022-09-15T12:24:57
(SubmitTime
) and has been last considered by the scheduler at 2022-09-19T07:53:15
(LastSchedEval
).
# scontrol show job 4225062\nJobId=4225062 JobName=C2371_2\n UserId=user_c(133196) GroupId=hpc-ag-group(1030014) MCS_label=N/A\n Priority=805 Nice=0 Account=hpc-ag-group QOS=normal\n JobState=PENDING Reason=QOSMaxCpuPerUserLimit Dependency=(null)\n Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n RunTime=00:00:00 TimeLimit=08:00:00 TimeMin=N/A\n SubmitTime=2022-09-15T12:24:57 EligibleTime=2022-09-15T12:24:57\n AccrueTime=2022-09-15T12:24:57\n StartTime=2022-09-19T07:53:29 EndTime=2022-09-19T15:53:29 Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-09-19T07:53:15 Scheduler=Main\n Partition=medium AllocNode:Sid=hpc-login-1:557796\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=(null)\n NumNodes=1-1 NumCPUs=25 NumTasks=25 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=25,mem=150G,node=1,billing=25\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=150G MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)\n Command=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/GS_wrapy/wrap_y0_VP_2371_GS_chunk2_C02.sh\n WorkDir=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims\n StdErr=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/E2371_2.txt\n StdIn=/dev/null\n StdOut=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/slurm-4225062.out\n Power=\n
"},{"location":"help/faq/#my-jobs-dont-run-in-the-partition-i-expect","title":"My jobs don't run in the partition I expect","text":"You can see the partition that your job runs in with squeue -j JOBID
:
hpc-login-1:~$ squeue -j 877092\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 877092 medium snakejob holtgrem R 0:05 1 med0626\n
See Job Scheduler for information about the partitions' properties and how jobs are routed to partitions. You can force jobs to run in a particular partition by specifying the --partition
parameter, e.g., by adding --partition=medium
or -p medium
to your srun
and sbatch
calls.
This is probably answered by the answer to My jobs don't run in the partition I expect.
"},{"location":"help/faq/#how-can-i-mount-a-network-volume-from-elsewhere-on-the-cluster","title":"How can I mount a network volume from elsewhere on the cluster?","text":"You cannot.
"},{"location":"help/faq/#how-can-i-make-workstationserver-files-available-to-the-hpc","title":"How can I make workstation/server files available to the HPC?","text":"You can transfer files to the cluster through Rsync over SSH or through SFTP to the hpc-transfer-1
or hpc-transfer-2
node.
Do not transfer files through the login nodes. Large file transfers through the login nodes can cause performance degradation for the users with interactive SSH connections.
"},{"location":"help/faq/#how-can-i-circumvent-invalid-instruction-signal-4-errors","title":"How can I circumvent \"invalid instruction\" (signal 4) errors?","text":"Make sure that software is compiled with \"sandy bridge\" optimizations and no later one. E.g., use the -march=sandybridge
argument to the GCC/LLVM compiler executables.
If you absolutely need it, there are some boxes with more recent processors in the cluster (e.g., Haswell architecture). Look at the /proc/cpuinfo
files for details.
Please check whether there might be other jobs waiting in front of you! The following squeue
call will show the allocated GPUs of jobs in the gpu
queue. This is done by specifying a format string and using the %b
field.
squeue -o \"%.10i %9P %20j %10u %.2t %.10M %.6D %10R %b\" -p gpu\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(R TRES_PER_NODE\n 872571 gpu bash user1 R 15:53:25 1 hpc-gpu-3 gpu:tesla:1\n 862261 gpu bash user2 R 2-16:26:59 1 hpc-gpu-4 gpu:tesla:4\n 860771 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1\n 860772 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1\n 860773 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1\n 860770 gpu kidney.job user3 R 4-03:23:08 1 hpc-gpu-1 gpu:tesla:1\n 860766 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-3 gpu:tesla:1\n 860767 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-1 gpu:tesla:1\n 860768 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-1 gpu:tesla:1\n
In the example above, user1 has one job with one GPU running on hpc-gpu-3, user2 has one job running with 4 GPUs on hpc-gpu-4, and user3 has 7 jobs in total running on different machines with one GPU each.
"},{"location":"help/faq/#how-can-i-access-graphical-user-interfaces-such-as-for-matlab-on-the-cluster","title":"How can I access graphical user interfaces (such as for Matlab) on the cluster?","text":"-X
for Linux/Mac ssh
srun --pty --x11 bash -i
(instead of srun --pty bash -i
).Also see:
This is sometimes useful, e.g., for monitoring the CPU/GPU usage of your job interactively.
No Computation Outside of Slurm
Do not perform any computation outside of the scheduler as (1) this defeats the purpose of the scheduling system and (2) the administrators are not aware of it and might kill your jobs.
The answer is simple, just SSH into this node.
hpc-login-1:~$ ssh hpc-cpu-xxx\n
"},{"location":"help/faq/#why-am-i-getting-multiple-nodes-to-my-job","title":"Why am I getting multiple nodes to my job?","text":"Classically, jobs on HPC systems are written in a way that they can run on multiple nodes at once, using the network to communicate. Slurm comes from this world and when allocating more than one CPU/core, it might allocate them on different nodes. Please use --nodes=1
to force Slurm to allocate them on a single node.
You can select the CPU architecture by using the -C
/--constraint
flag to sbatch
and srun
. The following are available (as detected by the Linux kernel):
ivybridge
(96 nodes, plus 4 high-memory nodes)haswell
(16 nodes)broadwell
(112 nodes)skylake
(16 nodes, plus 4 GPU nodes)You can specify constraints with OR such as --constraint=haswell|broadwell|skylake
. You can see the assignment of architectures to nodes using the sinfo -o \"%8P %.5a %.10l %.6D %.6t %10f %N\"
command. This will also display node partition, availability etc.
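When passing an OR constraint on the command line, keep in mind that the shell interprets an unquoted | as a pipe. The sketch below shows the quoting; job.sh is a hypothetical script name and echo is used so that nothing is actually submitted.

```shell
# Quote the value: an unquoted | would start a shell pipeline.
constraint='haswell|broadwell|skylake'

# echo prints the command instead of submitting it; job.sh is a hypothetical script.
echo sbatch --constraint="$constraint" job.sh
```

Remove the echo to submit for real. Inside a job script, the same value can be given on an #SBATCH --constraint line.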
No worries!
As documented in the Storage Locations section, each user/project/group has three storage volumes: A small home
, a larger work
and a large (but temporary) scratch
. There are limits on the size of these volumes. You get a nightly warning email in case you are over the soft limit and you will not be able to write any more data if you get above the hard limit. When you log in to the login nodes, the quotas and current usage are displayed to you.
Please note that not all files will be displayed when using ls
. You have to add the -a
parameter to also show files and directories starting with a dot. Users are often surprised when these dot directories take up all of their home
quota.
Use the following command to list all files and directories in your home:
hpc-login-1:~$ ls -la ~/\n
For more information on how to keep your home directory clean and avoid quota warnings, please read Home Folder Quota.
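Listing the files does not yet tell you which of them are large. As a sketch, the following helper sums up the hidden entries directly below a directory and sorts them by size; the function name hidden_usage is hypothetical.

```shell
# List hidden files and directories directly below a directory,
# largest first (sizes in KiB). Defaults to the home directory.
hidden_usage() {
    dir="${1:-$HOME}"
    du -sk "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n 20
}
```

Call hidden_usage without arguments to inspect your home directory. Note that du has to walk all files, so this can take a while on large directories.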
"},{"location":"help/faq/#im-getting-a-disk-quota-exceeded-error","title":"I'm getting a \"Disk quota exceeded\" error.","text":"Most probably you are running into the same problem as described above: Help, I'm getting a Quota Warning Email!
"},{"location":"help/faq/#environment-modules-dont-work-and-i-get-module-command-not-found","title":"Environment modules don't work and I get \"module: command not found\"","text":"First of all, ensure that you are on a compute node and not on one of the login nodes. One common reason is that the system-wide Bash configuration has not been loaded, try to execute source /etc/bashrc
and then re-try using module
. In the case that the problem persists, please contact hpc-helpdesk@bih-charite.de.
All users get their home directory set up using skeleton files. These file names start with a dot .
and are hidden when you type ls
; you have to type ls -a
to see them. You can find the current skeleton in /etc/skel.bih
and inspect the content of the Bash related files as follows:
hpc-login-1:~$ head /etc/skel.bih/.bash*\n==> /etc/skel.bih/.bash_logout <==\n# ~/.bash_logout\n\n==> /etc/skel.bih/.bash_profile <==\n# .bash_profile\n\n# Get the aliases and functions\nif [ -f ~/.bashrc ]; then\n . ~/.bashrc\nfi\n\n# User specific environment and startup programs\n\nPATH=$PATH:$HOME/.local/bin:$HOME/bin\n\n==> /etc/skel.bih/.bashrc <==\n# .bashrc\n\n# Source global definitions\nif [ -f /etc/bashrc ]; then\n . /etc/bashrc\nfi\n\n# Uncomment the following line if you don't like systemctl's auto-paging feature:\n# export SYSTEMD_PAGER=\n
There actually are a couple more files by default. The original copy in /etc/skel.bih
might slightly change over time during improvements but we will not touch your home directory in an unsolicited way at any time!
hpc-login-1:~$ tree -a /etc/skel.bih/\n/etc/skel.bih/\n\u251c\u2500\u2500 .bash_logout\n\u251c\u2500\u2500 .bash_profile\n\u251c\u2500\u2500 .bashrc\n\u251c\u2500\u2500 .screenrc\n\u2514\u2500\u2500 .vimrc\n
"},{"location":"help/faq/#my-program-crashes-what-should-i-do","title":"My program crashes! What should I do?","text":"Have a look at our How-To: Debug Software and How-To: Debug Software on HPC Systems guides!
But it works on my workstation!
Yes, please also refer to these guides on possible approaches to find the problem.
"},{"location":"help/faq/#my-r-sessions-is-always-crashing","title":"My R sessions is always crashing!","text":"We are aware of issues for macOS users that are related to locales. You might see something like the following in your R session or logs.
Note that this is not specific to macOS itself but is caused by macOS using a locale that R does not understand.
Try using export LC_ALL=C
before calling R. If the problem goes away, you can add the following lines to your ~/.bashrc
file to make this permanent. Note that this is actually a good idea so your locale settings are fixed.
# Fix all locale settings to \"C\" for robustness.\nexport LC_ALL=C\n
ChatGPT on Unix Locales
Let's see what ChatGPT 3.5 tells us about Unix locales.
Me: Please explain the concept of Unix locales to me.
Certainly! Unix locales are a set of environment variables and settings that define how various aspects of software and user interactions should be handled to accommodate different languages, regions, and cultural conventions. They play a crucial role in enabling software to adapt to local preferences for displaying text, formatting numbers, handling dates, and more.
In a Unix-like operating system (such as Linux or macOS), locales provide a way to customize how programs behave based on the user's language, country, and cultural norms. Here are some of the key components of Unix locales:
Language: The primary language used for messages, menus, prompts, and system output. For instance, if you set your locale to \"en_US\", messages and interfaces will be in English (United States).
Region: The country or region that defines conventions for aspects like date and time formatting, number formatting (decimal point or comma, thousands separator, etc.), and currency symbols. For example, \"en_GB\" represents English (Great Britain).
Character Encoding: Specifies how characters are encoded, which is vital for correctly displaying and handling text in various scripts and languages. Common encodings include UTF-8 for Unicode text.
Collation: Defines how strings should be sorted. Different languages have different sorting orders for characters, and the locale determines the appropriate collation rules.
Numeric and Date Formats: Specifies how numbers, dates, and times should be formatted. Different regions have different conventions for date formats (dd/mm/yyyy vs. mm/dd/yyyy) and numerical representations (commas vs. periods as decimal points).
Currency Symbols: Defines the symbol or abbreviation used for representing currency. This is important for currency formatting in software.
Unix locales are typically identified by a combination of language and region codes, like \"en_US\" for American English or \"fr_FR\" for French (France). These locales are stored in files on the system, and they contain the necessary rules and data for proper localization.
Users and administrators can set the system's default locale, and individual users can override this setting in their environment. Here are some common commands used to manage locales:
locale
: Displays the current locale settings.locale -a
: Lists all available locales on the system.export LC_ALL=locale_code
: Temporarily sets the locale for the current session.update-locale locale_code
: Changes the system-wide default locale.Programs that are designed to be locale-aware use these settings to display messages, format numbers and dates, and handle other language-dependent features. This allows software to be more user-friendly and relevant to different linguistic and cultural contexts.
In summary, Unix locales provide a framework for adapting software behavior to match the language, region, and cultural preferences of users, ensuring a more inclusive and globally accessible computing experience.
"},{"location":"help/faq/#which-cuda-version-is-installed","title":"Which CUDA version is installed?","text":"For this, connect to the node you want to query (via SSH but do not perform any computation via SSH!)
hpc-login-1:~$ ssh hpc-gpu-1\nhpc-gpu-1:~$ yum list installed 2>/dev/null | grep cuda.x86_64\ncuda.x86_64 10.2.89-1 @local-cuda\nnvidia-driver-latest-dkms-cuda.x86_64 3:440.64.00-1.el7 @local-cuda\n
"},{"location":"help/faq/#can-i-use-docker-on-the-cluster","title":"Can I use Docker on the Cluster?","text":"No, as Docker essentially gives you access as the root user.
However, you can use Apptainer (formerly Singularity) to run containers (and even many Docker containers if they are \"properly built\"). Also see Using Apptainer (with Docker Images).
"},{"location":"help/faq/#how-can-i-copy-data-between-the-max-cluster-mdc-network-and-bih-hpc","title":"How can I copy data between the MAX Cluster (MDC Network) and BIH HPC?","text":"The MAX cluster is the HPC system of the MDC. It is located in the MDC network. The BIH HPC is located in the BIH network.
In general, connections can only be initiated from the MDC network to the BIH network. The reverse does not work. In other words, you have to log into the MAX cluster and then initiate your file copies to or from the BIH HPC from there. E.g., use rsync -avP some/path user_m@hpc-transfer-1.cubi.bihealth.org:/another/path
to copy files from the MAX cluster to BIH HPC and rsync -avP user_m@hpc-transfer-1.cubi.bihealth.org:/another/path some/path
to copy data from the BIH HPC to the MAX cluster.
In general, connections can only be initiated from the Charite network to the BIH network. The reverse does not work. In other words, you have to be on a machine inside the Charite network and then initiate your file copies to or from the BIH HPC from there. E.g., use rsync -avP some/path user_c@hpc-transfer-1.cubi.bihealth.org:/another/path
to copy files from your machine in the Charite network to BIH HPC and rsync -avP user_c@hpc-transfer-1.cubi.bihealth.org:/another/path some/path
to copy data from the BIH HPC to your machine in the Charite network.
As of December 3, 2020 we have established a policy to limit you to 512 files and 128MB of RAM. Further, you are limited to using the equivalent of one core. This limit is enforced for all processes originating from an SSH session and the limit is enforced on all jobs. This was done to prevent users from thrashing the head nodes or using SSH based sessions for computation.
"},{"location":"help/faq/#slurm-complains-about-execve-no-such-file-or-directory","title":"Slurm complains aboutexecve
/ \"No such file or directory\"","text":"This means that the program that you want to execute does not exist. Consider the following example:
[user@hpc-login-1 ~]$ srun --time 2-0 --nodes=1 --ntasks-per-node=1 \\\n --cpus-per-task=12 --mem 96G --partition staging --immediate 5 \\\n --pty bash -i\nslurmstepd: error: execve(): 5: No such file or directory\nsrun: error: hpc-cpu-2: task 0: Exited with exit code 2\n
Can you spot the problem? Slurm tried to execute 5 as the program. The reason is that --immediate
takes an optional value, and Slurm only recognizes an optional value when it is attached with an equal sign, as in --arg=value
. The simplest way to avoid this is to always use the equal sign for long arguments. This means that instead of writing --mem 96G --partition staging --immediate 5
, you should write --mem=96G --partition=staging --immediate=5
.
Long arguments with a mandatory value (such as --mem) also accept a space as separator, but the equal sign form works for both kinds.
"},{"location":"help/faq/#slurmstepd-says-that-hwloc_get_obj_below_by_type-fails","title":"slurmstepd
says that hwloc_get_obj_below_by_type
fails","text":"You can ignore the following problem:
slurmstepd: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5\nslurmstepd: error: task[0] unable to set taskset '0x0'\n
This is a minor failure related to Slurm and cgroups. Your job should run through successfully despite this error (that is more of a warning for end-users).
"},{"location":"help/faq/#how-can-i-share-filescollaborate-with-users-from-another-work-group","title":"How can I share files/collaborate with users from another work group?","text":"Please use projects as documented here. Projects were created for this particular purpose.
"},{"location":"help/faq/#whats-the-relation-of-charite-mdc-and-cluster-accounts","title":"What's the relation of Charite, MDC, and cluster accounts?","text":"For HPC 4 Research either an active and working Charite or MDC account is required (that is, you can login e.g., into email.charite.de or mail.mdc-berlin.de). The system has a separate meta directory that is used for the authorization of users (in other words, whether the user is active, has access to the system, and which groups the user belongs to). Charite and MDC accounts map to accounts <Charite user name>_c
and <MDC user name>_m
accounts in this meta directory. In the case that a user has both Charite and MDC accounts, these are completely separate entities in the meta directory. For authentication (verifying that a user has access to an account), the Charite and MDC account systems (MS Active Directory) are used. Authentication currently only uses the SSH keys deposited into Charite (via zugang.charite.de) and MDC (via MDC persdb). Users have to obtain a suitable Charite/MDC account via Charite and MDC central IT departments and upload their SSH keys through the host organization systems on their own. The hpc-helpdesk process is then used for getting their accounts set up on the HPC 4 Research system (the home/work/scratch shares being set up), becoming part of the special hpc-users
group that controls access to the system and organizing users into work groups and projects.
The process of submitting keys to Charite and MDC is documented in the \"Connecting\" section.
"},{"location":"help/faq/#how-do-charitemdccluster-accounts-interplay-with-vpn-and-the-mdc-jail-node","title":"How do Charite/MDC/Cluster accounts interplay with VPN and the MDC jail node?","text":"Charite users have to obtain a VPN account with the appropriate VPN access permissions, i.e., Zusatzantrag B as documented here. For Charite VPN, as for all Charite IT systems, users must use their Charite user name (e.g., jdoe
and not jdoe_c
).
MDC users either have to use MDC VPN or the MDC jail node, as documented here. For MDC VPN and jail node, as for all MDC IT systems, users must use their MDC user name (e.g., jdoe
and not jdoe_m
).
For help with VPN or jail node, please contact the central Charite or MDC helpdesks as appropriate.
Only when connecting to the cluster itself, from the host organization's VPN or jail node, do users use the HPC 4 Research user name, that is, jdoe_c
or jdoe_m
and not jdoe
!
BIH HPC IT does not have the resources to offer such a service to normal users.
In particular, for privacy sensitive data this comes with a large number of strings attached to fulfill all regulatory requirements. If you need to exchange such data then you need to contact the central IT departments of your home organisation:
If your data is not privacy sensitive or you can guarantee strong encryption of the data then the Gigamove service of RWTH Aachen might come in handy:
You can login via Charite/MDC credentials (or most German academic institutions) and store up to 1TB of data at a time in the account with each file having up to 100GB.
As a note, Charite GB IT has a (German) manual on how to use 7-Zip with AES256 and strong passwords for encrypting data such that it is fit for transfer over unencrypted channels. You can find it here (Charite Intranet only) at point 2.12.
The key point is using a strong password (e.g. with the pwgen
utility), creating an encrypted file with AES256 encryption, using a distinct password for each recipient, and exchanging the password over a second channel (SMS or voice phone). Note that the central manual remains the ground truth and this FAQ entry may not reflect the current process recommended by GB IT if it changes without us noticing.
Can you solve the question yourself?
Please try to solve the question yourself with this manual and Google.
If the problem turns out to be hard, we're happy to help.
This page describes how to write a good help request ticket.
There are more specific questions for common issues given below.
"},{"location":"help/good-tickets/#problems-connecting-to-the-cluster","title":"Problems Connecting to the Cluster","text":"ifconfig
on Linux/Mac, ipconfig
on Windows)?ssh-add -l
and add -vvv
to the SSH command that fails for you.scontrol show job <jobid>
or sacct --long -j <jobid>
of your job.Getting Help
Our helpdesk can be reached via email to hpc-helpdesk@bih-charite.de. Please read our guide on how to write good tickets first.
Please also use the handy figure below on general problem resolution.
But before contacting the helpdesk, try to get help in the HPC Talk BIH HPC user self-help forum!
"},{"location":"help/helpdesk/#helpdesk-scope","title":"Helpdesk Scope","text":"Our helpdesk can support you in the following areas:
We will try our best to resolve these issues. Please note that all other questions can only be answered in a \"best effort way\".
"},{"location":"help/helpdesk/#helpdesk-non-scope","title":"Helpdesk Non-Scope","text":"The following topics are out of scope for the BIH HPC Helpdesk:
We're happy to see if we can help when there is a concrete problem with the software, e.g.,
Another community-driven possibility to get help is our \u201cHPC Talk\u201d forum. After this manual, it should be the first place to consult.
https://hpc-talk.cubi.bihealth.org/
Its main purpose is to serve as a FAQ, so with time and more people participating, you will more likely find an answer to your question. We also use it to make announcements and give an up-to-date status of current problems with the cluster, so it is worth logging in every once in a while. It is also a great first place to look at if you're experiencing problems with the cluster. Maybe it's a known issue.
Although users can also answer questions, our admins participate on a regular basis.
"},{"location":"how-to/connect/gpu-nodes/","title":"How-To: Connect to GPU Nodes","text":"The cluster has seven nodes with four Tesla V100 GPUs each: hpc-gpu-{1..7}
and one node with 10 A40 GPUs: hpc-gpu-8
.
Connecting to a node with GPUs is easy. You request one or more GPU cores by adding a generic resources flag to your Slurm job submission via srun
or sbatch
. - --gres=gpu:tesla:COUNT
will request NVIDIA V100 cores. - --gres=gpu:a40:COUNT
will request NVIDIA A40 cores. - --gres=gpu:COUNT
will request any available GPU cores.
Your job will be automatically placed in the Slurm gpu
partition and allocated a number of COUNT
GPUs.
Info
Fair use rules apply. As GPU nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
Interactive Use of GPU Nodes is Discouraged
While interactive computation on the GPU nodes is convenient, it makes it very easy to forget a job after your computation is complete and let it run idle. While your job is allocated, it blocks the allocated GPUs and other users cannot use them although you might not be actually using them. Please prefer batch jobs for your GPU jobs over interactive jobs.
Furthermore, interactive GPU jobs are currently limited to 24 hours. We will monitor the situation and adjust that limit to optimize GPU usage and usability.
Please also note that allocation of GPUs through Slurm is mandatory, in other words: Using GPUs via SSH sessions is prohibited. The scheduler is not aware of manually allocated GPUs and this interferes with other users' jobs.
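Since batch jobs are preferred, here is a minimal sketch of a GPU batch script. The job name, time limit, and memory value are assumptions chosen for illustration, and `report_gpus` is a hypothetical helper; only the partition and `--gres` values come from this page.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:tesla:1
#SBATCH --time=01:00:00
#SBATCH --mem=8G

# Print the GPU indices that Slurm made visible to this job.
# Outside of a GPU allocation the variable is unset and "none" is printed.
report_gpus() {
    echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
}

report_gpus
# Replace this comment with your actual computation, e.g.:
#   python train.py
```

Saved as, say, `gpu-test.sh`, the script would be submitted with `sbatch gpu-test.sh`. The GPUs are released as soon as the script ends, so nothing is left idle.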
"},{"location":"how-to/connect/gpu-nodes/#usage-example","title":"Usage example","text":""},{"location":"how-to/connect/gpu-nodes/#preparation","title":"Preparation","text":"We will setup a miniconda installation with pytorch
for testing the GPU. If you already have this set up, you can skip this step.
hpc-login-1:~$ srun --pty bash\nhpc-cpu-1:~$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\nhpc-cpu-1:~$ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/work/miniconda3\nhpc-cpu-1:~$ source ~/work/miniconda3/bin/activate\nhpc-cpu-1:~$ conda create -y -n gpu-test pytorch cudatoolkit=10.2 -c pytorch\nhpc-cpu-1:~$ conda activate gpu-test\nhpc-cpu-1:~$ python -c 'import torch; print(torch.cuda.is_available())'\nFalse\nhpc-cpu-1:~$ exit\nhpc-login-1:~$\n
The False
shows that CUDA is not available on the node but that is to be expected. We're only warming up!
Let us now allocate a GPU. The Slurm scheduler will properly allocate GPUs for you and set up the environment variables that tell CUDA which devices are available. The following dry run shows these environment variables (and that they are not available on the login node).
hpc-login-1:~$ export | grep CUDA_VISIBLE_DEVICES\nhpc-login-1:~$ srun --gres=gpu:tesla:1 --pty bash\nhpc-gpu-1:~$ export | grep CUDA_VISIBLE_DEVICES\ndeclare -x CUDA_VISIBLE_DEVICES=\"0\"\nhpc-gpu-1:~$ exit\nhpc-login-1:~$ srun --gres=gpu:tesla:2 --pty bash\nhpc-gpu-1:~$ export | grep CUDA_VISIBLE_DEVICES\ndeclare -x CUDA_VISIBLE_DEVICES=\"0,1\"\n
As you see, you can also reserve multiple GPUs. If we were to open two concurrent connections (e. g. in a screen
) to the same node when allocating one GPU each, the allocated GPUs would be non-overlapping. Note that any two jobs are isolated using Linux cgroups (\"container\" technology) so you cannot accidentally use a GPU of another job.
Now to the somewhat boring part where we show that CUDA actually works.
hpc-login-1:~$ srun --gres=gpu:tesla:1 --pty bash\nhpc-gpu-1:~$ nvcc --version\nnvcc: NVIDIA (R) Cuda compiler driver\nCopyright (c) 2005-2019 NVIDIA Corporation\nBuilt on Wed_Oct_23_19:24:38_PDT_2019\nCuda compilation tools, release 10.2, V10.2.89\nhpc-gpu-1:~$ source ~/work/miniconda3/bin/activate\nhpc-gpu-1:~$ conda activate gpu-test\nhpc-gpu-1:~$ python -c 'import torch; print(torch.cuda.is_available())'\nTrue\n
Note
If scheduling a GPU fails, consider explicitly requesting the GPU partition via --partition gpu
(or #SBATCH --partition gpu
).
Also make sure to read the FAQ entry \"I have problems connecting to the GPU node! What's wrong?\" if you encounter problems.
"},{"location":"how-to/connect/gpu-nodes/#bonus-1-who-is-using-the-gpus","title":"Bonus #1: Who is using the GPUs?","text":"Use squeue
to find out about currently queued jobs (the egrep
only keeps the header and entries in the gpu
partition).
hpc-login-1:~$ squeue | egrep -iw 'JOBID|gpu'\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 33 gpu bash holtgrem R 2:26 1 hpc-gpu-1\n
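To get a quick per-user summary instead of the full job list, the output can be aggregated. The following is a minimal sketch; `count_jobs_per_user` is a hypothetical helper that only counts the lines it reads, and the squeue call in the comment assumes that you are on a cluster node.

```shell
#!/bin/bash
# Count jobs per user from a list of user names (one name per line on stdin),
# most active user first.
count_jobs_per_user() {
    sort | uniq -c | sort -rn
}

# On the cluster, feed it the user column of the gpu partition:
#   squeue --partition=gpu --noheader --format=%u | count_jobs_per_user
```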
"},{"location":"how-to/connect/gpu-nodes/#bonus-2-is-the-gpu-running","title":"Bonus #2: Is the GPU running?","text":"To find out how active the GPU nodes actually are, you can connect to the nodes (without allocating a GPU; you can do this even if the node is full) and then use nvidia-smi
.
hpc-login-1:~$ ssh hpc-gpu-1 bash\nhpc-gpu-1:~$ nvidia-smi\nFri Mar 6 11:10:08 2020\n+-----------------------------------------------------------------------------+\n| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |\n|-------------------------------+----------------------+----------------------+\n| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n|===============================+======================+======================|\n| 0 Tesla V100-SXM2... Off | 00000000:18:00.0 Off | 0 |\n| N/A 62C P0 246W / 300W | 16604MiB / 32510MiB | 99% Default |\n+-------------------------------+----------------------+----------------------+\n| 1 Tesla V100-SXM2... Off | 00000000:3B:00.0 Off | 0 |\n| N/A 61C P0 270W / 300W | 16604MiB / 32510MiB | 100% Default |\n+-------------------------------+----------------------+----------------------+\n| 2 Tesla V100-SXM2... Off | 00000000:86:00.0 Off | 0 |\n| N/A 39C P0 55W / 300W | 0MiB / 32510MiB | 0% Default |\n+-------------------------------+----------------------+----------------------+\n| 3 Tesla V100-SXM2... Off | 00000000:AF:00.0 Off | 0 |\n| N/A 44C P0 60W / 300W | 0MiB / 32510MiB | 4% Default |\n+-------------------------------+----------------------+----------------------+\n\n+-----------------------------------------------------------------------------+\n| Processes: GPU Memory |\n| GPU PID Type Process name Usage |\n|=============================================================================|\n| 0 43461 C python 16593MiB |\n| 1 43373 C python 16593MiB |\n+-----------------------------------------------------------------------------+\n
"},{"location":"how-to/connect/gpu-nodes/#fair-share-fair-use","title":"Fair Share / Fair Use","text":"Note that allocating a GPU makes it unavailable for everyone else, so please behave nicely and be cooperative. If you see someone blocking the GPU nodes for a long time, first find out who it is. You can type getent passwd USER_NAME
on any cluster node to see their email address (and work phone number if added). Send a friendly email, most likely they blocked the node accidentally. If you cannot resolve the issue (e. g. the user is not reachable) then please contact hpc-helpdesk@bih-charite.de.
The cluster has 4 high-memory nodes with 1.5 TB of RAM. You can connect to these nodes using the highmem
SLURM partition (see below). Jobs allocating more than 200 GB of RAM are automatically routed to the highmem
nodes.
Info
Fair use rules apply. As high-memory nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
The cluster has four high-memory nodes that can be used:
hpc-login-1:~$ sinfo -p highmem\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST \nhighmem up 14-00:00:0 3 idle med040[1-4] \n
To connect to one of them, simply allocate more than 200GB of RAM in your job.
hpc-login-1:~$ srun --pty --mem=300GB bash -i\nmed0401:~$\n
You can also pick one of the hostnames:
hpc-login-1:~$ srun --pty --mem=300GB --nodelist=med0403 bash -i\nmed0403:~$\n
After successful login, you can see that your job is in the \"highmem\" partition:
med0403:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \n[...]\n 270 highmem bash holtgrem R 1:25 1 med0403 \n
"},{"location":"how-to/misc/contribute/","title":"How-To: Contribute to this Document","text":"Click on the edit link at the top of each page as shown below.
Please Contribute!
This guide is far from complete. Please feel free to contribute, e.g., refer to How-To: Contribute to this Document.
Please make sure that you have read How-To: Debug Software as a general primer.
As debugging is hard enough already, it makes one wonder how to do this on the HPC system in batch mode. Here is a list of pointers.
"},{"location":"how-to/misc/debug-at-hpc/#attempt-1-run-it-interactively","title":"Attempt 1: Run it interactively!","text":"First of all, you can of course get an interactive session using srun --pty bash -i
and then run your program interactively. Make sure to allocate appropriate memory and cores for your purpose. You might also want to first start a screen
or tmux
session on the login node such that network interruptions to the login node don't harm your hard debugging work!
Does the program work correctly if you do this? If yes, and it only fails when run in batch mode, consider the following behaviour of the scheduler.
The scheduler takes your resource requirements and tries to find a free slot. Once it has found a free slot, it will attempt to run the program. This differs from running it interactively mainly in how the standard input, output, and error streams are handled.
/dev/null
such that no input is read. You can change this with the --input=
flag to specify a file.--output=
. You can use certain wildcards to make the output (but also the input files) depend on things like the job ID or job name.mkdir
it in the job script itself.Please refer to the sbatch documentation for details.
If your program fails without leaving any log file or any other trace, make sure that the path to the output file exists. To the best of the author's knowledge, there is no way to tell apart a crash because this path does not exist from a program failure (except maybe for the running time of 0 seconds and memory usage of 0 bytes).
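One way to avoid the missing-path problem is to create the log directory before submitting. The following is a minimal sketch; `submit_with_logs` and the `logs` directory name are hypothetical, while `%x` (job name) and `%j` (job ID) are file name patterns understood by sbatch.

```shell
#!/bin/bash
# Hypothetical wrapper: make sure the log directory exists, then submit the job.
submit_with_logs() {
    local job_script="$1"
    local log_dir="${2:-logs}"
    # Create the directory first; without it the job leaves no log behind.
    mkdir -p "${log_dir}" || return 1
    # %x expands to the job name and %j to the job ID.
    sbatch --output="${log_dir}/%x-%j.log" "${job_script}"
}

# Usage:
#   submit_with_logs my-job.sh
```

Options given on the sbatch command line take precedence over `#SBATCH` lines in the script, so the wrapper also works for scripts that set their own output path.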
"},{"location":"how-to/misc/debug-at-hpc/#attempt-2-inspect-the-logs","title":"Attempt 2: Inspect the logs","text":"Do you see any exception in your log files? If not, continue.
If your job is canceled by scancel
or stopped because it exhausted its maximum running time or allocated resources then you will find a note in the last line of your error output log (usually folded into the standard output). Please note that if the previous output line did not include a line ending, the message might be at the very end of the last line.
The message will look similar to:
slurmstepd: error: *** JOB <your job id> ON med0xxx CANCELLED AT 2020-09-02T21:01:12 DUE TO TIME LIMIT ***\n
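When you have many log files, searching for this message is quicker than opening each file. A minimal sketch, assuming that your logs live in one directory; `find_cancelled_logs` and the default directory `logs` are hypothetical names.

```shell
#!/bin/bash
# List the log files that contain a slurmstepd cancellation notice.
find_cancelled_logs() {
    local log_dir="${1:-logs}"
    # -r: search recursively; -l: print file names only; -E: extended regex.
    # The pattern also matches when the notice is glued to the end of a line.
    grep -rlE 'slurmstepd: error: .* CANCELLED AT ' "${log_dir}"
}

# Usage:
#   find_cancelled_logs logs/
```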
"},{"location":"how-to/misc/debug-at-hpc/#attempt-3-increase-loggingprinting","title":"Attempt 3: Increase logging/printing","text":"Ideally, you can add one or more --verbose
/-v
flags to your program to increase verbosity. See how far your program gets, see where it fails. This attempt will be greatly helped by reproducible running on a minimal working example.
sattach
","text":"You can use sattach
for attaching your terminal to your running job. This way, you can perform an interactive inspection of the commands.
You can combine this with one of the next attempts of using debuggers to, e.g., get a pdb
debugger at an important position of your program. However, please note that pdb
and ipdb
will stop the program's execution if the standard input stream is at end of file (which /dev/null
is and this is used by default in sbatch
jobs).
Log into the node that your program runs on either using srun --pty --nodelist=NODE
or using ssh
. Please note that you should never perform computationally intensive work when logging into the node directly. You can then use all activity inspection tips from How-To: Debug Software.
After having logged into the node running your program, you can of course also attach to the program with gdb -p PID
or cgdb -p PID
.
Here are some final remarks:
Please Contribute!
This guide is far from complete. Please feel free to contribute, e.g., refer to How-To: Contribute to this Document.
Software development in general or even debugging of software are very broad topics. As such, we will not be able to handle them here comprehensively. Rather, we will give a tour de force on practical and minimal approaches of debugging of software. Here, debugging refers to the process of locating errors in your program and removing them.
Origin of the term debugging
The terms \"bug\" and \"debugging\" are popularly attributed to Admiral Grace Hopper in the 1940s. While she was working on a Mark II computer at Harvard University, her associates discovered a moth stuck in a relay and thereby impeding operation, whereupon she remarked that they were \"debugging\" the system. However, the term \"bug\", in the sense of \"technical error\", dates back at least to 1878 and Thomas Edison (see software bug for a full discussion).
-- Wikipedia: Debugging
When forgetting for a moment everything known about software engineering, programming roughly works in the following cycle:
You run your program. In the case of failure, you need to remove the problem until the program runs through. You then start implementing the next change or feature. But how do you actually locate the problem? Let us walk through a couple of steps.
"},{"location":"how-to/misc/debug-software/#step-1-find-out-that-there-is-an-error","title":"Step 1: Find out that there is an error","text":"This might seem trivial but let us think about this for a moment. For this
You could make this step a bit more comfortable by writing a little checker script that compares expected and actual output.
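Such a checker could look like the following minimal sketch. The function name `check_output` and its calling convention are hypothetical; it relies only on diff.

```shell
#!/bin/bash
# Compare the actual output of a command with a file holding the expected output.
# Usage: check_output EXPECTED_FILE COMMAND [ARGS...]
check_output() {
    local expected="$1"
    shift
    # diff reads the command's output from stdin ("-") and exits with 0 only
    # if it is identical to the expected file; differences are printed.
    if "$@" | diff -u "${expected}" -; then
        echo "OK"
    else
        echo "FAILED"
        return 1
    fi
}

# Usage:
#   check_output expected.txt ./my-program --input small.txt
```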
"},{"location":"how-to/misc/debug-software/#step-2-reproduce-your-error","title":"Step 2: Reproduce your error","text":"You will have to find out how often or regularly the problem occurs. Does the problem occur on all inputs or only specific ones? Does it occur with all parameters? Make sure that you can reproduce the problem, otherwise the problem will be hard to track down.
Discard randomness
In most applications, true randomness is neither required nor used in programs. Rather, pseudo random number generators are used that are usually seeded with a special value. In many cases, the current time is used which makes it hard to reproduce problems. Rather, use a fixed seed, e.g., by calling srand(42)
in C. You could also make this a parameter of your program, but make sure that you can fix all pseudo randomness in your program so you can deterministically reproduce its behaviour.
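The same idea carries over to shell scripts. A minimal sketch, assuming bash, where assigning a value to `RANDOM` seeds the built-in generator; `draw_numbers` is a hypothetical helper for illustration.

```shell
#!/bin/bash
# Draw COUNT pseudo random numbers from bash's built-in generator after
# seeding it with a fixed value.
draw_numbers() {
    local seed="$1" count="$2" i
    RANDOM="${seed}"   # assigning to RANDOM seeds the generator
    for ((i = 0; i < count; i++)); do
        echo "${RANDOM}"
    done
}

# Two calls with the same seed print the same sequence:
#   draw_numbers 42 3
#   draw_numbers 42 3
```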
Try to find a minimal input set on which you can produce your problem. For example, you could use samtools view FILE.bam chr1:90,000-100,000
to cut out regions from a BAM file. The next step is to nail down the problem. Ideally, you can deactivate or comment out parts of your program that are irrelevant to the problem.
This will allow you to get to the problematic point in your program quicker and make the whole debugging exercise easier on yourself.
"},{"location":"how-to/misc/debug-software/#interlude-what-we-have-up-to-here","title":"Interlude: What we have up to here","text":"We can now
If you reached the points above, you have probably cut the time to resolve the problem by 90% already.
Let us now consider a few things that you can do from here to find the source of your problems.
"},{"location":"how-to/misc/debug-software/#method-1-stare-at-your-source-code","title":"Method 1: Stare at your source code","text":"Again, this is trivial, but: look at your code and try to follow through what it does with your given input. This is nicely complemented with the following methods. ;-)
There is a class of tools to help you in doing this, so-called static code analysis tools. They analyze the source code for problematic patterns. The success and power of such analysis tools tends to correlate strongly with how strictly typed the targeted programming language is. E.g., there are very powerful tools for Java, C/C++. However, there is some useful tool support out there for dynamic languages such as Python.
Here is a short list of pointers to static code analysis tools (feel free to extend the list):
The most simple approach is to use print
statements (or similar) to print the current line or value of parameters. While sometimes frowned upon, this certainly is one of the most robust ways to see what is happening in your program. However, beware that too much output might slow down your program or actually make your problem disappear in the case of subtle threading/timing issues (sometimes referred to as \"Heisenbugs\").
Standard output vs. error
Classically, Linux/Unix programs can print back to the user's terminal in two ways: standard output and standard error. By convention, logging should go to stderr. The standard error stream also has the advantage that writing to it has a more direct effect. In contrast to stdout which is usually set up to be (line) buffered (you will only see output after the next newline character), stderr is unbuffered.
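In a shell script, the convention can be followed with a small logging function. This is a minimal sketch; `log` and `compute` are hypothetical names.

```shell
#!/bin/bash
# Write log messages to stderr so that stdout carries only the results.
log() {
    printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2
}

compute() {
    log "starting computation"
    echo 42   # the actual result goes to stdout
    log "done"
}

# Usage: keep results and log messages in separate files.
#   compute > result.txt 2> run.log
```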
"},{"location":"how-to/misc/debug-software/#look-at-tophtop","title":"Look attop
/htop
","text":"The tools top
and htop
are useful tools for inspecting the activity on the current computer. The following parameters are useful (and are actually also available as key strokes when they are running).
-c
-- show the programs' command lines-u USER
-- show the processes of the userYou can exit either tool by pressing q
or Ctrl-C
.
Use the man
, Luke!
Besides searching the internet for a unix command, you can also read its manual page by running man TOOL
. If this does not work, try TOOL --help
to see its builtin help function. Also, doing an internet search for \"man tool\" might help.
strace
","text":"The program strace
allows you to intercept the calls of your program to the kernel. The kernel is needed for actions such as accessing the network or file system. Thus, this is not so useful if your program gets stuck in \"user land\", but it can be useful to see which files it is accessing.
Pro-Tip: if you move the selection line of htop
to a process then you can strace the program by pressing s
.
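To see which files a program touches, the trace can be limited to file-related system calls. The following is a minimal sketch, assuming strace is installed; `trace_file_access` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: record the file-related system calls of a command.
# Usage: trace_file_access LOG_FILE COMMAND [ARGS...]
trace_file_access() {
    local log_file="$1"
    shift
    # -f: also follow child processes; -e trace=file: only system calls that
    # take a file name; -o: write the trace to a file instead of stderr.
    strace -f -e trace=file -o "${log_file}" "$@"
}

# Usage:
#   trace_file_access trace.log ./my-program input.txt
```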
lsof
","text":"The lsof
program lists all open files with the processes that are accessing them. This is useful for seeing which files your program has opened.
You can even build a progress bar with lsof, although that requires sudo
privileges which you might not have on the system that you are using.
Pro-Tip: if you move the selection line of htop
to a process then you can list the open files by pressing l
.
There are more ways of inspecting your program, here are some:
perf
Let us now enter the world of interactive debuggers. Integrated development environments (IDEs) generally consist of an editor, a compiler/interpreter, and an interactive/visual debugger. Usually, they have a debugger program at their core that can also be used on the command line.
"},{"location":"how-to/misc/debug-software/#old-but-gold-gdb","title":"Old but gold:gdb
","text":"On Unix systems, a widely used debugger is gdb
, the GNU debugger. gdb
is a command line program and if you are not used to it, it might be hard to use. However, here are some pointers on how to use it:
The commands in interactive mode include:
quit
or Ctrl-D
to exit the debuggerb file.ext:123
set breakpoint in file.ext
on line 123
r
run the programp var_name
print the value of the variable var_name
display var_name
print the value of the variable var_name
every time execution stopsl
print the source code around the current line (multiple calls will show the next 10 lines or so, and so on)l 123
print lines around line 123
f
show information about the current frame (that is the current source location)bt
show the backtrace (that is all functions above the current one)n
step to the next lines
step into function callsfinish
run the current function until it returnshelp
to get more help
You can call your program directly with command line arguments using cgdb [cgdb-args] --args path/to/program -- [program-args]
. You can also attach to running programs using cgdb -p PID
once you have found out the process ID to attach to using htop
or ps
.
Pro-tip: use cgdb
for an easier to use version that displays the source code in split screen and stores command line histories over sessions.
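If you only want to know where a running program currently is, gdb can also be used without an interactive session. This is a minimal sketch; `backtrace_of` is a hypothetical helper, and attaching only works for your own processes.

```shell
#!/bin/bash
# Hypothetical helper: print the backtraces of all threads of a running process.
backtrace_of() {
    local pid="$1"
    # -batch: run the given commands and exit; -ex: command to execute;
    # -p: attach to the process with this ID.
    gdb -batch -ex "thread apply all bt" -p "${pid}"
}

# Usage (PID found with htop or ps):
#   backtrace_of 12345
```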
You can get a simple REPL (read-execute-print loop) at virtually any position in your program by adding:
import pdb; pdb.set_trace()\n
You will get a prompt at the current position and can issue several commands including:
quit
or Ctrl-D
to exit the debuggerp var_name
to print the variable with var_name
f
show information about the current frame (that is the current source location)bt
show the backtrace (that is all functions called above the current one)continue
to continue runninghelp
to get more helpPro-tip: use import ipdb; ipdb.set_trace()
(after installing the ipdb
package, of course) to get an IPython-based prompt that is much more comfortable to use.
Here is a free bonus pro-tip: learn how to use version control, e.g., Git. This will allow you to go back to previous versions without problems and see current changes to your source code.
Combine the pro tip on using version control (learn Git already!) with this one: learn how to write automated tests. This will allow you to quickly narrow down problematic changes in your version control history.
Again, testing is another topic altogether, so here are just some links to testing frameworks to get you started:
The following web resources can serve as a starting point on how to use debuggers.
We provide a user forum using the Discourse software at
First of all, visit the website for the first time: https://hpc-talk.cubi.bihealth.org
You will then be directed to our Single-Sign-On Page.
Use the button for the host organisation (MDC / Charite) that your cluster account belongs to.
Then use the usual login credentials of your host organisation.
Clicked wrong organisation?
If you accidentally clicked the wrong institution then you need to clear your browser history up to the point where you clicked (e.g., for the last hour).
You will be shown the following screen after the first login.
You can proceed with reading the notification or skip it. The site is mostly self-explanatory. Let us point you to a couple of interesting things for first steps.
Here you can set up your preferences
Use the \"New Topic\" button to create a new topic. Set a meaningful title, select a suitable category (we will update the list of categories over time), and write down your question or discussion item. Finally, click \"Create Topic\" to create the new topic.
You will be directed to the page with your new topic.
You can enable email notifications to receive emails if someone answers.
"},{"location":"how-to/misc/hpc-talk/#disabling-browser-notifications","title":"Disabling Browser Notifications","text":"In your settings, you will find an option to disable browser notifications in this browser.
Or you can use the do not disturb button.
"},{"location":"how-to/misc/hpc-talk/#closing-remarks","title":"Closing Remarks","text":"We established the HPC Talk forum as a self-help forum for users. Alas, there are a number of such sites out there already that are populated by more users.
How does HPC Talk fit in?
We think it is most useful for asking questions and discussing points that are directly related to the BIH HPC system.
What alternatives do I have?
For example:
Obtaining File Boxes
At the moment, file boxes are only available to members of core facilities (e.g., genomics, bioinformatics, or metabolomics) for exchanging files with their collaboration partners. Currently, HPC users cannot use the file box mechanism on their own.
BIH HPC IT provides a file exchange server to be used by the BIH core facilities and their users. The server is located in the BIH DMZ in Buch. Users authenticate using their Charite/BIH (user@CHARITE
) or MDC accounts (user@MDC-BERLIN
). File exchange is organized using \"file boxes\", directories created on the server to which selected users are granted access. Access control list maintenance is done with audit-trails (\"Revisionssicherheit\") and the file access itself is also logged to comply with data protection standards.
Access from Charite Network
Clients in the Charite network (IP ranges 141.x.x.x
and 10.x.x.x
) must connect through the Charite proxy (http://proxy.charite.de:8080
). Depending on the client software that you are using, you might have to configure the proxy.
File boxes are created by the core facilities (e.g., the genomics facilities at Charite and MDC). The facility members also organize the access control. Please talk to your core facility contact on file exchange.
External users must obtain a Charite or MDC account first. Account creation is handled by the core facilities that the external user is a customer of.
"},{"location":"how-to/service/file-exchange/#file-access","title":"File Access","text":"Generally, you will be given a URL to your file box similar to https://file-exchange.bihealth.org/<file-box-id>/
. The files are served over an encrypted connection using WebDAV (which uses HTTPS).
The following describes how to access the files in the box from different platforms.
"},{"location":"how-to/service/file-exchange/#from-linux","title":"From Linux","text":"We describe how to access the files on the command line using the lftp
program. The program is preinstalled on the BIH cluster (and the MDC cluster) and you should be able to just install it with yum install lftp
on CentOS/Red Hat or apt-get install lftp
on Ubuntu/Debian.
When using lftp
, you have to add some configuration first:
# cat >>~/.lftprc <<\"EOF\"\nset ssl:verify-certificate no\nset ftp:ssl-force yes\nEOF\n
In case that you want to access the files using a graphical user interface, search Google for \"WebDAV\" and your operating system or desktop environment. File browsers such as Nautilus and Thunar have built-in WebDAV support.
"},{"location":"how-to/service/file-exchange/#connecting","title":"Connecting","text":"First, log into the machine that has lftp
installed. The login nodes of the BIH cluster do not have it installed but all compute and file transfer nodes have it. Go to the data download location.
host:~$ mkdir -p ~/scratch/download_dir\nhost:~$ cd ~/scratch/download_dir\n
Next, start lftp
. You can open the connection using open -u <user>@<DOMAIN> https://file-exchange.bihealth.org/<file-box-id>/
(NB: there is a trailing slash) where
<user>
is your user name, e.g., holtgrem
, <DOMAIN>
is either MDC-BERLIN
or CHARITE
, and <file-box-id>
is the file box ID from the URL provided to you.
When prompted, use your normal Charite/MDC password to log in.
host:download_dir$ lftp\nlftp :~> open -u holtgrem@CHARITE https://file-exchange.bihealth.org/c62910b3-c1ba-49a5-81a6-a68f1f15aef6\nPassword:\ncd ok, cwd=/c62910b3-c1ba-49a5-81a6-a68f1f15aef6\nlftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6>\n
"},{"location":"how-to/service/file-exchange/#browsing-data","title":"Browsing Data","text":"You can find a full reference of lftp
on the lftp man page. You could also use help COMMAND
on the lftp prompt. For example, to look at the files of the server for a bit...
lftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> ls\ndrwxr-xr-x -- /\ndrwxr-xr-x -- dir\n-rw-r--r-- -- file1\nlftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> find\n./\n./dir/\n./dir/file2\n./file1\n
"},{"location":"how-to/service/file-exchange/#downloading-data","title":"Downloading Data","text":"To download all data use mirror
, e.g. with -P 4
to use four download threads.
lftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> mirror .\nTotal: 2 directories, 3 files, 0 symlinks\nNew: 3 files, 0 symlinks\nlftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> exit\nhost:download_dir$ tree\n.\n\u251c\u2500\u2500 dir\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 file2\n\u251c\u2500\u2500 file1\n\u2514\u2500\u2500 file.txt\n\n1 directory, 3 files\n
Ignoring gnutls_record_recv
errors.
A common error to see is mirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.
. You can just ignore this.
To upload data, you can use mirror -R .
which is essentially the \"reverse\" of the mirror command.
lftp holtgrem@CHARITE@file-exchange.bihealth.org:/c62910b3-c1ba-49a5-81a6-a68f1f15aef6> mirror -R\nmirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.\nmirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.\nmirror: Fatal error: gnutls_record_recv: The TLS connection was non-properly terminated.\nTotal: 2 directories, 3 files, 0 symlinks\nModified: 3 files, 0 symlinks\n4 errors detected\n
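For scripted transfers, the same commands can be passed to lftp on the command line instead of typing them at its prompt. The following is a minimal sketch, assuming the `~/.lftprc` settings from above are in place; `download_file_box` is a hypothetical helper, and lftp still asks for your password because none is given.

```shell
#!/bin/bash
# Hypothetical helper: download a whole file box without an interactive session.
download_file_box() {
    local user="$1"          # e.g. holtgrem@CHARITE
    local box_url="$2"       # file box URL including the trailing slash
    local threads="${3:-4}"  # number of parallel transfers
    # -u: user to log in as; -e: commands to run after connecting, here a
    # parallel mirror followed by quit so that lftp exits when it is done.
    lftp -u "${user}" -e "mirror -P ${threads} . ; quit" "${box_url}"
}

# Usage (run inside the download directory):
#   download_file_box holtgrem@CHARITE https://file-exchange.bihealth.org/<file-box-id>/
```

For an upload, the mirror command would become `mirror -R` as described above.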
"},{"location":"how-to/service/file-exchange/#from-windows","title":"From Windows","text":"We recommend to use WinSCP for file transfer.
After starting WinSCP, you will see a window titled Login
. Just paste the URL (e.g., https://file-exchange.bihealth.org/c62910b3-c1ba-49a5-81a6-a68f1f15aef6/
) of the file box into the Host name
entry field. In this case, the fields File protocol
etc. will be filled automatically. Next, enter your user name as user@CHARITE
or user@MDC-BERLIN
(the capitalization of the part behind the @
is important). The window should now look similar to the one below.
Proxy Configuration on Charite Network
If you are on the Charite network then you have to configure the proxy. Otherwise, skip this step.
Click Advanced
and a window titled Advanced Site Settings
will pop up. Here, select Connection / Proxy
in the left side. Select HTTP
for the Proxy type
. Then, enter proxy.charite.de
as the Proxy host name
and set the Port number
to 8080
. The window should now look as below. Then, click OK
to apply the proxy settings.
Finally, click Login
. You can now transfer files between the file exchange server and your local computer using drag and drop between WinSCP and your local Windows File Explorer. Alternatively, you can use the two-panel view of WinSCP to transfer files as described here.
On Mac, you can also use lftp
as described above in From Linux. You can find install instructions here online.
Proxy Configuration on Charite Network
If you are on the Charite network then you must have configured the proxy appropriately. Otherwise, you have to skip this step.
You can find the proxy settings in your System Preferences
in the Network
section, in the Advanced
tab of your network (e.g., WiFi
).
If you want to use a graphical interface then we recommend the usage of Cyberduck. After starting the program, click Open Connection
on the top left, then select WebDAV (HTTPS)
and fill out the form in the following way. Paste the file box URL into the server field and use your login name (user@CHARITE
or user@MDC-BERLIN
) with your usual password.
If you need to perform access through a graphical user interface on your Mac, please contact hpc-helpdesk@bihealth.org for support.
"},{"location":"how-to/service/file-exchange/#security","title":"Security","text":"The file exchange server has the fail2ban
software installed and configured (Charite, MDC, and BIH IPs are excluded from this).
If you enter your user name or password incorrectly more than 5 times within 10 minutes, your machine will be banned for one hour. Because the ban is applied per IP address, someone else who shares your IP address from the point of view of the file exchange server can get you blocked. This can happen if you are in the same home or university network with NAT or if you are behind a proxy. While banned, you get a \"connection refused\" error; try again in one hour.
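The ban rule above can be illustrated with a small sketch. This is not how fail2ban is implemented or configured on the server; it only models the stated thresholds (more than 5 failures within 10 minutes leads to a one-hour ban). The class name `BanTracker`, its methods, and the example IP address are hypothetical.

```python
from collections import defaultdict, deque

MAX_FAILURES = 5          # more than this many failures triggers a ban
WINDOW_SECONDS = 10 * 60  # failures are counted within 10 minutes
BAN_SECONDS = 60 * 60     # a ban lasts one hour


class BanTracker:
    '''Toy model of the ban rule, keyed by client IP address (hypothetical helper).'''

    def __init__(self):
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned_until = {}              # ip -> time at which the ban ends

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip, now):
        recent = self.failures[ip]
        recent.append(now)
        # Drop failures that fell out of the 10-minute window.
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        if len(recent) > MAX_FAILURES:
            self.banned_until[ip] = now + BAN_SECONDS
            recent.clear()


tracker = BanTracker()
# Six failed logins from one shared (NAT or proxy) address within a minute.
for second in range(6):
    tracker.record_failure('203.0.113.7', now=second * 10)
print(tracker.is_banned('203.0.113.7', now=60))    # True: banned
print(tracker.is_banned('203.0.113.7', now=4000))  # False: ban expired after one hour
```

The sketch shows why users behind the same NAT or proxy affect each other: all of their failures are counted under one address.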
"},{"location":"how-to/software/apptainer/","title":"Using Apptainer (with Docker Images)","text":"Note
Singularity is now Apptainer! While Apptainer provides a singularity
alias for backwards compatibility, it is recommended to adapt all workflows to use the new binary apptainer
.
Apptainer (https://apptainer.org/) is a popular alternative to docker, because it does not require to run as a privileged user. Apptainer can run Docker images out-of-the-box by converting them to the apptainer image format. The following guide gives a quick dive into using docker images with apptainer.
Build on your workstation, run on the HPC
Building images using Apptainer requires root privileges. We cannot give you these permissions on the BIH HPC. Thus, you will have to build the images on your local workstation (or anywhere where you have root access). You can then run the built images on the BIH HPC.
This is also true for the --writable
flag. It apparently needs root permissions, which you do not have on the cluster.
Link ~/.apptainer to ~/work/.apptainer
Because you only have a quota of 1 GB in your home directory, you should symlink ~/.apptainer
to ~/work/.apptainer
.
host:~$ mkdir -p ~/work/.apptainer && ln -sr ~/work/.apptainer ~/.apptainer\n
In case you already have an Apptainer directory:
host:~$ mv ~/.apptainer ~/work/.apptainer && ln -sr ~/work/.apptainer ~/.apptainer\n
Run a bash in a docker image:
host:~$ apptainer shell docker://godlovedc/lolcow\n
Run a command in a docker image:
host:~$ apptainer exec docker://godlovedc/lolcow echo \"hello, hello!\"\n
Run a bash in a docker image, enable access to the cuda driver (--nv) and mount a path (--bind or -B):
host:~$ apptainer shell --nv --bind /path_on_host/:/path_inside_container/ docker://godlovedc/lolcow\n
"},{"location":"how-to/software/apptainer/#some-caveats-and-notes","title":"Some Caveats and Notes","text":"Caveats
Notes
APPTAINERENV_
: host:~$ APPTAINERENV_HELLO=123 apptainer exec docker://godlovedc/lolcow sh -c 'echo $HELLO'\n
apptainer shell
or apptainer exec
uses the host caller's cwd as its cwd, not the one set in the Dockerfile. One can change this by setting --pwd
. The easiest way to run a Docker image available via a Docker hub is to specify its URL. This causes Apptainer to download the image and convert it to an Apptainer image:
host:~$ apptainer run docker://godlovedc/lolcow\n
or to open a shell inside the image
host:~$ apptainer shell docker://godlovedc/lolcow\n
Furthermore, similar to docker, one can pull (and convert) a remote image with the following call:
host:~$ apptainer pull docker://godlovedc/lolcow\n
In case your registry requires authentication you can provide it via a prompt by adding the option --docker-login
:
host:~$ apptainer pull --docker-login docker://ilumb/mylolcow\n
or by setting the following environment variables:
host:~$ export APPTAINER_DOCKER_USERNAME=ilumb\nhost:~$ export APPTAINER_DOCKER_PASSWORD=<redacted>\nhost:~$ apptainer pull docker://ilumb/mylolcow\n
More details can be found in the Apptainer documentation.
"},{"location":"how-to/software/apptainer/#option-2-converting-docker-images","title":"Option 2: Converting Docker Images","text":"Another option is to convert your docker image into the Apptainer/Singularity image format. This can be easily done using the docker images provided by docker2singularity.
To convert the docker image docker_image_name
to the apptainer image apptainer_image_name
one can use the following command line. The output image will be located in output_directory_for_images
.
host:~$ docker run -v /var/run/docker.sock:/var/run/docker.sock -v /output_directory_for_images/:/output --privileged -t --rm quay.io/singularity/docker2singularity --name apptainer_image_name docker_image_name\n
The resulting image can then directly be used as image:
host:~$ apptainer exec apptainer_image_name.sif bash\n
"},{"location":"how-to/software/apptainer/#conversion-compatibility","title":"Conversion Compatibility","text":"Here are some tips for making Docker images compatible with Apptainer taken from docker2singulrity:
~/.bashrc
, ~/.profile
, etc.ENTRYPOINT
instruction set pointing to the command line interface to your pipeline.CMD
- rely only on ENTRYPOINT
.ENTRYPOINT docker run -i -t --entrypoint /bin/bash bids/example
.--read-only --tmpfs /run --tmpfs /tmp parameters
(this emulates the read-only behavior of Apptainer).USER
instruction set.from the official website: \"Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis\"
"},{"location":"how-to/software/cell-ranger/#installation","title":"installation","text":"requires registration before download from here
to unpack Cell Ranger, its dependencies and the cellranger
script:
cd /data/cephfs-1/home/users/$USER/work\nmv /path/to/cellranger-3.0.2.tar.gz .\ntar -xzvf cellranger-3.0.2.tar.gz\n
"},{"location":"how-to/software/cell-ranger/#reference-data","title":"reference data","text":"will be provided in /data/cephfs-1/work/projects/cubit/current/static_data/app_support/cellranger
add a file slurm.template
to /data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/slurm.template
with the following contents:
#!/usr/bin/env bash\n#\n# Copyright (c) 2016 10x Genomics, Inc. All rights reserved.\n#\n# =============================================================================\n# Setup Instructions\n# =============================================================================\n#\n# 1. Add any other necessary Slurm arguments such as partition (-p) or account\n# (-A). If your system requires a walltime (-t), 24 hours (24:00:00) is\n# sufficient. We recommend you do not remove any arguments below or Martian\n# may not run properly.\n#\n# 2. Change filename of slurm.template.example to slurm.template.\n#\n# =============================================================================\n# Template\n# =============================================================================\n#\n#SBATCH -J __MRO_JOB_NAME__\n#SBATCH --export=ALL\n#SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__\n#SBATCH --signal=2\n#SBATCH --no-requeue\n#SBATCH --partition=medium\n#SBATCH --time=24:00:00\n### Alternatively: --ntasks=1 --cpus-per-task=__MRO_THREADS__\n### Consult with your cluster administrators to find the combination that\n### works best for single-node, multi-threaded applications on your system.\n#SBATCH --mem=__MRO_MEM_GB__G\n#SBATCH -o __MRO_STDOUT__\n#SBATCH -e __MRO_STDERR__\n\n__MRO_CMD__\n
note: on newer cellranger version, slurm.template
needs to go to /data/cephfs-1/home/users/$USER/work/cellranger-XX/external/martian/jobmanagers/
if that hasn't been done yet, you can use cellranger mkfastq
(details to be added)
count
)","text":"create a script run_cellranger.sh
with these contents (consult the documentation for help):
#!/bin/bash\n\n/data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/cellranger count \\\n --id=sample_id \\\n --transcriptome=/data/cephfs-1/work/projects/cubit/current/static_data/app_support/cellranger/refdata-cellranger-${species}-3.0.0\\\n --fastqs=/path/to/fastqs \\\n --sample=sample_name \\\n --expect-cells=n_cells \\\n --jobmode=slurm \\\n --maxjobs=100 \\\n --jobinterval=1000\n
and then submit the job via
sbatch --ntasks=1 --mem-per-cpu=4G --time=8:00:00 -p medium -o cellranger.log run_cellranger.sh\n
"},{"location":"how-to/software/cell-ranger/#cluster-support-sge-outdated","title":"cluster support SGE (outdated)","text":"add a file sge.template
to /data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/sge.template
with the following contents:
# =============================================================================\n# Template\n# =============================================================================\n#\n#$ -N __MRO_JOB_NAME__\n#$ -V\n#$ -pe smp __MRO_THREADS__\n#$ -cwd\n#$ -P medium\n#$ -o __MRO_STDOUT__\n#$ -e __MRO_STDERR__\n#$ -l h_vmem=__MRO_MEM_GB_PER_THREAD__G\n#$ -l h_rt=08:00:00\n\n#$ -m a\n#$ -M user@email.com\n\n__MRO_CMD__\n
and submit the job via
qsub -cwd -V -pe smp 1 -l h_vmem=8G -l h_rt=24:00:00 -P medium -m a -j y run_cellranger.sh\n
"},{"location":"how-to/software/jupyter/","title":"How-To: Run Jupyter","text":"SSH Tunnels Considered Harmful
Please use our Open OnDemand Portal for running Jupyter notebooks!
The information below is still accurate. However, many users find it tricky to get SSH tunnels working correctly. A considerable number of parts is involved and you have to get each step 100% correct. Helpdesk cannot support you in problems with SSH tunnels that are caused by incorrect usage.
"},{"location":"how-to/software/jupyter/#what-is-jupyter","title":"What is Jupyter","text":"Project Jupyter is a networking protocol for interactive computing that allows the user to write and execute code for a high number of different programming languages. The most used client is Jupyter Notebook that can be encountered in various form all over the web. Its basic principle is a document consisting of different cells, each of which contains either code (executed in place) or documentation (written in markdown). This allows one to handily describe the processed workflow.
"},{"location":"how-to/software/jupyter/#setup-and-running-jupyter-on-the-cluster","title":"Setup and running Jupyter on the cluster","text":"Install Jupyter on the cluster (via conda, by creating a custom environment)
hpc-cpu-x:~$ conda create -n jupyter jupyter\nhpc-cpu-x:~$ conda activate jupyter\n
(If you want to work in a language other than Python, you can install more Jupyter language kernels, see the kernel list)
Now you can start the Jupyter server session (you may want to do this in a screen
& srun --pty bash -i
session as jupyter keeps running while you are doing computations)
hpc-cpu-x:~$ jupyter notebook --no-browser\n
Check the port number (usually 8888
) in the output and remember it for later:
[I 23:39:40.860 NotebookApp] The Jupyter Notebook is running at:\n[I 23:39:40.860 NotebookApp] http://localhost:8888/\n
By default, Jupyter creates an access token (included in the link stated in the output) to protect your notebook against unauthorized access. You have to save this token and enter it in the browser you connect with. You can change this to password-based authentication via jupyter notebook password
. If you are running multiple servers on one or more nodes, you can separate them by changing the port number with --port=$PORT
.
This is slightly trickier as we have to create a SSH connection/tunnel with potentially multiple hops in between. The easiest way is probably to configure your .ssh/config
to automatically route your connection via the login node (and possibly MDC jail). This is described in our Advanced SSH config documentation
In short, add these lines to ~/.ssh/config
(replace curly parts):
Host bihcluster\n user {USER_NAME}\n HostName hpc-login-2.cubi.bihealth.org\n\nHost hpc-cpu*\n user {USER_NAME}\n ProxyJump bihcluster\n
For MDC users outside the MDC network:
Host mdcjail\n HostName ssh1.mdc-berlin.de\n User {MDC_USER_NAME}\n\nHost bihcluster\n user {USER_NAME}\n HostName hpc-login-2.cubi.bihealth.org\n\nHost hpc-cpu*\n user {USER_NAME}\n ProxyJump bihcluster\n
Check that this config is working by connecting like this: ssh hpc-cpu-1
. Please note that you cannot use any resources on this node without a valid Slurm session.
Now set up a tunnel for your running Jupyter session:
workstation:~$ ssh -N -f -L 127.0.0.1:8888:localhost:{PORT} hpc-cpu-x\n
The port of your Jupyter server is usually 8888
. The cluster node srun
has sent you to determines the last argument. You should now be able to connect to your Jupyter server by typing localhost:8888
in your web browser (see the note about token and password above).
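Before opening the browser, it can help to check that the local end of the tunnel is actually listening. The following is a minimal sketch, assuming the tunnel was started with local port 8888 as in the ssh command above; the helper name `port_is_open` is hypothetical and only tests whether a TCP connection can be established.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    '''Return True if something accepts TCP connections on host:port.'''
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # 8888 is the local end of the tunnel from the ssh command above.
    if port_is_open('127.0.0.1', 8888):
        print('Tunnel is up: open localhost:8888 in your browser.')
    else:
        print('Nothing is listening on port 8888; re-run the ssh -L command.')
```

A successful check only shows that the local port is open. If the Jupyter server on the compute node is not running, the browser will still fail to load the page.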
It can and will happen that you lose the connection, either due to network problems or because your computer shuts down. This is not a problem and you will not lose data; just reconnect to your session. If your notebooks also lose their connection (you will see a colorful remark in the top right corner), reconnect and click the colorful button. If this does not work, your work is still not lost, as all cells that have been executed are saved automatically. Copy all unexecuted cells (those are only saved periodically) and reload the browser page (after reconnecting) with F5
. (you can also open a copy of the notebook in another tab, just be aware that there may be synchronisation problems)
There are two independent steps in ending a session:
Canceling the SSH tunnel
workstation:~$ ps aux | grep \"$PORT\"\n
This will give you something like this:
user 54 0.0 0.0 43104 784 ? Ss 15:06 0:00 ssh -N -f -L 127.0.0.1:8888:localhost:8888 hpc-cpu-x\nuser 58 0.0 0.0 41116 1024 tty1 S 15:42 0:00 grep --color=auto 8888\n
from which you need the process ID (here 54
)
workstation:~$ kill -9 $PID\n
Shutdown the Jupyter server
Open the Jupyter session, cancel the process with {Ctrl} + {C} and confirm {y}. Make sure you saved your notebooks beforehand (though auto-save catches most things).
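For the first step above (canceling the SSH tunnel), the process ID can also be picked out of the ps output programmatically. The following is a small sketch, assuming the usual `ps aux` column layout with the command starting in the eleventh column; the function name `tunnel_pids` is hypothetical.

```python
def tunnel_pids(ps_output, port):
    '''Return PIDs of ssh -L tunnel processes for the given local port.'''
    needle = '127.0.0.1:{}:'.format(port)
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        if len(fields) < 11:
            continue
        command = fields[10:]
        if command[0] == 'ssh' and '-L' in command and any(needle in arg for arg in command):
            pids.append(int(fields[1]))
    return pids


# The two lines from the example ps output above.
example = '''user 54 0.0 0.0 43104 784 ? Ss 15:06 0:00 ssh -N -f -L 127.0.0.1:8888:localhost:8888 hpc-cpu-x
user 58 0.0 0.0 41116 1024 tty1 S 15:42 0:00 grep --color=auto 8888'''
print(tunnel_pids(example, 8888))  # [54]
```

Matching on the ssh command rather than on the bare port number skips the grep process itself, which also contains 8888.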
"},{"location":"how-to/software/jupyter/#advanced","title":"Advanced","text":"If anyone has figured out, the following might also be interesting (please add):
Because the GPU nodes med030[1-4]
each have four GPU units, we can train a model using multiple GPUs in parallel. This How-To gives an example with Keras 2.2.4 and TensorFlow, followed by some hints on how to submit a job on the cluster.
Hint
With tensorflow > 2.0 and newer keras versions the multi_gpu_model
is deprecated and you have to use the MirroredStrategy
.
We need to import multi_gpu_model
from keras.utils
and pass our actual model (e.g., a sequential Keras model) into it. In general, Keras automatically detects the number of available GPUs (gpus=None
). This does not seem to work on our system, so we have to specify the number of GPUs, e.g., two with gpus=2
. We wrap this in a try/except block so that it also works on CPUs.
from keras.utils import multi_gpu_model\n\ntry: \n model = multi_gpu_model(model, gpus=2) \nexcept:\n pass\n
That's it!
Please read here on how to submit jobs to the GPU nodes.
"},{"location":"how-to/software/keras/#conda-environment","title":"Conda environment","text":"All this was tested with the following conda environment:
name: cuda\nchannels:\n- conda-forge\n- bioconda\n- defaults\ndependencies:\n- keras=2.2.4\n- python=3.6.7\n- tensorboard=1.12.0\n- tensorflow=1.12.0\n- tensorflow-base=1.12.0\n- tensorflow-gpu=1.12.0\n
"},{"location":"how-to/software/matlab/","title":"How-To: Use Matlab","text":"Note
This information is outdated and will soon be removed.
GNU Octave as Matlab alternative
Note that GNU Octave is an Open Source alternative to Matlab. While both packages are not 100% compatible, Octave is an alternative that does not require any license management. Further, you can easily install it yourself using Conda.
Want to use the Matlab GUI?
Make sure you understand X forwarding as outlined in this FAQ entry.
You can also use Open OnDemand Portal to run Matlab.
"},{"location":"how-to/software/matlab/#pre-requisites","title":"Pre-requisites","text":"You have to register with hpc-helpdesk@bih-charite.de for requesting access to the Matlab licenses. Afterwards, you can connect to the High-Memory using the license_matlab_r2016b
resource (see below).
BIH has a license of Matlab R2016b for 16 seats and various licensed packages (see below). To display the available licenses:
hpc-login-1:~$ scontrol show lic\nLicenseName=matlab_r2016b\n Total=16 Used=0 Free=16 Remote=no\n
Matlab is installed on all of the compute nodes:
# The following is VITAL so the scheduler allocates a license to your session.\nhpc-login-1:~$ srun -L matlab_r2016b:1 --pty bash -i\nmed0127:~$ scontrol show lic\nLicenseName=matlab_r2016b\n Total=16 Used=1 Free=15 Remote=no\nmed0127:~$ module avail\n----------------- /usr/share/Modules/modulefiles -----------------\ndot module-info null\nmodule-git modules use.own\n\n----------------------- /opt/local/modules -----------------------\ncmake/3.11.0-0 llvm/6.0.0-0 openmpi/3.1.0-0\ngcc/7.2.0-0 matlab/r2016b-0\nmed0127:~$ module load matlab/r2016b-0\nStart matlab without GUI: matlab -nosplash -nodisplay -nojvm\n Start matlab with GUI (requires X forwarding (ssh -X)): matlab\nmed0127:~$ matlab -nosplash -nodisplay -nojvm\n < M A T L A B (R) >\n Copyright 1984-2016 The MathWorks, Inc.\n R2016b (9.1.0.441655) 64-bit (glnxa64)\n September 7, 2016\n\n\nFor online documentation, see http://www.mathworks.com/support\nFor product information, visit www.mathworks.com.\n\n\n Non-Degree Granting Education License -- for use at non-degree granting, nonprofit,\n educational organizations only. 
Not for government, commercial, or other organizational use.\n\n>> ver\n--------------------------------------------------------------------------------------------\nMATLAB Version: 9.1.0.441655 (R2016b)\nMATLAB License Number: 1108905\nOperating System: Linux 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 x86_64\nJava Version: Java is not enabled\n--------------------------------------------------------------------------------------------\nMATLAB Version 9.1 (R2016b)\nBioinformatics Toolbox Version 4.7 (R2016b)\nGlobal Optimization Toolbox Version 3.4.1 (R2016b)\nImage Processing Toolbox Version 9.5 (R2016b)\nOptimization Toolbox Version 7.5 (R2016b)\nParallel Computing Toolbox Version 6.9 (R2016b)\nPartial Differential Equation Toolbox Version 2.3 (R2016b)\nSignal Processing Toolbox Version 7.3 (R2016b)\nSimBiology Version 5.5 (R2016b)\nStatistics and Machine Learning Toolbox Version 11.0 (R2016b)\nWavelet Toolbox Version 4.17 (R2016b)\n>> exit\n
"},{"location":"how-to/software/matlab/#running-matlab-ui","title":"Running MATLAB UI","text":"For starting the Matlab with GUI, make sure that your client is running a X11 server and you connect with X11 forwarding enabled (e.g., ssh -X hpc-login-1.cubi.bihealth.org
from the Linux command line). Then, make sure to use srun -L matlab_r2016b:1 --pty --x11 bash -i
for connecting to a node with X11 forwarding enabled.
client:~$ ssh -X hpc-login-1.cubi.bihealth.org\n[...]\nhpc-login-1:~ $ srun -L matlab_r2016b:1 --pty --x11 bash -i\n[...]\nmed0203:~$ module load matlab/r2016b-0\nStart matlab without GUI: matlab -nosplash -nodisplay -nojvm\n Start matlab with GUI (requires X forwarding (ssh -X)): matlab\nmed0203:~$ matlab\n[UI will start]\n
Starting in text mode can be forced as follows (as printed after module load
): matlab -nosplash -nodisplay -nojvm
.
Also see this FAQ entry.
"},{"location":"how-to/software/matlab/#see-available-matlab-licenses","title":"See Available Matlab Licenses","text":"You can use scontrol show lic
to see the currently available MATLAB licenses. E.g., here I am running an interactive shell in which I have requested 1 of the 16 MATLAB licenses, so 15 more remain.
$ scontrol show lic\nLicenseName=matlab_r2016b\n Total=16 Used=1 Free=15 Remote=no\n
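If you want to check the free licenses from a script, the output shown above is easy to parse. The following is a minimal sketch that works on the text format printed in the example; the function name `parse_licenses` is hypothetical, and obtaining the text (e.g., by running scontrol yourself) is left out.

```python
def parse_licenses(scontrol_output):
    '''Map license name -> dict with total, used and free counts.'''
    licenses = {}
    name = None
    for line in scontrol_output.splitlines():
        line = line.strip()
        if line.startswith('LicenseName='):
            name = line.split('=', 1)[1]
        elif name is not None:
            counts = {}
            for token in line.split():
                key, _, value = token.partition('=')
                if key in ('Total', 'Used', 'Free') and value.isdigit():
                    counts[key.lower()] = int(value)
            if counts:
                licenses[name] = counts
    return licenses


# Output copied from the example above.
example = '''LicenseName=matlab_r2016b
    Total=16 Used=1 Free=15 Remote=no'''
print(parse_licenses(example)['matlab_r2016b']['free'])  # 15
```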
"},{"location":"how-to/software/matlab/#a-working-example","title":"A Working Example","text":"Get a checkout of our MATLAB example. Then, look around at the contents of this repository.
hpc-login-1:~$ git clone https://github.com/bihealth/bih-cluster-matlab-example.git\nhpc-login-1:~$ cd bih-cluster-matlab-example\nhpc-login-1:~$ cat job_script.sh\n#!/bin/bash\n\n# Logging goes to directory sge_log\n#SBATCH -o slurm_log/%x-%J.log\n# Keep current environment variables\n#SBATCH --export=ALL\n# Name of the script\n#SBATCH --job-name MATLAB-example\n# Allocate 4GB of RAM per core\n#SBATCH --mem 4G\n# Maximal running time of 2 hours\n#SBATCH --time 02:00:00\n# Allocate one Matlab license\n#SBATCH -L matlab_r2016b:1\n\nmodule load matlab/r2016b-0\n\nmatlab -r example\n$ cat example.m\n% Example Hello World script for Matlab.\n\ndisp('Hello world!')\ndisp('Thinking...')\n\npause(10)\n\ndisp(sprintf('The square root of 2 is = %f', sqrt(2)))\nexit\n
For submitting the script, you can do the following
hpc-login-1:~$ sbatch job_script.sh\n
This will submit a job with one Matlab license requested. If you were to submit 17 of these jobs, then at least one of them would have to wait until all licenses are free.
Matlab License Server
Note that there is a Matlab license server that checks whether 16 or fewer Matlab sessions are currently running. If a Matlab session is running but this is not made known to the scheduler via -L matlab_r2016b
then this can lead to scripts crashing as not enough licenses are available. If this happens to you, double-check that you have specified the license requirements correctly and notify hpc-helpdesk@bih-charite.de in case of any problems. We will try to sort out the situation then.
This article describes how to build and run an OpenMPI program. We will build a simple C program that uses the OpenMPI message passing interface and run it in parallel. You should be able to go from here with other languages and more complex programs. We will use a simple Makefile for building the software.
"},{"location":"how-to/software/openmpi/#loading-openmpi-environment","title":"Loading OpenMPI Environment","text":"First, load the OpenMPI package.
hpc-login-1:~$ srun --pty bash -i\nmed0127:~$ module load openmpi/4.0.3-0\n
Then, check that the installation works
med0127:~$ ompi_info | head\n Package: Open MPI root@med0127 Distribution\n Open MPI: 4.0.3\n Open MPI repo revision: v4.0.3\n Open MPI release date: Mar 03, 2020\n Open RTE: 4.0.3\n Open RTE repo revision: v4.0.3\n Open RTE release date: Mar 03, 2020\n OPAL: 4.0.3\n OPAL repo revision: v4.0.3\n OPAL release date: Mar 03, 2020\n
"},{"location":"how-to/software/openmpi/#building-the-example","title":"Building the example","text":"Next, clone the OpenMPI example project from Gitlab.
med0127:~$ git clone git@github.com:bihealth/bih-cluster-openmpi-example.git\nmed0127:~$ cd bih-cluster-openmpi-example/src\n
Makefile
.PHONY: default clean\n\n# configure compilers\nCC=mpicc\nCXX=mpicxx\n# configure flags\nCCFLAGS += $(shell mpicc --showme:compile)\nLDFLAGS += $(shell mpicc --showme:link)\n\ndefault: openmpi_example\n\nopenmpi_example: openmpi_example.o\n\nclean:\n rm -f openmpi_example.o openmpi_example\n
openmpi_example.c
#include <stdio.h>\n#include <mpi.h>\n\nint main(int argc, char** argv) {\n // Initialize the MPI environment\n MPI_Init(NULL, NULL);\n\n // Get the number of processes\n int world_size;\n MPI_Comm_size(MPI_COMM_WORLD, &world_size);\n\n // Get the rank of the process\n int world_rank;\n MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);\n\n // Get the name of the processor\n char processor_name[MPI_MAX_PROCESSOR_NAME];\n int name_len;\n MPI_Get_processor_name(processor_name, &name_len);\n\n // Print off a hello world message\n printf(\"Hello world from processor %s, rank %d\"\n \" out of %d processors\\n\",\n processor_name, world_rank, world_size);\n\n // Finalize the MPI environment.\n MPI_Finalize();\n\n return 0;\n}\n
run_mpi.sh
#!/bin/bash\n\n# Example job script for (single-threaded) MPI programs.\n\n# Generic arguments\n\n# Job name\n#SBATCH --job-name openmpi_example\n# Maximal running time of 10 min\n#SBATCH --time 00:10:00\n# Allocate 1GB of memory per node\n#SBATCH --mem 1G\n# Write logs to directory \"slurm_log\"\n#SBATCH -o slurm_log/slurm-%x-%J.log\n\n# MPI-specific parameters\n\n# Run 64 tasks (threads/on virtual cores)\n#SBATCH --ntasks 64\n\n# Make sure to source the profile.d file (not available on head nodes).\nsource /etc/profile.d/modules.sh\n\n# Load the OpenMPI environment module to get the runtime environment.\nmodule load openmpi/4.0.3-0\n\n# Launch the program.\nmpirun -np 64 ./openmpi_example\n
The next step is building the software
med0127:~$ make\nmpicc -c -o openmpi_example.o openmpi_example.c\nmpicc -pthread -Wl,-rpath -Wl,/opt/local/openmpi-4.0.3-0/lib -Wl,--enable-new-dtags -L/opt/local/openmpi-4.0.3-0/lib -lmpi openmpi_example.o -o openmpi_example\nmed0127:~$ ls -lh\ntotal 259K\n-rw-rw---- 1 holtgrem_c hpc-ag-cubi 287 Apr 7 23:29 Makefile\n-rwxrwx--- 1 holtgrem_c hpc-ag-cubi 8.5K Apr 8 00:15 openmpi_example\n-rw-rw---- 1 holtgrem_c hpc-ag-cubi 760 Apr 7 23:29 openmpi_example.c\n-rw-rw---- 1 holtgrem_c hpc-ag-cubi 2.1K Apr 8 00:15 openmpi_example.o\n-rwxrwx--- 1 holtgrem_c hpc-ag-cubi 1.3K Apr 7 23:29 run_hybrid.sh\n-rwxrwx--- 1 holtgrem_c hpc-ag-cubi 663 Apr 7 23:35 run_mpi.sh\ndrwxrwx--- 2 holtgrem_c hpc-ag-cubi 4.0K Apr 7 23:29 sge_log\n
The software will run outside of the MPI environment -- but in a single process only, of course.
med0127:~$ ./openmpi_example\nHello world from processor med0127, rank 0 out of 1 processors\n
"},{"location":"how-to/software/openmpi/#running-openmpi-software","title":"Running OpenMPI Software","text":"All of the arguments are already in the run_mpi.sh
script.
med0127:~$ sbatch run_mpi.sh\n
Explanation of the OpenMPI-specific arguments
--ntasks 64
: run 64 processes in the MPI environment.Let's look at the slurm log file, e.g., in slurm_log/slurm-openmpi_example-3181.log
.
med0124:~$ cat slurm_log/slurm-openmpi_example-*.log\nHello world from processor med0133, rank 6 out of 64 processors\nHello world from processor med0133, rank 25 out of 64 processors\nHello world from processor med0133, rank 1 out of 64 processors\nHello world from processor med0133, rank 2 out of 64 processors\nHello world from processor med0133, rank 3 out of 64 processors\nHello world from processor med0133, rank 7 out of 64 processors\nHello world from processor med0133, rank 9 out of 64 processors\nHello world from processor med0133, rank 12 out of 64 processors\nHello world from processor med0133, rank 13 out of 64 processors\nHello world from processor med0133, rank 15 out of 64 processors\nHello world from processor med0133, rank 16 out of 64 processors\nHello world from processor med0133, rank 17 out of 64 processors\nHello world from processor med0133, rank 18 out of 64 processors\nHello world from processor med0133, rank 23 out of 64 processors\nHello world from processor med0133, rank 24 out of 64 processors\nHello world from processor med0133, rank 26 out of 64 processors\nHello world from processor med0133, rank 27 out of 64 processors\nHello world from processor med0133, rank 31 out of 64 processors\nHello world from processor med0133, rank 0 out of 64 processors\nHello world from processor med0133, rank 4 out of 64 processors\nHello world from processor med0133, rank 5 out of 64 processors\nHello world from processor med0133, rank 8 out of 64 processors\nHello world from processor med0133, rank 10 out of 64 processors\nHello world from processor med0133, rank 11 out of 64 processors\nHello world from processor med0133, rank 14 out of 64 processors\nHello world from processor med0133, rank 19 out of 64 processors\nHello world from processor med0133, rank 20 out of 64 processors\nHello world from processor med0133, rank 21 out of 64 processors\nHello world from processor med0133, rank 22 out of 64 processors\nHello world from processor med0133, rank 
28 out of 64 processors\nHello world from processor med0133, rank 29 out of 64 processors\nHello world from processor med0133, rank 30 out of 64 processors\nHello world from processor med0134, rank 32 out of 64 processors\nHello world from processor med0134, rank 33 out of 64 processors\nHello world from processor med0134, rank 34 out of 64 processors\nHello world from processor med0134, rank 38 out of 64 processors\nHello world from processor med0134, rank 39 out of 64 processors\nHello world from processor med0134, rank 42 out of 64 processors\nHello world from processor med0134, rank 44 out of 64 processors\nHello world from processor med0134, rank 45 out of 64 processors\nHello world from processor med0134, rank 46 out of 64 processors\nHello world from processor med0134, rank 53 out of 64 processors\nHello world from processor med0134, rank 54 out of 64 processors\nHello world from processor med0134, rank 55 out of 64 processors\nHello world from processor med0134, rank 60 out of 64 processors\nHello world from processor med0134, rank 62 out of 64 processors\nHello world from processor med0134, rank 35 out of 64 processors\nHello world from processor med0134, rank 36 out of 64 processors\nHello world from processor med0134, rank 37 out of 64 processors\nHello world from processor med0134, rank 40 out of 64 processors\nHello world from processor med0134, rank 41 out of 64 processors\nHello world from processor med0134, rank 43 out of 64 processors\nHello world from processor med0134, rank 47 out of 64 processors\nHello world from processor med0134, rank 48 out of 64 processors\nHello world from processor med0134, rank 49 out of 64 processors\nHello world from processor med0134, rank 50 out of 64 processors\nHello world from processor med0134, rank 51 out of 64 processors\nHello world from processor med0134, rank 52 out of 64 processors\nHello world from processor med0134, rank 56 out of 64 processors\nHello world from processor med0134, rank 57 out of 64 
processors\nHello world from processor med0134, rank 59 out of 64 processors\nHello world from processor med0134, rank 61 out of 64 processors\nHello world from processor med0134, rank 63 out of 64 processors\nHello world from processor med0134, rank 58 out of 64 processors\n
"},{"location":"how-to/software/openmpi/#running-hybrid-software-mpimultithreading","title":"Running Hybrid Software (MPI+Multithreading)","text":"In some cases, you want to mix multithreading (e.g., via OpenMP) with MPI to run one process with multiple threads that then can communicate via shared memory. Note that OpenMPI will let processes on the same node communicate via shared memory anyway, so this might not be necessary in all cases.
The file run_hybrid.sh
shows how to run an MPI job with 8 processes that use 4 threads each.
Note well that memory is allocated on a per-slot (thus per-thread) basis!
run_hybrid.sh
#!/bin/bash\n\n# Example job script for multi-threaded MPI programs, sometimes\n# called \"hybrid\" MPI computing.\n\n# Generic arguments\n\n# Job name\n#SBATCH --job-name openmpi_example\n# Maximal running time of 10 min\n#SBATCH --time 00:10:00\n# Allocate 1GB of memory per node\n#SBATCH --mem 1G\n# Write logs to directory \"slurm_log\"\n#SBATCH -o slurm_log/slurm-%x-%J.log\n\n# MPI-specific parameters\n\n# Run 8 tasks (threads/on virtual cores)\n#SBATCH --ntasks 8\n# Allocate 4 CPUs per task (cores/threads)\n#SBATCH --cpus-per-task 4\n\n# Make sure to source the profile.d file (not available on head nodes).\nsource /etc/profile.d/modules.sh\n\n# Load the OpenMPI environment module to get the runtime environment.\nmodule load openmpi/4.0.3-0\n\n# Launch the program.\nmpirun -n 8 ./openmpi_example\n
We changed the following
Let's look at the log output:
# cat slurm_log/slurm-openmpi_example-3193.log\nHello world from processor med0133, rank 1 out of 8 processors\nHello world from processor med0133, rank 3 out of 8 processors\nHello world from processor med0133, rank 2 out of 8 processors\nHello world from processor med0133, rank 6 out of 8 processors\nHello world from processor med0133, rank 0 out of 8 processors\nHello world from processor med0133, rank 4 out of 8 processors\nHello world from processor med0133, rank 5 out of 8 processors\nHello world from processor med0133, rank 7 out of 8 processors\n
Each process can now launch 4 threads (e.g., by defining export OMP_NUM_THREADS=4
before the program call).
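Rather than hard-coding the thread count, it can be derived from the allocation. The following is a minimal sketch, assuming the job was submitted with --cpus-per-task so that Slurm sets the SLURM_CPUS_PER_TASK environment variable. The helper name set_omp_threads is hypothetical and not part of the original example.

```shell
#!/bin/bash
# Hypothetical helper: derive the OpenMP thread count from the Slurm
# allocation instead of hard-coding it.
set_omp_threads() {
    # SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
    # fall back to a single thread otherwise.
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "Using ${OMP_NUM_THREADS} OpenMP thread(s) per MPI process"
```

Called before mpirun in run_hybrid.sh, this would keep the thread count in sync with the --cpus-per-task value if that value is changed later.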
This page gives an end-to-end example of how to build and install Gromacs, as an example of managing complex scientific software installs in user land. You don't have to learn or understand the specifics of Gromacs. We use it as an example as there are some actual users on the BIH cluster. However, installing it is out of scope of BIH HPC administration.
Gromacs is a good example as it is a sufficiently complex piece of software. Quite some configuration is done on the command line and there is no current software package of it in the common RPM repositories. However, it is quite well-documented and easy to install for scientific software so there is a lot to be learned.
"},{"location":"how-to/software/scientific-software/#related-documents","title":"Related Documents","text":"We will perform the following step:
Makefile
s)Many scientific software packages will have more dependencies. If the dependencies are available as CentOS Core or EPEL packages (such as zlib), HPC IT administration can install them. Otherwise, you will have to install them on your own.
Warning
Do not perform the compilation on the login nodes but go to a compute node instead.
"},{"location":"how-to/software/scientific-software/#downloading-and-extracting-software","title":"Downloading and Extracting Software","text":"This is best done in your scratch
directory as we don't have to keep these files around for long. Note that the files in your scratch
directory will automatically be removed after 2 weeks. You can also use your work
directory here.
hpc-login-1:~$ srun --pty bash -i\nmed0127:~$ mkdir $HOME/scratch/gromacs-install\nmed0127:~$ cd $HOME/scratch/gromacs-install\nmed0127:~$ wget http://ftp.gromacs.org/pub/gromacs/gromacs-2018.3.tar.gz\nmed0127:~$ tar xf gromacs-2018.3.tar.gz\nmed0127:~$ ls gromacs-2018.3\nadmin cmake COPYING CTestConfig.cmake INSTALL scripts src\nAUTHORS CMakeLists.txt CPackInit.cmake docs README share tests\n
So far so good!
"},{"location":"how-to/software/scientific-software/#perform-the-configure-step","title":"Perform the Configure Step","text":"This is the most critical step. Most scientific C/C++ software has a build step and allows for, e.g., disabling and enabling features or setting installation paths. Here, you can configure the software depending on your needs and environment. Also, it is the easiest step to mess up.
Gromacs' documentation is actually quite good, but the author had problems following it to the letter. Gromacs recommends creating an MPI and a non-MPI build, but the precise way described did not work. This installation creates two flavours for Gromacs 2018.3, but in a different way than the Gromacs documentation proposes.
First, here is how to configure the non-MPI flavour. Gromacs wants a modern compiler, so we load gcc
. We will need to note down the precise version we used so later we can load it for running Gromacs with the appropriate libraries. We will install gromacs into $HOME/work/software
, which is appropriate for user-installed software, but it could also go into a group or project directory. Note that we install the software into your work directory as software installations are quite large and might go above your home quota. Also, software installations are usually not precious enough to waste resources on snapshots and backups. Also note that we force Gromacs to use AVX_256
for SIMD support (Intel Sandy Bridge architecture) to avoid unsupported CPU instruction errors.
med0127:~$ module load gcc/7.2.0-0 cmake/3.11.0-0\nmed0127:~$ module list\nCurrently Loaded Modulefiles:\n 1) gcc/7.2.0-0 2) cmake/3.11.0-0\nmed0127:~$ mkdir gromacs-2018.3-build-nompi\nmed0127:~$ cd gromacs-2018.3-build-nompi\nmed0127:~$ cmake ../gromacs-2018.3 \\\n -DGMX_BUILD_OWN_FFTW=ON \\\n -DGMX_MPI=OFF \\\n -DGMX_SIMD=AVX_256 \\\n -DCMAKE_INSTALL_PREFIX=$HOME/work/software/gromacs/2018.3\n
Second, here is how to configure the MPI flavour. Note that we are also enabling the openmpi
module. We will also need the precise version here so we can later load the correct libraries. Note that we install the software into the directory gromacs-mpi
and switch off shared library building, as recommended by the Gromacs documentation.
med0127:~$ module load openmpi/4.0.3-0\nmed0127:~$ module list\nCurrently Loaded Modulefiles:\n 1) gcc/7.2.0-0 2) cmake/3.11.0-0 3) openmpi/4.0.3-0\nmed0127:~$ mkdir ../gromacs-2018.3-build-mpi\nmed0127:~$ cd ../gromacs-2018.3-build-mpi\nmed0127:~$ cmake ../gromacs-2018.3 \\\n -DGMX_BUILD_OWN_FFTW=ON \\\n -DGMX_MPI=ON \\\n -DGMX_SIMD=AVX_256 \\\n -DCMAKE_INSTALL_PREFIX=$HOME/work/software/gromacs-mpi/2018.3 \\\n -DCMAKE_C_COMPILER=$(which mpicc) \\\n -DCMAKE_CXX_COMPILER=$(which mpicxx) \\\n -DBUILD_SHARED_LIBS=off\n
"},{"location":"how-to/software/scientific-software/#perform-the-build-and-install-steps","title":"Perform the Build and Install Steps","text":"This is simple, using -j 32
allows us to build with 32 threads. If something goes wrong: meh, the \"joys\" of compiling C software.
Getting Support for Building Software
BIH HPC IT cannot provide support for compiling scientific software. Please contact the appropriate mailing lists or forums for your scientific software. You should contact the BIH HPC IT helpdesk only if you are sure that the problem is with the BIH HPC cluster. You should first try to resolve the issue on your own and with the developers of the software that you are trying to build/use.
For the no-MPI version:
med0127:~$ cd ../gromacs-2018.3-build-nompi\nmed0127:~$ make -j 32\n[...]\nmed0127:~$ make install\n
For the MPI version:
med0127:~$ cd ../gromacs-2018.3-build-mpi\nmed0127:~$ make -j 32\n[...]\nmed0127:~$ make install\n
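The -j 32 above only pays off if the session actually has 32 cores available. As a hedged sketch, the build width could instead be taken from the allocation. It assumes that Slurm sets SLURM_CPUS_PER_TASK when the session was started with --cpus-per-task and that nproc from GNU coreutils is installed. The helper name build_jobs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: choose a parallel build width that matches the
# CPUs actually available to this session.
build_jobs() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        # Inside a Slurm allocation started with --cpus-per-task.
        echo "${SLURM_CPUS_PER_TASK}"
    else
        # Otherwise use the number of processing units visible to us.
        nproc
    fi
}

# Print the make command that would be used in the build directory.
echo "make -j $(build_jobs)"
```

In the build directory, one would then run make -j "$(build_jobs)" instead of a fixed number.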
"},{"location":"how-to/software/scientific-software/#create-environment-modules-files","title":"Create Environment Modules Files","text":"For Gromacs 2018.3, the following is appropriate. You should be able to use this as a template for your environment module files:
med0127:~$ mkdir -p $HOME/local/modules/gromacs\nmed0127:~$ cat >$HOME/local/modules/gromacs/2018.3 <<\"EOF\"\n#%Module\nproc ModulesHelp { } {\n puts stderr {\n Gromacs molecular simulation toolkit (non-MPI version)\n\n - http://www.gromacs.org\n }\n}\n\nmodule-whatis {Gromacs molecular simulation toolkit (non-MPI)}\n\nset root /data/cephfs-1/home/users/YOURUSER/work/software/gromacs/2018.3\n\nprereq gcc/7.2.0-0\n\nconflict gromacs\nconflict gromacs-mpi\n\nprepend-path LD_LIBRARY_PATH $root/lib64\nprepend-path LIBRARY_PATH $root/lib64\nprepend-path MANPATH $root/share/man\nprepend-path PATH $root/bin\nsetenv GMXRC $root/bin/GMXRC\nEOF\n
med0127:~$ mkdir -p $HOME/local/modules/gromacs-mpi\nmed0127:~$ cat >$HOME/local/modules/gromacs-mpi/2018.3 <<\"EOF\"\n#%Module\nproc ModulesHelp { } {\n puts stderr {\n Gromacs molecular simulation toolkit (MPI version)\n\n - http://www.gromacs.org\n }\n}\n\nmodule-whatis {Gromacs molecular simulation toolkit (MPI)}\n\nset root /data/cephfs-1/home/users/YOURUSER/work/software/gromacs-mpi/2018.3\n\nprereq openmpi/4.0.3-0\nprereq gcc/7.2.0-0\n\nconflict gromacs\nconflict gromacs-mpi\n\nprepend-path LD_LIBRARY_PATH $root/lib64\nprepend-path LIBRARY_PATH $root/lib64\nprepend-path MANPATH $root/share/man\nprepend-path PATH $root/bin\nsetenv GMXRC $root/bin/GMXRC\nEOF\n
With the next command, make your local module files path known to the environment modules system.
med0127:~$ module use $HOME/local/modules\n
You can verify the result:
med0127:~$ module avail\n\n------------------ /data/cephfs-1/home/users/YOURUSER/local/modules ------------------\ngromacs/2018.3 gromacs-mpi/2018.3\n\n-------------------- /usr/share/Modules/modulefiles --------------------\ndot module-info null\nmodule-git modules use.own\n\n-------------------------- /opt/local/modules --------------------------\ncmake/3.11.0-0 llvm/6.0.0-0 openmpi/3.1.0-0\ngcc/7.2.0-0 matlab/r2016b-0 openmpi/4.0.3-0\n
"},{"location":"how-to/software/scientific-software/#interlude-convenient-module-use","title":"Interlude: Convenient module use
","text":"You can add this to your ~/.bashrc
file to always execute the module use
after login. Note that module
is not available on the login or transfer nodes, so the following snippet skips them:
med0127:~$ cat >>~/.bashrc <<\"EOF\"\ncase \"${HOSTNAME}\" in\n login-*|transfer-*)\n ;;\n *)\n module use $HOME/local/modules\n ;;\nesac\nEOF\n
Note that the paths chosen above are sensible but arbitrary. You can install any software anywhere you have permission to -- somewhere in your user and group home, maybe a project home makes most sense on the BIH HPC, no root permissions required. You can also place the module files anywhere, as long as the module use
line is appropriate.
As a best practice, you could use the following location:
$HOME/work/software
as a root to install software to
$HOME/work/software/$PKG/$VERSION
for installing a given software package in a given version
$HOME/work/software/modules
as the root for modules to install
$HOME/work/software/modules/$PKG/$VERSION
for the module file to load the software in a given version
$HOME/work/software/modules.sh
as a Bash script to contain the line module use $HOME/work/software/modules
For a shared group setup (restrict permissions with chmod ug=rX,o= $GROUP/work/software
, the upper case X
is essential to only set +x
on directories and not files):
$GROUP/work/software
as a root to install software to
$GROUP/work/software/$PKG/$VERSION
for installing a given software package in a given version
$GROUP/work/software/modules
as the root for modules to install
$GROUP/work/software/modules/$PKG/$VERSION
for the module file to load the software in a given version
$GROUP/work/software/modules.sh
as a Bash script to contain the case
Bash snippet from above but with module use $GROUP/work/software/modules
Every time you want to use Gromacs, you can now do
med0127:~$ module load gcc/7.2.0-0 gromacs/2018.3\n
or, if you want to have the MPI version:
med0127:~$ module load gcc/7.2.0-0 openmpi/4.0.3-0 gromacs-mpi/2018.3\n
"},{"location":"how-to/software/scientific-software/#launching-gromacs","title":"Launching Gromacs","text":"Something along the lines of the following job script should be appropriate. See How-To: Build Run OpenMPI Programs for more information.
#!/bin/bash\n\n# Example job script for hybrid (MPI + multithreaded) programs.\n\n# Generic arguments\n\n# Job name\n#SBATCH --job-name gromacs\n# Maximal running time of 10 min\n#SBATCH --time 00:10:00\n# Allocate 1GB of memory per node\n#SBATCH --mem 1G\n# Write logs to directory \"slurm_log/<name>-<job id>.log\" (dir must exist)\n#SBATCH --output slurm_log/%x-%J.log\n\n# MPI-specific parameters\n\n# Launch 8 tasks\n#SBATCH --ntasks 8\n# Allocate 4 CPUs per task\n#SBATCH --cpus-per-task 4\n\n# Load the OpenMPI and GCC environment module to get the runtime environment.\nmodule load gcc/7.2.0-0\nmodule load openmpi/4.0.3-0\n\n# Make custom environment modules known. Alternatively, you can \"module use\"\n# them in the session you use for submitting the job.\nmodule use $HOME/local/modules\nmodule load gromacs-mpi/2018.3\n\n# Launch the program with 8 tasks and tell Gromacs to use 4 threads for each\n# invocation.\nexport OMP_NUM_THREADS=4\nmpirun -n 8 gmx_mpi mdrun -deffnm npt_1000\n
med0127:~$ mkdir slurm_log\nmed0127:~$ sbatch job_script.sh\nSubmitted batch job 3229\n
"},{"location":"how-to/software/tensorflow/","title":"How-To: Setup TensorFlow","text":"TensorFlow is a package for deep learning with optional support for GPUs. You can find the original TensorFlow installation instructions here.
This article describes how to set up TensorFlow with GPU support using Conda. This how-to assumes that you have just connected to a GPU node via srun --mem=10g --partition=gpu --gres=gpu:tesla:1 --pty bash -i
(for Tesla V100 GPUs; for A40 GPUs, use --gres=gpu:a40:1
). Note that you will need to allocate \"enough\" memory, otherwise your python session will be Killed
because of too little memory. You should read the How-To: Connect to GPU Nodes tutorial for an explanation of how to do this.
This tutorial assumes that conda has been set up as described in Software Management.
"},{"location":"how-to/software/tensorflow/#create-conda-environment","title":"Create conda environment","text":"We recommend that you install mamba first with conda install -y mamba
and use this C++ reimplementation of the conda command
as follows.
$ conda create -y -n python-tf tensorflow-gpu\n$ conda activate python-tf\n
Let us verify that we have Python and TensorFlow installed. You might get different versions; to pin them, create the environment with conda create -y -n python-tf python==3.9.10 tensorflow-gpu==2.6.2 instead.
$ python --version\nPython 3.9.10\n$ python -c 'import tensorflow; print(tensorflow.__version__)'\n2.6.2\n
We thus end up with an installation of Python 3.9.10 with tensorflow 2.6.2.
"},{"location":"how-to/software/tensorflow/#run-tensorflow-example","title":"Run TensorFlow Example","text":"Let us now see whether TensorFlow has recognized our GPU correctly.
$ python\n>>> import tensorflow as tf\n>>> print(\"TensorFlow version:\", tf.__version__)\nTensorFlow version: 2.6.2\n>>> print(tf.config.list_physical_devices())\n[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n
Yay, we can proceed to run the Quickstart Tutorial.
>>> mnist = tf.keras.datasets.mnist\n>>> (x_train, y_train), (x_test, y_test) = mnist.load_data()\n>>> x_train, x_test = x_train / 255.0, x_test / 255.0\n>>> model = tf.keras.models.Sequential([\n... tf.keras.layers.Flatten(input_shape=(28, 28)),\n... tf.keras.layers.Dense(128, activation='relu'),\n... tf.keras.layers.Dropout(0.2),\n... tf.keras.layers.Dense(10)\n... ])\n>>> predictions = model(x_train[:1]).numpy()\n>>> predictions\narray([[-0.50569224, 0.26386747, 0.43226188, 0.61226094, 0.09630793,\n 0.34400576, 0.9819117 , -0.3693726 , 0.5221357 , 0.3323232 ]],\n dtype=float32)\n>>> tf.nn.softmax(predictions).numpy()\narray([[0.04234391, 0.09141268, 0.10817807, 0.12951255, 0.07731011,\n 0.09903987, 0.18743432, 0.04852816, 0.11835073, 0.09788957]],\n dtype=float32)\n>>> loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n>>> loss_fn(y_train[:1], predictions).numpy()\n2.3122327\n>>> model.compile(optimizer='adam',\n... loss=loss_fn,\n... metrics=['accuracy'])\n>>> model.fit(x_train, y_train, epochs=5)\n2022-03-09 17:53:47.237997: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)\nEpoch 1/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.2918 - accuracy: 0.9151\nEpoch 2/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.1444 - accuracy: 0.9561\nEpoch 3/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.1082 - accuracy: 0.9674\nEpoch 4/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.0898 - accuracy: 0.9720\nEpoch 5/5\n1875/1875 [==============================] - 3s 1ms/step - loss: 0.0773 - accuracy: 0.9756\n<keras.callbacks.History object at 0x154e81360190>\n>>> model.evaluate(x_test, y_test, verbose=2)\n313/313 - 0s - loss: 0.0713 - accuracy: 0.9785\n[0.0713074803352356, 0.9785000085830688]\n>>> probability_model = tf.keras.Sequential([\n... model,\n... tf.keras.layers.Softmax()\n... 
])\n>>> probability_model(x_test[:5])\n<tf.Tensor: shape=(5, 10), dtype=float32, numpy=\narray([[1.2339272e-06, 6.5599060e-10, 1.0560590e-06, 5.9356184e-06,\n 5.3691075e-12, 1.4447859e-07, 5.4218874e-13, 9.9996936e-01,\n 1.0347234e-07, 2.2147648e-05],\n [2.9887938e-06, 6.8461006e-05, 9.9991941e-01, 7.2003731e-06,\n 2.9751782e-13, 8.2818183e-08, 1.4307782e-06, 2.3203837e-13,\n 4.7433215e-07, 2.9504194e-14],\n [1.8058477e-06, 9.9928612e-01, 7.8716243e-05, 3.9140195e-06,\n 3.0842333e-05, 9.4537208e-06, 2.2774333e-05, 4.5549971e-04,\n 1.1015874e-04, 6.9138093e-07],\n [9.9978787e-01, 3.0206781e-08, 2.8528208e-05, 8.5581682e-08,\n 1.3851340e-07, 2.3634559e-06, 1.8480707e-05, 1.0153375e-04,\n 1.1583331e-07, 6.0887167e-05],\n [6.4914235e-07, 2.5808356e-08, 1.8225538e-06, 2.3215563e-09,\n 9.9588013e-01, 4.6049720e-08, 3.8903639e-07, 2.9772724e-05,\n 4.3141077e-07, 4.0867776e-03]], dtype=float32)>\n>>> exit()\n
"},{"location":"how-to/software/tensorflow/#writing-tensorflow-slurm-jobs","title":"Writing TensorFlow Slurm Jobs","text":"Writing Slurm jobs using TensorFlow is as easy as creating the following scripts.
tf_script.py
#!/usr/bin/env python\n\nimport tensorflow as tf\nprint(\"TensorFlow version:\", tf.__version__)\nprint(tf.config.list_physical_devices())\n\nmnist = tf.keras.datasets.mnist\n\n(x_train, y_train), (x_test, y_test) = mnist.load_data()\nx_train, x_test = x_train / 255.0, x_test / 255.0\n\n\nmodel = tf.keras.models.Sequential([\n tf.keras.layers.Flatten(input_shape=(28, 28)),\n tf.keras.layers.Dense(128, activation='relu'),\n tf.keras.layers.Dropout(0.2),\n tf.keras.layers.Dense(10)\n])\n\npredictions = model(x_train[:1]).numpy()\nprint(predictions)\n\nprint(tf.nn.softmax(predictions).numpy())\n\n# ... and so on ;-)\n
tf_job.sh
#!/usr/bin/bash\n\n#SBATCH --job-name=tf-job\n#SBATCH --mem=10g\n#SBATCH --partition=gpu\n#SBATCH --gres=gpu:tesla:1\n\nsource $HOME/work/miniconda3/bin/activate\nconda activate python-tf\n\npython tf_script.py &>tf-out.txt\n
And then calling
$ sbatch tf_job.sh\n
You can find the results in tf-out.txt
after completion.
$ cat tf-out.txt \n2022-03-09 18:05:54.628846: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA\nTo enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n2022-03-09 18:05:56.999848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30988 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:18:00.0, compute capability: 7.0\nTensorFlow version: 2.6.2\n[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n[[-0.07757086 0.04676083 0.9420195 -0.59902835 -0.26286742 -0.392514\n 0.3231195 -0.17169198 0.3480805 0.37013203]]\n[[0.07963609 0.09017922 0.22075593 0.04727634 0.06616627 0.05812084\n 0.11888511 0.07248258 0.12188996 0.12460768]]\n
"},{"location":"hpc-tutorial/episode-0/","title":"First Steps: Episode 0","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm?"},{"location":"hpc-tutorial/episode-0/#prerequisites","title":"Prerequisites","text":"This tutorial assumes familiarity with Linux/Unix operating systems. It also assumes that you have already connected to the cluster. We have collected some links to tutorials and manuals on the internet.
"},{"location":"hpc-tutorial/episode-0/#legend","title":"Legend","text":"Before we start with our first steps tutorial, we would like to introduce the following convention that we use throughout the series:
$ Commands are prefixed with a little dollar sign\n
While file paths are highlighted like this: /data/cephfs-1/work/projects/cubit/current
.
After connecting to the cluster, you are located on a login node. To get to your first compute node, type srun --time 7-00 --mem=8G --cpus-per-task=8 --pty bash -i
which will launch an interactive Bash session on a free remote node running up to 7 days, enabling you to use 8 cores and 8 GB of memory. Typing exit
will bring you back to the login node.
hpc-login-1$ srun -p long --time 7-00 --mem=8G --cpus-per-task=8 --pty bash -i\nhpc-cpu-1$ exit\n$\n
See? That was easy!
"},{"location":"hpc-tutorial/episode-0/#preparation","title":"Preparation","text":"In preparation for our first steps tutorial series, we would like you to install the software for this tutorial. In general the users on the cluster will manage their own software with the help of conda. If you haven't done so so far, please follow the instructions in installing conda first. The only premise is that you are able to log into the cluster. Make also sure that you are logged in to a computation node using srun -p medium --time 1-00 --mem=4G --cpus-per-task=1 --pty bash -i
.
Now we will create a new environment, so as to not interfere with your current or planned software stack, and install into it all the software that we need during the tutorial. Run the following commands:
$ conda create -n first-steps python=3 snakemake bwa delly samtools gatk4\n$ conda activate first-steps\n(first-steps) $\n
"},{"location":"hpc-tutorial/episode-1/","title":"First Steps: Episode 1","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm? This is part one of the \"First Steps\" BIH Cluster Tutorial. Here we will build a small pipeline with alignment and variant calling. The premise is that you have the tools installed as described in Episode 0. For this episode, please make sure that you are on a compute node. As a reminder, the command to access a compute node with the required resources is
$ srun --time 7-00 --mem=8G --cpus-per-task=8 --pty bash -i\n
"},{"location":"hpc-tutorial/episode-1/#tutorial-input-files","title":"Tutorial Input Files","text":"We will provide you with some example FASTQ files, but you can use your own if you like. You can find the data here:
/data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz
/data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz
First, you should create a folder where the output of this tutorial will go. It would be good to have it in your work
directory in /data/cephfs-1/home/users/$USER
, because it is faster and there is more space available.
(first-steps) $ mkdir -p /data/cephfs-1/home/users/$USER/work/tutorial/episode1\n(first-steps) $ pushd /data/cephfs-1/home/users/$USER/work/tutorial/episode1\n
Quotas / File System limits
/data/cephfs-1/home/users/$USER
. The reason for this is that nightly snapshots and backups are created for this directory, and these are precious resources.
/data/cephfs-1/home/users/$USER/work
. The limits are much higher here but no snapshots or backups are available.
/data/cephfs-1/home/users/$USER/scratch
. However, files placed here are automatically removed after 2 weeks. This is only appropriate for files during download or temporary files.
In general it is advisable to have a proper temporary directory available. You can create one in your ~/scratch
folder and make it available to the system.
(first-steps) $ export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp\n(first-steps) $ mkdir -p $TMPDIR\n
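Within a script, a private subdirectory below TMPDIR that is cleaned up automatically avoids leaving stale files behind. The following is a minimal sketch, not part of the original tutorial. It assumes mktemp from GNU coreutils, and the fallback path $HOME/scratch/tmp as well as the prefix tutorial are illustrative choices.

```shell
#!/bin/bash
# Use TMPDIR if it is already set, otherwise fall back to a directory
# below $HOME (an assumption for this sketch).
export TMPDIR="${TMPDIR:-$HOME/scratch/tmp}"
mkdir -p "$TMPDIR"

# Create a private working directory below TMPDIR ...
workdir=$(mktemp -d "$TMPDIR/tutorial.XXXXXX")
# ... and remove it again when the shell or script exits.
trap 'rm -rf "$workdir"' EXIT

echo "Temporary files go to: $workdir"
```

Tools that accept a temporary directory option (such as the -T option of samtools sort used in this tutorial) could then be pointed at $workdir.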
"},{"location":"hpc-tutorial/episode-1/#using-the-cubit-static-data","title":"Using the Cubit Static Data","text":"The static data is located in /data/cephfs-1/work/projects/cubit/current/static_data
. For our small example, the required reference genome and index can be found at:
/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta
/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta
Let's align our data:
(first-steps) $ bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n /data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz \\\n| samtools view -b \\\n| samtools sort -O BAM -T $TMPDIR -o aln.bam\n\n(first-steps) $ samtools index aln.bam\n
"},{"location":"hpc-tutorial/episode-1/#perform-structural-variant-calling","title":"Perform Structural Variant Calling","text":"And do the structural variant calling:
(first-steps) $ delly call \\\n -g /data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta \\\n aln.bam\n
Note that delly will not find any variants.
"},{"location":"hpc-tutorial/episode-1/#small-variant-calling-snv-indel","title":"Small Variant Calling (SNV, indel)","text":"And now for the SNP calling (this step will take ~ 20 minutes):
(first-steps) $ gatk HaplotypeCaller \\\n -R /data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta \\\n -I aln.bam \\\n -ploidy 2 \\\n -O test.GATK.vcf\n
"},{"location":"hpc-tutorial/episode-1/#outlook-more-programs-and-static-data","title":"Outlook: More Programs and Static Data","text":"So this is it! We used the tools that we installed previously, accessed the reference data and ran a simple alignment and variant calling pipeline. You can access a list of all static data through this wiki, follow this link to the Static Data. You can also have a peek via:
(first-steps) $ tree -L 3 /data/cephfs-1/work/projects/cubit/current/static_data | less\n
"},{"location":"hpc-tutorial/episode-2/","title":"First Steps: Episode 2","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm? Welcome to the second episode of our tutorial series!
Once you are logged in to the cluster, you have the possibility to distribute your jobs to all the nodes that are available. But how can you do this easily? The key command to this magic is sbatch
. This tutorial will show you how you can use this efficiently.
sbatch
Command","text":"So what is sbatch
doing for you?
You use the sbatch
command in front of the script you actually want to run. sbatch
then puts your job into the job queue. The job scheduler looks at the current status of the whole system and will assign the first job in the queue to a node that is free in terms of computational load. If all machines are busy, yours will wait. But your job will sooner or later get assigned to a free node.
We strongly recommend using this process for starting your computationally intensive tasks because you will get the best performance for your job and the whole system won't be disturbed by jobs that are locally blocking nodes. Thus, everybody using the cluster benefits.
You may have noticed that you run sbatch
with a script, not with regular commands. The reason is that sbatch
only accepts bash scripts. If you give sbatch
a normal shell command or binary, it won't work. This means that we have to put the command(s) we want to use in a bash script. A skeleton script can be found at /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_job.sh
The content of the file:
#!/bin/bash\n\n# Set a name for the job (-J or --job-name).\n#SBATCH --job-name=tutorial\n\n# Set the file to write the stdout and stderr to (if -e is not set; -o or --output).\n#SBATCH --output=logs/%x-%j.log\n\n# Set the number of cores (-c or --cpus-per-task).\n#SBATCH --cpus-per-task=8\n\n# Force allocation of the two cores on ONE node.\n#SBATCH --nodes=1\n\n# Set the total memory. Units can be given in T|G|M|K.\n#SBATCH --mem=8G\n\n# Optionally, set the partition to be used (-p or --partition).\n#SBATCH --partition=medium\n\n# Set the expected running time of your job (-t or --time).\n# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS\n#SBATCH --time=30:00\n\nexport TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp\nmkdir -p ${TMPDIR}\n
The lines starting with #SBATCH
are actually setting parameters for a sbatch
command, so #SBATCH --job-name=tutorial
is equal to sbatch --job-name=tutorial
. Slurm will create a log file with a file name composed of the job name (%x
) and the job ID (%j
), e.g. logs/tutorial-XXXX.log
. It will not automatically create the logs
directory, we need to do this manually first. Here, we emphasize the importance of the log files! They are the first place to look if anything goes wrong.
To start now with our tutorial, create a new tutorial directory with a log directory, e.g.,
(first-steps) $ mkdir -p /data/cephfs-1/home/users/$USER/work/tutorial/episode2/logs\n
and copy the wrapper script to this directory:
(first-steps) $ pushd /data/cephfs-1/home/users/$USER/work/tutorial/episode2\n(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_job.sh .\n(first-steps) $ chmod u+w submit_job.sh\n
Now open this file and copy the same commands we executed in the last tutorial to this file.
To keep it simple, we will put everything into one script. This is perfectly fine because the alignment and indexing are sequential. But there are two steps that could be run in parallel, namely the variant calling, because they don't depend on each other. We will learn how to do that in a later tutorial. Your file should look something like this:
#!/bin/bash\n\n# Set a name for the job (-J or --job-name).\n#SBATCH --job-name=tutorial\n\n# Set the file to write the stdout and stderr to (if -e is not set; -o or --output).\n#SBATCH --output=logs/%x-%j.log\n\n# Set the number of cores (-c or --cpus-per-task).\n#SBATCH --cpus-per-task=8\n\n# Force allocation of the two cores on ONE node.\n#SBATCH --nodes=1\n\n# Set the total memory. Units can be given in T|G|M|K.\n#SBATCH --mem=8G\n\n# Optionally, set the partition to be used (-p or --partition).\n#SBATCH --partition=medium\n\n# Set the expected running time of your job (-t or --time).\n# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS\n#SBATCH --time=30:00\n\nexport TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp\nmkdir -p ${TMPDIR}\n\nBWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\nREF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\nbwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n $BWAREF \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz \\\n /data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz \\\n| samtools view -b \\\n| samtools sort -O BAM -T $TMPDIR -o aln.bam\n\nsamtools index aln.bam\n\ndelly call -g \\\n $REF \\\n aln.bam\n\ngatk HaplotypeCaller \\\n -R $REF \\\n -I aln.bam \\\n -ploidy 2 \\\n -O test.GATK.vcf\n
Let's run it (make sure that you are in the tutorial/episode2
directory!):
(first-steps) $ sbatch submit_job.sh\n
And wait for the response which will tell you that your job was submitted and which job id number it was assigned. Note that sbatch
only tells you that the job has been submitted, but nothing about it finishing. You won't get any response at the terminal when the job finishes. It will take approximately 20 minutes to finish the job.
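The job ID is needed later for locating the log file and for commands such as scancel. As a hedged sketch, it can be captured from the confirmation line that sbatch prints on success ("Submitted batch job" followed by the ID). The helper name parse_job_id is hypothetical, and a canned line stands in for the real sbatch output so that the snippet runs anywhere.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from the line that sbatch prints
# on success, e.g. "Submitted batch job 3229".
parse_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# On the cluster: jobid=$(sbatch submit_job.sh | parse_job_id)
# Here a canned line stands in for the sbatch output.
jobid=$(echo "Submitted batch job 3229" | parse_job_id)

# Build the name of the log file that belongs to this job.
echo "logs/tutorial-${jobid}.log"
```

Alternatively, sbatch --parsable prints the job ID directly (followed by the cluster name in multi-cluster setups), which makes the parsing step unnecessary.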
You'll probably want to see how your job is doing. You can get a list of your jobs using:
(first-steps) $ squeue --me\n
Note that logins are also considered as jobs.
Identify your job by the <JOBID>
(1st column) or the name of the script (3rd column). The most likely states you will see (5th column of the table):
PD: pending, waiting to be scheduled
R: running
In the 8th column you can see that your job is very likely running on a different machine than the one you are on!
Do not query Slurm with watch
or loops
The watch
command is a useful tool for running commands in a loop every N
seconds. For example, on your workstation you could do watch 'ping -c 3 google.com'
to execute three network pings to Google every two seconds.
\ud83d\udc4e Using watch
or manual loops in a cluster environment can have bad effects when querying Slurm or the shared file system. Both are shared resources and \"expensive\" queries should not be run in loops. For Slurm, this includes running squeue
. The same would be true for running squeue -i
which performs an internal loop.
\ud83d\udc4d Use the Slurm query commands only when you actually need the output. If you run them in an (implicit or explicit) loop, then do so only for a short time and don't leave the loop running in a screen session.
Get more information about a running job by passing its job id:
(first-steps) $ sstat <JOBID>\n
And of course, watch what the logs are telling you:
(first-steps) $ tail -f logs/tutorial-<JOBID>.log\n
There will be no notification when your job is done, so you have to check the output of squeue --me
yourself. To repeat the squeue
command automatically, there is a Linux command watch
that executes a given command every few seconds. This is useful for spotting changes in the output of a command. The number of seconds between two executions can be set with the -n
option. Use at least -n 60
and run it only for a short time to minimize unnecessary load on the shared scheduler:
(first-steps) $ watch -n 60 squeue --me\n
If for some reason your job is hanging, you can cancel it using scancel
with your job-ID: (first-steps) $ scancel <job-ID>\n
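The --time value in submit_job.sh limits how long the job may run, so it helps to know how long a given specification really is. The following Python sketch converts the time formats listed in the script's comments into seconds. The function name slurm_time_to_seconds is made up for this illustration and is not part of Slurm. Treating a bare number as minutes is our reading of the sbatch documentation.

```python
def slurm_time_to_seconds(spec):
    '''Convert a Slurm time specification into seconds (illustrative helper).'''
    days = 0
    if '-' in spec:
        # Days-HH, Days-HH:MM or Days-HH:MM:SS
        day_part, spec = spec.split('-', 1)
        days = int(day_part)
        fields = [int(field) for field in spec.split(':')]
        fields = fields + [0] * (3 - len(fields))
    else:
        fields = [int(field) for field in spec.split(':')]
        if len(fields) == 1:
            # A bare number is interpreted as minutes
            fields = [0, fields[0], 0]
        else:
            # MM:SS or HH:MM:SS
            fields = [0] * (3 - len(fields)) + fields
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


for spec in ['30:00', '12:00:00', '2-00:00:00']:
    print(spec, slurm_time_to_seconds(spec))
```

Note that 30:00 means thirty minutes, not thirty hours, which is an easy mistake to make when adapting the script.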
"},{"location":"hpc-tutorial/episode-2/#job-queues","title":"Job Queues","text":"The cluster has a special way of organizing itself and by telling the cluster how long and with which priority you want your jobs to run, you can help it in this. There is a system set up on the cluster where you can enqueue your jobs to so-called partitions. partitions have different prioritites and are allowed for different running times. To get to know what partitions are available, and how to use them properly, we highly encourage you to read the cluster queues wiki page.
"},{"location":"hpc-tutorial/episode-3/","title":"First Steps: Episode 3","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm?In this episode we will discuss how we can parallelize steps in a pipeline that are not dependent on each other. In the last episode we saw a case (the variant calling) that could have been potentially parallelized.
We will take care of that today. Please note that we are not going to use the sbatch
command we learned earlier. Thus, this tutorial will run on the same node where you execute the script. We will introduce you to Snakemake, a tool with which we can model dependencies and run things in parallel. In the next tutorial we will learn how to submit the jobs with sbatch
and Snakemake combined.
For those who know make
already, Snakemake will be familiar. You can think of a Snakemake workflow as a bunch of dedicated bash scripts that you can make dependent on each other. Snakemake will start the next script when a previous one finishes, and it will run things in parallel if the dependencies allow.
Snakemake can get confusing, especially if the project gets big. This tutorial will only cover the very basics of this powerful tool. For more, we highly recommend digging into the Snakemake documentation:
Every Snakemake run requires a Snakefile
file. Create a new folder inside your tutorial folder and copy the skeleton:
(first-steps) $ mkdir -p /data/cephfs-1/home/users/${USER}/work/tutorial/episode3\n(first-steps) $ pushd /data/cephfs-1/home/users/${USER}/work/tutorial/episode3\n(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/Snakefile .\n(first-steps) $ chmod u+w Snakefile\n
Your Snakefile
should look as follows:
rule all:\n input:\n 'snps/test.vcf',\n 'structural_variants/test.vcf'\n\nrule alignment:\n input:\n '/data/cephfs-1/work/projects/cubit/tutorial/input/test_R1.fq.gz',\n '/data/cephfs-1/work/projects/cubit/tutorial/input/test_R2.fq.gz',\n output:\n bam='alignment/test.bam',\n bai='alignment/test.bam.bai',\n shell:\n r\"\"\"\n export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp\n mkdir -p ${{TMPDIR}}\n\n BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n ${{BWAREF}} \\\n {input} \\\n | samtools view -b \\\n | samtools sort -O BAM -T ${{TMPDIR}} -o {output.bam}\n\n samtools index {output.bam}\n \"\"\"\n\nrule structural_variants:\n input:\n 'alignment/test.bam'\n output:\n 'structural_variants/test.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n delly call -o {output} -g ${{REF}} {input}\n \"\"\"\n\nrule snps:\n input:\n 'alignment/test.bam'\n output:\n 'snps/test.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n gatk HaplotypeCaller \\\n -R ${{REF}} \\\n -I {input} \\\n -ploidy 2 \\\n -O {output}\n \"\"\"\n
Let me explain. The Snakefile contains the same steps that we took in the previous episodes. Although every step has its own rule (alignment, SNP calling, structural variant calling), we could instead have written everything in one rule. It is up to you to design your rules! Note that the rule names are arbitrary and are not referenced anywhere else in the file.
But there is one primary rule: the rule all
. This is the kickoff rule that makes everything run.
As you might have noticed, every rule has three main parameters: input
, output
and shell
. input
defines the files that are going into the rule, output
those that are produced when executing the rule, and shell
is the bash script that processes input
to produce output
.
Rule all
does not have any output
or shell
, it uses input
to start the chain of rules. Note that the input files of this rule are the output files of rule snps
and structural_variants
. The input of those rules is the output of rule alignment
. This is how Snakemake processes the rules: It looks for rule all
(or a rule that just has input
files) and figures out how it can create the required input files with other rules by looking at their output
files (the input
files of one rule must be the output
files of another rule). In our case it traces the workflow back to rule snps
and structural_variants
as they have the matching output files. They depend in return on the alignment, so the alignment
rule must be executed, and this is the first thing that will be done by Snakemake.
There are also some peculiarities about Snakemake:
You can name the files in input or output, as is done in rule alignment with the output files.
You can refer to the input and output files in the script by writing {input} or {output}. Named files are accessed by their name, e.g., {output.bam}. Curly braces of bash variables must be doubled, i.e., write ${{VAR}} instead of ${VAR}, but this does not apply to Snakemake internal variables like {input} or {output}.
In rule structural_variants we cheat a bit because delly does not produce output files if it can't find variants. This can be handled by touching (i.e., creating) the required output file. Snakemake has a function for doing so (call touch() on the filename).
But Snakemake can do more. It is able to parse the paths of the output files and set wildcards if you want. For this your input (and output) file names have to follow a parsable scheme. In our case they do! Our FASTQ files, our only initial input files, start with test
. The output of the alignment as well as the variant calling is also prefixed test
. We now can modify the Snakemake file accordingly, by exchanging every occurrence of test
in each input
or output
field with {id}
(note that you could also give a different name for your variable). Only rule all should not be touched, otherwise Snakemake would not know which value this variable should have. Your Snakefile
should look now like this:
rule all:\n input:\n 'snps/test.vcf',\n 'structural_variants/test.vcf'\n\nrule alignment:\n input:\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R1.fq.gz',\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R2.fq.gz',\n output:\n bam='alignment/{id}.bam',\n bai='alignment/{id}.bam.bai',\n shell:\n r\"\"\"\n export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp\n mkdir -p ${{TMPDIR}}\n\n BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n ${{BWAREF}} \\\n {input} \\\n | samtools view -b \\\n | samtools sort -O BAM -T ${{TMPDIR}} -o {output.bam}\n\n samtools index {output.bam}\n \"\"\"\n\nrule structural_variants:\n input:\n 'alignment/{id}.bam'\n output:\n 'structural_variants/{id}.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n delly call -o {output} -g ${{REF}} {input}\n \"\"\"\n\nrule snps:\n input:\n 'alignment/{id}.bam'\n output:\n 'snps/{id}.vcf'\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n gatk HaplotypeCaller \\\n -R ${{REF}} \\\n -I {input} \\\n -ploidy 2 \\\n -O {output}\n \"\"\"\n
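To see how a concrete file name requested by rule all fixes the value of {id}, consider the following Python sketch. It mimics the idea of wildcard matching under the assumption that a wildcard matches one or more arbitrary characters. The helper names wildcard_regex and match_wildcards are made up for this illustration and are not part of the Snakemake API.

```python
import re
from string import Formatter


def wildcard_regex(pattern):
    '''Turn a pattern such as snps/{id}.vcf into a compiled regular expression.'''
    regex = ''
    for literal, field, _spec, _conversion in Formatter().parse(pattern):
        # Literal text must match exactly
        regex += re.escape(literal)
        if field is not None:
            # Each wildcard becomes a named group matching at least one character
            regex += '(?P<' + field + '>.+)'
    return re.compile(regex)


def match_wildcards(pattern, path):
    '''Return the wildcard values if path fits pattern, otherwise None.'''
    match = wildcard_regex(pattern).fullmatch(path)
    return match.groupdict() if match else None


print(match_wildcards('snps/{id}.vcf', 'snps/test.vcf'))
print(match_wildcards('alignment/{id}.bam', 'alignment/test.bam'))
print(match_wildcards('snps/{id}.vcf', 'alignment/test.bam'))
```

The first two calls yield the value test for id, while the last one does not match. This is why rule all has to keep the concrete file names: they are the only place where the value of the wildcard is defined.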
Before we finally run this, we can make a dry run. Snakemake will show you what it would do:
(first-steps) $ snakemake -n\n
If everything looks green, you can run it for real. We provide it two cores to allow two single-threaded jobs to be run simultaneously:
(first-steps) $ snakemake -j 2\n
"},{"location":"hpc-tutorial/episode-4/","title":"First Steps: Episode 4","text":"Episode Topic 0 How can I install the tools? 1 How can I use the static data? 2 How can I distribute my jobs on the cluster (Slurm)? 3 How can I organize my jobs with Snakemake? 4 How can I combine Snakemake and Slurm? In the last episodes we learned about distributing a job among the cluster nodes using sbatch
and how to automate and parallelize our pipeline with Snakemake. We are lucky that those two powerful commands can be combined. What is the result? You will have an automated pipeline with Snakemake that uses sbatch
to distribute jobs among the cluster nodes instead of running everything on the same node.
The best thing is that we can reuse our Snakefile
as it is and just write a wrapper script to call Snakemake. We run the script and the magic will start.
First, create a new folder for this episode:
(first-steps) $ mkdir -p /data/cephfs-1/home/users/${USER}/work/tutorial/episode4/logs\n(first-steps) $ pushd /data/cephfs-1/home/users/${USER}/work/tutorial/episode4\n
And copy the wrapper script to this folder as well as the Snakefile (you can also reuse the one with the adjustments from the previous episode):
(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_snakejob.sh .\n(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/Snakefile .\n(first-steps) $ chmod u+w submit_snakejob.sh Snakefile\n
The Snakefile
is already known to you but let me explain the wrapper script submit_snakejob.sh
:
#!/bin/bash\n\n# Set a name for the job (-J or --job-name).\n#SBATCH --job-name=tutorial\n\n# Set the file to write the stdout and stderr to (if -e is not set; -o or --output).\n#SBATCH --output=logs/%x-%j.log\n\n# Set the number of cores (-c or --cpus-per-task).\n#SBATCH --cpus-per-task=2\n\n# Force allocation of the two cores on ONE node.\n#SBATCH --nodes=1\n\n# Set the total memory. Units can be given in T|G|M|K.\n#SBATCH --mem=1G\n\n# Optionally, set the partition to be used (-p or --partition).\n#SBATCH --partition=medium\n\n# Set the expected running time of your job (-t or --time).\n# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS\n#SBATCH --time=30:00\n\n\nexport TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp\nexport LOGDIR=logs/${SLURM_JOB_NAME}-${SLURM_JOB_ID}\nmkdir -p $LOGDIR\n\neval \"$($(which conda) shell.bash hook)\"\nconda activate first-steps\n\nset -x\n\nsnakemake --profile=cubi-v1 -j 2 -k -p --restart-times=2\n
In the beginning you see the #SBATCH
lines that introduce the parameters when you provide this script to sbatch
as described in the second episode. Please make sure that the logs
folder exists before starting the run! We then set and export the TMPDIR
and LOGDIR
variables. Note that LOGDIR
has a subfolder named $SLURM_JOB_NAME-$SLURM_JOB_ID
that will be created for you. Snakemake will store its logfiles for this very Snakemake run in this folder. The next new thing is set -x
. This simply prints every command that is executed within the script (for a batch job, the output ends up in the job's log file). This is useful for debugging.
Finally, the Snakemake call takes place. With the --profile
option we define that Snakemake uses the Snakemake profile at /etc/xdg/snakemake/cubi-v1
. The profile will create appropriate calls to sbatch
and interpret the following settings from your Snakemake rule:
threads: the number of threads to execute the job on
memory, given with one of the unit suffixes k, M, G, or T. You can specify EITHER
resources.mem/resources.mem_mb: the memory to allocate for the whole job, OR
resources.mem_per_thread: the memory to allocate for each thread.
resources.time: the running time of the rule, in a syntax supported by Slurm, e.g. HH:MM:SS or D-HH:MM:SS
resources.partition: the partition to submit your job into (Slurm will pick a fitting partition for you by default)
resources.nodes: the number of nodes to schedule your job on (defaults to 1 and you will want to keep that value unless you want to use MPI)
The other options to snakemake
have the meaning:
-j 2: run at most two jobs at the same time
-k: keep going even if a rule execution fails
-p: print the executed shell commands
--restart-times=2: restart failing jobs up to two times
It is now time to update your Snakefile
such that it actually specifies the resources mentioned above:
rule all:\n input:\n 'snps/test.vcf',\n 'structural_variants/test.vcf'\n\nrule alignment:\n input:\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R1.fq.gz',\n '/data/cephfs-1/work/projects/cubit/tutorial/input/{id}_R2.fq.gz',\n output:\n bam='alignment/{id}.bam',\n bai='alignment/{id}.bam.bai',\n threads: 8\n resources:\n mem='8G',\n time='12:00:00',\n shell:\n r\"\"\"\n export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp\n mkdir -p ${{TMPDIR}}\n\n BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n bwa mem -t 8 \\\n -R \"@RG\\tID:FLOWCELL.LANE\\tPL:ILLUMINA\\tLB:test\\tSM:PA01\" \\\n ${{BWAREF}} \\\n {input} \\\n | samtools view -b \\\n | samtools sort -O BAM -T ${{TMPDIR}} -o {output.bam}\n\n samtools index {output.bam}\n \"\"\"\n\nrule structural_variants:\n input:\n 'alignment/{id}.bam'\n output:\n 'structural_variants/{id}.vcf'\n threads: 1\n resources:\n mem='4G',\n time='2-00:00:00',\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n delly call -o {output} -g ${{REF}} {input}\n \"\"\"\n\ndef snps_mem(wildcards, attempt):\n mem = 2 * attempt\n return '%dG' % mem\n\nrule snps:\n input:\n 'alignment/{id}.bam'\n output:\n 'snps/{id}.vcf'\n threads: 1\n resources:\n mem=snps_mem,\n time='04:00:00',\n shell:\n r\"\"\"\n REF=/data/cephfs-1/work/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta\n\n gatk HaplotypeCaller \\\n -R ${{REF}} \\\n -I {input} \\\n -ploidy 2 \\\n -O {output}\n \"\"\"\n
We thus configure the resource consumption of the rules as follows:
alignment: 8 threads and up to 8 GB of memory in total, with a running time of up to 12 hours,
structural_variants: one thread and up to 4 GB of memory, with a running time of up to 2 days,
snps: one thread and a running time of up to four hours. Instead of passing a static amount of memory, we pass a resource callable. The attempt parameter will be passed a value of 1 on the initial invocation. If variant calling with the GATK HaplotypeCaller fails then it will retry and attempt will have an incremented value on each invocation (2 on the first retry and so on). Thus, with --restart-times=2, we try to do small variant calling with 2, 4, and 6 GB.
Finally, run the script:
(first-steps) $ sbatch submit_snakejob.sh\n
If you watch squeue --me
now, you will see that the jobs are distributed to the system:
(first-steps) $ squeue --me\n
Please refer to the Snakemake documentation for more details on using Snakemake, in particular on how to use the cluster configuration to specify resource requirements on a per-rule basis.
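To make the retry behaviour of the snps rule concrete, the following Python sketch evaluates the snps_mem callable from the Snakefile for each attempt. It assumes that attempt counts from 1 and that --restart-times=2 allows two restarts after the initial attempt. The constant RESTART_TIMES and the list requests exist only for this illustration.

```python
def snps_mem(wildcards, attempt):
    # Same callable as in the Snakefile: 2 GB more memory with every attempt
    mem = 2 * attempt
    return '%dG' % mem


# Value passed as --restart-times in submit_snakejob.sh
RESTART_TIMES = 2

# One initial attempt plus RESTART_TIMES restarts
requests = [snps_mem(None, attempt) for attempt in range(1, RESTART_TIMES + 2)]
print(requests)
```

Under these assumptions the rule requests 2G, 4G and finally 6G. A larger --restart-times value would extend the sequence accordingly.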
"},{"location":"misc/external-resources/","title":"External Resources","text":""},{"location":"misc/external-resources/#basic-linux","title":"Basic Linux","text":"The BIH HPC uses CentOS Linux. A basic understanding of Linux is required. Even better, you should already have intermediate to advanced Linux/Unix skills.
BIH HPC IT cannot provide you with basic Unix training. Please ask your home organization (e.g., Charite or MDC) to provide you with basic Linux training.
That said, here are some resources that we find useful:
"},{"location":"misc/external-resources/#internet-tutorials","title":"Internet Tutorials","text":"There is a large number of Linux tutorials online including:
GOBLET has a number of Bioinformatics-focused tutorials. This includes
Some software is provided by HPC Administration based on the criteria that it is:
Currently, this includes:
On the GPU node, this also includes a recent NVIDIA CUDA version.
To see which software is available, use module avail
on a compute node (this will not work on login nodes):
$ module avail\n--------------------- /opt/local/modules ---------------------\ncmake/3.11.0-0 llvm/6.0.0-0\ngcc/7.2.0-0 openmpi/4.0.3-0\n
To load software, use module load
. This will adjust the environment variables accordingly, in particular update PATH
such that the executables are available.
$ which gcc\n/bin/gcc\n$ module load gcc/7.2.0-0\n$ which gcc\n/opt/local/gcc-7.2.0-0/bin/gcc\n
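The effect of prepending a directory to PATH can be reproduced with a small Python sketch. It creates two temporary directories that each contain a dummy executable named gcc and then looks the name up with the standard library function shutil.which. The directories and the helper make_fake_tool are made up for this illustration. The actual module command may change further environment variables.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory, name):
    '''Create an empty executable shell script and return its path.'''
    path = os.path.join(directory, name)
    with open(path, 'w') as handle:
        print('#!/bin/sh', file=handle)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as system_bin, \
        tempfile.TemporaryDirectory() as module_bin:
    system_gcc = make_fake_tool(system_bin, 'gcc')
    module_gcc = make_fake_tool(module_bin, 'gcc')

    # Before loading: only the system directory is on the search path
    search_path = system_bin
    before = shutil.which('gcc', path=search_path)

    # After loading: the module directory is put in front of the search path
    search_path = module_bin + os.pathsep + search_path
    after = shutil.which('gcc', path=search_path)

print(before == system_gcc, after == module_gcc)
```

The first match on the search path wins, so the executable from the module directory shadows the system one without removing it.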
Problems with executing module
?
See the corresponding FAQ entry if you get the error -bash: module: command not found
when calling module
.
The BIH Cluster is a valuable resource. It has been used to support the publications listed below.
Hollunder, B., Ostrem, J.L., Sahin, I.A., Rajamani, N., Oxenford, S., Butenko, K., Neudorfer, C., Reinhardt, P., Zvarova, P., Polosan, M., Akram, H., Vissani, M., Zhang, C., Sun, B., Navratil, P., Reich, M.M., Volkmann, J., Yeh, F.-C., Baldermann, J.C., Dembek, T.A., Visser-Vandewalle, V., Alho, E.J.L., Franceschini, P.R., Nanda, P., Finke, C., K\u00fchn, A.A., Dougherty, D.D., Richardson, R.M., Bergman, H., DeLong, M.R., Mazzoni, A., Romito, L.M., Tyagi, H., Zrinzo, L., Joyce, E.M., Chabardes, S., Starr, P.A., Li, N., Horn, A., 2024. Mapping dysfunctional circuits in the frontal cortex using deep brain stimulation. Nat. Neurosci. 1\u201314. doi: 10.1038/s41593-024-01570-1
"},{"location":"misc/publication-list/#2022","title":"2022","text":"Kossen T, Hirzel MA, Madai VI, Boenisch F, Hennemuth A, Hildebrand K, Pokutta S, Sharma K, Hilbert A, Sobesky J, Galinovic I, Khalil AA, Fiebach JB and Frey D. Toward Sharing Brain Images: Differentially Private TOF-MRA Images With Segmentation Labels Using Generative Adversarial Networks. Frontiers in Artificial Intelligence. 5 (2022). issn: 2624-8212. doi: 10.3389/frai.2022.813842
"},{"location":"misc/publication-list/#2021","title":"2021","text":"Li, N., Hollunder, B., Baldermann, J. C., Kibleur, A., Treu, S., Akram, H., Al-Fatly, B., Strange, B. A., Barcia, J. A., Zrinzo, L., Joyce, E. M., Chabardes, S., Visser-Vandewalle, V., Polosan, M., Kuhn, J., K\u00fchn, A. A., & Horn, A. (2021). A Unified Functional Network Target for Deep Brain Stimulation in Obsessive-Compulsive Disorder. Biological Psychiatry. doi: 10.1016/j.biopsych.2021.04.006
Bressem KK, Vahldiek JL, Adams L, Niehues SM, Haibel H, Rodriguez VR, Torgutalp M, Protopopov M, Proft F, Rademacher J, Sieper J, Rudwaleit M, Hamm B, Makowski MR, Hermann KG, Poddubnyy D. Deep learning for detection of radiographic sacroiliitis: achieving expert-level performance. Arthritis Res Ther. 2021 Apr 8;23(1):106. doi: 10.1186/s13075-021-02484-0
Kossen T, Subramaniam P, Madai VI, Hennemuth A, Hildebrand K, Hilbert A, Sobesky J, Livne M, Galinovic I, Khalil AA, Fiebach JB, Frey D. Synthesizing anonymized and labeled TOF-MRA patches for brain vessel segmentation using generative adversarial networks. Computers in Biology and Medicine. 2021 Apr 131,104254. doi: 10.1016/j.compbiomed.2021.104254
Paraskevopoulou S., K\u00e4fer S., Zirkel F., Donath A., Petersen M., Liu S., Zhou X., Drosten C., Misof B., Junglen S. (2021). \"Viromics of extant insect orders unveil the evolution of the flavi-like superfamily.\" Virus Evolution 2021 Mar 30. doi: 10.1093/ve/veab030
Thomas Krannich, W Timothy J White, Sebastian Niehus, Guillaume Holley, Bjarni V Halld\u00f3rsson, Birte Kehr, Population-scale detection of non-reference sequence variants using colored de Bruijn graphs, Bioinformatics, 2021, btab749, doi: 10.1093/bioinformatics/btab749
Julia Markowski, Rieke Kempfer, Alexander Kukalev, Ibai Irastorza-Azcarate, Gesa Loof, Birte Kehr, Ana Pombo, Sven Rahmann, Roland F Schwarz, GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data, Bioinformatics, Volume 37, Issue 19, 1 October 2021, Pages 3128\u20133135. doi: 10.1093/bioinformatics/btab238
"},{"location":"misc/publication-list/#2020","title":"2020","text":"Kr\u00fctzfeldt LM, Schubach M, Kircher M. The impact of different negative training data on regulatory sequence predictions. PLoS One. 2020 Dec 1;15(12):e0237412. doi: 10.1371/journal.pone.0237412.
Klotz-Noack K, Klinger B, Rivera M, Bublitz N, Uhlitz F, Riemer P, L\u00fcthen M, Sell T, Kasack K, Gastl B, Ispasanie SSS, Simon T, Janssen N, Schwab M, Zuber J, Horst D, Bl\u00fcthgen N, Sch\u00e4fer R, Morkel M, Sers C. SFPQ Depletion Is Synthetically Lethal with BRAFV600E in Colorectal Cancer Cells. Cell Rep. 2020 Sep 22;32(12):108184. doi: 10.1016/j.celrep.2020.108184.
Kleinert, P., Martin, B., & Kircher, M. (2020). \"HemoMIPs\u2014Automated analysis and result reporting pipeline for targeted sequencing data.\" PLOS Computational Biology, 16(6), e1007956. doi: 10.1371/journal.pcbi.1007956
Ehmke, N.; Cusmano-Ozog, K.; Koenig, R.; Holtgrewe, M.; Nur, B.; Mihci, E.; Babcock, H.; Gonzaga-Jauregui, C.; Overton, J. D.; Xiao, J.; et al. Biallelic Variants in KYNU Cause a Multisystemic Syndrome with Hand Hyperphalangism. Bone 2020, 115219. doi: 10.1016/j.bone.2019.115219.
Niehus, S.; J\u00f3nsson, H.; Sch\u00f6nberger, J.; Bj\u00f6rnsson, E.; Beyter, D.; Eggertsson, H.P.; Sulem, P.; Stef\u00e1nsson, K.; Halld\u00f3rsson, B.V.; Kehr, B. PopDel identifies medium-size deletions jointly in tens of thousands of genomes. bioRxiv 2020, 10.1101/740225 doi: 10.1101/740225
Gordon, M. G., Inoue, F., Martin, B., Schubach, M., Agarwal, V., Whalen, S., ... & Kreimer, A. (2020). \"lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements.\" Nature Protocols, 15(8), 2387-2412. doi: 10.1038/s41596-020-0333-5
Paraskevopoulou S., Pirzer F., Goldmann N., Schmid J., Corman V.M., Gottula L.T., Schroeder S., Rasche A., Muth D., Drexler J.F., Heni A.C., Eibner G.J., Page R.A., Jones T.C., M\u00fcller M.A., Sommer S., Glebe D., and Drosten C. (2020). \"Mammalian deltavirus without hepadnavirus coinfection in the neotropical rodent Proechimys semispinosus.\" Proceedings of the National Academy of Sciences 2020 Jul 28;117(30):17977-17983. doi: 10.1073/pnas.2006750117.
"},{"location":"misc/publication-list/#2019","title":"2019","text":"Kircher, M., Xiong, C., Martin, B. et al. \"Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution.\" Nat Commun 10, 3583 (2019). doi: 10.1038/s41467-019-11526-w
Stefanovski L, Triebkorn P, Spiegler A, Diaz-Cortes M-A, Solodkin A, Jirsa V, McIntosh RA and Ritter P (2019). \"Linking Molecular Pathways and Large-Scale Computational Modeling to Assess Candidate Disease Mechanisms and Pharmacodynamics in Alzheimer's Disease.\" Front. Comput. Neurosci.. 13:54. doi: 10.3389/fncom.2019.00054
Boeddrich A., Babila J.T., Wiglenda T., Diez L., Jacob M., Nietfeld W., Huska M.R., Haenig C., Groenke N., Buntru A., Blanc E., Meier J.C., Vannoni E., Erck C., Friedrich B., Martens H., Neuendorf N., Schnoegl S., Wolfer DP., Loos M., Beule D., Andrade-Navarro M.A., Wanker E.E. (2019). \"The Anti-amyloid Compound DO1 Decreases Plaque Pathology and Neuroinflammation-Related Expression Changes in 5xFAD Transgenic Mice.\" Cell Chem Biol. 2019 Jan 17;26(1):109-120.e7. doi: 10.1016/j.chembiol.2018.10.013.
Fountain M.D., Oleson, D.S., Rech. M.E., Segebrecht, L., Hunter, J.V., McCarthy, J.M., Lupo, P.J., Holtgrewe, M., Mora, R., Rosenfeld, J.A., Isidor, B., Le Caignec, C., Saenz, M.S., Pedersen, R.C., Morgen, T.M., Pfotenhauer, J.P., Xia, F., Bi, W., Kang, S.-H.L., Patel, A., Krantz, I.D., Raible, S.E., Smith, W.E., Cristian, I., Tori, E., Juusola, J., Millan, F., Wentzensen, I.M., Person, R.E., K\u00fcry, S., B\u00e9zieau, S., Uguen, K., F\u00e9rec, C., Munnich, A., van Haelst, M., Lichtenbelt, K.D., van Gassen, K., Hagelstrom, T., Chawla, A., Perry, D.L., Taft, R.J., Jones, M., Masser-Frye, D., Dyment, D., Venkateswaran, S., Li, C., Escobar, L,.F., Horn, D., Spillmann, R.C., Pe\u00f1a, L., Wierzba, J., Strom, T.M. Parent, I. Kaiser, F.J., Ehmke, N., Schaaf, C.P. (2019). \"Pathogenic variants in USP7 cause a neurodevelopmental disorder with speech delays, altered behavior, and neurologic anomalies.\" Genet. Med. 2019 Jan 25. doi: 10.1038/s41436-019-0433-1
Holtgrewe,M., Messerschmidt,C., Nieminen,M. and Beule,D. (2019) DigestiFlow: from BCL to FASTQ with ease. Bioinformatics, 10.1093/bioinformatics/btz850.
K\u00e4fer S., Paraskevopoulou S., Zirkel F., Wieseke N., Donath A., Petersen M., Jones T.C., Liu S., Zhou X., Middendorf M., Junglen S., Misof B., Drosten C. (2019). \"Re-assessing the diversity of negative strand RNA viruses in insects.\" PLOS Pathogens 2019 Dec 12. doi: 10.1371/journal.ppat.1008224
K\u00fchnisch,J., Herbst,C., Al\u2010Wakeel\u2010Marquard,N., Dartsch,J., Holtgrewe,M., Baban,A., Mearini,G., Hardt,J., Kolokotronis,K., Gerull,B., et al. (2019) Targeted panel sequencing in pediatric primary cardiomyopathy supports a critical role of TNNI3. Clin Genet, 96, 549\u2013559. https://doi.org/10.1111/cge.13645
Marklewitz M., Dutari L.C., Paraskevopoulou S., Page R.A., Loaiza J.R., Junglen S. (2019). \"Diverse novel phleboviruses in sandflies from the Panama Canal area, Central Panama.\" Journal of General Virology 2019 May 3. doi: 10.1099/jgv.0.001260
Quade,A., Thiel,A., Kurth,I., Holtgrewe,M., Elbracht,M., Beule,D., Eggermann,K., Scholl,U.I. and H\u00e4usler,M. (2019) Paroxysmal tonic upgaze: A heterogeneous clinical condition responsive to carbonic anhydrase inhibition. European Journal of Paediatric Neurology, 10.1016/j.ejpn.2019.11.002.
"},{"location":"misc/publication-list/#2018","title":"2018","text":"Blanc, E., Holtgrewe, M., Dhamodaran, A., Messerschmidt, C., Willimsky, G., Blankenstein, T., Beule, D. (2018). \"Identification and Ranking of Recurrent Neo-Epitopes in Cancer\". bioRxiv. 2018/389437, 2018. doi: 10.1101/389437
Brandt, R., Uhlitz, F., Riemer, P., Giesecke, C., Schulze, S., El-Shimy, I.A., Fauler, B., Mielke, T., Mages, N., Herrmann, B.G., Sers, C., Bl\u00fcthgen, N., Morkel, M. (2018). \"Cell type-dependent differential activation of ERK by oncogenic KRAS or BRAF in the mouse intestinal epithelium\". bioRxiv. 2018/340844. doi: 10.1101/340844.
Holtgrewe, M., Knaus, A., Hildebrand, G., Pantel, J.-T., Rodriguesz de los Santos, M., Neveling, K., Goldmann, J., Schubach, M., J\u00e4ger, M., Couterier, M., Mundlos, S., Beule, D., Sperling, K., Krawitz, P. (2018). \"Multisite de novo mutations in human offspring after paternal exposure to ionizing radiation\", Nature Scientific Reports. 2018 Oct 2;8(1):14611. doi: 10.1038/s41598-018-33066-x.
Kircher M., Xiong C., Martin B, Schubach M, Inoue F, Bell R.JA., Costello J.F., Shendure J., Ahituv N. (2018). \"Saturation mutagenesis of disease-associated regulatory elements.\" bioRxiv (2018): 505362. doi: 10.1101/505362
PCAWG Transcriptome Core Group, Calabrese, C., Davidson, N.R., Fonseca, N.A., He, Y., Kahles, A., Lehmann, K.-V., Liu, F., Shiraishi, Y., Soulette, C.M., Urban, L., Demircio\u011flu, D., Greger, L., Li, S., Liu, D., Perry, M.D., Xiang, L., Zhang, F., Zhang, J., Bailey, P., Erkek, S., Hoadley, K.A., Hou, Y., Kilpinen, H., Korbel, J.O., Marin, M.G., Markowski, J., Nandi, T., Pan-Hammarstr\u00f6m, Q., Pedamallu, C.S., Siebert, R., Stark, S.G., Su, H., Tan, P., Waszak, S.M., Yung, C., Zhu, S., PCAWG Transcriptome Working Group, Awadalla, P., Creighton, C.J., Meyerson, M., Ouellette, B.F.F., Wu, K., Yang, H., ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Network, Brazma, A., Brooks, A.N., G\u00f6ke, J., R\u00e4tsch, G., Schwarz, R.F., Stegle, O., Zhang, Z. (2018). \"Genomic basis for RNA alterations revealed by whole-genome analyses of 27 cancer types\". bioRxiv. 2018/183889. doi: 10.1101/183889
Guneykaya D., Ivanov A., Hernandez D.P., Haage V., Wojtas B., Meyer N., Maricos M., Jordan P., Buonfiglioli A., Gielniewski B., Ochocka N., C\u00f6mert, C., Friedrich, C., Artiles, L. S., Kaminska, B., Mertins, P., Beule, D., Kettenmann, H. (2018). \"Transcriptional and translational differences of microglia from male and female brains\", Cell reports. 2018 Sep 4;24(10):2773-83. doi: 10.1016/j.celrep.2018.08.001.
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. (2018). \"CADD: predicting the deleteriousness of variants throughout the human genome\", Nucleic Acids Res. 2018 Oct 29. doi: 10.1093/nar/gky1016.
Salatzki J., Foryst-Ludwig A., Bentele K., Blumrich A., Smeir E., Ban Z., Brix S., Grune J., Beyhoff N., Klopfleisch R., Dunst S., Surma, M.A., Klose, C., Rothe, M., Heinzel, F.R., Krannich, A., Kershaw, E.E., Beule, D., Schulze, P.C., Marx, N., Kintscher, U. (2018). \"Adipose tissue ATGL modifies the cardiac lipidome in pressure-overload-induced left ventricular failure\", PLoS genetics. 2018 Jan 10;14(1):e1007171. doi: 10.1371/journal.pgen.100717.
Schubach M., Re M., Robinson P.N., Valentini G. (2017) \"Imbalance-aware machine learning for predicting rare and common disease-associated non-coding variants\", Scientific reports 7:1, 2959. doi: 10.1038/s41598-017-03011-5.
Schubert M., Klinge, B., Kl\u00fcnemann M., Sieber A., Uhlitz F., Sauer S., Garnett M., Bl\u00fcthgen N., Saez-Rodriguez J. (2018). \"Perturbation-response genes reveal signaling footprints in cancer gene expression\". Nature Communications. 9: 20, 2018. doi: 10.1038/s41467-017-02391-6
"},{"location":"misc/publication-list/#2017","title":"2017","text":"Euskirchen, P., Bielle, F., Labreche, K., Kloosterman, W.P., Rosenberg, S., Daniau, M., Schmitt, C., Masliah-Planchon, J., Bourdeaut, F., Dehais, C., et al. (2017). Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol 1\u201313. doi: 10.1007/s00401-017-1743-5
Euskirchen, P., Radke, J., Schmidt, M.S., Heuling, E.S., Kadikowski, E., Maricos, M., Knab, F., Grittner, U., Zerbe, N., Czabanka, M., et al. (2017). Cellular heterogeneity contributes to subtype-specific expression of ZEB1 in human glioblastoma. PLOS ONE 12, e0185376. doi: 10.1371/journal.pone.0185376
Mattei D., Ivanov A., Ferrai C., Jordan P., Guneykaya D., Buonfiglioli A., Schaafsma W., Przanowski P., Deuther-Conrad W., Brust P., Hesse S., Patt, M., Sabri, O., Ross, T.L., Eggen, B.J.L., Boddeke E.W.G.M., Kaminska, B., Beule, D., Pombo, A., Kettenmann, H., Wolf, S.A. (2017). \"Maternal immune activation results in complex microglial transcriptome signature in the adult offspring that is reversed by minocycline treatment.\" Translational psychiatry. 2017 May;7(5):e1120. doi: 10.1038/tp.2017.80.
Mamlouk, S., Childs, L. H., Aust, D., Heim, D., Melching, F., Oliveira, C., Wolf, T., Durek, P., Schumacher, D., Bl\u00e4ker, H., von Winterfeld, M., Gastl, B., M\u00f6hr, K., Menne, A., Zeugner, S., Redmer, T., Lenze, D., Tierling, S., M\u00f6bs, M., Weichert, W., Folprecht, G., Blanc, E., Beule, D., Sch\u00e4fer, R., Morkel, M., Klauschen, F., Leser, U. and Sers, C. (2017). \"DNA copy number changes define spatial patterns of heterogeneity in colorectal cancer\", Nature Communications. 2017; 8, p. 14093. doi: 10.1038/ncomms14093.
Messerschmidt, C., Holtgrewe, M. and Beule, D. (2017). \"HLA-MA: simple yet powerful matching of samples using HLA typing results\". Bioinformatics. 28, pp. 2592\u20132599. doi: 10.1093/bioinformatics/btx132.
Kammertoens, T., Friese, C., Arina, A., Idel, C., Briesemeister, D., Rothe, M., Ivanov, A., Szymborska, A., Patone, G., Kunz, S., Sommermeyer, D., Engels, B., Leisegang, M., Textor, A., Fehling, H. J., Fruttiger, M., Lohoff, M., Herrmann, A., Yu, H., Weichselbaum, R., Uckert, W., H\u00fcbner, N., Gerhardt, H., Beule, D., Schreiber, H. and Blankenstein, T. (2017). \"Tumour ischaemia by interferon-\u03b3 resembles physiological blood vessel regression\". Nature. 545(7652), pp. 98\u2013102. doi: 10.1038/nature22311.
Schulze Heuling, E., Knab, F., Radke, J., Eskilsson, E., Martinez-Ledesma, E., Koch, A., Czabanka, M., Dieterich, C., Verhaak, R.G., Harms, C., et al. (2017). Prognostic Relevance of Tumor Purity and Interaction with MGMT Methylation in Glioblastoma. Mol. Cancer Res. 15, 532\u2013540. doi: 10.1158/1541-7786.MCR-16-0322
Yaakov, G., Lerner, D., Bentele, K., Steinberger, J., Barkai, N., Bigger, J., Maisonneuve, E., Gerdes, K., Lewis, K., Dhar, N., McKinney, J. D., Gefen, O., Balaban, N. Q., Jayaraman, R., Balaban, N. Q., Merrin, J., Chait, R., Kowalik, L., Leibler, S., Balaban, N. Q., Allison, K. R., Brynildsen, M. P., Collins, J. J., Nathan, C., Lewis, K., Glickman, M. S., Sawyers, Knoechel, B., Welch, A. Z., Gibney, P. A., Botstein, D., Koshland, D. E., Levy, S. F., Ziv, N., Siegal, M. L., Stewart-Ornstein, J., Weissman, J. S., El-Samad, H., Gasch, A. P., Weinert, T., Hartwell, L., Weinert, T. A., Hartwell, L. H., Lisby, M., Rothstein, R., Mortensen, U. H., Lisby, M., Mortensen, U. H., Rothstein, R., Domkin, V., Thelander, L., Chabes, A., Hendry, J. A., Tan, G., Ou, J., Boone, C., Brown, G. W., Berry, D. B., Gasch, A. P., Lynch, M., Nishant, K. T., Serero, A., Jubin, C., Loeillet, S., Legoix-Ne, P., Nicolas, A. G., Huh, W. K., Janke, C., Lee, S. E., Blecher-Gonen, R., Martin, M., Cherry, J. M., McKenna, A., DePristo, M. A., Lawrence, M., Obenchain, V., Ye, K., Schulz, M. H., Long, Q., Apweiler, R., Ning, Z., Layer, R. M., Chiang, C., Quinlan, A. R., Hall, I. M., Faust, G. G., Hall, I. M., Boeva, V., Boeva, V., Li, H., Koren, A., Soifer, I. and Barkai, N. (2017). \"Coupling phenotypic persistence to DNA damage increases genetic diversity in severe stress\". Nature Ecology & Evolution. 1(1), pp. 497\u2013500. doi: 10.1038/s41559-016-0016.
Uhlitz, F., Sieber, A., Wyler, E., Fritsche-Guenther, R., Meisig, J., Landthaler, M., Klinger, B., Bl\u00fcthgen, N. (2017). \"An immediate-late gene expression module decodes ERK signal duration\". Molecular Systems Biology. 13: 928, 2017. doi: 10.15252/msb.20177554.
"},{"location":"misc/publication-list/#theses","title":"Theses","text":""},{"location":"misc/publication-list/#2019_1","title":"2019","text":"Schumann F. (2019). \"Establishing a pipeline for stable mutational signature detection and evaluation of variant filter effects\". Freie Universit\u00e4t Berlin. Bachelor Thesis, Bioinformatics.
"},{"location":"misc/publication-list/#2018_1","title":"2018","text":"Borgsm\u00fcller N. (2018). \"Optimization of data processing in GC-MS metabolomics\", Technische Universit\u00e4t Berlin. Master Thesis, Biotechnology.
Kuchenbecker, S.-L. (2018). \"Analysis of Antigen Receptor Repertoires Captured by High Throughput Sequencing\". Freie Universit\u00e4t Berlin. PhD Thesis, Dr. rer. nat. URN:NBN: urn:nbn:de:kobv:188-refubium-22171-8
Schubach M. (2018). \"Learning the Non-Coding Genome\", Freie Universit\u00e4t Berlin. PhD Thesis, Dr. rer. nat. URN:NBN: urn:nbn:de:kobv:188-refubium-23332-7
"},{"location":"misc/publication-list/#posters","title":"Posters","text":""},{"location":"misc/publication-list/#2018_2","title":"2018","text":"Roskosch, S., Hald\u00f3rsson B., Kehr, B. (2018). \"PopDel: Population-Scale Detection of Genomic Deletions\" ECCB 2018. Poster.
White T., Kehr B. (2018). \"Comprehensive extraction of structural variations from long-read DNA sequences\" WABI 2018. Poster.
"},{"location":"misc/publication-list/#2017_1","title":"2017","text":"Schubach M., Re R., Robinson P.N., Valentini G. (2017). \"Variant relevance prediction in extremely imbalanced training sets\" ISMB/ECCB 2017. Poster.
White T., Kehr B. (2017). \"Improving long-read mapping with simple lossy sequence transforms\" ISMB/ECCB 2017. Poster.
"},{"location":"ondemand/interactive/","title":"OnDemand: Interactive Sessions","text":"Interactive sessions allow you to start and manage selected apps. Depending on the app they run as servers or GUIs. Selecting My Interactive Sessions
in the top menu will direct you to the overview of currently running sessions. The left-hand panel provides a shortcut to start a new session of one of the provided apps.
Each running interactive session is listed. Each card corresponds to one session. The title of each card provides the name, allocated resources and the current status. Furthermore, detailed information and links are available:
Don't hit reload in your apps
Please note that the portal will use the authentication mechanisms of the apps to ensure that nobody except for you can connect to the session. This means that hitting the browser's \"reload\" button in your app will most likely not work.
Just go back to the interactive session list and click on the \"connect\" button.
"},{"location":"ondemand/interactive/#session-directories","title":"Session Directories","text":"The portal software will create a folder ondemand
in your home directory. Inside, it will create session directories for each started interactive job. For technical reasons, these folders have very long names, for example:
$HOME/ondemand/data/sys/dashboard/batch_connect/sys/ood-bih-rstudio-server/output/e40e03b3-11ca-458a-855b-98e6f148c99a/
This follows the pattern:
$HOME/ondemand/data/sys/dashboard/batch_connect/sys/${application name}/output/${job UUID}
The job identifier used is not the Slurm job ID but an identifier internal to OnDemand. Inside this directory you will find log files and a number of scripts that are used to start your job.
If you need to debug any interactive job, start here. Also, the helpdesk will need the path to this folder to help you with interactive jobs.
You can find the name of the latest output folder with the following command:
$ ls -lhtr $HOME/ondemand/data/sys/dashboard/batch_connect/sys/${application name}/output | tail -n 1\n
For example, for RStudio Server:
$ ls -lhtr $HOME/ondemand/data/sys/dashboard/batch_connect/sys/ood-bih-rstudio-server/output | tail -n 1\n
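The lookup above can also be wrapped in a small helper. The following is a minimal sketch, not part of the portal software: the function name `latest_session_dir` is made up for illustration, and it assumes the directory layout shown in the example path above.

```shell
# Print the most recently modified session directory of one OnDemand app.
# Usage: latest_session_dir APP_NAME [ONDEMAND_ROOT]
# (hypothetical helper; ONDEMAND_ROOT defaults to $HOME/ondemand)
latest_session_dir() {
    local app="$1"
    local root="${2:-$HOME/ondemand}"
    local output_dir="$root/data/sys/dashboard/batch_connect/sys/$app/output"
    # -d lists the directories themselves, -t sorts newest first.
    ls -1dt "$output_dir"/*/ 2>/dev/null | head -n 1
}
```

For RStudio Server you would call `latest_session_dir ood-bih-rstudio-server` and pass the printed path on to the helpdesk if needed.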
Prevent Home From Filling Up
You should probably move ~/ondemand
to your work volume with the following:
$ mv ~/ondemand ~/work/ondemand\n$ ln -sr ~/work/ondemand ~/ondemand\n
Make sure to delete any interactive sessions and to log out of the OnDemand portal first. Otherwise, the ~/ondemand
folder is constantly recreated and the symlink will instead be created inside this folder as ~/ondemand/ondemand
and thus not be used as intended.
Also, clear out ~/work/ondemand/*
from time to time but take care that you don't remove the directory of any running job.
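To decide what can be cleared out, it may help to list session directories that have not been touched for a while. The sketch below is only an illustration: the function name and the default age threshold of 30 days are assumptions. It prints candidates and deletes nothing, so you can still check the list against your running jobs.

```shell
# List OnDemand session directories not modified for more than DAYS days.
# Usage: list_stale_sessions ONDEMAND_ROOT [DAYS]
# (hypothetical helper; the default of 30 days is an arbitrary choice)
list_stale_sessions() {
    local root="$1"
    local days="${2:-30}"
    # Match directories directly below any ".../output/" folder, do not
    # descend into them (-prune), and print those older than the cutoff.
    find "$root" -type d -path '*/output/*' -prune -mtime +"$days" -print
}
```

A call such as `list_stale_sessions ~/work/ondemand 30` only prints paths; removing them remains a manual step.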
This description of starting an RStudio session is a showcase for starting other interactive apps as well.
To start the session, please go to Interactive Apps
in the top menu bar and select RStudio Server
or click RStudio Server
in the left-hand panel.
Allocate appropriate resources and click Launch
.
An info card for the RStudio Server will be added to My Interactive Sessions
, and during start, it will change its state from Queued
to Starting
to Running
. Depending on the app, resources allocated and current cluster usage, this will take a couple of seconds.
When in the final state (Running
), one can directly connect to the RStudio Server to get an interactive session by clicking Connect to RStudio Server
:
To use the OnDemand portal with a specific R installation including a stable set of custom packages, you can use a conda environment from the cluster as an R source.
For this, you may first need to create this conda environment including your R version of choice and all necessary packages. Specific installations of, e.g., Python from conda can be used similarly in other interactive apps.
channels:\n - conda-forge\n - bioconda\n - defaults\ndependencies:\n - r-base\n - r-essentials\n - r-devtools\n - bioconductor-deseq2\n - r-tidyverse\n - r-rmarkdown\n - r-knitr\n - r-dt\n
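One way to turn this definition into an environment is sketched below. The helper name `write_r_env_file`, the file name, and the environment name `r-ondemand` are placeholders chosen for illustration; the `conda env create` call is shown as a comment because it needs a working conda installation.

```shell
# Write the environment definition shown above to the given file.
# (hypothetical helper; file and environment names are placeholders)
write_r_env_file() {
    cat > "$1" <<'EOF'
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - r-base
  - r-essentials
  - r-devtools
  - bioconductor-deseq2
  - r-tidyverse
  - r-rmarkdown
  - r-knitr
  - r-dt
EOF
}

# Typical use on the cluster:
#   write_r_env_file r-ondemand.yaml
#   conda env create --name r-ondemand --file r-ondemand.yaml
```

Keeping the definition in a file makes it easy to rebuild the same environment later or to share it with colleagues.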
Some packages (e.g., several single-cell RNA-seq analysis tools) are only available from GitHub and not on CRAN/Bioconductor. There are two ways to install such packages into a conda environment.
1) Install from inside R \\[easier option, but not pure conda\\]

* First set up the conda env, ideally including all dependencies for the desired package from GitHub (and do include r-devtools).
* Then, within R, run `devtools::install_github('owner/repo', dependencies=F, upgrade=F, lib='/path/to/conda/env-name/lib/R/library')`.
    * If you don't have all dependencies already installed, you will have to omit `dependencies=F` and risk a mix of conda and native R installed packages (or redo the conda env).
    * `install_github` involves a build process and still needs a bit of memory, so this might crash on the default `srun --pty bash -i` shell.

2) Build packages into a local conda channel \\[takes longer, but pure conda\\]

This approach is mostly taken from the answers given [here](https://stackoverflow.com/questions/52061664/install-r-package-from-github-using-conda). These steps must be taken _before_ building the final env used with RStudio.

* Use `conda skeleton cran https://github.com/owner/repo [--git-tag vX.Y]` to generate build files.
    * conda skeleton only works for repositories with a release/version tag. If the package you want to install does not have one, you either need to create a fork and add such a tag, or find a fork that already did that. Downloading the code directly from GitHub and building the package from that is also possible, but you will then need to manually set up the `meta.yaml` and `build.sh` files that conda skeleton would create.
    * If there is more than one release tag, do specify which one you want; it may not automatically take the most recent one.
    * If any R packages from Bioconductor are dependencies, conda will not find them during the build process. You will need to change the respective entries in the `meta.yaml` file created by conda skeleton. For example, change `r-deseq2` to `bioconductor-deseq2`.
* Build the package with `conda build --R= [--use-local] r-`.
    * You need to specify the same R version used in the final conda env.
    * If the GitHub package has additional dependencies from GitHub, build those first and then add `--use-local` so the build process can find them.
    * The build process definitely needs more memory than the default `srun --pty bash -i` shell. It also takes quite a bit of time (much longer than installing through `devtools::install_github`).
* Finally, add the packages (+versions) you built to the environment definition (i.e., the yaml file) and create the (final) conda environment. Don't forget to tell conda to use locally built packages (either supply `--use-local` or add `- local` to the channel list in the yaml file).

Starting the RStudio session via the OnDemand portal works almost as described above (see Example 1). However, you do have to select `miniconda` as R source and provide the path to your miniconda installation and (separated by a colon) the name of the (newly created) conda environment you want to use.
Additional notes:
.libPaths()
entries and therefore a link to your previous conda installation. Creating a new project cleans .libPaths()
to only the env specified in setting up the RStudio session.
Status / Stability
OnDemand Support is currently in beta phase on the BIH HPC. In case of any issues, please send an email to hpc-helpdesk@bih-charite.de.
To allow for better interactive work, BIH HPC administration has set up an Open OnDemand (OOD) portal web server.
You can find the OnDemand Portal for HPC 4 Research at:
OOD allows you to access cluster resources using a web-based graphical interface in addition to traditional SSH connections. You can then connect to jobs running graphical applications either to virtual desktops (such as Matlab) or to web apps (such as Jupyter and RStudio Server).
The following figure illustrates this.
The primary way to access the cluster continues to be SSH, which has several advantages. Because the cluster is based on Linux servers, this \"native\" access offers more features and, through its lower complexity, higher stability. However, we all like to have the option of a graphical interface, at least from time to time.
The main features are:
The first prerequisite is to have a cluster account already (see Getting Access). Once you have done your first SSH connection to the cluster successfully you can start using the portal. For this you perform the following steps:
_c
) then please use the \"Charit\u00e9 - Universit\u00e4tsmedizin Berlin\" button; for MDC accounts, please use the \"Max Delbr\u00fcck Center Berlin\" button.
Clicked the Wrong Login Button?
If you clicked the wrong button then please clear your cookies to force a logout of the system.
"},{"location":"ondemand/overview/#prepare-ondemand-folder","title":"Prepare OnDemand Folder","text":"The ondemand
folder is automatically created in your home directory, and the OnDemand service searches for this folder in your home directory, i.e. it has to stay there. But as the quota in the home directory is very limited, you can easily hit the hard quota which might prevent you from working on the cluster.
To prevent this, move the ~/ondemand
folder to the ~/work
folder and create a symlink for the now dislocated ~/ondemand
folder:
hpc-login-1:~$ mv ~/ondemand ~/work/ondemand\nhpc-login-1:~$ ln -sr ~/work/ondemand ~/ondemand\n
Important
Make sure to delete any interactive sessions and to log out of the OnDemand portal first. Otherwise, the ~/ondemand
folder is constantly recreated and the symlink will instead be created inside this folder as ~/ondemand/ondemand
and thus not be used as intended.
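The move can be made a little safer by checking the current state first. The following is a minimal sketch under the assumption that GNU coreutils (`ln -sr`) is available, as in the commands above; the function name `relocate_ondemand` is hypothetical.

```shell
# Move HOME_DIR/ondemand to WORK_DIR/ondemand and leave a relative symlink.
# Usage: relocate_ondemand HOME_DIR WORK_DIR   (hypothetical helper)
relocate_ondemand() {
    local src="$1/ondemand"
    local dst="$2/ondemand"

    # Nothing to do if the folder has already been replaced by a symlink.
    if [ -L "$src" ]; then
        echo "already a symlink: $src" >&2
        return 0
    fi
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    # Never overwrite an existing target in the work volume.
    if [ -e "$dst" ]; then
        echo "target exists, refusing to overwrite: $dst" >&2
        return 1
    fi
    mv "$src" "$dst" && ln -sr "$dst" "$src"
}
```

On the cluster this would be called as `relocate_ondemand "$HOME" "$HOME/work"`, after logging out of the portal as described above.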
Problems with Open OnDemand?
First try to log out and log in again. Next, try to clear all cookies for the domain hpc-portal.cubi.bihealth.org
. Finally, try the Help > Restart Web Server
link to restart the per-user nginx (PUN) server.
You will then be redirected to the dashboard screen.
Here you have access to the following actions. We will not go into detail of all of them and expect them to be self-explanatory.
Important
Please note that when using the portal, you are acting as your HPC user. Follow standard best practices and consider carefully what you do, just as you would on the command line (e.g., don't use the portal to browse the web from the cluster).
Outdated
This document is only valid for the old, third-generation file system and will be removed soon. Quotas of our new CephFS storage are communicated via the HPC Access web portal.
Accessing the quota report by selecting Files
and then Quotas
in the top menu will provide you with a detailed list of all quotas for directories that you are assigned to.
There are two types of quotas: for (a) size of and (b) number of files in a directory.
Every row in the table corresponds to a directory that you have access to. This includes your home directory (fast/users
) as well as the group directory of your lab (fast/groups
) and possible projects (fast/projects
) (if any). Quotas are not imposed directly on these directories but on the home
, scratch
and work
subdirectories that each of the aforementioned directories contains (for a detailed explanation, see Storage and Volumes).
The following list explains the columns of the table:
/
) and substituting the underscores with a slash in the (users|groups|projects)_
and _(home|scratch|work)
substring. The corresponding path for name fast/users_stolpeo_c_home
would be /fast/users/stolpeo_c/home
.
BIH HPC IT provides access to high-performance compute (HPC) cluster systems. A cluster system bundles a large number of nodes and, in the case of HPC, the focus is on performance (in contrast to high-availability clusters).
"},{"location":"overview/architecture/#hpc-4-research","title":"HPC 4 Research","text":""},{"location":"overview/architecture/#cluster-hardware","title":"Cluster Hardware","text":"Users don't connect to nodes directly but rather create interactive or batch jobs to be executed by the cluster job scheduler Slurm.
As common with HPC systems, users cannot directly access the compute nodes but rather connect to so-called head nodes. The BIH HPC system provides the following head nodes:
login-1
and login-2
that accept SSH connections and are meant for low-intensity, interactive work such as editing files, running screen/tmux sessions, and logging into the compute nodes. Users should run no computational tasks and no large-scale data transfer on these nodes.
transfer-1
and transfer-2
also accept SSH connections. Users should run all large-scale data transfer through these nodes.
After registration and client configuration, users will typically connect to the HPC system through the login nodes:
local:~$ ssh -l jdoe_c hpc-login-1.cubi.bihealth.org\nhpc-login-1:~$\n
Subsequently, they might submit batch jobs to the cluster for execution through the Slurm scheduling system or open interactive sessions:
hpc-login-1:~$ sbatch job_script.sh\nhpc-login-1:~$ srun --pty bash -i\nmed0104:~$\n
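For orientation, a `job_script.sh` as submitted above could look like the following minimal sketch. The resource values and the job name are arbitrary assumptions chosen for illustration, not recommended settings.

```shell
#!/bin/bash
# Minimal batch script sketch; all resource values are illustrative.
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=01:00:00
#SBATCH --output=example-%j.log

# Slurm sets SLURM_JOB_ID inside a job; fall back to a marker elsewhere.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_id} started on ${HOSTNAME:-unknown host}"
```

The `#SBATCH` lines are ordinary shell comments, so the script also runs outside of Slurm, which can be handy for a quick syntax check before submitting.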
"},{"location":"overview/for-the-impatient/","title":"Overview","text":""},{"location":"overview/for-the-impatient/#bih-hpc-4-research","title":"BIH HPC 4 Research","text":"BIH HPC 4 Research is located in the BIH data center in Buch and connected via the BIH research network. Connections can be made from Charite, MDC, and BIH networks. The cluster is open for users with either Charite or MDC accounts after getting access through the gatekeeper proces. The system has been designed to be suitable for the processing of human genetics data from research contexts (and of course data without data privacy concerns such as public and mouse data).
"},{"location":"overview/for-the-impatient/#cluster-hardware-and-scheduling","title":"Cluster Hardware and Scheduling","text":"The cluster consists of the following major components:
hpc-login-1
and hpc-login-2
(for interactive sessions only),hpc-transfer-1
and hpc-transfer-2
,hpc-cpu-{1..228}
hpc-mem-{1..5}
,hpc-gpu-{1..7}
and 1 node with 10x A40 GPUs (!) hpc-gpu-8
,/fast
,
This is shown by the following picture:
"},{"location":"overview/for-the-impatient/#differences-between-workstations-and-clusters","title":"Differences Between Workstations and Clusters","text":"The differences include:
srun
to go to a compute node.srun
to go to a compute node you might end up on a different host./tmp
./fast
directory is shared throughout the cluster which contains your home, group home, and project directories.root
or sudo
permissions on the cluster.sbatch
) over calling programs interactively.
NB: the following might sound a bit harsh but is written with everyone's best intentions in mind (we actually like you, our user!). It addresses a lot of suboptimal (yet not dangerous, of course) habits we have observed among our users.
IT IS
IT IS NOT
sudo
.
Once you are logged into the cluster through the login nodes, you need to use the Slurm scheduler to submit computing jobs. In Slurm nomenclature, cluster compute nodes are assigned to one or more partitions. Submitted jobs are assigned to nodes according to the partition's configuration.
"},{"location":"overview/job-scheduler/#partitions","title":"Partitions","text":"The BIH HPC has the partitions described below. The cluster focuses on life science applications and not \"classic HPC\" with numerical computations using MPI. Thus, all partitions except for mpi
only allow reserving resources on a single node. This makes the cluster easier to use as users don't have to explicitly specify this limit when submitting their jobs.
standard
","text":"Jobs are submitted to the standard
partition by default. From there, the scheduler will route the jobs to their actual partition using the routing rule set described below. You can override this routing by explicitly assigning a partition (but this is discouraged).
gpu
queue.highmem
queue.debug
, short
, medium
, and long
depending on their configured maximum running time. The partitions are evaluated in the order given above and the first fitting partition will be used.
debug
","text":"This partition is for very short jobs that should be executed quickly, e.g., for tests. The job running time is limited to one hour and at most 128 cores can be used per user but the jobs are submitted with highest priority.
debug
--time 01:00:00
short
","text":"This partition is for jobs running only few hours. The priority of short jobs is high and many cores can be used at once to reward users for splitting their jobs into smaller parts.
short
--time 04:00:00
medium
","text":"This partition is for jobs running for multiple days. Users can only allocate the equivalent of 4 nodes.
medium
--time 7-00:00:00
long
","text":"This partition is for long-running tasks. Only one node can be reserved for so long to discourage really long-running jobs and encourage users for splitting their jobs into smaller parts.
long
--time 14-00:00:00
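The time-based part of the routing described for the standard partition can be illustrated with a small function. This is only a sketch of the rules as documented here (first fitting partition in the order debug, short, medium, long); the real routing happens inside Slurm, and the function name is made up. GPU and memory requests, which are routed separately, are ignored.

```shell
# Return the partition the documented routing rules would pick for a job
# with the given time limit in minutes (illustration only, not Slurm code).
route_partition() {
    local minutes="$1"
    if [ "$minutes" -le 60 ]; then
        echo debug            # up to 1 hour
    elif [ "$minutes" -le 240 ]; then
        echo short            # up to 4 hours
    elif [ "$minutes" -le 10080 ]; then
        echo medium           # up to 7 days
    elif [ "$minutes" -le 20160 ]; then
        echo long             # up to 14 days
    else
        echo "no time-based partition fits $minutes minutes" >&2
        return 1
    fi
}
```

For example, a job submitted with `--time 03:00:00` (180 minutes) would fall into `short` under these rules.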
gpu
","text":"Jobs requesting GPU resources are automatically assigned to the gpu
partition.
The GPU nodes are only part of the gpu
partition so they are not blocked by normal compute jobs. Maximum run time is relatively high (14 days) to allow for longer training jobs. Contact hpc-helpdesk@bih-charite.de for assistance if you have longer running jobs that you really cannot make run any shorter.
Info
Fair use rules apply. As GPU nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
gpu
$count
GPUs: -p gpu --gres=gpu:$card:$count
(card=tesla
or card=a40
), maximum run time: --time 14-00:00:00
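The `--gres` argument can be assembled with a small helper that also guards against misspelled card types. This is a sketch only; the function name `gpu_gres` is hypothetical and the accepted card names are the two listed above.

```shell
# Build the --gres argument for a GPU job.
# Usage: gpu_gres CARD [COUNT]   (hypothetical helper)
gpu_gres() {
    local card="$1"
    local count="${2:-1}"
    case "$card" in
        tesla|a40) ;;   # card types named in this section
        *)
            echo "unknown card type: $card" >&2
            return 1
            ;;
    esac
    echo "--gres=gpu:${card}:${count}"
}

# Example submission (not executed here):
#   sbatch -p gpu $(gpu_gres tesla 1) --time 1-00:00:00 job_script.sh
```

Requesting only as many GPUs as the job can actually use helps with the fair use rules mentioned above.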
highmem
","text":"Jobs requesting more than 200 GB of RAM are automatically routed to the highmem
partition.
The high memory nodes are only part of the highmem
partition so they are not blocked by normal compute jobs. Maximum run time is relatively high (14 days) to allow for longer jobs. Contact hpc-helpdesk@bih-charite.de for assistance if you have longer running jobs that you really cannot make run any shorter.
Info
Fair use rules apply. As high-memory nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. Be nice and cooperative with other users. Tip: getent passwd USER_NAME
will give you a user's contact details.
highmem
-p highmem
, maximum run time: --time 14-00:00:00
mpi
","text":"Jobs are not routed automatically to the mpi
partition but you have to explicitly request the partition. This is the only partition in which more than one node can be allocated to a job.
You can submit multi-node jobs into the mpi
partition. Maximum run time is relatively high (14 days) to allow for longer jobs. Don't abuse this. Contact hpc-helpdesk@bih-charite.de for assistance if you have longer running jobs that you really cannot make run any shorter.
mpi
-p mpi
, maximum run time: --time 14-00:00:00
critical
","text":"Jobs are not routed into critial
automatically and the partition has to be selected manually.
This partition is for time-critical jobs with deadlines. As long as the cluster is not very busy, requests for critical jobs will be granted most of the time. However, do not use this partition without prior arrangement with hpc-helpdesk, as jobs may be killed as a last resort in case of such policy violations.
critical
--time 7-00:00:00
We currently provide you only with Ganglia for monitoring the cluster status.
"},{"location":"overview/monitoring/#using-ganglia","title":"Using Ganglia","text":"Go to the following address and login with your home organization (Charite or MDC):
Ganglia does not know about Slurm
Ganglia will not show you anything about the Slurm job scheduling system. If a job uses a whole node but uses no CPUs then this will be displayed as unused in Ganglia. However, Slurm would not schedule another job on this node.
You will be shown a screen as shown below. This allows you to get a good idea of what is going on on the HPC.
By default you will be shown the cluster usage of the last day. You can quickly switch to report for two or four hours as well, etc.
In the first row of pictures you see the number of total CPUs (actually hardware threads), number of hosts seen as up and down by Ganglia, and cluster load/utilization. You will then see the overall cluster load, memory usage, CPU usage, and network utilization across the selected time period.
Linux load is not intuitive
Note that the technical details behind Linux load are not very intuitive. Load incorporates much more than just CPU usage. You can find a quite comprehensive treatment of Linux load here.
We are using a fast shared storage system and almost no local storage (except in /tmp
). Also, almost no jobs use MPI or other heavy network communication. Thus, the network utilization is a good measure of the I/O on the cluster.
Below, you can drill down into various metrics and visualize them historically. Just try it out and find your way around, you cannot break anything. Sadly, there is no good documentation of Ganglia online.
"},{"location":"overview/monitoring/#aggregate-gpu-utilization-visualization","title":"Aggregate GPU Utilization Visualization","text":"Ganglia allows you to obtain metrics in several interesting and useful ways. If you click on \"Aggregate Graphs\" then you could enter the following values to get an overview of the live GPU utilization.
Aggregate GPU Utilization
hpc-gpu-.*
gpu._util
Stacked
Hide legend
Then click Create Graph
.
If a GPU is fully used, it will contribute 100 points on the vertical axis. See above for an example, and here is a direct link:
No mounting on the cluster itself.
For various technical and security-related reasons it is not possible to mount anything on the cluster nodes by users. For mounting the cluster storage on your computer, please read Connecting: SSHFS Mounts.
This document gives an overview of the nodes and volumes on the cluster.
"},{"location":"overview/storage/#cluster-layout","title":"Cluster Layout","text":""},{"location":"overview/storage/#cluster-nodes","title":"Cluster Nodes","text":"The following groups of nodes are available to cluster users. There are a number of nodes that are invisible to non-admin staff, hosting the queue master and monitoring tools and providing backup storage for key critical data, but these are not shown here.
hpc-login-{1,2}
hpc-login-{1,2}.cubi.bihealth.org
med0101..0124,0127
med0133..0164
med0201..0264
med0301..0304
med0401..0405
special purpose/high-memory machines
med0401
and med0402
med0403
and med0404
med0405
gpu
)med0601..0616
med0618..0633
med0701..0764
The cluster has 2.1 PB of legacy fast storage, currently available at /fast
, as well as 1.6 PB of next-generation fast storage, available at /data/cephfs-1
. Additionally 7.4 PB of slower \"Tier 2\" storage is available at /data/cephfs-2
. Storage is provided by a Ceph storage cluster and designed for massively parallel access from an HPC system. In contrast to \"single server\" NFS systems, the system can provide large bandwidth to all cluster nodes in parallel as long as the data is read and written in relatively \"few\" large files.
Storage is split into three sections:
home
-- small, persistent, and safe storage, e.g., for documents and configuration files (default quota of 1 GB).work
-- larger and persistent storage, e.g., for your large data files (default quota of 1 TB).scratch
-- large and non-persistent storage, e.g., for temporary files; files are automatically deleted after 2 weeks (default quota of 10 TB; deletion not implemented yet).
Each user, group, and project has one or more of these sections, e.g., for users:
/data/cephfs-1/home/users/$NAME
/data/cephfs-1/home/users/$NAME/work
/data/cephfs-1/home/users/$NAME/scratch
See Storage and Volumes: Locations for more information.
"},{"location":"slurm/background/","title":"Introduction to Scheduling","text":"As explained elsewhere in more detail, an HPC cluster consists of multiple computers connected via a network and working together. Multiple users can use the system simultaneously to do their work. This means that the system needs to join multiple computers (nodes) to provide a coherent view of them and the same time partition the system to allow multiple users to work concurrently.
user 1 user 2 ...\n\n .---. .---. .---. .---.\n | J | | J | | J | | J |\n | o | | o | | o | | o | ...\n | b | | b | | b | | b |\n | 1 | | 2 | | 3 | | 4 |\n '---' '---' '---' '---'\n\n.------------------------------------------.\n| Cluster Scheduler |\n'------------------------------------------'\n\n.----------. .------------. .------------.\n| multiple | | separate | | computers |\n'----------' '------------' '------------'\n
"},{"location":"slurm/background/#interlude-partitioning-single-computers","title":"Interlude: Partitioning Single Computers","text":"Overall, this partitioning is not so different from how your workstation or laptop works. Most likely, your computer (or even your smartphone) has multiple processors (or cores). You can run multiple programs on the same computer and the fact that (a) there is more than one core and (b) there is more than one program running is not known to the running programs (unless they explicitly communicate with each other). Different programs can explicitly take advantage of the multiple processor cores. The main difference is that you normally use your computer in an interactive fashion (you perform an action and expect an immediate reaction).
Even with a single processor (and core), your computer manages to run more than one program at the same time. This is done with the so-called time-slicing approach where the operating system lets each program run in turn for a short time (a few milliseconds). A program with a higher priority will get more time slices than one with a lower priority (e.g., your audio player has real-time requirements and you will hear artifacts if it is starved for compute resources). Your operating system protects programs from each other by creating an address space for each. When two programs are running, the value of the memory at any given position in one program is independent from the value in the other program. Your operating system offers explicit functionality for sharing certain memory areas that two programs can use to exchange data efficiently.
Similarly, file permissions with Unix users/groups or Unix/Windows ACLs (access control lists) are used to isolate users from each other. Programs can share data by accessing the same file if they can both access it. There are special files called sockets that allow for network-like inter-process communication but of course two programs on the same computer can also connect (virtually) via the computer network (no data will actually go through a cable).
"},{"location":"slurm/background/#interlude-resource-types","title":"Interlude: Resource Types","text":"As another diversion, let us consider how Unix manages its resources. This is important to understand when requesting resources from the scheduler later on.
First of all, a computer might offer a certain feature such as a specific hardware platform or special network connection. Examples for this on the BIH HPC are specific Intel processor generations such as haswell
or the availability of Infiniband networking. You can request these with so-called constraints; they are not allocated to specific jobs.
Second, there are resources that are allocated to specific jobs. The most important resources here are:
Generally, once a resource has been allocated to one job, it is not available to another. This means that if you allocate more resources to your job than you actually need (overallocation), those resources are not available to other jobs (whether they are your jobs or those of other users). This will be explained further below.
Another example of an allocated resource is a license. The BIH HPC has a few MATLAB 2016b licenses that users can request. As long as a license is allocated to one job, it is unavailable to another.
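To put rough numbers on overallocation, here is a minimal sketch. It assumes a node with 32 cores and a hypothetical job that requests 16 cores but only keeps 4 of them busy; all variable names and numbers are illustrative assumptions, not measurements from the cluster.

```shell
#!/bin/sh
# Illustrative arithmetic only; every number here is an assumption.
NODE_CORES=32      # cores available on one node
REQUESTED=16       # cores requested per job
ACTUALLY_USED=4    # cores the job really keeps busy

# How many jobs fit on the node with the inflated vs. the right-sized request.
JOBS_OVERALLOCATED=$((NODE_CORES / REQUESTED))
JOBS_RIGHT_SIZED=$((NODE_CORES / ACTUALLY_USED))

# Cores that are reserved but sit idle when the node is "full".
IDLE_CORES=$((JOBS_OVERALLOCATED * (REQUESTED - ACTUALLY_USED)))

echo "jobs per node with 16-core requests: $JOBS_OVERALLOCATED"
echo "jobs per node with 4-core requests:  $JOBS_RIGHT_SIZED"
echo "cores reserved but idle:             $IDLE_CORES"
```

Under these assumptions the node counts as fully allocated after two jobs, although three quarters of its cores do no work.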
"},{"location":"slurm/background/#nodes-sockets-processors-cores-threads","title":"Nodes, Sockets, Processors, Cores, Threads","text":"Regarding compute resources, Slurm differentiates between:
In most cases, you will use one compute node only. When using more than one node, you will need to use some form of message passing, e.g., MPI, so processes on different nodes can communicate. On a single node you would mostly use single- or multi-threaded processes, or multiple processes.
Above: Slurm's nomenclature for sockets, processors, cores, and threads (from Slurm Documentation).
Co-locating processes/threads on the same socket has certain implications that are mostly relevant for numerical applications. We will not go into further detail here. Slurm provides many ways to specify the allocation or \"pinning\" of processes to specific locations. If you need this feature, we trust that you will find sufficient explanation in the Slurm documentation.
Usually, you would allocate multiple cores (a term Slurm uses synonymously with processors) on a single node (allocation on a single node is the default).
"},{"location":"slurm/background/#how-scheduling-works","title":"How Scheduling Works","text":"Slurm is an acronym for \"Simple Linux Unix Resource Manager\" (note that the word \"scheduler\" does not occur here). Actually, one classically differentiates between the managing of resources and the scheduling of jobs that use them. The resource manager allocates resources according to a user's request for a job and ensures that there are no conflicts. If the required resources are not available, the scheduler puts the user's job into a queue. Later, when then requested resources become available the scheduler assigns them to the job and runs it. In the following, both resource allocation and the running of the job are described as being done by the scheduler.
The interesting case occurs when there are not enough resources available for at least two jobs submitted to the scheduler. The scheduler has to decide how to proceed. Consider the simplified case of only scheduling cores. Each job will request a number of cores. The scheduler will then generate a scheduling plan that might look as follows.
core\n ^\n4 | |---job2---|\n3 | |---job2---|\n2 | |---job2---|\n1 | |--job1--|\n +--------------------------> t time\n 5 1 1 2\n 0 5 0\n
job1
has been allocated one core and job2
has been allocated three cores. When job3
, requesting one core, is submitted at t = 5, it has to wait at least until job1
is finished. If job3
requested two or more cores, it would have to wait at least until job2
also finished.
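The waiting argument above can be replayed with a toy model. This is only a sketch under stated assumptions: the node has four cores, job1 holds one core until t = 10, and job2 holds the remaining three cores until t = 20 (the end times are invented for illustration). The function name `earliest_start` is hypothetical, and Slurm's real scheduler is far more involved.

```shell
#!/bin/sh
# Toy model of the scheduling plan; not how Slurm works internally.
TOTAL_CORES=4
# One "cores:end_time" entry per running job, sorted by end time (assumed values).
RUNNING="1:10 3:20"

earliest_start() {
    # $1 = cores requested by the new job, $2 = submission time
    need=$1
    now=$2
    free=$TOTAL_CORES
    # Subtract the cores held by all running jobs.
    for job in $RUNNING; do
        free=$((free - ${job%%:*}))
    done
    if [ "$free" -ge "$need" ]; then
        echo "$now"
        return 0
    fi
    # Release jobs in order of their end time until enough cores are free.
    for job in $RUNNING; do
        free=$((free + ${job%%:*}))
        if [ "$free" -ge "$need" ]; then
            echo "${job##*:}"
            return 0
        fi
    done
    echo "never"
}

echo "job3 with 1 core can start at t = $(earliest_start 1 5)"
echo "job3 with 2 cores can start at t = $(earliest_start 2 5)"
```

With these assumed end times, a one-core job3 starts when job1 ends, while a job3 asking for two or more cores has to wait for job2 as well.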
We can now ask several questions, including the following:
Also see the Slurm Frequently Asked Questions.
Please note that even if all jobs were known at the start of time, scheduling is still a so-called NP-complete problem. Entire computer science journals and books are dedicated only to scheduling. Things get more complex in the case of online scheduling, in which new jobs can appear at any time. In practice, Slurm does a fantastic job with its heuristics but it heavily relies on parameter tuning. HPC administration is constantly working on optimizing the scheduler settings. Note that you can use the --format
option to the squeue
command to request that it shows you information about job scheduling (in particular, see the %S
field, which will show you the expected start time for a job, assuming Slurm has calculated it). See man squeue
for details. If you observe inexplicable behavior, please notify us at hpc-helpdesk@bih-charite.de
.
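As a sketch of the `--format` usage mentioned above, the following wraps a `squeue` call that includes the `%S` field. The function name `show_start_times` and the column widths are arbitrary choices, and it assumes that `$USER` matches your cluster login. The start time column only carries a value once Slurm has computed an estimate.

```shell
#!/bin/sh
# Columns: job ID, partition, job name, state, expected start time, nodes/reason.
START_FORMAT='%.10i %.9P %.20j %.2t %.20S %R'

show_start_times() {
    # List the current user's pending jobs; extra arguments are passed to squeue.
    squeue --user="$USER" --states=PENDING --format="$START_FORMAT" "$@"
}
```

Run `show_start_times` on a login node. `squeue --start` offers a similar built-in view.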
In Slurm, the nodes of a cluster are split into partitions. Nodes are assigned to one or more partitions (see the Job Scheduler section for details). Jobs can also be assigned to one or more partitions and are executed on nodes of the given partition.
In the BIH HPC, partitions are used to stratify jobs of certain running times and to provide different quality of service (e.g., maximal number of CPU cores available to a user for jobs of a certain running time and size). The partitions gpu
and highmem
provide special hardware (the nodes are not assigned to other partitions) and the mpi
partition allows MPI-parallelism and the allocation of jobs to more than one node. The Job Scheduler provides further details.
This page contains assorted Slurm commands and Bash snippets that should be helpful.
man
pages!
$ man sinfo\n$ man scontrol\n$ man squeue\n# etc...\n
interactive sessions
hpc-login-1:~$ srun --pty bash\nmed0740:~$ echo \"Hello World\"\nmed0740:~$ exit\n
batch submission
hpc-login-1:~$ sbatch script.sh\nSubmitted batch job 2\nhpc-login-1:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 27 debug script.s holtgrem R 0:06 1 med0703\n
listing nodes
$ sinfo -N\nNODELIST NODES PARTITION STATE\nmed0740 1 debug* idle\nmed0741 1 debug* down*\nmed0742 1 debug* down*\n\n$ scontrol show nodes\nNodeName=med0740 Arch=x86_64 CoresPerSocket=8\n CPUAlloc=0 CPUTot=32 CPULoad=0.06\n AvailableFeatures=(null)\n[...]\n\n$ scontrol show nodes med0740\nNodeName=med0740 Arch=x86_64 CoresPerSocket=8\n CPUAlloc=0 CPUTot=32 CPULoad=0.06\n AvailableFeatures=(null)\n ActiveFeatures=(null)\n Gres=(null)\n NodeAddr=med0740 NodeHostName=med0740 Version=20.02.0\n OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020\n RealMemory=1 AllocMem=0 FreeMem=174388 Sockets=2 Boards=1\n State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A\n Partitions=debug\n BootTime=2020-03-05T00:54:15 SlurmdStartTime=2020-03-05T16:23:25\n CfgTRES=cpu=32,mem=1M,billing=32\n AllocTRES=\n CapWatts=n/a\n CurrentWatts=0 AveWatts=0\n ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s\n
queue states
$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n
node resources
$ sinfo -o \"%20N %10c %10m %25f %10G \"\n
additional resources such as GPUs
$ sinfo -o \"%N %G\"\n
listing job details
$ scontrol show job 225\nJobId=225 JobName=bash\n UserId=XXX(135001) GroupId=XXX(30069) MCS_label=N/A\n Priority=4294901580 Nice=0 Account=(null) QOS=normal\n JobState=FAILED Reason=NonZeroExitCode Dependency=(null)\n Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=130:0\n RunTime=00:16:27 TimeLimit=14-00:00:00 TimeMin=N/A\n SubmitTime=2020-03-23T11:34:26 EligibleTime=2020-03-23T11:34:26\n AccrueTime=Unknown\n StartTime=2020-03-23T11:34:26 EndTime=2020-03-23T11:50:53 Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-23T11:34:26\n Partition=gpu AllocNode:Sid=hpc-login-1:1918\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=med0301\n BatchHost=med0301\n NumNodes=1 NumCPUs=2 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=2,node=1,billing=2\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)\n Command=bash\n WorkDir=XXX\n Power=\n TresPerNode=gpu:tesla:4\n MailUser=(null) MailType=NONE\n
host:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \n 1177 medium bash jweiner_ R 4-21:52:24 1 med0127 \n 1192 medium bash jweiner_ R 4-07:08:40 1 med0127 \n 1209 highmem bash mkuhrin_ R 2-01:07:17 1 med0402 \n 1210 gpu bash hilberta R 1-10:30:34 1 med0304 \n 1213 long bash schubacm R 1-09:42:27 1 med0127 \n 2401 gpu bash ramkem_c R 1-05:14:53 1 med0303 \n 2431 medium ngs_mapp holtgrem R 1-05:01:41 1 med0127 \n 2437 critical snakejob holtgrem R 1-05:01:34 1 med0135 \n 2733 debug bash schubacm R 7:36:42 1 med0127 \n 3029 critical ngs_mapp holtgrem R 5:59:07 1 med0127 \n 3030 critical snakejob holtgrem R 5:56:23 1 med0134 \n 3031 critical snakejob holtgrem R 5:56:23 1 med0137 \n 3032 critical snakejob holtgrem R 5:56:23 1 med0137 \n 3033 critical snakejob holtgrem R 5:56:23 1 med0138 \n 3034 critical snakejob holtgrem R 5:56:23 1 med0138 \n 3035 critical snakejob holtgrem R 5:56:20 1 med0139 \n 3036 critical snakejob holtgrem R 5:56:20 1 med0139 \n 3037 critical snakejob holtgrem R 5:56:20 1 med0140 \n 3038 critical snakejob holtgrem R 5:56:20 1 med0140 \n 3039 critical snakejob holtgrem R 5:56:20 1 med0141 \n 3040 critical snakejob holtgrem R 5:56:20 1 med0141 \n 3041 critical snakejob holtgrem R 5:56:20 1 med0142 \n 3042 critical snakejob holtgrem R 5:56:20 1 med0142 \n 3043 critical snakejob holtgrem R 5:56:20 1 med0143 \n 3044 critical snakejob holtgrem R 5:56:20 1 med0143 \n 3063 long bash schubacm R 4:12:37 1 med0127 \n 3066 long bash schubacm R 4:11:47 1 med0127 \n 3113 medium ngs_mapp holtgrem R 1:52:33 1 med0708 \n 3118 medium snakejob holtgrem R 1:50:38 1 med0133 \n 3119 medium snakejob holtgrem R 1:50:38 1 med0703 \n 3126 medium snakejob holtgrem R 1:50:38 1 med0706 \n 3127 medium snakejob holtgrem R 1:50:38 1 med0144 \n 3128 medium snakejob holtgrem R 1:50:38 1 med0144 \n 3133 medium snakejob holtgrem R 1:50:35 1 med0147 \n 3134 medium snakejob holtgrem R 1:50:35 1 med0147 \n 3135 medium snakejob holtgrem R 1:50:35 1 med0148 \n 3136 
medium snakejob holtgrem R 1:50:35 1 med0148 \n 3138 medium snakejob holtgrem R 1:50:35 1 med0104 \n
host:~$ squeue -o \"%.10i %9P %20j %10u %.2t %.10M %.6D %10R %b\"\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(R TRES_PER_NODE\n 1177 medium bash jweiner_m R 4-21:52:22 1 med0127 N/A\n 1192 medium bash jweiner_m R 4-07:08:38 1 med0127 N/A\n 1209 highmem bash mkuhrin_m R 2-01:07:15 1 med0402 N/A\n 1210 gpu bash hilberta_c R 1-10:30:32 1 med0304 gpu:tesla:4\n 1213 long bash schubacm_c R 1-09:42:25 1 med0127 N/A\n 2401 gpu bash ramkem_c R 1-05:14:51 1 med0303 gpu:tesla:1\n 2431 medium ngs_mapping holtgrem_c R 1-05:01:39 1 med0127 N/A\n 2437 critical snakejob.ngs_mapping holtgrem_c R 1-05:01:32 1 med0135 N/A\n 2733 debug bash schubacm_c R 7:36:40 1 med0127 N/A\n 3029 critical ngs_mapping holtgrem_c R 5:59:05 1 med0127 N/A\n 3030 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0134 N/A\n 3031 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0137 N/A\n 3032 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0137 N/A\n 3033 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0138 N/A\n 3034 critical snakejob.ngs_mapping holtgrem_c R 5:56:21 1 med0138 N/A\n 3035 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0139 N/A\n 3036 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0139 N/A\n 3037 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0140 N/A\n 3038 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0140 N/A\n 3039 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0141 N/A\n 3040 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0141 N/A\n 3041 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0142 N/A\n 3042 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0142 N/A\n 3043 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0143 N/A\n 3044 critical snakejob.ngs_mapping holtgrem_c R 5:56:18 1 med0143 N/A\n 3063 long bash schubacm_c R 4:12:35 1 med0127 N/A\n 3066 long bash schubacm_c R 4:11:45 1 med0127 N/A\n 3113 medium ngs_mapping holtgrem_c R 1:52:31 1 med0708 N/A\n 3118 medium 
snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0133 N/A\n 3119 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0703 N/A\n 3126 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0706 N/A\n 3127 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0144 N/A\n 3128 medium snakejob.ngs_mapping holtgrem_c R 1:50:36 1 med0144 N/A\n 3133 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0147 N/A\n 3134 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0147 N/A\n 3135 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0148 N/A\n 3136 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0148 N/A\n 3138 medium snakejob.ngs_mapping holtgrem_c R 1:50:33 1 med0104 N/A\n
host:~$ sinfo\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST \ndebug* up 8:00:00 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \ndebug* up 8:00:00 8 mix med[0104,0127,0133-0135,0703,0706,0708] \ndebug* up 8:00:00 10 alloc med[0137-0144,0147-0148] \ndebug* up 8:00:00 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \nmedium up 7-00:00:00 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \nmedium up 7-00:00:00 8 mix med[0104,0127,0133-0135,0703,0706,0708] \nmedium up 7-00:00:00 10 alloc med[0137-0144,0147-0148] \nmedium up 7-00:00:00 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \nlong up 28-00:00:0 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \nlong up 28-00:00:0 8 mix med[0104,0127,0133-0135,0703,0706,0708] \nlong up 28-00:00:0 10 alloc med[0137-0144,0147-0148] \nlong up 28-00:00:0 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \ncritical up 7-00:00:00 11 drain med[0707,0709-0710,0740-0742,0744-0745,0749,0752,0755] \ncritical up 7-00:00:00 8 mix med[0104,0127,0133-0135,0703,0706,0708] \ncritical up 7-00:00:00 10 alloc med[0137-0144,0147-0148] \ncritical up 7-00:00:00 103 idle med[0105-0124,0136,0145-0146,0151-0164,0201-0264,0704-0705] \nhighmem up 14-00:00:0 1 mix med0402 \nhighmem up 14-00:00:0 3 idle med[0401,0403-0404] \ngpu up 14-00:00:0 2 mix med[0303-0304] \ngpu up 14-00:00:0 2 idle med[0301-0302] \n
"},{"location":"slurm/commands-sacct/","title":"Slurm Command: sacct
","text":"Perform queries to the Slurm accounting information.
Representative Example
hpc-login-1:~$ sacct -j 1607103\n JobID JobName Partition Account AllocCPUS State ExitCode\n------------ ---------- ---------- ---------- ---------- ---------- --------\n1607103 wgs_sv_an+ medium 1 PENDING 0:0\n
The sacct
command displays information from the Slurm accounting service. The Slurm scheduler only knows about active or completing (very recently active) jobs. The accounting system also knows about jobs that have already finished, so it is the more robust way to query information about jobs. However, not all information is available to the accounting system, so scontrol show job
and squeue
provide more information about current and pending jobs.
Slurm Documentation: sacct
Please also see the official Slurm documentation on sacct.
"},{"location":"slurm/commands-sacct/#important-arguments","title":"Important Arguments","text":"Also see all important arguments of the sbatch
command.
--jobs
-- The job(s) to query for.--format
-- Define attributes to retrieve.--long
-- Get a lot of information from the database; consider piping the output into | less -S
.sacct
over scontrol
and squeue
.sattach
","text":"The sattach
command allows you to connect the standard input, output, and error streams of a running job step to your current terminal session.
Representative Example
hpc-login-1:~$ sattach 12345.0\n[...output of your job...]\nmed0211:~$ [Ctrl-C]\nhpc-login-1:~$\n
Press Ctrl-C
to detach from the current session. Please note that you will have to give the job ID as well as the step ID. For most cases, simply append \".0\"
to your job ID.
Slurm Documentation: sattach
Please also see the official Slurm documentation on sattach.
"},{"location":"slurm/commands-sattach/#important-arguments","title":"Important Arguments","text":"--pty
-- Execute task zero in pseudo terminal.--verbose
-- Increase verbosity of sattach
.sbatch
","text":"The sbatch
command allows you to put a job into the scheduler's queue to be executed at a later time.
Representative Example
# Execute job.sh in partition medium with 4 threads and 4GB of RAM total for a\n# running time of up to one day.\nhpc-login-1:~$ sbatch --partition=medium --mem=4G --ntasks 4 --time=1-00 job.sh\nSubmitted batch job JOB_ID\n
The command will create a batch job and add it to the queue to be executed at a later point in time.
Slurm Documentation: sbatch
Please also see the official Slurm documentation on sbatch.
"},{"location":"slurm/commands-sbatch/#important-arguments","title":"Important Arguments","text":"--array
-- Submit jobs as array jobs. Also see the section Array Jobs below.--nodes
-- The number of nodes to allocate. This is only given here as an important argument as the maximum number of nodes allocatable to any partition but mpi
is set to one (1). This is done as there are few users on the BIH HPC that actually use multi-node parallelism. Rather, most users will use multi-core parallelism and might forget to limit the number of nodes, which causes inefficient allocation of resources.--cpus-per-task
-- This corresponds to the number of CPU cores allocated to each task.--mem
-- The memory to allocate for the job. As you can define minimal and maximal number of tasks/CPUs/cores, you could also specify --mem-per-cpu
and get more flexible scheduling of your job.--gres
-- Generic resource allocation. On the BIH HPC, this is only used for allocating GPUs, e.g., with --gres=gpu:tesla:2
, a user could allocate two NVIDIA Tesla GPUs on the same host (use a40
instead of tesla
for the A40 GPUs).--licenses
-- On the BIH HPC, this is used for the allocation of MATLAB 2016b licenses only.--partition
-- The partition to run in. Also see the Job Scheduler section.--time
-- Specify the running time, see man sbatch
or the official Slurm documentation on srun for supported formats. Please note that the DRMAA API only accepts the hours:minutes
format.--dependency
-- Specify dependencies on other jobs, e.g., using --dependency afterok:JOBID
to only execute if the job with ID JOBID
finished successfully or --dependency after:JOBID
to wait for a job to finish regardless of its termination status.--constraint
-- Require one or more features from your node. On the BIH HPC, the processor generation is defined as a feature on the nodes, e.g., haswell
, or special networking such as infiniband
. You can have a look at /etc/slurm/slurm.conf
for all configured features.--output
-- The path to the output log file (by default joining stdout and stderr, see the man page on --error
on how to redirect stderr separately). Various placeholders are available, see the \"filename pattern\" section of man sbatch
or the official Slurm documentation on srun.--mail-type=<type>
-- Send out notifications by email when an event occurs. Use FAIL
to get emails when your job fails. Also see the documentation of sbatch in the Slurm manual.--mail-user=<email>
-- The email address to send to. Must end in @charite.de
, @mdc-berlin.de
, or @bih-charite.de
.Ensure your --output
directory exists!
In the case that the path to the log/output file does not exist, the job will just fail. scontrol show job ID
will report JobState=FAILED Reason=NonZeroExitCode
. Regrettably, no further information is displayed to you as the user. Always check that the path to the directories in StdErr
and StdOut
exists when checking scontrol show job ID
.
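One way to avoid this failure mode is to create the log directory right before submitting. A minimal sketch, assuming that logs go to a `logs` directory next to a job script called `job.sh` (both names are placeholders):

```shell
#!/bin/sh
# Assumption: logs are collected in ./logs and the job script is job.sh.
LOG_DIR=logs
mkdir -p "$LOG_DIR"

# In sbatch filename patterns, %x expands to the job name and %j to the job ID.
if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    sbatch --output="$LOG_DIR/%x-%j.log" job.sh
else
    echo "sbatch or job.sh not found; only created $LOG_DIR"
fi
```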
--job-name
Also see the section Slurm Job Scripts on how to embed the sbatch
parameters in #SBATCH
lines.
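The `--dependency` argument described above is often combined with `--parsable`, which makes `sbatch` print only the job ID so that it can be captured in a variable. The following is a sketch rather than a recommended workflow; `step1.sh`, `step2.sh`, and the helper `afterok` are hypothetical names.

```shell
#!/bin/sh
# step1.sh and step2.sh are hypothetical job scripts.
afterok() {
    # Build the dependency argument for a given job ID.
    echo "--dependency=afterok:$1"
}

if command -v sbatch >/dev/null 2>&1 && [ -f step1.sh ] && [ -f step2.sh ]; then
    # --parsable prints "jobid" (or "jobid;cluster"), so strip a possible suffix.
    first_id=$(sbatch --parsable step1.sh)
    first_id=${first_id%%;*}
    # step2.sh only starts if step1.sh finished successfully.
    sbatch "$(afterok "$first_id")" step2.sh
fi
```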
If you have many (say, more than 10) similar jobs (e.g., when performing a grid search), you can also use array jobs. However, you should also consider whether it would make sense to increase the running time of your jobs, e.g., to be at least ~10min.
You can submit array jobs by specifying -a EXPR
or --array EXPR
where EXPR
is a range or a list (of course, you can also add this as an #SBATCH
header in your job script). For example:
hpc-login-1 ~# sbatch -a 1-3 grid_search.sh\nhpc-login-1 ~# sbatch -a 1,2,5-10 grid_search.sh\n
This will submit grid_search.sh
with certain variables set:
SLURM_ARRAY_JOB_ID
-- the ID of the first jobSLURM_ARRAY_TASK_ID
-- the index of the job in the arraySLURM_ARRAY_TASK_COUNT
-- number of submitted jobs in arraySLURM_ARRAY_TASK_MAX
-- highest job array index valueSLURM_ARRAY_TASK_MIN
-- lowest job array index valueUsing array jobs has several advantages:
Also see Slurm documentation on job arrays.
For example, if you submit sbatch --array=1-3 grid_search.sh
and Slurm responds with Submitted batch job 36
then the script will be run three times with the following parameters set:
SLURM_JOB_ID=36\nSLURM_ARRAY_JOB_ID=36\nSLURM_ARRAY_TASK_ID=1\nSLURM_ARRAY_TASK_COUNT=3\nSLURM_ARRAY_TASK_MAX=3\nSLURM_ARRAY_TASK_MIN=1\n\nSLURM_JOB_ID=37\nSLURM_ARRAY_JOB_ID=36\nSLURM_ARRAY_TASK_ID=2\nSLURM_ARRAY_TASK_COUNT=3\nSLURM_ARRAY_TASK_MAX=3\nSLURM_ARRAY_TASK_MIN=1\n\nSLURM_JOB_ID=38\nSLURM_ARRAY_JOB_ID=36\nSLURM_ARRAY_TASK_ID=3\nSLURM_ARRAY_TASK_COUNT=3\nSLURM_ARRAY_TASK_MAX=3\nSLURM_ARRAY_TASK_MIN=1\n
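Inside the job script, `SLURM_ARRAY_TASK_ID` is typically used to select the input or parameter for the current task. The following sketch of a `grid_search.sh` assumes a made-up parameter grid with three values; the `#SBATCH` settings are placeholders.

```shell
#!/bin/bash
#SBATCH --array=1-3
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Hypothetical parameter grid: one value per array index (1-based).
PARAMS="0.1 0.5 1.0"

# Fall back to index 1 so the script can be tried outside of Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}

# Pick the TASK_ID-th space-separated value from the grid.
PARAM=$(echo "$PARAMS" | cut -d ' ' -f "$TASK_ID")

echo "array job ${SLURM_ARRAY_JOB_ID:-none}, task ${TASK_ID}: param=${PARAM}"
```

Each of the three array tasks then runs the same script with a different value of `PARAM`.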
"},{"location":"slurm/commands-sbatch/#notes","title":"Notes","text":"sbatch
are governed by resource allocations, in particular:sbatch
jobs have a maximal running time set,sbatch
jobs have a maximal memory and number of cores set, andscontrol show job JOBID
.scancel
","text":"Terminate a running Slurm job.
Representative Example
hpc-login-1:~$ scancel 1703828\nhpc-login-1:~$\n
This command allows you to terminate one or more running jobs (of course, non-superusers can only terminate their own jobs).
Slurm Documentation: scancel
Please also see the official Slurm documentation on scancel.
"},{"location":"slurm/commands-scontrol/","title":"Slurm Command:scontrol
","text":"The scontrol
command allows you to query detailed information from the scheduler and to manipulate objects. Object manipulation is less important for normal users.
Representative Example
hpc-login-1:~$ scontrol show job 1607103\nJobId=1607103 JobName=wgs_sv_annotation\n UserId=holtgrem_c(100131) GroupId=hpc-ag-cubi(5272) MCS_label=N/A\n Priority=748 Nice=0 Account=(null) QOS=normal\n [...]\nhpc-login-1:~$ scontrol show node med02[01-32]\nNodeName=med0201 Arch=x86_64 CoresPerSocket=8\n CPUAlloc=0 CPUTot=32 CPULoad=0.01\n AvailableFeatures=ivybridge,infiniband\n ActiveFeatures=ivybridge,infiniband\n [...]\nhpc-login-1:~$ scontrol show partition medium\nPartitionName=medium\n AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL\n AllocNodes=ALL Default=NO QoS=medium\n DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO\n [...]\n
This command allows you to query all information for an object from Slurm, e.g., jobs, nodes, or partitions. The command also accepts ranges of jobs and hosts. It is most useful to get the information of one or a few objects from the scheduler.
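Because `scontrol show` prints `KEY=value` pairs, single fields can be extracted with standard text tools. The helper below is a sketch; the function name `job_field` is made up, and values that contain spaces are cut at the first space.

```shell
#!/bin/sh
job_field() {
    # Print the value of KEY ($1) from "KEY=value" pairs read on stdin.
    key=$1
    tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}

# Sample line in the format printed by "scontrol show job".
sample='   JobState=FAILED Reason=NonZeroExitCode Dependency=(null)'
echo "$sample" | job_field JobState
echo "$sample" | job_field Reason
```

On the cluster, the same helper would be used as `scontrol show job JOBID | job_field JobState`.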
Slurm Documentation: scontrol
Please also see the official Slurm documentation on scontrol.
"},{"location":"slurm/commands-scontrol/#important-sub-commands","title":"Important Sub commands","text":"scontrol show job
-- Show details on jobs.scontrol show partition
-- Show details on partitions.scontrol show node
-- Show details on nodes.scontrol help
-- Show help.scontrol
-- Start an interactive scontrol shell / REPL (read-eval-print loop).scontrol
can only work on jobs that are pending (in the queue), running, or in \"completing\" state.sacct
command.sinfo
","text":"The sinfo
command allows you to query the current cluster status.
Representative Example
hpc-login-1:~$ sinfo\nPARTITION AVAIL TIMELIMIT NODES STATE NODELIST\n[...]\nmedium up 7-00:00:00 10 drain* med[0101-0103,0125-0126,0128-0132]\nmedium up 7-00:00:00 1 down* med0243\nmedium up 7-00:00:00 31 mix med[0104,0106-0122,0124,0133,0232-0233,0237-0238,0241-0242,0244,0263-0264,0503,0506]\nmedium up 7-00:00:00 5 alloc med[0105,0123,0127,0239-0240]\nmedium up 7-00:00:00 193 idle med[0134-0164,0201-0231,0234-0236,0245-0262,0501-0502,0504-0505,0507-0516,0601-0632,0701-0764]\n[...]\nhpc-login-1:$ sinfo --summarize\nPARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST\ndebug* up 8:00:00 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\nmedium up 7-00:00:00 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\nlong up 28-00:00:0 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\ncritical up 7-00:00:00 25/141/10/176 med[0101-0164,0501-0516,0601-0632,0701-0764]\nhighmem up 14-00:00:0 1/2/1/4 med[0401-0404]\ngpu up 14-00:00:0 3/0/1/4 med[0301-0304]\nmpi up 14-00:00:0 38/191/11/240 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n
This command summarizes the state of nodes by different criteria (e.g., by partition or globally).
Slurm Documentation: sinfo
Please also see the official Slurm documentation on sinfo.
"},{"location":"slurm/commands-sinfo/#important-arguments","title":"Important Arguments","text":"Also see all important arguments of the sinfo
command.
--summarize
-- Summarize the node state by partition.--nodes
-- Select the nodes to show the status for, e.g., display the status of all GPU nodes with sinfo -n med030[1-4]
.The most important node states are:
down
-- node is marked as offlinedraining
-- node will not accept any more jobs but has jobs running on itdrained
-- node will not accept any more jobs and has no jobs running on it, but is not offline yetidle
-- node is ready to run jobsallocated
-- node is fully allocated (e.g., CPU, RAM, or GPU limit has been reached)mixed
-- node is running jobs but there is space for moresqueue
","text":"The squeue
command allows you to view currently running and pending jobs.
Representative Example
hpc-login-1:~$ squeue\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 1583165 highmem 20200702 usr PD 0:00 1 (DependencyNeverSatisfied)\n 1605901 critical variant_ holtgrem PD 0:00 1 (DependencyNeverSatisfied)\n 1605902 critical variant_ holtgrem PD 0:00 1 (Dependency)\n 1605905 critical variant_ holtgrem PD 0:00 1 (DependencyNeverSatisfied)\n 1605916 critical wgs_sv_c holtgrem PD 0:00 1 (Dependency)\n 1607103 medium wgs_sv_a holtgrem PD 0:00 1 (DependencyNeverSatisfied)\n[...]\n
Slurm Documentation: squeue
Please also see the official Slurm documentation on squeue.
"},{"location":"slurm/commands-squeue/#important-arguments","title":"Important Arguments","text":"--nodelist
-- Only display jobs running on certain nodes (e.g., GPU nodes).--format
-- Define the format to print, see man squeue
for details. See below for a format string that includes the jobid, partition, job name, user name, job status, running time, number of nodes, number of CPU cores, and allocated GPUs.The following aliases in ~/.bashrc
will allow you to print a long and informative squeue
output with sq
, pipe it into less with sql
, get only your jobs (adjust the alias
to your account) using sqme
and pipe that into less with sqmel
.
alias sq='squeue -o \"%.10i %9P %60j %10u %.2t %.10M %.6D %.4C %10R %b\" \"$@\"'\nalias sql='sq \"$@\" | less -S'\nalias sqme='sq -u YOURUSER_c_or_m \"$@\"'\nalias sqmel='sqme \"$@\" | less -S'\n
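Bash aliases do not take parameters; anything typed after the alias name is simply appended to the expanded text. If you prefer explicit argument handling, shell functions are an alternative. A sketch with the same format string, assuming that `$USER` matches your cluster login:

```shell
#!/bin/sh
# Same format string as in the aliases above.
SQ_FORMAT='%.10i %9P %60j %10u %.2t %.10M %.6D %.4C %10R %b'

sq()    { squeue -o "$SQ_FORMAT" "$@"; }   # long, informative squeue output
sql()   { sq "$@" | less -S; }             # ... piped into less
sqme()  { sq -u "$USER" "$@"; }            # ... only your own jobs
sqmel() { sqme "$@" | less -S; }           # ... your jobs, piped into less
```

Additional arguments are passed through, e.g., `sqme --states=PENDING`.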
"},{"location":"slurm/commands-srun/","title":"Slurm Command: srun
","text":"The srun
command allows you to run a command now.
Representative Example
hpc-login-1:~$ srun --pty bash -i\nmed0201:~$\n
The command will perform a resource allocation with the scheduler (and wait until it has allocated the requested resources) first. Most importantly, you can specify the --pty
argument which will connect the job's standard output, error, and input to your current terminal. This allows you to run interactive jobs such as shells with srun --pty bash -i
.
Slurm Documentation: srun
Please also see the official Slurm documentation on srun.
"},{"location":"slurm/commands-srun/#important-arguments","title":"Important Arguments","text":"Also see all important arguments of the sbatch
command.
--pty
-- Connect current terminal to the job's stdout/stderr/stdin.--x11
-- Setup X11 forwarding.--immediate
-- Immediately terminate if the resources to run the job are not available, do not wait.--test-only
-- Don't run anything, but only estimate when the job would be scheduled.srun
are governed by resource allocations, in particular:srun
jobs have a maximal running time set,srun
jobs have a maximal memory and number of cores set, andscontrol show job JOBID
.In the sections Slurm Quickstart and Slurm Cheat Sheet, we have seen that sinfo
and squeue
allow for the compact display of partition/node and job information. In contrast, scontrol show job <id>
and scontrol show partition <id>
and scontrol show node <id>
show comprehensive information that quickly gets hard to comprehend for multiple entries.
Now you might ask: is there anything in between? And: yes, there is.
You can tune the output of sinfo
and squeue
using parameters, in particular by providing format strings. All of this is described in the man pages of the commands that you can display with man sinfo
and man squeue
on the cluster.
sinfo
Output","text":"Notable arguments of sinfo
are:
-N, --Node
-- uncompress the usual lines and display one line per node and partition.-s, --summarize
-- compress the node state, more compact display.-R, --list-reasons
-- for nodes that are not up, display reason string provided by admin.-o <fmt>, --format=<fmt>
-- use format string for display.The most interesting argument is -o/--format
. The man page lists the following values that are used when using other arguments. In other words, many of the display modifications could also be applied with -o/--format
.
default \"%#P %.5a %.10l %.6D %.6t %N\"\n--summarize \"%#P %.5a %.10l %.16F %N\"\n--long \"%#P %.5a %.10l %.10s %.4r %.8h %.10g %.6D %.11T %N\"\n--Node \"%#N %.6D %#P %6t\"\n--long --Node \"%#N %.6D %#P %.11T %.4c %.8z %.6m %.8d %.6w %.8f %20E\"\n--list-reasons \"%20E %9u %19H %N\"\n--long --list-reasons\n \"%20E %12U %19H %6t %N\"\n
The best way to learn more about this is to play around with sinfo -o
, starting out with one of the format strings above. Details about the format strings are described in man sinfo
. Some remarks here:
%<num><char>
displays the value represented by <char>
padded with spaces to the right such that a width of <num>
is reached,%.<num><char>
displays the value represented by <char>
padded with spaces to the left such that a width of <num>
is reached, and%#<char>
displays the value represented by <char>
padded with spaces to the max length of the value represented by <char>
(this is a \"virtual\" value, used internally only, you cannot use this and you will have to place an integer here).For example, to create a grouped display with reasons for being down use:
hpc-login-1:~$ sinfo -o \"%10P %.5a %.10l %.16F %40N %E\"\nPARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST REASON\ndebug* up 8:00:00 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\ndebug* up 8:00:00 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\nmedium up 7-00:00:00 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\nmedium up 7-00:00:00 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\nlong up 28-00:00:0 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\nlong up 28-00:00:0 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\ncritical up 7-00:00:00 0/0/16/16 med[0703-0710,0740-0742,0744-0745,0749,0 bogus node\ncritical up 7-00:00:00 18/98/0/116 med[0104-0124,0127,0133-0148,0151-0164,0 none\nhighmem up 14-00:00:0 0/4/0/4 med[0401-0404] none\ngpu up 14-00:00:0 3/1/0/4 med[0301-0304] none\n
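The width specifiers behave much like field widths in `printf`, which can be used to get a feeling for them without access to the cluster. This is only an analogy, not how `sinfo` is implemented: `%10P` corresponds to a left-aligned field and `%.10P` to a right-aligned one, and values longer than the width are cut off (as in the NODELIST column above).

```shell
#!/bin/sh
# printf analogue of the sinfo/squeue width specifiers (illustration only):
#   %10P  -> value padded with spaces on the right up to width 10
#   %.10P -> value padded with spaces on the left up to width 10
LEFT=$(printf '%-10.10s|' 'medium')
RIGHT=$(printf '%10.10s|' 'medium')

echo "$LEFT"
echo "$RIGHT"
```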
"},{"location":"slurm/format-strings/#tuning-squeue-output","title":"Tuning squeue
Output","text":"The standard squeue output might yield the following
hpc-login-1:~$ squeue | head\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 3149 medium variant_ holtgrem PD 0:00 1 (Dependency)\n 1177 medium bash jweiner_ R 6-03:32:41 1 med0127\n 1192 medium bash jweiner_ R 5-12:48:57 1 med0127\n 1210 gpu bash hilberta R 2-16:10:51 1 med0304\n 1213 long bash schubacm R 2-15:22:44 1 med0127\n 2401 gpu bash ramkem_c R 2-10:55:10 1 med0303\n 3063 long bash schubacm R 1-09:52:54 1 med0127\n 3066 long bash schubacm R 1-09:52:04 1 med0127\n 3147 medium ngs_mapp holtgrem R 1-03:13:42 1 med0148\n
Looking at man squeue
, we learn that the default format strings are:
default \"%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R\"\n-l, --long \"%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R\"\n-s, --steps \"%.15i %.8j %.9P %.8u %.9M %N\"\n
This looks a bit wasteful. Let's cut down on the padding of the job ID and expand on the job name and remove some right paddings.
hpc-login-1:~$ squeue -o \"%.6i %9P %30j %.10u %.2t %.10M %.6D %R %b\" | head\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 3149 medium variant_calling holtgrem_c PD 0:00 1 (Dependency)\n 1177 medium bash jweiner_m R 6-03:35:55 1 med0127\n 1192 medium bash jweiner_m R 5-12:52:11 1 med0127\n 1210 gpu bash hilberta_c R 2-16:14:05 1 med0304\n 1213 long bash schubacm_c R 2-15:25:58 1 med0127\n 2401 gpu bash ramkem_c R 2-10:58:24 1 med0303\n 3063 long bash schubacm_c R 1-09:56:08 1 med0127\n 3066 long bash schubacm_c R 1-09:55:18 1 med0127\n 3147 medium ngs_mapping holtgrem_c R 1-03:16:56 1 med0148\n
"},{"location":"slurm/format-strings/#displaying-resources","title":"Displaying Resources","text":"Now display how many of our internal projects still exist.
hpc-login-1:~$ squeue -o \"%.6i %9P %30j %.10u %.2t %.10M %.6D %10R %s\" | head\n
The next steps are (TODO):
This page describes how to create SLURM job scripts.
SLURM job scripts look as follows. On the top you have lines starting with #SBATCH
. These appear as comments to bash scripts. These lines are interpreted by sbatch
in the same way as command line arguments. That is, when later submitting the script with sbatch my-job.sh
you can give each parameter either on the command line of the sbatch
call or in the file.
Multi-Node Allocation in Slurm
Classically, jobs on HPC systems are written in a way that they can run on multiple nodes at once, using the network to communicate. Slurm comes from this world and when allocating more than one CPU/core, it might allocate them on different nodes. Please use --nodes=1
to force Slurm to allocate them on a single node.
Creating the Script
host:example$ cat >my-job.sh <<\"EOF\"\n#!/bin/bash\n#\n#SBATCH --job-name=this-is-my-job\n#SBATCH --output=output.txt\n#\n#SBATCH --ntasks=1\n#SBATCH --nodes=1\n#SBATCH --time=10:00\n#SBATCH --mem-per-cpu=100M\n\ndate\n\nhostname\n>&2 echo \"Hello World\"\n\nsleep 1m\n\ndate\nEOF\n
Also see the SLURM Rosetta Stone for more options.
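To make the role of the #SBATCH lines more tangible, here is a small Python sketch that lists the directives of a job script. It mirrors the documented sbatch behaviour of ignoring #SBATCH lines after the first command. sbatch_options is a hypothetical helper; sbatch itself does the real parsing.

```python
# Illustration only: list the #SBATCH options found at the top of a job script.
# sbatch_options is a hypothetical helper; sbatch does its own parsing.

def sbatch_options(script_text):
    options = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('#SBATCH'):
            options.append(stripped[len('#SBATCH'):].strip())
        elif stripped and not stripped.startswith('#'):
            break  # first command reached; later #SBATCH lines are ignored
    return options

script = '''#!/bin/bash
#
#SBATCH --job-name=this-is-my-job
#SBATCH --ntasks=1
#SBATCH --nodes=1

date
#SBATCH --time=10:00
'''
print(sbatch_options(script))
```

The --time line in this sample is placed after the date command on purpose: it is not picked up, which is a common source of confusion when directives are added further down in a script.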
Submit, Look at Queue & Result
host:example$ sbatch my-job.sh\nSubmitted batch job 315\nhost:example$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \n 315 debug this-is- holtgrem R 0:40 1 med0127 \nhost:example$ sleep 2m\nhost:example$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) \nhost:example$ cat output.txt \nWed Mar 25 13:30:56 CET 2020\nmed0127\nHello World\nWed Mar 25 13:31:56 CET 2020\n
"},{"location":"slurm/memory-allocation/","title":"Memory Allocation","text":"Memory allocation is one of the topics that users find confusing most often. This section first gives some technical background and then explains how to implement this properly with Slurm on the BIH HPC.
"},{"location":"slurm/memory-allocation/#technical-background","title":"Technical Background","text":"Technical Background Summary
Main memory used to be one of the most important topics when programming, as computers had so little. There is the infamous quote \"640KB ought to be enough for anybody\", wrongly attributed to Bill Gates, which refers to the fact that early computers could only address that amount of memory. In MS DOS, one had to use special libraries for a program to use more memory. Today, computers are very fast and memory is plentiful and people can (rightfully) forget about memory allocation ... as long as they don't use \"much\" memory by today's standards.
The Linux operating system differentiates between the following types of memory:
Note that above we are talking about processes, not Slurm jobs yet. Let us look at this in detail:
Each program uses some kind of memory management. For example, in C the malloc
and free
functions manually allocate and free memory while in Java, R, and Python, memory allocation and release is done automatically using a concept called garbage collection. Each program starts with a certain virtual memory size, that is the amount of memory it can address, say 128MB. When the program allocates memory, the memory allocation mechanism will check whether it has sufficient space left. If not, it will request an increase in virtual memory from the operating system, e.g., to 256MB. If this fails then the program can try to handle the error, e.g., terminate gracefully, but many programs will just panic and stop. Otherwise, the program will get access to more memory and happily continue to run.
However, programs can allocate humongous amounts of virtual memory and only use a little. Memory is organized in \"pages\" (classically these are 4096 bytes each, but can be larger using so-called \"huge page\" features). The operating system tracks which memory pages are actually used by a process. The total size of these pages is called the resident set size: the amount of memory that is actually currently used by a program. Programs can also mark pages as unused again, thus freeing resident memory and can also decrease their virtual memory.
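The difference between virtual and resident memory can be observed for any running process. The following Python sketch assumes a Linux system that provides /proc/self/status with the VmSize and VmRSS fields; parse_status and current_memory are hypothetical helper names.

```python
# Illustration only: read virtual size and resident set size of this process.
# Assumes a Linux system that provides /proc/self/status.

def parse_status(text):
    '''Return VmSize and VmRSS in kB from the content of a status file.'''
    sizes = {}
    for line in text.splitlines():
        key, _, rest = line.partition(':')
        if key in ('VmSize', 'VmRSS'):
            sizes[key] = int(rest.split()[0])  # the kernel reports kB
    return sizes

def current_memory():
    '''Return the sizes for the calling Python process.'''
    with open('/proc/self/status') as handle:
        return parse_status(handle.read())
```

Calling current_memory() before and after allocating a large but untouched buffer typically shows VmSize growing much more than VmRSS.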
In some cases it is still interesting to use swap memory. Here, the contents of resident memory are copied to disk by the operating system. This process is completely transparent to the program; the data remains available at the original positions in the virtual memory! However, accessing it will take some time as it must be read back into main memory from the disk. In this way, it was possible for a computer with 4MB of RAM and a disk of 100MB to run programs that used 8MB. Of course, this was only really useable for programs that ran in the background. One could really feel the latency if a graphical program was using swapped memory (you could actually hear the hard drive working). Today, swap storage is normally only relevant when you put your computer into hibernation. Given the large main memory on the cluster nodes, their small local hard drives (just used for loading the operating system), and the extreme slowness involved in using swapped memory, the BIH HPC nodes have no swap memory allocated.
Most HPC users will also use shared memory, at least implicitly. Whenever a program uses fork
to create a subprocess (BTW, this is not a thread), the program can choose to \"copy\" its current address space. The second process then has access to the same memory as the parent process in a copy-on-write fashion. This allows, for example, pre-loading a database, and also allows the use of already loaded library code by the child process. If the child process writes to the copy-on-write memory of the parent, the relevant memory page will be copied and attributed to the child.
Two or more processes can share the same memory explicitly. This is usually used for inter-process communication but the Bowtie program uses it for sharing the memory of indices. For example, the Python multiprocessing
module will use this, including if you have two MPI processes running on the same host.
Memory is also separated into segments, the most interesting ones are heap and stack memory. For compiled languages, memory can be allocated on either. For C, an int
variable will be allocated on the stack. Every time you call a function, a stack frame is created in memory to hold the local variables and other information for the duration of the function execution. The stack thus grows through function calls made by your program and shrinks when the functions return. The stack size for a process is limited (by ulimit -s
) and a program that goes too deep (e.g., via infinite recursion) will be terminated by the operating system if it exceeds this limit. Again in C, int * ptr = (int *)malloc(10 * sizeof(int));
will allocate memory for one variable (an integer pointer) on the stack and memory for 10 integers on the heap. When the function returns, the ptr
variable on the stack will be freed but to free the array of integers, you'd have to call free(ptr)
. If the memory is not freed then this constitutes a memory leak, but that is another topic.
Other relevant segments are code, where the compiled code lives, and data, where static data such as strings displayed to the user are stored. As a side note, in interpreted languages such as R or Python, the code and data segments will refer to the code and data of the interpreter (e.g., Python) while the actual program text will be on the heap.
"},{"location":"slurm/memory-allocation/#interlude-memory-in-java","title":"Interlude: Memory in Java","text":"Memory in Java Summary
-XX:MaxHeapSize=<size>
(e.g., <size>=2G
) for your program and only tune the other parameters if needed. Java's memory management provides for some interesting artifacts. When running simple Java programs, you will never run into this but if you need to use gigabytes of memory in Java then you will have to learn a bit about Java memory management. This is the case when running GATK programs, for example.
As different operating systems handle memory management differently, the Java virtual machine does its own memory management to provide a consistent interface. The following three settings are important in governing memory usage of Java:
-Xmx<size>/-XX:MaxHeapSize=<size> -- the maximal Java heap size
-Xms<size>/-XX:InitialHeapSize=<size> -- the initial Java heap size
-Xss<size>/-XX:ThreadStackSize=<size> -- maximal stack size available to a Java thread (e.g., the main thread)
Above, <size> is a memory specification, either in bytes or with a suffix, e.g., 80M or 1G.
On startup, Java does roughly the following:
Memory freed by the Java garbage collector can be re-used by other Java objects (rss remains the same) or be freed in the operating system (rss decreases). The Java VM program itself will also consume memory on the OS stack but that is negligible.
Overall, the Java VM needs to store in main memory:
In the BIH HPC context, the following is recommended:
Memory Allocation in Slurm Summary
--mem=<size>
(e.g., <size>=3G
) to allocate memory per node. srun
and batch sbatch
jobs are governed by Slurm memory allocation. Our Slurm configuration uses Linux cgroups to enforce a maximum amount of resident memory. You simply specify it using --mem=<size>
in your srun
and sbatch
command.
In the (rare) case that you provide a more flexible number of threads (Slurm tasks) or GPUs, you could also look into --mem-per-cpu
and --mem-per-gpu
. The official Slurm sbatch manual is quite helpful, as is man sbatch
on the cluster command line.
Slurm (or rather Linux via cgroups) will track all memory used by all processes started by your job. If each process works independently (e.g., you put the output through a pipe prog1 | prog2
) then the amount of memory consumed will at any given time be the sum of the RSS of both processes at that time. If your program uses fork
, which uses memory in a copy-on-write fashion, the shared memory is of course only counted once. Note that Python's multiprocessing does not use copy on write: its data will be explicitly copied and consume additional memory. Refer to the Scipy/Numpy/Pandas etc. documentation on how to achieve parallelism without copying too much data.
The amount of virtual memory that your program can reserve is only \"virtually\" unlimited (pun not intended). However, in practice, the operating system will not like you allocating more than physically available. If your program attempts to allocate more memory than requested via Slurm, your program will be killed.
This is reported to you in the Slurm job output log as something like:
slurmstepd: error: Detected 1 oom-kill event(s) in step <JOB ID>.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.\n
You can inspect the amount of memory available on each node in total with sinfo --format \"%.10P %.10l %.6D %.6m %N\"
, as shown below.
$ sinfo --format \"%.10P %.10l %.6D %.6m %N\"\n PARTITION TIMELIMIT NODES MEMORY NODELIST\n debug* 8:00:00 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n medium 7-00:00:00 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n long 28-00:00:0 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n critical 7-00:00:00 176 128722 med[0101-0164,0501-0516,0601-0632,0701-0764]\n highmem 14-00:00:0 4 515762 med[0401-0404]\n gpu 14-00:00:0 4 385215 med[0301-0304]\n mpi 14-00:00:0 240 128722 med[0101-0164,0201-0264,0501-0516,0601-0632,0701-0764]\n
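The MEMORY column above is given in megabytes, while --mem usually carries a unit suffix. The following Python sketch converts a --mem value to megabytes and checks it against a node size. It assumes the Slurm convention that K, M, G, and T are powers of 1024 and that a bare number means megabytes. mem_to_mb and fits_on_node are hypothetical helpers.

```python
# Illustration only: compare a --mem request with the MEMORY column of sinfo.
# Assumes Slurm's unit convention (factor 1024, bare numbers are megabytes).

UNITS_MB = {'K': 1 / 1024, 'M': 1, 'G': 1024, 'T': 1024 * 1024}

def mem_to_mb(spec):
    '''Convert a value such as 3G or 500M to megabytes.'''
    spec = spec.strip().upper()
    if spec[-1] in UNITS_MB:
        return float(spec[:-1]) * UNITS_MB[spec[-1]]
    return float(spec)

def fits_on_node(spec, node_mb):
    '''Return True if the request does not exceed the memory of the node.'''
    return mem_to_mb(spec) <= node_mb

# 128722 is the MEMORY value reported for the standard nodes above.
print(fits_on_node('120G', 128722), fits_on_node('128G', 128722))
```

Under these assumptions a request of 128G is slightly larger than what the standard nodes report, so such a job could only run on nodes with more memory.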
"},{"location":"slurm/memory-allocation/#memorycpu-accounting-in-slurm","title":"Memory/CPU Accounting in Slurm","text":"Memory Accounting in Slurm Summary
sacct -j JOBID --format=JobID,MaxRSS
to display the RSS usage of your programsacct -j JOBID --format=Elapsed,AllocCPUs,TotalCPU
to display information about CPU usage. While Slurm runs your job, it collects information about the job such as the running time, exit status, and memory usage. This information is available through the scheduling system via the squeue
and scontrol
commands, but only while the job is pending execution, executing, or currently completing. After job completion, the information is only available through the Slurm accounting system.
You can query information about jobs, e.g., using sacct
:
$ sacct -j 1607166\n JobID JobName Partition Account AllocCPUS State ExitCode\n------------ ---------- ---------- ---------- ---------- ---------- --------\n1607166 snakejob.+ critical 16 COMPLETED 0:0\n1607166.bat+ batch 16 COMPLETED 0:0\n1607166.ext+ extern 16 COMPLETED 0:0\n
This shows that the job with ID 1607166
with a job name starting with snakejob.
has been run in the critical
partition, been allocated 16 cores and had an exit code of 0:0
. For technical reasons, there is a batch
and an extern
sub step. Actually, Slurm makes it possible to run various steps in one batch as documented in the Slurm documentation.
The sacct
command has various command-line options that you can read about via man sacct
or in the Slurm documentation. We can use --brief
/-b
to show only a brief summary.
$ sacct -j 1607166 --brief\n JobID State ExitCode\n------------ ---------- --------\n1607166 COMPLETED 0:0\n1607166.bat+ COMPLETED 0:0\n1607166.ext+ COMPLETED 0:0\n
Similarly, you can use --long
to display extended information (see the manual for the displayed columns). Very long report lines can be piped into less -S
for easier display. You can fine-tune the information to display with a format string to --format
:
$ sacct -j 1607166 --format=JobID,ReqMem,MaxRSS,Elapsed,TotalCPU,AllocCPUS\n JobID ReqMem MaxRSS Elapsed TotalCPU AllocCPUS\n------------ --------- ---------- ---------- ---------- ----------\n1607166 60Gn 13:07:31 7-16:21:29 16\n1607166.bat+ 60Gn 4314560K 13:07:31 7-16:21:29 16\n1607166.ext+ 60Gn 0 13:07:31 00:00.001 16\n
From this command, we can read that we allocated 60GB of memory per node (suffix n
, here Gn
for gigabytes per node) and the maximum RSS is reported as 4.3GB. You can use this information to fine-tune your memory allocations. As a side remark, a suffix c
indicates the memory per core (e.g., that could be 60Gc
).
Further, the program ran for 13 hours and 7 minutes with 16 allocated CPU cores and consumed a total of 7 days, 16 hours, and 21 minutes of CPU time. Thus, a total of 11,061 CPU minutes were spent in 787 minutes wall-clock time. This yields an overall empirical degree of parallelism of about 11061 / 787 = 14, and a parallel efficiency of 14 / 16 = 88%. The discussion of parallel efficiency is a topic not covered here.
However, you can use the awk
script below to compute the empirical parallelism (EmpPar
) and the parallel efficiency (ParEff
). The script also displays the difference between the requested memory and the used RSS (DiffMEM
). The script can be found here.
$ sacct -j 1607166 --format=JobID,ReqMem,MaxRSS,Elapsed,TotalCPU,AllocCPUS \\\n | awk -f quick-sacct.awk\n JobID ReqMem MaxRSS Elapsed TotalCPU AllocCPUS EmpPar ParEff DiffMEM\n------------ ---------- ---------- ---------- ---------- ---------- --------- -------- --------\n1607166 60Gn 13:07:31 7-16:21:29 16 0.00 0.00 -\n1607166.bat+ 60Gn 4314560K 13:07:31 7-16:21:29 16 14.05 0.88 55.89\n1607166.ext+ 60Gn 0 13:07:31 00:00.001 16 0.00 0.00 -\n
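The two derived columns can also be recomputed by hand. The following Python sketch is not the awk script mentioned above but an independent illustration of the same arithmetic; slurm_time_to_seconds and parallel_stats are hypothetical helpers, and the input values are taken from the sacct output of job 1607166.

```python
# Illustration only: recompute EmpPar and ParEff from sacct time columns.

def slurm_time_to_seconds(value):
    '''Convert [D-]HH:MM:SS or MM:SS.mmm (as printed by sacct) to seconds.'''
    days, _, clock = value.rpartition('-')
    parts = [float(part) for part in clock.split(':')]
    while len(parts) < 3:
        parts.insert(0, 0.0)  # short form has no hours field
    hours, minutes, seconds = parts
    return ((int(days or 0) * 24 + hours) * 60 + minutes) * 60 + seconds

def parallel_stats(elapsed, total_cpu, alloc_cpus):
    '''Return empirical parallelism and parallel efficiency.'''
    emp_par = slurm_time_to_seconds(total_cpu) / slurm_time_to_seconds(elapsed)
    return emp_par, emp_par / alloc_cpus

emp_par, par_eff = parallel_stats('13:07:31', '7-16:21:29', 16)
print(f'EmpPar={emp_par:.2f} ParEff={par_eff:.2f}')
```

A parallel efficiency well below 1 suggests that fewer cores could have been requested for the same job.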
"},{"location":"slurm/overview/","title":"Scheduling Overview","text":"The BIH HPC uses the Slurm scheduling system for resource allocation. This section of the manual attempts to give an overview of what scheduling is and how to use the Slurm scheduler. For more detailed information, you will have to refer to the Slurm website and the Slurm man pages (e.g., by entering man sbatch
or man srun
on the HPC terminal's command line).
For a quick introduction and hands-on examples, please see the manual sections
Also, make sure that you are aware of our How-To: Debug Software and How-To: Debug Software on HPC Systems guides in the case that something goes wrong.
"},{"location":"slurm/overview/#annotated-contents","title":"Annotated Contents","text":"srun
-- running parallel jobs now; sbatch
-- submission of batch jobs; scancel
-- stop/kill jobs; sinfo
-- display information about the Slurm cluster; squeue
-- information about pending and running jobs; scontrol
-- detailed information (and control); sacct
-- access Slurm accounting information (pending, running, and past jobs). Many other facilities run Slurm clusters and make their documentation available on the internet. We list some that we found useful below. However, be aware that Slurm is a highly configurable and extensible system. Other sites may have different configurations and plugins enabled than we have (or might even have written custom plugins that are not available at BIH). In any case, it's always useful to look \"\u00fcber den Tellerrand\" (beyond the rim of one's own plate).
man
Pages - web versions of Unix manual (man
) pages. Create an interactive bash session (srun
will run bash in real-time, --pty
connects its stdout
and stderr
to your current session).
hpc-login-1:~$ srun --pty bash -i\nmed0740:~$ echo \"Hello World\"\nHello World\nmed0740:~$ exit\nhpc-login-1:~$\n
Note that you probably want a longer running time for your interactive jobs. With the --time option, your jobs can run for up to 28 days. This will make your job be routed automatically into the long
partition as it is the only one that can fit your job.
hpc-login-1:~$ srun --pty --time 28-00 bash -i\nmed0740:~$\n
Pro-Tip: Using Bash aliases for quick access.
hpc-login-1:~$ alias slogin=\"srun --pty bash -i\"\nhpc-login-1:~$ slogin\nmed0740:~$ exit\nhpc-login-1:~$ cat >>~/.bashrc <<\"EOF\"\n# Useful aliases for logging in via Slurm\nalias slogin=\"srun --pty bash -i\"\nalias slogin-x11=\"srun --pty --x11 bash -i\"\nEOF\n
Create an interactive R session on the cluster (assuming conda is active and the environment my-r
is created, e.g., with conda create -n my-r r
).
hpc-login-1:~$ conda activate my-r\nhpc-login-1:~$ srun --pty R\nR version 3.6.2 (2019-12-12) -- \"Dark and Stormy Night\"\nCopyright (C) 2019 The R Foundation for Statistical Computing\n[...]\nType 'demo()' for some demos, 'help()' for on-line help, or\n'help.start()' for an HTML browser interface to help.\nType 'q()' to quit R.\n\n\n> Sys.info()[\"nodename\"]\n nodename\n\"med0740\"\n> q()\nSave workspace image? [y/n/c]:\nhpc-login-1:~$\n
Create an interactive iPython session on the cluster (assuming conda is active and the environment my-python
is created, e.g., with conda create -n my-python python=3 ipython
).
hpc-login-1:~$ conda activate my-python\nhpc-login-1:~$ srun --pty ipython\nPython 3.8.2 | packaged by conda-forge | (default, Mar 5 2020, 17:11:00)\nType 'copyright', 'credits' or 'license' for more information\nIPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.\n\nIn [1]: import socket; socket.gethostname()\nOut[1]: 'med0740'\n\nIn [2]: exit\nhpc-login-1:~$\n
Allocate 4 cores (default is 1 core), and a total of 4GB of RAM on one node (alternatively use --mem-per-cpu
to set RAM per CPU); sbatch
accepts the same argument.
hpc-login-1:~$ srun --cpus-per-task=4 --nodes=1 --mem=4G --pty bash\nmed0740:~$ echo $SLURM_CPUS_ON_NODE\n4\nmed0740:~$ your-parallel-script --threads 4\n
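Instead of hard-coding the thread count, a program can pick it up from the job environment. The following Python sketch assumes that the job was started with --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK; allocated_threads is a hypothetical helper.

```python
# Illustration only: derive the thread count from the Slurm job environment.
import os

def allocated_threads(environ=os.environ):
    '''Return the CPUs per task granted by Slurm, or 1 outside of such a job.'''
    return int(environ.get('SLURM_CPUS_PER_TASK', '1'))

print(allocated_threads())
```

Falling back to 1 keeps the script usable on a login node or a laptop, where the variable is not set.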
Submit a script to the cluster in batch mode (sbatch
schedules the job for later execution).
hpc-login-1:~$ cat >job-script.sh <<\"EOF\"\n#!/bin/bash\necho \"Hello, I'm running on $(hostname) and it's $(date)\"\nEOF\nhpc-login-1:~$ sbatch job-script.sh\nSubmitted batch job 7\n\n# Some time later:\nhpc-login-1:~$ cat slurm-7.out\nHello, I'm running on med0740 and it's Fri Mar 6 07:36:42 CET 2020\nhpc-login-1:~$\n
"},{"location":"slurm/reservations/","title":"Reservations / Maintenances","text":"Hint
Read this in particular if you want to know why your job does not get scheduled and you see Reason=ReqNodeNotAvail,_Reserved_for_maintenance
in scontrol show job
.
Administration registers maintenances with the Slurm scheduler as so-called reservations. You can see the current reservations with scontrol show reservation
. The following is a scheduled reservation affecting ALL nodes of the cluster.
# scontrol show reservation\nReservationName=root_13 StartTime=2021-09-07T00:00:00 EndTime=2021-09-09T00:00:00 Duration=2-00:00:00\n Nodes=hpc-cpu-[1-36],med[0101-0116,0201-0264,0301-0304,0401-0404,0501-0516,0601-0632,0701-0764]\n NodeCnt=236 CoreCnt=5344 Features=(null) PartitionName=(null)\n Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES TRES=cpu=10176\n Users=root Groups=(null) Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a\n MaxStartDelay=(null)\n
You will also be notified when logging into the login nodes, e.g.,
--\n ***NOTE: 1 scheduled maintenance(s)***\n\n 1: 2021-09-07 00:00:00 to 2021-09-09 00:00:00 ALL nodes\n\nYou jobs do not start because of \"Reserved_for_maintenance\"?\nSlurm jobs will only start if they do not overlap with scheduled reservations.\nMore information:\n\n - https://bihealth.github.io/bih-cluster/slurm/reservations/\n - https://bihealth.github.io/bih-cluster/admin/maintenance/\n--\n
"},{"location":"slurm/reservations/#what-is-the-effect-of-a-reservation","title":"What is the Effect of a Reservation?","text":"Maintenance reservations will block the affected nodes (or even the whole cluster) for jobs. If there is a maintenance in one week then your job must have an end time before the reservation starts. In this way, the job gives a guarantee to the scheduler that it will not interfere with the maintenance reservation.
For example, scontrol show job JOBID
might report the following
JobId=4011580 JobName=snakejob\n UserId=USER(UID) GroupId=GROUP(GID) MCS_label=N/A\n Priority=1722 Nice=0 Account=GROUP QOS=normal\n JobState=PENDING Reason=ReqNodeNotAvail,_Reserved_for_maintenance Dependency=(null)\n Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n RunTime=00:00:00 TimeLimit=28-00:00:00 TimeMin=N/A\n SubmitTime=2021-08-30T09:01:01 EligibleTime=2021-08-30T09:01:01\n AccrueTime=2021-08-30T09:01:01\n StartTime=2021-09-09T00:00:00 EndTime=2021-10-07T00:00:00 Deadline=N/A\n SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-30T10:20:40\n Partition=long AllocNode:Sid=172.16.35.153:5453\n ReqNodeList=(null) ExcNodeList=(null)\n NodeList=(null)\n NumNodes=1-1 NumCPUs=8 NumTasks=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\n TRES=cpu=8,mem=4G,node=1,billing=8\n Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*\n MinCPUsNode=1 MinMemoryNode=4G MinTmpDiskNode=0\n Features=(null) DelayBoot=00:00:00\n OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)\n Power=\n NtasksPerTRES:0\n
Look out for the Reason
line:
Reason=ReqNodeNotAvail,_Reserved_for_maintenance\n
This job is scheduled to run up to 4 weeks and has been submitted on 2021-08-30.
Right now the following reservation is active
# scontrol show reservation\nReservationName=root_13 StartTime=2021-09-07T00:00:00 EndTime=2021-09-09T00:00:00 Duration=2-00:00:00\n Nodes=hpc-cpu-[1-36],med[0101-0116,0201-0264,0301-0304,0401-0404,0501-0516,0601-0632,0701-0764]\n NodeCnt=236 CoreCnt=5344 Features=(null) PartitionName=(null)\n Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES TRES=cpu=10176\n Users=root Groups=(null) Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a\n MaxStartDelay=(null)\n
Thus, the scheduler decided to set a StartTime
of the job to 2021-09-09T00:00:00
, which is the end time of the reservation. Effectively, the job is forced to run outside the maintenance reservation.
You can resolve this by using a --time=
parameter to srun
or sbatch
such that the job ends before the maintenance reservation starts.
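To pick a suitable value, you can compute how much time is left until the reservation begins. The following Python sketch uses the submission time and the reservation start from the example above; max_time_limit is a hypothetical helper. In practice you would leave some margin, because the job will not start at the very moment you submit it.

```python
# Illustration only: longest --time value that still ends before a reservation.
from datetime import datetime

def max_time_limit(now, reservation_start):
    '''Return the remaining time as D-HH:MM:SS, or None if none is left.'''
    total = int((reservation_start - now).total_seconds())
    if total <= 0:
        return None
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f'{days}-{hours:02d}:{minutes:02d}:{seconds:02d}'

submitted = datetime.fromisoformat('2021-08-30T09:01:01')
maintenance = datetime.fromisoformat('2021-09-07T00:00:00')
print(max_time_limit(submitted, maintenance))
```

Any --time value below the printed limit (minus the expected waiting time in the queue) allows the scheduler to start the job before the maintenance.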
Rosetta Stone?
The Rosetta Stone is a stone slab that carries the same text in Egyptian hieroglyphs, Demotic script, and ancient Greek. This was key for deciphering Egyptian hieroglyphs in the 19th century. Nowadays, the term is often used to label translation tables such as the one below.
The table below shows some SGE commands and their Slurm equivalents.
| User Command | SGE | Slurm |
|---|---|---|
| remote login | qrsh/qlogin | srun --pty bash |
| run interactively | N/A | srun --pty program |
| submit job | qsub script.sh | sbatch script.sh |
| delete job | qdel job-id | scancel job-id |
| job status by job id | N/A | squeue --job job-id |
| detailed job status | qstat -u '*' -j job-id | sstat job-id |
| job status of your jobs | qstat | squeue --me |
| job status by user | qstat -u user | squeue -u user |
| hold job | qhold job-id | scontrol hold job-id |
| release job | qrls job-id | scontrol release job-id |
| queue list | qconf -sql | scontrol show partitions |
| node list | qhost | sinfo -N OR scontrol show nodes |
| cluster status | qhost -q | sinfo |
| show node resources | N/A | sinfo -o \"%n %G\" |

| Job Specification | SGE | Slurm |
|---|---|---|
| script directive marker | #$ | #SBATCH |
| (run in queue) | -q queue | -p queue |
| allocated nodes | N/A | -N min[-max] |
| allocate cores | -pe smp count | -n count |
| limit running time | -l h_rt=time | -t days-hh:mm:ss |
| redirect stdout | -o file | -o file |
| redirect stderr | -e file | -e file |
| combine stdout/stderr | -j yes | -o without -e |
| copy environment | -V | --export=ALL\\|NONE\\|variables |
| email notification | -m abe | --mail-type=events |
| send email to | -M email | --mail-user=email |
| job name | -N name | --job-name=name |
| restart job | -r yes\\|no | --requeue\\|--no-requeue |
| working directory | -wd path | --workdir |
| run exclusively | -l exclusive | --exclusive OR --shared |
| allocate memory | -l h_vmem=size | --mem=mem OR --mem-per-cpu=mem |
| wait for job | -hold_jid jid | --depend state:job |
| select target host | -l hostname=host1\\|host1 | --nodelist=nodes AND/OR --exclude |
| allocate GPU | -l gpu=1 | --gres=gpu:tesla:count or --gres=gpu:a40:count |
"},{"location":"slurm/snakemake/","title":"Snakemake with Slurm","text":"This page describes how to use Snakemake with Slurm.
"},{"location":"slurm/snakemake/#prerequisites","title":"Prerequisites","text":"source miniconda/bin/activate
. We first create a new environment snakemake-slurm
and activate it. We need the snakemake
package for this.
host:~$ conda create -y -n snakemake-slurm snakemake\n[...]\n#\n# To activate this environment, use\n#\n# $ conda activate snakemake-slurm\n#\n# To deactivate an active environment, use\n#\n# $ conda deactivate\nhost:~$ conda activate snakemake-slurm\n(snakemake-slurm) host:~$\n
"},{"location":"slurm/snakemake/#snakemake-workflow-setup","title":"Snakemake Workflow Setup","text":"We create a workflow and ensure that it works properly with multi-threaded Snakemake (no cluster submission here!)
host:~$ mkdir -p snake-slurm\nhost:~$ cd snake-slurm\nhost:snake-slurm$ cat >Snakefile <<\"EOF\"\nrule default:\n input: \"the-result.txt\"\n\nrule mkresult:\n output: \"the-result.txt\"\n shell: r\"sleep 1m; touch the-result.txt\"\nEOF\nhost:snake-slurm$ snakemake --cores=1\n[...]\nhost:snake-slurm$ ls\nSnakefile the-result.txt\nhost:snake-slurm$ rm the-result.txt\n
"},{"location":"slurm/snakemake/#snakemake-and-slurm","title":"Snakemake and Slurm","text":"You have two options:
snakemake --profile=cubi-v1
and the Snakemake resource configuration as shown below (STRONGLY PREFERRED), or the snakemake --cluster='sbatch ...'
command. Did you notice that we sneaked in a sleep 1m
? In a second terminal session, we can see that the job has indeed been submitted to Slurm.
host:~$ squeue -u holtgrem_c\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 325 debug snakejob holtgrem R 0:47 1 med0127\n
"},{"location":"slurm/snakemake/#threads-resources","title":"Threads & Resources","text":"The cubi-v1
profile (stored in /etc/xdg/snakemake/cubi-v1
on all cluster nodes) supports the following specification in your Snakemake rule:
threads: the number of threads to execute the job on
resources.mem/resources.mem_mb: the memory to allocate for the whole job, OR resources.mem_per_thread: the memory to allocate for each thread
resources.time: the running time of the rule, in a syntax supported by Slurm, e.g. HH:MM:SS or D-HH:MM:SS
resources.partition: the partition to submit your job into (Slurm will pick a fitting partition for you by default)
resources.nodes: the number of nodes to schedule your job on (defaults to 1 and you will want to keep that value unless you want to use MPI)
You will need Snakemake >=7.0.2 for this.
Here is how to call Snakemake:
# snakemake --profile=cubi-v1 -j1\n
To set rule-specific resources:
rule myrule:\n threads: 1\n resources:\n mem='8G',\n time='04:00:00',\n input: # ...\n output: # ...\n shell: # ...\n
You can combine this with Snakemake resource callables, of course:
def myrule_mem(wildcards, attempt):\n mem = 2 * attempt\n return '%dG' % mem\n\nrule snps:\n threads: 1\n resources:\n mem=myrule_mem,\n time='04:00:00',\n input: # ...\n output: # ...\n shell: # ...\n
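To see what the callable above produces, it can be evaluated outside of Snakemake. In the following Python sketch the function is copied from the rule definition; passing None for wildcards is an assumption that works here only because the function does not use them. Snakemake increases attempt each time a failed job is restarted (if restarts are enabled, for example via --retries or --restart-times, depending on the Snakemake version).

```python
# Illustration only: evaluate the resource callable for successive attempts.

def myrule_mem(wildcards, attempt):
    mem = 2 * attempt
    return '%dG' % mem

# None stands in for the wildcards object, which this function ignores.
for attempt in (1, 2, 3):
    print(attempt, myrule_mem(None, attempt))
```

Scaling memory with the attempt number lets most jobs run with a small allocation while the few that hit the memory limit are retried with more.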
"},{"location":"slurm/snakemake/#custom-logging-directory","title":"Custom logging directory","text":"By default, slurm will write log files into the working directory of snakemake, which will look like slurm-$jobid.out
.
To change this behaviour, the environment variable SBATCH_DEFAULTS
can be set to re-route the --output
parameter. If you want to write your files into slurm_logs
with a filename pattern of $name-$jobid
for instance, consider the following snippet for your submission script:
#!/bin/bash\n#\n#SBATCH --job-name=snakemake_main_job\n#SBATCH --ntasks=1\n#SBATCH --nodes=1\n#SBATCH --time=48:10:00\n#SBATCH --mem-per-cpu=300M\n#SBATCH --output=slurm_logs/%x-%j.log\n\nmkdir -p slurm_logs\nexport SBATCH_DEFAULTS=\" --output=slurm_logs/%x-%j.log\"\n\ndate\nsrun snakemake --use-conda -j1 --profile=cubi-v1\ndate\n
The name of the Snakemake Slurm job will be snakemake_main_job
; the jobs spawned from it will be named after the respective rule in the Snakefile.
This section describes how Slurm handles temporary files on the local disk.
Temporary Files Best Practices
See Best Practices: Temporary Files for information on how to use temporary files effectively.
"},{"location":"slurm/temporary-files/#slurm-behaviour","title":"Slurm Behaviour","text":"Our Slurm configuration has the following behaviour.
"},{"location":"slurm/temporary-files/#environment-variable-tmpdir","title":"Environment Variable TMPDIR","text":"Slurm itself will by default not change the TMPDIR
environment variable but retain the variable's value from the srun
or sbatch
call.
/tmp
Directories","text":"The only place where users can write data to on local storage of the compute nodes is /tmp
.
Storage is a consumable shared resource as the storage used by one job cannot be used by another job. It is thus critical that Slurm cleans up after each job such that all space on the local node is available to the next job. This is done using the job_container/tmpfs Slurm plugin.
This plugin creates a so-called Linux namespace for each job and creates a bind mount of /tmp
to a location on the local storage. This mount is only visible to the currently running job and each job, even of the same user, gets its own /tmp
. After a job terminates, Slurm will remove the directory and all of its content.
There is a notable exception. If you use ssh
to connect to a node rather than using srun
or sbatch
, you will see the system /tmp
directory and can also write to it. This usage of storage is not tracked and consequently you can circumvent the Slurm quota management. Using /tmp
in this fashion (i.e., outside of Slurm-controlled jobs) is prohibited. If it cannot be helped (e.g., if you need to run some debugging application that needs to create FIFO or socket files) then keep usage of /tmp
outside of Slurm jobs below 100MB.
localtmp
","text":"Enforcing localtmp
Gres
From January 31, we will enforce the allocated storage in /tmp
on the local disk with quotas. Jobs writing to /tmp
beyond the quota in the job allocation will not function properly and probably crash with \"out of disk quota\" messages.
Slurm tracks the available local storage above 100MB on nodes in the localtmp
generic resource (aka Gres). The resource is counted in steps of 1MB, such that a node with 350GB of local storage would look as follows in scontrol show node
:
hpc-login-1 # scontrol show node hpc-cpu-1\nNodeName=hpc-cpu-1 Arch=x86_64 CoresPerSocket=24\n [...]\n Gres=localtmp:350K\n [...]\n CfgTRES=cpu=96,mem=360000M,billing=96,gres/localtmp=358400\n [...]\n
Each job is automatically granted 100MB of storage on the local disk which is sufficient for most standard programs. If your job needs more temporary storage then you should either
$HOME/scratch
volume (see Best Practices: Temporary Files)localtmp
generic resource (described here)You can allocate the resource with --gres=localtmp:SIZE
where SIZE
is given in MB.
hpc-login-1 # srun --gres=localtmp:100k --pty bash -i\nhpc-cpu-1 # scontrol show node hpc-cpu-1\nNodeName=hpc-cpu-1 Arch=x86_64 CoresPerSocket=24\n [...]\n Gres=localtmp:250K\n [...]\n CfgTRES=cpu=96,mem=360000M,billing=96,gres/localtmp=358400\n [...]\n AllocTRES=cpu=92,mem=351G,gres/localtmp=102400\n [...]\n
The first output tells us about the resource configured to be available to user jobs and the last line shows us that 100k=102400
MB of local storage are allocated.
You can also see the used resources in the details of your job:
scontrol show job 14848\nJobId=14848 JobName=example.sh\n [...]\n TresPerNode=gres:localtmp:100k\n
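The size arithmetic used above can be reproduced with a few lines of shell. This is a minimal sketch, assuming that the k suffix means a factor of 1024, which is what the scontrol output shown on this page implies; localtmp_mb is a hypothetical helper name and not part of Slurm.

```shell
# Hypothetical helper (not part of Slurm): convert a localtmp size that may
# carry a k/K suffix into plain MB, assuming k means a factor of 1024.
localtmp_mb() {
    case $1 in
        *[kK]) echo $(( ${1%[kK]} * 1024 )) ;;  # strip the suffix, scale by 1024
        *)     echo $(( $1 )) ;;                # plain number, already in MB
    esac
}

localtmp_mb 100k   # 102400, the value reported in AllocTRES
localtmp_mb 350K   # 358400, the value reported in CfgTRES
```

The result can be compared against the gres/localtmp values reported by scontrol show node when choosing a value for --gres=localtmp:SIZE.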
"},{"location":"slurm/x11/","title":"Slurm and X11","text":"Make sure to connect to the login node with X11 forwarding.
host:~$ ssh -X -l user_c hpc-login-1.cubi.bihealth.org\n
Once connected to the login node, pass the --x11
flag.
hpc-login-1:~$ srun --pty --x11 xterm\n
"},{"location":"storage/home-quota/","title":"Keeping your home folder clean","text":"We set quite restrictive quotas for user homes, but in exchange you get file system snapshots and mirroring. Your home folder should therefore only be used for scripts, your user config, and other small files. Everything else should be stored in the work
or scratch
subdirectories, which effectively link to your group's shared storage space. This document describes some common pitfalls and how to circumvent them.
Hint
The tilde character (~
) is shorthand for your home directory.
Various programs tend to deposit large folders in a user's home and can quickly use up your allotted storage quota. These include:
~/.local/lib/python*
~/R/x86_64-pc-linux-gnu-library
~/ondemand
Please note that directories whose name is starting with a dot are not shown by the normal ls
command, but require the ls -a
flag. You can search your home folder for large directories like so:
$ du -shc ~/.* ~/* --exclude=.. --exclude=.\n
You should move these locations to your work
folder and create symbolic links in their place. Conda installations should be installed in work
from the very beginning as they do not react well to being moved around.
Here is an example for the .local
folder.
$ mv ~/.local ~/work/.local\n$ ln -s ~/work/.local ~/.local\n
"},{"location":"storage/home-quota/#temporary-files","title":"Temporary Files","text":"Another usual culprit is the hidden .cache
directory which contains temporary files. This folder can be moved to the scratch
volume in a similar manner as described above.
$ mv ~/.cache ~/scratch/.cache\n$ ln -s ~/scratch/.cache ~/.cache\n
Important
Files placed in your scratch
directory will be automatically removed after 2 weeks. Do not place any valuable files in there.
Please use hpc-transfer-1
and hpc-transfer-2
for moving large amounts of files. This not only leaves the compute nodes available for actual computation, but also has no risk of your jobs being killed by Slurm. You should also use tmux
to not risk connection loss during long running transfers.
Define source and target location and copy contents. Please replace the parts in curly brackets with your actual folder names. It is important to end paths with a trailing slash (/
) as this is interpreted by rsync
as \u201call files in this folder\u201d.
$ SOURCE=/data/gpfs-1/work/projects/{my_project}/\n$ TARGET=/data/cephfs-2/unmirrored/projects/{my-project}/\n$ rsync -ahP --stats --dry-run $SOURCE $TARGET\n
Remove the --dry-run
flag to start the actual copying process.
Important
File ownership information will be lost during this process. This is due to non-root users not being allowed to change ownership of arbitrary files. If this is a problem for you, please contact our admins again after completing this step.
Perform a second rsync
to check if all files were successfully transferred. Paranoid users might want to add the --checksum
flag to rsync
or use hashdeep
. Please note the flag --remove-source-files
which will do exactly as the name suggests, but leaves empty directories behind.
$ rsync -ahX --stats --remove-source-files --dry-run $SOURCE $TARGET\n
--dry-run
flag to start the actual deletion.$ find $SOURCE -type f | wc -l\n0\n$ rm -r $SOURCE\n
Warning
When defining your SOURCE location, do not use the *
wildcard character. It will not match hidden (dot) files and leave them behind. It's better to use a trailing slash which matches \u201cAll files in this folder\u201d.
All files within your own work directory can be transferred as follows. Please replace parts in curly braces with your cluster user name.
$ SOURCE=/data/gpfs-1/work/users/{username}/\n$ TARGET=/data/cephfs-1/home/users/{username}/work/\n$ rsync -ahP --stats --dry-run $SOURCE $TARGET\n
Note
The --dry-run
flag lets you check that rsync is working as expected without copying any files. Remove it to start the actual transfer.
Perform a second rsync
to check if all files were successfully transferred. Paranoid users might want to add the --checksum
flag or use hashdeep
. Please note the flag --remove-source-files
which will do exactly as the name suggests, but leaves empty directories behind.
$ rsync -ahP --stats --remove-source-files --dry-run $SOURCE $TARGET\n
$ find $SOURCE -type f | wc -l\n0\n
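As a complement to the second rsync pass described above, the number of files on both sides can be compared before any source file is removed. The following is a minimal sketch; same_file_count is a hypothetical helper name, and paths are assumed to contain no whitespace, as in the examples on this page.

```shell
# Hypothetical helper: compare the number of regular files in two trees.
# Paths are assumed to contain no whitespace, as in the examples above.
same_file_count() {
    src_count=$(find $1 -type f | wc -l)   # find also counts hidden (dot) files
    dst_count=$(find $2 -type f | wc -l)
    echo source: $src_count target: $dst_count
    # Exit status 0 only if both trees hold the same number of files.
    [ $src_count -eq $dst_count ]
}
```

It could be called as same_file_count $SOURCE $TARGET after the first transfer. Equal counts do not prove identical content, and the counts will differ if the target already held other files, so the --checksum pass remains the stronger check.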
Conda installations tend not to react well to moving their main folder from its original location. There are numerous ways around this problem which are described here.
A simple solution we can recommend is this:
Install a fresh version of conda or mamba in your new work folder. Don't forget to first remove the conda init block in ~/.bashrc
.
$ nano ~/.bashrc\n$ conda init\n$ conda config --set auto_activate_base false\n
You can then use your new conda to export your old environments by specifying a full path like so:
$ conda env export -p /fast/work/user/$USER/miniconda/envs/<env_name> -f <env_name>.yaml\n
If you run into errors it might be better to also use the --no-builds
flag. Finally re-create your old environments from the yaml files:
$ conda env create -f {environment.yml}\n
Outdated
This document is only valid for the old, third-generation file system and will be removed soon. Quotas of our new CephFS storage are communicated via the HPC Access web portal.
As described elsewhere, all data in your user, group, and project volumes is subject to quotas. This page quickly shows how to query for the current usage of data volume and file counts for your user, group, and projects.
"},{"location":"storage/querying-storage/#query-for-user-data-and-file-usage","title":"Query for User Data and File Usage","text":"The file /etc/bashrc.gpfs-quota
contains some Bash functions that you can use for querying the quota usage. This file is automatically sourced in all of your Bash sessions.
For querying your user's data and file usage, enter the following command:
# bih-gpfs-quota-user holtgrem_c\n
You will get a report as follows. As soon as usage reaches 90%, the data/file usage will be highlighted in yellow. If you pass 99%, the data/file usage will be highlighted in red.
=================================\nQuota Report for: user holtgrem_c\n=================================\n\n DATA quota GR- FILES quota GR-\nENTITY NAME FSET USED SOFT HARD ACE USED SOFT HARD ACE\n------- ---------- ------- ----- ---- ----- ----- --- ----- ---- ----- ----- ---\nusers holtgrem_c home 103M 10% 1.0G 1.5G - 2.5k 25% 10k 12k -\nusers holtgrem_c work 639G 62% 1.0T 1.1T - 1.0M 52% 2.0M 2.2M -\nusers holtgrem_c scratch 42G 0% 200T 220T - 207k 0.1% 200M 220M -\n[...]\n
"},{"location":"storage/querying-storage/#query-for-group-data-and-file-usage","title":"Query for Group Data and File Usage","text":"# bih-gpfs-report-quota group ag_someag\n=================================\nQuota Report for: group ag_someag\n=================================\n\n DATA quota GR- FILES quota GR-\nENTITY NAME FSET USED SOFT HARD ACE USED SOFT HARD ACE\n------- ---------- ------- ----- ---- ----- ----- --- ----- ---- ----- ----- ---\ngroups ag_someag home 0 0% 1.0G 1.5G - 4 0% 10k 12k -\ngroups ag_someag work 349G 34% 1.0T 1.5T - 302 0% 2.0M 2.2M -\ngroups ag_someag scratch 0 0% 200T 220T - 1 0% 200M 220M -\n\n[...]\n
"},{"location":"storage/querying-storage/#query-for-project-data-and-file-usage","title":"Query for Project Data and File Usage","text":"# bih-gpfs-report-quota project someproj\n==================================\nQuota Report for: project someproj\n==================================\n\n DATA quota GR- FILES quota GR-\nENTITY NAME FSET USED SOFT HARD ACE USED SOFT HARD ACE\n------- ---------- ------- ----- ---- ----- ----- --- ----- ---- ----- ----- ---\ngroups someproj home 0 0% 1.0G 1.5G - 4 0% 10k 12k -\ngroups someproj work 349G 34% 1.0T 1.5T - 302 0% 2.0M 2.2M -\ngroups someproj scratch 0 0% 200T 220T - 1 0% 200M 220M -\n\n[...]\n
"},{"location":"storage/scratch-cleanup/","title":"Automated Cleanup of Scratch","text":"The scratch
space is automatically cleaned up nightly with the following mechanism.
scratch
folder are created and retained for 3 days.Warning
We specifically use the mtime
attribute to determine if files in scratch should be cleaned up. Copying or downloading files to scratch while preserving the original mtime
might lead to unexpected results.
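Because the cleanup is driven by mtime, find can give a rough preview of which files are getting close to removal. This is a minimal sketch; stale_scratch_files is a hypothetical helper name and the threshold of 10 days is an arbitrary assumption, not a value used by the cleanup itself.

```shell
# Hypothetical helper: list files in ~/scratch whose mtime is more than
# 10 days in the past. The 10-day threshold is an assumption chosen to
# give some warning ahead of the automatic cleanup.
stale_scratch_files() {
    find ~/scratch -type f -mtime +10
}
```

Calling stale_scratch_files prints one path per line. Files copied with their original mtime preserved will show up here even if they arrived in scratch only recently.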
This document describes the fourth iteration of the file system structure on the BIH HPC cluster. It was made necessary because the previous file system was no longer supported by the manufacturer and we have since switched to distributed Ceph storage.
Important
For now, the old, third-generation file system is still mounted at /fast
. It will be decommissioned soon, please consult this document describing the migration process!
There are the following three entities on the cluster:
Each user, group, and project can have storage folders in different locations.
"},{"location":"storage/storage-locations/#data-types-and-storage-tiers","title":"Data Types and Storage Tiers","text":"Files stored on the HPC fall into one of three categories:
Home folders store programs, scripts, and user config i.\u00a0e. long-lived and very important files. Loss of this data requires redoing manual work (like programming).
Work folders store data of potentially large size which has a medium life time and is important. Examples are raw sequencing data and intermediate results that are to be kept (e.\u00a0g. sorted and indexed BAM files). Work data requires time-consuming actions to be restored, such as downloading large amounts of data or long-running computation.
Scratch folders store temporary files with a short life-time. Examples are temporary files (e.\u00a0g. unsorted BAM files). Scratch data is created to be removed eventually.
Ceph storage comes in two types which differ in their I/O speed, total capacity, and cost. They are called Tier 1 and Tier 2 and sometimes hot storage and warm storage. In the HPC filesystem they are mounted in /data/cephfs-1
and /data/cephfs-2
.
Storage quotas are imposed in these locations to restrict the maximum size of folders. Amount and utilization of quotas is communicated via the HPC Access web portal.
"},{"location":"storage/storage-locations/#home-directories","title":"Home Directories","text":"Location: /data/cephfs-1/home/
Only users have home directories on Tier 1 storage. This is the starting point when starting a new shell or SSH session. Important config files are stored here as well as analysis scripts and small user files. Home folders have a strict storage quota of 1\u00a0GB.
"},{"location":"storage/storage-locations/#work-directories","title":"Work Directories","text":"Location: /data/cephfs-1/work/
Groups and projects have work directories on Tier 1 storage. User home folders contain a symlink to their respective group's work folder. Files shared within a group/project are stored here as long as they are in active use. Work folders are generally limited to 1\u00a0TB per group. Project work folders are allocated on an individual basis.
"},{"location":"storage/storage-locations/#scratch-space","title":"Scratch Space","text":"Location: /data/cephfs-1/scratch/
Groups and projects have scratch space on Tier 1 storage. User home folders contain a symlink to their respective group's scratch space. Meant for temporary, potentially large data e.\u00a0g. intermediate unsorted or unmasked BAM files, data downloaded from the internet etc. Scratch space is generally limited to 10\u00a0TB per group. Projects are allocated scratch on an individual basis. Files in scratch will be automatically removed 2 weeks after their creation.
"},{"location":"storage/storage-locations/#tier-2-storage","title":"Tier 2 Storage","text":"Location: /data/cephfs-2/
This is where big files go when they are not in active use. Groups are allocated 10 TB of Tier 2 storage by default. File quotas here can be significantly larger as space is much cheaper and more abundant than on Tier 1.
Note
Tier 2 storage is currently not accessible from HPC login nodes.
"},{"location":"storage/storage-locations/#overview","title":"Overview","text":"Tier Function Path Default Quota 1 User home/data/cephfs-1/home/users/<user>
1 GB 1 Group work /data/cephfs-1/work/groups/<group>
1 TB 1 Group scratch /data/cephfs-1/scratch/groups/<group>
10 TB 1 Project work /data/cephfs-1/work/projects/<project>
On request 1 Project scratch /data/cephfs-1/scratch/projects/<project>
On request 2 Group /data/cephfs-2/unmirrored/groups/<group>
10 TB 2 Project /data/cephfs-2/unmirrored/projects/<project>
On request 2 Group /data/cephfs-2/mirrored/groups/<group>
On request 2 Project /data/cephfs-2/mirrored/projects/<project>
On request"},{"location":"storage/storage-locations/#snapshots-and-mirroring","title":"Snapshots and Mirroring","text":"Snapshots are incremental copies of the state of the data at a particular point in time. They provide safety against various \"Oops, did I just delete that?\" scenarios, meaning they can be used to recover lost or damaged files. Depending on the location and Tier, CephFS creates snapshots at different frequencies and with different retention plans.
Location Path Retention policy Mirrored User homes/data/cephfs-1/home/users/
Hourly for 48 h, daily for 14 d yes Group/project work /data/cephfs-1/work/
Four times a day, daily for 5 d no Group/project scratch /data/cephfs-1/scratch/
Daily for 3 d no Group/project mirrored /data/cephfs-2/mirrored/
Daily for 30 d, weekly for 16 w yes Group/project unmirrored /data/cephfs-2/unmirrored/
Daily for 30 d, weekly for 16 w no Some parts of Tier 1 and Tier 2 snapshots are also mirrored into a separate fire compartment within the data center. This provides an additional layer of security, i.\u00a0e. protection against physical damage to the servers.
"},{"location":"storage/storage-locations/#accessing-snapshots","title":"Accessing Snapshots","text":"To access snapshots simply navigate to the .snap/
sub-folder of the respective location. This special folder exists on all levels of the CephFS file hierarchy, so even in your user home directory. Inside you will find one folder per snapshot created and in those a complete replica of the respective folder at the time of snapshot creation.
For example:
/data/cephfs-1/home/.snap/<some_snapshot>/users/<your_user>/
same as:/data/cephfs-1/home/users/<your_user>/.snap/<some_snapshot>
/data/cephfs-1/work/.snap/<some_snapshot>/groups/<your_group>/
/data/cephfs-2/unmirrored/.snap/<some_snapshot>/projects/<your_project>/
Here is a simple example of how to restore a file:
$ cd /data/cephfs-2/unmirrored/groups/cubi/.snap/scheduled-2024-03-11-00_00_00_UTC/\n$ ls -l\nimportant_file.txt\n$ cp important_file.txt /data/cephfs-2/unmirrored/groups/cubi/\n
"},{"location":"storage/storage-locations/#technical-implementation","title":"Technical Implementation","text":""},{"location":"storage/storage-locations/#tier-1","title":"Tier 1","text":"/data/cephfs-1
/data/cephfs-2
Important
We will remove access to /fast
on most cluster nodes following September 30th.
Files on the cluster's main storage /data/gpfs-1
aka. /fast
will move to a new file system. That includes users' home directories, work directories, and work-group directories. Once files have been moved to their new locations, /fast
will be retired.
Simultaneously we will move towards a more unified naming scheme for project and group folder names. From now on, all such folders names shall be in kebab-case. This is Berlin after all. Group folders will also be renamed, removing the \"ag_\" prefix.
Details about the move will be communicated via the cluster mailing list and the user forum. For technical help, please consult the Data Migration Tips and tricks.
"},{"location":"storage/storage-migration/#why-is-this-happening","title":"Why is this happening?","text":"/fast
is based on high-performance proprietary hardware (DDN) & file system (GPFS). The company selling it has terminated support which also means buying replacement parts will become increasingly difficult.
There are two file systems set up to replace /fast
, named Tier 1 and Tier 2 after their difference in I/O speed:
/fast
ever was, but it only has about 75\u00a0% of its usable capacity. The Hot storage Tier 1 is reserved for files requiring frequent random access, user homes, and scratch. Tier 2 (Warm storage) should be used for everything else. Both file systems are based on the open-source, software-defined Ceph storage platform and differ in the type of drives used. Tier 1 or Cephfs-1 uses NVME SSDs and is optimized for performance, Tier 2 or Cephfs-2 uses traditional hard drives and is optimized for cost.
So these are the three terminologies in use right now:
/data/cephfs-1
/data/cephfs-2
More information about CephFS can be found here.
"},{"location":"storage/storage-migration/#new-file-locations","title":"New file locations","text":"Naturally, paths are going to change after files move to their new location. Due to the increase in storage quality options, there will be some more folders to consider.
"},{"location":"storage/storage-migration/#users","title":"Users","text":"/data/cephfs-1/home/users/<user>
/data/cephfs-1/work/groups/<doe>/users/<user>
/data/cephfs-1/scratch/groups/<doe>/users/<user>
Important
User work
& scratch
spaces are now part of the user's group folder. This means groups need to coordinate internally to distribute their allotted quota according to each user's needs.
The implementation is done via symlinks created by default when the user account is moved to its new destination:
~/work -> /data/cephfs-1/work/groups/<group>/users/<user>
~/scratch -> /data/cephfs-1/scratch/groups/<group>/users/<user>
/data/cephfs-1/work/groups/<group>
/data/cephfs-1/scratch/groups/<group>
/data/cephfs-2/unmirrored/groups/<group>
/data/cephfs-1/work/projects/<project>
/data/cephfs-1/scratch/projects/<project>
Space on Tier 1 is limited. Your colleagues, other cluster users, and admins will be very grateful if you use it only for files you actively need to perform read/write operations on. This means main project storage should probably always be on Tier 2 with workflows to stage subsets of data onto Tier 1 for analysis.
These examples are based on our experience of processing diverse NGS datasets. Your mileage may vary but there is a basic principle that remains true for all projects.
"},{"location":"storage/storage-migration/#dna-sequencing-wes-wgs","title":"DNA sequencing (WES, WGS)","text":"Typical Whole Genome Sequencing data of a human sample at 100x coverage requires about 150 GB of storage, Whole Exome Sequencing files occupy between 6 and 30 GB. These large files require considerable I/O resources for processing, in particular for the mapping step. A prudent workflow for this kind of analysis would therefore be the following:
fastqs
) from the Tier 2 location to Tier 1. seqtk
is your friend!fastq
files from Tier 2 to Tier 1. Run your scripts on the whole dataset, and copy the results (bam
or cram
files) back to Tier 2.Tip
Don't forget to use your scratch
area for transient operations, for example to sort your bam
file after mapping. More information on how to efficiently set up your temporary directory here.
Analysis of RNA expression datasets is typically a long and iterative process, where the data must remain accessible for a significant period. However, there is usually no need to keep raw data files and mapping results available once the gene & transcript counts have been generated. The count files are much smaller than the raw data or the mapped data, so they can live longer on Tier 1.
A typical workflow would be:
fastq
files from Tier 2 to Tier 1.salmon
or STAR
, and store the results on Tier 2.R
, using tximport
and DESeq2
or featureCounts
& edgeR
, for example.R
objects) and the output of salmon
, STAR
, or any mapper/aligner of your choice to Tier 2.Tip
If using STAR
, don't forget to use your scratch
area for transient operations. More information on how to efficiently set up your temporary directory here
The analysis workflows of bulk RNA & single cell datasets are conceptually similar: Large raw files need to be processed once and only the outcome of the processing (gene count matrices) is required for downstream analysis. Therefore, a typical workflow would be:
fastq
files from Tier 2 to Tier 1.Cell Ranger
or alevin-fry
, perform count matrix QC and store the results on Tier 2.seurat
, scanpy
, or Loupe Browser
.There is no obvious workflow that covers most use cases for machine learning. However,
/fast
to CephFS","text":"Best practices and tools will be provided.
"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 240e7a19c..ca0f3b8f7 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ