remove more outdated SLURM access restrictions (#124)
Showing 3 changed files with 20 additions and 25 deletions.
@@ -6,34 +6,24 @@ Connecting to a node with GPUs is easy. | |
You simply request a GPU using the `--gres=gpu:$CARD:COUNT` (for `CARD=tesla` or `CARD=a40`) argument to `srun` and `sbatch`. | ||
This will automatically place your job in the `gpu` partition (which is where the GPU nodes live) and allocate a number of `COUNT` GPUs to your job. | ||
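
As a minimal sketch of the request described above, the following writes a small batch script asking for one GPU of the `tesla` card type and submits it. The file name `gpu_job.sh`, the job name, and the one-hour time limit are placeholders chosen for illustration and do not come from the cluster documentation.

```shell
#!/bin/bash
# Write a small batch script that requests one Tesla GPU.
# File name, job name, and time limit are hypothetical placeholders.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --gres=gpu:tesla:1
#SBATCH --time=01:00:00

# Show the GPU(s) that Slurm made visible to this job.
nvidia-smi
EOF

# Submit only where Slurm is installed; otherwise just print the command.
if command -v sbatch >/dev/null 2>&1; then
    sbatch gpu_job.sh
else
    echo "would run: sbatch gpu_job.sh"
fi
```

The same `--gres=gpu:tesla:1` argument can be passed to `srun` for an interactive session, although the notes in this file ask users to prefer batch jobs.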
|
||
!!! note | ||
|
||
Recently, `--gres=gpu:tesla:COUNT` was often not able to allocate the right partition on its own. | ||
If scheduling a GPU fails, consider additionally specifying the GPU partition explicitly with `--partition gpu` (or `#SBATCH --partition gpu` in a batch file). | ||
|
||
!!! hint | ||
!!! info | ||
Fair use rules apply. | ||
As GPU nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. | ||
Be nice and cooperative with other users. | ||
Tip: `getent passwd USER_NAME` will give you a user's contact details. | ||
|
||
Make sure to read the FAQ entry "[I have problems connecting to the GPU node! What's wrong?](../../help/faq.md#i-have-problems-connecting-to-the-gpu-node-whats-wrong)". | ||
|
||
!!! important "Interactive Use of GPU Nodes is Discouraged" | ||
!!! warning "Interactive Use of GPU Nodes is Discouraged" | ||
|
||
While interactive computation on the GPU nodes is convenient, it makes it very easy to forget a job after your computation is complete and let it run idle. | ||
While your job is allocated, it blocks the **allocated** GPUs and other users cannot use them although you might not be actually using them. | ||
Please prefer batch jobs for your GPU jobs over interactive jobs. | ||
|
||
Further, interactive GPU jobs are currently limited to 24 hours. | ||
Furthermore, interactive GPU jobs are currently limited to 24 hours. | ||
We will monitor the situation and adjust that limit to optimize GPU usage and usability. | ||
|
||
!!! important "Allocation of GPUs through Slurm is mandatory" | ||
|
||
In other words: using GPUs from SSH sessions is prohibited. | ||
Please also note that allocation of GPUs through Slurm is mandatory, in other words: Using GPUs via SSH sessions is prohibited. | ||
The scheduler is not aware of manually allocated GPUs and this interferes with other users' jobs. | ||
|
||
## Prerequisites | ||
|
||
You have to register with [[email protected]](mailto:[email protected]) for requesting access. | ||
Afterwards, you can connect to the GPU nodes as shown below. | ||
|
||
## Preparation | ||
|
||
We will set up a Miniconda installation with `pytorch` to test the GPU. | ||
|
@@ -96,6 +86,8 @@ True | |
Recently, `--gres=gpu:tesla:COUNT` was often not able to allocate the right partition on its own. | ||
If scheduling a GPU fails, consider additionally specifying the GPU partition explicitly with `--partition gpu` (or `#SBATCH --partition gpu` in a batch file). | ||
|
||
Also make sure to read the FAQ entry "[I have problems connecting to the GPU node! What's wrong?](../../help/faq.md#i-have-problems-connecting-to-the-gpu-node-whats-wrong)" if you encounter problems. | ||
|
||
## Bonus #1: Who is using the GPUs? | ||
|
||
Use `squeue` to find out about currently queued jobs (the `egrep` only keeps the header and entries in the `gpu` partition). | ||
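
The filter described above might look roughly like the sketch below. It assumes the `squeue` header line contains the word `JOBID` and that GPU jobs show `gpu` as a whole word in their row. It uses `grep -E` in place of `egrep`, which newer GNU grep releases mark as obsolescent. The sample rows in the fallback branch are invented for demonstration only.

```shell
#!/bin/bash
# Keep the squeue header (contains JOBID) and rows that mention "gpu" as a whole word.
filter_gpu_jobs() {
    grep -E -w 'JOBID|gpu'
}

if command -v squeue >/dev/null 2>&1; then
    squeue | filter_gpu_jobs
else
    # No Slurm on this machine: run the filter on made-up sample output.
    printf '%s\n' \
        'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)' \
        '101 gpu train alice R 1:00 1 node-a' \
        '102 medium align bob R 2:00 1 node-b' | filter_gpu_jobs
fi
```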
|
@@ -1,11 +1,14 @@ | ||
# How-To: Connect to High-Memory Nodes | ||
|
||
## Prerequisites | ||
|
||
You have to register with [[email protected]](mailto:[email protected]) for requesting access. | ||
|
||
Afterwards, you can connect to the High-Memory nodes using the `highmem` SLURM partition (see below). | ||
Jobs allocating more than 200GB of RAM should be routed automatically to the `highmem` nodes. | ||
The cluster has 4 high-memory nodes with 1.5 TB of RAM. | ||
You can connect to these nodes using the `highmem` SLURM partition (see below). | ||
Jobs allocating more than 200 GB of RAM are automatically routed to the `highmem` nodes. | ||
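
A minimal sketch of an explicit request for a high-memory node, assuming a hypothetical memory need of 300 GB (above the 200 GB routing threshold mentioned here). The value of `MEM` and the interactive `bash -i` command are illustrative choices and not taken from the cluster documentation.

```shell
#!/bin/bash
# Hypothetical memory request; per the text, jobs above 200 GB go to highmem anyway.
MEM="300G"

# Build the command as an array so each argument stays a separate word.
CMD=(srun --partition highmem --mem="$MEM" --pty bash -i)

# Run only where Slurm is installed; otherwise just print the command.
if command -v srun >/dev/null 2>&1; then
    "${CMD[@]}"
else
    echo "would run: ${CMD[*]}"
fi
```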
|
||
!!! info | ||
Fair use rules apply. | ||
As high-memory nodes are a limited resource, excessive use by single users is prohibited and can lead to mitigating actions. | ||
Be nice and cooperative with other users. | ||
Tip: `getent passwd USER_NAME` will give you a user's contact details. | ||
|
||
## How-To | ||
|
||
|