From a261f3b1c9f47a05cfd387efb4f654c37d3e2eb0 Mon Sep 17 00:00:00 2001
From: yaojin17 <1220472926@qq.com>
Date: Mon, 30 Dec 2024 14:24:08 -0500
Subject: [PATCH] reorganize

---
 CS_server.md                 | 77 +++++++++++++++++++++++++++++++
 Storage.md => CS_storage.md  |  1 +
 README.md                    | 89 ++++++++----------------------------
 Rivanna.md                   | 18 ++++++++
 Another-page.md => Tricks.md |  2 +-
 test-child.md => git-lfs.md  |  0
 index.md                     | 31 +++++++++++++
 7 files changed, 147 insertions(+), 71 deletions(-)
 create mode 100644 CS_server.md
 rename Storage.md => CS_storage.md (97%)
 create mode 100644 Rivanna.md
 rename Another-page.md => Tricks.md (92%)
 rename test-child.md => git-lfs.md (100%)
 create mode 100644 index.md

diff --git a/CS_server.md b/CS_server.md
new file mode 100644
index 0000000..d90f751
--- /dev/null
+++ b/CS_server.md
@@ -0,0 +1,77 @@
+---
+title: CS department servers
+has_children: true
+nav_order: 1
+---
+
+## CS department servers
+
+The UVA CS Department provides servers for computing needs. See [UVA Computing Resources](https://www.cs.virginia.edu/wiki/doku.php?id=compute_resources) for more information.
+
+### How to access servers
+
+1. First, log in to `portal.cs.virginia.edu` via SSH; it serves as the forwarding (jump) host for the compute servers. If you do not have credentials yet, contact the CS IT team to request access for your computing ID.
+
+2. Check available servers. Commonly used commands:
+    - `sinfo`: Servers in the `idle` or `mix` state are potentially available (sometimes reserved servers are not displayed correctly).
+    - `squeue -p gpu`: Check the queue of the `gpu` partition to see whether any jobs are waiting to run.
+    - `scontrol show job <jobid>`: Check the status of a specific job. This can be used to check whether a job is waiting for a server that you are also waiting for.
+    - `scontrol show node <nodename>`: Check the status of a specific node. This can be used to check the available resources on a node.
+    - `scontrol show res`: List all reservations, including the server, users, and time. Reserved servers are only available to users on the reservation list.
+3. Then you have two choices:
+
+    - Submit a Slurm script ([UVA Slurm information](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm)) to run a batch job. An example is below:
+      ```bash
+      #!/bin/bash
+
+      #SBATCH --job-name=your_job_name
+      #SBATCH --output=%x_%j.out
+      #SBATCH --error=%x_%j.err
+      #SBATCH --partition=gpu
+      #SBATCH --nodes=1
+      #SBATCH --ntasks-per-node=1
+      #SBATCH --cpus-per-task=10
+      #SBATCH --gres=gpu:4
+      #SBATCH --mem=50G
+      #SBATCH --time=D-HH:MM:SS
+      #SBATCH --constraint="a40|a100_80gb|a100_40gb|h100_80gb" # The requested GPU type can be a40, a100_80gb, a100_40gb, or h100_80gb
+      #SBATCH --reservation=<reservation_name> # If you have a reservation on the targeted node
+      #SBATCH --mail-type=begin,end,fail
+      #SBATCH --mail-user=<computing_id>@virginia.edu
+
+      module load cuda-toolkit-11.8.0 # Optional, depending on your project
+
+      python your_script.py --your_arguments xxx
+      ```
+      Use `sbatch mysbatchscript.sh` to submit the Slurm script.
+    - Use an `salloc` command like the one below to use a server interactively.
+      ```
+      salloc -p gpu -N 1 -c 10 --mem=30G -J InteractiveJob -t 0-00:30:00 --gres=gpu:2 -C a100_80gb
+      ```
+      This command allocates two A100 80GB GPUs, 10 CPU cores, and 30GB of RAM for a 30-minute session. If no duration is specified, the default maximum runtime is four days. More explanations of the arguments can be found in [UVA Slurm information](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm).
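+      If it helps, a quick way to confirm the allocation and find its job ID (a sketch using standard Slurm commands; the exact output columns may vary on this cluster, and `<jobid>` is a placeholder):
+      ```bash
+      squeue -u $USER            # the JOBID column shows the allocation's job ID
+      scontrol show job <jobid>  # nodes, GPUs, and remaining time for that allocation
+      ```
+      The same job ID is what `--jobid=<jobid>` expects in step 4 below.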
+      Once the server resources are allocated, enter the following command to access the server interactively:
+      ```
+      srun --pty bash -i -l --
+      ```
+4. After a job is started, you can use `srun --pty --overlap --jobid=<jobid> <command>` to run a command on the server. This is useful when you want to run a command on a server that is already running a job. For example, the `<command>` could be `nvtop` to check the GPU usage of the running job. The `<command>` could also be `bash` to open a new terminal on the server.
+5. If we have reserved a server ([Slurm reservations](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm#reservations)), add `--reservation=rry4fg_7` to the `salloc` command or `#SBATCH --reservation=rry4fg_7` to the `sbatch` script. Replace `rry4fg_7` with the reservation tag provided by IT.
+Note that you cannot use the reserved servers without the tag, even if your computing ID is on the reservation's user list.
+
+### Public server owned by our group
+
+Node `cheetah06` contains eight 80GB A100 GPUs and is owned by our group. It is open to the public and available through the Slurm queues. To keep others from using the machine, you need to place a reservation. Since anyone can reserve the server, it is best to reserve it well before upcoming submission deadlines. Remember to add all group members to the list of users who can access the reserved server.
+
+### Modules
+Modules are pre-installed software packages in the Slurm system that users can access without root or sudo privileges.
+
+To view the list of available modules, use the command `module avail`. To load a required module, such as `nvtop`, use `module load nvtop`.
+
+If a needed module is not available, contact the CS IT team for assistance with installation or alternative solutions. See [Software Modules](https://www.cs.virginia.edu/wiki/doku.php?id=linux_environment_modules) for more information.
+
+### Development servers
+The `gpusrv[01-19]` servers ([GPU servers](https://www.cs.virginia.edu/wiki/doku.php?id=compute_resources#gpu_servers)) have relatively little GPU memory and can be accessed directly via SSH for developing and debugging code.
+
+
+### Server Issues
+If you encounter any hardware or software issues with the servers, send an email to cshelpdesk@virginia.edu.
+

diff --git a/Storage.md b/CS_storage.md
similarity index 97%
rename from Storage.md
rename to CS_storage.md
index 1876e0a..cfa640a 100644
--- a/Storage.md
+++ b/CS_storage.md
@@ -1,5 +1,6 @@
 ---
 title: Storage
+parent: CS department servers
 has_children: false
 nav_order: 2
 ---
diff --git a/README.md b/README.md
index 01dbaff..3651566 100644
--- a/README.md
+++ b/README.md
@@ -1,71 +1,20 @@
-## Public Servers
-
-The UVA CS Department provides servers for computing needs. See [UVA Computing Resources](https://www.cs.virginia.edu/wiki/doku.php?id=compute_resources) for more information.
-
-### How to access servers
-
-1. Log in to `portal.cs.virginia.edu` through ssh first as a forward server. If you do not have credentials yet, contact the CS IT team to request access for your computing ID.
-
-2. Check available servers. Commonly used commands:
-    - `sinfo`: Servers in the state of idle and mix are potentially available (sometimes reserved servers are not displayed correctly).
-    - `squeue -p gpu`: Check the queue of the gpu partition to see if there are any jobs waiting to be run.
-    - `scontrol show job `: Check the status of a specific job. This could be used to check if a job is waiting for a server that you are also waiting for.
- - `scontrol show node `: Check the status of a specific node. This could be used to check the available resources on a node. - - `scontrol show res` : List all reservations including the server, users, and time. Reserved servers are only available to users on the reservation list. -3. Then you have two choices: - - - Submit a slurm script([UVA slurm information](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm)) to run a job. An example is below: - ```bash - #!/bin/bash - - #SBATCH --job-name=your_job_name - #SBATCH --output=%x_%j.out - #SBATCH --error=%x_%j.err - #SBATCH --partition=gpu - #SBATCH --nodes=1 - #SBATCH --ntasks-per-node=1 - #SBATCH --cpus-per-task=10 - #SBATCH --gres=gpu:4 - #SBATCH --mem=50G - #SBATCH --time=D-HH:MM:SS - #SBATCH --constraint="a40|a100_80gb|a100_40gb|h100_80gb" # The requested gpu type could be either a40, a100_80gb, a100_40gb or h100_80gb - #SBATCH --reservation= # If you have a reservation on the targeted node - #SBATCH --mail-type=begin,end,fail - #SBATCH --mail-user=@virginia.edu - - module load cuda-toolkit-11.8.0 # Optional, according to your project - - python your_script.py --your_arguments xxx - ``` - Use `sbatch mysbatchscript.sh` to submit the slurm script. - - Use the `salloc` command like the one below to use the server interactively. - ``` - salloc -p gpu -N 1 -c 10 --mem=30G -J InteractiveJob -t 0-00:30:00 --gres=gpu:2 -C a100_80gb - ``` - This command allocates two A100 80GB GPUs, 10 CPUs, and 30GB of RAM for a 30-minute session. If no duration is specified, the default maximum runtime is set to four days. More explanations about the arguments can be found in [UVA slurm information](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm). - Once the server resources are allocated, enter the following command to access the server interactively: - ``` - srun --pty bash -i -l -- - ``` -4. After a job is started, you could use `srun --pty --overlap --jobid= ` to run a command on the server. This is useful when you want to run a command on a server that is already running a job. For example, the `` could be `nvtop` to check the GPU usage of the running job. The `` could also be `bash` to open a new terminal on the server. -5. If we have reserved a server([slurm reservations](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm#reservations)), add `--reservation=rry4fg_7` in the `salloc` command or `#SBATCH --reservation=rry4fg_7` in the `sbatch` script. Replace `rry4fg_7` with the reservation tag provided by IT. -Note that you cannot use the reserved servers without the tag, even if your ID is on the reservation user ID list. - -### Public server owned by our group - -Node `cheetah06` contains 8 80GB A100 GPUs and is owned by our group. It is open to the public and on the slurm queues. To prevent others from using the machine, you need to put a reservation. Actually, anyone can reserve the server, so for upcoming submission deadlines, it is better to reserve the server in advance. Remember to put all the group members on the list of who can access the reserved server. - -### Modules -Modules are pre-installed software packages in the Slurm system that users can access without root or sudo privileges. - -To view the list of available modules, use the command `module avail`. To load a required module, such as nvtop, use `module load nvtop`. - -If a needed module is not available, contact the CS IT team for assistance with installation or alternative solutions. 
See [Software Modules](https://www.cs.virginia.edu/wiki/doku.php?id=linux_environment_modules) for more information.
-
-### Development servers
-The `gpusrv[01-19]` servers([GPU servers](https://www.cs.virginia.edu/wiki/doku.php?id=compute_resources#gpu_servers)), which are low GPU memory servers, can be directly accessed via SSH for developing and debugging code.
-
-
-### Server Issues
-If you encounter any hardware or software issues with the servers, send an email to cshelpdesk@virginia.edu.
+# UVA CV Lab Computing Resources Documentation
+
+This is a documentation website using the [Just the Docs theme](https://github.com/just-the-docs/just-the-docs), providing guides and tips for using UVA computing resources.
+
+## Website Structure
+
+```
+.
+├── README.md              # Project documentation
+├── _config.yml            # Jekyll configuration file
+├── _sass/                 # Custom styles
+├── index.md               # Homepage
+├── Tricks.md              # Nice Tricks page (parent page)
+│   ├── git-lfs.md         # git-lfs installation guide
+│   └── trick_cuda.md      # CUDA toolkit installation guide
+├── CS_server.md           # CS department server usage guide
+│   └── CS_storage.md      # CS storage usage guide
+└── Rivanna.md             # Rivanna usage guide
+```
diff --git a/Rivanna.md b/Rivanna.md
new file mode 100644
index 0000000..cf68cb6
--- /dev/null
+++ b/Rivanna.md
@@ -0,0 +1,18 @@
+---
+title: Rivanna
+nav_order: 2
+---
+
+# Rivanna
+
+Rivanna is the high-performance computing (HPC) facility at UVA. This page will provide detailed information on how to use Rivanna.
+
+## Content Planning
+
+1. Access method
+2. Job submission
+3. Resource allocation
+4. Common software modules
+5. Best practices
+
+*This page is under construction...*
diff --git a/Another-page.md b/Tricks.md
similarity index 92%
rename from Another-page.md
rename to Tricks.md
index 00c4e22..d7e0d52 100644
--- a/Another-page.md
+++ b/Tricks.md
@@ -1,7 +1,7 @@
 ---
 title: Nice Tricks
 has_children: true
-nav_order: 1
+nav_order: 3
 ---
 
 # Tricks for Navigating on UVA compute
diff --git a/test-child.md b/git-lfs.md
similarity index 100%
rename from test-child.md
rename to git-lfs.md
diff --git a/index.md b/index.md
new file mode 100644
index 0000000..502e6b8
--- /dev/null
+++ b/index.md
@@ -0,0 +1,31 @@
+---
+title: Home
+nav_order: 0
+---
+
+# UVA CV Lab Computing Resources Documentation
+
+Welcome to the UVA CV Lab Computing Resources Documentation. This website provides detailed usage instructions and tips for using various computing resources at UVA.
+
+## Main Content
+
+1. [Nice Tricks](Tricks.md)
+    - [Install git-lfs](git-lfs.md)
+    - [Install CUDA toolkit](trick_cuda.md)
+
+2. [CS department servers](CS_server.md)
+    - How to access servers
+    - Slurm usage
+    - Module management
+    - Development server information
+
+3. [Rivanna](Rivanna.md)
+    - To be added
+
+## Quick Start
+
+If you are a new user, we recommend the following order:
+
+1. First, check out the [CS department servers](CS_server.md)
+2. Then, check out the [Rivanna supercomputer](Rivanna.md)
+3. Finally, familiarize yourself with the [Nice Tricks](Tricks.md) page
\ No newline at end of file
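The reorganized pages can be previewed locally with the standard Jekyll workflow for a Just the Docs site. This is only a sketch; it assumes Ruby, Bundler, and a Gemfile that includes the theme, none of which are shown in this patch.

```bash
# Install the gems declared in the repository's Gemfile (assumed to include just-the-docs)
bundle install
# Build and serve the site locally; Jekyll serves at http://127.0.0.1:4000 by default
bundle exec jekyll serve
```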