reorganize
yaojin17 committed Dec 30, 2024
1 parent 7cc00f1 commit a261f3b
Showing 7 changed files with 147 additions and 71 deletions.
77 changes: 77 additions & 0 deletions CS_server.md
@@ -0,0 +1,77 @@
---
title: CS department servers
has_children: true
nav_order: 1
---

## CS department servers

The UVA CS Department provides servers for computing needs. See [UVA Computing Resources](https://www.cs.virginia.edu/wiki/doku.php?id=compute_resources) for more information.

### How to access servers

1. First, log in to `portal.cs.virginia.edu` via SSH; it serves as the jump (forwarding) server for the compute nodes. If you do not have credentials yet, contact the CS IT team to request access for your computing ID.
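For reference, a minimal login command; `abc1de` below is only a placeholder for your own computing ID:
```bash
ssh abc1de@portal.cs.virginia.edu
```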

2. Check available servers. Commonly used commands:
    - `sinfo`: Servers in the `idle` or `mix` state are potentially available (reserved servers are sometimes not displayed correctly).
    - `squeue -p gpu`: Check the queue of the `gpu` partition to see whether any jobs are waiting to run.
    - `scontrol show job <jobid>`: Check the status of a specific job, for example to see whether it is waiting for a server that you are also waiting for.
    - `scontrol show node <node>`: Check the status of a specific node, for example to see the resources still available on it.
    - `scontrol show res`: List all reservations, including the server, users, and time window. Reserved servers are only available to users on the reservation list.
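For example, two one-liners built from these commands; `cheetah06` (a group node described below) is used purely as an illustration, and the `squeue` output format is just one convenient choice:
```bash
# Inspect the GPUs and currently allocated resources on a specific node
scontrol show node cheetah06 | grep -E "Gres|CfgTRES|AllocTRES"

# Show the gpu partition queue with job ID, user, state, runtime, and pending reason
squeue -p gpu -o "%.10i %.9P %.20j %.8u %.2t %.10M %.6D %R"
```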
3. Then you have two choices:

    - Submit a Slurm script ([UVA Slurm information](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm)) to run a job. An example is below:
```bash
#!/bin/bash

#SBATCH --job-name=your_job_name
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:4
#SBATCH --mem=50G
#SBATCH --time=D-HH:MM:SS # e.g. 1-12:00:00 for one day and twelve hours
#SBATCH --constraint="a40|a100_80gb|a100_40gb|h100_80gb" # The requested GPU type can be a40, a100_80gb, a100_40gb, or h100_80gb
#SBATCH --reservation=<reservation name> # If you have a reservation on the targeted node
#SBATCH --mail-type=begin,end,fail
#SBATCH --mail-user=<computingID>@virginia.edu

module load cuda-toolkit-11.8.0 # Optional, depending on your project

python your_script.py --your_arguments xxx
```
Use `sbatch mysbatchscript.sh` to submit the Slurm script.
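After submitting, a quick sanity check that the job was queued (the `-u $USER` filter is just a convenience):
```bash
sbatch mysbatchscript.sh   # prints "Submitted batch job <jobid>"
squeue -u $USER            # confirm the job is pending (PD) or running (R)
```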
    - Alternatively, use an `salloc` command like the one below to work on a server interactively.
```bash
salloc -p gpu -N 1 -c 10 --mem=30G -J InteractiveJob -t 0-00:30:00 --gres=gpu:2 -C a100_80gb
```
This command allocates two A100 80GB GPUs, 10 CPUs, and 30GB of RAM for a 30-minute session. If no duration is specified, the default maximum runtime is set to four days. More explanations about the arguments can be found in [UVA slurm information](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm).
Once the server resources are allocated, enter the following command to access the server interactively:
```bash
srun --pty bash -i -l --
```
4. After a job has started, you can use `srun --pty --overlap --jobid=<jobid> <command>` to run a command on the node where that job is running. This is useful when you want to inspect a server that is already running one of your jobs: for example, `<command>` can be `nvtop` to check the job's GPU usage, or `bash` to open a new terminal on the node.
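For instance, with `123456` standing in for the ID of one of your running jobs:
```bash
srun --pty --overlap --jobid=123456 nvtop   # watch the job's GPU utilization (may require `module load nvtop`; see Modules below)
srun --pty --overlap --jobid=123456 bash    # open an extra shell on the job's node
```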
5. If we have reserved a server ([Slurm reservations](https://www.cs.virginia.edu/wiki/doku.php?id=compute_slurm#reservations)), add `--reservation=rry4fg_7` to the `salloc` command or `#SBATCH --reservation=rry4fg_7` to the `sbatch` script. Replace `rry4fg_7` with the reservation tag provided by IT.
    Note that you cannot use a reserved server without the tag, even if your ID is on the reservation's user list.

### Public server owned by our group

Node `cheetah06` has 8 A100 80GB GPUs and is owned by our group. It is open to the public through the Slurm queues, so to keep the machine free for group use you need to place a reservation. Since anyone can reserve the server, it is best to reserve it well in advance of upcoming submission deadlines, and remember to put all group members on the list of users who can access the reservation.
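As a sketch, an interactive request pinned to the reserved node; the reservation tag is a placeholder and the resource numbers are illustrative rather than recommendations:
```bash
# Request all 8 GPUs on cheetah06 under the group's reservation for one day
salloc -p gpu -w cheetah06 --reservation=<reservation name> --gres=gpu:8 -c 32 --mem=200G -t 1-00:00:00
```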

### Modules
Modules are pre-installed software packages on the CS servers that users can load without root or sudo privileges.

To view the list of available modules, use the command `module avail`. To load a module you need, such as `nvtop`, use `module load nvtop`.
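A typical workflow, using `nvtop` from the example above:
```bash
module avail           # list all available modules
module load nvtop      # load the nvtop module
module list            # show the modules currently loaded
module unload nvtop    # unload it when no longer needed
```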

If a needed module is not available, contact the CS IT team for assistance with installation or alternative solutions. See [Software Modules](https://www.cs.virginia.edu/wiki/doku.php?id=linux_environment_modules) for more information.

### Development servers
The `gpusrv[01-19]` servers ([GPU servers](https://www.cs.virginia.edu/wiki/doku.php?id=compute_resources#gpu_servers)), which have relatively little GPU memory, can be accessed directly via SSH for developing and debugging code.
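For example, to log in to one of these machines; `abc1de` and `gpusrv05` are placeholders, and whether you need to hop through `portal.cs.virginia.edu` depends on where you are connecting from:
```bash
# Direct SSH to a development server
ssh abc1de@gpusrv05.cs.virginia.edu

# If the server is not reachable directly from your network, jump through the portal host
ssh -J abc1de@portal.cs.virginia.edu abc1de@gpusrv05.cs.virginia.edu
```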


### Server Issues
If you encounter any hardware or software issues with the servers, send an email to [email protected].

1 change: 1 addition & 0 deletions Storage.md → CS_storage.md
@@ -1,5 +1,6 @@
---
title: Storage
parent: CS department servers
has_children: false
nav_order: 2
---
89 changes: 19 additions & 70 deletions README.md
@@ -1,71 +1,20 @@
# UVA CV Lab Computing Resources Documentation

This is a documentation website using the [Just the Docs theme](https://github.com/just-the-docs/just-the-docs), providing guides and tips for using UVA computing resources.

## Website Structure

```
.
├── README.md # Project documentation
├── _config.yml # Jekyll configuration file
├── _sass/ # Custom styles
├── index.md # Homepage
├── Tricks.md # Nice Tricks page (parent page)
│ ├── test-child.md # git-lfs installation guide
│ └── trick_cuda.md # CUDA toolkit installation guide
├── CS_server.md # CS department server usage guide
│ └── CS_storage.md # CS storage usage guide
└── Rivanna.md # Rivanna usage guide
```

18 changes: 18 additions & 0 deletions Rivanna.md
@@ -0,0 +1,18 @@
---
title: Rivanna
nav_order: 2
---

# Rivanna

Rivanna is the high-performance computing (HPC) facility at UVA. This page will provide detailed information on how to use Rivanna.

## Planned Content

1. Access method
2. Job submission
3. Resource allocation
4. Common software modules
5. Best practices

*This page is under construction...*
2 changes: 1 addition & 1 deletion Another-page.md → Tricks.md
@@ -1,7 +1,7 @@
---
title: Nice Tricks
has_children: true
nav_order: 1
nav_order: 3
---

# Tricks for Navigating on UVA compute
File renamed without changes.
31 changes: 31 additions & 0 deletions index.md
@@ -0,0 +1,31 @@
---
title: Home
nav_order: 0
---

# UVA CV Lab Computing Resources Documentation

Welcome to the UVA CV Lab Computing Resources Documentation. This website provides detailed usage instructions and tips for using various computing resources at UVA.

## Main Content

1. [Nice Tricks](Tricks.md)
- [Install git-lfs](test-child.md)
- [Install CUDA toolkit](trick_cuda.md)

2. [CS department servers](CS_server.md)
- How to access servers
- Slurm usage
- Module management
- Development server information

3. [Rivanna](Rivanna.md)
- To be added

## Quick Start

If you are a new user, we recommend the following order:

1. First, check out the [CS department servers](CS_server.md)
2. Then, check out the [Rivanna supercomputer](Rivanna.md)
3. Finally, familiarize yourself with the [Nice Tricks](Tricks.md)
