Merge branch 'master' of github.com:cmc-aau/hpc-docs
KasperSkytte committed Oct 23, 2024
2 parents 3bde503 + 1b46328 commit 6010d25
Showing 10 changed files with 124 additions and 3 deletions.
13 changes: 13 additions & 0 deletions TODO
@@ -0,0 +1,13 @@
https://sciencedata.dk/
https://cbs-hpc.github.io/HPC_Facilities/UCloud/#login-on-ucloud
https://hpc.aau.dk/

download databases
ARB:
# create a tmp folder for ARB in the home folder and create the required placeholder file
arbtmp="${HOME}/.tmp/arb7"
filename="names_start.dat"
file_path="${arbtmp}/${filename}"
mkdir -p "$arbtmp"
touch "${file_path}"
apptainer run -B "${file_path}":/opt/arb/lib/nas/names_start.dat -B ~/.Xauthority -B /projects /home/bio.aau.dk/ksa/projects/biocloud-software/containers/apptainer/arb/arb-7.0.sif
6 changes: 6 additions & 0 deletions docs/guides/git.md
@@ -0,0 +1,6 @@
# Setting up Git/GitHub



## Repository-specific deploy keys
See GitHub's documentation on managing deploy keys: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/managing-deploy-keys#deploy-keys
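
A minimal sketch of setting one up (the key path, host alias, and repository names below are placeholders):

```
# generate a key pair dedicated to a single repository
ssh-keygen -t ed25519 -f ~/.ssh/id_myrepo_deploy -N "" -C "deploy key for myorg/myrepo"

# print the public key, then add it under the repository's
# Settings -> Deploy keys on GitHub
cat ~/.ssh/id_myrepo_deploy.pub

# ~/.ssh/config: route this repository through a host alias so SSH picks the right key
# Host github.com-myrepo
#   HostName github.com
#   User git
#   IdentityFile ~/.ssh/id_myrepo_deploy

# clone using the alias
git clone git@github.com-myrepo:myorg/myrepo.git
```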
19 changes: 19 additions & 0 deletions docs/guides/sra.md
@@ -0,0 +1,19 @@
# Downloading data from NCBI SRA
See also: https://bioinformatics.ccr.cancer.gov/docs/b4b/Module1_Unix_Biowulf/Lesson6/

## Download sra-tools container
```
singularity pull docker://ncbi/sra-tools
```

## Get data
Example accessions:

- BioProject: PRJNA192924
- SRA run: SRR1154613

`prefetch` the run first, then extract FASTQ files with `fasterq-dump`:
```
singularity run sra-tools_latest.sif prefetch SRR1154613
singularity run sra-tools_latest.sif fasterq-dump --threads $(nproc) --progress --split-3 SRR1154613
```

## Download SRA data from ENA instead
The same runs are mirrored at the European Nucleotide Archive (ENA), which serves gzipped FASTQ files directly, so no conversion step is needed: https://www.ebi.ac.uk/ena/browser/view/SRR1154613
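
The files can then be fetched with plain `wget` (a sketch; copy the exact URLs from the "FASTQ FTP" column on the ENA browser page, as the path layout below is only ENA's usual convention for 7-digit run accessions):

```
wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR115/003/SRR1154613/SRR1154613_1.fastq.gz
wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR115/003/SRR1154613/SRR1154613_2.fastq.gz
```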
21 changes: 21 additions & 0 deletions docs/guides/vscode.md
@@ -0,0 +1,21 @@

# VSCode tips

By default, VSCode wants to help you write code. So much so that it starts to get intrusive. When you type, IntelliSense kicks in and immediately suggests code for you upon each keystroke. When you hover over your code, a popup definition appears. When you type an open parenthesis, another autocomplete suggestion window pops up. When you type out a tag, a closing tag magically appears, but sometimes it's wrong. It's clear what VSCode is trying to do, but it can get to a point where it's annoying and getting in the way.

If you want VSCode to behave more like a passive editor, and you enjoy typing the majority or all of your code yourself, add these to your `settings.json` to disable the autocomplete suggestions and popups:

```
"editor.autoClosingBrackets": "never",
"editor.suggestOnTriggerCharacters": false,
"editor.quickSuggestions": false,
"editor.hover.enabled": false,
"editor.parameterHints.enabled": false,
"editor.suggest.snippetsPreventQuickSuggestions": false,
"html.suggest.html5": false
```

If you experience problems with VSCode, such as it hanging when connecting remotely, log in to the particular login node and kill all VSCode server processes:
```
ps ux | grep '[.]vscode-server' | awk '{print $2}' | xargs -r kill
```

then try again.
2 changes: 2 additions & 0 deletions docs/guides/webportal/jobcomposer.md
@@ -0,0 +1,2 @@
# Job composer
Guide coming soon...
31 changes: 31 additions & 0 deletions docs/slurm/multistep_jobs.md
@@ -0,0 +1,31 @@
# Multi-step jobs

Many steps in a complex workflow will only run on a single thread, regardless of how many CPUs you've asked for, which wastes resources. Instead of reserving resources to fit the most demanding step for the whole duration, you can write the individual steps as separate shell scripts and submit them as individual jobs with different resource requirements, chained together using job dependencies:

**`launchscript.sh`**
```bash
#!/bin/bash

set -euo pipefail

# Submit the first job step and capture its job ID
step1_jobid=$(sbatch step1_script.sh | awk '{print $4}')

# Submit the second job, ensuring it runs only after the first job completes successfully
step2_jobid=$(sbatch --dependency=afterok:$step1_jobid step2_script.sh | awk '{print $4}')

# Submit the third/last job, ensuring it runs only after the second job completes successfully
sbatch --dependency=afterok:$step2_jobid step3_script.sh
```
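
Each step script is just a regular batch script with its own resource requests; for illustration, `step1_script.sh` could look like this (a minimal sketch; the tool and resource values are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=step1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00

# single-threaded preprocessing step
some_singlethreaded_tool --input data.raw --output step1_output.dat
```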

Types of dependencies:
- `afterok`: The dependent job runs if the first job completes successfully (exit code 0).
- `afternotok`: The dependent job runs if the first job fails (non-zero exit code).
- `afterany`: The dependent job runs after the first job completes, regardless of success or failure.
- `after`: The dependent job starts once the first job begins execution.

In this case, using `--dependency=afterok` ensures that the second job will only start if the first job finishes without errors.
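
While the chain is running, the dependent jobs will be pending until their dependency is satisfied; a quick way to inspect this (columns are job ID, state, and pending reason):

```bash
# --me requires a recent SLURM version; otherwise use -u $USER
squeue --me --format="%.10i %.4t %.20r"
```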

Submit the launch script itself using `bash launchscript.sh`, not `sbatch`; it only submits the actual jobs and exits immediately.

This can be repeated as many times as necessary. Any arguments can be passed on to the step scripts in exactly the same way as when invoking them directly, e.g. `sbatch step1_script.sh -i "some option" -o "some other option"`.
2 changes: 2 additions & 0 deletions docs/slurm/other.md
@@ -63,3 +63,5 @@ SLURM jobs will have a variety of environment variables set within job allocations
| `SLURM_TASK_PID` | The process ID of the task being started |
| `SLURMD_NODENAME` | Name of the node running the job script |
| `SLURM_JOB_GPUS` | GPU IDs allocated to the job (if any). |

`sbcast` can be used to efficiently copy a file to the compute nodes allocated to a job; proper documentation coming soon...
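
A minimal sketch of how it might be used inside a job script to stage input data onto node-local storage (the paths and tool are placeholders):

```bash
#!/bin/bash
#SBATCH --nodes=1

# copy the input file to local scratch on the allocated node(s)
sbcast input.fasta /tmp/input.fasta

# run the tool against the local copy
mytool --input /tmp/input.fasta
```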
4 changes: 1 addition & 3 deletions docs/slurm/partitions.md
@@ -1,7 +1,5 @@
# Hardware partitions
-Before submitting a job you must choose the correct hardware partition for it. To ensure hardware utilization is maximized the compute nodes are divided into separate partitions depending on their specs. You mainly need to choose between partitions depending on how much memory (per CPU) your job needs, expected CPU efficiency (i.e. do you expect to keep all CPUs busy at 100% at all times?), whether you need faster temporary scratch space, or a GPU. If in doubt just use the `default-op` for most things.
-
-A few servers also have local scratch storage for faster I/O and also to avoid overburdening the [ceph storage cluster](../storage.md) when lots (millions) of small files need to be written, in which case you must also submit to specific compute nodes that has local storage, more on that later.
+Before submitting a job you must choose the correct hardware partition for it. To ensure hardware utilization is maximized the compute nodes are divided into separate partitions depending on their specs. You mainly need to choose between partitions depending on how much memory (per CPU) your job needs, expected CPU efficiency (i.e. do you expect to keep all CPUs busy at 100% at all times?), whether you need faster, local scratch space ([more details here](../storage.md#local-scratch-space)), or a GPU. If in doubt just use the `default-op` for most things.

## Memory per CPU
Lastly, it's also important to note that all partitions have a **max memory per CPU** configured, which may result in the scheduler allocating more CPUs for the job than requested until this ratio is satisfied. For example, on a partition with a max of 10GB per CPU, a job requesting 1 CPU and 40GB of memory will be allocated 4 CPUs. This is to ensure that no CPUs end up idle when a compute node is fully allocated in terms of memory, when it could instead have been at work finishing jobs faster.
8 changes: 8 additions & 0 deletions docs/software/conda.md
@@ -20,6 +20,9 @@ dependencies:

Then create the environment with `conda env create -f requirements.yml`. You can also export an **activated** environment created previously and dump the exact versions used into a YAML file with `conda env export > requirements.yml`.

To create an empty environment with no packages installed, use `conda create --name myenv`.

Do not install Conda yourself; use the installation already provided on the platform.

???+ "Note"
    When you export a conda environment to a file, it may also contain a host-specific `prefix` line, which should be removed.
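    One simple way to avoid it when exporting (a sketch; adjust the filename as needed):

    ```
    conda env export | grep -v "^prefix: " > requirements.yml
    ```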

@@ -34,6 +37,11 @@ List available environments with
```
conda env list
```

For fully reproducible, cross-platform environment lock files, see https://conda.github.io/conda-lock/.

TODO: describe how and why to disable strict channel priority.

## Installing packages using pip within conda environments
Software that can only be installed with pip have to be installed in a Conda environment by using pip inside the environment. While issues can arise, per the [Conda guide for using pip in a Conda environment](https://www.anaconda.com/blog/using-pip-in-a-conda-environment), there are some best practices to follow to reduce their likelihood:

21 changes: 21 additions & 0 deletions docs/software/preinstalled.md
@@ -0,0 +1,21 @@
# Pre-installed software
In addition to software management tools, there are a few things that are installed natively.

The following software is pre-installed:

- CLC (on `axomamma` only)

## CLC Genomics Workbench
You can run graphical apps in a SLURM job while the windows show up on your own computer by using the `--x11` option to `srun` and `salloc`, as described at https://cmc-aau.github.io/biocloud-docs/slurm/jobsubmission/#graphical-apps-gui. For example, to run CLC, log in to a login node, then run:
```
srun --cpus-per-task 4 --mem 4G --nodelist axomamma --x11 /usr/local/CLCGenomicsWorkbench24/clcgenomicswb24
```

## ARB
ARB is old and unmaintained. Version 6 is available through conda, but the latest version 7 is not, and the only way to run it on servers running an Ubuntu release newer than 20.04 is through a container:
```
srun --cpus-per-task 4 --mem 4G --x11 apptainer run -B ~/.Xauthority -B /projects /home/bio.aau.dk/ksa/projects/biocloud-software/containers/apptainer/arb/arb-7.0.sif
```

## AlphaFold
Documentation about the AlphaFold databases is coming soon...
