Merge branch 'master' of github.com:cmc-aau/hpc-docs
KasperSkytte committed Oct 23, 2024
2 parents 3bde503 + 1b46328 commit 6010d25
Showing 10 changed files with 124 additions and 3 deletions.
13 changes: 13 additions & 0 deletions TODO
@@ -0,0 +1,13 @@
https://sciencedata.dk/
https://cbs-hpc.github.io/HPC_Facilities/UCloud/#login-on-ucloud
https://hpc.aau.dk/

download databases
ARB:
# create a tmp folder for ARB in the home folder and create the required placeholder file
arbtmp="${HOME}/.tmp/arb7"
filename="names_start.dat"
file_path="${arbtmp}/${filename}"
mkdir -p "$arbtmp"
touch "${file_path}"
apptainer run -B "${file_path}":/opt/arb/lib/nas/names_start.dat -B ~/.Xauthority -B /projects /home/bio.aau.dk/ksa/projects/biocloud-software/containers/apptainer/arb/arb-7.0.sif
6 changes: 6 additions & 0 deletions docs/guides/git.md
@@ -0,0 +1,6 @@
# Setting up Git/GitHub



## Repository-specific deploy keys
See GitHub's documentation on managing deploy keys: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/managing-deploy-keys#deploy-keys
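
A minimal sketch of setting one up (the key path, host alias, and repository names below are placeholders):

```
# generate a key pair dedicated to a single repository
ssh-keygen -t ed25519 -f ~/.ssh/id_myrepo_deploy -N "" -C "deploy key for myorg/myrepo"

# print the public key, then add it under the repository's
# Settings -> Deploy keys on GitHub
cat ~/.ssh/id_myrepo_deploy.pub

# ~/.ssh/config: route this repository through a host alias so SSH picks the right key
# Host github.com-myrepo
#   HostName github.com
#   User git
#   IdentityFile ~/.ssh/id_myrepo_deploy

# clone using the alias
git clone git@github.com-myrepo:myorg/myrepo.git
```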
19 changes: 19 additions & 0 deletions docs/guides/sra.md
@@ -0,0 +1,19 @@
# Downloading data from NCBI SRA
See also: https://bioinformatics.ccr.cancer.gov/docs/b4b/Module1_Unix_Biowulf/Lesson6/

## Download sra-tools container
```
singularity pull docker://ncbi/sra-tools
```

## Get data
Example accessions:

- BioProject: PRJNA192924
- SRA run: SRR1154613

`prefetch` the run first, then extract FASTQ files with `fasterq-dump`:
```
singularity run sra-tools_latest.sif prefetch SRR1154613
singularity run sra-tools_latest.sif fasterq-dump --threads $(nproc) --progress --split-3 SRR1154613
```

## Download SRA data from ENA instead
The same runs are mirrored at the European Nucleotide Archive (ENA), which serves gzipped FASTQ files directly, so no conversion step is needed: https://www.ebi.ac.uk/ena/browser/view/SRR1154613
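
The files can then be fetched with plain `wget` (a sketch; copy the exact URLs from the "FASTQ FTP" column on the ENA browser page, as the path layout below is only ENA's usual convention for 7-digit run accessions):

```
wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR115/003/SRR1154613/SRR1154613_1.fastq.gz
wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR115/003/SRR1154613/SRR1154613_2.fastq.gz
```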
21 changes: 21 additions & 0 deletions docs/guides/vscode.md
@@ -0,0 +1,21 @@

# VSCode tips

By default, VSCode wants to help you write code. So much so that it starts to get intrusive. When you type, IntelliSense kicks in and immediately suggests code for you upon each keystroke. When you hover over your code, a popup definition appears. When you type an open parenthesis, another autocomplete suggestion window pops up. When you type out a tag, a closing tag magically appears, but sometimes it's wrong. It's clear what VSCode is trying to do, but it can get to a point where it's annoying and getting in the way.

If you want VSCode to behave more like a passive editor, and you enjoy typing the majority or all of your code yourself, add these to your `settings.json` to disable the autocomplete suggestions and popups:

```
"editor.autoClosingBrackets": "never",
"editor.suggestOnTriggerCharacters": false,
"editor.quickSuggestions": false,
"editor.hover.enabled": false,
"editor.parameterHints.enabled": false,
"editor.suggest.snippetsPreventQuickSuggestions": false,
"html.suggest.html5": false
```

If you experience problems with VSCode, such as it hanging when connecting remotely, log in to the particular login node and kill all VSCode server processes:
```
ps ux | grep '[.]vscode-server' | awk '{print $2}' | xargs -r kill
```

then try again.
2 changes: 2 additions & 0 deletions docs/guides/webportal/jobcomposer.md
@@ -0,0 +1,2 @@
# Job composer
Guide coming soon...
31 changes: 31 additions & 0 deletions docs/slurm/multistep_jobs.md
@@ -0,0 +1,31 @@
# Multi-step jobs

Many steps in a complex workflow will only run on a single thread, regardless of how many CPUs you've asked for, which wastes resources. Instead of reserving resources to fit the most demanding step for the whole duration, you can write the individual steps as separate shell scripts and submit them as individual jobs with different resource requirements, chained together using job dependencies:

**`launchscript.sh`**
```bash
#!/bin/bash

set -euo pipefail

# Submit the first job step and capture its job ID
step1_jobid=$(sbatch step1_script.sh | awk '{print $4}')

# Submit the second job, ensuring it runs only after the first job completes successfully
step2_jobid=$(sbatch --dependency=afterok:$step1_jobid step2_script.sh | awk '{print $4}')

# Submit the third/last job, ensuring it runs only after the second job completes successfully
sbatch --dependency=afterok:$step2_jobid step3_script.sh
```
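
Each step script is just a regular batch script with its own resource requests; for illustration, `step1_script.sh` could look like this (a minimal sketch; the tool and resource values are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=step1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00

# single-threaded preprocessing step
some_singlethreaded_tool --input data.raw --output step1_output.dat
```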

Types of dependencies:
- `afterok`: The dependent job runs if the first job completes successfully (exit code 0).
- `afternotok`: The dependent job runs if the first job fails (non-zero exit code).
- `afterany`: The dependent job runs after the first job completes, regardless of success or failure.
- `after`: The dependent job starts once the first job begins execution.

In this case, using `--dependency=afterok` ensures that the second job will only start if the first job finishes without errors.
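
While the chain is running, the dependent jobs will be pending until their dependency is satisfied; a quick way to inspect this (columns are job ID, state, and pending reason):

```bash
# --me requires a recent SLURM version; otherwise use -u $USER
squeue --me --format="%.10i %.4t %.20r"
```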

Submit the launch script itself using `bash launchscript.sh`, not `sbatch`; it only submits the actual jobs and exits immediately.

This can be repeated as many times as necessary. Any arguments can be passed on to the step scripts in exactly the same way as when invoking them directly, e.g. `sbatch step1_script.sh -i "some option" -o "some other option"`.
2 changes: 2 additions & 0 deletions docs/slurm/other.md
@@ -63,3 +63,5 @@ SLURM jobs will have a variety of environment variables set within job allocations
| `SLURM_TASK_PID` | The process ID of the task being started |
| `SLURMD_NODENAME` | Name of the node running the job script |
| `SLURM_JOB_GPUS` | GPU IDs allocated to the job (if any). |

`sbcast` can be used to efficiently copy a file to the compute nodes allocated to a job; proper documentation coming soon...
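
A minimal sketch of how it might be used inside a job script to stage input data onto node-local storage (the paths and tool are placeholders):

```bash
#!/bin/bash
#SBATCH --nodes=1

# copy the input file to local scratch on the allocated node(s)
sbcast input.fasta /tmp/input.fasta

# run the tool against the local copy
mytool --input /tmp/input.fasta
```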
4 changes: 1 addition & 3 deletions docs/slurm/partitions.md
@@ -1,7 +1,5 @@
# Hardware partitions
-Before submitting a job you must choose the correct hardware partition for it. To ensure hardware utilization is maximized the compute nodes are divided into separate partitions depending on their specs. You mainly need to choose between partitions depending on how much memory (per CPU) your job needs, expected CPU efficiency (i.e. do you expect to keep all CPUs busy at 100% at all times?), whether you need faster temporary scratch space, or a GPU. If in doubt just use the `default-op` for most things.
-
-A few servers also have local scratch storage for faster I/O and also to avoid overburdening the [ceph storage cluster](../storage.md) when lots (millions) of small files need to be written, in which case you must also submit to specific compute nodes that has local storage, more on that later.
+Before submitting a job you must choose the correct hardware partition for it. To ensure hardware utilization is maximized the compute nodes are divided into separate partitions depending on their specs. You mainly need to choose between partitions depending on how much memory (per CPU) your job needs, expected CPU efficiency (i.e. do you expect to keep all CPUs busy at 100% at all times?), whether you need faster, local scratch space ([more details here](../storage.md#local-scratch-space)), or a GPU. If in doubt just use the `default-op` for most things.

## Memory per CPU
Lastly, it's also important to note that all partitions have a **max memory per CPU** configured, which may result in the scheduler allocating more CPUs for the job than requested until this ratio is satisfied. For example, on a partition with a max of 10GB per CPU, a job requesting 1 CPU and 40GB of memory will be allocated 4 CPUs. This is to ensure that no CPUs end up idle when a compute node is fully allocated in terms of memory, when it could instead have been at work finishing jobs faster.
8 changes: 8 additions & 0 deletions docs/software/conda.md
@@ -20,6 +20,9 @@ dependencies:

Then create the environment with `conda env create -f requirements.yml`. You can also export an **activated** environment created previously and dump the exact versions used into a YAML file with `conda env export > requirements.yml`.

To create an empty environment with no packages installed, use `conda create --name myenv`.

Do not install Conda yourself; use the installation already provided on the platform.

???+ "Note"
    When you export a conda environment to a file, it may also contain a host-specific `prefix` line, which should be removed.
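    One simple way to avoid it when exporting (a sketch; adjust the filename as needed):

    ```
    conda env export | grep -v "^prefix: " > requirements.yml
    ```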

@@ -34,6 +37,11 @@ List available environments with
```
conda env list
```

For fully reproducible, cross-platform environment lock files, see https://conda.github.io/conda-lock/.

TODO: describe how and why to disable strict channel priority.

## Installing packages using pip within conda environments
Software that can only be installed with pip have to be installed in a Conda environment by using pip inside the environment. While issues can arise, per the [Conda guide for using pip in a Conda environment](https://www.anaconda.com/blog/using-pip-in-a-conda-environment), there are some best practices to follow to reduce their likelihood:

21 changes: 21 additions & 0 deletions docs/software/preinstalled.md
@@ -0,0 +1,21 @@
# Pre-installed software
In addition to software management tools, there are a few things that are installed natively.

The following software is pre-installed:

- CLC (on `axomamma` only)

## CLC Genomics Workbench
You can run graphical apps in a SLURM job while the windows show up on your own computer by using the `--x11` option to `srun` and `salloc`, as described at https://cmc-aau.github.io/biocloud-docs/slurm/jobsubmission/#graphical-apps-gui. For example, to run CLC, log in to a login node, then run:
```
srun --cpus-per-task 4 --mem 4G --nodelist axomamma --x11 /usr/local/CLCGenomicsWorkbench24/clcgenomicswb24
```

## ARB
ARB is old and unmaintained. Version 6 is available through conda, but the latest version 7 is not, and the only way to run it on servers running an Ubuntu release newer than 20.04 is through a container:
```
srun --cpus-per-task 4 --mem 4G --x11 apptainer run -B ~/.Xauthority -B /projects /home/bio.aau.dk/ksa/projects/biocloud-software/containers/apptainer/arb/arb-7.0.sif
```

## AlphaFold
Documentation about the AlphaFold databases is coming soon...
