updated Navigation
added lessons learned document
revised language
added transfer
folker committed Nov 29, 2023
1 parent ae97565 commit 80e93da
Showing 5 changed files with 54 additions and 23 deletions.
17 changes: 17 additions & 0 deletions docs/antipatterns.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Anti-patterns

This document contains a list of "don't do this" items. The intent is to collect lessons learned about how NOT to do things.

## Blocking nodes with your editor by executing ripgrep

We noticed some users running `rg` (short for ripgrep) processes that consume a lot of CPU and I/O. These searches are launched by Visual Studio Code and are probably not adding any value for the user.

Jan Trienes commented:

```text
a good thing to know here is that VSCode recursive search
excludes patterns given in .gitignore and .ignore.
So best practice is to have a language-specific .gitignore
in the project to avoid searches over common directories with
tens of thousands of small files (like venv/, node_modules, etc.)
```
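As a concrete sketch of that advice, here is a minimal Python-flavoured `.gitignore`; the directory names are illustrative and should be adjusted to your project's language:

```shell
# Write a language-specific .gitignore so that VSCode's recursive search
# (and git itself) skips bulky generated directories. The entries below
# are typical for a Python project; adjust them to your stack.
proj=$(mktemp -d)
cat > "$proj/.gitignore" <<'EOF'
venv/
.venv/
__pycache__/
node_modules/
EOF
cat "$proj/.gitignore"
```

With this file in place, both `git status` and editor-driven `rg` searches skip those directories, avoiding scans over tens of thousands of small files.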
22 changes: 14 additions & 8 deletions docs/getting-started.md
@@ -122,7 +122,7 @@ We recommend two options for installing and using an SSH client on Windows:

## What hardware is available on the IKIM cluster?

The cluster has two sets of servers: 120 nodes for CPU-bound tasks and 10 nodes for GPU-bound tasks. At this moment, not all of these nodes are available for general computation tasks. However, more will be added in future. The following hardware is installed in the servers:
The cluster has two sets of servers: 120 nodes for CPU-bound tasks and 10+ nodes for GPU-bound tasks. At this moment, not all of these nodes are available for general computation tasks. However, more will be added in future. The following hardware is installed in the servers:

- CPU nodes (`c1` - `c120`): each with 192GB RAM, 2 Intel CPUs, 1 SSD for the system and 1 SSD for data (2TB).
- GPU nodes (`g1-1` - `g1-10`): each with 6 NVIDIA RTX 6000 GPUs, 1024GB RAM, 2 AMD CPUs, 1 system SSD (1TB) and 2 NVMe drives for data (12TB, configured as RAID-0).
@@ -132,26 +132,32 @@ A subset of these nodes are deployed as a Slurm cluster. Unless instructed other

## What software is available on the IKIM cluster?

The main entrypoint to the cluster is Slurm. On compute nodes, we aim to keep the environment as clean as possible, therefore only commonly used software packages are pre-installed and configured. At this moment the list includes:
Short answer: everything under the sun. You can install software yourself using a [package manager](conda.md), or build and run a [container](apptainer.md). Containers can be used, e.g., to run a different operating system if you absolutely need to.
To avoid resource contention, we recommend using our [resource manager](slurm.md).

- Python 3
- Mamba/Conda
- Apptainer
Example: to install [scikit-learn](https://scikit-learn.org/stable/install.html), all you need to do is:

Introductory guides can be reached from the navigation pane.
```shell
conda create -n sklearn-env -c conda-forge scikit-learn
conda activate sklearn-env
```

Conda and its siblings (anaconda and mamba) provide access to [thousands of software packages](https://conda-forge.org/feedstock-outputs/). You can
set up your required software yourself and even maintain multiple environments. The [conda intro](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) provides a good starting point.

## Where to store your data?

There are several locations where you can store data on the cluster:

- **Your home directory** (`/homes/<username>/`): This directory is only for personal data such as configuration files. Anything related to work or that should be visible to other people should not reside here.
- **Project directory** (`/projects/<project_name>/`): This location should be used for data related to your project. If you are starting a project, ask your project coordinator to create a directory and provide a list of participating users. Note that you cannot simply list all project directories via `ls /projects`; instead, you need to specify the full path, such as: `ls /projects/dso_mp_ws2021/`
- **Public dataset directory** (`/projects/datashare`): A world-readable location for datasets for which no special access rights are required. To lower the risk of data loss, each user can write only in a subdirectory corresponding to their research group. For example, a user which belongs to group `tio` must add new datasets in `/projects/datashare/tio` but can browse and read throughout `/projects/datashare`.
- **Public dataset directory** (`/projects/datashare`): A world-readable location for datasets for which no special access rights are required. To lower the risk of data loss, each user can write only in a subdirectory corresponding to their research group. For example, a user which belongs to group `tio` should add new datasets in `/projects/datashare/tio` but can browse and read throughout `/projects/datashare`.
- **Group directory** (`/groups/<group_name>`): This is the appropriate place for any data that should be shared _within an IKIM research group_. In student projects you will most likely not need group directories.
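To see which research group you belong to, and therefore which `/projects/datashare/<group>` subdirectory you can write to, a quick check suffices (`id` is a standard Linux tool; the example group name comes from the list above):

```shell
# Print your primary group and all group memberships; the group name
# determines which datashare subdirectory is writable for you.
id -gn    # primary group, e.g. "tio"
id -Gn    # all groups you belong to
```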

All of the above directories (homes, projects, groups) are shared with other hosts on the cluster through the network file system (NFS). This is convenient: sharing data between hosts becomes effortless and your data is stored redundantly on the file server.

Also see the [storage](./storage.md) for details and also info on performance.
Also see [storage](storage.md) for details and performance information. If you need to transfer data, read [transfer](transfer.md).

## GitHub Authentication through SSH

13 changes: 10 additions & 3 deletions docs/index.md
@@ -9,8 +9,15 @@ The sources of this documentation can be found on [GitHub](https://github.com/IK
## Intro
We believe it is important to know a thing or two about the underlying computer and network infrastructure to be effective. We note this will also limit your frustration level. We do not provide extensive documentation, but rather jumping off points and short best practice info on your setup and your procedures.

By configuring your environment correctly you can make your job easier, check out [./conda.md](Mamba/Conda) for info on getting the right execution environment set up for your code, it might be desirable to run them inside a [apptainer.md](container) similar to the Docker system. Some users will benefit from using interactive [jupyter.md](Jupyter Notebooks).
By configuring your environment correctly, you can make your job easier. Check out [Mamba/Conda](conda) for info on setting up the right execution environment for your code; it may be desirable to run it inside a [container](apptainer), similar to the Docker system. Some users will benefit from using interactive [Jupyter Notebooks](jupyter).

Since most computes will involve [storage.md](storing, accessing and moving data), as well as [transfer.md](transferring data) into the cluster.
Most computations will involve [storing, accessing and moving data](storage), as well as [transferring data](transfer) into the cluster.

If you pay attention to a few details in organizing your compute things will go a lot smoother. We recommend using reprocudible approaches with [snakemake.md](SnakeMake) to structure your compute and a use a professional resource manager ([slurm.md](Slurm)) to structure your access to computing devices.
If you pay attention to a few details in organizing your compute, things will go a lot smoother. We
recommend using reproducible approaches with [Snakemake](snakemake) to structure your compute and
a resource manager ([Slurm](slurm)) to structure your access to computing devices.

We note that compute resources are typically available, but a lack of good computing practices leads
to contention for I/O resources, which in turn slows everyone down.

We are adding things to the documentation to aid our users; please familiarize yourself with it. Also check out the [lessons learned](antipatterns).
4 changes: 4 additions & 0 deletions docs/transfer.md
@@ -14,6 +14,7 @@ Larger scale data transfer requires some degree of familiarity with the technolo
We provide three different means for data transfer. We note that for larger transfers, the speed of the device the data is stored on remotely makes a difference.

### Using a web browser to move data into the cluster

Use your web browser to upload data to a facility we have yet to build.

Details: (TBA)
@@ -23,12 +24,14 @@ Disadvantages: not suitable for many files
Intended data scope: up to 500GB works

### Using ssh / scp to move data into the cluster

In short, on the remote system execute:
`tar -cpf - <directory> | ssh -J login.ikim.uk-essen.de shellhost.ikim.uk-essen.de "tar -xpf -"`

Read [this](https://www.cyberciti.biz/faq/howto-use-tar-command-through-network-over-ssh-session/) for more details.
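
The tar pipe can be tried locally first, with the ssh hop removed, to see how it streams a directory while preserving permissions; the paths below are throwaway temporary directories:

```shell
# Pack a directory to stdout on one side and unpack from stdin on the
# other -- the same mechanics as the ssh variant, minus the network hop.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/data"
echo "hello" > "$src/data/file.txt"
( cd "$src" && tar -cpf - data ) | ( cd "$dst" && tar -xpf - )
ls "$dst/data"
```

Replacing the second subshell with the `ssh … "tar -xpf -"` command gives exactly the transfer shown above.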

### Using nc to move data into the cluster

In short:
- ensure `nc` (netcat) is installed on the remote system
- you need to execute commands on both the sending and the receiving system
@@ -58,4 +61,5 @@ Depending on your needs and the systems involved, your technology choices may va
| nc | unlimited | complicated; use zip or tar to group files |

### Miscellaneous comments

The local storage on each node typically consists of a system partition and a data partition.
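
To see how the partitions are laid out on the node you are logged into, standard tools suffice (mount points vary per node; this merely shows how to look, not the actual layout):

```shell
# Show every mounted filesystem with its size, usage, and mount point;
# the system and data partitions appear as separate lines.
df -h
```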
21 changes: 9 additions & 12 deletions mkdocs.yml
@@ -18,15 +18,12 @@ nav:
- Introduction: index.md
- Getting Started: getting-started.md
- User Guide:
- configuring your environment
- Mamba/Conda: conda.md
- Containers: apptainer.md
- Jupyter Notebooks: jupyter.md
- working with data
- Storage: storage.md
- Data Transfer: transfer.md
- organizing compute
- Snakemake: snakemake.md
- Slurm: slurm.md
- everything else
- Troubleshooting: troubleshooting.md
- Setup -- Mamba/Conda: conda.md
- Setup -- Containers: apptainer.md
- Setup -- Jupyter Notebooks: jupyter.md
- Data -- Storage: storage.md
- Data -- Transfer: transfer.md
- Computing -- Snakemake: snakemake.md
- Computing -- Slurm: slurm.md
- Misc -- Troubleshooting: troubleshooting.md
- Misc -- Antipatterns: antipatterns.md
