updated Navigation
added lessons learned document
revised language
added transfer
folker committed Nov 29, 2023
1 parent ae97565 commit 80e93da
Showing 5 changed files with 54 additions and 23 deletions.
17 changes: 17 additions & 0 deletions docs/antipatterns.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Anti-patterns

This document contains a list of "don't do this" items. The intent is to collect lessons learned about how NOT to do things.

## Blocking nodes with your editor by executing ripgrep

We noticed some users running `rg` (short for ripgrep) processes that consume a lot of CPU and I/O. These searches are launched by Visual Studio Code and are probably not adding any value for the user.

Jan Trienes commented:

```text
a good thing to know here is that VSCode recursive search
excludes patterns given in .gitignore and .ignore.
So best practice is to have a language-specific .gitignore
in the project to avoid searches over common directories with
tens of thousands of small files (like venv/, node_modules, etc.)
```
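As a concrete sketch of that advice, here is a minimal Python-flavoured `.gitignore`; the directory names are illustrative and should be adjusted to your project's language:

```shell
# Write a language-specific .gitignore so that VSCode's recursive search
# (and git itself) skips bulky generated directories. The entries below
# are typical for a Python project; adjust them to your stack.
proj=$(mktemp -d)
cat > "$proj/.gitignore" <<'EOF'
venv/
.venv/
__pycache__/
node_modules/
EOF
cat "$proj/.gitignore"
```

With this file in place, both `git status` and editor-driven `rg` searches skip those directories, avoiding scans over tens of thousands of small files.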
22 changes: 14 additions & 8 deletions docs/getting-started.md
@@ -122,7 +122,7 @@ We recommend two options for installing and using an SSH client on Windows:

## What hardware is available on the IKIM cluster?

The cluster has two sets of servers: 120 nodes for CPU-bound tasks and 10 nodes for GPU-bound tasks. At this moment, not all of these nodes are available for general computation tasks. However, more will be added in future. The following hardware is installed in the servers:
The cluster has two sets of servers: 120 nodes for CPU-bound tasks and 10+ nodes for GPU-bound tasks. At this moment, not all of these nodes are available for general computation tasks. However, more will be added in future. The following hardware is installed in the servers:

- CPU nodes (`c1` - `c120`): each with 192GB RAM, 2 Intel CPUs, 1 SSD for the system and 1 SSD for data (2TB).
- GPU nodes (`g1-1` - `g1-10`): each with 6 NVIDIA RTX 6000 GPUs, 1024GB RAM, 2 AMD CPUs, 1 system SSD (1TB) and 2 NVMe drives for data (12TB, configured as RAID-0).
@@ -132,26 +132,32 @@ A subset of these nodes are deployed as a Slurm cluster. Unless instructed other

## What software is available on the IKIM cluster?

The main entrypoint to the cluster is Slurm. On compute nodes, we aim to keep the environment as clean as possible, therefore only commonly used software packages are pre-installed and configured. At this moment the list includes:
Short answer: everything under the sun. You can install software yourself using a [package manager](conda.md), or build and run a [container](apptainer.md). Containers can be used, e.g., to run a different operating system if you absolutely need to.
To avoid resource contention, we recommend using our [resource manager](slurm.md).

- Python 3
- Mamba/Conda
- Apptainer
Example: to install [scikit-learn](https://scikit-learn.org/stable/install.html), all you need to do is:

Introductory guides can be reached from the navigation pane.
```shell
conda create -n sklearn-env -c conda-forge scikit-learn
conda activate sklearn-env
```

Conda and its siblings (anaconda and mamba) provide access to [thousands of software packages](https://conda-forge.org/feedstock-outputs/). You can
set up your required software yourself and even maintain multiple environments. The [conda intro](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) provides a good starting point.

## Where to store your data?

There are several locations where you can store data on the cluster:

- **Your home directory** (`/homes/<username>/`): This directory is only for personal data such as configuration files. Anything related to work or that should be visible to other people should not reside here.
- **Project directory** (`/projects/<project_name>/`): This location should be used for data related to your project. If you are starting a project, ask your project coordinator to create a directory and provide a list of participating users. Note that you cannot simply list all project directories via `ls /projects`; instead, you need to specify the full path, such as: `ls /projects/dso_mp_ws2021/`
- **Public dataset directory** (`/projects/datashare`): A world-readable location for datasets for which no special access rights are required. To lower the risk of data loss, each user can write only in a subdirectory corresponding to their research group. For example, a user which belongs to group `tio` must add new datasets in `/projects/datashare/tio` but can browse and read throughout `/projects/datashare`.
- **Public dataset directory** (`/projects/datashare`): A world-readable location for datasets for which no special access rights are required. To lower the risk of data loss, each user can write only in a subdirectory corresponding to their research group. For example, a user which belongs to group `tio` should add new datasets in `/projects/datashare/tio` but can browse and read throughout `/projects/datashare`.
- **Group directory** (`/groups/<group_name>`): This is the appropriate place for any data that should be shared _within an IKIM research group_. In student projects you will most likely not need group directories.
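To see which research group you belong to, and therefore which `/projects/datashare/<group>` subdirectory you can write to, a quick check suffices (`id` is a standard Linux tool; the example group name comes from the list above):

```shell
# Print your primary group and all group memberships; the group name
# determines which datashare subdirectory is writable for you.
id -gn    # primary group, e.g. "tio"
id -Gn    # all groups you belong to
```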

All of the above directories (homes, projects, groups) are shared with other hosts on the cluster through the network file system (NFS). This is convenient: sharing data between hosts becomes effortless and your data is stored redundantly on the file server.

Also see the [storage](./storage.md) for details and also info on performance.
Also see [storage](storage.md) for details and performance information. If you need to transfer data, read [transfer](transfer.md).

## GitHub Authentication through SSH

13 changes: 10 additions & 3 deletions docs/index.md
@@ -9,8 +9,15 @@ The sources of this documentation can be found on [GitHub](https://github.com/IK
## Intro
We believe it is important to know a thing or two about the underlying computer and network infrastructure to be effective. We note this will also limit your frustration level. We do not provide extensive documentation, but rather jumping off points and short best practice info on your setup and your procedures.

By configuring your environment correctly you can make your job easier, check out [./conda.md](Mamba/Conda) for info on getting the right execution environment set up for your code, it might be desirable to run them inside a [apptainer.md](container) similar to the Docker system. Some users will benefit from using interactive [jupyter.md](Jupyter Notebooks).
By configuring your environment correctly, you can make your job easier. Check out [Mamba/Conda](conda) for info on setting up the right execution environment for your code; it may be desirable to run it inside a [container](apptainer), similar to the Docker system. Some users will benefit from using interactive [Jupyter Notebooks](jupyter).

Since most computes will involve [storage.md](storing, accessing and moving data), as well as [transfer.md](transferring data) into the cluster.
Most computations will involve [storing, accessing and moving data](storage), as well as [transferring data](transfer) into the cluster.

If you pay attention to a few details in organizing your compute things will go a lot smoother. We recommend using reprocudible approaches with [snakemake.md](SnakeMake) to structure your compute and a use a professional resource manager ([slurm.md](Slurm)) to structure your access to computing devices.
If you pay attention to a few details in organizing your compute, things will go a lot smoother. We
recommend using reproducible approaches with [Snakemake](snakemake) to structure your compute and
a resource manager ([Slurm](slurm)) to structure your access to computing devices.

We note that compute resources are typically available, but a lack of good computing practices leads
to contention for I/O resources, which in turn slows everyone down.

We are adding things to the documentation to aid our users; please familiarize yourself with it. Also check out the [lessons learned](antipatterns).
4 changes: 4 additions & 0 deletions docs/transfer.md
@@ -14,6 +14,7 @@ Larger scale data transfer requires some degree of familiarity with the technolo
We provide three different means for data transfer. We note that for larger transfers, the speed of the device the data is stored on remotely makes a difference.

### Using a web browser to move data into the cluster

Use your web browser to upload data to a facility we have yet to build.

Details: (TBA)
@@ -23,12 +24,14 @@ Disadvantages: not suitable for many files
Intended data scope: up to 500GB works

### Using ssh / scp to move data into the cluster

In short, on the remote system execute:
`tar -cpf - <directory> | ssh -J login.ikim.uk-essen.de shellhost.ikim.uk-essen.de "tar -xpf -"`

Read [this](https://www.cyberciti.biz/faq/howto-use-tar-command-through-network-over-ssh-session/) for more details.
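
The tar pipe can be tried locally first, with the ssh hop removed, to see how it streams a directory while preserving permissions; the paths below are throwaway temporary directories:

```shell
# Pack a directory to stdout on one side and unpack from stdin on the
# other -- the same mechanics as the ssh variant, minus the network hop.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/data"
echo "hello" > "$src/data/file.txt"
( cd "$src" && tar -cpf - data ) | ( cd "$dst" && tar -xpf - )
ls "$dst/data"
```

Replacing the second subshell with the `ssh … "tar -xpf -"` command gives exactly the transfer shown above.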

### Using nc to move data into the cluster

In short:
- ensure `nc` (netcat) is installed on the remote system
- you need to execute commands on both the sending and the receiving system
@@ -58,4 +61,5 @@ Depending on your needs and the systems involved, your technology choices may va
| nc | unlimited | complicated; use zip or tar to group files |

### Miscellaneous comments

The local storage on each node typically consists of a system partition and a data partition.
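
To see how the partitions are laid out on the node you are logged into, standard tools suffice (mount points vary per node; this merely shows how to look, not the actual layout):

```shell
# Show every mounted filesystem with its size, usage, and mount point;
# the system and data partitions appear as separate lines.
df -h
```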
21 changes: 9 additions & 12 deletions mkdocs.yml
@@ -18,15 +18,12 @@ nav:
- Introduction: index.md
- Getting Started: getting-started.md
- User Guide:
- configuring your environment
- Mamba/Conda: conda.md
- Containers: apptainer.md
- Jupyter Notebooks: jupyter.md
- working with data
- Storage: storage.md
- Data Transfer: transfer.md
- organizing compute
- Snakemake: snakemake.md
- Slurm: slurm.md
- everything else
- Troubleshooting: troubleshooting.md
- Setup -- Mamba/Conda: conda.md
- Setup -- Containers: apptainer.md
- Setup -- Jupyter Notebooks: jupyter.md
- Data -- Storage: storage.md
- Data -- Transfer: transfer.md
- Computing -- Snakemake: snakemake.md
- Computing -- Slurm: slurm.md
- Misc -- Troubleshooting: troubleshooting.md
- Misc -- Antipatterns: antipatterns.md
