Sherlock Software Setup Guide
When using Sherlock, you will probably have to work with several different package management systems to install the software you need. If you are coding with Python or R, you will need to download libraries, and you may also want to use other software which is pre-installed on Sherlock but not accessible unless you specifically import it. This guide will cover how to set up your workspace to have access to the common software you might need day-to-day in the lab.
The three package management tools that are covered are:
- module load -- the tool for accessing pre-installed software on Sherlock.
- conda -- a tool for managing Python libraries, and for downloading other software tools or system libraries that Sherlock does not provide.
- R -- R library management is easy compared to Python, and the language built-ins will get you a long way.
Typing in your password every time you want to start a new ssh session is really a pain, and life is a lot easier if you can start up new ssh windows as fast as possible. Add the following to the file ~/.ssh/config on your laptop/desktop computer:
Host sherlock
    User [YOUR USERNAME]
    HostName login.sherlock.stanford.edu
    ControlMaster auto
    ControlPersist yes
    ControlPath ~/.ssh/%l%r@%h:%p
The first three lines mean you can type ssh sherlock and it will act the same as if you typed ssh [YOUR USERNAME]@login.sherlock.stanford.edu. The last three lines with the Control settings will save your running session to a file in your ~/.ssh folder, and next time you type ssh sherlock it will reuse that existing connection and skip asking for a password. Once this saved connection expires you will have to retype your password -- to avoid dropping this connection I set my computer to turn off the display but not sleep automatically when it's plugged in.
tl;dr now you only have to type in your password for ssh sherlock the first time, and all the later times will connect you automatically.
Bonus: you can also use this alias in place of [YOUR USERNAME]@login.sherlock.stanford.edu when you are using other ssh-based commands such as scp, sshfs, and rsync.
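As a rough sketch of what this looks like in practice, the laptop-side helpers below use the sherlock alias with rsync and with ssh's control commands. The function names (sherlock_push, sherlock_status, sherlock_close) are made up for illustration, and the paths in the usage note are placeholders.

```shell
# Hypothetical helpers for your laptop's ~/.bashrc; they assume the
# "Host sherlock" entry from this guide is in ~/.ssh/config.

# Copy a local file or folder to a path on Sherlock over the shared connection.
sherlock_push() {
    rsync -avP "$1" "sherlock:$2"
}

# Report whether a shared (ControlMaster) connection is currently open.
sherlock_status() {
    ssh -O check sherlock
}

# Close the shared connection; the next ssh sherlock will ask for a password again.
sherlock_close() {
    ssh -O exit sherlock
}
```

For example, sherlock_push results/ my_project/ would copy the contents of a local results folder into my_project in your Sherlock home directory (both names are placeholders).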
This guide will sometimes ask you to add a line to your ~/.bashrc file. Your bashrc is a short bash script that is run every time you log in to a node on Sherlock. It usually holds configuration commands that load libraries or programs you want fast access to. You can edit your bashrc by running nano ~/.bashrc on Sherlock.
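If you would rather not open an editor, one way to add a line is a small helper that appends it only when it is missing, so running it twice does not duplicate anything. This is just a sketch; add_line_once is a hypothetical name, not something Sherlock provides.

```shell
# Hypothetical helper: append a line to a file unless that exact line is already present.
add_line_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the text literally
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (commented out so nothing changes until you run it yourself),
# using the module line that appears later in this guide:
# add_line_once 'ml biology bowtie2 bwa samtools bedtools bcl2fastq' ~/.bashrc
```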
The module load command is also abbreviated ml, and it lets you load any tools or system libraries that the Sherlock administrators (a.k.a. Killian) have pre-installed. The selection is a little random, but has many useful packages including some paid software that Stanford licenses. You can see a list of the available software packages here: https://www.sherlock.stanford.edu/docs/software/list/
For now, we will just load a few command-line tools for handling genomic data.
Add the following lines to your ~/.bashrc:
# Load genomic tools
ml biology bowtie2 bwa samtools bedtools bcl2fastq
Handy ml commands:
- Load packages: ml [package1] [package2] ...
- Unload packages: ml unload [package1] [package2] ...
- List loaded packages: ml list
- Search for packages: ml spider [search_term]
- Full command help: ml help
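After editing your ~/.bashrc and logging in again, a quick sanity check is to confirm the tools are actually on your PATH. The sketch below uses only standard shell built-ins; check_tools is a hypothetical helper name, and the tool list in the example comes from the ml line in this guide, assuming each module provides a command of the same name.

```shell
# Hypothetical helper: report which of the named commands are on PATH.
# Returns 0 if all were found, 1 if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run on Sherlock after the modules are loaded):
# check_tools bowtie2 bwa samtools bedtools bcl2fastq
```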
Conda is a general-purpose package manager, but it is most often used for managing Python libraries. Most scientific Python packages are available through conda, and a lot of non-Python software and system libraries are easy to install through conda as well. We will be installing the Miniconda distribution, which just gives us the conda command without a ton of pre-installed packages.
Install conda on your Sherlock account
- SSH into Sherlock
- Run wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh (this is the link listed on the main miniconda install page)
- Run the downloaded installer with bash Miniconda3-latest-Linux-x86_64.sh
- When the installer prompts for an install location, set it to $OAK/$USER/software/miniconda3 or any other location of your choosing (I recommend Oak to avoid running out of the 15GB of home directory storage)
- Agree to have the installer modify your .bashrc; this will make it so conda is activated by default when you log in. (This is generally good, but I have found that some R packages won't install properly unless I run conda deactivate first)
- (Optional) Run conda env update -f $PI_HOME/resources/software/base_conda_environment_v2.yml. This will get you up and running quickly by installing some common Python packages and bioinformatics tools to your base environment. It might take a long time though, so you can also just install the packages you care about manually.
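If you would rather skip the prompts, the installer also has a non-interactive mode. The sketch below swaps the interactive flow described above for the installer's -b (batch) and -p (install prefix) options; install_miniconda is a hypothetical wrapper name. Note that batch mode accepts the license for you and does not touch your .bashrc.

```shell
# Hypothetical wrapper: download Miniconda and install it to the given prefix
# without prompts. This is an alternative to the interactive steps, not a
# replacement the guide itself prescribes.
install_miniconda() {
    prefix="$1"
    installer="Miniconda3-latest-Linux-x86_64.sh"
    # Only run the installer if the download succeeded.
    wget "https://repo.anaconda.com/miniconda/$installer" &&
        bash "$installer" -b -p "$prefix"
}
```

On Sherlock you might call it as install_miniconda "$OAK/$USER/software/miniconda3", then run "$OAK/$USER/software/miniconda3/bin/conda" init bash to get the same .bashrc changes the interactive installer offers.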
Handy conda commands:
- Search for a package: conda search [package-name]
- Install a package: conda install [package-name]
- Uninstall a package: conda uninstall [package-name]
- Activate a conda environment: conda activate [name] (defaults to the base environment, and will show the name of your current environment at the start of the command prompt)
- Deactivate a conda environment: conda deactivate
- List your environments: conda env list
- Get help on the conda command: conda --help or go to the online user guide
Installing packages through pip: While you're using conda it's generally safest to install packages through the conda command, but if a package is not available you can install it through pip and usually not run into issues. Most packages are available through conda though (check the conda-forge or bioconda channels for a wider selection of packages than default conda).
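One way to follow that advice is to try conda first, with the extra channels enabled, and only fall back to pip if conda cannot install the package. A minimal sketch; conda_or_pip is a hypothetical helper name, and a conda failure here may have causes other than the package being unavailable, so read the error before trusting the fallback.

```shell
# Hypothetical helper: try conda (with conda-forge and bioconda enabled),
# then fall back to pip if the conda install fails.
conda_or_pip() {
    pkg="$1"
    if conda install --yes -c conda-forge -c bioconda "$pkg"; then
        echo "installed $pkg with conda"
    else
        echo "conda could not install $pkg; trying pip instead" >&2
        pip install "$pkg"
    fi
}

# Example: conda_or_pip [package-name]
```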
A note on Python 2: Most Python code runs just fine in Python 3, so in general don't worry about the 2 vs. 3 debate, just use Python 3 and don't look back. Occasionally you might find an old package that only supports Python 2, though. If you need to install Python 2 packages, create a Python 2 environment using conda create --name py2 python=2.7, which you can access using conda activate py2.
Why not use virtualenv: You may have heard of people using virtualenv to manage their environments. virtualenv is a handy tool, but it has a couple of drawbacks compared to conda. First, it can only handle Python packages, and can't help with managing other software and system libraries. Second, it can be cumbersome to manage multiple environments because you need to remember where you saved them. By contrast, conda lets you activate any environment with one simple command, and you can use the environments for much more than just Python packages.
Managing R libraries on Sherlock is pretty simple. The only problem I have run into is that sometimes you will need to run conda deactivate before installing R packages, since otherwise the build can get confused when it tries to compile against a mix of libraries from conda and from Sherlock.
First, create the directory you want to use to hold your R libraries (I recommend using Oak to avoid running out of your 15GB of home folder storage):
mkdir -p $OAK/$USER/software/R
After that, all we need to do is load a few modules using ml, and set a variable to point to where you want to install your R libraries (the $OAK/$USER/software/R directory created above).
Add the following lines to your ~/.bashrc:
# Load libraries that some R libraries depend on
ml hdf5 gsl mariadb
# Load rstudio, then reload R so we get version 4.0
ml rstudio R/4.0.2
# Set R library location
export R_LIBS_USER=$OAK/$USER/software/R
Then you can install R packages to your heart's content by running the R command from the terminal, then running install.packages() with the packages you want to install.
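The same thing can be done without opening an interactive R session. The sketch below assumes Rscript is available once the R module is loaded and that R_LIBS_USER is set in your ~/.bashrc; r_install is a hypothetical helper name, and the CRAN mirror URL is just one common choice.

```shell
# Hypothetical helper: install one R package into $R_LIBS_USER from the shell.
r_install() {
    if [ -z "${R_LIBS_USER:-}" ]; then
        echo "R_LIBS_USER is not set; check your ~/.bashrc" >&2
        return 1
    fi
    # R only adds R_LIBS_USER to its library path if the directory exists.
    mkdir -p "$R_LIBS_USER"
    Rscript -e "install.packages('$1', repos = 'https://cloud.r-project.org')"
}

# Example: r_install ggplot2
```

If a package fails to compile, try running conda deactivate first and then calling the helper again.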