Discovery Scripts

Scripts for getting started on Northeastern's high performance computing cluster

About Discovery

Environment

Discovery uses Slurm to schedule jobs.

A simple sbatch script from the introductory tutorial:

#!/bin/bash
#SBATCH --partition=express    # partition (queue) to run on
#SBATCH --job-name=test
#SBATCH --time=00:05:00        # wall-clock limit
#SBATCH -N 1                   # number of nodes
#SBATCH -n 1                   # number of tasks
#SBATCH --output=%j.output     # %j expands to the job ID
#SBATCH --error=%j.error

echo "HELLO WORLD!"

Packages that aren't available as modules by default can be installed to your home directory with Spack.
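
For example, before building anything yourself you can check what is already provided; the names below are placeholders, not specific packages on Discovery:

module avail                 # list software already available as modules
module load <modulename>     # use a provided module
spack install <packagename>  # otherwise build it into your home directory with Spack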

Spark

Running Spark requires scheduling time on two different types of node (a Driver and one or more Workers), and Spark 3 is not currently available as a module, so running Spark 3 exercises several features of the environment.

Installing Dependencies

First install Spack according to the documentation.

# Clone Spack into your home directory.
git clone https://github.com/spack/spack.git

# Start an interactive session on a node that can handle a larger workload
# (short partition, 1 node, 28 tasks, exclusive use of the node).
srun -p short --pty --export=ALL -N 1 -n 28 --exclusive /bin/bash

# Point SPACK_ROOT at the clone and load Spack's shell integration.
export SPACK_ROOT=/home/<yourusername>/spack

. $SPACK_ROOT/share/spack/setup-env.sh
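
After setup-env.sh has been sourced, the spack command is available in your shell. A quick sanity check using standard Spack commands (nothing here is specific to this repository):

spack --version       # confirm the shell integration worked
spack list spark      # search for packages matching "spark"
spack info spark      # show available versions and variants (e.g. +hadoop)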

Install Spark 3 with Hadoop and OpenJDK 11.

spack install spark@3 +hadoop ^openjdk@11
spack install openjdk@11
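
Once the builds finish, the installed packages can be added to your shell environment with spack load before using Spark; the version specs below mirror the install commands above and may need adjusting to whatever actually got installed:

spack load openjdk@11
spack load spark@3

# Confirm Spark is on your PATH and reports a 3.x build.
spark-submit --version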

Upload or move the sample sbatch script (spark_with_slurm.sh) and pi.py to your home directory, then submit the job:

sbatch spark_with_slurm.sh

Slurm should print the job ID when the job is submitted, and <job ID>.error and <job ID>.output should then appear in your home directory. The .error file should contain a Spark log that includes how long the job took to run. You can see a list of your jobs at ood.discovery.neu.edu (NU ID required).
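
You can also check on the job from the command line with standard Slurm tools:

squeue -u $USER              # list your queued and running jobs
scontrol show job <job ID>   # detailed state of a single job
cat <job ID>.error           # the Spark log, including the run time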

Increase the number of nodes at the top of the sbatch file:

.
.
.
#SBATCH --partition=express
#SBATCH --job-name=spark-cluster
#SBATCH --nodes=3
.
.
.

pi.py contains a perfectly parallel algorithm for estimating pi using a Monte Carlo method. Running the job with more nodes should increase performance.

If you find that a simple, parallelizable job is not scaling, it likely means you are running several Driver nodes instead of one Driver and several Workers. See spark_with_slurm.sh for details.
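
spark_with_slurm.sh in this repository is the script to follow; purely as an illustration of the pattern it needs to implement (exactly one master on the first allocated node, a Worker on every node via srun, then spark-submit pointed at that master), a standalone launcher might look roughly like the sketch below. The Spack specs, sleep durations, and default master port 7077 are assumptions for the sketch, not values taken from the repository's script.

#!/bin/bash
#SBATCH --partition=express
#SBATCH --job-name=spark-cluster
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:15:00
#SBATCH --output=%j.output
#SBATCH --error=%j.error

# Make the Spack-installed Spark and JDK available inside the job.
export SPACK_ROOT=$HOME/spack          # adjust to wherever you cloned Spack
. $SPACK_ROOT/share/spack/setup-env.sh
spack load openjdk@11
spack load spark@3

# Start exactly one master (the Driver side) on the first allocated node.
MASTER_HOST=$(hostname)
MASTER_URL="spark://${MASTER_HOST}:7077"
spark-class org.apache.spark.deploy.master.Master --host "$MASTER_HOST" &
sleep 10    # give the master time to come up

# Start one Worker per allocated node, each pointing at the single master.
srun --ntasks="$SLURM_NNODES" --ntasks-per-node=1 \
    spark-class org.apache.spark.deploy.worker.Worker "$MASTER_URL" &
sleep 10    # give the workers time to register

# Run the Monte Carlo estimator against the cluster.
spark-submit --master "$MASTER_URL" pi.py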
