Scripts for getting started on Discovery, Northeastern's high-performance computing cluster
- Main site
- Documentation
- Tutorial (Requires NU ID)
Discovery uses Slurm to schedule jobs.
A simple sbatch script from the introductory tutorial:
#!/bin/bash
#SBATCH --partition=express
#SBATCH --job-name=test
#SBATCH --time=00:05:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --output=%j.output
#SBATCH --error=%j.error
echo "HELLO WORLD!"
Packages that aren't available as a module by default can be installed into your home directory with Spack.
Running Spark requires scheduling time on two different types of node (a Driver and one or more Workers), and Spark 3 is not currently available as a module, so running Spark 3 demonstrates several features of the environment.
First install Spack according to the documentation.
git clone https://github.com/spack/spack.git
# Schedule an environment that can handle a larger workload.
srun -p short --pty --export=ALL -N 1 -n 28 --exclusive /bin/bash
export SPACK_ROOT=/home/<yourusername>/spack
. $SPACK_ROOT/share/spack/setup-env.sh
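If the setup script was sourced correctly, the spack command is now available in your shell. A quick sanity check (standard Spack commands, nothing Discovery-specific):
spack --version       # confirm Spack is on your PATH
spack find            # list installed packages (empty on a fresh install)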
Install Spark 3 with Hadoop and OpenJDK 11.
spack install spark@3 +hadoop ^openjdk@11
spack install openjdk@11
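The build can take a while. Once it finishes, you can confirm the installs and put them on your PATH for the current session; this is a typical Spack workflow, and spack find will show whatever versions Spack actually resolved:
spack find            # should now list spark, openjdk, and their dependencies
spack load spark      # puts spark-submit and the Spark scripts on your PATH
spack load openjdk    # puts the matching JDK on your PATH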
Upload or move the sample sbatch script and pi.py to your home directory, then submit the job:
sbatch spark_with_slurm.sh
You should see output from Slurm confirming that the job has been scheduled with a job ID, and then <job ID>.error and <job ID>.output files will appear in your home directory. The .error file should contain a Spark log that includes how long the job took to run. You can see a list of your jobs at ood.discovery.neu.edu (NU ID required).
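If you prefer the command line to the Open OnDemand portal, the same information is available from standard Slurm tools:
squeue -u $USER       # jobs that are still pending or running
sacct -j <job ID>     # accounting record once the job has finished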
Increase the number of nodes at the top of the sbatch file:
.
.
.
#SBATCH --partition=express
#SBATCH --job-name=spark-cluster
#SBATCH --nodes=3
.
.
.
pi.py contains a perfectly parallel algorithm for estimating pi using a Monte Carlo method. Running the job with more nodes should increase performance.
If you find that a simple, parallelizable job is not scaling, it likely means you are running several Driver nodes instead of one Driver and several Workers. See spark_with_slurm.sh for details.
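For orientation, here is a rough sketch of the pattern such a launcher typically follows in a standard Spark standalone setup on Slurm. It is an illustration, not the contents of spark_with_slurm.sh; it assumes SPARK_HOME points at your Spark installation and uses Spark's default master port 7077.
# Sketch only: the master (and, in client deploy mode, the Driver) runs on the
# first node; srun starts one Worker process per allocated node; spark-submit
# then runs pi.py against that single master.
MASTER_HOST=$(hostname)
$SPARK_HOME/sbin/start-master.sh
srun $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker \
    spark://$MASTER_HOST:7077 &
sleep 15              # give the Workers time to register with the master
spark-submit --master spark://$MASTER_HOST:7077 pi.py
If each node instead launched its own master and ran its own spark-submit, you would get several independent Drivers rather than one Driver with several Workers, which is the scaling problem described above.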