This is a framework for repeatedly running a suite of performance tests for the Spark cluster computing framework.
The script assumes you already have a binary distribution of Spark 1.0+ installed. It can optionally check out a new version of Spark and copy configurations over from your existing installation.
### Running locally

- Download a Spark 1.0+ binary distribution.
- Set up a local SSH server/keys such that ssh localhost works on your machine without a password.
- Git clone spark-perf (this repo) and cd spark-perf
- Copy config/config.py.template to config/config.py
- Set config.py options that are friendly for local execution:
- SPARK_HOME_DIR = /path/to/your/spark
- SPARK_CLUSTER_URL = "spark://%s:7077" % socket.gethostname()
- SCALE_FACTOR = .05
- SPARK_DRIVER_MEMORY = 512m
- spark.executor.memory = 2g
- uncomment at least one SPARK_TESTS entry
- Execute bin/run
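The local-mode options above can be sketched as config.py overrides like the following (the path and memory sizes are illustrative assumptions, not defaults; keep the rest of config.py.template as-is):

```python
# config/config.py -- illustrative overrides for a local run
import socket

SPARK_HOME_DIR = "/path/to/your/spark"   # hypothetical path to your Spark install
SPARK_CLUSTER_URL = "spark://%s:7077" % socket.gethostname()
SCALE_FACTOR = 0.05                      # shrink data sizes for a single machine
SPARK_DRIVER_MEMORY = "512m"
# Further down the template, set spark.executor.memory to "2g" in the option
# lists and uncomment at least one SPARK_TESTS entry.
```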
### Running on an existing Spark cluster

- SSH into the machine hosting the standalone master
- Git clone spark-perf (this repo) and cd spark-perf
- Copy config/config.py.template to config/config.py
- Set config.py options:
- SPARK_HOME_DIR = /path/to/your/spark/install
- SPARK_CLUSTER_URL = "spark://<master-hostname>:7077"
- SCALE_FACTOR = <scale-factor>
- SPARK_DRIVER_MEMORY = <driver-memory>
- spark.executor.memory = <executor-memory>
- uncomment at least one SPARK_TESTS entry
- Execute bin/run
### Running on EC2

- Launch an EC2 cluster with the spark-ec2 scripts.
- Git clone spark-perf (this repo) and cd spark-perf
- Copy config/config.py.template to config/config.py
- Set config.py options:
- USE_CLUSTER_SPARK = False
- SPARK_COMMIT_ID = <commit-id-to-test>
- SCALE_FACTOR = <scale-factor>
- SPARK_DRIVER_MEMORY = <driver-memory>
- spark.executor.memory = <executor-memory>
- uncomment at least one SPARK_TESTS entry
- Execute bin/run
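For the EC2 setup, the distinguishing options are USE_CLUSTER_SPARK and SPARK_COMMIT_ID, which tell the framework to check out and build Spark itself rather than use a pre-installed copy. A minimal sketch (all values below are placeholders or example choices, not recommendations):

```python
# config/config.py -- illustrative overrides for an EC2 run
USE_CLUSTER_SPARK = False                 # build Spark from source instead of using the cluster's copy
SPARK_COMMIT_ID = "<commit-id-to-test>"   # placeholder: the commit you want to benchmark
SCALE_FACTOR = 1.0                        # example value; size data for a real cluster
SPARK_DRIVER_MEMORY = "1g"                # example value
# As in the other setups, set spark.executor.memory in the option lists and
# uncomment at least one SPARK_TESTS entry.
```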
The script requires Python 2.7. On earlier versions of Python, the argparse module may need to be installed separately, which can be done with easy_install argparse.
For questions or comments, contact @pwendell or @andyk.
This testing framework started as a port and heavy modification of a predecessor Spark performance testing framework, also called spark-perf, written by Denny Britz.