Short Name

Explore Spark SQL and its performance using TPC-DS workload

Short Description

Evaluate and measure the performance of Spark SQL using the TPC-DS benchmark. Learn to set up and run the TPC-DS benchmark in your own development environment.

Offering Type

Cognitive

Introduction

Spark SQL is Apache Spark's module for working with structured data, and TPC-DS is a widely used decision-support benchmark for measuring SQL query performance. This journey shows how to set up the TPC-DS benchmark in your own development environment and use it to evaluate Spark SQL, either from the command line or from a Jupyter Notebook.

Author

By Dilip Biswal and Rich Hagarty

Code

Demo

  • N/A

Video

  • N/A

Overview

This journey walks through benchmarking Spark SQL with the TPC-DS workload, either from the command line or from a Jupyter Notebook. It covers compiling the TPC-DS toolkit, generating the dataset and queries, creating the Spark tables, and running the queries while collecting performance results. It uses Apache Spark, Jupyter Notebooks, and Python.

When the reader has completed this journey, they will understand how to:

  • Compile the TPC-DS toolkit and use it to generate the benchmark dataset and queries.
  • Create Spark tables over the generated (or pre-generated) data.
  • Run the full TPC-DS query set, a subset, or an individual query from the command line or a Jupyter Notebook.
  • View query results, performance summaries, and performance graphs.

Flow

Architecture diagram 1 and architecture diagram 2 (one for each of the flows below)

  • Command line
    1. Compile the TPC-DS toolkit and use it to generate the TPC-DS dataset.
    2. Create the Spark tables and generate the TPC-DS queries.
    3. Run the entire query set or a subset of queries and monitor the results.
  • Notebook
    1. Create the Spark tables from the pre-generated dataset.
    2. Run the entire query set or an individual query (a rough PySpark sketch of these steps follows this list).
    3. View the query results or the performance summary.
    4. View the performance graph.
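
As a rough illustration of the notebook flow (not the journey's actual scripts), the following PySpark sketch registers one Spark table over a pre-generated TPC-DS data file, runs a single generated query, and reports its elapsed time. The file paths, the choice of the store_sales table, and the query file name are placeholders; substitute the locations produced by your own TPC-DS toolkit run.

  import time
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("tpcds-sketch").getOrCreate()

  # Step 1: create a Spark table over pre-generated TPC-DS data.
  # dsdgen emits '|'-delimited text files; a real run would supply the full
  # TPC-DS column schema for each table, omitted here for brevity.
  spark.sql("""
      CREATE TABLE IF NOT EXISTS store_sales
      USING csv
      OPTIONS (path '/tmp/tpcds-data/store_sales.dat', delimiter '|')
  """)

  # Step 2: run an individual query from a generated query file and time it.
  with open("/tmp/tpcds-queries/query55.sql") as f:  # hypothetical path
      query_text = f.read()

  start = time.time()
  rows = spark.sql(query_text).collect()  # force the query to execute fully
  elapsed = time.time() - start

  # Step 3: view the results and a simple performance summary.
  print(f"Returned {len(rows)} rows in {elapsed:.2f} seconds")

Repeating the timed run over each of the generated query files yields the per-query timings behind the performance summary and performance graph in the notebook flow.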

Included components

  • Apache Spark: An open-source, fast, and general-purpose cluster computing system.
  • Jupyter Notebook: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

Featured technologies

  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
  • Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.

Blog

blog

Links
