This repository has been archived by the owner on Jul 22, 2024. It is now read-only.
Home
Rich Hagarty edited this page Oct 26, 2017
Explore Spark SQL and its performance using TPC-DS workload
Evaluate and measure the performance of Spark SQL using the TPC-DS benchmark. Learn to set up and run the TPC-DS benchmark in your own development environment.
Cognitive
Two sentence introduction
By Dilip Biswal and Rich Hagarty
- N/A
- N/A
Two to three sentences about what the journey does and uses.
When the reader has completed this journey, they will understand how to:
- goal 1
- goal 2
- Command line
  - Compile the TPC-DS toolkit and use it to generate the TPC-DS dataset.
  - Create the Spark tables and generate the TPC-DS queries.
  - Run the entire query set or a subset of queries and monitor the results.
- Notebook
  - Create the Spark tables from the pre-generated dataset.
  - Run the entire query set or an individual query.
  - View the query results or the performance summary.
  - View the performance graph.
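Both the command-line and notebook flows end with running a chosen set of queries and reviewing a performance summary. The following is a minimal Python sketch of that step under stated assumptions, not the project's own scripts: `parse_query_selection`, `run_queries`, and `summarize` are hypothetical names, and each query is assumed to be wrapped in a zero-argument callable. With a live `SparkSession` named `spark` and the query text loaded from the generated query files, a wrapper such as `lambda text=text: spark.sql(text).collect()` could fill that role (an assumption about how you load the queries).

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

TOTAL_QUERIES = 99  # the TPC-DS specification defines 99 query templates

Result = Tuple[int, str, float]  # (query number, status, elapsed seconds)


def parse_query_selection(spec: str, total: int = TOTAL_QUERIES) -> List[int]:
    """Hypothetical helper: turn "all" or a list like "1-3,7" into sorted query numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, total + 1))
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            selected.update(range(lo, hi + 1))
        else:
            selected.add(int(part))
    out_of_range = sorted(q for q in selected if not 1 <= q <= total)
    if out_of_range:
        raise ValueError(f"query numbers out of range: {out_of_range}")
    return sorted(selected)


def run_queries(queries: Dict[int, Callable[[], object]],
                selection: Iterable[int]) -> List[Result]:
    """Run each selected query in order and record its status and wall-clock time."""
    results: List[Result] = []
    for q in selection:
        start = time.perf_counter()
        try:
            # Any exception, including a missing entry in `queries`, counts as a failure.
            queries[q]()
            status = "OK"
        except Exception as exc:
            status = f"FAILED: {exc}"
        results.append((q, status, time.perf_counter() - start))
    return results


def summarize(results: List[Result]) -> Dict[str, float]:
    """Count successes and failures; total time covers successful queries only."""
    ok = [r for r in results if r[1] == "OK"]
    return {
        "run": len(results),
        "succeeded": len(ok),
        "failed": len(results) - len(ok),
        "total_seconds": sum(r[2] for r in ok),
    }


if __name__ == "__main__":
    # Stand-in workload: each "query" sleeps briefly, and query 3 simulates a failure.
    def fake_query(n: int) -> Callable[[], None]:
        def run() -> None:
            if n == 3:
                raise RuntimeError("table not found")
            time.sleep(0.01)
        return run

    workload = {n: fake_query(n) for n in range(1, TOTAL_QUERIES + 1)}
    results = run_queries(workload, parse_query_selection("1-3,7"))
    for q, status, secs in results:
        print(f"q{q:02d} {status:<25} {secs:.3f}s")
    print(summarize(results))
```

Timing with `time.perf_counter` measures wall-clock time on the driver, including result collection. Because Spark evaluates lazily, a query only executes when an action such as `collect()` runs, which is why the sketch assumes each callable triggers one.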
- Apache Spark: An open-source, fast, and general-purpose cluster computing system.
- Jupyter Notebook: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
- Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
- Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
- Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
blog