# DataStax Enterprise Spark Kernels for Scala and Python

## Prerequisites

  * DataStax Enterprise 4.8
  * Python 2.7
  * Scala (for Scala users)
  * Jupyter

## To get the Jupyter notebook

You need Python 2.7. Then install the Jupyter package:

```
pip install jupyter
```
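To confirm the installation worked:

```
jupyter --version
```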

## To set it up

  1. Unpack the tarball downloaded from http://github.com/slowenthal/spark-kernel/releases.
  2. Inside the unpacked tarball, navigate to the `bin` directory.
  3. Run `setup.sh [<ip address of spark master>]`. If your Spark master is 127.0.0.1, you can omit the IP address.
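For example, a typical session might look like this (the archive name and the master address `10.10.1.5` are hypothetical):

```
tar xzf spark-kernel-<version>.tar.gz    # unpack the release tarball
cd spark-kernel-<version>/bin            # the bin directory inside the tarball
./setup.sh 10.10.1.5                     # omit the address if the master is 127.0.0.1
```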

If you are on an edge node, copy the Hadoop configuration file `dse-core-defaults.xml` from a node in your cluster into your local DSE configuration directory.
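One way to do the copy, assuming a hypothetical cluster node `dse-node1` and the package-install configuration path `/etc/dse/hadoop` (adjust both for your environment):

```
scp dse-node1:/etc/dse/hadoop/dse-core-defaults.xml /etc/dse/hadoop/
```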

## To run it

```
jupyter notebook
```

Some useful options:

  1. `--no-browser` - keep a browser window from opening automatically
  2. `--ip 0.0.0.0` - listen on all interfaces instead of just localhost
  3. `--port <portno>` - listen on a different port (the default is 8888)
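For example, to run the notebook server headless, on all interfaces, on port 9999:

```
jupyter notebook --no-browser --ip 0.0.0.0 --port 9999
```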

In the browser, create a new Spark notebook ...


... and Spark away.


## Special features of the Scala Kernel

  1. `%%cql <cql statement>` - run a CQL statement and display the output
  2. `%%showschema [<keyspace>][.<table>]` - display all or part of the schema
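For example, in notebook cells (the keyspace and table names `test_ks.users` here are hypothetical):

```
%%cql SELECT name, email FROM test_ks.users LIMIT 10;
```

```
%%showschema test_ks.users
```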


Note: To download and install, go to the releases page above and follow the instructions there.

Requires JDK 1.7 or higher!

The Spark Kernel has one main goal: provide the foundation for interactive applications to connect and use Apache Spark.

## Overview

*(Spark Kernel overview diagram)*

The kernel provides several key features for applications:

  1. Define and run Spark tasks

     * Execute Scala code dynamically, much as the Scala REPL and Spark shell do (see the example after this list)

  2. Collect results without a datastore

     * Send execution results and streaming data back to your applications via the Spark Kernel

     * Use the Comm API - an abstraction of the IPython protocol - for more detailed data communication and synchronization between your applications and the Spark Kernel

  3. Host and manage applications separately from Apache Spark

     * The Spark Kernel serves as a proxy for requests to the Apache Spark cluster
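As a sketch of the first point, a notebook cell can run a Spark job directly against `sc`, the preconfigured SparkContext the kernel provides; the job itself is a made-up example:

```scala
// Runs on the connected Spark cluster; `sc` is supplied by the kernel.
val counts = sc.parallelize(1 to 1000)  // distribute a range across the cluster
  .map(_ % 10)                          // bucket each number by its last digit
  .countByValue()                       // count per bucket; results return to the driver

counts.toSeq.sortBy(_._1).foreach(println)
```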

The project intends to provide applications with the ability to send both packaged jars and code snippets. As it implements the latest IPython message protocol (5.0), the Spark Kernel can easily plug into the 3.x branch of IPython for quick, interactive data exploration. The Spark Kernel strives to be extensible, providing a pluggable interface for developers to add their own functionality.
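On the IPython side, a kernel plugs into the notebook through a small kernelspec (`kernel.json`) file. The sketch below is illustrative only: `setup.sh` normally generates the real file for you, and the launch-script path shown is an assumption, not the actual release layout.

```
{
  "display_name": "Spark - Scala",
  "language": "scala",
  "argv": [
    "/opt/spark-kernel/bin/sparkkernel",
    "--profile",
    "{connection_file}"
  ]
}
```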

If you are new to the Spark Kernel, please see the Getting Started section.

For more information, please visit the Spark Kernel wiki.

For bug reporting and feature requests, please visit the Spark Kernel issue list.