Smv Tutorial

This tutorial shows how to conduct various data analyses using SMV (Spark Modularized View), a framework for developing large-scale data applications on Spark. API docs can be found here. After completing the tutorial, users should be able to build a data analytics project with the SMV framework.

The tutorial covers the following topics:

I. Preliminaries

First things first: we need to make sure all necessary tools are installed and the environment is set up.

Installation and A Sample Project
Get Started with the Tutorial

II. A Taste of Smv for Data Analyses

Once the environment is set up, we can start working with data. Data scientists and business analysts who are familiar with traditional analytic tools such as SQL or SAS will naturally ask how to process data and conduct analyses in SMV. The following examples use the employment data in SmvTraining. The sample file in the data directory was extracted directly from US employment data with the following commands:

$ wget http://www2.census.gov/econ2012/CB/sector00/CB1200CZ11.zip
$ unzip CB1200CZ11.zip
$ mv CB1200CZ11.dat CB1200CZ11.csv

More information can be found on the US Census site.
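Before loading the extracted file into SMV, it can help to look at its first few lines to check the header and the field delimiter. The following is a minimal Python sketch, not part of SMV itself: `preview_lines` is a hypothetical helper introduced here for illustration, and the code assumes the renamed `CB1200CZ11.csv` sits in the current working directory. It makes no assumption about the delimiter and simply prints the raw lines.

```python
from itertools import islice
from pathlib import Path


def preview_lines(path, n=5):
    """Return the first n lines of a text file, without line terminators.

    Hypothetical helper for illustration; it is not part of SMV.
    """
    # errors="replace" keeps the preview going even if the file contains
    # bytes that are not valid UTF-8.
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\r\n") for line in islice(handle, n)]


if __name__ == "__main__":
    # Assumption: the download, unzip, and rename steps were run in this directory.
    sample = Path("CB1200CZ11.csv")
    if sample.exists():
        for line in preview_lines(sample):
            print(line)
    else:
        print(f"{sample} not found; run the download steps first")
```

Reading lazily with `islice` means only the requested lines are read, so the check stays fast even for large files.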

The following sections show how convenient and efficient data analyses can be with SMV.

Profile Input Data
Identify Insights from Data
Advanced Analytics
Quality Control
Smv Exercise 1: Employment Data

Remarks

SMV offers a modularized computation framework in which the scalability and reusability of data and code are expected to help scale the development team and reduce the development time of complicated, large-scale projects. This tutorial is mainly intended to help users get familiar with building a project with SMV. Users are encouraged to follow the latest development of the SMV project and to check the corresponding API docs for detailed help.
