abdul-git/IAC-Bigdata

#Introduction

This document is a reference to help students with the weekly labs and a guide to working hands-on with the Big Data stack. It is a living document and will be updated during and after the course.

We will use this repo to set up the labs and individual clusters needed to complete the hands-on exercises.

##Week-1: Intro to Hadoop

##Week-2: Cloudera hadoop cluster deployment in AWS

##Week-3a: Cloudera hadoop cluster deployment in AWS

Continue from previous week.

1 - Add a new data node to existing cluster

2 - Configure Hadoop NameNode (NN) HA

3 - Perform manual and automatic switchover

4 - Configure ResourceManager (RM) HA
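
As a rough aid for steps 2 to 4, the sketch below wraps the standard `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState` commands so you can check which node is active before and after a switchover. The helper name `ha_state` and the service IDs `nn1`, `nn2`, `rm1` and `rm2` are assumptions for illustration; the IDs in your cluster are whatever your HA configuration defines.

```shell
# Sketch only: report the HA state (active/standby) of one service.
# ha_state is a hypothetical helper, not part of this repo.
ha_state() {
    # $1 = service type: "nn" (NameNode) or "rm" (ResourceManager)
    # $2 = service ID as defined in your HA configuration (e.g. nn1; assumed)
    case "$1" in
        nn) hdfs haadmin -getServiceState "$2" ;;
        rm) yarn rmadmin -getServiceState "$2" ;;
        *)  echo "unknown service type: $1" >&2; return 2 ;;
    esac
}
```

On a cluster node you might run `ha_state nn nn1` and `ha_state nn nn2`, trigger a manual switchover with `hdfs haadmin -failover nn1 nn2`, and then check the states again. The same pattern applies to `ha_state rm rm1`.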

##Week-3b: MYSQL Setup

After the Hadoop cluster is successfully deployed in AWS, we need to set up the MySQL databases used in the data movement exercises. The setup scripts can be found in the GitHub repo.

repo URL = https://github.com/abdul-git/IAC-Bigdata

The repo can also be cloned with the command below:

git clone https://github.com/abdul-git/IAC-Bigdata.git

We will work with three datasets: Airlines, Retail_DB and Employees

•	Create database xxxx
•	Load data into interim MySQL database
•	Verify tables and datasets

First, we need to download and stage the installation scripts from the GitHub repo:

git clone https://github.com/abdul-git/IAC-Bigdata.git

The command above clones the scripts needed to stage the datasets.

 cd  /xyz/IAC-Bigdata/labs/week3/setup_mysql

Here "/xyz" is the location where you cloned the git repo; the directory structure will differ in your environment. If you cloned the repo under your home directory, you can change directory with the following command instead:

 cd  ~/IAC-Bigdata/labs/week3/setup_mysql
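
The clone and change-directory steps can be sketched as two small helpers, so the same commands work wherever you keep the repo. The function names `lab_setup_dir` and `clone_lab_repo` are hypothetical; only the repo URL and the `labs/week3/setup_mysql` path come from this document.

```shell
REPO_URL="https://github.com/abdul-git/IAC-Bigdata.git"

# Print the setup_mysql path for a given parent directory (default: $HOME).
lab_setup_dir() {
    printf '%s\n' "${1:-$HOME}/IAC-Bigdata/labs/week3/setup_mysql"
}

# Clone the repo under the parent directory unless a clone is already there.
clone_lab_repo() {
    if [ -d "${1:-$HOME}/IAC-Bigdata/.git" ]; then
        echo "Repo already present in ${1:-$HOME}/IAC-Bigdata"
    else
        git clone "$REPO_URL" "${1:-$HOME}/IAC-Bigdata"
    fi
}
```

For example, `clone_lab_repo /xyz && cd "$(lab_setup_dir /xyz)"` clones into `/xyz` if needed and moves into the setup directory. Calling both helpers with no argument uses your home directory.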

###Week 3 - Data Lab Setup

For the Week 3 data labs, follow the sequence below:

1 - setup_mysql - Create and load data into MySQL databases

2 - import_data - Import data from MySQL using Sqoop

3 - mr_word_count - Sample MapReduce program (pre-built, or you can compile the Java code)
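
To give a feel for step 2, the sketch below builds and prints (but does not run) a single-table Sqoop import command. It is not the repo's import_data script. The helper `sqoop_import_cmd`, the `SQOOP_USER` variable, the table name `customers` and the target directory are assumptions for illustration, and authentication options are left out.

```shell
# Sketch only: print a Sqoop import command for one MySQL table.
# Nothing is executed against MySQL or HDFS.
sqoop_import_cmd() {
    # $1 = MySQL host, $2 = database, $3 = table, $4 = HDFS target directory
    printf 'sqoop import --connect jdbc:mysql://%s/%s --username %s --table %s --target-dir %s -m 1\n' \
        "$1" "$2" "${SQOOP_USER:-ec2-user}" "$3" "$4"
}

# Placeholder values; replace them with the ones from your lab environment.
sqoop_import_cmd localhost retail_db customers /user/ec2-user/retail_db/customers
```

Printing the command first makes it easy to review the connection string and target directory before running the import on the cluster. `-m 1` requests a single mapper, which is usually enough for small lab tables.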

#####Execute scripts

Before we can import data, we need to set up MySQL for passwordless execution. To do that, switch to the setup directory with the command below:

cd ~/IAC-Bigdata/labs/week3/setup_mysql

Once you are in the setup_mysql directory, execute the setup scripts as below:

./create_mycf.sh
./create_ec2_user.sh

The scripts above create an env file and the user ec2-user (for AWS deployment), so that mysql login does not prompt for a password.
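
For background, passwordless login with the mysql client is commonly achieved with a client option file containing a `[client]` section. The sketch below writes such a file to a path you choose. It assumes this is roughly what the env file looks like; the actual contents produced by create_mycf.sh may differ, and `write_client_cnf` is a hypothetical helper.

```shell
# Sketch only: write a MySQL client option file readable by the owner only.
write_client_cnf() {
    # $1 = target file, $2 = MySQL user, $3 = password (all illustrative)
    (
        umask 077   # subshell keeps the restrictive umask local to this write
        printf '[client]\nuser=%s\npassword=%s\n' "$2" "$3" > "$1"
    )
    chmod 600 "$1"
}
```

A file written this way can be tried without touching your home directory, for example `mysql --defaults-extra-file=/tmp/lab.cnf`, where `--defaults-extra-file` must be the first option on the command line. The mysql client also reads `~/.my.cnf` automatically.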

Database creation

To create each database, follow the instructions in that database's README.

#Week-4

A. SQOOP SETUP AND IMPORT DATA FROM MYSQL

B. CREATE HIVE TABLES FROM EXISTING DATASET

C. CREATE TWITTER STREAM USING FLUME

#Week-5

Visual
