
Principles of Big Data Management project

Team Details

Veeresha M Thotigar

Sai Sampath Kumar Raigiri

Sai Srinivas Vidiyala

Phase 1

  1. First, we generated Twitter access keys from developer.twitter.com using our Twitter accounts.

  2. Using the tweepy package in Python, we downloaded tweets on topics such as yoga and meditation.

  3. We then wrote Python code to extract the URLs and hashtags from the downloaded tweets; this output is our translated input.

  4. We loaded the translated input into an HDFS directory using the "hdfs dfs -copyFromLocal <source path> <HDFS destination path>" command in the terminal.

  5. We ran the example word count program that ships with the Hadoop installation to produce word counts for the large data set.

  6. Similarly, we executed a Spark word count job on the same input data; its output is in the Spark folder.


  We pushed our Hadoop log files, the output, and the commands we used in the terminal as "steps_hadoop_wordcount.txt".

  The Hadoop folder contains the output and logs generated by the Hadoop word count program.

  The tweetcrawler.py file in the Python_script folder contains the Python code for downloading tweets using the access keys and saving them into a CSV file.
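
As a rough illustration of the approach (not the exact contents of tweetcrawler.py), a minimal tweepy-based crawler could look like the sketch below. The placeholder credentials, the tweepy 3.x `api.search` endpoint, and the output file name `tweets.csv` are assumptions for this example only.

```python
import csv
import tweepy

# Placeholder credentials; the real keys come from the Twitter developer
# portal and are not stored in this repository.
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Search for tweets on the project topics and save them to a CSV file.
with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["created_at", "user", "text"])
    for topic in ["yoga", "meditation"]:
        for tweet in tweepy.Cursor(api.search, q=topic, lang="en").items(500):
            writer.writerow([tweet.created_at, tweet.user.screen_name, tweet.text])
```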

  The Extract.py file in the Python_script folder contains the Python code for extracting the URLs and hashtags from the downloaded tweets into a text file.
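
The actual extraction logic lives in Extract.py; a minimal regex-based sketch of the same idea follows. The input/output file names and the `text` column are assumptions carried over from the crawler sketch above.

```python
import csv
import re

# Hypothetical file names; the names used by Extract.py may differ.
INPUT_CSV = "tweets.csv"
OUTPUT_TXT = "translated_input.txt"

url_pattern = re.compile(r"https?://\S+")
hashtag_pattern = re.compile(r"#\w+")

with open(INPUT_CSV, newline="", encoding="utf-8") as src, \
     open(OUTPUT_TXT, "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        # Keep only the URLs and hashtags from each tweet, one tweet per line.
        tokens = url_pattern.findall(row["text"]) + hashtag_pattern.findall(row["text"])
        if tokens:
            dst.write(" ".join(tokens) + "\n")
```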

  The Spark folder contains the output and logs of the submitted Spark job.
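
For reference, a minimal PySpark word count equivalent to the job we submitted is sketched below; the HDFS paths are placeholders, and the actual output and logs are the ones in the Spark folder.

```python
from pyspark import SparkContext

# Word count over the translated input already copied into HDFS.
# The input and output paths below are placeholders, not the paths
# used in the actual job.
sc = SparkContext(appName="TweetWordCount")

counts = (
    sc.textFile("hdfs:///user/<username>/translated_input.txt")
      .flatMap(lambda line: line.split())        # split each line into words
      .map(lambda word: (word, 1))               # pair each word with a count of 1
      .reduceByKey(lambda a, b: a + b)           # sum the counts per word
)

counts.saveAsTextFile("hdfs:///user/<username>/spark_wordcount_output")
sc.stop()
```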