
Web scraping using BeautifulSoup and data analysis using Statsmodels and Scikit.


salisquraishi/tv_scraping_regressions

 
 


tv-scraping-regressions

Web scraping using BeautifulSoup and data analysis using Statsmodels and Scikit.

#### Folder structure:

_py_helpers
contains .py files used in the various .ipynb notebooks.
* sidereel.py uses the BeautifulSoup module to scrape data from www.sidereel.com and generates the variables ``show_list`` (all TV shows as of July 2015) and ``show_titles`` (all TV show titles as of July 2015, a subset of ``show_list``).
* tvseriesfinale.py uses the BeautifulSoup module to scrape data from www.tvseriesfinale.com and generates the variables ``canceled_shows`` (concluded TV shows from 2011-2015) and ``title`` (concluded TV show titles, a subset of ``canceled_shows``).
* wikipedia-state.py generates the variable ``show_state``, which maps shows to their settings (US state) as of July 2015.
* generate_tv_csv_dataset.py generates the ``data/tv.csv`` dataset used in the analysis.
* generate_tv_training_df.py converts the ``data/tv.csv`` dataset into the pandas dataframe ``tv_df``.
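
As a rough illustration of the scraping pattern these helpers describe, here is a minimal sketch that parses an inline HTML snippet with BeautifulSoup and builds a list of shows plus a list of titles. The markup, the CSS selector, the `extract_shows` function, and the dictionary keys are all assumptions made for this example; the real page structure of www.sidereel.com and the actual contents of ``show_list`` are not documented here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a scraped page; the real site
# structure is not described in this README.
SAMPLE_HTML = """
<ul class="shows">
  <li><a href="/shows/a">Show A</a></li>
  <li><a href="/shows/b">Show B</a></li>
</ul>
"""


def extract_shows(html):
    """Return one dict per show link found in the (assumed) list markup."""
    soup = BeautifulSoup(html, "html.parser")
    shows = []
    for link in soup.select("ul.shows a"):
        shows.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
        })
    return shows


# Names mirror the README's variables, but the record layout is an assumption.
show_list = extract_shows(SAMPLE_HTML)
show_titles = [show["title"] for show in show_list]
```

In a live run, the HTML string would come from an HTTP request instead of a constant, so the parsing step can be tested offline against saved pages.
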
data
contains csv files used in running .ipynb notebooks.
* tv_20150718_dataset.csv is a backup copy of the dataset used in the analysis (it can be used when web scraping fails).
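
The fallback to the backup copy could look roughly like the sketch below. It assumes the backup lives at `data/tv_20150718_dataset.csv`; `load_tv_df` and its `scrape` argument are hypothetical names introduced for this example and are not part of the repository.

```python
import pandas as pd


def load_tv_df(scrape, backup_path="data/tv_20150718_dataset.csv"):
    """Return the result of scrape(), or the backup CSV if scraping raises.

    `scrape` is any zero-argument callable that builds the dataset
    (a hypothetical stand-in for the scraping helpers).
    """
    try:
        return scrape()
    except Exception as exc:
        # Scraping depends on external sites, so fall back to the local copy.
        print(f"Scraping failed ({exc}); loading backup from {backup_path}")
        return pd.read_csv(backup_path)
```

Catching a broad `Exception` keeps the sketch short; narrowing it to network and parsing errors would avoid hiding unrelated bugs.
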
