Sentiment Analysis of IMDB Movie Reviews

This project performs sentiment analysis of IMDB movie reviews and builds a flask web application that accepts users reviews of a movie and predicts the net promoter score for that movie review based solely on the text of the review.

Description

The project developed custom machine learning models to predict the net promoter score I user would provide to a review based on sentiment of that review.

The details of the models developed is documented in the presentation linked below, but to summarize the following models were developed and their corresponding accuracy scores are indicated in the table below:

Model	Test Accuracy
Random Forest	0.55
One Dimensional CNN with Custom Word Embedding	0.71
Bidirectional LSTM with GloVe Pretrained Word Embedding	0.74

Data Used

I scraped 62,612 movie reviews from imdb.com for movies released in the USA anytime between January 1, 2010 to December 31, 2019.

The image below is one of the reviews I captured with the specific fields I included in the data set called-out by the red arrows:

Brief Overview of the NLP Pipeline Used to Perform Sentiment Analysis

The diagram below summarize the NLP pipeline used to process the reviews, and train and validate the models.

Tools Used

python (version 3.8.3)
jupyter lab (2.2.6)
bs4 (4.9.1)
requests (2.24.0)
sklearn (0.23.2)
keras 2.4.0
tensorflow 2.3.1
NLTK: Perform tokenization (splitting into words) and lemmatization of the original text corpus and cleaning (removal of non-English words).
Omni Graffle Pro
A PDF file viewer (Acrobat)
Microsoft Powerpoint
emacs (or some other preferred text editor)
matplotlib
seaborn

Description of Files

MyMovieReview.py: Contains classes used to facilitate multi-day scraping and quick translation from raw scraped data to a pandas data frame
Project5.ipynb: Contains the NLP pipeline described above
get_word_vectors.sh: script that automatically downloads GloVE word embeddings from ai.standford.edu
mymovie.py: Contains classes used to read in an prescraped collection of IMDB movie URLs
project_utils.py: Contains utility functions used throughout the code in this project to aid in debug in enable better scraping
project5_venv.yml: The conda requirements file you can use to directly create the the python virtual environment used for this project
scrape_tool.py: The python script that automatically scraped movie reviews. It has some exception handling ability. But it also has the ability to pick up where it left off.
Link_to_Presentation.md: Contains a link to the presentation

Steps To Reproduce This Project

Clone this repo
Run this command: conda env create -f Source/project5_venv.yml
Once that's done and you have activated your virtual environment you should be able to start jupyter lab with this command: jupyter lab
With Jupyter Lab, open the notebook Source/Project5.ipynb
You should be able to run the cells. Please note that the default settings of the notebook require a pkl file of the cleaned data. Please contact me if you want that data.

The Web Application

Running the web application will required the following:

A setup script to set the application configuration variables that contain the credentials to enable the web application to access the remote SQL database and to send emails to the admin of the web application. You can ask me for a copy of the setup script.

Once you have obtained the the setup script you can run the following commands to launch the web application

cd Source/web_app
source <setup_script>
flask run

The fields to enter I estimate are likely self explanatory. If you would like to see a demo you can download a video here: Demo Link

It saves the reviews you enter into the database so that they could be used to re-train the model (and hopefully improve it) over time.
When you hit the [Submit] button the web application predicts the net promoter score of the movie review you entered into the web application

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
Documents		Documents
Source		Source
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment Analysis of IMDB Movie Reviews

Description

Data Used

Brief Overview of the NLP Pipeline Used to Perform Sentiment Analysis

Tools Used

Description of Files

Steps To Reproduce This Project

The Web Application

About

Releases

Packages

Languages

License

michael-a-green/Metis_Project5

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis of IMDB Movie Reviews

Description

Data Used

Brief Overview of the NLP Pipeline Used to Perform Sentiment Analysis

Tools Used

Description of Files

Steps To Reproduce This Project

The Web Application

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages