Predicting the rating a reviewer will give a restaurant using Featuretools and the nlp-primitives library
When customers visit restaurants, they often leave a review. Using data from TripAdvisor, we investigate how this text can be used to predict the customer's overall opinion of the restaurant, expressed as a star rating.
In this tutorial, we show how Featuretools can be used alongside the nlp-primitives library to train an accurate machine learning model that can predict a customer's rating of a restaurant based on the text of their review and some information about the restaurant.
- We use the nlp-primitives library to create structured data from unstructured, hard-to-parse text data
- We achieve an accuracy 40% higher than the baseline
- We use these primitives alongside Featuretools' `dfs` method to extract as much information as possible from a dataset containing only two entities
- The `dfs` method stacks the default primitives on top of the nlp-primitives to create new, data-rich features
- We build a pipeline that can be reused for numerous NLP prediction problems (you can try this yourself!)
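To make "structured data from unstructured text" concrete, here is a minimal sketch in plain Python. The three functions are hypothetical, hand-written stand-ins and are not the nlp-primitives implementations; they only illustrate the kind of numeric column a text primitive produces from a review, under the assumption that a review is a single string.

```python
import string


# Hypothetical stand-ins for text primitives: NOT the nlp-primitives code,
# just simple analogues that map one review string to one number.
def punctuation_count(text: str) -> int:
    """Count the punctuation characters in the text."""
    return sum(1 for ch in text if ch in string.punctuation)


def title_word_count(text: str) -> int:
    """Count the whitespace-separated words that start with an uppercase letter."""
    return sum(1 for word in text.split() if word[:1].isupper())


def mean_characters_per_word(text: str) -> float:
    """Average word length after stripping surrounding punctuation; 0.0 if no words."""
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def featurize(reviews):
    """Turn a list of review strings into a list of numeric feature rows."""
    return [
        {
            "punctuation_count": punctuation_count(text),
            "title_word_count": title_word_count(text),
            "mean_characters_per_word": mean_characters_per_word(text),
        }
        for text in reviews
    ]


if __name__ == "__main__":
    # Made-up example reviews, not taken from the TripAdvisor data.
    sample_reviews = ["Great food, friendly staff!", "Cold soup. Never again."]
    for row in featurize(sample_reviews):
        print(row)
```

Each row is an ordinary table of numbers that a standard classifier can consume. In the tutorial, the library's primitives play this role, and `dfs` builds further features on top of their outputs.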
- Clone the repo

      git clone

- Install the requirements

      pip install -r requirements.txt
      conda install pytorch cpuonly -c pytorch

- Download the data

  You can download the data directly from Kaggle. Be sure to rename it `reviews.json`, or change the file name in the tutorial.

- Run the tutorial notebook, Predict-Restaurant-Rating, using Jupyter

      jupyter notebook
Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.
Any questions can be directed to [email protected]