Skip to content

mikewcasale/nlp_primitives

 
 

Repository files navigation

Predicting the rating a reviewer will give a restaurant using Featuretools and the nlp-primitives library

Featuretools

When customers visit restaurants, they will oftentimes leave a review of some sort. Using data from TripAdvisor, we investigate how this text data can be used to predict the overall thoughts of the customer on that restuarant represented in a star rating.

In this tutorial, we show how Featuretools can be used alongside the nlp-primitives library to train an accurate machine learning model that can predict a customer's rating of a restaurant based on the text of their review and some information about the restaurant.

Highlights

  • We use the nlp-primitives library to create structured data from unstructured, hard to parse, text data
  • We acheive an accuracy rating 40% higher than the baseline
  • We use these primitives alongside Featuretools' dfs method to create as much information as possible from a dataset containing only two entities.
  • The dfs method stacks the default primitives on top of the nlp-primitives to create new, data-rich, features.
  • We build a pipeline that it can be reused for numerous NLP prediction problems (You can try this yourself!)

Running the tutorial

  1. Clone the repo

    git clone 
    
  2. Install the requirements

    pip install -r requirements.txt
    
    conda install pytorch cpuonly -c pytorch
    
  3. Download the data

    You can download the data directly from Kaggle here. Be sure to re-name it reviews.json, or change the file name in the tutorial.

  4. Run the tutorial notebook, Predict-Restaurant-Rating using Jupyter

    jupyter notebook
    

Feature Labs

Featuretools

Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Contact

Any questions can be directed to [email protected]

About

Natural Language Processing primitives for Featuretools

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Jupyter Notebook 92.6%
  • Python 7.1%
  • Other 0.3%