Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or for featurization and dimensionality reduction in later stages of data analysis.
Latent Dirichlet Allocation (LDA) is a topic model. I used LDA in this project to derive 'topics' from the dataset provided. The code was written in Python.
The dataset was obtained from Yelp’s website.
- Prepare the data:
  - Tokenizing
  - Stopping
  - Stemming
- Construct a Document-term Matrix
- Apply the LDA Model
- Examine the results
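The steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: the README does not say which libraries were used, so the sketch relies only on the Python standard library. The stop-word list, the crude suffix-stripping `stem` function, the small collapsed Gibbs sampler in `fit_lda`, and the sample reviews are all assumptions made for illustration; a real pipeline would more likely use an established stemmer and LDA implementation.

```python
import random
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "were", "to", "of", "it", "for"}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize into lowercase words, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix), where matrix[d][w] counts term w in document d."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for d, doc in enumerate(docs):
        for t in doc:
            matrix[d][index[t]] += 1
    return vocab, matrix


def fit_lda(matrix, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns topic-word count rows."""
    rng = random.Random(seed)
    num_words = len(matrix[0])
    # Expand the counts into one (document, word) pair per token occurrence.
    tokens = [(d, w) for d, row in enumerate(matrix)
              for w, count in enumerate(row) for _ in range(count)]
    assignments = [rng.randrange(num_topics) for _ in tokens]
    doc_topic = [[0] * num_topics for _ in matrix]
    topic_word = [[0] * num_words for _ in range(num_topics)]
    topic_total = [0] * num_topics

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    for (d, w), z in zip(tokens, assignments):
        update(d, w, z, +1)
    for _ in range(iterations):
        for i, (d, w) in enumerate(tokens):
            # Remove the token, resample its topic, then add it back.
            update(d, w, assignments[i], -1)
            weights = [
                (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                / (topic_total[k] + beta * num_words)
                for k in range(num_topics)
            ]
            assignments[i] = rng.choices(range(num_topics), weights=weights)[0]
            update(d, w, assignments[i], +1)
    return topic_word


def top_terms(topic_word, vocab, n=3):
    """List the n highest-count terms for each topic."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in topic_word]


if __name__ == "__main__":
    # Hypothetical reviews, not taken from the Yelp dataset.
    reviews = [
        "The pizza was tasty and the pasta was amazing",
        "Tasty pasta and crispy pizza",
        "Parking was hard and the waiting took ages",
        "Waiting for parking ruined the evening",
    ]
    docs = [preprocess(r) for r in reviews]
    vocab, matrix = document_term_matrix(docs)
    for k, terms in enumerate(top_terms(fit_lda(matrix, num_topics=2), vocab)):
        print(f"topic {k}: {terms}")
```

Examining the results here means reading the top terms per topic and judging whether they form coherent themes. With a corpus this small the topics are not meaningful; the sketch only shows how the steps connect.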
This project is licensed under the GNU 2.0 License - see the LICENSE.md file for details