A Document Classification project using Latent Dirichlet Allocation
This repository demonstrates the use of Topic Modelling for Document Classification. Topic Modelling learns the latent topics in a document corpus, and the learned topic representation can be reused for downstream tasks such as classification. The notebook LDA.ipynb first implements a rudimentary version of Latent Dirichlet Allocation (LDA) using Gibbs Sampling, a Markov Chain Monte Carlo (MCMC) algorithm, and then uses its output as input to a Support Vector Machine that classifies the documents.
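To make the Gibbs Sampling step concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is not the code from LDA.ipynb: the function name lda_gibbs, its parameters (alpha, beta, n_iters, seed) and the assumption that documents are already encoded as lists of integer word ids are all illustrative choices.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical API).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns theta: one list of n_topics topic proportions per document.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens assigned to each topic
    z = []                                              # topic assignment of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and put the token back into the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed per-document topic proportions from the final counts.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta
```

Each row of the returned theta is a fixed-length vector of topic proportions for one document. One plausible way to do the classification step is to pass these rows, together with the document labels, to a classifier such as scikit-learn's sklearn.svm.SVC; whether the notebook does exactly this is an assumption.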
Note: The original paper on LDA (Blei, Ng and Jordan, "Latent Dirichlet Allocation", Journal of Machine Learning Research, 2003) is an interesting read!