• The dataset is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
• Built a vocabulary from the dataset, which was used as the feature set.
• Implemented a Multinomial Naive Bayes classifier from scratch to classify each document into the appropriate newsgroup.
• Naive Bayes from scratch: 0.8474
• scikit-learn Naive Bayes: 0.8476
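
The vocabulary and classifier steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `tokenize`, `build_vocabulary`, `max_size`, and `alpha`, the whitespace tokenizer, the Laplace smoothing value, and the four toy documents are all hypothetical and chosen only to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def build_vocabulary(docs, max_size=2000):
    # Keep the most frequent tokens in the training documents as features.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return [w for w, _ in counts.most_common(max_size)]


class MultinomialNB:
    def __init__(self, vocabulary, alpha=1.0):
        self.vocab = set(vocabulary)
        self.alpha = alpha  # Laplace smoothing (assumed value)

    def fit(self, docs, labels):
        class_docs = Counter(labels)
        word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            word_counts[label].update(
                t for t in tokenize(doc) if t in self.vocab
            )
        n_docs = len(docs)
        # log P(class), estimated from class frequencies
        self.log_prior = {c: math.log(k / n_docs) for c, k in class_docs.items()}
        # log P(word | class) with additive smoothing
        self.log_likelihood = {}
        for c in class_docs:
            total = sum(word_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / total)
                for w in self.vocab
            }
        return self

    def predict(self, doc):
        # Out-of-vocabulary tokens are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood[c][t] for t in tokens)
            for c in self.log_prior
        }
        return max(scores, key=scores.get)


# Toy data for illustration only; not taken from the real dataset.
train_docs = [
    "the rocket launch orbit",
    "orbit satellite launch",
    "hockey game goal",
    "goal team hockey game",
]
train_labels = ["sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey"]

vocab = build_vocabulary(train_docs)
model = MultinomialNB(vocab).fit(train_docs, train_labels)
print(model.predict("satellite orbit"))
print(model.predict("hockey team"))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long documents, and the smoothing term keeps a word that never appeared in a class from driving that class's score to negative infinity.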