NLP-twitter-sentiment-analysis

Analyze the sentiment of tweets in the Sentiment140 dataset with a machine learning pipeline that uses Term Frequency-Inverse Document Frequency (TF-IDF) features and two classifiers: Logistic Regression and Bernoulli Naive Bayes.
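A minimal sketch of how such a pipeline might be wired together with scikit-learn. The toy tweets, the 0/1 label encoding, the `ngram_range` setting, and the split ratio are illustrative assumptions, not values taken from this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data standing in for Sentiment140 (assumed: 0 = negative, 1 = positive).
texts = [
    "love this phone", "great day today", "so happy with the result", "best movie ever",
    "hate the traffic", "worst service ever", "so sad and tired", "terrible day today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a stratified test subset so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Fit TF-IDF on the training text only, then reuse the vocabulary for the test text.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train both classifiers on the same features and record test accuracy.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "BernoulliNB": BernoulliNB(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    scores[name] = model.score(X_test_vec, y_test)
```

Fitting the vectorizer on the training subset alone keeps test-set vocabulary and document frequencies out of the features, which is the usual way to avoid leakage.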

Evaluated the classifiers using a confusion matrix and the ROC curve (ROC-AUC).
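The evaluation step could look roughly like the sketch below, which uses `sklearn.metrics`. The helper name `evaluate`, the 0.5 decision threshold, and the hand-made labels and scores are assumptions for illustration; the project's own evaluation function may differ.

```python
from sklearn.metrics import auc, confusion_matrix, roc_curve

def evaluate(y_true, y_pred, y_score):
    """Return the confusion matrix and the area under the ROC curve."""
    cm = confusion_matrix(y_true, y_pred)      # rows: true class, columns: predicted class
    fpr, tpr, _ = roc_curve(y_true, y_score)   # ROC points over all score thresholds
    return cm, auc(fpr, tpr)

# Hypothetical true labels and predicted positive-class probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in y_score]      # assumed 0.5 threshold

cm, roc_auc = evaluate(y_true, y_pred, y_score)
print(cm)
print(f"ROC-AUC: {roc_auc:.2f}")
```

The confusion matrix depends on the chosen threshold, while the ROC-AUC summarizes ranking quality across all thresholds, so the two give complementary views of a classifier.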

Project pipeline:

1. Import necessary dependencies
2. Read and load the dataset
3. Exploratory data analysis
4. Data visualization of target variables
5. Data preprocessing
6. Splitting the data into train and test subsets
7. Transforming the dataset using the TF-IDF vectorizer
8. Function for model evaluation
9. Model building
10. Conclusion

Used the following libraries: nltk, sklearn, pandas, numpy, matplotlib, seaborn, regex.

Preprocessed the data by removing stopwords, punctuation, URLs, usernames, emojis, etc. Finally, applied stemming and lemmatization.
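A rough sketch of the cleaning steps using only the standard library. The function name `clean_tweet`, the tiny stopword set, and the regular expressions are hypothetical stand-ins; in practice the stopword list would likely come from nltk. Stemming and lemmatization are left out of this sketch and would follow as a final step on the remaining tokens.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set (a real list would be much longer).
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "i", "my", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
USER_RE = re.compile(r"@\w+")                   # @usernames
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")     # crude emoji removal: drops all non-ASCII
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, usernames, emojis, punctuation, and stopwords."""
    text = text.lower()
    # Remove URLs and usernames before punctuation, while "://" and "@" are still intact.
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

cleaned = clean_tweet("@bob I love the new phone!!! 😍 http://t.co/xyz so good")
print(cleaned)  # love new phone good
```

Note that dropping every non-ASCII character also removes accented letters, so this shortcut only suits English-only text.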