Analyzed the sentiment of tweets in the Sentiment140 dataset by developing a machine learning pipeline that trains two classifiers (Logistic Regression and Bernoulli Naive Bayes) on Term Frequency-Inverse Document Frequency (TF-IDF) features.
Evaluated each classifier using a confusion matrix and the ROC-AUC curve.
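
The training and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy tweets, the 75/25 split, the bigram range, and the helper name `evaluate` are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Toy stand-in for cleaned tweets (assumption: 1 = positive, 0 = negative).
texts = [
    "love this phone", "great day today", "so happy now", "best movie ever",
    "really good food", "awesome new song", "feeling great", "nice weather",
    "hate this phone", "terrible day today", "so sad now", "worst movie ever",
    "really bad food", "awful new song", "feeling sick", "horrible weather",
]
labels = [1] * 8 + [0] * 8

# Hold out a stratified test subset so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)


def evaluate(model):
    """Hypothetical helper: fit a model, return (confusion matrix, ROC AUC)."""
    model.fit(X_train_vec, y_train)
    predictions = model.predict(X_test_vec)
    # Probability of the positive class is the score used for ROC AUC.
    scores = model.predict_proba(X_test_vec)[:, 1]
    matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
    return matrix, roc_auc_score(y_test, scores)


models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Bernoulli Naive Bayes": BernoulliNB(),
}
results = {name: evaluate(model) for name, model in models.items()}

for name, (matrix, auc) in results.items():
    print(name, "ROC AUC:", round(auc, 3))
    print(matrix)
```

Fitting the vectorizer on the training subset alone keeps vocabulary and IDF statistics from leaking out of the test subset. On a sample this small the scores mean little; the point is only the shape of the workflow.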
Project pipeline:
1. Import Necessary Dependencies
2. Read and Load the Dataset
3. Exploratory Data Analysis
4. Data Visualization of Target Variables
5. Data Preprocessing
6. Splitting our data into Train and Test Subset
7. Transforming Dataset using TF-IDF Vectorizer
8. Function for Model Evaluation
9. Model Building
10. Conclusion
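
As an illustration of the "Read and Load the Dataset" step, here is a hedged sketch using pandas. The column names, the latin-1 encoding, and the 0/4 label encoding reflect the commonly distributed Sentiment140 CSV and are assumptions here, as is the helper name `load_sentiment140`; the two in-memory rows are made up so the example runs without the real file.

```python
import io

import pandas as pd

# Assumed column layout of the Sentiment140 CSV (the file has no header row).
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]


def load_sentiment140(source):
    """Hypothetical loader: keep text and target, map labels 0/4 to 0/1."""
    df = pd.read_csv(source, encoding="latin-1", header=None, names=COLUMNS)
    df = df[["text", "target"]]
    # Assumption: 0 = negative, 4 = positive; drop any other label values.
    df = df[df["target"].isin([0, 4])].copy()
    df["target"] = df["target"].map({0: 0, 4: 1})
    return df


# Made-up rows standing in for the real file.
sample = io.BytesIO(
    b'"0","1","Mon Apr 06 2009","NO_QUERY","user_a","so sad today"\n'
    b'"4","2","Mon Apr 06 2009","NO_QUERY","user_b","what a great day"\n'
)
df = load_sentiment140(sample)
print(df["target"].value_counts())
```

Passing a file path instead of the in-memory buffer would load the full dataset in the same way.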
Used the following libraries: nltk, sklearn, pandas, numpy, matplotlib, seaborn, regex
Preprocessed the data by removing stopwords, punctuation, URLs, usernames, emojis, etc. Finally, applied stemming and lemmatization.
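
The cleaning steps above might look something like the sketch below. It uses only the standard library so that it runs without downloads; the tiny stopword set, the regular expressions, and the function name `clean_tweet` are assumptions for illustration, and non-ASCII stripping is used as a crude stand-in for emoji removal.

```python
import re
import string

# Tiny illustrative stopword set (assumption); a fuller list such as
# nltk's English stopwords would normally be used instead.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "it", "i", "my"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
NON_ASCII_PATTERN = re.compile(r"[^\x00-\x7F]+")  # crude emoji removal
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text, stem=None):
    """Lowercase, strip URLs/usernames/emojis/punctuation, drop stopwords.

    `stem` is an optional callable applied to each remaining token.
    """
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    text = NON_ASCII_PATTERN.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = [token for token in text.split() if token not in STOPWORDS]
    if stem is not None:
        tokens = [stem(token) for token in tokens]
    return " ".join(tokens)


cleaned = clean_tweet("@bob I LOVE the new phone!!! 😍 http://t.co/xyz")
print(cleaned)  # love new phone
```

URLs and usernames are removed before punctuation so that the `@` and `://` markers are still present when their patterns run. For the stemming and lemmatization step, a callable such as `nltk.stem.PorterStemmer().stem` or `nltk.stem.WordNetLemmatizer().lemmatize` could be passed as `stem`; the lemmatizer needs the WordNet corpus downloaded first.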