This document outlines the process of analyzing perfume buyer and product data. It covers data preprocessing, exploratory data analysis (EDA), machine learning modeling, and evaluation of the results.
The buyer dataset is loaded and inspected as follows:

- Loading Data: Data is loaded from the 'noon_perfumes_buyer_dataset.csv' file using pd.read_csv.
- Checking Data Shape: The number of rows and columns is checked with the .shape attribute.
- Displaying the First 5 Rows: The first five rows of the dataset are displayed with the .head() method.
- Data Information: Column data types and non-null counts are checked with the .info() method.
- Descriptive Statistics: The .describe() method provides descriptive statistics. Numerical columns are summarized by default, and include='all' extends the summary to categorical columns.
- Null Value Check: The number of missing values in each column is counted with .isnull().sum().
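The overview steps above can be sketched in pandas as follows. The column names and the three sample rows are hypothetical stand-ins for the real CSV, included so that the snippet runs without the file. With the actual data, pass 'noon_perfumes_buyer_dataset.csv' instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical rows standing in for noon_perfumes_buyer_dataset.csv;
# the real file's columns may differ.
SAMPLE_CSV = """buyer_id,brand,name,base_note,middle_note,rating
1,Dior,Sauvage,amber,lavender,5
2,Chanel,Bleu,cedar,,4
3,Dior,Sauvage,amber,lavender,
"""


def inspect_dataset(source):
    """Load a CSV (path or file-like buffer) and print the overview checks."""
    df = pd.read_csv(source)
    print("Shape (rows, columns):", df.shape)
    print(df.head())                    # first 5 rows
    df.info()                           # dtypes and non-null counts, printed directly
    print(df.describe(include="all"))   # numerical and categorical summaries
    print(df.isnull().sum())            # missing values per column
    return df


# With the real data: buyers = inspect_dataset("noon_perfumes_buyer_dataset.csv")
buyers = inspect_dataset(io.StringIO(SAMPLE_CSV))
```

The same helper can be reused for the perfume dataset, since it is checked with the same method.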
The perfume dataset is given the same overview checks as the buyer dataset.

### 2. Data Preprocessing
Buyer dataset:

- Removing Missing Values: Rows with missing values are removed with the .dropna() method.
- Deleting Empty String Data: Rows containing empty strings in specific columns are removed.
- Feature Integration: brand and name are combined into a single feature.
- One-Hot Encoding: Base notes and middle notes are one-hot encoded.
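A minimal sketch of these buyer-side steps follows. It rests on several assumptions:

- The toy rows and column names are hypothetical.
- Which columns are checked for empty strings is not specified in the text.
- pd.get_dummies stands in for whatever one-hot encoder the project actually used.

```python
import pandas as pd

# Hypothetical buyer rows; column names are assumptions, not the real schema.
buyers = pd.DataFrame({
    "buyer_id": [1, 2, 3, 4],
    "brand": ["Dior", "Chanel", "Dior", "Gucci"],
    "name": ["Sauvage", "Bleu", "Sauvage", "Bloom"],
    "base_note": ["amber", "cedar", "", "musk"],
    "middle_note": ["lavender", "pepper", "lavender", None],
})

# 1. Drop rows containing any missing value (removes buyer 4).
buyers = buyers.dropna()

# 2. Drop rows whose selected text columns are empty or whitespace-only (removes buyer 3).
text_cols = ["base_note", "middle_note"]
non_empty = buyers[text_cols].apply(lambda col: col.str.strip() != "").all(axis=1)
buyers = buyers[non_empty].copy()  # copy so later assignments modify a real frame

# 3. Combine brand and name into a single product feature.
buyers["perfume"] = buyers["brand"] + " " + buyers["name"]

# 4. One-hot encode base and middle notes as 0/1 integer columns.
buyers = pd.get_dummies(
    buyers, columns=["base_note", "middle_note"], prefix=["base", "middle"], dtype=int
)
print(buyers)
```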
Perfume dataset:

- Removing Unnecessary Columns: Unneeded columns are dropped with the .drop() method.
- Data Cleaning: Specific characters are replaced using str.replace().
- Feature Integration and One-Hot Encoding: As with the buyer dataset, features are combined and one-hot encoded.
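The perfume-side preprocessing can be sketched as below. The following are all illustrative assumptions, not the project's confirmed choices:

- the sample catalogue and its columns;
- the dropped "seller" column;
- the ", " separator being replaced;
- the use of str.get_dummies to one-hot encode perfumes that list several notes.

```python
import pandas as pd

# Hypothetical catalogue rows; the project's real perfume columns may differ.
perfumes = pd.DataFrame({
    "brand": ["Dior", "Chanel", "Gucci"],
    "name": ["Sauvage", "Bleu", "Bloom"],
    "seller": ["noon", "noon", "shop1"],
    "base_note": ["amber, musk", "cedar", "musk"],
    "middle_note": ["lavender", "pepper", "jasmine, rose"],
})

# Drop columns the analysis does not use ("seller" is an assumed example).
perfumes = perfumes.drop(columns=["seller"])

# Clean characters: turn ", " separators into "|" so multi-note strings
# can be split in one pass (the exact characters replaced are an assumption).
for col in ["base_note", "middle_note"]:
    perfumes[col] = perfumes[col].str.replace(", ", "|", regex=False)

# Combine brand and name into one product feature, as for the buyer data.
perfumes["perfume"] = perfumes["brand"] + " " + perfumes["name"]

# One-hot encode notes: str.get_dummies yields one 0/1 column per distinct note.
base = perfumes["base_note"].str.get_dummies(sep="|").add_prefix("base_")
middle = perfumes["middle_note"].str.get_dummies(sep="|").add_prefix("middle_")
perfumes = pd.concat(
    [perfumes.drop(columns=["base_note", "middle_note"]), base, middle], axis=1
)
print(perfumes)
```

Splitting on a separator lets a perfume with two base notes set two indicator columns. Plain pd.get_dummies would treat "amber, musk" as one unseen category.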
Data merging and recommendation:

- Data Merging: The buyer and perfume datasets are merged using pd.merge().
- Creating a User-Item Matrix: A user-item matrix is created using .pivot_table().
- KNN Model: A perfume recommendation model is fitted using NearestNeighbors.
- Recommendation Function: A function returns perfume recommendations for a given input perfume name.

### 4. KNN Classification Model

- Data Splitting: The data is split into training and test sets using train_test_split.
- Training and Evaluating the KNN Classifier: The model is trained with KNeighborsClassifier and evaluated with accuracy_score and classification_report.
- Elbow Graph: An elbow graph is plotted to find the optimal value of K.
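The merging, user-item matrix, recommendation, and KNN classification steps above can be sketched with pandas and scikit-learn. The following are all assumptions made to keep the snippet runnable:

- the toy buyer and perfume rows;
- the column names (buyer_id, perfume, rating, department);
- cosine distance for NearestNeighbors;
- the hypothetical recommend helper;
- the make_classification data, standing in for the project's encoded features and target (which this page does not specify).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# --- Recommendation (hypothetical toy data; real schemas may differ) ---
buyers = pd.DataFrame({
    "buyer_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "perfume": ["Dior Sauvage", "Chanel Bleu", "Dior Sauvage", "Chanel Bleu",
                "Gucci Bloom", "YSL Libre", "Gucci Bloom", "YSL Libre"],
    "rating": [5, 4, 4, 5, 5, 4, 4, 5],
})
perfumes = pd.DataFrame({
    "perfume": ["Dior Sauvage", "Chanel Bleu", "Gucci Bloom", "YSL Libre"],
    "department": ["Men", "Men", "Women", "Women"],
})

# Attach perfume attributes to each purchase via the shared product key.
merged = pd.merge(buyers, perfumes, on="perfume", how="inner")

# User-item matrix: one row per perfume, one column per buyer, 0 = not rated.
matrix = merged.pivot_table(index="perfume", columns="buyer_id",
                            values="rating", fill_value=0)

# Item-based neighbours by cosine distance between perfume rating vectors.
knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(matrix.values)


def recommend(perfume_name, n=1):
    """Hypothetical helper: the n perfumes rated most similarly to perfume_name."""
    row = matrix.index.get_loc(perfume_name)
    _, idx = knn.kneighbors(matrix.values[[row]], n_neighbors=n + 1)
    # The query perfume is its own nearest neighbour, so skip it.
    return [matrix.index[i] for i in idx[0] if i != row][:n]


print(recommend("Dior Sauvage"))

# --- Classification (synthetic stand-in for encoded features and labels) ---
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Elbow search: test error rate for each K (plot error_rates against k_values).
k_values = list(range(1, 21))
error_rates = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rates.append(1 - accuracy_score(y_test, model.predict(X_test)))

best_k = k_values[error_rates.index(min(error_rates))]
clf = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("K =", best_k, "accuracy =", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

This sketch picks K by test-set error for brevity, which lets the test set influence model selection. A separate validation split or cross-validation gives a less biased accuracy estimate. The elbow graph itself would be a line plot of error_rates against k_values, for example with matplotlib.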
Classification Report and Accuracy: The model's performance is summarized in a classification report, and its accuracy is calculated.

This structured documentation provides a clear overview of the analysis process, making it easier to understand and follow the steps involved in the data analysis project.