Park-JuH edited this page Dec 5, 2023 · 5 revisions

Perfume Buyers and Products Data Analysis

This document outlines the process of analyzing perfume buyers and product data, encompassing steps like data preprocessing, exploratory data analysis (EDA), machine learning modeling, and result evaluation.

1. Dataset Overview

1.1 Buyer Dataset

Loading Data: Data is loaded from the 'noon_perfumes_buyer_dataset.csv' file using pd.read_csv.
Checking Data Shape: The number of rows and columns is checked using the .shape attribute.
Displaying First 5 Rows: The first 5 rows of the dataset are displayed using the .head() method.
Data Information: Data types and missing values are checked using the .info() method.
Descriptive Statistics: The .describe() method provides descriptive statistics for both numerical and categorical data.
Null Value Check: The number of missing values in each column is identified using .isnull().sum().
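
The overview steps above can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the inline CSV and its column names (`buyer_id`, `brand`, `name`, `base_note`, `rating`) are assumptions standing in for 'noon_perfumes_buyer_dataset.csv', and `overview` is a hypothetical helper.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'noon_perfumes_buyer_dataset.csv'. The real
# column names are not listed on this page, so these are assumptions.
SAMPLE_CSV = """buyer_id,brand,name,base_note,rating
1,Dior,Sauvage,Amber,5
2,Chanel,Bleu,,4
3,Dior,Sauvage,Amber,
"""


def overview(df: pd.DataFrame) -> pd.Series:
    """Print the basic overview and return the per-column null counts."""
    print(df.shape)   # (rows, columns)
    print(df.head())  # first 5 rows
    df.info()         # dtypes and non-null counts (prints directly)
    # On a mixed-type frame the default describe() covers numeric columns
    # only; include="all" adds the categorical (object) columns.
    print(df.describe(include="all"))
    return df.isnull().sum()


# In the real analysis: pd.read_csv('noon_perfumes_buyer_dataset.csv')
buyers = pd.read_csv(io.StringIO(SAMPLE_CSV))
null_counts = overview(buyers)
```

The same helper could be reused for the perfume dataset by passing it the second DataFrame.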

1.2 Perfumes Dataset

The perfume dataset is inspected with the same steps as the buyer dataset.

2. Data Preprocessing

2.1 Buyer Dataset

Removing Missing Values: Rows with missing values are removed using the .dropna() method.
Deleting Empty String Data: Rows with empty strings in specific columns are removed.
Feature Integration: brand and name are combined into a single feature.
One-Hot Encoding: Base notes and middle notes are one-hot encoded.
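
A minimal sketch of these four steps is shown below. The toy data, the column names (`brand`, `name`, `base_note`, `middle_note`), the combined `perfume` column, and the choice to treat whitespace-only strings as empty are all assumptions made for illustration; `preprocess` is a hypothetical helper.

```python
import pandas as pd

# Toy buyer data; every column name here is an assumption.
buyers = pd.DataFrame({
    "brand": ["Dior", "Chanel", None, "Gucci"],
    "name": ["Sauvage", "Bleu", "Unknown", " "],
    "base_note": ["Amber", "Cedar", "Musk", "Amber"],
    "middle_note": ["Pepper", "Ginger", "Rose", "Rose"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Remove rows with missing values.
    out = df.dropna()
    # 2. Remove rows whose text columns are empty (or whitespace-only).
    text_cols = ["brand", "name"]
    is_blank = out[text_cols].apply(lambda col: col.str.strip() == "")
    out = out[~is_blank.any(axis=1)].copy()
    # 3. Combine brand and name into a single feature.
    out["perfume"] = out["brand"] + " " + out["name"]
    # 4. One-hot encode the note columns.
    return pd.get_dummies(out, columns=["base_note", "middle_note"])


clean = preprocess(buyers)
print(clean)
```

Only the first two rows survive here: the third has a missing brand and the fourth has a blank name.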

2.2 Perfume Dataset

Removing Unnecessary Columns: Unneeded columns are dropped using the .drop() method.
Data Cleaning: Specific characters are replaced using str.replace().
Feature Integration and One-Hot Encoding: Similar to the buyer dataset, feature integration and one-hot encoding are performed for the perfume dataset.
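
The perfume-side cleaning might look like the sketch below. Which columns are dropped and which characters are replaced is not stated on this page, so the `seller` column, the " and " separator, and the use of `Series.str.get_dummies` for multi-valued notes are assumptions chosen only to make the example concrete.

```python
import pandas as pd

# Toy perfume data; column names and the cleaned characters are assumptions.
perfumes = pd.DataFrame({
    "brand": ["Dior", "Chanel"],
    "name": ["Sauvage", "Bleu"],
    "seller": ["noon", "noon"],
    "base_note": ["Amber and Cedar", "Musk"],
})

# Drop a column that is not needed for modelling.
perfumes = perfumes.drop(columns=["seller"])

# Normalise the separator; regex=False treats the pattern as a literal string.
perfumes["base_note"] = perfumes["base_note"].str.replace(" and ", ",", regex=False)

# One-hot encode multi-valued notes: one indicator column per distinct note.
note_dummies = perfumes["base_note"].str.get_dummies(sep=",").add_prefix("base_")
perfumes = pd.concat([perfumes.drop(columns=["base_note"]), note_dummies], axis=1)
print(perfumes)
```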

3. Collaborative Filtering-Based Perfume Recommendation System

Data Merging: Buyer and perfume datasets are merged using pd.merge().
Creating User-Item Matrix: A user-item matrix is created using .pivot_table().
KNN Model: A perfume recommendation model is trained using NearestNeighbors.
Recommendation Function: A function returns perfume recommendations based on the input perfume name.

4. KNN Classification Model

Data Splitting: Data is split into training and test sets using train_test_split.
Training and Evaluating KNN Classifier: The model is trained using KNeighborsClassifier, and evaluated with accuracy_score and classification_report.
Elbow Graph: An elbow graph is plotted to find the optimal value of K.
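
For the recommendation system in section 3, the merge, pivot, and nearest-neighbour steps can be sketched as below. The toy data, the column names (`buyer_id`, `perfume`, `rating`, `brand`), the cosine metric, and the `recommend` helper are assumptions. The matrix is arranged with perfumes as rows so that a perfume name can be used as the query, which may differ from the orientation used in the original analysis.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy data; all column names are assumptions for illustration.
buyers = pd.DataFrame({
    "buyer_id": [1, 1, 2, 2, 3, 3, 4],
    "perfume": ["A", "B", "A", "B", "C", "D", "C"],
    "rating": [5, 4, 4, 5, 5, 4, 5],
})
perfumes = pd.DataFrame({
    "perfume": ["A", "B", "C", "D"],
    "brand": ["X", "X", "Y", "Y"],
})

# Merge purchases with perfume attributes on the shared key.
merged = pd.merge(buyers, perfumes, on="perfume", how="inner")

# One row per perfume, one column per buyer; unrated cells become 0.
item_user = merged.pivot_table(index="perfume", columns="buyer_id",
                               values="rating", fill_value=0)

# Cosine distance with brute-force search (an assumed, common choice).
knn = NearestNeighbors(metric="cosine", algorithm="brute")
knn.fit(item_user.values)


def recommend(perfume_name, n=2):
    """Return up to n perfumes whose rating vectors are closest to perfume_name."""
    if perfume_name not in item_user.index:
        raise KeyError(f"unknown perfume: {perfume_name}")
    query = item_user.loc[[perfume_name]].values
    # Ask for n + 1 neighbours because the query perfume itself is returned.
    k = min(n + 1, len(item_user))
    _, idx = knn.kneighbors(query, n_neighbors=k)
    names = [item_user.index[i] for i in idx[0]]
    return [name for name in names if name != perfume_name][:n]


print(recommend("A", n=1))
```

In this toy data perfumes A and B are rated by the same buyers, so each is the other's nearest neighbour.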

5. Results and Analysis

Classification Report and Accuracy: The model's performance is summarized in a classification report, and its accuracy is calculated.

This structured documentation provides a clear overview of the analysis process, making it easier to understand and follow the steps involved in the data analysis project.
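
The classification workflow behind these results (split, fit, evaluate, elbow search) can be sketched as follows. Synthetic data from `make_classification` stands in for the encoded perfume features, and the test size, random seed, range of K values, and `plot_elbow` helper are assumptions rather than the settings used in the original analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the encoded features and target labels.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(report)

# Elbow search: test-set error rate for each candidate K.
k_values = list(range(1, 21))
error_rates = []
for k in k_values:
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rates.append(float(np.mean(clf.predict(X_test) != y_test)))


def plot_elbow(ks, errors):
    """Plot error rate against K (matplotlib is imported only when called)."""
    import matplotlib.pyplot as plt
    plt.plot(ks, errors, marker="o")
    plt.xlabel("K")
    plt.ylabel("Error rate")
    plt.title("Elbow graph for KNN")
    plt.show()
```

Calling `plot_elbow(k_values, error_rates)` draws the graph. Choosing K from test-set error can overstate performance, so a separate validation split or cross-validation is a safer basis for the final choice.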
