
This repository contains GrocerGenius, an AI-driven supermarket sales prediction model. It includes a pipeline for data processing, model training, and optimization, along with a user-friendly interface for uploading data and viewing predictions.


amalsalilan/GrocerGenius_AI_Based_Supermarket_Sales_Prediction_Infosys_Internship_Oct2024


Grocery Sales Prediction

https://grocergeniusaibasedsupermarketsalesprediction.streamlit.app/





Introduction

Welcome to the Grocery Sales Prediction project!

In today's competitive retail landscape, accurate sales forecasting is crucial for inventory management, resource allocation, and strategic planning. This project leverages machine learning to predict the sales of grocery items across various outlets, enabling businesses to make data-driven decisions and optimize their operations.

Why is this important?

  • Inventory Management: Prevent overstocking or stockouts.
  • Pricing Strategies: Adjust prices based on demand predictions.
  • Marketing Campaigns: Target promotions effectively.

Technology Stack

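The project is built on the following Python stack, the same libraries listed under Prerequisites:

  • Python 3.x
  • pandas and NumPy for data processing
  • scikit-learn for model training and evaluation
  • joblib for saving and loading trained artifacts
  • Streamlit for the web interface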


Project Workflow

Our project follows a structured workflow to ensure clarity and efficiency:

  1. Data Collection 📊
    • Gather raw sales data from multiple grocery outlets.
  2. Data Preprocessing 🛠️
    • Cleanse data, handle missing values, and prepare for modeling.
  3. Exploratory Data Analysis (EDA) 🔍
    • Visualize data patterns and uncover insights.
  4. Feature Engineering 🧪
    • Create new features and transform existing ones.
  5. Model Training 🤖
    • Train machine learning models and fine-tune hyperparameters.
  6. Model Evaluation 🏆
    • Assess model performance using appropriate metrics.
  7. Model Deployment 🚀
    • Deploy the model using Streamlit for user interaction.
  8. Inferencing 🔮
    • Generate predictions based on user inputs.

Data Preprocessing

"Data is the new oil." (Clive Humby)

To extract value from data, we performed meticulous preprocessing:

  • Handling Missing Values:
    • Item Weight: Imputed using median values grouped by Item Type.
    • Outlet Size: Filled using mode values grouped by Outlet Type.
  • Outlier Detection and Treatment:
    • Applied the Interquartile Range (IQR) method to cap outliers.
  • Data Standardization:
    • Unified labels in Item Fat Content to ensure consistency.
  • Feature Creation:
    • Item Visibility Bins: Categorized into 'Low', 'Medium', 'High'.
    • Years Since Establishment: Calculated operational years of outlets.
  • Encoding Categorical Variables:
    • One-Hot Encoding: For nominal variables like Item Type.
    • Ordinal Encoding: For variables with an inherent order.
    • Mean Target Encoding: For Outlet Identifier based on mean sales.
  • Feature Transformation:
    • Log transformation applied to Item Visibility to reduce skewness.
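The cleaning, feature-creation, and transformation steps above can be sketched in pandas as follows. This is an illustrative sketch, not the project's `utils.py`: the underscore column names (`Item_Weight`, `Outlet_Size`, ...), the fat-content label variants, the equal-width visibility bins, the 1.5 × IQR fences, the use of `log1p`, and the caller-supplied `reference_year` are all assumptions. Encoding is handled separately.

```python
import numpy as np
import pandas as pd

# Assumed BigMart-style column names and label variants; adjust to match train.csv.
FAT_CONTENT_MAP = {"LF": "Low Fat", "low fat": "Low Fat", "reg": "Regular"}


def _group_mode(values: pd.Series):
    """Most frequent non-null value in a group, or NaN when the group has none."""
    modes = values.mode()  # mode() drops NaN by default
    return modes.iloc[0] if not modes.empty else np.nan


def cap_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is the usual convention."""
    df = df.copy()
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return df


def preprocess(df: pd.DataFrame, reference_year: int,
               outlier_columns=("Item_Visibility",)) -> pd.DataFrame:
    df = df.copy()
    # Missing values: median weight per Item_Type, most common size per Outlet_Type.
    df["Item_Weight"] = df["Item_Weight"].fillna(
        df.groupby("Item_Type")["Item_Weight"].transform("median"))
    df["Outlet_Size"] = df["Outlet_Size"].fillna(
        df.groupby("Outlet_Type")["Outlet_Size"].transform(_group_mode))
    # Outliers: cap the chosen numeric columns with the IQR rule.
    df = cap_outliers_iqr(df, outlier_columns)
    # Standardization: collapse label variants to 'Low Fat' / 'Regular'.
    df["Item_Fat_Content"] = df["Item_Fat_Content"].replace(FAT_CONTENT_MAP)
    # Feature creation: three equal-width visibility bins and outlet age in years.
    df["Item_Visibility_Bin"] = pd.cut(
        df["Item_Visibility"], bins=3, labels=["Low", "Medium", "High"])
    df["Years_Since_Establishment"] = reference_year - df["Outlet_Establishment_Year"]
    # Transformation: log1p handles zero visibilities, which a plain log cannot.
    df["Item_Visibility"] = np.log1p(df["Item_Visibility"])
    return df
```

The function works on a copy, so the raw frame loaded from `train.csv` stays untouched and can be reused for comparison.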

Visual Overview of Preprocessing Steps:

flowchart TD
    A[Raw Data] --> B[Handle Missing Values]
    B --> C[Outlier Treatment]
    C --> D[Data Standardization]
    D --> E[Feature Creation]
    E --> F[Encoding Categorical Variables]
    F --> G[Feature Transformation]
    G --> H[Preprocessed Data]
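The encoding stage (F) of the flowchart might look like the minimal sketch below. The ordinal orderings, the `-1` code for unknown categories, and the global-mean fallback for outlets unseen during fitting are assumptions, not details taken from the project. Target means should be fitted on training rows only, so validation sales do not leak into the feature.

```python
import pandas as pd

# Assumed orderings for the ordinal variables; the README does not list them.
ORDINAL_ORDERS = {
    "Outlet_Size": ["Small", "Medium", "High"],
    "Item_Visibility_Bin": ["Low", "Medium", "High"],
}


def fit_target_means(train: pd.DataFrame, column: str, target: str):
    """Learn the mean target per category plus a global fallback, from training rows only."""
    return train.groupby(column)[target].mean().to_dict(), float(train[target].mean())


def encode(df: pd.DataFrame, target_means: dict, global_mean: float) -> pd.DataFrame:
    df = df.copy()
    # Ordinal encoding: rank within the declared order; unknown or missing values become -1.
    for col, order in ORDINAL_ORDERS.items():
        if col in df.columns:
            ranks = {label: i for i, label in enumerate(order)}
            df[col] = df[col].astype(object).map(ranks).fillna(-1).astype(int)
    # Mean target encoding: outlets unseen during fitting fall back to the global mean.
    df["Outlet_Identifier"] = df["Outlet_Identifier"].map(target_means).fillna(global_mean)
    # One-hot encoding for nominal variables such as Item_Type.
    return pd.get_dummies(df, columns=["Item_Type"], prefix="Item_Type", dtype=int)
```

Casting to `object` before mapping avoids pandas rejecting `-1` as a new category when the column is categorical (as the output of `pd.cut` is).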

Modeling

Our predictive modeling process is designed for accuracy and robustness:

  • Algorithm Selection: Random Forest Regressor
    • Reasons:
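      • Captures both linear and non-linear relationships between features and sales.
      • Reduces overfitting compared with a single decision tree by averaging many trees trained on bootstrap samples.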
      • Captures complex feature interactions.
  • Model Training:
    • Data split into training and validation sets.
    • Hyperparameters tuned using grid search.
  • Evaluation Metrics:
    • Mean Squared Error (MSE): The average squared difference between predicted and actual sales (lower is better).
    • R-squared (R²): The proportion of the variance in sales explained by the model (closer to 1 is better).
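The training and evaluation loop described above can be sketched with scikit-learn as follows. The 80/20 split, 3-fold cross-validation, `random_state=42`, and the default parameter grid are illustrative assumptions; the README does not state the values actually used. `X` is the encoded feature matrix and `y` the sales target.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def train_random_forest(X, y, param_grid=None, random_state=42):
    """Grid-search a random forest on a training split, then score it on a held-out split."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=random_state)
    # Hypothetical grid: the README does not say which values were searched.
    if param_grid is None:
        param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=random_state),
        param_grid,
        scoring="neg_mean_squared_error",  # GridSearchCV maximizes, so MSE is negated
        cv=3,
    )
    search.fit(X_train, y_train)  # refits the best configuration on all of X_train
    preds = search.best_estimator_.predict(X_val)
    metrics = {"mse": mean_squared_error(y_val, preds), "r2": r2_score(y_val, preds)}
    return search.best_estimator_, search.best_params_, metrics
```

Scoring the grid search with `neg_mean_squared_error` keeps model selection aligned with the MSE reported on the validation split.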

Feature Importance Plot:

An image showcasing the importance of each feature in the model can be placed here.
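Until the image is added, the ranking behind such a plot can be produced from a fitted forest with a helper like the hypothetical `feature_importance_table` below. It relies on scikit-learn's impurity-based `feature_importances_` attribute, which can overstate features with many unique values.

```python
import pandas as pd


def feature_importance_table(model, feature_names):
    """Pair a fitted forest's impurity-based importances with feature names, largest first."""
    importances = pd.Series(model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False)
```

Calling `.plot.barh()` on the returned Series draws a horizontal bar chart, provided matplotlib is installed.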


Inferencing

The deployed model is accessible through an interactive web application:

  • User Interface: Built with Streamlit for a seamless experience.
  • Real-Time Predictions: Users receive immediate feedback upon input.
  • Robust Error Handling: Ensures smooth user interaction and guides users in case of invalid inputs.
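One way to provide this error handling is to validate the form values before the model is called. In the sketch below, the helper name and the idea of deriving `numeric_ranges` and `allowed_categories` from the training data are assumptions; the README does not describe how the app validates inputs.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def validate_inputs(
    inputs: Mapping[str, object],
    numeric_ranges: Dict[str, Tuple[float, float]],
    allowed_categories: Dict[str, Sequence[str]],
) -> List[str]:
    """Return readable error messages; an empty list means the inputs can be scored."""
    errors: List[str] = []
    for field, (low, high) in numeric_ranges.items():
        value = inputs.get(field)
        try:
            number = float(value)
        except (TypeError, ValueError):
            errors.append(f"{field}: expected a number, got {value!r}")
            continue
        # NaN fails both comparisons, so it is reported as out of range.
        if not low <= number <= high:
            errors.append(f"{field}: {number} is outside [{low}, {high}]")
    for field, choices in allowed_categories.items():
        value = inputs.get(field)
        if value not in choices:
            errors.append(f"{field}: {value!r} is not one of {list(choices)}")
    return errors
```

In a Streamlit app, any returned messages could be shown with `st.error(...)` followed by `st.stop()`, so no prediction is attempted on invalid input.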

Usage

Prerequisites

Ensure you have the following installed:

  • Python 3.x
  • Python Libraries:
    • pandas
    • numpy
    • scikit-learn
    • joblib
    • streamlit
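To confirm these libraries are importable before running the project, a short check such as the following can help. Note that scikit-learn is imported as `sklearn`. The script is a convenience sketch and is not part of the repository.

```python
import importlib.util

# Import names for the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ("pandas", "numpy", "sklearn", "joblib", "streamlit")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All prerequisites found." if not missing else f"Missing: {', '.join(missing)}")
```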

Setup Instructions

Follow these steps to get the project up and running:

  1. Clone the Repository

    git clone https://github.com/yourusername/grocery_sales_prediction.git
    cd grocery_sales_prediction
  2. Create a Virtual Environment

    python3 -m venv env
    source env/bin/activate  # For Windows: env\Scripts\activate
  3. Install Dependencies

    pip install -r requirements.txt
  4. Directory Structure

    Your project should have the following structure:

    grocery_sales_prediction/
    ├── data_alchemy/
    │   └── raw/
    │       └── train.csv
    ├── model_factory/
    │   ├── models/
    │   ├── encoders/
    │   └── features/
    ├── codebase/
    │   ├── utils.py
    │   ├── training_script.py
    │   └── app.py
    └── README.md
    
  5. Place Your Data

    • Copy your train.csv file into data_alchemy/raw/.
  6. Train the Model

    cd codebase
    python training_script.py
    • This script will preprocess the data and train the model.
  7. Run the Streamlit App

    streamlit run app.py
  8. Access the Application

    • Open your web browser and navigate to http://localhost:8501.
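The directory tree in step 4 suggests that trained artifacts live under `model_factory/`. Below is a minimal sketch of how `training_script.py` could save them and `app.py` could load them with joblib. The file names `model.joblib` and `encoders.joblib` and both helper functions are hypothetical.

```python
from pathlib import Path

import joblib

# Hypothetical file names inside the model_factory/ subfolders shown in the tree.
MODEL_FILE = Path("model_factory") / "models" / "model.joblib"
ENCODERS_FILE = Path("model_factory") / "encoders" / "encoders.joblib"


def save_artifacts(project_root, model, encoders):
    """Persist the fitted model and encoders, creating the folders if needed."""
    root = Path(project_root)
    for relative, obj in ((MODEL_FILE, model), (ENCODERS_FILE, encoders)):
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        joblib.dump(obj, target)


def load_artifacts(project_root):
    """Load what save_artifacts wrote; app.py could call this once at startup."""
    root = Path(project_root)
    return joblib.load(root / MODEL_FILE), joblib.load(root / ENCODERS_FILE)
```

Because the commands above run the scripts from inside `codebase/`, the project root there would be `..`. In the Streamlit app, wrapping the loader in `st.cache_resource` avoids reloading the files on every rerun.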

Using the Application

Step-by-Step Guide:

  1. Navigate to the Application

    • Open your browser and go to http://localhost:8501.
  2. Input Features

    • Item Identifier: Select from the dropdown.
    • Item Weight: Enter a value within the specified range.
    • Item Fat Content: Choose between 'Low Fat' and 'Regular'.
    • Item Visibility: Adjust using the slider.
    • Item Type: Select the appropriate category.
    • Item MRP: Enter the maximum retail price.
    • Outlet Identifier: Select the outlet code.
    • Outlet Establishment Year: Choose the year from the dropdown.
    • Outlet Size: Select the size category.
    • Outlet Location Type: Choose the location type.
    • Outlet Type: Select the type of outlet.
  3. Predict Sales

    • Click the "Predict" button to generate the sales prediction.
  4. View Results

    • The predicted sales figure will be displayed on the screen.
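Behind the "Predict" button, the app has to turn these eleven inputs into the same feature layout the model was trained on. The sketch below assumes underscore column names and takes the preprocessing and encoding as a caller-supplied `transform` function. Both are assumptions rather than the project's actual code.

```python
import pandas as pd

# Assumed raw column names, in the order used at training time.
FEATURE_COLUMNS = [
    "Item_Identifier", "Item_Weight", "Item_Fat_Content", "Item_Visibility",
    "Item_Type", "Item_MRP", "Outlet_Identifier", "Outlet_Establishment_Year",
    "Outlet_Size", "Outlet_Location_Type", "Outlet_Type",
]


def predict_sales(form_values, transform, model) -> float:
    """Turn one set of form inputs into a one-row frame, transform it, and predict."""
    missing = [c for c in FEATURE_COLUMNS if c not in form_values]
    if missing:
        raise ValueError(f"Missing inputs: {missing}")
    row = pd.DataFrame([{c: form_values[c] for c in FEATURE_COLUMNS}])
    features = transform(row)  # must apply the same preprocessing and encoders as training
    return float(model.predict(features)[0])
```

Passing the exact transformation fitted during training (imputation values, encoders, target means) keeps app predictions consistent with the validation results.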

Sample Screenshot of the Application:

Include a screenshot of the Streamlit app interface here.


Contributors

We extend our heartfelt gratitude to everyone who contributed to this project:

Your contributions have been invaluable. Thank you for your dedication and hard work! 🙌


License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Open-Source Community: For providing tools and resources that made this project possible.
  • You: For taking the time to explore our project.

Feel free to reach out for any queries or collaboration opportunities.

Contact: [email protected]


Made with ❤️ by the Grocery Sales Prediction Team.
