https://grocergeniusaibasedsupermarketsalesprediction.streamlit.app/
- Introduction
- Technology Stack
- Project Workflow
- Data Preprocessing
- Modeling
- Inferencing
- Usage
- Contributors
- License
- Acknowledgments
Welcome to the Grocery Sales Prediction project!
In today's competitive retail landscape, accurate sales forecasting is crucial for inventory management, resource allocation, and strategic planning. This project leverages machine learning to predict the sales of grocery items across various outlets, enabling businesses to make data-driven decisions and optimize their operations.
Why is this important?
- Inventory Management: Prevent overstocking or stockouts.
- Pricing Strategies: Adjust prices based on demand predictions.
- Marketing Campaigns: Target promotions effectively.
The project is built on the following technology stack:
- Programming Language: Python 3.x
- Data Manipulation: Pandas, NumPy
- Data Visualization: Matplotlib, Seaborn
- Machine Learning: Scikit-learn
- Model Persistence: Joblib
- Web Framework: Streamlit
- Version Control: Git
- Development Environment: Jupyter Notebooks
Our project follows a structured workflow to ensure clarity and efficiency:
- Data Collection
  - Gather raw sales data from multiple grocery outlets.
- Data Preprocessing
  - Cleanse data, handle missing values, and prepare for modeling.
- Exploratory Data Analysis (EDA)
  - Visualize data patterns and uncover insights.
- Feature Engineering
  - Create new features and transform existing ones.
- Model Training
  - Train machine learning models and fine-tune hyperparameters.
- Model Evaluation
  - Assess model performance using appropriate metrics.
- Model Deployment
  - Deploy the model using Streamlit for user interaction.
- Inferencing
  - Generate predictions based on user inputs.

"Data is the new oil." - Clive Humby
To extract value from data, we performed meticulous preprocessing:
- Handling Missing Values:
  - Item Weight: Imputed using median values grouped by Item Type.
  - Outlet Size: Filled using mode values grouped by Outlet Type.
- Outlier Detection and Treatment:
  - Applied the Interquartile Range (IQR) method to cap outliers.
- Data Standardization:
  - Unified labels in Item Fat Content to ensure consistency.
- Feature Creation:
  - Item Visibility Bins: Categorized into 'Low', 'Medium', 'High'.
  - Years Since Establishment: Calculated operational years of outlets.
- Encoding Categorical Variables:
  - One-Hot Encoding: For nominal variables like Item Type.
  - Ordinal Encoding: For variables with an inherent order.
  - Mean Target Encoding: For Outlet Identifier based on mean sales.
- Feature Transformation:
  - Log transformation applied to Item Visibility to reduce skewness.
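The cleaning steps above (imputation, outlier capping, label unification) could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual `utils.py`: the column names (`Item_Weight`, `Item_Type`, `Outlet_Size`, `Outlet_Type`), the 1.5 IQR multiplier, and the label variants in `FAT_CONTENT_MAP` are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical label variants; adjust to whatever appears in the real data.
FAT_CONTENT_MAP = {"LF": "Low Fat", "low fat": "Low Fat", "reg": "Regular"}


def impute_missing(df):
    """Fill Item_Weight with the per-Item_Type median and
    Outlet_Size with the per-Outlet_Type mode."""
    df = df.copy()
    group_median = df.groupby("Item_Type")["Item_Weight"].transform("median")
    df["Item_Weight"] = df["Item_Weight"].fillna(group_median)

    # mode() ignores missing values; a group with no values yields NaN.
    mode_by_type = df.groupby("Outlet_Type")["Outlet_Size"].agg(
        lambda s: s.mode().iloc[0] if not s.mode().empty else np.nan
    )
    df["Outlet_Size"] = df["Outlet_Size"].fillna(df["Outlet_Type"].map(mode_by_type))
    return df


def cap_outliers_iqr(series, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (capping, not dropping)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def unify_fat_content(series):
    """Map inconsistent Item_Fat_Content labels onto two canonical ones."""
    return series.replace(FAT_CONTENT_MAP)
```

Capping keeps every row in the training set, which may matter when the dataset is small; dropping outliers instead would be an equally reasonable choice under different assumptions.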
Visual Overview of Preprocessing Steps:

    flowchart TD
        A[Raw Data] --> B[Handle Missing Values]
        B --> C[Outlier Treatment]
        C --> D[Data Standardization]
        D --> E[Feature Creation]
        E --> F[Encoding Categorical Variables]
        F --> G[Feature Transformation]
        G --> H[Preprocessed Data]
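The later stages of this pipeline (feature creation, encoding, and transformation) might look like the sketch below. Several details are assumptions rather than facts from the project: the column names, the `reference_year` parameter, the use of equal-frequency bins for visibility, `log1p` as the log transform (so that zero visibility stays defined), and `Outlet_Size` with the order Small < Medium < High as the ordinal variable.

```python
import numpy as np
import pandas as pd

# Assumed ordering for the ordinal variable.
SIZE_ORDER = {"Small": 0, "Medium": 1, "High": 2}


def engineer_features(df, reference_year):
    """Add outlet age, visibility bins, and a log-transformed visibility."""
    df = df.copy()
    df["Outlet_Years"] = reference_year - df["Outlet_Establishment_Year"]
    # Three equal-frequency bins; fixed cut points would work as well.
    df["Item_Visibility_Bin"] = pd.qcut(
        df["Item_Visibility"], q=3, labels=["Low", "Medium", "High"]
    )
    df["Item_Visibility_Log"] = np.log1p(df["Item_Visibility"])
    return df


def encode_features(df, target="Item_Outlet_Sales"):
    """Apply ordinal, mean target, and one-hot encoding.

    Returns the encoded frame plus the outlet means, which need to be
    saved so the same mapping can be reused at prediction time.
    """
    df = df.copy()
    df["Outlet_Size"] = df["Outlet_Size"].map(SIZE_ORDER)
    outlet_means = df.groupby("Outlet_Identifier")[target].mean()
    df["Outlet_Identifier"] = df["Outlet_Identifier"].map(outlet_means)
    df = pd.get_dummies(df, columns=["Item_Type"])
    return df, outlet_means
```

Mean target encoding should be fitted on the training split only; computing the means over the full dataset would leak validation targets into the features.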
Our predictive modeling process is designed for accuracy and robustness:
- Algorithm Selection: Random Forest Regressor
  - Reasons:
    - Models both linear and non-linear relationships.
    - Reduces overfitting through ensemble learning.
    - Captures complex feature interactions.
- Model Training:
  - Data split into training and validation sets.
  - Hyperparameters tuned using grid search.
- Evaluation Metrics:
  - Mean Squared Error (MSE): Measures the average squared difference between predicted and actual sales.
  - R-squared (R²): Indicates the proportion of variance in sales explained by the model.
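A minimal sketch of the training and evaluation loop described above, using scikit-learn's `GridSearchCV` around a `RandomForestRegressor`. The 80/20 split, 3-fold cross-validation, the parameter grid, and the synthetic data from the hypothetical `make_demo_data` helper are assumptions for illustration; the project's real settings are not stated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def make_demo_data(n=200, seed=0):
    """Synthetic stand-in for the preprocessed features (illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)
    return X, y


def train_and_evaluate(X, y, param_grid, random_state=42):
    """Tune a random forest by grid search, then score it on held-out data."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    search = GridSearchCV(
        RandomForestRegressor(random_state=random_state),
        param_grid=param_grid,
        scoring="neg_mean_squared_error",  # grid search maximizes, hence the sign
        cv=3,
    )
    search.fit(X_train, y_train)
    model = search.best_estimator_
    preds = model.predict(X_val)
    metrics = {
        "best_params": search.best_params_,
        "mse": mean_squared_error(y_val, preds),
        "r2": r2_score(y_val, preds),
    }
    return model, metrics
```

A call such as `train_and_evaluate(*make_demo_data(), {"n_estimators": [20, 40], "max_depth": [3, None]})` returns the refitted best model together with its validation MSE and R². The fitted model can then be saved with `joblib.dump` for the app to load.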
Feature Importance Plot:
An image showcasing the importance of each feature in the model can be placed here.
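Until the image is added, the plot could be regenerated from a fitted model along these lines. This is a sketch under assumptions: the function names and the output path are hypothetical, and it relies only on the `feature_importances_` attribute that scikit-learn's `RandomForestRegressor` exposes after fitting.

```python
import pandas as pd


def rank_feature_importances(model, feature_names):
    """Return importances as a Series sorted from most to least important."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False)


def plot_feature_importances(importances, path):
    """Save a horizontal bar chart of the ranked importances to `path`."""
    # Imported here so the ranking above works without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    importances.sort_values().plot.barh(ax=ax)
    ax.set_xlabel("Importance")
    ax.set_title("Random Forest feature importance")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Impurity-based importances tend to favor high-cardinality features, so they are best read as a rough ranking rather than an exact measure of each feature's effect.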
The deployed model is accessible through an interactive web application:
- User Interface: Built with Streamlit for a seamless experience.
- Real-Time Predictions: Users receive immediate feedback upon input.
- Robust Error Handling: Ensures smooth user interaction and guides users in case of invalid inputs.
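One way the error handling could work is to validate inputs before calling the model and report problems in the UI. The sketch below is illustrative and is not the project's `app.py`: the field names, the ranges in `LIMITS`, and the `predict_fn` callback are hypothetical, while the Streamlit calls used (`st.number_input`, `st.slider`, `st.selectbox`, `st.button`, `st.error`, `st.success`) are standard widgets.

```python
# Hypothetical valid ranges; the real app defines its own.
LIMITS = {
    "Item_Weight": (0.0, 50.0),
    "Item_Visibility": (0.0, 1.0),
    "Item_MRP": (0.0, 500.0),
}
FAT_CONTENT_OPTIONS = ("Low Fat", "Regular")


def validate_inputs(inputs, limits=LIMITS):
    """Return a list of readable problems; an empty list means valid."""
    errors = []
    for name, (low, high) in limits.items():
        value = inputs.get(name)
        if value is None:
            errors.append(f"{name} is required.")
        elif not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}.")
    if inputs.get("Item_Fat_Content") not in FAT_CONTENT_OPTIONS:
        errors.append("Item_Fat_Content must be 'Low Fat' or 'Regular'.")
    return errors


def render_app(predict_fn):
    """Draw a reduced input form and show either errors or a prediction."""
    import streamlit as st  # imported lazily so validation has no UI dependency

    st.title("Grocery Sales Prediction")
    inputs = {
        "Item_Weight": st.number_input("Item Weight", value=10.0),
        "Item_Visibility": st.slider("Item Visibility", 0.0, 1.0, 0.05),
        "Item_MRP": st.number_input("Item MRP", value=100.0),
        "Item_Fat_Content": st.selectbox("Item Fat Content", FAT_CONTENT_OPTIONS),
    }
    if st.button("Predict"):
        errors = validate_inputs(inputs)
        if errors:
            for message in errors:
                st.error(message)
        else:
            st.success(f"Predicted sales: {predict_fn(inputs):.2f}")
```

Keeping validation in a plain function separates it from the widgets, so it can be unit-tested without launching Streamlit.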
Ensure you have the following installed:
- Python 3.x
- Python Libraries:
  - pandas
  - numpy
  - scikit-learn
  - joblib
  - streamlit
Follow these steps to get the project up and running:
- Clone the Repository

      git clone https://github.com/yourusername/grocery_sales_prediction.git
      cd grocery_sales_prediction

- Create a Virtual Environment

      python3 -m venv env
      source env/bin/activate  # For Windows: env\Scripts\activate

- Install Dependencies

      pip install -r requirements.txt

- Directory Structure
  - Your project should have the following structure:

        grocery_sales_prediction/
        ├── data_alchemy/
        │   └── raw/
        │       └── train.csv
        ├── model_factory/
        │   ├── models/
        │   ├── encoders/
        │   └── features/
        ├── codebase/
        │   ├── utils.py
        │   ├── training_script.py
        │   └── app.py
        └── README.md

- Place Your Data
  - Copy your `train.csv` file into `data_alchemy/raw/`.
- Train the Model

      cd codebase
      python training_script.py

  - This script will preprocess the data and train the model.
- Run the Streamlit App

      streamlit run app.py

- Access the Application
  - Open your web browser and navigate to `http://localhost:8501`.
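If training fails with a missing-file error, a quick check of the directory layout from the steps above can help locate the problem. The helper below is a hypothetical convenience script, not part of the repository; it tests only for the paths listed in the directory structure.

```python
from pathlib import Path

# Paths taken from the directory structure listed in the setup steps.
EXPECTED_PATHS = [
    "data_alchemy/raw/train.csv",
    "model_factory/models",
    "model_factory/encoders",
    "model_factory/features",
    "codebase/utils.py",
    "codebase/training_script.py",
    "codebase/app.py",
]


def missing_paths(project_root):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

Running `missing_paths(".")` from the repository root returns an empty list when everything is in place.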
Step-by-Step Guide:
- Navigate to the Application
  - Open your browser and go to `http://localhost:8501`.
- Input Features
  - Item Identifier: Select from the dropdown.
  - Item Weight: Enter a value within the specified range.
  - Item Fat Content: Choose between 'Low Fat' and 'Regular'.
  - Item Visibility: Adjust using the slider.
  - Item Type: Select the appropriate category.
  - Item MRP: Enter the maximum retail price.
  - Outlet Identifier: Select the outlet code.
  - Outlet Establishment Year: Choose the year from the dropdown.
  - Outlet Size: Select the size category.
  - Outlet Location Type: Choose the location type.
  - Outlet Type: Select the type of outlet.
- Predict Sales
  - Click the "Predict" button to generate the sales prediction.
- View Results
  - The predicted sales figure will be displayed on the screen.
Sample Screenshot of the Application:
Include a screenshot of the Streamlit app interface here.
We extend our heartfelt gratitude to everyone who contributed to this project:
- Mentor: Amal Salilan (amalsalilan)
- Aman (theamansyed)
- Vrushika K Panchal (vrushika-k-panchal)
- Chetan (Chetanp717)
- Rimi (rs2103)
- Shilpa Manaji (Shilpa-Manaji)
- Tharun (Kottetharun-09)
- Sumithra (Sumithra-git)
- Yanvi Arora (YanviAroraCS)
- Sayantan (SayanRony)
- Muskan Asthana (muskan42)
- Purnima Pattnaik (Purnima07-sudo)
- Rameswar Bisoyi (RB137)
- Raunit (raunit45)
- Hima Mankanta (manu-vasamsetti)
- Nuka Abhinay (NUKA-ABHINAY)
- Anjan Kumar (Anjankumarkamalapur)
Your contributions have been invaluable. Thank you for your dedication and hard work!
This project is licensed under the MIT License - see the LICENSE file for details.
- Open-Source Community: For providing tools and resources that made this project possible.
- You: For taking the time to explore our project.
Feel free to reach out for any queries or collaboration opportunities.
Contact: [email protected]
Made with ❤️ by the Grocery Sales Prediction Team.