Car Price Prediction

Hello Everyone,

Here is my regression project based on predicting the price of used cars using Linear Regression.

Dataset

I used the Honda Used Car Selling Dataset, which is one of my own datasets uploaded to Kaggle.

Link to the Dataset : Car Price Dataset

Problem Statement

To develop a Machine Learning Model that can accurately predict the prices of used cars based on various features and attributes.
The predicted prices will assist both buyers and sellers in making informed decisions, ensuring fair transactions in the used car market.

Streamlit Web App

For this project, I created a Streamlit Web App that predicts car prices in a more interactive and user-friendly way.
This web app lets you predict the price of a car by selecting some of its features and filling in a few details.
These are the features you need to select or enter before pressing the Predict button :
- Year : Select the manufacturing year of the car.
- kms Driven : Enter the total distance covered by the car.
- Fuel Type : Choose the fuel type of the car.
- Suspension : Pick the type of suspension.
- Car Model : Select your car model from the available options.
After selecting all these features, just hit the 'Predict' button.
The input fields of this web app also have several constraints on the values they accept.
I have named the app AutoValuate.

Link to the Web App : Car Price Prediction App

Setting up the Environment

Jupyter Notebook is required for this project and you can install and set it up in the terminal.

Install the Notebook

pip install notebook

Run the Notebook

jupyter notebook

Libraries required for the Project

Pandas

Go to the terminal and run this code

pip install pandas

Go to Jupyter Notebook and run this code from a cell

!pip install pandas

Matplotlib

Go to the terminal and run this code

pip install matplotlib

Go to Jupyter Notebook and run this code from a cell

!pip install matplotlib

Seaborn

Go to the terminal and run this code

pip install seaborn

Go to Jupyter Notebook and run this code from a cell

!pip install seaborn

Sklearn

Go to the terminal and run this code

pip install scikit-learn

Go to Jupyter Notebook and run this code from a cell

!pip install scikit-learn

Getting Started

Clone this repository to your local machine by using the following command :

git clone https://github.com/TheMrityunjayPathak/CarPricePrediction.git

Steps involved in the Project

Data Cleaning

The fuel_type, suspension and car_model columns have extra whitespace, which is removed with the str.strip() method.
The kms suffix is removed from the kms_driven column with the str.split() method, keeping only the numeric part of the column.
After that, the kms_driven column can be converted to the 'int' data type.
The price column is converted from values like 6.45 Lakh to 645000 and cast to integer with a custom function.
In the car_model column, only the first 3 words of the car name are kept and the rest is removed.
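The cleaning steps above could look roughly like the sketch below. The sample rows, the exact raw string formats ("45000 kms", "6.45 Lakh") and the helper names lakh_to_int and clean are assumptions made for illustration, not taken from the project's notebook.

```python
import pandas as pd

# Made-up sample rows that mimic the raw columns described above (assumed formats).
raw = pd.DataFrame({
    "year": [2015, 2018],
    "kms_driven": ["45000 kms", "12000 kms"],
    "fuel_type": [" Petrol ", "Diesel "],
    "suspension": [" Manual", " Automatic "],
    "price": ["6.45 Lakh", "11.2 Lakh"],
    "car_model": [" Honda City V MT Petrol ", "Honda Amaze 1.2 VX CVT "],
})


def lakh_to_int(text):
    """Convert a string such as '6.45 Lakh' to 645000 (1 Lakh = 100,000)."""
    number = float(text.split()[0])
    # round() guards against floating-point error before casting to int.
    return int(round(number * 100_000))


def clean(df):
    """Hypothetical helper applying the cleaning steps in order."""
    df = df.copy()
    # Strip leading/trailing whitespace from the text columns.
    for col in ["fuel_type", "suspension", "car_model"]:
        df[col] = df[col].str.strip()
    # Keep only the numeric part before the 'kms' suffix, then cast to int.
    df["kms_driven"] = df["kms_driven"].str.split().str[0].astype(int)
    # Convert '6.45 Lakh' style prices to plain integers.
    df["price"] = df["price"].apply(lakh_to_int)
    # Keep only the first 3 words of the car name.
    df["car_model"] = df["car_model"].str.split().str[:3].str.join(" ")
    return df


cleaned = clean(raw)
print(cleaned)
```

If the real kms_driven values contain thousands separators (for example "45,000 kms"), the commas would need to be removed before the cast to int.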

Data Visualization

Visualizing year with price by using sns.swarmplot()

Visualizing kms_driven with price by using sns.relplot()

Visualizing car_model with price by using sns.relplot() and suspension as 'hue' parameter
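A minimal sketch of the three plots listed above, run on a small made-up DataFrame. The sample values are assumptions, and the Agg backend is chosen only so the sketch runs without a display; in a notebook the plots render inline without it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd
import seaborn as sns

# Made-up cleaned data with the columns used in the plots (assumed values).
df = pd.DataFrame({
    "year": [2014, 2015, 2015, 2018, 2019, 2020],
    "kms_driven": [90000, 72000, 65000, 30000, 22000, 8000],
    "price": [420000, 510000, 545000, 820000, 900000, 1050000],
    "suspension": ["Manual", "Manual", "Automatic", "Manual", "Automatic", "Automatic"],
    "car_model": ["Honda City V", "Honda City V", "Honda Amaze 1.2",
                  "Honda City V", "Honda Amaze 1.2", "Honda City V"],
})

# year vs price as a swarm plot (one point per car, grouped by year).
year_ax = sns.swarmplot(data=df, x="year", y="price")

# kms_driven vs price as a relational (scatter) plot.
kms_plot = sns.relplot(data=df, x="kms_driven", y="price")

# car_model vs price, coloured by suspension type through the hue parameter.
model_plot = sns.relplot(data=df, x="car_model", y="price", hue="suspension")
```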

Dummy Variable

We first create dummy variable columns based on the text column.
Then we transform them into a DataFrame.
After that we merge the dummies DataFrame with our original DataFrame.
Finally we drop the text column from our dataset.
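The four steps above can be sketched with pandas as follows. The sample data is made up, and fuel_type is used as the text column purely as an example; the notebook may encode a different column or use a different merge call.

```python
import pandas as pd

# Made-up sample with one text column to encode (assumed values).
df = pd.DataFrame({
    "year": [2015, 2018, 2020],
    "fuel_type": ["Petrol", "Diesel", "Petrol"],
    "price": [645000, 1120000, 900000],
})

# Steps 1 and 2: get_dummies returns a DataFrame with one 0/1 column per category.
dummies = pd.get_dummies(df["fuel_type"], dtype=int)

# Step 3: merge the dummy columns with the original DataFrame, side by side.
merged = pd.concat([df, dummies], axis="columns")

# Step 4: drop the original text column, which is now redundant.
encoded = merged.drop(columns=["fuel_type"])
print(encoded)
```

For a linear model with an intercept, keeping every dummy column makes them perfectly collinear, so one column per encoded feature is often dropped, for example with drop_first=True in pd.get_dummies.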

Outlier Removal

After describing the dataset, I noticed that in the kms_driven column 75% of cars have travelled 85000 kms or less, while the maximum value is 11 Lakh kms, which is an outlier.
Similarly, in the price column 75% of cars are priced at 7 Lakh or less, while the maximum price is 26 Lakh, which is an outlier.
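The text above does not say how the outliers were removed, so the sketch below is only one possible approach: it inspects the data with describe() and then drops rows above the common upper fence Q3 + 1.5 * IQR. The sample values and the helper name drop_upper_outliers are assumptions, and the notebook may use fixed cutoffs instead.

```python
import pandas as pd

# Made-up sample where the last row is extreme in both columns (assumed values).
df = pd.DataFrame({
    "kms_driven": [20000, 45000, 60000, 85000, 90000, 1100000],
    "price": [450000, 520000, 645000, 700000, 810000, 2600000],
})

# describe() shows the quartiles and the maximum, which is how the gap was spotted.
print(df.describe())


def drop_upper_outliers(frame, column, k=1.5):
    """Hypothetical helper: keep rows at or below Q3 + k * IQR for one column."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    upper_fence = q3 + k * (q3 - q1)
    return frame[frame[column] <= upper_fence]


trimmed = drop_upper_outliers(df, "kms_driven")
trimmed = drop_upper_outliers(trimmed, "price")
print(trimmed)
```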

Model Building

First, I defined the dependent and independent variables for training and testing.
I split the data into training and testing sets using train_test_split.
Then I fit the model on X_train and y_train and checked the score.
After that, I used k-fold cross-validation to measure the accuracy of the model.
I computed cross_val_score for each fold and then took the mean of all the scores.
Finally, I predicted results with the trained model.
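The workflow above can be sketched with scikit-learn as shown below. The data is synthetic, and the feature set, test_size, number of folds and random_state values are assumptions; the notebook's actual settings may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the cleaned dataset (assumed features and relationship).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "year": rng.integers(2008, 2022, size=n),
    "kms_driven": rng.integers(5000, 150000, size=n),
})
y = (400000 + 50000 * (X["year"] - 2008) - 2 * X["kms_driven"]
     + rng.normal(0, 20000, size=n))

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model on the training data and check its score on the test data.
model = LinearRegression()
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)

# k-fold cross-validation: one score per fold, then the mean of all scores.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=cv)
mean_cv_score = cv_scores.mean()

# Predict prices for the held-out cars.
predictions = model.predict(X_test)
print(test_score, mean_cv_score)
```

For LinearRegression, both score() and the default cross_val_score scoring return the coefficient of determination (R²), so a reported percentage from these calls is an R² value rather than a classification-style accuracy.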

Conclusion

Developed a Linear Regression Model that predicts used car prices from various features and attributes, achieving an average prediction accuracy of 82%.
The model was further evaluated with k-fold cross-validation, resulting in a mean cross-val score of 83%.
