House-Price-Prediction

Overview

This is an ML-based application that predicts house prices from the features provided (such as area, number of bedrooms, etc.).

For testing using sample inputs, click here

Model Used | Accuracy
Linear Regressor | 81.705317 (Cost)
Ordinary Least Squares | 92.9 (R-squared value)

File Description:

  • Delhi_load.ipynb : Application interface for users
  • CSV File/ Delhi.csv : Dataset
  • Codes/ Delhi.ipynb : Main notebook containing all the processing steps
  • Codes/ ols_results_delhi.pickle : Pickled OLS model after training on the dataset

Problem Statement

Given a set of features, predict the price of any given house in the Delhi region.

Dataset and libraries used

Dataset Summary
  1. Reference link for dataset : click here
  2. df.info()

Image for dataframe summary

For full image Click here

Libraries Used
  1. Pandas
  2. Statsmodels
  3. Scikit-learn (sklearn)
  4. Numpy
  5. Seaborn
  6. Matplotlib
  7. Google.colab

Feature analysis

Outliers
  1. Price

Raw Data | Rectified Data
[Figure: Price outliers] | [Figure: Corrected Price]

  2. Area

Raw Data | Rectified Data
[Figure: Area outliers] | [Figure: Corrected Area]

  3. Price per sq. foot

Raw Data | Rectified Data
[Figure: Price/Sq. foot outliers] | [Figure: Corrected Price/Sq. foot]
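The README does not say which rule was used to rectify the outliers. As a minimal sketch, assuming an interquartile-range (IQR) fence, the filtering step could look like the following. The helper names, the `k=1.5` multiplier, and the sample listings are all hypothetical and only illustrate the idea.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences derived from the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, key, k=1.5):
    """Keep only the rows whose `key` value lies inside the IQR fences."""
    lower, upper = iqr_bounds([row[key] for row in rows], k)
    return [row for row in rows if lower <= row[key] <= upper]


# Hypothetical listings; the values are made up for illustration.
listings = [
    {"Area": 900, "Price": 4_500_000},
    {"Area": 1000, "Price": 5_000_000},
    {"Area": 1100, "Price": 5_600_000},
    {"Area": 1200, "Price": 6_100_000},
    {"Area": 1300, "Price": 6_400_000},
    {"Area": 1250, "Price": 95_000_000},  # implausibly expensive listing
]

cleaned = drop_outliers(listings, "Price")
```

The same helper can be applied to `Area` or to a derived price-per-square-foot column by changing the `key` argument.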

New Features

All the features provided can be reframed to the format : {Area, AttributeScore, Resale, LogPremium, Bedrooms}

  1. Area = Floor area of the property
  2. AttributeScore = An integer based on features like number of bedrooms, gym facility, etc.
  3. Resale = A binary value denoting whether the property is being sold first-hand (0) or as a resale (1)
  4. LogPremium = An integer value depending on the Price per sq. foot value
  5. Bedrooms = Number of bedrooms in the property
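The exact formulas behind AttributeScore and LogPremium are not given here, so the sketch below only illustrates the shape of the transformation. It assumes, purely for illustration, that AttributeScore adds one point per bedroom and per amenity, and that LogPremium is the integer part of the natural log of price per square foot. The amenity column names (`Gym`, `Parking`, `Lift`) are hypothetical as well.

```python
import math


def engineer_features(listing):
    """Map a raw listing dict to the five model features (illustrative rules)."""
    price_per_sqft = listing["Price"] / listing["Area"]
    # Assumed rule: one point per bedroom plus one point per amenity present.
    attribute_score = listing["Bedrooms"] + sum(
        1 for flag in ("Gym", "Parking", "Lift") if listing.get(flag)
    )
    return {
        "Area": listing["Area"],
        "AttributeScore": attribute_score,
        "Resale": 1 if listing["Resale"] else 0,
        # Assumed rule: integer part of the natural log of price per sq. foot.
        "LogPremium": int(math.log(price_per_sqft)),
        "Bedrooms": listing["Bedrooms"],
    }


# Hypothetical raw listing, made up for illustration.
raw = {
    "Area": 1000,
    "Price": 5_000_000,
    "Bedrooms": 3,
    "Gym": True,
    "Parking": True,
    "Lift": False,
    "Resale": False,
}
features = engineer_features(raw)
```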

Final Correlation Matrix : Plot of Correlation Matrix
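For readers who want to reproduce a correlation matrix like this one without the plotting stack, here is a small dependency-free sketch of pairwise Pearson correlation. The column values are made up, and the project itself presumably relies on Pandas and Seaborn for this step.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature columns, made up for illustration.
columns = {
    "Area": [900, 1000, 1100, 1200, 1300],
    "Bedrooms": [2, 2, 3, 3, 4],
    "Resale": [1, 0, 1, 0, 1],
}
matrix = correlation_matrix(columns)
```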

Model selection

Two models were considered the best candidates; their predictions are shown below:

Ordinary Least Squares | Linear Regressor
[Figure: OLS Plot] | [Figure: Linear Regression Plot]
92.9 (R-squared value) | 81.705317 (Score)
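To make the R-squared figure concrete, the sketch below fits a single-feature least-squares line in closed form and computes R-squared as 1 - SS_res / SS_tot. It is a simplified illustration on made-up numbers, not the project's model, which is trained with Statsmodels on five features. R-squared is conventionally reported between 0 and 1, so the 92.9 quoted here is presumably that quantity expressed as a percentage.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: floor area vs. price in millions, made up for illustration.
areas = [900, 1000, 1100, 1200, 1300]
prices = [4.4, 5.1, 5.4, 6.2, 6.4]

intercept, slope = fit_simple_ols(areas, prices)
predictions = [intercept + slope * area for area in areas]
score = r_squared(prices, predictions)
```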

Results

The trained OLS model achieves an R-squared value of 92.9. Its model summary is provided below.

OLS Model Summary