Housing Heritage

This is a data analysis project where the objective is to create an ML Model that predicts the value of a house in Ames, Iowa and to help visualize the most important features considered when predicting that value.

Dataset Content

The dataset is sourced from Kaggle. We created then a fictitious user story where predictive analytics can be applied in a real project in the workplace.
The dataset has almost 1.5 thousand rows and represents housing records from Ames, Iowa; indicating house profile (Floor Area, Basement, Garage, Kitchen, Lot, Porch, Wood Deck, Year Built) and its respective sale price for houses built between 1872 and 2010.

Variable	Meaning	Units
1stFlrSF	First Floor square feet	334 - 4692
2ndFlrSF	Second-floor square feet	0 - 2065
BedroomAbvGr	Bedrooms above grade (does NOT include basement bedrooms)	0 - 8
BsmtExposure	Refers to walkout or garden level walls	Gd: Good Exposure; Av: Average Exposure; Mn: Minimum Exposure; No: No Exposure; None: No Basement
BsmtFinType1	Rating of basement finished area	GLQ: Good Living Quarters; ALQ: Average Living Quarters; BLQ: Below Average Living Quarters; Rec: Average Rec Room; LwQ: Low Quality; Unf: Unfinshed; None: No Basement
BsmtFinSF1	Type 1 finished square feet	0 - 5644
BsmtUnfSF	Unfinished square feet of basement area	0 - 2336
TotalBsmtSF	Total square feet of basement area	0 - 6110
GarageArea	Size of garage in square feet	0 - 1418
GarageFinish	Interior finish of the garage	Fin: Finished; RFn: Rough Finished; Unf: Unfinished; None: No Garage
GarageYrBlt	Year garage was built	1900 - 2010
GrLivArea	Above grade (ground) living area square feet	334 - 5642
KitchenQual	Kitchen quality	Ex: Excellent; Gd: Good; TA: Typical/Average; Fa: Fair; Po: Poor
LotArea	Lot size in square feet	1300 - 215245
LotFrontage	Linear feet of street connected to property	21 - 313
MasVnrArea	Masonry veneer area in square feet	0 - 1600
EnclosedPorch	Enclosed porch area in square feet	0 - 286
OpenPorchSF	Open porch area in square feet	0 - 547
OverallCond	Rates the overall condition of the house	10: Very Excellent; 9: Excellent; 8: Very Good; 7: Good; 6: Above Average; 5: Average; 4: Below Average; 3: Fair; 2: Poor; 1: Very Poor
OverallQual	Rates the overall material and finish of the house	10: Very Excellent; 9: Excellent; 8: Very Good; 7: Good; 6: Above Average; 5: Average; 4: Below Average; 3: Fair; 2: Poor; 1: Very Poor
WoodDeckSF	Wood deck area in square feet	0 - 736
YearBuilt	Original construction date	1872 - 2010
YearRemodAdd	Remodel date (same as construction date if no remodelling or additions)	1950 - 2010
SalePrice	Sale Price	34900 - 755000

Business Requirements

A good friend, John, has received an inheritance from his deceased mother. John has requested me to help him analyze the value of his properties to make business decisions. The properties are in Ames, Iowa.

Although my friend has an excellent understanding of property prices in his own state and residential area, he fears that basing her estimates for property worth on her current knowledge might lead to inaccurate appraisals. He also wants to spot what attributes are more important to possibly remodel them and get a better price for the house. What makes a house desirable and valuable where he comes from might not be the same in Ames, Iowa. He found a public dataset with house prices for Ames, Iowa, and will provide me with that information.

1 - The client is interested in discovering how the house attributes correlate with the sale price. Therefore, the client expects data visualisations of the correlated variables against the sale price to show that. This will help him identify potential improvements in order to increase the price.
2 - The client is interested in predicting the house sale price from her 4 inherited houses, and any other house in Ames, Iowa.

Hypothesis and how to validate?

1 - We suspect houses with better overall quality will have a higher sales price.
- A Correlation study can help in this using PPS, Pearson and Spearman methods would help the investigation.
2 - We suspect houses with larger living area will have a higher sales price.
- - A Correlation study can help in this using PPS, Pearson and Spearman methods would help the investigation.
3 - We suspect that houses with more recent remodelations will have a higher sales price.
- A Correlation study can help in this using PPS, Pearson and Spearman methods would help the investigation.

The rationale to map the business requirements to the Data Visualisations and ML tasks

Business Requirement 1: Data Visualization and Correlation study
- We will inspect the data related to the customer base.
- We will conduct a correlation study (Pearson and Spearman) to understand better how the variables are correlated to Churn.
- We will plot the main variables against Churn to visualize insights.
Business Requirement 2: Regression, Data Analysis
- We want to predict the value of a house. We want to build a regression model to predict the target variable SalePrice.
- We want to make plots to visualize the train and test sets predictions vs the actual.
- We want to run regression evaluation to demonstrate the R2 Score and Mean Absolute Error.

ML Business Case

Predict Sale Price

Regression Model

We want an ML model to predict the sale price of a house. A target variable is a serial number. We consider a regression model, which is supervised and uni-dimensional.
Our ideal outcome is to provide John with reliable insight into what sale price he should expect for his inherited houses or identify what improvements he could implement to increase the price.
The model success metrics are
- At least 0.8 for R2 score, on train and test set
The ML model is considered a failure if:
- After 6 months of usage, the model's predictions are 30% off more than 25% of the time.
The model output should be a constant value for the sale price.

Dashboard Design (Streamlit App User Interface)

Page 1

Quick project summary

Quick project summary
- Project Terms & Jargon
- Describe Project Dataset
- State Business Requirements

Page 2

Sale Price Study

Before the analysis, we knew we wanted this page to answer business requirement 1, but we couldn't know in advance which plots would need to be displayed.
After data analysis, we agreed with stakeholders that the page will:
- State business requirement 1
- Checkbox: data inspection on house attributes (display the number of rows and columns in the data, and display the first ten rows of the data)
- Display the most correlated variables to Sale Price and the conclusions
- Checkbox: Individual plots showing the Sale Price levels for each correlated variable
- Checkbox: Parallel plot using Sale Price and correlated variables

Page 3

House Price Predictor

State business requirement 2
Set of widgets inputs, which relates to the prospect profile. Each set of inputs is related to a given ML task to predict prospect Sale Price.
Run predictive analysis" button that serves the prospect data to our ML pipelines, and predicts if the prospect will increase Sale Price or not, if so, when. For the Sale Price predictions, the page will inform the associated probability for Sale Price level.

Page 4

Project Hypothesis and Validation

Before the analysis, we knew we wanted this page to describe each project hypothesis, the conclusions, and how we validated each. After the data analysis, we can report that:
1 - We suspect houses with better overall quality will have a higher sales price.
- Correct. Overal Quality is the feature with the highest correlation with the target variable Sale Price.
2 - We suspect houses with larger living area will have a higher sales price.
- Correct. Ground Living Area is the feature with the second highest correlation with the target variable Sale Price.
3 - We suspect that houses with more recent remodelations will have a higher sales price.
- Correct. Eventhough the Remodelation Year has a low correlation with Sale Price, there is a strong correlation between Remodelation Year and Overal Quality which is the feature with strongest correlation with Sale Price

Page 5

Predict Sale Price

Considerations and conclusions after the pipeline is trained
Present ML pipeline steps
Feature importance
Pipeline performance

Unfixed Bugs

I struggled when running the Jupyter Notebooks since a lot of cells would come back with the older version and this took plenty of time.
I had a few dependency issues so I had to uninstall and install again a few applications.
I got a 503 server error when trying to open the app in Heroku. I fixed it by installing protobuf==3.20 and ipywidgets==8.0.2

Deployment

Heroku

The App live link is: https://housing-predictions-app.herokuapp.com/
The project was deployed to Heroku using the following steps.

Log in to Heroku and create an App
At the Deploy tab, select GitHub as the deployment method.
Select your repository name and click Search. Once it is found, click Connect.
Select the branch you want to deploy, then click Deploy Branch.
The deployment process should happen smoothly in case all deployment files are fully functional. Click now the button Open App on the top of the page to access your App.

Main Data Analysis and Machine Learning Libraries

Matplotlib - Creates various graphs and plots to visualize the data.
Seaborn - For visualizing the data in the Streamlit app with plots, graphs and more.
ppscore - Used to study the power predictive score of variables against one another.
Streamlit - Creating the app to present the study.
Feature-Engine - Major library for engineering the data for the pipeline.
Scikit-Learn - Creating the pipeline and applying various algorithms, feature engineering steps and more to it.
Numpy - To process arrays that store values, aka data. It facilitates math operations and their vectorization.
Pandas and Pandas-Profiling - For data analysis, data exploration, data manipulation, data visualization.

Credits

Content

The template for this project was created by Code Institute
A number of functions were built for this project by Code Institute and are credited throughout the notebooks.
The dataset is provided by Kaggle
All learning material was sourced through either the Code Institute program or the documentation of the various libraries used.
I used code from my colleagues at the Code Institute Samuel Dainton (https://github.com/Samuel-Dainton/Heritage-Housing-Issues-P5) to analyse his approach regarding data handling and Vanessa Andersson (https://github.com/van-essa/heritage-housing-issues) with her approach on how to handle the data.

Tutorials and inspiration

The walkthrough project 'Churnometer ' from Code Institute videos

Acknowledgements (optional)

My partner K and my one year old baby T, for all their patience and support.
Niel, tutor from Code Institute, for his super efficient support with my inquiries on slack.
Vanessa Andersson, student at Code Institute, for her support and help on Slack for my questions during my studies and PP5.

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
app_pages		app_pages
inputs/datasets/raw		inputs/datasets/raw
jupyter_notebooks		jupyter_notebooks
media		media
outputs		outputs
src		src
.gitignore		.gitignore
.gitpod.dockerfile		.gitpod.dockerfile
.gitpod.yml		.gitpod.yml
Procfile		Procfile
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt
runtime.txt		runtime.txt
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Housing Heritage

Dataset Content

Business Requirements

Hypothesis and how to validate?

The rationale to map the business requirements to the Data Visualisations and ML tasks

ML Business Case

Predict Sale Price

Regression Model

Dashboard Design (Streamlit App User Interface)

Page 1

Quick project summary

Page 2

Sale Price Study

Page 3

House Price Predictor

Page 4

Project Hypothesis and Validation

Page 5

Predict Sale Price

Unfixed Bugs

Deployment

Heroku

Main Data Analysis and Machine Learning Libraries

Credits

Content

Tutorials and inspiration

Acknowledgements (optional)

About

Releases

Packages

Languages

Danpr1984/RealEstate-housing-predictive-model

Folders and files

Latest commit

History

Repository files navigation

Housing Heritage

Dataset Content

Business Requirements

Hypothesis and how to validate?

The rationale to map the business requirements to the Data Visualisations and ML tasks

ML Business Case

Predict Sale Price

Regression Model

Dashboard Design (Streamlit App User Interface)

Page 1

Quick project summary

Page 2

Sale Price Study

Page 3

House Price Predictor

Page 4

Project Hypothesis and Validation

Page 5

Predict Sale Price

Unfixed Bugs

Deployment

Heroku

Main Data Analysis and Machine Learning Libraries

Credits

Content

Tutorials and inspiration

Acknowledgements (optional)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages