diff --git a/.gitignore b/.gitignore index 86308f8c..61b9dd79 100644 --- a/.gitignore +++ b/.gitignore @@ -11,3 +11,4 @@ cmake-build-* *.a *.so data +utils/__pycache__ \ No newline at end of file diff --git a/california_housing_price_prediction_with_linear_regression/California_housing_prices_predictions_with_lr_python.ipynb b/california_housing_price_prediction_with_linear_regression/California_housing_prices_predictions_with_lr_python.ipynb new file mode 100644 index 00000000..a6d6ace3 --- /dev/null +++ b/california_housing_price_prediction_with_linear_regression/California_housing_prices_predictions_with_lr_python.ipynb @@ -0,0 +1,1240 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7ffef0ff", + "metadata": {}, + "source": [ + "### Predicting California House Prices with Linear Regression\n", + "\n", + "### Objective\n", + "* To predict California housing prices using a plain linear regression model, one of the simplest models, and see how it performs.\n", + "* To understand the modeling workflow using mlpack.\n", + "\n", + "### About the Data\n", + " This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is now closed). The dataset may also be downloaded from StatLib mirrors.\n", + " \n", + " This dataset is also used in the book Hands-On Machine Learning (HandsOn-ML), which is very good and highly recommended: https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/.\n", + " \n", + " The dataset used here is almost identical to the original, with two differences:\n", + "207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. 
This allows discussing what to do with categorical data.\n", + "Note that the block groups are called \"districts\" in the Jupyter notebooks, simply because in some contexts the name \"block group\" was confusing.\n", + "\n", + "Let's look at the features of the dataset:\n", + "* Longitude : Longitude coordinate of the district.\n", + "* Latitude : Latitude coordinate of the district.\n", + "* Housing Median Age : Median age of the houses in the district.\n", + "* Total Rooms : Total number of rooms in the district.\n", + "* Total Bedrooms : Total number of bedrooms in the district.\n", + "* Population : Population of the district.\n", + "* Households : Number of households in the district.\n", + "* Median Income : Median income of households in the district.\n", + "* Median House Value : Median house value in the district (the target we want to predict).\n", + "* Ocean Proximity : Rough location relative to the ocean or the Bay (categorical). \n", + "\n", + "### Approach\n", + " Here, we will try to recreate the workflow from the book mentioned above. \n", + " * Look at the big picture.\n", + " * Get the data.\n", + " * Discover and visualize the data to gain insights.\n", + " * Pre-process the data for the ML algorithm.\n", + " * Create new features. \n", + " * Split the data.\n", + " * Train the ML model using mlpack.\n", + " * Analyze residuals and errors, and draw conclusions." + ] + }, + { + "cell_type": "markdown", + "id": "3c760992", + "metadata": {}, + "source": [ + "### Big Picture\n", + "\n", + "Suppose you work at a real estate agency as an analyst or data scientist, and your boss wants you to predict housing prices in a certain location. You are given a dataset. What is the first thing to do?\n", + "\n", + "If you were about to jump right into analyzing the data and trying ML algorithms, that is the wrong first step. It's a big \"NO\". \n", + "
The first thing to do is ask questions.
\n", + " \n", + " Questions like: What will the predictions be used for? Will they be fed into another system? And many more, just to have concrete goals.\n", + " \n", + " So, your boss says the predictions will be used by another team to work on investment strategies.\n", + " \n", + "So, let's get started." + ] + }, + { + "cell_type": "markdown", + "id": "fc550b59", + "metadata": {}, + "source": [ + "

Importing Libraries

" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "a1441566", + "metadata": {}, + "outputs": [], + "source": [ + "import mlpack\n", + "import pandas as pd\n", + "import numpy as np\n", + "import matplotlib.image as mpimg\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n" + ] + }, + { + "cell_type": "markdown", + "id": "5c33741e", + "metadata": {}, + "source": [ + "

Get the Data

\n", + "\n", + "The data is already available as a CSV file, so we simply download it." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "2a9ceafe", + "metadata": {}, + "outputs": [], + "source": [ + "!wget -q https://datasets.mlpack.org/examples/housing.csv" + ] + }, + { + "cell_type": "markdown", + "id": "232b2fd3", + "metadata": {}, + "source": [ + "

Discover and Visualize the Data

" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "4f51f1c1", + "metadata": {}, + "outputs": [], + "source": [ + "dataset = pd.read_csv('housing.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "79251923", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " longitude latitude housing_median_age total_rooms total_bedrooms \\\n", + "0 -122.23 37.88 41.0 880.0 129.0 \n", + "1 -122.22 37.86 21.0 7099.0 1106.0 \n", + "2 -122.24 37.85 52.0 1467.0 190.0 \n", + "3 -122.25 37.85 52.0 1274.0 235.0 \n", + "4 -122.25 37.85 52.0 1627.0 280.0 \n", + "\n", + " population households median_income median_house_value ocean_proximity \n", + "0 322.0 126.0 8.3252 452600.0 NEAR BAY \n", + "1 2401.0 1138.0 8.3014 358500.0 NEAR BAY \n", + "2 496.0 177.0 7.2574 352100.0 NEAR BAY \n", + "3 558.0 219.0 5.6431 341300.0 NEAR BAY \n", + "4 565.0 259.0 3.8462 342200.0 NEAR BAY " + ], + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Let's print the first 5 rows of the dataset.\n", + "dataset.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "ae042e5d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " longitude latitude housing_median_age total_rooms \\\n", + "count 20640.000000 20640.000000 20640.000000 20640.000000 \n", + "mean -119.569704 35.631861 28.639486 2635.763081 \n", + "std 2.003532 2.135952 12.585558 2181.615252 \n", + "min -124.350000 32.540000 1.000000 2.000000 \n", + "25% -121.800000 33.930000 18.000000 1447.750000 \n", + "50% -118.490000 34.260000 29.000000 2127.000000 \n", + "75% -118.010000 37.710000 37.000000 3148.000000 \n", + "max -114.310000 41.950000 52.000000 39320.000000 \n", + "\n", + " total_bedrooms population households median_income \\\n", + "count 20433.000000 20640.000000 20640.000000 20640.000000 \n", + "mean 537.870553 1425.476744 499.539680 3.870671 \n", + "std 421.385070 1132.462122 382.329753 1.899822 \n", + "min 1.000000 3.000000 1.000000 0.499900 \n", + "25% 296.000000 787.000000 280.000000 2.563400 \n", + "50% 435.000000 1166.000000 409.000000 3.534800 \n", + "75% 647.000000 1725.000000 605.000000 4.743250 \n", + "max 6445.000000 35682.000000 6082.000000 15.000100 \n", + "\n", + " median_house_value \n", + "count 20640.000000 \n", + "mean 206855.816909 \n", + "std 115395.615874 \n", + "min 14999.000000 \n", + "25% 119600.000000 \n", + "50% 179700.000000 \n", + "75% 264725.000000 \n", + "max 500001.000000 " + ], + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Let's look at some summary statistics.\n", + "dataset.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "cfcea99e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 20640 entries, 0 to 20639\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 longitude 20640 non-null float64\n", + " 1 latitude 20640 non-null float64\n", + " 2 housing_median_age 20640 non-null float64\n", + " 3 total_rooms 20640 non-null 
float64\n", + " 4 total_bedrooms 20433 non-null float64\n", + " 5 population 20640 non-null float64\n", + " 6 households 20640 non-null float64\n", + " 7 median_income 20640 non-null float64\n", + " 8 median_house_value 20640 non-null float64\n", + " 9 ocean_proximity 20640 non-null object \n", + "dtypes: float64(9), object(1)\n", + "memory usage: 1.6+ MB\n" + ] + } + ], + "source": [ + "dataset.info()" + ] + }, + { + "cell_type": "markdown", + "id": "57d17c78", + "metadata": {}, + "source": [ + "If you look closely, the \"total_bedrooms\" column has only 20433 non-null values out of 20640, so some values are missing. We will deal with these missing values later." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "015161cf", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Histograms of all numeric columns (pandas draws them with matplotlib).\n", + "dataset.hist(bins=50, figsize=(20,15))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "607214a7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Scatter plot of the districts by their coordinates (a low alpha reveals dense areas).\n", + "dataset.plot(kind=\"scatter\",\n", + " x=\"longitude\", \n", + " y=\"latitude\", \n", + " alpha = 0.1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "a2e78140", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Now colour each district by median house value (marker size = population / 100).\n", + "dataset.plot(kind=\"scatter\",\n", + " x=\"longitude\", \n", + " y=\"latitude\",\n", + " alpha=0.4,\n", + " s=dataset[\"population\"]/100, \n", + " label=\"population\", \n", + " figsize=(10,7),\n", + " c=\"median_house_value\",\n", + " cmap=plt.get_cmap(\"jet\"),\n", + " colorbar=True,\n", + " sharex=False)\n", + "plt.legend()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "343a7b40", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Let's plot this on top of a map of California.\n", + "california_img = mpimg.imread('california.png') # Path to a local California map image (it is not downloaded above).\n", + "ax = dataset.plot(kind=\"scatter\",\n", + " x=\"longitude\",\n", + " y=\"latitude\", \n", + " figsize=(10,7),\n", + " s=dataset['population']/100, \n", + " label=\"Population\",\n", + " c=\"median_house_value\", \n", + " cmap=plt.get_cmap(\"jet\"),\n", + " colorbar=False, alpha=0.4)\n", + "plt.imshow(california_img, extent=[-124.55, -113.80, 32.45, 42.05], \n", + " alpha=0.5,\n", + " cmap=plt.get_cmap(\"jet\"))\n", + "plt.ylabel(\"Latitude\", fontsize=14)\n", + "plt.xlabel(\"Longitude\", fontsize=14)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6859d178", + "metadata": {}, + "source": [ + "

Let's deal with Missing Values

" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "cd7b0f63", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " longitude latitude housing_median_age total_rooms total_bedrooms \\\n", + "290 -122.16 37.77 47.0 1256.0 NaN \n", + "341 -122.17 37.75 38.0 992.0 NaN \n", + "538 -122.28 37.78 29.0 5154.0 NaN \n", + "563 -122.24 37.75 45.0 891.0 NaN \n", + "696 -122.10 37.69 41.0 746.0 NaN \n", + "\n", + " population households median_income median_house_value ocean_proximity \n", + "290 570.0 218.0 4.3750 161900.0 NEAR BAY \n", + "341 732.0 259.0 1.6196 85100.0 NEAR BAY \n", + "538 3741.0 1273.0 2.5762 173400.0 NEAR BAY \n", + "563 384.0 146.0 4.9489 247100.0 NEAR BAY \n", + "696 387.0 161.0 3.9063 178400.0 NEAR BAY " + ], + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Let's print the first few rows that contain at least one missing value.\n", + "sample_incomplete_rows = dataset[dataset.isnull().any(axis=1)].head()\n", + "sample_incomplete_rows" + ] + }, + { + "cell_type": "markdown", + "id": "296cfb38", + "metadata": {}, + "source": [ + "The info() output above shows that only total_bedrooms has missing values. Let's fill them with the median of the column.\n", + "
\n", + " NOTE: \n", + " 1. We could also impute with the mean.\n", + " 2. For categorical data, use the mode (the most frequent value)." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "2c182111", + "metadata": {}, + "outputs": [], + "source": [ + "median = dataset[\"total_bedrooms\"].median() # Median of the non-missing values in the column.\n", + "# Assign the filled column back; fillna(..., inplace=True) on a column selection is chained assignment.\n", + "dataset[\"total_bedrooms\"] = dataset[\"total_bedrooms\"].fillna(median)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "bc5c802e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 20640 entries, 0 to 20639\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 longitude 20640 non-null float64\n", + " 1 latitude 20640 non-null float64\n", + " 2 housing_median_age 20640 non-null float64\n", + " 3 total_rooms 20640 non-null float64\n", + " 4 total_bedrooms 20640 non-null float64\n", + " 5 population 20640 non-null float64\n", + " 6 households 20640 non-null float64\n", + " 7 median_income 20640 non-null float64\n", + " 8 median_house_value 20640 non-null float64\n", + " 9 ocean_proximity 20640 non-null object \n", + "dtypes: float64(9), object(1)\n", + "memory usage: 1.6+ MB\n" + ] + } + ], + "source": [ + "dataset.info()" + ] + }, + { + "cell_type": "markdown", + "id": "5a7562d4", + "metadata": {}, + "source": [ + "Every column now has 20640 non-null values, so all the missing values have been filled." + ] + }, + { + "cell_type": "markdown", + "id": "5241d10d", + "metadata": {}, + "source": [ + "
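The note above lists three imputation strategies. As a minimal sketch, here they are on a tiny made-up frame that only borrows the column names from the housing data (the values are invented for illustration). It also assigns each filled column back instead of calling `fillna(..., inplace=True)` on a column selection, a chained-assignment pattern that newer pandas versions warn about.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame; only the column names mirror the housing data.
df = pd.DataFrame({
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0, np.nan],
    "ocean_proximity": ["NEAR BAY", None, "INLAND", "NEAR BAY", "NEAR BAY"],
})

# Numeric column: fill gaps with the column median (swap in .mean() for mean imputation).
median = df["total_bedrooms"].median()
df["total_bedrooms"] = df["total_bedrooms"].fillna(median)

# Categorical column: fill gaps with the most frequent value (the mode).
mode = df["ocean_proximity"].mode()[0]
df["ocean_proximity"] = df["ocean_proximity"].fillna(mode)

print(df.isnull().sum().sum())  # → 0
```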

Let's deal with Categorical Values

\n", + "We will use one-hot encoding for this. It creates a separate column for each category of a categorical feature and sets it to:
\n", + "'1' : if the row belongs to that category.
\n", + "'0' : otherwise." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "c25339f1", + "metadata": {}, + "outputs": [], + "source": [ + "def one_hot_encoding(data, dimensions, drop=False):\n", + "    # For every column index in `dimensions` that holds strings, add one 0/1 column per unique value.\n", + "    for dim in dimensions:\n", + "        if isinstance(data.iloc[0, dim], str):\n", + "            uniq = data.iloc[:, dim].unique()\n", + "            for val in uniq:\n", + "                data[f\"{data.columns[dim]}_{val}\"] = (data.iloc[:, dim] == val).astype(int)\n", + "\n", + "    # Optionally drop the original categorical columns (selected by position).\n", + "    if drop:\n", + "        data.drop(data.columns[dimensions], axis=1, inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "b38ebafa", + "metadata": {}, + "outputs": [], + "source": [ + "one_hot_encoding(data=dataset, dimensions=[9], drop=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "33f33aad", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 20640 entries, 0 to 20639\n", + "Data columns (total 14 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 longitude 20640 non-null float64\n", + " 1 latitude 20640 non-null float64\n", + " 2 housing_median_age 20640 non-null float64\n", + " 3 total_rooms 20640 non-null float64\n", + " 4 total_bedrooms 20640 non-null float64\n", + " 5 population 20640 non-null float64\n", + " 6 households 20640 non-null float64\n", + " 7 median_income 20640 non-null float64\n", + " 8 median_house_value 20640 non-null float64\n", + " 9 ocean_proximity_NEAR BAY 20640 non-null int64 \n", + " 10 ocean_proximity_<1H OCEAN 20640 non-null int64 \n", + " 11 ocean_proximity_INLAND 20640 non-null int64 \n", + " 12 ocean_proximity_NEAR OCEAN 20640 non-null int64 \n", + " 13 ocean_proximity_ISLAND 20640 non-null int64 \n", + "dtypes: float64(9), int64(5)\n", + "memory usage: 2.2 MB\n" + ] + } + ], + "source": [ + "dataset.info()" + ] + }, + { + "cell_type": "markdown", + "id": "f7cd5c17", + 
"metadata": {}, + "source": [ + " As expected, it created five new 0/1 columns, one for each category of ocean_proximity.
\n", + "

Note :

\n", + " Make sure to remove the original categorical column (drop=True above does this), because the algorithm only works with numerical values." + ] + }, + { + "cell_type": "markdown", + "id": "1a614e50", + "metadata": {}, + "source": [ + "
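The hand-written `one_hot_encoding` function above does what pandas' built-in `pd.get_dummies` also provides. A sketch of that alternative on a made-up four-row frame (not the approach this notebook uses) is below; note that `get_dummies` orders the new columns alphabetically by category, whereas the custom function follows the order in which categories first appear.

```python
import pandas as pd

# Made-up rows; only the column names come from the housing data.
df = pd.DataFrame({
    "median_income": [8.3252, 7.2574, 5.6431, 3.8462],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "ISLAND"],
})

# One 0/1 column per category; the original categorical column is dropped automatically.
encoded = pd.get_dummies(df, columns=["ocean_proximity"], dtype=int)
print(encoded.columns.tolist())
# → ['median_income', 'ocean_proximity_INLAND', 'ocean_proximity_ISLAND', 'ocean_proximity_NEAR BAY']
```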

Let's create some more features

\n", + "\n", + "If you study the dataset, the room and bedroom counts are totals for the whole district, not for a single house. Since we are trying to predict house prices, let's derive some per-household features from them.\n", + "* Rooms per Household : approximate number of rooms per house.\n", + "* Bedrooms per Room : fraction of all rooms that are bedrooms.\n", + "* Population per Household : approximate number of residents per house." + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "3c3adf00", + "metadata": {}, + "outputs": [], + "source": [ + "dataset[\"rooms_per_household\"] = dataset[\"total_rooms\"]/dataset[\"households\"]\n", + "dataset[\"bedrooms_per_room\"] = dataset[\"total_bedrooms\"]/dataset[\"total_rooms\"]\n", + "dataset[\"population_per_household\"] = dataset[\"population\"]/dataset[\"households\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "4130bd66", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import seaborn as sns\n", + "plt.figure(figsize=(15,10))\n", + "sns.heatmap(dataset.corr(),annot=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "493fde97", + "metadata": {}, + "source": [ + "

It is important to check how the features correlate with each other and with the target, especially during feature engineering.
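The ratio features and the correlation check can be tried on the first three rows of the `dataset.head()` output shown earlier. Three rows are far too few for meaningful correlations, so this sketch only illustrates the mechanics: derive the ratios, then rank every column by its Pearson correlation with `median_house_value`, a quick numeric complement to the heatmap.

```python
import pandas as pd

# First three rows of dataset.head() above (ocean_proximity and a few columns omitted).
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, 1106.0, 190.0],
    "population": [322.0, 2401.0, 496.0],
    "households": [126.0, 1138.0, 177.0],
    "median_house_value": [452600.0, 358500.0, 352100.0],
})

# Per-household / per-room ratios, as in the feature-engineering cell.
df["rooms_per_household"] = df["total_rooms"] / df["households"]
df["bedrooms_per_room"] = df["total_bedrooms"] / df["total_rooms"]
df["population_per_household"] = df["population"] / df["households"]

# Pearson correlation of every column with the target, strongest positive first.
target_corr = df.corr()["median_house_value"].sort_values(ascending=False)
print(target_corr)
```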

" + ] + }, + { + "cell_type": "markdown", + "id": "7b506260", + "metadata": {}, + "source": [ + "

Let's prepare the data for Training
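`splitTrainTest` in the next cell draws a fresh random permutation on every run, so the split, and therefore the reported errors, change each time the notebook is executed. A minimal reproducible variant is sketched below using NumPy's `default_rng`; the `seed` parameter, its default of 42, and the toy frame are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def split_train_test(data, test_ratio, seed=42):
    """Shuffle row positions with a seeded generator, then hold out test_ratio of them."""
    rng = np.random.default_rng(seed)  # hypothetical seed; any fixed integer works
    shuffled = rng.permutation(len(data))
    test_size = int(len(data) * test_ratio)
    test_idx, train_idx = shuffled[:test_size], shuffled[test_size:]
    return data.iloc[train_idx], data.iloc[test_idx]

# Toy frame standing in for the housing data.
df = pd.DataFrame({"x": range(10), "median_house_value": range(100, 110)})
train, test = split_train_test(df, 0.2)

# Separate the target from the features, as the notebook does.
train_labels = train["median_house_value"]
train_features = train.drop("median_house_value", axis=1)
```

Calling `np.random.seed(...)` once before the original `splitTrainTest` would have a similar effect; passing an explicit generator keeps the randomness local to the function.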
" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "7816ee97", + "metadata": {}, + "outputs": [], + "source": [ + "# Shuffle the row positions at random, then hold out testRatio of them as the test set.\n", + "def splitTrainTest(data, testRatio):\n", + "    shuffledIndices = np.random.permutation(len(data))\n", + "    testSetSize = int(len(data)*testRatio)\n", + "    testIndices = shuffledIndices[:testSetSize]\n", + "    trainIndices = shuffledIndices[testSetSize:]\n", + "    return data.iloc[trainIndices], data.iloc[testIndices]\n", + "\n", + "trainSet, testSet = splitTrainTest(dataset, 0.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "18738ea5", + "metadata": {}, + "outputs": [], + "source": [ + "# Separate the target (median_house_value) from the training features.\n", + "trainLabels = trainSet[\"median_house_value\"]\n", + "trainSet = trainSet.drop(\"median_house_value\", axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "0e289618", + "metadata": {}, + "outputs": [], + "source": [ + "testLabels = testSet[\"median_house_value\"]\n", + "testSet = testSet.drop(\"median_house_value\", axis=1)" + ] + }, + { + "cell_type": "markdown", + "id": "3deafa5e", + "metadata": {}, + "source": [ + "

Model Training

\n", + "At this point, the data is processed and we are ready to train a model for prediction." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "cb5381fa", + "metadata": {}, + "outputs": [], + "source": [ + "output = mlpack.linear_regression(training=trainSet, training_responses=trainLabels, verbose=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "670501f9", + "metadata": {}, + "outputs": [], + "source": [ + "model = output[\"output_model\"]" + ] + }, + { + "cell_type": "markdown", + "id": "34596fba", + "metadata": {}, + "source": [ + " Our model is trained; now let's make predictions on the test set." + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "a860072b", + "metadata": {}, + "outputs": [], + "source": [ + "predictions = mlpack.linear_regression(input_model=model, test=testSet)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "15a9ef75", + "metadata": {}, + "outputs": [], + "source": [ + "# Flatten the predictions into a 1-D array aligned with testLabels.\n", + "yPreds = predictions[\"output_predictions\"].ravel()" + ] + }, + { + "cell_type": "markdown", + "id": "1c5f2eaa", + "metadata": {}, + "source": [ + " Let's look at the residuals (actual minus predicted values)." + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "b3a19cd7", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize= (6,4))\n", + "sns.histplot(testLabels - yPreds)\n", + "plt.title(\"Distribution of Residuals\")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "1fc06194", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAFtCAYAAADI9OsfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nOy9e5QcV33v+917V1VXP6ZnNJoZaSR7bCQvIzk5AoOJbGISbC8bgR0BwcsONicEkxiSmxsCC4TjBJOYdQgSYeWai2PCw6wVsDknCIOFxZHt2Hjd49fELEEGWZaFRxajx2je0+967/vHrqrp50z3aHq6W9qftQya6u7q6te3fvXbv9/3RzjnHBKJRCLpKGirD0AikUgkjSPFWyKRSDoQKd4SiUTSgUjxlkgkkg5EirdEIpF0IFK8JRKJpANRWn0A5yozM1l4nqjCXLMmhrm5fIuP6OyRr6O9kK+jvTib19Hf39XwY2TkvQooCmv1IawI8nW0F/J1tBer/TqkeEskEkkHIsVbIpFIOhAp3hKJRNKBSPGWSCSSDkSKt0QikXQgUrwlEomkA5HiLZFIJB2IFG+JRCLpQKR4SyQSSQci2+MlEonkLBgZncaB4THMZi30JjTs2D6EbZv7mv68UrwlEolkmYyMTuOhJ4+CMYquqIL5nIWHnjwKAE0XcJk2kUgkkmVyYHgMjFFEVAZCCCIqA2MUB4bHmv7cMvKWSCSSZTKdMpAtWDAsL9ymaxSO4y3yqJVBirdEIpEsE8O0S4QbAAzLg0Ltpj+3TJtIJBLJMskabkPbVxIp3hKJRNKBSPGWSCSSDkSKt0QikXQgUrwlEomkA5HiLZFIJMtEZdUltNb2lUSKt0QikSyTdWv0hravJFK8JRKJZJlcOJBoaPtKIpt0JBKJZJn81+gsKAE4AM4BQgDib282UrwlEolkmRiWA0YJCBH/cc7BOYdhOU1/bpk2kUgkkmWiawo8XrrN42J7s5HiLZFIJMvkhrddAA4O1+Pg3BP/D44b3nZB059bpk0kEolkmey8ehMA4ImXTsKwXegqww1vuyDc3kykeEskEslZsPPqTdh59Sb093dhaiqzas8r0yYSiUTSgUjxlkgkkg5EirdEIpF0IFK8JRKJpAOR4i2RSCQdiBRviUQi6UCkeEskEkkHIsVbIpFIOpC2Em/TNPH5z38eN9xwA/7gD/4An/vc5wAAr7/+Om699Va8613vwq233orjx4+Hj1nt2yQSiaQdaCvx/vKXv4xIJILHH38cP/nJT/CJT3wCAPD5z38et912Gx5//HHcdtttuOeee8LHrPZtEolEUszI6DT2PHwQH/0fT2LPwwcxMjq9Ks/bNuKdy+Xw4x//GJ/4xCdACAEA9PX1YWZmBocPH8ZNN90EALjppptw+PBhzM7OrvptEolEUszI6DQeevIo5nMWuqIK5nMWHnry6KoIeNt4m5w4cQI9PT342te+huHhYcTjcXziE5+ArutYt24dGGMAAMYYBgYGMD4+Ds75qt7W29vbgndGIpG0KweGx5Ar2ChYbjiMIaoxHBgew7bNfU197rYRb8dxcOL
ECVx22WX47Gc/i//6r//Cxz/+cdx3332tPrRlsXZt6Rik/v6uFh3JyiJfR3shX0drGZvIIG+64d+cA3nTxdhEpumvqW3Ee8OGDVAUJUxXvOlNb8KaNWug6zomJibgui4YY3BdF5OTkxgcHATnfFVva4SZmSw836V9td3GmoV8He2FfB2tp1Ak3OXbG3lNyxH6tsl59/b2Yvv27XjuuecAiIqPmZkZXHzxxdi6dSsee+wxAMBjjz2GrVu3ore3F2vXrl3V2yQSiaQY3uD2lYRwzlfjeerixIkTuPvuuzE/Pw9FUfDXf/3X+P3f/32Mjo7irrvuQjqdRjKZxO7du7FpkzA7X+3b6kVG3u2LfB3tRSe/jju+9HTN2x6869q697OcyLutxPtcQop3+yJfR3vRya+jleLdNmkTiUQikdSPFG+JRCLpQKR4SyQSSQcixVsikUiWCWlw+0oixVsikUiWSW8y0tD2lUSKt0QikSyTd2yr3rxXa/tKIsVbIpFIlsmRsXnEIgy+lx4IAWIRhiNj801/7rZpj5dIJJJO49R0DqbjgVECSgg8zmE6Hk5N55r+3FK8JRKJZJm4LofrBn2OvGR7s5HiLZFIzmtGRqdxYHgM0ykDfd06dmwfqrBzrXUf23Gq7rPW9pVEirdEIjlvCYYpMEYR0xeGKQAIBXyx+7geQTUbKrG9ucgFS4lEct5yYHgMjFFEVAZCCCIqA2MUB4bH6rqPV8Maqtb2lUSKt0QiOW+ZThnQlFIZ1BSK6ZRR130oqR5h19q+kkjxlkgk5y193TosxyvZZjke+rr1uu6jawsSWizXxdubhRRviURy3rJj+xBc14Npu+Ccw7RduK6HHduH6rrP0Lou6JqYdxskSnSNYWhd88e6yQVLiURy3hIsSi5WbbLYfY6Pp3H05DwYJWAUcD3AclxsGepp+rFL8ZZIJOc12zb3LTnpvdZ9jozNQ2UUpu3B9TMrEZXiyNg8djbjYIuQaROJRCJZJmMTGZi2UO0g523aHsYmmj8ZSIq3RCKRLBPDKhJusiDgwfZmItMmEolEskyCem4e/k/p9mYiI2+JRCJZJrGIAgKUuAoSf3uzkeItkUgky+SGt10AECAItDkHQPztTUaKt0QikSyTiweT0FVaEnnrKsXFg8mmP7fMeUskEskyOTA8hu4uHQMqg6pQ2I5o5jkwPLZk+eHZIiNviUQiWSb1eKM0Cxl5SyQSyTLp69YxNpFBwXLBuUibRFepPV5G3hKJRLJM1iQ05E23ZMEyb7pYk9Ca/txSvCUSiWSZHPz1dEPbVxIp3hKJRLJMgtb4erevJFK8JRKJpAOR4i2RSCQdiBRviUQi6UCkeEskEkkHIsVbIpFIOhAp3hKJRNKBSPGWSCSSDkS2x0skEkkNRkansfdnr2FizgDAsa43hpvfubnpplP1ICNviUQiqcLI6DQe3P8KxmcL4ODwODA+k8ODPz2CkdHmd1AuhRRviUQiqcKB4TEYlgtCAEoIGCUghMIwHRwYHmv14UnxlkgkkmpMpwy4Hg+HCgMAJYDreati+boUUrwlEomkCn3dOhglxXOF4XGAUYq+br1lxxUgxVsikUiqsGP7EHSNgXMxDd71ODj3oEcU7Ng+1OrDk+ItkUgk1di2uQ933LgVg71REBBQAgyujeOO92xpi2oTWSookUgkNdi2ua8thLoaMvKWSCSSDkRG3pJVY2R0GgeGxzCdMtDXrWPH9qG2jWok5xbN+O61utZbirdkVRgZncZDTx4FYxQxXcF8zsJDTx4FACngkoZpRIyb8d0L9kkIwvmVxSRj6rL22whSvCWrwoHhMTBGEVEZACCiMpj+dine5yYjo9PY+8woJmbzAAjWrdFx8zWXrEjE24gYHxgeg+NyZPImHNeDwiiiEeWsvnvB97macAOA7bjL2m8jyJy3ZFWYThnQlNKvm6bQtmh2kKw8I6PTePCnRzA+k4PHAQ6O8dkCHtz/ylmnG4oDAUIIIio
DY7Rm1+Pp6RzSOROOy0EJgeNypHMmTk/nln0M1b7PxRSs5s+wlJG3ZFXo69Yxn7PCyBsALMdri2YHycpzYHgMhumAEArqtyh64DAs96yvtqZTBmJ6qXQtFgg4Lgf8Uj/A75LkxN9enaXSMsH3uZXIyFuyKuzYPgTX9WDaLjjnMG0Xruu1RbODZOURreVeKJgAQAC4Hj/rq62+bh2WUxrZLhYIMEYAIhptOOfwOAeIv70KQVpmPmeBEODY6TS+uncE93zrxfCqIfg+txIp3pJVYdvmPtx+/aXoiWvIGw564hpuv/5Sme8+R9E1Bs8Tomq7nhBOAIySs77aajQQ2NgXR1dUBWNUtLcziq6oio198ar3D9IynscxlzHhcYBSgsl5Aw89eRQjo9Ph97mVyLSJZNVo54YHycoxMjqNtB+1ci7+E/lmIBY9+9by4DtUb7XJju1DeOjJo9AjCjSFwnK8RcU+SMtMzhUQpFs4F1cNQW69Hb7LUrwlEsmKcmB4DLGoiojGMJ+1YPspDoVR3HHj1hURvUbEs1GxD/LZjuuBEpFa4QAUhbbVIrsUb8k5RasagWQD0gJB5EpUhpgu6p0558gbTsvek0bEPojUKSHCEpYI8U7G1LZaZJfiLTlnaFUjkGxAKqXTK4uCz2zvM6M4PZ0DoxTJuMiZt9MiuxRvyTlDqxqBZANSKUHkagJ15ZhXi0aujoJIvfgxPXGtra6opHhLzhkarf/t9OdtVxrNMa8Gy706aoeFyVpI8ZacM7Tqcr3T0wSrRSvXBc7Fq6O2rPP+2te+hje+8Y04elScGV9//XXceuuteNe73oVbb70Vx48fD++72rdJ2pdWNQK1awPSyOg09jx8ELseeB57Hj64ai54xU0uQZT74P5X8K/7Xsax02nMZUwcO51ekVb5ejkX7RnaTrxffvll/PKXv8SGDRvCbZ///Odx22234fHHH8dtt92Ge+65p2W3SdqXVjUCtWMDUjUBDRpMVmr/tU4M1bxHcoaNgunC46Le2+McOcPB3p+9tiLHsxSNdmV2Am2VNrEsC/feey/+6Z/+CR/+8IcBADMzMzh8+DC+853vAABuuukmfOELX8Ds7Cw456t6W29v72q/JZIGaVWOst1yo2ebJlgsxbFU/rjaGkDQSR7UTRMIr5OJudWJfNt1EfVsaCvxvu+++7Bz505ceOGF4bbx8XGsW7cOjIkvIWMMAwMDGB8fB+d8VW+T4i3pFM5mEXUpcV7qxFBtDaAanAOE1DaHWknacRH1bGkb8f7FL36BX/3qV/j0pz/d6kNZEdauTZT83d/f1aIjWVnk62gvar2Owf4E5tIF6EUCalgOBvsTS772p/aOIKIx6JqQB01lMCwHT/3iNK678g2YzVroiiogZMHYSWEEc1kL/f1duPWGLfjXR0bgep4QdtuFwgg8T/ibiFZzD5wDisbw81cmcMXWdWf/ZizBdf1duO7KNzRl37rGoCoUmbwdbmv2d6xtxPull17CsWPHcN111wEAzpw5g49+9KP4m7/5G0xMTMB1XTDG4LouJicnMTg4CM75qt7WCDMzWXieiCr6+7swNZVZ8fdstZGvo71Y7HVcd/kGPPTkUTguL0kTXHf5hiVf+/hUFjFdCdvaAZHuGJ/KYmoqg96EVhFZm7aLNQkNU1MZXNQXwx9dd0lJlHvTVRfh6V+cRs6w4fpWrJQAUY3hXx8ZQeq6sx/SUIt9zx7DEy+dhGE50DUFN7ztAuy8etOy9lWcTurv1hHXFcSjqn+CogAWxLuR79hyhL5txPvOO+/EnXfeGf597bXX4utf/zouvfRSfP/738djjz2G9773vXjsscewdevWMIWxdevWVb1NIukEziZNsFTpY3n+OJ2zkDMc5As29jx8MHye8ue6eDCJb+w7DIO7UBWKZExFTFfhet6yc/FbhnpwZGy+5mvc9+wx7Hv+OAgIGCUwbRf7nj8OANh59aaS/ekaQ8Gwkc47ADjW9cZw8zs
3l+T6H9z/CixHXFHM5yx4HL6BFUCqO8w2DcJ5rUE+raVYvEdHR3HXXXchnU4jmUxi9+7d2LRJnDlX+7Z6kZF3+yJfRynlgrgmoeHnr07D42JkWExXoTBSUkETPObUdA6G6SIRU9Hle3+4rlez2mbXA88L35OylEsqa2HPn799yeMMcvHBSSOdtxCLiKsEYSRFccUb+zCXtTCdMjCbNgFwPyoWuB5HRGW4c+dl4f5c18P0vOHb1gJiSZUjHtVwx3u2YNvmPtz7nWGkC054vKZdWr0S0xXkDSf8+8G7rq37M1hO5N224t3pSPFuX+TrEAQzJk9P56AwiohGkTccuJ6IIhVGhQ0qJbjxyqGSVEMg3qOn0iAE6OmKIBoRF/Km7aInrmHXbW+pcv8UCCHoTmihaZXreUjoasn9q7Hn4YMlVwRnZvOwbRfc9+imBHBcDx4HehIaknENYxNZAOIEEToEcg7X47j0wp5wfxOzeZiWCw7x2lXf+5sS4MKBBHqTERw9mUImb4e/62IIgHVrYzgzkw///naTxbtt0iYSiWTlWKqbMYhiU1kLhBA4rgcrXxRJciFyfd06KCU4MjaPnWWPFcMNOMCB2bSB3qSOqO+ZXVzVUnz/nq4IZtMiIuacQ1EYDNMBuIjKF0vvlFfQOI4QagALo9b8v+ezFgrWwhBgMfJMOAQSALqmlOzPcT0EkiyqYICErkBhFGdm8zh+JgO3TLRFvboY1JCIqiXDiFcjIpbiLZGcY9Qq9Ts+nsaRsXmcms4hVxB53UBwysVG/E2QzlkYWBPFdMqoGj0rCvXHgYn7RiNKRfNL+fR2SoXop7I21q0RjTyOx5f0HCnPxSsKhWu5Ya7ZK0siVBtT5p9r8KbNvZjLWuH+FEbhui6IL8S6RpHK2jDt2nMqI5oiTlYqRcF0MJMqLP7BrDBSvCWSNqReH5Bq96tWh50yHex/YQzxqIK86VQIXTUc14PrApm8jbiulETbBMBsxkREITD9qNZxgZNTWXCPQyEIx4Wdns4hZ9j+MGIiomPOEdUVJGIaQG0wSsNjrdVMVL5QGtUYTMsV2WnOKwYKLzZgePiVKWzf2o+ZtAETwJouDXnTBSUEmbyFdK7ysVGNwXRcRCMq4roChRLM5yzMZRy0IvksxVsiaQNKqh5UinTBQUxXFo1Gf/7KRNUI27BcrOmKlOxftKZ7KPhiF4woWwoOIJUzQcGhagoiKoOqMCGM3EPB5GH6AABclyMZU+Fw4MH9ryAZ15DJ22ISDVuY4h5Mb59OGehOaCVCW6uZqLyCZn1vDFuHevDzo9MVKY1yVEpgF93H4xzDRyZx9W+vR85wcGIqh1zBrhD8IB2jKhSJuIqLk0nMpArI5S1cuK4Lp/0cdyuQ4i2RtJDyRcNkXMXkvJi8rmsL3iDVotFHnnmtaqej6zphOVuA7XpQGYXjiInujJJFI9MAVaGI6wrSeQeDMc1/HgrTssNUS3mFXLYgap1zhgOjLO/MKBeVJv70dl1jODmZg+t5/uvXQIuGFFe7sihf2Nw+Oo29P3sNp6bzNXPNgXAT/3/iugI9ouDgr6dhO8JMLEBhBF0xFbqmgBCRfqGUIlewkcpZ4AD+bOdvYdvmPtzxpaeXfA+bhRRviaRFlC8aepxjLmuBe0LgghwyUD0anZjNQ1crnfIURoTLIRZ8PBgliOkq8qYjxIgQMIqSiLU4gg7+3tAXB+ccuYIdlgFmDQeULviVlAe9HgfSeRuMin9HNAbLrwpxPUBVCLjnIW84yPlCTymB7XiYSRUQj2r4o2svaciD23Q89HRFMJcxa77flBKR7mAU2YKNbGHh/VQZxeWX9uGq316P/c8fR95yoVACw3JQMF3kTRMUBBv6VOzYvrkt2uqleEsky6RaVAjU3xgT5KY9LsrxAHE57wFh2VtANQe8db0xTM3lK5ppNvTFw9x3cBxXbh3A4/85BtMK9ikUNxFVkC04IBDRuFfU/aj5+7UcD+t6YzBtV5x
oAIBQEHiglNRMWbgeoGsUup+bDu7luJ6oZimOk/0TFqMUKkXFwihRWc0rkL3PjCKVtUQuviwdpDDRnKNHFHDOkc3bFSebHduHcMUbBxCPKogoDFds6cf/Hj4Bw3TgeRyef6w3vv2isFwy+OxbiRRviWQZVIsKH/zpEYBzxKJqzUixWPBTWQvdCRUKo3BckTsmgF+mJ8SUc17TAe8P33kJ/mXvL6s65ZV3OO579hgMu7L6ghCCqy4bCPPGzBdjjwuRTecsKIzgj669FABw/yO/AuccqkIRUUubUqqhawzZgl0i8tVy7R4HGAEiGsVc1oKqKSULowAQ09WqZYinp3O+8BN4RaeEiMaQ0BkKlodUtnrVyHVv3YB3Xr4R0YiCaIRBZRT/9esZ/wQlroYUf3E0KJcs/uxbiRRviaQGi1V8VKvomPVFZU1SD7cVR4rlgp/OWZjLWEhEFTiuA4+L6FtVRLSa9P3Ba0XwV2xdh9uvv7SuSP+Jl05WLT7O5G2cmMzixiuH8PNXp/yBu8LLxOUif33jlUPh8asKg2k7sGy3IoKtRt5wwrppxkjoa1IN1+PI5MXJYHq+EIo8ISINE9Mrp7cfGB7zm4k82P6+YxEGTWXImw5m0tVFO6oxXP+2C/Cu7ReJtQUEDTzAVMpAMq6hO7GQzeechyeN8s++VUjxlkiqsDzPalHvXExxpFj+o+/pimAmVUDBFNUhqawFx+VYvzaKm6+pz6ipXh9xw6odIU/OG3ju0BkYfmrD8wAX4krAdjgeffY4njp4CrmCHdZJ10sQbTNGEVEo0kWue4tBCQEn4vGUAJx74YSiLUM92PPwwfDqhTHA8QgSvtNhrmAjb7ol+4tGGJLxCOK6AlURzULve8cmVC63Lu3tUu2zbwVtN0lHImkHqk2DYYyGec5qk1kYpWHuOqD8R188iisaUdCb1MG5aJYZWBPFhrUxGLYwalrJEWG6ptSuxHA8zKQMUdJXmoYGIMQ6UyVXXA/CR4RifW8MpuOh3kxDsA4gUkdCzHviGn73t9fjuUNnwglBlBFoqoKoxpDN20jnrPCEEVEp4hGGjX0xXDDQhcG1MSSiKjyPY/RkCtWEG1h6rF21z74VtP70IZG0IUsNM6g2mUWPKID/Yw+25Qs2FEqw64HnkTccGJYDx+VwXFEaF40o2LyxO9xfo9PNF6M47RONiDRCNQKP7XpYTi9K3nQxdibT0GODskLPj/Rt18Ox02mMnkrD5RxxXQF0BQRAOleZGqEEiOsqLhpMwHI4uMdDYTdtF7rGwui9PN20lCNj8WffSqR4SyRVqGcifESh/hgvYR96x3u2AEBJsw2KWr8LpoNswQnrrG3Hg+WY+P03DZZE+gXTQTpnwXZcfGPfYdy587KqAv7zVybwv544UteoMsvxQKlIiZTDisr+VhrhzVcq+pQSRDWG3BKLncXHRIl4/6MRhigTJ6JMlRQMARCLCodDAuCSDd145penQydCyxEliuB80Zb8xdJR2zb34fh4WqwjtJC6xfvFF1/Exo0bceGFF2JychJf+cpXQCnFpz71KfT39zfzGCWSVaHc2znv1yCXV3IEwmg7olTOdjkm5go4Pp7Gzqs3hT/6PQ8fhMMRngCCS22PA57L/QYYNfSjDgR+Nm1AdCISGLZbNQIfGZ3G/3zqNYCgqgB998ARzGatcLFQVykICEhpgR6A+qPu5VC86+64iqwhTKgc10NPQkMqay0ZkWsqhaYwcHDkDQcF7lbcR2EEiZiGqKagYDmY8VNUvzo2W7GoqxCUfC7Lme/53KEzSCa0mlczq0Hd4v0P//AP+Pa3vw0A2L17NwAgEongc5/7HL7+9a835+gkkhVmZHQaT+0dwfhUtiRarRapghAoBBUVH3sePgjb8ZAp2KI+mgCex7H/hTFcPJgMBaA49VIwnZLJNAQi15wr2Dg1ncPGvjjmc5afAiBhw4yqiDLCf/nRIV9kRZQPAIpCQk8Qz+NIZS3c/8ghJOM
qZjILqQTOgYJVO7ReTi57OaRy4mRICGA5HP09KjJ5q2bUrzAKPcLgOF7YtRkQRPS6xpBMaCAQC5UT2Vxo5Wp4Hk5P5yqi6MBTHADyho103objeJiZN0I/lmLKq46yfht9Jt/axEnd4j0xMYENGzbAcRw8++yzePrpp6GqKt7xjnc08/gkkhUjEOiIxiqi1WqlfwCQiGm4t6wdezploGAKj5DAI5oRVEyEKU69pHNWKDgo+n/b5YApRoiNnkrD9pWMUQCEQFcp5jOm8Jn2H3dyKhf6R1u2g7mMGZ4YKFmoi24GBMLNzz6bBTvfRrVW5UkQaZuWi2yV+1AiyjFVhcKyPRAOTM7nKyxZuSe8xctz27pKcWYmH3p/UyLq3QlB1br88rWIiVnhZxKcOFtF3c+eSCQwPT2Nl156CZs3b0Y8HgcAOE7rLhskkkYIBFp4VpRWkJRXggC1DZL6unXYrldSq+BxESkW37+4asF2XBT/1osfa7seXjw8CU2l4XbXE7XFedOtEPzg32dm8piaK5QIqcebmwYhlMA5y0oLsUDKkclZJTn4aIQhrithpG2XheSaStGb1DHQGwsXhn9v23qs6YpUvOagpNG0vbAyZT5n4cH9r2A2Y8H1FrzAPd+7vKcrUlJRBKBiLWIuY4p98+CVtI66I+8PfehDuPnmm2HbNu6++24AwMGDBxseDyaRtIrFKkjqWaAM2LF9CKOnD8HzuEiZcFHaxj2IcV5FcxwBIQAz8wXfv8SrWivNgYoFPM8DvGorjGWPW02qTZFpFIKFVI2mUqiMwuUcBbMyl00JoEcUMewAQK5gI50T1q26xnDxYBI7r96Ev/zn/w+G5YSVM0GbPCGoaKRyOQel1K/LFzC/8qe4GQdY+M7kDRuzGbPkpFuPsVczaWgM2uuvvw7GGIaGhsK/LcvCG9/4xqYdYKcix6C1B8X5yrzhQFMp+nqiYbQajOwqLtUrXqCsNYtx37PHsP+FMXjcAyHEbyYhWNOlQVFYxWOLTai8Kt7T5xuMEkQ0Jib4VGnbZ5SgK6YhHlVg2y7msxZsxwNjBBv7EwDEZ6cQkdp69cS8Xw8uroCC9n5KgAvXLYwYOzGRCa+SXM8LI3ZKgL6eqEhRcWDzxmToDyM8u83Q0Mt2vJI0Vi2aPcOyoVLBN7zhDQAWooGLLrqo4SeUSFaL8nyl63GkciYYI4jraoUXCFC/qdTOqzfh4sFkaKCkMloylzFlOvjGvsOI6Uq4r9uvvxTfPXCkZDHxfENTKRRGYVhuVV+UaIQhEVVBKUXesDE+kwcBF3M1gZJBwo7jYjJtYoCLKyjbH4vmuB40lYGDgJWNdA/ElooFi/AkygF/Eo44AQfrIUFTkO144VUWoaKZnq/WSm8N6hbvl19+Gffeey9effVVmKZYEOFcOIG98sorTTtAiWS5lC9CJuPCj9qwPBBUeobUqu2t5XES/Fc+ET1v2MgUxKBay3ExnTJwZGweiagCz1va42OlSMZUmI7nz3rkTc2FL4WuiRrtV7IAACAASURBVM/AsNyKSJsQIBFVEdNVcO4hb7gldq0Ld1z4DAEgnbPF4GSVoTshZmMSP6ru6YqE9dzFTVOcQ9S7+6ZXQY17MMS4+ARsAjgyNo/br78U39h3GKbtQFUokvEITMsJq2daRd3ifdddd+Gaa67BF7/4Reh6ZR5QImk3quW4u2IqTNvDP955ZV37WGoeZJCOcVwP3QkxvSadt8E9IZbF6ZFsYXUX9w3Lxfq1MViOh1TGWLRcsBkQItryXdcrGcoQoCkUXTEN0QhFtiBmQIp2eoaehIZ00aT2wBvcdjzomnBadFwPa/01Cct24bqiht31XDiOhzvesyVspjEsB7omroIcV0wUchyR8lKY+JxIWZQerIds29yHO3deVpJWE7X4raVu8T516hQ++clPVrxAiaRdqbUIOeDXSddDefTuuh7mMiZ+/OxxKEysirkcyJtCQIJ8evEV9VK50WZAfVOnvOH
AMO1VFW5GCVSVwrY9FKo0scT00gXIgsWhUArXP+EF2ykhoExExOt7Y0hlTZiWC+ZP2lGo6F5NZU3Ml1m+zmYMDL98Bq+dTiOZ0NCn6KFdAQjBmq4IHMfFXMYCwKEwAsf1Suxnixesy9NqrssrhlesNnWL9/XXX49nn31W1nVLOoZq/iOu6+EP33lJzceM+CO1grZ3j4vhtBGVhRUHQcqjfNExb7o4MZkN7UWB1gg34OdmwaGrFNOpyqi3GaiqGDBsWi7cssoRSgm6/NSIYTmYz5iwHA8KI+jviWJyvuDPtgQcx/MXgDkIJUjGRBojsMjd8+dvB1C6CFyMSIsTDL8yhYHeaEXtvuIP8ZxIifr5YNRb1k91TacMkLQJRgmu3DoQ7rc4rfaxLz8DDh4Or2gFdYu3aZr4y7/8S7z1rW9FX19pXnDPnj0rfmASSaNUy01X87sGUNG4AQB7f/YaTk4HDRgAIKpIZtOmGEuWt2v40C3AOUAZAF+7WrqkxRG+nmYSUSk4R9WqkYjK0BVXwfwZkBNz+ZJyQ8/jmEkb4XQeSj0EBXAeB5IRRYxvM2zRSs95SSnm8fE0fvzs8ZLnDOq2AbGoWXzlpSkUcxlTeHhT0UHLufAtj6gLfiuaIsbGPXfoTEnXbMC6NTrGZwsr8O4tn7rF+5JLLsEll9SOWCSSVhJEYaJt2QoXCQFxGT+4NhaK9L8+MhJ6gkzMFfD//vBXcD0xQivA9QCFLcx1nJpvIMfZBlWAzY74CUTliONymFUWIOO6gnhUg+N6yOZtGP7UekorT38Los9L9sG5qH3nKCBvuAA4epN6ybrDc4fOVIw+K/73XEbMBw0WIcXVFwdjFKpC/fI/wOMEBdMBowSKItI0gChHrOZ5cvM1l+DB/a/U7U/eDBqq85bUj6zzXl32PHwQE3MFpHJW1UYSQoCumIZkVAGo8AQJTKAarbluVSqkHWCUgDEK23ErqlcURpGIiqnsBVMMFy5+b3sSGtI5K8wTB5PcaylQ4OdtOx5URSwYG5br12+LdvZoREEmb1edo6kpxB/tRsOFW9f1YFoueroi4vP3G28IhFUBYwS9XZFQ7DnnmMuY2NgXr6g2Ghmdxv/zg5Ga71Vb1Xm/+OKLePTRRzE5OYmBgQHs3LkTV111VcNPKpEsNmJsOYiqD7t27S0HDNNBwXCwtjuCiXRh2f4c56Nwq751gO14cL3SfLauMXTFNRCIhcZUrtRnhFECTaXQIwoyeQuKP8+SEOENU+vkSSnB+rUxnJzMIhphoQMhIYDLud/+Xr1mPsifR1SKgulhbCLrd2VSAATpnBVWBwUNQAAqziTpnAXDdEta7M/WZ32lqNvb5Ac/+AE++clPor+/H9dffz0GBgbw6U9/Gv/+7//ezOOTnIMEKY7yH8TZTI7RVQrT9moKK4e4ZLZdD2dmly/c5xuqQsPot/g9owToiqtYvzaGmK4inbUwOVcQaY7iShsC/N8f+G/42M7fQk9cExUkhEBhwselWsRMiLB4DYYvU0qQytkLHi9V7AXKURgFOEfeFD7mjIlI3bRdeJxjPmth7EwGcxkTju9TIyp0RIrsxEQGqayJnOEgEVNrTlRqJXVH3t/61rfwne98B1u2bAm3vfvd78Zf/dVf4ZZbbmnKwUnOTYLyO8/jmJwrhJfBe58ZXdKOs1qEPjI6jXQTa6hbXRK22gTOgY7rVZzkVIWiK6aKAb+Gg8m5Qk2/E0rFNJviJqiR0eklc8XF9fFnZpa34Gq7Ra3vlIJzDup3VAY15xwLz1PstwL/3+mcJa4q/GqXgFqGZatN3eI9Pz+PzZs3l2zbtGkTUqnUih+U5NxmOmWAEGAuYyIYOuC4Hk5O5fDXX/0/2NAXDxcX6xkNdmB4TMwzJKio9y2mfGGrXs4X4aYUYITCriLa0YiCZFyD53FkCzZm06W2s+XvrRgAwbCxL175RA30itRz4qz2uRb/7boeGCXwOK+5r2AzKfpbtOITWI5Xl2H
ZalN32uQtb3kLvvSlL6FQEOUx+Xwee/bsweWXX960g5Ocm/R1635tbjB0YOFHlSnYOHoyhft/dAjfe+LookOAAwI71yCHWQu5NF8dhYmFQc9DiQ0rpQTJuIbBtXFEVIaZlIGp+ULVxpvy95YAMGwXW4Z6SrYHJ9p6qefEWaWApWIfHucVKZryc0ixcAMipSOsDKoPI17JAdHLoaFJOp/61KdwxRVXoLu7G6lUCpdffjm+8pWvNPP4JC2mOG0x2J/AdZdvOOuBuKemc7D8SeIcpQtWnAMKRdgs0d+jA2V1uqemc2Gdtq5SZPIW5jIGVIVVe9oSzrcUyGIoTCwcli8YaipFMq6BUYpswcbEbG7p6BeBZ4j4DEWTEPDESycrpgutZJO2yJsvfb9q96F+mUlwW/lLJATY6F8FVusVCK4CW0Xd4j0wMIDvfe97GB8fx9TUFAYGBrB+/fpmHpukxZT7esylC8teaS/e15quCPKGLQyBSGU0RAjxJ9OI0V6EiOoAy/HC/Oqx02noGsNcxoTncRCKuhYhz3fhJhCLd2KCPS/ZHtMVdMU1WLaHTN6GWcWPpBbBnlixUx8H8qaD+390CFdc2oe5rG+tukIfAvU7JVHFdKueck6PizvVKll0XI4tQz1VDcv2PHwQjLV2ks6i4h24BgILNrDr1q3DunXrSrbRFo8DkjSHcl8PXWVwXF73oNbF9rW2O4qZVAGMUhBCYNpCKJh/Dez5EbjtepiaK1T8EC3HE5UIRHhfEHDA99WWVBKM+iqPtBkj6Ipq0CNiAXJqrrDs99D1AEIqH2s7Hl44PIlYhC3ZoVovhCx8R7g/Vrn4sItPJvBLEilBSUNRINZXXTaAE5NZnJkrhO8NpULU97/wGxwZm69YKK9merbaLPrsb33rW3Hw4EEAwGWXXVZhSiUtYc9tFps8c7b7ikYU0S2XMRHRGCzbDet+xY+QoyseQTpnYrEeGo8DFMIcioKD0doCfj4215Qs1BWFlsJGVQtbw+ezKzP3crF1hXyVSTn1Uv7ZBc/jeaKKRFEYIgotqWIJ1lO4B7g1PvlEVMFc1oJhe9jQFwchpGRqjsd5OD4tGddgWC76unXoqvDKaSWLivf+/fvDfz/11FNNPxhJe1HuypczHMzMF8A5KkZ9NbovQETMmzd2Y9dtb8E39x3Ci4cnYTkeCAFiEUW49tUhua7rt7YTCs5r/6DOJ+EOTmLFJzJCgHhURUJXYdhuyeDidqdm/b5f8x1RKEzHC78twRUcgWjoqUW24GD0VArr1kTDqpIFDxsSttDnDCe02J3PWaFXeCtZNN8xODgY/vvAgQPYuHFjxX9PPPFE0w9S0nxGRqex5+GD2PXA89jz8EGMjE6XDNDNGzYmZ/NwPY7uhNpwY03xvqqt2r92Oo2erggiKgUlwu/ikg3Jui7hORBOUDmfsybF3iHF75vCiBjcuyYGcGByLo/5Jgi3Qol/wj07yvdQPhiaYKHjkwOIRxTkTEf4lFACTaGIR1UwSuo6YRNCkC44yBdsmLbw+Q4GDCfjwlec+FF8UPUU05WSwRCtoO5k9f333191+wMPPLBiByNpDbU6HgHg9usvRU9cw3zG9EUggnhUa7jTbNvmvnBfecNBT1zD7ddfCgD4xr7DmE0bKJgOuhMRrO2OwgPHC4cnm/aazyUCnw+O0gHBusawrjeKZDyCnGFjYjYvbE+bcIIjBFiTrJzivpz9ML/7MhDxYOoN/G39a6LY0BfHmq4IVIWGpleAeB9iuoJcQSxw13MuEdNzhBgHXaCMihNeNKKEYl48gk3zn7eVLJlxf+GFFwCIxckXX3wRxT5WJ0+eRDxepQhf0lGULyZGVAbT377rtreEo766E1rJYlej+e9g1T4oGfz6o4dgWkUt7Y6HKT8tI1kaSgk8j8MresMoAeJRDTGdoWC6mEk1brzVKAQiLRbTVcxnrYYWPKvlshVG4PGFiiPGKFzXDR+ga0wMH2YEN145hP0v/Aa
c83BEWTpnhXte6liSsWD8mhhcce9Ht4fBDKWiRV+koLySSDto1Gllp+WS4v23f/u3AISf99133x1uJ4Sgr68Pf/d3f9e8o5OsCvUsTPZ168gaNlhRZVE6Z8GyPXziq/8HriumkQTdkbVy4cEPI284MMqmu8hKkfoIatWLo2yVESQTGqjvmz05Z63aSZBjobO1OLgL5kMG9CS0ig7Y8kNUGMG63hgmZvPCTVBloT1rKmvCsj3kjdL5o0fG5jExJ5qHpucLFW3utUjGVKxJ+mPUFpmaM9CjI11wQjEvHlwd2A63giXF++mnnwYA7Nq1Sw5dOEepNS6suAV4x/Yh/M+nXoNDxDDXdM5COm8hqinImw7AAdPmmJhbvBb8wPAYHJdXnRwuqU1Q6BU0wATEIgqSCRWm5SGTs1tWAWE7YoSY5/HQG8XzOCIaRVdUgWV7/igza9E8tOtynJrMhhUyEYJQMFWF4k/evaXkezUyOo2pubw/zmxpCBGBSURliEXVCjEOKK/tXmkXzJWg7kLFj3zkIxgfHy9ZxBwfH0cqlSoxq5J0Hju2D+HB/a9gNmX4/scEusbwR9cuDN/YtrkP3d0x/K8njmA6ZcCyPXTHI8ibTrhQFhja93RFKmrBgy//0RPzdbnCSQSBb0dxFE0pEb7ZmvDNnpwzappDrSqcg4Cgu0srSTGYtou+7ih23fYW3POtF3HKn+5TfsRxXUHOcOB4HAoVV/d508WpqRwGe6P4o+svrfhOPfTkUWQKzpKds9Sfe6lrop79QzdUTlhaTIyrNeq0mrrF+zOf+UzF4qRt2/jMZz6Dn/zkJyt+YJLmU9yunjccf4WIAyBVzYOu2LoOF/WJS9hdDzyPmK4gnbNCbwlKRMVHecqluLtSVVjYkCOpTTXRFjldDYQAuYKDdK61Y7jKUZiw5c36tdZdMbUiqjVsD2u7I8gUHJj+QqPwfyJh4OB5HJyQ0KbV8zhOz+Sx92evAShNazBGFx63SG6fUYJoRPFPJHpbinGj1C3ep0+fxoUXXliybWhoCKdOnVrxg5I0n2JBDcZQEQC93dHwS75YJ2WQalH8OtjAUlNhtCLlsvdnryHlL2RVG4MlWaBctAkA3Xf0My0XqawFpx4zjxZg2aJOOhphyBZsuK5XsQbS163jzGypzasHQGPCgpb4ZSYLlSbi/xklmJw3SlJywVpN+B1cxDXSdjycnMwgHtVKrig7mbpLBdevX4+XX365ZNvLL7+MgYGBGo+QtDPFFSaOP8cvmDACLF1JEtRtRzUGDrHYyLmHaEQpibRGRqdxeiYPj3NQUrqgJamEF4lVd0JDX48OSvza7KzZtsINBPX2HBm/ySUZU7HrtrcAQNhDMDUvRtU5oTEZ4Hni+0YJCV9/sfdKUDYYVJ4E5al93Tosx0MypgbXi4viekDBsLD3mdGSfoZOpe7I+0/+5E/wF3/xF/jTP/1TDA0NYWxsDA8++CA+/vGPN/P4JE2iuMIkiFyCtAewtGdxMLn7iZdOwnW5P2KKYd2aaCjcQX6TA4te0p6vVOsd1VSK7ngEjuuJkWId0gEZwP2TtMdFqmPfs8fw3KEzoblZMFxBVG4AqiIqOAqmi3W9UcykDRTK2uiDk4KiMDiOi9FTaex64HmRvy7YiEVVrEloSC+yYBtM5rFdYHKuEHZKtstIs+VQt3jfcsst6Orqwt69e3HmzBmsX78en/3sZ7Fjx45mHp+kSRRXmCRjKmZ9tzeFkZLux1qMjE7juUNnkPSjw/Lc5oP7X/Enf0vKCUQ7eG+CxpJYRIVhOZhJ178AqSkEtsPb6n0mfgStMIInXjqJZEILK5lcj4dNMOt6RY+IwghSWQv3fnQ77vnWixi38xUWrq4HxBWKuYwFRgliuiKEmhAoBHBAsGlDMvz+3f/Ir2D7tglBq3ywT9vxYFiuSA8CyzJaawcassV697vfjXe/+93NOhbJKrJj+xAeevIoTAiTqC5H+DdEVIaeuLbk6vtijT0
ARNcbWf70mnOZYse7rpgGxoC84WJqvv4FSLE+oaO/W8erJ+bbpnzH9bjv1crRnYhgJmWgT1m4giNEpERcy8XYmQwURpBMRNDvX+UZtoeN/QkUTCccDBykQ0RHI0dPlx62qQNAIqbhXj89E7B5YzeOnU6HJmfljUqzaQO9SVF9cjaNNrWcd1ZjZWdR8f7xj3+M973vfQCAvXv31rzfzTffvLJHJWk65Y0I63tjDdWuBqb6E/4AV4VRaArFzLwB1zu/PUaWQlMpkjGRGskW7GXlsSkFVEaxY/sQXl2hRpHA1ZEvMi5sKTgX3djJuAbGKHRNCQ2fhIf7wo45ANvlmE0ZeMd/E7MBgivCmC46H/OGLWrDOQfnPGxZD6i1NrNj+xAe/OkR5ApWRRTPfMOzdM4CpZG6R5pVq/VuJUu6Cgbi/eijj1a9DyFEineHUi7gQdRcj4DrGsP4TA6AsBx1XBeG5YJRv+Zb5rhLKE6N5E0Hs+nCWZ3gKKH42B9uw0V9sRW7uuFcOPDR0Jp3eTslhCCTt+Bx4Ia3XYDnDp2BCSCds0uqaMIrEEZwZGweO7EguqLnwAOjFHpEwR3v2YIDw2OYz5U249Ram9m2uQ93vGcL9j4zipNTOQALk3PE83PYVZpzalE+mCTIl9d0O1xyj2fPouL9zW9+M/z3d7/73aYfjGR1qfWFBOoQcM79iKb0a+p64ochEYjUiOo3nDgNpUZqQQngegvh5Ia+OE5N5Vb0XSc18gHlLe/lqH5nJaMUyaiCnVdvwsWDSRwYHsOkXyKoMDF0GhALnBwojZ7DkwYp+bs41acptGpnZDFBLfeehw+G6zt5w0Y6b8N2POiqgtvLGn9qEdaUu5642nS8MJfeKhYVb8+r73JOTtLpTBbLW5d/oUdGp/HU3hGMT2WFIU+6dp5QSrd4LxMxFbYjRoqtpG+L5y8GPvLMa/jkzdtw8zs348GfHgnLPM+GoEmm1k+6YtxYUdRPqfC2EfdbsEAoFtGFPLS/P/85g+j5wPCYqB5JLkTTQc9BUHbYaJt6+fpOIML1CjcQnFw45rLWQvNQixdzFhXvatNzqiEn6XQmxeWCBdNBOmfBdlzMzBcwMjodfrH3PXsM+18YE+Va/tBa0+qsErbVIBgioUdE2/pMEx3nHJdj7EwawEKK4LuPv4qZ9PIm4giLA3/yjL//ahQLr6pQ9CQ0ZPxIViny3bYcD7rGwkHRfd06tgz14PRMXuShOQmFPxpRwuh5KZO05XRGlqcHl+NN0tet49jptC/cwVUD0MpQZVHxLp6e88wzz+Dxxx/Hxz72MWzYsAGnT5/GN7/5Tdxwww1NP0jJylG86JI3HNHOrjLMpg0IP2SR67z/R4cQ1RiScQ2nZ3IgvleyZXsw7ZUZmXWuoDCCRFQDIKLNnLE6NqGZvI1dDzwfitGX/+J3w4lEjUiKwsTnXu9whjVdYjyd53HMZkzoKoMFLhq2fKOnYNKM4/EwJffcoTO49vIN+PmrU5iYzQMgWN+r46Pv2xbaLtRjkrYczrYdfsf2IXx174gIZkMfldrvch0x71lDeJ0tb9dffz1++MMfIplMhttSqRQ+8IEP4D/+4z+adoCdysxMNqzV7e/vwtRUpsVHVJrj1hSKTN5GKmeWGNkHl/eyvG9pIhpDXFdg+2WWq20ORQlwwUACmbyNdC6oyAAoI9BVhlwN50ZGiT/lXVQF9SQ0ZPMWAu1eagH0ovVdJVdqEVXBDW+7AEfG5sPINpu34HCUiHBg6RrTlZLot/j3Uf4dDfLajaQ4msU93x7G5FzBvwIVPjOTc9XXMBgl+Oaua+red39/V8PHU3eddyaTQaFQKBFvwzCQybRelCT1UZ7jDpzf5jMmAA7ImuwloQSI6So0laJguphdZppiJVAUivGZfEXE7LocOVc47XEU5aQJQCiB4gs3oxTb39iHuayFozkLCiUgtDICL167DMacRSMKohElzG3vvHoTdhY9JjAuC8gbNjIFG+A
cfT16yeL4dUXCtRIpjmZx8zs3V5xYarEa3vR1i/f73/9+fOQjH8GHP/xhrF+/HmfOnMF3v/tdvP/972/m8UlWkGr5xK6YGi50BQ0UkkoURpHw/Z9zhoNswV76QU3GdXlFtF8stOX56e64ilTWRldMqxDF4oqMk5NZiCk04rHFz9AVK53bWCulUZ7+SOdtgAOqwsIGm2Bx/Lor31Dy2HZ1/Kt2Ygna/VtBQ5awQ0ND+OlPf4rJyUn09/fj9ttvxy233NLM45OsAEGeO5W1kM5Z6OmKhI0OluNBYcRvFKGQtSKl6BpDVBfDBFI5s62uTKg/Ib6YWodnOx4cl2PzxmRYtVFMcUWG4jv8USou/4VfiSh5pFTYJyxVqhfsL22J4dWm71zZpS2kURodo9cOlJ9Y7vjS0y07lrrFm1KKD37wg/jgBz/YlAOZm5vDrl27MDY2Bk3TcNFFF+Hee+9Fb28vXn/9ddx1112Yn59HT08Pdu/ejYsvvhgAVv22TqHYq9swXTBGYPuhVJCnY0xcVweeJquxyNIJBKkRhVEULAdzLUyNlKMpNMxVN3ommc9aSOhKSSVRQHFUmSvY8DyOREwt8eQOBkbXk9IIjMv2vzgG1xMeIwRAtmBDUxmiEWVFFiJbDatyAg22N5u6Fyw55/jBD36A/fv3Y3Z2Fj/5yU/w0ksvYWpqCu95z3vO+kDm5+fx6quvYvv27QCA3bt3I5VK4Ytf/CL++I//GB/4wAfw3ve+F48++ih++MMf4t/+7d8AYNVvq5dWLlgWL/rMpU1YtlszIiNAWP7HfV/NdoouVxPFb1YK8rjtOFOTUQIOjpjGYLuiS7DWYdLAG7zo72RcQ85wENXYovNGv7nvEIZfmfKtfAkuvSAJEFJTtKu1jgcdkUFzzGzGBPwa9TVJPTwhXHflG9piQX85/Onup6u+/5QA3/rstXXvZzkLlnV319x3333Yu3cvbrnlFoyPjwMQHt/f+ta3Gn7SavT09ITCDQBvfvObcfr0aczMzODw4cO46aabAAA33XQTDh8+jNnZ2VW/rVMoXpi0XW/RRAghoiXa46WLWwDCZoRzHV1j6E5oUBXhd7HSTTUrCSHAzrdfjD/9g9/Cmi4dA70x9HVHqn5OHi9No1BKkPGjatN2w0XDck/rfc8ew/CRSRAiBhtzznHkRAonp0Qu/NjpNL66dwT3fHsYI6PTYbAwMVdAJm/h1yfncf8jh3D8TBqaX/sd01X0dkWgMArb8dAT19qiguRs0TVFLAQHbfdE/GZ0rSHPv2VR9zP86Ec/wo9+9CP09vbi7//+7wEAF1xwAU6cOLHiB+V5Hr7//e/j2muvxfj4ONatWwfGRK6MMYaBgQGMj4+Dc76qt/X29q74a20GwcJkwVy6fC2Y2lJcmRBc4moqw7reGH5zpjOjosUgBIjrKhgVczdT2bPvTmwmwWW4pjDsvHoTAJTMFNU1BYQAedOpeuVE/ZmQ4DzMldfqqH3ipZMgIAuX/kScBbIFBwV/dBkhBJP+sOmIQuG4HJm8BYCAUTGazLE40jkL3YkIACHgjFH0xLWqefdO5E2be/HC4cnw7+BK502bm68VdYu367qIx0Xra9B1mcvlEIvFVvygvvCFLyAWi+FDH/oQDh8+vOL7Xw3Wrk2U/L2cy6LlMtifwFy6gLnM0rna8soETaWIRxTMZS0Ylus3U5w7KIwgpi9UjbTF4N4lIESkdDj3YNhu+F3q7+/CFVvXAQA++j+exFyqUHXFkgDojmmY9ye3E8KhKhSqQqEwgrmsVfL9NGw3HAAMlA0/BvFH2YlqlIjGMDFbAKOi0SfoPmR0oUU+EVPFicJ2AQ7cesOWit/Dav4+lsvPX5nAI8+8honZPNb1xvCH77wEWctDIqYgX3AQuOHGogqyltf011S3eP/e7/0e/vEf/xF33303APHB3HfffbjmmvoL0eth9+7d+M1vfoOvf/3roJRicHAQExMTcF0XjDG4rovJyUkMDg6
Cc76qtzVCK3Pe112+AQ89ebTujrliLNuDZVtFf58bw4J1jSGiMZiWuyIeIKsFQXB1JMRSV1n4XSr+XmlU2KuWE1xRpfMLpY2ci5JQYYlKsCahlXw/dV9oGRX7K23aEWsjgb8K9W+0HVE3HiyheZxDVRgiKkVCV0ty4Rf1xUqer12a2BajeB1JVymm5vL4l72/hGG56O3SsTZJoCoiJcQ5x/hUtqHX1NSc9913343JyUm89a1vRSaTweWXX47Tp0/j05/+dMNPWot//ud/xqFDh3D//fdD00Q96dq1a7F161Y89thjAIDHHnsMW7duRW9v76rf1s6MjE6HcwIPDI/hd397/YrstwMC05qI1IgY3ut5HCn/aqKTCN5+1+Pg4LjhbRcAEJ/33Q88F85iLBS9ruL0N6UEmkL9yFhsY1TcZz5jVi31u+FtF4CD+4vYC4lzsVi60BqewQW1UgAAIABJREFUjGuwHA/remOghIb3D/xRohGxKLrrtrdgz5+/Hbtue0vH5riL15GCOnVhcMUrmnVWq4qmrmoTzjlOnjyJwcFBpFIpnDp1CoODg+jv71+xA/n1r3+Nm266CRdffDF0XbzwCy64APfffz9GR0dx1113IZ1OI5lMYvfu3di0SeT9Vvu2emlm5D0yOo3vPXEUMykDHIBC/SaMLr2k/nY+Y4XlgbUmfpyLdGJqpJyIysAoQlGOaqIFfefVm8IoMKIxUEJgOR4mZ/PQNYqC6fmpkQURj0dVrOmKCFta3xLV8QVnsDcKw/YqKkj2PXsMT7x0EoblQNcUvGlzL05MZnF6Jg+FUXQntBJ3vuGXz4SeKoFBV0yvz3K1EyLvoGO02KiPc475jImIJoQ8rivIGc6y2vmXE3nXXSr45je/GQcPHpT2r3XSqHhXK7Wq9uGPjE7jG/sOI29W+lYkYwtWmqbtYnI2X1FBci4TURl0TVzyd1qEXU5XTMXGGqV8QTdkIqqGqbHxaTFwoKcrgnTOEk02hGBgTRSJqFph9pTOWcgWbKzt1qt6iNT6PtaaJhOk6QqmC9sVXtc3XjkULq4uRieId3EHaoBpu+HIwAPDY5jLWliTWHqEYDWa6m2ydetWvP7669i8eXPDTyJZnEaGIhwYHkPBF+4gBgi0OVOwoUeU0DAoCDjP5ag7iPIYIyiYLlIdlM+uxdouDV/+v66ueXs1m4PuhIaZlBHmXMWQA4Ir3tiPiweTFUMMsnkb8ahSMaDgG/sOh9NvHJcjb9iYyxgYPZXGjVcJMa52MhHfXRXd/jq9abvhdJxzgcUGQQRdl6t9EqpbvH/nd34Hf/Znf4b3v//9WL9+fcnlgxyDdnY0MhRh2k+VVCu/5hyhtWtxtM0RmMc36QW0AEYJ4roKj3vIm25HpkaqEdUo/vuOLYveJ/AN0YqiQMYoepM6UjkLrsehMoZohOG5Q2dw8WASt19/KfY+M+pH6ASu50FVaNg8QwAwApi2g/0vjiGiUv/qZaH0b/+LY7h4MFn1O7mYB/e5QDsaZtUt3gcPHsTGjRvxn//5nyXb5QzLs6eRL39ftx7mumsJOOeVXXce9832O3yGgqb6Q21tF+l850fZxWgKxdruaCgItVIXQRRoWE6Y83ZdD7pKoXTrFZf2B4bHsGWoBxNzBbhcNN54/omeUhoOGPC4WDuxHA+5ggPGaNj8Q3274GoBRbM8uNuNWoZZwec0m7XQu8y0yXJYUrwLhQIeeOABxONxXHbZZfj4xz8eVoJIVoZGvvw7tg9hbCIrmjFq7E+MlvI77Iru1KnCTQBEdQUqo6GP9LkGAbB+bSwcHVZPKu2pX5wOx9Lt2D6E7z1xtGoQcHo6h9FTaXicg/nzLzkA7ol/B2IuKkgiSGUtmJ5b0rXJIabV15rU3shsyXOJ4s+pK9rgHNizZEnxvvfee3Ho0CG84x3vwBNPPIFUKoXPfe5zTT2o841GvvzbNvfhzp2XlVSbaArFe64cwpGxecznLEzM5CtaozsRRkX
VSDChJu9VHy5wLsAhFhHX94qmt6VSads291V4gvR1j1UNAhyXw+OiDhtYuGIjDP5tomknGRduk5btwrJduB5f6LyFKP2rNak9OOZ2SSmsFtVKCGulPFeaJatNrr76ajzyyCNhi/jtt9+Op59unQ1ip9CsapOl9vHg/ldKGjI6EU2hiEYU2I6LguV2TLVMLYe5Rh7/B2+/CEfG5nH0xDxUhSEZVxHTVb+F34TteLj0wh7s2D5UId61ptAYViDGC141nHO4HNjQG4XpeBWPuWRDEj8/Ou3nzymiEQZVoXWVwDX6Xe6EapNaFJcQFjfp5A0He/787XXvpynVJvl8HgMDAwCAwcFBZLPZhp9EsjSNGtDX+oEk4xpypgO3A4cqxCIKFIXCtDuzakS0sZOSgRaECI8Y0YUoYtha+u55HM8dOgPGROu643qYzQhXyJwhPEtUhYWX5t3dsXD2I1A7Aj4wPBaaRnmchIvXjBLcfM0lVR+zbXMfti8joGikcupcoJX5/iXF23VdvPjii2Hbq+M4JX8DwFVXXdW8I5RUUOsHcnw8jYnZgj+xhC6rPX61oZQgrisAAQqGU7V+vVNY0xXxq31EhCt6Iji64xp0jYXR2Me+/Aw4eOgDAoh2ctfl4SV4d0Lsi3PR2k79rptkXA0vzR955jV88uZtJcdQKwh46Mmj6IppyBu2XwNOceOVC2Jc/phGoufyodaaSv101+KVU+cCpUMsxKCK1cr3Lynea9euDf1MAGHdWvw3IaRkyryk+VTLh6ZMB/tfGBMt0Fw4M7YzqkIRiyhwXE+Y/3fehUKIcGCkmM+YYJSCEPFiRIRLfA+RSBiNrVujY3y2AA88rMHnfKGVHRAzInuTOlJZE6bNoSg0TKEAIrU0WadpWGlETuoS43qj5/L7zmUMmI5IIRQf67lUNlhM8Xt7Nk06y2FJ8Zb57dZRK/qpVlqYMxzYrgdKSVvXPEcjCiIqhWl7HZEaqac+vium4o4bt4bVHgXTwWzGhMeFONtlC9A3X3MJHtz/CgzLheP7rXMORFSKTN4OB0NHIwooJUhnLSQTWsWl+UBv/Y6ejaTlGuk7KL+vqjDYjod03g7F+1wsGyym7Zt0JKvLYtFPeZ6tYDphioQSoN1i7mCsGKMEedMJO0TbnWBxLzjZ1CIQ2+BzCUQr6FrU1VKPj22b+3DHjVux92ev4fRMHqrvFWI7Yk4mgJLxY0HHY3k10h++85KKY1mJhe9G+g7K75uMa5hJFcKFu/OhbLBVdd7SqKRNqeVidmB4DDu2D8F1PZi2G5rjAKK2u50mwAQGRvGoirzpIJWzOiIPH8D9Rb0/f99v17wPAeB4HA89eRRbhnqQL9gYn85hNm2CexwJXcGdOy+rOjMyEdMw0BvDYF/cby2PIBnTYFou8oYTTpvZefUm3H79peiJayXbAy/vgOCEL04gSs1JOUvR163X7ZRXft9oREEyHoGuspJjPRfz3UDpe15c593oe74cZOTdpiwW/ZRXFXAOdMUU5E0XvA3EW9dYWC+czlkdU+pXDofIQx8fT9f0hyF0obb350cmF+ZhBT2wi0x1nk4ZIASYyJhwXA8Ko+iKicHH5WVm9aQ9Gkl3LEYjfQfV7qswgg9XOWGdi7SyzluKd5uyVAlS8Y/5nm8PY3w239LyQEKAmK5AVRgKplPXFJ9OgHOOR587XrPhSfUXGDWFYnw6j7U9OtZ0RcLbg/b0aj9kXWMYn8mBEApKRInhbNrA4Nr4so51pTxGGmm6afcGnXJr28BWd6Vopa+LFO82pd7oZ2R0GjNpo2XCzRhBws/x5gwbuUJn5LPrxXIWf1+TsYVFOYCH1SIBi/6QeeBQw4uuTgiWe6mykjXHjSxwNtqjsFrse/YY9j1/PJzHadou9j1/HABWTMBbWectc95tyrbNfVXznNVW+1sh3Jr6/7d379FRVeffwL/7nDP33DMQghb1jUvNr7yxFCoIr61AxZSfJthlhMYiXUXt0lb
R2lasrVqpr0KXS1prL65S7aq0urQ/rQWkouBbK6uKlRaoFxZRDJIbuSdzn3P2+8eZGWZyv+ec5PtZyxbmJJmzQ/LMnr2f59kK8rPNtc2uYBQdgWhGccpUklz5UHoc0S5grvEmc3uLCrzDOlUlHDOQn20eamBIszNgfrYT4QE2RwfScy9kInOOrejlA5+YabNSIpZoAwCZeHyMTOb3nDNvCxvKjKa5Iwx9gnK6k8HK7VYRCuu2WhoZbk/z9JTL5EQ4PQVTCLMTX1tXJHVoAoBhNWhKztpmFZxu9JZs8D8SVl/CmGihaDyzNXKi308oOnbvDi2d503W5s91ozMQhZHoFDcWegY6RRHI8pipfoFQDK0d9lkaMTNwzD/n+hzoCPTd96XnmAfKlVcE4M/zQFEE8nxOfK/6sxnXhxo8x6Mbn1WXMKYy5nnTiJQvnIPf7nof0fjYFbwkw5ZDU5DldUBPVEFaKQ1xqBShIC/HPC7M43akys0Bs5zZXAs1AAFoiQN2ey5Dpb8AqKpAQbbZfU9K2Ws9e7hrxUBmsL9gTh52v1mLp14+Ou1nzqPV39aBXbOfemLwtrmyEj+WzZttZkSM4IdS9LE/5nap8LkdCEd1tHdFbP3D7nGpmJHnwQVz8vB+bTvicQNnFbuhx81ue8FwHFkeBXFdIq4bcKgKDEOHU1MT/bXNI8L0xJmYBdmuMa0cTA/2062p03hzqErqAO6ej08FDN42d6imGW8caTBPOhlGkE0G7WRgVoR5yrjLoSIQjqNlivSi6ArF0NgWQktnOLXhm/729rafvY5AOAYhzF/oSFSHhLn2HIrE4XWb/UQ6uiOp02XGq3JwrPK0yZTsIZPo6ZVqQ1CUP7aZIKywpBFJ/sI7NLXPY9HSpV9PBm1NFcjPdiHHZ1b2NXeEbVO+PhRSAt2hWKo6tSczQ8b81Y7rMmPdu6UjhGA4hkhMh0NT8N8Xzxk0+2c0mjvCw0s1pAFdvfRc+Nxa6og3RZgdLJNtcMfCoZpm/HbX+/iwrhMt7SF8WNeJ3+56nxWWNLhkkYCaWK/tS2qWnfaYy6Eg2+dENGagM3Fo7VQVjxv9BkFVFUAcfaY5qorZKbDkjNzUbGo8T0OfLmdBTpRkD5nxzL557rUaBEJRCKFAUwV0AwiEonjutRpWWNLA/LlufHKqG4Fw5mxZJP4nfWlEAPB6NHjdDgQTSyN2Xs8eKgng5KluuBwaDtU0Y3naqSVn+H1oaA2ioztqFrQn3mI7HCqK8j0IhuO9sknGy3Q+C3K8jHf2TWNrEIBINDETUISELkXi8fHF4G1D6Z3j3E4V3WlVjckWponzZM3HFIEcrxOqCgTDOk61hSblvieVEHA6lF4n0CQDpkNToBvSfJcCs3KyMxBFNGbge7/cPyGZH8zTtiOR6t+eeiQ5cxpnDN420zMjIXmSerKoJH31w6EpyPU5oRsS3aGYrTr6jSWB01kikZiecQJNMjA+91oN6poDUBXz0AOzJ3UUuT7XhGZ+ME/bXtIP1lBgVnNKCcwqYHk89ZDaoFQVBMLxVPBOLyrxuFTMzPfA49LQ2hVBW1dkSgXuARr19WIeRyYGPIGmrMSP+9cvxK1Xl2Fmnhsd3TF0dEdTh8r2bMlLlJS5KSrHZVO0P5x520xjWxBSAk3BWEbAVgSQ5XHC7TJT/U61h6bserYiBPRBBjcjz42uxGEIWloGx2An0ETiBgrz3ObSkgRaO8MoyHHD49IyNj3H4tADsr/0TVGWx1MvUkq8X9uOV94+gbauzEpKTRXI8bmgKEB3MIbOoPWPFhutoWTGeFwa4rpENB6Bx6lm5Gb3dQINkJlnrWkK4nEdug6cagvBlehRXpTvGXUxDQP/1MLyeOolEtPx5ruNeOXtE/jkVCD1uCKA3Cxn6rzArqC9TqgZKx6nglDUHHdqJUWY69vBcBxF+R584cJivF/bnhEoF5QW9flLlt6b2e1Q0JGoqgTMcyij8Qi+cGHxqIppWEVJY4X
B24JaO8PYe/Ak/t/BkxkpgGcVZeHzF56BglwX/vavOvy7psXShw2PN90w89WlNJdDFCHgdiqYkefJmM0ONTc7Pc86EjOgpOXHa5oCj1NNvRCMtAE/qyhprDB4W0hNXQf++tYJvPNBUyprRFUEykoKsWz+mTjvU3lwamaweus/DdM6cAOJlEjDQF6WC4oioKoKnJoy4tlsep51LK5DSRQ+JTNVko2oRlNMM5knrwwHl3asj8HbIuqaA3jw9++YDeMBZHscWPTpInxh3hkoyvdAVcxNt+Q+3ZvvnZqsW7UMs5+LQGtnBEWF3lHPZtPzrFvaQxBCIDfL2asR1WiKaexQRcmlHXtg8LYIn1tDsd8Lh6bg4k8XYdGnZyHL44DoJ9nfmKqpJEOkqeb3JVlUM1Y9QZKbT+kBrGcjqtEU09ihipJLO/bA4G0RuVkubFq/ELohoShmyO751jXZ1tRqb7Eng5QSBsx3Ii6HGQTHcjY7WIAeaTGNHaoo7bK0M90xeFuMmjgnsedb14bWII5+0o5cnwvZXgfcIRXhtGwIKxru0WPDoRuAgAQEkO11IRgyD1gYy9nseFU7Wr2K0g5LO8QKS8tKf+sqhEAoqkNAoDsUQ1NbCPE+msxbzVgHbiF6V1f6c91mEY4Q0ATGrV3rdMKDjO2BM2+L6vnWNR43AEjE4hJ9n8I49fW1zJ8sXQeALK8T909QB8CpzA5LO8TgbVk937pqmoKIxZdJJkOyfN3tVG25JmvVlDyrL+0Ql00sq+dbV49THfyTphlVFQCE2brVhmuyyX2N9kA0IyVvIk5hIfvjzNuier51nVXgRXv31O5bYlY2mu8uNFX0ebpNUiLtHQISsSFuUE7WWYP9YUoejQaDt4X1fOt608OvIRKz/kblcCjCXLcuLvAgHDMQ7dAhhNk5UED2uekpYBYxhWMGYnEDboeGJXNnYfebtXjq5aMZyw/JgH2yOYBwREeW14GCHJclCk+YkkejweBtI8meG3bkc2vQVIGOQOZ2q6qYB/tW/J//BQD43i/3o7kjjOgAjbaEMA8VzvG5oHkFlsydhTeONPSqCDxe35l6PBozYEiJrmAUbpdmiVkuU/JoNBi8LaqvjazZfh8CJzugD7CcYCWKIqApArohUZjrhhACToeKzmAM8UQjqW9+eW4qeB6qaUY0pqeOcuvJ5VTh0hRE4uaMOxLVsa7iv/pdfnj5wCfIyXLC5VAR1xONpiDQ3hXBzHzPpM9y7VBtSdbF4G1B/fWWWDJ3Fj6obZ/s2+tFVUWfLygi0brWpZ2ugPS6HanjyDRFZCx1dIdi8HoccDlPB3hVEYjrBs6cmQWRluQtpUQwHEdZiR9PvXy0z+WHcDQOv2bOYjVNgZ4I4Mn2uZM9y2VKHo0Gg7cF9TeTfL+2fdwqFkejv3cCui5x7WXnAUCvGWYwHAekRNyQqReoptYgCnPdqQAPmEG6vjkwYPl7+vJDMBxDZzB5XqeZiZKb5UKO14HWrgh0Q8KhKZYpPGFKHo0Ug7cFTaWNrPTAlD7D1AQQl8h4gdJUBR3d0VTgBswgXVTgRSSm97u8kFx+6IjE0RWKwayal/C4tdTJQjk+J7LjBgLhONxODXm+yc82IRoNBm8LGmgjq6UjPGGzby2xHDLQ8wnRd+VjTz1nmN/75f5eL1A5PgdaOyOIxPSMIL1mmTl7H6hJFAA8/uK7gJRwaCpyfE54XBocgSgiUR3BcByzCrwoXzgHyxedM6HHVRGNBwZvCxpoI6u5IzwhM3BVMXuTDBa4B/oAZYBT3vt6gdI0FbMLvcjyOgcM0n0pK/HD69bgz3NnrI1nex1QFYEtNy0eYCR9s2r1IxHA4G1JA21kfXUF8PM/HcJ4H1mpG4DXpSBs6DDM5n1QFDPARqJmLrYAMNBtzPb7+r3W3wvUmh4NpQ7VNGPLH94ZUgAdy9Q7HkhAVsfgbVEDbWTJcW22as66dQMIRXT489xo7QwDEJiR50Y4qiMW15HrcyE
YNjcG+zuNbcH5M/p9jqFkWgw3gI5l6h2rH8nqGLxtZvebteO+5p0MxhJAJKqjuNAHSImYAURjBrwuDaFIHJGY0atFa5LXZR7WO9Dhv4NlWgw3gI5l6t1U2jSmqYnB22aaO8JwqAoixvh1GEzfgDQrGZ24eum5WL7oHFz7w10IhGMQQhlwszIY0fHBiXZs+cM7ww6gybXmoyfaE5uPp1MHBwugY5V6x+pHsjp2FbQZf64bHtfoOwym/8MPsK8IKYFPTgWw9dlDqPzun9EVikFKczNSHWhHEgAkht0pL73TnkNTENcNtHZFEAybZfUTFUB5IAFZHYO3zZQvnAOHpqQO4B2JHK8Dt1aVIcfrgFNT+l36ADJX1g3DDOaGNINoz65/ApkvBEIxD0pQVQW736wd0r2lL5XkZrlSGS2dgdiEBtCyEj+uvew85PmcPJ2HLInLJjaTDB7P7TuGupYgDGluMAoxcAvVJE0VmO33oazEj6//d2nq64xUcutUEYDTcfpcTVURcCROdB/OWnH6WrPHpaEgx42O7ghicX3CC2tY/UhWxpm3DZWV+HH/9YtQseRsKEJAN/ovUe8prktcMCcv9XWyvE7MLPDC5VShDvOnQQhAVZXE2ZICedkuc6atCAhhVjUCw1vq8Oe6e3UUlNLsPkhEp3HmbVOHapqx952TUBTA0AdPHBQwZ91ZXmcqC+RQTTNqTnbAkBKKEJA4nSY4FFKaXzM/241I1JwZB0IxGIZEltcBt1Md9lJHerqfrhupNMX8bOeUzLVmIRCNFIO3TT237xgC4Xif69VmQY3ZijW5fFFU4AVgNnpq7ginNgaFEImlDzN1RFEVSGmkCnMGelFwO9VU35HZfje+lzj8dzQBKT3dr+ZkR+r+2ruj0FQFHpc2ZXKtWQhEo8Hg3Y+PPvoIGzduRHt7O/Ly8rB582acffbZk31bKY1t4dSJMz1DrASgpypnZGr5Aji9hJHcGMzNcqK1KwIBCSEEDGlG7TyfM5XL3Z9sr6PPmfVo14qTn3/bz15HIBwDhAJFmEs+nYEI9KG+NbA4FgLRaHAhsR/33nsvqqur8de//hXV1dW45557JvuWepBDagjldKhQFNEr3a25IwynpsDrdqAg2wU1seBt6BI5Xidys1yJmXvfX7cwxwUpMa5ZGOYGrEj1SDH/f2gbs3aQ/DdIx0IgGirOvPvQ0tKCd999F0888QQA4IorrsCmTZvQ2tqKgoKCSb47U1GBF/UtARiy/zw/IQCHpkITQDAcz1jC8OfWpopQ0g9IaGkPpWbqmqogrksowlxG0VQBRQjMzHPj/usXjfsYVVUAccCQaQ0BRPLUePtjIRCNBoN3H+rr61FUVARVNX+pVFXFzJkzUV9fP+TgXViYlfH3GTOyx/Qe11f+b/zsmYMIhuOI9yi2FMJc83YmqhPzczz4vzctwdvvNeJ/XjuGP7x6DF6XhnAkbp7SHjfQ1hVBTJdwagoC4Rjys93Iz3HjVFsIQhFwqwr8eW7E4xLrV5WN+Xj6cvbsXNSd6kIwHEcsbsChmWvDs2dkj/r5J+L+B7N6xQX49f8cgm4YifNJzZ3n1SsuGPL9WWEcY4HjGD4G73HS0tINI7HuPGNG9pj3jz7L78W68vOx+81a1DUHEIrqiOsGNEWkGldlex1QhED9qW68+o+PUptjbod5RJghgVAohrbEZmBhjgtxXaKtKwJdN9fKs70OdAdjcGgK8nM8WD5vNs7yeyekH/byebOxfc9R5Ga5MhpNLZ83e1TPPx7/HiNxlt+LNcvP7bW5O9Tvr1XGMVocx8iCPoN3H4qLi9HY2Ahd16GqKnRdR1NTE4qLiyf71jKkbwweqmnG4y++i0gsDoemIMfngselIRLTMzYo0zfHAKCzO2rmeTsyS+6TBxgU5Xuwrvx8lJX4J/yXbDqc8chCIBopBu8+FBYWorS0FDt27EBlZSV27NiB0tJSy6x396WsxI8bK/4rNbt29jincSiH9Ca
N5gCDscbgRtQ3Bu9+3Hfffdi4cSN+8YtfICcnB5s3b57sWxrUQDPV9A3KpGjcgNupDXi4LxFZE4N3P0pKSvDss89O9m0MW38z1f4OKljxuTPxxpGGMTnAgIgmDoO3jYxV5WLPzz+7OGdKrysTTUUM3jYxFqXU/c3Kua5MZD+ssLSJ9GwRIYbfJ5uIphYGb5tgKTURpWPwtom++lwzK4Ro+mLwtgmeqUhE6bhhaRPTodqQiIaOwdtGmBVCRElcNiEisiEGbyIiG2LwJiKyIQZvIiIbYvAmIrIhBm8iIhtiqiDRFDCajpNkTwzeRDY3Fh0nyX64bEJkc+w4OT0xeBPZHDtOTk8M3kQ2x46T0xODN5HNsePk9MQNSyKbY8fJ6YnBewpgmhix4+T0w+Btc0wTI5qeuOZtc0wTI5qeGLxtjmliRNMTg7fNMU2MaHpi8LY5pokRTU/csLQ5pokRTU8M3lMA08SIph8umxAR2RCDNxGRDTF4ExHZEIM3EZENMXgTEdkQgzcRkQ0xeBMR2RCDNxGRDTF4ExHZEIM3EZENMXgTEdkQgzcRkQ0xeBMR2RCDNxGRDTF4ExHZEIM3EZENMXgTEdkQT9IhSzlU08wj3YiGgMGbLONQTTO27zkKVVXgdWtoD0Sxfc9RAGAAJ+qByyZkGbvfrIWqKnA5VAgh4HKoUFUFu9+snexbI7IcBm+yjOaOMJxa5o+kU1PQ3BGepDsisi4Gb7IMf64b0biR8Vg0bsCf656kOyKyLgZvsozyhXOg6wYiMR1SSkRiOnTdQPnCOZN9a0SWww1LsozkpiSzTYgGx+BNllJW4mewJhoCLpsQEdkQgzcRkQ0xeBMR2RCDNxGRDTF4ExHZkCWC949+9COUl5ejoqICa9asweHDh1PXQqEQbrvtNlx22WUoLy/Hvn37Ju0aEZFVWCJV8POf/zy+//3vw+FwYN++fbj99tvxyiuvAAC2bdsGn8+HPXv24Pjx47j22mvx8ssvw+fzTfg1IiKrsMTMe+nSpXA4HACAz3zmM2hoaIBhmGXSL730EtasWQMAOPvsszF37lz87W9/m5RrRERWYYmZd7rt27fj0ksvhaKYryt1dXU444wzUteLi4vR0NAwKdeGo7AwK+PvM2ZkD/trWBHHYS0ch7VM5DgmJHhfddVVqKur6/Pa/v37oaoqAGDnzp34y1/+gu3bt0/EbRER2daEBO/nn39+0I/Zs2cPHnnkETz55JPw+0+XR8+ePRsnT55EQUEBAKC+vh4LFy6clGtERFZhiTXvffv24cEHH8S2bdtw5plnZlwrLy/HM888AwA4fvw4Dh8+jEsuuWRSrhERWYWQUsrJvolFixbB4XCkZrsA8OSTTyI/Px/BYBAbN27Ee++9B0VR8N09B8hYAAAIm0lEQVTvfhdf/OIXAWDCrxERWYUlgjcREQ2PJZZNiIhoeBi8iYhsiMGbiMiGGLyJiGyIwZuIyIYYvMfRRx99hNWrV+Pyyy/H6tWrcfz48Ql9/s2bN2PZsmU4//zzcfTo0SHd10RfG4q2tjbccMMNuPzyy3HllVfiW9/6FlpbW205lptvvhkVFRVYtWoVqqur8d5779lyHADw85//PONny45jWLZsGcrLy1FZWYnKykq8/vrr9hmLpHGzdu1a+cILL0gppXzhhRfk2rVrJ/T5Dxw4IOvq6uTSpUvlBx98MKT7muhrQ9HW1ib/8Y9/pP7+0EMPybvuusuWY+ns7Ez9ec+ePXLVqlW2HMeRI0fk+vXr5aWXXpr62bLbGKSUvX43Jut+RzIWBu9x0tzcLOfPny/j8biUUsp4PC7nz58vW1paJvxe0n9AB7qvib42Urt375br1q2z/Vief/55edVVV9luHJFIRF5zzTWytrY29bNltzEk9RW87TIWy3UVnCrq6+tRVFSUarqlqipmzpyJ+vr6jEpSK92XlHJCr43k+2AYBv74xz9i2bJlth3L3XffjTfeeANSSvzmN7+x3Th
++tOfoqKiAp/61KdSj9ltDOm+853vQEqJ+fPn49vf/rZtxsI1b7KVTZs2wev14qtf/epk38qIPfDAA3jttddw++23Y8uWLZN9O8Ny8OBBHD58GNXV1ZN9K2Ni+/btePHFF/GnP/0JUkrcf//9k31LQ8aZ9zgpLi5GY2MjdF2HqqrQdR1NTU0oLi627H1JKSf02nBt3rwZH3/8MX71q19BURRbjwUAVq1ahXvuuQezZs2yzTgOHDiADz/8EMuXLwcANDQ0YP369bjrrrtsM4Z0yY93Op2orq7GTTfdZJuxcOY9TgoLC1FaWoodO3YAAHbs2IHS0tJJXTIZ7L4m+tpwPPLIIzhy5Agee+wxOJ1OW44lEAigvr4+9fe9e/ciNzfXVuO48cYb8fe//x179+7F3r17MWvWLGzbtg0rV660zRiSgsEgurq6AABSSuzatQulpaX2+fcYdEWfRuzYsWPy6quvlitWrJBXX321rKmpmdDn37Rpk7zkkktkaWmpXLx4sVy5cuWg9zXR14bi6NGj8rzzzpMrVqyQFRUVsqKiQt588822G8upU6dkVVWVvOKKK2RFRYVcu3atPHLkiO3GkS59w89uY6itrZWVlZXyiiuukCtXrpS33HKLbGxstM1Y2FWQiMiGuGxCRGRDDN5ERDbE4E1EZEMM3kRENsTgTURkQwzeRBayZMkSvP322xP+uWQ/DN40bcybNy/13wUXXICysrLU31988cURf91rrrkGf/7zn1N/j0QiOP/889HQ0DAWt03UJ5bH07Rx8ODB1J+XLVuGH//4x1i8ePEk3hHRyHHmTZSg6zoee+wxLF++HAsXLsQdd9yBzs5OAGYp9e23346LLroICxYsQFVVFTo6OvDQQw/h8OHD+MEPfoB58+bhoYceGvA5ampqsHbtWlx00UVYtGgR7rzzTnR3d2d8zMGDB1FeXo6LLroIP/zhDxGNRlPX9uzZgyuvvBILFixAdXU1jh071ufz/POf/8SqVavw2c9+FkuWLMHDDz88yu8OWc6w6kmJpoilS5fKN954I+OxX//61/IrX/mKbGhokOFwWN55551y48aNUkopn3zySXnLLbfIUCgkY7GY/Pe//y0DgYCUUsqqqqpUI30ppQyHw/K8886T9fX1vZ732LFjcv/+/TISicimpiZZVVUlf/KTn6SuL168WFZWVsqGhgbZ0tIiv/zlL8vHHntMSinlwYMH5ZIlS+Thw4dlPB6XTz/9tFyxYoWMxWKpzz1w4ICUUsrKykq5a9cuKaWUXV1d8l//+tdYfevIIjjzJkp4+umncccdd6CoqAgulwvf/OY3sWvXLkgpoWkaWltbUVtbC03TUFZWBq/XO+znKCkpwcUXXwyn04kZM2Zg3bp1OHDgQMbHXHfddSgqKkJBQQG+8Y1vYOfOnQCAZ555Btdeey3mzp0LVVWxevVqRKNR/Oc//+n1PJqm4fjx42hra0NWVhYuvPDCkX1TyLK45k0Es6tcQ0MDbrzxRgghUo8bhoG2tjZUVVWhubkZt956K4LBIFatWoUNGzakGugPVWNjIx544AEcPHgQgUAAUkrMmDEj42PSW4HOnj0bTU1NAICTJ0/ipZdewrZt21LXY7EYGhsbez3P5s2b8eijj6K8vBxz5szBrbfeiksuuWRY90rWxuBNBEAIgaKiIjz66KOYO3dunx+zYcMGbNiwASdOnMD69etx7rnnoqKiIiPYD2bLli3wer3YsWMHcnNzsXPnTmzdujXjY9LbxtbX12PmzJkAzKB+6aWX4utf//qgz1NSUoKtW7dC13Xs3LkTt9xyC956661UO12yPy6bECWsWbMGDz/8cCp4trS0YO/evQCA/fv349ixYzAMAz6fD6qqpmbdhYWFOHHiRK+vF41GEYlEUv8ZhoFAIACv14usrCzU1dXhiSee6PV5v//979HU1ITW1lY8/vjj+NKXvgTATEl86qmncPjwYUgpEQgE8OqrryIUCvX6Gi+
88ALa2tqgqiqys7MhhICi8Nd9KuHMmyjh+uuvhxAC69atw6lTp+D3+1FZWYlly5ahsbER9913H5qamuDz+XDllVemgurXvvY13H333fjd736HqqoqbNiwAQBw2WWXZXz9LVu2YMOGDdi4cSMWLFiAc845B+Xl5Xj22WczPm7lypW47rrr0NLSghUrVuCGG24AAMyfPx9333037r33Xnz88cfweDz43Oc+hyVLlvQay759+/Dggw8iGo3izDPPxNatW6Fp/HWfStjPm4jIhvg+iojIhhi8iYhsiMGbiMiGGLyJiGyIwZuIyIYYvImIbIjBm4jIhhi8iYhsiMGbiMiG/j8dHxZ+s1n0dgAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "data = pd.DataFrame({'TestLabels': testLabels, 'Predictions': yPreds}, columns=['TestLabels', 'Predictions'])\n", + "sns.set_theme(color_codes=True)\n", + "sns.lmplot(x='TestLabels', y='Predictions', data=data)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a10af570", + "metadata": {}, + "source": [ + "## Evaluation Metrics for the Regression Model\n", + "\n", + "In the previous cell, we visualized the model's performance by plotting the best-fit line. Now we will use several evaluation metrics to quantify how well the model performs.\n", + "\n", + "* Mean Absolute Error (MAE) is the mean of the absolute differences between the actual and predicted values, ignoring the direction of the error.\n", + "$$ MAE = \\frac{\\sum_{i=1}^n\\lvert y_{i} - \\hat{y_{i}}\\rvert} {n} $$\n", + "* Mean Squared Error (MSE) is the mean of the squared differences between the predicted and actual target values; a lower value is better.\n", + "$$ MSE = \\frac {1}{n} \\sum_{i=1}^n (y_{i} - \\hat{y_{i}})^2 $$\n", + "* Root Mean Squared Error (RMSE) is the square root of the MSE; it indicates the spread of the residual errors and is expressed in the same units as the target. 
It is always non-negative, and a lower value indicates better performance.\n", + "$$ RMSE = \\sqrt{\\frac {1}{n} \\sum_{i=1}^n (y_{i} - \\hat{y_{i}})^2} $$" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "749f2a47", + "metadata": {}, + "outputs": [], + "source": [ + "def MAE(y_true, y_pred):\n", + " return np.mean(np.abs(y_pred - y_true))\n", + "\n", + "def MSE(y_true, y_pred):\n", + " return np.mean(np.power(y_pred - y_true, 2))\n", + "\n", + "def RMSE(y_true, y_pred):\n", + " return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "161e7ee1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "---- Evaluation Metrics ----\n", + "Mean Absolute Error: 50639.27\n", + "Mean Squared Error: 4878809544.62\n", + "Root Mean Squared Error: 69848.48\n" + ] + } + ], + "source": [ + "print(\"---- Evaluation Metrics ----\")\n", + "print(f\"Mean Absolute Error: {MAE(testLabels, yPreds):.2f}\")\n", + "print(f\"Mean Squared Error: {MSE(testLabels, yPreds):.2f}\")\n", + "print(f\"Root Mean Squared Error: {RMSE(testLabels, yPreds):.2f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "98bd24d3", + "metadata": {}, + "source": [ + "The MAE is about 50639, which, compared with the median house values, suggests that the model is not a good fit. \n", + "\n", + "Thus, we can conclude that a simple Linear Regression model is not able to capture all the patterns in the data.\n", + "So, it may be time to try other algorithms. \n", + "
NOTE:
In a typical ML workflow, you rarely know in advance which model will perform best, so you usually try several different algorithms to see which one fits the data best." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/california_housing_price_prediction_with_linear_regression/california.png b/california_housing_price_prediction_with_linear_regression/california.png new file mode 100644 index 00000000..0103e3ba Binary files /dev/null and b/california_housing_price_prediction_with_linear_regression/california.png differ diff --git a/california_housing_price_prediction_with_linear_regression/california_housing_price_prediction_with_lr_cpp.ipynb b/california_housing_price_prediction_with_linear_regression/california_housing_price_prediction_with_lr_cpp.ipynb new file mode 100644 index 00000000..265cfa9d --- /dev/null +++ b/california_housing_price_prediction_with_linear_regression/california_housing_price_prediction_with_lr_cpp.ipynb @@ -0,0 +1,813 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "048dbd39", + "metadata": {}, + "source": [ + "### Predicting California House Prices with Linear Regression\n", + "\n", + "### Objective\n", + "* To predict California housing prices using the simplest Linear Regression model and see how it performs.\n", + "* To understand the modeling workflow using mlpack.\n", + "\n", + "### About the Data\n", + " This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is now closed). 
The dataset may also be downloaded from StatLib mirrors.\n", + " \n", + " This dataset is also used in the book [Hands-On Machine Learning](https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/) (a very good book and highly recommended).\n", + " \n", + " The dataset used here is almost identical to the original, with two differences:\n", + "207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data.\n", + "Note that the block groups are called \"districts\" in the Jupyter notebooks, simply because in some contexts the name \"block group\" was confusing.\n", + "\n", + "Let's look at the features of the dataset:\n", + "* Longitude : Longitude coordinate of the houses.\n", + "* Latitude : Latitude coordinate of the houses.\n", + "* Housing Median Age : Median age of the houses in a location.\n", + "* Total Rooms : Total number of rooms in a location.\n", + "* Total Bedrooms : Total number of bedrooms in a location.\n", + "* Population : Population in that location.\n", + "* Households : Number of households in a location.\n", + "* Median Income : Median income of households in a location.\n", + "* Median House Value : Median house value in a location (the value we want to predict).\n", + "* Ocean Proximity : Closeness to shore. \n", + "\n", + "### Approach\n", + " Here, we will try to recreate the workflow from the book mentioned above. \n", + " * Look at the Big Picture.\n", + " * Get the Data.\n", + " * Discover and Visualize the data to gain insights.\n", + " * Pre-Process the data for the ML Algorithm.\n", + " * Create new features. 
\n", + " * Splitting the data.\n", + " * Training the ML model using mlpack.\n", + " * Residuals, Errors and Conclusion.\n" + ] + }, + { + "cell_type": "markdown", + "id": "1929f17d", + "metadata": {}, + "source": [ + "### Big Picture\n", + "\n", + "Suppose you work at a Real Estate Agency as an analyst or Data Scientist and your Boss wants you to predict the housing prices in a certain location. You are provided with a dataset. So, what should you do first?\n", + "\n", + "If you are about to jump right into analysing the data and trying ML algorithms, stop: that is the wrong first step. It's a big \"NO\". \n", + "
The first thing to do is ask questions.
\n", + " \n", + " Questions like: What will the predictions be used for? Will they be fed into some other system? And many more, just to have concrete goals.\n", + " \n", + " Suppose your boss tells you that the predictions will be used by another team to work on investment strategies.\n", + " \n", + "So, let's get started." + ] + }, + { + "cell_type": "markdown", + "id": "2a8513db", + "metadata": {}, + "source": [ + "

Importing Header Files

" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4d4ec4de", + "metadata": {}, + "outputs": [], + "source": [ + "#include \n", + "#include \n", + "#include \n", + "#include " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "8a0ace0c", + "metadata": {}, + "outputs": [], + "source": [ + "#define WITHOUT_NUMPY 1\n", + "#include \"matplotlibcpp.h\"\n", + "#include \"xwidgets/ximage.hpp\"\n", + "\n", + "/* CPython Api Scripts for Plots */\n", + "\n", + "#include \"../utils/histogram.hpp\"\n", + "#include \"../utils/impute.hpp\"\n", + "#include \"../utils/pandasscatter.hpp\"\n", + "#include \"../utils/heatmap.hpp\"\n", + "#include \"../utils/plot.hpp\"\n", + "\n", + "namespace plt = matplotlibcpp;" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "79e6d53d", + "metadata": {}, + "outputs": [], + "source": [ + "using namespace mlpack;\n", + "using namespace mlpack::data;" + ] + }, + { + "cell_type": "markdown", + "id": "2d5992b1", + "metadata": {}, + "source": [ + "

Let's download the dataset.

" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "401c6664", + "metadata": {}, + "outputs": [], + "source": [ + "!wget -q https://datasets.mlpack.org/examples/housing.csv" + ] + }, + { + "cell_type": "markdown", + "id": "75b146bd", + "metadata": {}, + "source": [ + "### Loading the Data\n", + "Now, we need to load the dataset into an Armadillo matrix for further operations. Besides the target (median house value), our dataset has 9 features: 8 numerical and 1 categorical (ocean proximity). We need to map the categorical feature to numbers, because Armadillo matrices hold numeric values only." + ] + }, + { + "cell_type": "markdown", + "id": "08e417d5", + "metadata": {}, + "source": [ + "But there's one thing we need to do before loading the dataset into an Armadillo matrix: deal with any missing values. Since 207 values were removed from the \"total_bedrooms\" column of the original dataset, we need to fill them in using either the \"mean\" or the \"median\" of that feature (for numerical features), or the \"mode\" (for categorical features)." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "7e4a6750", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "// The imputing function is called as follows:\n", + "// Impute(inputFile, outputFile, kind);\n", + "// Here, inputFile is our raw file, outputFile is the new file with the imputed values, \n", + "// and kind refers to the imputation method.\n", + "\n", + "Impute(\"housing.csv\", \"housing_imputed.csv\", \"median\");" + ] + }, + { + "cell_type": "markdown", + "id": "ddba48dd", + "metadata": {}, + "source": [ + "Let's drop the header row using sed, a Unix utility used to parse and transform text."
+ ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "4d95bf63", + "metadata": {}, + "outputs": [], + "source": [ + "!sed 1d housing_imputed.csv > housing_without_header.csv\n", + "\n", + "// Here, we used sed to delete the first row (the \"1d\" command) and wrote the result to a new file named\n", + "// housing_without_header.csv" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "d2e2c3f4", + "metadata": {}, + "outputs": [], + "source": [ + "arma::mat dataset;\n", + "data::DatasetInfo info;\n", + "info.Type(9) = mlpack::data::Datatype::categorical;\n", + "data::Load(\"housing_without_header.csv\", dataset, info);" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "choice-victor", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " -1.2223e+02 -1.2222e+02 -1.2224e+02 -1.2225e+02 -1.2225e+02 -1.2225e+02\n", + " 3.7880e+01 3.7860e+01 3.7850e+01 3.7850e+01 3.7850e+01 3.7850e+01\n", + " 4.1000e+01 2.1000e+01 5.2000e+01 5.2000e+01 5.2000e+01 5.2000e+01\n", + " 8.8000e+02 7.0990e+03 1.4670e+03 1.2740e+03 1.6270e+03 9.1900e+02\n", + " 1.2900e+02 1.1060e+03 1.9000e+02 2.3500e+02 2.8000e+02 2.1300e+02\n", + " 3.2200e+02 2.4010e+03 4.9600e+02 5.5800e+02 5.6500e+02 4.1300e+02\n", + " 1.2600e+02 1.1380e+03 1.7700e+02 2.1900e+02 2.5900e+02 1.9300e+02\n", + " 8.3252e+00 8.3014e+00 7.2574e+00 5.6431e+00 3.8462e+00 4.0368e+00\n", + " 4.5260e+05 3.5850e+05 3.5210e+05 3.4130e+05 3.4220e+05 2.6970e+05\n", + " 0 0 0 0 0 0\n", + "\n" + ] + } + ], + "source": [ + "// Print the first 6 columns (data points) of the input data; mlpack stores each data point as a column.\n", + "std::cout << dataset.submat(0, 0, dataset.n_rows - 1 , 5)<< std::endl;" + ] + }, + { + "cell_type": "markdown", + "id": "a43f7359", + "metadata": {}, + "source": [ + "Did you notice something? Yes, the last row looks like it is entirely filled with '0'. 
Let's check our dataset to see what it corresponds to.\n", + "It corresponds to Ocean Proximity, which is a categorical feature, but here it is zero.\n", + "Why? data::Load() stores each distinct category as an integer code (0, 1, 2, ...) and records the mapping in info. The first six districts all share the same category, so they all get the code 0. This is exactly why we marked Ocean Proximity as categorical before loading.\n", + "These codes are arbitrary labels rather than meaningful quantities, so let's deal with this." + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "d6969aa9", + "metadata": {}, + "outputs": [], + "source": [ + "#include\n", + "arma::mat encoded_dataset; \n", + "data::OneHotEncoding(dataset, encoded_dataset, info);" + ] + }, + { + "cell_type": "markdown", + "id": "0f534207", + "metadata": {}, + "source": [ + "Here, we use mlpack's built-in one-hot encoding to deal with the categorical values." + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "8bad850e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "14" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "encoded_dataset.n_rows\n", + "// The above code prints the number of rows (features + label) in the current dataset." + ] + }, + { + "cell_type": "markdown", + "id": "89a8df9c", + "metadata": {}, + "source": [ + "Notice that the number of rows changed from 10 to 14: one-hot encoding replaced the single Ocean Proximity row with one row per category (5 categories)." + ] + }, + { + "cell_type": "markdown", + "id": "f078a9e5", + "metadata": {}, + "source": [ + "

Visualization

" + ] + }, + { + "cell_type": "markdown", + "id": "b5ba850f", + "metadata": {}, + "source": [ + "Let's plot a histogram. " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "a7b59588", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5c0dd57a133c4ecca91802380f610915", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "A Jupyter widget with unique id: 5c0dd57a133c4ecca91802380f610915" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "// Hist(inputFile, bins, width, height, outputFile);\n", + "Hist(\"housing.csv\", 50, 20, 15, \"histogram.png\");\n", + "auto im = xw::image_from_file(\"histogram.png\").finalize();\n", + "im" + ] + }, + { + "cell_type": "markdown", + "id": "ddcc2d3e", + "metadata": {}, + "source": [ + "Let's plot a scatter plot with longitude and latitude as x and y coordinates respectively." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "54c2a0ca", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f938371980f045b4b47b190bdc1dd973", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "A Jupyter widget with unique id: f938371980f045b4b47b190bdc1dd973" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "// PandasScatter(inputFile, x, y, outputFile);\n", + "PandasScatter(\"housing.csv\", \"longitude\", \"latitude\", \"output.png\");\n", + "auto im = xw::image_from_file(\"output.png\").finalize();\n", + "im" + ] + }, + { + "cell_type": "markdown", + "id": "5781bc1e", + "metadata": {}, + "source": [ + "Let's add some colour to the scatter plot." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3fef937e", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8177cbf69b104cfeb24cbea0475693ae", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "A Jupyter widget with unique id: 8177cbf69b104cfeb24cbea0475693ae" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "// PandasScatterColor(inputFile, x, y, label, c, outputFile);\n", + "PandasScatterColor(\"housing.csv\",\"longitude\",\"latitude\",\"Population\",\"median_house_value\",\"output1.png\");\n", + "auto im = xw::image_from_file(\"output1.png\").finalize();\n", + "im" + ] + }, + { + "cell_type": "markdown", + "id": "431f719d", + "metadata": {}, + "source": [ + "Let's take it a step further and plot this on top of California map." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "5d22bf50", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "10408985977f4b25b0332df8a43f7081", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "A Jupyter widget with unique id: 10408985977f4b25b0332df8a43f7081" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "//PandasScatterMap(inputFile, imgFile, x, y, label, c, outputFile);\n", + "PandasScatterMap(\"housing.csv\",\"california.png\",\"longitude\",\"latitude\",\"Population\",\"median_house_value\",\"output2.png\");\n", + "auto im = xw::image_from_file(\"output2.png\").finalize();\n", + "im" + ] + }, + { + "cell_type": "markdown", + "id": "36f8cbf3", + "metadata": {}, + "source": [ + "

Correlation

" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "9c60a67f", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "98d1a64dbd0947d78f0f8e276debab93", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "A Jupyter widget with unique id: 98d1a64dbd0947d78f0f8e276debab93" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "// HeatMap(inputFile, outputFile);\n", + "HeatMap(\"housing.csv\", \"heatmap.png\");\n", + "auto im = xw::image_from_file(\"heatmap.png\").finalize();\n", + "im" + ] + }, + { + "cell_type": "markdown", + "id": "7d6af59e", + "metadata": {}, + "source": [ + "

Train-Test Split

\n", + "The dataset needs to be split into training and test sets." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "chubby-water", + "metadata": {}, + "outputs": [], + "source": [ + "// Labels are median_house_value which is row 8\n", + "arma::rowvec labels =\n", + " arma::conv_to::from(encoded_dataset.row(8));\n", + "encoded_dataset.shed_row(8);" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "vital-lebanon", + "metadata": {}, + "outputs": [], + "source": [ + "arma::mat trainSet, testSet;\n", + "arma::rowvec trainLabels, testLabels;" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "ruled-refrigerator", + "metadata": {}, + "outputs": [], + "source": [ + "// Split dataset randomly into training set and test set.\n", + "data::Split(encoded_dataset, labels, trainSet, testSet, trainLabels, testLabels,\n", + " 0.2 /* Fraction of the dataset to use for the test set. */);" + ] + }, + { + "cell_type": "markdown", + "id": "57755813", + "metadata": {}, + "source": [ + "### Training the linear model\n", + "\n", + "Regression analysis is one of the most widely used methods of prediction. Linear regression is appropriate when the response has a roughly linear relationship with the predictors. As the name suggests, multiple linear regression has one dependent variable (response) and multiple independent variables (predictors).\n", + "\n", + "The multiple linear regression equation is $y = a + b_{1}x_{1} + b_{2}x_{2} + b_{3}x_{3} + ... + b_{n}x_{n}$, where $x_{i}$ is the $i$-th explanatory variable, $y$ is the dependent variable, $b_{i}$ is the $i$-th coefficient and $a$ is the intercept.\n", + "\n", + "To perform linear regression, we'll use the `LinearRegression` class from mlpack."
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "chemical-inside", + "metadata": {}, + "outputs": [], + "source": [ + "using namespace mlpack::regression;\n", + "LinearRegression lr(trainSet, trainLabels, 0.5);\n", + "// The above line creates and trains the model; 0.5 is the ridge regularization parameter (lambda)." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "sensitive-sociology", + "metadata": {}, + "outputs": [], + "source": [ + "// Let's create an output vector to store the predictions.\n", + "arma::rowvec output; \n", + "lr.Predict(testSet, output);" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "empty-senator", + "metadata": {}, + "outputs": [], + "source": [ + "lr.ComputeError(trainSet, trainLabels);" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "circular-donna", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4.74874e+09" + ] + } + ], + "source": [ + "std::cout<NOTE : In the entire ML workflow, you never know in advance which model will perform best. So, you usually try several different algorithms to see which one fits the data best."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "C++14", + "language": "C++14", + "name": "xcpp14" + }, + "language_info": { + "codemirror_mode": "text/x-c++src", + "file_extension": ".cpp", + "mimetype": "text/x-c++src", + "name": "c++", + "version": "14" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/student_admission_regression_with_logistic_regression/student-admission-logistic-regression-cpp.ipynb b/student_admission_regression_with_logistic_regression/student-admission-logistic-regression-cpp.ipynb index 9dfd3187..6ee3ad4b 100644 --- a/student_admission_regression_with_logistic_regression/student-admission-logistic-regression-cpp.ipynb +++ b/student_admission_regression_with_logistic_regression/student-admission-logistic-regression-cpp.ipynb @@ -1,278 +1,392 @@ { - "metadata":{ - "language_info":{ - "codemirror_mode":"text/x-c++src", - "file_extension":".cpp", - "mimetype":"text/x-c++src", - "name":"c++", - "version":"14" - }, - "kernelspec":{ - "name":"xcpp14", - "display_name":"C++14", - "language":"C++14" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[![Binder](https://mybinder.org/badge_logo.svg)](https://lab.mlpack.org/v2/gh/mlpack/examples/master?urlpath=lab%2Ftree%2Fstudent_admission_regression_with_logistic_regression%2Fstudent-admission-logistic-regression-cpp.ipynb)" + ] }, - "nbformat_minor":4, - "nbformat":4, - "cells":[ - { - "cell_type":"markdown", - "source":"[![Binder](https://mybinder.org/badge_logo.svg)](https://lab.mlpack.org/v2/gh/mlpack/examples/master?urlpath=lab%2Ftree%2Fstudent_admission_regression_with_logistic_regression%2Fstudent-admission-logistic-regression-cpp.ipynb)", - "metadata":{ - - } - }, - { - "cell_type":"code", - "source":"/**\n * @file student-admission-logistic-regression-cpp.ipynb\n *\n * A simple example usage of Logistic Regression (LR)\n * applied to the Student Admission dataset.\n *\n * We will use a Logistic-Regression model to predict 
whether a student\n * gets admitted into a university (i.e, the output classes are Yes or No),\n * based on their results on past exams.\n *\n * Data from Andrew Ng's Stanford University Machine Learning Course (Coursera).\n */", - "metadata":{ - "trusted":true - }, - "execution_count":null, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"!wget -q https://lab.mlpack.org/data/student-admission.txt", - "metadata":{ - "trusted":true - }, - "execution_count":1, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"#include \n\n#include \n#include ", - "metadata":{ - "trusted":true - }, - "execution_count":2, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"// Header files to create and show the plot.\n#define WITHOUT_NUMPY 1\n#include \"matplotlibcpp.h\"\n#include \"xwidgets/ximage.hpp\"\n\nnamespace plt = matplotlibcpp;", - "metadata":{ - "trusted":true - }, - "execution_count":3, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"using namespace mlpack;", - "metadata":{ - "trusted":true - }, - "execution_count":4, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"using namespace mlpack::regression;", - "metadata":{ - "trusted":true - }, - "execution_count":5, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"// Read the input data.\narma::mat input;\ndata::Load(\"student-admission.txt\", input);", - "metadata":{ - "trusted":true - }, - "execution_count":6, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"// Print the first 10 rows of the input data.\nstd::cout << input.submat(0, 0, input.n_rows - 1 , 10).t() << std::endl;", - "metadata":{ - "trusted":true - }, - "execution_count":7, - "outputs":[ - { - "name":"stdout", - "text":" 34.6237 78.0247 0\n 30.2867 43.8950 0\n 35.8474 72.9022 0\n 60.1826 86.3086 1.0000\n 79.0327 75.3444 1.0000\n 45.0833 56.3164 0\n 61.1067 96.5114 1.0000\n 75.0247 46.5540 1.0000\n 76.0988 87.4206 1.0000\n 84.4328 43.5334 1.0000\n 95.8616 38.2253 0\n\n", 
- "output_type":"stream" - } - ] - }, - { - "cell_type":"markdown", - "source":"Historical data from previous students: each student has two exams scores associated and the final admission result (1.0=yes, 0.0=no).", - "metadata":{ - - } - }, - { - "cell_type":"code", - "source":"// Plot the input data.\n\n// Get the indices for the labels 0.0 (not admitted).\narma::mat dataset0 = input.cols(arma::find(input.row(2) == 0));\n\n// Get the data to for the indices.\nstd::vector x0 = arma::conv_to>::from(dataset0.row(0));\nstd::vector y0 = arma::conv_to>::from(dataset0.row(1));\n\n// Get the indices for the label 1.0 (admitted).\narma::mat dataset1 = input.cols(arma::find(input.row(2) == 1.0));\n\n// Get the data to for the indices.\nstd::vector x1 = arma::conv_to>::from(dataset1.row(0));\nstd::vector y1 = arma::conv_to>::from(dataset1.row(1));\n\nplt::figure_size(800, 800);\n\n// Set the label for the legend.\nstd::map m0;\nm0.insert(std::pair(\"label\", \"not admitted\"));\nplt::scatter(x0, y0, 4, m0);\n\n// Set the label for the legend.\nstd::map m1;\nm1.insert(std::pair(\"label\", \"admitted\"));\nplt::scatter(x1, y1, 4, m1);\n\nplt::xlabel(\"Exam 1 Score\");\nplt::ylabel(\"Exam 2 Score\");\nplt::title(\"Student admission vs. past two exams\");\nplt::legend();\n\nplt::save(\"./plot.png\");\nauto im = xw::image_from_file(\"plot.png\").finalize();\nim", - "metadata":{ - "trusted":true - }, - "execution_count":8, - "outputs":[ - { - "execution_count":8, - "output_type":"execute_result", - "data":{ - "application/vnd.jupyter.widget-view+json":{ - "model_id":"75e1b93113f44ca2ad0a709098eae2c1", - "version_major":2, - "version_minor":0 - }, - "text/plain":"A Jupyter widget" - }, - "metadata":{ - - } - } - ] - }, - { - "cell_type":"markdown", - "source":"If the score of the first or the second exam was too low, it might be not enough to be admitted. 
You need a good balance.", - "metadata":{ - - } - }, - { - "cell_type":"markdown", - "source":"This is the logistic function to model our admission:\n$P(y=1) = \\frac{1}{1 + e^{-(\\beta_{0} + \\beta_{1} \\cdot x_{1} + ... + \\beta_{n} \\cdot x_{n}) }}$\n\nwhere y is the admission result (0 or 1) and x are the exams scores.\nSince in our example the admission decision is based on two exams (x1 and x2)\n(two exams) we can set n = 2. The next step is to find the correct beta\nparameters for the model by using our historical data as a training set.", - "metadata":{ - - } - }, - { - "cell_type":"code", - "source":"// Split data into training data X (input) and y (labels) target variable.\n\n// Labels are the last row.\narma::Row labels =\n arma::conv_to>::from(input.row(input.n_rows - 1));\ninput.shed_row(input.n_rows - 1);", - "metadata":{ - "trusted":true - }, - "execution_count":9, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"// Create and train Logistic Regression model.\n//\n// For more information checkout https://mlpack.org/doc/mlpack-git/doxygen/classmlpack_1_1regression_1_1LogisticRegression.html\n// or uncomment the line below.\n// ?LogisticRegression<>\nLogisticRegression<> lr(input, labels, 0.0 /* no regularization */);", - "metadata":{ - "trusted":true - }, - "execution_count":10, - "outputs":[ - - ] - }, - { - "cell_type":"code", - "source":"// Final beta parameters.\nlr.Parameters().print()", - "metadata":{ - "trusted":true - }, - "execution_count":11, - "outputs":[ - { - "name":"stdout", - "text":" -25.1613 0.2062 0.2015\n", - "output_type":"stream" - } - ] - }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "/**\n", + " * @file student-admission-logistic-regression-cpp.ipynb\n", + " *\n", + " * A simple example usage of Logistic Regression (LR)\n", + " * applied to the Student Admission dataset.\n", + " *\n", + " * We will use a Logistic-Regression model to predict whether a 
student\n", + " * gets admitted into a university (i.e, the output classes are Yes or No),\n", + " * based on their results on past exams.\n", + " *\n", + " * Data from Andrew Ng's Stanford University Machine Learning Course (Coursera).\n", + " */" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "!wget -q https://lab.mlpack.org/data/student-admission.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "#include \n", + "\n", + "#include \n", + "#include " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "// Header files to create and show the plot.\n", + "#define WITHOUT_NUMPY 1\n", + "#include \"matplotlibcpp.h\"\n", + "#include \"xwidgets/ximage.hpp\"\n", + "\n", + "namespace plt = matplotlibcpp;" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "using namespace mlpack;" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "using namespace mlpack::regression;" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "// Read the input data.\n", + "arma::mat input;\n", + "data::Load(\"student-admission.txt\", input);" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ { - "cell_type":"code", - "source":"// We can use these beta parameters to plot the decision boundary on the training data.\n// We only need two points to plot a line, so we choose two endpoints:\n// the min and the max among the X training data.\nstd::vector xPlot;\nxPlot.push_back(arma::min(input.row(0)) - 2);\nxPlot.push_back(arma::max(input.row(0)) + 2);\n\nstd::vector yPlot;\nyPlot.push_back((-1.0 / lr.Parameters()(2)) * (lr.Parameters()(1) * xPlot[0] + lr.Parameters()(0)));\nyPlot.push_back((-1.0 / 
lr.Parameters()(2)) * (lr.Parameters()(1) * xPlot[1] + lr.Parameters()(0)));", - "metadata":{ - "trusted":true - }, - "execution_count":12, - "outputs":[ - - ] - }, + "name": "stdout", + "output_type": "stream", + "text": [ + " 34.6237 78.0247 0\n", + " 30.2867 43.8950 0\n", + " 35.8474 72.9022 0\n", + " 60.1826 86.3086 1.0000\n", + " 79.0327 75.3444 1.0000\n", + " 45.0833 56.3164 0\n", + " 61.1067 96.5114 1.0000\n", + " 75.0247 46.5540 1.0000\n", + " 76.0988 87.4206 1.0000\n", + " 84.4328 43.5334 1.0000\n", + " 95.8616 38.2253 0\n", + "\n" + ] + } + ], + "source": [ + "// Print the first 10 rows of the input data.\n", + "std::cout << input.submat(0, 0, input.n_rows - 1 , 10).t() << std::endl;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Historical data from previous students: each student has two exams scores associated and the final admission result (1.0=yes, 0.0=no)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ { - "cell_type":"code", - "source":"// Plot the decision boundary.\n\n// Get the indices for the labels 0.0 (not admitted).\narma::mat dataset0 = input.cols(arma::find(labels == 0));\n\n// Get the data to for the indices.\nstd::vector x0 = arma::conv_to>::from(dataset0.row(0));\nstd::vector y0 = arma::conv_to>::from(dataset0.row(1));\n\n// Get the indices for the label 1.0 (admitted).\narma::mat dataset1 = input.cols(arma::find(labels == 1.0));\n\n// Get the data to for the indices.\nstd::vector x1 = arma::conv_to>::from(dataset1.row(0));\nstd::vector y1 = arma::conv_to>::from(dataset1.row(1));\n\nplt::figure_size(800, 800);\nplt::scatter(x0, y0, 4);\nplt::scatter(x1, y1, 4);\n\nplt::plot(xPlot, yPlot);\n\nplt::xlabel(\"Exam 1 Score\");\nplt::ylabel(\"Exam 2 Score\");\nplt::title(\"Student admission vs. 
past two exams\");\n\nplt::save(\"./decision boundary-plot.png\");\nauto im = xw::image_from_file(\"decision boundary-plot.png\").finalize();\nim", - "metadata":{ - "trusted":true + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "54466081d1d04875824c960df742de13", + "version_major": 2, + "version_minor": 0 }, - "execution_count":13, - "outputs":[ - { - "execution_count":13, - "output_type":"execute_result", - "data":{ - "application/vnd.jupyter.widget-view+json":{ - "model_id":"06d78d253ec546e780ea8b5d129f0e1f", - "version_major":2, - "version_minor":0 - }, - "text/plain":"A Jupyter widget" - }, - "metadata":{ - - } - } + "text/plain": [ + "A Jupyter widget with unique id: 54466081d1d04875824c960df742de13" ] - }, - { - "cell_type":"markdown", - "source":"The blue line is our decision boundary. When your exams score lie below the line then\nprobably (that is the prediction) you will not be admitted to University.\nIf they lie above, probably you will. As you can see, the boundary is not predicting\nperfectly on the training historical data.", - "metadata":{ - - } - }, + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "// Plot the input data.\n", + "\n", + "// Get the indices for the labels 0.0 (not admitted).\n", + "arma::mat dataset0 = input.cols(arma::find(input.row(2) == 0));\n", + "\n", + "// Get the data to for the indices.\n", + "std::vector x0 = arma::conv_to>::from(dataset0.row(0));\n", + "std::vector y0 = arma::conv_to>::from(dataset0.row(1));\n", + "\n", + "// Get the indices for the label 1.0 (admitted).\n", + "arma::mat dataset1 = input.cols(arma::find(input.row(2) == 1.0));\n", + "\n", + "// Get the data to for the indices.\n", + "std::vector x1 = arma::conv_to>::from(dataset1.row(0));\n", + "std::vector y1 = arma::conv_to>::from(dataset1.row(1));\n", + "\n", + "plt::figure_size(800, 800);\n", + "\n", + "// Set the label for the legend.\n", + "std::map m0;\n", + 
"m0.insert(std::pair(\"label\", \"not admitted\"));\n", + "plt::scatter(x0, y0, 4, m0);\n", + "\n", + "// Set the label for the legend.\n", + "std::map m1;\n", + "m1.insert(std::pair(\"label\", \"admitted\"));\n", + "plt::scatter(x1, y1, 4, m1);\n", + "\n", + "plt::xlabel(\"Exam 1 Score\");\n", + "plt::ylabel(\"Exam 2 Score\");\n", + "plt::title(\"Student admission vs. past two exams\");\n", + "plt::legend();\n", + "\n", + "plt::save(\"./plot.png\");\n", + "auto im = xw::image_from_file(\"plot.png\").finalize();\n", + "im" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the score of the first or the second exam was too low, it might be not enough to be admitted. You need a good balance." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the logistic function to model our admission:\n", + "$P(y=1) = \\frac{1}{1 + e^{-(\\beta_{0} + \\beta_{1} \\cdot x_{1} + ... + \\beta_{n} \\cdot x_{n}) }}$\n", + "\n", + "where y is the admission result (0 or 1) and x are the exams scores.\n", + "Since in our example the admission decision is based on two exams (x1 and x2)\n", + "(two exams) we can set n = 2. The next step is to find the correct beta\n", + "parameters for the model by using our historical data as a training set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "// Split data into training data X (input) and y (labels) target variable.\n", + "\n", + "// Labels are the last row.\n", + "arma::Row labels =\n", + " arma::conv_to>::from(input.row(input.n_rows - 1));\n", + "input.shed_row(input.n_rows - 1);" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "// Create and train Logistic Regression model.\n", + "//\n", + "// For more information checkout https://mlpack.org/doc/mlpack-git/doxygen/classmlpack_1_1regression_1_1LogisticRegression.html\n", + "// or uncomment the line below.\n", + "// ?LogisticRegression<>\n", + "LogisticRegression<> lr(input, labels, 0.0 /* no regularization */);" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ { - "cell_type":"code", - "source":"// Let's say that my scores are 40 in the first exam and 78 in the second one.\narma::mat scores(\"40.0; 78.0\");\n\narma::mat probabilities;\nlr.Classify(scores, probabilities);", - "metadata":{ - "trusted":true - }, - "execution_count":14, - "outputs":[ - - ] - }, + "name": "stdout", + "output_type": "stream", + "text": [ + " -25.1613 0.2062 0.2015\n" + ] + } + ], + "source": [ + "// Final beta parameters.\n", + "lr.Parameters().print()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "// We can use these beta parameters to plot the decision boundary on the training data.\n", + "// We only need two points to plot a line, so we choose two endpoints:\n", + "// the min and the max among the X training data.\n", + "std::vector xPlot;\n", + "xPlot.push_back(arma::min(input.row(0)) - 2);\n", + "xPlot.push_back(arma::max(input.row(0)) + 2);\n", + "\n", + "std::vector yPlot;\n", + "yPlot.push_back((-1.0 / lr.Parameters()(2)) * (lr.Parameters()(1) * xPlot[0] + lr.Parameters()(0)));\n", 
+ "yPlot.push_back((-1.0 / lr.Parameters()(2)) * (lr.Parameters()(1) * xPlot[1] + lr.Parameters()(0)));" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ { - "cell_type":"code", - "source":"probabilities.print()", - "metadata":{ - "trusted":true + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e023c9b879234163b1f5498cc8154920", + "version_major": 2, + "version_minor": 0 }, - "execution_count":15, - "outputs":[ - { - "name":"stdout", - "text":" 0.7680\n 0.2320\n", - "output_type":"stream" - } + "text/plain": [ + "A Jupyter widget with unique id: e023c9b879234163b1f5498cc8154920" ] - }, + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "// Plot the decision boundary.\n", + "\n", + "// Get the indices for label 0 (not admitted).\n", + "arma::mat dataset0 = input.cols(arma::find(labels == 0));\n", + "\n", + "// Get the data for the indices.\n", + "std::vector<double> x0 = arma::conv_to<std::vector<double>>::from(dataset0.row(0));\n", + "std::vector<double> y0 = arma::conv_to<std::vector<double>>::from(dataset0.row(1));\n", + "\n", + "// Get the indices for label 1 (admitted).\n", + "arma::mat dataset1 = input.cols(arma::find(labels == 1));\n", + "\n", + "// Get the data for the indices.\n", + "std::vector<double> x1 = arma::conv_to<std::vector<double>>::from(dataset1.row(0));\n", + "std::vector<double> y1 = arma::conv_to<std::vector<double>>::from(dataset1.row(1));\n", + "\n", + "plt::figure_size(800, 800);\n", + "plt::scatter(x0, y0, 4);\n", + "plt::scatter(x1, y1, 4);\n", + "\n", + "plt::plot(xPlot, yPlot);\n", + "\n", + "plt::xlabel(\"Exam 1 Score\");\n", + "plt::ylabel(\"Exam 2 Score\");\n", + "plt::title(\"Student admission vs. past two exams\");\n", + "\n", + "plt::save(\"./decision boundary-plot.png\");\n", + "auto im = xw::image_from_file(\"decision boundary-plot.png\").finalize();\n", + "im" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The blue line is our decision boundary.
When your exam scores lie below the line, then\n", + "you will probably (that is the prediction) not be admitted to the university.\n", + "If they lie above it, you probably will. As you can see, the boundary does not predict\n", + "perfectly on the historical training data." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "// Let's say that my scores are 40 in the first exam and 78 in the second one.\n", + "arma::mat scores(\"40.0; 78.0\");\n", + "\n", + "arma::mat probabilities;\n", + "lr.Classify(scores, probabilities);" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ { - "cell_type":"markdown", - "source":"Looks like my probability to be admitted at University is only 23%.", - "metadata":{ - - } + "name": "stdout", + "output_type": "stream", + "text": [ + " 0.7680\n", + " 0.2320\n" + ] } - ] + ], + "source": [ + "probabilities.print()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Looks like my probability of being admitted to university is only 23%." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "C++14", + "language": "C++14", + "name": "xcpp14" + }, + "language_info": { + "codemirror_mode": "text/x-c++src", + "file_extension": ".cpp", + "mimetype": "text/x-c++src", + "name": "c++", + "version": "14" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } diff --git a/utils/heatmap.hpp b/utils/heatmap.hpp new file mode 100644 index 00000000..e9a5e60b --- /dev/null +++ b/utils/heatmap.hpp @@ -0,0 +1,105 @@ +// Inside the C++ notebook we can use: +// HeatMap("filename.csv", "heatmap.png", width, height) + +#ifndef CHEATMAP_HPP +#define CHEATMAP_HPP + +#define PY_SSIZE_T_CLEAN +#include <Python.h> +#include <string> + +// Here, we use the same arguments as the Python function in heatmap.py, +// since they are passed from the C++ notebook to the Python script.
+ +int HeatMap(const std::string& inFile, + const std::string& outFile = "heatmap.png", + const int width = 15, + const int height = 10) +{ + // Calls the Python function cheatmap and plots the correlation heatmap. + + PyObject *pName, *pModule, *pFunc; + PyObject *pArgs, *pValue; + + // This has to be adapted if you run this on your local system, + // so whenever you call the python script it can find the correct + // module -> PYTHONPATH, on lab.mlpack.org we put all the utility + // functions in the utils folder so we add that path + // to the Python search path. + Py_Initialize(); + PyRun_SimpleString("import sys"); + PyRun_SimpleString("sys.path.append(\"../utils/\")"); + + // Name of python script without extension + pName = PyUnicode_DecodeFSDefault("heatmap"); + + pModule = PyImport_Import(pName); + Py_DECREF(pName); + + if (pModule != NULL) + { + // The Python function from the heatmap.py script + // we would like to call - cheatmap + pFunc = PyObject_GetAttrString(pModule, "cheatmap"); + + if(pFunc && PyCallable_Check(pFunc)) + { + // The number of arguments we pass to the python script. + // inFile, outFile, width, height + // for the function above it's 4 + pArgs = PyTuple_New(4); + + // Now we have to encode the argument to the correct type + // We can use PyLong_FromLong for width and height as they are integers + // For the rest, we use PyUnicode_FromString. + + PyObject* pValueinFile = PyUnicode_FromString(inFile.c_str()); + //Here we just set the index of the argument. + PyTuple_SetItem(pArgs, 0, pValueinFile); + + PyObject* pValueoutFile = PyUnicode_FromString(outFile.c_str()); + PyTuple_SetItem(pArgs, 1, pValueoutFile); + + PyObject* pValuewidth = PyLong_FromLong(width); + PyTuple_SetItem(pArgs, 2, pValuewidth); + + PyObject* pValueheight = PyLong_FromLong(height); + PyTuple_SetItem(pArgs, 3, pValueheight); + + // The rest of the C++ part can remain the same.
+ + pValue = PyObject_CallObject(pFunc, pArgs); + // Call the Python function with the arguments provided in the C++ notebook. + Py_DECREF(pArgs); + + if (pValue != NULL) + { + Py_DECREF(pValue); + } + else + { + Py_DECREF(pFunc); + Py_DECREF(pModule); + PyErr_Print(); + fprintf(stderr,"Call failed.\n"); + return 1; + } + } + else + { + if (PyErr_Occurred()) + PyErr_Print(); + } + + Py_XDECREF(pFunc); + Py_DECREF(pModule); + } + else + { + PyErr_Print(); + return -1; + } + return 0; + } + +#endif diff --git a/utils/heatmap.py b/utils/heatmap.py new file mode 100644 index 00000000..e0a87bdb --- /dev/null +++ b/utils/heatmap.py @@ -0,0 +1,9 @@ +import pandas as pd +import seaborn as sns +import matplotlib.pyplot as plt + +def cheatmap(inFile, outFile='heatmap.png', width=15, height=10): + plt.figure(figsize=(width,height)) + dataset = pd.read_csv(inFile) + sns.heatmap(dataset.corr(), annot=True) + plt.savefig(outFile) \ No newline at end of file diff --git a/utils/histogram.hpp b/utils/histogram.hpp new file mode 100644 index 00000000..5a5fa33c --- /dev/null +++ b/utils/histogram.hpp @@ -0,0 +1,110 @@ +// Inside the C++ notebook we can use: +// Hist("filename.csv", bins, width, height, "histogram.png") + +#ifndef CHISTOGRAM_HPP +#define CHISTOGRAM_HPP + +#define PY_SSIZE_T_CLEAN +#include <Python.h> +#include <string> + +// Here, we use the same arguments as the Python function in histogram.py, +// since they are passed from the C++ notebook to the Python script. + +int Hist(const std::string& inFile, + const int bins, + const int width = 20, + const int height = 15, + const std::string& outFile = "histogram.png") + +{ + // Calls the Python function cpandashist and plots the histogram. + + PyObject *pName, *pModule, *pFunc; + PyObject *pArgs, *pValue; + + // This has to be adapted if you run this on your local system, + // so whenever you call the python script it can find the correct + // module -> PYTHONPATH, on lab.mlpack.org we put all the utility + // functions in the utils folder so we add that
path + // to the Python search path. + Py_Initialize(); + PyRun_SimpleString("import sys"); + PyRun_SimpleString("sys.path.append(\"../utils/\")"); + + // Name of python script without extension + pName = PyUnicode_DecodeFSDefault("histogram"); + + pModule = PyImport_Import(pName); + Py_DECREF(pName); + + if (pModule != NULL) + { + // The Python function from the histogram.py script + // we would like to call - cpandashist + pFunc = PyObject_GetAttrString(pModule, "cpandashist"); + + if (pFunc && PyCallable_Check(pFunc)) + { + // The number of arguments we pass to the python script. + // inFile, bins, width, height, outFile + // for the function above it's 5 + pArgs = PyTuple_New(5); + + // Now we have to encode the argument to the correct type + // We can use PyLong_FromLong for bins, width and height as they are integers + // For the rest, we use PyUnicode_FromString. + + PyObject* pValueinFile = PyUnicode_FromString(inFile.c_str()); + //Here we just set the index of the argument. + PyTuple_SetItem(pArgs, 0, pValueinFile); + + PyObject* pValuebins = PyLong_FromLong(bins); + PyTuple_SetItem(pArgs, 1, pValuebins); + + PyObject* pValuewidth = PyLong_FromLong(width); + PyTuple_SetItem(pArgs, 2, pValuewidth); + + PyObject* pValueheight = PyLong_FromLong(height); + PyTuple_SetItem(pArgs, 3, pValueheight); + + PyObject* pValueoutFile = PyUnicode_FromString(outFile.c_str()); + PyTuple_SetItem(pArgs, 4, pValueoutFile); + + // The rest of the C++ part can remain the same. + + pValue = PyObject_CallObject(pFunc, pArgs); + // Call the Python function with the arguments provided in the C++ notebook.
+ Py_DECREF(pArgs); + + if (pValue != NULL) + { + Py_DECREF(pValue); + } + else + { + Py_DECREF(pFunc); + Py_DECREF(pModule); + PyErr_Print(); + fprintf(stderr,"Call failed.\n"); + return 1; + } + } + else + { + if (PyErr_Occurred()) + PyErr_Print(); + } + + Py_XDECREF(pFunc); + Py_DECREF(pModule); + } + else + { + PyErr_Print(); + return -1; + } + return 0; + } + +#endif diff --git a/utils/histogram.py b/utils/histogram.py new file mode 100644 index 00000000..46d39500 --- /dev/null +++ b/utils/histogram.py @@ -0,0 +1,7 @@ +import pandas as pd +import matplotlib.pyplot as plt + +def cpandashist(inFile, bins, width=20,height=15, outFile = 'histogram.png'): + dataset = pd.read_csv(inFile) + dataset.hist(bins=bins, figsize=(width, height)) + plt.savefig(outFile) diff --git a/utils/impute.hpp b/utils/impute.hpp new file mode 100644 index 00000000..244a78fc --- /dev/null +++ b/utils/impute.hpp @@ -0,0 +1,102 @@ +// Inside the C++ notebook we can use: +// Impute("filename.csv", "output.csv", "imputationMethod") +// imputationMethod can be "mean", "median" or "mode" (the strategy used to fill missing values). + +#ifndef CIMPUTE_HPP +#define CIMPUTE_HPP + +#define PY_SSIZE_T_CLEAN +#include <Python.h> +#include <string> + +// Here, we use the same arguments as the Python function in impute.py, +// since they are passed from the C++ notebook to the Python script. + +int Impute(const std::string& inFile, + const std::string& outFile, + const std::string& kind) +{ + // Calls the Python function cimputer, which fills the missing values using + // the specified imputation policy and saves the dataset as .csv. + PyObject *pName, *pModule, *pFunc; + PyObject *pArgs, *pValue; + + // This has to be adapted if you run this on your local system, + // so whenever you call the python script it can find the correct + // module -> PYTHONPATH, on lab.mlpack.org we put all the utility + // functions in the utils folder so we add that path + // to the Python search path.
+ Py_Initialize(); + PyRun_SimpleString("import sys"); + PyRun_SimpleString("sys.path.append(\"../utils/\")"); + + // Name of python script without extension. + pName = PyUnicode_DecodeFSDefault("impute"); + + pModule = PyImport_Import(pName); + Py_DECREF(pName); + + if (pModule != NULL) + { + // The Python function from the impute.py script + // we would like to call - cimputer + pFunc = PyObject_GetAttrString(pModule, "cimputer"); + + if(pFunc && PyCallable_Check(pFunc)) + { + // The number of arguments we pass to the python script. + // inFile, outFile, kind + // for the function above it's 3 + pArgs = PyTuple_New(3); + + // Now we have to encode the argument to the correct type. + // All three arguments are strings, + // so we can use PyUnicode_FromString. + + PyObject* pValueinFile = PyUnicode_FromString(inFile.c_str()); + //Here we just set the index of the argument. + PyTuple_SetItem(pArgs, 0, pValueinFile); + + PyObject* pValueoutFile = PyUnicode_FromString(outFile.c_str()); + PyTuple_SetItem(pArgs, 1, pValueoutFile); + + PyObject* pValuekind = PyUnicode_FromString(kind.c_str()); + PyTuple_SetItem(pArgs, 2, pValuekind); + + // The rest of the C++ part can remain the same. + + pValue = PyObject_CallObject(pFunc, pArgs); + // Call the Python function with the arguments provided in the C++ notebook.
+ Py_DECREF(pArgs); + + if (pValue != NULL) + { + Py_DECREF(pValue); + } + else + { + Py_DECREF(pFunc); + Py_DECREF(pModule); + PyErr_Print(); + fprintf(stderr,"Call failed.\n"); + return 1; + } + } + else + { + if (PyErr_Occurred()) + PyErr_Print(); + } + + Py_XDECREF(pFunc); + Py_DECREF(pModule); + } + else + { + PyErr_Print(); + return -1; + } + return 0; + } + +#endif diff --git a/utils/impute.py b/utils/impute.py new file mode 100644 index 00000000..5d4569a7 --- /dev/null +++ b/utils/impute.py @@ -0,0 +1,17 @@ +import pandas as pd +import numpy as np + +def cimputer(inFile, outFile, kind): + dataset = pd.read_csv(inFile) + df = dataset.copy(deep=True) + for feature in df.columns: + if df[feature].dtype == "float": + if kind == "mean": + df[feature] = df[feature].fillna(df[feature].mean()) + elif kind == "median": + df[feature] = df[feature].fillna(df[feature].median()) + elif kind == "mode": + df[feature] = df[feature].fillna(df[feature].mode()[0]) + elif df[feature].dtype == "object": + df[feature] = df[feature].fillna(df[feature].mode()[0]) + df.to_csv(outFile, encoding='utf-8', index=False) diff --git a/utils/pandasscatter.hpp b/utils/pandasscatter.hpp new file mode 100644 index 00000000..ded0c721 --- /dev/null +++ b/utils/pandasscatter.hpp @@ -0,0 +1,280 @@ +// Inside the C++ notebook we can use: +// PandasScatter("housing.csv", "longitude", "latitude", "output.png"); +// auto im = xw::image_from_file("output.png").finalize(); +// im + +#ifndef C_PANDAS_SCATTER_C_PANDAS_SCATTER_HPP +#define C_PANDAS_SCATTER_C_PANDAS_SCATTER_HPP + +#define PY_SSIZE_T_CLEAN +#include <Python.h> +#include <string> + +// Here we use the same arguments as we used in the python script, +// since this is what is passed from the C++ notebook to call the python script.
+int PandasScatter(const std::string& inFile, + const std::string& x, + const std::string& y, + const std::string& outFile = "output.png", + const int width = 10, + const int height = 10) +{ + PyObject *pName, *pModule, *pFunc; + PyObject *pArgs, *pValue; + int i; + + // This has to be adapted if you run this on your local system, + // so whenever you call the python script it can find the correct + // module -> PYTHONPATH, on lab.mlpack.org we put all the utility + // functions for plotting into the utils folder so we add that path + // to the Python search path. + Py_Initialize(); + PyRun_SimpleString("import sys"); + PyRun_SimpleString("sys.path.append(\"../utils/\")"); + // Name of the python script without the extension. + pName = PyUnicode_DecodeFSDefault("pandasscatter"); + + pModule = PyImport_Import(pName); + Py_DECREF(pName); + + if (pModule != NULL) + { + // The Python function from the pandasscatter.py script + // we would like to call - cpandasscatter + pFunc = PyObject_GetAttrString(pModule, "cpandasscatter"); + + if (pFunc && PyCallable_Check(pFunc)) + { + // The number of arguments we pass to the python script. + // inFile, x, y, outFile='output.png', width=10, height=10 + // for the example above it's 6 + pArgs = PyTuple_New(6); + + // Now we have to encode the argument to the correct type + // besides width, height everything else is a string. + // So we can use PyUnicode_FromString. + // If the data is an int we can use PyLong_FromLong, + // see the lines below for an example. + PyObject* pValueinFile = PyUnicode_FromString(inFile.c_str()); + // Here we just set the index of the argument.
+ PyTuple_SetItem(pArgs, 0, pValueinFile); + + PyObject* pValueX = PyUnicode_FromString(x.c_str()); + PyTuple_SetItem(pArgs, 1, pValueX); + + PyObject* pValueY = PyUnicode_FromString(y.c_str()); + PyTuple_SetItem(pArgs, 2, pValueY); + + PyObject* pValueoutFile = PyUnicode_FromString(outFile.c_str()); + PyTuple_SetItem(pArgs, 3, pValueoutFile); + + PyObject* pValueWidth = PyLong_FromLong(width); + PyTuple_SetItem(pArgs, 4, pValueWidth); + + PyObject* pValueHeight = PyLong_FromLong(height); + PyTuple_SetItem(pArgs, 5, pValueHeight); + + // The rest of the c++ part can stay the same. + + pValue = PyObject_CallObject(pFunc, pArgs); + Py_DECREF(pArgs); + if (pValue != NULL) + { + Py_DECREF(pValue); + } + else + { + Py_DECREF(pFunc); + Py_DECREF(pModule); + PyErr_Print(); + fprintf(stderr,"Call failed.\n"); + return 1; + } + } + else + { + if (PyErr_Occurred()) + PyErr_Print(); + } + + Py_XDECREF(pFunc); + Py_DECREF(pModule); + } + else + { + PyErr_Print(); + return 1; + } + + return 0; +} +int PandasScatterColor(const std::string& inFile, + const std::string& x, + const std::string& y, + const std::string& label, + const std::string& c, + const std::string& outFile, + const int width = 10, + const int height= 10) +{ + PyObject *pName, *pModule, *pFunc; + PyObject *pArgs, *pValue; + int i; + Py_Initialize(); + PyRun_SimpleString("import sys"); + PyRun_SimpleString("sys.path.append(\"../utils/\")"); + pName = PyUnicode_DecodeFSDefault("pandasscatter"); + + pModule = PyImport_Import(pName); + Py_DECREF(pName); + + if (pModule != NULL) + { + pFunc = PyObject_GetAttrString(pModule, "cpandasscattercolor"); + if( pFunc && PyCallable_Check(pFunc)) + { + pArgs = PyTuple_New(8); + + PyObject* pValueinFile = PyUnicode_FromString(inFile.c_str()); + PyTuple_SetItem(pArgs, 0, pValueinFile); + + PyObject* pValueX = PyUnicode_FromString(x.c_str()); + PyTuple_SetItem(pArgs, 1, pValueX); + + PyObject* pValueY = PyUnicode_FromString(y.c_str()); + PyTuple_SetItem(pArgs, 2, pValueY); + + 
PyObject* pValueLabel = PyUnicode_FromString(label.c_str()); + PyTuple_SetItem(pArgs, 3, pValueLabel); + + PyObject* pValueC = PyUnicode_FromString(c.c_str()); + PyTuple_SetItem(pArgs, 4, pValueC); + + PyObject* pValueoutFile = PyUnicode_FromString(outFile.c_str()); + PyTuple_SetItem(pArgs, 5, pValueoutFile); + + PyObject* pValueWidth = PyLong_FromLong(width); + PyTuple_SetItem(pArgs, 6, pValueWidth); + + PyObject* pValueHeight = PyLong_FromLong(height); + PyTuple_SetItem(pArgs, 7, pValueHeight); + + pValue = PyObject_CallObject(pFunc, pArgs); + Py_DECREF(pArgs); + if (pValue != NULL) + { + Py_DECREF(pValue); + } + else + { + Py_DECREF(pFunc); + Py_DECREF(pModule); + PyErr_Print(); + fprintf(stderr,"Call Failed.\n"); + return 1; + } + } + else + { + if (PyErr_Occurred()) + PyErr_Print(); + } + + Py_XDECREF(pFunc); + Py_DECREF(pModule); + } + else + { + PyErr_Print(); + return -1; + } + return 0; +} +int PandasScatterMap(const std::string& inFile, + const std::string& imgFile, + const std::string& x, + const std::string& y, + const std::string& label, + const std::string& c, + const std::string& outFile, + const int width = 10, + const int height= 10) +{ + PyObject *pName, *pModule, *pFunc; + PyObject *pArgs, *pValue; + int i; + Py_Initialize(); + PyRun_SimpleString("import sys"); + PyRun_SimpleString("sys.path.append(\"../utils/\")"); + pName = PyUnicode_DecodeFSDefault("pandasscatter"); + + pModule = PyImport_Import(pName); + Py_DECREF(pName); + + if (pModule != NULL) + { + pFunc = PyObject_GetAttrString(pModule, "cpandasscattermap"); + if(pFunc && PyCallable_Check(pFunc)) + { + pArgs = PyTuple_New(9); + + PyObject* pValueinFile = PyUnicode_FromString(inFile.c_str()); + PyTuple_SetItem(pArgs, 0, pValueinFile); + + PyObject* pValueimgFile = PyUnicode_FromString(imgFile.c_str()); + PyTuple_SetItem(pArgs, 1, pValueimgFile); + + PyObject* pValueX = PyUnicode_FromString(x.c_str()); + PyTuple_SetItem(pArgs, 2, pValueX); + + PyObject* pValueY = 
PyUnicode_FromString(y.c_str()); + PyTuple_SetItem(pArgs, 3, pValueY); + + PyObject* pValueLabel = PyUnicode_FromString(label.c_str()); + PyTuple_SetItem(pArgs, 4, pValueLabel); + + PyObject* pValueC = PyUnicode_FromString(c.c_str()); + PyTuple_SetItem(pArgs, 5, pValueC); + + PyObject* pValueoutFile = PyUnicode_FromString(outFile.c_str()); + PyTuple_SetItem(pArgs, 6, pValueoutFile); + + PyObject* pValueWidth = PyLong_FromLong(width); + PyTuple_SetItem(pArgs, 7, pValueWidth); + + PyObject* pValueHeight = PyLong_FromLong(height); + PyTuple_SetItem(pArgs, 8, pValueHeight); + + pValue = PyObject_CallObject(pFunc, pArgs); + Py_DECREF(pArgs); + if (pValue != NULL) + { + Py_DECREF(pValue); + } + else + { + Py_DECREF(pFunc); + Py_DECREF(pModule); + PyErr_Print(); + fprintf(stderr,"Call Failed.\n"); + return 1; + } + + } + else + { + if (PyErr_Occurred()) + PyErr_Print(); + } + + Py_XDECREF(pFunc); + Py_DECREF(pModule); + } + else + { + PyErr_Print(); + return -1; + } + return 0; +} +#endif diff --git a/utils/pandasscatter.py b/utils/pandasscatter.py new file mode 100644 index 00000000..787178f3 --- /dev/null +++ b/utils/pandasscatter.py @@ -0,0 +1,28 @@ +from matplotlib import figure +import pandas as pd +import matplotlib.pyplot as plt +from matplotlib.pyplot import figure + +def cpandasscatter(inFile, x, y, outFile='output.png', width=10, height=10): + dataset = pd.read_csv(inFile) + fig = dataset.plot(kind="scatter", x=x, y=y, alpha=0.1, figsize=(width, height)) + fig.figure.savefig(outFile) + +def cpandasscattercolor(inFile, x, y, label, c, outFile='output1.png', width=10, height=10): + dataset = pd.read_csv(inFile) + fig = dataset.plot(kind="scatter", x=x, y=y, alpha=0.4,s=dataset["population"]/100, + label=label, c=c, cmap=plt.get_cmap("jet"), colorbar=True, + sharex=False, figsize=(width, height)) + fig.figure.savefig(outFile) + +def cpandasscattermap(inFile, imgFile, x, y, label, c, outFile="output2.png", width=10, height=7): + figure(figsize=(width, height)) + im = plt.imread(imgFile) + dataset
= pd.read_csv(inFile) + implot = plt.imshow(im, extent=[-124.55, -113.80, 32.45, 42.05], alpha=0.5, + cmap=plt.get_cmap("jet")) + plt.scatter(x=dataset[x], y=dataset[y], s=dataset["population"]/100, label=label, c=dataset[c], cmap=plt.get_cmap("jet"), alpha= 0.5) + plt.colorbar() + plt.ylabel("Latitude", fontsize=14) + plt.xlabel("Longitude", fontsize=14) + plt.savefig(outFile) \ No newline at end of file diff --git a/utils/plot.hpp b/utils/plot.hpp index 6e4f6ef0..871fe6fb 100644 --- a/utils/plot.hpp +++ b/utils/plot.hpp @@ -18,14 +18,14 @@ int scatter(const std::string& fname, const int figWidth = 26, const int figHeight = 7) { - + // Calls Python function cscatter and generates a scatter plot of Xcol and yCol and saves it, // so the plot can later be imported in C++ notebook using xwidget. - + // PyObject contains info Python needs to treat a pointer to an object as an object. // It contains object's reference count and pointer to corresponding object type. PyObject *pName, *pModule, *pFunc, *pArgs, *pValue; - + // Initialize Python Interpreter. Py_Initialize(); // Import sys module in Interpreter and add current path to python search path. @@ -43,43 +43,43 @@ int scatter(const std::string& fname, pArgs = PyTuple_New(12); // String object representing the name of the dataset to be loaded. - PyObject* pFname = PyString_FromString(fname.c_str()); + PyObject* pFname = PyUnicode_FromString(fname.c_str()); PyTuple_SetItem(pArgs, 0, pFname); - + // String object representing the name of the feature to be plotted along X axis. - PyObject* pXcol = PyString_FromString(xCol.c_str()); + PyObject* pXcol = PyUnicode_FromString(xCol.c_str()); PyTuple_SetItem(pArgs, 1, pXcol); - + // String object representing the name of the feature to be plotted along Y axis. 
- PyObject* pYcol = PyString_FromString(yCol.c_str()); + PyObject* pYcol = PyUnicode_FromString(yCol.c_str()); PyTuple_SetItem(pArgs, 2, pYcol); - + // String object representing the name of the feature to be parsed as TimeStamp. - PyObject* pDateCol = PyString_FromString(dateCol.c_str()); + PyObject* pDateCol = PyUnicode_FromString(dateCol.c_str()); PyTuple_SetItem(pArgs, 3, pDateCol); - + // String object representing the name of the feature to be used to mask the plot data points. - PyObject* pMaskCol = PyString_FromString(maskCol.c_str()); - PyTuple_SetItem(pArgs, 4, pMaskCol); - + PyObject* pMaskCol = PyUnicode_FromString(maskCol.c_str()); + PyTuple_SetItem(pArgs, 4, pMaskCol); + // String object representing the value for masking. - PyObject* pType = PyString_FromString(type.c_str()); + PyObject* pType = PyUnicode_FromString(type.c_str()); PyTuple_SetItem(pArgs, 5, pType); - + // String object representing the feature name to be used as color value in plot. - PyObject* pColor = PyString_FromString(color.c_str()); + PyObject* pColor = PyUnicode_FromString(color.c_str()); PyTuple_SetItem(pArgs, 6, pColor); - + // String object representing the X axis label. - PyObject* pXlabel = PyString_FromString(xLabel.c_str()); + PyObject* pXlabel = PyUnicode_FromString(xLabel.c_str()); PyTuple_SetItem(pArgs, 7, pXlabel); - + // String object representing the Y axis label. - PyObject* pYlabel = PyString_FromString(yLabel.c_str()); + PyObject* pYlabel = PyUnicode_FromString(yLabel.c_str()); PyTuple_SetItem(pArgs, 8, pYlabel); - // String object representing the title of the figure. - PyObject* pFigTitle = PyString_FromString(figTitle.c_str()); + // String object representing the title of the figure. + PyObject* pFigTitle = PyUnicode_FromString(figTitle.c_str()); PyTuple_SetItem(pArgs, 9, pFigTitle); // Integer object representing the width of the figure. 
@@ -104,9 +104,9 @@ int barplot(const std::string& fname, const int figWidth = 5, const int figHeight = 7) { - + // Calls Python function cbarplot and generates a barplot plot of x and y and saves it, - // so the plot can later be imported in C++ notebook using xwidget. + // so the plot can later be imported in C++ notebook using xwidget. // PyObject contains info Python needs to treat a pointer to an object as an object. // It contains object's reference count and pointer to corresponding object type. @@ -118,41 +118,41 @@ int barplot(const std::string& fname, PyRun_SimpleString("import sys"); PyRun_SimpleString("sys.path.append(\"../utils/\")"); - // Import the Python module. + // Import the Python module. pName = PyUnicode_DecodeFSDefault("plot"); pModule = PyImport_Import(pName); - // Get the reference to Python Function to call. + // Get the reference to Python Function to call. pFunc = PyObject_GetAttrString(pModule, "cbarplot"); - // Create a tuple object to hold the arguments for function call. + // Create a tuple object to hold the arguments for function call. pArgs = PyTuple_New(7); - // String object representing the name of the dataset to be loaded. - PyObject* pFname = PyString_FromString(fname.c_str()); + // String object representing the name of the dataset to be loaded. + PyObject* pFname = PyUnicode_FromString(fname.c_str()); PyTuple_SetItem(pArgs, 0, pFname); // String object representing the name of the feature to be plotted along X axis. - PyObject* pX = PyString_FromString(x.c_str()); + PyObject* pX = PyUnicode_FromString(x.c_str()); PyTuple_SetItem(pArgs, 1, pX); // String object representing the name of the feature to be plotted along Y axis. - PyObject* pY = PyString_FromString(y.c_str()); + PyObject* pY = PyUnicode_FromString(y.c_str()); PyTuple_SetItem(pArgs, 2, pY); - + // String object representing the name of the feature to be parsed as TimeStamp. 
- PyObject* pDateCol = PyString_FromString(dateCol.c_str()); + PyObject* pDateCol = PyUnicode_FromString(dateCol.c_str()); PyTuple_SetItem(pArgs, 3, pDateCol); - // String object representing the title of the figure. - PyObject* pFigTitle = PyString_FromString(figTitle.c_str()); + // String object representing the title of the figure. + PyObject* pFigTitle = PyUnicode_FromString(figTitle.c_str()); PyTuple_SetItem(pArgs, 4, pFigTitle); - // Integer object representing the width of the figure. + // Integer object representing the width of the figure. PyObject* pFigWidth = PyLong_FromLong(figWidth); PyTuple_SetItem(pArgs, 5, pFigWidth); - // Integer object representing the height of the figure. + // Integer object representing the height of the figure. PyObject* pFigHeight = PyLong_FromLong(figHeight); PyTuple_SetItem(pArgs, 6, pFigHeight); @@ -171,50 +171,50 @@ int heatmap(const std::string& fname, { // PyObject contains info Python needs to treat a pointer to an object as an object. - // It contains object's reference count and pointer to corresponding object type. + // It contains object's reference count and pointer to corresponding object type. PyObject *pName, *pModule, *pFunc, *pArgs, *pValue; - // Initialize Python Interpreter. + // Initialize Python Interpreter. Py_Initialize(); - // Import sys module in Interpreter and add current path to python search path. + // Import sys module in Interpreter and add current path to python search path. PyRun_SimpleString("import sys"); PyRun_SimpleString("sys.path.append(\"../utils/\")"); - // Import the Python module. + // Import the Python module. pName = PyUnicode_DecodeFSDefault("plot"); pModule = PyImport_Import(pName); - // Get the reference to Python Function to call. + // Get the reference to Python Function to call. pFunc = PyObject_GetAttrString(pModule, "cheatmap"); - // Create a tuple object to hold the arguments for function call. + // Create a tuple object to hold the arguments for function call. 
pArgs = PyTuple_New(6); - // String object representing the name of the dataset to be loaded. - PyObject* pFname = PyString_FromString(fname.c_str()); + // String object representing the name of the dataset to be loaded. + PyObject* pFname = PyUnicode_FromString(fname.c_str()); PyTuple_SetItem(pArgs, 0, pFname); // String object representing the name of color map to be used for plotting. - PyObject* pColorMap = PyString_FromString(colorMap.c_str()); + PyObject* pColorMap = PyUnicode_FromString(colorMap.c_str()); PyTuple_SetItem(pArgs, 1, pColorMap); - // Boolean object indicating if correlation values must be annotated in figure. + // Boolean object indicating if correlation values must be annotated in figure. PyObject* pAnnotation = PyBool_FromLong(annotation); PyTuple_SetItem(pArgs, 2, pAnnotation); - // String object representing the title of the figure. - PyObject* pFigTitle = PyString_FromString(figTitle.c_str()); + // String object representing the title of the figure. + PyObject* pFigTitle = PyUnicode_FromString(figTitle.c_str()); PyTuple_SetItem(pArgs, 3, pFigTitle); - // Integer object representing the width of the figure. + // Integer object representing the width of the figure. PyObject* pFigWidth = PyLong_FromLong(figWidth); PyTuple_SetItem(pArgs, 4, pFigWidth); - // Integer object representing the height of the figure. + // Integer object representing the height of the figure. PyObject* pFigHeight = PyLong_FromLong(figHeight); PyTuple_SetItem(pArgs, 5, pFigHeight); - // Call the function by passing the reference to function & tuple holding arguments. + // Call the function by passing the reference to function & tuple holding arguments. pValue = PyObject_CallObject(pFunc, pArgs); return 0; @@ -227,42 +227,42 @@ int lmplot(const std::string& fname, { // PyObject contains info Python needs to treat a pointer to an object as an object. - // It contains object's reference count and pointer to corresponding object type. 
+ // It contains object's reference count and pointer to corresponding object type. PyObject *pName, *pModule, *pFunc, *pArgs, *pValue; - // Initialize Python Interpreter. + // Initialize Python Interpreter. Py_Initialize(); // Import sys module in Interpreter and add current path to python search path. PyRun_SimpleString("import sys"); PyRun_SimpleString("sys.path.append(\"../utils/\")"); - // Import the Python module. + // Import the Python module. pName = PyUnicode_DecodeFSDefault("plot"); pModule = PyImport_Import(pName); - // Get the reference to Python Function to call. + // Get the reference to Python Function to call. pFunc = PyObject_GetAttrString(pModule, "clmplot"); - // Create a tuple object to hold the arguments for function call. + // Create a tuple object to hold the arguments for function call. pArgs = PyTuple_New(4); - // String object representing the name of the dataset to be loaded. - PyObject* pFname = PyString_FromString(fname.c_str()); + // String object representing the name of the dataset to be loaded. + PyObject* pFname = PyUnicode_FromString(fname.c_str()); PyTuple_SetItem(pArgs, 0, pFname); - // String object representing the title of the figure. - PyObject* pFigTitle = PyString_FromString(figTitle.c_str()); + // String object representing the title of the figure. + PyObject* pFigTitle = PyUnicode_FromString(figTitle.c_str()); PyTuple_SetItem(pArgs, 1, pFigTitle); - // Integer object representing the width of the figure. + // Integer object representing the width of the figure. PyObject* pFigWidth = PyLong_FromLong(figWidth); PyTuple_SetItem(pArgs, 2, pFigWidth); - // Integer object representing the height of the figure. + // Integer object representing the height of the figure. PyObject* pFigHeight = PyLong_FromLong(figHeight); PyTuple_SetItem(pArgs, 3, pFigHeight); - // Call the function by passing the reference to function & tuple holding arguments. + // Call the function by passing the reference to function & tuple holding arguments. 
pValue = PyObject_CallObject(pFunc, pArgs); return 0; @@ -275,42 +275,42 @@ int histplot(const std::string& fname, { // PyObject contains info Python needs to treat a pointer to an object as an object. - // It contains object's reference count and pointer to corresponding object type. + // It contains object's reference count and pointer to corresponding object type. PyObject *pName, *pModule, *pFunc, *pArgs, *pValue; - // Initialize Python Interpreter. + // Initialize Python Interpreter. Py_Initialize(); // Import sys module in Interpreter and add current path to python search path. PyRun_SimpleString("import sys"); PyRun_SimpleString("sys.path.append(\"../utils/\")"); - // Import the Python module. + // Import the Python module. pName = PyUnicode_DecodeFSDefault("plot"); pModule = PyImport_Import(pName); - // Get the reference to Python Function to call. + // Get the reference to Python Function to call. pFunc = PyObject_GetAttrString(pModule, "chistplot"); - // Create a tuple object to hold the arguments for function call. + // Create a tuple object to hold the arguments for function call. pArgs = PyTuple_New(4); - // String object representing the name of the dataset to be loaded. - PyObject* pFname = PyString_FromString(fname.c_str()); + // String object representing the name of the dataset to be loaded. + PyObject* pFname = PyUnicode_FromString(fname.c_str()); PyTuple_SetItem(pArgs, 0, pFname); - // String object representing the title of the figure. - PyObject* pFigTitle = PyString_FromString(figTitle.c_str()); + // String object representing the title of the figure. + PyObject* pFigTitle = PyUnicode_FromString(figTitle.c_str()); PyTuple_SetItem(pArgs, 1, pFigTitle); - // Integer object representing the width of the figure. + // Integer object representing the width of the figure. PyObject* pFigWidth = PyLong_FromLong(figWidth); PyTuple_SetItem(pArgs, 2, pFigWidth); - // Integer object representing the height of the figure. 
+ // Integer object representing the height of the figure. PyObject* pFigHeight = PyLong_FromLong(figHeight); PyTuple_SetItem(pArgs, 3, pFigHeight); - // Call the function by passing the reference to function & tuple holding arguments. + // Call the function by passing the reference to function & tuple holding arguments. pValue = PyObject_CallObject(pFunc, pArgs); return 0; diff --git a/utils/plot.py b/utils/plot.py index adebb5c0..19bf87ca 100644 --- a/utils/plot.py +++ b/utils/plot.py @@ -2,21 +2,21 @@ import matplotlib.pyplot as plt import seaborn as sns -def cscatter(filename: str, - xCol: str, +def cscatter(filename: str, + xCol: str, yCol: str, - dateCol:str = None, - maskCol: str = None, - type_: str = None, + dateCol:str = None, + maskCol: str = None, + type_: str = None, color: str = None, - xLabel: str = None, - yLabel: str = None, - figTitle: str = None, - figWidth: int = 26, + xLabel: str = None, + yLabel: str = None, + figTitle: str = None, + figWidth: int = 26, figHeight: int = 7) -> None: """ Creates a scatter plot of size figWidth & figHeight, named figTitle and saves it. - + Parameters: filename (str): Name of the dataset to load. xCol (str): Name of the feature in dataset to plot against X axis. @@ -31,7 +31,7 @@ def cscatter(filename: str, figTitle (str): Title for the figure to be save; defaults to None. figWidth (int): Width of the figure; defaults to 26. figHeight (int): Height of the figure; defaults to 7. - + Returns: (None): Function does not return anything. """ @@ -58,16 +58,16 @@ def cscatter(filename: str, plt.savefig(f"{figTitle}.png") plt.close() -def cbarplot(filename: str, - x: str, - y: str, - dateCol: str = None, - figTitle: str = None, - figWidth: int = 5, +def cbarplot(filename: str, + x: str, + y: str, + dateCol: str = None, + figTitle: str = None, + figWidth: int = 5, figHeight: int = 7) -> None: """ Creates a bar plot of size figWidth & figHeight, named figTitle between x & y. 
- + Parameters: filename (str): Name of the dataset to load. x (str): Name of the feature in dataset to plot against X axis. @@ -91,16 +91,16 @@ def cbarplot(filename: str, plt.title(figTitle) plt.savefig(f"{figTitle}.png") plt.close() - -def cheatmap(filename: str, - cmap: str, - annotate: bool, - figTitle: str, - figWidth: int = 12, + +def cheatmap(filename: str, + cmap: str, + annotate: bool, + figTitle: str, + figWidth: int = 12, figHeight: int = 6) -> None: """ Creates a heatmap (correlation map) of the dataset and saves it. - + Parameters: filename (str): Name of the dataset to load. cmap (str): Name of the color map to be used for plotting. @@ -120,14 +120,14 @@ def cheatmap(filename: str, plt.title(figTitle) plt.savefig(f"{figTitle}.png") plt.close() - -def clmplot(filename: str, - figTitle: str = None, - figWidth: int = 6, + +def clmplot(filename: str, + figTitle: str = None, + figWidth: int = 6, figHeight: int = 7) -> None: """ Generates a regression plot on the given dataset and saves it. - + Parameters: filename (str): Name of the dataset to load. figTitle (str): Title for the figure to be save; defaults to None. @@ -144,14 +144,14 @@ def clmplot(filename: str, ax = sns.lmplot(x="Y_Test", y="Y_Preds", data=df) plt.savefig(f"{figTitle}.png") plt.close() - -def chistplot(filename: str, - figTitle: str = None, - figWidth: int = 6, + +def chistplot(filename: str, + figTitle: str = None, + figWidth: int = 6, figHeight: int = 4) -> None: """ Generated a histogram on the given dataset and saves it. - + Parameters: filename (str): Name of the dataset to load. figTitle (str): Title for the figure to be save; defaults to None. @@ -169,4 +169,4 @@ def chistplot(filename: str, plt.title(f"{figTitle}") plt.savefig(f"{figTitle}.png") plt.close() - +