This project implements a simple Multiple Linear Regression model from scratch in Python using NumPy and Pandas. The model is trained on housing data (`houses.csv`) to predict a target variable (e.g., house prices) based on selected features (e.g., size, number of rooms).
The project demonstrates how to:
- Load and preprocess data using Pandas.
- Split data into training and testing sets.
- Build and train a linear regression model using gradient descent.
- Evaluate the model by predicting on test data.
- `houses.csv`: The dataset used for training and testing the model.
- `linear_regression.py`: The Python script containing the implementation of the linear regression model.
- `README.md`: Documentation for the project.
The dataset (`houses.csv`) is loaded using `pandas.read_csv`. The target variable `y` is extracted as the first column, while the features `X` are selected as the last two columns. The dataset is split into training (80%) and testing (20%) sets.
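A minimal sketch of the loading and splitting step described above. The column names and values are made up so the snippet runs without the real file, and the shuffled split with a fixed seed is an assumption; the script may split the rows differently.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for houses.csv; the real file's columns are not shown here.
csv_text = """price,age,size,rooms
200,10,80,3
250,5,100,4
180,20,70,2
300,2,120,5
220,8,90,3
270,4,110,4
190,15,75,2
310,1,125,5
240,6,95,3
260,7,105,4
"""

# In the project this would be pd.read_csv("houses.csv").
df = pd.read_csv(io.StringIO(csv_text))

y = df.iloc[:, 0].to_numpy(dtype=float)    # target: first column
X = df.iloc[:, -2:].to_numpy(dtype=float)  # features: last two columns

# 80/20 split on shuffled row indices (shuffling is an assumption).
rng = np.random.default_rng(0)
indices = rng.permutation(len(df))
split = int(0.8 * len(df))
train_idx, test_idx = indices[:split], indices[split:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```

Fixing the seed keeps the split reproducible between runs, which makes it easier to compare different learning rates or epoch counts.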
The `MultipleLinearRegression` class includes:
- Initialization (`__init__`): Sets the learning rate and initializes weights and bias.
- Forward Pass (`Z`): Calculates predictions as the dot product of input features and weights, plus bias.
- Cost Function (`cost_function`): Computes the Mean Squared Error (MSE).
- Gradient Calculation (`gradient`): Derives gradients for weights and bias using partial derivatives of MSE.
- Training (`fit`): Updates weights and bias iteratively using gradient descent.
- Prediction (`predict`): Predicts the target values for new input data.
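The methods listed above could be sketched roughly as follows. This is an illustrative outline, not the exact code in `linear_regression.py`: the `n_features` argument, the zero initialization, the default learning rate, and the attribute names `w` and `b` are assumptions.

```python
import numpy as np


class MultipleLinearRegression:
    def __init__(self, n_features, lr=0.01):
        # n_features and zero initialization are assumptions for this sketch.
        self.lr = lr
        self.w = np.zeros(n_features)
        self.b = 0.0

    def Z(self, X):
        # Forward pass: dot product of features and weights, plus bias.
        return X @ self.w + self.b

    def cost_function(self, X, y):
        # Mean Squared Error between predictions and targets.
        return float(np.mean((self.Z(X) - y) ** 2))

    def gradient(self, X, y):
        # Partial derivatives of the MSE with respect to w and b.
        error = self.Z(X) - y
        dw = 2.0 * X.T @ error / len(y)
        db = 2.0 * float(np.mean(error))
        return dw, db

    def fit(self, X_train, y_train, epochs):
        # Plain batch gradient descent: one update per epoch.
        for _ in range(epochs):
            dw, db = self.gradient(X_train, y_train)
            self.w -= self.lr * dw
            self.b -= self.lr * db
        return self

    def predict(self, X_test):
        return self.Z(X_test)


# Synthetic, noise-free data so the learned parameters can be checked.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

model = MultipleLinearRegression(n_features=2, lr=0.1).fit(X, y, epochs=5000)
print(model.w, model.b, model.cost_function(X, y))
```

On this synthetic data the weights should approach 3 and -2 and the bias should approach 5. Real housing features usually sit on much larger scales, so a far smaller learning rate may be needed there.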
The model is trained on the training set using `fit`. The predictions on the test set are then rounded to simplify the output.
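As a small illustration of the rounding step, the sketch below uses made-up prediction and target values. Computing the test MSE on the unrounded predictions is an optional extra and an assumption here, since the script is only described as printing the rounded predictions.

```python
import numpy as np

# Made-up values standing in for model.predict(X_test) and y_test.
y_pred = np.array([201.37, 149.82, 310.49])
y_test = np.array([200.0, 150.0, 312.0])

# Round to whole numbers for a more readable console output.
rounded = np.round(y_pred)
print(rounded)

# Compute the error on the unrounded predictions so rounding does not distort it.
test_mse = float(np.mean((y_pred - y_test) ** 2))
print(test_mse)
```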
To run the project, you need:
- Python 3.7+
- NumPy
- Pandas
Install the dependencies using:
pip install numpy pandas
Clone the repository:
git clone https://github.com/your-username/linear-regression.git
cd linear-regression
Place your dataset in the same directory and name it `houses.csv`, then run the script:
python linear_regression.py
View the model predictions printed to the console.
- `fit(X_train, y_train, epochs)`: Trains the model using the provided training data and the specified number of epochs.
- `predict(X_test)`: Returns predictions for the test set.
- Ensure the dataset `houses.csv` is correctly formatted with numerical columns, where the first column is the target variable.
- Adjust the learning rate `lr` and `epochs` to fine-tune model performance.
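To get a feel for how `lr` and `epochs` interact, one option is to compare the final training MSE for a few learning rates. The sketch below uses a hypothetical helper, `final_mse`, and synthetic data; it re-implements the gradient descent loop in compact form and is not part of the project's script.

```python
import numpy as np


def final_mse(lr, epochs, X, y):
    """Run batch gradient descent from zero weights and return the final MSE."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        w -= lr * 2.0 * X.T @ error / len(y)
        b -= lr * 2.0 * error.mean()
    return float(np.mean((X @ w + b - y) ** 2))


# Synthetic data with features in [0, 1]; real housing data will behave differently.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

for lr in (0.001, 0.01, 0.1):
    print(lr, final_mse(lr, epochs=500, X=X, y=y))
```

With a fixed epoch budget, a learning rate that is too small leaves the model far from converged, while one that is too large can make the updates diverge. Trying a few values on a logarithmic grid is a common starting point.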
- This project is open-source under the MIT License.
- Feel free to contribute or suggest improvements! 😊