
A project predicting housing prices using the California Housing Dataset with Linear Regression, featuring data preprocessing pipelines, model training, and future prediction capabilities.


siddhinarayan09/house-prices-prediction


Housing Prices Prediction Project

This project predicts housing prices from features such as median income, house age, and average number of rooms. It trains a Linear Regression model and reports the Mean Squared Error on a held-out test set.




Overview

The Housing Prices Prediction Project applies machine learning to analyze the California Housing Dataset and predict housing prices. It preprocesses data, trains a model, evaluates performance, and saves the model for future predictions.


Features

  • Data Preprocessing:
    • Standardizes numerical features using StandardScaler for consistent model training.
  • Model Training:
    • A pipeline integrates preprocessing with a Linear Regression model.
  • Model Evaluation:
    • Calculates Mean Squared Error (MSE) on the test set to measure prediction error (lower is better).
  • Model Persistence:
    • Saves the trained model with joblib for reuse.
  • Prediction:
    • Accepts new input data and predicts housing prices.
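The standardization listed under Data Preprocessing can be illustrated without scikit-learn. The sketch below is not taken from the project script; `standardize` is a hypothetical helper that mirrors what StandardScaler does to a single column: subtract the mean and divide by the population standard deviation.

```python
import numpy as np


def standardize(column):
    """Rescale one numeric column to zero mean and unit variance."""
    values = np.asarray(column, dtype=float)
    mean = values.mean()
    std = values.std()  # population standard deviation (ddof=0), as StandardScaler uses
    if std == 0:
        # A constant column carries no spread; StandardScaler leaves it centered at zero.
        return values - mean
    return (values - mean) / std


# Toy values for illustration only, not rows from the dataset.
scaled = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
print(scaled)
```

After scaling, every feature is on a comparable scale, which makes the fitted linear coefficients easier to compare across features.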

Technologies Used

  • Python (Core Language)
  • NumPy and pandas (Data Handling)
  • scikit-learn (Modeling, Preprocessing, Evaluation)
  • joblib (Model Serialization)

How It Works

  1. Dataset:

    • The California Housing Dataset is loaded using fetch_california_housing.
    • Features and target values (median house values, expressed in units of $100,000) are extracted.
  2. Data Splitting:

    • The dataset is split into training (80%) and testing (20%) subsets.
  3. Preprocessing:

    • Numerical features are standardized using StandardScaler within a ColumnTransformer.
  4. Model Training:

    • A Linear Regression model is trained using the preprocessed training data.
  5. Evaluation:

    • The model predicts prices for the test set, and the Mean Squared Error (MSE) is computed.
  6. Saving and Loading the Model:

    • The trained model is saved as housing_prices_model.joblib for future predictions.
    • The saved model is reloaded for predicting prices for new data.
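The six steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's main.py: the function names (`load_data`, `build_pipeline`, `train_and_evaluate`), the pipeline step names, and `random_state=42` are assumptions made for the example. Only the scikit-learn and joblib calls and the file name housing_prices_model.joblib come from the description above.

```python
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def load_data():
    """Step 1: return the features as a DataFrame and the target as a Series."""
    from sklearn.datasets import fetch_california_housing  # downloads on first use

    data = fetch_california_housing(as_frame=True)
    return data.data, data.target


def build_pipeline(numeric_columns):
    """Steps 3-4: standardize the numeric columns, then apply Linear Regression."""
    preprocessor = ColumnTransformer(
        transformers=[("num", StandardScaler(), list(numeric_columns))]
    )
    return Pipeline(
        steps=[("preprocess", preprocessor), ("regressor", LinearRegression())]
    )


def train_and_evaluate(X, y, model_path="housing_prices_model.joblib", random_state=42):
    """Steps 2-6: split 80/20, train, compute test MSE, save, and reload the model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state  # random_state is an assumption
    )
    model = build_pipeline(X.columns)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    joblib.dump(model, model_path)       # persist the whole pipeline, scaler included
    reloaded = joblib.load(model_path)   # the reloaded object predicts like the original
    return reloaded, mse
```

Calling `train_and_evaluate(*load_data())` would run the full flow on the real dataset. Because the scaler lives inside the pipeline, it is fitted on the training split only and is saved together with the regressor, so new inputs are scaled consistently at prediction time.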

Setup and Installation

  1. Clone the Repository:
    git clone https://github.com/siddhinarayan09/house-prices-prediction.git
    cd house-prices-prediction
  2. Install Dependencies: Ensure Python 3.6+ is installed, then install the required libraries:
    pip install numpy pandas scikit-learn joblib
  3. Run the Script: Execute the Python script:
    python main.py
    

Usage

Run the Script:

Train the model, evaluate its performance, and save it for reuse.

Predict New Prices:

Modify the new_house DataFrame in the script with the desired input features. Load the saved model and make predictions for the new house.
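A new_house DataFrame for this dataset might look like the sketch below. The column names are the eight feature names returned by fetch_california_housing; the values are invented for illustration, and `predict_price_usd` is a hypothetical helper, not a function from the script. It assumes the prediction is converted to dollars by multiplying by 100,000, since the dataset's target is expressed in units of $100,000.

```python
import pandas as pd

# Column names follow fetch_california_housing; the values are made up for illustration.
new_house = pd.DataFrame([{
    "MedInc": 5.0,        # median income in the block group
    "HouseAge": 20.0,     # median house age
    "AveRooms": 6.0,      # average rooms per household
    "AveBedrms": 1.0,     # average bedrooms per household
    "Population": 1000.0,
    "AveOccup": 3.0,      # average household members
    "Latitude": 34.0,
    "Longitude": -118.0,
}])


def predict_price_usd(model, house):
    """Predict with a fitted model and convert from units of $100,000 to dollars."""
    prediction = model.predict(house)[0]
    return float(prediction) * 100_000


# Usage, assuming the script has already written the model file:
#   import joblib
#   model = joblib.load("housing_prices_model.joblib")
#   print(f"Predicted Price for the New House: ${predict_price_usd(model, new_house):.2f}")
```

The DataFrame must use the same column names as the training data, because the ColumnTransformer selects columns by name.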


Output

The script prints the Mean Squared Error on the test data, followed by the predicted price for the new house:

Mean Squared Error on Test Data: 0.47
Predicted Price for the New House: $237500.00
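To read the MSE in more familiar terms, take its square root (the RMSE) and convert to dollars. The sketch below is an illustration only: both helper names are hypothetical, the toy values are invented, and the conversion assumes the MSE is measured in the dataset's target units of $100,000.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse_in_dollars(mse, unit=100_000):
    """Convert an MSE in target units squared to an RMSE in dollars."""
    return math.sqrt(mse) * unit


# Toy targets and predictions in units of $100,000 (not real results).
toy_mse = mean_squared_error_manual([2.0, 3.0, 1.5], [2.5, 2.5, 1.5])
print(f"Toy MSE: {toy_mse:.4f}")

# The MSE reported above, expressed as a typical error in dollars.
print(f"RMSE for MSE 0.47: ${rmse_in_dollars(0.47):,.0f}")
```

Under that assumption, an MSE of 0.47 corresponds to an RMSE of roughly $68,600.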

Future Improvements

Experiment with advanced models such as Random Forest or Gradient Boosting.

Conduct hyperparameter tuning to optimize the model.

Implement feature engineering to improve accuracy.

Add support for categorical and text features using encoders such as OneHotEncoder (categorical) and CountVectorizer (text).
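As one possible starting point for the first two items, the regressor could be swapped for a Random Forest and tuned with a small grid search. This is a sketch of a direction, not something the project implements: `build_forest_search`, the step names, and the parameter grid are all assumptions chosen for illustration.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_forest_search(cv=3):
    """Return a grid search over a scaled Random Forest pipeline."""
    pipeline = Pipeline(steps=[
        ("scale", StandardScaler()),
        ("regressor", RandomForestRegressor(random_state=42)),
    ])
    # A deliberately tiny, hypothetical grid; real tuning would explore more values.
    param_grid = {
        "regressor__n_estimators": [50, 100],
        "regressor__max_depth": [None, 10],
    }
    # scikit-learn maximizes scores, so MSE is exposed as its negative.
    return GridSearchCV(
        pipeline, param_grid, scoring="neg_mean_squared_error", cv=cv
    )
```

After `search = build_forest_search()` and `search.fit(X_train, y_train)`, `search.best_params_` holds the selected settings and `-search.best_score_` is the cross-validated MSE, which can be compared against the Linear Regression baseline.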


Acknowledgments

scikit-learn: For providing the dataset and ML tools.

joblib: For efficient model persistence.

NumPy and pandas: For data manipulation.
