This project predicts housing prices from features such as median income, house age, and average number of rooms. It trains a Linear Regression model with scikit-learn and evaluates it using the Mean Squared Error (MSE).
- Overview
- Features
- Technologies Used
- How It Works
- Setup and Installation
- Usage
- Output
- Future Improvements
- Acknowledgments
Overview
The Housing Prices Prediction Project applies machine learning to the California Housing Dataset to predict housing prices. It preprocesses the data, trains a model, evaluates its performance, and saves the model for future predictions.
Features
- Data Preprocessing:
  - Standardizes numerical features using StandardScaler for consistent model training.
- Model Training:
  - A pipeline integrates preprocessing with a Linear Regression model.
- Model Evaluation:
  - Calculates the Mean Squared Error (MSE) on the test set to assess model accuracy.
- Model Persistence:
  - Saves the trained model with joblib for reuse.
- Prediction:
  - Accepts new input data and predicts housing prices.
Technologies Used
- Python (Core Language)
- NumPy and pandas (Data Handling)
- scikit-learn (Modeling, Preprocessing, Evaluation)
- joblib (Model Serialization)
How It Works
- Dataset:
  - The California Housing Dataset is loaded using fetch_california_housing.
  - The features and the target (median house value, expressed in units of $100,000) are extracted.
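The loading step might look like the minimal sketch below. It assumes scikit-learn's fetch_california_housing is called with return_X_y=True and as_frame=True so the features come back as a pandas DataFrame; the helper name load_dataset is hypothetical and not taken from the project's script.

```python
from sklearn.datasets import fetch_california_housing

# The eight feature columns provided by the California Housing Dataset
FEATURE_NAMES = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def load_dataset():
    """Hypothetical helper: return features X (DataFrame) and target y (Series).

    The first call downloads the dataset and caches it under
    scikit-learn's data home directory; later calls read the cache.
    """
    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    # y holds the median house value in units of $100,000
    return X[FEATURE_NAMES], y
```

Calling X, y = load_dataset() returns all 20,640 districts in the dataset, ready to be split.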
- Data Splitting:
  - The dataset is split into training (80%) and testing (20%) subsets.
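An 80/20 split like the one described can be sketched with scikit-learn's train_test_split. The synthetic arrays stand in for the real features, and random_state=42 is an assumed seed added for reproducibility, not a value confirmed by the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples with 8 features (like the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.normal(size=100)

# Hold out 20% of the rows for testing; random_state=42 is an assumed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (80, 8) (20, 8)
```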
- Preprocessing:
  - Numerical features are standardized using StandardScaler within a ColumnTransformer.
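A minimal sketch of this preprocessing step, assuming the numeric columns are passed to a single StandardScaler inside a ColumnTransformer. The tiny DataFrame and its values are illustrative only; the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Small illustrative frame using three of the dataset's feature names
df = pd.DataFrame({
    "MedInc": [2.0, 4.0, 6.0, 8.0],
    "HouseAge": [10.0, 20.0, 30.0, 40.0],
    "AveRooms": [4.0, 5.0, 6.0, 7.0],
})
numeric_features = list(df.columns)

# Apply StandardScaler to the numeric columns; any other columns are dropped
preprocessor = ColumnTransformer(
    transformers=[("num", StandardScaler(), numeric_features)],
    remainder="drop",
)

scaled = preprocessor.fit_transform(df)

# Each scaled column now has mean 0 and (population) standard deviation 1
print(np.round(scaled.mean(axis=0), 6))
print(np.round(scaled.std(axis=0), 6))
```

Using a ColumnTransformer rather than a bare StandardScaler leaves room to add other transformers (for example, for categorical columns) without changing the rest of the pipeline.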
- Model Training:
  - A Linear Regression model is trained on the preprocessed training data.
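Chaining the preprocessor and the regressor in a Pipeline could look like the sketch below. The step names "preprocessor" and "regressor" and the synthetic, exactly linear training data are assumptions for illustration; the project itself trains on the California data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data with a known linear relationship (illustrative only)
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(200, 2)), columns=["MedInc", "HouseAge"])
y_train = 1.5 + 0.8 * X_train["MedInc"] - 0.3 * X_train["HouseAge"]

preprocessor = ColumnTransformer(
    transformers=[("num", StandardScaler(), ["MedInc", "HouseAge"])]
)

# Chain preprocessing and regression so both are fitted together
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("regressor", LinearRegression()),
])
model.fit(X_train, y_train)

# The data are noise-free, so the R^2 score should be essentially 1.0
print(model.score(X_train, y_train))
```

Because the scaler lives inside the pipeline, it is fitted only on the training data, and the same scaling is reapplied automatically at prediction time.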
- Evaluation:
  - The model predicts prices for the test set, and the Mean Squared Error (MSE) is computed.
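The MSE is the average of the squared differences between true and predicted values. The small sketch below uses made-up prices (not project results) to show that mean_squared_error matches the hand computation.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted prices, in units of $100,000
y_true = [2.0, 3.0, 1.5]
y_pred = [2.5, 2.0, 1.5]

# MSE = mean of squared differences = (0.25 + 1.0 + 0.0) / 3
mse = mean_squared_error(y_true, y_pred)
manual = np.mean((np.array(y_true) - np.array(y_pred)) ** 2)

print(f"Mean Squared Error: {mse:.4f}")  # Mean Squared Error: 0.4167
```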
- Saving and Loading the Model:
  - The trained model is saved as housing_prices_model.joblib for future predictions.
  - The saved model is reloaded to predict prices for new data.
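A minimal save-and-reload round trip with joblib.dump and joblib.load is sketched below. A tiny stand-in model and a temporary directory are used so the block runs on its own; the project would instead dump its full trained pipeline to housing_prices_model.joblib.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny stand-in model (the project saves its trained pipeline instead)
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(X, y)

# Write to a temporary directory here; the project writes
# housing_prices_model.joblib in its working directory
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "housing_prices_model.joblib")
joblib.dump(model, path)

# Reload the model and check that it predicts exactly like the original
restored = joblib.load(path)
print(restored.predict([[4.0]]))
```

Saving the whole pipeline, rather than the regressor alone, keeps the fitted scaler and the model together, so reloaded predictions use the same preprocessing as training.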
Setup and Installation
- Clone the Repository:
  git clone https://github.com/your-repo/housing-prices-prediction.git
  cd housing-prices-prediction
- Install Dependencies:
  Ensure Python 3.6+ is installed, then install the required libraries:
  pip install numpy pandas scikit-learn joblib
- Run the Script:
  Execute the Python script:
  python main.py
Usage
- Run the Script:
  Run main.py to train the model, evaluate its performance, and save it for reuse.
- Predict New Prices:
  Modify the new_house DataFrame in the script with the desired input features, then load the saved model and predict the price of the new house.
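Building a new_house DataFrame and turning the model's output into a dollar price could look like the sketch below. It assumes the saved model is a pipeline trained on all eight dataset features and that the target keeps the dataset's units of $100,000; the feature values are illustrative and predict_price is a hypothetical helper. To keep the block self-contained, a stand-in model fitted on synthetic data replaces joblib.load("housing_prices_model.joblib").

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
            "Population", "AveOccup", "Latitude", "Longitude"]

# Illustrative input values for one house (not taken from the project)
new_house = pd.DataFrame([{
    "MedInc": 5.0, "HouseAge": 25.0, "AveRooms": 6.0, "AveBedrms": 1.0,
    "Population": 1000.0, "AveOccup": 3.0, "Latitude": 34.0, "Longitude": -118.0,
}])


def predict_price(model, house):
    """Hypothetical helper: return the predicted price in dollars."""
    # Assumes the target is in units of $100,000, so scale the raw prediction
    return float(model.predict(house[FEATURES])[0]) * 100_000


# Stand-in for joblib.load("housing_prices_model.joblib")
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(50, 8)), columns=FEATURES)
y_demo = rng.normal(loc=2.0, size=50)
model = Pipeline([("scaler", StandardScaler()), ("regressor", LinearRegression())])
model.fit(X_demo, y_demo)

price = predict_price(model, new_house)
print(f"Predicted Price for the New House: ${price:.2f}")
```

Selecting house[FEATURES] before predicting guards against the input columns being supplied in a different order than the model was trained on.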
Output
Mean Squared Error: evaluates model accuracy on the test data.
Sample output:
Mean Squared Error on Test Data: 0.47
Predicted Price for the New House: $237500.00
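Because the MSE is expressed in squared target units, taking its square root makes the sample figure easier to interpret. Assuming the script does not rescale the target, so it stays in the dataset's units of $100,000, the reported MSE corresponds to a root mean squared error of roughly $68,600:

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{0.47} \approx 0.686
\quad\Rightarrow\quad 0.686 \times \$100{,}000 \approx \$68{,}600
```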
Future Improvements
- Experiment with more advanced models such as Random Forest or Gradient Boosting.
- Tune hyperparameters to optimize the model.
- Apply feature engineering to improve accuracy.
- Add support for categorical and text features, for example with OneHotEncoder for categorical columns and CountVectorizer for text.
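The first two improvements could be combined as in the sketch below, which swaps LinearRegression for RandomForestRegressor inside a pipeline and tunes it with GridSearchCV. The synthetic data and the parameter grid are illustrative assumptions, not tuned values for this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic non-linear data standing in for the housing features
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

# Same pipeline shape as the project, with the regressor swapped out
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", RandomForestRegressor(random_state=0)),
])

# Illustrative grid; the "regressor__" prefix routes each parameter to that step
param_grid = {
    "regressor__n_estimators": [50, 100],
    "regressor__max_depth": [None, 5],
}

# Scoring is negated MSE because GridSearchCV maximizes its score
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # best cross-validated MSE
```

Standardization is harmless but unnecessary for tree ensembles; it is kept here only to mirror the project's pipeline, so the same structure can also accept other regressors.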
Acknowledgments
- scikit-learn: for providing the dataset and ML tools.
- joblib: for efficient model persistence.
- NumPy and pandas: for data manipulation.