
Multi Regression Car Price Predictor based on a Kaggle dataset using Grid search and optimized hyperparameters


Jackhammer9/Car-Price-Predictor


A machine learning project that predicts used car prices based on various features such as make, model, year, engine capacity, and more.



Overview

Predicting the price of used cars is a valuable exercise for both car dealerships and individual car owners. By analyzing historical data, we can use machine learning techniques to estimate the market value of a used car based on features like make, model, mileage, engine capacity, fuel type, and more.

This project showcases:

  • End-to-end data handling (cleaning, feature engineering)
  • Multiple regression algorithms (e.g., Linear Regression, Random Forest, XGBoost)
  • Model tuning and evaluation

Goal: Create a predictive model that accurately estimates a car’s price given its attributes.
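"Accurately" needs a yardstick. The README does not say which metrics were used, so the sketch below assumes three common regression metrics (MAE, RMSE, and R²) and computes them with NumPy. The function name `regression_report` is hypothetical and not part of this repository.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R^2 for a set of price predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    ss_res = float(np.sum(errors ** 2))                      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares

    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Illustrative numbers only, not results from this project.
report = regression_report([2.0, 4.0, 6.0], [2.0, 5.0, 5.0])
print(report)
```

MAE and RMSE are in the same units as the price column, which makes them easy to explain to a dealership or car owner; R² is unitless and handy for comparing models on the same split.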


Dataset

Source: Car Price Prediction Dataset by Hellbuoy on Kaggle (https://www.kaggle.com/datasets/hellbuoy/car-price-prediction)

This dataset contains various features like:

  • Car name (brand/model)
  • Year of manufacture
  • Selling price
  • Present price (original price)
  • Kilometers driven
  • Fuel type, Seller type, Transmission
  • And more…

Note: Please check the dataset’s licensing and usage permissions before commercial use.
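A minimal loading sketch with pandas follows. The column names and the three sample rows are assumptions made up for illustration, based on the feature list above; check them against the actual CSV header. In practice you would pass the file path (for example `data/Car-Data.csv`) instead of the in-memory sample.

```python
import io

import pandas as pd

# Made-up sample rows; column names are assumed, not taken from the real file.
SAMPLE_CSV = """Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission
alpha one,2014,3.0,5.0,27000,Petrol,Dealer,Manual
beta two,2013,4.5,9.0,43000,Diesel,Dealer,Manual
gamma three,2017,7.0,10.0,7000,Petrol,Individual,Automatic
"""

# Columns the rest of the analysis would rely on (assumed names).
EXPECTED_COLUMNS = {"car_name", "year", "selling_price", "kms_driven"}


def load_cars(source):
    """Read the CSV, lowercase the column names and check the schema."""
    df = pd.read_csv(source)
    df.columns = [c.strip().lower() for c in df.columns]
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return df


cars = load_cars(io.StringIO(SAMPLE_CSV))
print(cars.dtypes)
```

Failing early on a schema mismatch is a cheap way to catch the case where a different version of the dataset was downloaded.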


Project Structure

  • data/: Contains raw or preprocessed data (or a README with a link to the dataset).
  • notebooks/: Jupyter notebooks for EDA, model training, and experiments.
  • src/: Python scripts for data preprocessing, modeling, etc.
  • models/: Serialized model files for quick loading.
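As a sketch of what "serialized model files" can look like, the example below saves and reloads a fitted scikit-learn estimator with joblib. The README does not state which serialization library the project uses, so joblib is an assumption, and the file name `price_model.joblib` is hypothetical. A temporary directory stands in for `models/` so the snippet leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.linear_model import LinearRegression

# Tiny stand-in model: y = 2x + 1 on three points.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "price_model.joblib"  # hypothetical file name
    joblib.dump(model, path)                 # write the fitted model to disk
    restored = joblib.load(path)             # load it back without retraining

prediction = float(restored.predict([[3.0]])[0])
print(prediction)
```

Only load model files from sources you trust: joblib files are pickle-based, and unpickling can execute arbitrary code.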

Getting Started

  1. Clone the Repository

    git clone https://github.com/Jackhammer9/Car-Price-Predictor.git
    cd Car-Price-Predictor
    
  2. Create a Virtual Environment (Optional but Recommended)

    Using conda

    conda create -n car-price-predictor python=3.8
    conda activate car-price-predictor
    

    or using venv

    python -m venv env
    source env/bin/activate  # on Windows: env\Scripts\activate
    
  3. Install Dependencies

    pip install -r requirements.txt
    
  4. Download the Dataset

    If the dataset is not included in the repository, download it from Kaggle (https://www.kaggle.com/datasets/hellbuoy/car-price-prediction) and place it in the data/ folder (e.g., Car-Data.csv).


EDA (Exploratory Data Analysis)

During EDA, we examine:

  • Missing values and possible imputation strategies
  • Distribution of numeric variables (mileage, price, etc.)
  • Categorical variable analysis (fuel type, seller type, etc.)
  • Correlation between features and target (selling price)
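The four checks above can be sketched in a single pandas helper. This is an illustrative outline, not the notebook code from this repository; the function name `eda_summary`, the column names, and the toy values are all assumptions.

```python
import pandas as pd


def eda_summary(df, target):
    """Collect missing counts, numeric stats, category counts and target correlations."""
    missing = df.isna().sum()
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    return {
        # Only columns that actually have gaps
        "missing": missing[missing > 0],
        # count / mean / std / quartiles per numeric column
        "numeric_stats": numeric.describe().T,
        # Frequency of each level per categorical column
        "category_counts": {c: categorical[c].value_counts() for c in categorical.columns},
        # Pearson correlation of every numeric feature with the target
        "target_corr": numeric.corr()[target].drop(target).sort_values(ascending=False),
    }


# Toy frame with made-up values and assumed column names.
toy = pd.DataFrame(
    {
        "kms_driven": [10000, 20000, None, 40000],
        "selling_price": [8.0, 6.0, 5.0, 2.0],
        "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol"],
    }
)
summary = eda_summary(toy, target="selling_price")
print(summary["missing"])
print(summary["target_corr"])
```

Pearson correlation only captures linear association, so a weak value here does not rule out a useful nonlinear relationship with price.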

Modeling Approach

We tried multiple algorithms to find the best performer:

  1. Linear Regression

    • Pros: Interpretable, fast to train
    • Cons: May not capture nonlinear relationships well
  2. Random Forest

    • Pros: Handles nonlinearities, robust to outliers, can measure feature importance
    • Cons: Can be slower, may overfit if not tuned properly
  3. XGBoost

    • Pros: Often achieves high accuracy on tabular data, can handle missing data well
    • Cons: Tuning can be more involved
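To make the linear-versus-nonlinear trade-off concrete, the sketch below compares Linear Regression and Random Forest with 5-fold cross-validation on synthetic data that contains a squared term. The data is generated for illustration and has nothing to do with the car dataset. XGBoost is left out to avoid an extra dependency; if the `xgboost` package is installed, its `XGBRegressor` could be added to the dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic target with a quadratic effect that a straight line cannot fit.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(300, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0.0, 0.2, size=300)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean cross-validated R^2 per model (higher is better).
scores = {
    name: float(cross_val_score(model, X, y, cv=5, scoring="r2").mean())
    for name, model in models.items()
}
print(scores)
```

On data like this the forest should score well above the linear model; on a dataset where price is close to linear in the features, the gap can shrink or reverse.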

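Hyperparameter Tuning: We tuned each model with scikit-learn's GridSearchCV or RandomizedSearchCV, searching over parameters such as max_depth, n_estimators, and learning_rate. The search returns the best combination among the values tried, scored by cross-validation.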
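A minimal sketch of both search strategies on a Random Forest is shown below, using synthetic data from `make_regression`. The parameter grids are assumptions chosen to keep the example fast, not the grids used in this project. `learning_rate` does not appear because it applies to boosting models such as XGBoost rather than to Random Forest.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=150, n_features=6, noise=10.0, random_state=0)

# Exhaustive search: every combination in the grid is cross-validated.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [3, None]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)

# Randomized search: n_iter combinations are sampled from the distributions.
rand = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": randint(20, 80), "max_depth": [3, 5, None]},
    n_iter=5,
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
rand.fit(X, y)

# Scores are negated RMSE, so flip the sign to read them as errors.
print("grid:", grid.best_params_, -grid.best_score_)
print("random:", rand.best_params_, -rand.best_score_)
```

Grid search is reasonable when the grid is small; randomized search tends to scale better when there are many parameters or wide ranges, since its cost is fixed by `n_iter`.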


Future Improvements

  1. Advanced Feature Engineering:

    • Derived features like car age, brand-specific average prices, etc.
  2. Ensemble Methods:

    • Combine multiple models (e.g., stacking) for improved performance.
  3. Deep Learning:

    • Experiment with neural networks on tabular data (though benefits may vary).
  4. Deployment:

    • Containerize with Docker or deploy to AWS/Azure/GCP.
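For the first item, a rough sketch of the two derived features mentioned (car age and a brand-level average price) could look like this. Nothing here is implemented in the repository; the function name `add_derived_features`, the column names, and the toy rows are hypothetical, and the brand is assumed to be the first word of the car name.

```python
import pandas as pd


def add_derived_features(df, reference_year):
    """Add car_age, brand and brand_avg_price columns to a copy of df."""
    out = df.copy()
    # Age relative to a fixed reference year keeps results reproducible.
    out["car_age"] = reference_year - out["year"]
    # Assumption: the brand is the first token of the car name.
    out["brand"] = out["car_name"].str.split().str[0].str.lower()
    # Mean price per brand. In a real pipeline, compute this on the training
    # split only, otherwise the target leaks into the features.
    out["brand_avg_price"] = out.groupby("brand")["selling_price"].transform("mean")
    return out


# Made-up rows for illustration.
toy = pd.DataFrame(
    {
        "car_name": ["Honda City", "honda jazz", "Maruti Ritz"],
        "year": [2015, 2018, 2012],
        "selling_price": [6.0, 8.0, 3.0],
    }
)
features = add_derived_features(toy, reference_year=2020)
print(features[["car_age", "brand", "brand_avg_price"]])
```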

Contributing

Contributions, issues, and feature requests are welcome! Feel free to fork this repo and submit a pull request, or open an issue.


License

Distributed under the MIT License. See LICENSE for more information.
