This is the "Pandas Express" submission for the Kaggle House Prices: Advanced Regression Techniques challenge, completed as part of the BMGT438A data science class. Note that the code in this repository relies on data provided by Kaggle, which has been removed from the repository's history. Please visit Kaggle to obtain the dataset.
Final presentation including analysis can be viewed here:
- Clean up the data by converting all of the categorical columns (e.g., Neighborhood) into a numeric format an ML model can read, using pd.get_dummies()
- Create an initial OLS (Ordinary Least Squares) model to check its R^2 value and see whether there are any other problems with the data
- Use scikit-learn's automated feature selection tools, specifically RFECV (Recursive Feature Elimination with Cross-Validation), to determine both how many features to use in the final model and which features those are
- Explore scikit-learn's univariate feature selection to see whether it performs better than RFECV
- Build the final model and analyze the residuals to look for outliers and see if there are any patterns in the model's inaccuracies
- Run the final model on the test dataset to predict prices needed for the final Kaggle submission
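
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: it uses a small synthetic dataset in place of the Kaggle data, scikit-learn's `LinearRegression` as the OLS model, and `SelectKBest` with `f_regression` as the univariate selector. The synthetic values, the choice of `k=3`, the standardization step (added so RFE's coefficient-based ranking is not skewed by feature scale), and the 3-standard-deviation outlier threshold are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Kaggle data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "OverallQual": rng.integers(1, 11, n),
    "Noise": rng.normal(0, 1, n),
    "Neighborhood": rng.choice(["NAmes", "CollgCr", "OldTown"], n),
})
data["SalePrice"] = (
    100 * data["GrLivArea"]
    + 15000 * data["OverallQual"]
    + data["Neighborhood"].map({"NAmes": 0, "CollgCr": 20000, "OldTown": -10000})
    + rng.normal(0, 10000, n)
)
train_raw = data.iloc[:160]
test_raw = data.iloc[160:].drop(columns="SalePrice")  # test set has no target

# Step 1: one-hot encode categorical columns with pd.get_dummies().
X_train = pd.get_dummies(train_raw.drop(columns="SalePrice"), drop_first=True, dtype=float)
y_train = train_raw["SalePrice"]
# Encode the test set, then align it to the training columns.
X_test = pd.get_dummies(test_raw, dtype=float).reindex(columns=X_train.columns, fill_value=0.0)

# Standardize with training statistics (an assumption of this sketch).
mu, sigma = X_train.mean(), X_train.std()
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma

# Step 2: baseline OLS fit and its in-sample R^2.
baseline_r2 = LinearRegression().fit(X_train_s, y_train).score(X_train_s, y_train)

# Step 3: RFECV picks the number of features and the features themselves.
rfecv = RFECV(LinearRegression(), step=1, cv=5, scoring="r2").fit(X_train_s, y_train)
rfecv_features = list(X_train_s.columns[rfecv.support_])

# Step 4: univariate selection, compared to RFECV by cross-validated R^2.
kbest = SelectKBest(f_regression, k=3).fit(X_train_s, y_train)
kbest_features = list(X_train_s.columns[kbest.get_support()])

def cv_r2(cols):
    """Mean 5-fold cross-validated R^2 of OLS on the given columns."""
    return cross_val_score(LinearRegression(), X_train_s[cols], y_train, cv=5, scoring="r2").mean()

rfecv_cv, kbest_cv = cv_r2(rfecv_features), cv_r2(kbest_features)

# Step 5: final model and residual analysis (3-sigma outlier rule is an assumption).
final_model = LinearRegression().fit(X_train_s[rfecv_features], y_train)
residuals = y_train - final_model.predict(X_train_s[rfecv_features])
outliers = residuals[np.abs(residuals) > 3 * residuals.std()]

# Step 6: predict prices for the test set.
predictions = final_model.predict(X_test_s[rfecv_features])

print("Baseline R^2:", round(baseline_r2, 3))
print("RFECV features:", rfecv_features, "CV R^2:", round(rfecv_cv, 3))
print("SelectKBest features:", kbest_features, "CV R^2:", round(kbest_cv, 3))
print("Residual outliers:", len(outliers))
```

With the real data, `train_raw` and `test_raw` would instead be loaded from the CSV files downloaded from Kaggle, and the predictions would be written out alongside each test row's identifier for submission.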