Subject created by the 42AI association. Discover Data Science through this project, in which you recreate the Hogwarts (Poudlard) Sorting Hat. Warning: this is not a subject on cameras.
git clone https://github.com/pde-bakk/DSLR.git && cd DSLR
pip3 install -r requirements.txt
This program displays information about all numerical features of the provided dataset.
python3 srcs/describe.py datasets/dataset_train.csv
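As a rough illustration of what such a summary involves, here is a minimal sketch that computes the usual descriptive statistics for one column by hand. It is not taken from srcs/describe.py: the function names `percentile` and `describe_column`, the exact set of statistics, and the choice of linear interpolation for the quartiles are all assumptions.

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated percentile of a pre-sorted list, with q in [0, 1]."""
    position = (len(sorted_values) - 1) * q
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def describe_column(raw_values):
    """Summarise one column, skipping empty, non-numeric, and NaN cells.

    Hypothetical helper: the statistics reported here are an assumption.
    """
    values = []
    for cell in raw_values:
        try:
            number = float(cell)
        except (TypeError, ValueError):
            continue  # e.g. an empty cell or a house name
        if not math.isnan(number):
            values.append(number)
    if not values:
        return None  # nothing numeric: not a numerical feature
    values.sort()
    count = len(values)
    mean = sum(values) / count
    # Sample standard deviation (divides by count - 1); undefined for one value.
    if count > 1:
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / (count - 1))
    else:
        std = float("nan")
    return {
        "Count": count,
        "Mean": mean,
        "Std": std,
        "Min": values[0],
        "25%": percentile(values, 0.25),
        "50%": percentile(values, 0.50),
        "75%": percentile(values, 0.75),
        "Max": values[-1],
    }
```

Calling `describe_column` on each column read from the CSV (for example with `csv.DictReader`) and keeping the columns that do not return `None` would yield one summary per numerical feature.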
I created a set of scripts, each using a particular visualization method to answer a question.
These scripts require datasets/dataset_train.csv as a parameter to be able to answer the questions.
Which Hogwarts course has a homogeneous score distribution across all four houses?
python3 srcs/histogram.py datasets/dataset_train.csv
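To make the comparison behind a per-house histogram concrete, the sketch below bins each house's scores over a shared range and normalises the counts, so that houses of different sizes can be compared. It is an illustration only, not the code in srcs/histogram.py; the function name `house_histograms` and the input layout (a dict mapping each house to its list of scores for one course) are assumptions.

```python
def house_histograms(scores_by_house, bins=10):
    """Return, per house, the fraction of its scores in each shared bin.

    Hypothetical helper: scores_by_house maps a house name to that house's
    scores for a single course.
    """
    all_scores = [s for scores in scores_by_house.values() for s in scores]
    low, high = min(all_scores), max(all_scores)
    # Fall back to a width of 1.0 when every score is identical.
    width = (high - low) / bins or 1.0
    result = {}
    for house, scores in scores_by_house.items():
        counts = [0] * bins
        for score in scores:
            # The maximum score would land one past the end; keep it in the last bin.
            index = min(int((score - low) / width), bins - 1)
            counts[index] += 1
        total = len(scores) or 1  # avoid dividing by zero for an empty house
        result[house] = [count / total for count in counts]
    return result
```

A course whose normalised bin fractions look nearly the same for all four houses is a candidate answer to the question above; plotting those fractions as overlapping bars gives the histogram itself.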
Which two features are similar?
python3 srcs/scatter_plot.py datasets/dataset_train.csv
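A scatter plot shows similarity visually; as a numeric cross-check, one could compute the Pearson correlation between two features. The sketch below is an assumption about how that check might look, not something the repository is known to do, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences.

    Pairs in which either value is None (a missing score) are skipped.
    Returns NaN when the correlation is undefined.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant feature has no defined correlation
    return covariance / math.sqrt(var_x * var_y)
```

Two features whose correlation is close to 1 or -1 carry nearly the same information, which matches a scatter plot in which the points fall on a straight line.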
From this visualization, what features are you going to use for your logistic regression?
python3 srcs/pair_plot.py datasets/dataset_train.csv
First, train the models:
python3 srcs/logreg_train.py datasets/dataset_train.csv
This generates a datasets/weights file, which can then be used for the predictions.
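For readers unfamiliar with the technique, here is a minimal sketch of one-vs-all logistic regression trained with batch gradient descent, which is one common way to train one model per house. It is not the code in srcs/logreg_train.py: the function names, the hyperparameters, the assumption that features are already standardized, and the returned weight layout (`[bias, w1, ..., wn]` per class) are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, with z clamped to keep math.exp from overflowing."""
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_one_vs_all(rows, labels, learning_rate=0.1, epochs=1000):
    """Train one binary classifier per distinct label.

    rows:   list of feature sequences, assumed already standardized
    labels: list of class names, one per row
    Returns {class_name: [bias, w1, ..., wn]} (hypothetical layout).
    """
    n_features = len(rows[0])
    m = len(rows)
    weights = {}
    for target in sorted(set(labels)):
        theta = [0.0] * (n_features + 1)
        # 1.0 for rows of the target class, 0.0 for every other class.
        y = [1.0 if label == target else 0.0 for label in labels]
        for _ in range(epochs):
            gradient = [0.0] * (n_features + 1)
            for features, y_i in zip(rows, y):
                x = [1.0, *features]  # leading 1.0 multiplies the bias term
                prediction = sigmoid(sum(t * v for t, v in zip(theta, x)))
                error = prediction - y_i
                for j, value in enumerate(x):
                    gradient[j] += error * value
            # Batch update using the gradient averaged over all rows.
            theta = [t - learning_rate * g / m for t, g in zip(theta, gradient)]
        weights[target] = theta
    return weights
```

Each classifier answers "this house or not"; keeping one weight vector per house is what makes a later arg-max over the four probabilities possible.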
Then, run the predictions:
python3 srcs/logreg_predict.py datasets/dataset_test.csv datasets/weights
This generates datasets/houses.csv, a file containing all predictions.
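The prediction step of a one-vs-all classifier can be sketched as follows: score the student with each house's weights and keep the house with the highest probability. This is an illustration under assumptions, not the code in srcs/logreg_predict.py; the format of the datasets/weights file is not described here, so the in-memory layout `{house: [bias, w1, ..., wn]}` and the function name `predict_house` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, with z clamped to keep math.exp from overflowing."""
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_house(features, weights):
    """Return the house whose classifier gives the highest probability.

    features: one student's feature values, scaled the same way as in training
    weights:  {house: [bias, w1, ..., wn]} (hypothetical layout)
    """
    scores = {}
    for house, theta in weights.items():
        bias, coefficients = theta[0], theta[1:]
        z = bias + sum(w * x for w, x in zip(coefficients, features))
        scores[house] = sigmoid(z)
    return max(scores, key=scores.get)


# Toy weights for illustration only, not trained values.
example_weights = {"Gryffindor": [0.0, 2.0], "Slytherin": [0.0, -2.0]}
example_prediction = predict_house([1.5], example_weights)
```

Applying `predict_house` to every row of the test set and writing one line per student would produce a predictions file such as datasets/houses.csv.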