See our training notebook for our model training process and our evaluation notebook for our model evaluation work.
Tables showing train/test set time split boundaries are available here and here. The first is based on a temporal cross-validation approach that uses cumulative training sets. The second is based on an approach using only the previous year as a training set.
A feature list showing the features we included and wanted to include is available here.
We have recorded feature importance for every validation split. This file is an example.
The final list of block groups (to prioritize for intervention) is given here.
Other files are also available for each validation set; for example, see 2016. These include precision-recall curves for the best classifier and the baseline (which uses just the previous year's evictions as a feature); the list of selected block groups based on the best classifier and best regressor; feature importance based on the best classifier and best regressor; a one-level "stump" decision tree; and a map based on the blocks identified by the best classifier.
In order to run the project yourself and execute the training and evaluation notebooks, first you will need to install the Python dependencies:
$ pipenv install
Then you will need to run all the pre-defined tasks (see below for more on tasks):
$ doit
The project's Python dependencies are managed by pipenv. The dependencies are specified in a file called Pipfile.
If you do not already have pipenv installed, you can install it using pip:
$ pip install pipenv
Once you have installed pipenv, you can install all of the dependencies for this project in one go:
$ pipenv install
This will install the project's Python dependencies into a virtual environment. By default, this environment will be stored under your home directory, but you can tell pipenv to store the virtual environment locally in the project by setting the PIPENV_VENV_IN_PROJECT environment variable in your .bash_profile:
$ export PIPENV_VENV_IN_PROJECT=1
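After running pipenv install with this variable set, you may want to confirm that the environment really landed inside the project. The sketch below assumes pipenv's usual behavior of creating a directory named .venv in the project root in that case; the helper name has_local_venv is hypothetical and is not part of this project.

```python
from pathlib import Path


def has_local_venv(project_root: str = ".") -> bool:
    """Return True if a .venv directory exists directly under project_root.

    Hypothetical helper for illustration. It only checks for the directory
    that pipenv is assumed to create when PIPENV_VENV_IN_PROJECT is set.
    """
    return (Path(project_root) / ".venv").is_dir()


if __name__ == "__main__":
    print("in-project virtual environment found:", has_local_venv("."))
```

If this prints False after an install, the variable was probably not exported in the shell that ran pipenv install.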
In order to run any executables installed by pipenv, there are two options:
You can prefix every executable invocation with pipenv run. To run jupyter notebook, for example, you would instead run:
$ pipenv run jupyter notebook
This makes sure that the version of Jupyter installed by pipenv is the version actually run.
If prefixing every executable invocation gets annoying, you can instead run this command:
$ pipenv shell
This drops you into a sub-shell that is automatically configured to run the proper executables. So once you are in this sub-shell you can just run, for example:
(philly-evictions) $ jupyter notebook
This automatically runs the correct version of Jupyter.
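If you are unsure whether a script or notebook is really running inside the pipenv environment, a quick check like the following can help. It is a minimal sketch, assuming a Python 3 interpreter and a reasonably recent virtualenv; the function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Running it once with plain python and once through pipenv run python should show different interpreter paths if the environment is set up as described above.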
doit is a build tool / task runner. It's a handy alternative to writing a bunch of shell scripts, and it's very good at creating a pipeline where one step depends on a previous step.
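To make the idea of dependent steps concrete, here is a small sketch of what a doit task file (conventionally named dodo.py) can look like. It is not this project's actual task file: the task names, file paths, and CSV contents are all invented for illustration. Only the general shape (functions named task_* that return a dictionary with actions, file_dep, and targets) follows doit's conventions.

```python
# dodo.py (illustrative only; not this project's real task definitions)
from pathlib import Path

RAW = Path("data/raw.csv")        # hypothetical output of the first step
MERGED = Path("data/merged.csv")  # hypothetical output of the second step


def download():
    """Stand-in for a slow download step: writes a tiny CSV file."""
    RAW.parent.mkdir(parents=True, exist_ok=True)
    RAW.write_text("block_group,evictions\nA,3\nB,5\n")


def merge():
    """Stand-in for a merge step: copies the raw rows and adds a column."""
    lines = RAW.read_text().splitlines()
    out = [lines[0] + ",source"] + [row + ",raw" for row in lines[1:]]
    MERGED.write_text("\n".join(out) + "\n")


def task_download():
    # 'targets' tells doit which files this task produces.
    return {"actions": [download], "targets": [str(RAW)]}


def task_merge():
    # 'file_dep' names the file produced by task_download, so doit runs the
    # download first and can skip the merge when that file has not changed.
    return {"actions": [merge], "file_dep": [str(RAW)], "targets": [str(MERGED)]}
```

Because the second task lists the first task's target as a file dependency, doit can work out the execution order on its own, which is the main advantage over a set of independent shell scripts.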
To see all the tasks currently defined in the project, run:
(philly-evictions) $ doit list
To run all the necessary data gathering / augmentation steps, run:
(philly-evictions) $ doit merge
WARNING: The above task can take a while to complete, since it has to download lots of ACS data!