2023 refactor (#86)
- Refactor. Please refer to the `Design Overview` section in the docs for more details.
- Support both `matplotlib` and `plotly`.
- Update tutorials to match the refactored code.
- Improve unit tests.
- Semi-automate docstring generation.
SauceCat authored Jun 5, 2023
1 parent b022a0a commit 7fae76b
Showing 119 changed files with 1,686,681 additions and 12,981 deletions.
6 changes: 3 additions & 3 deletions .coveragerc
@@ -3,9 +3,9 @@ include =
./*

omit =
*tests*
tests/*
*/_version.py
docs/*
docs_history/*
notebooks/*
tutorials/*
.tox/*
assets/*
16 changes: 11 additions & 5 deletions .flake8
@@ -1,13 +1,19 @@
[flake8]
max-line-length=100
# B905 should be enabled when we drop support for 3.9
ignore = E203, E266, E501, W503, B905, B907
# line length is intentionally set to 80 here because black uses Bugbear
# See https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length for more details
max-line-length = 80
max-complexity = 18
select = B,C,E,F,W,T4,B9
exclude =
.git
__pycache__
tox.ini
docs/*
docs_history/*
notebooks/*
tutorials/*
.*
*.cfg
*.in
ignore=
*.in
versioneer.py
*/__init__.py
73 changes: 73 additions & 0 deletions .github/workflows/tox-test.yml
@@ -0,0 +1,73 @@
name: Tox Test

on:
  push:
    branches:
      - master
    tags:
      - 'v*.*.*'
  pull_request:
    branches:
      - master

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.9]
        toxenv: [py39, docs]

    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0 # this is equivalent to Travis CI's `git fetch` in before_install

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install tox
      - name: Run Tox
        env:
          TOXENV: ${{ matrix.toxenv }}
        run: tox -e $TOXENV

      # This step will set up a display that you can use for running tests that require a GUI (like certain selenium tests)
      - name: Run xvfb
        run: |
          sudo apt-get install -y xvfb
          export DISPLAY=:99.0
          sudo Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
          sleep 3
      - name: Upload coverage.xml as artifact
        uses: actions/upload-artifact@v3
        with:
          name: coverage
          path: ./coverage.xml

  upload-codecov:
    needs: ['test']
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Download coverage.xml artifact
        uses: actions/download-artifact@v3
        with:
          name: coverage

      - name: Upload coverage reports to Codecov
        uses: codecov/codecov-action@v3
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        with:
          file: ./coverage.xml
          fail_ci_if_error: true
5 changes: 5 additions & 0 deletions .gitignore
@@ -110,3 +110,8 @@ venv.bak/
_build
_static
_templates

# OS generated files
.DS_Store
.DS_Store?
._*
6 changes: 3 additions & 3 deletions .travis.yml
@@ -8,9 +8,9 @@ sudo: false
matrix:
fast_finish: true
include:
- python: 3.7
env: TOXENV=py37
- python: 3.7
- python: 3.9
env: TOXENV=py39
- python: 3.9
env: TOXENV=docs
services:
- xvfb
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,14 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

## [0.3.0]
### Changed
- Refactor. Please refer to the `Design Overview` section in the docs for more details.
- Support both `matplotlib` and `plotly`.
- Update tutorials to match the refactored code.
- Improve unit tests.
- Semi-automate docstring generation.

## [0.2.0]
### Added
- Formal documentation hosted on readthedocs.org
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -1,2 +1,4 @@
include versioneer.py
include pdpbox/_version.py
include README.md
include requirements.txt
131 changes: 71 additions & 60 deletions README.md
@@ -1,66 +1,41 @@

# PDPbox
[![PyPI version](https://badge.fury.io/py/PDPbox.svg)](https://badge.fury.io/py/PDPbox)
[![Build Status](https://travis-ci.com/SauceCat/PDPbox.svg?branch=master)](https://travis-ci.com/SauceCat/PDPbox)
[![codecov](https://codecov.io/gh/SauceCat/PDPbox/branch/master/graph/badge.svg?token=wIGFZIoSKJ)](https://codecov.io/gh/SauceCat/PDPbox)
![Build Status](https://github.com/SauceCat/PDPbox/actions/workflows/tox-test.yml/badge.svg)

python partial dependence plot toolbox
Python **P**artial **D**ependence **P**lot tool**box**.

Visualize the influence of certain features on model predictions for supervised machine learning algorithms,
utilizing partial dependence plots.

## Update! 😹
<img src="images/3_years_codes.gif" />
For a comprehensive explanation, I recommend referring to the [Partial Dependence Plot (PDP)](https://christophm.github.io/interpretable-ml-book/pdp.html) chapter in Christoph Molnar's book, [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/).
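
As a rough illustration of the idea behind a partial dependence plot, the sketch below computes a one-feature partial dependence curve by hand. It does not use PDPbox's API: `partial_dependence` and `toy_model` are hypothetical names introduced only for this example, and the data is made up. For each grid value, the chosen feature is overwritten in every row and the model's predictions are averaged.

```python
from statistics import mean
from typing import Callable, List, Sequence


def partial_dependence(
    predict: Callable[[Sequence[float]], float],
    rows: List[List[float]],
    feature_index: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)  # copy so the original dataset is untouched
            modified[feature_index] = value
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


# Hypothetical toy model (an assumption for illustration only):
# non-monotonic in feature 0, linear in feature 1.
def toy_model(x: Sequence[float]) -> float:
    return (x[0] - 2.0) ** 2 + 0.5 * x[1]


rows = [[0.0, 1.0], [1.0, 3.0], [4.0, 2.0]]
pdp_curve = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 2.0, 4.0])
print(pdp_curve)
```

The resulting curve dips in the middle of the grid, which is the kind of non-monotonic effect that a single feature-importance score would not reveal.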

Update for versions:
```
xgboost==1.3.3
matplotlib==3.1.1
sklearn==0.23.1
```


## Motivation

This repository is inspired by ICEbox. The goal is to visualize the impact of certain features towards model
prediction for any supervised learning algorithm. (now support all scikit-learn algorithms)


## The common headache

When using black box machine learning algorithms like random forest and boosting, it is hard to understand the
relations between predictors and model outcome.

For example, in terms of random forest, all we get is the feature importance.
Although we can know which feature is significantly influencing the outcome based on the importance
calculation, it really sucks that we don’t know in which direction it is influencing. And in most of the real cases,
the effect is non-monotonic.
## I am back! :smirk_cat:

We need some powerful tools to help understanding the complex relations
between predictors and model prediction.
After four years...

I'm delighted to see how popular PDPbox has become; it has exceeded all my expectations.
When I first embarked on this project, it was a modest endeavor,
simply to whet my appetite for real-world Python package development.

## Highlight
With the shift in my career path towards deep learning in 2018, I had to halt the development and maintenance of PDPbox,
as I no longer actively used it and several other outstanding packages such as
[lime](https://github.com/marcotcr/lime) and [shap](https://github.com/slundberg/shap) were emerging.

1. Helper functions for visualizing target distribution as well as prediction distribution.
2. Proper way to handle one-hot encoding features.
3. Solution for handling complex mutual dependency among features.
4. Support multi-class classifier.
5. Support two variable interaction partial dependence plot.
However, as the years have passed, I have seen PDPbox gain a significant presence in the community.
It's been referenced in various online courses and books, demonstrating its valuable role.
Despite well-known limitations of partial dependence plots,
their simplicity and intuitiveness might have made them a popular starting point for many,
appealing to a broad range of audiences.


## Documentation

- Latest version: http://pdpbox.readthedocs.io/en/latest/
- Historical versions:
- [v0.1.0](https://github.com/SauceCat/PDPbox/blob/master/docs_history/v0.1/docs.md)

## Tutorials
https://github.com/SauceCat/PDPbox/tree/master/tutorials

## Change Logs
https://github.com/SauceCat/PDPbox/blob/master/CHANGELOG.md
Given this, I feel a renewed sense of responsibility to revisit the project, refine the existing code,
potentially add new features, and create additional tutorials.
I'm excited about this next phase and look forward to contributing more to the open source community.

## Installation

- through pip (latest stable version: 0.2.1)
- through pip
```
$ pip install pdpbox
```
@@ -72,17 +47,31 @@ https://github.com/SauceCat/PDPbox/blob/master/CHANGELOG.md
$ python setup.py install
```

## Reference

- Documentation: http://pdpbox.readthedocs.io/en/latest/
- Tutorials: tutorials
- Change Log: CHANGELOG.md


## Testing
### Test with `pytest`

```
cd <dir>/PDPbox
python -m pytest tests
```

### Test with `tox`
PDPbox can be tested using `tox`.

- First install `tox` and `tox-venv`
- First install `tox`

```
$ pip install tox tox-venv
$ pip install tox
```

- Call `tox` inside the pdpbox clone directory. This will run tests with python3.7.
- Call `tox` inside the pdpbox clone directory. This will run tests with python3.9.

- To test the documentation, call `tox -e docs`.
The documentation should open up in your browser if it is successfully built.
@@ -91,25 +80,47 @@ PDPbox can be tested using `tox`.

## Gallery
- **PDP:** PDP for a single feature
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/pdp_plot.png' width=90%>

<img src='assets/images/pdp_plot.jpeg' width=90%>

---

- **PDP:** PDP for a multi-class classifier
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/pdp_plot_multiclass.png' width=90%>

<img src='assets/images/pdp_plot_multiclass.jpeg' width=90%>

---

- **PDP Interact:** PDP Interact for two features with contour plot
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/pdp_interact_contour.png' width=60%>

<img src='assets/images/pdp_interact_contour.jpeg' width=90%>

---

- **PDP Interact:** PDP Interact for two features with grid plot
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/pdp_interact_grid.png' width=60%>

<img src='assets/images/pdp_interact_grid.jpeg' width=90%>

---

- **PDP Interact:** PDP Interact for multi-class
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/pdp_interact_multiclass.png' width=90%>

<img src='assets/images/pdp_interact_multiclass.jpeg' width=90%>

---

- **Information plot:** target plot for a single feature
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/target_plot.png' width=90%>

<img src='assets/images/target_plot.jpeg' width=90%>

---

- **Information plot:** target interact plot for two features
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/target_plot_interact.png' width=90%>

- **Information plot:** actual prediction plot for a single feature
<img src='https://github.com/SauceCat/PDPbox/blob/master/images/actual_plot.png' width=90%>
<img src='assets/images/target_plot_interact.jpeg' width=90%>

---

- **Information plot:** prediction plot for a single feature

<img src='assets/images/predict_plot.jpeg' width=90%>