This repository is the easiest way to start using conformal prediction (a.k.a. conformal inference) on real data.
Each of the notebooks applies conformal prediction to a real prediction problem with a state-of-the-art machine learning model.
No need to download the model or data in order to run conformal
Raw model outputs for several large-scale real-world datasets and a small amount of sample data from each dataset are downloaded automatically by the notebooks. You can develop and test conformal prediction methods entirely in this sandbox, without ever needing to run the original model or download the original data. Open a notebook to see the expected output. You can use these notebooks to experiment with existing methods or as templates to develop your own.
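As a rough illustration of what working from precomputed model outputs looks like, here is a minimal sketch of split conformal prediction for classification. It is not taken from the notebooks: the softmax scores are synthetic stand-ins generated on the spot, and the helper names (conformal_quantile, prediction_sets) are hypothetical. The only inputs needed are softmax outputs and labels, which is why the original model never has to be run.

```python
import numpy as np


def conformal_quantile(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, return +inf so every class is included.
    """
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(cal_scores)[k - 1]


def prediction_sets(softmax, qhat):
    """Boolean mask of shape (n, K): class k is in the set if its score <= qhat."""
    return (1 - softmax) <= qhat


# Synthetic stand-in for precomputed model outputs (NOT the repository's data).
rng = np.random.default_rng(0)
n, K, alpha = 2000, 10, 0.1
labels = rng.integers(0, K, size=n)
logits = rng.normal(size=(n, K))
logits[np.arange(n), labels] += 2.0  # make the true class more likely
softmax = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Split into a calibration half and a validation half.
n_cal = 1000
cal_smx, val_smx = softmax[:n_cal], softmax[n_cal:]
cal_labels, val_labels = labels[:n_cal], labels[n_cal:]

# Conformal score: one minus the softmax probability of the true class.
cal_scores = 1 - cal_smx[np.arange(n_cal), cal_labels]
qhat = conformal_quantile(cal_scores, alpha)

# Form prediction sets on validation data and measure empirical coverage.
sets = prediction_sets(val_smx, qhat)
coverage = sets[np.arange(len(val_labels)), val_labels].mean()
mean_size = sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.3f}, mean set size: {mean_size:.2f}")
```

With a target of 1 - alpha = 0.9, the empirical coverage on the validation half should land close to 90%, up to sampling fluctuation.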
- notebooks/imagenet-smallest-sets.ipynb: ImageNet classification with a ResNet152 classifier. Prediction sets are guaranteed to contain the true class with 90% probability.
- notebooks/meps-cqr.ipynb: Medical expenditure regression with a Gradient Boosting Regressor and conformalized quantile regression. Prediction intervals are guaranteed to contain the true dollar value with 90% probability.
- notebooks/multilabel-classification-mscoco.ipynb: Multilabel image classification on the Microsoft Common Objects in Context (MS-COCO) dataset. The set-valued prediction is guaranteed to contain 90% of the ground truth classes.
- notebooks/toxic-text-outlier-detection.ipynb: Detecting toxic or hateful online comments via conformal outlier detection. No more than 10% of in-distribution data will get flagged as toxic.
- notebooks/tumor-segmentation.ipynb: Segmenting gut polyps from endoscopy images. Segmentation masks contain 90% of the ground truth tumor pixels.
- notebooks/weather-time-series-distribution-shift: Predicting future temperatures around the world using time-series data and weighted conformal prediction. Prediction intervals contain 90% of true temperatures.
- notebooks/imagenet-selective-classification.ipynb: When the ImageNet classifier is unsure, it will abstain. Otherwise, it will have an accuracy of 90%, even though the base model was only 77% accurate.
- ...and more!
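To make the regression case concrete, the following is a minimal sketch of the calibration step in conformalized quantile regression, the technique named for the medical expenditure example. It is an assumption-laden illustration, not the notebook's code: the data are synthetic, the "quantile regressor" is a hand-written stand-in that produces deliberately narrow intervals, and cqr_calibrate is a hypothetical helper name.

```python
import numpy as np


def cqr_calibrate(lower, upper, y, alpha):
    """Return the CQR correction qhat from calibration data.

    The score is max(lower - y, y - upper): positive when y falls outside
    [lower, upper], negative when it falls inside.
    """
    n = len(y)
    scores = np.maximum(lower - y, y - upper)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(scores)[k - 1]


# Synthetic regression data (NOT the MEPS dataset).
rng = np.random.default_rng(0)
n, alpha = 2000, 0.1
x = rng.uniform(0, 10, size=n)
y = x + rng.normal(scale=1.0, size=n)

# Stand-in for a fitted quantile regressor whose intervals are too narrow.
lower, upper = x - 0.5, x + 0.5

# Calibrate on the first half, evaluate on the second half.
cal, val = slice(0, 1000), slice(1000, 2000)
qhat = cqr_calibrate(lower[cal], upper[cal], y[cal], alpha)

# Conformalized interval: widen (or shrink, if qhat < 0) both ends by qhat.
covered = (y[val] >= lower[val] - qhat) & (y[val] <= upper[val] + qhat)
coverage = covered.mean()
print(f"qhat: {qhat:.3f}, empirical coverage: {coverage:.3f}")
```

Because the stand-in intervals undercover, qhat comes out positive here and the corrected intervals are wider than the raw ones.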
To run these notebooks locally, you just need to have the correct dependencies installed and press "run all cells"! The notebooks will automatically download all required data and model outputs. You will need 1.5GB of space on your computer in order for the notebook to store the auto-downloaded data. If you want to see how we generated the precomputed model outputs and data subsamples, see the files in generation-scripts. There is one for each dataset. To create a conda environment with the correct dependencies, run:

    conda env create -f environment.yml

If you still get a dependency error, make sure to activate the conformal environment within the Jupyter notebook.
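If a notebook fails on startup, a quick preflight check along the following lines may help narrow down the cause. This is only a sketch and not part of the repository: the helper names are hypothetical, 1.5GB is interpreted as 1.5 * 10^9 bytes, and the environment check relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```python
import os
import shutil

# Assumption: "1.5GB" is read as 1.5 * 10**9 bytes.
REQUIRED_BYTES = int(1.5 * 10**9)


def has_enough_space(path=".", required=REQUIRED_BYTES):
    """True if the filesystem containing `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required


def in_conda_env(name="conformal"):
    """True if the active conda environment (per CONDA_DEFAULT_ENV) is `name`."""
    return os.environ.get("CONDA_DEFAULT_ENV") == name


print("enough disk space:", has_enough_space())
print("conformal environment active:", in_conda_env())
```

Running this in the first cell of a notebook shows whether the kernel actually sees the conformal environment, which is the usual culprit behind lingering dependency errors.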
This repository is meant to accompany our paper, A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. The paper contains a detailed explanation of each example, along with attributions. If you find this repository useful, please cite the following in addition to the relevant methods and datasets:
@article{angelopoulos2021gentle,
title={A gentle introduction to conformal prediction and distribution-free uncertainty quantification},
author={Angelopoulos, Anastasios N and Bates, Stephen},
journal={arXiv preprint arXiv:2107.07511},
year={2021}
}