This is the code repository for the IJCAI-21 paper "Understanding the Effect of Bias in Deep Anomaly Detection" by Ziyu Ye, Prof. Yuxin Chen, and Prof. Heather Zheng.
Psyduck wants to talk to you!
Please check our detailed presentation and paper poster at the ./slides folder.
Simply put, we discover the counter-intuitive fact that additional labeled data in anomaly detection can hurt training and bring disastrous bias – knowing more does not mean doing better!
Theoretically, we propose the first rigorous PAC analysis for estimating the relative scoring bias for deep anomaly detection; empirically, we provide the first comprehensive experiments on how a biased training anomaly set affects the detection performance on different anomaly classes.
The main takeaway message is anomaly detection practitioner must not blindly believe in SOTA models, and must treat additional labeled data with extra care.
Again, the big picture is: the access to more information leads to worse generalization in the presence of distributional shift. If you are interested in learning more about such phenomena more broadly, i recommend you to read this one: Causal Confusion in Imitation Learning, NeurIPS '19.
numpy==1.22.2
pandas==1.3.4
torch==1.4.0
joblib==0.14.1
scikit-learn==0.22.2
torchvision==0.5.0
The high-level structure of the codebase follows the ICLR '20 paper "Deep semi-supervised anomaly detection". We thank them for the open-source work for the anomaly detection community.
./loader
provides various data loaders supporting datasets like ImageNet, FashionMNIST, Driver Anomaly Detection, Retina OCT Images, and your own customized datasets. You can check the themain_loading.py
inside for a detailed list../network
provides several network structures to build up the model. Please see the viables options inmain_network.py
../model
contains six popular and SOTA models for anomaly detection with deep learning, ranging from Deep SVDD to Autoencoding Binary Classifier. Please check our paper for the detailed model description../main
contains the main files to verify the PAC analysis or to characterize the effect of bias.
For reimplementation our work of PAC analysis (notice: some of the import function are outdated, you may adapt it to the main.py):
cd ./main
python main_pac_gaussian_train.py --loader_name gaussian9d_hard --n 2000 --mix 1 --ratio_abnormal 0.1 --n_feature 9
For reimplementation our work of general anomaly detection (and identify bias):
# Run the FashionMNIST experiments
$ . scripts/fmnist.sh
# Run the satimage experiments
$ . scripts/satimage.sh
The data for the real-world datasets are downloadable on UCI repository; please check our paper and ./helper/fetch_data.py
for details.
@inproceedings{ye2021understanding,
author = {Ziyu Ye and Yuxin Chen and Haitao Zheng},
editor = {Zhi-Hua Zhou},
title = {Understanding the Effect of Bias in Deep Anomaly Detection},
booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial
Intelligence, {IJCAI} 2021, Virtual Event / Montreal, Canada, 19-27
August 2021},
pages = {3314--3320},
publisher = {ijcai.org},
year = {2021},
url = {https://doi.org/10.24963/ijcai.2021/456},
}
If you have any problem regarding the code or the paper, please do not hesitate to contact me at ziyuye@uchicago.edu or ziyuye@live.com.