This repository maintains a collection of important papers on knowledge distillation.

- Model Compression, KDD 2006
  - https://dl.acm.org/doi/abs/10.1145/1150402.1150464
  - Cristian Buciluǎ, Rich Caruana, Alexandru Niculescu-Mizil.
- Do Deep Nets Really Need to be Deep?, NeurIPS 2014
  - https://arxiv.org/abs/1312.6184
  - Lei Jimmy Ba, Rich Caruana.
- Distilling the Knowledge in a Neural Network, NeurIPS-workshop 2014
  - https://arxiv.org/abs/1503.02531
  - https://neurips.cc/virtual/2014/workshop/4294
  - Geoffrey Hinton, Oriol Vinyals, Jeff Dean.
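For quick reference, below is a minimal PyTorch sketch of the soft-target distillation loss popularized by this paper; the temperature, loss weight, and toy tensors are illustrative assumptions rather than values from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation: temperature-scaled KL between teacher and
    student distributions, blended with the usual cross-entropy on labels.
    T and alpha are illustrative hyperparameters, not values from the paper."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # The T^2 factor keeps soft-target gradients on a scale comparable to the hard-target term.
    kl = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce

# Toy usage with random logits for a 10-class problem.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```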
- Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks, TPAMI 2022
  - https://arxiv.org/abs/2004.05937
  - Lin Wang, Kuk-Jin Yoon.
- Knowledge Distillation: A Survey, IJCV 2021
  - https://arxiv.org/abs/2006.05525
  - Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao.
  - Extremely promising!
- Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed [TensorFlow]
  - https://arxiv.org/abs/2101.02388
  - Eric Luhman, Troy Luhman.
- Progressive Distillation for Fast Sampling of Diffusion Models, ICLR 2022 [TensorFlow]
  - https://arxiv.org/abs/2202.00512
  - Tim Salimans, Jonathan Ho.
- Accelerating Diffusion Sampling with Classifier-based Feature Distillation, ICME 2023 [PyTorch]
  - https://arxiv.org/abs/2211.12039
  - Wujie Sun, Defang Chen, Can Wang, Deshi Ye, Yan Feng, Chun Chen.
- Fast Sampling of Diffusion Models via Operator Learning, ICML 2023 [PyTorch]
  - https://arxiv.org/abs/2211.13449
  - Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, Anima Anandkumar.
- Consistency Models, ICML 2023 [PyTorch]
  - https://arxiv.org/abs/2303.01469
  - Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever.
- TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation [PyTorch]
  - https://arxiv.org/abs/2303.04248
  - David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu.
- A Geometric Perspective on Diffusion Models
  - https://arxiv.org/abs/2305.19947
  - Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang.
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping, ICML 2024 [PyTorch]
  - https://arxiv.org/abs/2306.05544
  - Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind.
- Fast ODE-based Sampling for Diffusion Models in Around 5 Steps, CVPR 2024 [PyTorch]
  - https://arxiv.org/abs/2312.00094
  - Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen.
- On the Trajectory Regularity of ODE-based Diffusion Sampling, ICML 2024 [PyTorch]
  - https://arxiv.org/abs/2405.11326
  - Defang Chen, Zhenyu Zhou, Can Wang, Chunhua Shen, Siwei Lyu.
- FitNets: Hints for Thin Deep Nets, ICLR 2015 [Theano]
  - https://arxiv.org/abs/1412.6550
  - Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio.
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR 2017 [PyTorch]
  - https://arxiv.org/abs/1612.03928
  - Sergey Zagoruyko, Nikos Komodakis.
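As a rough illustration of the attention-transfer idea, the sketch below matches L2-normalized spatial attention maps (channel-wise mean of squared activations) between a student and a teacher feature map; the feature shapes and the plain MSE formulation are simplifying assumptions, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    # Spatial attention map: mean of squared activations over channels,
    # flattened and L2-normalized per sample -> shape (B, H*W).
    am = feat.pow(2).mean(dim=1).flatten(1)
    return F.normalize(am, dim=1)

def attention_transfer_loss(student_feat, teacher_feat):
    # Assumes matching spatial sizes; resize one map first if they differ.
    return (attention_map(student_feat) - attention_map(teacher_feat)).pow(2).mean()

# Toy usage: student has fewer channels than the teacher, same spatial size.
student_feat = torch.randn(4, 32, 8, 8)
teacher_feat = torch.randn(4, 64, 8, 8)
print(attention_transfer_loss(student_feat, teacher_feat))
```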
- Learning Deep Representations with Probabilistic Knowledge Transfer, ECCV 2018 [PyTorch]
  - https://arxiv.org/abs/1803.10837
  - Nikolaos Passalis, Anastasios Tefas.
- Relational Knowledge Distillation, CVPR 2019 [PyTorch]
  - https://arxiv.org/abs/1904.05068
  - Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho.
- Variational Information Distillation for Knowledge Transfer, CVPR 2019
  - https://arxiv.org/abs/1904.05835
  - Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai.
- Similarity-Preserving Knowledge Distillation, CVPR 2019
  - https://arxiv.org/abs/1907.09682
  - Frederick Tung, Greg Mori.
- Contrastive Representation Distillation, ICLR 2020 [PyTorch]
  - https://arxiv.org/abs/1910.10699
  - Yonglong Tian, Dilip Krishnan, Phillip Isola.
- Heterogeneous Knowledge Distillation using Information Flow Modeling, CVPR 2020 [PyTorch]
  - https://arxiv.org/abs/2005.00727
  - Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas.
- Cross-Layer Distillation with Semantic Calibration, AAAI 2021 [PyTorch] [TKDE]
  - https://arxiv.org/abs/2012.03236
  - Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen.
- Distilling Knowledge via Knowledge Review, CVPR 2021 [PyTorch]
  - https://arxiv.org/abs/2104.09044
  - Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia.
- Distilling Holistic Knowledge with Graph Neural Networks, ICCV 2021 [PyTorch]
  - https://arxiv.org/abs/2108.05507
  - Sheng Zhou, Yucheng Wang, Defang Chen, Jiawei Chen, Xin Wang, Can Wang, Jiajun Bu.
- Decoupled Knowledge Distillation, CVPR 2022 [PyTorch]
  - https://arxiv.org/abs/2203.08679
  - Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang.
- Knowledge Distillation with the Reused Teacher Classifier, CVPR 2022 [PyTorch]
  - https://arxiv.org/abs/2203.14001
  - Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen.
- Deep Mutual Learning, CVPR 2018 [TensorFlow]
  - https://arxiv.org/abs/1706.00384
  - Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu.
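To give a flavor of online (teacher-free) distillation, here is a hedged sketch of one mutual-learning step between two peer classifiers: each network is trained with cross-entropy plus a KL term toward the other peer's predictions. The linear models, optimizers, and data are placeholders, and the simultaneous update is a simplification of the paper's alternating scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two placeholder peer classifiers (the paper uses full CNNs such as ResNets).
net1 = nn.Linear(16, 10)
net2 = nn.Linear(16, 10)
opt1 = torch.optim.SGD(net1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(net2.parameters(), lr=0.1)

def mutual_step(x, y):
    """One mutual-learning update: each peer minimizes cross-entropy on the
    labels plus KL divergence toward the other peer's (detached) prediction."""
    logits1, logits2 = net1(x), net2(x)

    loss1 = F.cross_entropy(logits1, y) + F.kl_div(
        F.log_softmax(logits1, dim=1),
        F.softmax(logits2.detach(), dim=1),
        reduction="batchmean",
    )
    loss2 = F.cross_entropy(logits2, y) + F.kl_div(
        F.log_softmax(logits2, dim=1),
        F.softmax(logits1.detach(), dim=1),
        reduction="batchmean",
    )

    opt1.zero_grad(); loss1.backward(); opt1.step()
    opt2.zero_grad(); loss2.backward(); opt2.step()
    return loss1.item(), loss2.item()

# Toy usage with random features and labels.
x = torch.randn(8, 16)
y = torch.randint(0, 10, (8,))
print(mutual_step(x, y))
```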
- Large Scale Distributed Neural Network Training through Online Distillation, ICLR 2018
  - https://arxiv.org/abs/1804.03235
  - Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton.
- Knowledge Distillation by On-the-Fly Native Ensemble, NeurIPS 2018 [PyTorch]
  - https://arxiv.org/abs/1806.04606
  - Xu Lan, Xiatian Zhu, Shaogang Gong.
- Online Knowledge Distillation with Diverse Peers, AAAI 2020 [PyTorch]
  - https://arxiv.org/abs/1912.00350
  - Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, Chun Chen.
- Feature-map-level Online Adversarial Knowledge Distillation, ICML 2020
  - https://arxiv.org/abs/2002.01775
  - Inseop Chung, SeongUk Park, Jangho Kim, Nojun Kwak.
- Peer Collaborative Learning for Online Knowledge Distillation, AAAI 2021
  - https://arxiv.org/abs/2006.04147
  - Guile Wu, Shaogang Gong.
- Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition, INTERSPEECH 2016
  - https://www.isca-speech.org/archive_v0/Interspeech_2016/pdfs/1190.PDF
  - Austin Waters, Yevgen Chebotar.
- Efficient Knowledge Distillation from an Ensemble of Teachers, INTERSPEECH 2017
  - https://isca-speech.org/archive_v0/Interspeech_2017/pdfs/0614.PDF
  - Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran.
- Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space, NeurIPS 2020 [PyTorch]
  - https://proceedings.neurips.cc/paper/2020/hash/91c77393975889bd08f301c9e13a44b7-Abstract.html
  - Shangchen Du, Shan You, Xiaojie Li, Jianlong Wu, Fei Wang, Chen Qian, Changshui Zhang.
- Reinforced Multi-Teacher Selection for Knowledge Distillation, AAAI 2021
  - https://arxiv.org/abs/2012.06048
  - Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang.
- Confidence-Aware Multi-Teacher Knowledge Distillation, ICASSP 2022 [PyTorch]
  - https://arxiv.org/abs/2201.00007
  - Hailin Zhang, Defang Chen, Can Wang.
- Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning, ICME 2023 [PyTorch]
  - https://arxiv.org/abs/2306.06634
  - Hailin Zhang, Defang Chen, Can Wang.
- Data-Free Knowledge Distillation for Deep Neural Networks, NeurIPS-workshop 2017 [TensorFlow]
  - https://arxiv.org/abs/1710.07535
  - Raphael Gontijo Lopes, Stefano Fenu, Thad Starner.
- DAFL: Data-Free Learning of Student Networks, ICCV 2019 [PyTorch]
  - https://arxiv.org/abs/1904.01186
  - Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian.
- Zero-Shot Knowledge Distillation in Deep Networks, ICML 2019 [TensorFlow]
  - https://arxiv.org/abs/1905.08114
  - Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty.
- Zero-shot Knowledge Transfer via Adversarial Belief Matching, NeurIPS 2019 [PyTorch]
  - https://arxiv.org/abs/1905.09768
  - Paul Micaelli, Amos Storkey.
- Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion, CVPR 2020 [PyTorch]
  - https://arxiv.org/abs/1912.08795
  - Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz.
- The Knowledge Within: Methods for Data-Free Model Compression, CVPR 2020
  - https://arxiv.org/abs/1912.01274
  - Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry.
- Contrastive Model Inversion for Data-Free Knowledge Distillation, IJCAI 2021 [PyTorch]
  - https://arxiv.org/abs/2105.08584
  - Gongfan Fang, Jie Song, Xinchao Wang, Chengchao Shen, Xingen Wang, Mingli Song.
- Customizing Synthetic Data for Data-Free Student Learning, ICME 2023 [PyTorch]
  - https://arxiv.org/abs/2307.04542
  - Shiya Luo, Defang Chen, Can Wang.
- Structured Knowledge Distillation for Dense Prediction, CVPR 2019, TPAMI 2020 [PyTorch]
  - https://arxiv.org/abs/1903.04197
  - Yifan Liu, Changyong Shu, Jingdong Wang, Chunhua Shen.
- Channel-wise Knowledge Distillation for Dense Prediction, ICCV 2021 [PyTorch]
  - https://arxiv.org/abs/2011.13256
  - Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen.
- Cross-Image Relational Knowledge Distillation for Semantic Segmentation, CVPR 2022 [PyTorch]
  - https://arxiv.org/abs/2204.06986
  - Chuanguang Yang, Helong Zhou, Zhulin An, Xue Jiang, Yongjun Xu, Qian Zhang.
- Holistic Weighted Distillation for Semantic Segmentation, ICME 2023 [PyTorch]
  - Wujie Sun, Defang Chen, Can Wang, Deshi Ye, Yan Feng, Chun Chen.