Challenge Competitions | Years | Data type | Languages | Public Label (train&dev/test) | Audio | Visual | Team No. | Top-1 System |
---|---|---|---|---|---|---|---|---|
ASVspoof 2015 (audio) [15] | 2015 | Speech | English | Yes/Yes | Yes | No | 16 | Ensemble |
ASVspoof 2019 (LA Task) [16] | 2019 | Speech | English | Yes/Yes | Yes | No | 48 | Ensemble |
DFDC [17], [18] | 2020 | Speech | English | Yes/Yes | Yes | Yes | 2114 | Ensemble |
FTC [19] | 2020 | Speech | English | No/No | Yes | No | n/a | n/a |
ASVspoof 2021 (LA Task) [20] | 2021 | Speech | English | Yes/Yes | Yes | No | 41 | Ensemble |
ASVspoof 2021 (DF Task) [20] | 2021 | Speech | English | Yes/Yes | Yes | No | 33 | Ensemble |
ADD 2022 Track 1 [21] | 2022 | Speech | Chinese | Yes/Yes | Yes | No | 48 | Single model |
ADD 2022 Track 2 [21] | 2022 | Speech | Chinese | Yes/Yes | Yes | No | 27 | Single model |
ADD 2022 Track 3.2 [21] | 2022 | Speech | Chinese | Yes/Yes | Yes | No | 33 | Single model |
ADD 2023 Track 1.2 [22] | 2023 | Speech | Chinese | No/No | Yes | No | 49 | Ensemble model |
ADD 2023 Track 2 [22] | 2023 | Speech | Chinese | No/No | Yes | No | 16 | Single model |
AV-Deepfake1M [23], [24] | 2024 | Speech | English | Yes/No | Yes | Yes | n/a | n/a |
ASVspoof 2024 [25] | 2024 | Speech | English | Yes/No | Yes | No | 53 | Ensemble model |
SVDD 2024 [26], [27] | 2024 | Singing | Multi-language (6) | Yes/No | Yes | No | 47 | Ensemble model |

Dataset | Year | Language | Speakers (Male/Female) | Utt. No. (Real/Fake) | AI-Synthesized Speech Systems | Speech Condition | Real Speech Resources | Utt. length | Evaluation Metrics |
---|---|---|---|---|---|---|---|---|---|
ASVspoof 2015 [15](audio) | 2015 | English | 45/61 | 16,651/246,500 | 10 | Clean | Speaker Volunteers | 1 to 2 | EER |
FoR [30](audio) | 2019 | English | 33 | 198,000+ | 7 | Clean | Kaggle [31] | 2.35 | Acc |
ASVspoof 2019 (LA task) [16](audio) | 2019 | English | 46/61 | 121,461 | 19 | Clean & Noisy | Speaker Volunteers | n/a | EER |
DFDC [32](video) | 2020 | English | 3,426 | 128,154/104,500 | 1 | Clean & Noisy | Speaker Volunteers | 68.8 | Precision/Recall |
ASVspoof 2021 (LA task) [20](audio) | 2021 | English | 21/27 | 18,452/163,114 | 13 | Clean & Noisy | Speaker Volunteers | n/a | EER |
ASVspoof 2021 (DF task) [20](audio) | 2021 | English | 21/27 | 22,617/589,212 | 100+ | Clean & Noisy | Speaker Volunteers | n/a | EER |
WaveFake [33](audio) | 2021 | English, Japanese | 0/2 | 117,985 | 6 | Clean | LJSPEECH [29] & JSUT [30] | 6s/4.8s | EER |
KoDF [36](video) | 2021 | Korean | 198/205 | 62,116/175,776 | 2 | Clean | Speaker Volunteers | 90/15 (real/fake) | Acc & AuC |
ADD 2022 [21] | 2022 | Chinese | 40/40 | 3,012/24,072 | 2 | Clean | AISHELL-3 [37] | 1s to 10s | EER |
FakeAVCeleb [38](video) | 2022 | English | 250/250 | 570/25,000 | 2 | Clean & Noisy | Vox-Celeb2 [39] | 7s | AuC |
In-the-Wild [40](audio) | 2022 | English | 58 | 31,779 | 0 | Clean & Noisy | Self-collected | 4.3s | EER |
LAV-DF [41](video) | 2022 | English | 153 | 36,431/99,873 | 1 | Clean & Noisy | Vox-Celeb2 [39] | 3s to 20s | AP |
Voc.v [42](audio) | 2023 | English | 46/61 | 82,048 | 5 | Clean & Noisy | ASVspoof 2019 LA | n/a | EER |
CFAD [43](audio) | 2023 | Chinese | 1,023 | 374,000 | 12 | Clean & Noisy & Codecs | AISHELL1-3 [44], [45], MAGICDATA [46] | n/a | EER |
PartialSpoof [47](audio) | 2023 | English | 46/61 | 12,483/108,978 | 19 | Clean & Noisy | ASVspoof 2019 | 0.2s-6.4s | EER |
LibriSeVoc [48](audio) | 2023 | English | n/a | 13,201/79,206 | 6 | Clean & Noisy | LibriSpeech | 5s-34s | EER |
AV-Deepfake1M [23], [24](video) | 2023 | English | 2,068 | 286,721/860,039 | 2 | Clean & Noisy | Vox-Celeb2 [39] | 5s-35s | Acc & AuC |
MLAAD [49](audio) | 2024 | Multi-Language (23) | n/a | 76,000 | 54 | Clean & Noisy | M-AILABS [50] | n/a | Acc |
ASVspoof 2024 [25](audio) | 2024 | English | 964/958 | 188,819/815,262 | 28 | Clean & Noisy | MLS [51] | n/a | EER |
SVDD 2024 [26](audio) | 2024 | Multi-Language (6) | 59 | 12,169/72,235 | 48 | Clean | Mandarin & Japanese [27] | n/a | EER |
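
Most datasets in the table above report the equal error rate (EER), the operating point at which the false acceptance rate (fake speech accepted as real) equals the false rejection rate (real speech rejected as fake). The following is a minimal sketch of how EER can be estimated from detector scores. It assumes that a higher score means "more likely real"; the function name `compute_eer` and the toy scores are hypothetical and are not taken from any of the listed challenges, whose official scoring tools may differ in detail (for example, by interpolating between thresholds).

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumption: higher score = more likely real (bona fide) speech.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # False rejection rate: real utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: fake utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, made up purely for illustration.
real = [0.9, 0.8, 0.7, 0.3]
fake = [0.1, 0.2, 0.4, 0.6]
eer, threshold = compute_eer(real, fake)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

With highly imbalanced real/fake counts, as in several rows of the table, EER is convenient because each error rate is normalised by its own class size.
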

Table IV: DEEPFAKE SPEECH GENERATION SYSTEMS USED IN PUBLIC DSD DATASETS (TTS: TEXT TO SPEECH, VC: VOICE CONVERSION, AT: ADVERSARIAL ATTACK USING MALAFIDE OR MALACOPULA)

Datasets | Year | No. of TTS/VC/AT | Deepfake Speech Generation Systems |
---|---|---|---|
ASVspoof 2015 [15] | 2015 | 7 VC, 3 TTS | VC-01 [52], [53], VC-02 [54], TTS-01 [55], TTS-02 [55], VC-03 [56], VC-04 [57], VC-05 [57], VC-06 [58], VC-07 [59], TTS-03 [60] |
FoR [30] | 2019 | 7 TTS | Deep Voice 3, Amazon AWS Polly, Baidu TTS, Google Traditional TTS, Google Cloud TTS, Google Wavenet TTS, Microsoft Azure TTS |
ASVspoof 2019 (LA task) [16] | 2019 | 8 VC, 11 TTS | TTS-01 [61], TTS-02 [61], [62], TTS-03 [63], TTS-04 [64], VC-01 [65], VC-02 [66], TTS-05 [63], [67], TTS-06 [61], [68], TTS-07 [69], [70], TTS-08 [71], [72], TTS-09 [71], [72], [73], TTS-10 [74], VC-03+TTS [75], VC-04+TTS [76], [77], VC-05+TTS [76], [77], TTS-11 [64], VC-06 [78], [79], VC-07 [80], [81], [82], VC-08 [66] |
DFDC [32] | 2020 | 1 TTS | TTS Skins voice conversion [83] |
KoDF [36] | 2021 | 2 TTS | ATFHP [84] and Wav2Lip [85] |
ASVspoof 2021 (LA task) [20] | 2021 | 13 TTS/VC | Reused from ASVspoof 2019 |
ASVspoof 2021 (DF task) [20] | 2021 | 100 TTS/VC | Vocoders [86] |
WaveFake [33] | 2021 | 6 TTS | MelGAN [87], FB-MelGAN [87], HiFi-GAN [88], WaveGlow [89], PWG [90], MB-MelGAN [87] |
FakeAVCeleb [38] | 2022 | 2 TTS | SV2TTS [91], [92] |
In-the-Wild [40] | 2022 | n/a | n/a |
LAV-DF [41] | 2022 | 1 TTS | SV2TTS [93] |
Voc.v [42] | 2023 | 5 TTS | HiFi-GAN [88], MB-MelGAN [87], WaveGlow [89], PWG [90], Hn-NSF [94] |
CFAD [43] | 2023 | 11 TTS | STRAIGHT [95], Griffin-Lim [96], LPCNet [97], WaveNet [98], PWG [90], HiFi-GAN [99], MB-MelGAN [87], MelGAN [87], WORLD [100], FastSpeech [101], Tacotron-HifiGAN [102] |
PartialSpoof [47] | 2023 | 21 TTS/VC | Reused from ASVspoof 2019 |
LibriSeVoc [48] | 2023 | 6 TTS/VC | WaveNet [98], WaveRNN [103], MelGAN [87], Parallel WaveGAN [104], WaveGrad [105], DiffWave [106] |
AV-Deepfake1M [23], [24] | 2023 | 2 TTS | VITS [107], YourTTS [108] |
MLAAD [49] | 2024 | 54 TTS | Bark, Capacitron, FastPitch, GlowTTS, Griffin-Lim, Jenny, NeuralHMM, Overflow, Parler TTS, SpeechT5, Tacotron DDC, Tacotron2, Tacotron2 DCA, Tacotron2 DH, Tacotron2-DDC, Tortoise, VITS, VITS Neon, VITS-MMS, XTTS v1.1, XTTS v2 |
ASVspoof 2024 [25] | 2024 | 15 TTS, 6 VC, 7 AT | TTS-01 [109], TTS-02 [110], TTS-03 [111], TTS-04 [112], TTS-05 [113], TTS-06 [114], TTS-07 [115], TTS-08 (self-developed), VC-01 [116], TTS-09 [117], VC-02 [118], VC-03 (self-developed), TTS-10 [119], AT-01 (Malafide+TTS-10 [119]), TTS-11 [120], AT-02 (self-developed), TTS-12 [121], TTS-13 [122], AT-03 (Malafide+TTS [123]), VC-04 (self-developed), VC-05 [124], VC-06 (added noise), AT-04 (Malacopula+VC-06), TTS-14 [125], TTS-15 [126], AT-05 (Malacopula+AT-01), AT-06 (Malacopula+TTS-13 [122]), AT-07 (Malacopula+VC-05 [124]) |