Roadmap | Awesome Papers | Runtime (x86_gpu) | Pretrained Models | Huggingface Demo
WeSpeaker mainly focuses on speaker embedding learning, with application to the speaker verification task. It supports both online feature extraction and loading pre-extracted features in Kaldi format.
- Clone this repo
git clone https://github.com/wenet-e2e/wespeaker.git
- Create conda env (PyTorch version >= 1.10.0 is required)
conda create -n wespeaker python=3.9
conda activate wespeaker
conda install pytorch=1.10.1 torchaudio=0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
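After installation, it can be useful to confirm that the environment meets the PyTorch >= 1.10.0 requirement. The following is a minimal sketch, not part of WeSpeaker itself: `parse_version` and `meets_requirement` are hypothetical helper names introduced here for illustration. Versions are compared as integer tuples because a plain string comparison would rank "1.9" above "1.10".

```python
import re

# Minimum PyTorch version stated in the installation notes.
MIN_TORCH = (1, 10, 0)


def parse_version(version):
    """Turn a version string such as '1.10.1+cu113' into an integer tuple."""
    core = str(version).split("+")[0]  # drop a local build tag like '+cu113'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def meets_requirement(version, minimum=MIN_TORCH):
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch", torch.__version__,
              "OK" if meets_requirement(torch.__version__) else "too old")
```

Running the script inside the activated `wespeaker` environment prints the installed version and whether it satisfies the requirement.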
- VoxCeleb: Speaker Verification recipe on the VoxCeleb dataset
- 🔥 UPDATE 2022.7.19: We apply the same setup as the CNCeleb recipe and obtain state-of-the-art performance among open-source systems
- 🔥 EER/minDCF on the vox1-O-clean test set are 0.723%/0.069 (ResNet34) and 0.728%/0.099 (ECAPA_TDNN_GLOB_c1024), after large-margin (LM) fine-tuning and AS-Norm
- CNCeleb: Speaker Verification recipe on the CNCeleb dataset
- VoxConverse: 🔥 UPDATE 2022.7.2: Diarization recipe on the VoxConverse dataset
- Model (SOTA Models)
- Pooling Functions
- TAP(mean) / TSDP(std) / TSTP(mean+std)
- Comparison of mean/std pooling can be found in shuai_iscslp, anna_arxiv
- Attentive Statistics Pooling (ASTP)
- mainly for ECAPA_TDNN
- Criteria
- Scoring
- Cosine
- PLDA
- Score Normalization (AS-Norm)
- Metric
- EER
- minDCF
- Online Augmentation
- Noise && RIR
- Speed Perturb
- SpecAug
- Literature
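To make a few of the components listed above concrete, here is a minimal pure-Python sketch of TSTP (mean+std) pooling, cosine scoring, and EER computation. It is an illustration under simplifying assumptions, not WeSpeaker's implementation: the function names `tstp_pool`, `cosine_score`, and `compute_eer` are hypothetical, the standard deviation uses the population estimator, and the EER is approximated by sweeping thresholds over the observed scores without interpolation.

```python
import math


def tstp_pool(frames):
    """Pool frame-level features (T x D) into one vector of size 2*D.

    Concatenates the per-dimension mean and standard deviation over time.
    Assumption: population std (divide by T); toolkits may differ here.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def cosine_score(a, b):
    """Cosine similarity between two embeddings of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate from trial scores.

    For each candidate threshold, a trial is accepted when score >= threshold.
    Returns the average of the false acceptance and false rejection rates
    at the threshold where the two are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Two toy "utterances", each with two frames of 2-dimensional features.
    emb_a = tstp_pool([[1.0, 2.0], [3.0, 2.0]])
    emb_b = tstp_pool([[1.5, 2.0], [2.5, 2.0]])
    print("score:", cosine_score(emb_a, emb_b))
    print("EER:", compute_eer([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
```

In a real system the pooled vector would pass through further layers before scoring, and the scores would typically be normalized (for example with AS-Norm) before computing EER and minDCF.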
For Chinese users, you can scan the QR code on the left to follow the official account of the WeNet Community.
We also created a WeChat group for better discussion and quicker response. Please scan the QR code on the right to join the chat group.
If you are interested in contributing, feel free to contact @wsstriving or @robin1001.