This is the roadmap for WeSpeaker version 1.0.
- Standard dataset support
  - VoxCeleb
  - CnCeleb
- SOTA models support
  - x-vector (TDNN-based, a milestone deep speaker embedding)
  - r-vector (ResNet-based, winner of VoxSRC 2019)
  - ECAPA-TDNN (a TDNN variant, winner of VoxSRC 2020)
- Back-end support
  - Cosine
  - EER/minDCF
  - AS-norm
  - PLDA
- UIO (unified IO) for effective processing of industrial-scale datasets
- Online data augmentation
  - Noise and RIR (room impulse response)
  - Speed perturbation
  - SpecAugment
- ONNX support
- Triton Server support (GPU)
- Pretrained model as feature extractor
  - Training or fine-tuning large models such as WavLM may be too costly at the current stage
  - Support using features from released pretrained models (Hugging Face)
- Basic speaker diarization recipe
  - Embedding-based (more closely related to our speaker embedding learning toolkit)
- Interactive demo
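
To make the back-end items above concrete, here is a minimal NumPy sketch of cosine scoring, EER computation, and AS-norm. The function names (`cosine_score`, `compute_eer`, `as_norm`) and the `top_n` parameter are illustrative assumptions, not the toolkit's actual API, and the EER routine assumes untied scores.

```python
import numpy as np


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding."""
    enroll = np.asarray(enroll, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def compute_eer(scores, labels):
    """Equal error rate from trial scores and 0/1 labels (1 = target trial).

    Sweeps the decision threshold over the sorted scores and returns the
    point where false-accept and false-reject rates are closest.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=int)
    labels = labels[np.argsort(-scores)]  # sort trials by descending score
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    # Entry k describes the operating point that accepts the top-k trials.
    far = np.concatenate([[0], np.cumsum(1 - labels)]) / n_non
    frr = 1.0 - np.concatenate([[0], np.cumsum(labels)]) / n_tar
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)


def as_norm(score, enroll_cohort_scores, test_cohort_scores, top_n=300):
    """Adaptive symmetric score normalization (AS-norm).

    Each side is normalized with the mean and standard deviation of its
    top_n highest cohort scores, and the two normalized scores are averaged.
    """
    e_top = np.sort(np.asarray(enroll_cohort_scores, dtype=np.float64))[-top_n:]
    t_top = np.sort(np.asarray(test_cohort_scores, dtype=np.float64))[-top_n:]
    z_enroll = (score - e_top.mean()) / e_top.std()
    z_test = (score - t_top.mean()) / t_top.std()
    return float(0.5 * (z_enroll + z_test))
```

In this sketch the cohort scores are assumed to be cosine scores of each trial side against a separate set of cohort embeddings; PLDA and minDCF are left out to keep the example short.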