Wespeaker Roadmap

Version 1.0 (Time: 2022.09)

This is the roadmap for wespeaker version 1.0.

  • Standard dataset support
    • VoxCeleb
    • CnCeleb
  • SOTA models support
    • x-vector (TDNN-based, milestone deep speaker embedding)
    • r-vector (ResNet-based, winner of VoxSRC 2019)
    • ECAPA-TDNN (variant of TDNN, winner of VoxSRC 2020)
  • Back-end support
    • Cosine
    • EER/minDCF
    • AS-norm
    • PLDA
  • UIO (unified I/O) for efficient processing of industrial-scale datasets
    • Online data augmentation
      • Noise and RIR
      • Speed perturbation
      • SpecAugment
  • ONNX support
  • Triton Server support (GPU)
  • Pretrained model as feature extractor
    • Training or fine-tuning large models such as WavLM may be too costly at the current stage
    • Support using features from released pretrained models (Hugging Face)
  • Basic Speaker Diarization Recipe
    • Embedding-based (more closely related to our speaker embedding learner toolkit)
  • Interactive Demo
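
To illustrate the back-end items listed above, here is a minimal sketch of cosine scoring and EER computation using NumPy. The function names `cosine_score` and `compute_eer` are hypothetical and are not taken from the wespeaker code base; the EER is approximated by a simple threshold sweep over the sorted scores and does not treat tied scores specially.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def compute_eer(scores, labels):
    """Approximate equal error rate (hypothetical helper).

    scores: similarity score per trial (higher means more likely same speaker)
    labels: 1 for target trials, 0 for non-target trials
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials from highest to lowest score.
    order = np.argsort(-scores)
    labels = labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Entry k describes the operating point that accepts the top-k trials.
    accepted_targets = np.concatenate([[0], np.cumsum(labels)])
    accepted_nontargets = np.concatenate([[0], np.cumsum(1 - labels)])

    frr = 1.0 - accepted_targets / n_target   # false rejection rate
    far = accepted_nontargets / n_nontarget   # false acceptance rate

    # The EER is taken where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)
```

In a typical verification setup, `cosine_score` would be applied to each enrollment/test embedding pair in a trial list, and the resulting scores passed to `compute_eer` together with the trial labels.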