VoxCentum-Training

Training code for VoxCentum: Spoken Language Identification for 100+ Languages Expanded to 100+ Hours.

Python 3.10.8 is recommended.

Create a conda environment and install the required packages from requirements.txt:

conda create -n voxcentum python=3.10.8
conda activate voxcentum
conda install pip
pip install -r requirements.txt
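Before installing, it may help to confirm that the activated environment actually runs the recommended interpreter. This is a minimal sketch; the function name `check_python_version` is hypothetical and not part of this repository, and it only warns on a mismatch because other 3.10.x releases may also work (an assumption, not something the authors state).

```python
import sys

# Recommended interpreter version from this README.
RECOMMENDED = (3, 10, 8)


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True if the interpreter matches the recommended version.

    Only major.minor.micro are compared. A mismatch prints a warning
    instead of raising, since nearby releases may still work.
    """
    current = tuple(version_info[:3])
    if current != RECOMMENDED:
        print(
            f"Warning: running Python {'.'.join(map(str, current))}, "
            f"recommended {'.'.join(map(str, RECOMMENDED))}"
        )
        return False
    return True


if __name__ == "__main__":
    check_python_version()
```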

Download the VoxCentum Dataset

TBD

Create Manifest Files for Training and Testing

This step creates the manifest files used for training and testing.

python generate_manifest.py --raw_data /path/to/raw_data --meta_store_path manifest

Raw data should be organized with one folder per language (nested subfolders inside each language folder are allowed):

├── /path/to/raw_data
    ├── language_x
        ...
    ├── language_y
        ...
    └── language_z
        ...
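The layout above maps naturally onto a manifest: every audio file under a top-level folder is labelled with that folder's name. The sketch below is hypothetical and is not the logic of generate_manifest.py. It assumes audio files end in `.wav`, `.flac` or `.mp3`, and that a per-language random split controlled by a hypothetical `test_ratio` parameter is acceptable; the real script's split and output format may differ.

```python
import random
from pathlib import Path

# Assumed audio extensions; adjust to match the actual dataset.
AUDIO_EXTS = {".wav", ".flac", ".mp3"}


def build_manifest(raw_data: str) -> dict[str, list[str]]:
    """Map each top-level language folder to its audio files.

    Files in nested subfolders are included and labelled with the
    top-level folder name, matching the layout described above.
    """
    manifest: dict[str, list[str]] = {}
    for lang_dir in sorted(Path(raw_data).iterdir()):
        if not lang_dir.is_dir():
            continue  # skip stray files at the root
        files = sorted(
            str(p)
            for p in lang_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in AUDIO_EXTS
        )
        if files:
            manifest[lang_dir.name] = files
    return manifest


def split_manifest(manifest, test_ratio=0.1, seed=0):
    """Split each language's files into (path, label) train/test lists.

    Splitting per language keeps every language represented in both
    sets whenever it has at least two files (hypothetical scheme).
    """
    rng = random.Random(seed)
    train, test = [], []
    for lang, files in manifest.items():
        files = files[:]  # copy so the caller's list is not shuffled
        rng.shuffle(files)
        n_test = max(1, int(len(files) * test_ratio)) if len(files) > 1 else 0
        test += [(f, lang) for f in files[:n_test]]
        train += [(f, lang) for f in files[n_test:]]
    return train, test
```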

Training

This step trains the language identification model. Check config.yaml for hyperparameters before starting.

python training.py config.yaml

Testing

Run inference with a trained checkpoint on the manifest files:

python inference.py --model_path /path/to/ckpt --manifest_dir /path/to/manifest --output /output/dir
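The format of the files written to `--output` is not documented here. As a hypothetical sketch, assuming the predictions can be loaded as (true_language, predicted_language) pairs, overall accuracy, per-language accuracy and their unweighted mean (macro accuracy, which keeps large languages from dominating) could be computed as follows. The function name `accuracy_report` is illustrative only.

```python
from collections import Counter


def accuracy_report(pairs):
    """Compute accuracy from (true_language, predicted_language) pairs.

    Returns (overall, macro, per_lang): overall is the fraction of all
    utterances predicted correctly, per_lang is accuracy per true
    language, and macro is the unweighted mean of per_lang.
    """
    total = Counter()
    correct = Counter()
    for true_lang, pred_lang in pairs:
        total[true_lang] += 1
        if true_lang == pred_lang:
            correct[true_lang] += 1
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_lang = {lang: correct[lang] / total[lang] for lang in sorted(total)}
    macro = sum(per_lang.values()) / len(per_lang) if per_lang else 0.0
    return overall, macro, per_lang
```

For example, `accuracy_report([("en", "en"), ("en", "fr"), ("fr", "fr"), ("de", "de")])` gives an overall accuracy of 0.75 and per-language accuracies of 1.0 for `de`, 0.5 for `en` and 1.0 for `fr`.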

License

MIT
