Implementation of the QuartzNet ASR model in PyTorch
To build the nvidia-docker container for training and inference, follow these instructions:
- Install nvidia-docker
- Run `./docker-build.sh`
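
Presumably `docker-build.sh` wraps a plain `docker build`; a minimal sketch of the equivalent command, where the image tag is an assumption and not taken from the repo:

```sh
# Assumed equivalent of docker-build.sh; the `quartznet` tag is a guess.
docker build -t quartznet .
```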
To launch training follow these instructions:
- Set preferred configurations in `config/config.yaml`. In particular, you might want to set `dataset`: it can be either `numbers` or `librispeech` (see the sketch after this list)
- In `docker-run.sh`, change `memory`, `memory-swap`, `shm-size`, `cpuset-cpus`, `gpus`, and the data `volume` to desired values (an example invocation is sketched below)
- Set the `WANDB_API_KEY` environment variable to your wandb key
- Run `./docker-train.sh`
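
For example, switching datasets might look like this. Only the `dataset` key and its two values come from this README; the exact layout of `config/config.yaml` is an assumption:

```sh
# Hypothetical: flip the `dataset` key in config/config.yaml from the shell.
# The file's actual structure may differ; edit it by hand if in doubt.
sed -i 's/^dataset:.*/dataset: librispeech/' config/config.yaml
```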
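
The resource settings in `docker-run.sh` correspond to standard `docker run` flags. A minimal sketch of such an invocation; the image tag, mount paths, and concrete limits are illustrative assumptions, not values from the repo:

```sh
# All concrete values below are placeholders; tune them to your machine.
# Make the wandb key available first:
export WANDB_API_KEY="your-key-here"

docker run \
    --memory=16g \
    --memory-swap=24g \
    --shm-size=8g \
    --cpuset-cpus=0-7 \
    --gpus=all \
    -v "$(pwd)/data:/workspace/data" \
    -e WANDB_API_KEY="$WANDB_API_KEY" \
    quartznet
```

A generous `--shm-size` matters here because PyTorch DataLoader workers exchange batches through shared memory.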
All outputs, including models, will be saved to the `outputs` dir.
To launch inference, run the following command:

```sh
./docker-inference.sh model_path device bpe_path input_path
```
Where:
- `model_path` is a path to a .pth model file
- `device` is the device to run inference on: either 'cpu', 'cuda', or a cuda device number
- `bpe_path` is a path to a yttm bpe `.model` file
- `input_path` is a path to the input audio file to parse text from
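
For example, a hypothetical invocation (all file names below are placeholders, not artifacts shipped with the repo):

```sh
./docker-inference.sh outputs/model.pth cuda outputs/bpe.model sample.wav
```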
Predicted output will be printed to stdout and saved into a file in the `inferenced` folder.
My current best model trained on librispeech, together with the respective config, can be downloaded here. It is not very good, however, because I only trained it to ~59 WER on librispeech.