This is the project for the paper German End-to-end Speech Recognition based on DeepSpeech published at KONVENS 2019.
This project aims to develop a working speech-to-text module using Mozilla DeepSpeech that can be used in any audio processing pipeline. Mozilla DeepSpeech is a state-of-the-art open-source automatic speech recognition (ASR) toolkit. It uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper, and it relies on Google's TensorFlow to simplify the implementation.
This README is written for DeepSpeech v0.5.0. Refer to Mozilla DeepSpeech for the latest updates.
- Requirements
- Speech Corpus
- Language Model
- Training
- Hyper-Parameter Optimization
- Results
- Trained Models
- Acknowledgments
- References
virtualenv -p python3 deepspeech-german
source deepspeech-german/bin/activate
pip3 install -r python_requirements.txt
The necessary Linux dependencies are listed in linux_requirements.txt.
xargs -a linux_requirements.txt sudo apt-get install
$ wget
$ tar -xzvf v0.5.0.tar.gz
$ mv DeepSpeech-0.5.0 DeepSpeech
- German Distant Speech Corpus (TUDA-De) ~127h
- Mozilla Common Voice ~140h
- Voxforge ~35h
- Download the corpus
1. Tuda-De
$ mkdir tuda
$ cd tuda
$ wget
$ tar -xzvf german-speechdata-package-v2.tar.gz
2. Mozilla
$ cd ..
$ mkdir mozilla
$ cd mozilla
$ wget
3. Voxforge
$ cd ..
$ mkdir voxforge
$ cd voxforge
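from audiomate.corpus import io
# Create a downloader for the German Voxforge corpus
dl = io.VoxforgeDownloader(lang='de')
# Download into the current directory (the voxforge/ directory created above)
dl.download('.')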
- Prepare the Audio Data
$ cd ..
$ ##Tuda-De
$ git clone
$ deepspeech-german/pre-processing/ --tuda $tuda_corpus_path $export_path_data_tuda
$ ##Voxforge
$ deepspeech-german/pre-processing/
$ python3 deepspeech-german/ --voxforge $voxforge_corpus_path $export_path_data_voxforge
$ ##Mozilla Common Voice
$ python3 DeepSpeech/bin/ --filter_alphabet deepspeech-german/data/alphabet.txt $export_path_data_mozilla
NOTE: Change the path accordingly in
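DeepSpeech reads its training data from CSV manifests with the columns `wav_filename`, `wav_filesize`, and `transcript`. Before training, it can help to check that every transcript only uses characters from the alphabet. The sketch below is a minimal illustration of such a check; the function name `find_invalid_rows`, the alphabet, and the inlined sample rows are hypothetical and not part of this repository.

```python
import csv
import io


def find_invalid_rows(csv_file, alphabet):
    """Return (data_row_number, bad_chars) for transcripts with characters outside the alphabet."""
    invalid = []
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader, start=1):
        # Characters that appear in the transcript but not in the alphabet.
        bad_chars = sorted(set(row["transcript"]) - alphabet)
        if bad_chars:
            invalid.append((row_number, bad_chars))
    return invalid


# Hypothetical alphabet: lowercase letters, umlauts, ß, apostrophe and space.
# In practice it should be read from alphabet.txt.
ALPHABET = set("abcdefghijklmnopqrstuvwxyzäöüß' ")

# Hypothetical manifest contents, inlined here instead of reading a file.
SAMPLE_CSV = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,32044,guten morgen\n"
    "b.wav,48120,Über 100 grad\n"
)

print(find_invalid_rows(SAMPLE_CSV, ALPHABET))  # [(2, ['0', '1', 'Ü'])]
```

Rows flagged this way would need to be normalised or dropped, which is what the `--filter_alphabet` option is meant to handle for the Common Voice import.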
We used the KenLM toolkit, the language model inference code by Kenneth Heafield, to train a 3-gram language model.
- Installation
$ git clone
$ cd kenlm
$ mkdir -p build
$ cd build
$ cmake ..
$ make -j `nproc`
- Corpus
We used an open-source German Speech Corpus released by University of Hamburg.
- Download the data
$ wget
$ gzip -d German_sentences_8mil_filtered_maryfied.txt.gz
- Pre-process the data
$ deepspeech-german/pre-processing/ $text_corpus_path $exp_path/clean_vocab.txt
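The cleaning step is not spelled out here. As a rough illustration only, a normalisation pass for a language-model corpus might lower-case each sentence and drop every character outside the alphabet. The function `clean_sentence` and the allowed character set below are assumptions, not the repository's actual script.

```python
import re

# Assumed character set; the real one should mirror alphabet.txt.
ALLOWED = "a-zäöüß' "
_DISALLOWED = re.compile(f"[^{ALLOWED}]")
_WHITESPACE = re.compile(r"\s+")


def clean_sentence(sentence):
    """Lower-case, replace disallowed characters with spaces, collapse whitespace."""
    lowered = sentence.lower()
    stripped = _DISALLOWED.sub(" ", lowered)
    return _WHITESPACE.sub(" ", stripped).strip()


print(clean_sentence("Guten Morgen, Herr Müller!"))  # guten morgen herr müller
```

Keeping the language-model text and the transcripts in the same character set matters because the decoder can only score words that the acoustic model is able to emit.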
- Build the Language Model
$ kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/ --o 3
$ kenlm/build/bin/build_binary -T -s $exp_path/ $exp_path/lm.binary
NOTE: If a malloc exception occurs, limit memory use with -S, given as a percentage of RAM:
$ kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/ --o 3 -S 50%
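For intuition, an order-3 model is estimated from counts of word triples. The toy sketch below only counts trigrams and pads each sentence with two start markers, which is a textbook convention. It is not how KenLM is implemented (KenLM additionally applies modified Kneser-Ney smoothing), and `count_trigrams` is a hypothetical helper.

```python
from collections import Counter


def count_trigrams(sentences):
    """Count word trigrams, padding each sentence with start and end markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        # Slide a window of three tokens over the padded sentence.
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts


counts = count_trigrams(["guten morgen", "guten abend"])
print(counts[("<s>", "<s>", "guten")])     # 2
print(counts[("<s>", "guten", "morgen")])  # 1
```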
The following steps build the trie for the language model trained above.
- Build Native Client.
# The DeepSpeech tools are used to create the trie
$ git clone
$ cd tensorflow
$ git checkout origin/r1.13
$ ./configure
$ ln -s ../DeepSpeech/native_client ./
$ bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden // //native_client:generate_trie --config=cuda
Flags used to configure TensorFlow
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with ROCm support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Do you want to use clang as CUDA compiler? [y/N]: N
Do you wish to build TensorFlow with MPI support? [y/N]: N
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Refer to Mozilla's documentation for updates. We used Bazel build label 0.19.2 with DeepSpeech v0.5.0.
- Build Trie
$ DeepSpeech/native_client/generate_trie $path/alphabet.txt $path/lm.binary $exp_path/trie
Define the path of the corpus and the hyperparameters in deepspeech-german/ file.
$ nohup deepspeech-german/ &
Define the path of the corpus and the hyperparameters in deepspeech-german/ file.
$ nohup deepspeech-german/ &
Some results from our findings.
- Mozilla 79.7%
- Voxforge 72.1%
- Tuda-De 26.8%
- Tuda-De+Mozilla 57.3%
- Tuda-De+Voxforge 15.1%
- Tuda-De+Voxforge+Mozilla 21.5%
NOTE: Refer to our paper for more information.
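Assuming the percentages above are word error rates (WER), as is usual for ASR evaluation, the metric is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a minimal illustration; `word_error_rate` is a hypothetical helper, not DeepSpeech's own evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("ich gehe nach hause", "ich gehe hause"))  # 0.25
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.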
1. German to German
- Specify the checkpoint directory in
$ nohup deepspeech-german/ &
2. English to German
Replace the umlauts and ß (ä, ö, ü, ß) with ae, oe, ue, ss.
Re-build the language model, trie, and corpus.
Specify the checkpoint directory in
$ nohup deepspeech-german/ &
NOTE: The checkpoints must come from the same DeepSpeech version to perform transfer learning.
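The character replacement for the English-to-German setup can be sketched as follows. The helper `transliterate` is hypothetical and assumes the transcripts are already lower-cased; the same mapping would have to be applied consistently to the corpus transcripts and the language-model text before re-building.

```python
# Map each German special character to its ASCII replacement.
UMLAUT_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def transliterate(text):
    """Replace ä, ö, ü, ß so the text fits an English (ASCII) alphabet."""
    return text.translate(UMLAUT_MAP)


print(transliterate("schöne grüße aus köln"))  # schoene gruesse aus koeln
```

This keeps the output layer of the English checkpoint compatible with the German data, at the cost of losing the distinction between, for example, "ue" and "ü".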
The DeepSpeech model can be directly re-trained on a new dataset. The required dependencies are available at:
1. v0.5.0
This model is trained on DeepSpeech v0.5.0 with Mozilla_v3+Voxforge+Tuda-De (please refer to the paper for more details).
2. v0.6.0
This model is trained on DeepSpeech v0.6.0 with Mozilla_v4+Voxforge+Tuda-De+MAILABS (454+57+184+233h=928h)
3. v0.7.4
This model is trained on DeepSpeech v0.7.4, starting from the pre-trained English model released by Mozilla: English+Mozilla_v5+MAILABS+Tuda-De+Voxforge (1700+750+233+184+57h=2924h)
4. v0.9.0
This model is trained on DeepSpeech v0.9.0, starting from the pre-trained English model released by Mozilla: English+Mozilla_v5+SWC+MAILABS+Tuda-De+Voxforge (1700+750+248+233+184+57h=3172h)
Thanks to @koh-osug for providing the TFLite model.
Why be shy about starring the repository if you use the resources? :D
- Release model for DeepSpeech-v0.6.0
- Release model for DeepSpeech-v0.7.4
- Release model for DeepSpeech-v0.9.0
- Add datasets - SWC
If you use our findings/scripts in your academic work, please cite:
author = "Aashish Agarwal and Torsten Zesch",
title = "German End-to-end Speech Recognition based on DeepSpeech",
booktitle = "Preliminary proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019): Long Papers",
year = "2019",
address = "Erlangen, Germany",
publisher = "German Society for Computational Linguistics \& Language Technology",
pages = "111--119"