Inference Run Examples

Vineel Pratap edited this page Jan 14, 2020 · 42 revisions

Tutorial for running the examples

The supplied examples can be used directly to quickly bootstrap a demo. There are two executables:

  • simple_streaming_asr_example can be used to quickly transcribe a single audio file.
  • multithreaded_streaming_asr_example can be used to quickly transcribe many audio files.

Download the example trained models from AWS S3

~$> mkdir model
~$> cd model
~/model$> for f in acoustic_model.bin arch.txt decoder_options.json feature_extractor.bin language_model.bin lexicon.txt tokens.txt ; do wget http://dl.fbaipublicfiles.com/wav2letter/inference/examples/model/${f} ; done

~/model$>ls -sh
total 270M
254M acoustic_model.bin  
1.0K arch.txt	 
512 decoder_options.json   
512 feature_extractor.bin   
13M language_model.bin	
4.0M lexicon.txt   
82K tokens.txt
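
A partial download is easy to miss in a loop of wget calls, so it can help to check the directory before moving on. The following is a minimal sketch, not part of wav2letter: check_model_dir is a hypothetical helper that only tests that each of the seven files named above exists and is non-empty. It does not validate their contents.

```shell
# Hypothetical helper (not part of wav2letter): report which of the expected
# model files are missing or empty in the given directory.
check_model_dir() {
  local dir="$1" missing=0 f
  for f in acoustic_model.bin arch.txt decoder_options.json \
           feature_extractor.bin language_model.bin lexicon.txt tokens.txt; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f"
      missing=1
    fi
  done
  # Exit status 0 means every file was found, 1 means at least one was not
  return "$missing"
}
```

For example, check_model_dir ~/model && echo "all model files present" prints the confirmation only when nothing is missing.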

Download LibriSpeech audio samples from openslr.org

~$> mkdir audio
~$> cd audio
~/audio$> wget -qO- http://www.openslr.org/resources/12/dev-clean.tar.gz | tar xvz
~/audio$> find LibriSpeech/dev-clean -type f -name "*.flac" -exec sox {} {}".wav"  \;
~/audio$> find LibriSpeech/dev-clean -type f -name "*.wav" > LibriSpeech-dev-clean-wav.lst

We should have 2703 audio files.

~/audio$> wc -l LibriSpeech-dev-clean-wav.lst
2703 LibriSpeech-dev-clean-wav.lst
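
If the count is off, the usual cause is that some .flac files were never converted. As a rough sketch (count_audio is a hypothetical helper, not something shipped with wav2letter), the two file types can be counted side by side:

```shell
# Hypothetical helper: compare the number of .flac files with the number of
# converted .wav files under a directory.
count_audio() {
  local dir="$1" flac wav
  # "x.flac.wav" ends in .wav, so it is counted only by the second find
  flac=$(find "$dir" -type f -name "*.flac" | wc -l)
  wav=$(find "$dir" -type f -name "*.wav" | wc -l)
  # $((...)) strips the leading spaces that some wc implementations print
  echo "flac=$((flac)) wav=$((wav))"
  # Exit status 0 only when every .flac has a matching count of .wav files
  [ "$((flac))" -eq "$((wav))" ]
}
```

Running count_audio LibriSpeech/dev-clean from ~/audio prints both counts and returns a non-zero status when they differ.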

Simple Streaming ASR Example

simple_streaming_asr_example can be used in a Unix pipe to print the transcription of a WAV stream read from stdin.

~/wav2letter/build$> make simple_streaming_asr_example
~/wav2letter/build$> cat ~/audio/LibriSpeech/dev-clean/777/126732/777-126732-0070.flac.wav | \
      inference/inference/examples/simple_streaming_asr_example \
         --input_files_base_path ~/model

Started features model file loading ...
Completed features model file loading elapsed time=46557 microseconds

Started acoustic model file loading ...
Completed acoustic model file loading elapsed time=2058 milliseconds

Started tokens file loading ...
Completed tokens file loading elapsed time=1318 microseconds

Tokens loaded - 9998 tokens
Started decoder options file loading ...
Completed decoder options file loading elapsed time=388 microseconds

Started create decoder ...
[Letters] 9998 tokens loaded.
[Words] 200001 words loaded.
Completed create decoder elapsed time=884 milliseconds

Started converting audio input from stdin to text... ...
Creating LexiconDecoder instance.
#start (msec), end(msec), transcription
0,1000,
1000,2000,he was out of his
2000,3000,mind with something
3000,4000,he overheard about eating
4000,5000,people's flesh
5000,6000,and drinking blood
6000,7000,what's the good of
7000,7315,of talking like that
Completed converting audio input from stdin to text... elapsed time=1302 milliseconds

We can verify the transcription quality by comparing the output against the reference transcript of the audio file. The reference is looked up by utterance number, which is 0070 in this case (~/audio/LibriSpeech/dev-clean/777/126732/777-126732-0070.flac.wav).

~/wav2letter/build$> grep 0070 ~/audio/LibriSpeech/dev-clean/777/126732/777-126732.trans.txt
777-126732-0070 HE WAS OUT OF HIS MIND WITH SOMETHING HE OVERHEARD ABOUT EATING PEOPLE'S FLESH AND DRINKING BLOOD WHAT'S THE GOOD OF TALKING LIKE THAT
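
To compare the two by eye, it helps to bring them into the same shape: one lowercase line each. The sketch below assumes the "start,end,text" line format shown in the output above. join_chunks and reference_text are hypothetical helpers, not part of wav2letter. An exact match should not be expected: in the run above the last two chunks both contain "of", so the joined text reads "of of" where the reference has a single "of".

```shell
# Hypothetical helper: read the example's output on stdin, keep only the
# "start,end,text" lines, and print the chunk texts joined into one line.
join_chunks() {
  awk '/^[0-9]+,[0-9]+,/ {
    text = $0
    sub(/^[0-9]+,[0-9]+,/, "", text)   # keep everything after the second comma
    if (text != "") {                  # skip chunks with no transcription
      if (out != "") out = out " "
      out = out text
    }
  }
  END { print out }'
}

# Hypothetical helper: drop the utterance id from a LibriSpeech transcript
# line read on stdin and lowercase the remaining words.
reference_text() {
  cut -d' ' -f2- | tr '[:upper:]' '[:lower:]'
}
```

The output of the example can be piped into join_chunks, and the grep result above into reference_text, to print the hypothesis and the reference one under the other.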

simple_streaming_asr_example can also transcribe a single file when its path is passed directly with --input_audio_file:

~/wav2letter/build$> make simple_streaming_asr_example
~/wav2letter/build$> inference/inference/examples/simple_streaming_asr_example --input_files_base_path ~/model --input_audio_file ~/audio/LibriSpeech/dev-clean/777/126732/777-126732-0076.flac.wav
...
#start (msec), end(msec), transcription
0,1000,
1000,2000,i wish he
2000,3000,had never been to school
3000,4000,missus
4000,4260,began again brusquely
Completed converting audio input file=/private/home/avidov/audio/LibriSpeech/dev-clean/777/126732/777-126732-0076.flac.wav to text... elapsed time=914 milliseconds

Multithreaded Streaming ASR Example

multithreaded_streaming_asr_example can convert a large list of audio files using multiple threads.

~/wav2letter/build$> mkdir ~/audio/LibriSpeech-dev-clean-transcribed
~/wav2letter/build$> make multithreaded_streaming_asr_example
~/wav2letter/build$> inference/inference/examples/multithreaded_streaming_asr_example   --input_audio_file_of_paths ~/audio/LibriSpeech-dev-clean-wav.lst  --output_files_base_path ~/audio/LibriSpeech-dev-clean-transcribed  --input_files_base_path=$HOME/model --max_num_threads $(nproc)
...
audioFileToWordsFile() processing 2699/2703 input=/private/home/avidov/model/LibriSpeech/dev-clean/3853/163249/3853-163249-0033.flac.wav output=/private/home/avidov/audio/LibriSpeech-dev-clean-transcribed/3853-163249-0033.flac.wav.txt
audioFileToWordsFile() processing 2700/2703 input=/private/home/avidov/model/LibriSpeech/dev-clean/3853/163249/3853-163249-0016.flac.wav output=/private/home/avidov/audio/LibriSpeech-dev-clean-transcribed/3853-163249-0016.flac.wav.txt
audioFileToWordsFile() processing 2701/2703 input=/private/home/avidov/model/LibriSpeech/dev-clean/3853/163249/3853-163249-0015.flac.wav output=/private/home/avidov/audio/LibriSpeech-dev-clean-transcribed/3853-163249-0015.flac.wav.txt
audioFileToWordsFile() processing 2702/2703 input=/private/home/avidov/model/LibriSpeech/dev-clean/3853/163249/3853-163249-0027.flac.wav output=/private/home/avidov/audio/LibriSpeech-dev-clean-transcribed/3853-163249-0027.flac.wav.txt
audioFileToWordsFile() processing 2703/2703 input=/private/home/avidov/model/LibriSpeech/dev-clean/3853/163249/3853-163249-0043.flac.wav output=/private/home/avidov/audio/LibriSpeech-dev-clean-transcribed/3853-163249-0043.flac.wav.txt
exit ThreadPool::~ThreadPool()
Completed converting audio input files to text elapsed time=85610 microseconds
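
After a batch run it is worth checking that every input produced an output file. The sketch below assumes the naming seen in the log lines above, where each input's output is its base name plus ".txt" inside the output directory. missing_transcripts is a hypothetical helper, not part of wav2letter.

```shell
# Hypothetical helper: print every path from the input list that has no
# "<basename>.txt" file in the output directory.
missing_transcripts() {
  local list="$1" outdir="$2" path
  # The "|| [ -n ... ]" part also handles a last line without a newline
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue          # ignore blank lines in the list
    [ -e "$outdir/$(basename "$path").txt" ] || echo "$path"
  done < "$list"
}
```

Called as missing_transcripts ~/audio/LibriSpeech-dev-clean-wav.lst ~/audio/LibriSpeech-dev-clean-transcribed, it prints nothing when all inputs have a transcript.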