thetobysiu/audio-silence-normalize

Intro

Normalize the silence duration of audio to make comma/silence trainable in DCTTS.

An audio clip sometimes contains multiple sentences, and the pauses between those sentences vary in length, so the audio data must be pre-processed before it can be used in DCTTS.

It first splits the audio so that all silence is removed, and then inserts a fixed duration of silence between the split audio clips.
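The split-then-rejoin idea can be sketched in a few lines of plain Python. This is only an illustration on a list of float samples, not the code in split.py or combine.py: the function names, the amplitude `threshold`, and the `min_silence` and `gap` lengths (counted in samples) are all assumptions made for the example.

```python
from typing import List


def split_on_silence(samples: List[float], threshold: float = 0.01,
                     min_silence: int = 3) -> List[List[float]]:
    """Split samples into non-silent chunks (hypothetical helper).

    A run of at least `min_silence` consecutive samples with absolute
    value below `threshold` is treated as a pause and removed. Shorter
    quiet runs inside speech are kept in the surrounding chunk.
    Leading and trailing quiet samples are dropped.
    """
    chunks: List[List[float]] = []
    current: List[float] = []
    quiet: List[float] = []  # quiet samples seen since the last loud one
    for s in samples:
        if abs(s) < threshold:
            quiet.append(s)
            continue
        if len(quiet) >= min_silence:
            # A real pause: close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = []
        elif current:
            # A short dip inside speech: keep it.
            current.extend(quiet)
        quiet = []
        current.append(s)
    if current:
        chunks.append(current)
    return chunks


def normalize_silence(samples: List[float], gap: int = 2,
                      threshold: float = 0.01,
                      min_silence: int = 3) -> List[float]:
    """Rejoin the non-silent chunks with exactly `gap` zero samples between them."""
    out: List[float] = []
    for i, chunk in enumerate(split_on_silence(samples, threshold, min_silence)):
        if i > 0:
            out.extend([0.0] * gap)
        out.extend(chunk)
    return out


if __name__ == "__main__":
    clip = [0.0, 0.0, 0.5, 0.4, 0.0, 0.3,
            0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.7, 0.0]
    print(normalize_silence(clip, gap=2))
```

In this toy clip the five-sample pause is replaced by a two-sample gap, while the single quiet sample inside the first chunk is left alone. Real audio would need a threshold and durations expressed in dB and milliseconds rather than raw sample counts.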

Steps

Usage: X.py Geralt (X.py stands for any of the scripts below; Geralt is the name of the character audio folder)

Place the respective character's audio folder in the root and run split.py; a Geralt_output folder will be created containing the split clips.
(optional) Select audio clips that are very short (likely sighs and "hmm" sounds) and move them into a folder named {voice}_test, e.g. Geralt_test.
(optional) Run transcribe.py; it transcribes all the clips and creates {voice}_transcription.csv.
(optional) Run move.py; it reads {voice}_transcription.csv and moves all the files that have a transcription from the test folder back to the output folder.
(optional) Run rename.py; it renames the remaining clips in the test folder, using the sentence as the filename, for easier manual checking.
(optional) After checking and deleting, the clips remaining in the test folder are the ones to keep; run clean.py to restore their original names and move them back to the output folder.
Run combine.py to merge the clips and insert a fixed silence between them. A folder {voice}_combined will be created.
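As an illustration of the optional review loop, the step that moves transcribed clips back to the output folder could look roughly like the sketch below. It is not the contents of move.py. The CSV layout (filename in the first column, transcription in the second, no header row) and the function name `move_transcribed` are assumptions; only the folder and file naming pattern ({voice}_test, {voice}_output, {voice}_transcription.csv) comes from the steps above.

```python
import csv
import shutil
from pathlib import Path
from typing import List


def move_transcribed(voice: str, root: Path = Path(".")) -> List[str]:
    """Move clips that have a non-empty transcription from the test folder
    back to the output folder. Returns the names of the moved files.

    Assumed CSV layout (hypothetical): filename, transcription; no header.
    """
    test_dir = root / f"{voice}_test"
    out_dir = root / f"{voice}_output"
    csv_path = root / f"{voice}_transcription.csv"
    out_dir.mkdir(exist_ok=True)

    moved: List[str] = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip malformed or empty rows
            name, text = row[0], row[1].strip()
            src = test_dir / name
            # Only clips that were actually transcribed go back.
            if text and src.is_file():
                shutil.move(str(src), str(out_dir / name))
                moved.append(name)
    return moved
```

Clips with an empty transcription stay in the test folder, which matches the idea that they are the ones left for manual checking.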

Tools

convert_16k.sh converts all audio files to 16k (required for DeepSpeech transcription)
convert_22k.sh converts all audio files to 22k
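What the shell scripts do internally is not shown here. As a rough equivalent, the sketch below builds an ffmpeg resampling command from Python. The use of ffmpeg, the helper name `build_resample_cmd`, and the file names are assumptions, as is reading "22k" as 22050 Hz. The `-i`, `-ar` (audio sample rate) and `-y` (overwrite output) options are standard ffmpeg flags.

```python
import subprocess
from typing import List


def build_resample_cmd(src: str, dst: str, rate: int) -> List[str]:
    """Build an ffmpeg command that resamples `src` to `rate` Hz (hypothetical helper)."""
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate), dst]


def resample(src: str, dst: str, rate: int) -> None:
    """Run the command; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_resample_cmd(src, dst, rate), check=True)


if __name__ == "__main__":
    # 16000 Hz for transcription; 22050 Hz is an assumed meaning of "22k".
    print(build_resample_cmd("clip.wav", "clip_16k.wav", 16000))
```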

