Automatic transcription of Danish using xls-r-300m-danish-nst-cv9 (ASR, automatic speech recognition)

sorenss/transcribe-danish

transcribe-danish

Automatic rough transcription of Danish using xls-r-300m-danish-nst-cv9 (a pretrained model for ASR, automatic speech recognition). The transcript is saved to a CLAN (.cha) file or a basic text file, both with pause measurements. It cannot replace human listening: the output is very rough, with odd spelling and many errors, but it can help if you want sound-linked bullets and a rough starting point.

The code runs locally on your computer, and the data is not sent to the cloud or elsewhere. An internet connection is needed on the first run to download the model and the required packages. Be aware that the transcription will not be anonymized.

How to install and use

The script requires Praat, Python and Bash. Below I describe how I run it on Linux and Windows.

Linux

Installation

The script requires Praat to be installed, which can be done with: sudo apt install praat

The Python packages torch, datasets, transformers and librosa are required:

pip install torch
pip install datasets
pip install transformers
pip install librosa
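
Before the first run, it can be handy to confirm that everything listed above is actually installed. The following is a minimal sketch, not part of this repository: the function names `missing_packages` and `praat_available` are hypothetical, and the sketch assumes that Praat is installed as an executable named `praat` on the PATH (as it is after `sudo apt install praat`).

```python
import importlib.util
import shutil

# Package names taken from the pip commands above.
REQUIRED_PACKAGES = ["torch", "datasets", "transformers", "librosa"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def praat_available():
    """Return True if an executable called `praat` is on the PATH (assumed name)."""
    return shutil.which("praat") is not None


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    if not praat_available():
        print("Praat was not found on the PATH")
    if not missing and praat_available():
        print("All requirements appear to be installed")
```

The check only looks for the packages and the executable. It does not import the packages or download the model, so it runs quickly and works offline.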

Download this repository (git clone https://github.com/sorenss/transcribe-danish.git).

How to run

From a terminal in the transcribe-danish folder, run the bash script like this:

bash transcribedanish.sh FILENAMEHERE.wav

Replace FILENAMEHERE.wav with the name of your file. Transcription takes some time; the script prints a percentage showing how far it has progressed.

The script can save the transcript as a CLAN (.cha) file or as basic text. You can specify the format after the filename; CLAN is the default, so you don't need to specify it. For a basic text transcript, run the script like this:

bash transcribedanish.sh FILENAME.wav basic

The basic transcript will be saved to a text (.txt) file.

Windows

There are probably multiple ways to run bash scripts on Windows. Possibly the easiest way for this script (and the one I use) is through WSL (Windows Subsystem for Linux). Install WSL (in addition, I had to run a command to make it work on my machine).

In WSL, you can follow the steps in the Linux guide, except that the filename MUST be soundfile.wav. Remember that Praat has to be installed within WSL, as the script cannot use the Windows installation, i.e. sudo apt install praat.

How it works

The code consists of a bash script that runs a Praat script, a Python script and some file operations. The Praat script splits the sound file at pauses into smaller 16 kHz parts and saves them to a subfolder with timestamps. The Python script sends each separate file to xls-r-300m-danish-nst-cv9 to get it transcribed, then uses the timestamps to format the transcription into the CHAT format used by CLAN (Computerized Language Analyzer), including calculation of pauses and insertion of bullets.
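
To illustrate the last step, here is a minimal sketch of how timestamped segments could be turned into CHAT-style lines with pauses and bullets. It is not the code from this repository. The function names, the `(start, end, text)` tuple layout, the 0.2 second pause threshold and the choice to put each pause on its own line are all assumptions made for illustration. The sketch relies on the CHAT convention of writing a bullet as start and end times in milliseconds, enclosed in the control character 0x15.

```python
BULLET = "\x15"  # control character that CHAT uses to delimit media time codes


def format_pause(seconds):
    """Render a pause as a timed pause in seconds, e.g. (0.8)."""
    return f"({seconds:.1f})"


def to_chat_lines(segments, speaker="SPE", min_pause=0.2):
    """Build CHAT-style lines from (start_s, end_s, text) tuples sorted by start.

    min_pause is a hypothetical threshold: shorter gaps are not written out.
    """
    lines = []
    previous_end = None
    for start, end, text in segments:
        if previous_end is not None:
            pause = start - previous_end
            if pause >= min_pause:
                # Assumed layout: the pause gets its own sound-linked line.
                gap = f"{round(previous_end * 1000)}_{round(start * 1000)}"
                lines.append(f"*{speaker}:\t{format_pause(pause)} {BULLET}{gap}{BULLET}")
        span = f"{round(start * 1000)}_{round(end * 1000)}"
        lines.append(f"*{speaker}:\t{text} {BULLET}{span}{BULLET}")
        previous_end = end
    return lines


if __name__ == "__main__":
    example = [(0.0, 1.5, "hej"), (2.3, 3.0, "goddag")]
    for line in to_chat_lines(example):
        print(line)
```

A real .cha file also needs header lines such as @Begin and @End, which are omitted here to keep the sketch focused on the pause and bullet arithmetic.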

The transcript is only a starting point and will be full of errors; these are due to how xls-r-300m-danish-nst-cv9 works, not to this script. There is no speaker identification, and every line uses SPE as the speaker's name. Speaker changes may be missed when there is no pause between speakers. The pause detection does not take inbreaths into account. Names and other information you may want anonymized are not anonymized in the transcript. Information about the automatic speech recognition, such as word error rate, can be found on the site of xls-r-300m-danish-nst-cv9.
