This folder contains a shell script and a task workflow to integrate Whisper AI into Opencast via external code execution.
While there are many ways to deploy Whisper, this script uses it as a REST API server. With this type of implementation, any worker can use the power of a single GPU, making scaling and configuration easier.
- Run the docker container whisper-asr-webservice on a machine with network access to the other workers (not necessary if that machine is itself an Opencast worker); you can use `whisper-generate.sh` as a starting point (see the sketch after this list).
- Copy the script file `whisper-getvtt.sh` into the scripts folder in Opencast (normally `/etc/opencast/scripts`).
- Allow `whisper-getvtt.sh` to be run in the `org.opencastproject.execute.impl.ExecuteServiceImpl.cfg` configuration file (a snippet follows this list).
- Copy `action-transcribe-event.xml` to the Opencast `workflows` folder.
- **Important:** configure the workflow file with your settings and parameters.
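For the container, a minimal sketch of a starting point (the image name, tag, and `ASR_MODEL` variable follow the whisper-asr-webservice README; the actual `whisper-generate.sh` may differ):

```bash
# Run whisper-asr-webservice on the default port 9000 (GPU variant);
# drop --gpus all and use the :latest tag for CPU mode
docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  onerahmet/openai-whisper-asr-webservice:latest-gpu
```

Allowing the script in the Execute Service configuration might then look like this (the `commands.allowed` key follows the Opencast Execute Service documentation):

```properties
# org.opencastproject.execute.impl.ExecuteServiceImpl.cfg
# Space-separated allow-list of commands the Execute Service may run
commands.allowed=whisper-getvtt.sh
```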
- The container can run in CPU mode or GPU mode. GPU is much faster: for example, with the base model, transcribing a 2-hour video takes more than an hour on CPU, but about 5 minutes on a GPU (Quadro P4000).
- Each model reserves a part of the video RAM to work. Depending on your GPU, you may not be able to run all language models at the same time, or to run some at all (for example, the large model needs more than 10 GB to work!).
- If you need to look up the available REST endpoints, go to `{{ your_server }}:{{ port }}/docs` (a quick reachability check follows this list).
- You can find more information on the container's GitHub page.
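To check that a worker can actually reach the server, a quick request against the docs page helps (host and port below are placeholders):

```bash
# Expect HTTP 200 if the whisper-asr-webservice container is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://whisper.example.org:9000/docs
```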
The script is very simple: it extracts the audio from the video, sends the audio to the Whisper server, and retrieves the VTT subtitles. A sketch of this flow appears after the parameter list below.
```bash
$ ./whisper-getvtt.sh {whisperServer} {videoFile} {eventId} {outputVttFile} translate*
```
Where:

- `whisperServer`: the address where the server is running (e.g. `localhost:9000`)
- `videoFile`: the video file to transcribe or translate
- `eventId`: the event ID from Opencast
- `outputVttFile`: the output VTT subtitles file
- `translate`: optional; if present, the transcription will be translated to English
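As a reference, here is a minimal sketch of that flow, assuming ffmpeg and curl are installed and the `/asr` endpoint of whisper-asr-webservice; the real `whisper-getvtt.sh` may differ:

```bash
#!/bin/bash
whisperServer="$1"; videoFile="$2"; eventId="$3"; outputVttFile="$4"
task="transcribe"
[ "$5" = "translate" ] && task="translate"

# Extract the audio track from the video (16 kHz mono keeps the upload small)
audioFile="/tmp/${eventId}.wav"
ffmpeg -y -i "$videoFile" -vn -ac 1 -ar 16000 "$audioFile"

# Send the audio to the Whisper server and store the returned VTT subtitles
curl -s -F "audio_file=@${audioFile}" \
  "http://${whisperServer}/asr?task=${task}&output=vtt" \
  -o "$outputVttFile"

rm -f "$audioFile"
```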
Finally, run the script manually to make sure it can reach the Whisper server.
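For example (server address and file names here are placeholders):

```bash
$ ./whisper-getvtt.sh localhost:9000 /tmp/lecture.mp4 1234-abcd /tmp/lecture.vtt
```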
The workflow is a template; you need to configure it for your setup before using it. Some things to take into account:
- As it is, the workflow will remove existing subtitles flavored `captions/vtt+de` or `captions/vtt+en` and generate new ones.
- In the `configuration_panel` field, you can disable the models that you will not use. Simply add `document.getElementById("ID").disabled = true;` in the script part, for example:

  ```html
  <script>
    document.getElementById("mTiny").disabled = true;
    document.getElementById("mSmall").disabled = true;
    document.getElementById("mLarge").disabled = true;
  </script>
  ```
- Set the correct server in the `conditional-config` WoH for each model.
- In the execution WoHs, take note of the captions tag: the normal transcription is tagged for German (`vtt+de`); change the last two letters to your corresponding language beforehand (a sketch of such an operation follows this list).
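For orientation, a minimal sketch of an execution WoH based on Opencast's `execute-once` operation; the configuration keys follow the Opencast Execute Service documentation, while the server address, flavor, and expected type are placeholder assumptions, and `action-transcribe-event.xml` may structure this differently:

```xml
<operation id="execute-once" description="Transcribe event with Whisper">
  <configurations>
    <!-- The script must be allowed in ExecuteServiceImpl.cfg (see above) -->
    <configuration key="exec">whisper-getvtt.sh</configuration>
    <!-- #{flavor(...)}, #{id} and #{out} are substituted by the Execute Service -->
    <configuration key="params">whisper.example.org:9000 #{flavor(presenter/source)} #{id} #{out}</configuration>
    <configuration key="output-filename">captions.vtt</configuration>
    <configuration key="expected-type">Attachment</configuration>
    <!-- Adjust the language suffix (+de) to your setup, as noted above -->
    <configuration key="target-flavor">captions/vtt+de</configuration>
  </configurations>
</operation>
```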
- OpenAI for Whisper
- ahmetoner, for the creation of the Whisper docker image with REST API endpoints
- This script is brought to you by the University of Cologne RRZK. Author: Maximiliano Lira Del Canto.