This project packages the Whisper AI models in an easily accessible, simple-to-use containerized web app with a Terraform deployment.
There is no need to mess with config files and pip dependencies; everything comes packaged in a single Docker container, ready to be used.
The project deploys its infrastructure using Terraform on the AWS cloud provider, with an instance type of t2.large.
Check out the project on Docker Hub - Docker Hub
This repository hosts a Flask web application integrated with Audi2Text for speech-to-text functionality, Whisper messaging for secure communication, and a feature for downloading text. The README provides instructions on running the application.
Docker Hub Overview:
This Docker image contains the Flask app configured with Audi2Text and Whisper messaging capabilities. It simplifies deployment and offers flexibility for various environments.
Why This Image: This image provides a convenient setup for an Audi2Text integration using a Python Flask app and Whisper messaging services, enabling easy deployment and use of the audio-to-text, secure messaging, and download features.
Overview: This Docker image facilitates the deployment of a Flask app with Audi2Text integration and Whisper messaging, streamlining the process of setting up a web application with advanced functionalities. It serves as a foundation for building speech-to-text enabled applications with secure communication capabilities.
Getting Started
These instructions will get you a copy of the project up and running on your local machine or a development server for development and testing purposes.
Pre-requisites:
- Docker installed on the host machine or server
Deployment Options:
- Docker container only
The container runs on port 5000. Launch and run it using the command below.
docker run -d --name=audio2txt -p 5000:5000 asharshith/audio2txtdowload:v1.0.0
Browse to http://your-host-ip:5000 to access the web UI
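The container may need a moment after `docker run` before the web UI responds. The snippet below is a minimal sketch of a readiness check using only the Python standard library. `wait_for_ui` is a hypothetical helper, not part of this project, and it assumes the app answers a plain HTTP GET on the root path.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url, attempts=10, delay=1.0, timeout=2.0):
    """Return True once `url` answers an HTTP request, False after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: nothing is listening yet.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

For example, `wait_for_ui("http://localhost:5000")` polls the default port used above; replace `localhost` with your host IP when checking a remote server.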
You can also build the container locally. Just clone this repository
git clone https://github.com/hrshith/audio2text-download.git
Then change into the directory
cd audio2text-download
Build the container
docker build -t audio2txt-download .
Finally, once the container is built, you can launch it using the command
docker run -d --name=audio2txt -p 5000:5000 audio2txt-download
Create the Terraform configuration file. It is available in this repository as infrastructure.tf.
In the infrastructure.tf file, specify the essential resources for the application, including the Docker images and their Docker Hub details. The file also deploys an AWS EC2 instance with the appropriate configuration and specifications, as outlined in the provided statement. Detailed instructions can be found in this readme file.
Initialize Terraform: Before applying any changes, you need to initialize Terraform in the directory containing your configuration files. Run the command below to initialize the Terraform working directory, which installs the necessary plugins, configures the backend, and downloads referenced modules.
terraform init
Run the command below to check whether the configuration is valid.
terraform validate
Run the command below to see what changes Terraform will make to your AWS infrastructure.
terraform plan
Run the apply command below to enact the planned changes to your AWS infrastructure.
terraform apply
Confirm changes: Terraform will prompt you to confirm the changes before applying them. Review the changes carefully and type yes to confirm and proceed.
When you are done and want to clean up the resources, you can destroy the Terraform-managed infrastructure with the following command:
terraform destroy
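The init, validate, plan, and apply steps can also be scripted so that a failure in one step stops the rest. The following is a minimal sketch, not part of this repository. `run_workflow` and its `runner` parameter are hypothetical names, and it assumes the `terraform` binary is on the PATH and that `workdir` contains infrastructure.tf.

```python
import subprocess

# The same commands the steps in this README run by hand, in order.
TERRAFORM_STEPS = [
    ["terraform", "init"],
    ["terraform", "validate"],
    ["terraform", "plan"],
    ["terraform", "apply"],
]


def run_workflow(workdir, runner=subprocess.run):
    """Run each step inside `workdir`, stopping at the first failure.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in TERRAFORM_STEPS:
        result = runner(cmd, cwd=workdir)
        if result.returncode != 0:
            # Do not plan or apply on top of a failed earlier step.
            break
        completed.append(cmd)
    return completed
```

Because standard input is inherited, `terraform apply` still prompts for the yes confirmation. The `runner` argument exists only so the sequence can be exercised without touching real infrastructure.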
The container runs the base model of Whisper by default. If you want to change it, follow the instructions below. (For future builds I am hoping to incorporate this into the docker run command.)
- Once the container is running, enter it:
docker exec -it audio2txt bash
- Look for the text.py file and open it (you can install and use an editor of your choice; I am using nano):
nano text.py
- You should see the lines below:
# Load the Whisper model
model = whisper.load_model("base")
- Change the model name to any of the names in the table below (the .en models are English-only).
- For example, if you want to run the medium model, your code should look like this:
# Load the Whisper model
model = whisper.load_model("medium")
- Restart the container and upload your audio; it will automatically pull the new model.
Warning: Larger models require a reasonably powerful CPU; otherwise they will take a very long time to load.
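One way the model choice could move into the docker run command is to read it from an environment variable instead of editing text.py by hand. This is only a sketch of that idea: `WHISPER_MODEL` and `model_name_from_env` are hypothetical names, and the published image does not read such a variable today.

```python
import os

# Model names listed in the table in this README.
VALID_MODELS = {
    "tiny", "tiny.en", "base", "base.en",
    "small", "small.en", "medium", "medium.en", "large",
}


def model_name_from_env(environ=None, default="base"):
    """Return the model named by WHISPER_MODEL (hypothetical), or `default` if unset."""
    if environ is None:
        environ = os.environ
    name = environ.get("WHISPER_MODEL", "").strip() or default
    if name not in VALID_MODELS:
        raise ValueError(f"unknown Whisper model {name!r}")
    return name


# In text.py the existing line could then become (needs the whisper package):
# model = whisper.load_model(model_name_from_env())
```

With a change like this in text.py, passing `-e WHISPER_MODEL=medium` to `docker run` would select the medium model, and an unset variable would keep the current default of base.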
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
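The table can also be read programmatically when deciding which model to try first. The sketch below is illustrative only: `largest_model_that_fits` is a hypothetical helper, and the figures are the approximate VRAM values from the table, so treat the result as a starting point rather than a guarantee.

```python
# (multilingual name, English-only name or None, approximate required VRAM in GB),
# copied from the table above, smallest model first.
MODELS = [
    ("tiny", "tiny.en", 1),
    ("base", "base.en", 1),
    ("small", "small.en", 2),
    ("medium", "medium.en", 5),
    ("large", None, 10),
]


def largest_model_that_fits(vram_gb, english_only=False):
    """Return the largest model whose VRAM estimate is <= vram_gb, or None."""
    best = None
    for multilingual, english, required_gb in MODELS:
        if required_gb > vram_gb:
            break  # the list is sorted, so every later model needs even more
        name = english if english_only else multilingual
        if name is not None:  # large has no English-only variant
            best = name
    return best
```

For instance, `largest_model_that_fits(4)` picks small, since medium needs roughly 5 GB. The returned name is what would go into `whisper.load_model(...)` in text.py.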
- Access the application at http://localhost:5000 after running the Docker container.
Special credits go to the OpenAI Whisper project which has made this project possible! Check them out at - Whisper Project
References:
- Terraform: https://www.terraform.io