A service that automatically transcribes YouTube videos to text using the LemonFox API. It comes with a one-off script for testing the process and a production setup for running the service.
The production setup uses a PostgreSQL database and a trigger to listen for new videos inserted into the database. When a new video is inserted, the service will download the audio, send it to the LemonFox API, and store the transcription in the database.
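The download step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `ytDlpArgs` and `downloadAudio`, the mp3 format, the output template, and the 10-minute timeout are all assumptions. The flags `-x`, `--audio-format`, and `-o` are standard yt-dlp options.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// ytDlpArgs builds the yt-dlp arguments for extracting audio only.
// The mp3 format is an assumption made for this sketch.
func ytDlpArgs(videoURL, outPath string) []string {
	return []string{"-x", "--audio-format", "mp3", "-o", outPath, videoURL}
}

// downloadAudio runs yt-dlp and, on failure, returns an error that
// includes the tool's combined stdout/stderr output.
func downloadAudio(ctx context.Context, videoURL, outPath string) error {
	cmd := exec.CommandContext(ctx, "yt-dlp", ytDlpArgs(videoURL, outPath)...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("yt-dlp failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run . <video-url>")
		return
	}
	// Hypothetical timeout; long videos may need more.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	// %(ext)s lets yt-dlp pick the final extension after audio extraction.
	err := downloadAudio(ctx, os.Args[1], "audio.%(ext)s")
	cancel()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the argument list in its own function makes the command easy to inspect without actually invoking yt-dlp.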
- PostgreSQL database
- LemonFox API key (for transcription service)
- yt-dlp command-line tool (for downloading YouTube audio)
- Go 1.22+
Create a .env file based on .env.example.
The application requires the following environment variables to be set in a .env file:
If you create a new database, you can set the DATABASE_URL_<identifier> environment variable to the database URL.
Then run the script as follows:
go run cmd/transcription/main.go <identifier>
For example:
go run cmd/transcription/main.go DEFAULT
looks for a variable in your .env called DATABASE_URL_DEFAULT and uses it to connect to the database.
The one-off script uses a hardcoded video URL and logs the transcription to the console.
go run one-off.go
The code below assumes you have a table named Video.
CREATE TABLE "Video" (
    id SERIAL PRIMARY KEY,
    url TEXT NOT NULL,
    status TEXT DEFAULT 'pending',
    transcription TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
psql "postgresql://username:password@hostname:port/database" -c "
CREATE OR REPLACE FUNCTION notify_new_video()
RETURNS trigger AS \$\$
BEGIN
    PERFORM pg_notify('new_video', row_to_json(NEW)::text);
    RETURN NEW;
END;
\$\$ LANGUAGE plpgsql;
CREATE TRIGGER video_inserted_trigger
    AFTER INSERT ON \"Video\"
    FOR EACH ROW
    EXECUTE FUNCTION notify_new_video();
"
psql "connection_string" -c "\df notify_new_video"
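Because the trigger sends row_to_json(NEW) on the new_video channel, a listener receives the inserted row as a JSON string. A minimal sketch of decoding that payload follows. The type `videoRow`, the function `parseVideoPayload`, and the sample payload are hypothetical; the field names mirror the Video table columns, and the timestamps are kept as strings to avoid assuming a particular format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// videoRow mirrors the columns of the "Video" table as they appear in
// the JSON produced by row_to_json(NEW). Hypothetical type for this sketch.
type videoRow struct {
	ID            int     `json:"id"`
	URL           string  `json:"url"`
	Status        string  `json:"status"`
	Transcription *string `json:"transcription"` // NULL until transcribed
	CreatedAt     string  `json:"created_at"`
	UpdatedAt     string  `json:"updated_at"`
}

// parseVideoPayload decodes a new_video notification payload and
// rejects rows that carry no URL.
func parseVideoPayload(payload string) (videoRow, error) {
	var v videoRow
	if err := json.Unmarshal([]byte(payload), &v); err != nil {
		return videoRow{}, fmt.Errorf("decode new_video payload: %w", err)
	}
	if v.URL == "" {
		return videoRow{}, fmt.Errorf("new_video payload has no url")
	}
	return v, nil
}

func main() {
	// Illustrative payload; VIDEO_ID is a placeholder.
	sample := `{"id":1,"url":"https://www.youtube.com/watch?v=VIDEO_ID",` +
		`"status":"pending","transcription":null,` +
		`"created_at":"2024-01-01T12:00:00","updated_at":"2024-01-01T12:00:00"}`
	v, err := parseVideoPayload(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("video %d (%s): %s\n", v.ID, v.Status, v.URL)
}
```

The transcription field is a pointer so that a SQL NULL, which arrives as JSON null, can be told apart from an empty string.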
- Start the transcription service:
go run service.go transcribe.go
Make sure you have created a vector database. I used pgvector on railway.app.
Run the following to set up your database:
psql "postgres://postgres:xxxxx@xxxxx:37549/railway" -f setup.sql
Add a random SERVICE_API_KEY to the .env file.
You can generate a random key with openssl rand -hex 32.