An AWS-based system that automatically generates and adds subtitles to videos. When a video is uploaded to an S3 bucket, it triggers a Lambda function that extracts the audio, transcribes it using AWS Transcribe, and overlays the resulting subtitles on the video.
- Upload S3 Bucket: Stores uploaded videos and temporary audio files
- Processed S3 Bucket: Stores the final videos with subtitles
- Lambda Function: Processes videos using FFmpeg and AWS Transcribe
- FFmpeg Layer: Custom Lambda layer containing FFmpeg binaries
- Node.js and npm
- AWS CLI configured with appropriate credentials
- Docker Desktop (for local development and deployment)
- Python 3.9 or later
- Clone the repository:
git clone [repository-url]
cd video-subtitler
- Install dependencies:
npm install
- Set up the FFmpeg layer:
mkdir -p lambda-layers/ffmpeg/bin
chmod +x download-ffmpeg.sh
./download-ffmpeg.sh
- Create a Python virtual environment for the Lambda function:
cd lambda
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
- Deploy the stack:
cdk deploy
- Record a video on your mobile device
- Upload the video to the upload S3 bucket using any S3-compatible app
- The system will automatically:
- Extract audio from the video
- Transcribe the audio
- Generate subtitles
- Create a new video with embedded subtitles
- Find the processed video in the processed S3 bucket
video-subtitler/
├── bin/ # CDK app entry point
├── lib/ # CDK stack definition
├── lambda/ # Lambda function code
│ ├── index.py # Main Lambda handler
│ └── requirements.txt # Python dependencies
├── lambda-layers/ # Lambda layers
│ └── ffmpeg/ # FFmpeg binaries
├── cdk.json # CDK configuration
└── package.json # Node.js dependencies
- Ensure Docker Desktop is running
- Use the Python virtual environment for Lambda function development
- Test Lambda function locally using AWS SAM CLI (optional)
- Ensure Docker is running before deploying
- Check CloudWatch Logs for Lambda function errors
- Verify S3 bucket permissions if upload/download fails
- Monitor AWS Transcribe job status in AWS Console