Speech-To-Text 🎙️

The Speech Recognition model transcribes spoken words into written text and is the foundation of all AssemblyAI products. On top of the core transcription, you can enable other features and models, such as Speaker Diarization, by adding parameters to the same transcription request.
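
As a rough illustration of "adding parameters to the same transcription request", the sketch below builds (but does not send) a request body with Speaker Diarization switched on. It assumes the REST endpoint `https://api.assemblyai.com/v2/transcript` and the `speaker_labels` parameter. `build_transcript_request` is a hypothetical helper written for this example, and the audio URL and API key are placeholders. Check the API reference and the cookbooks below for the authoritative parameter names.

```python
import json
import urllib.request

# Assumed endpoint for submitting a transcription job; confirm against the API reference.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url, api_key, **features):
    """Hypothetical helper: build a POST request for one transcription job.

    Extra keyword arguments (for example speaker_labels=True) are merged
    into the same JSON body as the audio URL.
    """
    payload = {"audio_url": audio_url, **features}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_transcript_request(
    "https://example.com/audio.mp3",
    "YOUR_API_KEY",
    speaker_labels=True,  # assumed name of the Speaker Diarization parameter
)
```

Passing `request` to `urllib.request.urlopen` would submit the job. Enabling another feature should only require another keyword argument, leaving the rest of the request unchanged.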

Table of Contents

All Speech-To-Text Cookbooks

Basic Transcription Workflows

Transcribe an Audio File
Specify a Language
Transcribe YouTube Videos
Build a UI for Transcription with Gradio
Detect Low Confidence Words in a Transcript
🆕 How to Use the EU Endpoint

Batch Transcription

Transcribe a Batch of Files
Transcribe Multiple Files Simultaneously - Python SDK
Transcribe Multiple Files Simultaneously - Node SDK

Hosting Audio Files

Transcribe from an AWS S3 Bucket
Transcribe Google Drive Links
Transcribe GitHub Files

Speaker Labels

Identify Speakers in Audio Recordings
Generate Speaker Labels with Make.com
Calculate Talk/Listen Ratio of Speakers
Create a Speaker Timeline with Speaker Labels
Use Pyannote to Generate Custom Speaker Labels
Speaker Diarization with Async Chunking
Speaker Identification Across Files using Pinecone and Nvidia's TitaNet Model

Automatic Language Detection

Use Automatic Language Detection
Automatic Language Detection as Separate Step from Transcription
Route to Default Language if Language Detection Confidence is Low - Node SDK
Route to Default Language if Language Detection Confidence is Low - Python SDK
Route to Nano Speech Model if Language Confidence is Low

Subtitles

Generate Subtitles for Videos
Create Subtitles with Speaker Labels
Create Custom-Length Subtitles

Delete Transcripts

Delete a Transcript
Delete Transcripts After 24 Hours of Creation

Error Handling and Audio File Fixes

Troubleshoot Common Errors When Starting to Use Our API
Automatically Retry Server Errors
Automatically Retry Upload Errors
Identify Duplicate Channels in Stereo Files
Correct Audio Duration Discrepancies with Multi-Tool Validation and Transcoding

Translation

Translate an AssemblyAI Transcript
Translate an AssemblyAI Subtitle Transcript

Async Chunking for Near-Realtime Transcription

🆕 Near-Realtime Python Speech-to-Text App
🆕 Near-Realtime Node.js Speech-to-Text App
Split Audio File into Shorter Files

Migration Guides

AWS Transcribe to AssemblyAI
Deepgram to AssemblyAI
OpenAI to AssemblyAI
Google to AssemblyAI

Do More with our SDKs

Do More with the Node SDK
Do More with the Python SDK