Core Transcription 🎙️

The Speech Recognition model transcribes spoken words into written text and is the foundation of all AssemblyAI products. On top of core transcription, you can enable other features and models, such as Speaker Diarization, by adding parameters to the same transcription request.
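As a rough illustration of "adding parameters to the same transcription request", the sketch below builds a single JSON body that carries both the audio URL and an extra feature flag. It assumes the public REST endpoint `https://api.assemblyai.com/v2/transcript` and the `audio_url` and `speaker_labels` fields; the helper names `build_transcript_request` and `submit_transcript` are hypothetical and are not part of any AssemblyAI SDK. Check the linked cookbooks for the supported workflow.

```python
import json
import urllib.request

# Assumed REST endpoint for creating a transcript; verify against the API reference.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url, **features):
    """Return the JSON body for one transcription request.

    Extra keyword arguments (for example speaker_labels=True) are merged into
    the same payload as the audio URL, so a single request enables them.
    """
    payload = {"audio_url": audio_url}
    payload.update(features)
    return payload


def submit_transcript(payload, api_key):
    """POST the payload and return the parsed JSON response.

    Hypothetical helper: requires network access and a valid API key.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_transcript_request(
        "https://example.com/audio.mp3",  # placeholder URL, not a real file
        speaker_labels=True,  # Speaker Diarization enabled on the same request
    )
    print(json.dumps(body, indent=2))
```

Keeping payload construction separate from the network call makes it easy to inspect exactly which features a request enables before anything is sent.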

Table of Contents

All Core Transcription Cookbooks

Basic Transcription Workflows

Transcribe an Audio File
Specify a Language
Transcribe YouTube videos
Build a UI for Transcription with Gradio
Detect Low Confidence Words in a Transcript

Batch Transcription

Transcribe a batch of files using AssemblyAI
Transcribe multiple files simultaneously using our Python SDK
Transcribe multiple files simultaneously using our Node.js SDK

Hosting Audio Files

Transcribe from an AWS S3 Bucket
Transcribe Google Drive links
Transcribe GitHub Files

Speaker Labels

Identify Speakers in Audio Recordings
Generate Speaker Labels with Make.com
Calculate Talk/Listen Ratio of Speakers
Create a speaker timeline with Speaker Labels
Use AssemblyAI with Pyannote to generate custom Speaker Labels
Speaker Diarization with Async Chunking
Speaker Identification Across Files w/ AssemblyAI, Pinecone, and Nvidia's TitaNet Model

Automatic Language Detection

Use Automatic Language Detection
Automatic Language Detection as separate step from Transcription
Route to Default Language if Language Detection Confidence is Low - JS
Route to Default Language if Language Detection Confidence is Low - Python
Route to Nano Speech Model if Language Confidence is Low

Subtitles

Generate Subtitles for Videos
Create Subtitles with Speaker Labels
Create custom-length subtitles with AssemblyAI

Delete Transcripts

Delete a Transcript
Delete transcripts after 24 hours of creation

Error Handling and Audio File Fixes

Troubleshoot common errors when starting to use our API
Automatically Retry Server Errors
Automatically Retry Upload Errors
Identify Duplicate Channels in Stereo Files
Correct Audio Duration Discrepancies with Multi-Tool Validation and Transcoding

Translation

Translate Transcripts
Translate Subtitles

Async Chunking for Near-Realtime Transcription

🆕 Near-Realtime Python Speech-to-Text App
🆕 Near-Realtime Node.js Speech-to-Text App
Split audio file to shorter files

Migration Guides

🆕 AWS Transcribe to AssemblyAI
🆕 Deepgram to AssemblyAI
🆕 OpenAI to AssemblyAI
🆕 Google to AssemblyAI

Do More with Our SDKs

Do more with the JavaScript SDK
Do more with the Python SDK