AsyncChunkPy is a Python application that provides near-realtime speech-to-text transcription using chunked audio processing and asynchronous transcription. It leverages the power of AssemblyAI's async transcription API to deliver high-quality transcriptions at near real-time speeds.
- Real-time audio recording and chunking
- Voice Activity Detection (VAD) for intelligent chunk processing
- Asynchronous transcription using AssemblyAI API
- Ordered transcript logging
- Configurable chunk size and silence threshold
- Support for multiple languages
- Access to AssemblyAI's powerful Universal-2 model for English and Universal-1 model for Spanish and German
- Support for all non-English languages available in AssemblyAI's async transcription service
- Higher accuracy compared to real-time transcription models
- More cost-effective than real-time transcription services
- Near real-time performance with the quality of async transcription
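The "ordered transcript logging" feature above matters because async transcription jobs can finish out of order: a short chunk submitted later may return before a longer chunk submitted earlier. A minimal sketch of one way to restore chunk order (the names here are illustrative, not AsyncChunkPy's actual internals):

```python
def emit_in_order(results):
    """Yield transcript texts in chunk order, buffering any that arrive early.

    `results` is an iterable of (chunk_index, text) pairs in arrival order.
    """
    buffer = {}
    next_index = 0
    for index, text in results:
        buffer[index] = text
        # Flush every consecutive chunk we now have, starting at next_index.
        while next_index in buffer:
            yield buffer.pop(next_index)
            next_index += 1

arrivals = [(1, "world"), (0, "hello"), (2, "again")]
print(list(emit_in_order(arrivals)))  # ['hello', 'world', 'again']
```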
- Python 3.7 or later
- pip (Python package installer)
- AssemblyAI API key (sign up for an AssemblyAI account and get your API key from your dashboard)
1. Clone the repository:

   ```bash
   git clone https://github.com/AssemblyAI-Solutions/async-chunk-py.git
   cd async-chunk-py
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Create a `.env` file in the root directory and add your AssemblyAI API key:

   ```
   ASSEMBLYAI_API_KEY=your_api_key_here
   ```
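Once the `.env` file exists, the application reads the key from it at startup. The snippet below is an illustrative, stdlib-only sketch of what that loading step involves; the project itself most likely uses a library such as python-dotenv for this (check `requirements.txt` for what it actually depends on):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: copy KEY=value lines into os.environ.

    Illustrative only -- a real project would normally use python-dotenv.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# After load_env(), the key is available as:
# api_key = os.environ["ASSEMBLYAI_API_KEY"]
```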
1. Start the application:

   ```bash
   python main.py
   ```

2. Speak into your microphone. The application will record and transcribe your speech in near real-time.

3. Press Ctrl+C to stop the recording and see the final transcript.
You can modify the following parameters in `config.py`:

- `CHUNK_SIZE`: Size of each audio chunk in bytes
- `CHUNK_DURATION_MS`: Duration of each audio chunk in milliseconds (default: 5000 ms)
- `SILENCE_THRESHOLD_MS`: Duration of silence required to trigger chunk processing (default: 600 ms)
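With those defaults, a `config.py` might look like the sketch below. The values shown are illustrative; in particular, the byte-size calculation assumes 16 kHz, 16-bit mono PCM audio, which may differ from the format the project actually records (check `audio_recorder.py`):

```python
# Audio format assumed for the size calculation below (illustrative):
# 16 kHz sample rate, 16-bit (2-byte) mono PCM.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2

CHUNK_DURATION_MS = 5000     # duration of each audio chunk (default: 5000 ms)
SILENCE_THRESHOLD_MS = 600   # silence needed to trigger chunk processing

# The chunk size in bytes follows from the format and the duration.
CHUNK_SIZE = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_DURATION_MS // 1000
# -> 160000 bytes for the defaults above
```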
To change the language or enable language detection, modify the `transcription_worker.py` file:

- Set `language_code='en'` to the desired language code in the `transcribe` method, or
- Add `language_detection=True` to enable automatic language detection
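For reference, AssemblyAI's async API accepts these options as fields in the JSON body of the transcript request. The helper below is a hypothetical sketch of building that payload (it is not the project's own code, and the function name and audio URL are made up for illustration):

```python
def build_transcript_payload(audio_url, language_code=None, language_detection=False):
    """Build a JSON body for AssemblyAI's async transcript endpoint.

    Use either a fixed `language_code` or `language_detection`, not both:
    a fixed code pins the language, detection lets the service choose.
    """
    payload = {"audio_url": audio_url}
    if language_detection:
        payload["language_detection"] = True
    elif language_code:
        payload["language_code"] = language_code
    return payload

print(build_transcript_payload("https://example.com/chunk0.wav", language_code="es"))
# {'audio_url': 'https://example.com/chunk0.wav', 'language_code': 'es'}
```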
This project uses the py-webrtcvad library for VAD. You can adjust VAD parameters by modifying the `Vad` configuration in `audio_recorder.py`. For more information on VAD parameters, visit the py-webrtcvad GitHub repository.
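The interplay between VAD and `SILENCE_THRESHOLD_MS` can be sketched as follows: each short frame is classified as speech or non-speech, and the current chunk is flushed once enough consecutive non-speech frames accumulate. This is an illustrative reimplementation of the idea, not the project's `audio_recorder.py`; the 30 ms frame length matches one of the frame sizes py-webrtcvad accepts, and `is_speech` stands in for `webrtcvad.Vad.is_speech`:

```python
def split_on_silence(frames, is_speech, silence_threshold_ms=600, frame_ms=30):
    """Group frames into chunks, flushing whenever the trailing run of
    non-speech frames reaches silence_threshold_ms.
    """
    frames_needed = silence_threshold_ms // frame_ms  # consecutive silent frames
    chunks, current, silent_run = [], [], 0
    for frame in frames:
        current.append(frame)
        # A speech frame resets the silence counter; silence extends it.
        silent_run = 0 if is_speech(frame) else silent_run + 1
        if silent_run >= frames_needed:
            chunks.append(current)
            current, silent_run = [], 0
    if current:
        chunks.append(current)  # whatever remains when recording stops
    return chunks
```

With the defaults, 600 ms of silence corresponds to 20 consecutive 30 ms non-speech frames before a chunk is handed off for transcription.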
- `main.py`: Main application file handling coordination between audio recording and transcription.
- `audio_recorder.py`: Handles audio recording and Voice Activity Detection.
- `transcription_worker.py`: Worker for handling transcription tasks using the AssemblyAI API.
- `config.py`: Configuration file for various parameters.
- py-webrtcvad for the Voice Activity Detection functionality
If you encounter any issues with audio recording or transcription, ensure that:
- Your microphone is properly connected and selected as the input device.
- Your AssemblyAI API key is correctly set in the `.env` file.
- You have a stable internet connection for API communication.
For any other issues, please check the console output for error messages and refer to the documentation of the individual dependencies if needed.