Added support for gpt4o-realtime models for Speech to Speech interactions #659

sharananurag998 · 2025-05-07T07:50:15Z

This PR introduces real-time voice pipeline support for OpenAI’s gpt-4o-realtime-preview model, enabling seamless, low-latency speech-to-speech interactions in the Speect framework. The update brings a modern, streaming audio interface, integrated tool execution, and robust event handling—while maintaining full compatibility with the existing STT/TTS pipeline.

Key Features & Changes

RealtimeVoicePipeline:
- New pipeline for direct, continuous audio-to-audio conversations with OpenAI’s real-time models.
- Handles streaming microphone input and speaker output at 24kHz, as required by the API.
- Supports push-to-talk and half-duplex operation to prevent echo/feedback.
Integrated Tool Calls:
- Tools are registered with the pipeline and executed automatically when the model requests a function call.
- Tool results are sent back to the model using the correct OpenAI Realtime API protocol.
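
A minimal sketch of the tool round trip described above, not the PR's actual implementation. It assumes the Realtime API's `response.function_call_arguments.done` server event (carrying `call_id`, `name`, and JSON-encoded `arguments`) and replies with a `function_call_output` conversation item followed by `response.create`. The `TOOLS` registry and `handle_function_call` helper are hypothetical names used only for illustration.

```python
import json
from typing import Any, Callable

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: dict[str, Callable[..., Any]] = {}


def handle_function_call(event: dict) -> list[dict]:
    """Run the requested tool and build the client events that return its result."""
    name = event["name"]
    # Arguments arrive as a JSON string once streaming has completed.
    args = json.loads(event.get("arguments") or "{}")
    tool = TOOLS.get(name)
    if tool is None:
        output: Any = {"error": f"unknown tool: {name}"}
    else:
        try:
            output = tool(**args)
        except Exception as exc:  # report failures to the model instead of crashing
            output = {"error": str(exc)}
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],  # ties the result to the model's request
                "output": json.dumps(output),
            },
        },
        # Ask the model to continue the turn using the tool result.
        {"type": "response.create"},
    ]
```

The returned events would then be serialized and sent over whatever connection the pipeline holds open; that transport is left out here on purpose.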
Event Handling & Debugging:
- Full support for all major OpenAI Realtime API events, including:
  - Audio and text deltas
  - Tool call arguments (streamed and completed)
  - Transcription events (conversation.item.input_audio_transcription.delta and .completed)
  - Session and rate limit updates
- Example logs all transcription events for easy debugging of what the model “hears.”
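
One way the transcription logging could look, as a sketch under assumptions rather than the code in the example script. It assumes each transcription event is a dict with `type` and `item_id`, that `.delta` events carry a `delta` string, and that `.completed` events carry a `transcript` string. `TranscriptLogger` is a hypothetical name.

```python
from collections import defaultdict

PREFIX = "conversation.item.input_audio_transcription."


class TranscriptLogger:
    """Accumulates streamed transcription deltas and logs each finished transcript."""

    def __init__(self) -> None:
        self._partial: dict[str, str] = defaultdict(str)
        self.completed: dict[str, str] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type", "")
        if not kind.startswith(PREFIX):
            return  # not a transcription event; other handlers deal with it
        item_id = event.get("item_id", "")
        if kind == PREFIX + "delta":
            self._partial[item_id] += event.get("delta", "")
        elif kind == PREFIX + "completed":
            # Prefer the final transcript; fall back to the accumulated deltas.
            text = event.get("transcript") or self._partial.get(item_id, "")
            self._partial.pop(item_id, None)
            self.completed[item_id] = text
            print(f"[heard] {text}")
```

Keying partial text by `item_id` keeps overlapping user turns from being mixed together.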
Echo & Feedback Mitigation:
- Implements a buffer window after assistant audio playback to prevent microphone echo from triggering new turns.
- Optionally enables server-side noise/echo reduction via input_audio_noise_reduction in the session config.
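
The buffer window can be pictured with a small gate object. This is an illustrative sketch, not the PR's code: `EchoGate`, its method names, and the 0.5 second default are all assumptions, and the clock is injectable only so the behaviour is easy to test.

```python
import time
from typing import Callable


class EchoGate:
    """Blocks microphone audio during playback and for a short window afterwards."""

    def __init__(self, guard_seconds: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._guard = guard_seconds
        self._clock = clock
        self._playing = False
        self._muted_until = 0.0

    def playback_started(self) -> None:
        self._playing = True

    def playback_finished(self) -> None:
        self._playing = False
        # Keep the mic muted a little longer so trailing echo is not sent.
        self._muted_until = self._clock() + self._guard

    def should_send_mic(self) -> bool:
        return not self._playing and self._clock() >= self._muted_until
```

A capture loop would check `should_send_mic()` before forwarding each microphone chunk and drop the chunk when it returns `False`.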
Sample Rate Fixes:
- Ensures both input and output audio are always 24kHz PCM, as required by the OpenAI API (fixes “slow motion” audio bug).
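
The "slow motion" symptom follows from simple arithmetic: audio produced at 24 kHz but played by a device opened at a lower rate takes proportionally longer to play. The sketch below assumes 16-bit mono PCM, and the 16 kHz device rate is only a hypothetical example of a mismatch; the PR does not state which rate was previously used.

```python
API_SAMPLE_RATE_HZ = 24_000  # rate stated in the PR description
BYTES_PER_SAMPLE = 2         # assumption: 16-bit mono PCM


def playback_seconds(num_bytes: int, device_rate_hz: int) -> float:
    """How long a PCM buffer takes to play on a device opened at device_rate_hz."""
    return num_bytes / (BYTES_PER_SAMPLE * device_rate_hz)


def slowdown_factor(device_rate_hz: int) -> float:
    """Factor by which 24 kHz audio is stretched when played at device_rate_hz."""
    return API_SAMPLE_RATE_HZ / device_rate_hz


one_second_of_audio = API_SAMPLE_RATE_HZ * BYTES_PER_SAMPLE  # 48000 bytes
print(playback_seconds(one_second_of_audio, 24_000))  # 1.0
print(playback_seconds(one_second_of_audio, 16_000))  # 1.5
```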
Backwards Compatibility:
- All changes are fully compatible with the existing STT/TTS pipeline and configuration.
- Legacy examples and workflows continue to work without modification.
Documentation & Examples:
- Updated docs/voice/pipeline.md with new real-time usage, configuration, and troubleshooting sections.
- New example: continuous_realtime_assistant.py demonstrates push-to-talk, tool calls, and event handling.

🛠️ How to Use

Realtime Pipeline:
See the new example and documentation for how to use RealtimeVoicePipeline with your OpenAI API key and tools.
Classic Pipeline:
No changes required—existing STT/TTS flows are unaffected.

…ions
- Added detailed documentation for the new `RealtimeVoicePipeline`, including usage examples and event handling for real-time audio interaction.
- Introduced a new example script demonstrating the `RealtimeVoicePipeline` with continuous audio streaming and tool execution.

dkundel-openai · 2025-05-14T18:24:59Z

Thank you so much for the PR @sharananurag998! I'll try to look at the PR later this week. Thank you for your patience.

sharananurag998 · 2025-05-15T04:24:02Z

@dkundel-openai @rm-openai

I haven't found a way for native speech-to-speech integration with an agent, but we can define an agent and use it as a tool in the real-time speech pipeline, and it works!

The agent-as-tool approach provides better latency than the STT-TTS-based VoicePipeline.

Also, this branch has Juspay-specific MCP tool handling changes, since we're using the fork as a Python dependency. I'll move them to a separate branch so that main can be merged.

@dkundel-openai you can review the new pipeline and let me know of any changes; I'll be happy to work on them.

EmanueleTribi · 2025-05-29T15:21:35Z

Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agent, and I was wondering whether to write my own code or wait for it to be integrated directly. Thanks!
@dkundel-openai @sharananurag998

sharananurag998 force-pushed the main branch 3 times, most recently from 8bcb389 to b8899f7 Compare May 7, 2025 11:06

sharananurag998 marked this pull request as draft May 7, 2025 14:29

feat: Context handling in realtime

692f4fd

sharananurag998 force-pushed the main branch from b8899f7 to 692f4fd Compare May 9, 2025 05:38

added context to tool input

acebd8e

rm-openai requested a review from dkundel-openai May 14, 2025 16:35
