Here's a sample for getting low-latency, subtitle-style output from OpenAI's Realtime API in a transcription session with turn_detection disabled. The client keeps a WebSocket connection open, appends audio to the input buffer, and commits the buffer manually, so each short segment is transcribed without waiting for the server to detect the end of speech. The key is to set turn_detection to null and send input_audio_buffer.commit events frequently.
Sample WebSocket Payloads:
- Configure the transcription session with turn detection disabled. The settings go inside a "session" object, and pcm16 means raw 16-bit little-endian mono audio at 24 kHz:
{
  "type": "transcription_session.update",
  "session": {
    "input_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "gpt-4o-transcribe",
      "prompt": "",
      "language": "en"
    },
    "turn_detection": null,
    "input_audio_noise_reduction": {
      "type": "near_field"
    }
  }
}
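A minimal Python sketch of building that event. It assumes the settings nest under a `session` key, as in OpenAI's API reference for `transcription_session.update`; the helper name `build_session_update` and its parameters are hypothetical. Python's `None` serializes to JSON `null`, which is how turn detection gets disabled.

```python
import json


def build_session_update(language: str = "en",
                         model: str = "gpt-4o-transcribe") -> str:
    """Serialize a transcription_session.update event with turn detection off.

    Assumption: the session settings are wrapped in a "session" object.
    """
    event = {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": {
                "model": model,
                "prompt": "",
                "language": language,
            },
            # None becomes JSON null: no server-side voice activity detection,
            # so the client decides when to commit audio.
            "turn_detection": None,
            "input_audio_noise_reduction": {"type": "near_field"},
        },
    }
    return json.dumps(event)


# Usage, once a WebSocket connection `ws` is open:
#     await ws.send(build_session_update())
```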
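- Append audio data and commit it at short intervals (e.g., every 100–200 ms). Each commit becomes its own conversation item and is transcribed separately, so shorter segments lower latency but can cost accuracy; a commit with less than 100 ms of buffered audio is rejected:
{
  "type": "input_audio_buffer.append",
  "audio": "Base64EncodedAudioData"
}
{
  "type": "input_audio_buffer.commit"
}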
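The chunk size follows from the audio format: pcm16 at 24 kHz mono is 24,000 samples/s × 2 bytes = 48,000 bytes/s, so a 200 ms chunk is 9,600 bytes. The sketch below splits raw audio on that basis and builds the append/commit pair for each chunk. The helper names are hypothetical, and the 100 ms minimum per commit is an assumption taken from the API's "buffer too small" error; a short final tail is folded into the previous chunk so no commit falls below it.

```python
import base64
import json

SAMPLE_RATE_HZ = 24_000  # assumed: pcm16 input is 24 kHz mono
BYTES_PER_SAMPLE = 2     # 16-bit samples
MIN_COMMIT_MS = 100      # assumed minimum audio per commit


def chunk_pcm16(pcm: bytes, chunk_ms: int = 200) -> list[bytes]:
    """Split raw PCM16 audio into chunk_ms pieces, merging a too-short tail."""
    bytes_per_ms = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE // 1000  # 48 bytes/ms
    chunk_bytes = bytes_per_ms * chunk_ms
    chunks = [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
    # Fold a tail shorter than MIN_COMMIT_MS into the previous chunk so every
    # commit carries enough audio (a lone short recording is left as-is).
    if len(chunks) > 1 and len(chunks[-1]) < bytes_per_ms * MIN_COMMIT_MS:
        tail = chunks.pop()
        chunks[-1] += tail
    return chunks


def append_and_commit(chunk: bytes) -> list[str]:
    """Build the append and commit events for one chunk, as JSON strings."""
    append = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(chunk).decode("ascii"),
    }
    commit = {"type": "input_audio_buffer.commit"}
    return [json.dumps(append), json.dumps(commit)]
```

Sending the two strings in order, once per chunk, reproduces the append-then-commit pattern from the payloads above.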
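Python sketch of the send loop (asyncio; ws is an already-open WebSocket connection):
import asyncio
import base64
import json


async def stream_audio(ws, audio_chunks):
    """Append each PCM16 chunk, then commit it so it is transcribed right away."""
    for chunk in audio_chunks:
        # Audio bytes travel as base64 text inside the JSON event.
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("utf-8"),
        }))
        # With turn_detection set to null, nothing is transcribed until a commit.
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
        # Pace sends to roughly real time for 200 ms chunks.
        await asyncio.sleep(0.2)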
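To show the transcripts as subtitles, the client also has to read server events. A minimal sketch, assuming the transcription event names `input_audio_buffer.committed`, `conversation.item.input_audio_transcription.delta` (incremental text, sent for gpt-4o-transcribe) and `conversation.item.input_audio_transcription.completed` (final text), each carrying an `item_id`. Because frequent commits can leave several items in flight at once, text is tracked per item and kept in commit order. `apply_event`, `subtitle` and `print_subtitles` are hypothetical names, and `ws` is assumed to yield JSON text messages under `async for`, as a `websockets` connection does.

```python
import json

COMMITTED = "input_audio_buffer.committed"
DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"


def apply_event(event: dict, segments: dict) -> bool:
    """Update segments (item_id -> text) from one server event.

    Returns True if the visible subtitle may have changed.
    """
    kind, item = event.get("type"), event.get("item_id")
    if kind == COMMITTED:
        # Reserve a slot at commit time so segments stay in commit order.
        segments.setdefault(item, "")
        return False
    if kind == DELTA:
        segments[item] = segments.get(item, "") + event.get("delta", "")
        return True
    if kind == COMPLETED:
        # The final transcript replaces the partial text built from deltas.
        segments[item] = event.get("transcript", segments.get(item, ""))
        return True
    return False  # other events (session.created, errors, ...) are ignored


def subtitle(segments: dict, width: int = 80) -> str:
    """Join segments in commit order, normalize spaces, keep the last width chars."""
    text = " ".join(" ".join(segments.values()).split())
    return text[-width:]


async def print_subtitles(ws, width: int = 80) -> None:
    """Redraw a single subtitle line in the terminal as events arrive."""
    segments: dict = {}
    async for message in ws:
        if apply_event(json.loads(message), segments):
            # ljust pads with spaces so a shorter line overwrites a longer one.
            print("\r" + subtitle(segments, width).ljust(width), end="", flush=True)
```

Running `stream_audio` and `print_subtitles` concurrently (for example with `asyncio.gather`) on the same connection sends audio and displays text at the same time.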