I’ve been testing the new Realtime API with SIP integration over Twilio, and the realtime conversation part works just fine. However, I haven’t figured out how to get the transcription of the input audio. I can get the full transcription of the model’s response, but I’m unable to retrieve the transcription of what the user says.
This is the only event related to the transcription that I receive:
{"id":"item_C9fNHe56u8NI1EJTiv4Q9","type":"message","status":"completed","role":"user","content":[{"type":"input_audio","transcript":null}]}}
I’ve tried sending a session.update event to configure the transcription model, but it doesn’t seem to have any effect.
system_update = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {
            "model": "gpt-4o-transcribe",
            "language": "es",
            "prompt": "",
        },
    },
}
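For anyone trying to reproduce this, here is a minimal sketch of how the payload could be built and how incoming events could be checked for a user transcript. The helper names `build_session_update` and `extract_user_transcript` are hypothetical and only for illustration. The payload shape is copied from the snippet above and has not been verified against the SIP endpoint. The sketch assumes the user transcript arrives in a separate `conversation.item.input_audio_transcription.completed` server event rather than inside the conversation item itself.

```python
import json


def build_session_update(model="gpt-4o-transcribe", language="es"):
    """Build the session.update payload (hypothetical helper, same shape as above)."""
    return {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {
                "model": model,
                "language": language,
            },
        },
    }


def extract_user_transcript(raw_event):
    """Return the user transcript from a raw JSON event string, or None.

    Assumption: the transcript is delivered in its own server event of type
    conversation.item.input_audio_transcription.completed, in a top-level
    "transcript" field.
    """
    event = json.loads(raw_event)
    if event.get("type") == "conversation.item.input_audio_transcription.completed":
        return event.get("transcript")
    return None
```

With something like this, every message received on the connection can be passed to `extract_user_transcript`, and logging the event types that return `None` would show whether the transcription events are arriving at all.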