Hi,
I'm currently building a voice agent using the Realtime API. I use the API on the frontend (so TypeScript), but fetch the ephemeral key via my backend. When the agent talks, I would like to drive an audio-reactive visual element. To do this, however, I need access to the agent's audio output stream. The documentation doesn't really mention a way to do this, and I'm kind of lost at this point. Is there even a way to do this?
Currently, I'm trying to access the session's peerConnection object to see if an audio stream is attached to it. However, when I log the session.transport object, I get inconsistent behavior: a single time, the peerConnection showed up as connected, but countless other times it showed up as disconnected. Here are the logs for reference:
Object { eventEmitter: {}, options: {}, #i: "gpt-realtime", #t: undefined, #n: null, #e: null, #i: "https://api.openai.com/v1/realtime/calls", #t: {…}, #n: false, #e: false, … }
eventEmitter: Object { #t: EventTarget, #e: Map(15) }
options: Object { }
#i: "gpt-realtime"
#t: undefined
#n: null
#e: Object { type: "realtime", object: "realtime.session", id: "sess_CaJc6qwsojX6OXJCMyr2F", … }
#i: "https://api.openai.com/v1/realtime/calls"
#t: Object { status: "disconnected", peerConnection: undefined, dataChannel: undefined, … }
callId: undefined
dataChannel: undefined
peerConnection: undefined
status: "disconnected"
<prototype>: Object { … }
#n: false
#e: true
#o: false
<prototype>: Object { … }
I understand that the peerConnection is not really meant to be accessed in this case, but is there really no other way to do this?
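For context, here is roughly what I plan to do once I have the stream. This is only a minimal sketch, assuming I can somehow get the agent's audio as a standard MediaStream and feed it into a Web Audio AnalyserNode. The names `AnalyserLike`, `rmsLevel`, and `readLevel` are my own hypothetical helpers and not part of the Realtime API or its SDK. How to obtain the stream in the first place is exactly the part I'm missing.

```typescript
// Minimal structural type (hypothetical) so this sketch does not depend on
// DOM typings; a browser AnalyserNode has both of these members.
interface AnalyserLike {
  fftSize: number;
  getByteTimeDomainData(buffer: Uint8Array): void;
}

// RMS loudness in [0, 1] from unsigned 8-bit time-domain samples,
// where 128 represents silence (the AnalyserNode byte format).
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const centered = (samples[i] - 128) / 128;
    sumSquares += centered * centered;
  }
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}

// Reads one frame of samples from the analyser and returns its loudness.
function readLevel(analyser: AnalyserLike): number {
  const buffer = new Uint8Array(analyser.fftSize);
  analyser.getByteTimeDomainData(buffer);
  return rmsLevel(buffer);
}

// Browser wiring (not executed here), ASSUMING `stream` is the agent's
// remote MediaStream, which I don't know how to get yet:
//   const ctx = new AudioContext();
//   const analyser = ctx.createAnalyser();
//   ctx.createMediaStreamSource(stream).connect(analyser);
//   const tick = () => {
//     const level = readLevel(analyser); // 0 = silent, 1 = full scale
//     // ...scale or recolor the visual element based on `level`...
//     requestAnimationFrame(tick);
//   };
//   tick();
```

The returned level would then drive the size or opacity of the visual on each animation frame.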