# Question
Could you clarify your question or provide more details so I can assist you better?
I'm using ElevenLabs' `convai/conversation` WebSocket API for voice interactions. I want instant barge-in: when the user starts speaking (detected via `vad_score` events), the agent's speech should stop immediately, with no overlap.
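A minimal sketch of pulling the score out of each incoming message. The message shape in the comment (`type: "vad_score"` with a nested `vad_score_event.vad_score`) is an assumption about the event format and should be checked against the ElevenLabs API reference; `extractVadScore` and `isVadScoreMessage` are hypothetical helper names.

```typescript
// ASSUMPTION: vad_score events look like
//   { "type": "vad_score", "vad_score_event": { "vad_score": 0.87 } }
// Verify the field names against the ElevenLabs API reference.
interface VadScoreMessage {
  type: "vad_score";
  vad_score_event: { vad_score: number };
}

// Type guard: true only if the parsed value has the assumed vad_score shape.
function isVadScoreMessage(msg: unknown): msg is VadScoreMessage {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  if (m.type !== "vad_score") return false;
  const ev = m.vad_score_event as Record<string, unknown> | undefined;
  return typeof ev === "object" && ev !== null && typeof ev.vad_score === "number";
}

// Hypothetical helper: returns the numeric score for a vad_score event,
// or null for any other message (including non-JSON payloads).
function extractVadScore(raw: string): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON, so not a vad_score event
  }
  return isVadScoreMessage(parsed) ? parsed.vad_score_event.vad_score : null;
}
```

Returning `null` instead of throwing lets the same WebSocket `message` handler pass every event through this function and only act on VAD updates.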
Current approach:

1. Monitor the VAD score and set `isUserSpeaking` once it stays above a threshold for 400 ms.
2. Clear my local audio queue, set `isInterrupted`, and send `{ type: "interrupt" }` to the ElevenLabs WebSocket.
My app drops new agent audio after this, but sometimes agent speech keeps coming for ~200–500ms after the interrupt, overlapping with the user.
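A minimal sketch of the two steps above, for reference rather than as a definitive implementation. `BargeInController`, the injected `send` function (standing in for the WebSocket's `send`), `resume`, the 0.5 threshold, and the rule that any dip below the threshold restarts the 400 ms timer are all hypothetical choices. The `{ type: "interrupt" }` payload is the message described above, not one confirmed here as part of the documented API.

```typescript
// Hypothetical stand-in for the WebSocket's send method.
type Send = (payload: string) => void;

class BargeInController {
  isUserSpeaking = false;
  isInterrupted = false;
  readonly audioQueue: string[] = []; // agent audio chunks not yet handed to playback

  private aboveSince: number | null = null; // when the score first rose above threshold
  private readonly send: Send;
  private readonly threshold: number;
  private readonly holdMs: number;

  constructor(send: Send, threshold = 0.5, holdMs = 400) {
    this.send = send;
    this.threshold = threshold; // placeholder value, not a recommendation
    this.holdMs = holdMs;
  }

  // Feed every VAD score together with its arrival time in milliseconds.
  onVadScore(score: number, nowMs: number): void {
    if (score < this.threshold) {
      // Assumption: any dip below the threshold restarts the hold timer.
      this.aboveSince = null;
      this.isUserSpeaking = false;
      return;
    }
    const since = this.aboveSince ?? nowMs;
    this.aboveSince = since;
    if (!this.isUserSpeaking && nowMs - since >= this.holdMs) {
      this.isUserSpeaking = true;
      this.interrupt(); // fires once per speech onset, not on every score
    }
  }

  // Step 2: clear the local queue, set the flag, notify the server.
  private interrupt(): void {
    this.audioQueue.length = 0;
    this.isInterrupted = true;
    // Payload as described in the question; not confirmed as a documented message type.
    this.send(JSON.stringify({ type: "interrupt" }));
  }

  // Returns true if the chunk was queued, false if it was dropped.
  onAgentAudio(chunk: string): boolean {
    if (this.isInterrupted) return false;
    this.audioQueue.push(chunk);
    return true;
  }

  // Hypothetical: when to call this (how long to keep dropping audio) is still open.
  resume(): void {
    this.isInterrupted = false;
  }
}
```

In this sketch the agent keeps playing for the full 400 ms hold before `interrupt` runs, and clearing `audioQueue` only affects chunks not yet handed to the audio output; anything already scheduled for playback keeps sounding unless it is stopped separately.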
My questions:

1. Is `type: "interrupt"` guaranteed to immediately stop ongoing agent speech and flush any buffered agent audio on the ElevenLabs side?
2. If not, do I need to drop all audio messages received after the interrupt, and if so, for how long?
3. Is there a faster or more reliable way to force agent speech termination, perhaps an undocumented API feature?
4. Should I throttle or repeat the interrupt for reliability, or is one message enough?
5. Are there best practices for VAD thresholds and timing when implementing barge-in for a clean user experience?
6. Is there an acknowledgment that confirms the agent has been interrupted and that no further speech will arrive?
My goal is zero perceptible overlap between agent and user speech when barge-in happens. Any guidance would be much appreciated!