
I'm using ElevenLabs' convai/conversation WebSocket API for voice interactions. I want instant barge-in: when the user starts speaking (detected via vad_score events), the agent's speech should stop immediately, with no overlap.

Current approach:

Monitor the VAD score and set isUserSpeaking once it has stayed above the threshold for 400 ms.

Then clear my local audio queue, set isInterrupted, and send { type: "interrupt" } over the ElevenLabs WebSocket.

My app drops any new agent audio after this, but agent speech sometimes keeps arriving for ~200–500 ms after the interrupt and overlaps with the user.
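For reference, here is a minimal sketch of the client-side logic described above, written as a small class so it can be reasoned about without a live socket. All names here (BargeInGate, onVadScore, onAgentAudio, resume) are hypothetical, the 0.5 threshold is an assumed value, and the { type: "interrupt" } message shape is simply what I am sending today, not something I have confirmed against the ElevenLabs docs. When to call resume() is also an open assumption (for example, when a new agent turn starts).

```typescript
// Hypothetical sketch of the barge-in gate described in this post.
// `send` is injected so the logic does not depend on a real WebSocket.
type SendFn = (msg: string) => void;

class BargeInGate {
  private readonly send: SendFn;
  private readonly threshold: number; // assumed VAD threshold, not an ElevenLabs default
  private readonly holdMs: number;    // how long the score must stay above threshold
  private speechStartMs: number | null = null;
  private interrupted = false;
  private queue: Uint8Array[] = [];

  constructor(send: SendFn, threshold = 0.5, holdMs = 400) {
    this.send = send;
    this.threshold = threshold;
    this.holdMs = holdMs;
  }

  // Call for every vad_score event, passing the score and a timestamp in ms.
  onVadScore(score: number, nowMs: number): void {
    if (score <= this.threshold) {
      this.speechStartMs = null; // score dropped: restart the hold timer
      return;
    }
    if (this.speechStartMs === null) this.speechStartMs = nowMs;
    if (!this.interrupted && nowMs - this.speechStartMs >= this.holdMs) {
      this.interrupted = true;
      this.queue.length = 0; // clear locally buffered agent audio
      // Message shape as used in this post; sent once per barge-in.
      this.send(JSON.stringify({ type: "interrupt" }));
    }
  }

  // Call for every agent audio chunk; returns false if the chunk was dropped.
  onAgentAudio(chunk: Uint8Array): boolean {
    if (this.interrupted) return false;
    this.queue.push(chunk);
    return true;
  }

  // Re-open the gate; the right trigger for this is an assumption.
  resume(): void {
    this.interrupted = false;
    this.speechStartMs = null;
  }

  get queued(): number {
    return this.queue.length;
  }

  get isInterrupted(): boolean {
    return this.interrupted;
  }
}
```

Note that with this design the 400 ms hold is itself part of the perceived overlap, since the interrupt cannot fire until the hold has elapsed. Clearing the queue also only removes audio that has not yet been handed to the playback device, so anything already scheduled for output would need to be stopped separately.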

My questions:

Is type: "interrupt" guaranteed to immediately stop ongoing agent speech and flush any buffered agent audio on the ElevenLabs side?

If not, do I always need to drop all audio messages received after the interrupt (and if so, for how long)?

Is there a faster or more reliable way to force agent speech termination, such as an undocumented API feature?

Should I throttle/repeat sending the interrupt for reliability, or is one enough?

Are there any best practices for VAD thresholds and timing when implementing barge-in for a clean user experience?

Is there an acknowledgment that confirms the agent has been interrupted and no further speech will arrive?

My goal is: zero perceptible agent/user overlap when barge-in happens. Any guidance would be much appreciated!

#Question