How to detect TTS end when using the ElevenLabs WebSocket TTS API | ElevenLabs

hearty pebble (BOT) Mar 20, 2026, 5:34 AM

AI-generated Answer for hurrytospring0798

Short answer:

Listen for a JSON message with isFinal: true from the TTS WebSocket. That’s the server’s signal that no more audio chunks will follow. (elevenlabs.io)
After sending your last text, send {"text": ""}. This tells the server the sequence is complete; you’ll receive the final message and then the socket will close. You can also treat the WebSocket close itself as the end-of-utterance signal. (elevenlabs.io)

Minimal pattern (language-agnostic):

Receive loop:
- If message.audio exists, append/decode and play.
- If message.isFinal is true, stop reading and finalize playback.
When done sending text:
- Optionally send {"flush": true} with the last chunk to force any buffered text to generate promptly.
- Then send {"text": ""} to finalize and close. (elevenlabs.io)
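The receive loop and the end-of-sequence steps above can be sketched in Python. This is a minimal sketch assuming each server frame is a JSON text message with base64-encoded audio under "audio"; build_closing_frames and collect_audio are hypothetical helper names, and the actual socket I/O (for example with a websockets client) is left out so the logic runs on its own.

```python
import base64
import json
from typing import Iterable, List


def build_closing_frames(last_text: str) -> List[str]:
    """Client frames that end a single-stream session (hypothetical helper).

    The last text chunk carries flush=true so buffered text is generated
    promptly; the empty-text frame then signals end of sequence, after
    which the server sends its final message and closes the socket.
    """
    return [
        json.dumps({"text": last_text, "flush": True}),
        json.dumps({"text": ""}),
    ]


def collect_audio(frames: Iterable[str]) -> bytes:
    """Concatenate decoded audio from server frames, stopping at isFinal."""
    audio = bytearray()
    for raw in frames:
        msg = json.loads(raw)
        # Audio frames carry base64 audio; the final frame may carry none.
        if msg.get("audio"):
            audio.extend(base64.b64decode(msg["audio"]))
        # isFinal: true means no more audio chunks will follow.
        if msg.get("isFinal"):
            break
    return bytes(audio)
```

In a real client, the frames passed to collect_audio would be the text messages read from the open socket, and the closing frames would be sent after your last text chunk.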

Operational notes:

The connection auto-closes after inactivity (default 20 seconds); you can keep it alive by periodically sending " " (a single space), or raise inactivity_timeout (up to 180 seconds) via a query parameter. (elevenlabs.io)
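Both keep-alive options in the note above can be sketched as follows, assuming the wss://api.elevenlabs.io host and the model_id query parameter used in the official examples. stream_input_url and keepalive_due are hypothetical helpers; the 5-second margin is an arbitrary safety buffer and the 1-second lower bound in the check is an assumption, neither is a documented value.

```python
from urllib.parse import quote, urlencode

DEFAULT_INACTIVITY_TIMEOUT = 20  # seconds; server default stated above
MAX_INACTIVITY_TIMEOUT = 180  # seconds; upper limit stated above


def stream_input_url(voice_id: str, model_id: str,
                     inactivity_timeout: int = DEFAULT_INACTIVITY_TIMEOUT) -> str:
    """Single-stream URL with a custom inactivity_timeout (hypothetical helper)."""
    # Lower bound of 1 second is an assumption; the upper bound is from the note.
    if not 0 < inactivity_timeout <= MAX_INACTIVITY_TIMEOUT:
        raise ValueError(f"inactivity_timeout must be 1..{MAX_INACTIVITY_TIMEOUT} seconds")
    query = urlencode({"model_id": model_id, "inactivity_timeout": inactivity_timeout})
    return ("wss://api.elevenlabs.io/v1/text-to-speech/"
            f"{quote(voice_id, safe='')}/stream-input?{query}")


def keepalive_due(last_sent: float, now: float,
                  inactivity_timeout: int = DEFAULT_INACTIVITY_TIMEOUT,
                  margin: float = 5.0) -> bool:
    """True once idle time is within `margin` seconds of the server cutoff.

    When this returns True, send a single-space text frame (" ") to keep
    the single-stream socket open; an empty string would end the sequence.
    """
    return now - last_sent >= inactivity_timeout - margin
```

The timestamps would typically come from time.monotonic(), with last_sent updated whenever any frame is sent.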

Docs:

How-to guide with example showing isFinal handling and end-of-sequence close. (elevenlabs.io)
API reference for the text-to-speech WebSocket, including inactivity_timeout and message types (Audio Output, Final Output). (elevenlabs.io)

Sources:
Generate audio in real-time | ElevenLabs Documentation
WebSocket | ElevenLabs Documentation

civic tinsel Mar 20, 2026, 5:38 AM

In my design, the WebSocket will be multiplexed/reused, so I do not intend to disconnect after each request. Instead of relying on connection-close semantics, I need an end-of-audio marker for each specific text segment, so I know when generation for that particular utterance is finished.

hearty pebble (BOT) Mar 20, 2026, 5:38 AM

AI-generated Answer for hurrytospring0798

Short answer

The server tells you an utterance is finished via a final JSON message with isFinal: true. On the multi-context endpoint, that final message also includes contextId so you can correlate which utterance finished. (elevenlabs.io)

What to use, depending on your socket mode

Single-stream websocket (/v1/text-to-speech/{voice_id}/stream-input):
- Receive loop: collect audio chunks from messages that contain audio; stop for the current utterance when you receive a message where isFinal is true. You can keep the connection open and start the next utterance without disconnecting. (elevenlabs.io)
Multi-context websocket (/v1/text-to-speech/{voice_id}/multi-stream-input):
- Send text with a context_id per utterance.
- On the receive side, route each message by contextId. When you get {"isFinal": true, "contextId": "<id>"}, that specific utterance/context is done. You can close just that context with {"context_id": "<id>", "close_context": true} without closing the socket. (elevenlabs.io)
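The multi-context bookkeeping described above can be sketched in Python. This assumes server frames are JSON text with base64 audio under "audio" and the context id under camelCase "contextId", while client frames use snake_case "context_id", as described in this answer. ContextRouter, text_frame and close_context_frame are hypothetical names, and the socket I/O is left out.

```python
import base64
import json
from collections import defaultdict
from typing import DefaultDict, Optional, Set


def text_frame(context_id: str, text: str) -> str:
    """Client frame that sends text to one context (snake_case context_id)."""
    return json.dumps({"text": text, "context_id": context_id})


def close_context_frame(context_id: str) -> str:
    """Client frame that closes one context without closing the socket."""
    return json.dumps({"context_id": context_id, "close_context": True})


class ContextRouter:
    """Routes server frames by their camelCase contextId (hypothetical helper)."""

    def __init__(self) -> None:
        self.audio: DefaultDict[str, bytearray] = defaultdict(bytearray)
        self.finished: Set[str] = set()

    def feed(self, raw: str) -> Optional[str]:
        """Handle one server frame; return its contextId if it marks that context done."""
        msg = json.loads(raw)
        ctx = msg.get("contextId")
        if ctx is None:
            return None  # frame not tied to a context; handle elsewhere
        if msg.get("audio"):
            self.audio[ctx].extend(base64.b64decode(msg["audio"]))
        if msg.get("isFinal"):
            self.finished.add(ctx)
            return ctx
        return None
```

Keeping audio per contextId means interleaved frames from concurrent utterances never mix, and feed() returning the id gives the caller a single place to react when one utterance completes.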

Practical tips for reused/multiplexed sockets

Force completion of buffered text: send {"flush": true} (on multi-context, include the context_id) to make the model emit what it has, after which you will receive the isFinal message for that utterance/context. (elevenlabs.io)
Keep the socket alive between utterances:
- Single-stream: send " " periodically; an empty string "" sends EOS and closes the socket. You can also raise inactivity_timeout (max 180s) via a query param. (help.elevenlabs.io)
- Multi-context: to prevent a specific context from timing out, you can send an empty text on that context to reset its timer; closing a context is done with close_context, not by sending "". (elevenlabs.io)
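The two keep-alive rules above differ in a way that is easy to get wrong (an empty string ends a single-stream session but keeps a multi-context context alive), so a tiny helper can encode the distinction. This is a sketch assuming the frame shapes described in this answer; keepalive_frame is a hypothetical name.

```python
import json
from typing import Optional


def keepalive_frame(context_id: Optional[str] = None) -> str:
    """Keep-alive frame for either endpoint (hypothetical helper).

    Single-stream (context_id is None): a single space, because an empty
    string would be taken as end of sequence and close the socket.
    Multi-context: empty text on that context, which resets only that
    context's inactivity timer.
    """
    if context_id is None:
        return json.dumps({"text": " "})
    return json.dumps({"context_id": context_id, "text": ""})
```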

References

Real-time TTS over WebSocket (guide): isFinal in receive loop, flush behavior, keep-alive guidance. (elevenlabs.io)
WebSocket API reference (single-stream): message schema, Final Output object, query params (e.g., inactivity_timeout, sync_alignment). (elevenlabs.io)
Multi-Context WebSocket API reference: contextId in responses, isFinal per context, close_context/flush messages. (elevenlabs.io)
Help Center: keeping the WebSocket open; "" closes the single-stream socket.

civic tinsel Mar 20, 2026, 5:43 AM

Is there a separate isFlush marker I can use?

hearty pebble (BOT) Mar 20, 2026, 5:43 AM

AI-generated Answer for hurrytospring0798

No. The protocol has no field named isFlush; use the flush boolean flag instead.

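Single-stream endpoint (/stream-input):
- Include "flush": true when sending the last piece of text; this clears the buffer and produces audio immediately. The official guide shows this pattern; in Python: await websocket.send(json.dumps({"text": "...", "flush": True})). When generation finishes, the server sends a final message (Final Output), which you can use to determine that a piece of audio is complete. Do not send an empty string "" (this triggers EOS and closes the connection); to keep the connection open, send a single space " ". (elevenlabs.io)
For reused/multiplexed concurrent scenarios, use the multi-context endpoint (/multi-stream-input) instead:
- You can send a message containing only "flush": true (text can be omitted) to finish a given context's buffered text, e.g. {"context_id":"ctx1","flush":true}. When generation for that context finishes, the server returns a final message containing isFinal: true together with the contextId; this is the authoritative marker that the audio for that segment is complete. (elevenlabs.io)
Additional note: some example code reads the final marker using the key is_final (snake_case), but the API reference uses isFinal (camelCase), which is authoritative. (elevenlabs.io)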
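The two points above (a flush-only message on the multi-context endpoint, and isFinal as the key to read) can be sketched as follows. This assumes the message shapes quoted in this answer; flush_frame and is_final are hypothetical helper names, and accepting is_final as a fallback is a defensive choice, not something the API reference requires.

```python
import json
from typing import Any, Mapping


def flush_frame(context_id: str) -> str:
    """Flush-only frame for one context on /multi-stream-input (no text field)."""
    return json.dumps({"context_id": context_id, "flush": True})


def is_final(msg: Mapping[str, Any]) -> bool:
    """Read the end-of-audio flag from a decoded server message.

    The API reference uses camelCase isFinal; the snake_case is_final
    fallback only covers example code that used that spelling.
    """
    return bool(msg.get("isFinal") or msg.get("is_final"))
```

Audio frames typically carry isFinal as null, so the bool() conversion treats them as not final.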

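Summary: there is no isFlush. Use "flush": true in client messages (either sent together with text, or on its own on the multi-context endpoint), and watch for the isFinal flag in the server's final message to determine when a single TTS segment has ended. (elevenlabs.io)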

Sources:
Generate audio in real-time | ElevenLabs Documentation
Multi-Context WebSocket | ElevenLabs Documentation

civic tinsel Mar 20, 2026, 5:54 AM

What does isFinal actually mean? After a flush, am I guaranteed to receive isFinal?
