#ElevenLabs Text to Speech API: where can I obtain the parameter "seeds"? It is mentioned in several places.
- The parameter name is seed (singular), not seeds. It’s an optional integer you provide in the TTS request to make outputs more repeatable; determinism is not guaranteed. Valid range: 0–4,294,967,295. (elevenlabs.io)
- You don’t obtain it from an endpoint. Pick any integer and send it in the request body to:
  - POST /v1/text-to-speech/:voice_id (Create speech) via the seed field. (elevenlabs.io)
  - POST /v1/text-to-speech/:voice_id/stream (Stream speech) via the seed field. (elevenlabs.io)
- The TTS overview also notes to use the optional seed parameter for more consistent results. (elevenlabs.io)
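As a rough illustration of the points above, the sketch below builds (but does not send) a Create speech request with a caller-chosen seed. The helper name `build_tts_request` and the constant `SEED_MAX` are made up for this example; the base URL and the xi-api-key header are taken from the curl example later in this thread, and the range check only mirrors the 0–4,294,967,295 limit quoted above.

```python
import json

SEED_MAX = 4_294_967_295  # upper bound quoted in the answer above


def build_tts_request(voice_id, text, seed, api_key="YOUR_API_KEY"):
    """Return (url, headers, body_bytes) for a Create speech call with a fixed seed.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int) or not 0 <= seed <= SEED_MAX:
        raise ValueError(f"seed must be an integer in 0..{SEED_MAX}")
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"Content-Type": "application/json", "xi-api-key": api_key}
    body = json.dumps({"text": text, "seed": seed}).encode("utf-8")
    return url, headers, body


# The seed is simply a number you pick yourself; reuse it to ask for a repeatable result.
url, headers, body = build_tts_request("VOICE_ID", "Hello there.", seed=12345)
print(url)
print(body.decode("utf-8"))
```

Reusing the same seed together with the same text, voice, and settings is what asks for a repeatable result; as noted above, that repeatability is best-effort.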
Sources:
Create speech | ElevenLabs Documentation
Stream speech | ElevenLabs Documentation
Text to Speech | ElevenLabs Documentation
How can I know which seed is the best to use?
There isn’t a “best” seed documented. The seed parameter exists only to make sampling more deterministic so you can reproduce an output; it doesn’t select a higher‑quality mode. Use any integer in the allowed range (0 to 4,294,967,295) and reuse the same seed and parameters when you want the same result. Note that determinism is best‑effort and subtle differences may still occur. (elevenlabs.io)
What is the difference between the Text to Speech and Text to Dialogue APIs?
Key differences
- Purpose
  - Text to Speech (TTS): Generate speech for a single voice per request; suited for both batch and low‑latency/streaming use cases. (elevenlabs.io)
  - Text to Dialogue: Generate a single mixed audio file containing multiple speakers from an array of text+voice_id pairs; designed for natural multi‑speaker conversations and powered by the Eleven v3 model. Not intended for real‑time agents. (elevenlabs.io)
- Endpoints
  - TTS:
    - POST /v1/text-to-speech/:voice_id (Create speech)
    - POST /v1/text-to-speech/:voice_id/stream (Stream speech)
    - Also “with timing” variants for character/word alignment. (elevenlabs.io)
  - Text to Dialogue:
    - POST /v1/text-to-dialogue (Create dialogue)
    - POST /v1/text-to-dialogue/stream (Stream dialogue)
    - “With timestamps” variants return alignment and voice segment info. (elevenlabs.io)
- Request shape
  - TTS body: { text, model_id (default eleven_multilingual_v2), optional language_code, voice_settings, seed, previous_text/next_text, previous_request_ids/next_request_ids, apply_text_normalization, apply_language_text_normalization }. Voice is selected via the path parameter :voice_id. (elevenlabs.io)
  - Text to Dialogue body: { inputs: [{ text, voice_id }, …], model_id (default eleven_v3), optional language_code, settings, pronunciation_dictionary_locators, seed, apply_text_normalization }. Up to 10 unique voice_ids per request. (elevenlabs.io)
- Models and real‑time suitability
  - TTS supports multiple models (e.g., Multilingual v2, Flash v2/v2.5, v3); Flash is recommended for low‑latency conversational use, while v3 is more expressive but higher‑latency. (elevenlabs.io)
  - Text to Dialogue defaults to and is documented for Eleven v3; Dialogue mode is part of v3 and emphasizes expressive multi‑speaker delivery rather than real‑time response. (elevenlabs.io)
- Output
  - Both return audio with the same output_format options (e.g., mp3/wav/pcm) and have streaming variants. TTS “with timing” and Dialogue “with timestamps” endpoints add alignment metadata; Dialogue also returns voice_segments for each speaker. (elevenlabs.io)
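To make the difference in request shape concrete, here is a minimal sketch that builds the two JSON bodies side by side without sending them. The function names `tts_body` and `dialogue_body` are hypothetical; the field names, default model IDs, and the limit of 10 unique voices come from the comparison above.

```python
import json

TTS_PATH = "/v1/text-to-speech/{voice_id}"  # voice goes in the URL path
DIALOGUE_PATH = "/v1/text-to-dialogue"      # voices go in the body, one per line


def tts_body(text, model_id="eleven_multilingual_v2", seed=None):
    """Body for a single-voice TTS request (hypothetical helper)."""
    body = {"text": text, "model_id": model_id}
    if seed is not None:
        body["seed"] = seed
    return body


def dialogue_body(lines, model_id="eleven_v3", seed=None):
    """Body for a multi-speaker request; lines is a list of (voice_id, text) pairs."""
    unique_voices = {voice_id for voice_id, _ in lines}
    if len(unique_voices) > 10:
        raise ValueError("at most 10 unique voice_ids per request")
    body = {
        "inputs": [{"text": text, "voice_id": voice_id} for voice_id, text in lines],
        "model_id": model_id,
    }
    if seed is not None:
        body["seed"] = seed
    return body


print(TTS_PATH.format(voice_id="VOICE_A"), json.dumps(tts_body("Hi there.")))
print(DIALOGUE_PATH, json.dumps(dialogue_body([("VOICE_A", "Hi."), ("VOICE_B", "Hello!")])))
```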
Helpful references
- TTS API: Create/Stream/Timing pages. (elevenlabs.io)
What is the difference between stream and without stream?
In ElevenLabs Text to Speech, “stream” vs “without stream” refers to how the audio is delivered:
- Stream (low‑latency, incremental delivery)
  - Endpoint: POST /v1/text-to-speech/:voice_id/stream
  - The server sends audio bytes as they’re generated via HTTP chunked transfer, so you can start playback immediately instead of waiting for the full file. Also available as a variant that streams JSON chunks with base64 audio plus per‑character timings. (elevenlabs.io)
- Without stream (full file response)
  - Endpoint: POST /v1/text-to-speech/:voice_id
  - The API returns the complete generated audio in one response. There’s also a non‑streaming variant that returns a single JSON with base64 audio and timing data. (elevenlabs.io)
When to use which:
- Use streaming for real‑time/low‑latency playback or when text arrives in parts. (elevenlabs.io)
- Use non‑streaming when you can wait for the full file (simpler download/post‑processing). (elevenlabs.io)
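The two delivery modes could be consumed roughly as follows, using only Python's standard library. This is a sketch under assumptions rather than an official client: `tts_url` and `synthesize` are hypothetical names, the base URL and xi-api-key header are taken from the curl example later in this thread, and the 4096-byte read size is arbitrary.

```python
import json
import urllib.request

BASE_URL = "https://api.elevenlabs.io"


def tts_url(voice_id, stream):
    """Pick the streaming or non-streaming endpoint for a voice."""
    suffix = "/stream" if stream else ""
    return f"{BASE_URL}/v1/text-to-speech/{voice_id}{suffix}"


def synthesize(voice_id, text, api_key, stream, on_chunk, chunk_size=4096):
    """POST the text and pass audio bytes to on_chunk.

    With stream=True the response is read piece by piece, so on_chunk can
    start handling audio before generation finishes; with stream=False the
    whole body is read in one call and delivered once.
    """
    request = urllib.request.Request(
        tts_url(voice_id, stream),
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "xi-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        if stream:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:  # empty bytes means the server closed the stream
                    break
                on_chunk(chunk)
        else:
            on_chunk(response.read())


# Only the URLs are shown here; calling synthesize() needs a real key and voice ID.
print(tts_url("VOICE_ID", stream=True))
print(tts_url("VOICE_ID", stream=False))
```

A typical on_chunk for the streaming case would feed an audio player or append to an open file, while the non-streaming case can simply write the single chunk to disk.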
Relevant docs:
- Streaming overview and chunked transfer explanation; TTS streaming endpoints. (elevenlabs.io)
- Non‑streaming TTS endpoints. (elevenlabs.io)
- TTS capability overview (mentions streaming real‑time audio, latency guidance). (elevenlabs.io)
Sources:
Stream speech | ElevenLabs Documentation
Create speech | ElevenLabs Documentation
Streaming | ElevenLabs Documentation
Text to Speech | ElevenLabs Documentation
What is the default stability for the TTS Eleven v3 model if it is not specified in the API request?
0.5 (50%).
Notes:
- If you don’t pass voice_settings in the TTS request, the API uses the voice’s stored settings; absent that, the system defaults apply. The documented default for stability is 0.5. (elevenlabs.io)
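If you would rather not depend on stored or system defaults, one option is to send stability explicitly. The sketch below only builds the request body; `tts_body_with_stability` is a hypothetical helper, the 0.0 to 1.0 range check is an assumption based on the 0.5 (50%) figure above, and whether a voice_settings object containing only stability is accepted should be confirmed against the Create speech reference.

```python
import json


def tts_body_with_stability(text, stability=None, model_id="eleven_v3"):
    """Build a TTS body, adding voice_settings only when stability is given."""
    body = {"text": text, "model_id": model_id}
    if stability is not None:
        # Assumed range: stability is treated here as a fraction between 0 and 1.
        if not 0.0 <= stability <= 1.0:
            raise ValueError("stability is assumed to lie between 0.0 and 1.0")
        body["voice_settings"] = {"stability": stability}
    return body


# Omitting stability leaves the choice to the voice's stored settings or the defaults.
print(json.dumps(tts_body_with_stability("Hello.")))
# Passing it pins the value for this request.
print(json.dumps(tts_body_with_stability("Hello.", stability=0.5)))
```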
How can I insert an expression like [excited] through the TTS API?
Short answer: Put audio tags directly in the text you send to the Text to Speech endpoint, and use the Eleven v3 model. Example:
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID?output_format=mp3_44100_128" \
  -H "Content-Type: application/json" \
  -H "xi-api-key: YOUR_API_KEY" \
  --output speech.mp3 \
  -d '{
    "text": "[excited] That was amazing! Let’s do it again.",
    "model_id": "eleven_v3"
  }'
- Audio tags like [excited], [whispers], [laughs] are interpreted by Eleven v3 when they appear inline in your text. Place them before or after the phrase you want to affect. The prompting guide lists the available tags with examples. (elevenlabs.io)
- Eleven v3 is the model that supports these audio tags; set model_id to eleven_v3. (elevenlabs.io)
- If you need pauses with v3, use [pause], [short pause], or [long pause] (v3 does not support SSML <break/>; other models do). (help.elevenlabs.io)
- The same inline tags work with streaming TTS as well; use the TTS streaming endpoint and keep model_id=eleven_v3. (elevenlabs.io)
- The Create speech API reference documents the full body shape and parameters. (elevenlabs.io)
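For anyone building the text programmatically, the tags are just characters inside the text string, so no special field is involved. A small sketch, assuming nothing beyond the body shape shown in the curl example (`tagged` and `build_v3_body` are hypothetical helper names):

```python
import json


def tagged(tag, phrase):
    """Prefix a phrase with an inline audio tag such as [excited]."""
    return f"[{tag}] {phrase}"


def build_v3_body(text):
    """Request body with model_id set to eleven_v3, the model described above as tag-aware."""
    return {"text": text, "model_id": "eleven_v3"}


text = " ".join([
    tagged("excited", "That was amazing!"),
    tagged("whispers", "Let's do it again."),
])
print(json.dumps(build_v3_body(text)))
```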
References:
- How audio tags work in Eleven v3 and how to use them via API: (help.elevenlabs.io)
- Prompting Eleven v3 (audio tag list, placement): (elevenlabs.io)
- Create speech (TTS) endpoint: (elevenlabs.io)
- Models overview (model_id = eleven_v3): (elevenlabs.io)
- Pauses and SSML vs v3 pause tags: (help.elevenlabs.io)
- Streaming TTS guide: (elevenlabs.io)
Sources:
Best practices | ElevenLabs Documentation
Models | ElevenLabs Documentation
Do pauses and SSML phoneme tags work with the API? – ElevenLabs
Streaming | ElevenLabs Documentation
Create speech | ElevenLabs Documentation
How do audio tags work with Eleven v3? – ElevenLabs
It doesn't work for TTS, only for Text to Dialogue.