#ElevenLabs text to speech API: where can I obtain the parameter "seeds"? It is mentioned in several places.


grave galeBOT
#
AI-generated Answer for chungx10
  • The parameter name is seed (singular), not seeds. It’s an optional integer you provide in the TTS request to make outputs more repeatable; determinism is not guaranteed. Valid range: 0–4,294,967,295. (elevenlabs.io)
  • You don’t obtain it from an endpoint—pick any integer and send it in the request body to:
    • POST /v1/text-to-speech/:voice_id (Create speech) via the seed field. (elevenlabs.io)
    • POST /v1/text-to-speech/:voice_id/stream (Stream speech) via the seed field. (elevenlabs.io)
  • The TTS overview also notes to use the optional seed parameter for more consistent results. (elevenlabs.io)
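
As a minimal sketch of the points above, the following Python builds (but does not send) a Create speech or Stream speech request with a seed in the body. The helper name build_tts_request, the sample text, and the VOICE_ID and YOUR_API_KEY placeholders are illustrative assumptions; the base URL and xi-api-key header follow the curl example later in this thread.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"
MAX_SEED = 4_294_967_295  # upper bound quoted in the answer above


def build_tts_request(voice_id: str, text: str, seed: int, api_key: str,
                      stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a Create speech or Stream speech request."""
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    path = f"/v1/text-to-speech/{voice_id}"
    if stream:
        path += "/stream"
    # The seed is just another field in the JSON body; you choose the value.
    body = json.dumps({"text": text, "seed": seed}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "xi-api-key": api_key},
    )


req = build_tts_request("VOICE_ID", "Hello there.", seed=12345,
                        api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually send it; that step is left out here so the sketch runs without a key.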

Sources:
Create speech | ElevenLabs Documentation
Stream speech | ElevenLabs Documentation
Text to Speech | ElevenLabs Documentation

errant wind
#

How can I know which seed is the best to use?

grave galeBOT
#
AI-generated Answer for chungx10

There isn’t a “best” seed documented. The seed parameter exists only to make sampling more deterministic so you can reproduce an output; it doesn’t select a higher‑quality mode. Use any integer in the allowed range (0 to 4,294,967,295) and reuse the same seed and parameters when you want the same result. Note that determinism is best‑effort and subtle differences may still occur. (elevenlabs.io)
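
Since no seed is better than another, one possible convention (an assumption for illustration, not something the documentation prescribes) is to derive the seed from a label you control, so the same label always gives the same integer. The sketch below uses CRC-32, whose output in Python 3 is an unsigned 32-bit integer and therefore falls inside the allowed range. The function name seed_for and the label are hypothetical.

```python
import zlib

MAX_SEED = 4_294_967_295  # allowed upper bound quoted above


def seed_for(label: str) -> int:
    """Map a label to a stable integer in 0..MAX_SEED using CRC-32."""
    return zlib.crc32(label.encode("utf-8"))


seed = seed_for("intro-take-1")
assert 0 <= seed <= MAX_SEED
assert seed == seed_for("intro-take-1")  # same label, same seed
print(seed)
```

Reusing the resulting seed together with the same text, voice, model, and settings is what makes a result as reproducible as the API allows.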

Sources:
Create speech | ElevenLabs Documentation

errant wind
#

What is the difference between the text to speech and text to dialogue APIs?

grave galeBOT
#
AI-generated Answer for chungx10

Key differences

  • Purpose

    • Text to Speech (TTS): Generate speech for a single voice per request; suited for both batch and low‑latency/streaming use cases. (elevenlabs.io)
    • Text to Dialogue: Generate a single mixed audio file containing multiple speakers from an array of text+voice_id pairs; designed for natural multi‑speaker conversations and powered by the Eleven v3 model. Not intended for real‑time agents. (elevenlabs.io)
  • Endpoints

    • TTS:
      • POST /v1/text-to-speech/:voice_id (Create speech)
      • POST /v1/text-to-speech/:voice_id/stream (Stream speech)
      • Also “with timing” variants for character/word alignment. (elevenlabs.io)
    • Text to Dialogue:
      • POST /v1/text-to-dialogue (Create dialogue)
      • POST /v1/text-to-dialogue/stream (Stream dialogue)
      • “with timestamps” variants return alignment and voice segment info. (elevenlabs.io)
  • Request shape

    • TTS body: { text, model_id (default eleven_multilingual_v2), optional language_code, voice_settings, seed, previous_text/next_text, previous_request_ids/next_request_ids, apply_text_normalization, apply_language_text_normalization }. Voice is selected via the path parameter :voice_id. (elevenlabs.io)
    • Text to Dialogue body: { inputs: [{ text, voice_id }, …], model_id (default eleven_v3), optional language_code, settings, pronunciation_dictionary_locators, seed, apply_text_normalization }. Up to 10 unique voice_ids per request. (elevenlabs.io)
  • Models and real‑time suitability

    • TTS supports multiple models (e.g., Multilingual v2, Flash v2/v2.5, v3); Flash is recommended for low‑latency conversational use, while v3 is more expressive but higher‑latency. (elevenlabs.io)
    • Text to Dialogue defaults to and is documented for Eleven v3; Dialogue mode is part of v3 and emphasizes expressive multi‑speaker delivery rather than real‑time response. (elevenlabs.io)
  • Output

    • Both return audio with the same output_format options (e.g., mp3/wav/pcm) and have streaming variants. TTS “with timing” and Dialogue “with timestamps” endpoints add alignment metadata; Dialogue also returns voice_segments for each speaker. (elevenlabs.io)
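
To make the request-shape difference concrete, here is a rough sketch of the two JSON bodies side by side. The field names and default model IDs are taken from the answer above; the helper names, sample lines, and VOICE_A/VOICE_B placeholders are assumptions for illustration, and nothing is sent to the API.

```python
import json
from typing import List, Tuple


def tts_body(text: str, model_id: str = "eleven_multilingual_v2") -> dict:
    """Single-voice body; the voice itself goes in the URL path."""
    return {"text": text, "model_id": model_id}


def dialogue_body(lines: List[Tuple[str, str]],
                  model_id: str = "eleven_v3") -> dict:
    """Multi-speaker body built from (text, voice_id) pairs."""
    unique_voices = {voice_id for _, voice_id in lines}
    if len(unique_voices) > 10:  # limit quoted in the answer above
        raise ValueError("at most 10 unique voice_ids per request")
    inputs = [{"text": text, "voice_id": voice_id} for text, voice_id in lines]
    return {"inputs": inputs, "model_id": model_id}


single = tts_body("Hello from one voice.")
multi = dialogue_body([
    ("Did you hear the news?", "VOICE_A"),
    ("Not yet. Tell me.", "VOICE_B"),
])
print(json.dumps(single, indent=2))
print(json.dumps(multi, indent=2))
```

The first body would be posted to /v1/text-to-speech/:voice_id and the second to /v1/text-to-dialogue.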

errant wind
#

What is the difference between stream and without stream?

grave galeBOT
#
AI-generated Answer for chungx10

In ElevenLabs Text to Speech, “stream” vs “without stream” refers to how the audio is delivered:

  • Stream (low‑latency, incremental delivery)

    • Endpoint: POST /v1/text-to-speech/:voice_id/stream
    • The server sends audio bytes as they’re generated via HTTP chunked transfer, so you can start playback immediately instead of waiting for the full file. Also available as a variant that streams JSON chunks with base64 audio plus per‑character timings. (elevenlabs.io)
  • Without stream (full file response)

    • Endpoint: POST /v1/text-to-speech/:voice_id
    • The API returns the complete generated audio in one response. There’s also a non‑streaming variant that returns a single JSON with base64 audio and timing data. (elevenlabs.io)

When to use which:

  • Use streaming for real‑time/low‑latency playback or when text arrives in parts. (elevenlabs.io)
  • Use non‑streaming when you can wait for the full file (simpler download/post‑processing). (elevenlabs.io)
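
On the client side, the difference can be sketched as two ways of reading the response body. The example below uses an in-memory io.BytesIO as a stand-in for the HTTP response, so it runs without a network call; the function names and the 4096-byte chunk size are assumptions for illustration.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(response: BinaryIO,
                      chunk_size: int = 4096) -> Iterator[bytes]:
    """Streaming style: yield audio bytes as soon as each chunk is read."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        yield chunk


def read_full_audio(response: BinaryIO) -> bytes:
    """Non-streaming style: wait for the entire body, then return it."""
    return response.read()


# Stand-in for a response body of 10,000 bytes (no real audio, no network).
fake_stream = io.BytesIO(b"\x00" * 10_000)
sizes = [len(chunk) for chunk in iter_audio_chunks(fake_stream)]
print(sizes)  # [4096, 4096, 1808]

fake_file = io.BytesIO(b"\x00" * 10_000)
print(len(read_full_audio(fake_file)))  # 10000
```

With a real response, each yielded chunk could be handed to an audio player or written to disk immediately, while the non-streaming read only returns once the full file is available.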

Relevant docs:

  • Streaming overview and chunked transfer explanation; TTS streaming endpoints. (elevenlabs.io)
  • Non‑streaming TTS endpoints. (elevenlabs.io)
  • TTS capability overview (mentions streaming real‑time audio, latency guidance). (elevenlabs.io)

Sources:
Stream speech | ElevenLabs Documentation
Create speech | ElevenLabs Documentation
Streaming | ElevenLabs Documentation
Text to Speech | ElevenLabs Documentation

errant wind
#

What is the default stability for the TTS Eleven v3 model if it is not specified in the API request?

grave galeBOT
#
AI-generated Answer for chungx10

0.5 (50%).

Notes:

  • If you don’t pass voice_settings in the TTS request, the API uses the voice’s stored settings; absent that, the system defaults apply. The documented default for stability is 0.5. (elevenlabs.io)
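
If you would rather not depend on stored or system defaults, a minimal sketch is to set stability explicitly in voice_settings. The helper name and sample text below are assumptions; only the voice_settings and stability field names and the 0.5 default come from the answer above.

```python
import json
from typing import Optional


def body_with_stability(text: str, stability: Optional[float] = None) -> dict:
    """Include voice_settings only when a stability value is given."""
    body = {"text": text, "model_id": "eleven_v3"}
    if stability is not None:
        body["voice_settings"] = {"stability": stability}
    return body


# Omitted: the API falls back to the voice's stored settings or the default.
print(json.dumps(body_with_stability("Hello.")))
# Explicit: the request carries the value itself.
print(json.dumps(body_with_stability("Hello.", stability=0.5)))
```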

Sources:
Create speech | ElevenLabs Documentation

errant wind
#

How can I insert an expression like [excited] into a TTS API request?

grave galeBOT
#
AI-generated Answer for chungx10

Short answer: Put audio tags directly in the text you send to the Text to Speech endpoint, and use the Eleven v3 model. Example:

curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID?output_format=mp3_44100_128" \
  -H "Content-Type: application/json" \
  -H "xi-api-key: YOUR_API_KEY" \
  -d '{
    "text": "[excited] That was amazing! Let’s do it again.",
    "model_id": "eleven_v3"
  }'

  • Audio tags like [excited], [whispers], [laughs] are interpreted by Eleven v3 when they appear inline in your text. Place them before or after the phrase you want to affect. See the prompting guide for examples. (elevenlabs.io)
  • Eleven v3 is the model that supports these audio tags; set model_id to eleven_v3. (elevenlabs.io)
  • If you need pauses with v3, use [pause], [short pause], or [long pause] (v3 does not support SSML <break/>; other models do). (help.elevenlabs.io)
  • The same inline tags work with streaming TTS as well; use the TTS streaming endpoint and keep model_id=eleven_v3. (elevenlabs.io)
  • Full Create speech API reference (body shape and parameters): (elevenlabs.io)

Sources:
Best practices | ElevenLabs Documentation
Models | ElevenLabs Documentation
Do pauses and SSML phoneme tags work with the API? – ElevenLabs
Streaming | ElevenLabs Documentation
Create speech | ElevenLabs Documentation
How do audio tags work with Eleven v3? – ElevenLabs

errant wind
#

It doesn't work for TTS, only for text to dialogue.