#I've seen that some companies online use v3. I can tell it's v3 because they allow you to tag sente


honest shoalBOT
#
AI-generated Answer for harry018485

Short answer

  • There are no hidden, undocumented v3 parameters. What you’re seeing is companies using the standard Text to Speech API with the v3 model and the documented voice_settings (speed, similarity_boost, stability, style, use_speaker_boost), plus inline [audio tags] for emotion. The Text to Dialogue API exposes only limited request-level settings (stability); other controls come from the voice’s stored settings and tags in the text. (elevenlabs.io)

How to do the same

  1. Use v3 in the normal Text to Speech API
  • Call Create speech (or Stream speech/WebSocket) and set model_id to eleven_v3. v3 supports inline audio tags in square brackets to control emotion, delivery, and reactions (e.g., [sad], [whispers], [laughs]). (help.elevenlabs.io)
  • Example on the official v3 page shows Text to Speech with eleven_v3 and tags like [slowly], [chuckles], [whispers]. (elevenlabs.io)
  2. Expose per-request voice controls in your UI
  • Pass the voice_settings object on the TTS request to override a voice’s stored settings for that generation:
    • stability
    • similarity_boost
    • style (style exaggeration)
    • use_speaker_boost
    • speed
      These are the documented fields returned by Get default voice settings and accepted by Edit voice settings. Speed is supported and ranges roughly 0.7–1.2 (default 1.0). (elevenlabs.io)
  • The TTS WebSocket API also accepts voice_settings (e.g., speed, stability, similarity_boost) if you prefer streaming. (elevenlabs.io)
  3. If you use the Text to Dialogue API
  • Request-level “settings” are limited; the docs emphasize Stability (and note Similarity as a voice-level setting influencing consistency). To control similarity, style, speaker boost, or baseline speed in Dialogue, set them on the voice itself via the Voices settings API, and steer moment-to-moment delivery with [audio tags] in each utterance. Dialogue mode itself is v3-only. (elevenlabs.io)
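
The steps above can be sketched as a single request. This is a minimal sketch, assuming the Create speech endpoint at https://api.elevenlabs.io/v1/text-to-speech/{voice_id} and the xi-api-key header (both appear later in this thread). The helper names build_tts_request and send_tts_request are hypothetical, the voice ID is a placeholder, the setting values are illustrative only, and the third-party requests package is assumed to be installed for the network call. Note that a later reply in this thread disputes whether eleven_v3 honors every voice_settings field, so treat the settings as something to verify by ear.

```python
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(voice_id, text, voice_settings=None):
    """Return (url, body) for a Create speech call that selects Eleven v3."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = {"text": text, "model_id": "eleven_v3"}
    if voice_settings:
        # Per-request override of the voice's stored settings.
        body["voice_settings"] = dict(voice_settings)
    return url, body


def send_tts_request(api_key, url, body):
    """POST the request and return the raw audio bytes (needs network access)."""
    import requests  # third-party; imported lazily so the builder works without it

    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    response = requests.post(url, headers=headers, json=body)
    response.raise_for_status()
    return response.content


url, body = build_tts_request(
    "YOUR_VOICE_ID",  # placeholder, not a real voice
    "[whispers] I have a secret. [laughs] Just kidding.",
    {"stability": 0.5, "similarity_boost": 0.75, "speed": 1.0},
)
print(url)
print(body)
```

Keeping the request builder separate from the network call makes it easy to log or diff the exact body being sent, which helps when a setting appears to have no effect.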

Confirmations

  • v3 supports [emotion/delivery] tags via square brackets; use them in both TTS and Dialogue. (help.elevenlabs.io)
  • v3 is available via the normal Text to Speech endpoints by specifying model_id=eleven_v3. (help.elevenlabs.io)

If you want, I can sketch minimal request examples showing Create speech with voice

rose spear
#

What's the difference between the Text to Dialogue API and the Text to Speech API? Do they both support inline audio tags for emotion? Why and in what cases would one use the Text to Dialogue API?

honest shoalBOT
#
AI-generated Answer for harry018485

Short answer

  • Core difference

    • Text to Speech (TTS): Single-speaker synthesis per request. You pass one voice_id and text and get speech back. Works with multiple models (e.g., v2.x, Flash, Turbo, and v3 when selected via model_id). (elevenlabs.io)
    • Text to Dialogue (TTD): Multi-speaker synthesis in one request. You pass an ordered list of {text, voice_id} turns and the API weaves them into a single, context-aware conversation. It’s built for Eleven v3 and is not intended for real-time use. Unlimited speakers supported. (elevenlabs.io)
  • Do both support inline audio tags for emotion?

    • Yes—when you use the Eleven v3 model. Audio tags are words in square brackets (e.g., [sad], [laughs], [whispers]) that v3 interprets to control delivery. They work via both Create speech (TTS) and Create/Stream dialogue endpoints when model_id is eleven_v3. (help.elevenlabs.io)
  • When to use Text to Dialogue

    • Use TTD when you need natural multi-speaker scenes with pacing, interruptions, and shared emotional context generated as one coherent track (e.g., games, dramas, podcasts, audiobooks with characters). It’s optimized for quality and interaction between speakers rather than low latency; several generations may be needed to get the best take. (elevenlabs.io)

Additional notes

  • v3 can also be used through the standard TTS endpoints by setting model_id to eleven_v3; Text to Dialogue defaults to v3. (elevenlabs.io)
  • A streaming variant exists for both TTS and Dialogue if you want incremental audio delivery, but Dialogue mode is still not meant for real-time agent use. (elevenlabs.io)
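
To make the contrast concrete, here is a hedged sketch of the two request bodies. The TTS body follows the Create speech shape used elsewhere in this thread. For Dialogue, the inputs field name is an assumption based on the "ordered list of {text, voice_id} turns" description above, so check the Create dialogue reference before relying on it. The voice IDs are placeholders and the helper names are hypothetical.

```python
def tts_body(text):
    """Single-speaker body; the voice_id goes in the URL path, not the body."""
    return {"text": text, "model_id": "eleven_v3"}


def dialogue_body(turns):
    """Multi-speaker body built from ordered (voice_id, text) turns.

    The "inputs" key is an assumption; see the note above the code.
    """
    return {
        "model_id": "eleven_v3",
        "inputs": [{"text": text, "voice_id": voice_id} for voice_id, text in turns],
    }


turns = [
    ("VOICE_ID_A", "[excited] Did you hear the news?"),
    ("VOICE_ID_B", "[sighs] Not yet. Go on."),
    ("VOICE_ID_A", "[whispers] We shipped it."),
]
single = tts_body("[excited] Did you hear the news?")
multi = dialogue_body(turns)
print(len(multi["inputs"]), "turns in one dialogue request")
```

In both cases the audio tags live inside the text itself; only the way speakers are assigned differs.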

Sources:
Create speech | ElevenLabs Documentation
Text to Dialogue | ElevenLabs Documentation
How do audio tags work with Eleven v3 (Alpha)? – ElevenLabs
Models | ElevenLabs Documentation
Stream speech | ElevenLabs Documentation

rose spear
#

While using inline audio tags, how do I add pauses to my audio?

honest shoalBOT
#
AI-generated Answer for harry018485

Short answer:

  • With v3 audio tags: insert [pause] (or similar timing cues like [continues after a beat]). These control pacing qualitatively; duration is context-dependent. (elevenlabs.io)
  • For exact timing: put the SSML-style break tag directly in your text: <break time="x.xs" /> (max 3.0s). Works in the API and Speech Synthesis editor; each break is billed as 10 characters. Use sparingly—overuse can cause speed-ups or artifacts. (help.elevenlabs.io)

Example:
Hello there.<break time="1.0s" />Okay, let’s continue. (help.elevenlabs.io)
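
If you build scripts programmatically, a small helper can keep break tags within the limit quoted above. This is a minimal sketch: the helper names break_tag and join_with_pauses are hypothetical, and the 3.0 second cap is taken from the answer rather than enforced by any library.

```python
MAX_BREAK_SECONDS = 3.0  # limit quoted in the answer above


def break_tag(seconds):
    """Return an SSML-style break tag, rejecting durations above the quoted limit."""
    if not 0 < seconds <= MAX_BREAK_SECONDS:
        raise ValueError(
            f"break must be in (0, {MAX_BREAK_SECONDS}] seconds, got {seconds}"
        )
    # One decimal place matches the "x.xs" form shown in the answer.
    return f'<break time="{seconds:.1f}s" />'


def join_with_pauses(sentences, seconds):
    """Join sentences with the same fixed pause between each pair."""
    return break_tag(seconds).join(sentences)


text = join_with_pauses(["Hello there.", "Okay, let's continue."], 1.0)
print(text)
```

The resulting string goes into the text field of the request unchanged.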

Sources:
Eleven v3 (alpha) — The most expressive Text to Speech model
How can I add pauses? – ElevenLabs

rose spear
#

What are valid values for the stability parameter, i.e., what range of values is accepted?

honest shoalBOT
#
AI-generated Answer for harry018485

Short answer: ElevenLabs does not publish a hard numeric min/max for stability.

What the docs do state:

  • In the API, stability is a floating‑point field (“double”) in voice_settings; the default returned by GET /v1/voices/settings/default is 1.0. (elevenlabs.io)
  • Example requests set stability with fractional values (e.g., 0.5) in the Text‑to‑Speech WebSocket API. (elevenlabs.io)
  • Tuning guidance (not a strict limit) suggests working in roughly the 0.30–0.85 range depending on desired expressiveness. (elevenlabs.io)

If you need absolute bounds enforced by the API, they are not specified in the public documentation; treat stability as an unbounded double in requests, with 1.0 as the documented default and the above ranges as recommendations. (elevenlabs.io)

Sources:
Get default voice settings | ElevenLabs Documentation
WebSocket | ElevenLabs Documentation
Conversational AI voice design guide | ElevenLabs Documentation

rose spear
#

When using the Text to Speech API with eleven_v3 as the model, I changed the value in the speed parameter to 1.75 but the audio came out longer than when I didn't specify any value in the parameter. What happened there? Is there a valid range for speed values?

honest shoalBOT
#
AI-generated Answer for harry018485

Short answer: you used an unsupported value.

  • Valid speed range: 0.7–1.2. Default is 1.0. Values outside this range are “not supported,” so behavior is undefined (it may be ignored or lead to unexpected pacing). Use values within 0.7–1.2. (elevenlabs.io)
  • Where to set it: pass speed inside voice_settings on Text to Speech requests (or store it on the voice). Speed was added to the TTS API in Feb 2025 and is exposed as voice_settings.speed. (elevenlabs.io)
  • Model note: eleven_v3 is supported by the Text to Speech API, so the same speed rules apply. (elevenlabs.io)

Why your audio was longer: because 1.75 is out of range, the API didn’t apply a valid “faster” speed; instead, you hit unsupported behavior, which can change duration unpredictably. Keep speed ≤1.2 for faster output and re-test. (elevenlabs.io)
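
One way to avoid this class of problem is to validate speed on the client before sending it. This is a minimal sketch, assuming the 0.7–1.2 range quoted above; the helper names checked_speed and voice_settings_with_speed are hypothetical, and the check runs locally without calling the API.

```python
SPEED_MIN, SPEED_MAX = 0.7, 1.2  # range quoted in the answer above


def checked_speed(speed):
    """Return speed unchanged if it is inside the quoted range, else raise."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(
            f"speed {speed} is outside the supported range {SPEED_MIN}-{SPEED_MAX}"
        )
    return speed


def voice_settings_with_speed(speed):
    """Build the voice_settings fragment for a Text to Speech request body."""
    return {"speed": checked_speed(speed)}


print(voice_settings_with_speed(1.15))
try:
    voice_settings_with_speed(1.75)
except ValueError as err:
    print(f"rejected: {err}")
```

Raising instead of silently clamping makes an out-of-range value visible at the call site rather than in the audio.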

Sources:
Speed control | ElevenLabs Documentation
February 25, 2025 | ElevenLabs Documentation
Models | ElevenLabs Documentation

rose spear
#

It didn't work at all. The audio length stayed the same, and so did the speed. I tried different values: 0.8, 1.2, 0.85, 1.15. They all returned audio of the same length.

honest shoalBOT
#
AI-generated Answer for harry018485

Short answer: the API applies speed only if you set it in the voice settings (either per request via voice_settings or by saving it on the voice). A speed value supplied anywhere else is ignored.

What to check and how to fix

  • Make sure you’re sending speed inside voice_settings in the Create speech request body (or save it on the voice first, then omit voice_settings). Example: voice_settings: { "speed": 0.85 }. The Create speech endpoint explicitly documents per-request voice_settings overrides. (elevenlabs.io)
  • Alternatively, persist the setting on the voice via POST /v1/voices/:voice_id/settings/edit (includes speed), then call TTS normally. (elevenlabs.io)
  • You can verify what the service will use by calling GET /v1/voices/:voice_id/settings (or GET /v1/voices/settings/default). Both responses include speed. (elevenlabs.io)
  • Valid speed range is 0.7–1.2 (default 1.0). Values outside this range aren’t supported; don’t rely on them being applied. (elevenlabs.io)
  • Eleven v3 is supported by the Text to Speech API; use model_id: "eleven_v3" if you want v3 through TTS. (elevenlabs.io)

If you still see identical durations after moving speed into voice_settings (or saving it on the voice), please share the exact request body you’re sending so we can pinpoint the mismatch. (elevenlabs.io)
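
The checks above map to three requests. The sketch below only builds (method, url, body) triples so the exact calls can be inspected before anything is sent. The paths are the ones quoted in the answer; the helper names are hypothetical, the voice ID is a placeholder, and sending speed as the only field in the edit call is an assumption to confirm against the Edit voice settings reference.

```python
API_BASE = "https://api.elevenlabs.io"


def get_voice_settings(voice_id):
    """Request that reads the settings currently stored on a voice."""
    return ("GET", f"{API_BASE}/v1/voices/{voice_id}/settings", None)


def edit_voice_settings(voice_id, settings):
    """Request that persists settings (for example speed) on the voice."""
    return ("POST", f"{API_BASE}/v1/voices/{voice_id}/settings/edit", dict(settings))


def create_speech(voice_id, text, voice_settings=None):
    """Request that generates speech, optionally overriding stored settings."""
    body = {"text": text, "model_id": "eleven_v3"}
    if voice_settings is not None:
        body["voice_settings"] = dict(voice_settings)
    return ("POST", f"{API_BASE}/v1/text-to-speech/{voice_id}", body)


plan = [
    edit_voice_settings("YOUR_VOICE_ID", {"speed": 0.85}),  # persist on the voice
    get_voice_settings("YOUR_VOICE_ID"),                    # confirm what is stored
    create_speech("YOUR_VOICE_ID", "Hello there."),         # no per-request override
]
for method, url, body in plan:
    print(method, url, body)
```

Each call also needs the xi-api-key header when it is actually sent.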

Sources:
Create speech | ElevenLabs Documentation
Edit voice settings | ElevenLabs Documentation
Get voice settings | ElevenLabs Documentation
Speed control | ElevenLabs Documentation
Models | ElevenLabs Documentation

rose spear
#

Here is the Python code I use, with the API key and the voice_id hidden:

import requests

# voice_id and x (the API key) are defined elsewhere and hidden here.
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

query_param = {
    "output_format": "mp3_44100_128"
}

headers = {
    "Accept": "audio/mpeg",
    "xi-api-key": x,
    "Content-Type": "application/json"
}

data = {
    "text": "[enthusiastic]Hi there. [cheerful] I am one of the MaxDay avatars and I can help you convert your leads today! See you later",
    "model_id": "eleven_v3",
    "voice_settings": {
        "speed": 1.15
    }
}

try:
    response = requests.post(url, headers=headers, json=data, params=query_param)
    response.raise_for_status()

    # Save the returned MP3 bytes to disk.
    output_filename = "output_audio_5_fast.mp3"
    with open(output_filename, "wb") as f:
        f.write(response.content)
    print(f"Audio saved successfully to {output_filename}")

except requests.exceptions.HTTPError as err:
    print(f"HTTP error occurred: {err}")
    print(f"Response content: {response.text}")
except requests.exceptions.RequestException as err:
    print(f"An error occurred: {err}")

I believe I am making the correct request to the right endpoint, but please correct me if I'm wrong

lilac estuary
#

The v3 model does not support these parameters; they are specifically for the other models. You can find all valid parameters in the documentation. For the v3 model, the only one is stability, as you pointed out in the beginning.

honest shoalBOT
#
AI-generated Answer for jonathan_elevenlabs.io

Not quite. Here’s what the official docs say:

  • Eleven v3 is supported by the normal Text to Speech API (set model_id=eleven_v3), and it supports inline audio tags like [sad], [whispers], [laughs], etc. (help.elevenlabs.io)
  • The TTS API accepts the standard voice_settings object (request-level overrides), which includes speed, stability, similarity_boost, style, and use_speaker_boost. These are general voice settings, not limited to “other models.” (elevenlabs.io)
  • Speed is available for all voices and all models; valid range is 0.7–1.2 (1.0 default). Values outside this range are unsupported and may be ignored, which would explain why 1.75 did not speed up the audio. (help.elevenlabs.io)
  • The Text to Dialogue API (v3-only) exposes only a minimal settings object for dialogue generation (the reference shows a single property), and it relies on audio tags in your script to control emotion, pacing, pauses, etc. It does not expose speed/similarity/style parameters. (elevenlabs.io)

So: with v3 via Text to Speech you can use speed/similarity/style/etc in voice_settings; with Text to Dialogue you primarily use stability plus inline tags for delivery control. (elevenlabs.io)

Sources:
What is Eleven v3 (Alpha)? – ElevenLabs
Get default voice settings | ElevenLabs Documentation
Can I change the pace of the voice? – ElevenLabs
Create dialogue | ElevenLabs Documentation

rose spear