#Why does the SDK not provide the same endpoint as the elevenlabs homepage? It seems the homepage us

past kelpBOT
#
AI-generated Answer for ookami_no_ryuu

Short answer: they’re different endpoints by design, and the SDK exposes both. You’re likely calling the raw-audio stream, while the homepage uses the JSON-with-timestamps stream.

  • The homepage’s behavior matches POST /v1/text-to-speech/{voice_id}/stream/with-timestamps, which streams JSON objects that include base64-encoded audio plus character-level alignment. In the SDK this is client.text_to_speech.stream_with_timestamps(...), which yields a stream of JSON chunks. See “Stream speech with timing.” (elevenlabs.io)

  • The SDK method client.text_to_speech.stream(...) calls POST /v1/text-to-speech/{voice_id}/stream, which streams raw audio bytes over HTTP chunked transfer (not JSON/SSE). This is intended for lowest-latency playback without timing data. See “Stream speech” and the Streaming overview. (elevenlabs.io)

Why audio may differ between the two:

  • If you’re not passing identical parameters (model_id, voice_settings, output_format, etc.) to both endpoints, the synthesis can vary. Both endpoints share the same request fields. (elevenlabs.io)
  • Generation isn’t fully deterministic unless you provide a seed; even then it’s “best effort,” not guaranteed. To minimize variation, set the same seed on both requests. (elevenlabs.io)
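One way to check the "identical parameters" point is to diff the two request bodies before sending them. The helper below is a minimal sketch, not part of the SDK: `diff_payloads` and the `"<missing>"` marker are hypothetical names, and the sample field values are illustrative only.

```python
MISSING = "<missing>"  # hypothetical marker for a field present on only one side


def diff_payloads(a: dict, b: dict, prefix: str = "") -> dict:
    """Return {dotted_key: (value_in_a, value_in_b)} for every field that differs."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        path = f"{prefix}{key}"
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested objects such as voice_settings.
            diffs.update(diff_payloads(va, vb, prefix=f"{path}."))
        elif va != vb:
            diffs[path] = (va, vb)
    return diffs


# Illustrative payloads: the second one silently drops the seed and uses another model.
raw_request = {
    "text": "ひ",
    "model_id": "eleven_multilingual_v2",
    "seed": 42,
    "voice_settings": {"stability": 0.5},
}
timed_request = {
    "text": "ひ",
    "model_id": "eleven_flash_v2_5",
    "voice_settings": {"stability": 0.5},
}

for field, (left, right) in diff_payloads(raw_request, timed_request).items():
    print(f"{field}: stream={left!r} with-timestamps={right!r}")
```

An empty result means the two bodies match field for field, so any remaining difference in the audio is not coming from the request itself.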

What to do:

  • If you want the homepage behavior (JSON chunks with timestamps), call stream_with_timestamps in the SDK and parse audio_base64/alignment from each chunk. (elevenlabs.io)
  • If you only need audio for immediate playback, use stream (raw bytes). (elevenlabs.io)
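Parsing the JSON chunks could look roughly like the sketch below. It assumes each chunk arrives as one JSON object in text form, and that `alignment` carries `characters`, `character_start_times_seconds`, and `character_end_times_seconds`; those alignment field names are an assumption to verify against the "Stream speech with timing" reference, and the SDK may hand you already-parsed objects instead of strings. `collect_timed_audio` is a hypothetical helper name.

```python
import base64
import json
from typing import Iterable, List, Tuple


def collect_timed_audio(chunks: Iterable[str]) -> Tuple[bytes, List[Tuple[str, float, float]]]:
    """Concatenate decoded audio and gather (character, start_s, end_s) triples."""
    audio = bytearray()
    timings: List[Tuple[str, float, float]] = []
    for raw in chunks:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("audio_base64"):
            audio.extend(base64.b64decode(chunk["audio_base64"]))
        alignment = chunk.get("alignment")  # may be absent or null on some chunks
        if alignment:
            timings.extend(
                zip(
                    alignment["characters"],
                    alignment["character_start_times_seconds"],
                    alignment["character_end_times_seconds"],
                )
            )
    return bytes(audio), timings
```

The returned bytes can be written to a file or fed to a player, while the timing triples show which characters the model actually aligned, which is useful when a single kana is read unexpectedly.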

For completeness: there’s also a non-streaming “with-timestamps” endpoint that returns a single JSON with base64 audio and alignment if you don’t need live streaming. (elevenlabs.io)

Sources:
Stream speech with timing | ElevenLabs Documentation
Stream speech | ElevenLabs Documentation
Create speech with timing | ElevenLabs Documentation

honest yacht
#

The audio is completely different though, it says a different word.

E.g. Japanese ひ is pronounced "Hai" instead of "Hee".

I'm using the exact same payload and parameters, the only difference is the response type.

past kelpBOT
#
AI-generated Answer for ookami_no_ryuu

Short answer: the response type alone should not change what is spoken. Both endpoints synthesize with the same request fields (voice_id, model_id, voice_settings, etc.); they only differ in how the result is delivered (raw audio stream vs JSON chunks with base64 audio and alignment). Please compare against the API docs for Stream and Stream-with-timestamps. (elevenlabs.io)

Why you’re hearing different words (and how to fix it)

  • Nondeterminism: TTS generations are stochastic. Use the seed parameter to get best‑effort repeatability across runs/endpoints. Determinism is not guaranteed. (elevenlabs.io)
  • Model mismatch: If one call is defaulting to a different model than the other (or than the website), output can change. Explicitly set model_id on both requests (e.g., eleven_multilingual_v2, eleven_flash_v2_5, etc.). The Help Center lists this as the most common cause of “sounds different” between surfaces. (help.elevenlabs.io)
  • Japanese pronunciation: For kana like ひ, enable apply_language_text_normalization=true (Japanese‑only) to improve readings; note it can increase latency. If you use Turbo v2.5 or Flash v2.5, also set language_code="ja" to enforce Japanese. (elevenlabs.io)
  • Voice settings variance: Differences in stability/similarity/style can noticeably change pronunciation. Ensure identical voice_settings on both calls. (help.elevenlabs.io)
  • Latency “optimizations”: If you’re passing optimize_streaming_latency, be aware it’s deprecated and higher levels can affect quality and even normalization. Leave it unset/None. (elevenlabs.io, help.elevenlabs.io)

Checklist to make both endpoints speak identically

  • Set the same: voice_id, model_id, output_format, voice_settings. (elevenlabs.io)
  • Add seed with a fixed integer on both requests. (elevenlabs.io)
  • For Japanese, set apply_language_text_normalization=true; if on Turbo/Flash v2.5, also set language_code="ja". (elevenlabs.io)
  • Ensure optimize_streaming_latency is not used. (elevenlabs.io)
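The checklist can be sketched as one payload that is built once and reused for both endpoints. This is only an illustration: the host `https://api.elevenlabs.io`, the model id `eleven_turbo_v2_5`, the `mp3_44100_128` format, and sending `output_format` as a query parameter are assumptions to check against the API reference, and `build_tts_payload` / `build_requests` are hypothetical helper names.

```python
BASE_URL = "https://api.elevenlabs.io"  # assumed API host
V2_5_MODELS = ("eleven_turbo_v2_5", "eleven_flash_v2_5")  # assumed model ids


def build_tts_payload(text, model_id, voice_settings, seed, japanese=False):
    """Build one request body to be shared by both streaming endpoints."""
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": dict(voice_settings),
        "seed": seed,  # fixed integer for best-effort repeatability
    }
    if japanese:
        payload["apply_language_text_normalization"] = True
        if model_id in V2_5_MODELS:
            payload["language_code"] = "ja"
    # optimize_streaming_latency is deliberately never added.
    return payload


def build_requests(voice_id, payload, output_format):
    """Pair the same body and query parameters with each endpoint URL."""
    base = f"{BASE_URL}/v1/text-to-speech/{voice_id}/stream"
    params = {"output_format": output_format}
    return {
        "stream": {"url": base, "params": params, "json": payload},
        "stream_with_timestamps": {
            "url": f"{base}/with-timestamps",
            "params": params,
            "json": payload,
        },
    }


payload = build_tts_payload(
    "ひ", "eleven_flash_v2_5", {"stability": 0.5, "similarity_boost": 0.75},
    seed=1234, japanese=True,
)
requests_by_endpoint = build_requests("VOICE_ID", payload, "mp3_44100_128")
```

Because both entries reference the same `payload` object, the two calls cannot drift apart in model, settings, or seed; only the URL differs.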

If a specific token still mispronounces, you can force a reading with a pronunciation dictionary (alias or phoneme, model‑dependent) and attach it via pronunciation_dictionary_locators.
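Attaching a dictionary might look like the sketch below. The locator shape (`pronunciation_dictionary_id` plus an optional `version_id`) is an assumption based on the public API reference and should be confirmed there; `with_pronunciation_dictionary` and the ids shown are hypothetical placeholders.

```python
from typing import Optional


def with_pronunciation_dictionary(
    payload: dict, dictionary_id: str, version_id: Optional[str] = None
) -> dict:
    """Return a copy of payload with one more pronunciation dictionary locator."""
    locator = {"pronunciation_dictionary_id": dictionary_id}
    if version_id is not None:
        locator["version_id"] = version_id
    updated = dict(payload)  # shallow copy so the original body is left untouched
    existing = list(payload.get("pronunciation_dictionary_locators", []))
    updated["pronunciation_dictionary_locators"] = existing + [locator]
    return updated


base_payload = {"text": "ひ", "model_id": "eleven_multilingual_v2"}
patched = with_pronunciation_dictionary(base_payload, "DICTIONARY_ID", "VERSION_ID")
```

Send the same patched body to both endpoints so the forced reading applies equally to the raw stream and the timestamped stream.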

honest yacht
#

Does the API cache responses and/or is there any different caching behaviour between plan types?