#WHY CANT I USE TURBO V2 ASR WITH CONVERSATIONAL V3 VOICE


pallid flint BOT
AI-generated Answer for palosireeni

Short answer: because Turbo v2 isn’t an ASR model. Turbo v2 is a Text-to-Speech (TTS) model, while Speech-to-Text (ASR) in ElevenLabs is provided by Scribe v2 (batch) or Scribe v2 Realtime. In Agents, “V3 Conversational” is the TTS side, and it’s designed to work with Scribe v2 Realtime for recognition and turn‑taking—not with Turbo v2.

What to use instead

  • ASR: Scribe v2 Realtime (modelId: scribe_v2_realtime).
  • TTS: Eleven v3 Conversational (selected in Agents as the TTS model).

Related notes

  • Standard Eleven v3 is an expressive TTS model but, outside the “Conversational” variant, it isn’t intended for low‑latency real‑time use. Use v3 Conversational in Agents, or pick Turbo/Flash for real‑time TTS if you’re not using Agents.

If you’re integrating via API, stream ASR with scribe_v2_realtime, feed the text to your LLM, then synthesize with the v3 Conversational TTS in Agents; do not try to select “Turbo v2 ASR” as it doesn’t exist.
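The ASR → LLM → TTS flow described above can be sketched roughly as follows. This is a minimal illustration, not ElevenLabs SDK code: `run_pipeline`, `Turn`, and the three injected callables are hypothetical names, and `TTS_MODEL_LABEL` is a placeholder rather than a confirmed model id. Only `scribe_v2_realtime` comes from the answer above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

ASR_MODEL_ID = "scribe_v2_realtime"    # model id quoted in the answer above
TTS_MODEL_LABEL = "v3-conversational"  # hypothetical placeholder, not a confirmed model id


@dataclass
class Turn:
    transcript: str
    reply_text: str
    audio: bytes


def run_pipeline(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes, str], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> Iterator[Turn]:
    """Chain ASR -> LLM -> TTS for each incoming audio chunk."""
    for chunk in audio_chunks:
        transcript = transcribe(chunk, ASR_MODEL_ID)
        if not transcript.strip():
            continue  # skip chunks where nothing was recognized
        reply_text = generate_reply(transcript)
        audio = synthesize(reply_text, TTS_MODEL_LABEL)
        yield Turn(transcript, reply_text, audio)


# Stand-in stages so the sketch runs without any network access.
def fake_transcribe(chunk: bytes, model_id: str) -> str:
    return chunk.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str, model: str) -> bytes:
    return text.encode("utf-8")


turns = list(run_pipeline([b"hello", b"  ", b"bye"], fake_transcribe, fake_llm, fake_tts))
```

In a real integration each stand-in would be replaced by a call to the corresponding service; passing the stages in as callables keeps the ordering logic separate from any particular client library.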

neon hedge

THEN WHY IS ONE OF THE ASR OPTIONS SCRIBE V2 TURBO WHEN CONVERSATIONAL V3 IS NOT ON

pallid flint BOT
AI-generated Answer for palosireeni

Short answer: I can’t find any official ElevenLabs documentation that refers to an ASR model named “Scribe v2 Turbo.”

What the docs do say:

  • The only Speech-to-Text models listed are Scribe v2 (batch) and Scribe v2 Realtime (low-latency ~150 ms). There’s no mention of a “Turbo” variant. (elevenlabs.io)
  • Scribe v2 Realtime is the real-time ASR option used for conversational agents; it’s introduced on the blog and in the streaming docs. (elevenlabs.io)
  • Text-to-Speech model choice (e.g., Eleven v3, Turbo, Flash) is documented separately from ASR. There’s no documented dependency that would hide or show ASR options based on whether “v3 Conversational” is enabled. (elevenlabs.io)
  • The help center notes that standard Eleven v3 isn’t intended for real-time; for low-latency TTS they recommend Turbo/Flash. Separately, the Agents docs describe an “Eleven v3 Conversational” mode for agents. (help.elevenlabs.io)

Given this, if you’re seeing “Scribe v2 Turbo” in the UI, that label isn’t documented publicly; it may correspond to Scribe v2 Realtime, but the docs don’t confirm that. If you need a supported choice for live voice agents today, pick Scribe v2 Realtime for ASR and a low‑latency TTS model (Flash or Turbo) per the docs. (elevenlabs.io)

Sources:
Models | ElevenLabs Documentation
Introducing Scribe v2 Realtime
What is Eleven v3? – ElevenLabs

neon hedge

not true

pallid flint BOT
AI-generated Answer for palosireeni

Here are the official facts:

  • There is no “Turbo v2 ASR.” Turbo v2/v2.5 are Text-to-Speech (TTS) models. In fact, Turbo models are now listed as deprecated in favor of Flash on the models page. (elevenlabs.io)
  • ElevenLabs ASR (speech-to-text) is Scribe only: Scribe v2 (batch) and Scribe v2 Realtime (streaming/WebSocket). (elevenlabs.io)
  • There is no “Scribe v2 Turbo” referenced anywhere in ElevenLabs’ official docs or blog; the documented Scribe variants are v2 and v2 Realtime. (elevenlabs.io)
  • “Eleven v3 Conversational” is a special low‑latency TTS for the Agents platform. To use it, select V3 Conversational as the agent’s TTS model; Expressive Mode is then enabled and the turn‑taking system uses Scribe v2 Realtime. (elevenlabs.io)
  • Outside Agents, the standard Eleven v3 TTS is not recommended for real‑time/conversational use; use Flash (or previously Turbo) for low latency. (help.elevenlabs.io)

If you’re seeing a UI label that says “Scribe v2 Turbo,” that label is not present in the official documentation above. (elevenlabs.io)

Sources:
Models | ElevenLabs Documentation
Expressive mode | ElevenLabs Documentation
What is Eleven v3? – ElevenLabs

neon hedge

THERE ARE 3 ASR MODELS I CAN CHOOSE FROM: SCRIBE REALTIME 2.1, SCRIBE V2 TURBO AND ORIGINAL ASR. WHEN I CHANGE THE VOICE TO CONVERSATIONAL, IT DOESN'T ALLOW ME TO USE TURBO ANYMORE FOR SOME REASON

pallid flint BOT
AI-generated Answer for palosireeni

Short answer: That’s expected. Eleven v3 (“Conversational V3”) is not designed for real‑time/agent use, so real‑time options like Turbo/low‑latency pairings are disabled when you select it. For conversational/real‑time pipelines you need a low‑latency TTS model (Flash or Turbo), and for ASR you should use Scribe v2 Realtime.

What the docs say

  • Eleven v3 is “not suitable for real-time or conversational use cases.” For those, ElevenLabs recommends Turbo v2/v2.5 or Flash v2/v2.5. (help.elevenlabs.io)
  • The supported real‑time ASR is Scribe v2 Realtime (model_id: scribe_v2_realtime) via the realtime STT API/Agents. (elevenlabs.io)
  • If you need model IDs for TTS: Turbo v2.5 = eleven_turbo_v2_5, Turbo v2 = eleven_turbo_v2, Flash v2.5 = eleven_flash_v2_5, Flash v2 = eleven_flash_v2. (help.elevenlabs.io)
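For reference, the model IDs listed above can be kept in a small lookup table. This is only a sketch: the IDs are copied from the list above, while `TTS_MODEL_IDS` and `tts_model_id` are hypothetical helper names, not part of any ElevenLabs library.

```python
# Display name -> model id, exactly as listed in the answer above.
TTS_MODEL_IDS = {
    "Turbo v2.5": "eleven_turbo_v2_5",
    "Turbo v2": "eleven_turbo_v2",
    "Flash v2.5": "eleven_flash_v2_5",
    "Flash v2": "eleven_flash_v2",
}

# Real-time ASR model id, as listed in the answer above.
REALTIME_ASR_MODEL_ID = "scribe_v2_realtime"


def tts_model_id(display_name: str) -> str:
    """Look up a TTS model id, ignoring case and repeated whitespace."""
    key = " ".join(display_name.split()).lower()
    for name, model_id in TTS_MODEL_IDS.items():
        if name.lower() == key:
            return model_id
    raise KeyError(f"no TTS model id listed for {display_name!r}")
```

A name that is not in the table, such as “Scribe v2 Turbo”, raises `KeyError` instead of returning a guess.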

What to do

  • For a conversational agent: pick Flash v2.5 or Turbo v2.5 for TTS, and Scribe v2 Realtime for ASR. Don’t use Eleven v3 in this flow. (help.elevenlabs.io)

If you still see mismatched options after switching models, share a screenshot of the exact page and model selections and I’ll cross‑check against the docs.

Sources:
What is Eleven v3? – ElevenLabs
Realtime | ElevenLabs Documentation
How do I find the model ID? – ElevenLabs

neon hedge

YES IT IS VERY SUITABLE, AT LEAST IT WOULD BE IF I WAS ALLOWED TO USE TURBO. ONE IS ASR AND THE OTHER IS VOICE, THEY SHOULDN'T CONFLICT IN ANY WAY. STOP USING OLD INFORMATION

pallid flint BOT
AI-generated Answer for palosireeni

You’re right that there’s a current, real‑time–optimized v3 option. Eleven v3 Conversational is a low‑latency TTS model for agents, introduced with Expressive mode. In Expressive mode, the turn‑taking system explicitly uses real‑time signals from Scribe v2 Realtime for timing and prosody; the docs only describe v3 Conversational paired with Scribe v2 Realtime, not with any other ASR model. (elevenlabs.io)

On ASR naming: in the public docs, the real‑time ASR model is “Scribe v2 Realtime” (modelId: scribe_v2_realtime). There is no official documentation page for an ASR named “Scribe v2 Turbo,” so I can’t confirm its availability or compatibility. (elevenlabs.io)

If you want to use Eleven v3 Conversational, the documented setup is:

  • TTS: V3 Conversational (enables Expressive mode by default)
  • ASR: Scribe v2 Realtime (~150 ms latency) (elevenlabs.io)
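The documented pairing can be expressed as a simple check. This is a sketch of the rule as described in this thread, not of how the Agents UI actually enforces it; `AgentConfig` and `check_pairing` are hypothetical names, and the strings are display labels rather than API model ids.

```python
from dataclasses import dataclass

V3_CONVERSATIONAL = "V3 Conversational"
SCRIBE_V2_REALTIME = "Scribe v2 Realtime"


@dataclass(frozen=True)
class AgentConfig:
    tts: str  # display label of the TTS model
    asr: str  # display label of the ASR model


def check_pairing(config: AgentConfig) -> list[str]:
    """Return problems with the v3 Conversational coupling rule.

    An empty list only means this one rule is not violated; it does not
    confirm that the ASR label exists or is supported.
    """
    problems = []
    if config.tts == V3_CONVERSATIONAL and config.asr != SCRIBE_V2_REALTIME:
        problems.append(
            f"{V3_CONVERSATIONAL} is only documented with {SCRIBE_V2_REALTIME}, "
            f"not with {config.asr!r}"
        )
    return problems
```

For example, `check_pairing(AgentConfig("V3 Conversational", "Scribe v2 Turbo"))` reports one problem, while the same ASR label paired with a different TTS label reports none, because only the v3 Conversational coupling is encoded here.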

If you must choose a different TTS/ASR combination, the low‑latency TTS models documented for conversational use are Flash v2.5 or Turbo v2.5; select one of those instead of v3 Conversational and then configure ASR as needed. (elevenlabs.io)

Note: Older help content about “v3 not suitable for real‑time” refers to the original Eleven v3 TTS; the newer v3 Conversational specifically addresses real‑time agents. (help.elevenlabs.io)

Sources:
Introducing Expressive Mode for ElevenAgents
Client-side streaming | ElevenLabs Documentation
Expressive mode | ElevenLabs Documentation
Models | ElevenLabs Documentation
What is Eleven v3? – ElevenLabs

neon hedge

WELL TELL THEM TO LET ME CUSTOMIZE IT. IT SAYS THAT Scribe v2.1 Realtime is automatically enabled for V3 models for better latency. BUT IF I WANT LOWER LATENCY THEN THAT'S MY PROBLEM, WHY WOULD THEY NOT LET ME?

pallid flint BOT
AI-generated Answer for palosireeni

Short answer: in ElevenAgents, V3 Conversational is tied to Scribe v2 Realtime and there’s no documented way to override the ASR when that TTS model is selected.

What’s documented

  • Expressive mode (enabled automatically when you pick Eleven v3 Conversational) relies on a turn‑taking system that uses Scribe v2 Realtime signals; this is why that ASR is coupled by default. (elevenlabs.io)
  • Scribe v2 Realtime is integrated into the Agents platform by default; the docs do not describe an option to switch ASR models when using v3 Conversational. (elevenlabs.io)
  • For the lowest TTS latency, ElevenLabs recommends Flash models (~75 ms). Turbo is typically ~250–300 ms. If latency is your top priority, switch your agent’s TTS to Flash v2.5 instead of v3 Conversational. (elevenlabs.io)
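As a rough back-of-the-envelope comparison, the figures quoted in this thread can be added up per TTS family. This is a sketch under a simplifying assumption: that the ~150 ms Scribe v2 Realtime figure and the TTS figures can be summed directly. LLM time, network, and audio buffering are ignored, and the names below are hypothetical.

```python
# Approximate (low, high) latencies in milliseconds, as quoted in this thread.
ASR_LATENCY_MS = (150, 150)   # Scribe v2 Realtime, ~150 ms
TTS_LATENCY_MS = {
    "Flash": (75, 75),        # ~75 ms
    "Turbo": (250, 300),      # ~250-300 ms
}


def speech_latency_range(tts_family: str) -> tuple[int, int]:
    """Sum the ASR and TTS figures; excludes LLM and network time."""
    tts_low, tts_high = TTS_LATENCY_MS[tts_family]
    return ASR_LATENCY_MS[0] + tts_low, ASR_LATENCY_MS[1] + tts_high


flash_range = speech_latency_range("Flash")  # (225, 225)
turbo_range = speech_latency_range("Turbo")  # (400, 450)
```

Under these assumptions the speech stages alone come to roughly 225 ms with Flash versus 400 to 450 ms with Turbo.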

About “Scribe v2.1” and “Scribe v2 Turbo”

  • I can’t find any official documentation for an ASR model named “Scribe v2.1” or “Scribe v2 Turbo.” The public docs list Scribe v2 and Scribe v2 Realtime only. (elevenlabs.io)

Workarounds today

  • If you need the absolute lowest end‑to‑end latency, use Flash v2.5 for TTS in Agents (leaving Scribe v2 Realtime as the default ASR), or build with the API outside Agents where you can pair Scribe v2 Realtime (WebSocket STT) with your chosen TTS. (elevenlabs.io)

Requesting the feature

  • I’ll pass this feedback along. To log it formally so it’s tracked, submit a request in the Help Center or email team@elevenlabs.io. (help.elevenlabs.io)

If you’d like, tell me your exact setup (agent TTS model, transport, streaming settings) and I’ll suggest latency‑focused tuning steps within what’s currently supported.

Sources:
Expressive mode | ElevenLabs Documentation
Realtime Transcription (STT) API - 150ms Latency API
Models | ElevenLabs Documentation
Submit a request – ElevenLabs

neon hedge

I DONT CARE ABOUT LATENCY I WANT CONVERSATIONAL WITH SCRIBE TURBO WHY WOULD THEY NOT GIVE ME THAT OPTION

pallid flint BOT
AI-generated Answer for palosireeni

Short answer: because Eleven v3 Conversational in ElevenAgents is built to work with Scribe v2 Realtime, not other ASR variants. The turn‑taking system that powers v3 Conversational’s expressive, interruption‑aware delivery uses real‑time prosody/VAD signals coming specifically from Scribe v2 Realtime; other ASR models aren’t documented as compatible with this mode. (elevenlabs.io)

What the docs say

  • Expressive Mode (enabled when you pick “Eleven v3 Conversational”) relies on “real-time signals from Scribe v2 Realtime” to decide when the agent should speak, pause, or wait. That coupling is part of the design, not just a latency preference. (elevenlabs.io)
  • Scribe v2 Realtime is the streaming STT model intended for live agent use; its API and model id are documented separately from batch Scribe v2. (elevenlabs.io)

What you can do today

  • If you need to control the ASR model, build against the Realtime Speech‑to‑Text WebSocket API directly and orchestrate your own pipeline; v3 Conversational’s Agents mode does not document support for swapping in other ASR models. (elevenlabs.io)
  • If you just need an agent with a selectable ASR and don’t require v3 Conversational’s expressive turn‑taking, use a different TTS model supported for Agents (e.g., Flash v2.5/Multilingual v2). (elevenlabs.io)

If you want ElevenLabs to add “v3 Conversational + other ASR (e.g., Turbo)” as a selectable combo, please file a request with Support so it’s tracked. (help.elevenlabs.io)

Sources:
Expressive mode | ElevenLabs Documentation
Introducing Scribe v2 Realtime
Realtime | ElevenLabs Documentation
Models | ElevenLabs Documentation
How can I contact support? – ElevenLabs

neon hedge

HOW LONG CAN IT TAKE FOR MY SUPPORT TICKET TO POSSIBLY HAVE EFFECT