#I tried a multi-language voice agent setup and I don't think it's working at all
Thanks for flagging this—here’s the quickest way to verify and fix a multilingual Voice Agent setup, strictly per ElevenLabs docs:
- Add languages (Agent tab)
- Ensure you’ve added specific languages or “All” under Additional languages. Adding any non‑English language switches the agent to the Multilingual v2.5 path; English responses continue to use Flash v2. (elevenlabs.io)
- Configure a voice per language
- For natural pronunciation, assign language‑specific voices in Voice & language. The Language guide explicitly recommends selecting voices per language. (elevenlabs.io)
- Enable automatic language switching (not on by default)
- Add the Language detection system tool. It lets the agent switch output language when the user speaks another language or requests a change. Without this tool, the agent won’t auto‑switch. (elevenlabs.io)
- If you’re using the web widget
- The widget can prompt for a preferred language before the call, or you can force one with override-language="es". Note: “Language selection is fixed for the duration of the call” in the widget—users can’t change it mid‑conversation unless you rely on the language detection tool to trigger a switch. (elevenlabs.io)
- If you’re using an SDK/API
- Pass a conversation override (agent.language, voice, first message, etc.) when starting the session. Overrides must be enabled in the agent’s Security tab. (elevenlabs.io)
- Verify config quickly
- Call GET /v1/convai/agents/:agent_id/widget to confirm the widget_config.language and supported_language_overrides your agent is actually exposing. (elevenlabs.io)
- If you expect different voices for different languages within one conversation
- Turn on Multi‑voice support and use the documented XML‑style tags (e.g., <Spanish>…</Spanish>). Voice labels are case‑sensitive; nested tags aren’t supported; max 10 configured voices per agent. (elevenlabs.io)
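The multi-voice constraints above (case-sensitive labels, no nested tags, at most 10 voices) can be linted locally before text reaches the agent. This is a minimal Python sketch: `check_multivoice_text` and its regex are hypothetical helpers, not part of any ElevenLabs SDK, and the tag grammar is assumed from the `<Spanish>…</Spanish>` example.

```python
import re

MAX_VOICES = 10  # per-agent limit quoted in the answer above

# Assumption: tags look like <Label> ... </Label>, as in the <Spanish> example.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w-]*)>")


def check_multivoice_text(text, voice_labels):
    """Return a list of problems found in `text`; an empty list means it looks OK."""
    problems = []
    if len(voice_labels) > MAX_VOICES:
        problems.append(
            f"{len(voice_labels)} voices configured; the limit is {MAX_VOICES}"
        )

    open_label = None  # label of the tag currently open, if any
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        label = match.group(2)
        if label not in voice_labels:  # exact match, so case matters
            problems.append(f"unknown voice label <{label}> (labels are case-sensitive)")
            continue
        if not closing:
            if open_label is not None:
                problems.append(f"<{label}> is nested inside <{open_label}>")
            else:
                open_label = label
        elif open_label == label:
            open_label = None
        else:
            problems.append(f"unexpected closing tag </{label}>")

    if open_label is not None:
        problems.append(f"<{open_label}> is never closed")
    return problems


labels = {"Spanish", "Romanian"}
print(check_multivoice_text("<Spanish>Hola</Spanish> and hello", labels))
print(check_multivoice_text("<spanish>Hola</spanish>", labels))
```

The first call reports no problems; the second flags both lowercase tags because `spanish` does not match the configured label `Spanish`.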
Notes on model/language support
- Agents support 31+ languages across voices/models; Flash v2.5 adds 32 low‑latency languages (including Norwegian, Hungarian, Vietnamese). If a language isn’t responding as expected, verify you’re on the multilingual path and that a matching voice is set. (elevenlabs.io)
If you share how you’re running the agent (widget vs SDK), which languages you enabled,
OK, now I understand it was the function, but how can I improve the Romanian transcription quality with low-latency, faster LLM models?
Short answer: use Scribe v2 Realtime for STT, lock the language to Romanian, tune commit/VAD for your latency target, and pick a fast LLM in Agents (Flash/Haiku/4o‑mini). Details below with docs.
What to change
- Switch STT to Scribe v2 Realtime
- model_id: scribe_v2_realtime
- language_code: "ro" (Romanian) to avoid auto‑detect flips
- audio_format/sample_rate: pcm_16000 at 16 kHz (default in examples)
- Realtime WS endpoint: wss://api.elevenlabs.io/v1/speech-to-text/realtime
- Docs: API reference (handshake params, sample with model_id, language_code, pcm_16000) and Scribe v2 Realtime overview. (elevenlabs.io)
- Tune commit strategy for latency vs stability
- Use commit_strategy: "manual" (you decide when to finalize) or "vad" (auto based on silence).
- If using VAD, adjust: vad_silence_threshold_secs (0.3–3.0), vad_threshold (0.1–0.9), min_speech_duration_ms/min_silence_duration_ms (50–2000) to balance speed and accuracy. (elevenlabs.io)
- Provide short context while streaming
- Send previous_text in input_audio_chunk to help the model with names/terms in Romanian across turns. (elevenlabs.io)
- Reduce background noise before STT (optional but helps quality)
- Use Audio Isolation (Voice Isolator) API to clean the mic feed (supports streaming and 16 kHz PCM path for lower latency). (elevenlabs.io)
- Confirm Romanian support
- Romanian is a supported language (ISO 639-1 ro / 639-3 ron). (elevenlabs.io)
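As a rough illustration of the settings listed above, here is a Python sketch that builds the realtime STT connection URL with the language locked to Romanian and checks VAD values against the quoted ranges. It assumes the handshake settings are passed as query parameters on the WebSocket URL. In the chunk message, only `input_audio_chunk` and `previous_text` come from the answer above; the `message_type` and `audio_base_64` field names are assumptions to verify against the API reference. Both helper functions are hypothetical.

```python
import base64
import json
from urllib.parse import urlencode

REALTIME_STT_URL = "wss://api.elevenlabs.io/v1/speech-to-text/realtime"

# Allowed ranges as quoted in the answer above.
VAD_RANGES = {
    "vad_silence_threshold_secs": (0.3, 3.0),
    "vad_threshold": (0.1, 0.9),
    "min_speech_duration_ms": (50, 2000),
    "min_silence_duration_ms": (50, 2000),
}


def build_realtime_stt_url(language_code="ro", commit_strategy="vad", **vad):
    """Build the connection URL, rejecting out-of-range VAD settings."""
    if commit_strategy not in ("manual", "vad"):
        raise ValueError(f"unknown commit_strategy: {commit_strategy!r}")
    params = {
        "model_id": "scribe_v2_realtime",
        "language_code": language_code,  # locked to avoid auto-detect flips
        "audio_format": "pcm_16000",
        "commit_strategy": commit_strategy,
    }
    for name, value in vad.items():
        if name not in VAD_RANGES:
            raise ValueError(f"unknown VAD setting: {name}")
        low, high = VAD_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        params[name] = value
    return f"{REALTIME_STT_URL}?{urlencode(params)}"


def build_audio_chunk(pcm_bytes, previous_text=None):
    """Serialize one audio chunk; field names other than previous_text are assumed."""
    message = {
        "message_type": "input_audio_chunk",
        "audio_base_64": base64.b64encode(pcm_bytes).decode("ascii"),
    }
    if previous_text:
        message["previous_text"] = previous_text  # short context from earlier turns
    return json.dumps(message)


# A shorter silence threshold finalizes sooner, at the cost of cutting off slow speakers.
print(build_realtime_stt_url(vad_silence_threshold_secs=0.5))
```

Validating locally keeps a typo such as `vad_threshold=9` from reaching the server; whether the server rejects or clamps such values is not stated in the thread.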
Choosing a faster LLM in Agents (affects response latency, not STT accuracy)
- Agents let you “bring any LLM” and are optimized for low-latency voice interactions. (elevenlabs.io)
- For conversational speed, pick models optimized for latency (docs call out Gemini Flash, Claude Haiku, GPT‑4o‑mini) and keep reasoning effort low. (elevenlabs.io)
- If you enabled RAG, note it adds ~500 ms; disable it if you’re chasing minimum end‑to‑end latency. (elevenlabs.io)
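To see how much the RAG overhead matters, a back-of-the-envelope latency budget helps. In this Python sketch only the ~150 ms Scribe v2 Realtime figure and the ~500 ms RAG figure are quoted in this thread; the LLM and TTS numbers are placeholders to replace with your own measurements.

```python
# Per-turn latency components in milliseconds.
BUDGET_MS = {
    "stt": 150,              # Scribe v2 Realtime figure quoted in this thread
    "llm_first_token": 400,  # placeholder, measure for your chosen model
    "tts_first_audio": 200,  # placeholder, measure for your chosen voice
}
RAG_OVERHEAD_MS = 500  # approximate figure quoted in this thread


def turn_latency_ms(use_rag):
    """Sum the components, adding the RAG overhead when it is enabled."""
    return sum(BUDGET_MS.values()) + (RAG_OVERHEAD_MS if use_rag else 0)


print(turn_latency_ms(use_rag=False))  # 750
print(turn_latency_ms(use_rag=True))   # 1250
```

With these placeholder numbers, RAG accounts for 40% of the turn time, which is why it is worth disabling when chasing minimum latency.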
Why Scribe v2 Realtime
- Designed for live use with ~150 ms latency and 90+ languages; now available in
I am using the Agents Platform; give me solutions according to this.
Here are Agents Platform–specific ways to improve Romanian transcription quality while keeping latency low:
- Enable Romanian and language switching
- In your agent’s Agent tab, add Romanian under Additional languages. Non‑English languages will use the Multilingual v2.5 TTS path; English remains on Flash v2 for speed. (elevenlabs.io)
- Add the Language detection system tool so the agent can automatically switch to Romanian when the caller speaks it or requests it. (elevenlabs.io)
- Use a Romanian voice for output
- Assign a Romanian voice for that language in Voice & language to ensure natural pronunciation and reduce confusion in mixed‑language prompts. The language guide recommends language‑specific voices. (elevenlabs.io)
- ElevenLabs also advises choosing voices trained in the target language/accent for best results. (help.elevenlabs.io)
- If you need multiple personas/languages in one call, enable Multi‑voice support and configure a Romanian voice label. (elevenlabs.io)
- Pick a fast LLM and cap its “thinking”
- In Models, choose a speed‑optimized model (examples given in docs include Google’s Gemini Flash series, OpenAI GPT‑4o Mini, and Anthropic Claude Haiku) for real‑time conversations. (elevenlabs.io)
- Reduce latency by setting Temperature low, Thinking budget to Disabled/Low, and Reasoning effort to None for conversational turns. (elevenlabs.io)
- Tune conversation flow for faster turn‑taking
- Set Turn eagerness to Eager and shorten silent‑turn timeouts; allow interruptions so users can speak over the agent naturally. (elevenlabs.io)
- STT capabilities and latency expectations
- ElevenLabs STT (Scribe v2) supports 90+ languages including Romanian; Scribe v2 Realtime targets ~150 ms latency for live speech. The Agents architecture uses ElevenLabs’ fine‑tuned ASR in the pipeline. (elevenlabs.io)
- Keep audio/telephony settings compatible with low‑latency streaming
- For phone integrations (e.g., Twilio), μ‑law at 8 kHz is the standard telephony format supported by ElevenLabs TTS streaming, which aligns with typical PSTN expectations. (elevenlabs.io)
- If you’re routing calls via your own Twilio, use the Register Call flow to connect Twilio to your agent over WebSocket. (elevenlabs.io)
- General latency tips
- Choose nearby regions/infrastructure when possible; the latency guide also notes voice choice and output format
Does using Scribe v2 work with the language detection tool?
Short answer: yes.
- Scribe v2 (batch) automatically detects and transcribes multiple languages in a single file. (help.elevenlabs.io)
- In Agents, the Language detection system tool relies on the realtime STT pipeline; Scribe v2 Realtime provides automatic language detection and can switch mid‑conversation. Enable the Language detection tool in your agent and configure supported languages to allow automatic switching. (elevenlabs.io)
- If you’re integrating via API, you can include the detected language code in events by setting include_language_detection=true on the realtime Speech‑to‑Text WebSocket. (elevenlabs.io)
Note: Romanian is supported by Scribe v2/v2 Realtime. (elevenlabs.io)
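For the API route, the two modes (locking the language versus asking for detected-language codes in events) differ only in connection settings. A small Python sketch, assuming both are query parameters on the realtime WebSocket URL; `realtime_stt_url` is a hypothetical helper, and whether the two settings may be combined is not stated in the thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

REALTIME_STT_URL = "wss://api.elevenlabs.io/v1/speech-to-text/realtime"


def realtime_stt_url(language_code=None, include_language_detection=False):
    """Build a connection URL for a locked language, detection reporting, or both."""
    params = {"model_id": "scribe_v2_realtime"}
    if language_code:
        params["language_code"] = language_code
    if include_language_detection:
        params["include_language_detection"] = "true"
    return f"{REALTIME_STT_URL}?{urlencode(params)}"


# Locked to Romanian: no auto-detection flips.
print(realtime_stt_url(language_code="ro"))

# Auto-detect and report the detected language code in events.
detect_url = realtime_stt_url(include_language_detection=True)
print(parse_qs(urlsplit(detect_url).query))
```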
Sources:
What is Speech to Text? – ElevenLabs
Language detection | ElevenLabs Documentation
Realtime | ElevenLabs Documentation
Transcription | ElevenLabs Documentation
But when I test it right now, it's not working.
Thanks for sticking with it. Here’s a quick, documentation‑based checklist to get your multi‑language agent (with Romanian STT) working right now:
- Rule out a platform issue
- Check the ElevenLabs status page. If everything shows “We’re fully operational,” the problem is likely in configuration. (status.elevenlabs.io)
- Verify the agent’s language setup
- Add Romanian under Agent → Additional Languages. Non‑English languages switch the agent to the multilingual stack and let you set a Romanian‑specific voice. (elevenlabs.io)
- If you expect automatic switching between languages, add the Language detection system tool to the agent and configure its description. It triggers a switch when the user speaks another language or requests a change. (elevenlabs.io)
- Important limitation: when using the web widget flow, “Language selection is fixed for the duration of the call – users cannot switch languages mid‑conversation.” If that’s your testing path, mid‑call switching won’t occur even if you enabled the tool. Start the call in Romanian (or the desired language) instead, or set it via conversation overrides. (elevenlabs.io)
- STT model choice and accuracy
- Use Scribe v2 Realtime for live Agents; it’s the recommended real‑time STT. Scribe v2 supports automatic multi‑language transcription and explicitly includes Romanian (ron). (help.elevenlabs.io)
- Improve Romanian term accuracy with Keyterm Prompting (up to 100 terms). (elevenlabs.io)
- Low‑latency LLM for the agent’s reasoning
- In Agent settings (or via Overrides), pick a faster model (e.g., “Flash/Mini/Haiku/…”) from the supported list, or override the LLM per conversation. The platform lets you switch providers and models, and you can enable LLM override in Security. (elevenlabs.io)
- If “it’s not working” during tests, check exact failure reasons
- Open Call History; failed calls show a red error with the cause. Common issues include allowlist blocking, missing required dynamic variables, daily call limit reached, or overrides not enabled. (help.elevenlabs.io)
If you share the specific error shown in Call History (or confirm whether you’re testing through the web widget vs. SDK/API), I can point you to the exact fix. If nothing shows and the issue persists, please open a support ticket and include the Call ID and a short recording so the team can investigate.
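Starting the call in Romanian via conversation overrides, as suggested in the checklist above, could look roughly like this. The payload shape (`conversation_initiation_client_data`, `conversation_config_override`, and the `agent` and `tts` keys) is an assumption to confirm against the overrides documentation, and `build_initiation_data` is a hypothetical helper. The overrides also have to be enabled in the agent's Security tab, or the session may fail.

```python
import json


def build_initiation_data(language="ro", first_message=None, voice_id=None):
    """Build the first message sent when a session starts (assumed shape)."""
    agent = {"language": language}
    if first_message:
        agent["first_message"] = first_message
    override = {"agent": agent}
    if voice_id:
        override["tts"] = {"voice_id": voice_id}
    return {
        "type": "conversation_initiation_client_data",
        "conversation_config_override": override,
    }


payload = build_initiation_data(
    first_message="Bună! Cu ce vă pot ajuta?",
    voice_id="YOUR_ROMANIAN_VOICE_ID",  # placeholder, not a real voice ID
)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```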
What should I do besides enabling the tool function?
Here’s what to check and change in the Agents Platform beyond enabling the Language Detection tool:
- Configure languages correctly
- In your agent’s Language settings, add Romanian (or choose “All”) so the agent is allowed to switch to it, set a Romanian-specific voice, and customize the first message for that language. (elevenlabs.io)
- Keep the Language Detection system tool enabled and described so the LLM knows when to call it (on user speech in another language or upon a request to switch). (elevenlabs.io)
- Use the right STT for real-time agents
- Use Scribe v2 Realtime for ultra‑low latency, automatic language detection, VAD, and manual commit; it’s built for agentic use cases and supports switching mid‑conversation. (elevenlabs.io)
- ElevenLabs recommends Scribe v2 Realtime for real-time agent use; Scribe v2 (batch) is for high‑accuracy offline jobs. (help.elevenlabs.io)
- Romanian is supported by Scribe v1/v2 models (90+ languages). (elevenlabs.io)
- Match audio format and pipeline
- Ensure your input stream uses a supported format (PCM or μ‑law). Agents support PCM (8–44.1 kHz) and μ‑law 8 kHz; Scribe v2 Realtime supports PCM and μ‑law as well. (elevenlabs.io)
- Tune conversation flow for lower latency
- Enable Interruptions (barge‑in) and set Turn Eagerness to Eager to make the agent respond faster; adjust Turn Timeout to a shorter value if appropriate. (elevenlabs.io)
- Choose and iterate on the LLM
- In Models, you can select among supported LLMs (e.g., Gemini Flash variants, GPT‑4o/mini) or bring your own; you can swap models to match performance/cost needs and test which gives the best latency for your use case. (elevenlabs.io)
- Verify it’s actually switching languages
- Use the Dashboard’s Language Breakdown to confirm detected call languages. (elevenlabs.io)
- Create a Tool Call Test to ensure the LLM is invoking the language_detection tool with valid parameters. (elevenlabs.io)
- If you have Enterprise, use Real‑time Monitoring to watch transcripts/events and confirm when detection/switching occurs. (elevenlabs.io)
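The audio-format point in the list above (PCM at 8–44.1 kHz, μ‑law at 8 kHz) is easy to check before opening a stream. This Python sketch treats the PCM range as continuous, which is an assumption: the platform may accept only specific sample rates within it. `is_supported_agent_audio` and the encoding names are hypothetical.

```python
def is_supported_agent_audio(encoding, sample_rate_hz):
    """Check an input format against the ranges quoted in the list above."""
    if encoding == "pcm":
        return 8_000 <= sample_rate_hz <= 44_100
    if encoding == "ulaw":
        return sample_rate_hz == 8_000  # standard telephony format
    return False


print(is_supported_agent_audio("pcm", 16_000))   # True
print(is_supported_agent_audio("ulaw", 16_000))  # False
print(is_supported_agent_audio("pcm", 48_000))   # False
```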
Notes on “Scribe v2 + Language Detection tool”
- Scribe v2 Realtime automatically detects spoken language; the Language Detection system tool tells the agent to switch its output language/voice accordingly. Use both: automatic ASR detection plus the tool that instructs the LLM to switch the agent’s response language. (elevenlabs.io)
If any of this still doesn’t work, double‑check that Romanian is listed in your agent’s Additional Languages and that the tool call passes a language code present in that list, then validate via Tool Call Testing or Monitoring. (elevenlabs.io)
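One way to run that double-check is against the GET /v1/convai/agents/:agent_id/widget endpoint mentioned earlier in this thread. This is a Python sketch under assumptions: `widget_config.language` and `supported_language_overrides` are the fields named in the thread, but their exact nesting and the list-of-codes shape are assumed here, as is the need for an `xi-api-key` header on this endpoint. All helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"


def widget_config_url(agent_id):
    return f"{API_BASE}/v1/convai/agents/{agent_id}/widget"


def fetch_widget_config(agent_id, api_key):
    """Fetch the widget config (network call, not exercised in the example below)."""
    request = urllib.request.Request(
        widget_config_url(agent_id), headers={"xi-api-key": api_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def exposed_languages(widget_response):
    """Collect the default language plus any supported overrides (assumed shape)."""
    config = widget_response.get("widget_config", {})
    languages = set(config.get("supported_language_overrides") or [])
    if config.get("language"):
        languages.add(config["language"])
    return languages


def can_switch_to(widget_response, language_code):
    return language_code in exposed_languages(widget_response)


# Illustrative response, not real API output.
sample = {"widget_config": {"language": "en", "supported_language_overrides": ["ru"]}}
print(can_switch_to(sample, "ro"))  # False: Romanian was never added
print(can_switch_to(sample, "ru"))  # True
```

If `can_switch_to(..., "ro")` comes back false for your agent, the language detection tool has nothing to switch to, regardless of how well the speech was recognized.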
I don't understand, am I the only one experiencing bad quality with Romanian? With Russian it's great and it has no problem picking up all the words and things, but with Romanian it's bad.