#I’m working with the ElevenLabs Agents platform for a real-time phone agent (Romanian & Russian),a
Thanks for flagging this. Here’s what the ElevenLabs docs recommend checking for real‑time phone agents when ASR appears unstable on faster agents/models:
- Use Scribe v2 Realtime for ASR. It’s the low‑latency STT used for agentic use cases and supports 90+ languages, including Romanian and Russian. If you’re integrating STT directly, connect to the Realtime STT WebSocket and use model_id=scribe_v2_realtime. (elevenlabs.io)
- Telephony audio format must be μ‑law 8000 Hz for both input and TTS output when using Twilio/telephony. Mismatched formats can hurt recognition. Set “μ‑law 8000 Hz” for TTS output and also select “μ‑law 8000 Hz” as the input format in the agent settings as shown in the Twilio integration guide. (elevenlabs.io)
- Fix or explicitly control the conversation language. For single‑language lines, set the agent’s language to Romanian or Russian (or override it at conversation start). For mixed‑language lines, enable the Language Detection system tool so the agent can switch based on detected speech. (elevenlabs.io)
- Tune turn‑taking for “faster” agents. Faster TTS/LLM settings can cause the agent to jump in early. In Conversation Flow:
  - Use “Patient” turn eagerness when collecting structured inputs (numbers, emails) to avoid cutting off speakers.
  - Adjust timeouts and consider whether user interruptions are enabled/disabled for your use case. (elevenlabs.io)
- If you integrate STT directly (useful for isolating the issue), use Scribe Realtime’s VAD/commit controls:
  - commit_strategy: manual or vad
  - vad_threshold (default 0.4), vad_silence_threshold_secs (default 1.5s), min_speech_duration_ms/min_silence_duration_ms
  - You can also pass language_code (“ro” or “ru”) and choose audio_format (PCM 8–48 kHz or μ‑law). (elevenlabs.io)
- Verify language/model support. Romanian and Russian are supported across the current TTS models used by Agents; Scribe covers 90+ languages. (help.elevenlabs.io)
- Capture a reproducible trace. Use Real‑time Monitoring (enterprise) to stream conversation events/transcripts while reproducing the issue, then share conversation IDs and timestamps with support. (elevenlabs.io)
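For the direct-STT isolation check mentioned in the list above, the session settings could be assembled as a query string. This is only a minimal sketch: the parameter names, values, and defaults are the ones quoted in the list, while `build_stt_query` is a hypothetical helper and the WebSocket endpoint is left out because it isn't given here.

```python
from urllib.parse import urlencode

# Only the commit strategies named in the list above are accepted here.
ALLOWED_COMMIT_STRATEGIES = {"manual", "vad"}


def build_stt_query(language_code, audio_format="ulaw_8000",
                    commit_strategy="vad", vad_threshold=0.4,
                    vad_silence_threshold_secs=1.5):
    """Build a query string from the settings listed above (hypothetical helper)."""
    if commit_strategy not in ALLOWED_COMMIT_STRATEGIES:
        raise ValueError(f"unsupported commit_strategy: {commit_strategy!r}")

    params = {
        "model_id": "scribe_v2_realtime",
        "language_code": language_code,   # "ro" or "ru"
        "audio_format": audio_format,     # should match the telephony audio
        "commit_strategy": commit_strategy,
    }
    # VAD tuning only applies when segments are committed by VAD.
    if commit_strategy == "vad":
        params["vad_threshold"] = vad_threshold
        params["vad_silence_threshold_secs"] = vad_silence_threshold_secs
    return urlencode(params)


if __name__ == "__main__":
    print(build_stt_query("ro"))
    print(build_stt_query("ru", commit_strategy="manual"))
```

Keeping the language and audio format explicit in one place makes it easier to change a single setting per test call and see which one affects recognition.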
If you still see instability after applying the above, please open a support ticket and include:
- the agent ID, conversation IDs, time ranges, sample call recordings, your telephony provider, and your current audio format/language settings. (help.elevenlabs.io)
If you can share a short call example and which “faster” model/setting you’re using (e.g., Flash v2/v2.5 TTS, turn‑eagerness = Eager), I can map that to the exact knobs in the
I am using the Agents platform, connected to my SIP trunk, and I experience bad ASR using Scribe v2 for a Romanian/Russian agent when I am using faster models.
Thanks for the details. Here’s what you can verify/tune on ElevenLabs Agents with Scribe v2 Realtime for Romanian/Russian over SIP:
- Use Scribe v2 Realtime for live calls and ensure the session is configured correctly:
  - language_code: explicitly set to “ro”/“ron” (Romanian) or “ru”/“rus” (Russian) instead of relying on auto-detect, which can stabilize recognition for short or ambiguous utterances. Leave empty only if you want auto-detection. (elevenlabs.io)
  - audio_format: match your SIP media. If your trunk is G.711 μ-law at 8 kHz, set ulaw_8000; if you can send PCM, pcm_16000 is recommended for best balance of accuracy/bandwidth. Only mono is supported. (elevenlabs.io)
  - Commit strategy: if you see unstable partials, try VAD commit_strategy with tuned thresholds (e.g., vad_silence_threshold_secs, vad_threshold) so segments are committed on detected silences rather than manually. (elevenlabs.io)
- Audio quality and chunking best practices:
  - Prefer 16 kHz input where possible; ensure clean, non‑clipped audio; stream 0.1–1.0 s chunks for smooth real‑time behavior. (elevenlabs.io)
- SIP trunking is supported with Agents; ensure your carrier is sending exactly what you configure above (e.g., μ-law 8 kHz) to avoid server‑side transcoding mismatches. (help.elevenlabs.io)
- Language coverage:
  - Agents support the languages covered by Flash v2.5/Turbo v2.5 (includes Romanian and Russian) for TTS. Scribe v2 Realtime supports multilingual ASR and can auto‑detect/switch languages mid‑conversation if needed. (help.elevenlabs.io)
- If you need to compare behaviors, note ElevenLabs recommends Scribe v1 for highest‑accuracy batch transcription (non‑real‑time) and Scribe v2 Realtime for live use; this can help isolate whether the issue is real‑time segmentation vs. core recognition. (help.elevenlabs.io)
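To make the 0.1–1.0 s chunking guidance in the list above concrete, the arithmetic can be sketched as follows. The byte rates are assumptions: ulaw_8000 is taken as 8-bit mono G.711 (1 byte per sample) and pcm_16000 as 16-bit mono PCM (2 bytes per sample). `chunk_size_bytes` and `split_into_chunks` are hypothetical helpers, not part of any ElevenLabs SDK.

```python
# Bytes of audio per second for each format (mono only).
BYTES_PER_SECOND = {
    "ulaw_8000": 8000 * 1,    # assumption: 8-bit G.711 mu-law at 8 kHz
    "pcm_16000": 16000 * 2,   # assumption: 16-bit PCM at 16 kHz
}


def chunk_size_bytes(audio_format, chunk_ms):
    """Return the number of bytes in one chunk of chunk_ms milliseconds."""
    if not 100 <= chunk_ms <= 1000:
        raise ValueError("chunk_ms should stay within 100-1000 ms")
    return BYTES_PER_SECOND[audio_format] * chunk_ms // 1000


def split_into_chunks(audio, audio_format, chunk_ms=200):
    """Split raw audio bytes into fixed-duration chunks (last one may be shorter)."""
    size = chunk_size_bytes(audio_format, chunk_ms)
    return [audio[i:i + size] for i in range(0, len(audio), size)]


if __name__ == "__main__":
    one_second_ulaw = bytes(8000)  # one second of placeholder mu-law bytes
    chunks = split_into_chunks(one_second_ulaw, "ulaw_8000", chunk_ms=200)
    print(len(chunks), len(chunks[0]))
```

Under these assumptions, a 200 ms chunk is 1600 bytes for ulaw_8000 and 6400 bytes for pcm_16000, which is a quick way to confirm that the byte rate on the wire matches the configured format.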
If the issue persists after aligning language_code, audio_format, and VAD settings, please share a couple of example call IDs and short audio excerpts with Support so we can review the session config and logs. (help.elevenlabs.io)
References:
- Realtime Speech to Text (Scribe v2 Realtime): query params, audio formats, VAD, best practices. (elevenlabs.io)
- Agents SIP trunking support. (help.elevenlabs.io)
- Agents language support;
Keep in mind that I am using the Agents platform. Tell me how I should test to find out what the main problem is, and don't give me API solutions.