# How can I handle accents and non-native speakers with my voice agents? This causes unexpected quirks
AI-generated Answer for flenshhh0524
Short answer:
- Use Scribe v2 Realtime for the live transcript. When the user finishes a turn, you can optionally re-check that segment with Scribe v2 batch transcription plus keyterm prompting. This biases recognition toward domain terms (e.g., “tire,” “lug nut,” “jack”) and helps resolve homophones such as “change a tire” vs. “change attire.” Keyterm prompting accepts up to 100 terms and is available in Scribe v2 batch transcription.
- Pass conversation context to the realtime recognizer via previous_text so the ASR can use what has already been said to resolve ambiguous words.
- If you know the caller’s language, set language_code in the realtime STT handshake to reduce language misdetection for speakers with strong accents. For non‑native speakers, who may pause longer while searching for words, also tune VAD by adjusting vad_silence_threshold_secs, vad_threshold, and min_*_duration to avoid premature commits or cut‑offs.
- Detect low‑confidence transcripts and ask the user to confirm. The realtime API returns a word‑level logprob in committed_transcript_with_timestamps, which you can use to trigger a polite clarification. The Agents quickstart also recommends telling the agent, in its system prompt, to ask for clarification when a request is unclear.
- Let the agent adapt to non‑native speakers and code‑switching: enable the Language detection system tool and configure multiple languages for your agent so it can automatically switch output language based on the user’s speech.
- Use Scribe v2 Realtime as the speech-to-text model inside ElevenLabs Agents. It is described as accurate “across diverse accents” and offers automatic language detection and text conditioning.
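The low-confidence check can be sketched as follows. This is a minimal illustration, not the official client: it assumes the committed transcript message carries a "words" list whose items have "text" and "logprob" keys (verify against the actual committed_transcript_with_timestamps schema), and the 0.5 probability threshold and the wording of the question are arbitrary, hypothetical choices.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Any


@dataclass
class Word:
    text: str
    logprob: float  # assumed: natural-log probability of this word


def parse_words(payload: dict[str, Any]) -> list[Word]:
    """Extract (text, logprob) pairs from a committed-transcript message.

    ASSUMPTION: payload["words"] holds dicts with "text" and "logprob" keys.
    Whitespace-only entries and entries without a logprob are skipped.
    """
    return [
        Word(w["text"], float(w["logprob"]))
        for w in payload.get("words", [])
        if "logprob" in w and w.get("text", "").strip()
    ]


def low_confidence_words(words: list[Word], min_prob: float = 0.5) -> list[str]:
    """Return words whose probability exp(logprob) is below min_prob (0 < min_prob <= 1)."""
    threshold = math.log(min_prob)
    return [w.text for w in words if w.logprob < threshold]


def clarification_prompt(words: list[Word], min_prob: float = 0.5) -> str | None:
    """Build a polite confirmation question, or None if every word looks reliable."""
    shaky = low_confidence_words(words, min_prob)
    if not shaky:
        return None
    quoted = ", ".join(f'"{t}"' for t in shaky)
    return f"Sorry, I want to make sure I got that right. Did you say {quoted}?"
```

For example, a message where "attire" has logprob -1.6 (probability about 0.2) yields the question `Did you say "attire"?`, which gives the caller a chance to say “no, a tire.” Feeding the returned string to the agent as a hint, rather than speaking it verbatim, keeps the tone consistent with the system prompt.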
Why this helps:
- Keyterm prompting (batch Scribe v2) biases toward domain‑relevant words without forcing them when context doesn’t fit—ideal for homophones in task‑specific agents.
- Realtime context (previous_text), language hints, and tuned VAD improve turn integrity and reduce accent‑related errors during live calls.
- Programmatic clarification plus agent prompt guidance keeps the conversation correct when the ASR is uncertain.
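The answer does not say how much context previous_text should carry. As a minimal sketch, assuming plain text is accepted and using an arbitrary 500-character budget (a hypothetical value, not a documented limit), you could send only the most recent words of the conversation, trimmed so the context never starts mid-word:

```python
from __future__ import annotations


def build_previous_text(turns: list[str], max_chars: int = 500) -> str:
    """Join recent turns (oldest first) and keep only the newest max_chars.

    ASSUMPTION: max_chars is an illustrative budget, not a documented limit
    of the previous_text field.
    """
    if max_chars <= 0:
        return ""
    context = " ".join(t.strip() for t in turns if t.strip())
    if len(context) <= max_chars:
        return context
    start = len(context) - max_chars
    tail = context[start:]
    if context[start - 1] == " ":
        return tail  # the cut already falls on a word boundary
    # Drop the partial word at the front so the context starts cleanly.
    cut = tail.find(" ")
    return tail[cut + 1:] if cut != -1 else tail
```

Keeping the tail rather than the head favors the words closest to the current utterance, which are usually the most useful for disambiguating it.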
Notes:
- Keyterm prompting is batch-only (Scribe v2), so use it as a quick post‑turn verification step alongside your realtime stream if needed.
- Scribe v2 and Scribe v2 Realtime support 90+ languages; the Realtime model is the one designed for live agent scenarios.
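The post-turn verification step can be sketched like this. It is a hypothetical flow, not an ElevenLabs API: batch_transcribe is a placeholder callable you would implement yourself to send the turn's audio to Scribe v2 batch transcription with keyterm prompting. The sketch prepares the keyterm list (trimmed, de-duplicated, capped at 100) and keeps the batch result only when it recovers more keyterms than the realtime text did.

```python
from __future__ import annotations

import re
from typing import Callable

# Keyterm prompting accepts up to 100 terms (the limit quoted in this answer).
MAX_KEYTERMS = 100


def prepare_keyterms(terms: list[str]) -> list[str]:
    """Strip blanks, de-duplicate case-insensitively, and cap at MAX_KEYTERMS."""
    seen: set[str] = set()
    out: list[str] = []
    for term in terms:
        t = term.strip()
        if t and t.lower() not in seen:
            seen.add(t.lower())
            out.append(t)
    return out[:MAX_KEYTERMS]


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word match, so "tire" does not match "attire"."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def verify_turn(
    realtime_text: str,
    audio: bytes,
    keyterms: list[str],
    batch_transcribe: Callable[[bytes, list[str]], str],
) -> str:
    """Re-check one finished turn; keep the batch text only if it finds more keyterms.

    batch_transcribe is a HYPOTHETICAL callable you supply: it should send the
    turn's audio to Scribe v2 batch transcription with keyterm prompting and
    return the transcript text.
    """
    terms = prepare_keyterms(keyterms)
    batch_text = batch_transcribe(audio, terms)
    realtime_hits = sum(contains_term(realtime_text, t) for t in terms)
    batch_hits = sum(contains_term(batch_text, t) for t in terms)
    # On a tie, keep the realtime text the agent has already acted on.
    return batch_text if batch_hits > realtime_hits else realtime_text
```

With a stub such as `lambda audio, terms: "I need to change a tire"`, calling `verify_turn("I need to change attire", b"", ["tire", "lug nut", "jack"], stub)` returns the batch text, because only it contains the keyterm “tire” as a whole word. Counting keyterm hits is a deliberately simple heuristic; a production agent might also compare word-level confidence.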
Is this only possible with the Realtime API, or can these settings also be configured on the Agents platform?