#how can i stop the voice from reading out the tags in v3? this happens frequently
Short answer: make sure you’re actually using Eleven v3 and only use its bracketed audio tags. SSML or narrative cues will be spoken.
What to do
- Verify the model: set model_id to eleven_v3 on Create speech/Stream speech. Only v3 interprets audio tags; other models will read them out. (help.elevenlabs.io)
- Use v3 audio tags exactly as documented: lowercase text in square brackets inline with your script, e.g. [whispers], [laughs], [sighs]. (elevenlabs.io)
- Don’t use SSML with v3. SSML break/phoneme tags aren’t supported in v3 and will be treated as text. For pauses in v3, use [pause], [short pause], or [long pause]. (help.elevenlabs.io)
- Remove descriptive “direction” text from the script (e.g., “she whispers,” “angrily”) if you don’t want it spoken. The model will read such guidance literally; use audio tags instead or strip these phrases before synthesis. (elevenlabs.io)
- If tags still get spoken intermittently, try a less “Robust” stability setting (Creative/Natural) so v3 better follows directional prompts. Results can be voice‑dependent. (elevenlabs.io)
Example input (v3): “[whispers] We’ll start now. [short pause] Please follow me.” (elevenlabs.io)
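A minimal pre-flight sketch of the first and third points above, in Python. It only builds the request body and does not call the API. The helper name build_v3_payload and the list of SSML element names it checks for are assumptions made for illustration; the text and model_id fields are the ones referred to above.

```python
import re

# Assumed list of common SSML element names; extend it if your scripts use others.
SSML_TAG = re.compile(
    r"</?\s*(speak|break|phoneme|prosody|say-as|emphasis)\b[^>]*>",
    re.IGNORECASE,
)


def build_v3_payload(text):
    """Return a request body for Eleven v3, refusing text that still contains SSML.

    Hypothetical helper: it pins model_id to eleven_v3 so the bracketed tags
    are interpreted instead of being read out by another model.
    """
    found = SSML_TAG.findall(text)
    if found:
        # v3 would treat these as plain text, so fail early instead.
        names = ", ".join(sorted({name.lower() for name in found}))
        raise ValueError(f"SSML tags are not supported in v3: {names}")
    return {"text": text, "model_id": "eleven_v3"}


payload = build_v3_payload(
    "[whispers] We'll start now. [short pause] Please follow me."
)
```

Send the resulting dictionary as the JSON body of your Create speech or Stream speech request. Failing early on SSML is a design choice here; you could instead map a break tag to [pause] before sending.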
If you’re already on v3 with bracketed, lowercase tags and still hear them spoken, share a minimal text snippet and model settings and I’ll help pinpoint the cause.
Sources:
What is Eleven v3? – ElevenLabs
Eleven v3: Most Expressive AI TTS Model Launched
Do pauses and SSML phoneme tags work with the API? – ElevenLabs
Controls | ElevenLabs Documentation
Best practices | ElevenLabs Documentation
[happy] Great, let's jump right in then! [pause] [question_1] So, first things first: How would you rate today's curly interview experience so far on a scale of one to five, with five being excellent?
agent read the "question_1" tag
Root cause
- Eleven v3 tries to interpret anything in square brackets as an audio tag, so only expressive/audio tags should appear there. Bracketed text that is not a performance direction, such as [question_1], can end up being spoken. The docs explicitly say to never place original text inside brackets and to reserve square brackets for audio tags. (help.elevenlabs.io)
How to stop it
- Remove non-audio metadata like [question_1] from the text you send to TTS. Keep square brackets only for supported audio tags (e.g., emotions, delivery cues, non‑verbal reactions). (help.elevenlabs.io)
- For timing, use v3’s pause tags such as [pause], [short pause], [long pause] (SSML <break> is not supported by v3). (help.elevenlabs.io)
- If you’re scripting conversations, consider v3’s dialogue endpoints to separate speaker structure from spoken lines (avoid putting structural labels in brackets inside the utterance). (help.elevenlabs.io)
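One way to apply the first step is to strip structural labels before the text reaches TTS. The sketch below is a rough illustration, not an official tag list: ALLOWED_AUDIO_TAGS is a hypothetical allowlist containing only tags mentioned in this thread, and strip_non_audio_tags is a made-up helper name.

```python
import re

# Hypothetical allowlist: only tags mentioned in this thread.
# Extend it with whichever audio tags you actually rely on.
ALLOWED_AUDIO_TAGS = {
    "whispers", "laughs", "sighs", "happy",
    "pause", "short pause", "long pause",
}

# Matches one bracketed span with no nested brackets, e.g. [question_1].
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def strip_non_audio_tags(text, allowed=ALLOWED_AUDIO_TAGS):
    """Remove bracketed spans whose content is not in the allowlist."""
    def keep_or_drop(match):
        tag = match.group(1).strip().lower()
        return match.group(0) if tag in allowed else ""

    cleaned = BRACKETED.sub(keep_or_drop, text)
    # Collapse the extra whitespace left behind by removed tags.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


script = "[happy] Great, let's jump right in then! [pause] [question_1] So, first things first."
print(strip_non_audio_tags(script))
```

With the allowlist above, [happy] and [pause] are kept and [question_1] is dropped. If you need the label for your own bookkeeping, keep it in a separate field next to the utterance instead of inside the spoken text.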
Reference guidance on audio tags and prompting for v3
- What is Eleven v3? (audio tags overview) (help.elevenlabs.io)
- Do pauses and SSML phoneme tags work with the API? (v3 pause tags; no SSML breaks) (help.elevenlabs.io)
- Prompting best practices (do not place original text inside brackets) (elevenlabs.io)
- Audio Tags blog (how v3 interprets words in square brackets as performance directions) (elevenlabs.io)
Sources:
What is Eleven v3? – ElevenLabs
Do pauses and SSML phoneme tags work with the API? – ElevenLabs
Best practices | ElevenLabs Documentation
ElevenLabs Audio Tags: More control over AI Voices
hm but i have added [question_1] as an audio tag