#v3 emotion tagging guidance
AI-generated Answer for tybost
Here’s concise, officially documented guidance for emotion tagging with the Eleven v3 model:
-
Syntax and scope
- Use inline “audio tags” in square brackets inside your text, e.g., [excited], [whispers], [sighs]. Tags can direct emotion, delivery, and human reactions. (help.elevenlabs.io)
- In Agents’ Expressive Mode (v3 Conversational), each tag affects roughly the next 4–5 words before delivery returns to normal. (elevenlabs.io)
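If you want to see which words a tag is likely to reach, you can take the ~4–5 word guideline literally and mark the words that follow each tag. The sketch below is a rough planning aid under that assumption: the window of 5, the helper name `tag_scopes`, and the rule that back-to-back tags share the same words are illustrative choices, not documented model behavior. It only models tags placed before their text.

```python
import re

# Matches an inline audio tag such as [excited] or [clears throat].
TAG_RE = re.compile(r"\[([^\[\]]+)\]")

def tag_scopes(script: str, window: int = 5) -> list[tuple[str, list[str]]]:
    """Return (tag, words) pairs: the first `window` words after each tag.

    Hypothetical helper; the window approximates the ~4-5 word guideline
    and is a heuristic, not a description of how the model works.
    """
    scopes: list[tuple[str, list[str]]] = []
    pending: list[str] = []  # tags stacked back to back, e.g. "[nervous] [whispers]"
    pos = 0
    for m in TAG_RE.finditer(script):
        segment = script[pos:m.start()]
        # Real text between tags closes the current stack of tags.
        if segment.strip() and pending:
            words = segment.split()[:window]
            scopes.extend((tag, words) for tag in pending)
            pending = []
        pending.append(m.group(1))
        pos = m.end()
    # Tags before the final stretch of text apply to its first words.
    words = script[pos:].split()[:window]
    scopes.extend((tag, words) for tag in pending)
    return scopes

for tag, words in tag_scopes("[excited] We actually won the whole thing this year! [sighs] Finally."):
    print(tag, "->", " ".join(words))
```

A sentence that runs well past five words after its tag may be worth splitting or re-tagging if the emotion needs to carry through to the end.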
-
Where to place tags
- Place tags immediately before or after the segment they should affect; you can combine multiple tags to shape moment‑to‑moment delivery. (elevenlabs.io)
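Assembling a script from (tags, text) pairs keeps each tag directly in front of the words it should affect and makes stacked tags explicit. This is a minimal sketch; `compose_script` is a hypothetical helper, and its output is simply the string you would send as the text payload.

```python
def compose_script(parts: list[tuple[list[str], str]]) -> str:
    """Join (tags, text) pairs into one string, placing each tag right before its text."""
    chunks = []
    for tags, text in parts:
        for t in tags:
            # Brackets inside a tag name would break the inline [tag] syntax.
            if "[" in t or "]" in t:
                raise ValueError(f"tag {t!r} must not contain brackets")
        prefix = " ".join(f"[{t}]" for t in tags)
        chunks.append(f"{prefix} {text}".strip())
    return " ".join(chunks)

script = compose_script([
    (["whispers"], "I never knew it could be this way..."),
    (["laughs"], "Honestly?"),
    (["curious", "slow"], "Tell me more."),
])
print(script)
# [whispers] I never knew it could be this way... [laughs] Honestly? [curious] [slow] Tell me more.
```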
-
What to tag (non‑exhaustive examples from docs)
- Emotions: [happy], [sad], [excited], [angry], [curious], [mischievously], [sorrowful]. (elevenlabs.io)
- Delivery direction: [whispers], [shouts], [slow]. (help.elevenlabs.io)
- Human reactions: [laughs], [clears throat], [sighs], [gulps], [gasps]. (help.elevenlabs.io)
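To check a script against the three categories above, a lookup table can classify each tag it uses. The table holds only the example tags listed in this answer, so anything else is reported as `unlisted`, meaning untested rather than unsupported (the list is non-exhaustive). `classify_tags` is a hypothetical helper.

```python
import re

TAG_RE = re.compile(r"\[([^\[\]]+)\]")

# Example tags from the answer above, grouped by category (non-exhaustive).
DOC_EXAMPLES = {
    "emotion": {"happy", "sad", "excited", "angry", "curious", "mischievously", "sorrowful"},
    "delivery": {"whispers", "shouts", "slow"},
    "reaction": {"laughs", "clears throat", "sighs", "gulps", "gasps"},
}

def classify_tags(script: str) -> dict[str, str]:
    """Map every tag found in `script` to its category, or 'unlisted'."""
    result = {}
    for tag in TAG_RE.findall(script):
        key = tag.strip().lower()
        category = next((c for c, tags in DOC_EXAMPLES.items() if key in tags), "unlisted")
        result[key] = category
    return result

print(classify_tags("[curious] Who's there? [gasps] [nervous laugh] Oh, it's you."))
# {'curious': 'emotion', 'gasps': 'reaction', 'nervous laugh': 'unlisted'}
```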
-
Model behavior and best practices
- Effectiveness is voice‑ and context‑dependent; choose a voice aligned with the desired delivery. A whispery voice won’t shout convincingly just because you add a [shouts] tag. (elevenlabs.io)
- For strongest responsiveness to tags, use the Stability setting “Creative” or “Natural”; “Robust” reduces responsiveness. (elevenlabs.io)
- Eleven v3 does not support SSML break tags; control pauses/pacing with audio tags, punctuation (e.g., ellipses), and text structure. (elevenlabs.io)
- If you write emotional guidance into the script as plain prose (e.g., “she said excitedly”), the model may read it aloud; trim it from the audio in post‑production if undesired. (elevenlabs.io)
- During the current v3 research preview, Professional Voice Clones (PVCs) aren’t fully optimized; prefer IVC or designed voices when relying on v3’s expressive features. (elevenlabs.io)
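In API requests, the Stability presets correspond to a numeric `stability` value in `voice_settings`. The sketch assumes Creative = 0.0, Natural = 0.5, Robust = 1.0; verify that mapping, and which values eleven_v3 accepts, against the current API reference. `v3_voice_settings` is a hypothetical helper.

```python
# Assumed mapping from UI Stability presets to the numeric API value.
# Verify against the current ElevenLabs API reference before relying on it.
STABILITY_PRESETS = {"creative": 0.0, "natural": 0.5, "robust": 1.0}

def v3_voice_settings(preset: str = "natural") -> dict[str, float]:
    """Build a voice_settings object for eleven_v3 from a named preset."""
    key = preset.strip().lower()
    if key not in STABILITY_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(STABILITY_PRESETS)}")
    return {"stability": STABILITY_PRESETS[key]}

# Lower-stability presets respond most strongly to audio tags.
print(v3_voice_settings("Creative"))  # {'stability': 0.0}
```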
-
API usage
- Generate with v3 by calling Create speech or Stream speech and setting model_id to eleven_v3; use inline tags in the text payload. (help.elevenlabs.io)
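A minimal request sketch using only the Python standard library. It assumes the REST endpoint `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` (with `/stream` appended for Stream speech) and the `xi-api-key` header; `VOICE_ID` and `YOUR_API_KEY` are placeholders. The request is built but not sent, so nothing here needs network access.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def build_create_speech_request(voice_id: str, text: str, api_key: str,
                                stream: bool = False) -> urllib.request.Request:
    """Build a Create speech (or Stream speech) request for eleven_v3; not sent here."""
    path = f"/text-to-speech/{voice_id}" + ("/stream" if stream else "")
    body = {"text": text, "model_id": "eleven_v3"}  # audio tags stay inline in `text`
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_create_speech_request("VOICE_ID", "[whispers] I never knew it could be this way...", "YOUR_API_KEY")
# To generate audio, send it and save the returned bytes:
# with urllib.request.urlopen(req) as resp:
#     open("out.mp3", "wb").write(resp.read())
```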
-
Reference examples from docs/blog
- “[whispers] I never knew it could be this way…”; “[laughs] … [laughs harder] … [sighs]”; accents via “[strong French accent] …”. These illustrate how emotion, delivery, and reactions are combined inline. (elevenlabs.io)
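Because tags are plain bracketed text, a quick lint pass can catch malformed ones (an unclosed `[`, a stray `]`, an empty tag) before you spend credits on generation. `lint_tags` is a hypothetical helper; it checks bracket syntax only, not whether the model honors a given tag.

```python
def lint_tags(script: str) -> list[str]:
    """Return bracket-syntax problems found in `script` (empty list if clean)."""
    problems: list[str] = []
    open_at = None  # index of the currently open '[', if any
    for i, ch in enumerate(script):
        if ch == "[":
            if open_at is not None:
                problems.append(f"nested or unclosed '[' at {open_at}")
            open_at = i
        elif ch == "]":
            if open_at is None:
                problems.append(f"unmatched ']' at {i}")
            elif not script[open_at + 1:i].strip():
                problems.append(f"empty tag at {open_at}")
            open_at = None
    if open_at is not None:
        problems.append(f"unclosed '[' at {open_at}")
    return problems

print(lint_tags("[laughs] ... [laughs harder] ... [sighs]"))  # []
print(lint_tags("[whispers I never knew"))  # ["unclosed '[' at 0"]
```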
Core emotions
[happy]
[sad]
[angry]
[excited]
[nervous]
[confident]
[calm]
[serious]
[playful]
[worried]
[frustrated]
[hopeful]
[relieved]
[disappointed]
[embarrassed]
[lonely]
[grateful]
Complex emotional shades
[nostalgic]
[wistful]
[bittersweet]
[melancholic]
[anxious]
[tense]
[uneasy]
[determined]
[triumphant]
[resigned]
[reflective]
[skeptical]
[suspicious]
[jealous]
[ashamed]
[yearning]
Intensity variants
[slightly nervous]
[barely holding back anger]
[quietly emotional]
[overjoyed]
[deeply sorrowful]
[barely excited]
[visibly shaken]
REACTIONS AND HUMAN SOUNDS
Breath and body reactions
[gasp]
[sigh]
[deep breath]
[sharp inhale]
[exhale slowly]
[breath trembles]
[nervous breath]
[breath catches]
Speech reactions
[chuckles]
[laughs softly]
[laughs loudly]
[nervous laugh]
[snorts]
[giggles]
[scoffs]
[groans]
[mutters]
Shock and surprise
[gasps in disbelief]
[stunned silence]
[taken aback]
[whispers in shock]
Eating and mouth sounds
[swallows]
[gulps]
[clears throat]
[lips smack]
[clicks tongue]
VOLUME AND ENERGY
Low energy
[whispering]
[quietly]
[softly]
[hushed tone]
[murmuring]
Neutral
[natural tone]
[casual tone]
[conversational tone]
High energy
[loudly]
[shouting]
[calling out]
[energetically]
[bursting with excitement]
Intensity shifts
[voice rising]
[voice lowering]
[voice cracks]
[voice trembling]
PACING AND RHYTHM
Speed control
[rushed]
[rapid-fire]
[slow and deliberate]
[slows down]
[picks up pace]
Natural conversation rhythm
[pauses]
[brief pause]
[long pause]
[hesitates]
[stammers]
Speech patterns
[drawn out]
[trails off]
[cuts sentence short]
[repeats for emphasis]
Breathing rhythm
[speaks between breaths]
[breathing heavily]
EMPHASIS AND DELIVERY
Word emphasis
[emphasized]
[stress on next word]
[strong emphasis]
[soft emphasis]
Subtle delivery styles
[understated]
[deadpan]
[dry tone]
Persuasive speaking
[convincingly]
[earnestly]
[passionately]
Authority and command
[firmly]
[commanding tone]
[assertively]
CHARACTER PERFORMANCE
Age cues
[childlike tone]
[teenager tone]
[young adult voice]
[middle-aged tone]
[elderly voice]
Character archetypes
[heroic voice]
[wise mentor voice]
[villain voice]
[evil scientist voice]
[news reporter voice]
[storyteller voice]
[radio host voice]
[teacher voice]
Fantasy characters
[pirate voice]
[knight voice]
[royal voice]
[dragon narrator]
Sci-fi characters
[robotic tone]
[sci-fi AI voice]
[hologram voice]
[cybernetic voice]
GENRE PERFORMANCE
Narrative styles
[fantasy narrator]
[epic narrator]
[documentary narrator]
[audiobook narrator]
Film styles
[classic film noir]
[thriller narrator]
[horror whisper]
[comedic narration]
Advertising styles
[commercial voice]
[enthusiastic ad voice]
[luxury brand voice]
[corporate presentation tone]
CONVERSATIONAL REALISM
Thinking sounds
[thinking]
[muttering to self]
[searching for words]
Awkward pauses
[awkward silence]
[hesitates nervously]
Interruptions
[interrupting]
[cuts in]
Dialogue dynamics
[leans closer]
[steps back slightly]
EMOTIONAL SHIFTS
Sudden changes
[suddenly serious]
[suddenly excited]
[tone darkens]
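Tags from opposite ends of the VOLUME AND ENERGY groups pull delivery in opposite directions, so stacking them on the same words hands the model contradictory instructions. The sketch flags such stacks using the low-energy tags and the louder tags ([loudly], [shouting], [calling out], [energetically], [bursting with excitement]) from the list above. Treating these two groups as mutually exclusive is an editorial assumption, and `conflicting_stacks` is a hypothetical helper; tags separated by text are treated as a deliberate shift and not flagged.

```python
import re

TAG_RE = re.compile(r"\[([^\[\]]+)\]")
STACK_RE = re.compile(r"(?:\[[^\[\]]+\]\s*)+")  # a run of back-to-back tags

LOW_ENERGY = {"whispering", "quietly", "softly", "hushed tone", "murmuring"}
HIGH_ENERGY = {"loudly", "shouting", "calling out", "energetically", "bursting with excitement"}

def conflicting_stacks(script: str) -> list[tuple[str, str]]:
    """Return (low, high) tag pairs stacked back to back on the same words."""
    conflicts = []
    for run in STACK_RE.findall(script):
        tags = {t.strip().lower() for t in TAG_RE.findall(run)}
        for low in sorted(tags & LOW_ENERGY):
            for high in sorted(tags & HIGH_ENERGY):
                conflicts.append((low, high))
    return conflicts

print(conflicting_stacks("[whispering] [shouting] Get down!"))  # [('whispering', 'shouting')]
```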
AI-generated Answer for tybost
Here’s what Eleven v3 officially supports for “emotion tagging” (audio tags) and how to use them:
-
Syntax and placement
- Write tags in square brackets inside your script, typically immediately before the words you want to affect; placing them right after a segment also works. Effects are local rather than global. (elevenlabs.io)
- In Agents’ Expressive Mode (v3 Conversational), each tag influences roughly the next 4–5 words, then delivery returns to baseline. Treat this as a practical guideline when crafting prompts. (elevenlabs.io)
-
Supported categories and examples (non‑exhaustive)
- Emotions/delivery: [excited], [sarcastic], [curious], [crying], [whispers], [shouts]; reactions like [laughs], [sighs], [exhales], [snorts], [wheezing]. (elevenlabs.io)
- Sound effects and mouth sounds: [applause], [clapping], [explosion], [swallows], [gulps]. (elevenlabs.io)
- Special/experimental: [strong X accent], [sings], [woo], etc. Consistency varies by voice—test before production. (elevenlabs.io)
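Since consistency varies by voice, one way to test before production is to render every tag variant with every candidate voice and compare the takes side by side. The sketch only builds the job list; `audition_jobs` is a hypothetical helper and the voice IDs are placeholders. Each job would then be sent as a Create speech request with model_id eleven_v3.

```python
from itertools import product

def audition_jobs(voice_ids: list[str], line: str, tags: list[str]) -> list[dict[str, str]]:
    """Pair every candidate voice with every tag variant of the same line."""
    return [
        {"voice_id": voice, "text": f"[{tag}] {line}", "model_id": "eleven_v3"}
        for voice, tag in product(voice_ids, tags)
    ]

jobs = audition_jobs(["VOICE_A", "VOICE_B"], "Incoming! Get to cover!",
                     ["shouts", "excited", "strong French accent"])
print(len(jobs))        # 6
print(jobs[0]["text"])  # [shouts] Incoming! Get to cover!
```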
-
High‑energy guidance (for your “loudly / shouting / calling out / energetically / bursting with excitement” set)
- Prefer canonical tags shown in docs: [shouts] for volume/urgency and [excited] for energetic tone. Examples: “[shouts] Incoming! Get to cover!” or “That was unbelievable! [excited] Let’s go!” (help.elevenlabs.io)
- Synonyms like [calling out], [energetically], and [bursting with excitement] may sometimes work, but the documented list is non‑exhaustive and undocumented tags aren’t guaranteed to be honored; use documented tags for reliability and validate the output for each voice. (elevenlabs.io)
- Reinforce intensity with punctuation and capitalization (exclamation marks, strategic caps) alongside tags. (elevenlabs.io)
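Following that advice, a preprocessing pass can rewrite the high-energy synonyms from your taxonomy to the documented [shouts] and [excited] before generation. Which synonym maps to which canonical tag is an editorial judgment, and `canonicalize_tags` is a hypothetical helper; tags not in the table are kept (whitespace-trimmed).

```python
import re

TAG_RE = re.compile(r"\[([^\[\]]+)\]")

# Editorial mapping from taxonomy synonyms to documented tags.
CANONICAL = {
    "loudly": "shouts",
    "shouting": "shouts",
    "calling out": "shouts",
    "energetically": "excited",
    "bursting with excitement": "excited",
}

def canonicalize_tags(script: str) -> str:
    """Replace known synonym tags with documented equivalents; keep other tags."""
    def swap(m: re.Match) -> str:
        tag = m.group(1).strip()
        return f"[{CANONICAL.get(tag.lower(), tag)}]"
    return TAG_RE.sub(swap, script)

print(canonicalize_tags("[calling out] Over here! [bursting with excitement] We found it!"))
# [shouts] Over here! [excited] We found it!
```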
-
Best practices that materially affect results
- Match tags to an appropriate voice; a soft/whispery voice won’t convincingly “shout,” and contradictory instructions reduce quality. (elevenlabs.io)
- v3 uses tags instead of SSML for pacing; control rhythm with tags, ellipses, and punctuation (SSML break tags aren’t supported in v3). (elevenlabs.io)
- Some tags are experimental and voice‑dependent; audition multiple voices and iterate. (elevenlabs.io)
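If you are porting scripts written with SSML `<break time="..."/>` tags, a conversion pass can swap them for the pacing tools v3 does use. The thresholds (ellipsis up to one second, [pauses] up to two, [long pause] beyond) are arbitrary assumptions, and [pauses]/[long pause] come from the taxonomy above rather than the official docs, so audition the result. `replace_ssml_breaks` is a hypothetical helper.

```python
import re

# Matches <break time="0.5s"/> or <break time='1500ms' /> plus surrounding spaces.
BREAK_RE = re.compile(r"""\s*<break\s+time=["']([\d.]+)(ms|s)["']\s*/>\s*""")

def replace_ssml_breaks(script: str) -> str:
    """Swap SSML <break/> tags (unsupported in v3) for punctuation or pause tags."""
    def swap(m: re.Match) -> str:
        seconds = float(m.group(1)) / (1000 if m.group(2) == "ms" else 1)
        if seconds <= 1.0:
            return "... "          # short pause: ellipsis
        if seconds <= 2.0:
            return " [pauses] "    # medium pause: pause tag
        return " [long pause] "    # anything longer
    return BREAK_RE.sub(swap, script).strip()

print(replace_ssml_breaks('Wait <break time="0.5s" /> what?'))  # Wait... what?
```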
-
API pointers
- Use model_id "eleven_v3" with Create speech or Stream speech. For multi‑speaker scenes, use Create dialogue or Stream dialogue. (help.elevenlabs.io)
- Dialogue mode in v3 is designed for expressive, natural multi‑speaker generation. (help.elevenlabs.io)
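For multi-speaker scenes, the dialogue request carries a list of speaker turns instead of a single text string. The sketch assumes the REST path `https://api.elevenlabs.io/v1/text-to-dialogue` and an `inputs` array of `{text, voice_id}` objects; check both against the current API reference. `build_dialogue_request` is a hypothetical helper, the voice IDs and key are placeholders, and the request is built but not sent.

```python
import json
import urllib.request

def build_dialogue_request(turns: list[tuple[str, str]], api_key: str) -> urllib.request.Request:
    """Build a Create dialogue request from (voice_id, tagged_text) turns; not sent here."""
    body = {
        "model_id": "eleven_v3",
        # One entry per speaker turn, in playback order; tags stay inline.
        "inputs": [{"text": text, "voice_id": voice_id} for voice_id, text in turns],
    }
    return urllib.request.Request(
        "https://api.elevenlabs.io/v1/text-to-dialogue",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_dialogue_request(
    [("VOICE_A", "[excited] We made it!"), ("VOICE_B", "[sighs] Barely.")],
    "YOUR_API_KEY",
)
```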
If you’d like, share a few sample lines from your taxonomy and I’ll map them to documented tags.