Is a long conversational goal for a subagent in a Workflow good? | ElevenLabs

quick shardBOT Jan 20, 2026, 9:38 AM

AI-generated Answer for trinhthiquynhanh

Short answer: no—long “conversational goals” in a subagent’s System Prompt are not inherently better.

What ElevenLabs recommends

Keep prompts concise and specialized. Longer prompts increase latency and cost; there’s no guarantee they improve results. The docs explicitly note that prompts over ~2000 tokens tend to hurt efficiency, and “a longer system prompt does not guarantee better results.” (elevenlabs.io)
Use subagents to narrow scope. In Workflows, a Subagent node should override/append only what’s needed (goal, system instructions, model, voice) for that phase. Keep each subagent’s objective tightly focused. (elevenlabs.io)
Put large reference material in the Knowledge Base with RAG rather than the prompt/goal. RAG retrieves only relevant snippets, avoiding oversized context and keeping prompts lean. (elevenlabs.io)
Decompose complex tasks across multiple agents/nodes. Orchestrate with Workflows and only escalate to more capable agents when needed to control cost and complexity. (elevenlabs.io)

Practical limits

There’s a hard cap on total system context size (including instructions and any “Prompt”-mode documents). Keep goals brief and move details to KB/RAG if you approach limits. (elevenlabs.io)

Bottom line: write a short, specific goal for each subagent, structure steps clearly if needed, and shift bulk details to the Knowledge Base. This aligns with ElevenLabs’ prompt best practices and Workflow design guidance. (elevenlabs.io)

keen gazelle Jan 20, 2026, 9:43 AM

Is a token counted as a character or a word?

quick shardBOT Jan 20, 2026, 9:43 AM

AI-generated Answer for trinhthiquynhanh

ElevenLabs does not meter “tokens.” Text-to-Speech usage is billed by written text characters (not words). For Turbo/Flash models, 1 text character = 0.5 credits on self-serve plans; for English/Multilingual v1/v2, 1 text character = 1 credit. ElevenLabs renamed “characters” to “credits,” but for TTS it still refers to written text characters. (help.elevenlabs.io)
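
As a rough illustration of the character-based rates quoted above, the following sketch estimates TTS credits for a piece of text. The function name and the `model_family` keys are hypothetical, and it assumes every character (including spaces and punctuation) counts toward usage; check your plan's actual rates before relying on it.

```python
# Credits per written character, as quoted in the answer above (self-serve plans).
# The dictionary keys are illustrative labels, not official model identifiers.
CREDITS_PER_CHAR = {
    "turbo_flash": 0.5,  # Turbo/Flash models
    "standard": 1.0,     # English/Multilingual v1/v2
}


def tts_credits(text: str, model_family: str) -> float:
    """Estimate TTS credits: usage is counted per character, not per word or token."""
    # Assumption: spaces and punctuation count as characters.
    return len(text) * CREDITS_PER_CHAR[model_family]


sample = "Hello, world!"  # 13 characters
print(tts_credits(sample, "turbo_flash"))  # 6.5
print(tts_credits(sample, "standard"))     # 13.0
```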

For Agents: charges are per call minute and/or per text message, with any LLM costs passed through separately (not token-based metering by ElevenLabs). (help.elevenlabs.io)

Sources:
Have characters changed? – ElevenLabs
How much does ElevenLabs Agents (formerly Conversational AI) cost? – ElevenLabs

keen gazelle Jan 20, 2026, 9:44 AM

The system prompt docs explicitly note that prompts over ~2000 tokens tend to hurt efficiency. So is a token counted as a character or a word?

quick shardBOT Jan 20, 2026, 9:44 AM

AI-generated Answer for trinhthiquynhanh

Neither. In ElevenLabs’ Conversational AI docs, a token is an LLM unit (subword), not a single character or a whole word. As guidance: “a token is … roughly equivalent to 4 characters on average,” and “for English text, 100 tokens is approximately 75 words.” (elevenlabs.io)

For context, the system-prompt guide notes that prompts over ~2000 tokens tend to increase latency and cost. (elevenlabs.io)
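
To make the two rules of thumb concrete, here is a minimal sketch that estimates a prompt's token count from its characters and from its words, then compares the larger estimate with the ~2000-token guideline. The function names are hypothetical and the result is only an approximation; an actual tokenizer for the chosen LLM would give the real count.

```python
import math

CHARS_PER_TOKEN = 4          # "roughly equivalent to 4 characters on average"
WORDS_PER_100_TOKENS = 75    # "100 tokens is approximately 75 words" (English)
PROMPT_BUDGET_TOKENS = 2000  # guideline quoted from the prompting guide


def estimate_tokens_from_chars(text: str) -> int:
    """Approximate tokens using the 4-characters-per-token rule of thumb."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Approximate tokens using the 75-words-per-100-tokens rule of thumb."""
    return math.ceil(len(text.split()) * 100 / WORDS_PER_100_TOKENS)


def over_budget(text: str) -> bool:
    """True if the more pessimistic estimate exceeds the ~2000-token guideline."""
    estimate = max(estimate_tokens_from_chars(text), estimate_tokens_from_words(text))
    return estimate > PROMPT_BUDGET_TOKENS


prompt = "You are a booking assistant. Collect the caller's name and preferred date."
print(estimate_tokens_from_chars(prompt), estimate_tokens_from_words(prompt), over_budget(prompt))
```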

Sources:
Models | ElevenLabs Documentation
Prompting guide | ElevenLabs Documentation

keen gazelle Jan 20, 2026, 10:58 AM

I tried to force my subagent (which has 3 tools) to call a tool (to retrieve the itemID) by prompting it, but in the end it didn't make the call. Why? I am on the Creator subscription, so I assumed there would be no limit.

quick shardBOT Jan 20, 2026, 10:58 AM

AI-generated Answer for trinhthiquynhanh

Short answer: prompting a Subagent to call a tool doesn’t guarantee the call will happen. In Workflows, tools attached to a Subagent are available to the model, but the model may choose not to invoke them. If you need a guaranteed call (e.g., to fetch itemID), add a Dispatch Tool node in the Workflow—Dispatch Tool nodes are dedicated execution points that guarantee the tool is executed. (elevenlabs.io)

What to check/do:

Use a Dispatch Tool node for the itemID lookup and route success/failure via its special edges. (elevenlabs.io)
Improve tool selection reliability: give the tool and its parameters clear, descriptive names; add explicit “when to call” instructions in the tool description/system prompt; and prefer higher‑intelligence LLMs for function calling. (elevenlabs.io)
If it’s a Client Tool, confirm the client handler is registered and check the browser console for errors. (elevenlabs.io)

About limits: the Creator plan does not remove limits on Agents. Agents have plan‑based concurrency caps (Creator: 10 concurrent conversations per workspace), and usage is billed by minutes/messages; exceeding included usage requires enabling usage‑based billing or burst capacity. These are platform limits, independent of whether a tool call occurs. (help.elevenlabs.io)

If you share a screenshot of your Workflow graph (showing the Subagent node and edges), I can point to exactly where to insert the Dispatch Tool node and how to wire the success/failure paths. (elevenlabs.io)

Sources:
Workflows | ElevenLabs Documentation
Client tools | ElevenLabs Documentation
How many ElevenLabs Agents (formerly Conversational AI) requests can I make and can I increase it? – ElevenLabs

keen gazelle Jan 21, 2026, 4:27 AM

Can the tool failing to be called after a long test be due to an API rate limit? (I am using GPT-4o-mini in a subagent in a Workflow.)

quick shardBOT Jan 21, 2026, 4:27 AM

AI-generated Answer for trinhthiquynhanh

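Short answer: possibly. Hitting ElevenLabs platform limits can stop tools from being invoked by an Agent during long tests. Rate limits on the LLM provider's side (here OpenAI, for GPT-4o-mini) are a separate matter that the ElevenLabs docs do not cover.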

What’s documented:

Agents have per‑workspace concurrency limits (max simultaneous calls/conversations). If you reach your plan’s limit, new actions (including tool calls) won’t start until capacity frees up. Free: 4, Starter: 6, Creator: 10, Pro: 20, Scale: 30, Business: 30. You can optionally enable “burst pricing” to temporarily allow up to 3× your normal concurrency at a higher rate. (help.elevenlabs.io)
There’s also a cap on how many credits can be consumed “at once” (2× your plan’s monthly quota). If you hit this cap (or run out of credits without usage‑based billing enabled), operations stop. (help.elevenlabs.io)
You can verify whether you’re saturating concurrency or running into request spikes in Developers → Usage → API Requests, which shows totals and concurrent requests over time. (help.elevenlabs.io)
Agents let you choose external LLMs, including OpenAI models. ElevenLabs docs note that LLM selection and pass‑through LLM costs are supported, but they do not document OpenAI‑side rate limits. (help.elevenlabs.io)
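
The concurrency figures above can be captured in a small lookup, which may help when reasoning about whether a long test could saturate a plan. This is only a sketch built from the numbers quoted in this answer; the function names are hypothetical and nothing here calls the ElevenLabs API.

```python
# Per-workspace concurrency limits as listed in the answer above.
CONCURRENCY_LIMITS = {
    "free": 4,
    "starter": 6,
    "creator": 10,
    "pro": 20,
    "scale": 30,
    "business": 30,
}
BURST_MULTIPLIER = 3  # burst pricing allows up to 3x normal concurrency


def max_concurrency(plan: str, burst_enabled: bool = False) -> int:
    """Return the maximum simultaneous conversations for a plan."""
    base = CONCURRENCY_LIMITS[plan.lower()]
    return base * BURST_MULTIPLIER if burst_enabled else base


def can_start_conversation(active: int, plan: str, burst_enabled: bool = False) -> bool:
    """True if one more conversation fits under the plan's limit."""
    return active < max_concurrency(plan, burst_enabled)


print(max_concurrency("creator"))             # 10
print(max_concurrency("creator", True))       # 30
print(can_start_conversation(10, "creator"))  # False
```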

What this means for your case (using GPT‑4o‑mini in a subagent/workflow):

Tool calls failing after a long test can be caused by ElevenLabs Agent concurrency saturation or by hitting the credits cap—both are documented platform limits.
If you suspect the LLM provider’s own rate limits, that would be outside ElevenLabs’ published docs; check the provider’s dashboard/quotas separately. ElevenLabs’ help center does not specify provider‑side rate‑limit behavior.

Recommended checks:

In Developers → Usage → API Requests, look for concurrency spikes aligning with the failures. (help.elevenlabs.io)
Confirm you haven’t hit the “credits at once” cap, or enable usage‑based billing if you need to exceed your monthly quota. (help.elevenlabs.io)
If needed, enable burst pricing on the impacted agent to temporarily exceed your normal concurrency. (help.elevenlabs.io)

If you need deeper investigation, contact support with timestamps and agent IDs so they can correlate with platform metrics.

keen gazelle Jan 21, 2026, 4:53 AM

This conversation failed with the following reason:

Generating the LLM response took too long.
What is this error?

quick shardBOT Jan 21, 2026, 4:53 AM

AI-generated Answer for trinhthiquynhanh

Short answer: it’s a timeout while waiting for the configured LLM to return.

Details verified in docs:

ElevenLabs treats LLM “timeouts” as a failure condition; the platform will cascade to backup LLMs unless you’ve disabled fallbacks. If no model returns in time, the conversation can end with a failure reason. (elevenlabs.io)
There’s a configurable LLM cascade timeout (cascade_timeout_seconds; default 8s, range 2–15s). If the primary model exceeds this, the system moves on; if all attempts exceed timeouts, you’ll see a failure. (elevenlabs.io)
You can view the exact failure reason for a call/conversation in Call History in the dashboard. (help.elevenlabs.io)
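
The cascade behaviour described above can be pictured with a small conceptual sketch: each model gets a fixed time window, and the next one is tried when the window expires. This is not ElevenLabs' implementation. The `cascade` and `validate_cascade_timeout` functions and the two stand-in models are hypothetical, and the demo uses a much shorter timeout than the documented 2 to 15 second range so that it runs quickly.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

ModelCall = Callable[[str], Awaitable[str]]


def validate_cascade_timeout(seconds: float = 8.0) -> float:
    """Check a value against the documented range (default 8s, range 2-15s)."""
    if not 2.0 <= seconds <= 15.0:
        raise ValueError("cascade timeout must be between 2 and 15 seconds")
    return seconds


async def cascade(models: Sequence[ModelCall], prompt: str, timeout_s: float) -> str:
    """Try each model in order; move on whenever one exceeds the timeout."""
    for call_model in models:
        try:
            return await asyncio.wait_for(call_model(prompt), timeout=timeout_s)
        except asyncio.TimeoutError:
            continue  # this model was too slow, fall back to the next one
    raise TimeoutError("Generating the LLM response took too long.")


# Hypothetical stand-ins for a slow primary model and a fast backup model.
async def slow_model(prompt: str) -> str:
    await asyncio.sleep(0.2)
    return f"slow: {prompt}"


async def fast_model(prompt: str) -> str:
    return f"fast: {prompt}"


# Demo with a shortened timeout (0.05s) purely to keep the example quick.
print(asyncio.run(cascade([slow_model, fast_model], "hi", timeout_s=0.05)))  # fast: hi
```

With only `slow_model` in the list, the same call raises `TimeoutError`, which mirrors the failure reason reported when no model answers in time.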

Mitigations documented by ElevenLabs:

Keep backup LLMs enabled; disabling them can cause abrupt failures when a model times out. (elevenlabs.io)
Reduce LLM “thinking budget” and keep “reasoning effort” set to None for conversational agents to avoid slow responses. (elevenlabs.io)
Optimize for latency (use faster models/voices, streaming settings, and appropriate regions). (elevenlabs.io)
If using a custom LLM that’s slow, implement the documented “buffer words” streaming pattern to keep speech flowing while the model completes. (elevenlabs.io)
Be aware that enabling RAG adds ~500 ms per turn. (elevenlabs.io)

If the error persists after these changes, share the conversation ID and timestamps with Support via the help center. (help.elevenlabs.io)

Sources:
LLM Cascading | ElevenLabs Documentation
Changelog | ElevenLabs Documentation
Where can I locate the reason for my call failing? – ElevenLabs
Models | ElevenLabs Documentation
Latency optimization | ElevenLabs Documentation
Integrate your own model | ElevenLabs Documentation
Retrieval-Augmented Generation | ElevenLabs Documentation
How can I contact support? – ElevenLabs

keen gazelle Jan 21, 2026, 7:52 AM

What is Play keypad touch tone?

quick shardBOT Jan 21, 2026, 7:52 AM

AI-generated Answer for trinhthiquynhanh

Play keypad touch tone is a Conversational AI system tool that lets an ElevenLabs agent play DTMF tones (the sounds made when pressing phone keypad digits) during an active phone call so it can navigate IVRs, enter extensions/PINs, etc. It works with both Twilio and SIP trunking call integrations. (elevenlabs.io)

Key details:

Supported inputs: digits 0–9, star (*), hash (#); pauses w = 0.5s and W = 1.0s (e.g., dtmf_tones: "2w123#"). (elevenlabs.io)
Parameters when called by an LLM: reason (optional) and dtmf_tones (required). Tool name: play_keypad_touch_tone. (elevenlabs.io)
Enablement: add the play_keypad_touch_tone system tool to your agent’s tools; no extra configuration beyond enabling it. (elevenlabs.io)
Constraint: can only be used during an active phone conversation; calling it outside a call returns an error. (elevenlabs.io)
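
As an illustration of the input format listed above, this sketch validates a `dtmf_tones` string and adds up the pause time it contains. The `validate_dtmf` helper is hypothetical and runs purely locally; it does not invoke the `play_keypad_touch_tone` tool itself.

```python
# Pause characters and their durations in seconds, as listed above.
PAUSES = {"w": 0.5, "W": 1.0}
# Digits 0-9, star, hash, plus the two pause characters.
VALID_CHARS = set("0123456789*#") | set(PAUSES)


def validate_dtmf(tones: str) -> float:
    """Return the total pause time in seconds; raise ValueError on bad input."""
    if not tones:
        raise ValueError("dtmf_tones must not be empty")
    unsupported = sorted(set(tones) - VALID_CHARS)
    if unsupported:
        raise ValueError(f"unsupported characters: {unsupported}")
    return sum(PAUSES.get(ch, 0.0) for ch in tones)


print(validate_dtmf("2w123#"))  # 0.5
```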

Sources:
Play keypad touch tone | ElevenLabs Documentation

keen gazelle Jan 21, 2026, 8:06 AM

Will Play keypad touch tone help fill the silence (latency) while the agent calls an API?