# I have a client who wants 3 voice AI agents: reception, follow-up and caller. He wants to call 2M ph
Here are the concrete levers you can use to reduce cost for 3 voice agents calling at scale:
ElevenAgents levers
- Use voice only where it is actually needed; if follow-ups can be text, switch that agent to “text only” (0.3–0.4 cents per message, depending on plan). (help.elevenlabs.io)
- Iterate prompts in “setup/prompt testing” first—those calls are billed at half cost. (help.elevenlabs.io)
- Keep calls short. Calls are billed by connection duration, with a 95% discount applied to silences longer than 10s; combine concise flows with a faster voice speed to finish sooner. (Speed control 0.7–1.2x is available in Agents.) (help.elevenlabs.io)
- Prefer ElevenLabs-hosted LLMs or lower-cost LLMs; LLM usage is passed through, and hosted options are designed for reduced reasoning cost. (help.elevenlabs.io)
- Avoid “burst pricing” unless you truly need it—excess concurrency is charged at 2× the standard rate; schedule batches within normal concurrency or request Enterprise scaling. (help.elevenlabs.io)
- Remember telephony carrier fees (e.g., via Twilio or your SIP trunk) are separate from ElevenLabs billing; pick and manage your provider accordingly. (help.elevenlabs.io)
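To see why burst pricing matters at a scale of millions of calls, here is a rough estimator. It is only a sketch: the per-minute rate, the concurrency limit, and the assumption that the share of overflowing calls equals the share of peak concurrency above the limit are all hypothetical placeholders, not ElevenLabs' actual billing model. Carrier fees are left out, as noted above.

```python
from dataclasses import dataclass


@dataclass
class CampaignPlan:
    """Inputs for a rough voice-campaign estimate (all values are your own placeholders)."""
    total_calls: int
    avg_billable_minutes: float  # per call, after any silence discount
    rate_per_minute: float       # your plan's agent rate in dollars (placeholder)
    concurrency_limit: int       # concurrent calls billed at the standard rate
    peak_concurrency: int        # concurrent calls you actually schedule


def estimate_campaign_cost(plan: CampaignPlan, burst_multiplier: float = 2.0) -> float:
    """Agent-minute cost, billing the overflow share of calls at burst_multiplier x.

    Assumption: the share of calls that overflow equals the share of peak
    concurrency above the limit. Carrier (Twilio/SIP) fees are not included.
    """
    overflow = max(0, plan.peak_concurrency - plan.concurrency_limit)
    burst_share = overflow / plan.peak_concurrency if plan.peak_concurrency else 0.0
    minutes = plan.total_calls * plan.avg_billable_minutes
    standard_cost = minutes * (1 - burst_share) * plan.rate_per_minute
    burst_cost = minutes * burst_share * plan.rate_per_minute * burst_multiplier
    return standard_cost + burst_cost


if __name__ == "__main__":
    # Placeholder numbers only; substitute your plan's real rate and limits.
    within = CampaignPlan(1000, 2.0, 0.10, concurrency_limit=10, peak_concurrency=10)
    bursting = CampaignPlan(1000, 2.0, 0.10, concurrency_limit=10, peak_concurrency=20)
    print(estimate_campaign_cost(within), estimate_campaign_cost(bursting))
```

With the same total minutes, any schedule that exceeds the concurrency limit costs more, which is the reasoning behind spreading batches out instead of bursting.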
Plan/billing strategy
- Choose a tier that matches expected monthly minutes; included minutes rise with higher tiers, and extra-minute credit costs vary by tier. For very large campaigns (e.g., millions of calls), contact Enterprise for custom rates and higher limits. (help.elevenlabs.io)
- If you also use API features, enable usage-based billing to buy overage credits at the best rate for your tier; use rollover by staying on the same plan to carry unused credits (up to 2× monthly quota). (help.elevenlabs.io)
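Choosing a tier comes down to base price plus overage at your expected volume. The sketch below compares tiers you describe yourself; the `Tier` fields and every number in the usage example are hypothetical placeholders, not real ElevenLabs prices. The rollover helper assumes the “up to 2× monthly quota” rule caps the total carried balance, which is one reading of the docs and worth confirming.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A plan tier described by numbers you copy from the pricing page."""
    name: str
    monthly_price: float
    included_minutes: float
    extra_minute_rate: float  # dollars per minute beyond the included minutes


def monthly_cost(tier: Tier, expected_minutes: float) -> float:
    """Base price plus overage for minutes beyond the tier's allowance."""
    overage = max(0.0, expected_minutes - tier.included_minutes)
    return tier.monthly_price + overage * tier.extra_minute_rate


def cheapest_tier(tiers: list[Tier], expected_minutes: float) -> Tier:
    """Return the tier with the lowest estimated monthly cost."""
    return min(tiers, key=lambda t: monthly_cost(t, expected_minutes))


def carried_balance(unused: float, monthly_quota: float, next_grant: float) -> float:
    """Rollover sketch, assuming the balance is capped at 2x the monthly quota."""
    return min(unused + next_grant, 2 * monthly_quota)


if __name__ == "__main__":
    # Hypothetical tiers for illustration only.
    tiers = [Tier("small", 10, 100, 0.20), Tier("large", 50, 1000, 0.10)]
    print(cheapest_tier(tiers, 500).name)
```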
Voice choices
- Avoid Voice Library voices with custom-rate multipliers (they consume more credits). Use your own voices or Voice Design (only preview text consumes credits when generating candidate voices). (help.elevenlabs.io)
Alternative architecture (if you build your own dialer instead of ElevenAgents)
- Price TTS/STT directly: TTS is billed per character and STT per audio hour. Flash/Turbo is cheaper than Multilingual (e.g., business-tier starting prices: $0.06/1K chars for Flash/Turbo vs $0.12/1K for Multilingual); Scribe v1/v2 is cheaper than Scribe v2 Realtime ($0.22/hr vs $0.39/hr). (elevenlabs.io)
- Flash/Turbo also uses fewer credits per character than other TTS models (0.5 vs 1 credit/char on self-serve). (help.elevenlabs.io)
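For the build-your-own route, the quoted prices turn into simple per-call arithmetic. This sketch uses only the business-tier starting prices and credit rates listed above; the call profile (1,500 agent characters, 60 seconds of caller audio) is a hypothetical example, and LLM, telephony, and hosting costs are excluded.

```python
# Business-tier starting prices quoted above; re-check the pricing page before use.
TTS_USD_PER_1K_CHARS = {"flash_turbo": 0.06, "multilingual": 0.12}
STT_USD_PER_HOUR = {"scribe_batch": 0.22, "scribe_v2_realtime": 0.39}
# Self-serve credit consumption per character, as quoted above.
TTS_CREDITS_PER_CHAR = {"flash_turbo": 0.5, "multilingual": 1.0}


def tts_cost(characters: int, model: str) -> float:
    """Dollar cost of synthesizing `characters` of agent speech."""
    return characters / 1000 * TTS_USD_PER_1K_CHARS[model]


def tts_credits(characters: int, model: str) -> float:
    """Credits consumed on self-serve plans for the same text."""
    return characters * TTS_CREDITS_PER_CHAR[model]


def stt_cost(audio_seconds: float, model: str) -> float:
    """Dollar cost of transcribing `audio_seconds` of caller audio."""
    return audio_seconds / 3600 * STT_USD_PER_HOUR[model]


def per_call_cost(agent_chars: int, caller_seconds: float, tts_model: str, stt_model: str) -> float:
    """TTS + STT only; LLM, telephony, and your own hosting are excluded."""
    return tts_cost(agent_chars, tts_model) + stt_cost(caller_seconds, stt_model)


if __name__ == "__main__":
    # Hypothetical call: 1,500 characters spoken by the agent, 60 s of caller audio.
    cheap = per_call_cost(1500, 60, "flash_turbo", "scribe_v2_realtime")
    rich = per_call_cost(1500, 60, "multilingual", "scribe_v2_realtime")
    print(f"Flash: ${cheap:.4f}/call, Multilingual: ${rich:.4f}/call")
```

A live dialer most likely needs the Realtime STT rate for turn-taking, so the cheaper batch rate mainly applies to post-call transcripts. Multiply the per-call figure by the campaign's call count to see how small per-call differences add up.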
These adjustments—especially text-only follow-ups, faster speech, hosted/cheaper LLMs, avoiding burst pricing, and the right tier/rollover—are the primary, documented ways to lower total cost. (help.elevenlabs.io)
What does “text only” mean? Do you mean text channels like SMS?
In ElevenAgents, “text-only” means Chat Mode: the agent has a text conversation with no audio input/output. In this mode you’re billed per text message rather than per call minute, with LLM costs passed through separately. (elevenlabs.io)
Text-only can run in:
- Web/app chat via the ElevenAgents widget or SDK/API. (elevenlabs.io)
- Messaging channels that ElevenLabs documents, such as WhatsApp. (elevenlabs.io)
SMS specifically isn’t listed as a built-in channel in the official documentation; WhatsApp is the documented messaging integration. (elevenlabs.io)
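Because Chat Mode is billed per message and voice per minute, a quick break-even check shows when moving the follow-up agent to text pays off. The 0.4 cents per message is the upper figure quoted earlier; the $0.10 per voice minute is a hypothetical placeholder, and LLM pass-through costs are excluded from both sides.

```python
def text_followup_cost(messages: int, cents_per_message: float) -> float:
    """Chat Mode: dollars billed per text message (LLM pass-through excluded)."""
    return messages * cents_per_message / 100


def voice_followup_cost(minutes: float, usd_per_minute: float) -> float:
    """Voice: dollars billed per call minute (LLM and carrier fees excluded)."""
    return minutes * usd_per_minute


def breakeven_messages(minutes: float, usd_per_minute: float, cents_per_message: float) -> float:
    """How many text messages cost the same as one voice follow-up of `minutes`."""
    return voice_followup_cost(minutes, usd_per_minute) / (cents_per_message / 100)


if __name__ == "__main__":
    # 0.4 cents/message is quoted earlier; $0.10/min is a placeholder rate.
    print(text_followup_cost(6, 0.4), voice_followup_cost(2, 0.10))
    print(breakeven_messages(2, 0.10, 0.4))
```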
Sources:
Chat Mode | ElevenLabs Documentation
Widget customization | ElevenLabs Documentation
WhatsApp | ElevenLabs Documentation
OK, is there anything else I need to know to provide the best, natural-sounding agents at lower cost that are actually capable?
Here’s a concise, doc-verified checklist to maximize naturalness and capability while lowering cost for large-scale voice agents.
Channel and mode selection
- Use Chat Mode (text-only) wherever voice isn’t essential (e.g., WhatsApp follow-ups; SMS isn’t a documented built-in channel). Text-only is charged per message; voice is billed per call minute. This alone can dramatically reduce follow-up costs. (elevenlabs.io)
- For WhatsApp, you can run text conversations (and optionally enable voice calls) directly with an assigned agent. (elevenlabs.io)
Telephony integration and audio settings
- Connect phone numbers via native Twilio, SIP trunking, or register calls if you need to keep your existing Twilio flow. Configure telephony audio as μ-law 8000 Hz for compatibility. (elevenlabs.io)
- Supported output formats include μ-law 8 kHz (optimized for telephony). Use the correct format end‑to‑end to avoid transcoding artifacts. (help.elevenlabs.io)
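The μ-law 8 kHz format is G.711 μ-law. If some component in your own pipeline produces 16-bit linear PCM, this sketch shows what the conversion involves. It is written from the standard G.711 companding algorithm, not from ElevenLabs code, and assumes the audio is already mono at 8 kHz (resampling is out of scope). Requesting μ-law directly avoids this step entirely, which is the point of keeping one format end to end.

```python
import struct

BIAS = 0x84   # 132, added before the segment lookup (G.711)
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    exponent = magnitude.bit_length() - 8           # segment 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # bits are inverted on the wire


def ulaw_to_linear(code: int) -> int:
    """Decode one mu-law byte back to a signed 16-bit PCM sample."""
    code = ~code & 0xFF
    sign, exponent, mantissa = code & 0x80, (code >> 4) & 0x07, code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def pcm16le_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit mono PCM (already at 8 kHz) to mu-law bytes."""
    return bytes(linear_to_ulaw(s) for (s,) in struct.iter_unpack("<h", pcm))
```

Each conversion is lossy, so every extra PCM to μ-law hop adds quantization error; that is the transcoding artifact the bullet above warns about.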
Naturalness and latency (without unnecessary spend)
- Use Scribe v2 Realtime for ASR to get ~150 ms live transcription and robust multilingual accuracy—key for natural turn-taking. (elevenlabs.io)
- Tune conversation flow:
  - Turn timeout: how long to wait in silence before prompting. (elevenlabs.io)
  - Interruptions: allow barge‑in for more natural dialogs. (elevenlabs.io)
  - Soft timeout: optional short filler if the LLM is slow (improves perceived quality; note it breaks long silences). (elevenlabs.io)
  - Turn eagerness: control how quickly the agent responds. (elevenlabs.io)
- Voice customization: adjust speaking speed (0.7–1.2x), enable Expressive mode (v3 Conversational), and use pronunciation dictionaries for consistent names/terms. (elevenlabs.io)
- TTS latency best practices: prefer Flash models when you need lower latency; use streaming/WebSockets; route to the closest region (api.us.elevenlabs.io for US). (elevenlabs.io)
- Don’t rely on optimize_streaming_latency; it’s deprecated. (help.elevenlabs.io)
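When the same settings are reused across three agents, validating them locally before deployment is cheap insurance. The dataclass below is a hypothetical mirror of the knobs listed above: the field names and default values are placeholders, not the ElevenLabs agent configuration schema. Only the 0.7–1.2x speed range comes from the docs; the inverse-speed estimate is a rough approximation that shows why faster speech shortens billed minutes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceTuning:
    """Hypothetical local mirror of the knobs above; field names and defaults are
    placeholders, not the ElevenLabs agent config schema."""
    turn_timeout_s: float = 7.0
    soft_timeout_s: Optional[float] = None  # None = no filler while the LLM thinks
    allow_interruptions: bool = True
    speed: float = 1.0

    def __post_init__(self) -> None:
        if not 0.7 <= self.speed <= 1.2:
            raise ValueError("speed must stay within the 0.7-1.2x range Agents exposes")
        if self.turn_timeout_s <= 0:
            raise ValueError("turn_timeout_s must be positive")
        if self.soft_timeout_s is not None and self.soft_timeout_s <= 0:
            raise ValueError("soft_timeout_s must be positive when set")


def spoken_seconds(seconds_at_1x: float, tuning: VoiceTuning) -> float:
    """Rough estimate: agent speech time scales inversely with the speed multiplier."""
    return seconds_at_1x / tuning.speed


if __name__ == "__main__":
    # A minute of agent speech at 1x takes roughly 50 s at 1.2x.
    print(spoken_seconds(60, VoiceTuning(speed=1.2)))
```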
Cost controls you should actively use
- Silence discount: in voice or multimodal calls, periods of silence longer than 10 seconds are billed at only 5% of the per‑minute rate. Design your flow to sit quietly (e.g., while tools run) when appropriate. (help.elevenlabs.io)
- LLM cost optimization: choose right‑sized models (e.g., lower‑cost “Flash”/fast models for routine turns), keep prompts concise, leverage RAG to inject only relevant snippets, and use tools for deterministic tasks to reduce tokens. (elevenlabs.io)
- Be aware that enabling fillers (soft timeout) or long agent monologues can reduce silence time (and thus the 95% discount); tune per use case. (help.elevenlabs.io)
- During
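The interaction between the silence discount and soft-timeout fillers is easiest to see with numbers. This sketch models one call as ordered speech and silence segments. It assumes adjacent silences form a single gap and that a gap longer than 10 seconds is billed entirely at 5% of the normal rate; the docs may apply the discount differently, so treat the model as an approximation.

```python
def billable_minutes(segments: list[tuple[str, float]],
                     threshold_s: float = 10.0,
                     silence_rate: float = 0.05) -> float:
    """Estimate full-rate-equivalent minutes billed for one call.

    segments: ordered (kind, seconds) pairs, kind is "speech" or "silence".
    Assumption: adjacent silence segments form one gap, and a gap longer than
    threshold_s is billed entirely at silence_rate of the normal rate.
    """
    billed_s = 0.0
    gap_s = 0.0

    def close_gap(gap: float) -> float:
        return gap * silence_rate if gap > threshold_s else gap

    for kind, seconds in segments:
        if kind == "silence":
            gap_s += seconds
        elif kind == "speech":
            billed_s += close_gap(gap_s) + seconds
            gap_s = 0.0
        else:
            raise ValueError(f"unknown segment kind {kind!r}")
    billed_s += close_gap(gap_s)
    return billed_s / 60


if __name__ == "__main__":
    quiet_tool_run = [("speech", 30), ("silence", 20), ("speech", 30)]
    # Same call, but a 2 s filler fires after 8 s and splits the silence in two.
    with_filler = [("speech", 30), ("silence", 8), ("speech", 2), ("silence", 10), ("speech", 30)]
    print(billable_minutes(quiet_tool_run), billable_minutes(with_filler))
```

Under these assumptions the filler turns 61 billed seconds into 80, because neither remaining gap is long enough to qualify for the discount.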
What do you think about integrating the system prompt with Agent Lightning?
Short answer: I can’t find any ElevenLabs feature named “Agent Lightning” in the official docs. If you meant Salesforce Lightning, the system prompt is configured on the ElevenLabs side (dashboard/API/CLI) and is simply used by the agent you embed in Salesforce—there’s no separate “prompt integration” step in Lightning itself. For prompt control and personalization, the supported mechanisms are:
- Author the system prompt in your agent configuration; see the Quickstart and the Prompting Guide for what the system prompt controls and best practices. (elevenlabs.io)
- Personalize the prompt at runtime with Dynamic Variables (recommended). These can inject values into the system prompt; “secret” dynamic variables are not sent to the LLM provider as part of the prompt/first message. (elevenlabs.io)
- If you need to fully replace the system prompt per conversation, enable Overrides for “System prompt” in the agent’s Security tab and pass the override when starting the conversation. (elevenlabs.io)
- For Salesforce deployments, follow the ElevenLabs Salesforce integration guide; your agent (with its configured system prompt) is what Lightning surfaces. (elevenlabs.io)
If you meant something else by “Agent Lightning,” please clarify the platform or product name so I can point you to the exact ElevenLabs guidance. (elevenlabs.io)
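Dynamic Variables let the three agents share one system prompt while personalizing each call. The helper below only previews that substitution locally so you can catch missing values before a campaign starts. It assumes the `{{variable_name}}` placeholder syntax and the `secret__` name prefix for secret variables (verify both in the Dynamic variables docs); `render_prompt` is a hypothetical function and does not call the ElevenLabs API, which performs the real substitution when the conversation starts.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Fill {{name}} placeholders to preview what the LLM would see.

    Refuses secret__-prefixed names, since secret values should never be
    part of the prompt, and fails loudly on any missing variable.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name.startswith("secret__"):
            raise ValueError(f"secret variable {name!r} must not appear in the prompt")
        if name not in variables:
            raise KeyError(f"missing dynamic variable {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = "Hi {{ customer_name }}, calling about {{topic}}."
    print(render_prompt(template, {"customer_name": "Ana", "topic": "your appointment"}))
```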
Sources:
Quickstart | ElevenLabs Documentation
Dynamic variables | ElevenLabs Documentation
Overrides | ElevenLabs Documentation
Salesforce | ElevenLabs Documentation
ElevenAgents | ElevenLabs Documentation
Agent Lightning is a Microsoft framework for fine-tuning agents and letting them self-improve by iterating on their own runs. I want to use it to build better system prompts for the AI voice agents.