#Conversation API
Here’s a concise overview of ElevenLabs’ Conversation API (Conversational AI) with the key endpoints, connection options, and operations.
What it is
- Real-time conversational interface to ElevenAgents via streaming connections, enabling two‑way audio and text. Primary WebSocket endpoint: wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}. Public agents can connect with just agent_id; private agents should use a signed URL. Never expose your API key client‑side. (elevenlabs.io)
Connection types
- WebSocket: Start a session to stream user audio chunks and receive agent audio and text events in real‑time. Signed URL endpoint (server-side): GET /v1/convai/conversation/get-signed-url?agent_id=... returning a wss URL with a short‑lived token. (elevenlabs.io)
- WebRTC: Supported for improved audio quality (AEC, noise suppression) and native browser integration. Obtain a session token via GET /v1/convai/conversation/token, then start a WebRTC session in the SDK. (elevenlabs.io)
Typical start flow
- Public agent over WebSocket: connect to wss endpoint with agent_id.
- Private agent over WebSocket: your server calls get-signed-url, returns the signed wss URL to the client; client connects with that URL. Tokens are short‑lived; do not send API keys to the client. (elevenlabs.io)
- WebRTC: your server requests a conversation token, client starts a session with connectionType: "webrtc". (elevenlabs.io)
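As a rough illustration of the two WebSocket start flows above, here is a minimal Python sketch that only builds the URL and the server-side request; it does not open any connection. The `xi-api-key` header name and the helper names `public_ws_url` / `signed_url_request` are assumptions for this example, so check them against the API reference before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_HOST = "api.elevenlabs.io"


def public_ws_url(agent_id: str) -> str:
    """WebSocket URL for a public agent (no credentials involved)."""
    query = urlencode({"agent_id": agent_id})
    return f"wss://{API_HOST}/v1/convai/conversation?{query}"


def signed_url_request(agent_id: str, api_key: str) -> Request:
    """Server-side request asking for a short-lived signed wss URL."""
    query = urlencode({"agent_id": agent_id})
    url = f"https://{API_HOST}/v1/convai/conversation/get-signed-url?{query}"
    # Assumption: the API key travels in the xi-api-key header.
    # This must only ever run on your server, never in client code.
    return Request(url, headers={"xi-api-key": api_key}, method="GET")
```

Your server would send the request from `signed_url_request` and hand only the returned signed URL to the client.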
Realtime messaging/events (WebSocket)
- Client sends audio chunks and control messages; server streams back agent text/audio and control events. Example event types shown in docs code include: user_transcript, agent_response, agent_response_correction, audio (with alignment), interruption, ping/pong, and agent_chat_response_part. (elevenlabs.io)
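To make the event flow more concrete, here is a small Python sketch of a handler that routes incoming messages by their `type` field. The event type names come from the list above; the nested payload keys (`ping_event.event_id`, `user_transcription_event.user_transcript`, `agent_response_event.agent_response`) and the shape of the `pong` reply are assumptions, so verify them against the WebSocket reference.

```python
import json
from typing import List, Optional, Tuple


def handle_event(raw: str, log: List[Tuple[str, str]]) -> Optional[str]:
    """Route one server message; return a reply to send back, if any."""
    event = json.loads(raw)
    kind = event.get("type")

    if kind == "ping":
        # Assumed shape: the pong echoes the ping's event_id.
        event_id = event.get("ping_event", {}).get("event_id")
        return json.dumps({"type": "pong", "event_id": event_id})

    if kind == "user_transcript":
        body = event.get("user_transcription_event", {})
        log.append(("user", body.get("user_transcript", "")))
    elif kind == "agent_response":
        body = event.get("agent_response_event", {})
        log.append(("agent", body.get("agent_response", "")))
    elif kind == "interruption":
        log.append(("interruption", ""))
    # Other types (audio, agent_response_correction, ...) are ignored here.
    return None
```

The handler is transport-agnostic on purpose: you can feed it strings from any WebSocket client and send back whatever it returns.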
Manage and retrieve conversations (REST)
- List conversations (filter by agent, status, time, duration, language, tools, ratings, etc.): GET /v1/convai/conversations. Paginated with cursor; optional summary_mode. (elevenlabs.io)
- Get conversation details (status, metadata, transcript, booleans for audio availability): GET /v1/convai/conversations/{conversation_id}. (elevenlabs.io)
- Search messages across conversations:
- Full‑text/fuzzy: GET /v1/convai/conversations/messages/text-search. (elevenlabs.io)
- Semantic (smart) search: GET /v1/convai/conversations/messages/smart-search. (elevenlabs.io)
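For the cursor-paginated list endpoint above, a sketch like the following can walk all pages. The response keys `conversations`, `next_cursor`, and `has_more`, as well as the `cursor` and `page_size` query parameters, are assumptions here; `fetch_json` is a hypothetical callable you supply (for example, a thin wrapper around your HTTP client that adds authentication).

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE = "https://api.elevenlabs.io/v1/convai/conversations"


def list_url(agent_id: str, cursor: Optional[str] = None, page_size: int = 30) -> str:
    """Build the list URL for one page of results."""
    params = {"agent_id": agent_id, "page_size": page_size}
    if cursor:
        params["cursor"] = cursor
    return f"{BASE}?{urlencode(params)}"


def iter_conversations(agent_id: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield conversations page by page until the API reports no more."""
    cursor = None
    while True:
        page = fetch_json(list_url(agent_id, cursor))
        yield from page.get("conversations", [])
        cursor = page.get("next_cursor")
        if not page.get("has_more") or not cursor:
            break
```

Injecting `fetch_json` keeps the paging logic testable without network access.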
Real‑time monitoring (enterprise)
- Observe live conversations and send control commands (e.g., end_call, transfer_to_number, contextual_update, human takeover) via WebSocket: wss://api.elevenlabs.io/v1/convai/conversations/{conversation_id}/monitor. Text events only (no raw audio). Requires appropriate scopes and workspace permissions. (elevenlabs.io)
Post‑call webhooks
- Receive data after analysis completes. Supported types:
I would like to automate testing the conversations via API.
Retrieve them, edit/test them, and once they are fixed, push the fixes to the agent's memory.
Below is a proven, API-first workflow to automate conversation testing and then promote fixes back into your agent’s configuration/“memory” (knowledge + behavior). Endpoints shown are all in v1.
- Retrieve conversations you want to test
- List/filter/search:
- GET /v1/convai/conversations (filter by agent_id, date, rating, feedback, etc.). (elevenlabs.io)
- GET /v1/convai/conversations/messages/text-search (full‑text search across messages). (elevenlabs.io)
- Drill into a specific conversation:
- GET /v1/convai/conversations/:conversation_id (full transcript, metadata, analysis flags). (elevenlabs.io)
- Optional: GET /v1/convai/conversations/:conversation_id/audio (call recording). (elevenlabs.io)
- Optional: POST /v1/convai/conversations/:conversation_id/feedback (like/dislike) to mark regressions to convert into tests later. (elevenlabs.io)
- Create automated tests (scenario/LLM or tool-call) from those transcripts
- Create a test:
- POST /v1/convai/agent-testing/create
- Supply chat_history (selected turns), success_condition, success_examples, failure_examples.
- You can also configure tool_call validation or dynamic_variables for test runs. (elevenlabs.io)
- Manage and run tests:
- GET /v1/convai/agent-testing (list), GET /v1/convai/agent-testing/:test_id (get), PUT /v1/convai/agent-testing/:test_id (update). (elevenlabs.io)
- POST /v1/convai/agents/:agent_id/run-tests (execute a set of tests against the current or overridden config). (elevenlabs.io)
- For end‑to‑end multi‑turn validation (beyond “next message” tests), use conversation simulation:
- POST /v1/convai/agents/:agent_id/simulate-conversation (batch) or the streaming variant; returns transcript + evaluation. (elevenlabs.io)
- Reference guide: Agent Testing (how to author tests, convert from conversations, run via CLI/CI). (elevenlabs.io)
- Iterate until tests pass, using branches/versioning to isolate changes
- Update the agent config (prompt, tools, voice, evaluation, etc.); optionally target a branch:
- PATCH /v1/convai/agents/:agent_id?branch_id=... (commits a new version on that branch when versioning is enabled). (elevenlabs.io)
- Create/list/merge branches as needed to stage and promote fixes:
- POST /v1/convai/agents/:agent_id/branches, GET /v1/convai/agents/:agent_id/branches, GET one, merge branch. (elevenlabs.io)
- Versioning overview and best practices (traffic-splitting, immutability, rollouts). (elevenlabs.io)
- “Push to agent memory” (make fixes and facts persist)
There isn’t a separate “memory write” API; in ElevenLabs Agents, persistent memory is
Ok, so conversation simulation is for when I want to try out an agent, but I can also use conversations from previous live calls.
Yes. Two distinct workflows are supported:
- Simulate an agent offline for testing/evaluation. Use the Simulate Conversation or Stream Simulate Conversation endpoints to run text-based simulations against an existing agent, with optional evaluation criteria.
- Reuse transcripts from previous live calls in your simulations. Fetch past conversations (and their transcripts) via the Conversations API, then pass them back to the simulator as a partial conversation history to start from a specific point. The guide explicitly calls out partial_conversation_history for this purpose.
If by “push to the agent in memory” you mean updating what the agent “knows” or the context it starts with:
- Persistent knowledge across calls: add/update documents in the agent’s Knowledge Base (optionally with RAG enabled).
- Per-conversation context: inject runtime data at start using Dynamic Variables and/or Overrides (conversation initiation client data), without changing the base agent.
Notes:
- If Zero Retention Mode is enabled, transcripts aren’t stored; you’ll need to capture data via post-call webhooks instead of relying on conversation history.
So, actually the "memory" is deleted after each conversation? (Besides the system prompt and KB.)
So the agent starts each conversation only with the system prompt and context from the KB, and not previous conversations 🤔?
Short answer: Yes. A new conversation starts from the agent’s current configuration (system prompt, tools, and any linked Knowledge Base), plus whatever dynamic variables/overrides you pass at initiation. Prior conversations aren’t automatically injected into the next session.
If you want the agent to “remember” previous calls, implement stateful conversations:
- At call end: receive the transcript/analysis via Post‑call Webhooks and store what you need (e.g., a summary of “previous_topics”). (elevenlabs.io)
- At the next call start: pass that stored context back as dynamic variables (e.g., {{previous_topics}}) or via overrides so it’s included in the new session’s prompt. (elevenlabs.io)
- For inbound telephony, you can also supply these variables through the Twilio personalization webhook before the conversation begins. (elevenlabs.io)
Note: Conversation data can be stored for review/analytics per your retention settings (default 2 years), but storage does not by itself make the agent use previous sessions—persistence of context is achieved by passing it back through dynamic variables/overrides at conversation start. (elevenlabs.io)
Sources:
Post-call webhooks | ElevenLabs Documentation
Twilio personalization | ElevenLabs Documentation
Retention | ElevenLabs Documentation
Then, in this case, the system prompt is crucial, and simulation or edit/tests help find out whether the system prompt is good.
Yes—per ElevenLabs docs, the system prompt is the main lever that defines an agent’s behavior and response style (flow mechanics like turn-taking and language are configured elsewhere). (elevenlabs.io)
How to validate and iterate your system prompt:
- Use the Simulate Conversations guide and API to run scripted scenarios (batch or streaming), supply evaluation criteria, include partial histories, and mock tool responses to see how prompt changes affect outcomes. (elevenlabs.io)
- Use the Agent Testing framework/CLI to automate tests and integrate them into CI so regressions in prompt behavior are caught early. (elevenlabs.io)
Where to put what:
- Keep durable behavior/policies in the system prompt; keep factual/reference material in the Knowledge Base and enable RAG for retrieval rather than stuffing facts into the prompt. (elevenlabs.io)
About “memory” across conversations:
- ElevenLabs documents retention of conversation history for analytics and compliance (configurable up to deletion/zero-retention), but they do not document automatically using past conversations as context for new ones. If you need cross-session context, inject it at conversation start via Dynamic Variables or Overrides. (elevenlabs.io)
In short: yes—the system prompt is crucial, and simulation/edit-tests are the recommended way to prove it’s good before deploying. (elevenlabs.io)
Sources:
Prompting guide | ElevenLabs Documentation
Simulate Conversations | ElevenLabs Documentation
Agent Testing | ElevenLabs Documentation
Retention | ElevenLabs Documentation
Ok, so simulation API can be used with a brand new system prompt, and edit/tests conversation is for "refining" the system prompt.
Mostly correct, with two clarifications:
- Simulation API
- What it’s for: programmatically run full multi‑turn, text‑based conversations against an agent and get an analysis payload (batch or streaming). (elevenlabs.io)
- Using a “brand new” system prompt:
- Recommended: create a new agent version/branch with the updated system prompt, then call the simulate‑conversation endpoint with that agent_id. This keeps production safe and lets you A/B later. (elevenlabs.io)
- Alternative for quick trials: start a normal conversation with a per‑conversation System prompt override (enable overrides in the agent’s Security tab). This is supported for live/sandbox conversations and is often paired with Chat Mode for text‑only testing. (elevenlabs.io)
- Note: the Simulate Conversation endpoints themselves take the target agent_id and a simulation_specification (simulated user, criteria, limits); they don’t document an “agent prompt override” parameter. Use a separate agent/version if you need the prompt change to apply to the simulation. (elevenlabs.io)
- Edit/Tests for refining prompts
- The Agent Testing framework lets you:
- Create Scenario Tests (LLM‑evaluated next‑turn checks) and Tool‑Call Tests. (elevenlabs.io)
- Generate tests directly from past conversations (“Create test from this conversation”) to prevent regressions while iterating on your system prompt, tools, and KB. (elevenlabs.io)
- Run tests in batches and via CLI (elevenlabs agents test <agent_id>) as you refine. (elevenlabs.io)
If you want a fast loop: use Chat Mode + System prompt override to iterate, then codify improvements with Agent Testing and lock them into a new branch/version before running Simulation API at scale. (elevenlabs.io)
Sources:
Simulate conversation | ElevenLabs Documentation
Agent versioning | ElevenLabs Documentation
Overrides | ElevenLabs Documentation
Agent Testing | ElevenLabs Documentation
Chat Mode | ElevenLabs Documentation
So, assuming I have one conversation (that I think is the best so far), but I want to improve the system prompt further, do I need to take that conversation, bring it into tests, and modify things until I get what I want? Does that then mean the system prompt is correct ✅️?
Short answer: Yes—turn that “best” conversation into a repeatable test, iterate your system prompt (and related config) until the test passes, and then validate across a broader suite. A prompt is only “correct” when it passes your whole test suite and, ideally, wins an experiment on real traffic.
Recommended workflow
- Create a test from the real conversation: In Call History, use “Create test from this conversation,” define the expected behavior, and add it to your suite. Run tests in UI or via CLI (elevenlabs agents test <agent_id>). (elevenlabs.io)
- Use simulation to iterate quickly: Reproduce full or partial flows with the Simulate Conversations API, including partial_conversation_history and tool_mock_config to target decision points. (elevenlabs.io)
- Refine the system prompt deliberately: The prompting guide explains what the system prompt controls (behavior/style) and what it doesn’t (flow mechanics). Adjust the prompt, KB, tools, or settings as needed; then rerun tests. (elevenlabs.io)
- Define “done” with automation: Follow the testing loop—write tests, iterate, and refine until passing—to prevent regressions and ship with confidence. It’s not just about a perfect prompt; every change should be validated. (elevenlabs.io)
- Prove it in production: Enable Versioning to isolate changes and run Experiments (A/B tests) against live traffic. Promote the winning variant when metrics improve. (elevenlabs.io)
Note on “memory” across conversations
- Prior calls are not automatically in context; if you want the agent to use past interactions, fetch them and pass at conversation start (Personalization) or during the call via tools. (elevenlabs.io)
If you want, I can sketch a minimal test spec and a simulation payload that mirrors your “best” conversation and highlights which prompt lines to tweak first. (elevenlabs.io)
Sources:
Agent Testing | ElevenLabs Documentation
Simulate Conversations | ElevenLabs Documentation
Prompting guide | ElevenLabs Documentation
Agent versioning | ElevenLabs Documentation
HubSpot | ElevenLabs Documentation
Here’s how to turn your call transcript into a precise, repeatable test and refine your agent accordingly, using only supported ElevenLabs features.
- Memory between calls
- By default, a new conversation does not automatically load context from previous calls. To persist context across sessions, store what you need after the call (via post‑call webhooks) and inject it at the next start using dynamic variables/overrides. The “Stateful conversations” pattern shows exactly this flow. (elevenlabs.io)
- Data retention controls how long transcripts/recordings are stored (default 2 years), but stored data is not auto‑used as context unless you pass it in. (elevenlabs.io)
- Use your transcript as a simulation/test
- Simulate text conversations with your agent via the Simulate Conversation API to reproduce and evaluate flows without making calls. The guide also supports seeding a partial conversation history for targeted replays. (elevenlabs.io)
- API references for batch and streaming simulation are here. (elevenlabs.io)
- If you want persistent, automated checks, create agent tests (LLM/scenario or tool‑call tests). You can run them in the dashboard or CI via the CLI. (elevenlabs.io)
- Apply this to your Romanian transcript
- Agents can auto‑detect and switch languages, and Romanian (RON) is supported. Enable the Language Detection system tool, or set the simulated user’s language to ro in simulations. (elevenlabs.io)
- Minimal simulation example (align fields with the API docs):
- Set simulation_specification.simulated_user_config.first_message to “Alo, da?” and language to “ro”; then add extra turns or evaluation criteria as needed. (elevenlabs.io)
- Retrieve a real call and turn it into a test
- Fetch a past conversation (transcript, analysis, metadata) with GET Conversation, or list conversations to find the ID, then lift key user/agent turns into a simulation or a formal test. (elevenlabs.io)
- Iterate on the system prompt safely
- Use simulations/tests to verify changes; the docs explicitly recommend refining prompts based on simulation outcomes. (elevenlabs.io)
- Use Agent Versioning to branch, test, and roll out improvements without risking production; deploy traffic gradually and merge once tests pass. (elevenlabs.io)
- Ship the improvement
- Update the agent configuration (e.g., system prompt, tools, KB) via the Update Agent API—or commit the change on a versioned branch. (elevenlabs.io)
- Ground answers in your materials
- If address/booking details should be authoritative, place them in the agent’s Knowledge Base and (optionally) enable RAG so the agent retrieves and cites the right chunks during the call. (elevenlabs.io)
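The minimal simulation example mentioned above can be sketched as a Python payload builder. The field names `simulation_specification`, `simulated_user_config`, `first_message`, `language`, and `partial_conversation_history` come from this thread; the exact nesting and the `role`/`message` shape of each history turn are assumptions to confirm against the API reference, and the sample turns are placeholders rather than your real transcript.

```python
from typing import Iterable, Optional, Tuple


def simulation_payload(
    first_message: str,
    language: str,
    history: Optional[Iterable[Tuple[str, str]]] = None,
) -> dict:
    """Body for POST /v1/convai/agents/:agent_id/simulate-conversation."""
    spec = {
        "simulated_user_config": {
            "first_message": first_message,
            "language": language,
        }
    }
    if history:
        # Assumed turn shape: {"role": ..., "message": ...}
        spec["partial_conversation_history"] = [
            {"role": role, "message": message} for role, message in history
        ]
    return {"simulation_specification": spec}


# Placeholder turns; swap in turns lifted from the real call.
payload = simulation_payload(
    "Alo, da?",
    "ro",
    history=[("agent", "<agent greeting>"), ("user", "<user reply>")],
)
```

Leaving `history` out gives a fresh simulation; passing it starts the simulation from that point in the call.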
If you want, I can sketch a concrete simulate_conversation payload for your exact transcript (user/
Ok
Here’s the authoritative answer based on ElevenLabs docs:
- Does the agent “remember” previous conversations? The docs do not describe any automatic carry-over of past transcripts into a new session. A new conversation is initialized from the agent’s configuration (system prompt, knowledge base, tools, voice/language) plus any per-conversation overrides you pass at start (conversation_initiation_client_data with dynamic variables/overrides). Conversation history is retained for analytics and testing, not auto-injected into future sessions. Retention is configurable; default is 2 years. (elevenlabs.io)
- How to test/simulate and refine the prompt
- Programmatic simulation: Use Simulate conversation (POST /v1/convai/agents/:agent_id/simulate-conversation) or the streaming variant to run text simulations. You can seed with partial_conversation_history for “start from here” tests. (elevenlabs.io)
- Turn real calls into tests: In the dashboard, “Create test from this conversation” converts a live transcript into a repeatable test; iterate on your system prompt until tests pass. (elevenlabs.io)
- Reusing a “best” conversation to improve the system prompt
- Recommended workflow: take that conversation → create a test from it (UI) or simulate with it as partial history → iterate on the system prompt/KB until the test passes → save/version the agent. (elevenlabs.io)
Key references:
- Personalization and per-conversation overrides/dynamic variables (what you can pass at start). (elevenlabs.io)
- WebSocket initiation message structure (shows what is provided at session start). (elevenlabs.io)
- Conversation retention defaults and controls. (elevenlabs.io)
- Simulation guide (batch/streaming and partial_conversation_history). (elevenlabs.io)
- Simulate conversation API reference. (elevenlabs.io)
If you want, I can sketch the exact simulate_conversation payload to seed your “best” transcript and the minimal fields to tweak in the system prompt. (elevenlabs.io)
Sources:
Personalization | ElevenLabs Documentation
Simulate conversation | ElevenLabs Documentation
Agent Testing | ElevenLabs Documentation
Agent WebSockets | ElevenLabs Documentation
Retention | ElevenLabs Documentation
Simulate Conversations | ElevenLabs Documentation
Yes, sketch the exact .
Here’s the exact, API-first workflow to: retrieve a prior conversation, turn it into reproducible tests, iterate with simulations, and then commit a refined system prompt to your agent.
- Retrieve a prior conversation
- List and filter conversations (e.g., by agent_id or user_id), then fetch the full transcript for a specific conversation_id.
- List: GET /v1/convai/conversations
- Returns conversation_id, status, transcript_summary, etc. (elevenlabs.io)
- Get details: GET /v1/convai/conversations/:conversation_id
- Returns transcript turns; use these turns to seed tests or simulations. (elevenlabs.io)
- Create a reusable test from that transcript
- Define a pass/fail condition and examples, and include the relevant chat_history extracted from the real call.
- Create test: POST /v1/convai/agent-testing/create
- Body: chat_history[], success_condition, success_examples[], failure_examples[]. (elevenlabs.io)
- (Docs UI/CLI) Tests can also be created from existing conversations and run via CLI or dashboard. (elevenlabs.io)
- Iterate quickly with conversation simulations (offline)
- Run a batch or streaming simulation against your agent, optionally seeding with part of the real transcript and adding evaluation criteria.
- Batch: POST /v1/convai/agents/:agent_id/simulate-conversation
- simulation_specification includes simulated_user_config (first_message or a prompt-driven simulated user), optional partial_conversation_history, tool_mock_config, extra_evaluation_criteria, new_turns_limit. (elevenlabs.io)
- Streaming: POST /v1/convai/agents/:agent_id/simulate-conversation/stream (yields incremental turns + final analysis). (elevenlabs.io)
- The Simulate Conversations guide shows end-to-end examples, including how to pass partial_conversation_history and define extra_evaluation_criteria; the response includes simulated_conversation and analysis (e.g., call_successful, transcript_summary). (elevenlabs.io)
- Update (and version) the agent’s system prompt safely
- Recommended: enable Versioning and work on a branch, then patch the prompt on that branch.
- Enable/manage Versioning, branches, and traffic deployment (A/B and staged rollout). (elevenlabs.io)
- Update agent: PATCH /v1/convai/agents/:agent_id?[branch_id=...]
- Set conversation_config.agent.prompt.prompt to your refined system prompt; include version_description if applicable. (elevenlabs.io)
- Validate changes by running your test suite on the branch
- Run tests against the candidate branch; optionally pass agent_config_override for ephemeral tweaks.
- Run tests: POST /v1/convai/agents/:agent_id/run-tests
- Body: { tests: [{ test_id }], branch_id, agent_config_override? }
- Response includes per-test status
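The update and run-tests request bodies described in the last two steps could be assembled like this. The field paths (`conversation_config.agent.prompt.prompt`, `version_description`, `tests[].test_id`, `branch_id`) are taken from the steps above; the helper names are hypothetical, and the sketch only builds the JSON bodies without sending anything.

```python
from typing import Iterable, Optional


def prompt_patch_body(new_prompt: str, version_description: Optional[str] = None) -> dict:
    """Body for PATCH /v1/convai/agents/:agent_id (optionally ?branch_id=...)."""
    body = {"conversation_config": {"agent": {"prompt": {"prompt": new_prompt}}}}
    if version_description:
        body["version_description"] = version_description
    return body


def run_tests_body(test_ids: Iterable[str], branch_id: Optional[str] = None) -> dict:
    """Body for POST /v1/convai/agents/:agent_id/run-tests."""
    body = {"tests": [{"test_id": test_id} for test_id in test_ids]}
    if branch_id:
        # Target the candidate branch instead of the main configuration.
        body["branch_id"] = branch_id
    return body
```

A typical loop would PATCH the prompt on a branch, then POST the run-tests body with the same `branch_id` and inspect the per-test status in the response.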
What exactly is the structure for creating a test (does it depend on the conversation and the system prompt)?
Short answer:
- There are two complementary ways to “test” an agent:
- Persisted Agent Response Tests (scenario and/or tool-call checks) you create via POST /v1/convai/agent-testing/create. These evaluate the next agent turn given some prior chat history. (elevenlabs.io)
- Ad‑hoc Conversation Simulations you run via POST /v1/convai/agents/:agent_id/simulate-conversation (or …/stream). These generate a full synthetic conversation and an evaluation summary, but they are not persisted tests. (elevenlabs.io)
What a persisted “Agent Response Test” contains (request schema essentials)
Required top-level fields when calling POST /v1/convai/agent-testing/create:
- name (string) — test name. (elevenlabs.io)
- chat_history (array) — prior turns that set up the scenario to evaluate. The framework evaluates only the next agent response (not an entire conversation). Each turn supports role and message (and can include richer metadata like tool_calls if needed). (elevenlabs.io)
- success_condition (string) — an evaluator prompt that must return True/False for the produced agent response. (elevenlabs.io)
- success_examples (array of objects) — examples of acceptable agent replies, e.g. [{response: "...", type: "success"}]. (elevenlabs.io)
- failure_examples (array of objects) — examples of unacceptable replies, e.g. [{response: "...", type: "failure"}]. (elevenlabs.io)
Optional fields:
- tool_call_parameters (object) — validates that the agent calls a specific tool with correct parameters. In the UI/docs this supports Exact match, Regex, or LLM‑evaluated parameter checks. Use this when the “correct next step” is a tool call. (elevenlabs.io)
- check_any_tool_matches (boolean) — pass if any returned tool call matches the criteria. (elevenlabs.io)
- dynamic_variables (map) — variable substitutions to apply to the agent config during testing (useful to test with different IDs, names, etc.). (elevenlabs.io)
- type (enum) — test type (LLM scenario vs tool-focused), if you need to specify. (elevenlabs.io)
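Putting the required fields above together, a test body can be derived from a real transcript roughly as follows. This is a sketch under assumptions: transcript turns are taken to be dicts with `role` ("user" or "agent") and `message`, `build_test_payload` is a hypothetical helper, and the agent's actual reply from the "best" conversation is reused as the first success example.

```python
from typing import List, Sequence


def build_test_payload(
    name: str,
    transcript: List[dict],
    reply_index: int,
    success_condition: str,
    failure_examples: Sequence[str] = (),
) -> dict:
    """Body for POST /v1/convai/agent-testing/create.

    Turns before reply_index become chat_history; the agent turn at
    reply_index is the known-good response the test should protect.
    """
    if not 0 <= reply_index < len(transcript):
        raise ValueError("reply_index is outside the transcript")
    target = transcript[reply_index]
    if target["role"] != "agent":
        raise ValueError("reply_index must point at an agent turn")

    history = [
        {"role": turn["role"], "message": turn["message"]}
        for turn in transcript[:reply_index]
    ]
    return {
        "name": name,
        "chat_history": history,
        "success_condition": success_condition,
        "success_examples": [{"response": target["message"], "type": "success"}],
        "failure_examples": [
            {"response": text, "type": "failure"} for text in failure_examples
        ],
    }
```

Because the test only evaluates the next agent turn, you would typically build one payload per decision point in the conversation rather than one for the whole call.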
How this relates to conversation context and the system prompt
- The behavior under test comes from two places:
- chat_history you provide (the immediate conversational context the evaluator will judge against), and
- the agent’s current configuration (system prompt, KB, tools, and model) because tests/simulations run against a specific agent_id. If you change the prompt/KB, re-run the same tests to verify there are no regressions. (elevenlabs.io)
When to use simulations vs. persisted tests
- Use Conversation Simulation to iterate quickly on prompts, KB, and tools across a whole synthetic dialogue; you can also add extra_evaluation_criteria and (optionally) mock tools or start from partial conversation history. (elevenlabs.io)
- Convert real
So how exactly should I structure the agent's system prompt and KB, and what should go in each?