#Custom LLM works = Gemini Flash + tools OK. Switch agent LLM to OpenAI (Custom LLM) and it immediat
Short answer: none of those surfaces expose the upstream LLM provider’s raw HTTP status and body.
- Dashboard Call History “details”: Shows a human‑readable error message explaining why the call failed; it does not surface the raw upstream provider response. (help.elevenlabs.io)
- Browser DevTools while previewing: The preview talks to ElevenLabs over a WebSocket and you’ll only see the documented conversation events in frames (e.g., user_transcript, agent_response, audio). There is no event that carries a provider HTTP status/body. (elevenlabs.io)
- GET /v1/convai/conversations/:conversation_id: The schema documents status (including "failed"), transcript, metadata, etc., but no failure_reason or provider error fields. (elevenlabs.io)
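To confirm this for a specific conversation, a minimal sketch like the following builds the GET request and reduces the response to the fields the schema is described as documenting. The API host `https://api.elevenlabs.io` and the `xi-api-key` header are assumptions not stated in this thread, and `build_conversation_request` and `summarize` are hypothetical helper names.

```python
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumption: public API host


def build_conversation_request(conversation_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one conversation."""
    url = f"{API_BASE}/v1/convai/conversations/{conversation_id}"
    # assumption: the API key travels in the xi-api-key header
    return urllib.request.Request(url, headers={"xi-api-key": api_key}, method="GET")


def summarize(conversation: dict) -> dict:
    """Keep only status, transcript length, and whether metadata is present."""
    return {
        "status": conversation.get("status"),
        "transcript_turns": len(conversation.get("transcript") or []),
        "has_metadata": "metadata" in conversation,
    }
```

To use it, pass the request to `urllib.request.urlopen`, parse the body with `json.load`, and feed the result to `summarize`. A `status` of "failed" with no further error field is the most this endpoint is documented to return.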
What you can do instead (documented):
- If you host your own Custom LLM server, you may emit diagnostic metadata back to the client via the agent_response_metadata event. This is optional metadata you define; it’s not the upstream provider’s raw HTTP response. (elevenlabs.io)
- For call initiation failures (not LLM generation errors), you can enable “call initiation failure” post‑call webhooks to receive failure reasons and metadata. (elevenlabs.io)
- If you need the exact upstream error for BYO OpenAI keys, check the provider’s own logs/dashboards, or contact ElevenLabs Support. (help.elevenlabs.io)
Sources:
Where can I locate the reason for my call failing? – ElevenLabs
WebSocket | ElevenLabs Documentation
Get conversation details | ElevenLabs Documentation
Client events | ElevenLabs Documentation
Post-call webhooks | ElevenLabs Documentation
How can I contact support? – ElevenLabs
Understood EL won’t expose raw upstream OpenAI error.
Given: Gemini Flash works perfectly (tools + agent), switching LLM to OpenAI Custom LLM fails instantly with:
custom_llm generation failed
conv_9301khqakh91e98vjs1t20bk7v6h
What’s the best deterministic way to isolate the cause without upstream logs?
Can someone share a checklist of the known failing misconfigs for ElevenLabs Custom LLM (OpenAI key) — specifically:
- correct model string(s) they accept
- whether they hit /v1/responses vs /v1/chat/completions
- whether tools/function calling must be enabled
- minimum safe timeouts
- any common OpenAI org/project permission gotchas
Below is a deterministic isolation playbook and a concise checklist based only on ElevenLabs’ official docs.
Deterministic isolation playbook (no upstream logs required)
- Verify endpoint shape and streaming: Your Custom LLM URL must implement either /v1/chat/completions or /v1/responses, return Server‑Sent Events with Content‑Type: text/event-stream, stream chunks as lines prefixed with “data: {json}\n\n”, and terminate with “data: [DONE]\n\n”. A non‑streaming or malformed SSE response will fail immediately. (elevenlabs.io)
- Confirm tools/function calling: If your agent has system tools enabled, the LLM must emit OpenAI‑format function calls; requests will include a tools array when tools are configured. Temporarily remove tools in the agent to A/B test whether tool calling is the failure point. (elevenlabs.io)
- Check model ID and config: Use a valid provider model string; OpenAI models in ElevenAgents include gpt‑4o, gpt‑4.1, etc. Set the model in the Custom LLM config to match the upstream provider’s expected ID. (elevenlabs.io)
- Validate request extras: If “Custom LLM extra body” is enabled, ElevenLabs includes an elevenlabs_extra_body object in the payload. If you are proxying, strip or handle elevenlabs_extra_body (and map user_id → user) before forwarding. If you are pointing directly to a provider endpoint, disable or ensure the provider tolerates unknown fields. (elevenlabs.io)
- Use the sample proxy to isolate: Point your agent at the example FastAPI server from the docs, log the incoming payload, then forward to your provider. This deterministically reveals mismatched fields, model IDs, and tool payloads while satisfying the SSE contract. (elevenlabs.io)
- Inspect Call History error surface: The Dashboard Call History shows a red error reason when calls fail; use it to rule out workspace/agent misconfig (allowlists, missing dynamic variables, limits) vs LLM generation issues. (help.elevenlabs.io)
- Remember fallback behavior: With Custom LLM configured, ElevenLabs does not cascade to other models; repeated retries target your Custom LLM only. Immediate failure therefore indicates your Custom LLM request path is failing. (elevenlabs.io)
- Optional (enterprise): Use Real‑time Monitoring to watch events on a live conversation to confirm whether any tool call or first chunks arrive before failure. (elevenlabs.io)
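The SSE contract in the first step can be checked offline. The sketch below assumes you have captured a raw response body from your Custom LLM endpoint as a string. `check_sse_body` is a hypothetical helper, and it only checks the framing described above (the `data: ` prefix, JSON payloads, the `[DONE]` terminator), not the chunk schema.

```python
import json


def check_sse_body(body: str) -> list[str]:
    """Return a list of framing problems found in a raw SSE body (empty list = OK)."""
    events = [e for e in body.split("\n\n") if e.strip()]
    if not events:
        return ["empty body"]
    problems = []
    if events[-1].strip() != "data: [DONE]":
        problems.append("stream does not terminate with 'data: [DONE]'")
    for i, event in enumerate(events):
        line = event.strip()
        if not line.startswith("data: "):
            problems.append(f"event {i} is not prefixed with 'data: '")
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            continue
        try:
            json.loads(payload)
        except json.JSONDecodeError:
            problems.append(f"event {i} is not valid JSON")
    return problems
```

This sketch assumes `\n\n` event separators. A server that emits `\r\n` line endings would need the body normalized first.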
Known misconfigurations checklist (OpenAI “Custom LLM”)
- Model strings: Must be valid provider model IDs. For OpenAI, ElevenAgents documents gpt‑4o, gpt‑4.1, gpt‑4.1‑mini/nano, etc. Use exactly the upstream model name in the agent’s Model field. (elevenlabs.io)
- Endpoint used: ElevenLabs supports both /v1/chat/completions and /v1/responses for Custom LLM. Your URL chooses which one is called; both must stream SSE as described above. (elevenlabs.io)
- Tools/function calling: If system tools are configured, your LLM must support OpenAI‑style function calling; tools are injected into the tools parameter and the model must return a function call in standard format. If not using tools, test with tools disabled. (elevenlabs.io)
- Extra body handling: If “Custom LLM extra body” is on, your endpoint
Quick check: in ElevenLabs “Custom LLM” are you in:
A) BYO OpenAI key mode (no URL), or
B) Custom LLM Server URL mode?
If A (BYO key): what exact OpenAI model IDs does ElevenLabs accept in that field—does it require ElevenLabs’ documented IDs (gpt-4o / gpt-4.1), or can we pass raw OpenAI IDs like gpt-4o-mini?
If B (server URL): can you share a known-good minimal SSE response example that ElevenLabs accepts?
Short answer from the docs:
A) BYO OpenAI key mode
- Model IDs: Use OpenAI’s own model IDs (example in docs uses gpt-4o; the Models page also references OpenAI’s GPT-4o-mini), so gpt-4o and gpt-4o-mini are valid examples. (elevenlabs.io)
- API shape: ElevenLabs supports both OpenAI’s /v1/chat/completions and /v1/responses formats for Custom LLM; configure the Custom LLM with your OpenAI key and model in Secrets and the Custom LLM settings. Enable “Custom LLM extra body” if you need to pass through extra parameters. (elevenlabs.io)
B) Custom LLM Server URL mode
- Required streaming format: Your endpoint must stream Server‑Sent Events with Content-Type: text/event-stream. Each chunk is sent as a line starting with data: {json}\n\n and the stream must terminate with data: [DONE]\n\n. (elevenlabs.io)
- Known‑good minimal SSE example (Chat Completions style):
  - Headers: Content-Type: text/event-stream
  - Body lines:
    - data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}
    - data: [DONE]
This mirrors the structure shown in the docs’ FastAPI example and “buffer words” chunk. (elevenlabs.io)
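A minimal sketch of a generator that produces frames in this shape is shown below. `sse_frames` is a hypothetical helper and the chunk id is a placeholder. The closing chunk with `finish_reason: "stop"` follows OpenAI's Chat Completions streaming convention and is an addition to the two-line example above. A FastAPI or similar server would return these frames from a streaming response with `Content-Type: text/event-stream`.

```python
import json
import time
from typing import Iterable, Iterator


def sse_frames(pieces: Iterable[str], model: str = "gpt-4o") -> Iterator[str]:
    """Yield Chat Completions style SSE frames for each text piece, then [DONE]."""
    created = int(time.time())

    def chunk(delta: dict, finish_reason):
        return {
            "id": "chatcmpl-1",  # placeholder id
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"delta": delta, "index": 0, "finish_reason": finish_reason}],
        }

    for piece in pieces:
        yield f"data: {json.dumps(chunk({'content': piece}, None))}\n\n"
    # Assumption: close the message with an empty delta and finish_reason "stop".
    yield f"data: {json.dumps(chunk({}, 'stop'))}\n\n"
    # Terminator required by the streaming contract described above.
    yield "data: [DONE]\n\n"
```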
Notes
- System tools/function calling: If you enable system tools in your agent, your Custom LLM must support OpenAI‑style function calling; ElevenLabs will include a tools array in requests. (elevenlabs.io)
Sources:
Integrate your own model | ElevenLabs Documentation
Update: the agent does work for normal voice replies.
The failure happens specifically when the conversation reaches a tool call (or right after tool execution) — audio cuts off / agent stops, and the run ends with:
"custom_llm generation failed"
Conv ID: conv_9301khqakh91e98vjs1t20bk7v6h
Gemini Flash handles the exact same tools perfectly.
So: OpenAI Custom LLM can generate speech, but fails on tool-calling or on continuing generation after tool results.
What are the known causes in ElevenLabs for “works until tools” on Custom LLM?
- tool call format mismatch (function/tool schema)
- responses vs chat/completions incompatibility
- missing required streaming frames when tool results are returned
- tool timeout too low (mine was response_timeout_secs=1 earlier)
- extra body fields breaking OpenAI request when tools are included
Any known-good reference config for OpenAI + tools would be amazing.
Here’s what ElevenLabs officially documents about Custom LLMs (incl. BYO OpenAI key) and tool-calling, plus the most deterministic checks to isolate “works until tools” failures:
What ElevenLabs expects at tool-call time
- The model must support function calling and structured outputs. ElevenLabs will include a tools array in the OpenAI-format request; your LLM must emit tool calls in standard OpenAI function-call format. Example function-call objects and a full request with tools are shown in the docs. (elevenlabs.io)
- Custom LLMs may implement either OpenAI Chat Completions (/v1/chat/completions) or Responses (/v1/responses); both formats are supported. If you host a Custom LLM server, responses must be streamed as SSE (text/event-stream) and must terminate with data: [DONE]. (elevenlabs.io)
- If you pass “Custom LLM extra body,” ElevenLabs will include elevenlabs_extra_body in the payload; if you proxy to OpenAI yourself (Custom LLM Server URL mode), you must remove that field before forwarding or OpenAI may reject the request. The server example shows popping elevenlabs_extra_body. (elevenlabs.io)
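For the proxy case, a minimal sketch of the payload clean-up might look like this. `prepare_upstream_payload` is a hypothetical helper. The two transformations (removing `elevenlabs_extra_body` and mapping `user_id` to `user`) are the ones mentioned earlier in this thread. Everything else, including the `tools` array, is passed through unchanged.

```python
import copy


def prepare_upstream_payload(incoming: dict) -> tuple[dict, dict]:
    """Return (payload safe to forward to OpenAI, extracted extra body)."""
    payload = copy.deepcopy(incoming)  # never mutate the logged original
    # Remove the ElevenLabs-specific field so the provider does not reject it.
    extra = payload.pop("elevenlabs_extra_body", None) or {}
    # Map user_id to the OpenAI-style user field, if present.
    if "user_id" in payload:
        payload["user"] = payload.pop("user_id")
    return payload, extra
```

Logging `incoming` before the call and `payload` after it gives a before/after pair to compare when a request fails only once tools are included.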
Known causes for “works for normal replies, fails at tools”
- Model lacks tool/function-calling support. ElevenLabs explicitly notes you need a model that supports tool use; their provider guides call out tool-capable model families. If the chosen OpenAI model doesn’t support function calling, tool steps will fail. (elevenlabs.io)
- Deprecated tool wiring. ElevenLabs deprecated embedding tools directly in agent prompts; POST/PATCH with prompt.tools is rejected after July 23, 2025. Ensure tools are created/updated via the Tools API and referenced by IDs, or via built‑in system tools. (elevenlabs.io)
- Custom LLM server streaming/format issues. If you run your own endpoint, the stream must be valid SSE and end with [DONE]. Invalid/missing termination or non‑OpenAI chunk shapes commonly break the run at the first tool cycle. (elevenlabs.io)
- Extra body field leaked upstream. Forwarding elevenlabs_extra_body to OpenAI (instead of removing it server‑side) will cause provider errors. (elevenlabs.io)
- Tool timeout. Tools include response_timeout_secs; if the tool’s response exceeds this, the platform treats it as a timeout. Ensure the value is sufficient for your webhook/client tool. (Examples show 20s defaults and configurable values.) (elevenlabs.io)
- Provider-side behavior change. ElevenLabs previously tracked a gpt‑4o incident where finish_reason changed from tool_calls to stop, breaking tool invocation; gpt‑4o‑mini worked while the issue was live. If you’re using gpt‑4o via BYO key and hit tool-call stalls, test gpt‑4o‑mini as a control. (status.elevenlabs.io)
- No fallback on Custom LLM failure. LLM cascading is bypassed for Custom LLMs; any upstream error/timeout won’t fall back to hosted models and will fail the run. (elevenlabs.io)
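Several of these causes can be checked against a captured stream. The sketch below assumes you have the parsed chunks from a tool-call turn, for example logged by a proxy. It reassembles the streamed tool calls in OpenAI Chat Completions format and flags the two symptoms discussed above: a `finish_reason` other than `tool_calls`, and arguments that are not valid JSON. `assemble_tool_calls` and `diagnose` are hypothetical helper names.

```python
import json
from typing import Optional


def assemble_tool_calls(chunks: list[dict]) -> tuple[list[dict], Optional[str]]:
    """Merge streamed tool_call deltas by index; return (calls, last finish_reason)."""
    calls: dict[int, dict] = {}
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for part in delta.get("tool_calls") or []:
                call = calls.setdefault(part["index"], {"id": None, "name": "", "arguments": ""})
                if part.get("id"):
                    call["id"] = part["id"]
                fn = part.get("function") or {}
                call["name"] += fn.get("name") or ""
                # Arguments arrive as string fragments that must be concatenated.
                call["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return [calls[i] for i in sorted(calls)], finish_reason


def diagnose(chunks: list[dict]) -> list[str]:
    """Return human-readable problems for a tool-call turn (empty list = OK)."""
    calls, finish_reason = assemble_tool_calls(chunks)
    problems = []
    if calls and finish_reason != "tool_calls":
        problems.append(f"tool calls streamed but finish_reason is {finish_reason!r}")
    for call in calls:
        try:
            json.loads(call["arguments"])
        except json.JSONDecodeError:
            problems.append(f"arguments for {call['name'] or '<unnamed>'} are not valid JSON")
    return problems
```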
Deterministic isolation checklist
- Verify tool-capable model: choose an OpenAI model that supports function calling; ElevenLabs’ guidance stresses this requirement. (elevenlabs.io)
- Confirm request/
For production voice agents using tools (Shopify actions, cart updates, etc.), what is currently the most stable and cost-efficient OpenAI model in BYO mode?
Is gpt-4o-mini the recommended baseline for tool-heavy flows?
Short answer: Yes—use OpenAI GPT-4o mini as your baseline.
- ElevenLabs’ tools guidance explicitly recommends “high intelligence models like GPT-4o mini” for tool use, and advises avoiding Gemini 1.5 Flash for this purpose. (elevenlabs.io)
- The Models page calls out GPT-4o-mini as a good choice when latency matters (typical for real‑time, tool‑heavy voice agents). (elevenlabs.io)
- Historical signal: an incident affected tool calling on gpt‑4o while gpt‑4o‑mini continued to work; a later changelog also notes a fix specific to gpt‑4o‑mini tool calling after an OpenAI change. This supports using gpt‑4o‑mini as the safer default in production. (Inference based on official records.) (status.elevenlabs.io)
BYO OpenAI setup notes (for stability with tools):
- Model ID: gpt-4o-mini; either OpenAI-style endpoint is supported. ElevenLabs accepts both /v1/chat/completions and /v1/responses as long as you stream SSE and return standard OpenAI tool-calling payloads. (elevenlabs.io)
- In Agents > Custom LLM, store your OPENAI_API_KEY as a Secret, set the server URL, model, and enable “Custom LLM extra body.” (elevenlabs.io)
Operational tip:
- If you need automatic fallbacks for maximum uptime, note that enabling a Custom LLM disables cascading to other hosted models; cascading is available only when you use the built-in model picker. (elevenlabs.io)
Cost note:
- ElevenLabs docs frame cost-efficiency as “right-size the model”; for low-latency conversational agents, GPT‑4o‑mini is listed among the suitable choices. Evaluate provider-side token pricing for your volume. (elevenlabs.io)
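Conclusion: For production voice agents that rely on tool calls, GPT‑4o‑mini is a reasonable default OpenAI choice in BYO mode: ElevenLabs' tools guidance names it, and the incident history above suggests it has been the more stable option for tool calling. Confirm cost against current provider pricing for your volume. (elevenlabs.io)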
Sources:
Client tools | ElevenLabs Documentation
Models | ElevenLabs Documentation
[ConvAI] Tool calling on gpt4o isn't working - ElevenLabs Status
Integrate your own model | ElevenLabs Documentation
LLM Cascading | ElevenLabs Documentation
In the Advanced tab of the agent there is a section called "Keywords separated by comma". What is this for?
Short answer: it’s the agent’s ASR keywords list.
What the UI field maps to
- The “Keywords (comma‑separated)” field in the Advanced tab corresponds to conversation_config.asr.keywords in the agent configuration. In the API this is an array of strings, for example:
"asr": { "keywords": ["hello", "world"], ... }. (elevenlabs.io)
What the docs do and don’t state
- ElevenLabs’ public docs expose the keywords field in the agent schema but do not publish a behavioral description of how it is used at runtime. (elevenlabs.io)
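As an illustration of the mapping from the UI field to the API shape, the sketch below turns a comma-separated string into the `conversation_config.asr.keywords` array. `keywords_patch` is a hypothetical helper. Trimming whitespace and dropping empty entries is an assumption about how the field is parsed, not documented behavior.

```python
def keywords_patch(field_value: str) -> dict:
    """Build the agent-config fragment for a comma-separated keywords string."""
    # Assumption: entries are trimmed and empty entries are ignored.
    keywords = [k.strip() for k in field_value.split(",") if k.strip()]
    return {"conversation_config": {"asr": {"keywords": keywords}}}
```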
Related, documented STT biasing feature
- If you’re looking for a documented way to bias transcription toward specific terms, ElevenLabs documents “keyterm prompting” for Speech‑to‑Text (Scribe v2), which lets you supply a list of terms to bias the model when transcribing. Note: this is documented for the Speech‑to‑Text API, not specifically for Agents UI. (elevenlabs.io)
Sources:
Get agent | ElevenLabs Documentation
Keyterm prompting | ElevenLabs Documentation
Hi there! Do you have the agent configured on the Shopify store with admin access?