#explain the RAG configuration settings. What is the role of each adjustable parameter
Below are the RAG settings you can configure in ElevenLabs Agents and what each one does, as documented:
- Use RAG (toggle)
  - Enables the retrieval pipeline so the agent searches your knowledge base instead of stuffing all content into the prompt. Expect a small extra latency (~500 ms). When enabled, you can attach larger knowledge-base items to an agent (per-item limit increases from 300 KB to 10 MB; total limits vary by plan). (elevenlabs.io)
- Embedding model
  - Chooses the model used to embed queries and documents for vector search. This affects retrieval quality and is also specified when (re)indexing documents and in the agent config. API fields: model (for indexing) and embedding_model (for agent config). (elevenlabs.io)
- Maximum document chunks
  - Caps how many chunks can be retrieved per query. This limits how much context is sent to the LLM and helps control latency and cost. In the API, this is exposed as max_retrieved_rag_chunks_count. (elevenlabs.io)
- Maximum vector distance
  - Sets the similarity threshold for retrieved chunks. A higher distance allows less-similar chunks (more recall) but can reduce relevance and quality. (elevenlabs.io)
- Maximum retrieved content length (characters)
  - Limits the total size of retrieved text passed to the LLM. In agent config this appears as max_documents_length; the docs agent example sets “Maximum retrieved content: 50,000 characters.” (elevenlabs.io)
- Document usage mode (per document)
  - Auto: document is retrieved only when relevant.
  - Prompt: document is always included in the system prompt (and can also be retrieved), which risks exceeding context if overused. (elevenlabs.io)
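As a rough sketch of how these settings might sit together in an agent's configuration, the dictionary below uses the field names mentioned above (embedding_model, max_retrieved_rag_chunks_count, max_documents_length). The enabled and max_vector_distance keys, the embedding model placeholder, the check_rag_settings helper, and every numeric value other than the 50,000-character example are assumptions for illustration, not confirmed API names or defaults.

```python
# Illustrative RAG settings only; consult the API reference for real field names and values.
rag_settings = {
    "enabled": True,                            # assumed key for the "Use RAG" toggle
    "embedding_model": "<embedding-model-id>",  # placeholder, not a real model name
    "max_retrieved_rag_chunks_count": 5,        # hypothetical value: cap on chunks per query
    "max_vector_distance": 0.5,                 # assumed key and value: similarity threshold
    "max_documents_length": 50_000,             # characters, as in the docs agent example
}


def check_rag_settings(settings: dict) -> list[str]:
    """Hypothetical sanity check: return a list of problems (empty if none found)."""
    problems = []
    if settings["max_retrieved_rag_chunks_count"] < 1:
        problems.append("chunk cap must be at least 1")
    if settings["max_vector_distance"] <= 0:
        problems.append("vector distance must be positive")
    if settings["max_documents_length"] < 1:
        problems.append("content length must be at least 1 character")
    return problems


print(check_rag_settings(rag_settings))
```

A check like this is only a local convenience before sending a config; the platform's own validation rules may differ.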
Notes on trade-offs (from the docs):
- Retrieving more chunks increases LLM cost; widening the vector distance may add context but hurt relevance. Tune for your use case and monitor performance/cost. (elevenlabs.io)
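To make the trade-off concrete, here is a minimal sketch of how a distance threshold, a chunk cap, and a character budget could interact in a generic retrieval step. This illustrates the general pattern only and is not ElevenLabs' actual implementation; the Chunk class, the select_chunks function, the sample texts, and the distance values are all hypothetical, and the sketch assumes whole chunks are dropped rather than truncated when the budget runs out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    distance: float  # smaller means more similar to the query


def select_chunks(chunks, max_chunks, max_distance, max_chars):
    """Apply the three limits in order: distance filter, chunk cap, character budget."""
    # 1. Drop chunks that are farther from the query than the threshold.
    close_enough = [c for c in chunks if c.distance <= max_distance]
    # 2. Keep the most similar chunks, up to the chunk cap.
    ranked = sorted(close_enough, key=lambda c: c.distance)[:max_chunks]
    # 3. Stop adding chunks once the character budget would be exceeded.
    selected, used = [], 0
    for chunk in ranked:
        if used + len(chunk.text) > max_chars:
            break
        selected.append(chunk)
        used += len(chunk.text)
    return selected


candidates = [
    Chunk("Refunds are issued within 14 days.", 0.21),
    Chunk("Shipping takes 3 to 5 business days.", 0.48),
    Chunk("Our office dog is named Biscuit.", 0.83),
]

strict = select_chunks(candidates, max_chunks=5, max_distance=0.3, max_chars=1000)
loose = select_chunks(candidates, max_chunks=5, max_distance=0.9, max_chars=1000)
print(len(strict), len(loose))  # prints "1 3"
```

With the wider threshold, the off-topic third chunk is also passed to the LLM, which is the "more context but less relevance" effect described above.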
Additional references on RAG in ElevenLabs Agents and performance:
- Integrated RAG in the platform and latency considerations/optimizations. (elevenlabs.io)
Sources:
Retrieval-Augmented Generation | ElevenLabs Documentation
ElevenLabs Conversational AI 2.0 voice agents now live | ElevenLabs
explain in depth about maximum document chunks, maximum vector distance, maximum retrieved content length