# Does RAG size influence latency for ElevenAgents?
AI-generated Answer for klara_56475
Short answer: indirectly.
- Small knowledge bases are injected directly into the LLM context to avoid added latency; RAG is only enabled once the KB is large enough that retrieval is more efficient than direct inclusion. (elevenlabs.io)
- When RAG is enabled, the system retrieves only the relevant chunks and is engineered to keep per‑query overhead low and stable even on large enterprise KBs (median RAG latency of ~155 ms after ElevenLabs' redesign). In practice, overall KB size is not the primary driver of latency. (elevenlabs.io)
- What does influence latency is the RAG configuration (e.g., embedding model, maximum retrieved chunks, vector distance), which the docs note can affect both latency and cost. (elevenlabs.io)
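
To make the distinction above concrete, here is a minimal Python sketch of the general pattern: a small KB is passed to the LLM in full, while a large KB goes through retrieval that returns a fixed number of chunks. This is not ElevenLabs' implementation. The names `Chunk`, `build_context`, and `estimate_tokens`, the word-count token estimate, and the 2000-token threshold are all assumptions made for illustration; the actual cutoff is not stated in the answer.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cutoff: the real threshold at which RAG kicks in is not documented here.
DIRECT_INJECTION_TOKEN_LIMIT = 2000


@dataclass
class Chunk:
    text: str
    score: float  # assumed relevance to the current query; higher is better


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_context(
    kb_chunks: list[Chunk],
    max_chunks: int = 3,
    token_limit: int = DIRECT_INJECTION_TOKEN_LIMIT,
) -> tuple[str, list[str]]:
    """Return the mode used ("direct" or "rag") and the texts sent to the LLM."""
    total_tokens = sum(estimate_tokens(c.text) for c in kb_chunks)

    # Small KB: include everything, with no retrieval step at all.
    if total_tokens <= token_limit:
        return "direct", [c.text for c in kb_chunks]

    # Large KB: keep only the highest-scoring chunks, capped at max_chunks.
    top = sorted(kb_chunks, key=lambda c: c.score, reverse=True)[:max_chunks]
    return "rag", [c.text for c in top]
```

In this sketch the amount of text handed to the LLM in the retrieval branch is bounded by `max_chunks`, not by the total KB size, which illustrates why settings such as the maximum number of retrieved chunks can matter more for latency than how large the KB is.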
Sources:
Deploying enterprise knowledge to voice agents
How we engineered RAG to be 50% faster
Retrieval-Augmented Generation | ElevenLabs Documentation
What are differences between embedding models?