#RAG Flow Help


topaz vapor
#

Hi everyone. Could you give me some feedback? Here is how my flow is set up. I don't think it's right, and I believe I'm wasting tokens and context. I'm using LangGraph.

I ask question 1: "Analyze the risks of my case" -> the RAG retrieval runs and the LLM does the analysis.

I ask question 2: "And what are the main requests of the case?" -> The RAG retrieval runs again, because the router doesn't check whether the current conversation context is already sufficient to answer. It just sees that the question is about the case and calls the RAG tool again.

Ideally, the router should analyze based on the context of the conversation and call the RAG only if necessary, right? Otherwise, I'll be filling up my context every time I ask a question about anything in the documentation, even if it's already there.

The thing is, I don't know whether doing that check up front would also cost a lot of tokens, you know? The model would have to read the entire conversation just to decide whether to run the RAG or not. What would your approach be?

proven torrent
#

@topaz vapor i'm not clear on how you're wasting tokens? is the number of documents you're adding as context increasing as the conversation goes on or something?

topaz vapor
#

@proven torrent Hi. Right now I always call the RAG tool, but I don't think that's the way to go, because I'd be adding repeated chunks to the conversation and consuming context tokens. Correct?

#

In this case, I was thinking of the following:

Before running another retrieval, I would make a call to the model first. If it didn't have enough context to respond, I would then call the RAG tool and make a second call to the model.
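
A minimal sketch of that check-first idea, in plain Python rather than LangGraph. Everything named here is an assumption for illustration: `llm` and `retrieve` are hypothetical callables standing in for the model and the RAG tool, and the `NEED_CONTEXT` marker is made up, not part of any library.

```python
from typing import Callable, List

SENTINEL = "NEED_CONTEXT"  # hypothetical marker the model is asked to emit


def answer_check_first(
    question: str,
    history: List[str],
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Try to answer from the conversation alone; fall back to RAG on a miss."""
    convo = "\n".join(history)
    first = llm(
        f"Conversation so far:\n{convo}\n\n"
        f"Question: {question}\n"
        f"If the conversation lacks the facts needed, reply exactly {SENTINEL}."
    )
    if first.strip() != SENTINEL:
        return first  # one model call, no retrieval

    # Miss: retrieve, then call the model a second time with the documents.
    docs = retrieve(question)
    context = "\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```

The trade-off shows up in the two branches: a hit costs one model call over the whole history, while a miss costs that call plus the retrieval plus a second model call.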

I'm a beginner and I'm working alone on this project, so I'm still a bit lost. Do you think this would be a good approach?

topaz vapor
#

Could you explain in more detail please, step by step? I'm just starting out in this field.

For this query, for example: "Analyze the risks of my case" what would you do?

#

I'm using LangGraph. I migrated to it from the AI SDK.

proven torrent
#

for the first query you don't have to do anything. just RAG as normal. it's just follow up questions like "And what are the main requests of the case?". you make an LLM request like: "given the last queries and your answers, rewrite this query so that it contains all necessary information to understand the query in isolation". then you do a RAG on that new query the LLM gives you. you don't add the documents from the first query to your context, but just the new documents retrieved from the RAG of the new query
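
roughly, as a sketch in plain Python. `llm` is a hypothetical callable (prompt in, text out), the function names are made up, and the prompt layout is only illustrative:

```python
from typing import Callable, List, Tuple

REWRITE_INSTRUCTION = (
    "Given the last queries and your answers, rewrite this query so that it "
    "contains all necessary information to understand the query in isolation."
)


def build_rewrite_prompt(turns: List[Tuple[str, str]], follow_up: str) -> str:
    """Assemble the rewrite prompt from prior (question, answer) pairs."""
    lines = []
    for question, answer in turns:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    history = "\n".join(lines)
    return f"{REWRITE_INSTRUCTION}\n\n{history}\n\nQuery: {follow_up}\nRewritten query:"


def rewrite_query(
    turns: List[Tuple[str, str]],
    follow_up: str,
    llm: Callable[[str], str],
) -> str:
    """Return a standalone version of follow_up to use as the RAG query."""
    if not turns:
        return follow_up  # first query: nothing to resolve, RAG as normal
    return llm(build_rewrite_prompt(turns, follow_up)).strip()
```

the string that comes back from `rewrite_query` is what goes to the retriever, not the raw follow-up.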

topaz vapor
#

Thank you very much, man. The flow would be like this, right?

RAG Follow-up Pattern (Clean Approach)

This is a classic RAG follow-up issue. The correct solution is query rewriting, not reusing past context or documents.

Step-by-step:

1️⃣ First question

  • Run RAG normally
  • Generate the answer
  • ❌ Do NOT keep retrieved documents
  • ✅ Keep only the question + a short structured answer

2️⃣ Follow-up questions

  • Make a small LLM call to rewrite the follow-up into a standalone query
  • Prompt example:

"Given the previous questions and answers, rewrite the user's last question so it can be understood in isolation."

3️⃣ Run RAG again

  • Use the rewritten query
  • Retrieve fresh documents
  • Pass only these documents to the LLM

Why this works

  • No duplicated chunks in context
  • History grows only by short question + answer pairs, not by retrieved documents
  • The rewrite call reads only that short history, so it stays cheap
  • Predictable and scalable

Rule of thumb
👉 Always RAG, but always with a self-contained query, never with accumulated context.
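
The steps above can be sketched as a single turn function. This is plain Python under stated assumptions, not LangGraph code: `llm` and `retrieve` are hypothetical callables, `ChatState` and `run_turn` are made-up names, and truncating the answer is only a stand-in for producing a real "short structured answer".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ChatState:
    # Only (question, short answer) pairs persist; retrieved chunks never do.
    turns: List[Tuple[str, str]] = field(default_factory=list)


def run_turn(
    state: ChatState,
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    max_answer_chars: int = 300,  # assumed cap, tune for your case
) -> str:
    query = question
    if state.turns:
        # Step 2: small LLM call that rewrites the follow-up as a standalone query.
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in state.turns)
        query = llm(
            "Given the previous questions and answers, rewrite the user's last "
            "question so it can be understood in isolation.\n\n"
            f"{history}\n\nLast question: {question}"
        ).strip()

    # Steps 1 and 3: retrieve fresh documents and pass only those to the LLM.
    docs = retrieve(query)
    answer = llm("Context:\n" + "\n".join(docs) + f"\n\nQuestion: {query}")

    # Keep the question plus a shortened answer; the documents are dropped here.
    state.turns.append((question, answer[:max_answer_chars]))
    return answer
```

In LangGraph, the rewrite, retrieve, and answer steps would presumably each become a node, with `ChatState` playing the role of the graph state.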

proven torrent
#

yeah

topaz vapor
#

That helped me a lot. Thank you! I have a lot of questions at my new job, and this implementation is all on my shoulders. The people I work with don't know anything about it either, and AI isn't helping much.