Hi everyone. Can you give me some feedback? Here's how my setup currently works, but I don't think it's right and I believe I'm wasting tokens and context. I'm using LangGraph.
I ask question 1: "Analyze the risks of my case" -> a RAG retrieval runs and the LLM does the analysis.
I ask question 2: "And what are the main requests of the case?" -> The RAG runs again, because the router doesn't check whether the current conversation context is already sufficient to answer. It just detects that the question is about the case and calls the RAG tool again.
Ideally, the router should look at the conversation context and call the RAG only if necessary, right? Otherwise, I'll be filling up my context every time I ask a question about anything in the documentation, even if it's already there.
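A rough sketch of that idea, in plain Python rather than real LangGraph code. Everything here is hypothetical: `ChatState`, `loaded_sources`, `needed_sources`, and the keyword table are made-up names, and the keyword match is only a stand-in for whatever the router would really use (most likely an LLM call). It also assumes the first retrieval pulled in enough of the case file to answer follow-ups, which may not hold.

```python
from dataclasses import dataclass, field


@dataclass
class ChatState:
    """Hypothetical conversation state; field names are illustrative only."""
    question: str
    # Sources whose chunks are already sitting in the conversation context.
    loaded_sources: set[str] = field(default_factory=set)


# Assumed mapping from question keywords to the source they refer to.
KEYWORD_TO_SOURCE = {"case": "case_file"}


def needed_sources(question: str) -> set[str]:
    """Placeholder detector: which sources does this question refer to?"""
    words = {w.strip("?.,!\"'").lower() for w in question.split()}
    return {src for kw, src in KEYWORD_TO_SOURCE.items() if kw in words}


def route(state: ChatState) -> str:
    """Return the next node name: "rag" only if a needed source is missing."""
    missing = needed_sources(state.question) - state.loaded_sources
    return "rag" if missing else "answer"


state = ChatState(question="Analyze the risks of my case")
first = route(state)                    # nothing loaded yet, so retrieve
state.loaded_sources.add("case_file")   # the RAG node would record this
state.question = "And what are the main requests of the case?"
second = route(state)                   # case file already in context
```

A function shaped like `route` is the kind of thing that could be passed to LangGraph's `add_conditional_edges`, with the returned strings mapped to node names. The important part is that it only looks at a small record of what was already retrieved, not at the full message history.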
What I don't know is whether doing this check beforehand would itself cost a lot of tokens compared with just running the RAG, since the router would have to read the entire conversation context to decide whether to run the RAG or not. What would your approach be?
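To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. All values are made-up placeholders, not measurements: `turn_tokens` is the assumed size of one question plus answer, and `chunk_tokens` is the assumed size of what one RAG call adds to the context. It also assumes the best case for the check, where RAG is only needed on the first turn, and the worst case for its cost, where the router re-reads the whole history every turn.

```python
def prompt_tokens(turns: int, turn_tokens: int, chunk_tokens: int,
                  check_first: bool) -> int:
    """Total input tokens read across a conversation (very rough model)."""
    history = 0  # tokens currently in the conversation context
    total = 0    # input tokens billed so far
    for turn in range(turns):
        history += turn_tokens        # new question (answer lumped in)
        if check_first:
            total += history          # router call reads the whole history
            run_rag = turn == 0       # assumed: only the first turn needs RAG
        else:
            run_rag = True            # current setup: RAG on every question
        if run_rag:
            history += chunk_tokens   # retrieved chunks stay in the context
        total += history              # answering call reads everything
    return total


# Placeholder sizes: 200 tokens per turn, 2000 tokens per retrieval.
always_rag_short = prompt_tokens(2, 200, 2000, check_first=False)
check_first_short = prompt_tokens(2, 200, 2000, check_first=True)
always_rag_long = prompt_tokens(5, 200, 2000, check_first=False)
check_first_long = prompt_tokens(5, 200, 2000, check_first=True)
```

Under these placeholder numbers, checking first costs slightly more over 2 turns (7200 vs 6600) but less over 5 turns (24000 vs 33000), because the chunks that were not re-retrieved never have to be re-read. Real numbers will depend on chunk size and on how much history the router actually needs to see.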