Hi, guys. I'm planning to implement query rewriting for follow-up questions when RAG is involved, and I'd like your feedback on the approach.
I'm using LangGraph and LangChain.
Context:
The chat can use RAG (searchProjectDocuments) or not
The RAG will only be used if the agent detects that the input is talking about the document/case.
If RAG is not used, the LLM responds using chat context (GPT knowledge)
I want to improve follow-up questions when RAG was used in the previous turn, but I'm not sure whether this approach is sound.
Proposed Logic:
When to rewrite:
First user message: Never rewrite (use original query)
Subsequent messages:
If searchProjectDocuments was called in the last assistant response → Rewrite the current user query (it's probably another question about the case/document)
If searchProjectDocuments was not called → No rewriting (use original query)
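A rough sketch of that check in plain Python, to make the rule concrete. The message shape here (dicts with `role` and an optional `tool_calls` list of `{"name": ...}` entries) is an assumption for illustration; it loosely mirrors how LangChain exposes tool calls on assistant messages, but it is not the real message classes. Note that it scans every assistant message since the previous user message, because a tool-calling turn usually produces more than one assistant message (the tool call, then the final answer).

```python
from typing import Any

# Tool name taken from the post; everything else below is illustrative.
RAG_TOOL_NAME = "searchProjectDocuments"


def rag_used_in_previous_turn(messages: list[dict[str, Any]]) -> bool:
    """Return True if the RAG tool was called during the previous turn.

    `messages` is assumed to end with the current user message.
    """
    # Skip the current user message, then walk back until the previous one.
    for msg in reversed(messages[:-1]):
        if msg["role"] == "user":
            break  # reached the start of the previous turn
        if msg["role"] == "assistant":
            for call in msg.get("tool_calls") or []:
                if call.get("name") == RAG_TOOL_NAME:
                    return True
    return False


def should_rewrite(messages: list[dict[str, Any]]) -> bool:
    """Apply the rule: never on the first message, otherwise only after RAG."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    if user_turns <= 1:
        return False
    return rag_used_in_previous_turn(messages)
```

In a LangGraph setup, a function like `should_rewrite` could serve as the routing condition that decides whether the rewrite node runs before the agent node.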
Flow Example:
Turn 1:
User: "Tell me about the case"
→ LLM decides: call searchProjectDocuments
→ RAG executed with original query
→ Documents returned
Turn 2:
User: "And what are the main requests?"
→ Check: Was RAG called last turn? YES
→ Call Rewrite query node: returns "What are the main requests in the case?"
→ Pass rewritten query to LLM
→ LLM decides: call searchProjectDocuments
→ RAG executed with rewritten query
→ Returns NEW documents (only from rewritten query, not mixing with turn 1 docs)
Turn 3:
User: "What is litigation finance?"
→ Check: Was RAG called last turn? ✅ YES (it was)
→ Rewrite query: the node is called, but it identifies that this is not a question about the document/case and just returns the original query.
→ Pass original query to LLM
→ LLM decides: DON'T call searchProjectDocuments ❌
→ Responds using chat context (GPT knowledge); the rewrite step doesn't affect the result
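The rewrite step in the flow above could be sketched roughly like this. It is only a sketch under assumptions: `llm` is a hypothetical callable that takes a prompt string and returns text (in practice a thin wrapper around the chat model), the prompt wording is made up, and `history` is simplified to `(role, text)` pairs. If the model returns nothing usable, it falls back to the original query, which matches the Turn 3 behaviour.

```python
from typing import Callable

# Illustrative instructions; the exact wording is an assumption.
REWRITE_INSTRUCTIONS = (
    "Rewrite the user's latest question as a standalone question about the "
    "case/document, using the conversation for context. If the question is "
    "not about the case/document, return it unchanged."
)


def build_rewrite_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Build the prompt from prior (role, text) pairs and the new question."""
    lines = [f"{role}: {text}" for role, text in history]
    return (
        f"{REWRITE_INSTRUCTIONS}\n\n"
        "Conversation:\n" + "\n".join(lines) + "\n\n"
        f"Latest question: {question}\n"
        "Standalone question:"
    )


def rewrite_query(
    history: list[tuple[str, str]],
    question: str,
    llm: Callable[[str], str],
) -> str:
    """Return the rewritten query, or the original if the model returns nothing."""
    candidate = llm(build_rewrite_prompt(history, question)).strip()
    return candidate if candidate else question
```

Keeping the model behind a plain callable makes the node easy to test with a stub before wiring in a real model.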
Continue below 👇