#RAG agent for SEC 10-K filings — hallucination guard with hybrid retrieval


bright tiger

Spent 3 days building a RAG agent for SEC 10-K filings — wanted to share
what worked for hallucination control.

Many PDF question-answering tools confidently invent financial numbers. Ask
"How much did the CEO earn?" and they'll fabricate a figure, even though
that info usually isn't in the 10-K itself (it's in the DEF 14A proxy
statement).

What worked: section-aware chunking + BM25/vector hybrid retrieval + a
"Not found in the filing" prompt pattern with few-shot examples.

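A minimal sketch of how those three pieces could fit together, using only the standard library. Everything named here (`chunk_by_section`, `reciprocal_rank_fusion`, `build_prompt`, the heading regex, the few-shot wording) is hypothetical and not taken from the repo. Reciprocal rank fusion is one common way to merge BM25 and vector rankings, swapped in for illustration; the actual project may combine them differently. Real filings also repeat Item headings in the table of contents, which this sketch does not handle.

```python
import re
from collections import defaultdict

# Assumed heading pattern: "Item 1.", "Item 1A.", "Item 7A." at the start of a line.
ITEM_HEADING = re.compile(r"^Item\s+(\d+[A-C]?)\.", re.IGNORECASE | re.MULTILINE)
NOT_FOUND = "Not found in the filing"

# Few-shot template; the example excerpts are placeholders, not real filing text.
PROMPT_TEMPLATE = """Answer using only the excerpts. Cite the section in brackets.
If the excerpts do not contain the answer, reply exactly: {not_found}

Example 1
Excerpts: [Item 7] (placeholder text about revenue trends)
Question: How much did the CEO earn?
Answer: {not_found}

Example 2
Excerpts: [Item 1A] (placeholder text naming supply chain disruption as a risk)
Question: What risk does the filing name?
Answer: Supply chain disruption [Item 1A]

Excerpts: {excerpts}
Question: {question}
Answer:"""


def chunk_by_section(text, max_chars=1000):
    """Split at Item headings, then into pieces of at most max_chars.

    Every chunk keeps its section label so an answer can cite it.
    """
    matches = list(ITEM_HEADING.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.start():end].strip()
        section = f"Item {m.group(1).upper()}"
        for start in range(0, len(body), max_chars):
            chunks.append({"section": section, "text": body[start:start + max_chars]})
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of chunk ids (e.g. one from BM25, one from vectors)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def build_prompt(question, chunks):
    """Fill the few-shot template with section-labelled excerpts."""
    excerpts = "\n".join(f"[{c['section']}] {c['text']}" for c in chunks)
    return PROMPT_TEMPLATE.format(
        not_found=NOT_FOUND, excerpts=excerpts, question=question
    )
```

In use, the two ranked id lists would come from the BM25 and vector retrievers, the fused order picks the top chunks, and `build_prompt` wraps them so the model has an explicit refusal string to fall back on.
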
Benchmarked on Apple's 10-K — 92.3% pass rate, 100% citation precision,
all adversarial "not found" questions handled correctly.
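
For anyone wanting to run a similar benchmark, here is a minimal scoring sketch. The metric definitions are assumptions for illustration, not necessarily how the numbers above were computed: a case passes if an adversarial question gets the refusal string or an answerable one contains an expected substring, and citation precision is the share of cited sections that match the expected ones. The `score` function and the case fields are hypothetical names.

```python
import re

NOT_FOUND = "Not found in the filing"
# Assumed citation format: a section label in square brackets, e.g. [Item 7].
CITATION = re.compile(r"\[(Item \d+[A-C]?)\]")


def score(cases, answers):
    """Return pass rate and citation precision for paired cases and answers.

    Each case is a dict with:
      expected_sections: set of section labels (empty set = adversarial question)
      must_contain: substring a correct answer should include (unused if adversarial)
    """
    passed = cited = correct = 0
    for case, answer in zip(cases, answers):
        refused = NOT_FOUND in answer
        if not case["expected_sections"]:
            ok = refused  # adversarial: the only right answer is a refusal
        else:
            ok = not refused and case["must_contain"].lower() in answer.lower()
        passed += ok
        for section in CITATION.findall(answer):
            cited += 1
            correct += section in case["expected_sections"]
    return {
        "pass_rate": passed / len(cases) if cases else None,
        "citation_precision": correct / cited if cited else None,
    }
```

Under these definitions a refusal on an answerable question lowers the pass rate but leaves citation precision untouched, so the two numbers measure different failure modes.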

Stack: LangChain + Llama 3.3 on Groq + ChromaDB + bge-small-en embeddings.

If anyone wants to check it out:
Demo: https://elnar5-tenk-analyst-agent.hf.space
Code: github.com/Elnar5/tenk-analyst-agent

Still figuring out multi-doc retrieval for comparison queries — anyone
tackled that?