# RAGus.ai - RAG Data Pipeline with OpenAI Vector Store Integration


honest walrus

Hey guys, I built RAGus - a tool that solves the "garbage in, garbage out" problem for RAG.

The problem: Most RAG implementations fail not because of the LLM, but because of bad chunking and poor retrieval. You scrape a website, split it every 500 tokens, and suddenly one chunk holds half a phone number, or a question ends up separated from its answer.
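
To make that failure mode concrete, here's a minimal Python sketch of naive fixed-size chunking. It counts whitespace-separated words instead of model tokens (a simplification) and uses a chunk size of 8 so the problem shows up on a tiny FAQ. `fixed_size_chunks` and the sample text are made up for illustration.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words.

    Whitespace-separated words stand in for tokens here; a real pipeline
    would count model tokens, but the boundary problem is the same.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


faq = (
    "Q: How do I reset my password? "
    "A: Click 'Forgot password' on the login page and follow the email link. "
    "Q: How do I contact support? "
    "A: Call us at +1 555 010 7788 between 9am and 5pm."
)

chunks = fixed_size_chunks(faq, size=8)
for n, chunk in enumerate(chunks):
    print(n, chunk)
```

With `size=8`, chunk 0 ends with the first question and "A:", while the actual answer starts in chunk 1; chunk 3 ends with "+1 555" and chunk 4 starts with "010 7788", so neither half of the phone number is retrievable on its own.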

What RAGus does:

🌐 Sitemap Scraping - Scrapes entire websites via sitemap, cleans out nav/footer junk
🧠 Agentic Chunking - Uses an LLM to decide topic boundaries (not arbitrary token limits)
✨ Metadata Enrichment - Generates a summary plus pre-generated user questions for each chunk (tackles the vocabulary mismatch between how users phrase questions and how the page is worded)
📊 AI Analytics - Analyzes chatbot conversations: sentiment, dropoffs, knowledge gaps, frustration detection
⏰ Auto-Sync - Scheduler keeps KB fresh when source content changes
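
The post doesn't show how sitemap scraping and auto-sync work under the hood. One common approach, sketched here as an assumption rather than RAGus's actual code: read page URLs from a standard sitemaps.org `<urlset>` file, hash each page's cleaned text, and only re-process pages whose hash changed since the last sync. `sitemap_urls`, `content_hash`, and `pages_to_resync` are hypothetical helpers, and fetching plus nav/footer cleanup are stubbed out with an in-memory dict.

```python
import hashlib
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a <urlset> sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def content_hash(text: str) -> str:
    """Fingerprint a page's cleaned text so unchanged pages can be skipped."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pages_to_resync(pages: dict[str, str], known_hashes: dict[str, str]) -> list[str]:
    """Return URLs whose cleaned text is new or changed since the last sync."""
    return [
        url for url, text in pages.items()
        if known_hashes.get(url) != content_hash(text)
    ]


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/faq</loc></url>
</urlset>"""

urls = sitemap_urls(sitemap)

# Stand-in for fetched pages after nav/footer cleanup (no network calls here).
pages = {
    urls[0]: "Pro plan: $49/month",
    urls[1]: "Q: Can I cancel? A: Anytime.",
}
# Hashes stored at the previous sync: pricing changed since then, FAQ did not.
known_hashes = {
    urls[0]: content_hash("Pro plan: $39/month"),
    urls[1]: content_hash("Q: Can I cancel? A: Anytime."),
}

stale = pages_to_resync(pages, known_hashes)
print(stale)  # ['https://example.com/pricing']
```

Hashing the cleaned text rather than the raw HTML means a changed footer or tracking script doesn't trigger a pointless re-chunk; a scheduler would simply run this comparison on an interval.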

Integrations:

🟢 OpenAI Vector Stores (via Assistants API)
🔷 Voiceflow KB
🟣 Qdrant

Built with FastAPI + React. We've seen retrieval accuracy go from ~40% to 90%+ just by adding the metadata layer.
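
The post doesn't say how retrieval accuracy was measured or how the metadata gets indexed. As a rough illustration of the vocabulary-mismatch idea, here's a toy Python sketch that assumes (1) a plain word-overlap retriever standing in for the embedding search a real vector store does, and (2) "accuracy" meaning hit@1, the share of test queries whose top result is the right chunk. The chunks, summaries, questions, and queries are invented, and `terms`, `retrieve`, and `hit_rate` are hypothetical helpers.

```python
import re
from typing import Optional

# Tiny stopword list so filler words don't count as matches.
STOPWORDS = {"a", "an", "the", "is", "my", "i", "do", "does", "how", "what",
             "when", "will", "can", "to", "your"}

# Raw chunk text plus enrichment metadata. Per the post, RAGus has an LLM
# generate the summary and questions; here they are hardcoded for illustration.
CHUNKS = {
    "billing": {
        "text": "Subscriptions renew automatically each month and the card on file is charged.",
        "summary": "Explains automatic monthly renewal and card charges.",
        "questions": ["How do I cancel my subscription?", "When will my card be charged?"],
    },
    "returns": {
        "text": "Unused items may be sent back within 30 days for a full refund.",
        "summary": "Describes the 30-day refund window for unused items.",
        "questions": ["What is your return policy?", "Can I get my money back?"],
    },
    "shipping": {
        "text": "Orders leave our warehouse within two business days.",
        "summary": "States warehouse dispatch time for orders.",
        "questions": ["How long does delivery take?", "When will my order ship?"],
    },
}

# (user query, id of the chunk that should answer it)
EVAL = [
    ("how long does delivery take", "shipping"),
    ("can i get my money back", "returns"),
    ("cancel subscription", "billing"),
]


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retrieve(query: str, index: dict[str, set[str]]) -> Optional[str]:
    """Return the chunk id sharing the most terms with the query, or None if none overlap."""
    q = terms(query)
    best_id, best_score = None, 0
    for chunk_id, chunk_terms in index.items():
        score = len(q & chunk_terms)
        if score > best_score:
            best_id, best_score = chunk_id, score
    return best_id


def hit_rate(index: dict[str, set[str]]) -> float:
    """Fraction of EVAL queries whose top-1 result is the expected chunk."""
    hits = sum(retrieve(query, index) == expected for query, expected in EVAL)
    return hits / len(EVAL)


raw_index = {cid: terms(c["text"]) for cid, c in CHUNKS.items()}
enriched_index = {
    cid: terms(" ".join([c["text"], c["summary"], *c["questions"]]))
    for cid, c in CHUNKS.items()
}

raw_rate = hit_rate(raw_index)
enriched_rate = hit_rate(enriched_index)
print(f"raw hit@1: {raw_rate:.2f}, enriched hit@1: {enriched_rate:.2f}")
# raw hit@1: 0.33, enriched hit@1: 1.00
```

Three made-up queries say nothing about the 40% to 90%+ figure above. The point is only that a query like "cancel subscription" shares no exact terms with the raw billing text ("Subscriptions renew..."), but matches the generated question "How do I cancel my subscription?" once that question is indexed alongside the chunk.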

Happy to give access if anyone wants to try it. Also open to adding more integrations (Pinecone, Weaviate, etc.).
https://ragus.ai

RAGus.ai

No-code web scraper that transforms websites into structured knowledge bases for AI chatbots. Manage Voiceflow KB tables and OpenAI vector stores with 90-99% RAG accuracy.