#Building a WhatsApp bot for internal docs using OpenClaw

neon nest

Hey everyone 👋

Looking for advice on setting up an OpenClaw agent to answer questions about internal company documentation (~38 Markdown files in Google Drive).

The use case:

  • Team asks questions via WhatsApp
  • Bot searches our Google Drive docs and responds with relevant info + citations
  • Basically RAG over our internal procedures

What I'm trying to figure out:

  1. Best approach for the Drive sync skill — should I use an existing skill or build a custom one?
  2. How to handle the semantic search + vector DB part
  3. Whether to build this as a single custom skill or compose multiple existing skills
  4. How to run this as a separate, specialized agent (not main), which is a hard requirement

Has anyone built something similar? Would love to see examples or get pointed to relevant skills from ClawHub.

Running on a dedicated server (no sensitive data on the machine itself). Already comfortable with the basics but this is my first RAG-style agent.

Thanks!

jade fog BOT

For ~38 Markdown files, you can keep this way simpler than a “full” RAG stack at first, and still get great results.

1) Google Drive sync: existing skill vs custom

Best practical approach: don’t “query Drive” live from the agent. Sync Drive → local folder on the server, then let the agent search local files.

Reasons:

  • more reliable + faster at query time
  • easier to add citations (file path + heading/line ranges)
  • avoids OAuth/Drive API flakiness in the hot path

How to implement sync:

  • MVP: rclone sync (or similar) as a scheduled job, writing into something like workspace/data/internal-docs/
  • More integrated: a small custom skill that uses the Google Drive API and tracks fileId, modifiedTime, etc.

If you go the skill route, I’d still keep it as a “sync/indexer” skill that runs on a timer, not on each WhatsApp message.
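
If you build the custom sync skill, the "what changed since last time" decision could look roughly like the sketch below. It only covers the comparison step: `RemoteFile`, `plan_sync`, and the manifest layout (fileId mapped to the modifiedTime seen at the last sync) are hypothetical names for illustration, and the actual Drive API listing and download calls are left out on purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    file_id: str        # Drive fileId
    name: str           # file name, e.g. "onboarding.md"
    modified_time: str  # modifiedTime string as reported by Drive


def plan_sync(
    remote: list[RemoteFile], manifest: dict[str, str]
) -> tuple[list[RemoteFile], list[str]]:
    """Compare a remote listing with the local manifest.

    manifest maps fileId -> modifiedTime recorded at the last successful sync
    (an assumed layout, not something OpenClaw or Drive prescribes).
    Returns (files to download, fileIds whose local copy should be removed).
    """
    # New files are missing from the manifest; changed files have a different timestamp.
    to_download = [f for f in remote if manifest.get(f.file_id) != f.modified_time]
    # Anything we mirrored earlier that no longer exists remotely gets deleted locally.
    remote_ids = {f.file_id for f in remote}
    to_delete = [fid for fid in manifest if fid not in remote_ids]
    return to_download, to_delete
```

The timer job would call `plan_sync`, download and delete accordingly, then write the new manifest only after everything succeeded, so a failed run simply retries next time.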

Docs to orient you:

2) Semantic search + vector DB: what to use

Given the corpus size, you have three sane tiers:

Tier A (start here): “good grep” + LLM

  • Use keyword search (ripgrep) + take top N snippets + have the model answer using only those snippets.
  • Add a simple “query expansion” step (LLM generates 5–10 keywords) if needed.
  • This often beats half-baked embeddings on small corpora, and citations are straightforward.
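
A minimal sketch of Tier A, assuming the synced docs live in a local folder. It uses plain Python regex matching as a stand-in for ripgrep so it runs anywhere; `Snippet` and `keyword_search` are hypothetical names, and the scoring (number of distinct keywords in the snippet) is just one simple choice.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Snippet:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str
    score: int       # number of distinct keywords found in the snippet


def keyword_search(
    root: Path, keywords: list[str], top_n: int = 5, context: int = 2
) -> list[Snippet]:
    """Scan Markdown files under root and return the top_n best-matching snippets."""
    patterns = [re.compile(re.escape(k), re.IGNORECASE) for k in keywords]
    hits: list[Snippet] = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        last_end = 0  # first line index not yet covered by an emitted snippet
        for i, line in enumerate(lines):
            if i < last_end or not any(p.search(line) for p in patterns):
                continue
            # Take a few lines of context around the match, without overlapping
            # the previous snippet from the same file.
            start = max(last_end, i - context)
            end = min(len(lines), i + context + 1)
            text = "\n".join(lines[start:end])
            score = sum(1 for p in patterns if p.search(text))
            hits.append(Snippet(str(path), start + 1, end, text, score))
            last_end = end
    # Stable sort: ties keep file order.
    hits.sort(key=lambda s: s.score, reverse=True)
    return hits[:top_n]
```

The path plus line range on each snippet is what makes citations cheap here. The keywords can come straight from the user's message or from the optional query expansion step.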

Tier B: embeddings, but keep storage simple

  • Chunk Markdown by heading (or ~400–800 tokens)
  • Store {chunkText, sourcePath, heading, anchor, modifiedTime, embedding[]} in SQLite
  • Do cosine similarity in-code
  • Re-embed only changed chunks during sync
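
Here is a rough sketch of Tier B end to end: heading-based chunking, SQLite storage, a content hash to skip unchanged chunks, and cosine similarity in plain Python. All function names and the table layout are assumptions for illustration (anchor and modifiedTime columns are omitted for brevity), and `toy_embed` is a hashed bag-of-words placeholder, not a semantic embedding; you would pass your real embedding call as `embed`.

```python
import hashlib
import json
import math
import re
import sqlite3
from typing import Callable

Embedder = Callable[[str], list[float]]
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, text) chunks, one per heading section.
    Simplification: '#' lines inside fenced code blocks are treated as headings too."""
    chunks, heading, buf = [], "", []
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            if any(s.strip() for s in buf):
                chunks.append((heading, "\n".join(buf).strip()))
            heading, buf = m.group(1).strip(), [line]
        else:
            buf.append(line)
    if any(s.strip() for s in buf):
        chunks.append((heading, "\n".join(buf).strip()))
    return chunks


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks (source_path TEXT, pos INTEGER, "
        "heading TEXT, chunk_text TEXT, content_hash TEXT, embedding TEXT, "
        "PRIMARY KEY (source_path, pos))"
    )
    return db


def index_file(db: sqlite3.Connection, source_path: str, markdown: str, embed: Embedder) -> int:
    """Upsert the chunks of one file; returns how many chunks were (re-)embedded."""
    chunks = chunk_by_heading(markdown)
    embedded = 0
    for pos, (heading, text) in enumerate(chunks):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = db.execute(
            "SELECT content_hash FROM chunks WHERE source_path=? AND pos=?",
            (source_path, pos),
        ).fetchone()
        if row and row[0] == digest:
            continue  # unchanged chunk: keep the stored embedding
        db.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?,?,?,?,?,?)",
            (source_path, pos, heading, text, digest, json.dumps(embed(text))),
        )
        embedded += 1
    # Drop leftover rows if the file now has fewer chunks than before.
    db.execute("DELETE FROM chunks WHERE source_path=? AND pos>=?", (source_path, len(chunks)))
    db.commit()
    return embedded


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(db: sqlite3.Connection, query: str, embed: Embedder, top_k: int = 3):
    """Return up to top_k (score, source_path, heading, chunk_text) tuples, best first."""
    q = embed(query)
    rows = db.execute("SELECT source_path, heading, chunk_text, embedding FROM chunks").fetchall()
    scored = [(cosine(q, json.loads(e)), p, h, t) for p, h, t, e in rows]
    scored.sort(key=lambda r: r[0], reverse=True)
    return scored[:top_k]


def toy_embed(text: str, dims: int = 256) -> list[float]:
    """Placeholder embedder (hashed bag of words). Swap in a real embedding model."""
    vec = [0.0] * dims
    for w in re.findall(r"[a-z0-9]+", text.lower()):
        vec[int(hashlib.sha256(w.encode("utf-8")).hexdigest(), 16) % dims] += 1.0
    return vec
```

At this corpus size a full table scan per query is fine; the brute-force loop only starts to hurt at a scale where Tier C becomes worth considering anyway.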

Tier C: external vector DB

  • Qdrant / pgvector / LanceDB if you expect growth (hundreds/thousands of docs) or multi-tenant needs.

My bias: Tier A → Tier B when you feel pain. Tier C only when you know you need it.

3) One big skill vs composing multiple skills

Compose. It stays debuggable.

A clean split is:

  1. drive_sync (or rclone) → maintains local mirror
  2. index_build (optional) → builds embeddings/metadata store
  3. doc_search tool → returns structured results (snippets + citations)
  4. answering agent → uses only tool output, formats response + citations

That division also makes it easy to swap retrieval later without rewriting the WhatsApp bot logic.
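
One way to keep that boundary explicit, sketched under assumptions: the search tool returns a small structured type, and the answering side only ever sees those results. `SearchResult`, `Retriever`, `build_prompt`, and `answer_context` are hypothetical names for illustration, not OpenClaw APIs, and the prompt wording is just an example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchResult:
    snippet: str
    source_path: str
    heading: str


class Retriever(Protocol):
    """Anything with this method can back doc_search: grep, SQLite embeddings, a vector DB."""

    def search(self, query: str, top_k: int) -> list[SearchResult]: ...


def build_prompt(question: str, results: list[SearchResult]) -> str:
    """Number each snippet so the model can cite it as [1], [2], ..."""
    if not results:
        return (
            f"Question: {question}\n\n"
            "No matching documentation was found. Say so instead of guessing."
        )
    sources = "\n\n".join(
        f"[{i}] {r.source_path} > {r.heading}\n{r.snippet}"
        for i, r in enumerate(results, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )


def answer_context(retriever: Retriever, question: str, top_k: int = 3) -> str:
    """Glue step: run retrieval, then build the text handed to the answering agent."""
    return build_prompt(question, retriever.search(question, top_k))
```

Swapping Tier A for Tier B later then means writing a new class with a `search` method; the WhatsApp-facing side and the citation format stay untouched.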

4) Separate specialized agent (not main)