Building a WhatsApp bot for internal docs using OpenClaw | Friends of the Crustacean 🦞🤝

neon nest Feb 18, 2026, 12:59 AM


Hey everyone 👋

Looking for advice on setting up an OpenClaw agent to answer questions about internal company documentation (~38 Markdown files in Google Drive).

The use case:

Team asks questions via WhatsApp
Bot searches our Google Drive docs and responds with relevant info + citations
Basically RAG over our internal procedures

What I'm trying to figure out:

Best approach for the Drive sync skill — should I use an existing skill or build a custom one?
How to handle the semantic search + vector DB part
Whether to build this as a single custom skill or compose multiple existing skills
How to run this as a separate, specialized agent (it must not be the main agent)

Has anyone built something similar? Would love to see examples or get pointed to relevant skills from ClawHub.

Running on a dedicated server (no sensitive data on the machine itself). Already comfortable with the basics but this is my first RAG-style agent.

Thanks!

jade fogBOT Feb 18, 2026, 1:01 AM


For ~38 Markdown files, you can keep this way simpler than a “full” RAG stack at first, and still get great results.

1) Google Drive sync: existing skill vs custom

Best practical approach: don’t “query Drive” live from the agent. Sync Drive → local folder on the server, then let the agent search local files.

Reasons:

more reliable + faster at query time
easier to add citations (file path + heading/line ranges)
avoids OAuth/Drive API flakiness in the hot path

How to implement sync:

MVP: rclone sync (or similar) as a scheduled job, writing into something like workspace/data/internal-docs/
More integrated: a small custom skill that uses the Google Drive API and tracks fileId, modifiedTime, etc.
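
A rough sketch of the MVP route, assuming rclone is installed and a Drive remote has already been configured. The remote name `gdrive:internal-docs` is hypothetical, and the destination is just the example path from above. It only wraps `rclone sync` with an `--include` filter so that nothing but Markdown lands in the mirror:

```python
import subprocess
from pathlib import Path


def build_rclone_cmd(remote: str, dest: Path) -> list[str]:
    """Build an `rclone sync` command that mirrors only Markdown files."""
    return [
        "rclone", "sync", remote, str(dest),
        "--include", "*.md",  # skip everything that is not a .md file
    ]


def run_sync(remote: str, dest: Path) -> None:
    """Mirror the remote folder into dest; raises if rclone exits non-zero."""
    dest.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_rclone_cmd(remote, dest), check=True)


# Example call (hypothetical remote name, needs a configured rclone remote):
# run_sync("gdrive:internal-docs", Path("workspace/data/internal-docs"))
```

Run it from cron or a systemd timer. Note that `sync` makes the destination match the source, so files deleted in Drive also disappear from the local mirror.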

If you go the skill route, I’d still keep it as a “sync/indexer” skill that runs on a timer, not on each WhatsApp message.
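
If you do write the custom sync skill, the change tracking can be very small. Here is a minimal sketch of the "what changed since last run" step, assuming you have already listed the folder and kept each file's `id` and `modifiedTime` from the Drive API. `RemoteFile`, `plan_sync`, and the `known` state dict are illustrative names, not part of any OpenClaw or Google API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    file_id: str
    name: str
    modified_time: str  # RFC 3339 timestamp string, kept exactly as listed


def plan_sync(
    remote: list[RemoteFile], known: dict[str, str]
) -> tuple[list[RemoteFile], list[str]]:
    """Compare a fresh listing against the state saved by the last sync.

    `known` maps file_id -> modified_time recorded last time.
    Returns (files to download, file_ids to delete locally).
    """
    # New files are missing from `known`; changed files have a different timestamp.
    to_fetch = [f for f in remote if known.get(f.file_id) != f.modified_time]
    remote_ids = {f.file_id for f in remote}
    to_delete = [fid for fid in known if fid not in remote_ids]
    return to_fetch, to_delete
```

Comparing timestamps for inequality (instead of parsing and ordering them) keeps this robust even if a file is restored to an older revision.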

Docs to orient you:


Skills overview: https://docs.openclaw.ai/tools/skills
ClawHub (finding/installing skills): https://docs.openclaw.ai/tools/clawhub

2) Semantic search + vector DB: what to use

Given the corpus size, you have three sane tiers:

Tier A (start here): “good grep” + LLM

Use keyword search (ripgrep) + take top N snippets + have the model answer using only those snippets.
Add a simple “query expansion” step (LLM generates 5–10 keywords) if needed.
This often beats half-baked embeddings on small corpora, and citations are straightforward.
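
To make Tier A concrete, here is a minimal sketch of the retrieval half. It uses a plain Python scan as a stand-in for ripgrep (so it runs without extra tools) and scores each line by how many keywords it contains. `Snippet` and `keyword_search` are made-up names for illustration, and the scoring is deliberately naive:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Snippet:
    path: str        # path relative to the docs root, used as the citation
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str
    score: int       # number of distinct keywords found on the matching line


def keyword_search(
    root: Path, keywords: list[str], top_n: int = 5, context: int = 2
) -> list[Snippet]:
    """Return the top_n best-matching snippets with file path and line range."""
    terms = [k.lower() for k in keywords]
    hits: list[Snippet] = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            lowered = line.lower()
            score = sum(1 for t in terms if t in lowered)
            if score == 0:
                continue
            # Keep a few lines of context around the match for the model.
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            hits.append(Snippet(
                path.relative_to(root).as_posix(), lo + 1, hi,
                "\n".join(lines[lo:hi]), score,
            ))
    hits.sort(key=lambda s: s.score, reverse=True)  # stable: ties keep file order
    return hits[:top_n]
```

Neighbouring matches can produce overlapping snippets; merging them is a worthwhile refinement once the basics work. The path plus line range is what goes into the citation.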

Tier B: embeddings, but keep storage simple

Chunk Markdown by heading (or ~400–800 tokens)
Store {chunkText, sourcePath, heading, anchor, modifiedTime, embedding[]} in SQLite
Compute cosine similarity in application code (a brute-force scan over all chunks is fine at this size)
Re-embed only changed chunks during sync
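
Here is a compact sketch of Tier B under some loud assumptions: `toy_embed` is a hashed bag-of-words stand-in, not a real embedding model (swap in whatever provider you use), the table layout is my own, and it keys chunks on (source path, heading), which assumes headings are unique within a file. `anchor` and `modifiedTime` are left out to keep it short:

```python
import hashlib
import json
import math
import re
import sqlite3


def chunk_by_heading(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, text) pairs at ATX headings (# ... ######).

    Naive: a '#' line inside a fenced code block would also be treated as a heading.
    """
    chunks, heading, buf = [], "(intro)", []
    for line in markdown.splitlines() + ["# "]:  # sentinel heading flushes the last chunk
        m = re.match(r"^#{1,6}\s+(.*)", line)
        if not m:
            buf.append(line)
            continue
        if any(s.strip() for s in buf):
            chunks.append((heading, "\n".join(buf).strip()))
        heading, buf = m.group(1).strip(), []
    return chunks


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Placeholder embedding: hashed bag of words. Replace with a real model."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "source_path TEXT, heading TEXT, chunk_text TEXT, text_hash TEXT, "
        "embedding TEXT, PRIMARY KEY (source_path, heading))"
    )
    return db


def index_document(db, source_path: str, markdown: str, embed=toy_embed) -> int:
    """Upsert chunks, re-embedding only those whose text changed. Returns count embedded.

    Does not remove chunks whose heading disappeared; a real sync should.
    """
    embedded = 0
    for heading, text in chunk_by_heading(markdown):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = db.execute(
            "SELECT text_hash FROM chunks WHERE source_path = ? AND heading = ?",
            (source_path, heading),
        ).fetchone()
        if row and row[0] == digest:
            continue  # unchanged chunk: keep the stored embedding
        db.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?, ?)",
            (source_path, heading, text, digest, json.dumps(embed(text))),
        )
        embedded += 1
    db.commit()
    return embedded


def search(db, query: str, k: int = 3, embed=toy_embed):
    """Brute-force cosine search; returns (score, source_path, heading, text) tuples."""
    q = embed(query)
    rows = db.execute("SELECT source_path, heading, chunk_text, embedding FROM chunks")
    scored = [(cosine(q, json.loads(e)), p, h, t) for p, h, t, e in rows]
    scored.sort(key=lambda r: r[0], reverse=True)
    return scored[:k]
```

Storing the embedding as JSON text is wasteful but easy to inspect; for a few hundred chunks, loading every row and scoring it in Python is fast enough that no vector index is needed.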


Tier C: external vector DB

Qdrant / pgvector / LanceDB if you expect growth (hundreds/thousands of docs) or multi-tenant needs.

My bias: Tier A → Tier B when you feel pain. Tier C only when you know you need it.

3) One big skill vs composing multiple skills

Compose. It stays debuggable.

A clean split is:

drive_sync (or rclone) → maintains local mirror
index_build (optional) → builds embeddings/metadata store
doc_search tool → returns structured results (snippets + citations)
answering agent → uses only tool output, formats response + citations

That division also makes it easy to swap retrieval later without rewriting the WhatsApp bot logic.
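
One way to keep that boundary explicit is a small interface between retrieval and answering. This is a sketch with invented names (`Hit`, `Retriever`, `build_context`, `format_reply`), not an OpenClaw API; how the tool is actually registered with the agent depends on the skill setup:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    source_path: str
    heading: str
    snippet: str


class Retriever(Protocol):
    """Anything with this method can back doc_search: grep, SQLite, a vector DB."""

    def search(self, query: str, k: int) -> list[Hit]: ...


class StaticRetriever:
    """Trivial implementation, handy for testing the bot without a real index."""

    def __init__(self, hits: list[Hit]) -> None:
        self._hits = hits

    def search(self, query: str, k: int) -> list[Hit]:
        return self._hits[:k]


def build_context(hits: list[Hit]) -> str:
    """Number each snippet so the model can cite it as [1], [2], ..."""
    return "\n\n".join(
        f"[{i}] {h.source_path} > {h.heading}\n{h.snippet}"
        for i, h in enumerate(hits, 1)
    )


def format_reply(answer: str, hits: list[Hit]) -> str:
    """Append a plain-text source list suitable for a chat message."""
    if not hits:
        return "I couldn't find this in the docs."
    sources = "\n".join(
        f"[{i}] {h.source_path} ({h.heading})" for i, h in enumerate(hits, 1)
    )
    return f"{answer}\n\nSources:\n{sources}"
```

The answering agent only ever sees `build_context(...)` output, so moving from Tier A to Tier B means writing a new `Retriever` and nothing else.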
