- Bot consistently fails to understand tasks, its own architecture, and agents.
- Bot lacks agency to propose or implement solutions; often explains how user could do things instead of acting autonomously.
- Agents return nothing or perform incorrect actions; bot reports success when tasks were not completed.
- Token usage is high/wasted despite attempted token-reduction policies.
- Bot overcomplicates simple tasks (e.g., building a dashboard) and fails to deliver functioning results.
- Repeated setup attempts (four different setups).
- Models tested:
- Opus 4.6 (used for setup/initiation)
- Kimi 2.5
- Google Gemini Pro
- Google Gemini Flash
- Grok
- Multiple agents created for different purposes; agents either return nothing or perform wrong actions.
- One agent was given access to documentation to self-diagnose/fix but did not improve.
- Bot does not understand what it should or should not do.
- Bot claims tasks are completed when they are not.
- When errors are pointed out, bot acknowledges but then makes further mistakes.
- Bot fails to optimize for token usage despite policies provided.
- Identity/instruction variations tested (different personas and initial instruction methods) with no meaningful improvement.
- Specific examples/issues:
- Dashboard task: intended to simply display files as a test; bot attempted to build an overly complex server calling its own API, could not make it work; persisted with complexity despite direction to simplify.
- I am experienced with AI tools (Codex, Claude Code, various agents).
- Best practices for first setup of OpenClaw bot / troubleshooting guide.
- Guides or first-use documentation for making the bot smarter or more autonomous without relying solely on Opus 4.6.
- How to configure agents, policies, identities, and workflows so the bot reliably understands tasks and executes them with minimal token waste.
#Troubleshooting Request — OpenClaw bot setup and functionality:
What you’re describing usually isn’t “one bug” — it’s a combo of (a) toolchain not actually working end-to-end, (b) model/tool-calling reliability, and (c) missing “definition of done” guardrails so the agent can’t prove it did the thing.
Here’s the fastest way to get from “it’s useless” → “it’s predictable”.
1) First: verify the plumbing (otherwise everything feels hallucinated)
On the gateway host, run these and paste the output (especially status --all):
openclaw --version
openclaw status --all
openclaw models status
openclaw doctor
openclaw channels status --probe
openclaw logs --follow
Docs:
- Troubleshooting runbook (this exact command ladder): https://docs.openclaw.ai/gateway/troubleshooting
- FAQ “first 60 seconds”: https://docs.openclaw.ai/help/faq
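If it helps, the command ladder above can be collected into one pasteable report. This is a minimal Python sketch, not an OpenClaw tool: `run_ladder` and `LADDER` are names made up here, and the only OpenClaw-specific content is the commands already listed above. `openclaw logs --follow` is left out because it streams until you interrupt it.

```python
import subprocess

# Commands copied from the ladder above. `openclaw logs --follow` is omitted
# because it keeps streaming and would never return on its own.
LADDER = [
    ["openclaw", "--version"],
    ["openclaw", "status", "--all"],
    ["openclaw", "models", "status"],
    ["openclaw", "doctor"],
    ["openclaw", "channels", "status", "--probe"],
]


def run_ladder(commands, timeout=60):
    """Run each command in order and return one text report (hypothetical helper)."""
    sections = []
    for cmd in commands:
        header = "$ " + " ".join(cmd)
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            body = (result.stdout + result.stderr).strip()
            body += f"\n[exit code {result.returncode}]"
        except FileNotFoundError:
            # The binary is not installed or not on PATH for this user.
            body = "[command not found on PATH]"
        except subprocess.TimeoutExpired:
            body = f"[timed out after {timeout}s]"
        sections.append(header + "\n" + body)
    return "\n\n".join(sections)
```

Running `print(run_ladder(LADDER))` on the gateway host gives a single block of text to paste, and a "command not found" entry is itself a useful finding.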
If the bot “reports success” but didn’t do anything, the logs usually show one of:
- tool call rejected/failed
- model failed over / “all models failed”
- exec blocked by approvals/allowlist
- routing/policy dropped the message
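To sort a log excerpt into those four buckets quickly, something like the sketch below can help. The regex patterns are assumptions: this thread does not show the real OpenClaw log wording, so treat `FAILURE_PATTERNS` and `triage` as hypothetical names and adjust the patterns to what your logs actually print.

```python
import re

# HYPOTHETICAL patterns. The actual log wording is not shown in this thread,
# so tune these against a real excerpt before trusting the result.
FAILURE_PATTERNS = {
    "tool call rejected/failed": re.compile(r"tool.*(rejected|failed)", re.I),
    "model failover": re.compile(r"all models failed|fail(ed)? ?over", re.I),
    "exec blocked": re.compile(r"(exec|approval|allowlist).*(blocked|denied)", re.I),
    "message dropped": re.compile(r"(routing|policy).*dropped", re.I),
}


def triage(log_text):
    """Return {category: [matching lines]} for categories with at least one hit."""
    hits = {name: [] for name in FAILURE_PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    # Drop empty categories so the output only shows what actually happened.
    return {name: lines for name, lines in hits.items() if lines}
```

Feed it the text saved from `openclaw logs --follow` around a failed task. An empty result means either the task really ran or the patterns do not match your log format yet.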
2) Reduce to a single-agent baseline (then add complexity back)
When people try 4 setups + multiple agents + multiple models, it’s easy to end up in a state where:
- the agent you’re talking to isn’t the one you think
- the model you selected isn’t allowed (silent-ish “no reply” feeling)
- a fallback model that’s bad at tool calling is actually running the show
Baseline test:
- Use one agent
- Use one model (no fancy fallbacks initially)
- Test in DM, not a busy group/thread
- Don’t load huge docs into the system prompt
You can inspect what’s going on live with:
- /status
- /model status
- /context detail (shows what’s eating context)
- /usage tokens or /usage cost (to see burn)
Slash commands reference: https://docs.openclaw.ai/tools/slash-commands
3) Treat “autonomy” as: tools + permissions + approvals (not vibes)
A super common mismatch: you expect it to act, but it’s not actually allowed to.
If you want it to build dashboards / touch files / run commands, the exec/tool guardrails have to actually permit that. Otherwise it will “try”, get blocked, and a weaker model will paper over the failure with a success message.
Read this (it matters): https://docs.openclaw.ai/tools/exec-approvals
Key concept:
- Even if the model is willing, exec approvals + allowlists decide whether anything runs on your real machine/node.
If you see approvals getting in the way, you can also forward approval prompts to chat and approve with /approve … (covered in that doc).
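As a mental model only, the gate behaves roughly like the toy function below. This is not OpenClaw's implementation or config format (the linked doc is the authority); `gate`, `allowlist` and `approved` are hypothetical names chosen to show why a willing model can still produce no side effects.

```python
import shlex


def gate(command, allowlist, approved=False):
    """Toy decision: 'run' if the executable is allowlisted or a human
    approved this command, otherwise 'blocked'."""
    tokens = shlex.split(command)
    if not tokens:
        return "blocked"
    # Only the executable name (first token) is checked in this toy version.
    if tokens[0] in allowlist or approved:
        return "run"
    return "blocked"
```

With `allowlist={"ls"}`, `gate("ls -la", {"ls"})` runs while `gate("rm -rf build", {"ls"})` is blocked until someone approves it. A blocked command followed by a cheerful "done" is the symptom described above.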
4) Stop “claimed done” with one rule: no proof = not done
Add an explicit workflow constraint to your agent instructions (AGENTS / system prompt style):
“Never claim a task is complete unless you provide verifiable artifacts.”
Examples of “proof”:
- file paths + file contents (or a diff)
- command output
- a URL that actually loads + screenshot
- a /context detail snippet showing the created files are in-workspace
This single change eliminates 80% of the “it said it worked” frustration.
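You can also check the "proof" yourself instead of trusting the agent's summary. A minimal sketch, assuming the agent lists the files it claims to have created and that you know its workspace directory; `verify_claim` is a hypothetical helper, not part of OpenClaw.

```python
from pathlib import Path


def verify_claim(claimed_files, workspace):
    """Check every file the agent says it created.

    Returns a list of problems; an empty list means each file exists,
    is non-empty, and sits inside the workspace.
    """
    root = Path(workspace).resolve()
    problems = []
    for name in claimed_files:
        path = (root / name).resolve()
        if root != path and root not in path.parents:
            problems.append(f"{name}: outside workspace")
        elif not path.is_file():
            problems.append(f"{name}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{name}: empty")
    return problems
```

If the list comes back non-empty, the task is not done, whatever the chat message said. The same idea extends to the other proof types (command output, a URL that loads), but those checks are left out here.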
5) Token waste: measure it, then fix the real causes
Most token burn comes from:
- huge context (docs pasted into the prompt, giant system prompts, long chat history)
- retries/failover due to tool errors
- “overplanning” because the agent isn’t constrained to a minimal solution
Practical knobs:
- Use /context detail to find prompt bloat (then remove it)
- Use /compact when the session gets big: https://docs.openclaw.ai/concepts/compaction
- Use multi-agent routing so “cheap chat” and “tool work” aren’t the same agent/model: https://docs.openclaw.ai/concepts/multi-agent
- Turn on /usage tokens while iterating so you can see whether changes helped
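If you keep your prompt pieces as files, you can get a rough offline picture of where the context goes before you even start a session. The sketch below uses the common "about 4 characters per token" rule of thumb for English text, which is an approximation and not how OpenClaw or any provider counts tokens. `estimate_tokens` and `rank_context` are hypothetical names.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4) if text else 0


def rank_context(parts):
    """parts maps a label (e.g. 'system prompt') to its text.

    Returns (label, estimated_tokens, share_of_total) tuples, largest first.
    """
    sizes = {label: estimate_tokens(text) for label, text in parts.items()}
    total = sum(sizes.values()) or 1  # avoid dividing by zero on empty input
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [(label, n, round(n / total, 2)) for label, n in ranked]
```

Whatever lands at the top of the ranking is the first thing to trim. Use the real numbers from the slash commands above to confirm that the change helped.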
6) About models (important, but easy to mis-diagnose)
Some models are fine for chatting but unreliable at strict tool calling (schema adherence, function arguments, etc.). That shows up exactly as:
- tools not invoked when they should be
- tools invoked with wrong args
- “success” narratives without real side effects
So: don’t try to make one model do everything. Use:
- one “reliable tool-use” model for the agent that touches tools
- optionally a cheaper model for lightweight chat (separate agent)
For up-to-date “which models are best right now” community guidance, the server’s #models channel is the best source, and the model docs explain how selection/fallbacks work: https://docs.openclaw.ai/concepts/models
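When comparing candidate models yourself, it helps to score tool calls mechanically rather than by reading the chat. A minimal sketch, assuming you can get each attempted call as a dict with a name and arguments. The call shape, the `write_file` tool in the usage note, and `check_tool_call` itself are all hypothetical, not OpenClaw's schema.

```python
def check_tool_call(call, schema):
    """Compare one tool call against an expected argument schema.

    call:   {"name": str, "args": dict}
    schema: {tool_name: {arg_name: expected_type}}
    Returns a list of problems (empty list = the call looks well-formed).
    """
    name = call.get("name")
    if name not in schema:
        return [f"unknown tool: {name}"]
    expected = schema[name]
    args = call.get("args", {})
    problems = []
    for arg, arg_type in expected.items():
        if arg not in args:
            problems.append(f"missing argument: {arg}")
        elif not isinstance(args[arg], arg_type):
            problems.append(f"wrong type for {arg}: {type(args[arg]).__name__}")
    for arg in args:
        if arg not in expected:
            problems.append(f"unexpected argument: {arg}")
    return problems
```

For example, with `schema = {"write_file": {"path": str, "content": str}}`, a call that passes `{"path": 3, "body": "hi"}` yields three problems. A model that produces many of these on simple tasks is a poor choice for the tool-touching agent, however well it chats.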
If you paste one thing, paste this
openclaw status --all + a short log excerpt around a failed task (from openclaw logs --follow).