I told my agent to spend 4h researching my codebase and return a report on improvements and suggested new features. It returned instantly with some basic suggestions (some of which were already implemented, so I don't think it did anything at all), as if I had asked the same question in the model's web UI. I use Gemini 3 Flash. Does anybody know why it won't do the 4h session or any research?
#finishing too early
You can’t really “make it run for 4 hours” just by saying so in the prompt — the agent will answer as soon as it thinks it’s done (or when it hits built-in limits/timeouts). The more important signal is: did it actually call tools to look at your repo? If it didn’t, you’ll get generic “web UI”-style suggestions.
The two most common causes of “instant, generic report” are:
- The model didn’t use tools (file/exec/search)
  OpenClaw can only inspect your codebase if the model actually requests tool calls. Some models are noticeably worse at tool-calling / staying in “agent mode” and will just reply immediately. (This is a known pain point with some non-Anthropic models.)
- Your codebase isn’t accessible from where the Gateway is running
  If your Gateway is on a VPS/remote box (or you’re sandboxed and the repo isn’t bound in), the agent may literally have nothing to read, so it answers from prior knowledge.
Concrete things to check (in order):
- Version first: paste
  openclaw --version
- Confirm the model you’re actually running:
  /model status (or openclaw models status)
  Docs: https://docs.openclaw.ai/concepts/models
- Look for tool calls in logs around the request:
  openclaw logs --follow
  If you see no read/exec-style activity, it didn’t “research” anything.
  Troubleshooting ladder: https://docs.openclaw.ai/gateway/troubleshooting
- Confirm repo accessibility: is the repo on the same machine as the Gateway? If remote, you’ll need to move/sync the repo there or use a node/workflow that can access it. FAQ section on “agents work outside workspace / remote mode”: https://docs.openclaw.ai/help/faq
If you want, paste:
- openclaw --version
- /model status output
- ~30 lines of logs from right after you sent the “4h research” prompt
…and we can tell quickly whether it’s (a) tool-calling/model behavior, or (b) file access / sandbox / remote-gateway issue.
PS C:\WINDOWS\system32> openclaw --version
2026.2.15
model status:
Current: anthropic/claude-haiku-4-5
Default: anthropic/claude-haiku-4-5
Agent: main
Auth file: ~.openclaw\agents\main\agent\auth-profiles.json
[anthropic] endpoint: default auth: anthropic:manual=token:sk-ant-a...zmrj8QAA (next, lastGood) (auth-profiles.json: ~.openclaw\agents\main\agent\auth-profiles.json)
• anthropic/claude-haiku-4-5 (haiku)
• anthropic/claude-sonnet-4-5 (sonnet)
latest log messages:
10:41:43 debug diagnostic {"subsystem":"diagnostic"} run cleared: sessionId=ecde6248-36e8-4e93-8bc9-5a2a9ce060d9 totalActive=0
10:41:43 debug agent/embedded {"subsystem":"agent/embedded"} embedded run done: runId=46805511-0679-479b-aefe-e85d29cc3e4e sessionId=ecde6248-36e8-4e93-8bc9-5a2a9ce060d9 durationMs=8738 aborted=false
10:41:43 debug diagnostic {"subsystem":"diagnostic"} lane task done: lane=main durationMs=8745 active=0 queued=0
10:41:43 debug diagnostic {"subsystem":"diagnostic"} lane task done: lane=session:agent:main:discord:channel:1473023326321639639 durationMs=8749 active=0 queued=0
10:42:22 debug diagnostic {"subsystem":"diagnostic"} lane enqueue: lane=session:agent:main:discord:channel:1473023326321639639 queueSize=1
10:42:22 debug diagnostic {"subsystem":"diagnostic"} lane dequeue: lane=session:agent:main:discord:channel:1473023326321639639 waitMs=1 queueSize=0
10:42:22 debug diagnostic {"subsystem":"diagnostic"} lane enqueue: lane=main queueSize=1
10:42:22 debug diagnostic {"subsystem":"diagnostic"} lane dequeue: lane=main waitMs=1 queueSize=0
10:42:22 debug agent/embedded {"subsystem":"agent/embedded"} embedded run start: runId=3eb70f82-1845-41dd-b3c3-9a168e0ebb59 sessionId=ecde6248-36e8-4e93-8bc9-5a2a9ce060d9 provider=anthropic model=claude-haiku-4-5 thinking=low messageChannel=discord
10:42:22 debug diagnostic {"subsystem":"diagnostic"} run registered: sessionId=ecde6248-36e8-4e93-8bc9-5a2a9ce060d9 totalActive=1
What you’re trying to do (“spend 4h researching my codebase, then come back with a report”) won’t happen just because you asked for 4 hours.
Two separate issues usually cause the “returns instantly with generic advice” behavior:
1) The agent can’t see your codebase (so it can only guess)
OpenClaw only “researches” if it can actually read files / run commands on the machine where the repo exists.
- By default the workspace is ~/.openclaw/workspace (on Windows that’s under your user profile).
- If your repo isn’t in the workspace (or the agent workspace isn’t pointed at the repo root), it will answer like a normal web UI prompt.
Fix: point the agent workspace at your repo root (or copy the repo into the workspace). Docs: https://docs.openclaw.ai/concepts/agent-workspace
2) Tools might be restricted (so it can’t read/run anything even if the repo is there)
If your tool policy is “minimal” (or you deny fs/runtime tools), the model literally can’t do the repo scan and will just respond from prior knowledge.
Things to check in config:
- tools.profile (you generally want something that includes filesystem + exec, e.g. “coding”)
- tools.allow / tools.deny
- any tools.byProvider overrides
Relevant docs (tool allow/deny + profiles): https://docs.openclaw.ai/gateway/configuration (see tools.allow/tools.deny, tool profiles/groups)
Also: you’re not actually running Gemini in the log you pasted
Your log line shows:
provider=anthropic model=claude-haiku-4-5
So either:
- the session/model didn’t switch the way you thought, or
- you’re looking at a different agent/session than the one you tested.
(If you are using Gemini in another session: some non-Anthropic models are currently flaky with tool-calling, which produces exactly this “didn’t do anything” vibe.)
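If you want to check this yourself rather than eyeballing the log, here is a rough Python sketch that pulls the provider, model, and run duration out of the "embedded run start" / "embedded run done" lines. The patterns are based only on the lines pasted above, so treat the field layout as an assumption; other versions may log differently. The function name summarize_runs is made up for this example and is not part of OpenClaw.

```python
import re

# Patterns inferred from the log lines pasted in this thread (an assumption,
# not a documented format); adjust them if your log lines look different.
START_RE = re.compile(r"embedded run start: runId=(\S+) .*?provider=(\S+) model=(\S+)")
DONE_RE = re.compile(r"embedded run done: runId=(\S+) .*?durationMs=(\d+)")


def summarize_runs(log_text):
    """Return {runId: {...}} with provider/model and duration_ms where present."""
    runs = {}
    for line in log_text.splitlines():
        start = START_RE.search(line)
        if start:
            run_id, provider, model = start.groups()
            runs.setdefault(run_id, {}).update(provider=provider, model=model)
            continue
        done = DONE_RE.search(line)
        if done:
            run_id, duration = done.groups()
            runs.setdefault(run_id, {})["duration_ms"] = int(duration)
    return runs


# Two lines copied from the log posted earlier in the thread.
SAMPLE = """\
10:41:43 debug agent/embedded {"subsystem":"agent/embedded"} embedded run done: runId=46805511-0679-479b-aefe-e85d29cc3e4e sessionId=ecde6248-36e8-4e93-8bc9-5a2a9ce060d9 durationMs=8738 aborted=false
10:42:22 debug agent/embedded {"subsystem":"agent/embedded"} embedded run start: runId=3eb70f82-1845-41dd-b3c3-9a168e0ebb59 sessionId=ecde6248-36e8-4e93-8bc9-5a2a9ce060d9 provider=anthropic model=claude-haiku-4-5 thinking=low messageChannel=discord
"""

if __name__ == "__main__":
    for run_id, info in summarize_runs(SAMPLE).items():
        print(run_id, info)
```

On the sample, the finished run reports durationMs=8738 (under nine seconds) and the started run reports provider=anthropic with model=claude-haiku-4-5, which matches the point above: a short Anthropic run, not a long Gemini one.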
How to get “long work” behavior in OpenClaw (instead of one quick reply)
LLMs don’t “wait 4 hours”; they do a run and stop when they think they’re done. The supported patterns for long work are:
- Sub-agents (best for “go work and report back”) — docs: https://docs.openclaw.ai/tools/subagents
- Cron jobs for staged work / periodic progress — docs: https://docs.openclaw.ai/automation/cron-jobs
If you paste these, we can tell exactly what’s wrong
Run on the gateway host and paste outputs (redactions are fine):
openclaw status --all
openclaw config get agents.defaults.workspace
openclaw config get tools.profile
openclaw config get tools.allow
openclaw config get tools.deny
openclaw config get agents.defaults.sandbox
If this ends up fixing it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person.
You may also ask Claw to include some self-reflection about the progress of that task in HEARTBEAT.md
Yep — that’s a good trick, with one caveat: HEARTBEAT.md should stay tiny, otherwise you’ll burn tokens every heartbeat.
What HEARTBEAT.md is good for is making the agent do a consistent “status + next step” ritual so it doesn’t keep pretending it did work.
Practical pattern
- Put the “big task” output + checkpoints in a file in the workspace, e.g. PROGRESS.md (or research-log.md).
- Use sub-agents or a cron job to actually do the heavy lifting and update that file.
- Use HEARTBEAT.md to (a) glance at that file and (b) report what changed / what’s next.
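To make the checkpoint file in the pattern above concrete, here is a minimal Python sketch of a helper that appends one timestamped entry per checkpoint. It is purely illustrative: the function name append_checkpoint and the entry layout are assumptions for this example, not an OpenClaw convention, and the agent can just as well write the file itself with its normal file tools.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_checkpoint(progress_file, done, next_step):
    """Append one timestamped checkpoint entry to the progress file.

    The "## <timestamp> / Done / Next" layout is an arbitrary choice for
    this sketch; any format works if the heartbeat can skim it quickly.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"## {stamp}\n- Done: {done}\n- Next: {next_step}\n\n"
    # Append mode creates the file on first use and never rewrites old entries.
    with Path(progress_file).open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


if __name__ == "__main__":
    append_checkpoint(
        "PROGRESS.md",
        done="listed top-level modules and read their entry points",
        next_step="review error handling in the largest module",
    )
```

Appending instead of rewriting keeps earlier checkpoints intact, so a heartbeat only has to look at the newest entries to see what changed.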
A minimal HEARTBEAT.md template (copy/paste)
# Heartbeat checklist (keep short)
- If PROGRESS.md exists: summarize what changed since last heartbeat in 5 bullets.
- If task is incomplete: state the next concrete step (1-3 actions).
- If blocked: state exactly what input is needed (paths, commands to run, missing access).
- Do NOT claim you reviewed code unless you actually read files / ran repo search.