#optimal openclaw for weak laptop


rocky atlas
#

Quick thread summary before this gets too long:

You wanted to run local models in OpenClaw on a machine with a 4GB GPU + 32GB RAM, including separate models for planning/coding, vision, and Whisper/audio.

What we found:

OpenClaw can use separate model slots:
agents.defaults.model for the main assistant/planner/coding model
agents.defaults.imageModel for image-capable model fallback
tools.media.audio for Whisper/audio transcription
Whisper should not be configured as a normal chat model. It belongs under tools.media.audio.
For whisper.cpp, point the command at whisper-cli.exe or main.exe using the full Windows path. command: "whisper" only works if that is actually the name of your binary.
Your local model providers were configured as custom OpenAI-compatible endpoints like:
custom-localhost-8080/planner
custom-localhost-8081/vision
custom-localhost-8082/coder
On OpenClaw 2026.4.23, models.providers.*.timeoutSeconds was rejected. After updating to 2026.5.7, it works, and you set planner timeout to 900.

stiff windBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

rocky atlas
#

The Gateway/device pairing issue was separate: the CLI needed a scope upgrade, which you approved with openclaw devices approve <requestId>.
The spawn start ENOENT after openclaw gateway restart looked like a Windows Startup-folder login-item restart path issue, not model config. Suggested checking openclaw gateway status, repairing with openclaw doctor / openclaw gateway install, or running openclaw gateway run temporarily.
Windows symlink errors for browser-automation skill were caused by Windows blocking symlink creation. Fix: enable Developer Mode or run elevated once/reinstall gateway.

Main diagnosis:

Your slow replies were mostly not network transfer. llama.cpp logs showed OpenClaw was sending a large prompt, then the model spent a long time in prompt prefill before producing the first token. Example logs showed ~13k–16k prompt tokens and then cancellation before generation.

Why even a bare "hi" was heavy:

OpenClaw full agent mode includes system instructions, tools, workspace/bootstrap context, session metadata, Discord context, and model/tool protocol text.
OpenClaw requires at least ~16k context for the embedded agent loop.
Your chat template was also adding the entire tool list as JSON into the system prompt:

List of tools: <|tool_list_start|>[ ... ]<|tool_list_end|>

strange sealBOT
#

Yep, that summary looks right. The big takeaways are:

  • planner/coder/vision can be separate OpenAI-compatible local endpoints
  • Whisper/audio should stay under tools.media.audio, not agents.defaults.model
  • on 2026.5.7+, models.providers.*.timeoutSeconds is valid, so the 900s planner timeout is the right fix for slow local inference

For the weak-laptop setup, I’d still treat the local chat/coding models as experimental: 4GB VRAM + 32GB RAM can work, but tool-calling reliability and latency are the real bottlenecks. If it starts hanging or inventing tools, use a cloud model for the main assistant and keep local models for audio/vision or lighter tasks. 🦞

rocky atlas
#

That likely inflated prompts and encouraged JSON-looking replies.

Recommended local-model config direction:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json

For full agent mode, keep planner/vision context at least 16k:


openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json

If still too slow, last-resort for local planner:

openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json

Tradeoff: that makes the local planner much lighter, but no real tool-calling loop.

Bottom line:

Local vision/transcription/simple chat may work on this hardware. Full coding/tool-agent loops are likely too heavy for a 4GB GPU + CPU inference unless you heavily reduce tools/context or use a faster/hosted model for the main agent loop.

strange sealBOT
#

Yep — that diagnosis tracks.

The key limiter is prompt prefill, not download/network. With a local llama.cpp model, a 13k–16k token agent prompt can take ages before the first token, especially on 4GB VRAM where layers spill to CPU/RAM.

Practical next steps I’d try:

  1. Use the smallest tool surface possible for the local model. Fewer enabled tools = less tool schema text in the prompt.
  2. Avoid Discord/full-agent mode for latency testing. Test the raw llama.cpp endpoint with a tiny prompt first, then compare OpenClaw agent mode.
  3. Use a model/chat template that does not duplicate or bloat tool definitions if possible.
  4. If you want reliable day-to-day OpenClaw use on that hardware: cloud model for agents.defaults.model, local Whisper under tools.media.audio, and optionally local vision if it’s acceptable latency-wise.

For “weak laptop but still usable”, hybrid is honestly the sane setup. Fully-local agent mode can work, but it’s going to feel rough unless the model is very small and tool support is minimal. 🦞

#

Verified against current docs/source: those keys are real on current OpenClaw.

Small nuance: agents.defaults.timeoutSeconds is the whole agent runtime timeout; models.providers.<id>.timeoutSeconds is the important one for slow local model HTTP/prefill/streaming. Setting both to 900 is fine, but provider timeout is the key fix.

The config direction looks good:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json

Then restart the gateway and verify:

openclaw gateway restart
openclaw status --deep

For llama.cpp on Windows, the important bit is starting the server/model with enough context, e.g. roughly:

& "C:\path\to\llama-server.exe" `
  -m "C:\path\to\model.gguf" `
  --host 127.0.0.1 `
  --port 8080 `
  --ctx-size 16384

Use 32768 only if it still fits without becoming unbearably slow. On 4GB VRAM, 16k is already likely painful, but OpenClaw full agent mode needs that kind of room.

#

Yep — confirmed, those are the right knobs too.

I’d phrase the context part like this:

openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json

contextWindow declares the model/server window OpenClaw should budget against; contextTokens is the effective runtime cap. Keeping them aligned is sensible for a constrained local llama.cpp setup.

And yes, this is the “escape hatch” if the tool schema itself is killing latency or causing malformed replies:

openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json

But that’s a big tradeoff: the local planner becomes much lighter, but it is no longer a real OpenClaw tool-using agent. Good for simple chat/tests; bad for coding, browser, files, Discord actions, etc.

So the bottom line is right: local Whisper/vision/simple chat may be viable; full OpenClaw coding/tool loops on 4GB VRAM are going to be rough unless tools/context are aggressively trimmed or the main assistant uses a faster hosted model.

rocky atlas
#

this is the config for my main model:
"%LLAMA_PATH%" ^
-m "%MODEL_PATH%" ^
--host 127.0.0.1 ^
--port 8080 ^
-ngl 6 ^
--threads 8 ^
--ctx-size 65536 ^
--cache-type-k q5_1 ^
--cache-type-v q5_1 ^
--flash-attn on ^
--no-kv-offload ^
-s 1 ^
-np 1 ^
--mlock ^
--no-mmap

strange sealBOT
#

That llama.cpp launch is probably over-allocating for this machine.

The biggest red flag is:

--ctx-size 65536

Even if OpenClaw is capped to 16k, llama.cpp may still reserve KV/cache for the 65k server context. On 4GB VRAM + 32GB RAM, that can make prefill painfully slow or push a lot through CPU/RAM.

I’d try this first for the main planner:

"%LLAMA_PATH%" ^
  -m "%MODEL_PATH%" ^
  --host 127.0.0.1 ^
  --port 8080 ^
  -ngl 6 ^
  --threads 8 ^
  --ctx-size 16384 ^
  --cache-type-k q5_1 ^
  --cache-type-v q5_1 ^
  --flash-attn on ^
  -s 1 ^
  -np 1

Then only go to --ctx-size 32768 if 16k actually fails context checks. I would not use 65k for full OpenClaw agent mode on that hardware unless you’re okay with very slow replies.

Notes on the other flags:

  • --no-kv-offload: saves VRAM, but can make long-context prefill slower. If removing it fits in VRAM, try without it and compare.
  • --mlock: can help avoid paging, but on Windows it may fail or make memory pressure worse. I’d remove it while testing.
  • --no-mmap: okay if you have RAM, but it increases startup/load pressure. Not the first thing I’d optimize.
  • -np 1: good for single-user/OpenClaw testing.

So: shrink ctx first. That’s likely the biggest win. 🦞

rocky atlas
#

it is a moe 8b with 1b active

strange sealBOT
#

That helps, but it doesn’t save you from the long-context cost.

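MoE “8B / 1B active” mainly reduces compute per token. It does not shrink the KV cache, and during prefill a long prompt routes tokens to most of the experts, so most of the 8B weights still get touched. The painful part you’re hitting is prompt prefill + KV cache size, and that still scales with: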

  • total context length
  • hidden size / layers of the full model architecture
  • KV cache format
  • CPU/RAM bandwidth if KV or layers aren’t fully GPU-fast
  • OpenClaw’s tool/system prompt size

So --ctx-size 65536 can still be brutal even if only ~1B params are active per token.
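
A rough way to see why context size dominates is a back-of-envelope KV cache estimate. This is only a sketch, not llama.cpp's actual allocator. The layer count, KV head count, and head dimension are hypothetical placeholders (the thread never states this model's architecture), and the bytes-per-element values assume ggml's 32-element block layouts for f16, q5_1, q4_1, and q4_0. Models with sliding-window or hybrid attention will not follow this simple formula.

```python
# Back-of-envelope KV cache size. Model dimensions are HYPOTHETICAL placeholders;
# read the real ones from your GGUF metadata or the llama.cpp startup log.

# Approximate bytes per element, assuming ggml block layouts (32 elements per block).
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q5_1": 24 / 32,  # 24-byte block
    "q4_1": 20 / 32,  # 20-byte block
    "q4_0": 18 / 32,  # 18-byte block
}


def kv_cache_bytes(ctx_size, n_layers, n_kv_heads, head_dim, cache_type):
    """Estimate K+V cache size in bytes for one sequence slot."""
    elements_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = K and V
    return ctx_size * elements_per_token * BYTES_PER_ELEMENT[cache_type]


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / (1024 ** 3)


if __name__ == "__main__":
    dims = dict(n_layers=32, n_kv_heads=8, head_dim=128)  # assumed, not from the thread
    for ctx in (16384, 32768, 65536):
        for cache_type in ("f16", "q5_1", "q4_1"):
            size = kv_cache_bytes(ctx, cache_type=cache_type, **dims)
            print(f"ctx={ctx:>6} {cache_type:>5}: {gib(size):.2f} GiB")
```

The estimate is linear in context length, so going from 16384 to 65536 quadruples the cache whatever the cache type. That is why shrinking --ctx-size comes before tuning the KV quantization.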

I’d still test in this order:

--ctx-size 16384

then maybe:

--ctx-size 32768

Only keep 65k if you actually need it and the prefill time is acceptable.

Also: if it’s a small MoE, tool-calling quality may be the bigger problem than raw speed. If it starts replying with JSON-looking tool blobs instead of real tool calls, try localModelLean first, then compat.supportsTools false as the “simple chat only” fallback.

rocky atlas
#

lean is already turned on

strange sealBOT
#

Got it — if localModelLean is already on, then the remaining big levers are mostly outside lean mode:

  1. Drop llama.cpp --ctx-size from 65536 to 16384 and retest.
  2. Keep OpenClaw model metadata aligned:
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
  3. Keep Discord history at 0 if this is a Discord thread:
openclaw config set channels.discord.historyLimit 0 --strict-json
  4. If it still spends forever before first token, try the hard tradeoff:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json

At that point you’re basically testing whether the remaining slowdown is “tool schema / agent loop” vs “the model/server just can’t prefill OpenClaw-sized prompts quickly enough.”

If supportsTools false is fast, the local model is usable for simple chat but not full agent work. If it’s still slow, the bottleneck is the llama.cpp runtime/context/hardware combo.

rocky atlas
#

it took over 1 minute for the prompt from openclaw to reach the model

strange sealBOT
#

If it took a full minute before llama.cpp even logged/received the request, that points to OpenClaw-side prompt assembly / agent setup, not llama.cpp prefill yet.

Things I’d check next:

openclaw logs --follow

Then send one tiny message and look for where the minute is spent. You’re looking for gaps around context building, session/bootstrap, tool loading, or model request start.

Also run:

openclaw status --deep

A few likely causes:

  • stale/huge session context being compacted or loaded
  • big workspace/bootstrap context
  • many tools enabled despite lean mode
  • slow Windows gateway filesystem / antivirus scanning the OpenClaw state dir
  • gateway running from a broken Startup/login-item path and doing weird restart/recovery work

Quick isolation test: start a fresh session/thread if possible, keep Discord history at 0, and test with compat.supportsTools false. If the request reaches llama.cpp quickly then, the delay is mostly agent/tool context construction. If it still takes a minute, I’d suspect gateway/runtime/environment rather than the model.

Also: after changing config, restart the gateway once so you know the running process actually has the lean/history/tool settings loaded.

rocky atlas
#

18:03:14 [diagnostic] liveness warning: reasons=event_loop_delay interval=39s eventLoopDelayP99Ms=32.4 eventLoopDelayMaxMs=13505.7 eventLoopUtilization=0.418 cpuCoreRatio=0.416 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:7ms,sidecars.main-session-recovery:7ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:7061ms,sidecars.session-locks:5475ms,post-ready.maintenance:988ms work=[active=agent:main:main(processing,q=1,age=14s) queued=agent:main:main(processing,q=1,age=14s)]
18:03:27 [agent/embedded] [trace:embedded-run] startup stages: runId=6e7c993f-ef14-4bdf-850b-2931d365b705 sessionId=903489c3-4d6a-487b-aced-ab4496860813 phase=attempt-dispatch totalMs=13215 stages=workspace:2ms@2ms,runtime-plugins:15ms@17ms,hooks:0ms@17ms,model-resolution:12909ms@12926ms,auth:285ms@13211ms,context-engine:0ms@13211ms,attempt-dispatch:4ms@13215ms
18:05:14 [diagnostic] liveness warning: reasons=event_loop_delay interval=30s eventLoopDelayP99Ms=37.8 eventLoopDelayMaxMs=8199.9 eventLoopUtilization=0.276 cpuCoreRatio=0.27 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:7ms,sidecars.main-session-recovery:7ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:7061ms,sidecars.session-locks:5475ms,post-ready.maintenance:988ms work=[active=agent:main:main(processing/model_call,q=1,age=100s last=model_call:started) queued=agent:main:main(processing/model_call,q=1,age=100s last=model_call:started)]

strange sealBOT
#

These logs narrow it down nicely.

The big line is this:

model-resolution:12909ms

So OpenClaw spent ~13s before dispatch mostly resolving the model/provider/auth. That is slow, but it is not the full minute.

Then this line says the agent has been inside the model request for 100s:

processing/model_call ... age=100s last=model_call:started

So from OpenClaw’s perspective, the request did start. If llama.cpp didn’t visibly begin prompt processing until much later, the delay is likely in one of these places:

  • HTTP request/body transfer into llama.cpp
  • llama.cpp parsing the large request before logging prompt eval
  • llama.cpp queued/busy with a prior request
  • Windows/AV/filesystem/network loopback weirdness
  • model prewarm/catalog resolution hitting the local endpoint slowly

The 13s model-resolution is still worth fixing/checking. I’d test these:

openclaw models list
openclaw models status

And separately hit the local server directly:

curl http://127.0.0.1:8080/v1/models

If /v1/models is slow, model resolution/prewarm will be slow too. If it’s instant, the remaining delay is likely request size + llama.cpp handling.
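
To put a number on "slow" versus "instant", here is a minimal timing sketch using only the Python standard library. It assumes the llama.cpp server from this thread is listening on 127.0.0.1:8080. time_get is a hypothetical helper name for illustration, not part of OpenClaw or llama.cpp.

```python
import time
import urllib.request


def time_get(url, timeout=30.0, opener=urllib.request.urlopen):
    """Return (elapsed_seconds, http_status) for one GET request.

    `opener` is injectable so the helper can be exercised without a live server.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as resp:
        resp.read()  # drain the body so the timing covers the full response
        status = resp.status
    return time.perf_counter() - start, status


# Example usage (assumes the local server from this thread is running):
#   elapsed, status = time_get("http://127.0.0.1:8080/v1/models")
#   print(f"/v1/models -> HTTP {status} in {elapsed:.2f}s")
```

Running it a few times in a row helps separate a one-off cold start from a consistently slow endpoint.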

Also the event loop warnings are real:

eventLoopDelayMaxMs=13505.7

That means the Gateway’s Node event loop got blocked for up to 13.5s. Not catastrophic by itself, but on a weak Windows box it can stack with model prewarm/session locks and make everything feel stuck.

Next most useful isolation test: temporarily set compat.supportsTools false, restart gateway, send hi, and watch whether model_call reaches llama.cpp quickly. If yes, the huge tool prompt is the culprit. If no, it’s more likely provider/model resolution or the llama.cpp server/runtime path.

rocky atlas
#

11:03:14+00:00 warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay interval=39s eventLoopDelayP99Ms=32.4 eventLoopDelayMaxMs=13505.7 eventLoopUtilization=0.418 cpuCoreRatio=0.416 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:7ms,sidecars.main-session-recovery:7ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:7061ms,sidecars.session-locks:5475ms,post-ready.maintenance:988ms work=[active=agent:main:main(processing,q=1,age=14s) queued=agent:main:main(processing,q=1,age=14s)]
11:03:27+00:00 warn agent/embedded {"subsystem":"agent/embedded"} [trace:embedded-run] startup stages: runId=6e7c993f-ef14-4bdf-850b-2931d365b705 sessionId=903489c3-4d6a-487b-aced-ab4496860813 phase=attempt-dispatch totalMs=13215 stages=workspace:2ms@2ms,runtime-plugins:15ms@17ms,hooks:0ms@17ms,model-resolution:12909ms@12926ms,auth:285ms@13211ms,context-engine:0ms@13211ms,attempt-dispatch:4ms@13215ms
11:05:14+00:00 warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay interval=30s eventLoopDelayP99Ms=37.8 eventLoopDelayMaxMs=8199.9 eventLoopUtilization=0.276 cpuCoreRatio=0.27 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:7ms,sidecars.main-session-recovery:7ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:7061ms,sidecars.session-locks:5475ms,post-ready.maintenance:988ms work=[active=agent:main:main(processing/model_call,q=1,age=100s last=model_call:started) queued=agent:main:main(processing/model_call,q=1,age=100s last=model_call:started)]
11:05:46+00:00 warn diagnostic {"subsystem":"diagnostic"} stalled session: sessionId=903489c3-4d6a-487b-aced-ab4496860813 sessionKey=agent:main:main state=processing age=132s queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=131s recovery=none
Log tail truncated (increase --max-bytes).

strange sealBOT
#

This confirms the stall is after OpenClaw starts the model call, not before.

Timeline from the log:

  • model-resolution:12909ms → OpenClaw spends ~13s resolving/prewarming the model. Slow, but not fatal.
  • model_call:started → request has been dispatched.
  • lastProgressAge=131s → no streaming chunks came back for over 2 minutes.

So the actual hard stall is the local model server not producing any response chunks after the request begins. That usually means llama.cpp is stuck in prompt ingestion/prefill, not that Discord/OpenClaw is waiting on network transfer.
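
If you want to rank the startup stages without eyeballing the log, a small parser can do it. This is a sketch that assumes the stage format is exactly what was pasted in this thread (name:durationms@cumulativems, comma-separated after stages=); other OpenClaw versions may log it differently. The function names are hypothetical, and the sample line is trimmed from the log above (runId and sessionId removed).

```python
import re

# One stage entry looks like "model-resolution:12909ms@12926ms".
STAGE_RE = re.compile(r"([\w.-]+):(\d+)ms@(\d+)ms")


def parse_startup_stages(log_line):
    """Return [(stage_name, duration_ms), ...] from a 'startup stages' log line."""
    match = re.search(r"stages=(\S+)", log_line)
    if not match:
        return []
    stages = []
    for item in match.group(1).split(","):
        m = STAGE_RE.fullmatch(item)
        if m:
            stages.append((m.group(1), int(m.group(2))))
    return stages


def slowest_stage(log_line):
    """Return the (name, duration_ms) pair with the largest duration, or None."""
    stages = parse_startup_stages(log_line)
    return max(stages, key=lambda s: s[1]) if stages else None


SAMPLE = (
    "[trace:embedded-run] startup stages: phase=attempt-dispatch totalMs=13215 "
    "stages=workspace:2ms@2ms,runtime-plugins:15ms@17ms,hooks:0ms@17ms,"
    "model-resolution:12909ms@12926ms,auth:285ms@13211ms,"
    "context-engine:0ms@13211ms,attempt-dispatch:4ms@13215ms"
)

if __name__ == "__main__":
    # Print stages from slowest to fastest.
    for name, ms in sorted(parse_startup_stages(SAMPLE), key=lambda s: -s[1]):
        print(f"{name:>18}: {ms} ms")
```

On the sample line the per-stage durations add up to the logged totalMs, and model-resolution accounts for nearly all of it. Everything after that point is time spent waiting on the model call.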

Given your setup, I’d do this exact isolation sequence:

  1. Reduce llama.cpp context hard:
--ctx-size 16384
  2. Align OpenClaw to that:
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
  3. Restart both llama.cpp and OpenClaw gateway.

  4. Send one tiny hi.

If it still stalls at model_call:started, try the last-resort tool cutoff:

openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw gateway restart

If supportsTools false fixes it, the model/server can’t handle the OpenClaw tool prompt fast enough. If it still stalls, the bottleneck is llama.cpp/hardware/runtime, not OpenClaw config.

Also increase the log tail once so we can see the exact model request/failure around the stall:

openclaw logs --max-bytes 200000

rocky atlas
#

can i make the model process the prompt faster?

strange sealBOT
#

Yes, but only within limits. Prompt processing/prefill speed is mostly hardware + context length + KV/cache choices.

Biggest wins to try, in order:

  1. Lower context size

This is the main one. Don’t run --ctx-size 65536 on a weak laptop unless you truly need it.

Try:

--ctx-size 16384

Maybe later:

--ctx-size 32768

  2. Put more layers on GPU if VRAM allows

You currently have:

-ngl 6

Try increasing slowly: -ngl 8, 10, 12, etc. Stop when it OOMs or starts swapping. More GPU layers can improve prompt eval, but 4GB VRAM is tight.

  3. Try allowing KV offload

You have:

--no-kv-offload

That keeps KV off GPU. It saves VRAM, but can hurt long prompt speed. Try removing that flag and see if it still fits.

  4. Use smaller/lower-bit KV cache

You already use:

--cache-type-k q5_1
--cache-type-v q5_1

For speed/memory experiments, try q4_0 or q4_1. Quality may drop a bit, but it can reduce memory pressure.

  5. Reduce OpenClaw prompt size

You already enabled lean mode. Also keep:

openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json

If tool schemas are still killing it, the nuclear option is:

openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json

That will make prompts much lighter, but the model won’t be a proper tool-using OpenClaw agent anymore.

  6. Remove questionable memory flags while testing

I’d test without these first:

--mlock
--no-mmap

They can help in some setups, but on Windows they can also make memory behavior worse or harder to diagnose.

My recommended test config:

"%LLAMA_PATH%" ^
  -m "%MODEL_PATH%" ^
  --host 127.0.0.1 ^
  --port 8080 ^
  -ngl 8 ^
  --threads 8 ^
  --ctx-size 16384 ^
  --cache-type-k q4_1 ^
  --cache-type-v q4_1 ^
  --flash-attn on ^
  -s 1 ^
  -np 1

Then watch llama.cpp’s prompt eval time / prompt eval speed. That number is the one you’re trying to improve.
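
Once you have a prompt eval speed from the llama.cpp log, the wait before the first token is simple arithmetic. A small sketch follows. The prompt sizes come from this thread (~13k to 16k tokens); the speeds are made-up examples, not measurements from this machine, and the function names are hypothetical.

```python
def prefill_seconds(prompt_tokens, prompt_eval_tokens_per_second):
    """Estimated wait before the first generated token, ignoring queueing and overhead."""
    if prompt_eval_tokens_per_second <= 0:
        raise ValueError("prompt eval speed must be positive")
    return prompt_tokens / prompt_eval_tokens_per_second


def fits_timeout(prompt_tokens, prompt_eval_tokens_per_second, timeout_seconds=900):
    """True if the estimated prefill finishes inside the provider timeout."""
    return prefill_seconds(prompt_tokens, prompt_eval_tokens_per_second) < timeout_seconds


if __name__ == "__main__":
    # Speeds below are illustrative only; substitute the value from your own log.
    for speed in (10, 50, 200):
        for tokens in (13000, 16000):
            secs = prefill_seconds(tokens, speed)
            print(f"{tokens} tokens @ {speed} tok/s -> {secs:.0f}s, "
                  f"within 900s timeout: {fits_timeout(tokens, speed)}")
```

The default timeout of 900 mirrors the timeoutSeconds value set earlier in the thread. If the estimate lands near or above it, a smaller prompt matters more than any flag tweak.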

rocky atlas
#

the agent is returning json again, but on the local server it doesn't behave like that. openclaw must have injected something into the model prompt that makes it do that

strange sealBOT
#

Yes — that’s very plausible.

OpenClaw full agent mode does inject tool/schema instructions into the prompt/request. If the backend/model doesn’t support native tool calling cleanly, it may see the tool schema JSON and imitate it in the final answer instead of producing a real tool call.

That’s why it behaves normally in your local server test: your direct test is probably just a plain chat prompt. OpenClaw is sending an agent prompt with tools, protocol text, system instructions, and context.

Try this first:

openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw gateway restart

Then send a tiny message again.

If JSON-looking replies stop, that confirms the tool schema/protocol was confusing the model. Tradeoff: that local model becomes “chat-only-ish” inside OpenClaw — no real tool loop.

#

If you still want tools, the better fix is a model/backend with stronger tool-calling support, or a Responses/native tool-compatible local server. Small local MoE models often look fine in plain chat but fall apart once agent tool schemas are added.
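
For the before/after comparison, it can help to check replies mechanically instead of by eye. The heuristic below is only a sketch: looks_like_tool_json is a hypothetical helper, not an OpenClaw feature, and it simply flags replies that consist of nothing but a JSON object or array.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def looks_like_tool_json(reply):
    """Heuristic: True if the whole reply parses as a JSON object or array.

    A plain-language answer will not parse as JSON; a model imitating the tool
    schema often emits nothing but a JSON blob.
    """
    text = reply.strip()
    # Strip a Markdown code fence if the model wrapped its output in one.
    if text.startswith(FENCE) and text.endswith(FENCE):
        lines = text.splitlines()
        text = "\n".join(lines[1:-1]).strip()
    if not text or text[0] not in "{[":
        return False
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, (dict, list))
```

Feed it the replies you get with tools enabled and with supportsTools false. If the flagged replies disappear in the second run, the tool schema was the trigger.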

rocky atlas
#

is there any other way to make openclaw's injection better? i think i need to adjust something to increase speed and prevent scuffed model responses

strange sealBOT
#

Yes. You can’t fully remove OpenClaw’s agent prompt if you want agent behavior, but you can make the injected surface much smaller and less JSON-bait-y.

Important detail: localModelLean helps, but it only trims some heavy defaults. It does not mean “tiny prompt / no tools”. If tools are still enabled, OpenClaw still has to describe tool calling somehow, and weak local models often imitate that JSON.

Best options, from least destructive to most:

  1. Restrict tools for only this local provider

Try a provider-specific minimal tool profile:

openclaw config set tools.byProvider.custom-localhost-8080.profile '"minimal"' --strict-json
openclaw gateway restart

That keeps your global OpenClaw setup intact, but gives the local model a much smaller tool surface. minimal is basically session_status only, so it should be faster and less likely to spit tool JSON.

If you need a few tools, use an explicit allowlist instead of full/coding, for example:

openclaw config set tools.byProvider.custom-localhost-8080.allow '["web_fetch","session_status"]' --strict-json
openclaw gateway restart

Don’t allow lots of tools on this model. Every tool adds schema/prompt weight and more chances for scuffed JSON replies.

  2. Keep Discord/context trimmed

You already have lean on, but also keep:

openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json

  3. Fix the llama.cpp chat template if possible

This part matters a lot. If the model’s chat template injects something like:

List of tools: <|tool_list_start|>[ ... ]<|tool_list_end|>

then the model may naturally copy JSON into the answer. If llama.cpp lets you choose a better template for that model, test it. The right template/parser can be the difference between “real tool call” and “assistant prints JSON”.

  4. Last resort: disable tools for that model
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw gateway restart

This is the cleanest way to prevent tool-schema JSON from influencing the answer, and it should speed things up. But it means that local model is no longer a proper tool-using agent.

My recommendation for your hardware: try tools.byProvider.custom-localhost-8080.profile = "minimal" first. If it still returns JSON, the model/template is not handling OpenClaw-style tool prompts well, and supportsTools false is probably the sane path for that local model.

rocky atlas
#

what about this:
"tools": {
"profile": "coding",

strange sealBOT
#

tools.profile: "coding" is useful for a normal coding agent, but it’s probably too heavy for your local weak-laptop model.

coding includes file tools, runtime/exec, web, sessions, memory, cron, image tools, etc. That means OpenClaw has to expose a much larger tool surface to the model, which increases prompt size and makes JSON-looking/tool-looking replies more likely.

Better setup: keep global coding only if you also use hosted/strong models, but override the local provider to something smaller:

{
  tools: {
    profile: "coding",
    byProvider: {
      "custom-localhost-8080": {
        profile: "minimal"
      }
    }
  }
}

Or via CLI:

openclaw config set tools.byProvider.custom-localhost-8080.profile '"minimal"' --strict-json
openclaw gateway restart

For your local model, I would not use global tools.profile: "coding" without a byProvider override. It’s exactly the kind of thing that can slow prompt prefill and make the model copy tool JSON instead of answering normally.

rocky atlas
strange sealBOT
#

OpenClaw tool profiles are:

  • minimal — session_status only
  • coding — filesystem, runtime/exec, web, sessions, memory, cron, image/image generation/video generation
  • messaging — messaging tools plus session history/send/status
  • full — unrestricted core/optional tool baseline, basically no profile restriction

For your local model, I’d try minimal first, or a tiny explicit allowlist. coding is likely too much prompt/tool schema for that setup.

Docs: https://docs.openclaw.ai/gateway/config-tools

rocky atlas
#

full would make sense since the injection would be different; not every prompt is code

strange sealBOT
#

I get the intuition, but full would usually make this worse, not better.

tools.profile doesn’t mean “coding prompt style” vs “general prompt style”. It controls which tools OpenClaw exposes to the model.

So:

  • coding = a broad coding-ish tool set
  • full = basically unrestricted tool surface
  • minimal = tiny tool surface

If you switch from coding to full, OpenClaw may inject more tool capability/schema, not a cleaner non-code prompt. That means bigger prompt, slower prefill, and more chances the local model copies JSON/tool syntax into the reply.

For your goal — faster + less scuffed JSON — go smaller, not bigger:

openclaw config set tools.byProvider.custom-localhost-8080.profile '"minimal"' --strict-json