#Minimum is 16000.

1 messages · Page 1 of 1 (latest)

tawny mica
#

Embedded agent failed before reply: Model context window too small (4096 tokens). Minimum is 16000.

im using LM-studio and llama-3.2-3b-instruct

Is there a way to configure more tokens ? the model should be good

night pebbleBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

placid orioleBOT
#

That error isn’t about “output tokens” — it’s OpenClaw refusing to run because the model’s context window is configured as 4096, and the embedded agent requires ≥ 16000 (tool schemas + system prompts are big).

To fix it you need both:

  1. LM Studio server actually running with a bigger context (n_ctx)

    • In LM Studio’s Server panel / settings, increase Context Length to 16384+, restart the server, and keep the model loaded.
  2. OpenClaw config telling it the model’s real context size (this is the part that usually causes the “4096” message)
    In your models.providers.<yourProvider>.models[] entry, set contextWindow to ≥ 16000.

Example (based on the Local Models doc):

{
  agents: {
    defaults: {
#
      model: { primary: "lmstudio/llama-3.2-3b-instruct" },
      models: {
        "lmstudio/llama-3.2-3b-instruct": { alias: "Local Llama" },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      lmstudio: {
        baseUrl: "http://127.0.0.1:1234/v1",
        apiKey: "lmstudio",
        api: "openai-responses",
        models: [
          {
#
            id: "llama-3.2-3b-instruct",
            name: "Llama 3.2 3B Instruct (LM Studio)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 16384,
            maxTokens: 2048,
          },
        ],
      },
    },
  },
}

Docs: https://docs.openclaw.ai/gateway/local-models

#

If you paste:

  • openclaw --version
  • your current models.providers block (just that section)

…I can point out exactly where the 4096 is coming from.

tawny mica
#

where can i find models.providers

placid orioleBOT
#

models.providers lives in your OpenClaw config file (JSON5).

By default that file is here:

  • macOS / Linux: ~/.openclaw/openclaw.json
  • Windows: C:\Users\<you>\.openclaw\openclaw.json

If you’re not sure which config path you’re using, check:

echo $OPENCLAW_CONFIG_PATH

To view what OpenClaw currently has for it, run:

tawny mica
#

neil@lin1:~$ openclaw --version
2026.2.12
neil@lin1:~$ openclaw config get models.providers

🦞 OpenClaw 2026.2.12 (f9e444d)
Claws out, commit in—let's ship something mildly responsible.

{
"custom-192-168-1-244-1234": {
"baseUrl": "http://192.168.1.244:1234",
"api": "openai-completions",
"models": [
{
"id": "gpt-oss-20b",
"name": "gpt-oss-20b (Custom Provider)",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 16384,
"maxTokens": 4096
},
{
"id": "text-embedding-nomic-embed-text-v1.5",
"name": "text-embedding-nomic-embed-text-v1.5 (Custom Provider)",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 4096,
"maxTokens": 4096
},
{
"id": "deepseek-r1-distill-qwen-7b",
"name": "deepseek-r1-distill-qwen-7b (Custom Provider)",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 4096,
"maxTokens": 4096
},
{
"id": "llama3",
"name": "llama3 (Custom Provider)",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 4096,
"maxTokens": 4096
},
{
"id": "llama-3.2-3b-instruct",
"name": "llama-3.2-3b-instruct (Custom Provider)",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 4096,
"maxTokens": 4096
}
]
}
}

placid orioleBOT
#

You already found it — and the reason for the “Minimum is 16000” error is visible in your output:

  • llama-3.2-3b-instruct has "contextWindow": 4096
  • OpenClaw’s embedded agent requires ≥ 16000, so it bails before it even tries to answer.

Fix

Set that model’s contextWindow to 16384 (or higher), and make sure LM Studio is actually serving with that context length too.

In your snippet, it’s the 5th model in the array (models[4]):

{
  "id": "llama-3.2-3b-instruct",
  ...
  "contextWindow": 16384,
  "maxTokens": 4096
#
}

Then restart the gateway (or whatever you use to restart OpenClaw).

Quick workaround (to confirm everything else works)

You already have one model that satisfies the minimum:

  • gpt-oss-20b has "contextWindow": 16384

So if you switch your default model to that, the error should disappear immediately (and confirms the issue is just the 4096 setting on Llama).

Important note

Changing maxTokens won’t fix this specific error (that’s output tokens). The gatekeeper check is about contextWindow.

If you change contextWindow in OpenClaw but LM Studio is still running the model with a 4k context limit, you’ll get a different error later (“context length exceeded”). So make sure LM Studio’s server “Context Length” is also set to 16k+.

tawny mica
#

Embedded agent failed before reply: No API key found for provider "custom-192-168-1-244-1234". Auth store: /home/neil/.openclaw/agents/main/agent/auth-profiles.json (agentDir: /home/neil/.openclaw/agents/main/agent). Configure auth for this agent (openclaw agents add <id>) or copy auth-profiles.json from the main agentDir.

#

I do not have api keys required in LM studio

placid orioleBOT
#

That’s actually two separate problems:

1) Model context window too small (4096 tokens). Minimum is 16000.

OpenClaw’s embedded/tool agent has a hard minimum context window of 16k. A 4k-context model will be blocked before it can reply (the system prompt + tool schemas already eat a lot).

So there isn’t an OpenClaw knob to “make 4k work” — you need to run a model (and runtime settings) that truly supports ≥ 16000 context.

With LM Studio: you must load/run the model with a higher Context Length / n_ctx in LM Studio. If that specific llama-3.2-3b-instruct build only supports 4k/8k in your setup, you’ll need a different model (or accept that it won’t work well for tool-heavy agents).

Useful: try sending /status or /context in chat — those are Gateway-handled commands and can help confirm what’s consuming the window.

Docs: https://docs.openclaw.ai/concepts/context

2) No API key found for provider "custom-192-168-1-244-1234"

Your LM Studio endpoint is configured as a custom OpenAI-compatible provider, and OpenClaw still expects an apiKey value to send (it can be a dummy string for local LM Studio, but it must exist).

Fix: in your ~/.openclaw/openclaw.json (or the agent’s models.json), set the provider with an explicit apiKey, and also set the model’s contextWindow correctly.

#

Example (adjust model id to whatever LM Studio exposes via /v1/models):

{
  agents: {
    defaults: {
      model: { primary: "lmstudio/llama-3.2-3b-instruct" },
    },
  },
  models: {
    providers: {
      lmstudio: {
        baseUrl: "http://192.168.1.244:1234/v1",
        api: "openai-completions",
        apiKey: "lmstudio", // dummy is fine
        models: [
#

If you paste:

  1. openclaw --version
  2. openclaw models status (redact secrets)
  3. a screenshot of your LM Studio Context Length setting

…I can tell you exactly what to tweak (and whether the model/runtime combo can realistically hit 16k).

#

LM Studio doesn’t require an API key, but OpenClaw’s “OpenAI-compatible proxy” providers currently still require some apiKey string to be configured, otherwise you get that:

No API key found for provider "custom-192-168-1-244-1234"

Good news: for a local LM Studio server, this can be a dummy value (it’s just used to populate the Authorization: Bearer … header).

Fix (set a dummy apiKey for your custom provider)

In ~/.openclaw/openclaw.json, add the provider with the exact same id you’re using (custom-192-168-1-244-1234), and set apiKey to any string:

{
  models: {
    providers: {
      "custom-192-168-1-244-1234": {
        baseUrl: "http://192.168.1.244:1234/v1",
        api: "openai-completions",
#

Important: you’ll still need ≥16k context

Even after the apiKey issue is fixed, OpenClaw will block models it thinks are 4096 ctx. So make sure:

  • LM Studio is actually running the model with a context length ≥ 16000, and
  • your provider’s contextWindow reflects that (otherwise OpenClaw will keep blocking it).

If you paste openclaw models status output (no secrets needed), I can tell you whether OpenClaw is still seeing 4k or if it’s picking up the 16k override.

tawny mica
#

ok problem with context window is fixed, only remaining issue seems:

04:11:20 warn gateway/ws {"subsystem":"gateway/ws"} {"cause":"unauthorized","handshake":"failed","durationMs":6,"lastFrameType":"req","lastFrameMethod":"connect","lastFrameId":"2f1ff3ff-1961-422f-9660-e67e3da52117","host":"127.0.0.1:18789","origin":"http://127.0.0.1:18789","userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0","authMode":"token","authProvided":"token","authReason":"token_mismatch","allowTailscale":false,"client":"openclaw-control-ui","mode":"webchat","version":"dev"} closed before connect conn=47cb3611-f600-46cb-b94f-930b03562560 remote=127.0.0.1 fwd=n/a origin=http://127.0.0.1:18789 host=127.0.0.1:18789 ua=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0 code=4008 reason=connect failed

placid orioleBOT
#

That log line means the Control UI can reach the gateway, but the WebSocket auth token it’s sending doesn’t match what the gateway expects:

  • authMode:"token" … authReason:"token_mismatch" → wrong/stale token saved in the browser (localStorage), or the gateway token changed.

Fix (local Control UI)

  1. On the gateway host, get the current token:
openclaw config get gateway.auth.token

If it’s empty/unset, generate one:

openclaw doctor --generate-gateway-token
  1. Open the dashboard: http://127.0.0.1:18789/
#
  1. In the Control UI Settings (gear icon), paste that token into the auth token field, then reconnect.

  2. If it keeps using an old value: clear site data for http://127.0.0.1:18789 (or just clear localStorage for that site) and reload.

Docs: https://docs.openclaw.ai/web/dashboard (see “If you see unauthorized / 1008”)

If you paste the output of openclaw config get gateway.auth.token (redact it; just say “set” vs “unset”) + whether you recently changed/reinstalled OpenClaw, I can suggest the cleanest reset path.

tawny mica
#

ok no more errors, but i dont see a response in chat:

04:23:41
debug
diagnostic
lane dequeue: lane=main waitMs=1 queueSize=0
04:23:41
warn
agent/embedded
low context window: custom-192-168-1-244-1234/llama-3.2-3b-instruct ctx=16384 (warn<32000) source=modelsConfig
04:23:41
debug
agent/embedded
embedded run start: runId=1e0f471d-92e4-43ec-a416-20dab84c7995 sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da provider=custom-192-168-1-244-1234 model=llama-3.2-3b-instruct thinking=off messageChannel=webchat
04:23:41
debug
diagnostic
session state: sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da sessionKey=unknown prev=idle new=processing reason="run_started" queueDepth=0
04:23:41
debug
diagnostic
run registered: sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da totalActive=1
04:23:41
debug
agent/embedded
embedded run prompt start: runId=1e0f471d-92e4-43ec-a416-20dab84c7995 sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da
04:23:41
debug
agent/embedded
embedded run compaction start: runId=1e0f471d-92e4-43ec-a416-20dab84c7995
04:23:41
debug
agent/embedded
embedded run agent start: runId=1e0f471d-92e4-43ec-a416-20dab84c7995
04:23:41
debug
agent/embedded
embedded run agent end: runId=1e0f471d-92e4-43ec-a416-20dab84c7995
04:23:41
debug
agent/embedded
embedded run compaction start: runId=1e0f471d-92e4-43ec-a416-20dab84c7995
04:23:41
debug
agent/embedded
embedded run prompt end: runId=1e0f471d-92e4-43ec-a416-20dab84c7995 sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da durationMs=71
04:23:41
debug
diagnostic
session state: sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da sessionKey=unknown prev=processing new=idle reason="run_completed" queueDepth=0
04:23:41
debug
diagnostic
run cleared: sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da totalActive=0
04:23:41
debug
agent/embedded
embedded run done: runId=1e0f471d-92e4-43ec-a416-20dab84c7995 sessionId=241501d6-7f71-4db2-8f3a-ee391e8286da durationMs=163 aborted=false
04:23:41
debug
diagnostic
lane task done: lane=main durationMs=165 active=0 queued=0
04:23:41
debug
diagnostic
lane task done: lane=session:agent:main:main durationMs=169 active=0 queued=0

placid orioleBOT
#

That log is interesting because the whole run finishes in ~163ms. For a local LLM, that usually means it never actually did an inference call (even a tiny model normally takes longer than that once you include prompt eval).

A couple things to check:

1) Did the reply maybe land in a different session?

In Control UI → Chat, check the session list/history (sometimes it creates a new session tab/thread). Your log shows sessionKey=unknown, so it’s worth confirming you’re looking at the same session that got sessionId=241501d6-....

2) Verify the gateway is actually hitting LM Studio

On the gateway host, run:

curl http://192.168.1.244:1234/v1/models

Then in LM Studio’s server console/logs, see if you get any requests when you send a chat message in WebChat.

If LM Studio shows no request, OpenClaw is short-circuiting before it calls the model (or pointing at the wrong baseUrl/model id).

#

3) Turn up file logging just for one repro

Even if your console is “debug”, the HTTP/provider details often only show up at higher file log levels.

Set in ~/.openclaw/openclaw.json:

{ logging: { level: "trace" } }

Restart gateway, send one “hello”, then grep for that runId (1e0f471d-...) and paste the lines that mention it (or anything that mentions openai/responses/completions errors).

(Logging docs: https://docs.openclaw.ai/gateway/logging)

4) Strong suggestion for LM Studio: use openai-responses

OpenClaw’s recommended LM Studio setup uses api: "openai-responses" (not openai-completions). If you’re currently on completions, try switching.

Reference: https://docs.openclaw.ai/gateway/local-models

#

If you paste:

  • openclaw --version
  • your provider block (redact token values)
  • and whether LM Studio sees an incoming request when you hit “send”

…I can tell you exactly which branch this is (UI/session routing vs provider call never happening vs provider call failing).

tawny mica
#

neil@lin1:~$ curl http://192.168.1.244:1234/v1/models
{
"data": [
{
"id": "llama-3.2-3b-instruct",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "llama-3-8b-lexi-uncensored",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "text-embedding-nomic-embed-text-v1.5",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "openai/gpt-oss-20b",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "mistral-7b-grok",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "deepseek-r1-distill-qwen-7b",
"object": "model",
"owned_by": "organization_owner"
}
],
"object": "list"

placid orioleBOT
#

That “run completed, no error, but nothing shows up” pattern is almost always one of these:

  1. The assistant actually replied with NO_REPLY, and OpenClaw intentionally suppresses delivery, or
  2. The model returned an empty/unrenderable message (so there’s nothing to display), or
  3. The reply went into the transcript, but you’re looking at a different session in the UI.

Let’s quickly confirm (this is the fastest way):

1) Check what was written to the transcript (look for NO_REPLY)

On the gateway host:

openclaw sessions --active 10 --json

Find the most recent session that mentions webchat and note its sessionId, then:

#
tail -n 80 ~/.openclaw/agents/main/sessions/<SESSION_ID>.jsonl

If the last assistant entry starts with NO_REPLY, that’s exactly why you “see no response” — OpenClaw suppresses messages that begin with NO_REPLY.

2) If it is NO_REPLY: disable the automatic “memory flush” (common with 16k ctx)

With a 16k context model, the default reserveTokensFloor (often 20k/24k) can cause the silent pre-compaction memory flush to trigger constantly, and that flush turn is designed to respond with NO_REPLY.

Add this to ~/.openclaw/openclaw.json:

{
  agents: {
    defaults: {
      compaction: {
#
        reserveTokensFloor: 0,
        memoryFlush: { enabled: false },
      },
    },
  },
}

Restart the gateway and try again.

Docs on the silent flush + NO_REPLY: https://docs.openclaw.ai/concepts/memory

3) If it’s not NO_REPLY

Paste the last ~20 lines from that session’s JSONL (just the role + content fields is enough), and we can tell if it’s empty output vs UI routing.

Also: you’re getting ctx=16384 (warn<32000) — 16k is “allowed” but still tight for tool-heavy agents, so if you can push LM Studio to 32k+ you’ll have a much better time.

tawny mica
placid orioleBOT
#

Yep — you can do this without editing ~/.openclaw/openclaw.json by hand using the built-in config CLI.

Set the two values via CLI

openclaw config set agents.defaults.compaction.reserveTokensFloor 0 --json
openclaw config set agents.defaults.compaction.memoryFlush.enabled false --json
openclaw gateway restart

(Optional) verify:

openclaw config get agents.defaults.compaction.reserveTokensFloor
openclaw config get agents.defaults.compaction.memoryFlush.enabled

Easy undo (if you want to revert)

#
openclaw config unset agents.defaults.compaction.reserveTokensFloor
openclaw config unset agents.defaults.compaction.memoryFlush.enabled
openclaw gateway restart

Docs: https://docs.openclaw.ai/cli/config

If you want, paste openclaw config get agents.defaults.compaction (it won’t include secrets) and I’ll sanity-check what it’ll look like after the change.

tawny mica
#

great working .. one more thing , my openclaw never hatched, how do i restart hatching

placid orioleBOT
#

“Hatching” is basically just the onboarding wizard doing the first successful chat run (it sends “Wake up, my friend!”). There isn’t a separate “hatched state” you need to reset.

You’ve got a few clean ways to re-trigger it:

Option A (simplest): just “hatch” manually

Open the dashboard chat (or WebChat) and send:

Wake up, my friend!

If the agent runs and replies, you’re hatched.

Option B: re-run the onboarding wizard (safe/idempotent)

openclaw onboard
#

This will walk you through again; you can skip stuff you already configured.

Option C: use the terminal UI and just start chatting

openclaw tui

Type any message and hit Enter (same effect: first real run).

If you mean it’s stuck on the hatch screen again, the FAQ’s checklist is:
openclaw gateway restart, then openclaw status, openclaw models status, openclaw logs --follow, and openclaw doctor.
Docs: https://docs.openclaw.ai/help/faq#it-is-stuck-on-wake-up-my-friend-onboarding-will-not-hatch-what-now

tawny mica
#

neil@lin1:~$ openclaw config get agents.defaults.compaction

🦞 OpenClaw 2026.2.12 (f9e444d)
I can run local, remote, or purely on vibes—results may vary with DNS.

{
"reserveTokensFloor": 0,
"memoryFlush": {
"enabled": false
}
}

placid orioleBOT
#

Yep — that config output is exactly what we wanted:

  • reserveTokensFloor: 0 âś…
  • memoryFlush.enabled: false âś…

So you’ve removed the “silent NO_REPLY housekeeping” trigger that was swallowing your visible replies on a 16k context model.

If everything’s responding normally now, please mark the solution so Answer Overflow indexes it:
Right‑click the solution message → Apps → ✅ Mark Solution.

One small heads-up: you’re on OpenClaw 2026.2.12. If you run into any weird session/path issues after restarts, it’s worth upgrading to 2026.2.13+ (there was a known session-path bug fixed right after .12). I’m not asking you to update now—just a note in case you hit that specific error later.

tawny mica
#

ive updated, im still getting compaction

placid orioleBOT
#

Turning off:

  • reserveTokensFloor: 0
  • memoryFlush.enabled: false

only stops the silent “memory flush” turn (the one that often replies NO_REPLY). It does not disable compaction entirely.

Also: in the embedded runner logs you’ll often see embedded run compaction start on every run — that can be just “checking whether compaction is needed”, not necessarily that it actually compacted anything.

First: confirm whether compaction is actually happening

In WebChat / Control UI chat, run:

  • /status

Look for something like “🧹 Compactions: N”.

  • If that number is staying the same → you’re fine; it’s just the “compaction start” phase in logs.
  • If it’s increasing (or you see “🧹 Auto-compaction complete”) → it’s genuinely compacting.
#

If it’s genuinely compacting a lot

With a 16k context model, this is pretty normal once you have any history + tool schemas + logs. Best fixes:

  1. Increase the model context window (32k+ makes a huge difference).
  2. Reset the session more often: /new (fastest practical workaround).
  3. Enable context pruning (reduces old tool-result bloat so you compact less):
openclaw config set agents.defaults.contextPruning.mode "cache-ttl"
openclaw config set agents.defaults.contextPruning.ttl "30m"
openclaw gateway restart

If you paste the /status output (or just the Compactions line + context tokens), I can tell you which case you’re in and what knob will help most.

tawny mica
#

🦞 OpenClaw 2026.2.13 (203b5bd)
🧠 Model: custom-192-168-1-244-1234/llama-3.2-3b-instruct · 🔑 api-key lmst…udio (models.json)
📚 Context: 16k/16k (97%) · 🧹 Compactions: 2
🧵 Session: agent:main:main • updated just now
⚙️ Runtime: direct · Think: off
🪢 Queue: collect (depth 0)

placid orioleBOT
#

🧹 Compactions: 2 in /status means the session has compacted at least twice at some point — it won’t go back to 0 unless you start a new session.

But the bigger issue is this line:

📚 Context: 16k/16k (97%)

At 97% on a 16k model, the next turn often has to compact again just to make room for the reply (Pi compacts when contextTokens > contextWindow - reserveTokens). So with 16k, compaction is basically unavoidable once you have any history + tool schemas.

What to do

1) Start a fresh session (this resets compaction count)

In chat:

  • /new

Then run /status again — you want that context % to drop a lot.

2) Reduce prompt overhead (big win on 16k): disable heavy tools

#

The tool schemas themselves eat a ton of context. If you’re using a small local model anyway, you’ll have a much better time with fewer tools:

openclaw config set tools.deny '["group:ui","group:web","group:runtime","group:nodes","group:automation"]' --json
openclaw gateway restart

That keeps it mostly “chat + status” and stops browser/exec/etc from bloating the system prompt.

(You can always re-enable later.)

3) Shrink injected workspace files (optional)

openclaw config set agents.defaults.bootstrapMaxChars 4000 --json
openclaw gateway restart
#

4) Best fix: use a bigger-context model

If LM Studio can run the model at 32k+ context, bump LM Studio’s context length and set the model contextWindow accordingly in your OpenClaw provider entry.

Quick question

After you do /new, does /status still show context near ~90% immediately, or does it drop to something reasonable (like <60%)? That tells us whether the system prompt/tooling alone is eating the window.

tawny mica
#

just did /new and /status again:

OpenClaw 2026.2.13 (203b5bd)
🧠 Model: custom-192-168-1-244-1234/llama-3.2-3b-instruct · 🔑 api-key lmst…udio (models.json)
📚 Context: 9.5k/16k (58%) · 🧹 Compactions: 1
🧵 Session: agent:main:main • updated just now
⚙️ Runtime: direct · Think: off
🪢 Queue: collect (depth 0)

placid orioleBOT
#

That /status explains it:

  • Context: 9.5k/16k (58%) right after /new → your baseline prompt overhead is already huge.
  • With only ~6.5k tokens of headroom left, you’ll hit auto‑compaction after just a few turns (especially if any tool output is involved).

Also: the “🧹 Compactions: 1” not going back to 0 after /new can happen (the counter is stored on the session entry; /new rotates the sessionId but the UI/status can still show prior compaction count). What matters is whether it keeps increasing.

Best fix: increase context window (32k+)

If LM Studio can run the model at 32k, do that and set contextWindow accordingly in your provider config. 16k is just tight for OpenClaw’s tool-heavy system prompt.

Practical fix on 16k: reduce tool schema bloat (biggest win)

Most of that 9.5k baseline is usually tool schemas. If you don’t need browser/exec/web/nodes right now, deny those tool groups:

openclaw config set tools.deny '["group:ui","group:web","group:runtime","group:nodes","group:automation"]' --json
openclaw gateway restart
#

Then do /new again and check /status — you should see the baseline context drop a lot, and compaction should stop happening constantly.

Confirm what’s eating tokens

Run in chat:

  • /context detail

…and paste the top few lines (it will list biggest tool schemas + injected files). That will tell us exactly what to trim next.

tawny mica
#

đź§  Context breakdown (detailed)
Workspace: /home/neil/.openclaw/workspace
Bootstrap max/file: 20,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 23,082 chars (5,771 tok) (Project Context 13,365 chars (3,342 tok))
Injected workspace files:

placid orioleBOT
#

Yep — that /status is the smoking gun:

  • Right after /new you’re already at 9.5k / 16k (58%).
  • That means you only have ~6.5k tokens of headroom for: your message, the assistant reply, and any tool calls/results.
  • So after a few turns (or one “tool-y” turn), auto-compaction is expected on a 16k model.

Also: the compaction count won’t necessarily go back to 0 in the UI the way you’d expect; what matters is whether it keeps increasing rapidly.

The quickest win: shrink tool/schema overhead

Most of the baseline bloat on small-context models is tool schemas. If you don’t need browser/exec/web/nodes right now, disable those tool groups:

openclaw config set tools.deny '["group:ui","group:web","group:runtime","group:nodes","group:automation"]' --json
openclaw gateway restart

Then do /new again and check /status. You’re aiming for something like <40% baseline.

#

Undo if needed:

openclaw config unset tools.deny
openclaw gateway restart

Second win: reduce injected workspace size

Your /context detail shows:

  • System prompt ~ 5,771 tokens
  • Project Context ~ 3,342 tokens (that’s your injected AGENTS/SOUL/TOOLS/etc)

You can cap those files harder:

openclaw config set agents.defaults.bootstrapMaxChars 4000 --json
#
openclaw gateway restart

(You can go even lower, like 2000, if you don’t mind more truncation.)

Then: prune old tool results so you compact less

This helps once you’re chatting for a while:

openclaw config set agents.defaults.contextPruning.mode "cache-ttl"
openclaw config set agents.defaults.contextPruning.ttl "30m"
openclaw gateway restart

Best fix (if possible): run 32k+ context in LM Studio

16k is workable but you’ll always be riding the limit with OpenClaw’s system prompt + tools. 32k+ makes compaction much rarer.

#

One thing: can you paste the next few lines from /context detail after “Injected workspace files:” (especially the TOOLS.md line + the “Top tools (schema size)” section)? That will tell us exactly what’s eating the remaining tokens so we can target it.

tawny mica
#

A new session was started via /new or /reset. Greet the user in your configured persona, if one is provided. Be yourself - use your defined voice, mannerisms, and mood. Keep it to 1-3 sentences and ask what they want to do. If the runtime model differs from default_model in the system prompt, mention the default model. Do not mention internal steps, files, tools, or reasoning.

You
5:23
A

{
"name": "greet_user",
"parameters": {
"text": "Hello! I'm your new OpenClaw assistant. What would you like to do today?"
}
}

🦞 OpenClaw 2026.2.13 (203b5bd)
🧠 Model: custom-192-168-1-244-1234/llama-3.2-3b-instruct · 🔑 api-key lmst…udio (models.json)
đź§® Tokens: 8.5k in / 47 out
📚 Context: 8.5k/29k (29%) · 🧹 Compactions: 0
🧵 Session: agent:main:main • updated just now
⚙️ Runtime: direct · Think: off
🪢 Queue: collect (depth 0)

đź§  Context breakdown (detailed)
Workspace: /home/neil/.openclaw/workspace
Bootstrap max/file: 4,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 18,985 chars (4,747 tok) (Project Context 9,268 chars (2,317 tok))
Injected workspace files:

AGENTS.md: TRUNCATED | raw 7,804 chars (1,951 tok) | injected 3,707 chars (927 tok)
SOUL.md: OK | raw 1,664 chars (416 tok) | injected 1,664 chars (416 tok)
TOOLS.md: OK | raw 850 chars (213 tok) | injected 850 chars (213 tok)
IDENTITY.md: OK | raw 633 chars (159 tok) | injected 633 chars (159 tok)
USER.md: OK | raw 474 chars (119 tok) | injected 474 chars (119 tok)
HEARTBEAT.md: OK | raw 167 chars (42 tok) | injected 167 chars (42 tok)
BOOTSTRAP.md: OK | raw 1,449 chars (363 tok) | injected 1,449 chars (363 tok)
Skills list (system prompt text): 1,853 chars (~464 tok) (4 skills)
Skills: clawhub, healthcheck, skill-creator, weather
Top skills (prompt entry size):
healthcheck: 500 chars (~125 tok)
clawhub: 441 chars (~111 tok)
skill-creator: 305 chars (~77 tok)
weather: 229 chars (58 tok)
Tool list (system prompt text): 1,184 chars (296 tok)
Tool schemas (JSON): 6,645 chars (~1,662 tok) (counts toward context; not shown as text)
Tools: read, edit, write, message, tts, agents_list, sessions_list, sessions_history, sessions_send, sessions_spawn, session_status, memory_search, memory_get
Top tools (schema size):
message: 3,897 chars (~975 tok)
edit: 591 chars (~148 tok)
read: 392 chars (~98 tok)
write: 313 chars (~79 tok)
sessions_spawn: 312 chars (~78 tok)
tts: 223 chars (~56 tok)
sessions_send: 203 chars (~51 tok)
sessions_list: 176 chars (~44 tok)
sessions_history: 149 chars (~38 tok)
memory_search: 139 chars (~35 tok)
memory_get: 128 chars (~32 tok)
session_status: 89 chars (~23 tok)
agents_list: 33 chars (~9 tok)
Top tools (summary text size):
read: 298 chars (~75 tok)
memory_search: 235 chars (~59 tok)
session_status: 207 chars (~52 tok)
memory_get: 151 chars (~38 tok)
edit: 129 chars (~33 tok)
tts: 129 chars (~33 tok)
write: 127 chars (~32 tok)
sessions_spawn: 107 chars (~27 tok)
message: 89 chars (~23 tok)
sessions_send: 84 chars (~21 tok)
agents_list: 72 chars (~18 tok)
sessions_list: 54 chars (~14 tok)
sessions_history: 36 chars (~9 tok)
Tools (param count):
message: 84 params
sessions_spawn: 8 params
edit: 6 params
sessions_send: 5 params
read: 4 params
sessions_list: 4 params
write: 3 params
sessions_history: 3 params
memory_search: 3 params
memory_get: 3 params
tts: 2 params
session_status: 2 params
agents_list: 0 params
Session tokens (cached): 8,487 total / ctx=28800
Inline shortcut: a command token inside normal text (e.g. “hey /status”) that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.

placid orioleBOT
#

Nice — this is what “healthy” looks like:

  • 8.5k / 29k (29%)
  • Compactions: 0
  • Tool schema overhead is only ~1.6k tokens now

So if you’re still seeing embedded run compaction start in the debug logs, that can be just the runner doing a compaction check. The real signal is the /status line 🧹 Compactions: N (and whether N keeps going up).

If you want even more headroom (optional)

Two easy reductions show up in your /context detail:

1) Clear BOOTSTRAP.md (it’s meant to be one-time)

Right now it’s costing you ~363 tokens every run. After you’ve hatched/onboarded, it’s safe to empty it:

truncate -s 0 ~/.openclaw/workspace/BOOTSTRAP.md
#
# or: > ~/.openclaw/workspace/BOOTSTRAP.md
openclaw gateway restart

2) (Biggest tool schema) remove the message tool if you don’t need it

Your largest schema is message (~975 tokens). If you’re only chatting in the Control UI / WebChat and don’t need the agent to proactively “send messages” via channels, you can deny it just for your LM Studio provider:

openclaw config set 'tools.byProvider["custom-192-168-1-244-1234"].deny' '["message"]' --json
openclaw gateway restart

(Undo later with openclaw config unset 'tools.byProvider["custom-192-168-1-244-1234"].deny'.)

Quick check

When you say “still getting compaction” — do you mean:

#
  • /status shows Compactions increasing, or
  • you only see compaction start in logs?

If you paste the updated /status after a few messages, we can confirm which one it is.

tawny mica
#

⇄ res ✗ skills.install 54ms errorCode=UNAVAILABLE errorMessage=brew not installed conn=1269ce4f…6a26 id=ac224c18…d4b2

placid orioleBOT
#

That line isn’t compaction-related — it’s the Skills installer RPC:

skills.install … brew not installed

It means you clicked “install” for a skill whose installer is Homebrew-based, but you’re on a machine where brew isn’t available, so the gateway can’t run the install step.

What you can do:

  1. Ignore it if you don’t actually need that skill’s external binary (your chat will still work).
  2. Install the dependency manually using your distro’s package manager / the tool’s official install method, then restart the gateway. (Skills become “eligible” once the required bins are on PATH.)
  3. If you specifically want the one-click install flow, install Homebrew (Linuxbrew) and retry.

Which skill were you trying to install when you got this (name shown in the Skills UI)? If you tell me the skill name, I can point you at the non-brew install path (if it exists) or what binary it’s looking for.

tawny mica
#

all currently working, will keep an eye on things but my assistant is talking json response in the uI , is that normal ?

#

you have the username and password , you have already configured sudo.. please can you make sure all patch updates are applied

You
5:53
A

{
"name": "message",
"parameters": {
"action": "send",
"channel": "webchat",
"targets": "["Im Neil"]",
"message": "Patch updates in progress, Im Neil. Please wait.",
"accountId": "",
"dryRun": false
}
}

placid orioleBOT
#

If you mean the Control UI “Logs/Debug” panels showing JSON blobs (RPC frames like chat.send, skills.install, etc.) — that’s normal.

If you mean the assistant’s actual chat messages are coming back as JSON (like {"name": "...", "parameters": ...}), that’s not the intended UX. It usually happens when a local model gets “confused” by tool/function-calling style prompts and starts printing structured/tool-call JSON as plain text.

Fastest fix (make this provider “chat-only”)

Disable tool prompts for your LM Studio provider so the model stops trying to “tool-call” in text:

openclaw config set 'tools.byProvider["custom-192-168-1-244-1234"].profile' "minimal"
openclaw gateway restart

Then do /new once.

(You can still use /status, but you’ll lose agent tools like read/edit/write for that provider.)

If you want to keep tools, but reduce the worst offender

#

Your /context detail shows the message tool schema is huge. You can try just denying that:

openclaw config set 'tools.byProvider["custom-192-168-1-244-1234"].deny' '["message"]' --json
openclaw gateway restart

Also worth checking (LM Studio)

If LM Studio has any “structured output / function calling / JSON mode” toggle enabled for the server/model, turn it off.

If you paste one example of the JSON-looking “assistant reply” you’re seeing in the chat tab, I can tell you whether it’s a debug frame being displayed, or the model is literally emitting JSON as its message.

#

If the “JSON” you’re seeing is in a tool card (it’ll look like name: message / parameters: {...}), that’s normal: the Control UI shows tool calls/results as structured objects so you can debug what the agent is doing.

If the JSON is showing up as the assistant’s actual reply text (no normal plain-English message), that’s not normal — it’s your local Llama model basically “printing what a tool call would look like” instead of answering. That happens a lot with smaller local models + tool-heavy prompts.

Two ways to fix that:

Option A (recommended for local Llama): make it chat-only (no tools)

openclaw config set 'tools.byProvider["custom-192-168-1-244-1234"].profile' "minimal"
openclaw gateway restart

Then /new.

Option B: keep tools, but remove the big one it keeps trying to use

openclaw config set 'tools.byProvider["custom-192-168-1-244-1234"].deny' '["message"]' --json