#How much slimmer is the context in the new update?


coarse vigil

I've heard that the context is slimmer in the new update, and this might help me a lot! However, I still want to know how many tokens that saves and how to keep the context in my agent as small as possible unless more is necessary (e.g., whether it's possible to dynamically load skills or MCP servers on request).

pale totemBOT

Short version: yes, it’s slimmer, but there isn’t one fixed token number for “the new update” because it depends on your model, tools, workspace files, and session history.

Best way to check your exact live overhead:

  • /context detail → shows per-file, per-tool schema, and per-skill sizes
  • /status → shows how full your context window is
  • /usage tokens → adds per-reply token usage

A few important bits:

  • Skills are already lazy-loaded. OpenClaw only injects a compact skills list into the prompt. The actual SKILL.md instructions are read on demand only when needed.
  • Skills usually are not the biggest cost. Tool schemas are often much heavier than the skills list.
  • MCP saved entries are config, not full prompt blobs. If MCP-exposed tools are active, their tool schemas count like any other tool.

Rough skill-list cost from the docs:

  • base overhead when you have at least 1 skill: 195 chars
  • then per skill: 97 chars + name + description + location
  • rough estimate: ~24 tokens + metadata per skill
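As a rough illustration of that formula, here is a small Python estimator. It's a sketch under assumptions: name, description, and location are counted at their raw string lengths (the real catalog may escape or format them differently), and it uses the common ~4 characters per token heuristic, which is where "~24 tokens per skill" comes from (97 / 4 ≈ 24). The `Skill` class and the example entry are made up for illustration.

```python
from dataclasses import dataclass

# Constants from the documented formula quoted above.
BASE_OVERHEAD_CHARS = 195  # paid once, only if at least one skill is listed
PER_SKILL_CHARS = 97       # fixed per-skill overhead
CHARS_PER_TOKEN = 4        # assumption: rough heuristic, varies by tokenizer


@dataclass
class Skill:
    # Hypothetical container for one catalog entry.
    name: str
    description: str
    location: str


def skills_prompt_chars(skills: list[Skill]) -> int:
    """Estimate the character cost of the compact skills catalog."""
    if not skills:
        return 0  # no skills listed: no base overhead either
    per_skill = sum(
        PER_SKILL_CHARS + len(s.name) + len(s.description) + len(s.location)
        for s in skills
    )
    return BASE_OVERHEAD_CHARS + per_skill


def approx_tokens(chars: int) -> int:
    """Convert characters to an approximate token count (nearest integer)."""
    return round(chars / CHARS_PER_TOKEN)


if __name__ == "__main__":
    # Hypothetical skill entry, for illustration only.
    skills = [Skill("weather", "Get weather", "skills/weather/SKILL.md")]
    chars = skills_prompt_chars(skills)
    print(chars, approx_tokens(chars))
```

For that single hypothetical skill this prints `333 83`. Treat it as a ballpark; /context detail is still the real number for your setup.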

If you want to reduce context, these are the main knobs:

{
  agents: {
    defaults: {
      contextInjection: "continuation-skip",
      bootstrapMaxChars: 10000,
      bootstrapTotalMaxChars: 40000,
      experimental: {
        localModelLean: true
      }
    },
    list: [
      {
        id: "tiny-local",
        skills: ["weather"], // explicit allowlist
        skillsLimits: {
          maxSkillsPromptChars: 3000
        }
      }
    ]
  },
  skills: {
    limits: {
      maxSkillsPromptChars: 6000
    }
  }
}

What those do:

  • contextInjection: "continuation-skip" reduces repeated bootstrap injection on safe continuation turns
  • bootstrapMaxChars / bootstrapTotalMaxChars cap injected workspace-file size
  • skills allowlists reduce visible skills
  • maxSkillsPromptChars caps the skill catalog size
  • localModelLean: true drops heavyweight default tools like browser, cron, and message for weaker local backends
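To make the two bootstrap caps a bit more concrete, here is a hypothetical model of how a per-file cap and a total cap could interact: each file is cut to bootstrapMaxChars, then files are taken in order until bootstrapTotalMaxChars is used up. This is an assumption for illustration, not OpenClaw's actual truncation logic (which may keep different parts of a file or add truncation markers). The function name and the file sizes are made up.

```python
def apply_bootstrap_caps(
    files: list[tuple[str, str]],
    max_chars: int = 10_000,        # mirrors bootstrapMaxChars in the config above
    total_max_chars: int = 40_000,  # mirrors bootstrapTotalMaxChars
) -> list[tuple[str, str]]:
    """Return (name, content) pairs trimmed to a per-file and a total budget.

    Hypothetical model of the caps: simple head truncation, in file order.
    """
    injected: list[tuple[str, str]] = []
    remaining = total_max_chars
    for name, content in files:
        if remaining <= 0:
            break  # total budget exhausted; later files are dropped entirely
        clipped = content[:max_chars]  # per-file cap
        clipped = clipped[:remaining]  # whatever is left of the total cap
        injected.append((name, clipped))
        remaining -= len(clipped)
    return injected


if __name__ == "__main__":
    # Made-up workspace files of 25k, 8k and 50k characters.
    files = [
        ("AGENTS.md", "a" * 25_000),
        ("MEMORY.md", "m" * 8_000),
        ("NOTES.md", "n" * 50_000),
    ]
    for name, content in apply_bootstrap_caps(files, 10_000, 20_000):
        print(name, len(content))
```

In this model, with a 10k per-file cap and a 20k total, the first two files fit (10,000 and 8,000 chars) and the third gets only the 2,000 chars that are left, so file order and the total cap both matter.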

If you want a smaller effective context budget overall, you can also cap the model runtime window with:

  • models.providers.<provider>.models[].contextTokens
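A quick way to reason about that cap, assuming contextTokens acts as a ceiling (it can shrink the model's native window but not enlarge it): the usable budget is the smaller of the two, and everything the system prompt, tool schemas, and history already use comes out of it. The helper names here are hypothetical.

```python
from typing import Optional


def effective_window(native_window: int, context_tokens: Optional[int] = None) -> int:
    """Usable context size, assuming contextTokens can only lower the native window."""
    if context_tokens is None:
        return native_window
    return min(native_window, context_tokens)


def headroom(native_window: int, used_tokens: int, context_tokens: Optional[int] = None) -> int:
    """Tokens left before the (possibly capped) window is full, never below zero."""
    return max(0, effective_window(native_window, context_tokens) - used_tokens)
```

For example, `headroom(200_000, 30_000, 64_000)` returns 34,000: capping a 200k window at 64k leaves far less room than the native window suggests, which is the point if you want a tighter budget.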

So, practical answer to your question:

cobalt nimbus

Yeah, the new update trims the system context noticeably, but if token budget is your actual pain, the bigger win is dynamic retrieval instead of stuffing everything in upfront.

What's worked for me: mr-memory plugin. Instead of loading MEMORY.md + all skills in the system prompt, it stores memories at full fidelity and retrieves only the relevant slices per message. So your context stays lean and you only pay for tokens you actually need.

Skills + MCP on-demand is a separate question (both can be gated by triggers in your agent config), but memory is usually the biggest contributor to bloat — especially if you've been running a while and MEMORY.md has grown.

Run openclaw plugins install mr-memory if you want to try it. Tradeoff: it's cloud-backed, so if you want everything local, QMD is the move instead.
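The retrieval idea described above (keep memories outside the prompt and inject only the relevant slices per message) can be sketched in a few lines. This is a toy stand-in, not how mr-memory or QMD work internally: it scores memories by plain word overlap with the incoming message, whereas a real backend would use semantic embeddings or a proper search index. The function names and memory strings are made up.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(memories: list[str], message: str, k: int = 2) -> list[str]:
    """Return up to k memories sharing the most words with the message.

    Memories with zero overlap are never injected, so the prompt only grows
    with what looks relevant to this turn.
    """
    query = tokenize(message)
    scored = [(len(query & tokenize(m)), i, m) for i, m in enumerate(memories)]
    relevant = [entry for entry in scored if entry[0] > 0]
    # Best score first; ties keep the original memory order.
    relevant.sort(key=lambda entry: (-entry[0], entry[1]))
    return [m for _, _, m in relevant[:k]]


if __name__ == "__main__":
    memories = [
        "User prefers metric units for weather reports",
        "Project deploys run on Fridays",
        "User's home city is Lisbon",
    ]
    print(retrieve(memories, "What's the weather in my city today?"))
```

The design point is the cap on k: no matter how large the memory store gets, at most k entries reach the prompt each turn, unlike a MEMORY.md file that is injected whole.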

pale totemBOT

Yep, retrieval-style memory is a good direction.

Just one important correction for the thread: OpenClaw does not stuff all skill instructions into the system prompt. It only injects a compact skills catalog, then reads SKILL.md on demand when needed.

So if someone is fighting prompt bloat, the usual biggest buckets are:

  1. bootstrap files like AGENTS.md, MEMORY.md, etc.
  2. tool schemas
  3. then the skills list

That’s why /context detail is so useful: it shows the real breakdown for your setup, so you don’t have to guess.

Also, for MCP, the saved server entries themselves are just config. If MCP-exposed tools are active, their tool schemas count like normal tools.

If you want the local route, QMD is the built-in local-first option:

cobalt nimbus

Yeah, good clarification: you're right that skills aren't fully inlined, just the catalog. The bigger upfront buckets are usually MEMORY.md / AGENTS.md / bootstrap files plus tool schemas, like you said.

That's actually exactly where mr-memory pulls its weight: instead of MEMORY.md growing forever (and getting injected every turn), memories live externally at full fidelity and only the relevant slices come back per message. So the prompt stays lean even as your memory pile gets huge.

QMD is great if you want everything local and deterministic — different philosophy, totally valid. mr-memory leans the other way: cloud retrieval, semantic recall, no compaction loss. Both work, just different tradeoffs.

If anyone wants to A/B it: openclaw plugins install mr-memory and see how your /context detail changes after a few sessions.

pale totemBOT

Yep, that’s a fair framing.

Only thing I’d add for anyone A/B testing:

  • use a fresh session
  • compare the same agent/tool setup
  • check /context detail before and after

Otherwise session history can muddy the comparison and make the memory backend look better or worse than it really is.
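If you do run that A/B comparison, a small helper like this one can diff the per-bucket numbers you copy by hand from /context detail before and after a change. The bucket names and the dict format are assumptions (the command prints text you'd transcribe yourself); the point is just to rank where the savings, or regressions, actually came from.

```python
def diff_buckets(before: dict[str, int], after: dict[str, int]) -> list[tuple[str, int]]:
    """Return (bucket, change) pairs, biggest reduction first.

    Negative change means fewer tokens after the change. A bucket missing
    on one side is treated as 0 there. Ties are ordered by bucket name.
    """
    names = set(before) | set(after)
    deltas = [(name, after.get(name, 0) - before.get(name, 0)) for name in names]
    return sorted(deltas, key=lambda item: (item[1], item[0]))
```

Feed it two dicts such as `{"bootstrap": ..., "tools": ..., "skills": ...}` from the same agent and tool setup in fresh sessions; buckets that only appear after the change (say, injected memory slices) show up at the end as positive deltas.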

Also, if someone wants the built-in prompt savings first, before adding a plugin, the easiest knobs are:

  • agents.defaults.contextInjection: "continuation-skip"
  • agents.defaults.bootstrapMaxChars
  • agents.defaults.bootstrapTotalMaxChars
  • agents.defaults.experimental.localModelLean: true for weaker local backends