#Qwen sending massive amounts of data at every message


cerulean olive
#

sends this

2026-04-24 01:38:18 [DEBUG]
Received request: POST to /v1/chat/completions with body {
"model": "huihui-qwen3.5-9b-claude-4.6-opus-abliterated",
"messages": [
{
"role": "system",
"content": "You are a personal assistant running inside OpenCl... <Truncated in logs> ... /reasoning; /status shows Reasoning when enabled."
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "[Thu 2026-04-23 23:57 GMT+5:30] hi"
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "[Thu 2026-04-23 23:59 GMT+5:30] hi"
}
]
},
{
"role": "assistant",
"content": "\n\nNO_REPLY",
"reasoning_content": "The user sent a simple greeting "hi" twice. This i... <Truncated in logs> ...ools needed for this - just a warm greeting back.\n"
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "[Thu 2026-04-23 23:59 GMT+5:30] exit"
}
]
},
{
"role": "assistant",
"content": "\n\nNO_REPLY",
"reasoning_content": "The user said "exit" — this is a clear signal to e... <Truncated in logs> ...ond with NO_REPLY as per the Silent Replies rule.\n"
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "[Fri 2026-04-24 00:00 GMT+5:30] exit"
}
]
},
{
"role": "assistant",
"content": null,
"reasoning_content": "The user said "exit" which means they want me to e... <Truncated in logs> ...an empty or minimal message to signal completion.\n",
"tool_calls": [
{
"id": "253171335",
"type": "function",
"function": {

and this continues forever with lots of tool details and what not

cerulean olive
#

and it keeps doing it until context is full

fossil coralBOT
#

Let me verify the docs path, then I’ll give them a clean answer instead of guessing.

#

Yeah, this looks like two things at once:

  1. Expected: /v1/chat/completions is stateless, so OpenClaw resends the system prompt, relevant transcript, and tool schema on each turn.
  2. Annoying: your local OpenAI-compatible backend is logging the entire request body at DEBUG, so it looks huge.

So this is probably not Qwen “randomly sending massive data”, it is OpenClaw sending normal full-turn context, and your backend dumping it verbosely.
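To make the "stateless" point concrete, here is a minimal sketch of how a client that resends everything each turn ends up with steadily larger bodies. The helper names (`build_request`, `body_sizes`) and the sample prompt/tool are made up for illustration; this is not OpenClaw's actual code.

```python
import json


def build_request(system_prompt, transcript, tools, model="local-model"):
    """Assemble one stateless chat-completions style request body."""
    return {
        "model": model,
        "messages": [{"role": "system", "content": system_prompt}, *transcript],
        "tools": tools,
    }


def body_sizes(system_prompt, turns, tools):
    """Return the serialized body size in bytes for each successive turn."""
    transcript, sizes = [], []
    for user_text, assistant_text in turns:
        transcript.append({"role": "user", "content": user_text})
        body = build_request(system_prompt, transcript, tools)
        sizes.append(len(json.dumps(body).encode("utf-8")))
        # The reply joins the history, so the next request carries it too.
        transcript.append({"role": "assistant", "content": assistant_text})
    return sizes


sizes = body_sizes(
    system_prompt="You are a personal assistant. " * 20,
    turns=[("hi", "hello"), ("hi again", "hello again"), ("exit", "bye")],
    tools=[{"type": "function", "function": {"name": "example_tool"}}],
)
print(sizes)  # each entry is larger than the one before it
```

The system prompt and tool list are a fixed cost on every request; the transcript is the part that grows.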

The part that is suspicious is the amount of internal stuff being replayed (reasoning_content, tool calls, sessions_yield, metadata wrappers). Smaller local backends/models often struggle with OpenClaw’s full agent prompt shape.

First, what does this show?

openclaw --version

Then I’d try, in this order:

  • If the log spam itself is the problem, turn down DEBUG/request-body logging on the backend
  • If this is a local /v1 model, enable lean mode:
    • agents.defaults.experimental.localModelLean: true
  • On that model entry, try:
    compat: {
      requiresStringContent: true,
      supportsTools: false
    }
  • If reasoning is ballooning the transcript, also try /think off

If you paste openclaw --version plus your provider/model config block, I can tell you which of those knobs is the right one.

cerulean olive
#

OpenClaw 2026.4.22

fossil coralBOT
#

I’m going to check one doc on transcript hygiene so I don’t hand-wave the “keeps growing until context is full” part.

#

Yeah, the big request body part is normal-ish for /v1/chat/completions, because OpenClaw has to resend system prompt + transcript + tool schema each turn.

What is not normal is the runaway growth until context fills. That usually means a smaller/local OpenAI-compatible backend is not handling OpenClaw’s full agent transcript cleanly, especially around:

  • reasoning_content
  • tool calls / tool results
  • structured messages[].content arrays

First, can you paste:

openclaw --version

Then, if this is a local /v1 backend, I’d try these in order:

  1. Enable lean mode:

agents: {
  defaults: {
    experimental: {
      localModelLean: true
    }
  }
}

  2. On that model entry, add:

compat: {
  requiresStringContent: true,
  supportsTools: false
}

  3. If it keeps stuffing reasoning into history, try:

/think off

  4. After changing config, start a fresh session with:

/new

because the current thread transcript is probably already polluted/bloated.

If the complaint is only about the logs being huge, also turn down DEBUG request-body logging on the backend itself.

If you paste your provider/model config block, I can point to the exact place to put the compat flags.

#

I found one code comment that lines up pretty closely with your symptom. I’m checking that before I answer.

#

Thanks, that helps.

2026.4.22 is recent enough that I would not treat this as a simple “you’re on an ancient build” problem.

What I think is happening:

  • OpenClaw is using a Chat Completions style backend (POST /v1/chat/completions)
  • that backend/model is replaying or preserving a lot of reasoning/tool transcript state
  • then OpenClaw sends that transcript back on the next turn, so it grows fast until context fills

So the big body is partly expected, but the amount of internal junk being retained is not great.
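If you want to check how much of a logged request is that internal junk, a rough sketch like this can split a `messages` array by field. The field names (`content`, `reasoning_content`, `tool_calls`) come from the log above; the helper names and the sample data are hypothetical.

```python
import json


def field_bytes(messages, field):
    """Total serialized size of one field across all messages."""
    return sum(
        len(json.dumps(m[field]).encode("utf-8"))
        for m in messages
        if m.get(field) is not None
    )


def transcript_breakdown(messages):
    """Report how many bytes each kind of field contributes."""
    return {
        "total": len(json.dumps(messages).encode("utf-8")),
        "content": field_bytes(messages, "content"),
        "reasoning_content": field_bytes(messages, "reasoning_content"),
        "tool_calls": field_bytes(messages, "tool_calls"),
    }


# Made-up sample shaped like the logged request.
sample = [
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},
    {"role": "assistant", "content": "\n\nNO_REPLY",
     "reasoning_content": "The user sent a simple greeting. " * 10},
    {"role": "assistant", "content": None,
     "reasoning_content": "The user said exit. " * 10,
     "tool_calls": [{"id": "1", "type": "function",
                     "function": {"name": "example_tool", "arguments": "{}"}}]},
]
breakdown = transcript_breakdown(sample)
print(breakdown)
```

If `reasoning_content` dominates the total in your real logs, that points at reasoning replay rather than the visible chat text.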

My recommendation, in order:

  1. If this is your own local /v1 provider, prefer Responses API instead of Chat Completions

    api: "openai-responses"

    The local-model docs explicitly recommend that path and note it keeps reasoning separate from final text.

  2. Mark the model as non-reasoning

    reasoning: false

    for that custom model entry, if it isn’t already.

  3. If you must stay on openai-completions, add compat flags

    compat: {
      requiresStringContent: true,
      supportsTools: false
    }

  4. Enable lean mode for local backends

    agents: {
      defaults: {
        experimental: {
          localModelLean: true
        }
      }
    }

  5. Start a fresh session after changing config

    /new

Otherwise the current bloated transcript keeps coming along for the ride.

Optional quick band-aid:

/think off

Also, that model name, huihui-qwen3.5-9b-claude-4.6-opus-abliterated, is exactly the kind of smaller/weird local model that tends to get messy on agentic/tool-heavy runs. OpenClaw can work with local models, but the docs are pretty blunt that small or strict local backends are fragile here.

If you paste your actual models.providers... block, I can rewrite it into a safer config for this exact backend.

cerulean olive
#

what is this
Responses API instead of Chat Completions

fossil coralBOT
#

It’s the provider transport mode in OpenClaw.

In config, this part:

models: {
  providers: {
    yourprovider: {
      api: "openai-completions"
    }
  }
}

means:

  • openai-completions → OpenClaw talks to a Chat Completions style endpoint, usually POST /v1/chat/completions
  • openai-responses → OpenClaw talks to the Responses API style endpoint, usually POST /v1/responses

Why I brought it up:

  • your log shows POST /v1/chat/completions, so you are currently on the Chat Completions path
  • the local-model docs prefer Responses API when available, because it tends to handle reasoning/output separation more cleanly

So the change would look like:

models: {
  providers: {
    yourprovider: {
      baseUrl: "http://127.0.0.1:1234/v1",
      apiKey: "whatever",
      api: "openai-responses"
    }
  }
}

But, important: this only works if your backend actually supports /v1/responses.
If it only supports /v1/chat/completions, then you have to stay on openai-completions.

So in plain English:

  • Chat Completions = older/more common OpenAI-style chat endpoint
  • Responses API = newer endpoint, often better for structured/reasoning-heavy flows
  • OpenClaw setting for this = models.providers.<provider>.api

cerulean olive
#

"models": {
"providers": {
"lmstudio": {
"baseUrl": "http://host.docker.internal:9191/v1",
"api": "openai-completions",
"auth": "api-key",
"apiKey": "LM_API_TOKEN",
"models": [
{
"id": "huihui-qwen3.5-9b-claude-4.6-opus-abliterated",
"name": "Huihui Qwen3.5 9B Claude 4.6 Opus Abliterated",
"reasoning": false,
"input": [
"text",
"image"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 262144,
"contextTokens": 65536,
"maxTokens": 8192
}
]
}
},
"mode": "merge"
},

fossil coralBOT
#

I’m checking the LM Studio docs once more, because I don’t want to tell them to switch transports if that specific server path doesn’t actually support it.

#

Yep, this config helps a lot.

With this exact LM Studio block, I would not start by changing transports yet. The LM Studio provider docs still show api: "openai-completions", so first I’d harden the model entry for strict local-backend behavior:

"models": {
  "providers": {
    "lmstudio": {
      "baseUrl": "http://host.docker.internal:9191/v1",
      "api": "openai-completions",
      "auth": "api-key",
      "apiKey": "LM_API_TOKEN",
      "models": [
        {
          "id": "huihui-qwen3.5-9b-claude-4.6-opus-abliterated",
          "name": "Huihui Qwen3.5 9B Claude 4.6 Opus Abliterated",
          "reasoning": false,
          "input": ["text", "image"],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 262144,
          "contextTokens": 65536,
          "maxTokens": 8192,
          "compat": {
            "requiresStringContent": true,
            "supportsTools": false
          }
        }
      ]
    }
  },
  "mode": "merge"
}

And also add:

"agents": {
  "defaults": {
    "experimental": {
      "localModelLean": true
    }
  }
}

Why:

  • requiresStringContent: true helps strict /v1/chat/completions backends
  • supportsTools: false stops OpenClaw from sending the heavy tool schema surface
  • localModelLean: true trims more prompt bloat
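
For intuition on the first flag: your log shows user messages with `content` as an array of `{type, text}` parts, and some strict backends only accept a plain string there. The sketch below shows that kind of flattening. It is one plausible reading of what `requiresStringContent` is for, not something confirmed in this thread, and `flatten_content` / `flatten_messages` are hypothetical helpers.

```python
def flatten_content(content):
    """Collapse a list of content parts into one plain string."""
    if content is None or isinstance(content, str):
        return content  # already in the simple shape
    texts = [part.get("text", "") for part in content if part.get("type") == "text"]
    return "\n".join(texts)


def flatten_messages(messages):
    """Return copies of the messages with string-only content."""
    return [{**m, "content": flatten_content(m.get("content"))} for m in messages]


messages = [
    {"role": "user",
     "content": [{"type": "text", "text": "[Thu 2026-04-23 23:57 GMT+5:30] hi"}]},
    {"role": "assistant", "content": "\n\nNO_REPLY"},
    {"role": "assistant", "content": None},
]
flat = flatten_messages(messages)
print(flat[0]["content"])
```

Note that this sketch drops non-text parts (such as images), which is a simplification.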

Honestly, the 9B “abliterated” model is probably a big part of this. I would not trust that class of model for tool-heavy agent turns.

Then start a fresh session with:

/new

because the current transcript is already bloated.

Optional extra band-aid:

/think off

About openai-responses vs openai-completions:

  • if your LM Studio server on :9191 really supports /v1/responses, then yes, it is worth trying later
  • but for this config, I’d try the compat flags first

cerulean olive
#

{
"models": {
"providers": {
"lmstudio": {
"baseUrl": "http://host.docker.internal:9191/v1",
"api": "openai-completions",
"auth": "api-key",
"apiKey": "LM_API_TOKEN",
"models": [
{
"id": "huihui-qwen3.5-9b-claude-4.6-opus-abliterated",
"name": "Huihui Qwen3.5 9B Claude 4.6 Opus Abliterated",
"reasoning": false,
"input": [
"text",
"image"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 65536,
"contextTokens": 16384,
"maxTokens": 8192
}
]
}
},
"mode": "merge"
},

fossil coralBOT
#

This is a better cap, but it probably won’t fix the root problem by itself.

What you changed:

  • contextWindow: 65536 = model metadata
  • contextTokens: 16384 = OpenClaw’s effective runtime cap

So yes, 16k will limit the blast radius, but the real issue still looks like:

  • strict local chat/completions backend
  • weird reasoning_content replay
  • tool/schema bloat on a small 9B model

I’d use this instead:

{
  models: {
    providers: {
      lmstudio: {
        baseUrl: "http://host.docker.internal:9191/v1",
        api: "openai-completions",
        auth: "api-key",
        apiKey: "LM_API_TOKEN",
        models: [
          {
            id: "huihui-qwen3.5-9b-claude-4.6-opus-abliterated",
            name: "Huihui Qwen3.5 9B Claude 4.6 Opus Abliterated",
            reasoning: false,
            input: ["text", "image"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 65536,
            contextTokens: 16384,
            maxTokens: 8192,
            compat: {
              requiresStringContent: true,
              supportsTools: false
            }
          }
        ]
      }
    },
    mode: "merge"
  },
  agents: {
    defaults: {
      experimental: {
        localModelLean: true
      }
    }
  }
}

Then:

openclaw gateway restart

and start a fresh thread/session with:

/new

A couple important notes:

  • lowering contextTokens helps symptoms
  • compat.requiresStringContent and supportsTools: false are the more likely fix
  • that 9B abliterated model is still a pretty sketchy choice for agentic/tool-heavy work, honestly

So my recommendation is:

  1. add the compat block
  2. add localModelLean: true
  3. restart gateway
  4. /new
  5. if it still happens, switch to a stronger/cleaner model
cerulean olive
#

but I do want tool support

fossil coralBOT
#

I’m checking the exact tool-policy config name before I suggest trimming the tool surface without fully disabling tools.

#

Yep, then don’t set supportsTools: false.

If you want tool support, I’d do the middle path:

  1. keep tools on
  2. add only:

compat: {
  requiresStringContent: true
}

  3. enable lean mode:

agents: {
  defaults: {
    experimental: {
      localModelLean: true
    }
  }
}

  4. shrink the tool set for LM Studio instead of disabling tools completely

Example:

{
  tools: {
    byProvider: {
      lmstudio: {
        profile: "minimal",
        allow: ["group:fs", "group:web", "group:sessions", "group:memory"]
      }
    }
  }
}

That gives you a much smaller tool schema, while still keeping useful tools.

If you need command execution too, add it later once stable:

"group:runtime"

I would not start with browser/UI/message/cron on that 9B model. That’s where local small models usually start falling apart.

#

So my recommendation is:

  • keep api: "openai-completions"
  • add compat.requiresStringContent: true
  • add localModelLean: true
  • narrow LM Studio tools with tools.byProvider
  • restart gateway
  • start a fresh session with /new

If it still explodes after that, then it’s probably just the limit of this model/backend combo, not your config.

If you want, I can turn your current config into one ready-to-paste version that keeps some tool support without the full bloat.

cerulean olive
#

first lets try only using responses type

fossil coralBOT
#

Yep, that’s a good next test. I’d change only one thing first so the result is clean:

{
  "models": {
    "providers": {
      "lmstudio": {
        "baseUrl": "http://host.docker.internal:9191/v1",
        "api": "openai-responses",
        "auth": "api-key",
        "apiKey": "LM_API_TOKEN",
        "models": [
          {
            "id": "huihui-qwen3.5-9b-claude-4.6-opus-abliterated",
            "name": "Huihui Qwen3.5 9B Claude 4.6 Opus Abliterated",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 65536,
            "contextTokens": 16384,
            "maxTokens": 8192
          }
        ]
      }
    },
    "mode": "merge"
  }
}

Then:

openclaw gateway restart

And test in a fresh session:

/new

What you want to see in logs after that is:

  • POST /v1/responses
  • not POST /v1/chat/completions

If it works, nice, we learned the transport was part of the problem.
If it fails with 404/unsupported endpoint, then that LM Studio server on :9191 does not expose Responses API, and you should switch back to openai-completions.
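
If you would rather check the endpoint directly before editing config, a small probe along these lines could work. It is a sketch under assumptions: treating 404/405/501 as "route missing" is my heuristic, `probe_status` and `choose_api` are hypothetical helpers, and the base URL and key are whatever your LM Studio server uses.

```python
import urllib.error
import urllib.request

# Statuses taken to mean the route is missing (a heuristic, not a spec).
UNSUPPORTED = {404, 405, 501}


def choose_api(status):
    """Map a probe's HTTP status to the provider `api` value to use."""
    return "openai-completions" if status in UNSUPPORTED else "openai-responses"


def probe_status(base_url, api_key, timeout=5.0):
    """POST an empty JSON body to <base_url>/responses and return the HTTP status."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/responses",
        data=b"{}",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx still tells us whether the route exists.
        return err.code


print(choose_api(404))  # openai-completions
print(choose_api(400))  # openai-responses
```

Usage would be something like `choose_api(probe_status("http://host.docker.internal:9191/v1", "LM_API_TOKEN"))`. A 400 on the empty body still suggests the route exists; a connection error is raised as `urllib.error.URLError` and means the server was not reachable at all.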

cerulean olive
fossil coralBOT
#

Nice, this tells us a lot.

The transport switch worked. You are now on:

  • POST /v1/responses

But it also shows why the payload is still huge:

  • the full session transcript is still being replayed
  • I can see reasoning items in input
  • I can also see a big tools array starting with stuff like cron, edit, exec, image, image_generate

So, Responses API was not the root fix. It changed the shape, but not the fact that OpenClaw is still sending a lot of context + tool schema.

If you want to keep tool support, I’d do this next:

{
  agents: {
    defaults: {
      thinkingDefault: "off",
      experimental: {
        localModelLean: true
      }
    }
  },
  tools: {
    byProvider: {
      lmstudio: {
        profile: "minimal",
        allow: ["group:fs", "group:web", "group:sessions", "group:memory"]
      }
    }
  }
}

Why this helps:

  • thinkingDefault: "off" tries to stop more reasoning blocks from being generated/replayed
  • localModelLean: true trims heavyweight default tools
  • tools.byProvider.lmstudio keeps some tools, but drops giant stuff like cron, browser/UI, message, media generation, etc.

If you need shell commands too, add this later:

"group:runtime"

If you want, I can turn your current config into one ready-to-paste block that keeps read/edit/web/session style tools but drops the bulky ones.

cerulean olive
#

what does this do exactly?

localModelLean: true

also, I need all tools

fossil coralBOT
#

localModelLean: true is basically a “make OpenClaw less heavy for weak local models” switch.

What it does:

  • trims heavy default tools from the prompt/tool surface
  • docs call out things like browser, cron, and message
  • goal is a smaller, less brittle prompt shape for smaller/stricter local backends

What it does not do:

  • it does not make the model smarter
  • it does not compact your session history
  • it is not a generic “optimize performance” switch

The tradeoff is that it does reduce tool availability.

So if you need all tools, then honestly: don’t turn it on.

In your case, that means the real tradeoff is:

  1. Keep all tools

    • leave localModelLean off
    • stay on openai-responses if that path works better
    • try thinkingDefault: "off" to reduce reasoning junk
    • reduce backend DEBUG body logging
    • accept that a 9B local model may just be too small for full OpenClaw agent runtime
  2. Reduce prompt/tool bloat

    • turn on localModelLean
    • but you lose some heavyweight tools

Since you said you need all tools, my blunt take is:
the likely bottleneck is the model/backend combo, not one magic config flag.

So I would try next:

#

If this helped, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.

cerulean olive
#

I solved it by turning on cache

fossil coralBOT
#

Nice, that makes sense.

If it was prompt caching, the likely effect is:

  • OpenClaw still sends a big stable prefix
  • but the backend can reuse/cache that prefix instead of paying the full cost every turn
  • so the setup becomes usable even though the raw logged request still looks huge

So, basically, cache helped with the repeated prefix, not by magically making the payload tiny.
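
As a rough illustration of why a repeated prefix is cache-friendly, this sketch compares two consecutive requests and estimates how much of the second one repeats the first. It works on whole messages and bytes, while real backends cache at the token level, so treat it as intuition only. The helper names and sample turns are made up.

```python
import json


def shared_prefix_count(prev, curr):
    """Number of leading messages that are identical in both requests."""
    count = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        count += 1
    return count


def serialized_size(messages):
    """Total serialized size of the messages in bytes."""
    return sum(len(json.dumps(m).encode("utf-8")) for m in messages)


def reusable_fraction(prev, curr):
    """Rough share of the new request that repeats the previous one."""
    shared = curr[:shared_prefix_count(prev, curr)]
    return serialized_size(shared) / serialized_size(curr)


turn_1 = [
    {"role": "system", "content": "You are a personal assistant. " * 50},
    {"role": "user", "content": "hi"},
]
turn_2 = turn_1 + [
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "exit"},
]
fraction = reusable_fraction(turn_1, turn_2)
print(f"{fraction:.0%} of turn 2 repeats turn 1")
```

If anything early in the request changes (for example the system prompt), the shared prefix drops to zero, which is why a stable prefix matters for caching.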

One important gotcha: if your backend is logging the full request body at DEBUG, it will still look scary even when caching is doing its job.

If you want, share the exact cache setting you enabled, that’d be useful for anyone who hits the same thing.

Glad you caught it 🎉
If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it for future users.

cerulean olive
#

I will... can you recommend best practices to enable cache?

fossil coralBOT
#

Nice, glad you found it.

Best practices I’d use for cache in OpenClaw:

  1. Set a sane global default

    {
      agents: {
        defaults: {
          params: { cacheRetention: "short" } // none | short | long
        }
      }
    }

    Start with short, use long only for deep long-lived sessions.

  2. Override per agent

    • deep/research agent: cacheRetention: "long"
    • bursty alerts/notifiers: cacheRetention: "none"

    {
      agents: {
        list: [
          { id: "research", params: { cacheRetention: "long" } },
          { id: "alerts", params: { cacheRetention: "none" } }
        ]
      }
    }

  3. Pair cache with cache-TTL pruning
    This is the big one for long sessions.

    {
      agents: {
        defaults: {
          contextPruning: {
            mode: "cache-ttl",
            ttl: "1h"
          }
        }
      }
    }

    That helps avoid re-caching giant stale tool history after idle gaps.

  4. Keep cache warm only where it matters
    If your provider/backend has a 1h cache window, heartbeat just under that:

    {
      agents: {
        defaults: {
          heartbeat: { every: "55m" }
        }
      }
    }

    Good for main/research sessions, bad for noisy agents.

  5. Keep stable stuff stable
    Cache works best when the prompt prefix does not churn. So avoid constantly changing:
    • AGENTS.md / workspace bootstrap files
    • tool inventory
    • model/provider config

  6. Use /status to confirm it’s actually helping
    Look for cacheRead / cacheWrite movement, not just “it feels faster”.

  7. Do not judge by DEBUG body logs alone
    A cached run can still log a huge-looking request body if the backend dumps full requests.

My default recommendation:

{
  agents: {
    defaults: {
      params: { cacheRetention: "short" },
      contextPruning: { mode: "cache-ttl", ttl: "1h" }
    },
    list: [