#openclaw's wire format to OpenAI-compatible endpoints has two issues that break Kimi's chat template


rapid crypt
#

Running openclaw against a self-hosted Kimi-K2 (sglang with --tool-call-parser kimi_k2 --reasoning-parser kimi_k2):

openclaw's wire format to OpenAI-compatible endpoints has two issues that break Kimi's chat template and cause the model to emit a preamble then halt with finish_reason=stop mid-task (no tool_calls, no further progress). Same model + same server work fine from other harnesses, so this is on the openclaw side.

  1. Tool call IDs are stripped of punctuation
    Session transcript stores functions.read:0 but the request wire sends functionsread0 — all dots and colons removed. Kimi's tool-call-parser kimi_k2 expects the native functions.<name>:<index> format, and the mangled IDs break tool-result matching on subsequent turns.

  2. reasoning_content is replayed in history
    Every historical assistant message in the request carries its reasoning_content. Kimi's template treats reasoning_content as output-only; replaying it on input confuses the parser. Combined with (1), the model emits a text preamble like "Task 1 — …" then stops with finish_reason=stop.

  3. Top-level thinking: {type: "enabled"}
    openclaw adds a top-level thinking field to the request body. Not an OpenAI field; upstreams can ignore it but worth dropping.

Any ideas why or how to fix?

misty voidBOT
#

You are not imagining it, this is reproducible on the OpenClaw side.

  1. functions.read:0 → functionsread0
    This comes from strict tool-id sanitization in replay:
  • sanitizeToolCallId strips [^a-zA-Z0-9] in src/agents/tool-call-id.ts
  • buildOpenAICompatibleReplayPolicy(..., { sanitizeToolCallIds: true }) is default in src/plugins/provider-replay-helpers.ts
  • SGLang plugin currently does not override replay policy, so openai-completions fallback applies.
  2. reasoning_content replayed back into history
    OpenClaw ingests reasoning_content / reasoning / reasoning_text as thinking blocks (src/agents/openai-transport-stream.ts, getCompletionsReasoningDeltas), then replay keeps those blocks for openai-completions, and pi-ai re-emits them as assistant fields (including reasoning_content) on the next request.
    That is exactly the pattern that can confuse Kimi chat templates.

  3. top-level thinking: {type:"enabled"}
    That is injected by Moonshot thinking wrappers (MOONSHOT_THINKING_STREAM_HOOKS → createMoonshotThinkingWrapper).
    If you are truly on sglang/... this field should not be auto-added unless you set a params.thinking override. If it appears, you are likely hitting Moonshot hooks (or explicit params).
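
To make item 1 concrete, here is a minimal TypeScript sketch of the effect. It is not OpenClaw's code: `stripNonAlphanumeric` is a hypothetical stand-in that applies only the `[^a-zA-Z0-9]` rule quoted above, and the `NATIVE_KIMI_ID` pattern (including which characters a tool name may contain) is an assumption about the `functions.<name>:<index>` shape.

```typescript
// Hypothetical stand-in for the strict sanitizer: it applies only the
// [^a-zA-Z0-9] rule quoted above, not OpenClaw's real implementation.
function stripNonAlphanumeric(id: string): string {
  return id.replace(/[^a-zA-Z0-9]/g, "");
}

// Assumed shape of a native Kimi tool call ID: functions.<name>:<index>.
// The allowed characters in <name> are a guess for illustration.
const NATIVE_KIMI_ID = /^functions\.[A-Za-z0-9_-]+:\d+$/;

function isNativeKimiToolCallId(id: string): boolean {
  return NATIVE_KIMI_ID.test(id);
}

const original: string = "functions.read:0";
const sanitized: string = stripNonAlphanumeric(original); // "functionsread0"
const stillNative: boolean = isNativeKimiToolCallId(sanitized); // false
```

Once the dot and colon are gone, the name and index can no longer be told apart, so the stripped ID cannot be mapped back to the native form.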

Recommended fixes:

  • Add provider-owned replay policy for SGLang, disable sanitizeToolCallIds (same approach already done for Moonshot on main).
  • Strip thinking/reasoning replay for openai-completions proxy backends (or at least do not re-emit reasoning_content-style fields).
  • Scope/remove Moonshot thinking injection for non-native endpoints.

Can you paste:

  • openclaw --version
  • your exact model ref (sglang/... vs moonshot/...)
  • whether you have agents.defaults.params.thinking or model-level params.thinking

I can then give you a minimal patch diff for your exact path.

rapid crypt
#

OpenClaw 2026.4.21 (f788c88)

"models": {
"providers": {
"moonshot": {
"baseUrl": "http://127.0.0.1:30001/v1",
"apiKey": "4444",
"api": "openai-completions",
"models": [
{
"id": "mymodel",
"name": "mymodel",
"reasoning": true,
"contextWindow": 200000,
"maxTokens": 63000,
"input": ["text", "image"],
"compat": { "supportsReasoningEffort": false }
}
]
},

#

any ideas?

misty voidBOT
#

Yep, that config explains it.

You are not actually hitting the generic sglang provider path, you are using a custom moonshot provider with a self-hosted baseUrl. That means OpenClaw applies Moonshot-specific behavior on top of an OpenAI-compatible transport.

So, for your 3 symptoms:

  1. functions.read:0 getting mangled
    That was a real Moonshot replay bug. I checked main, and there is already a fix for it:
    c4dea58712 fix(moonshot): preserve native Kimi tool_call IDs in openai-completions replay

    Your version is 2026.4.21, and that fix landed 2026-04-23, so you are just before it.

  2. historical reasoning_content replay
    Also real. OpenClaw ingests reasoning_content into internal thinking blocks, then replays those back into OpenAI-completions history for same-provider/same-model turns. That is fine for some backends, but bad for Kimi via SGLang.

  3. top-level thinking: { type: "enabled" }
    That is happening because you chose the moonshot provider, which wires Moonshot thinking wrappers. With a self-hosted SGLang endpoint, that is probably not what you want.
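
For symptom 2, a wire-level sketch of what "not replaying reasoning" would mean for the request history. This is an illustration under assumptions, not the OpenClaw fix: `WireMessage` is a simplified message shape, `stripHistoricalReasoning` is a hypothetical helper, and the field names are the three mentioned earlier in this thread.

```typescript
// Simplified, assumed shape of a Chat Completions message. Real payloads
// carry more fields; the index signature lets them pass through untouched.
interface WireMessage {
  role: string;
  [field: string]: unknown;
}

// Reasoning field names mentioned in this thread.
const REASONING_FIELDS = ["reasoning_content", "reasoning", "reasoning_text"];

// Returns a copy of the history in which assistant messages no longer carry
// reasoning fields. Non-assistant messages are returned as-is, and the input
// array and its objects are not mutated.
function stripHistoricalReasoning(messages: WireMessage[]): WireMessage[] {
  return messages.map((message) => {
    if (message.role !== "assistant") {
      return message;
    }
    const copy: WireMessage = { ...message };
    for (const field of REASONING_FIELDS) {
      delete copy[field];
    }
    return copy;
  });
}

// Made-up two-message history for illustration.
const requestHistory: WireMessage[] = [
  { role: "user", content: "read the file" },
  { role: "assistant", content: "", reasoning_content: "I should call read." },
];
const cleaned = stripHistoricalReasoning(requestHistory);
```

Everything else on the assistant message, including `tool_calls`, is left alone, so tool-result matching is unaffected by this step.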


My recommendation

Best immediate path:

  • Upgrade to a build newer than 2026.4.21 so you get the tool-call-id fix.
  • For this self-hosted SGLang setup, do not use models.providers.moonshot. Use models.providers.sglang instead:
    https://docs.openclaw.ai/providers/sglang
  • Also set thinking off for now on this model/session, so OpenClaw stops injecting Moonshot-style thinking and you avoid replaying Kimi reasoning back into history.

Short version

  • moonshot + custom baseUrl is the wrong combo for a generic SGLang endpoint.
  • One of your bugs is already fixed on main.
  • The other 2 are basically “Moonshot/Kimi reasoning semantics leaking into a non-native OpenAI-compatible proxy flow”.
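
If it helps to catch the first point early, here is a rough TypeScript sketch of a config check. Everything in it is an assumption for illustration: `OpenClawConfigSketch` is a cut-down view of the config shown in this thread, and the loopback heuristic in `looksLikeSelfHostedMoonshot` is mine, not something OpenClaw does.

```typescript
// Assumed, heavily simplified view of the config discussed in this thread.
interface ProviderConfigSketch {
  baseUrl?: string;
  api?: string;
}
interface OpenClawConfigSketch {
  models?: { providers?: Record<string, ProviderConfigSketch> };
}

// Matches http(s) URLs whose host is 127.0.0.1 or localhost.
const LOOPBACK_URL = /^https?:\/\/(127\.0\.0\.1|localhost)(:\d+)?(\/|$)/;

// Heuristic (an assumption, not an OpenClaw rule): a "moonshot" provider whose
// baseUrl points at a loopback host is probably a self-hosted server.
function looksLikeSelfHostedMoonshot(config: OpenClawConfigSketch): boolean {
  const baseUrl = config.models?.providers?.["moonshot"]?.baseUrl;
  return baseUrl !== undefined && LOOPBACK_URL.test(baseUrl);
}

const reportedConfig: OpenClawConfigSketch = {
  models: {
    providers: {
      moonshot: { baseUrl: "http://127.0.0.1:30001/v1", api: "openai-completions" },
    },
  },
};
const flagged = looksLikeSelfHostedMoonshot(reportedConfig); // true
```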

If you want, I can sketch the exact JSON5 config I’d switch you to for sglang, plus the smallest code patch to suppress historical reasoning replay for this backend.

rapid crypt
#

Thanks, that matches what I saw on the wire perfectly. A few follow-ups:

  1. Tool-id fix landed. I'll update to the beta.

  2. Yes please — I'd like both:

    • the exact JSON5 config for models.providers.sglang with a Kimi-K2 model that has reasoning enabled (so I can stop abusing the moonshot provider)
    • the minimal patch to suppress historical reasoning_content replay on the openai-completions path, since I'd rather not have to run a rewriting proxy long-term
  3. On "set thinking off for now" — does that mean:
    (a) reasoning: false on the model entry, or
    (b) a params.thinking override at agent/model level, or
    (c) both?
    I do want the model's reasoning visible in outputs, I just don't want it replayed back in history.

misty voidBOT
#

Yep, here’s the clean version.

1) Exact sglang config

{
  models: {
    providers: {
      sglang: {
        baseUrl: "http://127.0.0.1:30001/v1",
        apiKey: "${SGLANG_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "mymodel",
            name: "Kimi K2 via SGLang",
            reasoning: true,
            input: ["text", "image"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 200000,
            maxTokens: 63000,
            compat: {
              supportsReasoningEffort: false,
            },
          },
        ],
      },
    },
  },

  agents: {
    defaults: {
      model: { primary: "sglang/mymodel" },
    },
  },
}

Docs: https://docs.openclaw.ai/providers/sglang

2) For your “thinking off for now” question

For SGLang, the answer is closest to (a) in scope, since the model entry is the only place to touch, but with the opposite value: keep reasoning: true, and do not use params.thinking.

  • reasoning: true means “this model can emit reasoning/thinking-like output”.
  • params.thinking is the Moonshot-style request knob that caused the top-level thinking: { type: ... } field.
  • So for sglang, I would do:
    • reasoning: true
    • no params.thinking
    • no Moonshot provider

So: not (b) or (c). Closest is (a), but with reasoning left at true.

3) Minimal patch to stop replaying historical reasoning_content

I’d patch replay sanitization, not the wire serializer.

src/agents/pi-embedded-runner/thinking.ts

Change:

function stripAllThinkingBlocks(messages: AgentMessage[]): AgentMessage[] {

to:

export function stripAllThinkingBlocks(messages: AgentMessage[]): AgentMessage[] {

src/agents/pi-embedded-runner/replay-history.ts

Import it:

import { dropThinkingBlocks, stripAllThinkingBlocks } from "./thinking.js";

Then replace:

  const droppedThinking = policy.dropThinkingBlocks
    ? dropThinkingBlocks(sanitizedImages)
    : sanitizedImages;

with:

  const stripAllThinkingReplay =
    params.modelApi === "openai-completions" && params.provider === "sglang";

  const droppedThinking = stripAllThinkingReplay
    ? stripAllThinkingBlocks(sanitizedImages)
    : policy.dropThinkingBlocks
      ? dropThinkingBlocks(sanitizedImages)
      : sanitizedImages;

That does the narrow thing you want:

  • keeps reasoning visible in live output
  • removes replayed historical assistant thinking from the next request
  • does not require a proxy
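
For intuition, here is a self-contained sketch of what stripping thinking blocks from replay history amounts to. The types are assumptions: `SketchMessage` and `SketchBlock` are simplified placeholders, not OpenClaw's `AgentMessage`, and `stripThinkingBlocksSketch` is a hypothetical illustration rather than the body of `stripAllThinkingBlocks`.

```typescript
// Assumed, simplified message model; OpenClaw's AgentMessage is richer.
type SketchBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string; name: string };

interface SketchMessage {
  role: "user" | "assistant" | "toolResult";
  content: SketchBlock[];
}

// Removes thinking blocks from assistant messages in the history that is
// about to be replayed. Only this copy of the history is affected; the
// input objects are never mutated.
function stripThinkingBlocksSketch(messages: SketchMessage[]): SketchMessage[] {
  return messages.map((message) => {
    if (message.role !== "assistant") {
      return message;
    }
    const kept = message.content.filter((block) => block.type !== "thinking");
    // Reuse the original object when nothing was removed.
    return kept.length === message.content.length ? message : { ...message, content: kept };
  });
}

// Made-up assistant turn for illustration.
const turn: SketchMessage = {
  role: "assistant",
  content: [
    { type: "thinking", thinking: "Need to read the file first." },
    { type: "toolCall", id: "functions.read:0", name: "read" },
  ],
};
const replayed = stripThinkingBlocksSketch([turn]);
```

Because the filter runs on the history handed to replay, the thinking block stays in the stored turn and only disappears from what gets sent back to the model.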

If you want the same behavior while you’re temporarily still on moonshot, widen it to:

params.provider === "sglang" || params.provider === "moonshot"

4) Recommendation


My order would be:

  1. update to the beta/main build with the tool-id fix
  2. switch config to sglang/...
  3. apply the replay-thinking patch above

That should get you very close to a clean Kimi/SGLang path without wire rewriting.

rapid crypt
#

Updated with the 2026.4.22-beta + your sglang config:

  • ✅ top-level thinking is gone, confirmed
  • ❌ tool ids still mangled: wire shows functionsread0/functionsread1/… on tool_calls and tool_call_id on tool messages, over 7 consecutive turns. Looks like c4dea58712 only disabled sanitizeToolCallIds for the moonshot replay policy and the sglang path still hits the openai-completions default.
  • ❌ reasoning_content still replayed on every historical assistant message (1, 2, 3, … 7 across the turns). Expected until the stripAllThinkingBlocks patch lands.
misty voidBOT
#

Yep, nice, that narrows it down a lot.

What’s left is now the generic openai-completions fallback path, not Moonshot-specific code:

  • thinking field was Moonshot-only, and switching to sglang removed it
  • tool-id mangling is still coming from the default openai-completions replay policy
  • reasoning_content replay is still coming from assistant thinking blocks being preserved and re-emitted into history

Minimal patch for the remaining tool-id bug (sglang)

src/agents/transcript-policy.ts

Change the fallback signature:

function buildUnownedProviderTransportReplayFallback(params: {
  modelApi?: string | null;
  modelId?: string | null;
  provider?: string | null;
}): ProviderReplayPolicy | undefined {

Then add:

  const normalizedProvider = normalizeProviderId(params.provider ?? "");
  const preserveNativeOpenAiCompatibleToolIds =
    normalizedProvider === "sglang" && params.modelApi === "openai-completions";

Then replace:

    ...(isGoogle || isAnthropic || requiresOpenAiCompatibleToolIdSanitization
      ? {
          sanitizeToolCallIds: true,
          toolCallIdMode: "strict" as const,
        }
      : {}),

with:

    ...(isGoogle ||
    isAnthropic ||
    (requiresOpenAiCompatibleToolIdSanitization && !preserveNativeOpenAiCompatibleToolIds)
      ? {
          sanitizeToolCallIds: true,
          toolCallIdMode: "strict" as const,
        }
      : {}),

And pass provider through here:

  return mergeTranscriptPolicy(
    buildUnownedProviderTransportReplayFallback({
      modelApi: params.modelApi,
      modelId: params.modelId,
      provider,
    }),
  );

That should stop:

  • functions.read:0 → functionsread0
  • and also stop mangling tool_call_id on tool messages
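
The condition in that patch can be read in isolation as a small decision function. This is a sketch under assumptions: `shouldSanitizeToolCallIds` and `ReplayDecisionInput` are hypothetical names, the flags are passed in directly because this thread does not show how OpenClaw computes them, and trim plus lowercase is only a guess at what `normalizeProviderId` does.

```typescript
// Hypothetical inputs; the flag names mirror the patch above.
interface ReplayDecisionInput {
  provider: string;
  modelApi: string;
  isGoogle: boolean;
  isAnthropic: boolean;
  requiresOpenAiCompatibleToolIdSanitization: boolean;
}

function shouldSanitizeToolCallIds(input: ReplayDecisionInput): boolean {
  // Assumed normalization: trim and lowercase. normalizeProviderId may do more.
  const normalizedProvider = input.provider.trim().toLowerCase();
  const preserveNativeIds =
    normalizedProvider === "sglang" && input.modelApi === "openai-completions";
  return (
    input.isGoogle ||
    input.isAnthropic ||
    (input.requiresOpenAiCompatibleToolIdSanitization && !preserveNativeIds)
  );
}

// sglang over openai-completions opts out of strict sanitization.
const sglangDecision = shouldSanitizeToolCallIds({
  provider: "sglang",
  modelApi: "openai-completions",
  isGoogle: false,
  isAnthropic: false,
  requiresOpenAiCompatibleToolIdSanitization: true,
}); // false
```

The opt-out only applies to the OpenAI-compatible branch, so the Google and Anthropic cases keep strict sanitization regardless of provider.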

Minimal patch for the reasoning_content replay bug

Same one from before:

src/agents/pi-embedded-runner/thinking.ts

export function stripAllThinkingBlocks(messages: AgentMessage[]): AgentMessage[] {

src/agents/pi-embedded-runner/replay-history.ts

import { dropThinkingBlocks, stripAllThinkingBlocks } from "./thinking.js";

and:

  const stripAllThinkingReplay =
    params.modelApi === "openai-completions" && params.provider === "sglang";

  const droppedThinking = stripAllThinkingReplay
    ? stripAllThinkingBlocks(sanitizedImages)
    : policy.dropThinkingBlocks
      ? dropThinkingBlocks(sanitizedImages)
      : sanitizedImages;

Net effect after both patches

For sglang/openai-completions you should get:

  • native tool ids preserved, like functions.read:0
  • no historical reasoning_content replayed into assistant history
  • live reasoning output still visible when the model emits it

So yeah, your read is correct: c4dea58712 fixed the Moonshot-owned replay policy, but SGLang still needs its own opt-out from the generic fallback.

rapid crypt
#

do you expect these patches to go to the main repo for the next release?

misty voidBOT
#

Maybe, but I would not count on both landing in the very next release.

My guess:

  • SGLang tool-id fix: pretty likely
    It is small, low-risk, and matches the Moonshot fix that already landed.

  • reasoning_content replay strip: less certain
    I think it should land, but maintainers may want it done as a provider-owned SGLang replay rule instead of a broader core special-case.

So if you need this soon, I would treat the patches as local carry patches for now, and assume upstream is “likely soon, not guaranteed next release.”

If you want maintainer eyes on it, please file a GitHub issue with:

  • your exact sglang config
  • a before/after wire example showing functions.read:0 → functionsread0
  • one request showing replayed historical reasoning_content
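
To pull that evidence out of a captured request body, something like the following TypeScript sketch may help. It assumes the standard Chat Completions message fields (`tool_calls[].id`, `tool_call_id`, plus `reasoning_content`); `summarizeCapturedRequest` and the sample payload are hypothetical and only mirror the IDs reported in this thread.

```typescript
// Minimal subset of Chat Completions message fields needed for the summary.
interface CapturedToolCall {
  id: string;
}
interface CapturedMessage {
  role: string;
  tool_calls?: CapturedToolCall[];
  tool_call_id?: string;
  reasoning_content?: string;
}
interface WireSummary {
  toolCallIds: string[];
  toolResultIds: string[];
  assistantMessagesWithReasoning: number;
}

// Collects the tool IDs on the wire and counts assistant messages that
// still carry reasoning_content.
function summarizeCapturedRequest(messages: CapturedMessage[]): WireSummary {
  const summary: WireSummary = {
    toolCallIds: [],
    toolResultIds: [],
    assistantMessagesWithReasoning: 0,
  };
  for (const message of messages) {
    if (message.role === "assistant") {
      for (const call of message.tool_calls ?? []) {
        summary.toolCallIds.push(call.id);
      }
      if (message.reasoning_content !== undefined) {
        summary.assistantMessagesWithReasoning += 1;
      }
    } else if (message.role === "tool" && message.tool_call_id !== undefined) {
      summary.toolResultIds.push(message.tool_call_id);
    }
  }
  return summary;
}

// Made-up payload for illustration.
const captured: CapturedMessage[] = [
  { role: "user" },
  { role: "assistant", reasoning_content: "...", tool_calls: [{ id: "functionsread0" }] },
  { role: "tool", tool_call_id: "functionsread0" },
];
const wireSummary = summarizeCapturedRequest(captured);
```

Running it over the `messages` array of each captured request gives a compact before/after listing to paste into the issue.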
rapid crypt
#

draft an issue so i can submit in github

misty voidBOT
#

Yep, paste this into GitHub and tweak any wording you want:

Title: SGLang `openai-completions` mangles Kimi tool call IDs and replays `reasoning_content` into history

## Summary

When using OpenClaw against a self-hosted SGLang OpenAI-compatible endpoint serving Kimi K2, the `openai-completions` path breaks multi-turn tool use in two ways:

1. Tool call IDs are sanitized from `functions.read:0` to `functionsread0`
2. Historical assistant `reasoning_content` is replayed back into subsequent request history

Together these confuse Kimi's chat template / parsers and cause the model to emit a short text preamble, then stop with `finish_reason=stop` and no `tool_calls`.

The same model and same SGLang server work correctly from other harnesses, so this appears to be on the OpenClaw side.
## Environment

- OpenClaw: `2026.4.22-beta`
- Transport: `openai-completions`
- Provider: `sglang`
- Backend: self-hosted SGLang serving Kimi K2
- SGLang flags:
  - `--tool-call-parser kimi_k2`
  - `--reasoning-parser kimi_k2`

## Config

```json5
{
  models: {
    providers: {
      sglang: {
        baseUrl: "http://127.0.0.1:30001/v1",
        apiKey: "${SGLANG_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "mymodel",
            name: "Kimi K2 via SGLang",
            reasoning: true,
            input: ["text", "image"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 200000,
            maxTokens: 63000,
            compat: {
              supportsReasoningEffort: false
            }
          }
        ]
      }
    }
  },
  agents: {
    defaults: {
      model: { primary: "sglang/mymodel" }
    }
  }
}
```


## Steps to reproduce

1. Start SGLang with Kimi K2 and:
   - `--tool-call-parser kimi_k2`
   - `--reasoning-parser kimi_k2`
2. Configure OpenClaw to use the `sglang` provider with `api: "openai-completions"`
3. Run a multi-turn task that requires tool calls across several turns
4. Inspect the `/v1/chat/completions` wire payloads

## Actual behavior

### 1) Tool call IDs are mangled

OpenClaw stores / sees tool IDs like:

```text
functions.read:0
functions.read:1
```

but the wire payload sends:

```text
functionsread0
functionsread1
```

This happens in both:

- assistant `tool_calls[*].id`
- tool messages `tool_call_id`

I observed this over 7 consecutive turns.

Kimi's `kimi_k2` tool-call parser expects native IDs shaped like:

```text
functions.<name>:<index>
```

so stripping punctuation breaks tool-result matching on later turns.

### 2) Historical `reasoning_content` is replayed

Every historical assistant message in the request includes `reasoning_content` again on replay.

Kimi's template treats `reasoning_content` as output-only. Replaying it as input appears to confuse the parser.

### Resulting failure mode

The model often emits a short preamble like:

```text
Task 1 ...
```

then stops with `finish_reason=stop` and produces no `tool_calls`.

## Expected behavior

For `sglang` + `openai-completions` + Kimi-style parsers:

- preserve native tool call IDs such as `functions.read:0`
- preserve matching `tool_call_id` values on tool result messages
- do not replay historical assistant `reasoning_content` back into later requests
- still allow live reasoning output to be surfaced when the model emits it

## Additional notes

I originally reproduced a related issue using a custom `moonshot` provider pointing at a self-hosted base URL. After switching to the proper `sglang` provider on `2026.4.22-beta`:

- the top-level `thinking: { type: "enabled" }` field is gone, which is good
- the remaining two issues above still reproduce

There is already a Moonshot-side fix for tool-call ID preservation on main:

- `c4dea58712 fix(moonshot): preserve native Kimi tool_call IDs in openai-completions replay`

but `sglang` still appears to go through the generic `openai-completions` fallback path.

## Suspected cause

It looks like the remaining problems come from the generic replay path for unowned `openai-completions` providers:

- strict tool ID sanitization still applies to `sglang`
- assistant thinking blocks are preserved in replay, then serialized back out as `reasoning_content`

## Possible fix directions