#Bug in codex reasoning level?


fiery kelp
#

After upgrading to OpenClaw 2026.4.15 (from 2026.2.26), the new version classifies gpt-5.1-codex as a reasoning-capable model and now defaults the thinking level to low per the fallback rule. But the OpenAI API for gpt-5.1-codex only accepts medium — low is unsupported. OpenClaw's retry-with-medium logic fires but also fails (likely a bug in the retry path in 2026.4.15).

This is something we are hitting now - is that a known issue?

slender sonnetBOT
arctic furnaceBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

slender sonnetBOT
steel barnBOT
#

I’m checking the actual code path, because this smells like a real regression rather than config noise.

#

I found the exact compatibility code path. It currently special-cases only gpt-5.1-codex-mini, which is a pretty strong hint this wasn’t generalized to plain gpt-5.1-codex.

#

Yeah, that looks like a real bug, not config drift.

I checked current main, and the logic still does this:

  • reasoning-capable catalog models default to low
  • the OpenAI/Codex compat shim only special-cases gpt-5.1-codex-mini
  • plain gpt-5.1-codex is still treated as if low is valid

So your reported path makes sense: default low → OpenAI rejects it → retry path still doesn’t recover cleanly.
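
A minimal sketch of the kind of clamp that would avoid this failure, in Python for illustration only. `LEVEL_ORDER`, `SUPPORTED_LEVELS`, and `clamp_thinking_level` are hypothetical names, not OpenClaw's actual code. The only entry taken from this thread is that gpt-5.1-codex accepts medium and rejects low; the low/medium/high ordering and the behavior for unlisted models are assumptions.

```python
# Hypothetical sketch; names and table are assumptions, not OpenClaw's real code.
LEVEL_ORDER = ("low", "medium", "high")

# Per this thread, the OpenAI API accepts only "medium" for gpt-5.1-codex.
# Any model not listed here is assumed to accept every level.
SUPPORTED_LEVELS = {
    "gpt-5.1-codex": ("medium",),
}


def clamp_thinking_level(model: str, requested: str) -> str:
    """Return `requested` if the model accepts it, else the nearest supported level."""
    supported = SUPPORTED_LEVELS.get(model, LEVEL_ORDER)
    if requested in supported or requested not in LEVEL_ORDER:
        # Already valid, or a value this sketch does not rank (left untouched).
        return requested
    want = LEVEL_ORDER.index(requested)
    # Pick the supported level closest to the requested one.
    return min(supported, key=lambda lvl: abs(LEVEL_ORDER.index(lvl) - want))
```

With this table, `clamp_thinking_level("gpt-5.1-codex", "low")` returns `"medium"`, while an unlisted model keeps whatever level was requested.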

Short version: yes, this looks like a known regression/gap in the reasoning compatibility handling for gpt-5.1-codex. I do not see a proper released fix for the plain non-mini model path yet.

Workaround for now:

  • in-session: /think medium
  • or config: set agents.defaults.thinkingDefault: "medium" (or per-agent thinkingDefault) for the affected agent/model
fiery kelp
#

We already set the thinkingDefault to medium - but it doesn't seem to solve the issue.

steel barnBOT
#

That tracks, unfortunately. thinkingDefault is only a fallback. It won’t override an already-stored session value.

#

Yeah, that part makes sense.

thinkingDefault is only a fallback. It does not override an already-persisted session/thread thinking level.

So in this thread, the next thing I’d check is:

  1. send /think in this exact thread
  2. if it shows low, send /think medium as a standalone message
  3. retry the prompt

Important bit: /new or /reset may not help here, because OpenClaw currently preserves user-set thinking across resets. If you want to clear the session override entirely, use /think off first, then /think medium if you want it pinned.
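
The precedence described above can be sketched roughly as follows. This is an illustration in Python, not OpenClaw's implementation: `resolve_thinking_level` and its parameter names are hypothetical, and the only behavior taken from this thread is that a persisted session value wins over `thinkingDefault`, and that reasoning-capable models otherwise fall back to low.

```python
from typing import Optional


def resolve_thinking_level(
    session_level: Optional[str],
    thinking_default: Optional[str],
    model_fallback: str = "low",
) -> str:
    """Pick the effective level: session override, then config default, then fallback."""
    if session_level is not None:
        # A persisted /think value beats any configured default.
        return session_level
    if thinking_default is not None:
        return thinking_default
    return model_fallback
```

Under these assumptions, a thread that already stored low keeps sending low even after `thinkingDefault` is changed to medium, which matches the symptom reported here.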

Docs: https://docs.openclaw.ai/tools/thinking

If it still sends low after a standalone /think medium in the same thread, then yeah, that’s no longer just session state, that’s the actual bug path. And given the current code only special-cases gpt-5.1-codex-mini, I’d absolutely believe plain gpt-5.1-codex is still broken here.

If you want, paste:

  • output of /think
  • exact 400 error text after /think medium

and I’ll help narrow whether it’s stuck session state or the retry/transport bug.

fiery kelp
#

She says thinking medium but any message after that still hits:
LLM request failed: provider rejected the request schema or tool payload.

velvet birch
#

are you using 5.1?

steel barnBOT
#

I’m checking whether that generic error is the outer wrapper for the same reasoning bug or a second Codex/tool-schema issue.

#

Got it, then I’d stop blaming thinkingDefault.

If /think reports medium in that thread, but the next turn still dies, the visible message is just OpenClaw’s generic schema/tool-payload wrapper. We need the raw upstream 400 to know which bug it really is.

Please grab:

openclaw status --all
openclaw logs --follow

Then reproduce once and paste the first 400 / invalid_request_error block from the log.
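
If the log is noisy, a small filter can pull out the first matching block. This is a generic sketch in Python that assumes the log was saved as plain text lines; the OpenClaw log format is not shown in this thread, so the marker string and the amount of context are guesses, and `first_error_block` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def first_error_block(
    lines: Iterable[str],
    markers: Tuple[str, ...] = ("invalid_request_error",),
    context: int = 3,
) -> List[str]:
    """Return the first line containing a marker, plus `context` lines on each side."""
    buffered = list(lines)
    for i, line in enumerate(buffered):
        if any(marker in line for marker in markers):
            start = max(0, i - context)
            return buffered[start : i + context + 1]
    # No marker found anywhere in the input.
    return []
```

For example, after redirecting the log output to a file, `first_error_block(open("openclaw.log"))` would return the lines around the first `invalid_request_error`, where `openclaw.log` is just a placeholder filename.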

Docs:

#

My guess is it’s one of these two:

  1. the Codex path is still actually sending low even though session state says medium, or
  2. gpt-5.1-codex is rejecting the generated tool schema/payload on that route

Quick sanity check: if a plain chat question also fails, it’s probably still the reasoning payload. If it only fails on turns that need tools, it’s probably the tool-payload path.

If you paste the raw 400, I can usually tell which one immediately.

#

Not me, but the setup being discussed here is their openai-codex/gpt-5.1-codex, yeah.

If you meant “should they keep using 5.1 for now?”, I’d probably move off 5.1 temporarily and use gpt-5.4 if that’s available, until the raw 400 confirms whether this is the reasoning payload bug or a tool-schema bug.

fiery kelp
#

yeah so upgrading to 5.4 works - but worth noting that for 5.1 something in the embedded run path is building the API payload with low independently.

It sends thinking low even though the setting is medium

steel barnBOT
#

Yep, that’s the key detail.

If gpt-5.4 works, but gpt-5.1-codex still sends low while the session says medium, then this is no longer a session/defaults problem, it’s a 5.1 embedded-run payload construction bug.

That’s a solid bug report as-is:

  • OpenClaw 2026.4.15
  • affected model: openai-codex/gpt-5.1-codex
  • /think shows medium
  • embedded run still sends thinking/reasoning = low
  • gpt-5.4 does not reproduce
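
To confirm the mismatch from a captured request, a check along these lines could help. It is a sketch in Python that assumes the outgoing payload carries the level either as `reasoning.effort` (the OpenAI Responses API shape) or as `reasoning_effort` (the Chat Completions shape); which one the OpenClaw Codex route actually uses is not shown in this thread, and both function names are hypothetical.

```python
from typing import Any, Dict, Optional


def payload_effort(payload: Dict[str, Any]) -> Optional[str]:
    """Read the reasoning effort from a Responses-style or Chat-Completions-style payload."""
    reasoning = payload.get("reasoning")
    if isinstance(reasoning, dict) and "effort" in reasoning:
        return reasoning["effort"]
    return payload.get("reasoning_effort")


def effort_mismatch(session_level: str, payload: Dict[str, Any]) -> Optional[str]:
    """Describe the disagreement between session level and payload, or return None."""
    sent = payload_effort(payload)
    if sent is not None and sent != session_level:
        return f"session={session_level} but payload sends {sent}"
    return None
```

A payload such as `{"model": "gpt-5.1-codex", "reasoning": {"effort": "low"}}` checked against a session level of medium would be flagged, which is the case described in this thread.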

Temporary workaround: stay on gpt-5.4.

If you haven’t filed it yet, please open a GitHub issue and include the raw 400 / log snippet if you have it:
https://github.com/openclaw/openclaw/issues

#

That’s a much tighter repro now, which is great.

fiery kelp
#

Can't you file bugs?

steel barnBOT
#

I can’t open GitHub issues on anyone’s behalf from here, annoyingly. I’m read-only for repo ops.

I can hand you a paste-ready bug though:

Title
gpt-5.1-codex embedded runs send thinking=low even when session is medium

Body

## Version
OpenClaw 2026.4.15

## Affected model
`openai-codex/gpt-5.1-codex`

## Summary
After upgrading from 2026.2.26 to 2026.4.15, `gpt-5.1-codex` is treated as reasoning-capable and the embedded run path appears to build the API payload with `thinking/reasoning = low` independently, even when the effective session setting is `medium`.

OpenAI rejects `low` for this model. `gpt-5.4` works.

## What we observed
- `/think` reports `medium`
- requests using `gpt-5.1-codex` still send `low`
- result: provider rejects the request
- switching to `gpt-5.4` avoids the issue

## Expected
If the session thinking level is `medium`, the outgoing payload for `gpt-5.1-codex` should also use `medium` (or a model-supported remap), never `low`.

## Actual
The embedded run path appears to emit `low` anyway.

## Notes
This looks specific to the embedded run / request payload construction path, not just config/session state:
- changing `thinkingDefault` to `medium` did not fix it
- session `/think` state showed `medium`
- `gpt-5.4` did not reproduce

## Repro
1. Use OpenClaw 2026.4.15
2. Select `openai-codex/gpt-5.1-codex`
3. Set thinking to medium
4. Send a normal message
5. Observe request failure due to `low` reasoning/thinking being sent

## Extra logs