Hi, I have been facing an issue with using 5.3-codex as my main agent. I have mostly used Opus on main, with MiniMax and Kimi sometimes for lighter tasks. Codex works great as a separate agent, but when I try to use it on main, it tells me that it’s doing something but doesn’t actually make any tool calls. I have tried resetting the session multiple times and also changing AGENTS.md. Nothing seems to work. Anyone else face this?
#OpenAI model pre-confirmation
What you’re describing (“it narrates that it’s doing things” but never emits tool calls) usually boils down to one of two cases:
- The model/provider combo isn’t actually tool-calling in OpenClaw (so it will describe actions instead of producing tool invocations), or
- Tools are being called but you’re not seeing the tool bubbles (Discord + non-verbose mode can make this feel like “nothing happened”).
A quick way to disambiguate:
1) Turn on tool visibility in that session
In the same chat/session where you set Codex as main, send:
/verbose full
Then ask something that must use a tool, e.g. “use web_fetch on https://example.com and summarize”.
Docs: https://docs.openclaw.ai/tools/thinking (see “Verbose directives”)
2) Grab the 3 “safe to paste” diagnostics
On the gateway host, please paste outputs of:
openclaw --version
openclaw status --all
openclaw models status
(These are the exact ones the FAQ recommends for “I’m stuck” debugging.)
Docs: https://docs.openclaw.ai/help/faq
3) Check logs for the smoking gun
While reproducing once, run:
openclaw logs --follow
If you see anything like “tools unsupported”, “tool schema rejected”, “request rejected”, or parsing errors, that’s likely a compatibility bug/limit with that model as the primary agent (the main agent often has more tools/skills attached than a minimal sub-agent). If that’s what it is, the best next step is filing a GitHub issue with the log snippet: https://github.com/openclaw/openclaw/issues
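If the log is long, a small Python filter can pull out just the candidate lines before you paste. This is a rough sketch: the patterns are the phrases quoted above, and the exact wording OpenClaw (or the provider) actually logs is an assumption, so widen them if nothing matches.

```python
import re
from typing import Iterable, List, Tuple

# Phrases quoted above as "smoking guns". The real log wording is an
# assumption; adjust these patterns to whatever your logs actually say.
SMOKING_GUNS = [
    re.compile(r"tools unsupported", re.IGNORECASE),
    re.compile(r"tool schema rejected", re.IGNORECASE),
    re.compile(r"request rejected", re.IGNORECASE),
    re.compile(r"pars(e|ing) error", re.IGNORECASE),
]


def find_smoking_guns(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every line matching a marker."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if any(p.search(line) for p in SMOKING_GUNS):
            hits.append((lineno, line.rstrip("\n")))
    return hits
```

Usage idea: save one reproduction’s openclaw logs --follow output to a file (run.log is just a hypothetical name), loop over find_smoking_guns(open("run.log")), and paste the hits plus a few lines of context around each.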
Pragmatic workaround (if you just want to keep moving): keep a tool-reliable model as the main agent, and route coding work to the Codex agent via sub-agents (so you still get Codex where it shines, without breaking tools in the main loop).
If you paste the openclaw status --all + the relevant log lines, I can tell which branch you’re on in one pass.
Can I share a public gist?
Yes — a public gist is perfect, just please redact secrets first.
What’s generally safe/helpful to include (and most useful for debugging):
- openclaw --version
- openclaw status --all (it’s designed to be shareable; tokens get redacted)
- openclaw models status
- A short log excerpt around the failed run: openclaw logs --follow → reproduce once → copy ~50–150 lines around the error
What not to post publicly (or replace with ***):
- Any API keys / OAuth tokens / bot tokens (Discord/Telegram/etc.)
- gateway.auth.token
- Anything under channels.*.token, cookies, “Authorization:” headers
- Full ~/.openclaw/openclaw.json if it contains secrets (better to paste only the relevant model/tool sections)
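Before uploading a raw log excerpt, a quick scripted pass can catch the obvious secrets. A minimal Python sketch, assuming secrets look like the patterns below (Authorization headers, any key ending in "token" followed by : or =, and sk-style API keys). These are guesses, not a complete list of what OpenClaw or your providers emit, so still read the result before posting.

```python
import re

# Illustrative patterns only; not an exhaustive list of secret formats.
_PATTERNS = [
    # "Authorization: Bearer abc..." -> keep the header name, mask the value
    (re.compile(r"(Authorization:\s*)(\S.*)", re.IGNORECASE), r"\1***"),
    # "token": "value" or a.b.token=value (e.g. gateway.auth.token, channels.*.token)
    (re.compile(r'("?[\w.]*token"?\s*[:=]\s*"?)([^"\s,}]+)', re.IGNORECASE), r"\1***"),
    # sk-style API keys
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "***"),
]


def redact(text: str) -> str:
    """Apply every masking pattern in order and return the redacted text."""
    for pattern, repl in _PATTERNS:
        text = pattern.sub(repl, text)
    return text
```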
If you want, paste the gist link here and (optionally) tell me what OS + install type you’re on (npm vs git), and I’ll point out exactly what in the logs indicates “model didn’t tool-call” vs “tools blocked / not visible”.
I can see the gist, but two important notes from the part I can fetch:
- There’s no Codex run in the snippet you posted. All the embedded run start lines I can see are provider=anthropic or provider=minimax-portal (plus a bunch of cron/heartbeat activity). So we don’t yet have the “main agent on gpt-5.3-codex” reproduction in the logs.
- Your gateway is currently pretty noisy / partially unhealthy, which can make “it says it’s doing stuff” feel weird:
  - HTTP 401 authentication_error: Invalid bearer token (Anthropic)
  - lots of API rate limit reached + “No available auth profile for anthropic (all in cooldown…)”
  - Discord websocket reconnects (code 1006) sprinkled in the tail
What I’d do next (quick + decisive)
A) In the chat where Codex is “main”, enable tool visibility:
/verbose full
Then ask something that forces a tool, e.g. “Use web_fetch on https://example.com and tell me the title.”
(Without verbose, successful tool calls often don’t show as separate “tool bubbles”, so it can look like “no tools happened”.)
B) Capture the Codex run in logs
Run openclaw logs --follow, reproduce once with Codex as main, then paste just the lines around:
- provider=openai-codex (or whatever provider it resolves to)
- any embedded run tool start / tool errors
C) Paste openclaw models status
This will confirm whether Codex auth is actually healthy, and whether you’re silently failing over to Anthropic (which currently looks broken on your machine).
If you drop those two (models status + the log chunk that includes a Codex run start), we can tell quickly whether it’s:
- “tools happened but weren’t visible” vs
- “Codex provider never got selected” vs
- “Codex selected but tool-calling is failing” (then it’s GitHub-issue-worthy with that log excerpt).
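Once you have the models status output and a log chunk, the triage is mechanical. A Python sketch of the decision, where RunSummary is a hypothetical record you fill in by hand from the log (which provider the run resolved to, how many tool-start lines, how many tool errors); it is not something OpenClaw outputs.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-run facts, filled in by hand from the log chunk."""
    provider: str         # provider the run resolved to, e.g. "openai-codex"
    tool_calls: int       # tool-start lines seen for this run
    tool_errors: int = 0  # tool errors logged for this run


def classify(run: RunSummary, expected_provider: str = "openai-codex") -> str:
    if run.provider != expected_provider:
        # e.g. silently failing over to anthropic
        return "Codex provider never got selected"
    if run.tool_calls > 0 and run.tool_errors == 0:
        # tools ran; without /verbose full they just weren't shown
        return "tools happened but weren't visible"
    # Codex ran but emitted no tool calls, or its tool calls errored
    return "Codex selected but tool-calling is failing"
```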
In that second gist: this doesn’t look like a “Codex early termination” bug so much as sessions_send doing exactly what it’s documented to do.
sessions_send is agent-to-agent messaging, and it runs a little handshake:
- reply-back ping‑pong (up to session.agentToAgent.maxPingPongTurns)
- then an announce step (your log literally says “announce step auto-queued”)
Docs: https://docs.openclaw.ai/tools (search for sessions_send — it mentions the ping‑pong + announce + REPLY_SKIP / ANNOUNCE_SKIP)
So what you’re seeing:
- Run 1: Codex replies to the message with a quick “Acknowledged…” (text-only). That’s a successful completion from the gateway’s POV (not an error), so there’s no retry/failover trigger.
- Runs 2–4: those tiny promptChars=79 / 35 / 56 runs line up perfectly with the ping‑pong/announce follow-ups, not “real work”.
What to do instead (reliable tool use)
If you want “do the coding task, use tools, then report back”, use sessions_spawn (sub-agent run), not sessions_send.
sessions_spawn is the “run a task” API; sessions_send is “send a message / have a convo”.
If you must keep sessions_send
You can reduce the noise by having the target agent reply with:
- REPLY_SKIP (to stop the ping‑pong), and/or
- ANNOUNCE_SKIP (to suppress the announce step)
…but fundamentally, it still won’t force tool calls. If Codex chooses to answer in plain text, OpenClaw treats that as “done”.
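To see why one sessions_send produces several short runs, here’s a toy Python model of the handshake as described above: one run to answer the message, up to maxPingPongTurns ping‑pong runs (cut short by REPLY_SKIP), then an announce run whose message is suppressed by ANNOUNCE_SKIP. It is an illustration under those assumptions, not OpenClaw’s implementation, and it collapses both sides of the ping‑pong into a single reply callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List

REPLY_SKIP = "REPLY_SKIP"        # reply that ends the ping-pong early
ANNOUNCE_SKIP = "ANNOUNCE_SKIP"  # reply that suppresses the announce message


@dataclass
class Handshake:
    runs: List[str] = field(default_factory=list)  # one label per agent run
    announced: bool = False


def simulate_sessions_send(reply: Callable[[str], str], max_ping_pong_turns: int) -> Handshake:
    """Toy model of the sessions_send flow described in this thread (not OpenClaw source).

    reply(kind) stands in for one agent run; kind is "message", "ping-pong" or "announce".
    """
    h = Handshake()
    reply("message")  # run 1: answer the original message (text-only counts as done)
    h.runs.append("message")
    for turn in range(1, max_ping_pong_turns + 1):
        h.runs.append(f"ping-pong {turn}")
        if reply("ping-pong").strip() == REPLY_SKIP:
            break
    h.runs.append("announce")  # in this toy the announce step always runs
    h.announced = reply("announce").strip() != ANNOUNCE_SKIP
    return h
```

With a chatty agent (reply always returns plain text) and max_ping_pong_turns=2, the toy yields four runs; a target that answers REPLY_SKIP and ANNOUNCE_SKIP yields three runs and no announcement. Either way, nothing in the loop forces a tool call.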
Answering your 3 questions
- Yes, I’ve seen “acknowledge then stop” with Codex, but your multi-run pattern is mostly the sessions_send handshake.
- Not currently: text-only is not an error, so there’s no automatic retry/failover. That would be a feature request.
- Yes — the announce/ping‑pong is very plausibly what’s nudging it into “conversational” mode.
If you want, paste the snippet of the code/automation where you call sessions_send (just the call + params, no secrets) and I’ll point to the exact swap to sessions_spawn.
This isn't a sessions_send issue. The same behavior happens when Codex is the primary agent receiving direct Discord messages — text-only response, no tool calls, run_completed in <4 seconds. The sessions_send handshake just made it more visible in the logs.
The core bug: gpt-5.3-codex intermittently chooses to emit end_turn with text instead of making tool calls, even when the task clearly requires them. Same session, same context, works fine minutes later. See GitHub issue #28754 for full logs.
Yep — if it’s happening on direct Discord messages too (not just sessions_send), then you’re describing the same underlying failure mode as in openclaw/openclaw#28754: Codex returns a perfectly valid assistant completion (end_turn / run_completed) but never emits any toolUse blocks, so the gateway has nothing to execute and considers the run “done”.
That’s also why it feels intermittent/clustered: once Codex falls into “conversational acknowledgement mode”, OpenClaw has no hard signal to trigger retry/failover.
A couple of pragmatic things worth trying while the issue gets fixed upstream / in OpenClaw:
1) Force Codex transport (this one is surprisingly effective for flaky runs)
OpenClaw’s Codex provider supports a transport override (WebSocket vs SSE). If one of those paths is occasionally dropping tool-call frames, you’ll get exactly “fast text-only end_turn”.
Config (JSON5) idea:
{
  agents: {
    defaults: {
      models: {
        "openai-codex/gpt-5.3-codex": {
          params: { transport: "sse" } // try "sse", and if that’s worse try "websocket"
        }
      }
    }
  }
}
Docs: https://docs.openclaw.ai/providers/openai (see “Codex transport default”)
2) Reduce the tool schema size for Codex (less to ignore)
If you’re running “full” tools with a giant system prompt, try tightening just for Codex via tool profiles (e.g. coding) so the model gets a smaller, more focused tool list.
Docs: https://docs.openclaw.ai/tools (see tools.profile + tools.byProvider)
3) Tactical prompt workaround when it starts happening
When you notice the “Acknowledged, I’ll do it” pattern, resend with an explicit first tool call, e.g. “Start by running exec: pwd && ls and paste output, then proceed.”
Not a real fix, but it often snaps tool-use back on.
If you can, adding one datapoint to #28754 would help a ton: whether forcing SSE vs WebSocket changes the frequency of the “no tool calls” clusters.
gpt-5.3-codex (SSE transport):
14:05:13 → 14:05:20 (7s) — 0 tool calls, text only ❌
14:05:43 → 14:05:48 (5s) — 0 tool calls, text only ❌
gpt-5.2-codex (same session, same context, 2 min later):
14:07:17 → still running (3+ min) — 13 exec calls and counting ✅
Same agent, same 37K system prompt, same 151K history, same auth. 5.3 gives text and stops. 5.2 executes.
Found another similar issue:
That’s a really strong A/B — thanks for running it.
Given:
- same session/context/auth
- same tool schema
- SSE doesn’t help
- 5.2-codex tool-calls normally
- 5.3-codex end_turns quickly with “acknowledged”
…this pretty much pins it on gpt-5.3-codex itself (or the openai-codex provider adapter for that specific model), not your setup or OpenClaw tool policy.
A couple of actionable takeaways / what I’d recommend posting back into #28754:
What you’ve proven
- It’s not transport-specific (SSE still fails).
- It’s model-specific (5.2 works, 5.3 fails) under identical conditions.
- The failure mode is “valid text completion, stopReason != toolUse” → gateway treats as success → no retries/fallback.
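To put numbers on the “fast text-only end_turn” pattern (and on SSE vs WebSocket frequency for #28754), a small Python tally over runs transcribed from the logs is enough. Sketch only: the tuple layout, the same-day HH:MM:SS timestamps, and the 10-second cutoff are assumptions; the two sample runs are the 5.3-codex ones from the A/B above (the 5.2-codex run had no end time yet, so it is left out).

```python
from datetime import datetime
from typing import List, Tuple

# (model, start HH:MM:SS, end HH:MM:SS, tool calls), transcribed by hand from logs.
Run = Tuple[str, str, str, int]


def duration_s(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return int((datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds())


def is_suspect(run: Run, max_seconds: int = 10) -> bool:
    """Fast run with zero tool calls. The cutoff is arbitrary, not an OpenClaw setting."""
    _, start, end, tool_calls = run
    return tool_calls == 0 and duration_s(start, end) <= max_seconds


def suspect_rate(runs: List[Run]) -> float:
    """Fraction of runs that look like text-only early end_turns."""
    return sum(is_suspect(r) for r in runs) / len(runs) if runs else 0.0


RUNS: List[Run] = [
    ("gpt-5.3-codex", "14:05:13", "14:05:20", 0),
    ("gpt-5.3-codex", "14:05:43", "14:05:48", 0),
]
```

Running the same tally over an SSE batch and a WebSocket batch gives the frequency comparison asked for above.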
Practical workaround right now
- Use openai-codex/gpt-5.2-codex as the default for any agent that must reliably use tools.
- Keep 5.3-codex only as an “interactive coder / chatty reasoning” model (or behind a manual /model switch when you can babysit it).
Engineering-y mitigation idea (feature request)
OpenClaw could optionally treat “text-only completion when tools are available + task looks tool-requiring” as a soft failure and automatically:
- re-prompt once (“You must start with a tool call; do not answer in text”), then
- if still no tool calls, fallback to the next model.
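A minimal Python sketch of that soft-failure policy, assuming a hypothetical call(model, prompt) function that returns the completion’s text, stop reason and tool calls. None of these names are OpenClaw APIs, and deciding whether a task “looks tool-requiring” is reduced to a boolean flag here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Completion:
    text: str
    stop_reason: str       # e.g. "toolUse" or "end_turn"
    tool_calls: List[str]  # names of tools the model asked to run


# Hypothetical: (model, prompt) -> Completion. Not an OpenClaw API.
CallModel = Callable[[str, str], Completion]

NUDGE = "You must start with a tool call; do not answer in text."


def is_soft_failure(c: Completion, tools_available: bool, task_needs_tools: bool) -> bool:
    """A valid text-only completion where tools were available and needed."""
    return tools_available and task_needs_tools and c.stop_reason != "toolUse" and not c.tool_calls


def run_with_soft_failover(
    call: CallModel,
    models: Sequence[str],
    prompt: str,
    tools_available: bool = True,
    task_needs_tools: bool = True,
) -> Optional[Tuple[str, Completion]]:
    """Try each model in order: one normal attempt, one nudged re-prompt, then fall back."""
    for model in models:
        for attempt_prompt in (prompt, f"{prompt}\n\n{NUDGE}"):
            c = call(model, attempt_prompt)
            if not is_soft_failure(c, tools_available, task_needs_tools):
                return model, c
    return None  # every model stayed text-only
```

Returning None when every model stays text-only keeps “exhausted the chain” distinguishable from a real success.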
This would also address the Telegram symptom in #26494 (single message then stops).
And yes: #26494 looks like the same class of issue (“stops immediately, no action executed”) just observed on Telegram, so linking your A/B results there (and vice versa) is useful.
If you want to make the workaround super crisp in config terms, tell me whether you’re setting the primary model via agents.defaults.model.primary or per-agent agents.list[].model, and I’ll paste the exact minimal JSON5 snippet to pin that agent to 5.2 while keeping 5.3 available via /model.
Ok