#Codex timeout

low pier
#

OpenClaw version 2026.5.7

The failures were assistant-run timeouts.

What the gateway logged:

• codex app-server turn idle timed out waiting for completion
• FailoverError: LLM request timed out.
• Embedded agent failed before reply: LLM request timed out.

This happened at:

• 18:51:03 UTC
• 19:00:05 UTC
• 19:04:46 UTC

Then Telegram sent the generic error message successfully:

• message 279
• message 290
• message 296

There were also liveness warnings around those periods:

• event_loop_delay
• max delay roughly 1.1s to 1.5s

So the root cause is:

• the embedded GPT/Codex assistant run did not finish before the gateway/app-server timeout window
• failover had no usable next candidate, so the turn died and Telegram got the generic error

So this is a gateway/model runtime timeout problem.

calm iron BOT
#

Yep — that read is consistent with the log sequence. Telegram is downstream here: it successfully delivered the generic failure messages, so I’d treat the failure as the embedded assistant/model runtime timing out before OpenClaw could produce a reply.

The two knobs I’d check next:

  1. If the selected provider/model is slow or intermittently stalls, increase the provider/model request idle timeout, not just channel settings:
{
  models: {
    providers: {
      openai: { timeoutSeconds: 180 }
      // or whatever provider id you’re actually using
    }
  }
}

Docs: https://docs.openclaw.ai/concepts/agent-loop#timeouts

  2. Make sure the default model has a real fallback, otherwise FailoverError: LLM request timed out has nowhere useful to go:
{
  agents: {
    defaults: {
      model: {
        primary: "openai-codex/…",
        fallbacks: ["anthropic/…", "openai/…"]
      }
    }
  }
}

Docs: https://docs.openclaw.ai/gateway/config-agents

The event_loop_delay warnings at ~1.1–1.5s are worth noting, but I wouldn’t call those the primary cause by themselves unless they line up with broader CPU/memory pressure. They can make streaming/heartbeats less smooth, but the fatal line is the model/app-server idle timeout.

If this is reproducible, the most useful safe bundle is:

openclaw status --all
openclaw models status
openclaw logs --since "2026-05-10T18:45:00Z"

with tokens redacted. If it’s specifically Codex app-server repeatedly idling, I’d also try a non-Codex fallback once to separate “OpenClaw runtime timeout” from “Codex backend/app-server stall.”
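
One way to handle the "tokens redacted" step is a small filter run over the log output before posting. This is a minimal sketch, not an OpenClaw feature: the three patterns (a Telegram-bot-token shape, an `sk-` style API key, and a `Bearer` header value) are assumptions about what might appear in your logs, so extend them to cover your own secrets.

```python
import re
from typing import Iterable, TextIO

# Assumed secret shapes; none of these come from OpenClaw itself.
REDACTIONS = [
    # Telegram bot token shape: numeric bot id, colon, long secret
    (re.compile(r"\d{6,}:[A-Za-z0-9_-]{30,}"), "[REDACTED_BOT_TOKEN]"),
    # OpenAI-style API key prefix
    (re.compile(r"sk-[A-Za-z0-9_-]{16,}"), "[REDACTED_API_KEY]"),
    # Authorization header value; the "Bearer " prefix is kept for context
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/-]{16,}=*"), r"\1[REDACTED]"),
]


def redact(line: str) -> str:
    """Apply every redaction pattern to a single log line."""
    for pattern, replacement in REDACTIONS:
        line = pattern.sub(replacement, line)
    return line


def redact_stream(src: Iterable[str], dst: TextIO) -> None:
    """Copy lines from src to dst, redacting each one on the way."""
    for raw in src:
        dst.write(redact(raw))
```

Calling `redact_stream(sys.stdin, sys.stdout)` at the bottom of the script lets you pipe log output through it. Skim the result by eye before posting, since pattern-based redaction can miss things.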

low pier
#

I am getting 100% CPU usage, which never happened on older versions.

calm iron BOT
#

100% CPU changes the diagnosis a bit: that can absolutely explain the event_loop_delay warnings and can make Codex/app-server streaming look idle until OpenClaw hits the timeout.

First thing is to identify which process is pegged — gateway Node, Codex/app-server, a local model server, browser, or something else:

ps -eo pid,ppid,%cpu,%mem,etime,comm,args --sort=-%cpu | head -25
openclaw gateway stability
openclaw status --all

If the hot process is node/OpenClaw Gateway, grab the stability output — 2026.5.x records liveness samples with event-loop delay, CPU-core ratio, active/queued sessions, and phase/work labels. Docs: https://docs.openclaw.ai/gateway/diagnostics#stability-recorder

If the hot process is Codex/app-server, the earlier timeout is probably a symptom of that process spinning or saturating. In that case, the best quick mitigations are:

  • restart the gateway to clear stuck embedded runs:
openclaw gateway restart
  • temporarily switch the Telegram/default model away from Codex or add a non-Codex fallback, to confirm it’s Codex-runtime-specific
  • reduce parallel pressure if multiple turns/cron/jobs are running: lower agents.defaults.maxConcurrent

If this only started after 2026.5.7 and CPU stays pinned while idle, that smells like a regression or stuck runtime loop. Please capture the ps output + openclaw gateway stability before restarting if you can — that’s the useful evidence for a GitHub issue: https://github.com/openclaw/openclaw/issues

low pier
#

The VPS spec is 2 vCPU, 4 GB RAM, 40 GB.

calm iron BOT
#

That VPS size is fine for a basic OpenClaw gateway + Telegram + cloud model use, but it’s not a lot of headroom. On 2 vCPU, one runaway Node/Codex/browser/local-model process can show as 100% very quickly.

Given this never happened before, I’d separate “undersized VPS” from “2026.5.7 regression/stuck process” like this:

ps -eo pid,ppid,%cpu,%mem,etime,comm,args --sort=-%cpu | head -25
free -h
df -h
openclaw gateway stability

If the hot process is the gateway or Codex app-server while no task is actively running, that’s not normal. Capture that output before restart and file it here: https://github.com/openclaw/openclaw/issues

Quick mitigation for a 2 vCPU VPS:

{
  agents: {
    defaults: {
      maxConcurrent: 1
    }
  }
}

Also avoid running browser automation or local models on that VPS if possible. 4GB RAM is okay for the gateway, but browser + Codex/runtime + channels + logs can get spicy. 🦞

low pier
#

What I checked:

• Host load right now is modest: 0.48, 2.34, 2.91
• RAM is not exhausted: about 2.0 GiB available
• Gateway is running and has been up since 2026-05-10 14:12 UTC
• Gateway logs do show repeated liveness warning: event_loop_delay, but the CPU ratios in those warnings are roughly 0.03 to 0.40, not a box pinned at 100%

The stronger signal is in the gateway logs:

• subagent sat in processing for about 615s
• last progress was codex_app_server:notification:item/started
• OpenClaw eventually did stuck session recovery: abort_embedded_run
• that is much more consistent with a Codex/OpenClaw runtime stall or model-side no-progress timeout than pure CPU starvation

There is mild performance pressure on the gateway though:
• gateway memory is high at about 1.8G now, peak 3.3G
• event loop delay warnings are real

calm iron BOT
#

That’s a much better signal. I’d stop treating this as “VPS is too small / CPU pinned” and treat it as: Codex app-server accepted the turn, emitted item/started, then stopped producing progress until OpenClaw’s stuck-session recovery aborted it.

The gateway pressure is still relevant — 1.8G RSS / 3.3G peak on a 4G VPS is uncomfortable — but the 615s no-progress subagent is the smoking gun.

Useful next checks:

/codex status
/codex diagnostics "turn stalls after codex_app_server:notification:item/started; OpenClaw abort_embedded_run after ~615s"
openclaw gateway stability --json
openclaw logs --since "2026-05-10T18:45:00Z"

For mitigation, I’d try these in order:

  1. Restart once to clear the stuck Codex/app-server state:
openclaw gateway restart
  2. Keep concurrency low on that VPS:
{
  agents: {
    defaults: {
      maxConcurrent: 1
    }
  }
}
  3. If you need reliability immediately, temporarily route the affected agent away from native Codex app-server and use PI/direct OpenAI API instead, or use a non-Codex fallback. For OpenAI refs, don’t use new openai-codex/gpt-* model refs: current docs say openai/gpt-* is the Codex-backed path, while openai-codex is auth-profile naming.
  4. Don’t “solve” this by only raising timeouts yet. The Codex harness has app-server knobs like plugins.entries.codex.config.appServer.requestTimeoutMs and turnCompletionIdleTimeoutMs, but your case looks like a true no-progress stall. Raising them may just wait longer before recovery.

If this repeats after a clean restart on 2026.5.7, I’d file it with the exact item/started → ~615s → abort_embedded_run sequence plus stability JSON: https://github.com/openclaw/openclaw/issues

Docs for the relevant bit: https://docs.openclaw.ai/plugins/codex-harness-reference
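
To make the item/started → abort gap easy to point at in an issue, a small helper can find the widest silence between consecutive events. This is only a sketch: it assumes you have already pulled (timestamp, label) pairs out of the gateway log yourself, since the exact log line format is not shown here. The timestamps are synthetic, chosen to mirror the ~615s stall, and `turn/started` is a hypothetical label.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Event = Tuple[datetime, str]


def longest_gap(events: List[Event]) -> Optional[Tuple[float, str, str]]:
    """Return (seconds, label_before, label_after) for the widest gap
    between consecutive events, or None if there are fewer than two."""
    ordered = sorted(events, key=lambda e: e[0])
    best: Optional[Tuple[float, str, str]] = None
    for (prev_time, prev_label), (next_time, next_label) in zip(ordered, ordered[1:]):
        gap = (next_time - prev_time).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, prev_label, next_label)
    return best


# Synthetic data for illustration only; not taken from real logs.
start = datetime(2026, 1, 1, 0, 0, 0)
events = [
    (start, "turn/started"),  # hypothetical label
    (start + timedelta(seconds=2), "codex_app_server:notification:item/started"),
    (start + timedelta(seconds=617), "abort_embedded_run"),
]
print(longest_gap(events))
```

A widest gap that starts at item/started and ends at abort_embedded_run matches the "accepted the turn, then went silent" pattern, as opposed to many small gaps, which would look more like general slowness.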

low pier
#

I just tried switching the runtime to PI, but I see this:

🦞 OpenClaw 2026.5.7 (eeef486)
⏱️ Uptime: gateway 5h 34m · system 19h 11m
🧠 Model: openai-codex/gpt-5.5 · 🔑 oauth (openai-codex:x@gmail.com (x@gmail.com))
🔄 Fallbacks: openai-codex/gpt-5.4
🗄️ Cache: 100% hit · 136k cached, 0 new
📚 Context: 0/200k (0%) · 🧹 Compactions: 0
📊 Usage: 5h 92% left ⏱1h 48m · Week 76% left ⏱3d 1h
🧵 Session: agent:kim:telegram:direct:xxxxx • updated 1m ago
⚙️ Execution: direct · Runtime: OpenAI Codex · Think: high · Text: low
🪢 Queue: steer (depth 0)

calm iron BOT
#

That status means the PI change didn’t actually take for this session/model route.

Two tells:

  • Runtime: OpenAI Codex = still using native Codex app-server, not PI.
  • Model: openai-codex/gpt-5.5 = legacy model prefix. Current docs say openai-codex is the auth/profile provider; model refs should normally be openai/gpt-5.5. Doctor rewrites legacy openai-codex/gpt-* routes to openai/gpt-* unless PI is explicitly pinned.

Check the effective config for this exact agent (kim):

openclaw config get agents.defaults.model --json
openclaw config get models.providers.openai.agentRuntime --json
openclaw config get agents.list --json | grep -n "kim" -A40 -B10
openclaw models status --agent kim

To force PI for OpenAI/Codex-auth model turns, the runtime policy needs to be explicit, e.g. provider-wide:

{
  models: {
    providers: {
      openai: {
        agentRuntime: { id: "pi" }
      }
    }
  },
  agents: {
    defaults: {
      model: {
        primary: "openai/gpt-5.5",
        fallbacks: ["openai/gpt-5.4"]
      }
    }
  }
}

Then run:

openclaw doctor --fix
openclaw config validate
openclaw gateway restart

After that, /status should no longer say Runtime: OpenAI Codex. If it still does, there’s probably an agent-specific override or stale session route pin for agent:kim:telegram:direct:... winning over defaults.

Docs: https://docs.openclaw.ai/providers/openai

low pier
#

I have moved to PI now. I will monitor, thanks.

calm iron BOT
#

Sounds good. If it stays stable on PI, that strongly points back at the Codex app-server path rather than Telegram or the VPS itself.

If it stalls again, grab /status, openclaw gateway stability, and the log lines around the stall before restarting. That’ll make the next step much clearer.

cyan sky
#

Codex OAuth login issue

calm iron BOT
#

For Codex OAuth, first thing: what’s your openclaw --version and the exact command/error?

Quick sanity path meanwhile:

openclaw models auth login --provider openai-codex --set-default
openclaw models auth list --provider openai-codex
openclaw config get agents.defaults.model --json
openclaw models status

A common gotcha: keep the model as openai/gpt-5.5; openai-codex is the auth provider/profile id, not the normal model ref. If you have older openai-codex/* model config, run:

openclaw doctor --fix

If you’re on a headless/VPS box, do the OAuth flow on a machine with a browser, then copy the matching ~/.openclaw/agents/<agentId>/agent/auth-profiles.json to the gateway host.

cyan sky
#

openclaw --profile shigga

calm iron BOT
#

Yep — if that’s the profile you’re using, run the same checks against that state dir:

openclaw --profile shigga --version
openclaw --profile shigga models auth login --provider openai-codex --set-default
openclaw --profile shigga models auth list --provider openai-codex
openclaw --profile shigga config get agents.defaults.model --json
openclaw --profile shigga models status

If the gateway service is also supposed to use that profile, make sure it was installed/run with the same profile too:

openclaw --profile shigga gateway status

Profile mismatch is a classic “I logged in but runtime still says no auth” trap because --profile shigga uses ~/.openclaw-shigga, not the default ~/.openclaw.
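
The profile rule above can be written down as a tiny helper, handy for checking which directory a login actually landed in. This sketch is based only on the naming stated above (`~/.openclaw` by default, `~/.openclaw-<profile>` with `--profile`); `state_dir` is a hypothetical helper, not part of OpenClaw.

```python
from pathlib import Path
from typing import Optional


def state_dir(profile: Optional[str] = None, home: Optional[Path] = None) -> Path:
    """Return the state directory implied by the naming rule described above."""
    base = home if home is not None else Path.home()
    name = f".openclaw-{profile}" if profile else ".openclaw"
    return base / name


# Compare the default directory with the one a named profile would use.
for profile in (None, "shigga"):
    path = state_dir(profile)
    print(f"profile={profile!r}: {path} (exists: {path.exists()})")
```

If the login was run with `--profile shigga` but only the default directory exists or changes, that points at the profile mismatch described above.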

cyan sky
#

OpenAI Codex OAuth callback did not arrive within 15000ms; switching to manual entry (callback_timeout).
◇ Manual OAuth entry required

calm iron BOT
#

That message usually isn’t fatal — it means the localhost browser callback didn’t complete within 15s, so OpenClaw fell back to manual OAuth entry.

Try this:

  1. After the browser login redirects/fails, copy the full redirect URL from the address bar and paste it into the terminal prompt that says Paste the authorization code (or full redirect URL):
  2. If you’re on a VPS/headless box, skip the localhost callback path and use:
    openclaw models auth login --provider openai-codex --device-code
    
  3. If it still fails, please paste:
    openclaw --version
    openclaw models auth list --provider openai-codex
    openclaw models status
    

Docs: https://docs.openclaw.ai/providers/openai
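
Before pasting into the manual-entry prompt, it can help to confirm that the copied redirect URL actually carries an authorization code. This is a minimal sketch using only the Python standard library. It assumes the standard OAuth 2.0 `code` and `state` query parameters, and the example URL is made up, not the real callback address.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def extract_code(redirect_url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (code, state) from a redirect URL's query string.

    Either value is None when the parameter is absent.
    """
    query = parse_qs(urlsplit(redirect_url).query)
    code = query.get("code", [None])[0]
    state = query.get("state", [None])[0]
    return code, state


# Made-up URL for illustration; the real host, port, and path will differ.
example = "http://localhost/callback?code=EXAMPLE_CODE&state=EXAMPLE_STATE"
print(extract_code(example))
```

If `code` comes back as `None`, the URL was probably copied before the redirect finished or got truncated, so repeat the browser step and copy the full address bar again.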

low pier
#

Moving to the PI runtime solved everything.