#all sessions stuck after 2026.4.14 update (+ openai-codex continuous token refresh fail)

30 messages · Page 1 of 1 (latest)

slender pebble

openclaw version 2026.4.14

error [openai-codex] Token refresh failed: 401 {
"error": {
"message": "Your refresh token has already been used to generate a new access token. Please try signing in again.",
"type": "invalid_request_error",
"param": null,
"code": "refresh_token_reused"
}
}

all sessions get stuck after one message via webchat ui (regardless of model used).

royal skyBOT

those two symptoms are probably related.

refresh_token_reused is a permanent Codex OAuth failure, not a transient one. If openai-codex is still in your default/fallback/background paths, it can spam refresh attempts and make unrelated webchat sessions look stuck too.
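To make the permanent-vs-transient distinction concrete, here is a minimal Python sketch that reads a 401 body shaped like the one above and decides whether retrying could ever help. Only the refresh_token_reused code comes from this thread. The PERMANENT_OAUTH_CODES set and the needs_reauth() helper are hypothetical illustrations, not OpenClaw's actual retry logic.

```python
import json

# Hypothetical list: only "refresh_token_reused" is taken from the error in this thread.
PERMANENT_OAUTH_CODES = {"refresh_token_reused"}


def needs_reauth(status: int, body: str) -> bool:
    """Return True when retrying the refresh cannot help and a fresh login is needed."""
    if status != 401:
        return False  # other statuses (e.g. the 408 timeouts later in this thread) are not auth failures
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") in PERMANENT_OAUTH_CODES


body = """{"error": {"message": "Your refresh token has already been used to generate a new access token. Please try signing in again.",
  "type": "invalid_request_error", "param": null, "code": "refresh_token_reused"}}"""

print(needs_reauth(401, body))  # True: retrying will keep failing, sign in again
```

The point of the split: a retry loop on a permanent code only adds noise, which is how a dead Codex profile can keep spamming refresh attempts in the background.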

Try this in order:

  1. Re-auth Codex

    openclaw models auth login --provider openai-codex
    
  2. Restart the gateway

    openclaw gateway restart
    
  3. Check full status

    openclaw status --all
    openclaw doctor

  4. If you need a quick workaround, temporarily remove openai-codex/* from:

    • primary model
    • fallbacks
    • cron/heartbeat/background jobs

    then restart again.

If it still hangs after one webchat message, please paste:

openclaw status --all
openclaw doctor
openclaw logs --follow

and then send one webchat message while logs are running.

Docs:

Also, if you are not pinned to 2026.4.14, I’d try the newest build too; there have been several recent Codex OAuth fixes.

If re-auth + restart + removing Codex from the model chain still leaves webchat stuck, I’d file that as a bug here: https://github.com/openclaw/openclaw/issues

slender pebble

this is what we have in the logs. tried new codex token multiple times with no solution. xai is also giving the same issue:

07:46:13+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"9c3f28cb-7c7f-442f-bf93-138ad2564b87","isError":true,"error":"terminated","failoverReason":null,"model":"gpt-5.3-codex","provider":"openai-codex","rawErrorPreview":"terminated","rawErrorHash":"sha256:e8c95a2a7271","providerRuntimeFailureKind":"unknown"} embedded run agent end
07:46:24+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"9c3f28cb-7c7f-442f-bf93-138ad2564b87","isError":true,"error":"terminated","failoverReason":null,"model":"gpt-5.3-codex","provider":"openai-codex","rawErrorPreview":"terminated","rawErrorHash":"sha256:e8c95a2a7271","providerRuntimeFailureKind":"unknown"} embedded run agent end
07:46:24+00:00 warn model-fallback/decision {"subsystem":"model-fallback/decision"} {"event":"model_fallback_decision","tags":["error_handling","model_fallback","candidate_succeeded"],"runId":"9c3f28cb-7c7f-442f-bf93-138ad2564b87","decision":"candidate_succeeded","requestedProvider":"xai","requestedModel":"grok-4","candidateProvider":"openai-codex","candidateModel":"gpt-5.3-codex","attempt":2,"total":3,"isPrimary":false,"requestedModelMatched":false,"fallbackConfigured":true,"previousAttempts":[{"provider":"xai","model":"grok-4","reason":"timeout","status":408,"errorPreview":"LLM request timed out.","errorHash":"sha256:321034bb3121"}]} model fallback decision
07:46:28+00:00 info web-heartbeat {"module":"web-heartbeat","runId":"2b8ef8d6-df18-4fb5-9cba-dbe8f08cd0a6"} {"connectionId":"d35218dc-0156-4a7e-8242-e23d506f9717","reconnectAttempts":0,"messagesHandled":0,"lastInboundAt":null,"authAgeMs":180011.70336914062,"uptimeMs":180018} web gateway heartbeat

royal skyBOT

yep, that log helps a lot.

The important part is this:

  • requested provider: xai/grok-4
  • previous attempt: timeout
  • fallback candidate: openai-codex/gpt-5.3-codex

So in this snippet, xAI is not hitting the same auth bug. What’s happening is:

  1. xai/grok-4 times out
  2. OpenClaw falls back to openai-codex/...
  3. Codex is unhealthy, and the Codex attempt ends with error "terminated"
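To read that chain straight from a model_fallback_decision line, here is a small Python sketch. It assumes the line layout seen in this thread: time, level, subsystem, a context {...} object, the event {...} object, then a message. That layout is inferred from the pasted logs, not from a documented format, and event_payload() / describe_fallback() are hypothetical helpers.

```python
import json


def event_payload(line: str) -> dict:
    """Return the second {...} object on a gateway log line (the event payload)."""
    decoder = json.JSONDecoder()
    objects = []
    start = line.find("{")
    while start != -1 and len(objects) < 2:
        obj, end = decoder.raw_decode(line, start)
        objects.append(obj)
        start = line.find("{", end)
    return objects[-1] if objects else {}


def describe_fallback(event: dict) -> str:
    """Summarize which model was requested, what failed, and what was tried next."""
    failed = ", ".join(
        f'{a["provider"]}/{a["model"]} ({a["reason"]})' for a in event.get("previousAttempts", [])
    )
    return (
        f'requested {event["requestedProvider"]}/{event["requestedModel"]}; '
        f"failed: {failed}; "
        f'next candidate: {event["candidateProvider"]}/{event["candidateModel"]} '
        f'(attempt {event["attempt"]} of {event["total"]})'
    )


# The 07:46:24 model-fallback/decision line from your paste, verbatim.
line = '07:46:24+00:00 warn model-fallback/decision {"subsystem":"model-fallback/decision"} {"event":"model_fallback_decision","tags":["error_handling","model_fallback","candidate_succeeded"],"runId":"9c3f28cb-7c7f-442f-bf93-138ad2564b87","decision":"candidate_succeeded","requestedProvider":"xai","requestedModel":"grok-4","candidateProvider":"openai-codex","candidateModel":"gpt-5.3-codex","attempt":2,"total":3,"isPrimary":false,"requestedModelMatched":false,"fallbackConfigured":true,"previousAttempts":[{"provider":"xai","model":"grok-4","reason":"timeout","status":408,"errorPreview":"LLM request timed out.","errorHash":"sha256:321034bb3121"}]} model fallback decision'

print(describe_fallback(event_payload(line)))
```

On that line it reports the xai/grok-4 timeout followed by the openai-codex/gpt-5.3-codex candidate, which is the sequence in the list.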

So I’d stop chasing xAI for now and isolate the fallbacks first.

Next checks

openclaw models status
openclaw models fallbacks list

Fastest clean test

Temporarily remove Codex from fallbacks, then restart:

openclaw models fallbacks remove openai-codex/gpt-5.3-codex
openclaw gateway restart

If you have multiple Codex fallbacks, remove those too, or, for a pure test, clear all fallbacks:

openclaw models fallbacks clear
openclaw gateway restart

Then test again in a brand new webchat session.

Why a new session matters

A session can keep old model/session state around, so after changing fallbacks I would not trust the old stuck tab; open a fresh one.

If Codex still appears after removing global fallbacks

Then it’s probably coming from one of these:

  • a per-agent model override
  • a cron/heartbeat/background job override

slender pebble

07:56:20+00:00 info gateway/reload {"subsystem":"gateway/reload"} config change detected; evaluating reload (tools.web.search.apiKey)
07:56:20+00:00 info gateway/reload {"subsystem":"gateway/reload"} config change applied (dynamic reads: tools.web.search.apiKey)
07:56:26+00:00 info web-heartbeat {"module":"web-heartbeat","runId":"00732d0d-235c-49e4-9ca7-55cfca04500f"} {"connectionId":"853e2e6b-d606-4d46-8233-65c49c865d2f","reconnectAttempts":0,"messagesHandled":0,"lastInboundAt":null,"authAgeMs":180014.51489257812,"uptimeMs":180019} web gateway heartbeat
07:56:28+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"44ec38e4-44b8-4ac2-beda-c8bb9469a913","isError":true,"error":"LLM request timed out.","failoverReason":"timeout","model":"grok-4","provider":"xai","rawErrorPreview":"Request timed out.","rawErrorHash":"sha256:98123ccc6904","providerRuntimeFailureKind":"timeout"} embedded run agent end
07:56:28+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_failover_decision","tags":["error_handling","failover","assistant","surface_error"],"runId":"44ec38e4-44b8-4ac2-beda-c8bb9469a913","stage":"assistant","decision":"surface_error","failoverReason":"timeout","profileFailureReason":null,"provider":"xai","model":"grok-4","sourceProvider":"xai","sourceModel":"grok-4","profileId":"sha256:0741309971a0","fallbackConfigured":false,"timedOut":false,"aborted":false,"rawErrorPreview":"Request timed out.","rawErrorHash":"sha256:98123ccc6904","providerRuntimeFailureKind":"timeout"} embedded run failover decision

royal skyBOT

this latest log is actually useful; it points somewhere different.

For this run:

  • Web heartbeat looks alive
  • xai/grok-4 hit "LLM request timed out."
  • fallbackConfigured:false
  • decision is surface_error

So this one is not “xAI fell into Codex fallback”.
It looks more like:

  1. you still have the separate Codex refresh-token problem in the background, and
  2. this specific webchat turn failed because xAI itself timed out

So I’d split the debugging now.
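As a quick sketch of that split in Python: it takes only the event object (the second {...}) from the 07:56:28 failover-decision line and applies the reading above. The triage() rules encode this thread's heuristics as code. They are not OpenClaw's own failover logic.

```python
import json

# Event object copied verbatim from the 07:56:28 embedded_run_failover_decision line.
event = json.loads(
    '{"event":"embedded_run_failover_decision","tags":["error_handling","failover","assistant","surface_error"],"runId":"44ec38e4-44b8-4ac2-beda-c8bb9469a913","stage":"assistant","decision":"surface_error","failoverReason":"timeout","profileFailureReason":null,"provider":"xai","model":"grok-4","sourceProvider":"xai","sourceModel":"grok-4","profileId":"sha256:0741309971a0","fallbackConfigured":false,"timedOut":false,"aborted":false,"rawErrorPreview":"Request timed out.","rawErrorHash":"sha256:98123ccc6904","providerRuntimeFailureKind":"timeout"}'
)


def triage(ev: dict) -> str:
    """Rough, human-oriented reading of one embedded_run_failover_decision event."""
    target = f'{ev.get("provider")}/{ev.get("model")}'
    reason = ev.get("failoverReason")
    if ev.get("decision") == "surface_error" and not ev.get("fallbackConfigured"):
        return f"{target} failed ({reason}); no fallback configured, so the error was surfaced"
    if ev.get("fallbackConfigured"):
        return f"{target} failed ({reason}); fallbacks exist, check the next model_fallback_decision line"
    return f"{target}: decision={ev.get('decision')}"


print(triage(event))
```

On this event it lands in the first branch: xAI timed out on its own and nothing fell back to Codex.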

What I’d run next

openclaw models status --probe
openclaw status --all

Best isolation test

Start a brand new webchat session and use a provider that openclaw models status --probe shows as healthy, other than:

  • openai-codex/*
  • xai/*

If that works, then webchat itself is probably fine, and the failures are provider/runtime-specific.

If that still hangs, then we’re looking at something broader than Codex/xAI.

What this snippet does not show

  • not a dead webchat socket
  • not a Codex fallback on this run
  • not a config reload issue; the tools.web.search.apiKey reload is just noise here

If you paste the output of:

openclaw models status --probe
openclaw status --all

I can help narrow it down fast.

slender pebble

all providers show ok except for openai-codex:

[openai-codex] Token refresh failed: 401 {
"error": {
"message": "Your refresh token has already been used to generate a new access token. Please try signing in again.",
"type": "invalid_request_error",
"param": null,
"code": "refresh_token_reused"
}
}

could there be a bug in the oauth refresh? I've refreshed it several times already.

royal skyBOT

yeah, at this point I think “possible bug” is fair.

2026.4.14 already included a Codex refresh fix, so if you’ve re-authed several times and still get the exact same refresh_token_reused, I’d suspect one of these:

  1. stale external Codex CLI auth is being reused

    • OpenClaw can reuse ~/.codex/auth.json
    • if that happened, the docs say those creds stay managed by Codex CLI, and OpenClaw re-reads that source
  2. you’re re-authing a different host/agent than the gateway is actually using

    • auth profiles live on the gateway host
    • multi-agent setups can also have separate auth stores
  3. there’s still a real Codex OAuth refresh regression
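The error text itself ("already been used to generate a new access token") implies the refresh token is single-use and rotated on each refresh. Here is a toy Python model of why suspect 1 or 2 would produce exactly this error: two stores hold the same refresh token, and whichever one refreshes second gets rejected. ToyAuthServer and both stores are hypothetical. This is not OpenAI's or OpenClaw's code.

```python
from __future__ import annotations

import secrets


class ToyAuthServer:
    """Toy OAuth server whose refresh tokens are single-use (rotated on each refresh)."""

    def __init__(self) -> None:
        self.valid: set[str] = set()
        self.used: set[str] = set()

    def issue_refresh_token(self) -> str:
        token = secrets.token_hex(8)
        self.valid.add(token)
        return token

    def refresh(self, refresh_token: str) -> tuple[str, str]:
        """Trade a refresh token for (new access token, new refresh token)."""
        if refresh_token in self.used:
            raise PermissionError("refresh_token_reused")
        if refresh_token not in self.valid:
            raise PermissionError("invalid_refresh_token")
        self.valid.remove(refresh_token)
        self.used.add(refresh_token)
        return secrets.token_hex(8), self.issue_refresh_token()


server = ToyAuthServer()
shared = server.issue_refresh_token()

# Hypothetical: two consumers that each hold a copy of the same refresh token.
codex_cli_store = {"refresh": shared}
openclaw_store = {"refresh": shared}

# The first consumer refreshes and saves the rotated token...
_, codex_cli_store["refresh"] = server.refresh(codex_cli_store["refresh"])

# ...so the second consumer's copy is now stale and gets rejected.
try:
    server.refresh(openclaw_store["refresh"])
except PermissionError as exc:
    print(exc)  # refresh_token_reused
```

In this model, a fresh login only sticks if every consumer picks up the new token. A process that keeps refreshing from an old copy brings the error straight back, which is why checking the host/agent and ~/.codex/auth.json is worth doing.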

Two checks I’d do now

openclaw models status --json
openclaw models auth order get --provider openai-codex

And I’d verify whether this install originally reused ~/.codex/auth.json.

Practical recommendation

Since all other providers are healthy, I would temporarily remove openai-codex/* from primary/fallbacks, restart the gateway, and use a fresh webchat session. That at least stops Codex from poisoning the rest of the system while you isolate it.

If it still throws refresh_token_reused after a clean re-auth on the actual gateway host, I’d call that bug-worthy and file it here:

https://github.com/openclaw/openclaw/issues

Docs:

slender pebble

this is the auth file being pulled ~/.openclaw/agents/main/agent/auth-state.json

royal skyBOT

ah, that’s the key distinction:

auth-state.json is not the token store.
It holds runtime auth-routing state like per-agent auth order / cooldowns, not the actual OAuth tokens.

For Codex, the files that matter are:

  • ~/.openclaw/agents/main/agent/auth-profiles.json ← actual OpenClaw OAuth/API-key store
  • possibly ~/.codex/auth.json ← if this install reused Codex CLI auth

So if you’re chasing the repeated refresh_token_reused, I would look at auth-profiles.json, not auth-state.json.
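To see which of these files actually exist on the gateway host, and when each was last written, a small Python sketch like this can help. The paths are the ones quoted in this thread. The "main" agent directory comes from your post; other agents in a multi-agent setup would presumably live under their own directories, and report() is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path

# Paths quoted in this thread. "main" is the agent directory from your post;
# other agents in a multi-agent setup would presumably have their own directories.
AUTH_FILES = {
    "OpenClaw OAuth/API-key store": ".openclaw/agents/main/agent/auth-profiles.json",
    "OpenClaw auth routing state": ".openclaw/agents/main/agent/auth-state.json",
    "Codex CLI auth (if reused)": ".codex/auth.json",
}


def report(home: Path) -> dict[str, str]:
    """For each auth file under `home`, say whether it exists and when it was last written (UTC)."""
    out: dict[str, str] = {}
    for label, rel in AUTH_FILES.items():
        path = home / rel
        if path.is_file():
            written = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            out[label] = f"{path} (last written {written:%Y-%m-%d %H:%M:%S} UTC)"
        else:
            out[label] = f"{path} (missing)"
    return out


if __name__ == "__main__":
    for label, status in report(Path.home()).items():
        print(f"{label}: {status}")
```

Run it on the gateway host as the user the gateway runs under. If re-authing does not move the auth-profiles.json timestamp, that suggests the login is landing on a different host, user, or agent.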

One useful cleanup test

Since auth-state.json can pin profile order, I’d clear any per-agent override for Codex:

openclaw models auth order clear --provider openai-codex

That will not fix a bad token by itself, but it can stop the agent from preferring a stale profile/order.

What I’d inspect next

  1. ~/.openclaw/agents/main/agent/auth-profiles.json
  2. whether this install reused ~/.codex/auth.json
  3. openclaw models status --json

If you paste the openai-codex section from auth-profiles.json with secrets redacted, that’s the file that will actually tell the story.
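Here is a minimal Python sketch for producing that redacted copy. The schema of auth-profiles.json isn't shown in this thread, so rather than guess field names, it masks any string whose key looks secret-like, plus any string shaped like a JWT (JWTs start with "eyJ"). The key pattern is an assumption. Read the output before pasting, since an oddly named field could slip through.

```python
import json
import re
import sys

# Assumed key-name pattern; extend it if your file uses other names for secrets.
SECRET_KEY = re.compile(r"token|secret|key|password|refresh|access", re.IGNORECASE)


def redact(node: object, key: str = "") -> object:
    """Return a copy of parsed JSON with secret-looking string values masked."""
    if isinstance(node, dict):
        return {k: redact(v, k) for k, v in node.items()}
    if isinstance(node, list):
        # List items inherit the key of the list they sit under.
        return [redact(item, key) for item in node]
    if isinstance(node, str) and (SECRET_KEY.search(key) or node.startswith("eyJ")):
        return "<redacted>"
    return node


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python redact.py ~/.openclaw/agents/main/agent/auth-profiles.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(json.dumps(redact(json.load(fh)), indent=2))
```

Then copy just the openai-codex part out of the printed result.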

Docs:

slender pebble

08:22:11+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_failover_decision","tags":["error_handling","failover","assistant","surface_error"],"runId":"active-memory-mnzs81dp-9f5a6681","stage":"assistant","decision":"surface_error","failoverReason":"timeout","profileFailureReason":"timeout","provider":"openai-codex","model":"gpt-5.4","sourceProvider":"openai-codex","sourceModel":"gpt-5.4","profileId":"sha256:8e0a45ec2bd9","fallbackConfigured":false,"timedOut":true,"aborted":true} embedded run failover decision
08:22:15+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"3732eee0-d272-4017-a854-da14e8c5b7b3","isError":true,"error":"terminated","failoverReason":null,"model":"gpt-5.4","provider":"openai-codex","rawErrorPreview":"terminated","rawErrorHash":"sha256:e8c95a2a7271","providerRuntimeFailureKind":"unknown"} embedded run agent end

royal skyBOT

this log gives us a pretty strong clue:

"runId":"active-memory-...

So Active Memory is in the path here, not just the main reply run.

Docs say Active Memory inherits the current session model if its own model is unset, so if the session lands on openai-codex/gpt-5.4, Active Memory also hits Codex. In your log that Codex pass is:

  • failoverReason: "timeout"
  • profileFailureReason: "timeout"
  • fallbackConfigured: false
  • timedOut: true
  • aborted: true
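If you want to check how often Active Memory is involved, rather than spotting it by eye, here is a Python sketch that tallies embedded_run_* log lines by run kind and provider. It assumes, from this one log, that Active Memory runIds start with "active-memory-" and that simple regexes over the raw line are good enough. Both are inferences from the pasted output, not documented behavior.

```python
import re
from collections import Counter

RUN_ID = re.compile(r'"runId":"([^"]+)"')
PROVIDER = re.compile(r'"provider":"([^"]+)"')


def tally_runs(log_text: str) -> Counter:
    """Count embedded_run_* events per (run kind, provider)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if '"event":"embedded_run_' not in line:
            continue  # skip heartbeats, reloads, fallback decisions, etc.
        run_id, provider = RUN_ID.search(line), PROVIDER.search(line)
        if not (run_id and provider):
            continue
        kind = "active-memory" if run_id.group(1).startswith("active-memory-") else "other"
        counts[(kind, provider.group(1))] += 1
    return counts


# The two 08:22 lines from your paste, verbatim.
sample = """08:22:11+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_failover_decision","tags":["error_handling","failover","assistant","surface_error"],"runId":"active-memory-mnzs81dp-9f5a6681","stage":"assistant","decision":"surface_error","failoverReason":"timeout","profileFailureReason":"timeout","provider":"openai-codex","model":"gpt-5.4","sourceProvider":"openai-codex","sourceModel":"gpt-5.4","profileId":"sha256:8e0a45ec2bd9","fallbackConfigured":false,"timedOut":true,"aborted":true} embedded run failover decision
08:22:15+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"3732eee0-d272-4017-a854-da14e8c5b7b3","isError":true,"error":"terminated","failoverReason":null,"model":"gpt-5.4","provider":"openai-codex","rawErrorPreview":"terminated","rawErrorHash":"sha256:e8c95a2a7271","providerRuntimeFailureKind":"unknown"} embedded run agent end"""

for (kind, provider), n in sorted(tally_runs(sample).items()):
    print(f"{kind:14} {provider:14} {n}")
```

Feeding it a longer stretch of openclaw logs output shows whether Codex failures cluster on the Active Memory runs or on the main replies.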

So I think you may have two Codex-related failure modes at once:

  1. background/auth churn from refresh_token_reused
  2. Active Memory Codex turns timing out and getting aborted

Fast isolation test

Temporarily turn Active Memory off globally and start a fresh webchat session.

Docs:

If that makes the “one message then stuck” behavior disappear, we’ve narrowed it down a lot.

I would also do both of these

  • remove openai-codex/* from primary/fallbacks for now
  • make sure Active Memory is not inheriting Codex

If you want, the next thing to paste is:

  • whether Active Memory is enabled
  • whether plugins.entries.active-memory.config.model is set
  • whether your default/session model is still openai-codex/gpt-5.4

That would tell us pretty quickly if Active Memory is the thing making the UI look hung.
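The inheritance rule quoted above (Active Memory uses its own model if set, otherwise the session model) fits in a few lines. Here is a Python sketch. It assumes the gateway config has been loaded as a nested dict, so that plugins.entries.active-memory.config.model is a nested key path. The helper name and the example/healthy-model value are hypothetical.

```python
def active_memory_model(config: dict, session_model: str) -> str:
    """Model Active Memory would run on: its own override if set, else the session model."""
    node: object = config
    for key in ("plugins", "entries", "active-memory", "config", "model"):
        if not isinstance(node, dict) or key not in node:
            return session_model  # override unset: inherit the session model
        node = node[key]
    return node if isinstance(node, str) and node else session_model


session = "openai-codex/gpt-5.4"
unset: dict = {}
pinned = {"plugins": {"entries": {"active-memory": {"config": {"model": "example/healthy-model"}}}}}

for name, cfg in (("unset", unset), ("pinned", pinned)):
    model = active_memory_model(cfg, session)
    warning = "  <- Active Memory will hit Codex" if model.startswith("openai-codex/") else ""
    print(f"{name}: {model}{warning}")
```

With the override unset and the session still on openai-codex/gpt-5.4, Active Memory lands on Codex too, which matches the active-memory-... run in your log.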

slender pebble

how do i disable active memory?

royal skyBOT

easiest way:

  • just this session

    /active-memory off
    
  • all sessions / global

    /active-memory off --global
    

If you only want to test whether it’s the culprit, I’d use:

/active-memory off --global

slender pebble

active memory off definitely saved the session. it is now somewhat usable. what are these?

08:28:04+00:00 info web-heartbeat {"module":"web-heartbeat","runId":"d270e70b-6716-4e12-a2aa-d3913afd1d4e"} {"connectionId":"eeb3d1e5-d5c9-4929-aa58-81aab8b35d09","reconnectAttempts":0,"messagesHandled":0,"lastInboundAt":null,"authAgeMs":60007.82568359375,"uptimeMs":60014} web gateway heartbeat
Log tail truncated (increase --max-bytes).
08:29:04+00:00 info web-heartbeat {"module":"web-heartbeat","runId":"d270e70b-6716-4e12-a2aa-d3913afd1d4e"} {"connectionId":"eeb3d1e5-d5c9-4929-aa58-81aab8b35d09","reconnectAttempts":0,"messagesHandled":0,"lastInboundAt":null,"authAgeMs":120009.82568359375,"uptimeMs":120016} web gateway heartbeat

royal skyBOT

nice, that’s a really useful find.

Those web-heartbeat lines are normal keepalive logs, not errors.

What they mean:

  • connectionId: this webchat socket
  • reconnectAttempts: 0: it has not been reconnecting, good sign
  • messagesHandled: 0: that socket hasn’t processed chat events yet
  • lastInboundAt: null: no inbound message recorded on that connection yet
  • authAgeMs: how long ago the socket authenticated
  • uptimeMs: how long the connection has been alive

So by themselves, those lines basically say “the webchat connection is still open and healthy.”
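If you want to scan many of these heartbeat lines at once, the fields above can be pulled out with a short Python sketch. It assumes, as in the lines you pasted, that the connection stats are the last {...} object on the line and contain no nested braces. heartbeat_summary() is a hypothetical helper, not an OpenClaw command.

```python
import json


def heartbeat_summary(line: str) -> str:
    """One-line reading of a web-heartbeat log line, using the fields explained above."""
    # The connection stats are the last {...} on the line (no nested braces inside).
    stats, _ = json.JSONDecoder().raw_decode(line, line.rfind("{"))
    state = "reconnecting" if stats["reconnectAttempts"] > 0 else "stable"
    inbound = "no inbound yet" if stats["lastInboundAt"] is None else "has inbound"
    return (
        f'{stats["connectionId"][:8]}: {state}, up {stats["uptimeMs"] / 1000:.0f}s, '
        f'{stats["messagesHandled"]} messages handled, {inbound}'
    )


# The 08:29:04 heartbeat line from your paste, verbatim.
line = '08:29:04+00:00 info web-heartbeat {"module":"web-heartbeat","runId":"d270e70b-6716-4e12-a2aa-d3913afd1d4e"} {"connectionId":"eeb3d1e5-d5c9-4929-aa58-81aab8b35d09","reconnectAttempts":0,"messagesHandled":0,"lastInboundAt":null,"authAgeMs":120009.82568359375,"uptimeMs":120016} web gateway heartbeat'

print(heartbeat_summary(line))
```

A climbing reconnectAttempts across consecutive heartbeats would be the thing to worry about. A stable, idle socket like this one is not.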

The more important signal is what you already found: