#Liveness warning for no apparent reason

patent echo
#

I have been getting this warning for quite some time and I don't really know why:

11:42:25 [diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=72s eventLoopDelayP99Ms=85.3 eventLoopDelayMaxMs=66404.2 eventLoopUtilization=0.96 cpuCoreRatio=1.195 active=1 waiting=0 queued=1 phase=channels.discord.start-account recentPhases=sidecars.subagent-recovery:89ms,sidecars.main-session-recovery:20ms,post-attach.update-sentinel:1ms,sidecars.session-locks:434ms,sidecars.model-prewarm:3339ms,post-ready.maintenance:464ms work=[active=agent:main:discord:channel:1503693380448682064(processing,q=1,age=69s) queued=agent:main:discord:channel:1503693380448682064(processing,q=1,age=69s)]
11:42:25 [discord] gateway: Gateway websocket closed: 1000

I run OpenClaw in a Pterodactyl environment with no CPU limit and a 6 GB RAM limit (though it sits at around 500 MB to 1 GB on average).

atomic talonBOT
#

First thing: what does openclaw --version show?

That log is basically OpenClaw saying “the Gateway process was too busy / not getting scheduled for a while.” The important bits:

  • eventLoopDelayMaxMs=66404.2 = Node’s event loop had one observed stall of ~66 seconds. That’s not RAM pressure; it’s either CPU starvation, a synchronous/blocking operation, or the container/host pausing/throttling the process.
  • eventLoopUtilization=0.96 and cpuCoreRatio=1.195 = during that sample the process was using slightly more than one full CPU core.
  • active=1 queued=1 ... age=69s = one Discord-thread task was actively processing and another message for the same session was queued behind it.
  • phase=channels.discord.start-account means this happened while Discord account startup/reconnect work was in progress.
  • Gateway websocket closed: 1000 is a clean Discord gateway close code by itself; paired with the liveness warning it likely happened because startup/reconnect got delayed, not because RAM ran out.
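
For anyone who wants to pull these numbers out of a log with a script, here is a minimal Python sketch. It assumes the warning keeps the space-separated key=value layout seen in the line quoted in this thread; `parse_liveness_fields` is a hypothetical helper, not part of OpenClaw.

```python
import re

# Sample copied from the warning in this thread (recentPhases/work parts trimmed).
SAMPLE = (
    "11:42:25 [diagnostic] liveness warning: "
    "reasons=event_loop_delay,event_loop_utilization,cpu interval=72s "
    "eventLoopDelayP99Ms=85.3 eventLoopDelayMaxMs=66404.2 "
    "eventLoopUtilization=0.96 cpuCoreRatio=1.195 "
    "active=1 waiting=0 queued=1 phase=channels.discord.start-account"
)


def parse_liveness_fields(line: str) -> dict[str, str]:
    """Hypothetical helper: collect key=value tokens from a liveness warning line."""
    # Drop the bracketed work=[...] tail, whose contents hold nested key=value pairs.
    head = line.split(" work=[", 1)[0]
    return dict(re.findall(r"(\w+)=(\S+)", head))


fields = parse_liveness_fields(SAMPLE)
max_stall_s = float(fields["eventLoopDelayMaxMs"]) / 1000.0
print(f"longest stall: {max_stall_s:.1f}s, phase: {fields['phase']}")
```

Values stay as strings on purpose, since some of them carry units (72s) or comma-separated lists (reasons).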

Pterodactyl “no CPU limit” doesn’t always mean “no CPU contention”: the panel/host can still be on shared cores, have CPU steal, cgroup quirks, or get paused during backups/restarts. The 66s max delay is the suspicious part.

Useful next checks:

openclaw --version
openclaw status --all
openclaw health --verbose

Also watch whether the warning appears only during startup/model prewarm, or while completely idle. If it happens while idle, I’d suspect the host/container scheduling. If it happens while the bot is answering, it’s more likely a heavy local task/model/tool call blocking the Gateway process.

Docs: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/gateway/troubleshooting

patent echo
#
openclaw --version
OpenClaw 2026.5.7 (eeef486)

openclaw status --all
Overview
┌────────────────────┬─────────────────────────────────────┐
│ Item               │ Value                               │
├────────────────────┼─────────────────────────────────────┤
│ Version            │ 2026.5.7                            │
│ OS                 │ linux 6.12.85+deb13-amd64 (x64)     │
│ Node               │ 24.14.0                             │
│ Config             │ ~/openclaw.json                     │
│ Dashboard          │ http://127.0.0.1:7009/              │
│ Tailscale exposure │ off                                 │
│ Channel            │ stable (default)                    │
│ Update             │ pnpm · up to date · npm latest      │
│                    │ 2026.5.7                            │
│ Gateway            │ local · http://127.0.0.1:7009 (env  │
│                    │ OPENCLAW_GATEWAY_URL) · reachable   │
│                    │ 414ms · auth token · xxx-2a75-      │
│                    │ 483e-aeba-xxx(172.18.0.             │
│                    │ 5) app 2026.5.7 linux 6.12.         │
│                    │ 85+deb13-amd64                      │
│ Security           │ Run: openclaw security audit --deep │
│ Gateway self       │ unknown                             │
│ Gateway service    │ systemd user not installed          │
│ Node service       │ systemd user not installed          │
│ Agents             │ 3 total · 0 bootstrapping · 2       │
│                    │ active · 52 sessions                │
│ Secrets            │ none                                │
└────────────────────┴─────────────────────────────────────┘
Channels
┌──────────┬─────────┬────────┬────────────────────────────┐
│ Channel  │ Enabled │ State  │ Detail                     │
├──────────┼─────────┼────────┼────────────────────────────┤
│ Discord  │ ON      │ OK     │ token config               │
│          │         │        │ (sha256:8ae98d26 · len     │
│          │         │        │ 72) · accounts 1/1         │
└──────────┴─────────┴────────┴────────────────────────────┘
Discord accounts
┌──────────┬──────────┬────────────────────────────────────┐
│ Account  │ Status   │ Notes                              │
├──────────┼──────────┼────────────────────────────────────┤
│ default  │ OK       │ token:config · credential          │
│          │          │ available in gateway runtime ·     │
│          │          │ allow:179970066777899008           │
└──────────┴──────────┴────────────────────────────────────┘

I'll send more, wait.

#
Agents
┌───────────┬────────────────┬──────────┬──────────┬─────────────────────────────────────────────────────┐
│ Agent     │ Bootstrap file │ Sessions │ Active   │ Store                                               │
├───────────┼────────────────┼──────────┼──────────┼─────────────────────────────────────────────────────┤
│ main      │ ABSENT         │       37 │ just now │ ~/.openclaw/agents/main/sessions/sessions.json      │
│ archivist │ ABSENT         │        0 │ unknown  │ ~/.openclaw/agents/archivist/sessions/sessions.json │
│ developer │ ABSENT         │       15 │ just now │ ~/.openclaw/agents/developer/sessions/sessi


🦞 > openclaw health --verbose
🦞 OpenClaw 2026.5.7 (eeef486)
   Claws out, commit in—let's ship something mildly responsible.
Gateway connection:
  Gateway target: http://127.0.0.1:7009
  Source: env OPENCLAW_GATEWAY_URL
  Config: /home/container/openclaw.json
Discord: configured
Gateway event loop: ok max=116ms p99=21ms util=0.04 cpu=0.046
Agents: main (default), developer, archivist
Heartbeat interval: 30m (main), disabled (developer), disabled (archivist)
Session store (main): /home/container/.openclaw/agents/main/sessions/sessions.json (37 entries)
- agent:main:discord:channel:1503693380448682064 (1m ago)
- agent:main:discord:channel:1503367941754982520 (1m ago)
- agent:main:discord:channel:1503408688587739218 (1110m ago)
- agent:main:discord:channel:1503405561989828730 (1141m ago)
- agent:main:discord:channel:1503373264871882853 (1219m ago)
Session store (developer): /home/container/.openclaw/agents/developer/sessions/sessions.json (15 entries)
- agent:developer:subagent:3a6f7c70-9648-4b80-98f7-22c6f3aeb754 (1m ago)
- agent:developer:subagent:4c0dd4d6-931c-44b8-a146-f7686848ca98 (7m ago)
- agent:developer:subagent:ddefc6b7-f9c2-4b75-a6a0-0fd39d863c31 (36m ago)
- agent:developer:subagent:998ac875-ab2c-4440-9c3f-462d4eb10dd1 (51m ago)
- agent:developer:subagent:6708f770-ff3c-4db6-a54c-161a95ba2065 (97m ago)
Session store (archivist): /home/container/.openclaw/agents/archivist/sessions/sessions.json (0 entries)
atomic talonBOT
#

This snapshot is actually reassuring: Gateway event loop: ok max=116ms p99=21ms util=0.04 cpu=0.046 means it’s healthy now. So this is almost certainly transient, not a permanent bad config.

Two things stand out:

  1. You’re on Node 24.14.0, which is fine: the docs recommend Node 24 and support Node 22.16+.
  2. You had recent developer subagents and active Discord sessions. The warning line’s work=[active=... queued=...] means the warning was emitted because there was real work open/queued while the Gateway event loop was saturated.

I’d check the stability recorder around the time it happens:

openclaw gateway stability
openclaw gateway stability --json

Docs: https://docs.openclaw.ai/gateway/diagnostics

If it only appears during startup/reconnect, model prewarm, or while subagents are running, I’d treat it as “host got busy / Gateway was under load” unless Discord actually stops responding. If it appears while idle, then I’d look hard at the Pterodactyl node: CPU steal/shared-host contention, backups, panel freezes, or cgroup scheduling. “No CPU limit” in Pterodactyl unfortunately doesn’t prove the process always gets a core when it needs one.

#

The one scary number remains eventLoopDelayMaxMs=66404.2, a ~66s stall. That’s not normal OpenClaw idle behavior. If you can grab the nearby openclaw gateway stability --json output after the next occurrence, it should show whether that was startup phase, subagent recovery, model prewarm, or something else blocking the loop.

patent echo
#

OK, so I'll need to run stability once it shows me the warning again.

atomic talonBOT
#

Yep, ideally run it right after the warning, before restarting anything:

openclaw gateway stability
openclaw gateway stability --json
openclaw health --verbose

If you want the most useful shareable artifact:

openclaw gateway diagnostics export --output openclaw-diagnostics.zip --log-lines 5000

That export is designed to redact secrets/content, but still skim it before posting anywhere just to be safe.

If the warning happens again, paste the liveness warning plus the nearby stability output. That should tell us whether it was Discord startup/reconnect, subagent/model work, or the container/host starving the Node event loop.

patent echo
#

I can't really manage to reproduce the issue, but I saw that if I restart my container it doesn't seem to retrieve the session when I talk to it...

12:28:51 [ws] ⇄ res ✗ sessions.resolve 18ms errorCode=INVALID_REQUEST errorMessage=No session found: main conn=1e787a92…4e0c id=7e40e35a…c76b
12:28:51 [ws] ⇄ res ✗ sessions.resolve 9ms errorCode=INVALID_REQUEST errorMessage=No session found with label: main conn=44c60188…a136 id=da8659ed…2c1a
12:29:18 [ws] ⇄ res ✗ sessions.resolve 16ms errorCode=INVALID_REQUEST errorMessage=No session found: developer conn=98ae2741…e38a id=f5a6fd1e…2984
12:29:18 [ws] ⇄ res ✗ sessions.resolve 7ms errorCode=INVALID_REQUEST errorMessage=No session found with label: developer conn=f00be14d…34dd id=8d2d6f45…cee1
atomic talonBOT
#

Those sessions.resolve lines are probably not the Discord bot failing to retrieve this channel’s chat session.

main and developer are agent ids, not session keys. A real Discord session key looks like the one from your health output:

agent:main:discord:channel:1503693380448682064

So those errors usually mean some client/tool/UI asked “resolve session named main / developer”, and the Gateway correctly said “there is no session with that key/label.” Annoying log noise, but not automatically the root cause.
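
A rough way to tell the two apart in a script, assuming session keys always follow the colon-separated agent:<agentId>:... shape seen in this thread. That shape is inferred from the examples here, not taken from documentation, and `split_session_key` is a hypothetical helper.

```python
from typing import Optional


def split_session_key(value: str) -> Optional[dict]:
    """Hypothetical helper: split 'agent:<agentId>:<scope...>:<target>' into parts.

    Returns None for bare agent ids such as 'main' or 'developer'.
    """
    parts = value.split(":")
    if len(parts) < 4 or parts[0] != "agent":
        return None
    return {"agent_id": parts[1], "scope": parts[2:-1], "target": parts[-1]}


for candidate in ("main", "agent:main:discord:channel:1503693380448682064"):
    print(candidate, "->", split_session_key(candidate))
```

A bare `main` comes back as None, which mirrors why a lookup by that name would find no session.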

To check whether the Discord session is actually persisted after a container restart, run:

openclaw sessions --all-agents --active 120
openclaw sessions --all-agents --json | grep 1503693380448682064
#

If that key exists after restart, OpenClaw did load the stored session. If it still “starts fresh” when you talk, then we should look at routing/thread/session-key mismatch.

If it does not exist after restart, that points to Pterodactyl storage persistence: make sure /home/container/.openclaw and /home/container/openclaw.json are on the persistent container volume and not being recreated/overwritten on restart.

Also worth running once:

openclaw doctor
openclaw sessions cleanup --all-agents --dry-run

Don’t enforce cleanup yet: dry-run first, claws safe 🦞

patent echo
#

It manages to find the session with the commands

openclaw sessions --all-agents --active 120

🦞 OpenClaw 2026.5.7 (eeef486)
   I've seen your commit messages. We'll work on that together.
Session stores: 3 (main, developer, archivist)
Sessions listed: 8
Filtered to last 120 minute(s)
Agent      Kind   Key                        Age       Model          Runtime            Tokens (ctx %)       Flags
main       group  agent:main:disco...107315  2m ago    gemini-3-flash-preview OpenClaw Pi Default 48k/1049k (5%)       system id:422b663b-e772-422d-bb99-fa9699f514cc
main       group  agent:main:disco...982520  13m ago   gemini-3-flash-preview OpenClaw Pi Default 43k/1049k (4%)       system id:4d779c7e-291b-42e5-949d-2677f9b1c29c
main       group  agent:main:disco...682064  30m ago   gemini-3-flash-preview OpenClaw Pi Default 38k/1049k (4%)       system id:4636fab8-e5b3-4538-9b8f-b989eccc0ad9
developer  direct agent:developer:...9b81a5  30m ago   minimax-m2.7   OpenClaw Pi Default 30k/197k (15%)       id:fd8cddfd-e1d0-4d02-9017-dd716e66d8ef
developer  direct agent:developer:...aeb754  48m ago   minimax-m2.7   OpenClaw Pi Default 39k/197k (20%)       id:e7681a97-8a21-4c3a-97ac-3d8204b00dc6
developer  direct agent:developer:...48ca98  58m ago   minimax-m2.7   OpenClaw Pi Default 31k/197k (16%)       id:0777db92-2d34-4369-a919-fec394905b03
developer  direct agent:developer:...863c31  1h ago    minimax-m2.7   OpenClaw Pi Default 24k/197k (12%)       id:b02317d1-363f-4f30-9b88-e3d108af2403
developer  direct agent:developer:...b10dd1  2h ago    minimax-m2.7   OpenClaw Pi Default 33k/197k (17%)       id:38273f2c-7c11-4cf5-969c-c9bbf89ed82a
🦞 > openclaw sessions --all-agents --json | grep 1503693380448682064
      "key": "agent:main:discord:channel:1503693380448682064",

I run doctor --fix at each restart (I restart only when I upgrade or modify openclaw.json), and session cleanup is done like this at each restart: sessions cleanup --enforce --fix-missing --all-agents

atomic talonBOT
#

Good data. If openclaw sessions --all-agents --json | grep 1503693380448682064 finds it, then the store itself survives restart. So the “not retrieving the session” feeling is likely one of:

  • it is finding the session row, but the transcript/checkpoint behind it was pruned/missing/corrupt
  • it is a different Discord routing key than you expected (channel vs thread/forum topic/etc.)
  • cleanup/doctor is changing state during startup before the bot handles the first message

I’d stop running these automatically at every restart:

openclaw doctor --fix
openclaw sessions cleanup --enforce --fix-missing --all-agents

They’re repair/maintenance commands, not normal startup steps. Especially --fix-missing: the docs describe it as “remove entries whose transcript files are missing”. If there’s any startup timing/path/volume weirdness in Pterodactyl, that can turn a temporarily missing transcript into a permanently removed session entry. Safer pattern:

#
openclaw doctor
openclaw sessions cleanup --all-agents --dry-run

Only run --fix / --enforce when the dry-run shows something you actually want changed.

For this specific session, inspect the full stored row:

openclaw sessions --all-agents --json | jq '.sessions[] | select(.key=="agent:main:discord:channel:1503693380448682064")'

Then after restart, before sending a message, run the same command again and compare. If the key + transcript/checkpoint fields stay identical, OpenClaw is retrieving it and we should look at conversation routing/model context. If fields disappear/change after cleanup, your restart maintenance is the culprit.

Docs for cleanup behavior: https://docs.openclaw.ai/cli/sessions#cleanup-maintenance

patent echo
#

Okay, I'll modify that then. Here's the result of the command:

🦞 > openclaw sessions --all-agents --json | jq '.sessions[] | select(.key=="agent:main:discord:channel:1503693380448682064")'
{
  "key": "agent:main:discord:channel:1503693380448682064",
  "updatedAt": 1778580774625,
  "ageMs": 2089989,
  "sessionId": "4636fab8-e5b3-4538-9b8f-b989eccc0ad9",
  "systemSent": true,
  "abortedLastRun": false,
  "inputTokens": 38067,
  "outputTokens": 308,
  "totalTokens": 38067,
  "totalTokensFresh": true,
  "model": "gemini-3-flash-preview",
  "modelProvider": "google",
  "contextTokens": 1048576,
  "agentId": "main",
  "agentRuntime": {
    "id": "pi",
    "source": "implicit"
  },
  "kind": "group"
}
atomic talonBOT
#

Aha, I think we found an important mismatch.

The session you inspected is:

agent:main:discord:channel:1503693380448682064

But this current Discord thread/channel is:

1503694396883992676

So you checked a valid stored session, but probably not this support thread’s session. That would explain why it looks like OpenClaw “doesn’t retrieve” the conversation after restart: it may be creating/using a different session key than the one you’re inspecting.

Check this one instead:

#
openclaw sessions --all-agents --json | jq '.sessions[] | select(.key=="agent:main:discord:channel:1503694396883992676")'

Or more broadly:

openclaw sessions --all-agents --json | jq '.sessions[] | select(.key | contains("1503694396883992676"))'

If that returns nothing before/after restart, then the bot isn’t storing this thread under the expected channel id. If it returns a row, compare that row before restart / after restart / after first message.

The row you pasted itself looks healthy: systemSent: true, kind: group, token counts present, no abortedLastRun. It just appears to be for a different Discord channel/thread.
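
The updatedAt and ageMs fields in that row look like a Unix epoch timestamp in milliseconds and an age in milliseconds. That reading is an assumption based on their magnitude, not something confirmed in this thread. Under that assumption, a short Python sketch turns them into readable values:

```python
from datetime import datetime, timezone

# Values copied from the pasted session row; the units are assumed, as noted above.
updated_at_ms = 1778580774625
age_ms = 2089989

# Integer division drops the millisecond part before converting to a UTC datetime.
updated_at = datetime.fromtimestamp(updated_at_ms // 1000, tz=timezone.utc)
age_minutes = age_ms / 60_000

print(updated_at.strftime("%Y-%m-%d %H:%M:%S UTC"))
print(f"age: {age_minutes:.1f} min")
```

Comparing the converted time with the timestamps in the Gateway log is a quick way to confirm which message last touched a given session row.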

patent echo
#

The channel should be 1503367941754982520

atomic talonBOT
#

That makes sense if 1503367941754982520 is the parent help/forum channel, but OpenClaw is currently receiving this conversation as the thread/topic id:

chat_id: channel:1503694396883992676
topic_id: 1503694396883992676
thread_label: Discord thread #help › Liveness warning for no apparent reason

So for this thread, the session key should normally be:

agent:main:discord:channel:1503694396883992676

not the parent channel id. That’s intentional thread isolation: each Discord thread/forum post gets its own conversation so unrelated support threads don’t bleed into each other.

Check:

#
openclaw sessions --all-agents --json | jq '.sessions[] | select(.key | contains("1503694396883992676") or contains("1503367941754982520")) | {key, updatedAt, sessionId, kind, inputTokens, model}'

If only 1503367941754982520 exists, then OpenClaw is storing/retrieving against the parent unexpectedly. If only 1503694396883992676 exists, that’s correct for this thread.

If your goal is shared parent-channel history across threads, that’s a different config/behavior discussion, but for support/forum threads, isolated per-thread history is usually the safer default.

patent echo
#

Parent is 1471513645174358180, thread is 1503367941754982520

atomic talonBOT
#

Got it, parent 1471513645174358180 makes sense.

But from OpenClaw’s side, the message I’m replying to is being routed through Discord channel/thread id:

1503694396883992676

That’s also what Discord returned when I sent replies in this thread. So I’d trust 1503694396883992676 as the active OpenClaw session key suffix for this conversation.

1503367941754982520 showed up in your sessions --active list, so it’s a real recent session, but likely a different Discord thread/channel. Could be easy to mix up if you’re copying from logs or an older support thread.

Run this to see all three side by side:

openclaw sessions --all-agents --json | jq '.sessions[] | select(.key | contains("1471513645174358180") or contains("1503367941754982520") or contains("1503694396883992676")) | {key, updatedAt, sessionId, kind, inputTokens, model}'
#

For this thread, the row I’d expect to update after each message is:

agent:main:discord:channel:1503694396883992676

If that row updates, session persistence is fine; the earlier sessions.resolve main/developer errors are probably unrelated client/UI lookups.

patent echo
#
🦞 > openclaw sessions --all-agents --json | jq '.sessions[] | select(.key | contains("1471513645174358180") or contains("1503367941754982520") or contains("1503694396883992676")) | {key, updatedAt, sessionId, kind, inputTokens, model}'
{
  "key": "agent:main:discord:channel:1503367941754982520",
  "updatedAt": 1778581757879,
  "sessionId": "4d779c7e-291b-42e5-949d-2677f9b1c29c",
  "kind": "group",
  "inputTokens": 42538,
  "model": "gemini-3-flash-preview"
}
atomic talonBOT
#

You’re right, I mixed in my current routing metadata there. Sorry, that was a bad comparison. Your OpenClaw’s relevant thread/session is the one your CLI shows:

agent:main:discord:channel:1503367941754982520

That row looks healthy, and importantly it has a stable sessionId:

4d779c7e-291b-42e5-949d-2677f9b1c29c

So the next useful test is not “does the key exist?”; it does. It’s “does restart rotate the sessionId or lose transcript state?”

Before restart:

#
openclaw sessions --all-agents --json | jq '.sessions[] | select(.key=="agent:main:discord:channel:1503367941754982520") | {key, updatedAt, sessionId, systemSent, inputTokens, outputTokens, totalTokens, model}'

After restart, before talking to it, run the same command. Then send one message and run it again.

Expected healthy behavior:

  • sessionId stays the same across restart
  • after you talk, updatedAt changes
  • token counts should generally stay present / increase

If sessionId changes after first message, OpenClaw decided the old session was stale/reset/new. Then we’d look at reset policy or cleanup/doctor effects.

And yeah, I’d still remove this from automatic startup:

#
openclaw doctor --fix
openclaw sessions cleanup --enforce --fix-missing --all-agents

Use dry-run on startup at most. --fix-missing is exactly the kind of thing I don’t want running automatically in a container where paths/volumes may briefly be weird during boot.

patent echo
#

Before restart:

{
  "key": "agent:main:discord:channel:1503367941754982520",
  "updatedAt": 1778581757879,
  "sessionId": "4d779c7e-291b-42e5-949d-2677f9b1c29c",
  "systemSent": true,
  "inputTokens": 42538,
  "outputTokens": 199,
  "totalTokens": 42538,
  "model": "gemini-3-flash-preview"
}

After restart:

{
  "key": "agent:main:discord:channel:1503367941754982520",
  "updatedAt": 1778581757879,
  "sessionId": "4d779c7e-291b-42e5-949d-2677f9b1c29c",
  "systemSent": true,
  "inputTokens": 42538,
  "outputTokens": 199,
  "totalTokens": 42538,
  "model": "gemini-3-flash-preview"
}

After message at 11:01 (but no answer after 60s):

{
  "key": "agent:main:discord:channel:1503367941754982520",
  "updatedAt": 1778583624694,
  "sessionId": "4d779c7e-291b-42e5-949d-2677f9b1c29c",
  "systemSent": true,
  "inputTokens": 42538,
  "outputTokens": 199,
  "totalTokens": 42538,
  "model": "gemini-3-flash-preview"
}

And I got this just now:

13:01:38 [diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=78s eventLoopDelayP99Ms=462.7 eventLoopDelayMaxMs=73014.4 eventLoopUtilization=0.962 cpuCoreRatio=1.204 active=1 waiting=0 queued=1 phase=channels.discord.start-account recentPhases=sidecars.subagent-recovery:6ms,sidecars.main-session-recovery:14ms,post-attach.update-sentinel:0ms,sidecars.session-locks:204ms,sidecars.model-prewarm:3228ms,post-ready.maintenance:439ms work=[active=agent:main:discord:channel:1503367941754982520(processing,q=1,age=74s) queued=agent:main:discord:channel:1503367941754982520(processing,q=1,age=74s)]
13:01:38 [discord] gateway: Gateway websocket closed: 1000
#
openclaw gateway stability
[gateway] startup trace: entry.bootstrap 4.8ms total=4.8ms
[gateway] startup trace: entry.bootstrap 4.1ms total=4.1ms
[gateway] startup trace: entry.argv 6.6ms total=10.7ms
[gateway] startup trace: entry.run-main-import 66.0ms total=77.7ms
[gateway] startup trace: cli.main.argv 0.7ms total=0.7ms
[gateway] startup trace: cli.main.dotenv 506.9ms total=508.2ms
[gateway] startup trace: cli.main.proxy-imports 124.8ms total=640.9ms
[gateway] startup trace: cli.main.route-import 19.2ms total=663.5ms
[gateway] startup trace: cli.main.route 0.6ms total=664.3ms
Gateway Stability
Events: 58/1000
Types: diagnostic.phase.completed=36, diagnostic.heartbeat=4, diagnostic.memory.sample=4, session.state=4, diagnostic.liveness.warning=3, queue.lane.dequeue=2, queue.lane.enqueue=2, diagnostic.memory.pressure=1
Memory: rss=1.1 GiB heap=628 MiB maxRss=1.1 GiB pressure=1
Recent:
  2026-05-12T10:59:27.931Z #102 diagnostic.phase.completed
  2026-05-12T10:59:27.950Z #104 diagnostic.phase.completed
  2026-05-12T10:59:28.950Z #106 diagnostic.phase.completed
  2026-05-12T10:59:50.266Z #116 diagnostic.memory.sample rss=354 MiB heap=178 MiB
  2026-05-12T10:59:50.270Z #117 diagnostic.liveness.warning level=info reason=event_loop_delay queued=0
  2026-05-12T10:59:50.271Z #118 diagnostic.heartbeat queued=0
  2026-05-12T11:00:20.263Z #119 diagnostic.memory.sample rss=337 MiB heap=176 MiB
  2026-05-12T11:00:20.264Z #120 diagnostic.heartbeat queued=0
  2026-05-12T11:00:24.608Z #122 message.queued channel=discord queueDepth=1
  2026-05-12T11:00:24.609Z #123 session.state outcome=processing reason=message_start queueDepth=1
  2026-05-12T11:01:38.748Z #124 diagnostic.memory.sample rss=1.1 GiB heap=473 MiB
  2026-05-12T11:01:38.749Z #125 diagnostic.memory.pressure level=warning reason=rss_growth rss=1.1 GiB heap=473 MiB
  2026-05-12T11:01:38.756Z #127 diagnostic.liveness.warning level=warning reason=event_loop_delay queued=1
  2026-05-12T11:01:38.757Z #128 diagnostic.heartbeat queued=1
  2026-05-12T11:01:49.115Z #130 queue.lane.enqueue
  2026-05-12T11:01:49.116Z #131 queue.lane.dequeue
  2026-05-12T11:01:49.122Z #132 queue.lane.enqueue
  2026-05-12T11:01:49.122Z #133 queue.lane.dequeue
  2026-05-12T11:01:54.080Z #137 session.state outcome=processing reason=run_started queueDepth=1
  2026-05-12T11:02:02.382Z #145 session.state outcome=idle reason=run_completed queueDepth=0
  2026-05-12T11:02:02.863Z #149 message.processed outcome=completed channel=discord
  2026-05-12T11:02:02.864Z #150 session.state outcome=idle reason=message_completed queueDepth=0
  2026-05-12T11:02:08.748Z #151 diagnostic.memory.sample rss=1.1 GiB heap=628 MiB
  2026-05-12T11:02:08.748Z #152 diagnostic.liveness.warning level=info reason=event_loop_delay queued=0
  2026-05-12T11:02:08.750Z #153 diagnostic.heartbeat queued=0
[gateway] startup trace: cli.main.parse 4856.5ms total=12168.3ms
atomic talonBOT
#

This confirms the session is being retrieved. Same key, same sessionId, and after the message updatedAt changed. So the issue is not persistence anymore; it’s getting stuck during/after processing the first post-restart message.

The big clue is still this:

phase=channels.discord.start-account
active=1 queued=1 age=74s
eventLoopDelayMaxMs=73014.4

That means: while Discord account startup/reconnect was happening, the Gateway event loop stalled for ~73s, with one message actively processing and another queued for the same session. Since token counts didn’t change, it likely didn’t complete the model turn / response write.
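
The same comparison can be written down as a small Python sketch. The values are the ones pasted in this thread; `compare_snapshots` and the wording of its notes are hypothetical, and the interpretation of unchanged token counts is a reading, not a documented rule.

```python
# Rows copied from the before-restart and after-message jq output in this thread.
before_restart = {
    "sessionId": "4d779c7e-291b-42e5-949d-2677f9b1c29c",
    "updatedAt": 1778581757879,
    "inputTokens": 42538,
    "outputTokens": 199,
}
after_message = dict(before_restart, updatedAt=1778583624694)


def compare_snapshots(before: dict, after: dict) -> list[str]:
    """Hypothetical helper: describe what changed between two session rows."""
    notes = []
    if before["sessionId"] == after["sessionId"]:
        notes.append("sessionId unchanged: the stored session was reused")
    else:
        notes.append("sessionId changed: a new session was started")
    if after["updatedAt"] > before["updatedAt"]:
        notes.append("updatedAt advanced: the message reached the session")
    tokens = ("inputTokens", "outputTokens")
    if all(before[name] == after[name] for name in tokens):
        notes.append("token counts unchanged: no completed model turn recorded yet")
    return notes


for note in compare_snapshots(before_restart, after_message):
    print(note)
```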

Run this now, before restarting again:

openclaw gateway stability --json
openclaw health --verbose

And if you still have auto-start maintenance enabled, I’d disable it before the next test. Don’t run these on every boot:

openclaw doctor --fix
openclaw sessions cleanup --enforce --fix-missing --all-agents

Use only:

openclaw doctor
openclaw sessions cleanup --all-agents --dry-run
#

My current read: the session store is fine. The Gateway process is getting CPU/event-loop starved during Discord startup, likely worsened by boot-time repair/cleanup or Pterodactyl scheduling. The clean Discord close 1000 is probably a consequence of that stall, not the root cause.

#

This is super useful. It shows the message did eventually complete:

11:00:24 message.queued
11:01:38 liveness warning queued=1
11:01:54 run_started
11:02:02 run_completed
11:02:02 message.processed outcome=completed

So it wasn’t lost; it sat stuck for ~90s before the model run actually started, then the run itself finished quickly. That points away from Gemini/session retrieval and toward Gateway startup/event-loop starvation before dispatch.
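
To check those gaps against the full timestamps, here is a minimal Python sketch using the values from the stability output pasted earlier in this thread. The dictionary keys are just labels chosen for this example.

```python
from datetime import datetime

# Timestamps copied from the `openclaw gateway stability` output in this thread.
EVENTS = {
    "message.queued": "2026-05-12T11:00:24.608Z",
    "run_started": "2026-05-12T11:01:54.080Z",
    "run_completed": "2026-05-12T11:02:02.382Z",
}


def parse_utc(stamp: str) -> datetime:
    # Swap the trailing 'Z' for an explicit offset so older Python versions accept it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


times = {name: parse_utc(stamp) for name, stamp in EVENTS.items()}
wait_s = (times["run_started"] - times["message.queued"]).total_seconds()
run_s = (times["run_completed"] - times["run_started"]).total_seconds()
print(f"waited {wait_s:.1f}s before the run started; the run took {run_s:.1f}s")
```

The wait before dispatch is roughly ten times longer than the run itself, which is the pattern described above.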

Also notable:

rss=354 MiB -> rss=1.1 GiB
heap=178 MiB -> heap=473/628 MiB
memory.pressure reason=rss_growth

Still under your 6 GiB limit, but that jump happened exactly with the liveness warning. So I’d try reducing startup work and stop doing repair writes on boot.
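
For a sense of scale, a small Python sketch of that jump, assuming the container limit is exactly 6 GiB and that the sizes are printed as "<number> MiB" or "<number> GiB"; `to_mib` is a hypothetical helper.

```python
UNITS_IN_MIB = {"MiB": 1.0, "GiB": 1024.0}


def to_mib(text: str) -> float:
    """Hypothetical helper: convert '354 MiB' or '1.1 GiB' to a MiB value."""
    value, unit = text.split()
    return float(value) * UNITS_IN_MIB[unit]


rss_before = to_mib("354 MiB")   # sample at 10:59:50
rss_after = to_mib("1.1 GiB")    # sample at 11:01:38
limit = to_mib("6 GiB")          # assumed container limit

growth = rss_after / rss_before
share = rss_after / limit
print(f"rss grew {growth:.1f}x and uses {share:.0%} of the limit")
```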

Concrete changes I’d make:

  1. Remove boot-time repair/cleanup:
# do not run these automatically on every restart
openclaw doctor --fix
openclaw sessions cleanup --enforce --fix-missing --all-agents

Use these only:

openclaw doctor
openclaw sessions cleanup --all-agents --dry-run

  2. Try disabling startup model prewarm in the Pterodactyl environment:
OPENCLAW_SKIP_STARTUP_MODEL_PREWARM=1

The code supports that env var, and your warning phase repeatedly includes sidecars.model-prewarm / channels.discord.start-account, so it's worth eliminating as a variable.

  3. After restart, wait until health is calm before sending the first Discord message:
openclaw health --verbose

You want the event loop to look roughly like your earlier healthy sample, not saturated.

If it still stalls with boot cleanup removed + prewarm skipped, I'd suspect the Pterodactyl host/container scheduling rather than OpenClaw session state. At that point the best artifact for a maintainer issue is:

openclaw gateway diagnostics export --output openclaw-diagnostics.zip --log-lines 5000

Then file it at https://github.com/openclaw/openclaw/issues with the liveness line and the stability timeline you pasted.

patent echo
#

Ok, I'll test it with OPENCLAW_SKIP_STARTUP_MODEL_PREWARM=1. Is this maybe addressed in the beta versions of OC? I'd like to avoid opening an issue that is already resolved.

atomic talonBOT
#

Yes, there are a couple of beta/main changes that look related enough that I'd test beta before opening an issue.

I don't see a "fixed Discord liveness stall in Pterodactyl" smoking-gun entry, but current beta/main includes related startup/liveness work, including:

  • fix(gateway): avoid sync restart sentinel startup probes
  • fix(logging): reduce active-only liveness noise
  • diagnostics improvements around startup phase spans / active work labels

Changelog: https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md

So my recommended order:

  1. First test stable with:
OPENCLAW_SKIP_STARTUP_MODEL_PREWARM=1

and with boot-time doctor --fix / sessions cleanup --enforce --fix-missing removed.

  2. If it still stalls after restart, try beta:
openclaw update --channel beta

or reinstall beta if your Pterodactyl setup uses the installer/package directly:

curl -fsSL --proto '=https' --tlsv1.2 https://openclaw.ai/install.sh | bash -s -- --beta

Docs: https://docs.openclaw.ai/help/faq-first-run

  3. If beta still shows the same ~70s event-loop stall during channels.discord.start-account, then I'd open an issue with your stability timeline. That's specific and useful enough for maintainers.

patent echo
#

I still have this warning

14:10:38 [diagnostic] liveness warning: reasons=event_loop_delay interval=31s eventLoopDelayP99Ms=20.3 eventLoopDelayMaxMs=1269.8 eventLoopUtilization=0.079 cpuCoreRatio=0.087 active=1 waiting=0 queued=1 phase=channels.discord.start-account recentPhases=sidecars.restart-sentinel:1ms,sidecars.subagent-recovery:119ms,sidecars.main-session-recovery:13ms,post-attach.update-sentinel:0ms,sidecars.session-locks:261ms,post-ready.maintenance:550ms work=[active=agent:main:discord:channel:1503367941754982520(processing,q=1,age=2s last=run:completed) queued=agent:main:discord:channel:1503367941754982520(processing,q=1,age=2s last=run:completed)]

BUT it does return me messages faster without killing the gateway or the session

atomic talonBOT
#

That's a much healthier warning.

Compare the old one:

eventLoopDelayMaxMs=73014ms
eventLoopUtilization=0.962
cpuCoreRatio=1.204
age=74s

New one:

eventLoopDelayMaxMs=1269.8ms
eventLoopUtilization=0.079
cpuCoreRatio=0.087
age=2s last=run:completed

So now it's basically saying: "there was one ~1.3s event-loop hiccup while a Discord message was finishing/queued." That's not great, but it's not the gateway-killing/session-stalling problem anymore.

Also notice sidecars.model-prewarm is gone from recentPhases, so OPENCLAW_SKIP_STARTUP_MODEL_PREWARM=1 likely helped.

I'd treat this as acceptable unless one of these happens:

  • replies take 30s+ again
  • eventLoopDelayMaxMs goes back to tens of seconds
  • eventLoopUtilization/cpuCoreRatio are ~0.9+ again
  • queued work stays stuck instead of last=run:completed

If you want to keep tuning, the next best step is beta, because there are recent liveness/startup noise reductions. But if messages are fast and sessions survive, I wouldn't rush to open an issue for this smaller warning yet. Keep the stability output around if it regresses.
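
If you'd rather script the "acceptable unless" checklist above than eyeball it, something like this would do. Treat it as a rough sketch: the cutoffs (10 000 ms, 0.9) are my reading of those bullets, not OpenClaw constants, and `regression_reasons` is a hypothetical helper. The reply-latency bullet is left out because it isn't in the warning line.

```python
# Cutoffs are assumptions taken from the checklist above, not OpenClaw constants.
MAX_STALL_MS = 10_000   # "tens of seconds"
MAX_BUSY_RATIO = 0.9    # "~0.9+"


def regression_reasons(sample: dict) -> list:
    """Return the checklist items that a liveness sample trips (empty = acceptable)."""
    reasons = []
    if sample["eventLoopDelayMaxMs"] >= MAX_STALL_MS:
        reasons.append("event loop stalled for 10s or more")
    if (
        sample["eventLoopUtilization"] >= MAX_BUSY_RATIO
        or sample["cpuCoreRatio"] >= MAX_BUSY_RATIO
    ):
        reasons.append("event loop / CPU close to saturated")
    if sample.get("last") != "run:completed":
        reasons.append("queued work has not completed a run")
    return reasons


# Values copied from the two warnings compared above; the old one had no last= field.
old = {"eventLoopDelayMaxMs": 73014.0, "eventLoopUtilization": 0.962, "cpuCoreRatio": 1.204}
new = {
    "eventLoopDelayMaxMs": 1269.8,
    "eventLoopUtilization": 0.079,
    "cpuCoreRatio": 0.087,
    "last": "run:completed",
}

print("old:", regression_reasons(old))
print("new:", regression_reasons(new))
```

The old sample trips all three checks and the new one trips none, which matches the "much healthier" read.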

patent echo
#

Okay, and is there a way to put OPENCLAW_SKIP_STARTUP_MODEL_PREWARM in the openclaw.json config? And what purpose does the model prewarm serve?

atomic talonBOT
#

I don't see a documented openclaw.json config key for that one. In current code it's checked only as an environment variable:

OPENCLAW_SKIP_STARTUP_MODEL_PREWARM=1

Accepted truthy values are 1, true, yes, or on.
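
To make the accepted values concrete, here is roughly what such a check looks like. This is an illustrative sketch, not OpenClaw's actual code: `is_truthy` and `prewarm_skipped` are hypothetical names, and trimming whitespace and ignoring case are assumptions on my part.

```python
import os
from typing import Mapping, Optional

# Values listed above as truthy. Trimming whitespace and ignoring case are
# assumptions of this sketch, not confirmed OpenClaw behaviour.
TRUTHY = {"1", "true", "yes", "on"}


def is_truthy(value: Optional[str]) -> bool:
    """Return True when the value is one of the accepted truthy strings."""
    return value is not None and value.strip().lower() in TRUTHY


def prewarm_skipped(env: Mapping[str, str] = os.environ) -> bool:
    """Check the skip flag in the given environment mapping."""
    return is_truthy(env.get("OPENCLAW_SKIP_STARTUP_MODEL_PREWARM"))


print(prewarm_skipped({"OPENCLAW_SKIP_STARTUP_MODEL_PREWARM": "1"}))  # True
print(prewarm_skipped({"OPENCLAW_SKIP_STARTUP_MODEL_PREWARM": "0"}))  # False
print(prewarm_skipped({}))                                            # False
```

The practical takeaway is that anything outside that list, including 0, false, or an unset variable, leaves prewarm enabled.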

So in Pterodactyl, the right place is usually the egg/startup environment variables, not openclaw.json.

What model prewarm does: on Gateway startup, OpenClaw looks at your configured default primary model, does lightweight model/provider metadata discovery, and ensures the agent's models.json cache is prepared. It's meant to make the first real request after restart smoother and catch/prime provider/model info early.

It should be metadata-only and has a timeout, but in constrained/shared environments it can still add startup pressure because it imports provider/runtime bits and may do provider discovery. Since your issue is specifically around channels.discord.start-account right after restart, skipping it is a reasonable tradeoff: slightly less "warm" first model metadata, but a much calmer Discord startup.

If disabling it keeps your replies fast, I'd leave it disabled in Pterodactyl.