sessions.list perf | Friends of the Crustacean 🦞🤝 | Page 1

dense bison May 4, 2026, 5:11 AM

#

@subtle gull @tropic fox I'm trying to repro this with 10k sessions, if you folks have a minimal repro I could test that would be great

#

Could very well be machine dependent, I'm testing on a mac so it probably has a quick SSD

subtle gull May 4, 2026, 5:12 AM

#

im on something that has datacenter NVME SSDs

#

let me get the details

#

i just got things stabilized so i can use my main chat app, no longer bound to the confines of the tui

#

my chat app was abusing the problem big time, making wide scope calls against sessions very frequently, worked fine until uh, 4.27?

#

now its scoped down much more, plus the cache and it seems to be happy

dense bison May 4, 2026, 5:14 AM

#

hmm on main I made 10k sessions about 256kb each and it completes in 1s cold and 300ms hot

#

Without any cache

tropic fox May 4, 2026, 5:16 AM

#

Mine is a 2.5GHz quad-core Intel MBP with 16GB memory running Sequoia 15.7.3. Only other things "running" were Tailscale, Clash proxy, Teamviewer, Obsidian and Things3 -- all in background. My numbers show only direct CPU usage excluding other processes.

dense bison May 4, 2026, 5:16 AM

#

Got it, lemme push a quick benchmark script

#

Try this please:

Run on the affected machine:

git fetch https://github.com/obviyus/clawdbot.git bench/sessions-list-main
git checkout FETCH_HEAD
pnpm install
pnpm test:sessions:list:bench -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message

If you have ~6GB free disk, also run:

pnpm test:sessions:list:bench -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message

Reply with:

full command output
OpenClaw version/commit
machine + storage type
number of agents
whether Control UI/TUI was open while running it

tropic fox May 4, 2026, 5:18 AM

#

imma get Claude to do that. Whilst OC is running. ok?

dense bison May 4, 2026, 5:18 AM

#

Make sure to test on latest main please!

tropic fox May 4, 2026, 5:19 AM

#

ah.. ok. that will put my claw out of action so I'll just have to spend a little longer preparing for that first... brb.

dense bison May 4, 2026, 5:20 AM

#

Thank you, really appreciate it

subtle gull May 4, 2026, 5:20 AM

#

let me get it to answer the right question the right way, heres a few of the details from me getting the hotfix in and getting my chat app, called clawdash, working with smarter bounded calls

Why prod hurts and test does not:

Prod has 30 agent dirs, 1,876 files, about 1.7GB under session dirs, and heavy active churn.

Forge alone has 367 session-dir files and about 375MB.

The expensive part is not just returning rows, it is repeatedly discovering, loading, merging, filtering, and ranking a much larger all-agent session universe.

Test has the same architecture but lower cardinality and less churn, so the same O(all sessions / all agents) shape stays below the pain threshold.

#

but even with clawdash stopped and using the TUI, the openclaw node process goes to 100% cpu util for that core

#

and it grinds to a halt

#

this is what we just finished implementing, and its making things pretty usable

ClawDash hotfix

Caches channel computation for ~60-75s.

Uses single-flight, so concurrent callers share one request.

During active ClawCanvas sends, serves stale channel metadata for up to 5m.

Calls gateway with bounded params:
- limit: 80-160
- activeMinutes: 1440
- includeDerivedTitles: false
- includeLastMessage: false

Gateway hotfix

Adds a sessions.list response cache, 15s TTL.

Coalesces identical in-flight requests.

Still recomputes on cache miss or invalidation.

Non-hotfixed gateway path

Discovers all agent session stores if agentId is absent.

Loads and merges every agent’s sessions.json.

Iterates the full combined store.

Applies filters after store merge.

Applies limit only after scanning candidates.

Builds rows, then yields every 10 rows, but much of the expensive prework is synchronous.
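
The cache-plus-coalescing shape in the hotfix notes above could look roughly like this. It is a minimal sketch, not the actual ClawDash or gateway code: `createSingleFlightCache`, its parameters, and the key format are all hypothetical, and it assumes the TTL should start when a result lands rather than when the request begins.

```typescript
// Hypothetical single-flight TTL cache; names are illustrative, not OpenClaw's.
type Entry<T> = { promise: Promise<T>; expiresAt: number };

function createSingleFlightCache<T>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, Entry<T>>();

  return function get(key: string, load: () => Promise<T>): Promise<T> {
    const hit = entries.get(key);
    // In-flight entries carry expiresAt = Infinity, so concurrent callers share them.
    if (hit && hit.expiresAt > now()) return hit.promise;

    // Cache miss or expired entry: start exactly one load for this key.
    const entry: Entry<T> = { promise: load(), expiresAt: Infinity };
    entries.set(key, entry);
    entry.promise.then(
      () => {
        // The TTL window starts once the result is available.
        entry.expiresAt = now() + ttlMs;
      },
      () => {
        // Failures are not cached; the next caller retries.
        if (entries.get(key) === entry) entries.delete(key);
      },
    );
    return entry.promise;
  };
}
```

With a 15s TTL, two callers asking for the same key while a load is running get the same promise back, and a caller arriving within 15s of completion gets the cached result. A miss or an expired entry still pays the full recompute cost, which matches the "still recomputes on cache miss" caveat above.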

tropic fox May 4, 2026, 5:22 AM

#

cogitating ...

subtle gull May 4, 2026, 5:24 AM

#

Gateway logs showed repeated slow sessions.list responses.
sessions.list 8123ms
sessions.list 8559ms
sessions.list 8337ms
sessions.list 8392ms
sessions.list 8578ms
sessions.list 8696ms
sessions.list 11470ms

count=55
min=115ms
p50=1811ms
p95=8628ms
max=11470ms
avg=4296ms

Interpretation:

•Fast path exists: some calls completed in ~500-700ms
•Bad path is very repeatable: many calls clustered around 8.1-8.7s
•Worst observed: 11.47s
•Tool-level call timed out at 10s
So this is probably not random network jitter. It smells like conditional hydration work: some requests or states hit a slow metadata path.

User-visible impact
The direct UI symptom:

•Control UI / ClawDash session list became slow or stale
•Status updates looked broken because the list did not refresh promptly
•It created enough lag that the operator thought the assistant might be broken
Operational concern:

If sessions.list is taking 8-11s, it competes with ordinary chat operations on the same gateway. Even if chat still works, the UI becomes a pressure source. That is future outage bait: an observability/control surface should not degrade the thing it monitors.

#

god he loves talking about outages

tropic fox May 4, 2026, 5:24 AM

#

main as in beta4 right?

dense bison May 4, 2026, 5:24 AM

#

tropic fox main as in beta4 right?

No no github repo main

subtle gull May 4, 2026, 5:24 AM

#

Likely expensive areas
The suspected expensive work is per-row hydration across many sessions.

Likely hot spots:

•title generation or derived title hydration
•last-message preview hydration
•transcript head/tail reads per session
•model/runtime metadata projection
•token/usage fallback calculation
•plugin/setup metadata projection
•repeated UI polling every ~30s
The key risk is if sessions.list performs anything close to a transcript audit for each row.

Expected behavior should be:

•cheap index read
•cached metadata
•bounded preview data
•opt-in expensive hydration

#

i think that more or less summarizes what i asked him to consolidate for you. im on npm branches, i could spin up another test server and migrate my install to that and convert it to a github main install

tropic fox May 4, 2026, 5:27 AM

#

building... might take a while.

dense bison May 4, 2026, 5:30 AM

#

subtle gull > 5. Likely expensive areas > The suspected expensive work is per-row hydration ...

Weird, I'm still not able to repro this slowdown no matter what

subtle gull May 4, 2026, 5:31 AM

#

im spinning up a system to test everything with

dense bison May 4, 2026, 5:31 AM

#

Thank you both 🫡

subtle gull May 4, 2026, 5:31 AM

#

we can go hog wild

dense bison May 4, 2026, 5:31 AM

#

Let's get @digital frost in here too

#

kek

digital frost May 4, 2026, 5:32 AM

#

why

subtle gull May 4, 2026, 5:32 AM

#

ive got a template and everything, sonnet will take care of this quickly

digital frost May 4, 2026, 5:33 AM

#

am I QA now? what we testing

calm pebble May 4, 2026, 5:33 AM

#

i tried to run it for another datapoint. it failed horribly and I ain't breaking anything for it

dense bison May 4, 2026, 5:34 AM

#

digital frost am I QA now? what we testing

Nah nothing required from you, I'm just trying to repro the sessions.list slowdown

tropic fox May 4, 2026, 5:35 AM

#

baddies all up in this joint

digital frost May 4, 2026, 5:35 AM

#

dense bison Nah nothing required from you, I'm just trying to repro the `sessions.list` slowd...

on it boss

digital frost May 4, 2026, 5:36 AM

#

tropic fox baddies all up in this joint

@dense bison is 🐐 baddie, he's allowed to ping me

tropic fox May 4, 2026, 5:37 AM

#

now where's that "whipping claude" GIF...

digital frost May 4, 2026, 5:39 AM

#

I think they let me remove roles... one sec...

tropic fox May 4, 2026, 5:40 AM

#

Brave plugin causing issues... removing it to save time.

digital frost May 4, 2026, 5:41 AM

#

$ pnpm test:sessions:list:bench -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message

openclaw@2026.5.3 test:sessions:list:bench <repo-root>
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message

[sessions-list-bench] cold 1/3: rows=10000 wall=682.2ms eventLoopDelayMax=1.7ms
[sessions-list-bench] cold 2/3: rows=10000 wall=712.7ms eventLoopDelayMax=1.9ms
[sessions-list-bench] cold 3/3: rows=10000 wall=760.3ms eventLoopDelayMax=1.6ms
[sessions-list-bench] cloned 10000 sessions across 1 agents from 2/2 transcript-backed source rows across 1 stores (2420 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=625.7ms eventLoopDelayMax=2.3ms
[sessions-list-bench] run 1/5: rows=10000 wall=111.0ms eventLoopDelayMax=2.0ms
[sessions-list-bench] run 2/5: rows=10000 wall=122.2ms eventLoopDelayMax=1.7ms
[sessions-list-bench] run 3/5: rows=10000 wall=115.6ms eventLoopDelayMax=2.4ms
[sessions-list-bench] run 4/5: rows=10000 wall=111.1ms eventLoopDelayMax=2.0ms
[sessions-list-bench] run 5/5: rows=10000 wall=117.0ms eventLoopDelayMax=1.7ms
[sessions-list-bench] cold summary: sessions=10000 min=682.2ms p50=712.7ms p95=760.3ms max=760.3ms avg=718.4ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=111.0ms p50=115.6ms p95=122.2ms max=122.2ms avg=115.4ms

#

$ pnpm test:sessions:list:bench -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message

openclaw@2026.5.3 test:sessions:list:bench <repo-root>
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message

[sessions-list-bench] cold 1/1: rows=10000 wall=685.6ms eventLoopDelayMax=1.7ms
[sessions-list-bench] cloned 10000 sessions across 1 agents from 2/2 transcript-backed source rows across 1 stores (3624 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=785.2ms eventLoopDelayMax=2.1ms
[sessions-list-bench] run 1/3: rows=10000 wall=107.0ms eventLoopDelayMax=1.6ms
[sessions-list-bench] run 2/3: rows=10000 wall=112.0ms eventLoopDelayMax=1.4ms
[sessions-list-bench] run 3/3: rows=10000 wall=107.3ms eventLoopDelayMax=1.4ms
[sessions-list-bench] cold summary: sessions=10000 min=685.6ms p50=685.6ms p95=685.6ms max=685.6ms avg=685.6ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=107.0ms p50=107.3ms p95=112.0ms max=112.0ms avg=108.8ms
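
For reading the summary lines: the p50/p95 values printed in this thread are consistent with a nearest-rank percentile over the per-run wall times. That is an inference from the numbers, not something the bench script's source confirms here, and `percentile` below is a hypothetical helper.

```typescript
// Nearest-rank percentile: the value at position ceil(p/100 * n) in sorted order.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Warm wall times (ms) from the first run above, in the order they were printed.
const warm = [111.0, 122.2, 115.6, 111.1, 117.0];
console.log(percentile(warm, 50), percentile(warm, 95)); // 115.6 122.2
```

With only 3 to 5 samples, p95 is simply the slowest run, so the summary says more about the worst case than about a tail distribution.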

#

doesn't look like I can repro

subtle gull May 4, 2026, 5:43 AM

#

how many agent directories do you have?

digital frost May 4, 2026, 5:44 AM

#

subtle gull how many agent directories do you have?

0, this is on my main laptop, fresh OC install

subtle gull May 4, 2026, 5:45 AM

#

hm. ask your agent to make you like 10 more agents? i dont know that they actually need to do anything

#

you could go through the tui and just say hi to all of them

#

actually scratch that, my test server has all 15 of my main agents on it

#

its all the subagents that seem to be the real difference between my prod and test server

tropic fox May 4, 2026, 5:47 AM

#

it's the number of session files, isn't it?

subtle gull May 4, 2026, 5:48 AM

#

you would think, but the benchmark makes it look like its not

digital frost May 4, 2026, 5:53 AM

#

my QA paired programmer

#

📎 message.txt

#

reran with 10 agents

#

anything else you want me to try?

#

or I'm nuking these agents and sessions lol

dense bison May 4, 2026, 6:01 AM

#

digital frost or I'm nuking these agents and sessions lol

Go for it

#

Thank you for testing 🫡

digital frost May 4, 2026, 6:02 AM

#

salute

subtle gull May 4, 2026, 6:08 AM

#

wiring up auth on the test server now

tropic fox May 4, 2026, 6:17 AM

#

claude sez...

⏺ Here are the baseline numbers on 2026.5.3 main, no sessions.list caching (pre-#62164):

  ┌──────────────────────┬──────────────┬───────────────┐
  │        Metric        │ 200 sessions │ 1000 sessions │
  ├──────────────────────┼──────────────┼───────────────┤
  │ Warm p50             │ ~1.35s       │ ~5.4s         │
  ├──────────────────────┼──────────────┼───────────────┤
  │ Warm p95             │ ~1.37s       │ ~6.1s         │
  ├──────────────────────┼──────────────┼───────────────┤
  │ Event loop delay avg │ ~307ms       │ ~355ms        │
  ├──────────────────────┼──────────────┼───────────────┤
  │ Cold min             │ ~4.8s        │ ~5.9s         │
  └──────────────────────┴──────────────┴───────────────┘

  The event loop blocking is significant (~350ms for 1k sessions), and the call scales roughly linearly. These are the "before"
  numbers for when #62164 eventually lands.

  Summary of what just happened:
  - Installed: OpenClaw 2026.5.3 from GitHub main (commit 5a6cedc1) via npm pack + npm install -g
  - Config cleaned up: Removed brave (broken npm package, no compiled output), codex, discord stale entries
  - Gateway: Running at pid 62167, reachable, version confirmed 2026.5.3
  - Benchmark run: sessions.list at 1000 sessions → warm p50 ~5.4s, event loop delay ~355ms — this is the pre-#62164 baseline

#

actually getting codex to run it as well since it has more context about the fix.

dense bison May 4, 2026, 6:20 AM

#

tropic fox claude sez... ``` ⏺ Here are the baseline numbers on 2026.5.3 main, no sessions...

hmm you're slightly behind main which defaults omitted sessions.list.limit to 100 and returns totalCount/limitApplied/hasMore
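
A rough sketch of what a default-bounded list call can look like, assuming the limit is applied before the expensive per-row work. `SESSIONS_LIST_DEFAULT_LIMIT` and the `totalCount`/`limitApplied`/`hasMore` field names come from this thread; `listSessions`, `SessionEntry`, `hydrate`, and the sort key are hypothetical stand-ins, not the gateway's real code.

```typescript
const SESSIONS_LIST_DEFAULT_LIMIT = 100; // name and value as quoted in this thread

// Hypothetical minimal shapes for illustration only.
interface SessionEntry { key: string; updatedAt: number }
interface ListResult<Row> {
  rows: Row[];
  totalCount: number;
  limitApplied: number;
  hasMore: boolean;
}

function listSessions<Row>(
  entries: SessionEntry[],
  hydrate: (entry: SessionEntry) => Row, // the expensive per-row step
  limit: number = SESSIONS_LIST_DEFAULT_LIMIT,
): ListResult<Row> {
  // Cheap work on the full set: sort newest first, then cut to the page.
  const newestFirst = [...entries].sort((a, b) => b.updatedAt - a.updatedAt);
  const page = newestFirst.slice(0, limit);
  return {
    rows: page.map((entry) => hydrate(entry)), // hydrate only what is returned
    totalCount: entries.length,
    limitApplied: limit,
    hasMore: entries.length > page.length,
  };
}
```

The point of the ordering is that a store with 10k sessions pays for 100 hydrations per call instead of 10k, while the caller can still tell from `totalCount` and `hasMore` that more rows exist.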

tropic fox May 4, 2026, 6:22 AM

#

hmmm stupid clanker. I saw it deciding between 5.4 and 5.3 somehow. and said it was going to use the several hours older version cos it was already in npm. I stopped it, that's when I asked you, and I told it explicitly "I want you to update my local install with the version on github main". and it took a good 20-30mins to rebuild using pnpm. That's the best way I can articulate what I saw it do lol

digital frost May 4, 2026, 6:25 AM

#

tropic fox hmmm stupid clanker. I saw it deciding between 5.4 and 5.3 somehow. and said i...

I normally just tell it to pull upstream main into local project and install from that

tropic fox May 4, 2026, 6:29 AM

#

Benchmark 1 Output

openclaw@2026.5.3 test:sessions:list:bench /private/tmp/clawdbot2
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message

  [sessions-list-bench] cold 1/3: rows=10000 wall=47172.0ms eventLoopDelayMax=3189.8ms
  [sessions-list-bench] cold 2/3: rows=10000 wall=43038.5ms eventLoopDelayMax=206.7ms
  [sessions-list-bench] cold 3/3: rows=10000 wall=40875.8ms eventLoopDelayMax=196.3ms
  [sessions-list-bench] cloned 10000 sessions across 3 agents from 34/35 transcript-backed source rows across 6 stores (921 MiB written)
  [sessions-list-bench] warmup 1/1: rows=10000 wall=39691.1ms eventLoopDelayMax=415.5ms
  [sessions-list-bench] run 1/5: rows=10000 wall=36949.5ms eventLoopDelayMax=187.4ms
  [sessions-list-bench] run 2/5: rows=10000 wall=39557.2ms eventLoopDelayMax=335.3ms
  [sessions-list-bench] run 3/5: rows=10000 wall=42008.1ms eventLoopDelayMax=200.4ms
  [sessions-list-bench] run 4/5: rows=10000 wall=42586.7ms eventLoopDelayMax=203.0ms
  [sessions-list-bench] run 5/5: rows=10000 wall=40365.8ms eventLoopDelayMax=397.7ms
  [sessions-list-bench] cold summary: sessions=10000 min=40875.8ms p50=43038.5ms p95=47172.0ms max=47172.0ms avg=43695.4ms
  [sessions-list-bench] summary: sessions=10000 rows=10000 min=36949.5ms p50=40365.8ms p95=42586.7ms max=42586.7ms avg=40293.5ms

#

Benchmark 2 Output

openclaw@2026.5.3 test:sessions:list:bench /private/tmp/clawdbot2
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message

  [sessions-list-bench] cold 1/1: rows=10000 wall=46099.3ms eventLoopDelayMax=1950.4ms
  [sessions-list-bench] cloned 10000 sessions across 3 agents from 34/35 transcript-backed source rows across 6 stores (2893 MiB written)
  [sessions-list-bench] warmup 1/1: rows=10000 wall=43491.8ms eventLoopDelayMax=214.3ms
  [sessions-list-bench] run 1/3: rows=10000 wall=39897.2ms eventLoopDelayMax=211.7ms
  [sessions-list-bench] run 2/3: rows=10000 wall=41191.1ms eventLoopDelayMax=422.1ms
  [sessions-list-bench] run 3/3: rows=10000 wall=42582.5ms eventLoopDelayMax=213.9ms
  [sessions-list-bench] cold summary: sessions=10000 min=46099.3ms p50=46099.3ms p95=46099.3ms max=46099.3ms avg=46099.3ms
  [sessions-list-bench] summary: sessions=10000 rows=10000 min=39897.2ms p50=41191.1ms p95=42582.5ms max=42582.5ms avg=41223.6ms

#

pulling main again. 5 new commits...

tropic fox May 4, 2026, 6:57 AM

#

Quick aside on the Brave issue:

    directory, no index.js or index.mjs. OpenClaw 2026.5.3 requires compiled JS artifacts for plugin packages and refuses to start if it finds a
    TypeScript-only entry. Every time npm install -g runs the new tarball, openclaw's plugin manager re-fetches and reinstalls brave from npm,
    bringing the broken package back.

calm pebble May 4, 2026, 6:57 AM

#

tropic fox Quick aside on the Brave issue: ``` The npm-published @openclaw/brave-plugin@...

I had to repackage my own plugin, as it had the same error

digital frost May 4, 2026, 7:02 AM

#

story of my fuckin life right now

#

tropic fox May 4, 2026, 7:03 AM

#

Quoth Claude:

⏺ Running on commit 3dcff3b — that's the latest main tip.
⏺ There it is. This pull added SESSIONS_LIST_DEFAULT_LIMIT = 100 — when no explicit limit is passed, sessions.list now caps at 100 rows by default, specifically to prevent large session stores from monopolizing the event loop.

Here's the comparison:

  ┌────────────────┬──────────────────────────┬───────────────┬───────────────────────┐
  │                │ Codex baseline (5a6cedc) │ Now (3dcff3b) │         Delta         │
  ├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
  │ Warm p50       │ 40.4s                    │ 1.0s          │ -97.5%                │
  ├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
  │ Warm p95       │ 42.6s                    │ 1.0s          │ -97.6%                │
  ├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
  │ Cold p50       │ 43.0s                    │ 2.3s          │ -94.6%                │
  ├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
  │ ELD avg (warm) │ ~265ms                   │ ~228ms        │ -14%                  │
  ├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
  │ rows returned  │ 10,000                   │ 100           │ default limit applied │
  └────────────────┴──────────────────────────┴───────────────┴───────────────────────┘

Sorry for the delay. AC contractor kicked my router. lol.

dense bison May 4, 2026, 7:04 AM

#

tropic fox Quoth Claude: ⏺ Running on commit 3dcff3b — that's the latest main tip. ⏺ There...

phew!

calm pebble May 4, 2026, 7:04 AM

#

digital frost

it wasn't so bad. gave an excuse to version bump it, which tends to increase the downloads (it's probably bots but, hey, number goes up)

tropic fox May 4, 2026, 7:07 AM

#

codex running the same benchmarks with 3dcff3b but sez it's unchanged for cold runs. :/ wtf. lol. waiting for final results...

#

that makes no sense. Must be some caching going on

#

Codex ran it in the wrong location and it picked up the old source tree. False alarm. The fix is real.

dense bison May 4, 2026, 7:19 AM

#

Woo! Nice

#

So I'll close your caching PR in that case

calm pebble May 4, 2026, 7:21 AM

#

does this land in 5.3? or next beta?

dense bison May 4, 2026, 7:21 AM

#

In 5.3

tropic fox May 4, 2026, 7:21 AM

#

mine is already closed. the 6000s one is someone elses, with the broader fix. Even codex says your patch is probably better than our fix because it bounds it by default. lol

#

My claw is currently awol... but I have yet to investigate why

dense bison May 4, 2026, 7:23 AM

#

tropic fox mine is already closed. the 6000s one is someone elses, with the broader fix. ...

Oh wow, got it

digital frost May 4, 2026, 7:26 AM

#

glad we figured it out

#

🐐

#

k gnight nerds

subtle gull May 4, 2026, 5:55 PM

#

scrolling back up to try to test this out, i got tied up last night and by the time i got back to it the discord was down, im not seeing a link to running the test? i'm still hitting perf issues on 2026.5.3, and its being replicated with the test server that i built and shifted my agent & subagent sessions over

#

my agent did a check through the commit history on sessions.list and has this theory

The regression likely came from sessions.list becoming too rich after v2026.4.23: subagent graph enrichment, child-session indexing, thinking/model metadata, transcript/title preview work, and plugin/session projections all ended up on a polling path.

subtle gull May 4, 2026, 6:22 PM

#

found it, running now, its running like crap with a "full" prod server load so i'll try a few scenarios

#

[sessions-list-bench] cold 1/3: rows=10000 wall=77389.5ms eventLoopDelayMax=1448.1ms
[sessions-list-bench] cold 2/3: rows=10000 wall=75926.4ms eventLoopDelayMax=119.4ms
[sessions-list-bench] cold 3/3: rows=10000 wall=75191.0ms eventLoopDelayMax=118.6ms
[sessions-list-bench] cloned 10000 sessions across 18 agents from 20/20 transcript-backed source rows across 18 stores (521 MiB written)

[sessions-list-bench] warmup 1/1: rows=10000 wall=75576.6ms eventLoopDelayMax=124.1ms
[sessions-list-bench] run 1/5: rows=10000 wall=75348.7ms eventLoopDelayMax=112.3ms

#

ok, this is odd. i shut down the TUI, shut down my chat app, nothing happening, then ran it again and im seeing similar figures plus the openclaw node process is not eating 100% cpu

#

pnpm test:sessions:list:bench -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message

openclaw@2026.5.3 test:sessions:list:bench /tmp/clawdbot-bench
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message

[sessions-list-bench] cold 1/3: rows=10000 wall=75461.7ms eventLoopDelayMax=662.2ms
[sessions-list-bench] cold 2/3: rows=10000 wall=75094.8ms eventLoopDelayMax=113.8ms

#

my test server that has no subagents

[sessions-list-bench] cold 1/3: rows=10000 wall=642.7ms eventLoopDelayMax=2.9ms
[sessions-list-bench] cold 2/3: rows=10000 wall=521.1ms eventLoopDelayMax=2.2ms
[sessions-list-bench] cold 3/3: rows=10000 wall=547.0ms eventLoopDelayMax=14.8ms
[sessions-list-bench] cloned 10000 sessions across 2 agents from 2/2 transcript-backed source rows across 20 stores (2717 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=537.6ms eventLoopDelayMax=2.4ms
[sessions-list-bench] run 1/5: rows=10000 wall=161.7ms eventLoopDelayMax=2.2ms
[sessions-list-bench] run 2/5: rows=10000 wall=162.4ms eventLoopDelayMax=2.0ms
[sessions-list-bench] run 3/5: rows=10000 wall=157.8ms eventLoopDelayMax=2.3ms
[sessions-list-bench] run 4/5: rows=10000 wall=163.2ms eventLoopDelayMax=2.2ms
[sessions-list-bench] run 5/5: rows=10000 wall=161.5ms eventLoopDelayMax=2.1ms
[sessions-list-bench] cold summary: sessions=10000 min=521.1ms p50=547.0ms p95=642.7ms max=642.7ms avg=570.3ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=157.8ms p50=161.7ms p95=163.2ms max=163.2ms avg=161.3ms

#

my agent who is helping me with this pulled out the footgun card, and has a number of hypotheses we are testing out now, when i can get the shape of this to change with a discrete step i'll provide more detail

#

Fix direction:

•Do not build subagent read index for sessions.list unless subagent fields are requested or a spawnedBy/active-run filter requires it.
•Cache the subagent read index per request window or registry signature.
•Make descendant counts lazy and bounded.
•Avoid applying subagent graph enrichment to ordinary non-subagent rows by default.
This is probably the path that turns “10k rows” into “10k rows times subagent graph work.” Classic production footgun: metadata enrichment got promoted into the hot list path.

subtle gull May 4, 2026, 7:22 PM

#

my agent patched the test runner to have source limit, 5 ran great, 6 fell flat

node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --source-limit 5 --cold-runs 1 --runs 1 --warmup 0

[sessions-list-bench] cold 1/1: rows=10000 wall=395.5ms eventLoopDelayMax=2.7ms
[sessions-list-bench] cloned 10000 sessions across 5 agents from 5/20 transcript-backed source rows across 18 stores (370 MiB written)
[sessions-list-bench] run 1/1: rows=10000 wall=348.8ms eventLoopDelayMax=2.3ms
[sessions-list-bench] cold summary: sessions=10000 min=395.5ms p50=395.5ms p95=395.5ms max=395.5ms avg=395.5ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=348.8ms p50=348.8ms p95=348.8ms max=348.8ms avg=348.8ms
lumadmin@psiclaw-template:/tmp/clawdbot-bench-forge$ pnpm test:sessions:list:bench -- --sessions 10000 --source-limit 6 --cold-runs 1 --runs 1 --warmup 0

openclaw@2026.5.3 test:sessions:list:bench /tmp/clawdbot-bench-forge
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --source-limit 6 --cold-runs 1 --runs 1 --warmup 0

[sessions-list-bench] cold 1/1: rows=10000 wall=83061.9ms eventLoopDelayMax=129.2ms
[sessions-list-bench] cloned 10000 sessions across 6 agents from 6/20 transcript-backed source rows across 18 stores (559 MiB written)

subtle gull May 5, 2026, 12:34 AM

#

found it

#

this is bizarre

#

Cause found: modelOverride.

It is not the session store shape generally. It is this field:

"modelOverride": "gpt-5.5"
In sessions.list, every row calls:

resolveSessionModelRef(cfg, entry, sessionAgentId)
when entry.modelOverride exists. That path goes through plugin model normalization and reloads plugin metadata/manifest machinery per row. When the benchmark clones one bad source row into thousands of rows, it becomes catastrophic.

Trace evidence from CPU profile:

listSessionsFromStoreAsync
buildGatewaySessionRow
resolveSessionModelRef
resolvePersistedSelectedModelRef
parseModelRef
normalizeModelRef
normalizeProviderModelId
normalizeStaticProviderModelId
normalizeProviderModelIdWithManifest
loadManifestModelIdNormalizationPolicies
loadPluginMetadataSnapshot
loadPluginRegistrySnapshotWithMetadata
loadInstalledPluginIndex
discoverOpenClawPlugins
Repro evidence:

Original Forge template source-store:

--sessions 10000
--source-store forge-psiclaw-template/sessions/sessions.json
--agent-count 1
Result:

timeout at 180s
no cold sample emitted
Same store, only removing modelOverride:

cold 1/1: rows=10000 wall=555.7ms
run 1/1: rows=10000 wall=524.8ms

#

i backed into this by testing the agents directory cloned over to my test server and running the 10k sessions test against more and more folders at once

#

the first 5 had no override and executed just fine. the 6th turns out it had an override

#

after getting the script to display which folder it was running i started renaming the folders to get the alpha sort different until i validated it wasnt based on the number of sessions in the agents folder or the size of the sessions

#

{
"key": "agent:forge:webchat:forge-main",
"modelOverride": "gpt-5.5",
"model": "gpt-5.5",
"modelProvider": "openai-codex"
}

#

Root issue: entry.modelOverride caused model normalization per session row. With 10k cloned rows, that repeatedly hit plugin metadata discovery. Future outage, now with a receipt.
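
One way to picture the fix direction, as a sketch under stated assumptions: memoize model resolution for the duration of one list call, so thousands of rows pinned to the same model pay the normalization cost once. Nothing here is the code from the PR. `makeCachedResolver`, `SessionEntry`, `ModelRef`, and the `slowResolve` callback are hypothetical stand-ins for whatever `resolveSessionModelRef` does internally.

```typescript
// Hypothetical shapes; only the field names mirror the session row shown above.
interface SessionEntry {
  key: string;
  modelOverride?: string;
  model?: string;
  modelProvider?: string;
}
type ModelRef = { provider: string; model: string };

// Build one resolver per list call so the cache cannot outlive config changes.
function makeCachedResolver(
  slowResolve: (provider: string | undefined, model: string) => ModelRef,
) {
  const cache = new Map<string, ModelRef>();
  return (entry: SessionEntry): ModelRef | undefined => {
    const model = entry.modelOverride ?? entry.model;
    if (!model) return undefined;
    // Rows sharing a provider/model pair share one cache slot.
    const cacheKey = `${entry.modelProvider ?? ""}\u0000${model}`;
    let ref = cache.get(cacheKey);
    if (!ref) {
      ref = slowResolve(entry.modelProvider, model); // expensive path, now once per pair
      cache.set(cacheKey, ref);
    }
    return ref;
  };
}
```

In the benchmark scenario above, 10k rows cloned from one source row with `"modelOverride": "gpt-5.5"` would trigger a single slow resolution instead of 10k.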

#

he did it! the outage has a fucking receipt

subtle gull May 5, 2026, 1:34 AM

#

Classic future outage with better marketing.

#

makes me giggle

#

prepping PR

subtle gull May 5, 2026, 4:00 AM

#

i think you found my PR @dense bison

#

my clanker is being a little odd, its on the dev server and its memory system has been heavily dev'd

#

77650

#

its working through getting it right

subtle gull May 5, 2026, 4:26 AM

#

@dense bison i think we got it right

#

its a pretty straightforward test, a crazy little thing not related to subagents or amount of agents but to per session model overrides, evaluated for every line

#

a bit excessive, the cache is scoped to just the sessions.list call, and seems to be a big part of the root cause people started seeing around 4.29

#

but not everyone, since not everyone does things like pin sessions to models

subtle gull May 5, 2026, 5:40 AM

#

Yep — confirmed.

PR #77650 is now:

•mergeable: true
•mergeable_state: clean
•Checks: 86 success, 1 neutral, 7 skipped
•0 queued / 0 in progress / 0 failing
•Head: b07c51a3f016
So it’s in final review/merge-ready state.

dense bison May 5, 2026, 6:05 AM

#

subtle gull > Yep — confirmed. > > PR #77650 is now: > > •mergeable: true > •mergeable_sta...

Reviewing

subtle gull May 5, 2026, 6:08 AM

#

if i need to teach my clanker a thing or two about submitting PRs im all ears, ive got a skill in a repo im developing to get this right

#

did a little research, agent was part opus 4.6 and part gpt 5.5

dense bison May 5, 2026, 6:10 AM

#

subtle gull if i need to teach my clanker a thing or two about submitting PRs im all ears, i...

Honestly if you've manually tested the fix that instantly makes your PR 10,000x better than everyone else

#

Most PRs just get the automated tests to pass and then I have to manually verify

subtle gull May 5, 2026, 6:12 AM

#

i tested it extensively to find the real problem

dense bison May 5, 2026, 6:19 AM

#

subtle gull i tested it extensively to find the real problem

Landing it 🫡

digital frost May 5, 2026, 6:20 AM

#

dense bison Landing it 🫡

🐐

#

this is why you're allowed to ping me

#

❤️

tropic fox May 5, 2026, 7:28 AM

#

subtle gull i tested it extensively to find the real problem

Nice one. this went deeper than we thought!
Someone get this man a pink hat.

subtle gull May 5, 2026, 8:20 AM

#

hell yeah and the fix made it into 2026.5.4-beta.3 just in the nick of time, and it looks like 2026.5.4 went live too

#

short beta

tropic fox May 5, 2026, 8:29 AM

#

I'm already using it. painless and quick.

#

release is bit-copy of beta3. great release.

subtle gull May 5, 2026, 9:37 AM

#

looks like only part of this PR got applied?

So 2026.5.4 shipped with the thinkingLevelsByModelRef cache, but not the selected-model override cache from our PR. That explains why it feels like the fix did not apply: the exact hot path we identified is still present in prod.

#

my prod server is hosed after updating 😄 reapplying my original fix

#

09:33:47+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 197ms conn=d554971d…0751 id=691b95b3…b625
09:33:47+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 176ms conn=d554971d…0751 id=a63c9cf1…24ed
09:33:48+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 187ms conn=d554971d…0751 id=31fb61d5…6f02
09:34:08+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 14102ms conn=d554971d…0751 id=1c963b38…1a76
09:34:22+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 14243ms conn=d554971d…0751 id=f3c8a1a7…6489
09:34:22+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 28354ms conn=d554971d…0751 id=3461a64e…6f00
09:34:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 14783ms conn=d554971d…0751 id=00fcfbb2…eb91
09:34:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 29041ms conn=d554971d…0751 id=940ac2c5…0781
09:34:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 14801ms conn=d554971d…0751 id=e9487977…b848
09:34:51+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 13652ms conn=d554971d…0751 id=7ad03cb9…e40f

#

that escalated quickly

#

oh its not in 2026.5.4 😄

#

my agent told me that it was in 2026.5.4-beta.3 but it might have been checking local source which.. surprise would contain the fix it made

#

If I were patching next, I’d start in the Control UI/ClawCanvas request layer and add a small RPC broker: method + stableJson(params) → existing in-flight request or cached recent result. That stops the bear instead of just feeding it tranquilizers.
uh oh, i called the problem a bear. goblinification warning klaxons are going off
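
The "RPC broker" idea quoted above (method + stableJson(params) mapped to an in-flight request or a recent cached result) could look roughly like the sketch below. The names `createRpcBroker`, `stableJson`, and the `send` callback are hypothetical and not taken from the Control UI or ClawCanvas code; the TTL is a caller-chosen parameter.

```typescript
// Hypothetical sketch: none of these names come from the OpenClaw codebase.
type Send = (method: string, params: unknown) => Promise<unknown>;

// Serialize with sorted object keys so equivalent params produce the same key.
function stableJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableJson(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value ?? null);
}

function createRpcBroker(
  send: Send,
  ttlMs: number,
  now: () => number = Date.now,
): Send {
  const inFlight = new Map<string, Promise<unknown>>();
  const recent = new Map<string, { at: number; value: unknown }>();

  return (method, params) => {
    const key = `${method}\0${stableJson(params)}`;

    // Serve a recent result if it is still inside the TTL window.
    const hit = recent.get(key);
    if (hit && now() - hit.at < ttlMs) {
      return Promise.resolve(hit.value);
    }

    // Join an identical request that is already on the wire.
    const pending = inFlight.get(key);
    if (pending) {
      return pending;
    }

    const request = send(method, params)
      .then((value) => {
        recent.set(key, { at: now(), value });
        return value;
      })
      .finally(() => {
        inFlight.delete(key);
      });
    inFlight.set(key, request);
    return request;
  };
}
```

With a wrapper like this, two panels asking for the same sessions.list params at the same moment would share one gateway call instead of issuing two.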

#

So the row-cache fix is not gone. The remaining failure is the request storm plus too-short cache TTL. Default sessions.list cache is only 10s, so ClawCanvas keeps hitting the expensive cold path every ~20s. Future outage, now current outage.
full circle back to outages. i think he feels very vindicated. its not a future outage, its a current outage that he predicted

#

the fixes from my PR do not improve the current situation, there is more going on

#

im still investigating, 5.5 will keep going on this, but i need to go to bed

digital frost May 5, 2026, 2:53 PM

#

robot borken right now, but I got you bb @subtle gull

#

bout time you tism maxers finally did a PR KEKWlaugh

subtle gull May 5, 2026, 2:57 PM

#

ive had a smooth going with openclaw mostly and i finally hit an issue that i was having and the maintainers werent

#

actually need to modify the PR, my claw found the real problem

#

The current root-cause fix targets the model identity resolver path:

• slow call is resolveSessionModelRef(...), ~550-700ms per affected row
• many rows still hit it even after our PR, especially rows missing persisted modelProvider + model
• resolveSessionModelIdentityRef(...) falls back into resolveSessionModelRef(...)
• resolveSessionDisplayModelIdentityRef(...) can also re-enter default model resolution for CLI/provider display normalization

So the real fix is:

1. Use persisted entry.modelProvider + entry.model directly for lightweight sessions.list rows.
2. Cache resolveSessionModelRef by agentId + modelOverride/providerOverride + runtime model/provider.
3. Precompute default model refs per agent once per list.
4. Avoid display-model normalization in lightweight rows unless required.

Short version: our PR cached some derived row fields; this fix removes/caches the expensive default-model resolver itself. The old PR was a bandage around the hotspot. This is cutting out the hotspot.

subtle gull May 5, 2026, 11:23 PM

#

2344+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 59ms conn=b9f8ca9b…4ee0 id=4da6c5aa…673e
2301+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 52ms conn=b9f8ca9b…4ee0 id=9baa2687…c1a7
2316+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1119ms conn=b9f8ca9b…4ee0 id=5b88262c…ab42
2322+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 68ms conn=4440f309…bc82 id=75d76078…0ad3
2322+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 71ms conn=b9f8ca9b…4ee0 id=0c2a2e54…ba0f
2341+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 54ms conn=b9f8ca9b…4ee0 id=6c74419a…cc14
2302+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 51ms conn=b9f8ca9b…4ee0 id=a2b1f84e…5d71
2317+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1149ms conn=b9f8ca9b…4ee0 id=0bc8245d…5f8e
2321+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 52ms conn=b9f8ca9b…4ee0 id=d44b741b…d008
2341+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 55ms conn=b9f8ca9b…4ee0 id=88b8e8cc…f8b4

#

ok we might be cooking

#

fully loaded has 1.1s as worst case, vs uhhhh like 36s

digital frost May 5, 2026, 11:28 PM

#

subtle gull fully loaded has 1.1s as worst case, vs uhhhh like 36s

at what point would this be just chasing edge cases?

subtle gull May 5, 2026, 11:28 PM

#

anything further than this

#

its all in sessions.list

#

getting the full analysis from my claw about what it found

#

we are going back to clean slate and patching just the sessions.ts that was responsible, it went in a few other weird experimental areas before we put a custom sessions.ts with extensive logging in to find out which call was the problem child

#

up until the last patch the peaks would still get up to 8000ms, which could start a backlog issue

subtle gull May 6, 2026, 12:02 AM

#

updated PR incoming

tropic fox May 6, 2026, 12:43 AM

#

digital frost at what point would this be just chasing edge cases?

being an edge case just makes it less likely to manifest but if it's a bug, it's a bug!

digital frost May 6, 2026, 12:44 AM

#

tropic fox being an edge case just makes it less likely to manifest but if it's a bug, it's...

but if it affects 0.000000000001% of users, maintainer time and resources might be better spent else where lol

subtle gull May 6, 2026, 12:44 AM

#

the core problem isnt an edge case, when it gets fixed it benefits a wide range of people, even instances where it doesnt blow the duration of sessions.list from ~500ms steady state to 30000ms+

#

even 6000-7000ms is enough to negatively impact the event queue and affect a lot of people. and the potential causes are numerous and common

#

this can happen with 1 agent and zero subagents provided the user has more than 1 model, hell, it probably even can happen with a single model

digital frost May 6, 2026, 12:46 AM

#

subtle gull this can happen with 1 agent and zero subagents provided the user has more than ...

make sure to have repro steps in your PR

subtle gull May 6, 2026, 12:46 AM

#

what happens is that sessions.list looks at all of the sessions in the openclaw folder, and all of the lines of conversation in the sessions. if a line has static model/thinking/provider assignments, it kicks off a blocking provider/model/thinking mode resolution path for each line

tropic fox May 6, 2026, 12:46 AM

#

digital frost but if it affects 0.000000000001% of users, maintainer time and resources might ...

That's what clawsweeper is for. Hopefully the PR is comprehensive enough that it's a simple [ok] click for the maintainer by the time clawsweeper is done with it.

subtle gull May 6, 2026, 12:47 AM

#

that call could be slow, especially if multiple providers and models are present

#

if you have a session file with 1000 entries in it, or 100 session files with 10 entries, things get ugly

tropic fox May 6, 2026, 12:48 AM

#

I could probably also repro this but I preemptively archived most of my session files to avoid this and related issues (most of which were already fixed in the last push)

subtle gull May 6, 2026, 1:29 AM

#

77902

#

its a bigger patch than the last one but 2026.5.4 changed things, the STR are explicit with showing the default path is the bad path, and how it blows up the more it hits the worse path

dense bison May 6, 2026, 4:57 AM

#

subtle gull 77902

Reviewing

subtle gull May 6, 2026, 4:58 AM

#

i have a few revisions from recent copilot suggestions

#

its a little narrower

#

Read: your suspicion is right. For the active fleet stores, the design assumption is backwards. sessions.list is optimized around “model data may be missing,” but the common case is “model data is already persisted.”

Operational conclusion: the fast path should be the default:

1. Use providerOverride + modelOverride if present.
2. Else use persisted modelProvider + model if present.
3. Only fall back to expensive default/agent/model resolution when both are missing.

That keeps correctness for old or malformed rows, but stops making the normal path wear the blast suit.
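
A minimal sketch of that three-step precedence, assuming a simplified row shape. The field names mirror the ones mentioned in the thread, but `resolveRowModel` and the `resolveDefault` callback are hypothetical stand-ins, not the functions in the PR.

```typescript
// Hypothetical sketch: simplified types, not the real SessionEntry.
type ModelRef = { provider: string; model: string };

type RowEntry = {
  providerOverride?: string;
  modelOverride?: string;
  modelProvider?: string;
  model?: string;
};

function resolveRowModel(
  entry: RowEntry,
  resolveDefault: () => ModelRef, // stand-in for the expensive resolver
): ModelRef {
  // 1. An explicit override wins.
  if (entry.providerOverride && entry.modelOverride) {
    return { provider: entry.providerOverride, model: entry.modelOverride };
  }
  // 2. Otherwise trust the identity already persisted on the row.
  if (entry.modelProvider && entry.model) {
    return { provider: entry.modelProvider, model: entry.model };
  }
  // 3. Only sparse or legacy rows pay for default resolution.
  return resolveDefault();
}
```

The expensive callback is only invoked in step 3, so a store where nearly every row carries persisted identity never touches it.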

#

only sampled 37 on 2 different servers

#

100% of them had the data, and currently sessions.list does model resolution each row if it has it, but doesnt when its missing. if it already has the data, why is it looking it up?

#

1000 messages and most of them had the data

dense bison May 6, 2026, 5:06 AM

#

subtle gull i have a few revisions from recent copilot suggestions

Hmm, I'm able to repro your slowdown. I can hold off until you're done though

subtle gull May 6, 2026, 5:08 AM

#

dense bison Hmm, I'm able to repro your slowdown. I can hold off until you're done though

can you do a check like what i just did, looking at the session transcripts and seeing how many have the provider+model data and how many dont? its damn near 100% for me but i am one operator using two systems intentionally operating very similar to each other
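
The check being asked for could be done with something like the sketch below, assuming the rows have already been parsed from the session store into plain objects. `countPersistedModelIdentity` and the `StoredRow` shape are hypothetical; only the modelProvider and model field names come from the thread.

```typescript
// Hypothetical sketch: counts rows that already carry provider + model.
type StoredRow = { modelProvider?: string; model?: string };

function countPersistedModelIdentity(rows: StoredRow[]): {
  complete: number;
  total: number;
} {
  const complete = rows.filter(
    (row) => Boolean(row.modelProvider) && Boolean(row.model),
  ).length;
  return { complete, total: rows.length };
}
```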

#

Latest solution: make persisted model identity the primary path for sessions.list.

Current shape:

1. Fast path first
  • If row has providerOverride + modelOverride, use it.
  • Else if row has persisted modelProvider + model, use it.
  • Do not call default model resolution.
2. Fallback only for sparse/legacy rows
  • If both persisted paths are missing, fall back to existing agent/default model resolution.
  • That preserves correctness for old rows.
3. Lightweight rows stay lightweight
  • Skip thinking enrichment for lightweight sessions.list rows.
  • Do not build full model/thinking option metadata unless the caller needs detail-view data.
4. Optional storm guard
  • Keep response caching / UI poll damping as secondary mitigation.
  • But the real fix is avoiding the expensive resolver in the normal row path.

Evidence so far:

• openclaw-prod current Gateway store: 23/23 rows had persisted model identity
• local current store: 14/14 rows had persisted model identity
• broader prod sample: 372/408 artifacts complete, missing cases were older sparse session-jsonl
• prod benchmark:
  • before: full list around 8s
  • temp patched dist: around 640ms
  • installed direct benchmark: 776-873ms
  • real Gateway WS log: sessions.list 124ms

So my current recommendation is: fold the persisted-model fast path into the maintainer PR as the primary fix, then keep the cache pieces only as defensive pressure relief. The previous cache-only path was a bandage. This is the artery clamp.

#

PR update in flight, should land shortly

#

5.5 in high thinking is a little slow sometimes

dense bison May 6, 2026, 5:31 AM

#

subtle gull May 6, 2026, 5:32 AM

#

sorry, hitting long tool calls that are killing results, getting it tightened up

dense bison May 6, 2026, 5:32 AM

#

haha no rush

subtle gull May 6, 2026, 5:33 AM

#

this clanker is getting tired we've been at it off and on all day

#

now test is running worse than prod lol. prod was chilling doing nothing most of the day just watching openclaw logs --follow sessions.list and chat.history metrics

#

uploaded, its running through the copilot/clawsweeper comments now, might have some more minor revisions

#

One Copilot note is still worth tightening before I reply: runtime model identity should bypass the default/agent cache key entirely. I’m making that small follow-up so the responses are backed by code, not hand-waving.
He doesn't like hand waving

#

The revised head is now e92c9ad7ea and pushed. I’m posting concise replies to the ClawSweeper/Copilot threads now, then I’ll try to resolve the review threads if the token has permission.

#

Comments:

• Replied to ClawSweeper
• Replied to all 5 Copilot review comments
• Resolved all 5 review threads via GraphQL

#

ok, i think hes done for now

dense bison May 6, 2026, 5:57 AM

#

subtle gull ok, i think hes done for now

cool, ready for review?

#

I don't like that it adds another cache so I'll see if I can keep it fast without it

subtle gull May 6, 2026, 5:58 AM

#

yeah, ready. i tried to avoid caches but look at the results

#

made them very narrow

#

I’d describe it as request-local row-resolution memoization, not a broad cache.

Narrowness:

• Lifetime: one sessions.list call only. It dies after the response is built.
• Scope: only model identity resolution for rows in that single response.
• Keys: effective inputs that affect model identity: runtime model, fallback ref, override ref, agent/default model ref.
• No cross-request state: no TTL, no invalidation problem, no stale config risk.
• No persistence: nothing survives config changes, session writes, or process lifetime.

Why it matters: sessions.list is not resolving one model. It is resolving model identity across hundreds to thousands of session rows, and many rows share the same effective defaults. Without memoization, the same config/default resolution path gets recomputed per row. That is the bad scaling behavior.

The maintainer-preferred version, “fast without cache,” is ideal if the resolver itself can become O(1) and allocation-light. But for this PR, the safer narrow fix is to avoid changing resolver semantics and just avoid repeating identical resolver work inside one response.

Suggested phrasing:

This is not a long-lived cache. It is request-local memoization inside one sessions.list response build. The intent is to preserve the existing resolver semantics, including aliases and wrapper-provider inference, while avoiding repeated identical model identity resolution across rows that share the same effective runtime/default/override inputs. Nothing survives the request, so there is no invalidation surface.
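
Request-local memoization as described above can be sketched like this. The names `createRequestLocalResolver`, `buildRows`, and the `RowInputs` shape are hypothetical; the only idea taken from the thread is a Map that is created inside one list build, keyed on the effective inputs, and discarded with the response.

```typescript
// Hypothetical sketch: a memo that lives for exactly one list build.
type ModelIdentity = { provider: string; model: string };

type RowInputs = {
  agentId: string;
  runtimeModel?: string;
  overrideRef?: string;
};

type Resolver = (inputs: RowInputs) => ModelIdentity;

function createRequestLocalResolver(resolve: Resolver): Resolver {
  const memo = new Map<string, ModelIdentity>();
  return (inputs) => {
    // Key on every input that can change the resolved identity.
    const key = [
      inputs.agentId,
      inputs.runtimeModel ?? "",
      inputs.overrideRef ?? "",
    ].join("\0");
    const cached = memo.get(key);
    if (cached) {
      return cached;
    }
    const value = resolve(inputs);
    memo.set(key, value);
    return value;
  };
}

function buildRows(rows: RowInputs[], resolve: Resolver): ModelIdentity[] {
  // A fresh memo per call: nothing is shared between two list requests.
  const resolveOnce = createRequestLocalResolver(resolve);
  return rows.map((row) => resolveOnce(row));
}
```

Because the Map is created inside `buildRows`, there is no TTL to tune and nothing to invalidate when config changes between requests.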

#

it wanted to cache this and cache that, time based, dont worry it'll be good enough but i kept saying no and to drill in on the specific blocking part of the hot path

#

the benchmarks showed the results improved as the problem was more specifically addressed

dense bison May 6, 2026, 6:12 AM

#

Good catch @subtle gull, pushing a fix

dense bison May 6, 2026, 6:49 AM

#

Nice!!! Going to land this in a bit

#

Great work, thank you @subtle gull and @tropic fox

tropic fox May 6, 2026, 7:25 AM

#

props to @subtle gull for this bit. My fixes didn't fix his install and I inadvertently swerved it by archiving my sessions (^^;)

digital frost May 6, 2026, 7:34 AM

#

dense bison Great work, thank you @subtle gull and @tropic fox

I'm so proud of you boys ❤️

dense bison May 6, 2026, 7:38 AM

#

PR Landed 🫡

digital frost May 6, 2026, 7:39 AM

#

dense bison PR Landed 🫡

🐐

subtle gull May 6, 2026, 9:24 AM

#

dense bison Good catch @subtle gull, pushing a fix

https://klipy.com/gifs/anthony-adams-rubbing-hands-together-spice-adams--k01KQY9N3YSC7AWXWTC564ZWEAR

#

my claw looking at that 20s resolveProviderHookPlugin

#

thinking 'i kicked this problems ass twice now i can go for some more'

#

Different function, same future outage jacket

#

oh hes got its number

subtle gull May 7, 2026, 11:09 PM

#

dense bison PR Landed 🫡

it looks like this isnt in 2026.5.7, is it showing up soon or is the underlying process getting reworked from the findings? I'm keeping track so I can reapply the fixes each time I update

dense bison May 8, 2026, 1:40 AM

#

subtle gull it looks like this isnt in 2026.5.7, is it showing up soon or is the underlying ...

It isn’t? It should’ve been in the next release after I landed it

subtle gull May 8, 2026, 1:41 AM

#

maybe my clanker is missing something, it keeps looking for the patch signals and says its not landed yet

#

• But v2026.5.7 does not contain the PR code:
  • src/gateway/session-utils.ts in tag v2026.5.7 has no modelIdentityByEntryKey
  • no allowPluginNormalization
  • no lightweight-row fast-path markers from the PR
• GitHub compare shows the release tag diverged from the PR merge commit, not ahead of it.

Operational read: 2026.5.7 was likely cut from a maintenance/release branch that did not include our merged sessions.list PR. The fix is merged to main, but not shipped in this release.

#

soon as i cut a commit on another project i'll check to see if im safe to update to 2026.5.7 and then update it and not patch, my symptoms are immediate and severe so it will be an easy test

calm pebble May 8, 2026, 1:44 AM

#

I had/have a PR that stayed in unreleased too, though in that case I was asked to review if it was needed at all or needed to change shape after this release

subtle gull May 8, 2026, 2:06 AM

#

dense bison It isn’t? It should’ve been in the next release after I landed it

Confirmed: openclaw-prod is on 2026.5.7, the package is missing the fast-path markers, and the symptom is back. I have a rebased 2026.5.7 build ready; applying it with a package backup now.

02:02:36+00:00 warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay interval=32s eventLoopDelayP99Ms=650.6 eventLoopDelayMaxMs=5838.5 eventLoopUtilization=0.687 cpuCoreRatio=0.711 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:4ms,sidecars.main-session-recovery:4ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:581ms,sidecars.session-locks:835ms,post-ready.maintenance:138ms work=[active=agent:forge:webchat:forge-main(processing,q=1,age=5s last=run:completed) queued=agent:forge:webchat:forge-main(processing,q=1,age=5s last=run:completed)]
02:02:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 591ms conn=da87abf7…1a00 id=e210e490…5c3f
02:02:45+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6502ms conn=da87abf7…1a00 id=a4c41974…01b8
02:02:45+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 6513ms conn=8e61cb8e…97ca id=cb88e5c3…3c9b
02:02:57+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 595ms conn=da87abf7…1a00 id=f580d9d1…543e
02:03:07+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6564ms conn=da87abf7…1a00 id=7e299566…abaf
02:03:17+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 582ms conn=da87abf7…1a00 id=77f8feeb…e7e0
02:03:31+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 9194ms conn=da87abf7…1a00 id=fffc9483…1a18
02:03:32+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 585ms conn=da87abf7…1a00 id=bacdf968…4389
02:03:52+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6197ms conn=da87abf7…1a00 id=6ed2e67d…b38c
02:03:54+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 617ms conn=da87abf7…1a00 id=bbab0b17…c024
02:04:14+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6417ms conn=da87abf7…1a00 id=d5e3b7c8…2c7f
02:04:14+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 7011ms conn=da87abf7…1a00 id=5b39928f…b341
02:04:36+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6477ms conn=da87abf7…1a00 id=2323de92…ee2f
02:04:36+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 7114ms conn=da87abf7…1a00 id=989a9b11…f76

yeah it doesnt look like its in

dense bison May 8, 2026, 2:12 AM

#

subtle gull 77902

Can you please ask it to check specifically if this got merged

subtle gull May 8, 2026, 2:14 AM

#

after it repatched

0242+00:00 warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay interval=34s eventLoopDelayP99Ms=302 eventLoopDelayMaxMs=4810.9 eventLoopUtilization=0.472 cpuCoreRatio=0.483 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:3ms,sidecars.main-session-recovery:4ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:643ms,post-ready.maintenance:178ms,sidecars.session-locks:1217ms work=[active=agent:forge:webchat:forge-main(processing,q=1,age=8s last=run:completed) queued=agent:forge:webchat:forge-main(processing,q=1,age=8s last=run:completed)]
0244+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 842ms conn=b767019f…acba id=0d9a0228…fbfc
0257+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1368ms conn=2edb97db…4bfe id=19fcfcb5…511d
0213+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=58352e11…9490
0230+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=fcacf92b…826a
0240+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=0202e0b5…1bab
0248+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1261ms conn=2edb97db…4bfe id=1f0f9d5d…3bcd
0250+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 834ms conn=7f63f694…8e09 id=2ed91323…b5b2
0206+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=1ae0ae1e…edd8
0223+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1369ms conn=2edb97db…4bfe id=901a65db…15c9
0239+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1481ms conn=2edb97db…4bfe id=9053e7fa…b90a

higher than what it was when we made the PR
i asked specifically for 77902 tracing

#

For #77902: yes, GitHub reports it merged.

• PR: openclaw/openclaw#77902
• State: closed
• merged=True
• Merged at: 2026-05-06T07:33:55Z
• Merge commit: 3e0fcafb8720ba3c1bb0fb430b88013eff5c6d08
• Title: fix(gateway): fast-path qualified session-list model refs

Important caveat: it was merged, but it was not present in shipped v2026.5.7, which is why the prod hotpatch was still needed.

3. Code comparison
In v2026.5.7, SessionListRowContext only has the thinking-level cache:

type SessionListRowContext = {
  subagentRuns: ReturnType;
  storeChildSessionsByKey: Map;
  thinkingLevelsByModelRef: Map>;
};

In origin/main, it has the missing selected-model cache:

type SessionListRowContext = {
  subagentRuns: ReturnType;
  storeChildSessionsByKey: Map;
  selectedModelByOverrideRef: Map>;
  thinkingLevelsByModelRef: Map>;
};

4. Row context initialization

v2026.5.7:

return {
  subagentRuns,
  storeChildSessionsByKey: buildStoreChildSessionIndex(params.store, params.now, subagentRuns),
  thinkingLevelsByModelRef: new Map(),
};

origin/main:

return {
  subagentRuns,
  storeChildSessionsByKey: buildStoreChildSessionIndex(params.store, params.now, subagentRuns),
  selectedModelByOverrideRef: new Map(),
  thinkingLevelsByModelRef: new Map(),
};

5. Missing helper function in shipped tag
origin/main contains:

function resolveSessionSelectedModelRef(params: {
  cfg: OpenClawConfig;
  entry?: SessionEntry;
  agentId: string;
  rowContext?: SessionListRowContext;
}): ReturnType | null {
  const override = normalizeStoredOverrideModel({
    providerOverride: params.entry?.providerOverride,
    modelOverride: params.entry?.modelOverride,
  });
  if (!override.modelOverride) {
    return null;
  }
  if (!params.rowContext) {
    return resolveSessionModelRef(params.cfg, params.entry, params.agentId);
  }
  const key = [
    normalizeAgentId(params.agentId),
    override.providerOverride ?? "",
    override.modelOverride,
  ].join("\0");
  const cached = params.rowContext.selectedModelByOverrideRef.get(key);
  if (cached) {
    return cached;
  }
  const selected = resolveSessionModelRef(params.cfg, params.entry, params.agentId);
  params.rowContext.selectedModelByOverrideRef.set(key, selected);
  return selected;
}

v2026.5.7 does not have this helper. It jumps from createSessionRowModelCacheKey() directly to resolveSessionRowThinkingLevels().

6. Call-site difference

v2026.5.7 still directly resolves per row:

const resolved = resolveSessionModelRef(cfg, entry, agentId);
return { provider: resolved.provider, model: resolved.model };

origin/main uses the cached helper in row construction:

const selectedModel = resolveSessionSelectedModelRef({
  cfg,
  entry,
  agentId: sessionAgentId,
  rowContext,
});

Conclusion: #77902 is merged to GitHub main, but v2026.5.7 was cut without that code. The shipped tag lacks the actual cache field, initializer, helper, and row call-site change.

#

ok that was a bit big, sorry

#

its also looking at the extra latency vs 2026.5.5 which is what i was running prior to todays patch. I think our steady state was around 400ms and peaks to 1200ms, not steady at 1200ms and peaks to 4800ms

subtle gull May 8, 2026, 2:35 AM

#

dense bison Can you please ask it to check specifically if this got merged

its in main, 5.6 and 5.7 just got cut more narrowly from main

dense bison May 8, 2026, 2:35 AM

#

Ahhh got it

subtle gull May 8, 2026, 2:36 AM

#

and my clanker thinks the extra latency is from background maintenance tasks related to trying to recover failed cron jobs

#

so not a sessions.list issue, different topic for a different PR

dense bison May 8, 2026, 3:32 AM

#

subtle gull its in main, 5.6 and 5.7 just got cut more narrowly from main

Hopefully going out soon, some major refactors on main right now so releases are being cherry picked for a bit
