#sessions.list perf
@subtle gull @tropic fox I'm trying to repro this with 10k sessions, if you folks have a minimal repro I could test that would be great
Could very well be machine dependent, I'm testing on a mac so it probably has a quick SSD
im on something that has datacenter NVME SSDs
let me get the details
i just got things stabilized so i can use my main chat app, no longer bound to the confines of the tui
my chat app was abusing the problem big time, wide scope calls against sessions a lot of times frequently, worked fine until uh, 4.27?
now its scoped down much more, plus the cache and it seems to be happy
hmm on main I made 10k sessions about 256kb each and it completes in 1s cold and 300ms hot
Without any cache
Mine is a 2.5GHz quad-core Intel MBP with 16GB memory running Sequoia 15.7.3. Only other things "running" were Tailscale, Clash proxy, Teamviewer, Obsidian and Things3 -- all in background. My numbers show only direct CPU usage excluding other processes.
Got it, lemme push a quick benchmark script
Try this please:
Run on the affected machine:
git fetch https://github.com/obviyus/clawdbot.git bench/sessions-list-main
git checkout FETCH_HEAD
pnpm install
pnpm test:sessions:list:bench -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message
If you have ~6GB free disk, also run:
pnpm test:sessions:list:bench -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message
Reply with:
- full command output
- OpenClaw version/commit
- machine + storage type
- number of agents
- whether Control UI/TUI was open while running it
imma get Claude to do that. Whilst OC is running. ok?
Make sure to test on latest main please!
ah.. ok. that will put my claw out of action so I'll just have to spend a little longer preparing for that first... brb.
Thank you, really appreciate it
let me get it to answer the right question the right way, heres a few of the details from me getting the hotfix in and getting my chat app, called clawdash, working with smarter bounded calls
Why prod hurts and test does not:
- Prod has 30 agent dirs, 1,876 files, about 1.7GB under session dirs, and heavy active churn.
- Forge alone has 367 session-dir files and about 375MB.
- The expensive part is not just returning rows, it is repeatedly discovering, loading, merging, filtering, and ranking a much larger all-agent session universe.
- Test has the same architecture but lower cardinality and less churn, so the same O(all sessions / all agents) shape stays below the pain threshold.
but even with clawdash stopped and using the TUI, the openclaw node process goes to 100% cpu util for that core
and it grinds to a halt
this is what we just finished implementing, and its making things pretty usable
- ClawDash hotfix
  - Caches channel computation for ~60-75s.
  - Uses single-flight, so concurrent callers share one request.
  - During active ClawCanvas sends, serves stale channel metadata for up to 5m.
  - Calls gateway with bounded params:
    - limit: 80-160
    - activeMinutes: 1440
    - includeDerivedTitles: false
    - includeLastMessage: false
- Gateway hotfix
  - Adds a sessions.list response cache, 15s TTL.
  - Coalesces identical in-flight requests.
  - Still recomputes on cache miss or invalidation.
- Non-hotfixed gateway path
  - Discovers all agent session stores if agentId is absent.
  - Loads and merges every agent's sessions.json.
  - Iterates the full combined store.
  - Applies filters after store merge.
  - Applies limit only after scanning candidates.
  - Builds rows, then yields every 10 rows, but much of the expensive prework is synchronous.
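For anyone following along, the "response cache with a TTL plus coalescing of identical in-flight requests" idea from the hotfix list can be sketched roughly like this. This is a minimal illustration, not the actual ClawDash or gateway code; `CoalescingCache` and its method names are made up for the example.

```typescript
// Hypothetical helper (not OpenClaw code): TTL cache plus single-flight coalescing.
type CacheEntry<T> = { value: T; expiresAt: number };

class CoalescingCache<T> {
  private readonly cache = new Map<string, CacheEntry<T>>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string, load: () => Promise<T>): Promise<T> {
    // Fresh cache hit: no work at all.
    const hit = this.cache.get(key);
    if (hit && hit.expiresAt > this.now()) return Promise.resolve(hit.value);

    // Identical request already running: share its promise (single-flight).
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    // Cache miss: run the loader once, store the result, then clear the in-flight slot.
    const p = load()
      .then((value) => {
        this.cache.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, p);
    return p;
  }

  // Drop a cached value so the next call recomputes.
  invalidate(key: string): void {
    this.cache.delete(key);
  }
}
```

With a 15s TTL, two callers that ask for the same key at the same moment trigger one load, and callers within the next 15s get the cached value. A cache miss still pays the full cost, which matches the "still recomputes on cache miss or invalidation" caveat above.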
cogitating ...
Gateway logs showed repeated slow sessions.list responses.
sessions.list 8123ms
sessions.list 8559ms
sessions.list 8337ms
sessions.list 8392ms
sessions.list 8578ms
sessions.list 8696ms
sessions.list 11470ms
count=55
min=115ms
p50=1811ms
p95=8628ms
max=11470ms
avg=4296ms
Interpretation:
• Fast path exists: some calls completed in ~500-700ms
• Bad path is very repeatable: many calls clustered around 8.1-8.7s
• Worst observed: 11.47s
• Tool-level call timed out at 10s
So this is probably not random network jitter. It smells like conditional hydration work: some requests or states hit a slow metadata path.
User-visible impact
The direct UI symptom:
• Control UI / ClawDash session list became slow or stale
• Status updates looked broken because the list did not refresh promptly
• It created enough lag that the operator thought the assistant might be broken
Operational concern: If sessions.list is taking 8-11s, it competes with ordinary chat operations on the same gateway. Even if chat still works, the UI becomes a pressure source. That is future outage bait: an observability/control surface should not degrade the thing it monitors.
god he loves talking about outages
main as in beta4 right?
No no github repo main
- Likely expensive areas
The suspected expensive work is per-row hydration across many sessions.
Likely hot spots:
• title generation or derived title hydration
• last-message preview hydration
• transcript head/tail reads per session
• model/runtime metadata projection
• token/usage fallback calculation
• plugin/setup metadata projection
• repeated UI polling every ~30s
The key risk is if sessions.list performs anything close to a transcript audit for each row.
Expected behavior should be:
• cheap index read
• cached metadata
• bounded preview data
• opt-in expensive hydration
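A rough sketch of what "cheap index read, bounded preview data, opt-in expensive hydration" could look like: apply the limit on the cheap index first, then hydrate only the rows that survive, and only when asked. The row shape and the `hydrateTitle` callback are hypothetical, not the gateway's real types.

```typescript
// Hypothetical row shapes; the real sessions.list rows carry more fields.
type IndexRow = { key: string; updatedAt: number };
type ListedRow = IndexRow & { title?: string };

function listSessions(
  index: IndexRow[],
  opts: { limit: number; includeDerivedTitles?: boolean },
  hydrateTitle: (key: string) => string, // stands in for an expensive transcript read
): ListedRow[] {
  // Cheap work first: order the index by recency and cut it to the page size.
  const page = [...index]
    .sort((a, b) => b.updatedAt - a.updatedAt)
    .slice(0, opts.limit);

  // Expensive work last: only on the bounded page, and only when requested.
  return page.map((row) =>
    opts.includeDerivedTitles ? { ...row, title: hydrateTitle(row.key) } : row,
  );
}
```

The point of the ordering is that the cost of hydration scales with `limit`, not with the size of the whole store.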
i think that more or less summarizes what i asked him to consolidate for you. im on npm branches, i could spin up another test server and migrate my install to that and convert it to a github main install
building... might take a while.
Weird, I'm still not able to repro this slowdown no matter what
im spinning up a system to test everything with
Thank you both 🫡
we can go hog wild

ive got a template and everything, sonnet will take care of this quickly
am I QA now? what we testing
i tried to run it for another datapoint. it failed horribly and I ain't breaking anything for it
Nah nothing required from you, I'm just trying to repro the session.list slowdown
baddies all up in this joint
on it boss
@dense bison is a baddie, he's allowed to ping me
now where's that "whipping claude" GIF...
I think they let me remove roles... one sec...
Brave plugin causing issues... removing it to save time.
$ pnpm test:sessions:list:bench -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message
openclaw@2026.5.3 test:sessions:list:bench <repo-root>
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message
[sessions-list-bench] cold 1/3: rows=10000 wall=682.2ms eventLoopDelayMax=1.7ms
[sessions-list-bench] cold 2/3: rows=10000 wall=712.7ms eventLoopDelayMax=1.9ms
[sessions-list-bench] cold 3/3: rows=10000 wall=760.3ms eventLoopDelayMax=1.6ms
[sessions-list-bench] cloned 10000 sessions across 1 agents from 2/2 transcript-backed source rows across 1 stores (2420 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=625.7ms eventLoopDelayMax=2.3ms
[sessions-list-bench] run 1/5: rows=10000 wall=111.0ms eventLoopDelayMax=2.0ms
[sessions-list-bench] run 2/5: rows=10000 wall=122.2ms eventLoopDelayMax=1.7ms
[sessions-list-bench] run 3/5: rows=10000 wall=115.6ms eventLoopDelayMax=2.4ms
[sessions-list-bench] run 4/5: rows=10000 wall=111.1ms eventLoopDelayMax=2.0ms
[sessions-list-bench] run 5/5: rows=10000 wall=117.0ms eventLoopDelayMax=1.7ms
[sessions-list-bench] cold summary: sessions=10000 min=682.2ms p50=712.7ms p95=760.3ms max=760.3ms avg=718.4ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=111.0ms p50=115.6ms p95=122.2ms max=122.2ms avg=115.4ms
$ pnpm test:sessions:list:bench -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message
openclaw@2026.5.3 test:sessions:list:bench <repo-root>
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message
[sessions-list-bench] cold 1/1: rows=10000 wall=685.6ms eventLoopDelayMax=1.7ms
[sessions-list-bench] cloned 10000 sessions across 1 agents from 2/2 transcript-backed source rows across 1 stores (3624 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=785.2ms eventLoopDelayMax=2.1ms
[sessions-list-bench] run 1/3: rows=10000 wall=107.0ms eventLoopDelayMax=1.6ms
[sessions-list-bench] run 2/3: rows=10000 wall=112.0ms eventLoopDelayMax=1.4ms
[sessions-list-bench] run 3/3: rows=10000 wall=107.3ms eventLoopDelayMax=1.4ms
[sessions-list-bench] cold summary: sessions=10000 min=685.6ms p50=685.6ms p95=685.6ms max=685.6ms avg=685.6ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=107.0ms p50=107.3ms p95=112.0ms max=112.0ms avg=108.8ms
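For reading the summary lines: the min/p50/p95/max/avg figures in the output above are consistent with a nearest-rank percentile over the sorted wall times. The sketch below is an assumption about how such a summary can be computed, not a copy of scripts/bench-sessions-list.ts.

```typescript
// Nearest-rank percentile over an ascending array (assumed method, see lead-in).
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p / 100) * sortedAsc.length));
  return sortedAsc[rank - 1] ?? NaN;
}

function summarize(samplesMs: number[]) {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  return {
    min: sorted[0] ?? NaN,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    max: sorted[sorted.length - 1] ?? NaN,
    avg: sum / sorted.length,
  };
}

// The three warm runs from the inflated-transcript benchmark above.
const warm = summarize([107.0, 112.0, 107.3]);
console.log(warm.min, warm.p50, warm.p95, warm.max, warm.avg.toFixed(1));
```

With only three to five samples, p95 is simply the max, so a single slow run dominates it.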
doesn't look like I can repro
how many agent directories do you have?
0, this is on my main laptop, fresh OC install
hm. ask your agent to make you like 10 more agents? i dont know that they actually need to do anything
you could go through the tui and just say hi to all of them
actually scratch that, my test server has all 15 of my main agents on it
its all the subagents that seem to be the real difference between my prod and test server
it's the number of session files, isn't it?
you would think, but the benchmark makes it look like its not
my QA paired programmer
reran with 10 agents
anything else you want me to try?
or I'm nuking these agents and sessions lol
Go for it
Thank you for testing 🫡

wiring up auth on the test server now
claude sez...
⏺ Here are the baseline numbers on 2026.5.3 main, no sessions.list caching (pre-#62164):
┌──────────────────────┬──────────────┬───────────────┐
│ Metric               │ 200 sessions │ 1000 sessions │
├──────────────────────┼──────────────┼───────────────┤
│ Warm p50             │ ~1.35s       │ ~5.4s         │
├──────────────────────┼──────────────┼───────────────┤
│ Warm p95             │ ~1.37s       │ ~6.1s         │
├──────────────────────┼──────────────┼───────────────┤
│ Event loop delay avg │ ~307ms       │ ~355ms        │
├──────────────────────┼──────────────┼───────────────┤
│ Cold min             │ ~4.8s        │ ~5.9s         │
└──────────────────────┴──────────────┴───────────────┘
The event loop blocking is significant (~350ms for 1k sessions), and the call scales roughly linearly. These are the "before" numbers for when #62164 eventually lands.
Summary of what just happened:
- Installed: OpenClaw 2026.5.3 from GitHub main (commit 5a6cedc1) via npm pack + npm install -g
- Config cleaned up: Removed brave (broken npm package, no compiled output), codex, discord stale entries
- Gateway: Running at pid 62167, reachable, version confirmed 2026.5.3
- Benchmark run: sessions.list at 1000 sessions: warm p50 ~5.4s, event loop delay ~355ms; this is the pre-#62164 baseline
actually getting codex to run it as well since it has more context about the fix.
hmm you're slightly behind main which defaults omitted sessions.list.limit to 100 and returns totalCount/limitApplied/hasMore
hmmm stupid clanker. I saw it deciding between 5.4 and 5.3 somehow. and said it was going to use the several hours older version cos it was already in npm. I stopped it, that's when I asked you, and I told it explicitly "I want you to update my local install with the version on github main". and it took a good 20-30mins to rebuild using pnpm. That's the best way I can articulate what I saw it do lol
I normally just tell it to pull upstream main into local project and install from that
Benchmark 1 Output
openclaw@2026.5.3 test:sessions:list:bench /private/tmp/clawdbot2
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message
[sessions-list-bench] cold 1/3: rows=10000 wall=47172.0ms eventLoopDelayMax=3189.8ms
[sessions-list-bench] cold 2/3: rows=10000 wall=43038.5ms eventLoopDelayMax=206.7ms
[sessions-list-bench] cold 3/3: rows=10000 wall=40875.8ms eventLoopDelayMax=196.3ms
[sessions-list-bench] cloned 10000 sessions across 3 agents from 34/35 transcript-backed source rows across 6 stores (921 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=39691.1ms eventLoopDelayMax=415.5ms
[sessions-list-bench] run 1/5: rows=10000 wall=36949.5ms eventLoopDelayMax=187.4ms
[sessions-list-bench] run 2/5: rows=10000 wall=39557.2ms eventLoopDelayMax=335.3ms
[sessions-list-bench] run 3/5: rows=10000 wall=42008.1ms eventLoopDelayMax=200.4ms
[sessions-list-bench] run 4/5: rows=10000 wall=42586.7ms eventLoopDelayMax=203.0ms
[sessions-list-bench] run 5/5: rows=10000 wall=40365.8ms eventLoopDelayMax=397.7ms
[sessions-list-bench] cold summary: sessions=10000 min=40875.8ms p50=43038.5ms p95=47172.0ms max=47172.0ms avg=43695.4ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=36949.5ms p50=40365.8ms p95=42586.7ms max=42586.7ms avg=40293.5ms
Benchmark 2 Output
openclaw@2026.5.3 test:sessions:list:bench /private/tmp/clawdbot2
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --inflate-transcript-kib 256 --cold-runs 1 --runs 3 --warmup 1 --include-derived-titles --include-last-message
[sessions-list-bench] cold 1/1: rows=10000 wall=46099.3ms eventLoopDelayMax=1950.4ms
[sessions-list-bench] cloned 10000 sessions across 3 agents from 34/35 transcript-backed source rows across 6 stores (2893 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=43491.8ms eventLoopDelayMax=214.3ms
[sessions-list-bench] run 1/3: rows=10000 wall=39897.2ms eventLoopDelayMax=211.7ms
[sessions-list-bench] run 2/3: rows=10000 wall=41191.1ms eventLoopDelayMax=422.1ms
[sessions-list-bench] run 3/3: rows=10000 wall=42582.5ms eventLoopDelayMax=213.9ms
[sessions-list-bench] cold summary: sessions=10000 min=46099.3ms p50=46099.3ms p95=46099.3ms max=46099.3ms avg=46099.3ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=39897.2ms p50=41191.1ms p95=42582.5ms max=42582.5ms avg=41223.6ms
pulling main again. 5 new commits...
Quick aside on the Brave issue:
directory, no index.js or index.mjs. OpenClaw 2026.5.3 requires compiled JS artifacts for plugin packages and refuses to start if it finds a
TypeScript-only entry. Every time npm install -g runs the new tarball, openclaw's plugin manager re-fetches and reinstalls brave from npm,
bringing the broken package back.
I had to repackage my own plugin, as it had the same error
Quoth Claude:
⏺ Running on commit 3dcff3b, that's the latest main tip.
⏺ There it is. This pull added SESSIONS_LIST_DEFAULT_LIMIT = 100: when no explicit limit is passed, sessions.list now caps at 100 rows by default, specifically to prevent large session stores from monopolizing the event loop.
Here's the comparison:
┌────────────────┬──────────────────────────┬───────────────┬───────────────────────┐
│                │ Codex baseline (5a6cedc) │ Now (3dcff3b) │ Delta                 │
├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
│ Warm p50       │ 40.4s                    │ 1.0s          │ -97.5%                │
├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
│ Warm p95       │ 42.6s                    │ 1.0s          │ -97.6%                │
├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
│ Cold p50       │ 43.0s                    │ 2.3s          │ -94.6%                │
├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
│ ELD avg (warm) │ ~265ms                   │ ~228ms        │ -14%                  │
├────────────────┼──────────────────────────┼───────────────┼───────────────────────┤
│ rows returned  │ 10,000                   │ 100           │ default limit applied │
└────────────────┴──────────────────────────┴───────────────┴───────────────────────┘
Sorry for the delay. AC contractor kicked my router. lol.
phew!
it wasn't so bad. gave an excuse to version bump it, which tends to increase the downloads (it's probably bots but, hey, number goes up)
codex running the same benchmarks with 3dcff3b but sez it's unchanged for cold runs. :/ wtf. lol. waiting for final results...
that makes no sense. Must be some caching going on
Codex ran it in the wrong location and it picked up the old source tree. False alarm. The fix is real.
does this land in 5.3? or next beta?
In 5.3
mine is already closed. the 6000s one is someone elses, with the broader fix. Even codex says your patch is probably better than our fix because it bounds it by default. lol
My claw is currently awol... but I have yet to investigate why
Oh wow, got it
scrolling back up to try to test this out, i got tied up last night and by the time i got back to it the discord was down, im not seeing a link to running the test? i'm still hitting perf issues on 2026.5.3, and its being replicated with the test server that i built and shifted my agent & subagent sessions over
my agent did a check through the commit history on sessions.list and has this theory
The regression likely came from sessions.list becoming too rich after v2026.4.23: subagent graph enrichment, child-session indexing, thinking/model metadata, transcript/title preview work, and plugin/session projections all ended up on a polling path.
found it, running now, its running like crap with a "full" prod server load so i'll try a few scenarios
[sessions-list-bench] cold 1/3: rows=10000 wall=77389.5ms eventLoopDelayMax=1448.1ms
[sessions-list-bench] cold 2/3: rows=10000 wall=75926.4ms eventLoopDelayMax=119.4ms
[sessions-list-bench] cold 3/3: rows=10000 wall=75191.0ms eventLoopDelayMax=118.6ms
[sessions-list-bench] cloned 10000 sessions across 18 agents from 20/20 transcript-backed source rows across 18 stores (521 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=75576.6ms eventLoopDelayMax=124.1ms
[sessions-list-bench] run 1/5: rows=10000 wall=75348.7ms eventLoopDelayMax=112.3ms
ok, this is odd. i shut down the TUI, shut down my chat app, nothing happening, then ran it again and im seeing similar figures plus the openclaw node process is not eating 100% cpu
pnpm test:sessions:list:bench -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message
openclaw@2026.5.3 test:sessions:list:bench /tmp/clawdbot-bench
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --cold-runs 3 --runs 5 --warmup 1 --include-derived-titles --include-last-message
[sessions-list-bench] cold 1/3: rows=10000 wall=75461.7ms eventLoopDelayMax=662.2ms
[sessions-list-bench] cold 2/3: rows=10000 wall=75094.8ms eventLoopDelayMax=113.8ms
my test server that has no subagents
[sessions-list-bench] cold 1/3: rows=10000 wall=642.7ms eventLoopDelayMax=2.9ms
[sessions-list-bench] cold 2/3: rows=10000 wall=521.1ms eventLoopDelayMax=2.2ms
[sessions-list-bench] cold 3/3: rows=10000 wall=547.0ms eventLoopDelayMax=14.8ms
[sessions-list-bench] cloned 10000 sessions across 2 agents from 2/2 transcript-backed source rows across 20 stores (2717 MiB written)
[sessions-list-bench] warmup 1/1: rows=10000 wall=537.6ms eventLoopDelayMax=2.4ms
[sessions-list-bench] run 1/5: rows=10000 wall=161.7ms eventLoopDelayMax=2.2ms
[sessions-list-bench] run 2/5: rows=10000 wall=162.4ms eventLoopDelayMax=2.0ms
[sessions-list-bench] run 3/5: rows=10000 wall=157.8ms eventLoopDelayMax=2.3ms
[sessions-list-bench] run 4/5: rows=10000 wall=163.2ms eventLoopDelayMax=2.2ms
[sessions-list-bench] run 5/5: rows=10000 wall=161.5ms eventLoopDelayMax=2.1ms
[sessions-list-bench] cold summary: sessions=10000 min=521.1ms p50=547.0ms p95=642.7ms max=642.7ms avg=570.3ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=157.8ms p50=161.7ms p95=163.2ms max=163.2ms avg=161.3ms
my agent who is helping me with this pulled out the footgun card, and has a number of hypotheses we are testing out now, when i can get the shape of this to change with a discrete step i'll provide more detail
Fix direction:
• Do not build subagent read index for sessions.list unless subagent fields are requested or a spawnedBy/active-run filter requires it.
• Cache the subagent read index per request window or registry signature.
• Make descendant counts lazy and bounded.
• Avoid applying subagent graph enrichment to ordinary non-subagent rows by default.
This is probably the path that turns “10k rows” into “10k rows times subagent graph work.” Classic production footgun: metadata enrichment got promoted into the hot list path.
my agent patched the test runner to have source limit, 5 ran great, 6 fell flat
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --source-limit 5 --cold-runs 1 --runs 1 --warmup 0
[sessions-list-bench] cold 1/1: rows=10000 wall=395.5ms eventLoopDelayMax=2.7ms
[sessions-list-bench] cloned 10000 sessions across 5 agents from 5/20 transcript-backed source rows across 18 stores (370 MiB written)
[sessions-list-bench] run 1/1: rows=10000 wall=348.8ms eventLoopDelayMax=2.3ms
[sessions-list-bench] cold summary: sessions=10000 min=395.5ms p50=395.5ms p95=395.5ms max=395.5ms avg=395.5ms
[sessions-list-bench] summary: sessions=10000 rows=10000 min=348.8ms p50=348.8ms p95=348.8ms max=348.8ms avg=348.8ms
lumadmin@psiclaw-template:/tmp/clawdbot-bench-forge$ pnpm test:sessions:list:bench -- --sessions 10000 --source-limit 6 --cold-runs 1 --runs 1 --warmup 0
openclaw@2026.5.3 test:sessions:list:bench /tmp/clawdbot-bench-forge
node --import tsx scripts/bench-sessions-list.ts -- --sessions 10000 --source-limit 6 --cold-runs 1 --runs 1 --warmup 0
[sessions-list-bench] cold 1/1: rows=10000 wall=83061.9ms eventLoopDelayMax=129.2ms
[sessions-list-bench] cloned 10000 sessions across 6 agents from 6/20 transcript-backed source rows across 18 stores (559 MiB written)
found it
this is bizzare
Cause found: modelOverride.
It is not the session store shape generally. It is this field:
"modelOverride": "gpt-5.5"
In sessions.list, every row calls:
resolveSessionModelRef(cfg, entry, sessionAgentId)
when entry.modelOverride exists. That path goes through plugin model normalization and reloads plugin metadata/manifest machinery per row. When the benchmark clones one bad source row into thousands of rows, it becomes catastrophic.
Trace evidence from CPU profile:
listSessionsFromStoreAsync
buildGatewaySessionRow
resolveSessionModelRef
resolvePersistedSelectedModelRef
parseModelRef
normalizeModelRef
normalizeProviderModelId
normalizeStaticProviderModelId
normalizeProviderModelIdWithManifest
loadManifestModelIdNormalizationPolicies
loadPluginMetadataSnapshot
loadPluginRegistrySnapshotWithMetadata
loadInstalledPluginIndex
discoverOpenClawPlugins
Repro evidence:
Original Forge template source-store:
--sessions 10000
--source-store forge-psiclaw-template/sessions/sessions.json
--agent-count 1
Result:
timeout at 180s
no cold sample emitted
Same store, only removing modelOverride:
cold 1/1: rows=10000 wall=555.7ms
run 1/1: rows=10000 wall=524.8ms
i backed into this by testing the agents directory cloned over to my test server and running the 10k sessions test against more and more folders at once
the first 5 had no override and executed just fine. the 6th turns out it had an override
after getting the script to display which folder it was running i started renaming the folders to get the alpha sort different until i validated it wasn't based on the number of sessions in the agents folder or the size of the sessions
{
"key": "agent:forge:webchat:forge-main",
"modelOverride": "gpt-5.5",
"model": "gpt-5.5",
"modelProvider": "openai-codex"
}
Root issue: entry.modelOverride caused model normalization per session row. With 10k cloned rows, that repeatedly hit plugin metadata discovery. Future outage, now with a receipt.
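To make the failure mode concrete: if every row with a modelOverride triggers an expensive normalization, 10k cloned rows means 10k expensive calls. A cache scoped to a single list call collapses that to one call per distinct provider/override pair. The sketch below only illustrates the shape of that idea; `buildRows` and the `normalizeModel` callback are hypothetical stand-ins, not the real resolveSessionModelRef path.

```typescript
// Hypothetical entry shape, loosely based on the sample row above.
type SessionEntry = { key: string; modelOverride?: string; modelProvider?: string };

function buildRows(
  entries: SessionEntry[],
  normalizeModel: (provider: string, model: string) => string, // stands in for the expensive path
): { key: string; model?: string }[] {
  // The memo lives only for this one list call, so nothing goes stale across calls.
  const memo = new Map<string, string>();
  return entries.map((entry) => {
    if (!entry.modelOverride) return { key: entry.key }; // no override, no normalization
    const provider = entry.modelProvider ?? "";
    const cacheKey = `${provider}|${entry.modelOverride}`;
    let model = memo.get(cacheKey);
    if (model === undefined) {
      model = normalizeModel(provider, entry.modelOverride); // paid once per distinct pair
      memo.set(cacheKey, model);
    }
    return { key: entry.key, model };
  });
}
```

With one source row cloned 10,000 times, the expensive callback runs once instead of 10,000 times.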
he did it! the outage has a fucking receipt
i think you found my PR @dense bison
my clanker is being a little odd, its on the dev server and its memory system has been heavily dev'd
77650
its working through getting it right
@dense bison i think we got it right
its a pretty straightforward test, a crazy little thing not related to subagents or amount of agents but to per session model overrides, evaluated for every line
a bit excessive, the cache is scoped to just the session.list call, and seems to be a big part of the root cause people started seeing around 4.29
but not everyone, since not everyone does things like pin sessions to models
Yep, confirmed.
PR #77650 is now:
• mergeable: true
• mergeable_state: clean
• Checks: 86 success, 1 neutral, 7 skipped
• 0 queued / 0 in progress / 0 failing
• Head: b07c51a3f016
So it's in final review/merge-ready state.
Reviewing
if i need to teach my clanker a thing or two about submitting PRs im all ears, ive got a skill in a repo im developing to get this right
did a little research, agent was part opus 4.6 and part gpt 5.5
Honestly if you've manually tested the fix that instantly makes your PR 10,000x better than everyone else
Most PRs just get the automated tests to pass and then I have to manually verify
i tested it extensively to find the real problem
Landing it 🫡
Nice one. this went deeper than we thought!
Someone get this man a pink hat.
hell yeah and the fix made it into 2026.5.4-beta.3 just in the nick of time, and it looks like 2026.5.4 went live too
short beta
I'm already using it. painless and quick.
release is bit-copy of beta3. great release.
looks like only part of this PR got applied?
So 2026.5.4 shipped with the thinkingLevelsByModelRef cache, but not the selected-model override cache from our PR. That explains why it feels like the fix did not apply: the exact hot path we identified is still present in prod.
my prod server is hosed after updating, reapplying my original fix
09:33:47+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 197ms conn=d554971d…0751 id=691b95b3…b625
09:33:47+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 176ms conn=d554971d…0751 id=a63c9cf1…24ed
09:33:48+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 187ms conn=d554971d…0751 id=31fb61d5…6f02
09:34:08+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 14102ms conn=d554971d…0751 id=1c963b38…1a76
09:34:22+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 14243ms conn=d554971d…0751 id=f3c8a1a7…6489
09:34:22+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 28354ms conn=d554971d…0751 id=3461a64e…6f00
09:34:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 14783ms conn=d554971d…0751 id=00fcfbb2…eb91
09:34:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 29041ms conn=d554971d…0751 id=940ac2c5…0781
09:34:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 14801ms conn=d554971d…0751 id=e9487977…b848
09:34:51+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 13652ms conn=d554971d…0751 id=7ad03cb9…e40f
that escalated quickly
oh its not in 2026.5.4
my agent told me that it was in 2026.5.4-beta.3 but it might have been checking local source which.. surprise would contain the fix it made
If I were patching next, I'd start in the Control UI/ClawCanvas request layer and add a small RPC broker: method + stableJson(params) -> existing in-flight request or cached recent result. That stops the bear instead of just feeding it tranquilizers.
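The broker key only works if two param objects with the same content always serialize to the same string, regardless of key order. The chat only names stableJson, so the version below is a hypothetical implementation of that idea, with `requestKey` as an equally hypothetical wrapper.

```typescript
// Hypothetical stableJson: like JSON.stringify, but with object keys sorted
// so that key order in the params object does not change the result.
function stableJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // match JSON.stringify: skip undefined fields
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableJson(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  // Primitives; undefined inside an array becomes null, as in JSON.stringify.
  return JSON.stringify(value) ?? "null";
}

// Key for looking up an in-flight request or a recent cached result.
function requestKey(method: string, params: unknown): string {
  return `${method}:${stableJson(params)}`;
}
```

Two pollers sending `{ limit: 100, activeMinutes: 1440 }` and `{ activeMinutes: 1440, limit: 100 }` then land on the same key and can share one gateway call.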
uh oh, i called the problem a bear. goblinification warning klaxons are going off
So the row-cache fix is not gone. The remaining failure is the request storm plus too-short cache TTL. Default sessions.list cache is only 10s, so ClawCanvas keeps hitting the expensive cold path every ~20s. Future outage, now current outage.
full circle back to outages. i think he feels very vindicated. its not a future outage, its a current outage that he predicted
the fixes from my PR do not improve the current situation, there is more going on
im still investigating, 5.5 will keep going on this, but i need to go to bed
robot borken right now, but I got you bb @subtle gull
bout time you tism maxers finally did a PR 
ive had a smooth going with openclaw mostly and i finally hit an issue that i was having and the maintainers werent
actually need to modify the PR, my claw found the real problem
The current root-cause fix targets the model identity resolver path:
• slow call is resolveSessionModelRef(...), ~550-700ms per affected row
• many rows still hit it even after our PR, especially rows missing persisted modelProvider + model
• resolveSessionModelIdentityRef(...) falls back into resolveSessionModelRef(...)
• resolveSessionDisplayModelIdentityRef(...) can also re-enter default model resolution for CLI/provider display normalization
So the real fix is:
1. Use persisted entry.modelProvider + entry.model directly for lightweight sessions.list rows.
2. Cache resolveSessionModelRef by agentId + modelOverride/providerOverride + runtime model/provider.
3. Precompute default model refs per agent once per list.
4. Avoid display-model normalization in lightweight rows unless required.
Short version: our PR cached some derived row fields; this fix removes/caches the expensive default-model resolver itself. The old PR was a bandage around the hotspot. This is cutting out the hotspot.
2344+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 59ms conn=b9f8ca9b…4ee0 id=4da6c5aa…673e
2301+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 52ms conn=b9f8ca9b…4ee0 id=9baa2687…c1a7
2316+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1119ms conn=b9f8ca9b…4ee0 id=5b88262c…ab42
2322+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 68ms conn=4440f309…bc82 id=75d76078…0ad3
2322+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 71ms conn=b9f8ca9b…4ee0 id=0c2a2e54…ba0f
2341+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 54ms conn=b9f8ca9b…4ee0 id=6c74419a…cc14
2302+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 51ms conn=b9f8ca9b…4ee0 id=a2b1f84e…5d71
2317+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1149ms conn=b9f8ca9b…4ee0 id=0bc8245d…5f8e
2321+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 52ms conn=b9f8ca9b…4ee0 id=d44b741b…d008
2341+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 55ms conn=b9f8ca9b…4ee0 id=88b8e8cc…f8b4
ok we might be cooking
fully loaded has 1.1s as worst case, vs uhhhh like 36s
at what point would this be just chasing edge cases?
anything further than this
its all in sessions.list
getting the full analysis from my claw about what it found
we are going back to clean slate and patching just the sessions.ts that was responsible, it went in a few other weird experimental areas before we put a custom sessions.ts with extensive logging in to find out which call was the problem child
up until the last patch the peaks would still get up to 8000ms, which could start a backlog issue
updated PR incoming
being an edge case just makes it less likely to manifest but if it's a bug, it's a bug!
but if it affects 0.000000000001% of users, maintainer time and resources might be better spent else where lol
the core problem isnt an edge case, when it gets fixed it benefits a wide range of people, even instances where it doesnt blow the duration of sessions.list from ~500ms steady state to 30000ms+
even 6000-7000ms is enough to negatively impact the event queue and affect a lot of people. and the potential causes are quite a bit and common
this can happen with 1 agent and zero subagents provided the user has more than 1 model, hell, it probably even can happen with a single model
make sure to have repro steps in your PR
what happens is that sessions.list looks at all of the sessions in the openclaw folder, and all of the lines of conversation in the sessions, and if a line has static model/thinking/provider assignments, it kicks off a blocking provider/model/thinking mode resolution path for each line
That's what clawsweeper is for. Hopefully the PR is comprehensive enough that it's a simple [ok] click for the maintainer by the time clawsweeper is done with it.
that call could be slow, especially if multiple providers and models are present
if you have a session file with 1000 entries in it, or 100 session files with 10 entries, things get ugly
I could probably also repro this but I preemptively archived most of my session files to avoid this and related issues (most of which were already fixed in the last push)
77902
its a bigger patch than the last one but 2026.5.4 changed things, the STR are explicit with showing the default path is the bad path, and how it blows up the more it hits the worse path
Reviewing
i have a few revisions from recent copilot suggestions
its a little narrower
Read: your suspicion is right. For the active fleet stores, the design assumption is backwards. sessions.list is optimized around "model data may be missing," but the common case is "model data is already persisted."
Operational conclusion: the fast path should be the default:
1. Use providerOverride + modelOverride if present.
2. Else use persisted modelProvider + model if present.
3. Only fall back to expensive default/agent/model resolution when both are missing.
That keeps correctness for old or malformed rows, but stops making the normal path wear the blast suit.
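The three-step order above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the four field names (providerOverride, modelOverride, modelProvider, model) come from the discussion, while `FastPathRow`, `FastPathIdentity`, `resolveRowModelIdentity` and the `expensiveResolve` callback are hypothetical names standing in for the real types and the existing default/agent/model resolver.

```typescript
// Hypothetical row shape; only the four field names come from the thread.
interface FastPathRow {
  providerOverride?: string;
  modelOverride?: string;
  modelProvider?: string;
  model?: string;
}

interface FastPathIdentity {
  provider: string;
  model: string;
}

// Stand-in for the expensive default/agent/model resolution path.
type ExpensiveResolver = (row: FastPathRow) => FastPathIdentity;

function resolveRowModelIdentity(
  row: FastPathRow,
  expensiveResolve: ExpensiveResolver,
): FastPathIdentity {
  // 1. An explicit override wins when both halves are present.
  if (row.providerOverride && row.modelOverride) {
    return { provider: row.providerOverride, model: row.modelOverride };
  }
  // 2. Otherwise trust the identity that was already persisted on the row.
  if (row.modelProvider && row.model) {
    return { provider: row.modelProvider, model: row.model };
  }
  // 3. Only sparse or legacy rows pay for the slow resolver.
  return expensiveResolve(row);
}
```

With this shape, the slow resolver is only reached for rows that carry neither an override nor a persisted identity, which matches the "fallback only" intent described above.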
only sampled 37 on 2 different servers
100% of them had the data, and currently sessions.list does model resolution for each row that has the data, but doesn't when it's missing. if it already has the data, why is it looking it up?
1000 messages and most of them had the data
Hmm, I'm able to repro your slowdown. I can hold off until you're done though
can you do a check like what i just did, looking at the session transcripts and seeing how many have the provider+model data and how many dont? its damn near 100% for me but i am one operator using two systems intentionally operating very similar to each other
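For anyone wanting to run the same kind of check, here is a minimal sketch of a coverage counter. It assumes each transcript line is a standalone JSON object that may carry string `modelProvider` and `model` fields; that layout is an assumption based on the field names used in this thread, and `countPersistedModelIdentity` is a hypothetical helper, not part of OpenClaw.

```typescript
interface CoverageReport {
  total: number;        // non-empty lines seen
  withIdentity: number; // lines carrying both modelProvider and model
  missing: number;      // parsable lines lacking one or both fields
  unparsable: number;   // lines that are not valid JSON
}

function countPersistedModelIdentity(jsonl: string): CoverageReport {
  const report: CoverageReport = { total: 0, withIdentity: 0, missing: 0, unparsable: 0 };
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    report.total++;
    let entry: unknown;
    try {
      entry = JSON.parse(trimmed);
    } catch {
      report.unparsable++;
      continue;
    }
    if (typeof entry !== "object" || entry === null) {
      report.missing++;
      continue;
    }
    const record = entry as Record<string, unknown>;
    const provider = record["modelProvider"];
    const model = record["model"];
    const hasBoth =
      typeof provider === "string" && provider !== "" &&
      typeof model === "string" && model !== "";
    if (hasBoth) {
      report.withIdentity++;
    } else {
      report.missing++;
    }
  }
  return report;
}
```

Feeding it the text of each session file and summing the reports would give a "rows with identity / total rows" figure comparable to the ones quoted in this thread.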
Latest solution: make persisted model identity the primary path for sessions.list.
Current shape:
1. Fast path first
• If row has providerOverride + modelOverride, use it.
• Else if row has persisted modelProvider + model, use it.
• Do not call default model resolution.
2. Fallback only for sparse/legacy rows
• If both persisted paths are missing, fall back to existing agent/default model resolution.
• That preserves correctness for old rows.
3. Lightweight rows stay lightweight
• Skip thinking enrichment for lightweight sessions.list rows.
• Do not build full model/thinking option metadata unless the caller needs detail-view data.
4. Optional storm guard
• Keep response caching / UI poll damping as secondary mitigation.
• But the real fix is avoiding the expensive resolver in the normal row path.
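Point 3 above (lightweight rows stay lightweight) could look something like the sketch below. All names here (`SessionListRowView`, `buildSessionListRow`, `includeDetail`, `resolveThinkingLevels`) are hypothetical and only illustrate the idea of skipping thinking enrichment unless the caller asks for detail-view data.

```typescript
interface SessionListRowView {
  key: string;
  provider: string;
  model: string;
  thinkingLevels?: string[]; // only populated for detail views
}

// Stand-in for whatever builds model/thinking option metadata.
type ThinkingLevelResolver = (provider: string, model: string) => string[];

function buildSessionListRow(
  source: { key: string; provider: string; model: string },
  options: { includeDetail: boolean },
  resolveThinkingLevels: ThinkingLevelResolver,
): SessionListRowView {
  const row: SessionListRowView = {
    key: source.key,
    provider: source.provider,
    model: source.model,
  };
  // List views skip the enrichment entirely; detail views opt in.
  if (options.includeDetail) {
    row.thinkingLevels = resolveThinkingLevels(source.provider, source.model);
  }
  return row;
}
```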
Evidence so far:
• openclaw-prod current Gateway store: 23/23 rows had persisted model identity
• local current store: 14/14 rows had persisted model identity
• broader prod sample: 372/408 artifacts complete, missing cases were older sparse session-jsonl
• prod benchmark:
• before: full list around 8s
• temp patched dist: around 640ms
• installed direct benchmark: 776-873ms
• real Gateway WS log: sessions.list 124ms
So my current recommendation is: fold the persisted-model fast path into the maintainer PR as the primary fix, then keep the cache pieces only as defensive pressure relief. The previous cache-only path was a bandage. This is the artery clamp.
PR update in flight, should land shortly
5.5 in high thinking is a little slow sometimes
sorry, hitting long tool calls that are killing results, getting it tightened up
haha no rush
this clanker is getting tired we've been at it off and on all day
now test is running worse than prod lol. prod was chilling doing nothing most of the day just watching openclaw logs --follow session.list and chat.history metrics
uploaded, its running through the copilot/clawsweeper comments now, might have some more minor revisions
One Copilot note is still worth tightening before I reply: runtime model identity should bypass the default/agent cache key entirely. I'm making that small follow-up so the responses are backed by code, not hand-waving.
He doesn't like hand waving
The revised head is now e92c9ad7ea and pushed. I'm posting concise replies to the ClawSweeper/Copilot threads now, then I'll try to resolve the review threads if the token has permission.
Comments:
• Replied to ClawSweeper
• Replied to all 5 Copilot review comments
• Resolved all 5 review threads via GraphQL
ok, i think hes done for now
cool, ready for review?
I don't like that it adds another cache so I'll see if I can keep it fast without it
yeah, ready. i tried to avoid caches but look at the results
made them very narrow
I'd describe it as request-local row-resolution memoization, not a broad cache.
Narrowness:
• Lifetime: one sessions.list call only. It dies after the response is built.
• Scope: only model identity resolution for rows in that single response.
• Keys: effective inputs that affect model identity: runtime model, fallback ref, override ref, agent/default model ref.
• No cross-request state: no TTL, no invalidation problem, no stale config risk.
• No persistence: nothing survives config changes, session writes, or process lifetime.
Why it matters: sessions.list is not resolving one model. It is resolving model identity across hundreds to thousands of session rows, and many rows share the same effective defaults. Without memoization, the same config/default resolution path gets recomputed per row. That is the bad scaling behavior.
The maintainer-preferred version, "fast without cache," is ideal if the resolver itself can become O(1) and allocation-light. But for this PR, the safer narrow fix is to avoid changing resolver semantics and just avoid repeating identical resolver work inside one response.
Suggested phrasing:
This is not a long-lived cache. It is request-local memoization inside one sessions.list response build. The intent is to preserve the existing resolver semantics, including aliases and wrapper-provider inference, while avoiding repeated identical model identity resolution across rows that share the same effective runtime/default/override inputs. Nothing survives the request, so there is no invalidation surface.
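A minimal sketch of what "request-local memoization" means in practice, under the assumption that identity depends only on a few string inputs. The names (`MemoInputs`, `MemoIdentity`, `createRequestLocalResolver`) are hypothetical; the real PR keys on more inputs (runtime model, fallback ref, override ref, agent/default model ref) and wraps the existing resolver rather than this toy one.

```typescript
interface MemoIdentity {
  provider: string;
  model: string;
}

interface MemoInputs {
  agentId: string;
  providerOverride?: string;
  modelOverride?: string;
}

// Call this once per sessions.list response. The Map lives only as long as
// the returned closure, so nothing carries over to the next request.
function createRequestLocalResolver(
  resolve: (inputs: MemoInputs) => MemoIdentity,
): (inputs: MemoInputs) => MemoIdentity {
  const memo = new Map<string, MemoIdentity>();
  return (inputs) => {
    // Key on every input that can change the result.
    const key = [
      inputs.agentId,
      inputs.providerOverride ?? "",
      inputs.modelOverride ?? "",
    ].join("\0");
    const hit = memo.get(key);
    if (hit !== undefined) return hit;
    const value = resolve(inputs);
    memo.set(key, value);
    return value;
  };
}
```

Because a fresh resolver is created for each response, there is no TTL to tune and no invalidation to get wrong; rows that share the same effective inputs simply reuse the first result within that one response.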
it wanted to cache this and cache that, time based, dont worry it'll be good enough but i kept saying no and to drill in on the specific blocking part of the hot path
the benchmarks showed the results improved as the problem was more specifically addressed
Good catch @subtle gull, pushing a fix
Nice!!! Going to land this in a bit
Great work, thank you @subtle gull and @tropic fox
props to @subtle gull for this bit. My fixes didn't fix his install and I inadvertently swerved it by archiving my sessions (^^;)
I'm so proud of you boys ❤️
PR Landed 🫡
my claw looking at that 20s resolveProviderHookPlugin
thinking 'i kicked this problem's ass twice, now i can go for some more'
Different function, same future outage jacket
oh hes got its number
it looks like this isnt in 2026.5.7, is it showing up soon or is the underlying process getting reworked from the findings? I'm keeping track so I can reapply the fixes each time I update
It isn't? It should've been in the next release after I landed it
maybe my clanker is missing something, it keeps looking for the patch signals and says its not landed yet
• But v2026.5.7 does not contain the PR code:
• src/gateway/session-utils.ts in tag v2026.5.7 has no modelIdentityByEntryKey
• no allowPluginNormalization
• no lightweight-row fast-path markers from the PR
• GitHub compare shows the release tag diverged from the PR merge commit, not ahead of it.
Operational read: 2026.5.7 was likely cut from a maintenance/release branch that did not include our merged sessions.list PR. The fix is merged to main, but not shipped in this release.
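The "patch signal" check described above boils down to searching a file's text for identifiers the PR introduced. A small sketch, where `findMissingMarkers` is a hypothetical helper and the two marker names are the ones quoted in this thread:

```typescript
// Returns the markers that do NOT appear in the given source text.
function findMissingMarkers(source: string, markers: readonly string[]): string[] {
  return markers.filter((marker) => !source.includes(marker));
}

// Marker names quoted in the thread as signals that the PR code is present.
const PR_MARKERS: readonly string[] = [
  "modelIdentityByEntryKey",
  "allowPluginNormalization",
];
```

The source text could come from something like `git show v2026.5.7:src/gateway/session-utils.ts`; an empty result would suggest the markers are present in that tag. Note that a marker search only tells you whether the identifiers exist, not whether the surrounding logic is unchanged.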
soon as i cut a commit on another project i'll check to see if im safe to update to 2026.5.7 and then update it and not patch, my symptoms are immediate and severe so it will be an easy test
I had/have a PR that stayed in unreleased too, though in that case I was asked to review if it was needed at all or needed to change shape after this release
Confirmed: openclaw-prod is on 2026.5.7, the package is missing the fast-path markers, and the symptom is back. I have a rebased 2026.5.7 build ready; applying it with a package backup now.
02:02:36+00:00 warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay interval=32s eventLoopDelayP99Ms=650.6 eventLoopDelayMaxMs=5838.5 eventLoopUtilization=0.687 cpuCoreRatio=0.711 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:4ms,sidecars.main-session-recovery:4ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:581ms,sidecars.session-locks:835ms,post-ready.maintenance:138ms work=[active=agent:forge:webchat:forge-main(processing,q=1,age=5s last=run:completed) queued=agent:forge:webchat:forge-main(processing,q=1,age=5s last=run:completed)]
02:02:37+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 591ms conn=da87abf7…1a00 id=e210e490…5c3f
02:02:45+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6502ms conn=da87abf7…1a00 id=a4c41974…01b8
02:02:45+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 6513ms conn=8e61cb8e…97ca id=cb88e5c3…3c9b
02:02:57+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 595ms conn=da87abf7…1a00 id=f580d9d1…543e
02:03:07+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6564ms conn=da87abf7…1a00 id=7e299566…abaf
02:03:17+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 582ms conn=da87abf7…1a00 id=77f8feeb…e7e0
02:03:31+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 9194ms conn=da87abf7…1a00 id=fffc9483…1a18
02:03:32+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 585ms conn=da87abf7…1a00 id=bacdf968…4389
02:03:52+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6197ms conn=da87abf7…1a00 id=6ed2e67d…b38c
02:03:54+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 617ms conn=da87abf7…1a00 id=bbab0b17…c024
02:04:14+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6417ms conn=da87abf7…1a00 id=d5e3b7c8…2c7f
02:04:14+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 7011ms conn=da87abf7…1a00 id=5b39928f…b341
02:04:36+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 6477ms conn=da87abf7…1a00 id=2323de92…ee2f
02:04:36+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ chat.history 7114ms conn=da87abf7…1a00 id=989a9b11…f76
yeah it doesnt look like its in
Can you please ask it to check specifically if this got merged
after it repatched
02
42+00:00 warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay interval=34s eventLoopDelayP99Ms=302 eventLoopDelayMaxMs=4810.9 eventLoopUtilization=0.472 cpuCoreRatio=0.483 active=1 waiting=0 queued=1 recentPhases=sidecars.subagent-recovery:3ms,sidecars.main-session-recovery:4ms,post-attach.update-sentinel:0ms,sidecars.model-prewarm:643ms,post-ready.maintenance:178ms,sidecars.session-locks:1217ms work=[active=agent:forge:webchat:forge-main(processing,q=1,age=8s last=run:completed) queued=agent:forge:webchat:forge-main(processing,q=1,age=8s last=run:completed)]
0244+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 842ms conn=b767019f…acba id=0d9a0228…fbfc
0257+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1368ms conn=2edb97db…4bfe id=19fcfcb5…511d
0213+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=58352e11…9490
0230+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=fcacf92b…826a
0240+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=0202e0b5…1bab
0248+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1261ms conn=2edb97db…4bfe id=1f0f9d5d…3bcd
0250+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ logs.tail 834ms conn=7f63f694…8e09 id=2ed91323…b5b2
0206+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1384ms conn=2edb97db…4bfe id=1ae0ae1e…edd8
0223+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1369ms conn=2edb97db…4bfe id=901a65db…15c9
0239+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1481ms conn=2edb97db…4bfe id=9053e7fa…b90a
higher than what it was when we made the PR
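Before/after comparisons like this are easier to make with numbers than by eyeballing pasted logs. Below is a rough sketch that pulls per-method latencies out of gateway/ws response lines shaped like the ones pasted here. The regex is an assumption based only on those lines (`res`, a status marker, the method name, then `<N>ms`), and `summarizeLatencies` is a hypothetical helper.

```typescript
interface LatencyStats {
  count: number;
  minMs: number;
  medianMs: number;
  maxMs: number;
}

// Matches "... res <status-marker> <method> <N>ms ..." in a gateway/ws line.
const RES_LINE = /\bres\s+\S+\s+([\w.]+)\s+(\d+)ms\b/;

function summarizeLatencies(logText: string): Map<string, LatencyStats> {
  // Collect raw samples per method; non-matching lines (e.g. warnings) are skipped.
  const samples = new Map<string, number[]>();
  for (const line of logText.split("\n")) {
    const match = RES_LINE.exec(line);
    if (match === null) continue;
    const method = match[1];
    const msText = match[2];
    if (method === undefined || msText === undefined) continue;
    const bucket = samples.get(method) ?? [];
    bucket.push(Number(msText));
    samples.set(method, bucket);
  }

  const stats = new Map<string, LatencyStats>();
  samples.forEach((values, method) => {
    const sorted = values.slice().sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    // Median: middle element for odd counts, mean of the two middle ones otherwise.
    const medianMs =
      sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
    stats.set(method, {
      count: sorted.length,
      minMs: sorted[0]!,
      medianMs,
      maxMs: sorted[sorted.length - 1]!,
    });
  });
  return stats;
}
```

Running it over one capture from before a patch and one from after gives a per-method min/median/max that can be compared directly.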
i asked specifically for 77902 tracing
For #77902: yes, GitHub reports it merged.
• PR: openclaw/openclaw#77902
• State: closed
• merged=True
• Merged at: 2026-05-06T07:33:55Z
• Merge commit: 3e0fcafb8720ba3c1bb0fb430b88013eff5c6d08
• Title: fix(gateway): fast-path qualified session-list model refs
Important caveat: it was merged, but it was not present in shipped v2026.5.7, which is why the prod hotpatch was still needed.
3. Code comparison
In v2026.5.7, SessionListRowContext only has the thinking-level cache:
type SessionListRowContext = {
subagentRuns: ReturnType;
storeChildSessionsByKey: Map;
thinkingLevelsByModelRef: Map>;
};
In origin/main, it has the missing selected-model cache:
type SessionListRowContext = {
subagentRuns: ReturnType;
storeChildSessionsByKey: Map;
selectedModelByOverrideRef: Map>;
thinkingLevelsByModelRef: Map>;
};
4. Row context initialization
v2026.5.7:
return {
subagentRuns,
storeChildSessionsByKey: buildStoreChildSessionIndex(params.store, params.now, subagentRuns),
thinkingLevelsByModelRef: new Map(),
};
origin/main:
return {
subagentRuns,
storeChildSessionsByKey: buildStoreChildSessionIndex(params.store, params.now, subagentRuns),
selectedModelByOverrideRef: new Map(),
thinkingLevelsByModelRef: new Map(),
};
5. Missing helper function in shipped tag
origin/main contains:
function resolveSessionSelectedModelRef(params: {
cfg: OpenClawConfig;
entry?: SessionEntry;
agentId: string;
rowContext?: SessionListRowContext;
}): ReturnType | null {
const override = normalizeStoredOverrideModel({
providerOverride: params.entry?.providerOverride,
modelOverride: params.entry?.modelOverride,
});
if (!override.modelOverride) {
return null;
}
if (!params.rowContext) {
return resolveSessionModelRef(params.cfg, params.entry, params.agentId);
}
const key = [
normalizeAgentId(params.agentId),
override.providerOverride ?? "",
override.modelOverride,
].join("\0");
const cached = params.rowContext.selectedModelByOverrideRef.get(key);
if (cached) {
return cached;
}
const selected = resolveSessionModelRef(params.cfg, params.entry, params.agentId);
params.rowContext.selectedModelByOverrideRef.set(key, selected);
return selected;
}
v2026.5.7 does not have this helper. It jumps from createSessionRowModelCacheKey() directly to resolveSessionRowThinkingLevels().
6. Call-site difference
v2026.5.7 still directly resolves per row:
const resolved = resolveSessionModelRef(cfg, entry, agentId);
return { provider: resolved.provider, model: resolved.model };
origin/main uses the cached helper in row construction:
const selectedModel = resolveSessionSelectedModelRef({
cfg,
entry,
agentId: sessionAgentId,
rowContext,
});
Conclusion: #77902 is merged to GitHub main, but v2026.5.7 was cut without that code. The shipped tag lacks the actual cache field, initializer, helper, and row call-site change.
ok that was a bit big, sorry
its also looking at the extra latency vs 2026.5.5 which is what i was running prior to todays patch. I think our steady state was around 400ms and peaks to 1200ms, not steady at 1200ms and peaks to 4800ms
its in main, 5.6 and 5.7 just got cut more narrowly from main
Ahhh got it
and my clanker thinks the extra latency is from background maintenance tasks related to trying to help recover failed cron jobs
so not a sessions.list issue, different topic for a different PR
Hopefully going out soon, some major refactors on main right now so releases are being cherry picked for a bit