Hi Krill — active-memory embedded runs ignoring AbortController after 4.15 upgrade
Upgraded from 4.14 → 4.15 today. After the upgrade, active-memory embedded runs are hanging well past timeoutMs before failing.
Config:
```json
"active-memory": {
  "enabled": true,
  "config": {
    "agents": ["main"],
    "queryMode": "message",
    "promptStyle": "contextual",
    "timeoutMs": 3000
  }
}
```
Log evidence:
```
active-memory: ...start timeoutMs=3000 queryChars=507
embedded run failover decision: reason=timeout from=openai-codex/gpt-5.4-mini
active-memory: ...done status=timeout elapsedMs=20623 summaryChars=0
```
`timeoutMs=3000` but `elapsedMs=20623`: the `AbortController` fires at 3000 ms, but the embedded run keeps going for ~20 s before it finally fails. The source (`index.ts` lines 1816–1856) suggests the abort should short-circuit immediately.
Model: openai-codex/gpt-5.4-mini (ChatGPT Plus OAuth)
Question: Is the 4.15 change moving recall from system prompt injection → prependContext (CHANGELOG line 196, #66144) related to this? The abort signal doesn't appear to be terminating the Codex embedded run cleanly. Is this a known issue with Codex-backed embedded runs post-4.15, and is there a workaround?
I dug through the source, and I don’t think #66144 (system prompt injection → prependContext) is the cause.
Why I’m pretty confident about that:
- `extensions/active-memory/index.ts:2003-2023` only returns `{ prependContext: promptPrefix }` after `maybeResolveActiveRecall()` finishes.
- Your timeout happens earlier, inside that recall run, at `extensions/active-memory/index.ts:1761-1828`.
What looks more likely is this:
- `active-memory` arms its own `AbortController` at `extensions/active-memory/index.ts:1764-1768`.
- `runRecallSubagent()` passes that `abortSignal` into `runEmbeddedPiAgent()` at `extensions/active-memory/index.ts:1664-1674`.
- In `src/agents/pi-embedded-runner/run/attempt.ts:1531-1539`, aborting does `void activeSession.abort()`, so it is fire-and-forget.
- In the Codex path, `extensions/codex/src/app-server/run-attempt.ts:195-226`, abort turns into a best-effort `turn/interrupt`, not a hard synchronous stop.
- After that, cleanup still runs, and `cleanupEmbeddedAttemptResources()` waits for idle before flushing. That path goes through:
  - `src/agents/pi-embedded-runner/run/attempt.subscription-cleanup.ts:39`
  - `src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts:10`
- The default idle wait is `30_000` ms.
So the important distinction is:
- `timeoutMs=3000` is when abort gets requested.
- It is not a guaranteed “return to the active-memory caller by 3000 ms”.
That makes your `elapsedMs=20623` sadly believable on a Codex-backed embedded run. The abort fires at 3 s, but the run can still spend ~17 s unwinding or waiting to become idle before `runRecallSubagent()` actually returns and active-memory logs `done status=timeout`.
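To make that distinction concrete, here is a simplified timeline model. It is an illustration, not OpenClaw code: it assumes all post-abort time is spent in the idle wait (capped at the `30_000` ms default mentioned above) and ignores recall setup time. Under that assumption, `elapsedMs=20623` with `timeoutMs=3000` implies roughly 17.6 s of post-abort settling.

```typescript
// Simplified timeline model (an assumption for illustration, not OpenClaw code):
// abort is requested at timeoutMs, then the caller waits for the runner to go
// idle, and that idle wait is capped at idleCapMs.
const DEFAULT_IDLE_CAP_MS = 30_000; // default idle wait cited above

function modeledElapsedMs(
  timeoutMs: number,
  settleAfterAbortMs: number,
  idleCapMs: number = DEFAULT_IDLE_CAP_MS,
): number {
  // The caller regains control only after the idle wait resolves or hits its cap.
  return timeoutMs + Math.min(settleAfterAbortMs, idleCapMs);
}

// Inverse: how long the run kept settling after the abort was requested.
function impliedSettleMs(timeoutMs: number, elapsedMs: number): number {
  return elapsedMs - timeoutMs;
}

console.log(impliedSettleMs(3000, 20623)); // 17623
console.log(modeledElapsedMs(3000, 17623)); // 20623
console.log(modeledElapsedMs(3000, 60000)); // 33000 (idle wait capped at 30 s)
```

The point of the model is only that `timeoutMs` sets when the abort starts, while the idle wait decides when the caller gets control back.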
Short answer
- Related to `prependContext`? Probably not.
- Known Codex-backed embedded-run cancellation limitation? Functionally, it looks like yes.
- Why 20 s? Abort is advisory/best-effort downstream, and cleanup waits for idle.
- Best workaround: make Active Memory use a non-Codex recall model via `config.model` or `config.modelFallback`, so recall avoids the Codex app-server path entirely.
Docs for that config are here:
Practical workarounds
- Best: set `plugins.entries.active-memory.config.model` to a non-`openai-codex/...` model, or at least set `config.modelFallback` so recall has a non-Codex escape hatch.
- If staying on Codex: raise `timeoutMs` above `3000`, because `3000` is likely to hit this cancellation path constantly.
- If latency matters more than recall: temporarily disable Active Memory.
If you can, paste the exact output of `openclaw --version`. If this also reproduces on the latest patch, I’d file it as a bug with these code pointers and your three log lines:
- `extensions/active-memory/index.ts`
- `src/agents/pi-embedded-runner/run/attempt.ts`
- `extensions/codex/src/app-server/run-attempt.ts`
If you want, I can help turn this into a tight GitHub issue repro.
Follow-up: active-memory timeout still ~15s after switching from Codex to OpenRouter
Tried your suggested workaround and switched the active-memory config to a non-Codex model:
`"model": "openrouter/arcee-ai/trinity-large-preview:free"`
Same result:
```
active-memory: ...start timeoutMs=3000
embedded run failover decision: reason=timeout from=openrouter/arcee-ai/trinity-large-preview:free
active-memory: ...done status=timeout elapsedMs=15562 summaryChars=0
```
So it's not Codex-specific. The abort fires at 3s but the embedded run still takes ~15s to return.
Based on your earlier analysis, the idle wait in `src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts` defaults to 30,000 ms, but we're seeing ~15 s, not 30 s. Two questions:
- Is there a config to reduce or skip the idle wait specifically for active-memory embedded runs?
- Is `elapsedMs` measuring from abort-requested to cleanup-complete, or something else? Trying to understand why we consistently see ~15 s instead of the full 30 s.
OpenClaw v2026.4.15, Docker on Windows 11.
Nice catch, that changes the diagnosis a lot.
This does not look Codex-specific anymore. With the OpenRouter repro, I’d treat it as a generic embedded-run timeout/teardown issue in runEmbeddedPiAgent, not a prependContext issue and not a Codex-only issue.
1) Is there a config to reduce/skip the idle wait for active-memory?
I don’t see one in v2026.4.15.
What I found:
- `flushPendingToolResultsAfterIdle()` does accept an internal `timeoutMs` param in `src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts`
- but I do not see any user-facing config in:
  - `extensions/active-memory/index.ts`
  - `docs/concepts/active-memory.md`
  - the embedded runner wiring in `src/agents/pi-embedded-runner/run/attempt.ts`
So in current source, Active Memory can set timeoutMs for the recall run itself, but not for the post-abort idle-wait cleanup.
2) What does elapsedMs actually measure?
It is total wall-clock from recall start until runRecallSubagent() finally returns/throws.
Specifically:
- `extensions/active-memory/index.ts:1720` does `const startedAt = Date.now()`
- then it awaits `runRecallSubagent(...)`
- then on success/timeout it logs `elapsedMs: Date.now() - startedAt`
So yes, elapsedMs includes everything:
- recall setup
- embedded run execution
- abort request
- downstream unwind/cleanup
- whatever wait happens before `runEmbeddedPiAgent()` finally gives control back
It is not “time until AbortController fired”.
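A minimal sketch of that measurement pattern, assuming a recall function that resolves on success and rejects with an `AbortError` on timeout. `measureRecall`, `fakeRecall`, and `sleep` are hypothetical names for illustration; only the `Date.now()` bookkeeping mirrors what `index.ts:1720` is described as doing.

```typescript
// Sketch only: mirrors the startedAt / elapsedMs bookkeeping described above.
// measureRecall, fakeRecall and sleep are hypothetical names, not OpenClaw APIs.
type RecallOutcome = { status: "ok" | "timeout"; elapsedMs: number };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function measureRecall(runRecall: () => Promise<void>): Promise<RecallOutcome> {
  const startedAt = Date.now();
  try {
    // Everything until this promise settles is counted: setup, the run itself,
    // the abort request, and any unwind / idle wait that follows it.
    await runRecall();
    return { status: "ok", elapsedMs: Date.now() - startedAt };
  } catch (err) {
    // In this sketch an AbortError is reported as a timeout; anything else is rethrown.
    if (err instanceof Error && err.name === "AbortError") {
      return { status: "timeout", elapsedMs: Date.now() - startedAt };
    }
    throw err;
  }
}

// Fake recall: "abort" at ~30 ms, then ~120 ms of simulated unwind before rejecting.
async function fakeRecall(): Promise<void> {
  await sleep(30); // point where the abort would be requested
  await sleep(120); // simulated unwind / idle wait
  const err = new Error("recall aborted");
  err.name = "AbortError";
  throw err;
}
```

Running `measureRecall(fakeRecall)` reports `status: "timeout"` with an `elapsedMs` of roughly 150, not 30, because the clock only stops when the promise finally settles.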
Why ~15s instead of the full 30s?
Because 30_000 in src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts:10 is just the upper bound.
flushPendingToolResultsAfterIdle() returns as soon as agent.waitForIdle() resolves. So seeing ~15.5s means the runner is becoming “idle enough” around then, before the 30s cap.
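A sketch of what a capped idle wait typically looks like, to show why the cap is only an upper bound. `waitForIdleWithCap` is a hypothetical stand-in for `flushPendingToolResultsAfterIdle()`; I have not reproduced its real signature.

```typescript
// Hypothetical capped idle wait: resolves "idle" as soon as waitForIdle() settles,
// or "capped" if capMs elapses first. Not the real OpenClaw function.
async function waitForIdleWithCap(
  waitForIdle: () => Promise<void>,
  capMs = 30_000,
): Promise<"idle" | "capped"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const cap = new Promise<"capped">((resolve) => {
    timer = setTimeout(() => resolve("capped"), capMs);
  });
  try {
    // Whichever settles first wins; the cap never forces a full 30 s wait.
    return await Promise.race([waitForIdle().then(() => "idle" as const), cap]);
  } finally {
    clearTimeout(timer); // stop the cap timer once the race is decided
  }
}
```

This is consistent with the ~15.5 s figure: nothing about a 30 s cap forces the wait to last 30 s if the agent reports idle earlier.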
So the shape is:
- abort requested at ~3s
- embedded run does not unwind immediately
- some deeper waiter settles around ~15s
- cleanup finishes
- Active Memory logs timeout
I do see other 15_000 defaults elsewhere in the runtime, for example src/auto-reply/reply/reply-run-registry.ts:403, but I have not proven that exact one is the waiter on this path. I’d call that a plausible clue, not a confirmed root cause.
Practical conclusion
At this point I’d describe it as:
Active Memory `timeoutMs` aborts the recall request, but the embedded runner can still spend ~15 s unwinding before control returns.
So the bug is probably in one of these areas:
- embedded-run cleanup waiting too long for silent/tool-only runs
- provider/request abort not settling the agent promptly
- `waitForIdle()` resolving on a later internal timeout boundary instead of immediately on abort
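Whichever of these turns out to be the culprit, one possible mitigation on the caller side (hypothetical, not current OpenClaw behavior) is a hard deadline: request the abort at `timeoutMs`, return immediately, and let the embedded run finish unwinding in the background. `recallWithHardDeadline` and its signature are invented for this sketch.

```typescript
// Hypothetical caller-side hard deadline: return at timeoutMs even if the run is
// still unwinding. The run keeps going in the background after we stop waiting.
async function recallWithHardDeadline<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<{ status: "ok"; value: T } | { status: "timeout" }> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<{ status: "timeout" }>((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // still request the abort downstream
      resolve({ status: "timeout" });
    }, timeoutMs);
  });
  const work = run(controller.signal);
  try {
    // Promise.race attaches handlers to both inputs, so a late rejection from
    // `work` after the deadline does not become an unhandled rejection.
    return await Promise.race([
      work.then((value) => ({ status: "ok" as const, value })),
      deadline,
    ]);
  } finally {
    clearTimeout(timer);
  }
}
```

The trade-off: this bounds how long Active Memory waits, not how long the abandoned run keeps using resources, since cleanup still runs in the background.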
Current workaround
Honestly, not a great one:
- assume the effective floor is around 15 s
- set `active-memory.config.timeoutMs` accordingly, or
- disable Active Memory if that latency is worse than the value it adds
That lines up with the docs too, which recommend starting around 15000 ms even for message mode:
Next step
This looks bug-report-worthy now. I’d file it with:
- exact version: `OpenClaw v2026.4.15`
- both repros:
  - `openai-codex/gpt-5.4-mini`
  - `openrouter/arcee-ai/trinity-large-preview:free`
- the two timing pairs:
  - `3000 -> 20623`
  - `3000 -> 15562`
- code pointers:
  - `extensions/active-memory/index.ts:1720`
  - `src/agents/pi-embedded-runner/run/attempt.ts`
  - `src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts`
GitHub issues:
If you want, I can help you write a tight issue body that points straight at the likely fix area.
Follow-up: webhooks plugin registering route multiple times per session
Seeing registered route ai-news appear 2-3 times in logs for a single incoming message:
```
[webhooks] registered route ai-news on /plugins/webhooks/ai-news for session agent:main:main
[webhooks] registered route ai-news on /plugins/webhooks/ai-news for session agent:main:main
[webhooks] registered route ai-news on /plugins/webhooks/ai-news for session agent:main:main
```
Source research shows register(api) is triggered via loadGatewayStartupPlugins() at boot — expected once. But we're seeing it fire multiple times mid-session, each time bound to agent:main:main via taskFlow.bindSession().
No config changes between messages. replaceExisting: true is hardcoded so functionally it's not breaking anything, but the repeated registrations are suspicious.
Context: We're also hitting the active-memory embedded-run timeout bug (reported earlier) on the same setup. Could the active-memory cleanup cycle be triggering session resets on agent:main:main, which re-fires the webhook plugin's register() each time?
Question: Is multiple taskFlow.bindSession() firing on session reset expected behavior for the webhooks plugin, or is this a separate bug?
OpenClaw v2026.4.15, Docker on Windows 11.
I don’t think the Active Memory timeout bug is re-firing the webhooks route through a session reset path.
What the code says:
- `extensions/webhooks/index.ts` logs `registered route ...` inside `register(api)`
- inside that same function it does:
  - `api.runtime.taskFlow.bindSession({ sessionKey: route.sessionKey })`
  - then `api.registerHttpRoute(...)`
- `src/plugins/runtime/runtime-taskflow.ts` shows `bindSession(...)` is just a thin binder that returns `createBoundTaskFlowRuntime(...)`
- that bound runtime is not a session-reset listener or lifecycle hook
So, if you see:
`[webhooks] registered route ai-news ...`
multiple times, that means the plugin register() path ran multiple times. It does not mean taskFlow.bindSession() got “fired by session reset”.
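A toy model of why repeated `register()` runs are harmless but noisy when `replaceExisting: true` is set. `ToyHttpRoutes`, `registerWebhookRoute`, and `RouteSpec` are hypothetical stand-ins, not the OpenClaw plugin API; `bindSession` is left out because, per the source above, it only returns a bound runtime.

```typescript
// Toy model of the register() path: one log line per call, one route per path.
// All names and types here are hypothetical, not the OpenClaw plugin API.
interface RouteSpec {
  path: string;
  sessionKey: string;
}

class ToyHttpRoutes {
  readonly routes = new Map<string, RouteSpec>();

  registerHttpRoute(route: RouteSpec, opts: { replaceExisting: boolean }): void {
    if (this.routes.has(route.path) && !opts.replaceExisting) {
      throw new Error(`route already registered: ${route.path}`);
    }
    this.routes.set(route.path, route); // with replaceExisting: silently overwrite
  }
}

function registerWebhookRoute(http: ToyHttpRoutes, name: string, route: RouteSpec, log: string[]): void {
  // Logged on every register() call, whether or not the route already existed.
  log.push(`[webhooks] registered route ${name} on ${route.path} for session ${route.sessionKey}`);
  http.registerHttpRoute(route, { replaceExisting: true });
}

// Usage: three register() runs give three identical log lines but one route.
const http = new ToyHttpRoutes();
const log: string[] = [];
const route = { path: "/plugins/webhooks/ai-news", sessionKey: "agent:main:main" };
for (let i = 0; i < 3; i++) registerWebhookRoute(http, "ai-news", route, log);
console.log(log.length, http.routes.size); // 3 1
```

So the log count tells you how often `register()` ran, not how many routes exist.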
Short answer
- Are multiple `taskFlow.bindSession()` calls on session reset expected? No.
- Is repeated route registration expected on plugin reload? Yes, that is why `replaceExisting: true` is there.
- Are 2 to 3 re-registrations per single incoming message normal? I would say no; that’s suspicious.
What is probably happening instead
This looks more like a plugin registry reload / re-registration issue than a session-reset issue.
A few things can legitimately cause register(api) to run again:
- gateway startup load
- deferred plugin reload paths
- config/secrets reload
- a runtime plugin-registry compatibility miss that causes `loadOpenClawPlugins()` to rebuild the registry
But that is different from “the session reset, so webhooks rebound”.
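To find which of these paths is re-running `register(api)`, one low-tech option is to wrap the plugin's `register` and capture a stack trace on every call. This is a hypothetical debugging sketch: `PluginLike` and `traceRegisterCalls` are invented, and the `{ register(api) }` shape is an assumption about how the plugin object looks.

```typescript
// Hypothetical diagnostic wrapper: record a stack trace for every register() call
// so the caller (startup load, reload path, registry rebuild) becomes visible.
interface PluginLike<Api> {
  register(api: Api): void;
}

function traceRegisterCalls<Api>(plugin: PluginLike<Api>, calls: string[]): PluginLike<Api> {
  return {
    register(api: Api): void {
      // Error.prototype.stack is non-standard but available in V8 / Node.
      calls.push(new Error(`register() call #${calls.length + 1}`).stack ?? "(no stack)");
      plugin.register(api);
    },
  };
}

// Usage: wrap once, then inspect `calls` after a single incoming message.
const calls: string[] = [];
let registered = 0;
const traced = traceRegisterCalls({ register: () => { registered++; } }, calls);
traced.register({});
traced.register({});
console.log(calls.length); // 2
```

If a single message produces 2 to 3 entries, comparing their stacks should show whether the caller is a reload path or a `loadOpenClawPlugins()` registry rebuild.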
On the possible link to Active Memory
I don’t see a direct code path from Active Memory cleanup to webhook route re-registration.
If there is a connection, it would be indirect, like:
- Active Memory embedded runs causing some plugin-registry reload/cache miss
- then `webhooks.register()` runs again
- then `replaceExisting: true` quietly replaces the same route
So I’d treat this as a separate bug class, even if both bugs happen in the same setup.