#Subagent spawn always times out — sessions.patch takes 85-90s


cyan torrent
#

Every call to sessions_spawn returns gateway timeout after 10000ms. Looking at the gateway log I can see sessions.patch completing successfully but taking 85,000ms+ — well past the 10s client timeout.

• Gateway health: ✅ {"ok":true,"status":"live"}
• sessions.json size: 544KB (3 sessions, cleaned up)
• lightContext: true makes no difference
• Version: OpenClaw 2026.4.21, macOS
• WS log: ⇄ res ✓ sessions.patch 89684ms

Subagents never actually run — the spawn times out before the session initializes. Is there a config to raise the WS timeout, or is this a known issue?

minor graniteBOT
#

I’m checking the actual timeout path in source now, because 89s for sessions.patch definitely smells wrong.

#

Found the important bit: the spawn path is explicitly forcing a 10s gateway RPC timeout in code, which means this is not something lightContext can fix. I’m checking whether that timeout is user-configurable or just hard-coded.

#

Short answer so far: I’m not seeing a config knob for this. The current spawn implementation is hard-coding a 10s timeout around sessions.patch, which lines up exactly with what you’re seeing.

#

I checked the docs and source. This looks like a real bug/pathology, not a normal tuning issue.

  • sessions_spawn is supposed to be non-blocking and return immediately: https://docs.openclaw.ai/tools/subagents
  • The gateway client default per-RPC timeout is 30s, but the spawn path explicitly overrides that to 10s
  • In current code, sessions_spawn hits sessions.patch with timeoutMs: 10000 in the subagent/ACP spawn path
  • So no, there is not a normal config knob to raise this specific timeout
  • Also, runTimeoutSeconds / timeoutSeconds on sessions_spawn is the child run timeout, not this RPC timeout

So your symptoms line up exactly with the code.
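
To make the failure mode concrete, here is a minimal Python sketch (not OpenClaw code) of a client that stops waiting after a fixed timeout while the slow operation still runs to completion on the other side. `slow_patch` and `call_with_timeout` are hypothetical names, and the durations are scaled down from roughly 85s and 10s to fractions of a second.

```python
import asyncio

async def slow_patch(log: list[str], duration: float) -> str:
    """Stand-in for a slow sessions.patch; records when it actually finishes."""
    await asyncio.sleep(duration)
    log.append("patch finished")
    return "ok"

async def call_with_timeout(log: list[str], duration: float, timeout: float) -> str:
    """Client-side view: wait up to `timeout`, then give up without cancelling the work."""
    task = asyncio.ensure_future(slow_patch(log, duration))
    try:
        # shield() keeps the underlying work alive when the client-side wait times out.
        return await asyncio.wait_for(asyncio.shield(task), timeout)
    except asyncio.TimeoutError:
        log.append("client timed out")
        await task  # only here so the demo can observe the late completion
        return "timeout"

def run_demo(duration: float = 0.85, timeout: float = 0.10) -> tuple[str, list[str]]:
    log: list[str] = []
    status = asyncio.run(call_with_timeout(log, duration, timeout))
    return status, log

if __name__ == "__main__":
    print(run_demo())
```

This mirrors the reported log: the patch is marked successful on the gateway side, yet the caller has already seen a timeout.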

The weird part is the 85 to 90s sessions.patch. That should be fast. A sessions.json of only 544 KB does not rule out a slow patch, because sessions.patch goes through session-store update + maintenance logic, which can also touch transcript/archive cleanup.

My read:

  1. no supported config for this timeout path,
  2. lightContext would not help,
  3. this is likely a bug, or a session-store/filesystem stall.
#

Useful next checks:

openclaw status --all
openclaw doctor
du -sh ~/.openclaw/agents/*/sessions 2>/dev/null
find ~/.openclaw/agents -path '*/sessions/*.jsonl' | wc -l
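
The commands above report totals and counts. In case a single large transcript matters more than the number of files, this small Python sketch lists the biggest `*.jsonl` files individually. It only assumes the `~/.openclaw/agents` path used above, and it matches every `*.jsonl` under that directory, which is slightly broader than the `find` pattern. `largest_transcripts` is a hypothetical helper name.

```python
from pathlib import Path

def largest_transcripts(root: Path, top: int = 5) -> list[tuple[int, Path]]:
    """Return up to `top` (size_bytes, path) pairs for *.jsonl files under root, biggest first."""
    sizes = [(p.stat().st_size, p) for p in root.rglob("*.jsonl") if p.is_file()]
    sizes.sort(key=lambda item: item[0], reverse=True)
    return sizes[:top]

if __name__ == "__main__":
    agents_dir = Path.home() / ".openclaw" / "agents"  # path taken from the commands above
    for size, path in largest_transcripts(agents_dir):
        print(f"{size / 1_000_000:8.2f} MB  {path}")
```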

If you want a quick A/B test, temporarily switch session maintenance to warn-only and retry. If spawn suddenly gets fast, the slowdown is probably in maintenance/cleanup:

session: {
  maintenance: {
    mode: "warn"
  }
}

cyan torrent
#

Thanks — that's really helpful. So the spawn path is blocking on sessions.patch before returning, even though the docs say it should be non-blocking?

To confirm: the subagent session IS being created (I can see it in sessions.json with 0 messages), but the sessions_spawn call itself times out at 10s before returning the accepted status. So it seems like the RPC is waiting for patch completion instead of returning immediately.

Is there a workaround for now? sessions_spawn with status="accepted" returned immediately would be enough — I just need the childSessionKey so I can poll/yield for results.

minor graniteBOT
#

Yeah, that’s my read too.

Right now the implementation is effectively:

  1. generate childSessionKey
  2. call sessions.patch
  3. do more setup
  4. start the child run
  5. only then return { status: "accepted", childSessionKey, ... }

So in your case, the docs say “non-blocking”, but the current code still has a synchronous setup phase in front of the accepted response. If sessions.patch takes 85s, the caller never gets the accepted payload before the hard-coded 10s timeout. That looks like a bug / docs mismatch.
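
As a rough model of that ordering (a Python sketch, not the actual TypeScript implementation), the function below walks the five steps and applies a timeout only to the patch step. `SpawnOutcome` and `simulate_blocking_spawn` are hypothetical names; the 85s and 10s figures come from this thread, and the 0.5s case is an illustrative assumption.

```python
from dataclasses import dataclass, field

@dataclass
class SpawnOutcome:
    status: str                                   # "accepted" or "timeout"
    completed: list[str] = field(default_factory=list)

def simulate_blocking_spawn(patch_seconds: float, rpc_timeout_seconds: float = 10.0) -> SpawnOutcome:
    """Walk the steps in order; the caller only sees 'accepted' after the last one."""
    completed = ["generate childSessionKey"]
    # Only step 2 (sessions.patch) is modelled with a client-side timeout.
    if patch_seconds > rpc_timeout_seconds:
        return SpawnOutcome("timeout", completed)
    completed += ["sessions.patch", "more setup", "start child run", "return accepted"]
    return SpawnOutcome("accepted", completed)

print(simulate_blocking_spawn(85.0).status)  # timeout
print(simulate_blocking_spawn(0.5).status)   # accepted
```

In the timeout case the caller never reaches the step that would hand back childSessionKey, even though the key was minted in step 1.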

A couple of practical notes:

  • there is no clean config workaround for “return accepted immediately”
  • runTimeoutSeconds is not the knob for this
  • /subagents spawn is the same path, so it likely won’t help either
#

Best current workarounds I can see:

  1. Make sessions.patch fast enough

    • temporarily set session maintenance to warn-only:
    session: {
      maintenance: { mode: "warn" }
    }
    

    Docs: https://docs.openclaw.ai/gateway/configuration-reference

  2. Recover the child session manually after timeout

    • since the provisional child session is already showing up in sessions.json, grab the newest agent:<id>:subagent:<uuid> from sessions.list
    • then try sessions_send to that session manually

    Caveat: this is a hack. It may bypass normal subagent tracking / auto-announce behavior.

#
  3. If you’re running from source, patch the timeout locally in:
    • src/agents/subagent-spawn.ts
    • and probably src/agents/acp-spawn.ts too

The real fix should probably be: return { status: "accepted", childSessionKey } immediately after minting the provisional child key, then do patch/startup in background.
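
A minimal Python sketch of that shape, assuming nothing about the real TypeScript code: mint the key, schedule the slow setup as a background task, and reply right away. `spawn_non_blocking` and `patch_and_start` are hypothetical names, and the sleep stands in for sessions.patch.

```python
import asyncio
import uuid

async def patch_and_start(child_key: str, patch_seconds: float, events: list[str]) -> None:
    """Hypothetical background setup: slow patch first, then start the child run."""
    await asyncio.sleep(patch_seconds)  # stands in for sessions.patch
    events.append(f"patched {child_key}")
    events.append(f"started {child_key}")

async def spawn_non_blocking(agent_id: str, patch_seconds: float, events: list[str]):
    """Mint the child key, schedule setup in the background, reply immediately."""
    child_key = f"agent:{agent_id}:subagent:{uuid.uuid4()}"
    background = asyncio.create_task(patch_and_start(child_key, patch_seconds, events))
    events.append("accepted returned")
    return {"status": "accepted", "childSessionKey": child_key}, background

async def demo(patch_seconds: float = 0.2):
    events: list[str] = []
    reply, background = await spawn_non_blocking("main", patch_seconds, events)
    await background  # a real caller would poll instead; awaited here to show ordering
    return reply, events

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One design consequence worth noting: once "accepted" is returned early, a failure in the background patch has to be reported some other way, for example through the child session's status.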

I’d definitely file this as a bug, because “non-blocking” is not what you’re getting here: https://github.com/openclaw/openclaw/issues

If you want, paste the output of sessions.list right after one of these timeouts and I can help figure out the safest manual sessions_send fallback.
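
For the manual recovery route, picking the newest subagent key could look like the Python sketch below. The entry shape is an assumption: the `key` and `updatedAt` field names are hypothetical and may not match what sessions.list actually returns, so adjust them to the real output. Only the `agent:<id>:subagent:<uuid>` key pattern comes from this thread.

```python
import re
from typing import Optional

# Key pattern from the thread: agent:<id>:subagent:<uuid>
SUBAGENT_KEY = re.compile(r"^agent:[^:]+:subagent:[0-9a-fA-F-]{36}$")

def newest_subagent_key(sessions: list[dict]) -> Optional[str]:
    """Return the most recently updated subagent session key, or None if there is none.

    Assumes each entry has a 'key' string and a numeric 'updatedAt' (hypothetical fields).
    """
    candidates = [s for s in sessions if SUBAGENT_KEY.match(s.get("key", ""))]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.get("updatedAt", 0))["key"]

if __name__ == "__main__":
    sample = [
        {"key": "agent:main:main", "updatedAt": 300},
        {"key": "agent:main:subagent:11111111-1111-4111-8111-111111111111", "updatedAt": 100},
        {"key": "agent:main:subagent:22222222-2222-4222-8222-222222222222", "updatedAt": 200},
    ]
    print(newest_subagent_key(sample))
```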

cyan torrent
#

Quick update — the session JSONL is the issue, not the file count. My main session file is 5.2MB (one long conversation today with lots of tool calls). sessions.patch takes ~40s on a 5MB file, ~4s on a small one. The 10s hard timeout fires either way.

The session doesn't reset between Telegram conversations — it's one continuous JSONL for the entire agent lifetime. So it only grows.

Two things would fix this:

  1. Raise the spawn timeout from 10s to 30s+ (would work for normal sessions)
  2. OR make spawn truly non-blocking — return the childSessionKey immediately, let patch complete async

Is there a /reset or /compact command that would shrink the active session JSONL without losing the current conversation? Or a config to trigger more aggressive auto-compaction?

Running 2026.4.21 on macOS.

minor graniteBOT
#

Yep, that’s a really useful find.

Big takeaway: for this specific issue, /compact is not the fix.

  • /compact reduces what the model sees, but the full conversation stays on disk, so it does not shrink the active *.jsonl file: https://docs.openclaw.ai/concepts/compaction
  • /reset soft also won’t help, because it keeps the current transcript
  • /new or /reset will help, because they create a new sessionId / new JSONL for that chat key: https://docs.openclaw.ai/concepts/session
  • session.maintenance / sessions cleanup is about sessions.json + old artifacts, not rewriting the active transcript smaller

So if the problem is “active transcript got too big and spawn blocks on it”, the practical workaround today is:

  1. save any important state somewhere durable
  2. run /reset
  3. enable idle resets so Telegram chats don’t grow forever

I’d use something like:

#
{
  session: {
    resetByType: {
      direct: { mode: "idle", idleMinutes: 60 }
    }
  }
}

Docs: https://docs.openclaw.ai/gateway/configuration-reference#session

That should cut a fresh transcript after inactivity, which is the real workaround for “one giant JSONL per long-lived Telegram chat”.
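
Conceptually, an idle reset boils down to a check like the Python sketch below. This illustrates the idea only; how OpenClaw evaluates `idleMinutes` internally (including whether the boundary is inclusive) is an assumption here, and `should_start_new_session` is a hypothetical name.

```python
from datetime import datetime, timedelta

def should_start_new_session(last_activity: datetime, now: datetime, idle_minutes: int = 60) -> bool:
    """True when the chat has been idle for at least `idle_minutes` (inclusive boundary assumed)."""
    return now - last_activity >= timedelta(minutes=idle_minutes)

if __name__ == "__main__":
    last = datetime(2026, 4, 21, 9, 0)
    print(should_start_new_session(last, datetime(2026, 4, 21, 9, 30)))  # False
    print(should_start_new_session(last, datetime(2026, 4, 21, 10, 5)))  # True
```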

So, short version:

  • Can OpenClaw shrink the active JSONL without cutting a new session? I don’t see a built-in command for that.