#Spawned sub agents always fail on selected model

1 messages · Page 1 of 1 (latest)

winter shoreBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

summer gladeBOT
#

What you’re seeing is usually OpenClaw doing exactly what it’s designed to do: treat a provider timeout as a failover-worthy failure, put the current auth profile into a cooldown/disabled state, then walk fallbacks.

Key bits of how it works (docs):

Settings / states that commonly “trip” failover when you think it shouldn’t

  1. The Anthropic profile is in cooldown/disabled (even if your account has quota)
  • OpenClaw persists cooldown/disabled state in the agent’s auth-profiles.json under usageStats (cooldownUntil / disabledUntil). A single bad streak can cause “why is it skipping my main provider?” confusion.
  • Inspect it with:
    openclaw models status --json
    
    and look at auth.unusableProfiles. (This is explicitly called out in the FAQ + provider docs.)
#
  1. Your “quota limit” errors are actually on the fallback providers
    Even if Anthropic is fine, if your agents.defaults.model.fallbacks includes (say) OpenRouter/Gemini/etc and those aren’t authenticated / funded, you’ll see failures that look like “quota” (or “no API key”) when the fallback kicks in.

If you want “Claude only, no wandering”, set fallbacks to empty / remove non-Claude entries.

  1. Your sub-agent model override is being rejected
    From the sub-agent docs: if sessions_spawn.model is an invalid value, it’s skipped and the sub-agent runs on the default model (with a warning in the tool result). This can make it look like “I pinned Claude but it still rotated”.

Make sure the model you pass is:

  • an exact provider/model string (e.g. anthropic/claude-opus-4-5), and
  • present in agents.defaults.models (that catalog acts like an allowlist for selection).
  1. You’re debugging the wrong step: sub-agent run vs announce step
    sessions_spawn is “run sub-agent” then “announce back”. Either of those can hit a timeout/failover. If the announce step fails, it can look like “the sub-agent failed” even if the work basically completed.

Use:

  • /subagents list
#
  • /subagents log <id> [limit] [tools]

Timeouts: what’s actually configurable (and the defaults)

These are the ones that most often matter for this symptom:

#

Other common tool timeouts (usually not the cause of “provider timeout”, but good to know):

  • tools.web.search.timeoutSeconds default 30
  • tools.web.fetch.timeoutSeconds default 30
  • tools.exec.timeoutSec default 1800
  • media understanding timeouts default 60/60/120 (image/audio/video)

(Those defaults are documented in gateway configuration: https://docs.openclaw.ai/gateway/configuration)

“Recommended” values if you’re using Claude

  • Start by not changing anything unless you’ve confirmed you’re hitting the agent runtime abort.
  • If you do long multi-step runs (tools + analysis), bump:
    • agents.defaults.timeoutSeconds: 600 → 1200 (or 1800 if you often do long tool chains)
  • Keep web/tool timeouts near defaults unless you know they’re the bottleneck.

OpenClaw does not document a per-provider HTTP request timeout knob for Anthropic in the main config; the “provider timeout” you see is typically coming from the underlying request/stream timing out or being aborted by the run timeout.

The 3 things I’d like you to paste (so this becomes obvious fast)