# Model failover doesn't trigger when provider hangs without returning an error
Based on the docs and my research, here's the situation:
How Failover Works Currently
Clawdbot's failover mechanism handles:
- Auth profile rotation within a provider (on auth/rate-limit errors)
- Model fallback to the next model in agents.defaults.model.fallbacks
- Cooldowns for "a timeout that looks like rate limiting" (exponential backoff: 1m → 5m → 25m → 1h cap)
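To make the cooldown schedule concrete, here is a minimal sketch that reproduces the 1m → 5m → 25m → 1h sequence. The function name, the 5x multiplier, and the per-failure counting are assumptions inferred from the numbers quoted above, not Clawdbot's actual implementation.

```python
def cooldown_minutes(failure_count: int, base: int = 1, factor: int = 5, cap: int = 60) -> int:
    """Cooldown in minutes after the nth consecutive failure (1-based).

    Hypothetical helper: the factor of 5 is inferred from the documented
    1m -> 5m -> 25m -> 1h progression, not taken from Clawdbot's source.
    """
    if failure_count < 1:
        raise ValueError("failure_count must be >= 1")
    # Grow geometrically, then clamp at the cap (1 hour).
    return min(cap, base * factor ** (failure_count - 1))


# First five consecutive failures; the fourth would be 125m but is capped at 60m.
schedule = [cooldown_minutes(n) for n in range(1, 6)]
```

Under these assumptions the schedule levels off at the cap from the fourth failure onward.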
The Problem With "Hangs"
The issue you're describing, a provider that hangs without returning any error, touches on a subtle distinction:
- HTTP timeout errors (server responds but slowly, or returns a timeout status) → trigger cooldown/failover ✅
- TCP-level hangs (connection opens but nothing comes back, or DNS/resolution issues) → may not trigger proper failover ❌
The docs say "a timeout that looks like rate limiting" triggers cooldown, but a true "hang" (where the provider never responds at all) might not be properly detected.
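The distinction matters because a hang produces no error for the failover logic to react to unless the caller enforces its own deadline. The sketch below illustrates that general idea in Python with asyncio. It is not Clawdbot's code: `call_with_failover`, `fake_provider`, and the deadline value are all hypothetical names chosen for illustration.

```python
import asyncio


async def call_with_failover(models, call_model, deadline_s):
    """Try each model in order; a hang past deadline_s counts as a failure."""
    errors = []
    for model in models:
        try:
            # wait_for cancels the pending call and raises a timeout error
            # once the deadline passes, turning a silent hang into an error.
            reply = await asyncio.wait_for(call_model(model), timeout=deadline_s)
            return model, reply
        except (asyncio.TimeoutError, ConnectionError) as exc:
            errors.append((model, type(exc).__name__))
    raise RuntimeError(f"all models failed: {errors}")


async def fake_provider(model):
    """Hypothetical stand-in for a provider: model-a hangs, others answer."""
    if model == "provider/model-a":
        await asyncio.sleep(3600)  # simulates a connection that never responds
    return f"reply from {model}"
```

Calling `asyncio.run(call_with_failover(["provider/model-a", "provider/model-b"], fake_provider, 0.05))` abandons the hanging model after the deadline and returns the reply from the second one. Without the deadline, the first call would simply block and the fallback would never be tried.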
What To Check
- Check your logs for timeout-related entries:
  clawdbot logs --follow | grep -i "timeout\|hang\|fetch"
- Verify model fallbacks are configured:
  {
    "agents": {
      "defaults": {
        "model": {
          "primary": "provider/model-a",
          "fallbacks": ["provider/model-b", "provider/model-c"]
        }
      }
    }
  }
- Check for provider-specific timeouts: some providers may need explicit timeout configuration
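If you want to check the fallback configuration programmatically, a small script along these lines may help. It is only a sketch: it assumes the config is plain JSON shaped like the snippet above, and `fallback_problems` is a hypothetical helper, not part of Clawdbot.

```python
import json


def fallback_problems(config_text: str) -> list[str]:
    """Return a list of problems found under agents.defaults.model."""
    config = json.loads(config_text)
    # Walk the nested keys, tolerating missing levels.
    model = config.get("agents", {}).get("defaults", {}).get("model", {})
    problems = []
    if not model.get("primary"):
        problems.append("no primary model set")
    if not model.get("fallbacks"):
        problems.append("fallbacks list is empty or missing")
    return problems


# Same shape as the example config above.
SAMPLE = """
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "provider/model-a",
        "fallbacks": ["provider/model-b", "provider/model-c"]
      }
    }
  }
}
"""
```

An empty list from `fallback_problems(SAMPLE)` means both a primary model and at least one fallback are present. It says nothing about whether failover will actually fire on a hang.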
Workarounds
- Add multiple models to your fallbacks list
- Use providers with more reliable uptime
- Monitor logs for hanging behavior
This might be a bug - if the provider genuinely hangs without any error response, it should still trigger failover. You might want to open a GitHub issue with:
- Gateway version (clawdbot --version)
- Relevant log snippets
- Which provider is hanging
Link to relevant docs: https://docs.clawd.bot/concepts/model-failover