# Model failover doesn't trigger when a provider hangs without returning an error

frosty relic Jan 26, 2026, 1:55 PM

Hey, any help with this?

rocky crystal (BOT) Jan 26, 2026, 1:59 PM

Based on the docs and my research, here's the situation:

How Failover Works Currently

Clawdbot's failover mechanism handles:

Auth profile rotation within a provider (on auth/rate-limit errors)
Model fallback to the next model in agents.defaults.model.fallbacks
Cooldowns for "a timeout that looks like rate limiting" (exponential backoff: 1m → 5m → 25m → 1h cap)
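The cooldown steps listed above look like a multiply-by-5 backoff with a one-hour ceiling. Here is a minimal Python sketch of that schedule. The factor of 5 is inferred from the listed values (1m, 5m, 25m, then the 1h cap), and the constant names and `cooldown_seconds` are hypothetical, not taken from Clawdbot's source:

```python
# Hypothetical sketch of the cooldown schedule listed above
# (1m -> 5m -> 25m -> 1h cap). The factor of 5 is inferred from those
# numbers, not read from Clawdbot's source.
BASE_SECONDS = 60      # first cooldown: 1 minute
FACTOR = 5             # each consecutive failure multiplies the wait by 5
CAP_SECONDS = 60 * 60  # never wait longer than 1 hour


def cooldown_seconds(consecutive_failures: int) -> int:
    """Cooldown to apply after the given number of consecutive failures (>= 1)."""
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    return min(BASE_SECONDS * FACTOR ** (consecutive_failures - 1), CAP_SECONDS)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, cooldown_seconds(n) // 60, "min")
```

With these assumptions the fourth failure would compute 125 minutes, so the one-hour cap takes over from that point on.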

The Problem With "Hangs"

The issue you're describing (a provider that hangs without returning any error) touches on a subtle distinction:

HTTP timeout errors (server responds but slowly, or returns a timeout status) → trigger cooldown/failover ✅
TCP-level hangs (connection opens but nothing comes back, or DNS/resolution issues) → may not trigger proper failover ❌

The docs say "a timeout that looks like rate limiting" triggers cooldown, but a true "hang" (where the provider never responds at all) may never be detected: if no error is raised and no timeout fires, the failover logic has nothing to react to.
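To make the distinction concrete, here is a minimal Python sketch of a generic fallback loop. It is not Clawdbot's actual code; the provider functions (`healthy`, `erroring`, `hanging`) and `call_with_fallback` are hypothetical. A provider that raises is skipped immediately, but a provider that never answers is only skipped if each attempt has a deadline, which `asyncio.wait_for` supplies here:

```python
import asyncio

# Hypothetical sketch of a generic fallback loop; NOT Clawdbot's implementation.
# The loop only moves on when an attempt raises. A per-attempt deadline is what
# turns a silent hang into an exception it can react to.


async def healthy(prompt: str) -> str:
    return f"ok: {prompt}"


async def erroring(prompt: str) -> str:
    # Simulates a provider that fails fast with an error response.
    raise RuntimeError("429 rate limited")


async def hanging(prompt: str) -> str:
    # Simulates a provider that accepts the request but never answers:
    # this event is never set, so the await never completes.
    await asyncio.Event().wait()
    return "unreachable"


async def call_with_fallback(models, prompt, attempt_timeout=None):
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors = []
    for name, call in models:
        try:
            # timeout=None means "wait forever", so a hanging provider blocks here.
            reply = await asyncio.wait_for(call(prompt), timeout=attempt_timeout)
            return name, reply
        except Exception as exc:  # includes asyncio.TimeoutError from the deadline
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all models failed: {errors}")


async def main() -> None:
    chain = [
        ("provider/model-a", hanging),
        ("provider/model-b", erroring),
        ("provider/model-c", healthy),
    ]
    # With a 0.5 s deadline: model-a times out, model-b errors, model-c answers.
    print(await call_with_fallback(chain, "hi", attempt_timeout=0.5))


if __name__ == "__main__":
    asyncio.run(main())
```

Calling the same chain without `attempt_timeout` never returns, because `hanging` neither answers nor raises. That matches the "hang" case above: the failover logic never runs because nothing ever fails.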

What To Check

Check your logs for timeout-related entries:

```bash
clawdbot logs --follow | grep -i "timeout\|hang\|fetch"
```

Verify model fallbacks are configured:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "provider/model-a",
        "fallbacks": ["provider/model-b", "provider/model-c"]
      }
    }
  }
}
```
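To double-check which models should be tried and in what order, a small hypothetical Python helper can read the same `agents.defaults.model` block and list the attempt order. It assumes the config is plain JSON (`json.loads` rejects comments and trailing commas) and that the primary is tried first, then `fallbacks` in list order. `attempt_order` and `CONFIG_TEXT` are illustrative names, not part of Clawdbot:

```python
import json

# Hypothetical helper (not part of Clawdbot): derive the model attempt order
# from the agents.defaults.model block of a plain-JSON config.

CONFIG_TEXT = """
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "provider/model-a",
        "fallbacks": ["provider/model-b", "provider/model-c"]
      }
    }
  }
}
"""


def attempt_order(config: dict) -> list[str]:
    """Return [primary, *fallbacks] with duplicates removed, preserving order."""
    model = config.get("agents", {}).get("defaults", {}).get("model", {})
    primary = model.get("primary")
    if not primary:
        raise ValueError("agents.defaults.model.primary is not set")
    fallbacks = model.get("fallbacks", [])
    if not fallbacks:
        # Nothing to fail over to: a hanging or failing primary has no alternative.
        print("warning: agents.defaults.model.fallbacks is empty")
    # dict.fromkeys keeps first occurrences in insertion order.
    return list(dict.fromkeys([primary, *fallbacks]))


if __name__ == "__main__":
    print(attempt_order(json.loads(CONFIG_TEXT)))
```

An empty `fallbacks` list is worth catching early: even if hang detection worked perfectly, there would be no next model to switch to.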

Check for provider-specific timeouts: some providers may need explicit timeout configuration.

Workarounds

Add multiple models to your fallbacks list
Use providers with more reliable uptime
Monitor logs for hanging behavior

This might be a bug: if the provider genuinely hangs without any error response, it should still trigger failover. You might want to open a GitHub issue with:

Gateway version (clawdbot --version)
Relevant log snippets
Which provider is hanging

Link to relevant docs: https://docs.clawd.bot/concepts/model-failover