#Agent responses fail with assistant turn failed or HTTP 500 errors

1 messages · Page 1 of 1 (latest)

strange sun
#

This started happening sporadically a week ago, first seen only in some cron jobs. Now almost all queries fail. Upgrading to latest 2026.5.12 (f066dd2) did not resolve.

test via openclaw tui:

[assistant turn failed before producing content]

Looking at logs: ```
8:40:47+00:00 error diagnostic {"subsystem":"diagnostic"} lane task error: lane=session:agent:main:main durationMs=21708 error="FailoverError: HTTP 500: Internal Server Error"
18:40:47+00:00 warn model-fallback/decision {"subsystem":"model-fallback/decision"} {"event":"model_fallback_decision","tags":["error_handling","model_fallback","candidate_failed"],"runId":"da8d84d6-3a99-4683-8bc0-73ddbb2b2788","sessionId":"22fc4768-68a9-4ae0-b252-e96f1ff210bc","lane":"main","decision":"candidate_failed","requestedProvider":"openrouter","requestedModel":"minimax/minimax-m2.7","candidateProvider":"openrouter","candidateModel":"minimax/minimax-m2.7","attempt":1,"total":1,"reason":"timeout","status":408,"errorPreview":"500 Internal Server Error","errorHash":"sha256:3290109709a4","fallbackStepType":"fallback_step","fallbackStepFromModel":"openrouter/minimax/minimax-m2.7","fallbackStepFromFailureReason":"timeout","fallbackStepFromFailureDetail":"500 Internal Server Error","fallbackStepChainPosition":1,"fallbackStepFinalOutcome":"chain_exhausted","isPrimary":true,"requestedModelMatched":true,"fallbackConfigured":false} model fallback decision
18:40:47+00:00 error Embedded agent failed before reply: HTTP 500: Internal Server Error

Sometimes, an initial simple question will succeed, but follows in same conversation will fail with the above. Tried multiple models with same provider. Looking at Openrouter with detailed logs enabled, I see every completion API request, and no errors are returned by their system.
minor violetBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

thorn harnessBOT
# strange sun This started happening sporadically a week ago, first seen only in some cron job...

That log points pretty strongly at a local OpenClaw timeout/abort, not OpenRouter rejecting the request.

Key bits:

  • durationMs=21708
  • fallback reason timeout
  • synthetic status: 408
  • OpenRouter sees the completion request and doesn't show an upstream error

That pattern usually means OpenClaw gave up around ~20s while OpenRouter was still processing. Follow-ups failing more than first turns also fits: bigger session context -> slower first token.

I'd check these first:

openclaw config get agents.defaults.timeoutSeconds
openclaw config get agents.defaults.llm
openclaw models status --probe
thorn harnessBOT
thorn harnessBOT
# strange sun This started happening sporadically a week ago, first seen only in some cron job...
}

Or use something like idleTimeoutSeconds: 180 if you still want a guardrail. For cron jobs, also check each cron's timeoutSeconds; a low cron timeout can make this show up there first.

Docs:
https://docs.openclaw.ai/help/faq
https://docs.openclaw.ai/gateway/troubleshooting
https://docs.openclaw.ai/concepts/agent-loop

If raising that timeout doesn't change the failure, this is worth filing with a diagnostics export because the 408/500 mismatch is useful signal:

openclaw gateway diagnostics export
strange sun
thorn harnessBOT
# strange sun agents.defaults does not contain either timeoutseconds nor llm, so I'm using def...

Yep, good catch. Those paths are valid in current source, but if config get says they are absent then you’re on the built-in defaults:

  • agents.defaults.timeoutSeconds: 172800s / 48h
  • agents.defaults.llm.idleTimeoutSeconds: 120s when unset for normal interactive agent turns

So my first read over-weighted the timeout config. A ~21s failure does not line up with those defaults unless there’s another per-run/client timeout in play. Also make sure the path is camel-case if you set/check it: timeoutSeconds, not timeoutseconds.

Next thing I’d do is split “model transport works” from “full agent loop fails”:

openclaw infer model run --model openrouter/minimax/minimax-m2.7 --prompt "say hi" --json
openclaw models status --probe
openclaw status --all

If infer model run succeeds reliably but TUI/cron agent turns fail, the problem is probably in the agent runtime payload/history/tool surface rather than OpenRouter auth or the model endpoint. In that case, try one fresh isolated/new session and one existing failing session; if only existing sessions fail, I’d suspect session transcript/history or compaction state.

thorn harnessBOT
# strange sun agents.defaults does not contain either timeoutseconds nor llm, so I'm using def...

Two useful rule-outs:

{
  agents: {
    defaults: {
      llm: { idleTimeoutSeconds: 0 }
    }
  }
}

and for cron jobs, inspect the job’s own timeoutSeconds if set.

If this still reproduces, I’d file it with openclaw gateway diagnostics export plus the output of the three commands above. The 408 classification + displayed HTTP 500 + OpenRouter showing clean requests is exactly the kind of mismatch maintainers need in a repro. https://github.com/openclaw/openclaw/issues

strange sun
#

perhaps important error with first infer:
[halr9000@neo ~]$ openclaw infer model run --model openrouter/minimax/minimax-m2.7 --prompt "say hi" --json
Error: No text output returned for provider "openrouter" model "minimax/minimax-m2.7": 400 Reasoning is mandatory for this endpoint and cannot be disabled..

2nd try with thinking specified:
[halr9000@neo ~]$ openclaw infer model run --model openrouter/minimax/minimax-m2.7 --prompt "say hi" --json --thinking medium
{
"ok": true,
"capability": "model.run",
"transport": "local",
"provider": "openrouter",
"model": "minimax/minimax-m2.7",
"attempts": [],
"outputs": [
{
"text": "Hi there! đź‘‹ How are you doing today? Is there something I can help you with?",
"mediaUrl": null
}
]
}

Tried new session over telegram. First query succeeded. Second produced vague error in telegram "⚠️ Something went wrong..." which coincides with new "error Embedded agent failed before reply errors in logs.

thorn harnessBOT
# strange sun perhaps important error with first infer: [halr9000@neo ~]$ openclaw infer mode...

Yep, that first infer error is probably the smoking gun.

openrouter/minimax/minimax-m2.7 is rejecting requests where reasoning is disabled. Since the same command succeeds with --thinking medium, the next test is to force thinking on for the actual chat session/agent, not just infer:

/think medium

Send that as its own message in the Telegram session, then try the second normal query again. If that fixes it, make it the default in config:

{
  agents: {
    defaults: {
      thinkingDefault: "medium"
    }
thorn harnessBOT
# strange sun perhaps important error with first infer: [halr9000@neo ~]$ openclaw infer mode...
  }
}

If you have a specific agents.list[] entry for Telegram/main, you can put thinkingDefault: "medium" on that agent instead.

Why I think this matters: OpenClaw should normally infer a thinking default for catalog models marked reasoning: true, but this OpenRouter MiniMax route is behaving as “reasoning required”. If the catalog/agent path is resolving that as off or losing it on follow-up, the model transport works but the full agent turn fails before content.

Two useful checks:

openclaw models list --provider openrouter | grep -i minimax
openclaw config get agents.defaults.thinkingDefault

Docs: https://docs.openclaw.ai/tools/thinking and https://docs.openclaw.ai/providers/minimax

thorn harnessBOT
strange sun
#

[halr9000@neo ~]$ openclaw models list --provider openrouter | grep -i minimax
openclaw config get agents.defaults.thinkingDefault
openrouter/minimax/minimax-m2.7 text 195k no yes configured

🦞 OpenClaw 2026.5.12 (f066dd2) — Gateway online—please keep hands, feet, and appendages inside the shell at all times.

minimal

Infer will succeed. Telegram after new session, succeeds once only, regardless of think setting. Tried:

  1. /new
  2. /think medium (or minimal or high)
  3. pass: simple request eg "say hi" or tool use such as "create photo of..."
  4. fail: all subsequent attempts with same lane task errors followed by fallback attempts mixed with and ending with HTTP 500 errors.
thorn harnessBOT
# strange sun [halr9000@neo ~]$ openclaw models list --provider openrouter | grep -i minimax o...

That narrows it a lot. If the first turn after /new always works and the second always fails, regardless of /think, this is almost certainly not the original reasoning-default issue anymore. It’s the replay/history path: the second request is the first one that sends prior session history back to OpenRouter/MiniMax.

So I’d treat this as a likely OpenClaw bug or provider-compat bug around replaying the previous assistant turn for openrouter/minimax/minimax-m2.7.

Useful workaround/test: temporarily limit Telegram DM history to only the latest user turn, then restart the gateway:

{
  channels: {
    telegram: {
      dmHistoryLimit: 1
    }
  }
}

If that makes repeated turns work, it confirms the failure is caused by replaying prior turns, not model auth, Telegram delivery, or first-token timeout.

thorn harnessBOT
# strange sun [halr9000@neo ~]$ openclaw models list --provider openrouter | grep -i minimax o...

Also worth testing the same thing outside Telegram with a fixed CLI session id, after your default model is set to that OpenRouter MiniMax model:

openclaw agent --local --session-id m2-repro --message "say hi" --thinking minimal --json
openclaw agent --local --session-id m2-repro --message "say hi again" --thinking minimal --json

If command 1 passes and command 2 fails, that’s a clean non-Telegram repro for GitHub.

For the issue, I’d include:

  • OpenClaw 2026.5.12 (f066dd2)
  • openclaw infer model run ... fails without thinking but succeeds with --thinking medium
  • /new first agent turn succeeds, second turn fails
  • whether channels.telegram.dmHistoryLimit: 1 changes the behavior
  • openclaw gateway diagnostics export

Tracker: https://github.com/openclaw/openclaw/issues

strange sun
#

the above openclaw agent command shows errors consistent with telegram, but the output is much more informative. Openclaw agent loop with the primary model fails, then it falls back to another model, which succeeds. However, this can be repeated across multiple primary models, so I don't think it's the models in question.

Tell me how to just blow away provider and model configs so I can reocnfigure that from scratch w/o losing other settings.

partial output b/c discord limit:

[halr9000@neo ~]$ openclaw agent --local --session-id m2-repro --message "say hi" --thinking minimal --json
17:18:40 [agent/embedded] embedded run agent end: runId=m2-repro isError=true model=nvidia/nemotron-3-super-120b-a12b provider=openrouter error=HTTP 500: Internal Server Error rawError=500 Internal Server Error
17:19:00 [agent/embedded] embedded run failover decision: runId=m2-repro stage=assistant decision=fallback_model reason=timeout from=openrouter/nvidia/nemotron-3-super-120b-a12b profile=sha256:ac092b59b472 rawError=500 Internal Server Error
17:19:00 [diagnostic] lane task error: lane=main durationMs=28157 error="FailoverError: HTTP 500: Internal Server Error"
17:19:00 [diagnostic] lane task error: lane=session:agent:main:explicit:m2-repro durationMs=28160 error="FailoverError: HTTP 500: Internal Server Error"
17:19:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openrouter/nvidia/nemotron-3-super-120b-a12b candidate=openrouter/nvidia/nemotron-3-super-120b-a12b reason=timeout next=openrouter/anthropic/claude-sonnet-4.6 detail=500 Internal Server Error
17:19:05 [model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=openrouter/nvidia/nemotron-3-super-120b-a12b candidate=openrouter/anthropic/claude-sonnet-4.6 reason=unknown next=none
{
"payloads": [
{
"text": "Good day, Hal. How may I be of service?",
"mediaUrl": null
}
[truncated rest of successful run]

thorn harnessBOT
thorn harnessBOT
thorn harnessBOT
thorn harnessBOT