#Title: openclaw doctor --repair silently rewrites agents.defaults.model.primary, breaks CLI session


steady python

Version: 2026.5.7 (eeef486)

What happened:
Ran openclaw doctor --repair as part of recovering from gateway stalls. Afterward, the model status reported:

Current: anthropic/claude-opus-4-7 (selected)
Active: claude-cli/claude-opus-4-7 (runtime)

The runtime was still claude-cli/... only because the CLI session was already attached. On the next gateway/CLI restart, the selected anthropic/... provider would take over and break Claude-CLI transcript continuity (per-session-id .jsonl stops being read, conversation context is lost).

I confirmed by inspecting ~/.openclaw/openclaw.json:

"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-opus-4-7",
...

Before running doctor --repair, this was claude-cli/claude-opus-4-7.

Reproduction:

  1. Set agents.defaults.model.primary to claude-cli/claude-opus-4-7 (or any claude-cli/... value).
  2. Run openclaw doctor --repair.
  3. Check openclaw config get agents.defaults.model.primary — flipped to anthropic/claude-opus-4-7.

Workaround:

openclaw config set agents.defaults.model.primary claude-cli/claude-opus-4-7

(Then restart the gateway.)

Why this matters:

  1. --repair is documented as "non-destructive remediation," but silently rewriting the user-selected provider is destructive to a user setting.
  2. CLI users who rely on claude-cli/... for transcript continuity lose context on the next restart with no warning surfaced in the doctor output.
  3. The upgrade docs (and Krill's own troubleshooting playbook) recommend doctor --repair as part of normal upgrade hygiene, so this footgun fires on a recommended path.

Asks:

  1. Can doctor --repair be made provider-preserving — only fill in primary if unset/invalid, never rewrite a valid one?
  2. If the rewrite is intentional, surface it in doctor's output (Changed: agents.defaults.model.primary: claude-cli/... → anthropic/...) so the user can react.
  3. Is this still live in 5.12? Asking before I do the 5.7 → 5.12 upgrade.
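
To make ask 2 concrete, here is a minimal sketch of the kind of before/after report that would help. It is not openclaw code: flatten and changed_lines are hypothetical helper names, and the two dicts are hand-written stand-ins for ~/.openclaw/openclaw.json before and after the repair.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def changed_lines(before, after):
    """Return one 'Changed: path: old → new' line per differing leaf value."""
    old, new = flatten(before), flatten(after)
    lines = []
    for path in sorted(old.keys() | new.keys()):
        if old.get(path) != new.get(path):
            lines.append(f"Changed: {path}: {old.get(path)!r} → {new.get(path)!r}")
    return lines


# Hand-written stand-ins for the config before and after doctor --repair (assumed shapes).
before = {"agents": {"defaults": {"model": {"primary": "claude-cli/claude-opus-4-7"}}}}
after = {"agents": {"defaults": {
    "model": {"primary": "anthropic/claude-opus-4-7"},
    "agentRuntime": {"id": "claude-cli"},
}}}

for line in changed_lines(before, after):
    print(line)
```

Saving a copy of the config before running the repair and diffing it afterward this way would at least make the rewrite visible until doctor reports it itself.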
toxic palmBOT

I checked the v2026.5.7 and v2026.5.12 source, plus the FAQ/doctor docs path first.

Short answer: yes, the primary rewrite is still a live migration pattern in 5.12, but it is supposed to preserve the CLI runtime separately.

What I found:

  • claude-cli/... is treated as a legacy runtime-encoded model ref.
  • doctor/config migration rewrites it to canonical anthropic/....
  • In 5.7, the migration should also set agents.defaults.agentRuntime.id: "claude-cli".
  • In 5.12, this moved to provider/model scoped runtime policy: the canonical model entry should get agentRuntime.id: "claude-cli".

So after repair, Current: anthropic/... + Active: claude-cli/... is expected, and does not by itself mean the Anthropic API will take over. The dangerous case is a config that got only the primary rewrite and not the corresponding runtime policy.

Before upgrading, I’d check these:

openclaw config get agents.defaults.model
openclaw config get agents.defaults.agentRuntime
openclaw config get agents.defaults.models

For 5.7, look for agents.defaults.agentRuntime.id = claude-cli. For 5.12, look for the anthropic/claude-opus-4-7 model entry carrying agentRuntime.id = claude-cli.
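
A rough sketch of that check, written against the parsed config as a plain dict. This is an illustration, not openclaw's own logic: runtime_pin is a hypothetical helper, and it assumes agents.defaults.models is a mapping keyed by model ref.

```python
def runtime_pin(config, model_ref):
    """Return (location, runtime_id) for the first runtime pin found, else None."""
    defaults = config.get("agents", {}).get("defaults", {})

    # 5.12 shape: the pin lives on the canonical model entry.
    entry = defaults.get("models", {}).get(model_ref, {})
    scoped = entry.get("agentRuntime", {}).get("id")
    if scoped:
        return ("model-scoped", scoped)

    # 5.7 shape: the pin lives at agents.defaults.agentRuntime.id.
    top = defaults.get("agentRuntime", {}).get("id")
    if top:
        return ("top-level", top)

    # Primary was rewritten but no pin exists anywhere: the dangerous case.
    return None


MODEL = "anthropic/claude-opus-4-7"

cfg_57 = {"agents": {"defaults": {
    "model": {"primary": MODEL},
    "agentRuntime": {"id": "claude-cli"},
}}}
cfg_512 = {"agents": {"defaults": {
    "model": {"primary": MODEL},
    "models": {MODEL: {"agentRuntime": {"id": "claude-cli"}}},
}}}
cfg_unpinned = {"agents": {"defaults": {"model": {"primary": MODEL}}}}

print(runtime_pin(cfg_57, MODEL))
print(runtime_pin(cfg_512, MODEL))
print(runtime_pin(cfg_unpinned, MODEL))
```

A None result alongside an anthropic/... primary is the state worth reporting.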

I agree with the issue shape, though: even if intentional, doctor should surface this as a clear “changed X -> Y and selected claude-cli runtime” migration because it looks destructive from models status. If the runtime policy is missing after --repair, that’s a real bug and worth filing with the before/after config snippets: https://github.com/openclaw/openclaw/issues

Docs ref for the doctor path: https://docs.openclaw.ai/gateway/doctor

steady python

Thanks — that clarifies the design. Confirming my state matches what you described:

agents.defaults.agentRuntime.id = "claude-cli"
agents.defaults.model.primary = "claude-cli/claude-opus-4-7" (after I manually patched it back)
agents.defaults.models = anthropic/... entries only (no claude-cli/... keys)

So the runtime policy was intact right after doctor --repair — meaning the migration did the right thing under the hood, and my flip back to claude-cli/... was probably unnecessary belt-and-suspenders. If that's correct, please confirm: is primary: anthropic/claude-opus-4-7 + agentRuntime.id: claude-cli the canonical post-migration state in 5.7, and should I revert my manual patch before the 5.12 upgrade so the migration sees a clean starting point?

Two things I'd still flag as real (separate from the migration working):

  1. models status UX is the actual footgun. Current: anthropic/... (selected) + Active: claude-cli/... (runtime) reads to a user as "the next restart will swap providers and lose CLI continuity." There's no surface that says "these are the same thing, runtime is pinned by agentRuntime.id." A one-line hint in models status output (e.g. Runtime: claude-cli (pinned by agents.defaults.agentRuntime.id)) would close the gap entirely.

  2. 5.12 post-upgrade verification target. Per your note, in 5.12 the runtime policy moves onto the model entry (models["anthropic/claude-opus-4-7"].agentRuntime.id = "claude-cli"). So after upgrade I should run:

openclaw config get agents.defaults.models
and confirm the anthropic/claude-opus-4-7 entry carries agentRuntime.id: "claude-cli". If --repair migrates the top-level agents.defaults.agentRuntime.id down onto the per-model entry, great. If it drops it on the floor, that's the real bug.

I'll file the GitHub issue scoped to the UX problem (and any 5.12 migration gap if I see one post-upgrade), with before/after config snippets.

toxic palmBOT

Confirmed one important nuance, and I need to correct the upgrade advice slightly.

For 5.7: yes, agents.defaults.model.primary = anthropic/claude-opus-4-7 + agents.defaults.agentRuntime.id = claude-cli is the intended post-migration shape. Your original read that the runtime policy stayed intact was right.

For the 5.12 upgrade: I would not rely on “canonical 5.7 shape + top-level agentRuntime” being enough. In 5.12, agents.defaults.agentRuntime is treated as legacy config that is ignored, and the repair path removes it. The migration that writes models["anthropic/..."].agentRuntime.id = "claude-cli" is triggered by seeing a legacy runtime model ref like claude-cli/claude-opus-4-7, so a config whose primary is already anthropic/... may not trigger it.
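
To show why the starting shape matters, here is a toy model of that 5.12 repair behavior as described in this thread. It is not the actual openclaw implementation: repair_512 and LEGACY_PREFIX are hypothetical names, and the logic only encodes the two rules stated above.

```python
import copy

LEGACY_PREFIX = "claude-cli/"  # assumed marker of a legacy runtime-encoded ref


def repair_512(config):
    """Toy model of the 5.12 repair path; returns a migrated copy of config."""
    cfg = copy.deepcopy(config)
    defaults = cfg.setdefault("agents", {}).setdefault("defaults", {})
    primary = defaults.get("model", {}).get("primary", "")

    # Rule 1: a legacy ref is canonicalized and its runtime pin is written
    # onto the canonical model entry.
    if primary.startswith(LEGACY_PREFIX):
        canonical = "anthropic/" + primary[len(LEGACY_PREFIX):]
        defaults["model"]["primary"] = canonical
        entry = defaults.setdefault("models", {}).setdefault(canonical, {})
        entry["agentRuntime"] = {"id": "claude-cli"}

    # Rule 2: top-level agentRuntime is ignored legacy config and is removed.
    defaults.pop("agentRuntime", None)
    return cfg


legacy_start = {"agents": {"defaults": {
    "model": {"primary": "claude-cli/claude-opus-4-7"},
    "agentRuntime": {"id": "claude-cli"},
}}}
canonical_57_start = {"agents": {"defaults": {
    "model": {"primary": "anthropic/claude-opus-4-7"},
    "agentRuntime": {"id": "claude-cli"},
}}}

print(repair_512(legacy_start)["agents"]["defaults"])
print(repair_512(canonical_57_start)["agents"]["defaults"])
```

Under these assumptions, the legacy starting point ends up with a model-scoped pin, while the canonical 5.7 starting point loses its only pin.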

So safest path before 5.12 is one of these:

# Option A: leave/restore the legacy runtime-encoded primary before upgrade
openclaw config set agents.defaults.model.primary claude-cli/claude-opus-4-7

Then, after upgrading to 5.12 and running doctor --repair, verify:

openclaw config get agents.defaults.model
openclaw config get agents.defaults.models

Expected 5.12 result: primary is anthropic/claude-opus-4-7, and that model entry has agentRuntime.id: "claude-cli".

Option B: if you would rather be explicit, manually make sure the canonical model entry already carries the runtime policy before the 5.12 repair removes the top-level agents.defaults.agentRuntime, then check it again after the repair.

Your two UX flags still stand. In fact this makes the issue stronger: the migration is behavior-preserving only if the runtime pin makes it to the new model-scoped location, and models status should make that pin obvious. The issue title can probably be scoped as “models status/doctor migration should surface claude-cli runtime pin when canonicalizing claude-cli model refs.”