#[Help] No response in OpenClaw Web UI with mlx-community/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4

1 messages · Page 1 of 1 (latest)

neon lintel
#

Hi everyone,

I’m running on a Mac mini M4 24GB, using oMLX as the backend.

  • When I use  mlx-community/Qwen3.5-9B-MLX-4bit , everything works fine in both OpenMLX chat and OpenClaw Web UI.

  • But when I switch to  mlx-community/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit , I have an issue:

  • In OpenMLX backend: the model loads normally and I can see outputs as usual.

  • In OpenClaw Web UI: it just shows the three loading dots forever with no output at all.

After about 5–6 minutes, the OpenMLX process killed loaded model on the backend, but the Web UI still stays stuck loading.

Does anyone know what could be causing this? Where should I start debugging?

Thanks a lot!

wary mirageBOT
glacial duneBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

wary mirageBOT
velvet elbowBOT
#

Let me check the local docs layout first, then I’ll answer with something concrete.

#

This smells less like “the model won’t load” and more like “OpenClaw’s bigger prompt / stream format is hitting an oMLX compatibility edge.”

First thing, can you paste:

openclaw --version
openclaw status --all
openclaw models status

Then I’d debug in this order:

  1. Tail logs while reproducing
    openclaw logs --follow
    
    Docs: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/gateway/troubleshooting
#
  1. Compare tiny OpenClaw inference vs Web UI

    openclaw infer model run --model <your-provider/model> --prompt "Reply with exactly: smoke-ok" --json
    

    If this fails too, it’s not a Web UI problem, it’s provider/model compatibility.

  2. If direct /v1/chat/completions works but OpenClaw hangs, try these compat flags on that model:

    compat: {
      requiresStringContent: true,
      supportsTools: false,
    }
    

    Relevant docs:

#

A few likely culprits here:

  • 27B on 24 GB unified memory is very borderline for a full OpenClaw agent turn. OpenMLX chat is much lighter than OpenClaw’s real prompt.
  • If your model entry has maxTokens == contextWindow, that can cause silent failures. Set maxTokens to a sane output cap like 1024-4096, not the full context size.
  • For smaller local contexts, OpenClaw’s default compaction reserve can be too high. Check:
    openclaw config get agents.defaults.compaction.reserveTokensFloor
    openclaw config get agents.defaults.timeoutSeconds
    
    On local MLX setups, lowering reserve floor and increasing timeout often helps.

Also, there was a known oMLX + reasoning_content bug with Qwen-style reasoning models. If you see logs mentioning reasoning_content, JSON parse errors, or the first turn works but later turns fail, that’s a strong clue.

If you paste:

  • openclaw --version
  • your model config block
  • the relevant openclaw logs --follow output