Here's a concise prompt for Discord:
OpenClaw TUI + llama-server streaming issue - 4min response delay
Setup:
- OpenClaw 2026.2.19-2 on Ubuntu 24.04
- Local llama-server (llama.cpp) at
http://127.0.0.1:11434/v1 - Model: Qwen3-Coder-Next-Q4_K_M (70B, CPU inference)
- Provider: vLLM in OpenClaw config
Problem:
OpenClaw TUI shows "running" for 4+ minutes per response. Model actually generates responses in ~20-30 seconds (confirmed via direct curl). Responses only appear in TUI after ~4 minutes OR when I exit/restart TUI (then all past responses appear instantly).
What I've tried:
- Set
agents.defaults.models["vllm/qwen3-coder"].streaming = false- no effect - Verified llama-server responds quickly via curl
- Gateway logs show "embedded run agent start" but no API call errors
- Model responds correctly, just streaming/delivery is broken
Config excerpt:
"models": {
"providers": {
"vllm": {
"baseUrl": "http://127.0.0.1:11434/v1",
"apiKey": "dummy",
"api": "openai-completions"
}
}
}
Question: How do I fix the streaming delay or force non-streaming responses in OpenClaw TUI with a local llama-server backend?