A 14B Q8 model on Ollama can absolutely feel “slow” on a 24GB unified-memory Mac, especially once the KV cache grows (long chats / large context): token generation becomes memory-bandwidth bound, and under memory pressure macOS may start swapping.
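To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the layer/head counts are hypothetical (check your model's card), the weights are taken as roughly 1 byte per parameter for Q8, the KV cache is assumed to be fp16, and the 120 GB/s bandwidth is a placeholder you should replace with your chip's figure. It also treats each generated token as one full read of weights plus cache, which is only an upper bound on speed.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size: keys + values for every layer and token."""
    # Factor 2 = one tensor for keys, one for values.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def max_tokens_per_second(weight_bytes, cache_bytes, bandwidth_bytes_per_s):
    """Upper bound if each token requires reading all weights and the cache once."""
    return bandwidth_bytes_per_s / (weight_bytes + cache_bytes)


if __name__ == "__main__":
    # Hypothetical 14B-class architecture; NOT taken from any specific model.
    cache = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, context_tokens=32768)
    weights = 14e9   # ~1 byte per parameter at Q8 (rough assumption)
    bandwidth = 120e9  # placeholder memory bandwidth in bytes/s

    print(f"KV cache at full context: {cache / 2**30:.1f} GiB")
    print(f"Weights + cache: {(weights + cache) / 1e9:.1f} GB of a 24 GB machine")
    print(f"Speed ceiling: {max_tokens_per_second(weights, cache, bandwidth):.1f} tokens/s")
```

With numbers in this range, weights plus a full-context cache already take most of 24GB before the OS and other apps get their share, which is where swap comes in.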
A few quick checks that usually pinpoint what’s going on:
- First: what version + what model is OpenClaw actually using?
openclaw --version
openclaw models status
(or in Telegram: /model status)
- Is Ollama actually the bottleneck vs Telegram delivery?
- Compare latency in the TUI (no Telegram involved):
openclaw tui
- Tail logs and look for the LLM request duration vs “send message” duration:
openclaw logs --follow
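If you want to take OpenClaw and Telegram out of the picture entirely, you can also time Ollama directly. The sketch below assumes Ollama is listening on its default local port (11434) and uses the timing fields that the non-streaming `/api/generate` response reports in nanoseconds. The model tag is a placeholder; put in whatever `openclaw models status` shows.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)


def summarize_timings(resp: dict) -> dict:
    """Convert Ollama's nanosecond timing fields into seconds and tokens/s."""
    ns = 1e9
    generation_s = resp.get("eval_duration", 0) / ns
    return {
        "total_s": resp.get("total_duration", 0) / ns,
        "load_s": resp.get("load_duration", 0) / ns,          # model (re)load time
        "prompt_s": resp.get("prompt_eval_duration", 0) / ns,  # prompt processing
        "generation_s": generation_s,                          # token generation
        "tokens_per_s": resp.get("eval_count", 0) / generation_s if generation_s else 0.0,
    }


def time_ollama(model: str, prompt: str) -> dict:
    """Send one non-streaming request to Ollama and return its timing summary."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return summarize_timings(json.load(r))


if __name__ == "__main__":
    # "<your-model-tag>" is a placeholder, not a real model name.
    print(time_ollama("<your-model-tag>", "Reply with one short sentence."))
```

If `tokens_per_s` is already low here, the model is the bottleneck. If it looks fine but replies in Telegram still lag, the delay is more likely in delivery. A large `load_s` on every call suggests the model is being reloaded between requests.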