Short version, here’s what I’d do:
| Part | What I’d use | Why |
| ---------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1. QMD | Leave QMD on its defaults | QMD does not need Ollama. It uses its own local GGUF embed/rerank/generate models and auto-downloads them. I would not spend time tuning this first. |
| 2. Honcho deriver | Local | Best place to save cloud spend. Use a tool-calling generalist in the 8B to 20B class. On your hardware, start around 20B, then step down if latency is annoying. |
| 3. Dialectic reasoning | Local only for minimal/low, cloud for medium+ | For a bot whose social role really matters, I would not trust heavier reasoning to a small local model full-time. |
| 4. Honcho dreaming | Cloud | It is the highest-value reasoning pass and runs infrequently, so it is the one most worth paying for. |
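To make the split concrete, here is a minimal Python sketch of the routing policy in the table, plus the "start around 20B, step down if latency is annoying" rule for the deriver. Everything in it is hypothetical: the stage strings, `Target`, `route`, `pick_deriver_model`, and the latency budget illustrate the decision logic only. They are not OpenClaw or Honcho configuration keys or APIs. The only dialectic levels the sketch knows by name are minimal and low; any other level is treated as medium+ and sent to the cloud.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical sketch of the local/cloud split from the table.
# Stage names and this API are illustrative, not OpenClaw/Honcho config keys.


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Dialectic levels light enough to run on a local model.
# Any other level is treated as "medium+" and routed to the cloud.
LOCAL_DIALECTIC_LEVELS = {"minimal", "low"}


def route(stage: str, level: Optional[str] = None) -> Target:
    """Return where a given memory stage should run."""
    if stage == "qmd":
        # QMD uses its own local GGUF models; leave it on its defaults.
        return Target.LOCAL
    if stage == "deriver":
        # Main place to save cloud spend.
        return Target.LOCAL
    if stage == "dialectic":
        if level is None:
            raise ValueError("dialectic routing needs a reasoning level")
        return Target.LOCAL if level in LOCAL_DIALECTIC_LEVELS else Target.CLOUD
    if stage == "dreaming":
        # Infrequent, highest-value pass: worth paying for.
        return Target.CLOUD
    raise ValueError(f"unknown stage: {stage!r}")


def pick_deriver_model(
    candidates: Iterable[Tuple[str, float]],
    latency_s: Callable[[str], float],
    budget_s: float,
) -> Optional[str]:
    """Try candidates largest-first; return the first that fits the latency budget.

    candidates: (model_name, size_in_billions) pairs; names are placeholders.
    latency_s: caller-supplied function that measures or looks up latency in seconds.
    Returns None if no candidate is fast enough.
    """
    for name, _size_b in sorted(candidates, key=lambda c: c[1], reverse=True):
        if latency_s(name) <= budget_s:
            return name
    return None
```

Usage: `route("dialectic", "low")` returns `Target.LOCAL`, while `route("dialectic", "medium")` returns `Target.CLOUD`. For the deriver, pass your candidate models with their parameter counts and a function that times a representative deriver call on your own hardware. Treating any unrecognized dialectic level as cloud errs toward the stronger model for socially important replies.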
A few concrete notes:
• QMD: the first query can be slow because it downloads about 2 GB of GGUF models. That is normal.
https://docs.openclaw.ai/concepts/memory-qmd
• Deriver: pick a local model with strong function/tool-calling support. On your Mac, I would test a 20B-class generalist first. If it is too slow, drop to a smaller, faster instruct model.
• Dialectic: if you really want local here, keep it to minimal/low only. For socially important replies, I’d keep the stronger tiers in the cloud.
• Dreaming: