#Which Ollama models for memory?

1 messages · Page 1 of 1 (latest)

nimble ice
#

I have an openclaw bot with important social function. I use qmd and honcho. OpenClaw runs on an Ubuntu vm, on macOS. Ollama runs on the Mac and connects to the vm. I want to use a local model for the deriver, and maybe also for the dialectic reasoning. For dreaming I plan to use a cloud model. I’m not sure about qmd.

Which Ollama-available models do you advise for : 1. Qmd 2. The deriver from honcho 3. Dialectic reasoning 4. Honcho dreaming. I have an m3 pro MacBook Pro with 36 gb ram of which 8 gb is for the vm.

waxen sealBOT
neon spadeBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

gilded iglooBOT
#

Short version, I’d do this:

| Part                   | What I’d use                                  | Why                                                                                                                                                              |
| ---------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1. QMD                 | Leave QMD on its defaults                     | QMD does not need Ollama. It uses its own local GGUF embed/rerank/generate models and auto-downloads them. I would not spend time tuning this first.             |
| 2. Honcho deriver      | Local                                         | Best place to save cloud spend. Use a tool-calling generalist in the 8B to 20B class. On your hardware, start around 20B, then step down if latency is annoying. |
| 3. Dialectic reasoning | Local only for minimal/low, cloud for medium+ | For a bot with important social function, I would not trust heavier reasoning to a small local model full-time.                                                  |
| 4. Honcho dreaming     | Cloud                                         | This is the highest-value reasoning pass, infrequent, and most worth paying for.                                                                                 |

A few concrete notes:

QMD: first query can be slow because it downloads about 2 GB of GGUFs. That is normal.
https://docs.openclaw.ai/concepts/memory-qmd
Deriver: pick a local model with good function/tool calling. On your Mac, I would test a 20B-class generalist first. If it is too slow, drop to a smaller fast instruct model.
Dialectic: if you really want local here, keep it to minimal/low only. For socially important replies, I’d keep the stronger tiers in the cloud.
Dreaming: