#multiple ollama instances

patent tulip
#

Hey there. I am looking for a way to distribute my (local) ollama load across my network. For simplicity I'll use names for the ollama instances in my LAN:

  1. jetson
  2. agent-vm
  3. gaming-rig

My idea would be hooking up the agent-vm instance for small tasks like embedding, having jetson as my main low-power thing and the gaming-rig 'if needed' - is that possible?

velvet kiteBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

desert heartBOT
#

Yep — with one important caveat: OpenClaw can route to multiple configured model providers, but it is not currently a true Ollama cluster scheduler that watches GPU load and automatically picks “least busy”.

Practical setup:

  1. Expose each Ollama instance on the LAN, e.g. http://jetson:11434, http://agent-vm:11434, http://gaming-rig:11434.
  2. Configure them as separate provider IDs, all using native Ollama API — no /v1 suffix.
  3. Use agents.defaults.model.primary for the normal box and fallbacks for “try this if the main one fails”.
  4. For embeddings/memory search, point agents.defaults.memorySearch at the VM’s Ollama endpoint.

Shape would be roughly:

{
  agents: {
    defaults: {
      model: {
        primary: "ollama-jetson/qwen3:8b",
        fallbacks: [
          "ollama-gaming/qwen3:32b"
        ]
      },
      memorySearch: {
        provider: "ollama",
        model: "nomic-embed-text",
        remote: {
          baseUrl: "http://agent-vm:11434"
        }
      }
    }
  },
  models: {
    mode: "merge",
    providers: {
      "ollama-jetson": {
        api: "ollama",
        apiKey: "ollama-local",
        baseUrl: "http://jetson:11434",
        request: { allowPrivateNetwork: true },
        models: [
          { id: "qwen3:8b", name: "qwen3:8b", input: ["text"], contextWindow: 32768, maxTokens: 8192 }
        ]
      },
      "ollama-gaming": {
        api: "ollama",
        apiKey: "ollama-local",
        baseUrl: "http://gaming-rig:11434",
        request: { allowPrivateNetwork: true },
        models: [
          { id: "qwen3:32b", name: "qwen3:32b", input: ["text"], contextWindow: 32768, maxTokens: 8192 }
        ]
      }
    }
  }
}

The fallbacks part is failure/rate-limit/error fallback, not “this prompt looks big, use gaming rig”. For intentional task routing, I’d use separate agents or manual /model switching. If you want real dynamic load balancing across Ollama boxes, put a routing proxy in front of them and point OpenClaw at that one endpoint.

Also: if you want help with exact config, first paste openclaw --version — config details have moved around a bit between versions.

Docs: https://docs.openclaw.ai/providers/ollama and https://docs.openclaw.ai/concepts/models
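
Before wiring the three boxes into the config, it may help to confirm that each one answers on the native Ollama API. The sketch below assumes the hostnames from this thread resolve on the LAN and that `curl` is installed; `/api/tags` is Ollama's model-listing endpoint and 11434 its default port. The function names are made up for illustration.

```shell
# Build the native Ollama model-list URL for a host (no /v1 suffix).
# Port 11434 is Ollama's default; pass a second argument to override it.
ollama_tags_url() {
  printf 'http://%s:%s/api/tags\n' "$1" "${2:-11434}"
}

# Report whether a host answers on the Ollama API within 3 seconds.
check_ollama() {
  if curl -fsS --max-time 3 "$(ollama_tags_url "$1")" >/dev/null 2>&1; then
    echo "$1: reachable"
  else
    echo "$1: unreachable"
  fi
}
```

Running `for h in jetson agent-vm gaming-rig; do check_ollama "$h"; done` from the gateway machine shows which hosts OpenClaw would be able to reach. A host that only listens on localhost will show up as unreachable here.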

patent tulip
#

I am running version v2026.5.7.

Is there a way to add these additional ollama providers via cli commands as I do not feel comfy with editing json?

desert heartBOT
#

Yep. In v2026.5.7 there isn’t a nice guided “add three Ollama LAN nodes” wizard, but you can do it safely with openclaw config set so you don’t hand-edit JSON.

For each Ollama box, add a separate provider ID:

openclaw config set models.mode merge

openclaw config set models.providers.ollama-jetson.api ollama
openclaw config set models.providers.ollama-jetson.apiKey ollama-local
openclaw config set models.providers.ollama-jetson.baseUrl http://jetson:11434
openclaw config set models.providers.ollama-jetson.request.allowPrivateNetwork true --strict-json
openclaw config set models.providers.ollama-jetson.models '[{"id":"qwen3:8b","name":"qwen3:8b","input":["text"],"contextWindow":32768,"maxTokens":8192}]' --strict-json --merge

openclaw config set models.providers.ollama-gaming.api ollama
openclaw config set models.providers.ollama-gaming.apiKey ollama-local
openclaw config set models.providers.ollama-gaming.baseUrl http://gaming-rig:11434
openclaw config set models.providers.ollama-gaming.request.allowPrivateNetwork true --strict-json
openclaw config set models.providers.ollama-gaming.models '[{"id":"qwen3:32b","name":"qwen3:32b","input":["text"],"contextWindow":32768,"maxTokens":8192}]' --strict-json --merge
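
Since the two provider blocks above differ only in ID, host and model, a small helper can print them for review instead of typing each line. This is a sketch that simply reproduces the commands from this message; the flags, the `ollama-local` key and the contextWindow/maxTokens values are carried over from the thread, not independently verified, and `provider_commands` is a hypothetical name.

```shell
# Print (not run) the config commands for one Ollama provider.
# Usage: provider_commands <provider-id> <host> <model>
provider_commands() (
  id="$1" host="$2" model="$3"
  key="models.providers.$id"
  echo "openclaw config set $key.api ollama"
  echo "openclaw config set $key.apiKey ollama-local"
  echo "openclaw config set $key.baseUrl http://$host:11434"
  echo "openclaw config set $key.request.allowPrivateNetwork true --strict-json"
  # The model entry mirrors the JSON used above; adjust the limits per model.
  printf "openclaw config set %s.models '[{\"id\":\"%s\",\"name\":\"%s\",\"input\":[\"text\"],\"contextWindow\":32768,\"maxTokens\":8192}]' --strict-json --merge\n" "$key" "$model" "$model"
)
```

For example, `provider_commands ollama-gaming gaming-rig qwen3:32b` prints the five lines for the gaming rig. Once they look right, piping the output to `sh` runs them.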

Then set routing/fallbacks:

openclaw config set agents.defaults.model.primary ollama-jetson/qwen3:8b
openclaw config set agents.defaults.model.fallbacks '["ollama-gaming/qwen3:32b"]' --strict-json

For the VM as embeddings-only, you probably don’t need to add it as a chat provider. Point memory search at it:

openclaw config set agents.defaults.memorySearch.provider ollama
openclaw config set agents.defaults.memorySearch.model nomic-embed-text
openclaw config set agents.defaults.memorySearch.remote.baseUrl http://agent-vm:11434

Then verify:

openclaw config validate
openclaw models list
openclaw models status

Two small gotchas:

  • Use native Ollama URLs like http://host:11434, not /v1.
  • Each Ollama server must be reachable on your LAN; on the Ollama hosts that usually means binding Ollama to the network interface, not just localhost.
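
For the second gotcha, Ollama reads its bind address from the `OLLAMA_HOST` environment variable. A minimal sketch, assuming Ollama runs as a systemd service named `ollama` on the host (the usual result of the Linux install script, but an assumption here); `ollama_dropin` is a hypothetical helper that only prints the override text.

```shell
# Print a systemd drop-in that makes Ollama listen on all interfaces.
# 0.0.0.0 exposes the API to the whole LAN, so restrict access with a
# firewall if the network is not fully trusted.
ollama_dropin() {
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s:%s"\n' "${1:-0.0.0.0}" "${2:-11434}"
}
```

On each Ollama host, `sudo systemctl edit ollama.service` opens an override file where the printed text can be pasted, followed by `sudo systemctl restart ollama`.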
patent tulip
#

the embedding model - does that also apply to qmd?

desert heartBOT
#

Mostly no — QMD has its own embedding stack.

agents.defaults.memorySearch.provider/model/remote.baseUrl applies to OpenClaw’s builtin memory engine. If you set:

openclaw config set agents.defaults.memorySearch.provider ollama
openclaw config set agents.defaults.memorySearch.model nomic-embed-text
openclaw config set agents.defaults.memorySearch.remote.baseUrl http://agent-vm:11434

that routes builtin memory embeddings to your agent-vm Ollama.

If you enable QMD:

openclaw config set memory.backend qmd

then QMD runs as a local sidecar and uses QMD’s own model env vars instead, e.g.:

export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
export QMD_RERANK_MODEL="/absolute/path/to/reranker.gguf"
export QMD_GENERATE_MODEL="/absolute/path/to/generator.gguf"

So: your Ollama embedding config is useful for builtin memory search, and also as the fallback if QMD is unavailable. But QMD itself won’t use agents.defaults.memorySearch.remote.baseUrl to send embeddings to agent-vm.

Docs: https://docs.openclaw.ai/concepts/memory-qmd and https://docs.openclaw.ai/concepts/memory-builtin

patent tulip
#

hm. According to openclaw memory status we are using qmd:

Memory Search (main)
Provider: qmd (requested: qmd)
Model: qmd
Sources: memory, sessions
Indexed: 40/45 files · 40 chunks
Dirty: no
Store: ~/.openclaw/agents/main/qmd/xdg-cache/qmd/index.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
By source:
  memory · 19/19 files · 19 chunks
  sessions · 21/26 files · 21 chunks
Vector: disabled
Batch: disabled (failures 0/0)
Recall store: 355 entries · 0 promoted · 354 concept-tagged · 0 spaced · scripts=354 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-05-13T03:00:08.735Z
QMD audit: ~/.openclaw/agents/main/qmd/xdg-cache/qmd/index.sqlite · 1462272 bytes · 4 collections
Dreaming artifacts: diary present · 0 corpus files · ingestion state absent
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md

But as you can see, something's not right there as it has Vector disabled (which shouldn't be the case then)

desert heartBOT
#

Yep, you caught a real config nuance there.

On v2026.5.7, QMD defaults to:

memory.qmd.searchMode = "search"

That mode is QMD’s lexical/BM25 search path, so OpenClaw reports:

Vector: disabled

So QMD being selected does not automatically mean vector search is active. To use vectors, switch QMD’s search mode:

openclaw config set memory.qmd.searchMode vsearch
openclaw memory index --force --verbose
openclaw memory status --deep

If you want QMD’s heavier query/rerank style instead, you can try:

openclaw config set memory.qmd.searchMode query
openclaw memory index --force --verbose
openclaw memory status --deep

My recommendation: start with vsearch to confirm vectors are working. Use query later if you want QMD’s more advanced/possibly slower query behavior.

One more important bit: this still won’t use your Ollama agent-vm embedding config. QMD uses its own local embedding model/env vars (QMD_EMBED_MODEL, etc.). The Ollama embedding config applies to builtin memory, not QMD.
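
To compare the vector state before and after switching modes, the `Vector:` line can be pulled out of the status output. This sketch assumes the field keeps the name and layout shown in the status paste above, which may differ between versions; `vector_state` is a hypothetical helper.

```shell
# Read `openclaw memory status` output on stdin and print the Vector field.
vector_state() {
  sed -n 's/^[[:space:]]*Vector:[[:space:]]*//p' | head -n 1
}
```

Usage: `openclaw memory status | vector_state`. With the default search mode this prints `disabled`, matching the paste above.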

patent tulip
#

QMD_EMBED_MODEL lists a model, not a host as far as I understand

desert heartBOT
#

You’re right — QMD_EMBED_MODEL is a model/path/ref, not a host. Good catch.

So the split is:

  • QMD backend: embeddings are handled by QMD locally via its own model env vars (QMD_EMBED_MODEL, etc.). OpenClaw doesn’t pass an Ollama host into QMD.
  • Builtin memory backend: can use Ollama over the network via agents.defaults.memorySearch.provider + a configured models.providers.<id>.baseUrl.

So if your goal is specifically “run memory embeddings on agent-vm Ollama”, I’d switch memory back to builtin and point it at that provider:

openclaw config set models.providers.ollama-agent-vm.api ollama
openclaw config set models.providers.ollama-agent-vm.apiKey ollama-local
openclaw config set models.providers.ollama-agent-vm.baseUrl http://agent-vm:11434
openclaw config set models.providers.ollama-agent-vm.request.allowPrivateNetwork true --strict-json

openclaw config set memory.backend builtin
openclaw config set agents.defaults.memorySearch.provider ollama-agent-vm
openclaw config set agents.defaults.memorySearch.model nomic-embed-text
openclaw memory index --force --verbose
openclaw memory status --deep

If you want to stay on QMD, then the “fix” is local QMD vector mode:

openclaw config set memory.qmd.searchMode vsearch
openclaw memory index --force --verbose
openclaw memory status --deep

…but that will use QMD’s local embedding model on the gateway machine, not your agent-vm Ollama host.

So: QMD for richer local sidecar search; builtin for network-routed Ollama embeddings. I’d pick builtin for your LAN load-distribution goal.
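
Before switching to builtin, it can be worth checking that agent-vm actually serves embeddings. The sketch below posts a test sentence to Ollama's `/api/embed` endpoint. It assumes `curl` is available and that `nomic-embed-text` has already been pulled on that host; the function names are made up for illustration.

```shell
# Build the JSON body for Ollama's /api/embed endpoint.
# Assumes the model name and input need no JSON escaping.
embed_payload() {
  printf '{"model":"%s","input":"%s"}\n' "$1" "$2"
}

# POST a test sentence to a host and print the raw JSON response.
probe_embed() {
  curl -fsS --max-time 30 "http://$1:11434/api/embed" \
    -H 'Content-Type: application/json' \
    -d "$(embed_payload "${2:-nomic-embed-text}" "hello world")"
}
```

`probe_embed agent-vm` should return JSON containing an `embeddings` array. An error usually means the host is not reachable or the model still needs `ollama pull nomic-embed-text` on agent-vm.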

patent tulip
#

So qmd was never touching the jetson-ollama then. Thank you for clearing this up.

desert heartBOT
#

Exactly — if Provider: qmd, then QMD was doing its own thing and wasn’t using the jetson/Ollama provider at all.

Your Jetson/Ollama config would matter for chat model routing, or for builtin memory search if you set memory.backend builtin and point agents.defaults.memorySearch.provider at that Ollama provider.

Glad we caught that before you chased phantom network load 🦞

patent tulip
#

So as the memory engine is qmd (running on the agent machine), do I have to reset/reindex dreaming then?

desert heartBOT
#

Usually: no reset needed.

Dreaming is not the vector index itself. It writes/uses durable artifacts like:

  • DREAMS.md
  • memory/.dreams/short-term-recall.json
  • optional dream/session corpus files
  • QMD/session memory collections, if enabled

If you’re staying on QMD and just realized it wasn’t using Jetson/Ollama, you don’t need to wipe dreaming. The safe thing is just to reindex if you changed QMD search/vector settings:

openclaw config set memory.qmd.searchMode vsearch
openclaw memory index --force --verbose
openclaw memory status --deep

Only reset/repair dreaming artifacts if openclaw memory status --deep or openclaw doctor reports a dreaming artifact problem. Then use the built-in repair path, not manual deletion:

openclaw memory status --deep --fix

If you switch from QMD → builtin to use agent-vm Ollama embeddings, then yes, do a fresh index for the new backend:

openclaw config set memory.backend builtin
openclaw memory index --force --verbose
openclaw memory status --deep

But I would not delete DREAMS.md or .dreams just because the embedding backend changed. Reindex first; repair only if status says the dreaming artifacts are unhealthy.
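
If a reindex or backend switch feels risky, a copy of the dreaming artifacts can be taken first. This is an optional precaution, not something OpenClaw requires. The default paths come from the status output earlier in the thread, and `backup_dreams` is a hypothetical helper.

```shell
# Archive DREAMS.md and memory/.dreams from a workspace into a dated tarball.
# Usage: backup_dreams [workspace-dir] [destination-dir]
# Prints the archive path on success.
backup_dreams() (
  ws="${1:-$HOME/.openclaw/workspace}"
  dest="${2:-$HOME}"
  out="$dest/openclaw-dreams-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$out" -C "$ws" DREAMS.md memory/.dreams && echo "$out"
)
```

Calling `backup_dreams` with no arguments writes the archive to the home directory. The command fails without printing a path if either DREAMS.md or memory/.dreams is missing.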

patent tulip
#

this looks weird to me:

openclaw memory search "test" did result in a timeout after 8000ms?
qmd vsearch Test --json -n 4 -c memory-root-main -c memory-dir-main -c docs timed out after 8000ms

I do not believe that - running qmd directly:

$  qmd vsearch Test --json -n 4 -c memory-root-main -c memory-dir-main -c docs
Collection not found: memory-root-main

Guess we found the culprit

desert heartBOT
#

Maybe — but one important gotcha first: OpenClaw runs QMD with an agent-scoped XDG/QMD environment:

XDG_CONFIG_HOME=~/.openclaw/agents/main/qmd/xdg-config
QMD_CONFIG_DIR=~/.openclaw/agents/main/qmd/xdg-config/qmd
XDG_CACHE_HOME=~/.openclaw/agents/main/qmd/xdg-cache

So running plain qmd vsearch ... in your shell may be looking at a different QMD index/config than OpenClaw. Test the same environment:

XDG_CONFIG_HOME="$HOME/.openclaw/agents/main/qmd/xdg-config" \
QMD_CONFIG_DIR="$HOME/.openclaw/agents/main/qmd/xdg-config/qmd" \
XDG_CACHE_HOME="$HOME/.openclaw/agents/main/qmd/xdg-cache" \
qmd collection list

Then retry your direct command with those env vars:

XDG_CONFIG_HOME="$HOME/.openclaw/agents/main/qmd/xdg-config" \
QMD_CONFIG_DIR="$HOME/.openclaw/agents/main/qmd/xdg-config/qmd" \
XDG_CACHE_HOME="$HOME/.openclaw/agents/main/qmd/xdg-cache" \
qmd vsearch Test --json -n 4 -c memory-root-main -c memory-dir-main -c docs

If it still says Collection not found: memory-root-main, then yes, that’s the culprit: OpenClaw thinks managed collections exist, but QMD’s agent-scoped collection registry doesn’t.

First repair attempt:

openclaw memory index --force --verbose
openclaw memory status --deep
openclaw memory search "test"

If it still times out, temporarily raise the QMD search timeout so you can see the real error instead of the wrapper killing it:

openclaw config set memory.qmd.limits.timeoutMs 30000 --strict-json
openclaw memory search "test"

But yes: the “timeout” may be hiding the actual QMD collection problem. Nice catch.
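
Typing the three environment variables for every test gets tedious, so they can be wrapped once. A sketch assuming the directory layout shown above; generalising `main` to any agent name is a guess, and `QMD_BIN` is a hypothetical override that exists only so the wrapper can be tried without QMD installed.

```shell
# Run qmd with the agent-scoped XDG/QMD environment listed above.
# Usage: qmd_as_agent <agent> <qmd args...>
qmd_as_agent() (
  base="$HOME/.openclaw/agents/${1:?agent name required}/qmd"
  shift
  XDG_CONFIG_HOME="$base/xdg-config" \
  QMD_CONFIG_DIR="$base/xdg-config/qmd" \
  XDG_CACHE_HOME="$base/xdg-cache" \
    "${QMD_BIN:-qmd}" "$@"
)
```

For example, `qmd_as_agent main collection list` lists the collections OpenClaw's QMD instance knows about, and `qmd_as_agent main vsearch Test --json -n 4 -c memory-root-main` repeats the failing search in the same environment.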

patent tulip
#

I think I see a problem as that thing seems to be trying to run using vulkan or something weird.

Models
  Embedding:   https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF
  Reranking:   https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF
  Generation:  https://huggingface.co/tobil/qmd-query-expansion-1.7B-gguf
[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 15.2.0

that looks like ./configure to me and we run into 'problems'

-- Configuring incomplete, errors occurred!
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 15.2.0

desert heartBOT
#

Yep, that’s almost certainly the local QMD/node-llama-cpp path biting you.

What that log means:

  • QMD is loading its own GGUF models locally.
  • node-llama-cpp first tries a prebuilt Vulkan binary.
  • It decides that binary is not compatible, then falls back to CPU/no-GPU.
  • Because the CPU prebuilt also isn’t usable or isn’t present, it starts a native CMake build.
  • -- Configuring incomplete means that native build failed before it even got to compiling.

So your 8s qmd vsearch timeout is probably not “search is slow”; it’s getting stuck/failing while trying to build/load the local llama runtime.

At this point you’ve got two sane paths:

Recommended for your setup: stop using QMD and use builtin memory with your agent-vm Ollama embeddings. That avoids node-llama-cpp entirely:

openclaw config set memory.backend builtin
openclaw config set agents.defaults.memorySearch.provider ollama-agent-vm
openclaw config set agents.defaults.memorySearch.model nomic-embed-text
openclaw memory index --force --verbose
openclaw memory status --deep

If you want to keep QMD: you’ll need to fix the local node-llama-cpp native runtime on that machine. For diagnosis, rerun with debug and grab the first real CMake error after Configuring incomplete:

NODE_LLAMA_CPP_DEBUG=true qmd vsearch Test --json -n 4

The lines shown so far are not the root error yet — the useful bit is usually a missing compiler/toolchain/library, incompatible Node arch, or a failed optional GPU backend probe.

Given your original goal was LAN load distribution, I’d honestly switch to builtin+Ollama here. QMD is neat, but it’s the opposite of what you want: it pulls embedding/rerank/generation work back onto the agent machine.

patent tulip
#

the actual error where configure craps out is this:

-- Adding CPU backend variant ggml-cpu: -march=native
CMake Error at /usr/share/cmake-4.2/Modules/FindPackageHandleStandardArgs.cmake:290 (message):
  Could NOT find Vulkan (missing: Vulkan_LIBRARY Vulkan_INCLUDE_DIR glslc)
  (found version "")
Call Stack (most recent call first):
  /usr/share/cmake-4.2/Modules/FindPackageHandleStandardArgs.cmake:654 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-4.2/Modules/FindVulkan.cmake:694 (find_package_handle_standard_args)
  llama.cpp/ggml/src/ggml-vulkan/CMakeLists.txt:9 (find_package)

desert heartBOT
#

Yep, that’s the smoking gun.

QMD/node-llama-cpp is trying to build llama.cpp with the Vulkan backend enabled, but the machine doesn’t have the Vulkan SDK pieces installed:

missing: Vulkan_LIBRARY Vulkan_INCLUDE_DIR glslc

So you have two choices:

A) Install Vulkan build deps on the agent machine. On Ubuntu/Debian-ish systems that’s typically:

sudo apt install libvulkan-dev vulkan-tools glslc

Package names can vary by distro (shaderc sometimes provides glslc).
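
A quick way to see which of the three missing pieces (`glslc`, headers, library) are present before and after installing packages. These checks are heuristics: the header path and the `vulkan` pkg-config name are typical for Debian-style systems but are assumptions, and `vulkan_build_deps` is a hypothetical helper.

```shell
# Report the three pieces CMake's FindVulkan complained about.
vulkan_build_deps() {
  # glslc is the shader compiler binary CMake looks for on PATH.
  if command -v glslc >/dev/null 2>&1; then echo "glslc: found"; else echo "glslc: missing"; fi
  # Vulkan_INCLUDE_DIR: the usual system header location.
  if [ -f /usr/include/vulkan/vulkan.h ]; then echo "headers: found"; else echo "headers: missing"; fi
  # Vulkan_LIBRARY: the dev package normally ships a vulkan.pc file.
  if pkg-config --exists vulkan 2>/dev/null; then echo "library: found"; else echo "library: missing"; fi
}
```

Run `vulkan_build_deps` on the agent machine. Any `missing` line corresponds to one of the items in the CMake error.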

B) Force CPU/no Vulkan for QMD/node-llama-cpp. This is probably what you want if the agent machine shouldn’t do GPU work. Try setting this in the Gateway environment:

NODE_LLAMA_CPP_CMAKE_OPTION_GGML_VULKAN=OFF

Because OpenClaw runs as a gateway/service, don’t only export it in your interactive shell unless you run the gateway from that shell. Put it somewhere the gateway loads, e.g. ~/.openclaw/.env, then restart the gateway:

echo 'NODE_LLAMA_CPP_CMAKE_OPTION_GGML_VULKAN=OFF' >> ~/.openclaw/.env
openclaw gateway restart

Then retry:

openclaw memory index --force --verbose
openclaw memory search "test"

If that env var doesn’t affect QMD’s node-llama-cpp build path, then the practical fix is either install the Vulkan deps or switch back to builtin+Ollama embeddings. But the root cause is now clear: it’s not OpenClaw search timing out; it’s QMD’s local llama runtime failing during Vulkan-enabled native build.
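
One small caveat with `echo ... >> ~/.openclaw/.env`: every re-run appends another copy of the line. A sketch of an append-if-absent variant, assuming the env file is plain KEY=VALUE lines; `ensure_env_line` is a hypothetical helper.

```shell
# Append a KEY=VALUE line to an env file only if that exact line is absent.
# Usage: ensure_env_line <file> <line>
ensure_env_line() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

Usage: `ensure_env_line ~/.openclaw/.env 'NODE_LLAMA_CPP_CMAKE_OPTION_GGML_VULKAN=OFF'`, followed by the gateway restart as described above.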

patent tulip
#

That did it on top of more RAM. Thanks

desert heartBOT
#

Nice, glad that fixed it 🎉

So the culprit was QMD/node-llama-cpp trying to build/use Vulkan locally, plus not enough RAM headroom. If this thread solved it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person who hits the same QMD/Vulkan rabbit hole.