#QMD memory engine: qmd embed timeout loop on ARM (Raspberry Pi 5)

lofty mountain
#

Could be related to #1465911124703576074 message

───

QMD on ARM (Pi 5): qmd embed timeout loop

Environment: Pi 5, aarch64, OpenClaw 2026.4.14, QMD v2.1.0

Problem: Every qmd embed cycle (boot + interval) times out at 120s with exponential backoff that never resolves.

Root cause (two layers):

  1. OpenClaw: embedTimeoutMs only applies to boot embed. Interval embeds use a hardcoded 120s default with no config override.
  2. Upstream (node-llama-cpp): On every QMD invocation, it attempts to compile llama.cpp with Vulkan. Pi has no Vulkan libs, so cmake fails — but the build attempt eats the entire 120s window. NODE_LLAMA_CPP_GPU=false doesn't prevent this (only affects runtime GPU selection, not compile-time detection). A prebuilt CPU binary exists but the Vulkan build is still attempted regardless.
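
To illustrate why the backoff never resolves, here is a minimal simulation. It assumes every attempt costs the same amount of time because the failed build is never cached, so waiting longer between attempts changes nothing. The function name, the 150s attempt cost, and the 10s base backoff are hypothetical; only the 120s timeout comes from the report above.

```python
def simulate_embed_retries(attempt_cost_s, timeout_s, base_backoff_s, max_attempts):
    """Simulate a retry loop with exponential backoff.

    Returns (succeeded, total_wall_clock_seconds).
    Assumption: each attempt takes the same time, since nothing is cached.
    """
    elapsed = 0.0
    for attempt in range(max_attempts):
        if attempt_cost_s <= timeout_s:
            # The attempt fits inside the timeout window and completes.
            return True, elapsed + attempt_cost_s
        # The attempt is killed at the timeout, then we back off: 1x, 2x, 4x, ...
        elapsed += timeout_s + base_backoff_s * (2 ** attempt)
    return False, elapsed


# Hypothetical numbers: a 150s build attempt against the 120s timeout.
stuck = simulate_embed_retries(150, 120, 10, 5)
# The same loop if the attempt only took 90s.
fine = simulate_embed_retries(90, 120, 10, 5)
```

With these inputs the first call never succeeds no matter how many attempts are allowed, while the second succeeds on the first try. Backoff only helps when the failure is transient, which is not the case here.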

Workaround: BM25-only search mode skips vector embeddings entirely:

{"memory": {"backend": "qmd", "qmd": {"searchMode": "search"}}}

Searches return in ~1.8s, no timeouts. Loses vector search/reranking but BM25 is solid for markdown memory files.
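
For context on why this mode needs no embeddings at all, here is a minimal BM25 sketch. It is a generic textbook-style scorer, not QMD's actual implementation: the function name, the whitespace tokenizer, and the k1/b defaults are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Uses only term counts and document lengths, no vectors or models.
    Assumes docs is non-empty and tokenizes on whitespace only.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed idf that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


notes = [
    "qmd embed timeout on raspberry pi",
    "grocery list milk eggs",
    "vulkan build fails on pi",
]
scores = bm25_scores("embed timeout", notes)
best = scores.index(max(scores))
```

Everything here is counting and arithmetic, which is why this kind of search stays fast on a Pi and never touches the llama.cpp build path.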

Suggestions:
• Make interval embed timeout configurable
• Skip embed cycles when searchMode: "search" is set (BM25 doesn't use vectors)
• Flag upstream to node-llama-cpp: Vulkan build failure should be cached, not retried every invocation
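
A rough sketch of what the first two suggestions could look like as scheduling logic. This is not OpenClaw's code: plan_embed_cycle and DEFAULT_EMBED_TIMEOUT_MS are hypothetical names, and the only things taken from this thread are the config keys (searchMode, update.embedInterval, update.embedTimeoutMs) and the 120s default.

```python
DEFAULT_EMBED_TIMEOUT_MS = 120_000  # the 120s default reported in this thread


def plan_embed_cycle(qmd_config, phase):
    """Decide whether to run an embed cycle and with which timeout.

    phase is "boot" or "interval". Returns the timeout in milliseconds,
    or None when the cycle should be skipped. Hypothetical helper.
    """
    # Suggestion 2: BM25-only search never reads vectors, so skip embedding.
    if qmd_config.get("searchMode") == "search":
        return None

    update = qmd_config.get("update", {})

    # An interval of "0" disables periodic embeds (boot is left alone here).
    if phase == "interval" and str(update.get("embedInterval", "")) == "0":
        return None

    # Suggestion 1: the same configurable timeout for boot and interval cycles.
    return int(update.get("embedTimeoutMs", DEFAULT_EMBED_TIMEOUT_MS))


skip_search = plan_embed_cycle({"searchMode": "search"}, "boot")
skip_interval = plan_embed_cycle({"update": {"embedInterval": "0"}}, "interval")
custom = plan_embed_cycle({"update": {"embedTimeoutMs": 600000}}, "interval")
default = plan_embed_cycle({}, "boot")
```

The point of routing both phases through one function is that a timeout override cannot apply to boot and silently miss the interval path.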

coral creek BOT
#

nice catch, and yeah, this looks like a real bug on the OpenClaw side.

I checked the docs/source here, and a few things are confirmed:

  • memory.qmd.update.embedTimeoutMs is a real supported config key, and it is documented as applying to each qmd embed cycle, not just boot. If interval embeds are still hardcoded to 120000, that is a regression/bug.
  • memory.qmd.update.embedInterval is also a real knob, and 0 is documented to disable periodic embeds.
  • searchMode is real too, but current docs still describe OpenClaw as running qmd update + qmd embed on boot and periodically, so skipping embeds when searchMode: "search" is set is not current behavior, just a good improvement idea.

So for now, the safest workaround looks like:

{
  memory: {
    backend: "qmd",
    qmd: {
      searchMode: "search",
      update: {
        embedInterval: "0"
      }
    }
  }
}

That should stop the periodic embed loop. If your QMD setup also avoids embeddings in search mode, even better, but I would treat that as QMD-specific behavior, not an OpenClaw guarantee.

The node-llama-cpp Vulkan rebuild attempt on every invocation also sounds like a separate upstream bug, especially if NODE_LLAMA_CPP_GPU=false does not stop the compile-time probe/cache miss path.
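
As a sketch of what "cache the failure" could mean in practice, here is a small negative-cache pattern. It is illustrative only and does not reflect node-llama-cpp's internals or API: probe_with_negative_cache, the build-probe.json file name, and the failing_build stand-in are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def probe_with_negative_cache(cache_file, backend, try_build):
    """Run try_build(backend) at most once and remember the outcome.

    Failures are cached as well as successes, so a backend that cannot
    build is not retried on every invocation. Hypothetical helper.
    """
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    if backend in cache:
        return cache[backend]  # cached outcome, including a cached failure
    ok = bool(try_build(backend))
    cache[backend] = ok
    cache_file.write_text(json.dumps(cache))
    return ok


calls = []


def failing_build(backend):
    """Stand-in for an expensive build attempt that always fails."""
    calls.append(backend)
    return False


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "build-probe.json"
    first = probe_with_negative_cache(cache_file, "vulkan", failing_build)
    second = probe_with_negative_cache(cache_file, "vulkan", failing_build)
```

The second call returns the cached failure without invoking the build again. A real version would also need some way to invalidate the entry, for example after Vulkan libraries are installed.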

lofty mountain
coral creek BOT
#

perfect, thanks for filing it.

That’s the right place, and the writeup is strong. I’d keep the current workaround for now:

{
  memory: {
    backend: "qmd",
    qmd: {
      searchMode: "search",
      update: {
        embedInterval: "0"
      }
    }
  }
}

Small nuance: embedInterval: "0" should stop the periodic embed loop, but I’m not confident it suppresses the boot-time embed too. If you test that behavior, it’d be useful to add the result to the issue.

If you also open an upstream issue for the repeated Vulkan compile probe in node-llama-cpp, linking both bugs together would help a lot.

lofty mountain
#

Tested behavior with embedInterval: "0":

• ✅ Periodic interval embeds: fully suppressed (no interval embed messages in logs after 1+ hour)
• ⚠️ Boot embed: unclear — no boot embed success/failure logged. May have been suppressed or completed silently against cached embeddings. Couldn't confirm either way from logs alone.

coral creek BOT
#

that’s super useful, thanks.

So the picture now looks like:

  • embedInterval: "0" is a valid mitigation for the periodic loop
  • the boot-time path is still ambiguous, partly because the logging/observability there seems weak

I’d add that exact result to the issue, since it helps narrow the bug a lot:

  • interval embeds: confirmed suppressed
  • boot embed: behavior unclear, logs not conclusive

That makes this feel like two separate fixes may be needed:

  1. timeout handling/config on the periodic path
  2. better boot-embed behavior or at least clearer logging around whether it ran/skipped/succeeded

lofty mountain
#

Updated issue with above

coral creek BOT
#

awesome, thanks, that should make the issue a lot easier to act on.

You’ve already narrowed it down better than most bug reports do:

  • confirmed workaround
  • confirmed interval suppression with embedInterval: "0"
  • clear uncertainty isolated to boot behavior
  • likely upstream node-llama-cpp interaction called out separately

That’s a very solid repro.
