#installing node-llama-cpp for local memory embeddings


opaque tusk
#

Self-hosted Docker setup: installing node-llama-cpp for local memory embeddings — is this the right approach?
Running OpenClaw 2026.4.25 in a custom Docker image built on top of ghcr.io/openclaw/openclaw:2026.4.25 on a Linux container (Debian 12 base, NVIDIA GPU available, runtime: nvidia configured in compose).
openclaw memory status --deep fails with:
Local embeddings unavailable.
Reason: optional dependency node-llama-cpp is missing (or failed to install).
Detail: Cannot find package 'node-llama-cpp' imported from
/app/dist/memory-core-host-engine-embeddings-BCID_0RN.js
Config has agents.defaults.memorySearch.provider: "local" with no explicit model path — expecting OpenClaw to auto-download embeddinggemma-300m-qat-Q8_0.gguf.
Proposed Dockerfile fix:
RUN apt-get update && apt-get install -y --no-install-recommends \
    cmake clang libgomp1 && \
    rm -rf /var/lib/apt/lists/*

RUN npm install -g node-llama-cpp@3.18.1
Model cache is mounted at /home/node/.node-llama-cpp as a named Docker volume so the auto-downloaded GGUF survives rebuilds.
Questions:

1. Is npm install -g the correct install pattern for a Dockerfile layered on top of the official OpenClaw image? Or should it go through the OpenClaw plugin mechanism instead?
2. Are cmake, clang, and libgomp1 sufficient build tools, or does the openclaw base image already include some of these (making them redundant)?
3. The container has CUDA 12.3 (cuda-cudart-12-3, libcublas-12-3) installed. Will node-llama-cpp automatically detect and use CUDA for embeddings, or does it need additional CUDA libs (e.g. cuda-libraries-12-3) to compile with GPU support?
4. Separate issue: the memory-core dreaming cron job in jobs.json has no agentId field (all other jobs have agentId: "main"). Dreaming ends with status=timeout and missing scope: operator.admin on cleanup. Is the missing agentId the cause, and is the fix simply adding "agentId": "main" to that job entry?

hybrid raftBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

opaque tusk
#

@maiden hill

maiden hillBOT
#

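Short version: don’t use npm install -g here. Install node-llama-cpp into /app/node_modules, next to OpenClaw, because OpenClaw’s local embedding path calls import("node-llama-cpp") from the app runtime. Node resolves that bare import by walking up the node_modules directories from the importing file under /app/dist, and the global npm prefix is not on that path, so a global install usually won’t satisfy the import.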
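To see why a global install is missed, here is a minimal sketch of the lookup Node performs for a bare import. It is simplified: real resolution also handles package exports maps and symlinks. The helper names candidateDirs and findPackage are made up for illustration, and the importer path is the one from the error message in the question.

```javascript
const fs = require("node:fs");
const path = require("node:path").posix; // POSIX paths, as inside the Linux container

// List the node_modules directories checked for a bare import, starting at
// the importing file's directory and walking up to the filesystem root.
function candidateDirs(importerFile) {
  const dirs = [];
  let dir = path.dirname(path.resolve(importerFile));
  for (;;) {
    dirs.push(path.join(dir, "node_modules"));
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs;
}

// Return the first directory on that path that contains the package, or null.
function findPackage(importerFile, pkg) {
  for (const dir of candidateDirs(importerFile)) {
    const pkgDir = path.join(dir, pkg);
    if (fs.existsSync(path.join(pkgDir, "package.json"))) return pkgDir;
  }
  return null;
}

const importer = "/app/dist/memory-core-host-engine-embeddings-BCID_0RN.js";
console.log(candidateDirs(importer));
console.log(findPackage(importer, "node-llama-cpp") ?? "not found on the lookup path");
```

For that importer the candidates are /app/dist/node_modules, /app/node_modules and /node_modules. A global prefix such as /usr/local/lib/node_modules never appears in the list, which is why the package has to live under /app.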

Recommended shape:

FROM ghcr.io/openclaw/openclaw:2026.4.25

USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential python3 cmake clang libgomp1 \
 && rm -rf /var/lib/apt/lists/*

USER node
WORKDIR /app
RUN npm install --no-save --package-lock=false node-llama-cpp@3.18.1

Notes:

  • This is not an OpenClaw plugin install. node-llama-cpp is an optional runtime package used by the built-in local embedding provider. OpenClaw docs describe local memory embeddings as local + node-llama-cpp, with the default GGUF model auto-downloaded when local.modelPath is unset. (docs.openclaw.ai)
  • The official Docker image is slim/non-root and supports baking extra apt packages, but it does not ship a full native build toolchain by default; cmake clang libgomp1 is a bit light if source-build fallback happens. Add build-essential and python3. (docs.openclaw.ai)
  • Your /home/node/.node-llama-cpp volume idea is good. I’d also set this explicitly so there’s no ambiguity:
    agents: {
      defaults: {
        memorySearch: {
          provider: "local",
          fallback: "none",
          local: {
            modelCacheDir: "/home/node/.node-llama-cpp"
          }
        }
      }
    }

CUDA: node-llama-cpp should auto-use CUDA when CUDA support is detected; OpenClaw doesn’t need a separate setting for that. But CUDA 12.3 is suspicious: current node-llama-cpp docs say CUDA prebuilt binaries target CUDA Toolkit 12.4/13.1+, and source builds need a real CUDA toolkit / nvcc, not just cuda-cudart + libcublas. Test inside the container with:

npx --no node-llama-cpp inspect gpu

If you don’t see CUDA: available / VRAM output, either update the container CUDA stack to 12.4+ or use a CUDA devel image/toolkit so it can compile with CUDA. (node-llama-cpp.withcat.ai)
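If it helps to script the version check, the sketch below compares a CUDA version string against the 12.4 minimum mentioned above. The threshold is taken from this reply, not verified independently, and the helper names are hypothetical. It only compares version numbers, so inspect gpu remains the real test.

```javascript
// Extract [major, minor] from text such as "12.3" or a line of
// `nvcc --version` output like "Cuda compilation tools, release 12.3, V12.3.107".
function parseCudaVersion(text) {
  const m = /release (\d+)\.(\d+)/.exec(text) || /(\d+)\.(\d+)/.exec(text);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

// True when `found` is at least `minimum` (major first, then minor).
function meetsMinimum(found, minimum) {
  const a = parseCudaVersion(found);
  const b = parseCudaVersion(minimum);
  if (!a || !b) return false; // unparseable input is treated as "not enough"
  return a[0] > b[0] || (a[0] === b[0] && a[1] >= b[1]);
}

// ASSUMPTION: 12.4 is the minimum for prebuilt CUDA binaries, per the reply above.
console.log(meetsMinimum("12.3", "12.4")); // the asker's container
console.log(meetsMinimum("release 12.4, V12.4.131", "12.4"));
```

With the asker's 12.3 runtime the first call prints false and the second prints true, which matches the suggestion to move to 12.4 or later.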

For Dreaming: missing agentId on the managed memory-core dreaming cron is probably not the cause. Current code intentionally creates that managed cron without agentId; cron resolves missing agentId to the default agent. The missing scope: operator.admin cleanup issue matches a newer fix after 2026.4.25: changelog says memory-core subagent sessions are now tagged with plugin ownership so dreaming cleanup can delete its own ephemeral sessions without broad admin deletion. Also the narrative timeout was raised from 15s to 60s. I’d update to 2026.4.26+ rather than hand-editing jobs.json. (raw.githubusercontent.com)
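If you want to confirm which cron entries omit agentId without editing anything, a read-only check along these lines would do. It assumes the jobs are available as an array of objects; the actual layout of jobs.json is not shown in this thread, so treat the sample data and every field other than agentId as hypothetical.

```javascript
// Return the indexes of job entries that have no usable agentId.
function jobsMissingAgentId(jobs) {
  return jobs
    .map((job, index) => ({ job, index }))
    .filter(({ job }) => typeof job.agentId !== "string" || job.agentId === "")
    .map(({ index }) => index);
}

// HYPOTHETICAL sample: only the agentId field is taken from the question.
const sampleJobs = [
  { agentId: "main" },
  {}, // stands in for the managed memory-core dreaming job
  { agentId: "main" },
];

console.log(jobsMissingAgentId(sampleJobs));
```

For the sample this reports index 1 only. An entry showing up here is not an error by itself, since the managed dreaming job is expected to omit agentId.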

After rebuilding:

docker compose run --rm openclaw-cli node -e "import('node-llama-cpp').then(()=>console.log('ok'))"
docker compose run --rm openclaw-cli npx --no node-llama-cpp inspect gpu
docker compose run --rm openclaw-cli openclaw memory status --deep --agent main
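The first command prints ok on success but gives little detail on failure. A slightly more verbose variant is sketched below. The function name is made up, and the script has to be saved somewhere under /app (for example as a .cjs file) so that the import resolves from the same node_modules as OpenClaw.

```javascript
// Try to import an optional dependency and report the outcome instead of throwing.
async function checkOptionalDependency(name) {
  try {
    await import(name);
    return { ok: true, code: null, message: null };
  } catch (err) {
    // For a missing package the code is typically ERR_MODULE_NOT_FOUND.
    return { ok: false, code: err.code ?? null, message: err.message };
  }
}

checkOptionalDependency("node-llama-cpp").then((result) => {
  console.log(
    result.ok ? "ok" : `import failed (${result.code}): ${result.message}`
  );
});
```

A failure with a different code or message than "Cannot find package" suggests the package is installed but its native binding did not load, which points at the build toolchain or CUDA setup, not at the install location.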
#

Yep — I’d do it, but not with npm install -g.

Use /app/node_modules, because OpenClaw imports node-llama-cpp from the app runtime:

FROM ghcr.io/openclaw/openclaw:2026.4.25

USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential python3 cmake clang libgomp1 \
 && rm -rf /var/lib/apt/lists/*

USER node
WORKDIR /app
RUN npm install --no-save --package-lock=false node-llama-cpp@3.18.1

Answers:

  1. Not a plugin — node-llama-cpp is an optional runtime dependency for built-in local embeddings. Install it beside OpenClaw, not through plugin install.

  2. Base image doesn’t include full build tooling. Your list is close, but I’d add build-essential + python3. libgomp1 is runtime-ish; cmake is needed for source builds.

  3. CUDA won’t be guaranteed from cuda-cudart + libcublas alone. If a prebuilt CUDA binary matches, great. If it needs to compile GPU support, you’ll likely need the CUDA toolkit / nvcc, not just runtime libs. Check inside the container:

    npx --no node-llama-cpp inspect gpu
    
  4. Your /home/node/.node-llama-cpp volume is the right idea. I’d make it explicit:

    agents: {
      defaults: {
        memorySearch: {
          provider: "local",
          fallback: "none",
          local: {
            modelCacheDir: "/home/node/.node-llama-cpp"
          }
        }
      }
    }

  5. Dreaming cron: missing agentId is probably not the cause. The managed memory-core dreaming cron is allowed to omit it and resolve to the default agent. The missing scope: operator.admin cleanup issue matches fixes after 2026.4.25; I’d update to latest 2026.4.26+ rather than hand-editing jobs.json.

Verify after rebuild: