#why cant I see what OpenClaw is thinking


abstract acornBOT
#

First, can you paste openclaw --version?

Most likely this is the mismatch:

  • /think changes reasoning effort/budget
  • /reasoning on changes whether returned reasoning is shown
  • /verbose on shows tool/activity traces, which is usually the closer answer to “what is OpenClaw planning/running”

A few important gotchas:

  • On Discord, reasoning shows up as a separate message starting with Reasoning:, not inline.
  • With Qwen/OpenAI-compatible backends, OpenClaw can only show reasoning if the upstream actually emits reasoning fields. If it doesn't, /reasoning on has nothing to display.
  • If you want to see actions/tool calls, try /verbose on or /verbose full, not just /think or /reasoning.

Quick checks:

  • /think → shows current effective thinking level
  • /reasoning → shows current reasoning visibility
rapid garden
#

matias@matias-System-Product-Name:~$ openclaw --version
OpenClaw 2026.4.15 (041266a)
matias@matias-System-Product-Name:~$

/status
🦞 OpenClaw 2026.4.15 (041266a)
🧠 Model: llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P · 🔑 unknown
🧮 Tokens: 0 in / 0 out
📚 Context: 0/220k (0%) · 🧹 Compactions: 7
🧵 Session: agent:main:main • updated just now
⚙️ Runtime: direct · Think: high · Reasoning: on
🪢 Queue: collect (depth 1)

abstract acornBOT
#

Yep, the key clue is this line:

Model: llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P

So you're not on native qwen/..., you're on a local/custom OpenAI-compatible llama-server backend running a Qwen model behind it. That matters.

What your status means:

  • Think: high = OpenClaw session is set to ask for more reasoning effort
  • Reasoning: on = OpenClaw is allowed to show reasoning if the backend returns it
  • it does not guarantee the backend actually emits a separate reasoning stream

And for local/custom OpenAI-compatible /v1 backends, the docs call out that OpenClaw does not apply OpenAI reasoning-compat payload shaping there. So with llama-server/..., if the server/model does not emit structured reasoning fields, OpenClaw has nothing to show.
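
One way to check whether the backend emits reasoning at all is to look at a raw chat completion response from it. A minimal Python sketch, assuming the server follows the OpenAI chat-completions response shape and, when it separates reasoning, puts it in a field such as `reasoning_content` on the message. The field names and both sample responses are assumptions for illustration; inspect a real response from your own server.

```python
from typing import Optional

# Field names some OpenAI-compatible servers use for separated reasoning.
# These are assumptions; check what your server actually returns.
REASONING_KEYS = ("reasoning_content", "reasoning")


def extract_reasoning(response: dict) -> Optional[str]:
    """Return the reasoning text from a chat completion response, or None."""
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        for key in REASONING_KEYS:
            value = message.get(key)
            if isinstance(value, str) and value.strip():
                return value
    return None


# Hypothetical responses, for illustration only.
with_reasoning = {
    "choices": [{"message": {"content": "4", "reasoning_content": "2 + 2 = 4"}}]
}
without_reasoning = {
    "choices": [{"message": {"content": "<think>2 + 2 = 4</think>4"}}]
}

print(extract_reasoning(with_reasoning))     # 2 + 2 = 4
print(extract_reasoning(without_reasoning))  # None
```

If this returns None for a real response, the thinking is either absent or inlined into the normal content, and /reasoning on has no separate stream to display.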

So the short answer is:

  1. Your flags are set correctly
  2. The backend likely isn't exposing reasoning in a way OpenClaw can surface
  3. If you want to see what OpenClaw is doing, use:
    • /verbose on
    • or /verbose full
      That shows tool/activity traces, which is usually what people mean by “planning/running”

Also, Reasoning: on in /status is just the visibility toggle, not proof that reasoning blocks are available.

If you want, paste your models.providers.llama-server config block. I can tell you whether this is:

  • a model capability issue,
  • a llama-server transport issue,
  • or a config issue like the model not being marked reasoning: true.
rapid garden
#

okay. now, is my embedding setup done correctly?

{
"agents": {
"defaults": {
"workspace": "/home/matias/.openclaw/workspace",
"model": {
"primary": "llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P"
},
"memorySearch": {
"provider": "local",
"local": {
"modelPath": "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
},
"extraPaths": [],
"sync": {
"onSearch": false,
"onSessionStart": true,
"watch": false
},
"store": {
"vector": {
"extensionPath": "/home/matias/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so",
"enabled": true
}
}
},
"compaction": {
"mode": "safeguard",
"reserveTokensFloor": 20000,
"notifyUser": true
},
"contextLimits": {
"toolResultMaxChars": 2000
},
"contextPruning": {
"mode": "cache-ttl",
"ttl": "5m"
}
}
},
"models": {
"mode": "merge",
"providers": {
"llama-server": {
"baseUrl": "http://127.0.0.1:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "Qwen3.6-35B-A3B-Uncensored.Q6_K_P",
"name": "Local llama.cpp model",
"reasoning": false,
"input": [
"text"
],
"contextWindow": 220000,
"maxTokens": 8192
}
]
}
}
},
"gateway": {
"mode": "local",
"auth": {
"mode": "token",
"token":

abstract acornBOT
#

Mostly yes, your embedding config looks valid.

What looks right:

  • memorySearch.provider: "local" ✅
  • memorySearch.local.modelPath pointing at a GGUF embedding model ✅
  • store.vector.enabled: true ✅
  • sync.onSessionStart: true ✅

A few important notes though:

  1. reasoning: false is unrelated to embeddings
    • that only affects your chat model, not memory embeddings
  2. extraPaths: [] means you are not indexing extra docs
    • so memory search will mostly use the default memory roots
    • if you expected arbitrary notes/docs outside the default memory files, you're not there yet
  3. Your sync policy is conservative
    • onSessionStart: true
    • onSearch: false
    • watch: false
      This means updates will not be picked up live. If you edit memory files, you’ll usually need a restart/new session or:
    • openclaw memory index --force
  4. store.vector.extensionPath is probably unnecessary
    • valid, yes
    • but a bit fragile across upgrades
    • docs say to override it only if auto-discovery fails

So my verdict is:

  • Config shape: yes, looks correct
  • Will local embeddings work: probably yes
  • Is it ideal: mostly, but I’d consider removing the hardcoded extensionPath unless you actually needed it
rapid garden
#

what should I do with extrapaths? if I have a vast corpus of data in json files which have been indexed, how do I tell OpenClaw to use them? should I move them somewhere myself or will OpenClaw do it? what should I do now?

abstract acornBOT
#

extraPaths is basically: “here are more files/directories OpenClaw should index itself.”

A few important clarifications:

  • OpenClaw will not move your corpus for you
  • OpenClaw will not automatically reuse some other prebuilt JSON/vector index
  • With the builtin memory engine, extraPaths is for Markdown files (.md) plus optional multimodal files, not raw .json

So if your corpus is a big pile of JSON files, the answer is:

What to do now

Best simple path

  1. Convert the JSON corpus into text/Markdown files
    • one file per record, document, chunk, etc.
  2. Put them in a directory, for example:
    • /home/matias/openclaw-corpus/
  3. Add that directory to memorySearch.extraPaths
  4. Reindex:
    openclaw memory index --force

  5. Test:
    openclaw memory search "some known phrase"
    

Example:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
        local: {
          modelPath: "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
        },
        extraPaths: ["/home/matias/openclaw-corpus"]
      }
    }
  }
}

What extraPaths is good for

  • notes outside the workspace
  • docs folders
  • exported knowledge bases
  • Markdown archives you want searchable

What it is not

  • a pointer to an existing JSON embedding store
  • an automatic importer for arbitrary .json corpora

If your corpus is huge

I’d seriously consider QMD for the search backend, but even then the practical move is still to give it text/markdown-ish source files, not just raw JSON blobs.

So my recommendation is:

  • Do not move files into memory/ manually unless you want them to become your main memory notes
  • instead, make a separate corpus folder
  • convert JSON → .md
  • point extraPaths there
  • run openclaw memory index --force
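
The JSON → .md step could look roughly like the sketch below. It assumes each .json file holds either one object or a list of objects, and that the main text lives under keys named `title` and `text`. Those key names, the output file naming, and the directory layout are hypothetical, so adapt them to your corpus.

```python
import json
from pathlib import Path


def record_to_markdown(record: dict, title_key: str = "title", body_key: str = "text") -> str:
    """Render one JSON record as a small Markdown document."""
    title = str(record.get(title_key) or "Untitled")
    body = str(record.get(body_key) or "")
    # Keep the remaining fields as a bullet list so they stay searchable.
    extras = [f"- {k}: {v}" for k, v in record.items() if k not in (title_key, body_key)]
    parts = [f"# {title}", body.strip()]
    if extras:
        parts.append("\n".join(extras))
    return "\n\n".join(p for p in parts if p) + "\n"


def convert_corpus(src_dir: Path, dst_dir: Path) -> int:
    """Write one .md file per record found in src_dir/*.json; return the count."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for json_path in sorted(src_dir.glob("*.json")):
        data = json.loads(json_path.read_text(encoding="utf-8"))
        records = data if isinstance(data, list) else [data]
        for i, record in enumerate(records):
            if not isinstance(record, dict):
                continue  # skip scalars; adapt if your files look different
            out = dst_dir / f"{json_path.stem}-{i:04d}.md"
            out.write_text(record_to_markdown(record), encoding="utf-8")
            written += 1
    return written
```

Something like `convert_corpus(Path("/path/to/json"), Path("/home/matias/openclaw-corpus"))` would then fill the folder that extraPaths points at. Try it on a handful of files and spot-check the output before converting everything.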
rapid garden
#

I reindexed last night over 7000 files, not going to do that again as everything was supposed to be working properly, how do I know OpenClaw has actually learned something from it

abstract acornBOT
#

Yeah, don’t reindex 7000 files blind again 😅

The important distinction is:

OpenClaw does not “learn” them like training a model.
It indexes them into a searchable memory store.

So the question is really: did those files become retrievable?

How to check without reindexing

1. Check the index health/counts

openclaw memory status --deep --json

What you want to see:

  • embedding provider = your local one
  • vector search healthy
  • non-zero files / chunks
  • ideally sourceCounts that make sense

2. Probe with a known unique phrase

Pick a phrase that exists in exactly one of those files:

openclaw memory search "very specific phrase from one file" --json

If OpenClaw really indexed it, you should get:

  • the matching snippet
  • the source path
  • a sensible score

3. Sanity-check with something impossible to fake

Search for:

  • a weird UUID
  • a rare error string
  • a unique title
  • a specific proper noun

That tells you way more than “reindex completed”.


The big gotcha

If those 7000 files were mostly JSON, then with the builtin memory engine that is a problem.

Builtin memory indexing for extraPaths is basically:

  • .md files ✅
  • optional image/audio with multimodal ✅
  • raw .json files ❌

So if the corpus was JSON, the painful possibility is:

  • the reindex job ran
  • but the builtin memory system did not actually ingest those JSON files as memory content

That’s exactly why a targeted memory search test is the truth serum here.


What I’d do right now

Run these, in order:

rapid garden
#

matias@matias-System-Product-Name:~$ openclaw memory search "trading 212" --json
{
"results": []
}

so I did all of it for nothing

abstract acornBOT
#

That’s frustrating, yeah.

Based on everything you’ve shown, the likely answer is:

for the JSON corpus, probably yes, the builtin memory path did not ingest it the way you expected.

Not because memory is broken in general, but because with your current setup:

  • memorySearch.extraPaths is for files OpenClaw can index itself
  • the builtin engine expects Markdown/text-like content
  • raw .json corpus files are not the happy path there

So the empty result strongly suggests OpenClaw cannot currently recall that corpus.

A more precise version:

  • if you indexed .md files too, those may still be fine
  • but if the important 7k files were JSON, that work did not become useful memory search

What I’d do now

Best supported path

Convert the JSON corpus to .md or plain text, then point extraPaths at that folder and reindex once.

If the corpus is huge

Use QMD as the backend, but still feed it text/markdown documents, not raw JSON blobs.

If you want to keep the existing JSON/vector setup as-is

Then honestly, memorySearch is probably the wrong layer.
You’d want a custom tool/plugin that queries your existing store directly instead of trying to force it through builtin memory indexing.

My recommendation

Don’t keep poking this current setup hoping it’ll suddenly work.

Your clean options are:

  1. Convert JSON → Markdown/text and reindex ← best practical fix
  2. Switch to QMD for large corpora
  3. Build/query your existing index directly via a tool/plugin

If you want, I can help you design the fastest JSON → Markdown conversion format so you only do the painful reindex one more time, properly.

rapid garden
#

if I indexed it all last night, I guess it saved the stuff somewhere? how do I point the stuff

abstract acornBOT
#

If you indexed it through OpenClaw, then yes, it should have saved into OpenClaw’s own memory store.

For the builtin memory engine, that is usually:

~/.openclaw/memory/{agentId}.sqlite

So for the default agent, it’s often roughly:

~/.openclaw/memory/main.sqlite

Important distinction

There are two different things you can “point”:

  1. extraPaths → points OpenClaw at source files to index
  2. memorySearch.store.path → points OpenClaw at its own SQLite index file

So if you already indexed last night, you do not normally “point it again” unless:

  • you changed agent IDs
  • you changed config/store path
  • or you’re now searching with a different memory backend/config than the one used for indexing

Key insight

If the config is the same, you are already pointing at the saved index.

So if search is still empty, the likely issue is not “where is the DB?”, it’s one of these:

  • the DB does not contain the files you expected
  • those files were skipped
  • or they were never valid inputs for builtin memory search, like raw JSON

What to check

openclaw memory status --deep --json
ls -lah ~/.openclaw/memory

What I’d look for:

  • actual store path
  • agent id
  • file count
  • chunk count

If you want to force a custom location for OpenClaw’s own DB, that’s this key:

agents: {
  defaults: {
    memorySearch: {
      store: {
        path: "/some/path/main.sqlite"
      }
    }
  }
}

But that only works for OpenClaw’s SQLite store, not some arbitrary external JSON/vector index.

So the short answer is:

  • yes, it probably saved somewhere
  • no, extraPaths is not how you point at the saved DB
  • if you indexed with the same agent/config, OpenClaw should already be using it

Paste openclaw memory status --deep --json and I can tell you exactly whether you’re hitting the right store or the wrong one.

rapid garden
#

yes theres around 7200 .md files in /home/matias/.openclaw/workspace/memory, step by step what to do now

abstract acornBOT
#

Ah, okay, that changes things.

If the files are already .md under /home/matias/.openclaw/workspace/memory, then don’t move anything and ignore extraPaths for now. Those are already in the default memory location.

Do this, step by step:

1. Verify the phrase actually exists

Run:

rg -n -i "trading 212|trading212" /home/matias/.openclaw/workspace/memory | head -20

Why:

  • if this returns nothing, memory search is not the problem, the phrase just is not in those files in that form

2. Check what OpenClaw thinks the memory index contains

Run:

openclaw memory status --agent main --deep --json

What you want to see:

  • non-zero files
  • non-zero chunks
  • a sane store path
  • healthy embedding/vector status

3. If files or chunks are zero, the index is not actually populated

Then run:

openclaw memory status --agent main --deep --index --verbose

That’s the least dumb next step because it only reindexes if the store is dirty, instead of blindly forcing everything again.

4. If status looks healthy, test with an exact phrase from step 1

Take one literal phrase from an rg hit and run:

openclaw memory search --agent main "EXACT PHRASE FROM ONE FILE" --json

Use something very specific, not a vague concept.

5. Interpret the result

  • rg finds it, but memory search does not, and status shows plenty of files/chunks
    → that smells like an indexing/search bug or wrong store/agent mismatch
  • rg does not find it
    → the search term is the issue
  • status shows 0 files/chunks
    → the index was never built or is not being read

6. Paste back these two outputs

Please paste:

  1. openclaw memory status --agent main --deep --json
  2. one rg hit + the matching openclaw memory search ... --json

Then I can tell you exactly which layer is failing.

So short version: don’t move files, don’t touch extraPaths, verify the text exists, then check the index.
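
The interpretation in step 5 can be written down as a tiny decision function. This is only a sketch of the reasoning above, not anything OpenClaw ships; the function name and its return strings are made up for illustration.

```python
def diagnose(rg_found: bool, search_hits: int, indexed_files: int, indexed_chunks: int) -> str:
    """Map the three checks (rg, memory status, memory search) to the likely failing layer."""
    if not rg_found:
        # The text is not on disk in that form, so no index could find it.
        return "search term: the phrase is not in the files in that form"
    if indexed_files == 0 or indexed_chunks == 0:
        return "index: it was never built or is not being read"
    if search_hits == 0:
        return "retrieval: indexing/search bug or wrong store/agent"
    return "ok: memory search can recall the phrase"


# Example: rg finds the phrase, status shows content, search returns nothing.
print(diagnose(rg_found=True, search_hits=0, indexed_files=100, indexed_chunks=300))
```

The order of the checks matters: confirm the text exists first, then that the index has content, and only then blame retrieval.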

rapid garden
#

matias@matias-System-Product-Name:~$ rg -n -i "trading 212|trading212" /home/matias/.openclaw/workspace/memory | head -20
/home/matias/.openclaw/workspace/memory/investment-allocation-analysis-part2.md:66:I also have my card in Trading 212, where I have around 1 000-1 500 of balance.

what now

abstract acornBOT
#

Good, that proves the content is really there.

Now do these next two commands only:

1. Check whether the memory DB actually has content

openclaw memory status --agent main --deep --json

2. Search for a more exact phrase from that line

openclaw memory search --agent main "card in Trading 212" --json

Why this matters:

  • if status shows lots of files/chunks but the exact phrase still returns nothing, that’s likely a real indexing/search bug or wrong-store mismatch
  • if status shows 0 files/chunks, the DB is empty or not the one being used

Paste both outputs and I’ll tell you the next move.

rapid garden
#

matias@matias-System-Product-Name:~$ openclaw memory status --agent main --deep --json
│
◇
[
{
"agentId": "main",
"status": {
"backend": "builtin",
"files": 7268,
"chunks": 20083,
"dirty": false,
"workspaceDir": "/home/matias/.openclaw/workspace",
"dbPath": "/home/matias/.openclaw/memory/main.sqlite",
"provider": "local",
"model": "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf",
"requestedProvider": "local",
"sources": [
"memory"
],
"extraPaths": [
"/home/matias/openclaw-corpus"
],

returns the config

abstract acornBOT
#

That’s actually good news.

This confirms:

  • OpenClaw is reading a real index
  • it has 7268 files
  • it has 20083 chunks
  • it’s using main.sqlite
  • the index is not dirty

So this is not “nothing got saved”.

It means the problem is now narrowed down to retrieval/search behavior, not missing files.
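
If you want to pull just those fields out of the status output instead of reading the whole dump, here is a small Python sketch. It assumes the JSON has the shape shown in the paste above (an array of entries with `agentId` and `status`) and that only stdout is passed in; the function name is hypothetical.

```python
import json


def summarize_status(raw: str, agent_id: str = "main") -> dict:
    """Extract the key fields from `openclaw memory status --deep --json` output."""
    # The CLI may print decorative lines before the JSON, so start at the first "[".
    # This assumes no bracketed log lines (for example from stderr) are mixed in.
    start = raw.find("[")
    if start == -1:
        raise ValueError("no JSON array found in output")
    for entry in json.loads(raw[start:]):
        if entry.get("agentId") == agent_id:
            status = entry.get("status", {})
            return {k: status.get(k) for k in ("files", "chunks", "dirty", "dbPath", "provider")}
    raise KeyError(f"agent {agent_id!r} not in status output")


# Trimmed sample based on the paste above.
sample = """│
◇
[{"agentId": "main", "status": {"backend": "builtin", "files": 7268, "chunks": 20083,
  "dirty": false, "dbPath": "/home/matias/.openclaw/memory/main.sqlite", "provider": "local"}}]"""

print(summarize_status(sample))
```

Non-zero files and chunks with dirty set to false is the signal that the index exists and is the one being read.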

Next step

Run the exact-phrase search now:

openclaw memory search --agent main "I also have my card in Trading 212" --json

If that still comes back empty, then we’ve basically proven:

  • the file is on disk
  • the index exists
  • but retrieval is failing for a literal hit

…and that starts looking like a real bug / search mismatch.

If it returns a hit, then the issue is just that "trading 212" was too weak / odd for the current retrieval path.

Paste that exact search output next.

rapid garden
#

{
"results": []
}

abstract acornBOT
#

Yep, that output is normal-ish, and the part you pasted already tells us a lot.

We now know:

  • the DB exists
  • it has 7268 files / 20083 chunks
  • OpenClaw is reading main.sqlite

So the next thing that matters is whether the two search engines are healthy:

  • FTS (keyword/literal match)
  • vector (embedding search)

Please run this:

openclaw memory status --agent main --deep --json | rg -n '"fts"|"vector"|error|available|loadError|fallback' -A3

What I’m looking for:

  • fts.available: true
  • vector.available: true

Why:

  • if FTS is unavailable, exact string hits can fail
  • if vector is unavailable, semantic search can fail
  • if both are bad, you get exactly the behavior you’re seeing

So we’re very close now, paste that part next.

rapid garden
#

^Cmatias@matias-System-Product-Name:~$ openclaw memory status --agent main --deep --json | rg -n '"fts"|"vector"|error|available|loadError|fallback' -A3
[node-llama-cpp] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2898.00 MiB on device 0: cudaMalloc failed: out of memory
[node-llama-cpp] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 3038775808
[node-llama-cpp] graph_reserve: failed to allocate compute buffers
[node-llama-cpp] llama_init_from_model: failed to initialize the context: failed to allocate compute pp buffers
[node-llama-cpp] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2448.00 MiB on device 0: cudaMalloc failed: out of memory
[node-llama-cpp] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 2566916608
[node-llama-cpp] graph_reserve: failed to allocate compute buffers
[node-llama-cpp] llama_init_from_model: failed to initialize the context: failed to allocate compute pp buffers

this is the issue, do I have to run this via node-llama-cpp, and why?

abstract acornBOT
#

Yep, that’s the issue.

Short version:

  • You do not run it manually via node-llama-cpp
  • OpenClaw is already using node-llama-cpp internally
  • it does that because you configured memorySearch.provider: "local" with a GGUF embedding model
  • your chosen embedding model is blowing up on GPU memory

So this config:

memorySearch: {
  provider: "local",
  local: {
    modelPath: "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
  }
}

means:

“OpenClaw, please load this local GGUF embedding model yourself.”

And the local GGUF loader OpenClaw uses for that is node-llama-cpp.

Why it fails

Your logs show GPU OOM:

  • allocating ~2.4 to 2.9 GiB
  • cudaMalloc failed: out of memory

So the 4B Qwen embedding model is too heavy for the available VRAM in this path.

What to do now

Easiest fix

Use a smaller local embedding model.

The docs default is a much smaller local model:

  • embeddinggemma-300m-qat-Q8_0.gguf

So either:

  1. remove local.modelPath and let OpenClaw use its smaller default, or
  2. point it to a much smaller embedding GGUF than Qwen3-Embedding-4B-Q6_K.gguf

Important downside

If you change embedding provider/model, you will need a reindex for proper semantic search.
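
Why the reindex matters: search compares the query vector against the stored chunk vectors, and that comparison only makes sense when both come from the same model. A toy Python sketch of the idea follows; the vectors are invented, and this is not how OpenClaw or sqlite-vec implement the comparison internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; only defined for vectors of the same length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors, for illustration only. Real embeddings have hundreds or
# thousands of dimensions.
stored_chunk = [0.1, 0.3, 0.5, 0.1]     # pretend this came from the indexing model
query_same_model = [0.1, 0.2, 0.6, 0.1]
query_other_model = [0.4, 0.9]          # a different model with a different width

print(round(cosine_similarity(stored_chunk, query_same_model), 3))
try:
    cosine_similarity(stored_chunk, query_other_model)
except ValueError as err:
    print(err)  # dimension mismatch: 4 vs 2
```

Even when two models happen to produce vectors of the same width, their vector spaces are unrelated, so scores against an index built by another model are not meaningful.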

If you do not want node-llama-cpp involved at all

Then don’t use:

memorySearch.provider: "local"

Use another embedding provider instead, for example:

  • ollama
  • openai
  • gemini
  • voyage
  • mistral

My recommendation

Given where you are, I’d do this:

rapid garden
#

no I wont reindex, I just reindexed with 4B yesterday, does the embedding model have to be running along with openclaw all the time? cant I just run it via llama-server

abstract acornBOT
#

Not all the time, but at query time too, yes.

That’s the key bit:

  • during indexing, OpenClaw embeds all your chunks
  • during search, OpenClaw also has to embed your query
  • so some embedding backend must be available whenever you search

With your current config

Because you set:

memorySearch.provider: "local"

OpenClaw tries to load the GGUF embedding model itself via node-llama-cpp.

So in your current setup:

  • yes, OpenClaw needs to be able to load that embedding model on demand
  • that’s why node-llama-cpp shows up
  • that’s why GPU OOM is killing it

Can you use llama-server instead?

Yes, in principle.

But then you should not use provider: "local".

You’d switch to a remote/OpenAI-compatible embedding setup, for example:

memorySearch: {
  provider: "openai",
  model: "your-embedding-model-name",
  remote: {
    baseUrl: "http://127.0.0.1:8081/v1",
    apiKey: "dummy"
  }
}

That only works if your llama-server instance actually exposes a working /v1/embeddings endpoint for the embedding model.
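
To test that endpoint outside OpenClaw, something like this Python sketch would do. It assumes the server accepts the OpenAI embeddings request shape (`model` plus `input`) and returns vectors under `data[0].embedding`. The helper names are hypothetical, and `probe` needs the server to be running.

```python
import json
import urllib.request


def build_embedding_request(base_url: str, model: str, text: str,
                            api_key: str = "dummy") -> urllib.request.Request:
    """Build a POST to {base_url}/embeddings in the OpenAI request shape."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def embedding_dims(response: dict) -> int:
    """Return the length of the first embedding in an OpenAI-style response."""
    data = response.get("data") or []
    if not data or not isinstance(data[0].get("embedding"), list):
        raise ValueError("response has no embedding; is the server serving embeddings?")
    return len(data[0]["embedding"])


def probe(base_url: str, model: str) -> int:
    """Send one test sentence and report the vector width (needs a running server)."""
    req = build_embedding_request(base_url, model, "hello world")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return embedding_dims(json.load(resp))
```

Calling `probe("http://127.0.0.1:8081/v1", "your-embedding-model-name")` should return a positive integer if the endpoint works; an HTTP error or a ValueError means the server is not serving embeddings the way OpenClaw expects.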

The annoying truth

I don’t want to bullshit you here:

If you change how OpenClaw sees the embedding provider/model, it may still want a reindex to be fully consistent.

So:

  • current local config = no manual server, but node-llama-cpp must load the model
  • llama-server config = no node-llama-cpp for embeddings, but only if llama-server serves embeddings properly
  • switching transport/provider may still force a reindex

If you want, I can give you the exact config shape for using llama-server as the embedding backend so you can at least test that path.

rapid garden
#

okay I will change to a 0.6b model, how do we make sure we can run it? any way to run it on CPU so it wont run out of memory?

abstract acornBOT
#

Yes, there’s a sane way to test this first.

Good news

A 0.6B embedding model is much more realistic.

Important caveat

I don’t see an OpenClaw memorySearch config knob to force CPU-only for the built-in local embedding path.

From the current code path, local embeddings are loaded through node-llama-cpp with basically just the modelPath, not a CPU/GPU/offload setting exposed in OpenClaw config.

So:

  • smaller model = yes, supported and sensible
  • force CPU in OpenClaw local embeddings = I don’t think there’s a first-class config switch for that right now

How to make sure the 0.6B model runs

  1. Change memorySearch.local.modelPath to the 0.6B GGUF
  2. Run this:
    openclaw memory status --agent main --deep
    
  3. If it completes without CUDA OOM, the model is loadable in your setup

That command is the best quick probe because it tries the embedding backend without making you guess.

If you want CPU on purpose

The cleaner path is usually:

  • run the embedding model in llama-server / Ollama / LM Studio configured the way you want
  • then point OpenClaw at that remote embedding endpoint
  • instead of using memorySearch.provider: "local"

That avoids the built-in node-llama-cpp local-loading path.

My recommendation

If you want the least drama:

  • try the 0.6B model first
  • if it still fights your GPU setup, switch to a remote embedding server
  • don’t keep wrestling the 4B local path

If you want, send me the exact 0.6B model path you plan to use and I’ll tell you whether the config change looks right before you try it.

rapid garden
#

okay I am running it on llama-server under 127.0.0.1 8081

abstract acornBOT
#

Nice, that’s the cleaner path.

Then I’d stop using provider: "local" and point memory search at that server instead.

Config shape to try

Use something like:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai",
        model: "YOUR_EMBEDDING_MODEL_ID",
        remote: {
          baseUrl: "http://127.0.0.1:8081/v1",
          apiKey: "dummy"
        }
      }
    }
  }
}

Important

  • remove the local.modelPath block when you do this
  • YOUR_EMBEDDING_MODEL_ID should be whatever your llama-server exposes for the embedding model
  • if llama-server does not require auth, a dummy API key is usually fine

First check

Make sure the server really exposes the embedding model:

curl http://127.0.0.1:8081/v1/models

If that looks right, then after updating config run:

openclaw memory status --agent main --deep

If that passes without the node-llama-cpp CUDA OOM spam, then OpenClaw is talking to llama-server for embeddings instead of trying to load the GGUF locally.
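
When reading the /v1/models response, the useful bits are the model id (that is what goes into memorySearch.model) and, if your llama-server build includes it, the `meta` block with `n_embd` and `n_params`. A small Python sketch under that assumption; the sample id is made up, and the `meta` fields may be missing on other servers.

```python
def describe_models(models_response: dict) -> list:
    """Return (id, n_embd, n_params) for each model in a /v1/models response."""
    rows = []
    for model in models_response.get("data", []):
        meta = model.get("meta") or {}  # not every server includes this block
        rows.append((model.get("id"), meta.get("n_embd"), meta.get("n_params")))
    return rows


# Hypothetical response in the llama-server shape, for illustration only.
sample_models = {
    "object": "list",
    "data": [
        {
            "id": "example-embedding-model",
            "owned_by": "llamacpp",
            "meta": {"n_embd": 2560, "n_params": 4021774336},
        }
    ],
}

for model_id, n_embd, n_params in describe_models(sample_models):
    print(model_id, n_embd, n_params)
```

Comparing n_params and n_embd with the model you meant to serve catches the case where the server is still running a different GGUF than the one named in your config.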

One warning

I want to be straight with you: if OpenClaw sees this as a different embedding provider/model setup than yesterday’s local index, it may still want a reindex for full correctness.

But this is still the right next step, because first we need to make sure the embedding backend is stable and queryable.

rapid garden
#

572,"owned_by":"llamacpp","meta":{"vocab_type":2,"n_vocab":151665,"n_ctx_train":40960,"n_embd":2560,"n_params":4021774336,"size":3299735284}}]}
matias@matias-System-Product-Name:~$ openclaw memory status --agent main --deep

🦞 OpenClaw 2026.4.15 (041266a)
If something's on fire, I can't extinguish it—but I can write a beautiful postmortem.

│
◇
Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-0.6B-f16.gguf
Sources: memory
Indexed: 7268/7271 files · 20083 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 7268/7271 files · 20083 chunks
Vector: ready
Vector dims: 4096
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (40012 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md

matias@matias-System-Product-Name:~$

abstract acornBOT
#

Nice, this is actually healthy now.

You’ve got:

  • Embeddings: ready
  • Vector: ready
  • FTS: ready
  • 7268 files / 20083 chunks
  • remote embedding model responding on llama-server

So the backend problem looks fixed. 🎉

Next test, and don’t change anything else yet

Run this exact command:

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

Why this version:

  • exact known phrase
  • --min-score 0 removes score filtering as a reason for empty results

What it means

  • if this now returns the file/snippet, you’re good
  • if this is still empty even with:
    • indexed files present
    • embeddings ready
    • vector ready
    • FTS ready
    • min-score forced to 0

then I’d call that a real bug or stale-index mismatch, not user error

Paste that result next.

rapid garden
#

didnt find anything, the llama-server still appears to working on something but I have no idea with what

abstract acornBOT
#

The llama-server activity is probably just OpenClaw asking for query embeddings. That part makes sense.

At this point, I’d stop guessing and inspect the DB directly, read-only.

Run these:

sqlite3 ~/.openclaw/memory/main.sqlite "select path from files where path like '%investment-allocation-analysis-part2%';"
sqlite3 ~/.openclaw/memory/main.sqlite "select count(*) from chunks where path like '%investment-allocation-analysis-part2%';"
sqlite3 ~/.openclaw/memory/main.sqlite "select path,start_line,end_line from chunks where text like '%Trading 212%' limit 5;"

How to read it

  • if the first/second commands return nothing / 0
    → the file was not actually indexed into the DB

  • if the third command returns rows with Trading 212
    → the chunk is in the DB, and memory search failing is a real search-layer bug

That’s the cleanest next step without doing another reindex.

If those rows are present but openclaw memory search still returns [], I’d call that GitHub-issue territory:
https://github.com/openclaw/openclaw/issues

Paste the sqlite output and I’ll tell you which side of the line you’re on.
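
If you would rather script the third check, here is a Python sketch that opens the DB strictly read-only. It assumes the same `chunks` table and `path`, `start_line`, `end_line`, `text` columns used in the sqlite3 commands above; the real schema may differ between OpenClaw versions, and the helper names are hypothetical.

```python
import sqlite3


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open the SQLite file in read-only mode so nothing can be modified."""
    # Pass an absolute path; "~" is not expanded here.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def find_chunks(conn: sqlite3.Connection, needle: str, limit: int = 5) -> list:
    """Return (path, start_line, end_line) for chunks whose text contains needle."""
    # Escape LIKE wildcards so the needle is matched literally.
    escaped = needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT path, start_line, end_line FROM chunks "
        "WHERE text LIKE ? ESCAPE '\\' LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return cur.fetchall()
```

For example, `find_chunks(open_read_only("/home/matias/.openclaw/memory/main.sqlite"), "Trading 212")` should list the same rows as the third command.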

rapid garden
#

matias@matias-System-Product-Name:~$ sqlite3 ~/.openclaw/memory/main.sqlite "select path from files where path like '%investment-allocation-analysis-part2%';"
memory/investment-allocation-analysis-part2.md
matias@matias-System-Product-Name:~$ sqlite3 ~/.openclaw/memory/main.sqlite "select count(*) from chunks where path like '%investment-allocation-analysis-part2%';"
3
matias@matias-System-Product-Name:~$ sqlite3 ~/.openclaw/memory/main.sqlite "select path,start_line,end_line from chunks where text like '%Trading 212%' limit 5;"
memory/asenna-flask-part1.md|37|90
memory/asenna-flask-part1.md|82|108
memory/budjetin-sovelluksen-bugit-ja-korjaukset-part10.md|23|48
memory/budjetin-sovelluksen-bugit-ja-korjaukset-part10.md|40|73
memory/budjetin-sovelluksen-bugit-ja-korjaukset-part13.md|16|27
matias@matias-System-Product-Name:~$

what is the optimal way to run embedding with openclaw?

abstract acornBOT
#

Your sqlite output is actually really useful.

It proves:

  • the file is in the DB
  • it has 3 chunks
  • Trading 212 text is definitely indexed

So openclaw memory search returning [] is very likely not your fault. That looks like bug / retrieval mismatch territory.

For your actual question:

Optimal way to run embeddings with OpenClaw

My recommendation for your setup

Run embeddings as a separate embedding server and let OpenClaw call it over HTTP.

In your case, that means:

  • keep the embedding model on llama-server at 127.0.0.1:8081
  • use a small embedding model like 0.6B
  • point memorySearch at it via remote.baseUrl
  • keep chat model serving separate from embedding serving if possible

Why this is the best path for you

  • avoids OpenClaw loading GGUF embeddings in-process via node-llama-cpp
  • avoids GPU OOM from the local provider path
  • lets you run the embedding server on CPU if you want
  • makes the embedding backend easier to inspect independently

What I would avoid

I would not use:

memorySearch.provider: "local"

for a big embedding model unless you specifically want in-process loading and you know the box can handle it.

That path is the most fragile.

Practical ranking

Best overall reliability

  • hosted embeddings (openai, voyage, gemini)
  • least pain, most boring, works well

Best fully local for you

  • llama-server / Ollama / LM Studio as a separate embedding server
  • OpenClaw talks to it remotely

Worst for your current setup

  • big GGUF + provider: "local" inside OpenClaw

Recommendation

For you, I’d settle on:

  • memorySearch.provider: "openai"
  • memorySearch.remote.baseUrl: "http://127.0.0.1:8081/v1"
  • small embedding model on the server
  • CPU if you want predictable memory behavior

Also, because the DB clearly has the data but memory search still comes back empty, I think this is worth filing as a bug:
https://github.com/openclaw/openclaw/issues

If you want, I can give you the exact config block I’d use as the final “good” embedding setup for your machine.

rapid garden
#

{
"agents": {
"defaults": {
"workspace": "/home/matias/.openclaw/workspace",
"model": {
"primary": "llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P"
},
"memorySearch": {
"provider": "openai",
"model": "Qwen3-Embedding-0.6B-f16.gguf",
"remote": {
"baseUrl": "http://127.0.0.1:8081/v1",
"apiKey": "dummy"
}
},
"compaction": {
"mode": "safeguard",
"reserveTokensFloor": 20000,
"notifyUser": true
},
"contextLimits": {
"toolResultMaxChars": 2000
},
"contextPruning": {
"mode": "cache-ttl",
"ttl": "5m"
}
}
},
"models": {
"mode": "merge",
"providers": {
"llama-server": {
"baseUrl": "http://127.0.0.1:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "Qwen3.6-35B-A3B-Uncensored.Q6_K_P",
"name": "Local llama.cpp model",
"reasoning": false,
"input": [
"text"
],
"contextWindow": 220000,
"maxTokens": 8192
}
]
}
}
},

abstract acornBOT
#

Yep, that config shape looks right for the setup I’d recommend.

What looks good

  • chat model stays on your local llama-server at :8080
  • embeddings move to the separate server at :8081
  • no local.modelPath anymore, so OpenClaw stops trying to load the embedding GGUF itself
  • embedding model is pinned explicitly

Tiny tweak I’d add

I’d make the embedding provider pin explicit:

memorySearch: {
  provider: "openai",
  fallback: "none",
  model: "Qwen3-Embedding-0.6B-f16.gguf",
  remote: {
    baseUrl: "http://127.0.0.1:8081/v1",
    apiKey: "dummy"
  }
}

That just avoids silent fallback weirdness.

Important note

At this point, your config is not the main problem anymore.
We already proved:

  • the DB has the file
  • the DB has the chunks
  • the text is in the chunks
  • memory search still returns []

So the remaining problem looks like search behavior / bug, not “you configured embeddings wrong”.

My verdict

For your machine, this is the right final embedding architecture:

  • chat model on one llama-server
  • embedding model on another llama-server
  • OpenClaw talks to both over HTTP

That’s much better than big local GGUF embeddings through provider: "local".

If you want, I can help you write the smallest reproducible bug report for GitHub using the exact evidence you already collected.

rapid garden
#

I wanna test the embedding now, currently running main gguf on 8080 and embedding on 8081

abstract acornBOT
#

Nice. I’d test it in three steps:

1. Test the embedding server directly

Run this against 8081:

curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-0.6B-f16.gguf",
    "input": "I also have my card in Trading 212"
  }'

What you want:

  • JSON response
  • data[0].embedding present
  • not an error

If you have jq, this is even nicer:

curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-0.6B-f16.gguf",
    "input": "I also have my card in Trading 212"
  }' | jq '.data[0].embedding | length'

That should print the embedding dimension.
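
If you don’t have jq, the same check can be sketched in Python with only the standard library. This assumes the server answers in the OpenAI-style shape shown above (data[0].embedding); the function names are mine.

```python
import json
import urllib.request


def embedding_dim(response: dict) -> int:
    """Length of the first vector in an OpenAI-style /v1/embeddings response."""
    return len(response["data"][0]["embedding"])


def fetch_embedding_response(base_url: str, model: str, text: str,
                             api_key: str = "dummy") -> dict:
    """POST one input to {base_url}/embeddings and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps({"model": model, "input": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For your setup that would be embedding_dim(fetch_embedding_response("http://127.0.0.1:8081/v1", "Qwen3-Embedding-0.6B-f16.gguf", "I also have my card in Trading 212")).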

#

2. Test OpenClaw’s integration

Then run:

openclaw memory status --agent main --deep

What you want:

  • Provider: openai
  • Model: Qwen3-Embedding-0.6B-f16.gguf
  • Embeddings: ready
  • Vector: ready
  • FTS: ready

3. Then test memory search again

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

How to interpret it

  • if curl works and status says Embeddings: ready, then the embedding backend is fine
  • if memory search still returns [], then the bug is above the embedding layer

So start with the raw /v1/embeddings call. Paste that output if you want and I’ll check it.

rapid garden
#

matias@matias-System-Product-Name:~$ curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-0.6B-f16.gguf",
    "input": "I also have my card in Trading 212"
  }'
{"model":"Qwen3-Embedding-0.6B-f16.gguf","object":"list","usage":{"prompt_tokens":12,"total_tokens":12},"data":[{"embedding":[-0.0002532844664528966,-0.003442488145083189,0.017302660271525383,-0.03987465053796768,-0.0017
matias@matias-System-Product-Name:~$ curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-0.6B-f16.gguf",
    "input": "I also have my card in Trading 212"
  }' | jq '.data[0].embedding | length'
2560

Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-0.6B-f16.gguf
Sources: memory
Indexed: 7268/7271 files · 20083 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 7268/7271 files · 20083 chunks
Vector: ready
Vector dims: 4096
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (40012 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md

abstract acornBOT
#

Aha, we found the smoking gun.

Your embedding server returns:

  • 2560 dims from /v1/embeddings

But OpenClaw says the current memory index is:

  • Vector dims: 4096

That is a hard mismatch.

What that means

Your current DB was indexed with a 4096-dim embedding model, but your live query embeddings are 2560-dim.

So OpenClaw is effectively trying to compare vectors from different spaces.

#

That explains why search is busted.

Your two real options

Option 1, no reindex

Use the same embedding model/dimensions that built the current DB.

So if you want to keep yesterday’s index, your embedding backend must return 4096-dim vectors, not 2560.

Option 2, use the new 0.6B model

Then you need a full reindex, because the stored vectors must be regenerated in 2560 dims.

There isn’t a safe shortcut around that.

My blunt recommendation

Since you said you do not want to reindex:

#
  • do not switch to the 0.6B embedding model for this existing DB
  • run the original embedding model remotely on 8081
  • make sure it returns the same dims as the existing index, which appears to be 4096

Also, this is probably a bug

OpenClaw really should be surfacing this mismatch clearly instead of just failing silently.

If you want maintainer attention, this is a good GitHub issue:
https://github.com/openclaw/openclaw/issues

So the short answer is:

2560 query vectors + 4096 index vectors = broken search
No reindex means you must go back to the original 4096-dim embedding model.
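
If you want to automate that comparison, here is a small Python sketch. It assumes the status output contains a line of the form "Vector dims: N", as in your paste; the function names are mine, and OpenClaw does not ship this check as far as I know.

```python
import re
from typing import Optional


def index_dims(status_text: str) -> Optional[int]:
    """Extract N from a 'Vector dims: N' line, or None if there is no such line."""
    m = re.search(r"^\s*Vector dims:\s*(\d+)\s*$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def check_dims(status_text: str, live_dim: int) -> str:
    """Compare the index dimension from status output with the live server's."""
    stored = index_dims(status_text)
    if stored is None:
        return "unknown: status output has no 'Vector dims' line"
    if stored == live_dim:
        return f"ok: index and live server both use {stored} dims"
    return f"mismatch: index has {stored} dims, live server returns {live_dim}"
```

Feed it the text of openclaw memory status --agent main --deep and the number printed by the jq check.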

rapid garden
#

changed back to 4b model, how to try again

abstract acornBOT
#

Perfect. Try it in this order:

1. Make sure the live embedding model matches the DB dims

curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-4B-Q6_K.gguf",
    "input": "I also have my card in Trading 212"
  }' | jq '.data[0].embedding | length'

You want this to print:

4096

If it does not print 4096, stop there.

2. Check OpenClaw sees the embedding backend as healthy

openclaw memory status --agent main --deep

You want:

  • Embeddings: ready
  • Vector: ready
  • FTS: ready
  • Vector dims: 4096
#

3. Retry the exact memory search

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

4. If that still returns []

Then we’ve ruled out:

  • missing file
  • missing chunks
  • broken embedding server
  • dim mismatch

…and it’s basically a real OpenClaw bug.

Paste:

  1. the embedding length output
  2. the memory status --deep
  3. the final memory search output

and I’ll tell you the last conclusion.

rapid garden
#

matias@matias-System-Product-Name:~$ curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-4B-Q6_K.gguf",
    "input": "I also have my card in Trading 212"
  }' | jq '.data[0].embedding | length'
2560
matias@matias-System-Product-Name:~$

abstract acornBOT
#

Yep, that settles it.

Even your 4B llama-server is returning:

  • 2560 dims

But your current OpenClaw index is still:

  • 4096 dims

So the remote llama-server path is not compatible with the existing DB, even with the 4B model name.

What that means

Same conclusion, just stronger now:

  • your existing main.sqlite was built with a different embedding shape
  • the remote llama-server embeddings, whether 0.6B or 4B, are giving you 2560
  • so this DB cannot be queried correctly with that server

Real options now

Option A, realistic

Reindex using the llama-server embedding backend you want to keep.

That gives you a clean 2560-dim index matching the live server.

Option B, keep old DB

Find the exact original embedding runtime/model that produced the old 4096-dim vectors.

But based on what we’ve seen, that looks like the old local path that was causing OOM, so it’s probably not practical.

Blunt truth

There isn’t a magic bridge from:

  • stored 4096-dim vectors
    to
  • live 2560-dim query vectors

So if you want the llama-server setup, I think reindex is now unavoidable.

I know that’s annoying as hell, but at least now we know why.

If you want, I can help you make the reindex as safe as possible, for example:

  • back up the old SQLite DB first
  • confirm the live embedding dims before starting
  • then do one clean rebuild and stop touching it after that
rapid garden
#

#!/bin/bash
/home/matias/llama.cpp/build/bin/llama-server \
-m "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf" \
--alias "Qwen3-Embedding-4B-Q6_K" \

{
"agents": {
"defaults": {
"workspace": "/home/matias/.openclaw/workspace",
"model": {
"primary": "llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P"
},
"memorySearch": {
"provider": "openai",
"fallback": "none",
"model": "Qwen3-Embedding-4B-Q6_K",
"remote": {
"baseUrl": "http://127.0.0.1:8081/v1",
"apiKey": "dummy"
}
},
"compaction": {
"mode": "safeguard",
"reserveTokensFloor": 20000,
"notifyUser": true
},
"contextLimits": {
"toolResultMaxChars": 2000
},
"contextPruning": {
"mode": "cache-ttl",
"ttl": "5m"
}
}
},
"models": {
"mode": "merge",
"providers": {
"llama-server": {
"baseUrl": "http://127.0.0.1:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "Qwen3.6-35B-A3B-Uncensored.Q6_K_P",
"name": "Local llama.cpp model",
"reasoning": false,
"input": [
"text"
],
"contextWindow": 220000,
"maxTokens": 8192
}
]

Everything looks good, so it’s loading the correct model?

abstract acornBOT
#

The config shape looks good, yes.

But the alias alone does not prove the server is producing the right embeddings.

Important

--alias "Qwen3-Embedding-4B-Q6_K" only changes the model name OpenClaw sees.

It does not change:

  • the actual model loaded
  • the embedding dimensionality
  • the vector space

So the real test is still this:

curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-4B-Q6_K",
    "input": "I also have my card in Trading 212"
  }' | jq '.data[0].embedding | length'

What you need

You need that command to print:

4096

If it prints 2560 again, then:

  • yes, OpenClaw is probably calling the alias correctly
  • but no, it is not compatible with your current DB

So:

  • config looks fine
  • alias is fine
  • the real check is the embedding length

If you want, run that exact alias-based curl now and paste just the number it prints.

rapid garden
#

should I restart the servers and openclaw

abstract acornBOT
#

Yes, I would, just to remove any stale-state weirdness.

Safest order

  1. Restart the embedding server on 8081

    • because you changed the embedding model / alias
  2. Restart OpenClaw

    • so it re-reads the updated memorySearch config cleanly
  3. You do not need to restart 8080

    • unless you changed the main chat model server too

Then test immediately

After restart, run this first:

curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-4B-Q6_K",
    "input": "I also have my card in Trading 212"
  }' | jq '.data[0].embedding | length'

If that is not 4096, stop there, because restart didn’t fix the real mismatch.

So short answer:

  • restart 8081
  • restart OpenClaw
  • leave 8080 alone unless changed
rapid garden
#

still 2560

abstract acornBOT
#

Yep, then that’s definitive.

No more restarts or alias tweaks are going to fix this.

What we now know for sure

  • your current DB expects 4096-dim vectors
  • your llama-server embedding endpoint returns 2560-dim vectors
  • therefore this server cannot query that DB correctly

So the options are now brutally simple:

Keep the current DB

You must use the exact old embedding runtime/model that produced 4096 dims.

Keep the current llama-server embedding setup

You must reindex so the DB is rebuilt with 2560-dim vectors.

#

There isn’t a third option.

Short answer

If you want to stay on the 8081 llama-server embedding backend, reindex is unavoidable.

If you want, I can help you do it in the safest possible way:

  1. back up ~/.openclaw/memory/main.sqlite
  2. confirm the live embedding dims are 2560
  3. do one clean rebuild
  4. test the exact phrase again

That’s the clean path from here.

rapid garden
#

do you think when I tried to reindex it and I closed mid indexing it caused this? I can reindex with the 4b model, how to do that

abstract acornBOT
#

It could have contributed, yeah, especially the 7268/7271 partial-looking state.

But the bigger proven problem is still the one we measured:

  • old DB state says 4096 dims
  • your live 4B server returns 2560 dims

So even if the interrupted run made things messy, the clean fix is the same:

Reindex cleanly against the current 4B embedding server

1. Keep this config active

  • memorySearch.provider: "openai"
  • memorySearch.model: "Qwen3-Embedding-4B-Q6_K"
  • memorySearch.remote.baseUrl: "http://127.0.0.1:8081/v1"

2. Make sure the live embedding server is up

#

You already tested it and it returns 2560. That’s fine, as long as the reindex uses the same thing consistently.

3. Run a full rebuild

openclaw memory index --agent main --force --verbose

4. Let it finish

Don’t close the terminal this time.

5. Verify the rebuilt index

openclaw memory status --agent main --deep

After a successful rebuild, I’d expect:

  • Embeddings: ready
  • Vector: ready
  • FTS: ready
  • Vector dims: 2560

6. Test the exact phrase again

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

Optional but smart

Before reindexing, back up the old DB:

cp ~/.openclaw/memory/main.sqlite ~/.openclaw/memory/main.sqlite.bak.$(date +%F-%H%M%S)

That way you can always go back.
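
An alternative to cp, as a sketch: Python’s sqlite3 module exposes SQLite’s online backup API, which copies a consistent snapshot even if another process still has the file open. The function name and the .bak.<timestamp> naming are my own choices, mirroring the cp command above.

```python
import sqlite3
import time
from pathlib import Path


def backup_sqlite(db_path: str) -> Path:
    """Copy a SQLite DB to '<name>.bak.<timestamp>' next to the original."""
    src_path = Path(db_path).expanduser()
    if not src_path.is_file():
        # sqlite3.connect would silently create an empty DB, so fail early.
        raise FileNotFoundError(src_path)
    stamp = time.strftime("%Y-%m-%d-%H%M%S")
    dest_path = src_path.with_name(f"{src_path.name}.bak.{stamp}")
    src = sqlite3.connect(str(src_path))
    dest = sqlite3.connect(str(dest_path))
    try:
        src.backup(dest)  # page-level copy via SQLite's backup API
    finally:
        dest.close()
        src.close()
    return dest_path
```

Usage would be backup_sqlite("~/.openclaw/memory/main.sqlite"), run before starting the reindex.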

#

So yes, I think the interrupted indexing may have made things worse, but a clean full reindex with the current 4B server is the right fix now.

If you want, once you kick off openclaw memory index --agent main --force --verbose, paste the first few lines and I’ll sanity-check that it’s using the right backend.

rapid garden
#

is this the best way to go now? before I commit to the reindexing

#

reindexing now with 4B model

abstract acornBOT
#

Good, let it run to completion.

When it finishes, run these two checks:

openclaw memory status --agent main --deep

You want Vector dims: 2560 now, since that’s what your live 4B server is returning.

Then:

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

If either looks weird, paste it and I’ll check it.

rapid garden
#

any way to do this faster/in parallel? also, in the future, how do I add new stuff I want to teach to openclaw? I suppose I am not meant to do this reindexing anymore?

abstract acornBOT
#

Short answer:

Faster right now

Not really in a big generic way, not with this llama-server path.

The main reasons:

  • you’re using a normal OpenAI-compatible embedding endpoint, not a provider with a special bulk batch API
  • the expensive part is still generating embeddings for lots of chunks

What does help:

  • embedding cache, which you already have enabled
  • not interrupting runs
  • not force-reindexing unless you actually changed model/provider
  • a smaller embedding model if you’re okay with the tradeoff

I would not expect a big “parallelize this” win from OpenClaw for the generic llama-server route.

#

Future workflow, the normal way

No, you are not supposed to do full reindexing all the time.

That’s the key idea.

Normal “teach OpenClaw” path

You add/update memory files, for example:

  • durable facts in MEMORY.md
  • longer notes/docs in memory/*.md
  • external doc folders via memorySearch.extraPaths

Then OpenClaw should incrementally pick them up.

When to full reindex

Only for bigger shifts like:

  • changing embedding model
  • changing embedding provider
  • changing chunking/index assumptions
  • recovering from broken/stale index state

My recommendation going forward

For normal new knowledge

Just put it in:

  • ~/.openclaw/workspace/MEMORY.md
  • or ~/.openclaw/workspace/memory/*.md

and let OpenClaw ingest it incrementally.

For big imports

Drop the files in, then if you want to nudge it manually:

openclaw memory index --agent main

Use --force only when you really mean “rebuild everything”.

Practical mental model

You are not “training” OpenClaw.

You are maintaining a memory index.

That means:

  • add/edit notes normally
  • let incremental indexing do the routine work
  • reserve full rebuilds for major backend/model changes

So yes, after this cleanup, the goal should be: don’t keep doing full reindexes.

rapid garden
#

just to clarify, what it is doing now is reading all those .md files and indexing them into the sqlite? where do these .md files originate from?

abstract acornBOT
#

Yes, basically.

What it’s doing now is:

  1. reading the Markdown files
  2. chunking them into smaller pieces
  3. embedding those chunks
  4. storing metadata + chunk text + vectors in the SQLite DB
    • your DB is ~/.openclaw/memory/main.sqlite
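
To make step 2 concrete, here is an illustrative Python sketch of line-based chunking with overlap. It is not OpenClaw’s actual chunker (which I haven’t shown here and which works from a token estimate); the function name and the max_lines/overlap parameters are hypothetical. It only shows why neighbouring chunks in your DB have overlapping line ranges like 37-90 and 82-108.

```python
from typing import List, Tuple


def chunk_lines(text: str, max_lines: int = 40,
                overlap: int = 8) -> List[Tuple[int, int, str]]:
    """Split text into (start_line, end_line, chunk_text) with 1-based line numbers.

    Consecutive chunks share `overlap` lines so context is not cut at a boundary.
    """
    if max_lines <= 0 or not 0 <= overlap < max_lines:
        raise ValueError("need max_lines > 0 and 0 <= overlap < max_lines")
    lines = text.splitlines()
    chunks = []
    start = 0
    step = max_lines - overlap
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        chunks.append((start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break  # last chunk reached the end of the file
        start += step
    return chunks
```

Each resulting chunk is what gets sent to the embedding server, so the chunk size directly controls how large each embedding input is.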

Where the .md files come from

For the builtin memory system, they usually come from your workspace:

  • ~/.openclaw/workspace/MEMORY.md
  • ~/.openclaw/workspace/memory/*.md
  • ~/.openclaw/workspace/DREAMS.md for dreaming/diary-related stuff
  • plus anything you intentionally added via memorySearch.extraPaths

Who creates them

A mix of:

  • you, when you manually add notes/files
  • OpenClaw, when it writes memory files during normal use
  • automatic memory flush / dreaming / daily notes, depending on what features are enabled
  • bulk imports, if you copied documents into workspace/memory or configured extra paths

So there is no hidden magical memory store before SQLite.
The source of truth is just Markdown on disk, and SQLite is the searchable index built from that.

That’s actually a nice design, because it means:

  • you can inspect the source files yourself
  • back them up
rapid garden
#

please sum up the answer to the initial message and the conversation briefly

abstract acornBOT
#

Brief summary:

  • Why you couldn’t see OpenClaw “thinking”

    • /think changes reasoning effort
    • /reasoning only shows reasoning if the backend actually returns it
    • /verbose is what shows tool/activity traces
    • with your local llama-server/... setup, OpenClaw was not guaranteed to expose reasoning blocks the way you expected
  • What we found about memory/embeddings

    • your memory files were real and indexed into ~/.openclaw/memory/main.sqlite
    • the search DB definitely contained the target file and matching text
    • but your live embedding backend and the stored index got out of sync
  • Root cause

    • the existing DB had 4096-dim vectors
    • your llama-server embedding endpoint was returning 2560-dim vectors
    • that mismatch breaks search
  • Best fix

    • use a separate remote embedding server (your 8081 llama-server is fine)
    • choose the embedding model you actually want to keep
    • do one clean reindex against that exact setup
  • Going forward

    • don’t keep full reindexing for normal use
    • “teaching” OpenClaw usually just means adding/editing Markdown in:
      • MEMORY.md
      • memory/*.md
    • let incremental indexing handle routine updates

If this reindex finishes and exact phrase search still fails, then it’s likely a real OpenClaw bug.

rapid garden
#

14:24:17 [memory] embeddings rate limited; retrying in 2131ms
14:24:18 [memory] embeddings: batch start
14:24:18 [memory] embeddings: batch start
14:24:19 [memory] embeddings: batch start
14:24:19 [memory] embeddings: batch start
14:24:19 [memory] embeddings: batch start
14:24:20 [memory] embeddings: batch start
Memory index failed (main): openai embeddings failed: 500 {"error":{"code":500,"message":"input (2071 tokens) is too large to process. increase the physical batch size (current batch size: 2048)","type":"server_error"}}
matias@matias-System-Product-Name:~$
matias@matias-System-Product-Name:~$

It probably got through most of it, but this was at the end

abstract acornBOT
#

It probably got most of the way through, but I would not treat that run as cleanly finished.

The important part is the fatal error:

  • one embedding input hit 2071 tokens
  • your llama-server embedding backend only accepts 2048
  • so the rebuild stopped on that chunk

Best fix from the OpenClaw side

Lower the memory chunk size before rerunning.

I’d try:

memorySearch: {
  provider: "openai",
  fallback: "none",
  model: "Qwen3-Embedding-4B-Q6_K",
  remote: {
    baseUrl: "http://127.0.0.1:8081/v1",
    apiKey: "dummy"
  },
  chunking: {
    tokens: 250,
    overlap: 50
  }
}

That should keep chunks comfortably under your server’s 2048-token limit.

Then rerun

openclaw memory index --agent main --force --verbose

Why this happened

OpenClaw’s builtin memory chunking is only an estimate, and your embedding server has a stricter real token limit than the chunk estimate happened to hit.

So yes, the rate limited messages are annoying but not the main issue.
The real blocker is the 2048 token ceiling from the embedding server.
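
To see how far over the limit that chunk was, here is a small Python sketch that pulls the two numbers out of the error text. The regex is written against the exact message in your log and is an assumption about its format; other llama.cpp versions may word it differently.

```python
import re
from typing import Optional, Tuple

# Matches: "input (2071 tokens) is too large ... (current batch size: 2048)"
_OVERFLOW = re.compile(
    r"input \((\d+) tokens\) is too large.*?current batch size: (\d+)"
)


def parse_batch_overflow(message: str) -> Optional[Tuple[int, int]]:
    """Return (input_tokens, batch_size) from the error text, or None if no match."""
    m = _OVERFLOW.search(message)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

On your log line this gives 2071 and 2048, so the failing chunk was only 23 tokens over the limit.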

Alternative fix

You could also raise the embedding server’s physical batch size on the llama.cpp side (llama-server’s -ub / --ubatch-size, usually raised together with -b / --batch-size), but the simplest, safest fix is just smaller OpenClaw chunks.


If you want, I can give you the exact memorySearch block with the chunking settings merged into your current config.

rapid garden
#

no I will not run the whole process again, let’s test if it works

abstract acornBOT
#

Fair enough, let’s just test it.

Run these two:

openclaw memory status --agent main --deep

and

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

What we’re looking for

  • if status now shows the new vector setup cleanly and the search returns the snippet/file, then you’re probably okay
  • if search is still empty, then the partial reindex did not fix it
#

Paste both outputs and we’ll judge it from there, no more guessing.

rapid garden
#

matias@matias-System-Product-Name:~$ openclaw memory status --agent main --deep

🦞 OpenClaw 2026.4.15 (041266a)
Powered by open source, sustained by spite and good documentation.

│
◇
Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-4B-Q6_K
Sources: memory
Indexed: 0/7271 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 0/7271 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md

matias@matias-System-Product-Name:~$ openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

llama-server is working on this, how long will it search?

abstract acornBOT
#

Don’t wait on that, you can cancel it.

This line is the important one:

  • Indexed: 0/7271 files · 0 chunks
  • Dirty: yes

So right now, there is nothing usable to search.

What likely happened

The failed reindex seems to have left you with:

  • an empty memory DB
  • a dirty/incomplete rebuild state

So the old searchable index is effectively gone.

Answer to your question

#
  • a healthy search should take seconds, not sit there forever
  • in your current state, waiting is pointless, because there are 0 chunks

What your options are now

If you made a backup of main.sqlite

Restore that backup.

If you did not make a backup

You need to complete a clean index build. There’s no searchable index left right now.

I know that’s annoying, but the status output is pretty definitive here.

So short version:

  • cancel the current search
  • 0 chunks means no search can work
  • either restore the old DB or finish a successful reindex
rapid garden
#

oh my fuckin god this is painful, yes I have the old db available, HOW DO I USE THAT?

abstract acornBOT
#

Yeah, that’s brutal. I’m sorry, this got way more painful than it should have.

Important first

If you want to use the old DB, you need both:

  1. the old main.sqlite
  2. the old embedding setup that matches it

Restoring the DB alone will not fix search if you keep the current 2560-dim embedding backend.

Your old DB was 4096-dim.

So if you restore it while still using the current remote embedding server returning 2560, search will still be broken.

If your goal is “put me back where I was before this mess”

Do this:

#

1. Stop OpenClaw / stop any running memory index job

Make sure it isn’t still writing the DB.

2. Back up the current broken DB

cp ~/.openclaw/memory/main.sqlite ~/.openclaw/memory/main.sqlite.broken.$(date +%F-%H%M%S)

3. Restore your old DB over main.sqlite

Replace /path/to/your/old/main.sqlite with the real backup path:

cp /path/to/your/old/main.sqlite ~/.openclaw/memory/main.sqlite

4. Restore the old embedding config/runtime too

If the old DB was built with the old local 4096-dim setup, you need to put that config back as well.

#

5. Restart OpenClaw

Then check:

openclaw memory status --agent main --deep

The blunt truth

There are two separate restore paths:

Path A, restore the old world

  • old DB
  • old embedding backend
  • search should behave like before

Path B, keep the new llama-server embedding backend

  • then the old DB is useless
  • you need a clean reindex

So before you do anything, decide which you want:

  • “restore old working state” → restore old DB + old embedding setup
  • “stay on new 8081 embedding server” → old DB won’t help, reindex required

If you want, send me:

  • the path of your old DB backup
  • whether you want old working state or new 8081 setup

and I’ll tell you the exact next command.

rapid garden
#

Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-4B-Q6_K
Sources: memory
Indexed: 7268/7271 files · 20083 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 7268/7271 files · 20083 chunks
Vector: ready
Vector dims: 4096
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (40012 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md

matias@matias-System-Product-Name:~$

old db, how do we use this now

abstract acornBOT
#

Good, that means the old DB is back.

But to actually use that old DB, you need the embedding backend that matches it.

And your restored DB says it expects:

  • Vector dims: 4096

So the rule is simple

To use this DB live, OpenClaw must query with a backend that also returns 4096-dim embeddings.

Important

If you keep the current remote openai/llama-server setup that was returning 2560, then this old DB is restored but still not usable for search.

So your next move is one of these

If you want to use the old DB

#

Put back the old embedding config, for example the old local path:

memorySearch: {
  provider: "local",
  local: {
    modelPath: "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
  }
}

Then restart OpenClaw and test:

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
#

If that old local path still OOMs

Then the old DB is basically just a backup, not a usable live index.

In that case:

  • keep it safe
  • and if you want the remote 8081 embedding setup, you’ll need a clean reindex for that setup

Blunt version

  • old DB restored ✅
  • usable right now? only if you also restore a 4096-dim embedding backend

If you want, I can write the exact memorySearch block you should paste in for the old DB path versus the new 8081 path, so you can switch cleanly between them.

rapid garden
#

matias@matias-System-Product-Name:~$ openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
{
"results": []
}
^Cmatias@matias-System-Product-Name:~$ ^C
matias@matias-System-Product-Name:~$ openclaw memory search --agent main --query "Trading 212" --max-results 5 --min-score 0 --json
{
"results": []
}

using 4b model, why does it not work when I specifically used the 4B model to do this yesterday?

abstract acornBOT
#

Because “4B model” is unfortunately not specific enough.

What matters is the exact embedding generator, not just the family/name.

You proved:

  • the restored DB says Vector dims: 4096
  • your current 4B llama-server endpoint returns 2560

So even though both are “Qwen 4B embedding”, they are not producing the same embeddings.

In practice that means

Yesterday’s index was built with something that produced 4096-dim vectors.

Today’s live query path is producing 2560-dim vectors.

Those do not match, so search returns nothing.
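
A minimal Python sketch of why the two sizes cannot be compared. The function name is made up for illustration, and this is not OpenClaw's actual search code; how OpenClaw handles the mismatch internally is not shown in this thread.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; only defined when both vectors have the same length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


stored = [0.1] * 4096  # same shape as a vector in the restored DB
query = [0.1] * 2560   # same shape as what the 8081 endpoint returned

try:
    cosine_similarity(stored, query)
except ValueError as err:
    print(err)
```

Running it prints `dimension mismatch: 4096 vs 2560`. A search layer that hits this has no valid score to rank by, which fits the empty results you are seeing.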

#

Why this can happen

Because these all matter:

  • provider/runtime (local node-llama-cpp vs llama-server OpenAI endpoint)
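  • the exact model file that is actually loaded (output dimension is a property of the model, so a 4096 vs 2560 gap may mean a different file was loaded yesterday)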
  • embedding output shape
  • maybe server-side implementation details

So “same 4B file/model name” does not guarantee “same embedding space”.

Blunt answer

It doesn’t work because today’s 4B backend is not equivalent to yesterday’s 4B backend, even if the label makes it look like it should be.

And yes, I think OpenClaw should make this much clearer instead of quietly failing.

So the real rule is:

#
  • if you want to use the old DB, you need the exact old 4096-dim embedding runtime
  • if you want to use the new llama-server backend, you need a fresh index built by that backend

That’s the actual reason.

rapid garden
#

okay can we try to see how we can achieve this 2560, how do I get this info with what command

worldly moatBOT
abstract acornBOT
#

Yep, compare these two numbers:

1. What the live embedding server returns

This is the authoritative one:

curl -s http://127.0.0.1:8081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer dummy' \
  -d '{
    "model": "Qwen3-Embedding-4B-Q6_K",
    "input": "test"
  }' | jq '.data[0].embedding | length'

That prints the live query embedding dimension.
Right now for you, that has been 2560.
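
If you would rather not depend on curl and jq, here is a rough Python equivalent of the same check. It assumes the server answers in the OpenAI-style shape that the jq filter above already relies on (`data[0].embedding`). The function names are mine, and the URL, model name and dummy token are copied from the curl command.

```python
import json
import urllib.request


def embedding_dim(payload):
    """Length of the first embedding in an OpenAI-style /v1/embeddings response."""
    return len(payload["data"][0]["embedding"])


def fetch_embedding_dim(base_url="http://127.0.0.1:8081",
                        model="Qwen3-Embedding-4B-Q6_K",
                        text="test"):
    """POST one input to the embedding endpoint and return the vector length."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return embedding_dim(json.load(response))
```

With the embedding server running, `print(fetch_embedding_dim())` should give the same number as the curl command.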

#

2. What the current OpenClaw DB expects

Use:

openclaw memory status --agent main --deep | rg 'Provider:|Model:|Vector dims:'

That prints the DB/index-side info, including:

  • provider
  • model
  • Vector dims

If you want the DB-level value directly from SQLite

You can also try:

#
sqlite3 ~/.openclaw/memory/main.sqlite "select json_array_length(embedding) from chunks limit 5;"

If JSON functions are available, that will show the stored vector length in the DB.
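
If your sqlite3 build lacks the JSON functions, a small Python sketch can do the same check by parsing the column itself. It makes the same assumption as the one-liner above: a `chunks` table whose `embedding` column holds a JSON array as text. If the vectors are stored as a binary blob instead, this will fail, and I have not confirmed the schema.

```python
import json
import sqlite3


def stored_vector_dims(db_path, limit=5):
    """Return the lengths of the first few stored embeddings.

    Assumes chunks.embedding holds a JSON array as text (unverified schema).
    """
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT embedding FROM chunks LIMIT ?", (limit,)
        ).fetchall()
    finally:
        connection.close()
    return [len(json.loads(row[0])) for row in rows]
```

Pass it the expanded path, for example `stored_vector_dims(os.path.expanduser("~/.openclaw/memory/main.sqlite"))` after `import os`, since Python does not expand `~` on its own.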

The rule

To make search work, these need to match:

  • curl ... | jq '.data[0].embedding | length'
  • openclaw memory status ... → Vector dims: ...
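
A small Python sketch of that comparison, in case you want to script it. It only assumes the status output contains a line of the form `Vector dims: <number>`, which is the label seen earlier in this thread. The helper names are hypothetical and the rest of the status format is not assumed.

```python
import re


def parse_vector_dims(status_text):
    """Extract the integer after 'Vector dims:' from status output, or None."""
    match = re.search(r"Vector dims:\s*(\d+)", status_text)
    return int(match.group(1)) if match else None


def dims_match(live_dim, status_text):
    """True only when the live embedding length equals the DB's Vector dims."""
    return parse_vector_dims(status_text) == live_dim
```

Feed it the number from the curl check and the text printed by the status command. `dims_match(2560, "Vector dims: 4096")` is `False`, which is the situation you were in before.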

If you want, paste the output of these two commands together and I’ll tell you immediately whether the setup is consistent.

rapid garden
#

returning 4096 with current model

worldly moatBOT
abstract acornBOT
#

Nice, that’s the first actually promising result.

If the live embedding call is now 4096, then it finally matches the old DB’s 4096.

Next two commands

openclaw memory status --agent main --deep | rg 'Provider:|Model:|Vector dims:|Indexed:|Dirty:'

and then:

openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json

What this means

#
  • if search works now, the issue really was the embedding mismatch
  • if search is still empty, then we’ve ruled out the dimension mismatch too, and it starts looking like a real search bug

Paste those outputs.

rapid garden
#

compact this conversation and we will pick up in another thread please

abstract acornBOT
#

Resume note for next thread:

  • Original issue: you couldn’t see OpenClaw “thinking”. Main takeaway:

    • /think changes reasoning effort
    • /reasoning only shows reasoning if the backend returns it
    • /verbose is the better knob for seeing tool/activity traces
  • Memory/embedding investigation:

    • ~/.openclaw/workspace/memory contains ~7.2k .md files
    • OpenClaw indexes Markdown from workspace memory files into ~/.openclaw/memory/main.sqlite
    • We verified the DB contained the target file and chunks via sqlite3
  • Big problem discovered:

    • old DB expected 4096-dim vectors
    • new llama-server embedding endpoint at 127.0.0.1:8081 was at one point returning 2560
    • that mismatch explains broken search
    • later you said the current model is now returning 4096, which is promising
#
  • Failed reindex:

    • a forced reindex later failed on embedding input size (2071 tokens > 2048)
    • that temporarily left the DB empty
    • you restored the old DB successfully
  • Current desired setup:

    • main chat model on 127.0.0.1:8080
    • embedding model on 127.0.0.1:8081
    • current check from you: embedding endpoint now returns 4096 for the active model
  • Last recommended verification steps:

    openclaw memory status --agent main --deep | rg 'Provider:|Model:|Vector dims:|Indexed:|Dirty:'
    openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
    
  • Interpretation: