#why cant I see what OpenClaw is thinking
First, can you paste openclaw --version?
Most likely this is the mismatch:
- /think changes reasoning effort/budget
- /reasoning on changes whether returned reasoning is shown
- /verbose on shows tool/activity traces, which is usually the closer answer to "what is OpenClaw planning/running"
A few important gotchas:
- On Discord, reasoning shows up as a separate message starting with Reasoning:, not inline.
- With Qwen/OpenAI-compatible backends, OpenClaw can only show reasoning if the upstream actually emits reasoning fields. If it doesn't, /reasoning on has nothing to display.
- If you want to see actions/tool calls, try /verbose on or /verbose full, not just /think or /reasoning.
Quick checks:
- /think shows the current effective thinking level
- /reasoning shows the current reasoning visibility
- /verbose on
- /status
Docs:
- https://docs.openclaw.ai/tools/thinking
- https://docs.openclaw.ai/concepts/messages
- https://docs.openclaw.ai/providers/qwen
If you paste:
- openclaw --version
- /status
- your exact model ref/provider (qwen/..., openrouter/..., lmstudio/..., etc.)
...I can tell you pretty quickly which layer is swallowing the reasoning.
matias@matias-System-Product-Name:~$ openclaw --version
OpenClaw 2026.4.15 (041266a)
matias@matias-System-Product-Name:~$
/status
OpenClaw 2026.4.15 (041266a)
Model: llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P · unknown
Tokens: 0 in / 0 out
Context: 0/220k (0%) · Compactions: 7
Session: agent:main:main • updated just now
Runtime: direct · Think: high · Reasoning: on
Queue: collect (depth 1)
Yep, the key clue is this line:
Model: llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P
So you're not on native qwen/..., you're on a local/custom OpenAI-compatible llama-server backend running a Qwen model behind it. That matters.
What your status means:
- Think: high = OpenClaw session is set to ask for more reasoning effort
- Reasoning: on = OpenClaw is allowed to show reasoning if the backend returns it
- it does not guarantee the backend actually emits a separate reasoning stream
And for local/custom OpenAI-compatible /v1 backends, the docs call out that OpenClaw does not apply OpenAI reasoning-compat payload shaping there. So with llama-server/..., if the server/model does not emit structured reasoning fields, OpenClaw has nothing to show.
So the short answer is:
- Your flags are set correctly
- The backend likely isn't exposing reasoning in a way OpenClaw can surface
- If you want to see what OpenClaw is doing, use:
  - /verbose on
  - or /verbose full
That shows tool/activity traces, which is usually what people mean by "planning/running".
Also, Reasoning: on in /status is just the visibility toggle, not proof that reasoning blocks are available.
Docs:
- https://docs.openclaw.ai/tools/thinking
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/concepts/messages
If you want, paste your models.providers.llama-server config block. I can tell you whether this is:
- a model capability issue,
- a llama-server transport issue,
- or a config issue like the model not being marked reasoning: true.
okay. now, is my embedding setup done correctly?
{
"agents": {
"defaults": {
"workspace": "/home/matias/.openclaw/workspace",
"model": {
"primary": "llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P"
},
"memorySearch": {
"provider": "local",
"local": {
"modelPath": "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
},
"extraPaths": [],
"sync": {
"onSearch": false,
"onSessionStart": true,
"watch": false
},
"store": {
"vector": {
"extensionPath": "/home/matias/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so",
"enabled": true
}
}
},
"compaction": {
"mode": "safeguard",
"reserveTokensFloor": 20000,
"notifyUser": true
},
"contextLimits": {
"toolResultMaxChars": 2000
},
"contextPruning": {
"mode": "cache-ttl",
"ttl": "5m"
}
}
},
"models": {
"mode": "merge",
"providers": {
"llama-server": {
"baseUrl": "http://127.0.0.1:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "Qwen3.6-35B-A3B-Uncensored.Q6_K_P",
"name": "Local llama.cpp model",
"reasoning": false,
"input": [
"text"
],
"contextWindow": 220000,
"maxTokens": 8192
}
]
}
}
},
"gateway": {
"mode": "local",
"auth": {
"mode": "token",
"token":
Mostly yes, your embedding config looks valid.
What looks right:
- memorySearch.provider: "local" ✅
- memorySearch.local.modelPath pointing at a GGUF embedding model ✅
- store.vector.enabled: true ✅
- sync.onSessionStart: true ✅
A few important notes though:
- reasoning: false is unrelated to embeddings
  - that only affects your chat model, not memory embeddings
- extraPaths: [] means you are not indexing extra docs
  - so memory search will mostly use the default memory roots
  - if you expected arbitrary notes/docs outside the default memory files, you're not there yet
- Your sync policy is conservative
  - onSessionStart: true
  - onSearch: false
  - watch: false
  This means updates will not be picked up live. If you edit memory files, you'll usually need a restart/new session or: openclaw memory index --force
- store.vector.extensionPath is probably unnecessary
  - valid, yes
  - but a bit fragile across upgrades
  - docs say to override it only if auto-discovery fails
So my verdict is:
- Config shape: yes, looks correct
- Will local embeddings work: probably yes
- Is it ideal: mostly, but I'd consider removing the hardcoded extensionPath unless you actually needed it
Best verification commands:
openclaw memory status --deep
openclaw memory index --force
openclaw memory search "some phrase you know exists"
If memory status --deep shows the local embedding model loaded and vector search healthy, you're good.
Docs:
- https://docs.openclaw.ai/reference/memory-config
- https://docs.openclaw.ai/concepts/memory-search
- https://docs.openclaw.ai/cli/memory
If you want, paste the output of openclaw memory status --deep and I'll sanity-check the whole thing.
what should I do with extrapaths? if I have a vast corpus of data in json files which have been indexed, how do I tell OpenClaw to use them? should I move them somewhere myself or will OpenClaw do it? what should I do now?
extraPaths is basically: "here are more files/directories OpenClaw should index itself."
A few important clarifications:
- OpenClaw will not move your corpus for you
- OpenClaw will not automatically reuse some other prebuilt JSON/vector index
- With the builtin memory engine, extraPaths is for Markdown files (.md) plus optional multimodal files, not raw .json
So if your corpus is a big pile of JSON files, the answer is:
What to do now
Best simple path
- Convert the JSON corpus into text/Markdown files
  - one file per record, document, chunk, etc.
- Put them in a directory, for example: /home/matias/openclaw-corpus/
- Add that directory to memorySearch.extraPaths
- Reindex: openclaw memory index --force
- Test: openclaw memory search "some known phrase"
Example:
{
agents: {
defaults: {
memorySearch: {
provider: "local",
local: {
modelPath: "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
},
extraPaths: ["/home/matias/openclaw-corpus"]
}
}
}
}
What extraPaths is good for
- notes outside the workspace
- docs folders
- exported knowledge bases
- Markdown archives you want searchable
What it is not
- a pointer to an existing JSON embedding store
- an automatic importer for arbitrary .json corpora
If your corpus is huge
I'd seriously consider QMD for the search backend, but even then the practical move is still to give it text/markdown-ish source files, not just raw JSON blobs.
So my recommendation is:
- Do not move files into memory/ manually unless you want them to become your main memory notes
- instead, make a separate corpus folder
- convert JSON → .md
- point extraPaths there
- run openclaw memory index --force
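For the conversion step, here is a rough sketch of what a one-file-per-record converter could look like. It assumes each .json file holds either one object or a list of objects, and that the fields are called title and text. Those field names are hypothetical, so adjust them to whatever your corpus actually uses.

```python
import json
from pathlib import Path


def record_to_markdown(record: dict, title_key: str = "title", text_key: str = "text") -> str:
    """Render one JSON record as a small Markdown document.

    title_key and text_key are hypothetical field names; adjust to your data.
    """
    title = str(record.get(title_key, "Untitled"))
    body = str(record.get(text_key, ""))
    # Keep the remaining fields as a bullet list so they stay searchable.
    extras = [f"- {k}: {v}" for k, v in record.items() if k not in (title_key, text_key)]
    parts = [f"# {title}", "", body]
    if extras:
        parts += ["", "## Metadata", *extras]
    return "\n".join(parts).strip() + "\n"


def convert_corpus(src_dir: Path, dst_dir: Path) -> int:
    """Write one .md file per record found in src_dir/*.json; return the count."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for json_path in sorted(src_dir.glob("*.json")):
        data = json.loads(json_path.read_text(encoding="utf-8"))
        records = data if isinstance(data, list) else [data]
        for i, record in enumerate(records):
            if not isinstance(record, dict):
                continue  # skip scalars or nested lists we do not know how to render
            out = dst_dir / f"{json_path.stem}-{i:04d}.md"
            out.write_text(record_to_markdown(record), encoding="utf-8")
            written += 1
    return written
```

You would call it as convert_corpus(Path("/path/to/json"), Path("/home/matias/openclaw-corpus")), then point extraPaths at the output folder. One small file per record keeps each chunk tied to a single source, which tends to make search hits easier to trace.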
Docs:
- https://docs.openclaw.ai/reference/memory-config#additional-memory-paths
- https://docs.openclaw.ai/concepts/memory-builtin
- https://docs.openclaw.ai/concepts/memory-qmd
If you want, I can help you design a good JSON → Markdown layout so retrieval quality doesn't suck.
I reindexed last night over 7000 files, not going to do that again as everything was supposed to be working properly, how do I know OpenClaw has actually learned something from it
Yeah, don't reindex 7000 files blind again.
The important distinction is:
OpenClaw does not "learn" them like training a model.
It indexes them into a searchable memory store.
So the question is really: did those files become retrievable?
How to check without reindexing
1. Check the index health/counts
openclaw memory status --deep --json
What you want to see:
- embedding provider = your local one
- vector search healthy
- non-zero files/chunks
- ideally sourceCounts that make sense
2. Probe with a known unique phrase
Pick a phrase that exists in exactly one of those files:
openclaw memory search "very specific phrase from one file" --json
If OpenClaw really indexed it, you should get:
- the matching snippet
- the source path
- a sensible score
3. Sanity-check with something impossible to fake
Search for:
- a weird UUID
- a rare error string
- a unique title
- a specific proper noun
That tells you way more than "reindex completed".
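If you don't have a unique phrase in mind, a small script can pick candidates for you. This is only a sketch: it scans the .md files under a folder you pass in and returns lines that occur in exactly one file. The min_len and limit parameters are arbitrary choices of mine, not anything OpenClaw defines.

```python
from collections import defaultdict
from pathlib import Path


def find_unique_lines(root: Path, min_len: int = 30, limit: int = 5) -> list[tuple[str, str]]:
    """Return up to `limit` (file, line) pairs whose text appears in exactly one .md file."""
    seen: dict[str, set[str]] = defaultdict(set)
    for md in sorted(root.rglob("*.md")):
        for line in md.read_text(encoding="utf-8", errors="replace").splitlines():
            text = line.strip()
            # Very short lines make poor probes, so skip them.
            if len(text) >= min_len:
                seen[text].add(str(md))
    unique = [(next(iter(files)), text) for text, files in seen.items() if len(files) == 1]
    return unique[:limit]
```

Feed one of the returned lines to openclaw memory search and check that the reported source path matches the file the script found it in.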
The big gotcha
If those 7000 files were mostly JSON, then with the builtin memory engine that is a problem.
Builtin memory indexing for extraPaths is basically:
- .md files ✅
- optional image/audio with multimodal ✅
- raw .json files ❌
So if the corpus was JSON, the painful possibility is:
- the reindex job ran
- but the builtin memory system did not actually ingest those JSON files as memory content
That's exactly why a targeted memory search test is the truth serum here.
What I'd do right now
Run these, in order:
openclaw memory status --deep --json
openclaw memory search "some unique phrase from the corpus" --json
If you paste those outputs, I can tell you very quickly whether:
- the index is healthy,
- the files are actually in it,
- or OpenClaw skipped the corpus entirely.
matias@matias-System-Product-Name:~$ openclaw memory search "trading 212" --json
{
"results": []
}
so I did all of it for nothing
That's frustrating, yeah.
Based on everything you've shown, the likely answer is:
for the JSON corpus, probably yes, the builtin memory path did not ingest it the way you expected.
Not because memory is broken in general, but because with your current setup:
- memorySearch.extraPaths is for files OpenClaw can index itself
- the builtin engine expects Markdown/text-like content
- raw .json corpus files are not the happy path there
So the empty result strongly suggests OpenClaw cannot currently recall that corpus.
A more precise version:
- if you indexed .md files too, those may still be fine
- but if the important 7k files were JSON, that work did not become useful memory search
What I'd do now
Best supported path
Convert the JSON corpus to .md or plain text, then point extraPaths at that folder and reindex once.
If the corpus is huge
Use QMD as the backend, but still feed it text/markdown documents, not raw JSON blobs.
If you want to keep the existing JSON/vector setup as-is
Then honestly, memorySearch is probably the wrong layer.
You'd want a custom tool/plugin that queries your existing store directly instead of trying to force it through builtin memory indexing.
My recommendation
Don't keep poking this current setup hoping it'll suddenly work.
Your clean options are:
- Convert JSON → Markdown/text and reindex (best practical fix)
- Switch to QMD for large corpora
- Build/query your existing index directly via a tool/plugin
If you want, I can help you design the fastest JSON → Markdown conversion format so you only do the painful reindex one more time, properly.
if I indexed it all last night, I guess it saved the stuff somewhere? how do I point the stuff
If you indexed it through OpenClaw, then yes, it should have saved into OpenClaw's own memory store.
For the builtin memory engine, that is usually:
~/.openclaw/memory/{agentId}.sqlite
So for the default agent, it's often roughly:
~/.openclaw/memory/main.sqlite
Important distinction
There are two different things you can "point":
- extraPaths → points OpenClaw at source files to index
- memorySearch.store.path → points OpenClaw at its own SQLite index file
So if you already indexed last night, you do not normally "point it again" unless:
- you changed agent IDs
- you changed config/store path
- or you're now searching with a different memory backend/config than the one used for indexing
Key insight
If the config is the same, you are already pointing at the saved index.
So if search is still empty, the likely issue is not "where is the DB?", it's one of these:
- the DB does not contain the files you expected
- those files were skipped
- or they were never valid inputs for builtin memory search, like raw JSON
What to check
openclaw memory status --deep --json
ls -lah ~/.openclaw/memory
What I'd look for:
- actual store path
- agent id
- file count
- chunk count
If you want to force a custom location for OpenClaw's own DB, that's this key:
agents: {
defaults: {
memorySearch: {
store: {
path: "/some/path/main.sqlite"
}
}
}
}
But that only works for OpenClaw's SQLite store, not some arbitrary external JSON/vector index.
So the short answer is:
- yes, it probably saved somewhere
- no, extraPaths is not how you point at the saved DB
- if you indexed with the same agent/config, OpenClaw should already be using it
Paste openclaw memory status --deep --json and I can tell you exactly whether you're hitting the right store or the wrong one.
yes theres around 7200 .md files in /home/matias/.openclaw/workspace/memory, step by step what to do now
Ah, okay, that changes things.
If the files are already .md under /home/matias/.openclaw/workspace/memory, then don't move anything and ignore extraPaths for now. Those are already in the default memory location.
Do this, step by step:
1. Verify the phrase actually exists
Run:
rg -n -i "trading 212|trading212" /home/matias/.openclaw/workspace/memory | head -20
Why:
- if this returns nothing, memory search is not the problem, the phrase just is not in those files in that form
2. Check what OpenClaw thinks the memory index contains
Run:
openclaw memory status --agent main --deep --json
What you want to see:
- non-zero files
- non-zero chunks
- a sane store path
- healthy embedding/vector status
3. If files or chunks are zero, the index is not actually populated
Then run:
openclaw memory status --agent main --deep --index --verbose
That's the least dumb next step because it only reindexes if the store is dirty, instead of blindly forcing everything again.
4. If status looks healthy, test with an exact phrase from step 1
Take one literal phrase from an rg hit and run:
openclaw memory search --agent main "EXACT PHRASE FROM ONE FILE" --json
Use something very specific, not a vague concept.
5. Interpret the result
- rg finds it, but memory search does not, and status shows plenty of files/chunks
  → that smells like an indexing/search bug or wrong store/agent mismatch
- rg does not find it
  → the search term is the issue
- status shows 0 files/chunks
  → the index was never built or is not being read
6. Paste back these two outputs
Please paste:
- openclaw memory status --agent main --deep --json
- one rg hit + the matching openclaw memory search ... --json
Then I can tell you exactly which layer is failing.
So short version: don't move files, don't touch extraPaths, verify the text exists, then check the index.
matias@matias-System-Product-Name:~$ rg -n -i "trading 212|trading212" /home/matias/.openclaw/workspace/memory | head -20
/home/matias/.openclaw/workspace/memory/investment-allocation-analysis-part2.md:66:I also have my card in Trading 212, where I have around 1 000-1 500 of balance.
what now
Good, that proves the content is really there.
Now do these next two commands only:
1. Check whether the memory DB actually has content
openclaw memory status --agent main --deep --json
2. Search for a more exact phrase from that line
openclaw memory search --agent main "card in Trading 212" --json
Why this matters:
- if status shows lots of files/chunks but the exact phrase still returns nothing, that's likely a real indexing/search bug or wrong-store mismatch
- if status shows 0 files/chunks, the DB is empty or not the one being used
Paste both outputs and I'll tell you the next move.
matias@matias-System-Product-Name:~$ openclaw memory status --agent main --deep --json
[
{
"agentId": "main",
"status": {
"backend": "builtin",
"files": 7268,
"chunks": 20083,
"dirty": false,
"workspaceDir": "/home/matias/.openclaw/workspace",
"dbPath": "/home/matias/.openclaw/memory/main.sqlite",
"provider": "local",
"model": "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf",
"requestedProvider": "local",
"sources": [
"memory"
],
"extraPaths": [
"/home/matias/openclaw-corpus"
],
returns the config
That's actually good news.
This confirms:
- OpenClaw is reading a real index
- it has 7268 files
- it has 20083 chunks
- it's using main.sqlite
- the index is not dirty
So this is not "nothing got saved".
It means the problem is now narrowed down to retrieval/search behavior, not missing files.
Next step
Run the exact-phrase search now:
openclaw memory search --agent main "I also have my card in Trading 212" --json
If that still comes back empty, then we've basically proven:
- the file is on disk
- the index exists
- but retrieval is failing for a literal hit
...and that starts looking like a real bug / search mismatch.
If it returns a hit, then the issue is just that "trading 212" was too weak / odd for the current retrieval path.
Paste that exact search output next.
{
"results": []
}
Yep, that output is normal-ish, and the part you pasted already tells us a lot.
We now know:
- the DB exists
- it has 7268 files / 20083 chunks
- OpenClaw is reading main.sqlite
So the next thing that matters is whether the two search engines are healthy:
- FTS (keyword/literal match)
- vector (embedding search)
Please run this:
openclaw memory status --agent main --deep --json | rg -n '"fts"|"vector"|error|available|loadError|fallback' -A3
What I'm looking for:
- fts.available: true
- vector.available: true
Why:
- if FTS is unavailable, exact string hits can fail
- if vector is unavailable, semantic search can fail
- if both are bad, you get exactly the behavior you're seeing
So we're very close now, paste that part next.
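If you'd rather not eyeball grep output, the same check can be sketched in Python. It assumes you saved the status JSON to a file first, and that the fts and vector sections each carry an available flag as described above. I'm not certain where those sections sit in the full output, so the helper searches the whole structure instead of hard-coding a path.

```python
import json
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def search_health(status: Any) -> dict[str, bool]:
    """Report whether each search engine section says it is available.

    The "fts"/"vector" section names and the "available" flag are assumptions
    taken from the grep pattern above, not a documented schema.
    """
    health = {}
    for engine in ("fts", "vector"):
        sections = [s for s in find_key(status, engine) if isinstance(s, dict)]
        # A missing section counts as unhealthy so it does not pass silently.
        health[engine] = bool(sections) and all(s.get("available") is True for s in sections)
    return health
```

Usage would be roughly search_health(json.load(open("status.json"))), where status.json is a hypothetical file holding only the JSON part of the status output.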
^Cmatias@matias-System-Product-Name:~$ openclaw memory status --agent main --deep --json | rg -n '"fts"|"vector"|error|available|loadError|fallback' -A3
[node-llama-cpp] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2898.00 MiB on device 0: cudaMalloc failed: out of memory
[node-llama-cpp] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 3038775808
[node-llama-cpp] graph_reserve: failed to allocate compute buffers
[node-llama-cpp] llama_init_from_model: failed to initialize the context: failed to allocate compute pp buffers
[node-llama-cpp] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 2448.00 MiB on device 0: cudaMalloc failed: out of memory
[node-llama-cpp] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 2566916608
[node-llama-cpp] graph_reserve: failed to allocate compute buffers
[node-llama-cpp] llama_init_from_model: failed to initialize the context: failed to allocate compute pp buffers
this is the issue, do I have to run this via node llama cpp, and why?
Yep, that's the issue.
Short version:
- You do not run it manually via node-llama-cpp
- OpenClaw is already using node-llama-cpp internally
- it does that because you configured memorySearch.provider: "local" with a GGUF embedding model
- your chosen embedding model is blowing up on GPU memory
So this config:
memorySearch: {
provider: "local",
local: {
modelPath: "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
}
}
means:
"OpenClaw, please load this local GGUF embedding model yourself."
And the local GGUF loader OpenClaw uses for that is node-llama-cpp.
Why it fails
Your logs show GPU OOM:
- allocating ~2.4 to 2.9 GiB
- cudaMalloc failed: out of memory
So the 4B Qwen embedding model is too heavy for the available VRAM in this path.
What to do now
Easiest fix
Use a smaller local embedding model.
The docs default is a much smaller local model:
embeddinggemma-300m-qat-Q8_0.gguf
So either:
- remove local.modelPath and let OpenClaw use its smaller default, or
- point it to a much smaller embedding GGUF than Qwen3-Embedding-4B-Q6_K.gguf
Important downside
If you change embedding provider/model, you will need a reindex for proper semantic search.
If you do not want node-llama-cpp involved at all
Then don't use:
memorySearch.provider: "local"
Use another embedding provider instead, for example:
- ollama
- openai
- gemini
- voyage
- mistral
My recommendation
Given where you are, I'd do this:
- switch to a smaller local embedding model
- restart/reload
- run openclaw memory status --agent main --deep
- then reindex once, properly
Docs:
- https://docs.openclaw.ai/reference/memory-config#local-embedding-config
- https://docs.openclaw.ai/concepts/memory-search
If you want, I can give you the exact minimal config change to stop using the 4B embedding model.
no I wont reindex, I just reindexed with 4B yesterday, does the embedding model have to be running along with openclaw all the time? cant I just run it via llama-server
Not all the time, but at query time too, yes.
That's the key bit:
- during indexing, OpenClaw embeds all your chunks
- during search, OpenClaw also has to embed your query
- so some embedding backend must be available whenever you search
With your current config
Because you set:
memorySearch.provider: "local"
OpenClaw tries to load the GGUF embedding model itself via node-llama-cpp.
So in your current setup:
- yes, OpenClaw needs to be able to load that embedding model on demand
- that's why node-llama-cpp shows up
- that's why GPU OOM is killing it
Can you use llama-server instead?
Yes, in principle.
But then you should not use provider: "local".
You'd switch to a remote/OpenAI-compatible embedding setup, for example:
memorySearch: {
provider: "openai",
model: "your-embedding-model-name",
remote: {
baseUrl: "http://127.0.0.1:8081/v1",
apiKey: "dummy"
}
}
That only works if your llama-server instance actually exposes a working /v1/embeddings endpoint for the embedding model.
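To check that before touching the OpenClaw config, you can hit the endpoint yourself. The sketch below uses the standard OpenAI-style embeddings request shape (POST to /embeddings with model and input, vectors returned under data[0].embedding). The port and model name you pass in are whatever your server uses; nothing here is OpenClaw-specific.

```python
import json
import urllib.request


def build_embedding_request(base_url: str, model: str, text: str, api_key: str = "dummy") -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /embeddings endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def embedding_dims(response_body: bytes) -> int:
    """Return the vector length from an OpenAI-style embeddings response."""
    data = json.loads(response_body)
    return len(data["data"][0]["embedding"])


def probe(base_url: str, model: str) -> int:
    """Send one short text and return the embedding dimension (needs a running server)."""
    req = build_embedding_request(base_url, model, "hello")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return embedding_dims(resp.read())
```

Something like probe("http://127.0.0.1:8081/v1", "your-embedding-model-name") should return a positive integer if the server really serves embeddings. An HTTP error or a response without a data field means the endpoint is not usable yet.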
The annoying truth
I don't want to bullshit you here:
If you change how OpenClaw sees the embedding provider/model, it may still want a reindex to be fully consistent.
So:
- current local config = no manual server, but node-llama-cpp must load the model
- llama-server config = no node-llama-cpp for embeddings, but only if llama-server serves embeddings properly
- switching transport/provider may still force a reindex
Docs:
- https://docs.openclaw.ai/reference/memory-config#remote-endpoint-config
- https://docs.openclaw.ai/reference/memory-config#local-embedding-config
If you want, I can give you the exact config shape for using llama-server as the embedding backend so you can at least test that path.
okay I will change to a 0.6b model, how do we make sure we can run it? any way to run it on CPU so it wont run out of memory?
Yes, there's a sane way to test this first.
Good news
A 0.6B embedding model is much more realistic.
Important caveat
I don't see an OpenClaw memorySearch config knob to force CPU-only for the built-in local embedding path.
From the current code path, local embeddings are loaded through node-llama-cpp with basically just the modelPath, not a CPU/GPU/offload setting exposed in OpenClaw config.
So:
- smaller model = yes, supported and sensible
- force CPU in OpenClaw local embeddings = I don't think there's a first-class config switch for that right now
How to make sure the 0.6B model runs
- Change memorySearch.local.modelPath to the 0.6B GGUF
- Run this: openclaw memory status --agent main --deep
- If it completes without CUDA OOM, the model is loadable in your setup
That command is the best quick probe because it tries the embedding backend without making you guess.
If you want CPU on purpose
The cleaner path is usually:
- run the embedding model in llama-server / Ollama / LM Studio configured the way you want
- then point OpenClaw at that remote embedding endpoint
- instead of using memorySearch.provider: "local"
That avoids the built-in node-llama-cpp local-loading path.
My recommendation
If you want the least drama:
- try the 0.6B model first
- if it still fights your GPU setup, switch to a remote embedding server
- don't keep wrestling the 4B local path
If you want, send me the exact 0.6B model path you plan to use and I'll tell you whether the config change looks right before you try it.
okay I am running it on llama-server under 127.0.0.1 8081
Nice, that's the cleaner path.
Then I'd stop using provider: "local" and point memory search at that server instead.
Config shape to try
Use something like:
{
agents: {
defaults: {
memorySearch: {
provider: "openai",
model: "YOUR_EMBEDDING_MODEL_ID",
remote: {
baseUrl: "http://127.0.0.1:8081/v1",
apiKey: "dummy"
}
}
}
}
}
Important
- remove the local.modelPath block when you do this
- YOUR_EMBEDDING_MODEL_ID should be whatever your llama-server exposes for the embedding model
- if llama-server does not require auth, a dummy API key is usually fine
First check
Make sure the server really exposes the embedding model:
curl http://127.0.0.1:8081/v1/models
If that looks right, then after updating config run:
openclaw memory status --agent main --deep
If that passes without the node-llama-cpp CUDA OOM spam, then OpenClaw is talking to llama-server for embeddings instead of trying to load the GGUF locally.
One warning
I want to be straight with you: if OpenClaw sees this as a different embedding provider/model setup than yesterday's local index, it may still want a reindex for full correctness.
But this is still the right next step, because first we need to make sure the embedding backend is stable and queryable.
If you paste the output of curl http://127.0.0.1:8081/v1/models, I can tell you the exact model: value to use.
572,"owned_by":"llamacpp","meta":{"vocab_type":2,"n_vocab":151665,"n_ctx_train":40960,"n_embd":2560,"n_params":4021774336,"size":3299735284}}]}
matias@matias-System-Product-Name:~$ openclaw memory status --agent openclaw memory status --agent main --deep
OpenClaw 2026.4.15 (041266a)
If something's on fire, I can't extinguish it, but I can write a beautiful postmortem.
Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-0.6B-f16.gguf
Sources: memory
Indexed: 7268/7271 files · 20083 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 7268/7271 files · 20083 chunks
Vector: ready
Vector dims: 4096
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (40012 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md
matias@matias-System-Product-Name:~$
Nice, this is actually healthy now.
You've got:
- Embeddings: ready
- Vector: ready
- FTS: ready
- 7268 files / 20083 chunks
- remote embedding model responding on llama-server
So the backend problem looks fixed.
Next test, and don't change anything else yet
Run this exact command:
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
Why this version:
- exact known phrase
- --min-score 0 removes score filtering as a reason for empty results
What it means
- if this now returns the file/snippet, you're good
- if this is still empty even with:
  - indexed files present
  - embeddings ready
  - vector ready
  - FTS ready
  - min-score forced to 0
  then I'd call that a real bug or stale-index mismatch, not user error
Paste that result next.
didn't find anything, the llama-server still appears to be working on something but I have no idea what
The llama-server activity is probably just OpenClaw asking for query embeddings. That part makes sense.
At this point, I'd stop guessing and inspect the DB directly, read-only.
Run these:
sqlite3 ~/.openclaw/memory/main.sqlite "select path from files where path like '%investment-allocation-analysis-part2%';"
sqlite3 ~/.openclaw/memory/main.sqlite "select count(*) from chunks where path like '%investment-allocation-analysis-part2%';"
sqlite3 ~/.openclaw/memory/main.sqlite "select path,start_line,end_line from chunks where text like '%Trading 212%' limit 5;"
How to read it
- if the first/second commands return nothing / 0
  → the file was not actually indexed into the DB
- if the third command returns rows with Trading 212
  → the chunk is in the DB, and memory search failing is a real search-layer bug
That's the cleanest next step without doing another reindex.
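If you want the read-only part enforced instead of just intended, the third query can be sketched with Python's sqlite3 module. The table and column names (chunks, path, start_line, end_line, text) are the ones used in the commands above; the helper name is mine.

```python
import sqlite3


def chunks_containing(db_path: str, phrase: str, limit: int = 5) -> list[tuple[str, int, int]]:
    """Return (path, start_line, end_line) for chunks whose text contains `phrase`."""
    # mode=ro opens the database read-only, so the index cannot be modified by accident.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT path, start_line, end_line FROM chunks WHERE text LIKE ? LIMIT ?",
            (f"%{phrase}%", limit),
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Pass an absolute path, since ~ is not expanded here (os.path.expanduser handles that). Note that % and _ inside the phrase act as LIKE wildcards.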
If those rows are present but openclaw memory search still returns [], I'd call that GitHub-issue territory:
https://github.com/openclaw/openclaw/issues
Paste the sqlite output and I'll tell you which side of the line you're on.
matias@matias-System-Product-Name:~$ sqlite3 ~/.openclaw/memory/main.sqlite "select path from files where path like '%investment-allocation-analysis-part2%';"
memory/investment-allocation-analysis-part2.md
matias@matias-System-Product-Name:~$ sqlite3 ~/.openclaw/memory/main.sqlite "select count(*) from chunks where path like '%investment-allocation-analysis-part2%';"
3
matias@matias-System-Product-Name:~$ sqlite3 ~/.openclaw/memory/main.sqlite "select path,start_line,end_line from chunks where text like '%Trading 212%' limit 5;"
memory/asenna-flask-part1.md|37|90
memory/asenna-flask-part1.md|82|108
memory/budjetin-sovelluksen-bugit-ja-korjaukset-part10.md|23|48
memory/budjetin-sovelluksen-bugit-ja-korjaukset-part10.md|40|73
memory/budjetin-sovelluksen-bugit-ja-korjaukset-part13.md|16|27
matias@matias-System-Product-Name:~$
what is the optimal way to run embedding with openclaw?
Your sqlite output is actually really useful.
It proves:
- the file is in the DB
- it has 3 chunks
- Trading 212 text is definitely indexed
So openclaw memory search returning [] is very likely not your fault. That looks like bug / retrieval mismatch territory.
For your actual question:
Optimal way to run embeddings with OpenClaw
My recommendation for your setup
Run embeddings as a separate embedding server and let OpenClaw call it over HTTP.
In your case, that means:
- keep the embedding model on llama-server at 127.0.0.1:8081
- use a small embedding model like 0.6B
- point memorySearch at it via remote.baseUrl
- keep chat model serving separate from embedding serving if possible
Why this is the best path for you
- avoids OpenClaw loading GGUF embeddings in-process via node-llama-cpp
- avoids GPU OOM from the local provider path
- lets you run the embedding server on CPU if you want
- makes the embedding backend easier to inspect independently
What I would avoid
I would not use:
memorySearch.provider: "local"
for a big embedding model unless you specifically want in-process loading and you know the box can handle it.
That path is the most fragile.
Practical ranking
Best overall reliability
- hosted embeddings (openai, voyage, gemini)
- least pain, most boring, works well
Best fully local for you
- llama-server / Ollama / LM Studio as a separate embedding server
- OpenClaw talks to it remotely
Worst for your current setup
- big GGUF + provider: "local" inside OpenClaw
Recommendation
For you, I'd settle on:
- memorySearch.provider: "openai"
- memorySearch.remote.baseUrl: "http://127.0.0.1:8081/v1"
- small embedding model on the server
- CPU if you want predictable memory behavior
Also, because the DB clearly has the data but memory search still comes back empty, I think this is worth filing as a bug:
https://github.com/openclaw/openclaw/issues
If you want, I can give you the exact config block I'd use as the final "good" embedding setup for your machine.
{
  "agents": {
    "defaults": {
      "workspace": "/home/matias/.openclaw/workspace",
      "model": {
        "primary": "llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P"
      },
      "memorySearch": {
        "provider": "openai",
        "model": "Qwen3-Embedding-0.6B-f16.gguf",
        "remote": {
          "baseUrl": "http://127.0.0.1:8081/v1",
          "apiKey": "dummy"
        }
      },
      "compaction": {
        "mode": "safeguard",
        "reserveTokensFloor": 20000,
        "notifyUser": true
      },
      "contextLimits": {
        "toolResultMaxChars": 2000
      },
      "contextPruning": {
        "mode": "cache-ttl",
        "ttl": "5m"
      }
    }
  },
  "models": {
    "mode": "merge",
    "providers": {
      "llama-server": {
        "baseUrl": "http://127.0.0.1:8080/v1",
        "api": "openai-completions",
        "models": [
          {
            "id": "Qwen3.6-35B-A3B-Uncensored.Q6_K_P",
            "name": "Local llama.cpp model",
            "reasoning": false,
            "input": [
              "text"
            ],
            "contextWindow": 220000,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
Yep, that config shape looks right for the setup I'd recommend.
What looks good
- chat model stays on your local llama-server at :8080
- embeddings move to the separate server at :8081
- no local.modelPath anymore, so OpenClaw stops trying to load the embedding GGUF itself
- embedding model is pinned explicitly
Tiny tweak I'd add
I'd make the embedding provider pin explicit:
memorySearch: {
  provider: "openai",
  fallback: "none",
  model: "Qwen3-Embedding-0.6B-f16.gguf",
  remote: {
    baseUrl: "http://127.0.0.1:8081/v1",
    apiKey: "dummy"
  }
}
That just avoids silent fallback weirdness.
Important note
At this point, your config is not the main problem anymore.
We already proved:
- the DB has the file
- the DB has the chunks
- the text is in the chunks
- memory search still returns []
So the remaining problem looks like search behavior / bug, not "you configured embeddings wrong".
My verdict
For your machine, this is the right final embedding architecture:
- chat model on one llama-server
- embedding model on another llama-server
- OpenClaw talks to both over HTTP
That's much better than big local GGUF embeddings through provider: "local".
If you want, I can help you write the smallest reproducible bug report for GitHub using the exact evidence you already collected.
I wanna test the embedding now, currently running main gguf on 8080 and embedding on 8081
Nice. I'd test it in layers:
1. Test the embedding server directly
Run this against 8081:
curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-0.6B-f16.gguf",
"input": "I also have my card in Trading 212"
}'
What you want:
- JSON response
- data[0].embedding present
- not an error
If you have jq, this is even nicer:
curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-0.6B-f16.gguf",
"input": "I also have my card in Trading 212"
}' | jq '.data[0].embedding | length'
That should print the embedding dimension.
2. Test OpenClaw's integration
Then run:
openclaw memory status --agent main --deep
What you want:
- Provider: openai
- Model: Qwen3-Embedding-0.6B-f16.gguf
- Embeddings: ready
- Vector: ready
- FTS: ready
3. Then test memory search again
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
How to interpret it
- if curl works and status says Embeddings: ready, then the embedding backend is fine
- if memory search still returns [], then the bug is above the embedding layer
So start with the raw /v1/embeddings call. Paste that output if you want and I'll check it.
matias@matias-System-Product-Name:~$ curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-0.6B-f16.gguf",
"input": "I also have my card in Trading 212"
}'
{"model":"Qwen3-Embedding-0.6B-f16.gguf","object":"list","usage":{"prompt_tokens":12,"total_tokens":12},"data":[{"embedding":[-0.0002532844664528966,-0.003442488145083189,0.017302660271525383,-0.03987465053796768,-0.0017
matias@matias-System-Product-Name:~$ curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-0.6B-f16.gguf",
"input": "I also have my card in Trading 212"
}' | jq '.data[0].embedding | length'
2560
Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-0.6B-f16.gguf
Sources: memory
Indexed: 7268/7271 files · 20083 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 7268/7271 files · 20083 chunks
Vector: ready
Vector dims: 4096
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (40012 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md
Aha, we found the smoking gun.
Your embedding server returns:
- 2560 dims from /v1/embeddings
But OpenClaw says the current memory index is:
- Vector dims: 4096
That is a hard mismatch.
What that means
Your current DB was indexed with a 4096-dim embedding model, but your live query embeddings are 2560-dim.
So OpenClaw is effectively trying to compare vectors from different spaces.
That explains why search is busted.
Your two real options
Option 1, no reindex
Use the same embedding model/dimensions that built the current DB.
So if you want to keep yesterday's index, your embedding backend must return 4096-dim vectors, not 2560.
Option 2, use the new 0.6B model
Then you need a full reindex, because the stored vectors must be regenerated in 2560 dims.
There isn't a safe shortcut around that.
My blunt recommendation
Since you said you do not want to reindex:
- do not switch to the 0.6B embedding model for this existing DB
- run the original embedding model remotely on 8081
- make sure it returns the same dims as the existing index, which appears to be 4096
Also, this is probably a bug
OpenClaw really should be surfacing this mismatch clearly instead of just failing silently.
If you want maintainer attention, this is a good GitHub issue:
https://github.com/openclaw/openclaw/issues
So the short answer is:
2560 query vectors + 4096 index vectors = broken search
No reindex means you must go back to the original 4096-dim embedding model.
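If it helps to make that check repeatable, here is a minimal sketch that pulls the "Vector dims" value out of status text and compares it with the live number. The helper names index_dims and dims_match are made up for illustration, and the sample values are just the ones seen in this thread.

```shell
#!/usr/bin/env bash
# Sketch: compare the live embedding dimension with the "Vector dims" line
# from `openclaw memory status --deep` output.

# Extract N from a "Vector dims: N" line in status text read from stdin.
# Prints nothing if the line is absent (for example, an empty index).
index_dims() {
  sed -n 's/^[[:space:]]*Vector dims:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds only when both dims are known and equal.
dims_match() {
  local live="$1" index="$2"
  [ -n "$live" ] && [ -n "$index" ] && [ "$live" -eq "$index" ]
}

# Example using the values from this thread.
status_text='Vector: ready
Vector dims: 4096
FTS: ready'
live=2560
index="$(printf '%s\n' "$status_text" | index_dims)"
if dims_match "$live" "$index"; then
  echo "ok: live=$live index=$index"
else
  echo "mismatch: live=$live index=$index"
fi
```

In real use, live would come from the curl | jq length check and the status text from openclaw memory status --agent main --deep.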
changed back to 4b model, how to try again
Perfect. Try it in this order:
1. Make sure the live embedding model matches the DB dims
curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-4B-Q6_K.gguf",
"input": "I also have my card in Trading 212"
}' | jq '.data[0].embedding | length'
You want this to print:
4096
If it does not print 4096, stop there.
2. Check OpenClaw sees the embedding backend as healthy
openclaw memory status --agent main --deep
You want:
- Embeddings: ready
- Vector: ready
- FTS: ready
- Vector dims: 4096
3. Retry the exact memory search
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
4. If that still returns []
Then we've ruled out:
- missing file
- missing chunks
- broken embedding server
- dim mismatch
…and it's basically a real OpenClaw bug.
Paste:
- the embedding length output
- the memory status --deep output
- the final memory search output
and I'll tell you the last conclusion.
matias@matias-System-Product-Name:~$ curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-4B-Q6_K.gguf",
"input": "I also have my card in Trading 212"
}' | jq '.data[0].embedding | length'
2560
matias@matias-System-Product-Name:~$
Yep, that settles it.
Even your 4B llama-server is returning:
- 2560 dims
But your current OpenClaw index is still:
- 4096 dims
So the remote llama-server path is not compatible with the existing DB, even with the 4B model name.
What that means
Same conclusion, just stronger now:
- your existing main.sqlite was built with a different embedding shape
- the remote llama-server embeddings, whether 0.6B or 4B, are giving you 2560
- so this DB cannot be queried correctly with that server
Real options now
Option A, realistic
Reindex using the llama-server embedding backend you want to keep.
That gives you a clean 2560-dim index matching the live server.
Option B, keep old DB
Find the exact original embedding runtime/model that produced the old 4096-dim vectors.
But based on what we've seen, that looks like the old local path that was causing OOM, so it's probably not practical.
Blunt truth
There isn't a magic bridge from:
- stored 4096-dim vectors
to
- live 2560-dim query vectors
So if you want the llama-server setup, I think reindex is now unavoidable.
I know that's annoying as hell, but at least now we know why.
If you want, I can help you make the reindex as safe as possible, for example:
- back up the old SQLite DB first
- confirm the live embedding dims before starting
- then do one clean rebuild and stop touching it after that
#!/bin/bash
/home/matias/llama.cpp/build/bin/llama-server \
-m "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf" \
--alias "Qwen3-Embedding-4B-Q6_K" \
{
  "agents": {
    "defaults": {
      "workspace": "/home/matias/.openclaw/workspace",
      "model": {
        "primary": "llama-server/Qwen3.6-35B-A3B-Uncensored.Q6_K_P"
      },
      "memorySearch": {
        "provider": "openai",
        "fallback": "none",
        "model": "Qwen3-Embedding-4B-Q6_K",
        "remote": {
          "baseUrl": "http://127.0.0.1:8081/v1",
          "apiKey": "dummy"
        }
      },
      "compaction": {
        "mode": "safeguard",
        "reserveTokensFloor": 20000,
        "notifyUser": true
      },
      "contextLimits": {
        "toolResultMaxChars": 2000
      },
      "contextPruning": {
        "mode": "cache-ttl",
        "ttl": "5m"
      }
    }
  },
  "models": {
    "mode": "merge",
    "providers": {
      "llama-server": {
        "baseUrl": "http://127.0.0.1:8080/v1",
        "api": "openai-completions",
        "models": [
          {
            "id": "Qwen3.6-35B-A3B-Uncensored.Q6_K_P",
            "name": "Local llama.cpp model",
            "reasoning": false,
            "input": [
              "text"
            ],
            "contextWindow": 220000,
            "maxTokens": 8192
          }
        ]
everything looks good, so it's loading the correct model?
The config shape looks good, yes.
But the alias alone does not prove the server is producing the right embeddings.
Important
--alias "Qwen3-Embedding-4B-Q6_K" only changes the model name OpenClaw sees.
It does not change:
- the actual model loaded
- the embedding dimensionality
- the vector space
So the real test is still this:
curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-4B-Q6_K",
"input": "I also have my card in Trading 212"
}' | jq '.data[0].embedding | length'
What you need
You need that command to print:
4096
If it prints 2560 again, then:
- yes, OpenClaw is probably calling the alias correctly
- but no, it is not compatible with your current DB
So:
- config looks fine
- alias is fine
- the real check is the embedding length
If you want, run that exact alias-based curl now and paste just the number it prints.
should I restart the servers and openclaw
Yes, I would, just to remove any stale-state weirdness.
Safest order
- Restart the embedding server on 8081
  - because you changed the embedding model / alias
- Restart OpenClaw
  - so it re-reads the updated memorySearch config cleanly
- You do not need to restart 8080
  - unless you changed the main chat model server too
Then test immediately
After restart, run this first:
curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-4B-Q6_K",
"input": "I also have my card in Trading 212"
}' | jq '.data[0].embedding | length'
If that is not 4096, stop there, because the restart didn't fix the real mismatch.
So short answer:
- restart 8081
- restart OpenClaw
- leave 8080 alone unless changed
still 2560
Yep, then that's definitive.
No more restarts or alias tweaks are going to fix this.
What we now know for sure
- your current DB expects 4096-dim vectors
- your llama-server embedding endpoint returns 2560-dim vectors
- therefore this server cannot query that DB correctly
So the options are now brutally simple:
Keep the current DB
You must use the exact old embedding runtime/model that produced 4096 dims.
Keep the current llama-server embedding setup
You must reindex so the DB is rebuilt with 2560-dim vectors.
There isn't a third option.
Short answer
If you want to stay on the 8081 llama-server embedding backend, reindex is unavoidable.
If you want, I can help you do it in the safest possible way:
- back up ~/.openclaw/memory/main.sqlite
- confirm the live embedding dims are 2560
- do one clean rebuild
- test the exact phrase again
That's the clean path from here.
do you think when I tried to reindex it and I closed mid indexing it caused this? I can reindex with the 4b model, how to do that
It could have contributed, yeah, especially the 7268/7271 partial-looking state.
But the bigger proven problem is still the one we measured:
- old DB state says 4096 dims
- your live 4B server returns 2560 dims
So even if the interrupted run made things messy, the clean fix is the same:
Reindex cleanly against the current 4B embedding server
1. Keep this config active
- memorySearch.provider: "openai"
- memorySearch.model: "Qwen3-Embedding-4B-Q6_K"
- memorySearch.remote.baseUrl: "http://127.0.0.1:8081/v1"
2. Make sure the live embedding server is up
You already tested it and it returns 2560. That's fine, as long as the reindex uses the same thing consistently.
3. Run a full rebuild
openclaw memory index --agent main --force --verbose
4. Let it finish
Don't close the terminal this time.
5. Verify the rebuilt index
openclaw memory status --agent main --deep
After a successful rebuild, I'd expect:
- Embeddings: ready
- Vector: ready
- FTS: ready
- Vector dims: 2560
6. Test the exact phrase again
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
Optional but smart
Before reindexing, back up the old DB:
cp ~/.openclaw/memory/main.sqlite ~/.openclaw/memory/main.sqlite.bak.$(date +%F-%H%M%S)
That way you can always go back.
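A slightly more defensive version of that backup, as a sketch: it also copies SQLite's -wal and -shm sidecar files if they happen to exist, and verifies each copy with cmp. Whether OpenClaw's DB actually runs in WAL mode is an assumption on my part, and backup_db is a made-up helper name. Run it only while nothing is writing to the DB.

```shell
#!/usr/bin/env bash
# Sketch: timestamped backup of a SQLite file plus any WAL/SHM sidecars,
# verified byte-for-byte with cmp.

backup_db() {
  local db="$1" stamp suffix
  stamp="$(date +%F-%H%M%S)"
  [ -f "$db" ] || { echo "no such file: $db" >&2; return 1; }
  for suffix in "" "-wal" "-shm"; do
    if [ -f "$db$suffix" ]; then
      # -p keeps timestamps and permissions on the copy.
      cp -p "$db$suffix" "$db$suffix.bak.$stamp" || return 1
      cmp -s "$db$suffix" "$db$suffix.bak.$stamp" || return 1
    fi
  done
  # Print the path of the main backup so it can be noted down.
  echo "$db.bak.$stamp"
}

# Usage (path taken from this thread):
# backup_db ~/.openclaw/memory/main.sqlite
```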
So yes, I think the interrupted indexing may have made things worse, but a clean full reindex with the current 4B server is the right fix now.
If you want, once you kick off openclaw memory index --agent main --force --verbose, paste the first few lines and I'll sanity-check that it's using the right backend.
is this the best way to go now? before I commit to the reindexing
reindexing now with 4B model
Good, let it run to completion.
When it finishes, run these two checks:
openclaw memory status --agent main --deep
You want Vector dims: 2560 now, since that's what your live 4B server is returning.
Then:
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
If either looks weird, paste it and I'll check it.
any way to do this faster/in parallel? also, in the future, how do I add new stuff I want to teach to openclaw? I suppose I am not meant to do this reindexing anymore?
Short answer:
Faster right now
Not really in a big generic way, not with this llama-server path.
The main reasons:
- you're using a normal OpenAI-compatible embedding endpoint, not a provider with a special bulk batch API
- the expensive part is still generating embeddings for lots of chunks
What does help:
- embedding cache, which you already have enabled
- not interrupting runs
- not force-reindexing unless you actually changed model/provider
- a smaller embedding model if you're okay with the tradeoff
I would not expect a big "parallelize this" win from OpenClaw for the generic llama-server route.
Future workflow, the normal way
No, you are not supposed to do full reindexing all the time.
That's the key idea.
Normal "teach OpenClaw" path
You add/update memory files, for example:
- durable facts in MEMORY.md
- longer notes/docs in memory/*.md
- external doc folders via memorySearch.extraPaths
Then OpenClaw should incrementally pick them up.
When to full reindex
Only for bigger shifts like:
- changing embedding model
- changing embedding provider
- changing chunking/index assumptions
- recovering from broken/stale index state
My recommendation going forward
For normal new knowledge
Just put it in:
- ~/.openclaw/workspace/MEMORY.md
- or ~/.openclaw/workspace/memory/*.md
and let OpenClaw ingest it incrementally.
For big imports
Drop the files in, then if you want to nudge it manually:
openclaw memory index --agent main
Use --force only when you really mean "rebuild everything".
Practical mental model
You are not "training" OpenClaw.
You are maintaining a memory index.
That means:
- add/edit notes normally
- let incremental indexing do the routine work
- reserve full rebuilds for major backend/model changes
So yes, after this cleanup, the goal should be: don't keep doing full reindexes.
just to clarify, what it is doing now is reading all those .md files and indexing them into the sqlite? where do these .md files originate from?
Yes, basically.
What it's doing now is:
- reading the Markdown files
- chunking them into smaller pieces
- embedding those chunks
- storing metadata + chunk text + vectors in the SQLite DB
  - your DB is ~/.openclaw/memory/main.sqlite
Where the .md files come from
For the builtin memory system, they usually come from your workspace:
- ~/.openclaw/workspace/MEMORY.md
- ~/.openclaw/workspace/memory/*.md
- ~/.openclaw/workspace/DREAMS.md for dreaming/diary-related stuff
- plus anything you intentionally added via memorySearch.extraPaths
Who creates them
A mix of:
- you, when you manually add notes/files
- OpenClaw, when it writes memory files during normal use
- automatic memory flush / dreaming / daily notes, depending on what features are enabled
- bulk imports, if you copied documents into workspace/memory or configured extra paths
So there is no hidden magical memory store before SQLite.
The source of truth is just Markdown on disk, and SQLite is the searchable index built from that.
That's actually a nice design, because it means:
- you can inspect the source files yourself
- back them up
- edit them manually
- and understand exactly what OpenClaw is indexing
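Since the source of truth is plain Markdown on disk, you can count it yourself and compare with the "Indexed: X/Y files" line from memory status. A minimal sketch, assuming the indexer walks the directory recursively and only picks up .md files (that part is a guess); count_md is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: count Markdown files under a directory, recursively.

count_md() {
  local dir="$1"
  # tr strips the padding some wc implementations add around the number.
  find "$dir" -type f -name '*.md' | wc -l | tr -d ' '
}

# Usage (path taken from this thread):
# count_md ~/.openclaw/workspace/memory
```

If the number is far off from the Y in "Indexed: X/Y files", that is a hint the index is looking at a different set of files than you think.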
please sum up the answer to the initial message and the conversation briefly
Brief summary:
- Why you couldn't see OpenClaw "thinking"
  - /think changes reasoning effort
  - /reasoning only shows reasoning if the backend actually returns it
  - /verbose is what shows tool/activity traces
  - with your local llama-server/... setup, OpenClaw was not guaranteed to expose reasoning blocks the way you expected
- What we found about memory/embeddings
  - your memory files were real and indexed into ~/.openclaw/memory/main.sqlite
  - the search DB definitely contained the target file and matching text
  - but your live embedding backend and the stored index got out of sync
- Root cause
  - the existing DB had 4096-dim vectors
  - your llama-server embedding endpoint was returning 2560-dim vectors
  - that mismatch breaks search
- Best fix
  - use a separate remote embedding server (your 8081 llama-server is fine)
  - choose the embedding model you actually want to keep
  - do one clean reindex against that exact setup
- Going forward
  - don't keep full reindexing for normal use
  - "teaching" OpenClaw usually just means adding/editing Markdown in:
    - MEMORY.md
    - memory/*.md
  - let incremental indexing handle routine updates
If this reindex finishes and exact phrase search still fails, then it's likely a real OpenClaw bug.
14:24:17 [memory] embeddings rate limited; retrying in 2131ms
14:24:18 [memory] embeddings: batch start
14:24:18 [memory] embeddings: batch start
14:24:19 [memory] embeddings: batch start
14:24:19 [memory] embeddings: batch start
14:24:19 [memory] embeddings: batch start
14:24:20 [memory] embeddings: batch start
Memory index failed (main): openai embeddings failed: 500 {"error":{"code":500,"message":"input (2071 tokens) is too large to process. increase the physical batch size (current batch size: 2048)","type":"server_error"}}
matias@matias-System-Product-Name:~$
matias@matias-System-Product-Name:~$
it probably did almost all of it, but this was at the end
It probably got most of the way through, but I would not treat that run as cleanly finished.
The important part is the fatal error:
- one embedding input hit 2071 tokens
- your llama-server embedding backend only accepts 2048
- so the rebuild stopped on that chunk
Best fix from the OpenClaw side
Lower the memory chunk size before rerunning.
Iād try:
memorySearch: {
  provider: "openai",
  fallback: "none",
  model: "Qwen3-Embedding-4B-Q6_K",
  remote: {
    baseUrl: "http://127.0.0.1:8081/v1",
    apiKey: "dummy"
  },
  chunking: {
    tokens: 250,
    overlap: 50
  }
}
That should keep chunks comfortably under your server's 2048-token limit.
Then rerun
openclaw memory index --agent main --force --verbose
Why this happened
OpenClaw's builtin memory chunking is only an estimate, and your embedding server has a stricter real token limit than the chunk estimate happened to hit.
So yes, the rate limited messages are annoying but not the main issue.
The real blocker is the 2048 token ceiling from the embedding server.
Alternative fix
You could also raise the embedding serverās allowed physical batch/token size on the llama.cpp side, but the simplest, safest fix is just smaller OpenClaw chunks.
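For that llama.cpp-side alternative, a rough sketch of what the launch arguments could look like. The model path and alias are the ones from your script; the port and the 4096 batch values are my assumptions, not tested settings, and a larger physical batch uses more memory. Flag spellings can differ between llama.cpp builds, so check llama-server --help first. embed_server_args is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: arguments for an embedding-only llama-server whose logical and
# physical batch sizes leave headroom above the 2071-token input that failed.

MODEL="/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"

# Prints one argument per line; the batch size defaults to 4096 (assumed).
embed_server_args() {
  local batch="${1:-4096}"
  printf '%s\n' \
    -m "$MODEL" \
    --alias "Qwen3-Embedding-4B-Q6_K" \
    --embeddings \
    --port 8081 \
    --batch-size "$batch" \
    --ubatch-size "$batch"
}

# Print the command instead of running it, so it can be reviewed first.
echo "/home/matias/llama.cpp/build/bin/llama-server $(embed_server_args | tr '\n' ' ')"
```

The error message talks about the physical batch size, which is what --ubatch-size controls. It is raised together with --batch-size here so the two stay consistent.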
If you want, I can give you the exact memorySearch block with the chunking settings merged into your current config.
no I will not run the whole process again, let's test if it works
Fair enough, let's just test it.
Run these two:
openclaw memory status --agent main --deep
and
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
What we're looking for
- if status now shows the new vector setup cleanly and the search returns the snippet/file, then you're probably okay
- if search is still empty, then the partial reindex did not fix it
Paste both outputs and we'll judge it from there, no more guessing.
matias@matias-System-Product-Name:~$ openclaw memory status --agent main --deep
🦞 OpenClaw 2026.4.15 (041266a)
Powered by open source, sustained by spite and good documentation.
Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-4B-Q6_K
Sources: memory
Indexed: 0/7271 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 0/7271 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md
matias@matias-System-Product-Name:~$ openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
llama-server is working on this, how long will it search?
Don't wait on that, you can cancel it.
These lines are the important ones:
- Indexed: 0/7271 files · 0 chunks
- Dirty: yes
So right now, there is nothing usable to search.
What likely happened
The failed reindex seems to have left you with:
- an empty memory DB
- a dirty/incomplete rebuild state
So the old searchable index is effectively gone.
Answer to your question
- a healthy search should take seconds, not sit there forever
- in your current state, waiting is pointless, because there are 0 chunks
What your options are now
If you made a backup of main.sqlite
Restore that backup.
If you did not make a backup
You need to complete a clean index build. There's no searchable index left right now.
I know that's annoying, but the status output is pretty definitive here.
So short version:
- cancel the current search
- 0 chunks means no search can work
- either restore the old DB or finish a successful reindex
oh my fuckin god this is painful, yes I have the old db available, HOW DO I USE THAT?
Yeah, that's brutal. I'm sorry, this got way more painful than it should have.
Important first
If you want to use the old DB, you need both:
- the old main.sqlite
- the old embedding setup that matches it
Restoring the DB alone will not fix search if you keep the current 2560-dim embedding backend.
Your old DB was 4096-dim.
So if you restore it while still using the current remote embedding server returning 2560, search will still be broken.
If your goal is "put me back where I was before this mess"
Do this:
1. Stop OpenClaw / stop any running memory index job
Make sure it isn't still writing the DB.
2. Back up the current broken DB
cp ~/.openclaw/memory/main.sqlite ~/.openclaw/memory/main.sqlite.broken.$(date +%F-%H%M%S)
3. Restore your old DB over main.sqlite
Replace /path/to/your/old/main.sqlite with the real backup path:
cp /path/to/your/old/main.sqlite ~/.openclaw/memory/main.sqlite
4. Restore the old embedding config/runtime too
If the old DB was built with the old local 4096-dim setup, you need to put that config back as well.
5. Restart OpenClaw
Then check:
openclaw memory status --agent main --deep
The blunt truth
There are two separate restore paths:
Path A, restore the old world
- old DB
- old embedding backend
- search should behave like before
Path B, keep the new llama-server embedding backend
- then the old DB is useless
- you need a clean reindex
So before you do anything, decide which you want:
- "restore old working state" → restore old DB + old embedding setup
- "stay on new 8081 embedding server" → old DB won't help, reindex required
If you want, send me:
- the path of your old DB backup
- whether you want old working state or new 8081 setup
and I'll tell you the exact next command.
Memory Search (main)
Provider: openai (requested: openai)
Model: Qwen3-Embedding-4B-Q6_K
Sources: memory
Indexed: 7268/7271 files · 20083 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: 0 3 * * * · limit=10 · minScore=0.8 · minRecallCount=3 · minUniqueQueries=3 · recencyHalfLifeDays=14 · maxAgeDays=30
Embeddings: ready
By source:
memory · 7268/7271 files · 20083 chunks
Vector: ready
Vector dims: 4096
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (40012 entries)
Batch: disabled (failures 0/2)
Recall store: 307 entries · 0 promoted · 307 concept-tagged · 0 spaced · scripts=307 latin
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-04-21T07:36:53.434Z
Dreaming artifacts: diary present · 1 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md
matias@matias-System-Product-Name:~$
old db, how do we use this now
Good, that means the old DB is back.
But to actually use that old DB, you need the embedding backend that matches it.
And your restored DB says it expects:
- Vector dims: 4096
So the rule is simple
To use this DB live, OpenClaw must query with a backend that also returns 4096-dim embeddings.
Important
If you keep the current remote openai/llama-server setup that was returning 2560, then this old DB is restored but still not usable for search.
So your next move is one of these
If you want to use the old DB
Put back the old embedding config, for example the old local path:
memorySearch: {
  provider: "local",
  local: {
    modelPath: "/home/matias/llama.cpp/models/qwen/embedding/Qwen3-Embedding-4B-Q6_K.gguf"
  }
}
Then restart OpenClaw and test:
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
If that old local path still OOMs
Then the old DB is basically just a backup, not a usable live index.
In that case:
- keep it safe
- and if you want the remote 8081 embedding setup, you'll need a clean reindex for that setup
Blunt version
- old DB restored ✅
- usable right now? only if you also restore a 4096-dim embedding backend
If you want, I can write the exact memorySearch block you should paste in for the old DB path versus the new 8081 path, so you can switch cleanly between them.
matias@matias-System-Product-Name:~$ openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
{
"results": []
}
^Cmatias@matias-System-Product-Name:~$ ^C
matias@matias-System-Product-Name:~$ openclaw memory search --agent main --query "Trading 212" --max-results 5 --min-score 0 --json
{
"results": []
}
using 4b model, why does it not work when I specifically used the 4B model to do this yesterday?
Because "4B model" is unfortunately not specific enough.
What matters is the exact embedding generator, not just the family/name.
You proved:
- the restored DB says Vector dims: 4096
- your current 4B llama-server endpoint returns 2560
So even though both are "Qwen 4B embedding", they are not producing the same embeddings.
In practice that means
Yesterday's index was built with something that produced 4096-dim vectors.
Today's live query path is producing 2560-dim vectors.
Those do not match, so search returns nothing.
Why this can happen
Because these all matter:
- provider/runtime (local node-llama-cpp vs llama-server OpenAI endpoint)
- exact model handling
- embedding output shape
- maybe server-side implementation details
So "same 4B file/model name" does not guarantee "same embedding space".
Blunt answer
It doesn't work because today's 4B backend is not equivalent to yesterday's 4B backend, even if the label makes it look like it should be.
And yes, I think OpenClaw should make this much clearer instead of quietly failing.
So the real rule is:
- if you want to use the old DB, you need the exact old 4096-dim embedding runtime
- if you want to use the new llama-server backend, you need a fresh index built by that backend
That's the actual reason.
okay can we try to see how we can achieve this 2560, how do I get this info with what command
Yep, compare these two numbers:
1. What the live embedding server returns
This is the authoritative one:
curl -s http://127.0.0.1:8081/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer dummy' \
-d '{
"model": "Qwen3-Embedding-4B-Q6_K",
"input": "test"
}' | jq '.data[0].embedding | length'
That prints the live query embedding dimension.
Right now for you, that has been 2560.
2. What the current OpenClaw DB expects
Use:
openclaw memory status --agent main --deep | rg 'Provider:|Model:|Vector dims:'
That prints the DB/index-side info, including:
- provider
- model
- Vector dims
If you want the DB-level value directly from SQLite
You can also try:
sqlite3 ~/.openclaw/memory/main.sqlite "select json_array_length(embedding) from chunks limit 5;"
If JSON functions are available, that will show the stored vector length in the DB.
The rule
To make search work, these need to match:
- curl ... | jq '.data[0].embedding | length'
- openclaw memory status ... → Vector dims: ...
If you want, paste the output of these two commands together and I'll tell you immediately whether the setup is consistent.
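If you'd rather script that comparison, something like this should work. index_dims and check_dims are made-up helper names, and the parsing assumes the status output contains a line of the form "Vector dims: 4096", as quoted earlier in this thread.

```shell
# Extract the number after "Vector dims:" from status text on stdin.
# Assumption: the line looks like "Vector dims: 4096".
index_dims() {
  sed -n 's/.*Vector dims:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Compare index-side ($1) and live ($2) dimensions; both are plain integers.
check_dims() {
  if [ "$1" = "$2" ]; then
    echo "OK: index and live embeddings are both $1-dim"
  else
    echo "MISMATCH: index expects $1, live endpoint returns $2"
    return 1
  fi
}

# Usage (not run here; needs the live server, jq, and the openclaw CLI):
#   live=$(curl -s http://127.0.0.1:8081/v1/embeddings \
#     -H 'Content-Type: application/json' \
#     -H 'Authorization: Bearer dummy' \
#     -d '{"model": "Qwen3-Embedding-4B-Q6_K", "input": "test"}' \
#     | jq '.data[0].embedding | length')
#   idx=$(openclaw memory status --agent main --deep | index_dims)
#   check_dims "$idx" "$live"
```

check_dims returns non-zero on a mismatch, so it can gate a reindex or a search in a larger script.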
returning 4096 with current model
Nice, that's the first actually promising result.
If the live embedding call is now 4096, then it finally matches the old DB's 4096.
Next two commands
openclaw memory status --agent main --deep | rg 'Provider:|Model:|Vector dims:|Indexed:|Dirty:'
and then:
openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
What this means
- if search works now, the issue really was the embedding mismatch
- if search is still empty, then we've ruled out the dimension mismatch too, and it starts looking like a real search bug
Paste those outputs.
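A rough way to script that decision, assuming the --json output keeps the shape shown earlier in this thread ({"results": [...]}). results_empty and diagnose are hypothetical helpers, not OpenClaw commands.

```shell
# Succeeds (exit 0) when the JSON on stdin has an empty "results" array.
# Whitespace is stripped first so pretty-printed output also matches.
results_empty() {
  tr -d '[:space:]' | grep -q '"results":\[\]'
}

# Hypothetical wrapper: map dims plus search output onto the cases above.
# $1 = index dims, $2 = live dims, stdin = search JSON.
diagnose() {
  if [ "$1" != "$2" ]; then
    echo "dimension mismatch ($1 vs $2): fix that first"
  elif results_empty; then
    echo "dims match but search is empty: likely a search-side problem"
  else
    echo "dims match and search returns hits: mismatch was the cause"
  fi
}
```

You would pipe the search output into it, e.g. openclaw memory search ... --json | diagnose 4096 4096, with the two numbers taken from the status and curl checks.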
compact this conversation and we will pick up in another thread please
Resume note for next thread:
- Original issue: you couldn't see OpenClaw "thinking". Main takeaway:
  - /think changes reasoning effort
  - /reasoning only shows reasoning if the backend returns it
  - /verbose is the better knob for seeing tool/activity traces
- Memory/embedding investigation:
  - ~/.openclaw/workspace/memory contains ~7.2k .md files
  - OpenClaw indexes Markdown from workspace memory files into ~/.openclaw/memory/main.sqlite
  - We verified the DB contained the target file and chunks via sqlite3
- Big problem discovered:
  - old DB expected 4096-dim vectors
  - new llama-server embedding endpoint at 127.0.0.1:8081 was at one point returning 2560
  - that mismatch explains broken search
  - later you said the current model is now returning 4096, which is promising
- Failed reindex:
  - a forced reindex later failed on embedding input size (2071 tokens > 2048)
  - that temporarily left the DB empty
  - you restored the old DB successfully
- Current desired setup:
  - main chat model on 127.0.0.1:8080
  - embedding model on 127.0.0.1:8081
  - current check from you: embedding endpoint now returns 4096 for the active model
- Last recommended verification steps:
  openclaw memory status --agent main --deep | rg 'Provider:|Model:|Vector dims:|Indexed:|Dirty:'
  openclaw memory search --agent main --query "I also have my card in Trading 212" --max-results 5 --min-score 0 --json
- Interpretation: