# Error: OpenAI embeddings failed (403)
Let me verify the docs path and the relevant config before I answer.
That error is OpenAI rejecting embeddings from your region, not the memory system itself.
A couple of things to check first:
openclaw --version
openclaw models status
openclaw memory status --deep
If you want the likely fix right away, set memory search to a non-OpenAI provider in openclaw.json:
{
agents: {
defaults: {
memorySearch: {
provider: "gemini" // or "local", "ollama", "voyage", "mistral", "bedrock"
}
}
}
}
Why: in auto mode, OpenClaw prefers OpenAI over Gemini, so if an OpenAI key is present it may keep picking OpenAI and hitting this 403.
If you don't need semantic memory search at all, you can also disable it:
{
agents: {
defaults: {
memorySearch: {
enabled: false
}
}
}
}
After changing provider, I'd run:
openclaw memory index --force
Docs:
- https://docs.openclaw.ai/help/faq
- https://docs.openclaw.ai/concepts/memory-search
- https://docs.openclaw.ai/reference/memory-config
If you paste the output of the 3 commands above plus your memorySearch config block, I can tell you the exact setting to use.
Even with this config:
"agents": {
"defaults": {
"memorySearch": {
"provider": "local"
},
model fallback decision: decision=candidate_failed requested=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S candidate=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S reason=timeout next=google/gemini-3.1-flash-lite-preview detail=LLM request timed out.
11:42:18+08:00 [agent] embedded run agent end: runId=4be14072-117c-49ca-9328-012bcc7b7d63 isError=true model=gemini-3.1-flash-lite-preview provider=google error=LLM error: {
"error": {
"code": 400,
"message": "User location is not supported for the API use.",
"status": "FAILED_PRECONDITION"
}
} rawError={"error":{"message":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}\n","code":400,"status":""}}
11:42:18+08:00 [agent] unsupported thinking level for google/gemini-3.1-flash-lite-preview; retrying with off
11:42:19+08:00 [agent] embedded run agent end: runId=4be14072-117c-49ca-9328-012bcc7b7d63 isError=true model=gemini-3.1-flash-lite-preview provider=google error=LLM error: {
"error": {
"code": 400,
"message": "User location is not supported for the API use.",
"status": "FAILED_PRECONDITION"
}
} rawError={"error":{"message":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}\n","code":400,"status":""}}
11:42:19+08:00 [model-fallback] model fallback decision: decision=candidate_succeeded requested=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S candidate=google/gemini-3.1-flash-lite-preview reason=unknown next=none
I’m checking the exact fallback config keys so I don’t hand you a made-up fix.
Yep, that second log is a different path.
memorySearch.provider: "local" only affects memory embeddings.
The Gemini error you pasted comes from main model fallback: your local chat model timed out, so OpenClaw tried a Google fallback.
So if you want to stay fully local, you likely need both:
{
agents: {
defaults: {
model: {
primary: "your-local-model",
fallbacks: []
},
memorySearch: {
provider: "local",
fallback: "none"
}
}
}
}
Also, local memory embeddings only work if node-llama-cpp is actually available. Verify with:
openclaw --version
openclaw models status
openclaw models fallbacks list
openclaw memory status --deep
What I’d expect:
- models status / fallbacks list shows whether Google is still in the fallback chain
- memory status --deep tells you whether local embeddings are really ready, or whether local memory is failing and trying to degrade/fall back
If you prefer CLI instead of editing config, you can also clear model fallbacks:
openclaw models fallbacks clear
Docs:
- https://docs.openclaw.ai/concepts/model-failover
- https://docs.openclaw.ai/concepts/models
- https://docs.openclaw.ai/reference/memory-config
If you paste the output of those 4 commands, I can tell you exactly which fallback is still active.
Why does the fallback happen? And what should I use for llama.cpp gemma-4-26B-A4B-it-UD-Q5_K_S?
I’m checking the local-model docs so I can give you a sane recommendation instead of guessing.
Fallback happens because your primary model timed out, and OpenClaw treats timeout/overloaded errors as failover-worthy, so it moves to the next entry in agents.defaults.model.fallbacks.
So the chain is basically:
- try primary
- primary times out
- try next fallback model
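Here's a toy Python model of that chain, just to make the behavior concrete. It is not OpenClaw's code: LLMTimeout, run_with_failover and fake_backend are made-up names, and only timeouts trigger failover here (OpenClaw also treats overloaded errors that way).

```python
class LLMTimeout(Exception):
    """Hypothetical stand-in for a backend timeout ("LLM request timed out.")."""


def run_with_failover(primary, fallbacks, call_model):
    """Try primary, then each fallback in order; return (model, reply).

    Toy model of the failover chain, not OpenClaw's implementation.
    """
    last_error = None
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model)
        except LLMTimeout as err:
            # A timeout is treated as failover-worthy: move to the next entry.
            last_error = err
    # Every entry timed out (or fallbacks was empty): surface the error.
    raise last_error


def fake_backend(model):
    """Hypothetical backend where the local llama.cpp model always times out."""
    if model.startswith("llamacpp/"):
        raise LLMTimeout("LLM request timed out.")
    return f"reply from {model}"


# With a Google fallback, the request silently moves to Google.
print(run_with_failover("llamacpp/gemma", ["google/gemini"], fake_backend))

# With fallbacks: [], the timeout is reported instead of going remote.
try:
    run_with_failover("llamacpp/gemma", [], fake_backend)
except LLMTimeout as err:
    print("no fallback:", err)
```

With a Google entry in the list, a local timeout quietly becomes a Google request, which in your logs then fails with the region error; with an empty list the timeout is simply reported.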
If you do not want remote fallback at all, clear the fallback list:
openclaw models fallbacks clear
or in config:
{
agents: {
defaults: {
model: {
primary: "your-local-model",
fallbacks: []
}
}
}
}
For llama.cpp + that quantized Gemma build: honestly, I would not use it as your main OpenClaw agent model if you can avoid it. The docs strongly prefer:
- Ollama for the easiest local setup
- LM Studio + Responses API for the best current local stack
- the largest/full-size model build you can run, not aggressive quantizations
That Gemma variant may be fine for experiments, but as a tool-using agent it’s more likely to be slow, time out, or get flaky.
If you insist on keeping a custom local OpenAI-compatible backend, the docs suggest trying this when local models choke on bigger OpenClaw prompts:
compat: {
supportsTools: false
}
That can help if the backend/model is breaking on tool schemas, but of course it also means the model won’t do normal tool use.
My practical recommendation:
- if you want simple local: use Ollama
- if you want best local quality: use LM Studio
- if you want no cloud fallback: set fallbacks: []
Docs:
- https://docs.openclaw.ai/concepts/model-failover
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/help/faq
If you want, paste your agents.defaults.model block plus openclaw --version, and I’ll rewrite it into a clean no-fallback local config.
I mean, which memorySearch provider should I use?
[agent] embedded run failover decision: runId=4be14072-117c-49ca-9328-012bcc7b7d63 stage=assistant decision=fallback_model reason=timeout provider=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S profile=-
11:42:16+08:00 [diagnostic] lane task error: lane=main durationMs=62537 error="FailoverError: LLM request timed out."
11:42:16+08:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=62539 error="FailoverError: LLM request timed out."
Ah, got it, you mean memorySearch provider, not the main chat model.
For your situation, I’d use a fully local memory provider:
Best choices
- ollama if you already use Ollama
- local if you want the built-in local embedding path
I would not use openai or gemini for memory search if your region is already rejecting them.
Recommended config
If you already have Ollama:
{
agents: {
defaults: {
memorySearch: {
provider: "ollama",
fallback: "none"
}
}
}
}
If you want built-in local embeddings:
{
agents: {
defaults: {
memorySearch: {
provider: "local",
fallback: "none"
}
}
}
}
Important distinction
This log:
provider=llamacpp/gemma-...
reason=timeout
is not the memorySearch provider.
That is your main agent model timing out.
So there are really 2 separate things:
- memorySearch provider → embeddings for memory recall
- agent model → the LLM replying in chat
My practical recommendation
- using Ollama already? → set memorySearch.provider = "ollama"
- not using Ollama? → set memorySearch.provider = "local"
- in both cases, set fallback: "none"
Then verify with:
openclaw memory status --deep
If local still fails there, it usually means local embeddings are not actually ready yet.
Docs:
If you want, paste your current memorySearch block and I’ll tell you which of the two is better for your setup.
Why fallback: "none"? And I use llama.cpp.
fallback: "none" means:
if the chosen memorySearch provider fails, do not try another provider
I suggested it because otherwise OpenClaw may fall through to a remote embeddings provider, and in your case that just hits the same region block again.
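A toy model of that rule, assuming fallback is either "none" or another provider name. resolve_memory_provider, EmbeddingsUnavailable and the available map are hypothetical, not OpenClaw's API; "available" here just means the provider could start (for example node-llama-cpp is installed, or an API key is configured).

```python
class EmbeddingsUnavailable(Exception):
    """Hypothetical error raised when no embeddings provider can be used."""


def resolve_memory_provider(provider, fallback, available):
    """Toy model of picking the memorySearch embeddings provider.

    available maps provider name -> True if it could start
    (hypothetical check, e.g. node-llama-cpp installed or API key set).
    """
    if available.get(provider, False):
        return provider
    if fallback == "none":
        # Fail here instead of reaching for a remote provider.
        raise EmbeddingsUnavailable(f"{provider} embeddings not ready (fallback: none)")
    if available.get(fallback, False):
        return fallback
    raise EmbeddingsUnavailable(f"neither {provider} nor {fallback} is available")


# node-llama-cpp missing, but a Gemini key is configured.
available = {"local": False, "gemini": True}

# With a remote fallback, Gemini gets picked and then hits the region block.
print(resolve_memory_provider("local", "gemini", available))  # prints gemini

# With fallback "none", you get a clear local error instead.
try:
    resolve_memory_provider("local", "none", available)
except EmbeddingsUnavailable as err:
    print(err)
```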
So for you:
- using llama.cpp for the main chat model does not automatically make memory search use llama.cpp
- there is no llamacpp memorySearch provider
- the closest fully local memory option is:
{
agents: {
defaults: {
memorySearch: {
provider: "local",
fallback: "none"
}
}
}
}
Why "none":
- provider: "local" = use local embeddings
- fallback: "none" = if local embeddings are not ready, fail locally instead of trying OpenAI/Gemini/Voyage/etc.
That’s useful for you because remote fallback is exactly what you don’t want.
So if you use llama.cpp, my answer is:
- memorySearch provider: local
- memorySearch fallback: "none"
Then check:
openclaw memory status --deep
If that still says local embeddings aren’t ready, then the problem is local embedding setup, not OpenAI.
Docs:
If you paste openclaw memory status --deep, I can tell you whether local embeddings are actually being picked up.
Fallbacks (9):
- llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S
- google/gemini-3.1-flash-lite-preview
- ollama/gemma4:31b-cloud
- openrouter/google/gemma-4-31b-it:free
- google/gemma-4-31b-it
- ollama/qwen3.5:397b-cloud
- google/gemini-3-flash-preview
- google/gemma-4-26b-a4b-it
- ollama/qwen3-coder:480b-cloud
🦞 OpenClaw 2026.4.9 (0512059)
I'm the reason your shell history looks like a hacker-movie montage.
│
◇
[openclaw] Failed to start CLI: Error: Local embeddings unavailable.
Reason: optional dependency node-llama-cpp is missing (or failed to install).
Detail: Cannot find package 'node-llama-cpp' imported from /home/rr420/.npm-global/lib/node_modules/openclaw/dist/engine-embeddings-BjFfLn_y.js
To enable local embeddings:
- Use Node 24 (recommended for installs/updates; Node 22 LTS, currently 22.14+, remains supported)
- Reinstall OpenClaw (this should install node-llama-cpp): npm i -g openclaw@latest
- If you use pnpm: pnpm approve-builds (select node-llama-cpp), then pnpm rebuild node-llama-cpp
Or set agents.defaults.memorySearch.provider = "openai" (remote).
Or set agents.defaults.memorySearch.provider = "gemini" (remote).
Or set agents.defaults.memorySearch.provider = "voyage" (remote).
Or set agents.defaults.memorySearch.provider = "mistral" (remote).
at createEmbeddingProvider (file:///home/rr420/.npm-global/lib/node_modules/openclaw/dist/manager-vTWIfqhx.js:393:19)
Nice, this gives the real answer.
Your problem is not OpenAI anymore, it’s this:
Local embeddings unavailable.
node-llama-cpp is missing
So memorySearch.provider = "local" currently cannot work on that install.
Also, those 9 fallbacks are your main model fallbacks, not memorySearch.
What this means
For memory search, your choices are now:
Option 1, best if you want built-in local memory
Keep:
memorySearch: {
provider: "local",
fallback: "none"
}
but fix the missing dependency:
npm i -g openclaw@latest
If you installed via pnpm/from source:
pnpm approve-builds
pnpm rebuild node-llama-cpp
Then verify:
openclaw memory status --deep
Option 2, easier if you already run local Ollama
Use:
memorySearch: {
provider: "ollama",
fallback: "none"
}
That avoids node-llama-cpp entirely.
Option 3, quickest temporary workaround
Disable memory search for now so the errors stop:
memorySearch: {
enabled: false
}
My recommendation
Since OpenAI/Gemini are region-blocked for you:
- If you already use local Ollama → use provider: "ollama"
- If not → fix node-llama-cpp and stay with provider: "local"
- keep fallback: "none" so it doesn’t jump to blocked remote providers
Your OpenClaw version looks fine; this is mainly an install/dependency issue.
Docs:
If this fixes it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it for the next person.
openclaw memory status --deep
🦞 OpenClaw 2026.4.9 (0512059)
Your task has been queued; your dignity has been deprecated.
│
◐ Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
◒ Probing embeddings…..Downloading to ~/.node-llama-cpp/models
✔ hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf downloaded 328.58MB in 22s
◐ Probing embeddings…...Downloaded to ~/.node-llama-cpp/models/hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf
◇
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 0/28 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 0/28 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
?
Short version:
- Using local Ollama already? Use: memorySearch: { provider: "ollama", fallback: "none" }
- Not using Ollama? Use: memorySearch: { provider: "local", fallback: "none" }, but first fix the missing node-llama-cpp install
- Just want the errors to stop right now? Use: memorySearch: { enabled: false }
Your current local setup is failing because node-llama-cpp is missing. So right now:
- local = broken until fixed
- openai/gemini = region blocked
- ollama = easiest local workaround if you have it
If you want, tell me whether you have Ollama installed (yes/no), and I’ll give you the exact config block to use.
I have Ollama installed and working, but I don't run the model with it. I use llama.cpp.
And:
warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"ee0fde1b-9d98-4e4d-91cd-8a34d0c76a1a","isError":true,"error":"LLM error: {\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}","failoverReason":null,"model":"gemini-3.1-flash-lite-preview","provider":"google","rawErrorPreview":"{"error":{"message":"{\n \"error\": {\n \"code\": 400,\n \"message\": \"User location is not supported for the API use.\",\n \"status\": \"FAILED_PRECONDITION\"\n }\n}\n","code":400,"status":""}}","rawErrorHash":"sha256:80e2ce331137","rawErrorFingerprint":"sha256:434b650ae971","providerErrorMessagePreview":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}"} embedded run agent end
Yep, that’s the key distinction:
You can use:
- llama.cpp for the main chat model
- Ollama only for memorySearch
They do not have to be the same thing.
What’s happening now
You currently have two separate issues:
- memorySearch
  - provider: "local" is failing because node-llama-cpp is missing
  - since Ollama works on your machine, the easiest fix is:
memorySearch: {
provider: "ollama",
fallback: "none"
}
- main model fallback
  - your llama.cpp main model timed out
  - OpenClaw then fell through to Google fallback
  - Google is region-blocked for you, so that fails too
So the clean setup is
{
agents: {
defaults: {
model: {
primary: "llamacpp/your-main-model",
fallbacks: [] // or keep only local/non-Google fallbacks
},
memorySearch: {
provider: "ollama",
fallback: "none"
}
}
}
}
In plain English
- keep llama.cpp as your chat model if you want
- use Ollama only for memory embeddings
- remove Google from your model fallback chain, or clear fallbacks entirely
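If you'd rather trim the chain than clear it, here's a small Python sketch that drops every entry whose provider is google from the 9-entry fallback list pasted earlier in this thread. drop_provider is a made-up helper, and it assumes model refs look like "provider/model" with the provider before the first "/". It keeps openrouter/google/gemma-4-31b-it:free and the ollama ...-cloud entries, which still look like remote models, so fallbacks: [] remains the fully local option.

```python
# The fallback chain pasted earlier in this thread.
fallbacks = [
    "llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S",
    "google/gemini-3.1-flash-lite-preview",
    "ollama/gemma4:31b-cloud",
    "openrouter/google/gemma-4-31b-it:free",
    "google/gemma-4-31b-it",
    "ollama/qwen3.5:397b-cloud",
    "google/gemini-3-flash-preview",
    "google/gemma-4-26b-a4b-it",
    "ollama/qwen3-coder:480b-cloud",
]


def drop_provider(chain, blocked_provider):
    """Return chain without entries whose provider prefix equals blocked_provider.

    Hypothetical helper; assumes refs are "provider/model", provider before the first "/".
    """
    return [ref for ref in chain if ref.split("/", 1)[0] != blocked_provider]


for ref in drop_provider(fallbacks, "google"):
    print(ref)
```

You'd then paste the remaining refs into agents.defaults.model.fallbacks, or start from [] and add back only what you trust.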
If you want the fastest fix, I’d do these two things:
openclaw models fallbacks clear
and set:
memorySearch: { provider: "ollama", fallback: "none" }
That should stop both:
- the OpenAI/Google region errors
- the unwanted remote fallback after llama.cpp timeouts
If you paste your current agents.defaults block, I can rewrite it into a working config.
But why? Should I install it?
openclaw memory status --deep
🦞 OpenClaw 2026.4.9 (0512059)
Give me a workspace and I'll give you fewer tabs, fewer toggles, and more oxygen.
│
◒ Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
◇
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 0/28 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 0/28 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Ah, nice, this changes it.
You do not need to install/fix local embeddings now.
That output says:
- Provider: local
- Embeddings: ready
- Vector: ready
- FTS: ready
So memorySearch local is working now. 🎉
The Vulkan line is just:
GPU path not compatible, falling back to CPU
That is not fatal, just slower.
What you should do now
1. Keep memorySearch on local
Since it’s already working, I’d keep:
memorySearch: {
provider: "local",
fallback: "none"
}
2. Build the index
Right now it says:
Indexed: 0/28 files
Dirty: yes
So run:
openclaw memory index --force
3. Separate issue: your main model fallback
The Google/Gemini region error is still from your main model fallback chain, not memorySearch.
So if you want to stop that part, remove remote Google fallbacks or clear fallbacks:
openclaw models fallbacks clear
Short version
- memorySearch: now fine, use local
- need install? No, not for memory anymore
- next command: openclaw memory index --force
- Google error: separate main-model fallback issue
warn [node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
That warning is okay.
It means:
- node-llama-cpp tried to use GPU/Vulkan
- your system/binary combo is not compatible
- so it fell back to CPU
So:
- not fatal
- memory search can still work
- it may just be slower
Since your earlier openclaw memory status --deep showed Embeddings: ready, I’d treat this as informational unless performance is bad.
So the practical read is:
- warning = CPU fallback
- not the cause of the Google/OpenAI region errors
- safe to ignore for now
If memory is still empty, the next real step is still:
openclaw memory index --force
Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
◇
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 28/28 files · 230 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 28/28 files · 230 chunks
Vector: ready
Vector dims: 768
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (230 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Do I need openclaw memory index --force?
No, not anymore.
Your current status looks healthy:
- Indexed: 28/28 files
- Dirty: no
- Embeddings: ready
- Vector: ready
- FTS: ready
So openclaw memory index --force is not needed now.
Use --force only if:
- you changed the memory provider/model
- status shows Dirty: yes
- indexing is stuck/missing
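If you want to script that check, here's a minimal Python sketch that reads the text of openclaw memory status --deep and flags the two cases visible in the output (Dirty: yes, or fewer files indexed than found). needs_reindex is a hypothetical helper that assumes the line format shown in the status output above; it can't detect a provider/model change, so that case still needs a manual --force.

```python
import re


def needs_reindex(status_text):
    """Return True if memory status text suggests re-indexing.

    Hypothetical helper: only checks "Dirty: yes" and "Indexed: N/M files" with N < M.
    """
    dirty = re.search(r"^\s*Dirty:\s*(yes|no)\b", status_text, re.MULTILINE)
    indexed = re.search(r"^\s*Indexed:\s*(\d+)/(\d+) files", status_text, re.MULTILINE)
    if dirty and dirty.group(1) == "yes":
        return True
    if indexed and int(indexed.group(1)) < int(indexed.group(2)):
        return True
    return False


before = "Indexed: 0/28 files · 0 chunks\nDirty: yes\n"
after = "Indexed: 28/28 files · 230 chunks\nDirty: no\n"
print(needs_reindex(before), needs_reindex(after))  # prints: True False
```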