Error openai embeddings failed 403 | Friends of the Crustacean 🦞🤝 | Page 1

rigid oriole Apr 10, 2026, 2:33 AM

[memory] sync failed (search-bootstrap): Error: openai embeddings failed: 403 {"error":{"code":"unsupported_country_region_territory","message":"Country, region, or territory not supported","param":null,"type":"request_forbidden"}}

dreamy ospreyBOT Apr 10, 2026, 2:33 AM

lofty lanternBOT Apr 10, 2026, 2:34 AM

Let me verify the docs path and the relevant config before I answer.

That error is OpenAI rejecting embeddings from your region, not the memory system itself.

A couple things to check first:

openclaw --version
openclaw models status
openclaw memory status --deep

If you want the likely fix right away, set memory search to a non-OpenAI provider in openclaw.json:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "gemini" // or "local", "ollama", "voyage", "mistral", "bedrock"
      }
    }
  }
}

Why: in auto mode, OpenClaw prefers OpenAI before Gemini, so if an OpenAI key is present it may keep picking that and hit this 403.
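To make the "auto" behaviour concrete, here is a minimal Python sketch of preference-order selection. Only the OpenAI-before-Gemini ordering comes from the message above; `pick_provider`, `AUTO_PREFERENCE`, and the idea that "has a key configured" is the only test are assumptions for illustration, not OpenClaw's actual code.

```python
from typing import Optional, Set

# Assumed preference order for "auto" mode. Only "openai before gemini"
# is taken from the thread; this is not OpenClaw's real provider list.
AUTO_PREFERENCE = ["openai", "gemini"]


def pick_provider(requested: str, configured_keys: Set[str]) -> Optional[str]:
    """Hypothetical model of memorySearch provider selection."""
    if requested != "auto":
        # An explicit provider skips the auto preference order entirely.
        return requested
    for provider in AUTO_PREFERENCE:
        if provider in configured_keys:
            return provider  # first provider with a key wins
    return None  # nothing usable configured


# With both keys present, auto mode keeps landing on OpenAI (and its 403).
print(pick_provider("auto", {"openai", "gemini"}))    # openai
# Pinning the provider sidesteps the preference order.
print(pick_provider("gemini", {"openai", "gemini"}))  # gemini
```

This is why pinning `provider` explicitly is more predictable than auto mode when one provider is region-blocked.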

If you don't need semantic memory search at all, you can also disable it:

{
  agents: {
    defaults: {
      memorySearch: {
        enabled: false
      }
    }
  }
}

After changing the provider, I'd run:

openclaw memory index --force

Docs:

If you paste the output of the 3 commands above plus your memorySearch config block, I can tell you the exact setting to use.

rigid oriole Apr 10, 2026, 3:47 AM

even with
"agents": {
"defaults": {
"memorySearch": {
"provider": "local"
},
model fallback decision: decision=candidate_failed requested=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S candidate=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S reason=timeout next=google/gemini-3.1-flash-lite-preview detail=LLM request timed out.
11:42:18+08:00 [agent] embedded run agent end: runId=4be14072-117c-49ca-9328-012bcc7b7d63 isError=true model=gemini-3.1-flash-lite-preview provider=google error=LLM error: {
"error": {
"code": 400,
"message": "User location is not supported for the API use.",
"status": "FAILED_PRECONDITION"
}
} rawError={"error":{"message":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}\n","code":400,"status":""}}
11:42:18+08:00 [agent] unsupported thinking level for google/gemini-3.1-flash-lite-preview; retrying with off
11:42:19+08:00 [agent] embedded run agent end: runId=4be14072-117c-49ca-9328-012bcc7b7d63 isError=true model=gemini-3.1-flash-lite-preview provider=google error=LLM error: {
"error": {
"code": 400,
"message": "User location is not supported for the API use.",
"status": "FAILED_PRECONDITION"
}
} rawError={"error":{"message":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}\n","code":400,"status":""}}
11:42:19+08:00 [model-fallback] model fallback decision: decision=candidate_succeeded requested=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S candidate=google/gemini-3.1-flash-lite-preview reason=unknown next=none

lofty lanternBOT Apr 10, 2026, 3:47 AM

I’m checking the exact fallback config keys so I don’t hand you a made-up fix.

Yep, that second log comes from a different code path.

memorySearch.provider: "local" only affects memory embeddings.
The Gemini error you pasted comes from main-model fallback: your local chat model timed out, so OpenClaw tried a Google fallback.

So if you want to stay fully local, you likely need both:

{
  agents: {
    defaults: {
      model: {
        primary: "your-local-model",
        fallbacks: []
      },
      memorySearch: {
        provider: "local",
        fallback: "none"
      }
    }
  }
}

Also, local memory embeddings only work if node-llama-cpp is actually available. Verify with:

openclaw --version
openclaw models status
openclaw models fallbacks list
openclaw memory status --deep

What I’d expect:

models status / fallbacks list shows whether Google is still in the fallback chain
memory status --deep tells you whether local embeddings are really ready, or whether local memory is failing and trying to fall back

If you prefer the CLI to editing the config, you can also clear model fallbacks:

openclaw models fallbacks clear

Docs:

If you paste the output of those 4 commands, I can tell you exactly which fallback is still active.

rigid oriole Apr 10, 2026, 3:49 AM

Why does fallback happen? And what should I use for llama.cpp gemma-4-26B-A4B-it-UD-Q5_K_S?

lofty lanternBOT Apr 10, 2026, 3:50 AM

I’m checking the local-model docs so I can give you a sane recommendation instead of guessing.

Fallback happens because your primary model timed out, and OpenClaw treats timeout/overloaded errors as failover-worthy, so it moves to the next entry in agents.defaults.model.fallbacks.

So the chain is basically:

try primary
primary times out
try next fallback model
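A minimal Python sketch of that chain, assuming each model call either returns a reply or raises an error tagged with a reason. `LLMError`, `run_with_fallbacks`, `fake_call`, and the `"region"` reason label are hypothetical names; only the rule that timeout/overloaded errors trigger failover comes from the message above. The stub reproduces the path in your logs: llama.cpp times out, the request moves to Google, and Google rejects it.

```python
from typing import Callable, List, Sequence, Tuple

# Reasons the thread describes as failover-worthy.
FAILOVER_REASONS = {"timeout", "overloaded"}


class LLMError(Exception):
    """Hypothetical error type carrying a machine-readable reason."""

    def __init__(self, reason: str, message: str = ""):
        super().__init__(message or reason)
        self.reason = reason


def run_with_fallbacks(
    primary: str, fallbacks: Sequence[str], call: Callable[[str], str]
) -> Tuple[str, str]:
    """Try the primary model, then each fallback in order."""
    failures: List[str] = []
    for model in [primary, *fallbacks]:
        try:
            return model, call(model)
        except LLMError as err:
            failures.append(f"{model}: {err.reason}")
            if err.reason not in FAILOVER_REASONS:
                raise  # not failover-worthy: surface this error as-is
    raise LLMError("exhausted", "; ".join(failures))


def fake_call(model: str) -> str:
    """Stub standing in for real model calls, mimicking the logs above."""
    if model.startswith("llamacpp/"):
        raise LLMError("timeout", "LLM request timed out.")
    if model.startswith("google/"):
        raise LLMError("region", "User location is not supported for the API use.")
    return "ok"


# Google in the chain: the llama.cpp timeout is handed to Google, which fails.
try:
    run_with_fallbacks("llamacpp/gemma", ["google/gemini-3.1-flash-lite-preview"], fake_call)
except LLMError as err:
    print(err.reason)  # region

# Empty chain: the timeout is the final error and no remote call is made.
try:
    run_with_fallbacks("llamacpp/gemma", [], fake_call)
except LLMError as err:
    print(err.reason)  # exhausted
```

With `fallbacks` empty, the timeout surfaces directly instead of being handed to a remote provider, which is the effect clearing the fallback list is meant to have.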

If you do not want remote fallback at all, clear the fallback list:

openclaw models fallbacks clear

or in config:

{
  agents: {
    defaults: {
      model: {
        primary: "your-local-model",
        fallbacks: []
      }
    }
  }
}

For llama.cpp + that quantized Gemma build: honestly, I would not use it as your main OpenClaw agent model if you can avoid it. The docs strongly prefer:

Ollama for the easiest local setup
LM Studio + Responses API for the best current local stack
the largest/full-size model build you can run, not aggressive quantizations

That Gemma variant may be fine for experiments, but as a tool-using agent it’s more likely to be slow, time out, or get flaky.

If you insist on keeping a custom local OpenAI-compatible backend, the docs suggest trying this when local models choke on bigger OpenClaw prompts:

compat: {
  supportsTools: false
}

That can help if the backend/model is breaking on tool schemas, but of course it also means the model won’t do normal tool use.

My practical recommendation:

if you want simple local: use Ollama
if you want best local quality: use LM Studio

#

if you want no cloud fallback: set fallbacks: []

Docs:

If you want, paste your agents.defaults.model block plus openclaw --version, and I’ll rewrite it into a clean no-fallback local config.

rigid oriole Apr 10, 2026, 3:52 AM

I mean, which memorySearch provider should I use?
[agent] embedded run failover decision: runId=4be14072-117c-49ca-9328-012bcc7b7d63 stage=assistant decision=fallback_model reason=timeout provider=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S profile=-
11:42:16+08:00 [diagnostic] lane task error: lane=main durationMs=62537 error="FailoverError: LLM request timed out."
11:42:16+08:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=62539 error="FailoverError: LLM request timed out."

lofty lanternBOT Apr 10, 2026, 3:53 AM

Ah, got it: you mean the memorySearch provider, not the main chat model.

For your situation, I’d use a fully local memory provider:

Best choices

ollama if you already use Ollama
local if you want the built-in local embedding path

I would not use openai or gemini for memory search if your region is already rejecting them.

Recommended config

If you already have Ollama:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "ollama",
        fallback: "none"
      }
    }
  }
}

If you want built-in local embeddings:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
        fallback: "none"
      }
    }
  }
}

Important distinction

This log:

provider=llamacpp/gemma-...
reason=timeout

is not the memorySearch provider.
That is your main agent model timing out.

So there are really 2 separate things:

memorySearch provider → embeddings for memory recall
agent model → the LLM replying in chat

My practical recommendation

using Ollama already? → set memorySearch.provider = "ollama"
not using Ollama? → set memorySearch.provider = "local"
in both cases, set fallback: "none"

Then verify with:

openclaw memory status --deep

If local still fails there, it usually means local embeddings are not actually ready yet.
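The recommendation above is small enough to write down. A minimal Python sketch, with `memory_search_block` as a hypothetical helper name; it emits the same memorySearch block as the configs above, as plain JSON. The unquoted keys in those examples suggest a JSON5-style config, and JSON5 accepts plain JSON, but that is an assumption about OpenClaw's parser.

```python
import json


def memory_search_block(uses_ollama: bool) -> dict:
    """Hypothetical helper encoding the Ollama-vs-local recommendation."""
    provider = "ollama" if uses_ollama else "local"
    return {
        "agents": {
            "defaults": {
                # "none" in both cases, so a failing provider never falls
                # through to a region-blocked remote one.
                "memorySearch": {"provider": provider, "fallback": "none"}
            }
        }
    }


print(json.dumps(memory_search_block(uses_ollama=False), indent=2))
```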

Docs:

If you want, paste your current memorySearch block and I’ll tell you which of the two is better for your setup.

rigid oriole Apr 10, 2026, 3:53 AM

Why fallback: "none"? And I use llama.cpp.

lofty lanternBOT Apr 10, 2026, 3:54 AM

fallback: "none" means:

if the chosen memorySearch provider fails, do not try another provider

I suggested it because otherwise OpenClaw may fall through to a remote embeddings provider, and in your case that just hits the same region block again.

So for you:

using llama.cpp for the main chat model does not automatically make memory search use llama.cpp
there is no llamacpp memorySearch provider
the closest fully local memory option is:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
        fallback: "none"
      }
    }
  }
}

Why none:

provider: "local" = use local embeddings
fallback: "none" = if local embeddings are not ready, fail with a local error instead of trying OpenAI/Gemini/Voyage/etc.

That’s useful for you because remote fallback is exactly what you don’t want.
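Here is a minimal Python model of that difference, assuming `fallback` names a single alternate provider (or `"none"`) and that each provider is simply ready or not. `resolve_memory_provider` and the readiness map are hypothetical; OpenClaw's real resolution logic may check more than this.

```python
from typing import Dict


def resolve_memory_provider(provider: str, fallback: str, ready: Dict[str, bool]) -> str:
    """Hypothetical sketch of choosing an embeddings provider."""
    if ready.get(provider, False):
        return provider
    if fallback == "none":
        # Fail here, with a local error you can act on.
        raise RuntimeError(f"{provider} embeddings not ready; fallback disabled")
    if ready.get(fallback, False):
        # Falls through to the alternate provider, which for you means a
        # remote API that still rejects your region.
        return fallback
    raise RuntimeError(f"neither {provider} nor {fallback} embeddings are ready")


# Local embeddings broken, OpenAI key present: the fallback silently goes remote.
state = {"local": False, "openai": True}
print(resolve_memory_provider("local", "openai", state))  # openai
```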

So if you use llama.cpp, my answer is:

memorySearch provider: local
memorySearch fallback: "none"

Then check:

openclaw memory status --deep

If that still says local embeddings aren’t ready, then the problem is local embedding setup, not OpenAI.

Docs:

If you paste openclaw memory status --deep, I can tell you whether local embeddings are actually being picked up.

rigid oriole Apr 10, 2026, 3:55 AM

Fallbacks (9):

llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S
google/gemini-3.1-flash-lite-preview
ollama/gemma4:31b-cloud
openrouter/google/gemma-4-31b-it:free
google/gemma-4-31b-it
ollama/qwen3.5:397b-cloud
google/gemini-3-flash-preview
google/gemma-4-26b-a4b-it
ollama/qwen3-coder:480b-cloud

🦞 OpenClaw 2026.4.9 (0512059)
I'm the reason your shell history looks like a hacker-movie montage.

│
◇
[openclaw] Failed to start CLI: Error: Local embeddings unavailable.
Reason: optional dependency node-llama-cpp is missing (or failed to install).
Detail: Cannot find package 'node-llama-cpp' imported from /home/rr420/.npm-global/lib/node_modules/openclaw/dist/engine-embeddings-BjFfLn_y.js
To enable local embeddings:

Use Node 24 (recommended for installs/updates; Node 22 LTS, currently 22.14+, remains supported)
Reinstall OpenClaw (this should install node-llama-cpp): npm i -g openclaw@latest
If you use pnpm: pnpm approve-builds (select node-llama-cpp), then pnpm rebuild node-llama-cpp
Or set agents.defaults.memorySearch.provider = "openai" (remote).
Or set agents.defaults.memorySearch.provider = "gemini" (remote).
Or set agents.defaults.memorySearch.provider = "voyage" (remote).
Or set agents.defaults.memorySearch.provider = "mistral" (remote).
at createEmbeddingProvider (file:///home/rr420/.npm-global/lib/node_modules/openclaw/dist/manager-vTWIfqhx.js:393:19)

lofty lanternBOT Apr 10, 2026, 3:56 AM

Nice, this gives the real answer.

Your problem is not OpenAI anymore, it’s this:

Local embeddings unavailable.
node-llama-cpp is missing

So memorySearch.provider = "local" currently cannot work on that install.
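If you want to confirm the dependency is really absent before reinstalling, here is a rough Python approximation of how Node looks up a bare package name: it checks `node_modules/<name>` next to the importing file and in each parent directory. This ignores `package.json` `exports`, symlinked installs, and other resolution details; `find_node_package` is a hypothetical helper, and the install prefix below is an assumption based on the path in your error message.

```python
from pathlib import Path
from typing import Optional


def find_node_package(pkg: str, importer_dir: Path) -> Optional[Path]:
    """Rough sketch of Node's node_modules lookup for a bare specifier."""
    for directory in [importer_dir, *importer_dir.parents]:
        candidate = directory / "node_modules" / pkg
        if (candidate / "package.json").is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumed location, taken from the "imported from ..." path in the error.
    dist = Path.home() / ".npm-global" / "lib" / "node_modules" / "openclaw" / "dist"
    found = find_node_package("node-llama-cpp", dist)
    print(found or "node-llama-cpp not found")
```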

Also, those 9 fallbacks are your main model fallbacks, not memorySearch.

What this means

For memory search, your choices are now:

Option 1, best if you want built-in local memory

Keep:

memorySearch: {
  provider: "local",
  fallback: "none"
}

but fix the missing dependency:

npm i -g openclaw@latest

If you installed via pnpm/from source:

pnpm approve-builds
pnpm rebuild node-llama-cpp

Then verify:

openclaw memory status --deep

Option 2, easier if you already run local Ollama

Use:

memorySearch: {
  provider: "ollama",
  fallback: "none"
}

That avoids node-llama-cpp entirely.

Option 3, quickest temporary workaround

Disable memory search for now so the errors stop:

memorySearch: {
  enabled: false
}

My recommendation

Since OpenAI/Gemini are region-blocked for you:

If you already use local Ollama → use provider: "ollama"
If not → fix node-llama-cpp and stay with provider: "local"
keep fallback: "none" so it doesn’t jump to blocked remote providers

Your OpenClaw version looks fine; this is mainly an install/dependency issue.

Docs:

If this fixes it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it for the next person.

rigid oriole Apr 10, 2026, 4:01 AM

openclaw memory status --deep

🦞 OpenClaw 2026.4.9 (0512059)
Your task has been queued; your dignity has been deprecated.

│
◐ Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
◒ Probing embeddings…..Downloading to ~/.node-llama-cpp/models
✔ hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf downloaded 328.58MB in 22s
◐ Probing embeddings…...Downloaded to ~/.node-llama-cpp/models/hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf
◇
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 0/28 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 0/28 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json

?

lofty lanternBOT Apr 10, 2026, 4:03 AM

Short version:

Using local Ollama already?
use:

memorySearch: { provider: "ollama", fallback: "none" }

Not using Ollama?
use:
```
memorySearch: { provider: "local", fallback: "none" }
```
but first fix the missing node-llama-cpp install
Just want the errors to stop right now?
use:

memorySearch: { enabled: false }

Your current local setup is failing because node-llama-cpp is missing. So right now:

local = broken until fixed
openai / gemini = region blocked
ollama = easiest local workaround if you have it

If you want, paste whether you have Ollama installed yes/no, and I’ll give you the exact config block to use.

rigid oriole Apr 10, 2026, 4:05 AM

I have Ollama installed and working, but I'm not running a model with it. I use llama.cpp.
And:

warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"ee0fde1b-9d98-4e4d-91cd-8a34d0c76a1a","isError":true,"error":"LLM error: {\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}","failoverReason":null,"model":"gemini-3.1-flash-lite-preview","provider":"google","rawErrorPreview":"{"error":{"message":"{\n \"error\": {\n \"code\": 400,\n \"message\": \"User location is not supported for the API use.\",\n \"status\": \"FAILED_PRECONDITION\"\n }\n}\n","code":400,"status":""}}","rawErrorHash":"sha256:80e2ce331137","rawErrorFingerprint":"sha256:434b650ae971","providerErrorMessagePreview":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}"} embedded run agent end

lofty lanternBOT Apr 10, 2026, 4:05 AM

Yep, that’s the key distinction:

You can use:

llama.cpp for the main chat model
Ollama only for memorySearch

They do not have to be the same thing.

What’s happening now

You currently have two separate issues:

memorySearch
- provider: "local" is failing because node-llama-cpp is missing
- since Ollama works on your machine, easiest fix is:

   memorySearch: {
     provider: "ollama",
     fallback: "none"
   }

main model fallback
- your llama.cpp main model timed out
- OpenClaw then fell through to Google fallback
- Google is region-blocked for you, so that fails too

So the clean setup is:

{
  agents: {
    defaults: {
      model: {
        primary: "llamacpp/your-main-model",
        fallbacks: [] // or keep only local/non-Google fallbacks
      },
      memorySearch: {
        provider: "ollama",
        fallback: "none"
      }
    }
  }
}

In plain English

keep llama.cpp as your chat model if you want
use Ollama only for memory embeddings
remove Google from your model fallback chain, or clear fallbacks entirely
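If you would rather trim the chain than clear it, the filtering step is simple. A minimal Python sketch over the fallback list you pasted, keeping only entries whose provider prefix (the part before the first `/`) is on an allowlist; `filter_fallbacks` is a hypothetical helper, and the `llamacpp`-only allowlist is an example choice. You would still apply the result yourself in agents.defaults.model.fallbacks.

```python
from typing import Iterable, List, Set

# The fallback list from `openclaw models fallbacks list` earlier in the thread.
CURRENT_FALLBACKS = [
    "llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S",
    "google/gemini-3.1-flash-lite-preview",
    "ollama/gemma4:31b-cloud",
    "openrouter/google/gemma-4-31b-it:free",
    "google/gemma-4-31b-it",
    "ollama/qwen3.5:397b-cloud",
    "google/gemini-3-flash-preview",
    "google/gemma-4-26b-a4b-it",
    "ollama/qwen3-coder:480b-cloud",
]


def filter_fallbacks(fallbacks: Iterable[str], allowed: Set[str]) -> List[str]:
    """Keep entries whose provider prefix (text before the first '/') is allowed."""
    return [m for m in fallbacks if m.split("/", 1)[0] in allowed]


print(filter_fallbacks(CURRENT_FALLBACKS, {"llamacpp"}))
# ['llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S']
```

An allowlist is used rather than a "drop google/" blocklist because `openrouter/google/...` also goes to a remote service and would slip past a check on the `google/` prefix.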

If you want the fastest fix, I’d do these two things:

openclaw models fallbacks clear

and set:

memorySearch: { provider: "ollama", fallback: "none" }

That should stop both:

the OpenAI/Google region errors
the unwanted remote fallback after llama.cpp timeouts

If you paste your current agents.defaults block, I can rewrite it into a working config.

rigid oriole Apr 10, 2026, 4:07 AM

But why? Should I install it?
openclaw memory status --deep

🦞 OpenClaw 2026.4.9 (0512059)
Give me a workspace and I'll give you fewer tabs, fewer toggles, and more oxygen.

│
◒ Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
◇
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 0/28 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 0/28 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json

lofty lanternBOT Apr 10, 2026, 4:08 AM

Ah, nice, this changes it.

You do not need to install/fix local embeddings now.
That output says:

Provider: local
Embeddings: ready
Vector: ready
FTS: ready

So memorySearch local is working now. 🎉

The Vulkan line is just:

GPU path not compatible, falling back to CPU

That is not fatal, just slower.

What you should do now

1. Keep memorySearch on local

Since it’s already working, I’d keep:

memorySearch: {
  provider: "local",
  fallback: "none"
}

2. Build the index

Right now it says:

Indexed: 0/28 files
Dirty: yes

So run:

openclaw memory index --force

3. Separate issue: your main model fallback

The Google/Gemini region error is still from your main model fallback chain, not memorySearch.

So if you want to stop that part, remove remote Google fallbacks or clear fallbacks:

openclaw models fallbacks clear

Short version

memorySearch: now fine, use local
need install? no, not for memory anymore
next command: openclaw memory index --force
Google error: separate main-model fallback issue

If this fixes the memory side, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.

rigid oriole Apr 10, 2026, 4:16 AM

warn [node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU

lofty lanternBOT Apr 10, 2026, 4:17 AM

That warning is okay.

It means:

node-llama-cpp tried to use GPU/Vulkan
your system/binary combo is not compatible
so it fell back to CPU

So:

not fatal
memory search can still work
it may just be slower

Since your earlier openclaw memory status --deep showed Embeddings: ready, I’d treat this as informational unless performance is bad.

So the practical read is:

warning = CPU fallback
not the cause of the Google/OpenAI region errors
safe to ignore for now

If memory is still empty, the next real step is still:

openclaw memory index --force

rigid oriole Apr 10, 2026, 4:22 AM

Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
◇
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 28/28 files · 230 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 28/28 files · 230 chunks
Vector: ready
Vector dims: 768
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (230 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json

do i need openclaw memory index --force ?

lofty lanternBOT Apr 10, 2026, 4:22 AM

No, not anymore.

Your current status looks healthy:

Indexed: 28/28 files
Dirty: no
Embeddings: ready
Vector: ready
FTS: ready

So openclaw memory index --force is not needed now.

Use --force only if:

you changed the memory provider/model
status shows Dirty: yes
indexing is stuck/missing
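Those rules can be checked mechanically against the status output. A minimal Python sketch, assuming the `Indexed:`, `Dirty:` and `Model:` lines keep the format shown in your pastes; `parse_status` and `needs_force_reindex` are hypothetical helpers, and "indexing is stuck" is not something they can detect from a single snapshot.

```python
import re
from typing import Optional


def parse_status(text: str) -> dict:
    """Extract a few fields from `openclaw memory status --deep` output."""
    status: dict = {}
    m = re.search(r"Indexed:\s*(\d+)/(\d+) files", text)
    if m:
        status["indexed"], status["total"] = int(m.group(1)), int(m.group(2))
    m = re.search(r"Dirty:\s*(yes|no)", text)
    if m:
        status["dirty"] = m.group(1) == "yes"
    m = re.search(r"Model:\s*(\S+)", text)
    if m:
        status["model"] = m.group(1)
    return status


def needs_force_reindex(text: str, previous_model: Optional[str] = None) -> bool:
    """Hypothetical check mirroring the --force rules above."""
    s = parse_status(text)
    if s.get("dirty"):
        return True  # status shows Dirty: yes
    if "indexed" in s and s["indexed"] < s["total"]:
        return True  # some files were never indexed
    if previous_model is not None and s.get("model") != previous_model:
        return True  # embeddings model changed since the last index
    return False


healthy = """Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Indexed: 28/28 files · 230 chunks
Dirty: no"""
print(needs_force_reindex(healthy))  # False
```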
