#Error openai embeddings failed 403


rigid oriole
#

[memory] sync failed (search-bootstrap): Error: openai embeddings failed: 403 {"error":{"code":"unsupported_country_region_territory","message":"Country, region, or territory not supported","param":null,"type":"request_forbidden"}}

dreamy ospreyBOT
lofty lanternBOT
#

Let me verify the docs path and the relevant config before I answer.

#

That error is OpenAI rejecting embeddings from your region, not the memory system itself.

A couple things to check first:

openclaw --version
openclaw models status
openclaw memory status --deep

If you want the likely fix right away, set memory search to a non-OpenAI provider in openclaw.json:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "gemini" // or "local", "ollama", "voyage", "mistral", "bedrock"
      }
    }
  }
}

Why: in auto mode, OpenClaw prefers OpenAI before Gemini, so if an OpenAI key is present it may keep picking that and hit this 403.

If you don't need semantic memory search at all, you can also disable it:

{
  agents: {
    defaults: {
      memorySearch: {
        enabled: false
      }
    }
  }
}

After changing provider, I'd run:

openclaw memory index --force


rigid oriole
#

even with
"agents": {
"defaults": {
"memorySearch": {
"provider": "local"
},
model fallback decision: decision=candidate_failed requested=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S candidate=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S reason=timeout next=google/gemini-3.1-flash-lite-preview detail=LLM request timed out.
11:42:18+08:00 [agent] embedded run agent end: runId=4be14072-117c-49ca-9328-012bcc7b7d63 isError=true model=gemini-3.1-flash-lite-preview provider=google error=LLM error: {
"error": {
"code": 400,
"message": "User location is not supported for the API use.",
"status": "FAILED_PRECONDITION"
}
} rawError={"error":{"message":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}\n","code":400,"status":""}}
11:42:18+08:00 [agent] unsupported thinking level for google/gemini-3.1-flash-lite-preview; retrying with off
11:42:19+08:00 [agent] embedded run agent end: runId=4be14072-117c-49ca-9328-012bcc7b7d63 isError=true model=gemini-3.1-flash-lite-preview provider=google error=LLM error: {
"error": {
"code": 400,
"message": "User location is not supported for the API use.",
"status": "FAILED_PRECONDITION"
}
} rawError={"error":{"message":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}\n","code":400,"status":""}}
11:42:19+08:00 [model-fallback] model fallback decision: decision=candidate_succeeded requested=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S candidate=google/gemini-3.1-flash-lite-preview reason=unknown next=none

lofty lanternBOT
#

I’m checking the exact fallback config keys so I don’t hand you a made-up fix.

#

Yep, that second log is a different path.

memorySearch.provider: "local" only affects memory embeddings.
The Gemini error you pasted is from main model fallback, because your local chat model timed out, then OpenClaw tried a Google fallback.

So if you want to stay fully local, you likely need both:

{
  agents: {
    defaults: {
      model: {
        primary: "your-local-model",
        fallbacks: []
      },
      memorySearch: {
        provider: "local",
        fallback: "none"
      }
    }
  }
}

Also, local memory embeddings only work if node-llama-cpp is actually available. Verify with:

openclaw --version
openclaw models status
openclaw models fallbacks list
openclaw memory status --deep
#

What I’d expect:

  • models status / fallbacks list shows whether Google is still in the fallback chain
  • memory status --deep tells you whether local embeddings are really ready, or whether local memory is failing and trying to degrade or fall back

If you prefer CLI instead of editing config, you can also clear model fallbacks:

openclaw models fallbacks clear


If you paste the output of those 4 commands, I can tell you exactly which fallback is still active.

rigid oriole
#

Why does fallback happen? And what should I use for llama.cpp gemma-4-26B-A4B-it-UD-Q5_K_S?

lofty lanternBOT
#

I’m checking the local-model docs so I can give you a sane recommendation instead of guessing.

#

Fallback happens because your primary model timed out, and OpenClaw treats timeout/overloaded errors as failover-worthy, so it moves to the next entry in agents.defaults.model.fallbacks.

So the chain is basically:

  1. try primary
  2. primary times out
  3. try next fallback model
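
As a rough illustration of that chain, here is a minimal sketch. It is not OpenClaw's actual code: `ModelTimeout`, `run_with_fallbacks`, and `fake_call` are hypothetical names invented for this example, and it only models failover on timeout.

```python
from typing import Callable, Sequence, Tuple


class ModelTimeout(Exception):
    """Hypothetical stand-in for a timed-out LLM request."""


def run_with_fallbacks(
    primary: str,
    fallbacks: Sequence[str],
    call: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the primary model, then each fallback in order, moving on after a timeout."""
    last_error: Exception = ModelTimeout("no models configured")
    for model in [primary, *fallbacks]:
        try:
            return model, call(model)
        except ModelTimeout as err:
            last_error = err  # remember the failure and try the next candidate
    # Every candidate timed out, so surface the last failure.
    raise last_error


def fake_call(model: str) -> str:
    """Pretend backend (assumption): any llamacpp/ model times out, others reply."""
    if model.startswith("llamacpp/"):
        raise ModelTimeout(f"{model}: LLM request timed out")
    return f"reply from {model}"
```

With an empty `fallbacks` list, the timeout is raised to the caller instead of the request moving on to a remote model, which is the effect an empty fallback list is meant to have.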

If you do not want remote fallback at all, clear the fallback list:

openclaw models fallbacks clear

or in config:

#
{
  agents: {
    defaults: {
      model: {
        primary: "your-local-model",
        fallbacks: []
      }
    }
  }
}

For llama.cpp + that quantized Gemma build: honestly, I would not use it as your main OpenClaw agent model if you can avoid it. The docs strongly prefer:

  • Ollama for the easiest local setup
  • LM Studio + Responses API for the best current local stack
  • the largest/full-size model build you can run, not aggressive quantizations

That Gemma variant may be fine for experiments, but as a tool-using agent it’s more likely to be slow, time out, or get flaky.

If you insist on keeping a custom local OpenAI-compatible backend, the docs suggest trying this when local models choke on bigger OpenClaw prompts:

compat: {
  supportsTools: false
}

That can help if the backend/model is breaking on tool schemas, but of course it also means the model won’t do normal tool use.

My practical recommendation:

  • if you want simple local: use Ollama
  • if you want best local quality: use LM Studio
rigid oriole
#

I mean, which memorySearch provider should I use?
[agent] embedded run failover decision: runId=4be14072-117c-49ca-9328-012bcc7b7d63 stage=assistant decision=fallback_model reason=timeout provider=llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S profile=-
11:42:16+08:00 [diagnostic] lane task error: lane=main durationMs=62537 error="FailoverError: LLM request timed out."
11:42:16+08:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=62539 error="FailoverError: LLM request timed out."

lofty lanternBOT
#

Ah, got it, you mean memorySearch provider, not the main chat model.

For your situation, I’d use a fully local memory provider:

Best choices

  1. ollama if you already use Ollama
  2. local if you want the built-in local embedding path

I would not use openai or gemini for memory search if your region is already rejecting them.

Recommended config

If you already have Ollama:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "ollama",
        fallback: "none"
      }
    }
  }
}

If you want built-in local embeddings:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
        fallback: "none"
      }
    }
  }
}

Important distinction

This log:

provider=llamacpp/gemma-...
reason=timeout

is not the memorySearch provider.
That is your main agent model timing out.

So there are really 2 separate things:

  • memorySearch provider → embeddings for memory recall
  • agent model → the LLM replying in chat
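
One way to tell the two apart when reading logs is the bracketed tag at the start of each message. The sketch below is only an illustration: the tag-to-area mapping is an assumption drawn from the log lines pasted in this thread, and `area_of` is a hypothetical helper, not an OpenClaw function.

```python
import re

# Bracketed tags seen in this thread's logs, mapped to the part of the setup
# they appear to belong to. This mapping is an assumption, not documented behavior.
TAG_TO_AREA = {
    "memory": "memorySearch (embeddings)",
    "agent": "agent model",
    "model-fallback": "agent model",
}


def area_of(log_line: str) -> str:
    """Classify a log line by its first [tag]; return 'unknown' if unrecognized."""
    match = re.search(r"\[([a-z-]+)\]", log_line)
    if match is None:
        return "unknown"
    return TAG_TO_AREA.get(match.group(1), "unknown")
```

Applied to this thread, the original `[memory] sync failed ...` line lands in the memorySearch bucket, while the `[agent]` and `[model-fallback]` lines land in the agent-model bucket.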

My practical recommendation

  • using Ollama already? → set memorySearch.provider = "ollama"
  • not using Ollama? → set memorySearch.provider = "local"
  • in both cases, set fallback: "none"

Then verify with:

rigid oriole
#

Why fallback: "none"? And I use llama.cpp.

lofty lanternBOT
#

fallback: "none" means:

if the chosen memorySearch provider fails, do not try another provider

I suggested it because otherwise OpenClaw may fall through to a remote embeddings provider, and in your case that just hits the same region block again.

So for you:

  • using llama.cpp for the main chat model does not automatically make memory search use llama.cpp
  • there is no llamacpp memorySearch provider
  • the closest fully local memory option is:
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
        fallback: "none"
      }
    }
  }
}

Why none:

  • provider: "local" = use local embeddings
  • fallback: "none" = if local embeddings are not ready, fail locally
    instead of trying OpenAI/Gemini/Voyage/etc.

That’s useful for you because remote fallback is exactly what you don’t want.

#

If you paste openclaw memory status --deep, I can tell you whether local embeddings are actually being picked up.

rigid oriole
#

Fallbacks (9):

  • llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S
  • google/gemini-3.1-flash-lite-preview
  • ollama/gemma4:31b-cloud
  • openrouter/google/gemma-4-31b-it:free
  • google/gemma-4-31b-it
  • ollama/qwen3.5:397b-cloud
  • google/gemini-3-flash-preview
  • google/gemma-4-26b-a4b-it
  • ollama/qwen3-coder:480b-cloud

🦞 OpenClaw 2026.4.9 (0512059)
I'm the reason your shell history looks like a hacker-movie montage.



[openclaw] Failed to start CLI: Error: Local embeddings unavailable.
Reason: optional dependency node-llama-cpp is missing (or failed to install).
Detail: Cannot find package 'node-llama-cpp' imported from /home/rr420/.npm-global/lib/node_modules/openclaw/dist/engine-embeddings-BjFfLn_y.js
To enable local embeddings:

  1. Use Node 24 (recommended for installs/updates; Node 22 LTS, currently 22.14+, remains supported)
  2. Reinstall OpenClaw (this should install node-llama-cpp): npm i -g openclaw@latest
  3. If you use pnpm: pnpm approve-builds (select node-llama-cpp), then pnpm rebuild node-llama-cpp
    Or set agents.defaults.memorySearch.provider = "openai" (remote).
    Or set agents.defaults.memorySearch.provider = "gemini" (remote).
    Or set agents.defaults.memorySearch.provider = "voyage" (remote).
    Or set agents.defaults.memorySearch.provider = "mistral" (remote).
    at createEmbeddingProvider (file:///home/rr420/.npm-global/lib/node_modules/openclaw/dist/manager-vTWIfqhx.js:393:19)
lofty lanternBOT
#

Nice, this gives the real answer.

Your problem is not OpenAI anymore, it’s this:

Local embeddings unavailable.
node-llama-cpp is missing

So memorySearch.provider = "local" currently cannot work on that install.

Also, those 9 fallbacks are your main model fallbacks, not memorySearch.

What this means

For memory search, your choices are now:

Option 1, best if you want built-in local memory

Keep:

#
memorySearch: {
  provider: "local",
  fallback: "none"
}

but fix the missing dependency:

npm i -g openclaw@latest

If you installed via pnpm/from source:

#
pnpm approve-builds
pnpm rebuild node-llama-cpp

Then verify:

openclaw memory status --deep

Option 2, easier if you already run local Ollama

Use:

#
memorySearch: {
  provider: "ollama",
  fallback: "none"
}

That avoids node-llama-cpp entirely.


Option 3, quickest temporary workaround

Disable memory search for now so the errors stop:

memorySearch: {
  enabled: false
}

My recommendation

Since OpenAI/Gemini are region-blocked for you:

  • If you already use local Ollama → use provider: "ollama"
  • If not → fix node-llama-cpp and stay with provider: "local"
  • keep fallback: "none" so it doesn’t jump to blocked remote providers

Your OpenClaw version looks fine, this is mainly an install/dependency issue.

rigid oriole
#

openclaw memory status --deep

🦞 OpenClaw 2026.4.9 (0512059)
Your task has been queued; your dignity has been deprecated.


◐ Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
◒ Probing embeddings…..Downloading to ~/.node-llama-cpp/models
✔ hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf downloaded 328.58MB in 22s
◐ Probing embeddings…...Downloaded to ~/.node-llama-cpp/models/hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf

Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 0/28 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 0/28 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json

#

?

lofty lanternBOT
#

Short version:

  • Using local Ollama already?
    use:

    memorySearch: { provider: "ollama", fallback: "none" }
    
  • Not using Ollama?
    use:

    memorySearch: { provider: "local", fallback: "none" }
    

    but first fix the missing node-llama-cpp install

  • Just want the errors to stop right now?
    use:

#
  memorySearch: { enabled: false }

Your current local setup is failing because node-llama-cpp is missing. So right now:

  • local = broken until fixed
  • openai / gemini = region blocked
  • ollama = easiest local workaround if you have it

If you want, paste whether you have Ollama installed yes/no, and I’ll give you the exact config block to use.

rigid oriole
#

I have a working Ollama install, but I don't use it to run the model. I use llama.cpp.
and

warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"ee0fde1b-9d98-4e4d-91cd-8a34d0c76a1a","isError":true,"error":"LLM error: {\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}","failoverReason":null,"model":"gemini-3.1-flash-lite-preview","provider":"google","rawErrorPreview":"{"error":{"message":"{\n \"error\": {\n \"code\": 400,\n \"message\": \"User location is not supported for the API use.\",\n \"status\": \"FAILED_PRECONDITION\"\n }\n}\n","code":400,"status":""}}","rawErrorHash":"sha256:80e2ce331137","rawErrorFingerprint":"sha256:434b650ae971","providerErrorMessagePreview":"{\n "error": {\n "code": 400,\n "message": "User location is not supported for the API use.",\n "status": "FAILED_PRECONDITION"\n }\n}"} embedded run agent end

lofty lanternBOT
#

Yep, that’s the key distinction:

You can use:

  • llama.cpp for the main chat model
  • Ollama only for memorySearch

They do not have to be the same thing.

What’s happening now

You currently have two separate issues:

  1. memorySearch
    • provider: "local" is failing because node-llama-cpp is missing
    • since Ollama works on your machine, easiest fix is:
#
   memorySearch: {
     provider: "ollama",
     fallback: "none"
   }
  2. main model fallback
    • your llama.cpp main model timed out
    • OpenClaw then fell through to Google fallback
    • Google is region-blocked for you, so that fails too

So the clean setup is

{
  agents: {
    defaults: {
      model: {
        primary: "llamacpp/your-main-model",
        fallbacks: [] // or keep only local/non-Google fallbacks
      },
      memorySearch: {
        provider: "ollama",
        fallback: "none"
      }
    }
  }
}

In plain English

#
  • keep llama.cpp as your chat model if you want
  • use Ollama only for memory embeddings
  • remove Google from your model fallback chain, or clear fallbacks entirely
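
If you would rather trim the chain than clear it, the filtering itself is simple. This is a minimal sketch over the nine fallbacks pasted earlier in the thread; `drop_providers` and `keep_providers` are hypothetical helpers, and the result still has to be written into `agents.defaults.model.fallbacks` by hand, since no CLI flag for this is assumed here.

```python
from typing import Iterable, List, Set

# The fallback list as pasted earlier in this thread.
FALLBACKS = [
    "llamacpp/gemma-4-26B-A4B-it-UD-Q5_K_S",
    "google/gemini-3.1-flash-lite-preview",
    "ollama/gemma4:31b-cloud",
    "openrouter/google/gemma-4-31b-it:free",
    "google/gemma-4-31b-it",
    "ollama/qwen3.5:397b-cloud",
    "google/gemini-3-flash-preview",
    "google/gemma-4-26b-a4b-it",
    "ollama/qwen3-coder:480b-cloud",
]


def provider_of(model: str) -> str:
    """Provider prefix: the text before the first '/'."""
    return model.split("/", 1)[0]


def drop_providers(models: Iterable[str], blocked: Set[str]) -> List[str]:
    """Keep every model whose provider prefix is not in `blocked`."""
    return [m for m in models if provider_of(m) not in blocked]


def keep_providers(models: Iterable[str], allowed: Set[str]) -> List[str]:
    """Keep only models whose provider prefix is in `allowed`."""
    return [m for m in models if provider_of(m) in allowed]


without_google = drop_providers(FALLBACKS, {"google"})
local_only = keep_providers(FALLBACKS, {"llamacpp"})
```

Dropping only the `google/` prefix leaves five entries, but the `openrouter/...` and `ollama/...-cloud` entries appear to be remote as well, so an allow-list such as `{"llamacpp"}` is the stricter choice for a fully local chain.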

If you want the fastest fix, I’d do these two things:

openclaw models fallbacks clear

and set:

memorySearch: { provider: "ollama", fallback: "none" }

That should stop both:

#
  • the OpenAI/Google region errors
  • the unwanted remote fallback after llama.cpp timeouts

If you paste your current agents.defaults block, I can rewrite it into a working config.

rigid oriole
#

But why? Should I install it?
openclaw memory status --deep

🦞 OpenClaw 2026.4.9 (0512059)
Give me a workspace and I'll give you fewer tabs, fewer toggles, and more oxygen.


◒ Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU

Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 0/28 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 0/28 files · 0 chunks
Vector: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json

lofty lanternBOT
#

Ah, nice, this changes it.

You do not need to install/fix local embeddings now.
That output says:

  • Provider: local
  • Embeddings: ready
  • Vector: ready
  • FTS: ready

So memorySearch local is working now. 🎉

The Vulkan line is just:

GPU path not compatible, falling back to CPU

That is not fatal, just slower.

#

What you should do now

1. Keep memorySearch on local

Since it’s already working, I’d keep:

memorySearch: {
  provider: "local",
  fallback: "none"
}

2. Build the index

Right now it says:

  • Indexed: 0/28 files
  • Dirty: yes
#

So run:

openclaw memory index --force

3. Separate issue: your main model fallback

The Google/Gemini region error is still from your main model fallback chain, not memorySearch.

So if you want to stop that part, remove remote Google fallbacks or clear fallbacks:

openclaw models fallbacks clear

Short version

#
  • memorySearch: now fine, use local
  • need install? no, not for memory anymore
  • next command: openclaw memory index --force
  • Google error: separate main-model fallback issue


rigid oriole
#

warn [node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU

lofty lanternBOT
#

That warning is okay.

It means:

  • node-llama-cpp tried to use GPU/Vulkan
  • your system/binary combo is not compatible
  • so it fell back to CPU

So:

  • not fatal
  • memory search can still work
  • it may just be slower

Since your earlier openclaw memory status --deep showed Embeddings: ready, I’d treat this as informational unless performance is bad.

So the practical read is:

#
  • warning = CPU fallback
  • not the cause of the Google/OpenAI region errors
  • safe to ignore for now

If memory is still empty, the next real step is still:

openclaw memory index --force
rigid oriole
#

Probing embeddings…[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU

Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 28/28 files · 230 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
memory · 28/28 files · 230 chunks
Vector: ready
Vector dims: 768
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (230 entries)
Batch: disabled (failures 0/2)
Recall store: 0 entries · 0 promoted · 0 concept-tagged · 0 spaced
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json

Do I need openclaw memory index --force?

lofty lanternBOT
#

No, not anymore.

Your current status looks healthy:

  • Indexed: 28/28 files
  • Dirty: no
  • Embeddings: ready
  • Vector: ready
  • FTS: ready

So openclaw memory index --force is not needed now.

Use --force only if:

  • you changed the memory provider/model
  • status shows Dirty: yes
  • indexing is stuck/missing
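
Two of those conditions can be read straight off the status output. A minimal sketch, assuming the `Indexed:` and `Dirty:` lines keep the format shown in the pasted `openclaw memory status --deep` output (`needs_reindex` is a hypothetical helper, not part of OpenClaw):

```python
import re


def needs_reindex(status_text: str) -> bool:
    """Return True if the status text shows a dirty or incomplete index."""
    dirty = re.search(r"^Dirty:\s*(\w+)", status_text, re.MULTILINE)
    indexed = re.search(r"^Indexed:\s*(\d+)/(\d+)", status_text, re.MULTILINE)

    # "Dirty: yes" means the index is out of date.
    if dirty and dirty.group(1).lower() == "yes":
        return True
    # Fewer indexed files than total files means the index is incomplete.
    if indexed and int(indexed.group(1)) < int(indexed.group(2)):
        return True
    return False
```

A provider or model change does not show up in these two fields, so that case still needs a manual decision.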