#fresh install
Let me verify the docs path so I don’t hand you made-up config keys.
I’m checking the provider docs for the exact config shape for llama-server specifically.
Yep, for llama-server use a custom OpenAI-compatible provider in ~/.openclaw/openclaw.json (JSON5, not YAML):
{
  agents: {
    defaults: {
      model: { primary: "llamacpp/your-model-id" },
    },
  },
  models: {
    mode: "merge",
    providers: {
      llamacpp: {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        models: [
          {
            id: "your-model-id",
            name: "Your model",
            reasoning: false,
            input: ["text"],
            contextWindow: 128000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}
Then:
openclaw gateway restart
openclaw models status
openclaw status
Replace:
- 8080 with your llama-server port
- your-model-id with whatever this returns:
curl http://127.0.0.1:8080/v1/models
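If it helps, here is a minimal Python sketch of turning that response into provider/model refs. It assumes the usual OpenAI-compatible shape, where the ids sit under data[].id. model_refs is a hypothetical helper name and "my-model" is a made-up id for illustration, not something OpenClaw or llama-server ships.

```python
import json


def model_refs(models_response, provider="llamacpp"):
    """Build provider/model refs from a /v1/models JSON response body.

    Assumes the OpenAI-compatible shape: {"data": [{"id": "..."}, ...]}.
    """
    payload = json.loads(models_response)
    return [f"{provider}/{entry['id']}" for entry in payload.get("data", [])]


# Made-up sample body; paste your real curl output here instead.
sample = '{"object": "list", "data": [{"id": "my-model", "object": "model"}]}'
print(model_refs(sample))  # ['llamacpp/my-model']
```

Whatever string comes out is what goes into agents.defaults.model.primary, and the part after the slash is the id for the model entry.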
If your backend only accepts string content, add:
compat: { requiresStringContent: true }
inside that model entry.
Useful docs:
- https://docs.openclaw.ai/help/faq
- https://docs.openclaw.ai/concepts/model-providers
- https://docs.openclaw.ai/gateway/configuration-reference#custom-providers-and-base-urls
- https://docs.openclaw.ai/gateway/troubleshooting#local-openai-compatible-backend-passes-direct-probes-but-agent-runs-fail
Also, what does openclaw --version show? If you paste that plus your llama-server port/model id, I can give you the exact snippet.
Two things:
- Your gateway token is visible in that paste. Please rotate it after this.
- primary: "llama-server" is not enough. OpenClaw wants a provider/model ref, plus a models.providers entry that points to your local server.
Use this shape instead:
{
  agents: {
    defaults: {
      workspace: "/home/matias/.openclaw/workspace",
      model: {
        primary: "llama-server/your-model-id"
      }
      // remove agents.defaults.models for now
    }
  },
  models: {
    mode: "merge",
    providers: {
      "llama-server": {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        models: [
          {
            id: "your-model-id",
            name: "Local llama.cpp model",
            reasoning: false,
            input: ["text"],
            contextWindow: 32768,
            maxTokens: 4096
          }
        ]
      }
    }
  }
}
Then restart + verify:
openclaw gateway restart
openclaw models status
openclaw status
And get the real model id from your server with:
curl http://127.0.0.1:8080/v1/models
So if that returns something like qwen2.5-coder-7b-instruct, your primary becomes:
"primary": "llama-server/qwen2.5-coder-7b-instruct"
If llama-server is picky about content format, add this inside the model entry:
compat: { requiresStringContent: true }
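For background on what “picky about content format” means: OpenAI-style chat messages can carry content either as a plain string or as a list of typed parts, and some local backends only handle the plain string. The Python sketch below just illustrates the difference by flattening parts into one string. flatten_content is a hypothetical helper, and I’m assuming (not confirming) that the compat flag does something along these lines.

```python
def flatten_content(content):
    """Return message content as one plain string.

    Accepts either a string or an OpenAI-style list of parts such as
    [{"type": "text", "text": "hello"}]. Non-text parts are skipped.
    """
    if isinstance(content, str):
        return content
    texts = [
        part.get("text", "")
        for part in content
        if isinstance(part, dict) and part.get("type") == "text"
    ]
    return "\n".join(texts)


parts = [{"type": "text", "text": "first"}, {"type": "text", "text": "second"}]
print(flatten_content(parts))
```

If agent runs fail while a direct curl with a string message works, this mismatch is a likely suspect.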
If you want, paste the output of /v1/models and your llama-server port, and I’ll turn it into the exact final config for you.
openclaw command not found
I’m checking the install docs so I don’t send you down the wrong path.
Ah, that usually means the npm global bin dir isn’t on your PATH.
Try this on Ubuntu:
npm prefix -g
echo "$PATH"
If the prefix’s bin folder is missing from PATH, add it to ~/.bashrc:
echo 'export PATH="$(npm prefix -g)/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
hash -r
Then check:
openclaw --version
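If you’d rather check the PATH programmatically than eyeball the echo output, here is a small Python sketch. is_on_path is a hypothetical helper, and the directory in the example is just the user-writable prefix layout, so substitute whatever npm prefix -g prints with /bin appended.

```python
import os


def is_on_path(bin_dir, path_value=None):
    """Return True if bin_dir is one of the entries in a PATH-style string.

    Falls back to the current process PATH when path_value is not given.
    Entries are normalized so a trailing slash does not cause a miss.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(bin_dir)
    return any(
        os.path.normpath(entry) == wanted
        for entry in path_value.split(os.pathsep)
        if entry
    )


# Example directory only; use your own npm global bin dir.
print(is_on_path(os.path.expanduser("~/.npm-global/bin")))
```

If that prints False, the shell will not find openclaw until the export line is in ~/.bashrc and the shell has been reloaded.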
If you hit permission errors on global npm installs, use a user-writable prefix:
mkdir -p "$HOME/.npm-global"
npm config set prefix "$HOME/.npm-global"
echo 'export PATH="$HOME/.npm-global/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
hash -r
Then reinstall if needed:
npm install -g openclaw@latest
If you installed from a repo checkout instead of globally, use:
pnpm openclaw ...
instead of openclaw ....
Docs: https://docs.openclaw.ai/install/node
Also, small security thing, you pasted your gateway token above in public, so please rotate that after you’re back in.
⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.
To prevent this, increase your compaction buffer by setting agents.defaults.compaction.reserveTokensFloor to 20000 or higher in your config.
I’m checking the compaction docs so I can give you the exact config shape.
Yep, that means auto-compaction didn’t have enough headroom.
Add this to ~/.openclaw/openclaw.json:
{
  agents: {
    defaults: {
      compaction: {
        mode: "safeguard",
        reserveTokensFloor: 20000,
        notifyUser: true
      }
    }
  }
}
Then restart the gateway.
Also, since you’re using llama-server, make sure your model entry has a realistic context size, for example:
contextWindow: 32768,
maxTokens: 4096
If the local model is only 8k or 16k context, OpenClaw will hit this pretty easily, especially with tools.
Quick recovery in chat:
- /compact to force compaction now
- /new for a clean session
If you paste your current models.providers.llama-server block, I can sanity-check the contextWindow too.
I am running qwen3.6-35b at its maximum 262k context. What should I set for contextWindow, maxTokens, and reserveTokensFloor?
Let me check the token docs for the exact meaning of each field before I give you a number.
For that setup, I’d start with:
- contextWindow: 262144
- maxTokens: 8192
- reserveTokensFloor: 24000
And, if you want a safer runtime cap, also add:
contextTokens: 220000
So the model block would look like:
{
  agents: {
    defaults: {
      compaction: {
        mode: "safeguard",
        reserveTokensFloor: 24000
      }
    }
  },
  models: {
    providers: {
      "llama-server": {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        models: [
          {
            id: "qwen3.6-35b",
            name: "Qwen 3.6 35B",
            reasoning: false,
            input: ["text"],
            contextWindow: 262144,
            contextTokens: 220000, // optional, recommended
            maxTokens: 8192
          }
        ]
      }
    }
  }
}
Why:
- contextWindow should match the model/server’s real max context.
- maxTokens is the output cap, not context size. Setting it to 262k would be kind of insane for normal agent turns.
- reserveTokensFloor at 24000 gives compaction some breathing room. 20000 is the minimum I’d use; 24000 is a nicer default.
If it still feels unstable on long threads, drop contextTokens further, like 200000.
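To get a feel for how these numbers interact, here is some back-of-the-envelope arithmetic in Python. This is only a rough mental model, assuming the prompt has to fit in whatever is left after the compaction reserve and the output cap are set aside. It is not how OpenClaw actually decides when to compact, and prompt_budget is a hypothetical helper.

```python
def prompt_budget(context_tokens, reserve_tokens_floor, max_tokens):
    """Rough count of tokens left for history, tools, and system prompt.

    Subtracts the compaction reserve and the output cap from the
    context budget. Raises if nothing would be left.
    """
    budget = context_tokens - reserve_tokens_floor - max_tokens
    if budget <= 0:
        raise ValueError("reserve plus output cap exceeds the context budget")
    return budget


print(prompt_budget(220000, 24000, 8192))  # 187808
print(prompt_budget(262144, 24000, 8192))  # 229952
```

The same arithmetic shows why small local contexts struggle: with a 16k window, a 24000 reserve alone is already larger than the whole budget.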
If you want, I can turn your whole config into a ready-to-paste final version.
how do I set embedding for openclaw on llama-server? I am running qwen3.6-35B as the main model, how do I add a qwen3-embedding model?
I’m checking the memory docs for the exact embeddings config shape so I don’t invent keys.
Yep, embeddings are configured separately from your main chat model.
For llama-server, use agents.defaults.memorySearch, not models.providers:
{
  agents: {
    defaults: {
      memorySearch: {
        enabled: true,
        provider: "openai",
        model: "qwen3-embedding", // exact model id exposed by your llama-server
        fallback: "none",
        remote: {
          baseUrl: "http://127.0.0.1:8080/v1"
          // apiKey: "..." only if your llama-server requires one
        }
      }
    }
  }
}
A few important bits:
- Keep your main model as qwen3.6-35b in models.providers...
- Set the embedding model here under memorySearch.model
- provider: "openai" is correct for a local OpenAI-compatible /v1/embeddings endpoint
Then verify / rebuild the memory index:
openclaw memory status --deep
openclaw memory index --force
or:
openclaw memory status --deep --index
If you change the embedding model later, reindex again.
So the split is basically:
- chat/completions → models.providers["llama-server"]
- embeddings → agents.defaults.memorySearch
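Before reindexing, it can be worth probing the embeddings endpoint directly. Here is a minimal Python sketch using only the standard library. It assumes the standard OpenAI-style POST /v1/embeddings request with model and input fields, and that the embedding model is reachable at the base URL you pass in. If you run it as a separate llama-server process, the port will differ from your chat model’s. Both function names are hypothetical.

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, text):
    """Build a POST request for an OpenAI-compatible /embeddings endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_length(request):
    """Send the request and return the length of the first vector."""
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return len(payload["data"][0]["embedding"])


# Usage, with the server running (base URL and model id are assumptions):
#   req = build_embeddings_request("http://127.0.0.1:8080/v1", "qwen3-embedding", "hello")
#   print(embedding_length(req))
```

If that call errors out or returns no vector, fix the server side first. The memory index can’t build without working embeddings.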