what should i change in the openclaw.json to make it worl perfectly?
"models": {
"mode": "merge",
"providers": {
"local-gemma": {
"baseUrl": "http://127.0.0.1:8082/v1",
"apiKey": "not-needed",
"api": "openai-completions",
"models": [
{
"id": "Gemma4-26B-A4B",
"name": "Google Gemma 4 26B A4B",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 262144,
"maxTokens": 8192
}
]
},
#Setting up local Gemma4 26B A4B
1 messages · Page 1 of 1 (latest)
Not “perfectly” 😅, but for the best chance with a local Gemma OpenAI-compatible backend, I’d change/add these:
{
agents: {
defaults: {
model: { primary: "local-gemma/Gemma4-26B-A4B" },
models: {
"local-gemma/Gemma4-26B-A4B": { alias: "Gemma" },
},
},
},
models: {
mode: "merge",
providers: {
"local-gemma": {
baseUrl: "http://127.0.0.1:8082/v1",
apiKey: "gemma-local",
api: "openai-completions",
models: [
{
id: "Gemma4-26B-A4B",
name: "Google Gemma 4 26B A4B",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 262144,
maxTokens: 8192,
compat: {
requiresStringContent: true,
supportsTools: false
}
}
]
}
}
}
}
Most important bits:
agents.defaults.model.primaryso OpenClaw actually uses itcompat.requiresStringContent: truefor stricter local/v1backendscompat.supportsTools: falsebecause Gemma/local OpenAI-compatible servers often break on full agent tool schemas
Also, if you already have an agents.defaults.models allowlist elsewhere, add this model there too, or it may be rejected as “not allowed”.
Please paste these too:
openclaw --version
openclaw models status
And if your server is specifically vLLM, SGLang, LM Studio, or llama.cpp, say which one, because the best config differs a bit.
Docs:
it should support not only text but even other formats, like image and audio
Yep, but there’s an important catch:
inputinmodels.providers...models[]only supports"text"and"image""audio"is not valid there- Audio support is configured separately under
tools.media.audio
So for your local Gemma block, the multimodal version would look more like:
{
agents: {
defaults: {
model: { primary: "local-gemma/Gemma4-26B-A4B" },
},
},
models: {
mode: "merge",
providers: {
"local-gemma": {
baseUrl: "http://127.0.0.1:8082/v1",
apiKey: "gemma-local",
api: "openai-completions",
models: [
{
id: "Gemma4-26B-A4B",
name: "Google Gemma 4 26B A4B",
reasoning: false,
input: ["text", "image"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 262144,
maxTokens: 8192,
compat: {
requiresStringContent: true,
supportsTools: false
}
}
]
}
}
},
tools: {
media: {
image: {
enabled: true
},
audio: {
enabled: true
}
}
}
}
Two important notes:
-
Image
input: ["text", "image"]is correct only if your Gemma server on:8082actually exposes vision- if the backend is text-only, config alone won’t make images work
-
Audio
- audio is not handled by
input- use
tools.media.audio - if you want reliable audio, you usually add a transcription model or CLI fallback, for example OpenAI/Groq/Deepgram/Whisper
- use
So the short answer is:
- change
inputto["text", "image"]for image support - add
tools.media.audio.enabled: truefor audio processing - do not add
"audio"to the modelinputarray
If you want, paste your full openclaw.json plus:
openclaw --version
openclaw models status