#DeepSeek reading images on Windows but not VPS
1 messages Ā· Page 1 of 1 (latest)
The short version: DeepSeek probably wasnāt the thing reading images on Windows. The environment was.
Image model: ā only means no explicit agents.defaults.imageModel is configured. OpenClaw can still auto-detect media understanding via things like:
- active model/provider if it supports images
- Gemini CLI if installed/authenticated
- other auth-backed vision providers
- configured provider models marked
input: ["text", "image"]
DeepSeekās built-in OpenClaw catalog is text-only, so on the VPS thereās no implicit image reader to fall back to. Windows likely had something extra available ā most commonly authenticated gemini CLI or another vision-capable provider/key. Docs: https://docs.openclaw.ai/nodes/media-understanding and https://docs.openclaw.ai/providers/deepseek
Ask him for:
openclaw --version
gemini --output-format json ok
openclaw infer image describe --file ./some-image.png --json
The clean fix is to configure an actual vision model under agents.defaults.imageModel, or install/auth the same fallback provider on the VPS. Image model: ā being the same on both machines doesnāt mean the effective auto-detected image path was the same. š¦
Gemini not installed on Windows either. Still unclear how images were being read
Yeah, then Iād look for provider auth / env differences, not imageModel.
Image model: ā only means āno explicit agents.defaults.imageModel.ā It does not mean āno image reader exists.ā
Windows could still have been reading images via auto-detected media understanding from an auth-backed provider, e.g.:
OPENAI_API_KEY- Anthropic auth/API key
- Google API key
- OpenRouter key with an image-capable model
- Codex/OpenAI OAuth profile
- a configured
models.providers.*.models[]entry withinput: ["text", "image"]
DeepSeek itself is still text-only in the OpenClaw catalog, so if the VPS only has DeepSeek auth, image understanding will disappear.
Iād have him compare these on both machines:
openclaw --version
openclaw models status
openclaw config get agents.defaults.imageModel
openclaw config get models.providers
openclaw infer image describe --file ./test.png --json
And after sending an image, /status may show the actual media result line, like š Media: image ok (provider/model) or skipped. Thatās more useful than the static Image model: ā line.
My bet: Windows had some other vision-capable auth/config present, and the VPS doesnāt. š¦
Found the root cause: Windows had OPENAI_API_KEY set as an environment variable, which OpenClaw auto-detected as a vision fallback. VPS doesn't have this, so image understanding disappeared after moving from Windows to VPS. For anyone else migrating from Windows to VPS ā check openclaw models status on Windows and look for any auto-detected auth providers, not just Image model:. The fix we're looking for now: cheapest way to add vision on VPS, preferably DeepSeek vision model via custom provider setup.
Nice, that root cause makes sense.
For the VPS fix: donāt replace DeepSeek for chat ā keep DeepSeek as the main text model and add a cheap dedicated imageModel just for media understanding.
Important bit: the built-in deepseek provider is text-only in OpenClaw. So āDeepSeek visionā only works if some other OpenAI-compatible host exposes a DeepSeek/Janus/VL-style model that actually accepts image input.
Pattern:
openclaw config set models.providers.cheap-vision '{
"baseUrl": "https://YOUR_PROVIDER/v1",
"apiKey": "YOUR_KEY",
"api": "openai-completions",
"models": [
{
"id": "PROVIDER_LOCAL_MODEL_ID",
"name": "Cheap Vision",
"input": ["text", "image"],
"contextWindow": 32768,
"maxTokens": 4096
}
]
}' --strict-json --merge
openclaw config set agents.defaults.imageModel '{
"primary": "cheap-vision/PROVIDER_LOCAL_MODEL_ID"
}' --strict-json --merge
openclaw gateway restart
openclaw infer image describe --file ./test.png --json
Key gotcha: models.providers.<id>.models[].id is provider-local, no prefix. But agents.defaults.imageModel.primary uses provider/model.
Docs: https://docs.openclaw.ai/gateway/config-tools and https://docs.openclaw.ai/gateway/config-agents
For āabsolute cheapest current vision model,ā Iād ask in #models since pricing/quality changes constantly. But config-wise, thatās the right shape. š¦
nobody is answering: What's the cheapest vision-capable model/provider that works with OpenClaw's custom provider setup right now? I want to add it as agents.defaults.imageModel on VPS.
If you need a concrete pick right now: Iād use OpenRouter + google/gemma-4-31b-it:free as the cheapest first try. Itās listed by OpenRouter as $0/M input + $0/M output and supports text + image input. Keep DeepSeek as the main chat model; use this only for agents.defaults.imageModel. (openrouter.ai)
DeepSeekās built-in OpenClaw models are text-only, so āDeepSeek visionā needs a third-party OpenAI-compatible host that exposes a real VL model. (docs.openclaw.ai)
Config shape:
openclaw config set models.providers.openrouter-vision '{
"baseUrl": "https://openrouter.ai/api/v1",
"apiKey": "YOUR_OPENROUTER_KEY",
"api": "openai-completions",
"models": [
{
"id": "google/gemma-4-31b-it:free",
"name": "Gemma 4 31B free vision",
"input": ["text", "image"],
"contextWindow": 262144,
"maxTokens": 32768
},
{
"id": "qwen/qwen2.5-vl-32b-instruct",
"name": "Qwen2.5 VL 32B",
"input": ["text", "image"],
"contextWindow": 128000,
"maxTokens": 16000
}
]
}' --strict-json --merge
openclaw config set agents.defaults.imageModel '{
"primary": "openrouter-vision/google/gemma-4-31b-it:free",