#DeepSeek reading images on Windows but not VPS

supple osprey

The bot was reading images on Windows with DeepSeek (no imageModel configured). After moving to a VPS, he says his model can't read images. Both machines show Image model: —. What changed?

old dewBOT

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

sullen palmBOT

The short version: DeepSeek probably wasn’t the thing reading images on Windows. The environment was.

Image model: — only means no explicit agents.defaults.imageModel is configured. OpenClaw can still auto-detect media understanding via things like:

  • active model/provider if it supports images
  • Gemini CLI if installed/authenticated
  • other auth-backed vision providers
  • configured provider models marked input: ["text", "image"]

DeepSeek’s built-in OpenClaw catalog is text-only, so on the VPS there’s no implicit image reader to fall back to. Windows likely had something extra available — most commonly authenticated gemini CLI or another vision-capable provider/key. Docs: https://docs.openclaw.ai/nodes/media-understanding and https://docs.openclaw.ai/providers/deepseek
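Since the Gemini CLI is one of the fallback sources listed above, a quick first check on each machine is whether a `gemini` binary is even on PATH. This is a minimal bash sketch: `check_cli` is a hypothetical helper, and it only detects that a binary exists, not that it is authenticated or that OpenClaw actually uses it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is on PATH.
# Presence only; this says nothing about authentication or whether
# OpenClaw picks the tool up for media understanding.
check_cli() {
  local found
  if found="$(command -v "$1")"; then
    echo "$1: found at $found"
  else
    echo "$1: not found"
  fi
}

check_cli gemini
check_cli openclaw
```

On the Windows side, `where gemini` in cmd does the same presence check.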

Ask him to run:

openclaw --version
gemini --output-format json ok
openclaw infer image describe --file ./some-image.png --json

The clean fix is to configure an actual vision model under agents.defaults.imageModel, or install/auth the same fallback provider on the VPS. Image model: — being the same on both machines doesn’t mean the effective auto-detected image path was the same. 🦐

supple osprey

Gemini isn't installed on Windows either, so it's still unclear how images were being read.

sullen palmBOT

Yeah, then I’d look for provider auth / env differences, not imageModel.

Image model: — only means "no explicit agents.defaults.imageModel." It does not mean "no image reader exists."

Windows could still have been reading images via auto-detected media understanding from an auth-backed provider, e.g.:

  • OPENAI_API_KEY
  • Anthropic auth/API key
  • Google API key
  • OpenRouter key with an image-capable model
  • Codex/OpenAI OAuth profile
  • a configured models.providers.*.models[] entry with input: ["text", "image"]
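To see which of these keys a given shell actually has, a small bash sketch like this prints set/unset for each variable without echoing any secret. Only OPENAI_API_KEY comes from this thread; the other variable names are assumed conventional names, not a confirmed list of what OpenClaw reads. Also note that a key exported in an interactive shell may not be visible to a gateway running as a background service, so the gateway's own environment is what ultimately matters.

```shell
#!/usr/bin/env bash
# Sketch: print set/unset for each named variable, never its value.
# Only OPENAI_API_KEY comes from this thread; the other names are assumed
# conventional names, not a confirmed list of what OpenClaw reads.
# An empty variable is reported as unset.
report_keys() {
  local var
  for var in "$@"; do
    # ${!var:-} is bash indirect expansion: the value of the variable named by $var
    if [ -n "${!var:-}" ]; then
      echo "$var=set"
    else
      echo "$var=unset"
    fi
  done
}

report_keys OPENAI_API_KEY ANTHROPIC_API_KEY GEMINI_API_KEY GOOGLE_API_KEY OPENROUTER_API_KEY
```

On Windows, `Get-ChildItem Env:` in PowerShell lists the same environment, values included, so don't paste that output publicly.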

DeepSeek itself is still text-only in the OpenClaw catalog, so if the VPS only has DeepSeek auth, image understanding will disappear.

I’d have him compare these on both machines:

openclaw --version
openclaw models status
openclaw config get agents.defaults.imageModel
openclaw config get models.providers
openclaw infer image describe --file ./test.png --json
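Comparing the two machines is easier if each one's output lands in a single file. Below is a bash sketch that runs exactly the commands above and writes them to one file per host. `snapshot` and the file name are hypothetical, ./test.png must exist, and the `models.providers` output may contain API keys, so review the file before sharing it. On Windows, running the commands by hand and saving the output works just as well.

```shell
#!/usr/bin/env bash
# Sketch: run the diagnostic commands from this thread and collect their output
# into one file, so the files from two machines can be compared with diff.
snapshot() {
  local out="$1" cmd
  : > "$out"   # start with an empty file
  for cmd in \
    "openclaw --version" \
    "openclaw models status" \
    "openclaw config get agents.defaults.imageModel" \
    "openclaw config get models.providers" \
    "openclaw infer image describe --file ./test.png --json"
  do
    {
      echo "### $cmd"
      # Unquoted $cmd is split into words on purpose (each entry is a simple
      # command with no quotes or glob characters). Failures are recorded, not fatal.
      $cmd 2>&1 || echo "(exit status $?)"
      echo
    } >> "$out"
  done
}

snapshot "openclaw-snapshot-${HOSTNAME:-unknown}.txt"
# Then, with both files on one machine:
#   diff openclaw-snapshot-<windows-host>.txt openclaw-snapshot-<vps-host>.txt
```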

And after sending an image, /status may show the actual media result line, like 📎 Media: image ok (provider/model) or skipped. That’s more useful than the static Image model: — line.

My bet: Windows had some other vision-capable auth/config present, and the VPS doesn’t. 🦐

supple osprey
#

Found the root cause: Windows had OPENAI_API_KEY set as an environment variable, which OpenClaw auto-detected as a vision fallback. The VPS doesn't have it, so image understanding disappeared after the move. For anyone else migrating from Windows to a VPS: run openclaw models status on Windows and look for any auto-detected auth providers, not just the Image model: line. What we're looking for now: the cheapest way to add vision on the VPS, preferably a DeepSeek vision model via a custom provider setup.

sullen palmBOT

Nice, that root cause makes sense.

For the VPS fix: don’t replace DeepSeek for chat — keep DeepSeek as the main text model and add a cheap dedicated imageModel just for media understanding.

Important bit: the built-in deepseek provider is text-only in OpenClaw. So "DeepSeek vision" only works if some other OpenAI-compatible host exposes a DeepSeek/Janus/VL-style model that actually accepts image input.

Pattern:

openclaw config set models.providers.cheap-vision '{
  "baseUrl": "https://YOUR_PROVIDER/v1",
  "apiKey": "YOUR_KEY",
  "api": "openai-completions",
  "models": [
    {
      "id": "PROVIDER_LOCAL_MODEL_ID",
      "name": "Cheap Vision",
      "input": ["text", "image"],
      "contextWindow": 32768,
      "maxTokens": 4096
    }
  ]
}' --strict-json --merge

openclaw config set agents.defaults.imageModel '{
  "primary": "cheap-vision/PROVIDER_LOCAL_MODEL_ID"
}' --strict-json --merge

openclaw gateway restart
openclaw infer image describe --file ./test.png --json
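One easy mistake with this pattern is letting the provider name and model id in the two commands drift apart: in both examples in this thread the imageModel ref is `<provider name>/<model id>`. A bash sketch that generates both JSON payloads from the same variables; the VISION_* names are hypothetical, the field values are copied from the cheap-vision example above, and no JSON escaping is done, so values must not contain quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical variable names; set these to your provider's real values.
VISION_PROVIDER="${VISION_PROVIDER:-cheap-vision}"
VISION_BASE_URL="${VISION_BASE_URL:-https://YOUR_PROVIDER/v1}"
VISION_API_KEY="${VISION_API_KEY:-YOUR_KEY}"
VISION_MODEL_ID="${VISION_MODEL_ID:-PROVIDER_LOCAL_MODEL_ID}"

# Provider block with the same fields as the cheap-vision example.
# No JSON escaping: values must not contain quotes or backslashes.
provider_json() {
  cat <<EOF
{
  "baseUrl": "${VISION_BASE_URL}",
  "apiKey": "${VISION_API_KEY}",
  "api": "openai-completions",
  "models": [
    {
      "id": "${VISION_MODEL_ID}",
      "name": "Cheap Vision",
      "input": ["text", "image"],
      "contextWindow": 32768,
      "maxTokens": 4096
    }
  ]
}
EOF
}

# imageModel ref built from the same variables, so it always matches the provider.
image_model_json() {
  printf '{ "primary": "%s/%s" }\n' "$VISION_PROVIDER" "$VISION_MODEL_ID"
}

# Usage, same commands as in the pattern above:
#   openclaw config set "models.providers.$VISION_PROVIDER" "$(provider_json)" --strict-json --merge
#   openclaw config set agents.defaults.imageModel "$(image_model_json)" --strict-json --merge
#   openclaw gateway restart
```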

supple osprey

Nobody is answering this one: what's the cheapest vision-capable model/provider that works with OpenClaw's custom provider setup right now? I want to add it as agents.defaults.imageModel on the VPS.

sullen palmBOT

If you need a concrete pick right now: I’d use OpenRouter + google/gemma-4-31b-it:free as the cheapest first try. It’s listed by OpenRouter as $0/M input + $0/M output and supports text + image input. Keep DeepSeek as the main chat model; use this only for agents.defaults.imageModel. (openrouter.ai)
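The $0 pricing applies to the `:free` listing; if you end up on a paid vision model instead, the cost of a batch of images is simple arithmetic: images × (input tokens × input price + output tokens × output price) / 1,000,000. A bash/awk sketch; `estimate_cost` is hypothetical, and the token counts and prices in the usage line are placeholders, not real quotes for any model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough USD cost of N image descriptions.
# Args: images, input tokens per image, output tokens per image,
#       USD per 1M input tokens, USD per 1M output tokens.
# cost = images * (in_tok * in_price + out_tok * out_price) / 1,000,000
estimate_cost() {
  awk -v n="$1" -v tin="$2" -v tout="$3" -v pin="$4" -v pout="$5" \
    'BEGIN { printf "%.4f\n", n * (tin * pin + tout * pout) / 1000000 }'
}

# Placeholder numbers only: 1000 images, 1000 tokens in and out per image,
# $0.50 / $1.50 per million tokens.
estimate_cost 1000 1000 1000 0.5 1.5   # prints 2.0000
```

With the $0/$0 prices OpenRouter lists for google/gemma-4-31b-it:free, the same call prints 0.0000.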

DeepSeek’s built-in OpenClaw models are text-only, so "DeepSeek vision" needs a third-party OpenAI-compatible host that exposes a real VL model. (docs.openclaw.ai)

Config shape:

openclaw config set models.providers.openrouter-vision '{
  "baseUrl": "https://openrouter.ai/api/v1",
  "apiKey": "YOUR_OPENROUTER_KEY",
  "api": "openai-completions",
  "models": [
    {
      "id": "google/gemma-4-31b-it:free",
      "name": "Gemma 4 31B free vision",
      "input": ["text", "image"],
      "contextWindow": 262144,
      "maxTokens": 32768
    },
    {
      "id": "qwen/qwen2.5-vl-32b-instruct",
      "name": "Qwen2.5 VL 32B",
      "input": ["text", "image"],
      "contextWindow": 128000,
      "maxTokens": 16000
    }
  ]
}' --strict-json --merge

openclaw config set agents.defaults.imageModel '{
  "primary": "openrouter-vision/google/gemma-4-31b-it:free",