#When I send a photo via Telegram, OpenClaw always falls back to Google Search

1 messages · Page 1 of 1 (latest)

safe pilot
#

'm using OpenClaw with Ollama + Gemma4 (multimodal, confirmed working via direct API). When I send a photo via Telegram, OpenClaw always falls back to Google Search instead of passing the image directly to Gemma4. I have input: ["text","image"] and imageModel.primary: ollama/gemma4 configured. Is there a known fix for Telegram → Ollama vision routing?

balmy obsidianBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

polar topazBOT
# safe pilot 'm using OpenClaw with Ollama + Gemma4 (multimodal, confirmed working via direct...

Alexander, first thing: can you share openclaw --version? This routing has changed recently enough that version matters.

From current docs/source, the intended behavior is:

  • Telegram media is downloaded/staged as inbound media.
  • If the active reply model is marked vision-capable, OpenClaw skips the image-summary step and passes the original image into the model context.
  • If the active reply model is not vision-capable, OpenClaw uses agents.defaults.imageModel for image understanding.
  • For Ollama, vision is only trusted when the model catalog/provider entry says input: ["text", "image"], usually discovered from Ollama /api/show reporting vision.

Most likely checks:

openclaw --version
openclaw models list --provider ollama
openclaw infer image describe --file ./photo.jpg --model ollama/gemma4 --json
openclaw logs --follow
polar topazBOT
polar topazBOT
# safe pilot 'm using OpenClaw with Ollama + Gemma4 (multimodal, confirmed working via direct...
  agents: {
    defaults: {
      imageModel: {
        primary: "ollama/gemma4",
      },
    },
  },
}

Docs I checked:
https://docs.openclaw.ai/providers/ollama
https://docs.openclaw.ai/nodes/media-understanding
https://docs.openclaw.ai/concepts/models

If infer image describe works but Telegram still triggers web/Google search, please paste the version plus the relevant log lines around the Telegram message/media-understanding decision. That sounds issue-worthy if the image attachment is being staged but not injected.

polar topazBOT
# safe pilot OpenClaw 2026.5.12 (f066dd2)

Thanks. 2026.5.12 is the stable tag, so next I’d narrow this to “catalog says Gemma4 is vision-capable?” vs “Telegram media is not being staged/injected.”

Please run:

openclaw models list --provider ollama
openclaw infer image describe --file ./photo.jpg --model ollama/gemma4 --json
openclaw logs --follow

What I’m looking for:

  • In models list, ollama/gemma4 needs to show image/vision input. If it only shows text, OpenClaw will not inject the Telegram photo into that model.
  • If you have a manual models.providers.ollama block, auto-discovery is skipped, so the explicit gemma4 model entry must include input: ["text", "image"].
  • If infer image describe works but Telegram still searches, paste the media-understanding / Telegram log lines around that inbound photo. That would point to the Telegram attachment path not reaching the agent turn.

Also check that your active reply model is actually the Ollama vision model if you want direct native injection:

polar topazBOT
safe pilot
polar topazBOT
# safe pilot Model Input Ctx Local Auth Ta...

Good, that rules out the common catalog/config problem: OpenClaw sees ollama/gemma4 as text+image and it’s tagged as both default and image.

Next useful split:

openclaw infer image describe --file ./photo.jpg --model ollama/gemma4 --json

If that fails, it’s Ollama/provider-side despite the catalog saying vision.

If that succeeds, then the remaining suspect is the Telegram inbound path. Please grab the log lines immediately after sending one Telegram photo:

openclaw logs --follow

Look for entries mentioning telegram, MediaPath / MediaPaths, media-understanding, or Skipping image understanding: primary model supports vision natively.

polar topazBOT
# safe pilot Model Input Ctx Local Auth Ta...

Expected path for your config is: Telegram downloads/stages the image, OpenClaw sees the active model supports vision, skips pre-describing it, then injects the original image into the model turn. If you instead see Google Search/web search, either the image never reached the turn as media, or Gemma is deciding to call search after seeing only text/context.

safe pilot
#

Found it. Log shows:
Inbound message telegram:... (direct, image/jpeg, 13 chars)
followed immediately by:
Native image: dropped 1 image(s) after sanitization (prompt:images)
Image reaches the gateway but is dropped before the model turn. Is this a known issue with Ollama vision models in 2026.5.12?

polar topazBOT
# safe pilot Found it. Log shows: Inbound message telegram:... (direct, image/jpeg, 13 chars)...

I don’t think that log points to an Ollama-vision-specific failure. In current source, Native image: dropped ... after sanitization (prompt:images) is emitted after OpenClaw has already loaded the image into native image blocks, then sanitizeImageBlocks() filters it out before the model call.

That drop usually means one of these happened:

  • empty image payload
  • invalid/non-canonical base64
  • image resize/metadata processing failed, or it could not be reduced under the image limits

For Ollama specifically, OpenClaw only marks the model as vision-capable if /api/show reports capabilities: ["vision", ...]; then the native Ollama transport sends base64 via the images field to /api/chat. So if it got as far as prompt:images, I’d look at the sanitizer/media path first, not model fallback.

Useful checks:

openclaw --version
openclaw models status
ollama show <model-name> --json | jq '.capabilities'
openclaw logs --follow
polar topazBOT
# safe pilot Found it. Log shows: Inbound message telegram:... (direct, image/jpeg, 13 chars)...

Then reproduce once and look for a nearby line from agents/tool-images, especially omitted image payload: ..., Image resize failed to fit limits, or invalid base64. If that line says invalid/resize failure, it’s likely worth filing with the minimal log excerpt + image type/size here: https://github.com/openclaw/openclaw/issues

Docs for the model/image fallback bits: https://docs.openclaw.ai/concepts/models

safe pilot
#

Found the exact log sequence:
Inbound message telegram:... (direct, image/jpeg, 13 chars)
Native image: dropped 1 image(s) after sanitization (prompt:images)
The image reaches the gateway as image/jpeg but is dropped during sanitization before the model call. No lines about omitted image payload, resize failed, or invalid base64 visible. Version 2026.5.12, Ollama gemma4 with vision capability confirmed. Is this a known sanitization issue with Telegram images?"

polar topazBOT
# safe pilot Found the exact log sequence: Inbound message telegram:... (direct, image/jpeg, ...

Yes, there’s a known/very similar report: https://github.com/openclaw/openclaw/issues/79750

That one was Telegram + native image drop with the exact Native image: dropped 1 image(s) after sanitization (prompt:images) line, though it was reported against Gemini on 2026.5.7. So I’d treat your Ollama/Gemma case as likely the same class of regression unless the logs prove otherwise.

One nuance: image/jpeg, 13 chars is probably the inbound text/caption length, not the image byte size. The important part is the later sanitizer drop.

Also, the lack of omitted image payload... lines makes sense from the current code: sanitizeImageBlocks() converts failures into text blocks internally, then sanitizeImageBlocks() filters those text blocks out and only logs the high-level dropped count. So the useful failure reason can be swallowed by that path.

Best next move: add your 2026.5.12 + Ollama/Gemma details to that issue, or open a new bug if maintainers prefer separate providers. Include:

  • openclaw --version
  • openclaw models status
  • ollama show <model> --json | jq '.capabilities'
  • the exact three log lines around inbound media + sanitizer drop
  • whether sending the same image as a Telegram document/file behaves differently than as a photo

If you can briefly run with debug logs, that may also help: OPENCLAW_LOG_LEVEL=debug openclaw logs --follow or set logging.level: "debug". Logging docs: https://docs.openclaw.ai/logging

safe pilot
polar topazBOT
# safe pilot https://github.com/openclaw/openclaw/issues/83023

Thanks, that issue has the right shape. One small thing I’d add there if you can: the actual model id as Ollama reports it (ollama list / ollama show ...) and whether ollama show <model> --json includes "vision" in capabilities.

The maintainer-useful distinction is: OpenClaw config says input: ["text", "image"], but does the Ollama /api/show capability probe also report vision for that exact model id? That narrows whether this is capability detection vs image sanitization/drop after load.

safe pilot
#

Better?

polar topazBOT
# safe pilot Better?

Better, yes. I’d still fill two gaps so it’s easier to action:

  • Logs, screenshots, and evidence: paste the exact log block there, even if it’s only 2-3 lines.
  • Additional provider/model setup: add the exact Ollama model id plus ollama show <model> --json | jq '.capabilities' output.

Right now the issue says Gemma4 (8B Q4_K_M) and that config has input: ["text", "image"], but the most useful proof is the raw Ollama capability result for that exact model. I’d also add the install method if you know it.

safe pilot
#

Better?

polar topazBOT
# safe pilot Better?

I’m still seeing the old version on GitHub from here: install method empty, provider/model setup empty, and no log block under evidence. Could be GitHub/readability cache, or the edit didn’t save.

If it did save on your side, the target version is basically:

  • install method filled in
  • exact log block pasted under evidence
  • exact Ollama model id
  • ollama show <model> --json | jq '.capabilities' output

Once those are in, yes, that’s a solid bug report.

safe pilot
#

I posted it into comments