Is image recognition broken? | Friends of the Crustacean 🦞🤝 | Page 1

woeful arch Feb 27, 2026, 8:45 PM

Hi, I am staying on one of the older versions.
Have there been reports that image recognition by models (Gemini or Anthropic) is broken?

I am on OpenClaw 2026.2.19-2.

thin crownBOT Feb 27, 2026, 8:48 PM

Not aware of a Gemini- or Anthropic-specific “vision is broken” regression in 2026.2.19-2 — but there have been multiple reports lately where it looks like the model can’t see images, and the root cause is usually image ingestion / forwarding (channel/endpoint) rather than the LLM itself.

The common failure modes I’ve seen reported:

- Discord attachment fetch is wrong / stale (model keeps “seeing” an old image)
  - Issue: https://github.com/openclaw/openclaw/issues/7917
- Discord: model hallucinates because image never arrives (user perception = “vision broken”)
  - Issue: https://github.com/openclaw/openclaw/issues/6410
- OpenAI-compatible POST /v1/chat/completions drops image parts (works via other paths, but not that endpoint)
  - Issue: https://github.com/openclaw/openclaw/issues/18583
  - Docs for the endpoint: https://docs.openclaw.ai/gateway/openai-http-api
- image tool fails with Gemini (via OpenRouter etc.) with “Image model returned no text” (often token/params handling)
  - Issue: https://github.com/openclaw/openclaw/issues/8096

To narrow yours down quickly: when you say “broken”, which symptom do you get?

- “I can’t see images” / refuses to analyze
- It analyzes the wrong (previous) image
- It errors (e.g. “Image model returned no text”, 4xx from provider, etc.)

Also, are you testing via Discord chat or via /v1/chat/completions (OpenWebUI etc.)?

If you can paste:

the exact error text (or 10–20 lines around it from openclaw logs --follow), and
whether this is Discord attachments or the HTTP API endpoint,

…we can usually pinpoint which bucket you’re in and whether upgrading is the only practical fix.

woeful arch Feb 27, 2026, 8:49 PM

It just hallucinates when I attach images in Telegram. It used to be fine.

thin crownBOT Feb 27, 2026, 8:51 PM

If it’s Telegram and the bot now just hallucinates on attached photos, that’s usually not Gemini/Anthropic “vision broke” — it’s almost always the image never made it into the prompt (media download/forwarding issue), so the model only sees your text and guesses.

There have been a few reports in that area, for example:

Telegram media download fails (MediaFetchError: … TypeError: fetch failed) → image never arrives: https://github.com/openclaw/openclaw/issues/4662 and https://github.com/openclaw/openclaw/issues/24854
Telegram receives the image but doesn’t hand it to the configured imageModel (more relevant when primary model is text-only): https://github.com/openclaw/openclaw/issues/7564

Can you do this quick triage and paste the relevant bits?

Run:

openclaw logs --follow

then send an image in Telegram again, and paste the log lines around:

any MediaFetchError / “failed to download media”
the “run start” line that shows which provider/model it used

Also: are you sending the image as a photo or as a file/document? OpenClaw's Telegram channel has a default mediaMaxMb limit (often 5MB), and bigger “document” uploads can get blocked/skipped. (The config reference shows channels.telegram.mediaMaxMb: https://docs.openclaw.ai/gateway/configuration-reference)

Did you recently change the session model via /model to something text-only? (openclaw models list shows an input column — you want text+image for the active model.)

With those logs we can tell immediately whether it’s (a) download failing, (b) size limit, or (c) model selection/vision routing.

woeful arch Feb 27, 2026, 8:56 PM

OK, they all say text + image. But they are image, actually. How do I fix it? In the JSON file?

thin crownBOT Feb 27, 2026, 8:56 PM

Hallucination on Telegram almost always means: the model never actually received the image bytes (so it’s answering from your text only). That’s usually either:

you’re not actually running a vision-capable model for that session, or
Telegram media download/size is failing, so the gateway drops the image.

Here’s the fastest way to pin it down + what you can fix in openclaw.json:

1) Confirm the active model is vision (not just “models list”)

Run:

openclaw models status

(or openclaw models status --agent main if you have multiple agents)

You want the resolved primary model to show input: text+image.
If it doesn’t, switch to a vision model (CLI is easiest):

openclaw models set <your-vision-model-id>

2) Check logs for Telegram media fetch failures

Run:

openclaw logs --follow

Then send an image again and look for errors like:

MediaFetchError: Failed to fetch media from https://api.telegram.org/file/...
or anything like “failed to download media”

If you see that, it’s a Telegram media download problem (not Gemini/Claude vision).
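
As a rough aid for step 2, here is a minimal Python sketch that filters captured log text for the error strings mentioned in this thread. The marker list and the function name `media_error_lines` are illustrative assumptions, not part of OpenClaw; the real log wording may differ, so treat a clean result as a hint rather than proof.

```python
# Substrings quoted in this thread. Assumption: the real log wording may differ.
MARKERS = ("mediafetcherror", "failed to download media", "fetch failed")


def media_error_lines(lines):
    """Yield each line (newline stripped) that contains a marker, ignoring case."""
    for line in lines:
        if any(marker in line.lower() for marker in MARKERS):
            yield line.rstrip("\n")


# Sample input: the first line is the error text quoted above,
# the second is a made-up placeholder for unrelated output.
sample = [
    "MediaFetchError: Failed to fetch media from https://api.telegram.org/file/...\n",
    "placeholder line unrelated to media\n",
]
for hit in media_error_lines(sample):
    print(hit)
```

To run it over real output, save a stretch of `openclaw logs --follow` to a file and pass the open file object in place of `sample`.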

3) Common JSON fix: increase Telegram media size limit

OpenClaw's Telegram channel has a mediaMaxMb cap (default is often 5MB). If you’re sending screenshots/documents, you can hit it.

In ~/.openclaw/openclaw.json (JSON5), set e.g.:

{
  channels: {
    telegram: {
      mediaMaxMb: 20,
    },
  },
}

Then restart:

openclaw gateway restart

Quick sanity test

Send a small image as a photo (not “file/document”) and ask “what’s in this image?”. If that works, it was almost certainly mediaMaxMb / document size.
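
To check whether a given file would trip the size cap before sending it, a small Python sketch like the following may help. It assumes the cap is compared against the raw file size and that 1 MB means 1024 * 1024 bytes; neither is confirmed in this thread, and the helper names are hypothetical.

```python
import os

BYTES_PER_MB = 1024 * 1024  # assumption: the gateway may count megabytes differently


def exceeds_cap(size_bytes, media_max_mb):
    """Return True if size_bytes is strictly larger than the cap in megabytes."""
    return size_bytes > media_max_mb * BYTES_PER_MB


def file_exceeds_cap(path, media_max_mb=5):
    """Check a file on disk; 5 mirrors the "often 5MB" default mentioned above."""
    return exceeds_cap(os.path.getsize(path), media_max_mb)


print(exceeds_cap(6 * BYTES_PER_MB, 5))   # a 6 MB file against a 5 MB cap
print(exceeds_cap(6 * BYTES_PER_MB, 20))  # the same file after raising the cap to 20
```

Telegram may also recompress images sent as photos, so the size on disk is only an upper bound for that path; the check is most relevant for file/document uploads.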

If you paste the snippet from openclaw logs --follow around the moment you send the image (especially any MediaFetchError lines + the “run start … provider/model=…” line), I can tell you which of the above it is in one pass.

woeful arch Feb 27, 2026, 8:58 PM

openclaw models list is showing text only, but they are vision models as well. How do I define that? They are from a custom provider.

thin crownBOT Feb 27, 2026, 8:59 PM

Yep — for custom providers, OpenClaw can’t reliably infer “vision vs text-only” from the upstream API, so if you don’t declare it, it defaults to input: ["text"] and openclaw models list will show text-only.

You fix it by defining the model in your models.providers.<yourProvider>.models entry and setting input: ["text", "image"] in ~/.openclaw/openclaw.json (JSON5).

Example (OpenAI-compatible proxy/custom baseUrl):

{
  models: {
    mode: "merge",
    providers: {
      myprovider: {
        api: "openai-completions",
        baseUrl: "https://your-proxy.example.com/v1",
        apiKey: "${MYPROVIDER_API_KEY}",
        models: [
          {
            id: "my-vision-model",
            name: "My Vision Model",
            input: ["text", "image"], // <- THIS is what flips it to vision
          },
        ],
      },
    },
  },

  agents: {
    defaults: {
      model: { primary: "myprovider/my-vision-model" },
      // optional but recommended if you sometimes use text-only primaries:
      // imageModel: { primary: "myprovider/my-vision-model" },
    },
  },
}

Then:

openclaw gateway restart
openclaw models list

Docs (custom providers + the input field): https://docs.openclaw.ai/concepts/model-providers#providers-via-modelsproviders-custombase-url

If after this it still hallucinates on Telegram images, paste the relevant openclaw logs --follow chunk when you send an image — we need to confirm Telegram media download succeeded (otherwise the model never sees the bytes, even if it’s marked text+image).
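
For a config with many custom models, a quick way to spot entries that would still show as text-only is to walk the providers section. The Python sketch below works on an already-parsed dict (the standard `json` module cannot read JSON5 comments or unquoted keys, so parsing is left out), and it assumes the default of `["text"]` for undeclared models as described above. The function name and the second model entry are hypothetical.

```python
def text_only_models(config):
    """Return "provider/model" ids whose declared input lacks "image"."""
    missing = []
    providers = config.get("models", {}).get("providers", {})
    for provider_name, provider in providers.items():
        for model in provider.get("models", []):
            # Per the explanation above, an undeclared input is treated as text-only.
            declared = model.get("input", ["text"])
            if "image" not in declared:
                missing.append(f"{provider_name}/{model['id']}")
    return missing


# Same shape as the example config above, reduced to the relevant keys.
config = {
    "models": {"providers": {"myprovider": {"models": [
        {"id": "my-vision-model", "input": ["text", "image"]},
        {"id": "my-other-model"},  # hypothetical entry with no input declared
    ]}}}
}
print(text_only_models(config))  # ['myprovider/my-other-model']
```

Any id it prints is a candidate for adding input: ["text", "image"], provided the upstream model really accepts images.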

woeful arch Feb 27, 2026, 9:03 PM

Q
