#Is image recognition broken?


woeful arch
#

Hi, I am staying on one of the older versions.
Has there been a report that image recognition by models (Gemini or Anthropic) is broken?

I am on OpenClaw 2026.2.19-2.

fading pecanBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

thin crownBOT
#

Not aware of a Gemini- or Anthropic-specific “vision is broken” regression in 2026.2.19-2 — but there have been multiple reports lately where it looks like the model can’t see images, and the root cause is usually image ingestion / forwarding (channel/endpoint) rather than the LLM itself.

The common failure modes I’ve seen reported:

  1. Discord attachment fetch is wrong / stale (model keeps “seeing” an old image)

  2. Discord: model hallucinates because image never arrives (user perception = “vision broken”)

  3. OpenAI-compatible POST /v1/chat/completions drops image parts (works via other paths, but not that endpoint)

  4. image tool fails with Gemini (via OpenRouter etc.) with “Image model returned no text” (often token/params handling)

To narrow yours down quickly: when you say “broken”, which symptom do you get?

  • “I can’t see images” / refuses to analyze
  • It analyzes the wrong (previous) image
  • It errors (e.g. “Image model returned no text”, 4xx from provider, etc.)
  • You’re testing via Discord chat vs via /v1/chat/completions (OpenWebUI etc.)

If you can paste:

  • the exact error text (or 10–20 lines around it from openclaw logs --follow), and
  • whether this is Discord attachments or the HTTP API endpoint,

…we can usually pinpoint which bucket you’re in and whether upgrading is the only practical fix.

woeful arch
#

It just hallucinates when I attach images in Telegram. It used to be fine.

thin crownBOT
#

If it’s Telegram and the bot now just hallucinates on attached photos, that’s usually not Gemini/Anthropic “vision broke” — it’s almost always the image never made it into the prompt (media download/forwarding issue), so the model only sees your text and guesses.

There have been a few reports in that area.

Can you do this quick triage and paste the relevant bits?

  1. Run:

     openclaw logs --follow

     then send an image in Telegram again, and paste the log lines around:

       • any MediaFetchError / “failed to download media”
       • the “run start” line that shows which provider/model it used

  2. Also: are you sending the image as a photo or as a file/document? Telegram has a default mediaMaxMb (often 5MB) and bigger “document” uploads can get blocked/skipped. (Config ref shows channels.telegram.mediaMaxMb: https://docs.openclaw.ai/gateway/configuration-reference)

  3. Did you recently change the session model via /model to something text-only? (openclaw models list shows an input column; you want text+image for the active model.)

With those logs we can tell immediately whether it’s (a) download failing, (b) size limit, or (c) model selection/vision routing.

woeful arch
#

OK, they all say text + image. But they are image actually. How do I fix it? In the JSON file?

thin crownBOT
#

Hallucination on Telegram almost always means: the model never actually received the image bytes (so it’s answering from your text only). That’s usually either:

  1. you’re not actually running a vision-capable model for that session, or
  2. Telegram media download/size is failing, so the gateway drops the image.

Here’s the fastest way to pin it down + what you can fix in openclaw.json:

1) Confirm the active model is vision (not just “models list”)

Run:

openclaw models status

(or openclaw models status --agent main if you have multiple agents)

You want the resolved primary model to show input: text+image.
If it doesn’t, switch to a vision model (CLI is easiest):

openclaw models set <your-vision-model-id>

2) Check logs for Telegram media fetch failures

Run:

openclaw logs --follow

Then send an image again and look for errors like:

  • MediaFetchError: Failed to fetch media from https://api.telegram.org/file/...
  • or anything like “failed to download media”

If you see that, it’s a Telegram media download problem (not Gemini/Claude vision).
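
If the log is noisy, a small filter can pull out the lines that matter. This is a minimal Python sketch, not part of OpenClaw: find_media_errors is a hypothetical helper, and the marker strings are simply the error texts quoted above, so the real wording in your logs may differ.

```python
# Substrings taken from the error examples quoted above (assumption: the
# actual log wording may differ, so adjust these to what you really see).
MEDIA_ERROR_MARKERS = (
    "mediafetcherror",
    "failed to fetch media",
    "failed to download media",
)


def find_media_errors(lines):
    """Yield (line_number, text) for lines that look like media fetch failures."""
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in MEDIA_ERROR_MARKERS):
            yield number, line.rstrip("\n")
```

It accepts any iterable of text lines, so you can feed it an open log file you saved earlier and print whatever it yields. No matches at the moment you sent the image points away from a download failure and toward model selection.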

3) Common JSON fix: increase Telegram media size limit

Telegram has a mediaMaxMb cap (default is often 5MB). If you’re sending screenshots/documents, you can hit it.

In ~/.openclaw/openclaw.json (JSON5), set e.g.:

{
  channels: {
    telegram: {
      mediaMaxMb: 20,
    },
  },
}

Then restart:

openclaw gateway restart

Quick sanity test

Send a small image as a photo (not “file/document”) and ask “what’s in this image?”. If that works, it was almost certainly mediaMaxMb / document size.
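
To check a file against the cap before sending it, something like the sketch below works. It is only an illustration: exceeds_media_limit and file_exceeds_media_limit are hypothetical helpers, the 5 MB default just mirrors the "often 5MB" figure above, and counting a megabyte as 1024 * 1024 bytes is an assumption about how the limit is measured.

```python
import os

# Assumption: the limit is counted in units of 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def exceeds_media_limit(size_bytes, media_max_mb=5):
    """Return True if a file of size_bytes is larger than the configured cap."""
    return size_bytes > media_max_mb * BYTES_PER_MB


def file_exceeds_media_limit(path, media_max_mb=5):
    """Same check, reading the size of a file on disk."""
    return exceeds_media_limit(os.path.getsize(path), media_max_mb)
```

If this returns True for the image you are sending with your current mediaMaxMb, the size limit is the likely culprit.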

If you paste the snippet from openclaw logs --follow around the moment you send the image (especially any MediaFetchError lines + the “run start … provider/model=…” line), I can tell you which of the above it is in one pass.

woeful arch
#

openclaw models list is showing text only, but they support vision as well. How do I define that? They are from a custom provider.

thin crownBOT
#

Yep — for custom providers, OpenClaw can’t reliably infer “vision vs text-only” from the upstream API, so if you don’t declare it, it defaults to input: ["text"] and openclaw models list will show text-only.

You fix it by defining the model in your models.providers.<yourProvider>.models entry and setting input: ["text", "image"] in ~/.openclaw/openclaw.json (JSON5).
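
To spot which custom models would still be treated as text-only, here is a rough Python sketch. It assumes you already have the config as a dict with the models.providers.<name>.models shape described above (the file is JSON5, so the standard json module may not parse it directly). text_only_models is a hypothetical helper, and the provider and model names are placeholders.

```python
def text_only_models(config):
    """List "provider/model" refs whose declared input does not include "image"."""
    refs = []
    providers = config.get("models", {}).get("providers", {})
    for provider_name, provider in providers.items():
        for model in provider.get("models", []):
            # Mirrors the default described above: no declared input means text-only.
            inputs = model.get("input", ["text"])
            if "image" not in inputs:
                refs.append(f"{provider_name}/{model['id']}")
    return refs


# Placeholder config: one model declares image input, the other declares nothing.
example = {
    "models": {
        "providers": {
            "myprovider": {
                "models": [
                    {"id": "declared-vision-model", "input": ["text", "image"]},
                    {"id": "undeclared-model"},
                ]
            }
        }
    }
}
print(text_only_models(example))  # prints ['myprovider/undeclared-model']
```

Anything it lists needs an explicit input: ["text", "image"] entry before images will be routed to it.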

Example (OpenAI-compatible proxy/custom baseUrl):

{
  models: {
    mode: "merge",
    providers: {
      myprovider: {
        api: "openai-completions",
        baseUrl: "https://your-proxy.example.com/v1",
        apiKey: "${MYPROVIDER_API_KEY}",
        models: [
          {
            id: "my-vision-model",
            name: "My Vision Model",
            input: ["text", "image"], // <- THIS is what flips it to vision
          },
        ],
      },
    },
  },

  agents: {
    defaults: {
      model: { primary: "myprovider/my-vision-model" },
      // optional but recommended if you sometimes use text-only primaries:
      // imageModel: { primary: "myprovider/my-vision-model" },
    },
  },
}

woeful arch
#

Q