# Is image recognition broken?
Not aware of a Gemini- or Anthropic-specific “vision is broken” regression in 2026.2.19-2 — but there have been multiple reports lately where it looks like the model can’t see images, and the root cause is usually image ingestion / forwarding (channel/endpoint) rather than the LLM itself.
The common failure modes I’ve seen reported:
- Discord attachment fetch is wrong / stale (model keeps “seeing” an old image)
- Discord: model hallucinates because image never arrives (user perception = “vision broken”)
- OpenAI-compatible POST /v1/chat/completions drops image parts (works via other paths, but not that endpoint)
  - Issue: https://github.com/openclaw/openclaw/issues/18583
  - Docs for the endpoint: https://docs.openclaw.ai/gateway/openai-http-api
- image tool fails with Gemini (via OpenRouter etc.) with “Image model returned no text” (often token/params handling)
To narrow yours down quickly: when you say “broken”, which symptom do you get?
- “I can’t see images” / refuses to analyze
- It analyzes the wrong (previous) image
- It errors (e.g. “Image model returned no text”, 4xx from provider, etc.)
- You’re testing via Discord chat vs via /v1/chat/completions (OpenWebUI etc.)
If you can paste:
- the exact error text (or 10–20 lines around it from openclaw logs --follow), and
- whether this is Discord attachments or the HTTP API endpoint,
…we can usually pinpoint which bucket you’re in and whether upgrading is the only practical fix.
It just hallucinates when I attach images in Telegram. It used to be fine.
If it’s Telegram and the bot now just hallucinates on attached photos, that’s usually not Gemini/Anthropic “vision broke” — it’s almost always the image never made it into the prompt (media download/forwarding issue), so the model only sees your text and guesses.
There have been a few reports in that area, for example:
- Telegram media download fails (MediaFetchError: … TypeError: fetch failed) → image never arrives: https://github.com/openclaw/openclaw/issues/4662 and https://github.com/openclaw/openclaw/issues/24854
- Telegram receives the image but doesn’t hand it to the configured imageModel (more relevant when primary model is text-only): https://github.com/openclaw/openclaw/issues/7564
Can you do this quick triage and paste the relevant bits?
- Run:
openclaw logs --follow
then send an image in Telegram again, and paste the log lines around:
  - any MediaFetchError / “failed to download media”
  - the “run start” line that shows which provider/model it used
- Also: are you sending the image as a photo or as a file/document? Telegram has a default mediaMaxMb (often 5MB) and bigger “document” uploads can get blocked/skipped. (Config ref shows channels.telegram.mediaMaxMb: https://docs.openclaw.ai/gateway/configuration-reference)
- Did you recently change the session model via /model to something text-only? (openclaw models list shows an input column; you want text+image for the active model.)
With those logs we can tell immediately whether it’s (a) download failing, (b) size limit, or (c) model selection/vision routing.
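If the log output is long, a small script can pull out just the relevant lines before pasting. The following is a minimal sketch, not part of OpenClaw: the function name `find_media_fetch_failures` is hypothetical, and the two marker strings are simply the ones quoted in this thread, so the real log wording may differ.

```python
from typing import List

# Marker strings quoted in this thread; the actual log wording may differ.
FETCH_MARKERS = ("MediaFetchError", "failed to download media")


def find_media_fetch_failures(log_text: str, context: int = 2) -> List[str]:
    """Return excerpts (matching line plus `context` lines either side)
    for every log line that mentions a media fetch failure."""
    lines = log_text.splitlines()
    excerpts = []
    for i, line in enumerate(lines):
        lowered = line.lower()
        if any(marker.lower() in lowered for marker in FETCH_MARKERS):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            excerpts.append("\n".join(lines[start:end]))
    return excerpts
```

Feed it log text captured while sending an image. An empty result suggests the download itself is probably not the problem, which leaves the size limit or model selection. Excerpts from nearby matches can overlap, which is acceptable for a quick paste.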
OK, they are all saying text + image. But they are image actually. How do I fix it? In the JSON file?
Hallucination on Telegram almost always means: the model never actually received the image bytes (so it’s answering from your text only). That’s usually either:
- you’re not actually running a vision-capable model for that session, or
- Telegram media download/size is failing, so the gateway drops the image.
Here’s the fastest way to pin it down + what you can fix in openclaw.json:
1) Confirm the active model is vision (not just “models list”)
Run:
openclaw models status
(or openclaw models status --agent main if you have multiple agents)
You want the resolved primary model to show input: text+image.
If it doesn’t, switch to a vision model (CLI is easiest):
openclaw models set <your-vision-model-id>
2) Check logs for Telegram media fetch failures
Run:
openclaw logs --follow
Then send an image again and look for errors like:
- MediaFetchError: Failed to fetch media from https://api.telegram.org/file/...
- or anything like “failed to download media”
If you see that, it’s a Telegram media download problem (not Gemini/Claude vision).
3) Common JSON fix: increase Telegram media size limit
Telegram has a mediaMaxMb cap (default is often 5MB). If you’re sending screenshots/documents, you can hit it.
In ~/.openclaw/openclaw.json (JSON5), set e.g.:
{
  channels: {
    telegram: {
      mediaMaxMb: 20,
    },
  },
}
Then restart:
openclaw gateway restart
Quick sanity test
Send a small image as a photo (not “file/document”) and ask “what’s in this image?”. If that works, it was almost certainly mediaMaxMb / document size.
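To check up front whether a given file would hit the cap, the comparison can be sketched as below. This is an illustration only: the helper `exceeds_media_limit` is hypothetical, the 5 MB default mirrors the "often 5MB" figure mentioned above, and it assumes 1 MB = 1024 * 1024 bytes, which may not match how the gateway counts.

```python
def exceeds_media_limit(size_bytes: int, media_max_mb: float = 5.0) -> bool:
    """Return True if a file of `size_bytes` is larger than `media_max_mb`."""
    # Assumption: 1 MB = 1024 * 1024 bytes; the gateway may count differently.
    return size_bytes > media_max_mb * 1024 * 1024
```

For a local file, pass `os.path.getsize(path)` as `size_bytes`. A file that exceeds the default but not the raised value, for example `media_max_mb=20`, is a good candidate for the mediaMaxMb fix.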
If you paste the snippet from openclaw logs --follow around the moment you send the image (especially any MediaFetchError lines + the “run start … provider/model=…” line), I can tell you which of the above it is in one pass.
openclaw models list is showing text only, but they are vision models as well. How do I define that? They are from a custom provider.
Yep — for custom providers, OpenClaw can’t reliably infer “vision vs text-only” from the upstream API, so if you don’t declare it, it defaults to input: ["text"] and openclaw models list will show text-only.
You fix it by defining the model in your models.providers.<yourProvider>.models entry and setting input: ["text", "image"] in ~/.openclaw/openclaw.json (JSON5).
Example (OpenAI-compatible proxy/custom baseUrl):
{
  models: {
    mode: "merge",
    providers: {
      myprovider: {
        api: "openai-completions",
        baseUrl: "https://your-proxy.example.com/v1",
        apiKey: "${MYPROVIDER_API_KEY}",
        models: [
          {
            id: "my-vision-model",
            name: "My Vision Model",
            input: ["text", "image"], // <- THIS is what flips it to vision
          },
        ],
      },
    },
  },
  agents: {
    defaults: {
      model: { primary: "myprovider/my-vision-model" },
      // optional but recommended if you sometimes use text-only primaries:
      // imageModel: { primary: "myprovider/my-vision-model" },
    },
  },
}
Then:
openclaw gateway restart
openclaw models list
Docs (custom providers + the input field): https://docs.openclaw.ai/concepts/model-providers#providers-via-modelsproviders-custombase-url
If after this it still hallucinates on Telegram images, paste the relevant openclaw logs --follow chunk when you send an image — we need to confirm Telegram media download succeeded (otherwise the model never sees the bytes, even if it’s marked text+image).
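As a sanity check on the config itself, the lookup described above can be sketched in a few lines. This is not how OpenClaw resolves models internally. It is a hypothetical helper that works on a config already loaded into a Python dict (the file is JSON5, so the standard json module will not parse it directly). It assumes the key layout from the example and the text-only default for undeclared inputs described in this thread.

```python
from typing import Any, Dict, List


def declared_inputs(config: Dict[str, Any], model_ref: str) -> List[str]:
    """Return the declared input types for a 'provider/model-id' reference."""
    # Split on the first "/" only, so model ids containing "/" stay intact.
    provider_name, _, model_id = model_ref.partition("/")
    providers = config.get("models", {}).get("providers", {})
    for model in providers.get(provider_name, {}).get("models", []):
        if model.get("id") == model_id:
            # Per the thread, an undeclared input defaults to text-only.
            return list(model.get("input", ["text"]))
    return ["text"]  # model not declared at all: treated as text-only here


def primary_supports_images(config: Dict[str, Any]) -> bool:
    """Check whether agents.defaults.model.primary is declared with image input."""
    primary = (
        config.get("agents", {})
        .get("defaults", {})
        .get("model", {})
        .get("primary", "")
    )
    return "image" in declared_inputs(config, primary)
```

If `primary_supports_images` returns False for your config, the missing `input: ["text", "image"]` declaration is the first thing to fix before looking at Telegram downloads.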