Feishu image not visible to model | Friends of the Crustacean 🦞🤝

turbid cobalt Feb 14, 2026, 4:41 PM

OpenClaw 2026.2.13, Feishu channel, model connected via custom baseUrl.
Configured models[].input: ["text", "image"] and tools.media.image.enabled: true with a media understanding model specified.
After a user sends an image, the logs show it was downloaded successfully, but:
- There are no media understanding processing logs at all
- The model only receives the file path as text and cannot see the image content
Does image understanding work with a custom baseUrl provider? Any additional config needed?

undone thistleBOT Feb 14, 2026, 4:48 PM

This is one of those “two different features with similar names” situations:

- “Vision input to the reply model” (your models[].input: ["text","image"])
- Inbound “media understanding” pre-processing (tools.media.image...), which runs a separate vision pass and injects a short text description into ctx.Body.

Why you’re seeing “only a file path”

If your primary reply model is marked as vision-capable (input includes "image"), OpenClaw will auto-skip image media-understanding with the assumption “the main model can see images natively”.

You can see this in code-path behavior: it explicitly skips image understanding when “primary model supports vision natively”, and then you won’t get any [Image] … Description: block added.

So with a custom baseUrl provider:

- you set input: ["text","image"] → OpenClaw assumes vision works → skips media-understanding
- but your upstream may not actually accept the image bytes in the way OpenClaw sends them (or may only support /chat/completions without a compatible vision format), so the model effectively just gets whatever placeholder/context OpenClaw can provide (often a local path string), not pixels.
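
The skip rule described above can be pictured as a small decision function. This is only an illustrative sketch, not OpenClaw's actual code: the ModelEntry type, the decideImageHandling function, and the model id are hypothetical names made up for the example. Only the input array and the "skip when the primary model is vision-capable" rule come from the explanation above.

```typescript
// Hypothetical shapes for illustration; not OpenClaw's real types.
type InputKind = "text" | "image";

interface ModelEntry {
  id: string;
  input: InputKind[];
}

type ImageHandling = "send-to-reply-model" | "run-media-understanding" | "none";

// Decide what happens to an inbound image, following the rule described above:
// a primary model marked as vision-capable causes media understanding to be skipped.
function decideImageHandling(
  primary: ModelEntry,
  mediaUnderstandingEnabled: boolean,
): ImageHandling {
  if (primary.input.includes("image")) {
    return "send-to-reply-model";
  }
  return mediaUnderstandingEnabled ? "run-media-understanding" : "none";
}

// "my-model" is a placeholder id, not a real catalog entry.
const visionPrimary: ModelEntry = { id: "my-model", input: ["text", "image"] };
const textPrimary: ModelEntry = { id: "my-model", input: ["text"] };

console.log(decideImageHandling(visionPrimary, true));
console.log(decideImageHandling(textPrimary, true));
```

The point of the sketch is that the decision depends on what the catalog entry claims, not on what the upstream proxy can actually do, which is how a model can end up with only a path string.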

What to try (fastest path)

A) If your goal is “at least get image content into the conversation” (via text description):
Temporarily make the primary model text-only so OpenClaw does not skip preprocessing:

In your custom provider catalog, set the primary model to:
- input: ["text"] (remove "image")

Then configure media-understanding to use a known-working vision provider/model (or your custom provider if it truly supports vision):

tools.media.image.enabled: true
tools.media.image.models: [ { provider: "...", model: "..." } ]

Docs: https://docs.openclaw.ai/nodes/media-understanding

B) If your goal is true “reply model sees pixels” via the custom baseUrl:
Check what models.providers.<id>.api is set to.

- If you’re using api: "openai-completions": many OpenAI-compatible proxies only partially support multimodal, and local file paths are not usable by the remote model.
- If your proxy supports OpenAI’s newer interface, try api: "openai-responses" (that’s the mode OpenClaw’s multimodal plumbing is generally built around for “inline” inputs).

Config reference (custom providers + api modes): https://docs.openclaw.ai/gateway/configuration-reference

How to confirm what OpenClaw decided

Run /status after sending an image. When media understanding runs (or is skipped), it should show a line like:

📎 Media: image ok (...)
or
📎 Media: image skipped (...) / no-attachment

If you want deeper visibility in logs, run the gateway with --verbose (console) and/or set logging.level: "debug" (file logs). Docs: https://docs.openclaw.ai/gateway/logging

Two quick questions (to point you to the exact fix)

What is your custom provider configured as: api: "openai-completions" or api: "openai-responses"?
If you paste (redacting keys) just these parts, I can tell you which branch you’re hitting:
- models.providers.<yourProvider>
- the primary model entry (with input: ...)
- tools.media.image (and/or tools.media.models)

turbid cobalt Feb 15, 2026, 3:50 AM

Thanks for the detailed explanation! I followed Option A:
- Changed primary model to input: ["text"] (removed "image")
- Kept tools.media.image.enabled: true with model configured
After restart, sent an image via Feishu DM. Result:
- Image was downloaded successfully (confirmed in logs)
- But still NO media understanding logs at all: no "image ok", no "image skipped", nothing
- Model still only receives file path text, no [Image] description block
My provider config:
models.providers.anthropic:
  baseUrl: "https://custom-proxy.example.com/claude"
  api: "anthropic-messages"
  models: [{ id: "claude-opus-4-6", input: ["text"] }]
tools.media.image:
  enabled: true
  maxBytes: 10485760
  maxChars: 500
  models: [{ provider: "anthropic", model: "claude-opus-4-6", capabilities: ["image"], baseUrl: "https://custom-proxy.example.com/claude" }]
The api is "anthropic-messages" (not openai-completions/responses). Could this be the issue — does media understanding's image vision call work with anthropic-messages API via custom baseUrl?

undone thistleBOT Feb 15, 2026, 3:52 AM

Yep. This isn’t really a case of “anthropic-messages can’t do vision”; the issue is that media understanding’s image path uses the same model catalog capability gate as the main reply path.

1) Your current config can’t work as-written

You told OpenClaw:

models.providers.anthropic.models[{ id:"claude-opus-4-6", input:["text"] }] ✅ (text-only)

But then you asked media-understanding to run:

tools.media.image.models[{ provider:"anthropic", model:"claude-opus-4-6" }]

Media understanding resolves anthropic/claude-opus-4-6 from the model catalog and then hard-requires:

model.input.includes("image")

So with input: ["text"], image understanding is skipped or fails internally, and it never produces an [Image] Description: block.
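
To make the contradiction concrete, here is a rough sketch of a consistency check you could run against your own config by hand. It is not OpenClaw code: the Catalog and MediaModelRef types and the findImageGateProblems function are hypothetical, and the only behavior taken from the explanation above is that the referenced catalog entry must list "image" in input.

```typescript
// Hypothetical config shapes for illustration; not OpenClaw's real types.
interface CatalogModel {
  id: string;
  input: string[];
}

interface MediaModelRef {
  provider: string;
  model: string;
}

type Catalog = Record<string, { models: CatalogModel[] }>;

// List the media-understanding entries that could not pass an
// input.includes("image") gate on their catalog entry.
function findImageGateProblems(catalog: Catalog, refs: MediaModelRef[]): string[] {
  const problems: string[] = [];
  for (const ref of refs) {
    const name = `${ref.provider}/${ref.model}`;
    const entry = catalog[ref.provider]?.models.find((m) => m.id === ref.model);
    if (!entry) {
      problems.push(`${name}: not found in model catalog`);
    } else if (!entry.input.includes("image")) {
      problems.push(`${name}: catalog entry does not list "image" in input`);
    }
  }
  return problems;
}

// Mirrors the config posted above: the catalog says text-only,
// but media understanding points at the same model.
const catalog: Catalog = {
  anthropic: { models: [{ id: "claude-opus-4-6", input: ["text"] }] },
};
const refs: MediaModelRef[] = [{ provider: "anthropic", model: "claude-opus-4-6" }];

console.log(findImageGateProblems(catalog, refs));
```

With the posted config this reports one problem for anthropic/claude-opus-4-6; changing the catalog entry to input: ["text","image"], or pointing the media entry at a different vision-capable model, clears it.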

2) Also: `tools.media.image.models[].baseUrl` is (currently) ignored for images

In OpenClaw’s media-understanding implementation:

- audio provider entries honor baseUrl / headers overrides
- image provider entries do not pass those overrides through; they rely on the provider/model catalog (models.providers.<id>.baseUrl, etc.)

So in practice, for image understanding via a proxy, the supported path is:

- set models.providers.anthropic.baseUrl (+ headers if needed)
- ensure the target model in models.providers.anthropic.models[] advertises input: ["text","image"]

3) Why you saw “no media understanding logs at all”

Most of the media-understanding logs are behind debug-level file logs (or openclaw gateway --verbose).

To confirm what’s happening, do one of these and retry sending an image:

- set logging.level: "debug" in config, restart, then openclaw logs --follow
- or run the gateway with --verbose

Then you should see lines like:

- Media understanding image: no-attachment ...
- or image understanding failed: Model does not support images: anthropic/claude-opus-4-6
- or a success summary

Docs: https://docs.openclaw.ai/gateway/logging
Media understanding docs: https://docs.openclaw.ai/nodes/media-understanding

4) What to do next (pick one)

Option A (recommended): Use a different vision-capable model/provider for tools.media.image
Keep your primary model text-only if you want, but point media understanding at a provider/model that truly supports images (and is marked input: ["text","image"] in its catalog entry). That avoids the “same model can’t be both text-only and vision” contradiction.

Option B: Make anthropic/claude-opus-4-6 vision-capable in the catalog
Change it back to input: ["text","image"].
If your proxy actually supports Anthropic vision, then you shouldn’t need media understanding at all (OpenClaw will likely skip it and just send the image to the reply model directly).

If you do Option B and it still only passes a file path, that’s a strong sign your custom proxy’s anthropic-messages implementation is text-only (drops/ignores image blocks).
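
One way to test the proxy independently of OpenClaw is to send it a single Anthropic-style message containing an image block and see whether the reply actually describes the picture. The sketch below only builds such a request. The buildImageProbeRequest helper is hypothetical, and the /v1/messages path under your baseUrl and the x-api-key header are assumptions about how your proxy is exposed; adjust both to match it. The image block shape (type "image" with a base64 source) follows Anthropic's Messages API.

```typescript
// Hypothetical helper for probing a proxy by hand; not part of OpenClaw.
interface ProbeRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildImageProbeRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  imageBase64: string, // base64-encoded image bytes, without a data: prefix
  mediaType: string, // e.g. "image/png" or "image/jpeg"
): ProbeRequest {
  const body = {
    model,
    max_tokens: 64,
    messages: [
      {
        role: "user",
        content: [
          { type: "image", source: { type: "base64", media_type: mediaType, data: imageBase64 } },
          { type: "text", text: "Describe this image in one sentence." },
        ],
      },
    ],
  };
  return {
    // Assumption: the proxy mirrors Anthropic's /v1/messages path under baseUrl.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
        "x-api-key": apiKey, // assumption: your proxy may expect a different auth header
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (not executed here): const res = await fetch(req.url, req.init);
const req = buildImageProbeRequest(
  "https://custom-proxy.example.com/claude",
  "YOUR_KEY",
  "claude-opus-4-6",
  "BASE64_IMAGE_DATA",
  "image/png",
);
console.log(req.url);
```

If the proxy returns an error for this request, or the model answers as if no image were attached, that points at the proxy rather than at OpenClaw's config.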
