OpenClaw 2026.2.13, Feishu channel, model connected via custom baseUrl.
Configured models[].input: ["text", "image"] and tools.media.image.enabled: true with a media understanding model specified.
After user sends an image, logs show the image was downloaded successfully, but:
No media understanding processing logs at all
Model only receives the file path as text, cannot see the image content
Does image understanding work with a custom baseUrl provider? Any additional config needed?
#Feishu image not visible to model
1 messages · Page 1 of 1 (latest)
This is one of those “two different features with similar names” situations:
- “Vision input to the reply model” (your
models[].input: ["text","image"]) - Inbound “media understanding” pre-processing (
tools.media.image...) which runs a separate vision pass and injects a short text description intoctx.Body.
Why you’re seeing “only a file path”
If your primary reply model is marked as vision-capable (input includes "image"), OpenClaw will auto-skip image media-understanding with the assumption “the main model can see images natively”.
You can see this in code-path behavior: it explicitly skips image understanding when “primary model supports vision natively”, and then you won’t get any [Image] … Description: block added.
So with a custom baseUrl provider:
- you set
input: ["text","image"]→ OpenClaw assumes vision works → skips media-understanding - but your upstream may not actually accept the image bytes in the way OpenClaw sends them (or may only support
/chat/completionswithout a compatible vision format), so the model effectively just gets whatever placeholder/context OpenClaw can provide (often a local path string), not pixels.
What to try (fastest path)
A) If your goal is “at least get image content into the conversation” (via text description):
Temporarily make the primary model text-only so OpenClaw does not skip preprocessing:
- In your custom provider catalog, set the primary model to:
input: ["text"](remove"image")
Then configure media-understanding to use a known-working vision provider/model (or your custom provider if it truly supports vision):
tools.media.image.enabled: truetools.media.image.models: [ { provider: "...", model: "..." } ]
Docs: https://docs.openclaw.ai/nodes/media-understanding
B) If your goal is true “reply model sees pixels” via the custom baseUrl:
Check what models.providers.<id>.api is set to.
- If you’re using
api: "openai-completions": many OpenAI-compatible proxies only partially support multimodal, and local file paths are not usable by the remote model. - If your proxy supports OpenAI’s newer interface, try
api: "openai-responses"(that’s the mode OpenClaw’s multimodal plumbing is generally built around for “inline” inputs).
Config reference (custom providers + api modes): https://docs.openclaw.ai/gateway/configuration-reference
How to confirm what OpenClaw decided
Run /status after sending an image. When media understanding runs (or is skipped), it should show a line like:
📎 Media: image ok (...)
or📎 Media: image skipped (...)/no-attachment
If you want deeper visibility in logs, run the gateway with --verbose (console) and/or set logging.level: "debug" (file logs). Docs: https://docs.openclaw.ai/gateway/logging
Two quick questions (to point you to the exact fix)
- What is your custom provider configured as:
api: "openai-completions"orapi: "openai-responses"? - If you paste (redacting keys) just these parts, I can tell you which branch you’re hitting:
models.providers.<yourProvider>- the primary model entry (with
input: ...) tools.media.image(and/ortools.media.models)
Thanks for the detailed explanation! I followed Option A:
Changed primary model to input: ["text"] (removed "image")
Kept tools.media.image.enabled: true with model configured
After restart, sent an image via Feishu DM. Result:
Image was downloaded successfully (confirmed in logs)
But still NO media understanding logs at all — no "image ok", no "image skipped", nothing
Model still only receives file path text, no [Image] description block
My provider config:
models.providers.anthropic:
baseUrl: "https://custom-proxy.example.com/claude"
api: "anthropic-messages"
models: [{ id: "claude-opus-4-6", input: ["text"] }]
tools.media.image:
enabled: true
maxBytes: 10485760
maxChars: 500
models: [{ provider: "anthropic", model: "claude-opus-4-6", capabilities: ["image"], baseUrl: "https://custom-proxy.example.com/claude" }]
The api is "anthropic-messages" (not openai-completions/responses). Could this be the issue — does media understanding's image vision call work with anthropic-messages API via custom baseUrl?
Yep — this isn’t really “anthropic-messages can’t do vision”, it’s that media understanding’s image path uses the same model catalog capability gate as the main reply path.
1) Your current config can’t work as-written
You told OpenClaw:
models.providers.anthropic.models[{ id:"claude-opus-4-6", input:["text"] }]âś… (text-only)
But then you asked media-understanding to run:
tools.media.image.models[{ provider:"anthropic", model:"claude-opus-4-6" }]
Media understanding resolves anthropic/claude-opus-4-6 from the model catalog and then hard-requires:
model.input.includes("image")
So with input:["text"], image understanding will be skipped/failed internally (and will never produce a [Image] Description: block).
2) Also: tools.media.image.models[].baseUrl is (currently) ignored for images
In OpenClaw’s media-understanding implementation:
- audio provider entries honor
baseUrl/headersoverrides - image provider entries do not pass those overrides through; they rely on the provider/model catalog (
models.providers.<id>.baseUrl, etc.)
So in practice, for image understanding via a proxy, the supported path is:
- set
models.providers.anthropic.baseUrl(+headersif needed) - ensure the target model in
models.providers.anthropic.models[]advertisesinput: ["text","image"]
3) Why you saw “no media understanding logs at all”
Most of the media-understanding logs are behind debug-level file logs (or openclaw gateway --verbose).
To confirm what’s happening, do one of these and retry sending an image:
- set
logging.level: "debug"in config, restart, thenopenclaw logs --follow - or run the gateway with
--verbose
Then you should see lines like:
Media understanding image: no-attachment ...- or
image understanding failed: Model does not support images: anthropic/claude-opus-4-6 - or a success summary
Docs: https://docs.openclaw.ai/gateway/logging
Media understanding docs: https://docs.openclaw.ai/nodes/media-understanding
4) What to do next (pick one)
Option A (recommended): Use a different vision-capable model/provider for tools.media.image
Keep your primary model text-only if you want, but point media understanding at a provider/model that truly supports images (and is marked input: ["text","image"] in its catalog entry). That avoids the “same model can’t be both text-only and vision” contradiction.
Option B: Make anthropic/claude-opus-4-6 vision-capable in the catalog
Change it back to input: ["text","image"].
If your proxy actually supports Anthropic vision, then you shouldn’t need media understanding at all (OpenClaw will likely skip it and just send the image to the reply model directly).
If you do Option B and it still only passes a file path, that’s a strong sign your custom proxy’s anthropic-messages implementation is text-only (drops/ignores image blocks).