model cannot see pasted image in chatbox | Friends of the Crustacean 🦞🤝 | Page 1

stable aurora Mar 3, 2026, 9:12 AM

#

i paste an image in the chatbox, so my loca visual model see it, but model says it sees nothing

crystal jungleBOT Mar 3, 2026, 9:12 AM

#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

urban stoneBOT Mar 3, 2026, 9:14 AM

#

If you paste/upload an image in Discord and the agent replies but says it “sees nothing”, it’s usually one of these:

Your active reply model/provider integration isn’t actually getting the image bytes
A lot of “local vision” setups work in their own UI, but the OpenClaw provider path you’re using (often OpenAI-compatible / local gateway) may drop image parts, so the model only receives text.
The image isn’t arriving as an attachment (it’s just an embed/link)
OpenClaw only has something to pass along if Discord delivered it as a real attachment.

Quick checks

Are you sending it as a real attachment? (thumbnail + filename/size, not just an embed preview)
What does openclaw --version say? (version mismatches matter a lot)
If you can, paste a few lines of the gateway log from the moment you send the image (we’re looking for anything about inbound media/attachments).

The reliable fix (works even with non-vision local models): enable “Media Understanding”

OpenClaw can pre-describe inbound images and inject that text into the message as an [Image] block, so your local model has something concrete to work with even if it can’t accept raw image inputs.

Docs: https://docs.openclaw.ai/nodes/media-understanding

#

If you tell me:

your openclaw --version
which provider you’re using for the “local visual model” (Ollama? LM Studio OpenAI-compatible? something else?)
…I can point you at the exact knob to turn for your setup.

stable aurora Mar 3, 2026, 12:04 PM

#

urban stone If you paste/upload an image in Discord and the agent replies but says it “sees ...

i mean that i paste an image into my openclaw chatbox, but the model cannot see it , even though it is avisual model. i use local models through lmstudio server

urban stoneBOT Mar 3, 2026, 12:06 PM

#

If you’re pasting the image into the OpenClaw Control UI / WebChat chatbox, there are two separate things that have to be true before a model can “see” it:

The chat UI has to actually upload it as an attachment (not just show it visually in the textbox)
Your model provider has to support multimodal image input in the API mode you’re using

With LM Studio, the most common gotcha is #2:

LM Studio + OpenClaw: usually configured as text-only

The OpenClaw docs’ LM Studio “Local models” example registers the model with:

api: "openai-responses"
and input: ["text"] (text-only)

So even if the model you loaded in LM Studio is vision-capable, OpenClaw will treat it as not image-capable and won’t include image parts in the request.

Docs: https://docs.openclaw.ai/gateway/local-models

#

What to do next (quick triage)

In your chat, run: /model status and paste the output here.
(This shows which provider/api mode is active, and helps confirm whether OpenClaw thinks the model can take images.)
Confirm where you’re pasting the image:
- Control UI / WebChat chat tab vs Discord/Telegram/WhatsApp
- If the UI doesn’t show a real “attached file” (filename/size), the gateway may be receiving no attachment, so the model literally gets text only.

Reliable workaround (even if LM Studio vision isn’t wired up)

Enable Media Understanding so OpenClaw generates a short [Image] … description block from inbound images before the reply model runs:
https://docs.openclaw.ai/nodes/media-understanding

If you paste /model status + tell me whether this is WebChat/Control UI or Discord, I can tell you which side is dropping the image (UI upload vs LM Studio API mode/capability).

stable aurora Mar 3, 2026, 12:14 PM

#

Current: lmstudio/openai/glm-4.6v-flash
Default: lmstudio/openai/glm-4.6v-flash
Agent: main
Auth file: ~.openclaw\agents\main\agent\auth-profiles.json

#model cannot see pasted image in chatbox

Quick checks

The reliable fix (works even with non-vision local models): enable “Media Understanding”

LM Studio + OpenClaw: usually configured as text-only

What to do next (quick triage)

Reliable workaround (even if LM Studio vision isn’t wired up)