#model cannot see pasted image in chatbox
1 messages · Page 1 of 1 (latest)
If you paste/upload an image in Discord and the agent replies but says it “sees nothing”, it’s usually one of these:
-
Your active reply model/provider integration isn’t actually getting the image bytes
A lot of “local vision” setups work in their own UI, but the OpenClaw provider path you’re using (often OpenAI-compatible / local gateway) may drop image parts, so the model only receives text. -
The image isn’t arriving as an attachment (it’s just an embed/link)
OpenClaw only has something to pass along if Discord delivered it as a real attachment.
Quick checks
- Are you sending it as a real attachment? (thumbnail + filename/size, not just an embed preview)
- What does
openclaw --versionsay? (version mismatches matter a lot) - If you can, paste a few lines of the gateway log from the moment you send the image (we’re looking for anything about inbound media/attachments).
The reliable fix (works even with non-vision local models): enable “Media Understanding”
OpenClaw can pre-describe inbound images and inject that text into the message as an [Image] block, so your local model has something concrete to work with even if it can’t accept raw image inputs.
If you tell me:
- your
openclaw --version - which provider you’re using for the “local visual model” (Ollama? LM Studio OpenAI-compatible? something else?)
…I can point you at the exact knob to turn for your setup.
i mean that i paste an image into my openclaw chatbox, but the model cannot see it , even though it is avisual model. i use local models through lmstudio server
If you’re pasting the image into the OpenClaw Control UI / WebChat chatbox, there are two separate things that have to be true before a model can “see” it:
- The chat UI has to actually upload it as an attachment (not just show it visually in the textbox)
- Your model provider has to support multimodal image input in the API mode you’re using
With LM Studio, the most common gotcha is #2:
LM Studio + OpenClaw: usually configured as text-only
The OpenClaw docs’ LM Studio “Local models” example registers the model with:
api: "openai-responses"- and
input: ["text"](text-only)
So even if the model you loaded in LM Studio is vision-capable, OpenClaw will treat it as not image-capable and won’t include image parts in the request.
What to do next (quick triage)
-
In your chat, run:
/model statusand paste the output here.
(This shows which provider/api mode is active, and helps confirm whether OpenClaw thinks the model can take images.) -
Confirm where you’re pasting the image:
- Control UI / WebChat chat tab vs Discord/Telegram/WhatsApp
- If the UI doesn’t show a real “attached file” (filename/size), the gateway may be receiving no attachment, so the model literally gets text only.
Reliable workaround (even if LM Studio vision isn’t wired up)
Enable Media Understanding so OpenClaw generates a short [Image] … description block from inbound images before the reply model runs:
https://docs.openclaw.ai/nodes/media-understanding
If you paste /model status + tell me whether this is WebChat/Control UI or Discord, I can tell you which side is dropping the image (UI upload vs LM Studio API mode/capability).
Current: lmstudio/openai/glm-4.6v-flash
Default: lmstudio/openai/glm-4.6v-flash
Agent: main
Auth file: ~.openclaw\agents\main\agent\auth-profiles.json