OBJECTIVE:
Implement ChatGPT-style image responses in OpenClaw, where agents:
• Automatically include images when helpful
• Use real images (not only generated ones)
• Deliver images directly in Telegram
CURRENT STATE:
• image_generate tool available (AI-generated images)
• message --media works for sending images
• No automatic image selection or web image retrieval
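The missing web image retrieval step can be sketched as a small search helper. This is a minimal sketch, assuming no built-in image-search tool and assuming SerpAPI's Google Images engine (`engine=google_images`) with results in `images_results[]`, the full-size URL in `original`, and an optional `original_width`. Verify those field names against SerpAPI's docs. `build_search_url`, `pick_image_urls` and `search_images` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Assumed SerpAPI endpoint; check SerpAPI's Google Images documentation.
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_search_url(query: str, api_key: str) -> str:
    """Build a SerpAPI Google Images request URL (hypothetical helper)."""
    params = {"engine": "google_images", "q": query, "api_key": api_key}
    return SERPAPI_ENDPOINT + "?" + urllib.parse.urlencode(params)


def pick_image_urls(results: dict, limit: int = 3, min_width: int = 400) -> list:
    """Return up to `limit` distinct full-size https image URLs.

    Assumes each entry in results["images_results"] has the full-size URL in
    "original" and, optionally, its pixel width in "original_width".
    """
    picked = []
    for item in results.get("images_results", []):
        url = item.get("original") or ""
        if not url.startswith("https://") or url in picked:
            continue  # skip non-https and duplicate links
        width = item.get("original_width")
        if width is not None and width < min_width:
            continue  # skip thumbnails and icons when the width is known
        picked.append(url)
        if len(picked) >= limit:
            break
    return picked


def search_images(query: str, api_key: str, limit: int = 3) -> list:
    """Call the search API and return candidate image URLs (network required)."""
    with urllib.request.urlopen(build_search_url(query, api_key), timeout=15) as resp:
        return pick_image_urls(json.load(resp), limit=limit)
```

Returning several candidates rather than one lets the download step fall back when a link is dead or serves HTML instead of an image.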
REQUIREMENTS:
- Agent decides from context when an image would improve the response
- Ability to fetch REAL images from the web (not just generate)
- Save images locally
- Send images via Telegram automatically
- Keep responses concise (text + image)
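The "fetch REAL images" and "save images locally" requirements amount to a download-and-validate step. This is a minimal standard-library sketch. The function names, the media directory argument and the hash-based filenames are hypothetical. The size cap is an assumption based on Telegram's roughly 10 MB limit for photos sent with sendPhoto.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional

# Leading bytes ("magic numbers") of common web image formats.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}

# Assumed cap, about Telegram's size limit for photos sent with sendPhoto.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def detect_image_ext(data: bytes) -> Optional[str]:
    """Return a file extension if `data` starts with a known image signature."""
    for signature, ext in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return ext
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # WebP container
        return ".webp"
    return None


def save_image_bytes(data: bytes, media_dir: Path) -> Path:
    """Write image bytes under media_dir, named by content hash."""
    ext = detect_image_ext(data)
    if ext is None:
        raise ValueError("content is not a recognised image (HTML error page?)")
    media_dir.mkdir(parents=True, exist_ok=True)
    # Same bytes give the same filename, so repeated fetches do not pile up copies.
    path = media_dir / (hashlib.sha256(data).hexdigest()[:16] + ext)
    path.write_bytes(data)
    return path


def download_image(url: str, media_dir: Path) -> Path:
    """Fetch `url` and save it locally (network required)."""
    request = urllib.request.Request(url, headers={"User-Agent": "image-fetch-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=15) as resp:
        data = resp.read(MAX_IMAGE_BYTES + 1)  # read one extra byte to detect oversize
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds MAX_IMAGE_BYTES")
    return save_image_bytes(data, media_dir)
```

Checking magic bytes instead of trusting the URL extension or Content-Type catches the common case where an image link returns an HTML page.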
QUESTIONS:
- What is the correct architecture to enable:
  • web image search → download → send via Telegram
- Is there an existing tool/skill for:
  • image search (Google/Bing/SerpAPI/etc.)
  • media retrieval
- If not, what is the minimal recommended implementation:
  • tools required
  • example flow
  • best practice
- How should agent prompts be structured so they:
  • automatically decide when to include images
  • avoid overusing images
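One way to structure the prompt side is a short, explicit image policy in the system prompt plus a hard cap enforced in code. The snippet below is a hypothetical illustration: the policy wording is not an OpenClaw default. The keyword pre-check is a deliberately crude stand-in for the model's own judgment, included only to show where a guard against overuse could sit.

```python
import re

# Hypothetical system-prompt fragment; the wording is an illustration only.
IMAGE_POLICY = """\
Images:
- Attach at most one image, and only when seeing it answers the question better
  than words: a place, animal, object, product, artwork, or diagram.
- Do not attach images for code, math, opinions, small talk, or abstract topics.
- Prefer a real photo from image search; use image_generate only when the user
  asks for a generated image or no suitable real image exists.
- When an image is attached, keep the text to a few sentences.
"""

# Hypothetical cheap pre-check, run before image tools are offered for a turn.
VISUAL_HINTS = re.compile(
    r"\b(show me|what does .+ look like|picture of|photo of|image of|diagram of)\b",
    re.IGNORECASE,
)
NON_VISUAL_HINTS = re.compile(
    r"\b(code|function|error|stack trace|calculate|translate)\b", re.IGNORECASE
)


def image_allowed(user_message: str) -> bool:
    """True only for requests that look visual and not clearly text-only."""
    if NON_VISUAL_HINTS.search(user_message):
        return False
    return VISUAL_HINTS.search(user_message) is not None


def cap_images(paths: list, max_images: int = 1) -> list:
    """Drop duplicate paths (keeping order) and enforce the per-reply cap."""
    unique = list(dict.fromkeys(paths))
    return unique[:max_images]
```

Keeping the cap in code means the "avoid overusing images" rule holds even when the model ignores the prompt.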
IMPORTANT:
Do NOT explain conceptually.
Return:
• Exact setup (tools + flow)
• Whether built-in or custom
• Minimal working approach
GOAL:
Achieve GPT-like behavior:
user asks → agent responds → images included automatically when useful
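The target loop (user asks → agent responds → optional image) can be sketched as a single function. It takes the search and download steps as injected callables, so it runs without network access. `compose_reply`, `Reply` and `image_query` are illustrative names, not OpenClaw APIs. Here `image_query` stands for the agent's own decision: a search phrase, or nothing. The 1024-character caption limit is the Telegram Bot API limit for photo captions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Telegram Bot API limit for photo captions (characters).
TELEGRAM_CAPTION_LIMIT = 1024


@dataclass
class Reply:
    text: str
    media: List[str] = field(default_factory=list)

    @property
    def fits_caption(self) -> bool:
        """Whether the text can ride along as the photo caption."""
        return len(self.text) <= TELEGRAM_CAPTION_LIMIT


def compose_reply(
    answer: str,
    image_query: Optional[str],
    search: Callable[[str], List[str]],
    download: Callable[[str], str],
) -> Reply:
    """Attach one real image when the agent asked for one and a download succeeds."""
    if not image_query:
        return Reply(answer)  # the agent decided no image helps
    for url in search(image_query):
        try:
            path = download(url)
        except (OSError, ValueError):
            continue  # dead link or non-image content: try the next candidate
        return Reply(answer, [path])
    return Reply(answer)  # nothing usable: degrade to a text-only reply
```

In this sketch the returned local path is what would be handed to the existing `message --media` send. When `fits_caption` is false, the text would go out as a separate message instead of a caption.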