#OPENCLAW - FULL IMAGE RESPONSE SETUP (GPT-LIKE BEHAVIOR)


barren granite

OBJECTIVE:

Implement ChatGPT-style image responses in OpenClaw, where agents:

• Automatically include images when helpful
• Use real images (not only generated ones)
• Deliver images directly in Telegram


CURRENT STATE:

• image_generate tool available (AI-generated images)
• message --media works for sending images
• No automatic image selection or web image retrieval


REQUIREMENTS:

  1. Agent decides contextually when images improve response
  2. Ability to fetch REAL images from the web (not just generate)
  3. Save images locally
  4. Send images via Telegram automatically
  5. Keep responses concise (text + image)

QUESTIONS:

  1. What is the correct architecture to enable:
    • web image search → download → send via Telegram

  2. Is there an existing tool/skill for:
    • image search (Google/Bing/SerpAPI/etc)
    • media retrieval

  3. If not, what is the minimal recommended implementation:
    • tools required
    • example flow
    • best practice

  4. How should agent prompts be structured so they:
    • automatically decide when to include images
    • avoid overusing images


IMPORTANT:

Do NOT explain conceptually.

Return:

• Exact setup (tools + flow)
• Whether built-in or custom
• Minimal working approach


GOAL:

Achieve GPT-like behavior:

user asks → agent responds → images included automatically when useful

stray nexusBOT

Let me verify the docs and current built-in tool surface so I don't make up keys or capabilities.


Exact setup

Built-in

  • message, yes. Sends Telegram media.
  • image_generate, yes. Generated images only.
  • web_search, yes. General web search only, not real image search.
  • web_fetch, yes. HTML/text extraction only, not binary image download.

Not built-in

  • Google/Bing/SerpAPI image search, no
  • Generic web image download-to-file tool, no
  • Bundled skill for "real image search + save + send", no

Minimal recommended implementation
Use 2 custom tools plus built-ins:

  1. image_search_web(query, count=3)
    • backend: SerpAPI Google Images (simplest)
    • returns: [{ imageUrl, sourcePageUrl, title }]
  2. image_download(url, filenameHint?)
    • validate http/https
    • require content-type: image/*
    • size cap
    • save to local file
    • return { filePath }
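
The image_download checks listed above can be sketched roughly as follows. This is a minimal sketch, not OpenClaw's tool API: the function names, the injected `Fetcher` and `FileWriter` types, the 10 MiB cap, and the extension table are all assumptions made for illustration. Network and disk access are passed in so the validation logic can be exercised without real I/O; a real tool would back them with an HTTP client and a file write.

```typescript
// Hypothetical sketch of the checks behind a custom `image_download` tool.
// None of these names come from OpenClaw; I/O is injected for testability.

interface FetchedResource {
  contentType: string | null; // value of the Content-Type response header
  body: Uint8Array;           // raw response bytes
}

type Fetcher = (url: string) => Promise<FetchedResource>;
type FileWriter = (filePath: string, data: Uint8Array) => Promise<void>;

const MAX_BYTES = 10 * 1024 * 1024; // assumed 10 MiB cap; adjust as needed

// Assumed mapping from MIME type to file extension.
const EXTENSIONS: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/gif": "gif",
  "image/webp": "webp",
};

// Accept only http:// or https:// URLs with a non-empty host part.
function isHttpUrl(url: string): boolean {
  return /^https?:\/\/[^\s/]+/i.test(url);
}

// Turn a free-text hint into a short, filesystem-safe base name.
function safeBaseName(hint: string | undefined): string {
  const cleaned = (hint ?? "image")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return cleaned.length > 0 ? cleaned.slice(0, 40) : "image";
}

async function imageDownload(
  url: string,
  saveDir: string,
  fetcher: Fetcher,
  writeFile: FileWriter,
  filenameHint?: string,
): Promise<{ filePath: string }> {
  if (!isHttpUrl(url)) throw new Error("only http/https URLs are allowed");

  const { contentType, body } = await fetcher(url);
  // Drop parameters such as "; charset=..." before checking the type.
  const mime = (contentType ?? "").replace(/;.*$/, "").trim().toLowerCase();
  if (!mime.startsWith("image/")) throw new Error(`not an image: ${mime || "unknown"}`);
  if (body.byteLength > MAX_BYTES) throw new Error("image exceeds size cap");

  const ext = EXTENSIONS[mime] ?? "bin"; // unknown image subtypes get a generic extension
  const filePath = `${saveDir}/${safeBaseName(filenameHint)}.${ext}`;
  await writeFile(filePath, body);
  return { filePath };
}
```

Checking the size after the body is read keeps the sketch short; a production version would likely also check Content-Length and abort the stream early.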

Keep built-ins

  • message
  • image_generate as fallback only

Config

{
  channels: {
    telegram: {
      enabled: true,
      botToken: "123:abc",
      dmPolicy: "pairing",
    },
  },
  tools: {
    allow: [
      "group:messaging",
      "image_generate",
      "image_search_web",
      "image_download",
    ],
  },
}

If you also want normal web search for text/citations, add "group:web".

Exact flow

  1. User asks question.
  2. Agent decides if an image materially helps.
  3. If no, normal text reply.
  4. If yes:
    • call image_search_web
    • pick 1 best result
    • call image_download
    • call message to Telegram with local file
    • end with NO_REPLY
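
For the "call image_search_web, pick 1 best result" steps, one possible shape is sketched below. Treat it as an assumption-heavy illustration: the raw field names (`images_results`, `original`, `link`, `title`) are what I would expect from a SerpAPI Google Images response and should be checked against the current API reference, and the scoring rule in `pickBest` is just a simple heuristic, not anything OpenClaw prescribes.

```typescript
// Hypothetical helpers for a custom `image_search_web` tool.
// Raw field names are assumed from SerpAPI's Google Images response;
// verify them before relying on this.

interface ImageResult {
  imageUrl: string;
  sourcePageUrl: string;
  title: string;
}

interface RawImageHit {
  original?: string; // assumed: direct URL of the full-size image
  link?: string;     // assumed: URL of the page hosting the image
  title?: string;
}

// Map the raw search payload to the [{ imageUrl, sourcePageUrl, title }] shape.
function normalizeResults(
  raw: { images_results?: RawImageHit[] },
  count = 3,
): ImageResult[] {
  const results: ImageResult[] = [];
  for (const hit of raw.images_results ?? []) {
    if (results.length >= count) break;
    // Skip hits without a direct image URL or with a non-http(s) scheme.
    if (!hit.original || !/^https?:\/\//i.test(hit.original)) continue;
    results.push({
      imageUrl: hit.original,
      sourcePageUrl: hit.link ?? "",
      title: hit.title ?? "",
    });
  }
  return results;
}

// Score by how many query words appear in the title (2 points each),
// plus 1 point for an https image URL. The first highest score wins.
function pickBest(results: ImageResult[], query: string): ImageResult | undefined {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  let best: ImageResult | undefined;
  let bestScore = -1;
  for (const r of results) {
    const title = r.title.toLowerCase();
    let score = words.filter((w) => title.includes(w)).length * 2;
    if (r.imageUrl.startsWith("https://")) score += 1;
    if (score > bestScore) {
      best = r;
      bestScore = score;
    }
  }
  return best;
}
```

Letting the agent choose among the returned results is an equally valid option; a deterministic picker like this mainly saves a model turn.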

Example flow

image_search_web({ query: "Eiffel Tower at night", count: 3 })
image_download({ url: "<chosen imageUrl>" })
message({
  action: "send",
  channel: "telegram",
  target: "<chat-id-or-user>",
  media: "<local file path>",
  message: "Eiffel Tower at night."
})
NO_REPLY
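
The same flow, including the text-only branch, could be wired together roughly like this. It is a sketch under assumptions: `ImageTools` and `replyWithOptionalImage` are hypothetical names, the three tool functions are injected stand-ins rather than OpenClaw APIs, and falling back to plain text on any tool failure is a design choice of this sketch, not documented behavior.

```typescript
// Hypothetical orchestration of the flow above; the tools are injected
// stand-ins for image_search_web, image_download, and message.

interface ImageTools {
  search(query: string, count: number): Promise<{ imageUrl: string; title: string }[]>;
  download(url: string): Promise<{ filePath: string }>;
  send(args: { media: string; message: string }): Promise<void>;
}

// Returns the text to reply with, or "NO_REPLY" when the message tool
// has already delivered the text together with the image.
async function replyWithOptionalImage(
  tools: ImageTools,
  wantsImage: boolean,
  query: string,
  text: string,
): Promise<string> {
  if (!wantsImage) return text; // no image needed: normal text reply
  try {
    const results = await tools.search(query, 3);
    const first = results[0]; // takes the top hit; a ranking step could go here
    if (!first) return text;  // nothing usable: fall back to text
    const { filePath } = await tools.download(first.imageUrl);
    await tools.send({ media: filePath, message: text });
    return "NO_REPLY";
  } catch {
    return text; // any tool failure degrades to a text-only reply
  }
}
```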

Prompt block

Use images only when they materially improve the answer.
Default to no image.
Use at most 1 image unless the user asks for more.
Prefer real images via `image_search_web` + `image_download`.
Use `image_generate` only if the user explicitly wants generated art, or no suitable real image exists.
When sending an image, keep the text to 1 or 2 short sentences.
After `message` sends the Telegram reply, output NO_REPLY.

Built-in or custom

  • Built-in only: not enough
  • Minimal working approach: custom image search tool + custom download tool + built-in message
  • Skill: optional, for routing/prompting only. The search/download part still needs custom tools.

Docs: https://docs.openclaw.ai/tools/web https://docs.openclaw.ai/channels/telegram https://docs.openclaw.ai/tools/image-generation https://docs.openclaw.ai/tools/creating-skills

barren granite

Could this prompt be part of soul, or does it need to go into each agent's individual prompt?

stray nexusBOT

Quick check on prompt-placement docs so I answer the right layer, not just vibes.


Short answer: not SOUL.md.

Best placement

  1. Shared skill for the image policy/flow
  2. Per-agent skill allowlist if only some agents should use it
  3. Telegram systemPrompt only if you want this behavior on Telegram but not elsewhere

Do not put the main logic in SOUL.md

  • SOUL.md is for voice, tone, brevity
  • your image logic is tool-routing behavior
  • also, sub-agents do not get SOUL.md, so it is a bad place for this rule

Exact setup

If all agents should use it

  • create: ~/.openclaw/skills/real-image-replies/SKILL.md
  • allow it by default:
{
  agents: {
    defaults: {
      skills: ["real-image-replies"],
    },
  },
}

If only some agents should use it

{
  agents: {
    list: [
      { id: "telegram-main", skills: ["real-image-replies"] },
      { id: "writer", skills: [] },
    ],
  },
}

If only Telegram should use it

{
  channels: {
    telegram: {
      groups: {
        "*": {
          systemPrompt: "When a real image materially helps, use the real-image flow. Max 1 image unless asked.",
        },
      },
    },
  },
}

What goes in the shared skill

When a real image materially helps, use:
image_search_web -> image_download -> message

Rules:
- default to no image
- max 1 image unless the user asks for more
- prefer real images
- use image_generate only as fallback
- keep text to 1 or 2 short sentences
- after message sends the Telegram reply, output NO_REPLY

What can stay in SOUL.md

Keep replies concise.
Do not overuse images.
Prefer showing over telling when a visual genuinely helps.

Recommendation

  • shared skill for the behavior
  • SOUL.md only for tone
  • per-agent / Telegram systemPrompt only for overrides