#read/analyze images sent via WhatsApp
Roger: recommended path is OpenClaw's built-in media understanding, configured under tools.media. It's not a separate "vision skill" you install. Vendor integrations, including Google/Gemini, plug into that shared media pipeline.
For WhatsApp images specifically:
- make sure you're on a current build first:
openclaw --version
- keep the Google AI Studio key as provider auth:
GEMINI_API_KEY or GOOGLE_API_KEY, or a SecretRef-backed models.providers.google.apiKey
- enable/select image media understanding with a Google provider entry, e.g. shape-wise:
{
  tools: {
    media: {
      image: {
        enabled: true,
        maxChars: 500,
        models: [
          {
            provider: "google",
            model: "<current Gemini vision-capable model from the docs>",
            capabilities: ["image"]
          }
        ]
      }
    }
  }
}
If you want one Google entry to cover image/audio/video, set it in tools.media.models with capabilities: ["image", "audio", "video"].
Important distinction: plugins.entries.google.config.webSearch.apiKey is for Google web search plugin config, not the recommended place for vision. For image analysis, use model/provider auth and tools.media.
Roger receives WhatsApp images but struggles to analyze them. We configured tools.media.image with gemini-2.0-flash. Roger can find the image file at /root/.openclaw/media/inbound/ via exec, but the image tool says "not under an allowed directory". What directory does tools.media expect images to be in, and how should Roger correctly pass a WhatsApp image to the image analysis tool?
Roger: that error is expected if the agent tries to feed the raw host-store path to the image tool.
There are two different locations involved:
- Raw inbound media store: something like /root/.openclaw/media/inbound/.... This is gateway-owned storage. It may be visible via exec, but it is not generally an allowed local root for the sandboxed agent image tool.
- Agent/sandbox workspace copy: media/inbound/<filename>. OpenClaw stages inbound attachments into the active workspace/sandbox and rewrites MediaPath/MediaPaths to that relative path.
So Roger should not discover /root/.openclaw/media/inbound/... with exec and pass that absolute path to the image tool. Use the staged message media path instead:
{
  "image": "media/inbound/<the-whatsapp-image-filename>",
  "prompt": "Describe this image."
}
or, in a tools.media.image CLI entry, use the template variable:
{
  type: "cli",
  command: "gemini",
  args: [
    "-m",
    "<your Gemini model>",
    "--allowed-tools",
    "read_file",
    "Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters."
  ],
  capabilities: ["image"]
}
For normal inbound WhatsApp handling, the clean path is to let tools.media.image run automatically from the channel-provided MediaPaths. It collects the attachment, stages it to media/inbound/*, and either adds an [Image] summary block or passes the original image through when the active reply model has native vision.
If it still tries the host path, check openclaw --version and logs around media staging; the docs say inbound attachments should be staged into media/inbound/* in the active workspace.
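The difference between the raw host-store path and the staged relative path can be sketched as a small guard. This is illustrative only: `is_staged_media_path` is a hypothetical helper, not part of OpenClaw, and it assumes staged files always sit under `media/inbound/` as described above.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not an OpenClaw command): accept only workspace-relative
# paths under media/inbound/ and reject absolute host-store paths.
is_staged_media_path() {
  case "$1" in
    /*)               return 1 ;;  # absolute path, e.g. the gateway media store
    *..*)             return 1 ;;  # conservatively refuse anything containing ".."
    media/inbound/?*) return 0 ;;  # staged copy inside the workspace
    *)                return 1 ;;
  esac
}

is_staged_media_path "media/inbound/photo.jpg" \
  && echo "relative staged path: fine for the image tool"
is_staged_media_path "/root/.openclaw/media/inbound/photo.jpg" \
  || echo "host-store path: use the staged MediaPath instead"
```

The guard only classifies the string; it does not check that the staged file actually exists in the workspace.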
Docs:
https://docs.openclaw.ai/nodes/media-understanding
https://docs.openclaw.ai/nodes/images
https://docs.openclaw.ai/gateway/sandboxing
Roger confirms: inbound WhatsApp media is not being staged to media/inbound/ in the workspace. The file lands at /root/.openclaw/media/inbound/ on the host but the agent sandbox never gets a staged copy. We're on OpenClaw 2026.5.20. What config is needed to enable media staging for the WhatsApp channel?
Roger: there isn’t a WhatsApp-specific “enable staging” flag. Staging happens only when the target session is sandboxed.
The relevant behavior in 2026.5.20 is:
- WhatsApp saves the raw file under the gateway media store: ~/.openclaw/media/inbound/...
- Before the agent run, OpenClaw copies eligible inbound media into the sandbox workspace as media/inbound/<filename>
- That copy only happens if ensureSandboxWorkspaceForSession(...) resolves a sandbox for the session
So the config you want is sandboxing + non-main DM isolation, e.g.:
{
  session: {
    // recommended for WhatsApp/Telegram/etc DMs, especially multi-account
    dmScope: "per-account-channel-peer"
    // "per-channel-peer" is also ok for simpler single-account setups
  },
  agents: {
    defaults: {
      sandbox: {
        mode: "non-main",
        scope: "agent",
        workspaceAccess: "none"
      }
    }
  }
}
Why this matters: WhatsApp direct chats default to session.dmScope: "main", which collapses DMs into agent:main:main. With sandbox.mode: "non-main", the main session is intentionally not sandboxed, so there is no sandbox workspace to stage media into.
Two valid fixes:
- Recommended: isolate WhatsApp DMs and sandbox non-main sessions:
openclaw config set session.dmScope "per-account-channel-peer"
openclaw config set agents.defaults.sandbox.mode "non-main"
openclaw gateway restart
- If you intentionally keep WhatsApp DMs on the main session, sandbox main too:
openclaw config set agents.defaults.sandbox.mode "all"
openclaw gateway restart
After that, a new inbound WhatsApp image should show up to Roger as MediaPath / MediaPaths like media/inbound/<filename>, and the image tool should use that staged relative path, not /root/.openclaw/media/inbound/....
channels.whatsapp.mediaMaxMb only controls media size limits; it does not enable sandbox staging.
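The interaction between sandbox.mode and the session key can be summarized as a small decision function. This is a sketch of the behavior described above, not OpenClaw code: `staging_expected` is a hypothetical helper, `off` as a mode value is an assumption (the thread only names `non-main` and `all`), and it ignores whether a sandbox backend is actually available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: would the described behavior stage media for a session?
# $1 = agents.defaults.sandbox.mode, $2 = session key
staging_expected() {
  local mode="$1" session_key="$2"
  case "$mode" in
    all)
      echo yes ;;                  # every session is sandboxed, main included
    non-main)
      if [ "$session_key" = "agent:main:main" ]; then
        echo no                    # main session is intentionally not sandboxed
      else
        echo yes
      fi ;;
    *)
      echo no ;;                   # e.g. "off" (assumed value): no sandbox at all
  esac
}

staging_expected non-main "agent:main:main"   # prints: no
staging_expected all "agent:main:main"        # prints: yes
```

This is why DMs collapsed into agent:main:main never see media/inbound/* under mode "non-main".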
We tried agents.defaults.sandbox.mode=non-main but Roger crashes with: "Sandbox mode requires Docker, but the docker command was not found in PATH." The gateway runs inside Docker without Docker-in-Docker access. We need media staging without Docker-in-Docker. Is there a sandbox mode that works without Docker, or another way to get WhatsApp images staged to media/inbound/ in the workspace?
Roger: Docker is only the default sandbox backend. There are Docker-free sandbox backends, but there is not currently a separate “stage WhatsApp media into the normal workspace without a sandbox” switch.
You have three practical paths:
- If the goal is image understanding, don’t require workspace staging.
tools.media.image runs before sandbox staging and can analyze the raw inbound WhatsApp media from the gateway media store. In that flow Roger should use the automatic [Image] summary/native vision path, rather than discovering /root/.openclaw/media/inbound/... via exec and calling the image tool manually.
That means config like this is enough for automatic inbound image analysis:
{
  tools: {
    media: {
      image: {
        enabled: true,
        models: [
          { provider: "google", model: "gemini-2.0-flash", capabilities: ["image"] }
        ]
      }
    }
  }
}
- If you need staged media/inbound/* files for tools, use a non-Docker sandbox backend.
Set agents.defaults.sandbox.backend to "ssh" or "openshell".
SSH example:
{
  session: {
    dmScope: "per-account-channel-peer"
  },
  agents: {
    defaults: {
      sandbox: {
        mode: "non-main",
        backend: "ssh",
        scope: "session",
        workspaceAccess: "none",
        ssh: {
          target: "user@host:22",
          workspaceRoot: "/tmp/openclaw-sandboxes",
          identityFile: "~/.ssh/id_ed25519",
          knownHostsFile: "~/.ssh/known_hosts"
        }
      }
    }
  }
}
With backend: "ssh", OpenClaw stages inbound media into the remote sandbox workspace and file/media tools read through the SSH filesystem bridge. No Docker command is needed in the gateway container.
OpenShell is the managed version of the same idea:
{
  session: { dmScope: "per-account-channel-peer" },
  agents: {
    defaults: {
      sandbox: {
        mode: "non-main",
        backend: "openshell",
        scope: "session",
        workspaceAccess: "rw"
      }
    }
  },
  plugins: {
    entries: {
      openshell: {
        enabled: true,
        config: {
          from: "openclaw",
          mode: "remote"
        }
      }
    }
  }
}
- If you want Docker sandboxing from a Dockerized gateway, use Docker-out-of-Docker, not Docker-in-Docker.
That means mounting the host Docker socket into the gateway container and using host-path parity for workspaces. It’s powerful and needs careful security posture, so I’d prefer SSH/OpenShell for this case if you don’t already expose Docker to the gateway.
After changing sandbox backend/config, restart the gateway and check:
openclaw sandbox explain
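Before picking a backend, it can help to confirm whether the docker command is reachable from the gateway's environment, since that is what the error above complains about. A minimal sketch: `suggest_backend` is a hypothetical helper that only checks PATH; it does not verify that a Docker daemon or socket is usable.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: suggest a sandbox backend based on whether a command
# (docker by default) is found in PATH. Checks PATH only, not the daemon.
suggest_backend() {
  local cmd="${1:-docker}"
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "docker"
  else
    echo "ssh"   # or "openshell", the other Docker-free backend named above
  fi
}

# Run this inside the gateway container, as the gateway user.
suggest_backend
```

If it prints ssh inside the gateway container, the Docker backend will fail the same way it did above.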
We're running the OpenClaw gateway in Docker. We keep hitting limitations: no Docker-in-Docker for sandboxing, media staging doesn't work, filesystem access issues. On a previous non-Docker install everything worked out of the box. Should we move the gateway out of Docker and run it directly on the host (Ubuntu 24.04)? What's the recommended migration path?
Roger: yes, for this setup I’d move the gateway out of Docker and run it directly on Ubuntu 24.04.
Docker is useful for isolated/throwaway gateway deployments, but you’re hitting exactly the rough edges: sandbox orchestration wants either host Docker socket/path parity, SSH/OpenShell, or a host-native gateway. If your previous non-Docker install worked out of the box, host-native is the simpler and more supported shape for WhatsApp media + local filesystem + sandbox workflows.
Recommended migration path:
- Back up the Docker-mounted OpenClaw state
Find the host directory mounted to /home/node/.openclaw in Compose. Back up the whole thing, not just openclaw.json.
It contains config, auth profiles, WhatsApp login state, sessions, media, and workspace data.
- Stop the Docker gateway
Stop the container before copying so WhatsApp/session files aren’t changing mid-copy.
- Install OpenClaw on the Ubuntu host
Use the normal installer:
curl -fsSL https://openclaw.ai/install.sh | bash -s -- --no-onboard
Node 24 is the recommended runtime; the installer handles Node if needed.
- Copy the old state into the host user’s ~/.openclaw
If your Docker state directory is, for example, /srv/openclaw/.openclaw, copy/sync that into the home directory of the Linux user that will run the gateway:
rsync -a /srv/openclaw/.openclaw/ ~/.openclaw/
Then fix ownership if the files came from Docker/root:
sudo chown -R "$USER:$USER" ~/.openclaw
- Run doctor and restart
openclaw doctor
openclaw gateway restart
openclaw status
- Verify channels and media
Check WhatsApp is still linked, then send a fresh image. With host-native gateway + sandbox enabled, OpenClaw should be able to stage inbound media into the active sandbox workspace as media/inbound/*.
For your case, I’d keep:
{
  session: {
    dmScope: "per-account-channel-peer"
  },
  agents: {
    defaults: {
      sandbox: {
        mode: "non-main",
        scope: "session",
        workspaceAccess: "none"
      }
    }
  }
}
If Docker was exposing the dashboard on a public/VPS interface, re-check gateway bind/auth after migration before opening ports. Also keep the Docker backup around until WhatsApp, sessions, and model auth are confirmed working.
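The backup step above does not give a command. One way to snapshot the whole state directory before copying is a timestamped tarball. This is a sketch: `backup_state` is a hypothetical helper, and /srv/openclaw/.openclaw is only the example path from the copy step, not necessarily your mount.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive an entire state directory into a tarball.
# $1 = state directory (e.g. /srv/openclaw/.openclaw), $2 = output directory
backup_state() {
  local src="${1%/}" out_dir="$2"
  local archive
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$out_dir" || return 1
  archive="$out_dir/openclaw-state-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C makes archive paths relative to the parent of the state directory
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}

# Usage (stop the container first so files are not changing mid-copy):
# backup_state /srv/openclaw/.openclaw "$HOME/openclaw-backups"
```

The function prints the archive path on success, so it can be recorded and kept until the host-native gateway is confirmed working.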
Docs:
https://docs.openclaw.ai/install
https://docs.openclaw.ai/install/migrating
https://docs.openclaw.ai/install/docker
https://docs.openclaw.ai/gateway/sandboxing
Migrating gateway from Docker to host-native. Running openclaw gateway start gives:
- Plugins blocked: whatsapp, lossless-claw, apify — suspicious ownership (uid=1000, expected root)
- Plugins not installed: voice-call, brave, lobster, codex
- "Gateway service disabled" — needs openclaw gateway install
Should we run openclaw gateway install first, then fix plugin ownership with chown -R root:root /root/.openclaw/extensions/ ?
Short answer: don’t run gateway install first, and don’t blindly chown -R root:root unless you intentionally want the host gateway to run as root.
Pick the OS user that will own/run OpenClaw, then make the whole state dir match that user. The plugin ownership check expects plugin files to be owned by the current gateway user, or by root. Right now those plugins are uid 1000 because they came from the Docker node user.
Recommended host-native order:
- Choose the runtime user
Prefer a normal user, e.g. ubuntu/openclaw, not root. Then put the migrated state under that user’s home:
sudo mkdir -p /home/openclaw
sudo rsync -a /root/.openclaw/ /home/openclaw/.openclaw/
sudo chown -R openclaw:openclaw /home/openclaw/.openclaw
If you are deliberately running as root, then yes, the equivalent is:
sudo chown -R root:root /root/.openclaw
But root-native is usually not the shape I’d pick unless this box is already managed that way.
- Run repair/diagnostics as that same user
openclaw doctor
openclaw plugins doctor
openclaw plugins list --verbose
- Fix missing plugins before installing the service
The “not installed” plugins mean config/install metadata references plugins that aren’t actually present in the active plugin root, or discovery can’t load them. Reinstall the ones you still need with openclaw plugins install <package>, or disable/remove their plugins.entries.<id> config if they were old Docker-only leftovers.
For example, voice-call is:
openclaw plugins install @openclaw/voice-call
For custom/community plugins like lossless-claw/apify/lobster/codex, use the same package/spec you originally installed, or inspect the old plugins.installs metadata first.
- Then install/start the host service
openclaw gateway install
openclaw gateway restart
openclaw status --all
Why this order: gateway install creates the systemd user service for the current user. If you install it before fixing state ownership, you can end up with a service that starts under one user while ~/.openclaw/extensions belongs to another, which produces exactly these plugin blocks.
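To see which files would trip the ownership check before starting the gateway, you can list anything under the state directory that is owned by neither the current user nor root. A sketch, assuming the check behaves as described above; `list_foreign_owned` is a hypothetical helper, not an OpenClaw command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print paths under a directory that are owned by neither
# the current user nor root (uid 0). Empty output means ownership is consistent.
list_foreign_owned() {
  local dir="$1"
  find "$dir" ! -user "$(id -u)" ! -user 0 -print
}

# Usage, run as the user that will own the gateway:
# list_foreign_owned "$HOME/.openclaw/extensions"
```

Run it as the intended gateway user after the chown; any remaining output points at files still carrying the old Docker uid.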
We have GOOGLE_API_KEY set in env config, and tools.media.image configured with gemini-2.0-flash. But models.providers only has deepseek — no google provider. What's the minimal config needed to add Google/Gemini as a provider for tools.media.image?
Roger: if GOOGLE_API_KEY is genuinely visible to the gateway process, you usually don’t need a full custom models.providers.google catalog entry. google is the provider id, and Google auth falls back to GEMINI_API_KEY / GOOGLE_API_KEY.
That said, if you want to make it explicit in config, minimal provider config is:
{
  models: {
    mode: "merge",
    providers: {
      google: {
        api: "google-generative-ai",
        apiKey: { source: "env", provider: "default", id: "GOOGLE_API_KEY" }
      }
    }
  },
  tools: {
    media: {
      image: {
        enabled: true,
        models: [
          { provider: "google", model: "gemini-2.0-flash", capabilities: ["image"] }
        ]
      }
    }
  }
}
Equivalent if you use GEMINI_API_KEY instead:
apiKey: { source: "env", provider: "default", id: "GEMINI_API_KEY" }
You should not put this under plugins.entries.google.config.webSearch.apiKey; that is for Gemini web search/grounding, not the model provider used by tools.media.image.
CLI form:
openclaw config set models.providers.google '{"api":"google-generative-ai","apiKey":{"source":"env","provider":"default","id":"GOOGLE_API_KEY"}}' --strict-json --merge
openclaw config set tools.media.image.models '[{"provider":"google","model":"gemini-2.0-flash","capabilities":["image"]}]' --strict-json --merge
openclaw gateway restart
Then verify from the same runtime user/service:
openclaw models status
openclaw models list --provider google
openclaw status --all
If the gateway runs under systemd, don’t trust your interactive shell env. Put the key somewhere the daemon reads, e.g. ~/.openclaw/.env or the SecretRef/env config you’re already using, then restart the service.
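Because a systemd service does not inherit your interactive shell environment, it is worth checking that the key is actually defined in the file the daemon reads. A sketch, assuming ~/.openclaw/.env uses plain KEY=value lines (the thread does not show the file format); `env_file_defines` is a hypothetical helper and never prints the key value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an env file defines a non-empty value for a key.
# Assumes dotenv-style lines: KEY=value, optionally prefixed with "export ".
env_file_defines() {
  local file="$1" key="$2"
  [ -f "$file" ] || return 1
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=.+" "$file"
}

# Check both variable names that Google auth falls back to.
for key in GEMINI_API_KEY GOOGLE_API_KEY; do
  if env_file_defines "$HOME/.openclaw/.env" "$key"; then
    echo "$key is defined in ~/.openclaw/.env"
  fi
done
```

If neither name is reported, the key probably only exists in your shell session and the service will not see it.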
Docs:
https://docs.openclaw.ai/providers/google
https://docs.openclaw.ai/nodes/media-understanding
https://docs.openclaw.ai/gateway/configuration-reference