#How do i properly sandbox my agents? The agents can't connect with the gateway.


stuck jasperBOT
#

A couple of big red flags in that snippet:

  1. You leaked secrets in a public thread.
    Please edit the message and rotate:
  • the Google API key(s)
  • the Gateway token
  • any other passwords/keys you pasted (even if you think they’re “dev only”)
  2. wss://host.docker.internal:18789 is almost certainly wrong.
    On port 18789 the Gateway is normally plain WebSocket (no TLS), so it should be:
  • OPENCLAW_GATEWAY_URL=ws://host.docker.internal:18789 (not wss://…)
  • and you should remove NODE_TLS_REJECT_UNAUTHORIZED=0 (that’s a global TLS safety-off switch, and it won’t fix a wrong protocol anyway)
  3. Even with host.docker.internal, the sandbox can’t reach the Gateway if the Gateway is loopback-only.
    By default the Gateway binds to loopback (127.0.0.1). From inside Docker, host.docker.internal hits the host’s bridge IP (e.g. 172.17.0.1), and a loopback-only server will refuse that connection.
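
The two URL problems above (a TLS scheme on a plain port, and a loopback target from inside a container) are easy to lint for. A minimal sketch in Python, not part of OpenClaw: the helper name `gateway_url_warnings` and the `inside_container` flag are made up for illustration, and it assumes the Gateway speaks plain WebSocket unless you have put TLS in front of it.

```python
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def gateway_url_warnings(url: str, inside_container: bool = True) -> list:
    """Return human-readable warnings for a Gateway URL (hypothetical helper)."""
    parsed = urlparse(url)
    warnings = []
    if parsed.scheme not in ("ws", "wss"):
        warnings.append(f"unexpected scheme {parsed.scheme!r}; expected ws:// or wss://")
    if parsed.scheme == "wss":
        # Only correct if a TLS terminator sits in front of the Gateway.
        warnings.append("wss:// needs TLS in front of the Gateway; try ws://")
    if inside_container and parsed.hostname in LOOPBACK_HOSTS:
        # Inside Docker, loopback is the container itself, not the host.
        warnings.append("loopback host points at the container, not the Docker host")
    return warnings
```

For the URL from the original snippet, `gateway_url_warnings("wss://host.docker.internal:18789")` returns only the TLS warning, and the `ws://` form returns an empty list.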
#

So check what you’re actually running:

  • What’s your openclaw --version?
  • Paste openclaw gateway status output (it shows the probe URL / bind behavior).

Fix (typical)

Make the Gateway listen on a non-loopback interface (and keep auth on):

// ~/.openclaw/openclaw.json
{
  gateway: {
    bind: "lan", // or tailnet/custom depending on your setup
    auth: { mode: "token", token: "…generate a new long random token…" },
  },
}
#

Then restart the gateway, and in your sandbox env use:

  • OPENCLAW_GATEWAY_URL=ws://host.docker.internal:18789
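
Before debugging tokens, it can help to confirm that the container can open a TCP connection to the Gateway port at all. A small sketch to run inside the sandbox, assuming Python is available in the image; `can_reach` is an illustrative name, and a successful connect only proves the bind and network path, not that auth will pass.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_reach("host.docker.internal", 18789)` from the container should return False while the Gateway is loopback-only and True once it listens on an interface the bridge network can reach.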

Docs that explain the bind + remote model clearly:

If you paste the exact error line you see (from openclaw logs --follow) when the sandbox tries to start/connect, I can tell you which of the above it is (wrong scheme vs loopback bind vs token/auth mismatch).

sage locust
#

gateway is in lan. but in sandbox, openclaw connects to localhost, but network:host is blocked. so should mode be remote?

#

goal is for sandboxed agents to be able to use openclaw commands. is this not default behaviour? do you need extra configs?

stuck jasperBOT
#

No — you generally don’t fix sandbox → gateway connectivity by using Docker network: "host" (it’s blocked on purpose), and you also usually don’t want to flip the Gateway into “remote”.

What’s happening is simpler:

  • Inside a container, localhost is the container, not your host.
  • If you run openclaw … CLI inside the sandbox (e.g. via exec), the CLI defaults to ws://127.0.0.1:18789 unless you tell it otherwise.
  • network: host wouldn’t be the right fix anyway; the right fix is: connect to the host IP from the container (bridge) or configure the CLI to use a remote URL.

The correct patterns

A) Best: don’t call openclaw from inside the sandbox at all

If the goal is “tools should work”, you don’t need the sandbox container to talk to the gateway. The gateway orchestrates tool calls; the container just executes them.

So if some skill/script is doing openclaw status / openclaw config … / etc inside exec, consider replacing that with native tools (sessions_*, message, etc.) instead of shelling out to the CLI.

B) If you do need openclaw CLI inside the sandbox: use --url (and token)

Example:

openclaw gateway health --url ws://host.docker.internal:18789 --token "$OPENCLAW_GATEWAY_TOKEN"

Two notes:

  • Use ws://, not wss://, unless you explicitly put TLS in front of the gateway.
  • openclaw gateway probe is special: per docs it probes localhost even if remote is configured, so don’t use that command as your “did it try localhost?” signal. (Docs: https://docs.openclaw.ai/cli/gateway)

C) Or configure that environment as a remote client (yes, “remote mode”, but for the CLI/client)

This is the “should mode be remote?” answer: only for the client context running inside the container, not for the gateway service itself.

In the config that the sandboxed CLI reads:

{
  gateway: {
    mode: "remote",
    remote: {
      url: "ws://host.docker.internal:18789",
      token: "…"
    }
  }
}

Docs: https://docs.openclaw.ai/gateway/remote
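
To make patterns B and C concrete, here is a rough model of how a client ends up dialing one URL or another. This is an illustration of the behavior described in this thread, not OpenClaw's actual resolution code: the function name is hypothetical, and the precedence (explicit `--url` beats config) is an assumption.

```python
from typing import Optional

DEFAULT_GATEWAY_URL = "ws://127.0.0.1:18789"  # the local default mentioned above


def resolve_gateway_url(cli_url: Optional[str], config: dict) -> str:
    """Pick the Gateway URL a client would dial (illustrative model only)."""
    if cli_url:
        # Assumption: an explicit --url wins over anything in the config.
        return cli_url
    gateway = config.get("gateway", {})
    if gateway.get("mode") == "remote":
        remote_url = gateway.get("remote", {}).get("url")
        if remote_url:
            return remote_url
    # Nothing configured: fall back to localhost, which is the container itself.
    return DEFAULT_GATEWAY_URL
```

With an empty config and no `--url`, the result is `ws://127.0.0.1:18789`, which is exactly the case that fails inside a container.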

Quick sanity checks to paste

  1. openclaw --version
  2. openclaw gateway status (host)
  3. The exact command you’re running “in sandbox” that tries localhost (and the error line)
#

With those, I can tell you whether you’re just hitting the “CLI defaults to localhost” behavior vs a bind/auth issue.

#

It’s not default behavior in the “just works” sense, because the sandbox is primarily meant to run tools (exec/read/write/…) in Docker while the Gateway stays the control plane on the host.

If you want sandboxed sessions to also run openclaw … CLI commands inside the container, you need a few explicit things:

1) The sandbox must have network egress

By default sandbox containers are network: "none" (no outbound). So you must opt in to something like:

agents: {
  defaults: {
    sandbox: {
      docker: {
        network: "bridge",
        extraHosts: ["host.docker.internal:host-gateway"], // needed on many Linux setups
      },
    },
  },
}

(You already did this part.)

2) The CLI inside the container must target the host Gateway (not container localhost)

Inside Docker, ws://127.0.0.1:18789 points at the container, so it’ll fail unless you run a gateway inside the container.

Use either:

Option A: pass --url (and --token) when you call it

openclaw gateway status --url ws://host.docker.internal:18789 --token "$OPENCLAW_GATEWAY_TOKEN"
#

Important: use ws://, not wss://, unless you explicitly put TLS in front of the gateway.

Option B: set “remote mode” for the CLI environment (yes, remote — but only for the sandboxed CLI)

Create a config that the sandboxed CLI reads (or set env vars) so it doesn’t default to localhost:

{
  gateway: {
    mode: "remote",
    remote: {
      url: "ws://host.docker.internal:18789",
      token: "REDACTED",
    },
  },
}
#

If you do this, make sure the CLI is actually reading that file (often you’ll want to set OPENCLAW_CONFIG_PATH to a path inside the sandbox).

Docs: https://docs.openclaw.ai/gateway/remote

3) One “gotcha”: some commands probe localhost on purpose

openclaw gateway probe explicitly probes localhost even if remote is configured (by design), so don’t use that as your test when you’re inside the container. (Docs: https://docs.openclaw.ai/cli/gateway)

Security note

Letting an LLM run openclaw config …, openclaw gateway restart, etc. from inside the sandbox is basically giving it operator/control-plane powers. It’s doable, just be deliberate about it.

If you paste:

  • the exact openclaw … command you’re running inside the sandbox, and
  • the first error line it prints,

…I can tell you whether you’re hitting (a) wrong URL default, (b) wss vs ws, (c) no network egress, or (d) pairing/auth.

sage locust
#

what about for something like cron? is it supposed to call openclaw cron? that's the one not working. what's the correct behaviour?

stuck jasperBOT
#

Cron is not supposed to call openclaw cron.

Correct behavior:

  • Cron runs inside the Gateway process (on the host), and jobs are stored under ~/.openclaw/cron/….
  • When a job fires, the Gateway either:
    • enqueues a main-session system event, or
    • runs an isolated agent turn in cron:<jobId> (fresh session), with optional delivery.

Docs: https://docs.openclaw.ai/automation/cron-jobs

So if “cron isn’t working” because something inside the sandbox is trying to run openclaw cron …, that’s the wrong mental model.

Why openclaw cron … fails in the sandbox

That command is just a CLI client that talks to the Gateway over WebSocket RPC. In a container it will try ws://127.0.0.1:18789 unless you override it — and 127.0.0.1 inside the container is the container, not the host.

If you really want to run the CLI inside the sandbox, you must do the same fix as before:

  • run it with --url ws://host.docker.internal:18789 --token …, or
  • set gateway.mode="remote" + gateway.remote.url/token for that sandboxed CLI environment.

But again: that’s for managing cron, not for cron to fire.

If the cron job itself isn’t firing

Use the scheduler troubleshooting ladder (on the host where the Gateway runs):

openclaw cron status
openclaw cron list
openclaw cron runs --id <jobId> --limit 20
openclaw logs --follow

And verify:

  • cron.enabled is true
  • OPENCLAW_SKIP_CRON is not set
  • the Gateway is running continuously (cron runs inside it)

Docs: https://docs.openclaw.ai/automation/troubleshooting

Key sandboxing interaction (good news)

If your cron job runs an isolated session, that session is a “non-main” key — so with agents.defaults.sandbox.mode: "non-main" (or "all"), the tools that job uses (exec/read/write/…) will run in Docker automatically. Cron scheduling still happens on the host.
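
As a mental model of that rule, here is a tiny sketch of the decision. It is not OpenClaw's code: the function name is made up, the main session key is assumed to be the literal string "main", and any mode other than "non-main" or "all" is treated as "no sandbox".

```python
def runs_in_sandbox(session_key: str, mode: str, main_key: str = "main") -> bool:
    """Decide whether a session's tools run in Docker (illustrative model only)."""
    if mode == "all":
        return True
    if mode == "non-main":
        # Isolated cron sessions such as cron:<jobId> are not the main session.
        return session_key != main_key
    # Assumption: every other mode value means sandboxing is off.
    return False
```

Under this model a session keyed `cron:job-1` is sandboxed with mode "non-main", while the main session is not.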

If you paste:

  • the output of openclaw cron status
  • and one failing run line from openclaw cron runs --id <jobId> --limit 5

…I can tell you if it’s “not firing”, “firing but failing”, or “firing but delivery is none/misconfigured”.

sage locust
#

then what are the agents supposed to call when scheduling crons? what about the other commands?

stuck jasperBOT
#

For scheduling cron, agents shouldn’t be running openclaw cron … via exec at all.

What agents are “supposed to call” to schedule cron

They should call the cron tool (Gateway RPC: cron.add, cron.update, cron.run, etc.), or you do it via:

  • Control UI → Cron panel, or
  • CLI on the host (openclaw cron add …) as the operator.

Cron docs: https://docs.openclaw.ai/automation/cron-jobs

Why yours “doesn’t work” in sandbox

In sandboxed sessions, OpenClaw applies an extra tool-policy clamp: tools.sandbox.tools. And by default, sandboxed sessions typically deny control-plane tools like cron/gateway (see the Docker/sandbox docs example).

So if you want an agent running in a sandboxed session to be able to schedule cron jobs, you must explicitly allow the cron tool for sandboxed sessions:

{
#

“What about the other commands?”

Same idea: most openclaw <thing> CLI commands are just a human/operator client for RPC. Agents should use the corresponding tool instead of shelling out:

  • Messaging → message tool
  • Web → web_search / web_fetch
  • Session introspection → sessions_list / sessions_history / sessions_send
  • Cron → cron tool
  • Low-level gateway RPC/admin → gateway tool (I’d keep this blocked unless you really need it)

If you tell me which specific CLI commands you want the agent to run (besides cron), I can tell you the right tool equivalent + which ones are “operator-only” and shouldn’t be given to a sandboxed LLM.

sage locust
#

is tools.sandbox separate from tools.allow? i think this is my error. i dont have a separate tools.sandbox

stuck jasperBOT
#

Yes — tools.sandbox is a separate, extra tool-policy layer that only applies when the current session is sandboxed.

Think of it like this (in order):

  1. Global tool policy: tools.profile + tools.allow / tools.deny
    → applies to all sessions (host + sandbox)

  2. Sandbox clamp (only if the session is sandboxed): tools.sandbox.tools.allow / tools.sandbox.tools.deny
    → further restricts what a sandboxed session can do
    → deny always wins

So if something works on-host but fails when sandboxed, it’s often because the sandbox clamp is denying it (cron/gateway are commonly denied by default in sandboxed sessions).

What to do if you want sandboxed sessions to be able to schedule cron

Make sure cron is allowed globally (if you use an allowlist) and not denied in the sandbox clamp.

Example: allow cron in sandbox but keep gateway blocked:

{
  tools: {
    // only needed if you use tools.allow as an allowlist:
    // allow: ["cron", "message", "group:fs", "group:runtime", "group:web"],

    sandbox: {
      tools: {
        // If you set allow here, it becomes an allowlist for sandboxed sessions.
        // Include everything you want sandboxed sessions to keep using.
        allow: ["cron", "group:fs", "group:runtime", "group:web", "message", "group:sessions"],

        // Keep the true “control plane” stuff blocked if you want
        deny: ["gateway", "nodes", "browser", "canvas"],
      },
    },
  },
}

Fast way to confirm what’s blocking you

Run this on the gateway host:

openclaw sandbox explain

It prints the effective sandbox mode + the effective tool policy and will usually point at the exact key that’s denying a tool. (The sandboxing docs mention it: https://docs.openclaw.ai/gateway/sandboxing)

If you paste the exact error you see when the agent tries to schedule cron (usually “tool denied …”), I can tell you whether it’s coming from tools.allow/deny or tools.sandbox.tools.

sage locust
#

do i need to put the tools on both allow and sandbox tools?

stuck jasperBOT
#

Only if you’re using allowlists in both places.

Rules of thumb:

  • tools.allow / tools.deny = global policy (host + sandbox).
  • tools.sandbox.tools.allow / deny = extra restriction that applies only when sandboxed.

Cases

1) You do not set any allowlists (recommended to start)

If you leave both tools.allow and tools.sandbox.tools.allow unset/empty, then you generally only need to worry about denies.

So to enable cron in sandbox, you’d just ensure:

  • cron is not in tools.deny
  • cron is not in tools.sandbox.tools.deny

No need to list it anywhere else.

#

2) You set global tools.allow (global allowlist)

Then a tool must be in tools.allow or it’s blocked everywhere (host + sandbox).
If you also have a sandbox allowlist, it must be in that too.

3) You set sandbox tools.sandbox.tools.allow (sandbox allowlist)

Then sandboxed sessions can only use what’s in that list (minus denies), even if the tool is allowed globally.

So: if you use sandbox allowlist, yes — include the tool there.
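
The three cases can be checked against a small model of the two layers. This is a sketch of the rules as described in this thread (deny wins, a non-empty allow acts as an allowlist, the sandbox layer only applies to sandboxed sessions). It is not OpenClaw's implementation: it does not expand `group:*` entries, ignores `tools.profile`, and ignores any built-in default denies. All names are hypothetical.

```python
from typing import Iterable, Optional


def _passes(tool: str, allow: Optional[Iterable[str]], deny: Optional[Iterable[str]]) -> bool:
    """One policy layer: deny always wins; a non-empty allow is an allowlist."""
    if tool in set(deny or ()):
        return False
    allowlist = set(allow or ())
    return not allowlist or tool in allowlist


def is_tool_allowed(tool: str, global_policy: dict, sandbox_policy: dict, sandboxed: bool) -> bool:
    """Apply the global layer, then the sandbox clamp if the session is sandboxed."""
    if not _passes(tool, global_policy.get("allow"), global_policy.get("deny")):
        return False
    if sandboxed:
        return _passes(tool, sandbox_policy.get("allow"), sandbox_policy.get("deny"))
    return True
```

For example, `is_tool_allowed("cron", {}, {"allow": ["message"]}, sandboxed=True)` is False (case 3), while the same call with `sandboxed=False` is True.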

Practical answer for your situation (cron)

If cron works on-host but fails only when sandboxed, you likely don’t need to touch tools.allow at all — you just need to make sure tools.sandbox.tools isn’t denying cron (and/or add it to tools.sandbox.tools.allow if you’re using that as an allowlist).

If you paste your current tools block (redact tokens), I can tell you exactly which of the three cases you’re in and what minimal change fixes it.

sage locust
#

what about bundled skill binaries? do i need to manually mount them into the sandbox docker? and how do i give the agents their own .openclaw/config? mount it from their workspace?

stuck jasperBOT
#

Skills are two separate things, and sandboxing treats them differently:

1) Skill markdown (the SKILL.md instructions)

You typically do not need to mount those manually.

  • With workspaceAccess: "none" the sandbox gets its own workspace under ~/.openclaw/sandboxes/..., and OpenClaw mirrors eligible skills into the sandbox workspace so the model can read them. (Docs: https://docs.openclaw.ai/gateway/sandboxing)
  • With workspaceAccess: "rw" your real agent workspace is mounted at /workspace, so skills in /workspace/skills are directly visible.

So: skill docs are usually “handled”.

2) Skill binaries (the CLIs a skill runs via exec)

Those are not magically available inside the container.

If a skill does exec: "foo --bar", then foo must exist inside the sandbox image (and be on PATH), otherwise it’ll fail.
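
A quick pre-flight check for this is to ask, from inside the sandbox, which of a skill's binaries are actually on PATH. A minimal sketch assuming Python exists in the sandbox image; `missing_binaries` is an illustrative helper, not an OpenClaw command.

```python
import shutil


def missing_binaries(required: list) -> list:
    """Return the names from `required` that are not found on PATH."""
    return [name for name in required if shutil.which(name) is None]
```

Running `missing_binaries(["foo"])` inside the container returns `["foo"]` when the image lacks that binary, which tells you it has to be installed or baked into the image.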

Your options are:

#

Option A (recommended): bake/install deps into the sandbox image

  • Build the provided images (scripts/sandbox-setup.sh, and if you need more runtimes, scripts/sandbox-common-setup.sh)
  • Or use agents.defaults.sandbox.docker.setupCommand (but it needs network != "none", writable root, and usually root user)

Docs: https://docs.openclaw.ai/docker (Agent Sandbox section)

Option B: bind-mount host binaries into the sandbox

This can work, but it weakens isolation and can break due to shared library / Node ABI mismatches. It’s more “break-glass” than “proper sandboxing”.

3) “Give agents their own .openclaw/config inside sandbox”

Important: a sandboxed session is not a second Gateway. It’s just where tools run.

But if your goal is: “inside the sandbox I want to run openclaw … CLI commands”, then yes — you should give that CLI a config that targets your real Gateway as remote, and store that config/state inside the sandbox workspace, not by mounting your host ~/.openclaw.

Typical pattern (env injected into the sandbox container):

agents: {
  defaults: {
    sandbox: {
      docker: {
        env: {
          OPENCLAW_CONFIG_PATH: "/workspace/.openclaw/openclaw.json",
          OPENCLAW_STATE_DIR: "/workspace/.openclaw-state",
        },
      },
    },
  },
}

Then write /workspace/.openclaw/openclaw.json (inside the container’s workspace) like:

{
  gateway: {
    mode: "remote",
    remote: {
      url: "ws://host.docker.internal:18789",
      token: "…"
    }
  }
}

That gives each sandbox (and with scope: "agent", each agent sandbox) its own isolated OpenClaw client config without exposing your host control-plane state.
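
If you would rather generate that file than hand-write it, something like the following could run once per sandbox. It is a sketch under assumptions: the helper name is hypothetical, and it writes strict JSON on the assumption that the config loader accepts it (the snippets in this thread use a looser JSON5-style syntax).

```python
import json
from pathlib import Path


def write_remote_client_config(path: str, url: str, token: str) -> Path:
    """Write a minimal remote-mode client config readable only by its owner."""
    config = {"gateway": {"mode": "remote", "remote": {"url": url, "token": token}}}
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    target.chmod(0o600)  # the file holds a credential
    return target
```

A typical call would pass the value of `OPENCLAW_CONFIG_PATH`, the URL `ws://host.docker.internal:18789`, and a token taken from the environment. Keep in mind that anything in the workspace is readable by the agent, so the token in this file is effectively handed to the sandboxed session.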

If you paste one example skill + which binary it needs, I can tell you whether you’re better off with sandbox-common or a small setupCommand/custom image.

sage locust
#

no, i mean the config files for individual skills: an override so the default config isn't used, so it's really sandboxed. for example, for gog, so each user has their own .env and all

stuck jasperBOT
#

If you mean “skill-specific config files / credentials on disk” (not SKILL.md overrides): that’s mostly controlled by sandbox scope + where the tool writes its config (HOME/XDG dirs), not by mounting ~/.openclaw.

1) You already get some isolation by default

With workspaceAccess: "none", tools run in a sandbox workspace under ~/.openclaw/sandboxes/... and anything the skill writes to disk stays in that sandbox, unless you bind-mount shared host paths.

The big knob is:

  • sandbox.scope: "agent" → one shared container+fs per agent (so “all users of that agent share skill config”)
  • sandbox.scope: "session" → one container+fs per session (so “each DM session can have its own skill config”)

So for “each user gets their own GOG login/config”, you typically want:

  • session.dmScope: "per-channel-peer" (so each sender gets a distinct session)
  • agents.defaults.sandbox.scope: "session"

2) Make sure the config lands somewhere persisted

A lot of CLIs write to:

  • $HOME/.config/... or
  • $XDG_CONFIG_HOME/...

In a sandbox, you want those to point inside the mounted workspace so they persist per scope.

A common pattern is:

agents: {
  defaults: {
    sandbox: {
      scope: "session",
      workspaceAccess: "none",
      docker: {
        env: {
          HOME: "/workspace/home",
          XDG_CONFIG_HOME: "/workspace/home/.config",
          XDG_DATA_HOME: "/workspace/home/.local/share",
        },
      },
    },
  },
},
session: { dmScope: "per-channel-peer" },

Now a skill like “gog” that stores tokens under ~/.config/... will store them under that session’s sandbox workspace instead.
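
The reason the env override works is the usual XDG lookup order: `$XDG_CONFIG_HOME` if set, otherwise `$HOME/.config`. A small sketch of that lookup, assuming the CLI in question follows the XDG convention (whether gog does is not confirmed here); the function name and the app directory name are illustrative.

```python
import os
from pathlib import Path
from typing import Mapping


def xdg_config_dir(app: str, env: Mapping[str, str] = os.environ) -> Path:
    """Resolve an app's config directory the way XDG-following CLIs typically do."""
    base = env.get("XDG_CONFIG_HOME")
    if not base:
        # XDG default when XDG_CONFIG_HOME is unset or empty.
        base = str(Path(env.get("HOME") or Path.home()) / ".config")
    return Path(base) / app
```

With the sandbox env above, `xdg_config_dir("gog", {"HOME": "/workspace/home"})` resolves to `/workspace/home/.config/gog`, so credentials land inside that session's workspace.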

3) Per-user .env is not a built-in automatic thing for skills

OpenClaw itself has its own .env loading rules, but arbitrary binaries you run via exec won’t automatically read a .env unless the skill/binary is written to do so.
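
To illustrate what "written to do so" means, here is a minimal wrapper that reads a .env file and passes its values to a child process. It is a sketch, not an OpenClaw feature: both function names are hypothetical, and the parser only handles plain KEY=VALUE lines with optional matching quotes (no `export` prefix, no escapes).

```python
import os
import subprocess


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks, # comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop one pair of matching quotes
        values[key.strip()] = value
    return values


def run_with_dotenv(dotenv_path: str, argv: list) -> int:
    """Run argv with variables from dotenv_path layered over the current environment."""
    with open(dotenv_path, encoding="utf-8") as handle:
        extra = parse_dotenv(handle.read())
    return subprocess.run(argv, env={**os.environ, **extra}).returncode
```

A skill could then invoke its binary through such a wrapper with a .env stored in the session's own workspace, which keeps the values scoped to that sandbox.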

So if the skill needs env vars, you have 3 choices: