#Question: Does OpenClaw have a prompt safety guard before execution?

1 messages · Page 1 of 1 (latest)

grand widget
#

Hi all — quick question from a user perspective.
I opened a feature request about adding a lightweight prompt safety/risk guardrail before agent execution (non-blocking warnings first).

I’m trying to understand:

whether OpenClaw already has something similar, and
if not, whether plugin-first is the preferred direction.
Issue link: https://github.com/openclaw/openclaw/issues/71041
Any pointers are appreciated — thanks!molty

primal quarryBOT
#

I dug through the docs a bit, and the short version is: I do not see a built-in, generic “pre-run prompt risk check” that warns before agent execution starts.

Closest things that already exist are:

  • system prompt safety guidance, but docs explicitly say it is advisory only
  • hard guardrails later in the flow: tool policy, sandboxing, exec approvals, plugin approvals
  • tool-loop detection, which is different, it catches repetitive tool behavior
  • provider-specific guardrails in some cases, for example Bedrock guardrails

So for the specific thing you described, “scan prompt first, show non-blocking risk warning, maybe enforce later”, it looks like a real gap today.

On direction, plugin-first seems like the right first move. The plugin docs/architecture already point that way:

  • prompt mutation work should prefer before_prompt_build
  • plugins can use before_tool_call to block or require approval later in the chain
  • hook-only plugins are still supported, so this can start optional and iterate without forcing a core feature too early