I dug through the docs a bit, and the short version is: I do not see a built-in, generic “pre-run prompt risk check” that warns before agent execution starts.
Closest things that already exist are:
- system prompt safety guidance, but docs explicitly say it is advisory only
- hard guardrails later in the flow: tool policy, sandboxing, exec approvals, plugin approvals
- tool-loop detection, which is different, it catches repetitive tool behavior
- provider-specific guardrails in some cases, for example Bedrock guardrails
So for the specific thing you described, “scan prompt first, show non-blocking risk warning, maybe enforce later”, it looks like a real gap today.
On direction, plugin-first seems like the right first move. The plugin docs/architecture already point that way:
- prompt mutation work should prefer
before_prompt_build
- plugins can use
before_tool_call to block or require approval later in the chain
- hook-only plugins are still supported, so this can start optional and iterate without forcing a core feature too early