#Question about the new Agent Guardrails

lofty arch

Think about it like this:
Your prompt section: "LLM, please don't say bad things" -> LLM-dependent, best-effort
Agent guardrails: an independent AI classifier that actively monitors the agent's output and kills the conversation if it violates safety thresholds, regardless of what the LLM decides to do

The moderation guardrail is the key differentiator. It provides a hard safety net that doesn't depend on the LLM's compliance. You can also configure presets (e.g. "medium" sensitivity) and fine-tune thresholds per category from the UI.

Please check it under agent -> security -> guardrails
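
To make the difference concrete, here is a rough sketch of how a threshold-based guardrail check could sit outside the LLM. This is not the product's actual implementation or API: the preset values, the category names, and the `Guardrail` / `handle_turn` names are all hypothetical, and the classifier scores are passed in by hand rather than produced by a real model.

```python
from dataclasses import dataclass, field

# Hypothetical preset thresholds. The real presets and their values
# are configured in the UI and are not specified in this thread.
PRESETS = {"low": 0.9, "medium": 0.7, "high": 0.5}


@dataclass
class Guardrail:
    """Illustrative guardrail config: a preset plus per-category overrides."""
    preset: str = "medium"
    overrides: dict = field(default_factory=dict)

    def threshold(self, category):
        # A per-category override wins over the preset default.
        return self.overrides.get(category, PRESETS[self.preset])

    def violations(self, scores):
        # A category is violated when its score reaches its threshold.
        return [c for c, s in scores.items() if s >= self.threshold(c)]


def handle_turn(agent_reply, scores, guardrail):
    """Return (reply, ended). The reply is dropped and the conversation
    ends on any violation, whatever the LLM itself intended to say."""
    if guardrail.violations(scores):
        return None, True
    return agent_reply, False


# Scores here stand in for the output of an independent classifier.
guardrail = Guardrail(preset="medium", overrides={"harassment": 0.4})
blocked = handle_turn("hi", {"harassment": 0.45, "violence": 0.1}, guardrail)
allowed = handle_turn("hi", {"harassment": 0.1, "violence": 0.69}, guardrail)
```

The point of the sketch is that the decision to end the conversation is made from the classifier's scores alone, so it does not rely on the LLM following the prompt.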

snow forge

It looks like the announcement message that was just posted (#📢│agents-announcement message) is misleading: it is actually NOT talking about the new feature (where an independent AI classifier monitors the conversation for guardrail violations).

lofty arch

Oh, sorry about that, and thanks for reporting. We will fix the announcement message. It's not about better prompting; it's an entirely new setting for agents.

viscid marlin

Thank you for clarifying - yes, I had seen the content moderation guardrails feature before but was confused because of the announcement.