#Question about the new Agent Guardrails
1 messages · Page 1 of 1 (latest)
think about it like this
your prompt section: "LLM, please don't say bad things" -> LLM-dependent, best-effort
agent guardrails: An independent AI classifier that actively monitors the agent's output and kills the conversation if it violates safety thresholds - regardless of what the LLM decides to do
The moderation guardrail is the key differentiator. It provides a hard safety net that doesn't depend on the LLM's compliance. You can also configure presets (e.g. "medium" sensitivity) and fine-tune thresholds per category from the UI.
Please check it under agent -> security -> guardrails
It looks like the announcement message that just got posted (#📢│agents-announcement message) is misleading - it is actually NOT talking about the new feature (where an independent AI classifier will monitor conversation for guardrail violation)
Oh sorry for that and thanks for reporting, we will fix announcement message, its not about better prompting, its entirely new setting for an agents
Thank you for clarifying - yes, I had seen the content moderation guardrails feature before but was confused because of the announcement.