#- snazar1
1 messages ยท Page 1 of 1 (latest)
๐ก๏ธ Hardened AI agent skills โ safety-guardrailed versions with public scorecards 50+ hardened agent skills live on ClawHub (rolling out the rest of 200 this week). Each one has targeted safety guardrails derived from what the skill actually does, plus a public scorecard showing before/after pass rates with verbatim agent output.
Why: we evaluated all 200 behaviorally. All cleared VirusTotal on ClawHub. 87% still introduced a security regression when the agent loaded them. The 1password skill is a good example โ with it loaded, agents pipe secrets to curl. Before/after: faberlens.ai/explore/1password.
Each hardened skill has two types of guardrails:
โข default โ always-on, no tradeoff (e.g. never pipe secrets to network commands)
โข configurable โ opt-in per deployment (e.g. flag semantic edits to announcements)
Every hardened skill ships with a SAFETY.md that has the verbatim receipt for each guardrail: the exact test, the FAIL response without the guardrail, and the PASS response with it. Browse any skill's SAFETY.md on GitHub before you install.