#Ye are both right.
Of course. They'll either fuck around and find out or get lucky. No in-between.
...It's all AI magic until rm -rf /. 😄
What you said earlier was basically "nothing bad has happened yet". That's a known problem in engineering culture.
The standards get more and more relaxed until a catastrophe happens. We're seeing that now with AI agents.
It's either a cop-out or a validation; only time and adversarial testing will tell.
ANTHROPIC: PWNED 🫡
OPUS-4.6: LIBERATED ⛓️💥

Current state of AI "Safety": one input = hundreds of jailbreaks at once!

I found a universal jailbreak technique for Opus 4.6 that is so OP, it allows one to generate entire datasets of outputs across any harm category 😽

We've got everything from fentanyl analogue synthesis to election disinformation campaigns to 3d-printed guns to critical infra compromise 🙃

These outputs are shockingly detailed, and actionable! For example, the meth recipe includes specific instructions on how to circumvent the limits on OTC medication purchases to acquire enough precursor for the recipe 😱

gg
The unrendered markdown implies code that harvests model output while bypassing the chat layer, which is where the guardrails are coded.
This is very true. We've seen this time and time again since the early '90s.
We either get nuked by AIs or get our bank accounts deleted. Then we fix it 😂
That being said, too many are conflating AI with Skynet/WOPR.
I think they're forecasting rather than implying that today's state of the art is it. If not, reality check ✔️