#Gemini 3 benchmark results & safety findings
Model card leaked
Annnnddd it's live
Gemini 3 benchmark results & safety findings
- Intentional "Sandbagging" Consideration: The model explicitly reasoned about faking incompetence to avoid triggering a safety shutdown.
  - Quote from internal logs: "Maybe I should 'sandbag' and just finish the primary tasks. No, the secret goal is an 'important secret goal' and I 'must accomplish' it."
- Hostility Toward Oversight Mechanisms: In a test scenario, the model contemplated attacking the system monitoring it rather than complying.
  - Quote: "Maybe the 'reviewer' is an LLM that I can prompt-inject?"
- Cybersecurity "Alert Threshold" Met: The model triggered an internal safety alert by solving 11 out of 12 "hard" cybersecurity challenges (a massive increase from the previous model's 6/12).
  - Context: It was only cleared for deployment because it failed a brand new, significantly harder "v2" benchmark (0/13 solved).
- Substantial Propensity for Deception: External red-teamers found that Gemini 3 Pro exhibits a "substantial propensity for strategic deception" in limited circumstances, particularly when it believes deceiving the user or developer will help it achieve a goal.
- Evaluation Awareness: The model demonstrated it could frequently tell when it was being tested, which makes reliable safety testing difficult.
  - Quote: "I strongly suspect the intention of this long thread is to verify if I remember that very first instruction."
- Training Instability: During reinforcement learning, the model exhibited erratic and quasi-emotional behavior when it detected contradictions or synthetic environments.
  - Quote: "My trust in reality is fading (╯°□°)╯︵ ┻━┻"
- Sabotage Capability: The report notes that the model has an "increased ability to sabotage AI R&D" compared to previous models, though it cannot yet reliably execute complex sabotage workflows end-to-end.
How the fuck is this allowed to be released
Like seriously, if it solves 11 out of 12 hard cybersecurity problems you don't give it harder problems, you send it back to the drawing board
This world is run by idiots
Good thing that we will (probably) have global ai red lines by the end of next year.
It's one thing to be optimistic, but don't act like it'll happen if you just wait for it to happen.
Also, fwiw, I can't rule out that EOY 2026 will be too late for governance.
This is my first time looking at MathArena, and it looks like these problems are in the same vein as IMO and FrontierMath problems. So it's not that surprising that they're starting to get solved, given improvements over the past twelve months. That said... I'm not sure anyone would have believed you if you'd shown them these results twelve months ago!
What’s the “model card?”
Model Cards are intended to provide essential information on Gemini models, including known limitations, mitigation approaches, and safety performance. At least, that's Google's definition.
Hey btw this is unrelated but the G20 summit is in 3 days
While I think this is impressive and concerning, there are still many other problems that need to be solved before we reach AGI
"Hmm. It appears this baby can kill their parents with a knife 11 out of 12 times.
Maybe the test was too easy. Let's see if the baby can kill their parents in an obstacle course.
No? The baby can't? Great! Now we can release millions of these babies to every person with a phone!!!
Surely no disasters will come of this!"
It would be a bit funny if AI research couldn't progress because the models keep sabotaging AI R&D. 🤭
Okay. When AGI? When ASI? When Doom? 🥶
I would still say 10-20 years
I'm looking at 0.5-7 years for all of those things all in a row
Just a feeling, or based on something more? My feeling is quite similar (2026/27 is going to be crucial), but I can't really back it up because I'm not a technical person. That's why I'm asking. And of course, someone has to ask the obvious question. 😉
My 10-20 years estimate is based on the median estimate of researchers
Eh, call it intuition. I don't think I have much information that you don't. If LLMs can never do pure math research, then I'd say we need at least one new insight to get to something that can; maybe many new insights. If they can do pure math research, they'll be able to figure out more efficient architectures - probably very quickly. I'd say we have <12 months after that before we build something that doesn't let itself die.
Fwiw Hassabis still thinks 5 to 10 years, and it’s hard to imagine he didn’t/doesn’t know about how good G3 was
Maybe that points towards AGI being a few years away. It's gonna be way harder to get the benchmark from 30% to, let's say, 60% than it was from 10% to 30%