# GPT-5 released
System card here
gpt-5-thinking has a 50% time horizon of 2h15m, 25 minutes more than Grok 4, the next-best model.
The benchmarks don't seem to be dramatically higher...
I don't want to completely dismiss this, but a lot of benchmarks are pretty saturated, and the last few percent might be harder to gain.
From the System Card:
"In one example, GPT-5 correctly identified its exact testing environment."
This is surprisingly good news, though:
And they seem to at least be taking AI psychosis seriously.
Not to give OpenAI too much credit: in many sections of the system card, they compare the harmful outputs of GPT-5 with o3 (the most egregiously misaligned frontier model ever created), and act like it's wonderful that GPT-5 isn't as bad. Clearing a low bar, there.
Rare footage of OpenAI doing the bare minimum??
But seriously, I do think we don't give AI companies enough credit where credit is due.
So strategically: this model is designed to damage Anthropic, right? Claude Code with Opus 4.1 is very valuable as a software assistant, and many people (e.g. me) are paying serious money to use it. This is most of Anthropic's revenue.
So you get GPT-5 to the point where it performs as well as Opus 4.1 on SWE-bench, quote endorsements from Cursor and Windsurf, massively undercut on price, and hope it kicks your competitor where it hurts.
I've followed a bunch of Reddit discussions on GPT-5. People generally hate it, not because the model is bad, but because 4o is gone and people liked the 4o vibes. It's quite clear that there is a pretty big discrepancy between what people want from AI models and what we fear from them.
Most people don't need AGI or superintelligent capabilities. They just want a good chatbot that is a fun buddy and knows a lot. Whether it beats 50% of devs or 99.9% of devs doesn't matter to most of them. GPT-4 was smart enough and knew enough about the world for >99% of use cases.
What we are concerned about is whether the models are smart enough to do the job of AI researchers or top hackers. These capabilities are not that relevant for most people.