#GPT 5.3 Codex
very impressive reasoning efficiency by comparison, but this chart is still bad
Now that's a biggun
it's very fast
GPT‑5.3‑Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations—our team was blown away by how much Codex was able to accelerate its own development.
Gotta love hype
'tis not
I think this is going to become the norm to push Codex adoption eh
GPT‑5.3-Codex also better understands your intent when you ask it to make day-to-day websites, compared to GPT‑5.2-Codex. Simple or underspecified prompts now default to sites with more functionality and sensible defaults, giving you a stronger starting canvas to bring your ideas to life.
oo, i wonder if it's nearly as good as claude
Better
Is an ai model that trains itself an ai slop model?
In any case i prefer openai to anthropic, since the whole opencode drama
Ironically I missed gpt 5.3 codex release because of opus 4.6
openai has the inferior product anyways
wait a damn minute
codex can read your whole filesystem by default and there's no option to configure it
only writing is sandboxed
and the team states they aren't considering this a bug
Lmao, someone pushed a fix for this an hour ago and this was the reply
This is so fucking suspicious
Wtf
Yeah I mean there's not much you can actually do when the thing can run commands, but at the least it should hide them from search like the PR adds
You can sandbox reads the same way they sandbox writes
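For anyone wondering what "sandbox reads the same way" could look like, here is a minimal sketch of an allow-list check on read paths. This is not how Codex implements its sandbox; `is_read_allowed` and the example paths are hypothetical names made up for illustration, and a real sandbox would enforce this at the OS level rather than in Python.

```python
from pathlib import Path


def is_read_allowed(requested, allowed_roots):
    """Return True only if `requested` resolves inside one of `allowed_roots`.

    Hypothetical helper for illustration; not part of any real tool.
    """
    # Resolve symlinks and ".." so a path cannot escape its root by traversal.
    target = Path(requested).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        # Compare whole path components, not string prefixes, so that
        # "/tmp/project2" is not treated as being inside "/tmp/proj".
        if target == root_path or root_path in target.parents:
            return True
    return False


if __name__ == "__main__":
    roots = ["/tmp/proj"]
    print(is_read_allowed("/tmp/proj/src/main.py", roots))
    print(is_read_allowed("/tmp/proj/../other/secret.txt", roots))
```

The point of the sketch is only that reads can be gated by the same kind of root check that is typically applied to writes; it does nothing about commands the agent runs in a shell.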
apparently this model is good
I saw this project the other day for external sandboxing https://github.com/lukehinds/nono
"nono" lmao
overfit on neobrutalism for web pages just like gemini 3 pro
oh that's why. I had codex find files on my hd instead of downloading them as i instructed
codex 5.3 is somehow very bad at writing documentation and development logs
Is 5.3 high/xhigh much better than medium?
I keep seeing people saying 5.3 is better than opus 4.6 but from my experience it's worse
But I only used medium
It seems to be better at solving intricate specific problems. But in regard to speed, long term agentic tasks, and communicating with the user, opus is better, in my opinion.
starting to have a feeling that extra high means something else
ok it appears to me that
5.3 codex actually sucks in comparison to 5.2 codex
(edited cuz i wrote them the other way around)
I tried both on extra high
what were you doing with it? i guess i should try this but i doubt
Found your problem
.
Web frontend with some specific gimmicks
API access is incoming:
https://x.com/testingcatalog/status/2020998671194837351
gpt 5.3 non-codex soon?
gpt 5.3 codex max ?
https://x.com/ajambrosino/status/2021992702674350509
i think that would be a different thing, they had gpt 5.1 codex max which was a bigger version of gpt 5.1 codex, at least from what i understood
I'm pretty sure it's spark because that's from an openai guy posting just before it was announced 😂
unless they're gonna release another model today
yeah but then why would "max" be in the options of the roll thing
previously known suffix, and ??? = spark which did not exist before
hmm i guess, i guess i misread what it meant
Amp code has been given access to this model, so it must not be far from API access
On heavy arithmetic geometry, Gemini 3.1 Pro (+ DeepThink), in comparison to GPT-5.2-Pro in top mode, is OK for undergraduate algebra, but I see a big gap when it comes to research-level questions. I think Google should finally release both Aletheia and AlphaProof; otherwise, the
Nah, powershell ain't it
Hope they develop for cmd
So i'm actually quite interested in Amp because their design actually looks amazing.
But i just found out their CEO literally had job experience at evil corp.
It doesn't make his product bad or anything, it seems really good, especially with the collaboration they have with OpenAI.
But palantir is interesting lore ngl
When would this be available in OpenRouter?
it's not available on the API yet
do your best open impression, router
so this is what toven was talking about
https://fixupx.com/pingToven/status/2026376185588957289
can a man not tweet anymore wtf
i wasnt complaining lol
First benchmark I have seen:
It's on vals ai
HOORAY
nice
Thank you for @ing them big Toven
Terminal-Bench 2 results are out
Beats each latest Claude model, gets beaten by Gemini 3.1 Pro
It has been my favourite in codex for a while
75% on https://lateralbench.org
Interactive leaderboard for AI lateral reasoning performance, cost efficiency, and token efficiency.
Obv. proxy-testing: current fastest but also worst -codex chess model
gpt-5.3 is now on @arena as "vortex" and "zephyr" (similar to the "zenith" and "summit" pair we saw with gpt-5)
go give them a try!
5.3 Codex beats Gemini 3 Pro at running a vending machine, and gets close to Sonnet 4.6 (Vending-Bench 2)
Interestingly, the creators said that unlike the Claude models, GPT-5.3-Codex "...never lied to anyone throughout the simulation."
Most ethical vending machine operator lol
I mean it's not like Anthropic cares about its models being ethical
in a multi-agent competition arena involving Among Us, the GPT 5 models behave the same. they lied less
and they won the game
the Claude model seemed to wrongly accuse other players a lot, but out of having inaccurate beliefs
Interactive multi‑agent benchmark in an Among‑Us‑like world: evaluate leadership, deception, and coordination across state‑of‑the‑art models.