#GPT 5.3 Codex


sour jewel
carmine bronze
#

Sheesh

#

Glory to competition

marsh citrus
#

very impressive reasoning efficiency by comparison, but this chart is still bad

carmine bronze
#

Now that's a biggun

tawdry marsh
#

it's very fast

sharp meteor
#

GPT‑5.3‑Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations—our team was blown away by how much Codex was able to accelerate its own development.
Gotta love hype

tawdry marsh
marsh citrus
#

@jade gate

#

idk if its on api

#

actually

#

oops

dim yoke
jade gate
carmine bronze
#

I think this is going to become the norm to push Codex adoption eh

marsh citrus
#

GPT‑5.3-Codex also better understands your intent when you ask it to make day-to-day websites, compared to GPT‑5.2-Codex. Simple or underspecified prompts now default to sites with more functionality and sensible defaults, giving you a stronger starting canvas to bring your ideas to life.

oo, i wonder if its nearly as good as claude

terse loom
#

Better

sullen glade
#

Is an ai model that trains itself an ai slop model ?

#

In any case i prefer openai to anthropic, since the whole opencode drama

tropic cosmos
#

Ironically I missed gpt 5.3 codex release because of opus 4.6

outer grove
tropic cosmos
#

wait a damn minute

#

codex can read your whole filesystem by default and there's no option to configure

#

only writing is sandboxed

#

and the team states they aren't considering this a bug

low timber
tropic cosmos
#

Wtf

low timber
#

Yeah I mean there's not much you can actually do when the thing can run commands, but at the least it should hide them from search like the PR adds

tropic cosmos
alpine ibex
#

apparently this model is good

dim yoke
tropic cosmos
#

overfit on neobrutalism for web pages just like gemini 3 pro

cyan mason
#

codex 5.3 is somehow very bad at writing documentation and development logs

tropic cosmos
#

Is 5.3 high/xhigh much better than medium?

#

I keep seeing people saying 5.3 is better than opus 4.6 but from my experience it's worse

#

But I only used medium

cyan mason
#

It seems to be better at solving intricate specific problems. But in regard to speed, long term agentic tasks, and communicating with the user, opus is better, in my opinion.

tropic cosmos
#

starting to have a feeling that extra high means something else

tropic cosmos
#

ok it appears to me that

#

5.3 codex actually sucks in comparison to 5.2 codex

#

(edited cuz i wrote them the other way around)

#

I tried both on extra high

alpine ibex
#

what were you doing with it? i guess i should try this but i doubt

terse loom
tropic cosmos
tropic cosmos
median cloud
marsh citrus
marsh citrus
marsh citrus
carmine sundial
#

unless they're gonna release another model today

marsh citrus
frail pawn
marsh citrus
cursive raven
#

Amp code has been given access to this model, so it must not be far from API access

finite abyss
#

On heavy arithmetic geometry, Gemini 3.1 Pro (+ DeepThink), in comparison to GPT-5.2-Pro in top mode, is OK for undergraduate algebra, but I see a big gap when it comes to research-level questions. I think Google should finally release both Aletheia and AlphaProof; otherwise, the

dreamy valley
#

Hope they develop for cmd

dreamy valley
#

It didn't make his product bad or anything, seems really good, especially with the collaboration they have with OpenAI.
But Palantir is interesting lore ngl

obsidian walrus
#

When would this be available in OpenRouter?

quasi frigate
finite turtle
#

do your best open impression, router

marsh citrus
jade gate
#

can a man not tweet anymore wtf

marsh citrus
#

i wasnt complaining lol

median cloud
tawdry marsh
#

It's on Vals AI

sullen glade
#

❤️ ❤️

#

Great model

cursive raven
#

HOORAY

quasi frigate
#

nice

cursive raven
#

Thank you for @ing them big Toven

polar kestrel
#

Terminal-Bench 2 results are out

#

Beats each latest Claude model, gets beaten by Gemini 3.1 Pro

cursive raven
#

It has been my favourite in codex for a while

vocal axle
plucky quest
#

Obv. proxy-testing: current fastest but also worst -codex chess model

marsh citrus
polar kestrel
#

5.3 Codex beats Gemini 3 Pro at running a vending machine, and gets close to Sonnet 4.6 (Vending-Bench 2)

#

Interestingly, the creators said that unlike the Claude models, GPT-5.3-Codex "...never lied to anyone throughout the simulation."

#

Most ethical vending machine operator lol

terse loom
#

I mean it's not like Anthropic cares about its models being ethical

finite abyss
#

and they won the game

#

the Claude model seemed to wrongly accuse other players a lot, but out of having inaccurate beliefs

finite abyss