#GPT-5-Codex
1 messages · Page 1 of 1 (latest)
YES
does setting reasoning effort affect the model? or does it always "pick" on its own?
well this is awkward...
refuses to do any roo tool calls lol.
the model has very specific format that you need to follow
you need 2 tools, apply_patch & shell with openai's specific format
and system prompts have to be short
Tested GPT-5 Codex:
Coding specific GPT-5 version with emphasis on agentic coding.
- used 43% less tokens than gpt-5 in my general purpose benchmark (73% tokens spent on reasoning)
- roughly same performance as gpt-5, though stem/math performance was weaker
- saw no improvements in non-agentic coding tasks
- vision testing scored between gpt-5 and gpt-5-chat, thus for vision tasks gpt-5 might be preferable
In Chess testing it generated ~18k tokens per move, though sometimes racking up 50-70k reasoning tokens. It excelled in reasoning chess, placing 10-0-0 with 96% avg. accuracy, vastly outperforming gpt-5 and beating the strongest competition; currently #1 with a substantial 150 Elo lead.
Thus, some of its coding optimizations might be surprisingly beneficial in seemingly unrelated areas.
At same API pricing as gpt-5, the biggest draw could be the decreased token use, though that heavily depends on the use case and exact environment.
Obviously YMMV.
finally some movement in the chess bench!! 🥳
yea, quite expensive to add though, about 200 bucks (~120 for codex, 80 for opponents)
yikes. your work on this is highly appreciated tho
128k max output got reduced to 64k ? hitting 64k tok limits (lenght end), charging $0.641, delivering 0 tokens.used to hit 70k just fine
@buoyant current what happened to model output limit? makes chessbench impossible now since its on loop of lenght limit
can you set a max_tokens on your end?
i have unlimited, exact same inference settings (untouched) in both highlighted calls
we set a default to 64k due to load balancers not liking what we had been doing
you will have to more explicitly set a max_tokens yourself now unfortunately
for >64k
will change apps..