#GPT-5-Codex

1 messages · Page 1 of 1 (latest)

buoyant current
hot tinsel
#

YES

#

does setting reasoning effort affect the model? or does it always "pick" on its own?

tame kestrel
#

well this is awkward...

copper bone
#

refuses to do any roo tool calls lol.

hot tinsel
#

the model has very specific format that you need to follow

#

you need 2 tools, apply_patch & shell with openai's specific format

#

and system prompts have to be short

surreal grotto
#

Tested GPT-5 Codex:
Coding specific GPT-5 version with emphasis on agentic coding.

  • used 43% less tokens than gpt-5 in my general purpose benchmark (73% tokens spent on reasoning)
  • roughly same performance as gpt-5, though stem/math performance was weaker
  • saw no improvements in non-agentic coding tasks
  • vision testing scored between gpt-5 and gpt-5-chat, thus for vision tasks gpt-5 might be preferable

In Chess testing it generated ~18k tokens per move, though sometimes racking up 50-70k reasoning tokens. It excelled in reasoning chess, placing 10-0-0 with 96% avg. accuracy, vastly outperforming gpt-5 and beating the strongest competition; currently #1 with a substantial 150 Elo lead.

Thus, some of its coding optimizations might be surprisingly beneficial in seemingly unrelated areas.
At same API pricing as gpt-5, the biggest draw could be the decreased token use, though that heavily depends on the use case and exact environment.
Obviously YMMV.

hoary hound
#

finally some movement in the chess bench!! 🥳

surreal grotto
hoary hound
surreal grotto
#

128k max output got reduced to 64k ? hitting 64k tok limits (lenght end), charging $0.641, delivering 0 tokens.used to hit 70k just fine

surreal grotto
#

@buoyant current what happened to model output limit? makes chessbench impossible now since its on loop of lenght limit

buoyant current
surreal grotto
buoyant current
#

we set a default to 64k due to load balancers not liking what we had been doing

#

you will have to more explicitly set a max_tokens yourself now unfortunately

#

for >64k

surreal grotto
#

will change apps..