#codex-discussions
They likely do, but you can easily "trick" the model into thinking it's your repo when it's not, so the ID verification is deserved
That's literally what I'm saying
I don't think it's complete garbage. I've had a great amount of success with the compound engineering plugin.
The reason I like Codex is that it just thinks better and doesn't rant.
Its dumb, token inefficient, has dementia and hallucinates
Looks like you got a beef 😂
4.6 is literally quantized 4.5 with more thinking
🤣
That I agree
Here is a radical idea for Codex App: human2human chat.
People want to share their works, share their threads, prompts, skills. And the best way to do it is enable teams to work on the same thread
Is anyone else not seeing reasoning messages in the CLI?
I’m only seeing the title when it’s reasoning now, no subtext
Guys, what is the point of a 200$ ChatGPT subscription for coding if I can have Codex 5.3 and all the other LLMs for 200$ too on Cursor?
You need to do a 60-100$ unlimited monthly offer for vibe coders
Same
And you have nearly unlimited on the 20$ sub either way
I finished my weekly token in a day
20$ membership is great for 2h of coding a day
For 10 hours it’s not enough
2X rate limits til April in Codex
OAI subsidizes Codex usage more than pure API rates
Ok thank god it’s not only me..
What’s up with that??
only in the app though
They did this with cloud tasks at the start as well
I thought we dispelled that myth.
They tweeted that it’s across the Codex suite of apps
I can confirm from my daily usage at the moment, that the rates are double. I'm using the VSC extension for the most part.
I was coding 20h total nonstop for two days and still 60% remain
They said app, cli, web, all
I finished my GPT pro weekly limit on the 2x usage bro
Gpt or codex limit?
And are you sure you logged in?
I see some reasoning. If I hit Ctrl+t, I see a lot of reasoning
it's easy to hit even pro limits when youre parallelizing stuff
after playing with 5.3 codex for a solid 4-5 days, I'm heading back to GPT 5.2 (non-codex). I consistently find that 5.3 avoids reading long files, and lots of files. It produces incomplete implementations and ships bugs fast, whereas I prefer 5.2 one-shotting tasks and shipping working results in however much time it needs. I even put an agent on it, to keep sending the sub agent back again and again until it finally reads the 60+ components in a Vue front end (PatternFly).
non-codex 5.2 is just extremely good, yes
but i'm finding 5.3-codex good enough for most coding tasks right now, and it's just so fast
this was the death of claude, though. "you're absolutely right, i didn't actually implement...."
i hope regular 5.3 will not lose these extremes of 5.2 that i loved
and i hope -codex variants will stop losing them during code tuning that oai is doing
yeah openai ties speed to adoption, they say. Im fine with a fast inferior model as long as they keep the superior ones available. The inferior fast ones are a nightmare for me because my codebases are big and complex.
sub agent just keeps giving the orchestrator the finger... It's not doing it
Can you try adding a boat?
i wonder if the sub agent can get the orchestrator to use threats and profanities if it keeps refusing to put in the work
boom that's how 5.2 rolls in. "separate work tree to check everything twice". keep talking dirty to me, baby.
random Russian in a response from gpt-5.2-xhigh, I've definitely never written a word of Russian to it lol
that's a documented sign of degradation.
should use /feedback to report it
is there any way to /feedback it if I've already sent it further messages since then?
yeah it will ask to send session history
thanks
I tried but the floating physics weren't very good
Getting really polished results was not working so well
K at least thanks
Where is this documented? I've never heard of that before
@warm rain But you have the repo. Feel free to try and throw me a PR if you wish 🙂
Ok 😄
https://arxiv.org/abs/2406.20052 "The phenomenon of LLMs inserting foreign language words—particularly English—into non-English outputs is documented as a sign of degradation in the research paper "Understanding and Mitigating Language Confusion in LLMs" by Kelly Marchisio et al. (2024).
This study introduces the Language Confusion Benchmark (LCB), which evaluates how often LLMs fail to maintain the user-specified language, especially in non-Latin scripts. It identifies that models like Llama Instruct and Mistral frequently generate responses with unintended language switches, even when prompted clearly.
Key Finding: Base and English-centric models are most prone to this issue, especially under high sampling temperatures or complex prompts.
Mitigation: The paper suggests few-shot prompting, multilingual supervised fine-tuning (SFT), and preference tuning as effective countermeasures"
Thank you!
codex has become slow again
plus sub?
i dont know but on pro even with xhigh it's fast
im on pro
cli or app?
did you see ppl were getting routed to 5.2, could that be it?
They did a fix but you never know
no its fine in CLI
How you finding 5.3 anyway?
seems smarter as well
yeah
perfect, all good then.
the difference between 5.3 codex and 5.2 is noticeable. 5.3 codex is faster, smarter and more token efficient compared to 5.2. Other than the ID requirement fiasco, I also recommend it. If you're not flagged then you don't need to worry about the ID requirement though, appears around 9% (if I recall right) of users were impacted by the overflagging incident
uh i had to go give my id, probably because i was working on some encryption stuff for medical records.
haha fair enough 😛
false positive, bit of a drama. Couldnt imagine being banned o.0
yeah unfortunately it was poorly executed and implemented, but lessons learned hopefully on OpenAI's side
Why don’t these big tech companies just use localhost? Are they stupid? My localhost app is fully unhackable!
5.3 thinks slightly less than 5.2 noncodex
But its faster and smarter so its not an issue
yup, considering how it thinks less but is smarter, it's impressive
If an LLM says the wrong word, then either the context is filled too much or it was quantized too much
Meanwhile opus is the exact opposite lol
Thinks a lot but complete buffoon
yeah, and that fast option, that has to be an April fools gag. Way more expensive for possibly twice the speed
The fast option aint even real thing
Its not faster model
Its "priority queue"
🤣
Pay even more for making it remotely usable 🥀
ah, so fast isn't even true, it's just the same concept that OpenAI use which is included for Pro plans, priority processing
Yeah lol
😂
haha
jees, I'm glad OpenAI include that with Pro, without paying extra
true, 5.3 codex is a good example of that
Yeah they are making model smarter and more efficient, instead of quantizing/compressing it and telling it to think even more
Waste of money, more hallucinations, more dementia, worse performance
It's only 20% faster though
every little helps
yea... plus 20% faster on top of the previous overall speed improvement of the standard queue
20% faster backandforth, iteration, planning and implementation for a model that can work for hours is quite significant
20% speed boost priority, but the model itself is already crazy fast
Nothing like opus or so
How many people here experience crashes of codex desktop?
Found this here: https://github.com/openai/codex/issues/11016
It crashes on my end even when idle (Mac OS Tahoe)
Mac os issue likely
Rip
I have had no crashes, macOS 26.2 (25C56)
Version 260208.1016 (571) on macOS 26.2 on a Macbook Pro, M2 Max, 96GB RAM. I'll update Codex
ok, now on Version 260210.1703 (602)
I heard that since the latest two releases it started to happen
I am also experiencing UI lags, the cursor not changing, not being able to click anything, and so on
working perfectly here
I am going to further investigate if that might be colliding with other Codex instances running on other IDEs
Thanks for your Info!
Windows fork? From OpenAI or 3rd party?
3rd
Hey which codex model the plus users have access to?
submit button is non-functional on visual studio code extension?
All?
Don’t see any option to choose
/models
Or if you're in the extension it's in the chat box now
I’m on web and /model doesn’t work
Oh that's the cloud version I thought you meant codex. Cloud doesn't let you pick.
It also uses 25 * more rate limit. So I recommend switching to CLI or the extension as soon as you can
What’s the cloud default model?
I recommend switching to the CLI or extension as soon as you can
alright ty i was just testing before installing anything
gpt-5.3-codex (low, med, high, xhigh?), gpt-5.2-codex (low, med, high, xhigh?), gpt-5.1-codex-max, gpt-5.1-codex-mini – according to /models
Any idea why Codex can perform reviews on PRs for my private repo, but when I ping it to make the corrections, the web task says there's no upstream set? How do I configure that? I assumed it could do it since it can already review and I have the GitHub connector linked, from what I can see.
If you want a Cloud like workflow I have a program I made that works 1 to 1 like cloud but uses the CLI if you want a link just ask <3
codex is much quicker today than yesterday. thanks
why can't i see gpt-5.3 codex in /model? i'm using v0.98.0
I dont have it either, it hasnt deployed to API codex users yet
you guys are flagged for misuse
tried doing anything hacky?
going to have to ID verify to keep using it lol
it should be available in the cli for users on a paid plan no?
mmmm what is this memory v2 i keep seeing in openai/codex commits
i doubt i'm flagged, not doing anything weird.
and if i'm flagged for misuse 5.3 requests are rerouted to 5.2 but 5.3 should still be visible in /models
How do you find out if you've been routed or not?
this is interesting.. but what exactly is this misuse flagging?
I mean, if it flagged something as a misuse, why route to 5.2? shouldn't it block the request outright?
what is the interest in fulfilling a request that was flagged as misuse?? sounds counterintuitive
but well, there must be an explanation
it poisons the training data
everything you do gets integrated into the model
scary stuff
google red teaming practices and data poisoning
??
what I mean is.. if the request was flagged, why is it routing to a different model instead of blocking it?
capable of what? 5.2 isn't capable of being misused but 5.3 is?
the routing is not per request. once you are flagged, your whole account is, and every request you make will be rerouted for a certain amount of time.
5.3 is likely the base for upcoming releases/ new builds
this makes a little bit of sense
but I find it odd that the system would be able to detect misuse and still opt to fulfill the request with an older model
if they outright rejected your request, you would easily be able to figure out how their filtering system works and bypass it
yea.. if the account is flagged.. it should just block the whole account, lol
I mean.. ok.. maybe the detection isn't very reliable and they don't want to block an account just because they have a suspicion...
so... well.. the detection system isn't reliable so it shouldn't be deciding to downgrade the model anyway, I guess
I'm sure there must be reasons beyond what we can see, things are usually not this black and white. I'm just speculating on the oddness of the presented information
once you verify your identity your requests bypass the detection anyway. they are playing it safe, its way too easy for a state actor to buy a bunch of accounts with stolen credit cards and abuse them.
more evidence of something strange going on with 5.3 codex
Hello everyone. I just wanted to say that I've been having issues using Codex IDE recently. With the new updates it has a lot of memory problems that it did not have before. It slows down and freezes my entire system as context grows. This did not happen at all before, and it was very snappy. Another change I noticed is that the backend does not clean up context like before. Before, it would clean up context automatically each time the agent did research and during idle times. Now it does not clean context anymore, except when the context approaches 80%. I never used to run out of context; now it's compacting a lot. I don't know why these changes were made, but it was way better before.
there are many cases of false flagging, they're starting with very strict detection but they're relaxing the filters over time as they process the data based on their postings on X
yea, makes sense
it's also much worse quality....
Security by obscurity is not security, that's day-one infosec.
A system should be secure even when the attacker knows exactly how it works.
Hiding the rejection doesn't make the filter stronger, it makes it untestable.
And bad actors don't sit there iterating against your filter, they use open source models locally with zero restrictions.
The only people the silent downgrade affects are paying customers doing legitimate work.
That’s a nice motto I like it, Im guessing obscurity is needed in some situations though (obscurity + security)
A system should be secure even when the attacker knows exactly how it works.
LLMs inherently don't work like that. They are not deterministic. No LLM filtering can ever be 100% accurate.
Well if you have low temperature then you probably can
But I think they removed the temp param
Even at temperature 0, LLMs are not deterministic. There's a ton of other variables to consider. Even the GPUs they run on can influence their output. There's research on this.
You just made the case against the approach.
If the filter can never be 100% accurate, then silently downgrading paying users is guaranteed to hit innocent people, which it did, at 9% or higher, their numbers.
The minimum is notification so users can appeal.
Silent degradation of a paid service based on a known inaccurate filter isn't safety it's negligence.
I am not defending their approach? I am saying I understand why they implemented it like that. And they already said the 9% ban rate was a mistake, and unbanned everyone who was flagged. Do you expect every single company on the planet to never make a mistake? They are handling it better than a lot of other companies would already anyway (coughcough anthropic coughcough)
They silently rerouted our traffic, that is intentional.
That tweet (or whatever they call it these days) leaves so much to interpretation, but seeing as how they are not just blocking accounts entirely I think the reason they're routing to 5.2 is because it's been more thoroughly tested and they know the refusal rate is extremely high for cyber-abuse, whereas 5.3-codex needs more testing to ensure its refusals are strong enough. And they don't wanna block valid uses, they just wanna limit the blast radius.
Anthropic literally was serving worse versions of their models to their users for weeks if not months with NO way of knowing at all
And they literally denied it until they were forced to admit it
And we should not accept this from any company Anthropic, OpenAI, or anyone else.
Since when is silently degrading a paid service the standard?
These practices harm paying users doing legitimate work while real attackers use open source models locally and are completely unaffected.
The only people caught by these systems are the compliant ones, because the non-compliant don't use services tied to their identity and credit card in the first place.
yep, this seems accurate
Posting for anyone from Codex team to see this.
with gpt-5.3-codex xhigh on Pro plan
I'm seeing faster output but..
- No reasoning blocks
- Heavy use of bulleted lists
- Repeatedly compacting
- Spawning subagents without being told to
How do ppl handle rules and permissions in a reliable low friction way?
I want to prompt on destructive commands and allow safe ones, but it seems Codex can just use zsh and bypass any prefix rule, and we can't target the inner command in zsh wrapper commands.
Whats the solution?
not having this
Me either it is working well for me
welp... not sure what to do
I use trash-cli instead of rm so the agent can delete files using the trash can, and it's undoable. Then hard block the use of rm
Im more worried about things like git checkout nuking local changes or --force commands
Is the LiveBench updated for Codex 5.3? I don't see it https://livebench.ai/#/
Oh yeah, I think Codex config supports matching specific commands like that, so you can pattern match git checkout* and block it. As for the force commands I think you can pattern match --force too. They've made the sandbox settings pretty customizable
It does, but then Codex makes compound calls using zsh like this (below), so a prefix rule on cat is bypassed and it can just do it anyway. We can target zsh, but the inner target is a string match, so we can't use it to target cat in the inner command.
Meaning codex can do it anyway
rg -n "zsh -lc 'cat > /tmp/ec_test.txt'|resolves to "'`allow`|updated to reduce the /bin/zsh -lc|=> `allow`|matchedRules: '"\\[\\]"
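For anyone who wants to experiment with the wrapper problem outside of Codex, here is a rough Python sketch of unwrapping `zsh -lc '...'` and checking the inner command against prefix rules. The rule table, the `decide` function, and the "prompt"/"allow" return values are all hypothetical names for illustration; this is not how Codex's own rule engine works or is configured.

```python
import shlex

# Hypothetical rule table for illustration only; not Codex's real rule format.
PROMPT_PREFIXES = [("rm",), ("git", "checkout"), ("git", "reset", "--hard")]
SHELL_WRAPPERS = {"sh", "bash", "zsh", "/bin/sh", "/bin/bash", "/bin/zsh"}
SEPARATORS = {"&&", "||", ";", "|", "&"}


def tokenize(command):
    """Split a shell string into words, keeping operators such as && as tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""
    return list(lexer)


def simple_commands(command):
    """Yield each simple command as an argv list, descending into `zsh -lc '...'`."""
    argv = []
    for token in tokenize(command) + [";"]:
        if token not in SEPARATORS:
            argv.append(token)
            continue
        if len(argv) >= 3 and argv[0] in SHELL_WRAPPERS and argv[1] in ("-c", "-lc"):
            # The wrapped command is a single string argument; parse it recursively.
            yield from simple_commands(argv[2])
        elif argv:
            yield argv
        argv = []


def decide(command):
    """Return "prompt" if any simple command, wrapped or not, matches a rule."""
    for argv in simple_commands(command):
        if "--force" in argv:
            return "prompt"
        if any(tuple(argv[:len(prefix)]) == prefix for prefix in PROMPT_PREFIXES):
            return "prompt"
    return "allow"
```

With this, `decide("/bin/zsh -lc 'git checkout -- .'")` is treated the same as the bare `git checkout -- .`. It only handles simple quoting and the operators listed in `SEPARATORS`, and as noted elsewhere in this thread it cannot stop an agent from deleting files through a script instead.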
I don't believe so, but personally I find voratiq's leaderboard a nice one to follow. At least it seems to align with my own perceptions - https://voratiq.com/leaderboard/ (reposted as I forgot to tag reply)
Guys, for those on the ChatGPT 200$ per month plan: are the limits good? I mean, would almost all-day coding with Codex be fine? I'm not an extreme coder, but I want to not have to worry about speeds and limits
pretty much, especially with GPT-5.3-codex on high
I consider myself a reasonably heavy user, but best I've reached on 2x rate limit is 50% before weekly reset
I was confused between choosing claude code max vs getting chatgpt pro but going to get chatgpt just because I like working with Codex
It got its quirks sure but I just find it more sensible
the amount you get from Claude plans compared to ChatGPT plans is fairly night and day in my opinion. I feel you can get a lot more out of ChatGPT, even on Plus
also don't forget that the quotas for Codex and ChatGPT are separate
wow really
yup, unlike Claude, ChatGPT's quotas are separate. You have quota for ChatGPT and you have quota for Codex stuff
This is really good part of it
Yeah, it can indeed find workarounds. It could easily run python code that deletes files too. But I think if you set up a rule that catches git checkout and does "prompt", that gives you the opportunity to deny the request and steer them away from using it. If you do "deny", it might not prompt you, and they will work diligently to find a workaround like that. It's a tradeoff for sure but "prompt" is the way to go
Codex is pretty good at following rules, but in the cases when it decides not to, the prompt is a good realignment opportunity
I do have them set to prompt, and it's not about it trying to find a way around them. It might just decide it needs a compound command for whatever it's doing, so it slides right past the rules.
yes its enough for fulltime coding with a few parallel agents
Does the app have a way to pull/rebase after you merge a PR on Github?
I cant find it
Guys when windows codex app coming?
tomorrow probably, official X post said end of this week
idk if they mean the work week (friday) or the entire week (sunday)
lets go that would be nice.
has anyone managed to create a team of SKILLS for a given workflow?
I created a "hey" CLI script. So "hey do this thing for me". That statement as-is will execute anything that is non-destructive. But "hey do this other thing!" with a trailing exclamation mark gives it permission to execute anything without permission. If I say "hey do this destructive thing" without the exclamation mark, it'll warn me. So the tools are available for all of us to use.
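A minimal sketch of how the trailing-exclamation convention could be parsed, assuming a simple keyword check for "destructive" requests. The original script was not shared, so the `plan` function, the mode names, and the word list below are hypothetical.

```python
# Hypothetical keyword list; a real script would need a more careful classifier.
DESTRUCTIVE_WORDS = {"delete", "remove", "drop", "reset", "force", "overwrite"}


def plan(request):
    """Map the text after `hey` to a (mode, task) pair."""
    text = request.strip()
    if text.endswith("!"):
        # Trailing exclamation mark: run without asking for permission.
        return "run-unrestricted", text.rstrip("!").strip()
    words = {word.strip(".,").lower() for word in text.split()}
    if words & DESTRUCTIVE_WORDS:
        # Looks destructive but no exclamation mark: warn instead of running.
        return "warn", text
    return "run-safe", text
```

The mode would then decide which approval or sandbox settings the wrapper passes to the agent; that part depends on the tool and is left out here.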
I just found out yesterday that with Plus I can burn through Codex on 5.3-medium all day long and not consume quota. Try that first.
So I'm supposed to believe this is normal 5.3 xhigh model behavior?
I'm seeing no reasoning output, long stacks of tool calls, numerous ------ dividers, Worked for 1m in every output
this is normal behavior
model tells you what its doing
it's one of the benefits that was explained in the release
these --- dividers and "worked for 1m" things are parts of CLI, not model's output
yes I'm aware of what CLI output is supposed to look like
so, what's wrong then?
just run the command to check response model, if it says gpt-5.3-codex, thats what ur getting served
this chart doesnt show any numbers for reference so i dont know what should i understand from it 🙂
i'm not seeing any reasoning output
the model is significantly faster
makes me think it's a different model being served
reasoning output is hidden in most cases when using 5.3, since the model updates you as it goes anyway. 5.3-codex is also significantly faster by design
there was a verbosity setting iirc
I've been using 5.3 since the hour of its release, this is not the same 5.3
wym? where s that
ask it what model is it
RUST_LOG='codex_api::sse::responses=trace' codex exec --sandbox read-only --model gpt-5.3-codex 'ping' 2>&1 \
| grep -m1 'SSE event: {"type":"response.created"' \
| sed 's/^.*SSE event: //' \
| jq -r '.response.model'
"model": "gpt-5.3-codex",
"reasoning": {
"effort": "xhigh",
"summary": "detailed"
},
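If you'd rather not depend on grep, sed, and jq, the same extraction can be sketched in Python. This assumes, like the pipeline above, that the trace output contains a line with `SSE event: ` followed by the `response.created` JSON payload; `served_model` is a hypothetical helper name, not part of Codex.

```python
import json

MARKER = "SSE event: "
EVENT_PREFIX = '{"type":"response.created"'


def served_model(log_text):
    """Return .response.model from the first response.created event, or None."""
    for line in log_text.splitlines():
        # Keep everything after the last marker, like sed 's/^.*SSE event: //'.
        _, found, payload = line.rpartition(MARKER)
        if found and payload.startswith(EVENT_PREFIX):
            return json.loads(payload)["response"]["model"]
    return None
```

Feed it the combined stdout and stderr of the `codex exec` command above (for example captured with `subprocess.run(..., capture_output=True, text=True)`), and compare the result with the model you asked for.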
i've also been using 5.3-codex from the hour it's been released. yesterday and the day before there were rerouting issues (which you could check with the command i sent above). they are fixed. the model behaves the same as it did before for me.
numerous ------ dividers, Worked for 1m
this is literally the main difference between 5.3 and 5.2... 5.3 updates you as it goes and therefore you see those dividers.
Do you still see reasoning sections like this?
reasoning sections are hidden by default when using 5.3
press ctrl + t in the tui and you should see them
uhm wat.. that's not the UX I've been seeing in my CLI
custom cli ui
your vanilla terminal/powershell wont look like that lmfao
the average ai coding agent user has no idea what sorcery real coders used to do prior to the advent of these tools lmfao
i just mean "reasoning sections are hidden by default" is not the UX I've ever seen with 5.3
oh its in the settings
type / and scroll through the cli setting options
i think hiding it saves some context
that's what i felt too
the summaries are perfect
I think there will always be user preferences for how much "thinking" to see and the format. The text Must remain in context - it's a part of the "context" which allows it to continue to do what it's doing. The only issue is how much we see and what it looks like.
Maybe there should be a callback from the CLI output where we can determine or define our own output for these things, and maybe other things like the language, timestamps, etc. AND/OR, maybe more correctly, we just need to drive for OpenAI to add config.toml props for just about anything they hardcode.
wdym
vanilla macos terminal
Yeah there's no shot.. something is very wrong
5.3 xhigh in my CLI is ripping through the code base like a mad man
eating up the context window insanely fast
doing ridiculous searches and file reads
am I on a Cerebras test release or something? get me out of here
"eating up the context window insanely fast" - that was my experience since first hour
I've written on this topic before : Don't just tell the assistant to do stuff. Add explicit directives in AGENTS.md to create documentation as .md files in a /docs folder, with an index and links to files, so that it can later learn everything about the project before it goes to work. This almost entirely eliminates all of the random 'ls' and 'grep' commands that it needs to issue to guess and hunt all over your application to figure out how it works for every single query.
Does anybody use the codex PR/Review/Fix Issue work flow?
i always have auto reviews on PRs enabled
Create AGENTS.md in every project as well as at the workspace and server levels. Give the assistant some clue about the layout of your projects.
You don't need to do this manually. Tell it to create this information as it discovers resources and assert that it will be used by the assistant on all queries. It will tailor the information and it will get better over time.
Yeah I'm just wondering why sometimes Fix Issues is greyed out and sometimes its not
No no this is something totally different..
I have 5.3 xhigh traces from yesterday and today
Something else is going on here
GLM-5 dropped today. it's a good release. I'm using it in both claude code and opencode. It's a good replacement for anthropic. I think it will hurt them a lot (they deserve the pain anyway for how they played things when they were a monopoly). it's a sweet complement to codex-5.3 (not everything needs 5.3 to implement, and oai rate limits suck lately, 2x is what should be 1x, so if they go back to 1x, codex will lose a TON of value)
Unfortunately they raised prices, but if you are an existing subscriber no change in pricing so its a hell of a deal for those of us in that boat
Does someone understand what this means?
By my feeling, the 5hrs window is resetting at 9PM which is indeed ahead in my future... but the weekly resetting at 1PM makes zero sense, 1PM is long gone here today
And if I were to assume those to be UTC or anything else, then the 5hrs window reset makes no sense anymore
perhaps it means some other day but doesnt show it
the 5 hour limit is how much you can do in a 5 hour window. For you, your weekly reset will happen before your 5 hour window resets
check in web version of codex
I'm aware what the limits represent, but the times make no sense
Here now it is 17:45
So... when does my weekly reset lol? 1PM is either tomorrow or today past. 9:55 is to come.
ah ok - web version has dates.
Lol, so once they go down to 1x, I will be basically at 0. Hopefully by then the 5.3 is api-available!
Usually, if you're in an ide codex, you'll see the weekly date, but if it's less than 24 hours, it just shows you the time it resets, which can be confusing
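To make the confusion concrete, here is a small sketch of the display rule as described above (time only when the reset is less than 24 hours away) next to an alternative that drops the date only for same-day resets. Both functions and the example dates are hypothetical; this is not the extension's actual code.

```python
from datetime import datetime, timedelta


def compact_label(reset_at, now):
    """Mimic the described display: time only when under 24 hours away."""
    if reset_at - now < timedelta(hours=24):
        return reset_at.strftime("%H:%M")
    return reset_at.strftime("%Y-%m-%d %H:%M")


def unambiguous_label(reset_at, now):
    """Drop the date only when the reset falls on the same calendar day."""
    if reset_at.date() == now.date():
        return reset_at.strftime("%H:%M")
    return reset_at.strftime("%Y-%m-%d %H:%M")


now = datetime(2026, 2, 11, 17, 45)      # made-up date, "now it is 17:45"
weekly = datetime(2026, 2, 12, 13, 0)    # weekly reset at 1PM the next day
print(compact_label(weekly, now))        # time only, which reads as already past
print(unambiguous_label(weekly, now))    # includes the date
```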
ok
Well I have to check in the IDE since the CLI does not show usage limits etc (or I am too obtuse to see it)
in CLI you can run /status command
Right, thanks. Too obtuse, clearly lol
Is anybody else having problems with the Codex VS code extension? Particularly when codex asks to run a command, and you're not able to approve/deny, its just frozen there and perpetually awaits permission.
@lusty nimbus honestly after trying the CLI, I am not going back to anything else
These UIs are all unstable, annoying, distracting, and I even feel the models just work better in the terminal, although that is probably just a feeling, not real
So while I cannot see the issue you mention, I can suggest trying the cli
Its truly impressive.
Got it. Thanks! I'll definitely try it out
0.99 is out
so you basically add an AGENTS.md in the root (or wherever) and in it you write "Create a /docs folder in the project root to document everything of importance for development, and keep it updated", something like this?
This might be interesting because honestly my projects are sometimes so large even the creator does need to dig code to remember why 🤣
The UI isn't relevant. That topic is about directives in AGENTS.md to document projects. Then regardless of which UI you are using it will always use the same directives.
Exactly - been there, done that. Younger and less-experienced developers tend to eschew documentation. FOSS repos typically have none aside from the README. So it's left to each individual to slog through the entire code base to understand how it works. That's a completely unnecessary waste of time for everyone.
We now have the ability to get the bot to document functions, special variables, types, components, modules, build rules, schema, db usage, coding patterns, preferences, deployment requirements and pre-flight, environment requirements, and all of these other things that we need to just internalize by osmosis as we try to understand code ... even stuff that we wrote ourselves last week. This costs us no time or money and saves a lot of both.
More evidence of 5.3 codex xhigh misbehavior...
spawning subagents when not specifically told to
i guess system prompt mentions that it should use subagent for exploration
nothing wrong about it
ok great, more opaqueness, unclear tool use, and potential to exponentially increase cost
Won't be exponential - that was patched.
Is there a skill which summarizes the work done in a session, so that one can quit the current session and start a fresh one, without losing lessons learned? Something like this https://x.com/zarazhangrui/status/2020992712825241801?s=20
yes i built one. trying to improve it
but really you need agent memory to perfect it
i'll share it with you hold up
agent memory is coming so this is just a bandaid.
@potent mason I'm glad I gave the app an honest try, this worktree flow is ridiculously nice lol
My productivity has like quadrupled, good thing they have the 2x limits or I'd be crying right now
Looking forward to it! But in meantime, I'll gladly use the bandaid 🙂
I'm doing this a lot too. I have the assistant in the first session summarize the session, and tell it something like "include everything required for the next assistant who will continue our efforts". I tend to add phrases like "include insights, lessons learned, and helpful knowledge" ... but I'm completely inconsistent about this.
Yeah, that should be in a Skill.
Or, what they want ultimately, is that we'll be paying for more credits to do more of what we want. To me that's completely fair.
Um, isn't this stuff in the Open Source code?
I'd just use Opus when I cap out, be cheaper and more predictable to pay a consistent lower sub for like a 5x than to pay additional overpriced costs for extra usage imo
Also I think using Code Review usage more reliably would save me on weekly usage, but the flow still seems kinda buggy, sometimes it doesnt give me the option to perform the fixes
best way to handle this is SQLite but i'm not putting in that kind of effort when OAI is <1 week from launching an official version heh
In before version 0.101
put this in your agents file
Agent Memory
Search your CWD for .codex/memory/MEMORY.md at the beginning of each session. If it exists, load it into memory and take note of any memories/instructions. If it doesn't exist, just proceed forward with your task.
let me make a 'create memory' skill
this will be like init, you need to do this first
but basically in your CWD/.codex make a folder called /memory/
and then add a file called MEMORY.md
Project Memory
Active Handoff
There is a session handoff document at HANDOFF.md in the same directory as the MEMORY.md file ( CWD/.codex/memory/HANDOFF.md).
Read it first at the start of any new session to pick up where the last session left off.
Note: The handoff may not be relevant to your current session, it may be stale. Use your best judgement.
Key Learnings
-learnings
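If you want to script the init step instead of creating the folder by hand, a minimal sketch could look like this. The `init_memory` name and the markdown heading levels in the template are assumptions; the headings are trimmed to the "Project Memory" and "Key Learnings" sections from the message above.

```python
from pathlib import Path

# Template paraphrased from the chat; the heading levels are an assumption.
MEMORY_TEMPLATE = "# Project Memory\n\n## Key Learnings\n\n- learnings\n"


def init_memory(repo_root):
    """Create <repo_root>/.codex/memory/MEMORY.md if missing and return its path."""
    memory_dir = Path(repo_root) / ".codex" / "memory"
    memory_dir.mkdir(parents=True, exist_ok=True)
    memory_file = memory_dir / "MEMORY.md"
    if not memory_file.exists():
        # Never overwrite an existing memory file.
        memory_file.write_text(MEMORY_TEMPLATE, encoding="utf-8")
    return memory_file
```

Running `init_memory(".")` from the repo root is safe to repeat, since an existing MEMORY.md is left untouched.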
I'm glad I could help👍
yeah but CWD
current working directory
/path/to/repo/.codex/memory
you could do your base .codex but then you'd have to manage the project folders
.codex/project/memory/memory.md
thats prob how OAI will do it
🩹
@ivory zodiac do you have a X handle? In case I share this publicly, I'd like to give you proper credit. You can DM it to me if you prefer not to write it here
I copied this and will audit and implement sometime today. Thanks!
cool! enjoy. its not perfect!
it will read HANDOFF every session
if HANDOFF isn't relevant, you just loaded irrelevant context.
thats not necessarily good
i just had an idea.
DON'T add this to your agents file
well hmm actually this is tough
you can make a skill or explicitly say "read memory"
and not, when you dont want it to
this is a tricky one because you want it to read memory
Okay actually yeah here's what i'd do.
I would keep this in your AGENTS file
I would remove the HANDOFF instructions INSIDE the memory file.
yes, I was thinking of something like that. There should be a /handoff skill and a /read_handoff skill
(or /write_handoff and /read_handoff)
yeah you got it from here
so, keep the "Agent Memory" section in AGENTS.md, but remove the following from the memory file:
Active Handoff
There is a session handoff document at HANDOFF.md in the same directory as the MEMORY.md file (CWD/.codex/memory/HANDOFF.md). Read it first at the start of any new session to pick up where the last session left off.
correct?
yeah if you dont want it to read handoff every time
which you probably dont.
i was honestly just testing it out to see if it would reliably read it
and yeah it does
Gotta separate what AGENTS defines for consistent behavior compared to what you want just right now. Be explicit about what the cues are to invoke desired actions.
i dont think you need a read handoff skill tho
just @handoff.md
handoff overwrites the old one every time so there's only 1
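A rough sketch of the "only one handoff" behavior, assuming the handoff lives at .codex/memory/HANDOFF.md as in the messages above. The function name write_handoff is hypothetical and this is not how the actual skill is implemented; it only shows overwrite versus append.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a fresh HANDOFF.md next to MEMORY.md.
# Reads the handoff text from stdin and replaces any previous handoff.
write_handoff() {
  local root="${1:-$PWD}"
  local dir="$root/.codex/memory"
  mkdir -p "$dir"
  # '>' truncates the file first, so only the latest handoff survives.
  cat > "$dir/HANDOFF.md"
}
```

For example: echo "Stopped after wiring the login form" | write_handoff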
Note: AGENTS does not prescribe reading of supplemental files for instructions.
it will tho, reliably
its quite good at it
it might not 100% of the time but probably 98%
just keep your agents kinda concise, dont go crazy
It may, but directives outside of AGENTS are non authoritative. I'm running around right now and can't explain in detail. Look up docs and just ask the assistant in ChatGPT how exactly it's defined to work.
If it actually worked like that then we'd have different files for how it should process different details. That would be awesome but it just doesn't work like that ... yet.
i do understand that, but its the best we have right now, and they are authoritative enough to inject the proper context. it doesn't ignore it. it reads it, loads it into the cw, and allows you to pickup where you left off
for all intents and purposes, it is memory and works all the same
the only thing you need to worry about is if you are providing a crap ton of context, it might ignore some stuff.
but thats the case for anything really
I completely agree, we're on the same page.
i like how claude does memory
We could simplify the intent of all of this by telling the assistant to save contextually relevant details into a new context.txt file, and then open the next session with "Read context.txt and let's change ...".
hoping the new system from codex is good
has anyone found a way to mount multiple folders into one project?
yeah thats more or less what's happening but i just want it to do it automatically haha
That's called a workspace.
the key difference though is it wont just write memories without being asked to
but the handoff skill does that.
but i dont call the handoff skill every time.
Lovely, but is this possible in Codex?
the real system will use SQLite and should write memories on its own
well, should
idk how it will work in reality
Of course. Create a folder and symlink in others. If you can open it in VSCode or just see stuff from the CLI then it's "in-context".
solid.
didnt even consider that
that answers the whole monorepo question i get regularly
Doing that all the time ... PNPM also works on "workspace" concept so it's elegant with a single build for all related projects. Codex gets it, VSCode gets it, Linux gets it. Profit!
Oh and each project is still in its own repo. I also tend to have subfolders of sub-projects ... it all really works. Took a while to get brain around multi-project JS/TS projects, using @project/folder syntax but that was a good education.
So...
/opt/codex/repos/WorkspaceName
... Project1
Sub1 ... local under Project1
.git
dist/
tsconfig.json ... etc
Sub2 ... symlinked
.git
dist/
tsconfig.json ... etc
... Project2
.git ....
... refers to @sub1/ ...
AGENTS.md ... these files in workspace root
README.md
package.json (for scripts that build other projects)
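The "create a folder and symlink in others" approach could look roughly like this. The function name make_workspace is hypothetical, and the layout is only a sketch of the tree above, with each project staying in its own repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a workspace folder that symlinks existing repos.
# Usage: make_workspace <workspace-dir> <repo-path>...
make_workspace() {
  local ws="$1" repo abs
  shift
  mkdir -p "$ws"
  for repo in "$@"; do
    # Resolve to an absolute path so the link works from inside the workspace.
    abs="$(cd "$repo" && pwd)" || return 1
    # -s: symbolic link, -f: replace a stale link, -n: treat an existing link as a file
    ln -sfn "$abs" "$ws/$(basename "$abs")"
  done
}
```

Something like make_workspace /opt/codex/repos/WorkspaceName ~/src/Project1 ~/src/Project2 (the source paths are made up) gives one root to open in VSCode or start the CLI from; AGENTS.md and the root package.json are then added there by hand.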
@teal cargo just some feedback here. hope you're well.
in 0.99 you guys removed the ability to review subagent sessions when they are closed. This makes auditing impossible. Without any alternative to investigate what went wrong. You cant check them with /resume either. Once they are closed, they're just gone.
Back on the topic of docs, I've been asserting that we can get good quality docs from Codex which help the next assistants to process the next tasks. That's always been a subjective assertion: I believe it's true because it seems to be true. But I haven't been able to prove it quantitatively.
I'm now working with Codex to generate metrics to measure model difficulty in processing a project task with and without some documentation. That is, process a task on some undocumented functions, get a metric, add docs, re-run the task in a different session. Does the second assistant perform better with the docs? We shall see.... 🤓
@ivory zodiac I'm guessing you might be able to use this kind of A/B test to verify your notes about multi-agent orchestration.
i was using it a lot to 'perfect' the prompting of my swarms skill to make sure it got as much helpful context up front as possible
i can still do that, but ONLY when the agent is still active
i dont see why i shouldnt be able to go back and check them at any time in the session
oh wait
its back
nevermind VB
lmao
SORRY
I was using the alpha.
😀
I'm not crazy though. There was a note about this in the commits for 0.99 alpha, and i went and checked, and yeah, they disappeared after closing.
But now on GA its working right
thats what i get for ignoring the whole "alpha" disclaimer
It's not like they actually tell us what's changed in release notes for any of these products... 🙄
they do but that was in there
so idk if they changed it back
i'm not 100% sure.
but the reason i even tested it was because I read it in the commit notes.
Yeah, we shouldn't need to read GH commits ... and BTW, commit != merge.
I'm in OAI betas and we never get info about what's changed to see if they actually fixed or enhanced something before they push it to production. So why the beta? Just cuz someone said they need to beta?
( Former QA Manager here, very sensitive on this topic.... )
If anyone wants to see my AGENTS.md directives regarding documentation, I/GPT just abstracted them out to a separate document. DM me if interested.
5.3 vs 5.2 in my CLI today
both codex and xhigh
same prompt + plan mode
5.3 shows no reasoning, just tool calls. 5.2 shows reasoning
5.3 has 4% remaining while 5.2 has 86% remaining
This degraded model behavior started yesterday.. 5.3 had been providing phenomenal output since its release.
Consider 5.3 medium : Almost no quota consumption.
No reasoning from 5.3? My guess is that it's just not disclosing it, I think there's a verbose debug mode setting in config.toml.
We shouldn't need to guess on this stuff - should be documented.
Is my understanding correct - codex is provided the date in UTC and there's no way to configure this to align with system or local?
i ask because my codex is constantly time stamping logs and docs incorrectly and digging through the repo with codex returns no way to configure this?
@teal cargo ?
by incorrectly, I mean using UTC which isn't intuitively helpful for me
I know it CAN get that info but it shouldn't require an extra prompt or agents.md lines. should be provided at runtime, no?
its already being provided UTC. I should just be able to tune to my system/local time
Sandbox it myself, and run it with --dangerously-bypass-approvals-and-sandbox.
The word "should" is really subjective. All it takes is one line in AGENTS.md:
Whenever stating or logging time, use date +"%c" from the CLI to get the data.
Completely understandable. For the sake of UX and not wasting tokens, I'm proposing being able to tune the UTC injection to align with system.
I would agree, however, "align with the system" just imposes a different rule that others might not want. I'd suggest allowing the date to be tunable in config.toml.
But frankly if we're going to tune it anyway, why ask for the option to tune it 'there' when we can already tune it 'here'?
That's why i said being able to tune it. in case people don't want that to happen. and the difference is codex working more than it has to vs an instant runtime injection
But ... the Codex server doesn't know what your time zone is, that would have to be determined locally. I'm not disagreeing with the suggestion, I'm saying it's not been considered well enough to adapt to a worldful of other users.
Good thing I'm not a codex dev 😅
hehehe - How about this ... post the suggestion for the ability to tune in some way in config.toml. Until they process that in some way at some time in the future, use the solution that's easily available now and see how it works for you.
I need to think about this more... as I'm thinking about it it seems putting it in AGENTS.md isn't that ideal either.
In short, the model needs the data for output, so am I suggesting that it gets the data from the "local" server and then take it back to the model for incorporation into a response? That would be icky.
My original idea was that logging is or maybe should be done with code, and that code should do a local OS call to date +"%c" (or whatever your preferred format).
So I don't have as solid an answer for you as I thought I did.
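For the "logging should be done with code" idea, a minimal sketch, assuming a shell-based logger. The function name log_local is hypothetical; date +"%c" is the call suggested above and prints the locale's preferred date and time in the machine's local time zone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix a message with the machine's local time,
# so the model never has to supply or convert the timestamp itself.
log_local() {
  printf '[%s] %s\n' "$(date +"%c")" "$*"
}
```

For example: log_local "build finished" >> build.log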
Ask ChatGPT. 
did codex remove support for MCP in cli?
everything that required oauth just broke for me
Hummm so I added the AGENTS.md in ~/.codex (it was there already but empty)
Then I started codex in a project and prompted List the instruction sources you loaded.
It responded:
• 1. System instructions (global policy and tool-use requirements).
2. Developer instructions (coding agent behavior, sandbox/permissions, collaboration mode, formatting, escalation rules).
3. Repository AGENTS.md instructions provided in your message (/path-to-working-dir/).
4. Your direct user request in this turn.
No skill SKILL.md file was loaded, since no skill was triggered for this request.
I am not sure if it does or does not read the AGENTS.md file?
It seems it reads an AGENTS.md in my working dir, but there is none there
Also I would have expected it actually pasting the actual contents so we can be sure it is loaded
This is its content, its barely 4kb, so should not hit any limits
meh, seems to work
Does codex quota reset follow utc or my local time,
It says limits reset at 8 30, so is it 830 utc or my local time
which one is which? i see reasoning in both?
it does
my guy. the second one has
Reasoning Headers
paragraph reasoning text underneath
my guy what?
if you're gonna be a tool, you can just figure it out on your own. i was just trying to help 🤡 my guy
Local time im fairly certain
feelsbadman
It does and i ain't got my pc rn
To use some
I was gonna burn thru it yesterday but I fell asleep
i have 3 accounts, always forgetting to use all of them. doesn't feel good to waste it lol
the other half the time i have all my usage burnt on all 3
@torpid trout when it says "Developer instructions" it's referring to the current user's instructions in ~/.codex/AGENTS.md, with the fair assumption that the current user is a developer. Sometimes it simply doesn't even mention that file when it's absolutely using it.
When in doubt, just ask Codex or ChatGPT about exactly what it means, or the exact files used in a specific configuration. If you had asked for "file paths" rather than "instruction sources" you may (or may not 🤔) have gotten the specific path to ~/.codex/AGENTS.md. 🤡 (Anyone here being reminded (or not!) of the HHGTTG philosophers?)
i hope they launch a $100 plan soon
eventually they'll get tired of hearing about it
bruh
Codex app is driving me crazy
Hopping around trying to find the terminal that has the server running in my 100 chats is annoying
We should get a unified terminal or something
What’s the consensus? 5.3-codex-high or 5.2-high?
5.3
5.3 lol
5.3-c xhigh 😉
high is better for me. it is faster, and sometimes xhigh can overthink for little benefit. even detriment possibly
i use high for almost everything.
actually medium is pretty great too
I just use my Ghostty outside of it, I change the 'open' at the top to Ghostty
I find it just works better
5.2 xhigh
Ngl 5.3 codex is pretty garbage for me
Even on Xhigh I would say it's worse than 5.2 non codex medium. 5.2 Xhigh forget it
Have you tried tmux? 🙂
I checked and I'm not being routed
But it's just so bad
I have no idea how people say this is good
has it always been this bad for you?
or something recent changed?
Codex models always feel this way for me vs normal gpt models
Check your prompting, one file deep in a repo can do that...
I had to change my prompting from "keep docs" to "use code comments" and it fixed most issues
You can instruct it on the style you want in detail. I have this in ~/.codex/AGENTS.override.md.
## Coding Style & Naming Conventions
- In general, follow the guidelines in the books The Art of Readable Code and A Philosophy of Software Design
The Art of Readable Code
Core idea: Write code as if the next reader is a collaborator who needs to understand it quickly and correctly. Readability is a first-class requirement, not polish.
Practices to apply
- Optimize for “time-to-understand.” Prefer simple, explicit constructs over cleverness, even if they’re a few lines longer.
- Choose names that carry intent. Use precise nouns/verbs, include relevant units/constraints (e.g., timeout_ms, is_ready), and avoid vague placeholders (data, tmp, handle).
- Make control flow easy to scan. Minimize nesting, use guard clauses, keep the “happy path” visually prominent, and avoid surprising side effects.
- Reduce cognitive load. Break complex expressions into well-named intermediate variables; keep functions focused; keep related logic close together.
- Use comments to explain “why,” not “what.” Comment on intent, constraints, non-obvious tradeoffs, invariants, and “gotchas.” Don’t narrate code that’s already clear.
- Keep formatting consistent and informative. Use whitespace, grouping, and consistent patterns so structure is visible at a glance.
- Make tests and diagnostics readable. Tests should communicate intent and failure messages should help the reader localize and understand the problem quickly.
Red flags
- Dense one-liners, deep nesting, unclear naming, and comments that restate code instead of explaining intent or constraints.
Also, a section on A Philosophy of Software Design, but discord is eating that for some reason, maybe I'm too prolix.
Or.... The official instructions on the docs
Instead of some vibe coded 🥛 you found from someone farming engagement
I agree I just can't like the way codex models behave. And I've given 5.3 codex xhigh a really decent try but it just doesn't produce good results for me. It's choosing speed over quality which just means having to spend several sessions debugging. It skips reading large files and hallucinates the missing pieces afterwards.
I'm back on 5.2 xhigh since yesterday and perfectly fine with everyone else staying on 5.3 and its GPUs 😁.
Where is the best spot to get a sense of benchmark performance of different models on lesser known or more eclectic coding benchmarks?
Benchmarks are useless trash
Only a marker of how much a company benchmaxes
Unless of course one finds realistic the notion that models like glm and gemini are apparently on par with opus, gpt 5.2 Xhigh, etc
Which is obviously absurd
Big Facts
v0.100. I called it
5.3-codex-xhigh for most stuff
5.2-xhigh when the former fails
what is this?
@cedar skiff Release tags in the Codex github page
ahh ok, but what does it mean?
The way I interpret it, is that after v0.99 does not come v1.0, but v0.100 🙂
I was thinking you had some juicy tin foil hat ideas about cool things to come 🤣
no 🙂
do you guys utilise skills much with codex?
Skills are absolutely perfect for when you have something you need to stay consistent.
Thanks
what model does sub agents use? parent model?
I am rerouted to 5.2 once more, and guess what no error was surfaced in the CLI.
What's going on with codex?
Opened a github ticket hopefully will be resolved fast.
even after the verification?
Yeah, worked for a bit and now i am back to 5.2...
All i do is work and also develop game mods which at times require reverse engineering.
But if that tripped the system, then the classifier can't distinguish between a developer modding a single-player game and an actual threat actor.
Also they did it stealthily again, no error surfaced, no notice.
did you at any point get an error that you should rephrase your prompt ?
Nope not a single time, i am also on the 0.99.0.
Lies, damn lies, and statistics
Parent
niceee
Has the statusline update been released yet?
Hello, is codex down right now?
I keep getting
Stream disconnected before completion: error sending request for url (https://chatgpt.com/backend-api/codex/responses
Not down for me.
Hmm, so statusline is only with preset components? Aka, we cannot actually edit this ourself to exactly how we want it?
Claude has had this option for ages. It's a really great functionality, I would really appreciate it if they added this.
howdy, I started a task in a git inited project directory but I got
• I also see stale temp package folders from earlier attempts in dist/; I’m cleaning those up so only zip
artifacts remain.
✔ You approved codex to always run commands that start with Get-ChildItem dist -Directory -Filter package* | ForEach-Object { try { Remov...
• Ran Get-ChildItem dist -Directory -Filter package* | ForEach-Object { try { Remove-Item -Recurse -Force $_.FullName -ErrorAction Stop } catch { Write-Host "Could not remove $($_.FullName): $($_.Exception.Message)" } }
└ Could not remove : Access to the path 'C:\Development\web-extensions\cancelx-git\dist\package\icons\icon128.png' is denied.
Could not remove : Access to the path 'C:\Development\web-extensions\cancelx-git\dist\package_04bcef0ff4f84f85b99c7d27d20e1629\icons\icon128.png' is denied.
the path codex can not access were created by codex cli in first place in the previous task.
How can I grant codex cli any permission in cwd? I'd like it not to stop for such things
--dangerously-bypass-approvals-and-sandbox
or -s danger-full-access in sandbox
i've been using opencode with 5.3, and this morning i can only see 5.1 and 5.2. I've uninstalled, removed all configs, reinstalled, refreshed models. I've got chatgpt plus subscription. Anyone know what's going on? Running arch linux
sub agents are really good now, niceee
https://github.com/openai/codex/issues/11561
people are getting routed to 5.2 again while being ID verified..
How can I check if I'm being rerouted? I don't think I am, but I'm curious to know
I'm still 5.3 phew
I run this in terminal: RUST_LOG='codex_api::sse::responses=trace' codex exec --ephemeral "say ok" 2>&1 | rg -m 1 '^model:'
RUST_LOG='codex_api::sse::responses=trace' codex exec --sandbox read-only --model gpt-5.3-codex 'ping' 2>&1 | grep -m1 'SSE event: {"type":"response.created"' | sed 's/^.*SSE event: //' | jq -r '.response.model'
I like this one, quick & self-contained
what would response look like?
it would respond with the model being used to serve with your requests
in my case, 'gpt-5.2-2025-12-11' despite having gpt-5.3 selected
you can test it much faster you just need jq installed:
RUST_LOG='codex_api::sse::responses=trace' codex exec --sandbox read-only --model gpt-5.3-codex 'ping' 2>&1 \
| grep -m1 'SSE event: {"type":"response.created"' \
| sed 's/^.*SSE event: //' \
| jq -r '.response.model'
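If jq isn't installed, the same check can be approximated with grep and cut. This is only a sketch: the function names are hypothetical, and it assumes the trace line is compact JSON whose first "model" key is the served model, which is what the jq filter above implies but is not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the served model name out of trace output on stdin.
# Assumes the log line format quoted above: ... SSE event: {"type":"response.created", ...}
served_model() {
  grep -m1 'SSE event: {"type":"response.created"' \
    | grep -o '"model":"[^"]*"' \
    | head -n 1 \
    | cut -d'"' -f4
}

# Hypothetical check: succeed only when the served name starts with the requested one.
# Prefix matching is an assumption, in case served names carry a date suffix.
same_model() {
  case "$2" in
    "$1"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Pipe the traced codex exec command above (everything up to 2>&1) into served_model, then compare with same_model gpt-5.3-codex "$served".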
yes mine still on 5.3 since after verification
Anyway this is a real productivity killer.
I don't like it that they have the power to do such a thing without any notice.
Sent them an email about it, and opened a github ticket: https://github.com/openai/codex/issues/11561
I noticed this as well a lot
how about vscode
Are you using the CLI version of codex?
the IDE context feature. is it on when I click on it and disappears, or the way it is now - showing up in blue ? it's just confusing
I don't know how you can test it in the VS Code extension.
If you are on Linux you can use the CLI to test it.
nobody knows
I have not yet tried (and maybe I should just try it but sometimes I prefer asking) sub agents
How does it work?
Assume this scenario:
- we are in WD,
codex prompt:
- Refactor the monolith with 5000 lines in file xxx/zzz/y into modular smaller dedicated files separating by concern
- Implement new feature XYZ
- Write a user-facing documentation about how to install the app in /doc/user
Is the thing smart enough to spawn subagents (without asking for it) and make one agent wait on the other?
Or will it end up in a total mess as feature XYZ would be in the monolith, but the monolith is being ripped apart for refactor, and will doc subagent document everything, or only what he finds at the moment of prompt?
Asking GPT it says "usually" it will organise the subagents correctly but that for safety we should prompt it to do so - I then just wonder, how sub-agents are even useful, because it becomes a synchronous process, which could as well be done manually (wait for results of task A, then prompt B and then C, also allowing us actually to see and test what task A resulted with before we go ahead with Task B)
you guys completely broke codex IDE. Look at this, it just stops all the time, without fixing the issue. What did you do ? Vibe coding is cool and all, but test your changes. What is this model, is it even 5.3 codex ? It does not seem so at all. It just lazy as hell, and just stops, without doing anything. What did you do ?
The silent reroute problem is bigger than people realize.
Developers rely on these tools daily.
When a model is silently swapped, the damage is done before we even notice.
Work gets built on the wrong model, time gets wasted, and nobody told us.
This goes beyond inconvenience.
These tools are now part of how we compete and deliver work.
Being silently cut off from the model I'm paying for puts me at a real disadvantage compared to others who have access.
And it's happening on false grounds.
The classifier is supposed to catch attackers.
Instead it's flagging developers doing normal work.
Real attackers don't use services tied to their real name and credit card.
The people getting caught were never a threat.
What we need:
Tell us when we're being rerouted, don't do it silently.
Fix false positives fast, hours not days.
Make verification stick, not expire without notice.
Silently downgrading paying users isn't security, it's a trust problem.
using the same model in opencode btw and i'm not getting this lazy behaviour. It's so strange and it does not even seem like the same model there
now I am scared that I am going to be rerouted 5.2-codex
More life like than ever.
"Yeah, yeah, I'll definitely do that"
Codex version of Claude CoWork?
Sounds like a you problem in blindly trusting clankers
I believe OpenAI knows what they're doing, delivering the most cost effective tool they can afford
OMG Is yowave still ranting about a problem that was fixed yesterday!?!
If every ChatGPT user actually paid for Plus, the price could drop to half for everyone already paying.
It wasn't fixed, i followed the verification, and today back to 5.2.
This is also not a rant, it's a real issue.
Part of the OpenAI ethos is to provide AI for humanity - that requires "free but limited" for the masses. I have no issues with that.
I hope so. Claude CoWork is fantastic
That's absolutely not how companies work
I explained this yesterday. Won't re-engage today.
Bugs happen. Sometimes bad ones - and yeah, this one was bad. We all try to fix what's broke.
I see you created a new ticket - that's great. It's now 7am in SF. Give um some time to wake up and get to work. Allow them time to process it. Be sure to logout and even restart your WSL ... it's common for some details to require exiting/re-init before taking effect. Be a professional and move on.
Verification? Rerouting?
I never had to do any "verification" since log in, and I checked with
RUST_LOG='codex_api::sse::responses=trace' codex exec --sandbox read-only --model gpt-5.3-codex 'ping' 2>&1 \
| grep -m1 'SSE event: {"type":"response.created"' \
| sed 's/^.*SSE event: //' \
| jq -r '.response.model'
says gpt 5.3
Did I miss something? 😄
Or maybe because I am on v0.98?
is anyone else having codex app problems where it is overriding session policy to never?
I will say after the latest update, Codex is very responsive and not consuming lots and lots of RAM so OAI is definitely hard at work 🙂
If you use it for security work, you'll get flagged. Although it's been a little overzealous. I was using Shannon to pentest a couple of projects and that flagged me
Since verifying, it's been fine though.
EDIT: FOR ME. I'll get that in before I get the "well it's not working for me" replies lol
or they all just come over to your house and use yours 🤣
hello
Does codex automatically reset its context window? if i leave my ide as is for the next few hours then come back will the context still remain
guys i have a
question
Hey, complete aside here - in the IDE, check the F1 + "new context window" feature. It's a real treat to get out of the sidebar and have multiple chat windows/contexts.
does google have a discord?
Seems to me that you don't understand the gravity of such an issue.
This isn't a bug, it was a deliberate design decision to silently reroute users without telling them.
Restarting WSL won't fix a server side account flag.
Pointing out that silently degrading a paid service is a serious problem isn't unprofessional, it's necessary.
These tools are part of how we compete and deliver work, being cut off from the model we're paying for puts us at a real disadvantage.
As a community we shouldn't normalize silent service degradation, no matter how it's framed.
This isn't the place to ask about Google and Discord. Try googling.
But no. Google has Google Groups.
Should this happen "Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying." ? (latest version of codex and enabled memory, so maybe this is the reason)
This isn't a bug, it was a deliberate design decision to silently reroute users without telling them.
Responded to this yesterday.
Restarting WSL won't fix a server side account flag.
It may if the client side v0.99 was supposed to address our side of it.
Pointing out...
Over and Over and Over ....?
we shouldn't normalize silent service degradation
Addressed.
( getting back into code where no one is whining except me )
i think this is a bug, it just happened to me as well
I'm excited about Codex App for Windows and Android, but I need to wait to install until after all of the initial wailing and gnashing of teeth, the fallout from the 0.x rollout, and the subsequent fixes. When it's stable I'll load. Until then I won't be an alpha. I am a beta for ChatGPT but get no feedback from the company for that. Since they've disrespected me as a beta, I won't even try their alpha. This is how these things work.
will be funny if iOS gets the Codex app before Windows 🤭
Your responses show you don't understand how this works.
The model you get served is decided server-side, not client-side.
v0.99 updates the CLI, it doesn't change what model the API returns.
Your config can say gpt-5.3-codex all day, if the server decides to send you 5.2 based on an account flag, no client update or WSL restart will change that.
This is basic client-server architecture.
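A toy model of that argument, purely illustrative: this is not the real service logic, the function name and the account_flagged input are hypothetical, and the fallback name is just the served model reported earlier in the chat.

```shell
#!/usr/bin/env bash
# Toy model only: NOT the real service logic.
# The "server" picks the model; the client's request is just one input to that decision.
serve_model() {
  local requested="$1" account_flagged="$2"
  if [ "$account_flagged" = "yes" ]; then
    # Hypothetical fallback, using the served name reported earlier in the chat.
    printf '%s\n' 'gpt-5.2-2025-12-11'
  else
    printf '%s\n' "$requested"
  fi
}
```

serve_model gpt-5.3-codex yes prints the fallback no matter what the client asked for, which is the point being made: nothing on the client side changes the second argument.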
What have you guys been using for frontend skills? I feel like thats where OpenAI has been struggling the most compared to Claude. I'm working on a new project, and just looking to explore what everyone else has been using.
Genuinely I just boot up Opencode and made Kimi 2.5 do the design process.
Codex is unfortunately not good enough at design even with the front-end design skill and other skills it just doesn't nail it.
Please remember that I've been writing systems for decades. You can rightfully suggest that I'm wrong about something version-specific but I'm verifiably on top of the technologies.
There are no 0.99 release notes so we don't know what was done there. Note that the GH agent linked https://github.com/openai/codex/issues/11592 to your GH issue because there seemed to be a link between the client and model degradation. Without looking at the code we don't know what flags are flying back and forth.
You've made your point that this is serious. I believe anyone who understands the issue agrees.
What else do you want here?
I've found Codex to be really good with React but I need to improve directives to get it to modularize better. By default (5.2 anyway) it creates huge components with everything and the kitchen sink included in one function.. That's not cool. I want it to abstract hooks and other functionality to other modules to make them easier to maintain. Other than that, it understands all the React rules and writes good code.
Yeah FWIW I think the React components it writes are fine, and if you've got your project structured to encourage that abstraction I don't think Codex has an issue.
Decades of experience doesn't change how client-server architecture works.
The client sends a request with a model parameter, the server decides what to serve back.
No client-side flag changes a server-side account classifier decision.
The linked issue confirms it, the rerouting is server-side based on their cyber abuse classifier, not a client bug.
The code is open source on GitHub, you can verify this yourself.
I personally modified frontend-design skill and use 5.2c and 5.3c. Prior to this, I still use the models without the skill - multiple iterations.
Modified skill + prompting gets me my result in 1-3 iterations.
The real issue is actual styling.
I've got a couple of jr's that are swearing up and down about Opus because it's better at styling xD
Since gpt-5, openai has made it possible to get decent UI. Prior, not a single model could come close to other SOTA at that time
GPT 5 is like ancient now
Those other models for me have never made “good” designs, they simply added more details than asked for. So, less prompting can lead to a decent enough result for most people.
Feels like it, especially in terms of capability. And that was just months ago 😭
competition made us get this models faster , without that they surely wont release something this powerful yet, and they still have more, IMO model and all
FWIW We first design in figma, then either try and use their broken MCP server or usually end up screenshotting components and sending them over. In this case Opus usually wins.
Codex mobile is only good if it acts as an ssh tunnel.
Otherwise the cloud rate limits cost something like 3x more.
Ah, that's a great point. I don't use standard design process, but I imagine this is tool / prompting based which I am surprised that GPT-5 family of models couldn't follow nearly as correctly.
im really hoping that openai makes even better models for taking down claude. as much as i love claude they're way overpriced
Everyone is going to make better models 👌
The only way is up
Yeah but there's reason to believe they will
baby
absolutely. I have always been ready to jump from one provider to another. I just always found myself back at OpenAI's models.
I need a --web flag or something for codex app --web since its just an electron app, would be nice to access localhost:8080 or something and use the app
You know what, I'll do it myself
Persistent agent workflows coming soon?
oh my god im gonna bust
ok this is getting really hard to follow... https://x.com/sama/status/2021984777470193767?s=20
When you name everything codex, its hard to tell what codex is getting the "special thing"...
Is it only for the codex mac app, or will it be in the codex cli as well? lol
it's getting like 'Copilot' lol
its worse imo
That's the second time I've seen someone say the update 'sparks joy' or something along those lines
@tim tebow from codex, please clarify
Oh boy...
they also keep using this emoji ✨
I swear to God, if it is only for Mac, im burning something
buy a mac
burn your pc
@glass furnace I have one. I dont like it.
YES! FINALLY A GLITTER THEME FOR CODEX
I cringe almost to death every time I see keep4o on samas posts.
If anything, it reinforces why it needs to go!
I just want the codex app to use SSH so i can stop using the tui
Wasnt that like forever ago
Yeah, or maybe they just use the API. They still have GPT 3.5 turbo on there lol
it is still on every one of his posts lol
I know it's crazy
yes they are
Is the codex app open source?
or just the cli i might be dumb i dont see it in the repo
lol it will just be windows codex version
for pro users only 💀
or control your computer. oooring boring oring ring ring ing ing ng
@spiral gorge Makes no sense to make that Pro-only
Im putting my money on ultra-fast model
Sparkles... Joy...
and fast....
what joy? 5.3 has no benchmarks
its sooo annoying pissing me off
maybe the Codex app is not the only thing coming today
I swear it feels like since yesterday I'm on a Cerebras test branch or something
model output has been totally different
faster, yes. but definitely not normal output
can someone tell me what sub i am on?
I really dislike the interfaces since GPT-5.1 and newer models...
Has anyone been able to workaround it?
Even if you give it a UI library like shadcn, it still goes for the "gamer" look with very specific palette and gradients
Not a fan of the navy/black gradient with white writing? 😂
GPT-5 is already "deprecated" 😄
Well, I just wanted to make a interface that doesn't look out of place on phones lol
Gemini team always uses that emoji... and they are talking "fast" so maybe like claude introduced a couple days ago a fast version of opus for like 3x the cost running on google tpus, thats what this is... fast codex (2-3x speed up) running on google tpus for only pro users.
would that bring "joy"?🤔
keep 4o is one of the weirdest "did not see that coming" moments since LLM's became main stream
That would make me very happy
Whos max
(¬‿¬ )
I think pro users are getting mobile app today
Either that or pro/max versions
Yeah it's definitely the case
That should fix your issue
I cant wait to go BRRRRRRR with Max
So 5.3 max is a fast version of 5.3?
5.3 is fast
I guess this is the nod to their terrible naming conventions... not gambling lol
5.3 max is probably just like 5.1 max
i dont think it's a app
better, maybe new capability given their shared language on twitter
Codex fast and codex max
a codex fast would be a mini model, right?
bc 5.3 is already significantly faster
you sure about that lol
its either the iOS app, windows app beta test, or codex 5.3 MAX/FAST model variant
they cant just release app to pro users
they launched Codex app + 5.3 at the same time. could be both.
they can release it to all users but it can not be functional with any login unless you're Pro
I don’t mind
I do
It’ll make up for the rerouting haha
i kinda doubt it
it's most likely the 5.3 max/fast variant
a reset of limit would be nice too 😂
I wish they would take note from Anthropic's org billing and allow Teams the ability to buy a Pro-level of usage.
get a Codex Pro plan
go into debt if you have to
not about cost, it's about team availability + simplified payment.
I cant believe im giving 200$/month for this, and Im not even mad. When I joined the Pro plan, I thought "Im sure I will cancel it in 1 or 2 months".
And here I am, not even caring
We need a $100 plan 😅
So... the 8th best coder in the world... not too shabby.
Is the 'max' a good coding model? or what is max? Ive been estranged from openai for a while
Max is a good thinking model, the base model coding is what you get on it. Codex uses that foundation but is tuned for coding and codex
This too
give it 3 months and all the ai labs will be competing for rank 1 lmao
deepseek v4 drops within the next 5 days
the chinese models released the last few days already mogging opus 4.5
1/20th the cost
I have a feeling 6 months down the line AI providers will decide that its time to specialize more than competing for the benchmarks
90% the results
GLM-5 is going to force the US labs to either lower prices or increase rates or a combo of both
After certain % its no longer improvement, it can never achieve 100%. Ideally AI shouldn't achieve 100%.
gotta love the 1 year release cycles when everyone else is month-to-month
yeah lmao
MiniMax just dropped M2.5 as well
moggs opus
1/20th the cost
codex hanging on to top spot right now by very thin margins
Hence today’s release
price moggs**
Opus is better at many other things though so that's the benefit of opus
opus is good at literally everything
you know where else m2.5 mogs?
SIZE
460gb full weights
its relatively tiny
i feel bad for elon man
Open weight models are going to be slow, they are not fighting the markets and open weight models need improvements on all levels including not having any hardware to run it instead of causing global hardware shortage 😂
me too honestly
grok 4.20 release within a week apparently has a lot to live up to im worried it's gonna be trash
Wsp
i just bought a second rtx6000
still too small 😂
How's everyone?
192gb vram and its still not remotely enough
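A rough back-of-the-envelope for the numbers in this exchange (460 GB of full weights vs 192 GB of VRAM). This is only a sketch: the function name and the `overhead_gb` parameter are made up for illustration, and it ignores KV cache, activations, and runtime overhead unless you pass them in.

```python
def fit_report(weights_gb: float, vram_gb: float, overhead_gb: float = 0.0) -> dict:
    """Compare model weight size against available VRAM.

    overhead_gb is a hypothetical allowance for KV cache and runtime buffers.
    """
    usable_gb = vram_gb - overhead_gb
    shortfall_gb = max(0.0, weights_gb - usable_gb)
    return {
        "fits": shortfall_gb == 0.0,
        "shortfall_gb": shortfall_gb,
        # Fraction of the original size the weights would have to shrink to
        # (via quantization, pruning, or both) in order to fit.
        "max_size_fraction": usable_gb / weights_gb,
    }


# Figures quoted in the chat: 460 GB of weights, 192 GB of VRAM.
report = fit_report(weights_gb=460, vram_gb=192)
print(report)
```

With those figures the weights would need to shrink to roughly 42% of their size before anything else is loaded, which is why quantized or pruned versions come up a few messages later.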
not exciting given the mass exodus of founding staff
4.2 is DOA
its literally dead
elon himself admits its not a strong coding model
no one outside of x is going to use it
It's due to restructuring and the merge with spaceX they had to let them go
Keep doing it, you will eventually be able to run a non distilled foundation LLM model that can follow your instructions with more than 3 messages as context in about 1 year time
Grok 5 or 6 is their only hope
you can make local models very quick. esp on nv hardware
apple mlx is what runs slow
but the problem is intelligence
going to have to run reap or quantized versions
or both
and smaller context windows
i may have misunderstood your point there actually
but yeah. we'll see lol
i think that smaller models are going to get better.
i'd rather have the hardware now than see it go up 50% in cost.
or more.
User: Hi
LLM: Let me think what to respond to hi before I send a response
......
LLM: Hi, How can I help you today?
User: How many rs in strawberry?
Error! Ran out of context window
😂
also VLMs are small and i have some ideas for those
i'm not worried. in a shortage, it'll be the easiest thing in the world to sell.
Minimax-M2.5
SWE-Bench Verified: 80.2%
Multi-SWE-Bench: 51.3%
BrowseComp: 76.3%
half the price of GLM-5
its also 33% smaller
What are you talking about? local inference is face melting on MLX. If you're using a brand new model then the fused kernels might not exist yet, and I've had to write my own fused kernels to get certain models to work, but it's far from slow
compared to nvidia, it is slow. and you know it. and concurrency. i'm not saying its SLOW overall, i'm talking comparatively.
but apple is still great
not hating. i just took a different path
its getting better too, mlx.
china is mogging with the acceleration ngl
that is a wild chart
Is minimax free
it's the cheapest API basically free
not typically but you can usually find it in things like kilo code and such
You know the M3 Ultra Mac Studios have RDMA over Thunderbolt 5. You can scale horizontally and achieve similar performance, and two 512GB Mac Studios cost less than an H100 and use like 800 watts
the coding plan is $10
What are the limits
Optimized thinking efficiency + 100 tps to achieve 3x faster than opus
Priced at $0.3/M in / $1.2 out (20x cheaper than Opus 4.6)
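To make the quoted pricing concrete, here is a small sketch that turns per-million-token prices into a dollar figure. The default prices are just the $0.3/M input and $1.2/M output numbers from the message above; the function name and the example token counts are illustrative assumptions, not anything from a provider's SDK.

```python
def api_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float = 0.3,   # USD per 1M input tokens, as quoted in the chat
    out_price_per_m: float = 1.2,  # USD per 1M output tokens, as quoted in the chat
) -> float:
    """Estimate the cost of a workload billed per million tokens."""
    input_cost = input_tokens / 1_000_000 * in_price_per_m
    output_cost = output_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost


# Hypothetical session: 2M tokens read, 500k tokens generated.
print(f"${api_cost_usd(2_000_000, 500_000):.2f}")
```

Swapping in another model's prices for the two keyword arguments gives a like-for-like comparison on the same workload.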
better than anthropics 20 plan
probably like 2 20 plans
yes i'm aware but its not the same performance.
its good though
probably better than Claude $100 plan considering people run out after asking a bunch of things in Opus 4.6
everything is a tradeoff
apple gets you vram and power so much cheaper.
i got a free rtx 6000 or i'd have went that route too.
Codex should have $100 plan, it will generate more revenue than Go plan.
ship it to me and you still can go that route with no regrets
has anyone made a 2nd cursor that u can invoke or give tasks like "Open spotify and surprise me with a song"
idk i'm thinking of making that
dude it was such a blessing man
NVIDIA Parakeet will be the speech to text and gpt 5.1 nano will be the image gatherer and action taker
is this a fire idea
or am i burning the bed
see, more reason for you to spread that blessing to others
Anyone else getting "Error creating task" from VSCode extension? I can execute a prompt from the sidebar but not from a separate "New Codex Agent" panel.
v0.99
I couldnt resist it
There are only 3 UIs that Codex can make and we all know exactly what all 3 will look like.
is the m4 pro notably better than the base m4 if you aren't using it for local llms?
i am thinking about returning my mini and getting the pro
Depends on how many cores you are going for
help me out here bro
BTW, I've been working with the Apple engineers for the MLX project and we discovered a major source of performance issues was metal residency set not wiring down model weights, kv cache, or attention workspace during inference. I've patched these issues myself, in some cases inference speed increased by 40x. The hardware might not have the same TFLOPs as Nvidia but with wired down memory and zero copying across memory it's going to perform almost as well as datacenters
thats sick!
best of luck to you on that.
well that's what i'm trying to understand
how much those extra cores are needed for just running coding agents
in parallel
swarms
its going to be a dev machine.
10 would be enough, coding agents shouldn't eat up that much of anything
I run my coding agents and servers on a 2017 PC
you'd be surprised claude can crash my 5900x with 32gb ram and 3090
might be a ghostyy bug unsure
but if i run too many at once, it can crash
i have a new machine i'm building but for now this is what i'm using
Finally pro users get some love
Claude eats up if you are on the same CLI for ever, vs code extensions are literally cancer
but i am missing some of the macos apps.
i use cursor like 1x a month and thats it
i burn my Pro account and dont touch it again lol
codex is a bit forgiving compared to claude cli not because its built better but probably because of rust
I have a 16GB M1 Pro at home and tbh the only thing I regret is not getting at least 32GB. I got it before I realized unified memory means the GPU is going to eat away at the same RAM space, and dev stuff gets uncomfortably tight at 16GB. Otherwise yeah I think pro models are quite a bit faster
yeah i am getting the 24gb for sure.
just not sure if i get pro or non pro 24
realistically i'm waiting on m5 stuff... but who knows how long
or what it'll cost.
so i'm trying not to go crazy right now but also, its on sale. so i'm conflicted.
M5 pro is out right?
What's your typical workflow? Or like expected usage
just doing normal dev work, building slop 😂 mostly micro apps, automations, front ends, etc
You dont need mac pro, don't waste money just burn it on AI 😛
AI optimal tools for February 2026:
- 5.3 Codex xHigh fast for everything coding
- MiniMax M2.5 for everything else
no need to complicate it
will update after deepseek release in a few days
Before you get used to it you will get another model released and you will be like wow she looks better
if OAI drops codex 5.3 max for pro users today i dont think anything will pass it this month
If it's web dev then probably dont need the pro chip. Native dev, where you'll be compiling stuff or running Metal in a simulator the pro chip will definitely feel more comfortable
yeah i have done some compiling, mostly rust
i think i'll take your advice
get the 24 non pro
cargo builds can crash my current machine too lol.
Rust + sccache, and static libraries only can be pretty fast on base M chips. Once you start working with iOS and having to build dylibs you no longer get sccache benefits and having more cores is definitely going to be faster
thanks brosef.
saved me $400
one thing about apple products
they hold value like nothing else.
easy to pivot
The extreme flexibility of GPT-5.1 to 5.3 interfaces 🙄
To be honest there were 2 or 3 examples that it actually got a decent style, but most of them are barely different from default interface...
Curious what workflow you guys are using for codex, i.e. a deterministic way of developing software.
https://github.com/EveryInc/compound-engineering-plugin
I'm gonna try this this weekend in codex. I used it 2 weeks ago in claude code and opencode with claude and it worked great but it eats lots of tokens.
deterministic?? way of developing software?? aint no such thing
I mean step based that compound step is pretty cool wouldnt you agree
Maybe deterministic is the wrong word but it's got commands for each stage.
i knew it lol just a fast mode
https://openai.com/index/introducing-gpt-5-3-codex-spark/ because announcement takes time
Wait... 1000 tps
whoa
how to use the spark model in cli? seems like not available yet for pro users
1000
Beautiful
Absolutely, I believe you're just talking about a formula, pattern, consistent flow by which code can be created. Answer ... I think a lot of people, including myself, are developing tooling like this. We'll probably see an explosion of them this year.
get your CC out... we coding today lol
128k context window
text only
but 1000 tps
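For a sense of what 1000 tps means in wall-clock terms, a quick sketch. It assumes a constant decode rate and ignores time to first token, queueing, and tool calls; the function name and the 20,000-token example are illustrative, and the 100 tps comparison is the figure quoted for MiniMax earlier in the chat.

```python
def seconds_to_generate(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream output_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


CONTEXT_WINDOW = 128_000  # tokens, per the announcement details quoted above

for tps in (100, 1000):
    # Hypothetical 20,000-token response (a large multi-file diff, say).
    print(f"{tps:>5} tps: {seconds_to_generate(20_000, tps):.0f} s for 20k tokens")

# Upper bound: time to emit an entire context window's worth of tokens.
print(f"full window at 1000 tps: {seconds_to_generate(CONTEXT_WINDOW, 1000):.0f} s")
```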
Are you using one atm? I felt the difference between raw dodging and using compound engineering was massive.
MacBook Pro has an M5, not the M5 Pro chip.
128k context window
that is FAST
it's faster but less accurate
that has to be cerebras
it is
it is
Yeah they say that in the blog
it is
legends
now what is the best harness to take advantage of 5.3 high planning and 5.3 spark execution and hammering commands and generating logs to debug?
that's insane
makes me want to subscribe to the 200 plan
but i can't afford it 💔
this article isn't even promoted above codex app or 5.3. so this isn't even big news.
anyone got access yet?
oh?
interesting
so we might see this speed on Codex 5.3 with its normal context window
very exciting
Pre-commit git hooks, conflict resolution hooks, etc. Let the big brain agents do the real work and let spark handle the bookkeeping
personally, a 25% increase from 5.2->5.3 was enough for me. I will not sacrifice 10-15 points of performance for speed.
oh my god that makes sam's original tweet make more sense
"it sparks joy for me"
hahaha
My tooling is in development. I can't discuss much here. Essentially it's extremely thorough about developing modular and well-documented projects, with specific focus on common components like database, file-system, UI, performance, and most importantly, self-maintenance ... it continually checks itself for ways to improve.
My joy has been sparked
dude running alpha version!
LOL you show that model who's boss
if you're not, you ngmi
The models are fetched dynamically so
Shouldn’t need to
ideally yes but if GA the model shouldnt be restricted to alpha release of the CLI
it wont
Weird, it should've finished building the snake game
I still dont haveeeeee. Why not?!?!?! I want it 😄
You are not the chosen one!
But I chose myself!!
Denied!
codex exec -m gpt-5.3-codex-test-youcanputanyslugandstillgetaresponse --sandbox read-only --skip-git-repo-check --json 'Reply with exactly OK and nothing else.'
Ah, you mean all GPT Pro AND small set of API