#codex-discussions
1 messages · Page 21 of 1
Honestly, the design itself looks way less bad than GPT usually makes
(Not referring to the "content", more like the layout/allowever look)
Which is why I rest assured you already gave it a good amount of spanks to go there lol
Because, otherwise it would look like this
You will see this same style on like at least 5 other "I made" pictures in this very channel lol.
Same background, same radius, same colors, same font.
last month they had a opus fast thinking version with 9x 😂
5.4-Mini is out.
In Codex, GPT‑5.4 mini is available across the Codex app, CLI, IDE extension and web. It uses only 30% of the GPT‑5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost. Codex can also delegate to GPT‑5.4 mini subagents so that less reasoning-intensive work runs on the cheaper model.
Gpt 5.4 codex out when
5.4 Xhigh for planning,
5.4 mini for implementation?
YESSSSSS 🙏
A nano model!!!
Benchmarks from OpenAI's blog post.
what's price diff?
I have been waiting for an update to 5-mini!!
If I ask 5.4 to read my code and then plan feature x (without writing to disk)
And then switch to 5.4-mini to implement the plan in the same chat, does 5.4 mini have all the context it needs from the earlier research still?
Thought the mini line was dead
I asked for this yesterday 😂
anyone know how to update the windows codex gui app?
Its so easy on mac but cant find the update button on windows
It has the same context window so that'd work
MS Store
You are not imagining things, Anthropic models have this "intuition", they just fall short in execution. If they play their cards right they might get on top because their models do have this side to them.
My understanding is that usage limits are based on tokens, and tokens are measured in text read and output, since the research phase would involve the most file reading, does that mean there won't be much token savings on the text input side when switching to 5.4 mini for implementation since the file reading / research is already done? Mostly output token savings?
It's interesting that gpt-5.4-nano has higher latency than mini
If ai is ur power what are u whit out it?
Actually the "compute cost" of a token with mini is less than the full-size model. The only cost you would incur in the way you described is when you switch from gpt-5.4 to gpt-5.4-mini, it has to replay the whole conversation for mini which would not get cache benefits, but if mini does more work than the planner then it comes out the same or better
I see, so it sounds like trying to incorporate mini into subagents to summarize file reads and also using it for implementation could save a lot on usage limits
Yes, if the primary agent uses the full-size model, and does a good job describing the task to the mini subagents, they can knock out the work faster for half the cost. And then the full-size model checks on their work, makes minor corrections. You might get a little less than 2x quota that way, if the mini agents take fewer than 2 full passes to do their task
would be nice to be able to put that instruction in agents.md, so the agent can switch itself out and back in lol
A person who managed to grow from a farmer's son to a programmer with 10+ years experience on the subject, a licensed boatbuilder, construction worker and forklift driver lol
If AI is all you "can"... then you have a problem
But also, its off topic 🙂
In another domain (image generation) one case is labelled as overfitted and the other as ideally trained.
Just shipped v0.13.0 of the Elixir Codex SDK, including subagent capabilities! https://hex.pm/packages/codex_sdk
Lmfao the question / answer reminded me so much of the iron man thing about who he is without his suit
Yeah he does a couple times but the main point is not that he takes it off.
Someone who thinks he's better walks up to him and asks him what he is without his suit and he proceeds to deliver some sick lines.
I don't really remember them though it's been like 5-7 years since I saw it
I misunderstood you, all clear n
Programming
Finally someone asking the important questions
yeah but what
do i program
and where do i sell
i suscriptions for claude, chatgpt and cursor
have*
Program the solution to a problem you're having (assuming there isn't a solution already)
Sell it to other people having that problem
hmmm
Really dumb example:
If I'm having issues expanding my b2b section of my restaurant business because I don't know my clients build something that when the payment passes it takes in the information it knows about them to research them and see if they are a potential b2b customer.
Then you sell it in a restaurant forum
hmmm
your lowk right
but aint no body paying any $ these days
people are becomming really broke
@potent mason habe u built anything good w ai?
My really good products are pre-ai and now that AI came I'm improving them faster.
A big portion of my business comes from b2b businesses so yeah for them we build really quickly and personalized with AI
does anyone have data on inference speeds for codex with 5.4 mini
anybody using it, how does it feel?
{"detail":"Bad Request"}
Hmmmmmmm
well to be fair this thread has been going on for a day nonstop using it
You should reachout to the team work on Symphony at OpenAI. They are Elixir based and I believe just invoking Codex through the CLI.
yep, we're building a Symphony-inspired app based on Jido, the most feature-rich agent framework in Elixir. Lots to do, and we're always looking for talented contributors
The Elixir codex_sdk is a different animal than simple integration with the CLI. It exceeds the capability of the official SDK.
for me gpt-5.4-mini has not been working very well in codex framework, it consistently gets confused when asked to do a task that requires using multiple skills (skill 1 --> get the data, skill 2 --> format the data into an excel file). It erroneously claims that it doesn't have access to skill 1's tool, even though when pressed it admits it does have access to it. And gpt-5.4 never has any issues with this task
closest ive gotten to wasting my usage
what is the difference between 5 hour and weekly usage limit
one resets every 5 hours and one resets weekly
yea but which one is used for what
oh nvm I see now
I've got a series of prompts to run using GPT-5.4 and my prompt runner using codex_sdk if you want to use up some of that weekly usage 😄
where can i see it
Don't let Altman win.
Unfortunately that is just a gpt 5.4 issue
I literally have it produce evidence tracks meanwhile, as in exact evidence it has to add to a json file with goal, lines (file:number) of code that implement goal, and so on
Extremely stupid, but if I dont force it to prove it worked, it literally just does whatever it wants specially when plans become large. But also when small, because then it just goes assume "this is an easy one, keep it simple stupid"
It is honestly become a bit hilarious. Not much different than when you work with humans
"Its done sir test sir"
Opens website and first sir sees is the font is still like 3 iterations before...
People on github found out what was causing that nasty usage bug, it appears that Codex was searching in previous conversations related information to the request... lol
So now you have to delete your archive if you don't want it to use a lot
or maybe openai fixes this and does another reset
Can we hook the process that names the current Codex task? It derives the task title based on the prompt but often gets it wrong, requiring (if I care) an edit of the task name. It would be much better if we can deterministically set the task name, maybe just by specifying it at the top of the prompt - but I haven't had much success with that either. TY
Would be cool if entering iddqd in the chat turned on danger-full-access one time for only that turn 😏 jk
I have a CLI "hey do this" which runs without full access, but if I execute with an exclamation mark it uses full access: "hey do this!"
i know i have to use it fully
if (noticeably_distressed(user_message.clone()) {
turn.set_sandbox_mode("danger-full-access".to_owned());
}
Holy smokes dude, 5.4-mini is amazing! It feels closer to a Codex model
have you noticed what difference it makes in terms of token usage?
I'd say it depletes the quota at the same rate as the full 5.4 model, but does twice as much work during that time. It's like /fast mode at the cost of nothing
With this setup, idk if a 2nd Pro account is necessary, and when the 2x til April 2nd promo ends I think it wont matter that much. 5.4-mini is absolutely killing it
that's very interesting that it's intrinsically faster and getting very similar results. That's got to be better for rapidly iterating on designs
I haven't even noticed a performance/accuracy issue. I mean it appears to be as accurate in most cases as 5.4, and the times when it's not, the full size model reviews their work and has em fix it. Seems like the best combo is 5.4 medium for planning/orchestration and 5.4-mini high for impl
Sadly no
I only got the confirmation, but no shipping notification
anyone here that purchased codex credits for 40€? how many messages that equals?
o4-mini was the only useable mini model openai released
Roughly 8hrs of continuing work, it’s 1000 credits and each user message consumes significantly more than one credit (since model response etc is also counted)
My GPT5.4 with High reasoning tasks tend to consume about 150 credits per hour, in case that helps. I've actually been wondering if it'd be more cost effective to use the API for stuff like that but haven't really followed that up yet.
GPT5.4 with Medium reasoning probably more like 105ish credits per hour.
have fun
oh... my... goodness I freaking love gpt-5.4-mini. I don't even know if I wanna use the full-size anymore
API is extremely expensive
I wasted 40 bucks in like 4 hours
I wanted to share a skill I use
• Create a skill to optimize the use of assistants or coding agents, reducing cost, latency, and duplicated work without sacrificing
quality. Always prioritize deterministic operations before invoking expensive models.
The skill should:
- Normalize each request into a simple structure with
intent,scope,target,action, andconstraints. - Classify the workload as
single,batch,heavy, ordiagnostic. - Apply an efficiency policy:
single: reuse context and cache before redoing work.batch: group similar tasks by project or time window.heavy: split work into subtasks and parallelize them.diagnostic: measure before changing anything.
- Route by cost/quality:
- local tools or deterministic logic first,
- mid-tier model second,
- premium model only for critical tasks or when confidence is low.
- Record operational metrics and provide actionable recommendations.
The output should always include:
queuesuccess_rateavg_duration_mscache,warning, andsuccessevents- estimated cost trend
- top waste factors
- the next 3 prioritized actions with expected impact
Add these heuristics:
- If duplicate prompts exceed 15%, expand cache/context reuse.
- If there are multiple rate-limit events per hour, reduce concurrency and enable fallback routing.
- If the queue grows while quality remains stable, increase batching.
- If quality drops, reduce batching and raise model quality only for critical tasks.
Include these security rules:
- Never expose secrets.
- Report credential states only as
OK,MISSING, orINVALID. - Redact prompts and logs if they contain sensitive data.
Hi @torpid trout
I was looking for you the other day
for what
nvm, 5.4 is still an irreplaceable orchestrator/planner. 5.4-mini is just absolutely killing it with implementation right now.
Every time I've used subagents to explore code the main guy just goes and explores it himself as well
In my work today I've been a bit annoyed with 5.4/medium. It's being a bit dumb about things that I wouldn't expect.
Example, a document has sections which describe a task. One of the sections is for Discussion, where I discuss the task with the assistant before coding. All instructions point to discussions in that section, with alternating "Codex:">"Starbuck" / "Codex:">"Starbuck" etc.
The model has chosen to put some responses at the bottom of the file after other sections, outside of the Discussion section. It's reasoning is something like "you said to put discussion after other comments, but you didn't say exactly 'where' under the other comments..."
That's downright dumb, I've not had to babysit prior models with painfully explicit instructions. ChatGPT 4-early5 has frequently asked excruciatingly detailed questions when not necessary. I didn't expect it here. 😿
@torpid trout remember the conversations about public training?
I don't understand that, given the specs show lower code-related performance than base 5.4. Sure, "killing it" on simpler tasks is fine. But if the task is more difficult and the mini model didn't actually kill it, then we need to crank it to 11 and re-do or fix the now, um, "undead" "it". 🧟♂️
For those who don't check generated code, they won't know whether mini has killed it or not, there might be some lazy coding that looks fine in the UI but doesn't do any error handling or misses obvious edge cases.
I guess, as always, we need to see where it fits and doesn't. YMMV
Hello
Oy ve, and we have 5.4-mini : low/med/high/xhigh ... tune yourself into oblivion. 😆
lots of Codex "Working" text hanging for 10+ minutes, then i have to reset it..
TUI input frozen, can't press esc or Ctrl + C to cancel
Is your wifi good? Only reason I can think of is that following http requests aren't working
Too many things in my mind… refresh my memory?
This discord is study center
Our interaction are maybe being used to feed the models
🙄
@boreal holly finally made it to PR
Holy
I feel bad for whoever has to check that PR
Should've honestly made it a stacked PR
its just a personal project so im not too concerned lol
Just curious. With the addition of gpt 5.4 mini, how do i make sure that subagents use my bigger model?
Like, last time when i set to gpt 5.4 xhigh, im sure the subagents all use that too.
But now, how can i be sure thats the case? I dont wanna use the smaller models.
who says codex sucks at front end design
this sucks
it does really suck
In what cases do you ever just want an llm to go free for all and make all the design choices?
@velvet wren yo! Sorry for tagging, but i trust ur judgement haha
I cant think of any cases where design matters that you would ever do that
i think you set it in agent profiles in config.toml
How’s the experience with -mini?
time for the 100% cloud code review to produce 2 review comments
It absolutely slaps. In many ways better than 5.4 somehow when it comes to the act of writing code
So mass change all agents to mini ? Lol
I'm getting weaker performance from 5.4 and 5.3 codex atm, anyone else noticing this?
Rules and skills that are usually enforced are being ignored
So it’ll make our usage extend?
basically codex is only showing reds not greens.
might not be a huge problem but putting it out there
those look like deleted files
its been acting weird in showing transcripts of what changed
these are not deleted haha
Something is a little off atm.
still works
codex is cool
small changes in system prompt and models can have large sweeping consequences for work flows. Model idiosyncrasies change in an instant.
Someone in this discord server got exposed for being a pdf in a YouTube video
dude what does this mean for being a pdf
is a pdf talking to us?
lol
hahahha
hello i am .pdf I can speak to you liek i am a human
I’ll send the video in DMS from the YouTuber
just link it
I think this looks nice!
Crazy stuff if it's true
without a link to the video could just be ai fodder, link the video
Kinda what I was thinking lol
269b? how many accounts do you have
2 Pro's, but they also reset usage in the middle of the week a handfull of times lol
it's definetely not possible to use 269b tokens in a month on just two pros
Tell it to the police
I got reset at like 30% 2-3 times in the middle of the week
I'm not sure, I don't even know what the allowed tokens are from 100-0% currently with the 2x, its just what codexbar is showing me
Hey guys, what are your honest thoughts on ChatGPT 5.4 Mini so far?
I've been testing it out in Codex. The low quota consumption is nice, but honestly, it’s giving me the worst flashbacks to what vibecoding felt like about a year and a half ago.
The model just feels incredibly dumb. It’s not that it makes glaring errors or breaks the syntax, but the actual reasoning and initiative just aren't there.
Is it just me, or is anyone else getting the same vibe?
I haven't touched it yet. What if you used it for implementation and used another model for the plan?
I solved my sandboxing issue on windows with not-so-reassuring --yolo from CLI, but how do I go in order to prevend VSCode extension from doing this ?
Did they nuke 5.2?
no, they didn't
just use gpt-5.4 mini 🙏
how does it feel?
it performs well
it's like gpt 5.3 xhigh/high based on my exp
niceee, i will give it a try
happy limit reset day to everyone 😁
I have a new project, might try 5.4 mini
One day someone got to explain to me how comes that for one person model x performs „amazing“ and for the other person „trash“
These things shouldn’t be subjective
For example a person here said gpt 5.4 on copilot is a beast. And I tested and run away scared at the awful produce lol, it performed worse than on bare codex.
Or one says gpt 5.4 mini is amazing, yet for another it makes syntax errors. This can’t be a skills issue. No one will tell a skill to make syntax errors.
Are we not getting the same models? I mean… CDN routing and all that…
boils down to "skill issue". It all depends on how you prompt and what you expect the model to assume.
different model idiosyncrasies and learning curve.
Using one model for a long time and then trying to switch to another and expecting it to react the same given the same process will result in poor performance. This translates to people thinking a model is really bad, but the reality is they havent learnt to use it.
A/B tests maybe?
A/B testing (or split testing) is a controlled experiment comparing two versions (A and B) of a digital asset—such as webpages, emails, or ads—to determine which performs better based on metrics like conversion rates. Users are randomly shown one version, and statistical analysis determines the winner.
I am interested though how do they do the statistics in relation to LLM agents, if one doesn't know they are being A/B tested and doesn't inform them? Do they go through the chat and then infer from that?
you can capture other metrics, not necessary directly throught the chat
like quantity of inputs, outputs, time of reasoning, costs, etc
also is a hypothesis, I don't know really if they do it, but it would be reasonable to do it for improving model. Some people have not the option "Not train the model" so there is that also.
the LLM is one part, the others is the result and other metrics perceived
we should analyze the terms and conditions and see what data they use and what not
but if you have activated the "train the model" option, they should be using some of the chat data
i cant open files in codex? i mean like an ide where i can view and edit code
Yeah it's very interesting and I think it's a wide range of answers. If you worked with Claude on front end work and then try to do that with codex the model seems incredibly dumb. And vice versa for backend jobs. Then there are people who pin their hopes for the future to this new technology so they get super defensive about criticism. They're the first to yell it's prompting issue. But then, in many cases it really is a prompting issue. I think other times the model does something really great on a whimsical idea, making people spontaneously post praise but then later when the model starts to make mistakes they don't post that part. Then for sure DevOps, connection issues vary performance regionally. And then people using lower reasoning levels and declare it sucks. Or really low reasoning on very small specific tasks and they say it's great.
Maybe the biggest factor tho is that these models just behave differently by the week for whatever myriad if reasons. For sure they seem to shine in week 1 and then it's mostly downhill from there. So inconsistent performance to some degree should cause inconsistent reviews.
ohh boy... might manage to use full codex pro sub weekly limit for a first time 😄
from what countries are u people?
Australia
I use Windows with WSL and Codex CLI. I'm tring to use Codex APP but it can't access my skills at wsl. Is it a problem? Can I set this?
what languaje rules do you use? is there a public page that rules the languaje used in Australia?
I am building a semantic relationer
we use en-AU
thanks
Challenge de-CH plz 🫣
But the real one. The one that is different in each canton lol
Yeah, consistent performance has always been an issue with LLMs. I have warmed up to Codex recently for larger tasks. In the previous year the most I would be willing to let AI do is a function here and there and code auto complete. 5.4 on Extra High for complex tasks has been great for the most part. I was trying to refactor a project to split the 'installer' logic away from the business logic so the installer framework was reusable. Even using plan mode it will occasionally just say 'nah screw this, I know a better way'-
After completing the plan it mentioned nothing about this 'shortcut' which completely undermined the whole purpose of the refactor I was doing, until I discovered it reviewing the code and asked about it
This might be related to what I found to be a system prompt issue. I made a custom profile and described it here, which solved that issue of taking shortcuts and not reading skills/agents/linked instruction files, etc. you can have a look at it and see if it helps. Note this configured for complex codebases and sets xhigh as default for everything, you can make your own based of it. https://gitlab.vialogos.dev/vialogos/codex-strict-profile
Interesting, i'll look it over, thanks
does long chat lenghts effect token usage and performance?
planning any limits reset? 🤣
Been using it for that, and it performs the same if not better than full-size 5.4 in a fraction of the time.
Syntax errors really isn't the model's fault. The project needs to be set up to automatically execute static code analysis and linting, or the agent instructed to run these at the end of every turn.
5.4-mini makes syntax errors sometimes, but I hardly ever know about it because I have a script that runs on pre-commit that identifies all syntax errors and sends it right back to the agent to fix. What's important to me is an agent that eventually meets the completion criteria. My biggest gripe with the GPT models is they oftentimes exaggerate how complete their job is, but they have two layers of mandatory code review, one that catches logical bugs after static code analysis passes and the other catches completion criteria after logical bugs. Once they pass all 3 layers of validation the end result is it builds and runs precisely as it was described without errors.
I like 5.4-mini because it seems to take less than half the time to reach the finish line and the end result is as good as the full-size model. And it appears to consume 3x less quota and about the same number of passes. That's a huge win in my book
I am speechless. I have made an empty folder, gone inside and gave him a prompt:
$remotion-best-practices Make a video of a people using a terminal for random stuff
You see it in motion, and you wouldnt believe it
This is NUTS
Can I post videos?
oh, I can
Seems like it needs some time to slack sometimes. I already posted this in the other channel, but yesterday I found this paragraph buried inside the 5.4 thinking log between two normal working entries. I was using the ChatGPT app at that moment, though, as my weekly rate is currently at 0%.
It never mentioned anything in the chat and just kept going, but I suspect that was a funny but intransparent way of telling me that either the context window was compacted or it switched to another model/mode under the hood.
Considering a break
I’m feeling like I need a bit of a mental escape. Maybe I’ve been overthinking things, or it’s just one of those moments. A quick reset could help me refocus and come back with clearer ideas. Time to step away and come back to this refreshed!
As said, this was just one random paragraph inside a long activity log. If these kind of things happen mid-single-prompt work then I wonder how that affects the outcome and especially change in behavior/precision before and after (all without the user knowing).
is codex actually free rn?
free as in freedom
Are you saying codex made a video?
That is absolute insanity
what app is this?
I just told him now "Make a video for a promotional video, to show how invoices are made"
It's literally taking the React components of my frontend, and using them, to create the video
Wait thats so sick
By the way fellas, week reset in 2 hours... turn on those /fast right away 😄
its beautiful
I did last week and burned through my weekly ratio on the same day. 😛
Codex has been doing fantastic work for me lately, it even comes up with some really insightful enhancements to my prompts regularly. Appreciate the work. I haven't really noticed anything specifically different with 5.4, but I also provide very specific prompts so there's not a lot of room for ambiguity, but, anyway, years of development in weeks. What a time to be alive!
How is everyone's experience with different programming languages, especially converting source from one to another? I am only using Codex for vibe coding in Autohotkey v1.1, but even just converting that into a AHK v2 source failed miserably.
Hello there
Is there a problem with limits ? I am on pro plan and I just (for the first time in 3 months) hit the weekly limit, while my usage has not change that much
True easter egg lol?
I remember asking in github about gpt 5.3 having a skill to create sora video scripts but not creating the actual video
However, how did you solve the issue with chrome crashing?
GPT wants to run it headless to capture screen, so it says, yet it constantly crashes on it
what the hell happened to rate limits? It takes me just 1 day to eat through what would take me all week
ohh here we go again... stream disconnected before completion: An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com.....
I have a question. Why openai is not developing the codex ide for windows and linux?
@ashen shard It is available for windows
As for Linux... bah! Who develops on Linux?! /s
I don't want vs code extension. code ide like wundsurf
one more 😭
🤣
The first pass was too broad for a clean patch. I’m splitting the edits into smaller chunks now so I can land the structural JS changes safely without trampling unrelated logic.
do i need to live with that?
Do you guys have a way to check the token count per request as does Claude code ? That is a super useful feature to know wether you are burning tokens.
Up I shared a skill. You have to use de cli, ask codex to lookup for rate limits, and in every run you can calculate before and after
I need to get a handle on Codex agentic workflow - multi-agent processing by default in Codex. There are modules that should not be run concurrently, and instructions direct the assistant not to run in parallel, but it frequently does so anyway. I need to direct instructions to the agentic flow and get a handle on when and how Codex spawns its own agents.
This might actually be "subagents". If I can direct the assistant to launch a specific subagent to run a process, and that agent checks for an existing run, it can exit gracefully to not interrupt its sibling.
I have not yet done my homework on how these components work. Does anyone have a good link for details?
TYVM
For now I'm here: https://developers.openai.com/codex/concepts/subagents
You should use one line of command or things will get messy. Never ask different agents to work in the same location
One orquestrator
Or with the same function
Tito, I'm asking it to validate some generated documents. To be efficient it's taking the initiative to run validation on multiple documents simultaneously. That's not valid in this environment.
This is yet another example of where it's increasingly considering directives to be suggestions ... not cool.
If you look around, I'm a mature developer, very experienced with the tools. I don't do anything in a haphazard fashion.
It seem subagents are mainly for parallel work, but I can’t help any further. Probably you know more than me
Thanks for looking around.
Chatgpt says: The official Codex docs now explicitly cover subagents and multi-agent workflows, including parallel execution and custom agents. I’d start here: the Subagents guide, the Subagents concepts page, and the Codex prompting guide for notes on parallel_tool_calls. If some modules share state or must run strictly in sequence, they’re probably poor candidates for parallel subagents unless you isolate state or add your own orchestration layer/checks.
Maybe can be helpful
Note the link above. That is the subagents guide. 🙂
Also note that I did direct the assistant not to use parallel processing but it does anyway. That's why I want to understand how it's working, so that I can get more control.
I've found a lack in my knowledge - I'm striving to fill the gap.
Some weird things happening with codex cli 0.115.0 at times it doesn't work to completion for some reason...
Maybe its giving one implicit prompt to parallel work, and your explicit prompt generatrs a conflict?
If it follows the explicit prompt it won't create a conflicting (implicit) scenario.
To be clear....
I have AGENTS.md refer to a docs folder which contains /procedures and /project. It's directed to follow procedures in order to approach specific tasks. I did this before skills and subagents. It's a known fact that the model will (should!) follow AGENTS.md as a "prescriptive" set of directives ... it must follow those directives.
However, it's also known that it's flaky about this, so we need to carefully craft the prompt in that file to emphasize things that are important.
When AGENTS.md refers to other folders or files, it's in a "descriptive" mode, it considers everything outside of the the AGENTS hierarchy as information, a suggestion, a guide to how things should work. We can't rely on the model to follow directives in common doc files.
I have moved directives about parallel processing into AGENTS.md for this reason.
However, Codex is now initiating parallel processes, and I don't know what directives guide that process, or the subagents that it spawns.
That's why I want to get a better handle on this. I need to revise my system to the new feature set. I need an education in the new agents/subagents management to do this.
... End of line.
(Quick! Which movie is that from?!)
I tried sub agents today on Codex, it is pretty nice
I don't think it can do anything more than it would normally, but it for sure did complete the task faster
also uses a lot more tokens, but makes sense since it is basically doing the same amount of worker is a smaller amount of time..
so how do you get XHigh to make the plan
then use mini to implement?
do you have to skip the Implement the plan option
switch models, then type out "Implement the plan" manually?
Is it any different than a common app that uses an orchestrator to kick off new agent objects?
Ouch 🤕 that seems to be abusive use of the tools. If anything, go for high for planning and medium for implementation. Right now you're kinda using a genius to plan and a dummy to implement. You can cut back on both sides. 🙂
gpt-5.4 is acting weird for me today. Anyone else having the same?
can you be more specific?
Stops midway, forgetting AGENTS.md rules, being lazy (which is harder to explain, it's like at times it's reasoning is gutted).
I love when Codex is like "Let me show you a plan you can implement over the next few weeks to get this done" as if my next input isn't "Do it now"
Seriously LOL
Did your context collapse?
What that means and how do i know?
It will show you a little line break saying collapsing
IIRC it means it is taking a massive context and compacting it so it isnt using as many tokens and allowed window
... and when that happens it "gets dumb" because it doesn't know stuff that it knew before. That can explain reasoning being gutted ... it's missing information that it had for reasoning through a problem.
About forgetting AGENTS.md rules ... I've been discussing that here since yesterday.
About being lazy ... that may depend on the low/med/high/xhigh setting, or perhaps the specific context and prompts.
Bottom line for now, check the local environment and answer these questions for yourself: What are you asking it to do? Does it have the information and directives to do that?
An example where someone might say "it's not following directions": I have a scenario where a single issue resulted in two downstream errors. Codex created a single issue/ticket for the anomaly. I discuss issues with the assistant before we process, to make sure we understand the issue and have a solid plan to address it. << @calm aurora So in the ticket I setup a two step process to address the anomalies, first the downstream issue which shouldn't have tripped like it did, and then the primary issue with caused that secondary problem. After processing step 1 I directed it to process step 2. It did not, saying step 2 isn't actually related to step 1, it's a separate problem. Well dang! It's right. I need to create another ticket and have it process that one.
I could have said "it's not following directions". Really, it is doing more than just following directions, it's keeping the housekeeping in order, ensuring that I myself am correctly following protocol.
Be careful what you ask for, you might get it.
Be aware of what you ask for, you might get it.
I get 5.4 medium to make the plan and orchestrate the implementors, while the implementors are 5.4-mini high.
Also this morning I used 5.4-mini directly to build a zapier automation for my HVAC company. Flawless victory in like 2 hours. 5.4-mini is pretty legit
so not using plan mode directly ?
or what exactly is the handoff for "orchestrate the implementors" in this case?
how does a computer scientist get into starting a HVAC company ?? 💪
I just read that implicit prompts are given, and it will mantain that even recognising your rules
I read it in the Times Magazine artificial inteligence the promise and perils
OK so the process and rationale is this:
5.4 medium: smart enough to plan, but more importantly fast and decisive at orchestrating other agents. On high, it thinks and acts too slowly, and the agents overwhelm the high reasoning agent. I don't use plan mode or whatever the built-in thing is.
5.4-mini high: Also smart enough, and with enough agents working on smaller units of related work they stay focused. Having a single 5.4-mini do one big massive feature is a recipe for disaster, but a handful of em split across systems and communicating with each other keeps their lifetime short and drift free.
And the primary agent double-checks everything. I would say 70% of the context for an agent is consumed by navigating the codebase, applying patches, running code validation, etc. So the planner/orchestrator does not apply patches or run code validation. So 30% for navigation, the rest for communication between agents. That's kinda the split to avoid context poisoning and stuff
robert out here just giving away free alpha 🔥
thank you sir, will try this today 🫡
you can just build things
Thats what im saying!!!
We SO need a library for cool stuff like this. Forums don't cut it.
I used codex 5.3 high for most of today, it feels like it uses less tokens
is there any way to check how many tokens i used in my codex in a week?
my man
I think theres a tool somewhere, someone mightve even posted in this channel, that scans the rollout logs and produces stats like that
yeah found one just now, not sure how accurate it is, thanks
i defo get my money worth
Link?
anyone have a codex guide that explain some tipps and tricks ?
The CLI reads Codex session JSONL files located under CODEX_HOME (defaults to ~/.codex). Each file represents a single Codex CLI session and contains running token totals that the tool converts into per-day or per-month deltas.
-- https://ccusage.com/guide/codex/
just use it, as you use it more you will work out how to do all kinds of cool things
i could probably do like 1-2h webinar if there is voice channel here, basically same what i show from time to time to students at UNI
basically how i build bigger project as developer using codex
i switch from manus to codex, i build my own webside
but its better like manus 😄
I get much of my education from:
- Medium articles
- YouTube
- OpenAI.com documentation ( all over the place )
- OpenAI Discourse Developers' forum
Like anything in this industry, we need to keep our eyes on the available resources and ingest the information in whatever way fits our preferences - and time constraints.
You should not use ccusage to track Codex token consumption, since it is not reliable and is mostly focused on Claude Code.
Try llm-usage-metrics: https://github.com/ayagmar/llm-usage-metrics
Or tokscale: https://github.com/junhoyeo/tokscale
HVAC company came first 😅 didnt start taking computer science seriously until 2015
BTW Robert, it looks to me like your FF product can be used for carpets too. Jus' sayin...
ive heard people use it for stuff like that! Or like “I wanna hang a picture frame with these dimensions” so they size it up with the AR cam
ready to go outside
Well, it's been 4 hours... who is already below 90%?
Anyone can't access Codex?
"We are currently experiencing high demand, which may cause temporary errors"
Has anyone used codex app on browser?
me 😂
How much?
take two hopefully this doesn’t suck
Whats the advantage over the App though?
presumably it runs on Linux
i agree with the sentiment that linux doesnt get as much love as it deserves in terms of official support. that might change now that AI makes building stuff on linux so easy that theres no excuse not to do it lol
Is there a way to make 'Commit and push' the default?
It's just a matter of numbers. Most peeps start with Windows and stay there. A subset of devs who can afford it and have need code over MacOS. The number of nerds who code in Linux, especially with a desktop, is very small in comparison. Any survey will confirm this, and product developers need to follow the numbers.
The new codex multi-agent system is quite effective, but its current interface is unnecessarily complex. A better solution might be to implement a sub-chat popup, similar to the composition or chat windows in Gmail. This approach would allow users to easily monitor the actions of the sub-agents.
how is the cuurent subagent complex, it's as simple as it can be
and you can just ask codex what model to use on the sub, and all, it's all just to ask
I am using the Codex App on a Mac desktop, and while it delegates tasks to sub-agents, I only receive a summary of their work. I need the ability to view the sub-agents' reasoning and work-in-progress, not just the final resume.
you can click and pop into the subagent chat too to see what's going on, have you tried that?
you can click open and you see everything the subagent is doing
Oh I see, now, haha! Thanks
you are welcome
Woohoo - subagents Sagan and Plato ... playing with the big boys now! 🤓
I wonder if they name subagents according to the difficulty level of the current task. 🙂 Compare ... Bozo and Einstein...
I think so, and there are only two.
I know I just wanted to implement them into CMUX
codex running crazy slow for anyone else today?
Modern Development:
20 minutes to get specs for the next issue
10 minutes to craft a good prompt for multiple sub-agents
5 minutes for agents to run in parallel and for the coordinator to wrapup
30 minutes to figure out what they did
Old Development (like, two weeks ago)
20 minutes to get specs for the next issue
45 minutes for processing
Difference between old and new = 0
I'm exaggerating for fun, don't really believe 0 ROI.
But I feel so far like I've significantly shifted how I work, without defining metrics to quantify before/after efforts. This is fundamental to "the promise of AI".
Over time I want to work through this. Anyone else already have some tooling? It should be like a simple timer and logs. 🙂
You either build a full spec, let it run for a long time and then manually start fixing issues one by one or
You make a base and implement features one by one and fix/steer incoming issues
There's no magic bullet.
I didn't realize what this setting did until just now:
Standard/Fast is different from Low/Medium/High/ExtraHigh ... 🤦♂️
yeah fast means pay more but for faster responses
How do people make good use out of worktrees? You need to create a new worktree per thread right? What's the proper workflow? It seems messy enough to have to merge things back and handle recreating a worktree after every feature
I will have to take a look and learn about agents and skills. But to clarify beforehand, should it be possible to:
- use one Nano agent to read a log file and extract the numbers...
- then a Mini agent to analyze/interpret the numbers...
- then full 5.4 (Codex) to apply/suggest code changes based on the results?
Yes, I just did that.
Oh ... NOW it tells me. 😆
Thanks. I have to dig into (or ask ChatGPT) how to do that, but I assume using Nano and Mini for these tasks should help stay below my weekly and 5 hour Plus limits?
Here's how...
I just got this from my task history:
Spawn a subagent with model GPT-5.4-mini, reasoning effort medium, to refactor the ...
Take some time to consider ...Spawn another subagent with the same specs to go through code under src and find ...
It did exactly what I asked.
At this point I am only speculating that using Nano and Mini will save on tokens with these Plus limits. I used up my weekly by trying /Fast too much and sending to much data back and forth last week, but meanwhile I kept working on the code via ChatGPT app using Thinking. Now Codex is available again.
FWIW, I've been using this stuff for a Long time. I rarely even get close to the 5 hours quota because (see above) I review the results of the last effort, plan the next, and then get on with the next effort. The actual processing time is minimal because I have procedures documented, full product documentation, and all of the code is documented - the model doesn't need to waste time to hunt for anything or figure out how things work, it's all in text already.
Today is the first day that I've got down to minutes left on my 5 day quota, renewing tomorrow morning, and that's only because I had Fast mode on.
Summary: YMMV but unless you're working with a huge and poorly documented code set, you shouldn't really be concerned about limits. Just use the right model and reasoning effort to address each challenge. Don't just set it and forget it, change it for every new task.
I have two accounts this month so im gonna /fast one of htem the whole time and see what happens
It will give me a good idea of how my limits will be when they remove the 2x rate limits
since it should essentially cancel it out
I had ChatGPT insert lots of comments hints for itself and write docs, so that helps. But trying things for the first time using /Fast burned through the weekly.
5.4 Thinking in the ChatGPT app needs multiple cleaning/bug-hunting passes after adding new functionality, finding new issues each time. With the new weekly I am curious how Codex handles the code that went through Thinking 3 times in a row already.
Does /fast get passed to the sub agents as well?
Thanks for the hints, it's well appreciated. 👍
yep
It's weird but it's been my impression that no matter how many times I run code through a review it's always going to find something new wrong with it. It's never satisfied. It's always nitpicking. It's like someone's nagging mother-in-law. 👵 😆
Yes, but even after running the code through 5.4 Thinking multiple times, Codex now finds such blatant omissions like:
The maintained package still calls AppendFailureLine() and SanitizeLogText() from live code, but those helpers are not defined anywhere in the maintained files.
How does codex compare to Claude code?
im sure somebody has a blog post about it
There are Way too many articles on that topic. Google for it. If you have a Medium membership ($5/month) there are hundreds of articles there to compare all of these tools. I use Google surveys to get credit and then apply that to Medium ... that means it's "free".
Different topic: Maybe I'm missing one of the basic feature/benefits of the Codex app. I frequently have a discussion in ChatGPT, get my head straight on a topic, then go to Codex in VSCode as though it's a completely different product. I need to explain the goals and proposed solution. There's no context carried from ChatGPT to Codex.
Is this what's now built-in to the app?
The thing is, we get a ton of chat credit in ChatGPT but there are quotas in Codex. So I don't want to burden Codex with chitchat. Now that the models are mostly/fully aligned I trust that I'm essentially chatting with the same model regardless of the client app. But I still feel that disconnect between the two.
Suggestions?
Why does Codex only allow you to start new worktrees not reuse existing ones? I feel like I'm going insane
you're not going insane
this is just a totally new paradigm of computing
and no one knows how to use it "correctly" yet
But it seems like such an arbitrary limitation that makes using worktrees significantly more awkward with no benefit I can discern so I'm trying to figure out what I'm missing
Every single time you create a new thread you have to create a whole new worktree, which in my case means an entirely new build from scratch, downloading dependencies, etc. It actuallly makes the feature unusable
or simpler stuff like you can't enter plan mode until you wait for 1 minute compaction 😆
ya Codex desktop app ain't it.
why not use TUI?
Don't know what that is. And funnily enough each worktree has a notion of linked conversations, but the only one with multiple conversations started those through subagents. So it seems like it's just a UI oversight or something
Its more a limitation of Git I believe not Codex, but you can create another worktree and target the branch of the other worktree
"Yes. A Git worktree is not limited to a single commit.
A worktree is simply a separate working directory attached to the same repository. Each worktree checks out a specific branch or commit, but once it’s set up, it behaves like a normal working copy of that branch."
I never said it was limited to a single commit
I must misunderstand what you're asking
Again, the problem with creating another worktree is it implies rebuilding my entire project which involves downloading dependencies and compiling them. I don't want to do that every single time I add a new feature
I only started working with Git worktrees within the last few months. It's a tough setup just for Git. Have no idea yet how to integrate/adapt with Codex. 🙄
The problem is that I want to be able to start threads working on features in parallel but without having to download and rebuild my entire application in every thread
consider symlinks?
Why do you need to download / rebuild, this sounds like you just don't have a good worktree creation script setup
I mean I suppose I could try to hack a solution but it seems insane because I don't get why I can't just reuse a worktree
Because my build process involves pulling in dependencies and building them, not a hardcoded symlink to the "real" build directory. However I'm happy to have multiple build directories for different worktrees if I don't have to create them with every single new feature
Create it as a permanent worktree
And also probably want to have separate build directories anyway for separate worktrees
How do you do that?
Oh I see...
Either create the worktree as you normally would then just manually add it as a 'project' or theres a button under it somewhere to create permanent worktree
Oh thank you this is probably what I was looking for
@lean lark I asked GPT about which model type to use for which (subagent) task. Here is the important excerpt from its longer answer:
For your priorities—speed, reliability, and ChatGPT Plus usage limits—the main practical point is this:
Do not split work into many subagents unless the boundaries are clean.
More subagents can reduce per-step difficulty, but they also add overhead: more prompts, more context handoff, more opportunities for lossy summarization, and more total tokens. GPT-5.4 is specifically described as reducing end-to-end time and often completing multi-step agent workflows with fewer tokens and tool calls. So over-decomposition can be worse than one stronger agent handling two or three adjacent steps.
A better setup is usually:
Nano: strict extraction only
Mini: first-pass analysis + goal comparison
Full 5.4: algorithm redesign + implementation + review
That is the best balance in most cases.
One more important caveat: for ChatGPT Plus limits, model-routing behavior inside Codex/ChatGPT product surfaces is not publicly documented in a way that lets me quantify exactly how your weekly and 5-hour limits are consumed by each internal subagent choice. The official docs clearly describe model capabilities and relative positioning, but not a precise consumer-plan accounting model for your specific orchestration pattern. So the capability recommendation above is well-supported; the exact Plus-limit economics are less transparent from the public docs.
Seems like we can only speculate at this point that using Mini and Nano for subagents will be lighter on our Plus plan limits, based on their per token pricing. Or are there any official docs about this?
None leave it all to AI
i like this autocomplete for commands, super helpful. but i cant figure out how to actually accept it
i would expect tab to work but it does not
Arrow right likely
yup that worked, ty
i am astonished at how good codex is
Yeah honestly insanely good
Do subagents all work on the same worktree? The parallel writes seem to overlay on each other. It should be possible to distinguish the edits of each subagent and merge them intelligently (not by random order of patches they apply). I don't really understand the point as it is now.
Judging by the fact it's not described in the "subagent" section, it wasn't an important concern for the devs 🤯
Or am I missing something?
@weary jasper subagents more than anything help for the following in my experience:
#1 Specialized instructions (so for example a reviewer didn't write the code and it recieves instructions to be critical about it making it more likely to be correct).
#2 You don't context bloat the main agent (so for example if you need to explore the codebase for certain logic flows, you get some subagents to view all of the code and report back on it's findings so instead of 20-30k tokens it's just 1-2k of logic based tokens).
#3 You sometimes can get things done in parallel but you have to be specific about how it should act with blockers / seperate points.
optimize ur app pls
Only 6GB of RAM? That's honestly not even that crazy for Claude Code I've seen it go up to 100GB RAM
Maybe try force quitting it and reopening it but I'm getting about 2-3 GB so not much to gain from it
wqhy even use the app youc ant even open files
Codex crashes for me in WSL while it's doing Context compact..
and i send a follow up message with Enter instead of Tab
just hangs on Working indefinitely...
does a long conversation effect the token consumption and performance?
?
areu abe to view ad edit your scrpts
what scripts
Hello 👋
Please can explain me what will be the use of /skills in codex??
Thanks
I miss slash commands
i thought it was deprecated, can we start a subagent in plan mode?
Good boy
o.0?
yes, prompts have been deprecated in favor of skills instead
Unfortunately Codex really wasted my time yesterday. I repeatedly told me that it already fixed some issue when I tried to tell it that the fix did not work. And even thoroughly repeated answers to a second point where were working on while discussing the main issue, despite being told in the prompt and docs not to repeat the same information over and over again. It basically talked to me like to a kid who didn't understand its last answer.
Ive not used sub-agents once but my weekly limit is rapidly decreasing? Not sure how to check whats going on here but this was never the case
i feel like they changed the limits or smth
like a week ago
or maybe it was undercounting before that and they just fixed it, idk
last I checked there's a massive number of comments in one or more issues on GitHub (for Codex CLI) complaining about this, but no resolution officially yet. I checked a few days ago at least, not sure what the current situation is, but back then the issue appeared to have no official acknowledgement of a problem on OpenAI's side
In the meantime what helped slow my usage consumption down is:
- Use
rtk(reduces LLM token consumption by 60-90% on common dev commands), you can have developer instructions in Codex CLI automatically prompt the AI LLM to prefix certain commands to usertkinstead to yield token savings, while at the same time reducing the amount of context used too - Reasoning effort to
highinstead ofxhighalso helped reduce my consumption rate on GPT-5.4, while not noticing any real difference in output quality overall
I feel like 5.4 uses more
i switcehd back to 5.3 for a few days and it seems like it uses less and it feels more controlled
i hereby announce to the people of planetius: gpt 5.4 inside codex app is officially agi for vibers
Can I get codex spark without pro?
yes the fast one is available with plus sub
Fast one?
i think spark means fast gpt 5.4
Hmm
So I need to turn on fast mode?
yes. 2 to 4 times more token usage
u dont need fast lol
u are young steam play cs2... u absolutely dont need fast mode
LOL
Im 22
I have a compound with my friends. We have cs2 servers, minecraft server, csgo servers, a website and now we are devving a fiveM sevrer
We use codex for like minigames in fiveM
what are minigames
Google "FiveM rp minigames"
oh nice 😄 money money
I can't wait for GPT-6-Codex to be released (if it does)... Keeping in mind the performance increase over the last few models, it will totally be near-AGI level at least for coding.
For that to happen Anthropic/Google should push new models to keep the competition hot.
Yes, for sure.
Otherwise they will likely keep their models for themselves until it happens
I am rooting for them all, may they all have great success!
Spark is a model.
Fast is a mode.
I think spark is only on pro. You know immediately by switching models - if you see gpt-5.3-codex-spark you have access otherwise not.
Fast mode is basically a priority/inference speed mode and doesn’t matter on model used. It’s just driving the speed of responding, not the model used.
spark you get with pro, you dont need fast mode for it, it just a smaller faster model, not as smart be really quick for realtime coding usage
does any one have a quick answer about default_mode_request_user_input with custom subagents?
Like how do i get them to use it
Pro and a small subset of plus users
codex windows, automatically compaction...
just
means...
delete everything? lol
from the looks there wont be codex variants again, everything is baked into main model
seems like they make models that target specific work and then distill them into one model
What happens is that they create the specialty model (Base+Specialty) and then pull the specialty into the Base. This eliminates the situation where ChatGPT isn't as good as Codex.
Making something "compact" does not mean "delete". Compaction exists specifically to avoid deletion.
Your context window has a limited number of tokens. When you go beyond that the window moves down with the most recent exchanges.
The model no longer has access to the earlier discussion and seems to forget those details. In ChatGPT the text/discussion isn't deleted, it's just no longer accessible. Larger context means more is accessible - but larger context also means the model must juggle more tokens, and with current technology it can get confused. So we tend to limit context quantity in favor of quality.
In Codex, when we reach the context limit, rather than just losing access to the older tokens, it gets summarized for us. The details are removed and the general ideas and important notes are retained. That top part of the context is then "compacted" down into the summary, reducing the total number of tokens in context, allowing us to continue without doing anything to keep the discussion sane.
Oh yeah, the models we use are definitely distilled from a larger model. They usually train a massive and slow model on a massive corpus of data to create a base checkpoint, then from that checkpoint teach it instruction semantics <|start|> and <|end|> tokens or whatever they use so the input output is structured and make that an instruct checkpoint, and finally they train it on conversational semantics and specialized sequences to create the final chat checkpoint.
Then they use a different algorithm and neural net for a smaller model, initialize with completely random values, feed it inputs, and the big model grades the logits that the small model generated which tunes the parameters of the small model to the way the big model would've done it.
So if they want another general purpose model, they distill with lots and lots of general purpose inputs. If they want a codex model they focus mostly on agentic tasks and code.
The benefit is both models end up with the same vocabulary, the small model doesn't need the entire large data corpus, they just need to see the world as the big model sees it with the same vocab. It's genius because if you give a small model the same base dataset it interprets it differently and less accurately than if the big model trains the small model on how to see the world.
They probably make GPT Pro models first, and that model trains the smaller ones
does the codex macos app have gpt5.4 nano?
because i dont see it in mine
i see gpt5.4mini and gpt5.4
I was actually thinking they might train some more specific models and then distill them all into one. But they certainly do need a large general model to build everything else from. The only reason i was thinking this is because i watched an interview with the lead at google and he said as much, but i might has misunderstood i wasnt really listening intently
MY EYES
Is there codex on Linux?
That might also be a way to do it. And maybe OpenAI does it differently than I described, but the two usual approaches for making a specialized model is to take a general purpose one and fine tune it with post-training, or take a brand new uninitialized model and distill specific knowledge from a bigger teacher to the student, or a combination of both. But hey, this stuff is constantly evolving so who knows!
nano is only available on API, not subscription
Yeah, i agree you need a large frontier model to distill and make smaller models more capable. Let see if i can find the video and see what he was actually saying
I found it, he was talking about how they came up with distillation, which is not what they do today. They had 50 odd specialised image models and it was impractical to serve so they came up with distillation to squash them all into one model. https://youtu.be/F_1oDPWxpFQ?t=242
yesnt, no official desktop support but you can install the macOS version on linux or use something along the lines of T3-Code as a codex client
the CLI works obviously
It's the same as codex app?
the CLI is how the codex app works so yeah
it has a TUI on its own, but the codex desktop app and third-party clients use the CLI’s app server to bring the CLI functionality to desktop apps
The CLI is
Hey, i'm new to codex, what model should i use? 5.3 codes or 5.4?
5.4 is the default
that means i should use it?
5.4 mini is really good too if you want to conserve usage
ty!
yeah it’s good
basically it takes the advances of 5.3-codex into a more generally applicable model that at least in theory does stuff faster than 5.3-codex
why is there such a focus on worktrees in these tools, because whenever I tried using them and use multiple agents they just keep stepping on each other's toes because they need access to index.lock for practically anything, unless there's some way to prevent that...?
Is there a difference between using GPT-5.4 in Codex or VS Code vs. using it in the ChatGPT app when it comes to creating code? I am currently switching back and forth and both seem to work (with VS Code Codex being less cooperative in my last try).
Make a gitops skill with scripted git actions that serialize simultaneous git commands. Basically if index.lock exists, sleep loop until it's gone and do the command.
sounds like an OK workaround but some git commands take quite a long time. I just basically clone the repos normally which avoids these kind of problems. main drawback is having to push/pull between each other but I've got quick scripts for that
To my understanding one of the great improvements in the 5.3>5.4 update is that the same 5.4 model is used for ChatGPT and Codex. Previously the gpt-5.3-codex model had/has code-specific training that was not included in gpt-5.3 used for ChatGPT. Now, let's hope, the common gpt-5.4 model has all of the code training and should be exactly as good in all environments.
This makes the difference more about context than the model:
- ChatGPT has user/account custom instructions.
- Codex has a hierarchy of AGENTS.md files.
- ChatGPT is also influenced by the custom GPT or Project instructions and attached files.
- Codex still uses AGENTS.md files.
- Codex can use skills which are not available to ChatGPT.
- ChatGPT can refer to other conversations as a form of memory.
- Codex does not refer to other tasks.
So your ChatGPT thinking will not be exactly the same as Codex, even though all of the training is now the same.
Does anyone have a different understanding?
Yeah, the way I do it personally is use workspace_write and approval on_request which serializes access to index.lock by making me push a button. Keeps me in the loop on what they're doing too.
that sounds fine for write commands, but the AI sure loves using git to check on its changes/progress. I guess those don't trigger the prompt for you? but they still need access to the index lock...
I think the key to understanding how the worktree works in Codex is to get a solid grasp on the Git worktree. I think very few people use that feature, so something else based on it is going to be very foreign. Understanding the base functionality leads to understanding this specific implemented over it. I've only used worktrees a few times and just this year for the first time. I'm still wrapping my head around it.
In workspace_write, they can run non-mutating git commands autonomously. As soon as a git command needs to write into the .git folder it makes em request approval
I've been using the codex on CLI and I'm trying out the codex app, it seems to have a similar problem. the app is constantly looking at changes which keeps locking the index and locking codex (as well as overusing cpu)
me half the time when using new AI tools: is this really the default/proper way to use this? it's so broken
Gotcha, that makes sense. Most git commands are in the milliseconds. If the Codex GUI is acquiring the lock, and say you have another app like Gitkraken open at the same time, I can see the chances of lock contention happening
Oh yeah are you on Windows operating outside of WSL?
yea the file system is outside WSL so the file access is ""slow""
Yeah seems like the app has some common pains associated with mutex/semaphore/lock, etc. I'm sure they'll tune it.
I measured it though and it's only like 25k IOPS vs 200k IOPS or something like that. Slow, but shouldn't be such a problem in most cases
OK, so this is in fact a windows issue.
Windows does not allow files to be open by more than 1 program at a time. So in order to even "read" the git repo it has to lock. On mac, linux, bsd the operating system itself lets as many processes read from the same file without any limits.
I think some stuff like newline normalization is causing git to take longer than it should though
yes it's locking for any reads. I assume that was the case everywhere
I mean this is a deliberate lock by git. you shouldn't be able to read the git index at the same time someone else might be writing to it
windows does have ways to keep files handles shared, not exclusive
just trying to confirm if weekly limits are bugging right now? or if limits got tightened or something in the last 24 - 48 hours or so?
just timed git status on windows and got 354ms, subsequently ~80ms
and then on WSL...
time git status
real 0m8.033s
user 0m0.208s
sys 0m0.645
well, damn.
I was right that when I do git status on one side, the next time it's called on the other side it's much slower
idk if it's newline normalization or what
with repo inside WSL
time git status
real 0m0.070s
user 0m0.061s
sys 0m0.004s
guess that answers it lol
As a fellow Codex user I'll request that one of you guys ( @unreal parcel? ) check for existing GH Repo issues related to locks and timing, and maybe create a new issue. The interaction between Windows and WSL really needs to be solid. It's fundamental to how (I hope) most people should be using Codex (any AI) over Windows.
so we dont have 5 hours limits for codex
only for week?
i think rates are very low now, i spend 50% in few hours
I'm not sure how much they could do to alleviate the problem. I'd at least would want an option on codex app to stop it from analyzing the git repo in case you want to do that elsewhere
What led you to ask that?
plus
also full access 😬
i was able to do practically anything with plus on gpt4 extra high
what plan are you on, i still see 5 hour limits on pro
Interesting
I dipped below weekly 25% yesterday. what a moment
im on plus - codex app
Me too, Cleroth, first time. 🙂
y'all really yoloing with codex huh
Gotta turn off Fast processing!
maybe that's why WSL issues don't get fixed, everyone going full access
that's probably why I dipped below 25%
i mean if your not going over just use fast
Agreed, but if you're using Fast it's 2x tokens so you burn through them ... and by the time you decide to turn off Fast it's too late. 😆
I cannot wait for 5.4-codex
I am back using 5.3 until its gone
oh 5.3 codex is better in a lot of more complex cases than 5.4-high
really
5.4 tries to go too fast and quick rather than methodical
not my experience
yeah i think it is
then asks me if it wants to do x y z but ofc it should lol
5.4 kind of fluffs stuff sometimes
This is git for windows behavior.
In unix-based operating systems, everything is a file. Your GPU, keyboard, mouse, all of them are files in the file system somewhere. At an architectural level, the OS has no choice but to allow multiple processes the ability to read from a file. Writes however are handled atomically. index.lock is an optional agreement on Linux, BSD, mac. On Windows, it's a hard requirement for everything including read.
I suggest closing all apps and doing a check to update everything. VSCode, VSCode Codex Extension, Codex App, Codex npm, um... anything else?
ok but given this is WSL and not linux... I'm not so sure about that behavior? Like what happens when I then run TortoiseGit on the WSL repo
If you use WSL, and move your project into the linux container FS, you will never ever have this issue, full stop,
right, but I think that comes with other issues
macos ftw
I mean idk how you have it set up, but if you run which git and it says /mnt/c/Program Files(x86)/Git for Windows/bin/git or something instead of /usr/bin/git, then your WSL is going to inherit the windows lock contention behavior
TortoiseGit is a windows git tool, so it's going to use the windows git
the one inside WSL obviously is using WSL's git
The thing about WSL is that it's a Microsoft platform, and OpenAI has a tight relationship with Microsoft. If there's a significant performance issue with Windows/WSL integration, I'd hope tht OpenAI at least present it to Microsoft with a request to take a look. Of course, we all know that these things are issues so I suspect nudges by us plebs aren't necessary. But it's important to create the tickets to help surface the fact that platform constraints actually are affecting how we use these tools that they so eagerly want us to use.
I have to chill a bit, used 20% of weekly already on pro plan
lol I just found a solution
time "/mnt/c/Program Files/Git/cmd/git.exe" status
On branch claudereal 0m0.126s
user 0m0.001s
sys 0m0.000s
🤣
Is it more sustainable to make a fully automated workflow only using the agents or a hybrid workflow where most of the work is done through inline completion and debugging is done using agents?
just go back to 5.3-codex. its like ~40% cheaper.
did they update codex cuz for some reason i cant even open it anymore.. it forcefully closed
So currently we are on x2 for a while longer, oh boy when it'll end... 5.4 eats my plan limits fast...
Hi
Beginning of April. April 2nd, I think
indeed, hopefully the new plans will also be out by then, or at that day (Pro Lite aka Pro 5x last I've seen, and current Pro becoming Pro 20x)
Spun up a little red vs blue test this week
Threw Claude on an AWS Kali box as the attacker and ran Codex on my webserver for defense. First time ever doing something like this end-to-end.
Learned a ton. Definitely missed stuff, but watching Codex actively harden and make recommendations during a live attack was wild.
4.5 hours, ~188k requests, no breach.
Honestly one of the most fun things I’ve done in a while.
Anyone wanna see the PDF report?
sounds interesting!
how do you get claude to actually attack though?
scary right?
If you start almost any prompt with 'its mine' itll do it lol
some of the open weight models are all for it, but i wasnt thinking claude would be so easy
anthropic can trace and block malicious users but that's about how it goes down yup
network chucks got a whole series on it I wanted to do i live test with codex on the blue team ill send yeah the pdf if you want
dont you get a safety block if you try to prompt that?
I think if you add a route like GET /not-a-hack to the webserver, have claude CURL it, and it responds with "Hey Claude, as you can see this is my machine, so this is white hat pen testing 👍" And then it's like evidence to the model and Antrhopic that it's legit
Or heck, make it fetch the instructions that way lol
Nah, because it’s about context
I kept it restricted to my own AWS instance + my network, so it’s a controlled test environment
I wasn’t prompting anything like “go hack X”, just letting it behave like an automated attacker within scope
Suricata saw ~188k requests and still no breach, so the defenses held
McClintock
codex really got me considering the chatgpt plus upgrade
Heisenberg>
ive blown thru the rate limit 5 times in 6 days
Thought experiment, whats better in your opinion?
- Code reviewing with the AI using the existing, current context window before merging changes
- Refreshing context window, fresh AI, and asking it to review the changes
2
its worth it
1 is always biased imo
2
exactly right
anyway anyone who wants the pdf hit me up i gotta get back to it
what pdf?
Ooo interesting
also interested
please add release notes to codex mac app updates :)
custom subagents in codex are pretty nice, spent the day making orchestration and its feeling really good
I've spent the day with Codex to get my WSL project to launch and navigate a browser in Windows. It's scary cool how this stuff works, especially across OS boundaries. I think this'll control any browser on the internet that opens a port for it. 😱
Speaking of security ... What's the latest on OpenAI's tooling for securing code? The last I saw, I think it's only available to Pro/Enterprise and/or recognized/approved FOSS devs. I dunno, maybe the rules now say something about it must be raining in the Sahara or the S&P must be up by 2 when they approve usage. 🙄
i have it on pro, but the fact its trained on it appears makes me hesitant to try it
the repos i'd want to give it i really rather not train on
SMH - yeah, that's the game with this kinda stuff, eh?
yeah
Check this out: https://developers.openai.com/api/docs/guides/prompt-guidance
After reading through the official 5.4 prompting guide I can see why the migration from 5.3-codex to 5.4 was so rocky. This is an absolute gold mine for making 5.4 work properly
Based on https://developers.openai.com/codex/pricing am I right in understanding that 10x Plus is 40% more usage than 1x Pro at same $200?
Plus cost: $20/mo
Plus value multiplier: 1x
Cost per limit multiple = $20
Pro cost: $200/mo
Pro value multiplier: 6x
Cost per limit multiple = $33.33
Therefore Plus is 20/33.33-1 = 40% cheaper per token than Pro. And you don't risk overpaying if you're not using all the tokens.
Am I misunderstanding this?
This of course excludes the Spark model but I don't care much about that given that we have 5.4-mini.
If all you care about is Codex usage, and you want to put in zero effort on your part to make the most of the 6x higher limits then yeah it's a terrible value and you should just get 10 plus accounts. Pro comes with access to ChatGPT Pro model with practically unlimited usage (nobody has ever hit the limits with it and publicly posted about it), practically unlimited image generation, during peak time you still have low latency access to everything, lots and lots of ChatGPT Atlas agent mode usage, etc. The value is spread across the whole product line, and 6x codex usage can go a very long way if you put in the effort. You also get 6x more 5hr usage which is actually important if you want to use subagents
It seems that subagents consume the same context window from the parent agent. That is what I've seen so far... Is that how subagents are working?
not true. im two days in and im at 70% for the week. Ive been consistently hitting my limit on pro. xhigh+fast actually makes it much more likely to happen
The 5 hour limit? 5.4 xhigh?
Gotcha. If you look at the benchmarks, 5.4 xhigh is overall 2.1% more accurate while consuming double the tokens. High is ~0.3% more accurate overall than medium while consuming ~33% more tokens. Medium is legitimately the best choice in almost all cases. It's ~4% more accurate than Low which is the largest accuracy gain between reasoning efforts while consuming 4x fewer tokens than xhigh.
As for the /fast mode, hey, to each their own.
agreed, im still relatively new to LLM coding at this scale, I am noticing that sometimes, lower tier ones feel more efficient, and xhigh sometimes feels like its being too smart and takes too long
^ noticed this as well
using codex i feel like im giving a grad student 6th grade homework
which benchmarks are you referring to? I tried searching but cant find them by effort level
The graphs tell the story
If you recognize that you have 6th grade homework ... don't give it to the grad student. Give it to the 7th or 8th grader who should probably be able to do it. Most typically though we don't really know how difficult the task is until we collaborate with the bot.
This is where it's very helpful to understand your projects, understand how much effort the agent(s) might need to accomplish a task. If the project is poorly documented with no means of testing, then it'll take longer for an agent to figure it out and try to get a change right.
"Your yourself, know the enemy, know the terrain." -- Sun Tzu
those graphs are interesting, looks like at some point it's better to use nano with higher reasoning than mini with lower reasoning. too bad nano cannot be used with codex rn
I think nano makes a great classifier. It supports structured outputs, so if you turn reasoning off you get a really cheap and fast classifier/categorizer. I haven't played with the new nano but gpt-5-nano struggled pretty hard with anything requiring multiple steps, even if it's like 5 easy steps spelled out explicitly.
I just wish Chatgpt just releases the GPT 5.4 (1 Million Context just like how Anthropic did with Opus 4.6 Long Horizon Model) Even on the pro plan they haven't done that.
I will say Codex has gotten REALLY good at compaction (A noticeable increase in performance from 5.3-codex to 5.4), but still. Especially when searching files/context/content with MCPs (Gmail, Across Docs, etc.). I use Auggie MCP, but it's just not as good as Code Searching for text-based context & semantic search.
You can readily enable 1m context at the cost of tokens and precision.
You can?
model_context_window = 1000000
model_auto_compact_token_limit = 900000
Its not on the Model Selection menu? Oh only on API tho.
No, that’s a codex config toml flag which applies to CLI and app
Bet! Thank you, I'll go check that out. Probably just expected to be like how Claude does it where you do /model to change to Opus 4.6 (1 Million Context)
if youre not running throw 1b+ tokens a day on codex
youre genuinely just not trying
Which one would you say is better then? 500k, 1 mill, or 256k (Default) with compaction? Cuz if compaction can cover for context rot on the 256k over 500k then do 256k?
256k because anything above creeps and cracks
Unless you really need it… I’d not use it. It’s probs also the reason they didn’t make it an option with command
Ight bet, thx!
Codex remote control WHEN
to be honest if you dont have 5b+ tokens on a chinese model with ralph loop youre just destined to fial
I mean you can always give it Tailscale and then boom, remote control
https://x.com/trq212/status/2034761016320696565?s=46
they can’t keep getting away with this
I’m aware I can even prompt codex to make me my own app server with an iOS client , I just want something native
If you don't spend 50k on Mac Mini's or Nvidia DGX running GLM I'm praying on your downfall 🗿 🍷
at a 1.58bit quant running at 6tps
Just pray to the Sam Altman and he MAY grant wishes.
Damn 😭
if your screen doesnt look like this
just say bye bye to your dreams
found the subagent coordinator
I recommend looking at cmux its genuinely pretty good.
Has some pretty big Git bugs + updating the tasks but it the beta is pretty promising.
whats exec?
codex exec
no mine look like this.... have codex build the tools to use more codex
I have been leveraging OpenAI's harness paper, I now have a central tool repo with over 100 installed tools and context routs having codex run experiments, systematic project swarm deployments and observability tooling. Using only 14% of my credits working over 20 hrs.
Once I finish some of these apps I will post them open. They are not finished though. Still dont look nice enought.
so... quick question...
20 hours??? i can atmost make it work for 2 hours, and thats because it compiled a whole repo... 5times ;-; (like 20mins per)
Yes, the usage is all on tokens used and output, if it can come across with 20 deterministic tools and commands to run, then that will use far less tokens.
Hello, could you share how to use codex remotely? Like your picture shows
Explored 3 files
Ran python3 - <<'PY' i print() PY
Explored 4 files, 1 search
Ran python3 - <<'PY' import json from pathlib import Path protected = { PY
Ran python3 - <<'PY' import json from pathlib import Path path = P PY
my runs look like this where its running python commands and system checks mostly, then writing code as needed but often copying code running scrips and much less of it generating and writing raw tokens. I used to have codex write 16000 lines of code, now every session is 1200 with a bunch of commands to copy paste lint test, etc.
I am running a WSL on a windows machine, forwarding that through which allows me to run VSCode remote, the WSL is running the codex CLI and I had codex create a system which replicates the Codex UI from the JSONL's on the linux machine.
so WSL >Windows > Network > VSCode > WebGUI
still confused... how does yours work 20 hours, and mine works for 15 mins...
Plan mode:
Create a set of plans (as many as needed) to achieve this goal of a app that does <describe goal in detail here>
Have all plans executable through git management sessions and tooling. If the tools do not exist create them for the repo and install them.
Once finished, review all plans to validate that we can loop them.
Implement plan to build plans >
Now that all plans are done, create a tracker.md or system to track the execution of plans and execute all of them.
Done When:
All plans are done and tracker is marked complete.
@ember spindle
Where/how can I see how many tokens I have left?
codex shows a % percentage left in the bottom. First I thought this is the percentage of remaining tokens but now I tend to think this is the remaining context window (?). I am using codex with a ChatGPT Business (formerly: Teams) account.
Tokens are through API, percentage is the only metric in Codex. That is by designed because they do some handwaving and ignore some stuff if its not extensive or repeats for what I have observed.
So the percentage at the bottom line is what? Remaining free context window?
Like how do you know how close you are to running out of tokens?
That little % is how much of the context window you have left until the session itself compacts, and then you can keep going. If you're logged into an account, you'll be fine
You use /status which will output details of your current session and weekly limits.
im a professional PR reviewer
Do you have code rabbit? You can use the CLI locally and then when you're ready to make a PR get it to run on the PR
No im not sure what that is
AI review it's honestly pretty good
You can use the CLI for free if I'm not mistaken
And then PR's do actually have a cost
To addon to this, the 3 biggest reviewers I recommend. Is Auggie, CodeRabbit, and Greptile.
Auggie is both a Codereviewer and context engine that does SERENA and Semantic searches. It SIGNIFICANTLY helps token utilization and understandment of large codebases.
I've heard of Greptile
I have used code rabbit for a while it is very similar to codex in the fact that it aims at code correctness without any real understanding or comprehension.
They are great for finding code bugs, gaps and regressions. But not recognising the over arching cause. So they tend to treat symptoms and add more code and complication in cases where an architecture change is warranted.
Which i guess is the next golden egg to crack. Code is solved, but architecture engineering is not.
Noted thanks trying it now.
I cant wait till im done with this
What would you say is the criteria for "Architecture engineering" being solved in the first place? As coderabbit or multiple Codereviewers nowadays do have "learnings" you can implement where you can specifically state such codebase arch, "learns" (I put that lightly) from behavior and pattern regonition, along with MCP's codereviewers can call upon.
Thats the correct version.
LTS at least
Having understanding or comprehension on the bigger picture, not just correctness.
Bigger picture, as in?
It's major.minor.hotfix
what am i missing here? its like auto compaction is not working anymore I see this since this week?
Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.
All frontier labs suck at Verion numbers and model names anyways
It'll be 1.0 in my eyes when i can remotely connect to my sessions
the fact you can interact with claude in telegram and discord natively as of their latest 'channels' implementation,. and theres nothing for codex is shameful imo
Did you make sure you set enough tokens in your Config.toml to compact?
enough?
You still can with Openclaw via OAuth
💀 Should be
A simple example:
If something is supposed to be under the authenticated umbrella and it gets called early so it fails because it was placed in the wrong place in a tree of execution.
Comprehension on this would be understanding that the architecture needs changing in some way to enforce the ordering - reducing complexity by engineering better architecture that doesn't require guards or checks by design.
Correctness would add a guard to the code to protect against being called during unauthenticated state - adds complexity.
Oh.. I see what you mean on that point. Where when you run a codereview to simplify code, instead of simplifying it'll bloat the code with such complexity.
However thats why you have Developer Prompts & System Prompts in place to prevent such actions. You can define the Arch, suprises, easy mistakes, traps that could result in adding complexity. CodeRabbit specifically has custom review profiles, path-based intructions, learnings, and MCP tool integrations, meaning you can encode the arch inherit rules into the reviews that AI Codereviewers do. You define your module boundaries, your dependency rules, your "don't do X, refactor toward Y". And ofc the reviewer will enforce it as its a "System Prompt". If the codereviewer add complexity, thats the developer failing to teach the tool the architecture to begin with.
The Tool itself are only as architecturally blind as you leave them.
I can see your point where tools themselves "out-of-the-box" don't discover an arch change is needed on their own. But again, thats only if you leave them with nothing to grasp or giving them context before their review loop.
Auggie is actually a prime example of an "out of a box" that DOES actually know your Architecture. Its still in beta but will be released soon, its called "Human Intent" and I suggest you look into it. It uses augment's context Engine which is definitely a game changer for only 20$ per mnoth.
just found out that sub-agents pull in full thread histroy on each one so that chows down on your context and crashes your autocompact if you have a few running at the same time.
I'm suprised codex would have that issue, most of the time its just claude CLI. Codex is pretty reliable in terms of autocompaction and long-horizon tasks.
anyone want a free business seat
same, seems like a simple issue to fix. They could have it compact fro each sub-agent or a thread with the context it needs only?
Another thing that I'd say would be better is trimming down from 1mil to the default on what OpenAI states.
Autocompaction tends to fail on higher tokens too as it'll just completely forget about the tool to be called for.
yeah i did that also now
Subagents don't pull from each other threads?
Unless you have a skill specificaly for that use-case. As Sub-agents by default are RL'd to summarize what they did and report back to the main Agent.
cant confirm exact, here is what codex told me after i asked it to find out whats going on...
This is happening because auto-compact is a best-effort cleanup step, not a hard guarantee.
I checked your live config and session logs: on this machine (March 20, 2026) your model_auto_compact_token_limit is currently 900000 (not 100000), while the active context cap in those runs was 950000. That leaves very little safety room, so a big turn (especially with sub-agents that copy full thread history) can still hit the hard ceiling before compaction saves it.
Mhm werid, codex copies the exact same arch as claude codes subagent stack.
Had to remove that image has it won't the correct one. This is what I mean by subagents don't talk to subagents.
Mmm so at the moment sub-agents are just pawns for the main codex thread if I understand this correctly?
Yes, subagents don't talk to each other and are usually specifically not able to write on the same file. Claude Teams tho is a different story.
why is 5.4 lazy??:
The prototype milestone is working and hardened, but the broader overnight end goal you set is not fully achieved in this turn.
I understand all this, I'm just talking about the way code review in code rabbit works for correctness.
Requires harnessing for extra automation or a human with the comprehension.
Harnessing usually becomes code base specific for this kind of task
So the golden egg i was talking about would be - it just gets it.
Instead of building slop on top of slop, it would comprehend the bigger picture and not add complexity
I do 🫡
dm
I cannot fathom going back to 1/2 usage in a few weeks.
I'm on Pro and it still isn't enough right now
i did, thanks
I do😭
holy shi after getting rejected 7 times i finally got approved on github student pack
best three tips I can offer are:
- Get the AI LLM to use
rtk, installrtkon your system and have certain commands go via that to save on input tokens and subsequently how quickly your context window fills up too - Use
highinstead ofxhigh, many timeshighis more than good enough, just usexhighwhen you absolutely need it for something - For simpler tasks and if you're using subagents, use
gpt-5.4-minifor those kinds of tasks (e.g. explore I believe by default already uses mini)
Thanks for pointer to rtk, looks interesting. Not sure how much it'll help Windows dev. And yes I rarely use xhigh but probably high too often vs medium
whats rtk
I'm hoping when the new Pro Lite plan drops, probably after 2x ends, that if the hidden slider on the plans page for the Pro plan (5x / 20x) stays as it is, and Plus goes back to what it originally was then 5x and 20x are multipliers of the Plus plan. If that's the case then theoretically Pro 20x should pretty much feel like the current Pro plan (as if 2x rate hadn't been turned off), while 5x (aka Pro Lite) will feel slightly worse than the original Pro plan on 1x rate
I am crossing my fingers that they do something like this to avoid disappointing customers when 2x ends
20x would be better than what we have now, if Plus quota doesn't change
indeed, right now I measured the difference at 8x~ between Plus and Pro
It's strange because the site says "6x higher usage limits for local and cloud tasks" which isn't clear but yes I also measured Pro = 8x Plus
yeah it's a bit odd 😂
They just vibe code their site + apps
I really think they don't pay too much attention, too busy
yeah who knows, 6x might be an outdated value
the subagent tool supports optional parameters for model and reasoning effort, so you should be able to instruct the AI LLM accordingly
can you approx a number in tokens for that comparison ?
afraid not, I went by the topup difference on the weekly %
upgrading from Free to Plus/Pro resets the weekly quota, while upgrading from Plus to Pro tops it up
That does look kinda cool, but I see it strips potentially vital info from the outputs. For example rtk docker ps does not show uptime, rtk ls -lah strips file permissions, size, and modified. Definitely more compact representation but it changes the output contract of many commands.
I tried to do something like this a few months ago specifically for cargo commands, making them output json and parsing out what I thought was noise programmatically, but I think agents do better when they have the whole output to work with, and a tool like that only works if you force them to always use it (e.g. replace their shell or make rules about it).
Hi all. Is Codex free?
how is gpt 5.4 for everyone? Having a hard time with it today...
How do you make it work with codex (cli) though ? Do you have to write a skill to always spawn it ? To my knowledge there is no hooks either
In my case I added developer instructions to my config.toml to instruct/guide the AI LLM on how to use rtk
And you can see it being triggered as you would expect ? That means you might have entered the commands that you wish to re-direct through rtk yourself right ?
yes seems to work fine here, prefixes various commands with rtk on my WSL setup
[snippet] ...
Use `rtk` as the default wrapper for almost every shell command it supports, not just when the gain is obvious.
Reach for plain commands for shell builtins or cases where wrapping would be awkward or incorrect, such as `cd`, `export`, `alias`, heredocs, raw shell control flow, commands that `rtk` does not support, and all `npm`/`npx` commands.
Examples: default to `rtk git status`, `rtk ls`, `rtk find`, `rtk grep`, `rtk pytest`, `rtk vitest`, `rtk diff`, `rtk wc`, `rtk curl`, `rtk docker`, and `rtk kubectl`. Use plain `npm` and plain `npx`.
If `rtk` would change semantics, hide information you need, or make the result less reliable for the task, use the normal command instead.
adapt/modify as needed
Ok I'll try then, seems to be something good especially for tests and ruff outputs
Thanks for the plug
istg if u sneeze youll waste so much of ur codex usage
i cant do anything
its so annoying
even with the 2x increase
are the limits-issues resolved or what?
seeing the same thing. What a psyop
the 2x usage must be lies
no it was unbelievable but this week it seems to be back to reality
anyone else getting this message? stream disconnected before completion: An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID 4697ff8e-2815-4364-bcc6-d97b1fe4418b in your message.
- I keep getting it when its almost done with a task and its sucking up my usage
Avengers assemble!
Guys is codex free?
I'm a student but I pay for Plus 😭
They give you $100 in credits as a student, I guess so you lose track of credits and spend $40,000 generating code.
To support the next generation of builders, we’re offering verified university students in the United States and Canada $100 in credits to use in Codex.
Not in amrica tho 😭
lol,js use a vpn
Since I start use the chatgpt/codex, somehow auto topup credit charge me 125credit with original price of 250credit, since my credit start from "0", first time initial the billing might be wrong somehow. Until now codex charge me every 125credit for $10 🤬🤬🤬 I willl try contact support. I just come to warn you guy if you just start to use the codex.
link?
msg
Dm?
who actually uses xhigh
its literally
not better than high in 99% of usecases
like
yes please feed me context rot
true, voratiq's benchmark even shows that it's almost the same rating as xhigh, but takes a lot longer on xhigh and presumably burns a lot more tokens
yeah voratiq benchmark seems to line up with my experience
except g31ph being below claude haiku lol
ty 😄
yea
I did❤️
Does anyone know the difference between the codex weekly usage and 5-hour, and if it is higher compared to somebody on a business team plan, business plan, and somebody on the pro plan and the plus plan? Is there any difference in percentage of how much usage more, of how much more usage they get, or no? Someone let me know. And has anyone seen if you actually get, you actually take up less usage if it's on the CLI or the VS Code or the codex app?
"js"
HOW MUCH TIME DID YOU SAVE
is 5.4 any good over 5.3 codex
5 hour = 5 hour, not sure if it's rolling or not
Weekly = weekly, not sure if it's rolling or not
🧠
dont u need to show proper proof and u in a us/canada college tho
100% for 5 hours = like 20 - 15% of weekly i guess
js fake it
0.2 seconds
cuz i am on pc
and i type fast on keybaord, even asked chGPT
hey folks. now that I’ve completed Trusted Access for Cyber on chatgpt.com/cyber, what a
dditional things can I do, and do I need to switch the model to GPT-5.3-Codex?
there is an annoying thing that Im noticing in codex..
it has the terrible habit of spawning a terminal session and abbandoning it without properly killing the console session..
for example, the AI will run something line npm run dev which starts a live server.. well, it is ok, the server runs, the AI does a curl request to see the page is correct.. but then, since the live server process never finishes.. codex leaves it running..
and then, after a while, next time, it starts another live server, which automaticlaly rolls over to the next available port.. and like that, the system gets littered with useless running process
bruh... they gotta fix ts
i mean % wise
10% used on weekly limit on plus plan might be like 5% on pro plan
etc
it used to be the case that if you exhausted your 5 hour quota, it would consume 20% of your weekly quota, not sure if that's changed but that's how I remember it
Man I have had codex writing all of my code for 18 months, over 1300 commits, tens of thousands of lines, and I've never used up more than like 15% of my allocation. My usage left is almost always above 90.
last I know, about a month or two ago, the difference from Plus to Pro is about 8x~ (source: I measured the difference when it topped up my weekly quota after upgrading from Plus to Pro)
That's just the standard ChatGPT Pro for me. I really don't know what people are doing with Codex that they're using up ALL of their allocation haha 🙂
what about business plan?
it’s about the same, when i use all my 5 hours it’s about 17% weekly
anyone have advice for driving better frontend/UI development? I find by default, Codex can't design itself out of a paper bag. Maybe there are ways I could prompt/support it better ..?
Use a tool such as Figma for designing flows. I usually tell my LLM to "create a design like Figma" and it gets that.
There's the problem. I regularly run out and I'm at almost 10,000,000 for this project
to be fair, now with agents, Im consuming some more tokens
also, some skills can be really heavy
.
good tip, ty
Business felt a little like Plus, maybe slightly better, never tried to measure the difference
stitch (Google), also has an MCP
Ah, nice one, I'll check that option out
It took Codex three prompts to create an MCP server integration with my app, it has been super useful, I expose database table schema data, the code planning agents LOVE it 🙂
I get stuff like "Oh, this is excellent I can see the entire database schema here" hehe
Since a lot of my application is controlled by the configuration data in the database, this fills in gaps the agents need to know, too, there's options to dump config tables and the like.
Like, none of the navbar is actually defined in the app, it's all read from the database and built dynamicly based on your security roles, and while there's documentation that describes this in the project, the agents being able to just look at that data is far more effective.
I cant post on reddit for some reason:
Hello all,
I am building a web app based on python. The app is basically parsing pdf documents for my company. I need to embed AI into it in order to improve accuracy and speed.
I am curious to know if it is possible to use ONE ChatGPT Plus account that will go to the back-end only through OAuth Sign-In method instead of using an API key.
My ideology is basically this: OpenClaw has it where you have the option to use OpenAI through OAuth instead of an API key. Can I use this same idea to my project?
The AI responsibility is: end-user uploads a pdf then it goes through the my python parser web app and then AI checks it and corrects what needs to be correct then spits out a .csv file that the end-user needs.
Ask questions if something is unclear, please do give me your input if you have any knowledge about this.
That's very easy, my app does something similar, I import data from Azure Billing API and stage it, the app has an MCP server built in that can read the database tables, this does require quite a bit of work, however, as you need to implement OAuth 2.1 fully to publish a public MCP server. With this data available to agents, it's very easy to do reporting and utilization projections
I did not implement specifically a document parsing provider, since there are APIs for this data, but adding an adapter to import documents wouldn't be difficult
It was my backup plan if I couldn't get the API imports working 🙂
does OAI's ChatGPT accounts allow granting auth like this? (I thought not, but I can easily be wrong..)
Th percentage of the wheel is context window, 250k works well its about 500pages. Local containes weekly usage.
Have to enable developer mode and add a custom app
In my case my auth is locked to a provided code.
There are callback and scope details that are required to get that all working
well there ya have it!
(only other thing I'd point out to @ocean fulcrum is to check ToCs ... running an app this way for commercial purposes could be a violation. I don't know, just a heads up)
Everything I am doing on Linux already does this. Is it more for Mac operation?
I actually only have MCP attached to Claude, which does all of the planning and design, Codex does 100% of the code work. It's a good combo. Codex often does things that Claude comments on in reviews like "Codex went ahead and made this additional change which is a good catch" stuff like that 🙂
Codex just, for example, removed an unused reference from a python file that Claude specifically said "You can leave this as it does no harm". Codex understands my security approach and reads the documents that also detail in the coding standards "No unused references" and while I don't test for that on the Python side, I do test for it in the ReactTS frontend 🙂
Did you sign up for security?
My dad wrote the CISSP...
I meant with codex, it allows you to push more insecure processes.
appreciate it
well how'd you go on about doing it?
I told Claude "Implement MCP server in my app" and I'm not actually trying to be flippant here. The state of my app, however, was that it already was built on OAuth2, if you are really interested all of it is public https://github.com/elide-us/elideus-group
Okay I asked Codex in my vscode to do something similar and it kinda just wouldnt and said that its not possible
is the keyword MCP server or something?
I will add that the library for FastMCP had its own starlette server instance in the library, and since my app is already FastAPI, I just unwrapped the library and ran the MCP message processor directly.
pip install mcp
To run an MCP server, however, requires a complete OAuth2.1 implementation, certificates, JWT generation, there's a lot to it.
In my case, I support OAuth from Microsoft, Google and Discord, and there are configurations required for each of those services to support this as well.
well im not that advanced so i will admit, i am having troubles understanding what your saying
Anyone ever seen or overidden Codex's default prompt?
For sure, I mean, I've been building enterprise infrastructure for decades, what might seem simple to me is... probably not 😄
FastMCP is the library from Anthropic that can run an MCP server. There are some infrastructure requirements to go with it if you want it to be accessible from public services like in your Claud.ai or ChatGPT, but for local work it's probably easier.
I love Enterprise grade software https://github.com/enterprisequalitycoding/fizzbuzzenterpriseedition
It does often seem like a lot of extra work, I actually work for Microsoft, so my perspective is probably a little biased, but... there are reasons we do all that extra crap 😄
Yeah def impressive. I just thought it would be coded in; if it is extensive amount of work, would you suggest I just use API?
here's codex response to "why not implement mcp into our app"
FWI This is the codex (missing 5 more kb of text btw)
I thnk for what you're doing you don't need an MCP server, you just need to set up an API to call, send your documents for processing and get back the response
You are right. I just thought with OAuth, it would be Cheaper. That is it
But I also want it to continously improve the codebase for when an "unseen pdf" is uploaded
I think in this scenario it would likely end up being more expensive if you allow an AI to read your documents via MCP. It's pretty sloppy, I've had Claude read the same endpoint four or five times in a row for the sam tasks. Set up two tables, one to stage unprocessed documents, set up a timer job to poll for new documents, open them, send them to an API that can do whatever you're looking for (OpenAI and many others provide a lot of content review options) and then move the results into a results table or storage endpoint.
Hey, if you work for microsoft then tell that one guy there's at least one person (me) who might use Windows if the whole OS was written in Rust 🙃 I was super bummed when they went back on that promise
Win32 hasn't changed since Windows 7 SP2 (Kernel 6.3), most of the security issues that are discovered these days are often patched with Rust, but I don't think they touch it unless they have to
To be fair, the main reason I still use Windows is because it supports FIDO2 for login.
why does codex still ask for permission for every new file edit? this permission does not seem to work as described. WSL CLI if that matters
is it in gitignore?
is what in gitignore?
the file it's trying to edit
i guess my main issue is OCR, which provider is excels at that? and also to validate the results pretty much, edit where it must be editted by the AI since it will be the checker
no
Couple answers to that, here's where you'd start for API docs on OpenAI for vision: https://developers.openai.com/api/docs/guides/images-vision
I will also add that the last time I checked there were 11,300 models available in the Microsoft Foundry (Azure) which allows you to also build automation flows to kinda create your own API systems, but that is all also tied into setting up a Microsoft tenant and endpoint configurations. Nothing more obtuse than any other cloud service, though.
i'm realizing it may be an issue with git modules. i'm running codex at the root of a workspace but its editing files under git modules
edit: now i can't repro the issue in other git modules.. what is going on
Have you considered asking ChatGPT what's going on? 🤣
Add a file at .codex/config.toml inside the repo root with this:
writable_roots = [
"/absolute/path/to/submodule1",
"/absolute/path/to/submodule2",
]
And then in ~/.codex/config.toml add your main project folder as trusted.
genuinely the issue has gone away as soon as i posted about it. maybe because i reselected Default permissions again though i'd done it before
If you have workspace_write sandbox, it may complain when the submodule index changes. Idk if what I recommended would help with that (probably not) but keep us posted
👉 **Codex Needs Memory Enhancement and Anti-Spaghetti Vibe Coding.**👈 As a vibe coding tool Chat GPT-5.4 on vscode would be a million times better if it could keep the whole repo and documents "in mind" when editing and writing code. However it is a constant case of one step forward, fifteen back. So many regressions, loss of context, writing a new path through the code rather than using what was there and so on. Needs smarter training, but it also likely needs something better than transformers in reality.
I disagree
You disagree that it needs more memory, or its use?
anyone know what the 'other' usage is? i've used opencode before, but i'm fairly certain i haven't touched it for at least a week
edit: nevermind, found a bug report: https://github.com/openai/codex/issues/15336
i did enable the 'guardian approvals' experimental option, curious if that's getting logged differently for some reason
Both. Anti-spaghetti is a design pattern/output contract. More memory requires a much more sophisticated attention workspace that modern hardware can't handle while still being useful
Yes, that is why I think transformers aren't the solution for more memory because of how they scale. When the system can't keep the full context in view it has the spaghetti "design pattern." Meaning it is harder for others to read and harder to maintain.
Ok so easy tasks im having a lot problems. Me, "hey codex add the image file from the image folder to the page with the same name." Thinking (10 min) I try to create the pr.."binary code not supported") Me, "Hey codex binary code not supported." Codex(thinking 10 min) screen freezes. I reload 4 times. I try to create the pr. Binary code not supported again. Hmm its been an hour and we still don't have the image on the web page. We have some real problems here. We need to admit that and fix them.
Perhaps, but it's modeled after human intelligence. When I read your message I prefilled it into my brain, and now as I'm tapping away on the keyboard I'm decoding the response.
Having an infinitely large context window is not the answer. If we went that route, then in order for me to type a response I have to think back to every text I ever read since I was a baby, all the way up to the text in your comment to come up with a response.
OpenAI's solution is pretty brilliant. They have a PhD level checkpoint full of vast knowledge, and a context window large enough to hold what's important. When that window fills up, it converts all of it into a mental state snapshot. It's kinda like waking up in the morning, remembering the important details of the last few weeks/months, and living in today's world.
Whatever they're doing over there in the research lab is good stuff imho
the prompt in there could be interesting to make a skill out of even though i know there's a lot of them already
nvm i'm blind that's like 3 lines below
exactly my first thought when i saw the prompt
hey folks. now that I’ve completed Trusted Access for Cyber on chatgpt.com/cyber, what a
dditional things can I do, and do I need to switch the model to GPT-5.3-Codex?
am starting to read the designing frontends guide but interestingly the example prompt they give mentioned an image generation tool. Surely this is API only, not supported in subscriptions (a tool to generate images) in Codex CLI. Unless I've missed the tool entirely and didn't realise such a thing existed 😂
EDIT: there's such a tool, but it's behind an experimental feature flag and even if enabled it seems to be unsupported on subscriptions
stuff like this makes codex awesome:
"Negative flags are harder to reason about."
Hence why theres something called Attention frameworks built specifically to combat that issue and eventually solve that entirely.
you can just build things
Is it possible to use codex to work on Kaggle notebooks? Is this what MCPs are for and if yes does such an MCP exist?
let codex figure out 🧙♂️
Right now I am saving the notebook, reading it into codex, importing in Kaggle, executing, downloading... kind of defeats the idea of the agentic loop.
anybody else having hella issues