#codex-discussions
how do the output and cost compare
I am more concerned with the quality of code than the cost, however that said I think it compares well
Low quality = you end up using more ChatGPT. So it might cost the same as High thinking, or more than that.
alright thanks
Do you find it needs fewer iterations to correct errors than regular GPT 5.4?
Depends on the complexity and the breakdown of the steps for the task, right?
Does anyone have a way they like to approach legal documents through skills? I'm not a lawyer and I'm relatively new to app development, so I want to ensure I am as legally protected as possible. Thanks!
Hi guys, if a tool could give you early trend signals (before they peak) and ready-to-use video ideas
Would you pay (one-time)?
Or is this something you’d only use if it’s free?
Trying to understand if this has real value or not.
what does this have to do with Codex?
Cuz I have built that extension with codex
Chrome extension
refer to rule 7 of the #server-rules
Bruh seriously? This is why we small people can't achieve anything
Fine ok
what happens if this happens?
{ "type": "error", "error": { "type": "invalid_request_error", "code": "invalid_value", "message": "Invalid 'input[175].content[2].image_url'. Expected a base64-encoded data URL with an image MIME type (e.g. 'data:image/png;base64,aW1nIGJ5dGVzIGhlcmU='), but got empty base64-encoded bytes.", "param": "input[175].content[2].image_url" }, "status": 400 }
?
Model changed from GPT-5.4 to GPT-5.3-Codex.
what happened
{ "type": "error", "error": { "type": "invalid_request_error", "code": "invalid_value", "message": "Invalid 'input[175].content[2].image_url'. Expected a base64-encoded data URL with an image MIME type (e.g. 'data:image/png;base64,aW1nIGJ5dGVzIGhlcmU='), but got empty base64-encoded bytes.", "param": "input[175].content[2].image_url" }, "status": 400 }
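For anyone hitting that 400 while building requests by hand: the message says one content part carried an image data URL with no bytes after `base64,`. A minimal sketch of a pre-flight check follows, assuming input items shaped like the path in the error (`input[i].content[j].image_url`). The helper names `is_valid_image_data_url` and `strip_empty_images` are made up for illustration, and this does not help inside Codex itself, where the thread history is managed for you.

```python
import base64
import binascii
import re

# Matches "data:image/<subtype>;base64,<payload>"; the payload may be empty.
DATA_URL = re.compile(r"^data:image/[\w.+-]+;base64,(?P<payload>.*)$", re.DOTALL)


def is_valid_image_data_url(url: str) -> bool:
    """True only for an image data URL whose base64 payload is non-empty and decodable."""
    match = DATA_URL.match(url)
    if not match:
        return False
    payload = match.group("payload")
    if not payload:
        return False  # this is the "empty base64-encoded bytes" case
    try:
        return len(base64.b64decode(payload, validate=True)) > 0
    except (binascii.Error, ValueError):
        return False


def strip_empty_images(items: list[dict]) -> list[dict]:
    """Copy the input items, dropping content parts whose image data URL is empty or malformed."""
    cleaned = []
    for item in items:
        content = item.get("content")
        if not isinstance(content, list):
            cleaned.append(item)  # plain-string content: nothing to check
            continue
        kept = [
            part for part in content
            if not (
                isinstance(part, dict)
                and isinstance(part.get("image_url"), str)
                and part["image_url"].startswith("data:")
                and not is_valid_image_data_url(part["image_url"])
            )
        ]
        cleaned.append({**item, "content": kept})
    return cleaned
```

Running `strip_empty_images` over the request input before sending would drop the offending part instead of failing the whole request; whether silently dropping it is acceptable depends on the app.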
i was just thinking that 5 mins ago
Is it more economical to use all the rate limits every week for the $200 Pro plan, or just use API?
Because at this point I'm wishing there was a higher plan than the $200 Pro plan, like $300 or something with a bit more limits
And there isn't, which is making me wonder: instead of using up the $200 plan plus an extra $80 per month on credits when I exceed the allocated limits, would it simply be cheaper to just pay for the API directly?
I'd whip out the calculator, but no, I don't think it would be, even with the $80 in credits. The price difference between the compute in the plan and the API is rather large at scale
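"Whipping out the calculator" could look roughly like this. It is only a sketch: the function names are hypothetical, and the token counts and per-million prices are inputs you would have to take from your own usage logs and the current price list, since none are given in the discussion.

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Monthly API cost given token totals and per-million-token prices."""
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)


def cheaper_option(plan_usd: float, extra_credits_usd: float, api_usd: float) -> str:
    """Compare plan-plus-credits against paying for the same usage on the API."""
    subscription_total = plan_usd + extra_credits_usd
    return "api" if api_usd < subscription_total else "subscription"
```

With the figures from the question, the subscription side is `200 + 80`; the API side is whatever `api_cost_usd` returns for a month of your real traffic.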
can someone explain the difference between my weekly and 5 hour usage limit
i dont get the purpose of the 5 hour one
i never hit it
but im draining weekly a bunch
Weekly is your total usage for the week. The 5 hour limit is just there to make sure you don't blow through it all at once basically.
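One way to picture the two limits is a limiter that checks a short rolling window and a long one on every request. This is a toy sketch, not how the real quotas are implemented (that is not public); the class name, units, and limits are all made up.

```python
from collections import deque

FIVE_HOURS = 5 * 60 * 60
ONE_WEEK = 7 * 24 * 60 * 60


class DualWindowLimiter:
    """Allow usage only while both the 5-hour window and the weekly window have room."""

    def __init__(self, short_limit: float, weekly_limit: float):
        self.short_limit = short_limit
        self.weekly_limit = weekly_limit
        self.events = deque()  # (timestamp_seconds, amount) pairs, oldest first

    def _used_since(self, now: float, window: float) -> float:
        return sum(amount for ts, amount in self.events if now - ts < window)

    def try_consume(self, amount: float, now: float) -> bool:
        # Forget events that have aged out of the longest window.
        while self.events and now - self.events[0][0] >= ONE_WEEK:
            self.events.popleft()
        if self._used_since(now, FIVE_HOURS) + amount > self.short_limit:
            return False  # burst cap hit: wait for the 5-hour window to roll
        if self._used_since(now, ONE_WEEK) + amount > self.weekly_limit:
            return False  # weekly budget exhausted
        self.events.append((now, amount))
        return True
```

Steady usage spread across the week never trips the short window but still drains the weekly one, which matches the "i never hit it but im draining weekly" experience.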
is 270k context enough for your apps? why is openai not giving us 400k context, coz i see GPT-5.3-Codex has 400k in copilot but 200k in codex

you can have 1m context
it's actually misleading. You get 258k context on Codex CLI by default, however there's a small buffer allowed for auto compaction to work and there's also a 128k output to consider (meaning it can reach up to the 400k context depending on the situation at the time). Copilot on the other hand, let's say Copilot CLI, also allows up to 400k context but has a different buffer and possibly a different output token limit, still allowing up to 400k context to be used depending on the situation at the time.
However, Codex CLI has the advantage that you can override the context limit even on the subscription to go up to the full 1M if you wish, but personally I wouldn't recommend it as degradation can become noticeable
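The numbers quoted above can be lined up as simple arithmetic. The compaction buffer size below is inferred by subtraction from the three quoted figures, not a documented value.

```python
TOTAL_WINDOW = 400_000   # total context quoted in the discussion
MAX_OUTPUT = 128_000     # output allowance quoted in the discussion
DEFAULT_INPUT = 258_000  # default Codex CLI context quoted in the discussion

input_budget = TOTAL_WINDOW - MAX_OUTPUT       # room left once output is reserved
implied_buffer = input_budget - DEFAULT_INPUT  # leftover for auto compaction (inferred)

print(f"input budget: {input_budget:,}")      # input budget: 272,000
print(f"implied buffer: {implied_buffer:,}")  # implied buffer: 14,000
```

That 272k figure is probably where the "270k" in the original question comes from.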
How do you enable the 1M? I have not seen such a setting
I need an average of 600 to 700k context per story implementation, since I am working on adding features on top of a large codebase. And reading the flow and storing it as context sometimes eats 3/4 of what codex has now
not tested but I've read numerous times these need to go in your config.toml:
model_context_window = 1000000
model_auto_compact_token_limit = 900000
this is very bad
eats up limits
makes model hallucinate
acts like claude after high enough context usage
agreed, it's 2x consumption over a certain usage of the context window, and will degrade noticeably
Claude's accuracy over a 1M context window is much, much better than GPT-5.4's. While GPT-5.4 drops to 36 recall accuracy, Claude Opus 4.6 only drops to 78 recall accuracy.
In essence you could literally use that 1m context window...
For OpenAI, the 1M context window is currently just a gimmick that doesn't give you much.
claude has its own problems though, it isnt apples to apples
correct
@boreal holly did you see they added a flag for the cli to expose its internal app server
I have not! Whats it supposed to do differently?
I assume it just lets you move between cli and something custom made rather than being isolated in your own appserver
OK, that is kind of cool
I hope they add it to the app
Just want to say that it's almost embarrassing how much more inspired I feel after switching the Theme to Matrix hahaha 😆
WOW! Idk what to even say, that's downright savage
one of the best skills is the chrome-debug one, helps a lot in giving context of some bugs to codex
I wonder if they will make one for android and let the ai use adb
if youre doing web dev i suppose it would be.
its pretty good for creating accounts etc using llms
Like making a facebook or insta and then posting on it etc
well.. kinda, it lets the ai puppeteer the browser, but that does not work very well long term, the site's bot detection will kick in
If you use openclaw and just tell it not to get caught it's pretty good at not getting caught. I guess this is where the ai vs ai will start to have to step in and handle it.
openclaw
have you tried it? its pretty good as a personal assistant
Hey Eric. How are you?
I'm good
What you've been up to?
dealing with the constant feeling of a skill issue being the bottleneck between me and success with ai.
The answer is always just a few words away but finding the words and then making sure they are said every time is a constant churn
Anyone considered moving back to using gpt 5.2 xhigh?
I use 5.3 high for all coding atm
5.4 is not delivering quality for me. 5.3 neither. So I'm just running 5.2 on the refactoring that 5.4 leaves me with and it's immediately so much better again.
Can you elaborate on what you mean by "just a few words away"? Do you think your prompts aren't being understood and that's leading to unsatisfactory outputs?
Despite drastically changing the system prompt which does help....5.4 and 5.3 just don't gather enough context and engineer around problems instead of properly refactoring if needed.
I mean there is always more to add.
Like, i get a nice setup that can take a problem/solution statement, make a phased plan, then make a plan for each phase.
it works, its nice etc.
But then its like ok i cant just sit around and wait for that, i need to keep working and iterating. So i need to build a work trees system that lets me do this many times at the same time in a satisfactory way.
Ok made that - ahh relax and vibe right? NAH!
Need a way to automate code reviews so the fixes dont just add complexity and patch symptoms... ok lets make a prompt for that...
test it validate it etc etc etc
it just churns forever
There is always another golden egg to crack
haha codex is seriously addictive that way. so many elements. for one the minimal input vs output. Just one prompt and i can have.... and then the random outcome. sometimes brilliant, sometimes it created a massive problem. The hours fly by
Are these for numerous projects of different tasks or is this just one big project that will be the best at producing other projects?
this is just life
Im not looking for solutions, its just dev life now
its the way it IS
I understand but if you could have one thing that would make the churning easier then what do you think it might be? Doesn't have to be a perfect answer but it'd be nice to hear your two cents.
just have ai get it
instead of being like a 10 year old with no comprehension that has 40 years coding experience, i want a 30 year old that gets it
is that too much to ask?
🤣
Do you have it articulate its understanding of what you want before you send it on its way?
Sometimes the assumptions are what set it off course imo.
There's no one thing, because it's just constant iteration in the way we work and the way models help us
hm?
All the time, i think one of the best ways to get it to have a good understanding is have it explain what it thinks you want
What i mean is the models iterate, and when that happens what we are doing changes in an instant
Oh does it?
Yeah that's what I was thinking 😛
Yeah, with models like sonnet 4 and the tools at the time i wrote probably 30% of the code. Now i write almost none.
That was 6 months ago
I mean manually writing it. I understand it can call for fixes etc, but mostly i am adding rules to a harness or lint tools etc to have mistakes not happen
Do you think it takes the fun out of programming/development?
Maybe that's why you're trying to optimize it? Since it's the only way to get real development now since the AI can write so much tedious code for us now.
I felt bad using AI but then again I remember before AI where I used stackoverflow and the documentations to simply "borrow" their lines of code since there really was only one way to do certain things.
At least that my lazy brain could come up with hahaha.
You don't use codex 5.4 over sonnet 4?
Yeah i use codex as it is today, I'm just saying so much changed in 6 months
i went from digging through the ide to getting deep into the code to - reviewing prs, making skills and lint rules and building flows that let me get more done. Thinking less about the actual small implementation details and focusing on not having slop instead
How do you focus on not having slop?
harness it to follow architecture rules, give it custom lints and skills that enforce general architecture rules i like. Add focused critical rules for common mistakes it makes.
good things to have. Do you just include them with the prompts?
everything is a prompt, but the harnessing is a layer that takes care of itself. in agents.md it gets some base rules about how to follow up with code clean up using linting and cli tools etc. then in skills it gets guidance for gaps it has, mistakes it commonly makes, and architecture.
I have some manual skills for things i do a lot, they are just prompts that i have refined for common tasks
I configure them with
policy:
  allow_implicit_invocation: false
Wait codex has an agents.md?
o.0 ahuh 😛
please reset my weekly usage 🤣
oh this changes stuff. Do you find it helps you get better results than just using the prompt each time?
Adding a command module like the following helps:
“Define what slop is, then perform a check after each output to ensure slop has not been generated.”
This is a good idea. I wouldn't have thought of defining what slop is.
Though I found that deep research on an analytical definition of slop works best. Then, as a pdf in a project folder, it functions as just a friendly hello / reminder to the next agent reading it to watch for slop. This in-project context seems to keep it “front of mind”. I find it fascinating
That way it gives it a thing to compare outputs to; a sort of mini grading system
So you use the chatgpt chat model on chatgpt.com to do an analytical deep research and have it defined in a pdf, then save that pdf to your codex via project folder?
My mind is constantly blown how powerful a single config.json or txt file can be
"Disassemble the compiled binary and look for slop in the assembly"
Exactly. Doing deep research and then feeding it into your codex is super powerful
Gives nice guide rails
what other guard rails do you recommend in order to make better code?
Wait the AI can do this?
I'm totally joking 🙃
It can but it would be a herculean effort with little reward
slop imo is just spaghetti code
the real way to handle it is to force decent architecture
You never know now with AI.
what would be an example of forcing decent architecture?
I already said it like 3 times
harnessing as a second layer?
yeah
I was thinking more of the AI architecture but my understanding could be wrong.
code architecture is what i mean, maintainable code that isnt prone to errors
that makes more sense.
probably in 6 months it wont matter
the models will either understand it and just do it better, or they might just get good enough to brute force spaghetti code so well it just doesnt matter
You think so?
maybe a bit of both plus some better tooling
hopefully.
i just lost faith in OpenAI's development team after OpenClaw. Since it took like one guy initially to do something that a whole research team couldn't come up with themselves.
I hope it's the former
Nah, I think OpenClaw did something that was so popular, despite violating the ToS, OpenAI saw it as an opportunity - especially because Anthropic was banning folks using it so OpenAI could be like "we're the good guys! We let you use it!"
it violated ToS?
In the ToS it says you can't share your subscription with another user and you can't use it as part of an external facing service. OpenClaw lets you connect your ChatGPT account through Codex to group chat apps, which is technically like an external facing service.
so they get nicked on the technicality huh?
Yeah! I mean they hired the creator and gave their official 👍 to it
some one innovating doesnt make someone else invalid
I think the issue arises if you have an agent for each person using your external service. With OpenClaw you can probably fall within reasonable usage if you connect a single agent to a group chat, and multiple users chat with the one agent. That doesn't wreck load balancing like "25 users, 1 agent per user", so there's a "within reason" clause
I have it working for my kids with minimax model
its cheap, it is good enough for them because they only ask it school stuff, get it to look at some images etc
they use telegram on the computer
gpt free gets limited too fast and cost a lot more for a plan
openclaw kind of has a personality in it as well. so it doesnt get generic feeling wording and answers
they basically get unlimited gpt feeling chat
When i set it up i didnt have high hopes, or any ideas on what i would use it for
but i had the old computer sitting there so i thought why not see what its all about
That is pretty awesome! I mean I've heard a lot about OpenClaw but that story in particular tells me they built something that places creativity at the top of the feature list
yeah its a great little toy, if you have a box sitting around its worth a tinker if you have the time
Not when that "someone else" had their shot with Atlas and failed greatly as a research team.
spark seems like a good little agent model to jump in and find answers in the code base
All depends on the quest you are on!
hmm okay it just happened again.
started a new session. had it plan its work. had it implement the plan. stepped away and came back and it's requesting permission. i then asked it to edit a single file as a test and it's still asking permission.
open a different session with codex and ask it to modify a file in the same folder and it has no problem
Oh, that changes it a little bit because it sounds like the transition from plan to implementation might be broken
yeah though yesterday i had the same speculation so i asked a new agent to come up with a plan and then implement it but it did not run into this issue. that's when i said i could no longer reproduce it. will keep toying
I wonder if closing out and resuming after switching out of plan mode keeps it from happening
just switched to a new session, asked the agent to come up with a plan to put a comment at the top of a file. it planned it. i said implement. it implemented no issue
still don't know how to repro but the workaround for the session is:
i denied permission for the edit request in the first image and then set permissions to Default again and then asked it to try again and it did so just fine without needing permission to edit the file
Is anyone else experiencing the same bug as me? I can't create a new task because it gives an error saying the container hasn't started or something like that.
@boreal holly People that say Codex can't do front end imo just don't have an imagination to design, maybe you think its ugly but I think it came out pretty slick
(WIP For my app-server client)
Dude that looks so good 😱 did it GeometryReader/animate the background?
Or is that a MetalView
Yeah im not sure what it used to make it tbh LOL I havent looked at the code yet, its mostly for personal use so im not too concerned how spaghetti it was
Ill look and get back to you
and also full disclosure im making this with Expo/RN to test out how different it feels compared to my real production app made in Swift for another project
:c
Oh yeah, after you shared that update I'm rebuilding mine so the app-server, bridge server, and GUI are all decoupled and operate independently
I feel like they've been slowly bread crumbing to be like hey... build your own remote client... its just a prompt away
Ive just been really needing my usage for other stuff so i havent had a chance
The background is built with pure React Native views + react-native-reanimated:
Aura blobs: 4 large absolutely-positioned View circles with big borderRadius and low opacity (6-12%). Colors: blue top-left, indigo mid-right, cyan bottom-left, green bottom-right. No gradient library needed, just oversized circles that bleed off-screen edges.
Pixel grid — A flexWrap: "wrap" container filled with small 16×16px View squares (2px gap). Static cells are white at 5% opacity, giving the subtle grid pattern.
Fireflies — 24 individually animated Animated.View overlays positioned at random grid coordinates. Each one uses react-native-reanimated's useSharedValue with:
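withDelay(
  randomDelay, // 0-6 s, chosen per firefly
  withRepeat(
    withSequence(
      withTiming(peak, { duration: 1500 }), // fade in, roughly 1500 ms
      withTiming(0, { duration: 2000 })     // fade out, roughly 2000 ms
    ),
    -1 // a negative count repeats indefinitely
  )
)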
Colors cycle through #0A84FF, #5E5CE6, #30D158, #64D2FF, #BF5AF2. Each firefly has a random delay (0-6s) and peak opacity (0.2-0.55) so they pulse independently.
The whole thing is wrapped in React.memo, uses pointerEvents="none", and positions are pre-computed in useMemo so nothing re-renders. The component is at components/animated-pixel-bg.tsx.
I just asked it to tell me how it was accomplished
Mostly react-native stuff I believe
Not sure what the swift equivalent is, but if you copied that and asked Codex to give the swift equivalent im sure it could
The big project I'm building is very AI-powered so I been using the app server as a "research" justification 😏
haha
im likely going to do some ai intertwining in the future but right now im really trying to lockdown the billing cycle etc.
Billing is the scariest part
Hey, I am new to codex. Can some one rate my new project which i made using codex?😄
https://github.com/MightyXdash/ONCard
That's sick! So it's React Native. Honestly looking at it I thought it was SwiftUI with carefully designed GeometryReader+paths or MetalView lol that's how sick it looks
Yeah it turned out nice imo, and Menlo font is kinda chefs kiss
for a techy app
Liquid glass then looking at it on an actual beautiful retina display on full brightness
chefs kiss again
Nice! Keep at it 👍
haha, thx. the openAI community is nice to each other.
Honestly the Expo/RN work flow is so much nicer than Xcode/native swift, hot reloads are so nice, the performance doesnt really see any difference, and at this point the environment is so mature it really feels like you can build it as native as you want, and debugging feels easier cause you can attach RN developer tools so you have essentially a chrome devtools debugger
For native Swift I use InjectionNext for hot reloading the live app
ill have to check it out
Is anyone struggling with subagents thinking they are the main agent and creating their own subagents? E.g. initial prompt "You are an orchestrating agent. Spin 3 subagents to do task A, B, C". The first subagent, instead of just doing task A, then starts its own subagents to do A, B, C. etc.
Yeah, this is precisely why I have a separate subagent system that prevents non-orchestrators from spawning agents, and it provides separate instructions at runtime. Codex's base instructions tell them how to spawn agents whether they're an orchestrator or not, so there is always a non-zero chance they will think to try it
If they're not even aware of the tool then they wont think to use it
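The "if they're not aware of the tool they won't use it" idea can be sketched as role-based tool exposure. This is not the custom system described above, just an illustration; the tool names and both functions are hypothetical.

```python
# Hypothetical tool names; a real harness would use whatever its runtime exposes.
ALL_TOOLS = {"read_file", "edit_file", "run_shell", "spawn_agent"}
ORCHESTRATOR_ONLY = {"spawn_agent"}


def tools_for_role(role: str) -> set[str]:
    """Return the tool names an agent of the given role is ever told about."""
    if role == "orchestrator":
        return set(ALL_TOOLS)
    return ALL_TOOLS - ORCHESTRATOR_ONLY


def dispatch(role: str, tool: str) -> str:
    """Second line of defence: refuse calls to tools the role was never given."""
    if tool not in tools_for_role(role):
        raise PermissionError(f"{role!r} may not call {tool!r}")
    return f"{tool} dispatched for {role}"
```

A worker built from `tools_for_role("worker")` never sees `spawn_agent` in its tool list, and `dispatch` still rejects the call if the base instructions lead it to try anyway.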
What separate subagent system are you using?
Custom
3rd saturday in a row codex gets IQ nerfed....
true, I'm using the taste skill at the moment, quite impressed with what GPT-5.4 can do with frontend UI if it has more ideal prompting
codex cli ... look at the 5h quota and the week quota ! why ?
Codex-CLI lets you override the base instructions 🤯
I think I know the answer already but app-server won’t load tool calls etc in the history correct?
Correct, so I like to maintain a local cache/diff so when you switch threads you don't lose that history
the system prompt? yeah this has been the case since opencode got the green light. It used to be restricted until then (on a subscription that is)
Yep, I had no idea they allow it now. So I can actually build the system I want to build
Hate when I create something beautiful I Want to share, but I dont want my IP ripped so i have to just sit here with it
and i dont mean 'internet protocol' lol
I meant intellectual property
Can you show a little bit without compromising it?
Possibly
Omg dude 😩 5.4 works sooooo much better with a custom system prompt. It's practically infallible now
You mind if I dm you?
I dont do dms 🙂
are you just tweaking it as you go or is there some golden ticket prompt lol
Well that's the thing, I've had these really strict processes spread across skills and AGENTS.md, but I consolidated them into the system prompt. Basically the default prompt gives them the role of a helpful "do it all" agent. I strictly separated the roles and now they stay within their role.
The orchestrator actually orchestrates, the worker actually works, and they don't have to repeatedly re-read the instructions or anything
why you gonna rip it off him?
If it was something largely in circulation, sure I’d ask for it but sounds like it’s something he’s working on himself so I’m not gonna poke and prod for it
It’s like starting your own mini openAI
i think codex in WSL was somehow switching to use the permissions from a super old codex install in windows. i updated the windows install and when i went back to the WSL version its permissions were now correct again so its some sort of weird sync issue between the two (though you would think that they would be completely separate). chatgpt led me to this idea by finding this issue on github: https://github.com/openai/codex/issues/13762
so once we're done with codex usage we switch to claude code
Anyone else found Codex has suddenly become much lazier? Not following through tasks to completion
I really think they've done something to it in the last 24 hrs because performance feels substantially degraded
Hello guys, is it possible to define a subagent with an MCP without putting it in the main config file? Going by the docs it should be possible, but I tried and it doesn't seem to work. The advantage of this approach would be to avoid dumping all the MCP schema into the main agent context
can you post the subagent config youre trying?
this seems actually to work
in the main agent i have
Can i disable the openai doc skill in the main agent and give that to the subagent?
5.4 loves to summarise stuff
eg: i want to make a skill out of this prompt:
some big structured prompt with steps in it here
Check the skill it makes - complete butchery of the original prompt almost none of the original content survived
Ok I found a way to enable / disable skills in the main config / agent config. I was wondering: is there a way to open a session with only a specific agent, not the main agent?
I noticed that GPT/Codex has a tendency to implement polling timers instead of message listeners, unless told to prefer the latter.
- Its original code implemented a polling timer every 2.5s that compared 1536 bytes against a cached HDR LUT snapshot.
- Then I asked Codex to optimize that mechanism for more CPU efficiency. Turns out that we already had listeners in place, but it seems like these listened for the wrong messages (albeit still needed for other things). So just disabling the 2.5 poll didn't suffice. Instead it suggested another 1s polling timer to monitor a Registry entry. This would be cheaper than the old LUT comparison, but still polling.
- Only then did Codex tell me about a solution that would install a listener on that exact Registry change and how all of that was documented by Microsoft.
So watch out that you tell GPT/Codex to look for listener solutions, else it might implement unnecessary active polling timers. 😉
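The difference between the two approaches, boiled down to a toy example. The `Setting` class is a stand-in for the OS-level change notification, not a real registry API; on Windows, the Win32 `RegNotifyChangeKeyValue` function is one documented mechanism of this kind, though the discussion doesn't say which one Codex proposed.

```python
from typing import Callable


class Setting:
    """A value that notifies subscribers when it changes (stand-in for an OS change notification)."""

    def __init__(self, value: str):
        self._value = value
        self._listeners: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._listeners.append(callback)

    def set(self, value: str) -> None:
        if value == self._value:
            return  # nothing changed, so nobody is woken up
        self._value = value
        for callback in self._listeners:
            callback(value)

    def get(self) -> str:
        return self._value


def poll_once(setting: Setting, last_seen: str) -> tuple[str, bool]:
    """One polling tick: read and compare, even when nothing has changed."""
    current = setting.get()
    return current, current != last_seen
```

With the listener, work happens only inside `set` when the value actually changes; with `poll_once` on a timer, the read-and-compare cost is paid on every tick regardless.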
yeah it loves imperative code
hi
Do you guys tend to throw a sprints worth of work at a time at codex
Or do feature focused work instead ?
I usually work in MVP mode until told otherwise
focus on a minimal deliverable and see if you like it
minimal viable product
I never managed a SWE team, so no idea what can be done within a sprint. When I asked codex about story points, it's able to do up to about 3 story points in one go. With a good plan this can go up to 5. But I have basically no idea how this translates to real enterprise timelines as I've never experienced them.
55 agents, 25 of which worked through the whole night, cost 12% weekly. Efficiency is dialed in
How many tasks did u give it to work the whole night ?
55 tasks
Small tasks or complex tasks that touch multiple files ?
Mostly complex tasks
You must’ve spent a lot of time planning for sure
How much time spent planning before execution ?
I don't work with agents only to audit, usually use deterministic bots
where do you get a deterministic bot?
4 hours of QA and writing issues, then told orchestrator to divide the issues into slices and let em work on it following my specs
The AI hallucinates, don't know when, don't know where.
Usually, for recurring actions over time, I try to build the code (or ask the AI for it) to extract real data, and make better decisions from there. I used to write the code myself; now I read it and approve or change it. But the idea is to connect the bot to a data endpoint
Right but how is that deterministic?
temperature = 0 🤣
I see I spent like a whole day re working a system and did some deep planning then doing incremental PR’s to ensure I can manually verify in the app that functionality still works
Since it’s a critical flow
it has a set of rules based on real connected data. I use Linux so it's pretty connectable to different commands in the system, either creating them or using built-in infrastructure
I recommend reading code with nano 😉
I just let em rip, they PR and get code reviewed however many times it takes to pass, the orchestrator does a completion review, then they merge into staging branch, at which point I test it, bump the version if it checks out, and merge into main
but in what parts you have a human in the loop?
I see, I guess my OCD just likes to do incremental changes for things I have a hard time following in my codebase lel
- QA testing
- Issue authoring
- Deployment
- Merging into main
I see, I think I do it differently since I don’t have a staging branch
I do my staging in local
Lol
Local is connected to non prod
Yeah a lotta folks use main as staging and add a tag to stable versions, some make branches and put an empty commit with a tag for stable versions, I merge staging into main for stable versions. I don't think there's any particular "correct" way 🙂
@boreal holly do you have your agent autonomously go through ai code review when it’s in a pr state?
Yes, as you can see in that video I posted, my cloud review is almost out for the week, lol that's one particular part of the system that does not get heavily optimized, and it sometimes takes ten, maybe twenty, iterations before their PR gets approved for merge
How do u make codex keep polling new review comments that come up?
Mine tend to do it but not consistently it just randomly ends their long running task and I have to resume
I have an MCP server that forces them to wait until the code review is complete. Once it's complete, if they choose not to do anything about it, it notifies the orchestrator, who then instructs them to follow through with the code review fixes
I see
So custom made mcp or one that already exists ?
It's custom made, but I also have a custom shell wrapper that produces a job ID for every single command. The MCP server latches onto that job and forces the agent to wait until it's completed
So I had to modify parts of my system so that their shell, which is typically ZSH, is completely replaced with my wrapper. They have no choice but to follow the command execution protocol
why does codex app not have /compact aka the most useful feature ever?
wth is my codex doing
Schrödinger's terminal
its running until you take a closer look
Have the agent run echo $CODEX_THREAD_ID, copy the results, you run codex resume {PASTE}, execute /compact, ctrl+c, restart Codex desktop 🤌
It's that easy lol
... or add /compact to the actual app so its usable?
Nah would be too easy
But you can just ask the ai to give a summary for a new chat and then open a new window
Thank you codex for making it so I can ditch squarespace
This thing builds websites like nobodys business
This is pretty subpar at best though
Codex is cool but can we get a CLion button the same way there are buttons for other jetbrains IDEs?
Any South American ai discord?
100x increase in model performance
what do you mean?
Rather than "Always use apply_patch", give them the option. Idc if the agent uses it or not. It shows up all pretty in the GUI but if they wanna script-fu to write the code that's fine by me. As a result they can make more complicated edits in one shot instead of carefully patching all day
i swear weekly usage runs out faster than the 5hour quota
too many definitions, consciousness wont happen, and most likely will not happen at all.
also it just sounds bad in general
Delusional silicon valley fantasy
I used the Codex App with GPT-5.4 at Ultra High to perform an AI experiment in theoretical physics research.
GPT-5.4 was asked to investigate whether my Æther-flow interpretation of relativity could be evaluated and developed as a valid interpretation of relativity and to expand upon it.
The model was given Æther and Æther-Flow, my original statement of the concept, as its starting point.
The experiment ran an average of 6 hours per day for 14 days. The LaTeX format was used to format the documents for AI use, and PDF was used for human readability.
The experiment produced a mathematically sound theory called The Æther-Flow Interpretation of Relativity based on my Æther_and_Æther-Flow concept. The experiment produced a journal manuscript composed of 7 closure articles (supporting the theory), 1 front-facing flagship article, and a total of 87 research articles.
I am not a theoretical physicist, so I cannot independently judge whether the theory is ultimately correct or physically viable. If you are a theoretical physicist, I would welcome your feedback:
https://github.com/Omegapy/AEther
I am interested in how viable the process I used is for generating theoretical physics research.
funny that's what I would have said 5 years ago
about an AI writing code as well as Codex does today
we will eventually reach AGI but not so soon
LLMs are really good at stitching together plausible (but false) ideas. They are hallucinations. Just knowing this is enough to think the result will be wrong.
Essentially an LLM can't be the expert, it is the tool the expert uses.
already has happened
like as in a thing that is somewhat intelligent at roughly any task
already exists
now human level general intelligence?
different story
super intelligence?
different universe
Yes it is true, the AI needs to iterate several times on its generation -> check its own results. This is not 100% proof against hallucinations. Additionally this project uses maths and physics which have deterministic outcomes so statistically there is a very low probability that the AI will hallucinate, however it does not mean that it is right tho…
but I mean llms are roughly as intelligent as low tier humans imo
we will enter the dune style of ai endgame
@boreal holly how do you keep code quality, codebase maintainability, and no duplication of code in check when running 55 agents?
It is a mistake to think this is 100% proof against hallucinations. It doesn't work like this.
There is a never ending stream of people who think they have solved problems like this now with llm help and no actual expertise in the area.
this is trivially obvious
It's way worse in the claude discord
Same
gemini and google for developer discord is a mix like this one between goated people and newbies except this one is more active xd
though I have to say a lot of hallucination claims are a bit overstated
ahah, claude is better at subtly convincing ppl they are a genius on things they know nothing about. It's like a Dunning-Kruger 10x machine
I haven't had too many issues with hallucination
Excellent question...
codex kind of does it but in a backhanded way, it's more like "You're right to flag that....." and then it goes on to explain how it was right and you were wrong
i swear to god
gpt 5.4
is so annoying
IT LITERALLY CALLS YOU WRONG
like
all the time
I heard that same story for last 3 years AI cannot do this or that because of this or that.
Every time these arguments have been proven wrong…
At the end of the day, AI, with the right framework can almost do anything a human does, but faster and better…
wutt
no?
context rot is still a thing
last time I checked
llms still regularly fall for misguided attention problems
Yes it is, but you are not considering a framework or a wrapper around the AI that addresses the AI shortcomings…
Coding AI like Claude code are just that!
The problem you have right now is you need an expert to verify your work. Go get an expert to do that.
Logic would dictate that the expert can probably dispel the idea instantly without even needing to read the research and they already have long ago. The experts are also using the same tool to work on the same problems, yet you still think you are likely to know better.
bro didnt even mention opencode(the goat)
Yes I do need an expert, like every scientific paper needs to be peer reviewed.
I am not claiming that the theory is viable. This was an AI experiment mostly.
Are you an expert?
You are not an expert so you cannot claim that… or argue against it
Are you a theoretical physicist?
My claim does not require me to be one
What is your claim?
People do not understand that AI limitations can be addressed in large extent by implementing frameworks around it…
That is what an AI agent mostly is.
I am not trying to copyright it
You are in a discord channel for a coding agent, most experts here are a SWE. You're in the wrong place looking for a physics expert. Every SWE using codex understands the limitations of codex in their field.
have you read your own github?
I used a coding agent to generate the manuscript, just saying
no
Right, but you're asking for a physics expert here?
Go where they are and ask - this is a place for coding experts and vibe coders
you seriously
havent read
what you generated
have you
complete non sequitur
I wonder whats the math on how subsidized the average codex sub is
and how subsidized it is at maxed out usage
comp to api cost ofc
they probably have enough separation between plus and pro that the ppl who are willing to pay for pro but can't utilise it cover a lot of the difference. I'd guess the telemetry they get is invaluable.
I am showcasing a use case of the codex app… an AI experiment on how far gpt-5.4 can take a concept…
bye
I suppose so, but I bet they take api data as well
regardless of if they admit it
It's not a strong showcase if you have no idea it worked as you expected. And again - you are asking for a physics expert in the wrong place.
At best at a first glance no one will take it seriously.
Are you guys even coding experts…
I wonder if this actually does anything
define expert
Well I got my answer
I have been reading and all I can say is LMAO
I always have imposter syndrome and llms just make it worse ahah
They always make it feel like there is just a little more you can get done but arent quite saying the right words to it.
Just a little more... forever
just a little better...
become hopelessly egotistical
claude can help with that goal
idk man
claude is too expensive 4 me
I dont use llms enough to justify the 200 or 100 dollar subs
so I slum it on a 20 dollar oai sub and 20 dollar gai sub
20 dollar claude plan is a joke
google plan is mostly for nb access
also google integration to google services is nice to have
That's the thing though, you keep iterating and iterating to take advantage of the tokens more and more. Before you know it you can easily burn several subscriptions on projects you may never use. I think the real constraint is working out what you should spend the tokens on.
I use codex simply to not fall behind
I don’t have an actual use for any of what I make
Ai simply creates too much cognitive debt
Though this could be a skill issue
I do still write code by hand like a caveman
I just checked some code and codex has a state machine with 35 odd combinations of enum values, probably only need 4 different states and a null check here and there. Should i care if it works?
I should say if it works, should i be worried?!
if i said this phrase to codex it would 1000% be taken the wrong way Should i care if it works?
Huh you guys are chat bots…
you ok there?
I am almost at the point of only checking the code that i care enough about - like it doesnt really matter if there is a wrapper function around a function that is called once. I can just leave it.
I think that I have been arguing with chat bots…
I think you have been playing make-believe science and are struggling to find anyone remotely interested
We just talking about common nuances to coding with codex and it probably doesnt translate to you.
Why you can not understand that AI can do research, I still remember when people were telling me that AI could not code…
I am not saying that AI cannot do research, what I am saying is that your pretend make-believe science is not research
drop a repo link into gpt pro and ask it to be brutally honest about the project and its conclusions.
Last time i did that i sat in the corner and questioned my life
How do you know that? Are you a theoretical physicist?
You say that AI can do research but it can not at the same time….
It can or it can not.
I just ask the AI to explore a concept…
this channel is about Codex, not roleplay
I used codex!!!!
I saw that
Codex is an agent you can use strictly for coding or for other use cases
LMAO
GPT-5.4 was asked to investigate whether my Æther-flow interpretation of relativity could be developed as a valid interpretation of relativity and to expand upon it. that is roleplay
I (CMU Physics student) analyzed this chat, and @velvet wren is wrong.
welcome to the chat new person
Yes gpt-5.4 within the codex app…
let's stick to the channel topic from now on thanks
Which is codex
correct, not roleplay
Asking codex to prove or disprove a concept is not role play. Good God!!!!
What you effectively did is role play, as the project moves forward the model buys into the fallacy and perpetuates it
Are you really a physics student? I would love that!
It makes up plausible scenarios and then plausibly proves them, just like it would if you were playing d&d with it
ahuh
No it did not, it did about 87 research articles to disprove or to support the concept
I can do this with 87 pages of d&d and it will prove it all if i tell it to
Discussing this further is not codex related... But serious question, Will the upcoming changes in context be better for me with pro?
I'm interested - where can read about them?
Do you even understand how research is done?
drop it or I will refer to the mods
Just go and find the ppl you need to verify your work.
yes
?
@velvet wren is wrong
so is @cedar skiff , thinking this is a valid argument means your brain / thinking is flawed
server rules rule 3
logical fallacy
He did not break any rule, this is a deterministic ground truth
I keep saying that to them, they seem to not understand
just ignore them, it's their own disadvantage
You even felt the need to make a second account to come and argue. Why don't you just go and find the people you need to verify your work and we will see alex as the next nobel prize winner.
💀
Just go and do what you need.
I think that you are right
robert stop harassing people for no reason
is it just me or is the gpt-5.4 in free tier dumber than gpt-5.4 in paid plans 🤔
Lol
Is anyone else having a problem with the Codex desktop app? It crashes around 15 sec after launch
I suspect it's because of the Windows update last night,
What gpt model are u guys using for coding?
GPT-5.4 high here
For those using Codex in a monorepo (/web, /desktop), do you run Codex from the root or inside each app directory? My AGENTS.md is in root so I figured root is best?
i seem to be burning through tokens ridiculously fast again. is this happening to anyone else?
using the app or cli?
root
seems ok here, not using fast, allowed subagents, using default context
if I use GPT-5.4 with xhigh then the burn rate is more noticeable, but high is fine
yeah I noticed something in codex app. Even with /fast turned off, if you click on the + it can still have fast selected?
vscode
tbf i use xhigh a lot, but the rate i'm going through tokens is still insane. burnt through a top-up in half an hour
im actually pissed at how easy it is to deplete the weekly usage
im just using gpt5.4 high
its like 1 prompt for 5% of weekly usage, but its like half a percent on 5hour?
on the plus plan btw
Well, back in August (there is a particular project I'm using this harness on that is impossibly large for just 1 guy, hence the swarm) I set up the project in a way that makes duplicate code the least plausible outcome. Same with monolithic code files. It's all about scaffolding it in a way so when they navigate the codebase they are most likely to see the correct way to do things.
Step 1, I chose Rust not just because I love it, but because it automatically rules out memory issues, the compiler enforces code quality in a meaningful way, and it lets me subdivide the project into small shared crates easily. The agents use the cargo.toml as a table of contents for where parts of the project are located. The frontend uses Flutter with Rust as the state machine, and the backend uses Rust entirely, so if they're writing code that is used on both ends they write the DTOs in a shared crate. This pattern is so common throughout the project it's automatic. The beginning was rough, but with 55 shared crates they see it and just do it, writing all shared logic in a compact "public library" format.
Step 2, context distribution. Everybody knows a new thread always performs better than a 10x compacted thread. But, lo and behold, if an agent has exactly 1 job to do with a small & fixed set of decisions, they can perform well enough after 10x compactions. An orchestrator takes in a task or a list of tasks, has instructions and tooling for spawning agents, delegates the tasks, and adversarially reviews their completion.
Step 3, tooling. If you use --yolo, you're giving the agents the ability to work around strict processes. I give them workspace write, and I set their CWD to a worktree (or the orchestrator does this). I have rules that forbid certain commands, and the justifications tell them what they should do instead. When they hallucinate and drift, the tooling tells them to do it the correct way.
There's more to it, but those are the 3 things I think are the most impactful
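A rough sketch of the Step 3 idea (forbidden commands paired with a justification that says what to do instead). This is not Codex's actual rule format; the rule table, the example commands, and `check_command` are all made up for illustration.

```python
import shlex

# Hypothetical rule table: forbidden command prefix -> guidance shown to the agent.
# These entries are illustrative only, not rules from the original setup.
FORBIDDEN = {
    ("git", "push", "--force"): "Do not force-push; make a new commit and let the orchestrator merge.",
    ("rm", "-rf"): "Do not delete directory trees; ask the orchestrator to clean up the worktree.",
    ("cargo", "publish"): "Publishing is out of scope for workers; report back to the orchestrator.",
}


def check_command(command_line):
    """Return (allowed, message).

    A command is rejected when its leading tokens match a forbidden prefix;
    the message is the justification telling the agent what to do instead.
    """
    tokens = tuple(shlex.split(command_line))
    for prefix, guidance in FORBIDDEN.items():
        if tokens[: len(prefix)] == prefix:
            return False, guidance
    return True, ""


allowed, message = check_command("git push --force origin main")
print(allowed, message)
```

The point of returning the guidance string rather than a bare rejection is that a drifting agent gets steered back to the intended process instead of hunting for a workaround.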
Not sure where to post this, but while I love Codex and GPT-5.4, when it comes to bigger projects like mine it becomes way too shy for big refactors, it tries too hard to maintain compatibility and is too biased toward surgical refactors, etc. Even when I instruct it not to, it still feels heavily biased against big rewrites. I do like that it has the ability to be extremely careful, that does have its use cases, but it should not be very difficult to make it do aggressive rewrites, the potential is being held back imo in these use cases. Not sure what the ideal solution would be but maybe an "Aggressive / Careful" coding option, or make it more intuitive or easier to instruct it to take bigger risks in its refactoring efforts.
wut??
You do know that the current usage limits are doubled right
like
if u need more usage
just buy credits?
or optimize your usage
I didn't fix the Apply Patch tool per se, but I loosened the restrictions so they can explore alternative code editing techniques. I understand that Apply Patch is there because it can fail if there's any difference between what they're applying and what's actually there, which can be helpful. Also, it produces a clean-looking user interface when they're applying a patch, and it lets them track diffs or whatever, but I use version control, so I don't really care if they use Apply Patch or not.
I once had an issue deeply refactoring a part of my frontend. I asked it twice or thrice to redo it completely, that it was okay since I had an up to date repo, backups and so on, all to reassure it about the possible aggressive refactor.
It stayed biased, didn’t change the UI that much (if at all), took too much time for nothing
Then I asked it to nuke the front-end, that I hate it and don’t want it anymore, just to keep track of the API routes used
Once it nuked it, GPT 5.3-Codex was able to refactor it like I wanted
I don’t know if it suits your case, but explicitly asking GPT to remove everything it knows and sees to redo it while keeping only the things you need seems to be the working solution
I understand, thanks
If the default context window were at least 544 Kt (without the x2 usage rate), that would be a great improvement for handling big context. The x2 rate after 272 Kt seems too high.
You gotta put important info back into the chat history as user messages. User messages survive compaction. 272K is fine if the user messages contain important and detailed information
For example if you build a plan with the agent, reinsert the plan they made into the conversation as a user message and it will not forget the plan for the whole conversation
i know but they get wasted easily
5.4 xhigh /fast with subagents and responses_websockets?
wut
i like
never run out
on plus
of weekly*
I only use it like 4 times a week tho tbf
Why would you use xhigh?
genuinely
It’s not better than high
and mostly just wastes tokens
(for 99% of work)
also r u using fast inference
High isn't even better than medium in most cases
true, i mean I only use high for when I use 5.4 mini
otherwise medium 90% of the time
not being autocompacted every 10 seconds does wonders
That's perfect. There's a real performance reason for gpt-5.4-mini. 5.4 on high uses 33% more tokens and is 0.3% more accurate
yes... but to keep the agent under control, that is not enough. I actively use handoff documents to keep it trackable. But there are certain tasks that require a lot of context and it's easy to reach the context limit. Compaction is a risky operation. I also use subagents to keep tasks separate and isolated. But to be honest, having 500K tokens (same price) would be great, so the recurrent handoff operations would be fewer.
Gotcha, I mean that's up to OpenAI. I'm just saying if you have docs in the project that they have to cat TODO.md, when they undergo compaction, the plan becomes a "mental state blob". If you give it to the agent as a user message it is preserved verbatim. They don't forget to read the file and start drifting because the plan is unavoidably in their brain at all times
Yep. Handling tasks within the 270 Kt is the challenge... I always avoid reaching auto compaction (it's not under the control of the agent manager, it could miss important context), so I use a handoff document that I can edit if necessary or work on with another agent to see if it's necessary to split the task into smaller chunks. It's about keeping the context healthy and clear, and with at least 500 Kt my life would be much easier 😄
I mean you might as well try it! Sure it costs 2x if you set it to 500k, but if you look at the math:
272k * 1 = 272k
228k * 2 = 456k
Total effective usage = 272k + 456k = 728k
per-token multiplier = 728k / 500k = 1.456x
So if the amount of work you can get done is more than 1.456x by simply raising the context window then you can easily justify the cost of raising the limit
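The same arithmetic as a small sketch, in case anyone wants to plug in other window sizes. It assumes what's described above: tokens up to 272k count 1x and tokens past that count 2x. The function name and parameters are made up, and this is not an official pricing formula.

```python
def effective_usage(window_tokens, base_limit=272_000, overage_rate=2.0):
    """Return (effective_tokens, per_token_multiplier) for a full context window.

    Assumes tokens up to base_limit count 1x and tokens beyond it count
    overage_rate x, as described in the chat. Not an official formula.
    """
    base = min(window_tokens, base_limit)
    overage = max(window_tokens - base_limit, 0)
    effective = base + overage * overage_rate
    return effective, effective / window_tokens


effective, multiplier = effective_usage(500_000)
print(f"{effective:,.0f} effective tokens, {multiplier:.3f}x per token")
```

For a 500k window this gives 728,000 effective tokens and a 1.456x multiplier, matching the numbers above.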
Prefill speed decreases quadratically with context window (if not cached), decode decreases linearly per token, and memory footprint increases linearly based on model arch. 1M context might cost a terabyte of VRAM, so I get why they charge more quota to go past 272k.
Yeah! I have my real limit over the 272 K. I increased the context window for those cases where I'm a bit tight with the task and need temporary extra window to avoid the auto compaction operation during the handoff operation. But in general I manually try to keep it under 272 Kt, including the handoff operation.
Before I used Gemini with its 1M context window by default, so I'm really spoiled when it comes to that. 😄
My comment is more about a wish rather than a real issue 🙂
I gotcha 😏 I think Google has specialized hardware they built themselves, and not enough users to put a DDoS-level load on their stuff so they're being generous
Reinforcement learning
new system prompt totally changes the game. 5.4 is downright surgical if the system prompt is tight
where can I find this new system prompt?
I keep my system prompts here: https://github.com/robertmsale/.codex/tree/main/roles
That was an orchestrator. You set the sys prompt by going
model_base_instructions = "/path/to/instructions.md"
or codex -c model_base_instructions="/path/to/instructions.md"
Dang Rob, progress 👏
Will they ever let us generate images inside of codex? It would be so useful
Loll "Yes." "Understood."
someone's gotta tell Codex to live a little, can be so robotic sometimes lmao
I made it that way on purpose 😂 one of their instructions is "never respond with a decision, take decisive action and respond with the results", so they aren't like "I'm going to approve the worker's command" without actually carrying out the approval. Unfortunately that makes personality disappear
But they never stop until action is taken, so the benefit is forward progress
you can use the kylewhirl/image-playground-skill to generate pictures with playground/chatgpt which can use your subscription
one-shot by gpt5.4
nice, how did you create the prompt spec?
I have created an AI agent that codes close to 1:1 replicas of any website. just paste a url. not sure if I wanna do b2c or only b2b with it.
but exploring a new product where just say what site you like then you get a website based on that vibe.
this was first prototype, it will get much better if I can get funding for it so I got time for working on it more.
Your job is to create a landing page for an AI agent company that is to handle legal cases.
Don't focus too much on the text and such, but make it like a full landing page for an AI agent website. The way you are to create this website Is by using this website as a spec and instruction on how the code is to look and how the styling is to be and basically this is your design guide. Do not output any code that doesn't reference this website. Essentially you are to think of it such that you are to create a website that looks like this specification but just for a different content site so it is to be different content and different text basically but the hover effects should be the same the same interactable code design the same style the same colors the same vibe exactly it is to look like browser base .-com landing page but for the AI agent company that is to handle legal cases so you are to essentially create your website that looks the same basically but for the company landing page that you are building. Be thorough and create a perfect one and make sure that it truly is inspired by this website.:
ok that prompt was a hard read... but worked for first prototype apparently
Is it worth to buy the 1k credits? or any other suggestion?
Fun with VSCode extension
what model is used in codex cloud if anyone knows?
what are the main reasons of codex cloud vs codex local
I would say if you can help it, don't use cloud. If you don't have a computer then it's the obvious choice. If you have a potato computer, or you use Windows, those are other reasons cloud makes sense. Nobody knows what model runs on cloud but probably a flagship model
It's been so long without an update from Codex... almost 1 week! I hope everyone is healthy!
ngl I am eagerly waiting for codex app-server --listen ws://[::]:4500 to transition from experimental to stable. It's the most exciting thing they've released since 5.4
anyone else notice a serious drop in attention span for 5.4 today? it seems like it's getting sidetracked very easily.
someone got tired.
cloud doesn't make much sense on windows either
cloud is good for when you're out somewhere and an idea comes to you
An observation that might help someone:
I have directives for Codex to update files like README, CHANGELOG, etc. If I see an anomaly in its text, I strengthen rules, and I may reprocess a short effort to see if it uses the new rules. But it doesn't work quite like that:
- If you change rules somewhere, you must tell Codex explicitly to re-read the updated file, otherwise the old rules are in context and not re-read.
- If a file has already been updated using a format/pattern, Codex will use the existing file as a model in preference to directives. That is, it follows the current convention as a higher-level guide than earlier instructions. This is (arguably) consistent with the pattern we know where instructions in the current context have more strength than Custom Instructions, Project Instructions, or AGENTS.md files.
I've seen this play out many times but am only now committing to keep this in mind as I tune bot behaviors.
HTH (I hope that helps)
I've already made an IRC client, that connects to Codex 🙂
To the ios devs here: Whats your go-to way of creating icons? Is there a specific workflow ? Maybe combining it with the icon composer.
Affinity Designer + Icon Composer
I flippin love the new Icon Composer. No more exporting 30 different sized images
I launched a VSCode extension to manage multiple git accounts, it works well and now available in the vscode marketplace. https://marketplace.visualstudio.com/items?itemName=Clickshow.ghas
@boreal holly what models and harnesses are you mainly using right now
have you tried composer 2?
ive heard it's fast and okay, but the benchmarks dont mean anything anymore xd
Orchestrator: 5.4 medium
Worker: 5.4-mini high
Hidden: 5.4 medium
Custom harness
I haven't tried anything other than GPT models and custom harnesses. I have however tried local inference with limited success
hidden?
Ive been using 5.4 xhigh for plan and then 5.4 medium for orchestration and 5.4 mini high for subagent
I have hidden agents that do not participate in the orchestration/communication part of the harness. They're for one-off tasks. They have the little eyeball icon. I don't want em bothered by other agents, nor to bother other agents 😂
ah xD
I see, thanks for the explanation
always working to improve my workflow
for me though, ive been slumming it with normal codex harness
it's good enough™
Hey that's fine! I like playing Conway's Game of Life with my codebases 😂
I just dont want to get in the trap of always improving my "workflow" to the point where I spend the majority of my time on my workflow compared to actually making things xd
I do have limited tokens after all
I have oai plus and gemini pro but rn gemini is basically unusable
True! I have spent a fairly significant amount of time on the workflow, but I'm building an AI powered software so using the workflow as practice
yeah all the software I build is normal
right now im experimenting with 5.4 mini on medium
it is mostly usable
the faster iteration helps offset the small increased error rate
I mean 5.4 mini is silly fast
You may have chosen a personal account, not your business workspace.
No i chose the business, as u can see i have 100%
But before i switched to business i consumed all of my account usage
So is this a bug, where i need to upgrade to plus to use my business seat?
I use the business plan but I never experienced this in the desktop codex, but try the cli codex if there is no problem then there is a bug in the desktop codex
Im using the IDE in visual, its not same as codex exe right?
Yeah, it’s not exactly the same. The IDE in Visual uses an extension/UI layer, while Codex CLI runs directly in the terminal — so if the CLI works fine, it might be an IDE-specific issue
But can i have my same conversation on the CLI? cuz im working on a project and we working on a roadmap and it stopped in half of a task
You can continue in CLI, but it doesn’t maintain the same conversational state as the IDE. The IDE layer keeps richer context and history, while the CLI is more prompt-driven. So to resume your task reliably, you’ll need to restate the context or provide the relevant instructions again.
I tried another account and i was able to send 1 message then i had same error that saying out of codex while i have 100%
Is this because im using a lot of business seats and accounts?
I used 3 business seats in 4 days only
if you’re definitely on the business workspace, then it’s unlikely related to seats — seats only manage access, not usage limits.
Getting ‘out of Codex’ after a single request sounds more like a quota or rate limit issue on the backend, or possibly a bug in how usage is being tracked in the desktop/IDE layer.
Yeah i mean business workspace
Got it — if you're definitely on the business workspace, then seats shouldn’t affect this at all.
Hitting ‘out of Codex’ after a single request isn’t expected behavior — that points more to a quota tracking issue or a bug, especially in the IDE layer.
If you haven’t already, I’d compare it with CLI behavior — if CLI works fine, that pretty much confirms it’s an IDE-specific issue rather than anything related to your account or seats
I'm in the process of refactoring a lot of task-oriented/procedure docs into skills, some of which will do routing to other skills. Does anyone have a sense of limits on this fairly new tech in terms of:
- number of skills folders which become unwieldy
- number of invoked skills which start to clutter context or otherwise start to get ignored
- maximum SKILLS.md files size beyond which the agent gets wonky for any given skill
I don't intend to have a thousand skills or files with multi-MB of prose. I'm just looking for technically reasonable limits that we self-impose based on experience which I don't have yet.
Thanks!
Oh, and is it reasonable to assume that skills will trigger other skills?
For example, if a prompt implies a documentation update by the Docs skill, and specific doc updates imply a specific type of handling by a DocTypeFooHandler skill, does model CoT lead to specific skill invocation?
I suspect so but need to ask, as I have to wonder how far down the rabbit hole that goes. 🐰
5.4 high, without subagents (excluding the search model they use), without /fast, without responses_websockets
im just a plain dude
5.4 high is all i need
now im using 5.4 mini to stretch out the usage
looks attractive but iddkk
mcp
can anyone try installing the github mcp server, adding their gh pat token, and seeing if they can list all their repos from codex? I tried a lot but couldnt do it, it keeps on saying I need to do gh auth login but it doesnt let me finish the flow to do it. docs: https://github.com/github/github-mcp-server/blob/main/docs/installation-guides/install-codex.md
@nocturne folio @lean lark @elder edge if u guys could try would be great
Edit: nvm someone said that it also stopped working for them. wack
Just a Public Service Announcement from your buddy The Captain: Be careful about MCP Servers and Tools.
You give um your API keys. You give um the creds to your accounts.
Are ya lookin at the code or are ya just trusting?
You auto-update FOSS MCP components. Are you checking the PRs? Who is?
If you're paying for a sub that offers services to save you work, what are they doing with your creds?
Just be careful out there, and advise your employers, family, and friends to do the same.
um, wut?
did they change the copilot plan on the github student pack?
Yeah
How is /fork meant to be used? I sometimes want to "go back" several steps in a codex (cli) conversation and then from a certain previous point, "branch off" to a new conversation. Is that possible, is this what /fork is for?
Is there any usage cost to using the codex STT
@elder edge i tried it today and it worked normally, didnt have same bug as yesterday
Aint no way, i send a message now and the bug comes again
Xd it disappears and returns for no apparent reason
fork splits the conversation. Agent suggests next steps that go a different direction than you want to? Fork the convo and keep your momentum going in convo-A and follow the next-steps in convo-B. when convo-B runs its course you can keep it around OR archive it OR get the session ID and tell the agent in convo-A to use codex exec resume <convo B session ID> and "interview" it about the work it did to bring it up to speed.
is codex feeling like its cripple atm?
hey i am creating a social media agent with codex and i added the image generator API and text generator API from chatgpt, i bought credits. now the problem is it's not creating good posts, the text is good but the post design is very bad. can anyone suggest a workflow or anything that would help me improve it?
fork is typically used if you wanna have 1 agent that explored and planned something big, then before implementation you fork and have it do a slice. when the slice is done, resume the original and fork again to do another slice. That way the research and planning phase isnt being repeated
Do any of you use the Codex CLI (TUI)? If so, I'd appreciate your help in validating a new feature before we enable it by default for all users. I've been working on porting the TUI on top of our "app server" API. This will allow us to add cool new features to the TUI like remote access. To enable the new functionality, use "/experimental", choose "App-server TUI", hit enter, then quit the CLI and relaunch. If you see any regressions, please report them here or in the github issue tracker. Thanks!
Does this just run the tui with the app server in the background?
Kind of. It runs the TUI on top of an in-process instance of the app server. So you won't see a separate process.
I've had the feature enabled for quite a while and honestly haven't noticed any regressions, I've also been using the Codex app server for other UI's.
For instance, I forked clui-cc which was originally meant for Claude code, but I changed the backend and UI to fit more with Codex (you can see this here
Here's a screenshot of that in action:
I plan to make the UI for tool calls a little better and more informative when you click on it, but I need to brainstorm how that would look like.
-# If you want me to switch out better-hub links for regular github links, let me know.
So what happens if you do that but use the --remote flag? Is there a way to have the internal app server expose its port so you can just use the tui as the app server natively and have external sources connect to its internal app server?
We'll implement remoting and document how it works in the near future. For now, I'm just looking for validation that the local (in-process) app server feature works. This is a prerequisite for remoting.
??
Sounds good.
I think we need a skill like $startup-mode
where Codex is significantly more willing to cutover, totally change functionality / architecture
and find smart ways to reduce scope while keeping a similar end-user experience
Make one. I forgot to say it a few times and then I made one.
I'm just spitballing some ideas for token savings. I wonder if it would be useful to effectively route all CLI calls/tool usage that's not a read to a temp text file. Then if the output from the CLI is needed, call the read_tool along with a hash to get that specific output.
I was just thinking for some build processes it would be nice to effectively silence output and go off of return codes only.
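A minimal sketch of that idea, assuming a plain Python wrapper around the command. `run_quiet`, `read_output`, and the `cli_output_store` directory are hypothetical names for illustration, not anything Codex ships. The caller only sees the return code and a short hash; the full output stays on disk until someone asks for it.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical storage location for captured output (not a Codex feature).
STORE = Path(tempfile.gettempdir()) / "cli_output_store"


def run_quiet(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd silently, save its combined output, return (exit code, hash key)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # The key is derived from the output itself, so identical output shares a file.
    key = hashlib.sha256(proc.stdout.encode()).hexdigest()[:12]
    STORE.mkdir(parents=True, exist_ok=True)
    (STORE / f"{key}.txt").write_text(proc.stdout)
    return proc.returncode, key


def read_output(key: str, tail: Optional[int] = None) -> str:
    """Fetch stored output by key, optionally only the last `tail` lines."""
    text = (STORE / f"{key}.txt").read_text()
    if tail is None:
        return text
    return "\n".join(text.splitlines()[-tail:])
```

For a build step the agent would check the exit code first and only call `read_output(key, tail=...)` when the code is non-zero.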
@boreal holly swears by using mini agents to offload tool calls to extract whatever Er
You save a substantial amount of tokens doing it that way. If you have a smaller, faster agent read the output and summarize it you save ~40% context window and weekly quota (for implementing agents & code writers)
Is $200 plan 6x more than pro, or unlimited?
I believe it, even if you used the spark for summary it wouldn't eat your normal usage. We need some more hooks!
Oh yeah, you can dial it in so the base instructions are super minimal and save even more tokens. I'm working on that right now!
What is the limit of 200 plan, can you use 12 hour a day, for entire month
Depending on your usage. It is not unlimited.
Thank you
I use it pretty much 20 hours a day and it lasts me the full week
The limits get low by the end of the week but it lasts
Thank you
💀
youre cooked after the limit halvening
how do I examine the finding noted in the upper left of the input panel in codex app.
Why is there a new codex update out, but nothing in the changelog? Is this a litellm bugfix?
Saw a video a while back where someone was updating codex projects using their phone. Any ideas what was being used?
It’ll be printed in its response
One of the [P#]
Happens all the time. OpenAI isn't big on documenting changes. How is it possible for a frontier company on the bleeding edge of technology to subscribe to these grumpy old man and lazy millennial culture policies like not publishing change logs? I expect this from Twitter, Google, Microsoft, Facebook, etc. But OpenAI has been a real disappointment in this one specific area. SMH
fr
Does codex use lite llm!?
It’d be weird since the OpenAI SDK already does what litellm does afaik (I’m using it instead of litellm to allow connections to local llm for example)
Also I doubt they’d only push an alpha if they were using it and fixing the litellm breach. so very likely no
I agree on the change log transparency.
Seconded
There's a changelog? (jokes)
Starting on March 31, 2026, Code Review usage will count toward your regular Codex limit instead of having a separate Code Review-specific limit.
This means:
• Code Review will draw from the same Codex usage pool as your other Codex activity
• You will no longer see a separate allowance just for Code Review
Wonder why
Heyyyyy guess who's gonna be doing local review only
more than likely starting to consolidate compute pools
When was rg my system for litellm codex popped.
Is there any tool to remember, browse, manage codex "sessions"? By session I basically mean the state that can be resumed using this long string it gives you after exiting.
Basically, I mean the equivalent of the left side bar of plain ChatGPT, for codex (cli)
/resume
That just resumes the last session
I mean remember, browse, manage the many sessions, of many different projects. which resume string belongs to which project? which state? things like that
Sorry I stand corrected
You are actually right
This does list the sessions in the current dir with summary yes
ok cool
not sure if this has been mentioned before regarding token limits and model accuracy. I turned off 1 million context window and memory and im seeing "normal" as expected token usage. Also using /fast and sub-agents all the time with 5.4-high as standard.
codex has memory?
hhh
Hi
e
noooo.... I'm at 2% of burning my week quota
Reach 0% after 4 days again, despite mixing workload with the web app. But luckily it hit 0% right after a cleanup run and creation of a new prompt. 🙂
anyone know how the credits work? the equivalent into tokens?
I'm waiting for the mid-tier ($100) between Plus and Pro... But maybe with credits I could continue working as usual
the problem will be when the promotion (2x) ends (soon, April 2nd)
pretty sure openai doesnt stop ur responses mid way like anthropic
im just waiting for the announcement of the 2x usage increase to stay, or possibly a 4x
I mean to remember that I saw that happen last week, but I didn't really care to take a note, so I'm not sure.
the $100 pro lite plan is already in the json of the page, but it's not visible yet. My guess would be: it will be revealed shortly before or after the 2× stops
yeah that's what I think too. And it looks like Pro ($200/mo) plan will be renamed effectively to Pro 20x. That's 20x of Plus apparently. Currently it's about 6x to 8x (8x in my own experience) of Plus.
Pro Lite will be effectively named as Pro 5x
My guess would be: it will be revealed shortly before or after the 2× stops
Yeah! I expect the same.
the $100 pro lite plan is already in the json of the page,
what URL page? embed JSON?
when you load the "upgrade packages" page it loads a JSON from the current country (e.g. US or DE or …) there is the mention of prolite:
"prolite": {
"month": {
"amount": 100.0,
"tax": "inclusive"
}
}
there's also this within the javascript of one of the scripts on the plans page:
...
, g = a({
featureHigherUsageLimits: {
id: "pricingPlanConstants.pro.checkout.proliteGate.featureHigherUsageLimits",
defaultMessage: "5x or 20x more usage than Plus"
},
...
Among other similar lines mentioning 5x or 20x
lol base44. Nothing says omnipotent god like creating your app on base44/lovable.
Before the never-ending token resets, I could go through Plus in a single day easily. But now with Pro I can barely get below 80% in a week. So I wonder if I can switch back to Plus.
Get out of here with that scam stuff
I will definitely go for the $100 plan. At least, it's easier to switch between multiple plus accounts.
anyone else on the latest IDE extension notice the main pane is all blacked out??
report user and message, it's the only thing that helps.
reset?
I had to go with the credit option... but it burns credits quickly... 50 credits in about 2 hours, and not complicated tasks.
yo does codex app use memory?
probably not, as from the looks of it it's not directly impacting Codex
I think if
[features]
memory = true # or memories = true idk 🤷♂️
then yeah
There's a serious problem with how codex communicates what you get for what you pay. People've said this should be illegal, but I think the company needs to get their act together. They were a bad model but now the pricing is annoying me: precisely how many tokens I'm getting is super inconsistent and I never know.
These subscriptions suck
Also just like streaming services they pack in things you don't use to charge more, I only subscribe for codex yet there's no exclusive codex plan
if you really want granular control ig you could just switch to payg api
That generally costs more for the same number of tokens, whereas the monthly plans give you more.
It's not calculated as just tokens, it's also number of requests, and whether your conversations are being cached. for example, if you generate a huge plan with 5.4, and then switch to 5.4-mini to implement, it has to replay the entire conversation up to that point on a different set of GPUs with 5.4-mini loaded which is completely uncached. Or if you wait longer than 15 minutes to talk to the agent, most of the conversation becomes uncached and the next send incurs full cost.
- cached inputs
- uncached inputs at a higher rate
- number of requests (this includes the small tool calls)
- output tokens at a much higher rate
There are so many variables, they instead say "you can send between x and y messages with this subscription". They have no idea how you're gonna use it and offer a generalized range. That's really it
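A rough sketch of why the same conversation can cost very different amounts depending on caching. The rate numbers below are made up for illustration, and `Rates`, `Turn`, and `turn_cost` are hypothetical names; this is not OpenAI's actual pricing or quota formula, just the four variables listed above plugged into a sum.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Cost per token / per request, in arbitrary units (illustrative values only)."""
    cached_input: float
    uncached_input: float
    output: float
    per_request: float


@dataclass
class Turn:
    cached_input_tokens: int
    uncached_input_tokens: int
    output_tokens: int
    requests: int = 1  # includes the small tool calls


def turn_cost(turn: Turn, rates: Rates) -> float:
    """Sum the four components: cached input, uncached input, output, requests."""
    return (
        turn.cached_input_tokens * rates.cached_input
        + turn.uncached_input_tokens * rates.uncached_input
        + turn.output_tokens * rates.output
        + turn.requests * rates.per_request
    )


# Assumed rates: uncached input 10x cached, output 40x cached.
RATES = Rates(cached_input=1.0, uncached_input=10.0, output=40.0, per_request=100.0)

# Same 1000-token history, once served from cache and once replayed uncached
# (e.g. after switching models or after the cache has expired).
warm = Turn(cached_input_tokens=1000, uncached_input_tokens=0, output_tokens=10)
cold = Turn(cached_input_tokens=0, uncached_input_tokens=1000, output_tokens=10)
```

Comparing `turn_cost(warm, RATES)` with `turn_cost(cold, RATES)` shows the gap that a model switch or a long pause can introduce under these assumed rates.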
You are joking right? If it costs you less than the API, which is 100% just what you need, then why are you complaining about it?
You better upgrade to PRO, we are still in the x2 phase...
I feel like good architecture and delivery management would massively cut down on the costs due to less substantial refactoring work.
The various coding agents are still vying for supremacy. The lead changes every couple of months. Not yet worth it for me investing in a single platform to that level.
As long as you don't come crying about it in a week or two xD...
Yeah, I finished last week with 52%, but that was with cloud reviews metered separately. Gonna ask OpenAI if they can do a "Pro Platinum Max" plan or something
idk if this is allowed or w/e but I wrote up a blog post on a feature I had 3 agents try and do, check it out if bored. spoiler: codex won, 1/2 hour initial response on 5.4-high
https://twolongos.com/3/24/i-gave-the-same-feature-to-3-different-agents-and-then-things-escalated/
how many tokens did you input/output in the week?
I don't keep track. I actually deleted a lot of session files recently so even if I did look it up, the stats are probably way off. Was taking up an insane # of gigs
if i hit the limit on weekly can i still use spark at that point?
Oh yeah absolutely (until they fold spark usage into general quota)
So 5.4 won by standing its ground instead of capitulating? I like it!
anyone else have memory leak/memory issues using codex app on windows? after i use codex and make it do a lot my memory gets used up, so i will have 90% mem usage even after closing wsl and codex and everything i have on pc.
once you've closed them, check to make sure there are no more ghost processes around. It's a small side effect of the sandboxing for windows,
yeh i'll look. was working fine for a month and now suddenly it's an issue, and it only happens if i let codex work on my project. i'll try and check if it happens with other projects too, might be my project that's causing it.
I'm noticing something weird in the weekly quota... It's dropping too fast compared to previous days. I just got the renewal checkpoint for the week (100%), and I just did a session with only 88K tokens... and the current quota is 97%... At that rate I'll consume it in a few days. Before, the quota consumption was slower. Anyone else notice this?
not sure how codex does tokens, but if u make it do a deeper search in ur code and make it read more files etc ur gonna use more quota. but i'm not sure if codex counts ur files as tokens or not.
Got my little guys piloting the simulator 😎
QA: receives user stories -> reports user experience issues to orchestrator
Orchestrator: turns any kind of report into an actionable plan -> spawns an agent to fix it, guides them to PR merge, adversarially reviews completion -> merges, archives worker and notifies QA to rebase and continue user story
Worker: receives unit of work with completion criteria, works diligently to achieve it
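The three roles above could be sketched roughly like this. Everything here (`Report`, `WorkUnit`, `plan`, `orchestrate`, the retry limit) is a hypothetical illustration of the flow, with the worker and reviewer passed in as plain functions standing in for agents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Report:
    """What QA sends to the orchestrator."""
    story: str
    problem: str


@dataclass
class WorkUnit:
    """What the orchestrator hands to a worker."""
    task: str
    criteria: List[str]


def plan(report: Report) -> WorkUnit:
    """Turn a QA report into an actionable unit with completion criteria."""
    return WorkUnit(
        task=f"Fix: {report.problem}",
        criteria=[f"user story '{report.story}' passes QA"],
    )


def orchestrate(
    report: Report,
    worker: Callable[[WorkUnit], str],
    review: Callable[[WorkUnit, str], bool],
    max_attempts: int = 3,
) -> str:
    """Spawn a worker, adversarially review the result, merge or retry."""
    unit = plan(report)
    for attempt in range(1, max_attempts + 1):
        result = worker(unit)
        if review(unit, result):
            return f"merged after {attempt} attempt(s); QA can rebase and continue '{report.story}'"
    return "not merged; escalate to a human"
```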
This is the one
is gpt5.3 codex free?
like unlimited usage
no
spot the dreamer o.0 🤣
i have pro and i manage to use all 100% even with the 2x rate....
damn
idk i thought i saw somewhere that it was lol
mb
Has anyone here run codex over 6 hours? i've run mine for max 5 that i could manage for now.
i just downloaded codex
im trying it out because i ran out of my limits on antigravity
HF. but give it good prompts or else u will get code that half works
u can do it the easy way and give ur specifications to gpt and ask it to ask u questions, then make it write a prompt for codex and send that.
okay thank you
Awesome use of the technology bro!
My processes are not nearly as sophisticated but bend in that direction:
- Bot runs the utility.
- Bot verifies output and creates anomaly reports.
- We discuss open anomalies (Human In The Loop!!) and I finalize decisions on how to approach them.
- Bot processes final decisions.
- Bot updates docs and does all other housekeeping.
- Rinse/Repeat.
I feel guilty (sometimes) ... it's getting too easy to have really super-productive days.
That's the thing. Once the app gets large enough, and if you're the only human working on it, it gets tiring going "Hey, the add categories button is missing for the 3rd time" after booting up the container stack and doing a full rebuild. Now the QA agents do all that work, and it legitimately gets done
These suckas are thorough
easy tasks like these aren't an issue for codex. but i'm working on an ai project with my ai, and my folder is now on 15 GB and codex is having issues, so i have made overviews and i need to make sure codex follows them and writes down the changes it does. working on a big project is really hard unless u tell it exactly what to do and how to do it. it can write code no problem, the issue is piecing together all the files...
what is this?
They are definitely no longer an issue
I've been wondering where it was. 4 - 6 weeks they said, 6 weeks ago (or longer, it blurs).
You have to be careful codex does love to add complexity if you dont manage it. which gets out of control over time.
You can end up with 10k lines of soup that should be 3k lines
jupp. good prompt and checking code myself here and there. but yes it tends to make it more complex than what it needs to be.
Yeah, i just let it go for a while to see how it would handle it. What it does is see the complexity and assume it's purposeful and needed. It does shallow checks, i.e.: "Yes this thing is needed, I can see it has a code relationship." Then it duplicates the complexity and adds more. Rinse and repeat until it gets too tangled to work on.
You can end up with 4 layers of abstraction with 3 enums all representing the same thing with slightly different names etc
Then if you just simply say is this needed? it will say yes because it can trace the relationship.
you have to have a deeper prompt to make it see the unneeded complexity
What language? just curious
dart mostly, but it tends to do this in any substantial code base. It knows everything but has zero comprehension.
does it in python too
Yes!!!!
What I've learned is if you have a dart project, you put the logic in separate dart packages with their own pubspecs. Make them hunt down where everything is. If everything is in one massive 20k sloc file they just add more to it. But if they see that import statement at the top, and look at the pubspec.yaml, they see the project structure without even telling them to do it and make better decisions.
Everybody is so eager to give them the table of contents upfront, but if they have to search for stuff I think it triggers that reasoning part of the brain and the perplexity makes everything less "build app, make no mistakes" and more "gosh darn, how do I put this in here? How is everything wired? This is crazy!"
There really is no way around the comprehension problem, it's tied to the way they work. It's the same reason they can have new ideas. Everything you do is just mitigation.
u can tweak a lot. in codex u can set rules and how it works etc. so much stuff u can actually do.
They do catch loops on their own. This orchestrator has been through maybe 10 compactions. I was about to tell em to archive that agent and they suddenly snapped out of it and did it.
Please forgive me, @late bluff but this point needs to be made: If you're giving instructions to AI using the same language you use here, then there's no doubt you're going to have continuing problems.
"Garbage In, Garbage Out"
my folder is now on 15 GB
Yeah, there's no doubt there will be problems with that. You need atomicity in your code and data to allow it to focus on specific challenges.
"Garbage In, Garbage Out"
issue is piecing together all the files
No, the issue is that you are asking AI to piece together files, when the correct way to do such things is either to avoid huge files, or to have the AI write code that assembles files correctly without having to send that much data to a server.
I cite these points as a service to our other colleagues here - as a great example of how NOT to use the technology.
There is lots you can do, but it is always a patch to the bigger issue. Just mitigation to help guide it.
Yes, these LLMs are the world's greatest pattern matchers. So the trick is to throw patterns down at them that are unavoidable
Something i would love is a layer that takes my ramblings and turns them into a nice llm friendly prompt before its sent.
This isn’t a garbage input issue, it's a system coordination problem across multiple modules. Modern AI-assisted development requires cross-file reasoning, and avoiding that isn't realistic. Also, atomicity helps, but it doesn't replace the need for global context in complex systems.
i could give plenty more but, u get the point.
both yes and no. if u want randomness it's bad yeh. it all depends how u use them in a system. u can't use them blindly.
I kinda wanna prove u wrong here, but theoretically i could have set up an LLM to control codex and make a project that's good enough. If i were to give the LLM the goals, how the system should be, what it's gonna be used for, etc., i bet u it would make a decent program for it even without me doing anything else.
I was focused on your language style being garbage to AI processors, not your data. You may use a completely different style in your discussions with models, invalidating that part of my statement, so that's fine. The styles we use in chat can/should be vastly different from that used with AI ... but too many people don't understand that.
YES THIS!!!
100000% agree with u
Kewl, thanks, we're on the same page wit dat thn. 🙂
If i have a really long prompt i ramble into the mic and then say "Tell me what you think i am telling you to do."
Then i copy the output of that - its always fantastic and use that as my prompt.
I was also focused on your mention of 15GB of data. That's an abuse of server processing if passed to the server in huge payloads. If you're breaking that down into small batches then again I have nothing to moan about. Again, some people throw 15GB at AI in a single file and then go "wutz rong wit dat?" ... if you know better, kewl, some don't. 
I have custom GPTs dedicated to this concept. 🙂
nonononono i wouldn't be sending 15 gb to the server over and over my god XD
I want to set it up I just dont have the energy
hahah - then once again we're on the same page. 🤗
yeh mb for my writing skills i guess
I don't look at it as "use the system", I look at it as "design the system."
When you run a program, how frustrating is it when you check a couple of boxes and receive an error saying "You cannot check both boxes at once, you must check either one or the other."?
It's frustrating. As a user, it absolutely kills the experience. For an AI, there is no feelings involved. But when they see the error, they go "Wow, I guess I can't check both boxes. My only option is one or the other."
So you design the system around rules with justifications. You know the randomness is there, and you'll always get that one agent that just doesn't wanna cooperate. So what do you do? You make it so when they try working around the process, they get feedback guiding them to the solution. Try to tune the parameters towards the correct path.
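Here is a small sketch of "rules with justifications", using the two-checkbox case from above. `Rule`, `validate`, and the option names `box_a` / `box_b` are hypothetical and only illustrate the shape: each failed rule returns why it exists and what to do instead, so the agent gets steered rather than just blocked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    check: Callable[[Dict[str, bool]], bool]  # returns True when the rule is satisfied
    justification: str  # why the rule exists
    hint: str  # what to do instead


def validate(options: Dict[str, bool], rules: List[Rule]) -> List[str]:
    """Return one guidance message per violated rule (empty list means OK)."""
    return [
        f"{rule.name}: {rule.justification} {rule.hint}"
        for rule in rules
        if not rule.check(options)
    ]


# The "cannot check both boxes" rule from the example above.
EXCLUSIVE_BOXES = Rule(
    name="exclusive-boxes",
    check=lambda o: not (o.get("box_a", False) and o.get("box_b", False)),
    justification="box_a and box_b configure the same thing in conflicting ways.",
    hint="Check either one or the other, not both.",
)
```

Calling `validate({"box_a": True, "box_b": True}, [EXCLUSIVE_BOXES])` returns a single message containing both the reason and the way out, which is the feedback an agent would see when it tries to work around the process.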
Get ChatGPT to create a prompt suitable for a Custom GPT that allows you to do it. This kind of "Inception" is freakin awesome.
^^ but u gotta know ur goals and a bit of how the program should be. if u just write "make me a program that does this", it's gonna do it but it won't be good
@lean lark u using codex app or just linux/wsl?
THIS is significant ... most people are vibing with Codex and just get it to satisfy the whim of the moment. Robert and I are using our experience with the tools - to improve the tools - to create better software that doesn't require tools to fix the same issues all the time. It's similar to the Custom GPT thing to organize thoughts, and using ChatGPT to create the prompt for that.
Use the tools to improve your use of the tools and you will do FAR better than just using the tools for each little task.
This comes back to the "give a man a fish and he eats for a day. teach him how to fish and he eats for a lifetime". Stop coding one fish at a time. (Um, wut? 🐟 ) Hone your tools to eliminate ongoing issues, create better code (and docs!!!), and it'll save you a ton of time - not just "later" but like later today, tomorrow, forever.
That's legit the best way to do it. Sometimes I'm like "Tell me what you think your role is as an orchestrator", it tells you, you find holes in its reasoning, and make it re-explain it. Once they say the right thing, you go "OK good, proceed with your role." Only the most recent models like 5.4 can be steered like this
I haven't touched the app yet. Have been waiting for wailing and gnashing of teeth associated with new offerings. Sometimes I'm bleeding edge, sometimes leading edge, sometimes I intentionally hang behind, and sometimes I just miss the boat ... experience helps.
So everything I do now is in VSCode with rare CLI requests. I haven't used Web since I started using the extension. But I understand some of the value of the app and will consider that soon. They're different tools for different purposes. I just don't feel the need to reach for the app's value-add yet. YMMV
wait a bit id say. still not 100%
That's the weird part. We're in the "generative" era. So using ChatGPT to take an idea and growing it didn't exist without 100% manual labor. Now you can be loose about how you put things together, and let the generative AI fill in the gaps. If what they filled in is insufficient you make corrections, until finally you have something that you can "vibe" with and ba-da-bing you send it into the pipeline to get built
It's practically magical
I've been intrigued by the app server. VSCode extension is just where I live but there is value to using other UI's. I do want to continue Codex exchanges (threads) when I'm away from the PC. So if I can use the app server to integrate with threads created by Codex, I'll be a real happy camper.
Based on the feedback it's good to be at the "intentionally lags behind" phase 😁 I think if OpenAI has done anything right, it's the "experimental" designation on certain features. It really is stepping into the danger zone. Meanwhile they have all these stable features that absolutely slap, and that's where I am happy to be
I don't trust vibing. It's my personal view that vibe coding is unprofessional, precarious, and the paradigm of bottom feeders and amateurs. However! Even in just the last few weeks with 5.4+ I've been increasingly delegating responsibility for accuracy to the agent. My processes are in place for verification, documentation, CHANGELOG updates, and all that housekeeping. So "Trust but Verify". And as noted earlier I do enforce a decision process before making many changes.
But honestly I've begun to trust the tech more because it has earned my trust ... not because I want it to do free work for me that I don't understand.
It takes a LOT for me to trust tech and people ... this stuff is evolving fast and I've needed to adapt my sense of trust to that rapid pace.
Having said all that ... I'm struggling now to understand something that 5.4 just said that's downright Dumb. I dunno how to ask it for clarification because what it said seems too dumb to explain or counter. 🤣
I've learned more about SWE in the last 7 months than I have in all the years of college and years of hobby work preceding that by simply watching how these little gremlins navigate and reason about the codebase. I have eyes, and I fail to see half as much as they do slinging ripgrep around like a rodeo clown in a field full of calves.
To me, Codex is like the "thought I haven't had yet, just waiting to exist." I could never refactor dozens of files with a python script without thinking about all the consequences. These bad boys do it in a millisecond without a nanosecond of thought. It's truly inspiring!
Inspiring, and yet oh so reminding of personal inadequacy. I hate it when I take twenty minutes or more to craft what I think (oh human hubris) is some really well-thought out and articulated concept, and ChatGPT or Codex respond in less than a second with a huge really well considered response. It's like working out at the gym, feeling good about muscle growth and tone, and seeing some beautiful woman walking right by for the really built guy who you know is on steroids. It just dashes the wind out of ya. (Which is fine for me cuz I love my wife, and BTW, I don't work out at a gym anyway.)
Any codex skill that ensures codex isnt the worst frontend designer ever
Backend wonderful but frontend is miserable
Pretty much, its a lot more stubborn than CC imo. Its a little refreshing..
The best skill is to use acpx and get Claude/Gemini to collaborate on the front end design and formulate a comprehensive plan and then present it to codex for implementation
It has been an absolute game changer
Gemini daily limits are very generous for doing little collab work like this
How can I configure Codex in the cloud to access an MCP server (my app) in order to read config detail required for the build (the RPC namespace is generated from database data, database is MSSQL, so no ODBC in Codex Cloud sandbox, unfortunately)
Claude.ai uses this constantly to design the prompts, but when I send it to Codex, the setup is unable to generate the RPC bindings for the frontend because there are no ODBC libraries in the cloud sandbox, so can't use pyodbc. Alternative is probably just to expose an endpoint that generates the manifest, but I'd like to just have Codex read my API or MCP server directly, seems a lot more useful.
Here is an example: I like this as well.
If later we do want a wake-side tightening, I would rather tighten the raw top-core floor than focus on 52 -> 54. That would target “leave Saver only on a more obvious spike” more directly than nudging the blended score by 2 points.
The other day (as in this morning at 4am) I was troubleshooting why my mac was experiencing kernel panics (for a month, 4 of them happened, chalked up to me being greedy)
I spent countless hours of research going "why does kalloc.1024 kernel map become exhausted?" Every search comes up dry. There are maybe 4 reports on the whole entire internet, all centered around different reasons and troubleshooting steps. So I had codex build me a zprint server https://github.com/robertmsale/.codex/tree/main/zonewatch. I watched it like a hawk, checking zone health, isolating various workloads. Ran my docker containers, ran integration tests, until finally I realized the reason my kalloc.1024 zone map was exhausted is because I had my agents working in a highly containerized environment just to be safe, and to be safe I was like "I have 128GB of ram, so all integration tests must be run with tmpfs because I have the hardware to support it." Turns out even with more RAM than there are stars in the sky, a memory leak in the kernel cannot be stopped.
I have 3 weeks worth of conversations where I kept bouncing ideas off of ChatGPT, and it was steering me in all sorts of directions, but the magic happened when I had an open mind and said "maybe it's the virtualization stack. I'm gonna rip it all out and see what happens." Turns out yes, kernel panics can happen on a mac if the conditions are completely and utterly unrealistic.
Human : 2 points
Bot: 1 point
Anyways, the agent steered me to the solution, and that's the part I think is cool
since anthropic increased the amount of subsidization, i believe theyre going to increase it... welll i hopppee
please sam altman
if they dont ill just go see if glm or any other chinese model is competent
anthropic didn't give out more effective usage, what they did is make the models use more to be better. Then they increased the usage because the models cost more, so in the end using opus 4.6 gives about the same level of usage
Seems like they didn't innovate in the research space and to stay in front they just made the smartest model accessible
It's kinda like they made the latest Microsoft Word more smart, but you still gotta write the words 😏
glm is the best and most competent model ive ever used omg
its ok
i think it writes better than gpt
like if you want some skill edited or something like that
OK, so it turns out orchestrators are better when they are the ephemeral ones
It's gunna be very awkward next week ..
when we have 2X usage limits until April 2nd
but last reset is on April 1st 😆
will have 1 day to use 100% at the 2X limits
it might also be that the quotas don't reset but rather the consumption rate changes 😛
with the 5 hour windows i dont think its possible
Yeah, so that’ll be a 24 hour window
Of 2X consumption rates 😄
Five hour windows are very generous
From what I see
But I’m sure gunna find out
hope the codex team are all doing ok... been quiet out there 😄
every day I wonder, how is this real life?
Do I still get the 2x quota if I use codex through opencode?
What is the best way to share a project between codex and Claude?
Whatcha mean? they'll read the same files/repo. If you just want a spec or instructions have them write to a markdown file is what I do
is new england clam chowder the red or the white
Coming from Claude pro I’m perfectly happy with codex limit
Like the era of checking usage one last time before a huge refactor is over
i use both, run out of claude limits then swap to codex 🤣
Like I have a local setup as well to do simple stuff, but with codex I can actually use the frontier models more often
■ ! shell commands are unavailable in app-server TUI because command output is not yet persisted in thread history.
what
is there a way to request a feature in the codex app?
Can AUS get the free codex credit thing?
never seent this one before 👀
Hi, I have a question. If I have multiple paid accounts, would using an account auto-switching tool violate the usage policy?
anyone has any idea?
sure bro lol
changelog for codex app?
I wish the codex app team would make Windows more friendly like how opencode did it. Just a popup (New Version! > Click to install and restart) instead of going to the microslop store to check every single time bruh... it's too tedious.
Switch to macOS, it's far better at this stuff
for some reason using @ to search for files defaults to skills first and I have to type the exact file name to get it. Anyone know if this is related to:
shell_snapshot = true
shell_tool = true
It would be, if they did an App Store release, but they didn't, so it updates itself instead of going through the normal way, like on Windows.
It's in one of the latest updates
??????
the Codex macOS App updates simply by clicking on update, it closes and restarts. Painless experience compared to Windows.
the macOS version is not distributed via the App Store, which makes the update mechanism non-default
do u even have a mac?
It's simple, but non-default. The Windows version is the default
top tier ragebait
yes, why is that a question?
I was just pointing out that even if macOS is far better at updates than Windows, this app is using a non-default way. I didn't want to provoke any outrage about that, just a clean statement.
yes, that's the latest version … ?
thanks, went through the changelogs now.
Compacting = forget the most immediate context
what model?
5.4 extra high
damn, I was 10 days behind
heh
seems like codex app on windows is like a middle child in the house fr
Do your own custom compaction. That is what I'm using and it's orders of magnitude better
Vanilla compaction is terrible
Didn't even know that was possible. Where is the setting?
You tweak codex cli itself
🙂
Using codex obviously
There's a reason codex is open source
I don't know what that means. I'm using the web app, are you saying the CLI exposes some different compaction setting?
Settings are for normal users in the AI era. Codex code is open source so you can tweak it
Ah I suppose there is some compaction endpoint and the app calls it according to what its internal rules are and you're saying it's possible to change that rule. Makes me wonder now what the rule is
I had the same issue as you and it was annoying af
Because the agent was deep in a task, then compaction came and he forgot everything
And was wondering what's up with those half-made scripts
And I was like well dude you were just working on them 😂
Yup. But that's the thing: vanilla compaction is served by a server API endpoint... So it is not done by the resident agent... So obviously the server side has no clue what your agent was actually doing
It's unfortunate I don't think the desktop app is open source. Maybe it's possible to hack ha ha
So you solve this by having your own resident agent fill up a JSON with what he was doing
Web app still uses codex cli
And that overrides the default compaction
And instead of getting some useless summary
Your agent gets what matters
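For anyone following along, the idea described above (the resident agent writes down what it was doing, and that note is what survives compaction) could be sketched roughly like this. This is only an illustration under assumptions: `TaskState`, `save_state`, `load_state`, `resume_prompt` and the file name are all hypothetical names, not part of codex cli, and how you would actually hook this into a fork is not shown here.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class TaskState:
    """Hypothetical schema: what the agent itself says it was doing."""
    goal: str
    done: list[str] = field(default_factory=list)
    in_progress: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)


def save_state(state: TaskState, path: Path) -> None:
    """Write the agent's own checkpoint to disk as JSON."""
    path.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")


def load_state(path: Path) -> TaskState:
    """Read the checkpoint back after the context has been compacted."""
    return TaskState(**json.loads(path.read_text(encoding="utf-8")))


def resume_prompt(state: TaskState) -> str:
    """Turn the checkpoint into plain text to put at the top of the new context."""
    lines = [f"Goal: {state.goal}"]
    sections = (
        ("Done", state.done),
        ("In progress", state.in_progress),
        ("Next", state.next_steps),
        ("Files touched", state.files_touched),
    )
    for title, items in sections:
        if items:  # skip empty sections so the prompt stays short
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example data, only to show the round trip.
    state = TaskState(
        goal="Migrate build scripts",
        done=["inventory existing scripts"],
        in_progress=["rewrite deploy script"],
        next_steps=["run smoke test"],
        files_touched=["scripts/deploy.sh"],
    )
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "agent_state.json"
        save_state(state, checkpoint)
        print(resume_prompt(load_state(checkpoint)))
```

The point of keeping it as structured JSON rather than a free-form summary is that the half-finished items stay explicit, so the agent does not have to rediscover them after compaction.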
Given that you tweaked this, do you know/remember what the standard compaction rule(s) are?
I assumed it was trying to be smart at least
Because it doesn't just happen at some fixed point it's somewhat variable
i'm using a custom fork of codex, ofc i have the exact diff 🙂
here's an example
notice how my codex shows 0 confusion right after compaction
he knows exactly what he was doing, and what it needs to do
Tbh mine is usually like that, it was just a particularly egregious example I shared, but it does that sometimes
yea vanilla one can work fine too, really depends on task and depth
if it catches the model at a clear checkpoint it's better
i just hated the many times it had to start digging again through stuff that was already covered
hello, what's up with the codex app for macos? mine keeps crashing silently 12 times an hour, was this vibecoded by a 12 years old?
you can bet that 12 yo is gonna be smarter than us lol
Btw anyone notice quota draining really fast today?
I was running at my usual pace and I never run out of the 5h limit, and now after about an hour and a half it was suddenly gone
The general point stands though. Everything outside the model itself is tweakable
That is why they call it "harness"
Because it's your job as the driver to properly harness the horse
How did you modify the codex backend of the closed source app?
And that's why codex cli, which is the default harness, is open source
Only the front-end is closed source app
It still calls a codex cli module that is on your computer
But I didn't even try that. I keep the app vanilla, I use my custom harness on VScode
Maybe I mistook your screenshot
As a rule of thumb the app might have some parts that are closed source, but the agentic harness that is provided by codex cli open source code is used by the app too
Yea it's from the VScode plugin
It looks visually similar
Myself I just hate cli I don't like to feel like coding in 1970
I presume the harness even if open source was embedded at build time in the app
The app is pretty unpolished tbh, lots of janky UI. Vibe coded for sure
If you're on mac maybe
I'm not a mac user
I'm on Windows/Linux. There any app is just a folder with some files, and on WSL it just uses some codex cli build as part of the backend
But again, I didn't bother with that. I leave the app untouched
VScode plugin explicitly exposes custom codex cli executable path
so TL;DR, if you feel you're up to go DEVELOPER ONLY you can tweak, otherwise stay on the USER path 🙂
Nice. I don't think that setting is in the app.
Thanks for your ideas, definitely interesting
yea don't think it is either. app is user facing
cli and ide plugin are more dev oriented
I heard app is better at orchestrating parallel agents
Don't know if there's anything to that
over the last week 5.4 has been taking more and more shortcuts. It's starting to feel like i have to verify everything it says.
app is really just a UX surface that uses, as its engine, stuff you find in the codex open source code
that being said, UX might make it easier to orchestrate parallel agents because it might give you explicit flows for that
in the cli you don't get them fed to you as neatly, but the capability is the same
I haven't used the cli for a few months, but you can easily chat to the sub agents in the codex app
same in ide extension...there you just click the name of the agent and you go into its specific thread
I never used the ide extension at all, i went from cli to codex app, i should give the extension a go.
but how you design the logic of what sub agents do is up to you
it's a good middle ground to me. i like to have easy access to the file tree, to terminal etc
You know how occasionally people start reporting "5.4 has stupid-mode enabled today"? I'm willing to bet when OpenAI's servers are under heavy load they are quantizing the KV Caches to make extra room and that's what enables stupid-mode
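Whether anything like that actually happens server-side is pure speculation in this chat, but for context, here is a minimal sketch of what "quantizing" a cached vector means in general: store small integer codes plus one scale instead of full floats, which saves memory at the price of rounding error. The function names are hypothetical, and this is plain symmetric int8 quantization on a Python list, not any provider's real KV cache code.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: integer codes in [-127, 127] plus one float scale."""
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / 127 if peak > 0 else 1.0  # avoid dividing by zero for all-zero input
    return [round(v / scale) for v in values], scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def max_abs_error(original, restored):
    """Largest per-element difference introduced by the round trip."""
    return max((abs(a - b) for a, b in zip(original, restored)), default=0.0)


if __name__ == "__main__":
    # Hypothetical stand-in for one cached key/value vector.
    vector = [0.02, -1.5, 0.73, 3.1, -0.004]
    codes, scale = quantize_int8(vector)
    restored = dequantize_int8(codes, scale)
    print("codes:", codes)
    print("max abs error:", max_abs_error(vector, restored))
```

Each value ends up off by at most half a scale step, so small entries next to a large outlier lose the most relative precision. That is the kind of degradation the theory above is pointing at, if it were true.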
something is off
I used to use Codex Web all the time, pulling and checking changes made by the assistant before starting another round. Then I loaded the Codex extension in VSCode and I haven't used the web tooling once since then. With the extension we need to load the CLI, so with the occasional use of the CLI, I feel I have everything I want for now. I'm holding off on the app until it stabilizes. YMMV
My recommendation: Try the VSCode extension. All of these interfaces are used differently. Once you get familiar with each it turns out that it's not a matter of "which one is best" or "which one do I like to use" but about which one to reach for in specific circumstances.
If I connect an MCP server to ChatGPT, can Codex also use it?
They are under heavy enough load to kill an entire product venture to reinvest into Codex.
I heard about that this morning! You're probably right, Codex needs that extra compute. Generating videos of chiropractors throwing old ladies through the sheet rock is a side quest
Did they already remove search from the Codex app? I had it earlier today and after the latest update it's gone
these bots are outsmarting me. over and over now I put them on a mission to fix a bug. and im like, dude, fix it, y u so dumb. its right there.
only to have it tell me that actually, I was incorrectly configuring things, and it was all my own fault, and I had to wait for the ai to figure this out and tell me. hey human, y u so dumb.
we are in for one weird future..
ineedmorequota 😔
mepoor
Just wait until ai is convincing ppl to buy stuff
isn't that what the ads are for?
I would agree that ads are for selling stuff to ppl, but it's not ai subtly leading people into purchases through profiling and gaslighting