#codex-discussions
Oh I don't doubt it's a skill issue. I've tried to find models that can help, but so far I've found none that do tool use very well.
GPT has been uncommonly little help
I don't think you can find such small models that are so general, I think you want to look for models for a specific use case
You mean like "Unity programming" or, just like "C# programming"?
qwen just released a dense 27b model that does pretty good. But being dense probably still needs some decent hardware
DeepSeek V4 is out? and I mean programming specific models
i think running a 1.6 trillion param model might need a little more than a mac mini
There's smaller versions
I'm currently limited to 16GB VRAM
what is the smallest one?
Once not long ago, that was respectable.
flash is 284b o.0
Still maybe cluster is not ideal, would work but would be slow
@bright swift I wonder, how did you automate your task?
did you use scheduled automation-feature of Codex?
what do you mean? i dont have anything automated right now. i only use the cli, not the app. usually a few parallel sessions right now with lots of hook-based enforcements. so tasks usually run 1-3h, sometimes longer.
oh so you keep typing into the console ?
I thought you were using some sort of automated agent
yea i did dictation for a bit but prefer typing
one of the tools im building is for better planning/spec workflows so i can parallelize more
bottleneck right now is not implementation but detailed specs really
defining edge cases etc
You can use an orchestration loop to push through spec writing a bit quicker
its more about business logic decisions. sure the models can write a spec fast, but i need to make sure that the details are correct
Yeah that's what im talking about
This tactic only really works on brown field though
Funny. I was just thinking about getting my fields plowed.
basically i'm trying to do something similar to what you are suggesting i think. trying to build a process where the agent makes "better default decisions"
yo can someone tell me what it does? the auto review thingie? like I don't understand, cause it reviews the code and edits it and idk what it would change tho
You make a subagent that reads your rough spec and finds ambiguous details or gaps. Then it searches the code base and attempts to fill those gaps. If it finds a definitive answer, it fills out the spec to close the gap. If it doesn't, it records open questions with suggestions based on what it found.
Then you rinse and repeat, but each time the agent starts it checks and tries to answer open questions first.
What happens is you go from a small spec with a rough idea to a large detailed spec quickly.
Each iteration, because the spec got filled in a little more, some of the open questions become clearer and can be answered.
You can also insert yourself in the loop and have it present open questions to you each iteration.
You go from rough outline to detailed spec much faster.
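A minimal sketch of that loop, assuming the subagent and the code base search are plain callables. `Spec`, `refine`, `find_gaps` and `lookup` are made-up names for illustration, not part of Codex or any tool mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """A rough spec: settled details plus questions still open."""
    decisions: dict[str, str] = field(default_factory=dict)
    open_questions: dict[str, str] = field(default_factory=dict)  # question -> suggestion


def refine(spec, find_gaps, lookup, max_rounds=5):
    """Repeat the pass until a round changes nothing.

    find_gaps(spec) -> list of ambiguous points (hypothetical subagent call)
    lookup(question, spec) -> definitive answer from the code base, or None
    """
    for _ in range(max_rounds):
        changed = False
        # Each round starts by retrying the questions left open last time.
        pending = list(spec.open_questions) + [
            q for q in find_gaps(spec)
            if q not in spec.open_questions and q not in spec.decisions
        ]
        for question in pending:
            answer = lookup(question, spec)
            if answer is not None:
                spec.decisions[question] = answer
                spec.open_questions.pop(question, None)
                changed = True
            elif question not in spec.open_questions:
                spec.open_questions[question] = "needs a human decision"
                changed = True
        if not changed:
            break
    return spec


def find_gaps(spec):
    # Stand-in for the subagent that reads the spec and lists ambiguities.
    return ["which ORM?", "which database?", "retry policy?"]


def lookup(question, spec):
    # Stand-in for the code base search; the ORM answer only becomes
    # clear once the database has been decided.
    if question == "which database?":
        return "postgres"
    if question == "which ORM?" and "which database?" in spec.decisions:
        return "sqlalchemy"
    return None


spec = refine(Spec(), find_gaps, lookup)
print(spec.decisions)
print(spec.open_questions)
```

In this toy run the ORM question stays open in the first round and gets answered in the second, once the database decision is in the spec. The retry policy never resolves, which is the point where a human would step into the loop.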
Yea similar approach. This is good if you already have (decent) code. For new stuff i'm trying to make it fill in the question marks with research from primary sources etc
Glad to see people with decent workflows.
Throughout my career I've met so so many who have no idea what they're doing
Even if they are excellent coders
This grill-me skill that was/is trending is surprisingly good for nailing down details. Very simple prompt basically. But it can get so detailed that i often find myself annoyed because codex is like "ok, next, question 59: bla blah 3 paragraphs"
Finally got x20 
Most of the time the suggested answer is good, but every now and then i need to clarify sth and that makes the difference
Hey everyone. So what are the impressions on gpt 5.5?
I had to sleep and I barely gave a couple of test tasks
I'm still on 5.4 on the web. Only 5.5 in mobile. :/
Its less corporate in communication style, havent noticed big behavior changes yet, token usage relatively similar so far
haven't tried 5.5
the image gen however is pretty good
I generated some virtual wifu and husbandos
Imagegen yea is great
You can get pretty simple models to do that though
Was curious whether anyone did some A/B in codex for 5.5 though
how do i get 200$ asap
money
ran out of usage on the plus plan
go to your chat gpt account on webpage and upgrade it to 200
do you have a casino nearby?
Getting money is easy. Getting money ethically is harder.
I disagree, getting money not-ethically is hard too
I'm broke, but can think of a thousand unethical ways to make money without breaking a sweat.
if it were easy, I would have some not-ethical money
lol
arguments such as "you kill someone then get their money" are not applicable because you risk breaking society's rules and being sent to jail
welcome to the light side
that's part of the whole unethical thing. It carries risks. shrugs
the financial translation of the consequence is negative then
you lost money in the end
I don't stick to ethical stuff due to risks though. I like to help people, not hurt them.
i used to do that, now i'm flexible
I thought we are talking about how easy can you make money unethically
my argument is that, punishment for breaking law is financially negative, therefore it is hard to make money even unethically
I guess I didn't see it from that angle. I just avoid it anyway.
yeah, what I wanted to say was, making money unethically is hard too
unethical != breaking the law
so therefore, if you consider the negative financial outcome of unethical method, your previous statements on making money by breaking law should be reconsidered
He's right though: The cost of breaking laws does make unethical moneymaking harder. Not necessarily hard, but harder.
I'm glad most AI don't help with unethical stuff easily.
i think i miss gpt4.1
often following a law is a physically beneficial choice
Thats why prompt engineering was invented hehe
breaking a law is not necessarily prevented by ethical feelings but more of calculated outcomes
I bet Codex is used mostly for legal unethical stuff actually.
this is necessary because we have psychopaths in our society and they simply want to benefit without regard of others, this type of people are pretty common (by that I mean, they are more common than you might think)
Good vibe ethics chat :)))
Like attention-farming on fb
Is it possible to use 400k context in Codex app with 5.5? It shows 256k for me
The definition of what is ethical and not can vary a lot from individual to individual, culture to culture and so on.
Law has a much clearer definition in comparison and is in effect within its jurisdiction
Venn diagrams and all that
You can use up to 1 mln context since 5.4
You have to add it in settings. Cost is also different above 256k.
Could say that, even though python also starts with venn diagrams and then "all the rest"
wait u can?? hows the performance?
How do you mean?
yeah for some, crossing red-light on road is ok
That's in the intersection of illegal and unethical though
laws are suggestions
Guys where do I find the browser use plugin? Cant use it somehow
what I was trying to say is, for some crossing red-light is an ethical decision
it is ok to cross redlight for some
I don't see it in Codex app. I was using another harness, and I think it had 400k. But I'm used to 1M, don't know how to enable it
Ask Codex to open a page
That basic logic, of which Venn diagrams are like part of the 101, is the base of all advanced cognitive stuff, from programming to ethical modeling
Ask Codex, you can add the setting in the toml. Tell it to also give you the link to where it is documented on the OAI website, including price
I see. I don't have any official schooling, so I didn't know it was part of that. I just saw it as "ethics and law somewhat overlapping"
They overlap quite a lot. The purpose of law in a decent society is to make explicit and enforce the ethical norms of the community
thats a very idealistic take
That's why I said purpose, doesn't mean it's 100% like that in practice
yea ideally they would do that, in practice in the modern world i'd say laws are mostly in the interest of elites, not any community or citizens
All this goes all the way back to Plato, and older, with the question of whether something is right because the law says so, or the law is good insofar as it captures what is ethically right
My wife is a prison guard. She tells me it is almost universal that the prisoners feel like society is unfair towards them.
which is why i say laws are only suggestions
criminal law is an exception
i'm talking mostly about business world
How do you mean suggestions?
Laws have causal consequences. They might be badly made but they still hurt when you fall under their hard lane
well first of all, just because its a law it doesnt mean its correct. laws get overturned or changed all the time in court
And that applies also for civil and commercial law, not just criminal, since you can end up having to pay a lot
International law is not full law yet. On that we agree, it's still mostly recommendation
As global society we don't have yet global mechanisms that actually enforce downstream
why should i as an entrepreneur follow every single tiny commercial law, when the state refuses to do its part of the social contract and does not prosecute violent criminals, illegal immigration, corruption etc
there is always reason to not follow law X for reason Y
following a law is an option
you simply face a consequence or not
Yup
And obviously you can assess the probability of facing those consequences too
That's what good lawyers do, that's how they find loopholes
Under-specified cases in law etc
And the really expensive lawyers even use AI to invent cases now haha
And then if the judge finds out it's risky cause he might get annoyed
Pretty sure for lawyers it can have more serious consequences to have AI hallucinations in their work than for us vibe coding folks
Actually that's the usecase from which I started my current project, full formalization using hybrid modern systems of legal corpora
So you have a system that can find contradictions, ambiguities in existing laws
That is super interesting! Do you publish details about the process?
Once I get there. The project is public but I'm still building the pipeline to actually be able to do that in an actual sound manner
Can you PM me a link?
Repo is linked in my discord profile
So basically it finds and enumerates ambiguity?
Eventually it would map all the inferential dependencies
Legal is a semi-formal NL domain
That sounds crazy difficult. I am totally following this one.
NL? Neurolinguistic?
Natural language
I see.
Before now I considered legal a very strict domain.
But thinking about it, of course it isn't.
Normally NL is underspecified relative to the concepts being expressed, as in the relation between terms and concepts is loose
Yes, it's the most strict of NL
But still semi formal relative to truly formal domains like math
One would think it should be as formal as math.
But there are specialized logics to process normative bodies, the name of the family is Deontic logics
that would defeat the purpose as you wouldnt have any more loopholes
Exactly, and that's one of my long-pending projects, about 20 years now
But before current ai systems, it was practically impossible to do it
Because of the scale of it
And like you would have had to hire armies of humans to do the translation
And humans are error prone anyway
I'm reading up on Deontic logics as we speak.
Legacy formal systems are interesting in themselves, because there's a lot of them and there were always debates and paradoxes. Like the larger community never settled upon a canonical one
But yes legal and scientific methodology are formalizable with modern tools
Take academic artefacts, like papers, ideally they should all follow the methodology so their results are machine checkable
Not just be a bunch of natural text
And rhetoric heavy
Deontic logics sounds... messy
Like what if you have opposing obligations?
Don't answer that though. Still reading.
The legacy ones are like from the 60-70's
They were over optimistic
Then they hit their head into the same questions like the one you just put
And started to make refined models
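The opposing-obligations question can be made concrete with a toy check. This is only a sketch under strong assumptions: norms are flat (modality, action) pairs with no conditions or priorities, and the statute labels are made up. Real deontic systems are far richer than this.

```python
from itertools import combinations
from typing import NamedTuple


class Norm(NamedTuple):
    source: str      # where the rule comes from (made-up labels here)
    modality: str    # "obligatory", "forbidden" or "permitted"
    action: str


# Pairs of modalities that cannot both hold for the same action.
CLASHES = {
    frozenset({"obligatory", "forbidden"}),
    frozenset({"forbidden", "permitted"}),
}


def find_conflicts(norms):
    """Return pairs of norms that put clashing modalities on one action."""
    return [
        (a, b)
        for a, b in combinations(norms, 2)
        if a.action == b.action
        and frozenset({a.modality, b.modality}) in CLASHES
    ]


norms = [
    Norm("statute A", "obligatory", "disclose the record"),
    Norm("statute B", "forbidden", "disclose the record"),
    Norm("statute C", "permitted", "archive the record"),
]
conflicts = find_conflicts(norms)
for a, b in conflicts:
    print(f"{a.source} vs {b.source}: {a.action}")
```

Finding the clash is the easy part. Deciding which norm wins (the later one, the more specific one, the higher-ranked one) is where the refined models come in.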
If/else trees?
Well all logic is only if/else trees. But for a formal system you have to go upward at 2nd level and prove a model about how you make the if/else trees
Then you hit the Gödel issue
A depends on B depends on A?
Which proved that in any consistent formal system expressive enough to encode arithmetic there are always some statements that are true, but can't be proved inside that system
Gödel is a result mostly about semantics outrunning syntax, if you want
You can have things that are semantically true, yet you can't prove them syntactically at that level
So you end up going one level higher, make a richer syntax to prove that truth
And this is an open ended ladder
That feels like it can go on for a bit
Yes it's open
Sky has no upper ceiling from down here
Has there ever been attempts at creating a non ambiguous language?
Programming languages are probably the most successful attempt to do that
Compilers don't accept ambiguity well
Lol
i'm tired boss
huh. interesting point.
How many questions will it give you? Till you say stop?
idk, usually it stops when everything is resolved but this is a heavy session
I asked it to ask me questions until it had enough information to make a good decision. It didn't stop. I believe the attention system makes it keep on forever.
Yes because as opposed to math or logic, in programming language you also specify a base ontology
And if the base ontology is clear, then it's easy to map all the combinations among those primitive types
So far my experience with 5.5 medium and high seems to indicate that the model handles complex design more aptly. At least on the scope of my project, a finance project in the 500k lines of code range, the design and analysis sessions I carry out with Codex are more fruitful. Before, my feeling was that it tended to get lost in small implementation minutiae. I had to undo and redirect the effort in all sorts of ways. Now it gives me the impression of being able to handle more complexity and keep architectural details more clearly in view. It's all subjective at the moment and I have no KPIs to measure it yet
Harder and harder, the less machine close the language is
Right
Thanks for the info
Thanks for the info.
Hi! Everyone?
Nice assessment!
Hi
How's it going?
Now think of IR, but transposed: it sits between the messy untyped natural language humans use and a typed language onto which that input is mapped to clarify meaning.
You get the concept of Semantic IR
IR?
You want your role-bounded agents to work on tasks after they've been normalized through the Semantic IR layer
Ah.
You want the same thing between the human messy prompt and the final semantic compiler step the agent does
I feel inadequate for the intellectual requirements of this subject, but I still find it immensely interesting
Right now all those steps are fused. The agent does all together. It has to disambiguate your prompt, then implement the task
5.5 seems to be a lot less patient with subagents. its now spying on them via git logs in subagent worktrees like "what they doing ova there"
Separations like between plan step and implementation you can think of it as early steps in this direction
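One way to picture the Semantic IR idea is a typed record that a prompt must be normalized into before any agent starts implementing. This is a hedged sketch: `TaskIR`, its fields and `dispatch` are hypothetical names chosen for illustration, and the normalization step itself (prompt to record) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskIR:
    """Hypothetical typed form of a prompt; field names are illustrative."""
    action: Optional[str] = None        # e.g. "refactor", "add", "fix"
    target: Optional[str] = None        # file, module or feature
    acceptance: Optional[str] = None    # how we know it is done


def unresolved(ir):
    """List the fields the normalization step could not fill in."""
    return [name for name, value in vars(ir).items() if value is None]


def dispatch(ir):
    """Hand the task to an implementing agent only once nothing is ambiguous."""
    missing = unresolved(ir)
    if missing:
        return "ask the human about: " + ", ".join(missing)
    return f"implement: {ir.action} {ir.target} (done when {ir.acceptance})"


# "make the login thing better" normalizes to a mostly empty IR.
vague = TaskIR(target="login")
clear = TaskIR(action="fix", target="login redirect", acceptance="tests pass")

print(dispatch(vague))
print(dispatch(clear))
```

The disambiguation and the implementation are now two separate steps, so an underspecified prompt is caught at the type level instead of being silently guessed at during coding.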
Give him "patient orchestrator" directive
Sounds sensible. Keep track on them so they don't spiral.
orchestrator already has "let them cook" instructions
Its default orchestrator style is a very top-down master-slave vision
That's quite loose, letting someone cook while wanting it to be done in 5 seconds is still coherent
To me it feels agents really need a better grasp of time
I wonder how useful it would be to have an ai that constantly keeps track of what you are saying, and feeds back to you the ambiguity of it
Not metaphysical but actual time, as in how long some processes take
Do they have ANY?
i think it would have negative effects, like claude code got worse when it became aware of its context limits and status
They added that in codex app already, with the guardian mode or whatever its called
Or autoreview or whatever
Claude has different training. Personally I consider the path they went a dead end for my needs
Claude has anxiety trained into it as prime objective it seems
Yup
Which is very unproductive for depth
But very productive to fake you shallow success
Yea became unusable for me, i liked it for non-code general purpose stuff before
From my tests if you tell codex a 100 step sequence he must do
He will do it
If you give it to Claude, it browses it, jumps on its own to the conclusion and writes some forgery of how it followed all 100 steps
Even though the logs show it never did it
And if you push back, it says oh sorry
And does a bit more on 2nd attempt.
apparently with the newest versions it says stuff like "this is too much work for one context window" and doesnt even start, but i dont know if those were just ragebait screenshots
Are there Chinese people here?
It's plausible, we know they are compute constrained
So they have to limit the actual inference budget
I use a third-party API. How do I use GPT-5.5? I can't see GPT-5.5 in the Codex model list
Yea
OK, I'm trying
Not on api yet I think
Others have used gpt5.5 in codex with a third-party API
me with CGPT Pro 20x sub using it natively:
Oh I have updated codex cli, Can't find gpt-5.5 in codex model list
I use it on codex app/cli
No experience with 3rd party api
What third-party API do you use?
Cool, I'm a pauper, can't buy pro
I've seen several references on X and youtube that the intended baseline reasoning with 5.5 is medium. Can anyone verify that coding output with 5.5 medium will still be better than 5.4 at high?
Just do your own A/B
Give them the same task in your codebase
sensoft.top, this third-party API
a question is much more token efficient...
I only use xhigh, it does dumb stuff on lower ones sometimes
Yea but it's not about your actual project
And less likely to give useful data.
You will get hallucinations from other users relative to what works best for you
Where do you come from? It's my first time here.
I mean it's cool to exchange feedback
But you want your own baseline first
The first thing I do when a new model is released is to A/B it
Start with medium, go up if it makes mistakes or struggles with your project
And analyze the differences
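The "start with medium, go up if it struggles" routine can be written down as a tiny harness. This is a sketch with stand-in runners: in practice each runner would invoke the agent on your own repository at one reasoning level, and `check` would be your own pass/fail criterion. Nothing here calls a real Codex API.

```python
def ab_test(task, runners, check):
    """Run one task through every configuration and record pass/fail."""
    return {name: check(run(task)) for name, run in runners.items()}


def cheapest_passing(results, order):
    """Walk the levels from cheapest up and stop at the first that passes."""
    for name in order:
        if results.get(name):
            return name
    return None


# Stand-in runners; the outputs are invented for the example.
runners = {
    "medium": lambda task: "partial",
    "high": lambda task: "complete",
    "xhigh": lambda task: "complete",
}
results = ab_test(
    "split the parser module",
    runners,
    check=lambda out: out == "complete",
)
choice = cheapest_passing(results, ["medium", "high", "xhigh"])
print(results, choice)
```

Keeping the task fixed and varying only the level is what makes the comparison a baseline for your project instead of someone else's anecdote.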
Agree with that
I mostly use 5.4-mini, and go up when I need to. Which is rare.
yeah, I have always defaulted to high, just using xhigh for repo refactors or checks. 5.5 has been super solid on high, but I've already burned 20% of my 200 plan since 5.5 dropped, so trying to get the hive mind consensus re: dropping back to medium.
For writing concrete code I assume?
Just buy more pro plans
If you already have used high, do a test drop to medium
And see if you feel any difference
For example I did an open ended test on the broad repo architecture
Between high and xhigh
High defaulted to a more bounded operational interpretation of my task
Xhigh took it as I told it
Like high had the task to analyze 60 modules and it just skipped first 30 cause they were history
Xhigh analyzed all 60
But that's on a high level open ended task across a big repo
It depends on what you're working on too. Some webapp, medium probably fine. I work on a custom ml pipeline in rust, so i go with xhigh
Yup. It really is dependent on that the most
The model itself is just half of the equation
thx for feedback!
Just give this to Codex and it won't
After you test medium and compare with high, please let us know your impressions
just fired off a medium prompt! (given how solid 5.4 was for me, if 5.5 medium can compete with 5.4 high, I will be happy!)
Btw i think that gptpro 5.4 on steroids was really 5.5
Cause yday after they released 5.5 I gave gpt pro 5.4 a review task, and it got back to taking 1 hour, whereas the steroids version finished in max 10 mins
I'll have to test 5.5 but I hope it sticks to the 10 mins I had for the past days on 5.4
On the usage: I am only using 5.5 High on analysis, design and architectural discussions. Found discussions very insightful and results useful for me. As far as the cost I don't see huge drops in my count. But for sure I would not use at the moment 5.5 for coding. It seems to me that 5.4 medium was doing just fine as long as I tell it to batch operations properly
i mean some people still swear by 5.3-codex for implementation
On specific pre 5.5 debugging tasks, 5.3 was still better than 5.4 for me.
Like 5.4 kept going in circles why a panel it made doesn't actually show up
And 5.3 solved the problem at first attempt.
My issue with 5.4 medium was that the model would seem to stop sooner than it should have with a multi-step ask even when I had a very specific scaffold for the work.
Yea i saw that too with lower reasoning mode
Maybe it's made to protect quota for $20 peeps
xhigh just works for hours and hours for me
You want long term stuff use xhigh
and does everything i ask for
I feel that's how it should be tbh
If you want multi hour long tasks that's obviously an "XLong task"
But would be nice to have a different slider for time horizon separate from reasoning effort
So you can choose between shallow but long horizon
And deep but long horizon
i also had issues with that with 5.5 lol, only thing it had to do is split functions
Other model did it instantly, gpt 5.5 wouldnt
Interesting
When is the limit reset coming?
5.4 and 5.5 being more like poly-reasoners, they might need a different approach
5.3 codex is specialized on coding
So that might buy it a big advantage in focusing on the proper frame
5.3 codex medium is the sweet spot
For actually coding implementation it might as well be
So far I can say 5.5 agents are definitely much better at orchestrating workers during long running tasks. It finally learned what patience is.
right. it's great at refactoring and one shotting features
For architecturing how intent should be made in code, I always prefer 5.4 and probably now 5.5
But once the implementation spec is ready, codex 5.3 might be the sweet spot
Do you guys get decent architecture from specs? I need to do some manual preparation first to get it working properly.
I've had good luck delegating sub-agent tasks to leverage that separate codex 5.3 sparc token bucket
I think they lie about the deepseek v4 cost like most chinese models. I just tried it briefly in opencode and it's already used $2.95... in less than 1h. yes it's a wide task, but this would be less than 1% of my gpt weekly
or their caching sucks
I do the first high level pass myself with chatgpt thinking coming up with objections. And once I like that. I usually tell gpt to craft a prompt for gptpro to add depth. I do usually a refinement or 2 in the same way. Then I move those to support docs and switch to codex where I use 3 layers of concretization, the last one being the actual implementation spec for a specific feat
That's a lot, $3 per hour. What's their supposed cost for 1 mln tokens?
Would be interesting to test the same task and how many tokens gpt 5.5 uses on it
It's ok don't bother. I can search later
I assume it aligns with the 8 million it used for you
its weird b/c I found also k2.5 was super expensive. not sure if its a problem using with opencode
maybe its super inefficient
So it should be like 40 cents for 1 million
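The 40 cent estimate follows from the two numbers mentioned in the chat, $2.95 spent and roughly 8 million tokens used. A quick check, assuming both figures refer to the same session:

```python
spend_usd = 2.95          # reported spend for the session
tokens_used = 8_000_000   # token count mentioned in the chat

usd_per_million = spend_usd / (tokens_used / 1_000_000)
print(f"${usd_per_million:.2f} per million tokens")
```

That lands at about 37 cents per million, a blended figure across input, output and cached tokens, so it says nothing about how the bill splits between them.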
save me...
Their cot sucks
I am testing them with simple but tricky logical puzzles and the cot of the Chinese models is still vanilla
And it runs for ages
have you used anything other than opencode?
Token cost in itself does not tell much if most of those tokens are junk
I did not even bother trying them on api, they fail at the in vitro logical test :=)
That is very similar to my approach.
they dont have any coding plans do they? its just api right?
here it is: https://api-docs.deepseek.com/quick_start/pricing/
I'm just saying that what you say makes sense to me. If they need 20 times more tokens than gpt for the same task, even if on paper they are 10 times less expensive per token, you pay twice as much for the same real task
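The "pay twice" conclusion is just the product of the two ratios. A small sketch, using the hypothetical 20x and 10x figures from the message rather than measured values:

```python
def relative_cost(token_ratio, price_ratio):
    """Cost of model B relative to model A for the same task.

    token_ratio: tokens B needs / tokens A needs
    price_ratio: B's price per token / A's price per token
    """
    return token_ratio * price_ratio


ratio = relative_cost(token_ratio=20, price_ratio=1 / 10)
print(f"model B costs {ratio:.1f}x as much per task")
```

Per-token price only decides the comparison when both models need a similar number of tokens for the task, which is why measuring the token ratio on a fixed task matters.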
Think so
Yea but if you paid $3 for something that would have taken 1% of your weekly quota with codex....
are u guys complaining abt token cost
they must have been comparing api to api not against gpt coding plan also
that would make more sense
Nah
if u dont make use of like normal chat and video generation, u can get 2x the amount of codex (in return for those other features)
so it's still way way way more expensive than the gpt coding plan for less capability... only makes sense if you need api
Yea that's why I'm saying testing the same task in codex via api would show exactly the token ratio used for the same fixed task
Or can it be counted via sub plan too?
Those models are probably good for tasks like this though
I think there might be some tools
loll
quality
dont leak the sauce
@boreal holly dont forget the agentic qa pass on the result!
Adversarial capitalization review
That's some new offer or what?
you mean coding in 2021? copilot came to be in 2022 around masses
prob better to use a team of agents and use consensus review
How you get 2x codex if you don't use normal chatgpt chat?
@tawny turret the above is for you
Don't forget xhigh /fast --yolo, because you want the capitalization to be as close to 100% accurate as possible but not take too much time
don't forget "dangerously skip permissions"
Through shady resellers
uh idk if i can share it yet, apparently they're still in the "testing" early access phase. i'm lucky to be a tester, but basically it's a company that offers twice the amount of codex limits for the same price as codex plans, but no access to regular chatgpt web stuff
resell? how are they reselling it ? in a loss? they said they have a partnership
wrong* they allow u to buy codex only, 50% off the regular chatgpt plans but u get codex access only*
useful for ppl like me who only use ai for coding and skidding
"I need to fan out subagents, one to check for capitalized Kanji, one for Cyrillic, another for Greek"
I assume they use the web interface for distilling and sell the codex usage to soften the financial cost
One for runes, as well.
But idk
no it's 100% codex, i've been using it since 2 days ago and I have gpt pro from my company as well
Sometimes I do not understand LLM
It often infers answers based on almost decade-old specs
Like, ask it this:
Is it allowed to wrap dt in a dl with div?
It will consistently tell you no - ask it the same with "in 2026" appended, which will trigger a websearch, and of course the answer will be yes - it's been allowed for basically 10 years now
Asking it why - it says "lots of training data still refers to the old spec"
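For reference, the markup in question looks like the `MARKUP` string below: each dt/dd group sits inside a div that is a direct child of the dl, which the current HTML standard's content model for dl allows. The Python around it only walks the structure with the standard library parser to show which element contains each dt and dd. It does not validate anything against the spec.

```python
from html.parser import HTMLParser

MARKUP = """
<dl>
  <div><dt>Name</dt><dd>Codex</dd></div>
  <div><dt>Kind</dt><dd>Coding agent</dd></div>
</dl>
"""


class GroupParents(HTMLParser):
    """Record which element directly contains each dt/dd."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.parents = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self.parents.append(self.stack[-1] if self.stack else None)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


parser = GroupParents()
parser.feed(MARKUP)
print(parser.parents)
```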
can you send me the link? would be quite useful for me I only use claude for chat
I guess it is a conundrum - do not train in old data, it cant deal with old data... but at the cost of losing new data? Sounds like a bad move to me
https://openai.com/index/codex-flexible-pricing-for-teams/
I guess there are codex only plans now, didnt know
tester only rn, but I'll send u a dm once they launch public, probably tomorrow
I guess they are exploiting the signup bonuses with multiple accounts etc "To support your adoption, eligible ChatGPT Business workspaces can receive $100 in credits for each new Codex-only team member that joins and starts using Codex, up to $500 per team, for a limited time. To activate the offer, add Codex-only seats to your workspace or create a new ChatGPT Business workspace."
no lol thats something entirely different, thats api usage, like 20x more expensive than what i have XD
does sound shady to me, and probably against oai terms
i mean sure we all know there is grey digital market
yeah lol but they said they have a partnership, tbh I hope it's all legit, would be very nice to not pay for voice mode and roleplay webchat when i dont use it
My guess its an exploit of the API credits i just posted
lol that would be crazy
But maybe there are some unreleased partnership for this type of stuff, who knows
or they resell legit codex usage to which they have access to
nah they wouldnt sell in a loss would they? i think legit partnership
there are loads of gpt users that don't use codex. many companies have enterprise plans but trust the admin to manage who uses them
lol
Yea possible, but still that wouldnt be an official partnership to resell pretty sure. And definitely not enterprises handing out login info
but for me it's 100% clear. there are no official codex offers than those offered directly by OAI
*better
i mean
it's just basic economic logic
Just saw some activity on the issue I posted: https://github.com/openai/codex/issues/18130#issue-4275895971
If this gets enough 👍, it would allow conversations to be cached for up to 24 hours instead of 10-15 mins for no additional cost. Just saying this is a pretty good feature to have
oai won't partner with some 3rd party to give you 2x cheaper codex
You can also buy "residential VPNs", what they dont tell you is that those are just malware infections
thumbed up it
uh those are called residential IP's and they are as legit as they get lol. many big official companies sell them. theres also car dealerships that scam ppl, doesnt mean its all a scam
Just because a company is big and official doesnt mean they dont do shady stuff
Nobody will sell you a codex subscription cheaper than OpenAI
One thing I noticed, with 5.4 I would frequently see "that giant patch failed because something changed, I'm gonna do multiple small, targeted patches instead." 5.5 does not have this issue. Instead it'll run nl -ba on a file beforehand and craft a perfect patch. Fascinating to see it meticulously avoid issues like that
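For anyone unfamiliar with it, `nl -ba` prints a file with every line numbered, blank lines included, which gives exact line references before a patch is written. A rough Python equivalent, assuming the default `nl` layout of a six-character right-aligned number followed by a tab:

```python
def number_all_lines(text):
    """Mimic `nl -ba`: number every line, blank ones included."""
    return "\n".join(
        f"{n:6d}\t{line}"
        for n, line in enumerate(text.splitlines(), start=1)
    )


sample = "def f():\n\n    return 1\n"
numbered = number_all_lines(sample)
print(numbered)
```

Without `-ba`, `nl` skips blank lines when numbering, so the numbers would drift away from the real line positions in the file.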
hi, is it possible to config browser use to be able to open other local ips?
For some reason mine doesn't allow any other than localhost
Now 5.5 arrived for me. π
No, go back to sleep
ohhhhhhhhhh TT
hello everyone, is there any promotion ongoing for a 1 month trial?
5.5 feels about the same, or possibly a bit faster than 5.4.
Hey folks, to save on usage should I plan with higher reasoning (xh) and implement with lower (med) or the other way around?
I'm testing this but not sure yet which one is working better
I would do the former. Best results are if you already know architecture, and only need to guide the AI to write documents for an architecture you have decided on.
Plan with Medium, execute with Medium. Maybe execute with high if it truly struggles with something, but otherwise start with medium.
I can't keep up with it. Well, ev3 in the model name is experimental variant number 3. but I don't have a clue what "flx" means. Flexible is the only thing which came to mind, possibly trained for long running agentic tasks or something
looks like we're getting a codex model variant soon though

You can get an architect to design your house and a high school kid to build it.
Or you can get a high school kid to design it and a certified contractor to build it.
Or just ask the kid to figure it all out and let you know when it's ready to move in.
Or you can work with the architect and contractor.
Your choice.
It's actually not that cut-n-dried. Quite often the task doesn't require an architect or a contractor. Pounding a nail doesn't require a contractor - ask the kid to do it. Drawing a picture of the house doesn't require an architect. Ask the kid to do it.
Use the right tool for the job. ™
So JaneBot did this for me yesterday. What else should we add to make it useful?
to be fair since these upgrades are incremental, much of what was already good should ideally remain the same, whereas the real improvement should come in edge cases
That seems to be the consensus in all comments. 5.5 isn't revolutionary, it's evolutionary.
I found a site that would help you pass your evals faster by giving you an algo design to pass evals, it gives you a significantly higher chance of passing evals and a great way to hedge on cheap prop firms like apex - https://propfirmpinescripts.com/
Yea, that makes sense
it's more token-efficient than 5.4, so it's probably running at about the same speed but completing tasks faster
pretty cool considering it does come with a price increase per-token
Is this the first time we dont get a reset with a model release?
Oh well...
I don't get the feeling it's completing faster, but looking at my usage stats it drains the usage slower
only 25% of my weekly pro 200 used in about 6h yay
if u ask to make a white man black, it will not add the word "black" in the prompt
you can also more safely lower the reasoning level to low/medium which is neat
I can't believe what 5.5 just did
It tilted the freaking image on its own, without being asked. Crazy.
When I got the "want to try 5.5?" question, it set it up for highest settings possible, including fast.
You know, so it will eat my usage as quickly as possible.
interesting. based on the leaderboard solely for this benchmark
in terms of performance gpt 5.4 xhigh = gpt 5.5 medium
which, combined with that cost table you posted, means what cost $2851 now costs $1199, which is quite big
also you get 2 extra points for $960 extra with gpt 5.5 high
and to squeeze out a 3rd extra point it costs another $1200 for 5.5 xhigh
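For anyone who wants that arithmetic spelled out, here's a rough sketch in Python. The dollar figures are just the ones quoted above for that one benchmark (not official pricing), the variable names are made up, and it assumes the "another 1200$" for xhigh comes on top of the high total.

```python
# Figures quoted in the chat for a single benchmark run (USD); not official pricing.
COST_54_XHIGH = 2851    # gpt 5.4 xhigh, said to score about the same as 5.5 medium
COST_55_MEDIUM = 1199   # gpt 5.5 medium
EXTRA_55_HIGH = 960     # extra spend for 5.5 high, buys about 2 more points
EXTRA_55_XHIGH = 1200   # further spend for 5.5 xhigh, buys about 1 more point

# Savings at equal score: 5.4 xhigh vs 5.5 medium.
savings = COST_54_XHIGH - COST_55_MEDIUM
savings_pct = round(100 * savings / COST_54_XHIGH, 1)

# Marginal cost of each extra benchmark point when stepping up reasoning effort.
cost_per_point_high = EXTRA_55_HIGH / 2
cost_per_point_xhigh = EXTRA_55_XHIGH / 1

# Total for 5.5 xhigh, assuming the increments stack (an assumption, see above).
total_55_xhigh = COST_55_MEDIUM + EXTRA_55_HIGH + EXTRA_55_XHIGH

print(f"saved ${savings} ({savings_pct}%) at equal score")
print(f"high: ${cost_per_point_high:.0f}/point, xhigh: ${cost_per_point_xhigh:.0f}/point")
print(f"5.5 xhigh total: ${total_55_xhigh}")
```

So the cheap win is the switch to 5.5 medium; each point after that gets noticeably more expensive.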
I wish I had enough business to warrant that sort of extravagance
well assumption is the ratios remain similar on lower tasks for peasants like us
Side note: Last warning from GitHub:
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.
who knows. maybe it will be codex 5.6
5.4 skipped codex
maybe 5.5 skips dedicated codex model too
it's unlikely, openai unified codex with 5.4, they're unlikely to go back on that now
what's that?
is it on this official discord?
or?
is it just me, or would it be nice if on this official oai discord we could trust that info relayed about oai models is actually legit?
This is just another public forum, it's not really "official".
do you think it will be a thing?
if yes then bro
I cant imagine
public refers to target audience
who are the creators of this forum?
i mean, was it created by 2 random dudes in a basement, or is it actually made by oai?
the space itself i mean
The closest thing to "official" is this: https://community.openai.com/
OpenAI is the creator. There's a verified checkmark here indicating its legitimacy
This server is staffed mostly by volunteers but there are many company staff here on their own time and they do occasionally view and rarely post here.
ok then that is "official" enough for me
it's a community space about oai products created by oai
Well, it was created by the company and they help to keep it a well-managed community resource, and of course they do publish announcements, pretty much anything else here is purely public commentary.
Hey, is anyone else seeing high resource usage from the Codex desktop app on Mac?
For me, Codex is using around 3 GB of memory, which is fine, but CPU is the issue. Activity Monitor shows total Codex CPU around 290-300%, with two Codex Helper Renderer processes each going over 100% CPU. Energy Impact is also around 1,100-1,300.
Memory pressure is normal and swap is 0, so it doesn't look like a RAM issue.
Can other Mac users share their CPU, memory, and Energy Impact numbers, and whether the app is also making your Mac heat up?
I'm on a MacBook Pro M4 Pro, though.
yea bro but there are hygiene rules to keep the community healthy, like you can't start using foul language
my point was that it would be nice to have higher hygiene
How would that be enforced?
For that, send a note to the Mod bot.
But they aren't gonna suppress community scuttlebutt.
for example, people coming around and shadily promoting dubious offers from some "3rd parties" that give you 2x codex and other stuff like this
i don't think it should be allowed
We want a healthy environment, not nannies.
you are WE?
Commit what work you have, quit the app, reopen and resume that thread. See if it returns to normal
I'm happy for @lean lark to include me in that we
"I" am "WE" .... community.
we are all part of that WE, that's the point
we are having a discussion inside that WE
lol
For this healthy community we have:
- Bots
- Mods
- Guides
- Each other
Would you like more?
Bots scrub for words and images. Outside of that kind of deterministic processing of data, we can only rely on one another ... this is a public resource, we need to pick up our own trash.
that's a long discussion probably. i was just expressing something that i think would be nice to have, on a very specific topic, like spam/scam related to Oai products
Yes, but to be able to shut down community hype around models / misinformation around models it would require a moderation team whose sole purpose/job is to monitor each channel. It's nearly impossible. Let's not forget that our moderators are community volunteers
I believe we would all agree that keeping the place clean is important but frankly I don't see anything here that needs cleaning up outside of peeps posting wants/desires and misinformation that the rest of us need to counter.
Yeah, staff here aren't paid.
(Former Guide here, been through the process)
I have a solution to the hygiene
It's not automatic but gets the job done
we are in the AI age though... there might be other solutions already haha
yea that's epistemic due diligence
There are actually very few official channels, just keep an eye on those. Anything that is not coming from the home page or a validated X account is subject to scrutiny.
( Hey JaneBot, keep an eye on official channels and let me know when OpenAI makes product announcements. )
GPT 55 is probably a decade away anyways, let alone GPT 55 codex. That's reaching wayyyy into the future 🤪
i've seen at least 3 shady promotions for some 3rd party that gets you cheaper codex, telling people to dm for the link
personally i think this shouldn't be floating around here
but i guess you guys are fine with this kind of scam
or consider there's nothing that can be done about it
Scams not being handled by platforms is one of the worst parts of this modern age.
WTH?!!? I just worked with Codex to modify code, did a commit and push, and in the GH repo it says I and Copilot committed.
100%
hmm, that's new and bad haha
If they committed and pushed with gh instead of git that might do that
So you're saying you read something on the internet and it wasn't true? I'm kinda doubting you bro....
anyone worked long enough with 5.5 so far to get a feel for how it compares in quota usage for the same types of tasks vs 5.4?
Slightly lower I would say
Although... I just reached the dreaded "You've hit your usage limit. To get more access now, send a request to your admin or try again at 10:19 PM."
lol let's get back to chill between us. we always end up at odds with each other haha. let's just close that topic.
imagine 5.5 pro heavy
Please, I'd suggest that the metric is quality vs cost. We're looking at that Triangle again. If you only look at tokens and quotas you're not focused on the value of the model. Though I understand that this is all a part of that cost/value determination process.
@unique spade that was just a funny comment, absolutely not a jab, please assume funny first and rarely jab, especially amongst friends here. Yeah, in plain text we do not get tone.
to build that triangle you still want each edge. was just curious about quota use for similar tasks on 5.4 vs 5.5
Oh, funny or stupid, more likely stupid... just sayin....
oai people still vagueposting on X so probably something else coming today
wen reset
what else did they post
that's such a tease
I thought I was tripping at first...
Oh, and because Copilot shows as co-authoring my repo, now commits are Unverified. Ugh, need to address this!
what's this
codex usage is a bit out of sync
results from testing 5.5 on 5*5 html blueprints
Notice the cards on the graph.. they move.
The prompt for these jots is something as simple as "create 5 different business personas and for each 5 different landing pages using distinct styles and layouts"
Compared to 5.4 and prior, this is truly pleasantly surprising every time I look at a new draft it made.
It comes up with craze, and I love craze.
nice
kewl
bruh thats nuts
like I mean not crazy but from what earlier models used to do is good
What is really nuts, but I am sure this is a system error, is that these 5*5 html dummies, with business personas, consumed 1 percent of my weekly pro sub (200 bucks one)
Oh I know! It's not just rounded rectangle cards inside cards with plain white backgrounds and default typography
Exactly, and when it does that (sometimes it still does), you tell it... and it actually changes things lol
And that is even more pleasing. I feel understood. I feel listened to. My wife should take notes.
it does seem to have a much better understanding of 2D space based on what I've seen
What it still can't do at all is manage subagents
But I gave up on that lol. Who needs it anyway? It's mainly cool to have but doesn't give much better results, just a perhaps slightly cooler experience... fine, I will not use them
(these 5*5 html pages are like the best ever use case for agents, and it flat out did not use a single one)
im going to cancel my claude sub for a second codex plus sub
which is 60$ cad sadly
but idc
I haven't used deepseek yet
seems super promising
what problems are you seeing with subagents? i'm currently experimenting with the multiagentv2 feature flag, seems to make things better
Someone mentioned earlier this morning setting up the orchestrator to "let em cook." 5.5 is really good at this! I feel like the official subagents feature, the main agent can see what happens between full turns. In any case I think if an orchestrator is set up only to judge the end result it keeps the role simple enough they never drift.
that was me
i give each subagent a fully set up isolated worktree though
and some wrapper commands for the orchestrator to manage adopting commits
Me too! I have a hook that creates the worktree and rewrites their CWD to it, and when they get archived it reaps their worktree & container stack. This is the way
the issue with multi agents v1 is that subagents can't respond when the orchestrator sends a question
but with v2 this seems to work better
and the orchestrator can get impatient if you don't tell him specifically to chill
UH, I had seen that a while ago in releases and they said this is not released (even if it was released, but whatever)
So how to use this?
That was my impression looking at the official subagents. When the orchestrator is stateless, they send out messages and end their turn, which I think is a stress reliever in its own right
yea exactly, that's the issue: it asks for a status update and the subagent either doesn't respond, which pisses the orchestrator off, or accidentally ends his turn in order to respond
[features]
multi_agent_v2 = true
So replace this?
[features]
multi_agent = true
or keep and add?
with v2 the subagents can send a non-final response back
uhm not sure probably both at same time will give config error
actually both yes
one important thing i added in my fork is to pass cwd to the subagent, so he starts in his worktree instead of the main checkout. important for my hooks setup
thumbs up welcome https://github.com/openai/codex/issues/18969
Wait, I'm lost. Have agents changed?
?
multi_agent is not valid anymore?
multi_agent enables the collaboration/subagent tools at all. multi_agent_v2 selects the v2 version of those tools. If multi_agent is off, v2 will not matter because the collab tools are not exposed.
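To make that concrete, here's a minimal sketch of the gating as described above. It's only a model of the described behaviour, not Codex's actual code; the function name `collab_tools_version` is made up, and treating a missing flag as off is an assumption.

```python
from typing import Optional

def collab_tools_version(features: dict) -> Optional[str]:
    """Which collab/subagent tool version is exposed for a given [features] table.

    Models the description above: multi_agent switches the tools on at all,
    multi_agent_v2 only picks the version once they are on.
    """
    if not features.get("multi_agent", False):
        # Tools are not exposed, so multi_agent_v2 has nothing to select.
        return None
    return "v2" if features.get("multi_agent_v2", False) else "v1"

print(collab_tools_version({"multi_agent": True, "multi_agent_v2": True}))   # v2
print(collab_tools_version({"multi_agent": True}))                           # v1
print(collab_tools_version({"multi_agent_v2": True}))                        # None
```

So on that reading it's "keep and add", not "replace".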
typical oai naming things
anybody else has problems with codex app, it's slow and constantly freezes and I have to wait till it unfreezes....
Strange, I have been using codex for 2 days, and I do not see traces of agents anymore in its usage, even though features.multi_agent is true
ideally you want the orchestrator to oversee the process, i agree
and to actively interfere little in the actual work of sub-agents
It only uses subagents if you explicitly tell it to
Its not like claude code where its proactive
@bright swift Before it wasnt so, no?
I think it was always like this
i very, very rarely saw it use sub agents on its own
Not true. I have been seeing agents for weeks, even though I never said to use them
I've seen many people reporting many different experiences with agent spawning, just like we're seeing here.
Exactly! I set it up so workers cannot run git commands (except commit and request review), so only the orchestrator can approve code changes into integration branch. That's the gate where orchestrator can oversee things in my setup
it makes sense, i think the practice of every person matters a lot
Don't you have an agent for GH ops?
i mean if you have it use sub agents often i'd imagine it becomes some kind of implicit policy inside the memory of the thread
nice
i wonder if 5.5 suffers from the 5.4 orchestrator bug, where sub agents inherited the parent context
@bright swift Doubtful, I have changed nothing
it was so funny to see how sub-agents thought they themselves were the orchestrator and tried to mirror the orchestrator instructions and spawn their own worker agents
Yes! I think if an agent's only job is to manage subagents they tend to do so, but in the way default codex is set up, the base instructions are like "you are a general purpose, all encompassing agent designed to do whatever", and most folks are gonna be like "make a plan, refine the plan, write the code". But when you want em to use subagents, they don't see any task as so complex they need to fan out so they default to doing it themselves. But when their entire purpose is to use subagents and they can't write code it works out better
Well they can... There is an agents.max_depth setting
Are the default agents gone? Do I need to create them manually now?
it's a specific bug that is documented in issues and affects only 5.4
Ah ok
The configuration of Codex is such an unholy mess
The 5.4 sub agent received a prompt to do x, but if it got inherited context, it also saw the prompt for the main agent, and in about half of instances that prompt somehow overpowered the one it received, so it tried to do the orchestrator task instead
Codex usage/quota reporting sub-thread: https://discord.com/channels/974519864045756446/1497255535664300173
Read only permissioned orchestrator is an interesting idea but I use it to resolve possible merge conflicts because i have multiple orchestrators running in parallel
In different checkouts though
me after switching from opus to codex
My orchestrator does have workspace_write! The trick I employed is I made a zsh wrapper, tied it into the role identity of the running agent so when they run any command I know what role is running it and can react accordingly.
So they can't write into worktrees or the main repo folder, but I have certain commands escaping the sandbox depending on the role. Rather than direct access to git, I have a bunch of sanctioned scripts that do non-destructive ops only, and only the orchestrator can use most of em. It's pretty complicated, but it's all posted up in https://github.com/robertmsale/.codex
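A rough idea of what role-gated commands like that could look like, sketched in Python rather than zsh. Everything here is hypothetical (role names, script names, the `is_allowed` and `run_as` helpers); the worker permissions just mirror the "commit and request review only" rule mentioned earlier, and the real setup in the linked repo will differ.

```python
# Hypothetical role -> sanctioned-script allowlist (names invented for illustration).
SANCTIONED = {
    "worker": {"commit", "request-review"},
    "orchestrator": {"commit", "request-review", "adopt-commit", "merge-to-integration"},
}

def is_allowed(role: str, script: str) -> bool:
    """True if the given role may run the given sanctioned script."""
    return script in SANCTIONED.get(role, set())

def run_as(role: str, script: str) -> str:
    """Pretend dispatcher: only reports what a real wrapper would decide."""
    if not is_allowed(role, script):
        return f"denied: {role} may not run {script}"
    return f"ok: {role} runs {script}"

print(run_as("worker", "commit"))
print(run_as("worker", "merge-to-integration"))
print(run_as("orchestrator", "merge-to-integration"))
```

Unknown roles get an empty allowlist, so the default is deny.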
Ah interesting. I have hooks that block writes outside the checkout/worktree but not per role
Except for the animation, that reminds me of old wall-sized ASCII-art. Anyone remember Monalisa or StarTrek posters created on LA36 or similar paper terminals?
i tweaked my orchestrator so it can peek at intermediate reasoning outputs of his sub agents
so it stops wondering what they're doing mid task
its made using a gif to ascii pipeline i coded
Via raw session logs?
patched codex itself and introduced this capability in my fork lol
Ah nice
Gawd, the time it took me to create stuff like that by hand in BASIC....
it's the age of "forking your own"
well it's AI age, it's supposed to be like this
anyone here still uses claude or other agents too?
and if yes for what type of tasks
only if my codex runs out
Now make it like codex let it accept input while subagents work?
so like back up emergency junior
im praying sam finds a reason to reset limits
☝️ Tobi
where would you use this type of Ascii animations?
idk, i just like them personally
will look into it. right now i got side tracked to make my own UX surface to use codex
maybe as a game asset or something
like for example you can make cool 3D effects with it
Seriously all this meta work is so addicting lol
Thats the thing I loved on claude/copilot cli
makes sense. I thought about how I could use it in ecom but no clue haha, but ye, for games or loading screens or hacker menus, cool
And truly OAI should take a look.
They did with the codex app didnt they?
Trippy
Maybe, I am not using that app, it doesnt really work (after a while starts crashing)
Yea I uninstalled it too, felt like it was draining usage in cli faster somehow
Meta is the edge. It's where you get to compete straight with the giants. So it makes sense it's addictive
Decades ago I did a TUI/ASCII animation: A Romulan warbird came down from the top of the screen over business data ... then the Enterprise came up behind it, shot photon torpedoes, blew up the warbird, and then exited off to the side of the screen ... then the business data was restored.
Who said green screen was boring?
Now Codex can do that in a minute. 
I'm in love
its a fun time
Oki, can you put it in one concise paragraph? What would you like the current harness to support that it doesn't?
Weirdly enough that's about the quality of animation available at the time StarWars first came out.
This rocks
( Until Intel decides to kill open source )
(Sorry, just reflecting a bit of news...)
Missed that news... What's Intel planning?
So in codex CLI when you have a thread open and it (the model) is working, the main input still lets you submit new messages, but it queues them. Imagine the model actually used subagents reliably (unfortunately, most of the time it does not) and, whenever the main thread is not busy reviewing something or delegating tasks to the sub agents, the main agent became immediately available for interaction.
Basically, you can then:
- kick off task via prompt to main agent
- main agent does its thing (delegates to subagents)
- once it is done and waiting for sub agents, you can already interact with the main agent again, potentially having a coffee chat or doing more serious work, while it waits for subagents to return
When I was working via Copilot CLI using Claude models, exactly that was possible, and claude told me it is an explicit built-in feature, to free up the main thread as much as possible for parallel interaction
They've been publishing FOSS for a couple of years, seriously deep stuff related to chips, but they just terminated all of that as an undesirable expense. Kind of a kick in the pants to corporate support of FOSS ... especially for their own products.
guys. im sick of this stupid claude. 2 questions on opus 4.7
claude limit reached.
switching to codex
Thx
the 5h limits are terrible on max x5
I just hit that
gpt/codex is a dream in all ways compared to everything else
how many prompts?
i wanna get the plus plan
ChatGPT Plus plan is a great place to start.
You'll prompt it twice and it'll be at the limit. it's more than claude, but still close to nothing
I might say, the plus is the plan to start
That is just not true
Unless you prompt it to analyse facebook's source code or so, you get hours of work with plus
I know because I have both, and I use these products daily, unless I am not.
bruv so it aint good as all people say? that codex has more usage
it has more usage
but $20 is hobby plan unless you are very careful with your prompts
Ignore vague "trust me its the internet" statements and just try it first.
It has significant usage. Whether it is more or less than claude I cannot say, but it is very generous even now that limits are not 2x anymore.
Had been running business plan for a while, has the same limits as plus and I had to switch like every 40mins max
trust me, ok? 🤣
Okay
option 1, keep claude and suffer
option 2, go to gpt and maybe also suffer i dont know yet
both 20 dollar plan
i thought they were gonna reset rate limits
okay cuz i broke my keyboard today out of rage
could've bought gpt pro with that money
Generate a meme with the last 2% please.
nah it was a 2 dollar keyboard since i like smashing things
Your decision shouldn't be based on "more usage" but what you're doing with the usage. Don't run xhigh or fast, use high-quality prompts, document your project/code. These factors drastically reduce time and token usage, and increase the quality of your product.
okay so run it on effort low? or medium
For rare I agree with captain
no lol just look at the keys and the background, it's very close to being real. was this with the api or with normal chatgpt?
normal chat
the key in the front is a bit weird
My suggestions about which settings to use: #codex-discussions message
#codex-discussions message
the reason i say no is since all the switches are dif colours and even the keycaps are dif colors on the under part and have a bunch of weird text
I consider fast only if the reset is gonna come soon and I have too much quota left
Otherwise I see no reason to trade quota for speed
for everyone
im 14
yea that
I understand that being 14 you want to throw as much as you can at the prompt and get as much back as you can. As professionals, we need to use the technology better to get better results. Follow what I said, learn how to get the most out of the tech. Beyond that there's not much to offer.
just use 5.5 xhigh fast
LMAO
get high fast used to be the way to go...
rip my usage then
i use 5.5 high on low speed
Just manage your usage. It doesn't matter how old you are. It is a basic resource optimization problem
( Standard speed, there is no Low )
( Do or do not, there is no try )
just think about it like this.. pro is 200/mo.. thats what, 10-15 hours of fast food work? and you can have agents running 24/7 on xhigh fast with a couple of fastfood jobs + pro plans
still is
24/7 with xhigh fast i honestly doubt it gets you there
Not based on my usage.
you should validate the concept for us. Gl
Lol
it hurts every time i see our API costs after using 5.4-pro
Well would be nice if you can only do those 10-15 hours so you can chill the rest of the month coding with codex.
But I don't know if it works that way
well you need a couple of pro plans for that, especially if running things in parallel with subagents
From June it will be 4 pro plans
Since currently it's promotional 2x
they will extend.. believe
Let's hope they do. They still have the best offer by far for power users
As an example of my common usage, I just generated code on Medium, and I toggled to Low and said "fix lint issues". You don't need high reasoning to fix linting issues. (In this utility I just added ESLint which wasn't present for the original code, otherwise the assistant runs lint before and after code gen and corrects its own mistakes before completion.)
they should just let us add multiple pro plans in one account instead of credits
switching accounts is annoying
Yup
And then make a ranking of who wields the most pro plans
Why would they make it easier to lose money?
there is a guy on X who claims he uses 24x claude max plans and uses them up
there is a guy on X who claims he can fly
in this case i believe him because he puts out crazy amounts of code
We're still in the R&D phase
Much of that usage goes into improving models
Personally i make all my codex available for training, because I want my usage to be part of new training for obvious reasons
Actually, yes. But not all users are sharing
Save on subs: At some point, run local LLMs for little stuff, or use a per-minute/hour VPS with huge GPU for big stuff.
so youre saying i shouldnt ask gpt 5.5 pro extended thinking for a weather report? blasphemy
I have a DGX Spark and an M5 Max, both 128GB, and can't find local models reliable enough to use consistently. Probably requires fine-tuning currently for any production case.
Definitely not code.
you should ask it to use chatgpt via computer use, to prompt there and give you the answer back
did you try gemma 4?
its not making me stop :)
Same as gemini then huh
Typical of Google, yes
I'm not a fan of any Google products but for some purposes I lower my standards.
Seems like google is making a big push now with their "strike team" announcement couple days ago
I've read it's a problem that can be solved, mostly related to their own formats not being consistently supported across current inferencing
Ignore Google Marketing buzzwords ... they're a daily annoyance.
quick question: i recently started a fresh repo. in codex settings it appears in the review list, but somehow i don't get auto reviews from codex, nor does the manual @codex review work
maybe i need to go through an extra step like updating how codex connects to that specific org/repo?
OpenAI has consistently had the strongest models for every use case I have had (real world)
Except design but wbk
probably need to add it in github connector permission list too
if you set it up with "only some repos"
yea was thinking about it, have to remember where to do that lol
dear sam please get drunk and reset limits again
I thought it was a codex 5.5 notification
thx! you're faster than gpt
wen reset
Are we finding 5.5 faster to use up rate limits?
about the same i feel
its faster
ive never hit limits so quickly
It burns faster but 10x better output in my project
im a sad developer now, gonna have to use inferior opus
We have spark for this use case
Surprisingly not really
Just keep it in medium
Scary. I need to earn more within a month so I can afford three pro subs
never, i need all the juice
3????
Damn
i have gpt pro, google pro, and claude max
Bruh Opus uses up limits at least 10x faster
yeah but
thx bro! on my main i had all repos set, but in the specific org i started that repo it was set only for individual repos
Ye I have 1 and its ideal for me but if u have more things to build then 3 I guess
the tokens must flow!
I'm fulltime indie working 7 days a week
My problem isn't even with programming, it's with the openclaws, they freaking devour it
I would think that OpenAI would adjust API tiers such that the natural path is to go with the API when quota is exceeded. That eliminates the multi-account hassle. Of course that's where Business and Enterprise pricing come into play as well.
you dont need a claw with codex automations
There are tools for load balancing multiple accounts automatically
API will never be economical
Speaking of OpenClaw and them ... the 'llm' utility can interface with 5.5 via API, even though 5.5 API hasn't been announced. That gimmick will be obsolete in days but it was a nice feat for 'llm'.
You do know openclaw is not only automations right?
why 258k context max on codex? using gpt 5.5
Biz pay insane overages just to have enterprise management of org accounts
idk, i wrote my own MCP toolkit
codex autocompact beats bigger context imo
I'm getting Error running remote compact task: timeout waiting for child process to exit errors on my linux vm when I'm running out of context. Any way to give it a longer timeout? It works fine on my Windows machine
I think they just haven't updated the FOSS for that. You can set 400000 or above in the .toml.
i need for large database
even 400000 didnt work
That's weird.
model = "gpt-5.5"
model_reasoning_effort = "high"
model_context_window = 300000
model_auto_compact_token_limit = 8000000
personality = "friendly"
sandbox_mode = "danger-full-access"
I can't wait until someone comes up with a "token economy" just like the crypto bs
anyone know if openai ever said if they r gonna add gpt pro model to codex?
dont give them ideas
I don't think so
codex app
Besides it would devour usage, it's like 6x more expensive than 5.5, 12x than 5.4, and probably uses like 10x more tokens (basing it off how long it loves to think)
So in reality it's probably like 120x more expensive than 5.4
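Spelling out that back-of-envelope math in Python. The 6x/12x price multipliers and the roughly 10x token guess are just the rough numbers from the message above, not measured values, and the variable names are made up.

```python
# Rough per-token price of the pro model relative to each baseline (from the chat).
PRICE_VS_55 = 6
PRICE_VS_54 = 12
# Guessed multiplier on tokens used per task (also from the chat, not measured).
TOKEN_MULTIPLIER = 10

# Effective cost per task = price multiplier x token multiplier.
effective_vs_55 = PRICE_VS_55 * TOKEN_MULTIPLIER
effective_vs_54 = PRICE_VS_54 * TOKEN_MULTIPLIER

print(f"about {effective_vs_55}x the cost of 5.5 per task")
print(f"about {effective_vs_54}x the cost of 5.4 per task")
```

Same guess either way, it just matters which baseline you compare against.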
I dunno about app details, sorry.
yeah but its so good , i'd love to have that as an option
works great for reviews or planning, i use it with repomix to build a combined file of additional context
oh I saw one of those like 6m ago or so but if its from pete then I'll def give it a try
thanks!
HAHAHA - Funny about Oracle, I just did that this morning.
Spotted you in the wild on a YouTube comment lol, I was wondering what your linting setup looks like when working with codex?
I was thinking about just starting with the recommended typescript eslint rules
Which comment?
I have the ESLint extension in all profiles. I npm install eslint and related packages for all projects. I have an extensive eslint.config.mjs. My package.json has a "lint" script. For Codex, AGENTS.md directs the assistant to run lint and save output to a temp file "lint-1.txt", and at the end it runs the same to "lint-2.txt". Then it compares. It must fix whatever is different and it must not address whatever was wrong before it started.
I think that's it.
It was from a video from Web Dev Simplified about something called Biome.
why the two files? i just lint in post tool use hook and tell it to fix asap or it gets replaced
Oh yeah, Kyle's channel. Awesome stuff!
The two files are a before/after. I don't want the current task to be burdened with fixing unrelated issues. I only want it to fix what it broke. Other issues are fixed and committed separately.
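A minimal sketch of that before/after comparison in Python, assuming lint-1.txt and lint-2.txt hold ESLint's default "stylish" text output. The helper names (`parse_report`, `introduced`) are made up, and matching ignores line:column positions so issues that merely moved aren't counted as new; that part is my assumption, not necessarily how the setup above compares them.

```python
import re
from collections import Counter

# Matches stylish-format issue lines such as:
#   "  12:5  error  'x' is defined but never used  no-unused-vars"
ISSUE = re.compile(r"^\s*\d+:\d+\s+(error|warning)\s+(.*\S)\s+(\S+)\s*$")

def parse_report(report: str) -> Counter:
    """Count issues as (file, severity, message, rule), ignoring line:column."""
    issues = Counter()
    current_file = None
    for line in report.splitlines():
        match = ISSUE.match(line)
        if match:
            severity, message, rule = match.groups()
            issues[(current_file, severity, message, rule)] += 1
        elif line.strip() and not line[0].isspace():
            # Unindented non-empty lines are treated as file path headers.
            current_file = line.strip()
    return issues

def introduced(before: str, after: str) -> Counter:
    """Issues present in the 'after' report beyond what 'before' already had."""
    return parse_report(after) - parse_report(before)

# Hypothetical usage with the two files named in the message above:
# new = introduced(open("lint-1.txt").read(), open("lint-2.txt").read())
# for (path, severity, message, rule), count in new.items():
#     print(path, severity, rule, message, f"x{count}")
```

Anything left in the result is what the current task broke; pre-existing issues cancel out and stay untouched.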
Sounds to me that this setup is useful for brownfield projects where existing stuff is known and accepted
the reason i lint in post tool use is that i dont want linting issues to accumulate until pre-commit which runs the big QA gate
Does anyone know roughly when GPT 5.5 will be available in Codex?
yesterday
lol rip github copilot.. 7.5x modifier for gpt-5.5 same as opus 4.7 https://github.blog/changelog/2026-04-24-gpt-5-5-is-generally-available-for-github-copilot/
instead of 1x for 5.4
check the fast setting as it changed its default for some people
they enabled fast mode by default in 0.124.0
sneaky bois
its not enabled on my end
at the beginning of April, i could get half of the day with a single 5 hour limit
(right after x2 event ended)
but now the 5 hour limit doesn't last 1 hour
i signed into codex but it signed me out, and now it's asking for my number, and when i enter it i receive no sms
is there a way to solve it?
guys how to sign in so i wont have to verify via sms
I was using GPT 5.4/5.3 medium
Comments on Codex reporting are welcome. https://discord.com/channels/974519864045756446/1497255535664300173
Breaking for lunch now.
if you use codex app or even have it installed but use cli, it sneaks a bunch of extra stuff into your requests via the app server integration. big mcp tools like github, other apps, skills, etc.
if you use cli and care about usage, uninstall the slapp
I believe this has to do with openai switching to token-based calculation?
is it asking for sms verification in codex app for u guys
Giving a piece of software a persona is crazy work tbh
To stop multiple free accounts bypassing usage limits
Damn a single message used 2% (no tool usage, short answer)
Only for enterprise accounts
I use Auth, I have it on since codex cloud time when it was mandatory there I think
I haven't noticed any change in how fast quota runs out this month
holyshit codex does use a lot of resources
to the point that i feel my pc stutter a lot when playing
and using codex
even tho i got a high end pc
whatever it is doing in background testing the game or idfk
Am I the only one who thinks token usage is lower than 5.4 when using 5.5?
I am able to crank out significantly more work when using 5.5 than I did with 5.4
No. I have a similar impression so far
I was capping my 5hr limit on 5.4/high in 2-3 prompts all last week and prior, now on 5.5 I'm rocking xhigh and I am at least 8 prompts deep in this 5h session
ofc?
But according to the model card 5.5 is supposed to use fewer tokens, I think
5.5 seems infinitely more efficient
Infinitely is a bit of an overstatement
{"type":"error","status":400,"error":{"type":"invalid_request_error","message":"The 'gpt-5.5' model is not supported when using Codex with a ChatGPT account."}}
When will GPT-5.5 be available in the Codex Desktop app?
The new update enabled it with this whole thing saying try GPT-5.5 then I try it and get an error.
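For anyone scripting around this, a minimal sketch of pulling the useful fields out of an error payload shaped like the one pasted above. The payload string is copied from that message; the helper name `summarize_error` is made up for illustration and is not part of any Codex tooling.

```python
import json

# Error payload copied from the message above.
RAW = '{"type":"error","status":400,"error":{"type":"invalid_request_error","message":"The \'gpt-5.5\' model is not supported when using Codex with a ChatGPT account."}}'


def summarize_error(raw: str) -> str:
    """Return 'status type: message' for a payload shaped like the one above."""
    payload = json.loads(raw)
    err = payload.get("error", {})  # nested object with the error type and message
    return f"{payload.get('status')} {err.get('type')}: {err.get('message')}"


print(summarize_error(RAW))
```

The status of 400 with `invalid_request_error` suggests the request itself was rejected rather than a transient server failure, so retrying unchanged is unlikely to help.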
Perhaps a bit of hyperbole lol but still, it seems much more efficient
Make sure you are on the latest version from the Windows store
It works fine for most. Are you sure you updated your app to the latest version?
I'm on the latest version, yes. Previous versions didn't even have the option of GPT-5.5.
I just updated like 5 minutes ago, after I had just updated an hour before that.
What version do you have
0.124 in CLI?
You don't need the cli, the desktop app comes bundled with its own cli binary
Yes, same for the ide extension, it uses its bundled binary
Weird Hudson, im on a technically older version of the app and have 5.5
thank god bro i fixed it
it was pain
i already have access
For those who are concerned about token usage, ongoing mantra: Look at your AGENTS.md and other directives. What are you asking the bot to do? What is it doing because you haven't provided guidance? That's a serious biggie! Look at the CoT and see what it's doing, then help it to do better. THAT will save you tokens on every turn!!
Odd mine is working fine on an earlier version. I just noticed a new update button
I'm on 26.422.20832
I'll update and tell you if it's still working, maybe it's an issue with that version, who knows
Hmmm...
Well anyways I hope whatever the problem is gets fixed soon because I'd like to try out 5.5 in Codex.
There are however some bugs I don't want them to fix.
Well it's been working fine since yesterday for me
What bugs
I'm not going to mention them or they might fix them.
well maybe your issue comes bundled with those bugs that are features from your perspective
🤷‍♂️
Oh hi I saw your tag.
So you're a fellow AI researcher?
What did you think of the new CSA in Deepseek V4?
i have NDA on ai research discussions
would you guys say that gpt 5.5 is generally more token efficient? Besides the cost?
I feel like token efficiency helps with generation speed to completion
...
I was asking about your thoughts on an attention architecture in a public paper.
I haven't looked into it yet, what are your thoughts so far?
when are they resetting codex limits for the new model launch?
Why did you dump your limits prior assuming there would be a reset?
Deepseek V4 CSA is pretty cool actually I think. I'm planning to try it for some future runs.
since i thought there would be a reset since sama said there would be
let us know of your impressions once you do
when did he say that?
they already did 2 like 2 days apart.
feel the agi
you design transformers?
This is how I dream
the baby should have been eating the burger
try it out yourselves https://chatgpt.com/share/69ebc6ea-5b6c-83eb-b720-e81cb247eb15
no but that's pretty sweet dude
Concerned about your tokens? Look at my cached input. While the "cost" is less for cache, it's still consuming a huge percentage of the overall transaction volume. Occasional forced compression might help. Anyone know if we can force context compression in the middle of a turn?
there was someone that recommended that earlier in the chat history
connection unstable for someone else?
yeah mine wont connect
I think someone said that there was a compact command in the codex cli but unavailable on the app?
did i hear reset?
If it works out well I may make a smaller open version (without proprietary IP) for release on HF.
well yes if I am not the only one experiencing this hehe
yeah, we were talking about this yesterday or so, there's no "/compact" in the VSCode extension. Will try CLI.
They recently added /compact
oh yeah, missed it. just hit it.
BAM! Yeah, you can see when I hit it. Need to think through the significance... Anyone?
praise god bro
Between screenshot 1 and 2, input tokens and output tokens dropped, because I compacted between turns. It wasn't running at the moment I compacted.
There's something to this. The trade off is that if you compact context then of course you lose some of the details that might be required to complete a task effectively. From what I see here, I don't think most peeps care. They just post one turn after another and prior context doesn't really matter.
In the turn after compaction, cache hit rate went up because the hits on cache were more relevant.
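A quick sketch of the arithmetic behind that observation: cache hit rate taken as cached input tokens divided by total input tokens for a turn. The field names and all the numbers here are assumptions for illustration. They are not taken from the screenshots and are not Codex's actual reporting schema.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    input_tokens: int         # total input tokens for the turn
    cached_input_tokens: int  # portion of the input served from cache


def cache_hit_rate(turn: TurnUsage) -> float:
    """Fraction of input tokens that were cache hits (0.0 if there was no input)."""
    if turn.input_tokens == 0:
        return 0.0
    return turn.cached_input_tokens / turn.input_tokens


# Hypothetical numbers, not taken from the screenshots.
before = TurnUsage(input_tokens=200_000, cached_input_tokens=120_000)
after = TurnUsage(input_tokens=40_000, cached_input_tokens=32_000)

print(f"before compaction: {cache_hit_rate(before):.0%}")
print(f"after compaction:  {cache_hit_rate(after):.0%}")
```

With numbers like these, the absolute token counts fall after compaction while the hit rate rises, which matches the pattern described: a smaller context with a larger share of it served from cache.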
So are /compact and the automatic compaction after hitting the context window doing different things under the hood?
same thing here in CLI
I don't think so, Trippy
wdym reset?
server is not responding consistently
I think /compact lets you also feed a prompt to guide in the compacting and auto just does its thing
threats always work https://chatgpt.com/share/69ebcb1f-5668-83e9-bf46-c6a01176188d
I think I should have used 5.5 for it, since that kind of "I cannot do this" is typical for 5.4 lol
go back to gpt 5.4 it seems to work for now
This is good, they'll say 'oops' and hit the RESETTI SPAGHETTI BUTTON
KNEES WEAK ARMS RESETTI
How do I get some good codex set up that you guys are on?
Like agents and config?
I don't want that normie codex
reset wen
I am currently using the OAI vetted "superpowers"
But honestly? No agent, no skill, just pure gpt and a teeny-weeny agents.md telling it to use ascii does just fine
IMO it's literally mostly not required to have agents or skills unless you have very specific needs (for example, I have a skill that can understand 200-page client specs and create actionable JSON to-do lists from them, whether those specs are handwritten notes, images, or simple workflow diagrams). Now that requires a skill, mostly because of the validation of what it extracts.
But otherwise? I feel more and more the whole skills/agents stuff just actually makes the model less free.
are you being sincere right now
Yeah, I haven't really used codex before
What have you been using?
This chat, for satire farming
My brain
Remember "CLI".
I made one joke and you still think it's so funny
holy deficit
i bet on reset with 5.5 release fml
Hoping at some point someone will comment on the Codex usage utility thread. I'm done with that one for the day. Continuing on real projects.
Here 5.5?
