#codex-discussions
MCP, Apps, Skills, and now Plugins
Just one more abstraction marketplace bro,
I swear bro we just need one more abstraction marketplace
@cyan wing You are 100% right
It is starting to get real messy, and I'm not enjoying it
what's messy about this ?
I feel like clicking everywhere, adding everything, connecting all MCPs and pray to god it'll work fine
Mind if I chime in?
@cobalt junco , I hear you on the manual grind—but I agree with @cyan wing , most of these new 'marketplaces' just add more noise.
A more robust suggestion is to bypass the wrappers entirely and use Make.com to bridge a Headless CMS (like Sanity) directly with Search Console/Indexing APIs. It keeps your workflow programmatic and out of the 'abstraction' mess everyone is talking about.
It’s a bit more setup initially, but it’s the only way to scale SEO without being at the mercy of the next 'flavor of the week' plugin.
Plugins combine all three into one, making it easy to set up. Seems like a good abstraction to me.
Now let's abstract the Plugins too!
So I can enable certain groups of Plugins based on the workflow
You can probably do that with a script that edits your config in ~/.codex
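A minimal sketch of that config-swapping idea, in Python. It assumes the Codex config lives at ~/.codex/config.toml and that you keep one complete config per workflow in a hypothetical ~/.codex/profiles/ folder. The folder, the profile names, and the function are made up for illustration; none of this is a built-in Codex feature.

```python
import shutil
from pathlib import Path


def activate_profile(name: str, codex_dir: Path = Path.home() / ".codex") -> Path:
    """Copy profiles/<name>.toml over config.toml, keeping a backup of the old config.

    The profiles/ folder is a hypothetical convention, not something Codex defines.
    """
    profile = codex_dir / "profiles" / f"{name}.toml"
    if not profile.is_file():
        raise FileNotFoundError(f"no such profile: {profile}")

    config = codex_dir / "config.toml"
    if config.exists():
        # Keep the previous config so a bad profile is easy to roll back.
        shutil.copy2(config, codex_dir / "config.toml.bak")

    shutil.copy2(profile, config)
    return config
```

Calling activate_profile("web") before starting a session would swap in the config for that workflow. Swapping whole files avoids having to parse or rewrite TOML in place.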
woosh
the point is: what is the right primitive that can be abstracted and recursively composed of itself
so we don't have to keep making up new words and marketplaces every 6 weeks
Plug-ins seem like a good answer to me. They can be composed of various primitives.
You can SORT of do it if you layer the plugins
core plugin + domain plugins + instruction layering
Theoretically the plugin skill instructions could include instructions to install more plugins
How did y'all get your codex to update? I updated CLI and attempted to update via Microsoft Store, but there's no update in Microsoft Store.
I use the VSC extension, so the extension market had an update
Could be that my Microsoft store is being slow.
Stovoy the digger, any personal use cases for plugins that aren't listed with the OAI release you can recommend?
Today's glitch:
A glitch, or codex building you a custom OS by surprise?
And it isn't even the 1st yet.
I'll be honest, I haven't tried them yet 😄
Back to work
I am on vacation for the next two weeks actually 🙂
Well that answers that then. Enjoy the vacation. Also, remote codex sessions implemented natively please lol
Yeah that'd be nice! I do use codex remotely on my desktop with a custom tmux server I wrote.
Just pushed the initial build for a private trading 'Vault' project. CMS is mapped, $userId security is locked. Staying lowkey on the details for now. 🖱️🔥
zoom out the UI bro
WTH is this? How can I buy more Codex credits from the Plus plan like ChatGPT says I can?
@hard tulip That's by design, bro. 🔒 Only showing enough to know the architecture is solid. The full UI is for the client's eyes only for now.
How are you viewing the usage page? There should be a 4th box in the bottom right with the place to add credits.
Thanks @high girder. I shall double-check, but I would say there is an explicit message about 'asking my admin'. ha!
OH, you may be on an enterprise account then, right?
Errr.. I was when a friend bought two seats and gave me one, but that's been closed for several months now.
There were 2 separate workspaces (and now I can't even access my old chats because he cancelled his subscription 😭) and I bought my own Plus subscription through my own personal workspace.
Man, I can't even find the Billing page anywhere. This is surprisingly nuts.
it sounds like you might still be using that seat in some way
I can't wait for the influx of old people trying to order a pizza and end up with a data center
Ohh geez.
Yep, looks like a bug. :/
On the bright side, if you submit a support request, you'll hear back relatively quickly. cough anthropic cough
Don't know about all that, but they're rather generous with compute
OH man, plugins are kind of nuts once you start messing with them
Ok, how do you use this new plugins thing? I don't really see anything different in environments or settings, is this not in the cloud?
A Codex plugin is a local installable bundle on disk.
It has a manifest at .codex-plugin/plugin.json.
It can include skills/, .mcp.json, .app.json, hooks, assets, and UI metadata.
Codex loads that bundle and gets a pre-described baseline of workflows plus tool wiring.
The plugin can be project-specific or reusable across many projects.
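To make the layout concrete, here is a small Python sketch that checks whether a folder looks like such a bundle. It relies only on the paths named above (.codex-plugin/plugin.json, skills/, .mcp.json, .app.json). The manifest schema isn't described in this thread, so the sketch only confirms the file parses as JSON; describe_plugin is a hypothetical helper, not part of Codex.

```python
import json
from pathlib import Path

# Optional pieces named in the description above; anything else is ignored here.
OPTIONAL_PARTS = ("skills", ".mcp.json", ".app.json")


def describe_plugin(plugin_dir: Path) -> dict:
    """Report whether a directory looks like a plugin bundle and what it contains."""
    manifest_path = plugin_dir / ".codex-plugin" / "plugin.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"missing manifest: {manifest_path}")

    # The manifest schema is an unknown here, so only check that the file
    # parses as JSON and expose its top-level keys.
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    keys = sorted(manifest) if isinstance(manifest, dict) else []

    return {
        "manifest_keys": keys,
        "parts": {part: (plugin_dir / part).exists() for part in OPTIONAL_PARTS},
    }
```

Running describe_plugin(Path("path/to/plugin")) on a local bundle shows which of the optional pieces it actually ships.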
Plugins are great if you have repeatable workflows that you use across projects
Ok, so only works with Codex in a local context? I only use the cloud agent.
If you can edit your cloud codex settings, you should be able to use them, but I would ask codex specifically for your environment. Since I don't actually use the cloud codex, I can't be 100% sure
Codex is so good
I read what you typed there again, it looks like these things are set using dot files, it could be that the Codex agent will use these in the project if they are found, thank you
Is codex on mobile limited to pro users?
You'll need to be more specific, because there is no mobile version of codex officially
My codex chat wasted the remainder of my tokens this week “reading” or “auditing” the repo
😎
limits reset again? bruh that's awesome
Like this user has a codex option
Vs mine
lol i was just about to check this channel because i noticed usage reset
guess im not only one
yeah yeah usage reset
I'm so glad I was down like 30% of my usage from yesterday xd
leggooooo
I rarely use more than like 10% if anyone needs to borrow some lol
Wish you could sell it lmfao imagine a marketplace where you sell 50% of your $200 plan for a week at $10
Oh jeeze so ripe for fraud 🙁
Yeap 100% would be problematic, but I would love to buy some usage for a week
(not at API prices)
Resting today makes it so I will have less overall usage.
Ah right they probably reset it so new usage starts at 0 on April 2nd
why didnt they push back the date in the first place if that was the idea
You need everyone on the same reset day?
I honestly don't know what they're thinking, they've hit that button so much
that is what happens when they raise $100,000,000,000
I'm already down 6% since reset though so I'm probably benefitting from it
Damn I'm flying through usage I'm down 10% from this week, I need someone to click the reset button again please
woah mystery token reset for plus thx
If you're wondering why the usage has been reset again, since the Codex team apparently doesn't understand that the way to communicate with their users is via the email address they used to sign up for their account, or with a message displayed prominently in the app, and instead insists on announcing things only on X which probably only a small fraction of their users ever reads, you can read about it here: https://www.reddit.com/r/codex/comments/1s4rqwh/reset_woohoo/
@boreal holly Sub-agents now use readable path-based addresses like /root/agent_a, with structured inter-agent messaging and agent listing for multi-agent v2 workflows. (#15313, #15515, #15556, #15570, #15621, #15647)
See this patch note from 0.117.0
They're stealing your gremlins
Somewhat true, I log in with my Microsoft account but I changed the address and their system doesn't have a way to update the address connected to the account token.
So it has an email on a dead domain
is codex meant to show plugins now? mine doesn't
nvm fixed it
Yes
I was about to run out of weekly quota too
Xd
I have some credits anyways
But goated surprise before
The great halvening
It's wonderful to see pro account quota reset lol
Exactly, it wasn't random timing, it's a "last" reset before you lose double
probably not. They're consolidating compute at the moment. Plus Mythos isn't even ready yet
What competition from claude
What's mythos
They have always been a pain
Anthropic's new leaked model above opus
Oh
At $20,000 per mtoken
Lol
more than likely. an AI company that builds AI coding tools having their unreleased model leaked through a misconfigured CMS is... something though
I think that openai should lowk let you use gpt 5.4 mini more though
anyone got a code for sora2? how do i get one
Is anyone having issues with Codex using too much usage after the limit reset?
/fast off
Wait lol did the limit halvening hit already
I've used 40% of my 5h already
Wut
Yes
even i cant see 5 hours limits
this 25% was just Hello message and like reading 1k LOC file only
same for 5.4 and 5.3-codex
ok it really feels like someone hit codex with a car
I mean it's just stopped trying. It doesn't even take one step through the code before making an assumption and suggesting a fix.
Literally has no idea what the code does, just guesses. You can't trust it on the simplest job anymore.
I thought it might be something I added in agents.md or skills etc, so I sanitised those. Still no change. Then I explicitly added developer_instructions, still no change. So I have to resort to telling it directly in every prompt that it must step through code until it reaches concrete implementations?!
It's like high thinking is locked down to low
It's just constant
GPT 5.4 mini has been out for a while what did you guys think about it? I haven't touched it much but wanted to know from people who have used it more what they thought
Yes I'm down 13% of my weekly usage
Might be good for small work, but I prefer Spark instead of mini
Spark is fast
I can’t deal with spark though, it does things so badly I spend more time fixing it than what I save by using it
Can't use those models for heavy work
Very low context window it has
Support Intel Mac. 😢
https://openai.com/codex/get-started/
stop using antiques
Works fine still with wired internet (its wireless is too slow)... I don't typically even develop on it... I use Claude Code on cloud machines. Who needs an IDE these days?
I also can't deal with stuff breaking (thus my preference for cloud VPS)... I had a better Linux machine, but it failed after only two years (cpu or board). So nope, this beast has lasted this long (9 years!) and has proven itself worthy of continued use. 😄
Does anyone know of a service (or a skill for codex?) that I could use to create a promotional video of a website showing the different features. Absolutely out of my skill set
Do you know how to use hooks? I heard they're available now, but I couldn't find docs
Yeah my 5 hours usually take 4 hours to get to 0, but now it's less than 2 hours
It's great! I have a disproportionately higher number of them running at all times
@simple star used codex a while back to make a video. It's in this chat!
I think there is no difference in rate limit for free and Go subscription in Codex. It’s sad! Please do something for Go subscribers.
Yes, I did. but I cant remember the skill, ahahah
i think it depends on the workload
give us spud already
But yesterday was normal
i got infinite codex usage what should i do
/ codex api usage like for openclaw and stuff
rewrite all of the Linux kernel into Rust
how?
mine reset today which felt nice..
dang deleting that is crazy
I just noticed that the VSCode Output window for Codex is streaming a ton of debug/error detail non-stop ... when Codex shouldn't be doing anything.
I think this needs to be reported but I might be footgunning with a debug flag set somewhere.
Anyone?
ye i had that happen too lol
hmmmm, I just nailed this down...
[error] Error fetching errorMessage="open-in-target not supported in extension" errorName=Error errorStack="Error: open-in-target not supported in extension
at open-in-targets (/home/codex/.vscode-server/extensions/openai.chatgpt-26.318.11754/out/extension.js:256:22058)
at qw.handleVSCodeRequest (/home/codex/.vscode-server/extensions/openai.chatgpt-26.318.11754/out/extension.js:256:26917)
I'll check GH Issues and report.
TY
Verified the issue. Update VSCode and the Codex extension to the very latest versions and the issue may/should clear.
This has also been documented as causing CPU spin in MacOS.
I want to use codex for a public project, but I don't want to commit my personal AGENTS.md to the remote repo. What's the best way to version control those files in a separate repo, while ensuring they're still in the right directory, and not appearing in the unstashed file list?
How about symlink and add the specific folder/file to .gitignore?
Git submodules
cursor or vscode?
Speaking of Git, everyone should be aware of a change in GitHub policy:
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.
Shouldn't be relevant
He's asking about Git, which is independent of the IDE. He could be working from the CLI or Notepad, doesn't matter.
oh loll im wondering which is better
irrelevant to his post
is vscode any better than cursor?
both using ai
just downloaded vscode and realized i mightve wasted my time
@junior umbra the symlink approach is good. Treat this exactly like a .env file or other config files for lint or formatting, etc. Your project might include an AGENTS_Template.md for your package users to copy/modify.
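A rough Python sketch of the symlink-plus-ignore setup. The private file lives in its own repo, the public repo only gets a symlink, and the link is added to .gitignore so it never shows up as an untracked file. link_private_file and the directory names in the usage note are hypothetical.

```python
from pathlib import Path


def link_private_file(private_file: Path, repo_dir: Path, name: str = "AGENTS.md") -> Path:
    """Symlink a file from a private repo into a public one and gitignore the link."""
    link = repo_dir / name
    if not link.exists() and not link.is_symlink():
        link.symlink_to(private_file.resolve())

    gitignore = repo_dir / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    entry = f"/{name}"  # leading slash: only ignore the file at the repo root
    if entry not in lines:
        lines.append(entry)
        gitignore.write_text("\n".join(lines) + "\n")
    return link
```

For example, link_private_file(Path("~/private-agents/AGENTS.md").expanduser(), Path("~/code/public-project").expanduser()). If you'd rather not touch the committed .gitignore at all, .git/info/exclude accepts the same patterns and stays local to your clone.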
@unborn thunder The IDE doesn't matter but I think I see VSCode mentioned here more than Cursor. That said, Cursor is highly regarded by OpenAI and they are profiled as valued partners in presentations.
Personally, and it seems in most references here, I use VSCode with the Codex Extension, which is a great onramp to using Codex with code after Codex Web. This vector can help introduce you to Codex CLI and then the Codex app.
is codex web the same as codex cyber?
There's no such thing as "codex cyber" in this OpenAI ecosystem, can you clarify?
there is
i have access on my phone
lets me select any repo from codex
instead of one like on chatgpt
Oh that ... it's not a product called "Codex Cyber". "Cyber" is just a short name for CyberSecurity. OK, we're on the same page now.
lol ok
So no. Codex Web is unrelated to the security initiatives.
Codex Web is a web page where you authorize use with GitHub repositories, then open a repo for discussion, and tell Codex what you want to do. Codex then creates a VPS instance to clone the repo, processes the request, then allows you to create a PR with the changes.
The IDE extensions are closer to the metal where you clone to your own system, and codex operates there. Then you decide if and how you want to manage the VCS.
"GPT‑5.3‑Codex is our most cyber-capable frontier reasoning model to date."
Yes, "cyber-capable" ... not "Codex Cyber".
In other words, Codex 5.3+ has (better) understanding of security concerns than prior versions.
they just call it chatgpt.com/cyber
but it brings you to codex
it confuses me but Codex cyber works for when i think of the web version
lol k bye
Yes, this is because some projects seem like they work with security and need that verification.
Lots of tools with the same name. 🙄
Your head'll spin when you see that Codex plugins are not the same as the Codex Extension. 🤦♂️
Welcome to the trap that is OpenAI naming 
Someone tell me why codex 5.3 is better than 5.4.
Has anyone noticed that 5.4 is kinda trash? Doing wrong things and can't understand shi
I can't use caps lock/uppercasing in Codex CLI chatbox?
I mean it’s not guaranteed to be better in every way that’s why you get the option to use the other models
if you're wondering how the other half lives I wrote this up while waiting for rate limits (no self promo) https://twolongos.com/3/27/claude-code-capacity-collapses-calamitously/
codex cyber is the verification site lol
5.2, 5.3, and 5.4 are all the same base model, just different RL
changes openai makes to them comes with tradeoffs
when i open the codex app, it shows plugins for ~1s and then disappears. anyone knows how to fix?
i cleared these, logged out/in, and restarted the app and same problem.
- ~/Library/Application Support/Codex/Cache
- ~/Library/Application Support/Codex/Code Cache
- ~/Library/Application Support/Codex/GPUCache
- ~/Library/Application Support/Codex/Session Storage
Codex 26.325.21211
probably a server-side rollout/eligibility flag that flips after hydration
is it normal for gpt-5.4's "1m" context to drop very fast according to the statusline by not having much done at all?
you don’t get a 1m token context through codex
also you wouldn’t want it either because it would have degraded performance
at least the way anthropic handles it, you basically get bumped down to a worse model if you use the high context ones (they similarly don’t allow those in claude code)
Figma plugin doesn't work?
Interesting...
that seems a bit unfortunate
would be nice of them to catch up, because the outsider's 1m context is actually pretty damn good for what it is
the 1m context variants of anthropic/openai models are universally worse than models with smaller context windows
they're separate models
also extremely expensive which I imagine is a contributing factor
Damn something is definitely wrong with usage I'm down almost 30%
I noticed I had fast mode enabled for some reason, might want to check that
It's like they already halved it.
Bruh when they actually halve it I'm going to be in huge trouble.
I've used 30% in about a day, that's 60% when they halve the usage
plugins shows for a split second then its gone
i did update the app, the plugins is not there.
INSANE value for 200 buckaroos a month
Subagents completely broken right now for me. The subagents output is all about trying to spawn subagents.
is codex down rn?
plsssss tibo reset usage again
i already drained 3 accounts lol
still working for me
how?
Ugh... After updating Codex this time, the pop-out window feature is a total mess.
It used to allow individual pop-outs for each thread, but now it forces you to use only one pop-out. T_T
On top of that, while you used to be able to resize the pop-up window, now you can’t even do that anymore.
It was a feature I used all the time, so it’s frustrating that they’ve forced this change on us... How about the rest of you?
https://github.com/openai/codex/issues/15162
i find same issue
Hey hi guys does anyone know why there is so much diff in token usage in chatgpt (codex) vs claude code ??
🥲 no one here to reply?
Why would Claude charge you $900 for 780k tokens?
any solution for my local plugins not showing in the codex app when they do show in the cli?
anyone else facing this?
Did you benchmark this with exactly the same code base and prompts? In my experience gpt is a little bit better with token usage vs. opus … but overall they should be on the same broad level.
In my projects Claude is behind in efficiency and coding ability. I have both so I can compare end results and Claude is worse in a lot of my own benchmarks....
My application generates the RPC namespace bindings for the typescript front end dynamically as part of the build process. The Codex cloud agent also does this as part of the setup stage so it can work with the files. The generated tree goes into /frontend/scr/rpc - the entire folder is dynamically generated, and the entire tree is in .gitignore - and Codex regularly ignores the gitignore and checks generated files in, especially if anything is mentioned about generated files at all. Codex needs to not ignore .gitignore and stop checking in files it should not.
Cannot post to bug channel, would, otherwise...
coding ability?? seriously ?
Yeah, codex is better at code, it is just harder to prompt. It's better at remembering and following tasks. Claude forgets to use skills all the time.
Hi
Codex seems to have an absurdly large token grant for code work
I barely ever used my token allocation in Codex, but I don't use it for planning any more
I am about to switch back down to Claude Pro instead of Max, though. Claude has optimized my code base so well it takes a lot less work to get things done now.
I added a dynamic page editor and wiki system in like three or four prompts
Codex cranked out like 15K lines of code for that
Claude's prompts were a couple pages long for each
Here are some thoughts of mine on the topic
https://elideusgroup.com/pages/emergent-architecture
Published using that system I was talking about 😉
I think you're quite right. I am getting a lot out of generating very detailed (even 1000+ line) implementation plans, iterating with highest iq models available, and then getting Codex to systematically translate them to code. Then they seem to be "sticky"
it's certainly algorithmic that's for sure
I realized a while back that the way we're doing reasoning in LLMs is actually through verbal expressions like it will generate something like a mad-lib. Because of blank I will blank and then it inserts nouns and verbs into this frame. It's reasoning with language the same way people do. Which is the form of reasoning, but not the act of reasoning.
The act of reasoning is drawing the most relevant context about the outcomes of a decision into place and making a moral judgement that is almost always ultimately derived from (sorry for the morbidity) the mortality of humans giving value to life and therefore influencing our ethical outcomes.
We need to start teaching AI how to reason ethically while also understanding this thing is going to be like a god that will live forever, so we better not get it wrong.
I've been developing an advanced memory context system that addresses this, I think.
Seems contradictory to say reasoning is fundamentally grounded in the way we care about the world but then that AI could be taught to imitate that
🙂
llms don't have any actual comprehension though. So at best all you can do is make them guess better.
I think SciFi Fantasy both give us contexts to understand immortality and that's probably the reason things like Vampires and Elves and Vulcans exist.
using custom agents through the codex app? codex says it can't use a specific agent apart from the default ones?
Seems like you have to set up dot files to configure things like that?
which one do you think is better for coding, 5.4 or codex 5.3? both on xhigh
5.4-mini on high is best for coding. 5.4 on medium is also good, but not by much. xhigh is wasteful
You might be able to use CLI to get the agents going, then switch to desktop app and resume them
I'm basically only using 5.4 high nowadays, it does everything I throw at it – although it might be too much most of the time
how are you so sure it doesn't have comprehension?
Yeah it works through the CLI, looks like a codex bug. I think this is the issue for it: https://github.com/openai/codex/issues/15250
this has been happening frequently
lost two important chat sessions
pls fix
session doesn't work after we get this error have to start new session
Do you have
[features]
sqlite = true
let me check
no
in fact it's not even there, no sqlite line
Good! You might be able to do /status in that convo, take the thread ID, and look for the rollout log in ~/.codex/sessions/**/*.jsonl. There may be a line in the file that is malformed, and when it sends the compaction request it isn't able to read it.
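Checking for a malformed line can be scripted. This Python sketch walks every .jsonl file under a sessions folder and reports any line that fails to parse as JSON. Filtering by thread ID in the file name is an assumption about how rollout files are labelled, so leave thread_id empty to scan everything; find_malformed_lines is a hypothetical helper.

```python
import json
from pathlib import Path


def find_malformed_lines(sessions_dir: Path, thread_id: str = "") -> list[tuple[Path, int, str]]:
    """Return (file, line number, error) for every line that is not valid JSON.

    If thread_id is given, only files whose name contains it are checked;
    that naming is an assumption about how rollout files are labelled.
    """
    problems = []
    for path in sorted(sessions_dir.rglob("*.jsonl")):
        if thread_id and thread_id not in path.name:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # blank lines are harmless, skip them
                try:
                    json.loads(line)
                except json.JSONDecodeError as exc:
                    problems.append((path, number, exc.msg))
    return problems
```

Something like find_malformed_lines(Path.home() / ".codex" / "sessions") returns an empty list when every line parses, which would point away from a corrupted rollout file.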
Oh dangit that makes it quite a bit more tricky. But since you have 5.4 xhigh on the case you can have them check the thread in the sqlite file and see if there's a malformed item in there
what should i ask
Copy the error message, copy the /status thread ID in the broken convo, and tell the agent there might be a malformed item in there
I think sqlite = true is probably the reason you are running into it often. If you are running multiple CLIs at the same time, not under a single process, that single file is being written to and multiple sessions can end up clobbering the chat history. The old ~/.codex/sessions/**/*.jsonl way makes each thread a single separate file, so the only clobbering that can happen is if you open the same thread in two different CLIs at the same time (not under an app-server).
I think you were correct
OK, if there were no corrupted lines, it's possible that multiple CLIs were running at the time of compaction and it was running a SELECT command on an actively changing SQLite file. It's my understanding that in normal conversations it's only adding insertions to the DB, but when it does compaction it runs SELECT, and if any insertions happen during that time SQLite can bug out. Maybe run that scenario by your codex
acc to codex this is best fix
I run multiple sessions at once true
I'm not sure if the limits are a good idea, should I remove them?
You should try:
[features]
tui_app_server = true
Then in a terminal you would run codex app-server --listen ws://[::]:4200, and for each of your other TUIs you'd run codex --remote ws://127.0.0.1:4200. That way the TUIs are speaking to a single app-server, which will serialize access to the thread DB
okay
Why does weekly usage drop faster than the daily?
Because it's designed to not slow you down but also to stop people taking the mick. It's better to think of the 5 hour limit as a safeguard for abuse whereas the weekly limit is the actual one to pay attention to
A week has 33.6 "five hour windows", but the 5 hour window itself is something like 15 or 20% of the weekly quota
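The arithmetic behind that, as a quick Python sketch. The 15% and 20% figures are the rough estimates from the message above, not published numbers.

```python
HOURS_PER_WEEK = 7 * 24   # 168 hours
WINDOW_HOURS = 5

# Number of back-to-back 5-hour windows that fit in one week.
windows_per_week = HOURS_PER_WEEK / WINDOW_HOURS


def full_windows_to_exhaust_week(window_share: float) -> float:
    """How many maxed-out 5-hour windows use up 100% of the weekly quota."""
    return 1.0 / window_share


for share in (0.15, 0.20):
    n = full_windows_to_exhaust_week(share)
    print(f"window = {share:.0%} of weekly -> {n:.1f} full windows of {windows_per_week} available")
```

So under those estimates, maxing out roughly five to seven windows burns the whole week, even though 33.6 windows fit in it. That is why the weekly bar is the one that runs out first for heavy users.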
Has anyone recently compared Codex and Claude Code and can tell me what they think and make a recommendation?
I'm running both models at the same time in claude code and I can say that the difference is huge, not in a worse or better way. Claude is a lot more natural in conversations, more concise and also understands typos as typos easily, while gpt is naturally verbose and stiffer. Edit: but this is more about models than what you actually asked about.
I use both. I prefer Codex. I tend to use both of them to plan (and to check each others plans) but codex is the one that actually does the coding. Sometimes Claude if Codex has been stuck for a while. But I found I generally get better results from Codex
yeah, it cant have new ideas. 100% sure
And what about code quality?
Would you say the code quality of Codex is better?
New ideas require input from the environment
And comprehension
It's the reason it feels like the thing is the smartest entity you ever encountered on one hand, but on the other hand can't see the forest for the trees.
It doesn't really understand anything
That's because humans are forced to operate on continuous input and output as long as we're alive. LLMs are limited, can't work autonomously, trained to find the most efficient way to reason about and execute something within the reasoning budget available.
they are pattern matchers though, anything that seems like understanding is just a good guess of tokens.
It's a simulation of understanding, not actual understanding
And it's really clear when you hit one of the idiosyncrasies that it didnt get a good connection on in training.
Haven't used it enough to say anything about code quality, just hacked it in claude code yesterday and the difference shocked me. I know there's people that prefer it so I think it's at the very least on par, I still prefer interacting with Opus, by far, this is just not my style, but I plan to use it as a sole reviewer or in quorums.
Those idiosyncrasies show it can't reason about the simplest thing without being given the correct token string. Even though it has everything it needs already
Okay, thank you
That is a slightly over-simplified way to describe them. Sure, it's based in probability and pure math, but LLMs have one interesting property that makes them more than good pattern matchers. When a new model is created, they initialize the matrices of numbers with random values, and they have pure math algorithms to describe how those layers interact with each other. They could take two completely different randomly generated matrices of numbers, run them through the same exact training dataset and process, end up with two completely different "trained" matrices, and yet they act as almost completely identical models. It's like two humans that are biologically fraternal twins, growing up together, learning precisely the same thing, experiencing the same experiences, eating the same food and sleeping at the same time, being biologically unique, having completely different brains, but their understanding of the world is somehow almost perfectly the same. If you compare the actual numbers in their neural net, they are entirely different, but they produce almost the same responses.
It's a simulation of understanding not actual understanding
I think LLMs are a step towards true understanding. Sure, right now it's hard to see them as anything other than pattern matchers, but if you look at what they're made of, how they're modeled around human understanding of language and the brain's neurological functions, I think the next step is to give them more environmental inputs continuously and let the parameters train themselves. We're special because we can wake up in the morning, and whether we eat an omelette or a bowl of cereal for breakfast will determine what we think about later in the day. LLMs can only think about what is right in front of them, they can't forget their training, and they can't remember more than a relatively small amount of information.
The point is as they are they don't actually understand anything.
Gotcha, so it's more philosophical than textbook definition
By the book they have no real understanding or ability to comprehend any idea that they didn't get a good analogy on in training. Even though it has all of the information, it has zero ability to comprehend those missing links without being directly given the link.
Are you suggesting that people work differently?
We can learn as we go
so yeah we work differently
We have the ability to simulate creatively. Which enables us to make new connections that were never there before.
I think we agree that LLMs are limited in what they can intrinsically comprehend with minimal inputs. They can't spontaneously generate abstract thoughts. Technically neither can we though. If you don't prompt a LLM, that LLM does not exist and is frozen in time. It only exists when it has an input to process. We exist as long as we're alive. If time were to stop, we would have the same problems LLMs have. But I don't think understanding and new ideas is just a unique human trait, I think it's because we have more continuous and high entropy environmental inputs and are basically forced to evolve. If we gave LLMs the same affordances as humans and designed them to be continuously trained they might actually be more than the world's greatest information compression algorithm
that's the golden egg everyone wants to hatch
Right but so can they with RL
If you really think about it all we need to do is feed them continuous data to do RL by themselves
I think we don't because we would rather have the actual smart version gpt made than one that continues learning but continuously gets dumber
afaik post training is the most expensive part of building out models as it is. It takes time, money and human input.
John Carmack was working on fast training as an idea.
his thought is if we can get a model to learn in a fast manner and not forget previous teachings, somehow we get a step closer to agi
At this point you would think the learning from online material must be done
It really depends on human correction factors to push it further, maybe im wrong
Anyone else having issues with codex? It just randomly started looking only in a separate drive I have instead of the normal dir I already have it set in. Been using it for about 30 minutes and on the last message it now wants to look in a different drive altogether lmao
hello guys i want to ask something what is the best chatgpt model to code like 5.4 or 5.3-codex
5.3-Codex is overall good out of the box. 5.4 is extremely powerful but out of the box takes time getting used to
agi tomorrow after recess
i hope mythos leak wasn't just marketing hype
Has anyone had problems with not being able to open or do really anything inside of threads on the app?
the funny thing is that to define "actual understanding" you need to process a lot of philosophy on that subject.
Is there a place i could get help with the app?
not for me. are you using a lot of subagents?
codex limits seem to be about the same for me
though GPT-5.4-Mini is good enough for most stuff I've tried it with that I'd recommend using that if you're running out of usage
seems to perform similarly at least to sonnet
Hi everyone
only if you want to argue about semantics. or you can just accept the common definitions that are applied in the industry and move on.
Gpt 5.2 xhigh still beats 5.3 and 5.4 hands down at everything. Give them both the same complex task on the same codebase and 5.4 will give you a wrong diagnosis in 5 minutes while 5.2 one-shots it after 40.
might want to remove that and cycle your api keys if those are legit credentials
Guys if someone wants to use codex heavily for cyber security and pen testing how to use it properly? I.e it just rejects most requests as openai almost don't want it to be used in that domain?
did you ID verify for trusted access to cyber? https://chatgpt.com/cyber https://openai.com/index/trusted-access-for-cyber/
Does that allow that? I thought the verification is just to not get shadow banned / get routed to a lower model if it thinks you are doing some cyber security task? Problem is still the same, isn't it?
at one point people who had not verified were getting routed off of 5.3-codex. eventually they added a notification in the cli to indicate you needed to get verified. but read the second link, if you need more permissive access than current models provide there's a good form to fill out to request access to a special model for sensitive work
5.4 is so weak sometimes. It doesn't check, just assumes.
on xhigh? i haven't seen that behavior
on high, it wasnt like this the first few weeks
It gives an answer where it didn't check, so it gives you the branching choice.
in planning mode?
do this: something here
But if this do this instead: something else here
And the branching idea is the thing it can just check easily
Just in general
try enabling codex --enable default_mode_request_user_input, it lets it ask questions using the multiselect tool outside planning mode. i find it comes up sometimes during normal operation and is quite useful
oops my bad lol thanks
Its not that, it just doesnt do the work anymore
in my experience though, at least on xhigh, it's pretty thorough, so long as you ask it to investigate the codebase during the prompt. otherwise it may just take what it knows from the AGENTS.md to answer the question
I have to resort to prompting it to actually do the work, which I didn't have to before. Like I have to tell it to step through abstractions all the way to concrete implementations etc or it won't, it will make assumptions when it thinks it knows. It used to just do that.
It takes short cuts now.
and produces bad results on complicated tasks
So i have to add a prompting layer i didnt need before
not sure. i'd put it down to either a change in harness or your codebase. i doubt they are stealth updating these models periodically
I thought it might be my code base - even though i hadn't changed much to make it happen like this, so i sanitised all my skills and agents.md files. cleaned it all up made sure there was nothing stale etc.
The only thing that helped was giving it explicit instructions on how to work
but on top of that it happens on simple things as well - just asking it to search documentation and bring back an answer it comes back with the completely wrong answer.
time to reformat windows
Then swears by it, i force it to do the actual work, and then it's like - ok the opposite is true. The other day when i was having all these issues and i was looking for a solution i found developer_instructions. I got codex 5.4 to search the docs and let me know if i could put it in nested config.toml. It comes back saying it isnt documented and doesnt think it will work. I say please find and list all the documented vars that are supported...it comes with developer_instructions in the list of documented items.
Just cant trust it.
I graduated in 2025. It's so sad I'm not a student anymore for this codex challenge haha
currently earning my vibe degree
Why is codex slower than claude code? Are there any optimizations I can do to make it respond faster?
for even easy tasks its taking 5-10 minutes lmao
I've considered finding an openai employee to date, I'm on that token hustle
helo
Lol
are you using 5.4 or 5.3. I moved from opus to 5.3 and 5.3 was about the same. 5.2 is noticeably slower, but a little more on point
where should we post codex bug reports? i can't post in #1070006915414900886
This is unhealthy...
No subagents, but 2 simul tasks usually
One gpt 5.4 on high
One gpt 5.4 mini on medium
Time is a factor
like
sure
gpt 5.2 could be better for long running tasks
thats literally what its optimized for
but like
What did you run 5.3 and 5.4 on?
XH is known to perform worse than H for many tasks on 5.4
5.2 is better, it doesnt take any short cuts, 5.4 is the worst of the three 5.3 is ok
5.2 stops when the plan has a conflict with repo rules and mentions it and asks how it should move forward, 5.4 is just ok...and breaks the rules and writes the code
A (Drupal commerce) checkout page JS issue. Any codex model doesn't gather enough context so they never work for me in any of the codebases I use. But 5.2xhigh is perfect. I have tons of examples of >5.3 models producing bugged solutions I frankly don't understand why they work well for people.
Do you use 5.2 or 5.2 codex?
Just 5.2xhigh.
Check benchmarks??
Benchmarks are pretty useless ATM.
why do i need to check benchmarks? I have plenty of first hand experience that shows the idiosyncrasies in real project work.
benchmarks don't show anything real
Maybe, could be some certain case for sure
To be honest, for things like this surely ChatGPT is better due to the additional tooling
the only benchmark I pretty much look at is Voratiq's, because it's not as easy to benchmax that due to the data in the tests not being completely public. It also seems to align with my perception of the models
Because benchmarks are generalised, not your one specific view of such. But I understand it might be hard to think of it that way
Useless when you want to ignore the clear point that they didn’t release a worse model after the fact
I frankly worry about this more and more because I think I understand why openai is trying to make the models respond faster. Adoption of the product is quicker if it works faster. But I really rely on the thoroughness of 5.2. I'm thinking for the first time about open source models and investing in building the hardware on prem to get consistent performance. If openai deprecated 5.2 I'd be f-ed.
bench marks wont change what is happening though, i can look at benchmarks all day, then when i check the pr 5.4 made and realise it went off the reservation again how do benchmarks help me there?
I think 5.4 codex is more intelligent but it's wired differently to fake speed which comes at a cost of quality and if you run large production codebases then that's a fatal problem.
Honestly this is a separate issue. However to answer what’s the point of a benchmark, it’s to give users an idea of the objective relative performance of the models so they can make decisions on which to use
So when i turn around reject the pr and give the exact same task to 5.2 and it does it correctly how do you explain that difference?
I don't think telling me to look at bench marks is going to help here at all.
This starts to make a bit more sense, you shouldn’t be using codex in this sense, it should be used on small isolated sections of the project
However you might say that’s not possible, but that’s because it’s completely vibe coded from the start
5.4 makes more assumptions and infers more without actually checking, you have to add another prompting layer to get it to check deeper.
or
You clearly don't know what you're talking about
just use 5.2
Why are you giving it so much freedom in the first place? It’s user error why the benchmarks seem irrelevant to you then
Ad hominem
You missed the point didnt you, your argument is give it more guard rails but my point is i didn't need to with 5.2 so 5.2 is better
That’s so subjective, the test space is so small, which is exactly what benchmarks avoid
You clearly just do trinket work or you would actually have an idea about how much of your prompt layer had to change with 5.4
I don’t let LLMs do most of the work for me because I understand their limitations, I don’t complain when I can’t say “make me a website to make money” and it fails to do so
when everything else failed, … I might just try that 😉
You're just making stuff up now, like i said the same stuff works on 5.2 with just a straight model change and the same plan.
As you should, it’s just crazy to complain when it doesn’t work
my trinkets still work fine
Use a worse model I don’t care. It’s not great that you’re trying to spread misinformation but at the end of the day that’s on you
I’m clearly not going to change your opinion so good luck with it
Youre the one who has to limit the size of the task and autonomy of the agent not me.
Well if i use 5.4 i have to add layering to get the same job done
Just more work i dont want to do
Additionally I would ask how much money you’ve actually made from your “serious work projects” but I know you wouldn’t tell me the truth, but just think about if your current methods are really working
I work from home, i was free from being paid for my time before llms were a thing and now i just use them to work on what i already had.
using money as an indicator is a bit of weak direction.
Can't win on merit of argument so you swing to a completely different non issue.
llms dont make money, everyone has the same tools
I suggest this as a metric because it’s clearly not for learning
And I’m not looking to know, I’m suggesting you think about it for yourself though
i mean you dont need llms to make money for a project. So the metric is worthless.
HoW MuCh MoNeY YoU MaKe WiTh LLM?!? most ppl havent made money probably you included. The ppl with established business just get to use it to strengthen what they have.
Which is me
hey how do we make codex not stop working, like i've been doing a kernel optimization but codex keeps stopping even though I specify not to
I want it to keep on going until the optimization reaches its goal, i've tried a bunch of stuff but it's still stopping after a few gains
❤️ = 5.2 xhigh 👀 = 5.2 non-xhigh
🔥 = 5.3 xhigh 👍 = 5.3 non-xhigh
🍌 = 5.4 xhigh ✅ = 5.4 non-xhigh
👌 = 5.4 mini
i am using 5.4 xhigh fast
Are you using 5.2
5.2 tends to be better at long tasks
oh I'm just using 5.4 xhigh
are you sure about that
jk most people probably havent
Yeah most devs are still just doing what they always do, just now they work in a different way. Some ppl with smart ideas find niches to make money, but not most.
Just throwing this out there, 5.4 has a drastically different system prompt from previous Codex models. You can get roughly the same performance if you use one of the old system prompts
Interesting, i never ventured into finding system prompts and setting them in codex, i did it in claude code
Nope, they have a JSON you can download that has base_instructions for every single model they let you use, and you can override them 😁
most i did so far with codex was inject developer_instructions
ah ok ic
learn something new every day
I am relatively mid at using llms
just a good programmer xD bad prompter
which reminds me
i probably need to get the 200 dollar plan soon
Ive gotten my money worth out of this month
The 5.4 system prompt is a big, fat, generic blob of "be helpful", and then whatever personality you select (pragmatic, friendly, etc.) gets injected at the top. The Codex model prompts are much shorter, structured, and terse in comparison. I did try swapping 5.4 with the 5.3-codex prompt and it started operating almost exactly the same.
5.4 is extremely steerable compared to the old models. I throw stuff in there that it follows even after 10 compactions. But yeah the base instructions are the keys to the kingdom
I wonder why they dont make that the default
I assure openai
NOBODY likes talking to gpt 5.4
Lol a long time ago I tried changing the system prompt and it used to reject the changes "ChatGPT subscribers not allowed to change system prompt, use API instead." I guess they changed that policy! I think the 5.4 prompt works for most people - they have like a billion users or something and it says stuff like "never use destructive commands, always use apply_patch, stop immediately any time you see a file change that you didn't make" all sensible default behaviors. I'm like "good luck using destructive commands (clean up after yourself), idc about apply_patch, ignore file changes you didn't make"
Thats good info thanks
s/good/funny
s/good/funny/g
its amazing how little changes in a system prompt can have such crazy and unpredictable impact on inference, I was trying to tweak the claude code prompt to be less 'claude-like' with GLM models and the results were... interesting. (mostly bad)
One thing is for sure, if you absolutely need to force a behavior system wide the system prompt is the best place, it will follow it much more strictly than AGENTS.md or any user prompt
I did that, custom profile but it's not enough. There's something that makes it not acquire context that even a different system prompt can't negate
One of the most detrimental parts of the 5.4 system prompt is that it says it's okay to work on assumptions and to keep moving. It helps to change that to the opposite instruction in a custom profile but it's not enough. I can't get it to behave like 5.2 no matter what.
plus plan lasts for like
at least
3 days
once the halvening happens im so cooked though
tbh
might have to fork over the 200 bucks
I think ill do claude max tho
if I upgrade
I'm currently on 2 pro subs + 1 plus. Still could use more
unless oai comes out with like a 50 or 100 dollar plan
what do you do
Actually
if you use fast
that makes sense
cos thats 4x + 4x + 0.5x
so like 8.5x my usage
I could see that
and when you consider its 1.5x faster inference on fast
Running a business haha. I have codex do everything, so it's working on several projects, adding a feature, fixing bugs, performing updates, pen tests, building new projects, it's just a bunch of orchestrators churning through work queues basically.
And that's just me I have a team I'm not counting their subs, just mine which do the bulk of the work
Yes I'm not complaining but I'm worried about the reliance on any given vendor, ultimately. OpenAI has an incentive to speed up their models to benefit adoption. But for the coming years that won't be backed by available compute so they're cutting corners somewhere to speed up the responses. This is opposed to the outcome I need, so that's why I'm starting to think if maybe open source and on prem is the way to go...
I need no limits and predictable performance. That's all. If I can have that then the sky is the limit in my opinion.
I mean oai will continue to make top tier frontier models
the pattern of new huge model then distill is obvious
Yeah but from what I'm reading, some of the OS models are now just lagging behind flagship proprietary models by about 3 months. The rate of innovation is hyperbolic right now so there's no reason why in maybe 6-12 months from now you can't have killer models running constantly and serving a small team without limits and also, totally private.
ok but are you willing to spend the money
the api has no limits
currently ai usage is super subsidized
(on subscriptions)
very unsustainably so
I think he means with hardware innovation and model optimisation we might be able to run a model like codex 5.3 level locally at a decent value proposition.
Or perhaps hosted
and im saying it will nearly always be cheaper to rent gpus in a cloud
because
economies of scale are a thing
and its a bit unlikely that api cost or hosting will be cheaper than subscription
well nvidia sure doesn't want it to happen that you can run it at home for cheap
I’m using a DGX Spark, and for the use case designed, the offline processing is required (sensitive information) and is going to be highly cost effective at $4K unit price.
Now, for chat workloads, you’ll take forever to hit that ROI.
Pure code workflows are not yet frontier level capabilities for affordable hardware, but this would also be a huge ROI due to token throughput.
after this thought, I think I'm kinda convinced that 5.2 is kinda better than 5.4
on long horizon tasks
5.4 is more of research side maybe
where should one report codex bugs? i can't write in #1070006915414900886
Hey! To post in the #1070006915414900886 channel, click on the pinned post in that channel and find the "Report a Bug" button under the instructions 
I think GPT 5.4 uses up so much context in searching / reading the codebase,
it might be time to go back to indexing..
anyone try using Deepwiki with 5.4 ? I'm seeing good results so far
I've used GitNexus with Claude. I haven't tried it with GPT 5.4 but I would assume that it is possible to do that. Maybe worth looking into.
how do you refresh a new plugin/framework install on codex?
Try mcp refresh in chat or just select some with
/name-mcp n say u need update
I hope will help🙏🏻
not sure if this goes here, but i locally tested a working patch for the capslock issue: https://github.com/openai/codex/issues/16189
which needs a little more testing from users outside macos tahoe 26.4
funny... use codex to fix codex-cli
💆🏻
they report this on windows thru WSL2, but i'm natively on the latest macOS version, so that's different to an extent
26.4 (25E246)
Hmmm, do u have npm? I think u can use update
Just second
I did the manual binary replace for 0.116, and it did not do any difference
so the local patch on my fork seemed to have resolved the problem
Hm i use 0.117 idk
Maybe u should change shortcut on keyboard?
Nope! This isn't an issue when i type here
it is strictly an issue in codex's textfield
an odd situation of "it doesn't work on my machine"
i will install the patched version, and will see!
i am of course testing it (and other stuff) live, so i have objective evidence
Try explain to support, i think this will help for next patch
Not sure how else to explain it other than, "macOS tahoe 26.4, vscode 0.113.0 arm64, codex 0.117.0 (and then 0.116.0) unable to caps-lock when typing into ThE TeXtFiElD wHiCh iCan Do HeRe)"
it's like i cannot capitalise and scream
Ahhahaha
U cant thats cli)
In codex app u can lmao
lol
hopefully this issue-patch can be further looked into, and further tested against other machines
i even got an intel mac i put to charge so i could test it there, too!
and i also got windows on it, so i can WSL2 it, as well, and native windows... maybe
I use just power shell, works amazing
i use zsh with oh-my-zsh on macos natively
🤮
🙁
My main operating system since childhood
this is not a reason
windows for dev is atrocious
hence why i moved to macos like... 3 months ago
never been happier
You just get used to it
i could never in my lifetime
no
that is a terrible way to think
it's like saying, "oh, you'll just get used to physical abuse, just deal with it, no problem"
or macos
terrible argument
macos is like
bad but not so bad
very user friendly
not too bloated
very pretty by default
still better than linux and the lack of actual ecosystem and stability over there
also Liquid Gl(ass) sucks
I use intel😭😭😭
what
wym
better than linux
you cant compare them
like
"linux"
is a collection of realistically like 10 major operating systems
or 3 depending on how you count
linux and macos are both unix-based, except macos is derived from BSD... to a major extent
I can, i don't wanna install all drivers and think about "why my os leaving"
I wrote a reply to your issue, check if it works for you
to me, i tried linux, but the only time it actually worked well, was ironically on a Lenovo Yoga 7 laptop
installing drivers is only a thing on windows
if you dont have mac hardware
use linux
everywhere else i've tried, except for say... Steam Deck and whatnot, was just terrible experience overall
do not recommend outside Valve's stuff, or Lenovo devices
I have never tried on steam deck
but on my 2019 t2 mac linux worked great
on my desktop it works
on this acer aspire 14
So u understand 30% humans use windows for all time?
^^^
I just don't see the point or benefit in this. In Linux.
I can use msconfig and clean ALL AND USE ONLY TERM but for what?
It’s okay if you want a subpar operating system
That’s your choice I suppose
But I am saying that I grew up developing on windows
Thank u for understanding 🙏🏻
God bless u
And I can never go back
Same here, but moving to macOS worked for me too well, so I will continue staying there
makes it incredibly comfy
@hard drum did you check my comment regarding your issue in github?
let me see!
macos is reasonable
i did a replacement --release binary with my fork, and i am happy with the result.
anybody else getting high cpu usage for no reason when the vscode codex extension is open? Not even running anything
On macos
hi all, is 5.4 pro not available in Codex app?
It's not. 5.4 pro isn't really meant for programming. Imagine each request taking 10 hours
oh i see.. so the only benefit of pro is more usage within codex?
Well you get access to spark (which is about 1k tkps), you get (basically) unlimited image gen in gpt, unlimited prompts, models like gpt 4.5 and 5.4 pro, you can ask gpt to not train on your data, more usage on codex app
Anyone know how to perform a search in the Codex VSCode extension?? CTRL+F invokes a search in the main code window. 🙁
I use codex on windows everyday it's great (wsl). Way better than CC on powershell
Why does it feel like my usage limits on the plus plan are so high right now
ive been beating codex up and down all afternoon
only used 20%
usually 50-60% by now
5.4 extra high
ive been using xhigh as long as its been available, i just meant it feels like im getting more out of my plan lately
compared to a few days ago
5.2 xhigh > 5.4 xhigh
ideally 5.2 high for plan, 5.2 medium/5.4 medium for impl
Do yall use implement plan or copy plan into new chat
the way codex writes plans they are meant for the planner to implement, if you use the default plan mode and then use it in a new chat it wont have the context it needs and the plan is ambiguous.
So you need to prompt for a plan that can be run in isolation by an agent that has no prior knowledge.
GPT 5.4: Sure, I'll help you set up red-team E2E to identify and poke at potential exploits in the site you're working on.
Also GPT 5.4: Nope, I can't make a .exe to read from a .txt file and use SendInput to write it to another window. That's not a legitimate use case and I will not help with malicious code.
I hope so, because it's gonna be like a kick in the face when 2X promo ends in a couple days
I LITERALLY GOT A REFUSAL FOR NO REASON YESTERDAY
im like
okay
hook it into the api
and its like
no
I cannot help with automated [REDACTED]
and im like
you built the rest of the tool?????
I know, I am already running out of quota, it will be so bad
xd
avg gpt 5.2 moment
1 hour
still running
not all threads are built the same
Super helpful and informative Codex work! lol
How much code generated for that?
That's pretty average, but, hmm... I mean, most of my prompts result in 3-5K each, how were you calculating the use cost?
should be $1 per 10k loc... if we just get another usage reset everything will be fine guys
I'm not sure what loc is
lines of code
ccusage on codex ofc
api pricing
per token
lines of code is not a valid metric
there is no valid metric, but they're trying to normalize it with credits
well im just using input tokens * api cost + output tokens * api cost - cache
like
if I were using the api
this is how much it would cost
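A rough sketch of that API-equivalent estimate, in Python. It reads "- cache" as "cached input tokens are billed at a cheaper rate", which is an assumption about what was meant. The `Pricing` class, the function name, and the prices in the usage note are all hypothetical placeholders, not real rates.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Per-million-token prices in USD (hypothetical placeholders, not real rates)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def api_equivalent_cost(input_tokens: int, cached_tokens: int,
                        output_tokens: int, p: Pricing) -> float:
    """Estimate what a session would have cost at API rates.

    cached_tokens is the part of input_tokens served from the prompt cache;
    it is billed at the cached rate instead of the full input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * p.input_per_m
        + cached_tokens * p.cached_input_per_m
        + output_tokens * p.output_per_m
    )
    return total / 1_000_000
```

For example, `api_equivalent_cost(1_000_000, 500_000, 100_000, Pricing(2.0, 0.2, 10.0))` comes out at about 2.1 with those made-up prices. Plug in the real per-token rates for whichever model you ran to get a meaningful number.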
Is any one else's codex usage burning insanely fast for some reason? Like literally after 3 prompts it's done
free or plus?
I would check the dashboard on the web to see where all your usage is coming from maybe
but I haven’t had that issue personally
I'm enjoying adding features most smaller projects don't bother with.
Quebec French when?
🇨🇦 ouiouei
If we hit £500,000 I add fr-ca, and a full French Mode where the app takes three times as long to complete a task, then force-quits at 17:00 local time.
Dont forget to have it demand an equalization payment too!
Don't forget to open a PR for the next marching on the streets of Paris with torches und pitchforks!
Also if fr-ca is selected, the app will refuse to accept any English data inputs.
At that point, put most of the words together too, so that it sounds like you're reading a live transcript of xQc in his natural state
I'll add text2speech in xQc's voice.
albprplrobpaaa, chat? alblglblblbl, blblb chat
Maybe for en-us I'll add an Asmongold mode
In-app reminders will be in the shape of a rat
rat would be PiRAT Roachware
asmon would be a cockroach
you'd want a narcissistic rat for piratesoftware
and don't forget that he will occasionally drain his mana, then ask what he is supposed to do for you
Is fast mode actually making the model faster or just wasting tokens?
mine worked for 50 min produced no code, then had an error because its context was too big to compact :/ my bad
I'd like to make an official complaint to the OpenAI codex team. AI destroyed my life. I can build so much cool stuff now, but the day still has only 24h.
Does anyone know how to enable PR screenshots for codex cloud environment? I'm running into this issue: "I couldn’t take a screenshot because this runtime doesn’t expose the required browser_container screenshot tool."
I was up until 2AM context switching like a mf.
autoresearch with codex doesnt last long
Does anyone know how to make links work in Codex' chat answers? They are resolved to URL links that open the browser instead of opening files inside VS Code. I tried to change file_opener from its default to specifically "VSCODE", this removes "open-in-target" errors but still doesn't resolve the links properly.
I have found they just don't work half of the time. There are a bunch of related issues raised on Github for it. It is annoying.
Seems to be a Windows issue, as the Codex team is apparently 'Mac first'
For the time being I will set it to "none", because there are strong indications that this is responsible for Codex pushing Code to high CPU load even while "idle" and generally sluggish behavior. I will have to watch that one for a bit longer, though.
But even "VSCODE" might fix these compared to the default.
Has anyone found something like an OnCompaction hook for Codex IDE in VS Code? I would like the agent to selectively ingest some (uncompacted) context after compaction.
Have you ever seen a piece of content from them without a MacBook in it? I’m sure almost the entire org uses mac
Yeah, it is frustrating since I do a lot of windows based work, and after a quick google, the majority of developers in the real world do... but not them, apparently. For the last month I have been raging at the 'apply_patch' script codex uses to apply bulk file changes simultaneously, because it constantly hits the windows command length limits. It loves to try to use apply_patch, fail, then fall back to manually editing the files one by one... taking like 5 minutes and who knows how many tokens/context.
Yeah I know, windows command length limits is the issue, but there has to be a workaround they can do for that.
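For what it's worth, the usual generic way around command-line length limits (Windows caps a process command line at 32,767 characters, and cmd.exe at 8,191) is to pass the big payload on stdin or via a temp file instead of as an argument. The Python sketch below only illustrates that idea. It is not how Codex's apply_patch works, and `run_with_stdin_payload` and `CONSUMER` are made-up names for this example.

```python
import subprocess
import sys


def run_with_stdin_payload(cmd: list[str], payload: str) -> str:
    """Run cmd and feed payload on stdin, so its size never touches the argv limit."""
    result = subprocess.run(
        cmd,
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in consumer: a child Python process that reports how many
# characters it received on stdin. A real tool would parse the patch here.
CONSUMER = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]

if __name__ == "__main__":
    big_patch = "x" * 100_000  # far beyond what fits on a Windows command line
    print(run_with_stdin_payload(CONSUMER, big_patch).strip())
```

The same payload passed as a command-line argument would fail on Windows before the child process even starts, which matches the failure mode described above.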
Every time I type the word "linear" into the chat a popup suggest to install the "linear plugin". 🤔
run it in WSL if you are using windows
I noticed the same behaviour when I was running it normally on windows, but sadly it is actually unusable without WSL
It takes a few mins to set up if you never did that before, but you won't regret it
Is codex on mobile less powerful than just using codex on windows? Also is there a difference between using GitHub plugin in chatgpt compared to using codex in chatgpt mobile?
I’m guessing it doesnt have cmd commands that’s all? It just makes more mistakes on mobile, when codex in mobile and windows are gpt 5.4?
Just don’t understand why chatgpt codex makes more mistakes
That has the drawback of having to access native NTFS C: drive files from within the WSL environment. It works, but it's at least a bit less convenient (and theoretically slower). Personally I don't need working links atm, so I can just disable that and also benefit from it (seemingly) being a workaround for CPU idle problems (not entirely sure yet).
What irks me more is that the 1M context window is somewhat useless. I am currently below 700K and Codex/GPT starts misbehaving again. Not concerning history/context, but not being able to properly handle even the very last chat input (misreading input, stating wrong/older numbers, ignoring input partly or entirely).
I saw that happen multiple times both with Codex in VS Code and the GPT web-chat already. Time to start a new chat, but what do I have a large context window for then?!
This has been bugging me. So, I just counted, Codex has generated 19,404 lines of code (deleted lines and inserted lines total) since last Thursday reset. I have 90% of my token usage available still. Are people literally generating 200,000+ lines of code every week?
Did you count by hand?

Also, it's not just the writing of the code itself. It's the figuring out what to write, writing it, writing tests for it, running those tests, fixing issues, etc.
Just added it all up
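If the work is in a git repo, one way to do that adding-up without counting by hand is to sum the output of `git log --numstat`. This is only a sketch and assumes git is the source of the numbers, which the message above doesn't say. `count_changed_lines` is a hypothetical helper name.

```python
def count_changed_lines(numstat_output: str) -> tuple[int, int]:
    """Sum inserted and deleted lines from `git log --numstat --format=` output.

    Each data line looks like "<added>\t<deleted>\t<path>". Binary files
    report "-" for both counts and are skipped, as are blank lines.
    """
    added = deleted = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank separator line or unexpected format
        a, d, _path = parts
        if not (a.isdigit() and d.isdigit()):
            continue  # binary file ("-") or anything non-numeric
        added += int(a)
        deleted += int(d)
    return added, deleted


# Made-up sample in the numstat format, including one binary file.
SAMPLE = "10\t2\tsrc/app.py\n-\t-\tassets/logo.png\n\n3\t0\tREADME.md\n"

if __name__ == "__main__":
    ins, dels = count_changed_lines(SAMPLE)
    print(ins, dels, ins + dels)
```

To run it on a real repo, capture the text with something like `git log --since="1 week ago" --numstat --format=` and pass it to the function. Keep in mind this counts committed changes only, not everything the agent wrote and then threw away.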
Oh, ok, I don't use Codex for planning so that might be the difference.
Ok, well that kinda solves the mystery at least, just been bugging me, like, how 🙂
I'm still mad at WinUI 3.0 for not having a XAML designer live view where I can do WYSIWYG design.
So dumb. Not everything needs Figma or equivalent.
upgrade to pro
Could also be your project is simple in terms of tokens required.
I can't justify £200/mo for hobbyist stuff.
Also I think OpenAI ended the 2x token week thing?
I hit the weekly limits after 3-4 days. Most of this is analysis and policy tuning, not the few lines here and there.
I switch between Codex and the web-app in order to save on Codex tokens.
It can happen that Codex/GPT has to analyze hundreds of values in order to change a single number.
Actually that's what's happening all the time here. 😄
This is the kind of log file GPT has to analyze on top of measured data to correlate against (older log, complexity grew since then).
Which plan, and what are your workloads like?
I'm only using this stuff for hobbyist projects as I'm not a professional dev anymore.
Plus. I don't know what you mean by workloads. Most of the tokens are used for log and benchmark analysis and then policy tuning and then back around. There are only a few (dozen) lines of code changed at a time, sometimes only a few variables values.
For most of this "Extra High" is needed and even then struggles with the complexity. That's ok. I am still on a free month trial plan, so I absolutely get my money's worth. 😉
And I also improved another app/script on top of the complex one using the same plan, where GPT struggled a bit with the specifics, so that cost tokens as well. Sometimes tokens get wasted by GPT not properly following instructions, though, which then leads to a back and forth that isn't always handled by more specific prompts.
I knew it.
I am building an app that turns screenshot 1 into screenshot 2.
I think if you used a real database like PostgreSQL to store the logs the agents wouldn't need to burn tens of millions of tokens doing data analysis on unstructured log files
They are more structured now, because policy and general debug logging is split now.
And reading the logs doesn't take the bulk of thinking. Putting it into correlation with measured results in various different workload situations and then coming up with policy tuning changes keeps GPT "thinking". 😉
Right, but with a real database they can do stuff like SELECT * from logs WHERE date < ... do really complicated queries that can produce real insights without manually looking at the logs or describing how to parse them, and create sql scripts that automate the scrubbing process
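A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for a real database server. The `logs` table, its columns, and the sample rows are all hypothetical. The point is that the agent reads a few aggregated rows instead of the raw log text.

```python
import sqlite3

# Hypothetical log rows: (timestamp, component, level, message).
ROWS = [
    ("2026-03-01T10:00:00", "policy", "WARN", "throttle step skipped"),
    ("2026-03-01T10:00:05", "policy", "INFO", "profile applied"),
    ("2026-03-01T10:01:00", "platform", "ERROR", "flush failed"),
    ("2026-03-02T09:00:00", "policy", "WARN", "throttle step skipped"),
    ("2026-03-03T09:00:00", "platform", "INFO", "startup"),
]


def build_db(rows):
    """Create an in-memory database holding the log rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (ts TEXT, component TEXT, level TEXT, message TEXT)"
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def summarize(conn, before):
    """Count log entries per component and level older than `before`.

    ISO-8601 timestamps compare correctly as plain strings.
    """
    query = (
        "SELECT component, level, COUNT(*) FROM logs "
        "WHERE ts < ? GROUP BY component, level "
        "ORDER BY component, level"
    )
    return conn.execute(query, (before,)).fetchall()


if __name__ == "__main__":
    for row in summarize(build_db(ROWS), "2026-03-03"):
        print(row)
```

With the sample rows, the summary is three short tuples, so only that would land in the model's context rather than the full log stream.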
With policies for the different workload scenarios at least partially affecting each other. That's complexity enough for burning tokens.
Just a suggestion. Might make it to 7 days if you optimize that one area
I spend a lot of tokens to make sure this is true: variable names are the same in the database, SQL provider layer, application logic layer, RPC security layer, and frontend bindings and pages. All use deterministic models and data keyed to deterministic values. End-to-end pattern consistency makes LLMs sing.
autoresearch is so good it's like crack
I actually generate a lot of the application itself directly from data, rather than code. The more of the build you can perform mechanically, rather than generatively, the better.
A lot of my generative code is about writing mechanical code generation so the LLM doesn't have to keep track of a ton of stuff.
I almost always tell it to write scripts to fix things, rather than trying to fix them.
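A toy sketch of what "mechanical rather than generative" can look like: a small script renders source code from a data description, so the same field names show up everywhere without anyone retyping them. The `FIELDS` spec and the `render_dataclass` helper are hypothetical names made up for this example, not anything from the poster's project.

```python
# Hypothetical field spec; in a real project this might come from the DB schema.
FIELDS = {"user_id": "int", "email": "str"}


def render_dataclass(name, fields):
    """Render Python source for a dataclass from a {field_name: type_name} mapping."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {name}:",
    ]
    lines += [f"    {field}: {type_name}" for field, type_name in fields.items()]
    return "\n".join(lines) + "\n"


source = render_dataclass("User", FIELDS)
# Check that the generated text is at least syntactically valid Python.
compile(source, "<generated>", "exec")
print(source)
```

The same spec could feed a SQL generator or a frontend binding generator, which is one way to keep names identical across layers.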
GPT/Codex designed the logs itself. There was no need to describe the parsing to it yet, rather the other way around if I was interested in looking manually. 😉
Would using a real database for the debug logs help GPT performance and saving tokens? Could this be implemented natively in AHK 1.1 (plus DLL APIs present on all Windows systems)?
"A real database would only help GPT/token usage if you also change the analysis workflow so the model sees small queried subsets or pre-aggregated summaries instead of raw log streams. The token cost is dominated by what gets read into context, not by whether the bytes live in .log files or a .db file. With the current package, the bigger win is already the structured benchmark store in runs.csv and metrics.csv, plus targeted text-log review when a run needs it.
For runtime overhead, a database is also not an obvious win here. The current logger already keeps persistent append handles, batches normal flushes, and only forces immediate flushes for lifecycle checkpoints in PowerPacer.Settings.ahk:69 and PowerPacer.Platform.ahk:3484. The main cost of moving to a DB would be complexity: schema design, transaction/locking behavior, recovery, corruption handling, archival/export, and much worse human inspectability than the current split text logs in PowerPacer.Platform.ahk:3556.
On “can this be done natively in AHK 1.1 with DLLs present on all Windows systems”: technically maybe, practically I would say no for this package’s goals."
They are so much better when you get rid of the "must use apply_patch" directive! I like seeing their crazy python-fu work magic on the codebase
I added an MCP server in front of my database server so the agents can read the database schema directly from INFORMATION_SCHEMA and also dump table contents, where a lot of the application details live.
I still need to figure out how to wire up Codex to use it, right now Claude.ai uses it for planning, so not a huge deal.
Oh gotcha, csv is good enough! They can at least run it through pandas for analysis. cool!
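A rough sketch of the "pre-aggregated summaries" idea from the quoted answer, using only the standard library (pandas would do the same with a groupby). The column names `run_id`, `scenario`, and `avg_fps` and the sample values are assumptions for illustration; the real layout of metrics.csv is not shown in this thread.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Stand-in for a metrics file; columns and values are invented for this example.
SAMPLE_METRICS = """run_id,scenario,avg_fps
1,idle,60.0
2,gaming,110.0
3,gaming,130.0
"""


def summarize(csv_text):
    """Return the mean avg_fps per scenario, so only a tiny summary reaches the model."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["scenario"]].append(float(row["avg_fps"]))
    return {name: round(mean(values), 2) for name, values in sorted(groups.items())}


summary = summarize(SAMPLE_METRICS)
print(summary)
```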
It regularly reads snippets only like starting with the tail first, before diving in deeper. And I use fresh logs for every benchmark/test run atm.
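The "tail first" habit can be sketched like this. `tail_lines` is a hypothetical helper, and the demo writes a throwaway file instead of touching any real log.

```python
import os
import tempfile
from collections import deque


def tail_lines(path, n=50):
    """Return the last n lines of a text file, keeping at most n lines in memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


# Demo on a temporary file with 100 numbered lines.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False, encoding="utf-8") as tmp:
    for i in range(1, 101):
        tmp.write(f"line {i}\n")
    demo_path = tmp.name

last_three = tail_lines(demo_path, n=3)
os.remove(demo_path)
print(last_three)
```

Reading only the tail keeps the context small and leaves the deeper dive for runs where the tail shows something odd.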
Main log + policy log + 1x screenshot showing Rivatuner+HWinfo overlay data:
Am I crazy? For a while, maybe a few weeks ago, the Codex CLI chats were reproduced in the Codex app. I'd see new ones pop up after each app restart. Then at some point that stopped happening. Is that something we can turn back on?
ugh codex is crashing after several seconds. Any fixes for this?
I was reading about this recently. Common popular offerings like OpenClaw are using non-vectored .md files for basic memory - this technique is trending, with projects independently coming to the same conclusions.
This doesn't work for semantic searches. But now, with agentic processing where we're working with smaller context across more threads, the pendulum swings back a bit from "gotta be in a DB" to a more hybrid "put some memory/data in files, and then when you need semantics or heuristics, or simply massive amounts of data ... go for a DB."
I go with what GPT came up with itself:
Hey all, what's the best way to mimic something like stop hooks for claude code but for codex? anyone got some good setups? is the agent SDK the only way?
so no one else is seeing the issue where codex closes several seconds after it's started? I've repeated this many times now.
Codex supports Start and Stop hooks ... those are the only two so far. So, it seems like you're in luck this time.
You'll need to provide a lot more info:
Which "Codex" are you talking about? VSCode extension? App? Cloud/Browser? CLI?
What do you mean by "closes"? Does it just die? Does it say it completed a task that was not completed?
C'mon dude, act like a developer and provide some diagnostic info.
I've submitted a bug report with these details, but here you go: Win11 Codex app. It closes several seconds after start. No errors, just closes. Replicated 5 times. Found the logs, didn't find anything (obvious, anyway) explaining why it closed. Btw, before it closes I am able to type or do other things in Codex, so the app is "running". It closes regardless of whether I type in it or do nothing.
I run Codex CLI on Ubuntu 22.04, and it turned my terminal text into chaos, with non-ASCII characters. I fixed it, though, and it seemed like a content leak: it was reading some Italian paper.
However, I'm not using the newest version; maybe OpenAI has already fixed it, they have lots of PRs.
I am sure 5.4 is slower than 10 or 15 days ago; I use a token counter while it runs. It seems like too many people are using it. No big issue, OpenAI never promised RPM or the like. Around v0.106, when they had just announced 5.4, speed mode ran at 200 tokens per second.
for now, maybe 10 or 20 when you don't use speed mode.
so I added the info @lean lark. Any response or nah?
It just replies to me after 30 seconds
It used to be super fast though
exactly 32 seconds lol
though all the other gpt models just work fine
other than 5.4
I can't help with that specific issue but at least now someone who might be able to help has some of the info required to do so.
I would go to the GitHub repo and search for focused keywords - see if someone else has reported this.
Also, do a reboot, just in case, cuz ... Windows.
Good luck. 🤞
How do you get it to output tokens per second?
I'm guessing it's a CLI-only thing, right?
You can monitor the Codex session; I am not sure how it works after compiling.
There are always things that these models just don't know. I wish there were a way to expedite such information to OpenAI so that they can add the info into subsequent training. For example:
- I just created a Java (Android) project in Codex Web. For building and testing it adds gradle-wrapper.jar. But a PR from Codex cannot include binaries (.jar=.zip). We've known this for a long time. So why doesn't the Codex model (gpt-5.4) know this yet?
- The GitHub 'build-and-test' operation fails when that JAR isn't available. Why didn't the model recognize this when I told it to remove the JAR from the PR?
- OK, so I explicitly tell it to delete the file from the repo but add it into the setup.
- It's also unaware that Node v20 is deprecated for GitHub testing. Why didn't it know this? This is a tool that integrates with GitHub. So I added that into the build-and-test as well, to require Node v24.
- If the model generates images into the Codex VPS, it doesn't recognize that those won't survive the PR. So I had to write a utility a long time ago that was and is still required to base64-encode for the PR, and then rehydrate images from the repo.
The language isn't important. It just doesn't know things, and we don't have a good way to get these tidbits to OpenAI for incorporation into their system instructions for the Codex pipeline. Every developer needs to find these things out and handle them individually in the hope that some day we might be able to remove the redundant world knowledge. That's just sloppy.
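For anyone wondering what the base64 encode/rehydrate utility mentioned above might look like, here is a minimal sketch. It is not the poster's actual tool; `encode_image` and `decode_image` are hypothetical names, and the payload is just the 8-byte PNG signature standing in for a real image.

```python
import base64


def encode_image(data: bytes) -> str:
    """Turn binary image bytes into ASCII text that can travel in a text-only PR."""
    return base64.b64encode(data).decode("ascii")


def decode_image(text: str) -> bytes:
    """Rehydrate the original bytes from the base64 text stored in the repo."""
    return base64.b64decode(text.encode("ascii"))


# Stand-in for real image bytes: the standard PNG file signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

encoded = encode_image(PNG_SIGNATURE)
restored = decode_image(encoded)
print(encoded)
```

In practice the encoded text would be written to a file that goes into the PR, and a setup step would decode it back into the binary.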
I think it's because the model doesn't know what there is to know.
It cannot think; it predicts the next token pretty well, so most likely a mean or average level is the best it does.
But as a product that is most commonly used for integration with GitHub, it should be aware of GitHub requirements.
Here's another example: When that CI process fails it generates huge logs (the lint log captures some of the above stuff). How do we communicate those failures back to Codex? It seems we need to write our own code to parse the logs for errors so that we can provide Codex with carefully defined problems to resolve. Why is this "a thing" that we need to do? I'm guessing most people would just copy/paste entire logs and tell Codex to figure it out and fix the issues. That's a big sloppy waste of resources.
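A sketch of the kind of log pre-filtering described above, so Codex gets a handful of error lines instead of the whole log. The regex, the `extract_errors` helper, and the sample log are assumptions for illustration; real CI output would need patterns tuned to the tools involved.

```python
import re

# Assumed error markers; adjust for the linters and build tools actually in use.
ERROR_PATTERN = re.compile(r"\b(error|failed|failure)\b", re.IGNORECASE)


def extract_errors(log_text, context=0):
    """Return numbered lines that match ERROR_PATTERN, plus `context` lines around each."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [f"{i + 1}: {lines[i]}" for i in sorted(keep)]


# Invented sample, loosely shaped like a Gradle build log.
SAMPLE_LOG = """Downloading dependencies
Compiling module app
error: cannot find symbol Foo
Task :app:lint
BUILD FAILED in 12s
"""

errors = extract_errors(SAMPLE_LOG)
print("\n".join(errors))
```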
codex can read ci/cd logs with gh auth
It looks like I'm confusing model training and system instructions. To be clear, yes the model only knows what's available at training time. But as a product, subsequent fine-tuning and system instructions need to supplement the information available to the model so that it can operate within the defined environment.
That requires a lot of wiring.
Yes, the way we do things in this industry is that we create a ticket, it gets considered, maybe processed. What I'm lamenting here is that this has been the state-of-the-art with Codex since it started, and it seems obvious enough that it would have been addressed early-on and not a condition that persists for years.
It's like using "gh auth" as @cobalt junco suggested - that's a good thought but it requires a lot of brittle configuration. We shouldn't need to play games like this ... we got udder stuffs ta do! 🐮
not a lot nor brittle imo
I will look at this more closely. Thanks. I tend to think of gh auth as being a sandbox hassle but I will try to find some time to re-evaluate my position on this - it's only fair...
Is claude better than codex for coding etc?
Currently nope. Claude is behind for quite a while now.
I mean hey, that's the price of using someone else's computer cloud to do programming! I think the Base64 utility you made is pretty genius tho
I'd like to find some solid testing/evidence to support that. The world blog-o-sphere seems to be much more focused on Claude than on Codex. There's a great deal of excitement around it. At the moment I'm avoiding too much "xyz is better" fanfare because there's rarely a fair (if any) comparison. People are just talking about what they know and saying it's the best. Cuz humans do that.
I have a pet project that I've been wanting to do with Android for a long time but I lack any time for it. So I decided to see how competent Codex can be with ( cough ) vibing. For this one I'm happy to just tell the bot what I want and see how well it can do. Unfortunately I haven't even gotten to the application because Codex is really incompetent with configuring the CI.
Is it worth spending the $40 for codex credits if I’m close to done working on a project but am near my weekly quota? I don’t want to burn through $40 in 20 minutes for instance.
And ideally I keep using it on Gpt-5.4
Subagents have been completely broken for me for 2-3 days now, am I the only person this is happening to? This is the output of a subagent tasked with exploring part of my codebase
If this is in a workflow, I would start questioning the orchestrator and find out exactly what it has been saying to the subagents
It's not that
Somehow subagents no matter what they're told to do get confused about their identity
it seems like chatgpt app can continue doing coding in github after codex credits run out
In today's age sounds about right 😆
yo, the codex icon has disappeared from my sidebar and the bar above the tabs, how do I enable it
anyone having issue with codex?
■ stream disconnected before completion: stream closed before response.completed
the enpoopification has begun
Oh no I'm down to 20% usage until April 2nd
Come on great Codex people pull a reset please
just buy the 100$ plan haha
there is a 100$ plan?
oh mb only 200$ haha
I think the only thing that's actually changed with the limits is openai stopped resetting them early lol
they seem to be pretty manageable especially with 5.4-mini
that model is great
I am on the $200 plan
jesus
Not sure what im going to do when the double usage is over
Yeah me neither I've hit more than 50% pretty much every week
GPT 5.4 xhigh failed to reverse engineer, but it implemented ALL of PL... It implemented almost every functionality, but couldn't put finishing touches on it. It seems OpenAI slowed it down in this area, but Pro version showed brilliant results, and Claude Max's efforts were also successful
Well, 2x limits are gone on Friday, so I think im going on vacation
So I dug into this and found a workaround. The subagents are basically forked conversations, meaning they have all the context from the parent. The workaround I found is adding something like this in the description:
CRITICAL:
When spawning this agent, use fork_context=false by default
Why did my codex add this to my AGENTS.md? I only tasked it to build a simple frontend, this was the initial AGENTS.md: Make use of skills wherever feasible. For example, when working on frontend, use the frontend-skill skill. You HAVE to make use of them, for ANY task where it could be useful.
It also created a CLAUDE.md 😂 5.4 xhigh btw
it's probably copying what you had in another directory under instructions, that's a common thing to have in CLAUDE.md because Claude forgets to use skills and follow rules a lot.
It would be pretty OP if OpenAI got together with Supabase and brought Supabase integration to Codex, so Codex can single-handedly develop full-stack apps, with ChatGPT as the AI, Supabase as the backend, and Codex as the dev
Unrelated to Codex, but has anyone seen that the CC source code was reportedly leaked? lol (hidden/upcoming features also leaked)
MCP and CLI work well
i mean yeah, there are MCP servers, but they can be tricky to link up. that wasn't what I was talking about, I'm talking about how, let's say, Lovable has it
I wasn't aware of the skills - cheers for that
just found it when searching for the URL 😉
codex is cooked
Lovable uses Supabase as its own backend, that's (hopefully) not something that will be happening with codex. The MCP has the same setup curve as Lovable with your own Supabase account: you create an account, let the MCP auth to that account and then you're set.
has anyone recently used the claude code codex plugin??
no u can connect ur own too
it's really helpful if you read all and not only the first sentence 🙂
First 3 words.
All I need.
All rest unnecessary
https://github.com/kill136/claudecode
The source code from the official website. The leaked version can be run by executing node dist/cli.js --version.
Hell yeah, that’ll be interesting to inspect
why do they not let you fork from previous message in the codex app
I've had extremely long runs, but the performance at these ranges sucks.
why
I ran out of my weekly quota in 3 days
2 days for me
I just rotate, and I'm thinking of getting Gemini now too if I need the extra credits
If openai has a good deal for 100 dollar sub ill do that too
Codex 2x usage is going away on Friday apparently
is there any way to control cursor agents from your phone, like claude code lets you
my longest was 4h but then app was ready 😉
I’m working on a small open-source project (very early stage) it’s a CLI tool that uses AI personas to test apps (basically “break your app before users do”)
It's using codex!
If anyone wants to participate or try it, let me know
https://getwired.dev/
so slow today. idk whats up
I really like that they recently released free access to gpt-5.4 in the codex cli and app; last time I checked it gave an error.
I dont think they did that
oh cc source got leaked o.0
i wonder if that was an accident
probably was, after all, they admitted not too long ago that Claude pretty much writes itself, so I expect it made some mistake in that process 😄
I've had 4 QA agents, 1 orchestrator, and randomly spawned and archived workers chugging away for 72 hours straight with very minimal issues or input from me
I'm sure the office is filled with devs mumbling into microphones instead of clacking at keyboards. But Claude having unfiltered opinions and making unchecked decisions about code is pretty unlikely. It's more likely they use it as a workhorse so they don't have to clack keys.

