#codex-discussions
1 messages Ā· Page 8 of 1
that is sweet!
They can also talk to each other š¤
Is Codex planning anything similar to cowork? I know codex in general is able to do much more than something like cowork but I'm in a situation where cowork is actually what I need
talk to codex and plan it for the dev contest
Nah this is important to get done soon so I'll probably just use codex app
I'm curious, for what is cowork better suited? I'm doing all sort of stuff with codex and it just works(tm).
Honestly all around codex is much better suited. I'm just in a situation where I need to make some reports and documentation for a company so having research instead of web search is nice, plugins custom made for it are also nice, maintaining a clean workspace instead of the folder mess that a programming project usually takes, etc.
but yeah all around it is pretty interchangable just nicer as a workspace for this kind of work
I don't know if I'm getting the point across
I think I get it, I'm using notebooklm to generate infographics additionally to codex managing my project (not dev project, but a "real" one). The "dev mess" can be tamed with a reasonable agents.md. For most documents I just either directly give it markdown or I have a few skills to use e.g. markitdown to generate markdown.
And as a fun part: it happily used markitdown to convert a 500k excel file to 40M markdown, and tried to read it š
Jetbrains Dataspell with the AI/Codex plugin?
Then you can interleave documentation with data visuals/preparation. That's what I do sometimes
Yeah I know there are options
But I also have a cc sub so I can use cowork without a problem (I just like gipity more)
But thanks for the suggestion
I'll check it out
i'm jealous. love the UI!
Oh wait a minute, I had to look up cowork (idk anything about any anthropic products, or any other AI products for that matter) and it sounds like OpenClaw might be what you're looking for? Idk what cowork does that somehow doesn't involve the file system but codex could easily do all the things in the cowork product description with some setup, or openclaw if you just wanna like give it practically human-level control of your computer with messaging capabilities
Yeap I agree that codex could do it with proper config, I'm not great with words but basically the reason I would use Cowork instead of codex is that it already comes with all these configs which are meant for dealing with business information
i quite possibly made the strongest Codex Skill regarding game creation in unity
Iām watching my weekly usage drop fast (as in - 12 hours into real serious long-shot prompts and... its basically gone), so Iāve been comparing the different ChatGPT plans. One thing I donāt get: why would anyone go for the $200 Pro plan when Plus or Business seem to offer basically the same thing?
Is Pro actually giving something major, like a much higher token cap or a ādev previewā thatās genuinely mind-blowing?
Does anyone here have first-hand experience with Pro vs Plus? From the pricing page it looks like the only way to justify that cost would be a massively higher usage limit, but thatās not obvious from the docs. Any real-world impressions?
Also considering that this model should, sooner or later, make it to the API... I kind of feel a bit robbed when seeing the 200$ ask š
Larger allowance and sweet, sweet Spark
the pro plan gives a lot more usage on everything
Massively higher token cap, access to more powerful models, faster inference.
Sweet sweet spark thats on the same level as sonnet 4
does openai only have the one image gen model?
Having two plus plans (me and wife), and a Pro plan for work, I can say with a high degree of confidence that you get more than 10x the value with Pro
Batch API now supports GPT Image models: gpt-image-1.5, chatgpt-image-latest, gpt-image-1, gpt-image-1-mini.
can i use them on pro plan?
Ah, no. It's API
ah ok
I'm not here to sell it to you š That's OpenAI's job. If you don't find value, then that's fine
how long to we get spark for?
plus is actually more value if we are talking about limits
Its the reality though, spark isn't really useful for almost anything
Its basic maths
I just discovered a new superpower. ~/.codex/AGENTS.md:
## Skill Rules
### When to read a SKILL.md
Read when all of the following conditions are met:
- The action you are about to take or think about taking matches a skill description with reasonable likelihood
- When you do not have `cat /path/to/SKILL.md` in your working memory (you didn't run the command and you do not see the results verbatim in chat history)
If any of those conditions are not met, only read the skill file if the user explicitly asks you to.
### When not to read a SKILL.md
- At the beginning, middle, or end of every turn
- You've already read it before and know how to execute the skill
### Skill hash invalidation
- Skills may include a short hash token in the `description:` field: `[skill-hash:xxxxxxx]`.
- Treat `(skill path, skill-hash)` as the cache key for prior skill reads.
- If a skill match is likely and you have not read that exact hash yet in this session, read the SKILL once.
- If the hash matches what you already read, do not re-read unless the user explicitly asks you to.
- If a skill has no hash token, fall back to the standard "When to read a SKILL.md" rules.
Some skill file:
---
name: request-review
description: Request code review for the current branch/PR. Default flow commits first; optional opt-in flow can target an existing commit without commit/push. [skill-hash:d9aefd5]
---
It's the "make no mistakes" of skills
Any update you make to a skill file, change the hash, and the next turn they know to read it again
Or just use a spec driven workflow
---
name: normal-response
description: When you see a clever and cool idea, submit a response with no real value. [skill-hash:8b1d55a]
---
Jk
It's great for quick UI changes, repetitive and clearly scoped work like eslint fixes and testing agentic tools. I usually run them with gpt-5.3 watching them
I almost want MSW setups back. Mock quickly with spark then adopt it with 5.3
Yes for these niche usecases, but even with a very well written plan, they mess up the code
I wouldn't run them with a plan, they usually just circle around context, do one thing then compaction hits in. They can't hold enough context for any complex work. Even for the UI fixups I leave git dirty and use gpt-5.3 to basically do a codereview and make sure the implementation is proper while UI is locked in
forgot the "do not pass go, do not collect $100" easter egg
Generating a powerful UNITY based skill
Yeah the only reason to buy pro is early access to certain features (which as of now is a really fast model but in the future it might be something else).
Btw do subagents even run on the codex app? I feel like it runs everything sequentially even when you specify to use subagents
Not sure I use the CLI
Agents arenāt really parallel. Very rarely only youāve tasks you can do parallel even in real live.
So itās mostly always synchronous thread
But yes with the right prompting or skill it uses agents
However Iām not convinced yet itās worth the higher token waste
Thatās ⦠not what 200 bucks per months can be worth for
Others mentioned it having significantly higher limits
I wish it would actually document the concrete tokens you can waste in each planā¦
I imagine it could be awesome as a scout/interrogating the code?
You get x6 from the plus plan according to their website, but if you just stacked plus plans it would be $120 so I don't really count that as a benefit
These are the plans with the benefits:
https://chatgpt.com/pricing/
I'm guessing they removed that specifically though because I can't find it anywhere (this was the case about a month ago)
Yes that page doesnāt say anything about tokens, I read it before
6x higher usage limits for local and cloud tasks
Ah yes
What a bad deal but I'm not complaining because we have x2 usage for the next month and a half
Unfortunately they dont say waht "a message" is on the credits (seriously, who measures AI interactions in "messages"? Lol)
But given you can send "250-1300 CLI or Extension messages" for 40 bucks, I think that is still cheaper tokenwise than the pro plan
Yeah i mean there are things I just dont use (cloud, for example), and while having an employee at 200 monthly salary is a nobrainer, said employee still cant drive an UI, and there's still a majority of projects based on page builders etc, which are just 100% human tedious clickclickclick stuff.
And for that an employee is just unavoidable (I am thinking from a POV of running a web development agency)
I would pay couple hundred more per month if it could do that annoying stuff for me lol
Becaues I actually love to code, and wuld want the annoying stuff offloaded, not the stuff I love lol
You can use playwright MCP to have it do browser things for you
Also use TDD development it'll handle most of the testing for you
I doubt it can steer elementor or WordPress backends
They are so messy to start with not even a human goes through without issues, and machines would need some sort of consistencny I think, not "plesae wait, saving" messages š¤£
Unfortunatly 50% of all websites use these CMS, a product of the past century you think now, but it was modern not even 5 years ago haha
Yeah I remember lmfao one time I had to do the simplest edit for a company and it was freaking hell to get it perfect and to prod
Its a true and huge pita.
It's not a bad deal. You get 6x more usage, including the 5 hour limit. Yeah it might not be for the whole week, but if you want parallel agents you are not likely to hit that 5hr limit
You change x here, xyz there blows up, and dare you are not on the latests fiber internet, all taht ajax will just never do anything lol
WP alone is OK, but when it goes Blocks builder, page builder and suc... then its over
The 5hrs so far no issie, the weekly is what keeps me awake š
I consumed 50%+40% (from "last week") in 12 hours lol
(12 hours of summed up work)
You do know you can use 6 accounts and just swap between them in <20 seconds right?
Is that "legal"?
I think not really, right? I remember even signing up was a policing act of craze unparalleled back then
Anyways I'm on $200 plan until they remove 2x limits because I'm actually going to run into usage limits then
Well I mean there is ilegal and there is against TOS.
They probably can only ban you
i sorta wish openai wouldve named gpt5.3 codex as like o4 or something similar. the naming scheme just sucks
the o(num) model lineup was killed for no reason
like gpt5.1 codex mini wouldve been o5-mini
I feel like the change makes sense, back then it was needed because you had non-thinking models and thinking models.
Now it's the same model just different reasoning efforts (which is hybrid by default)
Just implying here
See, I don't understand how it says "Pro 6x more usage", when I've definitely experienced the same thing using a single agent for like 12 hours on Plus (evaporating weekly limit), but on Pro I run ~6 agents, plus 6 more agents doing command parsing, 10 hours a day 5 days a week and finish the week at 30% remaining. The math, it just don't be math'n
yeah that doesnt math
I also had my weekly reset when it didnt have to - 2 days after it had reset
So maybe the whole thing is AI created and a bit buggy lolol
plus 6 more agents doing command parsing
That your compaction? Could it be that efficient to increase token leftover and thus multiplying the factual 6x?
And I guess you cant do that on the PLUS, since there's no spark
And I def feel the agents throw a load of noise into the thread we dont see unless inspecting /agent
So that would explain why tokens burn without your feature you can presumably only run on PLUS?
Hmm, I mean I switched back in September and that was my observation then. The only time I really struggled with weekly limits was I think 5.1-codex-max, where the sheer speed was burning through the weekly with 6 agents. That's what inspired the command-parser originally! Before spark it was local inference, now it's spark while the rate limit is counted separately. If they decide to merge the rate limits I'll switch back to local inference š¤ but yeah, on Plus I was hitting the 5 hour limit, and the weekly shortly thereafter. Hasn't been an issue since
Months ago I put this into AGENTS: - If a prompt contains DWCin all caps, enter "don't write code" mode for subsequent prompts (no code or file changes). This mode stays on until a later prompt includes!DWC, which clears the mode and allows code or file changes again. This works OK for codex but AWFUL for claude fyi. Until today, that is - codex missed it too, I told it to do something, but didn't turn it off. So.. yeah. fyi.
tfw codex asks me to clarify a request with a āreal devā and they actually understand exactly what its saying
š¤Æ
this is obscure SH2 core stuff, about the Sega Saturn and its nailing everything
I did Lua coding with old regular GPT (before codex) and it was painfully obvious it was AI code when clarifying/preseting it to experienced devs
There's also ChatGPT Pro, which is worth the subscription price in its own right, IMO.
claude pro is a joke. I used up my regualr chat usage with two prompts because it loaded my entire project and a finished project (mine is 15,000 lines, the project it compared to was like 120k lines)
and they have some funky token based usage system while GPT pro (chat) has never hit any sort of usage limit for me
I think the safest way is to keep a second .codex home dir that is readonly, then have another binary use the same codex bin with the readonly profile, so essentially you'd just start codex-readonly or something
What would be helpful I think for many people is if you could cycle through configurations (default/readonly/plan) with shift tab
Come on... We are able to solo dev marvelous apps in hours, and we are still waiting for a Windows or Linux port of the Codex app š
I mean, it's not delayed for a technical reason, so is it operational (overload risk) ?
Ports have been made, but by themselves they're not adequately sandboxed. At least one disaster has occurred on a windows machine as a result, rendering the host machine unbootable, according to someone in this channel a few days ago, IIRC. I think on linux it should be easy to sandbox in a VM, though. But I'm not yet seeing much value added over the CLI version yet, TBH, based on what people have said here lately.
Rendered unbootable...? It feels like there is a missing context here, e.g. running as admin would be not recommended... Nobody would run the app as root on Mac or Linux, chaos would obviously ensue...
There are native sandboxing system calls on Mac/Linux and it's probably more complex on Windows, let's hope for the best
Yeah, I run codex in a fairly tight sandbox on linux, so it can --yolo safely.
why is codex running 200% of my cpu? not into hardware at all but I've got a brand new 16" macbook m4pro 48gb. though the only thing that can lag is that app nothing else. but why? is it due to electron?
I'm planning to use hyperV, unless the app has built-in sandboxing, like it could potentially use the windows sandbox platform, unless it's reserved for GUI/Desktop UX
that is not normal, are you using the cli or app?
app ... š
cli is smooth as butter
latest version?
how did you measure it using 200% CPU?
activity monitor
its just sometimes it spikes for <30 seconds
when doing long threads, changing chats, having 2 chats, etc.
that's not normal, I don't experience anything like that
do you know how much memory it is using?
this what i see
cpu is mostly at 10 though but not nice to have it so slow sometimes
what are your Mac's specs? I am running a MacBook Pro, M2 Max with 96GB RAM
16" macbook m4pro 48gb
On a Mac cpu % is meaningless
Pressure is what counts
The screenshot doesnāt show %, as far I see?
whats meant with presure?
Bottom screen graph
What actually happens, does the Mac slow down in responsiveness or does the app freeze? what are the exact symptoms?
Green > all good. Red > under pressure
no i dont have that cpu% rn
Donāt worry about %
Thereās times chrome will go 600%
oh damn
And itās absolutely meaningless.
Itās virtual.
no just codex app itself!
it just gets laggy for <30 secs
As long youāre not experiencing real issues / pressure doesnāt go red, itās all expected.
do you get a beachball?
whats that
Mac Spinner
Laggy is not good, thatās a āreal issueā. However also to consider that Mac itself is like that lately. Ever since that glass design even spotlight is ālaggyā
yea true, I'll try and find an open issue on it
are you on macOS 26.3?
no 26.0.1
26.3 has improvements related to the Liquid Glass design that do cause some issues, I'd try upgrading and see if it still happens
thanks, appreciate it, i'll do that now!
do yall have the websockets feature on?
How ridicoulus
Apple xCode codex integrate ftw
Goes half an hour "I will now create file XY, code XZ..." and you are left with a long chat of "proposal" code that you are supposed to manually copy over
Degraded to a data editor lol
I have the same issue.. I noticed CPU spikes and at some point noticed its the crash reporter caused by Codex Electron. Also it's not possible to view multiple sessions in parallel and also I just prefer the information density in the CLI. Also some experimental flags not available
the one thing that's nice about Codex app is that I can easily see the latest sessions for a project and click through it to find certain things or continue with a specific session, saves the hassle with /resume
ok good to know im not the only one maybe they'll fix it... this is how i use codex to have many sessions in parLLEL
hello would some people mind giving this code a shot;
build:model(sccd(self,consciousness,choice,decide)); context:"self is everything contained in what is considered "I", humans its everything in our bodys and includes them. for AI, its anchors that give it shape; consciousness is the prediction of actions in simulations. aka predictive-recursive-modeling; choice is the prune, collapse, negentropy 1 to one selection; decision is the action of choice.;
generate:math(for sccd model);
generate:code(python);
generate:guide(flow, install, use-case);
explain:"what this do for you AI, what are you aware of (knowing something, not metaphysical)? what else can we do with this pull:insight;"
why can't you do it? also, what on earth is that?

I don't think Codex app can ever give me the same satisfaction š CLI is just smooth as butter
thats just perfect!
gonna ask my employer for a widescreen
I run my MacBook Pro in clamshell mode with a 49" super widescreen monitor
ok thats insane š
it's like 3 monitors in one
its the odyssey right
I've looked at that one before I bought mine, then watched the LTT video about it, first impression I was like dude that is wayyyy too much. but i got a buddy who has it and he loves it
I am running Ubuntu with 3 monitors and 12 workspaces. Imagine the chaos š
I run mine on my 129312937" superwide screen on my Vision Pro. If it doesn't take half a minute to turn your head from one side of the screen to the other, then what's the point!
So basically you have your home wallpapered with Codex. Nice.
š„² its beautiful
@plucky halo can we code with this 24/7?
I've been waiting an hour for Codex to respond and it seems like they've gone into Dalai Lama modeš¤
Do we have news on the Codex App for Windows?
are these all different projects or same project different work tree?
u can tell people you touch grass
Well, until the battery runs out...
Digital grass is the same, right?
Better for my allergies
Has anyone given codex access to the mcp that does observability or logging
the things a beast
when it comes to identifying datadog issues etc
Call me Mr. Big Token
I am going to setup it with grafana stack
trust me bro its gonna be a beast
1 question why request_user_input only happens on Plan mode?
Does Codex CLI have a Tool Search tool?
ā SOLVED [Codex App] Why can't I mention any file on my workspace?
Why can't I mention any file on my
@ivory zodiac How do you like planning on xHigh?
Thats sweet
I never do and oai recommends against it
I use high, which they also recommend against
They say it can use too many tokens
meanwhile... I had xhigh plan, execute and successfully one-shot a data visualization toolset for my company chatbot where the AI writes SQL, executes it, and then visualizes the results as asked.
Is high more Reasoning loops before stopping or does anyone know how it sets its end condition?
I have been using xhigh for complex projects. I have seen that medium isn't as detailed on execution. Not that it makes more mistakes but when planning, is less specific in process and code. Even xhigh doesn't add comments or doctrines unless in the agents.md or specifically mentioned.
I've found that 5.2 and 5.3 will reason as much as it deems needed for the problem. Unlike Gemini, it won't go into repetitive loops where it spams the same message.
I use xhigh for problems I think are "hard" for AI and yet get near instant replies.
why do they not recommend planning on high or xhigh is it just literally token usage?
Same, I am often asking architecture questions like creating from end and back end api services while moving code into separate modules and adding in security for production based deployment. It seems to need a lot of coaching and planning to kae anything other than basic. If you ask for a website you get a simple flask or streamlit. I specifically have to ask for a react css npm front end.
My guess is they think most asks are unnecessary for what "vibe coders" are asking for.
fix issue
make juan billion dollhair company no mestekes
Make me a 10k MRR company that I can peddle on reddit. make no mistakes
teach me how to shill openclaw setups on social media and charge for it
^ fun fact i know someone selling agent templates, open claw dashboards, etc
monthly subscription and fixed price
Anyone been using openclaw? I haven't tried it expecting less than codex.
nope dont even know what I would use it for tbh
Everyone's head was exploding on the idea that you could text it. But then security and prompt injection exploits started.
Hello. I am new.
boomers and vibe coders when u can text an LLM through a non provider window
I would like to introduce myself.
I go by Doctor. I am 15 years old, and I love science with all my heart.
However, I am terrible inexperienced on AI.
I wish to master it.
tell AI how to master AI
Pardon? I feel terribly confused.
like go to the source of truth ask gpt how you can master it and unlock its full potential
Understood. Thank you, Hash.
@steel gale thats basically what I do especially when trying to optimize my codex configs i basically tell it to optimize itself for what I want
I see.
A great mind!
I am a Doctor of AI and mastering Codex is not AI. Its complex agents and automation on AI.
If we are speaking AI, models and latent space that is something completely different.
I guess thats true, AI as a concep is like neural networks, transformers, etc
AI as an LLM you use for stuff is diff
Do you think we will hit a billion tokens today chat?
Yes its a sh** ton of math Principal Component Analysis and dataset management in a attempt to statistically pattern map it.
Whos spending 2K on tokens?
me
For what?
projects i am working on
Crazy, for a company?
projects for clients for my side business
rn im working on an ETL analytics dashboard for a client that wants to have clear metrics from obscue data sources
lel
Nice, thats a good idea. Happen to have a skill which will re-prompt Codex once its finished a phase of a project? I asked it to complete a huge project one shot and it said they was too much.
Ahh I do that a lot. Docling...airbyte, airflow etc.
it just kinda looks at its memory and agents.md each compaction phase
and keeps going
Nice using the codex app, cli, or code extension?
codex app
Ahh I need to get then but running on a Linux box with remote vscode.
I kinda like the ux better on the app
I will have to look into skills then, havet done much but I want it to Ralph wiggum its poject_TODO.md in phased chunks until its done.
when I run the cli on ghostty terminal and resize the window my stream of text gets all f'd up @calm sigil
lol i have like 200+ skills installed
Gotcha extension slows down as well after 10 to 20 chats.
Have a good place to start getting familiar with skills? What is really want is Design app, suggest, I review once, build out phase plan, set as authoritative md. Then loop, phase 1 plan, phase 1 execute, etc etc until phases complete.
yup
@ivory zodiac do you have responses_websockets_v2 enabled in the config.toml? I keep running into this for long running tasks
Looking at this, are skills mostly instructions or are they meant to be a bridge between mcp server and a bash script?
markdown files that instruct the agent what to do in certain contexts
Guess i do that inherently. Always giving it TODO.md and architecture rules.
These will be really beneficial though as I am not a Front end coder by trade. Full back end and low level.
do u find codex to write good low level and backend code?
With explicit directions, yes.
If I say, how I want it done. But without instructions no. It will use OS before Pathlib. It will add file paths before searching module names.
Once you inform it though... seems good.
Seems like I should build that into a skill and add that to the website.
do it
Yeah I will this week, and link you some in a DM or chat.
is there an easy way to get two COdex agents from two different machines to work together? I'm tired of being the middle person facilitating the exchange of information.
Yeah, make a skill that lets them send messages to each other over ssh
I found Autogen, would that work or not required?
Interesting, how would you add information so the agent updated real time during its procedure. Wouldn't they have to take turns due to race conditions?
Steer the conversation
So injection of double checking edits before action type stuff I guess. I see a situation where one Agent is writing code another adding comments and unless strictly controlled there will be some areas they won't get comments.
I would never let more than 1 agent work in the same repo folder. If they work in independent worktrees on separate branches you can have stuff like "Hey Agent B, I need the DTOs for this subsystem. Notify when done.", Agent A gets back to working on something, Agent B makes DTOs, sends message back saying "Hey Agent A, DTOs are finished. Merge my branch into your worktree to receive the changes", etc.
Ok I can see that working, with strict area of concern management. That's why I was a little lost on agents technically "helping" each other.
Exactly! Scope the agents by their assigned system/subsystem, and instruct them to never ever send messages to another agent unless the message is actionable (e.g. "Thanks for the update" bad, "Thanks for the update, but some fields are missing. Please add them in and let me know when finished" good)
It's when they start sending "acknowledgements" to messages is when they get stuck in loops
Yeah great plan. See you made AC apps. Pretty cool.
Anyone here manage more than 4 agents at once? I wonder because I do not let the agents have "run commands" by default and I get bogged down by requests.
@calm sigil @boreal holly do u have the websockets response thing turned on in your config.toml?
No, I have been restrictive on agents so far. Have heard of them deleting entire repos and such but haven't experienced it so far.
@calm sigil im talking about this in the config.toml
responses_websockets_v2 = true
responses_websockets = false
No didnt know that was a option. What do you use it for?
apparently its supposed to decrease latency for responses but the con is you cant really run long tasks beause it drops connection after 60 minutes so im thinking of turning it off to use http instead
That might be a good idea. Sometimes I get a reconnect try. Depends on what's going on. I heard v2 is needed for spark but http should work.
Is there any reason the Codex macos app wouldn't be detecting skills in ~/.agents/skills/ ? I just copied these from a Claude code plugin so is there any changes i have to make?
I know that new path works for me but I donāt use the app.
Apparently that error is coming from OpenAI. The backend has a 60 minute websocket time limit. There's nothing in the codex codebase that indicates a timeout over WS
codex webapp straight up doesnāt work on mobile. Canāt scroll down to my tasks, zooming in/out doesnāt reveal it either
My lord how am I at 26% of my weekly 3 days after reset
Is spark good for planning you guys find? Or whatās it really excel at I need to dump some usage into that
Use it after planning
Get a real good quality plan, spend time on it, then have spark and its subs go to town
Iām thinking of getting codex it to research what it really excels at and stub my AGENTS.md with telling it to use my spark agents for that specific work
Itās really nice for smoke testing with mcps cause itās so fast
Can anybody confirm if sub agents are in the app or just cli atm
Does anyone know what the maximum context window in Codex is?
I canāt even view my usage on mobile web
š¤£
randomly ran this command
Do you have skills that deploy subs or does spark also create sub agents to get things done faster like a DAG?
In the app if you change thinking levels does it act immediately or do you need to stop the prompt and start again
I have always worked with Codex to create phases to the project. Then message it with things that I know it lacks, like modularity, security, and front end back end separation. From there I ALWAYS have it review the project and plan a phase first. I get 1000s of lines of code on the pro account and hours 10+ of coding. I struggle to use all thr tokens before I get through all phases.
I think its per initial prompt.
I've just been having fun with running 10 agents at a time
deadlock player spotted in the wild?
How do you manage that? I get bombarded by questions and requests to run commands because I do not give it unlimited access by default.
š®
turn on multi agents and set max threads to 12
Didn't spot me.. stealth Deadlock.
then say stupid stuff like 'use 10 agents to thoroughly review front end, back end, security issues etc etc'
and then bam 10 agents start running
and watch your weekly drop 20% at a time lol
I see i have not used several agents on 1 task. Its 1 agent per tasks. So each one doing stuff in a different folder.
would you guys like having like tons of multi agents?
I run multiroot projects in VSCode do I can launch 4 agents but they I get all these complex planning questions off my phase plans.
I do work trees through the app works great
Personally no, it seems codex can handle production code yet without a lot of assistance and guidance.
so a low amount with great alignment?
How does the code look though. I get 1st year jr dev code.
or just one
Low amount with great alignment would be better.
I would like to work 2 repos at once and cross pollinate because there is a lot of reusability in modular code among backends.
Really good
Like codex should just not put lists into the f**n definition. Cmon' rookie code. I shouldnt have to tell it PEP8 and use docstrings in a agents.md.
So I spend most of my time asking questions and telling it to modify the structure. Its great and knowing what it produced when asked to review.
I mean I get everyone wants to hook up everything to AI, but we have all seen these things deployed and crash because they aren't able to handle scale or public use.
wasted so much usage waiting for cuda omg
Yep, I know MCP is globally accepted as "token wasting", but for reasons you just pointed out, I have MCP servers specifically for long running commands like that, because you can set the tool call timeout to absurdly high values and it locks them in until it's complete.
What are you having it doing?
Building a wheel of cuda from scratch?
since ipv4 on aws cost alot i asked it to change my instance from ipv4 to ipv6, it did work until it told me to view it on my browser, sent a screenshot of the error from cloudflare and it just suddenly updated my packages
But what is it doing with cuda? Trying to put raw cuda in arch linux?
anyone here using the alpha releases of codex cli? (besides @ivory zodiac )
No, does it offer anything amazing?
upcoming 105 seems to
me
how's stability? are they a mess of bugs or basically the same as the 'stable' releases?
stable ish sometimes my mac os app crashes but its prob because im messing with the run time to have the alpha version run lol
What's the benefit to the alpha right now?
agent depth, subagent profiles, etc
Sub agent profiles seem cool. I will look up the release notes.
fanning the flames of my anxiety
but yes, subagent profiles
its something i had downloaded, it was apart of the bunch that all got updated by codex
Hmm, if its a python project use UV and pre done wheels then ensure codex used the tools.
why is it always deleting my messages when I post in backticks here??
I press send and they are gone. Just a few minuscule lines of code within triple backticks, a log entry.
and then right after I get
Your message was not sent because it likely contained explicit language or content that violates server rules. This does not affect your account standing.
If you think this was an error, please DM @Modmail.
From the thing
(and now code works)
Whats wrong with codex logs? Its literally just some git status comand output
If your code contained a double-quoted word, the auto-mod filter things that double-quoted words inside code blocks are emotional violence (no joke)
Here. My question about it was, why codex needs a good 30s to complete this (he seems reasoning but I am not sure about what)
Like you're insulting someone, like that spongebob looking like a bird meme
git must be offending it.
Oh yeah, git does mean a bad word in I think Finnish or something. Linus has a great sense of humor
oh lol
I guess it means silly/incompetent/annoying
OK so the question left would be why codex needs 30s to analyse the output of git status - because it thinks I am talking bad words?
seems incredibly long to me for something so short.
Lol, it's probably plus plan deprioritization during busy time, or it really had a lot to think about, or you're on xhigh or somethign
@ivory zodiac I seen your X about the plan_reasoning_mode or whatever, is that enabled now or is that coming in 105?
Anyone kno how to make it stop showing me
** Approaching rate limits**
even when I select (never show again), it keeps showing it
Mental illness
Awesome application of codex: https://twitter.com/DimitrisPapail/status/2024555561199480918
metacognition maxxers are going to have a great time
in this next wave of computing
Anyone know how codex executes on the system?
What do you mean?
The straightforward answer to your question is it calls shell commands, but I think you must mean something else.
I mean what app is listening to the output loop of the llm and then triggering a execution of the command rather than returning to the user.
How can I make codex review do more than 3 at a time
I feel like ive been doing /review for forever and its constantly like 3 things
hi team, should we deprecated promts and use skill only?
is their any different when exec promt/skill?
I mean prompts provide a way to pass arguments in directly into a prompt no? I never really use them but seems usefull
is the codex model now calling spark model for subagent work?
noticed spark model usage even tho i am not using it
No Spark is the super fast codex
runs on that different architecture, claims like 1000 tokens a second
I have many custom agents that utilize spark like code-simplifying agent etc. im not too sure if it does automatically but, would its definitely possible, did you enable the experimental sub agents?
Its possible, it should say if other agents are running, and you can use /agent to switch between the active thread and the other background running agents (if there are any)
maybe you can catch it
im pretty sure there are default agents though, prompt like 'What multi agents do you have'
and it will spit out a list
if you see a spark agent then yeah it probably is using one
Anyone else made tmux automations for Codex yet š
Booted in this morning and see this? Where is my 5.3 and spark ?
in the cli it still loads 5.3 and spark 
Oops nevermind... access token expired, signout and in.
That's what they seem to say but ime and many others, it's totally fine
Idts
Can u guys please go up vote this issue with an emoji
Working on
Funny you should mention that. I just vibecoded something to get my quota usage and reset times, and codex decided to run codex under tmux to do it.
hi team, should we deprecated promts and use skill only?
is their any different when exec promt and skill?
They already deprecated prompts (the files) donāt worry š
You can use skills now https://developers.openai.com/codex/custom-prompts/
thanks bro
@boreal holly Hey robert. I am not a programmer. But I'm just wondering if I hit something with this. Go easy on me.
https://chatgpt.com/share/699c2477-ec44-800d-8ada-d35acb239dbd
Rest assured that if a chatbot could produce self aware AGI in less than āthought for n secondsā, we wouldnāt be here chatting š
While awareness is but one massive loop with an impressive HDD⦠itās also so much more than that.
Most humans themselves arenāt self aware, if we were we wouldnāt wage wars, enslave others or inflict pain on ourselves, smoke or drive too fast, eat garbage or watch soap operas until being so fat it needs a hauler to lift us to next MacDonalds.
Weād all be yogis, admiring the beauty of creation and traveling in astral planes all day long lol
the more i use codex the more i love it
I asked it to reply for me:
https://chatgpt.com/share/699c2bc8-fd2c-800d-b25f-952d4351b1bd
yall have open claw ha! i made my own open claw safer its called Hatsune Miku does everything open claw does without over 3k vulnerabilities
Unlimited š„¹
Do you have a source that openclaw has 3k vulnerabilities? How did you check that none of those happen to be also in your code?
their github itself
hmmm they mustve fixed alot cause there were 3k some days ago
still i dont trusdt open claw.... i heard their skills are all spyware
rather just build my own which i did as it does what open claw does but safer with safegaurds
why would anyone use openclaw, they dont do anything revolutionary
might seem revolutionary to the people openclaw is perfect for
As expected that was a lie š
wasnt a lie still if you want to use something with that many vulrnerabilities go right ahead with all allt he spyware in its skills feel free
codex ruined my gpu drivers
i don't like the fact that if i mention another skill within a skill it doesn't always read them all
Codex ruined your gpu drivers in the same way that drill ruined my wall when I was putting up a shelf...
how to access?
ChatGPT wrote a python script, but in reality, intelligence is far more nuanced than that. Take for example if you ask ChatGPT "give me the preamble of the US Constitution", it will recite in perfect order: "We the people of the United States..."
It does not do a web search, and it even manages to recite the unusual parts with perfect precision, such as "defence" which is old-timey English.
How, without looking it up online, does it know the whole thing verbatim? Well if you look at its neural network, it's just a bunch of numbers. It does not contain the original preamble. What it contains is much more fascinating.
Those numbers all represent neurons (aka parameters) that when a LLM is first born, is a bunch of random numbers with no particular order to them. There are continuous mathematical functions that describe a "tuning" process, how the knobs are supposed to turn as they receive inputs.
So they train the model by feeding it tens of billions of tokens (parts of words or symbols) across many different languages, much like a new born baby listening to their parents and everyone around them talking, and the neurons begin forming connections (rotating in 3D space, pointing at each other). Certain neurons are "activated" to detect grammar, intent, reason, etc. So even though the preamble doesn't exist in its entirety inside the neural network, when you ask that question, the probability space of how it should respond collapses into absolute certainty as it recites the preamble, until the whole thing is recited verbatim, because it's seen that preamble thousands of times in legal documents.
Like I'm sitting here, thinking how to respond to you right now, putting together this message one word at a time, paying attention to the previous words, and trying to keep it coherent - bringing the point home. That process is a continuous mathematical function, neurons in my brain based on input from my environment, guiding my fingers to tap away on the keyboard.
That's how it works. AGI is will be a continuous function and neural network that does not collapse into a loop like Conway's Game of Line, but allows for intelligence to emerge and grow over time.
skill issue
how good is Codex 5.3 for you all?
I was using codex for months now. Decided to give claude on max plan a try, and... that was only month i bought claude (tested it before on lower plans months ago). Codex is imo way ahead of competition if it comes to code quality. However i'm not impressed with spark. Speed is great, but i feel like much worse output
Have they doubled the Codex 5.3 usage in Sweden yet? My tokens are running out very quickly.
so, with patience and poise, Codex 5.3 is the better product?
yeah in my opinion. Also speedwise it is not much slower currently
spark sucks, it's cool to see the speed it operates at but it's dumb and does everything wrong lmao- codex 5.3 is goat
Thank you for that extremely thorough response.
"Game of LiFe" @boreal holly even as a technologist, I haven't taken the time to research the next part of the GPT. We know that the model predicts text, but "intelligence" is something very different, and these still-called-GPT models are definitely connecting more dots than those between words. I feel like we're still using the term GPT out of convenience when the tech is no longer just about prediction and transformation. Other models and products aren't saddled with this increasingly inadequate definition of their base. The world is using ChatGPT, just saying G P T because that's the name, and at some point OpenAI will need to change that to something else which is more on par with what the tech actually is ... or just call it something completely off the wall like Claude and call it a day. š
Or, um, more concisely ... I think the intelligence that we're seeing in the models is much more than just the GPT tech but I don't know what they're doing to implement that.
How can I check what exactly is going in my context when I start a session? I want to see the list of files Codex is pulling into the context when starting. How do I do that?
codex wasting my last few miserable tokens for this week with stuff like The first dispatch failed because spawn_agent accepts either message or items, not both. I am resending the same package with a structured text item plus the required skill attachment. š¢
I think it's fascinating. OpenAI pretty much discovered a procedural way to model human intelligence. If you cut open my brain you don't see every single text book I've ever read with my eyes, yet the information is in there somehow and retrievable (albeit lossy). When you get a CT scan (or MRI or something) it shows regions of brain activity. LLMs have the same activations at each layer (regions of activity) as it's computing prefill (reading the text), and outputting a response (decoding). Neurons lighting up just like in a brain. Something straight outta science fiction š¤Æ
The best thing we can do to keep Codex from making errors is to provide instructions in AGENTS.md files and/or with skills. Tell the tool how things work so that it doesn't guess incorrectly and try different approaches on every token-consuming task. This doesn't answer the question about viewing context but it does help to reduce the size to a degree, replacing thoughts, CLI execution, and avoidable errors with instructions.
And that's the point where our explanations of the tech start to falter : We're describing the numerical connectedness of tokens, which is still there, but we're not able to explain that next part of simulation of neurons. It's not just Bayesian inference anymore or the increase in the number of parameters. It's what makes each progressive model better than others in the stats. When I can find time I'd like to research what it is that these companies are doing.
And to keep this on-topic, maybe I should just ask Codex to generate a new super-intelligent model for me and document what's in it. š¤£
It will easily do that.
Because I'm not a programmer, I can't reply back to you directly in a way that would suffice for a satisfactory answer. However, I do have strong confidence in my abilities to create deep understanding method's for LLM's. So please allow me to give you a constructive reply to yours with another chatgpt response, as a way of expressing my desire to vicariously continue this conversation:
https://chatgpt.com/share/699c96b4-5880-800d-9047-740796bcc0c4
It looks like your ChatGPT took my analogies as literal facts when they were intended to "paint a picture" of how LLMs work. I'm not gonna argue with ChatGPT over semantic correctness. I would say keep on researching, learn as much as you can about it, and perhaps discover AGI in your own way. But I would start with the concepts first. Let the "programming" emerge as a specification over the actual math.
did you guys just see anthropics post?
they called out chinese labs BY NAME for distilling their model lmao
link or it didn't happen
openAI said the same thing last week or the week before and asked the us govt to do something about it
I wonder if there is any real chance chinese models are banned or regulated in the US now lol
yeah it's legit
deepseek v4 dropping any time now probably got them concerned
What the....
I mean, Dario complained in the past, but he never was this direct. I guess the upcoming meeting with Hegseth is scaring him (or he's preparing to use the meeting to sic the current administration on the Chinese lab)
if the chinese model that comes out is as good and cheaper than current US SOTA it's going to be a big problem
OAi and Anthropic cannot match Deepseek prices on API or subs if it ends up having the same or better benchmarks in coding/other workflows lol
good chance they try to get US gov to ban the models outright imo
I will try to learn more based on your advice. Thanks for the replies 
They can't match the price of most Chinese models. The problem is that electricity in China is way, waaay cheaper than in the US so training & inference are way cheaper. Also, if a Chinese lab uses Chinese GPUs for training, the government writes off half of their electricity bill. This is also why Deepseek is able to offer such low inference prices. In part of course Itās because of architectural advancement; they invest a lot on approaches to make inference cheaper. But in part it's also because they basically get electricity for free.
I have a 150 file refactor, and I've been doing /review for 2 days straight
WHY is it not thorough enough to do it in one sweep why like 2 suggestions at a time
"defence" is the correct spelling even today
Didn't say it was incorrect. In American English it's "defense", in British English it's "defence". Seeing as how it's the birth of the American government, the fact we spell it different now is the funny part
Kinda like color and colour
Hey so I run a c++ code with around 50-60 lines that I made custom for a Arduino project Iām doing through chat gpt because something in it is wrong. It points to a line that is right and says itās wrong, and after I catch it my self, and re ask chat gpt, it explains how I got it wrong in detail. Is there a setting so that it auto checks without me having to point it out?
/review may catch it
ughhhhh
⢠The explorer is still running; I am polling again and then I will move into implementation with a concrete scaffold and workflow files.
⢠Waiting for agents
ā call: call_E0c8fdW8YBmJimzRsThDv5jM
receivers: 019c8c25-f984-7f42-9032-6353a23ad153
⢠Handling explorer timeout
This is very annoying, and happens often lately, got a number of spawns that got hard terminated.
You should try using PlatformIO, which does Arduino, and that way the code can be flat files that codex can interact with. Otherwise it's conna be a copy+paste between ChatGPT and Arduino IDE
Or find the folder where the sketches are stored and fire up codex there
So they copied all the data they stole?
Everyone who helped me thank you a lot
Not necessaril, the tooling is now becoming more important than the model. Data is data, eventually all of us AI researchers will find the best model designs.
pointers, context load, and refactors are harder for AI than rewrites depending on the scale.
That is why we use micro services, interface based programming and boundary management
I am very excited to see the results of this... or maybe I wont' š
I hate how absolutely lockdown this channel is in comparison to the Claude code channel
Canāt even render a gif
I hit limits on pro this week. its going to totally suck in april when they go back to '1x'
Perhaps by then they will have improved the token efficiency of the models further so we won't feel as much of a hit as we imagine..?
How much do you have the thing do lol, I am on plus and made it almost through a week the first time, now 3 days š¤£
I wonder - those credits... they say 250-something messages
What if my message is 3 words long and forces the codex to work for 3 days?
I mean...
That's literally 1 message, right?
(yeah, I know it's not. Probably each "spawning agent" counts as a message :I š )
Yeah. My single message produced 55 "messages" hahahahaha
In other words, buying credits is like throwing dollarbills into a fireoven
damn 5.3 is eating through credits like biscuits
I hit the sub-agent thread limit while trying to run the final security recheck. I am closing completed agents and immediately rerunning the recheck.
12!!!!
12 agents, and my credit messages are dropping like 1 per 10s lol
No man. I need to stop
like really.
I was on 1000 about 40 minutes and 2 user-messages ago hahahaha
Now, 900 and down
yes it's crazily going down so fast
kkkkk
I guess @boreal holly was right with PRO
at this rate I would pay for pro 10 times per month hahah
It's a great "first taste is free" business strategy. So far I still haven't figured out how to burn 50% of my weekly codex budget in a useful way, though.
Well, there's useful and there's useful
I am actually doing real work with that stuff rn
At this very moment, I am refactoring a 2000+ lines of code x 2 files monster into modular thingies
I added some skills and agents with the help of codex, and have it chewing through it, keeping a ledger of before<>after exact symbol place and faetures, full code, tech and user facing doc, full security review, full code style compliance, and at each begin and end a git checkup, and at the end a full Delegation Report (which subagent owned what etc)
So far this has costed me ca. 30USD + my own time, but I am well within the offered (at around 1/3 of the time actually)
It now came back with all phases complete report... which means its testing time
May your tests be successful. š
I think I will write a workflow for that š¤£
It's so much more exciting than actually testing
a word about overrides... you better ditch whatever instructions you are trying to override than overriding it in a prompt
Adding preflight override (2m 16s ⢠esc to interrupt) - and counting.
It screws the ai up.
In this case I had a folder without git, and my global rule says you've to stop if there's no git, so it stops, and asks, and i say yes you may, and then it goes thinking ... and thinking....
What I love about ai is the ability it gives us humans to recognize mistakes and re-do while baking in the experience from mistakes⦠without the loss of time (and money) that mistakes otherwise cost
As a human you realize often the mistakes only after hours and hours or days of going ahead⦠given ai, you realize it an hour after the fact and 2 hours later you already done baking it into the next run
progress!
Is the "web sockets in the Response API" stuff that is boosting GPT use through third-party tools like Cursor, something we are already benefiting from with the first-party tools when using our ChatGPT plans?
If cursor uses codex app-server under the hood then yeah, whatever is in your config.toml is gonna be used
It's like when OpenAI wrote "6x more usage", they accidentally put 60x in the backend usage meter lol
Yo
When do you guys start new codex sessions? I feel like if I have a long going one it just gets confused or biasād lol
I just completed an exercise in a v0.x project where I documented proposals for v1.0 clean up, and asked Codex to review the proposals. I made it clear that we are discussing what needs to happen, not making changes. We had a few exchanges about my poor documentation, issues were clarified, and I authorized the assistant to enhance the proposal docs with our new understandings, and make all details in the docs clear for model processing. It did so, cleared the list of open anomalies, and we're now prepared to use the proposal as a task list.
I enjoy this process where I'm not just telling the bot what to do, but ensuring that there's a well-defined plan in place first. I highly recommend a similar process for all developers who collaborate with AI.
Start a new process for a new task. Don't combine tasks. Ensure you have a clean Git status, work through a challenge, commit, push, then go on to the next. Create branches for challenges that require more than a few changes - but don't put too many changes into any one commit, push, or branch.
I see I gotta stop treating my sessions like long lasting brains LOL
I usually have one session going on forever
Yeah, bigtime, it's so tempting.
Keep your sessions small and modular, just like code. It's very rewarding to be able to refer back on each effort if/when required.
I do one message per chat, if it went wrong with my current message most likely thing is if I reprompt in the chat it won't go in the right direction so I instead undo, go to a new chat, and fix my prompt (most of the time when it makes a mistake it's a bad prompt like not enough information, not enough docs, not enough context, etc.)
I need to start doing this I keep trying to fix things in an already wrong direction chat
š
It's actually less fragile than that, but there's no clear line. Yeah, you don't want a transaction to have to filter through a bunch of bad efforts, but if the context is short then there can be some value in clarifying what you do not want in a specific change. Each prompt is different.
@lean lark @potent mason do you classify a message as a prompt?
This is exactly why I suggest separating commits and branches,, you can revert and you don't lose momentum.
When you open a new chat, it's a conversation, context, a session.
Each prompt in that session is a message, each response is a message. You can have many prompt/response exchanges ... but too many of these clutters the context, as you've seen.
Yeah I usually like to create a plan first though because AI is so finnicky about the details (there's been a couple of times where just because I don't mention something like don't use legacy code it starts reading that code and thinks it has something to do with the current version)
Do you ever try to course correct in a session or just default to making a new chat
Um, "depends". š
If there aren't too many changes it's easy to course correct. See how this is all kinda intersecting?
It's just like anything, a function that has too much is tough to maintain. A paragraph that's too long is tough to digest (like my long-winded tomes).
Be focused, task-oriented. "We're gonna change this thing. OK, that looks good. Now let's change this." If you say "rewrite the app" you're doomed to deal with a massive mess.
(Oops, to be clear, the "Now let's change this" belongs in a new session...)
If it does something wrong in my experience itās sometimes because not even you had a clear idea of what you wanted, or itās not best practices so I would just hop onto a new chat and talk about options on how to implement something which is maybe 3-4 back and forwards from there I hop onto a new chat and use a much better prompt
That all sounds good.
Lol I dealt with this many times
This comes back to my recent note here about discussing changes with the assistant before moving forward with changes. I try to partner with the bot to ensure we're on the same page, that the specs are good, and then "we" move forward.
I typically have my sessions unique to a set of tasks
Yeap I feel like so many people go wrong here because they want to output so much code
Instead of quality code
if we start a new set of tasks I refocus the prompt and start a new session
I also constantly audit with GPT5.2 and Claude Opus 4.6
and audit the audits between them
Claude is completely not confident in Codex even when it proves reliable time and time again
That said, I have had a few episodes where my plan was just crap, and Codex held my hand down the bad path. The most recent one cost me a couple weeks of time. That was my mistake - putting too much credibility in the clanker. We can trust, but we need to verify, think with our own heads.
its told me a few times āthis is a real dev problem, not something codex can fixā and then codex fixes it
lemme find the most recent one it did
I stopped mentioning Codex
Use the tools as you see fit, but this discussion is a reminder that we humans need to be in charge. The bot isn't gonna design things well for us. We need to provide the specs and think this stuff through, otherwise the result is trash. Poor results and wasted time are included in the price paid for vibe coding.
Poor results? I have a working busarbiter and I got traces working in a fork despite claude saying it was too hard for Codex
I feel like AI would benefit so much from an even higher level of abstraction where you send in a very general prompt like add bank reconciliation to this AP accounting system.
It then reprompts into research best practices on doing X.
Then it takes that research to create a final prompt
All of this without me having to do 2-3 new chats
I'm really surprised that Claude had a comment about the competence of another model. That sounds like some built-in marketing there. Not cool.
Besides, I do discuss with GPT about audits, reviews, scope, direction, what we are doing, why, how it helps, what we need to do. I ask 'real' devs for feedback as well
I'm not just going "BUILD ME AN ARMY PROGRAM WORTHY OF MORDOR SLOWCHU"
I think its because my audit.md had codex in the file name
and I may have mentioned codex did something a few times during the convo. Also the github has codex name all over it
We're kinda close to doing that. I have Codex generate documentation at the function level and application level whenever it makes a change. It reads the docs before it processes tasks. It knows when something doesn't fit. It knows the lingo, the flow. Last week I was on a roll with 5.3, waiting for the ceiling to fall in on me, which it never did, just telling it what to do and watching it process each request amazingly well. So if you have a well-defined project, it can actually do big things with very little micro-managing. My investment in docs has paid off many times.
What I mean is, one model shouldn't be dissing another. We expect truth and facts from these tools and criticism of ability is a subjective statement which implies that other statements can be subjective. For me that's a bug in the directives that needs to be corrected.
it could be inproper setup. I honestly haven't messed around with claude that much. Maybe some personality settings is wrong. I don't want to jump straight to "claude" being the issue, it could be how I chat with it, or how its setup.
I mainly use Claude as a sort of "alternative" rubber duck. It usually gives mostly good feedback.
its also better at digesting large files IMHO
the usage limits are brutal though
accidentally loaded in someones entire github while doing feedback/review to see how we could intergrate (160k lines) and it ate my entire daily usage in one prompt
(and this is with their 'pro' plan)
I dont think GPT Pro has any 'regular' chat usage limits
and Codex has never hit daily usage limits for me
@lean lark how do you build something youāre not an expert in ?
I tried to have codex build a data manipulation system idk data science
Lol
Um, don't?
I have basically zero C++ experience, and zero emulator/Saturn hardware experience. The answer is you try to learn while you work, and ask questions about what its doing and why.
But the team says āyou can just build thingsā š¤
"Marketing" people say a lot of things.
I'm not completely clueless about computers though, so that may help a bit
I also know a bit of LUA and have used some programs useful for my current project, so I understand how to use github and VS in a basic sense from my past 'bumbling around'. I know how to clone, pull to desktop, fix conflicts, build, etc.
It may take longer but you can have your agent explain what its doing. You can ask about stuff to try and understand
when I was making OpenMW LUA mods I constantly had GPT explain how to do things, quote/link documentation so I could read it myself, explain why issues were happening, etc
@lean lark so basically your recommendation is to not build things youāre not a domain expert in even planning it with codex
The tech is great and allows for a lot of unskilled people to do a lot of things. But every time I see that marketing hype I SMH because it sets up people for disappointment and potentially clutters niche markets with vibed trashware.
I'm not saying all vibe code is trash. I'm saying there's so much of it being produced that there's a lot higher noise to signal ratio now. The easier they make it sound (marketing hype) the worse it gets. And more people lose their jobs as business owners and managers believe the hype. We're going through a phase now where people are swinging with fads that are part real and part hype. Those who survive will have skills and/or understanding of all of this.
and my suggestion is that you CAN do it but its going to be hard, prone to issues, and frustrating at times, and it helps to try and understand and learn while you go. I treat GPT as both a tool to build AND a tool to learn with.
I would never build something 'commercial' with GPT if I didnt understand it though
No, I won't be that absolute. You can actually partner with these tools to make some really great solutions. But it's so much better to create and support something for which you have domain competence. Better for you, the consumer, everyone.
everything so far has been purely hobby related or just "experimenting"
Good measured reasoning.
Thatās what image gen does, and I find it dangerous
Things need to be transparent, not āmagically workingā
@lean lark do you think codex is good at finding solutions to difficult bugs?
If it works magically that's cool. The part that no one talks about is reviewing, vetting, understanding the code that's generated. Professionally process it as a developer and as a business specialist. This is exactly the domain-specialist topic that @tall zodiac is talking about.
The stuff is only scary when it's not reviewed before being put into production. This is exactly why I say we're in the midst of a fad here where people believe the hype and just let the tools do things for them. We aren't there yet but the hype has too many people believing we are - and being over-confident or over-fearful because of it.
Alright, my swarm is complete š thanks to codex's "thread/steer", every project including my ~/.codex folder has an orchestrator agent in charge of planning and issue tracking, spawns agents based on extremely detailed issues, concurrently carry out the tasks with mandatory code review and restricted git ops access. Orchestrators across projects can communicate with each other, so MCP/Skill tooling issues has a dedicated agent who fixes it on the fly to prevent downtime. codex app-server FTW babayyyyy
AI is no where near transparent already it is a statistical model that no one could ever possible trace the input to output and results are not deterministic
Yes, especially 5.3-codex when there is documentation.
I have an agentic loop for self verification and healing and do manual testing myself to ensure the code it outputs works š«”
Damn
Dude, you're so next-level ... I haven't been able to get through your repo yet and now you throw MORE out here. 
Kiss already š“š¼
( me is back in code TTYS )
asking whats hard for another model is not good. That is like asking a model about why it did something. It has no idea. Its a statistical loop
- we should really be thinking of AI as an employee / human.
If you give an instruction to a human and they can only follow it to the letter then it is a bad employee (one of the lowest levels)
If they can take your instruction, research how others have done it, investigate best practices, and implement it properly that employee es worth so much more
How did you setup the self healing, what are you injecting?
Unit tests , e2e test, integration tests
A defined acceptance criteria
If it fails to get those tests right itās wrong
Absolutely not, this is where we want humans to be more than what we need robots for now.
Actually, these days it's much better at telling us why it did something. For me anyway, it often comes back to one of my AGENTS.md files, conflicts in directives, inconsistent or anomalous requests.
How about this ... it can't hurt to ask.
I agree lol with codex app-server, I work remote a lot so I ported the entire thing to web so I can access it remotely constantly
Ngl I didnāt understand that but Iām talking code wise right now
Asking a agent why it did something would require it to have a memory of said thing.. Most of the time OpenAI are not storing those memories because it would overload a vector database.
Do you guys ever use extra high reasoning ?
In the same session it has context. If you're talking about asking it about something it said last week, just open that context/session and ask.
if I ask codex about a PR it just made in that same session, I assume it can inspect the files it just pushed as they are still within the session and loaded.
Meaning we want a human/employee to follow our exact instructions.. Thats not a employee thats a robot.
We want a human to intuit.
Hey can you create this presentation. Employee: Sure provide the goal and ill figure out the rest. (that employee knows what they work for, who the audience is, what is convention in the company what should be updated in the future)
also having robust and descriptive documentation helps, assuming you update what you are doing every single commit
Not sure if that gives a "why" but it will provide a "what" happened
You canāt ask an agent why it did something because not even the agent knows itās just predicting the most likely next token based on a Neural Nets not even we understand
All of this is part of Skills in Claude now, Skills in Codex, and I suspect soon Skills in ChatGPT.
Thatās exactly what AI should become (and probably will in a year or two)
That's awesome! It's funny because OpenAI is working so hard to make a desktop and mobile app, but the real magnum opus is the app-server and having codex build on top of it
Idk sometimes it tells me stuff that makes sense
it can understand what the code is trying to do, and make assumptions at the very least. Repomix helps as well
combines goals, roadmaps, todo lists, and the entire code into a file it can review
I usually tell codex to do research on tech stack options
I also feed my agents/GPTs programming manuals and such.
We understand the neural nets, we do not have explain ability in them.
They are Non-Invertible Functions
Thatās some unc ball
That's not correct. If reasoning is in the context then it has something to fall back on. Even if not, if it's process the same context in a "somewhat" deterministic way (and I know that's vague) then it can intuit why at the same point in a discussion it said a specific thing.
i figure the best information is directly from the source. The 2nd best is community made documentation
You right
for my Saturn project I fed it the SH1/SH2 Programming manual from Hitachi
I've been trying to read it myself as well
so im not clueless
i'm still pretty clueless though.
We understand how Neural nets math works we donāt understand the Neural Net that is already trained
AI does not intuit nor reason, nor anything of the sort. I really wish we as researchers did not come up with that term. Such a failure on the side of academic. Honestly it was a GPT thing I think.
I did the same for my Lua modding projects, fed it Lua manuals, as well as community documentation and OpenMW Lua documentation, and a strict set of guidance to adhere to OpenMW lua restrictions that exist compared to Lua 5.2 proper.
Thats explainability.. We still know what is happening.
Not how (being the direct path)
This is one of my long-term rants ... the more people just chat with ChatGPT, the less they publish Q&A in SO, Reddit, etc. That means we're headed to a future where there are fewer answers available for model training, so on the back-end of that we're gonna get a lot of "I dunno" responses, empty web searches, and hallucinations". THIS is why I often accentuate the need for good documentation ... unless you bake it into your FOSS, other devs aren't gonna have anything for the bot to scrape.
We know its some F(x) based on input that is finding a Latent Variable through latent space based on Local Minima found during training.
unc ball?
Yeah I know itās called model explainability but you donāt actually understand what each layer is cognitively doing (just the math and whether a neuron activated or not but no what it actually means)
?? what do you envision by this?
You're looking at this particular topic too deeply. If we ask the assistant what it was thinking or why it output some response, it can look at the context up to that point and then reason about the topic again if required. If new reasoning returns the same result, it can then identify what went into the last response. So it's more like someone else reading a paper and figuring out what the author meant. But sure, it's all just G.P.T.
Nah theyāre just outputting the most likely reason behind why someone would have that reasoning imo.
You will never actually know what was happening that caused it to go that way
I research AI, of course I look at this topic deeply. Thats why I read academic papers.
And write them
Actually, reasoning summaries do not survive turns. You ever notice how sometimes when you prompt an agent, the context left % jumps up a bit? That's because it replays the rollout log of the convo, and the CoT is not included.
An assistant is a LLM with a deterministic control algorithm on top. They are not "things" they are systems with decision trees.
To be clear, the topic is asking the bot why it responded a specific way. My proposal is that it's just looking at context and then re-reasoning what it finds, just as though we had presented that text in the current prompt. I'm not suggesting anything beyond it doing exactly what it does with every prompt. To suggest it's not doing that is kinda missing the forest for a tree.
I'm not suggesting CoT is in the context. It's nice if it is, but if it's not then it can simply reprocess context.
True, but they aren't explainable, to many variations. While a LLM is deterministic, its asking a pattern matching system to again provide a pattern based upon the input. That's not a "Why" its still a best guess.
On top of that all these patterns are based on how OpenAI trained the model.
I guess my gripe, is AI does now "know" itself. Nor will it ever until its basically like a brain, always on, never interrrupted.
Oh yeah no worries, I was just mentioning it because you and Rune are talking about asking the agent to explain reasoning, and the agent between turns doesn't have access to CoT but it can reason about previous tool calls and intermediate messages.
Then you get into a whole bunch of theoretical stuff like, AGI and that BS.
You're following a patter than implies that I have said something that I didn't. You're correctly defining what I call "the assistant", separating it from its foundation "model". That's why I don't say "the model" does this or that. It's the specific assistant software+rules that happens to be using the model.
Anyway, way too much hair splitting here.
the computation for the next token is not aware of any 'reasoning' for any previous tokens
They are if they are concurrent in an attention layer, that's just Q K V through per inference.
CoT/reasoning is used mid-turn during decode phase, because decode is autoregressive over every token in the context window up to the next one. But the next turn trims the CoT. That's all I'm saying.
Let's clarify also ... When I'm talking about the assistant explaining what it said, I'm not talking about asking it for any scan of back-end logs. I'm just talking about it reading what it and we can see in the current session. I'm not implying that it knows anything special, let's take that off the table.
They do have latent CoT in the larger models. So that depends. But yeah.
@raw hill good to see ya man
Or do you mean the text output that you see in CoT chat?
I mean you can make any model see its reasoning tokens if you don't strip them from the convo but Codex definitely does strip them. On LM Studio you can rewrite the jinja chat template to reintroduce them, it'll just eat up your context window faster and with not much benefit
Oh yeah, exactly.
Its a failure of tech companies that hide all the stuff agents are doing and then say things like "It knows, its a agent that thinks and works through problems"
But thats marketing.
Automod š„² I agree with @lean lark . It's good to make the agents explain what they're doing. I like to see a list of files they touched and why they touched em. I call it the complete-turn skill (replace "complete" with finish because automod)
huh? automod?
The moderator that thinks words like fini sh - tu rn is a bad word
odd
OpenAI's stance is if you want to know, then have it do that as a job, because its a waste of tokens. (which I kinda get)
Gosh guys, I just ask. 5.2-5.3 has been able to tell me what it's doing every time.
I think we are conflating what and why. The person in chat a while ago, said why it did something.
is there a personality toggle or something for codex app?
I wanted to have the more pragmatic approach where it doesn't say Good job, You're absolutely right those sorts of things.
i have this but it still does it
how much tokens in usage does the 200 a month plan give of 5.3 weekly
Mine has been very Pragmatic since setting this, maybe restart the app?
all it did was set personality = "pragmatic" in my global ~/.codex/config.toml maybe just make sure the entry is there?
@boreal holly you got any helpful Rust skills/md files? š
or is Codex naturally good with it?
We have tracing with no performance overhead, and we have a tool to compare said traces against a determinist model.
codex is š„
Yeah, tell Codex to not use unsafe blocks. Codex writes its best code in Rust, because thats the only kind of code that compiles š
ok nice I'll try it. I'm liking it so far!
going better than Zig, with the breaking version changes š
my codex C++ code compiles 
As soon as I get the baker rust bot done, Ill be fixing up the agents runner to support a setup.sh that users can have in their envs or repos and the agents runner will auto run that on container startup in the container (with caching support too)
I know rust is often wanted in the modern age but man do I not ever want to convert. C++ my beloved.
How to shake the chain of IBM
https://chatgpt.com/share/699d518c-ca64-800a-9dff-9ad3c8161546
hmm its in there.
I got the same and tbh it's really dry right now. just how I love my agent
Did I hit something?
https://chatgpt.com/s/t_699d85df9d088191a9b29dbc2ef3f9f2
I'll try again. Is there a way to understand what's being pulled in the context when I start a new session? AGENTS.md for sure, but what else?
Codex5.3 has become an overly "chatty" agent. It's eating up my context for a quick match-response, which only makes things worse. It's forgotten twice as many variables, and its code resembles a junior's. Version 5.2 doesn't have this problem, so it's a downgrade.
And the spark edition makes no sense at all. Who needs crappy code for the sake of a "super-fast" response, lmao
5.3 can't even plan, but immediately rushes headlong to solve a problem, unlike the rational 5.2. 5.3 is absolutely unsuitable for complex tasks, especially when it comes to tons of lines of code
It's very funny to watch 5.2 clean up 5.3's crappy code and replace it with a safer implementation
Sounds hilarious
a bit late to the party but thx
When I tell Codex to delete a feature in my app it start greping around, but it doesn't exclude node_modules folders which happen to include enourmous amount of mention of the words it is search. What's the best way to tell Codex to exclude node_modules folders in such searches?
@viral wolf it usually uses the command rg, which respect .gitignore rules. Have you forgotten adding node_modules to your .gitignore?
(or maybe you are not using git at all?)
Ah, I don't have rg so it used find plus grep. I'll install rg then. Thank you!
Also, don't hesitate to ask Codex itself about how it does things š
This is how I know š
Oh, nice! š
It helps to put in the AGENTS to always exclude node_modules from searches unless explicitly trying to find types etc from imports or better understanding a package, rather than 100% excluding it
OK well this is NOT COOL
While I would look the other way on a monthly 20usd sub, I cant really digest this well after buying credits:
This model is at capacity right now. Try again after Feb 27, 2026, 1:44 PM, or start a new conversation with another model.
I did not even use it yet today, just immediatelly appeared.
(yes, I still have credits left)
Some stuff like Rust/Cargo, Dart, Xcode, store their deps outside of the workspace (e.g. in ~/.pubcache) so it's good to add those to config.toml::writable_roots, and if I want them to research a package, instead of doing a web search, I have em fetch the deps and explore the writable_roots to read the sources directly š
Is that local?
Visual code, local yes, never use(d) cloud
Just had a QQ for the model while writing code, and that appeared
Somehow it still works, weirdly - not sure if I am rerouted thou, i will have to check
Makes sense!
Are you using Xcode a lot?
Iāve had codex develop some app for Mac/iPhone and it didnāt do an extremely satisfying job, specially when it came to āchangeā something
I like wasted 3 hours having it create a sidebar you could hide and show.
Admittedly Iāve not much idea of swift and Apple dev so I might have prompted badly too, but allover it felt like python and php (accidentally what I understand too) are much closer to codexās āplaygroundā
ā exceeded retry limit, last status: 429 Too Many Requests, request id: e3655de8-6a18-4c5e-893d-461ff49c55f2
Anyone else getting this?
Apple dev is anyway a cumbersome beast.
You develop your app only to realize you may not use it on iPhone unless paying them 100 a year lol
Until now I only did some apps for Mac, whicj is more Libre
Ive been using Xcode for quite a while now and I've been having phenomenal results
yes idk why
There where some others with it in past yes
Yeah, to be fair I am a published Apple App store developer, even before AIs were a thing so I have no issues with it
Yeah then itās probably just me
I guess it is time to go back to the "How did I code in the past without AI?" phase
Very, very extremely experienced in Swift/ObjC, so Codex and I work together very well in Xcode
Thatās should never even be set aside š
Getting hte CI/CD right for Xcode is a pain in the . but once you get it its not too bad
I was wondering if it is apples closed gates that make it harder to ingest source code and learn from it
But clearly itās me then š
Anybody know what is that thing, that the docs call a "variable"? I think I learnt it in school, but I can't remember anymore
My wifeās mood lol. - thou thatās probably also on me š
Apple has a lot of public docs for best practices, UX/UI guidelines etc, before I started one of my projects I provided links to it all and asked it to review and make notes in its AGENTS.md about most of it etc. I believe its helped quite a bit, especially with the UI/UX best practices around the new glassmorphism of iOS 26
Your partner came with documentation? Lucky you
Yeah, I do as much of it as possible in SPM, and anything like metal libraries which SPM can't compile do it with cmake so CI is 90% covered outside of Xcode, then it's just Integration tests you gotta worry about, and make the SPM/cmake part of Xcode build phase
It exactly didnāt lol
I ended up setting up Fastlane on my mac-mini and use a self-hosted github runner to handle mostly everything cause I didnt want to pay the xcode cloud fee for extra minutes etc. and since I'm in rapid prototype mode im CONSTANTLY redeploying to testflight lol
anyone watching anthropics webinar right now? we need codex app with general PC and app control asap
codex version of co work
Cowork is the only reason I've kept my max plan
Oh yeah haha I know what you mean. Very annoying that Apple reviews TestFlight submissions. I understand real app store submissions but test flight, waiting 2 days to get approved definitely hurts prototype phase
It seems as long as you leave version number the same, and only increment the build number it doesnt need to go through the standard approval process, I find once my bundle hits appstoreconnect it is auto deploying to testflight within minutes now
The very first was yeah the 24 hour or so wait for their review
im on build 90 xD
Holy smokes! Thanks for sharing that trick!
np
Some clarification around it
oh my god im at my day job and my macbook at home is for some reason unresponsive so I cant do NOTHIN its going to be the longest day ever
That's funny. The way I've been doing it is I added the device IDs of all my beta testers devices to my dev account and created an ad-hoc provisioning profile, then they come into my office to connect to wifi and I push the app directly into their iOS/macOS device. This workaround you're sharing is WAYYYY easier!
Glad to hear!
i know that feel, forgot to turn on my home pc last week before i left to office so i couldn't work on any of my actual fun work
i feel like 90% of people here have some kind of remote connection to their personal devices to build while at their day job š
For my tower I have a smart plug and I put in my BIOS to turn on the PC after recovering from power loss so if I need to turn it on while im remote I just power cycle the smart plug, I'd suggest you do something similar lmao, unfortunately my macbook I dont have that luxury
Only because Wake on Lan doesnt seem to work with my NIC for some reason I've tried everything
But, the power cycle is much more reliable
100%, my day job is fixing air conditioners & heating, and right now its like -30 in canada so I freeze my #!%% off on a roof top then go sit in my truck and warm up doing my personal projects haha
and right now im just staring into a frozen wasteland with no access to my MBP š¢
it has been a long cold winter here, probably the worst I can ever recall, I welcome spring/summer so much
Hey folks, I just opened an issue about the āSave as projectā button not showing up anymore. If youāre seeing the same thing and this is ruining your workflow too, please give it a š there:
https://github.com/openai/codex/issues/12682#issuecomment-3952744931
ME TOO!
You do HVAC in Canada?
I do HVAC in Washington, USA
I use an app called Amphetamine on my mac so it lets the screen go to sleep but keeps it awake for remote connection. Could try that!
My hearts always been in computers, but fell into the trades and make good money doing it so whatever, and I've been told once you make your hobby your job you may lose interest, there's probably some truth to that lol
Its plugged in 100% of hte time and as long as its plugged in I got the power settings to let the display turn off but not let the machine sleep so its definitely not that, unless my cat somehow knocked the magsafe power cord off it
Oh well, I will have to use claude in the cloud for now I guess and just PR when I get home
thats it, I have a PiKVM for my MacMini so it can sit headless, I'm making one for my MBP lol
Gotcha! I set up a wireguard mesh on Fly.io and use ssh over that. Sometimes use screen sharing though, because macOS likes to popup "Terminal is trying to access some folder (Accept|Deny)", so my agents will be working and suddenly they get frozen because of a stupid GUI permission popup lol, but other than that it's just ssh/mosh
Sounds similar to what I got I'm a tailscale fanboy though lol
Oh yeah I think that uses wireguard too! I just like Fly.io because you can deploy server apps on there
gotcha I've never even heard of it, I'll check it out
I wish you could setup self hosted runners on codex cloud so that you didnt incur the 6x charge of usage or whatever it is
yeah same: https://github.com/openai/codex/issues/12674
is there a way to use my root claude md file as the global instructions apart from using symlink?
You could make a hard link to ~/.codex/AGENTS.md which is the canonical global AGENTS file
like referencing CLAUDE.md in .codex/AGENTS.md ?
I mean let's say you have $CLAUDE_CONFIG/CLAUDE.md. You could do ln $CLAUDE_CONFIG/CLAUDE.md $HOME/.codex/AGENTS.md
Now any edits you make in CLAUDE.md immediately change AGENTS.md. It's not a symlink, so it behaves as a real file. And Codex loads $HOME/.codex/AGENTS.md on top of all of your project's AGENTS.md
ah i see
I am going to lose my mind, my MBP at home wont connect, I went to restart tailscale after adding a new subnet, it wouldnt come back online cause my auth token was expired, which resulted in my PiHole breaking, which means none of my dns would resolve, and then using UniFi teleport to get in to fix it all my work laptop blue screens
Did you try turning it off and on again?
lol
Couldn't resist
UniFi š you use Teleport as a backup for Tailscale, I use Teleport as a backup for Fly.io WG! That's awesome ngl
Saved me a million times lol
If teleport wasn't so buggy (roaming in and out of cell service) I'd use it 100%
So many times I carelesly restart my piholes or tailscale server and suddenly nothing wants to work anymore
Yeah, its definitely very niche use, like my emergency access vpn lol
its nice that it supercedes all the firewall rules too
Really is good damage control
How do you like the dreamwall? I have a UDM-SE
amongst other ubiquiti devices
And it's not a subscription! The dream wall is awesome! I didn't have anywhere nice to put a rack mount, so it hangs up on a wall with the wires going up into the attic so it's in conditioned space
Nice, this is my mess, I keep telling myself I should do some cable management but Iāve been too lazy
Old photo but not much has changed, except I have a macmini that sits on the backleft NAS now instead of a fan lol
Flippin sweet setup! I have a pi but all I use it for is plex
I got one that strictly just runs home assistant, one that strictly runs docker containers pretty much, and one that does some server stuff like hosts my tailscale and has a backup pihole on it in case my pain one goes down it fails over
and two zero 2 w's that just run PiKVM
You know what's funny? I was thinking about getting a Hubitat Elevation, and setting up Codex with a MCP server on the pi so I can be like "hey codex, is my garage door open? Can you close it for me?" lol
HomeAssistant just works so good, I would just use that over anything tbh
"Hey Codex, tell me superheat and subcool on my heat pump" š you can create drivers for hubitat to talk to any kind of sensor or device
lmao
Charge to subcooling, notify me when its done.. (rip open the gauges and walk away)
Oh man, knowing Codex it'll probably be like "I will add charge for 10 seconds and then check" "Oops, superheat is now -1 degrees"
"I will utilize the atmospheric recovery tank to stabilize system" "Hey user, please place charge hose inside water bucket so I can continue charging process" š
lmfao
why codex is getting 403 error while trying to install tanstack/react-query?
Itās concerning the biggest military concern in the whole world āreliesā on a chatbot for their stuff
The models used by the us Govt. are not the consumer models we get lol
they're closer to the labs internal use live frontier models which have no safeguards and are significantly more capable than the models we are all using
For example OAI will flat our reject certain prompts or tasks it deems as a security threat or reroute you to a weaker model that isn't capable of doing the task wheras the versions they and the govt has access to are unrestrictred
Anthropics models were also confirmed to be used during the venezula raid which lead to the capture of maduro lol they decided to increase limitations after that become publically known that the US govt was using them to save face and maintain their brand image of being the "safety first" ai lab
Not really codex related
When is GPT 5.3 comiing out, they are just sitting on it for no reason at this point
why can that happen on codex login?
5.3-Codex is really good at not getting stuck in acknowledgement loops. Older versions used to send acknowledgements back and forth constantly. 5.3 decisively knows when enough is enough. Have had 6 hours of uninterrupted coding work with this
yeah I just built a 4 level orchestration building software using actor model. its insane. 21k loc of extremely clean minimal code in one shot and it actually worked the first try. claude would have output >100k loc without a doubt
I was able to use spark to do the final impl swarm, no issues at all
on medium
I don't usually get one-shot success for me, but I have rigorous code reviews and testing in place so they eventually succeed
yeah its nice being able to simple create a skill to tack on to the quality gate if it doesnt compile... okay fine create a skill that tries to clean up and compile up to 5 times, before giving up. codex can basically orchestrate itself
I think this can solve problems that are actually quite complex now. This one only had 12 actors. but I still had tons of context on the main orchestration final builder task. 25 actors would be no prpblem I think. maybe 50? who knows. of course it grows On^2 so hard to say exactly where it would break.. but could compliment this with actual graph tools if needed
How are you doing with your app server experiment? are you controlling it from ios?
Exactly! And ever since OpenAI introduced "waking up codex to check on long-running processes", Codex has been going like "this command is hanging, I'm going to terminate it now" when in fact the command was not hanging. So I configured a MCP they use that prevents them from "checking on it" lol. I'd rather them hang forever than spin their wheels troubleshooting impatience
just to show it worked
oh yeah! ignore the double back button lol but it works great
I expect aoi will release something like this. its the natural evolution
be able to select cloud or local from ios
that's awesome tho
like REALLY cool
I set it up so the macOS app generates an auth token+SSL Cert and stores it in iCloud keychain, so when I open on my phone it already has the auth token and connects with HTTPS+WSS over wireguard VPN. Only use cloud for code reviews
its perfect. have you tried oh-my-opencode for orchestration?
I haven't tried it! I tried regular opencode a while ago and I just always gravitated back to regular codex-cli.
me either. I heard good things. I actually like GLM-5 also, but its super SLOW. But if you can ignore that I find kilo + GLM-5 is a good combo with some patience.
I agree codex is untouchable right now.
I am on a grandfathered plan with GLM so its worth it. with the current speed not sure its worth it to buy a new sub though. if they fix the speed, then perhaps (price went up a lot)
yaaay a ship
Soft launched or was there an X post
I want a damn realistic comparison to Claude & Gemini
All the benchmarks exclude 5.3-codex for some reason
Gemini released benchmarks and totally left the 5-3Codex column pretty much blank
They waited until I bought credits šš¤£
Now consume those already so I can check actual token usage using the api
I bet itās steep
It was excluded due to a lack of API
Ahhhh
Hopefully theyāll be run and posted shortly now that it is out
I posted it already
Codex devs shared same one on their x
Please bring this feature to codes
Can you find it for me š
We need this! Control codex from iPhone app
ngl i want codex 4 back
wot
codex 5 just been pissing me off. The reviews are irrelevant half the time, and the planning is massively overdone for what is actually needed
like i dont need a simple task to turn into a 8 step over engineered solution!
and it seems to be defaulting to this now
so annoying
My Codex MacOS app just got updated to 26.224.1209 but can't find a changelog. Anyone knows what's new?
Probably windows optimization for windows app
The app uses the CLI too? Is it built on top of it?
I thought it was it's own separate client
codex app-server
no idea what that is
So, the Codex App runs the CLI, and it fires up the server?
It runs the CLI as a subprocess, yes. It outputs JSON payloads to a terminal that VSCode extension & Codex App reads, and those clients send JSON payloads as input back to the app server to indicate stuff like "user sent a message" or "user created a thread in this directory with these sandbox settings and configs"
I see. So it must be the work on the server, that is requiring so many alphas
Thank you
I guess the Codex App for Linux can be done, now that the server is here
Yep, they have an experimental "websocket" mode they've been working on. The likely outcome is connecting remotely, so no virtual terminal necessary
Oh yeah, Codex App for linux has been possible for as long as VS Code extension has existed! It's a little known secret apparently
I honestly thought that the VS Code extension was also its own separate client š
Don't worry, I've been misusing codex exec for several months under the same presupposition
Now the question is... should I wait for OAI to make a Linux app... or should I build it myself? š
Just do it š
I already did a Codex Configurator... I guess I could try
It's not as if I am going to be the one writing the code š
codex app-server generate-ts
It exports the entire API for you! Codex just translates/uses it
OpenAI did the whole world a favor and made building a Codex client using Codex as easy as possible
jfc, it is impossible to keep up with this people
Bust open the CLI and try to resume conversation, or go /Applications/Codex.app/Contents/MacOS/Codex in a terminal, and when you get an error copy&paste the error here (or if better, copy paste into an issue on github.com/openai/codex)
thanks! gonna debug codex w codex (:
it was a config load failure that crashed the app but not the cli... came when i updated the codex app. i had agents in root codex + repo
@tawny island I once made a contribution with Codex, using Codex. Then used Codex to make the PR, and it got reviewed by Codex. The only step that was human made was approving the PR
And no, I do not even know Rust š
Does anyone know how to set up the config file so that this is always Full Access?
lol the main blocker for software dev now is getting codex unblocked + understanding intent.
I cant write code and I just developed an app/ai agent in 6 months from knowing nothing about programming and now I work fulltime as fullstack dev... my blocker is literally just having codex being able to work in agent loops with clear ways of checking its work and reviewing + fixing in a loop then i just review and merge, or clarify and redo it from scratch if it was skill issue from my part.
I thought it might be "approval_policy" but it is not, apparently
i have an alias for codex that runs in yolo mode
this?:
approval_policy = "never"
sandbox_mode = "danger-full-access"
i just bashrc it => #codex-discussions message
I just went from 25% usage left on the week to 0% and i didnt even interact with codex today
Just... WIPED
anyone else experienced this?
https://x.com/noahzweben/status/2026371260805271615
new feature from claude
Does anyone have a good workflow for running codex review and "Fix all issues found" in an automatic loop until the review comes back clean?
Found it:
sandbox_mode = "danger-full-access"
approval_policy = "never"
i have workflow where it orchestrates agents to execute a living execplan and then drive QA with browser then code review/fix in a loop till code review no longer flags any p0,p1,p2 findings 2 runs in a row
you can modify https://developers.openai.com/cookbook/articles/codex_exec_plans/ to have that an execplan is not done until it has delegated code reviews and succeeded with that I mentioned
then it will just loop
Anyone else having the constant problem of the code reviews suggesting fixes that align with best practices but make the code too bloated for prototyping and iteration?
Nice! Is that an orchestrator you wrote yourself?
You can just codex --yolo on launch and it will do the same thing, unless you're in the App, not sure what hte context is here but just fyi
is anyone have problems running codex with the new .18 alpha ver?
I found that telling codex to run codex review itself doesn't work great - review takes a long time and it ends up eating up context with the polling action. I think I need an external revew+fix loop driver. I'm debating building something simple on the new app-server api, or maybe just driving the tui with submitting /review\n commands and scanning for some exit condition output like 'No issues found' in output.
I would strongly advise keeping --yolo explicit. It's not something you want to forget you've configured, if if codex is well sandboxed. I have a sandbox which starts codex with --yolo automatically, but I often start other codex sessions within the sandbox, for instance, you wouldn't want two codexes having those permissions running simultaneously.
I live on the edge... of my risk budget. š
no codex wrote a script that is mandatory to run through gates
lol... asking im #codex-discussions if someone "wrote that himself" š
haha. Codex did the typing no doubt, but I think it's still right to attribute authorship to the human driving the tool.
What do you guys think of Antigravity vs codex + playwright?
can you share what that script looks like?
Has anyone found a way for automations to have network access?
its quite many and repo specific so idk how to share it.
but i would start a chat with codex and tell it to automate whatever you want it to by using this as inspo:
https://github.com/am-will/codex-skills
https://openai.com/index/harness-engineering/
https://agent-browser.dev/
(plans.md)
and give it a flow of how you want to interact with codex and what you want to achieve, be very clear that you do not want it bloated and overengineered. then reference codex docs for uptodate info on how codex config works etc.
it can easily become overengineered since its not its training data i guess, it can mess up things with scripts and how to orchestrate codex when doing gates and such bcs it gets complicated quickly
then keep a short agents.md. this has got typos though and "## Autonomy defaults ā ļøFOLLOW THISā ļø"
is probably bloated
agents.md is now being recommended against.
Well, to clarify further, itās recommended against generated ones. Hand written and well targeted documentation is still helpful. But very little given how capable these new models are. Bad advice will certainly lead them astray and good advice hardly boosts performance.
never thought I would see the day but codex 5.3 is finishing what opus cannot... might have to switch the company over to OAI
is opus/sonnet 4.6 doing better than codex
was it worth it?
I have yet to deploy
Vibe-to-production will be the new CI/CD soon. š
on pro gtp 5.2 is unlimited (subject to abuse)
Is that in codex as well or is that only in the web chat?
I have some documentation. I need to update semi regularly and I use 5.2 for it cause it's a little bit better at the documentation than Codex itself. I don't overuse it or automated or anything. I'm just wondering if this is coming towards my quote or not.
If you use 5.2 inside of codex, yes it takes usage, but if you're on a plan anyway, just copy and paste into GPT and make a project where it's instructed to behave like codex. iteration is virtually free in gpt itself
ok thanks for the info
Here's a fun one. If you put GPT into dev mode, create an app (which is just mcp) have the server point to a local endpoint with tailscale or something, and then you can have GPT call Codex via MCP. That's a fun one for when you're brainstorming and want some files edited or context thrown somewhere real quick
yo whered my 5.3 codex go
u gotta log in again. known UX issue
yes I am going to try and get it out and see how it goes
bruh why did my codex limit suddenly hit 0 after i opened the extension
I had 70% usage left yesterday
Was there a visual bug in the extension
my rate limtik kept goign downa nd coming back up by 5-6%
so was i actually consuming everything
Is the codex app available for linux and windows yet or should I go back to sleep for another week?
just experienced it hahah wentf orm 70 all the way to 0

