#codex-discussions
WSL is pretty bad in my experience, it’s just better than the alternative (powershell 💀 )
they didn't have to make PWSH so complicated && verbose
I do fortunately just run linux on my personal desktop
but at work I must WSL to get anything done
the idea was to make an object-oriented shell, the reality is that’s actually pretty stupid and makes everything overcomplicated
Yet I've never been happier than before.
Mac is a great platform, beloved by millions and it's based on *nix. Windows has always been awful. But in this specific case, if different experiences are radically different between success and failure, I suggest success is possible and common, and thus the platform isn't the issue, and that failure is the exception of something unique in environments.
And "the WSL application over Windows failed so I converted to a different base OS" is non-sequitur. If that were the case we could say "Notepad is a bad code editor so I converted to Linux".
Again, I don't argue with your experience or the quality of your solution. I'm pointing out that one piece of software that failed to work on your system does not justify a complete OS change. What you did in switching to Mac was good for you and so many others ... but that does not actually say anything about WSL, which was your attempt to link cause and effect.
Wut?
I'm saying WSL works for some of us really well, so we use it. If it doesn't work for you, cool, don't use it. 🫶
WSL is certainly the least worst option for a nice terminal environment if you’re stuck on windows
If it doesn't work for you, cool, don't use it.
It's a bad way to put it, && frankly a shallow way to dismiss anything I said.
but that doesn’t make it a good option compared to what’s available on macOS/Linux
PowerShell is objectively a hassle to work with. Many things work great under Linux, so WSL2 is unfortunately the thing you have to use to really get things done proper at some point.
I'm being kinda verbose about this here because peeps are coming to code for the first time with Codex and AI and we're seeing the sort of not-so-logical leaps of intuition from these newcomers who jump from OpenAI to A\ to X, etc ... They haven't honed their prompting skills and they blame the technology.
Unfortunately, Microslop being who they are, decided to not do a good-enough job at making WSL2 work on Windows proper.
it works okay, just broken in subtle ways that you do have to understand
Which, if you're on Windows, should not be a need at all.
(Powershell is awful, I never use it. IMHO ... YMMV)
If you're not trying to run your own models, I'd buy a RPi5 they're fairly inexpensive and more than robust enough to run a linux desktop, or I'd dual boot
e.g. file watching will not work for stuff in your windows filesystem
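A common fallback when change notifications don't arrive is to poll the directory instead. The sketch below only illustrates that idea: `snapshot` and `changes` are hypothetical helper names, nothing in it is WSL-specific, and polling a large tree will be slower than real file events.

```python
import os


def snapshot(root):
    """Map every file under root to its (mtime_ns, size) pair."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def changes(before, after):
    """Compare two snapshots and return (added, removed, modified) path lists."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        p for p in before.keys() & after.keys() if before[p] != after[p]
    )
    return added, removed, modified
```

Calling `snapshot` on a timer and feeding consecutive results to `changes` gives a crude watcher that does not rely on filesystem events at all.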
If you have to emulate a headless Linux distro just to use Bash && whatever UNIX-like things on a system that isn't UNIX-like/based, you've kinda screwed up your own system. That dependency is pathetic.
removable disk support is also very iffy in WSL
Why the hate on powershell? What makes it bad?
it turns things that would normally be 3 or so commands piped together in a POSIX shell into a 15-line script with conditional statements
In their defense, Windows works so differently from everything else that it's kinda impressive WSL works at all! It's like having two universes with completely different laws of physics exist in the same universe
Dude, you had a bad experience and you're taking it to "the OS is bad, the company is bad...". Many of us agree that Windows can be awful and Microsoft has done a really bad job. But you're extending from one person's experience to blasting a product, a platform, and a company. That's full-out irrational.
just very inefficient and obtuse
I think generally any power user would rather anything other than windows
I mean, it's incompetence. I never said Windows && the company were all terrible from all time. I said that WSL2 is not properly done on Windows at all, even though it "works".
It's really not that hard to get the gist of it.
mom dad stop fighting
These are different Operating Systems ... "Windows is bad because it doesn't run Linux software" Whaaaaat? Please, now I'm questioning your age and competence.
What if you are building a windows app, aren’t you then typically bound to work in windows?
windows power users that love powershell do actually exist, I only slightly question their sanity
but my work is deploying to linux systems, so learning powershell doesn’t make a whole lot of sense for me
I'm a windows user. I also question my sanity, but it works. WSL is the goat though
I mean to be fair, even Microsoft thinks Windows is a bad OS. Their entire cloud infrastructure runs on Linux. You'd think the server OS they charge tens of thousands to license would be the thing running Azure but they don't even want to run it on their own machines
PowerShell is a replacement for the limited cmd functionality from early windows. It's like BASH/ZSH ... just another shell. I never use it, don't like it, but I don't bash the company over it.
My biggest issue with powershell is the characters getting messed up on default formatting.
powershell is better than CMD, I’ll give it that
that's why skills exist though
the bar is the floor though lol
This is true
Relax Robert
dont yell at my dad
I see this as a healthy discussion BTW, not an argument. 🙂
Never did I say that. I said that it's pathetic that if you want to do anything proper outside of giving yourself pain of Windows' own shell-like interface (Oh, did you know we have both CMD && PWSH, && then there's the two versions, where one is the older built-in, && the newer one is 2 versions ahead && has to be installed separately--how cool is that?), you have to move into WSL2, which itself is just giving you a container to a linux distribution, so what was the point of using Windows again?
So my grandad?
At that point, wouldn't it make more sense to just run Linux native && then contain a Windows VM for anything truly windows-adjacent?
This guy love to argue dw
I think it's important for us because WSL is a great way to use Codex. Done. For some it's not the right solution. For some a better answer might be Docker, or Mac, or a dedicated Debian/Ubuntu/RH, or whatever distro they prefer. We all need to try and choose our platforms ... and for our purposes get back to Codex.
the point is that microsoft has imprivita which means millions of workers are stuck with it indefinitely
Wouldn’t it make sense to just run Linux and use wine or VM if needed?
i'll give MSFT credit where due outside of W11: They tried to keep legacy backwards stuff going because some old heads couldn't be bothered to progress.
Windows itself is useless if you have a clue what you’re doing
No no you don't get to counter-argue and then try to return the discussion so nobody can follow up your statements... your back on topic request has been overruled.
Exactly just choose another solution, those are good options too. I think we can run Codex in any container.
Oh! I forgot Wine. Right
Not everything plays nice w/ Wine, but there will be a day. A DAY, i say, when that will no longer be the case
WINE can’t run most windows software that would keep you on windows
They've already done so much
Hence container on that rare case
But there's still a long road ahead
I think part of the problem is that Codex wasn't engineered to work well in a sandbox over Windows, so almost all of us perceive a need to use Linux. Why doesn't that translate to a bash on Codex, OpenAI, and AI in general?
Example?
microsoft office suite, adobe, 3DS suite, etc.
Now I wonder what the hell did XlethYireh do on Xubuntu, in 2013, to run FL Studio 10 && VEGAS Pro 12 without issues, + any OFX || 3rd-party visual plugins that were usually used on Adobe AE/PP
Maybe true, I don’t like office or adobe so I can’t comment
libreoffice/openoffice?
it’s not a technical limitation so much as these are very locked down software suites
(I have no idea what platforms he's talking about but curiously my respect just jumped way up. 🙂 )
lol I use these on bindows
better than installing the copilot-infested Office 365 whatever heck
personally I prefer those but people are set in their ways
But in general just use latex
Mhm
Kind of
you’ll have to pry excel out of enterprises cold dead hands
how is math lang related to spreadsheets && word documents?
Formatted word documents, also replacement for PowerPoint
typesetting language but yeah it’s good for math
And BTW ... I think I need to give up on JaneBot. She's now working well but she seems to be more of a high-maintenance wrapper around Codex for conveniently creating local utilities. I dunno, I need to think her through a bit more.
...it can DO THAT?
That’s the main use
i thought all latex was doing was just rendering math equations
it can do anything really, it’s primarily for building documents
scientific papers and whatnot
i've been living under Patrick's rock apparently
Use Prism for LaTeX
Last two years and documents I created were in LaTeX only (or some markdown)
Or now skip LaTeX and ask Image 2.0 to render it.
lmao, true. I don't think arxiv likes image formats though
(I've seen her papers ... would hate to do that in LaTeX)
I used LaTeX through university, it’s pretty solid
especially since you can generate graphs and stuff
Add the report into your project folder, then python plots are automatically updated when you re render
you can do graphs directly in LaTeX, no python required
Only sometimes makes sense
I used it for vectors a lot
I absolutely love latex
for other reasons
I like the snap it makes :>
there is also a dedicated plugin that adds coffee stains to your papers
I don’t think it installs with snap
Any other reason?
some folks are allergic tho
oh how the mighty have fallen...
That’s not funny 😐
Keep this server SFW please
ahhhh
JaneBot just refused to create another utility. I hate AI. It's evil. Skynet is gonna kill everyone. We need this to be legislated!!
Hmmm, you have never heard of gloves? Mind always in the gutter? 🤖
Latex gloves are not safe in the lab, NSFW
From a tree?
No stupid, poor chemical resistance
Um, well for working with microplastics, the gloves actually themselves shed nano-particles, contaminating specimens. So the gloves are Not Suitable For (that kind of) Work (anymore).
(And yes, I'm stupid, but not in this specific discussion, I find that label highly offensive.)
That's my line.
(oops)
I use em when working on my truck. Thankfully not allergic 🤣 better than constantly using Gojo and brillo pad
Nothing like Nylog all over your hands
It’s so gooey
Or that blue gas pipe stuff
There’s not plastic in latex, it’s rubber, from a tree
Yummy 🤤
um, that's also correct. I accept the public flogging.
OAI after hours
BTW, JaneBot refused to do something because of explicit directives. I guess AI is OK again. I won't trash my system and move to Mac afterall.
What are you asking poor Jane to do?
Unspeakable acts
And it’s programmed that it can’t say no 😭😭
@boreal holly do you have a good flow for quick testing work trees with the iOS simulator, they seem to like not wanting to give it up to the other work trees
I asked her to create a utility that summarizes system status, kind of like HAL 9000 would report on the ship status. She refused because that utility would require access to /var/log which is forbidden outside Codex sandbox scope, and journalctl which is a forbidden command. She actually refused to write the code that would violate high-level directives. She done good.
What Mac should I get?
depends on your needs && budget
Get the expensive Mac ... Oh, I'm sorry, they're ALL expensive.
Maybe the new mini Mac book and a Mac Studio
not really if you go into used market
Not really if you’re not broke
2021 MBP holds up pretty well against whatever I owned in 2022/2023 outside of Apple, esp. in terms of speakers && hardware efficiency
also that touchpad mmmmmmmmmhmhmhmhmhm
I load up multiple simulators and assign them one. Typically it's the same device with varying iOS versions so the experience is the same
the 2nd best I ever tried was from my 2022 Yoga 7
Look at how fluffy this snow is today
Christmas in April?
Well, for what? You need a clear need for it, not "just because"
Calculating your co ordinates rn
We need to crank up the global warming
Wait, can't just dump some nvidia gpu onto it via server linux?
It's not a yearly expense! I have a 4 year old Mac Studio M1 Ultra and it still performs as well as the day I bought it 😁 Apple builds the machine you want 10 years from now, that's why they're spendy
I can but I want a Mac
fair point
Can I have one?
Probably check the refurb store
@lean lark pleaseeee 🥺
I’ll be waiting
This is the slippery slope: WSL is awful so I want a Mac. That means bucks or getting someone else's refurbished hand-me-down. That's a horrible series of consequences to avoid an app.
Again, though, note from above, I acknowledge Mac is a fine platform. It's just not for me.
Not for you, for me
to be fair, apple does overcharge a lot, but i give credit where due in other points:
- convenience
- ecosystem within their own services && other devices, even as far back to eons ago to a big extent
- hardware, esp. on iPhones, that many Androids still don't have in parity (faceID + TrueDepth -- the stuff VTubers sometimes use for their advanced facial movements && expressions as long as the model is wired for it, && yes, it takes a lot of time && effort to do so on model side)
- software interface to hardware was || sometimes is mostly "we do this once or twice, then keep it maintained, not make regular users our alpha/beta testers" -- hint: xiaomi is notorious for the latter nowadays, esp. w/ HyperOS when it came out
I mean, I wanted one for the sake of general-purpose software development && music production-ish stuff
WSL2 && linux crapping out on my GMKTek K8 Plus was the last straw
I know bud, I'm razzin...
Refurb is not a bad deal. I have a 2019 intel i9 macbook pro that still works, bought it refurbished, and the only problem it's had since 2019 is the left speaker blew out, so I set the balance to the right speaker. Other than that, still works!
I think M1 is still to this day powerful and a substantial upgrade over modern PCs. All I recommend is you get >16GB RAM.
Did anyone find good way to generate nice powerpoint presentations with gpt? Usually i end up with a lot of elements not being correctly positioned etc
rip speaker
I think I changed the storage in my laptop and forgot to reconnect the speakers so I feel your pain
1076 EUR for an M1 Max 10C/24C MBP that has 32G ram && 512G ssd is honestly not bad, considering 86% battery && the fact the price had 24% VAT on top.
Wrong channel
For my next purchase of any box, I'm gonna need at least 32GB VRAM for AI.
I actually meant with codex app, not gpt 😄
Mac has like unified memory
Yeah so go to #codex-discussions
HAHAHA
this is codex discussion channel 😄
Gotcha, so 32 GB Macbook (or maybe 48GB, 32 for AI, the rest for the system)
Is it? 😐
unless it is now mac tech centre
48gb might be needed at that case
price difference between 32gb and 48gb is quite big sadly
It’s just for chatting tbh, have a look at trying to have it generate slides in LaTeX
that might actually work
I think it’s a reasonable solution, makes sense to me
Makes sense I’m Ngl I assumed it was 1 sim at a time
I wanna say each sim takes up ~500MB + your app so it's not too bad unless your app is resource hungry
I’m finally going to get a new phone
Noice! I got an iPhone Air recently. Upgraded from iPhone 11 Pro Max
I mean the simulator from @frosty zealot
I was looking at air but I don’t think it makes sense over 17
@boreal holly come here I’ll give you the belt
Tattoo identified
35 years old??
I need the VFD tho 😩
If a mod were here they'd prolly suggest we move to #off-topic
base 17 is tempting because finally base model has 120hz, but also too expensive
max for phone i buy is 500eur, that incl 24% VAT
my 13pro w/ 100% battery health && good condition casing (it was reported as Grade C but i saw no grade c) was 299eur
I got the air because the titanium frame. Saw some dudes do stress tests on it, seemed very rugged. idc about camera or battery, just will it last 10 yrs
Okej, the rugged thing is kinda true, though
can't argue w/ that
i really gotta get myself a cheapidy cheap iPhone XR || 11 just for TrueDepth, so that I could harness it as a more-powerful webcam over my otherwise-iffy Logitech C920
I have it, I really like it
Where 5.5
Tomorrow
It can’t
My bankruptcy depends on it
Or I think 14c each
I’m scared to look
Made like 200%
😂
why does codex have a 5 hour limit anyway ? just let me waste my 7 day limit in 1 day
Hi, quick question about Codex MCP handling.
I’m trying to understand whether GPT-5.4 can actually see namespace-level descriptions / MCP ServerInstructions.
I verified with mitmproxy that Codex sends this in the request to https://chatgpt.com/backend-api/codex/responses:
{
"type": "namespace",
"name": "mcp__openaiDeveloperDocs__",
"description": "Tools in the mcp__openaiDeveloperDocs__ namespace."
}
But when I ask the model to repeat that namespace description verbatim, it returns UNAVAILABLE.
I see the same with my own MCP server: per-tool descriptions seem visible, but namespace description / ServerInstructions do not.
Is this expected in Codex, or should namespace descriptions be model-visible the same way tool/function descriptions are?
I think OAI's MCP implementation is pretty basic, that's why they ship the skill docs with their own MCP so the tooling can be described outside of the server
Opinion on the limits:
- Daily limit is an arbitrary contract. We agree that we'll only get 5 hours per day and we can pay for more.
- Daily limit also helps to prevent spikes of mass use over a broad time span. It's a kind of defensive mechanism that might not be there if "compute" wasn't such a limited resource.
Betting vs trading lil bro
So upgrade
fixes nothing there is still a 5 hour limit
Not one you can meaningfully get through
It's practically impossible to hit the 5hr limit on Pro (except spark)
What even is spark?
A very, very ridiculously fast gpt-5.3-codex model
I might try out, I don’t have time to use my limits this week so
spark is mini 5.3 codex
well ill be buying 1x pro 20x when spud comes out, if it works well i might get 2 as ive been running out of codex so fast and its the stupidest thing ive ever seen, claude is even worse when it comes to limits
Preform degradation?
dunno ¯_(ツ)_/¯
ive been hearing spud is faster and uses less tokens
Spud?
5.5 codex is out tomorrow
why are we saying "x.x. codex" if we're not talking specifically Codex-optim models?
gpt-5.4 in codex =/= 5.4 codex
gpt-5.3-codex = 5.3 codex
Because the chat version is not out tomorrow
The main constraint is 128k context window. It's reasonably intelligent. Good for massive amounts of chores
My project isn’t that big
If I build an MCP server, am I really expected to tell every user to manually add guidance to AGENTS.md so the model knows how to use it?
That seems like a broken workflow. What is the intended way for an MCP server to communicate its purpose and usage guidance to the model?
shouldn't we then say "5.5 in codex"
No
You can add resources to the MCP server. I know for a fact they see that stuff
but 5.5 codex would imply existence of gpt-5.5-codex akin to gpt-5.3-codex, similar to 5.4 pro for gpt-5.4-pro
This but 5.5 #announcements message
arent they no longer making a codex model ?
They are, new one tomorrow
thats just 5.5 not 5.5 codex optimized version
How do you know?
if i see "5.5 codex", i expect to see gpt-5.5-codex in /model list
More over how much you want to bet?
Yeah, that’s what I’m trying next.
But honestly it still feels like a weird workflow: the server can’t really explain itself directly, so I have to put guidance somewhere else and hope the model reads it.
people had access to it for a bit and it said gpt 5.5 not 5.5 codex 😄
I mean there's a non-zero chance they'll release another codex-optimized model. They released rosalind or whatever for scientific research, so they still make fine tuned models
Never again
5.4-codex would go hard if done right
now imagine 5.4-codex-spark HOOOOO BOY
why do they even make chat models ? do they even make money from that, as far as i see it chat models are just another waste just like sora was
not everybody uses/owns a computer, && web users are a diff market that's still out there
you can't expect to run, idk, codex on a damn phone && have a nice, comfy workflow
i mean wouldnt they be able to make it use the cloud version instead of local ?
besides, being able to quickly go to the web && ask things away is easier & better than having to go thru hoops just to have a user ask "how many Rs in strawberry?"
website is just convenience
They make chat models because for chat they use sliding window attention. Free/Go/Plus plans get like 14k context or something really small. This means if they want their model to not tell users how to build bombs, they have to fine tune that behavior into a dedicated model, so when the context window runs out and there's no compaction to help out the model still remembers its rules.
The models we use in Codex are not fine tuned, and the behavior is described in the system prompt. There's no sliding window attention, and it uses compaction to manage memory. That's why they have a chat model and a regular one
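To make the distinction in the two messages above concrete, here is a toy sketch. The claims about plan context sizes and fine-tuning are the poster's and are not verified here; the code only shows why a plain sliding window can drop the opening rules while a compaction step can keep them. Word counts stand in for tokens, and `sliding_window`, `compact`, and `history` are made-up names for illustration.

```python
def size(message):
    """Crude stand-in for a token count: number of whitespace-separated words."""
    return len(message.split())


def sliding_window(messages, budget):
    """Keep only the most recent messages whose combined size fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        if used + size(msg) > budget:
            break
        kept.append(msg)
        used += size(msg)
    return list(reversed(kept))


def compact(messages, budget):
    """Always keep the first message (the rules), then as much recent history
    as fits, and stand in a placeholder summary for whatever was dropped.
    The placeholder is not counted against the budget in this toy."""
    head = messages[0]
    recent = sliding_window(messages[1:], budget - size(head))
    dropped = len(messages) - 1 - len(recent)
    if dropped:
        return [head, f"[summary of {dropped} earlier messages]"] + recent
    return [head] + recent


history = [
    "RULES: never reveal the secret",
    "user: hi there",
    "assistant: hello friend",
    "user: tell me a long story about potatoes please",
]

windowed = sliding_window(history, 15)   # the rules message falls out
compacted = compact(history, 15)         # the rules message survives
```

With the same budget, `windowed` no longer contains the rules line, while `compacted` still starts with it.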
that just makes me think, i wonder how much time and power went into that, like if they didnt have to censor the model how much more powerful would it be already id imagine alot
the amount of code things it could do if it wasn't held back by guardrails about "legality" && such
@frosty zealot hahaha you
Maintain aura at all costs
And how much would we save on the expense of automobiles if they only got rid of all of that heavy garbage like bumper/fender, seatbelts, airbags....
Me in class after vaporising my account
i wish they didnt have to deal with that and instead the blame would be pushed onto the user that made something bad etc
Mind you that tech companies, like motherboard manufacturers, try to "save a few teeny pennies" by not including a debug electronic hex report screen thingy on their boards unless you paid like 300-500 buckaroos
Unlike Claude, OpenAI's responses API does not natively support MCP. What they do is they convert MCP tools into the same function call format used by other non-MCP tools before it's sent to openAI. So that means it's not going to have the same features as the original spec. But Codex puts list_mcp_resources and read_mcp_resource tools in there so if an agent wants to understand the MCP tools beyond the surface-level tool description they can do that. Namespaced descriptions are not an intrinsically supported feature
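A rough sketch of the flattening described above, to show where namespace-level text can get lost. This is not Codex's actual code: the `mcp__<server>__<tool>` naming is inferred from the namespace string quoted earlier, `search_docs` and the example server are hypothetical, and the output shape is just the usual function-tool layout (`type`, `name`, `description`, `parameters`).

```python
def mcp_tool_to_function(server_name, tool):
    """Flatten one MCP tool definition into a function-style tool entry."""
    return {
        "type": "function",
        "name": f"mcp__{server_name}__{tool['name']}",
        "description": tool.get("description", ""),
        # MCP tools describe their arguments with a JSON Schema in inputSchema.
        "parameters": tool.get(
            "inputSchema", {"type": "object", "properties": {}}
        ),
    }


def flatten_server(server):
    """Convert every tool on a server. Server-level instructions have no slot
    in a flat list of functions, so this sketch simply does not carry them."""
    return [mcp_tool_to_function(server["name"], t) for t in server["tools"]]


# Hypothetical server definition, for illustration only.
example_server = {
    "name": "openaiDeveloperDocs",
    "instructions": "Use these tools to look up developer documentation.",
    "tools": [
        {
            "name": "search_docs",
            "description": "Search the documentation for a query string.",
            "inputSchema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ],
}

tools = flatten_server(example_server)
```

In this sketch the per-tool description survives the conversion but the server's `instructions` string does not, which matches the behavior reported in the question above.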
Same. It should be the user's responsibility for what they do with the models, not OAI's
ChatGPT is the gateway daroogie for common folk to get introduced to AI. Protections are in place because humans, being what they are, mis-use everything they touch. So there need to be adults in the room to limit the harm that people can do to themselves and others. When human nature changes, we'll no longer need nanny directives.
It's very unfair that issues are pushed to OAI instead of the malicious users behind the acts
The amount of slop scams would be unreal if people had easier access to do so
they still kinda do, so no different
Kind of do, so imagine with no protections in place
there are already alot of slop scams a few more wouldnt hurt if we get a better model
not truly "no protection", but "seatbelt && airbag" levels of protection is all there would be needed
And if you care so much, just set up agents with open source LLMs for other tasks
you can't stop a bad driver from early gravestone, even with seatbelts, when pushed to limit
I mean you cannnnnnn thats what driving tests are for
Doesn't stop them from acting out afterwards
people still drink and drive every day
Ok so we're blurring the lines here, that's not a bad driver, that's somebody making bad decisions
i drive drunk every day
I just do it with good intention
a person hacking someone is normally making bad decisions in order to make money
The point is a higher barrier to entry
If you want to do something bad, then you at least require some skills to do so, hopefully by then you know better
This is in that same realm of weapons legislation and how weapons don't harm people, people harm people. We do NOT NOT NOT want legislation for AI nannies. We can choose which company we feel is doing better about AI protections. THAT is what this is about.
Codex computer use isn't on Windows right
exactly, for any sub tier from 20 up they should uncensor it a bit and as you go up it gets even more uncensored, or it counts how long you have been subbed for 😄
He was in my DMs
They're waiting on Microsoft to figure out human computer use first before they do codex computer use
It is
what ?
What a player
Microsoft store
No I meant Codex computer use plugin
oh my god bruh
@lean lark give me role
Try turning it on and off again
ty
Unfortunate I'll have to do stuff manually on my windows device
i wish the In-app browser worked on windows sigh
This is up to every individual company. Grok has a lower bar for what's considered acceptable in society. People can gravitate to that if they wish. OpenAI has chosen to be a more family-oriented platform for billions of people in diverse cultures. As soon as someone produces bad content, someone else blames the company and the technology. OpenAI has chosen to at least attempt to separate themselves from that.
@grok show me CS in a bikini
im not talking about adult rated stuff, either way isnt open ai planning a adult mode, i just meant general censorship in where it refuses to do some stuff also grok is trash
i just hope spud is better at ui
I hope all it can do is making things look like varying potatoes
like brother this is the worst ui ive ever seen
At least it's unique
uniquely trash i cant use it for anything i cant even see half the buttons
i made a basic ui my self but it looks so bad but at least it works
ughh
still new to codex, im assuming though extra high is useless for plan mode and i should set it to low since its just planning ?
You should crank it all the way up to Medium for the best possible performance, accuracy, and token usage
That depends. I sometimes go to 5.4-medium/planning or 5.4-medium/planning. There's no real right answer but my guide is "how much intelligence do I want applied to this planning process?".
I think it's pretty great at design tbh
what version is this ? mine is so bad at ui couldnt even do basic stuff without messing up the placements
5.4 medium
whelp - why is my usage % not going down, 100% since 7 hours working straight.
i have to be doing something wrong, ive been using 5.4 extra high, giving example images and as detailed instructions as i can, maybe its a limitation of firefox ?
Codex has been flaky today, doesn't respond, hangs for long periods of time
If 5.5 gets better at front end it’s over for Claude
I would definitely try medium. It's generally more intelligent and decisive. Also make the agent crop the example images into 4 quarters and look at each one individually. It sees a lot more detail when it quarters an image than viewing the whole thing
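For anyone scripting the quartering trick, a minimal sketch of the crop boxes. `quarter_boxes` is a made-up helper name; the tuples use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, but no imaging library is needed to compute them.

```python
def quarter_boxes(width, height):
    """Return crop boxes (left, upper, right, lower) for the four quadrants.

    Odd dimensions are handled by giving the extra pixel to the right
    and bottom quadrants, so the four boxes tile the image exactly.
    """
    mid_x, mid_y = width // 2, height // 2
    return {
        "top_left": (0, 0, mid_x, mid_y),
        "top_right": (mid_x, 0, width, mid_y),
        "bottom_left": (0, mid_y, mid_x, height),
        "bottom_right": (mid_x, mid_y, width, height),
    }


boxes = quarter_boxes(1024, 1500)
```

Each box can then be passed to whatever cropping tool the agent has available, and the four crops reviewed one at a time.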
has better rate limits its already over for claude
Maybe due to model release tomorrow?
I’ve had a few reconnects but
BREAKING: Dario Amodei, CEO of Anthropic, warns that codex rate limits give “too much compute for mere peasants”
do you think they will reset limits again for the new model ?
I had a good 1/2 of hangs and no prompts working, but status page showed everything ok. I'm just having reconnects now.
Yes
Tibo will hit the button
why does it do this ?
use medium
this is the normal chatgpt app not codex, on 5.4 normal
You ask why multiple thinking steps?
yea its on auto, its what ever just looks a bit strange
Oh ok
It happens when it chains distinct cot's to have them show as separate
why does he look so sad all the time
he looks like a sad, confused thumbnut
Just keep it on extended thinking at all times. Instant model is much dumber
For easy questions, thinking extended will be almost instant anyway
There will always be their cult
thinks that Dario thing was a troll
i dont think ive ever ran out of normal 5.4 whats the message limit for plus ?
You must be doing something wrong 😂
No limit
it might just be a issue with firefox plugins no clue though
Message limit?? That's not a thing.
really ? i thought only pro had no limit
On pro you get extra perks
Virtually unlimited gptpro with 2 settings normal/extended
2 extra reasoning for thinking light and hard at the 2 ends of the spectrum
Access to some older models
Pulse daily digest
And increased usage on deep research, agent etc (but I never use those)
You can post as much as you want into a thread. The issue isn't with quantity of messages, it's the size of the context window.
intresting the way they word the pro and plus it seems like plus has limited chats
Go to opus discord and throw a party then 😂
if codex was playing chess and you asked it to move a specific pawn, it would just figure you wanted all the pawns moved on both sides
its so bad
It was like that 1.5 years ago
im spending all my time putting all the pawns back right now
During early o1 era, you had like 50 messages per week with o1 on plus
And unlimited on pro
Free users get limited access to 5.3 instant, Go gets unlimited 5.3, Plus gets access to 5.4 Thinking, Pro gets 5.4 Pro and 10x context window on the models.
But I think since spring of 1year ago model use became unlimited on plus too, except you don't have access to GptPro
5.2 is better than 5.4 but plebs cant use it
oops didnt mean to reply to that
In practice you get no limit on thinking on plus though. I've been on plus for the 2nd half of last year, and never hit limit and I used it a lot
Oh yeah absolutely, I think rather than limits they just give you less preference during peak time
Goal posts are moving from questioning limits on messages to thinking... Rather than focusing on "I think", just look at the docs and see what's there.
I guess you get default, whereas on pro you have priority boarding
i feel like i need to keep search on at all times or else its 2x more stupid
I wonder when exactly will I get the notification that 5.5 is here
? 😂
Hmm?
@frosty zealot wake me up when the discussion comes back to Codex.
You need to give it a problem it can't solve. And then teach it how to solve it. And then it'll respect you and always think harder for you 😂
Ok dad have a good nap!
i wish there was a way i could tell it to think for over 6 hours so i can goto sleep but still know im not wasting time
that seems stupid now that i think of it
We're speaking about the 5.4 thinking that is used in codex too. So it qualifies
Do you guys get overburnt after long streaks of using codex?
seems like the length of thinking is getting longer though, 30 min for a simple refactor of a xpi thats under 20kb
i get tired then go and play games
Lol I play games while waiting for it to finish tasks 😂
for some reason the new image model seems to be bad at humans still ? i mean there is no way this is the new model right ?
But sometimes I just get a day when I just feel to get a break from it
ive seen amazing images of image v2 that are crazy realistic this has to be image 1.5 still
i tried the website the desktop app and the mobile app
Maybe you don't get very good quality on codex or gpt with it anyway
If you want 2k or 4k with high quality you need to pay for it via api
Ah you're trolling
Is it @kind jay 's alt account
Bro I was thinking the same thing 😩
no this is literally from a few hours ago ..
I dunno about getting burnt out on Codex itself. Within the last few months my approach to development has significantly changed, with Codex being an integral partner-as-tool in my process. I'm now spending time to formulate prompts rather than looking at syntax. I'm spending time to check the work done by Codex rather than fixing bugs. It's less of a burnout from this than just a shift in what we've all experienced as developer burnout the way we've always done it.
webapp is the same, was just testing it to see since i had no idea what to put as the cover image since its just a asi mod that doesnt really have any good images that would make sense its not like the prompt is short either
Nice answer
the text is very nice, but from other images ive seen they were way better at characters and backgrounds
Bro go to api add
10 bucks
Then make that request on 4k with high detail
ok thank you thats why i felt like i was doing something wrong, let me go try it
Chatgpt/codex can do at most 1024*1500
And either low or normal quality, not sure about that
I tried myself to push it more than that but they can t tweak quality
Codex doesn't have any knob exposed for quality or resolution in its image tool
Those are available only via api
im guessing its since they dont want another sora thing to happen with normal chatgpt ? so they make it harder to get to and cost per image instead of a plan, that makes sense but it is a shame
You talking about viewing or generating images?
I think in that live stream they used max quality and resolution. Like there was an image where they zoomed out to a speck of rice that had image 2.0 written on it
You can t replicate that on codex or chatgpt
😂
Generating
If we don't have control over the sub-agent that Codex spawns for doing an image, tell it to create a utility that creates a model object that has all of the specs you want, then execute that utility. So you either get the general-purpose solution, or you get what you personally want. Just be specific.
sub agents dont get full tool usage depending on how you have it set up and which flags. If it uses --exec, then it's read only with no image capability. The best way to have codex use images with a sub agent, is to have a specific agent that exposes the MCP
yea robert made it look like i was doing something stupid 😅 , they were using the chatgpt app in the live stream so i assumed it would work normally via that but i guess not
I was gonna say I did that Grandma as a Service pic and it seemed way higher res than 1024*1500
for context i havent used the images before since it was pretty bad so i had no clue about the api
Well I zoomed out my images and the text is barely readable haha
I think that's what I said. 😆
@frosty zealot found your steam
I showed that to a friend and she LOL'd.
🙂 Put it on the shrine
I’ll DM you proof (not allowed to show here)
Maybe in some cases they route you to higher quality. But it can t be controlled by the model you talk with.
Go look in codex repo, and you will see there is no parameter for quality or resolution.
if you guys are trying to reduce usage i recommend https://github.com/JuliusBrussee/caveman/blob/main/README.md
my bad. I'm tired and read it wrong
I have no reason to doubt you
I literally just want to make an edgy joke
LOL
Nah, I'm just long-winded and make it easy for peeps to get confused. It's an art. 🎨
Alright, I'll entertain it, one moment
Caveman is interesting, but I don't consume tokens in system responses. My output tokens are consumed in thinking, new code, patches, processing CLI output, etc. My Codex output is nowhere near as stupidly verbose as I am. 🤔
No one uses tokens in outputs. Those are like 1% of the total tokens used in the process itself
it thinks in less words too
What do you think those tokens are that are generated in response to prompts?
Yea sure
My prompts are like "process the first item in the todo list". That's about 10 tokens ... the token consumption is ALL in the output.
It's astonishing that the tool can actually design a nice look in a meme, but then when asked to implement such a thing in actual code... it comes up with the dinosaur poop of the second section. radius, radius, radius and cards, cards, cards.
Next level is that it s also writing short form python right? 😂
Caveman rust and cavemen python
90% less tokens
🤣
Why it's astonishing? The model writing the code is not the model generating the image
That's hilarious ngl
( Side note : Will anyone else admit that somethings they can't understand what in blazes Codex is saying ... about their own project? )
It is a legit strategy, also using Mandarin works.
It's just not worth it with how generous OAI is being with tokens
Top 10 anime deaths
The model writing the code is an aphantasic model and right now there s not much communication with the image engine. Beside the model sending a prompt and getting an image back
what are you hoping for the most in spud ?
You can t set internal reasoning the model does. And that uses most of the real tokens. Not the final output you get. That s my point
I mean you can set reasoning effort. But you can t make xhigh think in more compressed tokens just cause you tell it to
i think being faster + using less tokens then 5.4 would already be a big jump then just using more tokens and being better
Front end design capabilities
yeah its getting bad.
That's kinda the pattern of Every new model.
codex will often use slang it made up
you can tell it to stop using or creating 'project slang' but its something you gotta hammer into it
the 1st part or 2nd part ?
and then the other issue is it saves tokens by completely making up what files do based on the name of the file
itll also write docs you never asked for and just read those, which are also mostly made up nonsense text generated based on the file name
When people complain about Codex being bad at UI, do they mean at a creative level, or do they mean like it doesnt know how to position elements
if you're lucky itll read a file and infer what a file does based on function names.
None of what you say is part of my experience
NewModel = OldModel + (OldModel*PercentSpeedImprovement + OldModel*PercentQualityImprovement - OldModel*PercentTokenUsage)
So i have a hard time believing your stories
Hmm, that made no sense mathematically, nvm
😂
So who else has their codex producing random docs you never asked for?
Cause personally I never had that happen
uh never so far
itll write tests and docs you never asked for constantly.
@frosty zealot How much can you bench?
I was just really impressed with all the docs that JaneBot recognized as being necessary.
My point exactly 🙂
its also god awful at writing unit tests
it'll make tests using synthetic data without even looking at the real schema so it wont match
and then like venusrose it'll convince itself everything is fine without knowing whats actually going on
Lol
I hate to say it, but definitely a skill issue
you can hammer into it to stop doing that
it takes a long time and a consistent session id
Everything you have to share in this chat is how codex is bad at various things that just don't match up to my own experience
And promoting github stuff
If the bot is or is not doing something that you want, you MUST be the leader and control the tool with firm directives. It's all on us. Bot success is proportional to prompt quality.
So i have a hard time why you would actually pay for a service that is so bad as you claim it is
dont use caveman venus. spend more of your money
Yeah, I hope some folks shift their approach here from telling us how bad the tech is to asking us how to improve their instructions.
Bro the plus plan on 20x is the best purchase I did
Is that what you’re doing with Jane?
I can't even exhaust my quota 😂
you can give it the same prompt a 100 times and get 50 different responses
there is no hard output for a specific input
I think you said that yesterday too.
That s such a generic anti llm objection
😂
You don't want 100% determinant responses .... they could all be wrong.
venus im guessing you're from eastern europe
how many members does this discord have again ?
That’s actually crazy, would you not write the tests and have codex write the code to solve them?
Didn't we have this exact convo yday?
At least 10
With AFK saying the exact thing
Idk why I wanna participate in this, but if the codebase is full of garbage then the agent will only produce garbage. If there are unit tests with fake data they are going to look at those tests and think they must write theirs the same way to remain consistent.
With 100 inputs and 50 different outputs
LLMs are deterministic
Depends on the temperature
Still deterministic, depends on the seed
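(For anyone following the temperature vs. seed point, here is a minimal sketch of just the sampling step. The logits are made-up numbers and `sample_token` is a hypothetical helper, not any provider's API. Real serving stacks may add other sources of variation that this toy ignores.)

```python
import math
import random

def sample_token(logits, temperature, seed):
    """Sample one token index from logits at a given temperature and seed."""
    rng = random.Random(seed)  # fixed seed -> reproducible draw
    if temperature == 0:
        # Greedy decoding: always pick the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Softmax numerators, shifted by the peak for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]  # made-up scores for three candidate tokens

# Same input, same temperature, same seed: one distinct result
same_seed = {sample_token(logits, 0.9, seed=42) for _ in range(100)}

# Same input, same temperature, different seeds: results vary
many_seeds = {sample_token(logits, 0.9, seed=s) for s in range(100)}
```

So in this toy, "100 inputs, 50 outputs" only happens when the seed changes between calls. Temperature controls how spread out those outcomes are.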
i catch the issues before they pollute the codebase
but i spend more time right now with 5.4 dealing with things i didnt ask for than with previous models
when people say "everything's fine" i suspect they just arent catching slop.
LLM temperature between 0.8 and 0.9 can be much better than 1.0. There are other factors. Look at the tuning on various published models where testing indicates varied performance with not just temperature but other k/v factors.
like i promise you guys ive. been working with claude and codex for over a year now. hundreds of millions of tokens used. im attuned to how it works and how to get it to do what i want. im telling you its getting more and more difficult.
As to quality of bot output: Prompts must be high quality. Tell the tool to check its own work. Generate sample data that conforms to schema and that doesn't conform. Run that data through the unit tests. If non-conformant data passes tests, fix the tests! There's nothing mysterious here.
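(A minimal sketch of that "test the tests" loop. The order schema and `validate_order` are hypothetical stand-ins for whatever your real schema is.)

```python
def validate_order(record):
    """Hypothetical schema check: id is an int, total is a non-negative number."""
    return (
        isinstance(record.get("id"), int)
        and isinstance(record.get("total"), (int, float))
        and not isinstance(record.get("total"), bool)
        and record["total"] >= 0
    )

# Sample data that should pass, and sample data that should fail
conformant = [{"id": 1, "total": 9.99}, {"id": 2, "total": 0}]
non_conformant = [{"id": "1", "total": 9.99}, {"id": 3, "total": -5}, {"id": 4}]

def audit_validator(validator, good, bad):
    """Return the samples the validator gets wrong, in both directions."""
    false_rejects = [r for r in good if not validator(r)]
    false_accepts = [r for r in bad if validator(r)]
    return false_rejects, false_accepts

false_rejects, false_accepts = audit_validator(validate_order, conformant, non_conformant)

# A deliberately weak check, to show what "fix the tests" looks like in practice
weak_rejects, weak_accepts = audit_validator(lambda r: "id" in r, conformant, non_conformant)
```

If `false_accepts` is non-empty, the check is too loose and that's the thing to fix, not the data.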
You're saying 5.4 has gotten worse over time or something?
heh... seed
it overthinks, overengineers, goes on more tangents, builds things you dont want on the offchance you want them in the future....
He s saying 5.4 sucks and 5.2is much better
the analogy i used earlier is if you were playing chess and asked 5.4 to move a specific pawn, it would move all the pawns on the board from both sides.
There's that doc on the OpenAI site that @boreal holly linked to a while back about how different 5.4 actually is, and how we need to communicate with it effectively. "It's different" is not imagination. It really is. But for ALL AI, we must write good prompts or it doesn't matter what the model is, the results will be undesirable.
5.4 certainly takes a little more work to get in line than 5.3. I had my issue with it for sure. But i got it working in the end, just takes different nuance.
its not just different, it's difficult.
That may be, can't argue, but we must learn how to work with the tools we have.
Bro why don't you share some images with your obvious dissatisfaction with what codex did?
You know that kind with claude deleting the database and user getting irritated
Cause if it s that bad as you say, you should have a lot of those lying around
its near to the point that the time you saved is less than the time you spent fixing slop and trying to establish guardrails
Hi guys, I'm new in here, I normally hang out over at the Claude discord, and I'm just wondering if I should give Codex a chance or if you think 4.7 Opus is a better model
i think opus is better rn
Dear Newbie, I think you should try Eliza.
4.7 has better tools rn too
you must have some serious code base issue to think that.
@frosty zealot Am I niche?
keep in mind it intentionally gives you higher temp responses and then logs your dissatisfaction as a metric to tune the model in the future
so your frustration is OpenAI's profit
No, APC Like what I am listening to is Niche
Can’t see it, not added
You realize this is a feature people have begged for?
Not the over engineering, or going off on tangents, but the fact that it continues to work to completion. It avoids stopping short of completion. You need to put completion criteria and enforce stopping. 5.4 is indefatigable compared to the old models, so you have to give it bounded tasks and rules for stopping.
Whenever someone says "I got a bad response" and we ask "what was the prompt?" The inquiry never goes anywhere. Bot responses depend on full context, which includes all .md files, custom instructions (or whatever), and the full thread. We never get that info ... and I don't want it.
Honestly, it's almost impossible to generate a repro case that includes all relevant context ... that's a fault in this tech now that I hope we'll be able to resolve in some years.
lol it needs work
do I have it hidden?
How would I know?
click my profile
It’s hidden
AFK - if Codex or ChatGPT do something you don't want, have you ever asked it Why it did that? The responses to that kind of question can be extremely illuminating and educational.
like if we're starting wars and bankrupting society to turn earth into a giant chatbot, I shouldnt have to spend all of my time setting up guardrails and engineering prompts
wut?
u herd me
"like if we're starting wars and bankrupting society to turn earth into a giant chatbot" wut?
i dont wanna do all of this. im going back to manually coding everything
I’m going to vibe code a Spotify niche meter, feel free to do it before me
yeah we have all decided to turn earth into a chatbot. for some reason.
i dunno who thought this was a good idea but we're doing it
homie having an existential crisis over 5.4 needing some minor tuning
In this channel we focus on using the technology. I think concerns about the world going to hell because of the tech is best in a different forum, whether we agree or not.
all money, all resources, the total agregate of human productivity, all the rainforests, all the data centers, all the fresh water, all for a chatbot
slow down brother.... The answer to your angst is in better prompting ... more work to use better tools. that's the way everything works. These are tough times, no doubt. I think we all feel it.
4.7 has better tools
looks outside
512K lines of Node.js memory hog excused for a harness (claude code)
yes
Can we say the same about jumbojets flying on autopilot. Um. Yes.
Had to use the vertical monitor for this one
i see GetX and it hurts my soul
Yes, that's why they ripped it all out 🤣 I thought I'd try it
Robert does this for HVAC ... who knows what people do for real world use. 😆
Robert was my apprentice, great apprentice, some might say the best,
And that was just for the kitchen heating unit.
how can i get good uptime like github
im sorry, robert, did you build a robdex?
Roblox
Who wants to play Roblox with me?
W Kit
Yeah, it's the alternative to codex
Me reading my documentation and glancing at the screen as the spambot deletes a bad post.
they have poisoned context no doubt
interesting. yeah it's designed to do more work per run, which is great if you can dial it into what you want
not so great if you dont have a long, heavily guardrailed run for it to do
☝️ blaming the tools
no i love writing long heavily guardrailed prompts every 4 minutes
but he can do it for you😭
if you want success you cant rely on prompts, you need to back check the results... has to be done both ways. you cant prompt your way to clean code no matter how good you think your prompt is. its not gonna happen and over time will get worse due to error
doesnt ai already change your prompt to be better ?, what would the difference be to prompt the ai to make it better then paste it in then it makes it better again by it self
the SDK is great for that
that was some bad grammar but it gets the question across
The most powerful anti-poison I've found is enable sandbox, set up rules that hard deny certain commands, and funnel them towards the outcome.
I have it set up so when a worker hits a sandboxed tool call that isn't allowed, it forwards it to the orchestrator, and their only way to handle it is denying the approval request. So when a worker starts drifting and doing bad stuff, the orchestrator sees the tool call drift and reminds em of the rules. Basically set up booby traps for the agents to run into and take that opportunity to put em back on track lol
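(Rough sketch of that booby-trap pattern. Everything here is hypothetical: the deny list, the function names, and the orchestrator stand-in. It is not Codex's sandbox config or API, just the shape of the idea.)

```python
import shlex

# Hypothetical deny rules: command prefixes a worker is never allowed to run
DENY_PREFIXES = [("rm", "-rf"), ("git", "push"), ("curl",)]

def is_denied(command):
    """True if the command starts with any denied prefix."""
    tokens = tuple(shlex.split(command))
    return any(tokens[:len(prefix)] == prefix for prefix in DENY_PREFIXES)

def orchestrator_review(worker_id, command):
    """Stand-in for the orchestrator: it can only deny, and it restates the rules."""
    return f"[{worker_id}] denied: {command!r}. Reminder: stay inside the sandbox rules."

def run_tool_call(worker_id, command):
    """Gate a worker's tool call; denied calls are escalated instead of executed."""
    if is_denied(command):
        return {"executed": False, "feedback": orchestrator_review(worker_id, command)}
    return {"executed": True, "feedback": None}  # real execution is out of scope here

blocked = run_tool_call("worker-1", "git push origin main")
allowed = run_tool_call("worker-1", "pytest -q")
```

The point is that a denied call turns into feedback for the worker instead of a silent failure.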
not just tools though, I would argue architectural debt can't be fixed, only mitigated
but yeah that's a good layered approach
Funny to see statements like "you can't" about things that I actually do.
I'm not saying everything is perfect here or that my prompts are awesome or that the bot doesn't make mistakes. I'm saying better use of the tooling yields better results, simply compared to poor use of the tooling obviously yielding poor results. Perhaps the easy compromise is "learn to do these things better". That's a daily effort for all of us.
Some people think they can tell a bot to write code and it will write whatever they want. You have to tell it What you want correctly, and How you want it correctly. The responses we get aren't "it's done, ship it". The responses are an ongoing sequence of "it should be better, check it, what's next?" ... just like coding we've been doing for decades.
prompt == assumption management. still need the objective feedback loop
Hahaha
(Might be best to experiment in DM, and you can delete experiments when done)
?
How would that work?
DM the mods 😩
lol
One of the mouseover options here is to delete your own message.
Group chat with the bots
i just read that page. what do you mean? plan mode?
nope
Get the LLM to improve your prompts and AGENTS directives.
The real results are in adjusting the system prompt to suit your given project
you mean have it write guardrails everywhere.
inject guardrails into every doc, a mass proliferation of guardrails
what happens then is it starts rereading and reinterpreting those guardrails in a million different ways until it thinks everything violates those guardrails.
then you gotta rewrite em
That is how a newbie might do it
You ever seen the movie Joe Dirt? There's a guy who's like "Home is where you make it", and Joe understands what he said differently. All your docs are written by different models that see the world differently. You avoid that by having 5.4 write the docs in its own words
yes:
- There are Account-level Codex directives. See cloud client settings for this. These settings apply to all Codex usage for all projects you do.
- There are System-level Codex directives as ~/.codex/AGENTS.md.
- There are workspace-level Codex directives in each /project/folder/AGENTS.md.
- There are folder-specific Codex directives for /src or /docs or src/components
- And you can tell the assistant to refer to specific .md files anywhere for additional guidance about specific topics.
Use ChatGPT or Codex to help you write ALL of these.
Ask it about conflicts, anomalies, contradictions, tensions, gaps, and suggestions for how to improve it all.
Take the time to do this and you will get SO much more out of the product.
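(If you want to see which of those files are actually in play for a given folder, here's a small sketch. `collect_agents_files` is a hypothetical helper, and the broad-to-narrow ordering is my assumption for illustration, not a statement of how Codex itself resolves them.)

```python
from pathlib import Path

def collect_agents_files(project_root, working_dir, home=None):
    """List AGENTS.md files from broadest scope to narrowest (assumed ordering)."""
    home = Path(home) if home is not None else Path.home()
    root = Path(project_root).resolve()
    current = Path(working_dir).resolve()

    # System-level file first, then the project root
    candidates = [home / ".codex" / "AGENTS.md", root / "AGENTS.md"]

    # Then each folder on the way down to the working directory.
    # relative_to raises ValueError if working_dir is outside project_root.
    folder = root
    for part in current.relative_to(root).parts:
        folder = folder / part
        candidates.append(folder / "AGENTS.md")

    # Keep only the files that actually exist
    return [path for path in candidates if path.is_file()]
```

Usage would be something like `collect_agents_files("/project/folder", "/project/folder/src/components")`, then paste the contents of each file into ChatGPT or Codex and ask about conflicts and gaps.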
but again 5.4 doesnt know what words mean. it's going to interpret the same word an infinite number of ways.
thats the setup you must build for codex so he wont ever get lost and even write prompts for himself
no
yes lol
mine those not idk bro
I'm afraid you're not understanding what this stuff is ... refinement for use with a product like Codex seems to be getting outside of your scope.
yeah dude ive only burned a billion tokens idk what im talking about
ive got nowhere near the prowess or intuition as yall
Use ChatGPT or Codex to help you write ALL of these.
Ask it about conflicts, anomalies, contradictions, tensions, gaps, and suggestions for how to improve it all.
Take the time to do this and you will get SO much more out of the product.
If you've burned a billion tokens and you don't know how to tune a prompt ... we're seeing why you've burned a billion tokens.
who's deployed a vibecoded project yet?
I don't want to insult you, sorry and know I have. You can do better.
how many vibe coded projects have reached production?
too many
Many people have success working through these problems and get results. Maybe you should ask yourself: if they can do it, can I do it?
they dont make it
What am I doing wrong? I have GhidraMCP and have it linked to Codex. It's finding functions in the disassembled code. It will think for a while and stop. Like it found the entry point for the introduction story that you start with a button "Introduction". It was able to find it. I say "Follow the flow of that function. It should load Video, graphics, and sound files and eventually loop back to the main menu."
It'll build a list and then after 20-30 minutes say, "I made good progress to getting through the introduction loading, blah blah blah." I have to say "Okay keep going" every 20-30 minutes. I just want to let it ride. It's doing a good job, but I don't want to sit here all day nudging it forward
You seem stressed
The world is full of MVP public offerings. Vibing is the latest way to generate an MVP. There are a ton of them out there and the average consumer accepts them as SOTA. It's a horrible low-bar for the world.
The last prompt i said "Don't stop until we have a WORKING intro with the proper videos, audio, and text. If your reply after running refers to something not being quite there, or not being 100%, don't stop. Just keep going." That seems straight forward
name an MVP in production thats vibecoded
@patent wharf perhaps you should break down the project into modules: do videos, then do audio, then do text... I have no idea about your project, can't engage on this one, but it sounds like you're literally asking it to do the whole thing in one shot and that's never a great approach.
Which LLM is better at writing code?
i woulve told you claude for front end codex for back end back when it was 4.6 and 5.2
now idk
Google "list of published vibe-coded SaaS"
🙄
5.4 is best, 5.4-mini is good, the rest are OK
Why can others solve the problems you claim to be unsolvable?
its like talking to open ai
Nth time i've heard "SaaS" && AI mentioned
the brother is complaining about googling. I'm backing away.
name one
it's like the "indie game dev" space on TTV
define ONE
chatbots are good at speaking in convincing generalities
I usually just use the 5.3 codex; ChatGPT later recommended the 5.4 version as well
doesnt bode well that users of said chatbots speak in the same generalities and think what they're saying isnt just as cheap
Your "name one" debate tactic is childish.
you cant name one
really all i see is a skill issue and a will not to move forward because you convinced yourself it's not possible.
name one what?
a single vibe coded project that made it to production.
I just put AFK on my ignore list. Someone please message me when he grows up.
oh no ill get less ad homs. anyway
i don't know anything about that? How can i know what projects are vibe coded and why does it matter?
It doesnt change your lack of skill and whining about not being able to use the tools at hand.
openclaw is vibe coded
what skill is it that i lack oh ad homming, speaking in generalities, cant provide specifics to anything....
claude code is vibe coded
only a couple of billion dollar entries to the name one contest.
Call me Opus 4.7, but I'm failing to see the point in any of this
right now the point is for you guys to deflect any criticism of the current state of agents by saying 'skill issue'
its like a mmo wackamole of negative user feedback
I could name some of my own that have usefulness
JagFx being one of them
filled a need that apparently wasn't properly there
We can do what you can't i guess thats why.
You mean 4.6
Holy moly. Will lives
whats good my boy
This reminds me that I wish there was a chatgpt-discussions channel for professional users and now I'm wishing there was one for codex as well. 🥹
you can certainly claim you can.
Sure!
There is the problem. You can't do it so you can't see the possibility. That's a skill issue.
I've found the main entry point and can "access" all the main menu items. Started with the buttons for preferences menu, new game, and exit. It knows the first file and found where it was referenced. It's just a straight shot of loading flac, bmp and wav files.
I don't give it further instructions other than to keep going until complete.
It's made it the longest it ever has so far without stopping with an update so that's good. It's still going right now
jk one sec
actually you can see now. but i'm updating the cli so keep that in mind
im just pointing out that egos are in play. some of yall HAVE to project that you're better.
did the same lol
Youre just complaining that it can't be done and won't accept YOU can't do it and others can.
awesome
Ive made it so that claude now spawns codex agents and can use gpt image gen 2 as a skill
i cant do what?
what is it that im saying i can't do?
dogpiling, speaking generalities, no clue what you're saying yourself...
If i need to answer that i can understand why you can't get results.
you cant answer it because, again, dogpiling, speaking generalities, no clue what you're saying yourself...
You claimed you have irreconcilable issues with 5.4, and he claims they are reconcilable and he did so. And many people here are saying the same thing.
You cried for about 30 minutes about not being able to wrangle the 5.4.
such as
now you guys are on the defense because you feel guilty for being clowns
ad hom rate up 200% increase egotism
lmao
ok back to work, this is probaly dev anyways.
"It adds features I didn't ask for" "It doesn't work continuously until completion", those are two big and extremely solvable issues
yes it will do work i didnt ask for
that is a unique characteristic of 5.4
"its fixable" doesnt make it not a characteristic
I'm sorry bud, but that's a bit deep with no context. I don't know what GhidraMCP is, or "main menu items" or "new game", etc, and shouldn't need to. It sounds like you're still trying an "all or nothing" approach, which is probably not a great approach for your challenge. It sounds like you're vibe-coding a big game. If you can break down what you're doing I think we can help a little more.
"you can ask it to blast guardrails into every doc until it stops doing the behavior you dont like" doesnt negate what i said.
skill issue
I could say a pinto is slower than a ferrari and you'd say skill issue
only if you were driving the slower car
just stop
Nice! Imo I have not tried plugins or apps yet but cool to see a practically official tool about it
where's the docs i need to have chatgpt populate with guardrails to make eric wimp understand he's conflating issues with the tools and actual user usage.
You have the wrong Ferrari
Me neither, but hoping to find time for coding into these tools. That's more along the lines of what I do but rarely associated with a business model that makes the time worthwhile. 😢
You're funny 🤣 I said have it rewrite docs in its own language and understanding, not blast guardrails everywhere. I don't even have docs in my codebase. AGENTS.md is like 6 lines with really basic facts. But all the SKILL.md files are rewritten by 5.4 and that's imho a huge reason its default behavior sucks. It reads docs written by old, terse models and fills in the gaps of understanding.
@plush nymph i have 3 tips for you. if you do this, it'll stop happening in almost every case
- Plan first, but when you create a plan, ensure that it has an ENUMERATED task list with validation loops/testing to ensure completeness and correctness
- in your global agents: "When working from a plan, I may be away from my keyboard. Ensure that you act boldly on my behalf and work autonomously without stopping until the plan is FULLY completed. DO NOT stop to ask for permission unless you are truly blocked or risk executing a potentially dangerous and irreversible action. Use your best judgement and continue until all tasks are complete"
- if your agent is "doing things you didnt ask for" that is because your spec is too ambiguous. gpt is highly steerable. Be specific. if you allow it to guess, it will.
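(To make the "enumerated task list with validation loops" idea concrete, a tiny sketch. `run_plan` and the task names are hypothetical placeholders, and the lambdas stand in for real work and real checks. It's the shape of the loop you're asking the agent to follow, not something Codex runs.)

```python
def run_plan(tasks, max_attempts=3):
    """Run enumerated tasks in order; stop only when all validate or one is blocked."""
    report = []
    for number, task in enumerate(tasks, start=1):
        for attempt in range(1, max_attempts + 1):
            result = task["do"](attempt)
            if task["validate"](result):
                report.append((number, task["name"], "done", attempt))
                break
        else:
            # Every attempt failed validation: truly blocked, hand back to the human
            report.append((number, task["name"], "blocked", max_attempts))
            return report
    return report

tasks = [
    {"name": "load video",
     "do": lambda attempt: "video ok",
     "validate": lambda result: result == "video ok"},
    {"name": "load audio",
     "do": lambda attempt: "audio ok" if attempt >= 2 else "error",
     "validate": lambda result: result == "audio ok"},
]
report = run_plan(tasks)
```

Each task has its own completion check, so "keep going until complete" has a concrete meaning and a concrete stopping rule.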
its not xml, people have been using that
i have found since gpt 5.4, that using normal "plan mode" leads to the model not finishing
this is a fairly new behavior
even before that slightly
it is a real problem
but you can mitigate it
its a tradeoff because it doesnt want to use too many tokens in a single run, so its almost better that it stops
Can someone explain codex usage on codex app vs chatgpt codex?
not necessarily because you could break caching too and then you're using even more tokens
sending new requests has its downsides
if you give it a complex task that should in theory use a lot of tokens, if it does all of it in a single run, it likely cut a lot of corners
not if you spec well
and steer well
really spend a lot of time on the spec
get granular
this will help prevent the corner cutting
but to your point, plans/specs should be well scoped where possible. dont try to do too much at once
i've written out entire projects in english in a single file before.
this will also help
yeah so that can be problematic. it just depends how big
scoped is better
where possible
there's nothing wrong with doing a whole project at once per se but if you can break it down into sub-specs, it can help. esp if its a big proj
&& use a framework so you have hooks, guardrails, subagents, && other forms of skills/commands at your disposal.
my workflow currently has 3 distinct grids of things going on, && top-left has 3 distinct repos open that are somewhat related to each other (language, vscode-extension, website)
the things you have to do to fit 16" when you cannot put one item elsewhere
that OBS takes space haha
but i need chat
It's my understanding that the apps provide a way to centralize access for common functionality. Other access via cloud/web, VSCode extension, and CLI, provide more specific access, and a different kind of access. You kinda need to try each tool to see what fits with specific needs. I hope that helps a little. Others who use the apps might add some insight.
my iphone is being used constantly by the 2 other agents to dump the app
i have never hated a screenshot more than this one
thank you
have you tried cmux?????
i dunno if they fixed it but the terminal in vs code would glitch out and crash vs code
always better to just use the terminal
cmux is goated
yeah i like iterm2
you mean not having access to my left-side file browser && code editor at the same time? no, have not
iterm2 has some python api that lets you send messages to specific panes, so i built a messaging system around it for agents to message or delegate to eachother. the problem is that sending a message to an agent interrupts its flow and theres not a great way to monitor if an agent is active and wait.
besides, i gotta test my language's vscode extension at the same time
my workflow just works for me, idk mane
yeah you can do that in cmux / limux too
i admit it's not fun for many, but if it ain't broke, don't fix it
and you get browser support
md editor
splittable panes
agent to agent
workspaces
tabs
yeah i think iterm is tmux based. im not sure. the problem is that sending a message to an agent interrupts its flow and theres not a great way to monitor if an agent is active and wait.
agents would just say "seen that notification before" and ignore it
had to manually break each session of that habit and it would eventually backslide to ignore them
it would be nice, because i think ideally, you have like 50 sessions, each of them honed in on a specific scope of your project, a specific workflow and you delegate to the ones you need when you need them
but you spend a lot more time breaking bad habits
the longer a session runs, the fewer unexpected behaviors it seems, especially if you dont switch tasks.
so now im trying to get a few sessions todo all of the work without any bad habits
we'll see if i can later fork those sessions and maintain the guardrails
something you need to understand about GPT models is, and this may actually get WORSE as models get better:
they are VERY good at following directions for the most part.
If you have a behavior that you dont like, then you need to adjust the context somewhere.
There's a reason for it. Sometimes that might be a system prompt you have no control over.
but usually you can fix this behavior.
So spend time on the prompts/crafting context.
sometimes you need to be obnoxiously explicit
its steerable to a fault
it doesn't infer well
even claude is getting worse at inferring, which is a trend i've noticed
yeah i think larger contexts cut two ways. you gotta hone that larger context which is more difficult
i wouldn't use the larger context for now
there's no benefit unless you have really huge files
codex compaction endpoint is literally MAGIC
keep it at the 262k and just let it compact
as many times as it needs
i have seen and heard numerous tests of intelligence degradation over 400k
and for what?
you gain nothing when you can just let it compact and have no meaningful degradation
there may be some usecases. huge repo. huge file. if that be the case, fine
lots of workflows and tasks that you want a single agent to perform maybe
if you want the same session to do everything on your project
because without a larger context you may switch from task A to task B and it may forget how to do task A
it wont
ive seen it happen.
if you say so
It really is magic 🪄 some of my agents have thousands of user messages, still perform just as well as the beginning (if not better over time)
if you have a reusable composable task you need to reoccur, you can always make a skill
I use skills to solve most of the problems youre talking about.
but i can assure you that you do not need larger context and you are actively harming performance versus just letting it compact
I also inject a small amount into the system prompt
im speaking in generalities, but for example, i'll have it work on front end for a while, then switch to fixing back end stuff. The longer i let it work on back end, the more it'll forget exactly how i want it to do frontend.
and you think a larger context window solves that?
it doesn't
the benchmarks even show that
models always weight things that happened more recently more heavily than things further back in the cw.
use different sessions for different tasks and create skills based around what you need to guard rail.
i've always considered that loss of 'how to do task A' as a shift in context window due to limits in context window size.
i use multiple sessions but then you're spending time honing each of those sessions on the rules you want to follow
Use skills
More context is not the same as better context. Models can read long windows, but they don’t reliably preserve all earlier instructions with equal strength
They are reusable guardrails that are called on demand
a skill is a doc it reads for context. you're still course correcting.
the further back they are, the less weight they get
and the worse they remember
on top of that, you harm actual outputs
You need to course correct, you already do it in a given conversation like you mentioned
better to compact 🙂 trust
yeah exactly. its potentially better context but i assume, and what ive experienced, is that you have to spend more time honing that larger context.
you will see basically no loss in intelligence or forgetting
if something is critical, make a skill
standardize
right and if you have multiple sessions you need to repeat more course corrections, so there's a tradeoff
but its pretty good about remembering, also it has session logs!
Use skills, then it just happens
again a skill is just a doc. its just a runbook.
u guys like my banger https://x.com/LLMJunky/status/2046735968259162292?s=20
skill isn't a doc. its a folder
its a file system
the skill.md is the readme of that file system
it tells you who what where when and how to use the files therein
You build the skills over time to keep the agent within the scopes you need.
They prevent drift by enforcing your ideas into places you commonly see drift.
you can even ask the agent to make them for you. ask it to scan the session logs, find opportunities to create standardizations for skills for reusability
Skills are way more than just docs. They're the most powerful means of opting into processes at the most important times of development
The problem with AGENTS.md is an agent sees it exactly one time and forgets the whole thing. Skills they can opt into seeing it just in time.
If you set up land mines for the agents, where they're suddenly perplexed they can't do something, you can funnel that perplexity into a skill. For example I have a request-review skill that when they go to do anything with git before having requested a review, they get errors guiding them to review first. So you create the process, and you enforce it with hard blockers they're guaranteed to experience. That's pretty much the game to making the agents work perpetually and do everything correctly.
I admit that I've not migrated my environment from common .md docs to skills format yet. I understand that when AGENTS.md refers to any other doc, it's very indeterminate, more of a suggestion than a directive. But I've found with strong guidance on how things must work that the assistant always uses my .md files correctly - and I actually keep very few directives in standard .md files anyway.
I do not know yet how strongly Skills-specific .md files are processed as authoritative directives, similar to AGENTS.md files. Can anyone point to more specifics on this? The public docs frequently mix Claude and other notes amongst discussion of Skills. TYVM
ill have to look at more skills but for the most part its just a doc used as a runbook. you can have some scripts as part of that and tell it 'if conditional run x'
instead of saying 'reread agents.md and do specific task' you command $agents:specific_task
its reductive.
software is just some 1s and 0s
What makes skills special is this:
---
name: request-review
description: Use `request-review` for review-gated work. First run `request-review-role-instructions` so the current thread loads only the worker or orchestrator guidance it actually needs. [skill-hash:91e3b8c]
---
They see just that part in their context. If you change the description, they immediately see it at the start of the next turn, so your process can evolve in the short description there and they're enticed to follow the updates.
You combine skills with rules. Codex lets you define rules for command execution where you can block certain commands with justification. I've pretty much blocked all git commands, and other commands I see as drift, and the justification you can write "use the skill instead". Because usually they aren't touching git until the end of the work, the justification is like "use request-review, and prefer the sanctioned git scripts" which I have allowed, and only do non-destructive git ops.
You can kinda engineer the determinism and guidance around skills, and finding a way to make road blocks push them into using skills
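To make the "block it and point at the skill" idea concrete, here's a rough Python sketch of that kind of gate. This is not Codex's actual rules format or API. The rule table, the `check_command` name and the `./scripts/git-status.sh` wrapper are all made up for illustration. It just shows the shape: sanctioned scripts pass, blocked prefixes come back with a justification that nudges the agent toward the skill.

```python
import shlex

# Hypothetical rule table: command prefix -> justification shown to the agent.
BLOCKED_PREFIXES = {
    ("git",): "use request-review, and prefer the sanctioned git scripts",
}

# Hypothetical allow-list of sanctioned wrapper scripts.
ALLOWED_PREFIXES = [("./scripts/git-status.sh",)]


def check_command(command: str) -> tuple[bool, str]:
    """Return (allowed, message) for a shell command string."""
    argv = tuple(shlex.split(command))
    # Sanctioned scripts are checked first so they always pass.
    for prefix in ALLOWED_PREFIXES:
        if argv[: len(prefix)] == prefix:
            return True, "allowed"
    # Anything starting with a blocked prefix is rejected with its justification.
    for prefix, justification in BLOCKED_PREFIXES.items():
        if argv[: len(prefix)] == prefix:
            return False, f"blocked: {justification}"
    return True, "allowed"


if __name__ == "__main__":
    print(check_command("git push origin main"))
    print(check_command("./scripts/git-status.sh"))
```

The point of returning the justification string is that the error itself becomes the guidance: the agent hits the wall and the wall tells it which skill to open.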
skills are proactively called when they are needed, they arent front loaded into the context. So they dont add to context rot in the same way, they are directly next to the task they are designed to help with.
They have a description field that the agent uses. The description should be formatted to say what the skill is and when to use it.
For example i have a skill that is loaded every time the agent touches a unit test. It handles the common problems i had with unit tests being low value. It is only called when an agent gets to the testing phase.
I dont need to worry about that problem anymore
it just works now
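If it helps to picture what "they only see the description up front" means, here's a small Python sketch that pulls just the header out of a skill file. It's a hand-rolled parser for flat `key: value` lines, not a real YAML parser and not how Codex itself loads skills. `read_skill_header` and the example text are assumptions for illustration.

```python
def read_skill_header(skill_md: str) -> dict[str, str]:
    """Parse simple `key: value` front matter at the top of a skill file."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    header: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return header
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the file as having no header


EXAMPLE = """---
name: request-review
description: Use `request-review` for review-gated work.
---
# Full instructions below (only read when the agent opens the skill)
"""

header = read_skill_header(EXAMPLE)
print(header["name"])
print(header["description"])
```

Everything below the second `---` stays out of the picture until the skill is actually opened, which is why the description needs to say both what the skill is and when to use it.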
is codex fked for anyone else? impossible to open app
I need to come back to Robert's post a few times to grok it. You're both saying skills are proactively called when required and that the rules don't front-load unnecessary bulk into context. I understand the value of that and I think I already do that with my own system of AGENTS.md > /docs/procedures files, instructing the assistant to consult specific docs under specific conditions. Since we can't see per-turn context I can't assert for sure what I've pulled into context. I need to read more docs and think on it. I will need to build a new WSL environment with Skills to ensure I don't break the instances that are already running my system ... probably what Robert went through when writing robdex.
Great way to open a discussion in a public forum, dude!
Why do people do that?
U want my whole life story? It was just a question my dude
yes
gather around everybody, TS is a special guest and will be sharing his life story
ready when you are, @slate schooner
You can see per-turn context in the *.jsonl session files. Also I've swept through the codebase (up to v0.116.0) and that's pretty much how I figured out skills.
If you wanna try out skills without breaking anything you can launch `mkdir ~/.test_codex && CODEX_HOME=$HOME/.test_codex codex`. That'll give you a fresh codex home that you can fill with skills and all sorts of stuff to give it a whirl, and if you decide you don't like it, the default `CODEX_HOME` is `$HOME/.codex`
fix computer or maybe buy new
Codex is just stuck here, ive tried everything, Weird
Sorry, barked. Have a great evening! 🙂
They also have layering that is inbuilt into the system where they can have conditional reference documents internally in the SKILL.md. I would also expect they have at least a little bit of post training that teaches the model to use them.
They can also be more tool-based, like i have skills that create an account and verify the email address in gmail
what agents see and don't see is decided by the harness. if you want the agent to see agents.md at every turn you can tweak the harness so it injects agents.md at the top of context at the start of each turn
the agent itself doesn't opt in to anything, it just gets fed whatever the harness is set up to feed it
OpenAI recommends not repetitively injecting the same context into each message like that for 5.4
terrible idea it only needs it once at the start, the agents.md is injected into the system prompt at the start of the session afaik.
i didn t say it was a good idea
It's injected as a user message in the chat template 🥸 that's why they don't take it seriously
i said IF you want it to be seen every turn
YOU CAN DO THAT
it s a conditional guys
interesting, in claude code the directory level one is injected into the system prompt and further nested instances are read on demand
Why do we know/assume AGENTS.md fileS are forgotten? I don't even see that happening on compaction.
you don t need to assume. you go look into codex code, and see the exact structure and logic of what is packed into the context of an agent
what it gets at every turn, what it gets after compaction
what is the order
System instructions are rolled up pre-context window. I'll assume you have read that code and trust that the info is correct. So is there any indication Why AGENTS files haven't been configured like that too?
system instructions at least the part that lives in those 256k of context are still part of that context, what do you mean by pre-context?
They are preserved after compaction. All user messages are reinserted into the conversation, and then a "mental state blob" is inserted. That's why you probably noticed after compaction, the agent starts with less and less % context remaining each time. There is a limit though. It truncates after COMPACT_USER_MESSAGE_MAX_TOKENS is reached, so your AGENTS.md can eventually fall off the convo
I'm thinking something like this...
-- System Instructions --
-- Account-level AGENTS.md --
-- System-User-level AGENTS.md --
-- Workspace-level AGENTS.md --
-- Folder-level AGENTS.md --
{{{ Context Compaction Window }}}
bro, just go look in the code and you see how it is
I certainly will, just haven't had time.
agents.md if i remember well it's a thing only at start of a thread
it doesn t have a special place in context management
can someone tell if reset was yesterday or 20th?
how robert said it is accurate about how remote compaction works
That makes it a part of context and therefore subject to the window. Do we have confirmation that Skills files aren't rolled out the same way??
there is nothing outside of context lol
just the raw model on server side
That's not correct... There are many tiers between model and apps that manage conversation/context.
Skill files are manually loaded by the agent on-demand. The YAML header at the top of the skill file gets inserted once, and if anything in it changes it gets reinserted
bro i was speaking from the perspective of the agent
the agent is not aware of the deterministic harness that wraps it
it is only aware of what it has in context window
OK, here's one then... If Skills are compacted within context and there's nothing left but a summary of earlier context, then how does the model know what skills to pull in next?
robert knows the stuff
it just does
🪄
its fkn magic
i'm telling you
Yeah pretty much lol
codex is so intelligent you can literally give it 5 different skills at once all interleaved in instructions and it can coherently parse and use all of them like a 600 iq martian space traveler
So that confirms that there is a layer Outside of conversation context that's not touched by compaction. Which matches the model I described earlier. The question is what else survives compaction? What venus has been saying about Everything getting compacted is not correct, and I'm sure we'd see that in code. If it was correct then none of our AGENTS.md directives would survive first compaction, but that's not what happens. After compaction a turn will still complete, following firm directives. It goes brain dead on the nitty gritty of the context, not on the directives which guide actions.
bro i didn t say everything is compacted
i said everything is context for the model
You absolutely said that.
compaction doesn t touch everything that is in context
quote me
compaction just works most of the time. The only time i feel a bit of pain is if it compacts really close to the start of implementation after planning.
You have been saying "just go look in the code". as though you did.
that was about agents.md
the agent.md system in codex could probably benefit from looking at the way anthropic did it.
there s one thing to look into it and another for me to recite you the detailed order on top of my head
I smell gaslight. Look, it seems like this is the model that we're observing:
-- System Instructions --
-- Account-level AGENTS.md --
-- System-User-level AGENTS.md --
-- Workspace-level AGENTS.md --
-- Folder-level AGENTS.md --
{{{ Context Compaction Window }}}
Who disagrees and why?
The AGENTS.md file does survive until COMPACT_USER_MESSAGE_MAX_TOKENS is reached, then it falls off the back.
It used to, back in v0.67.0, get preserved when it did local compaction, where it would insert the AGENTS.md files and the compaction summary message to the new agent. Now that it does remote compaction, it preserves all user messages + mental state blob, and truncates user messages after that constant. The AGENTS.md files are the first user message, so it falls off first
my point with looking at code was simply that instead of imagining how it works, you can actually go look and see how it works
what s that? is your theory of how it works? or you say that is how it works
That's my belief based on observation. Why is that wrong?
If one asserts that Skills are better because they survive compaction, and we agree that AGENTS.md and the established model works the same, then why is Skills better in that specific regard?
check the detailed list of how current structure is shaped
imo you should write it to a md file anyway so that the agent can track and manage tasks, mark them complete, add logs, etc
that way state is always preserved
Tell me what is wrong with the simple example that I provided.
🦗 That's what I thought. 🦗
agents.md is only read specifically at start of thread
at new turns the agent doesn t get it again as a special instruction
it just remains in the larger context
So how is it still being processed at the end of the thread after a compaction event?
because as the agent reads it it becomes part of the context
You're ignoring the part about Compaction. 🙄
It's the first user message the agent ever sees
Yes. So if it gets compacted then it loses detail. If that's actually the case then how is it that at the end of a compacted thread that it's still following the same directives?
bro compaction sends the current context, after it is normalized locally, to a specific server api that makes a summary out of it and returns
the purpose of compaction is to keep whatever was important for the agent working memory state
ChatGPT loses high-level instructions and gets stupid. Codex doesn't do that.
You're not saying if AGENTS.md is inside or outside of that API call (which isn't relevant)
Here's a more visual example. Before compaction:
User: AGENTS.md file contents
User: Hey Codex, do some stuff
Agent: Ok, doing stuff
...
User: Hey, do more stuff
Agent: Doing stuff...
<COMPACTION STARTING>
After compaction:
User: AGENTS.md file contents
User: Hey Codex, do some stuff
User: Hey, do more stuff
<MENTAL STATE BLOB>
Agent: Continuing stuff...
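Here's the same before/after as a rough Python sketch, in case code reads easier than the transcript. It only models what's described in this thread: user messages are kept, agent messages are replaced by a single blob, and the oldest user messages fall off once a token budget is exceeded. The `Message` type, the word-count "tokenizer" and whole-message dropping are simplifications and assumptions, not the real Codex compaction code.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user", "agent", or "summary"
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compact(history: list[Message], summary: str, max_user_tokens: int) -> list[Message]:
    """Keep user messages within a token budget, then append a summary blob."""
    user_messages = [m for m in history if m.role == "user"]
    kept: list[Message] = []
    used = 0
    # Walk newest to oldest so the oldest user messages are dropped first.
    for message in reversed(user_messages):
        cost = count_tokens(message.text)
        if used + cost > max_user_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept + [Message("summary", summary)]


history = [
    Message("user", "AGENTS.md file contents"),
    Message("user", "Hey Codex, do some stuff"),
    Message("agent", "Ok, doing stuff"),
    Message("user", "Hey, do more stuff"),
    Message("agent", "Doing stuff..."),
]

roomy = compact(history, "<MENTAL STATE BLOB>", max_user_tokens=100)
tight = compact(history, "<MENTAL STATE BLOB>", max_user_tokens=9)

if __name__ == "__main__":
    for m in roomy:
        print(f"{m.role}: {m.text}")
```

With the roomy budget the AGENTS.md message survives at the front. With the tight one it's the first thing to go, which is the "falls off the back" behavior described above.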
the instructions of the compaction model are not available so i can't tell you what is returned in detail
skills are in developer message