#codex-discussions


turbid axle
#

likely

solemn acorn
#

WSL is pretty bad in my experience, it’s just better than the alternative (powershell 💀 )

hard drum
solemn acorn
#

I do fortunately just run linux on my personal desktop

#

but at work I must WSL to get anything done

solemn acorn
lean lark
#

Yet I've never been happier than before.
Mac is a great platform, beloved by millions and it's based on *nix. Windows has always been awful. But in this specific case, if different experiences are radically different between success and failure, I suggest success is possible and common, and thus the platform isn't the issue, and that failure is the exception of something unique in environments.
And "the WSL application over Windows failed so I converted to a different base OS" is non-sequitur. If that were the case we could say "Notepad is a bad code editor so I converted to Linux".
Again, I don't argue with your experience or the quality of your solution. I'm pointing out that one piece of software that failed to work on your system does not justify a complete OS change. What you did in switching to Mac was good for you and so many others ... but that does not actually say anything about WSL, which was your attempt to link cause and effect.
Wut?
I'm saying WSL works for some of us really well, so we use it. If it doesn't work for you, cool, don't use it. 🫶

solemn acorn
#

WSL is certainly the least worst option for a nice terminal environment if you’re stuck on windows

hard drum
#

If it doesn't work for you, cool, don't use it.
It's a bad way to put it, && frankly a shallow way to dismiss anything I said.

solemn acorn
#

but that doesn’t make it a good option compared to what’s available on macOS/Linux

hard drum
#

PowerShell is objectively a hassle to work with. Many things work great under Linux, so WSL2 is unfortunately the thing you have to use to really get things done proper at some point.

lean lark
#

I'm being kinda verbose about this here because peeps are coming to code for the first time with Codex and AI and we're seeing the sort of not-so-logical leaps of intuition from these newcomers who jump from OpenAI to A\ to X, etc ... They haven't honed their prompting skills and they blame the technology.

hard drum
#

Unfortunately, Microslop being who they are, decided to not do a good-enough job at making WSL2 work on Windows proper.

solemn acorn
hard drum
#

Which, if you're on Windows, should not be a need at all.

lean lark
#

(Powershell is awful, I never use it. IMHO ... YMMV)

frosty zealot
#

If you're not trying to run your own models, I'd buy a RPi5 they're fairly inexpensive and more than robust enough to run a linux desktop, or I'd dual boot

solemn acorn
#

e.g. file watching will not work for stuff in your windows filesystem
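A common workaround when change notifications don't arrive is to poll instead. The sketch below is only an illustration, assuming the project lives on a Windows-mounted path such as /mnt/c; the helper names `snapshot` and `diff_snapshots` are made up for this example and are not part of WSL or any tool mentioned here.

```python
import os


def snapshot(root):
    """Map every file under root to its last-modified time (nanoseconds)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime_ns
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return state


def diff_snapshots(old, new):
    """Return (added, removed, modified) paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, modified
```

Call `snapshot` on a timer (for example once a second) and compare consecutive results with `diff_snapshots`. Polling costs more than event-based watching, so keeping the watched tree small helps.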

hard drum
#

If you have to emulate a headless Linux distro just to use Bash && whatever UNIX-like things on a system that isn't UNIX-like/based, you've kinda screwed up your own system. That dependency is pathetic.

solemn acorn
#

removable disk support is also very iffy in WSL

cyan gyro
#

Why the hate on powershell? What makes it bad?

solemn acorn
boreal holly
lean lark
#

Dude, you had a bad experience and you're taking it to "the OS is bad, the company is bad...". Many of us agree that Windows can be awful and Microsoft has done a really bad job. But you're extending from one person's experience to blasting a product, a platform, and a company. That's full-out irrational.

solemn acorn
#

just very inefficient and obtuse

frosty zealot
hard drum
#

It's really not that hard to get the gist of it.

frosty zealot
#

mom dad stop fighting

lean lark
#

These are different Operating Systems ... "Windows is bad because it doesn't run Linux software" Whaaaaat? Please, now I'm questioning your age and competence.

cyan gyro
#

What if you are building a windows app, aren’t you then typically bound to work in windows?

solemn acorn
#

windows power users that love powershell do actually exist, I only slightly question their sanity

#

but my work is deploying to linux systems, so learning powershell doesn’t make a whole lot of sense for me

high girder
#

I'm a windows user. I also question my sanity, but it works. WSL is the goat though

boreal holly
lean lark
#

PowerShell is a replacement for the limited cmd functionality from early windows. It's like BASH/ZSH ... just another shell. I never use it, don't like it, but I don't bash the company over it.

high girder
#

My biggest issue with powershell is the characters getting messed up on default formatting.

solemn acorn
#

powershell is better than CMD, I’ll give it that

high girder
#

that's why skills exist though

solemn acorn
#

the bar is the floor though lol

kind jay
lean lark
#

(And to keep this on-topic, AGENTS.md directs the assistant to Not ever use PS)

frosty zealot
#

dont yell at my dad

lean lark
#

I see this as a healthy discussion BTW, not an argument. 🙂

hard drum
# lean lark These are different Operating Systems ... "Windows is bad because it doesn't run...

Never did I say that. I said that it's pathetic that if you want to do anything proper outside of giving yourself pain of Windows' own shell-like interface (Oh, did you know we have both CMD && PWSH, && then there's the two versions, where one is the older built-in, && the newer one is 2 versions ahead && has to be installed separately--how cool is that?), you have to move into WSL2, which itself is just giving you a container to a linux distribution, so what was the point of using Windows again?

kind jay
hard drum
#

At that point, wouldn't it make more sense to just run Linux native && then contain a Windows VM for anything truly windows-adjacent?

lean lark
#

I think it's important for us because WSL is a great way to use Codex. Done. For some it's not the right solution. For some a better answer might be Docker, or Mac, or a dedicated Debian/Ubuntu/RH, or whatever distro they prefer. We all need to try and choose our platforms ... and for our purposes get back to Codex.

solemn acorn
kind jay
hard drum
kind jay
#

Windows itself is useless if you have a clue what you’re doing

frosty zealot
lean lark
hard drum
#

Not everything plays nice w/ Wine, but there will be a day. A DAY, i say, when that will no longer be the case

solemn acorn
#

WINE can’t run most windows software that would keep you on windows

hard drum
#

They've already done so much

kind jay
hard drum
#

But there's still a long road ahead

lean lark
#

I think part of the problem is that Codex wasn't engineered to work well in a sandbox over Windows, so almost all of us perceive a need to use Linux. Why doesn't that translate to a bash on Codex, OpenAI, and AI in general?

solemn acorn
hard drum
kind jay
hard drum
solemn acorn
#

it’s not a technical limitation so much as these are very locked down software suites

lean lark
#

(I have no idea what platforms he's talking about but curiously my respect just jumped way up. 🙂 )

kind jay
hard drum
solemn acorn
kind jay
#

But in general just use latex

hard drum
#

isn't that the math language?

kind jay
kind jay
solemn acorn
#

you’ll have to pry excel out of enterprises cold dead hands

hard drum
#

how is math lang related to spreadsheets && word documents?

kind jay
solemn acorn
lean lark
#

And BTW ... I think I need to give up on JaneBot. She's now working well but she seems to be more of a high-maintenance wrapper around Codex for conveniently creating local utilities. I dunno, I need to think her through a bit more.

hard drum
kind jay
hard drum
#

i thought all latex was doing was just rendering math equations

solemn acorn
#

scientific papers and whatnot

hard drum
#

i've been living under Patrick's rock apparently

high girder
#

Use Prism for LaTeX

kind jay
lean lark
high girder
#

lmao, true. I don't think arxiv likes image formats though

lean lark
#

(I've seen her papers ... would hate to do that in LaTeX)

solemn acorn
#

I used LaTeX through university, it’s pretty solid

kind jay
#

@frosty zealot

#

@frosty zealot

#

@frosty zealot you there?

solemn acorn
#

especially since you can generate graphs and stuff

frosty zealot
kind jay
#

Add the report into your project folder, then python plots are automatically updated when you re render

solemn acorn
#

you can do graphs directly in LaTeX, no python required

kind jay
solemn acorn
#

I used it for vectors a lot

frosty zealot
#

for other reasons

#

I like the snap it makes :>

solemn acorn
#

there is also a dedicated plugin that adds coffee stains to your papers

kind jay
#

Any other reason?

boreal holly
lean lark
#

oh how the mighty have fallen...

kind jay
#

Keep this server SFW please

lean lark
#

ahhhh horrified JaneBot just refused to create another utility. I hate AI. It's evil. Skynet is gonna kill everyone. We need this to be legislated!!

boreal holly
kind jay
kind jay
lean lark
#

That's true! Was recently reading on that.

#

Microplastics.

boreal holly
kind jay
lean lark
#

Um, well for working with microplastics, the gloves actually themselves shed nano-particles, contaminating specimens. So the gloves are Not Suitable For (that kind of) Work (anymore).

#

(And yes, I'm stupid, but not in this specific discussion, I find that label highly offensive.)

#

That's my line.

#

(oops)

boreal holly
#

I use em when working on my truck. Thankfully not allergic 🤣 better than constantly using Gojo and brillo pad

frosty zealot
#

It’s so gooey

boreal holly
kind jay
kind jay
lean lark
kind jay
#

OAI after hours

lean lark
#

BTW, JaneBot refused to do something because of explicit directives. I guess AI is OK again. I won't trash my system and move to Mac after all.

kind jay
frosty zealot
#

Unspeakable acts

#

And it’s programmed that it can’t say no 😭😭

#

@boreal holly do you have a good flow for quick testing work trees with the iOS simulator, they seem to like not wanting to give it up to the other work trees

lean lark
#

I asked her to create a utility that summarizes system status, kind of like HAL 9000 would report on the ship status. She refused because that utility would require access to /var/log which is forbidden outside Codex sandbox scope, and journalctl which is a forbidden command. She actually refused to write the code that would violate high-level directives. She done good.

kind jay
#

What Mac should I get?

hard drum
lean lark
#

Get the expensive Mac ... Oh, I'm sorry, they're ALL expensive.

kind jay
#

Maybe the new mini Mac book and a Mac Studio

hard drum
kind jay
hard drum
#

2021 MBP holds up pretty well against whatever I owned in 2022/2023 outside of Apple, esp. in terms of speakers && hardware efficiency

#

also that touchpad mmmmmmmmmhmhmhmhmhm

boreal holly
hard drum
#

the 2nd best I ever tried was from my 2022 Yoga 7

frosty zealot
#

Look at how fluffy this snow is today

kind jay
#

I have some MSI laptop from 2020 and it’s still pretty peak

#

I want a Mac though

hard drum
hard drum
kind jay
frosty zealot
hard drum
boreal holly
kind jay
hard drum
#

fair point

boreal holly
kind jay
#

@lean lark pleaseeee 🥺

frosty zealot
lean lark
#

This is the slippery slope: WSL is awful so I want a Mac. That means bucks or getting someone else's refurbished hand-me-down. That's a horrible series of consequences to avoid an app.
Again, though, note from above, I acknowledge Mac is a fine platform. It's just not for me.

hard drum
# boreal holly It's not a yearly expense! I have a 4 year old Mac Studio M1 Ultra and it still ...

to be fair, apple does overcharge a lot, but i give credit where due in other points:

  1. convenience
  2. ecosystem within their own services && other devices, even as far back to eons ago to a big extent
  3. hardware, esp. on iPhones, that many Androids still don't have in parity (faceID + TrueDepth -- the stuff VTubers sometimes use for their advanced facial movements && expressions as long as the model is wired for it, && yes, it takes a lot of time && effort to do so on model side)
  4. software interface to hardware was || sometimes is mostly "we do this once or twice, then keep it maintained, not make regular users our alpha/beta testers" -- hint: xiaomi is notorious for the latter nowadays, esp. w/ HyperOS when it came out
hard drum
#

WSL2 && linux crapping out on my GMKtec K8 Plus was the last straw

lean lark
#

I know bud, I'm razzin...

boreal holly
blissful basin
#

Did anyone find good way to generate nice powerpoint presentations with gpt? Usually i end up with a lot of elements not being correctly positioned etc

kind jay
hard drum
#

1076 EUR for an M1 Max 10C/24C MBP that has 32G ram && 512G ssd is honestly not bad, considering 86% battery && the fact the price had 24% VAT on top.

lean lark
#

For my next purchase of any box, I'm gonna need at least 32GB VRAM for AI.

blissful basin
#

I actually meant with codex app, not gpt 😄

kind jay
lean lark
#

HAHAHA

blissful basin
boreal holly
kind jay
blissful basin
#

unless it is now mac tech centre

hard drum
blissful basin
kind jay
kind jay
frosty zealot
boreal holly
boreal holly
kind jay
hard drum
boreal holly
kind jay
#

I was looking at air but I don’t think it makes sense over 17

frosty zealot
#

@boreal holly come here I’ll give you the belt

kind jay
#

35 years old??

boreal holly
lean lark
#

If a mod were here they'd prolly suggest we move to #off-topic

hard drum
#

max for phone i buy is 500eur, that incl 24% VAT

#

my 13pro w/ 100% battery health && good condition casing (it was reported as Grade C but i saw no grade c) was 299eur

boreal holly
#

I got the air because the titanium frame. Saw some dudes do stress tests on it, seemed very rugged. idc about camera or battery, just will it last 10 yrs

hard drum
#

can't argue w/ that

#

i really gotta get myself a cheapidy cheap iPhone XR || 11 just for TrueDepth, so that I could harness it as a more-powerful webcam over my otherwise-iffy Logitech C920

frosty zealot
#

Where 5.5

kind jay
frosty zealot
#

It can’t

kind jay
#

I sold mine at 8%

frosty zealot
#

My bankruptcy depends on it

kind jay
#

Or I think 14c each

frosty zealot
#

I’m scared to look

kind jay
#

Made like 200%

kind jay
frosty zealot
gentle harbor
#

why does codex have a 5 hour limit anyway ? just let me waste my 7 day limit in 1 day

finite perch
#

Hi, quick question about Codex MCP handling.

I’m trying to understand whether GPT-5.4 can actually see namespace-level descriptions / MCP ServerInstructions.

I verified with mitmproxy that Codex sends this in the request to https://chatgpt.com/backend-api/codex/responses:

{
  "type": "namespace",
  "name": "mcp__openaiDeveloperDocs__",
  "description": "Tools in the mcp__openaiDeveloperDocs__ namespace."
}

But when I ask the model to repeat that namespace description verbatim, it returns UNAVAILABLE.

I see the same with my own MCP server: per-tool descriptions seem visible, but namespace description / ServerInstructions do not.

Is this expected in Codex, or should namespace descriptions be model-visible the same way tool/function descriptions are?

boreal holly
lean lark
#

Opinion on the limits:

  1. Daily limit is an arbitrary contract. We agree that we'll only get 5 hours per day and we can pay for more.
  2. Daily limit also helps to prevent spikes of mass use over a broad time span. It's a kind of defensive mechanism that might not be there if "compute" wasn't such a limited resource.
kind jay
gentle harbor
kind jay
lean lark
#

No, just pay for more.

#

This is simple business, no magic.

boreal holly
boreal holly
kind jay
#

I might try out, I don’t have time to use my limits this week so

hard drum
gentle harbor
# kind jay Not one you can meaningfully get through

well ill be buying 1x pro 20x when spud comes out if it works well i might get 2 as ive been running out of codex so fast and its the stupidest thing ive ever seen claude is even worse when it comes to limits

kind jay
hard drum
gentle harbor
#

ive been hearing spud is faster and uses less tokens

hard drum
#

gpt-5.4 in codex =/= 5.4 codex

#

gpt-5.3-codex = 5.3 codex

kind jay
boreal holly
finite perch
hard drum
kind jay
boreal holly
hard drum
#

but 5.5 codex would imply existence of gpt-5.5-codex akin to gpt-5.3-codex, similar to 5.4 pro for gpt-5.4-pro

gentle harbor
#

arent they no longer making a codex model ?

kind jay
gentle harbor
#

thats just 5.5 not 5.5 codex optimized version

kind jay
hard drum
#

if i see "5.5 codex", i expect to see gpt-5.5-codex in /model list

kind jay
#

More over how much you want to bet?

finite perch
gentle harbor
boreal holly
#

I mean there's a non-zero chance they'll release another codex-optimized model. They released rosalind or whatever for scientific research, so they still make fine tuned models

frosty zealot
hard drum
#

now imagine 5.4-codex-spark HOOOOO BOY

gentle harbor
#

why do they even make chat models ? do they even make money from that, as far as i see it chat models are just another waste just like sora was

hard drum
#

you can't expect to run, idk, codex on a damn phone && have a nice, comfy workflow

frosty zealot
gentle harbor
hard drum
#

besides, being able to quickly go to the web && ask things away is easier & better than having to go thru hoops just to have a user ask "how many Rs in strawberry?"

#

website is just convenience

boreal holly
# gentle harbor why do they even make chat models ? do they even make money from that, as far as...

They make chat models because for chat they use sliding window attention. Free/Go/Plus plans get like 14k context or something really small. This means if they want their model to not tell users how to build bombs, they have to fine tune that behavior into a dedicated model, so when the context window runs out and there's no compaction to help out the model still remembers its rules.

The models we use in Codex are not fine tuned, and the behavior is described in the system prompt. There's no sliding window attention, and it uses compaction to manage memory. That's why they have a chat model and a regular one

gentle harbor
hard drum
kind jay
frosty zealot
lean lark
#

And how much would we save on the expense of automobiles if they only got rid of all of that heavy garbage like bumper/fender, seatbelts, airbags....

kind jay
gentle harbor
hard drum
boreal holly
# finite perch Yeah, that’s what I’m trying next. But honestly it still feels like a weird wor...

Unlike Claude, OpenAI's responses API does not natively support MCP. What they do is convert MCP tools into the same function call format used by other non-MCP tools before they're sent to OpenAI. So that means it's not going to have the same features as the original spec. But Codex puts list_mcp_resources and read_mcp_resource tools in there so if an agent wants to understand the MCP tools beyond the surface-level tool description they can do that. Namespaced descriptions are not an intrinsically supported feature
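To picture the conversion described above, here is a rough sketch. It is not Codex's actual code. The `mcp__<server>__` prefix is taken from the namespace name in the question; the function name `flatten_mcp_tools` and the sample tool are hypothetical. The input fields (`name`, `description`, `inputSchema`) follow the MCP tool listing shape, and the output follows the function-tool shape.

```python
from typing import Any


def flatten_mcp_tools(server_name: str, tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn MCP tool listings into function-style tool definitions.

    Only per-tool fields are carried over. Server-level text has no slot
    in the output, which is one way it could end up invisible to the model.
    """
    prefix = f"mcp__{server_name}__"  # assumed naming, mirrors the namespace above
    empty_schema = {"type": "object", "properties": {}}
    flattened = []
    for tool in tools:
        flattened.append(
            {
                "type": "function",
                "name": prefix + tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool.get("inputSchema", empty_schema),
            }
        )
    return flattened


if __name__ == "__main__":
    # Hypothetical tool, for illustration only.
    sample = [{"name": "search", "description": "Search the docs."}]
    for definition in flatten_mcp_tools("openaiDeveloperDocs", sample):
        print(definition["name"], "->", definition["description"])
```

In this sketch, anything that should reach the model has to live in a per-tool description, since that is the only text field that survives the conversion.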

hard drum
lean lark
#

ChatGPT is the gateway daroogie for common folk to get introduced to AI. Protections are in place because humans, being what they are, mis-use everything they touch. So there need to be adults in the room to limit the harm that people can do to themselves and others. When human nature changes, we'll no longer need nanny directives.

hard drum
#

It's very unfair that issues are pushed to OAI instead of the malicious users behind the acts

kind jay
hard drum
kind jay
gentle harbor
hard drum
kind jay
#

And if you care so much, just set up agents with open source LLMs for other tasks

hard drum
#

you can't stop a bad driver from early gravestone, even with seatbelts, when pushed to limit

frosty zealot
#

I mean you cannnnnnn thats what driving tests are for

hard drum
gentle harbor
frosty zealot
#

Ok so we're blurring the lines here, that's not a bad driver, that's somebody making bad decisions

#

i drive drunk every day

#

I just do it with good intention

gentle harbor
#

a person hacking someone is normally making bad decisions in order to make money

kind jay
#

If you want to do something bad, then you at least require some skills to do so, hopefully by then you know better

lean lark
#

This is in that same realm of weapons legislation and how weapons don't harm people, people harm people. We do NOT NOT NOT want legislation for AI nannies. We can choose which company we feel is doing better about AI protections. THAT is what this is about.

frosty zealot
#

yay dads back!

#

How was your afternoon nap

hollow reef
#

Codex computer use isn't on Windows right

gentle harbor
kind jay
boreal holly
kind jay
hollow reef
#

How do I get it

#

🤔

gentle harbor
frosty zealot
kind jay
hollow reef
gentle harbor
#

oh my god bruh

hollow reef
#

Not the app itself

#

I have the app

kind jay
#

@lean lark give me role

kind jay
hollow reef
hollow reef
#

Unfortunate I'll have to do stuff manually on my windows device

gentle harbor
#

i wish the In-app browser worked on windows sigh

lean lark
# gentle harbor exactly, for any sub tier from 20 up they should uncensor it a bit and as you go...

This is up to every individual company. Grok has a lower bar for what's considered acceptable in society. People can gravitate to that if they wish. OpenAI has chosen to be a more family-oriented platform for billions of people in diverse cultures. As soon as someone produces bad content, someone else blames the company and the technology. OpenAI has chosen to at least attempt to separate themselves from that.

hollow reef
#

At least I can do my normal work on my m5 max

#

😭

gentle harbor
#

i just hope spud is better at ui

frosty zealot
#

I hope all it can do is making things look like varying potatoes

gentle harbor
#

like brother this is the worst ui ive ever seen

frosty zealot
#

At least it's unique

gentle harbor
#

uniquely trash i cant use it for anything i cant even see half the buttons

#

i made a basic ui my self but it looks so bad but at least it works

#

still new to codex, im assuming though extra high is useless for plan mode and i should set it to low since its just planning ?

boreal holly
lean lark
#

That depends. I sometimes go to 5.4-medium/planning or 5.4-medium/planning. There's no real right answer but my guide is "how much intelligence do I want applied to this planning process?".

boreal holly
gentle harbor
#

what version is this ? mine is so bad at ui couldnt even do basic stuff without messing up the placements

boreal holly
#

5.4 medium

torpid trout
#

whelp - why is my usage % not going down, 100% since 7 hours working straight.

gentle harbor
silent aspen
#

Codex has been flaky today, doesn't respond, hangs for long periods of time

dusk thorn
#

If 5.5 gets better at front end it’s over for Claude

boreal holly
gentle harbor
dusk thorn
#

I’ve had a few reconnects but

dusk thorn
gentle harbor
#

do you think they will reset limits again for the new model ?

silent aspen
dusk thorn
#

Tibo will hit the button

gentle harbor
#

why does it do this ?

boreal holly
gentle harbor
dusk thorn
#

It uses thinking if it needs to

#

Do you got auto for thinking?

unique spade
dusk thorn
gentle harbor
dusk thorn
#

Oh ok

unique spade
hard drum
#

he looks like a sad, confused thumbnut

unique spade
# dusk thorn

Just keep it on extended thinking at all times. Instant model is much dumber

#

For easy questions, thinking extended will be almost instant anyway

frosty zealot
lean lark
#

thinks that Dario thing was a troll

gentle harbor
unique spade
gentle harbor
lean lark
#

Message limit?? That's not a thing.

gentle harbor
unique spade
#

On pro you get extra perks

Virtually unlimited gptpro with 2 settings normal/extended

2 extra reasoning for thinking light and hard at the 2 ends of the spectrum

Access to some older models

Pulse daily digest

And increased usage on deep research, agent etc (but I never use those)

plush nymph
#

codex is so bad

#

opus is so much better right now

lean lark
#

You can post as much as you want into a thread. The issue isn't with quantity of messages, it's the size of the context window.

gentle harbor
unique spade
plush nymph
#

if codex was playing chess and you asked it to move a specific pawn, it would just figure you wanted all the pawns moved on both sides

#

its so bad

unique spade
plush nymph
#

im spending all my time putting all the pawns back right now

unique spade
#

During early o1 era, you had like 50 messages per week with o1 on plus

And unlimited on pro

boreal holly
unique spade
#

But I think since spring of 1 year ago model use became unlimited on plus too, except you don't have access to GptPro

plush nymph
#

oops didnt mean to reply to that

unique spade
boreal holly
lean lark
#

Goal posts are moving from questioning limits on messages to thinking... Rather than focusing on "I think", just look at the docs and see what's there.

unique spade
gentle harbor
#

i feel like i need to keep search on at all times or else its 2x more stupid

lost drum
#

I wonder when exactly will I get the notification that 5.5 is here

lean lark
#

@frosty zealot wake me up when the discussion comes back to Codex.

unique spade
frosty zealot
gentle harbor
#

that seems stupid now that i think of it

unique spade
#

Do you guys get overburnt after long streaks of using codex?

gentle harbor
#

seems like the length of thinking is getting longer though, 30 min for a simple refactor of a xpi thats under 20kb

gentle harbor
unique spade
gentle harbor
#

for some reason the new image model seems to be bad at humans still ? i mean there is no way this is the new model right ?

unique spade
#

But sometimes I just get a day when I feel like taking a break from it

gentle harbor
#

ive seen amazing images of image v2 that are crazy realistic this has to be image 1.5 still

#

i tried the website the desktop app and the mobile app

unique spade
#

Maybe you don't get very good quality on codex or gpt with it anyway

#

If you want 2k or 4k with high quality you need to pay for it via api

frosty zealot
#

Is it @kind jay 's alt account

boreal holly
gentle harbor
lean lark
#

I dunno about getting burnt out on Codex itself. Within the last few months my approach to development has significantly changed, with Codex being an integral partner-as-tool in my process. I'm now spending time to formulate prompts rather than looking at syntax. I'm spending time to check the work done by Codex rather than fixing bugs. It's less of a burnout from this than just a shift in what we've all experienced as developer burnout the way we've always done it.

gentle harbor
gentle harbor
#

the text is very nice, but from other images ive seen they were way better at characters and backgrounds

unique spade
gentle harbor
unique spade
#

Chatgpt/codex can do at most 1024*1500
And either low or normal quality, not sure about that

#

I tried myself to push it more than that but they can't tweak quality

#

Codex doesn't have any knob exposed for quality or resolution in its image tool

#

Those are available only via api

gentle harbor
#

im guessing its since they dont want another sora thing to happen with normal chatgpt ? so they make it harder to get to and cost per image instead of a plan, that makes sense but it is a shame

boreal holly
unique spade
#

You can't replicate that on codex or chatgpt

#

😂

unique spade
lean lark
#

If we don't have control over the sub-agent that Codex spawns for doing an image, tell it to create a utility that creates a model object that has all of the specs you want, then execute that utility. So you either get the general-purpose solution, or you get what you personally want. Just be specific.

high girder
gentle harbor
boreal holly
gentle harbor
#

for context i havent used the images before since it was pretty bad so i had no clue about the api

unique spade
lean lark
kind jay
#

@frosty zealot found your steam

lean lark
frosty zealot
kind jay
unique spade
plush nymph
high girder
frosty zealot
kind jay
frosty zealot
#

LOL

lean lark
frosty zealot
lean lark
#

Caveman is interesting, but I don't consume tokens in system responses. My output tokens are consumed in thinking, new code, patches, processing CLI output, etc. My Codex output is nowhere near as stupidly verbose as I am. 🤔

unique spade
plush nymph
#

it thinks in less words too

lean lark
#

What do you think those tokens are that are generated in response to prompts?

unique spade
torpid trout
lean lark
#

My prompts are like "process the first item in the todo list". That's about 10 tokens ... the token consumption is ALL in the output.

torpid trout
#

Its astonishing that the tool can actually design a nice look in a meme, but then when asked to implement such thing in actual code... it comes up with the dinosaur poop of the second section. radius, radius, radius and cards, cards cards.

unique spade
#

Caveman rust and cavemen python

#

90% less tokens

#

🤣

unique spade
lean lark
#

( Side note : Will anyone else admit that sometimes they can't understand what in blazes Codex is saying ... about their own project? )

potent mason
kind jay
frosty zealot
#

Top 10 anime deaths

unique spade
#

The model writing the code is an aphantasic model and right now there's not much communication with the image engine, beside the model sending a prompt and getting an image back

gentle harbor
#

what are you hoping for the most in spud ?

unique spade
gentle harbor
#

i think being faster + using less tokens than 5.4 would already be a bigger jump than just using more tokens and being better

potent mason
lean lark
plush nymph
#

codex will often use slang it made up

#

you can tell it to stop using or creating 'project slang' but its something you gotta hammer into it

gentle harbor
plush nymph
#

and then the other issue is it saves tokens by completely making up what files do based on the name of the file

#

itll also write docs you never asked for and just read those, which are also mostly made up nonsense text generated based on the file name

frosty zealot
#

When people complain about Codex being bad at UI, do they mean at a creative level, or do they mean like it doesnt know how to position elements

plush nymph
#

if you're lucky itll read a file and infer what a file does based on function names.

unique spade
lean lark
#

NewModel = OldModel + (OldModel*PercentSpeedImprovement + OldModel*PercentQualityImprovement - OldModel*PercentTokenUsage)

unique spade
#

So i have a hard time believing your stories

lean lark
#

Hmm, that made no sense mathematically, nvm

unique spade
#

😂

#

So who else has their codex producing random docs you never asked for?

#

Cause personally I never had that happen

plush nymph
#

itll write tests and docs you never asked for constantly.

kind jay
#

@frosty zealot How much can you bench?

lean lark
#

I was just really impressed with all the docs that JaneBot recognized as being necessary.

unique spade
plush nymph
#

its also god awful at writing unit tests

#

it'll make tests using synthetic data without even looking at the real schema so it wont match

#

and then like venusrose it'll convince itself everything is fine without knowing whats actually going on

unique spade
#

Lol

boreal holly
#

I hate to say it, but definitely a skill issue

plush nymph
#

you can hammer into it to stop doing that

#

it takes a long time and a consistent session id

unique spade
#

Everything you have to share in this chat is how codex is bad at various things that just don't match up to my own experience

#

And promoting github stuff

lean lark
#

If the bot is or is not doing something that you want, you MUST be the leader and control the tool with firm directives. It's all on us. Bot success is proportional to prompt quality.

unique spade
#

So i have a hard time understanding why you would actually pay for a service that is as bad as you claim it is

plush nymph
#

dont use caeman venus. spend more of your money

lean lark
#

Yeah, I hope some folks shift their approach here from telling us how bad the tech is to asking us how to improve their instructions.

lost drum
#

Bro the plus plun on 20x is the best purchase I did

kind jay
unique spade
plush nymph
#

there is no hard output for a specific input

lean lark
#

I think you said that yesterday too.

unique spade
#

😂

lean lark
#

You don't want 100% deterministic responses .... they could all be wrong.

plush nymph
#

venus im guessing you're from eastern europe

gentle harbor
#

how many members does this discord have again ?

kind jay
unique spade
kind jay
unique spade
#

With AFK saying the exact thing

boreal holly
unique spade
#

With 100 inputs and 50 different outputs

lean lark
#

Depends on the temperature

kind jay
plush nymph
#

but i spend more time right now with 5.4 dealing with things i didnt ask for than with previous models

#

when people say "everything's fine" i suspect they just arent catching slop.

lean lark
#

LLM temperature between 0.8 and 0.9 can be much better than 1.0. There are other factors. Look at the tuning on various published models where testing indicates varied performance with not just temperature but other k/v factors.
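
For anyone new to the term, here is a minimal sketch of what temperature does during sampling. It assumes a plain softmax over a few made-up logits (`toy_logits` is hypothetical) and is not any vendor's actual sampler:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
toy_logits = [2.0, 1.0, 0.1]

for t in (0.5, 0.8, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperature concentrates probability on the top candidate; higher temperature spreads it out, which is why the same prompt can produce different outputs.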

plush nymph
#

like i promise you guys ive been working with claude and codex for over a year now. hundreds of millions of tokens used. im attuned to how it works and how to get it to do what i want. im telling you its getting more and more difficult.

lean lark
#

As to quality of bot output: Prompts must be high quality. Tell the tool to check its own work. Generate sample data that conforms to schema and that doesn't conform. Run that data through the unit tests. If non-conformant data passes tests, fix the tests! There's nothing mysterious here.
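
A rough sketch of that loop, assuming a made-up three-field schema (`REQUIRED_FIELDS`, `conforms`, and `check_suite` are all hypothetical names, not from any real project):

```python
# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"id": int, "email": str, "age": int}

def conforms(record):
    """True only if the record has exactly the required fields with exact types."""
    if set(record) != set(REQUIRED_FIELDS):
        return False
    return all(type(record[name]) is expected
               for name, expected in REQUIRED_FIELDS.items())

# Sample data that conforms to the schema, and sample data that does not.
GOOD = [{"id": 1, "email": "a@example.com", "age": 30}]
BAD = [
    {"id": "1", "email": "a@example.com", "age": 30},               # wrong type
    {"id": 2, "email": "b@example.com"},                            # missing field
    {"id": 3, "email": "c@example.com", "age": 41, "extra": True},  # unknown field
]

def check_suite(validator):
    """Return a list of problems; an empty list means the check discriminates."""
    problems = []
    for record in GOOD:
        if not validator(record):
            problems.append(("rejected good data", record))
    for record in BAD:
        if validator(record):
            problems.append(("accepted bad data", record))
    return problems

# A sloppy check of the kind a bot might write without reading the schema.
sloppy = lambda record: "id" in record

print(check_suite(conforms))
print(len(check_suite(sloppy)))
```

If the second list is not empty, the tests are the thing to fix, not the data.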

boreal holly
frosty zealot
plush nymph
#

it overthinks, overengineers, goes on more tangents, builds things you dont want on the off chance you want them in the future....

unique spade
kind jay
plush nymph
#

the analogy i used earlier is if you were playing chess and asked 5.4 to move a specific pawn, it would move all the pawns on the board from both sides.

lean lark
cedar skiff
#

5.4 certainly takes a little more work to get in line than 5.3. I had my issue with it for sure. But i got it working in the end, just takes different nuance.

plush nymph
#

its not just different, it's difficult.

lean lark
#

That may be, can't argue, but we must learn how to work with the tools we have.

unique spade
#

Bro why don't you share some images with your obvious dissatisfaction with what codex did?

You know that kind with claude deleting the database and user getting irritated

Cause if it s that bad as you say, you should have a lot of those lying around

plush nymph
#

its near to the point that the time you saved is less than the time you spent fixing slop and trying to establish guardrails

frosty zealot
#

Hi guys, I'm new in here, I normally hang out over at the Claude discord, and I'm just wondering if I should give Codex a chance or if you think 4.7 Opus is a better model

plush nymph
#

i think opus is better rn

lean lark
#

Dear Newbie, I think you should try Eliza.

plush nymph
#

4.7 has better tools rn too

cedar skiff
#

you must have some serious code base issue to think that.

kind jay
#

@frosty zealot Am I niche?

plush nymph
#

keep in mind it intentionally gives you higher temp responses and then logs your dissatisfaction as a metric to tune the model in the future

#

so your frustration is OpenAI's profit

frosty zealot
kind jay
boreal holly
lean lark
#

Whenever someone says "I got a bad response" and we ask "what was the prompt?" The inquiry never goes anywhere. Bot responses depend on full context, which includes all .md files, custom instructions (or whatever), and the full thread. We never get that info ... and I don't want it.
Honestly, it's almost impossible to generate a repro case that includes all relevant context ... that's a fault in this tech now that I hope we'll be able to resolve in some years.

frosty zealot
kind jay
frosty zealot
#

click my profile

kind jay
lean lark
#

AFK - if Codex or ChatGPT do something you don't want, have you ever asked it Why it did that? The responses to that kind of question can be extremely illuminating and educational.

plush nymph
#

like if we're starting wars and bankrupting society to turn earth into a giant chatbot, I shouldnt have to spend all of my time setting up guardrails and engineering prompts

lean lark
#

wut?

plush nymph
#

u herd me

lean lark
#

"like if we're starting wars and bankrupting society to turn earth into a giant chatbot" wut?

plush nymph
#

i dont wanna do all of this. im going back to manually coding everything

kind jay
#

I’m going to vibe code a Spotify niche meter, feel free to do it before me

plush nymph
#

i dunno who thought this was a good idea but we're doing it

boreal holly
#

homie having an existential crisis over 5.4 needing some minor tuning

lean lark
#

In this channel we focus on using the technology. I think concerns about the world going to hell because of the tech is best in a different forum, whether we agree or not.

plush nymph
#

all money, all resources, the total aggregate of human productivity, all the rainforests, all the data centers, all the fresh water, all for a chatbot

lean lark
#

slow down brother.... The answer to your angst is in better prompting ... more work to use better tools. that's the way everything works. These are tough times, no doubt. I think we all feel it.

hard drum
plush nymph
#

⏵⏵ auto mode on

#

isnt that what you want? less input more output?

frosty zealot
lean lark
boreal holly
cedar skiff
#

i see GetX and it hurts my soul

boreal holly
lean lark
#

Robert does this for HVAC ... who knows what people do for real world use. 😆

frosty zealot
#

Robert was my apprentice, great apprentice, some might say the best,

lean lark
#

And that was just for the kitchen heating unit.

nocturne folio
#

how can i get good uptime like github

plush nymph
kind jay
#

Who wants to play Roblox with me?

kind jay
boreal holly
lean lark
#

Me reading my documentation and glancing at the screen as the spambot deletes a bad post.

steady vigil
plush nymph
#

interesting. yeah it's designed to do more work per run, which is great if you can dial it into what you want

#

not so great if you dont have a long, heavily guardrailed run for it to do

lean lark
#

☝️ blaming the tools

plush nymph
#

no i love writing long heavily guardrailed prompts every 4 minutes

lost drum
steady vigil
#

if you want success you cant rely on prompts, you need to back check the results... has to be done both ways. you cant prompt your way to clean code no matter how good you think your prompt is. its not gonna happen and over time will get worse due to error

gentle harbor
# lost drum but he can do it for you😭

doesnt ai already change your prompt to be better ?, what would the differance be to prompt the ai to make it better then paste it in then it makes it better again by it self

steady vigil
#

the SDK is great for that

gentle harbor
#

that was some bad grammar but it gets the question across

boreal holly
# steady vigil they have poisoned context no doubt

The most powerful anti-poison I've found is enable sandbox, set up rules that hard deny certain commands, and funnel them towards the outcome.

I have it set up so when a worker hits a sandboxed tool call that isn't allowed, it forwards it to the orchestrator, and their only way to handle it is denying the approval request. So when a worker starts drifting and doing bad stuff, the orchestrator sees the tool call drift and reminds em of the rules. Basically set up booby traps for the agents to run into and take that opportunity to put em back on track lol

steady vigil
#

but yeah that's a good layered approach

lean lark
#

Funny to see statements like "you can't" about things that I actually do.
I'm not saying everything is perfect here or that my prompts are awesome or that the bot doesn't make mistakes. I'm saying better use of the tooling yields better results, simply compared to poor use of the tooling obviously yielding poor results. Perhaps the easy compromise is "learn to do these things better". That's a daily effort for all of us.

#

Some people think they can tell a bot to write code and it will write whatever they want. You have to tell it What you want correctly, and How you want it correctly. The responses we get aren't "it's done, ship it". The responses are an ongoing sequence of "it should be better, check it, what's next?" ... just like coding we've been doing for decades.

steady vigil
#

prompt == assumption management. still need the objective feedback loop

kind jay
#

Hahaha

steady vigil
#

careful @kind jay lol

#

dont get yourself banned

kind jay
#

** test **

#
  • 1
lean lark
#

(Might be best to experiment in DM, and you can delete experiments when done)

boreal holly
#

DM the mods 😩

steady vigil
#

lol

lean lark
#

One of the mouseover options here is to delete your own message.

kind jay
plush nymph
lean lark
#

Get the LLM to improve your prompts and AGENTS directives.

cedar skiff
#

The real results are in adjusting the system prompt to suit your given project

boreal holly
#

As a matter of fact, have 5.4 rewrite all documentation, AGENTS.md, SKILL.md, and any resources in its own language and understanding.

plush nymph
#

you mean have it write guardrails everywhere.

#

inject guardrails into every doc, a mass proliferation of guardrails

#

what happens then is it starts rereading and reinterpreting those guardrails in a million different ways until it thinks everything violates those guardrails.

#

then you gotta rewrite em

cedar skiff
boreal holly
# plush nymph you mean have itt write guardrails everywhere.

You ever seen the movie Joe Dirt? There's a guy who's like "Home is where you make it", and Joe understands what he said differently. All your docs are written by different models that see the world differently. You avoid that by having 5.4 write the docs in its own words

lean lark
#

yes:

  • There are Account-level Codex directives. See cloud client settings for this. These settings apply to all Codex usage for all projects you do.
  • There are System-level Codex directives as ~/.codex/AGENTS.md.
  • There are workspace-level Codex directives in each /project/folder/AGENTS.md.
  • There are folder-specific Codex directives for /src or /docs or src/components
  • And you can tell the assistant to refer to specific .md files anywhere for additional guidance about specific topics.

Use ChatGPT or Codex to help you write ALL of these.
Ask it about conflicts, anomalies, contradictions, tensions, gaps, and suggestions for how to improve it all.
Take the time to do this and you will get SO much more out of the product.
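
To picture the layering, here is a small sketch that lists where directive files could sit for a given folder, from most general to most specific. `candidate_agents_files` is a hypothetical helper, and the general-to-specific ordering is an assumption for illustration; check the Codex docs for how discovery and precedence really work:

```python
from pathlib import Path

def candidate_agents_files(home, project_root, target_dir):
    """List possible AGENTS.md locations, most general first.

    Assumes target_dir is inside project_root; relative_to() raises
    ValueError otherwise.
    """
    home = Path(home)
    project_root = Path(project_root)
    target_dir = Path(target_dir)

    candidates = [home / ".codex" / "AGENTS.md"]   # system-level directives
    current = project_root
    candidates.append(current / "AGENTS.md")        # workspace-level directives
    for part in target_dir.relative_to(project_root).parts:
        current = current / part
        candidates.append(current / "AGENTS.md")    # folder-specific directives
    return candidates

for path in candidate_agents_files(
    "/home/me", "/project/folder", "/project/folder/src/components"
):
    print(path)
```

Printing the list for a folder you care about is a quick way to see which files you would need to review for the conflicts and contradictions mentioned above.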

plush nymph
#

but again 5.4 doesnt know what words mean. it's going to interpret the same word an infinite number of ways.

lost drum
#

thats the setup you must build for codex so he wont ever get lost and even write prompts for himself

plush nymph
#

yes lol

lost drum
#

mine does not idk bro

lean lark
plush nymph
#

yeah dude ive only burned a billion tokens idk what im talking about

#

ive got nowhere near the prowess or intuition as yall

lean lark
#

Use ChatGPT or Codex to help you write ALL of these.
Ask it about conflicts, anomalies, contradictions, tensions, gaps, and suggestions for how to improve it all.
Take the time to do this and you will get SO much more out of the product.

lean lark
plush nymph
#

who's deployed a vibecoded project yet?

lean lark
#

I don't want to insult you, sorry and know I have. You can do better.

plush nymph
#

how many vibe coded projects have reached production?

plush nymph
#

none

#

zero

cedar skiff
plush nymph
#

they dont make it

patent wharf
#

What am I doing wrong? I have GhidraMCP and have it linked to Codex. It's finding functions in the disassembled code. It will think for a while and stop. Like it found the entry point for the introduction story that you start with a button "Introduction". It was able to find it. I say "Follow the flow of that function. It should load Video, graphics, and sound files and eventually loop back to the main menu."

It'll build a list and then after 20-30 minutes say, "I made good progress to getting through the introduction loading, blah blah blah." I have to say "Okay keep going" every 20-30 minutes. I just want to let it ride. It's doing a good job, but I don't want to sit here all day nudging it forward

boreal holly
lean lark
#

The world is full of MVP public offerings. Vibing is the latest way to generate an MVP. There are a ton of them out there and the average consumer accepts them as SOTA. It's a horrible low-bar for the world.

patent wharf
#

The last prompt i said "Don't stop until we have a WORKING intro with the proper videos, audio, and text. If your reply after running refers to something not being quite there, or not being 100%, don't stop. Just keep going." That seems straightforward

plush nymph
#

name an MVP in production thats vibecoded

lean lark
#

@patent wharf perhaps you should break down the project into modules: do videos, then do audio, then do text... I have no idea about your project, can't engage on this one, but it sounds like you're literally asking it to do the whole thing in one shot and that's never a great approach.

vapid widget
#

Which LLM is better at writing code?

plush nymph
#

i would've told you claude for front end codex for back end back when it was 4.6 and 5.2

#

now idk

lean lark
#

🙄

boreal holly
cedar skiff
plush nymph
hard drum
lean lark
#

the brother is complaining about googling. I'm backing away.

hard drum
#

it's like the "indie game dev" space on TTV

cedar skiff
plush nymph
#

chatbots are good at speaking in convincing generalities

vapid widget
plush nymph
#

doesnt bode well that users of said chatbots speak in the same generalities and think what they're saying isnt just as cheap

lean lark
#

Your "name one" debate tactic is childish.

plush nymph
#

you cant name one

cedar skiff
#

name one what?

plush nymph
#

a single vibe coded project that made it to production.

lean lark
#

I just put AFK on my ignore list. Someone please message me when he grows up.

plush nymph
#

oh no ill get less ad homs. anyway

cedar skiff
# plush nymph you cant name one

i don't know anything about that? How can i know what projects are vibe coded and why does it matter?
It doesnt change your lack of skill and whining about not being able to use the tools at hand.

#

openclaw is vibe coded

plush nymph
#

what skill is it that i lack oh ad homming, speaking in generalities, cant provide specifics to anything....

cedar skiff
#

claude code is vibe coded

#

only a couple of billion dollar entries to the name one contest.

plush nymph
#

yeah that's what they report at least

#

they claim 90% of it was ai written

boreal holly
#

Call me Opus 4.7, but I'm failing to see the point in any of this

plush nymph
#

right now the point is for you guys to deflect any criticism of the current state of agents by saying 'skill issue'

#

its like an mmo whack-a-mole of negative user feedback

hard drum
#

JagFx being one of them

#

filled a need that apparently wasn't properly there

cedar skiff
ivory zodiac
#

whats good brethren

#

i'm building something cool

#

you guys wanna see?

high girder
ivory zodiac
#

whats good my boy

lean lark
#

This reminds me that I wish there was a chatgpt-discussions channel for professional users and now I'm wishing there was one for codex as well. 🥹

plush nymph
boreal holly
cedar skiff
ivory zodiac
#

smh

patent wharf
# lean lark @patent wharf perhaps you should break down the project into modules: do...

I've found the main entry point and can "access" all the main menu items. Started with the buttons for preferences menu, new game, and exit. It knows the first file and found where it was referenced. It's just a straight shot of loading flac, bmp and wav files.

I don't give it further instructions other than to keep going until complete.

It's made it the longest it ever has so far without stopping with an update so that's good. It's still going right now

ivory zodiac
#

jk one sec

#

actually you can see now. but i'm updating the cli so keep that in mind

plush nymph
ivory zodiac
cedar skiff
ivory zodiac
#

if you have skills, etc to add, feel free

#

go for it

unique spade
high girder
#

Ive made it so that claude now spawns codex agents and can use gpt image gen 2 as a skill

plush nymph
#

what is it that im saying i can't do?

#

dogpiling, speaking generalities, no clue what you're saying yourself...

cedar skiff
plush nymph
boreal holly
cedar skiff
plush nymph
#

now you guys are on the defense because you feel guilty for being clowns

#

ad hom rate up 200% increase egotism

#

lmao

cedar skiff
#

ok back to work, this is probably dev anyways.

boreal holly
# plush nymph such as

"It adds features I didn't ask for" "It doesn't work continuously until completion", those are two big and extremely solvable issues

plush nymph
#

yes it will do work i didnt ask for

#

that is a unique characteristic of 5.4

#

"its fixable" doesnt make it not a characteristic

lean lark
# patent wharf I've found the main entry point and can "access" all the main menu items. Start...

I'm sorry bud, but that's a bit deep with no context. I don't know what GhidraMCP is, or "main menu items" or "new game", etc, and shouldn't need to. It sounds like you're still trying an "all or nothing" approach, which is probably not a great approach for your challenge. It sounds like you're vibe-coding a big game. If you can break down what you're doing I think we can help a little more.

plush nymph
#

"you can ask it to blast guardrails into every doc until it stops doing the behavior you dont like" doesnt negate what i said.

cedar skiff
plush nymph
#

I could say a pinto is slower than a ferrari and you'd say skill issue

cedar skiff
plush nymph
#

just stop

ivory zodiac
#

ferraris arent that fast

#

change my mind

boreal holly
# ivory zodiac

Nice! Imo I have not tried plugins or apps yet but cool to see a practically official tool about it

plush nymph
#

where's the docs i need to have chatgpt populate with guardrails to make eric wimp understand he's conflating issues with the tools and actual user usage.

potent mason
lean lark
boreal holly
plush nymph
#

what are you writing?

#

what is unique to 5.4 language?

ivory zodiac
#

@plush nymph i have 3 tips for you. if you do this, it'll stop happening in almost every case

  1. Plan first, but when you create a plan, ensure that it has an ENUMERATED task list with validation loops/testing to ensure completeness and correctness

  2. in your global agents: "When working from a plan, I may be away from my keyboard. Ensure that you act boldly on my behalf and work autonomously without stopping until the plan is FULLY completed. DO NOT stop to ask for permission unless you are truly blocked or risk executing a potentially dangerous and irreversible action. Use your best judgement and continue until all tasks are complete"

  3. if your agent is "doing things you didnt ask for" that is because your spec is too ambiguous. gpt is highly steerable. Be specific. if you allow it to guess, it will.

plush nymph
#

its not xml, people have been using that

ivory zodiac
#

i have found since gpt 5.4, that using normal "plan mode" leads to the model not finishing

#

this is a fairly new behavior

#

even before that slightly

#

it is a real problem

#

but you can mitigate it

plush nymph
#

its a tradeoff because it doesnt want to use too many tokens in a single run, so its almost better that it stops

unborn thunder
#

Can someone explain codex usage on codex app vs chatgpt codex?

ivory zodiac
#

not necessarily because you could break caching too and then you're using even more tokens

#

sending new requests has its downsides

plush nymph
#

if you give it a complex task that should in theory use a lot of tokens, if it does all of it in a single run, it likely cut a lot of corners

ivory zodiac
#

not if you spec well

#

and steer well

#

really spend a lot of time on the spec

#

get granular

#

this will help prevent the corner cutting

#

but to your point, plans/specs should be well scoped where possible. dont try to do too much at once

plush nymph
#

i've written out entire projects in english in a single file before.

ivory zodiac
#

this will also help

#

yeah so that can be problematic. it just depends how big

#

scoped is better

#

where possible

#

there's nothing wrong with doing a whole project at once PER SE but if you can break it down into sub-specs, it can help. esp if its a big proj

hard drum
# ivory zodiac this will help prevent the corner cutting

&& use a framework so you have hooks, guardrails, subagents, && other forms of skills/commands at your disposal.

my workflow currently has 3 distinct grids of things going on, && top-left has 3 distinct repos open that are somewhat related to each other (language, vscode-extension, website)

#

the things you have to do to fit 16" when you cannot put one item elsewhere

#

that OBS takes space haha

#

but i need chat

lean lark
# unborn thunder Can someone explain codex usage on codex app vs chatgpt codex?

It's my understanding that the apps provide a way to centralize access for common functionality. Other access via cloud/web, VSCode extension, and CLI, provide more specific access, and a different kind of access. You kinda need to try each tool to see what fits with specific needs. I hope that helps a little. Others who use the apps might add some insight.

hard drum
#

my iphone is being used constantly by the 2 other agents to dump the app

ivory zodiac
hard drum
ivory zodiac
#

😂 jk

#

i prefer the codex app so much

hard drum
#

i prefer the workflow i have

#

it's not great, but it works for my adhd brain

ivory zodiac
#

have you tried cmux?????

plush nymph
#

i dunno if they fixed it but the terminal in vs code would glitch out and crash vs code

#

always better to just use the terminal

ivory zodiac
#

cmux is goated

plush nymph
#

yeah i like iterm2

ivory zodiac
#

i built a linux version

hard drum
plush nymph
#

iterm2 has some python api that lets you send messages to specific panes, so i built a messaging system around it for agents to message or delegate to each other. the problem is that sending a message to an agent interrupts its flow and theres not a great way to monitor if an agent is active and wait.

hard drum
#

besides, i gotta test my language's vscode extension at the same time

#

my workflow just works for me, idk mane

ivory zodiac
hard drum
#

i admit it's not fun for many, but if it ain't broke, don't fix it

ivory zodiac
#

and you get browser support

#

md editor

#

splittable panes

#

agent to agent

#

workspaces

#

tabs

plush nymph
ivory zodiac
#

notifications

#

when an agent needs ur attn

plush nymph
#

agents would just say "seen that notification before" and ignore it

#

had to manually break each session of that habit and it would eventually backslide to ignore them

#

it would be nice, because i think ideally, you have like 50 sessions, each of them honed in on a specific scope of your project, a specific workflow and you delegate to the ones you need when you need them

#

but you spend a lot more time breaking bad habits

#

the longer a session runs, the fewer unexpected behaviors it seems, especially if you dont switch tasks.

#

so now im trying to get a few sessions to do all of the work without any bad habits

#

we'll see if i can later fork those sessions and maintain the guardrails

ivory zodiac
#

something you need to understand about GPT models is, and this may actually get WORSE as models get better:

they are VERY good at following directions for the most part.

If you have a behavior that you dont like, then you need to adjust the context somewhere.

There's a reason for it. Sometimes that might be a system prompt you have no control over.

but usually you can fix this behavior.

So spend time on the prompts/crafting context.

#

sometimes you need to be obnoxiously explicit

#

its steerable to a fault

#

it doesn't infer well

#

even claude is getting worse at inferring, which is a trend i've noticed

plush nymph
#

yeah i think larger contexts cut two ways. you gotta hone that larger context which is more difficult

ivory zodiac
#

i wouldn't use the larger context for now

#

there's no benefit unless you have really huge files

#

codex compaction endpoint is literally MAGIC

#

keep it at the 262k and just let it compact

#

as many times as it needs

#

i have seen and heard numerous tests of intelligence degradation over 400k

#

and for what?

#

you gain nothing when you can just let it compact and have no meaningful degradation

#

there may be some usecases. huge repo. huge file. if that be the case, fine

plush nymph
#

lots of workflows and tasks that you want a single agent to perform maybe

#

if you want the same session to do everything on your project

ivory zodiac
#

you can keep the same session

#

let it compact

#

it wont forget

plush nymph
#

because without a larger context you may switch from task A to task B and it may forget how to do task A

ivory zodiac
#

it wont

plush nymph
#

ive seen it happen.

ivory zodiac
#

if you say so

boreal holly
ivory zodiac
#

if you have a reusable composable task you need to reoccur, you can always make a skill

cedar skiff
#

I use skills to solve most of the problems youre talking about.

ivory zodiac
#

but i can assure you that you do not need larger context and you are actively harming performance versus just letting it compact

cedar skiff
#

I also inject a small amount into the system prompt

plush nymph
#

im speaking in generalities, but for example, i'll have it work on front end for a while, then switch to fixing back end stuff. The longer i let it work on back end, the more it'll forget exactly how i want it to do frontend.

ivory zodiac
#

and you think a larger context window solves that?

#

it doesn't

#

the benchmarks even show that

#

models always weight things that happened more recently more heavily than things further back in the cw.

cedar skiff
plush nymph
#

i've always considered that loss of 'how to do task A' as a shift in context window due to limits in context window size.

#

i use multiple sessions but then you're spending time honing each of those sessions on the rules you want them to follow

ivory zodiac
#

More context is not the same as better context. Models can read long windows, but they don’t reliably preserve all earlier instructions with equal strength

cedar skiff
#

They are reusable guardrails that are called on demand

plush nymph
#

a skill is a doc it reads for context. you're still course correcting.

ivory zodiac
#

the further back they are, the less weight they get

#

and the worse they remember

#

on top of that, you harm actual outputs

cedar skiff
ivory zodiac
#

better to compact 🙂 trust

plush nymph
ivory zodiac
#

you will see basically no loss in intelligence or forgetting

#

if something is critical, make a skill

#

standardize

plush nymph
ivory zodiac
#

but its pretty good about remembering, also it has session logs!

cedar skiff
plush nymph
#

again a skill is just a doc. its just a runbook.

ivory zodiac
ivory zodiac
#

its a file system

#

the skill.md is the readme of that file system

#

it tells you who what where when and how to use the files therein

cedar skiff
ivory zodiac
#

you can even ask the agent to make them for you. ask it to scan the session logs, find opportunities to create standardizations for skills for reusability

boreal holly
#

The problem with AGENTS.md is an agent sees it exactly one time and forgets the whole thing. Skills they can opt into seeing it just in time.

If you set up land mines for the agents, where they're suddenly perplexed they can't do something, you can funnel that perplexity into a skill. For example I have a request-review skill that when they go to do anything with git before having requested a review, they get errors guiding them to review first. So you create the process, and you enforce it with hard blockers they're guaranteed to experience. That's pretty much the game to making the agents work perpetually and do everything correctly.

lean lark
# boreal holly Skills are way more than just docs. They're the most powerful means of opting in...

I admit that I've not migrated my environment from common .md docs to skills format yet. I understand that when AGENTS.md refers to any other doc, it's very indeterminate, more of a suggestion than a directive. But I've found with strong guidance on how things must work that the assistant always uses my .md files correctly - and I actually keep very few directives in standard .md files anyway.
I do not know yet how strongly Skills-specific .md files are processed as authoritative directives, similar to AGENTS.md files. Can anyone point to more specifics on this? The public docs frequently mix Claude and other notes amongst discussion of Skills. TYVM

plush nymph
#

ill have to look at more skills but for the most part its just a doc used as a runbook. you can have some scripts as part of that and tell it 'if conditional run x'

#

instead of saying 'reread agents.md and do specific task' you command $agents:specific_task

ivory zodiac
#

its reductive.

software is just some 1s and 0s

boreal holly
# lean lark I admit that I've not migrated my environment from common .md docs to skills for...

What makes skills special is this:

---
name: request-review
description: Use `request-review` for review-gated work. First run `request-review-role-instructions` so the current thread loads only the worker or orchestrator guidance it actually needs. [skill-hash:91e3b8c]
---

They see just that part in their context. If you change the description, they immediately see it at the start of the next turn, so your process can evolve in the short description there and they're enticed to follow the updates.
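
To make the "they see just that part" idea concrete, here is a naive sketch that pulls only the frontmatter out of a skill file. It assumes flat `key: value` pairs between `---` lines; `read_frontmatter` and the sample body text are hypothetical, a real loader would use a proper YAML parser, and this is not Codex's own code:

```python
def read_frontmatter(text):
    """Return the key/value pairs between the opening and closing '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as having no frontmatter

# Hypothetical skill file: short header plus a long body.
SKILL_TEXT = """---
name: request-review
description: Use request-review for review-gated work.
---
Long step-by-step instructions live down here and are only read on demand.
"""

summary = read_frontmatter(SKILL_TEXT)
print(summary["name"])
print(summary["description"])
```

The point of the split is that the short header is cheap to show every turn, while the body only costs context when the skill is actually used.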

You combine skills with rules. Codex lets you define rules for command execution where you can block certain commands with justification. I've pretty much blocked all git commands, and other commands I see as drift, and the justification you can write "use the skill instead". Because usually they aren't touching git until the end of the work, the justification is like "use request-review, and prefer the sanctioned git scripts" which I have allowed, and only do non-destructive git ops.

You can kinda engineer the determinism and guidance around skills, and finding a way to make road blocks push them into using skills
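
As a rough illustration of the block-with-justification pattern (this is NOT Codex's actual rules syntax; `DENY_RULES` and `check_command` are hypothetical names, and the second rule is invented for the example):

```python
import shlex

# Each rule: (command prefix as tokens, justification shown to the agent).
DENY_RULES = [
    (("git",), "use request-review, and prefer the sanctioned git scripts"),
    (("rm", "-rf"), "destructive deletes are not allowed here"),  # invented rule
]

def check_command(command):
    """Return (allowed, justification) for a shell command string."""
    tokens = tuple(shlex.split(command))
    for prefix, justification in DENY_RULES:
        if tokens[:len(prefix)] == prefix:
            return False, justification
    return True, ""

for cmd in ("git push origin main", "ls -la"):
    print(cmd, "->", check_command(cmd))
```

The justification string is the useful part: the denial itself tells the agent which skill to reach for instead.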

cedar skiff
# lean lark I admit that I've not migrated my environment from common .md docs to skills for...

skills are proactively called when they are needed, they arent front loaded into the context. So they dont add to context rot in the same way, they are directly next to the task they are designed to help with.
They have a description field that the agent uses. The description should be formatted to say what the skill is and when to use it.
For example i have a skill that is loaded every time the agent touches a unit test. It handles the common problems i had with unit tests being low value. It is only called when an agent gets to the testing phase.

#

I dont need to worry about that problem anymore

#

it just works now

slate schooner
#

is codex fked for anyone else? impossible to open app

lean lark
#

I need to come back to Robert's post a few times to grok it. You're both saying skills are proactively called when required and that the rules don't front-load unnecessary bulk into context. I understand the value of that and I think I already do that with my own system of AGENTS.md > /docs/procedures files, instructing the assistant to consult specific docs under specific conditions. Since we can't see per-turn context I can't assert for sure what I've pulled into context. I need to read more docs and think on it. I will need to build a new WSL environment with Skills to ensure I don't break the instances that are already running my system ... probably what Robert went through when writing robdex.

lean lark
#

Why do people do that?

slate schooner
#

U want my whole life story? It was just a question my dude

glacial shadow
#

gather around everybody, TS is a special guest and will be sharing his life story

#

ready when you are, @slate schooner

slate schooner
#

Hi my name is TS. I live in Uganda. I like dogs.

#

pretty much it

boreal holly
# lean lark I need to come back to Robert's post a few times to grok it. You're both saying ...

You can see per-turn context in the *.jsonl session files. Also I've swept through the codebase (up to v0.116.0) and that's pretty much how I figured out skills.

If you wanna try out skills without breaking anything you can launch mkdir ~/.test_codex && CODEX_HOME=$HOME/.test_codex codex. That'll give you a fresh codex home that you can fill with skills and all sorts of stuff to give it a whirl, and if you decide you don't like it, the default CODEX_HOME is $HOME/.codex

glacial shadow
slate schooner
#

Codex is just stuck here, ive tried everything, Weird

lean lark
cedar skiff
#

They can also be more tool-based, like I have skills that create an account and verify the email address in Gmail

unique spade
#

the agent itself doesn't opt in to anything, it just gets fed whatever the harness is set up to feed it

boreal holly
cedar skiff
unique spade
#

i didn't say it was a good idea

boreal holly
unique spade
#

i said IF you want it to be seen every turn

YOU CAN DO THAT

#

it's a conditional guys

cedar skiff
lean lark
#

Why do we know/assume AGENTS.md files are forgotten? I don't even see that happening on compaction.

unique spade
#

what it gets at every turn, what it gets after compaction

#

what is the order

lean lark
#

System instructions are rolled up pre-context window. I'll assume you have read that code and trust that the info is correct. So is there any indication why AGENTS files haven't been configured like that too?

unique spade
boreal holly
lean lark
#

I'm thinking something like this...

-- System Instructions --
-- Account-level AGENTS.md --
-- System-User-level AGENTS.md --
-- Workspace-level AGENTS.md --
-- Folder-level AGENTS.md --
{{{ Context Compaction Window }}}
unique spade
lean lark
#

I certainly will, just haven't had time.

unique spade
#

agents.md if i remember well it's a thing only at start of a thread

#

it doesn't have a special place in context management

lost drum
#

can someone tell if reset was yesterday or 20th?

unique spade
lean lark
#

That makes it a part of context and therefore subject to the window. Do we have confirmation that Skills files aren't rolled out the same way??

unique spade
#

just the raw model on server side

lean lark
#

That's not correct... There are many tiers between model and apps that manage conversation/context.

boreal holly
unique spade
#

the agent is not aware of the deterministic harness that wraps it

#

it is only aware of what it has in context window

lean lark
#

OK, here's one then... If Skills are compacted within context and there's nothing left but a summary of earlier context, then how does the model know what skills to pull in next?

boreal holly
unique spade
#

robert knows the stuff

ivory zodiac
#

🪄

#

its fkn magic

#

i'm telling you

boreal holly
#

Yeah pretty much lol

ivory zodiac
#

codex is so intelligent you can literally give it 5 different skills at once all interleaved in instructions and it can coherently parse and use all of them like a 600 iq martian space traveler

lean lark
#

So that confirms that there is a layer Outside of conversation context that's not touched by compaction. Which matches the model I described earlier. The question is what else survives compaction? What venus has been saying about Everything getting compacted is not correct, and I'm sure we'd see that in code. If it was correct then none of our AGENTS.md directives would survive first compaction, but that's not what happens. After compaction a turn will still complete, following firm directives. It goes brain dead on the nitty gritty of the context, not on the directives which guide actions.

unique spade
#

i said everything is context for the model

lean lark
#

You absolutely said that.

unique spade
#

compaction doesn't touch everything that is in context

unique spade
cedar skiff
#

compaction just works most of the time. The only time I feel a bit of pain is if it compacts really close to the start of implementation after planning.

lean lark
cedar skiff
#

the agent.md system in codex could probably benefit from looking at the way anthropic did it.

unique spade
lean lark
#

I smell gaslight. Look, it seems like this is the model that we're observing:

-- System Instructions --
-- Account-level AGENTS.md --
-- System-User-level AGENTS.md --
-- Workspace-level AGENTS.md --
-- Folder-level AGENTS.md --
{{{ Context Compaction Window }}}

Who disagrees and why?

boreal holly
# lean lark So that confirms that there is a layer Outside of conversation context that's no...

The AGENTS.md file does survive until COMPACT_USER_MESSAGE_MAX_TOKEN is reached, then it falls off the back.

Back in v0.67.0, when it did local compaction, it used to get preserved: it would pass the AGENTS.md files and the compaction summary message to the new agent. Now that it does remote compaction, it preserves all user messages + mental state blob, and truncates user messages after that constant. The AGENTS.md files are the first user message, so they fall off first
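
To make the "falls off first" part concrete, here's a toy model in Python. It's a simplification of the behaviour described above, not the real Codex code: the compact function is made up, word count stands in for a real tokenizer, and max_tokens plays the role of COMPACT_USER_MESSAGE_MAX_TOKEN with arbitrary values. The assumption being illustrated is that the budget is spent on the newest user messages first, so the oldest one (AGENTS.md) is the first to go.

```python
def compact(user_messages, summary, max_tokens):
    """Keep the newest user messages that fit the budget, then append the summary.

    Assumption for this sketch: messages are dropped whole, oldest first.
    """
    kept = []
    budget = max_tokens
    for msg in reversed(user_messages):   # walk from newest to oldest
        cost = len(msg.split())           # crude stand-in for a token count
        if cost > budget:
            break                         # everything older falls off
        kept.append(msg)
        budget -= cost
    kept.reverse()                        # restore chronological order
    return kept + [summary]


history = [
    "AGENTS.md: always run tests before committing",  # first user message
    "add a login page",
    "now fix the failing test",
]

# Large budget: AGENTS.md survives compaction.
print(compact(history, "<summary blob>", max_tokens=20))
# Small budget: AGENTS.md is the first thing to fall off.
print(compact(history, "<summary blob>", max_tokens=10))
```

That would also explain why short threads seem to keep following AGENTS.md after compaction while long ones eventually stop.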

unique spade
#

my point with looking at code was simply that instead of imagining how it works, you can actually go look and see how it works

unique spade
lean lark
#

That's my belief based on observation. Why is that wrong?

lean lark
#

If one asserts that Skills are better because they survive compaction, and we agree that AGENTS.md and the established model work the same way, then why are Skills better in that specific regard?

unique spade
ivory zodiac
#

that way state is always preserved

lean lark
#

🦗 That's what I thought. 🦗

unique spade
#

at new turns the agent doesn't get it again as a special instruction

#

it just remains in the larger context

lean lark
unique spade
lean lark
#

You're ignoring the part about Compaction. 🙄

boreal holly
lean lark
#

Yes. So if it gets compacted then it loses detail. If that's actually the case then how is it still following the same directives at the end of a compacted thread?

unique spade
#

the purpose of compaction is to keep whatever was important for the agent's working memory state

lean lark
#

ChatGPT loses high-level instructions and gets stupid. Codex doesn't do that.

lean lark
boreal holly
unique spade
boreal holly
#

See, it trimmed everything except user messages. AGENTS.md files are user messages

lean lark
#

Yes, so AGENTS.md is outside of what's compacted. Which is exactly the same as what is being said about Skills.md. So how is Skills "better" when the same process is in place?

unique spade