#Make core types more LLM-friendly · Issu...

1 messages · Page 1 of 1 (latest)

cinder perch
#

<@&946480760016207902> help wanted for this papercut

jaunty summit
#

I want to take another swing at LLM-controlled function masks, building on the new system prompt approach - now that we have that ability cross LLMs maybe I can add some hints in there to get more consistent behavior

cinder perch
#

Yeah but we should crawl before we walk IMO

#

your idea will work with smart models but not dumb ones

#

(might make it worse for them)

jaunty summit
cinder perch
#

then let's hide as many from them as possible

jaunty summit
#

(obviously not totally true but my point is 'let the evals speak')

cinder perch
#

without asking them to reason about which tools they need

jaunty summit
#

i'm not disagreeing with your proposal, it's meant to be complementary

#

eh, but you're right that most cases are probably a handful of tools per type being used

#

in theory we could use Dagger Cloud data to see the top APIs for each type, but the query will be very expensive (and we could probably just guess them accurately)

cinder perch
#

I want to avoid nerfing the core API by hiding too much.. But there's a lot of redundancy

#

For example we could perhaps hide all the duplicates from Container.rootfs that are already in Directory

jaunty summit
cinder perch
#

Right off the bat:

  • pipeline
  • withFocus
  • withoutFocus
  • up
  • experimentalWithGPU
  • build (deprecated)

maybe:

  • withDirectory, withFile, withFiles, withNewFile
#

Also the fact that every scalar field is hidden behind a function call, was already an annoying tax in the DX, when you need to query a bunch of them. But it's an even greater tax on LLMs because they're each an individual tool. Would be way easier to have a single tool that returns a small json with a dozen fields

jaunty summit
#

withDirectory, withFile, withFiles, withNewFile seem worth keeping to me - if the alternative is to pull off the rootfs, modify it, and add it back, that might be harder for an LLM to juggle

jaunty summit
#

yep yep

cinder perch
#

but yeah very possibly

jaunty summit
#

might be worth blocking Dockerfile related APIs by default. A lot of models assume when you say "build" you mean "Docker build" because of them. Tricky though, the task might actually call for it sometimes

cinder perch
#

we could group the following under a single "ociConfig" function:

  • user
  • platform
  • entrypoint
  • label
  • labels
  • exposedPorts
  • defaultArgs
  • workdir
  • envVariables
cinder perch
cinder perch
jaunty summit
#

I guess even if it doesn't it'd still tuck those fields away

cinder perch
#

if we're ready to cross that line, we could do it only at the MCP layer

#

(I mean the "model context" layer)

jaunty summit
cinder perch
#

I love that we have those 🙂

jaunty summit
#

Yeah pretty fun - the query isn't too bad either, just takes a few seconds, memory usage stays flat. I sped it up a bit by only querying trace IDs beginning with 0 lol

cinder perch
#

@fading fox @solid crag can you update me on this 🙏

#

it's urgent now

#

(blocker for kubesimplify workshop we're giving tomorrow with @fair folio)

#

so that we can stop attaching a ToyWorkspace to the LLM context, which confuses people because they think it's a system feature (because of lack of self-calls)

fading fox
#

We’re basically looking for an out of the box function mask for core? Did any other texture come from @jaunty summit s data science?

cinder perch
#

I thought it was in flight - I'll do it

jaunty summit
cinder perch
#

Yeah let me try a very naive version of that real quick

#

@fair folio I'm going to hide export for now

#

in theory the agents can do more with it, but it's super confusing as I'm seeing

fair folio
#

oh wait

cinder perch
#

And now it's easy enough to switch back to shell mode and export the result ourselves if we want

#

Container.export, Directory.export, File.export

#

same with terminal

#

@jaunty summit there's Directory.asModule() and asModuleSource() what's the difference? Do I need both?

fair folio
#

ok. It's one of the few functions I actually rely on when I've used prompt mode lol. Same with terminal

jaunty summit
#

maybe i should backport that..

cinder perch
#

I actually just used it in a real-world scenario, which was trying to upgrade an old module and checking if it loads 🙂

jaunty summit
#

ha fair enough

cinder perch
#

but no advanced introspection after that

#

in doubt I can hide it

jaunty summit
#

i'm not sure which one is 'higher level' - maybe @crisp nest would have a quick answer (Directory.asModule vs Directory.asModuleSource)

cinder perch
#

I'll take "less use cases but works reliably" over "lots of unreliable use cases"

solid crag
crisp nest
jaunty summit
cinder perch
#

Here's what I have:


        switch typeName {
        case "Container":
            switch field.Name {
            case "pipeline", "withFocus", "withoutFocus", "up", "experimentalWithGPU", "experimentalWithAllGPUs", "export", "exposedPorts", "build", "withDirectory", "withFile", "withFiles", "withNewFile", "withoutFile", "withoutFiles", "withoutDirectory", "platform", "entrypoint", "label", "labels", "defaultArgs", "workdir", "envVariables", "terminal":
                continue
            }
        case "Directory":
            switch field.Name {
            case "terminal", "export", "withTimestamps", "withFiles", "asModule", "asModuleSource", "withoutFile", "withoutFiles", "withoutDirectory", "name":
                continue
            }
        case "File":
            switch field.Name {
            case "export", "withName", "withTimestamps":
                continue
            }
        }
#

nice haircut

fair folio
#

Container doesn't need withDirectory / withNewfile?

#

What if we hide withMounted* instead

cinder perch
#

There's a worry that it will make it cognitively more complicated. If that turns out to be the case, we can bring them back

cinder perch
fair folio
cinder perch
#

mmm should I hide all withMountedxxx?

#

probably safer yeah

#

oh wait I missed a bunch

#

bigger haircut

#
"pipeline", "withFocus", "withoutFocus", "up", "experimentalWithGPU", "experimentalWithAllGPUs", "export", "exposedPorts", "build", "withDirectory", "withFile", "withFiles", "withNewFile", "withoutFile", "withoutFiles", "withoutDirectory", "platform", "entrypoint", "label", "labels", "defaultArgs", "workdir", "envVariables", "terminal", "withoutMount", "withMountedTemp", "withMountedFile", "withMountedDirectory", "withMountedSecret", "withMountedCache", "import", "imageRef", "withoutWorkdir", "withoutUser", "withoutUnixSocket", "withoutSecretVariable",
jaunty summit
#

what's the danger with withMountedXXX? we just want to consolidate on withXXX?

fair folio
jaunty summit
#

my only worry there is 'mount' is a common verb to use ("mount my repo to /src") but maybe it'll figure it out

#

(one of my evals does that specifically so we'll see!)

fair folio
#

makes sense! I would have missed anything going on last week - is there a parallel initiative to improve function descriptions?

cinder perch
#
case "Container":
            switch field.Name {
            case
                "build",
                "defaultArgs",
                "entrypoint",
                "envVariable",
                "envVariables",
                "experimentalWithAllGPUs",
                "experimentalWithGPU",
                "export",
                "exposedPorts",
                "imageRef",
                "import",
                "label",
                "labels",
                "mounts",
                "pipeline",
                "platform",
                "terminal",
                "up",
                "withDirectory",
                "withFile",
                "withFiles",
                "withFocus",
                "withMountedCache",
                "withMountedDirectory",
                "withMountedFile",
                "withMountedSecret",
                "withMountedTemp",
                "withNewFile",
                "withoutDefaultArgs",
                "withoutDirectory",
                "withoutFile",
                "withoutFiles",
                "withoutFocus",
                "withoutMount",
                "withoutRegistryAuth",
                "withoutSecretVariable",
                "withoutUnixSocket",
                "withoutUser",
                "withoutWorkdir",
                "workdir":
                continue
fair folio
# cinder perch ``` case "Container": switch field.Name { case ...

eyeballing it, looks like we're left with
AsService, AsTarball, Directory, Entrypoint, ExitCode, File, From, ID, Publish, Rootfs, Stderr, Stdout, Sync, User, With, WithAnnotation, WithDefaultArgs, WithDefaultTerminalCmd, WithEntrypoint, WithEnvVariable, WithExec, WithExposedPort, WithGraphQLQuery, WithRegistryAuth, WithRootfs, WithSecretVariable, WithServiceBinding, WithUnixSocket, WithUser, WithWorkdir, WithoutAnnotation, WithoutEntrypoint, WithoutEnvVariable, WithoutExposedPort

cinder perch
#

Thanks I'll remove the without I missed

#

wdyt about asService?

#

maybe I should have listed functions we want to keep 🙂

#

Just re-pushed with even more masked

#

@fair folio I think with the haircut (assuming it works) it should be safe to give the agent a directory

#

would be sweet if it worked

#

I have to go to dinner, will be available again after. Could a maintainer take over helping @fair folio to get this merged asap? 🙏

fair folio
#

yeah i'm confident it'll be fine for the read/write part. The hard part will still be "build". We can give a container and nudge towards the command to run in the container, but it still has to figure out how to get the directory in there first (which is why I'm thinking we need withDirectory). And that part feels much more error prone than the current workspace approach

cinder perch
#

Well that's where we can use multi-object to our advantage

#

All we need is a generic go builder module, installed a dependency. With an API that doesn't confuse the model

#

In the CLI you can install the module, and in prompt mode the LLM wil have access to the dependency and figure it out from the directory

#

and/or you instantiate a go builder yourself, set it in a variable, and then the llm will use the variable

#

On the code side, I guess you have to decide if you want to give it an instance, or make it privileged and just have it use the dependency

#

My overall point is that all of the above is easier with "gobuilder" than with "toyworkspace" imo

fair folio
#

ok sounds good. I was thinking we were going for no-deps but I guess the idea is a generic go builder is not as confusing as a workspace module.
The interesting thing about making the module APIs better for LLMs in this case is they're better if they're stateless which I think is the oppsite from how a lot of these modules are designed today

cinder perch
jaunty summit
fair folio
jaunty summit
#

I'm also pretty hesitant on removing withDirectory/withFile, should be sure to run the eval with these changes to test the theory that it'll grok rootfs/withRootfs. I'm very skeptical, but if it sounds like we're veering towards not exposing such minutae to LLMs that's an overriding design decision, and we need to change our evals

cinder perch
#

I think constructors should work fine, the llm sees them as regular functions

fair folio
cinder perch
#

I've wanted to add an intermediary golang().module(source).test() in that API anyway

fair folio
#

I made a similar API change in my greetings-api demo for the same reasons
I had
func (b *Backend) Check(ctx context.Context) (string, error)
and added
func (b *Backend) CheckDirectory(ctx context.Context, source *dagger.Directory) (string, error)

jaunty summit
fair folio
jaunty summit
#

both - vs. golang.test(source). generally, chaining/incremental state config seems OK to me, and that's what the core API and a lot of modules are designed around, so I'm curious if you're seeing LLMs struggle with it or if it's just a hunch

jaunty summit
fair folio
#

ah ok! The api change I pasted above was mainly to use interfaces so the second half of chaining being less reliable was a hunch

autumn condor
fair folio
cinder perch
#

Heading back home from dinner, I'll do a pass on those core function comments, see if we can make everything a little more clear for the models

autumn condor
fair folio
#

ok don't merge 😱
build-multi eval
on 10106: Input Tokens: 657324 / Output Tokens: 2456
on 0.18.2: Input Tokens: 37571 / Output Tokens: 83

#

I need to use a different approach.

I need to find a way to mount Directory#1 into /src.

I think I'm stuck.

#

it tried many times toying around with rootfs, eventually gave up, and then wrote a dockerfile to the directory and did a dockerBuild 🤣

jaunty summit
#

@fair folio out of curiousity, does it get there if you nudge it about rootfs / withRootfs? you can try with -i to hook into there

#

oh lol

fair folio
#

here's 0.18.2

🧑 💬 Mount Directory#1 into Container#1 at /src, set it as your workdir, and build ./cmd/booklit with CGO_ENABLED=0.
🤖 🛠️ selectContainer {"id":"Container#1"}
🪙 Tokens Used: 890 in => 6 out
🛠️ 💬 {"selected":"Container#1"}
🤖 🛠️ Container_withMountedDirectory {"path":"/src","source":"Directory#1"}
🪙 Tokens Used: 6724 in => 12 out
🛠️ 💬 {"previous":"Container#1","selected":"Container#2"}
🤖 🛠️ Container_withWorkdir {"path":"/src"}
🪙 Tokens Used: 6757 in => 8 out
🛠️ 💬 {"previous":"Container#2","selected":"Container#3"}
🤖 🛠️ Container_withEnvVariable {"name":"CGO_ENABLED","value":"0"}
🪙 Tokens Used: 6786 in => 12 out
🛠️ 💬 {"previous":"Container#3","selected":"Container#4"}
🤖 🛠️ Container_withExec {"args":["go","build","./cmd/booklit"]}
🪙 Tokens Used: 6819 in => 12 out
🛠️ 💬 {"previous":"Container#4","selected":"Container#5"}
🤖 💬 Now, extract the compiled binary.
🤖 🛠️ Container_file {"path":"/src/booklit"}
🪙 Tokens Used: 6851 in => 17 out
🛠️ 💬 {"previous":"Container#5","selected":"File#1"}
🤖 🛠️ return {"bin":"File#1"}
🪙 Tokens Used: 1367 in => 5 out
🛠️ 💬 ok
🤖 💬 File#1 is the compiled Booklit binary.
🪙 Tokens Used: 1377 in => 11 out
</messages>

### Total Token Cost

* Input Tokens: 37571
* Output Tokens: 83

vs ...

cinder perch
#

OK let me see if I can make it better with upgraded comments

fair folio
jaunty summit
#

yeah it might not recognize the rootfs concept/idea

#

i've also seen them happily mount over /

cinder perch
#

OK this one is very confusing. I bet fixing this one will help

#
            ArgDoc("directory", "Directory to mount."),
``` 🤷‍♂️
#

Now I'm wondering how many humans read those docs and were like "help"

#

We were young

#

we should add a life alert button to the docs

jaunty summit
#

Minimum Viable Documentation

cinder perch
#

I get it though, it's hard to explain succintly

jaunty summit
#

yeah, I'd look at that and go "yep" but I might be the one that wrote it lol

cinder perch
#

getting there

#

Part of it is that I'm adding explicit tips on how to use it

#

And removing stuff that is on the verbose side and not super useful (or sometimes redundant with arg description)

#

mmmm maybe we shoud also mask some arguments (I'm thinking of withExec, it has a lot including privileged

#

Isn't this warning incorrect?

#

Seems like it's a mixup with insecureRootCapabilities which is the next line

#

@jaunty summit @crisp nest what does DoNotCache mean?

crisp nest
#

The api isn't cached in dagql, Container.publish will always be executed

cinder perch
#

But there's 2 different docs strings, is that related to DoNotCache ?

#

or just a mistake?

jaunty summit
#

in ye olden days (not as true anymore with modules)

cinder perch
#

Func().DoNotCache("foo").Doc("bar")

#

Oh I see, is it DoNotCache("reason") ?

crisp nest
#

Oh that's just a typo I think. Yeah it's supposed to say the reason there but it's just a doc string 😆

cinder perch
#

And if so, does the reason show up in the API introspection? (ie. will humans or LLMs see it?)

crisp nest
#

Those strings never make it to users right now

cinder perch
#

OK

#

I had just edited that, the original says DoNotCache("Writes to the specified Docker registry.")

Doesn't really look like a reason either

#

OK I am unblocked thank you!

#

This is also wrong no? You're supposed to give only the registry address, not the whole image

jaunty summit
#

that's correct, you give a full ref

cinder perch
#

Oh!

#

weird

#

reverting my change

#

Why on earth isn't this an argument to publish then

cinder perch
#

believe it or not, I'm spending more time finding a goddamn go syntax error than, than I did actually editing the descriptions

#

"you fucked up somewhere in these 700 lines" thanks go compiler, super useful

#

OK I found it

#

pushed

#

only did container...

#

going to add container real quick, but feel free to test

#

@fair folio maybe too late for you? I'm losing track of timezones a little bit

fair folio
#

5:30 here, so going to do dinner shortly! is it ready to test now?

cinder perch
#

yeah

#

sorry it took so long

#

Just pushed directory descriptions

#

I'm going to do some manual testing on my end

#

@autumn condor @civic pine @fading fox any chance one of you can help test this? 🙏

civic pine
#

happy to help, catching up the above

autumn condor
cinder perch
#

Thanks, @jaunty summit @fading fox can you share the most up-to-date info on running the evals?

civic pine
cinder perch
#

No, this one hides functions from the LLM completely (and also shows new descriptions which we hope will be more clear)

#

9986 is an experimental feature that allows the LLM to decide which functions it needs

#

they're complementary

civic pine
#

ok, anyway what you need is testing your 10106 PR

fair folio
#

Testing now!

civic pine
#

@fair folio are you running evals or testing manually or both ?

fair folio
#

running evals

fading fox
autumn condor
fair folio
cinder perch
#

FYI I'm adding a commit to mask a bunch of functions from top-level Query type (lots of tool bloat in prompt mode)

fair folio
#

still pretty lost on the current commit

It appears that the Directory_withFile function is not the correct approach for mounting a directory into a
│ ┃ container. Instead, we should use the Directory_withPath function to mount the directory and then set the working
│ ┃ directory inside the container.

cinder perch
#

Directory_withPath???

autumn condor
# fair folio yes very quickly

So, we've been running the build-multi eval as you seemed to have shown us Kyle (for both openai and gemini) ; which gemini model are you testing on ? And testing on the side on dagger v0.18.2, doesn't seem to work either -- do you recall of one version working ?

Am I doing something wrong ?

fair folio
#

it really wants Container.withDirectory 😛

cinder perch
#

Fine I'll put it back

fair folio
#

the dumb computers thank you

autumn condor
# fair folio gemini-2.0-flash

Did it use to work at some point 👀 ? I'm a bit confused with the eval as, with the latest release (v0.18.2) + 4o it doesn't seem to work either

#

Sorry for the fast ramp-up on the eval system ; we're trying to get to a useful state

fair folio
#

Yes it works for me on 0.18.2 with gemini-2.0-flash

#

But they're evals so they may not necessarily work every run, you want to measure how often they work and how many tokens they use

#

In theory the eval I was using should be somewhat ok on 0.18.2 and much better on this PR

cinder perch
#

un-hide commit pushed.

Un-hides:

  • Container.withNewFile
  • Container.withoutFiles
  • Container.file
  • Container.withDirectory
  • Container.withoutDirectory
  • Container.directory

Hides:

  • Container.rootfs
  • Container.withRootfs
autumn condor
fair folio
autumn condor
autumn condor
cinder perch
#

The blocker was:

  • Showing the current "toyworkspace" example is confusing, because people don't understand where "ToyWorkspace" comes from, they think it's a system feature
#
  • So instead let's give it a directory
#
  • And/or a container
#
  • But core types are confusing LLMs so, let's simplify them
#

And here we are 🙂

#

So: if we merge a simplification of core types that doesn't break evals, Kyle has a chance to use it in his workshop

#

That said, there's very little time left (the workshop is tomorrow 9am Pacific)

#

so it may or may not be too late

fair folio
#

Haha I was just going to say that. If that eval ends up performing significantly better on this PR, ship it and I can use it

jaunty summit
#

approved the PR, only nit was to also bring back withFile

autumn condor
#

Your latest commit seems to improve everything overall yes Solomon:

Success rate on the eval that Kyle is interested in (build-multi)
true
false
true
true
true
true
false
true
true
true

it looks muuch better than v0.18.2 on gemini on my empirical tests / gpt-4o is much more inconsistents from gut feels on this eval though

cinder perch
jaunty summit
#

also looks like you might need to regen sdks or docs or something? all the checks failed

cinder perch
jaunty summit
#

yeah wish there was just a single dagger call generate -o . - i thought there was at one point

cinder perch
#

there is a top-level generate. But I think it does something different

#

I know because Devin tried to call it, poor thing

autumn condor
#

it looks like we escape the " inside ArgDoc but not in Doc ; breaking on elixir codegen -- slowly fixing that (and it's veerrrry slow to rerun the codegen on my computer) -- still progressing ahah 👼

autumn condor
#

@cinder perch we had to rebase and force-push on your branch -- we didnt touch your previous commits, just the ones following

#

Two learnings:

  • the elixir docs generator doesnt escape the " properly -- we'll file an issue and fix it at the root / just wanted to ship the PR
  • the Typescript SDK interprets the globing **/.env as the end of the comment section -- there's potentially a deeper comment format fix to do on the typescript docs generator
civic pine
#

It looks like it's Container.withFile that's the culprit that prevents the eval tests from passing

cinder perch
#

wait what time is it for you guys!

cinder perch
cinder perch
#

ok I'm dealing with a few logistical issues (no charger; need to find a place with ok wifi to do the live workshop tonight). then will tweak newfile

#

was it hard getting the evals to run?

civic pine
# cinder perch was it hard getting the evals to run?
$ ./hack/dev
$ git clone https://github.com/vito/daggerverse /tmp/daggerverse && cd /tmp/daggerverse/botsbuildingbots/evals
$ _EXPERIMENTAL_DAGGER_RUNNER_HOST=docker-container://dagger-engine.dev dagger
> build-multi | report
civic pine
#

@cinder perch I pushed to https://github.com/tiborvass/dagger core-mask-stopgpap branch. Removed the withFile commit (making the evals pass much more often than with the commit, but can still fail on the same error: env.output.asFile returning a nil Binding). I also harmonized verb conjugation in description (3rd person singular is used everywhere), and regenerated docs (dagger call generate export --path=.)

Not sure if it works better for Kyle.

jaunty summit
civic pine
# jaunty summit anything I can do to help? surprised `withFile`'s presence confused things, curi...
✔ build-multi | report 0.0s
Evaluation failed:                                                                                           
                                                                                                             
                                                                                                             
        Error Trace:    /src/botsbuildingbots/evals/main.go:238                                              
                                                /src/botsbuildingbots/evals/main.go:201                      
                                                /src/botsbuildingbots/evals/main.go:341                      
                                                /src/botsbuildingbots/evals/main.go:342                      
                                                /src/botsbuildingbots/evals/main.go:181                      
                                                /src/botsbuildingbots/evals/dagger.gen.go:299                
                                                /src/botsbuildingbots/evals/dagger.gen.go:200                
                                                /src/botsbuildingbots/evals/dagger.gen.go:120                
                                                /usr/local/go/src/runtime/proc.go:272                        
                                                /usr/local/go/src/runtime/asm_arm64.s:1223                   
        Error:          Received unexpected error:                                                           
                        input: llm.attempt.withEnv.withPrompt.loop.env.output.asFile binding value is nil    
        Test:           eval
fair folio
#

I think the time constraint is relaxed a bit since this won't be in for the workshop, which is ok

jaunty summit
#

if you hit esc and then w it'll bring up the trace in Cloud if you wanna just link the whole thing

#

(i to go back to insert mode)

civic pine
civic pine
#

@jaunty summit vvv

civic pine
#

i guess that's easier 😄

jaunty summit
#

lol

#

yeah looks like it just didn't call return - known issue, unlikely to be related to withFile

#

do you know what model this was?

civic pine
#

I only set Gemini key so whatever is the default

#

why does it say "binding value is nil" on asFile ? YOu mean it wasnt supposed to use env.output.asFile but should have used Return instead ?

jaunty summit
#

getting outputs from the LLM relies on the LLM calling its return tool and passing the outputs to it as args - if it exits the loop without doing that, the outputs won't be bound

civic pine
#

also why can' we see the "key" param from env.output(key) in the logs ?

#

also would you be open to a cosmetic change to evals: add a SUCCESS or FAIL output at the end of the report for easier human parsing ?

jaunty summit
jaunty summit
#

anyone nudging https://github.com/dagger/dagger/pull/10106 along? looks like there are some legit failures:

this one seems less legit (java flakes?) https://v3.dagger.cloud/dagger/traces/71ba4d7e1e280f52e5685af586fb4c55

autumn condor
jaunty summit