#Make core types more LLM-friendly · Issu...
1 messages · Page 1 of 1 (latest)
I want to take another swing at LLM-controlled function masks, building on the new system prompt approach - now that we have that ability cross LLMs maybe I can add some hints in there to get more consistent behavior
Yeah but we should crawl before we walk IMO
your idea will work with smart models but not dumb ones
(might make it worse for them)
we'll see! what if the dumb ones are dumb because they have 1000 tools thrown at them?
then let's hide as many from them as possible
(obviously not totally true but my point is 'let the evals speak')
without asking them to reason about which tools they need
i'm not disagreeing with your proposal, it's meant to be complementary
eh, but you're right that most cases are probably a handful of tools per type being used
in theory we could use Dagger Cloud data to see the top APIs for each type, but the query will be very expensive (and we could probably just guess them accurately)
I want to avoid nerfing the core API by hiding too much.. But there's a lot of redundancy
For example we could perhaps hide all the duplicates from Container.rootfs that are already in Directory
SELECT SpanName, COUNT(*) AS Calls
FROM otel_traces
WHERE SpanName LIKE 'Container.%'
AND TraceId LIKE '0%'
GROUP BY SpanName
ORDER BY Calls DESC;
=> https://gist.github.com/vito/f9bd178b760c81d281a5408ed95d4e6b
Right off the bat:
- pipeline
- withFocus
- withoutFocus
- up
- experimentalWithGPU
- build (deprecated)
maybe:
- withDirectory, withFile, withFiles, withNewFile
Also the fact that every scalar field is hidden behind a function call, was already an annoying tax in the DX, when you need to query a bunch of them. But it's an even greater tax on LLMs because they're each an individual tool. Would be way easier to have a single tool that returns a small json with a dozen fields
withDirectory, withFile, withFiles, withNewFile seem worth keeping to me - if the alternative is to pull off the rootfs, modify it, and add it back, that might be harder for an LLM to juggle
"let the evals decide"? 🙂
yep yep
but yeah very possibly
might be worth blocking Dockerfile related APIs by default. A lot of models assume when you say "build" you mean "Docker build" because of them. Tricky though, the task might actually call for it sometimes
we could group the following under a single "ociConfig" function:
userplatformentrypointlabellabelsexposedPortsdefaultArgsworkdirenvVariables
would this return JSON?
I think Container.build can go because it's deprecated
Directory.dockerBuild should be fine, provided we clean up and optimize the descriptions
🤷♂️
I guess even if it doesn't it'd still tuck those fields away
if we're ready to cross that line, we could do it only at the MCP layer
(I mean the "model context" layer)
Grabbed Directory stats while I'm at it: https://gist.github.com/vito/f9bd178b760c81d281a5408ed95d4e6b#file-directory-csv
I love that we have those 🙂
Yeah pretty fun - the query isn't too bad either, just takes a few seconds, memory usage stays flat. I sped it up a bit by only querying trace IDs beginning with 0 lol
@fading fox @solid crag can you update me on this 🙏
it's urgent now
(blocker for kubesimplify workshop we're giving tomorrow with @fair folio)
so that we can stop attaching a ToyWorkspace to the LLM context, which confuses people because they think it's a system feature (because of lack of self-calls)
I do not have context here but I’ve been trying to do shit without toy workspaces for a while and mostly failing or emulating it
We’re basically looking for an out of the box function mask for core? Did any other texture come from @jaunty summit s data science?
I thought it was in flight - I'll do it
nope not really - honestly I think we could get away with a chopped down list of functions to expose for the biggest core types, even without a way to get access to the rest, for now. Things will get more complicated if/when we need that, but I doubt we will for all intents and purposes so far
Yeah let me try a very naive version of that real quick
@fair folio I'm going to hide export for now
in theory the agents can do more with it, but it's super confusing as I'm seeing
oh wait
And now it's easy enough to switch back to shell mode and export the result ourselves if we want
Container.export, Directory.export, File.export
same with terminal
@jaunty summit there's Directory.asModule() and asModuleSource() what's the difference? Do I need both?
ok. It's one of the few functions I actually rely on when I've used prompt mode lol. Same with terminal
i think you can safely disable both, they might even be filtered out already on my branch since it has a change to avoid exposing tools that return any of those meta types we disabled (like TypeDef)
maybe i should backport that..
I actually just used it in a real-world scenario, which was trying to upgrade an old module and checking if it loads 🙂
ha fair enough
i'm not sure which one is 'higher level' - maybe @crisp nest would have a quick answer (Directory.asModule vs Directory.asModuleSource)
I'll take "less use cases but works reliably" over "lots of unreliable use cases"
we weren't working on this one with Connor 🙏
Directory.asModule is just a shortcut to Directory.asModuleSource.asModule, so it's not necessary. IIRC I left Directory.asModule in during the most recent refactoring of all that because I didn't feel like rm'ing it was worth the minorly breaking change since it didn't seem to hurt anything
sorry, i probably should have delegated this one, I got nerd sniped by some weird eval behavior
Here's what I have:
switch typeName {
case "Container":
switch field.Name {
case "pipeline", "withFocus", "withoutFocus", "up", "experimentalWithGPU", "experimentalWithAllGPUs", "export", "exposedPorts", "build", "withDirectory", "withFile", "withFiles", "withNewFile", "withoutFile", "withoutFiles", "withoutDirectory", "platform", "entrypoint", "label", "labels", "defaultArgs", "workdir", "envVariables", "terminal":
continue
}
case "Directory":
switch field.Name {
case "terminal", "export", "withTimestamps", "withFiles", "asModule", "asModuleSource", "withoutFile", "withoutFiles", "withoutDirectory", "name":
continue
}
case "File":
switch field.Name {
case "export", "withName", "withTimestamps":
continue
}
}
nice haircut
Container doesn't need withDirectory / withNewfile?
What if we hide withMounted* instead
I want it to use rootfs/withRootfs instead
There's a worry that it will make it cognitively more complicated. If that turns out to be the case, we can bring them back
Oh yeah definitely hiding those! thanks I missed them
got it, i'll wait to see the results then 😛
mmm should I hide all withMountedxxx?
probably safer yeah
oh wait I missed a bunch
bigger haircut
"pipeline", "withFocus", "withoutFocus", "up", "experimentalWithGPU", "experimentalWithAllGPUs", "export", "exposedPorts", "build", "withDirectory", "withFile", "withFiles", "withNewFile", "withoutFile", "withoutFiles", "withoutDirectory", "platform", "entrypoint", "label", "labels", "defaultArgs", "workdir", "envVariables", "terminal", "withoutMount", "withMountedTemp", "withMountedFile", "withMountedDirectory", "withMountedSecret", "withMountedCache", "import", "imageRef", "withoutWorkdir", "withoutUser", "withoutUnixSocket", "withoutSecretVariable",
what's the danger with withMountedXXX? we just want to consolidate on withXXX?
that was my thought, yeah. I was proposing hiding withMountedDirectory but showing withDirectory
my only worry there is 'mount' is a common verb to use ("mount my repo to /src") but maybe it'll figure it out
(one of my evals does that specifically so we'll see!)
makes sense! I would have missed anything going on last week - is there a parallel initiative to improve function descriptions?
case "Container":
switch field.Name {
case
"build",
"defaultArgs",
"entrypoint",
"envVariable",
"envVariables",
"experimentalWithAllGPUs",
"experimentalWithGPU",
"export",
"exposedPorts",
"imageRef",
"import",
"label",
"labels",
"mounts",
"pipeline",
"platform",
"terminal",
"up",
"withDirectory",
"withFile",
"withFiles",
"withFocus",
"withMountedCache",
"withMountedDirectory",
"withMountedFile",
"withMountedSecret",
"withMountedTemp",
"withNewFile",
"withoutDefaultArgs",
"withoutDirectory",
"withoutFile",
"withoutFiles",
"withoutFocus",
"withoutMount",
"withoutRegistryAuth",
"withoutSecretVariable",
"withoutUnixSocket",
"withoutUser",
"withoutWorkdir",
"workdir":
continue
I think it got dropped also
eyeballing it, looks like we're left with
AsService, AsTarball, Directory, Entrypoint, ExitCode, File, From, ID, Publish, Rootfs, Stderr, Stdout, Sync, User, With, WithAnnotation, WithDefaultArgs, WithDefaultTerminalCmd, WithEntrypoint, WithEnvVariable, WithExec, WithExposedPort, WithGraphQLQuery, WithRegistryAuth, WithRootfs, WithSecretVariable, WithServiceBinding, WithUnixSocket, WithUser, WithWorkdir, WithoutAnnotation, WithoutEntrypoint, WithoutEnvVariable, WithoutExposedPort
Thanks I'll remove the without I missed
wdyt about asService?
maybe I should have listed functions we want to keep 🙂
Just re-pushed with even more masked
@fair folio I think with the haircut (assuming it works) it should be safe to give the agent a directory
would be sweet if it worked
I have to go to dinner, will be available again after. Could a maintainer take over helping @fair folio to get this merged asap? 🙏
yeah i'm confident it'll be fine for the read/write part. The hard part will still be "build". We can give a container and nudge towards the command to run in the container, but it still has to figure out how to get the directory in there first (which is why I'm thinking we need withDirectory). And that part feels much more error prone than the current workspace approach
Well that's where we can use multi-object to our advantage
All we need is a generic go builder module, installed a dependency. With an API that doesn't confuse the model
In the CLI you can install the module, and in prompt mode the LLM wil have access to the dependency and figure it out from the directory
and/or you instantiate a go builder yourself, set it in a variable, and then the llm will use the variable
On the code side, I guess you have to decide if you want to give it an instance, or make it privileged and just have it use the dependency
My overall point is that all of the above is easier with "gobuilder" than with "toyworkspace" imo
ok sounds good. I was thinking we were going for no-deps but I guess the idea is a generic go builder is not as confusing as a workspace module.
The interesting thing about making the module APIs better for LLMs in this case is they're better if they're stateless which I think is the oppsite from how a lot of these modules are designed today
yeah I was going for no-deps but had completely forgotten about the go builder
by stateless do you mean minimizing withFoo calls in favor of just one big function with a bunch of args? (taken to an extreme)
that, and also making constructors optional
I'm also pretty hesitant on removing withDirectory/withFile, should be sure to run the eval with these changes to test the theory that it'll grok rootfs/withRootfs. I'm very skeptical, but if it sounds like we're veering towards not exposing such minutae to LLMs that's an overriding design decision, and we need to change our evals
I think constructors should work fine, the llm sees them as regular functions
ok I can play around and see what works. My idea is that
golang(source).test() or golang.withSource(source).test()
is going to be less reliable than
golang.test(source)
possible
I've wanted to add an intermediary golang().module(source).test() in that API anyway
I made a similar API change in my greetings-api demo for the same reasons
I had
func (b *Backend) Check(ctx context.Context) (string, error)
and added
func (b *Backend) CheckDirectory(ctx context.Context, source *dagger.Directory) (string, error)
fwiw i haven't really seen evals struggle with this sort of thing, so if it goes against usual module design conventions I'd caution against it. is there a common failure motivating this?
I shouldn't have combined the constructor and withSource examples lol, which are you referring to?
I guess it's basically this one https://github.com/vito/daggerverse/blob/main/botsbuildingbots/evals/main.go#L180
both - vs. golang.test(source). generally, chaining/incremental state config seems OK to me, and that's what the core API and a lot of modules are designed around, so I'm curious if you're seeing LLMs struggle with it or if it's just a hunch
yeah that one does a lot of chaining and seems OK
ah ok! The api change I pasted above was mainly to use interfaces so the second half of chaining being less reliable was a hunch
What's the status on that PR atm ? Do you need more help Kyle: https://github.com/dagger/dagger/pull/10106
I'll build it now and see how things look. I'm guessing evals will have trouble without withDirectory but I'll try 🙂
Heading back home from dinner, I'll do a pass on those core function comments, see if we can make everything a little more clear for the models
Keep us updated 🙏 -- here to help if needed 👀
ok don't merge 😱
build-multi eval
on 10106: Input Tokens: 657324 / Output Tokens: 2456
on 0.18.2: Input Tokens: 37571 / Output Tokens: 83
I need to use a different approach.
I need to find a way to mount Directory#1 into /src.
I think I'm stuck.
it tried many times toying around with rootfs, eventually gave up, and then wrote a dockerfile to the directory and did a dockerBuild 🤣
@fair folio out of curiousity, does it get there if you nudge it about rootfs / withRootfs? you can try with -i to hook into there
oh lol
here's 0.18.2
🧑 💬 Mount Directory#1 into Container#1 at /src, set it as your workdir, and build ./cmd/booklit with CGO_ENABLED=0.
🤖 🛠️ selectContainer {"id":"Container#1"}
🪙 Tokens Used: 890 in => 6 out
🛠️ 💬 {"selected":"Container#1"}
🤖 🛠️ Container_withMountedDirectory {"path":"/src","source":"Directory#1"}
🪙 Tokens Used: 6724 in => 12 out
🛠️ 💬 {"previous":"Container#1","selected":"Container#2"}
🤖 🛠️ Container_withWorkdir {"path":"/src"}
🪙 Tokens Used: 6757 in => 8 out
🛠️ 💬 {"previous":"Container#2","selected":"Container#3"}
🤖 🛠️ Container_withEnvVariable {"name":"CGO_ENABLED","value":"0"}
🪙 Tokens Used: 6786 in => 12 out
🛠️ 💬 {"previous":"Container#3","selected":"Container#4"}
🤖 🛠️ Container_withExec {"args":["go","build","./cmd/booklit"]}
🪙 Tokens Used: 6819 in => 12 out
🛠️ 💬 {"previous":"Container#4","selected":"Container#5"}
🤖 💬 Now, extract the compiled binary.
🤖 🛠️ Container_file {"path":"/src/booklit"}
🪙 Tokens Used: 6851 in => 17 out
🛠️ 💬 {"previous":"Container#5","selected":"File#1"}
🤖 🛠️ return {"bin":"File#1"}
🪙 Tokens Used: 1367 in => 5 out
🛠️ 💬 ok
🤖 💬 File#1 is the compiled Booklit binary.
🪙 Tokens Used: 1377 in => 11 out
</messages>
### Total Token Cost
* Input Tokens: 37571
* Output Tokens: 83
vs ...
OK let me see if I can make it better with upgraded comments
yeah it might not recognize the rootfs concept/idea
i've also seen them happily mount over /
OK this one is very confusing. I bet fixing this one will help
ArgDoc("directory", "Directory to mount."),
``` 🤷♂️
Now I'm wondering how many humans read those docs and were like "help"
We were young
we should add a life alert button to the docs
Minimum Viable Documentation
I get it though, it's hard to explain succintly
yeah, I'd look at that and go "yep" but I might be the one that wrote it lol
getting there
Part of it is that I'm adding explicit tips on how to use it
And removing stuff that is on the verbose side and not super useful (or sometimes redundant with arg description)
mmmm maybe we shoud also mask some arguments (I'm thinking of withExec, it has a lot including privileged
Isn't this warning incorrect?
Seems like it's a mixup with insecureRootCapabilities which is the next line
@jaunty summit @crisp nest what does DoNotCache mean?
The api isn't cached in dagql, Container.publish will always be executed
But there's 2 different docs strings, is that related to DoNotCache ?
or just a mistake?
I think that stems from being able to do things like Host.directory(path: "~/.ssh/")
in ye olden days (not as true anymore with modules)
Oh that's just a typo I think. Yeah it's supposed to say the reason there but it's just a doc string 😆
And if so, does the reason show up in the API introspection? (ie. will humans or LLMs see it?)
Those strings never make it to users right now
OK
I had just edited that, the original says DoNotCache("Writes to the specified Docker registry.")
Doesn't really look like a reason either
OK I am unblocked thank you!
This is also wrong no? You're supposed to give only the registry address, not the whole image
that's correct, you give a full ref
Oh!
weird
reverting my change
Why on earth isn't this an argument to publish then
believe it or not, I'm spending more time finding a goddamn go syntax error than, than I did actually editing the descriptions
"you fucked up somewhere in these 700 lines" thanks go compiler, super useful
OK I found it
pushed
only did container...
going to add container real quick, but feel free to test
@fair folio maybe too late for you? I'm losing track of timezones a little bit
5:30 here, so going to do dinner shortly! is it ready to test now?
yeah
sorry it took so long
Just pushed directory descriptions
I'm going to do some manual testing on my end
@autumn condor @civic pine @fading fox any chance one of you can help test this? 🙏
happy to help, catching up the above
same, taking a look -- compiling the dev engine rn 👀
Thanks, @jaunty summit @fading fox can you share the most up-to-date info on running the evals?
Is this similar to @jaunty summit 's https://github.com/dagger/dagger/pull/9986 ?
No, this one hides functions from the LLM completely (and also shows new descriptions which we hope will be more clear)
9986 is an experimental feature that allows the LLM to decide which functions it needs
they're complementary
ok, anyway what you need is testing your 10106 PR
Testing now!
@fair folio are you running evals or testing manually or both ?
running evals
code is here but i don't know to run it tbh
Can you go on dev-audio ? 👀
yes very quickly
FYI I'm adding a commit to mask a bunch of functions from top-level Query type (lots of tool bloat in prompt mode)
still pretty lost on the current commit
It appears that the Directory_withFile function is not the correct approach for mounting a directory into a
│ ┃ container. Instead, we should use the Directory_withPath function to mount the directory and then set the working
│ ┃ directory inside the container.
So, we've been running the build-multi eval as you seemed to have shown us Kyle (for both openai and gemini) ; which gemini model are you testing on ? And testing on the side on dagger v0.18.2, doesn't seem to work either -- do you recall of one version working ?
Am I doing something wrong ?
it really wants Container.withDirectory 😛
gemini-2.0-flash
the dumb computers thank you
Did it use to work at some point 👀 ? I'm a bit confused with the eval as, with the latest release (v0.18.2) + 4o it doesn't seem to work either
Sorry for the fast ramp-up on the eval system ; we're trying to get to a useful state
Yes it works for me on 0.18.2 with gemini-2.0-flash
But they're evals so they may not necessarily work every run, you want to measure how often they work and how many tokens they use
In theory the eval I was using should be somewhat ok on 0.18.2 and much better on this PR
un-hide commit pushed.
Un-hides:
Container.withNewFileContainer.withoutFilesContainer.fileContainer.withDirectoryContainer.withoutDirectoryContainer.directory
Hides:
Container.rootfsContainer.withRootfs
and for it to work, as you mentionned orally, it's like ~600k tokens
That was the first version of this PR. 0.18.2 was like 30k
@autumn condor here
Just got a success on 0.18.2 🙏 -- now I have a base reference 🙏
Can we help any other way than testing Solomon / variations of the PR ?
I don't know, the priority was to unblock Kyle for the workshop he's doing tomorrow
The blocker was:
- Showing the current "toyworkspace" example is confusing, because people don't understand where "ToyWorkspace" comes from, they think it's a system feature
- So instead let's give it a directory
- And/or a container
- But core types are confusing LLMs so, let's simplify them
And here we are 🙂
So: if we merge a simplification of core types that doesn't break evals, Kyle has a chance to use it in his workshop
That said, there's very little time left (the workshop is tomorrow 9am Pacific)
so it may or may not be too late
Haha I was just going to say that. If that eval ends up performing significantly better on this PR, ship it and I can use it
approved the PR, only nit was to also bring back withFile
Your latest commit seems to improve everything overall yes Solomon:
Success rate on the eval that Kyle is interested in (build-multi)
true
false
true
true
true
true
false
true
true
true
it looks muuch better than v0.18.2 on gemini on my empirical tests / gpt-4o is much more inconsistents from gut feels on this eval though
oh right. adding
also looks like you might need to regen sdks or docs or something? all the checks failed
pushed
no idea how
will try
doing it atm
yeah wish there was just a single dagger call generate -o . - i thought there was at one point
there is a top-level generate. But I think it does something different
I know because Devin tried to call it, poor thing
it looks like we escape the " inside ArgDoc but not in Doc ; breaking on elixir codegen -- slowly fixing that (and it's veerrrry slow to rerun the codegen on my computer) -- still progressing ahah 👼
@cinder perch we had to rebase and force-push on your branch -- we didnt touch your previous commits, just the ones following
Two learnings:
- the elixir docs generator doesnt escape the
"properly -- we'll file an issue and fix it at the root / just wanted to ship the PR - the Typescript SDK interprets the globing
**/.envas the end of the comment section -- there's potentially a deeper comment format fix to do on the typescript docs generator
It looks like it's Container.withFile that's the culprit that prevents the eval tests from passing
wait what time is it for you guys!
huh, you mean newFile doesn't work, or models are confused by it?
it's confused
ok I'm dealing with a few logistical issues (no charger; need to find a place with ok wifi to do the live workshop tonight). then will tweak newfile
was it hard getting the evals to run?
$ ./hack/dev
$ git clone https://github.com/vito/daggerverse /tmp/daggerverse && cd /tmp/daggerverse/botsbuildingbots/evals
$ _EXPERIMENTAL_DAGGER_RUNNER_HOST=docker-container://dagger-engine.dev dagger
> build-multi | report
@cinder perch I pushed to https://github.com/tiborvass/dagger core-mask-stopgpap branch. Removed the withFile commit (making the evals pass much more often than with the commit, but can still fail on the same error: env.output.asFile returning a nil Binding). I also harmonized verb conjugation in description (3rd person singular is used everywhere), and regenerated docs (dagger call generate export --path=.)
Not sure if it works better for Kyle.
anything I can do to help? surprised withFile's presence confused things, curious to see how if you have any trace links handy. i'll ping t-infra so we can start getting these evals in-repo (
https://discord.com/channels/707636530424053791/1355029902247854226 )
✔ build-multi | report 0.0s
Evaluation failed:
Error Trace: /src/botsbuildingbots/evals/main.go:238
/src/botsbuildingbots/evals/main.go:201
/src/botsbuildingbots/evals/main.go:341
/src/botsbuildingbots/evals/main.go:342
/src/botsbuildingbots/evals/main.go:181
/src/botsbuildingbots/evals/dagger.gen.go:299
/src/botsbuildingbots/evals/dagger.gen.go:200
/src/botsbuildingbots/evals/dagger.gen.go:120
/usr/local/go/src/runtime/proc.go:272
/usr/local/go/src/runtime/asm_arm64.s:1223
Error: Received unexpected error:
input: llm.attempt.withEnv.withPrompt.loop.env.output.asFile binding value is nil
Test: eval
I think the time constraint is relaxed a bit since this won't be in for the workshop, which is ok
that's possibly unrelated, depending on the model - it just means the model never called the return tool. would need to see the rest of the report to see if it actually failed or if it just forgor
if you hit esc and then w it'll bring up the trace in Cloud if you wanna just link the whole thing
(i to go back to insert mode)
I can do later now I’m on phone
@jaunty summit vvv
i guess that's easier 😄
lol
yeah looks like it just didn't call return - known issue, unlikely to be related to withFile
do you know what model this was?
I only set Gemini key so whatever is the default
why does it say "binding value is nil" on asFile ? YOu mean it wasnt supposed to use env.output.asFile but should have used Return instead ?
getting outputs from the LLM relies on the LLM calling its return tool and passing the outputs to it as args - if it exits the loop without doing that, the outputs won't be bound
also why can' we see the "key" param from env.output(key) in the logs ?
also would you be open to a cosmetic change to evals: add a SUCCESS or FAIL output at the end of the report for easier human parsing ?
no good reason, that's just how graphql errors are formatted
yeah sure 👍
anyone nudging https://github.com/dagger/dagger/pull/10106 along? looks like there are some legit failures:
- https://v3.dagger.cloud/dagger/traces/c30480ce48ae0cc42b3ed85737adfc03 - needs to be updated for new doc strings
- https://v3.dagger.cloud/dagger/traces/9cbcd1150d23110a7983b02332b42063 - docs need to be regenerated
this one seems less legit (java flakes?) https://v3.dagger.cloud/dagger/traces/71ba4d7e1e280f52e5685af586fb4c55
We can take a look if no one is doing it 👀 -- from what I understood, the withFile leading to an early return is a known issue AND you're on it Alex ?
I'm on the early return issue yeah but it's tied up with a bunch of other changes. I don't think it's related at all to withFile, it happens all the time with certain models. So I wouldn't consider it a blocker for this PR