static tool scheme | Dagger

cinder skiff May 9, 2025, 6:21 PM

#

evals looking good - only one failure from Claude 3.5 which looks like possibly on the model/prompt: https://v3.dagger.cloud/dagger/traces/e98a8b514bfe77fb774eb8fb0c9a2e23

Dagger Cloud

Browse and visualize Dagger traces.

#

@random heart re: naming - i tried out "method" and it seems to work pretty well. Meshes well with our existing 'object' metaphor, and no ambiguity with tools/functions

random heart May 9, 2025, 6:22 PM

#

which commit on dagger is this ?

#

nvm 7e18cb44

celest dune May 9, 2025, 6:25 PM

#

@cinder skiff what's the gist of the static tool strategy you're pursuing?

random heart May 9, 2025, 6:25 PM

#

this is without chaining ?

cinder skiff May 9, 2025, 6:26 PM

#

oh yeah sorry here's the PR: https://github.com/dagger/dagger/pull/10366

GitHub

llm: static tool scheme by vito · Pull Request #10366 · dagger/da...

cinder skiff May 9, 2025, 6:26 PM

#

random heart this is without chaining ?

yeah no chaining

cinder skiff May 9, 2025, 6:27 PM

#

celest dune @cinder skiff what's the gist of the static tool strategy you're pursuin...

* `select_tools` => `select_methods` (a little confusing now that we're static but evals are passing, so i'll try that later)
* new `list_available_methods` tool, which lists methods that have not been selected. that list was previously in the `select_methods` description, but that busts prompt caches
* new `call_method` tool, which takes a `self` arg, instead of each method taking a `TypeName` arg

random heart May 9, 2025, 6:28 PM

#

thank you @cinder skiff ❤️

cinder skiff May 9, 2025, 6:28 PM

#

cinder skiff * `select_tools` => `select_methods` (a little confusing now that we're static b...

also select_methods is the way the model discovers the schema, and we keep track of that, so the model can't YOLO straight to call_method (it'll get a method not selected error)

#

i'm trying a run with a blank system prompt just to see if somehow they magically know what to do based on this new framing

#

update: they do not lol https://v3.dagger.cloud/dagger/traces/e14a68c60cd6e3adab757a3c63e28da5

random heart May 9, 2025, 6:30 PM

#

hahaha

celest dune May 9, 2025, 6:30 PM

#

are you sure requiring a 2-step process (select then call) will help more than it hurts?

cinder skiff May 9, 2025, 6:31 PM

#

well, the alternative is to dump all of the schemas in list_available_methods which seems expensive

#

and the model's going to be hopeless without the schema

#

(also dumping all at once seems like it'd hurt the context window)

celest dune May 9, 2025, 6:32 PM

#

you could separate list from get_schema

cinder skiff May 9, 2025, 6:32 PM

#

that's essentially what it is, just under a different name

celest dune May 9, 2025, 6:32 PM

#

it's just that "select a method before calling it" is not a pattern that the model will be familiar with, so intuitively it feels like it might get confused

random heart May 9, 2025, 6:33 PM

#

conceptually it's the same as what i called "list", "describe" (getSchema), "call" in the first version of static

celest dune May 9, 2025, 6:33 PM

#

cinder skiff that's essentially what it is, just under a different name

ah it's the method not selected error that confused me

cinder skiff May 9, 2025, 6:33 PM

#

yeah this is basically the 'you need to RTFM before calling' error pattern

random heart May 9, 2025, 6:34 PM

#

and in the description of call i see you specified that the llm has to first select it.

#

which i found works pretty well, though i usually mention the name of the tool to be more precise

#

ah, i guess you were trying to avoid confusion around calling select_methods in the MCP sense vs calling in the dagger sense (aka calling call_method)

#

maybe phrasing like "using select_methods" works

cinder skiff May 9, 2025, 6:36 PM

#

ah yea good idea

random heart May 9, 2025, 6:38 PM

#

i agree that we could easily rename select_methods to method_schemas maybe it's conceptually easier

#

another thing that could help is in call_method we could validate the schema and return an error with the expected schema

cinder skiff May 9, 2025, 6:41 PM

#

renaming is easy, it's mostly a matter of what guides the model to actually call it. it took a few iterations to land on select_ and IME models are very keen to avoid reading manuals/etc. and will just steamroll forward

#

i'll give it a go either way, ideally it's understandable by us and the model lol

random heart May 9, 2025, 6:41 PM

#

WDYT about returning schema in call_method if validation fails ?

#

or maybe just a message saying "RTFM call select_methods first" to have it reinforce the pattern

celest dune May 9, 2025, 6:46 PM

#

Let's coordinate closely if you guys don't mind - I'm going to open a separate issue to fix the DX of Env (not LLM-facing, but as we know things are entangled at the moment).

cinder skiff May 9, 2025, 6:46 PM

#

random heart WDYT about returning schema in `call_method` if validation fails ?

worth a try, it all comes down to tradeoffs around token cost (needing a system prompt increases baseline cost, failures increase cost + look spooky) - it's nice that this saves the extra trip to select_methods but wouldn't want to see failures all the time if we rely on it too much
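For reference, a sketch of the "return the expected schema on validation failure" idea. This assumes a drastically simplified schema (required argument names only); `MethodSchema`, `Validate`, and the `withNewFile` example are hypothetical, not what `call_method` does today.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MethodSchema is a hypothetical, simplified schema: required argument names only.
type MethodSchema struct {
	Name     string
	Required []string
}

// Validate returns nil when every required argument is present. Otherwise the
// error names the missing arguments and repeats the expected ones, so the
// model can retry without a separate trip to select_methods.
func (m MethodSchema) Validate(args map[string]any) error {
	var missing []string
	for _, r := range m.Required {
		if _, ok := args[r]; !ok {
			missing = append(missing, r)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	sort.Strings(missing)
	return fmt.Errorf("invalid arguments for %s: missing %s; expected arguments: %s",
		m.Name, strings.Join(missing, ", "), strings.Join(m.Required, ", "))
}

func main() {
	schema := MethodSchema{Name: "withNewFile", Required: []string{"path", "contents"}}
	err := schema.Validate(map[string]any{"path": "/hello.txt"})
	fmt.Println(err)
}
```

The tradeoff is the one described above: every failed call costs a round trip plus the schema tokens, so this works better as a safety net than as the main discovery path.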

celest dune May 9, 2025, 6:49 PM

#

My focus:

1. Make the DX simpler, by addressing papercuts. The first papercut being: adding aliases in LLM so you can do simple things without explicitly instantiating an Env (kind of like Container aliases to Directory functions for its rootfs)
2. Get Env back on track as a more general primitive, not just for LLMs. This one is where I see the most risk of entanglement - in both directions

royal fjord May 9, 2025, 6:58 PM

#

I'm a bit confused on this DX / static tool track sorry 👀

#

Oooh:

> 1. Make the DX simpler, by addressing papercuts. The first papercut being: adding aliases in LLM so you can do simple things without explicitly instantiating an Env (kind of like Container aliases to Directory functions for its rootfs)
Simplify the API so that it's easier for the LLM and making Env a top level thing in the dev loop

celest dune May 9, 2025, 7:34 PM

#

royal fjord Oooh: > 1. Make the DX simpler, by addressing papercuts. The first papercut bei...

cinder skiff May 9, 2025, 10:25 PM

#

zed + dagger mcp works now 🎉

#

(prompt: "create a debian container and install and run cowsay", trace: https://v3.dagger.cloud/dagger/traces/776ea47af98494499a8f60fa0b4a812f)

#

evals looking solid too, running a bunch of iterations locally to measure success rate + token usage

#

seems better than main even (some evals were a bit flaky there on certain models, now it seems more consistent)

random heart May 9, 2025, 10:29 PM

#

That's awesome!!! Thank you!

cinder skiff May 9, 2025, 10:36 PM

#

token usage seems significantly higher, which is a little surprising, will investigate

#

initial guesses:

list_available_methods now includes the description for each method, so that'll obviously cost more
various tools now return JSON formatted responses, so that'll add up a bit
maybe there's still something busting caches in tool descriptions?

random heart May 9, 2025, 11:22 PM

#

@cinder skiff i had a commit that was taking only the first paragraph of descriptions

firstParagraph := strings.ReplaceAll(strings.Split(tool.Description, "\n\n")[0], "\n", " ")

Maybe that could also help as that should be sufficient to tell the LLM whether to call select_methods on it.

Another thing i wanted to try is to pass in either an object ref or an object type to list_available_methods to make it less big.

cinder skiff May 9, 2025, 11:30 PM

#

lol - this explains some of it

random heart May 9, 2025, 11:33 PM

#

hmm stupid

#

is it available that's tripping it ? Maybe it thinks that it changes constantly when in fact it doesn't once you have a type.

#

What happens if you try list_methods and mention in its description specifically that the methods of a type will never change

cinder skiff May 9, 2025, 11:40 PM

#

could be a poorly worded system prompt too

#

slowing down for a bit if you wanna try anything! (dinner + errands)

celest dune May 9, 2025, 11:43 PM

#

I finally got around to writing my thread

#static tool scheme