evals looking good - only one failure from Claude 3.5 which looks like possibly on the model/prompt: https://v3.dagger.cloud/dagger/traces/e98a8b514bfe77fb774eb8fb0c9a2e23
#static tool scheme
@random heart re: naming - i tried out "method" and it seems to work pretty well. Meshes well with our existing 'object' metaphor, and no ambiguity with tools/functions
@cinder skiff what's the gist of the static tool strategy you're pursuing?
this is without chaining ?
oh yeah sorry here's the PR: https://github.com/dagger/dagger/pull/10366
yeah no chaining
- select_tools => select_methods (a little confusing now that we're static but evals are passing, so i'll try that later)
- new list_available_methods tool, which lists methods that have not been selected, which was previously in the select_methods description but that busts prompt caches
- new call_method tool, which takes a self arg, instead of each method taking a TypeName arg
thank you @cinder skiff ❤️
also select_methods is the way the model discovers the schema, and we keep track of that, so the model can't YOLO straight to call_method (it'll get a method not selected error)
i'm trying a run with a blank system prompt just to see if somehow they magically know what to do based on this new framing
update: they do not lol https://v3.dagger.cloud/dagger/traces/e14a68c60cd6e3adab757a3c63e28da5
hahaha
are you sure requiring a 2-step process (select then call) will help more than it hurts?
well, the alternative is to dump all of the schemas in list_available_methods which seems expensive
and the model's going to be hopeless without the schema
(also dumping all at once seems like it'd hurt the context window)
you could separate list from get_schema
that's essentially what it is, just under a different name
it's just that "select a method before calling it" is not a pattern that the model will be familiar with, so intuitively it feels like it might get confused
conceptually it's the same as what i called "list", "describe" (getSchema), "call" in the first version of static
ah it's the method not selected error that confused me
yeah this is basically the 'you need to RTFM before calling' error pattern
and in the description of call i see you specified that the llm has to first select it.
which i found works pretty well, though i usually mention the name of the tool to be more precise
ah, i guess you were trying to avoid confusion around calling select_methods in the MCP sense vs calling in the dagger sense (aka calling call_method)
maybe phrasing like "using select_methods" works
ah yea good idea
i agree that we could easily rename select_methods to method_schemas maybe it's conceptually easier
another thing that could help is in call_method we could validate the schema and return an error with the expected schema
renaming is easy, it's mostly a matter of what guides the model to actually call it. it took a few iterations to land on select_ and IME models are very keen to avoid reading manuals/etc. and will just steamroll forward
i'll give it a go either way, ideally it's understandable by us and the model lol
WDYT about returning schema in call_method if validation fails ?
or maybe just a message saying "RTFM call select_methods first" to have it reinforce the pattern
Let's coordinate closely if you guys don't mind - I'm going to open a separate issue to fix the DX of Env (not LLM-facing, but as we know things are entangled at the moment).
worth a try, it all comes down to tradeoffs around token cost (needing a system prompt increases baseline cost, failures increase cost + look spooky) - it's nice that this saves the extra trip to select_methods but wouldn't want to see failures all the time if we rely on it too much
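A rough sketch of the "return the expected schema when validation fails" idea, assuming a simplified schema that maps required argument names to type names. `MethodSchema`, `validateArgs`, and `describe` are hypothetical helpers, not the actual call_method implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MethodSchema is a hypothetical, simplified schema: required arg name -> type.
type MethodSchema map[string]string

// describe renders the schema with sorted keys so the text is identical on
// every call.
func describe(schema MethodSchema) string {
	names := make([]string, 0, len(schema))
	for n := range schema {
		names = append(names, n)
	}
	sort.Strings(names)
	parts := make([]string, len(names))
	for i, n := range names {
		parts[i] = n + ": " + schema[n]
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

// validateArgs checks that every required argument is present. On failure the
// error embeds the expected schema so the model can retry without a separate
// trip to select_methods.
func validateArgs(method string, schema MethodSchema, args map[string]any) error {
	var missing []string
	for name := range schema {
		if _, ok := args[name]; !ok {
			missing = append(missing, name)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	sort.Strings(missing)
	return fmt.Errorf("invalid arguments for %s: missing %s; expected schema: %s",
		method, strings.Join(missing, ", "), describe(schema))
}

func main() {
	schema := MethodSchema{"self": "Container", "args": "[String!]!"}
	err := validateArgs("Container.withExec", schema, map[string]any{"self": "Container#1"})
	fmt.Println(err)
}
```

The tradeoff raised above still applies: each failed call costs a round trip, so this only pays off if failures are rare.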
My focus:
- Make the DX simpler, by addressing papercuts. The first papercut being: adding aliases in LLM so you can do simple things without explicitly instantiating a Env (kind of like Container aliases to Directory functions for its rootfs)
- Get Env back on track as a more general primitive, not just for LLMs. This one is where I see the most risk of entanglement - in both directions
I'm a bit confused on this DX / static tool track sorry 👀
Oooh:
- Make the DX simpler, by addressing papercuts. The first papercut being: adding aliases in LLM so you can do simple things without explicitly instantiating a Env (kind of like Container aliases to Directory functions for its rootfs)
Simplify the API so that it's easier for the LLM and making Env a top level thing in the dev loop
No I meant: simplify the API so that human devs can do llm | with-directory-input ... | loop | output | as-directory instead of llm | with-env $(env | with-directory-input) | loop | env | output | as-directory
zed + dagger mcp works now 🎉
(prompt: "create a debian container and install and run cowsay", trace: https://v3.dagger.cloud/dagger/traces/776ea47af98494499a8f60fa0b4a812f)
evals looking solid too, running a bunch of iterations locally to measure success rate + token usage
seems better than main even (some evals were a bit flaky there on certain models, now it seems more consistent)
That's awesome!!! Thank you!
token usage seems significantly higher, which is a little surprising, will investigate
initial guesses:
- list_available_methods now includes the description for each method, so that'll obviously cost more
- various tools now return JSON formatted responses, so that'll add up a bit
- maybe there's still something busting caches in tool descriptions?
@cinder skiff i had a commit that was taking only the first paragraph of descriptions
firstParagraph := strings.ReplaceAll(strings.Split(tool.Description, "\n\n")[0], "\n", " ")
Maybe that could also help as that should be sufficient to tell the LLM whether to call select_methods on it.
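For reference, the one-liner above wrapped in a standalone helper. The function name and the sample description are made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// firstParagraph returns the text before the first blank line, with any
// remaining newlines flattened to spaces.
func firstParagraph(description string) string {
	first := strings.Split(description, "\n\n")[0]
	return strings.ReplaceAll(first, "\n", " ")
}

func main() {
	desc := "Run a command in the container.\nReturns a new container.\n\nLong details follow here."
	fmt.Println(firstParagraph(desc))
}
```

An empty description yields an empty string, since `strings.Split` always returns at least one element.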
Another thing i wanted to try is to pass in either an object ref or an object type to list_available_methods to make it less big.
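One way the type filter could look, as a sketch only. The `Method` struct, the optional `typeName` parameter, and the "Type.name" output format are assumptions; resolving an object ref to its type is left out:

```go
package main

import "fmt"

// Method is a hypothetical record for one method on one object type.
type Method struct {
	TypeName string
	Name     string
}

// listAvailableMethods returns "Type.name" for every method that has not been
// selected yet. An empty typeName means no filtering by type.
func listAvailableMethods(all []Method, selected map[string]bool, typeName string) []string {
	var out []string
	for _, m := range all {
		full := m.TypeName + "." + m.Name
		if selected[full] {
			continue // already selected, so the model has its schema
		}
		if typeName != "" && m.TypeName != typeName {
			continue // caller asked for a different type
		}
		out = append(out, full)
	}
	return out
}

func main() {
	all := []Method{
		{"Container", "withExec"},
		{"Container", "stdout"},
		{"Directory", "entries"},
	}
	selected := map[string]bool{"Container.withExec": true}
	fmt.Println(listAvailableMethods(all, selected, "Container"))
}
```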
lol - this explains some of it
hmm stupid
is it available that's tripping it ? Maybe it thinks that it changes constantly when in fact it doesn't once you have a type.
What happens if you try list_methods and mention in its description specifically that the methods of a type will never change
could be a poorly worded system prompt too
slowing down for a bit if you wanna try anything! (dinner + errands)
I finally got around to writing my thread