#vito vacation handoff


iron holly
#

quick 🧵 to share notes - I'll be on PTO starting tomorrow until May 5, and I'm busy cleaning my house today (finally moved back in after flooding repairs)

#

cc @sleek crater @elfin badge @weary sedge @hardy jungle

#

This PR is mostly small changes, but it's a bit of a grab bag so it's possible some things help and other things hinder: https://github.com/dagger/dagger/pull/10254

Context: https://discord.com/channels/707636530424053791/1364354292982616074 - it seemed to help in the end, so as long as evals are passing I'm pretty comfortable merging it, will try to keep the PR green today


This is a bit of a grab-bag now fixing various issues @kpenfound ran into for kpenfound/hello-dagger#2

#

This PR is much more experimental, can be scrapped for parts: https://github.com/dagger/dagger/pull/10229

Tool chaining: shows promise and saves tokens by minimizing round-trips, but it's difficult to articulate when the model should or shouldn't use chaining. One example is WorkspacePattern, where the model tries to "chain" a string value into a required arg for the next call, not realizing you can only chain objects. I added validations for this so the model can learn on the fly, but it sucks to watch it stumble. Lightly considering generalizing chaining to support passing a result of any type T into a single missing required arg of the same type T, but I'm not sure; it could be unsound. A convenient result would be that chaining calls could work by literally just calling them in order, if we adopt a policy that any missing required arg uses the most recently observed value of the required type. (We already do this, kind of, for the typed self args, but those use "highest numbered" whereas we would actually want "most recently returned".)
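
To make the "most recently returned" idea concrete, here's a rough sketch of the policy in plain Python. It is not the code from the PR: `ChainState`, `fill_missing_arg`, and the `pattern`/`exclude` arg names are all made up for illustration, and it assumes every returned value can be tagged with a type name.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChainState:
    """Hypothetical record of tool results, in the order they were returned."""
    history: list = field(default_factory=list)  # (type_name, value) pairs

    def observe(self, type_name: str, value: Any) -> None:
        self.history.append((type_name, value))

    def most_recent(self, type_name: str) -> Any:
        # Walk backwards so the latest returned value wins, not the highest ID.
        for seen_type, value in reversed(self.history):
            if seen_type == type_name:
                return value
        raise LookupError(f"no value of type {type_name} has been returned yet")


def fill_missing_arg(required: dict, provided: dict, state: ChainState) -> dict:
    """Fill at most one missing required arg from the chain state.

    required maps arg name -> type name; provided maps arg name -> value.
    """
    missing = [name for name in required if name not in provided]
    if not missing:
        return dict(provided)
    if len(missing) > 1:
        # Refuse to guess when more than one required arg is absent.
        raise ValueError(f"cannot chain, multiple required args missing: {missing}")
    name = missing[0]
    return {**provided, name: state.most_recent(required[name])}


state = ChainState()
state.observe("Container", "Container#1")
state.observe("Container", "Container#2")
state.observe("Container", "Container#1")  # an earlier object is returned again
state.observe("String", "*.go")

# Hypothetical call with two required String args, one of them omitted.
call = fill_missing_arg(
    {"pattern": "String", "exclude": "String"},
    {"exclude": "vendor"},
    state,
)
```

Here `state.most_recent("Container")` picks `Container#1`, where a "highest numbered" rule would pick `Container#2`. The possible unsoundness is also visible: any String returned earlier would be silently accepted as `pattern`, whether or not it was meant to be one.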

Eliminating dynamic tool descriptions + schemas: the new scheme leans on dynamic tool descriptions and schemas, which unfortunately bust caches all the time. I added a list_objects tool, which helps, but we're not fully there yet. It's a bummer to lose the "enum" schema trick for listing valid IDs, since that's the one schema attribute respected by all major models.
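
A rough sketch of the tradeoff, in Python rather than the real implementation. Assumptions: the cache is keyed on the serialized tool definitions (modeled here with a hash), the tool name `select_object` is made up, and the `list_objects` function below is only a stand-in for the real tool, whose actual signature may differ.

```python
import hashlib
import json


def fingerprint(tool_def: dict) -> str:
    """Stable hash of a tool definition, standing in for a prompt-cache key."""
    blob = json.dumps(tool_def, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def dynamic_tool(valid_ids) -> dict:
    """Dynamic scheme: valid IDs are baked into the schema as an enum."""
    return {
        "name": "select_object",  # hypothetical tool name
        "description": "Select an object by ID.",
        "parameters": {
            "type": "object",
            "properties": {"id": {"type": "string", "enum": sorted(valid_ids)}},
            "required": ["id"],
        },
    }


# Static scheme: the schema never mentions specific IDs, so it never changes.
STATIC_TOOL = {
    "name": "select_object",
    "description": "Select an object by ID. Call list_objects to see valid IDs.",
    "parameters": {
        "type": "object",
        "properties": {"id": {"type": "string"}},
        "required": ["id"],
    },
}


def list_objects(objects: dict) -> list:
    """Stand-in for list_objects: valid IDs come back as a tool result instead."""
    return sorted(objects)


def select_object(objects: dict, object_id: str):
    # Without the enum, validation has to happen at call time.
    if object_id not in objects:
        raise ValueError(f"unknown ID {object_id!r}, valid IDs: {list_objects(objects)}")
    return objects[object_id]


before = {"Container#1": "alpine"}
after = {**before, "Container#2": "alpine + git"}

# A new object changes the dynamic definition, so the cache key changes too.
dynamic_changed = fingerprint(dynamic_tool(before)) != fingerprint(dynamic_tool(after))
```

The static definition hashes the same no matter how many objects exist, which is the cache-friendly part. What's lost is the up-front constraint: a bad ID only gets caught when `select_object` runs, so the error message has to do the teaching that the enum used to do.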

sleek crater
#

Thank you so much for this amazing work! I'll try to experiment with the various ideas in chaining tools and see if I can improve the evals!