#vito vacation handoff
cc @sleek crater @elfin badge @weary sedge @hardy jungle
This PR is mostly small changes, but it's a bit of a grab bag, so it's possible some things help and others hinder: https://github.com/dagger/dagger/pull/10254
Context: https://discord.com/channels/707636530424053791/1364354292982616074. It seemed to help in the end, so as long as evals are passing I'm pretty comfortable merging it. I'll try to keep the PR green today.
This PR is much more experimental and can be scrapped for parts: https://github.com/dagger/dagger/pull/10229
Tool chaining: shows promise and helps reduce token usage by minimizing round-trips, but it's difficult to articulate the scenarios in which the model should or shouldn't use chaining. One example is WorkspacePattern, where the model tries to "chain" a string value into a required arg for the next call, not realizing you can only chain objects. I added validations for this so the model can learn on the fly, but it sucks to watch it stumble. I'm lightly considering generalizing chaining to support passing any result of type T into a single missing required arg of the same type T, but I'm not sure; it could be unsound. A convenient result would be that chaining calls could work by literally just calling them in order, if we adopt a policy where any missing required arg takes the most recently observed value of the required type. (We already do this, kind of, for the typed self args, but those use the "highest numbered" value, whereas we'd actually want the "most recently returned" one.)
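A rough sketch of the "most recently observed value of type T" policy described above, assuming results are tracked as (type name, ID) pairs in the order the model saw them. Everything here (`Value`, `History`, `fill_missing`, the `Directory#N` ID format) is hypothetical and not Dagger's actual implementation. Because it keys on type names, a String result could fill a String arg, which is exactly the generalization whose soundness is still an open question.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Value:
    """Hypothetical tool-call result: a type name plus an ID like 'Directory#3'."""
    type: str
    id: str


@dataclass
class History:
    """Hypothetical record of results in the order the model observed them."""
    results: list[Value] = field(default_factory=list)

    def observe(self, value: Value) -> None:
        self.results.append(value)

    def most_recent(self, type_name: str) -> Value | None:
        # Scan backwards so "most recently returned" wins, not "highest numbered".
        for value in reversed(self.results):
            if value.type == type_name:
                return value
        return None


def fill_missing(history: History, required: dict[str, str], args: dict[str, Value]) -> dict[str, Value]:
    """Fill at most one missing required arg from the history.

    `required` maps arg name -> type name. More than one missing arg is
    rejected, since choosing which one to fill would be ambiguous.
    """
    missing = [name for name in required if name not in args]
    if not missing:
        return args
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} required args missing; chaining fills at most one")
    name = missing[0]
    value = history.most_recent(required[name])
    if value is None:
        raise ValueError(f"missing required arg {name!r}: no {required[name]} observed yet")
    return {**args, name: value}


history = History()
history.observe(Value("Directory", "Directory#2"))
history.observe(Value("Container", "Container#1"))
history.observe(Value("Directory", "Directory#1"))  # returned last, but lower number

filled = fill_missing(history, {"source": "Directory"}, {})
print(filled["source"].id)  # prints Directory#1
```

Under this sketch, "chaining" two calls is just calling them in order with the dependent arg omitted; an explicitly passed arg always takes precedence over the inferred one.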
Eliminating dynamic tool descriptions + schemas: the new scheme leans on dynamic tool descriptions and schemas, which unfortunately bust caches all the time. I added a list_objects tool, which helps, but we're not fully there yet. It's a bummer to lose the "enum" schema trick for listing valid IDs, since that's the one schema attribute respected by all major models.
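To illustrate why the "enum" trick busts caches, here is a minimal sketch assuming the provider caches on an exact prompt prefix that includes the serialized tool definitions. The tool name `select_directory`, the `input_schema` field layout, and `cache_key` are hypothetical stand-ins, not the real Dagger tool definitions or any provider's cache implementation.

```python
from __future__ import annotations

import hashlib
import json


def tool_schema(valid_ids: list[str] | None) -> dict:
    """Build a hypothetical tool definition.

    With valid_ids, the allowed IDs are baked into a JSON Schema "enum"
    (dynamic scheme). Without, the arg is a plain string and the model is
    expected to discover IDs via a separate list_objects call (static scheme).
    """
    id_schema: dict = {"type": "string", "description": "ID of a Directory object"}
    if valid_ids is not None:
        id_schema["enum"] = valid_ids
    return {
        "name": "select_directory",
        "input_schema": {
            "type": "object",
            "properties": {"id": id_schema},
            "required": ["id"],
        },
    }


def cache_key(tools: list[dict]) -> str:
    # Stand-in for a prefix cache: any byte change in the tool definitions
    # yields a different key.
    blob = json.dumps(tools, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


# Dynamic scheme: each new object changes the enum, so the key changes.
before = cache_key([tool_schema(["Directory#1"])])
after = cache_key([tool_schema(["Directory#1", "Directory#2"])])

# Static scheme: the schema never mentions IDs, so the key stays stable.
static_before = cache_key([tool_schema(None)])
static_after = cache_key([tool_schema(None)])

print(before != after, static_before == static_after)  # prints True True
```

The tradeoff is visible in the schema itself: the static version gives up the enum constraint, so invalid IDs have to be caught by validation at call time rather than by the model's schema handling.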
Thank you so much for this amazing work! I'll try experimenting with the various tool-chaining ideas and see if I can improve the evals!