#default-constructible loaded-module LLM function visibility
putting aside potential api bikeshedding adjacent to privileged:true, how do we expect this to work?
like we want to initialize the module object so that the llm can call it, but atm the way "root" works is to pass the wholeass server
so should i expect that having initialized the module means all its functions will show up in a callable way on that server? [the problem marcos was describing](#agents message) is that he doesn't wanna have to select the module object to call the functions, like with DaggerDev as the loaded module we should just be able to > run all the tests
because reading the code, to me, it feels like exposing those functions requires selecting that module object, and therefore not selecting the toplevel server with all the queries on it and whatnot
yeah, I think that's the desired state. there's a tangential issue, since there's no way to go back to Query from the selected module object, so we've been talking about adding a selectRoot (or something) to fix that. but I think the desired initial state would still be that you only have the module object's tools immediately available (bikesheddable)
like for > run all the tests it would ideally already be scoped in tightly
and you'd only escape back out to the root if you need to (at the LLM's discretion I guess?)
ah gotcha, i thought the bikeshedding was gonna be around env(privileged:true, module: .), and for both those to be set at once i think breaks the whole abstraction
everything is bikesheddable - I'm also toying with another scheme that might remove the need for it, but not sold on it yet, still very experimental and might not work for dumber models
i guess both those bikesheds are possible at once if env(privileged:true, module: .) initially selects the module but has selectRoot available
tl;dr: no more 'current selection', there's just one tool, selectTools(tools: ["..."]), which lists all available tools and object IDs in its description, and all objects are passed around as arguments instead of swapping selections around (so there's a self: arg)
like WithExec(self:ctr, cmd:"cowsay lul")?
(ok not just one tool, there's still returning)
yea
but, sorry, might just be a side tangent, still eval'ing
Quick interjection: I think the monolithic concept of "root query object" (which tainted Env.withQuery() and still taints env(privileged)) doesn't make sense to expose to the LLM. It mixes 1) core API access and 2) module access, which are pretty different from a UX point of view
That's literally the original flat BBI though
problem is tool explosion
lol [this](#1359291317364064357 message) vibes with me hard, it's a big smell that the reason we want privileged:true is because module funcs have parameters that can only be "built" via the core api
looping the tangent back in then, rm'ing the idea of a 'currently selected object' helps, since then we don't need to express that there's a root object. with this model it's just a flat set of functions that the LLM keeps growing. the Query level functions are listed in there and just don't take a self arg, and they can be intermixed with other type's functions
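A minimal Go sketch of the flat, growing tool set described above, under assumptions: `Tool`, `ToolSet`, `AddType`, `NeedsSelf`, and the `Type_function` naming are all hypothetical and not Dagger's implementation. Query-level functions are registered without a self type; functions of other types carry one and can sit next to them in the same list.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is one entry in the flat tool list. Hypothetical shape.
type Tool struct {
	Name     string // e.g. "Container_withExec"
	SelfType string // "" for Query-level functions, which take no self arg
}

// ToolSet is the flat set of tools exposed to the model; it only grows.
type ToolSet struct {
	tools map[string]Tool
}

// NewToolSet seeds the set with Query-level functions (no self arg).
func NewToolSet(queryFuncs ...string) *ToolSet {
	ts := &ToolSet{tools: map[string]Tool{}}
	for _, fn := range queryFuncs {
		ts.tools[fn] = Tool{Name: fn}
	}
	return ts
}

// AddType registers the functions of a newly seen object type.
// Each one is namespaced by its type and requires a self arg.
func (ts *ToolSet) AddType(typeName string, funcs ...string) {
	for _, fn := range funcs {
		name := typeName + "_" + fn
		ts.tools[name] = Tool{Name: name, SelfType: typeName}
	}
}

// NeedsSelf reports whether a call to the named tool must carry a self arg.
func (ts *ToolSet) NeedsSelf(name string) (bool, error) {
	t, ok := ts.tools[name]
	if !ok {
		return false, fmt.Errorf("unknown tool %q", name)
	}
	return t.SelfType != "", nil
}

// Names returns every tool name in sorted order.
func (ts *ToolSet) Names() []string {
	names := make([]string, 0, len(ts.tools))
	for name := range ts.tools {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	ts := NewToolSet("container", "directory")
	// The set grows once the model has a Container to work with.
	ts.AddType("Container", "withExec", "withFile")
	fmt.Println(ts.Names())
}
```

Since there is no current selection in this model, nothing in the sketch tracks a root object: a tool either takes a self arg or it doesn't.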
have we tested that the LLMs actually understand swapping back and forth from root, anyways? like I try to get it to call ctr.WithFile(file) with a container selected, it's gotta pop out to root, construct the file, bind it, then pop back to ctr and call WithFile($file)
i imagine testing this requires that the selectRoot tool exists lol
and maybe that's step one here
i did, and couldn't get it to behave well consistently, so i shelved it due to time constraints
😢
i could have been doing something dumb though
at the time it was a "maybe i can sneak this in for release" => computer said no
I agree the Query functions should be an always-present overlay
I don't think we can get rid of "select" completely without finding another solution to tool explosion. If the tool list just keeps growing, eventually it will hit the limit
I guess we could bet on models just getting better
- maybe with the core type haircut, the situation is different?
still I think query is a special case. never made sense to me that you could select it
OK how about this
as an alternative
i am so curious to see what happens when they release a claude whose training set includes the part of the internet documenting MCP and tool calling XD
@serene sky instead of selecting objects, we keep an ever-growing flat list of namespaced tools (basically like in single-object), but we add a "tool store" indirection, where the model can search the toolstore and enable/disable tools
difference is that it maps directly to the flat list of tools concept, that LLMs are trained for. Less juggling with "objects" to select etc
But the type system is still there. So hopefully the model can use type information to choose tools wisely
searching tools by type sounds pretty sensible
already working on literally that, i think haha
some evals are looking promising
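A rough Go sketch of the "tool store" idea, assuming a hypothetical `ToolStore` with `SearchByType`, `Enable`, and `Disable`; none of these names come from Dagger or from the work mentioned in the thread. The store knows every namespaced tool, the model only sees the enabled ones, and search uses the type information to narrow things down.

```go
package main

import (
	"fmt"
	"sort"
)

// StoredTool is a namespaced tool plus the type info used for search.
// Every name in this sketch is hypothetical.
type StoredTool struct {
	Name    string // namespaced, e.g. "Container_withExec"
	Self    string // type the tool is called on ("" for Query-level)
	Returns string // type the tool returns
}

// ToolStore holds every known tool; only enabled ones reach the model.
type ToolStore struct {
	all     map[string]StoredTool
	enabled map[string]bool
}

func NewToolStore(tools ...StoredTool) *ToolStore {
	s := &ToolStore{all: map[string]StoredTool{}, enabled: map[string]bool{}}
	for _, t := range tools {
		s.all[t.Name] = t
	}
	return s
}

// SearchByType returns the sorted names of tools that are called on
// typeName or that return it.
func (s *ToolStore) SearchByType(typeName string) []string {
	var out []string
	for name, t := range s.all {
		if t.Self == typeName || t.Returns == typeName {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// Enable makes tools visible to the model. It stops at the first
// unknown name and returns an error for it.
func (s *ToolStore) Enable(names ...string) error {
	for _, n := range names {
		if _, ok := s.all[n]; !ok {
			return fmt.Errorf("unknown tool %q", n)
		}
		s.enabled[n] = true
	}
	return nil
}

// Disable hides tools again, which keeps the visible list from
// growing without bound.
func (s *ToolStore) Disable(names ...string) {
	for _, n := range names {
		delete(s.enabled, n)
	}
}

// Enabled lists the currently visible tools in sorted order.
func (s *ToolStore) Enabled() []string {
	out := make([]string, 0, len(s.enabled))
	for n := range s.enabled {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	store := NewToolStore(
		StoredTool{Name: "container", Returns: "Container"},
		StoredTool{Name: "Container_withExec", Self: "Container", Returns: "Container"},
		StoredTool{Name: "Container_file", Self: "Container", Returns: "File"},
		StoredTool{Name: "File_contents", Self: "File", Returns: "String"},
	)
	matches := store.SearchByType("File")
	if err := store.Enable(matches...); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(store.Enabled())
}
```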
still totally need the function trimming btw. if not just to hide export 😭 - with a narrower selection there's a MUCH higher chance of the LLM picking the right ones based on name only
@serene sky yeah we're trying to get that merged here -> https://github.com/dagger/dagger/pull/10106
so uh this convo went slightly off the rails but i also wasn't very clear with my intentions. short term, should we be looking to change the default shell startup to be something like env(privileged: true, module: .) so you boot into a shell/prompt with your sdk-defined functions available and then the LLM has to back out to root via a selectRoot tool? the tenor of this conversation was leaning towards the idea that selectTools might obviate such a thing.
it might obviate it but I wouldn't block on it for now, it's too early to tell. At least some of the work you do for that will still be relevant either way
ah, because selectTools still kinda implies you want some control over the initial selection
cool i will move forward then
yeah exactly, the end goal is still there: "some means via the API to control what tools the model has immediately available"
unfortunately i'm seeing that models are keen to make up tool args (Container.withExec(..., env: [...])) and neglect to provide required ones (self). trying to system prompt my way out of it.
https://v3.dagger.cloud/dagger/traces/52833f21ff76c0fbcf01c994a3c5dd1a?span=4de7ff3688002d0c
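To make the two failure modes concrete, here is a small Go sketch that checks a tool call against its schema and reports both invented and missing arguments in one message. This is only an illustration: the thread describes a system-prompt fix, not this check, and `ToolSchema`, `CheckArgs`, and the argument names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ToolSchema lists the arguments a tool accepts. Hypothetical shape.
type ToolSchema struct {
	Name     string
	Required []string
	Optional []string
}

// CheckArgs reports missing required arguments and unknown (invented)
// ones as a single error that could be sent back to the model.
func CheckArgs(schema ToolSchema, args map[string]any) error {
	known := map[string]bool{}
	var missing []string
	for _, r := range schema.Required {
		known[r] = true
		if _, ok := args[r]; !ok {
			missing = append(missing, r)
		}
	}
	for _, o := range schema.Optional {
		known[o] = true
	}
	var unknown []string
	for a := range args {
		if !known[a] {
			unknown = append(unknown, a)
		}
	}
	sort.Strings(unknown) // map iteration order is random; keep output stable

	var problems []string
	if len(missing) > 0 {
		problems = append(problems, "missing required: "+strings.Join(missing, ", "))
	}
	if len(unknown) > 0 {
		problems = append(problems, "unknown: "+strings.Join(unknown, ", "))
	}
	if len(problems) == 0 {
		return nil
	}
	return fmt.Errorf("%s: %s", schema.Name, strings.Join(problems, "; "))
}

func main() {
	withExec := ToolSchema{Name: "Container_withExec", Required: []string{"self", "args"}}
	// The pattern from the thread: an invented env argument and no self.
	err := CheckArgs(withExec, map[string]any{
		"args": []string{"cowsay", "lul"},
		"env":  []string{"A=b"},
	})
	fmt.Println(err)
	// prints "Container_withExec: missing required: self; unknown: env"
}
```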
the struggle is real
I keep making the same mistake. I need to specify the container each time.
lmao it's so dumb and yet so self-aware at the same time
the "bad intern that's real good at googling" analogy echoes throughout lmao
omg that trace is insanely long
yeah now i'm adding per-eval MaxAPICalls lol
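A per-eval cap like that could look roughly like the Go sketch below. Only the name `MaxAPICalls` comes from the thread; `CallBudget`, `Spend`, and `ErrBudgetExceeded` are hypothetical, and the real eval harness may work differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBudgetExceeded is returned once an eval has used up its API calls.
var ErrBudgetExceeded = errors.New("max API calls exceeded")

// CallBudget caps model API calls for a single eval.
type CallBudget struct {
	MaxAPICalls int // 0 means no limit
	used        int
}

// Spend records one API call, or fails if the cap is already reached.
func (b *CallBudget) Spend() error {
	if b.MaxAPICalls > 0 && b.used >= b.MaxAPICalls {
		return fmt.Errorf("%w (limit %d)", ErrBudgetExceeded, b.MaxAPICalls)
	}
	b.used++
	return nil
}

func main() {
	budget := &CallBudget{MaxAPICalls: 3}
	for i := 1; ; i++ {
		if err := budget.Spend(); err != nil {
			fmt.Printf("stopping before call %d: %v\n", i, err)
			return
		}
		// A real eval loop would send one request to the model here.
	}
}
```

With a cap in place, an eval where the model keeps repeating the same mistake fails fast instead of producing a very long trace.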