supporting lists | Dagger | Page 1

normal zenith Mar 4, 2025, 9:03 PM

#

I think this can't be true? The following prompt, for example: `llm --model gemini-2.0-flash | with-directory $(git github.com/dagger/dagger | head | tree) | with-prompt "i gave you access to a directory at / and tools to work with it. the tool entries will tell you the contents of the directory. tell me the first item" | last-reply` calls `Directory.entries("/")` and tells me `.changes` is the first entry.

sturdy jolt Mar 4, 2025, 9:10 PM

#

Right, so I think that case works for a specific reason, which is that `Entries` returns a list of strings. When we hit that case we actually just JSON-marshal the result, and I think the LLM sees a list of strings: https://github.com/shykes/dagger/blob/431135c10af2d3a2a6051cf3841f26982f70ae48/core/bbi/flat/flat.go#L260

So the LLM can infer there's a list just because it's json.

But when we return a list of objects we return an ID for the type "list of that Object", but it's just a single ID for the whole list. That's where the LLM has no knowledge of lists or how to select elements from them.

Like if you change your example to use something that has a return type of []*Directory or anything like that, then I think the LLM will get confused again
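
A minimal sketch of the difference, not the actual BBI code: `renderScalars`, `renderObjectList`, and the `DirectoryList#1` ID format are all made up for illustration. The first result exposes the list structure as JSON, the second is one opaque handle.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// renderScalars JSON-marshals a list of scalars, so the LLM can see
// that the result is a list and read its elements. (Hypothetical helper.)
func renderScalars(entries []string) (string, error) {
	b, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// renderObjectList returns a single opaque ID standing for the whole list.
// Nothing in the string tells the LLM that it holds several elements.
// (Hypothetical helper and ID format.)
func renderObjectList(typeName string, n int) string {
	return fmt.Sprintf("%sList#%d", typeName, n)
}

func main() {
	scalars, err := renderScalars([]string{".changes", ".dagger", "README.md"})
	if err != nil {
		panic(err)
	}
	fmt.Println(scalars)
	fmt.Println(renderObjectList("Directory", 1))
}
```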

#

It's a good point though that the problem is specifically with lists of objects, and it's interesting that the LLM can handle lists of scalars just because it understands JSON.

normal zenith Mar 4, 2025, 9:13 PM

#

Gotcha, that makes sense. I haven't run into that yet!

sturdy jolt Mar 4, 2025, 9:15 PM

#

Yeah I don't think it's a blocker per se, but I happened to hit it quickly and it's profoundly confusing.

The fact that it works with JSON in the list-of-scalars case gave me an idea for a potential quick fix though, which is to represent lists of objects as a list of their IDs when presented to the LLM. Maybe they are good enough JSON parsers to work with that reliably.
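
Roughly what I have in mind, as a sketch only: the `object` type, its `id` field, and `renderObjectIDs` are hypothetical stand-ins, not real BBI names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// object is a stand-in for any Dagger object that has an ID. (Hypothetical.)
type object struct{ id string }

// renderObjectIDs presents a list of objects as a JSON array of their IDs,
// the same shape the LLM already handles for lists of scalars.
func renderObjectIDs(objs []object) (string, error) {
	// Start from an empty (non-nil) slice so an empty list renders as []
	// rather than null.
	ids := make([]string, 0, len(objs))
	for _, o := range objs {
		ids = append(ids, o.id)
	}
	b, err := json.Marshal(ids)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := renderObjectIDs([]object{{id: "Directory#1"}, {id: "Directory#2"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Whether the LLM can then reliably pick one ID out of that string is the open question.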

#

I'll try it quick

normal zenith Mar 4, 2025, 9:17 PM

#

Yeah it sounds like a missing part of BBI that we don't provide a tool to access the list items

sturdy jolt Mar 4, 2025, 9:19 PM

#

normal zenith Yeah it sounds like a missing part of BBI that we don't provide a tool to access...

Yeah that would be even better but more work. If this quick fix works it basically relies on the LLM being given a json list encoded as a string and being able to reliably do stuff like "what is the 7th element of that list?"

I have no intuition on how good they would be at that. Seems plausible but also they struggle counting the number of r's in strawberry so 🤷‍♂️

deft raft Mar 4, 2025, 9:51 PM

#

At the moment I believe BBI requires the state to be an object

#

Good timing since I'm diving back into BBI today and tomorrow to add multi-object, and could make other improvements while I'm at it

#

One thing I've been wondering: would there be a way to have weakly-typed setter functions?

#

Like the equivalent of `Llm.WithAny(value any)` in Go?

sturdy jolt Mar 4, 2025, 9:53 PM

#

deft raft Good timing since I'm diving back into BBI today and tomorrow to add multi-objec...

Okay I'll just leave it alone for now then rather than try a fix (I'm probably spending a bit too much time here tbh anyways). It wasn't turning out to be a "quick fix" anyways 😄

deft raft Mar 4, 2025, 9:53 PM

#

@sturdy jolt just so I understand the shape of the problem, you were focusing on arrays specifically right?

#

Oh OK, I just realized: your issue is not even with the Dagger-facing part of the BBI. It's purely with the LLM-facing part.

#

i.e. you do *not* need this:

```golang
var ctrs []*dagger.Container
dag.Llm().WithContainerArray(ctrs).WithPrompt(...)
```

sturdy jolt Mar 4, 2025, 9:58 PM

#

deft raft @sturdy jolt just so I understand the shape of the problem, you were fo...

It's more specifically arrays of objects. Right now the code handling that is confusing because when it sees a type of []Object (where Object is File, Directory, etc.) it treats it as just Object in some cases. e.g. isObjectType seems to return true when it's a list of Objects, not just a standalone Object: https://github.com/shykes/dagger/blob/431135c10af2d3a2a6051cf3841f26982f70ae48/core/bbi/flat/flat.go#L147

That's due to the .Name() method on ast.Type just returning the element type name when it's a list (which is dubious, but in a library outside our control)
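
A sketch of that pitfall, assuming the AST type behaves as described. `Type` here is a local stand-in I wrote for illustration, not the library's real definition, and `IsList`, `isObjectName`, and `isSingleObject` are hypothetical helpers.

```go
package main

import "fmt"

// Type is a local stand-in for a GraphQL AST type: a named type sets
// NamedType, a list type leaves it empty and points Elem at its element.
type Type struct {
	NamedType string
	Elem      *Type
}

// Name returns the innermost type name, so both Directory and [Directory]
// yield "Directory". This is the behavior that hides list-ness.
func (t *Type) Name() string {
	if t.NamedType != "" {
		return t.NamedType
	}
	return t.Elem.Name()
}

// IsList reports whether t itself is a list type.
func (t *Type) IsList() bool { return t.NamedType == "" && t.Elem != nil }

// isObjectName is a toy lookup of known object type names.
func isObjectName(name string) bool {
	known := map[string]bool{"File": true, "Directory": true, "Container": true}
	return known[name]
}

// isSingleObject checks list-ness before looking at the name, so a list of
// objects is never mistaken for a standalone object.
func isSingleObject(t *Type) bool {
	return !t.IsList() && isObjectName(t.Name())
}

func main() {
	dir := &Type{NamedType: "Directory"}
	dirs := &Type{Elem: dir}
	fmt.Println(dirs.Name(), isObjectName(dirs.Name()), isSingleObject(dirs))
	fmt.Println(dir.Name(), isSingleObject(dir))
}
```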

After that the return value for a list of objects is just a single ID which represents the entire list. Possible fixes include:

- Instead present that to the LLM as a JSON list of the ID types (relies on the LLM understanding JSON, which seems plausible)
- Keep returning just a single ID for the whole list, but specifically tell the LLM that it's a list type and give it an extra "built-in" tool called something like `selectNthElement`, which gives it the ability to retrieve individual elements from any list ID (this matches how dagql call ID formats work)

sturdy jolt Mar 4, 2025, 10:00 PM

#

deft raft ie you do *not* need this: ```golang var ctrs []*dagger.Container dag.Llm().Wit...

Correct, I don't need that (though obviously it would be nice to have some day). The problem here arises when the LLM calls something that returns a list of objects and then tries to use it. See a simple example here: https://github.com/dagger/dagger/pull/9628#issuecomment-2698853881

deft raft Mar 4, 2025, 10:00 PM

#

Ah I see

#

yeah that was one of the horrible horrible blocker bugs where I had to ask cursor+o1 for help 🙂

#

actually one of my nightmares is re-triggering these bugs as I go back and mess with BBI

sturdy jolt Mar 4, 2025, 10:02 PM

#

deft raft Like the equivalent of `Llm.WithAny(value any)` in go?

I think that's what we talked about a few weeks ago before any of this had come together. It's definitely technically possible, though the classic tradeoff of type safety vs. convenience in some situations. If we don't have that we will end up with lots of autogenerated with* for permutations of with<Object>, with<Object>Array, etc. But idk if that's a bad thing

deft raft Mar 4, 2025, 10:03 PM

#

sturdy jolt I think that's what we talked about a few weeks ago before any of this had come ...

Let's revisit once we have multi-object in place. It could be that there's an easy fix

sturdy jolt Mar 4, 2025, 10:03 PM

#

deft raft actually one of my nightmares is re-triggering these bugs as I go back and mess ...

Yeah my other general comment after reviewing is that even though it will be tricky to figure out, we desperately need some tests somehow. Probably need some mock LLM backend
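
Something like this is what I mean by a mock backend, as a rough sketch: `Client`, `ToolCall`, `scriptedClient`, and `runLoop` are hypothetical names, not existing BBI types. The fake replays a fixed script of tool calls so a test can assert on what the loop did with them.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is a request by the model to invoke a tool. (Hypothetical.)
type ToolCall struct {
	Name string
	Args map[string]string
}

// Client is a minimal LLM backend interface. (Hypothetical.)
type Client interface {
	Next(history []string) (ToolCall, error)
}

var errScriptDone = errors.New("script exhausted")

// scriptedClient is a mock backend that replays a fixed list of tool calls.
type scriptedClient struct {
	script []ToolCall
	pos    int
}

func (c *scriptedClient) Next(history []string) (ToolCall, error) {
	if c.pos >= len(c.script) {
		return ToolCall{}, errScriptDone
	}
	call := c.script[c.pos]
	c.pos++
	return call, nil
}

// runLoop asks the client for tool calls until the script ends, runs each
// tool, and records "name -> result" lines that a test can assert on.
func runLoop(c Client, tools map[string]func(map[string]string) string) ([]string, error) {
	var history []string
	for {
		call, err := c.Next(history)
		if errors.Is(err, errScriptDone) {
			return history, nil
		}
		if err != nil {
			return history, err
		}
		tool, ok := tools[call.Name]
		if !ok {
			return history, fmt.Errorf("unknown tool %q", call.Name)
		}
		history = append(history, call.Name+" -> "+tool(call.Args))
	}
}

func main() {
	client := &scriptedClient{script: []ToolCall{
		{Name: "entries", Args: map[string]string{"path": "/"}},
	}}
	tools := map[string]func(map[string]string) string{
		"entries": func(args map[string]string) string { return `[".changes"]` },
	}
	history, err := runLoop(client, tools)
	if err != nil {
		panic(err)
	}
	fmt.Println(history)
}
```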

deft raft Mar 4, 2025, 10:03 PM

#

I was thinking of this kind of API:

```
LLM.set<Foo>(key: String, value: <Foo>): LLM!
LLM.get<Foo>(key: String): <Foo>!
```

Potentially we could add a third:

```
LLM.append<Foo>(key: String, value: <Foo>): LLM!
```
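
To make the semantics concrete, here is a sketch of that shape in Go using generics. `Env`, `Set`, `Get`, and `Append` are hypothetical names I picked for illustration. Each setter returns a new value instead of mutating, mirroring the `: LLM!` return type.

```go
package main

import "fmt"

// Env holds named values of arbitrary types. (Hypothetical.)
type Env struct {
	vars map[string]any
}

// with returns a copy of e with key bound to v, leaving e untouched.
func (e Env) with(key string, v any) Env {
	next := make(map[string]any, len(e.vars)+1)
	for k, old := range e.vars {
		next[k] = old
	}
	next[key] = v
	return Env{vars: next}
}

// Set binds key to value.
func Set[T any](e Env, key string, value T) Env { return e.with(key, value) }

// Get returns the value at key, or an error if it is missing or not a T.
func Get[T any](e Env, key string) (T, error) {
	var zero T
	v, ok := e.vars[key]
	if !ok {
		return zero, fmt.Errorf("no variable %q", key)
	}
	typed, ok := v.(T)
	if !ok {
		return zero, fmt.Errorf("variable %q has type %T", key, v)
	}
	return typed, nil
}

// Append adds value to the []T stored at key, starting a new list if the
// key is unset. A value of another type at key is replaced by the new list.
func Append[T any](e Env, key string, value T) Env {
	existing, _ := e.vars[key].([]T)
	next := make([]T, 0, len(existing)+1)
	next = append(next, existing...)
	next = append(next, value)
	return e.with(key, next)
}

func main() {
	env := Set(Env{}, "Foo", "file-id")
	env = Append(env, "Baz", "container-1")
	env = Append(env, "Baz", "container-2")
	foo, _ := Get[string](env, "Foo")
	baz, _ := Get[[]string](env, "Baz")
	fmt.Println(foo, baz)
}
```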

sturdy jolt Mar 4, 2025, 10:06 PM

#

deft raft I was thinking of this kind of API: ``` LLM.set<Foo>(key: String, value: <Foo>)...

Yeah IIUC that seems reasonable to me. It's "stringly typed" of course but if someone really cares about that they are free to make their own "workspace object" that has fields for each of the objects/lists instead (which also of course will be 100x less painful after we have self calls)

deft raft Mar 4, 2025, 10:44 PM

#

Sorry, the `key: string` params are for multi-object. It's so I can give the LLM, say, a workspace, a container and a GitHub API endpoint (3 different object types), each at a different variable name

#

so it's orthogonal to the array problem

sturdy jolt Mar 4, 2025, 10:51 PM

#

deft raft Sorry the `key: string` are for multi object it's so I can give the LLM say, a w...

Yeah that's what I was imagining. I was just saying that there's sort of an equivalence between this:

```golang
type Workspace struct {
	Foo *dagger.File
	Bar *dagger.Directory
	Baz []*dagger.Container
}

// (pretend self-calls exist and this would work w/out separate module)
ws := &Workspace{Foo: foo, Bar: bar, Baz: baz}
llm := dag.Llm().WithWorkspace(ws)
```

And this:

```golang
llm := dag.Llm().
	SetFile("Foo", foo).
	SetDirectory("Bar", bar).
	AppendContainers("Baz", baz)
```

Like "multi-object" could be modeled either with string keys as you mentioned, or there's an equivalent representation as a custom object that has fields for each object.

In the longer term I like the "custom object" approach more, since once self calls exist it's about as convenient and retains type safety, but in the meantime the "string key" approach works just fine and holds us over

deft raft Mar 4, 2025, 11:05 PM

#

sturdy jolt Yeah that's what I was imagining. I was just saying that there's sort of an equi...

ah yes. exactly 👍
