Multi-object eval 1 🧵 | Dagger | Page 1

cedar cedar Mar 13, 2025, 11:00 PM

#

OK I got it to actually see its objects @wild marlin 👍 now onto actually performing a task

#

cc @hushed hill @sturdy mason @pseudo flame @hollow osprey @glossy finch

#

Multi-object nirvana here we come!

#

https://v3.dagger.cloud/dagger/traces/0c1194dea89c89b7f81e64da9f9979d9#fd90c1caf4124291

Dagger Cloud

Browse and visualize Dagger traces.

#

I hit an error with my github publish, env var not available somehow. Had to interrupt the agent with Ctrl-C. Wasn't sure what part of the context was still there. Which prompts need to be re-sent? Which variables are still set?

#

(note: I kind of want to annotate traces right now 🙂

hushed hill Mar 13, 2025, 11:04 PM

#

looks like it didn't stumble actually finding it's tools which is sweet

cedar cedar Mar 13, 2025, 11:05 PM

#

You can see here I'm trying to figure out how much of context is left: https://v3.dagger.cloud/dagger/traces/0c1194dea89c89b7f81e64da9f9979d9?span=467091788f1d71be

Dagger Cloud

Browse and visualize Dagger traces.

cedar cedar Mar 13, 2025, 11:05 PM

#

hushed hill looks like it didn't stumble actually finding it's tools which is sweet

yup that part worked out it looks like

#

Not sure what that was: https://v3.dagger.cloud/dagger/traces/0c1194dea89c89b7f81e64da9f9979d9?span=7e13fec0c8599460

Dagger Cloud

Browse and visualize Dagger traces.

#

hushed hill Mar 13, 2025, 11:07 PM

#

yeah looks like either the tool arg types were wrong or it hallucinated them a bit

cedar cedar Mar 13, 2025, 11:10 PM

#

@wild marlin one interesting side effect of allowing use of hashes: I updated one of the variables, asked it to try again, but it tried from the same hash so I had to explicitly say "reload from the variable please"

cedar cedar Mar 13, 2025, 11:11 PM

#

cedar cedar

we had an actual engine / bbi bug in there at some point. We may have removed the fix alongside much of the bbi code

#

@wild marlin part 1: object has wrong credentials, agent gets an error

#

@wild marlin part2 : user fixes the object, asks agent to try again. but agent does not reload from variable (uses hash)

wild marlin Mar 13, 2025, 11:20 PM

#

right right

cedar cedar Mar 13, 2025, 11:20 PM

#

Part 3: user prompts to reload. this time object is used. We get to a new error 🙂

wild marlin Mar 13, 2025, 11:20 PM

#

well, technically it's not that it used the hash - it's that the current state never changed

#

(i think)

cedar cedar Mar 13, 2025, 11:21 PM

#

oh, is there still a concept of current state? I thought that was replaced by explicit IDs

wild marlin Mar 13, 2025, 11:21 PM

#

nope it's still there

cedar cedar Mar 13, 2025, 11:21 PM

#

I guess it's a hybrid system: current state does the heavy lifting. Explicit IDs for passing objects as arguments

wild marlin Mar 13, 2025, 11:21 PM

#

yeah

cedar cedar Mar 13, 2025, 11:22 PM

#

(and/or more creative uses by the model)

#

creative->scary

wild marlin Mar 13, 2025, 11:22 PM

#

and calling tools changes the current state to the new return value

cedar cedar Mar 13, 2025, 11:22 PM

#

makes sense. then yeah you're right, it's the current state that's the problem

wild marlin Mar 13, 2025, 11:23 PM

#

maybe setting variables could also set the current state? thinkies

cedar cedar Mar 13, 2025, 11:23 PM

#

-> maybe we should show current selection in the prompt?

cedar cedar Mar 13, 2025, 11:23 PM

#

wild marlin maybe setting variables could also set the current state? <:thinkies:10403576432...

tricky because I might set multiple vars in the same shell session

wild marlin Mar 13, 2025, 11:23 PM

#

would that break?

#

i think right now it's basically set a=1, b=2 -> prompt -> model selects a or b

cedar cedar Mar 13, 2025, 11:24 PM

#

maybe setting variables could also set the current state?

But if I set 3 variables, which one would be set to the state? Last one wins?

wild marlin Mar 13, 2025, 11:24 PM

#

yeah, and if the model needs a different one i presume it would just swap

cedar cedar Mar 13, 2025, 11:25 PM

#

I think it might confuse the model if it's not the one doing the selecting

wild marlin Mar 13, 2025, 11:25 PM

#

but, eh, it doesn't feel great

cedar cedar Mar 13, 2025, 11:25 PM

#

at the moment we only tell the model about its new state when it selects it

#

what if we shaped the tools so that it has to start a "pipeline"; and the concept of currents state is only within that pipeline/session. If the pipeline completes, a value will be saved to a variable. If it's interrupted, the state is lost

#

(not sure that even makes sense)

#

OK next UI blocker: the agent says it's done. I can't really inspect the result, because I can't list current variables in the shell

#

I'm literally paying OpenAI to ask the LLM which variables it sees, because i can't see them 😛

#

wild marlin Mar 13, 2025, 11:29 PM

#

lol, i ran into that too and tried running _env and .env in desparation

#

would be nice if $ supported tab completion too

wild marlin Mar 13, 2025, 11:29 PM

#

cedar cedar

are you able to confirm those are synced back to the shell?

cedar cedar Mar 13, 2025, 11:33 PM

#

yes I opened a terminal in one of them 🙂

cedar cedar Mar 13, 2025, 11:49 PM

#

getting some random 1password lookup errors

#

(ignore the var names 😛 )

#

error is from pressing >

#

@wild marlin just demoed this 👆 to @glossy finch @hollow osprey @strong lynx 🙂 they like it

#

Started talking about the looking question of "what about access to the shell environment - like the current module, core API, etc"

#

observed that the copilot's "privileges" will need to be configurable - you don't always want to give access to core API, or current module

#

from there: how to configure this?

#

from there: well there's the LLM API, llm | .... prompt mode is already a special case of that API. Could we make the relationship more ovious, so you can configure your "copilot" (the special global llm instance selected by prompt mode) using the llm API?

#

from there: what if you could set any number of variables of type llm, and > let you cycle through them?

#

Like copilot tabs 🙂

wild marlin Mar 14, 2025, 12:23 AM

#

@cedar cedar an idea for the var confusion: what if, when we set a var, we add a message like "The variable foo has been set to Container@xxh3:...", and if it changes, it says "The variable foo has changed from Contaienr@xxh3:... to Container@xxh3:..." - maybe that will be enough of a hint to the model

#

I actually tried that before embracing the 'just pass $foo' pattern and it did help

#

right now var changes are completely invisible to the model until it observes them (_objects), which it can't always know to do

wild marlin Mar 14, 2025, 12:54 AM

#

worked 🎉
before: https://v3.dagger.cloud/dagger/traces/5fd781eb740315208d8c51e0123b860b
after: https://v3.dagger.cloud/dagger/traces/5b683a47ab40e34ad66f34e6027fb2b4

cedar cedar Mar 14, 2025, 12:55 AM

#

so you wrap it in an extra system prompt?

wild marlin Mar 14, 2025, 12:56 AM

#

not a system prompt - it just adds a message to the history, and _save returns the same message: https://github.com/shykes/dagger/commit/3f85dea93b1f72b5c496c1f481465315bccb2b08

GitHub

add variable changes to message history · shykes/dagger@3f85dea

Signed-off-by: Alex Suraci

cedar cedar Mar 14, 2025, 1:25 AM

#

Overall this first eval went really well!

#

A few papercuts. But the core plumbing held up well

#

Makes me want to try making more modules that I can compose in-model

#Multi-object eval 1 🧵