Cannot decode `Directory#1` | Dagger

tawdry veldt Apr 14, 2025, 10:07 PM

#

There are two bugs in one here. First, the LLM is not great at type checking: it sometimes passes a `HelloDagger#1` where a `dagger.Directory` is expected (trace: https://v3.dagger.cloud/tiborvass/traces/c218cb47a4d1c0fe518206ea305f957d).

Thankfully I won't hit this one, since I'll be autoselecting it with @young solstice's PR.

The other bug is that when it does choose a directory, it uses the string `Directory#1` as-is instead of looking up its base64 ID. Where is the correspondence between the LLM-friendly numbers and the Dagger IDs?
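To make the two failure modes concrete, here is a minimal Go sketch of such a correspondence, assuming the LLM-facing layer keeps a per-session table from names like `Directory#1` to opaque engine IDs. Everything here (`objectRegistry`, `Register`, `Resolve`, the placeholder IDs) is hypothetical and not Dagger's actual implementation. Resolving checks both that the name has the expected type and that it was actually registered, so each bug above would surface as an explicit error instead of the raw string being passed through.

```go
package main

import (
	"fmt"
	"strings"
)

// objectRegistry is a HYPOTHETICAL table mapping LLM-friendly names
// such as "Directory#1" to opaque engine IDs (e.g. base64 strings).
type objectRegistry struct {
	ids    map[string]string // LLM-friendly name -> engine ID
	counts map[string]int    // type name -> objects registered so far
}

func newObjectRegistry() *objectRegistry {
	return &objectRegistry{ids: map[string]string{}, counts: map[string]int{}}
}

// Register assigns the next "Type#N" name to an engine ID and returns it.
func (r *objectRegistry) Register(typeName, id string) string {
	r.counts[typeName]++
	name := fmt.Sprintf("%s#%d", typeName, r.counts[typeName])
	r.ids[name] = id
	return name
}

// Resolve turns a name chosen by the model back into an engine ID.
// It rejects malformed names, names of the wrong type (e.g. a
// "HelloDagger#1" where a Directory is expected), and names that
// were never registered (e.g. a hallucinated "Directory#1").
func (r *objectRegistry) Resolve(name, wantType string) (string, error) {
	typeName, _, ok := strings.Cut(name, "#")
	if !ok {
		return "", fmt.Errorf("%q is not an object name", name)
	}
	if typeName != wantType {
		return "", fmt.Errorf("%q is a %s, expected a %s", name, typeName, wantType)
	}
	id, ok := r.ids[name]
	if !ok {
		return "", fmt.Errorf("no %s named %q has been created", wantType, name)
	}
	return id, nil
}

func main() {
	reg := newObjectRegistry()
	hello := reg.Register("HelloDagger", "aGVsbG8tZGFnZ2Vy") // placeholder ID
	fmt.Println(hello)

	// Wrong type: a HelloDagger passed where a Directory is expected.
	_, err := reg.Resolve(hello, "Directory")
	fmt.Println(err)

	// Unknown name: "Directory#1" was never created.
	_, err = reg.Resolve("Directory#1", "Directory")
	fmt.Println(err)
}
```

The design choice in this sketch is to fail loudly at lookup time; an error like "no Directory named ... has been created" gives the model something actionable, unlike an opaque decode failure.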

I'm testing on main.

https://dagger.cloud/tiborvass/traces/5bef0311f17844dd789f50488a038719

#

`Cannot decode Directory#1`

young solstice Apr 14, 2025, 10:09 PM

#

What LLM impl are you using, btw? I'm a little surprised by this error, but maybe we just have different preferred providers.

tawdry veldt Apr 14, 2025, 10:10 PM

#

4o

young solstice Apr 14, 2025, 10:10 PM

#

I never hit it with Claude 3.5.

tawdry veldt Apr 14, 2025, 10:36 PM

#

With Claude 3.5 it asks "which directory?" instead of trying the default, which, tbh, is the actual bug: https://v3.dagger.cloud/tiborvass/traces/c24a5d944b9ec91265331e8cf496bea8

So now I'm starting to think that 4o is hallucinating a `Directory#1`.

How can I check all the variables in a running shell?

young solstice Apr 14, 2025, 10:38 PM

#

Once I've opened a prompt, I go back to the shell and run `$agent | env | inputs | name`.

#

I am almost entirely certain there is a less stupid way than that, though lmao

tawdry veldt Apr 14, 2025, 10:38 PM

#

TIL `$agent`

young solstice Apr 14, 2025, 10:39 PM

#

It used to be `.llm`.

tawdry veldt Apr 14, 2025, 10:41 PM

#

Now I've just reproduced the same behavior as 4o on Claude 3.5 (it's trying to pass `Directory#1`, but that object was never created).

#

`$agent | env | inputs | name` returns just `hello`.

sick scroll Apr 14, 2025, 10:43 PM

#

may want to base your repro + fixes on https://github.com/dagger/dagger/pull/10134

#

It's a large, messy PR at the moment, sorry, but the tool-calling scheme is very different, so I'm curious whether any of the stuff I've already done addresses the issue in any way.

tawdry veldt Apr 14, 2025, 10:47 PM

#

sure! will try now

sick scroll Apr 14, 2025, 10:56 PM

#

the evals are in-repo now, so you can do stuff like this:

```shell
# run a few attempts of an eval, analyze the results, suggest a new system prompt
dagger-dev -m modules/evaluator call \
  --model claude-3-7-sonnet-latest \
  --docs ./core/llm_docs.md \
  --initial-prompt ./core/llm_dagger_prompt.md \
  evaluate --model gpt-4.1 --name BuildMulti

# or to run a single eval
dagger-dev -m modules/evaluator/evals call build-multi report
```

young solstice Apr 14, 2025, 10:59 PM

#

i had procrastinated cloning them or learning how to use them in their original location, and now you have rewarded my procrastination

fervent temple Apr 14, 2025, 11:25 PM

#

sick scroll may want to base your repro + fixes on <https://github.com/dagger/dagger/pull/10...

Hey Alex, do we agree that this PR relies heavily on the system prompt to fix all those edge cases?

I've lost track of all the ideas.

So, to get MCP/prompt parity, we need to refocus on the system prompt eval idea from @tawdry veldt.

As it stands, none of the fixes from this branch carry over to MCP. cc @young solstice

tawdry veldt Apr 14, 2025, 11:34 PM

#

Sorry Alex, got sidetracked; I'll get back to testing your branch. I think Guillaume is referring to handling prompts via Instructions in MCP: if we start relying heavily on the system prompt again, we'll need to prioritize that to make things work with MCP. /cc @young solstice

#

I'll try to add an eval and run it with your instructions above

young solstice Apr 14, 2025, 11:35 PM

#

Maybe a hot take, but we should keep our focus on a minimal MCP demo unless merging selectTools is hyper-imminent. If it is, we should acknowledge we're gonna have some serious logical merge conflicts to iron out.

tawdry veldt Apr 14, 2025, 11:38 PM

#

Right now the intuitive prompt is hitting the `Directory#1` issue, so I'm trying to fix that.

sick scroll Apr 14, 2025, 11:41 PM

#

fervent temple Hey Alex, do we agree that this PR relies heavily on the system prompt to fix al...

This PR adds a system prompt, but that's not the central goal: it's also a revamped tool-calling scheme altogether (no more 'current selection'; instead, there's a growing set of tools selected by the model).
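As a rough illustration of the difference, here is a minimal Go sketch of a "growing set" of tools, assuming tools can be identified by plain names. Unlike a single current selection that gets swapped out, this set only grows as the model asks for more tools. The `toolSet` type, its methods, and the tool names other than `selectTools` are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// toolSet is a HYPOTHETICAL set of tool names exposed to the model.
// It only grows: selecting new tools never removes earlier ones.
type toolSet struct {
	selected map[string]bool
}

func newToolSet(initial ...string) *toolSet {
	ts := &toolSet{selected: map[string]bool{}}
	ts.Select(initial...)
	return ts
}

// Select adds the tools requested by the model and reports how many were new.
func (ts *toolSet) Select(names ...string) int {
	added := 0
	for _, n := range names {
		if !ts.selected[n] {
			ts.selected[n] = true
			added++
		}
	}
	return added
}

// Names returns the currently available tools in a stable, sorted order.
func (ts *toolSet) Names() []string {
	names := make([]string, 0, len(ts.selected))
	for n := range ts.selected {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Start with only the tool the model uses to request more tools.
	ts := newToolSet("selectTools")
	ts.Select("Directory.withNewFile", "Container.from") // hypothetical tool names
	ts.Select("Container.from")                          // already selected: no change
	fmt.Println(ts.Names())
}
```

In this sketch, earlier tools stay available after later selections, which is the property that replaces the old "current selection" behavior.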

I have a throwaway commit in the PR that also bubbles up the system prompt through MCP instructions, but I haven't tested it yet. Goose respects it in theory.

sick scroll Apr 14, 2025, 11:49 PM

#

young solstice maybe a hot take but we should keep focus on a minimal mcp demo unless merging s...

You're right, though. I'm mostly looking at this to prevent redundant work, but if this is a blocker for MCP, maybe just go forward; at worst you'll find a different solution, and I'll see how to adjust my branch if needed. @tawdry veldt

#

I don't think I'm far from merging, but there's a lot of cleanup to do (potentially splitting up the PR).

tawdry veldt Apr 15, 2025, 5:05 PM

#

Good news, the problem seems to go away with your branch! I'll need to do more tests with MCP, but it looks promising.

sick scroll Apr 15, 2025, 5:28 PM

#

tawdry veldt good news, the problem seems to go away with your branch! Will need to do more t...

oh sweet. what does the eval/repro look like?

tawdry veldt Apr 15, 2025, 5:34 PM

#

I'll send it to you later, but basically it's the QuickStart example, and for now the prompt is just “build dagger hello”.
