select_tools | Dagger

summer holly Apr 11, 2025, 6:32 PM

#

Current status: still seems promising, but it's extremely dependent on the system prompt. We already depend on one (see message #1357369057334136882), so that's not a dealbreaker, but now it's even more important.

What I'm finding though is it's hard to come up with one system prompt that makes all models happy. That's still the goal, but yeah. So far I've been able to get a system prompt that makes OpenAI and Gemini happy, but not Claude. And I can craft a system prompt that makes Claude happy, but not Gemini.

The march continues...

#

Worth noting the issue with Claude when Gemini+OpenAI are happy is that it doesn't call returnToUser - so the same issue as on main. For some reason it's just not keen to call it. The system prompt needs to REALLY focus on it (like "look at returnToUser and work backward")

#

Still have things to try. Just pulling up the curtain a bit!

#

One of the breakthroughs was listing the return type for each tool in the selectTools description.
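
A minimal sketch of what listing return types in the description could look like. The `ToolInfo` type, the description wording, and the return types shown for each tool are assumptions made for illustration; only the idea of listing a return type per tool comes from the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolInfo:
    name: str         # tool name, e.g. "Container_withExec" (illustrative)
    return_type: str  # type of the value the tool returns (assumed here)


def select_tools_description(tools: list[ToolInfo]) -> str:
    """Render a description that lists each selectable tool with its return type."""
    lines = ["Select tools to make them available. Available tools:"]
    for tool in sorted(tools, key=lambda t: t.name):
        lines.append(f"- {tool.name} (returns {tool.return_type})")
    return "\n".join(lines)


# Hypothetical tool list; the real set comes from the objects in scope.
TOOLS = [
    ToolInfo("Container_withExec", "Container"),
    ToolInfo("Container_stdout", "String"),
    ToolInfo("Workspace_research", "String"),
]

DESCRIPTION = select_tools_description(TOOLS)
print(DESCRIPTION)
```

Seeing the return type up front lets a model plan a chain of calls that ends in the value it has to return.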

gray cedar Apr 11, 2025, 6:41 PM

#

So in terms of dependency on system prompt, so far this is a regression right? Since so far only Gemini has needed one

summer holly Apr 11, 2025, 6:44 PM

#

gray cedar So in terms of dependency on system prompt, so far this is a regression right? S...

Not really. Claude needs one right now too and is pretty broken since the introduction of the return tool. I really don't think we can get by without one at this point.

#

(see linked message)

gray cedar Apr 11, 2025, 6:49 PM

#

I wonder if mcp.run has the same issue. They also claim a dynamic/programmable mcp server, but seem to have gotten it to work over MCP

#

Maybe worth looking at which tools they expose

#

https://docs.mcp.run/blog/2024/12/18/universal-tools-for-ai/

Universal Tools For AI | 🤖

#

For sure they don't rely on system prompt since they got it to work over MCP

summer holly Apr 11, 2025, 6:49 PM

#

maybe they use the instructions API? :thinkspin: (but I would be surprised just from how chaotic everything is haha)

summer holly Apr 11, 2025, 6:49 PM

#

gray cedar For sure they don't rely on system prompt since they got it to work over MCP

that's the thing - the instructions API is explicitly for this sort of thing

#

so I don't think we should think of it as "MCP doesn't support system prompts"

gray cedar Apr 11, 2025, 6:49 PM

#

yeah but do any clients actually implement it?

summer holly Apr 11, 2025, 6:50 PM

#

if they don't, that's their problem, and more reflective of how new and incomplete the whole field is

#

Goose implements it

gray cedar Apr 11, 2025, 6:50 PM

#

a few quick observations too:

- I noticed in a trace that `return` wasn't always selected (seemed like it had to be explicitly selected in `selectTools`). I think that should probably be required only for object functions, not meta-tools

summer holly Apr 11, 2025, 6:51 PM

#

gray cedar a few quick observations too: - I noticed in a trace that `return` wasn't alway...

that's already the case, they don't have to select that one, some just do that anyway yeah

gray cedar Apr 11, 2025, 6:51 PM

#

ah 🙂 but maybe remove it from the list then (or maybe it's not even in the list and they hallucinate it)

summer holly Apr 11, 2025, 6:51 PM

#

yeah it's a hallucination lol

#

models say the darndest things - at one point Gemini (when poorly prompted) was calling selectTools(["Workspace_research", "Workspace_research", "Workspace_research"]) when told to "research and record three findings"
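
One way to make repeated or hallucinated selections harmless is to normalize the request on the server side. This is only a sketch: `normalize_selection` and the `Workspace_record` tool name are hypothetical, and nothing here is taken from the actual implementation.

```python
def normalize_selection(requested, known, already_selected):
    """Return (newly_added, unknown) for one select_tools call.

    Duplicates and tools that are already selected are dropped, since
    selections stick around; unknown names are reported instead of raised.
    """
    added, unknown, seen = [], [], set(already_selected)
    for name in requested:
        if name not in known:
            if name not in unknown:
                unknown.append(name)
        elif name not in seen:
            seen.add(name)
            added.append(name)
    return added, unknown


KNOWN = {"Workspace_research", "Workspace_record"}  # second name is made up
ADDED, UNKNOWN = normalize_selection(
    ["Workspace_research", "Workspace_research", "Workspace_research", "return"],
    KNOWN,
    already_selected=set(),
)
print(ADDED, UNKNOWN)
```

The unknown list could be echoed back to the model so it learns that `return` never needed selecting.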

#

the models were all also ignoring the **required** `self` arg like 50% of the time 😬 - which I've worked around by having it default to the most recent object of the desired type

#

also self is now named Foo, to match Foo_bar, which I thought would help. it didn't, but I decided to keep it that way anyway. (Gemini sometimes mangles words that are special words in Python, like return -> return_, hence the switch to returnToUser)
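
A quick way to spot tool names that collide with Python reserved words is the standard library's `keyword` module. This is a sketch of a lint-style check; the list of names is illustrative, and the thread only reports the mangling as an observation about Gemini.

```python
import keyword


def risky_tool_names(names):
    """Return the tool names that are reserved words in Python."""
    return [name for name in names if keyword.iskeyword(name)]


# "return" is reserved in Python; the other names are not.
RISKY = risky_tool_names(["return", "returnToUser", "save", "self", "select_tools"])
print(RISKY)
```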

summer holly Apr 11, 2025, 6:58 PM

#

summer holly the models were all also ignoring the **required** `self` arg like 50% of the ti...

i actually kind of like this trick though - it means you can hop between objects somewhat safely without extra calls to switch selections etc.

#

but, it's a bit of a crutch, I would much prefer they remember object IDs and use them accurately
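
The fallback described above can be sketched as a small registry that resolves a missing `self` to the newest object of the requested type. `ObjectRegistry` and its methods are hypothetical names; only the `Type#N` ID shape and the "most recent object" rule come from the thread.

```python
class ObjectRegistry:
    """Tracks objects by ID (e.g. "Container#2") in creation order."""

    def __init__(self):
        self._ids = []
        self._counts = {}

    def add(self, type_name):
        """Register a new object of the given type and return its ID."""
        n = self._counts.get(type_name, 0) + 1
        self._counts[type_name] = n
        object_id = f"{type_name}#{n}"
        self._ids.append(object_id)
        return object_id

    def resolve(self, type_name, requested=None):
        """Use the requested ID if given, else the newest object of that type."""
        if requested is not None:
            if requested not in self._ids:
                raise KeyError(f"unknown object {requested}")
            return requested
        for object_id in reversed(self._ids):
            if object_id.split("#")[0] == type_name:
                return object_id
        raise KeyError(f"no {type_name} available")


registry = ObjectRegistry()
registry.add("Container")
registry.add("Directory")
registry.add("Container")
DEFAULTED = registry.resolve("Container")                 # self omitted
EXPLICIT = registry.resolve("Container", "Container#1")  # self given
print(DEFAULTED, EXPLICIT)
```

An explicit ID always wins, so a model that does remember object IDs is unaffected by the default.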

summer holly Apr 11, 2025, 7:07 PM

#

gray cedar I wonder if mcp.run has the same issue. They also claim a dynamic/programmable m...

From last time I looked into it, their tools aren't nearly as dynamic. It removes the pain of managing the MCP server config .json, by putting their single stateful entrypoint in the middle (tied to your account where you install servlets, i.e. tools, from their marketplace). But in the end those tools/servlets are already designed to be exposed over MCP; they don't have a sophisticated stateful object/API/function calling/output returning scheme like we do, so they shouldn't need a system prompt at all.

gray cedar Apr 11, 2025, 8:21 PM

#

ah I see. I was wondering if they added an indirection in how llms consumed tools, so that different llm sessions could get different tools.

summer holly Apr 16, 2025, 7:05 PM

#

update: PR nearly done, updated the description with eval results + more details https://github.com/dagger/dagger/pull/10134

GitHub

llm: switch to `selectTools` scheme by vito · Pull Request #10134 ...

Evals comparison with main
Bottom line up front: with these changes, all evals now perform much more consistently across all major providers.
There is a higher baseline token cost now that we apply...

eternal cipher Apr 16, 2025, 7:06 PM

#

👋 Testing this now with the quickstart on different models
Currently hitting an issue where the actual operations that modify my environment are showing as cached and my saved container is unmodified
https://v3.dagger.cloud/kpenfound/traces/3af5f19a4d48beb8cbeb55a819adb8ad

Dagger Cloud

Browse and visualize Dagger traces.

summer holly Apr 16, 2025, 7:11 PM

#

eternal cipher 👋 Testing this now with the quickstart on different models Currently hitting an...

Just to set expectations: this is with qwen and afaik the baseline there is already the floor. Would be really nice if we can get it to work but it hasn't been a priority for this PR - I've been focusing on the usual providers first

#

I guess it amounts to doing a lot more prompting, I see you've got a hefty one there. Some of it is out of date (I see "Select Container#1")

eternal cipher Apr 16, 2025, 7:12 PM

#

Haha yeah I'll sign the waiver that says qwen might not work properly. My goal is to find what amount of prompting gets different sizes of qwen to kind of work. I was surprised with this one because in the trace it looks like it actually did the right thing

summer holly Apr 16, 2025, 7:13 PM

#

no idea what's going on here lol - I set ?debug on my local Cloud UI which shows literally all spans (even passthrough) and it's not like there's a hidden call or anything. You can see a revealed passthrough span at the bottom

#

like why's it saying "I have created ..." when it didn't even take any action :thinkspin:

#

The repeated selectTools might be something we can system prompt away

eternal cipher Apr 16, 2025, 7:14 PM

#

ok, I wasn't sure if that was a quirk of the new API or not. Agreed with the repeated selects. Do you know where the tool calls at the end are actually happening?

summer holly Apr 16, 2025, 7:15 PM

#

it seems like it failed to call withExec a couple times and then just returned an older Container

eternal cipher Apr 16, 2025, 7:16 PM

#

ah, thank you, that explains it

summer holly Apr 16, 2025, 7:17 PM

#

the cached statuses are just because the actual effects are cached, like at the buildkit level. so that part worked fine I think?

#

hmm it also bailed early because it called save which is normally a hint to our loop that it's done (and it can break). in this case it looks like it was treating it more like a checkpoint. it might have kept going if it didn't call it so early

#

i added that break there because otherwise I saw qwen just repeatedly calling return (which is now save) with the same value
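
The early exit can be illustrated with a toy loop that stops as soon as `save` is called. This is a sketch under assumptions: `run_loop`, the scripted stand-in for the model, and the tool arguments are all invented for illustration and are not the real loop.

```python
def run_loop(next_tool_call, max_steps=10):
    """Call tools until the model calls `save`, then stop and return its value."""
    history = []
    for _ in range(max_steps):
        name, arg = next_tool_call(history)
        history.append((name, arg))
        if name == "save":
            # Treat save as "done": stop here instead of prompting again,
            # which avoids repeated identical save calls.
            return arg, history
    return None, history


# Scripted stand-in for a model that uses save as a checkpoint.
SCRIPT = [
    ("select_tools", ["Container_withExec"]),
    ("save", "Container#1"),
    ("Container_withExec", "go build"),
    ("save", "Container#2"),
]


def scripted_model(history):
    return SCRIPT[len(history)]


RESULT, HISTORY = run_loop(scripted_model)
print(RESULT, len(HISTORY))
```

Because the loop stops at the first `save`, the later `withExec` and the second `save` never run, which matches the "bailed early" behavior described above.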

eternal cipher Apr 16, 2025, 7:18 PM

#

without debug, this is what was confusing me. It looks like it did the right thing and all was good

eternal cipher Apr 16, 2025, 7:19 PM

#

summer holly i added that `break` there because otherwise I saw `qwen` just repeatedly callin...

yeah I also saw the repeated return on 0.18.3

#

i'll add some prompting to say only use save when you're done for-reals

summer holly Apr 16, 2025, 7:20 PM

#

eternal cipher without debug, this is what was confusing me. It looks like it did the right thi...

ah, what happened was it made the same failed go build call from earlier, against Container#1 instead of the newly returned Container#2. and since it was the same thing, the span got deduped

eternal cipher Apr 16, 2025, 7:20 PM

#

ahh ok, makes sense

#

maybe ollama specifically is weird with system prompts?

summer holly Apr 16, 2025, 7:21 PM

#

yeah maybe you have to write it differently

eternal cipher Apr 16, 2025, 7:21 PM

#

i'll copy some parts of the default system prompt into the user prompt

summer holly Apr 16, 2025, 7:22 PM

#

i'm not too familiar with ollama but i'd expect it to at least apply stronger weight to a system prompt still

#

so in my mind that would just make things worse, but :SHRUGGERS:

eternal cipher Apr 16, 2025, 7:23 PM

#

yeah who knows, all of the "openAI compatible APIs" so far have taken the "compatible" part pretty loosely

eternal cipher Apr 17, 2025, 8:02 PM

#

I've been doing a bunch of testing with local models on this branch. The issues I'm hitting aren't specific to this change, so I can start a new thread if that's better

The challenge I'm trying to overcome is that it's currently impossible for a local model (like qwen2.5-coder:14b on ollama) to complete the quickstart - even with toy-workspace instead of container.

To set expectations, I recreated the quickstart using openai's agent sdk and the same model has no problems at all completing the "make a curl clone" prompt.

I guess the question is: how urgent is this problem?

summer holly Apr 17, 2025, 8:06 PM

#

eternal cipher I've been doing a bunch of testing with local models on this branch. The issues ...

@lapis narwhal suggested trying 32b but I haven't yet - did you try it by any chance?

eternal cipher Apr 17, 2025, 8:07 PM

#

Yeah I've mostly tried 32b with dagger, even more customized versions of it from huggingface

#

fwiw I used qwen 32b a lot on 17-llm.X with success on single-object, so it's still a tool calling complexity issue

summer holly Apr 17, 2025, 8:10 PM

#

hmm interesting

#

i'm trying to avoid the idea that smaller models will need us to maintain an entirely different scheme :laughcry:

partly because if that is the answer, we'd really need to justify the existence of the more complex one (e.g. it makes it better at more sophisticated tasks, somehow larger models prefer the more sophisticated scheme, or the more sophisticated scheme saves you a ton in token cost). but maybe they're low ish maintenance once you figure it out

eternal cipher Apr 17, 2025, 8:13 PM

#

I even got 3b (three) to work with the openai agents sdk, but not consistently

summer holly Apr 17, 2025, 8:14 PM

#

like this part is interesting - it seems to think it called those tools, instead of just making them available

#

there might be a simple fix for that if so

#

right now selectTools just replies with "ok" lol

#

it could either:

1. reply with "You now have the %s tool available."
2. reply with a list of tool schemas - which might double as a workaround for MCP clients that don't support the tools-changed notification (or it's racy etc)

#

maybe those will help - i'll try 2) first since the secondary effect would be neato (compatibility with more clients)

eternal cipher Apr 17, 2025, 8:16 PM

#

here's the current version of the prompt I was trying with qwen 32b and the exact quickstart we have today that uses container https://github.com/kpenfound/agents/blob/67d0f355e6191b0392ad98fcca8c83eae4b82c8b/quickstart/prompts/qwen2.5-coder.md?plain=1

#

and the models I've been trying

hf.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF:Q4_K_M
hf.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF:Q8_0
qwen2.5-coder:32b
qwen2.5-coder:14b

eternal cipher Apr 17, 2025, 8:19 PM

#

summer holly it could either: 1. reply with "You now have the %s tool available." 2. reply wi...

if you push something like this i'll rebuild and try it

#

here's the openai version btw https://gist.github.com/kpenfound/c0227e0e308ed3161fcde27e3f289d60

Gist

Dagger agent quickstart using OpenAI Agent SDK

summer holly Apr 17, 2025, 8:48 PM

#

@eternal cipher pushed

#

gonna see if this makes dagger mcp usable in Zed 👀

lapis narwhal Apr 17, 2025, 8:52 PM

#

summer holly gonna see if this makes `dagger mcp` usable in Zed 👀

lmk how it goes

summer holly Apr 17, 2025, 8:52 PM

#

...gotta make a zed extension first lol

lapis narwhal Apr 17, 2025, 8:53 PM

#

summer holly ...gotta make a zed extension first lol

you shouldn't need to, the mcp configs are hackable

summer holly Apr 17, 2025, 8:53 PM

#

couldn't find it in their docs

lapis narwhal Apr 17, 2025, 8:54 PM

#

{
  "context_servers": {
    "dagger": {
      "command": {
        "path": "dagger",
        "args": ["mcp", "-m", "~/src/dagger"],
        "env": {
          "ANTHROPIC_API_KEY": "REDACTED",
          "NO_COLOR": "true"
        }
      },
      "settings": {}
    }
  }
}

in your zed config

summer holly Apr 17, 2025, 8:55 PM

#

ah cool - i just told an agent to do it for me and let it rip, since we might want it eventually lol

#

if it gets stuck ill give that a go instead

eternal cipher Apr 17, 2025, 9:04 PM

#

summer holly <@135620352201064448> pushed

Yooooo that did it 🎉 https://v3.dagger.cloud/kpenfound/traces/ea99e098ec75c34ca93ade59e1990aeb
This was qwen2.5-coder:14b. It got confused with immutable objects at first but then it got there


summer holly Apr 17, 2025, 9:05 PM

#

eternal cipher Yooooo that did it 🎉 https://v3.dagger.cloud/kpenfound/traces/ea99e098ec75c34ca...

https://youtu.be/EQ6HaIP5uwg?t=66

#

can't figure out how to get zed to actually use an MCP server :thinkies:

#

i configured it but it never runs it

#

https://tenor.com/bmq4l.gif

summer holly Apr 17, 2025, 9:41 PM

#

eternal cipher Yooooo that did it 🎉 https://v3.dagger.cloud/kpenfound/traces/ea99e098ec75c34ca...

lol turns out this confuses Goose

#

one step forward, one step back

#

maybe because it thinks it replaced the full tool set, including selectTools?:

( O)> gimme a go container
I'll help you create a Go container environment. Let me use Dagger to set this up.
─── selectTools | dagger ──────────────────────────
tools:
    -
        container



Hmm, I see that I can create a container but I need some additional tools to properly set it up with Go. Let me search for any available extensions that might help.
─── search_available_extensions | platform ──────────────────────────



I apologize, but I notice that I currently don't have access to the full set of container tools needed to pull a Go image or configure it. While I can create a base container, I don't have access to the methods needed to:
1. Pull a Go image
2. Configure it with necessary settings
3. Install dependencies

Would you like me to:
1. Create just a basic scratch container (though it wouldn't be very useful without additional configuration)
2. Wait for additional container tools to be enabled
3. Try a different approach to help you with your Go development needs

#

when I change it back to ok it figures it out just fine

#

currently it's responding with {"tools":...} - maybe that should be {"addedTools":...}

#

oh yep that fixed it
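
A sketch of the reply shape that worked, assuming the response is a JSON object keyed by `addedTools`. The helper name and the schema contents are illustrative; the `name` / `description` / `inputSchema` fields follow the MCP tool definition shape.

```python
import json


def select_tools_response(added, schemas):
    """Build a reply that names only the newly added tools."""
    return json.dumps({"addedTools": [schemas[name] for name in added]})


# Hypothetical schema for a single tool.
SCHEMAS = {
    "container": {
        "name": "container",
        "description": "Create a scratch container (illustrative text).",
        "inputSchema": {"type": "object", "properties": {}},
    }
}

REPLY = select_tools_response(["container"], SCHEMAS)
print(REPLY)
```

Keying the list as `addedTools` rather than `tools` signals that these are additions, not a replacement for the existing tool set.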

#

https://tenor.com/n1aCiqLeuI3.gif

#

gonna try disabling the mcp tools changed notification and see if this fools goose into using them :elmofire:

eternal cipher Apr 17, 2025, 9:54 PM

#

summer holly lol turns out this confuses Goose

Ah man

#

Saved?

summer holly Apr 17, 2025, 9:54 PM

#

lol yep

#

it took "tools":... too literally

#

(very interesting that it respected it that hard, even...)

eternal cipher Apr 17, 2025, 9:55 PM

#

4.1?

summer holly Apr 17, 2025, 9:55 PM

#

Claude

#

3.5 Sonnet

summer holly Apr 17, 2025, 10:00 PM

#

summer holly gonna try disabling the mcp tools changed notification and see if this fools goo...

( O)> gimme a go container
I'll help you create a Go container using Dagger. Let me select the container tool and set it up.
─── selectTools | dagger ──────────────────────────
tools:
    -
        container




─── think | dagger ──────────────────────────
thought: ...




─── container | dagger ──────────────────────────


Execution failed: RPC error: code=-32602, message=tool 'container' not found: tool not found
◐  Gathering computational momentum...
2025-04-17T21:53:43.433047Z ERROR goose::agents::agent: Error: Request failed: Request failed with status: 400 Bad Request. Message: messages.5: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_01RdGLghXzbXdfK6FmE5QXTt. Each `tool_use` block must have a corresponding `tool_result` block in the next message.
    at crates/goose/src/agents/agent.rs:530

it does indeed - it only failed because I also disabled the bit that actually wires up the tool on the server side.

lapis narwhal Apr 17, 2025, 10:38 PM

#

summer holly ``` ( O)> gimme a go container I'll help you create a Go container using Dagger....

what makes you think it's also not wired up client side?

#

i guess it'd prolly fail earlier if that was the case?

summer holly Apr 17, 2025, 10:39 PM

#

it's in the core/mcpserver.go file in the MCP server handling path :thinkies:

summer holly Apr 17, 2025, 10:40 PM

#

lapis narwhal i guess it'd prolly fail earlier if that was the case?

yeah, afaict this is showing the client went ahead and called a nonexistent tool just based on being told it has them in the selectTools response

lapis narwhal Apr 17, 2025, 10:40 PM

#

i just can't tell from the output whether it actually tried to call it

summer holly Apr 17, 2025, 10:41 PM

#

this line:

─── container | dagger ──────────────────────────

and then I think that error must come from the mark3labs package

lapis narwhal Apr 17, 2025, 10:42 PM

#

the error matching would be enough for sure

summer holly Apr 17, 2025, 10:42 PM

#

https://github.com/mark3labs/mcp-go/blob/71b910bee8fee098e3412177dac8548453eee5c0/server/server.go#L851

lapis narwhal Apr 17, 2025, 10:43 PM

#

love it, text generation all the way down

#

2025-04-17T21:53:43.433047Z ERROR goose::agents::agent: Error: Request failed: Request failed with status: 400 Bad Request. Message: messages.5: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_01RdGLghXzbXdfK6FmE5QXTt. Each `tool_use` block must have a corresponding `tool_result` block in the next message.
this thing plagues me, and apparently many others

#

i don't understand how they haven't fixed it

summer holly Apr 17, 2025, 10:44 PM

#

ah that should be fixed on my branch

#

it's actually our bad

#

we're just raising an error internally instead of appending an errored tool response
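
The fix described here amounts to always answering each `tool_use` with a `tool_result`, even on failure. A minimal sketch follows, assuming Anthropic-style message blocks (`tool_use_id`, `content`, `is_error`); `run_tool_calls` and the sample tools are hypothetical.

```python
def run_tool_calls(tool_uses, tools):
    """Return one tool_result block per tool_use, even when a tool fails."""
    results = []
    for use in tool_uses:
        try:
            func = tools[use["name"]]
            content, is_error = str(func(**use["input"])), False
        except Exception as exc:  # includes KeyError for unknown tools
            content, is_error = f"{type(exc).__name__}: {exc}", True
        results.append({
            "type": "tool_result",
            "tool_use_id": use["id"],
            "content": content,
            "is_error": is_error,
        })
    return results


TOOL_USES = [
    {"type": "tool_use", "id": "toolu_1", "name": "container", "input": {}},
    {"type": "tool_use", "id": "toolu_2", "name": "missing", "input": {}},
]
RESULTS = run_tool_calls(TOOL_USES, {"container": lambda: "Container#1"})
print(RESULTS)
```

Returning the failure as an errored result keeps the message history valid and gives the model a chance to recover.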

lapis narwhal Apr 17, 2025, 10:45 PM

#

there are similar issues on claude code and goose too

#

ahhhh interesting that makes sense

#

apparently there are situations where anthropic's api itself returns responses that trip that, too

#

and yeah, i just switched back to claude with the non-error example i got working on openai and it keeps working

summer holly Apr 19, 2025, 9:13 PM

#

update: went back to calling it `select_tools` (previously `load_tools`, and before that `selectTools`). Gemini showed markedly worse evals with `load_tools`, yielding an incredibly opaque `FinishReason(10)` in the majority of runs, which means "the agent generated a malformed function call".

It's incredibly frustrating to troubleshoot this error since there isn't any other info returned besides what you get the agent to tell you (and likely hallucinate). But going on what it says it'll do before it blows up, it seems like with the name as load_tools the model was prone to jump right to using tools without loading them first. Renaming back to select_tools makes a dramatic difference.

Just an example of the butterfly effect you can have from tiny changes, and why evals are critical. This took hours to figure out, after many stabs in the dark 😵‍💫

summer holly Apr 21, 2025, 2:34 PM

#

Still need someone to ✅ this PR - CI is green

https://github.com/dagger/dagger/pull/10134

GitHub

llm: switch to `select_tools` scheme by vito · Pull Request #10134...

gray cedar Apr 21, 2025, 2:34 PM

#

summer holly Still need someone to ✅ this PR - CI is green https://github.com/dagger/dagger/...

no API breaking changes?

summer holly Apr 21, 2025, 2:37 PM

#

gray cedar no API breaking changes?

Nothing breaking - I changed historyJSON to return JSON instead of String but kept module compatibility

#

merged - ty!

lapis narwhal Apr 21, 2025, 3:59 PM

#

summer holly update: went back to calling it `select_tools` (previously `load_tools`) (previo...

can corroborate that the load_tools felt worse as well, glad to hear the evals backed up the vibes.

summer holly Apr 21, 2025, 4:00 PM

#

the goal was to prevent models from loading the same tools over and over, since they stick around, but oh well

#

select_tools

lapis narwhal Apr 21, 2025, 4:09 PM

#

there was some point on the mcp demo branch where we had the tool name change btw, but the descriptions/prompts weren't updated... idk if you noticed that on your end and fixed it for the evals

summer holly Apr 21, 2025, 4:09 PM

#

yeah, i fixed that and it made a big difference for some models, but Gemini still struggled a lot with FinishReason(10)

eternal cipher Apr 22, 2025, 7:06 PM

#

on 0.18.4 qwen is selecting tools nicely, which is awesome. It's tripping over outputs:

│🤖 {"completed": "Container#3"}

What is the tool it needs to do that correctly so I can prompt it? save?

summer holly Apr 22, 2025, 7:18 PM

#

eternal cipher on 18.4 qwen is selecting tools nicely, which is awesome. It's tripping over out...

yeah save is how it can return outputs, but it should be getting prompted on that already via the system prompt :notsureif:

eternal cipher Apr 22, 2025, 7:22 PM

#

I'm somewhat skeptical of ollama's openai compat api and system prompts... but that's just my gut and haven't tested anything. I'll guide towards save in my user prompt
