#select_tools


summer holly
#

Current status: still seems promising, but it's extremely dependent on the system prompt. We already depend on one (see message #1357369057334136882), so that's not a dealbreaker, but now it's even more important.

What I'm finding though is it's hard to come up with one system prompt that makes all models happy. That's still the goal, but yeah. So far I've been able to get a system prompt that makes OpenAI and Gemini happy, but not Claude. And I can craft a system prompt that makes Claude happy, but not Gemini.

The march continues...

#

Worth noting the issue with Claude when Gemini+OpenAI are happy is that it doesn't call returnToUser - so the same issue as on main. For some reason it's just not keen to call it. The system prompt needs to REALLY focus on it (like "look at returnToUser and work backward")

#

Still have things to try. Just pulling up the curtain a bit!

#

One of the breakthroughs was listing the return type for each tool in the selectTools description.
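
A rough sketch of what "listing the return type for each tool" could look like when the selectTools description is generated. The `ToolInfo` type, the description wording, and the `Container_stdout` entry are made up for illustration; only the idea of pairing each tool name with its return type comes from the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolInfo:
    """Hypothetical record for one selectable tool."""
    name: str          # e.g. "Container_withExec" (Foo_bar naming)
    returns: str       # return type shown to the model
    summary: str = ""  # optional one-line description


def describe_select_tools(tools: list[ToolInfo]) -> str:
    """Build a meta-tool description that lists each tool with its return type."""
    lines = ["Select tools to make available. Available tools:"]
    for tool in sorted(tools, key=lambda t: t.name):
        line = f"- {tool.name} -> {tool.returns}"
        if tool.summary:
            line += f": {tool.summary}"
        lines.append(line)
    return "\n".join(lines)


# Illustrative tool list, not the real schema.
TOOLS = [
    ToolInfo("Container_withExec", "Container", "run a command"),
    ToolInfo("Container_stdout", "String"),
]
DESCRIPTION = describe_select_tools(TOOLS)
print(DESCRIPTION)
```

With the return type in view, a model can plan a chain of calls (which tool yields the type the next one needs) before selecting anything.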

gray cedar
#

So in terms of dependency on system prompt, so far this is a regression right? Since so far only Gemini has needed one

summer holly
#

(see linked message)

gray cedar
#

I wonder if mcp.run has the same issue. They also claim a dynamic/programmable mcp server, but seem to have gotten it to work over MCP

#

Maybe worth looking at which tools they expose

#

For sure they don't rely on system prompt since they got it to work over MCP

summer holly
#

maybe they use the instructions API? thinkspin (but I would be surprised just from how chaotic everything is haha)

summer holly
#

so I don't think we should think of it as "MCP doesn't support system prompts"

gray cedar
#

yeah but do any clients actually implement it?

summer holly
#

if they don't, that's their problem, and more reflective of how new and incomplete the whole field is

#

Goose implements it

gray cedar
#

a few quick observations too:

  • I noticed in a trace that return wasn't always selected (seemed like it had to be explicitly selected in selectTools). I think probably that should be only for object functions, not meta-tools
summer holly
gray cedar
#

ah 🙂 but maybe remove it from the list then (or maybe it's not even in the list and they hallucinate it)

summer holly
#

yeah it's a hallucination lol

#

models say the darndest things - at one point Gemini (when poorly prompted) was calling selectTools(["Workspace_research", "Workspace_research", "Workspace_research"]) when told to "research and record three findings"
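
One cheap guard against that kind of call is to collapse repeated names before wiring anything up. This is only a sketch with a hypothetical helper name; it says nothing about how selectTools is actually implemented.

```python
def dedupe_selection(requested, already_selected=()):
    """Return the requested tool names that are new, in first-seen order.

    Names repeated within the request, or already selected earlier in the
    session, are dropped, so selecting the same tool again is a no-op.
    """
    seen = set(already_selected)
    fresh = []
    for name in requested:
        if name not in seen:
            seen.add(name)
            fresh.append(name)
    return fresh


print(dedupe_selection(["Workspace_research"] * 3))
```

It does not fix the underlying confusion (the model treating one selection per finding as progress), but it keeps the tool list clean while the prompt is being tuned.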

#

the models were all also ignoring the required self arg like 50% of the time 😬 - which I've worked around by having it default to the most recent object of the desired type
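
A minimal sketch of that fallback, assuming objects get IDs of the form `Type#N` (as in "Container#1") and are tracked in creation order. `ObjectStore` and its method names are hypothetical.

```python
class ObjectStore:
    """Hypothetical registry of objects the model has created, by ID."""

    def __init__(self):
        self._objects = {}  # "Type#N" -> (type_name, value), insertion ordered
        self._counts = {}   # type_name -> how many of that type exist

    def add(self, type_name, value):
        """Register a new object and return its ID, e.g. "Container#2"."""
        n = self._counts.get(type_name, 0) + 1
        self._counts[type_name] = n
        obj_id = f"{type_name}#{n}"
        self._objects[obj_id] = (type_name, value)
        return obj_id

    def resolve_self(self, type_name, obj_id=None):
        """Pick the receiver for a call on type_name.

        An explicit ID wins. If the model omitted it, fall back to the most
        recently created object of the desired type.
        """
        if obj_id is not None:
            kind, _ = self._objects[obj_id]  # KeyError for unknown IDs
            if kind != type_name:
                raise TypeError(f"{obj_id} is a {kind}, not a {type_name}")
            return obj_id
        for candidate in reversed(self._objects):
            if self._objects[candidate][0] == type_name:
                return candidate
        raise LookupError(f"no {type_name} has been created yet")
```

For example, after adding a Container, a Directory, and a second Container, `resolve_self("Container")` picks `Container#2`, while an explicit `Container#1` is still honored.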

#

also self is now named Foo, to match Foo_bar, which I thought would help. it didn't, but I decided to keep it that way anyway. (Gemini sometimes mangles words that are special words in Python, like return -> return_, hence the switch to returnToUser)
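
If the mangling really is tied to Python reserved words, tool names could be screened up front. This is a sketch under that assumption: the only mangling reported here is return -> return_, and using Python's keyword list as a proxy for what Gemini trips on is a guess. `safe_tool_name` and `RENAMES` are hypothetical names.

```python
import keyword

# Hand-picked replacements for names that collide with Python keywords.
RENAMES = {"return": "returnToUser"}


def safe_tool_name(name, renames=RENAMES):
    """Return a tool name that is not a Python keyword.

    Keywords must have an explicit replacement, so nothing gets silently
    renamed to something like "return_".
    """
    if keyword.iskeyword(name):
        if name not in renames:
            raise ValueError(f"tool name {name!r} is a Python keyword; add a rename")
        return renames[name]
    return name


print(safe_tool_name("return"))  # returnToUser
print(safe_tool_name("save"))    # save
```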

summer holly
#

but, it's a bit of a crutch, I would much prefer they remember object IDs and use them accurately

summer holly
# gray cedar I wonder if mcp.run has the same issue. They also claim a dynamic/programmable m...

From last time I looked into it, their tools aren't nearly as dynamic. It removes the pain of managing the MCP server config .json, by putting their single stateful entrypoint in the middle (tied to your account where you install servlets, i.e. tools, from their marketplace). But in the end those tools/servlets are already designed to be exposed over MCP; they don't have a sophisticated stateful object/API/function calling/output returning scheme like we do, so they shouldn't need a system prompt at all.

gray cedar
#

ah I see. I was wondering if they added an indirection in how llms consumed tools, so that different llm sessions could get different tools.

summer holly
eternal cipher
summer holly
#

I guess it amounts to doing a lot more prompting, I see you've got a hefty one there. Some of it is out of date (I see "Select Container#1")

eternal cipher
#

Haha yeah I'll sign the waiver that says qwen might not work properly. My goal is to find what amount of prompting gets different sizes of qwen to kind of work. I was surprised with this one because in the trace it looks like it actually did the right thing

summer holly
#

no idea what's going on here lol - I set ?debug on my local Cloud UI which shows literally all spans (even passthrough) and it's not like there's a hidden call or anything. You can see a revealed passthrough span at the bottom

#

like why's it saying "I have created ..." when it didn't even take any action thinkspin

#

The repeated selectTools might be something we can system prompt away

eternal cipher
#

ok, I wasn't sure if that was a quirk of the new API or not. Agreed with the repeated selects. Do you know where the tool calls at the end are actually happening?

summer holly
#

it seems like it failed to call withExec a couple times and then just returned an older Container

eternal cipher
#

ah, thank you, that explains it

summer holly
#

the cached statuses are just because the actual effects are cached, like at the buildkit level. so that part worked fine I think?

#

hmm it also bailed early because it called save which is normally a hint to our loop that it's done (and it can break). in this case it looks like it was treating it more like a checkpoint. it might have kept going if it didn't call it so early

#

i added that break there because otherwise I saw qwen just repeatedly calling return (which is now save) with the same value
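
The shape of that loop, as a toy sketch. `next_call` is a hypothetical stand-in for one model turn, and the real loop certainly does more than this.

```python
def run_loop(next_call, max_steps=20):
    """Run tool calls until the model calls "save" or the step budget runs out.

    next_call(history) returns a (tool_name, args) pair for the next turn.
    """
    history = []
    for _ in range(max_steps):
        name, args = next_call(history)
        history.append((name, args))
        if name == "save":
            # "save" is treated as "I'm done", so the loop stops here even
            # if the model meant it as a checkpoint.
            break
    return history


# Scripted model that saves early and would have kept going afterwards.
script = iter([
    ("withExec", {"args": ["go", "build"]}),
    ("save", {"value": "Container#2"}),
    ("withExec", {"args": ["go", "test"]}),
])
history = run_loop(lambda _history: next(script))
print([name for name, _ in history])
```

The third scripted call never runs, which matches the early bail described above: a break on save stops the repeated-save loop, but it also cuts off a model that saves as a checkpoint.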

eternal cipher
#

without debug, this is what was confusing me. It looks like it did the right thing and all was good

eternal cipher
#

i'll add some prompting to say only use save when you're done for-reals

summer holly
eternal cipher
#

ahh ok, makes sense

#

maybe ollama specifically is weird with system prompts?

summer holly
#

yeah maybe you have to write it differently

eternal cipher
#

i'll copy some parts of the default system prompt into the user prompt

summer holly
#

i'm not too familiar with ollama but i'd expect it to at least apply stronger weight to a system prompt still

#

so in my mind that would just make things worse, but SHRUGGERS

eternal cipher
#

yeah who knows, all of the "openAI compatible APIs" so far have taken the "compatible" part pretty loosely

eternal cipher
#

I've been doing a bunch of testing with local models on this branch. The issues I'm hitting aren't specific to this change, so I can start a new thread if that's better

The challenge I'm trying to overcome is that it's currently impossible for a local model (like qwen2.5-coder:14b on ollama) to complete the quickstart - even with toy-workspace instead of container.

To set expectations, I recreated the quickstart using openai's agent sdk and the same model has no problems at all completing the "make a curl clone" prompt.

I guess the question is: how urgent is this problem?

summer holly
eternal cipher
#

Yeah I've mostly tried 32b with dagger, even more customized versions of it from huggingface

#

fwiw I used qwen 32b a lot on 17-llm.X with success on single-object, so it's still a tool calling complexity issue

summer holly
#

hmm interesting

#

i'm trying to avoid the idea that smaller models will need us to maintain an entirely different scheme laughcry

partly because if that is the answer, we'd really need to justify the existence of the more complex one (e.g. it makes it better at more sophisticated tasks, somehow larger models prefer the more sophisticated scheme, or the more sophisticated scheme saves you a ton in token cost). but maybe they're low ish maintenance once you figure it out

eternal cipher
#

I even got 3b (three) to work with the openai agents sdk, but not consistently

summer holly
#

like this part is interesting - it seems to think it called those tools, instead of just making them available

#

there might be a simple fix for that if so

#

right now selectTools just replies with "ok" lol

#

it could either:

  1. reply with "You now have the %s tool available."
  2. reply with a list of tool schemas - which might double as a workaround for MCP clients that don't support the tools-changed notification (or it's racy etc)
#

maybe those will help - i'll try 2) first since the secondary effect would be neato (compatibility with more clients)

eternal cipher
#

and the models I've been trying

hf.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF:Q4_K_M
hf.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF:Q8_0
qwen2.5-coder:32b
qwen2.5-coder:14b
eternal cipher
summer holly
#

@eternal cipher pushed

#

gonna see if this makes dagger mcp usable in Zed 👀

lapis narwhal
summer holly
#

...gotta make a zed extension first lol

lapis narwhal
summer holly
#

couldn't find it in their docs

lapis narwhal
#
{
  "context_servers": {
    "dagger": {
      "command": {
        "path": "dagger",
        "args": ["mcp", "-m", "~/src/dagger"],
        "env": {
          "ANTHROPIC_API_KEY": "REDACTED",
          "NO_COLOR": "true"
        }
      },
      "settings": {}
    }
  }
}

in your zed config

summer holly
#

ah cool - i just told an agent to do it for me and let it rip, since we might want it eventually lol

#

if it gets stuck ill give that a go instead

eternal cipher
summer holly
# eternal cipher Yooooo that did it 🎉 https://v3.dagger.cloud/kpenfound/traces/ea99e098ec75c34ca...

HOT DAMN!

#

can't figure out how to get zed to actually use a MCP server thinkies

#

i configured it but it never runs it

summer holly
#

one step forward, one step back

#

maybe because it thinks it replaced the full tool set, including selectTools?:

( O)> gimme a go container
I'll help you create a Go container environment. Let me use Dagger to set this up.
─── selectTools | dagger ──────────────────────────
tools:
    -
        container



Hmm, I see that I can create a container but I need some additional tools to properly set it up with Go. Let me search for any available extensions that might help.
─── search_available_extensions | platform ──────────────────────────



I apologize, but I notice that I currently don't have access to the full set of container tools needed to pull a Go image or configure it. While I can create a base container, I don't have access to the methods needed to:
1. Pull a Go image
2. Configure it with necessary settings
3. Install dependencies

Would you like me to:
1. Create just a basic scratch container (though it wouldn't be very useful without additional configuration)
2. Wait for additional container tools to be enabled
3. Try a different approach to help you with your Go development needs
#

when I change it back to ok it figures it out just fine

#

currently it's responding with {"tools":...} - maybe that should be {"addedTools":...}

#

oh yep that fixed it
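
For reference, a sketch of the reply shape being discussed. The `name`/`description`/`inputSchema` fields follow the MCP tool definition; the helper and the sample schema are made up, and the only detail taken from the thread is the `addedTools` key.

```python
import json


def select_tools_result(selected, schemas, key="addedTools"):
    """Build the text returned by the tool-selection call.

    selected: tool names the model asked for.
    schemas: name -> MCP-style tool definition.
    The key reads as "these were added", not "this is now the full tool set".
    """
    missing = [name for name in selected if name not in schemas]
    if missing:
        raise KeyError(f"unknown tools: {missing}")
    return json.dumps({key: [schemas[name] for name in selected]})


# Illustrative schema only.
SCHEMAS = {
    "container": {
        "name": "container",
        "description": "Create a scratch container.",
        "inputSchema": {"type": "object", "properties": {}},
    },
}
RESULT = select_tools_result(["container"], SCHEMAS)
print(RESULT)
```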

#

gonna try disabling the mcp tools changed notification and see if this fools goose into using them elmofire

eternal cipher
#

Saved?

summer holly
#

lol yep

#

it took "tools":... too literally

#

(very interesting that it respected it that hard, even...)

eternal cipher
#

4.1?

summer holly
#

Claude

#

3.5 Sonnet

summer holly
# summer holly gonna try disabling the mcp tools changed notification and see if this fools goo...
( O)> gimme a go container
I'll help you create a Go container using Dagger. Let me select the container tool and set it up.
─── selectTools | dagger ──────────────────────────
tools:
    -
        container




─── think | dagger ──────────────────────────
thought: ...




─── container | dagger ──────────────────────────


Execution failed: RPC error: code=-32602, message=tool 'container' not found: tool not found
Gathering computational momentum...
2025-04-17T21:53:43.433047Z ERROR goose::agents::agent: Error: Request failed: Request failed with status: 400 Bad Request. Message: messages.5: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_01RdGLghXzbXdfK6FmE5QXTt. Each `tool_use` block must have a corresponding `tool_result` block in the next message.
    at crates/goose/src/agents/agent.rs:530

it does indeed - it only failed because I also disabled the bit that actually wires up the tool on the server side.

lapis narwhal
#

i guess it'd prolly fail earlier if that was the case?

summer holly
#

it's in the core/mcpserver.go file in the MCP server handling path thinkies

summer holly
lapis narwhal
#

i just can't tell from the output whether it actually tried to call it

summer holly
#

this line:

─── container | dagger ──────────────────────────

and then I think that error must come from the mark3labs package

lapis narwhal
#

the error matching would be enough for sure

lapis narwhal
#

love it, text generation all the way down

#
```
2025-04-17T21:53:43.433047Z ERROR goose::agents::agent: Error: Request failed: Request failed with status: 400 Bad Request. Message: messages.5: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_01RdGLghXzbXdfK6FmE5QXTt. Each `tool_use` block must have a corresponding `tool_result` block in the next message.
```
this thing plagues me, and apparently many others
#

i don't understand how they haven't fixed it

summer holly
#

ah that should be fixed on my branch

#

it's actually our bad

#

we're just raising an error internally instead of appending an errored tool response
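
Roughly, the difference looks like this. It is a sketch and not the actual fix on the branch; the block shape (`tool_result`, `tool_use_id`, `is_error`) follows Anthropic's Messages API, while the function name and the sample tool are hypothetical.

```python
def run_tool_call(tool_use, tools):
    """Execute one tool_use block and always return a tool_result block.

    Failures (including an unknown tool name) become an errored result
    instead of an exception, so every tool_use ID gets answered.
    """
    try:
        output = tools[tool_use["name"]](**tool_use["input"])
        is_error = False
    except Exception as exc:
        output = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": str(output),
        "is_error": is_error,
    }


tools = {"echo": lambda text: text}
ok = run_tool_call({"id": "toolu_example_1", "name": "echo", "input": {"text": "hi"}}, tools)
bad = run_tool_call({"id": "toolu_example_2", "name": "container", "input": {}}, tools)
print(ok["is_error"], bad["is_error"])
```

Because the failed call still gets a matching `tool_result`, the next request stays well-formed and the model can read the error and try something else.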

lapis narwhal
#

there are similar issues on claude code and goose too

#

ahhhh interesting that makes sense

#

apparently there are situations where Anthropic's API itself returns responses that trip that, too

#

and yeah, i just switched back to claude with non-error example i got working on open ai and it keeps working

summer holly
#

update: went back to calling it select_tools (previously load_tools) (previously selectTools). Gemini showed markedly worse evals with load_tools, yielding an incredibly opaque FinishReason(10) in the majority of runs, which means "the agent generated a malformed function call".

It's incredibly frustrating to troubleshoot this error since there isn't any other info returned besides what you get the agent to tell you (and likely hallucinate). But going on what it says it'll do before it blows up, it seems like with the name as load_tools the model was prone to jump right to using tools without loading them first. Renaming back to select_tools makes a dramatic difference.

Just an example of the butterfly effect you can have from tiny changes, and why evals are critical. This took hours to figure out, after many stabs in the dark 😵‍💫

summer holly
summer holly
#

merged - ty!

lapis narwhal
summer holly
#

the goal was to prevent models from loading the same tools over and over, since they stick around, but oh well

#

select_tools

lapis narwhal
#

there was some point on the mcp demo branch where we had the tool name change btw, but the descriptions/prompts weren't updated... idk if you noticed that on your end and fixed it for the evals

summer holly
#

yeah, i fixed that and it made a big difference for some models, but Gemini still struggled a lot with FinishReason(10)

eternal cipher
#

on 18.4 qwen is selecting tools nicely, which is awesome. It's tripping over outputs:

│🤖 {"completed": "Container#3"}

What is the tool it needs to do that correctly so I can prompt it? save?

summer holly
eternal cipher
#

I'm somewhat skeptical of ollama's openai compat api and system prompts... but that's just my gut and haven't tested anything. I'll guide towards save in my user prompt