#mcp
Just to confirm our understanding of the v2 of MCP:
- Alex's current PR exposes one MCP server that exposes the graphQL API
The new implementation is:
dagger mcp can take 2 args:
- --m to specify a module
- --c to specify a command? (not sure about that one)
Once the dagger mcp is running, it exposes the given module as an MCP server.
So from Alex's PR, we still expose the module
But, the underlying pipeline / plumbing is us creating an LLM instance of that given module + potentially the with-prompt / with-* binding ?
cc @proper ledge @slow sierra @clever oasis Just to be sure (sorry for the direct ping)
Not 100% sure about the args
the dagger mcp in my PR does support -m, but it does use GraphQL under the hood yeah and just implements the tools I did for the hack day (+ half baked support for shell). I think we want it to use the existing BBI code instead to discover and call tools
there's a lot of hand-waving in that second part
it would be pretty neat if it worked using the API somehow, since the BBI code is reliant on direct access to a *dagql.Server, which is more of an engine-side concept
We just went over the hand waving part. @tardy shadow and @proper ledge are taking a stab at it
gonna start from terminal as a transport and see where it leads
In the llm branch we see there's a core/bbi/gql driver that's not being used anywhere but it already has some MCP server stuff. Do you guys think we should reuse this driver?
Ah I think this is probably related to @clever oasis 's implementation because i saw the same bbi tools in the PR, on the cli side
ah, that looks like an old implementation yoinked from my hackathon code, my PR should probably replace that (as in the code I added should probably be moved over it)
no worries, just thinking out loud.
Regarding using the API, we were thinking on relying on BBI's flat driver which already accesses *dagql.Server. Am I misunderstanding something ?
the CLI won't have access to a *dagql.Server since that's on the engine side. it will have a *dagger.Client, limited to the GraphQL API
maybe BBIs can be implemented externally instead?
- they're pretty high level tbh
right, we were thinking of doing this inside the engine and use Terminal or some session attachable to proxy
oh ok
Do you think it makes sense to add this MCP integration as part of the LLM type, or create its own ?
don't think it matters much yet, hard to say from here - can always adjust
i recommend llm type
because otherwise you will have to duplicate the middleware magic
(dynamic type that changes based on other available types)
also, you shouldn't need to worry about bbi implementation
you should be able to only consume the bbi interface
to over-simplify: you want a new method in core.Llm that receives an io.Reader and io.Writer (or some other representation of a stdio stream), and instantiates & runs an MCP server with those streams as frontend and the bbi session as backend
note that the mcp.Tool type has a map[string]any field for its arguments schema that I think can be directly copied from bbi.Tool
(I'm willing to bet it's the same unmarshaled json schema under the hood)
@dark egret with @proper ledge , we've been working on the BBI <-> MCP implementation ; what we have is present on https://github.com/tiborvass/dagger/tree/mcp-debug and same on mine: https://github.com/grouville/dagger/tree/mcp-debug
We've been blocked by the glibc issue this morning (that Erik gave a fix for 5 min earlier). We're currently trying to use the CLI as a proxy to the mcp server in the engine (under the LLM type's MCP() function for now)
To do that, for now hijacking the terminal sessionattachable
As soon as we have a hello world (MCP) working, we'll add the dagger modules to the server
reading the code, still looks open ended as to what "hello world" means here, given that the hacked-in-MCP-tool is a stub. am i guessing correctly that y'all are hoping to wire dagger modules as tools back through to an LLM running in dagger? then maybe wire those out to a dagger-client-as-MCP-server (@clever oasis's PR) for external consumption?
not yet attacking running 3rd party MCPs inside dagger and wiring them through the same pipes?
also wondering if https://github.com/mark3labs/mcp-go/tree/main/client is in y'all's plans or if it's unnecessary due to how BBI already handles tools
This is the hello world part: https://github.com/tiborvass/dagger/blob/08832a0bbc29006279802189fa81686efd3166d6/core/llm.go#L523
We are currently hard-coding a tool that simply returns Hello $name! but eventually yes, we will expose all the modules that the BBI instance will know about (right now it is limited to only one module, but that will change). This is all happening in the engine, and the CLI proxies it to the client.
We probably should use the mcp client for the tests, but right now we are literally pasting in the JSON that we know it expects
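Instead of pasting JSON by hand, the requests could be generated. A minimal sketch, assuming newline-delimited JSON-RPC 2.0 messages over stdio; `rpcRequest` and `newRequest` are hypothetical helpers, and the client name and version are placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a hypothetical JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params"`
}

// newRequest marshals one request; nil params become an empty object.
func newRequest(id int, method string, params any) (string, error) {
	if params == nil {
		params = map[string]any{}
	}
	b, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	return string(b), err
}

func main() {
	initParams := map[string]any{
		"protocolVersion": "2024-11-05",
		"capabilities":    map[string]any{},
		// Placeholder client identity, chosen for illustration only.
		"clientInfo": map[string]any{"name": "test-client", "version": "0.0.1"},
	}
	steps := []struct {
		method string
		params any
	}{
		{"initialize", initParams},
		{"tools/list", nil},
	}
	for i, s := range steps {
		line, err := newRequest(i+1, s.method, s.params)
		if err != nil {
			panic(err)
		}
		fmt.Println(line) // one message per line, ready to pipe into the server's stdin
	}
}
```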
dope then i think the thing i'm contemplating hacking at might be non-overlapping... imagine i've got a toy agent built in dagger calling modules as tools. i want that agent to interact with linear or gdrive, but i don't want to learn how their APIs work or figure out how to wire their API clients into my dagger SDK of choice. Both these things have MCP servers that already present these APIs to LLMs.
if i could configure and run those servers in module code and wire them up to my dagger-managed LLM, i can skip writing a bunch of module code.
if i'm understanding y'all's plans correctly, you're building all the pieces to easily turn module code into an MCP, but not the pieces that let you take an MCP off the shelf for use by a dagger LLM?
(hypothetically there's also a use case for an off-the-shelf MCP configured and running in dagger, but wired out to an external client, but that's maybe less exciting as a demo)
Correct
sick
@dark egret just saw your update from this morning, you're working on MCP support also, complementary with what @tardy shadow and @proper ledge are doing?
He's doing client MCP and we're doing server MCP so it's complementary. That's what we figured out from the conversation last week (if I understood correctly)
essentially, yeah, but that's getting indefinitely punted to work on actual llm-branch-merging issues
it's kinda spooky to do it on top of so many stacked and unstable changes, so i'd rather get things onto trunk first anyways
yeah makes sense
@proper ledge @tardy shadow cursor has a refresh button on their MCP UI that reloads tools, which at least implies that refreshing continuously isn't entirely out of reach...
also i just went and tried to corroborate @slow sierra 's statement that the MCP authors want to support dynamic tool changes, and i didn't even have to browse github issues, they've already got the notifications/tools/list_changed backchannel in their docs ... i don't think this existed last time i played with this
Yes that's correct! I need to check if mcp-go implements list_changed, but it would make sense that they did.
it did exist at the time of the hackathon, but Claude Desktop didn't support it (not surprisingly)
interesting
perhaps clients are just lagging behind servers, wouldn't be the first time
i think Claude Desktop is kind of just Baby's First MCP Client
maybe dagger/BBI can be the first client that actually uses the whole protocol
i'd be curious if zed/cursor/goose use any more of the protocol than claude desktop... like if resources worked properly it might save a whole shit ton of pain around prompting the llm on how to use a given tool or set of tools
or their whole sampling server-completion thing
i've had a TODO to investigate that for a while now haha
do you remember if cursor supported MCP at the hackathon? i think they sneakily launched support either like right before or right after, because in my head they still don't support it, but on git and in their docs it looks like they started supporting it on 1/30
sounds about right - pretty sure it came after. Hackathon was on 1/21
From empirical tests today -- it seems that they don't have a .sync implementation client side
which is kinda weird -- goose doesn't do that either
@clever oasis I didn't hear your response when Solomon was talking about the fact that those clients don't re-explore the types of complex objects (the way dagger does in the sync). Does it change something for the mcp implem? We'll explore solutions on our end too
Currently digging other client's code to ensure that what I'm saying is true -- but from empirical tests it's the case
what do you mean by .sync? refreshing the set of tools?
i think he's referring to LLM.Sync()
that's what i thought but the context is the cursor 'refresh tools' button 
Does it change something for the mcp implem?
it might make us necessary champions of MCP notifications/tools/list_changed, otherwise i suspect MCP servers wired out from dagger to goose or claude desktop or whatever will need to be obscenely careful about restricting their return types to the point of uselessness
like i doubt models today can handle us giving them our entire core API as tools in all of its nested inter-relatedness, much less all the additional APIs implied by custom module return types... the thing where we refresh tools based on available objects to manipulate is a big part of what makes llm.Sync work
UPDATE: I clarified in a live conversation with Guillaume and Tibor that the issue is a) their MCP implementation doesn't implement notifications/tools/list_changed, and b) it's unclear which clients actually honor it
https://github.com/orgs/modelcontextprotocol/discussions/76#discussioncomment-12502992 -- 7 hours ago -> almost no client supports notifications/tools/list_changed. I am implementing it nonetheless atm
Maybe we should provide the first MCP client that does it properly?
That would be hilarious
Here is 1) a MCP server implementation that leverages the power of dynamic tools + 2) a MCP client that actually can use it
On the MCP server in dagger -- currently seeing how to stretch the library to actually fit that model -- potential upstream contribution / digging atm
@tardy shadow first step is to implement that protocol on server side right?
Yes
So, we're actually notifying at the moment (the library does it), but only at the MCP server initialization for now
What we want, in our MCP server support, is an equivalent of tools.List(), which would be:
- every time a tool call responds (with a different type), also send this notification update and add the tools to the server
I'm implementing that atm -- exploring how the mcp-go lib handles that / as the lib has a handler concept and this might need some stretching (it might need to be changed a bit (the maintainer seems to be quite open to contributions)) -- but potential temporary fork is doable too
Sorry for the confusion this morning -- we needed to double check. I doubt though that goose handles it either. Will triple check once done
It would be funny (but maybe not practical) to have a "mcp/llm adapter"
ie a double-ended proxy: it's a mcp proxy but also a llm proxy
and it connects the mcp clients to the llm clients
It seems like an implementation detail of Tibor's idea that I shared briefly this morning -> which, if I'm correct, is to generate an MCP SDK (curious how Connor was thinking about implementing the MCP client support)
no this is different - this is purely to implement the mcp server side
it's just that dagger already has the code to dynamically notify a llm server of new tools
what if we think of MCP as a complicated way to reach a LLM that you can't actually connect to
(ie the LLM behind goose)
we can't prompt it
but we can give it tools
(dropping that whole train of thought - ignore)
Ok, goose supports dynamic loading -- testing it out atm
oh!
did chatgpt get it right?
mmmh, I'm not on the nightly so surprised. But, notifications seem to have updated the list tool state (dynamic extension in a given session)
This is exactly my intent, if yall don't get there faster than me happy to put hands on keyboard
I want to run the servers as containers and wire tool calling through BBI. Beyond that, not sure, the other elements of the protocol don't necessarily wire through BBI
But BBI only works when targeting a Dagger API backend. If the backend is a MCP server, I don't think BBI can help
and perhaps too ambitious to try to call MCPs as though they were dagger modules… but perhaps it's an interop layer adjacent to BBI then
I am actually building it rn (as a standalone binary), to be able to test the dagger MCP server implementation without the side effects of those overloaded clients; should have something by tomorrow -- the dynamic loading via the notification should work and we currently have surprising behaviors on community clients; still don't know if it's us or just them
Coupled with the golden tests, it's a nice foundation for our test suite
@tardy shadow @dark egret would love to discuss with you both tomorrow
@tardy shadow @proper ledge @dark egret https://x.com/fkadev/status/1904790444103311644?s=46
MCP coming on VSCode very soon
nice
https://news.ycombinator.com/item?id=43488151
I think there is a fundamental misunderstanding that MCP services are plug and play. They are not. Function names and descriptions are literally prompts so it is almost certain you would need to modify the names or descriptions to add some nuances to how you want these to be called. Since MCP servers are not really meant to be extensible in that sort of way, the only other alternative is to add more context into the prompt which is not easy unless you have a tone of experience. Most of our customers fail at prompting.
good example of where having types composition, and core types for files & directories can make for better MCP servers: https://x.com/lukeharries_/status/1906017561138897372?s=46
@magomaev Not implemented into the MCP server (or images), but the underlying protocol whatsmeow supports all media types so would be possible to extend
Hi team, with MCP support released in vscode https://github.com/microsoft/vscode-docs/blob/vnext/docs/copilot/chat/mcp-servers.md last week, there would be a new large surface area opened up for building agentic dev workflow solution right inside the ide. I've been prototyping some dev workflow scenario with dagger modules and made some demo-able progress, as I just found out this vscode MCP preview, I'm eager to build an end-to-end dev experience through this portal. The most intuitive way of integrating it with my dagger module would be leveraging dagger as a MCP server I guess ? which leads me to this thread for MCP support update, any concrete timeline of when this is going to be available ? Happy to hear more insights too
@zinc glacier we agree, we are going all out on MCP support
experimental MCP server support is merged in main. You can run dagger mcp -m MODULE to expose any Dagger module as a MCP server with zero additional code
we're going to add client support soon cc @dark egret @tardy shadow
I think the most exciting use cases for Dagger & MCP will be:
- Packaging and distribution of MCP servers cross-platform
- End-to-end testing of MCP servers, with multi-model evals
- MCP composition. Right now MCP servers are monolithic and can't really be composed with each other. Dagger's object model and typed composition change that
- Multi-agent. Your MCP server may also be an agent. Dagger can be a runtime for both
- End-to-end observability. Get a deep trace from LLM to MCP, all the way to individual function granularity
My bad! Actually I did read through MCP thread here but missed latest msg in agent main channel where you guys had the PR merged already!
It merged just this friday I think. No worries our discord can be chaotic and fast moving at times
so many channels, so many threads
Great summary! I'm still new to this agentic composition world which had lots of new hypes coming up lately including dagger and MCP. I saw some of your tweets about MCP vs dagger, and my itch is to see how I could leverage both potentials in synergy.
i am literally hacking at mcp client support rn, something like dag.LLM().WithMCP(*dagger.Container)
got the wiring up to the buildkit part (read: the hard part lol)


warning, long rambling message that is largely me trying to concretize the 10 billion LoC I've read today so it's not waste of time tomorrow



the pipe-y bits of mcp client functionality @clever oasis @thorn sand @proper ledge @tardy shadow:
trying to do stdio mcp servers first. ideally we are able to use a 3rd party MCP client impl like go-mcp, but coupling there between the client impl and exec.Cmd may force us to use a different lib or roll our own. ANYWAYS:
the hard part is the IPC element: in the engine process, i want to be able to call api methods and have the underlying impl talk to a bk-ran container on stdin and stdout. there are lots of session-attachable-adjacent things that do similar comms, but most of those are about exchanging data between an interactive mainclient and the engine. here i'm trying to communicate between the engine and a noninteractive bk-exec'd process. having dug through the code, there's definitely something sorta analogous on the backend of our terminal impl, but i'm not sure which of these MANY pipe/listen/forward situations in our codebase might be ones that actually wire up a bk process's stdios through the engine, even if they land somewhere else like a session attachable...
for terminal, does connect-to-bk-container-stdio always go through Service.startContainer? the terminal code makes me think that it does... whenever i trace down the stack from that Start call, I find all the schema callsites where one can start a container and svcID is passed as a callID of a directory or a container or whatever, and at the other end of the stack looks to be svc.startContainer's
    svcProc, err := gc.Start(execCtx, bkgw.StartRequest{
        Args:         execOp.Meta.Args,
        Env:          env,
        Cwd:          execOp.Meta.Cwd,
        User:         execOp.Meta.User,
        SecretEnv:    execOp.Secretenv,
        Tty:          interactive,
        Stdin:        stdinCtr,
        Stdout:       stdoutCtr,
        Stderr:       stderrCtr,
        SecurityMode: execOp.Security,
    })
which looks like the only place we throw stdio pipes into buildkit, other than debugContainer... so do y'all think I should put stdio MCP servers through svc.startContainer too? provided this reading of the code is correct, sessionattachables are fully orthogonal to all this.
@dark egret Possibly you could reuse our PipeIO session attachable from our MCP PR. You'd still need to wire it to the buildkit side of things, i can take a look at it if you want. Or we can jump on a call and you show me your code.
that's where i started, and im trying to rule it out as orthogonal
I like the strategy of starting with one specific transport (stdio). Don't forget to keep an eye out for generalization at the same time - a delicate balance I know
especially with recent changes to the protocol to allow stateless HTTP - that's going to become increasingly popular in the future (ie. Cloudflare, Vercel & others selling mass backend hosting for them etc)
lol 💯, going through svc.startContainer has a certain appeal there too, given that SSE is gonna wanna deal with netNSes, hostnames, sockets and whatnot
Are you planning on hooking up to already running MCP servers ?
@clever oasis pushed
sorta, i'm planning on having bk solve user-provided container definitions and then wire those up to the engine-orchestrated LLM
    func (m *McpCaller) MCPGSearch() *dagger.Container {
        return m.WolfiNode().WithExec([]string{"npx", "-y", "g-search-mcp"})
    }

    func (m *McpCaller) MCPAWS() *dagger.Container {
        return dag.Container().From("ghcr.io/alexei-led/aws-mcp-server:latest")
    }

    func (m *McpCaller) LLMWithMCP() *dagger.LLM {
        return dag.LLM().
            WithMCP(m.MCPGSearch()).
            WithMCP(m.MCPAWS()).
            WithPrompt("list your available tools")
    }
wrote those before i started digging through this service thing, though, and if these all go through svc, you'd prolly do ctr.AsService(stdio: true) to make the semantics more clear?
Ok so if LLM is the MCP client, and the MCP servers live in the container, then yeah i don't think you need any of the session attachable stuff. (sorry i'm slow, just catching up). And now your question is what's the best way to expose an MCP server in a container, to the engine's LLM correct ?
you're suggesting adding an Stdio bool to https://pkg.go.dev/dagger.io/dagger#ContainerAsServiceOpts ?
correct!
although im not entirely sure i'd expose the stdio bool all the way out through the api on the first pass... i'm not sure it has any userspace applications until there's an equivalent flag on dagger call myService up, and i think implementing that would probably use y'all's pipeio session attachable
but that's like step 4 lol, i'm on step 0 still
yeah i think a__ prefix hides it from codegen? or at least it can for functions...
or maybe it's fine to expose, it'd definitely be fine to expose if it's part of the actual interface, like WithMCP(m.MCPGSearch().AsService(stdio: true))
if you're doing step 0, do whatever works
anyways tomorrow imma start with tryna wire this through Service. this morning i was trying to do it inside engine/buildkit/client.go but that did not yield fruit
happy to help, i'm still new to the code but can dig around
❤️ appreciate it, i am also still pretty new to it and i pretty consistently get on these deep branches where i want somebody to sanity check me so i don't spend multiple days working off of bad assumptions lol
@proper ledge @tardy shadow did y'all develop opinions about mark3labs/mcp-go during your hacking? im gonna bring in the metoro.io/mcp-golang lib on this branch because the mark3labs stdio client assumes way too much about how you wanna execute the server.
Write Model Context Protocol servers in few lines of go code. Docs at https://mcpgolang.com - metoro-io/mcp-golang
there's also this one https://github.com/riza-io/mcp-go/tree/main which has few stars for whatever reason but looks to be applied and built by folks with similar motivations as us
@dark egret we haven't considered other libs as it seemed to be the official one. It was making releases every couple of hours that's how fast it was changing. My main issue was not having a DefaultArray and the fact that cmd.Stderr was not being set which echoes what you're saying about it not being too flexible, but i thought of pushing a PR to make it so. I already have a PR out for the DefaultArray it just needs tests. I don't really care tbh
afaict they haven't blessed one lib or another, but the mark3labs one is definitely most popular. i'm inclined to use the most popular one, but yeah gonna have to PR some bits if we wanna stay on there, so might just be easier to switch to the metoro one.
tbh the changes necessary to get a stdio client might be kinda large, the existing impl has like 0 respect for coupling or composition lol https://github.com/mark3labs/mcp-go/blob/main/client/stdio.go
yeah i didnt really need it other than to debug. Will try to think of a way
FYI we think env api broke our dagger mcp cli. We're investigating
no worries, i think this is the fix: https://github.com/dagger/dagger/pull/10050
dagger --progress plain -c 'h=$(github.com/shykes/hello); e=$(env | with-module-input hello $h "module to expose as an MCP server"); llm | with-env $e'
yields this error:
error: parse selections: parse field "withModuleInput": init arg "value" value as dagql.ID[*github.com/dagger/dagger/core.Module] (ModuleID!) using dagql.ID[*github.com/dagger/dagger/core.Module]: expected "Module" ID, got Hello! ID
Does this mean a Hello ID is not a Module ID from the Env's perspective ?
@clever oasis any ideas? (sorry for the endless pings)
yeah, I don't think you want withModuleInput there, I think you want "with"+modDef.MainObject.AsObject.Name+"Input"
ah, that would make more sense indeed
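To make the suggestion concrete: the field name is derived from the module's main object name rather than being the fixed withModuleInput. A tiny sketch, where `envInputField` is a hypothetical helper and "Hello" is assumed to be the main object name of the hello module (as the error message suggests):

```go
package main

import "fmt"

// envInputField builds the field name following the
// "with"+modDef.MainObject.AsObject.Name+"Input" pattern from the message above.
func envInputField(mainObjectName string) string {
	return "with" + mainObjectName + "Input"
}

func main() {
	fmt.Println(envInputField("Hello")) // withHelloInput
}
```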
i'm just hacking rn, but it turns out the metoro mcp lib one is also not maintained enough, first party examples require initialize params it doesn't provide and it also somehow made the AWS python one choke 🤷 ... gonna try that riza one next, but it's increasingly plausible we should have our own
early adopter penalties are funny, at least we're not actually meaningfully late to the party
I'll try to fork mark3labs and get it to work for you, if we can upstream then good
happy to help with that too once i've got a demo working
i also wanna make sure we can pass a custom httpserver to the sseserver because it's simply hiding it rn
Yeah i'd say just get it working and we can clean things up
lol hilarious that it's the same implementation oversight server side
yeah it feels like they were content using the Rob Pike options design pattern, and declare themselves "idiomatic Go"
some of it is also clearly AI slop, and coding via giving claude instructions and TS code references to translate makes you way more likely to get something that works but doesn't compose properly
oh i didnt even think of that but i think you're probably right
you can tell by the abundance of correctly-capitalized comments lol
https://github.com/dagger/dagger/pull/10043
got a mega hacky example working e2e
tomorrow i gotta play around with how these might interact with Environment
the mcp client lib woes never end, the one i finally got working uses a default bufio.Scanner, max token size 64kb, and if a single response is longer than that it blocks indefinitely 
The Go team has just announced that they are going to build the official MCP Go SDK. So many people have been asking for it, and it's finally happening! https://t.co/rwPAvWPET1
Praise be I was not looking forward to writing a 4th one
Zed launched their new agent beta. Seems like MCP support is in. Strongly recommend joining their discord
There's an "agent-beta" channel there
Fun fact they call it "Context Servers" https://zed.dev/extensions?filter=context-servers
Signup is here https://zed.dev/ai/agent
signed up!
grrrrr now i'm hanging out in here watching ppl be all excited... the review multibuffer sounds like it's the exact polish i've been wishing for on top of the assistant editing shit i've already been doing in neovim for a year now
woooo got an invite
it's pretty slick, combo'd with claude 3.7 sonnet thinking it's quite good at code navigation out the box, like surprisingly little coaxing required to get it to load the right shit
the new zed agent's default toolset in "ask" mode:
```
Based on the function definitions currently available to me, I have access to the following tools:
- diagnostics: Gets errors and warnings for the project or a specific file.
  - Can check a specific file or provide a project-wide summary
  - Helps identify issues in your code that need fixing
- path_search: Finds files matching a glob pattern in the project.
  - Returns matching paths in paginated results (50 matches per page)
  - Useful for locating files with specific patterns or extensions
- thinking: A scratchpad for problem-solving without executing actions.
  - Helps work through complex problems or develop strategies
- fetch: Retrieves content from a URL and returns it as Markdown.
- read_file: Reads the content of files in the project.
  - Can read entire files or specific line ranges
  - For large files, returns an outline of symbols with line numbers
- now: Returns the current datetime in RFC 3339 format.
  - Can use either UTC or local timezone
- regex_search: Searches the entire project for text matching a regular expression.
  - Returns matching paths with excerpts containing matches
  - Results are paginated (20 matches per page)
These tools help me understand, navigate, and analyze your codebase to provide better assistance.
```
and in "write" mode
Based on the function definitions currently available to me, I have access to the following tools:
1. **find_replace_file**: Edits a specific part of a file by finding unique text and replacing it.
    - Requires unique text to find (with context) and replacement text
    - Good for targeted changes to files
2. **diagnostics**: Gets errors and warnings for the project or a specific file.
    - Can check a specific file or provide a project-wide summary
    - Helps identify code issues that need fixing
3. **code_symbols**: Provides an outline of code symbols in the project.
    - Can show public symbols across the project or all symbols in a specific file
    - Shows line number information for navigating large files
4. **symbol_info**: Provides detailed information about code symbols using LSP.
    - Finds declarations, definitions, references, implementations, and type definitions
    - More reliable than text searches for code navigation
5. **path_search**: Finds files matching a glob pattern in the project.
    - Returns matching paths in paginated results
6. **thinking**: A scratchpad for problem-solving without executing actions.
    - Useful for working through complex problems
7. **fetch**: Retrieves content from a URL and returns it as Markdown.
8. **read_file**: Reads the content of files in the project.
    - Can read entire files or specific line ranges
9. **create_file**: Creates a new file with specified content.
    - The most efficient way to create new files
10. **bash**: Executes a bash command and returns the output.
    - Must use the `cd` parameter to navigate to a project directory
11. **now**: Returns the current datetime in RFC 3339 format.
12. **batch_tool**: Runs multiple tool calls either sequentially or concurrently.
    - Efficient for complex operations requiring multiple steps
13. **regex_search**: Searches the project for text matching a regular expression.
    - Returns matching paths with excerpts containing matches
out the gate i cannot get it to connect to my configured context servers, neither custom ones nor ones installed by extensions... i wanna give it DaggerDev so it can run tests
also switching back and forth between ask and write is dope, afaict it keeps the built up history but changes out the tools... like you can get it on the right track with ask and then switch to write: disable thinking and enable all the actual writing tools
prolly the 2 most interesting tools:
- thinking: A scratchpad for problem-solving without executing actions.
  - Useful for working through complex problems
- batch_tool: Runs multiple tool calls either sequentially or concurrently.
  - Efficient for complex operations requiring multiple steps
@dark egret I've been tinkering with adding a think tool - wasn't a silver bullet or anything but pretty interesting, planning to try again
batch is interesting too
that sounds solid, always kind of annoyed me with cursor how it's totally split
the main thing i think is interesting is that it's the only tool that exists on "ask" but doesn't exist on "write"
implying that hey, maybe thinking is bad for actually doing shit, but good when you're explicitly not modifying state
got dagger mcp -m ~/src/dagger wired into the zed agent. as expected and similar to many other MCP clients, zed doesn't seem to respect tools changing with selections. even if it did, there's some UX that's not gonna be cooperative: per-profile, users are expected to manually select which additional tools they want to add - so if the agent calls select, it's gonna add to a list of tools, but then a human has to look at the list via a UI selector.
funnily, it can default-instantiate and select DaggerDev, and when it does, those tools tell it about additional tools it should've gained, but it doesn't ever get them. cc @deft osprey
@tardy shadow @proper ledge how confident are we that we're actually sending notifications/tools/list_changed btw? reading code it looks like it should happen when calling mcp-go.MCPServer.SetTools, but im curious if y'all have seen the notification go across the "wire" with your own eyes
taking a look to double/triple verify
no rush, i confirmed in zed code they are definitely just another client on the list of clients that dont handle the notification
We've got our own client that logs everything for that kind of debug
It's actually worse than that as I was mentioning to Guillaume. Claude Desktop supports dynamic tools and it fails to find my simple hello-world function after calling our select. I'll try with goose later today to see if there is the same behavior, but maybe something was broken recently (I am using a version of dagger built from latest main branch)
that is suspicious
im not sure what i'm suspicious of but i'm suspicious
same, I am suspicious of that too
investigating
@dark egret we should probably add an env var or a flag that logs the pipe to a file
On Goose, dynamic seems to be properly working ; here is the full session -- in parallel, triple checking on our custom client to ENSURE that it's indeed via the notification that it's updated
Confirming that we see the notifications/tools/list_changed notification on the wire cc @dark egret
OPENAI_API_KEY=toto dagger_dev mcp -s -m /tmp/hello
overriding 'auto' progress mode to 'plain' to avoid interference with mcp stdio
Exposing module "hello" as an MCP server on standard input/output
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"capabilities":{},"clientInfo":{"name":"goose","version":"1.0.12"},"protocolVersion":"1.0.0"}}
{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05","capabilities":{"tools":{}},"serverInfo":{"name":"Dagger","version":"0.0.1"}}}
{"jsonrpc":"2.0","id":2,"result":{"tools":[{"description":"module to expose as an MCP server","inputSchema":{"type":"object","properties":{}},"name":"hello"},{"description":"Select a Hello by its ID.\n\nProvides the following tools:\n\n- Hello_alpine","inputSchema":{"type":"object","properties":{"id":{"description":"The Hello ID to select, in \"Hello#number\" format.","type":"string"}},"required":["id"]},"name":"selectHello"}]}}
{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"arguments":{},"name":"hello"}}
{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"{\"selected\":\"Hello#1\"}"}]}}
{"jsonrpc":"2.0","method":"notifications/tools/list_changed","params":{}}
I guess it's not very helpful -- yes our implementation is compliant -- BUT, unusable by most mcp clients ... Goose works, Claude Desktop seems to have been broken by the multi-object (or I've been hallucinating in the past)
sh -c "dagger mcp | tee /tmp/dagger-mcp.log" works in theory, no?
might work in reverse for stdin too lol
Did you have an arg on your function?
cc @deft osprey Mmmh, not for my top-level module function; but in the trace above (in the file), we: 1. spin up an alpine using a custom Hello_Alpine function (no arguments) -- 2. then I ask it to run "uname -a", which does have arguments
Could you please send me the module you're running the mcp server on (or the mcp command you're running)?
Simple module "dagbox" that has one function:
// Calls curl to get the contents of a URL and returns the result
func (m *Dagbox) Curl(ctx context.Context, url string) (string, error) {
	return dag.Container().
		From("cgr.dev/chainguard/wolfi-base:latest").
		WithExec([]string{"apk", "add", "--no-cache", "curl"}).
		WithExec([]string{"curl", "-L", url}).
		Stdout(ctx)
}
Config for Claude Desktop:
{
  "mcpServers": {
    "dagbox": {
      "command": "/Users/shad/forks/dagger/hack/with-dev",
      "args": [
        "dagger",
        "mcp",
        "-m",
        "/Users/shad/sandbox/dagger-example"
      ],
      "env": {
        "ANTHROPIC_API_KEY": "xxx",
        "OPENAI_API_KEY": "xxx"
      }
    }
  }
}
Prompt: "Fetch the content of the URL "github.com" and give me the body."
Result:
View result from selectDagbox from dagbox (local)
{
`id`: `Dagbox#1`
}
{"previous":"Dagbox#1","selected":"Dagbox#1"}
View result from Dagbox_curl:
{
`url`: `https://github.com`,
`method`: `GET`
}
Tool 'Dagbox_curl' not found.
Specifying "with-dev" as command is my hack to use local dagger, that part works, at least
this is basically the same thing i get with zed where they don't claim to have any sort of tool change notification support
It works with goose -- even though a bit painful
So, in order for it to work, you need to:
- run dagbox -> use dagbox and list all the tools you have access to
- select dagbox to jump in the context of the module (thing that @dark egret is changing as part of his PR https://github.com/dagger/dagger/pull/10122) -> select dagbox and Fetch the content of the URL "github.com" and give me the body.
So, for now -- best to use goose Sam
Ok, will try and let you know. Thanks for digging into it
@dark egret @tardy shadow @proper ledge @clever oasis after the call, can we bikeshed that "privileged" / selectRoot / core API access ux problems, because it seems like there are many parallel threads right now and maybe going in circles
I've been hacking at this since Wednesday fwiw, mostly actual impl issues keeping me from the goal, but I have a clear end state in mind. That end state might not be the final one, but will hopefully qualify as good enough lol
OK let's talk about that end state then
I'm on team audio if you have any juice left for this
@dark egret @floral shuttle @tardy shadow @proper ledge @clever oasis @deft osprey quick braindump of dependencies we discussed for end-to-end awesome goose:
- "shell parity" -> for a given environment, the interface presented to the model over MCP should mirror closely the interface presented to a human user over the shell
- context awareness -> goose informs dagger of the context directory to load, dynamically, without having to change config (using roots? other goose-specific protocol? hack for the demo?)
- make a decision on "privileged": what does it mean, and how does the goose user configure it?
- example app: we need a realistic demo scenario that involves a daggerized app, and a realistic dev loop with build, test, lint etc. It should not be the Dagger-dev module itself, too meta. Maybe the one from our quickstart?
- model performance. generally getting our tool mapping implementation to a point where it works at its best. If that means restricting parameters (only this model, etc) then so be it
- secrets injection. How to inject secrets into objects, eg. github token into a github module. At the moment dagger mcp doesn't support that
- dagger -s -> last I tried, the -s flag was mandatory to work with goose
It should not be the Dagger-dev module itself, too meta
it also adds 30s to my dev loop, this is a good callout
dagger -s -> last I tried, the -s flag was mandatory to work with goose -> Not since 0.18.2
https://linear.app/dagger/issue/PROP-77/mcp-support-demo (i suspect tibor can't view this, but previously shared the text of it in dms)
Hey @bleak drift ,
We're working on a simple but realistic MCP demo this week, and before diving into the flow, we want to make sure expectations are aligned.
As a first draft, we're building from the hello-dagger CI example, aiming for the most intuitive experience (free of any known issues in dagger main).
Here's what we're imagining for the baseline:
1. dagger # enter shell mode in the context of the hello-dagger module
2. Run `test` or `publish` directly in shell mode # current documented flow
3. Switch to prompt mode
4. Ask: `run the tests and summarize the output`
This should work out of the box and serve as the foundational demo from the docs.
Now, we'd love your input, along with @dark egret and @proper ledge, on how we could evolve this into a more complete and immersive experience, as Solomon suggested here:
https://linear.app/dagger/issue/PROP-77/mcp-support-demo#comment-27ac7cad
I was just trying Claude desktop and couldn't get it to call "core" functions. Any tips? I also tried to have two tools, Golang as one (without --env-privileged) and Core as another. Have a video.
{
  "mcpServers": {
    "Golang": {
      "command": "dagger",
      "args": [
        "-s",
        "-m",
        "github.com/jpadams/mark-daggerverse/go",
        "mcp",
        "--env-privileged"
      ],
      "env": {
        "OPENAI_API_KEY": "anything",
        "DAGGER_CLOUD_TOKEN": "dag_dagger_XXXXXXXXX"
      }
    }
  }
}
Hey Jeremy, thanks for the video
What I suspect: "Dynamic tooling doesn't work" in Claude Desktop -- So, on the second time you ask for the tools, it's actually lying to you, as in a Dagger sense, we probably changed scope and thus available tools (which Claude Desktop does not refresh)
Can you please retry with this prompt: "let's first create a new empty directory and call it source", directly from the beginning?
I would recommend just testing on goose -- that's the only client that we know for sure dynamically updates / refreshes the tools atm
btw, -s is not necessary anymore
cool! not having much success with goose either... but
claude 3.5 / 3.7?
oh, actually works much better after removing the -s on goose
still need a -m even for a dedicated Core tool, right?
Nope, just a dagger mcp --env-privileged shall work (but inside a module context though) -> dagger init --name=toto, dagger mcp --env-privileged
So yes, you're right sorry
if you're using 0.18.2+, you no longer need -s and that has no impact on your demo. If you had -s problems it wouldn't even have started.
--env-privileged is in v0.18.3
mmh -- rechecking this ; that's odd
no I'm on 0.18.3
so no need for -s
weird side effect of not having Container().From() is that my LLM tried to create the golang container I asked for by Dockerfile
Here are the functions available for **Container#1**:
### Container Functions
- **Container_asService**: Turn the container into a service with optional exposed ports.
- **Container_asTarball**: Package the container state as an OCI image and return it as a tar archive.
- **Container_directory**: Retrieve a directory from the container's root filesystem.
- **Container_exitCode**: Get the exit code of the last executed command.
- **Container_file**: Retrieve a file at a specified path from the container.
- **Container_publish**: Package the container state as an OCI image and publish it to a registry.
- **Container_stderr**: Access the buffered standard error stream of the last executed command.
- **Container_stdout**: Access the buffered standard output stream of the last executed command.
- **Container_withDefaultArgs**: Configure default arguments for future commands.
- **Container_withDirectory**: Add a directory to the container's filesystem.
- **Container_withEntrypoint**: Set an OCI-style entrypoint for the container.
- **Container_withEnvVariable**: Set a new environment variable in the container.
- **Container_withExec**: Execute a command in the container and return a new snapshot of the container state.
- **Container_withExposedPort**: Expose a network port for the container.
- **Container_withFile**: Add a file to the container's filesystem.
- **Container_withLabel**: Add a label to the container.
- **Container_withNewFile**: Add a new file to the container's filesystem.
- **Container_withRegistryAuth**: Attach credentials for publishing to a registry.
- **Container_withSecretVariable**: Set an environment variable with a secret value.
- **Container_withServiceBinding**: Establish a runtime dependency on a network service.
- **Container_withUnixSocket**: Forward a socket to a specified Unix socket path in the container.
- **Container_withUser**: Use a different user for the command.
- **Container_withWorkdir**: Change the container's working directory.
- **Container_withoutDirectory**: Remove a directory from the container's filesystem.
- **Container_withoutEntrypoint**: Reset the container's OCI entrypoint.
- **Container_withoutFiles**: Remove specified files from the container.
- **Container_withoutLabel**: Remove a label from the container.
I had no more memory on my VM, retesting but I'm 99% confident that it's the behavior:
- module present being: -m module
- core present being --env-privileged
I've double checked, Jeremy. It is possible to expose the core only, but it currently still needs to live inside a module, thanks; what would be your expected behavior? Happy to fix it
so much better in the GUI
absolutely strange things it's doing to the tool names though lol
Yeah it's some bizarre gore, that bug does not exist in the cli either so it feels like it's the presentation layer somehow
hmm seems like it's expecting things to be called like do_a_thing which would become Do A Thing
if MCP normally has underscored names we could always do that conversion, then you'd get Trivy Scan Container and Container With New File at least
i wonder if that has any effect on evals 
could also be a casing library bugging out because Foo_barBazBuzz is a cursed pairing of camel and snake case (maybe it saw _ and assumed snake)
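To make the conversion idea concrete, a small sketch that normalizes the mixed names to snake case and then derives a display label. The helper names are hypothetical, and this is not how the GUI or dagger actually does it, just one way the "always do that conversion" suggestion could look.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// snakeToolName converts a mixed name such as "Container_withNewFile"
// into all-lowercase snake case ("container_with_new_file").
func snakeToolName(name string) string {
	var b strings.Builder
	runes := []rune(name)
	for i, r := range runes {
		if unicode.IsUpper(r) {
			// Start a new word at a lower->upper boundary, unless the
			// previous rune already is a separator or an uppercase letter.
			if i > 0 && runes[i-1] != '_' && !unicode.IsUpper(runes[i-1]) {
				b.WriteByte('_')
			}
			b.WriteRune(unicode.ToLower(r))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

// displayName turns a snake-case name into a space-separated label with
// each word capitalized. It assumes ASCII words.
func displayName(snake string) string {
	words := strings.Split(snake, "_")
	for i, w := range words {
		if w == "" {
			continue
		}
		words[i] = strings.ToUpper(w[:1]) + w[1:]
	}
	return strings.Join(words, " ")
}

func main() {
	for _, n := range []string{"Trivy_scanContainer", "Container_withNewFile", "Foo_barBazBuzz"} {
		s := snakeToolName(n)
		fmt.Printf("%s -> %s -> %s\n", n, s, displayName(s))
	}
}
```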
I panicked the dagger-engine (not sure if it's just my branch) and my best bet is that somehow CurrentModule has an empty "" InstanceID. If anyone has ideas or clues, i'm all ears
logs:
this might be fixed by pulling my branch. for a moment I had Binding.asFoo return a null Foo if it wasn't present, but that actually has really bad SDK semantics at the moment; instead of llm.Env().Output("foo").AsFoo().Sync(ctx) erroring it would return a Foo that tried to load a "" ID internally
so now there's a Binding.isNull instead
we have so many non determinism problems that i'm trying to refrain from updating unnecessarily, but i guess i'll try since nothing works for me anymore (we're suspecting OpenAI changes at this point)
fwiw nondeterminism is what led me to testing with gemini-2.0-flash over other models - it's still nondeterministic, but it's fast, and basically no rate limits, so you can run 20 things in parallel and analyze the results
shot in the dark, but every time anybody's hit an mcp bug in the last 3 days @clever oasis 's already fixed it somewhere-
using helloDagger.publish on my machine, or trivy.scanContainer on tibor's, called through MCP, hitting a cache miss, we hit this error in the engine and the demo stops:
error="map[error:process \"/runtime\" did not complete successfully: exit code: 2 kind:*buildkit.ExecError stack:<nil>]
calling dagger call $fn works, and then re-running via mcp will start working. known issue? or worth filing a bug?
i'm about to rebase on his branch
on dagger-to-agents@main, prompting goose run -t "scan the hello-dagger application container for security vulnerabilities", hit this engine panic:
panic: no ':' separator in digest ""
goroutine 4648341 [running]:
github.com/opencontainers/go-digest.Digest.sepIndex({0x0, 0x0})
/go/pkg/mod/github.com/opencontainers/go-digest@v1.0.0/digest.go:153 +0x94
github.com/opencontainers/go-digest.Digest.Encoded(...)
/go/pkg/mod/github.com/opencontainers/go-digest@v1.0.0/digest.go:137
github.com/dagger/dagger/network.HostHash({0x0, 0x0})
/app/network/hosts.go:16 +0x24
github.com/dagger/dagger/network.ModuleDomain(0x314a850?, {0x40011d7500, 0x19})
/app/network/hosts.go:37 +0x4c
github.com/dagger/dagger/core.(*Container).WithExec(0x0?, {0x314a850, 0x400d5dbdb0}, {{0x4002964510, 0x9, 0x9}, 0x0, {0x0, 0x0}, {0x0, ...}, ...})
/app/core/container_exec.go:115 +0x684
github.com/dagger/dagger/core/schema.(*containerSchema).withExec(0x4006b479f8?, {0x314a850, 0x400d5dbdb0}, 0x4009182400, {{{0x4002964510, 0x9, 0x9}, 0x0, {0x0, 0x0}, ...}, ...})
/app/core/schema/container.go:858 +0x180
github.com/dagger/dagger/dagql.Func[...].func1({0x400ebcdd80?, 0x4009182400?, {0x0, 0x1, 0x400f03cf60, 0x400efec7e0}, 0x0, 0x0}, {{{0x40029643f0, 0x9, ...}, ...}, ...})
/app/dagql/objects.go:704 +0x74
github.com/dagger/dagger/dagql.NodeFuncWithCacheKey[...].func1({0x400ebcdd80, 0x4009182400, {0x0, 0x1, 0x400f03cf60, 0x400efec7e0}, 0x0, 0x0}, 0x400fd09bf0)
/app/dagql/objects.go:761 +0x10c
github.com/dagger/dagger/dagql.Class[...].Call(0x31a32c0?, {0x314a850?, 0x400d5dbdb0?}, {0x400ebcdd80?, 0x4009182400, {0x0, 0x1, 0x400f03cf60, 0x400efec7e0}, 0x0, ...}, ...)
/app/dagql/objects.go:266 +0x130
github.com/dagger/dagger/dagql.Instance[...].call.func2()
/app/dagql/objects.go:523 +0xa0
github.com/dagger/dagger/engine/cache.(*cache[...]).GetOrInitializeWithCallbacks.func1()
/app/engine/cache/cache.go:189 +0x64
created by github.com/dagger/dagger/engine/cache.(*cache[...]).GetOrInitializeWithCallbacks in goroutine 4639903
/app/engine/cache/cache.go:187 +0x4e0
nice well then thats 2 examples
gonna go back to my previous checkout for the demo then
alex says it could be fixed by his branch
our confluence of examples does mean its triggered by something dagger-to-agents@main, not your engine or goose branch
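For context on the panic message itself: it fires when a digest string without an "algorithm:hex" separator (here the empty string) is asked for its encoded part. Below is a stdlib-only stand-in that returns an error in that case instead of panicking. It is an illustration of the failure mode, not the go-digest or dagger code, and `encodedPart` is a made-up name.

```go
package main

import (
	"fmt"
	"strings"
)

// encodedPart returns the part after the ':' in an "algorithm:hex" digest.
// A missing separator, which includes the empty string, yields an error
// rather than a panic.
func encodedPart(digest string) (string, error) {
	_, enc, ok := strings.Cut(digest, ":")
	if !ok {
		return "", fmt.Errorf("no ':' separator in digest %q", digest)
	}
	return enc, nil
}

func main() {
	for _, d := range []string{"sha256:abc123", ""} {
		enc, err := encodedPart(d)
		fmt.Printf("%q -> %q, err=%v\n", d, enc, err)
	}
}
```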
after this call btw i'd love to know what you're trying to do on those branches
agentic loop that improves test coverage
oh sick
dude if you get that working and we can layer in sam's shit we're getting to an actually exciting demo
you didn't see sam's shit but he's got a host filesync thing working based on an old branch from andrea
that was the whole goal of this week, but "LLMs"
I don't see how the filesync will plug into the LLM as-is.
I struggled to find a good name for it, and I could simply have used "Sam's shit" instead
i am here to provide super descriptive names
i suspect a hacky way might be via some wacky agent-to-agent-over-mcp setup
although tibor also already has a thing where the source dir gets passed as WithDirectoryInput, that might work with some coaxing
the next step of the current demo with trivy is to ask for action items and then install a module that enables the fix of those action items -- but code coverage is maybe easier ?? It's the same underlying principle -- that would unlock a pattern + 2 demos
i'm hopeful that once the LLM gods like me again, we can get that to work
I feel like i've seen this many times now (from prompt mode): ! failed to convert call inputs: decode arg "source" (string): decode "Directory" ID: failed to decode base64: illegal base64 data at input byte 9
Is this something you know about or, even better, has it been fixed in the latest of your branch ?
(Canada holiday today so I'll be mostly away - realized I only put that in the team channel!)
apologies for the ping!
@dark egret i still have this after having rebased on alex's latest branch. So we're gonna have to fix this
So Tibor does a spike on the static tools routing ? What's left ahah: keep nudging the test coverage demo ?
i first want to be able to pass a directory to a function via prompt mode, shouldn't be this hard
Without bindings I suppose ? oki
with bindings. I do workdir=$(host | directory .) then go to prompt mode
what was the way to debug prompt mode LLM calls ?
i remember this being claude-specific on some older selectTools branch, so idk what you're hitting now
I'm actually doing a spike to fix the .refresh of the shell / the fact that it doesn't reload the installed dependencies makes the gradual buildup without refreshing ? + MCP evals ? wdyt @dark egret
it's possible it's just prompt engineering. We hit it with 4o, then upgraded to 4.1 and it disappeared and now i hit it consistently
this is the top priority thing imo, although outputs are maybe more immediately urgent than input refreshing, no? like is there a demo where we need input refreshing to work?
evals are important just because they'll speed us up
i think we need ONE MCP client we can trust, and that is no longer Goose for me
very left field lol, what did the goose do to you? what's the alternative?
we had our own mcp client; it's a slight change to the llm logic we already have. The SendQuery abstraction was reused to reuse all the providers.
having our own in a demo (almost) entirely defeats the purpose of the demo
if we're writing our own client, the demo is only useful for going to other mcp clients and being like "look at this thing we could do with your product if only you fixed X" - we lose any value of going to real normal users and being like "look at this cool thing you can do with dagger via goose"
i would not go dropping this to work on the static tools yet unless you're very blocked on something related to dynamic toolsets
tbh evals should prolly come before a static tool spike too just to have a target
I hear you I just need to know if it's me or goose the problem. Hence also figuring out the differences in how they make the calls to LLM provider
valid
rn i'm trying to get dir to be passed to a function as part of that demo 2
that doesnt work in prompt mode either, so when it works in prompt mode we can see whatsup with goose
omg i'm so dumb, i was responding to guillaume as though you hadn't already responded
#1346602677236400159 message this did not connect to the demo for me the first time i read it but in retrospect very obvious
np
cool so we're good, im gonna violently break my demo setup now and see about fixing/filing some bugs
just asking for what to prioritize short term -- evals is a safe bet ; happy to do it ; didn't understand the .refresh comment sorry
Cursor can now send images over mcp. Not sure how?
MCP defines types, text is the most used one, but image is another one: https://pkg.go.dev/github.com/mark3labs/mcp-go@v0.21.1/mcp#ImageContent
ok so this unlocks a first tier demo -> headless browser sending screenshots
right timing with the o4-mini that's SOTA on image analysis
til goose session --debug to show full tool responses
where is it logged ?
Stdout lol
I also have an additional mcp.sh hack that collects the dagger mcp stderr to a file, will share in a sec
yeah I do 2>/tmp/debug.stderr.log too and tail -F it in a 3rd terminal, this way i have stdin, stdout and stderr
Yep exactly
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"capabilities":{},"clientInfo":{"name":"goose","version":"1.0.18"},"protocolVersion":"2025-03-26"}}
{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}
@slow sierra
{"jsonrpc":"2.0","id":2,"result":{"tools":[{"description":"Select tools for interacting with the available objects.\n\nAvailable tools:\n- HelloDagger_build (returns Container!)\n- HelloDagger_buildEnv (returns Container!)\n- HelloDagger_publish (returns String!)\n- HelloDagger_test (returns String!)\n- cacheVolume (returns CacheVolume!)\n- container (returns Container!)\n- directory (returns Directory!)\n- git (returns GitRepository!)\n- host (returns Host!)\n- http (returns File!)\n- trivy (returns Trivy!)\n\nAvailable objects:\n- HelloDagger#1: helloDagger","inputSchema":{"type":"object","properties":{"tools":{"description":"The tools to select.","items":{"type":"string"},"type":"array"}},"required":["tools"]},"name":"selectTools"},{"description":"A tool for thinking through problems, brainstorming ideas, or planning without executing any actions","inputSchema":{"type":"object","properties":{"thought":{"description":"Your thoughts.","type":"string"}},"required":["thought"]},"name":"think"}]}}
still hitting this with the binding.isNull fyi
is this also a panic ?
this being the panic: no ':' separator in digest "" panic, yes
^ after rebasing on alex's branch this is needed
we also have some sort of truncation wackiness going on, like first part of list tools response
{"jsonrpc":"2.0","id":2,"result":{"tools":[{"description":"Load tools for interacting with the available objects.\n\nAvailable tools:\n- HelloDagger_build (returns Container!): Build the application container\n- HelloDagger_build_env (returns Container!): Build a ready-to-use development environment\n- HelloDagger_publish (returns String!): Publish the application container after building and testing it on-the-fly\n- HelloDagger_test (returns String!): Return the result of running unit tests\n- Directory_as_git (returns GitRepository!): Converts this directory to a local git repository\n- Directory_diff (returns Directory!): Return the difference between this directory and an another directory. The difference is encoded as a directory.\n- Directory_digest (returns String!): Return the directory's digest. The format of the digest is not guaranteed to be stable between releases of Dagger. It is guaranteed to be stable between invocations of the same Dagger engine.\n- Directory_directory (returns Directory!): Retrieves a directory at the given path.\n- Directory_docker_build (returns Container!): Use Dockerfile compatibility to build a container from this directory.
goose run -t "list all of your available dagger tools, do not leave anything out. i don't care how long the response is. use complete names." --debug
starting session | provider: openai model: gpt-4.1
logging to /Users/braa/.local/share/goose/sessions/20250418_154104.jsonl
working directory: /Users/braa/src/dagger
Here is a comprehensive list of all available Dagger tools, including their complete method names:
---
### HelloDagger (Build and Test Workflow)
- **HelloDagger_build**
Builds the application container.
- **HelloDagger_build_env**
Builds a ready-to-use development environment.
- **HelloDagger_publish**
Publishes the application container after building and testing it on-the-fly.
- **HelloDagger_test**
Returns the result of running unit tests.
---
### Directory (File and Git Operations)
- **Directory_as_git**
Converts this directory to a local git repository.
- **Directory_diff**
Returns the difference between this directory and another directory. The difference is encoded as a directory.
- **Directory_digest**
Returns the directory's digest. The format of the digest is not guaranteed to be stable between releases of Dagger. It is guaranteed to be stable between invocations of the same Dagger engine.
- **Directory_directory**
Retrieves a directory at the given path.
- **Directory_docker_build**
[The remainder of the method name was cut off in the tool listing. If you need the full signature or more details, please let me know.]
---
If you need more details or want the signature/parameters for any of these tools, let me know!
this breaks the trivy part of the demo once we've layered in the directory input because trivy's at the end of the selectTools list, and probably prevents chaining as well because "available objects" is at the very very end
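A toy model of why the ordering hurts, assuming the description gets cut at some byte limit before the model sees it (the limit and helper names are hypothetical, and the real cause of the cut is still unknown at this point in the thread): entries near the end of the tool list, and the "Available objects" section after it, are the first to disappear.

```go
package main

import (
	"fmt"
	"strings"
)

// describe renders a selectTools-style description with the tools first
// and the objects last, like the ordering in the logs.
func describe(tools, objects []string) string {
	var b strings.Builder
	b.WriteString("Available tools:\n")
	for _, t := range tools {
		b.WriteString("- " + t + "\n")
	}
	b.WriteString("\nAvailable objects:\n")
	for _, o := range objects {
		b.WriteString("- " + o + "\n")
	}
	return b.String()
}

// survivors reports which entries are still fully present when only the
// first limit bytes of the description are kept.
func survivors(desc string, limit int, entries []string) []string {
	if limit < len(desc) {
		desc = desc[:limit]
	}
	var kept []string
	for _, e := range entries {
		if strings.Contains(desc, "- "+e+"\n") {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	tools := []string{"HelloDagger_build", "Directory_diff", "trivy"}
	objects := []string{"HelloDagger#1: helloDagger"}
	desc := describe(tools, objects)
	all := append(append([]string{}, tools...), objects...)
	fmt.Println("full:", survivors(desc, len(desc), all))
	fmt.Println("cut :", survivors(desc, 40, all))
}
```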
starting session | provider: openai model: gpt-4.1
logging to /Users/braa/.local/share/goose/sessions/20250418_154842.jsonl
working directory: /Users/braa/src/dagger
Certainly! Here is the complete description of dagger__load_tools, with no omissions or editorializing:
typescript
namespace functions {
// Load tools for interacting with the available objects.
//
// Available tools:
// - Directory_as_git (returns GitRepository!): Converts this directory to a local git repository
// - Directory_diff (returns Directory!): Return the difference between this directory and an another directory. The difference is encoded as a directory.
// - Directory_digest (returns String!): Return the directory's digest. The format of the digest is not guaranteed to be stable between releases of Dagger. It is guaranteed to be stable between invocations of the same Dagger engine.
// - Directory_directory (returns Directory!): Retrieves a directory at the given path.
// - Directory_docker_build (returns Container!): Use Dockerfile compatibility to build a container from this directory. Only use this function for Dockerfile compatibility. Otherwise use the native Container type directly, it is feature-complete and supports all Dockerfile features.
// - Directory_entries (returns [String!]!): Returns a list of files and directories at the given path.
// - Dire
type dagger__load_tools = (_: {
// The tools to select.
tools: string[],
}) => any;
This is the full, unedited, available description. If you need more details about any listed tool, just let me know!
yes that's what i was referring to this morning or last night idk
goose only sees 9 available tools to select
starting session | provider: openai model: gpt-4.1
logging to /Users/braa/.local/share/goose/sessions/20250418_154908.jsonl
working directory: /Users/braa/src/dagger
Here is the complete description of the dagger__load_tools tool, without editorializing or omissions:
// Load tools for interacting with the available objects.
//
// Available tools:
// - HelloDagger_build (returns Container!): Build the application container
// - HelloDagger_build_env (returns Container!): Build a ready-to-use development environment
// - HelloDagger_publish (returns String!): Publish the application container after building and testing it on-the-fly
// - HelloDagger_test (returns String!): Return the result of running unit tests
// - Directory_as_git (returns GitRepository!): Converts this directory to a local git repository
// - Directory_diff (returns Directory!): Return the difference between this directory and an another directory. The difference is encoded as a directory.
// - Directory_digest (returns String!): Return the directory's digest. The format of the digest is not guaranteed to be stable between releases of Dagger. It is guaranteed to be stable between invocations of the same Dagger engine.
// - Directory_directory (returns Directory!): Retrieves a directory at the given path.
// - Directory_docker_build (
This is the full excerpt provided by the underlying system as of April 2025.
but if you look at the mcp logs, it's there
yep exactly
this is where i was like what the heck is goose doing
and i have this behavior, even without any system prompt by goose, just dagger's
they must be doing some kinda processing ... hence my joke that they probably ask another AI how to mess it up
no joke, bug goes away if you switch to claude
geez
this was w 4.1 ?
This doesn't make sense, why would it work with dagger prompt mode and not goose ?
4.1 and 4o have pretty much the same thing, which makes me think it's a context window thing. could possibly be goose mismanaging the context window.
claude has a different bug but it at least gets a complete load_tools description
hence the need to mitm the http at this point
yep yep that comment makes more sense now lmao
Ok, I'm getting confused on the best observability stack we need to isolate any client / dagger shell discrepancy.
Thinking from first principle, what would be, from our conversations, the most critical things to observe:
- MITM http query to AI
- Tool list before / after each response: $agent | tools vs mcp tool list
- Eval intent of tool use per $assignment
- Extraction of system prompt (goose's system prompt is dynamic) -- might influence
evals are always welcome, i think we also need to mcpify the existing ones
Also MITM is important now (either via proxy or just debug printing both goose and dagger and rebuilding) to understand if goose is tweaking things differently in the way they send their requests to openai.
lol one sorta deranged way to mitm openai specifically is to use ollama serve
anybody got a beefy gpu?
sorta beefy mac. what do you need?
im mostly kidding lol, but theoretically if qwen 32B plays well with our dynamic tool routing, ollama serve has useful debug logging
that's a big if though
these are super interesting: https://modelcontextprotocol.io/docs/concepts/tools#tool-annotations
noticed the package we use supports them now: https://github.com/mark3labs/mcp-go/pull/158
openWorldHint boolean true If true, the tool may interact with an "open world" of external entities
the other 3 annotations are unsurprising but this one is both a surprise to me and pretty relevant to us
@clever oasis consolidating #agents message to here, with selectTools merged, how do we imagine module-object preselection working? it might get pretty immediately relevant for @tardy shadow 's hypothetical "tier 1" headless browser screenshot + cursor MCP demo because cursor isn't friendly to tools_changed notifications
does selectTools really even change anything about that apart from that our selections aren't really focused around single specific objects any longer?
(subtext: i'm wondering if I should try to bring back some part of the WithFooInput(foo, initialSelect:bool) PR to enable this)
@dark egret i would try without a manual 'initial select' thing and just try providing the constructed object as an input, and see if that's enough as-is
ideally the model would see that the object is available, maybe with a helpful description as a hint if needed, and also see that its tools are available, but still need to select them first (and have enough hints that it knows to do that immediately)
in theory it'll see something like this:
Available objects:
- DaggerDev#1: (description here)
Available tools:
- DaggerDev_test: Find test suites to run
and figure everything out from there, given "run the tests"
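For reference, a minimal Go sketch of rendering that summary. The `Entry` type and `RenderSummary` function are hypothetical names made up for illustration; this is not the engine's actual code, just the layout from the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical name/description pair for an object or a tool.
type Entry struct {
	Name string
	Desc string
}

// RenderSummary formats objects and tools in the layout shown in the
// example above: one header per group, one "- name: description" line each.
func RenderSummary(objects, tools []Entry) string {
	var b strings.Builder
	b.WriteString("Available objects:\n")
	for _, o := range objects {
		fmt.Fprintf(&b, "- %s: %s\n", o.Name, o.Desc)
	}
	b.WriteString("Available tools:\n")
	for _, t := range tools {
		fmt.Fprintf(&b, "- %s: %s\n", t.Name, t.Desc)
	}
	return b.String()
}

func main() {
	fmt.Print(RenderSummary(
		[]Entry{{Name: "DaggerDev#1", Desc: "(description here)"}},
		[]Entry{{Name: "DaggerDev_test", Desc: "Find test suites to run"}},
	))
}
```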
hmmm lemme try it out real quick in cursor, this is basically hinging on the "selectTools returns a tool schema that the model understands as though it was part of the MCP tools list" thing, right?
i keep forgetting that that works because in my mental model of the clients it does not XD (my mental model is clearly wrong)
that's unrelated, doesn't hinge on it at all, it just hinges on the model's ability to select tools before using them, which is now an assumption made everywhere
previously: 'current selected object' was important because it determined the available tools => so we needed a way to pre-select an object
now: 'current selected object' is not a thing => model selects tools as appropriate for working with the available objects => so theoretically we don't need to do anything special anymore
selectTools returning tools is only relevant for: 1. teaching dumb models that all they've done is select tools, not invoke them, and 2. helping any MCP clients that don't support the 'tools changed' notification
yeah we're saying the same thing, this is 2.
That's what we already do
(both things are required)
ah ok, skimmed over the end of the prior message and was mostly focusing on this
cursor has its own way of managing context-specific mcp servers: <project-root>/.cursor/mcp.json
there's still a small possibility that this still works (with a bug on their end that makes the LLM actually calling an MCP tool that doesn't exist before the select) -- trying it out
@tardy shadow i was testing the same thing i think at the same time, on tiborvass/mcp-demo-wip, llms inside Cursor at least will not call the selected tools
Then you're right on this, to unlock static calls
@dark egret hey sorry, just wanna say that my mcp-demo-wip branch is a constantly moving target, it's where i'm trying to get the Directory#1 issue fixed so we can try the test coverage demo working
we're not the first people to ask cursor to implement tools_changed though https://forum.cursor.com/t/mcp-hosted-dynamic-tools-refreshing/49693
I've been using MCP servers hosted by Toolbase and everything works so well! Through their app, I can enable and disable MCP servers dynamically but I would still need to manually click the refresh button on Cursor. I talked to the dev and he mentioned Cursor doesn't support the notifications/tools/list_changed notification and that is w...
and yeah i know, no worries, im just using it because it's what i had built already
For what you guys are trying to do, my branch or not my branch doesn't change much wrt Cursor
your branch has selectTools + WithFooInput is all
And your recursive dependencies too i believe
yeah yeah and that which is needed for the trivy thing
So we do need your stopgap (which was your closed PR) Connor to unlock the use with static MCP clients. Full circle
Unless @dark egret you wanted to make it work, I can take a stab at doing autoselect for @sonic fox since my understanding is that Cursor won't cut it
I think actually once i rebase https://github.com/dagger/dagger/pull/10118 on main, that branch is theoretically enough for our goose demo from last week
fixes #10116, replaces #9992
this PR adds
module.Serve(ctx, dagger.ModuleServeOpts{IncludeDependencies: true})
to the API, enabling shell and other clients to serve the dependencies of modules to e...
TBH, even if it worked, it'd be relying on a bug. Since it's not a security-wise sound thing to just yolo call a method that doesn't exist according to your local state
i have made it work lol, the problem is a combination of API design (WithFooInput(selection:bool) leaks this "select" concept out the API) and a deep codegen bug i think i can work around if necessary
ah ok, i was thinking of just passing selectedTools map[string]bool around in newMCP
make it a private API with _ prefix
which brings us back to the original thing i was poking @clever oasis about, I confirmed that the selectTools returns tool schemas thing does not allow those tools to be called in cursor.
yeah maybe i rebase and do WithFooInput(_select:optional[boolean]) -- very small diff from what i already had - not that map[string]bool is worse just that it's a larger diff
at this point it's not a reliable solution either way, but did you try with different models?
I personally don't mind having a private API stop-gap solution if it unlocks interesting demos on the non-dynamic clients, since they're private, we can always find better ways later. In fact I haven't abandoned the experimentation on full static indirect API either.
https://forum.cursor.com/t/mcp-hosted-dynamic-tools-refreshing/49693/3 interesting - I wonder why Twilio needs it
cursor default, claude 3.7, and 4o
samesies, i wonder if their problem is similar to ours tbh, huge horizontal API that can't quite fit in a context window
from a quick skim i don't see anything complicated in their mcp server
maybe the user misunderstood something (https://github.com/twilio-labs/mcp)
the other thing i've been thinking about is that the protocol does actually require clients to specify capabilities, like they're supposed to declare on init if they handle list_tools_changed notifications, and we could customize our toolset to be static when they don't even if it's not the optimal setup, like swap out selectTools for selectFunctions + runFunction
aka: static only if client doesn't support dynamic
Except now we're talking about whether clients send those capabilities at all π
and when you frame it that way, the ability to do static under certain conditions almost becomes a requirement... like we don't want to document that our MCP support only works with clients that support the tools changed notification, ideally we work with all of them but better with the dynamic ones
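A rough Go sketch of the "static only if client doesn't support dynamic" idea. Everything here is an assumption for illustration: `ClientCaps` and `ToolNames` are hypothetical, and the `KnownBroken` flag only models the Cursor observation (a client claiming a capability it doesn't honor). The tool names are the ones floated in this thread.

```go
package main

import "fmt"

// ClientCaps is a hypothetical summary of what we learned about an MCP
// client during initialization. It is not a type from any MCP library.
type ClientCaps struct {
	// ClaimsToolsChanged: the client says it handles tools-changed notifications.
	ClaimsToolsChanged bool
	// KnownBroken: the client is known to claim the capability without honoring it.
	KnownBroken bool
}

// ToolNames picks the meta-tools to expose: the dynamic scheme when the
// client can be trusted to resync its tool list, the static pair otherwise.
func ToolNames(c ClientCaps) []string {
	if c.ClaimsToolsChanged && !c.KnownBroken {
		return []string{"select_tools"}
	}
	return []string{"selectFunctions", "runFunction"}
}

func main() {
	fmt.Println(ToolNames(ClientCaps{ClaimsToolsChanged: true}))
	fmt.Println(ToolNames(ClientCaps{ClaimsToolsChanged: true, KnownBroken: true}))
	fmt.Println(ToolNames(ClientCaps{}))
}
```

Defaulting to static whenever there's doubt matches the goal of working with all clients and working better with the dynamic ones.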
Also it's a little more subtle than that, because it's not necessarily about handling the notification, it's about the tools list synchronization in the agentic loop
the thing that inspired this line of thought was that cursor claims the capability when it doesn't actually have it
hmm i wonder if all that's needed for static is adding a tool to call a function - basically a "legal" way for the model to call a tool that it doesn't actually have, but that it can recognize from the select_tools response
I've already been tempted to add support for chaining/batching to save token cost, a lot like that batch meta-tool you found @dark egret (I think from Zed?) - maybe once we add that we'll get 'static' support for free? 
or is that already 100% what y'all are suggesting lol
exactly, yeah! i do think we'd also need to change the tool descriptions/instructions/system prompt a lil based on the mcp client's list_tools capability, but that's basically the concept
i can give that a go if no one has it in flight, super curious how the evals handle it, would love to see token cost
i love that it's potentially 2 birds with one stone. idk if he's got anything going yet, but i do know a static toolcalling scheme is in @proper ledge's mental backlog
if you have call_tool do you still need select_tools?
you at least need something that tells you their schema + docs, once the model knows that it wants it
No need for select tools. But list tools and call tool should always take an object
yeah this is getting at what i mean by "changing the tool descriptions/instructions/system prompt" -> i think if you go static you replace select_tools with list_functions and call_function
(so as to not confuse the LLMs by calling our functions tools when they're not actually callable via toolcall messages)
select_tools basically becomes list_functions + actually installing them as native tools, for clients that support it
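To make the static shape concrete, a small Go sketch of a list_functions + call_function pair. `Function`, `Registry`, and both methods are hypothetical stand-ins, not the actual implementation, and the "should always take an object" point from above is left out to keep it short.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Function is a hypothetical stand-in for a callable module function.
type Function struct {
	Doc  string
	Call func(args map[string]any) (any, error)
}

// Registry maps function names to implementations.
type Registry map[string]Function

// ListFunctions returns the known function names in sorted order,
// which is what a static list_functions tool could report.
func (r Registry) ListFunctions() []string {
	names := make([]string, 0, len(r))
	for name := range r {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// CallFunction dispatches by name, so the model only ever invokes one
// fixed tool (call_function) instead of a tool it was never sent.
func (r Registry) CallFunction(name string, args map[string]any) (any, error) {
	fn, ok := r[name]
	if !ok {
		return nil, errors.New("unknown function: " + name)
	}
	return fn.Call(args)
}

func main() {
	r := Registry{
		"DaggerDev_test": {
			Doc:  "Find test suites to run",
			Call: func(map[string]any) (any, error) { return "DaggerTest#1", nil },
		},
	}
	fmt.Println(r.ListFunctions())
	fmt.Println(r.CallFunction("DaggerDev_test", nil))
}
```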
is closing discord to stop procrastinating the 2 branches i need to get rebased and merged to get the demo functionality on main
actually wait i might only have 1 branch, i think static fn calling might obviate any sort of initialSelect type thing
yes that's what i've been saying. If @clever oasis you don't mind, let me quickly finish what i have and you can take it from there
rebasing both anyways but imma focus on the serveDependencies then
wip for chaining - looking good so far: https://github.com/dagger/dagger/pull/10229
evals pretty much passed, the UndoChanges failure in that trace is a preexisting issue, seems unrelated
haven't verified if any models will use it to work around the static tools limitation
tried it out with Cursor - it does use chain_tools but is hamstrung by the fact that it never sees the updating select_tools description (where the available objects and tools are listed as it reaches more objects/types)
gonna look into workarounds (it's something i was curious about in the past and would improve caching behavior), and if they work maybe we can just get rid of the dynamically updating descriptions
Ah, yeah that makes sense, could try omitting the description change and only giving that data on return (list_tools instead of select_tools)
yeah - might be "and" instead of "or" - since i wouldn't want to get all the tool descriptions + schemas dumped into the context, this is more of a summary to see which tools can be selected
or, it could look like augmenting it into existing responses
{"result":"DaggerTest#1","new_tools":["DaggerTest_all","DaggerTest_specific"]}
I think you're thinking about the prompt mode/dynamic case and I'm mostly thinking about forced-static mcp... might be good to try to outline how to do dynamic additively or something?
i'm just pointing out there are two levels of tool discovery: 1. learning what tools can be selected/described, which is just a list of names + return types, and 2. learning about tool(s) in full, which gives you all the descriptions + schemas + etc. which is way more verbose
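A minimal Go sketch of the first level (names only) riding along on an existing response, using the JSON shape from the example above. The `CallResult` type and `Encode` helper are hypothetical names for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CallResult is a hypothetical tool response that carries the normal result
// plus a names-only summary of tools that became selectable.
type CallResult struct {
	Result   string   `json:"result"`
	NewTools []string `json:"new_tools,omitempty"`
}

// Encode serializes the response; full descriptions and schemas are left
// out on purpose, since those belong to the second, more verbose level.
func Encode(r CallResult) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := Encode(CallResult{
		Result:   "DaggerTest#1",
		NewTools: []string{"DaggerTest_all", "DaggerTest_specific"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With `omitempty`, responses that unlock nothing new stay as small as they are today.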
Yeah makes sense
Static could try to flatten that but I can see that that might be fucked, too many tools to pretend they're all "selected" at once
yeah - this is all mostly scheming to save tokens + context window usage
Found it interesting while trying dagger mcp in Claude desktop, etc that there was always a selectFoo and a Foo whereas other servers don't have that.
Is that selecting the object with the functions vs an MCP server just having a set of functions at the top level? Related to #1346602677236400159 message?
yeah those are the sort of 'meta' tools for our dynamic tool calling scheme - but more specifically that's the older scheme, now main is select_tools instead of one-select-tool-per-type
for the curious, here are the evals with chain_tools + fully static tools (select_tools just lists) - some pass, but plenty of room for improvement: https://v3.dagger.cloud/dagger/traces/c76d5bcbfa579abfe442f02687b58fca
Gemini is once again hitting FinishReason(10) all over the place 
adding a singular static equivalent helps (run_tool): https://v3.dagger.cloud/dagger/traces/8b6a0e00085fc829aaf504e454d61028
Curious to know Alex how you found out about that prompt: https://github.com/dagger/dagger/pull/10229/files#diff-91f8c8294849cf40bf8fee4f54abf9eed4637327cc87aba4f34f365ab5293c90R33-R60 -- is it the evals that constructed it or you vibed + ran the evals ?
If I were to improve my prompt engineering skills ahahah
just wrote it manually + ran evals yeah
the prompt helped - without it gpt-4.1 would mess up the schema by skipping the params wrapper
@clever oasis @dark egret sorry i'm late with my branch not sure if it's worth anything for you now, but i wanted to do two versions, this one is the initial one: https://github.com/tiborvass/dagger/commits/mcption/
The second version will take object IDs as inputs to list_functions, and maybe i remove the arg schema function.
@tardy shadow did a few runs, sometimes there are some failures i'd need to investigate but overall it's on par.
claude3-5: https://dagger.cloud/dagger/traces/c49a9b444118d8545499a08addbbae1e
gemini-2-0flash: https://dagger.cloud/dagger/traces/c38f4351275a8228c2e965f06dbce6bd
gpt-4o: https://dagger.cloud/dagger/traces/7a42a0ff4b7411f0473045fa16522151
gpt-4.1: https://dagger.cloud/dagger/traces/8b7e6481681d4cc4bf0ad365fae031b2
those runs look like they're using the dynamic scheme?
- i see select_tools calls in there
mmmh, maybe
I was using dagger_dev call evals-across-models --models gpt-4.1 csv on his dev branch
Ah sorry will doublecheck later
my bad -- i didn't recompile the engine
Not on par atm -- my bad
update: you can now view evals without it crashing your browser https://v3.dagger.cloud/dagger/traces/1a6ca248cce0609b21afd7b7c90a81f2
(be sure to click the 'update' button)
https://github.com/dagger/dagger/pull/10118 includeDependencies is ready to merge after @copper panther takes 1 more peek monday, which means we'll have the trivy demo working on main.
i've also started setting up a mutagen filesync + dagger mcp demo. i'm having some struggles with getting cache volumes to share as I'd expect them to, but it feels like user error more than an actual engine problem, so i'm shelving that to reexamine on monday. if i can get it working, the demo will require lots of ugly and error prone setup, BUT we'll be able to make a ts unit test fail on purpose and then ask goose to fix it with its "developer" tools enabled
I am doing some introspection on one simple daggerized mcp server and I see we're exposing a think function (alongside the select_tools)? Is it part of the spec?
Not the MCP spec per se, but it helps the LLM understand how to use select_tools.
Is it common practice for other MCP implem?
orthogonal to MCP
that tool also exists in prompt mode and for async agents, it measurably helped with evals
What we do is not really common practice MCP because what most people do is expose static functions with a description and let the LLM handle the rest. We expose MCP tools that allow you to find the right dagger function to call. The way to navigate that is a bit more complicated and it helps the LLM a lot to ask it to think.
interesting, seems to be something that should be formalized in the spec at some point
think is an increasingly common practice for tool calling in general, like zed has a built-in (non-mcp) think tool that it enables in "ask" mode and disables in "write" mode, but i don't really see how that relates to the model context protocol
I think the decision to include think in the core deserves a github discussion thread. It's not a trivial decision. It's commonly an outside tool, not in the core of any framework
Also I would love a discussion thread on this mutagen filesync idea. It sounds cool but I have no idea what it does or how it works. Beyond a certain level of investment (ie a few days full time) it starts being too much without a prior discussion
yep yep, we're in strong agreement here, and i have 0 intent to take it any further than a demo to show the vibes w/ an external MCP client and further spark a discussion of what it might look like as an actual feature.
as for how it works: in userspace it's pretty violently faff-y, we run a mutagen agent service in dagger, pointed at a shared cache volume. outside of dagger, on the host box, we point a mutagen client at the dagger-hosted mutagen agent service. that sets up a sync that keeps a host directory matching the cache volume directory in its contents. further sessions can use that cache volume to read and write files as though they were bind-mounted on the host.
this all gets wacky really fast. the wire protocol for mutagen uses ssh, so restarting the mutagen Service means you gotta clean up known hosts on the host machine. sessions sometimes seem to interfere with each other and take down the mutagen sync, etc, etc
afaik think was part of the evals driven improvements Alex worked on for 0.18.5, right?
yep
We can stand by our decision to embed think but we should have our eyes open about the fact that it's unusual and probably involves tradeoffs. Users will probably question it too
if i had to gamble i'd bet that the conclusion of this discussion will be a flag to make it configurable... i feel like it makes sense to include in prompt mode by default, but considering MCP it gets weird fast if there are other think tools available, and considering async agents it's not crazy to think that folks might want to implement their own or exclude it altogether to encourage the agent to "just do it"
dependency constructors should be LLM-usable now in prompt mode/dagger mcp cc @tardy shadow @proper ledge - aka the demo from 2 weeks ago is working on main
@proper ledge @weary stag is it possible to start dagger mcp just with the core library?
getting this currently:
10 : β ! module not found
1 : dagger mcp ERROR [0.5s]
1 : ! module not found and --core not specified
14 : moduleSource(refString: "."): ModuleSource!
14 : moduleSource DONE [0.0s]
--env-privileged
The error message is not helpful because the flag was renamed and we forgot to rename it in the error message but I believe @tardy shadow fixed it already
thx
I'm also getting very inconsistent errors while using goose and v0.18.5. I see a bunch of these messages:
which LLM ?
If it's Gemini Anthropic, I believe you're hitting a known issue, or at least i've seen this before. I could be wrong but i've seen 400 errors before with Gemini Anthropic
sonnet 3.5
I'm mostly trying to make the default coding agent module https://docs.dagger.io/quickstart/agent?sdk=go in goose
I meant Anthropic
Gemini gives me random FinishReason(10). I guess I can try Chat GPT 4o
that's old, try upgrading
was trying to get a very simple dagger mcp demo working but fails with a very simple create an alpine container and install curl
what version are you currently in?
1.0.18
i think we had welcome fixes in 1.0.18 but 1.0.21 is out
woot, I ran goose update and I'm still in 1.0.14 somehow, lol
hash -r ?
if you brew installed it originally it puts it in a different path where the selfupdate doesnt work
lol linuxbrew of course
but same might go for certain distros package managers, wouldn't surprise me
shame on you for not using bin https://github.com/marcosnils/bin/
Effortless binary manager. Contribute to marcosnils/bin development by creating an account on GitHub.
cool, I have a sweet demo working now. Thx @proper ledge and @dark egret
i think i'm like an inch away from a working mutagen+mcp demo, but i keep getting rate limited by anthropic... time to appease the LLM spirits
lol my offering worked, but i still exceeded rate limits so the demo looks bad... gotta try o3 now i guess
... also holy shit, while it was failing to read the second run of tests back into claude bc of rate limits, my local agent setup:
- correctly updated node dependencies
- added a test preserving the "original" test behavior + code to make the intentionally broken test pass
- modified my .dagger/main.go for x-platform runs (so it could run npm run test:unit run on the host or via HelloDagger.Test without one of them failing)
meanwhile i was just hitting the tokens/minute limit and ignoring changes it was making
omg it also fixed the demo with the x-platform runs fix somehow
but i threw it away
it's amazing ahahah
and woooooo got it all working on o3... fixed my rate limit problems with a .goosehints that says don't list files recursively to infinite depth, it'll overload the context window and exceed rate limits. control depth or use rg to find what you need. (bad goose defaults for their "developer" toolset, not our fault)
more interestingly, i had to futz with the module setup to force in-session test reruns, WithEnvVariable("CACHEBUSTER", ...) is insufficient in shell/prompt/mcp environments, instead we've now gotta do
```
// Return the result of running unit tests.
// Results from previous runs are cached by idempotency key.
func (m *HelloDagger) Test(
	ctx context.Context,
	// Supply a new idempotency key to force a new run.
	// +default=""
	idempotencyKey string,
) (string, error) {
	// idempotencyKey is intentionally unused in the body: per the doc
	// comment above, it only has to differ between calls to force a rerun.
	return m.BuildEnv().
		WithExec([]string{"npm", "run", "test:unit", "run"}).
		Stdout(ctx)
}
```
(also tq to @thorn sand for sanity checking me when i wasn't sure if the caching was doing what i thought it was doing lmao)
tried to take the demo a little further just now and compose 3 modules together, but hit a wall:
- hello-dagger (with my added cachevolume + mutagen mounting bits, usable by goose MCP client)
- dev-env (takes a container and layers writefile, readfile, listfiles, etc on top)
- agent (toy-programmer-esque llm.withEnv(env.withHelloDaggerInput.WithDevEnvInput).withPrompt.Sync, return helloDagger.BuildEnv)
hit a wall i wasn't expecting that makes this less cool to show, tool routing seems to work great but "cannot retrieve path from cache" errors prevent me from using dagger fns to manipulate files on the cache volume. i can probably work around this via withExecs. the demo also suffers from the obvious problem of the additional module necessitated by the lack of self-calls, but we're working on that.
i'll show the code anyways at demos tomorrow because i think the interface is interesting, and once we have self-calls it's potentially quite slick
@warm wolf this is where the MCP work is happening btw
@dark egret I think mutagen stuff is a distraction. just being honest
As an impl? Or do you mean host "bind mounts" with any underlying implementation are a distraction?
I mean short term, trying to fit that in on top of the rest in the first demo
you mean for showing externally?
quoting myself @slow sierra , approach and intents have not changed- i'm just playing around with UXes we can potentially enable, mutagen is just a hack to make the basics work. this is not part of any external-facing thing.
yep yep, we're in strong agreement here, and i have 0 intent to take it any further than a demo to show the vibes w/ an external MCP client and further spark a discussion of what it might look like as an actual feature.
Darren was just telling me about MCP sampling & roots
After we're done with the first MCP demo, we should target getting MCP working with some popular frameworks like google-adk. The blocker seems to be with our select_tools that keep looping (like with some other clients), it'd be nice to have the ability to disable the dynamic tooling with something like dagger -m my-module mcp --static.
It's a good question, I don't know if agent frameworks support dynamic tools natively. I know it's definitely an issue with the apps (Cursor, Claude, Windsurf).
In any case we need support for static tools. And once we have it, we probably will just get rid of dynamic tools. Not a lot of benefits to having 2 competing MCP implementations, it's hard enough to polish one
btw once we have generated clients, I think MCP support in frameworks becomes less important, because you can just integrate with their native tool calling system, which will probably work better.
tldr apps > frameworks in the priorities for MCP clients to support
After trying github's official mcp server (running it with docker) with google-adk, what I think would be great to have is the ability to use any of my dagger modules (ignoring the language they are written in) with a simple python agent that uses one of these frameworks.
I agree but I think it will be easier to achieve that with generated Python and Typescript clients, because it's more mature than mcp and has less moving parts
(but if it ends up working better with mcp, that's great too)
Not a lot of benefits to having 2 competing MCP implementations, it's hard enough to polish one
the hard part is gonna be if the dynamic scheme works better in evals, this logic becomes less clear, or at least we may have to sacrifice capability for simplicity... plus if it does it's totally possible we can push MCP clients, over time, to support dynamic tool calling
yeah good point. let's cross that bridge when we get to it.
Personally my money is on "GraphQL query builder over mcp"
raw graphql BUT with a twist: instead of learning the schema, we guide the model to build the query one function call at a time
then a separate tool to send it
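A toy Go sketch of the "build the query one function call at a time, then send it" idea. `QueryBuilder`, `Select`, and `Build` are hypothetical names, arguments are passed as raw GraphQL text to keep it short, and the sending step is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// step is one chained function call: a field name plus raw GraphQL arguments.
type step struct {
	field string
	args  string
}

// QueryBuilder is a hypothetical accumulator; each tool call from the model
// would append one step instead of the model writing the whole query itself.
type QueryBuilder struct {
	steps []step
}

// Select appends one function call to the chain.
func (q *QueryBuilder) Select(field, args string) *QueryBuilder {
	q.steps = append(q.steps, step{field: field, args: args})
	return q
}

// Build nests every step inside the previous one and closes all braces.
// It returns an empty string when nothing has been selected yet.
func (q *QueryBuilder) Build() string {
	if len(q.steps) == 0 {
		return ""
	}
	var b strings.Builder
	b.WriteString("query")
	for _, s := range q.steps {
		b.WriteString(" { ")
		b.WriteString(s.field)
		if s.args != "" {
			b.WriteString("(" + s.args + ")")
		}
	}
	b.WriteString(strings.Repeat(" }", len(q.steps)))
	return b.String()
}

func main() {
	q := &QueryBuilder{}
	q.Select("container", "").
		Select("from", `address: "alpine"`).
		Select("withExec", `args: ["apk", "add", "curl"]`).
		Select("stdout", "")
	fmt.Println(q.Build())
}
```

A separate "send" tool would then submit the string returned by `Build`, so the model never has to learn the schema up front.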
yeah that does seem the natural choice
i'm just glad we're starting to figure out evals, without data it scares me even contemplating decisions like these XD
Not an easier dev loop though: every time I add a function or change an argument, I need to re-gen the client and update the tools. With MCP, I let the LLM figure it out dynamically.
sooooo with @clever oasis back and OTEL API stuff resurfacing the big question is: do you still wanna work out the whole static tool calling scheme? cc @proper ledge
no strong preference here - happy to let others carry the torch on that if there's interest
I could use a checkpoint on what our current LLM-facing API is, how it may or may not change in the future... It feels like a black box to me and also feels like it might churn completely from one week to the next. And I wrote the original version... I imagine everyone else feels even more that it's a black box.
Are we in a place where we can start stabilizing the API?
Are there big WIPs that we should be aware of?
eg. "declare_output" seems cool but I don't fully understand the implications
my impression is that the LLM facing API is still deeply unstable
also we haven't resolved the discussion on current state of Env which doesn't match anyone's original vision - not LLM-agnostic; not simple; etc
+1 (to connor) - I see it as purely an implementation detail of better eval performance (+ shell DX), subject to change at any moment, and nothing should ever make assumptions about it
I agree with that theory but in practice, a lot of external-facing design discussions are heavily influenced by LLM-facing constraints
or maybe that's no longer true, and we're back in safer territory where API and implementation can be decoupled like with regular software?
do you have an example offhand? (don't doubt it just wanna make sure i'm tracking)
trying to remember / disentangle
i can think of the other way around, like static vs. dynamic tool schemes being influenced by limited clients, and external factors like prompt caching nudging us away from dynamic
The one that's occupying my mind right now is declare_output. Every time I saw it happen, it was in the context of a shell session, and it was very useful - basically replaced the old set and the LLM was able to set a shell variable for me.
But, it kind of broke my brain when trying to connect it to the system of externally-defined inputs and outputs. Can any LLM add any outputs to its environment at any time? Is that a good thing?
ah - that's exposed in the API as dag.Env(writable: true)
Also having a baked-in think tool which is a non-trivial design decision (kind of like including something in core dagger api or not). ie sophisticated devs will care about that as it might affect their other tools
OK, so I guess that's an example of internal constraint changing the external API right there
privileged has the same problem
and there was that whole convo about "levels" of privilege etc
that shit we should try to stabilize asap
Yeah but at least we had a semblance of API design discussion for privileged - what scares me is when things pop in the API that has no visible API design thread anywhere in the record. That scares me because implementation changes very fast - if API starts moving at the rate of implementation we're in trouble
Like writeable I had never even heard mentioned ever until just now
So my request is - can we bring the LLM parts of the API back under the usual bikeshedding system. There was a sort of implicit exception made for that part because of the unique constraints. Wondering if now we can go back to normal or too soon basically
(I know our bikeshedding is not perfect and open to improving it in any number of ways - but that's orthogonal I think)
probably soon yeah - my mentality has been to just make a best guess at everything assuming we'll have one final round of bikeshed later, so maybe it's that time now?
i think still too soon but you are 100% correct to apply pressure that we need to be trying to get there asap (basically what alex said)
Well like you said, for changes that are truly only LLM-facing, it's OK to consider them implementation (never will be 100% since how LLMs understand our tools leaks into how they will respond to user prompts... but anyway close enough)
For LLM-facing changes that directly affect the API (ie. new flags in Env etc), then at the very least we should regain the habit of treating them like API changes, ie. notifying people, have an issue or a place to discuss or at least collect feedback for later. We can still do that and fast track it, we've been known to do that elsewhere.
Also just double-checking, did we get around to marking LLM and Env as experimental? Cause clearly they still are based on this conversation
yep, both are marked
@tardy shadow @proper ledge i switched dagger-to-agents to use jeremy's trunk trivy module, should be on main now (i meant to PR but my upstreams are set up weird so i accidentally force-pushed, but i don't think i rewrote any history)
oh yo @proper ledge hmu for a review when you get the in-process stdio tees open as a PR, would love to see that on main
FYI starting a discussion on broadening out support of MCP spec over time - thank you @blazing mortar for starting this! https://github.com/modelcontextprotocol/modelcontextprotocol/issues/362#issuecomment-2859692113
uh this is a bit awkward, but we don't currently even support tools as far as the MCP Client list is concerned. dagger mcp is an MCP server, not an MCP client. in the future it'd be great to get on the client list, but we'd need to ship client support for one of those columns first. currently that's not at the top of our list of priorities given that dagger modules already provide quite a bit of the programmability of MCP servers, plus i'm still holding out hope that we'll get a blessed go client mcp library
Speaking of which, i think we should reserve the dagger mcp namespace and do dagger mcp serve instead for what currently is dagger mcp
With Solomon's MCP client runtime -- doesn't that unlock the client use-case ?
yeah [when it has tool calls working](#agents message)? (very plausible i'm misunderstanding what solomon got working especially if y'all are currently located in his basement lol)
oh no, you're 100% right -- I was already mentioning that from the future
@dark egret you are completely right, I had missed the fact that the issue was about Dagger as an MCP client... So my comment was completely off topic. I corrected my comment accordingly. Thanks for catching
merged!
plz ser I need to be able to add my MCPs to my dagger modules
I'm working on a mcp runtime for dagger, could you share links to the MCPs you want to add, so I can use them as reference?
Right now I use https://www.firecrawl.dev/mcp a lot
Also https://github.com/mark3labs/phalcon-mcp for analyzing blockchain stuff
MCP server that integrates with the BlockSec https://blocksec.com platform to provide blockchain transaction analysis. - mark3labs/phalcon-mcp
is there a dagger mcp --no-mod/-M equivalent?
like if i just want a global core dagger MCP server
yes it's --env-privileged
ah my bad, i think we just talked about having a -M
--env-privileged adds the core api, and if you dont have any modules it will only have core, but i dont think we handle -M
happy to add it
ah ok it at least does the right thing if i run it outside of a module
yes
would be nice, for consistency, but not blocking
Anytime and anything for dagger & team. Thank you.
@slow sierra with the sdk/runtime approach to MCP clients, would we retain the ability for users to configure the module container with their arbitrary secrets, base images, withexecs and whatnot? i thought i was gonna find an example here but i suspect i'm looking in the wrong place
@proper ledge did you say you had code pushed somewhere for static tool calling?
yes but not working as i wanted and didnt finish it. https://github.com/tiborvass/dagger/commits/mcption/
i wanted to also do a version where you pass in a receiver to list_tools so that it only lists the tools of that object
oh wow just noticed it's merged
Baked into a closed-source fork of an open-source fork of a web browser
finally i got off the preview channel where it was asking me to update 6 times daily lmao
i think i'll do the same
@dark egret The special bindings for workdir (and result_dir) are just stopgaps for demos. I was reassured by the hack because mcp also has the notion (albeit weird) of Roots. But i'm still not sold on it, i'd rather expose host tools in a secure way.
i have a proposal, writing the issue now
trying out the new static tool scheme with Claude Code, since it should theoretically work now - but it fails when I run out of the dagger repo, with "error": "Connection failed: Connection to MCP server \"dagger\" timed out after 30000ms",
is there any way we could start the MCP server eagerly and load the module async?
it's claude code that's timing out ?
it does but that's all it says
my guess is it bails before it sees the stdio mcp init response
(btw would love a review on https://github.com/dagger/dagger/pull/10341)
(oh i guess needs a rebase)
are you saying that it times out when it's not being used ?
it times out because the MCP server takes too long to start, since it's waiting for the module to load
once the caches are warm it's fine
aaahh
ok thanks, yeah that's bad
yeah it should be lazy. i'll think of a fix during lunch
cool cool
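For reference, a minimal sketch of the "start eagerly, load the module async" idea discussed above. This is not the actual fix: `lazyModule` and its `load` function are hypothetical stand-ins, and the only assumption is that the server can answer the MCP init handshake before the module has finished loading, blocking only when a tool is first needed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lazyModule wraps a slow loader so the server can answer the MCP
// initialize handshake right away and only block when tools are needed.
// Hypothetical type, not part of the dagger codebase.
type lazyModule struct {
	once  sync.Once
	load  func() ([]string, error) // hypothetical loader returning tool names
	tools []string
	err   error
}

// Start kicks off loading in the background without blocking the caller.
func (m *lazyModule) Start() { go m.Tools() }

// Tools blocks until the module has loaded (the loader runs at most once)
// and then returns the cached result.
func (m *lazyModule) Tools() ([]string, error) {
	m.once.Do(func() { m.tools, m.err = m.load() })
	return m.tools, m.err
}

func main() {
	mod := &lazyModule{load: func() ([]string, error) {
		time.Sleep(50 * time.Millisecond) // stand-in for a cold module load
		return []string{"list_methods", "call_method"}, nil
	}}
	mod.Start()
	fmt.Println("initialize: responded before the module finished loading")

	// The first tool listing waits for the background load to finish.
	tools, err := mod.Tools()
	fmt.Println(tools, err)
}
```

With this shape the client sees the init response well inside its timeout, and a cold cache only delays the first tool listing or call.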
also seeing this weirdness:
● dagger:call_method (MCP)(method: "Container.withExec", args: {"args":["apk","add","--no-cache","cowsay"]}, self: "Container#2")…
  ⎿  Error: Error calling tool call_method: undefined
the call doesn't show up in the trace, almost seems like Claude Code is messing up something? sometimes it works, like this one was fine:
● dagger:call_method (MCP)(method: "Container.withExec", args: {"args":["echo","test"]}, self: "Container#2")…
  ⎿  {
       "result": "Container#3"
     }

oh - it's just really confusing output, nvm
it puts the logs above it
@dark egret seeing something odd - module dependencies are listed, but trying to actually call them fails 
it works in dagger shell: https://v3.dagger.cloud/dagger/traces/6bd9962b90932cc3f4fb13d6d1726da4#c63e8573f74f69e6
but not over MCP: https://v3.dagger.cloud/dagger/traces/b785bbbe57ee0d06c6f5acb5173ed199?listen=303716c247ae0c87&listen=7fe048fcb162cc9f
that is definitely weird, the trivy demo would also be broken in this case
may have figured it out - env privileged: true translates to storing the root dagql.Object on the Env, we probably need to have it use the one on *dagql.Server instead. working on a fix but gotta nuke my dev volumes first so it may be a while before i can confirm
(confirmed)
I'm sorry, I'm confused on why
it's pretty confusing, but basically all the MCP code needs to respect the *dagql.Server that's passed in, so it sees the caller's API, instead of just the core API
for env(privileged: true) we were grabbing the Root() of the core API, and then at runtime we would use env.Root(), instead of srv.Root() (the *dagql.Server passed in)
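A rough sketch of the shape of that bug, using stand-in types only. `Server` and `Env` here are hypothetical simplifications, not the real `*dagql.Server` or Env types, and the two helper functions are illustrative names.

```go
package main

import "fmt"

// Server is a stand-in for *dagql.Server: it knows which API the caller sees.
type Server struct{ root string }

// Root returns the root object of this server's API.
func (s *Server) Root() string { return s.root }

// Env is a stand-in for the environment, which used to cache a root object.
type Env struct{ cachedRoot string }

// rootBuggy ignores the server passed in and returns the cached core root.
func rootBuggy(env *Env, srv *Server) string { return env.cachedRoot }

// rootFixed resolves the root from the server passed in at call time.
func rootFixed(env *Env, srv *Server) string { return srv.Root() }

func main() {
	core := &Server{root: "core API"}
	caller := &Server{root: "core API + module dependencies"}

	// The privileged env grabbed the core root when it was created.
	env := &Env{cachedRoot: core.Root()}

	fmt.Println("buggy:", rootBuggy(env, caller))
	fmt.Println("fixed:", rootFixed(env, caller))
}
```

The buggy path is why dependencies could be listed but not called: the lookup ran against the core API instead of the caller's.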
nice catch
@clever oasis merged @proper ledge 's mcp stdio PR so i can stop scripting and manually process managing my mcp processes on 18.7 
woo with proper config, can confirm trivy demo works real good and easy... kinda crazy how much more stable it feels
other thing i'd recommend to folks is if you've got some mcp client config like
dagger:
  args:
  - mcp
  - -m
  - ~/src/dagger-to-agents/hello-dagger
  - --env-privileged
  bundled: null
  cmd: dagger
  description: "dagger functions for the hello-dagger app"
  enabled: true
  env_keys: []
  envs:
    OPENAI_API_KEY: foo
    DAGGER_LOG_STDIN: /tmp/debug.stdin.log
    DAGGER_LOG_STDOUT: /tmp/debug.stdout.log
    DAGGER_LOG_STDERR: /tmp/debug.stderr.log
    # _EXPERIMENTAL_DAGGER_RUNNER_HOST: docker-container://dagger-engine.dev
    # _EXPERIMENTAL_DAGGER_CLI_BIN: ~/src/dagger/bin/dagger
  name: dagger
  timeout: 300
  type: stdio
you should really not hesitate to use "description" field, like until we get roots support wired up, it's not cheating to tell the client what module we're pointed at and what it's for
Going in the docs
at least with 4o, that lil description field seems to have totally avoided the problem i was having before where it wants to immediately ls -R to figure out wtf is going on in the project dir
did this get sneakily fixed on main? i was expecting to hit it when demo-assembling today but haven't seen anything that looks like this (yet?)
it's possible that it only broke on my branch, from a refactor
ah cool
sweet lord @clever oasis the static tools PR is impressive... i set it up in zed, and accidentally created a profile where all the zed built-ins are disabled. i broke a test intentionally, then prompted "i've got a broken test, can you run it and fix it plz" ... claude 3.7 did all the test running, that part i expected, but even in the absence of client-provided edit_file tools, totally inside the sandbox, it globbed its way down into the appropriate test and impl files, then pulled out contents, Directory.withNewFile'd new contents back in, and then reran tests with the changes. It did all this on the first try with intentionally vague prompting.
sick
Claude does seem especially good at our static scheme, for whatever reason - it doesn't even seem to need the system prompt: https://discord.com/channels/707636530424053791/1371932924299182100
incredibly sick, this makes me think that fully sandboxed edit tools are way more feasible than I thought they were a week ago
like we were discussing with goosemic
don't block on me for these tool scheme prs, i'll catch up
(or at least I'll try)
appreciate the ping though
soooooo with static tools + hostdir reload, i just got a vibe-code-coverage-improvement demo mostly working in zed. their "follow edits" feature makes the demo look great. however, out the box, there are a couple issues:
- I needed to configure a zed agent profile with access to file read and edit tools, but without the terminal tool. with the terminal tool, it'll either entirely ignore dagger or it'll get confused about sandboxing vs the host shell.
- Because static tools doesn't get super descriptive about what all can be done with the loaded module until the agent calls the tools, the agent needs to be coaxed to explore dagger functionality at all. i added a zed "rule" that says "When developing code, when you'd normally run a command in your terminal, instead you should look through available dagger objects and methods to accomplish the task." this way it'll actually explore the API.
- It did get confused about sandbox state at some point... with host dir reloading, some things reload automatically, but you can also save off old state for later use, and it'll find ways to run tests in old containers where its most recent changes aren't reflected. if you stop it and explain what's happening, it stops getting confused and acts right. i'm experimenting now with a zed rule that injects this context.
- zed, unlike goose, preserves a single instance of the mcp server for the whole project, rather than per-chat. this makes repeated runs w/ new sessions kinda weird, like it'll go off and find all the objects you have lying around and use those to get results that may be from historical state.
- one interesting tidbit, claude is getting good enough at this that one time when i had terminal enabled, it did a complete end run around our MCP and figured out how to use dagger call to run tests.
friendly reminder I'm heading to the airport this evening for PTO, back Tuesday - would be nice to merge static tools so it's not lingering, whenever ya have a moment to review/approve @dark egret @proper ledge
PR is all green: https://github.com/dagger/dagger/pull/10366
onnit
oops mcp usage specifically is broken atm, know what it is, looking into it:
1 : ! error starting MCP server: input: llm.withEnv.__mcp failed to convert tools to MCP: schema of arg "self" of tool "call_method" should have a "type" entry of type string, got []string
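One possible way to handle that error, sketched under assumptions: that the offending schema declares `"type"` as a list such as `["string", "null"]`, and that collapsing it to the first non-null entry is acceptable for the MCP client. `normalizeType` is a hypothetical helper, not the fix that was actually pushed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalizeType collapses a JSON Schema "type" given as a list, such as
// ["string", "null"], into its first non-"null" entry. Schemas whose
// "type" is already a plain string are left untouched.
func normalizeType(schema map[string]any) {
	list, ok := schema["type"].([]any)
	if !ok {
		return
	}
	for _, t := range list {
		if s, ok := t.(string); ok && s != "null" {
			schema["type"] = s
			return
		}
	}
}

func main() {
	// Assumed shape of the "self" arg schema; the real one may differ.
	raw := `{"type": ["string", "null"], "description": "receiver object ID"}`

	var schema map[string]any
	if err := json.Unmarshal([]byte(raw), &schema); err != nil {
		panic(err)
	}
	normalizeType(schema)

	out, err := json.Marshal(schema)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```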
Do we want to merge it prior to the hack night? I mean we're still gonna ship it so maybe do it now yeah ahah -- as the evals still work ahah; we gotta trust the evals, but it's still a paradigm shift
afaict it works quite a bit better than before in situations where there aren't other tools involved, but may be about the same in the presence of other tools.
i'm curious if anybody remembers the details of which clients support instructions cuz that would give me a more complete mental model of relative performance (i think it might be that goose works good bc it reads instructions and zed does not)
@tardy shadow the other thing is for the hack night you can be on the released version and have dynamic or dev build and have static
Yeah just realized it after saying it
I withdraw all previous comments
afaik it's only Goose that respects instructions, yeah. but based on https://discord.com/channels/707636530424053791/1371932924299182100 if you're using Claude it doesn't even need them
this seems true w/ claude when you have no other tools. i think you may need instructions when you've got other tools that can accomplish the same goals (terminal especially)
basically what i was seeing yesterday in my agent setups stitching up other toolkits is that even claude will privilege the "terminal" tool and find that dependencies are consistently missing, and then not conclude from that that it should be using dagger
pushed a fix for this
btw, interesting - Claude Code seems to have a "TODO list" mechanism now, which is exactly what i tried for static output saving (each output gets a TODO)
ended up scrapping it, but i still like the idea of a TODO list that the model can check in with at any time
though that also means there'd be two TODO lists
oh that's interesting
@clever oasis is the input/output token count an average, or did it go up because of (or despite) the increase in number of attempts?
it's an average across all attempts, so # of attempts increasing doesn't inflate it
so overall it's just gemini doing a couple % worse (but not statistically significantly worse)
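A tiny illustration of why a per-attempt average isn't inflated by running more attempts. The token counts are made-up numbers for illustration, not eval results, and `meanTokens` is a hypothetical helper.

```go
package main

import "fmt"

// meanTokens returns the average token count per attempt,
// or 0 when there are no attempts.
func meanTokens(perAttempt []int) float64 {
	if len(perAttempt) == 0 {
		return 0
	}
	total := 0
	for _, n := range perAttempt {
		total += n
	}
	return float64(total) / float64(len(perAttempt))
}

func main() {
	// Hypothetical token counts: same per-attempt cost, different attempt counts.
	few := []int{1000, 1200}
	many := []int{1000, 1200, 1000, 1200, 1000, 1200}

	// Tripling the number of attempts leaves the mean unchanged.
	fmt.Println(meanTokens(few), meanTokens(many))
}
```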
When the number of tokens increases is it because it's searching its way to the goal via some detour ?
could be any number of things - redundant tool calls (e.g. list_methods repeatedly), stumbling and having to try different approaches, larger system prompt
If you're looking for other tools to test mcp functionality in I can recommend https://github.com/sst/opencode
so i made the llm API key loading lazy a while ago, right? i was just going through and testing that w/ dagger mcp because theoretically it means we don't have to configure OPENAI_API_KEY=lul given that we never call loop, provider, or model on the mcp codepath.
and as expected, it's no longer required. that said, while testing, i keep hitting errors like this:
{"jsonrpc":"2.0","id":11,"result":{"content":[{"type":"text","text":"failed to sync: select: process \"/runtime\" did not complete successfully: exit code: 2\n\n\u003c_type\u003e\nEXEC_ERROR\n\u003c/_type\u003e\n\n\u003ccmd\u003e\n[\"/runtime\"]\n\u003c/cmd\u003e\n\n\u003cexitCode\u003e\n2\n\u003c/exitCode\u003e\n\n\u003cstderr\u003e\nunexpected status 200: get or init client: initialize client: failed to load module: load moduleSource(refString: \"~/src/dagger-to-agents/hello-dagger\").asModule: Module!: load: load base: load: failed to stat local path: failed to stat path: failed to get requester session: session for \"twser5q6zfebjl0q0ee271x65\" not found\n\u003c/stderr\u003e\n\n\u003cstdout\u003e\nfailed to return error: input: get or init client: initialize client: failed to load module: load moduleSource(refString: \"~/src/dagger-to-agents/hello-dagger\").asModule: Module!: load: load base: load: failed to stat local path: failed to stat path: failed to get requester session: session for \"twser5q6zfebjl0q0ee271x65\" not found\n \noriginal error: get parent name: input: get or init client: initialize client: failed to load module: load moduleSource(refString: \"~/src/dagger-to-agents/hello-dagger\").asModule: Module!: load: load base: load: failed to stat local path: failed to stat path: failed to get requester session: session for \"twser5q6zfebjl0q0ee271x65\" not found\n\u003c/stdout\u003e"}],"isError":true}}
to me, this reads like the engine is getting confused about which cli-side session it should be talking to... does that make any sense to anybody?
YES, i've seen this error sometimes, even at the hackathon, and I don't know why it happens, and i don't know how it resolves itself (i think i just busted cache or idk)
i think it might only occur when you've got multiple CLIs running at once
ah possible, that can be tested
comingggg https://github.com/orgs/modelcontextprotocol/discussions/364#discussioncomment-13491704
yeah i've been stalking them a bit here https://pkg.go.dev/golang.org/x/tools/internal/mcp
Word on the street is that they plan to setup the public SDK repo this week sometime.