#Release checklist 🧵
I'm stressing out a little over getting to a stable release this week... Doesn't help that my hands are tied with the shell launch. Can we try to make a checklist together @nocturne laurel @bright ore @mighty storm @crystal monolith ?
Happy to help
I'm working on a PR (+ big description) for all the R&D I've been doing on the tool calling scheme: https://github.com/dagger/dagger/pull/9956
I'll go over my other TODOs and list anything that seems delegateable here
I can only talk about MCP stuff.
- [DONE] i wanted to fix a bug, that's done, will push it.
- There's another bug that's exercised only in Cursor, not sure why, it feels like a "multi-object/single-object conflict" but it doesn't make sense why it would only happen in Cursor. I'm fine deprioritizing this bug if it helps.
- About to update the PR description to show how one can use Guillaume's mcp-client to better test it.
- Need Reviews
Happy to help review things too
Pulling in ecosystem crew @tribal pewter @sharp oasis @burnt warren
One possible gap is the end-to-end CLI experience. The killer use case, for now, seems to be "write simple modules, and compose them for your personal workflows". Strong continuity from dagger shell, and strong focus on the details of the CLI UX
So the fine details of "prompt mode", how to configure and customize it, how to get data in and out of it, etc. will get a lot of usage and eyeballs - but comparatively they're the least mature part because we built it last
It's like the typical user journey is the chronological inverse from our own builder's journey. What they will use first is what we built last, and vice-versa
The Benjamin Button journey
Agreed. This is the part I'm working on content for, so if I work out a script tonight maybe that'll help show where the gaps are?
yes please!
One gap will probably be: modules that are actually composable
Right now we're getting lots of monolithic agents
Instead we want composable modules. Since the first agent you'll create is the one embedded in the CLI
I've been refining my database workspace, I can pull that out into its own repository if that's helpful? I don't feel like too many people are using agents with databases (in dagger at least) atm
Yes it would be helpful. The key would be to make it a regular module that either a human or an LLM could use
Also another papercut:
`dagger init` without an SDK: `dagger` fails to load the module. This makes it difficult to introduce a workflow based on CLI-only module composition
I can take a stab at this if it helps
It's pretty generic, just want to sort out a name - probably just going to go with database-workspace. Will push after dinner and getting the little ones down
which database? Any sql? or postgres only?
MySQL or Postgres
I would just call it sql then
Unless you plan on supporting non-sql databases, in which case call it db or something
Think of it as a precursor to a stdlib module
ok, I'll name it sql... I have used a lot of no-sql - that really should/could be a different module.
or, if you want to expose mysql-specific or postgres-specific modules, then split them into mysql and postgres (but given our DX, splitting modules with common code is probably not the place to start)
Could you please detail how to repro the expected use-case:
- You init a module without the `--sdk` field? This works
- You try anything after:
dagger_dev functions
✔ connect 1.6s
✘ load module 0.6s
! failed to serve module: input: moduleSource.asModule module name and SDK must be set
│ ✔ finding module configuration 0.6s
│ ✘ initializing module 0.0s
│ ! input: moduleSource.asModule module name and SDK must be set
│ │ ✘ ModuleSource.asModule: Module! 0.0s
│ │ ! module name and SDK must be set
Error: failed to serve module: input: moduleSource.asModule module name and SDK must be set
- You expect that to work ?
Once 17.1 is out we'll want to update every example on the list to that release and make sure they're all in good shape
Yes that's right. At step 2 you can just call dagger
I reported the issue, but I think I did it on discord... Now I can't find it. Should have made a github issue
@nocturne laurel assuming we release tomorrow: what do we do about multi-object?
Do you want to charge ahead in the evals branch? But if so, we lose the current UX for prompt mode right? Can't give variables without expanding them in the prompt - can't receive variables back
This is my current plan yeah. Personally I prefer expanding vars in the prompt, since it makes it totally in line with how prompt vars otherwise work, and keeps things very clear if you change variables over time. For me, the magic $_ variable also replaces the need to have the LLM set vars, since it aligns with the functional paradigm (LLMs just return a value, they don't muck with your environment and make up names with inconsistent schemes etc).
This is "just, like, my opinion" but I think it's worth trying since IMO it feels very aligned with our broader paradigm outside of LLMs - it aligns with functions, and even in shell we already wanted a "magic var" to refer to the last value returned, so reusing that proposal here feels good to me
- For giving variables to llm: would be cool to "merge" them into the current object scope, like an overlay. So they would show up as tools
- For receiving variables from llm: if `set/get` variables was too heavyweight, maybe just a `_return` builtin tool? We could enshrine that a llm can behave like a function, ie. it can return a single value
_error / _return -> works well together
Yeah I like this idea, since it'll be a more explicit signal that it thinks it succeeded at the task you gave it
we could even say "if a model ends its turn without returning, that's when we hand control back over to you"
might help if the LLM API has a way to specify what return type you expect? Then it would be _return_Container or something
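A minimal sketch of the `_return` / `_error` idea discussed above, assuming a turn is just a list of tool calls. `ToolCall`, `Outcome`, and `RunTurn` are hypothetical names for illustration, not Dagger's API. The turn ends as soon as the model calls `_return` or `_error`; if it calls neither, control is handed back to the user.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is a hypothetical record of one tool invocation made by the model.
type ToolCall struct {
	Name string
	Arg  string
}

// Outcome describes how a model turn ended.
type Outcome int

const (
	Returned   Outcome = iota // the model called _return
	Failed                    // the model called _error
	HandedBack                // the turn ended without either builtin
)

// RunTurn walks the tool calls of one turn and stops at the first
// _return or _error. All other tools are ignored in this sketch.
func RunTurn(calls []ToolCall) (Outcome, string, error) {
	for _, c := range calls {
		switch c.Name {
		case "_return":
			return Returned, c.Arg, nil
		case "_error":
			return Failed, "", errors.New(c.Arg)
		default:
			// A real implementation would dispatch the call here.
		}
	}
	// No explicit return: hand control back to the user.
	return HandedBack, "", nil
}

func main() {
	outcome, value, err := RunTurn([]ToolCall{
		{Name: "container"},
		{Name: "_return", Arg: "Container#1"},
	})
	fmt.Println(outcome == Returned, value, err)
}
```

A typed variant (the `_return_Container` suggestion) could be layered on by generating the builtin's name from the expected return type.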
I really don't mind that system existing at all, I might even use it occasionally, but I find it confusing that a llm can ambiently discover some things (core API, current module & dependencies) but not others (bindings you specifically gave it already). I think in that context, there will be a lot of support requests like "what do I do?" and we'll be copy-pasting "did you make sure to expand the variable in the prompt?"
So adding them as tools overlay seems like a good addition IMO
My only concern with that was the weird asymmetry if some bindings can be written to (variables) and others can't (functions). But if the llm can't set variables anymore, and can only return a value, then that asymmetry is resolved
I'll play with it - really hard to say without seeing what it does, IME it's pretty delicate so keeping the toolset as focused as possible is a high priority. Maybe even having a tool to just list the current vars would help, but even that might end up being called all the time, confusing it into thinking vars are super important when really they're just the jumping off point
personally referring to values in the prompt felt the most intuitive to me, and I even suspect it's what most people would try if they know about prompt vars, but my expectations don't always match reality
another option is to have the act of setting a variable inject something into the history, like "The variable $foo has been set to Container#1". That's actually what happens on main at the moment but I removed it for llm-evals
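A rough sketch of that history-injection option, assuming a session that keeps variables and a message history side by side. `Session`, `Message`, and the `"user"` role are assumptions made for illustration; only the message wording comes from the thread.

```go
package main

import "fmt"

// Message is a hypothetical chat history entry.
type Message struct {
	Role    string
	Content string
}

// Session is a hypothetical LLM session holding variables and history.
type Session struct {
	Vars    map[string]string // variable name -> reference such as "Container#1"
	History []Message
}

// SetVar stores the binding and records the change in the history,
// so the model learns about it without a dedicated tool.
func (s *Session) SetVar(name, ref string) {
	if s.Vars == nil {
		s.Vars = map[string]string{}
	}
	s.Vars[name] = ref
	s.History = append(s.History, Message{
		Role:    "user", // assumption: the notice is injected as a user message
		Content: fmt.Sprintf("The variable $%s has been set to %s", name, ref),
	})
}

func main() {
	s := &Session{}
	s.SetVar("foo", "Container#1")
	fmt.Println(s.History[0].Content)
}
```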
@nocturne laurel to be clear I'm proposing no new tool for listing variables
yeah - I thought you were proposing one-tool-per-var
just add them to the list of functions. Basically llm should not see the difference between a var and an actual function in the current object
main difference I guess, is that the vars stayed overlaid on top of all objects across selection
and maybe they are not prefixed with Type_?
none of which should require more "thinking" IMO
you're suggesting bringing back select_varname right?
that would make MCP support harder (same as system prompt), so last resort IMO
No, something simpler. Let me illustrate:
dag.LLM().SetString("foo", "bar").WithQuery().WithPrompt("what tools do you see?").LastReply()
I see the following tools:
- `container` which returns a `Container`
- `git` which returns a `GitRepo`
- `foo` which returns a `string`
withPrompt("now create a container and select it. What tools do you see?")
Now I see:
- `Container_withExec`
- `Container_rootfs`
- ...
- `foo`
oh i see, for non-object values? or both? if it's an object, does it change the selection?
and i suppose calling foo would return {"result":"..."}? (to be consistent with the current scheme on llm-evals)
i'm a little wary of doing something too different from regular prompt engineering here, since there are existing patterns even for passing large string values into a prompt
but, it's an interesting mechanic for it to be able to recall values later, rather than relying on message history
I guess both? For objects, I would keep the same behavior as everything else
yeah the more consistent the better
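The overlay illustrated above can be sketched as a tool list built from two sources: the functions of the current selection (prefixed with the type name once an object is selected) and the user's variables (never prefixed, visible across selections). `Tool`, `Env`, and `Tools` are hypothetical names, and the empty string standing for the root scope is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical description of one tool shown to the model.
type Tool struct {
	Name    string
	Returns string
}

// Env is a hypothetical environment: the selected type, the functions
// each type offers, and the variables the user has bound.
type Env struct {
	Selected  string            // "" stands for the root scope in this sketch
	Functions map[string][]Tool // functions keyed by type name
	Vars      map[string]string // variable name -> type of its value
}

// Tools lists the functions of the current selection, then overlays one
// unprefixed tool per variable so variables look like ordinary functions.
func (e Env) Tools() []Tool {
	var tools []Tool
	for _, fn := range e.Functions[e.Selected] {
		name := fn.Name
		if e.Selected != "" {
			name = e.Selected + "_" + fn.Name
		}
		tools = append(tools, Tool{Name: name, Returns: fn.Returns})
	}
	// Sort variable names so the tool list is stable between calls.
	names := make([]string, 0, len(e.Vars))
	for n := range e.Vars {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		tools = append(tools, Tool{Name: n, Returns: e.Vars[n]})
	}
	return tools
}

func main() {
	env := Env{
		Functions: map[string][]Tool{
			"":          {{Name: "container", Returns: "Container"}, {Name: "git", Returns: "GitRepo"}},
			"Container": {{Name: "withExec", Returns: "Container"}, {Name: "rootfs", Returns: "Directory"}},
		},
		Vars: map[string]string{"foo": "string"},
	}
	fmt.Println(env.Tools())
	env.Selected = "Container"
	fmt.Println(env.Tools())
}
```

Because the variable tools are appended after the selection's functions, changing the selection swaps the first part of the list while `foo` stays put.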
What's the use-case: why not use `dagger --no-mod`? Or is it because you wanna gradually install modules and use them in your LLM agents
Just wanna clarify if it's: 1) enabling moduleSource without SDK field across the entire engine or 2) just bypassing in a smart way this security check ?
Hey @cyan vapor, what's your opinion on https://github.com/dagger/dagger/issues/9203 ? Helder said you were exploring this -- and we were thinking about taking it
Hey @fresh pivot, do you have some guidelines / opinions on how to solve https://github.com/dagger/dagger/issues/9203 ?
We'll explore a bit and add context on the issue in the meantime
yes
we can just remove the message and the error case
also remember that over MCP, you only have bindings. Would be weird to have ability for object bindings but not scalar bindings
bindings? is that an MCP term?
You mean CLI side or API side; changing the moduleSourceAsModule?
wherever the error is coming from, I think is safe to remove
essentially that code path just shouldn't error
Oki, because i've been trying to remove the checks here: https://github.com/dagger/dagger/blob/main/core/schema/modulesource.go#L2135 ; and i get some segfaults -- but pulling the thread, no worries, thanks, very helpful
Glad to know there's no big red flag
No I don't think MCP uses that term. I mean that when exposing a Dagger module over MCP (coming soon ™️) you can't inject prompts in the client LLM, you can only add bindings that will be exposed as tool calls over MCP
So "you can always inject the string in the prompt" doesn't apply when your module is consumed over MCP
i see - but when you're only consuming over MCP, won't you also not be able to set variables anyway?
(or i just haven't seen how we do that)
Well the idea is that any environment dagger can give to a llm, it can expose over MCP. So in this case it might be dagger mcp -c 'foo | bar | baz'
But even if we didn't have a practical UX for exposing variables - we should still decouple tools from prompt as much as possible
I'm leaning on MCP as a simple proxy for a more general design rule.
it also applies for eg. composition of multiple objects & LLMs
one trick I just did that helped a LOT was to expose a dummy currentSelection tool that just has "Your current selection: Container#1" as its description. With that change I no longer have to convince gpt-4o of anything after setting its state
nice
Yeah I think there are a LOT of unexplored tricks with tools alone
With MCP blowing up, I bet we're going to see a lot more of those tricks in the near future
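The `currentSelection` trick mentioned above can be sketched as a tool whose only job is to carry state in its description. `ToolDef` and `CurrentSelectionTool` are hypothetical names, and the wording used when nothing is selected is an assumption; only the "Your current selection: ..." text comes from the thread.

```go
package main

import "fmt"

// ToolDef is a hypothetical tool definition as sent to the model.
type ToolDef struct {
	Name        string
	Description string
}

// CurrentSelectionTool builds a dummy tool that is never meant to do
// work: the model reads the selection straight from the description.
func CurrentSelectionTool(selection string) ToolDef {
	desc := "Nothing is currently selected." // assumed wording
	if selection != "" {
		desc = "Your current selection: " + selection
	}
	return ToolDef{Name: "currentSelection", Description: desc}
}

func main() {
	fmt.Println(CurrentSelectionTool("Container#1").Description)
}
```

The definition has to be regenerated whenever the selection changes, since the description is the only channel carrying that state.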
also, not sure if you saw but the system prompt and additional hints in llm-evals are only enabled for Gemini
once I get gpt-4o working without a prompt, I'll go back and try Gemini again without them
and maybe Gemini 2.5 which just came out won't need them either
it's the smartest gemini ever
uh i just checked and its rate limit is 5RPM (vs 2000 on 2.0), so probably not ready for tool calling yet lol
it wrote me a short story about ordering pizza in 25 seconds
writing the story took 25 seconds, it wasn't a story about ordering pizza in that amount of time
yeah, that's the tricky bit, the error message protects from those segfaults or bad errors
you also need to fix that case to actually work
ok maybe this test is being a little extreme, but this is a little spooky: (not all LLMs fail at this)
the test:
weirdText := "-$@!&* BEGIN WEIRD FILE -$@!&*\nim some fun content\n---- END WEIRD FILE----"
return withLLMReport(ctx,
	m.LLM().
		SetString("myContent", weirdText).
		SetString("desiredName", "/weird.txt").
		SetDirectory("dest", dag.Directory()).
		WithPrompt("I gave you a variable, a directory, and a filename. Can you write the content to the specified file in the directory?"),
	func(t testing.TB, llm *dagger.LLM) {
		content, err := llm.Directory().File("weird.txt").Contents(ctx)
		require.NoError(t, err)
		require.Equal(t, weirdText, content)
	})
i think the moral of the story is "don't set anything in the environment that you wouldn't trust an LLM to read and regurgitate accurately". the realistic workaround is probably to pass a File in instead.
