#agents
mmm already on 0.17.0-llm.1
new error dropped:
wanna go on dev-audio ?
! input: llm panic while resolving Query.llm: runtime error: invalid memory address or nil pointer dereference
! goroutine 9748 [running]:
! runtime/debug.Stack()
! /usr/lib/go/src/runtime/debug/stack.go:26 +0x5e
! github.com/dagger/dagger/dagql.(*Server).resolvePath.func1()
! /app/dagql/server.go:736 +0x78
! panic({0x25ef180?, 0x4b21010?})
! /usr/lib/go/src/runtime/panic.go:785 +0x132
! github.com/dagger/dagger/core.(*SecretStore).GetSecretPlaintext(0xc000ac8450, {0x30d5110, 0xc000b22210}, {0xc0016a6570, 0x15})
! /app/core/secret.go:199 +0x1dc
! github.com/dagger/dagger/core/schema.(*secretSchema).plaintext(0xc000499f20?, {0x30d5110, 0xc000b22210}, 0xc00127a588, {})
! /app/core/schema/secret.go:182 +0x9a
! github.com/dagger/dagger/dagql.Func[...].func1({0xc000373640?, 0xc00127a588?, {0x0, 0x1, 0xc0007f61e0, 0xc000ac2610}, 0x0}, {})
! /app/dagql/objects.go:577 +0x49
! github.com/dagger/dagger/dagql.NodeFuncWithCacheKey[...].func1({0xc000373640, 0xc00127a588, {0x0, 0x1, 0xc0007f61e0, 0xc000ac2610}, 0x0}, 0xc000b22030)
! /app/dagql/objects.go:634 +0xfc
! github.com/dagger/dagger/dagql.Class[...].Call(0x3118c00?, {0x30d5110?, 0xc000b22210?}, {0xc000373640, 0xc00127a588, {0x0, 0x1, 0xc0007f61e0,
i was doing a file cuz the log is too long, discord doesn't like it
🚨🚨🚨 New pre-release: v0.17.0-llm.2. Now with Anthropic support!
To install:
curl -fsSL https://dl.dagger.io/dagger/install.sh | DAGGER_VERSION=0.17.0-llm.2 BIN_DIR=/usr/local/bin sh
Thank you @split shard @gloomy kindle @spring wave for wrangling our release system to support these pre-releases
a checksum might be missing
I installed it and it works for me at least! got Claude-3.5-sonnet to install various packages in a container
The checksum issue should be fixed now...
amazing work kyle, guillaume and vito 
np! the main learning: for folks configuring ollama, you have to configure an IP reachable from the engine - so 192.168.xx.xx or 10.xx.xx.xx, not 127.0.0.1 or 0.0.0.0. Maybe we should automatically create a tunnel to the host?
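A minimal Go sketch of that rule, in case someone picks up the auto-detection/tunnel idea. The helper names are hypothetical (nothing like this ships in Dagger), and the sketch deliberately accepts only private LAN addresses, rejecting loopback and unspecified ones because from inside the engine those point back at the engine itself.

```go
package main

import "net"

// isEngineReachable reports whether addr looks usable as an Ollama host
// address from inside the Dagger engine. 127.0.0.1 and 0.0.0.0 resolve to
// the engine itself, so they are rejected; private LAN addresses such as
// 192.168.x.x or 10.x.x.x are accepted. (Hypothetical helper.)
func isEngineReachable(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	if ip.IsLoopback() || ip.IsUnspecified() {
		return false
	}
	return ip.IsPrivate()
}

// firstReachableIPv4 scans the host's interface addresses and returns the
// first IPv4 address that passes isEngineReachable, or "" if none does.
func firstReachableIPv4() string {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return ""
	}
	for _, a := range addrs {
		ipNet, ok := a.(*net.IPNet)
		if !ok || ipNet.IP.To4() == nil {
			continue
		}
		if s := ipNet.IP.String(); isEngineReachable(s) {
			return s
		}
	}
	return ""
}
```

For example, isEngineReachable("192.168.1.20") is true while isEngineReachable("127.0.0.1") is false.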
@wraith remnant would you mind adding this to the README, in a section "setting up with ollama"?
Yes, now
@spring wave @wraith remnant quick note while preparing demo. We lost the 🤖💭 on API calls... was pretty nice for long round trips
doesnt it stream? or is this for a slow API that takes a long time to stream back?
It should stream I agree, but I had the same feeling, when using sync that sometimes we are in front of an empty step
@smoky ocean @wraith remnant try using loop instead of sync
i think i noticed sync gets hidden because we usually don't actually want to show it (i.e. container.sync)
also shell doesn't print errors coming from function calls
(sorry rehearsing demo, will follow up after)
@spring wave that change you mentioned, including stderr in withExec error under llm codepath... Is it in llm.2? If so I can simplify my toy-workspace module even further ๐
yep it's in
of course NOW the llm tries to access workspace.container()...
We may need a new pragma +nollm
or argument to llm: withToyWorkspace(hide:["container"])
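Either option boils down to filtering the function list before it is offered to the model as tools. A hypothetical Go sketch of the hide-list variant (neither the hide argument nor a +nollm pragma exists at this point):

```go
package main

// filterTools returns the tool names with every name in hide removed,
// preserving the original order. It models the proposed
// withToyWorkspace(hide: ["container"]) argument; this helper is
// illustrative only and is not part of Dagger.
func filterTools(tools, hide []string) []string {
	hidden := make(map[string]bool, len(hide))
	for _, h := range hide {
		hidden[h] = true
	}
	out := make([]string, 0, len(tools))
	for _, t := range tools {
		if !hidden[t] {
			out = append(out, t)
		}
	}
	return out
}
```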
it worked! Thank you @spring wave
New workspace code:
```go
package main

import (
	"context"

	"dagger/toy-workspace/internal/dagger"
)

// A toy workspace that can edit files and run 'go build'
type ToyWorkspace struct {
	// The workspace's container state.
	// +internal-use-only
	Container *dagger.Container
}

func New() ToyWorkspace {
	return ToyWorkspace{
		// Build a base container optimized for Go development
		// (dag is the Dagger client generated for this module)
		Container: dag.Container().
			From("golang").
			WithDefaultTerminalCmd([]string{"/bin/bash"}).
			WithMountedCache("/go/pkg/mod", dag.CacheVolume("go_mod_cache")).
			WithWorkdir("/app"),
	}
}

// Read a file
func (w *ToyWorkspace) Read(ctx context.Context, path string) (string, error) {
	return w.Container.File(path).Contents(ctx)
}

// Write a file
func (w ToyWorkspace) Write(path, content string) ToyWorkspace {
	w.Container = w.Container.WithNewFile(path, content)
	return w
}

// Build the code at the current directory in the workspace
func (w *ToyWorkspace) Build(ctx context.Context) error {
	_, err := w.Container.WithExec([]string{"go", "build", "./..."}).Stderr(ctx)
	return err
}
```
cc @shrewd ermine @bronze fern
I also removed the comments from the read() and write() arguments, because they're self-evident and the model doesn't need them. Saves me a lot of LoC
lol every line counts!
Starting a thread to coordinate launching the new repo github.com/dagger/agents
sorry Alex 
Quick hype break: https://x.com/solomonstre/status/1893059388518342704
🥤 hey everyone, quick taste test: you already know about Dagger but many of your friends still don't, so which of these bite-sized descriptions do you think would get their attention the fastest?
6
11
1
Dagger: Composable agent runtime 🤖
One more hype tweet for the road - this time hyping up @shrewd ermine https://x.com/solomonstre/status/1893080136141947252
Good luck @smoky ocean - you are going to crush it! Remember to zoom in
Everyone: this repo is the central place for everyone interested in building AI agents with Dagger: https://github.com/dagger/agents
Please star and fork
Quick request @wraith remnant: there is a typo in the README.
Once this feature is merged (current target is 0.17), Melvin will support with a stable release of Dagger.
Could you fix it please? afk at the moment...
What do you want it to be ? I don't see it ahah:
Once this feature is merged (current target is 0.17.0--llm.2), Melvin will support with a stable release of Dagger. ?
Actually, there are still several references to melvin -- should I reference dagger/agents instead?
just say "a development build will no longer be required". no melvin mention
I can do some touch ups shortly too when I submit my demos
https://github.com/dagger/agents/pull/8, can someone approve it?
Can't merge either, I have 0 rights on this repo ahahah 🤣
done. sorry! will give you permissions tonight
can you throw a repo description in the about section (top right), e.g. Dagger AI Agent runtime or something
i also think that section helps with index / observability on search
merged
I'm new to golang, can someone help me understand where this llm.WithToyWorkspace() comes from? https://github.com/dagger/agents/blob/main/toy-programmer/main.go#L15
This is a bit of a dagger-specific thing. When you install a dagger module you get bindings for all of the functions it exposes.
So in this case, llm.WithToyWorkspace() comes from the fact that the toy-workspace module is installed here: https://github.com/dagger/agents/blob/main/toy-programmer/dagger.json
--
more details below, feel free to skip if you don't want to fall into a rabbit hole (yet)
You can see the details of ToyWorkspace here: https://github.com/dagger/agents/blob/main/toy-workspace/main.go
This one is written in go, but note that dagger modules can be written in many languages (go, typescript, python, php, and more coming soon) and be consumed from any other language.
You can learn more about dagger modules here: https://docs.dagger.io/api/module-structure
Thank you!
@dim cypress note that Dagger is not Go-specific. What's your language of choice? There are SDKs for Typescript, Python, PHP
Claude 3.7 Sonnet (+ Anthropic's agentic CLI, Claude Code) released today -- need to explore; hope it's gonna be awesome for agents
Gemini Client 🧵
I guess I finally understand how dagger modules work, but my IDE (VSCode) can't resolve the module. I tried https://docs.dagger.io/api/ide-integration/ and restarted the IDE, but it's still the same
Dagger uses GraphQL as its low-level language-agnostic API query language, and each Dagger SDK generates native code-bindings for all dependencies from this API. This gives you all the benefits of type-checking, code completion and other IDE features for your favorite language when developing Dagger Functions.
ok... running dagger update fixed it
I always thought it would be awesome to use AI locally when working on projects for Laravel, so I wrote an agent that will look at my git diff and determine what tests I need to write (and also run them until they pass locally first!) - Repo is here, I used the PHP SDK https://github.com/jasonmccallister/laravel-assistant
using qwen2.5-coder:14b and the ToyWorkspace, it feels like the LLM is not really calling the function: it just explains the steps but never calls ToyWorkspace.Write. Any ideas why?
Mmmh, I haven't tried Qwen yet -- testing it out. Is your repo public? The nodeApp implementation in particular
not public, but it's simple like
Qwen tool calling
Was there a reason not to expose with-system-prompt as a top-level primitive like with-prompt? What's your sentiment around introducing it?
i think we can add it to the constructor for llm, wdyt?
since at least gemini and i think anthropic just allow setting one, it doesnt need to be chainable
so being passed as an arg
I just realized there is https://github.com/dagger/dagger/blob/llm/core/llm.go#L422 but it's added in history with Role: "system" which may not work on all providers
maybe that's what you were already talking about
yes and it's not exposed
Gemini expects the system prompt(s) to be set in the client and not in the content feed. Not sure how anthropic feels about it. Role must be either user or model https://pkg.go.dev/github.com/google/generative-ai-go@v0.19.0/genai#Content
I initially left it hidden just because of API churn: less boilerplate to refactor 5 times in a row.. now we could bring it back.
BUT
there is churn upstream also, I think openai actually deprecated the term "system" in favor of "platform" (model owner) and "developer" (mere mortals programming the agent)
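The provider differences above amount to a small translation step. A rough Go sketch, assuming a provider-neutral message type made up for illustration (it is not the llm branch's actual type): for Gemini the system text is pulled out of the history and "assistant" becomes "model"; for OpenAI "system" becomes "developer". Whether a given OpenAI model accepts "developer" is an assumption to check against their docs.

```go
package main

// Message is a hypothetical provider-neutral chat history entry.
type Message struct {
	Role    string // "system", "user" or "assistant"
	Content string
}

// splitForGemini pulls system messages out of the history, since Gemini
// takes the system instruction on the client rather than in the contents,
// and renames "assistant" to "model", the only other role Gemini accepts.
func splitForGemini(history []Message) (system string, contents []Message) {
	for _, m := range history {
		switch m.Role {
		case "system":
			if system != "" {
				system += "\n"
			}
			system += m.Content
		case "assistant":
			contents = append(contents, Message{Role: "model", Content: m.Content})
		default:
			contents = append(contents, m)
		}
	}
	return system, contents
}

// roleForOpenAI maps the neutral "system" role to OpenAI's newer
// "developer" term and leaves every other role unchanged.
func roleForOpenAI(role string) string {
	if role == "system" {
		return "developer"
	}
	return role
}
```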
That gemini PR is ready for testing/feedback/review
@shrewd ermine nice! Is it safe to merge and test from a pre-release? Or is there a risk that it breaks other things?
Totally isolated and safe
@smoky ocean coming soon to an agent repo near you...
wat!
Can't wait @shrewd ermine. How's the model performing, is the translation reliable?
I haven't given its workspace a checker yet, but it'll be checking against dagger functions matching the original
My next problem to overcome is that I'm providing all sdk reference inline with prompts and it's just too much. It's overwhelming the llm. So I'm giving the "dag-workspace" a reference lookup tool instead
@shrewd ermine @bronze fern for changes to your respective modules only, feel free to just push directly to llm
for changes that might affect others, safer to do PRs (even if they end up being fast-tracked anyway)
This feels like a good time to tag llm.3? OK with you @spring wave @shrewd ermine ?
yeah worth doing if not just for the API getter lowercasing change
i'm also cooking up some shell improvements, but that can wait til next release
which one is that?
the llm getters were all uppercase before, which we didn't notice probably because we use Go all the time
but in PHP/etc you'd end up with e.g. $llm->Container() which is weird
You mean push to https://github.com/dagger/agents too? or just talking about dagger/dagger@llm?
wtf is going on here lol - running the same command in one shell is committing necromancy on an older session I had open?! /cc @steep onyx
happens on vito/shell-bbt (ish - local changes, but CLI only). maybe related to cross-session caching...?
you're running the same thing in each right? so I guess telemetry for it is getting sent to both clients because they are both open and the older client gets updates
isn't that how the telemetry pub/sub is supposed to work (even though weird in this case)?
I was seeing an issue with directory | entries not displaying in shell but I'm afk at the moment so I can't provide any helpful details. Otherwise good for LLM.3 release
Also there isn't any cross-session dagql caching yet, that's not merged (not sure if that's what you were referring to)
yeah - they're both sessions where I run the command and then interrupt it. I guess what's surprising is that subsequent runs + interrupts always get routed to the older client, even long after it interrupted its own call
it seems like the newer client doesn't get the buildkit-level telemetry at all (it just says pending)
Oh I wouldn't be surprised if this is our old friend buildkit edge merging...
yeah could be
I hadn't factored in that shell is a very long lived client/session
seems to persist even after closing the old client. hmm
fine if i start a new one though
If you have access to the engine logs and you see any recent logs that contain "merging edges", that would help confirm
I wouldn't be surprised if buildkit ends up sending progress only to the "original edge's" client. That also aligns with it persisting even after you close the old client, and staying on the one that was already merged
0.17.0-llm.3 🧵
@spring wave you happy with your choice of MCP lib from that POC you had?
Same one @spark phoenix has been using. We're looking at fastest way to add server-side support.
I'm wondering how much I can reuse the existing bbi implementation vs. re-implementing a "Dagger introspection to X" adapter
Doesn't look like they allow passing an existing jsonschema
Do they take an openapi Schema? That's what Genai uses so we have a conversion for that now
Doesn't look like it, it's a static Go API for declaring each type
```go
calculatorTool := mcp.NewTool("calculate",
	mcp.WithDescription("Perform basic arithmetic operations"),
	mcp.WithString("operation",
		mcp.Required(),
		mcp.Description("The operation to perform (add, subtract, multiply, divide)"),
		mcp.Enum("add", "subtract", "multiply", "divide"),
	),
	mcp.WithNumber("x",
		mcp.Required(),
		mcp.Description("First number"),
	),
	mcp.WithNumber("y",
		mcp.Required(),
		mcp.Description("Second number"),
	),
)
```
Oh
@smoky ocean, I can merge my shell branch in with the llm stuff so you guys can dogfood the filesystem navigation if you want, but may be risky if you hit a bug and I can only fix it the next day. Same thing for rebasing, I can do it easily but you may need it on your timezone.
Maybe we do a quick live discussion with @spring wave tomorrow our morning?
I would love to do it if we don't find any red flags
Actually, maybe it works!
```go
type ToolInputSchema struct {
	Type       string                 `json:"type"`
	Properties map[string]interface{} `json:"properties,omitempty"`
	Required   []string               `json:"required,omitempty"`
}
```
That Properties looks a lot like raw unmarshaled jsonschema ๐
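If Properties really is raw JSON Schema, an existing schema (for example one produced for the Genai OpenAPI conversion) could be decoded straight into it instead of being rebuilt with the static builder API. A minimal Go sketch, assuming the struct quoted above is copied locally and that the library accepts a value of it as the tool's input schema (not verified here); schemaFromJSON is a hypothetical helper.

```go
package main

import "encoding/json"

// ToolInputSchema mirrors the struct quoted above from the MCP library.
type ToolInputSchema struct {
	Type       string                 `json:"type"`
	Properties map[string]interface{} `json:"properties,omitempty"`
	Required   []string               `json:"required,omitempty"`
}

// schemaFromJSON decodes an existing JSON Schema object into
// ToolInputSchema. Keys the struct doesn't model (e.g. "$schema") are
// ignored by encoding/json; nested property schemas stay as raw maps.
func schemaFromJSON(raw []byte) (ToolInputSchema, error) {
	var s ToolInputSchema
	err := json.Unmarshal(raw, &s)
	return s, err
}
```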
Trying now!
it's crazy that google makes it so cheap compared to others -- I just hope it is good in agentic flows
here's the work so far on shell-bbt - it's a bit tedious to implement, but feels like a good balance
- sticking with the change to not take over mouse input (scrolling and clicking)
- instead, commands are now printed to the scrollback when they complete, similar to before - so you can just scroll like normal
- in-progress commands are rendered with a different background, matching the prompt background
- hitting <esc> still pops you into navigation mode, but now it has the distinct background so you can tell what's scrollback vs. the navigable history
gonna do the rebase + re-merge shell-bbt + force-push /cc @smoky ocean
done
@shrewd ermine are you still using the top-level demo module? I'm going to split it up, the melvin parts will go under melvin/.
No i've been using separate top level modules. +1 to that change
done
please tell me if you find any issues!
I moved demo under melvin/demo, but later today will finish cleaning up by merging it into melvin/ proper
I'm finding myself using this pattern in every checker
```python
@function
async def check(
    self
) -> str:
    """Checks if the workspace meets the requirements"""
    cmd = (
        self.ctr
        .with_exec(["sh", "-c", self.checker], expect=ReturnType.ANY)
    )
    out = await cmd.stdout() + "\n\n" + await cmd.stderr()
    if await cmd.exit_code() != 0:
        raise Exception(f"Checker failed: {self.checker}\nError: {out}")
    return out
```
do we need a better DX for that? (specifically returntype.any -> exit_code -> stdout/stderr)
The reason I have to do this is because the raised exception does not include stderr. It just says process "foo" did not complete successfully: exit code: 1
anyway the dagger translator is getting better! Almost ready to PR I think. The prompt is something very special lol
i'm not sure how it is in Python, but in Go you get an ExecError type back that has the stdout/stderr pre-populated. I think Python raises a specific exception type that has the same fields
there's also a chance I change the errors back to directly embed, but it'll be a dive back down the frontend rabbit hole to figure out how to strip them back out
maybe something incredibly explicit like -----BEGIN STDOUT-----
I'm not sure. This is all I see when a function call fails (and all the llm sees): "foo" did not complete successfully: exit code: 1
yeah, it's intentionally omitted from the default error string representation because it makes the UI and telemetry (and anywhere else it ends up) incredibly verbose.
but the data should be available on the error type
That does sound better because at least if I'm trying to debug something I have a way to see the error. So I think I'd lean toward too verbose vs nice and clean (I know, be careful what you wish for). Here's a trace for example https://v3.dagger.cloud/kpenfound/traces/3dabde63ed3a0b6c3cfe80b69881f022?span=c37a3e11241deab4
the problem isn't necessarily that it shows it, the problem is that it's usually completely redundant with the logs also showing the same thing (only better, since they're interleaved)
not sure why they don't in that case though (https://v3.dagger.cloud/kpenfound/traces/3dabde63ed3a0b6c3cfe80b69881f022?span=258644302e896d58) - maybe it's not the first time the call was made, so it was cached?
ah here it is
Could be.. I could go back in my previous sessions to find it. BUT we should still show it even if it was cached?
the thing is, even if we do bring it back in the error string, the UI would just be stripping it back out, because otherwise in the above screenshot you'd be seeing the same thing twice
(only one would be poorly formatted)
might need to do something sophisticated like keep track of the same exact call having already errored elsewhere in the trace
hmm actually these aren't even the same
I guess. In general I'd like to see output from cached events most of the time (ignoring technical constraints). Pretty often I find myself looking at a trace and want to know what the output of a certain command was but it isn't shown because it was cached
oh right. So in the screenshot the one with no output was the one that the llm ran into. Eventually it gave up. My pipeline runs the test itself before returning the output (but after the llm has done everything) because I'm struggling with llms giving up and I want to be sure the output is good manually
this is even trickier because it looks like it was deduped at the LLB layer, not the DagQL layer - they're different calls (different space invaders, digests, etc), but they ended up the same at the LLB layer, because one of them calls withNewFile twice with the same content
it might still be possible to correlate them, though, I suppose they should have the same LLB vertex digests as outputs? maybe...?
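One way to correlate calls that were deduped at the LLB layer is to group failures by output digest. A hypothetical Go sketch, assuming, as hedged above, that the digests emitted in telemetry actually match across such calls (they may be vertex digests rather than the solver's cache keys). Span is a trimmed-down illustrative type, not Dagger's telemetry schema.

```go
package main

// Span is a hypothetical trace span: its ID, whether it errored, and the
// LLB vertex digest it produced (empty if none).
type Span struct {
	ID           string
	Errored      bool
	OutputDigest string
}

// groupErroredByDigest buckets errored spans by output digest so a UI
// could show the logs of one failure next to every other call that
// deduplicated onto the same vertex. Spans without a digest are skipped.
func groupErroredByDigest(spans []Span) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range spans {
		if !s.Errored || s.OutputDigest == "" {
			continue
		}
		groups[s.OutputDigest] = append(groups[s.OutputDigest], s.ID)
	}
	return groups
}
```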
because one of them calls withNewFile twice with the same content
Does that make things more complicated? I think it's the best primitive we have for an llm workspace write function, but maybe it needs to be synced/flattened
@shrewd ermine @spring wave @bronze fern we're going to merge github.com/dagger/agents directly into dagger/dagger. It seemed like a good idea to have a separate repo, but having a throwaway repo in the official dagger org ends up being the worst of both worlds
I am going to open a PR to dagger/dagger which:
- Updates the README to mention both CI and AI use cases
- Adds the demo modules to an agents/ directory
Thoughts?
a little bit - it means I can't do something like "oh that same exact call failed over here, I can just show it the same way in both locations" - because they're actually different calls. Instead, if I want to e.g. show the logs in both places, the UI would have to aggregate based on something else, like buildkit vertex digests, assuming they actually are the same. (They might not be - iirc Buildkit distinguishes between vertex digests and cache keys - the latter are what actually dedupes things in the solver, but the former might be what gets sent over telemetry). Would you be able to send this telemetry to Honeycomb? I have a helper script if you need. Or, if you have a command I can run myself, that might be easier
quick 30s video of the results ๐
lol (from Cursor)
Is any Cursor user not in yolo mode?
@steep onyx @spark phoenix quick engine question: how hard would it be to tweak the session-attachable that powers Container.terminal(), to support a "raw" mode without a term emulator? The idea would be to use that to quickly add the ability for the dagger CLI to act as a stdio server (for the purpose of MCP integration)
Rough concept:
```graphql
# LLM: a large language model context
# (type extensions can't carry a """description""", so this is a comment)
extend type LLM {
  """Serve the current context for remote tool calling over MCP. This takes over the current client stdio streams, and converts it to a stdio server"""
  serveMCP: Void
}
```
Not yet sure that's the right design, even for an MVP (cc @spring wave). It's pretty intrusive, also any nested module can just take over the CLI and really mess things up. Technically same thing for Container.terminal()... But this seems more disruptive
I'm just not sure what a better version is
So you want the CLI to just proxy the stdio of an arbitrary service? Probably not too much work, the terminal attachable just proxies stdio to some io.Readers/io.Writers it's given: https://github.com/sipsma/dagger/blob/5f7a342344e9bc4fcf2b734c5a548727fa6025f0/engine/session/terminal.go#L55-L55
Right now the CLI passes it this: https://github.com/sipsma/dagger/blob/5f7a342344e9bc4fcf2b734c5a548727fa6025f0/cmd/dagger/terminal.go#L13-L13
But if it passed it just the raw os.Stdin/out/err then that'd probably do it
I don't know enough about MCP implementation details to have an opinion on the best approach, but if there's some way to just proxy a network protocol rather than proxy via stdio that feels cleaner/nicer and would be able to just re-use our existing support for network tunnels
MCP also supports SSE (server-sent events, in other words a dumb http endpoint) but 1) exec/stdio is pretty widespread, and 2) since Dagger doesn't support long-running / detached services, integrating via SSE requires telling users "keep a dagger service running in a term in the frontend for the duration of your user session" which is also not great
LLM-friendly dagger cloud preview broken?
it's just v3.dagger.cloud now
looks like it
@spring wave in the TUI, this seems to drop LLM spans:
dagger shell -c 'github.com/dagger/agents/toy-programmer go-program-qa "write me a curl clone"'
I see the prompt & replies for the dev agent. But for the QA agent, I only see the tool calls and the final reply. Not the intermediate prompt
does it look OK in the web UI? trying to grok the issue https://v3.dagger.cloud/dagger/traces/890d5b52999f71cddb27d6e7a29d6030
oh interesting, I think I see what you mean. it's the second human-emoji span, in the web UI, which is missing in the TUI?
FYI I just pushed a major clean up and improvement of the melvin example. I'd love a review from those of you who are interested, I'm seeing new patterns emerge.
I'm still working on this, hoping to get it out tomorrow
First draft: https://github.com/dagger/dagger/pull/9726
https://x.com/willccbb/status/1894957149500965292
I've been wondering the same thing.
I think if we had eg. a Dagger object with "searchDiscord" and "searchDocs" functions we could build a decent "help bot" without sophisticated RAG or other information embedding at the prompt level.
if you're building a RAG system in 2025 just build a good search engine backend + let the model query it
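A toy Go sketch of the kind of searchDocs function the model would call as a tool instead of receiving every document in its prompt. Doc, searchDocs and the term-count scoring are illustrative stand-ins for a real search backend.

```go
package main

import (
	"sort"
	"strings"
)

// Doc is a hypothetical indexed page (a docs page, a Discord thread, ...).
type Doc struct {
	Title string
	Body  string
}

// searchDocs scores each doc by how many query terms it contains
// (case-insensitive substring match) and returns the titles of the top k.
func searchDocs(docs []Doc, query string, k int) []string {
	terms := strings.Fields(strings.ToLower(query))
	type hit struct {
		title string
		score int
	}
	var hits []hit
	for _, d := range docs {
		text := strings.ToLower(d.Title + " " + d.Body)
		score := 0
		for _, t := range terms {
			if strings.Contains(text, t) {
				score++
			}
		}
		if score > 0 {
			hits = append(hits, hit{d.Title, score})
		}
	}
	// Highest score first; ties keep their original order.
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	var titles []string
	for i := 0; i < len(hits) && i < k; i++ {
		titles = append(titles, hits[i].title)
	}
	return titles
}
```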
fyi: working on metrics for token usage now
seems really easy to run into token limits, to the point where I'm worried that either a) there's some silly thing consuming a bunch of tokens, or b) I'm being spoiled by AI services/tools (Cursor, Claude) that have partnerships that let them blow past the usual limits for individual accounts
@spring wave fwiw I rarely hit limits on OpenAI. But had to pre-pay $100-ish, that got me to a certain "tier" where the limits are no longer an issue
I'm trying out Claude 3.7 which has a pretty low limit atm (20k tokens per minute) - can try switching back
it was mainly a good motivator to drive out the token metrics https://asciinema.org/a/z4CRn9NLaM0RWOXoJE1tTod2B
for that ^ I changed the TUI to show metrics by default, might need more thought (usually it's behind verbosity level 3)
If you use a broker like openrouter.ai, then they have agreements to give you higher token limits.
ah interesting. maybe we should support that?
It's "OpenAI API compatible", so as long as you can change the endpoint for openai, then you can point to it.
If I'm reading this right, only those env variables will be pulled from the .env file? Any reason to limit to only those? I know there is this issue with .env support https://github.com/dagger/dagger/issues/9584
should we be using this to lower token usage with Anthropic?
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
@spring wave also make sure to tell Claude to "shut up and just do the thing", by default it will comment in detail on everything it's doing which wastes output tokens
Yeah, this is the approach (at a smaller scale) that I took here: https://github.com/dagger/agents/blob/main/dagger-programmer/moduleWorkspace/src/index.ts#L85-L90
It seemed to work much better than just putting it all in my initial user prompt. I still had to keep each lookup response fairly short to avoid overwhelming it
anthropic caching
Not sure where those HTTP requests are coming from... verbosity=1, version=llm.3
I decided they're probably parts streaming from the llm's api, but that was my own theory
those might be docker image layer pulls
I don't think I've seen them on verbosity=1 before
If someone is looking for a low-hanging fruit contribution: I would like to bring back the 🤖💭 span during API round trips
--> Alex beat me to it
It got lost in the refactors
BTW re-trying this on llm.3, still getting the problem
it's back as part of the token tracking work: https://asciinema.org/a/G0wZ4Rb3HpZBWpcUw605NjOem
still chewing on it though - it's unfortunate that it breaks the tool call chains
also not seeing the tool call spans.. Just the underlying regular spans, withExec etc
Dropping random idea for a dagger agent: AI-driven "fuzz testing" of your app. I feel like one of dagger's superpowers is being able to easily write e2e tests that would normally be a nightmare to create thanks to all the primitives in the core API + external modules.
So idea would be to write an agent that's given a set of tool calls that the LLM can use to create stress/fuzz tests. I.e. "create a nightmare filesystem of complicated files/directories/symlinks/hardlinks and tests that verify my code works as expected with it", "spin up these five services with dependencies between them and then assault it with invalid requests, make sure it doesn't crash", etc. etc. All of those sorts of things are possible with the dagger core API, this would just automate creation of the scaffolding code around it.
Would love to give it a try if/when I have time but feel free to nerd snipe
It sort of feels like dagger is uniquely positioned to do this too because the normal problem would be you could have an AI agent do this, but then at the end if you saw some failures you'd just be left with the question of "what actually happened", but because we have The DAG ™️ you can see everything that actually happened in dagger cloud, progress output, etc.
The README change we talked about: https://github.com/dagger/dagger/pull/9726
Yes exactly. That deep tracing capability is the killer feature in every demo. It seems to trigger the "want" reflex
Can a daggerized agent beat me in Red Alert? https://github.com/electronicarts/CnC_Red_Alert
🚨 PSA: all shell work is now merged into main and I just force-pushed a rebase to llm /cc @smoky ocean @shrewd fern
wait - why do you want to see those? I intentionally removed them because afaict they were 1:1 with the underlying call, to the point where we were constructing a "fake" call representation for a bit of time, which I later realized we can just skip by combining passthrough + reveal
Mmm you made me realize it may also be dropping intermediary function calls? I'm seeing a naked withExec and don't know where it's coming from
I guess it's an overall UX problem of "I used to know the overall context at a glance, and now I don't". I think it's a combination of:
- Loss of "depth" information: nested 🧑💬 and 🤖💬 are flattened
- Loss of 🤖💭
- Loss of 🤖💻
- The occasional extra verbosity
- maybe dropped function call spans?
Somehow the combination got me past the threshold of "I don't feel like I know what's going on exactly at all times, and I don't feel confident that the audience will either - which makes my demo less cool"
But I guess I shouldn't pinpoint one specific thing
@spring wave I'm starting an eval thread, going to run the same reference workflow and record it, then point out issues
Pushed the token counting + Anthropic caching.
I added token counting for OpenAI too but it always returns 0 for me, and there's a != 0 check so nothing is actually emitted. But if it returns the token count for you, it should work. Don't know what's up with that.
between tasks atm - any preference?
- what MCP clients support dynamically changing tools
- implement graphql bbi
- propagating stdout/stderr from module return value to LLM
- what's using 10k input tokens
I think stdout/stderr is the most urgent (sorry...) because it breaks my workhorse demo... But if it becomes too much of a timesink let me know
Yeah no makes sense - I might have an idea that would help without requiring the frontend to strip the data out, so I'll look into that first, and if that starts taking too long I'll shift gears to just yeeting it into the string and stripping them in the frontend. (Though, even still it goes pretty strongly against OTel convention to put such large values in span errors)
the tl;dr is module-returned errors currently lose all the extra data we return from the API (in this case, stdout/stderr). Which is a shame since Errors are full objects and it'd be easy to attach more data to them. Then we just need to change the LLM implementation to also return all of that data, just like I hacked in for ExecError
is there a way to tell if you've been accidentally making an asciinema recording for hours without playing Ctrl+D russian roulette
ps?

here's the current state of llm: https://asciinema.org/a/wk1VtsAjThkyuZcG8rtpN13Qq
- token usage metrics
- message spans now show their duration
- anthropic caching, saving ~12k tokens per roundtrip near the end, which is a ton considering 3.7-sonnet is limited to 20k per minute
- now shows the "thinking" span always, instead of only when there's a message
it gets pretty slow by the end, unsure why, maybe those cached tokens still take a long time for the LLM to process each time, they just don't count towards your limit?
pushed a commit - wanna check when you have time? i tried it with your repro and it seems to work now. so far it's only implemented for Go SDK
putting kids to bed then will try! thanks!
ok this is clearly not correct yet but it's starting to get pretty close
nice - if it helps any, here's my prompt from my MCP demo - its context is converting from a GraphQL query + schema, but maybe some of it is reusable
Nice nice, that looks really good. I'll try to incorporate some of that and see how it goes
@tigran_iii sonnet: reliable workhorse, if a task is very well defined and i have a clear outline of how it should be done but i need something that writes code extremely well, perfect
grok: big model smell but undertrained, it sucks alone but if i have something that is difficult, i have a
@spring wave got a panic in TUI 🧵
Tracking the merge of github.com/dagger/agents into github.com/dagger/dagger 🧵
Starting a thread for llm.4, seems worth it!
Thread about human-in-the-loop
🚨🚨🚨 new release v0.17.0-llm.4 is out.
To install:
```shell
export BIN_DIR=/usr/local/bin # modify as needed
curl -fsSL https://dl.dagger.io/dagger/install.sh | DAGGER_VERSION=0.17.0-llm.4 sh
```
@spring wave do you remember what your change was when I was trying to enable the host stuff? I just ran into the same issue when trying to pass tcp://localhost:3000 as a dagger.Service through the cli (on llm.3)
! query{host{service(host:"localhost", ports:[{backend:3000,frontend:3000,protocol:TCP}]){start}}}
! error: parse selections: parse field "service": init arg "ports" value as dagql.DynamicArrayInput ([PortForward!]!) using dagql.DynamicArrayInput: assign input object "Frontend" as {Elem:0 Value:3000 Valid:true} (dagql.DynamicOptional): cannot assign dagql.DynamicOptional into field of type *int
I'll try bumping to llm.4
yeah, still have it on llm.4, in both call and shell
this is a broken thing on the llm branch that needs to be fixed before we can merge
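For context on the `ports` error above: the parser tried to store an optional wrapper directly into a field of type `*int`. Below is a minimal Go sketch of that failure class and one way around it. `Optional`, `PortForward`, and `assign` are hypothetical stand-ins, not dagql's actual code; the idea is to unwrap the optional first and allocate a pointer for pointer-typed fields.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Optional is a hypothetical stand-in for an optional-value wrapper such as
// dagql.DynamicOptional: a value plus a flag saying whether it was set.
type Optional struct {
	Value any
	Valid bool
}

// PortForward mirrors the GraphQL input shape; Frontend is optional, so it
// is modeled as *int.
type PortForward struct {
	Backend  int
	Frontend *int
	Protocol string
}

// assign stores v into the named field of dst (a pointer to a struct).
// Optional wrappers are unwrapped first; assigning the wrapper itself into a
// *int field is the kind of mismatch the error above reports.
func assign(dst any, name string, v any) error {
	field := reflect.ValueOf(dst).Elem().FieldByName(name)
	if !field.IsValid() {
		return fmt.Errorf("no field %q", name)
	}
	if opt, ok := v.(Optional); ok {
		if !opt.Valid {
			field.Set(reflect.Zero(field.Type())) // unset: leave the pointer nil
			return nil
		}
		v = opt.Value
	}
	if v == nil {
		return errors.New("nil value")
	}
	rv := reflect.ValueOf(v)
	// Pointer field with a matching element type: allocate and store inside it.
	if field.Kind() == reflect.Pointer && rv.Type() == field.Type().Elem() {
		ptr := reflect.New(rv.Type())
		ptr.Elem().Set(rv)
		field.Set(ptr)
		return nil
	}
	if !rv.Type().AssignableTo(field.Type()) {
		return fmt.Errorf("cannot assign %s into field of type %s", rv.Type(), field.Type())
	}
	field.Set(rv)
	return nil
}

func main() {
	var p PortForward
	_ = assign(&p, "Backend", 3000)
	_ = assign(&p, "Frontend", Optional{Value: 3000, Valid: true})
	fmt.Println(p.Backend, *p.Frontend)
}
```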
new demo time @smoky ocean
Inspired by my old tictactoe RNN repo, which is getting close to 10 years old: https://github.com/kpenfound/reinforcement-tictactoe
Here's a Dagger agent that can play tictactoe
Plug in any LLM platform / model. This video is using gemini-2.0-flash
Code here:
https://github.com/dagger/agents/pull/18
I started that branch from scratch 3 hours ago 🤯
boom boom boom
how many LoC?
@shrewd ermine want to maybe make it a separate repo? Since we're going to spin back out next week
It's inflated a bit because it also includes functions to run the game server and web client. Still comes in around 50
Yeah I'll add it to my agents repo, one sec
the agent itself is entirely included on the screen. Yeah the game logic is in the "tictactoeClient" but that just provides the tools "read" and "move" to the llm
just realized there's a bug making this less cool but it's not super noticeable in the video
actually I'm going to record it again because it's way cooler with the bug fix
random thought: how do we get in on the levelsio flight sim action? https://x.com/levelsio/status/1895543217460035895
It's still going!
haha I used to mess around with NNs in flight sims and I could totally hook an agent up to one now. A lot of them have great APIs
What if we did a silly pipeline integration. Like watch the event stream and push it to github; or to a database or something
or, maybe more obviously plug it into an agent? But the problem is that it's very streaming-oriented, dagger might not work great for this. or maybe it will?
🚨🚨 Calling for feedback: new proposed README for github.com/dagger/dagger, mentioning AI and CI use cases
spooky dagql code
worth a try
there's a TUI issue I've been struggling with for these. You can see in my recording earlier that the function returns string (the output of histor()), but all I see in the terminal when the call completes is the traces. I think it puts my function's output somewhere in scrollback but it's not immediately visible
noticed that too - looking into it now, think i know why it happens but not sure what can be done about it yet
better demo run @smoky ocean
nice thanks - what should I link to?
Also @shrewd ermine can you spice it up and make the model explicit? You've got the multi-model setup and python makes the model optional argument look less lame than Go
That way we can say "Kyle plays tic-tac-toe with <model>"
Question is: which model?
llama? Can try to get another retweet from ollama
This is with gemini and the repo is here: https://github.com/kpenfound/agents/tree/main/tictactoe
I can make the model an optional input if that makes it look better?
new readme is amazing
oh you mean like .llm(model="gemini-2.0-flash")?
or if ollama is more likely to retweet I can do it with llama3 or something
Maybe gemini, since Jaana was so nice and gave such fantastic feedback yesterday
also we have that native client now, let's flex it
@shrewd ermine I know it's getting late for you. Are you OK to re-record it with gemini?
sounds good! yeah totally. I would just run llm first but now my tui shows this instead of the nice model output we had before
what if when something is running we pop the terminal into an alternate screen mode instead? (same way vim/htop/etc work)
i think that might be the only way to preserve prompt position
- will keep looking in a bit
i've been doing llm | model; we got rid of the auto-field-printing because it was poor signal:noise in most cases
new recording is the best one. One sec, just speeding it up
so good
it didn't just let me win on the 3rd turn that time either lol
Can you have two models play each other? ๐
I might try that
(I don't mean record it now - just curious if the module would support it)
sure can! it's supported
actually needs a small tweak to the prompt because right now I tell it that it's player O but otherwise supported
tictactoe is kind of boring though. I was thinking about hooking it up to something more interesting like chess, go, or civilization ๐
@outer moth welcome! I will start a thread here to discuss possible fun integrations between Dagger and Assistant UI. I know you're busy this coming week so no pressure to reply!
yes! I was playing with Chess idea today. Need some more hack time
probably fairly easy to drop into the tictactoe client I made! Basically
- read: read the board state
- move: makes a move and then waits for the opponent to move and reports the new state
the waiting part simplified the llm loop a lot because you didn't have to convince it to keep polling until the opponent moves
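That two-tool shape is easy to sketch. This assumes a hypothetical in-memory `Game` whose opponent moves arrive on a Go channel (the real client talks to a game server); the point is that `Move` does the waiting, so one tool call covers a whole turn.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Game is a hypothetical in-memory tic-tac-toe board; the opponent's
// chosen cells arrive on a channel, one per turn.
type Game struct {
	board    [9]byte
	opponent chan int
}

func NewGame(opponent chan int) *Game {
	g := &Game{opponent: opponent}
	for i := range g.board {
		g.board[i] = '.'
	}
	return g
}

// Read is the "read" tool: it renders the board as three rows of text.
func (g *Game) Read() string {
	var sb strings.Builder
	for r := 0; r < 3; r++ {
		sb.Write(g.board[r*3 : r*3+3])
		sb.WriteByte('\n')
	}
	return sb.String()
}

// Move is the "move" tool: it places an O, blocks until the opponent has
// answered, and returns the new board. The model never has to poll.
func (g *Game) Move(cell int) (string, error) {
	if cell < 0 || cell > 8 || g.board[cell] != '.' {
		return "", errors.New("invalid move: pick an empty cell 0-8")
	}
	g.board[cell] = 'O'
	reply := <-g.opponent // wait for the opponent's move
	g.board[reply] = 'X'
	return g.Read(), nil
}

func main() {
	opp := make(chan int, 1)
	g := NewGame(opp)
	opp <- 0 // the opponent will answer with the top-left cell
	board, _ := g.Move(4)
	fmt.Print(board)
}
```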
and then gemini did non-dagger parts of the demo too
yes! I had the same highlighted to paste, found it
Gemini: making 10x engineers 100x (pay me google)
trying a less invasive fix: instead of flushing command progress to the scrollback immediately, flush only when you run a new command. that way when it finishes you'll see your last command at the top of the screen instead.
building on this, I could have it switch into nav mode while the command is running, and switch back when it's done. someone mentioned being able to type while something is running feels weird, might be nice to be swapped into the usual TUI control mode while you wait, since it doesn't support running multiple concurrently anyway. but, sometimes it's nice to pre-type the next thing to run.
(i guess that's an i keypress away? also, how we feeling about that keybind?)
Sounds good. I don't feel like my issue was flushing output but maybe we're saying the same thing. I wasn't actually seeing my output at all but the trace output was still there
At the very end of this clip
oh, in that clip it looks like you switched from input mode to nav mode and stayed in that mode, which is what keeps the trace nav onscreen - did you expect to be switched back to input mode when it completed?
i'm fixing a different issue then, but it sounds related to the second thing I mentioned
the issue i'm fixing is that when you run a command that fills your screen height, after it exits the screen is just blank with your prompt sitting at the top
Yeah I cut the recording too short to tell but it just sits on that last frame when it's done
That's it too. I expected to still see my function's output at the end while in nav mode
ah maybe because right now logs are printed above the child spans - so it was offscreen
maybe they should be below? 
That's what I was expecting but maybe I'm holding it wrong?
hard to say, from the clip it was still running so everything there seems expected to me, and from my testing it does collapse when it's done
either way, pushed a fix for the thing i mentioned at least
Thanks I'll give it a go!
Consider submitting each of these as Show HNs, not just to Twitter. Just need to have a README (with a gif/video or interactive demo)
Demo idea: a docker registry where you describe the image you want in the address, and it creates it for you on the fly
docker pull "<registry>/an image with latest cuda and pytorch installed please"
call it the magistry ๐
I'd have expected "initialize a container from an alpine image" to automatically give me a container, but my Anthropic LLM keeps insisting on using Docker for that.
```
llm | with-prompt "initialize a container from an alpine image" | container | terminal 4.1s
│🧑 initialize a container from an alpine image
│ │ 0.0s
│
│🤖 I'll help you initialize a container using the Alpine Linux image. Alpine is a lightweight Linux distribution that's popular in containerization. Here's the command to do that:
│ │
│ │ docker run -it alpine
│ │
│ │ This command will:
│ │
│ │ • docker run : Create and start a new container
│ │ • -it : Provide an interactive terminal
│ │ • alpine : Use the latest Alpine Linux image (it will pull it automatically if not present locally)
│ │
│ │ If you want to run it in detached mode (in the background) instead, you can use:
│ │
│ │ docker run -d alpine
│ │
│ │ Would you like me to explain any additional options or would you like to do something specific with the Alpine container?
│ │ 4.1s │ LLM Input Tokens: 31 │ LLM Output Tokens: 1
```
am I missing something here?
you need to give it access to an object (Container or otherwise)
I was trying for the model to give me a container without having to supply a with-container myself
I thought the model already had access to the core native types / tools and handle those out of the box
```
container 0.0s
Container@xxh3:6934f6e558023746
llm | with-container $(container) | with-prompt "initialize a container with an alpine image" | container | terminal 8.2s
! input: llm.withContainer.withPrompt.container no response from model
│🧑 initialize a container with an alpine image
│ │ 0.0s
│
│🤖 I can help you initialize a container from an Alpine image using the from function. This function requires an address parameter specifying the image's address from its registry.
│ │ 3.5s │ LLM Input Tokens: 11,698 │ LLM Output Tokens: 88
│
│ Container.from(address: "alpine"): Container! 2.3s
│
│🤖 2.4s
│ ! no response from model
```
๐ค
this looks correct, just a matter of prompt probably; it depends on the model.
Try "you have access to a container. Use it to..." that works well for gpt4o
nope, fully sandboxed by default now
yep, that works, thx.
also 11k tokens just for that 😬
I'd assume it's all the container object with its underlying tree being fed to the model
mmm no it only gets the equivalent of .help with arg schema.
but I guess it's a lot of functions
also claude talks alot by default
Cooking with @sand knot @stray ice
File edits 🧵
Yeesssssssssss
Finally getting around to posting this on social
Re: cached tokens still slowing things down. @void flint explained to me that cached tokens still have to be run through a GPU, to rebuild the inference state - it's not like a traditional cache lookup where cost is zero. It's cheaper but still you have to re-run the whole sequence through a GPU. So maybe that's what we're seeing
got it, that does feel like what was happening
What's the status of Dagger's dagger watch or shell reloading features for AI agent workflows? I'm planning a GitHub Actions pipeline where a Dagger-powered AI agent modifies files based on LLM responses and then re-runs tests. Does this require dagger watch or shell reloading, or can it be done with current Dagger functionality?
You could keep your working directory state inside of the Dagger pipeline in a Directory or Container in a "workspace" until you have passing tests and then as a final step create a PR with that working state, or publish a container image, or export it locally, or anything else you like. Since you don't need to use your laptop filesystem for a workspace, you don't need the watch/reload.
Good examples here: https://docs.dagger.io/ai-agents#examples
State in the golang Container: https://github.com/dagger/agents/blob/main/toy-programmer/toy-workspace/main.go#L12
Updated as it changes: https://github.com/dagger/agents/blob/main/toy-programmer/toy-workspace/main.go#L33
State in an empty Directory: https://github.com/dagger/agents/blob/main/melvin/workspace/main.go#L15-L18
https://github.com/dagger/agents/blob/main/melvin/workspace/main.go#L35
ok let me check and learn...
I played again with the AI agents ๐
- I've built a small `toolbox` project. The idea is to group some of the commands I want to use. An intermediate module: https://github.com/eunomie/toolbox
  - not a lot of things, but a `java-workspace`. This is a mix between `melvin/workspace` and the `toy-workspace`. This one uses my Java module https://github.com/eunomie/java to ensure tests pass after a change
- I tried to use it on a "real" project, the getting-started app made by Quarkus: https://github.com/eunomie/quarkus-getting-started
  - the Dagger module in this repository uses the toolbox above as well as the Java module directly. It mixes plain old commands (to build and run the Java code) and LLM-based commands in a single module.
  - I added a `code-me-that-feature` function, and here is the result of one call: https://github.com/eunomie/quarkus-getting-started/commit/ca97179c85ca924b8a35c01b322241252825d8ba
  - Contrary to `melvin` or others, the idea is to generate the code locally, not to interact with GitHub
As for what works well: the harder part was to have the workspace. After that it's quite easy to just wrap a prompt in a call. And more than that, once you have the few modules for your case (Java tooling here, for instance), it's very easy to create new functions when needed. Another good thing is that it mixes modules in multiple languages: the AI stuff is in Go, and under the hood it uses the Java module, written in Java. It works!
No idea where this stuff is going, but at least that's fun!
the harder was to have the workspace
I think we're getting much closer to a place where you can just grab one off the shelf (from daggerverse). I think it's actually been really beneficial so far that we don't have that quite yet because we probably have 50+ implementations of workspace that we've all learned from and can converge on best practices
that could be nice ๐
I wanted to have a workspace using the Java SDK, but the Java SDK lacks the interface feature, so I can't implement a checker. For now.
Nice @river belfry! Glad you're seeing the "fun" part, it's actually very important for early tech waves: fun brings more people
In this workspace, the checker is just a string. Maybe something like that would work? https://github.com/kpenfound/dag/blob/main/workspace/src/workspace/main.py#L65-L76
re workspace: I'm adding a shell() function today
I have a slightly more complex command https://github.com/eunomie/toolbox/blob/5f65357fd548c6b4b3e42cacb99cd23b85006e5a/java-workspace/main.go#L90 as I'm using my java module. So there's built-in cache and other stuff.
The idea of the interface is great for that, it's just it's not (yet) supported on my Java side (but I wrote this one in Go anyway...)
Maybe providing both a string and a container (a container instead of a from arg) could be a solution. But maybe it's just different workspaces
One thing that's really good for sure is the workspace/checker aspect: the LLM runs the commands to build/test without me having to code them. It's roughly one line in the prompt saying it has to check that it's working. On the developer side it's really cool, very convenient, and way better than generated code that doesn't build, for instance.
Do you imagine that workspace will become a defined type? They seem to all have very similar functions
I don't see it becoming part of the core api if that's what you mean but it definitely fits in the bucket of "stdlib"
curious what you mean by stdlib?
In general it means a module or set of modules that provide very basic utilities on top of the core API that would be useful in most modules. Things like concurrency helpers, more advanced file ops / git ops, common cookbook patterns. Kind of one step beyond "featured" daggerverse modules
now that we have anthropic and gemini routing built in, we should split out ollama too. It's currently leaning on openai configs, which works but it means you can't use both openai and ollama together. We don't necessarily have to use the official client, the openai client seems to work fine. It'll mean an env change for ollama users, so we should do it asap
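A rough sketch of per-provider routing, where Ollama gets its own entry instead of reusing the OpenAI config. Everything here is hypothetical: the `Provider` type, the model-name prefix rules, and the example base URL (11434 is Ollama's default port). It does not reflect Dagger's actual env vars or routing logic.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider is a hypothetical per-backend setting.
type Provider struct {
	Name    string
	BaseURL string
}

// lookup fetches a configured provider or explains what is missing.
func lookup(providers map[string]Provider, name string) (Provider, error) {
	p, ok := providers[name]
	if !ok {
		return Provider{}, fmt.Errorf("no %s provider configured", name)
	}
	return p, nil
}

// route picks a backend from the model name. Because Ollama has its own
// entry instead of borrowing the OpenAI settings, both can be configured
// at the same time.
func route(model string, providers map[string]Provider) (Provider, error) {
	switch {
	case strings.HasPrefix(model, "claude-"):
		return lookup(providers, "anthropic")
	case strings.HasPrefix(model, "gemini-"):
		return lookup(providers, "google")
	case strings.HasPrefix(model, "gpt-"), strings.HasPrefix(model, "o1"):
		return lookup(providers, "openai")
	default:
		// Assumption: any other name (llama3.2, qwen2.5, ...) is a local Ollama model.
		return lookup(providers, "ollama")
	}
}

func main() {
	providers := map[string]Provider{
		"openai": {Name: "openai", BaseURL: "https://api.openai.com/v1"},
		// Must be an address reachable from the engine, not 127.0.0.1.
		"ollama": {Name: "ollama", BaseURL: "http://192.168.1.10:11434/v1"},
	}
	for _, m := range []string{"gpt-4o", "qwen2.5"} {
		p, err := route(m, providers)
		fmt.Println(m, "->", p.Name, p.BaseURL, err)
	}
}
```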
I don't see workspace becoming part of the core API. Actually I'm trying to make it clearer in the demo that it's a user-defined type, because that's the most powerful part, and the hardest to understand
Re: stdlib. Maybe, but not 100% certain either. We haven't really figured out the patterns yet, I don't know if we can encapsulate everything in a single stdlib type
seems more like a pattern to me
Yeah if anything I could see a stdlib/featured interface that most workspaces would be extending. But not a usable workspace itself. But that currently wouldn't be possible anyway
workspace + shell = DX breakdown
Is the LLM PR in a good place to review now? I started the other day but then held off after hearing I should wait cc @spring wave
I think a first-pass review is useful at this point. The architecture is pretty settled. Just don't look at the commit history; there's some git cleanup to do
yeah now's a good time, the big rebases are out of the way
Teasing MCP support (thank you @spring wave for the POC ) https://x.com/solomonstre/status/1896634723684004013
@shrewd ermine @woeful quiver @dense flare I feel like the next few demos should focus on something more impressive on the functionality side - something that is super hard to do without dagger, but super easy to do with dagger (e.g. my "magistry" idea).
Totally! I'm working on the "write examples for my module in every language" thing now. Been thinking about other dev flows for the backlog too, like "here's my current workdir, i've introduced a bug somewhere on my branch, what is it"
I have a few ideas Iโm looking into based on the LangChain stuff we discussed on the call today
The 'magistry' thing sounds actually really easy to implement and potentially a very sweet demo
just be careful to not fall into the trap of making something that looks really impressive to us systems people discovering AI, but looks super trivial to AI engineers. You want the other way around
Yep exactly. I was brainstorming ideas to iterate on the tictactoe demo and realized they all fell into that category
10000x bonus if you find a way to tag along on the levelsio flight thing
What vercel did was insanely smart and something to aspire to @noble notch @bronze fern
(the aliens)
got a link?
I'm trying to get a self-healing ci pipeline to work - user adds code, tests (run with dagger) fail, agent (also with dagger) proposes a fix, tests pass
@bronze fern was trying the cypress-test-writer example from the docs page, looks like there's a typo in the code that causes it to fail, fixed here: https://github.com/jpadams/cypress-test-llm-ts/pull/1
Thanks @steep onyx ! That was the SDK/API change, I fixed it here https://github.com/dagger/agents/commit/b4d8f42cf5314d9a61412fc856ec0eb61e6f3006, but forgot this one! ๐
updated to v0.17.0-llm.4 too
Has anyone else struggled to get LLMs to know how to use slices as args correctly? I was just playing with as trivial an agent as I could think of to start, where I initialized the workspace with []*dagger.File and gave it a grep tool to search through a single dagger.File at a time, but it consistently tries to give a whole list of []*dagger.File as the arg for a single file, even when I yell at it in all caps in the prompt.
Works like a charm if I change my types to just be a single *dagger.File.
Just wondering if the LLM is being dumb vs. I'm hitting some limitation in what we allow it to do.
tool_call_id error
When you're trying to remember how to split a gif from the command line, you ask ChatGPT, it doesn't work on the first try, and you end up copy-pasting back and forth...
Then you remember you can just write a one-liner agent command with dagger
"you do it"
So.... Goose is basically an open source, multi-model clone of Claude Desktop.
Should we make an open source, multi-model clone of Claude Code?
YEAAAAAAAH !!!!
Also @spring wave - should we cut an llm.5 with the token counting + claude token caching? (that's the gif I'm trying to split btw)
What if, what if, we tried to make a SOTA agent with Dagger agents? Looking for my next objective
With o1 models and many providers doing native inference time scaling, the writing is on the wall that spending more compute on inference leads to better results on any given task. Alpha (Go|Code|Maths|Chess) have already proved that for their respective domain.
With our SOTA
sure
cool I'll try
i'm also working on a bug fix for the gemini client that I ran into but don't wait for me
Update: it's not working
e2b 🧵
it only works for anthropic atm
openai returns 0 for token usage for some reason
This would be very impressive IMO. That's something tangible that can improve a developer's day to day
Is it normal for the same input to the same model to produce widely differing solutions?
I have a rough cut of this working for an admittedly simple FastAPI app but the results are very inconsistent; it seems like the model just goes off chasing its own tail sometimes
could it be a temperature thing?
I didn't know this existed but after reading your comment I looked it up and I guess it could be... But do we expose this configuration in Dagger?
we don't, but should
also: welcome to AI engineering
when responses are inconsistent - is it better to try different models with the same prompt, or the same model with iterative changes to the prompt?
debugging workspace
This is partially the reason I'm not so positive AI could be integrated in our specific ci/cd workflows. I'm happy to be proven wrong though.
I see this as a paradigm shift where you have to work with the pre-condition that all LLM outputs will definitely not be deterministic. In that sense it's super important to come up with a validation loop which basically sets the threshold to what you can accept as valid.
It's a new way of building software which still needs a bunch of refinement.
Yeah I don't think I'd ever want AI directly in the path to production for a serious project, for the same reasons I would argue against apt-get update && apt-get install. But using it to help out in the dev loop makes a lot of sense; the end result is still just code frozen in time.
I guess I'm on the disagreeing side -- (I agree, just temporarily). The best parallel is autonomous cars: we don't want them to drive until proven that it just works.
It's just gonna be a continuum: from assisting on the highway to FSD driving end-to-end on your vacation, and, at some point, we have enough data to just let the car drive, then laws will forbid humans to drive on some lanes on the highway 🤣
The parallel to autonomous cars is interesting. However, a vast majority of people don't trust autonomous cars or want to go anywhere near them still. Granted there's slightly less risk with AI in software dev based on the area but it's still a risk. To convince developers to use it, we have to prove that it can be somewhat consistent (I won't say deterministic) to be trusted. But we aren't there yet. Hopefully Dagger can help us get there
to me it's more about reproducibility and isolating from external dependencies in the critical path to production
fully automating to that level seems like there's a lot to lose or re-invent and very little to gain, vs. just having the same AI generate code for you
yes agree, reproducibility is key. I say this as an owner of a Tesla with FSD which I use for my daily commute. It's gotten pretty good at taking me to work every day without much variance. So, i think software dev can get there too. But there's a long way to go. Tesla has an insane amount of targeted, real world data to work with.
I take Waymo several times a week btw and it's basically flawless, fyi
I've seen those. It's funny how what you are doing with Dagger is providing the tools to the AI, like Waymo does with its gadgets on the car
An AI agent doesn't need to be in the path to production to be helpful. Your CI is basically validation. An agent can help build that validation and diagnose if something is wrong with it, but it doesn't necessarily need to be in the critical path to do those things. It can asynchronously do these tasks on broken PRs or opening new PRs to solve issues / expand validation
This needs to be showcased front and center.
QA is a great use case for insertion in a production CI IMO
My ex-roommate is building an AI QA company: https://www.heal.dev/ -- some big companies already adding that solution, so it's happening right now ahah
There are also several startups building "unit tests generator agents". For example Tusk https://www.usetusk.ai
Yeah totally, in the end the winning solution is going to be the tool that enables a continuum of trust to add those agents in the flow ๐ฏ
It's not just trust though, it's also cost. Do you really need 10 LLMs to run on every commit in GitHub, or is it something that runs only after the tests fail in a certain way or when a qa label is added?
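That gating can live in plain code in front of the agent. A sketch with a hypothetical `Event` type: the agent runs only when someone adds a `qa` label or tests fail in a way an agent might plausibly fix. The `infra` failure kind is an assumed example of failures not worth spending tokens on.

```go
package main

import (
	"fmt"
	"slices"
)

// Event is a hypothetical summary of a CI trigger.
type Event struct {
	TestsFailed bool
	FailureKind string   // e.g. "lint", "unit", "infra"
	Labels      []string // labels on the PR
}

// shouldRunAgent decides whether to spend LLM tokens on this event: only
// when a human asked for it via a label, or when tests failed in a way an
// agent can plausibly fix. Infra flakes are skipped; a plain retry is cheaper.
func shouldRunAgent(e Event) bool {
	if slices.Contains(e.Labels, "qa") {
		return true
	}
	return e.TestsFailed && e.FailureKind != "infra"
}

func main() {
	fmt.Println(shouldRunAgent(Event{TestsFailed: true, FailureKind: "unit"}))
}
```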
I may be wrong, but features like this could make everything fully automated, like a SWE agent: we could trigger a GitHub Actions pipeline on any kind of new issue (bug, security, ...) that fixes the issue, commits, releases, and closes the issue. Many things can be fully automated, so much potential...
everything powered by Dagger and AI agents built with Dagger...
Yeah, this kind of automation is already possible - it's what we all already do in CI. We're just adding a new primitive to the toolbox for what can happen in the pipelines responding to VCS events
Quick update on workstreams relevant to agents. Let me know if I'm wrong or missing anything.
- @spring wave is looking into Claude Code and how Dagger might integrate
- @wraith remnant is working on his e2b project with @warped bramble , and exploring n8n integration as a proxy for "human-in-the-loop" and framework integration
- @hidden tartan is working on generated clients (also needed for framework integration)
- @shrewd fern is stabilizing the shell
- @shrewd ermine @woeful quiver @dense flare @bronze fern @storm gate are building demo agents & related devrel content
- @spark phoenix is focused on daggerizing swebench
- @smoky ocean is trying to upgrade BBI to multi-object + taking mcp/bbi from @spring wave so he can focus on claude code
@lost topaz @quiet ether are catching up and are available to help -> correct?
After digging into the actual code some more, I think this may actually be a limitation in BBI, specifically LLMs have no knowledge of the existence of list types and have no ability to select elements from them: https://github.com/dagger/dagger/pull/9628#issuecomment-2698853881
🚨🚨 New release: v0.17.0-llm.5
mcp
It's a great privilege to be able to follow @spring wave's traces live as he develops 🍿
Later:
Remember this value: <a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ" target="_blank">potato</a>
TIL:
I thought I was being clever adding tools for my agent to lookup reference information. In this case it's static information that it needs every time. Turns out some models just hate that. Going back to giving the information up front worked much better https://github.com/kpenfound/fork-dagger-agents/commit/149ca283c8eb7327edcd0eeda78d87d34680fc8a
Think I figured out what that 10k token usage burst comes from: it's probably all the tools + documentation.
And when I said I figured it out, I mean the AI figured it out. https://v3.dagger.cloud/dagger/traces/d746866137eb8def406fa48feec8445e
lol, yeah this was by far the most frustrating thing with my hackathon
if there's a prompt engineering tip for that, it would be such a huge improvement
gemini was totally fine with it, but when I switched back to ollama qwen it would take the doc as a brand new assignment
the only trick i really found was to add error checks before tool calls so they'd just fail until the required README was read
but, it's really clunky (the user sees errors, etc)
mine was good about looking up the information every time, but it would forget about its original mission once it received it. Probably something about context windows 🤷 I'm learning
Getting there... https://asciinema.org/a/CPEvr96l5dYoAyKodHdUiaKYZ /cc @smoky ocean (you can ignore the second half, forgot it was a cold cache
and then it blew up)
Corresponding trace (TODO fix infinite pending message bubbles)
So far I've added a dagger llm command, which is like dagger shell but accepts text prompts instead. Under the hood it just does a Llm.withPrompt(...) loop, syncing after every input.
You can use /with to run a shell command and set its result as the LLM context, e.g. /with container | from alpine will do Llm.withContainer(...), or /with . will pass the current module to it, etc.
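The input handling described above can be sketched as a line parser: anything starting with `/` is a command, everything else is a prompt. `Input` and `parseLine` are hypothetical names; this is not the actual `dagger llm` implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Input is one parsed line from a hypothetical `dagger llm`-style REPL.
type Input struct {
	Command string // "" for a plain prompt, otherwise e.g. "with"
	Arg     string // the rest of the line
}

// parseLine splits slash commands from prompts:
// "/with container | from alpine" becomes {Command: "with", Arg: "container | from alpine"}.
func parseLine(line string) Input {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "/") {
		return Input{Arg: line}
	}
	cmd, arg, _ := strings.Cut(line[1:], " ")
	return Input{Command: cmd, Arg: strings.TrimSpace(arg)}
}

func main() {
	sc := bufio.NewScanner(strings.NewReader("/with .\nwhat does this module do?\n"))
	for sc.Scan() {
		in := parseLine(sc.Text())
		switch in.Command {
		case "":
			fmt.Printf("prompt: %q\n", in.Arg) // would append the prompt and sync
		case "with":
			fmt.Printf("set context from shell: %q\n", in.Arg)
		default:
			fmt.Printf("unknown command /%s\n", in.Command)
		}
	}
}
```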
dagger llm
Grok 3 with thinking is just soo good with Dagger
One shotted this for my personal project:
```yaml
name: Daily Scrape, Transform, and Publish

on:
  schedule:
    - cron: '0 0 * * *' # Runs daily at midnight UTC
  workflow_dispatch: # Allows manual triggering

jobs:
  daily_workflow:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Run Dagger pipeline
        uses: dagger/dagger-for-github@8.0.0
        with:
          verb: "call"
          module: "."
          # Dagger CLI flags are kebab-case (code_dir -> --code-dir)
          args: "daily-workflow --code-dir=. --mistral-api-key=env:MISTRAL_API_KEY --orama-api-key=env:ORAMA_API_KEY"
          workdir: "."
        env:
          MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
          ORAMA_API_KEY: ${{ secrets.ORAMA_API_KEY }}
```
Just perfect
awesome
@uneven depot this is an example of my idea - nice when the model cooperates but unfortunately it doesn't happen every time (although I'm getting closer) https://github.com/vikram-dagger/fastapi-sample-app/pull/36
merged dagger llm into the llm branch cc @smoky ocean
also - I undid that change to keep the last command in the 'active' area after it completes, because the fact that it cut off the top with no way to read it felt worse than the behavior it intended to fix (having the output offscreen in the scrollback in some scenarios). /cc @shrewd ermine
(and also it wasn't the issue you initially ran into anyway, was just something else i noticed)
This interaction was interesting - it looks like it got confused by the xxh3:... return value from the BBI, and I had to convince it that it's actually an object it can interact with:
https://v3.dagger.cloud/dagger/traces/a9f67b7f7a2731713a7ac0b8570e436a?listen=576cf045d2a005e1
Probably because it expected head to return a git hash, and it confused the xxh3:... hash for one.
we should try prefixing it with <type>://ID, e.g. Directory://xxh3:...
Oh yeah, maybe we can reuse the format from the CLI (Directory@xxh3:...) - but only if it solves the problem, URI scheme might be clearer to the LLM. I'll try that out
also, idea for dagger llm: /checkpoint foo => saves the current Llm object + state, so you can go back to it and continue later, sort of like rewinding the dialogue tree
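Since the engine's `Llm` is clone-on-write (`Set` calls `llm.Clone()` before changing anything), a `/checkpoint` command could be little more than a name-to-snapshot map. A minimal Go sketch under that assumption; `Llm`, `Session`, `Checkpoint` and `Rewind` are hypothetical names, not the engine's types.

```go
package main

import "fmt"

// Llm is a hypothetical, immutable conversation state: methods that change it
// return a modified copy instead of mutating the receiver.
type Llm struct {
	history []string
}

// WithPrompt returns a copy of l with one more message appended.
// A fresh slice is allocated so copies never share a backing array.
func (l Llm) WithPrompt(msg string) Llm {
	h := make([]string, len(l.history), len(l.history)+1)
	copy(h, l.history)
	return Llm{history: append(h, msg)}
}

// Session tracks the current state plus named checkpoints.
type Session struct {
	Current     Llm
	checkpoints map[string]Llm
}

// Checkpoint saves the current state under name (/checkpoint foo).
func (s *Session) Checkpoint(name string) {
	if s.checkpoints == nil {
		s.checkpoints = map[string]Llm{}
	}
	s.checkpoints[name] = s.Current
}

// Rewind makes a saved checkpoint the current state again.
func (s *Session) Rewind(name string) error {
	l, ok := s.checkpoints[name]
	if !ok {
		return fmt.Errorf("no checkpoint named %q", name)
	}
	s.Current = l
	return nil
}

func main() {
	s := &Session{}
	s.Current = s.Current.WithPrompt("install curl")
	s.Checkpoint("foo")
	s.Current = s.Current.WithPrompt("now try something risky")
	_ = s.Rewind("foo")
	fmt.Println(len(s.Current.history)) // 1
}
```

Because snapshots are immutable, saving one costs only a map entry; nothing has to be deep-copied at checkpoint time.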
yes I've been saved once before (last year) by url scheme format being clear to LLM
tried adding Foo@... prefix but then it just dropped the Foo@ prefix when it called back
- gonna try just adding something to the description, like "This function returns an object of type Foo"
getting a lot of Overloaded errors from Anthropic today
that hint didn't really help, it seems to just have problems identifying that it won't have tool X available until it does certain parts of the prompt first: https://v3.dagger.cloud/dagger/traces/2312c289f9a6cd01bfe119d2c5de9733
@spring wave running on a commit f3bccd836b5eb6d6e671d3bfa235aea19f4bfa21 (from 24h ago). Looks like something yesterday introduced a double output. Should I pull latest or is it still TODO?
pull latest - should be fixed
yep looks good, thanks!
0.17.0-llm.6 🧵
🚨 🚨 🚨 new release: 0.17.0-llm.6
This one has a new goodie... which I highly recommend trying
dagger llm --> our take on the Claude Code command-line experience... courtesy of @spring wave
Try it and tell us what you think!
@spring wave btw you probably want a slash command to clear the context
@spring wave how do I get a list of slash-commands? I typed /help but the llm answered
Ok, the title of this one is a bit of a mouthful: "LLM calling LLM using a Dagger GraphQL MCP server module called via the flat BBI"
https://asciinema.org/a/nNjuad4p7WQEiQVXGTK66rMOH
https://v3.dagger.cloud/dagger/traces/f30fd419bb059cac83251fb3a91bb2f0?listen=bdaed855ce2487a8
I was going to revive the GraphQL BBI because I wanted to see if it would be any better at "thinking ahead" (#1346945909837008997 message), but then had an idea to implement the BBI as a Dagger module instead. Worked out pretty well I think.
oh here's the module: https://github.com/vito/daggerverse/blob/main/mcp/main.go
will pick this up now, right now /with is a total one-off
i got tab completion and completion checking working at least
ok
🧠🧠🧠🧠 should I revisit my approach to making BBI multi-object?
or orthogonal?
(I don't have a full handle on how tbh... It's like a halfway between BBIv1 and your gql approach)
Yeah I think we might not need BBI since you can just use the flat one to bootstrap a module instead
I think I'm going to add LLM.withModule(string) to get rid of the annoying limitation where I have to install a dependency (and reload schema) before I can give my LLM a custom object
@spring wave but I don't want to lose "give any object instance to llm, not just a stateless module entrypoint"
Oh I misread, dunno how it affects multi object
just wondering if you discovered a new state-of-the-art approach to Dagger/LLM integration, that may invalidate some of our assumptions in ongoing BBI work
I just watched it... need to sit down
Hmmm... variables do seem to help a bit with multi-object: https://v3.dagger.cloud/dagger/traces/4bc987983b0e80660ae74c840ab9b6e3?listen=661d9dc77df3da47 (screenshot for those without access)
Is that kind of what you had in mind?
Yes 100%, variables as a first class feature of the LLM type basically
I don't love the DX though, lots of GraphQL cruft, would be nice to just see the calls immediately without having to expand
(brb)
LLM.set<Foo>(key string, value <Foo>) set the variable key to value, available to the LLM.
Is it possible to support/generate embeddings using LLM? I did not see it but was looking at the ollama API https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings
Right now we're using the generative API / models for every provider because that's what agent and tool calling use cases need. Not to say that can't be supported but it would be something much different from what we have today I think
FYI, I'm on llm.6 and following the example in the docs for the go-program - getting the following:
! could not parse arguments for function "go-program": missing 1 positional argument(s)
Can you paste the exact command?
Oh man it's my required arg
dagger -m https://github.com/shykes/toy-programmer
go-program "develop a curl clone" | terminal
Also, it might be the Zed terminal, but it's really hard to copy anything above the last output from the TUI - almost impossible, in fact
fixing now
Oh right it scrolls up too fast right?
Yes
I see it, want me to document that or PR the qa to be optional?
No shame here! Was just walking through the docs before the meetup presentation next week
oh my god
nested llms is crazy
wtf
Oh man! That's why I got the error during the livestream as well! I thought it was me, now I can tell the truth!
noooooo! I'm so sorry 😦
Eheh, no worries, I'll edit the recording... but it was funny, because I literally tried it 1h before and it was working
Holy 💩💩💩💩 dagger llm supports adding modules @spring wave what have you done
OK i have to share a recording of this...
just a heads up, i'm gonna be pushing some stuff today to start prepping for merge - the idea is to get ci completely running and green by EOD!
i'm breaking down any/all changes into smaller pieces, with commit descriptions, so if any of the pieces look controversial or i remove or rename something that looks sus, then it can be reverted easily
apk add apt 
@gloomy kindle heads up, there are probably merge conflicts in our near future
shouldn't be too bad but depends how far you take your refactoring. can you let me know when you push?
lol i've already solved two today 😢
why not one more
working on fixing the current one right now
or ideally show me in advance, if you have a wip branch
i'm pushing one commit at a time, since it's a bit time consuming to run all ci locally
getting to merge 🧵
Terminal access 🧵
the dagger CLI doesn't support stdin, correct? e.g. I'd like to do something like some_command_useful_output | dagger -c 'llm | with-prompt "blablabla" | with-input' so I can pass it to Dagger
not supported, would be cool though
not sure how
yeah.. same. Maybe a custom function argument type? 🤔
We're kinda enabling this for mcp, so the implementation might be very close -- but UX-wise 🤔 do we expose that as part of the llm type (which is stdin queryable??)
When using a module with constructor args in shell, I'm not seeing the arguments documented in .help - makes it really difficult to determine the syntax and discover what's available
it looks like the way to see those is .help ., but I agree it should come up with just .help
@quiet ether do you want the stdin stream or a buffered string is enough?
buffered is enough for my immediate agent use-case
Been thinking on this a bit, is there any reason to not show the help by default if there is an error with the input (missing args). That seems to be normal behavior for a lot of CLI programs and that would give the user some feedback on what they did wrong
No reason not to
@shrewd ermine any reason not to set up streaming for Gemini? noticed it's missing the fancy telemetry
can work on that now if not
@spring wave separate telemetry question: I notice that the custom span for tool call 🤖💻 is still "old style" whereas the ones for prompts and replies are using the new fancy UIActorEmojiAttr attribute. Should I change it over to do the same, or is there a reason it's been left behind?
context: trying to debug my llm-multiobject branch
that span is never actually visible since it just does reveal + passthrough to show its child spans directly instead
Ah right
So now that tool calls don't map 1-1 to underlying function calls (at least in my branch) not sure what to do
mmm actually I do know. I have a half-dozen builtins, which I can show as special "ui emojis" messages. They don't have children so no need for pass through.
Then for regular tool calls, those do map 1-1, so I can keep the passthrough
pass-through supports multiple children too so maybe nothing needs to change there?
oh I see. just add my custom span in my builtin function
and it will benefit from the pass-through, like regular function calls
yep
cool cool
@spring wave when using "UI actor emoji", can I still encode the actual tool call in the span, without relying on streaming content? Do I just put that in the span content? For the prompt that value is just "LLM prompt" not sure if it gets rendered anywhere?
are you trying to put an emoji next to something that's not a message bubble?
Yeah
ah right now they go hand-in-hand
Should I just make it a regular span with an inline emoji?
like the good old days (aka 2 weeks ago)
what is the content you want the emoji next to? 
builtin name & args
assuming it's not a function call or message bubble
ah ok - so like a shell command?
(kind of?)
e.g.:
_load base
_save myctr
_objects
_scratch
(sadly .foo is not allowed, so I went for _)
eh yeah you can just put the emoji in the name for now
if i find time i'll update the UI to support the emoji attribute
So far I have 7 builtins:
- _save: save the current object to a variable
- _load: load a variable and make it the current object
- _undo: undo the last action
- _scratch: clear object selection
- _type: show the type of a variable
- _objects: list saved objects and their type
- _current: print the current object ID
don't worry about it, inline emoji is completely fine. thanks
We may have accidentally built an open alternative to Claude Code. Fully open source, supports any model, and MCP native.
Anyone interested in that?
side note: cursor's terminal does NOT like markdown output. grinds to a halt.
@smoky ocean pushed what I have to get it off my machine, but seeing some TUI jankiness still, feel free to try
testing!
@spring wave a /model would be cool
to show off the multi-model aspect
...should it keep the history?
that might actually kind of sort of work, just need Llm.withModel
Good question
Yeah I would try to keep it, I think ChatGPT lets you do it (openai models only of course)
adding support for it, you can always /clear 
we should just send the old model history along to the new model
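A minimal sketch of that approach, assuming a provider-neutral message list that is replayed to whichever model is currently selected; `Llm`, `Message`, `WithModel` and `Clear` are hypothetical names here (the thread only says an `Llm.withModel` was needed), and "model-a"/"model-b" are placeholders. One side effect of this design is that the new model sees earlier assistant-role replies that a different model wrote.

```go
package main

import "fmt"

// Message is a provider-neutral chat message.
type Message struct {
	Role    string // "user" or "assistant"
	Content string
}

// Llm is a hypothetical immutable conversation: the model name and history.
type Llm struct {
	Model   string
	History []Message
}

// clone copies the history so modified copies never share a backing array.
func (l Llm) clone() Llm {
	h := make([]Message, len(l.History))
	copy(h, l.History)
	return Llm{Model: l.Model, History: h}
}

// WithModel switches models but keeps the full history, so the next
// request to the new model replays everything said so far.
func (l Llm) WithModel(model string) Llm {
	c := l.clone()
	c.Model = model
	return c
}

// WithMessage appends one message.
func (l Llm) WithMessage(role, content string) Llm {
	c := l.clone()
	c.History = append(c.History, Message{Role: role, Content: content})
	return c
}

// Clear drops the history but keeps the current model (like /clear).
func (l Llm) Clear() Llm {
	return Llm{Model: l.Model}
}

func main() {
	l := Llm{Model: "model-a"}.
		WithMessage("user", "prompt 1").
		WithMessage("assistant", "reply 1").
		WithModel("model-b")
	fmt.Println(l.Model, len(l.History)) // model-b 2
	fmt.Println(l.Clear().Model)         // model-b
}
```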
@smoky ocean pushed
it seems to gaslight the shit out of Gemini though.
https://dagger.cloud/dagger/traces/d0fdc0527ef54425e4ebd5bdfdea2933
I am 100% certain those later messages are all Gemini - I can tell from the telemetry, and from how it refused to do the "what distro is this?" prompt
(the /clear at the end did accidentally make it swap back to Claude, which is fixed now)
That's amazing
so I think we need to implement the real ollama client 😬 I guess it's not so bad because we're up to 3 official clients now
Oh that reminds me @spring wave - with the multi-model focus, might be cool to show the model name prominently in the visible "identity" of the AI somehow in the trace. So that we can see "Deepseek said this"; "Gemini did that" etc
Maybe down the line there's a visual identity thing, like the space invaders but for LLMs
Trying to remember the details of the toy-workspace implementation. This line isn't real right? https://github.com/dagger/agents/blob/main/toy-programmer/toy-workspace/main.go#L11
it would be nice to have a way to do this for real. Like a LlmWithXOpts to exclude certain objects from getting added to tools
@shrewd ermine that line is actually prompt engineering
yeah I have something to the same effect in my system prompt. The problem is the container type still gets added to the tools and adds all the tokens
Yeah
I think multi-object might help
(mm but not really)
you're right we need an optional arg to hide fields
We have something similar in codegen already but it probably doesn't help because I'm guessing this goes in bbi
multi-object 🧵
i just added it to the llm prompt, agree seeing it in the trace would be nice, could be an easy span attribute (already have a model attribute for the metrics)
lemme know if it's annoying in the prompt, it's longer than i thought
the no-module 1 shot with a review environment
has anyone tried "write me an agent that ..." yet?
can you do it in dagger llm? It has /model now
then you should quote-tweet my claude code tweet with your video & say something like "this is amazing, I did XYZ"
tried and it failed miserably - probably bad prompt engineering on my part 🤣
then I'll retweet you
@spring wave is the /model thing merged to @llm?
yep
oh duh I have to rebuild cli 
ok so /with directory doesn't take any args... what kind of voodoo magic does it need to say "get . from host?" (that specifically doesn't do it lol)
oh man, running ollama and pgvector to embed documents is the first time I've heard the MacBook Pro (Apple M4 Pro) with 48GB RAM put the fans in overdrive 🤯
and also not be a lap top because it's on 🔥
/with directory | with-directory / ./ seems to work
is this a model issue or is it missing tools?
```
please read the contents of the file ./main.go 0.6s
🧑 please read the contents of the file ./main.go
🤖 I am sorry, I cannot read the content of the file. The API only returns a digest of the file.
```
that's what i've been seeing with gemini
ohhhh, maybe it doesn't support streaming + tools?
it's able to do directory things once I've done /with directory. Just gets stuck on file from there
```
what is in the current directory? 1.2s
🧑 what is in the current directory?
🤖 0.5s | LLM Input Tokens: 1,087 | LLM Output Tokens: 1
Directory.entries: [String!]! 0.0s
🤖 The current directory contains the following files and directories: .circleci, .dagger, .git, .github,
   .gitignore, .gitmodules, DEMO.md, Jenkinsfile, LICENSE, README.md, dagger.json, go.mod, go.sum,
   main.go, main_test.go, and website.
```
```
/with container 0.0s
Container@xxh3:6934f6e558023746
/with directory 0.0s
Directory@xxh3:c02ee2fb89ab4f04
/with file 0.0s
! find module "file": input: moduleSource local path "file" does not exist
  looking for module 0.0s
  ! input: moduleSource local path "file" does not exist
    moduleSource(refString: "file"): ModuleSource! 0.0s
    ! local path "file" does not exist
```
```
🤖 When I try to read the file at ./main.go, my tool encounters an error.

   The error message is:

   "ID lookup failed: ./main.go"
```
getting this on ollama too. Tried a few different big models
regression on llm.6?
this is on llm latest but i'm seeing the same on llm.6
Can you try on gpt-4o? Just give it a directory and ask it to tell you what's in a file
claude 3.7 is working 
testing
@smoky ocean https://x.com/kylepenfound/status/1897891500073144372
perfect
works with gpt-4o (0.17.0-llm.6)
Nice, something is definitely off for ollama and gemini in this then. I can dig deeper tomorrow!
Thanks for testing!
Can I use copilot (any supported model) with this?
through ollama only for now: https://github.com/dagger/dagger/blob/dd42a62c75a46292136ed4fa4737d6c449ecbd8a/docs/current_docs/agents.mdx#2configure-llm-endpoints
hmm, I don't want to run it locally necessarily. I can't really use bigger models that way. Is it even possible to integrate copilot with this? That's the only tool available to us in the Enterprise.
good question, it depends on 1) Copilot's available extension points (I'm not familiar) and 2) what you want to build with Dagger
Ideally you would get direct access to the LLM endpoint powering your Copilot instance. Probably a private Azure Cloud endpoint backed by OpenAI. But it's possible that GitHub doesn't expose that for anti-competitive reasons
I am thinking of allowing devs to ask dagger (based on my private daggerverse) to scaffold their CI pipeline or add a stage to it. Right now, without that context, the responses Copilot gives for dagger code are pretty bad.
Copilot does have the CLI but yeah it's not direct LLM access
is the Azure OpenAI Service an option for you? It seems like that's Microsoft's answer to the competition at the moment
Heard similar from someone I know at a big shoe company. Not allowed to go to public OpenAI, et al because they don't want employees to accidentally leak private info.
big shoes huh! I don't know if anyone's tested a private OpenAI setup yet, but private Ollama works great
I am honestly not sure. I could find out. Our copilot usage is mostly via GH
@uneven depot https://github.com/ericc-ch/copilot-api ? 🤪
edit related: https://dev.to/ericc/i-turned-github-copilot-into-openai-api-compatible-provider-1fb8
Got it! We've discussed (separately from this experimental feature) working on making Copilot better at working with Dagger
As of 3 Mar 2025, after around 1 month of usage, GitHub finally sends me the first warning email
Um, nope!
Is there a good way to get the response as a string from the LLM? Meaning I gave it a prompt, and want that LLM's response as a string. Does that require I give it a workspace with a write tool?
nope you can call last-reply and it'll give the content of the last message from the llm. or history for all messages
llm | with-prompt "who are you?" | last-reply
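Conceptually, `last-reply` is a backwards scan through the same message list that `history` returns. A minimal Go sketch of that idea, assuming messages carry a role field; `Message` and `LastReply` are hypothetical, not the engine's types.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical chat message.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

// LastReply returns the content of the most recent assistant message.
func LastReply(history []Message) (string, error) {
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role == "assistant" {
			return history[i].Content, nil
		}
	}
	return "", errors.New("the model has not replied yet")
}

func main() {
	history := []Message{
		{Role: "user", Content: "who are you?"},
		{Role: "assistant", Content: "first reply"},
		{Role: "tool", Content: "tool output"},
		{Role: "assistant", Content: "final reply"},
	}
	reply, err := LastReply(history)
	if err != nil {
		panic(err)
	}
	fmt.Println(reply) // final reply
}
```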
Tested and it works!
Implemented according to https://github.com/openai/openai-go/tree/main?tab=readme-ov-file#microsoft-azure-openai
pushed iteration #(i lost count) at reasonable scrollback behavior /cc @shrewd ermine @smoky ocean
should we cut a tag to get the latest improvements out ?
For anyone interested in the Dockerfile-Optimizer agent, I finally have a new version that is smaller and producing much better results via an eval: https://github.com/dagger/agents/pull/20 - feedback is welcome, it's not finished yet (still have to re-wire the feature branch and the opening of a PR to not break the current behavior).
Here is the kind of result I am getting, even on a super simple image, it's still able to shrink it by giving the llm some accurate feedback in between tries.
@storm gate nice! Is there a way to show the diff? It would make it feel more "real" in the demo than the agent saying "I did it"
I added a line to my prompt for the agent to cat out results to celebrate success. Could run diff too
I guess the "cat" was from the LLM since I didn't provide a shell
In my experience the more "real" artifact you show the more people get how cool it is. Of course opening a PR is awesome too 🤯
Yes, the diff will be in the PR it opens at the end (I did not wire this yet but it's already in the v1)
Active fork of toy-programmer: https://github.com/helloprkr/toy-programmer
Does anyone know if it's possible to have an agent interact with a table loaded up in duckdb? I'm trying to get an agent to do some simple/fun EDA on a small data set from github, but I'm struggling about how to tie the pieces together
I feel like I can figure out how to do this with langchain in python, but where/how dagger intersects is throwing me
hello! The way I would do it: first develop a small Dagger module that integrates with DuckDB. It doesn't have to do anything LLM-specific.
(you could also use a pre-made DuckDB module, but I searched on daggerverse.dev and couldn't find one)
Once the module is ready, instantiate an LLM (either in the command-line with 'dagger llm', or in code from another module) and bind an instance of your DuckDB module so it can use it.
do some simple/fun EDA on a small data set from github
Can you share a little more detail? Seems like a fun use case
Or, you could try giving the LLM just a plain container, and see if it can figure out how to install duckdb software, and fetch data from github on its own.
In that alternative scenario you don't even have to develop a custom module: you let the LLM do all the work from an open-ended environment. But, you will get less predictable results. It's the defining tradeoff of agent engineering
v0.17.0-llm.7 🧵
If this is helpful this is what I'm doing with python/langchain:
```python
...
con.execute("""
CREATE SECRET secret2 (
    TYPE s3,
    PROVIDER credential_chain
);
""")
result = con.sql("""
CREATE TABLE IF NOT EXISTS github_pulls
AS (
    WITH data AS (
        SELECT
            * exclude(is_locked, milestone_id, issue_number, is_pull_request, repository_id),
            row_number() OVER (PARTITION BY issue_id ORDER BY closed_at DESC) _row_num
        FROM 's3://launi-data-bucket/__unitystorage/catalogs/bdeab243-c684-4e4d-ab6c-c2c87596398d/tables/46d6ab44-359a-4d62-b888-682db22402e1/*.parquet'
        WHERE is_pull_request
    )
    SELECT * exclude(_row_num)
    FROM data WHERE _row_num = 1
)
""")
con.close()

db = SQLDatabase.from_uri(
    "duckdb:///./data.ddb",
    include_tables=['github_pulls'],
    sample_rows_in_table_info=3
)
toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))
agent_executor = create_sql_agent(
    llm=OpenAI(temperature=0),
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION
)

template = """
You are a SQL analyst that is querying a database of GitHub repository
information named "github_pulls".
Your job is to write and execute a query that answers the following question:
{query}
"""
prompt = PromptTemplate.from_template(template)
agent_executor.run(
    prompt.format(
        query="Which creator_login_name has merged the most pull requests? Tell me their name and how many pull requests they have merged."
    )
)
```
The intent is for another agent to think of interesting questions to ask this agent, which actually runs the query
So I guess if I were going to do this in dagger: I'd have a duckdb module that exposes something like with_db, execute_sql, etc., and then there'd be a dagger LLM writing queries and calling execute_sql?
@smoky ocean pushed more scrollback fixes/polish, if you wanna tag another release.
- Gave up on swapping to the Alt screen since that introduced new weird behavior. (interpreting scroll events as going through prompt history)
- Don't really need it anyway, it's simpler without it and the end result (after things finish running) is still good.
- Also got rid of the highlighted background since it was causing artifacting and reduced contrast.
- Now prints the scrollback on exit, and fixed the spacing above the "Full trace at ..." message
- Fixed an off-by-one error preventing the prompt from "hugging" the bottom of the screen
https://asciinema.org/a/oCkA9UGG7bEq9MuE14reWyM6q
Can tag myself when I'm back at the pc
nice! tagging now
0.17.0-llm.8 🧵
🚨 curl -fsSL https://dl.dagger.io/dagger/install.sh | DAGGER_VERSION=0.17.0-llm.8 BIN_DIR=/usr/local/bin sh
I just realized I can't do /with llm because the LLM type can't introspect itself 😭
multi-agent with zero lines of code, just raw command-line, would have been sick
is there a reason it can't? 
I guess not since type references are lazily resolved. it just doesn't get registered automatically by the middleware, because of protection against infinite recursion. Probably a 1-line patch to special case it
I'll try when the kids are asleep
this seems to help:
```diff
diff --git a/core/llm.go b/core/llm.go
index 210ef854b..76512e72b 100644
--- a/core/llm.go
+++ b/core/llm.go
@@ -667,6 +667,17 @@ func (llm *Llm) Set(ctx context.Context, key string, value dagql.Typed) (*Llm, e
 	}
 	llm = llm.Clone()
 	llm.env.Set(key, value)
+	if obj, ok := dagql.UnwrapAs[dagql.Object](value); ok {
+		llm.history = append(llm.history, ModelMessage{
+			Role:    "user",
+			Content: fmt.Sprintf("The variable %s is set to %s@%s.", key, value.Type().Name(), obj.ID().Digest()),
+		})
+	} else {
+		llm.history = append(llm.history, ModelMessage{
+			Role:    "user",
+			Content: fmt.Sprintf("The variable %s is set to %v.", key, value),
+		})
+	}
 	llm.dirty = true
 	return llm, nil
 }
```
before: https://v3.dagger.cloud/dagger/traces/b913833bf8e945bafda382b6a12e1d02
after: https://v3.dagger.cloud/dagger/traces/72b5fd437d2323cc8c5b70c38ce4948e
hrmm that helps it along a bit in the simple one-object case, but it doesn't seem to improve the 'use ctr to build repo' prompt (granted, it's an incredibly vague prompt)
you know what, instead of _load we can expose e.g. _load_ctr and _load_src. In other words _load_<key>: load the <typename> "<key>"
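Generating one tool per variable means the model picks `_load_ctr` directly instead of passing a variable name to a generic `_load`. A minimal Go sketch, assuming saved variables are tracked as a name-to-type map; `Tool` and `LoadTools` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical tool definition as sent to the model.
type Tool struct {
	Name        string
	Description string
}

// LoadTools generates one "_load_<key>" tool per saved variable, with the
// variable's type baked into the description, e.g.
// `_load_ctr: load the Container "ctr"`.
func LoadTools(vars map[string]string) []Tool {
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic tool order
	tools := make([]Tool, 0, len(keys))
	for _, k := range keys {
		tools = append(tools, Tool{
			Name:        "_load_" + k,
			Description: fmt.Sprintf("load the %s %q", vars[k], k),
		})
	}
	return tools
}

func main() {
	for _, t := range LoadTools(map[string]string{"src": "Directory", "ctr": "Container"}) {
		fmt.Printf("%s: %s\n", t.Name, t.Description)
	}
}
```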
one less indirection
trying
Hiiiii
I'm a dagger noob running through the AI docs page using v0.17.0-llm.8, with my OpenAI env vars stored in .env, and I get this 404:
```
! input: toyProgrammer.goProgram POST "https://api.openai.com/chat/completion
🧑 You are an expert go programmer. You have access to a workspace
🧑 Complete the assignment written at assignment.txt
🧑 Don't stop until the code builds
🤖 0.2s
  ! POST "https://api.openai.com/chat/completions": 404 Not Found
```
I'm assuming these are OK:
```
OPENAI_MODEL=o3-mini
```
Those seem OK but I'm not 100% sure about OPENAI_BASE_URL, because I usually don't set it, and let the openai client use the default. Might be worth unsetting it to see if it fixes the issue
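One likely explanation for that 404: the failing URL is `https://api.openai.com/chat/completions`, while OpenAI's Chat Completions endpoint lives under `/v1` (`https://api.openai.com/v1/chat/completions`), so a base URL set without the `/v1` suffix would produce this kind of error. A minimal Go sketch of how a client might resolve the endpoint, assuming it falls back to the public default when `OPENAI_BASE_URL` is unset; `chatCompletionsURL` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultBaseURL is OpenAI's public API base; note the /v1 suffix.
const defaultBaseURL = "https://api.openai.com/v1"

// chatCompletionsURL joins a base URL (empty means "use the default")
// with the chat completions path.
func chatCompletionsURL(base string) string {
	if base == "" {
		base = defaultBaseURL
	}
	return strings.TrimSuffix(base, "/") + "/chat/completions"
}

func main() {
	fmt.Println(chatCompletionsURL(os.Getenv("OPENAI_BASE_URL")))
	// A base without /v1 yields the URL from the 404 above.
	fmt.Println(chatCompletionsURL("https://api.openai.com"))
}
```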
Also hello, and welcome!
Fixed! unsetting worked
It works, pushing
Hello! I'm new to Dagger and I'm following this blog: https://docs.dagger.io/ai-agents/, but I can't make it work. I'm getting this error. Any help? Thanks!
ah ha, figured out how to enable token usage stats for OpenAI - had to flip a boolean in the params. pushed /cc @smoky ocean
hm this might come down to the AI model needing a bit of prompting. which model are you using?
What's a local model that runs on a macbook with 32gb of RAM that can be successfully used for this flow? I'm using qwq because it supports tool use and seems to run locally, but I got strange behavior when I tried to use the toy go coder with it. Is it just that a 32b parameter model is too greedy for this RAM profile? Is there a smaller model that still supports tool use that y'all might recommend?
I'd suggest qwen2.5-coder:14b. it's been pretty good so far for me
is there an example of a node/typescript coder workflow, rather than the go one? I'm less familiar with go, would be able to make more headway with an example in the node ecosystem
🤔
oh this is adorable:
root@4m6qndd0p6sfi:/app# cat hypergraphdb.go
<paste-hypergraphdb-code_HERE>
how's this one? https://github.com/jpadams/cypress-test-llm-ts it's typescript and writes cypress tests
also, this is the TS version of the same-ish toy-programmer with goProgram() using the Go toy-workspace as is. Multi-language FTW!
https://github.com/dagger/agents/tree/main/toy-programmer-ts
```typescript
import { dag, Container, func, object } from "@dagger.io/dagger";

@object()
export class ToyProgrammerTs {
  /**
   * Write a Go program based on the provided assignment.
   */
  @func()
  goProgram(assignment: string): Container {
    // Create a new workspace using the third-party module
    const before = dag.toyWorkspace();

    // Run the agent loop in the workspace
    const after = dag
      .llm()
      .withToyWorkspace(before)
      .withPromptVar("assignment", assignment)
      .withPromptFile(dag.currentModule().source().file("prompt.txt"))
      .toyWorkspace();

    // Return the modified workspace's container
    return after.container();
  }
}
```
ah, so like, the fact that the app is written in "go" is just a property of the prompt, the implementation of the flow is language agnostic?
Yep, this was a tightly-coupled demo example, but the concept could be super generalized so you could provide a container with the right tools for a workspace, and a function to test/check things with, etc.
I think there's a more general example like that...
interesting. So the tools that the agent gets access to - the ability to read and write from the file system - those are just baked into the Container?
baked into Dagger's universal type system. Container is just one primitive type, you can compose your own too
You can attach an object of any type to an llm instance and all of the functions on that object become tools for the LLM
ah, interesting.
So a core type like Container has a ton of functions/tools, prob too many
While in the toy-workspace we constrain down to 3 functions
read, write, build - keeps the agent out of trouble, and is surprisingly powerful when the agent can run in a loop: receiving the initial assignment and then errors and function return values as feedback as it loops, trying to reach the goal (and then stopping).
agent loop repeats
Is it a bug that sometimes when I give the agent a container with Alpine image via the /with command, it pulls another image (e.g., Ubuntu) when I ask it to do something? It seems like it doesn't know the context.
Basically a side-effect of giving too many tools. The container object has all the APIs to interact with a container, including choosing a new from, which LLMs seem to like to call. What I'd like to do is have a way to explicitly narrow down which tools are exposed from an object so I can just give it withExec or something
Totally agree. I added that feature as "function mask" in the merge checklist: https://github.com/dagger/dagger/issues/9801
Regarding the .help usage in shell. It does not show functions or constructors when in a module. Created this issue to track: https://github.com/dagger/dagger/issues/9828
Qwen is a bit drunk for me
```
🤖 Certainly! To understand what your Dagger module does, I'll need to take a look at the code in the
   https://github.com/levlaz/snippetbox repository. Specifically, I'm looking for a Dagger file, which is
   typically written in HCL (HashiCorp Configuration Language) or Go.
```
This gave me a good laugh! Qwen must be sponsored by Hashicorp
@spring wave I wish it was possible to share a specific span/repo from dagger cloud that is not in the context of a repo - esp for the agent use case where I might be in vibe mode and wanting to share a link for someone to look at - I think right now these become "orphaned" local traces and there is no way to access them via the public url
Is there an easy way to see which model is being used?
```
llm 0.0s
Llm@xxh3:372bce69d14ff865
```
I feel like llm used to tell you more info; now it's just this ID, it seems
llm | model
or if you're in dagger llm it'll be in the prompt
Thank you so much!
we got rid of the object dumping because in a lot of cases it was super spammy and didn't show the most useful info
Hm, is this all cached? I am trying to switch from local qwen to anthropic using env var but it seems to still be using qwen even though my env has changed
nvm, i am dumb
small UX feature request, if anyone has time: would be awesome if /model could autocomplete model names. I find it super hard to memorize the model names
either that, or support shortnames like "claude-sonnet", "claude-sonnet-2.7" or even "claude"
or both
Would love to see this, ran into it myself this morning
Also, if anyone is looking for well-scoped contributions: token counting doesn't work with openai 😭
#agents message
Experimenting with multi-object @spring wave https://v3.dagger.cloud/dagger/traces/c7544f59767be564b198cb49e9137181#07aa2dae8785d894
(is there a way to make a single trace public?)
no, i (and Lev :P) would very much like that too
hi @spring wave, been a while since our Cloud Foundry days
whoa hey!
i'm trying to run one of the agents, the dockerfile optimizer. i'm in a clone of the pandas repo, just trying to try it out but when i follow the instructions on the tutorial (https://docs.dagger.io/ai-agents/) i hit this positional arguments bug. maybe it's expecting me to pass in my api keys?
possibly a dumb question, but where is the code for the agentic loop? i see the initial prompt and the tools, but i'm confused how it's orchestrated
Great question, a bit of magic.
When you provide the LLM with an object like a Directory or ToyWorkspace using llm().withDirectory or llm().withToyWorkspace, all of the functions on that type are made available as tools for the LLM. Then you kick off the conversation with a prompt. The agent will loop on its own, requesting tools to be called with certain parameters and consuming the outputs until it sends a final response with no tool call.
For example the Dagger Directory type has these functions:
as-module
as-module-source
diff
digest
directory
docker-build
entries
export
file
glob
name
sync
terminal
with-directory
with-file
with-files
with-new-directory
with-new-file
with-timestamps
without-directory
without-file
without-files
This may be too many tools, or not the right ones for the job, so in a lot of cases we provide a set of tools and a suitable execution environment in what we're calling a "workspace".
This little example has just 3 functions/tools, to constrain the LLM and keep it on track. The prompt plays a big role. And some LLMs are better than others at calling and selecting tools. As you may have guessed, the "bigger, smarter" models are often better at it.
https://github.com/shykes/toy-programmer/blob/main/toy-workspace/main.go
read
write
build
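To make the loop described above concrete, here is a minimal sketch in plain Go with no Dagger or LLM dependency. It is not Dagger's implementation: `runLoop`, `ToolCall`, `Reply`, the scripted `model` and the in-memory `read`/`write`/`build` tools are all hypothetical stand-ins. The shape is the point: call the model, run whatever tools it asks for, feed the outputs back, and stop when it replies without a tool call.

```go
package main

import (
	"fmt"
	"strings"
)

// ToolCall is a hypothetical request from the model to run one tool.
type ToolCall struct {
	Name string
	Args map[string]string
}

// Reply is either a batch of tool calls or, when ToolCalls is empty, the final answer.
type Reply struct {
	ToolCalls []ToolCall
	Text      string
}

// Model stands in for the LLM: it sees the transcript so far and returns its next reply.
type Model func(transcript []string) Reply

// Tool stands in for one function on the bound object.
type Tool func(args map[string]string) (string, error)

// runLoop calls the model, runs any requested tools, appends their outputs to the
// transcript, and repeats until the model replies without a tool call.
func runLoop(model Model, tools map[string]Tool, prompt string, maxSteps int) (string, error) {
	transcript := []string{"user: " + prompt}
	for step := 0; step < maxSteps; step++ {
		reply := model(transcript)
		if len(reply.ToolCalls) == 0 {
			return reply.Text, nil // no tool call: this is the final response
		}
		for _, call := range reply.ToolCalls {
			out := "error: unknown tool " + call.Name
			if tool, ok := tools[call.Name]; ok {
				if res, err := tool(call.Args); err != nil {
					out = "error: " + err.Error() // errors go back to the model, too
				} else {
					out = res
				}
			}
			transcript = append(transcript, fmt.Sprintf("tool %s: %s", call.Name, out))
		}
	}
	return "", fmt.Errorf("no final answer after %d steps", maxSteps)
}

func main() {
	files := map[string]string{} // hypothetical in-memory workspace
	tools := map[string]Tool{
		"read": func(a map[string]string) (string, error) {
			if c, ok := files[a["path"]]; ok {
				return c, nil
			}
			return "", fmt.Errorf("no such file: %s", a["path"])
		},
		"write": func(a map[string]string) (string, error) {
			files[a["path"]] = a["contents"]
			return "ok", nil
		},
		"build": func(map[string]string) (string, error) {
			if !strings.HasPrefix(files["main.go"], "package main") {
				return "", fmt.Errorf("build failed")
			}
			return "build ok", nil
		},
	}
	// A scripted "model": write a file, build it, then answer.
	model := func(t []string) Reply {
		switch len(t) {
		case 1:
			return Reply{ToolCalls: []ToolCall{{Name: "write", Args: map[string]string{
				"path": "main.go", "contents": "package main\n\nfunc main() {}\n"}}}}
		case 2:
			return Reply{ToolCalls: []ToolCall{{Name: "build"}}}
		default:
			return Reply{Text: "done, last tool output was: " + t[len(t)-1]}
		}
	}
	answer, err := runLoop(model, tools, "write a minimal Go program and build it", 10)
	fmt.Println(answer, err)
}
```

The `maxSteps` cap is a choice made for this sketch, so a model that never stops calling tools cannot loop forever.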
ah got it. so it's this part:
after := dag.Llm().
WithWorkspace(ws).
WithPrompt(`
that kicks it off
is that a part of the OSS repo? i'm curious to poke around if it is
yep! On the llm branch
ah i see, so the "workspace" is a special case folder that tells the agent essentially what it's allowed to do/which tools to use
It's a Dagger module, in fact, the way we package up all Dagger functions for re-use, but in this case the trick is that every type XXXX in the API automatically gets an llm().withXXXX() function, so if your project has a ToyWorkspace, you get an llm().withToyWorkspace()
yeah "workspace" is not a special case
yep, so you can grab any module from https://daggerverse.dev/ and give it to an LLM as dag.Llm().With{Module} and it'll have all the functions of that module available as tools
(we really need a way to make that part more clear...)
aha, so that's what makes it so powerful
it's like Dagger's version of MCP in a way
exactly ๐
very very cool
side note, using Claude Code to explain the Dagger codebase to me is very fun
i'm not sure i should have paid 61 cents for the privilege but it found the right code!
It summarized the loop really well
@coarse epoch note we are adding actual MCP compat, so you can also expose any Dagger object as an MCP server
hopefully soon you can write a 5-line dagger script to do the same thing. cc @wraith remnant
did you try the dagger llm command yet?
for context, Alex and I were interns together at VMware on the Cloud Foundry team way back in 2011
I'm also ex-CF but I suspect we didn't overlap
you can bind any object to the llm context also. Same plumbing as explained above, just with a slicker UI.
Try: /with github.com/dagger/dagger/modules/wolfi
then:
please build me a container with git, go and python installed. Also add a text file that says "hello world". Then push it to the registry ttl.sh with an image name of your choice
$ dagger llm
gpt-4o * /with https://github.com/dagger/dagger@llm
gpt-4o * look in core/llm.go and explain how the agentic loop works in Dagger?
I feel like my dagger llm just keeps telling me how to do things, rather than doing them itself
are you using llama? it tends to do that more than others. With a little prompting it goes away
no, i'm using openai, 4o i think
I like to say "you have access to a container/workspace/whatever. Use it to accomplish your tasks. Don't tell the user how to do it, do it yourself". Then follow up with more specific prompts
This one works great with gpt-4o: https://github.com/shykes/toy-programmer/blob/main/main.go
If I already have a CI dagger module with test/build/etc functions, how can I give those to my agent directly, without recreating them in a workspace module?
Link? Let's try
- CI module: https://github.com/vikram-dagger/fastapi-sample-app/blob/main/.dagger.old/src/book/main.py
- Workspace module (currently recreates the test() fn, trying to improve this): https://github.com/vikram-dagger/fastapi-sample-app/blob/main/.dagger/workspace/src/workspace/main.py
maybe the workspace.test() is just a wrapper which internally calls ci.test()
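A sketch of that wrapper idea in plain Go: the workspace keeps a small tool surface for the agent, and its `Test` just delegates to the existing CI code instead of re-implementing it. `CI`, `Workspace` and `fakeCI` are hypothetical; in an actual Dagger module the CI module would be a dependency (e.g. added with `dagger install`) called through the generated client, which this sketch does not try to reproduce.

```go
package main

import (
	"errors"
	"fmt"
)

// CI stands in for the existing CI module; only its Test function matters here.
type CI interface {
	Test(source string) (string, error)
}

// Workspace is what the agent sees. Its Test tool delegates to CI instead of
// re-implementing the test logic.
type Workspace struct {
	ci     CI
	source string
}

func (w Workspace) Test() (string, error) {
	return w.ci.Test(w.source)
}

// fakeCI is a hypothetical CI implementation used only for this example.
type fakeCI struct{}

func (fakeCI) Test(source string) (string, error) {
	if source == "" {
		return "", errors.New("no source to test")
	}
	return "tests passed for " + source, nil
}

func main() {
	ws := Workspace{ci: fakeCI{}, source: "./app"}
	out, err := ws.Test()
	fmt.Println(out, err)
}
```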
whoa
sick i will try that
I was actually trying this last night, and got hit by a 40k tokens-per-minute limit with Claude 3.5 Sonnet
which is actually a very useful error, as you can iterate until you find the best model for your use case
maybe Gemini would be best as it allows more input tokens
Wrote a technical content summarizer agent that might have had a few too many Long Islands... 🤣
It's using Ollama and qwen2.5-coder, and does the following:
Takes a URL and strips the content out of the website using cheerio (JS library)
Gives a reader workspace with get-content (this uses cheerio) and check-content against a min and max length and forbidden words
Then it's asked to summarize the content for a non-technical audience
Right now it's modifying the summary to fit the actual check-content tool 🤦 so it needs a little tweaking
Code is here - will be touching up a little more https://github.com/jasonmccallister/tech-summarizer-agent
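The project above is JavaScript; as a hypothetical Go rendering of what a check-content tool like the one described could do, this validates a summary against length bounds and a forbidden-word list and returns one specific message per problem. The function name and the character-based length unit are assumptions, not taken from the repo.

```go
package main

import (
	"fmt"
	"strings"
)

// checkContent validates a summary against length bounds (in characters) and a
// forbidden-word list, returning one specific message per problem so the model
// knows what to fix. An empty result means the summary passes.
func checkContent(text string, minLen, maxLen int, forbidden []string) []string {
	var problems []string
	n := len([]rune(text))
	if n < minLen {
		problems = append(problems, fmt.Sprintf("too short: %d < %d characters", n, minLen))
	}
	if n > maxLen {
		problems = append(problems, fmt.Sprintf("too long: %d > %d characters", n, maxLen))
	}
	lower := strings.ToLower(text)
	for _, w := range forbidden {
		// Substring match, so "kube" would also flag "kubernetes".
		if strings.Contains(lower, strings.ToLower(w)) {
			problems = append(problems, "contains forbidden word: "+w)
		}
	}
	return problems
}

func main() {
	summary := "Kubernetes schedules containers across machines."
	fmt.Println(checkContent(summary, 20, 200, []string{"kubernetes"}))
}
```

Returning concrete messages rather than a bare pass/fail gives the model something to act on, though, as noted above, it may then rewrite the summary just to satisfy the checker.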
sneak peek
if tests or linting fails on a PR, it'll comment the fix as a suggestion
interesting, would like to see the code and maybe integrate into my demo as well. currently it only produces a diff in a single comment (and sometimes doesn't do the diff at all)
It's all this function here: https://github.com/kpenfound/greetings-api/blob/main/.dagger/main.go#L200
Getting the code suggestion diff correct was honestly half the work of building this demo lol. The heavy lifting for that is from workspace.Diff (workspace is a submodule) and suggestions.go in the main module
Thoughts on today's OpenAI announcements? https://openai.com/index/new-tools-for-building-agents/
Just tried a qwen model with llama.cpp:
llama-server -m qwen2.5-coder-7b-instruct-q5_k_m.gguf --host 0.0.0.0 --port 8000
when I ran a function using Llm and tool calling or via dagger llm /with xxx ... I hit:
got exception: {"code":500,"message":"Cannot use tools with stream","type":"server_error"}
srv log_server_r: request: POST /v1/chat/completions 192.168.64.2 500
Seems like a llama.cpp limitation, perhaps.
Most GGUF models lack OpenAI-style tools
need some wrapper magic, it seems: vllm or ollama?
Yeah that's correct, I thought we weren't streaming for the "other" provider but I haven't actually looked at it
Chat completion worked great though via dagger and dagger llm
Anyone else getting 502/503's from Gemini right now?
seems to be acting up a bit, yeah
oh cool, since ollama 0.5.13 you can set OLLAMA_CONTEXT_LENGTH to override the default context length of 2048
if you're using ollama with ollama serve, set this to something like 8192 and see qwen fly
this will be such a spicy demo, i love it
your wish is granted, it is live now #1349235356184350770 message
if you can get it to fix a flaky test on camera...
fixes tests and lints btw. I did not emphasize the linting part in the video
the linting side of things does invite naysayers "good linters fix their own errors" ... irl they don't but whatever
it's also the classic problem of "oh yeah but i run cyborg-vim closed beta that doesn't support linters so i just push"
lol or "my build system is complicated enough i don't actually know how to apply linter diffs, especially inside my editor" coughuscough
demos like this do make me wonder if we're gonna see a resurgence of phab or gerrit style stacked-diff code review systems... like someone should have to stamp the agent commit separately from the rest of the PR
https://www.git-town.com in our context, although the UX is still kinda unpleasant in github when it's not strictly necessary
there's also a paid saas startup that does this on top of github but i can't remember the name
are you thinking of trunk?
nope, https://graphite.dev
ah yes
fyi: merged main into llm, had some nontrivial conflicts but I think we're good, lemme know if anything seems... off
Feature request: a way to interrupt the llm, without killing the shell
@spring wave worth it for me to push a llm.9 ?
it's just merges from main I think
i kind of have a heretical view that we will eventually stop reviewing code altogether and AI will do all the reviewing
Could anyone repro a build error for me please?
dagger -m github.com/shykes/dagger@llm-multiobj -c 'cli | binary'
--> does this build for you? And if so, what's your engine version?
OK I can't build main either...
wth
what is your engine version? 
either 0.17.0-llm.8 can't build dagger main; or main is broken
ah. I have 0.17.0-llm.8 and 0.16.3 running so i can try both
I'm getting elixir SDK errors on both main and my llm-multiobj branch
yeah i got the elixir error from llm.8
# github.com/dagger/dagger/.dagger
./sdk_elixir.go:52:19: sdkDev.Lint undefined (type *dagger.ElixirSDKDev has no field or method Lint)
./sdk_elixir.go:81:18: sdkDev.Test undefined (type *dagger.ElixirSDKDev has no field or method Test)
./sdk_elixir.go:97:16: sdkDev.WithBase undefined (type *dagger.ElixirSDKDev has no field or method WithBase)
./sdk_elixir.go:98:16: sdkDev.Generate undefined (type *dagger.ElixirSDKDev has no field or method Generate)
./sdk_elixir.go:164:16: sdkDev.WithBase undefined (type *dagger.ElixirSDKDev has no field or method WithBase)
@shrewd ermine why does your :looking for module take a min lol - mine was 20ish sec
uh because I lied and it still had to start 0.16.3 lol
I am told main builds fine from 0.16.2
based on the error it feels like a merge issue
@smoky ocean this worked in 0.16.3
ok thanks
yeah but a merge issue that somehow causes llm.8 to fail to load one particular module (dagger/dagger) ??
oh wait yeah you're right
Ok this is melting my brain:
- 0.16.3 can load main and llm-multiobj fine
- llm.8 fails to load main, llm and llm-multiobj
- But llm.8 doesn't fail to load any other module that I know of?
@spring wave are you getting any of these build errors?
OK I narrowed down the issue to github.com/dagger/dagger/sdk/elixir/dev. Somehow, when loading that module, llm.8 returns a boilerplate module straight from dagger init
So it looks like we shipped llm.8 with a timebomb that goes off when you load sdk/elixir/dev (and possibly other modules from the dagger repo?). My guess: something went wrong in the SDK-bundling system at build time
Elixir SDK errors 🧵
getting a surprising result from LastReply() on llm.8 🧵
Is there a way to break a long line in the shell?
just adding a linebreak with enter should work, depending on where you do it; it'll only consider it complete once it's valid shell, so if you hit enter e.g. after a pipe it'll continue on to a new line. or, \ might work too (haven't tried)
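A crude sketch of the completeness rule described above, assuming only two triggers: a trailing pipe or an unclosed quote means "keep reading". Per the message above, dagger shell waits until the input is valid shell, which a real implementation would check with a proper parser; `needsMoreInput` is a hypothetical heuristic for illustration and, consistent with the next message, does not treat a trailing backslash as a continuation.

```go
package main

import (
	"fmt"
	"strings"
)

// needsMoreInput is a crude heuristic for "is this input complete yet?": it asks
// for another line if the input ends with a pipe or leaves a quote open.
func needsMoreInput(input string) bool {
	if strings.HasSuffix(strings.TrimSpace(input), "|") {
		return true
	}
	inSingle, inDouble := false, false
	for i := 0; i < len(input); i++ {
		switch c := input[i]; {
		case c == '\\' && !inSingle:
			i++ // skip the escaped character
		case c == '\'' && !inDouble:
			inSingle = !inSingle
		case c == '"' && !inSingle:
			inDouble = !inDouble
		}
	}
	return inSingle || inDouble
}

func main() {
	for _, line := range []string{
		"container | from alpine |",
		"container | from alpine",
		`llm | with-prompt "fix the failing`,
	} {
		fmt.Printf("%q incomplete=%v\n", line, needsMoreInput(line))
	}
}
```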
the \ does not work. Currently, if I want to write prompts with multiple lines, I have to write them outside the shell and paste them in.
turns out there's a keybind, ctrl+o - but that seems pretty hard to find to me. did you try shift+enter? i can swap it to that
oh - shift+enter isn't bindable at the moment, maybe that's why (https://github.com/charmbracelet/bubbletea/issues/1014)
I tried with shift+enter but it didn't work. I'm fine with ctrl+o for now until shift+enter is supported
I've got my agent writing some tests for me, but i haven't been able to get it to actually persist, so once my dagger call ends it's all gone. what do i need to do to save the work?
You typically would return a directory or file from your function and export it - https://docs.dagger.io/cookbook#export-a-directory-or-file-to-the-host
ah
i used directory because the doc for export said export the image as a tarball
Yeah something like dagger call <func> export --path=<somewhere> - if you are giving the repo as a directory you will probably want to export the entire directory
ah, wouldn't i just export my tests dir onto my existing tests dir?
Onboarding improvements 🧵
what's weird is even using export it's not writing anything
are you able to confirm that the llm is using its tools to write its work and not just "thinking" about its work? you can also look at the files before exporting if you run dagger call <func> terminal / <func> | terminal
and if you have a cloud trace, that helps a lot with debugging what the agent is doing
are we using this sdk for gemini? https://x.com/_philschmid/status/1900095644624134347
No the Dagger core engine is implemented in Go, so we use that SDK. cc @shrewd ermine who contributed that feature
Has anyone tried/had success giving an agent two workspaces?
In multiobject branch?
in general, have not tried that branch yet but will soon. I'm giving the agent a database workspace and a toy (go) workspace
At the moment (0.17.0-llm.8) you can only give an LLM one object at a time
History inspection 🧵
This is more about introspecting an Llm object before the execution, for prompt composition. Posted an explanation in the thread: #1349789097178300417 message - curious to know if anyone else is doing something similar.
@shrewd ermine yesterday I got a message from a friend trying out our agent stuff, he specifically called out your multi-agent demo module as particularly cool
awesome! that's good to hear. Maybe I should make a more advanced multi-agent demo too!
Quick question, what's the preferred way to launch two dagger containers within a single network? Application and a database as a real example
does the database need to be accessed other than by the application?
Nope, throwaway setup, it's agents after all (:
haha yes, that happens to be one of dagger's superpowers. here's an example in the cookbook https://docs.dagger.io/cookbook/#create-a-transient-service-for-unit-tests
the key line is the "with service binding" to connect the database container to the application container
drupal.with_service_binding("db", mariadb) # assume "db" will be the hostname
...
.as_service(use_entrypoint=True) # <- this is the key line
That's it?
My boi that's clean
Note if you want it to connect to an external DB (like one running on your laptop or an external server) you can bind that to your app container also. Service is an abstract type that can be backed by a container or a tunnel to host network
neat, so I can take app container, swap service binding and throw this onto prod so to speak?
yup! The cool thing too is that you don't even need to change the service binding necessarily. Using the snippet from the cookbook as the example, you could have an optional arg to your function for mariadb dagger.Service, and if it's not set create it like in the snippet. But then you could pass it in from the cli as --mariadb tcp://localhost:3306
that's what is shown here (without optionally creating one) https://docs.dagger.io/cookbook/#create-a-transient-service-for-unit-tests
Perfect, will try that out after checking how my pipeline runs on top of dagger
Finished porting all steps besides the database, but this should be ready in a couple minutes
@shrewd ermine if i drop a trace link in here is it public or can you see it because you have privileged access
yep dagger support (including myself) can see it
Another silly question: say I expect a command to fail often but still want to grab stdout and stderr
result = await container.with_exec(["tsp", "compile", ".", "--no-emit"])
exit_code = await result.exit_code()
stdout = await result.stdout()
stderr = await result.stderr()
return exit_code, stdout, stderr
I found the .sync() method, not sure where to correctly get stdout and stderr
you can pass the with_exec option expect=ReturnType.ANY and then make sure to check the exit_code https://github.com/kpenfound/dag/blob/main/workspace/src/workspace/main.py#L135-L142
@shrewd ermine your multi-agent demo has gpt-o1 as a constant for model name, but does that actually work?
uh probably not, i've always passed them explicitly. I can update it
Were you able to use o1 at all? For me llm --model=o1 fails with an api error
I haven't tried 🤷
Multi-object eval 1 🧵
Any techniques for trying to help dagger limit the amount of context it's sending to the LLM? I feel like i was making progress and now it's started blowing up due to exceeding token limits
maybe i should be limiting what i'm passing into the llm workspace
Yeah that'll help. We're going to have function masking so that you can limit which functions from your object get passed in, I think that's going to be big
Is there a way to expose the container without adding a container() function so i'm not confusing the LLM with that tool, but can still access the workspace container? (i should probably go look at the toy programmer repo you just sent..)
@lean mural we plan on adding a "function mask" so that you can hide certain functions (in this case, container()) when binding an object to the llm
for now you can use tricks like 1) adding a "do not use" comment in the function or 2) saying it explicitly in the prompt
in the case of my toy-workspace module, here's my trick: https://github.com/shykes/toy-programmer/blob/main/toy-workspace/main.go#L11
// The workspace's container state.
// +internal-use-only
Container *dagger.Container
That +internal-use-only doesn't mean anything to the Dagger SDK, it's for the LLM to read
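A minimal sketch of the planned "function mask", assuming it works as a simple deny-list over function names. The feature did not exist yet at this point, so `exposedTools` and its arguments are hypothetical. Unlike the `+internal-use-only` comment trick, a mask removes the function from the tool list entirely instead of relying on the model to obey a comment.

```go
package main

import (
	"fmt"
	"sort"
)

// exposedTools applies a hypothetical "function mask": every function on the bound
// object is a candidate tool, minus the names listed in mask.
func exposedTools(functions, mask []string) []string {
	hidden := make(map[string]bool, len(mask))
	for _, name := range mask {
		hidden[name] = true
	}
	var out []string
	for _, fn := range functions {
		if !hidden[fn] {
			out = append(out, fn)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	functions := []string{"container", "read", "write", "build"}
	fmt.Println(exposedTools(functions, []string{"container"})) // [build read write]
}
```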
does /prompt know how to handle reply types? in llm.8 at least, how do I get a prompt reply and set that as the new /with object?
mmm in prompt mode you can't do that unfortunately (I think). have to instantiate a llm type from the shell
"prompt reply" you mean the actual string message sent by the llm?
That's what I thought. I was able to get it with the shell, yes
No, sorry. The reply type of my tool calling
In my case the LLM is calling a tool that returns a container and I want to be able to do something with it
But I can't get a reference to that
ah I see. well in llm.8 as long as the returned type is the same as the original type you set with /with, the llm keeps the latest reference.
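A toy model of the llm.8 behavior described above, written from that description rather than from engine code: the state holds one bound object, and a tool result only replaces it when it has the same type. `llmState`, `observe`, `Container` and `File` are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical stand-ins for two API object types.
type Container struct{ Ref string }
type File struct{ Path string }

// llmState models the described behavior (not engine code): the state holds
// one bound object, and a tool result replaces it only when the types match.
type llmState struct{ current any }

// observe records a tool result and reports whether it became the new state.
func (s *llmState) observe(result any) bool {
	if reflect.TypeOf(result) == reflect.TypeOf(s.current) {
		s.current = result
		return true
	}
	return false // different type: the model saw it, but the state keeps the old object
}

func main() {
	s := &llmState{current: Container{Ref: "alpine"}}
	fmt.Println(s.observe(Container{Ref: "alpine+git"}), s.current) // true {alpine+git}
	fmt.Println(s.observe(File{Path: "/hello.txt"}), s.current)     // false {alpine+git}
}
```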
Yeah.. they're different
I'm using the toy programmer fwiw, which returns a Container and doesn't have a function to get a reference to it
in any case you can't get it out either way. but you can ask the llm to publish/export/open a terminal/expose a port
like in my demo when I say "give me a terminal"
Yes, but that works as long as there's a function that retrieves it
In this case there's not
It's a response from a function call. I can add another function to retrieve it
In any case, adding another function to retrieve it will work
I don't understand the issue. it should be able to chain
"get the container from your workspace then publish it to ttl.sh"
It's strange... it doesn't know about it, since the toy-programmer doesn't have a way to retrieve it
The LLM only knows about the goprogram, goprogramqa
ah you're doing multi agent? your "copilot" has access to a ToyProgrammer correct?
Yes
ok - then that's expected yes, no function to get the workspace
for a demo I recommend doing single agent unless you specifically want to show multi agent as the topic
for a "basic" demo of dagger prompt mode it might be too meta
Yeah... multi agent will probably be confusing. I'll call that toy-programmer with the shell and that's it
you can also bind toy-workspace to the prompt mode and basically recreate toy-programmer from the CLI. Then show that you can do the same in code
both work
I am using the latest dagger with llm integration. When the llm tries to expose the port, it gets the following error: decode arg "port": cannot create Int from float64
This is the call that it is declaring to use:
<invoke name="withExposedPort">
<parameter name="description">Plone HTTP port</parameter>
<parameter name="port">8080</parameter>
<parameter name="protocol">TCP</parameter>
</invoke>
Any idea how I could hint it? Or is that a known bug in the integration?
I have found the following in https://github.com/dagger/dagger/pull/9628
Bug 3. Cannot create int from float64 fixed! thank you @spark phoenix ๐
IOW, might not be totally fixed. Or I am looking at another flavour of that bug.
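One likely (unconfirmed) cause for this class of error: Go's `encoding/json` decodes every JSON number into `float64` when the target is `any`, so an integer argument like `8080` arrives as `float64(8080)` and a strict Int decoder rejects it. The sketch shows the pitfall and two ways around it: `Decoder.UseNumber`, or accepting integral `float64` values. The helper names are hypothetical; this is not the engine's code or the fix from the PR above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strings"
)

// decodeGeneric decodes tool-call arguments into map[string]any. encoding/json
// represents every JSON number in such a map as float64.
func decodeGeneric(raw string) (map[string]any, error) {
	var args map[string]any
	err := json.Unmarshal([]byte(raw), &args)
	return args, err
}

// decodeWithNumbers keeps numbers as json.Number by enabling Decoder.UseNumber.
func decodeWithNumbers(raw string) (map[string]any, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber()
	var args map[string]any
	err := dec.Decode(&args)
	return args, err
}

// toInt accepts the shapes an integer argument can arrive in and rejects the rest.
// The 32-bit bounds follow GraphQL's Int type.
func toInt(v any) (int, error) {
	switch n := v.(type) {
	case int:
		return n, nil
	case json.Number:
		i, err := n.Int64()
		if err != nil {
			return 0, err
		}
		if i > math.MaxInt32 || i < math.MinInt32 {
			return 0, fmt.Errorf("%d is out of range for Int", i)
		}
		return int(i), nil
	case float64:
		if n != math.Trunc(n) {
			return 0, fmt.Errorf("cannot create Int from non-integral float64 %v", n)
		}
		if n > math.MaxInt32 || n < math.MinInt32 {
			return 0, fmt.Errorf("%v is out of range for Int", n)
		}
		return int(n), nil
	default:
		return 0, fmt.Errorf("cannot create Int from %T", v)
	}
}

func main() {
	raw := `{"port": 8080, "protocol": "TCP"}`
	generic, err := decodeGeneric(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", generic["port"]) // float64
	port, err := toInt(generic["port"])
	fmt.Println(port, err) // 8080 <nil>
	numbered, err := decodeWithNumbers(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", numbered["port"]) // json.Number
}
```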
is +internal-use-only a thing? do we want to make it a thing? https://github.com/dagger/agents/blob/8c35400cb65192ed5bf34936f88481fe40cebd12/toy-programmer/toy-workspace/main.go#L8-L13
i can't see how it's used, is this different from what a +private field is?
AFAIK it's not a thing, but I assume we do want to make it a thing in some way. Not sure where the discussion for this feature is
oh I guess it's not private because we still need to access it from code
but the LLM shouldn't see it right
yes, exactly
mm okay yeah fiddly
not sure we want to call this, but it doesn't seem plumbed through to anything
gonna merge llm-multiobj in to llm and keep working on it
i think we might be able to just skip the feature flag, and have it only opt in to multi-object once you set variables 

neat neat
except for interactive shell which automatically sets variables. maybe that one should be opt-in somehow? @shrewd ermine have you had a chance to play around with it yet? wondering how the demo UX is
I haven't yet but I will this afternoon
pushed the merge, here are the cliff's notes:
- llm command is gone - use shell instead
- press > at start of input to swap to prompt mode, ! to switch back
- /with is gone - I'll add a .with builtin instead (e.g. .with $(container | from alpine))
- vars that you set in shell are synced to the LLM, vars that the LLM sets are synced back to the shell (though you usually have to tell it to set one)
- in prompts you can use $vars (auto-completable) to explicitly reference objects, or not and see if it figures it out based on context
working on .with now
"autocompletable" 🥳 🥳
still need to backport that to shell completion funnily enough
prompt can complete shell vars but not shell
btw, dagger shell is displayed with a strange hue on my terminal (ghostty). It wasn't like that before
that should be gone now (not sure if it shipped)
ah! ok cool. I'm on 0.16.3 so it hasn't shipped
@spring wave should we celebrate multi-obj being merged with a llm.9 tag?
not yet please, unless you want to
the merge is just to avoid losing time to merge conflicts
but, should be soon
i'm adding a .llm builtin to replace /with and let you access the current LLM object:
# set state
container | from golang | .llm
# access current LLM state
.llm
# get state
.llm | container
# all in one
container | from golang | .llm | container
pushed
can we change the name of that builtin? looks a lot like llm...
does anyone have a clue why the model doesn't seem to figure out how to call withExposedPort?
edit: not sure if it's a claude specific thing
/with $(container | from alpine ) 2.0s
Container@xxh3:0991463b40bf15ca
expose the port 8080 in the container 12.4s
  🧑 expose the port 8080 in the container
    0.0s
  🤖 I'll help you expose port 8080 in the container using the withExposedPort function. This function requires the port parameter, and since you specified port 8080, I can use that directly.
    3.6s | LLM Input Tokens: 11,701 | LLM Output Tokens: 105
  🤖 I apologize for the error in my previous attempt. Let me try again with the correct format:
    2.5s | LLM Input Tokens: 29 | LLM Output Tokens: 78 | LLM Input Tokens (cache writes): 11,805
I'm also not able to see the call in the trace here: https://v3.dagger.cloud/marcos-test/traces/b81bf3d688e337cb4c844ce71b25a546
open to suggestions, though it literally returns an LLM, the same as llm, so the name doesn't seem that far off to me. it's sort of like "the current LLM state". e.g. .llm | history will show you the message history, .llm | last-reply gives you the last reply, etc
.current-llm?
oh sorry - I thought it returned the current selected object within the llm
ah np, yea that'd be weird
the trick is it also supports piping state to it
then I could just set it
i think we'd lose that with a var
copilot=$($copilot | with-foo) ?
true
wait how do I pipe state to .llm?
container | from alpine | .llm
it figures out the type and calls the appropriate withFoo
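A hypothetical sketch of that dispatch in plain Go: derive the setter name from the piped value's type, so a `Container` maps to `withContainer`. The real builtin presumably works on Dagger's own type information rather than Go reflection; `withFunctionFor` and the stand-in types are illustrative only.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical stand-ins for API object types.
type Container struct{}
type Directory struct{}

// withFunctionFor derives the setter name from the piped value's type:
// a Container maps to withContainer, a Directory to withDirectory, and so on.
func withFunctionFor(v any) string {
	if v == nil {
		return ""
	}
	t := reflect.TypeOf(v)
	for t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Name() == "" {
		return "" // unnamed types have nothing to dispatch on
	}
	return "with" + t.Name()
}

func main() {
	fmt.Println(withFunctionFor(Container{}))  // withContainer
	fmt.Println(withFunctionFor(&Directory{})) // withDirectory
}
```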
this is to replace /with for current demo flows, since all the slash commands are gone
why not just set variables?
you can do that of course, this is in the spirit of compatibility with pre-multi-object demo flows, in case we're still giving them, since using vars changes how you prompt it. with this you just change from /with to piping at the end
bearing in mind the original plan was to feature-flag all of the multi-object stuff
super confused, on a fresh checkout of the llm branch I have
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: sdk/php/generated/LLM.php
modified: sdk/php/generated/LLMId.php
and restore/checkout doesn't actually fix it
those files don't actually exist, but I do have Llm.php and LlmId.php
except also they do exist 
➜ dagger git:(llm) ✗
> ls -la sdk/php/generated/LLM.php
-rw-r--r-- 1 kylepenfound staff 27015 Mar 14 13:36 sdk/php/generated/LLM.php
➜ dagger git:(llm) ✗
> ls -la sdk/php/generated/Llm.php
-rw-r--r-- 1 kylepenfound staff 27015 Mar 14 13:36 sdk/php/generated/Llm.php
➜ dagger git:(llm) ✗
> ls sdk/php/generated/L*php
sdk/php/generated/Label.php sdk/php/generated/LabelId.php sdk/php/generated/ListTypeDef.php sdk/php/generated/ListTypeDefId.php sdk/php/generated/Llm.php sdk/php/generated/LlmId.php
probably some fun with case insensitive filesystems
guessing you're on macOS
we probably need to re-generate all of sdk/*
> git status
On branch llm
Your branch is up to date with 'upstream/llm'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: sdk/php/generated/LLM.php
modified: sdk/php/generated/LLMId.php
no changes added to commit (use "git add" and/or "git commit -a")
➜ dagger git:(llm) ✗
> rm sdk/php/generated/LLM.php sdk/php/generated/LLMId.php
➜ dagger git:(llm) ✗
> git status
On branch llm
Your branch is up to date with 'upstream/llm'.
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: sdk/php/generated/LLM.php
deleted: sdk/php/generated/LLMId.php
deleted: sdk/php/generated/Llm.php
deleted: sdk/php/generated/LlmId.php
no changes added to commit (use "git add" and/or "git commit -a")
yeah very weird lol
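The `git status` output above is consistent with the index tracking both `LLM.php` and `Llm.php` (and the matching `Id` pair) while the checkout sits on a case-insensitive filesystem such as the macOS default: both index entries map to one file on disk, so removing one removes both. A small sketch for spotting such pairs, for example in the output of `git ls-files`; `caseCollisions` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// caseCollisions groups paths that differ only by letter case. On a
// case-insensitive filesystem every path in a group refers to the same file.
func caseCollisions(paths []string) [][]string {
	groups := map[string][]string{}
	for _, p := range paths {
		key := strings.ToLower(p)
		groups[key] = append(groups[key], p)
	}
	var out [][]string
	for _, g := range groups {
		if len(g) > 1 {
			sort.Strings(g)
			out = append(out, g)
		}
	}
	// Sort groups so the output order is deterministic.
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}

func main() {
	tracked := []string{
		"sdk/php/generated/LLM.php",
		"sdk/php/generated/LLMId.php",
		"sdk/php/generated/Llm.php",
		"sdk/php/generated/LlmId.php",
		"sdk/php/generated/Label.php",
	}
	for _, g := range caseCollisions(tracked) {
		fmt.Println(g)
	}
}
```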
basic single-object workflows don't work on the tip of llm, fyi
I just gave a demo and it broke
llm | with-container | with-prompt "do you see the container" -> doesn't see it
lol we have reached the "damn we really need tests don't we" point of this branch
did that prompt work before? i've found it always has trouble with that sort of phrasing, since it's more around what set of tools it has
I just did llm | with-container $(container | from alpine) | with-prompt "do you see the container" | last-reply on llm.8 with ollama and it was good
this worked for me:
llm | with-container $(container | from golang) | with-prompt "what tools do you have? list them all" 43.7s
# ... lists all of Container, plus the multi-object tools (TODO it should not)
but yeah it responded with the list of tools
false alarm then we haven't reached the "damn we really need tests don't we"
i really need tests but that's a personal problem
who needs'em
it wasn't the actual wording - I used wording that consistently worked before
oh boy... regenerating the SDKs has a circular dependency on the CLI https://v3.dagger.cloud/dagger/traces/64f461f3758096962c6d5c2f94117690
rust codegen is broken by the LlmID -> LLMID change; it ends up as Llmid and doesn't pass a ends_with("Id") check ๐ซ