#agents
1 messages · Page 1 of 1 (latest)
Has anyone found any of the new "computer use" features that have any sort of free tier. It seems the anthropic API is only paid - I was hoping to do some basic experiements without dropping a CC yet 😄
@subtle surge @merry scarab I share the desire for a new channel but I think this was a bit rushed
Dagger and... is for integration / combination with specific tools or tech. This feels different
Yeah I hear that, but on the other hand I don't see it as that different than dagger + aws, since aws represents hundreds of potential tools.
Totally open to other approaches though, its not too late to close this one and rethink it.
I think this is more like a use case
Dagger for...
And it should be more specifically about AI agents, "AI" is a bit too broad
Also possible that I'm overthinking it
Yeah I agree on the Dagger for... that was my intent in the first place for sure.
Also agree that its about agents vs all possible ai topics.
Issue is adding Dagger for... feels like an even bigger escalation than adding a new channel 🙂
I also thought about it as a similar channel to "Dagger and AWS", but I should've checked with you first.
For now, I am excited that we now have a place for the community to put thoughts and ideas around "Dagger and AI". On that note, would love the community's thoughts on Solomon's AI demo from the last community call. https://www.youtube.com/watch?v=ycjT88jZccQ
@subtle surge is there a reason this is unlisted? 🙂
Yes, but I can change that 🙂
All i can say is, always say thank you to gpt. you never know when terminator judgement day comes, it will remember the kind humans and spare us
AI Agent
Definition: An AI agent is a software entity that autonomously performs tasks on behalf of a user or another program. It is designed to perceive its environment, make decisions, and act to achieve specific goals.
Functions:
Autonomy: Operates independently without continuous human intervention.
Task-Oriented: Focuses on specific tasks such as data analysis, customer service, scheduling, or monitoring.
Learning: Can incorporate machine learning algorithms to improve performance over time.
Examples: Chatbots, virtual assistants (like Siri, Google Assistant), recommendation systems.
AI Avatar
Definition: An AI avatar is a digital representation of an entity, often human-like, that interacts with users in a more engaging and personal way. It combines visual and sometimes voice elements to simulate a human presence.
Functions:
Representation: Provides a visual and interactive interface for AI systems.
Engagement: Enhances user interaction and engagement by simulating a human-like presence.
Communication: Can use natural language processing (NLP) to understand and respond to user input.
Examples: Virtual customer service agents with human-like avatars, digital characters in video games, virtual influencers.
your saying just reminded me of the difference between AI Agent and Avatar.
I watched the demo a few weeks ago, what stuck with me from it was that it matched my experience of getting a chat bot to do something like troubleshoot and fix some networking problems with MacOS. But I had to ask it to explain its rollback plan before issuing some obscure CLI commands . Apple Intelligence, the other AI, OTOH, doesn’t appear to be trained at this level..
AI = algorithm we don't understand because it's occluded by a "network"
@harsh stag the reason why we couldn't load your module in our call with @smoky ocean seems to be related to an old dependency in your project's main branch. If you run this in the shell .doc git@github.com:siafu/cover.ai/apps/reporter@feat/pytest-plugin, that works
@quiet ether
let me up and running I want to attempt a POC with the latest version of the agent
I have my company and another lined up
lets talk early this week if possible
@harsh stag let me prepare instructions for you real quick
@harsh stag checkout github.com/shykes/melvin. I added a setup.sh script
(adapted from your gist @bronze fern thank you 🙏 )
I don't see the setup script. did you push?
You can also set:
LLM_HOSTLLM_PATHLLM_MODEL
Those should allow you to connect to any openai-compatible endpoint.
@shrewd ermine and @shrewd fern know how to get it working on local models.
FYI @spring wave @quiet ether @shrewd ermine @worn hill @spark phoenix I'm going to move the llm/agent dev thread to here
I like this new API better:
⋈ llm | with-git-repository $(git https://github.com/dagger/dagger) | ask "show me all minor releases, each with the latest patch release"
Below are the minor releases along with their latest patch releases:
- **v0.1**: Latest patch release is **v0.1.0**
- **v0.2**: Latest patch release is **v0.2.36**
- **v0.3**: Latest patch release is **v0.3.13**
- **v0.4**: Latest patch release is **v0.4.2**
- **v0.5**: Latest patch release is **v0.5.3**
- **v0.6**: Latest patch release is **v0.6.4**
- **v0.8**: Latest patch release is **v0.8.8**
- **v0.9**: Latest patch release is **v0.9.11**
- **v0.10**: Latest patch release is **v0.10.3**
- **v0.11**: Latest patch release is **v0.11.9**
- **v0.12**: Latest patch release is **v0.12.7**
- **v0.13**: Latest patch release is **v0.13.7**
- **v0.14**: Latest patch release is **v0.14.0**
- **v0.15**: Latest patch release is **v0.15.3**
@spring wave I wired middleware back in, it works for core types but inexplicably, causes modules to fail loading with missing Go types at module runtime build. Even when I comment out the payload of the middleware, even a noopModuleWithObject simply being called triggers the issue...
dagger -m github.com/shykes/dagger@llm shell -c 'dev | with-mounted-file .env .env | terminal -c "dagger -i shell -m github.com/shykes/hello"'
@smoky ocean .env is a directory now?
@spring wave I pushed a commit that re-enables the middleware, so the issue is reproducible
I think lack of self-calls will be a blocker for making the most of this for agent dev
@spark phoenix re-posting the 4 bugs I'm currently stuck on... (was initially on another thread, but here is better)
## Bug 1. "Can't instantiate..." Solved!
## Bug 2. "FIeld "withExec" not found..."
Bug 3. Cannot create int from float64
⋈ llm | with-directory $(directory) | with-prompt "write a file hello.txt with contents 'hello world' and set the permissions to 0600" | history
🧑 💬write a file hello.txt with contents 'hello world' and set the permissions to 0600
🤖 💻 withNewFile({"path":"hello.txt","contents":"hello world","permissions":384})
💻 error calling tool: decode arg "permissions": cannot create Int from float64
🤖 💻 withNewFile({"path":"hello.txt","contents":"hello world"})
💻 ok
🤖 💬I have created the file `hello.txt` with the contents "hello world". The file was created without setting specific permissions due to an earlier issue, so it has the default permissions. Would you like me to attempt setting the permissions again?
⋈
Bug 4. Cannot load modules
This was introduced in my refactoring yesterday
$ dagger call -m github.com/shykes/hello functions
✘ Container.withExec(args: ["go", "build", "-ldflags", "-s -w", "-o", "/runtime", "."]): Container! 0.5s
# hello/internal/dagger
internal/dagger/dagger.gen.go:5461:33: undefined: Engine
internal/dagger/dagger.gen.go:5472:38: undefined: EngineCache
internal/dagger/dagger.gen.go:5483:43: undefined: EngineCacheEntry
internal/dagger/dagger.gen.go:5494:46: undefined: EngineCacheEntrySet
internal/dagger/dagger.gen.go:5648:31: undefined: Host
! process "go build -ldflags -s -w -o /runtime ." did not complete successfully: exit code: 1
Error: failed to serve module: input: module.withSource.initialize failed to initialize module: failed to call module "hello" to get functions: call constructor: process "go build -ldflags -s -w -o /runtime ." did not complete successfully: exit code: 1
--> Happens with any module
Note: this repro doesn't use dagger shell, because shell doesn't show the full error message for some reason
When dagger rag agents that push your code for you?
I think soon that will be something you can do yourself in a dagger module 🙂 (or find a module in daggerverse that already does it)
@spark phoenix here's the main PR: https://github.com/dagger/dagger/pull/9504
Added a setup script in the PR description
I see it coming too cant wait
You can try with this branch already, it's not finished but honestly pretty fun 🙂
@woeful flume do you have a use case in mind? 🙂
Solved 🙂
OK this one truly mystifies me 😭
FIXED!! Finally
my use case is using it with all our customers at a Fortune 1000 AI CoE I work at. An initial poc would need to see it work via jenkins then other ci platforms.
@spring wave I was wrong, the problem happens even if I make llmMiddleware.ModuleWithObject() a noop... No idea where to look next . I may have broken something from your module loading middleware code in my refactor? but the middleware now doesn't create any new types or patch every object type, it just adds 2 functions (setter and getter) per object type, always on the llm type
@gloomy kindle @shrewd fern I would like to hack together self-calls, I think it will be crucial for this feature to make sense to AI agent devs. Can you point me in the right direction? How would you go about it?
There need to be two separate passes - introspection and then codegen - atm it is one huge phase
To be able to codegen for yourself you need to know your own introspection
The options are either to hack it into every sdk manually or to cleanly separate out those phases at the sdk interface level
Mmmm I don't sense an easy stopgap for next week's demo 🙂
Yeah 😢
When is next weeks demo?
Maybe there's something I'm missing, but there's a reason its not done 😢 can reply in more detail when I'm back from PTO tomorrow
@gloomy kindle sorry I didn't realize you were on vacation!
From the engine pov I think self calls just work - it's the codegen for those self calls that's missing
Actually self calls are missing something on the engine also (I think)- you can't get your own ID
Ooo yeah okay
Although, that doesn't stop you doing a self call
Just doing a self call on an object you've made yourself
Right
But yes good point, that's a rather more tricky limitation
That probably requires some thinking about how to solve
I was initially hoping to do a stopgap without bindings, but really without the full codegen it's useless, because you can't "weave" self-calls into the rest of your code
Yeah for id-ying your own objects, it's a bit miserable I think, I think we'd prooobably need to embed dagql ID construction into the SDKs? Or something similarly tricky
Not out of the question, I've wanted this anyways, but definitely not trivial
I'm starting to think that we should prioritize self-calls aggressively, because it's a forcing function to address fundamental issues in our DX apparently
same for object persistence at the lower levels of the engine (buildkit caching etc)
I'd be happy to try picking this thread up again - I was working on it for a bit
Nice, that would be a trio of longstanding platform gaps, that the LLM use case really needs, and might all get a much-deserved boost:
- Object persistence (via @steep onyx attacking the caching layer)
- Generated clients (via @hidden tartan who needs it for his experimental Docker SDK)
- Self-calls 🎉
@shrewd ermine @spark phoenix quick status update:
-
I'm stuck on the remaining bugs, @spring wave offered to take a look at "bug 4" which is the most painful right now (can't load any module from llm branch). He is timeboxed because of other commitments so let's see if more help if needed today.
-
Meanwhile I'm turning my attention to melvin. Since I can't prompt from the melvin code yet (because of bug 4) I'm focusing on tools, and general project structure
-
I'm trying a "tools-first approach": what tools will the AI need for each part of the workflow? And what's the perfect environment to consume these tools? From there I design the corresponding sub-module. It has to be sub-modules, so that the top-level module can hook them up to a LLM (no self-calls means you can't hook up a LLM to your own module's types..)
-
Right now I'm focusing on a
workspacesub-module, since I wrote several variations of it for my past llm modules. Basically a basic environment for editing files, mounting them in a container, and running commands. With easy passing of files in and out, and checking the history of changes (to supervise the work of models) -
The next submodule which is up for grabs, is github interactions. Imagine a LLM tasked with monitoring a github issue, communicating back and forth with users (including catching new comments, distinguishing them from your own, etc), perhaps some abstraction for reporting a list of tasks and their status. Ie a declarative API for managing those tasks in a stateful way, instead of a stateless firehose of github messages (like in my demo 🙂
Nice the github one sounds similar to what @spark phoenix demoed
Yeah exactly
Basically picture yourself driving that module from the shell, to accomplish a task. You can't use anything other than that module's API via dagger shell. Can you do it?
If you can do it, then the LLM can do it (basically)
After that, there's still big questions around the AI-specific parts. Ie. which modules actually prompts LLM to do what? If we split up the agent into multi-agent, how do the agents talk to each other?
But I feel like if we get the nice environments working first, we'll be better equipped to answer
(also until bug 4 is fixed, we don't have much choice 🙂
Self calls might be possible once we complete the module generated client, it would just be a matter of loading the current module in addition to the rest
Because we could generate binding to call the current module functions
Aha!
I'm not sure it fully solves it, you still need to know what your own functions are (the introspection part), which is tied into codegen itself.
Also still no way to get your own IDs.
But yes, it's a way to potentially get something (but we still fundamentally need to refactor the interface if we want this to be built into all modules)
Yeah that's a lot of work indeed
I'm technically stuck on loading a module that doesn't have source for now, it's already a big challenge because the engine is assuming too many things when loading a module
Yeah are you working on untangling this? I'm happy to start on it, once I get back and get the next release prep started
Feels like it's blocking a few things
To understand how it would work, I'm doing a side implementation of a Module that doesn't have source, and then I'll find a way to consolidate everything.
Ideally we would load a module differently depending on the task to perform/what's available
Ah make sure to look at sipsma's recent pr
So maybe a bunch of interface around Module could abstract that, I don't know yet because it's not working for now haha
Which one?
I haven't seen it properly yet, but there's one that heavily refactors how modules are loaded to be wayyyy more efficient and simple
Goals here are:
Fix various sources of excessive module building cache invalidation
changing completely unrelated source files in context
building different git commits that have no changes in th...
@harsh stag so you got it to build? 🙂
ok so next steps:
-
You can hook up core dagger objects to a llm, from the dagger shell, to get familiar with the possibilities
-
as soon as we fix our remaining bug, you can start hooking up your own dagger modules to the llm so that it can drive them
Try this as a start (from the dagger shell session):
llm | with-directory $(directory | with-new-file hi.txt "Hi Bob") | with-prompt "look for a text file, then read its contents. Who is the message addressed to?" | history
@smoky ocean pushed a fix for bug 4
THANK YOU! Was it very stupid?
actually don't answer that 😛
it wasn't that stupid 😛 arguably something our codegen should handle (I left a comment + TODO)
OK I feel less bad for not figuring it out then
tl;dr there are some types that we exclude from module codegen (Engine, Host) - but codegen will still codegen fields that return those types, which is why it errors
Ooh! I was wondering why those specific types
Thought it was an alphabetic order thing + race condition... that's how deep in the rabbit hole I was
testing now
haha, i tried asking cursor "what do these types have in common?" after it indexed everything. but, no cigar. even though they're all literally listed together in the codebase somewhere, it just told me general stuff
@spring wave loading modules worked! But doesn't seem to hookup the setter/getter in Llm for the module type?
Maybe I commented out that part while debugging and forgot to reconnect it?
dagger shell -m github.com/shykes/hello
⋈ llm with-hello $(.)
github.com/shykes/hello ⋈ llm | with-hello
Error: no function "with-hello" in type "Llm"
seems like we shouldn't even need the module middleware anymore right?
I don't know - didn't fully understand why we needed it in the first place
something about 2 different modes of introspection co-existing in the engine
we needed to before because it had to add the agent fields to the module's type
oooh
but now the only thing that changes is the LLM type, and that's all through standard graphql inspection
i wonder if the schema actually is there, but the CLI has an outdated view of it, or something
But will external clients (eg. my CLI session) find all the setter/getter fields in the LLM type, regardless of whether they reference module or core types?
yea looks like it's there in the schema (good ol dagger listen + apollo sandbox)
it should but maybe there's a sequencing/timing thing, if it's all introspected at once or something
but it's confusing because you'd expect to have that problem with the module functions too
looking into it
pushed a fix for #3
nice thanks @spark phoenix !
I think I found a nice pattern for the workspace sub-module. Impatient to try it 🙂
@spark phoenix it feels like the pattern is to program the objects that the agent will interact with.
(eg. a workspace - makes sense that the workspace itself does not include prompts. It's not an agent, it's an object that an agent can use)
Ok I'm 99% there... pushed what I have. Quick lunch break then I try to plug a llm into it
I think think it's going to be
@shrewd ermine here's what it looks like so far: https://github.com/shykes/melvin/blob/main/workspace/main.go
Contribute to shykes/melvin development by creating an account on GitHub.
Note: the LLM pattern brings a lot of clarity to what should be in constructor arguments, and what should be in WithFoo chained methods... depends on what you want inside vs. outside the sandbox
Nice that makes perfect sense
Really happy with that "checker" pattern - in a perfect world it would be a dagger interface, but easier to use container + default args for now
@shrewd ermine
start=$(git https://github.com/dagger/dagger | head | tree)
checker=$(container | from golang | with-default-args go build ./cmd/dagger)
ws=$(github.com/shykes/melvin/workspace --start=$start --checker=$checker)
# Does the start workspace build?
$ws | check
# Let's make a stupid change and check again
$ws | write cmd/dagger/foo.go 'package main typo typo typo' | check
still looking into the shell thing but might have to timebox soon. the issue is similar to before: the field shows up with native graphql introspection, but it's not present in the fields for Llm listed under currentTypeDefs, so shell doesn't see it
That's what's surprising to me, shouldn't the existing middleware logic deal with that
cc @steep onyx any chance we could bother you with this real quick?
We have another "I see it in the graphql introspection, but not in the dagger introspection" problem
The interesting part is that we have a core type (LLM) that gets extended with new fields that refer to module types. I don't think we've ever had that dependency arrow direction before (core -> modules). I'm looking for places where we treat the core module as a "leaf" dependency, but from what I've found it should still end up installing to the same *dagql.Server that the other module dependencies install to. And yet, I'm seeing in the currentTypeDefs path for the core module that it doesn't see the fields via GraphQL native introspection. Despite them showing up from the outside.
Maybe we could pretend Llm comes from a module somehow? Would that help?
@shrewd ermine do you want to try the github submodule? Otherwise I'll give it a go in about 2h (after my board meeting 😛 )
still working on another demo at the moment (recording due Thursday) but I'm eager to check it out 🙂
New bug just dropped...
⋈ llm | with-directory $(git https://github.com/dagger/dagger | head | tree | with-new-file hi.txt "Hello redpoint team") | with-prompt "read the contents of ./hi.txt. Who is the message addressed to?" | with-prompt "write a new file to hi-back.txt with a response from the Redpoint team, saying, wow this is so amazing." | history
🧑 💬read the contents of ./hi.txt. Who is the message addressed to?
🤖 💻 file({"path":"./hi.txt"})
💻 xxh3:b00afab666cda867
🤖 💻 Filecontents({"id":"xxh3:b00afab666cda867"})
💻 "Hello redpoint team"
🤖 💬The message is addressed to the "redpoint team."
🧑 💬write a new file to hi-back.txt with a response from the Redpoint team, saying, wow this is so amazing.
🤖 💻 withNewFile({"path":"hi-back.txt","contents":"Wow, this is so amazing. - The Redpoint Team"})
💻 error calling tool: toSelectable: unknown type "DirectoryID"
Never saw this error before... Maybe today's fixes introduced it as a regression?
I just hit the same error
This one is fresh, probably introduced by a commit in the last 24h
yep. same error
https://v3.dagger.cloud/dagger/traces/f80feebf3bd856bcb0edf5780fe3db52?listen=efb33bacb39404eb&listen=c4276a3a2b6f9101
no commits on dagger/dagger seem to apply. In shykes/melvin you mean?
can i see the exact timestamp of a span in cloud?
I think it might be coming from "case 3" https://github.com/shykes/dagger/blob/llm/core/bbi/flat/flat.go#L255
time="2025-02-05T03:38:51Z" level=debug msg="Loading tool from field" field=id type=Directory
time="2025-02-05T03:38:51Z" level=debug msg="Checking if type is an object" kind=SCALAR typeName=DirectoryID
time="2025-02-05T03:38:51Z" level=debug msg="Field returns non-object type. Tool will return its value" field=id type=Directory
got a trace id?
Just don't have a timestamp on the error in the client to correlate to the engine logs
Looks like adding debug to the query string will do it. As in
https://v3.dagger.cloud/my-actions-org/traces/e8697a2b28f926da6e3c72a582f754d6?span=7954dc5a92b8a8ac&logs#7954dc5a92b8a8ac:L1295&debug
https://gist.github.com/jpadams/c372b825ed9cf41007d979a19774c10d <- Logs from engine side from Dagger Cloud
what I mean is on my 2nd cloud link I don't have a timestamp for toSelectable: unknown type "DirectoryID". No problem with timestamps on the engine side
@smoky ocean pushed some incremental progress towards getting modules working. now it works with -m but it still doesn't pick up modules installed interactively
llm | with-apko $(github.com/vito/daggerverse/apko) | ask "create a container with cowsay"
https://asciinema.org/a/xCIkrZJHzw5BXMWIZ793cyiLV
closed I got was surrounding span
https://v3.dagger.cloud/my-actions-org/traces/23c1103d13150197b2e117375b0e1fa9?listen=f2e8d3dee5b5043d&debug
{
"Version": 6,
"Final": true,
"ID": "ec509fb26a68ef32",
"Name": "Llm.withPrompt",
"StartTime": "2025-02-05T03:36:46.301423387Z",
"EndTime": "2025-02-05T03:40:06.32252127Z",
"Activity": {
"CompletedIntervals": [
{
"Start": "2025-02-05T03:36:46.301423387Z",
"End": "2025-02-05T03:40:06.32252127Z"
}
],
"EarliestRunning": "0001-01-01T00:00:00Z"
},
"ParentID": "f4772ea35cbed054",
"Status": {
"Code": 2,
"Description": ""
},
"CachedReason_": [
"span has children"
],
"PendingReason_": [
"span has completed"
],
"CallDigest": "xxh3:89846ea5b17eb54c",
"CallPayload": "ChV4eGgzOjM1ZmYxYjI3ZWJiMGUzZTMSBwoDTGxtGAEaCndpdGhQcm9tcHQiggEKBnByb21wdBJ4OnZ3cml0ZSBhIG5ldyBmaWxlIHRvIGhpLWJhY2sudHh0IHdpdGggYSByZXNwb25zZSBmcm9tIHRoZSBhZGRyZXNzZWUgb2YgdGhlIGZpcnN0IG1lc3NhZ2Ugc2F5aW5nIHdvdyB0aGlzIGlzIHNvIGFtYXppbmcuShV4eGgzOjg5ODQ2ZWE1YjE3ZWI1NGM=",
"ChildCount": 5
}
this looks like what i want
"Name": "💻 error calling tool: toSelectable: unknown type \"DirectoryID\"",
"StartTime": "2025-02-05T03:38:51.718863334Z",
"EndTime": "2025-02-05T03:38:51.718892376Z",
yep, just got into that 👆
no sure how I got there 😂
so it does line up with these logs
nice! I can work with that 🙂
yup that's bug 6
added a .refresh command, feel free to rename
alternatively we could refresh whenever a new module is served, but that seems like a bigger lift
Weirdly, "bug 6" makes it impossible for the llm to modify a container or directory state, but it works fine with a module type. (eg. a melvin/workspace)
I got melvin to complete its first coding task 🙂
@spring wave wanna try?
a glimpse of the future, made possible by your dagql acrobatics
yes please
is this bug 6?
yeah
🍿
#!/usr/bin/env dagger-llm shell -m github.com/shykes/melvin/workspace
# Starting point for the workspace
source=$(git https://github.com/dagger/dagger | tag v0.15.0 | tree)
# Checker container for the workspace. Here we try building the dagger CLI
checker=$(container | from golang | with-mounted-cache /go/pkg/mod $(cache-volume gomodcache) | with-default-args -- go build ./cmd/dagger)
# Setup workspace
ws=$(. --start $source --checker $checker)
# Run the agent!
agent=$(llm | with-workspace $ws | with-prompt "create a new go CLI at cmd/hello that just says hello. It should take an optional flag to say 'bonjour' in french instead. Use the 'check' tool to make sure the build is not broken")
# Get the result
result=$(agent | workspace)
# Print the diff
$result | diff
# Inspect interactively
$result | dir | terminal
source=$(...
need was trying to run line by line...start?
Sorry should have checked that it runs first, but got over-excited 🙂
OK here's a one-liner version to run straight from your dagger-llm shell:
_EXPERIMENTAL_DAGGER_RUNNER_HOST=tcp://localhost:1234 dagger-llm shell -m github.com/shykes/melvin/workspace <<'EOF'
llm |
with-workspace $(github.com/shykes/melvin/workspace --start $(git https://github.com/dagger/dagger | tag v0.15.0 | tree) --checker $(container | from golang | with-mounted-cache /go/pkg/mod $(cache-volume gomodcache) | with-default-args -- go build ./cmd/hello)
) |
with-prompt "write a new go CLI at cmd/hello that just says hello. It should take an optional flag to say 'bonjour' in french instead. Use the 'check' tool to make sure the build is not broken" |
workspace |
dir |
terminal
EOF
this one should work @bronze fern . Not as readable but..
yep! working for me 🚀 when I enter the terminal session at the end, I'm in the "workspace" container?
oh, terminal from a directory?
yeah
just interactively looking at the result
you can also replace | dir | terminal with | diff to see the changes
I wanted to end up with the go program in a golang container 🙂
to run myself. Not fair that the checker got to 😉
Yeah 🙂 Same thought. I went for a very small piece that does as little as possible - just a sandbox for the LLM to write code, and get a green/red for its loop
What's nice is that it's agnostic - you can use it to write a frontend, or even docs
You can even have a checker that actually calls another llm and asks "does this look legit to you?"
(that would require making the checker a dagger interface rather than a container though)
totally! Gues I can foo=$(llm...dir) and put that in a container
@smoky ocean pushed a couple UX things, feel free to rm
- spans for API calls that return a scalar now print the value to the span's logs, so now you see this instead of just nothing
- removed the silly string truncation, don't think we need it anymore now that we don't routinely pass giant schema inspection JSON strings around
bed time for me 👋
_EXPERIMENTAL_DAGGER_RUNNER_HOST=tcp://localhost:1234 ~/bin/dagger-llm shell -m github.com/shykes/melvin/workspace <<'EOF'
agentwork=$(llm |
with-workspace $(github.com/shykes/melvin/workspace --start $(git https://github.com/dagger/dagger | tag v0.15.0 | tree) --checker $(container | from golang | with-mounted-cache /go/pkg/mod $(cache-volume gomodcache) | with-default-args -- go build ./cmd/hello)
) |
with-prompt "write a new go CLI at cmd/hello that just says hello. It should take an optional flag to say 'bonjour' in french instead. Use the 'check' tool to make sure the build is not broken" |
workspace)
container | from golang | with-mounted-directory /app $($agentwork | dir) | with-workdir /app | terminal --cmd bash
EOF
works for me @smoky ocean 👆😁
not sure why it says "no checker configured"
typo in my script (new lines..) try running my updated version
gonna make this code, like a demo module, will be nicer
but frustrating that I can't return the workspace from my demo module
Super cool. Changed my $foo above to $agentwork since I'm mounting some agent-produced work in my image after all 🙂
noticed I can't use a variable name like agent-promised-work with hyphens. Seems to be interpreted as a module name or function name.
_EXPERIMENTAL_DAGGER_RUNNER_HOST=tcp://localhost:1234 ~/bin/dagger-llm shell -m github.com/shykes/melvin/workspace <<'EOF'
# Agent does some work with LLM and tools in its workspace, including an automated qa check
agentwork=$(llm |
with-workspace $(github.com/shykes/melvin/workspace --start $(git https://github.com/dagger/dagger | tag v0.15.0 | tree) --checker $(container | from golang | with-mounted-cache /go/pkg/mod $(cache-volume gomodcache) | with-default-args -- go build ./cmd/hello)
) |
with-prompt "write a new go CLI at cmd/hello that just says hello. It should take an optional flag to say 'bonjour' in french instead. Use the 'check' tool to make sure the build is not broken" |
workspace)
# Human does a manual spot check
container | from golang | with-mounted-directory /app $($agentwork | dir) | with-workdir /app | terminal --cmd bash
EOF
that's probably a bash parser limitation
I pushed a code version. But it's not working, I think it's hitting what's left of bug 5 (module loading)
llm # workaround to global llm config bug
github.com/shykes/melvin/demo |
go-programmer "a terminal snake game with ncurses-like interface and a basic AI opponent. Heavy use of ascii art effects. make it psychedelic." |
terminal
Error: input: demo.goProgrammer load: Call: Query has no such field: "workspace"
Updating my list of platform gaps for agent dev:
- Object persistence
- Generated clients
- Self-calls
- Type re-exporting (module A can export the types of module B)
- Type embedding
I wonder if this could be daggerized? https://news.ycombinator.com/item?id=42935659
Noice!Does anyone have a good recommendation for a local dev setup that does something similar with available tools? Ie incorporates a bunch of PDFs (~10,000 pages of datasheets) and other docs, as well as a curl style importer?Trying to wean myself off the next tech molochs, ideally with local functionality similar to OpenAIs Search + Reason, a...
@smoky ocean just a heads up, the change I made to log the response from scalar API calls leaks secrets 😅 hopefully I can just re-use the censoring code
well that solves the issue of sharing my openai pro subscription with you 🙂
the censoring code is a bit intertwined with Buildkit atm, thinking I can get away with just marking .plaintext sensitive internally, which has some precedent already (setSecret's plaintext arg) - trivial to implement, will push soon
pushed
taking a look at bug 6 unless there's a higher priority
The toSelectable one?
yep
This is the other one @spring wave . Continuation of bug 5 I guess. Module A imports module B, and calls dag.Llm().WithBType(). That fails
Oh also I found a bug 7 🙂 But can deal with it
I keep getting an error with the ".env"
Error: input: llmapp.foo select: failed to read secret file ".env": open .env: no such file or directory
I tried several locations but same error every time. Isn't supposed to look for the work dir from where you run the cli?
@storm gate I think that's bug 7. Are you calling llm from a module or from the CLI?
from a module, I want to experiment from the Go API directly
(so I created a dumb go mod)
Yeah that's bug 7. Should be an easy fix.
Workaround: call llm without argument in the CLI (call or shell) before calling your module
You can see an example of the workaround in this snippet 👆
It's because I store the llm creds as a global variable engine-wide (or session-wide? 🤔), so even modules can use it. But you have to instantiate Llm once to trigger the loading of .env. Whoever loads it first, everyone gets to re-use it. The bug is that, if it's a module who loads it first, then the engine looks for .env in the module's runtime.
@storm gate did the workaround fix it?
Yep, calling llm directly before anything else works, thanks! 👍
t'was this commit what broke it: https://github.com/shykes/dagger/commit/eb5cab452a0761d1e558a41931adfc465bbd40c2
I think we just want to make the sync call and then discard the result. The problem is sync actually returns a FooID, not Foo, and it's trying to treat that FooID into an object, but it's a scalar
also what's that View: v0.13.2 doing there?
cc @smoky ocean
Ran this with llama3.3 locally
root@ju9ikntv2jqee:/app# ls cmd/hello
main.go main_test.go
root@ju9ikntv2jqee:/app# cat cmd/hello/main.go
package main
import (
"flag"
)
func main() {
language := flag.String("lang", "en", "Language")
flag.Parse()
if *language == "fr" {
print("Bonjour\n")
} else {
print("Hello\n")
}
}root@ju9ikntv2jqee:/app# go run ./cmd/hello
Hello
root@ju9ikntv2jqee:/app# go run ./cmd/hello --lang fr
Bonjour
very cool
pushed fix for bug 6
❯ dagger-dev shell --no-mod
Dagger interactive shell. Type ".help" for more information. Press Ctrl+D to exit.
⋈ llm | with-directory $(git https://github.com/dagger/dagger | head | tree | with-new-file hi.txt "Hello redpoint team") | with-prompt "read the contents of ./hi.txt. Who is the message addressed to?" | with-prompt "write a new file to hi-back.txt with a response from the Redpoint
team, saying, wow this is so amazing!" | history
🧑 💬read the contents of ./hi.txt. Who is the message addressed to?
🤖 💻 file({"path":"./hi.txt"})
💻 xxh3:b00afab666cda867
🤖 💻 Filecontents({"id":"xxh3:b00afab666cda867"})
💻 "Hello redpoint team"
🤖 💬The message is addressed to the "redpoint team."
🧑 💬write a new file to hi-back.txt with a response from the Redpoint team, saying, wow this is so amazing!
🤖 💻 withNewFile({"path":"hi-back.txt","contents":"Wow, this is so amazing! - The Redpoint Team"})
💻 ok
🤖 💬A new file named "hi-back.txt" has been created with the response from the Redpoint team.
⋈
fixes the disappearing withExec bug
oh. i may have just broken that then. assumed it was AI slop 😅
Probably 😛
nice! thank you!
is there a repro for the withExec bug?
ha ha awesome 🙂 what does the .env look like for that?
LLM_HOST=wompbox.turkey-beta.ts.net
LLM_PATH=/v1/
LLM_MODEL="llama3.3"
LLM_KEY=ollama
turkey-beta is my tailnet
Is the /v1 part specific ot ollama? or to llama3.3? or to the openai API?
It's the ollama route for the openai compat api. It has the ollama endpoints at /
so it's v1 of the ollama api?
I'm asking because eventually we could have something like LLM_ENDPOINT=ollama://wompbox.turkey-beta.ts.net/llama3.3 or somethign like that 🙂
I don't know the context on why it's called v1. But v1 is compatible with openai. It has similar but slightly different endpoints at / that ollama-native libraries use
but was wondering if it should be ollama:// or llama3.3:// or maybe just http:// is enough
Ah got it. Since ollama can serve any number of models, I'm not sure where that should go. The /v1/ always has to get added somehow, though
ollama:// could work because it'll let us know to add /v1/ and other oddities we may discover. The model you might want configurable per-request still, even with openai or claude
Pulled the latest version of the dagger/llm branch and running llm first still gives me Error: input: llm select: failed to read secret file ".env": open .env: no such file or directory
did I miss another setup pre-req?
yeah you'll need a .env file in the repo root, containing your OpenAI API key
there's a "Setup" section here that covers it: https://github.com/dagger/dagger/pull/9504
thanks, was missing the M I was supposed TFR 😄
@steep onyx you also need to call llm from the CLI first, before letting a module call dag.Llm() (bug 7)
Yeah I was just missing the setup of .env in the first place, works now when calling llm. Or at least I get to the current error Error: input: demo.goProgrammer load: Call: Query has no such field: "workspace" now 🙂
Welcome to the frontier 🙂
for anyone else wanting to use thier own hardware with ollama, here's my setup
OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve
and in another terminal
tailscale serve 11434
which lets me access ollama over tailscale with https
and then in another terminal you can pre-pull some models you might want to try, like
ollama pull llama3.3
Taking a first stab at the melvin/github module follow the workspace pattern
Maybe I should try Gerhard's notification module, and get a bunch of comm channels all at once?
oooooh https://daggerverse.dev/mod/github.com/aluzzardi/daggerverse/github-comment @spark phoenix 🙂
@spark phoenix can I use that module to do the neat "edit comment in place" thing you did in your demo last year?
looking at the source code, it looks like it 💪
@smoky ocean @spring wave might have figured it out, at least I broke through the previous error and am now hitting:
Error: input: demo.goProgrammer Post "https://api.openai.com/v1/chat/completions": net/http: invalid header field value for "Authorization"
Dunno if I misconfigured something in my api key or what.
Either way, the diff that seems to fix it is:
sipsma@dagger_dev:~/repo/github.com/sipsma/dagger$ git diff
diff --git a/core/llm.go b/core/llm.go
index b24b18dc2..4bbb447b6 100644
--- a/core/llm.go
+++ b/core/llm.go
@@ -354,8 +354,8 @@ func (llm *Llm) messages() ([]openAIMessage, error) {
return messages, nil
}
-func (llm *Llm) WithState(ctx context.Context, objId dagql.IDType) (*Llm, error) {
- obj, err := llm.srv.Load(ctx, objId.ID())
+func (llm *Llm) WithState(ctx context.Context, objId dagql.IDType, srv *dagql.Server) (*Llm, error) {
+ obj, err := srv.Load(ctx, objId.ID())
if err != nil {
return nil, err
}
@@ -504,7 +504,7 @@ func (s LlmMiddleware) extendLlmType(targetType dagql.ObjectType) error {
func(ctx context.Context, self dagql.Object, args map[string]dagql.Input) (dagql.Typed, error) {
llm := self.(dagql.Instance[*Llm]).Self
id := args["value"].(dagql.IDType)
- return llm.WithState(ctx, id)
+ return llm.WithState(ctx, id, s.Server)
},
nil,
)
Basically, we were storing a dagql.Server in the object instances for Llm, but I think we probably were hitting dagql cache and thus ending up using a server from a totally different client where workspace had never been stitched into the schema. So passing the server around explicitly instead seems to fix it by ensuring we are using the one where workspace had been installed.
I wouldn't be surprised if there's some more errors lurking after the Authorization one but my hypothesis is that if there are we could work around them with some well-placed .Sync(ctx) calls in the module code. TBD.
There's a similar problem with for some other parts of the impl that were using the dagql.Server in the object, so I'll patch those up quick too and push this to the llm branch
Ah yeah that sounds very similar to what I ran into with CoreMod
Oh actually I wonder if I'm getting those auth errors because I'm sourcing the key from a file w/ a \n at the end..
EDIT: no, doesn't seem like that fixed it
Yeah I guess the caching makes it very tricky to store any state like that; I've wondered in the past if we'd ever end up getting bitten by storing *Query everywhere for similar reasons (don't think we have yet at least)
I don't think the CoreMod case was caching related specifically, it was more like having two different sources of truth: the "pristine" Core schema installed into CoreMod.Dag vs. the "live/unified" schema that currentTypeDefs was trying to introspect. Didn't matter until now because the core schema never changed at runtime. But, yeah, it was another weird consequence of storing a dagql.Server on a long-lived object
@steep onyx for that auth error, are you able to get a working query with just CLI core types? That would isolate whether it's linked to modules
Oh wait, okay there's multiple things:
- The
\nwas indeed a problem when sourcing from a file - For some reason the values read from
.envseems to be extremely cached in that they never update even across separatedagger shellsessions until I restart the engine
But now it works.
Next error was: process "go build ./..." did not complete successfully: exit code: 1
I had started with --interactive though and could see it failed to build just because there was only a main.go and no go.mod.
After manually running go mod init, the new error when trying to build is:
./main.go:143:16: invalid operation: cannot receive from non-channel termbox.PollEvent() (value of type termbox.Event)
Which I think just means the LLM generated go code that doesn't compile? So success 🎉 ? Sorta?
Pushed the fixes to https://github.com/dagger/dagger/pull/9504
Re: value of .env, the way I do it is... hacky.
I have a single global variable in the engine code. The first call to Llm (from any client, including module runtimes) triggers a callback to fetch file://.env (via new secrets provider code). Then it's persisted in the global variable for all other clients.
I do this to allow modules to "break out" and use the host's .env. But as a result yeah, it's engine-wide..
This 👆 is also why you have to call llm from the client, to force hydration of the .env from the host, before a module gets to call llm first. Otherwise, the module "wins" and my code tries to hydrate llm config from file://.env in the module runtme container
I tried to solve that problem by auto-hydrating, but actually didn't find a good place in the codebase to do it...
./main.go:143:16: invalid operation: cannot receive from non-channel termbox.PollEvent() (value of type termbox.Event)
I have never seen this error before in my life
It's from the code I assume was generated by the LLM, so I think it just did a bad job generating it? Unless I was missing something
Oh! I thought it was from our code 🙂
Normally it should continue in a loop until the code builds
Yeah I was in an interactive debug container, so I guess the loop didn't kick in
Pulling fixes now 🙂 Thank you!
If anyone's around, let me know if you want to see a cool demo 🙂
@shrewd ermine tune in, the show is about to begin 😛 https://github.com/kpenfound/dagger-modules/issues/8
@shrewd ermine more fun if you have the page open to see live updates 🙂
@shrewd ermine what's your favorite movie?
Haha, I'm literally in the process of opening PRs on your x repo so it's all fair game
probably Interstellar
(engine build... can't wait for those caching improvements)
of course out of nowhere the build is 10x slower
still building...
finally
OK @shrewd ermine I'm going to run this:
llm |
with-github-progress-report $(
new-progress-report interstellar GH_TOKEN kpenfound dagger-modules 8
) |
with-prompt "You are the hero of the movie Interstellar. Retrace the whole story of the movie. As you go through it, share your journey with us in your progress report. Also keep us updated on the various tasks you accomplish throughout your journey" |
history
(quick reset as I configure my github token)
Ha ha that was terse... Let's try again
llm |
with-github-progress-report $(
new-progress-report interstellar GH_TOKEN kpenfound dagger-modules 8
) |
with-prompt "You are the hero of the movie Interstellar. Retrace your whole journey, and send us updates as you experience it. Write the summary in movie script style. Also keep track of your tasks throghout the adventure." |
history
@shrewd ermine now I'm starting a parallel pipeline for a different movie (eg. different progress udpate) on the same issue.
this is awesome 😮
hold on, re-building the engine... I think there's a new bug, it seems that every completed session causes the engine to be unreachable...
(and by "re-building" I mean "re-running, but with a 120 second rebuild overhead")
Ok I'm preparing the double movie report, setting it up to record
Ok I got you back now
I'm happy that it's so re-entrant, very compatible with Dagger's signature rapid dev loop
https://github.com/shykes/x/pull/8
. | with-github-token env:GITHUB_TOKEN | create github.com/shykes/x "add_readme" --fork-name "shykes-x" | with-changes $(.core | directory | with-new-file "README.md" "This is a repo of experimental Dagger modules") | pull-request "Add Readme" "This adds a basic readme"
(thanks to @spark phoenix 's marker trick, I'm using his module under the hood)
Ok but will you dare plug a brain into it? 🙂
Seems easy enough! can either do what we had working earlier and save the dir to a var, put that to the with-changes, or teach an llm to use the feature-branch module
yeah actually that part wouldn't be LLM-connected I guess
could use some better prompting, but here's what I got
https://github.com/shykes/x/pull/9
export GITHUB_TOKEN=$(gh auth token)
_EXPERIMENTAL_DAGGER_RUNNER_HOST=tcp://localhost:1234 ~/bin/dagger-llm shell -m github.com/shykes/melvin/workspace <<'EOF'
# LLM work
agentwork=$(llm |
with-workspace $(github.com/shykes/melvin/workspace --start $(git https://github.com/shykes/x | head | tree)) | with-prompt "look around the repository and summarize its purpose in /README.md" | workspace)
# PR
github.com/kpenfound/feature-branch | with-github-token env:GITHUB_TOKEN | create github.com/shykes/x "add_readme_llm" --fork-name "shykes-x" | with-changes $($agentwork | dir | without-directory .git) | pull-request "Add Readme" "This adds a basic readme"
EOF
So in theory another agent can be driving this workflow instead. 1) read issue (not included, 2) make changes (agentwork), 3) make PR (final step)
@shrewd ermine first attempt at wiring together workspace + github progress update + some prompting glue 🙂 https://github.com/shykes/x/issues/7#issuecomment-2639027714
Trace -> https://v3.dagger.cloud/dagger/traces/3a8d5039e7e57148ce2b21e5df976815?span=0a2e712b42f4810e
We're going to need Dagger interfaces... lots of them
also, multi-object (named variables idea we discussed @spring wave) is going to be a must
Could anyone check if they can reproduce my issue:
- Run llm engine
- Run a complete shell session against that engine
- Try running a second session -> can't connect to the remote port
- Engine looks like it's still running (no crash or obvious error message) but impossible to reach it
checking. Btw it looks like a binary was committed to melvin at github/github
oh no it's the curse of Devin
can't reproduce that issue
Just wanted to share with the community we can now contribute faster with native GitHub copilot agent support.
I think we will have more features faster now. Let's leverage it and enjoy.
https://github.blog/news-insights/product-news/github-copilot-the-agent-awakens/
@spring wave I'm moving up a layer to Melvin, trying to build something cool on the primitives we have.
I was wondering... How do you feel about trying to implement multi-object? (ability to give the llm several objects with variable names). I'm 100% conviced it will be better. Without it, you find yourself doing more gluing together with traditional code than you really need to. Also that part of the Dagger DX is not great - eg. can't embed types, so if I have 2 modules and I want to give the same llm access to both, I need to create a wrapper type and basically copy-paste everything...
Of course the challenge is getting it to work 🙂 I have a reasonable idea of how it would work - but it's a BBI change. Not flat anymore... Closer to your graphql implementation I think.
wdyt?
Yeah makes sense to prioritize it imo. How does it work right now? If you call withFoo(foo) and then withBar(bar), does that have foo + bar in the context, or just bar?
Trying to judge if context is going from n=1 to n=T or n=(T * V) (T=types, V=vars of that type)
Just bar
there's a state dagl.Typed. It would have to become state map[string]dagql.Typed
I actually started down that route when refactoring from agent to llm branch. Then realized "this is a significant BBI change, it's doable but one thing at a time"
@spring wave you can see some remnants of that scaffolding being removed in ec217d68f9f6d0a263396cd4b52b78217ee9e069, eg. https://github.com/shykes/dagger/commit/ec217d68f9f6d0a263396cd4b52b78217ee9e069#diff-73f0094219b4f5510b335d7f8ca08d679fb0c4b2014c3dbdcaafe6d84bd3f38fL375-L381
To break down the big problem into smaller problems, we could hack multi-object into the current flat BBI. We could inject special "meta-tools" prefixed with _ or something, always available, and would allow the model to list variables, select a variable. set a variable, etc. The key is the concept of "select", then the rest of the flat BBI applies to the currently variable. Could actually perform quite well
If you're already heading down the gql route, let me know, I can try this 👆 in parallel (but probably not today)
@spark phoenix one gap that I found, is that using Dagger interfaces as callbacks would be very valuable... That checker trick in workspace works super well, but it only works for running a raw container... Without interfaces I can't eg. have the agent's innermost devloop send updates as it encounters issues for example.
Another task if anyone is looking to help: --> actually there's something simpler we can doLlm.WithPromptTemplate(prompt string, values ... with builtin support for mustache/handlebars templating, would be nice. (Not sure what type we could use for values)
Also WithPromptFile which would take a dagger.File instead of a string. That would allow me to move prompts into separate text files and grab them as contextual arguments... Would make the code way cleaner (half of my code is embedded strings atm).
Feature request: add support for Claude Sonnet3.5, and perhaps also Gemini?
done
The DX with WithPromptFile + optional vars for templating is really nice
$ cat prompts/historian.txt
You are a historian. Given a period or historical event, provide a paragraph summarizing what you know about it; then bullet points about key facts and events.
The topic is: $topic
func (h Historian) Explain(
ctx context.Context,
// The topic to explain
topic string
// +optional
// +defaultPath="prompts/historian.txt"
prompt *dagger.File,
) (string, error) {
return dag.
Llm().
WithPromptFile(prompt, dagger.LlmWithPromptFileOpts{Vars:[]string{"topic", topic}).
LastReply(ctx)
}
No more embedded strings for me 🙂
@shrewd ermine
. GH_TOKEN github.com/kpenfound/dagger-modules 8 | go-programmer --start $(git https://github.com/dagger/dagger | head | tree) "Improve the core/ and core/schema/ with a top-level file() function that returns an empty File, similar to how directory() returns an empty Directory. Calling file() (in graphql/dagql: '{ file(name: "foo", contents: "bar") { ... } }' is a convenience wrapper for '{ directory { with-new-file(path:"foo", contents: "bar") { file(path: "foo")} { ... } } }'. Look at core/file.go core/directory.go core/schema/file.go core/schema/directory.go. Depending on how the code is laid out, you may need to also look at core/query.go and core/schema/query.go. Please be careful to keep the change clean and concise. Necessary changes only"
Can't wait for this:
llm --model openai://gpt=-4o
llm --model google://gemini2
llm --model anthropic://claude3.5-sonnet
llm --model ollama://llama3.3
etc. etc.
Just abstract away the minutia of setitng up 1) hostname 2) path 3) auth 4) creds. Also I heard some providers have you select the model in the query, and others in the http header.
@shrewd ermine FYI I just pushed some cleanup I hadn't committed
Early support for Claude Sonnet as well as a LLMProvider abstraction to support others: https://github.com/shykes/dagger/pull/297 - I made it as a draft PR on @smoky ocean's fork because I don't want to break the branch. Claude Sonnet seems to work fine but I have issues with gpt4o, not sure if I intro'd a regression, I'll spend more time debugging.
Thank you!
Btw @storm gate there is a feature request from @shrewd ermine to make the model configurable as an argument to llm() (so that you can mix and match different models in the session). I haven't started yet, so if you want to do it, it's all yours. Otherwise, just keep in mind that it's coming, in case that changes how you want to do your PR (to avoid conflicts I mean)
Yeah I thought of that as well, but there is still the bug of the session being sticky and global to the engine…
But you can pass it as an argument to core.NewLlm. Then the bug doesn't affect you. It only affects the global variable (which would be overridden by the argument)
So it will only be the default llm settings that are global to the engine
Actually the Llm type already has the fields to persist its own copy of Model, Host, Path, Token, so if you add arguments to the type constructor, and fallback to using the corresponding field from the global variable as a default, it should be all you need.
The only reason we need that global variable really (and have to deal with the stickiness problems) is for the token, so that modules don't have to provide a token. But for model config it's fine I think
(until a random module decides to make 100 requests to o1 using my account... but we can worry about that later 😛 )
Yeah makes sense, also it’ll allow to pass the config directly in code instead of relying on the .env
@shrewd ermine FYI I stopped passing prompts as contextual dir, it makes the examples look more complicated. I used dag.CurrentModule().Source() instead
Sounds good!
@shrewd ermine I'm trying to decide how to divide up the different github-related modules
Was wondering the same myself. I'm going to push a new one tonight which is just a webhook listener. Seems like a good unit to have it's own modules. But I felt bad adding the issue fluff to the demo one
- We embed our own github client to get the issue contents, but then call
aluzzardi/daggerverse/github-commentfor the in-place editing, which uses its own client lib - My progress update logic is not super github-specific, so ideally I would move it into its own module. But in practice it only works with github today
@shrewd ermine can you explain the patch to github/ that loads the contents of the issue at initialization? What does that do?
Yeah since the github-comment module is write-only we didn't have a mechanism to read/store the issue title and body which I wanted to use that as the assignment if one wasn't provided. Not the cleanest
I should just split it out to a read-only issue module
I think I'll move the progress report stuff to its own progress module, and keep it write-only (since it's a progress report) seems easier to understand
in practice it will be github-only (for now) but can be expanded later to support eg. discord, etc
Yeah that abstraction makes sense to me
I can tell we're going to need interfaces soon... not looking forward to that
Layers on layers
@shrewd ermine I'm adding a reviewer agent to give extra feedback to the coder agent, beyond "it builds". So far implementation is 6 lines 🙂
@spring wave not demo-relevant, but just to confirm: I think multi-object, and in particular typed multi-object, will be a game changer. Especially for making it super easy to get "callbacks" from the model.
For example right now I'm slapping together a code-reviewer agent, to quickly tell me "hey does this seem good to you?" so I can wrap the "get it to build" devloop into another "get it to pass review loop" (and those will be actual for-loops, how cool is that).
BUT there is the question of the for-loop condition. I can easily get a long bla-bla-bla from the reviewer (last-reply. But in this case I need a boolean (merge / no-merge) or perhaps an int (review score 0-10) so my code can decide when to stop. So I'm missing a low-friction way to pass typed information back and forth. I can't use the state because there's only one slot at the moment and it's taken by the directory to review. So this is where having a 2nd typed object which can receive the score would be perfect.
And one more thing: in simple cases like a boolean or integer score, it will be extra convenient I am guessing, if we can attach scalars as variables. So I would just go:
score := dag.Llm().
WithInteger("score", "Review score. 0/10 means unacceptable; 10/10 means perfect", -1).
WithDirectory("code", "The code to review", source).
WithPrompt("review the source code and set its score from 0 to 10").
Integer("score")
The scalar thing is kind of a bonus. We can start with custom objects and then continue from there. I'm just trying to imagine the most lean possible DX, since simplicity seems to be the selling point here
@shrewd ermine @spring wave live test upcoming at https://github.com/shykes/melvin/issues/2
damnit, it's getting stuck on missing go.sum and trying to manually edit it... need to give it a way to run shell commands
Ha ha it worked 🙂
oh nice! go 1.16, a fine vintage
OK I found a way to punt on shell commands (my checker just runs go mod tidy && go build ./...)
Gotta try giving it something harder
What was the prompt for that out of curiousity
The reviewer is too easy to sway. Trying again 🙂
"The code is just a placeholder with a giant FIXME. 8/10"
Yeah this is why I thought I was seeing prompt entropy with the reporter. I think it's just down to prompt crafting though. It will take that first part of the prompt and make sure it solves that, assuming the rest is bonus points or something
coder agent reads its review score: 4. Coder agent is sad...
now at 6/10!
(just pushed everything)
Hello newcomers! 👋 This is where the magic happens 🙂 The whole Dagger team is nearby, don't be shy, we're happy to answer questions ansd help you get setup
@spring wave is there any way to make the LLM traces look awesome in the experimental cloud, without losing the cool emojis in stable cloud v3?
yeah you can add them back to the span names
Ah cool 🙂 Are they still treated as regular custom spans?
they're spans with their message as logs isntead of the span name (so they can stream)
which i think is worth keeping
Ie. I saw now there's a span that says "LLM response" and you have to expand it to see its "logs". So is it OK if I just change that span name to the actual response? or weird?
Ah
Ah for the streaming
Yeah that makes sense
i think it's weird - and poor UX, since you have to wait for the entire response to be done
I did notice some streaming in the TUI... thought I was hallucinating at first 🙂 Very very cool
so if you just make it '<emoji> response' and '<emoji> prompt' or something that's probably a good balance
Also FYI the TUI seems a little more prone to locking, and display glitches
(this was on last morning's version though)
are you on latest llm? i pushed a bug that was live for a bit
it was requesting the terminal's background color on every paint for the markdown logs 😅
yea try just pulling and rebuilding the CLI (don't need to rebuild the entire engine - go build -o ./bin ./cmd/dagger should do fine)
Nice, will try that now
i'm also trying and get the new bubbletea shell in a good enough state for the demo, i think i'm close. will ping when it's ready to try out
it's already pretty usable, has autocomplete and all that, just want to make some changes to the UI
that's on vito/llm-shell if you want to try building muscle memory early
Can you remove the builtins .foo from the autocomplete while you're at it? 😛
from what complete scenario?
Oh nice I just remembered - that's for seeing the traces "lingering" in the TUI? 🤯
yup
Forget it, it's nothing - don't want to distract you from the magic tracing stuff 🙂
lol k
Something else we learned / confirmed yesterday:
-
Controlling model name from the function is non-negotiable. Models are not interchangeable, they are an application concern. The same codebase will orchestrate different models for different tasks. We can't abstract that away. Cost; capabilities; performance; and even prompts; none of those things are portable across models.
-
As a convenience, we can choose a reasonable default; and also expose generic families of model, to give developers the choice of making their function more portable. Then it becomes a SWE decision: up to you how portable vs. optimized you want your code to be.
-
Corollary of the above: we may need to expose discovery of available models (ie. those that have a valid configuration in the current engine).
-
Routing from model name to llm endpoint (host + path + creds + raw client) can be hidden from the function code, and left to the operator to configure. The result may be something between secrets providers, and docker config (different creds for different registries)
@spring wave let's move the oss coding stuff here 🙂
@spring wave looks like my issue was a fluke... Ctrl-a Ctrl-e works in zed also now
@gloomy kindle @compact swan is any of your current work (sdk config etc) relevant to a global client-side config? We might need that to clean up LLM config... Right now it's a .env in the current directory with a few flat vars... We're going to need something closer to a docker config, with multiple llm endpoints, each with their own config. How far off are we from having that kind of plumbing available?
@smoky ocean pushed Ctrl+C fix
HOLY 🤯💩🤯💩🤯💩🤯💩🤯💩 now I see it
streaming the spans and everything
erasing history now 🙂
actually tools is a bigger problem than history
@spring wave I don't know why, some things don't work as well in the default zed terminal. Ctrl-L doesn't clear, but in ghostty it does.
are you doing plain old cmd+v?
or you mean you can't select stuff to copy?
ah the second part is definitely true, maybe i just need to disable mouse input for this
can't select
another bug: calling a function without required arguments, doesn't print the error
pushed - i just disabled all mouse input across the TUI, which might be worth shipping anyway, it was only used for scrolling up/down
I'm getting the streaming but then it disappears at the end. Is that something I need to toggle?
yeah try alt+=/alt++ to bump verbosity from input mode. or hit <esc> to swap to navigation mode and just press =/+
(and <tab> or i to return to input mode)
fixed
one sorta clunky thing is with increased verbosity now you also see all the spans for the individual field selections
i'd love to tuck those under an internal span to hide them, but they're actually what drives the real evaluation
maybe i can change that query to just id or sync and then do those as a separate query 
Ok I got it to work by pressing alt-+ a random number of times until it showed it.
I think no visual indicator of verbosity level right?
only in non-input mode (<esc>)
✔ llm | with-prompt "your name is bob" | with-container $(container) | with-prompt "You have access to a container. install nodejs on it" | container
defaultArgs: []
entrypoint: []
mounts: []
platform: linux/amd64 default│ │ │ ┃ main.main()
│ │ │ ┃ /app/cmd/init/main.go:115 +0x906
│ │ │ ! process "apt-get install -y curl" did not complete successfully: exit code: 2
│ │ │ ✘ .sync: ContainerID! = xxh3:5f8c1553c7884d06 0.5s
│ │ │ ! process "apt-get install -y curl" did not complete successfully: exit code: 2
│ │
│ │ ✘ withExec(stdin: "", args: ["apt-get", "update"], insecureRootCapabilities: true, noInit: false, redirectStderr: "", redirectStdout: "", expand:
│ │ ! process "apt-get update" did not complete successfully: exit code: 2
│ │ │ ✘ Container@xxh3:6934f6e558023746.withExec(args: ["apt-get", "update"], expand: true, expect: SUCCESS, experimentalPrivilegedNesting: true, inse
│ │ │ ┃ panic: exec: "apt-get": executable file not found in $PATH
│ │ │ ┃
│ │ │ ┃ goroutine 1 [running]:
│ │ │ ┃ main.main()
│ │ │ ┃ /app/cmd/init/main.go:115 +0x906
│ │ │ ! process "apt-get update" did not complete successfully: exit code: 2
│ │ │ ✘ .sync: ContainerID! = xxh3:f74eb22c5bd89241 0.3s
│ │ │ ! process "apt-get update" did not complete successfully: exit code: 2
│ │
│ │🤖 It seems there is an issue running the apt-get update command, which might be due to a base image that doesn't use apt as its package
│ │ ┃ manager. Let's first check which package manager is available in this container and then proceed with the installation steps for Node.js
│ │ ┃ accordingly. Can you please provide more details or context about the container's base image?
│ $ .Container: Container! = xxh3:2fbfa0d0748ed4a0 0.0s CACHED
│
│ $ loadContainerFromID(
│ │ │ id: $ Container.from(address: "alpine"): Container! = xxh3:ff04b88d02461bd7 0.0s CACHED
│ │ ): Container! = xxh3:0621482c6c1c7b28 0.0s CACHED
. ⋈
nice. What's the minimum verbosity level for lingering spans?
ok!
could also do dagger shell -v
@spring wave sorry separate question. I'm trying to make a barebones implementation of Workspace, that's so simple I can live code that in a demo
Basically read, write, build
My issue is that `dag.Container().From("golang").WithExec([]string{"go", "build"}).Sync(ctx) will not return the full stderr in the error. So I can't just pass the error through to the model. It will only see "process exited with code 1 blablabla".
I have to manually set expect: Any, then check for exit code, get stderr, etc etc.
Am I holding it wrong? Is there a way to get the actual stderr in the error without my own glue code?
separately: calling a module function that returns an object, prints the ID
i'd expect something like this:
err := dag.Container().WithExec(...).Sync(ctx)
if err != nil {
var execErr *ExecError
if errors.As(err, &execErr) {
// ... access Stdout/Stderr/etc
}
}
the stdout/stderr was removed from the error message because it makes the UI insanely verbose and redundant
that's even worse IMO
that's standard go error handling?
(in the context of my demo - showing Go SDK-specific tricks etc)
Yeah I guess
It's just that the audience isn't a Go audience, so I'm pushing it by showing them Go at all
how about python 😛
but you're right, it's hard for LLMs to grok the full details when things fail, i think we need a way to feed them the telemetry report or something
i ran into that issue with the Claude MCP demo
it'd run a command, the command would print a clear failure that I could see, but Claude had to continue on blindly
Why is it redundant to add stderr to the error?
In the contexts of BBI, I don't have any special information about the behavior of withExec, so I would just expect it to return the most informative error possible. If a command was executed and failed, the contents of its stderr seems like the most informative content possible no?
because those errors get displayed in the UI for span errors, so you end up with gigantic swathes of red text, poorly formatted because it's split Stdout/Stderr
and there isn't a good heuristic for cleaning it up
like when you run a command in go and it fails, you don't get an error with its entire stdout/stderr
I wouldn't expect stdout to be there, just verbatim stderr, perhaps prefixed with "failed with exit code <n>: <stderr>"
yep, still too noisy
trust me 🙂 there just isn't a great end down that path, i tried for a long time
You mean for rendering? But isn't that just a rendering issue? "only render the first line of multi-line errors"?
i tried that
and that breaks things that intentionally return multi-line errors
like linters
Yeah I know you spent a lot of time on that rabbit hole... Just trying to replay the whole thing to understand the ramifications for agent dev specifically
I'm looking at your new LLM trace rendering and thinking - we're dealing with multi-line content already. You see a snippet, then you can manually click to expand. And it looks <chefs kiss> Couldn't we do the same for long errors?
Otherwise someone will have to special-case with-exec, forever. And also we'll have to tell everyone "never return multi-line errors, we don't like those"
maybe for the web UI, but the TUI would still be uglified
i would rather try replaying telemetry data or something
we already persist it engine-side, just need to work out the interface
i'll think about it more, but it was definitely a frustrating thing to try and fix and workaround in all of our UIs
Ok, I get the whole chain of reasoning, it's just going to be really tempting to just have a wrapper module with a MyExec that gets what I actually want which is stderr, and puts it in the error if the engine refuses to do it
Basically a human looking at the TUI has access to error information that my code (and by extension the LLM consuming it) can't
right
theoretically that's also true of anything that prints logs that you don't directly access via stdout in the API
like if you doa container.withExec("echo hi").withExec("echo bye").stdout the agent will only see bye
it can't learn from anything that might have gone wrong earlier
Yes, it's just that in the context of a command-line tool failing, the contents of stderr is not really logs, it's also an error message
I get that we're paying the price for the ambiguity of what stderr is for
i guess my point is the medium for a human is the full logs/telemetry, and if we solve it that way we'll get more benefits; when something fails it might be more than just the failed command that's relevant to understanding why it failed, so in the same way that a human might go back and grok the full context, the agents should too
I agree. But just like humans, llms should be able to get the bare minimum feedback directly. Imagine if the shell said "your command failed. I would tell you the error message but that would be incomplete context. For a better experience, ask your administrator for grafana or datadog access"
There's a long tail of issues where just getting a few lines from stderr is all you need to get "oh right I forgot the go.mod" and keep moving
Then for more complicated stuff, I love the idea of introspecting the telemetry. Love love it
@spring wave sorry to jump to a completely different topic (demo prep mode...)
It looks like at the moment, only a module's top-level type is available to attach to Llm, is that right? Or is there another issue with my code?
hmm should be all types, maybe you need to run a .refresh in your shell?
ah you might be right
Oooops sorry - it was prefixed by the module name doh
oh cool
maybe a good heuristic is returning only the last line of stderr? 
Side note: just ran into lack of self-calls. Need to make my demo more complicated by splitting it into 2 modules
I still think it's OK to have the whole thing, and have heuristics for rendering instead. But will shelve that for now 😛
@spring wave is it an easy fix to print errors returned from function calls? example:
sorry for bombarding you
Got some ETOOBIGs
side note: are you cranking up the verbosity? or maybe you have an old engine? those loadFOoFromIDs shouldn't be showing up
(i think it's the latter)
haven't fixed this in the TUI yet, should it just truncate?
Not sure. Let me re-run with your latest version and see
Quick notes:
- Streaming is a game changer
- Feels more verbose than I need it to, I would love something slightly less verbose, but with lingering
- Tool calls aren't clearly shown - but maybe that's because of the extra verbosity, ie. drowned in the noise a little
- In view mode (esc), it doesn't auto-follow
(example of verbosity: lots of visible IDs)
define "old" 🙂 Should be the latest branch as of yesterday morning
too old!
gah
Streaming may impact tool calling.
Some of the upstream models won’t run tool call when streaming is enabled
Oh we don't support streaming to the models, just streaming their output tokens to the end user via otel
or is that what you mean by "support streaming"?
Also good to see yuo here @void flint !
man, i had the cutest little demo of "use my apko module to install and run cowsay" and it worked first try, but now it doesn't work, for one of ~3 different reasons. one panic where the id arg wasn't provided somehow, one panic from OpenAI not returning any Choices - maybe API error. and now it installs the package, but then tries to call Apko.asContainer which just errors because it's blank. (instead of just chaining from Apko.wolfi(cowsay) which it just called)
or it just doesn't do the thing i asked...
Figured out a trick so the tool calls reveal the function directly, so we don't need the 'fake Call' hack anymore: if you combine our passthrough and reveal attributes, that tells the UI: 1. reveal this span (the tool call), but then 2. instead of actually showing the span, show its children. Now you see this, instead of ContainerwithExec(id: "xxh3:...") in a non-chained form
@smoky ocean that'll need a new dev engine though 😅 feel free to skip if it's cutting too close
You can set the LLM models to stream responses back.
Ollama doesn’t support tool calling when asking for streamed responses
Checking the docs, OpenAI and Claude do allow for tool calling and streamed responses
At the airport, so will follow along rather than be able to test this
Depending on jet lag, maybe Friday I’ll have a play
Ah ok I see 🙏
I believe we're not setting streaming explicitly (or maybe @spring wave you enabled that in the openai client when you added streaming to telemetry?)
Noted for ollama @void flint !
yeah, I changed it to do a streaming API call instead. I guess we can make that a per-provider decision
@smoky ocean pushed some minor CLI-only fixes fyi (fix quitting too aggressively, improved spacing a bit)
@spring wave any chance you could add 1) a mode that is less verbose while still lingering 2) showing emojis in that mode.
Feedback I'm getting: still not as clear and pretty as | history (in the specific context of a 7mn demo)
will take a swing at it 👍 i'll try just having it only show revealed spans, which tracks 1:1 to history
So thats why my demo broke today after pulling lol
thank you! Also getting that feedback of "too much debug info on verbosity=2" from several people, they feel overwhelmed
oh shit - did it screw up a presentation? D:
No no you're ok, I wasn't showing it to anyone today
Which is why I risked the git pull lol
haha k cool
hm try pulling - i think that turned out to be a one-line change
wow yeah that made a huge difference. sweet
@smoky ocean btw, just a heads up, Bubbletea can only paint within the visible screen region, so output will get cut off at the top.
- To scroll up, hit
<Esc>and then arrow keys/hjkl, but it works in a particular way (it's not a pager, it hops between spans). - We could bring back mouse events, if we're OK with saying just hold
Shiftto bypass it for copy/paste/whatever. - I could bring back mouse events only in navigation mode which could be a decent balance. (
<Esc>=> scroll)
Side note: the full non-cut-off output is printed on exit.
I have that working for golang sdk, and now working through the moduleSDK. We still need to figure out how those config name/types will be made available to api, but i am thinking early next week (as we still need to add tests and get through the review process).
what about a global config in -/.dagger or equivalent, can we have plumbing to read/write to that from core? like a special session attachable for client config
some shell feedback - I find myself making this mistake a lot:
dagger on llm [$!?] via 🐹 v1.23.2 took 1m56s
❯ dagger-dev shell -m github.com/shykes/melvin/demo
Dagger interactive shell. Type ".help" for more information. Press Ctrl+D to exit.
✔ .doc 0.0s
MODULE
demo
ENTRYPOINT
Usage: github.com/shykes/melvin/demo <token> <repo> <issue>
REQUIRED ARGUMENTS
token Secret
repo string
issue int
✘ . PAT https://github.com/vito/testctx 1 1m17s
! constructor: accepts at most 0 positional argument(s), received 3
│ ✔ load module . 1m17s
✔ github.com/shykes/melvin/demo PAT https://github.com/vito/testctx 1 0.4s
issue: 1
repo: https://github.com/vito/testctx
github.com/shykes/melvin/demo ⋈
I ran it from ~/src/dagger, but with github.com/shykes/melvin/demo as the current module, and I always expect . to be the current module, not the current working directory. So then it loads the ~/src/dagger module and runs its initializer, which expects completely different args, and fails without telling you which module failed. Takes a bit to figure out. (The solution is to use github.com/shykes/melvin/demo instead of . but that was the whole point of me doing -m, in my mind)
the session attachable is straight forward, but I think Justin has lot of context/plans on how global config should work, so will let him answer that.
got rid of ETOOBIG 🫗
Yeah that’s currently what I’m working on actually!
Welcome newcomers from the AI devtools meetup 🙂
Tomorrow I will update the repo to make it easy to replicate my demo.
not really in scope for what we're currently doing
it's pretty detached imo - but yes, we do need this 😄
in fact, it's pretty much this https://github.com/dagger/dagger/issues/9007
the title maybe isn't 100% correct, "session" refers to the fact that we would load the config every session - it doesn't configure the client, it's instructions for how the client configures the engine (for the duration of it's session)
but as i remember, you were suggesting that that config be merged into the engine config?
pushed a change to just stop the CLI from auto-fetching tools, history, and lastReply
Sorry didn't have time to remove those from engine yesterday.
Did you hardcode those 3 fields specifically in the shell logic? Might be a sign that we should discuss the shell's auto-print policy. In the past I floated the idea of either printing nothing on a returned object, or printing its state (ie persisted fields). WDYT?
Yeah we already had a list of type fields that we avoid fetching so I just chucked it in there. But I'd also like to just get rid of the auto field fetching and do something else (either print nothing, or print the one-liner ID like Container.withExec(...): Container!, or maybe support a display: String! but that feels a little murky)
I still like "print nothing"
I keep searching for a simple and intuitive way to communicate "a lot of things were built, they now exist in the cache"
I feel like once you know dagger 101, you just know that - so the information is redundant
We have the precedent of unix... command succeeds, if it has nothing further to say, it just prints nothing and exits
If we had a nice short ID, we could print that
Ok let's start using threads 🙂 I worry that we're drowning newcomers in walls of implementation text
🧵 how should the CLI print objects?
Is it just me or is the behavior of OpenAI gpt-4o extremely volatile? Does it change due to load on their end or something?
Also, I'm trying out a new TUI cue where it renders a 'shadow' of a parent span after the last of its children so you don't need to scroll all the way up
There's a temperature setting in the API, to somewhat control the "creativity", we should expose it in llm(). That might help
Thought that might be related, looks like the default is 0 but I have no idea what the 'log probability' bit means. 
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
HELLO NEWCOMERS
If you are here because of the Civo Navigate demo, or the AI devtools meetup demo, I will push an update to the melvin repo tonight, which makes it easier to get started, and notify you here.
In the meantime, for those who haven't already done so, please say hello and tell us what are your thoughts on the demo, and what use case you have in mind 🙂
@spring wave llm-shell bug report: ctrl-D triggers an immediate exit at any time, instead of only on an empty line edit. When the line is not empty, it's a common shortut for "delete character to the right of the cursor"
👋 @here ok folks, as promised I cleaned up the Melvin repo, and updated it with everything you need to replicate yesterday's demo ("containerize your AI agents!"), and get started playing with Melvin, and the experimental LLM branch of Dagger that it's built on.
https://github.com/shykes/melvin
Please let me know if you run into any issues getting started!
Contribute to shykes/melvin development by creating an account on GitHub.
pushed a fix (+ the 'shadow spans' experiment, feedback welcome)
@spring wave I think I just saw a bug report from you? Can't find where... Anyway I fixed it, prompt.txt is now pushed 🙂
Heads up, I'm making a small API adjustment that was frozen pre-demo.
--> withPrompt will be lazy
--> loop as an explicit call to send context and process replies & tool calls; to make it very clear when the loop actually happens
--> remove ask sugar to keep it simple
loop api sounds like sync. Is it fundamentally different or just named differently for a different audience?
Yeah it's called sync right now... I think it's technically the same, but it just does a lot more than for other types (ie. it actually implements the agentic loop)
So I would say 50/50 🙂
Also. I think I'm going to remove LLM_MODEL from the .env global config. Just to make thing simpler, and help us focus on making that an application concern
I think the next big features we need to add are:
-
Model selection in the API (including the routing logic in the backend, so operators can decide where to route which model). Clean separation of concerns between app and infra.
-
Multi-object BBI. Allowing one LLM to juggle any number of Dagger objects will unblock even more powerful composition scenarios. There are workarounds today, but they lean on some of our DX weak spots: wrapper types, interfaces, better to not force devs to use that on their first day...
-
Polish the tracing experience. Still too verbose out of the box in the terminal (per user feedback, "I am overwhelmed")
-
Generated clients.
-
Object persistence
Bug report @spring wave : new shell works wrapped in a dagger interactive terminal (github.com/shykes/dagger@llm | dev | terminal --cmd=dagger,shell) but just barely.
I just pushed this 👆
Now updating Melvin to use it
kind of interesting how the shell doesn't block input while you're running the command now. you can just keep spamming and backgrounding things
woops it crashed (concurrent map writes)

@smoky ocean two options (re: this crash, caused by running a bunch of stuff in parallel)
- decide we don't need that, and disable input while something is running in the foreground, like before (except
foo &- that'd be fine) - decide it's cool, and run each command in a subshell and backport any changes it makes to env back to the outer shell when it completes (last write wins, will have to keep track of changes manually)
edit: I'm just gonna wrap a mutex around it and call it a day - you can still type, but itll queue up commands instead
Getting Started with shykes/melvin
I think that was the right choice 🙂 (at least for now)
OK I'm actually reconsidering, maybe sync would be nice - it's not just about the function name, but the fact that the auto-unlazying behavior might be nice and predictable:
withPrompt-> lazywithXXX-> lazylast-reply-> auto-synchistory-> auto-syncXXX(get state back out) -> auto-sync
I can't find a sitemap.xml on the dagger.io website
I want to train a model on the docs
sitemap makes a easier to crawl
I know we have one for docs site
https://docs.dagger.io/robots.txt
https://docs.dagger.io/sitemap.xml
and checking for dagger.io, yes!
https://dagger.io/robots.txt
https://dagger.io/sitemap.xml
they're separate today (re-read and saw you were looking for the docs one specifically)
thanks @bronze fern
I remade my "marvin" demo in my own new agents repo https://github.com/kpenfound/agents/blob/main/go-coder/main.go#L37
More agents to come 🙂
But first I want to add go-programmer iterate <PR> <feedback>
Currently it can
solve a go coding assignment and give you a terminal with the solution: assignment "make a curl clone" | terminal
read an assignment from a github issue and push a PR with the solution: solve-issue GITHUB_TOKEN https://github.com/kpenfound/greetings-api 32
@harsh stag you could also work from the docs markdown files since they're open source.
Hey Dagger community! & @smoky ocean
I've been following the interesting conversations around AI, LLMs, and AI agents in this channel. My team has been actively building AI agents using LangGraph for the past six months, and we're now looking ahead to plan for some upcoming projects. This makes understanding Dagger's future direction in this space really important for our decision-making.
So, I'm curious about a couple of things:
-
Does Dagger plan to offer any built-in framework-like functionalities to help with building AI agents, similar to something like LangGraph? Or is the focus more on providing a foundational base that developers can use to build whatever they need on top of existing AI/LLM tools and frameworks? Knowing this will help us decide whether to continue investing in LangGraph or explore Dagger's native capabilities.
-
How does Dagger envision its integration with things like shell reload and distributed builds? For some of our projects, we face significant compute resource constraints and also need to minimize commute time for our developers. We're exploring how Dagger can help us manage these challenges, and understanding how these features might work together would be incredibly helpful.
Thanks in advance for any insights!
Hi Bhaumin!
This is very new so we're still figuring it out...
It seems that Dagger can provide a great runtime for agents, that allows "containerizing" both the software tools, and the llm context calling the tools. It's not an AI framework like Langchain, but provides a new primitive that could drastically simplify the architecture of your agents, and therefore change how you use the framework.
I think in the same way that Dagger integrates with your CI platform while radically changing how you use it; Dagger will integrate with your agentic framework while radically changing how you use it.
I shared details with our team for this agent direction, and they're excited about its potential. We're anticipating something big. We'd love to try it out soon and are hoping for experimental support in the next Dagger release so we can provide feedback.
The Dagger ecosystem is a huge plus. We appreciate that it offers built-in observability and the ability to orchestrate workflows based on a single cache, which can be used in multiple parallel workflows. There's so much potential here. Great Work, Keep it up.
Thanks for your reply.
Hello! I'm seconding this! I just saw your latest demo and was blown away by the potential of using dagger and containerized agents. Adding my reaction
- Wrapping MCP servers? Possibly from https://hub.docker.com/u/mcp
- Adding a break/token limit as a parameter to prevent the agent from going into an infinite loop?
Thanks, and excited to toy with it.
Could I use Claude with this agent or just openAI?
Claude support is coming 🙂 PR is underway
Also local models - llama3.2, qwen, deepseek...
Yes absolutely. We're thinking 2-way MCP bridge:
- Expose your Dagger modules as a MCP server (very easy)
- Wrap any MCP server and expose as Dagger API: need to find the best way. Dagger can certainly orchestrate and run the server itself, but the trickier part is API translation. How would you see us using that
mcpimage?
cc @hidden tartan 👆
+1 on token limit, we're also going to add token cost in the telemetry, perhaps even dollar cost 😁 cc @spring wave
Great! really want to try this
We're here to help!
Are you able to try with openai and switch to claude when we ship it? Or you'd rather wait?
I think I'm missing something obvious but I'm not seeing what. I'm trying to follow https://github.com/shykes/melvin readme but constantly hit this issue:
✔ toyProgrammer: ToyProgrammer! 0.0s
✘ .goProgram(assignment: "develop a curl clone"): Container! 1.8s
! select: failed to read secret file ".env": open .env: no such file or directory
But I have a .env file containing a value for LLM_KEY. The file is in the current directory from where I'm running dagger-llm shell. Is there something else to do? Like a way to mount/share the .env file?
Oh no I removed that crucial part from the README ... 🤦🏻♂️
Will add it back @river belfry . Basically, you need a .env file in your current directory, with LLM_KEY set to your openai token. It can be the plaintext value, or a dagger-style secret reference, for example op://... vault:// , env://..., cmd://...
For example here is mine:
$ cat .env
LLM_KEY=op://Dev/sdhfkasdhaskdahskd/credential
$
This is temporary until we connect this to Dagger's builtin config system
cc @gloomy kindle 👉👈 🥹
🤔
I have a .env in my current directory
$ cat .env
LLM_KEY="sk-pr....."
(it's an openapi key)
And I still got the error message
Ah..
Ah you're hitting a small bug - you need to call llm from the CLI first
Then your modules will be able to call it
We call this "bug 7", here's context if you're interested: #agents message
It's caused by my temporary hack to get LLM configuration into the engine
Starting a thread about adding MCP support / cc @worn hill @spring wave 🧵
sounds good, that works after to call llm 👍
I finally have the demo working. But I had to use op://... for the LLM_KEY, the other solutions (plain or env:// all failed on my side)
But it's working, so I can now play with it!
Nice! I have to say I've only been using op:// so it's possible something broke in the other providers without me noticing
@river belfry it might be the " in the plaintext, I don't know if .env format supports quotes in that way
I tried to remove them with no success
fwiw I set LLM_KEY with a raw value and it works
OK, I'm finishing up a README & setup.sh update so that others don't get stuck where you were. Then I will look into the plaintext issue
@river belfry maybe it was actually the "bug 7" and your plaintext was fine?
⚠️ I pushed a fixed & improved README. It includes a quickstart option with a setup.sh script. Thank you @river belfry for testing the onboarding flow and catching the gaps!
Quick recap of missing features:
- Binary release we need a binary release of the llm branch, to make it easier to install.
- MCP frontend (discussion #1341123420246773882 ). Any Dagger module instantly becomes a MCP server.
- Clean LLM configuration (not requiring a
.envin the local dir, not having to manually runllmfrom the CLI) - More models. Clearly we need Claude; as well as llama3, qwen, deepseek, Gemini... The more the better! This includes the ability to choose the model from the code.
- Token budgets. That includes measuring token cost and enforcing token limit
- Less verbose TUI. Consistent feedback that the output is too overwhelming by default
- Multi-object. Being able to give each LLM more than one Dagger object, would make it easy to compose modules and build more powerful agents in less lines of code
- Generated clients, to integrate Dagger modules in your existing application #1334452944740814931
- Finish the shell. All our demos rely on it, but it's not documented and not stabilized.
- MCP backend. Less straightforward, be very doable. Would be great to be able to wrap any existing MCP server in a Dagger module and boom, use it.
- Object persistence. When calling Dagger modules from an async, distributed workflow platform like Temporal or equivalent - you'll need a way to save object state in-between events. That is already a priority in the engine roadmap.
- Enable host API inside functions. #1339366755134472282 message
- Clean LLM configuration (not requiring a .env in the local dir, not having to manually run llm from the CLI)
- More models. Clearly we need Claude; as well as llama3, qwen, deepseek, Gemini... The more the better! This includes the ability to choose the model from the code.
Going to look into adding some plumbing to unblock those 👆
Is 5) still the case with the latest llm branch? Should be much quieter than the original demo now
I will check.
The feedback was not specific to the LLM parts btw, it was also about seeing lots of inner calls by default
I know that's always a delicate balance. Maybe the right answer is "you'll get used to it"
FYI @jovial grail this is where we're hacking on that agent demo 🙂
@shrewd ermine you mentioned returning a service endpoint - is there a missing feature in the engine llm branch to enable that? We can add it to the list 👆
It's the Host().Service() thing I was trying to enable, I'll link the thread
Oh right!
The idea is it'll let me run a GitHub webhook service wired into any agent I want. So not directly connected to the LLM feature but it helps
Added 🙂
On 3) more models - what's the blocker? Documentation? I think I've tested like 15 different models on ollama with my code
It's the plumbing into the API, and the dependency on the "clean llm config" which is a horrible sack of hacks right now (trying to fix it at the moment, without a core engine dev at my side it's quite challenging)
I'm going to start small, and move model selection into the llm() call. Initially it will only work with openai models, but at least will shift us towards the DX we want
then I'll try to add the "routing layer" that we talked about
(ie. based on requested model name, configure the right endpoint, taking existing standard config when it exists, and allowing augmenting it with dagger-specific config)
Can we add a bunch of statically defined models temporarily like we had in the early demos?
sure - I can do that first and deal with the "shared global config" issue later
Mode models
⚠️ pushed: first part of model selection API. You can now call eg llm(model: "gpt-3.5-turbo") or llm(model: "o1"). Still OpenAI only, but that will change soon
LLM_MODEL in .env is no longer supported. You can only configure the model from your code
I noticed the setup in melvin is a few commits behind shykes/dagger@llm. Should we get a pre-release branch+build on dagger/dagger similar to the cloak beta?
Oh, I keep forgetting to update the dependency
I figured, better to have to manually update, but at least stay in control of the user experience
I think a pre-release build would be a good thing to have, soon
yeah sounds good. I have the dagger@llm branch locally and I'm running dagger shell -c 'engine | service llm | up'
Debugging loop of the engine is SO SLOW
missing a field in the config parsing? /app/core/llm.go:485 /app/core/llm.go:662
Yeah maybe - trying to get very very basic debug information out...
seems like a lot of stuff isn't cached on engine builds. Or at least the work being done after the changed source is mounted takes a long time
It looks like my convention of splitting LLM_HOST from LLM_PATH goes against the grain of basically every API client... they all rely on mixing host & path in to a single "base url". So I'm finding myself splitting & merging it along the way, for basically no gain
Yeah I vote merge it. Scheme too.
on it
⚠️ pushed: second part of model selection.
We now honor standard env variables:
OPENAI_API_KEY,OPENAI_BASE_URL,OPENAI_MODELANTHROPIC_API_KEY,ANTHROPIC_BASE_URL,ANTHROPIC_MODEL
Note:
- Variables may be overridden with a .env file in the current directory.
- Variables for secret values may either contain the secret plaintext (basic convention) or a reference to the secret in 1password (op://...), hashicorp vault (vault://...), or a file (file://...)
- all API endpoints are still queried by the standard OpenAI client library. This will break Anthropic endpoints, and will be fixed in a follow-up commit.
Tomorrow:
- I'll finish adding Anthropic support (will pull in Sam's initial PR with anthropic client, and test it all)
- @shrewd ermine please let me know if it all works with local models
- I'll update melvin to the latest verison, and update the README
I take it back @shrewd ermine . Watching @dim spruce setup melvin in real time: yes we should absolutely start shipping a pre-release binary of llm right away, just like cloak...
Melvin install party underway with @dim spruce 😁
- Binary release we need a binary release of the llm branch, to make it easier to install.
@split shard could you help with this when in your morning? 🙏 It's the most painful blocker right now - installing a dev version of the engine, from source, is just not a lot of fun. A binary release, like we did for cloak, would be 10x better.
- MCP frontend (discussion #1341123420246773882 ). Any Dagger module instantly becomes a MCP server.
@worn hill want to take it?
- Clean LLM configuration (not requiring a
.envin the local dir, not having to manually runllmfrom the CLI)
@gloomy kindle we're due to sync tomorrow morning-pacific/evening-uk. @compact swan not sure if you'll be around, but welcome to join if so!
- More models. Clearly we need Claude; as well as llama3, qwen, deepseek, Gemini... The more the better! This includes the ability to choose the model from the code.
@shrewd ermine all ready for you to test with local modules.
Anthropic is half-finished, but I can finish it tomorrow.
- Token budgets. That includes measuring token cost and enforcing token limit
- Less verbose TUI. Consistent feedback that the output is too overwhelming by default
- Multi-object. Being able to give each LLM more than one Dagger object, would make it easy to compose modules and build more powerful agents in less lines of code
@spring wave take your pick 🙂 Let me know what you choose, so I know what can be allocated to others (or if you find something else)
- Generated clients, to integrate Dagger modules in your existing application #1334452944740814931
The rebel alliance is counting on you @hidden tartan 🙂
- Finish the shell. All our demos rely on it, but it's not documented and not stabilized.
I already know @shrewd fern is on it, just wanted to highlight that it's very useful for the LLM work
- MCP backend. Less straightforward, be very doable. Would be great to be able to wrap any existing MCP server in a Dagger module and boom, use it.
@hidden tartan @shrewd fern maybe we can talk about this in the context of the Magic SDK discussion we were planning on having anyway.
- Object persistence. When calling Dagger modules from an async, distributed workflow platform like Temporal or equivalent - you'll need a way to save object state in-between events. That is already a priority in the engine roadmap.
I know @steep onyx is on it already.
- Enable host API inside functions. #1339366755134472282 message
This one is up for grabs.
On top of this 👆 there's the equally important work on examples, and just building cool modules on top of the llm branch, which is the whole point 🙂
r.e. binary release. this is a touch tricky, we've changed a lot in our release process, but should be doable (especially if we're not cutting external sdk releases as well).
but, instead, i think it's worth (even if we do ship a binary soon) seeing how soon we can merge the llm pr, and have an experimental feature in main - that also reduces us needing to multi-task cross-cutting features like shell improvements in multiple places, and needing to keep rebasing. I'll look through the llm pr, and see if i spot any major blockers.
@spring wave notice there's a lot of https://github.com/dagger/dagger/pull/9327 in the llm branch. imo, the api seems entirely reasonable, and it feels like we've kind of decided on the design now that it's in the llm branch? is there still bikeshedding to do, or any blockers to merging that soon?
similarly, with the shell changes
there's a few code cleanups i think we'd want to make before merging - but don't know whether it's premature:
bbidoesn't seem to depend onremotefnanymore? seems to be deadcode now.- there's a few TODOs around the dagql server middlewares, which we'd probably want to fix (and add tests for, just so we're not accidentally breaking anything else)
honestly, aside from the above (which i'm happy to pick up), i'd be very happy to land this very soon (assuming we're on the same page with the new Span API and the shell tui changes). without the middlewares/etc, the scope of the changes are really not particularly sprawling, if something goes wrong, i'm not too worried about it breaking the rest of the engine.
(pushed a couple things to try and clean up ci on that branch, trying to get an idea of what needs doing to get it passing)
👋 I'm just looking to get this setup and working with our own API compatible endpoint (https://api.relax.ai/v1) and I can't seem to get the env vars to load from .env correctly
I've followed the setup.sh from github.com/shykes/melvin and I think I've got the correct engine / binaries running
I have the following in .env:
OPENAI_BASE_URL=api.relax.ai
OPENAI_MODEL=llama-3-70b
OPENAI_API_KEY=<PLAINTEXT>
however these don't seem to be being loaded when run
❯ ~/bin/dagger-llm shell -c "llm | model"
✔ connect 0.0s
✔ looking for module 0.2s
✔ loading type definitions 0.2s
✘ llm: Llm! 0.0s
! No valid LLM endpoint configuration
│ ∅ model router: []->[&core.LlmEndpoint{Model:"", BaseURL:"", Key:"", Provider:"other"}] 0.0s
I'm sure this is user error ... does anyone have any ideas of where I can start looking?
I've also tried:
OPENAI_BASE_URL=api.relax.ai/v1
OPENAI_BASE_URL=https://api.relax.ai/v1
Yeah - it was merged in to make web UI dev easier since it depended on the frontend-api branch + llm. I backported all the changes out from llm and back into that PR over the weekend, but I'm going to try and divide it up even further; technically llm only needs the internal plumbing (new attrs) and TUI changes, it doesn't need the API and SDK parts, which we don't want to rush. All the shell changes should also probably be a separate PR.
@bronze fern very interested in that typescript video you were about to post! 👀👀
LLM branch review & pre-release binary
Hi Solomon, please let me know when you chat with Justin. if I am awake, I would like to join that call.
will do!
hoping to be ready in 10mn
Is the LLM logic able to pull in secrets from the environment to call functions?
Trying to use Gerhard's notify module for sending some messages and it expects a webhookURL argument that is a secret
No, for secrets you need to pre-load them into the object state before passing it to the llm. The LLM typically doesn't have secrets to provide anyway
Looks like maybe that URL argument shouldn't be a secret?
it definitely should. Its a sensitive argument, anybody that can read it is able to post messages
ready whenever
@gloomy kindle @compact swan joining dev audio
Is the LLM logic able to pull in secrets
sorry I wanted to push a fix for @void flint first
using the dagger-in-dagger method for the engine and it is having a rough time on the latest commit
@shrewd ermine yet another instance of "you were right", I'm going to drop the mandatory loop() and make llm follow the standard "sync-able" system. Some calls will be lazy, others will force sync().
sounds good! guessing that means with-prompt is lazy but sync or history will execute?
what's weird with this is the session I was running has completed. The tui actually got dropped half way through. It looks like it thought it completed but it was only half done. Cloud was still following. After killing the shell from that session it's still 1000%+. No activity in engine log
@smoky ocean shall I make that change to print TypeName@xxh3:... instead of object fields, as part of splitting out TUI changes?
Yeah I like it 👍 should probably let @shrewd fern know since he's also working on shell in parallel
Is supplying multiple modules in a single Llm() supported?
Not yet... But that's planned. At the moment you can only give it one object. If you want to combine objects, you have to wrap them in a middleware object - not always fun with our current DX as you know
Progress update:
Binary release we need a binary release of the llm branch, to make it easier to install.
Just spoke to @split shard , he will take care of this tomorrow morning UK time. 🎉
Clean LLM configuration (not requiring a .env in the local dir, not having to manually run llm from the CLI)
I am unblocked thanks to @gloomy kindle , working on a fix now
Starting a thread for "clean LLM configuration" fix
Here's my multi-model agent flow https://github.com/kpenfound/agents/blob/main/go-coder/main.go#L59-L60
The idea is that I pass a coder-specific model to the part of my agent that writes code, and a chat/generalized model to the part of my agent that writes text (PR title, branch names, commit names)
@shrewd ermine so the model selection is passed to the openai client lib within the engine implementation, and it magically works? (same endpoint for both models in your case)
yup! I have a new optional arg for model too since it assumes you have the models I've selected available so it makes sense to override those. I'm wondering (as we get better at provider awareness) if we should attempt to pull models too. Some frontends have those capabilities and some don't
oooh, like instrumenting ollama to do it?
yeah it just depends where we want to sit in the stack. I don't know how those responsibilities get divided in different orgs. If we're provider-aware for ollama we'd have the ability to list models and pull models. But is that something someone would expect dagger to do or something you'd expect of your infra team
maybe something best left to the infra until its better defined
looking into gemini now
You'll need a custom client too for Gemini @shrewd ermine . Getting closer to what Sam started with https://github.com/shykes/dagger/pull/297
going to try their OpenAI compatible api first https://ai.google.dev/gemini-api/docs/openai
Have two PRS up now:
with a minor modification, I present: gemini
🤔 I don't know what I'm doing wrong, but I can't get the melvin stuff working anymore.
- with the latest commit from
shykes/melvin - I'm building
dagger-llmwith no problem - the engine is up and running
- inside
dagger-llm shellI'm loadingllmfirst with successmodel: gpt-4o(openai) - when running
./toy-programmer | go-program "develop a curl clone" | terminaleverything looks ok, but nothing is created inside the container
dagger /app $ ls -a
. ..
- if I'm trying the default
llm | with-prompt "llm, are you there?" | last-replyI just got a(no reply) - my openai key is stored in 1password, and accessed (as it asked me to unlock the vault)
If anyone has an idea of what I'm doing wrong 🙏
For the prompt, update it to llm | with-prompt "llm, are you there?" | loop | last-reply. I'll have a look at the toy-programmer setup. The api is changing fast 🙂
that works better 🙂 thanks
toy-programmer is also missing loop, although @smoky ocean how's the progress on changing that behavior?
yeah, I just added .Loop() and that works
So now I have something I know is working I can continue to try to translate everything in Java 🙂
I was still not able to get things to work but it was due to some weird networking issues. I restarted my Mac/Docker and things seem to be working finally. 😄
Can you share how you call this in the context of melvin or using the .env file?
I was able to translate toy-workspace and toy-programmer into Java. It's using a specific branch for dagger (basically the shykes/llm branch with the constructor support from https://github.com/dagger/dagger/pull/9523 - available at lgtdio/llvm-java)
-> There's no real benefit here except writing in Java and catching bugs in the Java SDK (already found and fixed 3 of them)
I updated the workspace part and the prompt, I'm now able to generate java/maven code. The prompt is more complex than the Go one, to ensure the generated code will work (and I haven't fixed everything). But that a start.
At least it's fun to do 🙂
./nono-programmer | java-program "develop a clone of curl" | terminal --cmd=bash
I love how this extremely amazing thing is framed as " no real benefit" 😂
The bar keeps rising every day!
-> There's no real benefit here except writing in Java and catching bugs in the Java SDK (already found and fixed 3 of them)
@river belfry that reminds me, would you mind opening a PR to the melvin repo with your Java example?
😅
What I mean is just it doesn't add any kind of feature except the fact it's in Java 🙂 But yeah I'm happy that works, that's a real test for the java sdk
I am very excited about the progress! Having the Java SDK is going to open up entire new swaths of people being able to take advantage of Dagger.
I'll look at that. I don't think it will be mergeable right away because of the specific java additions that needs to be merged first, but at least it can be in PR to start. I'll just refine a bit the prompt and I'll open it
- Clean LLM configuration (not requiring a .env in the local dir, not having to manually run llm from the CLI)
🚨🚨🚨 This is now fixed! Thank you @steep onyx @gloomy kindle @compact swan @wraith remnant for your help tracking it down!
@shrewd ermine what open models do you have working so far? And what's the coolest way we could share that online? I want to send one tweet per "headline" improvement 🙂
maybe a cool little gif of each? If we can make it catchy somehow
He has all the models
In ollama I've had the most success with qwen2.5-coder (of various sizes) for working with code. For more general purpose, ollama3.3 has been great (thats the same as ollama3 70b).
I have gemini-flash working although it seems pretty stupid. Maybe I need to give them my credit card
I tried using deepseek-r1:70b on Kyles infra but its not working for me at the moment.
✔ llm 0.0s
model: deepseek-r1:70b(other)
✘ llm | with-prompt "what model are you using" | loop | last-reply 34.8s
! Post "http://dagger/query": context canceled
● llm | with-prompt "what model are you using" | loop | last-reply 1m11s
Anthropic is half-finished, but I can finish it tomorrow.
@wraith remnant is finishing Anthropic support
I'm going to adjust the API, to not require Loop() every time (too extreme), going to aling it with the standard sync() model like other core types
see if your engine died. It happens sometimes with dagger-in-dagger engines
It seems to be alive because I was running with setup.sh shortcut, but idk - is it working on your end?
I switched back to qwen and it worked right away
✔ llm 0.0s
model: qwen2.5-coder:32b(other)
✔ llm | with-prompt "what model are you using" | loop | last-reply 8.9s
I am based on the Qwen large language model created by Alibaba Cloud. How can I assist you today?
│🧑 what model are you using
│
│🤖 I am based on the Qwen large language model created by Alibaba Cloud. How can I assist you
│ ┃ today?
interesting, maybe it was having trouble loading the deepseek 70b model with multiple users 😂 that model does push the limits of that GPU
Ah yeah makes sense
I've had that model working btw but only when I was the only user. And it's really bad at tool calls because it's so chatty
added instructions https://github.com/kpenfound/agents?tab=readme-ov-file#running-the-go-coder-agent
Here is a draft PR that adds a ./toy-programmer | java-program to Melvin
This uses a specific maven/java workspace (created with the Java SDK)
And uses a specific llm-java branch the time we merge the constructor support (the branch has been refreshed a few minutes ago)
https://github.com/shykes/melvin/pull/4
@smoky ocean my 2 changes required for gemini (so far based on ongoing testing)
- disable the automatic model routing so that it falls back to the openai client
- comment out
seedfrom the openai request. It's not supported in google's openai "compatible" api. How necessary is it for actual openai models?
Noted 👍
Want to make the PR?
re: seed. No idea...
Yep will do. I can just drop seed when it's a google model
No worries about that, I'll make it a provider specific thing, could you please add a comment next to it ?
Sam abstracted a bit and made the implementation per provider
@wraith remnant but does it default to openai still? I guess so
So, upon the call on teh SendQuery interface, per provider, it will catch it
yes yes
It looks like @shrewd ermine confirms that we can fallback to openai provider for gemini, if we make the 2 small changes above
(or you're saying we can have a gemini-specific provider that just calls the openai client in a slightly different way?)
I think this whole block is outdated now 😬
anyway - as long as you two are talking I'm happy 😉
Yeah, need to see Kyle's PR to be sure, BUT, what I meant is that, as I'm implementing this sendQuery interface ; each provider would have the responsibility to set up everything
If gemini retrocompat might impact the openAI as we remove some options, then we could just implement the interface
Yeah rebasing it and making it work-ish atm -- or am i totally off ? And misunderstood somethign 🤣
I guess I'll see your pr
well there's provider-specific routing existing on @llm now. All I've done is disabled it for google, and then commented out this line https://github.com/shykes/dagger/blob/llm/core/llm.go#L549
Oh so it fallsback to gemini, using the chatgpt library ; yeah we're aligned then
You're just in advance ahah 🤣
But, wouldn't what we want is to keep the routing per provider though ? 🤔
in some cases, like anthropic where you have a different client, yes. But for ollama and gemini I want to use the openai client
Yeah, but wouldn't what we really want is to have those providers implement their own specific: func (llm *Llm) sendQuery(ctx..., in which case it reuses the openai client with the seed off -- and not for openai routing for example ?
As I am implementing the abstraction for claude with an interface, extending the logic is then easy
No strong opinion though 😇
My original answer was based on the fact that *we don't * know if the seed is actually useful -- but if useless it's easy and you're right ahah 👼
yeah I know what you mean. The painful part is that the whole benefit of the "openai compatible api" is that we really shouldn't need to do that. For Ollama it's been perfect so far. The fact that gemini's openai API is missing seed is just a bug on their end. If we were going to actually route it differently I'd say we just use the core gemini api instead of the openai compatible one
So we're aligned, it's just an implementation detail afterward
⚠️ pushed: API call limts. llm(maxApiCalls:10) -> this will cap to 10 API calls total for the duration of that LLM instance. This deprecates loop(maxLoops) and paves the way to renaming it to adopting standard Dagger convention for laziness and sync
If you reach the cap, you will get an error
we can add token limit and maybe dollar limit in the future (might require a dagger cloud integration for that last one)
how far along are you in that rebase? I can work on implementing the gemini client once you're in a good spot
cool, i'll work on something else for now!
🙏 (focus mode, seeya 👋 )
I'm thinking that with multi-object, I'll take the opportunity to adjust the API as so:
Option 1:
type LLM {
set<FOO>(key: String!, value: <FOO>!): LLM!
get<FOO>(key: String!): <FOO>!
}
The explicit set and get might make it more clear what is happening.
But, we would lose with<FOO> which is also nice. wdyt?
Option 2:
Keep what we have, just with key argument:
type LLM {
with<FOO>(key: String!, value: <FOO>!): LLM!
<FOO>(key: String!): <FOO>!
}
mmm weird, toy-programmer prompt now breaks for me, it tells me how to do it, instead of using tools to do it ...
Nevermind, it was a bug I introduced, now fixed
@shrewd ermine @spring wave you guys both prefer option 2?
One issue I have with it, is that it pollutes the LLM type completion big time... the 5-6 "regular" functions are completely mixed with the dozens of state setters/getters and there's no good way to distinguish them
I think I do, but I haven't actually done anything multi-object yet. So maybe after that's supported I'll realize it's gross
I just have a negative gut reaction to getters/setters 😛 but now I understand the problem a bit more
setFoo behaves like withFoo, right? it just returns a modified LLM with the value set, as opposed to mutating?
The problem I'm trying to avoid ⏬
Yes it would be exactly the same as withFoo, just a rename
a compromise would be withFoo / getFoo
🚨🚨🚨 LLM.loop() is now no longer required. You can call sync() explicitly to force evaluation - otherwise just call lastReply, history or a getter function (container(), myObject()) and the state will be automatically synchronized
--> Logging off for the day. Tomorrow attacking multi-object and/or MCP support, we'll see. + playing with the latest dagger shell improvements from Helder, and hopefully plugging that in!
Thank you for a fun few days everyone!
@shrewd ermine - re the messages earlier about deepseek. The model doesn't support tool calling at the moment, hence why you're not getting "great" results from it. I've also found that r1 with 70b params can halucinate quite quickly. I've been running a 671b quantised model to get better results. I still havn't run the full model
To get tool call "working" with DeepSeek R1, I've been using BAML (https://www.boundaryml.com/blog/deepseek-r1-function-calling) in between my code and model. BAML taks the random / unstructured output from any LLM, and coercess into a user defined format
I re-did my java test: https://github.com/shykes/melvin/pull/4
So it's using the default dagger-llm (no specific branch, just needed for the java sdk)
There's a new java module with a few commands:
find-bugsto find and explain them, but without to change anythingrefactorto improve the code
That's still a toy, but that's really fun to do! 🙂
❓ Is there a way we can directly edit files on the host from the workspace? I'm doing a ./java --source MyDir | foo | export MyDir but I'd like to avoid the last segment is possible
well it's just a regular dagger module, so same answer as any other module -> likely not possible because of the sandbox
we want to add a "smart export" so you don't have to specify the target path : https://github.com/dagger/dagger/issues/8235
ok, it was just to be sure I haven't missed something. The smart export idea looks nice 👍
@spring wave @shrewd ermine @wraith remnant heads up I'm moving us from shykes/dagger@llm to dagger/dagger@llm. Thanks to @split shard and @gloomy kindle we have a nice pre-release system, so we can start giving binary builds to users instead of having them build dagger with dagger 🙂 🙏
New PR for llm support: https://github.com/dagger/dagger/pull/9628
🚨🚨🚨 Pushed: a new README with much simpler initial setup. You can now install llm-enabled Dagger from a binary pre-release. Please give the new setup instructions a try to make sure they work 🙏
@spring wave @steep onyx I have a new merge conflict rebasing llm on main... I think related to this commit:
commit 4bb955ca4e9791125de7aae4fa090759891acc11 (upstream/main, main)
Author: Erik Sipsma <erik@dagger.io>
Date: Wed Feb 19 11:22:49 2025 -0800
make function calls cached session-wide (#9621)
I'm going to poke around but I didn't author either side of the conflicting code so not very confident
@smoky ocean new demo just dropped https://github.com/shykes/melvin/pull/5
Merged! 🙂
Love it
It just occured to me: at the moment multi-model will not work if you combine OpenAI models & generic - because the engine uses OPENAI_BASE_URL for both
Starting a thread to discuss @nova bronze 's toy roasting agent 🙂
@smoky ocean 🚨 gonna push -f soon: I removed all the commits that were split out into my two PRs, rebased on main, and then merged both of my PRs in on top. From now on we can just continuously merge from the 3 branches (vito/tui-llm, vito/shell-bbt, main - I'll manage the first two as changes are made to them), and once they're merged we can just rebase on main
uh...done... i think?
remote: Resolving deltas: 100% (661/661), completed with 79 local objects.
remote: Bypassed rule violations for refs/heads/llm:
remote:
remote: - Cannot force-push to this branch
remote:
To github.com:dagger/dagger
+ 2273cd29a...bae2eb0f5 llm -> llm (forced update)
branch 'llm' set up to track 'upstream/llm'.
thank you!
Current status: upgrading github.com/shykes/melvin/workspace to use a checker interface instead of a container... If that works, I'll use interfaces for a onSave hook, then the programmer agent can eg. send updates to github, discord etc. from within its inner loop 😛
@shrewd ermine do you need help on the dag.Host support? Looks like it would unblock you in a big way
ugh getting stuck on interfaces... 😦 I never get it right on the first try
sounds like design ideas were discussed today to unlock it in the future! But nothing I can work with today
Yes! Interface triggered
Man1️⃣ Machine0️⃣
Next blocker: figuring out the right pattern for using interfaces as callbacks in a stateful loop...
Specifically: if the callback needs to persist state across calls... how to do that.
For example, for a notification callback (so I can eg. send github & discord updates every time the dev agent checkpoints its workspace).
--> the github notification module kind of assumes its local state will be up-to-date. For example if you add 3 tasks, it expects that its state will contain an array of 3 tasks.
But if I wrap it in an interface that looks like this:
interface Notifier {
notify(message: String!): Void
}
Then there's no way to "build state" from one notification to the next
So, no big deal you say: just make it a chained call:
interface Notifier {
notify(message: String!): Notifier
}
Well guess what: now my "notify" function has to be lazy, because it returns an object, and therefore there is a blanket rule that you can't do error check on it
Not too bad for notify (would be nice to be able to know when a notification callback failed, but sure why not).
But it becomes a major problem for my other interface: Checker.
interface Checker {
check(dir: DirectoryID!): Checker
}
Now my check function, whose only purpose is to maybe return an error, cannot return an error.
A separate but related problem: because a module can't receive or return another module's type, we can't use interfaces to implement a complete hook system.
Eg. my Workspace type can't define a hook interface that receives the Workspace as argument, for more flexibility.
So for example my notifier can't inspect Workspace.Diff() and make its own decision on whether to include it or not.
that said: HA HA HA it's working
I think actually a lot of this callback system might be advantageously replaced by multi-object. Instead of programming my workspace to call a notifier hook on each save(), I would just give the LLM both 1) the workspace and 2) the notifier, and let it decide when to send updates
@steep onyx is this error related to the bump to 0.16? Cannot query field \\\\\\\"resolveContextPathFromCaller\\\\\\\" on type \\\\\\\"ModuleSource
Curious if this could be daggerized: https://x.com/leonardtang_/status/1892243653071908949
Yep that's part of the breaking change, need v0.16.0
Got it. I'm just doing dag.CurrentModule().Source().File("system.txt"). Based on the note in the release I thought it would be safe
You upgraded everything to v0.16.0? And getting that?
I wonder if this is related to your success installing before with the different sha
"engineVersion": "v0.16.1-250219172554-bae2eb0f5765"
using cli + engine built off of dagger@llm
Oh I'm not sure about the llm branch. resolveContextPathFromCaller is an API that doesn't exist anymore, so you have an old client somehow
ok I have a pretty messed up setup right now so i'm bringing it all back up lol
no luck 😞 let me try to uncomplicate the setup a bit
ok I think we're good. I was on dev-engine in prod engine -> shell -> function -> dagger-in-dagger. Now I'm just on the llm release. It was a bit manual updating deps
little demo of my github-agent. It runs my go-coder agent on my greetings-api repo automatically https://github.com/kpenfound/agents/tree/main/github-agent
Q: maybe this was already mentioned (sorry), can I make the llm primitive talks to an openai compatible service running inside the Dagger engine? Like, I start one with gpu access, then I load the model, and eventually I test my agent?
What a great idea. Not yet supported but yes we can add that without too much trouble I think 👍 would you mind opening an issue for it? 🙏
Sure, I'll do it
Agent demo generating a new Cypress test for a UI change in a project very close to the example project from the Quickstart. All using the TypeScript SDK.
https://github.com/jpadams/cypress-test-llm-ts <- demo project
https://github.com/jpadams/hello-dagger-ts <- project under test
Cannot use Claude endpoint with version 0.16.0-llm, right? I see it using OpenAI endpoint when I set up Claude endpoint.
What are you trying to do? start Dagger enging with gpu access create a service that loads and LLM model and expose an OpenAI API create awesome agents with LLM Why is this important to you? Testin...
Normally it should route correctly, did you request claude-2 or another anthropic model in the code?
if you set ANTHROPIC_MODEL in the environment it should pick that as a default
Thank you!
That said @proper stratus even once you get the requested routed correctly to anthropic (which should already work), you will hit another issue, which is that Anthropic endpoints don't work with the openai client libraries, and we haven't yet added the anthropic client library. @wraith remnant is finishing a PR for this, and we plan on merging today.
TLDR: by the end of the day, you should have a working anthropic implementation running on Dagger 🙂
Great! I will try that tomorrow morning.
Hello everyone! On the menu for today:
- Anthropic support merging soon thanks to @wraith remnant and @storm gate
- We rebased on dagger main, and will get the sweet sweet performance boost contributed by @steep onyx 🙏 Will cut a release soon, this will require an upgrade
- Some cool demos being built on dagger-llm, we will share videos and links
- I would love to get MCP support prototyped today
- Big improvements to dagger shell in a separate branch thanks to @shrewd fern , I'm thinking we should just merge that in also 😛
Awesome, dagger team rocks. is there any way to try such features with dagger canary or dagger next or like dagger nightly build kind of releases like other technology follows eg: vs code insider, deno canary, chrome canary, etc
@devout magnet yes if you look at the README for https://github.com/shykes/melvin , we updated it yesterday with install instructions that do this. There is a "pre-release" version of Dagger called v0.16-llm.1. Later today we will release v.017-llm.1
ok got it, and all this llm support in dagger will be like we dont need to use any library of any llm directly, we just need to use dagger llm features, that way is it going to be supported? i am curious to know so i asked...
Yes correct, if you decide to use Dagger for sending your prompts, then you won't need a llm client library in your code.
I'm so excited about Dagger and AI! It's fantastic to finally share this with you all. I've always told my team that the people building Dagger are practically magicians – you bring next-generation ideas to life. It's incredibly inspiring, and I'm thrilled to start using AI with Dagger. I know it's going to be a lot of fun!
ok but with prompt we will be able to send other context and use other features like structure output and code execution also?
@gloomy kindle @spring wave @split shard @wraith remnant @shrewd ermine be advised: I am tagging v0.17.0-llm.1
@wraith remnant do you want me to wait for your Anthropic PR to merge?
Need to debug a few things still can you wait ~1/2h ?
no problem we'll just release llm.2
🚨🚨🚨 New pre-release has dropped: v0.17.0-llm.1. Please update your dagger with:
curl -fsSL https://dl.dagger.io/dagger/install.sh | DAGGER_VERSION=0.17.0-llm.1 BIN_DIR=/usr/local/bin sh
This is rebased on Dagger 0.16, so among other things, you will get the performance boost & 1password & hashicorp vault integrations
@smoky ocean have you used lmstudio? when I was doing the C# stuff with semantic kernel, i was only calling locally hosted models, wondering if i can do that with this PR or does it call out to the remote model provider API only? i.e you need to get an api key / acocunt on those providers to use the LLM dagget type?
I somehow racked up a £100 bill and i errm, didnt want to run remote OpenAI stuff anymore because of that
like i would just wanna point it to localhost
This works with locally hosted models, we have examples with ollma
Depending on your infra it may be slower but works out do the box
I've been mostly just using ollama which is local models and you can point at localhost
what lev said 🙂
@shrewd ermine to the rescue
ahh okay cool, thats good - the variables provided doesnt make it all too clear to me
Yeah, good point! I was also confused at first but the tldr is that ollama provides a compatible api
in which case, ignore me will try on the weekend!!
yeah with ollama specifically we're taking advantage of the fact that they have an openai compatible API, so I set OPENAI_BASE_URL=http://localhost:11434/v1/. I'm not familiar with lmstudio to say what that would look like
@shrewd ermine might be a nice blog post or something showing how to get this working from scratch
As a safety, you can also cap the number of API calls per llm instance (for example llm(maxApiCalls:10)) it's a pretty crude safety but better than nothing. We're going to ask token caps also
I was wondering if one could implement this "DeepSeek + tool calling wrapper" as a dagger module 🙂
- Run a DeepSeek instance ("thinker")
- Also run a non-reasoning model that supports tool calling ("doer")
- Wire them so that they talk to each other - the thinker tells the doer what tools to call
- Wrap that in a
DeepSeekWithToolCalltype, that can be instantiated and prompted at will
I think the issue might be - how to get your object in there. You might have to instantiate the "doer" LLM on your own, then pass it as an argument
So maybe a more generic utility, to combine 2 Llms - one thinker, one doer - so that they collaborate
I'd definitely like to try that concept. For deepseek + tools I tried ishumilin/deepseek-r1-coder-tools:14b
and some models support images right?
has anyone tried to dagger -> provide image -> get output from image?
me winning a game of chess, but i was sending screenshots every second to a dagger agent and nobody knew
💀
could be an interesting use case for dagger watch... could screenshot your screen or a window, and boom - an assistant
@shrewd ermine the anthropic PR is ready for review: https://github.com/dagger/dagger/pull/9656. if you can take a look to ensure I didn't break anything 🙏
It shouldn't change anything on the openAI / fallback logic for gemini
at a glance, looks great! I can try building it if you're unsure
yeah please 🙏 Especially for the retrocompat ; Anthropic still hallucinates a bit, I am digging that
But it can be a follow-up -- tools are discovered, the model is using them -- it's just that it doesn't seem as good as openAI
Just waiting for my build to finish 🐌 You should try the multiagent demo with openAI + anthropic 🙂 https://github.com/shykes/melvin/tree/main/multiagent-demo
ollama / openai still works for me!
Then good to go ✅🙏
Added my approval
lmao your agent is cheeky af "jonathan are you running or are you simply lost in an american mall because that pace needs some working on. the elevation you've scaled is about as intimidating as your granny's porch steps!" .... also did you intend to teach the LLM that it's not american? or did it infer that from your activities?
Hahah very cheeky 😅. It picked it up either from the location of the activity or the location of the "club"
@wraith remnant @shrewd ermine @spring wave if anthropic is merged, do you want to tag v0.17.0-llm.2 ?
Alex fixed the tool calling. He's pushing the change soon ❤️
pushed
Works prefectly ✅ ❤️ -- thanks again
awesome 🙂
@wraith remnant can you share a gif that I can tweet 🙂
And can you guys tag 0.17.0-llm.2 if anthropic works? 🙏
@bronze fern @shrewd ermine back to my computer in 5mn... sorry for disappearing. Got some gifs for me? multi model goodness maybe? 😛
can try to condense this one into a gif? #agents message
X takes videos just fine
Ok I'm at my computer. Pushing llm.2 so that everyone gets the Anthropic goodness
@spring wave I may have found a bug in the TUI/llm branch...
Does this work for you?
container | from alpine | with-service-binding api $(container | from nginx | with-exposed-port 8000 | as-service) | terminal
nope - hangs
Same for me
That's the bug
I assumed it's TUI/shell but maybe not?
v0.17.0-llm.2
Still hitting a 8k token output limit though on Anthropic's complex agentic solution
It looks like this (basically silent erroring as the output gets incomplete)
But works
that's a limit linked to your account?
To their models themselves : https://docs.anthropic.com/en/docs/about-claude/models#model-comparison-table -- and gpt4o (16k) https://platform.openai.com/docs/models#gpt-4o-mini
Ah but it still does tool calling fine?
So maybe we need to tell it to shut up and just do the work? 🙂
put mine up on X @smoky ocean
has vid attached
still getting this error from time to time: ! input: llm.withContainer.withPrompt.sync POST "https://api.anthropic.com/v1/messages": 400 Bad Request {"type":"error","error":{"type":"invalid_request_error","message":"messages: text content blocks must be non-empty"}}
Digging back after the gif is done
@smoky ocean re: stdout/stderr, I pushed a change to include <stdout> and <stderr> in exec error responses, only in the LLM code path for now instead of everywhere. demo (too bad that's not publicly viewable)
oh nice! I think even just stderr would have been enough in case of error. "only in llm code path" -> anything under a llm.Sync?
I actually went through the trouble of having it render the progress UI by loading and replaying the client's telemetry, but that didn't even help because it's reliant on the command actually running where/when you expect it to. In my case the command was being evaluated prior to the llm.Sync (arguably a bug - I was passing a container as an arg, and I think that calls sync instead of id atm) so the call in llm.Sync ended up just hitting the same cached failure, without any new logs printed. Whereas if you extract it from the error itself, it'll always be there
Also at one point it rendered way too much and blew up my token rate limit 😅 so that method is a little fraught
Looking into MCP support... So what's the deal with this stdio-server situation?
- The convention seems to be: implement your MCP server as a stdio server?
- But then surely the client apps expect a remote HTTP endpoint?
Should dagger http endpoint? Or stdio and if so -how
indeed!
Looks like Goose definitely executes directly
(with manual curation of which shell command to run for which mcp server... 😭)
If it helps any, there's a stdio => HTTP proxy tool called @mpchub/gateway - it's how I was able to get Claude Desktop to talk to my MCP service running in Dagger:
env MCPHUB_SERVER_URL=http://localhost:1234/api/mcp npx @mcphub/gateway
Claude config (mine has extra WSL rubbish):
❯ cat /mnt/c/Users/surac/AppData/Roaming/Claude/claude_desktop_config.json
{
"mcpServers": {
"dagger": {
"command": "wsl",
"args": [
"--cd",
"~/hack/mcp-gql/",
"env",
"MCPHUB_SERVER_URL=http://localhost:8080",
"npx",
"@mcphub/gateway"
]
}
}
}
not ideal for prod obviously, but could help unblock
yeah
which is... kind of weird tbh. you'd think it'd be easy to support a simple HTTP endpoint config
yes
I'm curious, does mcp support nesting / multiplexing? In other words, in theory could you connect one MCP server that has all the tools ™️ then select from the client which tool you want in which session?
my guess is "yeah right"
the set of available tools can in theory be changed at "runtime" - there's a protocol message for it
but i don't think Claude Desktop supports that
(that's going to be a theme - for Desktop anyway, different story if we control the client too)
Goose is the same. Very static
which is fine for the graphql BBI approach, since the set of tools never has to change, its understanding of the schema changes instead
So, since the utility of MCP is defined by its clients - and there are actually very few of those - fair to say, "support MCP" actually means "support executing many specialized MCP servers as single-purpose stdio servers"
--> let's move this thread to #1341123420246773882
Orchestrate Agents
A thought, from a business perspective.
Assistant (something a user or customer will ACTUALLY use...) -> calls out to an agent? Multiple agents behind the scene? Does the Customer UI even show whats happening? - Think of complicated workflows that could be retreival and manipulation of data, and then possibly dumped somewhere else, not back to the user at all) Agents can or will possibly have multiple tools available and an assistant may possibly be a agent gpt based bot on the front or not.
What does THAT look like in a world of dagger
ignore dagger and the devops, how is a customer interacting with AI solutions that run on dagger. Those diagrams will be interesting.
yes agreed. Dagger can help with the backend, but you still need a frontend, and event-driven infrastructure to connect them
Hi everyone!
Im trying to reproduce the legendary demo "containerize your agents".
Tested on 0.15 and 0.16, this attempts to reach the OpenAI API even when the endpoint is set to local ollama URL:
command:
OPENAI_BASE_URL="tcp://0.0.0.0:11434" OPENAI_API_KEY="notused" dagger shell
. ⋈ llm --model "gpt-4o" | with-prompt "hi " | ask yo | history
Error: input: llm.withPrompt.ask POST "https://api.openai.com/v1/chat/completions": 401 Unauthorized {
"error": {
"message": "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.",
"type": "invalid_request_error",
"param": null,
"code": null
}
}
any help/advice is appreciated. In oder to make this work, you need to create an alias for any of your ollama models to one of the openai models (as per how the code seems). The rest i dunno 😦
Make sure you install v0.17.0-llm.1, see the readme in https://github.com/shykes/melvin
or is it llm.2 now @gloomy kindle @split shard ? 😇
see #1342304973421150218, still failing - i tried submitting a fix earlier today, did not manage to get a review
<@&1326978792110948425> folks, would love your input here. 👆

