#maintainers
We need to decide what we want to support. Those are available with a .. prefix to avoid a name clash with a function name. Except function because that’s not a shell builtin, it's a part of bash syntax and that was disabled because of export.
can I get an example of valid value for _EXPERIMENTAL_DAGGER_CACHE_CONFIG ?
sorry just saw this separate from DMs, for the record the current best place to look are the integ tests and the upstream docs
what's up with this thing where github reports "pending" for all its dagger checks? https://github.com/dagger/dagger/pull/9027 https://github.com/dagger/dagger/pull/9030 https://github.com/dagger/dagger/pull/9031 @fair ermine @spark cedar
Follow-up to #8466.
Some tools (like git checkout) return valid 128 exit codes.
We only exclude the range 128+, because this is the exit code for a process that was terminated by signal(). 0 isn't...
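(sketch of the exit-code rule in the PR preview above, assuming the usual shell convention that a process killed by signal N is reported as 128+N. The function names and the exact boundary, where 128 itself is kept as a valid code and only codes above it count as signal exits, are assumptions and not the PR's actual code)

```python
SIGNAL_BASE = 128  # shells report "killed by signal N" as 128 + N


def is_signal_exit(code: int) -> bool:
    """Assumed rule: codes strictly above 128 are treated as signal exits."""
    return code > SIGNAL_BASE


def describe(code: int) -> str:
    """Human-readable description of an exit code under the assumed rule."""
    if is_signal_exit(code):
        return f"terminated by signal {code - SIGNAL_BASE}"
    return f"exited with code {code}"
```

With this boundary, describe(128) stays a normal exit (the git checkout case), while describe(137) is read as signal 9.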
does what it says on the tin, the memstat double-print is an ugly consequence of a merge conflict resolution gone poorly.
fixes #9012
possibly related?: #infrastructure message
Looks like the GHA checks run+pass and the checks are not pending in dagger.cloud, so feels like the checks themselves are just not getting marked as done and/or GHA is having issues with showing the latest status?
@hybrid widget @tidal spire for generating the gif examples 🙂
#!/usr/bin/env dagger shell -q -m github.com/shykes/daggerverse/termcast
print "# Let's start with a simple command" |
enter |
exec "dagger shell -c '.container | from alpine | with-exec apk,add,git,openssh,rsync'" |
gif |
..export ./demo.gif
First run will be slow, it has to build a bunch of tooling from scratch
@still garnet I still have that problem where the TUI output is somehow not intercepted... 😦 But otherwise it all works
Yup I noticed that too, that's strange because the CI job completed
hm, i don't think this is related to an engine change, since all the jobs here are using the same engine version that they were before, and that was working
potentially a cloud change?
unless it's related to https://github.com/dagger/dagger/pull/9013 🤔 (the first change it started happening on main)
potentially something up with our github app integration as well? not sure where the code for that lives though
cc @wild zephyr you did make a change around here yesterday 🤔 https://github.com/dagger/dagger.io/commit/769e2519ac8fd64f82c7b7cdf1d384814bef2a17
could this be causing the checks thing we're seeing?
Hmm weird..I'll check in a bit
just pushed an update, if that was the issue, this should fix it
hm, it definitely seems fixed now 😄
huh, so i guess the theory is the api workers became saturated retrying errors that were previously not retried?
@still garnet, I'm getting a panic on TUI and non-tty stdin:
❯ dagger shell --no-load <<<'.help'
...
ADDITIONAL COMMANDS
.core Load a core Dagger type
.doc Show documentation for a type, or a function
.help Print this help message
Use ".help <command>" for more information.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x18 pc=0x105a507cc]
goroutine 217 [running]:
github.com/muesli/cancelreader.(*fallbackCancelReader).Read(0x140001e6080, {0x140007f6100, 0x100, 0x100})
/go/pkg/mod/github.com/muesli/cancelreader@v0.2.2/cancelreader.go:53 +0x4c
github.com/charmbracelet/bubbletea.readAnsiInputs({0x1060d1df0, 0x140001810e0}, 0x1400014c070, {0x14da59c98, 0x140001e6080})
/go/pkg/mod/github.com/charmbracelet/bubbletea@v1.1.1/key.go:565 +0x88
github.com/charmbracelet/bubbletea.readInputs(...)
/go/pkg/mod/github.com/charmbracelet/bubbletea@v1.1.1/key_other.go:12
github.com/charmbracelet/bubbletea.(*Program).readLoop(0x1400042c8c0)
/go/pkg/mod/github.com/charmbracelet/bubbletea@v1.1.1/tty.go:94 +0x8c
created by github.com/charmbracelet/bubbletea.(*Program).initCancelReader in goroutine 1
/go/pkg/mod/github.com/charmbracelet/bubbletea@v1.1.1/tty.go:86 +0x110
It does show the output but doesn't seem to close properly in the end. What's the proper way to get that stdin? It used to work correctly, but I can see v0.14.0 has the issue (main as well).
ironically the release title for bubbletea v1.1.1 is Don't panic!
hm, i recall attempting to fix this before
is there a reason we don't use h.stdin there in the line you link to?
This is actually what I see locally.
yeah, i think this fixed it for the echo "" | dagger call case
(pretty unfamiliar with this code, still reading)
At the time it didn't work. I also tried again earlier today and it didn't work either. Let me see if I get a different stack.
No, it's the same.
is that the whole stack? it looks like it's on a read, not on a close? or maybe it's the close after reading that triggers it?
This is with h.stdin
nothing ever sets r.r = nil (not even Close()) so it seems like it was legitimately initialized with nil: https://github.com/muesli/cancelreader/blob/245609eb8557cff32c56eed62b04a2d096c83e83/cancelreader.go
Hm, but WithInput(nil) is definitely valid 🤔
the doc string for WithInput tells you to do it 😛
ahh
i think it's RestoreTerminal
https://github.com/charmbracelet/bubbletea/blob/4ad07926d7ff00bc21a05b2536d82a7cc629225e/tea.go#L738-L740 it does this unconditionally
vs. here
Yeah 🧐
(brb, vet)
So it's a regression?
i think it's just a bug in bubbletea - if there's no input, then it will fail to RestoreTerminal - which only gets called in this way by the shell
i guess you could also do it via echo "" | dagger core container from --address=alpine terminal
this also crashes
might be a regression 🤔 but it's also unclear imo what you'd even expect this to do lol
ohhh yeah, okay, we used to WithInputTTY
Why doesn't the same thing happen with dagger query?
but we switched it because this caused weird issues - the handling for that always unconditionally tries to open /dev/tty which doesn't always exist in some tests/etc.
but we should try and handle that case
i can open a pr that will handle it for this specific case, but i think we still need the bubbletea upstream fix if no /dev/tty is available
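(sketch of what "handle that case" could look like: try the controlling terminal, and fall back to no input instead of crashing when /dev/tty can't be opened. This is a generic illustration in Python, not the Dagger or bubbletea code; the function name is made up)

```python
from typing import IO, Optional


def open_tty(path: str = "/dev/tty") -> Optional[IO[bytes]]:
    """Return an unbuffered handle on the terminal, or None if unavailable.

    Callers must handle None explicitly (e.g. run without interactive input)
    rather than passing it on to code that will read from it.
    """
    try:
        return open(path, "rb", buffering=0)
    except OSError:
        # No controlling terminal (some tests, CI, detached processes).
        return None
```

The point is only that the "no tty" outcome is an explicit value the caller checks, rather than a nil reader that gets dereferenced later.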
the RestoreTerminal code path is only executed when calling the tea.Exec - which happens when we send a backgrounded message (only happens for terminal + shell)
Not sure though.. since before my change I think we were unnecessarily returning an error in the river worker, which AFAIK should trigger a retry by default?
so not sure why not returning the error was causing some checks to be left in a pending state
note - we still need a bubbletea fix
but this should at least handle your case @rancid turret
it still does really weird things if no /dev/tty is available, but that's very uncommon if stdout/stderr/stdin are ttys 😛
i guess now it's checking for a different error: transport.AuthorizationFailed.Error() == "authorization failed" (ref) which doesn't contain "Forbidden" so if we were getting a bunch of those before, now they'd be going down the return err path and getting retried
all of the errors were going through the return err path before (https://github.com/dagger/dagger.io/commit/a2b42c7238743920fc993681909bb394c87dc414#diff-f2914bfb48db7043bf448fa78e415d6f39b95463ca78d5c55fbfe4bada632db6L130-L154), so in my head a bunch of things were being retried unnecessarily.
I think the pending check issue could have been the nil dereference fixed in that same commit above?
which is also strange as the nil dereference happens after the status is effectively updated 🤔
oh yep that seems likely, didn't see the nil deref fix
maybe those nil deref jobs kept getting added back to the river queue?
yes, I'd assume river has a recover from panics. So yes, it could have been a retry saturation issue in the end. Note, we need a dashboard for the river job queue
yeah good idea - looks like we have that already for Daggerverse, even
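(toy model of the retry-saturation theory above: every job that raises is put back on the queue until it runs out of attempts, so a job that always fails costs max_attempts executions instead of one. This is not river's API; PermanentError, run_queue and max_attempts=3 are made up for the illustration)

```python
from collections import deque
from typing import Callable, Iterable


class PermanentError(Exception):
    """Hypothetical marker for failures that are not worth retrying."""


def run_queue(jobs: Iterable[Callable[[], None]], max_attempts: int = 3) -> int:
    """Run jobs with retries and return the total number of executions."""
    queue = deque((job, 1) for job in jobs)
    executions = 0
    while queue:
        job, attempt = queue.popleft()
        executions += 1
        try:
            job()
        except PermanentError:
            continue  # dropped: retrying cannot help
        except Exception:
            # Any other failure (including a crash in the job) is retried.
            if attempt < max_attempts:
                queue.append((job, attempt + 1))
    return executions
```

One healthy job, one always-crashing job and one permanently failing job cost 1 + 3 + 1 executions here; with many crashing jobs the retries dominate the queue.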
Fun fact: you can get a dev build of the CLI that thinks it's a stable release, so it agrees to connect to your stable engine 🙂
$ dagger call -m github.com/dagger/dagger@v0.14.0 --source https://github.com/dagger/dagger#main cli binary --platform=darwin/arm64 export --path ./dagger-dev
$ ./dagger-dev version
dagger v0.14.0 (registry.dagger.io/engine:v0.14.0) darwin/arm64
Think that'll break some of the TUI functionality, like duration accounting (withExecs will probably say 0s)
but i'm thinking of fixing that anyway, so old traces look right in cloud v3
On a side note, I saw that you left this commit from https://github.com/rajatjindal/dagger/pull/1/commits/abdc17d87f59ac9923e94d96583cef8c7e7a092f ; I am fixing a bug regarding the socket in git using the dag.Taint(), are you planning on another PR to finish the refactor of the sockets, or can I integrate this commit ?
Does anyone have experience using buildkit cache export to S3 with dagger? 🙏 I managed to inject the configuration into my session, the engine is clearly doing something at export time, but nothing gets uploaded in the end... The exact same config string works just fine with docker buildx build
Ah, I got one lead in engine logs:
time="2024-11-23T02:43:00Z" level=error msg="error running cache export for client j1jp4r4d3wqdd9k5uy3fk8n47" client_hostname=mbsh4.local client_id=j1jp4r4d3wqdd9k5uy3fk8n47 error="failed to check file presence in cache: operation error S3: HeadObject, context canceled" session_id=wqhht3fg95go9xgnd3kiwof3m spanID=e1571202493b78c7 traceID=6ace9b43a215147d32986e1552f3b9bd
Is there any way to get more details in the logs? 😭
Slightly more logs...
time="2024-11-23T03:01:26Z" level=debug msg="getting remotes for jarz0d34dv0vhyfjr4v57w1hf::i56o1idyahbuy67m9bqrimfqc" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:26Z" level=debug msg="got remotes for jarz0d34dv0vhyfjr4v57w1hf::i56o1idyahbuy67m9bqrimfqc" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:26Z" level=debug msg="getting remotes for jarz0d34dv0vhyfjr4v57w1hf::j6cawqukhzvlblmkq2vl1pqfx" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:26Z" level=debug msg="got remotes for jarz0d34dv0vhyfjr4v57w1hf::j6cawqukhzvlblmkq2vl1pqfx" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:27Z" level=debug msg="finalizing exporter" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:44Z" level=debug msg="finalized exporter" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:44Z" level=debug msg="waited for cache export" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:44Z" level=error msg="error running cache export for client c8p0i5017e2cq7kh6b3rh807c" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c error="failed to check file presence in cache: operation error S3: HeadObject, context canceled" session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:44Z" level=debug msg="done running cache export for client c8p0i5017e2cq7kh6b3rh807c" client_hostname=mbsh4.local client_id=c8p0i5017e2cq7kh6b3rh807c session_id=8af6qk5739pf9dd2eyx9fojpz spanID=5219b6416f96a325 traceID=aeb1d99f0a7d086b91577f8645e36066
time="2024-11-23T03:01:44Z" level=error msg="failed to flush telemetry" clientID=c8p0i5017e2cq7kh6b3rh807c error="map[error:context canceled\ncontext canceled\ncontext canceled kind:*errors.joinError stack:<nil>]" isMainClient=true mainClientID=c8p0i5017e2cq7kh6b3rh807c sessionID=8af6qk5739pf9dd2eyx9fojpz
time="2024-11-23T03:01:44Z" level=debug msg="removing session; stopping client services and flushing" session=8af6qk5739pf9dd2eyx9fojpz
time="2024-11-23T03:01:44Z" level=debug msg="stopped services" session=8af6qk5739pf9dd2eyx9fojpz
time="2024-11-23T03:01:44Z" level=debug msg="session removed" session=8af6qk5739pf9dd2eyx9fojpz
Could this be related to our dagger-specific cache exporter issues?
testing with dagger run...
yup it works with dagger run
(yay but also boohoo)
Oh Marcos noticed this the other day too, at some point a timeout of 10s got added to the CLI's shutdown process: https://github.com/marcosnils/dagger/blob/8c0d24b399be77a90d8c4bfa21496a296a6ec39d/engine/client/client.go?plain=1#L754
Which inadvertently applied to the cache export (since that runs when the client is closing the session). Didn't notice because our integ tests aren't exporting a ton of data so they always made it out in under 10s
Need to fix that, probably just not having a timeout when cache export is enabled
But you can build a CLI with that timeout rm'd to unblock quick
Sort of surprised; it might be that when you re-ran, the work to compress the cache from your previous failed attempts was re-used and you made it out in under 10s
ah I see. well my dagger run is for a tiny program that runs a tiny pipeline, so it might simply be the overhead of loading a module that causes the issue
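(sketch of the fix idea above, "no timeout when cache export is enabled", in Python rather than the CLI's Go. The 10s value comes from the thread; the function names are made up, and unlike a Go context the timed-out work here is only abandoned, not cancelled)

```python
import concurrent.futures
import time
from typing import Callable, Optional

DEFAULT_SHUTDOWN_TIMEOUT = 10.0  # seconds, the value mentioned in the thread


def shutdown_timeout(cache_export_enabled: bool) -> Optional[float]:
    """No deadline when a cache export has to finish during shutdown."""
    return None if cache_export_enabled else DEFAULT_SHUTDOWN_TIMEOUT


def close_session(close: Callable[[], None], timeout: Optional[float]) -> bool:
    """Run close(); True if it finished, False if the deadline hit first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(close)
    try:
        future.result(timeout=timeout)  # timeout=None waits indefinitely
        return True
    except concurrent.futures.TimeoutError:
        return False
    finally:
        pool.shutdown(wait=False)  # do not block on the abandoned work
```

Usage would be close_session(export_and_close, shutdown_timeout(cache_export_enabled)), where export_and_close stands in for whatever the client does on session close.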
Hey Alex,
I am a bit confused on how to use the dynamic purity feature you implemented to ensure that any socket or pat being retrieved shows up in the dagql call as a secret / socket to be passed to functions:
We currently do like that for the PAT.
But my understanding was that I could replace this selection by marking the field selectors as impure when selecting them: here for the PAT and that I could add it there for the socket
Plus, as unixSocket is impure, then anyone calling it would make it impure
My tests show that it's not enough. What am I missing ? 🙏
this looks like it was written against the old version of my PR for dynamic purity; it was switched to do the opposite before it was merged - now you keep the Impure annotation on the schema, and specify Pure: true in `dagql.Select`
(i changed the pr name to 'elective purity' or something)
Mmmh, thanks. I was doing it wrong indeed 🙏 ; Now, there's still something that confuses me: we used to rely on the withAuthToken selector to ensure that the PAT retrieved shows up in the dagql call to functions.
As the core.unixSocket is marked as Impure, and Purity is not set (commented), my current understanding was that the core git API makes an impure call when selecting the socket, and, when creating the new instance for current ID, this would automagically be part of the dagql ID, removing the necessity for an __internalWithSocket and withAuthToken call as this is constructed directly here, from here or here from there.
However, I tried every variation of purity / impurity (for the unixSocket + IpSocket) + inside the git API, setting it as pure or not ; but removing the withAuthToken does seem to make it disappear from the dagql call for functions calling it ;
On your elective purity PR, you mentioned that we need to self select the field with purity: true ?
Summary: I'm still doing something wrong ahah, and not sure how to unblock myself 🙏
dagql elective purity no jutsu
@obsidian rover did something change in the config for gitlab we're using in our integ tests? I got a weird failure in my PR. Then I saw main passed 12h ago but just re-ran CI on main and hit the same weird error there: https://dagger.cloud/dagger/traces/e098e2462cc91e1254d3a1bbe18df97a?span=e02b84e361a6d970
In TestCLI/TestDaggerInstall/GitLab_public/git/sad. Not an emergency if so, just wondering what could have changed to cause that to fail consistently
Mmmh no, to my knowledge nothing should have changed ; digging 👀
No rush at all, just odd it seems to have started out of nowhere with no changes
fwiw, seeing the same failure in my PR: https://v3.dagger.cloud/dagger/traces/4e8392d26fe330289654d5cb471019e1?span=520650edc77f48a9
Yeah there's something odd: multi.go:85: 27 : [1.9s] | 22 : parseRefString: gitlab.com/dagger-modules/test/subdir/dep2@323d56c9ece3492d13f58b8b603d31a7c511cd41 DONE [0.2s]
The source ref is: gitlab.com/dagger-modules/test/more/dagger-test-modules-public, from vcsTestCase named GitLab public. It seems truncated weirdly
parseRefString prints an impossible ref here: https://v3.dagger.cloud/dagger/traces/fdc6f94b5168ef19cf43163fa85355e9#L297-ff0aa4dda54d634a.
is that /subdir supposed to not exist?
since the test is expecting an error
i don't see it at https://gitlab.com/dagger-modules/test 
double checking - the buildkit s3 exporter does not include dagger cache volumes right? @civic yacht
none of them do
gitlab error
Just hit this random python error while querying the core api (unrelated to python in every way)
✘ .withExec(args: ["uv", "run", "--isolated", "--frozen", "--package", "codegen", "python", "-m", "codegen", "generate", "-i", "/schema.json", "-o", "/gen.py"]): Container! 1.9s
error: Failed to prepare distributions
Caused by: Failed to build `codegen @ file:///src/sha256:66b11cafd456559c6a162a3349d8409b474f5a6476ed31c9534ef046dada37d6/sdk/python/dev/c
Caused by: Failed to resolve requirements from `build-system.requires`
Caused by: No solution found when resolving: `hatchling`
Caused by: Failed to download `hatchling==1.26.3`
Caused by: Failed to fetch: `https://files.pythonhosted.org/packages/72/41/b3e29dc4fe623794070e5dfbb9915acb649ce05d6472f005470cbed9de83/ha
one-any.whl.metadata`
Caused by: Request failed after 3 retries
Caused by: error sending request for url (https://files.pythonhosted.org/packages/72/41/b3e29dc4fe623794070e5dfbb9915acb649ce05d6472f00547
-1.26.3-py3-none-any.whl.metadata)
Caused by: client error (Connect)
Caused by: tcp connect error: Network unreachable (os error 101)
Caused by: Network unreachable (os error 101)
Is the dagger engine hitting pythonhosted.org on every core api call every time it loads github.com/dagger/dagger?
I think it is
ack, sorry, I added more brokenness on main after merging https://github.com/dagger/dagger/pull/9054 - looking into it now
found the issue - it's actually from https://github.com/dagger/dagger/pull/9054 (edited original msg), and it's because we've been using semconv v1.24.0 meanwhile the otel SDK version we're using wants v1.25.0 - should be a trivial bump, i think, this sort of dependency pain is pretty localized
FYI I think there is a bug with context dir, when loading a local module: https://github.com/dagger/dagger/issues/9068
@still garnet I'm filing an issue, and it's not urgent, but just in case you know the answer off the top of your head: I can't get the TUI to work inside asciinema, inside dagger... For some reason the TUI output is excluded from the recording
in dagger shell/terminal?
unrelated to the shell
I had the same problem last year when I made that asciinema module... still breaking my teeth on it now
(was hoping to finally automate the production of our docs gifs)
hmmm, shot in the dark, but try asciinema rec --cols 80 --rows 24 - maybe something isn't passing along a window size?
Nevermind, I may have tracked down the issue, there is a huge difference between stable releases of asciinema, and the dev branch on their repo. The latter appears to be an in-progress rust rewrite. I was using that to make it easier to inject the tool into arbitrary containers (to allow recording any command on any container). But the tty problem seems to go away when I just use the stable release. So, I may redirect my efforts towards finding another way to record in any container.
ahh ok. you could try https://github.com/charmbracelet/vhs - I love its UX, the only crummy thing is it tends to produce fast-forwarded gifs: https://github.com/charmbracelet/vhs/issues/88
but maybe it'll be fine on your machine 🤷♂️
Hi folks,
I have a question about how View works (cross posting the question from here: https://github.com/dagger/dagger/pull/8865#discussion_r1862896017)
- so lets say current release is v0.14.0
- now I changed the AsService api so that for BeforeVersion("v0.15.0") it should serve old api, and for AfterVersion("v0.15.0") it serves the new version.
- now when we run the tests, it runs with the current version of dagger, which would be v0.14.1-.....
- that would mean that the api visible to tests would be the old one. so we don't need any changes in existing tests yet.
- now lets say v0.15.0 is released. this would mean tests would suddenly start seeing v0.15.0 version of api, and the tests would fail as the new api is no longer backward compatible.
- now we would need to make changes to the existing tests?
does ^^ make sense? It almost feels like I am missing something obvious here.
yes. welcome to complexity hell lol 😄
so, if you glance at future_test.go you'll see an example of where we've done this
see https://github.com/dagger/dagger/issues/8670 as well
but. this is maybe kind of irrelevant? the next release is going to be v0.15.0? so we should update the "current" releases to be v0.15.0-...
which sidesteps the problem entirely
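(toy model of the scenario in the list above and of the bump suggested here. It assumes the gate compares only major.minor.patch and ignores the pre-release suffix, which is what makes "bump current to v0.15.0-..." flip the tests to the new API in this model; how Dagger's View/BeforeVersion/AfterVersion actually compare versions is not shown in the thread. visible_api and the "-dev" suffix are made up)

```python
from typing import Tuple


def core_version(v: str) -> Tuple[int, ...]:
    """'v0.14.1-dev' -> (0, 14, 1); the pre-release suffix is dropped."""
    core = v.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def visible_api(engine_version: str, boundary: str = "v0.15.0") -> str:
    """Which variant of a gated API a client of this engine version sees."""
    if core_version(engine_version) < core_version(boundary):
        return "old"
    return "new"
```

In this model a v0.14.1-dev engine serves the old AsService, so existing tests pass unchanged, and both v0.15.0 and a bumped v0.15.0-dev serve the new one, so the tests have to be updated in the same change as the bump.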
so we should update the "current" releases to be v0.15.0-...
ah, yeah that may work. The prob is with the existing tests (which are not necessarily testing this api but depends on it).
yeaaah okay um
let's just bump to do this
i don't really want to figure this out with future_test.go, i'm not sure that approach is very fun
fyi @meager summit, went through and looked at your open prs, and left some comments - trying to make sure you're not blocked!
that is super helpful. thank you. It is still a secret, but I was planning on pinging someone on Monday to help move me forward with these PRs. No points for guessing who this someone is.
Justin, while you are here, I have one more question for you 🙂
I am working on chore(tests): archive test results in json format with CI runs (https://github.com/dagger/dagger/pull/9011) and as part of that I need a few pieces of information to be attached to the test results:
- current PR number
- current branch
- current commit
- current GitHub action run id
This information is available as env variables when triggering the GitHub actions. Is it possible to make changes to our .dagger module to pass this information to test function?
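(sketch of reading the four values listed above from the default GitHub Actions environment variables. GITHUB_REF, GITHUB_HEAD_REF, GITHUB_REF_NAME, GITHUB_SHA and GITHUB_RUN_ID are standard Actions variables; ci_metadata and the dict keys are made up, and this does not cover how the values would be passed into the .dagger module)

```python
import os
import re
from typing import Mapping


def ci_metadata(env: Mapping[str, str] = os.environ) -> dict:
    """Collect PR number, branch, commit and run id from a GHA environment."""
    # On pull_request events GITHUB_REF looks like refs/pull/<number>/merge.
    match = re.fullmatch(r"refs/pull/(\d+)/merge", env.get("GITHUB_REF", ""))
    return {
        "pr_number": int(match.group(1)) if match else None,
        # GITHUB_HEAD_REF is only set for pull requests; fall back otherwise.
        "branch": env.get("GITHUB_HEAD_REF") or env.get("GITHUB_REF_NAME"),
        # Note: on pull_request events this is the merge commit, not the head.
        "commit": env.get("GITHUB_SHA"),
        "run_id": env.get("GITHUB_RUN_ID"),
    }
```

Taking the environment as a parameter keeps it easy to call with a plain dict outside of CI.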
uh, getting the current commit (and branch+PR to some extent) should be possible using t.Dagger.Git which can get info for that.
but getting the action run id, we'd want to inject it through an env var, manually using a flag (probably on the dagger constructor)
i'd kind of like to avoid injecting github-specific stuff into dagger config, it feels like a bit of a smell, but not sure really how to keep it moving
i'm trying to get our ci into a shape more away from the specifics of github actions
hm, reminds me of #1288202189558517832
we have this info in cloud traces somehow. so i am assuming we do have the data in context of gh actions already?
we do, but it's not exposed through to functions
dagger cloud gets passed these automatically
do we currently have a way in core of getting a dagger Directory from a directory on the engine host?
i have some ideas for how to implement it, but just wanted to check before i spent time on hacking it in
👀 this is an internal impl detail i need it for
it's not for an external api
without it, it will be harder to add this feature 😛 (it's for contextual git)
Ah nevermind then 🙂
Yes, we use it for Container.AsTarball, but I'd suggest trying to avoid it if at all possible. I'd like to remove it someday since it's annoying to maintain, slow, etc.
Hm okay well maybe I'll try and come up with something better 🤔
Updating/adding an LLB source is a better route. I think we could actually customize or even add new LLB ops (not just sources) now that we have our own worker too (just need to handle calling it in ResolveOp).
Those options give you access to raw snapshots/mounts so you should be able to do whatever you want
What's the recommended best practice for installing a dev version of the engine on my system? I know where to put the CLI, but where should I put the engine image?
If I only install the CLI, it seems to auto-download a dev build of the engine, but unless we recently implemented "magical build on demand" from our registry, I'm guessing it's just main
https://github.com/dagger/dagger/issues/7466 https://tenor.com/view/someday-not-tomorrow-not-today-maybe-maybe-not-today-gif-13346626
is there a plan for supporting custom certificates & proxy config without requiring building a custom engine, or messing with the engine image in any way?
This will be part of: https://github.com/dagger/dagger/issues/9007
just a quick ping on this
what's the rest of the context here? are you dling from CI outputs?
also where do you put the CLI?
I set up a new bare metal lab machine, to ssh into for performance (especially when on a plane 🙂). I just want the system install to be for dagger-dev rather than stable release. But otherwise it's a persistent system-wide install
Yeah I think as @obsidian rover said above - we'll do it as part of that issue
Which I would like to do, but priorities haven't let me get to it atm 😭
But it is definitely on my little list of things to do 🎉
Maybe after my little visit to git land?
We have a lot of enterprise type users asking for this, so I really want to get to it 🎉
this should work. curl -fsSL https://dl.dagger.io/dagger/install.sh | DAGGER_COMMIT=$commit BIN_DIR=$path sh
yeah agreed 🎉 will be a fun little journey, so will try tackling this properly 🙂 would be good to finally fix https://github.com/dagger/dagger/issues/6927
i think we want an Op instead of a Source for this, since we actually want to track inputs/outputs as part of the tree
maybe something like a HostExecOp would be generic and neat enough for our use cases?
the context for why i want this is in https://github.com/dagger/dagger/pull/9098 😄
might interest you too @obsidian rover ^
the basic idea is to just make the current git implementation one of many backends - for now, we'll add Directory.asGit, but we can also live-load these details from the client as well later
Rant(status:optional, length:short, urgency:low, areyoustill:reading?)
Rant of the day: prefixing commit messages & PR titles with feat(some-component): or chore: hurts readability. IMO titles are for data, not metadata. If a commit does a thing, the commit should be "do the thing"
i agree, especially after following the existing pattern for a few weeks, the metadata feels like a recipe for word salad
especially around bugs where it's already mildly tricky to get the right verb tense, putting bug(engine): in front reads weird
bug: make thing do correct behavior
wait is doing the correct behavior a bug now?
re-upping this... What do I do with the output of dev-export?
Is it by setting _EXPERIMENTAL_DAGGER_DEV_CONTAINER?
Basically, how to install a dev build of cli+engine pair on my system
I agree about the components like feat(component) - I'm happy to drop those.
But I find the general "type" of action quite useful, especially when prepping releases to see what actually needs release noting / blogging / raising.
(Git itself doesn't have metadata, even though GitHub does, so I find it useful when looking locally)
./hack/dev
Which shells out to a mage script that I want to get rid of, but for now is still necessary
What is the relationship between ./hack/dev and dagger call dev-export?
Hack/dev -> weird mage thing -> dev-export
Context: I don't have a go dev environment installed on the host system, and would prefer not to. ./hack/dev requires a certain go version installed on the host
Can I replace ./hack/dev by 1) calling dev-export and 2) setting env vars myself?
Yup
What variables please? 🙏 I looked at https://docs.dagger.io/configuration/custom-runner#connection-interface and couldn't find anything about using a tarball
I'm not at my laptop right now, but the tarball needs to be imported to docker using docker import
Yup I think that should work - it's what the weird mage script is doing under the hood anyways
This makes me want "one binary" more
@spark cedar ha ha! I found a way to use ./hack/dev directly without sullying my host, using the power of dagger:
$ dagger shell -m github.com/dagger/dagger -c 'go | env | with-file /bin/dagger $(cli | binary) | terminal --experimental-privileged-nesting'
dagger /app $ ./hack/dev
🤔 how does the terminal there have access to the hosts docker?
But if you pass the docker socket it should work 👀
so I guess I didn't even make it as far as lack of docker access
Oh yeah I hit this lol yesterday - go env doesn't include the .git directory
Here's what I tried:
COMMIT=$(dagger core git --url=https://github.com/dagger/dagger head commit)
dagger call -m github.com/dagger/dagger@$COMMIT dev-export -o ./dagger-$COMMIT
docker import ./dagger-$COMMIT/engine.tar registry.dagger.io/engine:$COMMIT
./dagger-$COMMIT/dagger core version
Last command fails with:
$ ./dagger-$COMMIT/dagger core version
✘ connect 1.2s
! start engine: failed to run container: 2f9b918a4258bf650a247b29e4fc987f0802f99491816fc1e48404c34ecb4db7
! docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "--debug": executable file not found in $PATH: unknown.
! : failed to run command: exit status 127
│ ✘ starting engine 1.2s
│ ! failed to run container: 2f9b918a4258bf650a247b29e4fc987f0802f99491816fc1e48404c34ecb4db7
│ ! docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "--debug": executable file not found in $PATH: unknown.
│ ! : failed to run command: exit status 127
│ │ ✘ create 1.2s
│ │ ! failed to run container: 2f9b918a4258bf650a247b29e4fc987f0802f99491816fc1e48404c34ecb4db7
│ │ ! docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "--debug": executable file not found in $PATH: unknown.
│ │ ! : failed to run command: exit status 127
│ │ │ ✔ exec docker ps -a --no-trunc --filter name=^/dagger-engine- --format {{.Names}} 0.0s
│ │ │ ✔ exec docker inspect --type=image registry.dagger.io/engine:8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 0.0s
│ │ │ ✘ exec docker run --name dagger-engine-8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 -d --restart always -v /var/lib/dagger --privileged registry.dagger.io/engine:8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 --debug 1.1s
│ │ │ ┃ 2f9b918a4258bf650a247b29e4fc987f0802f99491816fc1e48404c34ecb4db7
│ │ │ ┃ docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "--debug": executable file not found in $PATH: unknown.
│ │ │ ! failed to run command: exit status 127
Error logs:
✘ exec docker run --name dagger-engine-8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 -d --restart always -v /var/lib/dagger --privileged registry.dagger.io/engine:8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 --debug 1.1s
2f9b918a4258bf650a247b29e4fc987f0802f99491816fc1e48404c34ecb4db7
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "--debug": executable file not found in $PATH: unknown.
! failed to run command: exit status 127
Error: start engine: failed to run container: 2f9b918a4258bf650a247b29e4fc987f0802f99491816fc1e48404c34ecb4db7
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "--debug": executable file not found in $PATH: unknown.
: failed to run command: exit status 127
$ docker ps -a | grep engine
2f9b918a4258 registry.dagger.io/engine:8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 "--debug" About a minute ago Created dagger-engine-8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916

I see that dagger tries to execute this command:
docker run --name dagger-engine-8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 -d --restart always -v /var/lib/dagger --privileged registry.dagger.io/engine:8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 --debug
Is that the right command?
2 hours in, I am giving up on installing a dev build of dagger on my home server
Did you try this? #maintainers message that works for me, the CLI knows how to use the right engine built off the same commit
Gah I had missed that... My fault for not creating a thread 🙏 will try, thanks @civic yacht @wild zephyr
Will this work on any commit on the repo? Even unmerged dev branch?
no, only commits merged to main
as those are the only ones that publish the CLI and the Engine
Ah, so this is the same as dagger call github.com/dagger/dagger/cmd/dagger@$COMMIT binary I believe -> will get the right CLI. Then the CLI itself downloads the right engine, as long as it's from a commit on main (and perhaps not too old I guess)
hmmm not exactly the same I think because in this case I'd assume you could potentially target any commit in the repo. In the script above, you can only target commits that belong to main
Yeah I can build the CLI for any commit, but if that commit is not in main, then that CLI will fail to download the corresponding engine (and not sure what happens then)
the CLI will just fail eventually as it won't find that engine image
@wild zephyr how soon does an engine image become available from main on the registry?
I tried pulling the current head commit, and it doesn't work. Same for a 2-day old commit.
$ docker pull registry.dagger.io/engine:d1e140d84910b0d0bc5427d845e6bdf4d2d16e83
Error response from daemon: manifest unknown
Looks like it's weekly
This commit didn't modify the engine
we only publish the engine when any of the engine files gets modified
it's part of the GHA workflow trigger basically
Ah. That makes my life a little harder
CI on main is also currently stuck with a bunch of jobs pending waiting for a GHA runner for some reason, so the last two commits haven't run the publish job yet
Because I have to manually find the last commit that changed the engine, from the history of the commit I actually want, and build that
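A rough sketch of automating that search, assuming engine images are tagged registry.dagger.io/engine:<full commit sha> as in the docker pull above. The helper names image_exists and first_published_commit are made up for illustration, and the registry check relies on docker manifest inspect, which may need a reasonably recent docker.

```shell
#!/bin/sh
# Sketch only: find the newest commit that has a published engine image.
# Assumption: images are tagged registry.dagger.io/engine:<full commit sha>.

# Hypothetical helper: succeeds if the registry knows this tag.
image_exists() {
  docker manifest inspect "registry.dagger.io/engine:$1" >/dev/null 2>&1
}

# Reads commit SHAs (newest first) on stdin and prints the first one
# for which image_exists succeeds. Returns 1 if none is found.
first_published_commit() {
  while read -r sha; do
    if image_exists "$sha"; then
      echo "$sha"
      return 0
    fi
  done
  return 1
}

# Usage, from a checkout of the dagger repo:
#   git rev-list origin/main | first_published_commit
```

Walking git rev-list from the commit you actually want, instead of origin/main, would give the closest ancestor with a published engine.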
OK most recent working commit I found: 8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916
Update: this fails with the same weird docker error...
still stuck on this 😭
I'm thinking maybe my version of docker is too old?
Which error? Also double-checking all the various _*DAGGER*_ env vars are unset. I tried earlier with that same commit and it worked
@civic yacht seems like this error.... as if the entrypoint is not honored for some reason 🤔
Yeah... When I run that dev CLI the engine starts successfully using the entrypoint as expected... My docker versions are
Client: Docker Engine - Community
Version: 27.3.1
API version: 1.46 (downgraded from 1.47)
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:40:38 2024
OS/Arch: linux/arm64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.21.11
Git commit: 662f78c
Built: Sat Jun 29 00:02:44 2024
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.7.18
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
@tepid nova does docker run --rm --privileged -v /var/lib/dagger registry.dagger.io/engine:v0.14.0 --debug actually work?
^ if that doesn't work then yes, maybe a docker version issue
Nope, I tried earlier and it fails with the entrypoint error
how about this? docker run --rm --privileged -v /var/lib/dagger --entrypoint /usr/local/bin/dagger-entrypoint.sh registry.dagger.io/engine:v0.14.0 --debug
- if you haven't upgraded docker yet 😛
OK now that I upgraded docker, running these commands myself works, but still getting the error when calling dagger... 
(note, upgrading docker was a painful affair, I'm on an old ubuntu and ended up downloading the static binaries and manually writing the systemd unit for dockerd)
COMMIT=$(dagger core git --url=https://github.com/dagger/dagger head commit)
dagger call -m github.com/dagger/dagger@$COMMIT dev-export -o ./dagger-$COMMIT
docker import ./dagger-$COMMIT/engine.tar registry.dagger.io/engine:$COMMIT
./dagger-$COMMIT/dagger core version
^ this should work now. I was about to suggest this approach. What error are you getting?
I will try, but I have been doing this with a very recent commit. Exact command:
curl -fsSL https://dl.dagger.io/dagger/install.sh | sudo -E DAGGER_COMMIT=8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 BIN_DIR=/usr/local/bin sh ; /usr/local/bin/dagger core version
Yeah that works for me. Not sure upgrading docker cleans all previous state, maybe docker system prune -a in case some previous state is messing with something?
found the issue here. docker import is not what we need, it should be docker load && docker tag. That's because docker import doesn't preserve some of the manifest settings metadata
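A minimal sketch of the load-then-tag approach, assuming docker load prints either "Loaded image ID: <id>" or "Loaded image: <name:tag>" for the tarball. The function names loaded_ref and load_and_tag are hypothetical.

```shell
#!/bin/sh
# Sketch only: load an engine tarball and tag it, instead of docker import.

# Extracts the image reference from `docker load` output on stdin.
# Handles both the "Loaded image ID:" and "Loaded image:" forms.
loaded_ref() {
  sed -n -e 's/^Loaded image ID: //p' -e 's/^Loaded image: //p' | head -n 1
}

# load_and_tag TARBALL TAG
load_and_tag() {
  ref=$(docker load -i "$1" | loaded_ref)
  if [ -z "$ref" ]; then
    echo "could not parse docker load output" >&2
    return 1
  fi
  docker tag "$ref" "$2"
}

# Usage:
#   load_and_tag ./dagger-$COMMIT/engine.tar registry.dagger.io/engine:$COMMIT
```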
Trying with the most recent main commit that seems to work: c811cc8b23c6398b4bf3b3ea358733759b9b9257
just tried this myself and that works
Ah it works for me on latest commit!
so weird
For the record, this worked:
sudo rm /usr/local/bin/dagger ; curl -fsSL https://dl.dagger.io/dagger/install.sh | sudo -E DAGGER_COMMIT=c811cc8b23c6398b4bf3b3ea358733759b9b9257 BIN_DIR=/usr/local/bin sh ; /usr/local/bin/dagger core version
And this continues to fail:
curl -fsSL https://dl.dagger.io/dagger/install.sh | sudo -E DAGGER_COMMIT=8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 BIN_DIR=/usr/local/bin sh ; /usr/local/bin/dagger core version
So I think it might be a regression in main? Or perhaps in how we built the engine images from main.
SGTM!
8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 works for me and I checked both platform images and they have the entrypoint set, so I suspect something in your docker state is messing with it. If you imported/pulled/etc. the image corresponding to 8cfb4bc89fbb1a2cba30ecb35feaed5a81fc2916 earlier and it had some manifest cached that ended up not having an entrypoint, that might explain it.
ah yeah, that's probably it 👍 thank you
whoall works frequently on the engine from a macos host other than me?
i've only this morning realized how unfriendly the tests are to macos hosts, like i've been running ./hack/dev dagger call test specific like the hack scripts are doing anything meaningful here, when really they just double wrap everything in my built code for no real gain... all of this is in pursuit of being able to profile an engine while i run tests against it
i've got a lot of ideas as to how to do this effectively, but the decision matrix of which to pursue is rough:
- enhance .dagger/test.go to pull pprof profiles after test runs, so you can call something like dagger call test profile -o /tmp/profile.p (worried this is gonna give me profile files locally that have no function names since the binaries weren't built on my machine, but inside dagger)
- teach the integration tests that they need to put linux binaries into the containers they create, then teach hack (and prolly dev-export too?) to cross compile the CLI if it's on macos
- give up, pull test cases into manual test modules (this is the easiest, but has no benefit for other macos engine devs)
- super give up, spend a couple days building a reproducible & semi-persistent linux setup (this seems deranged given that we're working on a tool that promises to make this sort of thing unnecessary, but less deranged if we acknowledge there's a legit bootstrapping problem here)
having typed that, i think 2 sounds best?
sorry, I don't have a lot of context around the pprof work you are doing. do you mind giving a little context of how you are using the profiling here
lol, yeah, sorry it's very x-y, i'm a couple layers deep rn.
i'm looking to pull profiles (initially pprof, but i wanna play with fgprof because we likely have disk-write perf problems) from an engine while i run ModuleTests against them... to get hyper-specific i'm looking to understand the time cost of OCI importing built-ins
Calling maintainers... We have an opportunity to try one of those "AI software devs" to farm out issues humans don't have time to get to. I need a list of potential candidates for such issues...
@civic yacht @meager summit @spark cedar @stray heron @fair ermine @obsidian rover @final star @rancid turret @surreal berry @wet mason @still garnet 🙏 any suggestions?
the known technique that linux-host-devs do to do this sorta thing is dev && with-dev go test -run "TestModule" ./core/integration && curl docker-container://dagger-engine.dev:6060/debug/pprof/profile
-- this doesn't work on macos bc core/integration likes to load your local built CLI into the container
now regretting having not made a thread
indeed, teaching the int tests and dev-export more about x-OS builds is highly workable here, PR forthcoming
tq for coming to my TED talk
What's the go-to test suite for every day engine dev? dagger call test specific?
depends on how tight you're trying to make the loop. I'll use specific for running 3 or 4 tests and tryna keep it simple
running go test directly gets very nice and useful when you're iterating on the test itself or profiling
and then in terms of subsets of tests i've found myself mostly in TestModule & TestService, but that's largely a function of what i'm working on
regardless, i can't actually get any large subset of the tests to run locally (or dagger-call-locally) in a reasonable time, so it's always small subsets. i wish ./hack/most-tests was maintained just to have a quicker, sub 10m sanity-check-all-the-things
There's an issue for adding a "sanity check" suite like that, agreed we want it. Obviously getting the tests to be faster so we can easily run them all would be even better but in the long term as we accumulate more we'll eventually hit fundamental limits and need a smaller focused set to run locally
i guess i shouldn't say sanity-check, that issue is not what i'd like, i more want to run everything that isn't resource-intensive
Hopefully one day it will be a built-in feature of Dagger Cloud 🙂 cc @still garnet @tidal spire
yeah, actually quite adjacent to flake detection
would be cool to somehow have heuristics for "most useful" tests (based on code-coverage, number of times a non-flake error occurs in unmerged PRs, etc.) 
most efficient is what i'm looking for lol, useful/time
the one problem though with any auto-test-subset-detection is you can really shoot yourself in the foot by assuming that some test you need to run is running when the system has suddenly decided that it should not, and with a big corpus, even showing a diff "heres the ones we're skipping" doesn't end up human-comprehensible
had an idea about changing the entries and glob methods: https://github.com/dagger/dagger/pull/9118
kinda interested in feedback, i had a specific use case where it would have come in very handy
Hi @spark cedar I am working through the scenarios of dagger update, and one scenario is when user runs dagger update <just-name>@<version>.
based on our discussion on the call the other day, I am thinking it should trigger the update and change the version to specified version.
however, when we are parsing this using parseRefString function, it returns the hasVersion: false. I made the change for it to parse the version correctly, and that fixes the testcase (and i am keeping an eye on CI for this).
BUT - what do you think about this change?
🤔 so the reasoning for the way it is currently is that local modules can't currently be versioned
potentially this might be possible in the future using https://github.com/dagger/dagger/pull/9098 (not sure though, tricky to imagine use cases i think)
actually this is easy
i don't think this is a valid case
you shouldn't be able to uninstall by name@version
that's not a module ref
I am currently talking about update scenario.
right or update
User can do “dagger update name”
oh i see, if we have update-to-version semantics
bleh
okay, then i think this is subtly different than module ref semantics.
we should remove the @ before passing to parseRefString
But… it would mean it will just behave like “dagger update name”?
sorry, i'm not making sense, my bad
i mean, we remove it, and keep it 😛
we don't change the behavior of parseRefString - it's not valid to do dagger install ./local/mod@version
for the update case, you take the "@version" off first, and save that - that's the version to upgrade to 🙂
(also, realized - make sure we have a test case for trying to update a local source module - that shouldn't be possible)
since local modules aren't versioned, you shouldn't be allowed to update them
I will add an explicit testcase and validation for it 👍🏻
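To make the order of operations concrete: the real code is Go and goes through parseRefString, so this shell sketch only illustrates "take the @version off first, save it, then treat the rest as the name". The function and variable names are hypothetical, and the "local modules can't be updated" validation is not shown.

```shell
#!/bin/sh
# Sketch only: split "name@version" before any module-ref parsing happens.
# Sets UPDATE_NAME and UPDATE_VERSION (empty when no "@" is present).
split_update_arg() {
  case "$1" in
    *@*)
      UPDATE_NAME=${1%@*}      # everything before the last "@"
      UPDATE_VERSION=${1##*@}  # everything after the last "@"
      ;;
    *)
      UPDATE_NAME=$1
      UPDATE_VERSION=
      ;;
  esac
}

# Usage:
#   split_update_arg "mymod@v1.2.3"
#   echo "update $UPDATE_NAME to $UPDATE_VERSION"
```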
You can’t create a module that conflicts with a core type: ❯ dagger init --sdk=go container Initialized module container in container ❯ dagger functions -m container Error: get module name: input: ...
inspired by this: https://github.com/dagger/dagger/pull/9124
this feels like a useful way to eventually get rid of ./hack/dev - but it feels weird that i can load the docker image using the docker socket directly, but i can't export to the ./bin directory 🤔
it feels like i want a WriteableDirectory type or something to be able to directly write to from a function
this is somewhat neater with the shell fyi - but it still feels like a bit of an api disparity - one of these functions can write stuff to the host, one of them can't
til withunixsocket, last time i tried to kill hack i was put off by the fact i still needed a script to orchestrate the load-to-docker on the host (figured this was probably mitigated by shell, but i didn't even consider using the docker socket)
works on macos 
Overview When alpinelinux.org is down, building dagger fails. This may also affect other dagger modules that use alpine in the same way we do. What I did dagger call -m github.com/dagger/dagger dev...
<@&946480760016207902> I messed up and merged a PR with a committed binary. I want to fix it before too many people pull the tainted history. Do I have your blessing to force-push an amended version of that commit?
👍 sounds good! i've got a local copy of the latest state of main as well, so worst case, i can push that if we make a mistake 🙂
Rewriting history
⚠️ attention maintainers & contributors ⚠️ we had to rewrite history on main. If you pulled main in the last 12 hours, you may have failures next time you pull. If that's the case, you'll have to delete the old main branch:
git checkout <another-branch>
git branch -D main
git fetch origin
git checkout main
Sorry about the inconvenience
Continuing the chain of inspiration 🙂
#!/bin/bash
set -ex
SOURCE=$1
if [ -z "$SOURCE" ]; then
  echo >&2 "Missing source"
  exit 1
fi
TMP=$(mktemp -d)
echo "Installing from $TMP"
cd "$TMP"
dagger shell -i -m "$SOURCE" <<'EOF'
container |
from index.docker.io/docker |
with-unix-socket /var/run/docker.sock $(host | unix-socket /var/run/docker.sock) |
with-workdir /root/dagger |
with-directory . $(dev-export --platform=current) |
with-env-variable COMMIT $(.deps | version | git | head | commit) |
with-new-file load.sh 'docker tag $(docker load -i engine.tar | sed -n -e "s/^Loaded image ID: //p") registry.dagger.io/engine:$COMMIT' |
with-exec sh,load.sh |
file dagger |
export ./dagger
EOF
@fair ermine I think there's a typescript test that's failing pretty consistently (on main and in new PRs) https://github.com/dagger/dagger/issues/9149
Saw this in a PR and then on main, from this commit https://dagger.cloud/dagger/traces/c6bf2d9a604a4307490d228f73bf6d9d?span=9e076b65df227739
Using a recent dev build of dagger, all of sudden I'm getting the same error loading pretty much any module:
error: parse selections: parse field "pin": ModuleSource has no such field: "pin"
Update: this is not happening on main, but on latest build of https://github.com/dagger/dagger/pull/9097. Some kind of regression?
OK I think I'm screwing up somewhere in my technique for building & installing dev engines...
How is this possible (fresh out of a dev-export):
./dagger version
dagger v0.15.0-241209202543-8dac6d5f8db7 (registry.dagger.io/engine:eb738ebe8bf53a80f8061d377dd04934e1489fce) darwin/arm64
--> note the different commit IDs for CLI (8dac6d5f8db7) and engine image (eb738ebe8bf53a80f8061d377dd04934e1489fce) what's up with that???
this is from:
dagger call -m $(github.com/helderco/dagger@helder/dev-4805-shell-navigation-model) dev-export --platform=current export --path=.
The commit I'm building from is indeed 8dac6d5f8db7. So why won't the CLI build inject that same commit into its engine image ref?
I'm very confused right now
@civic yacht sorry to bother you... every other core maintainer is either sleeping, on vacation or sick 😅 am I doing something very stupid above?
I guess I'm trying to replicate manually what @spark cedar implemented here: https://github.com/dagger/dagger/pull/9124/files
My theory:
- when building CLI from commit foo, it will attempt to download an engine from registry.dagger.io/engine:foo.
- By loading the corresponding engine.tar into docker then tagging it as registry.dagger.io/engine:foo, I can trick the dev CLI into using the correct dev engine
Reality:
- The CLI from commit foo attempts to download an engine from registry.dagger.io/engine:bar, where bar is a commit on main
- Therefore I don't know how to reliably inject my engine.tar
Yeah for a local build of a CLI (i.e. not an official release or a build of the CLI off of main), there's no published image for the corresponding engine for us to default to (since it's all based off local code). I guess the current behavior is to default to the engine image corresponding to main, which isn't entirely unreasonable since that's the "closest" image that's actually published somewhere, but it's not going to be a dev engine built off of your branch by default.
But what you typically do in this situation is tell the CLI which engine to connect to via _EXPERIMENTAL_DAGGER_RUNNER_HOST. So you'd need to build the dev engine and load it into docker and then point to it with that env var.
The thing is, my use case is to install a dev build as my "system dagger". So it's inconvenient to have to carry env variables around
Justin's PR does that and defaults the container in docker to "dagger-engine.dev" (but can be overridden via the name arg), so you'd need to set _EXPERIMENTAL_DAGGER_RUNNER_HOST=docker-container://dagger-engine.dev
I see, makes sense.
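One way to avoid carrying the env var around is a tiny wrapper function. This is only a sketch: it assumes the dev engine container is named dagger-engine.dev and the dev CLI sits at ./bin/dagger. DAGGER_DEV_ENGINE and DAGGER_DEV_BIN are made-up override variables, not something dagger reads.

```shell
#!/bin/sh
# Sketch only: run the dev CLI against a named dev engine container.

# Builds the runner host URL; defaults to the dagger-engine.dev container.
runner_host() {
  echo "docker-container://${1:-dagger-engine.dev}"
}

# Runs the dev CLI with _EXPERIMENTAL_DAGGER_RUNNER_HOST set for that call only.
# DAGGER_DEV_ENGINE and DAGGER_DEV_BIN are hypothetical overrides.
dagger_dev() {
  _EXPERIMENTAL_DAGGER_RUNNER_HOST=$(runner_host "${DAGGER_DEV_ENGINE:-}") \
    "${DAGGER_DEV_BIN:-./bin/dagger}" "$@"
}

# Usage:
#   dagger_dev core version
```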
I guess what I need to do to simplify install, is parametrize my build to inject the registry image I want in the CLI binary
looks like that's not exposed in dev-export though
Yeah that's what I was just thinking, that would make sense as arg to the CLI build. The value gets configured at build time via a -X flag to go build, so we could implement that in our CI already I think, no engine code changes needed
(probably, going off memory here)
yeah I'm familiar with the build logic because I recently refactored it all to use github.com/dagger/dagger/modules/go
now there's a nice clean values argument which abstracts away the -X ldflags 🙂
What's missing is an optional argument in github.com/dagger/dagger/cmd/dagger.Binary() to override the image tag
when you're trying to build a go binary, but you have a dependency on the Dagger Python SDK and pythonhosted.org is down...
(one nice benefit of splitting up modules, github.com/dagger/dagger/cmd/dagger doesn't have the python dependency, and therefore it still builds.)
@civic yacht OK I think I managed to hack it just with the dagger CLI 🙂
I'll push a fix! It's not a bug, but a codegen test that I forgot to update. I didn't catch it in the PR because the tests are not wrapped in a job and I might have read the err stdout a bit too fast.
However it's not an issue for users, just internal tests that fail because of that, so no emergency.
Not an emergency, but known red tests are always distracting and potentially hide real fails
PR is here: https://github.com/dagger/dagger/pull/9155 😄
Checks are green, ready for a merge
Thanks!
@civic yacht is there a quick way to totally invalidate engine caches? the WithEnv(CACHE_BUST) thing prolly works fine for my current application, but im curious if there's some hack to throw out the entire engine cache without rebuilding it from scratch
there used to be a dagger --no-cache
dagger core engine local-cache prune
slick and easy
interesting that it's undocumented afaict?
not really sure where the right place to document would even be apart from maybe https://docs.dagger.io/reference/cli/
(this is me implicitly offering to document lol)
unless there's a reason we dont wanna
https://docs.dagger.io/configuration/cache/#cache-inspection we need to update this, it didn't used to be available in the CLI but once we added dagger core it became much easier
I think I found a regression: loading a module without an sdk field now fails, but used to work
What specific command(s)? We have some test coverage for all that but might be missing some path
dagger functions -m github.com/shykes/x
Did that used to work? I tried on v0.15 and v0.14 and get an error on both. I also would expect an error when calling functions since there's no way a module w/ no SDK can provide functions
v0.15.1 🎉
idk how much y'all macos engine devs have played with different container runtime provisioners, i was getting fed up with orbstack producing untrustworthy container stats, and started playing with colima again... it turns out, once you start your colima VM with enough memory and enable rosetta, by my cursory measurements it's significantly faster than orb for ./hack/dev builds, like 20%+, both with a cold cache or a warm one
colima start --vm-type=vz --vz-rosetta --cpu=8 --memory=16 --disk=500
Colima vs Orb
(Don’t know how warm) Orb dev took 2m 13s
Orb, pruned cache, unlimited cpu: 3m 49s
Pruned cache. 8cpu/16ram:
Colima dev took 2m 48s
Orb dev took 3m 26s
Colima dev took 3m 8s
Warm cache, 8cpu/16 ram:
Orb dev took 33s
Colima took 25s
Warm cache, change to bk worker.go, 8cpu/16 ram:
Orb took 57s
Colima took 46s
extremely bullshit benchmarking, fwiw, single runs here, but still
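For what it's worth, the single-run timings above can be turned into percentages with a bit of awk. A small sketch with made-up helper names to_seconds and pct_faster. Taking Orb as the baseline and the first Colima run in each group, it gives 18.4% (pruned cache), 24.2% (warm cache) and 19.3% (warm cache with the worker.go change).

```shell
#!/bin/sh
# Sketch only: convert "2m 48s" style durations and compare two timings.

# to_seconds "3m 26s" prints 206; plain "33s" also works.
to_seconds() {
  echo "$1" | LC_ALL=C awk '{
    total = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /m$/) total += 60 * ($i + 0)   # "3m" + 0 evaluates to 3
      else if ($i ~ /s$/) total += ($i + 0)
    }
    print total
  }'
}

# pct_faster BASELINE_SECONDS NEW_SECONDS prints the time saved in percent.
pct_faster() {
  LC_ALL=C awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (b - n) / b * 100 }'
}

# Usage:
#   pct_faster "$(to_seconds '3m 26s')" "$(to_seconds '2m 48s')"
```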
Interesting, curious what's making a difference there. I've always wondered about the connection from the CLI -> engine for these macos cases, since there's a lot of data that needs to cross a VM boundary, which could plausibly be more/less efficient for different underlying implementations. But also a zillion other things that could impact this.
(Realize that's probably not a rabbit hole worth going down at the moment amongst all possible rabbit holes 😄 , good to know either way)
i have the fridays rn so i'm playing with tools when i told myself i would backfill tests... the other thing i wanted to make work was colima/containerd/nerdctl instead of docker, but seems like we might rely on some specific docker cli output from docker load that's ever-so-slightly different with nerdctl
dev
~/src/dagger/.dagger/mage ~/src/dagger
> dagger call dev-export --platform=darwin/arm64/v8 export --path=/Users/braa/src/dagger/bin
✔ connect 12.8s
✔ load module 1m18s
✔ parsing command line arguments 0.0s
✔ daggerDev: DaggerDev! 11.0s
✔ .devExport(platform: "darwin/arm64/v8"): Directory! 1m10s
✔ .export(path: "/Users/braa/src/dagger/bin"): String! 1.1s
/Users/braa/src/dagger/bin
Full trace at https://dagger.cloud/dagger/traces/8bf485b80d073ef10f90c6bbe251b4d8
Error: unexpected output from docker load: unpacking overlayfs@sha256:c1e560a0d2a6b878a25bb361bb208c5e7996c42e2b700c45c392e0df7ba67999 (sha256:c1e560a0d2a6b878a25bb361bb208c5e7996c42e2b700c45c392e0df7ba67999)...
Loaded image: overlayfs:
exit status 1
~/src/dagger
Better late than never... Here's a PR to start automating the recording of our demo gifs: https://github.com/dagger/dagger/pull/9209
Hi @spark cedar, when we call dagger call some-fn-that-returns-container up, does the last up part start a different server? What I am observing in the logs is that the view is v0.14.0 (from dagger.json) up until the function returns, and then it is v0.15.2.
yes
so each caller gets a different server
the up here is actually a different server, because it's being called from the cli
imo - this is expected - we should not attempt to fix this
we should fix the case where Container.up is called from another module
but from the cli, it should follow the new version - the reason is, if we follow the design of "use the version returned by the last thing", it suddenly breaks all sorts of things - e.g. it means that you can't Container.terminal if a module uses a super old version of dagger when we modified it to make it chainable
when you call up on the cli - it should use the latest api - regardless of who returned it.
we shouldn't do something clever and try and use the version declared by the module - it hugely complicates the version matrix, i'm really hesitant to make versioning even more complex than it already is
hm, well maybe i'm wrong here, up isn't actually versioned
(for AsService)
maybe we should jump into #911305510882513037 since i feel unsure of what's actually going on here
Container.up breakages
Give me 2 mins
^ outcome of chat: https://github.com/dagger/dagger/issues/9190#issuecomment-2545604302
@obsidian rover I'm gonna need to update all the git repos we use in our integ tests, I found most of the creds but not the one for "Azure DevOps public", is that hiding somewhere or do I need to go through you?
The context is sort of funny: I have had to re-work a few parts of dagql for this PR and saw that this integ test for custom SDKs started failing. Turns out, that test should have started failing a long time ago because the type of the introspectionJson arg changed from string->File, but for whatever reason it kept running without an error. I'm not even 100% sure yet what I did to cause it to fail correctly, so I guess I just accidentally fixed a bug but now have to deal with the consequences of updating all the integ test repos 😂
👀 actually, once we have all the creds 🤔
If you put them in a vault somewhere, I'm happy to add some automated dagger pipelines to make all of them mirror the main GitHub ones 🎉
checking 👀
Since currently my pipeline is @ Guillaume, pls pls help me
The credentials for the dev.azure.com are the dagger infra credentials -- do you have them ?
I can do it if you want, don't want to add to your plate ahah 👼
Yeah go for it 🎉🎉🎉🎉
Oki, adding on my todo 👍
@tidal spire putting this on your radar: https://github.com/dagger/dagger/pull/8730#issuecomment-2521824681
Thanks! I'll dig in!
Not urgent, dealing with other stuff first atm ... just want to make sure the design works with vault since that's the primary use case
Private packages in modules 🧵
@civic yacht is it intentional that we use singleflight.Group (from an external dependency) in https://github.com/dagger/dagger/blob/ff0823f0bdccb873e53d4287e72a4d9fc7c7c3a6/engine/buildkit/blob.go#L21 ?
As opposed to our custom SingleflightGroup?
huh, fun realization - we actually haven't been doing graphql validation on inputs since our dagql migration (so all of these fun things have been getting skipped https://github.com/vektah/gqlparser/tree/master/validator/rules)
through an absolutely incredible coincidence, we haven't been importing this package (which sets the rules up) - so we haven't actually been validating any of these 😱
noticed this special little hell in https://github.com/dagger/dagger/pull/9187#issuecomment-2548555634
and when you enable it you get such delightful errors: Fragment "TypeRef" cannot be spread here as objects of type "__Type" can never be of type "__Type".'
ah of course, __Type is not __Type, makes perfect sense
GraphQL Introspection is often weird but that’s extra weird
Sounds like it's probably avoiding some infinite recursion corner case maybe? You are allowed to reflect, but you can't reflect your reflection 🪞 (makes me wonder what happens if you try to get a reflect.Value of a reflect.Value in go 😄 )
In case it's of interest, here's how the OpenAI Go SDK handles the distinction between normal values, null values, and unset fields: https://pkg.go.dev/github.com/openai/openai-go#section-readme
I would love to hear everyone's ideas for "a config file for modules"... It came up earlier with @wet mason (in the context of secrets providers, eg. "can they be designed independently"). I personally really feel the need for it... @upbeat hare I know you guys have your in-house equivalent, I believe a yaml file? Are you happy with your current design? Could it be generalized to being a "config file for my module"?
Basically I want the equivalent of an .env file but for my module. Perhaps we could use an actual .env file? But it may not be structured enough?
whats the issue 🙏
I'm doing something wrong in my custom otel spans, but I'm not sure what... These │ ✔ Missing.plaintext: String! 0.0s are mysteriously appearing... Does that ring a bell?
👋 We do have our in-house config file for a contextual module in yaml, yes.
We are somewhat happy with the design 😅 , our yaml config file generation/parsing moved into its own module as the complexity began to increase, and consuming any field of the config now is a bit sad as it needs error checking for every field
config := dag.Config("file.yaml")
version, err := config.Version()
if err != nil { ... }
Having said that, having a way to pass values to a module through a .env would be quite helpful.
At the moment, what we've seen in our consumers is that they wrap dagger calls around a makefile, to avoid passing a bunch of input parameters that never change
Having a .env will likely render makefiles irrelevant to an extent (for the cases where they are only used for passing inputs, which we've already seen)
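Until something like that exists, a stopgap could be a wrapper that turns a .env file into flags. This is purely a hypothetical sketch: env_to_flags is a made-up name, and it assumes each KEY=VALUE line maps to a --key=value flag that the called function actually accepts.

```shell
#!/bin/sh
# Sketch only: turn KEY=VALUE lines on stdin into --key=value flags.
# Blank lines and lines starting with "#" are skipped.
env_to_flags() {
  while IFS='=' read -r key value; do
    case "$key" in
      ''|\#*) continue ;;
    esac
    # REGISTRY_URL becomes registry-url
    flag=$(printf '%s' "$key" | tr 'A-Z_' 'a-z-')
    printf '%s\n' "--$flag=$value"
  done
}

# Usage (values containing spaces would need more careful quoting):
#   dagger call build $(env_to_flags < .env)
```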
Did we remove the ability to start a graphql server?
I think dagger listen still works, are you having trouble with the token?
This worked for me in the shell:
server() {
container |
from alpine |
with-file /bin/dagger $(github.com/dagger/dagger/cmd/dagger | binary) |
with-exposed-port 8080 |
with-env-variable DAGGER_SESSION_TOKEN onedag |
with-default-args -- dagger listen --listen 0.0.0.0:8080 --allow-cors |
as-service --experimental-privileged-nesting
}
client() {
container |
from alpine |
with-file /bin/dagger $(github.com/dagger/dagger/cmd/dagger | binary) |
with-exec apk add curl |
with-service-binding dagger $(server)
}
client | with-exec curl http://onedag:@dagger:8080/query | stdout
i've noticed some behavior messing around with TestModule locally that smells like an engine memory leak... i run tests and regularly prune the cache between runs, but after 4-5 of these pretty heavy, 8-15m long test runs, im fairly certain i consistently see the engine getting OOM'd -- no panic logs make it to docker, the engine process just disappears and my tests start 502'ing
annoyingly this would take an hour+ to repro via scripting if my hypothesis is correct, so curious if anybody knows how to catch evidence of the OOMkill after-the-fact
(made extra annoying by colima/macos virtualization lol)
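A couple of places to look after the fact, as a sketch under assumptions: the engine container is named dagger-engine.dev, the container state has not been reset by a restart, and the VM's kernel log is reachable through colima ssh. An exit code of 137 means the process got SIGKILL (128+9), which fits an OOM kill but does not prove one. oom_lines is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: look for evidence of an OOM kill after the engine vanished.

# Filters kernel log lines on stdin down to likely OOM-killer messages.
oom_lines() {
  grep -iE 'out of memory|oom-kill|killed process'
}

# Usage, assuming the container name dagger-engine.dev:
#   docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' dagger-engine.dev
#   colima ssh -- sudo dmesg | oom_lines
```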
circling back on memory utilization, putting #s to things before moving on - all numbers here are me looking at process-specific dev engine memory utilization via btop:
- a fresh dev engine uses like 85M.
- running a slightly-trimmed version of ModuleTest (PR forthcoming, there are 4 tests in here that should be treated as benchmarks and ran in serial/isolation)
- it'll peak at ~9.3G memory utilization, then after tests are complete fall back down to ~8.4G at rest...
- rerunning the same suite will make it climb even higher. with the warmed cache it climbs up to 11.2G during the run, 10.6G after
- pruning and rerunning, during the prune we get up to 11.2g, then during the run up to 13G before getting OOMkilled (my box has 16G total, 15.5 userspace, at least 1G of which is consumed by the released engine at any given time)
smells like a leak, no? i should prolly look at CI engine #s to see if they crawl upwards in the same way...
in CI, things look flat, but it's likely we paper over the hypothetical leak by running dagger-in-dagger each time
Does anyone know if there's a non-hacky way to get the client sessionID in userspace? I can't seem to find anything in the SDK and/or generated code that exposes that. Silently pinging @still garnet @spark cedar @civic yacht @paper epoch
the reason why I'm trying to do this is so I can reference a service FQDN by just knowing its hostname
Hey everyone, I want to reach out regarding the issue https://github.com/dagger/dagger/issues/6990. The company I work for needs to have an option to mount the volume from the host, and it is currently not supported.
I would love to work on that feature if you agree, and I did the initial investigation. My thoughts are that having run options that would be passed to containerd will probably be a good approach. When specifying a container, run options could be passed either through a new function, or through functional options for Up, Stdout, etc. I would personally prefer the new option, but I don't have a strong opinion.
I'm happy to work on a proposal, POC or have a complete PR for a review. However, I guess the proposal route would be better since the API would be the most important thing to do right.
Please let me know if you are open for this, so I can start working on it right away :).
the closest thing i can find via grep is potentially a way to set it SessionID? I vaguely recall someone asking a similar question a while back...
i imagine using the helpers around contexts and ClientMetadata in engine/opts.go is probably hacky for whatever application you're looking at, though...
you've presumably seen https://github.com/dagger/dagger/pull/9108 ? cc @wet mason
[WIP] dagger watch by aluzzardi · Pull R...
I was just wondering about Orbstack this morning. Nice to see this testing 🙂
annoyingly there does seem to be a bit of a performance vs stability tradeoff between orb and colima... i've somehow broken the colima network 3-4 times since doing all that perf testing and I don't think i've ever broken the orb equivalent. restarting the vm fixes it though.
there's also some memory_commit config that you gotta add to colima to keep redis from whining.
i have MACOS_ENGINE_DEV.md in my TODO to document this in a less ephemeral spot...
What I'm proposing is:
Flagging this here: #go message
replied
Hello everyone,
I'm new to Dagger and not sure if this is the place to discuss issues for a newbie like me. Currently, I'm trying to expose ports as a service for the development cycle, and I have a few questions that I hope can be answered:
- I'm not sure whether using Dagger to replace Docker Compose to run services for the development process is a good idea or not. For example, running MySQL, Redis, Memcached, MongoDB, phpMyAdmin (all the services that my application depends on).
- Is there any document describing how to start them in parallelism and maintain persistent data during development? I'm trying to do as shown in the code below, but it seems that the exposed services cannot be accessed by their ports from the host machine. Is there anything wrong with my approach?
https://gist.github.com/PhuongTMR/c4eca5508d976a189b0eca5d094ac379
Hello everyone,
While trying to understand why my PR (https://github.com/dagger/dagger/pull/9322) build is failing, I enabled (thanks @spark cedar ) the new merge preview on GitHub to get the link to the dagger cloud traces. But when I follow the link (https://dagger.cloud/dagger/traces/1e2071af9aa48db6c40aa171f4ae9761) I have a 500
If I try to add a v3. in front (just in case 😅 ) I have a different kind of error
thx for reporting @leaden glade . Seems to be working fine for me now. Mind trying again?
maybe it was a temporary hiccup?
It's now working from my side, thanks 👍
@civic yacht @spark cedar @rancid turret @tidal spire 👋
Working on cleaning up the new secret providers API
For context: the current API is SetSecret("name", "plaintext"). The new API (in its current, POC state) is MapSecret("vault://foo/bar") (returns a dagger.Secret as well).
Main difference is: 1) Secrets don't have names anymore 2) Can't set plaintext value 3) Secrets are mapped to a URI
I want to rename MapSecret to something else.
Option 1: NewSecret
Option 2 (my favorite, because more consistent, but breaking): Secret.
We have dag.Container to create a new container, dag.Directory ... it kinda makes sense to have dag.Secret to create a new secret (e.g. foo := dag.Secret("vault://foo/bar")). However, Secret() already exists and it's used to look up a secret by name (which doesn't make sense in the new "world" since secrets don't have names).
Thoughts on this one? IMHO Secret is the consistent answer, however if we'd rather not break things, NewSecret is a fallback. Is secret lookup by name really used?
Secret providers API
Running dagger functions in dagger/dagger
dagger functions
✔ connect 0.9s
✔ load module 3m56s
what can we do to make this better?
hm, main is now failing with weird docker pull failures (the target pr i just merged was green before i clicked merge, i suspect something environmental?)
1 : [53.2s] | Error: failed to serve module: input: moduleSource.withContextDirectory.asModule failed to create module: select: failed to update module dependencies: failed to initialize dependency modules: failed to initialize dependency module: select: failed to create module: select: failed to update module dependencies: failed to initialize dependency modules: failed to initialize dependency module: select: failed to create module: select: failed to update codegen and runtime: failed to generate code: failed to call sdk module codegen: select: failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/library/composer/blobs/sha256:3ae0d9dfc4dada15e6a030ba7b9c9a3b16f9f5a7597a4d46ff24226e91b91db7: 403 Forbidden
1 : [53.2s] |
i can access the library/composer image 👀
have we changed something docker hub creds related? @astral zealot @stray heron
aha, potentially this is an incident on docker hub: https://www.dockerstatus.com/
The official status page for services offered by Docker.
No. This is most likely the reason for it: https://www.dockerstatus.com/pages/incident/533c6539221ae15e3f000031/677ea803d93e4505cabcb2f6
Did you try pulling the image locally? What happens if you delete it & loop retry?
Works perfectly locally 🤔
bump, potential memory leak?
erik correctly points out that the next step here is a pprof heap dump
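For reference, a minimal sketch of what capturing a heap profile looks like in a plain Go process, using only the standard library. How the engine itself exposes pprof is not covered here; this just shows the runtime/pprof side, and the helper names are made up for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns a heap profile in the format `go tool pprof` reads.
// A GC is forced first so the profile reflects up-to-date statistics.
func heapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// heapInUse reports the bytes in in-use heap spans, a quick number to
// compare between runs before digging into a full profile.
func heapInUse() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse
}

func main() {
	prof, err := heapProfile()
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap in use: %d bytes, profile size: %d bytes\n", heapInUse(), len(prof))
}
```

Writing the bytes to a file and comparing two dumps taken a few test runs apart (go tool pprof supports a -base flag for this) is usually what shows whether something is actually growing.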
Proposal for typed secret providers (spun out of a comment thread on Andrea's PR): https://github.com/dagger/dagger/issues/9342
This proposal is for a variation of the design prototyped in #8730. It started as a comment in that PR, which I'm formalizing here. Overview In #8730 we introduce the concept of secret provider...
is it me or CI seems dead?
7 : │ │ │ │ exec docker pull registry.dagger.io/engine:c427926cc12fc0b01d1ad894e810dc6bb6948353 ERROR [0.7s]
7 : │ │ │ │ ! failed to run command: exit status 1
7 : │ │ │ │ [0.7s] | Error response from daemon: manifest unknown
not dead, but test provision is failing because of a missing tag
also this:
+2025/01/09 XX:XX:XX WARN failed to set up OTel resource error="1 errors occurred detecting resource:\n\t* conflicting Schema URL: https://opentelemetry.io/schemas/1.25.0 and https://opentelemetry.io/schemas/1.26.0"
I see main is green, rebasing ...
yep that fixed it
actually no, php still failing (it's been a while). I see you did a php change @spark cedar, is it supposed to have fixed CI or is that unrelated?
mmm this is a known flake
i'll open an issue so we can track it 🙏
php-dev constantly failing was the fix i worked on
this one is unrelated 😢
sooo say i'm inside of a go test process running inside dagger, and im adding otel spans for better obs... if i want to get git metadata about the repository, how might i do that? seems like we strip out the .git directory, but is there a reverse api where i can ask the engine what SHA i'm on?
is the repo the same as the current module, or different?
the same in this case
my initial approach didn't involve module code fwiw, that's what i'm gonna try next. i was just inside core/integration/testctx.go
but if i can get the git info from the dagger/.dagger module and pass it down to the go test process that would at least solve the problem (although it wouldn't be as easily reusable from the testctx-as-a-library-others-might-use perspective)
Trick question: if I run a container with dagger, and inside that container I run the dagger CLI and expose ports on the "host" (my container)... Is there any way to dynamically inspect those ports from the top-level client?
eg.
#!/usr/bin/env dagger shell
INNER=$(
container |
from alpine |
with-file /bin/dagger $(github.com/dagger/dagger/cmd/dagger | binary) |
with-exec --experimental-privileged-nesting dagger shell -c 'container | from nginx | with-exposed-port 80 | up'
)
# FIXME: how to introspect ports forwarded by the inner "up" and expose them here?
# (assuming I don't know them in advance)
<something something> $INNER | up
in the github actions context, we can possibly do it using github-context. e.g. https://github.com/dagger/dagger/pull/9011/files#diff-844c5083ad869965eb8bea7310291bcec5f105c66c861d61faa17a43579b3d8c
While working on the Java SDK, I'm curious about your workflows.
For instance, I want to run dagger functions in a Java module. So for that I need the runtime module.
Once I'm inside a dev engine (I'm doing that with ./hack/dev zsh if that's the right way) is there an easy way to rebuild the runtime module? (in my case the sdk/java/runtime I'm working on) Something quicker than to rebuild the full engine.
Maybe there's a dagger command for that I haven't found?
And second question, is there an easy way, once it's built, to explore this runtime module content? (the container returned by moduleRuntime)
Hi @spark cedar, re: windows terminal issue, when I run echo ".core | container | from alpine | terminal" | dagger shell, it works fine, but when I type in exit, the terminal clears up and stdout is NOT restored until I press any key.
does that ring any bell?
Anything we can do about that dagger CLI build being not fully cacheable?
Function call caching
so first step is rm'ing SetSecret, which is the biggest blocker by far, after that it's just a few more adjustments
It looks like it's wolfi stuff
Called twice, once by dagger/dagger/version and another time by dagger/dagger/modules/go, which somehow makes a big difference.
Besides function call caching, isn't there something we could do in the implementation of wolfi/apko itself? I see fresh http requests to the wolfi registry every time, maybe we could pin versions or digests or something at the wolfi level?
maybe vendor an apko index file or something?
I'll look into it
(unless you tell me not to bother 🙂 )
I think something like that could help in theory, provided the apko lib supports it or we can implement that ourselves. Specifically that might help avoid extra requests happening in this chunk of code: https://github.com/sipsma/dagger/blob/9a18a0bd7f346ad9a9c2d05e2d7ac77999d3051d/modules/alpine/main.go#L99-L136
But the majority of the http steps you're seeing in that trace are from here: https://github.com/sipsma/dagger/blob/9a18a0bd7f346ad9a9c2d05e2d7ac77999d3051d/modules/alpine/main.go#L207-L207
Which is just the raw buildkit HTTP source op. It should just be making an HTTP HEAD request, get an etag and see it already has the download cached and re-use that. Those also run in parallel so the overhead is just a bunch of HEADs, so hopefully not adding tons of time. It does look like buildkit's HTTP source implementation supports some fast paths if you specify an extra Checksum opt, so maybe there's a path to using that? But would require core API changes to dag.HTTP
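The HEAD-plus-etag check described above can be sketched in a few lines of Go. This is not buildkit's implementation, just the general technique: ask for the headers only, compare the ETag with the one stored next to the cached download, and skip the GET when they match. The `fakeIndex` server is a local stand-in for a package index.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// needsDownload issues a HEAD request and compares the returned ETag with the
// one remembered from the last download. It returns the current ETag as well.
func needsDownload(url, cachedETag string) (bool, string, error) {
	resp, err := http.Head(url)
	if err != nil {
		return false, "", err
	}
	defer resp.Body.Close()
	etag := resp.Header.Get("ETag")
	// No ETag at all means we cannot prove the cache is fresh.
	return etag == "" || etag != cachedETag, etag, nil
}

// fakeIndex is a hypothetical stand-in server that always reports one ETag.
func fakeIndex(etag string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Method == http.MethodGet {
			fmt.Fprint(w, "index contents")
		}
	}))
}

func main() {
	srv := fakeIndex(`"v1"`)
	defer srv.Close()
	stale, etag, _ := needsDownload(srv.URL, "") // nothing cached yet
	fmt.Println(stale, etag)
	stale, _, _ = needsDownload(srv.URL, etag) // cache matches
	fmt.Println(stale)
}
```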
Hard to say if all that's worth it since it would be almost entirely invalidated once function caching is a thing, which is hopefully not too far away thanks to the new secret stuff Andrea is doing
nice, thanks for the details!
fyi @rancid turret, looks like readthedocs ci checks have started failing: https://readthedocs.org/projects/dagger-io/builds/26828622/
hallo y'all - was curious if i could get a little discussion started on https://github.com/dagger/dagger/issues/8421 again - i came up with a use case where it's quite painful to work against this, so was hoping to try and work on a fix at some point
specifically @fair ermine and @rancid turret since I'm struggling with how you'd express this type-safely in typescript and python
Left a braindump, I need to think more about it but it's doable.
TypeScript and Python are typed but you can do some magic that can bypass the type and still be considered safe, I think your idea is even doable with throwable error (see my comment)
i'm real close to done with the initial benchdev PR, I've got 1 TODO left: make some call on how to handle caching for these runs.
summarizing the change: benchmarks are doing dagger call test all --bench and running on compute that's guaranteed 1 run per node. you can trigger them pre-merge with a "benchmark" PR label, and then they run on cron against main. each benchmark runs in serial so as to avoid x-bench pollution. in follow-up PRs im gonna try to add the span data needed to graph individual benchmark durations over time on main, and for pre-merge runs dagger cloud displays the spans
@still garnet @civic yacht seeking opinions: ideally individual go test benchmarks should be more order-agnostic than they are right now, but we're also building a dev engine to run against, and I don't particularly want to prune the whole engine cache because that can be slow. any bright ideas for the best way to set up bench tests so the first one isn't consistently slower than the rest?
hm, trying to track down a bit of a performance bottleneck, would maybe appreciate a hand 🤔
It seems like the test-module-runtimes has almost doubled in runtime since the start of January: it used to run in around 10 minutes - it now seems to run in around 20 minutes when succeeding, and often seems to time out (e.g. https://github.com/dagger/dagger/actions/runs/12771044484/job/35597270820 https://v3.dagger.cloud/dagger/traces/dff74f9620be00d277b967e2df0249cb)
have we changed anything particularly significant? or has this always been the case?
i have noticed this too, but mostly because i had been focused on making test-module shorter and then the long pole moved to test-module-runtimes
hmmm poking around honeycomb actually indicates it's been about like this for a while
so ignoring the massive spike:
all of the little bumps are at about the 20minute mark
and the bottom baseline is about 10minutes
maybe i'm honey-combing wrong tho
and you can see the trend you're describing, new year, around the 7th the peaks get taller (what got committed evening jan6, morning 7th?)
lots of interesting stuff in Go 1.24: https://antonz.org/go-1-24/
Go 1.24 interactive tour
can i get an approval on https://github.com/dagger/dagger/pull/9388 so we can start using v0.15.2 in our ci?
@still garnet re: accessor. Actually a good question. I was wondering about the same:
https://github.com/dagger/dagger/pull/8730#discussion_r1905325342
The tricky bit is the "front" API is the same for both new secrets and old secrets
@still garnet e.g. current API returns a "selector" with the accessor etc
Wondering if we could just return the Secret, with a reference to the URI and that's that
The other tricky part is leaking -- we don't want modules creating "root" secrets (e.g. file://...), instead scope them to their own module
ah gotcha
[/cc @civic yacht ^^]
I was thinking of leaving it like that, and when we do remove the "old secrets API", we might revisit that
I'll take a look at the PR again closely today. There's some changes in my mostly unrelated PR that might end up simplifying things here by making it easier to control the caching around all this, we'll see
for some reason when I am running dagger v0.15.2 commands on my windows laptop I am getting "exec format error", while v0.15.1 works fine.
Hi Justin, would you have some time to discuss the sdk string to struct changes.
@rancid turret @spark cedar 👋 I'm wondering what's the simplest way to introspect a module nowadays to list all available functions etc via the API?
(/cc @civic yacht)
back in the days there was a hidden __sdl in each module, and a way to programmatically load modules/introspect, not sure if that's still around
working my way backwards from the CLI code
I know last year Helder had a reusable introspection module, I forget what it was called
@still garnet @wild zephyr oh actually -- how are you doing introspection for daggerverse?
probably just using the Dagger API, there's like CurrentModule.TypeDefs, Module.objects, etc. (or maybe Module.initialize.objects)
answer might depend on where you're introspecting from - current module, or module ref?
a module ref
Yeah, that's what I ended up doing ... but I had to copy/paste 1k+ lines of boilerplate
@wet mason you can look at the shell source code, it had to implement all this very recently, and it's encapsulated in a relatively condensed codebase
and counting 🙂 still not working
Helder had to deal with the API sprawl and pull it together
yeah I'm looking at dagger functions (which should be the same as the shell), but there's a TON of helper code in there
so maybe you can piggy back on that
private helper code, like thousands of lines
It's just that dagger functions (and the rest of the CLI) is huge and complicated
and shell was an opportunity to clean up and simplify with a smaller footprint and less baggage
might work to your advantage
wondering if that could have been moved up the stack, directly into the API, rather than the client itself
not too late to propose it
should we expose a Module.schema: JSON (or something) that returns its GraphQL schema in the "standard" format?
then you'd just do one big json.Unmarshal or whatever
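A rough sketch of the "one big json.Unmarshal" idea, assuming the hypothetical Module.schema field returned the standard GraphQL introspection shape (`__schema.types[].fields[]`). The struct covers only the few keys needed here and the sample document is made up:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// introspection mirrors a small slice of the standard GraphQL introspection
// result. Only the keys used below are declared.
type introspection struct {
	Schema struct {
		Types []struct {
			Kind   string `json:"kind"`
			Name   string `json:"name"`
			Fields []struct {
				Name string `json:"name"`
			} `json:"fields"`
		} `json:"types"`
	} `json:"__schema"`
}

// fieldsByObject decodes the dump once and lists field names per OBJECT type.
func fieldsByObject(raw []byte) (map[string][]string, error) {
	var dump introspection
	if err := json.Unmarshal(raw, &dump); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	for _, t := range dump.Schema.Types {
		if t.Kind != "OBJECT" {
			continue
		}
		for _, f := range t.Fields {
			out[t.Name] = append(out[t.Name], f.Name)
		}
	}
	return out, nil
}

// sample is an invented schema dump, not real Dagger output.
const sample = `{"__schema":{"types":[
  {"kind":"OBJECT","name":"Foo","fields":[{"name":"build"},{"name":"publishFile"}]},
  {"kind":"SCALAR","name":"String","fields":null}]}}`

func main() {
	fields, err := fieldsByObject([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["Foo"])
}
```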
it's all the same from what I see -- shell or call are using the same underlying stuff
But we have all those API calls to supposedly spare you the effort of parsing that gql schema in the first place
at the moment with how all the SDKs work, where each individual field selection involves a roundtrip, it's much worse (in terms of effort and performance) than just exposing the semi-standard format imo
In the "old days" there was a __sdl field (borrowed from graphql federation best practices)
Does the rabbit hole go deeper, ie. should we revisit past design decisions on how this part of the API works?
to be clear i don't think it's wrong at all that we have our own introspection API too, i think it just ends up being less convenient in some cases, and maybe we can add a helper
Oh wow ... we actually have source mapping? e.g. function to source file/line
that'd be pretty cool to include in telemetry
yep thanks to @spark cedar 🙂
neat
In theory we could even link back to the corresponding link on github, since we have references to the module source code and exact version too?
I think having our API still makes sense for simple things like dagger functions - but once you start needing a full view of everything, that's where just getting a big schema dump becomes more convenient
That may be true but also our SDKs are just not good at querying data in the Dagger API - for API introspection or anything else. So it would also be nice to fix that. Maybe it makes the need for a dump escape hatch less acute?
I run into this all the time for my own modules
yeah true - gets back to one of the older bikesheds 😛
actually, that's kind of what i did with daggerverse at the beginning - i just did one big query and unmarshalled into my own struct type. maybe it still does that?
Just having a first class, well-documented GraphQL escape hatch in all SDKs would go a long way
here's what daggerverse does: https://gist.github.com/vito/a76dbf8a960666c2560e694925f92440 (cc @wet mason)
sweet, thank you!
pretty tedious writing the query + schema defs, but it's a viable escape hatch. would be nice if we could do something like https://github.com/shurcooL/graphql which infers the query from the struct
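The "infer the query from the struct" idea can be sketched with reflection: walk the struct, use the field tag as the GraphQL field name, and recurse into nested structs and slices. This is a simplified illustration rather than how shurcooL/graphql is implemented (it reuses `json` tags and ignores arguments), and the `moduleQuery` shape is illustrative rather than the real Dagger schema.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// selection builds a GraphQL selection set from a struct type. Scalars yield
// an empty string; structs (possibly behind pointers or slices) yield "{ ... }".
func selection(t reflect.Type) string {
	for t.Kind() == reflect.Ptr || t.Kind() == reflect.Slice {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return ""
	}
	var parts []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := f.Tag.Get("json")
		if name == "" {
			// Fall back to the Go field name with a lowercased first letter.
			name = strings.ToLower(f.Name[:1]) + f.Name[1:]
		}
		if sub := selection(f.Type); sub != "" {
			name += " " + sub
		}
		parts = append(parts, name)
	}
	return "{ " + strings.Join(parts, " ") + " }"
}

// selectionOf is a convenience wrapper taking a value instead of a type.
func selectionOf(v any) string { return selection(reflect.TypeOf(v)) }

// moduleQuery is a made-up result shape used to drive the example.
type moduleQuery struct {
	Name    string `json:"name"`
	Objects []struct {
		Name      string `json:"name"`
		Functions []struct {
			Name string `json:"name"`
		} `json:"functions"`
	} `json:"objects"`
}

func main() {
	fmt.Println(selectionOf(moduleQuery{}))
}
```

The same struct can then be the target of json.Unmarshal for the response, so the query and the decoding types cannot drift apart.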
what's the cleanest module we have around, using the latest best practices (constructors etc) @still garnet @tepid nova ?
owned by Dagger?
by anyone
Daggerverse modules are in score order.
couple of good ones
Thank you!
@wet mason I like github.com/dagger/dagger/modules/go it's pretty meaty and works well
recently refactored
& used in a real project (ours)
I wanted to compare with my own modules, see if I was messing up best practices ...
One annoying bit:
A) The only way to "persist" state is through public fields
B) Best practice is to have a New constructor and then store arguments into fields
A+B = modules have their own constructor argument exposed as public functions
Not a huge deal, but it gets noisy
I have a module with ONE useful function (for now), but when I introspect, I see 6-7 public functions (most of them are just my constructor fields, saved as Public fields)
You can hide ones you don't want in the API https://docs.dagger.io/api/state/
Same here -- publishFile etc are the meaty ones, but they get mixed up with e.g. username and password which are just the constructor arguments, stored inside public fields
Yeah I use +private a lot to hide fields.
(but sometimes I find it useful to keep them public, depends on how I want people to use the module)
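A small sketch of the pattern being discussed: a New constructor stores its arguments in exported fields, and a `+private` comment on each field keeps it persisted while hiding it from the module's API. The module and field names are hypothetical; in a real module Password would be a secret type rather than a string, and the comment only has an effect when processed by the Dagger Go SDK (in plain Go it is just a comment).

```go
package main

import "fmt"

// Registry is a hypothetical module object with one useful function.
type Registry struct {
	// +private
	Username string
	// +private
	Password string
}

// New is the constructor; its arguments are saved as state on the object.
func New(username, password string) *Registry {
	return &Registry{Username: username, Password: password}
}

// PublishFile is the function users should actually see when introspecting.
func (r *Registry) PublishFile(path string) string {
	return fmt.Sprintf("publishing %s as %s", path, r.Username)
}

func main() {
	fmt.Println(New("alice", "s3cret").PublishFile("a.txt"))
}
```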
and these are the constructor arguments
ah nice
@obsidian rover
failed to resolve source metadata: failed to get credentials: error getting credentials - err: signal: killed, out: ``
Does that ring a bell?
Happens when I try to pull docker images
@wet mason friendly reminder that Guillaume is in France so it's very likely that he won't see this until Monday 🙏
Never mind! It was docker desktop causing issues
@wet mason re module introspection: actually @tidal spire and Guillaume have been working on a "loader" library which has been spawned from the daggerverse code. Maybe that's something useful: https://github.com/dagger/dagger.io/pull/4173
we are going to use this in the "Cloud modules" project to introspect modules after scanning an org repo or receiving a webhook event about a new module pushed to the org
Hey there.
Taking a shot at https://github.com/dagger/dagger/issues/7721 (tbh. with a limited amount of go knowledge). I think I got through passing the rewriteTimestamp:bool down from Container publishArgs/exportArgs to the actual publish operations in buildkit (making them actual properties on Container.export() and Container.publish()).
However, passing the SOURCE_DATE_EPOCH itself, I think this should probably be something that should not be picked by the engine but at the client during session initialization (a "frozen" value that would then be used for all calls within that session) - so wdyt. Would you prefer something where the epoch is:
- picked up by the engine from the environment globally (IF the client is creating the engine, it would pass it from its environment as env)
- read from env SOURCE_DATE_EPOCH during client creation and passed to the engine as optional ClientMetadata.SourceDateEpoch (similar to how CloudToken or DNT are passed at client creation time) <- my current preference
- an explicit optional "Epoch" field on Container.withExec/export/publish Args?
- anything else?
What are you trying to do? I'd love to have binary reproducible builds (motivation: https://reproducible-builds.org/) in dagger. Why is this important to you? Beside its nice security propertie...
I'm trying to fix the "return a list of containers" bug (https://github.com/dagger/dagger/issues/8202) with this PR https://github.com/dagger/dagger/pull/9425. It has a simple test case that covers the issue, but the fix feels a bit hacky to me. I'd appreciate it if anyone could point me toward the right fix.
What is the issue? From this snippet: $ cat main.go package main import ( "dagger/reproduce-bug/internal/dagger" ) type ReproduceBug struct{} func (m *ReproduceBug) Containers() []*dagger...
List of containers
Did you know that every SDK module contains the source files for every other SDK module? And for the whole Dagger engine and all of its tests, for that matter.
Give it a try: dagger core module-source --ref-string "./sdk/python/runtime" resolve-from-caller as-module source context-directory directory --path=sdk entries
I started a discussion about it on GitHub: https://github.com/dagger/dagger/discussions/9303
Reproducible Builds
rust jobs are failing on main because of an unpinned dependency 😢 https://github.com/dagger/dagger/pull/9440
ooops accidentally broke some releasing things: https://github.com/dagger/dagger/pull/9443
If we wanted to get the filesystem state of a container-backed Service during or after execution, could we? ie. does buildkit allow getting that state at all?
Sorta, it's still just snapshots underneath but the abstractions on top of it in that codepath are different and pretty unwieldy. Possible but would be very hacky and convoluted in the current state of things
Would it be cleaner to do after execution than during?
(I'm trying to imagine a possible "persistent services" feature where services could be configured to be persisted across engine restarts, or even engine migrations)
No that would make it worse, the problem is just that the codepath we have to go through for services is something bk calls a "gateway container", which is different than the containers made for exec-ops. The codepaths there are pretty hardcoded to assume that the filesystems are ephemeral, so you would have to jump through a million hoops to trick everything to make them not-ephemeral. Sure it's possible some way or another but probably a nightmare.
It might be possible to change services to not use gateway containers and instead be exec-ops, but IIRC that's what @still garnet tried in the first services v1 impl but then hit weird buildkit scheduler bugs and some other problems I can't remember now and moved off of them. Some stuff has changed since then, but even if that's plausible now it would be a significant lift to make that move
To be clear, on a purely conceptual level this should all be totally possible, the only blockers are implementation details
I guess the escape hatch is cache volumes
Unfortunately, yeah
(but in the future could be some other primitive we concoct, that also builds on buildkit bind mounts)
what about one-off exporting files from the gw container's fs into the DAG, kind of like Host.Directory() ?
so you wouldn't get the whole FS state as a Directory object, but you could query for some files to be exported, and under the hood it would be implemented with a bunch of non-atomic walk readfile etc. Maybe too hacky to be worth it
If the service is running then yeah this is plausible, we can get the paths where the services filesystems are mounted and cp stuff out for export essentially. But yeah, still pretty hacky and really straining ourselves to trick the underlying system into cooperating w/ us
Makes sense. Dropping this particular angle 🙂
OK another question: is there any way for a Dagger module to receive a generic object (ie. an interface with no specified methods), and somehow introspect the whole schema of the concrete implementation?
And from there, build queries to call that object?
This 👆 is what it would take to turn any Dagger object into an agent, without having to patch the engine
it's technically possible given the content/structure of IDs, but the APIs don't exist for it. IDs contain their type, and any modules they need, so you should be able to load/serve those modules and inspect the schema for its type
Yeah, as long as you are willing to be dynamic and not have codegen'd apis for calling it, that would be theoretically possible and supportable. But not really possible today most likely
I'm not even sure if interfaces would be the right approach there, sounds more like something different like a type called Dynamic that has generic methods for getting the apis and calling them in type-unsafe ways ("stringly typed" calls)
I'd want to be 100% sure that's the only option to support that use case before pursuing it but sounds plausible
well the alternative is to patch the engine...
Actually I'm pretty sure some early iteration of the module API had a field on Module named call(args: JSON): JSON which is fairly similar conceptually. I think I removed it because there was no use and it was becoming a pain to maintain atm, but maybe there's a use for it now
What do you mean patch? Like the engine code itself? I guess I just need more details on the use case
My use case would be to "agentify" any dagger object by plugging it into a llm. eg.:
func (m *Langdag) Agentify(env *dagger.Object, llm *dagger.Service, prompt string) *dagger.Object
Oh okay sure, basically just want to give up type safety in terms of your module code and do everything dynamically w/ the llm. Yeah, we'd just need a new type like Object that has fields for making calls with strings/json/whatever
I don't think it would be a huge effort to add support for that
Another angle would be to create a new SDK for this. If you implement an SDK you handle receiving args/returning values as raw JSON, which is close to what you're describing anyways. So that would actually be plausible today. Not convinced that's easier than us just adding support for dagger.Object but food for thought
Is there an escape hatch at the moment? 🙂
I'm trying to find out what's the path of least resistance for a POC:
- Module
- Engine patch (implement this in the core API)
- External tool
Well now that I think about it, today it is possible to pass arguments of type Module, and I guess that implies that in your module code you could do e.g.
func (m *Langdag) Agentify(mod *dagger.Module) {
mod.Serve() // replaces your schema with the schema for calling the mod
dag.GraphQLClient().MakeRequest(...) // make raw gql queries to the newly loaded schema, including introspection
}
Not 100% sure if it works but also can't think of why it wouldn't 😄
You can then pass around any other dynamic state as type string (including IDs serialized to strings)
That's my best stab at what's possible for a POC atm. But adding support for dagger.Object really doesn't strike me as that hard (often famous last words of course, but might actually be true), it wouldn't require anything very new because of what @still garnet mentioned about how IDs work
passing a dagger.Module doesn't get me much, because I can just load it myself from ref (Andrea's POC does that)
I was hoping that passing a dagger.Object allows me to setup my object state with complete freedom, eg. env := dag.Foo(src).WithToken(bar).WithSource(bla); dag.Agentify(env)
Object does sound neat... would it have fields for introspection (typeName?), and a dynamic call API? (maybe a step too far: a asFoo field for every type?)
env.MarshalJSON() exists and would let you pass it as type string, would that work?
Yeah that's what I'm imagining
My fallback I think, is to use @wet mason 's current implementation, which is: 1) load module from ref + 2) introspect module constructor (and only module constructor) and try to infer what to pass, eg. secret argument bar of module foo is loaded from env variable FOO_BAR
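The naming convention in step 2 (secret argument bar of module foo comes from FOO_BAR) is easy to sketch. The real implementation isn't shown in this thread, so the normalization rule below (uppercase, anything non-alphanumeric becomes an underscore) is an assumption:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"unicode"
)

// secretEnvVar derives the environment variable name for a constructor
// argument: module "foo", argument "bar" -> "FOO_BAR".
func secretEnvVar(module, arg string) string {
	clean := func(s string) string {
		return strings.Map(func(r rune) rune {
			if unicode.IsLetter(r) || unicode.IsDigit(r) {
				return unicode.ToUpper(r)
			}
			return '_'
		}, s)
	}
	return clean(module) + "_" + clean(arg)
}

// lookupSecret resolves the value for an argument, or reports which
// variable was expected so the caller can tell the user what to set.
func lookupSecret(module, arg string, lookup func(string) (string, bool)) (string, error) {
	name := secretEnvVar(module, arg)
	val, ok := lookup(name)
	if !ok {
		return "", fmt.Errorf("missing %s for argument %q of module %q", name, arg, module)
	}
	return val, nil
}

func main() {
	fmt.Println(secretEnvVar("foo", "bar"))
	_, err := lookupSecret("foo", "bar", os.LookupEnv)
	fmt.Println(err)
}
```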
ooh, I could pass a dagger shell script instead of a ref?
for custom init
.... which makes me think this could be an experimental shell builtin?
github.com/my/agent --token=FOO | with-home-directory ./home/dir | with-db tcp://localhost:4242 | .agentify
- Arbitrary object construction ✅
- Access to typedefs for introspection ✅
- Access to query builder / client ✅
- Avoid spelunking in the engine internals ✅
- Avoid custom wrapper tool ✅
Downside: not programmable. So, not nestable
I'm looking for the implementation of the id() resolver for custom types in core... Any pointers would be appreciated 🙂
Ah, found a trail at core.ModuleObject.Install
Aha! Looks like dagql is where I might need to attach myself? Same level as ID()
@still garnet does it even remotely make sense to patch dagql itself, so that every object gets a Prompt() alongside an ID()? (where prompt is my POC implementation of introspecting that object's fields, and plugging them into a llm as tools)
@still garnet quick dagql question if you're around.
func (s *Server) installObject(class ObjectType) {
class.Extend(
FieldSpec{
Name: "prompt",
Description: "prompt a LLM with this object as environment",
Type: class.Typed(),
ImpurityReason: "I have no idea what I'm doing",
Args: []InputSpec{
{
Name: "prompt",
Type: String,
},
},
},
func(ctx context.Context, self Object, args map[string]Input) (Typed, error) {
promptArg, ok := args["prompt"]
if !ok {
return nil, fmt.Errorf("no prompt specified")
}
prompt := args["prompt"].??? // <------ 🤔
ctx, span := Tracer().Start(ctx, "[👨] "+prompt)
// insert magic here
span.End()
},
)
How do I get the actual string value for the prompt arg ?
I'm going to try ToLiteral().Display() since it's the only method I can find that returns a string
this is a bit off the beaten path so the API isn't pretty, but you can just cast it to .(dagql.String) since you at least know it's the same type you configured in Args
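For reference, a minimal self-contained sketch of that cast. Input and String here are hypothetical stand-ins for dagql.Input and dagql.String (the real interfaces have more methods), so treat this as an illustration of the type assertion rather than the actual dagql API:

```go
package main

import (
	"errors"
	"fmt"
)

// Input is a hypothetical stand-in for dagql.Input.
type Input interface {
	isInput()
}

// String is a hypothetical stand-in for dagql.String: a named string type
// that satisfies Input.
type String string

func (String) isInput() {}

// promptArg looks up the "prompt" argument and asserts it back to String,
// the type it was declared with in the field spec.
func promptArg(args map[string]Input) (string, error) {
	raw, ok := args["prompt"]
	if !ok {
		return "", errors.New("no prompt specified")
	}
	s, ok := raw.(String)
	if !ok {
		return "", fmt.Errorf("prompt: expected String, got %T", raw)
	}
	return string(s), nil
}

func main() {
	prompt, err := promptArg(map[string]Input{"prompt": String("hi")})
	fmt.Println(prompt, err)
}
```

Using the two-value form of the assertion avoids a panic if the argument ever turns out to be a different type than the one configured in Args.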
Noooo unrelated docker install fail 😭
Error: input: container.from failed to resolve image "docker.io/library/alpine:latest" (platform: "linux/arm64"): failed to resolve source metadata for docker.io/library/alpine:latest: DeadlineExceeded: failed to get main client caller: no active session for w450zguly0n08y955d3xkpwp0: context deadline exceeded
nice, thanks
I may be jinxing it, but this feels like perhaps the perfect layer for my POC
@still garnet how do I declare that my argument is of type string? String gets a type error:
Args: []InputSpec{
{
Name: "prompt",
Type: String, // <-- wrong
},
},
String("") should do
😢 😢 😢 😢 😢 😢
Error: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.15.2 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
Run 'dagger call dev --help' for usage.
back on track
⋈ directory | prompt hi
Error: input: directory.prompt panic while resolving Directory.prompt: runtime error: invalid memory address or nil pointer dereference
⋈ directory | .doc prompt
prompt a LLM with this object as environment
USAGE
prompt <prompt>
REQUIRED ARGUMENTS
prompt string
RETURNS
Directory - A directory.
Use "prompt <prompt> | .doc" for available functions.
⋈ directory | prompt hi
Error: input: directory.prompt panic while resolving Directory.prompt: runtime error: invalid memory address or nil pointer dereference
goroutine 29708 [running]:
runtime/debug.Stack()
/usr/lib/go/src/runtime/debug/stack.go:26 +0x5e
github.com/dagger/dagger/dagql.(*Server).resolvePath.func1()
/app/dagql/server.go:689 +0x78
panic({0x21bfe20?, 0x3ca8f50?})
/usr/lib/go/src/runtime/panic.go:785 +0x132
github.com/dagger/dagger/core.(*Directory).PBDefinitions(0x3cb2e00?, {0x29d3470, 0xc00065aa20})
/app/core/directory.go:51 +0x2a
github.com/dagger/dagger/core.collectDefs({0x29d3470, 0xc00065aa20}, {0x29ac2a0?, 0x0})
/app/core/telemetry.go:27 +0xa2
github.com/dagger/dagger/core.AroundFunc.func1({0x29ac2a0, 0x0}, 0x0?, {0x0, 0x0})
/app/core/telemetry.go:139 +0x730
github.com/dagger/dagger/dagql.Instance[...].call.func1.1()
/app/dagql/objects.go:441 +0x30
github.com/dagger/dagger/dagql.Instance[...].call.func1()
/app/dagql/objects.go:474 +0x3f0
github.com/dagger/dagger/dagql.Instance[...].call(0x29fc880, {0x29d3470, 0xc00065a6f0}, 0xc0000c0cc0, 0xc000635740, 0xc00065a930)
/app/dagql/objects.go:482 +0x229
github.com/dagger/dagger/dagql.Instance[...].Select(0x29fc880, {0x29d3470, 0xc00065a6f0}, 0xc0000c0cc0, {{0xc0005aa510, 0x6}, {0xc000999ba0, 0x1, 0x1}, 0x0, ...})
/app/dagql/objects.go:387 +0xac5
github.com/dagger/dagger/dagql.(*Server).resolvePath(0xc0000c0cc0, {0x29d3470, 0xc00065a6f0}, {0x29d8460?, 0xc0006344c0?}, {{0xc0005aa510, 0x6}, {{0xc0005aa510, 0x6}, {0xc000999ba0, ...}, ...}, ...})
/app/dagql/server.go:698 +0x154
github.com/dagger/dagger/dagql.(*Server).Resolve.func1()
/app/dagql/server.go:457 +0x7e
github.com/dagger/dagger/dagql.(*Server).Resolve.(*ErrorPool).Go.func3()
/go/pkg/mod/github.com/sourcegraph/conc@v0.3.0/pool/error_pool.go:30 +0x23
github.com/sourcegraph/conc/pool.(*Pool).worker(0x1b99e74?)
/go/pkg/mod/github.com/sourcegraph/conc@v0.3.0/pool/pool.go:154 +0x69
github.com/sourcegraph/conc/panics.(*Catcher).Try(0xc000e30930?, 0xc0008727d0?)
/go/pkg/mod/github.com/sourcegraph/conc@v0.3.0/panics/panics.go:23 +0x42
github.com/sourcegraph/conc.(*WaitGroup).Go.func1()
/go/pkg/mod/github.com/sourcegraph/conc@v0.3.0/waitgroup.go:32 +0x4d
created by github.com/sourcegraph/conc.(*WaitGroup).Go in goroutine 29707
/go/pkg/mod/github.com/sourcegraph/conc@v0.3.0/waitgroup.go:30 +0x73
⋈
progress!
current status: trying to understand the difference between Typed.Extend() and Fields[Typed].Install()
dagql has a Tracer() but sending custom spans doesn't seem to work, or at least they don't show up in the TUI or Dagger Cloud
Update:
✅ Successfully hooked a prompt function in every object
✅ Successfully passed the prompt to LLM (hardcoded token for now)
🚧 trying to introspect dagql.ObjectType.. No dice so far. Maybe raw graphql introspection?
Introspection ✅
feel free to yoink any/all of this: https://github.com/vito/daggerverse/blob/main/mcp/introspection/introspection.go
i've found the agent is much happier to grok a graphql SDL rather than introspection JSON. (but maybe your current approach is already simpler)
So far I'm doing Server.Schema[Object.Type().Name] and from there I get an ast.Typedef with a lot of good stuff. Will see how far it gets me
(and indeed I looked at introspection to figure that out)
Now doing some LLM tool call plumbing to connect my code to @wet mason 's . Then the final touch should be figuring out how to build & send a query
then we can start having some fun with the call conventions 🙂
do we have an example of SDK impl which is not part of dagger/dagger repository?
freshly prototyped: https://github.com/quartz-technology/daggerverse/tree/main/docker_sdk
What are you looking for Rajat? I think there's an experimental ruby SDK somewhere which hasn't been merged to dagger/dagger yet
thanks Marcos. I am just trying to test some changes I am doing to sdk config structure in dagger.json, and want to test it with an sdk which is not a inbuilt sdk.
IIUC Class seems to be the concrete type of an ObjectType? Since ObjectType is an interface
hoping someone can help me understand this bit in dagql: https://github.com/dagger/dagger/blob/38c3055fa921ac703ef142c0bcf409123a8a5382/dagql/server.go#L494-L503
it seems to be the same as the logic in Select itself:
https://github.com/dagger/dagger/blob/b85ca77053d196ec622b38c23649f63d0d278f0a/dagql/objects.go#L456-L475
i'm not 100% sure why this logic appears in both places? does this not have the effect of selecting something twice?
🤔 how does Nth even get set actually
@still garnet re: how to plug in the llm config... Actually the cleanest way would be neither engine config, nor attaching to the query (although I might still do the latter for my POC). It would be session attachables no? Like either a custom one just for that, or more pragmatically, use the new secret provider stuff which is exactly what you suggested 😛 I'm just trying to not have my PR depend on another fast-moving PR if possible 🙂
yeah agree
maybe I can copy-paste an ugly throwaway session-attachable hack from the secret providers PR, then just use the real thing when it merges?
@still garnet what was the incantation again to go from dagql.Server to *core.Query ?
srv.Root().(*core.Query)
that's what I tried but it won't let me
what's it say?
compiler: impossible type assertion: srv.Root().(*Query)
*Query does not implement dagql.Object (missing method Call)
(I'm inside core)
oh - maybe it's dagql.Instance[*core.Query]
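A small sketch of why the direct assertion is rejected and the wrapper one is not. Object, Query and Instance below are hypothetical stand-ins for dagql.Object, core.Query and dagql.Instance, reduced to the bare minimum, so this only illustrates the Go typing rule and not the real dagql definitions:

```go
package main

import "fmt"

// Object is a hypothetical stand-in for dagql.Object: the interface the
// server hands back for its root.
type Object interface {
	Call() string
}

// Query stands in for core.Query. It has no Call method, so *Query does not
// implement Object and `root.(*Query)` would be rejected by the compiler
// as an impossible type assertion.
type Query struct {
	Name string
}

// Instance wraps a plain value and supplies the Object methods for it,
// mirroring the role dagql.Instance plays in this thread.
type Instance[T any] struct {
	Self T
}

func (i Instance[T]) Call() string { return fmt.Sprintf("call on %T", i.Self) }

// rootQuery unwraps the root object by asserting to the wrapper type first.
func rootQuery(root Object) (*Query, bool) {
	inst, ok := root.(Instance[*Query])
	if !ok {
		return nil, false
	}
	return inst.Self, true
}

func main() {
	var root Object = Instance[*Query]{Self: &Query{Name: "root"}}
	q, ok := rootQuery(root)
	fmt.Println(q.Name, ok)
}
```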
@tepid nova wdyt of merging https://github.com/dagger/dagger/pull/9327 but marked experimental? too early? there's a bunch of other changes on that branch that would be nice to get in, but maybe I can split them out
Is the code snippet in the PR up to date?
yeah, i think so
(2 weeks is a long time ago 😵💫 - but i do remember editing it as things changed)
My thoughts
- Obviously shipping something incrementally useful soon would be nice
- I worry that "experimental" might become "good enough, let's revisit in 6 months" or even worse "hey don't break my code I was using that!" Not sure how to avoid that.
- Looks SDK-specific no? Would this work out of the box in other SDKs, or would it require more design work?
(I know it's a real core API call, but the nesting seems to require SDK heavy lifting)
re 3), i don't think there's a silver bullet that avoids per-SDK integration. the amount of work for each is pretty light and low-touch, it's just a matter of mapping to the natural "context propagation" pattern in each language (with in Python, callbacks in TS, context.Context in Go, etc)
i was able to do it in Python/TS without being an expert in either which seems like a good enough sign
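To make the "context propagation" point concrete for Go, here is a toy sketch using only the standard library. withSpan and currentSpan are made-up helpers for illustration, not part of any SDK or of OTel; a real integration would go through the OTel tracer instead of a bare context value:

```go
package main

import (
	"context"
	"fmt"
)

// spanKey is an unexported key type so other packages cannot collide with it.
type spanKey struct{}

// withSpan returns a child context carrying the given span name. This is a
// toy stand-in for what a tracer does when it starts a span.
func withSpan(ctx context.Context, name string) context.Context {
	return context.WithValue(ctx, spanKey{}, name)
}

// currentSpan reports the innermost span name stored in ctx, if any.
func currentSpan(ctx context.Context) (string, bool) {
	name, ok := ctx.Value(spanKey{}).(string)
	return name, ok
}

// build is a nested call: it learns its parent span only from ctx.
func build(ctx context.Context) string {
	parent, _ := currentSpan(ctx)
	return "build (parent: " + parent + ")"
}

// demo wires the pieces together: start a span, then make a nested call.
func demo() string {
	ctx := withSpan(context.Background(), "ci")
	return build(ctx)
}

func main() {
	fmt.Println(demo())
}
```

The nested call never receives the parent explicitly; it is carried by the language-native mechanism, which is the part each SDK has to map onto its own idiom.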
What feels weird is that it's half SDK-backed and half engine-backed.
like i said - that's pretty unavoidable afaict; your proposed APIs from side conversations would also need a lot of SDK special-casing. all of the core logic is in the engine; the SDK side changes are just to make the API feel like something that you would actually want to use, and so it's able to interoperate with any existing OTel integrations you might have, which are still going to be dependent on the language-native context propagation patterns.
basically - if we do it only in the engine and just have SDKs use the API as if it were any other codegen'd API, that means we lose automagic OTel integration as a feature.
but, if we're still not comfortable committing to it even in experimental state, that's fine; I can pull out the other parts to merge separately
I guess it boils down to that statement. I thought there was a way to get our cake and eat it too, so was holding out for that. But if that's not actually possible, no point in holding out. So if we just hash out that one point, it should unblock the rest.
If you give me a couple hours to progress on my other things, then we can just figure it out live?
sounds good 👍
Super niche core question: are core ID types (like core.SecretID) nullable? Or do I need a *core.SecretID to get a null value?
you can either use a pointer or dagql.Optional[core.SecretID] (pointer is just syntax sugar)
they're not nullable by default
poking through typescript module init traces, 2 suspicious things, both of which sit on a line where idk whether to distrust the telemetry or the thing being instrumented:
1, containers seem to have absurd netns TX numbers... like corepack use yarn Rxs 1.1Mb (sensible) but apparently Txs 113Mb (wtf) @civic yacht
2, container construction calls show up repetitively a lot, and we only get logs and a duration measurement for the first invocation. looks like caching, sure, cool, BUT there seems to be 2 ways of getting a cache hit that are displayed differently? or some get the cache hit displayed and others don't? or i just don't know how to read the output here? see screenshots. how do i interpret the middle of trace one? what's going on there? @still garnet
lmao i hate discord so much, it doesn't show my filenames: 1st screenshot is top of trace cache miss, 2nd is middle, 0 duration but not explicitly CACHED, 3rd is bottom, CACHED
Dumb question but how do I check dagql.Optional for null?
Valid bool field - it's true if the value is provided
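A minimal sketch of the pattern. Optional below is a hypothetical stand-in that only copies the Valid-flag idea described above; Some and GetOr are made-up helpers, and the real dagql.Optional may differ in its fields and methods:

```go
package main

import "fmt"

// Optional is a hypothetical stand-in for dagql.Optional: a value plus a
// Valid flag that is true only when the value was provided.
type Optional[T any] struct {
	Value T
	Valid bool
}

// Some wraps a provided value.
func Some[T any](v T) Optional[T] {
	return Optional[T]{Value: v, Valid: true}
}

// GetOr returns the wrapped value, or fallback when nothing was provided.
func (o Optional[T]) GetOr(fallback T) T {
	if !o.Valid {
		return fallback
	}
	return o.Value
}

func main() {
	var unset Optional[string] // zero value: Valid is false, i.e. "null"
	fmt.Println(unset.Valid, Some("token").Valid)
	fmt.Println(unset.GetOr("default"))
}
```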
@hasty basin re: recorder module. If it's a PITA to separate the commits don't worry about it
I've already cherry-picked the history of the module. Will do the snippets part separately after
@still garnet oh no
s.srv.OnInstallObject(func(selfType dagql.ObjectType, install func(dagql.ObjectType)) {
&agentSchema[selfType]{
srv: m.srv,
selfType: selfType,
}.Install(m.srv)
})
This is the linchpin of the whole thing... How do I get a dynamic dagql type definition into a Go generic type bracket 😭
wanna pair?
yes please 🙂
@still garnet this is what I'm going to try to make work:
m.srv.OnInstallObject(func(bodyType dagql.ObjectType, install func(dagql.ObjectType)) {
class := dagql.NewClass(dagql.ClassOpts[*core.Agent]{
// Instantiate a throwaway agent instance from the type
Typed: core.NewAgent(bodyType),
})
install(class)
})
Does that look right?
What's not 100% clear (writing it down to help think through it) is how exactly Typed is used
Yup!
Because that will determine how my core.Agent will be consumed by NewClass exactly
Oh wait this is what my instinct still tells me to do:
// [...]
Typed: core.NewAgent(dagql.Instance[bodyType]),
// [...]
If bodyType is eg. *core.Directory, won't dagql.Instance[*core.Directory] give the agent everything it needs? It can call .ObjectType() to get the type back for introspection by NewClass. And it doesn't have to take the type & instance separately
("body" as in the body of the robot 🙂
i think bodyType will be a dagql.Class[*core.Directory], not a *core.Directory
the Typed field is actually godoc'd:
// Typed contains the Typed value whose Type() determines the class's type.
//
// In the simple case, we can just use a zero-value, but it is also allowed
// to use a dynamic Typed value.
Typed T
in your case, you need a dynamic Typed value, whose sole purpose is to have Type() *ast.Type so the dagql.Server knows what type is being installed into the schema
so, i think the right thing to do is something like &core.AgentType{Inner: bodyType} with a Type() that returns &ast.Type{NamedType: Inner.Type().NamedType + "Agent", NonNull: true}
I already have that (I think) I just get the type name from the instance:
type Agent struct {
// [...]
self dagql.Object
}
func (a *Agent) Type() *ast.Type {
return &ast.Type{
NamedType: a.self.Type().NamedType + "Agent",
NonNull: true,
}
}
but you won't have an instance at this point in time
OK that's the part that I don't understand. I guess in my case there's no easy way to make a "zero-value" like the Typed godoc recommends
yeah - it has to be determined from runtime values, so there's no way for a zero-value to do it
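Sketching the idea with stand-in types, in case it helps. Type below is a hypothetical reduction of gqlparser's ast.Type to the two fields used in this thread, and Typed, Directory and AgentType are illustrative only, not the real dagql/core definitions:

```go
package main

import "fmt"

// Type is a hypothetical stand-in for ast.Type, reduced to two fields.
type Type struct {
	NamedType string
	NonNull   bool
}

// Typed mirrors the idea of dagql.Typed: anything that can report its type.
type Typed interface {
	Type() *Type
}

// Directory stands in for a core type whose name is known statically, so
// its zero value is enough to answer Type().
type Directory struct{}

func (Directory) Type() *Type { return &Type{NamedType: "Directory", NonNull: true} }

// AgentType derives its name from the wrapped type at runtime, so its zero
// value cannot answer Type(); Inner must be set before it is used.
type AgentType struct {
	Inner Typed
}

func (a *AgentType) Type() *Type {
	return &Type{NamedType: a.Inner.Type().NamedType + "Agent", NonNull: true}
}

func main() {
	agent := &AgentType{Inner: Directory{}}
	fmt.Println(agent.Type().NamedType)
}
```

The wrapper only needs the inner type, not an instance of it, which is why it can be built at install time.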
Got it. Then I'll just add an extra argument to NewAgent to take both the bodyType and an optional body
(using self and body interchangeably, can't decide 🙂
like your internalId
yep yep
Is it too cute to try to pass a single self interface{} type and try to cast it either as a dagql.Object or dagql.ObjectType? It's annoying to have to pass both, and have to trust the caller that they always match
@still garnet follow-up question, is it accurate that NewClass will perform Go reflection on core.Agent to infer the fields to install?
Or do I still need to manually install fields somewhere?
need to manually install fields - there's no reflection at runtime
too cute imo
(brb)
s.srv.OnInstallObject(func(bodyType dagql.ObjectType, install func(dagql.ObjectType)) {
class := dagql.NewClass[*core.Agent](dagql.ClassOpts[*core.Agent]{
// Instantiate a throwaway agent instance from the type
Typed: core.NewAgent(dagql.Instance[bodyType]),
})
class.Install(
dagql.Func("withPrompt", s.withPrompt).
Doc("add a prompt to the agent context").
ArgDoc("prompt", "The prompt. Example: \"make me a sandwich\""),
dagql.Func("run", s.run).
Doc("run the agent"),
dagql.Func("history", s.history).
Doc("return the agent history: user prompts, agent replies, and tool calls"),
dagql.Func("as"+s.selfType.TypeName(), s.asObject).
Doc("convert the agent back to a " + bodyType.TypeName()),
)
install(class)
})
Childcare break...
Back at it. Managed to get it to build... Let's see if it actually works 🙂
panic in the general area you would expect...
I have a first PR for the Java SDK (to run Java modules) that looks pretty good for now: https://github.com/dagger/dagger/pull/9422
Not everything is covered, but it allows to run java modules, call them from java or other languages, etc.
Optional args and default values are covered. Constructors are not yet handled.
The PR starts to be a bit big, I'd like to stop adding anything to it (except if we find bugs of course).
dagger init --sdk=java and dagger develop are working as expected.
Can I get some 👀 and feedback from <@&946480760016207902> 🙏 ?
If any question about it, do not hesitate, I can also jump on a call to explain why, how, etc if that can help to review it.
going through it now, it would be great to get @rancid turret's thoughts as well
Telemetry: cache miss reasons
verbose tui trace confusion
Engine error on k8s cluster
@still garnet What are the conditions for a service tunnel to start?
I'm working on DockerSDK with a docker compose -> https://v3.dagger.cloud/Quartz/traces/7dcf214641722677241609de0ec04ced?listen=ab1038fc7c420981
It seems I got all my services running but the tunnel isn't created, is there like a condition or something?
Also what happens if I bind the same service 2 times but with different names? Will it work as expected, with the service reachable under both names, or will it fail? Maybe that's the reason
What's the best way to generate an introspection JSON file?
I'm using a simple go run cmd/introspect/main.go but I wonder if this also exists as a dagger command so that I can call it without building Go code
In the dagger repository, you can do dagger -m .dagger call sdk <your sdk> generate -o . to codegen sdk 🙂
Like … call sdk typescript generate -o .
yep, but I'm looking at doing roughly the opposite. At least for now the way the java sdk is built is by reading a local introspection json file or by calling dagger with a kind of custom query that will mimic the introspection (kind of because if the introspection query changes it has to be reflected in the java code).
And I'd like to replace the query with a proper call to dagger.
The advantage of today's approach (so not with a dagger call sdk java generate) is that the local dev ex is better. It's just a mvn install call.
So I've seen the introspect/main.go that does exactly what I'd like, but wondering if it's exposed somewhere. Without the full sdk/codegen aspect.
I think there's an API method called something like __schemaFile somewhere
Mmm I lie actually
That's odd, I could have sworn we did that at some point
introspection.json
🙏 🙏
question: when an API client gets an ID in their response, how big is that ID? Is it the full uncompressed recipe, or is it a compressed version, or even a digest that the engine can use in a lookup later (I guess that last one is unlikely)
It's base64 and uncompressed atm. We made changes a while back that de-dupe it so each vtx in the DAG only appears once, etc. but it can still get large for really complex DAGs. I think we should move to passing digests around for better performance but we'd need a persistent (on-disk) cache of digest->ID mappings, whereas today that dagql caching is just in-memory and per-session. So once we move all caching logic from buildkit->dagql we should be able to do that
One degenerate case that results in huge IDs is withNewFile where the contents are large. In that case the ID includes the whole contents. There's probably some other degenerate cases like that too
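A rough way to see the size effect, assuming nothing about the real ID layout: the payload format below is made up for this example, and the only thing borrowed from the thread is that the ID is base64 and carries the file contents verbatim. Base64 turns every 3 input bytes into 4 output characters, so the encoded size grows linearly with the contents:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodedIDSize returns the length of a base64-encoded payload that embeds
// the given file contents verbatim. The "withNewFile:" prefix is a made-up
// placeholder; real IDs are structured differently.
func encodedIDSize(contents string) int {
	payload := "withNewFile:" + contents
	return len(base64.StdEncoding.EncodeToString([]byte(payload)))
}

func main() {
	small := encodedIDSize("hello")
	large := encodedIDSize(strings.Repeat("x", 1<<20)) // 1 MiB of contents
	fmt.Println(small, large)
}
```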
sdk vendoring, library split
@leaden glade @rancid turret @spark cedar @fair ermine I don't want to pollute your other thread (https://discord.com/channels/707636530424053791/1334447130256867390) but there is a related topic that is on my mind: generated clients. At the moment they are a tightly coupled component of a monolithic SDK. But I think it's time to work on decoupling them. We've discussed this in the past but never got around to it - IMO it's time.
🧵
one minor suggestion on v3 cloud ui. can we add a mouseover msg in the "time it took for the trace" div. e.g. "failed after 3.5m", "running since 2m 27s", "used cache". that way I don't have to remember the color encoding. (pls ignore if that does not make sense).
cc @still garnet 🙏
Hi folks, while working on one of my changes, I am getting this error:
Error: failed to get module SDK: input: module.withSource.sdk.source get field "source": reflect: call of reflect.Value.Field on zero Value
any pointers on how we can fix this. The SDK is a struct in this case (I am working on changing sdk from string to struct) which is nil as user has not configured the sdk yet.
this works fine when the sdk is initialized in dagger.json
do you have code that you can share?
it's coming from here: https://github.com/rajatjindal/dagger/blob/main/dagql/objects.go#L1093
oh, i think i fixed it by uncommenting some code which was commented out 13 months ago
index 295fe28e5..3cf471cba 100644
--- a/dagql/objects.go
+++ b/dagql/objects.go
@@ -1102,9 +1102,9 @@ func getField(obj any, optIn bool, fieldName string) (res Typed, found bool, rer
}
objV := reflect.ValueOf(obj)
if objT.Kind() == reflect.Ptr {
- // if objV.IsZero() {
- // return nil, false, nil
- // }
+ if objV.IsZero() {
+ return nil, false, nil
+ }
objT = objT.Elem()
objV = objV.Elem()
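A standalone sketch of the failure mode that guard addresses. fieldOf and SDKConfig here are hypothetical and much simpler than the real getField; the point is only that dereferencing a nil pointer via reflection yields the zero Value, and field access on that panics with a "call of reflect.Value... on zero Value" message:

```go
package main

import (
	"fmt"
	"reflect"
)

// SDKConfig is a hypothetical struct used only for this illustration.
type SDKConfig struct {
	Source string
}

// fieldOf reads a named exported field from a struct or pointer-to-struct
// via reflection. A nil pointer reports found=false instead of panicking.
func fieldOf(obj any, name string) (val any, found bool) {
	v := reflect.ValueOf(obj)
	if !v.IsValid() {
		return nil, false // obj was an untyped nil
	}
	if v.Kind() == reflect.Ptr {
		if v.IsNil() {
			// A nil pointer has nothing to dereference: v.Elem() would
			// be the zero Value, and calling Field on that panics.
			return nil, false
		}
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return nil, false
	}
	f := v.FieldByName(name)
	if !f.IsValid() {
		return nil, false // no such field
	}
	return f.Interface(), true
}

func main() {
	var unset *SDKConfig
	_, found := fieldOf(unset, "Source")
	fmt.Println(found)
	val, found := fieldOf(&SDKConfig{Source: "go"}, "Source")
	fmt.Println(val, found)
}
```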
seeing an increased numbers of errors like this (not related to v0.15.3 i don't think, this is on main, we haven't bumped there yet)
https://v3.dagger.cloud/dagger/traces/0e66540278724740d95e7292febaf539#98a7d0949b788363:EL1
the plain output is really unhelpful 🤔 https://github.com/dagger/dagger/actions/runs/13059492590/job/36438693105?pr=9465
failed to return error: input: currentFunctionCall.returnError failed to get requester session: session for "hncyh7pd4t5f2vju9z31uuoc7" not found
i wonder if it's got something to do with the underlying engine getting shut down, the client being closed, but the module is still running? (in which case "context cancelled" would be the right error, but it could be getting lost?)
only guessing that, since it seems to happen at the same time as a lot of other jobs on the same runner
is there a way to visualize via server calls vs direct calls in the dagger ui. I think it would be useful to know what level of nesting are we in.
Based on this comment in the java PR (https://github.com/dagger/dagger/pull/9422#discussion_r1935476374) I did some tests about module names. So I created an e2e module in multiple languages to see if they can call each other. That works quite well for Go, PHP and Java. But for both python and typescript it's not working. In both cases, a dagger function right after the dagger init will complain about not finding the module: No @dagger.object_type decorated class named E2e was found for python, and could not find module entrypoint: class E2e from import. Class should be exported to benefit from all features. from typescript.
Hi @still garnet, I noticed that you had committed this code (but commented out) in this commit (13 months ago :P): https://github.com/dagger/dagger/commit/353ba56468fbb908cf29e2ccd50b187d178e95de#diff-9eea0c3c5756d18267e5948dc786fda84991c46a1694b9be7904931e7bfd639eR681
without this I am getting following error:
Error: failed to get module SDK: input: module.withSource.sdk.source get field "source": reflect: call of reflect.Value.Field on zero Value
do you know what would be the side effect of uncommenting this code. (I am still going through the tests to see if it impacts any test).
to confirm, is it the commented code in #maintainers message or in your link?
- https://github.com/dagger/dagger/commit/353ba56468fbb908cf29e2ccd50b187d178e95de#diff-9eea0c3c5756d18267e5948dc786fda84991c46a1694b9be7904931e7bfd639eR642 (snippet above)
- vs. https://github.com/dagger/dagger/commit/353ba56468fbb908cf29e2ccd50b187d178e95de#diff-9eea0c3c5756d18267e5948dc786fda84991c46a1694b9be7904931e7bfd639eR681 (your link)
The snippet, i may have messed up the link
hmm I may have tried it as a fix and then realized it's a secondary issue - if I'm reading it right it seems to imply we're trying to get the source field off of a nil value, which is a little strange because you'd expect the whole object to just be sdk: null rather than e.g. sdk: {source: null}
I think the error msg says “trying to read field on null value”, which i read as “sdk is nil and you are trying to read a field on it”
But i can print the type on that error msg to be sure
yeah agree
i'm just not sure why it's descending into a nil value and trying to select fields off of it - that seems like a bug
I have it as *SDKConfig, a new struct i am adding to type Module (to change sdk in dagger json file to a struct)
I thought maybe i need to send it back as dagql.Instance but i struggled converting this struct to a dagql instance
happy to review code if you have it pushed somewhere
its a WIP branch so lot of useless comments and unstructured commits there: https://github.com/dagger/dagger/compare/main...rajatjindal:dagger:private-deps-sdk-sol
i am working on cleaning this up, but i won't be ready with that until at least tomorrow.
So, am I right that there is a regression in the caching of function calls?
- It used to be cached within the same session
- Now it is never cached even within the same session.
Is that correct @civic yacht?
Not exactly, they have always been cached within a session in terms of buildkit operations. 4 months ago we made a change that caused them to not be cached in the dagql level of caching, but that's not a ton of extra overhead since the bulk of the expensive work gets covered by buildkit caching for now.
This fix gets us more dagql-level caching back, but the main motivation was to avoid confusing duplicate telemetry
It's confusing due to the multiple cache systems in play obviously
"Will the real client.gen.go please standup" 😭
Nice! Now all I need to do is resolve the merge conflict between that fix and my agent branch 😭
It's worth it though, those session-level cache hits make the experience of re-running agent workflows in the shell way more fun
What's surprising is that I noticed a sharp increase of in-session cache misses way more recently than 4 months... like in the last 2 weeks maybe. But maybe it was related to something I was doing
Hitting this https://github.com/dagger/dagger/pull/8991#issuecomment-2612710834 or something similar while trying to merge last of Solomon's docs gif recorder things in. dagger -i develop drops me in a sandbox with
root@buildkitsandbox:/src# cat dagger.json
{
"name": "bass-sdk",
"engineVersion": "v0.11.9",
"sdk": "go",
"source": "."
}
hmmm...
maybe related to 0.15.3 and maybe fixed in main here? https://github.com/dagger/dagger/pull/9483
This is caused by the otel version bump, it's the top line change in the changelog
The bass sdk version needs updating - or whatever depends on it
Which may be vito's apko module
I think I need to change something in bitbucket.org:dagger-modules/private-modules-test.git/cool-sdk repo to fix some tests that are failing in one of my PR. could someone please give me access to this repo
access to private test modules
An easy contribution if anyone is looking for a way to help 🙏 https://github.com/dagger/dagger/issues/9507
I have been seeing The operation was canceled. error in GitHub action runs, and it seems like it could be because the runner was stopped for some reason. could someone with the access check why that might be the case:
it's because we use spot instances for some of our ci runs
depends on the check though, do you have a link?
Following up on today's discussion on the future of the "remote engine protocol" and _EXPERIMENTAL_DAGGER_RUNNER_HOST... @spark cedar @civic yacht @still garnet @paper epoch @rancid turret @stray heron
Java PR https://github.com/dagger/dagger/pull/9422 has been approved (thanks @rancid turret )
But I don't have write access, so could anyone with write access (I guess <@&946480760016207902> ?) merge it? I guess a squash is best due to the number of commits 😅
Let me know if you need any support from me to merge it.
Can you rebase?
Rebased, and pushed
Looks like something's wrong (or just unexpected from my knowledge). It tries to access sdk/typescript while checking sdk/java for instance
2336: │ go(
2336: │ │ │ source: Directory.withDirectory(
2336: │ │ │ │ directory: no(digest: "sha256:4ad4fb9dd46b164f2c00b86b85a05e6d9ffacc5f2ef1ef22c007ee662e634724"): Missing
2336: │ │ │ │ exclude: [".git", "bin", "**/.DS_Store", "**/node_modules", "**/__pycache__", "**/.venv", "**/.mypy_cache", "**/.pytest_cache", "**/.ruff_cache", "sdk/python/dist", "sdk/python/**/sdk", "go.work", "go.work.sum", "**/*_test.go", "**/target", "**/deps", "**/cover", "**/_build"]
2336: │ │ │ │ path: "/"
2336: │ │ │ ): Directory!
2336: │ │ ): Go!
2337: │ Container.from(address: "golang:1.23.2-alpine"): Container!
2338: │ Container.withRootfs(
2338: │ │ │ directory: Directory.withDirectory(
2338: │ │ │ │ directory: Directory.directory(path: "sdk/typescript"): Directory!
Do we have a update on https://github.com/dagger/dagger/issues/8864?
Using a private daggerverse repo is becoming tricky. If modules within the same repo use different commit IDs as their version, it can lead to random failures. After bumping the dagger engine to v0.15.3, 60-70% of the jobs are failing because of this.
Example trace: https://v3.dagger.cloud/CASTAI/traces/79442f5da59ef2838ab093dfcfa8d195
ping @obsidian rover
Update: synced versions between modules and repos etc., but this time it didn't work at all. To unblock our ci/cd, we're moving all private repo modules inside the repositories.
could you put the update in the issue?
sorry, just trying to make sure we don't lose that 😄
will do first, need to unblock ci/cd pipelines 
@spark cedar, why does this report exit with 1? https://github.com/dagger/dagger/actions/runs/13177476543/job/36780091370
cuz i made a typo 😛 https://github.com/dagger/dagger/pull/9517
@rancid turret as promised: https://github.com/dagger/dagger/pull/9518
hoping to have something that's somewhat ready by the end of the week
thinking about enums
Taking a look 👀 Thanks for the pings 🙏
@still garnet 👋 hey, is there a way with dagger.Connect to have the log output not be interactive?
isn't it already not interactive? it should just do plain logging
@obsidian rover do we have a private copy of https://github.com/dagger/dagger-test-modules with a PAT somewhere? wanting to write a test for https://github.com/dagger/dagger/issues/9524, and was looking to use one of the modules there to test it
The PAT is read-only -- base64 encoded to avoid github
and daggerverse-private is a mirror of the test-modules?
And it's not, I misunderstood 😿
Let me check
Do you specifically want github or bitbucket or gitlab is ok ?
It could be, feel free to push on top of that I'm ok -- the only test using it is this one -- easy to change and it can become a private github repo that mirrors the test-module with a PAT. Otherwise, this could work: https://github.com/dagger/dagger/blob/f920f96ffd0a0252073c28f39608ad2796e28985/core/integration/git_test.go#L332-L336 (credentials on 1password) ; or https://github.com/dagger/dagger/blob/f920f96ffd0a0252073c28f39608ad2796e28985/core/integration/git_test.go#L347-L351 (same, credentials on 1password)
Happy to give you a write access to the repo / write PAT
When using export-to-docker, what name should I give it as argument? Is there a rule I have to follow to get the corresponding CLI to pick it up?
Should it be registry.dagger.io/engine:<git-commit> ?
🙏 🙏 🙏 🙏 🙏 Can someone explain to me why the CLI git commit doesn't match the engine tag? 😭
dagger v0.15.4-250207020442-075eac5c0946 (registry.dagger.io/engine:736eabb66f8cbe32ecac21cd49d2696e41110084) linux/amd64
^^^^^^^^^^^^ 👈 👉 ^^^^^^^^^^^^^^^
I'm trying to build a CLI/engine pair such that, once I loaded the engine into docker, the CLI will pick it up
Oh it's the last commit of upstream main, rather than the commit it was built from I guess (how does it compute that I don't know)
It's calculated in the version module from git merge-base
But we do this because we need something stable that's been published to a registry - every main commit gets published, so we can use that
But generally - dev CLI builds aren't really intended to be used without a dev engine, this is just used as something somewhat sane (it used to be just get the latest build of main, which for an old cli build would break a lot of things)
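A rough sketch of the tag computation described above, assuming the engine image is named registry.dagger.io/engine:<commit> (as in the version string earlier) and that the upstream branch is reachable as origin/main. The helper names engineImageRef and mergeBase are made up for illustration and are not the version module's actual code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
	"strings"
)

// fullSHA matches a full 40-character lowercase git commit hash.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// engineImageRef builds the image name a dev CLI would look for.
// The registry.dagger.io/engine:<commit> shape is taken from the
// `dagger version` output quoted in this channel.
func engineImageRef(commit string) (string, error) {
	commit = strings.TrimSpace(commit)
	if !fullSHA.MatchString(commit) {
		return "", fmt.Errorf("not a full commit SHA: %q", commit)
	}
	return "registry.dagger.io/engine:" + commit, nil
}

// mergeBase asks git for the last commit shared between HEAD and upstream.
// "origin/main" as the upstream name is an assumption about the checkout.
func mergeBase(upstream string) (string, error) {
	out, err := exec.Command("git", "merge-base", "HEAD", upstream).Output()
	if err != nil {
		return "", fmt.Errorf("git merge-base: %w", err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	commit, err := mergeBase("origin/main")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	ref, err := engineImageRef(commit)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(ref)
}
```

The printed name is what you would pass when loading a locally built engine into docker, if the assumption about the tag format holds.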
Hey, is there anywhere one can get a recap at what is possible with Dagger shell in the latest release?
Thanks. The reason I need this is to work with a dev engine (+corresponding CLI) on my system.
This requires:
- building the CLI
- building the engine
- loading the engine into the docker engine at a name that the CLI will look for.
That step 3 👆 is proving hard to automate. I guess I need to call <MOD>/version | merge-base to get the right commit? But I need to pass as arguments the commit I'm building, plus the latest main commit, looked up by me. Any chance load-to-docker could do this by default if I don't specify a name?
I just want the equivalent of make; make install for dagger
We haven't documented that yet, sorry. We will do it soon.
Are you wondering about a particular feature?
Basically shell now has feature parity with call. We're still working on stabilizing some aspects of the UX. But you can already accomplish most tasks with shell that you could do with call. Personally I've switched completely.
I waste 20mn every time
Shell status
I have a few PRs opened regarding Java SDK. But I can't have a green CI at all.
For instance this PR https://github.com/dagger/dagger/pull/9533 which simply pins a dependency to fix a critical severity inside the Java module template.
Is there anything I can do to improve the situation? I find it really hard to know if I'm introducing something bad or not (I'm sure not in this specific PR, but I have 3 others open).
Side note: I'm not able to follow the links to dagger.cloud from the GHA results. I always get a "No traces available".
that shouldn't have occurred there
i mean that error makes sense 🤔 why is cacerts being imported on windows
fyi it looks like that's happening on main as well
so it's not your pr
might have been https://github.com/dagger/dagger/pull/9525/files
ah yup, because of that cmd/dagger imports engine/buildkit which imports engine/buildkit/cacerts
so we need to have it not do that 🤔
looks like we're just using it for various env vars
i think we should probably move those to engine/consts.go
working on this now, it's a bit fiddly, trying to replace mage with dagger shell while here
i think it probably requires a bit of rethinking how we do building and reasoning about versions
The dream would be to no longer require auto-download... ie. bundle the engine as local data alongside the CLI
yeahhhh several hours later i'm not actually sure if this solves the real issue
the real issue is that we need every component that we build to be able to be aware of various "globals"
e.g. in this case, "is this build a dev build?"
previously, we've been inferring this from git - but actually this is wrong, and we shouldn't do this. e.g. currently there's very subtly different behavior if you commit all your changes and have a clean state, vs, if you create just one different file
technically you can solve this by just passing a boolean everywhere, but it's so ridiculously messy that it makes our CI so much more complicated than before
one reason this won't work purely by itself, is that we need something that will disable the auto old engine gc that we have specifically for the cases of dev engines (and we can't just disable that everywhere without making the dagger taking up too much disk problem way worse - there are fixes, but then this easy little thing turns into an absolute mess)
</braindump>
we could still auto-remove old engines, just not download new ones
mmm, but having something like ./hack/build build + start the engine, and then killing the engine that built it is a little bit annoying
all of this logic is so annoyingly fragile
the ideal end state i want is that ./hack/build builds+starts an engine (using dagger shell), and outputs a cli to bin/
then, we would remove hack/dev and hack/with-dev - because that cli would just work and connect to the already started engine
it feels something like https://github.com/dagger/dagger/issues/8419 and https://github.com/dagger/dagger/issues/6323
💡 okay, actually i think i've worked out a way to avoid this whole thing maybe - that said, i still want a way to be able to "mark" a dev build explicitly, instead of relying on git metadata (rubber ducking is good)
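For reference, one common Go way to mark a build explicitly rather than inferring it from git state is a link-time variable. This is only a sketch of that general technique; the variable name devBuild is hypothetical and it is not what the PR below does.

```go
package main

import (
	"fmt"
	"strconv"
)

// devBuild is meant to be overridden at link time, for example:
//
//	go build -ldflags "-X main.devBuild=true" .
//
// The name is hypothetical; it only illustrates an explicit marker
// that does not depend on whether the git tree is clean or dirty.
var devBuild = "false"

// isDevBuild reports whether this binary was explicitly marked as a dev build.
// Anything that does not parse as a boolean counts as "not dev".
func isDevBuild() bool {
	v, err := strconv.ParseBool(devBuild)
	return err == nil && v
}

func main() {
	fmt.Println("dev build:", isDevBuild())
}
```

Every component still has to be built with the same flag, so this does not remove the "passing a boolean everywhere" problem by itself; it only moves the decision out of git metadata.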
@tepid nova https://github.com/dagger/dagger/pull/9555
^ this should mostly do what you want. ./hack/build is a dagger shell script 🎉 the built bin/dagger will always connect to the engine that was just started
i'm gonna work on using this to fully purge mage entirely 🙂
in a perfect world, hack is either removed, or just becomes a handy little directory of dagger shell scripts
any core engine devs with any objections to the above plan?
is the default "RUNNER_HOST" endpoint still a unix socket with the name "buildkit" inside it?
And if so, isn't that technically a lie since the engine protocol has been changed to no longer be a passthrough to buildkit?
asking because I saw a user say "I connect to the buildkit socket in production and it works fine" and I wasn't sure which socket they meant
Yeah I think it still is
Mildly inconvenient to change, since then old clients will just hang on connection
ok that's actually reassuring, just a naming inconvenience - not a mysterious new part of the stack I was missing
@spark cedar I am about to disappear for 2 days for conference reasons, but FYI I am eager to pick up a thread I started discussing with @stray heron today : the relationship between "compute drivers" (https://github.com/dagger/dagger/issues/5583) and "stable engine protocol" (https://github.com/dagger/dagger/issues/9516). And in particular the fact that they may be incompatible?
--> Stabilize engine protocol: we officially support provisioning engines yourself out of band; take on the burden of decoupled CLI & engine versioning
--> Compute drivers: equip CLIs with a low-level provisioning interface (containerd? cri?), so that it can manage the provisioning of its own engine. Thus coupling CLI & engine versioning, and not supporting out-of-band provisioning of engines.
How do we reconcile those? --> to be discussed in those issues
I think there are some remnants of parts of the buildkit protocol in there. I can't quite remember which bits are and aren't from memory, but at least the attachables are still there
And maybe some version info
I'll write something for when you're back 🫡
When I brought it up at the last maintainers call, I totally forgot to dig into the details of the compute drivers discussion... After reloading it in memory, I realized my mistake. We can't discuss one without the other IMO.
But I think (in summary of that upcoming response) I prefer the idea of a more stable protocol and doing things out of band. That said, I think we need to be really careful to control the amount of effort this is gonna take - I'm happy with changing this protocol over time, as long as we keep some core parts the same. If we need to redesign an entirely new protocol, or fully document and support it, it's going to take a while (and I'm kinda still not fully sure of the benefits of doing so, beyond it would be nice).
I think that means personally I'd scrap the compute drivers - and move all provisioning out of band. We can build all of our logic server side if we want to do something magic in cloud (aka the beast project)
If there's no auto-provisioning interface, technically that means the docker run auto-provisioning remains a weird special case?
Lol yes, was just typing that
Maybe there's something we could do there
E.g. for a Linux system, an install could be a systemd oci image or something? Instead of running under docker
But tldr, I think we still have one case of auto-provisioning (but we could hide that almost completely I guess... if we bundle the engine into the CLI, and solve those problems)
I just worry that choosing to stabilize RUNNER_HOST, and building the production architecture from that constraint, ends up being circular. "It's what we had in the beginning, so we built the architecture around that, so now it's all we have"
"It's all we have, so now we have a feature in the CLI for manually managing our engines. dagger engine list; dagger engine start prod-engine-2".
I would probably pare back the runner host env var a lot, simplifying it to only tcp, unix, tls, etc. Very very simple.
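To make "only tcp, unix, tls" concrete, here is a small sketch of what a pared-back parser could look like. The function name parseRunnerHost, the example addresses, and the exact rules are assumptions for discussion, not the engine's current behavior.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseRunnerHost accepts only a deliberately tiny set of transports and
// returns the scheme plus the address to dial. Everything else is rejected.
// This is a hypothetical simplification, not the existing implementation.
func parseRunnerHost(raw string) (string, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("invalid runner host %q: %w", raw, err)
	}
	switch u.Scheme {
	case "tcp", "tls":
		if u.Host == "" {
			return "", "", fmt.Errorf("%s runner host needs host:port", u.Scheme)
		}
		return u.Scheme, u.Host, nil
	case "unix":
		if u.Path == "" {
			return "", "", fmt.Errorf("unix runner host needs a socket path")
		}
		return "unix", u.Path, nil
	default:
		return "", "", fmt.Errorf("unsupported runner host scheme %q", u.Scheme)
	}
}

func main() {
	// The addresses below are made-up examples.
	for _, raw := range []string{
		"unix:///run/example/engine.sock",
		"tcp://10.0.0.5:1234",
		"docker-container://some-engine",
	} {
		scheme, addr, err := parseRunnerHost(raw)
		fmt.Println(scheme, addr, err)
	}
}
```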
Reminds me of docker contexts, or before that - docker machine
I think we need to avoid this agreed
Well in that case we need to discuss alternatives to stabilizing runner-host, because one leads directly to the other
I mean the alternative is to combine the CLI and engine as you've suggested before
But I think that's hugely limiting
Because now you have to run it on the same machine
they could be coupled components without being merged into the same component
You can't offload compute resources (which will be hugely useful for llms, etc)
But why is that different from today 🤔
Because today that's only available on docker
If you want to run docker on anything other than the local docker engine, you have to eject out of auto-provisioning, and manually provision out of band
stabilizing RUNNER_HOST is enshrining the "manual out of band" as the one true way to provision
And maybe it is... but I'm worried that it might not be, because once we cross that door there's no going back
Extending the provisioning mechanism to everything is what I'd call the compute drivers approach
yes correct
But I think that also leads to docker contexts or similar
Hence my proposed design debate: "stabilize runner-host vs. compute drivers"
Because you'd want a way to choose which driver you want
no, I think it's different. docker contexts imply manual out-of-band provisioning
So I think we end up there anyways
But how would you pick which driver to use
Do you just look at the env? And see what you can find?
Yeah you would still need to configure that in the CLI. But configuring a provisioning driver is not the same as configuring individual instances
Oh sure - happy with that
So can you just not connect to an already running instance?
E.g. you've run it in kubernetes. Similar to a setup today.
Or do we need to build a kubernetes driver to enable this
yes in this model the current method of deploying engines would be replaced by something lower level
since the whole point would be to couple versions. So you get rid of "Hey I upgraded my CLI to 0.14, any chance we could upgrade the Kub cluster to that version by next week?"
Yeah that's nice
Ideally that conversation no longer happens between humans. It happens between the CLI and the kub driver. "Hey I need an engine for commit 424242". or maybe "hey here's a custom image I'm uploading, deploy me an engine from that please"
Yeah I guess this is the substrate you've suggested before
right. Last chance for discussing it basically
If we don't incorporate it into the plan now, it goes down the trash forever I think
Eh I don't think it's gone forever (no door gets closed forever), but I do think it makes it significantly harder
mmm yeah some doors do indeed get closed forever trust me
I remember trying to implement this before, but trying to avoid the triple nested containers was not very fun.
I might need more time to remember the details
Honestly, I think having platform specific drivers would be the easiest way to go about it - i.e. a kubernetes operator or something for kubernetes
I think this is where compute drivers ended up
maybe compute drivers are just different "transports" to get to a remote-host session?
yes, call the driver and say "hey give me a session to an engine with these properties (we standardize the config there)" then the driver either gives you a session (could just be stdio proxy) or returns an error, eg. "sorry can't give you this version" or "sorry I don't support direct upload" or "sorry I don't have this architecture"
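A sketch of that shape in Go, to have something concrete to argue about. All names here (EngineSpec, Driver, Connect, fakeDriver, tryConnect) are invented for illustration; nothing like this exists in the codebase as far as this thread says.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
)

// EngineSpec is a hypothetical standardized description of the wanted engine.
type EngineSpec struct {
	Version string // a release or commit, e.g. "424242"
	Arch    string // e.g. "amd64"
}

// Session is whatever byte stream the driver hands back (could be a stdio proxy).
type Session io.ReadWriteCloser

// Typed refusals, mirroring the "sorry, can't ..." answers in the message above.
var (
	ErrVersionUnavailable = errors.New("driver: can't give you this version")
	ErrArchUnsupported    = errors.New("driver: don't have this architecture")
)

// Driver either returns a session to a matching engine or an error.
type Driver interface {
	Connect(ctx context.Context, spec EngineSpec) (Session, error)
}

// memSession is an in-memory stand-in for a real connection.
type memSession struct{ bytes.Buffer }

func (*memSession) Close() error { return nil }

// fakeDriver only knows a fixed set of versions on one architecture.
type fakeDriver struct {
	versions map[string]bool
	arch     string
}

func (d fakeDriver) Connect(ctx context.Context, spec EngineSpec) (Session, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if spec.Arch != d.arch {
		return nil, ErrArchUnsupported
	}
	if !d.versions[spec.Version] {
		return nil, ErrVersionUnavailable
	}
	return &memSession{}, nil
}

// tryConnect asks the driver for a session and closes it right away.
func tryConnect(d Driver, version, arch string) error {
	s, err := d.Connect(context.Background(), EngineSpec{Version: version, Arch: arch})
	if err != nil {
		return err
	}
	return s.Close()
}

func main() {
	d := fakeDriver{versions: map[string]bool{"424242": true}, arch: "amd64"}
	fmt.Println(tryConnect(d, "424242", "amd64"))
	fmt.Println(tryConnect(d, "0.14", "amd64"))
	fmt.Println(tryConnect(d, "424242", "arm64"))
}
```

The interesting design work is in EngineSpec: it has to fit today's implementations while leaving room for the clustering and caching cases mentioned below.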
So it's basically "remote-host protocol but with a twist"
Yeah!
then that becomes the open interface for open and proprietary hosting solutions
the trick is to define the driver configuration carefully, so that it works for the implementations we have today, but is future-proof for when we introduce distributed caching, clustering capabilities, fully stateless engines etc
ok now I really have to disappear 🙂 to be continued
Mmm yes
I'll try and write up something I guess?
🙏 perhaps in the "stabilize engine protocol" issue, since that one is the freshest, and it will force us to discuss compute drivers in the context of the obvious default path (which is to just open the protocol as-is and call it a day)
Quick question regarding vulnerability scans. Trivy is run on PRs with the following settings: --scanners=vuln --vuln-type=library --severity=CRITICAL,HIGH --show-suppressed
Right now it's failing because of multiple vulnerable Java packages. I fixed some of them https://github.com/dagger/dagger/pull/9533 but one is still there.
The status raised by trivy is affected, meaning there's no available fix at this time.
In this specific case it should happen soon: the fix has been integrated into the project but not yet released. Once it's released we can upgrade this dependency.
In the meantime it means that even after merging the mentioned PR, the scan check will fail.
I was wondering about the expected behavior here. Is that what is wanted, i.e. to fail on things we can't (right now) solve? Or should we add for instance --ignore-unfixed and only focus on the ones that have a fix available?
🤔 don't think we've really discussed, but personally, i'd rather have it marked as red - there is an issue. better that than a green check that's gonna hide real issues.
👋 need to confirm a design decision we made (not released yet, holding releasing until more clarity).
as part of working on private go dependency in modules, we did https://github.com/dagger/dagger/pull/9454. we needed a way for a module to mark a dependency with GOPRIVATE (see discussion in #1318581231465533450 message) - so we added an sdk.env field to dagger.json
How does cache volume/layer persistence behave when multiple engine instances run concurrently? (using distributed caching with the same dagger cloud token)
For example, let’s say two engines start up and both download the same initial cache volumes and functions lazy-download cache layers during execution as expected. They are both running the same dagger function call, which mounts cache volumes of the same names, with cache mode shared, from the same branch, but I've run it twice, hence two distinct machines and engines. In this example each engine is gracefully shut down after the function call finishes.
So one finishes first and begins persisting its caches, while the other finishes slightly later (but while the first is still persisting) and begins to persist its caches. My understanding is that Dagger’s cache sharing is scoped to an individual engine/buildkit, and that there’s no coordination/awareness between separate engine instances, so it’s up to the remote caching service to handle any possible divergence/conflicts/merging.
noticed something a little weird while mucking around in the TS SDK's runtime module: it's running a bit behind in terms of go versions and go.sum versions, but when i dagger develop or go mod tidy in there and the versions all get bumped, it seems to dramatically slow down the go-sdk-codegen step of module init... like this can make a fairly large difference in module init time, 13.3s vs 3s
ideally most of those deps get pulled from the go sdk builtin container, but how are we handling bumping those deps over time or giving ourselves a way to guarantee that our 1st party modules are using the same go module versions that we bundle in the engine image? cc @spark cedar @civic yacht @still garnet @rancid turret @fair ermine
[async] 👋 Is there a way to get combined stdout and stderr, @civic yacht @spark cedar?
Not atm. It would be extremely trivial to add an API where stdout/stderr are just appended to one another, but if we wanted to add support for returning the string where they are interleaved in the actual order stdout/stderr was written (which is probably what would actually be expected) we'd need to do something fancy. Mainly tricky because we'd need to support all 3 APIs of "just stdout", "just stderr" and "interleaved".
Would probably need to prepend each written line of stdout/stderr streams with either timestamps or something indicating whether they are stdout vs. stderr and then do a lot of sorting/trimming for each of the api implementations
I was looking for interleaved unfortunately :/
Patching swebench to use dagger for evaluation, and at its core, it parses the combined output to figure out which tests passed, failed, etc.
Ah I see, the only other possibility I can think of is a setting on withExec that says "stdout/stderr are the same pipe", so then they are interleaved and .stdout and .stderr just return the same (interleaved) string. Which feels a little weird but is the most plausible in terms of an implementation that's not super complicated and doesn't have a ton of potential performance overhead
Yeah. Maybe no timestamp and not sorting if they’re written as they come … but still, quite a change
Yeah and I think performance would be a legit concern when there's thousands of lines of output
Yeah, I’m doing the hacky version of this right now (bash -c with 2>&1)
Or sacrifice space and keep an extra combined stream
true that might be least worst
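A sketch of the "extra combined stream" idea, assuming the engine sees each write to stdout/stderr as it arrives. The type and method names (capture, OutWriter, ErrWriter) are made up; ordering is only as good as the order in which writes reach the two writers.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// capture keeps three buffers: stdout only, stderr only, and both in
// arrival order. It trades extra space for not having to sort anything.
type capture struct {
	mu                       sync.Mutex
	stdout, stderr, combined bytes.Buffer
}

// streamWriter appends each write to its own buffer and to the combined one.
type streamWriter struct {
	c   *capture
	own *bytes.Buffer
}

func (w streamWriter) Write(p []byte) (int, error) {
	w.c.mu.Lock()
	defer w.c.mu.Unlock()
	w.own.Write(p)
	w.c.combined.Write(p)
	return len(p), nil
}

// OutWriter and ErrWriter are what the process's stdout/stderr would be wired to.
func (c *capture) OutWriter() io.Writer { return streamWriter{c, &c.stdout} }
func (c *capture) ErrWriter() io.Writer { return streamWriter{c, &c.stderr} }

// snapshot returns the contents of one buffer under the lock.
func (c *capture) snapshot(b *bytes.Buffer) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return b.String()
}

func (c *capture) Stdout() string   { return c.snapshot(&c.stdout) }
func (c *capture) Stderr() string   { return c.snapshot(&c.stderr) }
func (c *capture) Combined() string { return c.snapshot(&c.combined) }

func main() {
	c := &capture{}
	fmt.Fprintln(c.OutWriter(), "test_a PASSED")
	fmt.Fprintln(c.ErrWriter(), "warning: slow test")
	fmt.Fprintln(c.OutWriter(), "test_b FAILED")
	fmt.Print(c.Combined())
}
```

This covers all three APIs ("just stdout", "just stderr", "interleaved") without timestamps, at the cost of storing the output twice.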
Given how non trivial it is, I’ll keep doing this workaround
Maybe a variation of this, since we already have redirectStderr would be to add redirectStderrToStdout or similar (which is basically the hack I’m doing but without having to wrap the exec around a bash)
Anyway, bash wrapping does the trick for now
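For anyone else needing the same workaround, a tiny helper that turns an argument list into the sh -c "... 2>&1" form with proper quoting. The helper names are made up, and it assumes the container has sh; the returned slice is what you would hand to withExec.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes so a POSIX shell treats it as one
// literal word; embedded single quotes are closed, escaped, and reopened.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// wrapCombined rewrites an exec argument list so that stderr is sent to
// stdout by the shell, which is the hack described above.
func wrapCombined(args []string) []string {
	quoted := make([]string, len(args))
	for i, a := range args {
		quoted[i] = shellQuote(a)
	}
	return []string{"sh", "-c", strings.Join(quoted, " ") + " 2>&1"}
}

func main() {
	fmt.Printf("%q\n", wrapCombined([]string{"pytest", "-x", "tests/it's.py"}))
}
```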
I'm trying to re-enable dag.Host().Service() in a local build of my engine. I've commented out this line https://github.com/dagger/dagger/blob/main/core/schema/query.go#L22 which successfully codegens the .Host() fields, but I must be missing another piece that needs to be changed. My code using dag.Host().Service() runs, but it seems to just hang infinitely. Any ideas?
@tidal spire I asked o1 for you:
Possible Explanation
When you see a “hang” or infinite wait in a Dagger + GraphQL resolver, it usually means that the query is never completing, often because the relevant resolver wasn’t registered with the DAG/GraphQL server. In other words, your “host” field might exist, but the “service” field on that host type does not actually get installed, causing the query to stall.
Dagger uses code-generation heavily. Each GraphQL field needs two parts:
1. The resolver function in Go (for example, “func (s *hostSchema) service(…) …”)
2. A registration that says “this function resolves the ‘service’ field on Host objects” (i.e. “dagql.Fields[*core.Host]{ … }.Install(s.srv)”).
If you only have the “func (s *hostSchema) service” code but never do something like:
dagql.Fields[*core.Host]{
dagql.Func("service", s.service).
Doc("Access services on the host"),
}.Install(s.srv)
then “service” won’t make it into the final schema. You’d see .Host() codegen appear, but calling .Host().Service() may attempt to execute a non-existing field and wind up stuck.
What to Check
1. Locate where your “hostSchema” or equivalent is installed (for example, “func (s *hostSchema) Install() {}”).
2. Ensure it has a block that registers your “service” function, something like:
}
// This registers the top-level Query.field("host")
3. Re-run your codegen or recompile. Your .Host().Service() calls should now resolve properly, rather than hanging forever.
Usually, the fix is just making sure both the resolver implementation and the field registration are re-enabled. Once they match up, the “service” calls should work again without hanging.
I will give my branch to qwen2.5-coder:32b and see what it can find
Where does it hang specifically? If not obvious one brute method is to tail engine logs and send it SIGQUIT, you'll get a stack trace of all goroutines. Slightly less brute is to uncomment this line, re-run ./hack/dev and then use pprof to look at goroutines (curl -s -v http://localhost:6060/debug/pprof/goroutine > ~/gr.pprof && pprof -http localhost:8080 ~/gr.pprof)
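If neither of those is handy, the same goroutine dump can be produced from inside any Go process with runtime/pprof. A minimal standalone sketch; the blocked goroutine is just a stand-in for a hung resolver and goroutineDump is an illustrative helper name.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

// goroutineDump returns the stacks of all goroutines in this process.
// With debug=2 the format matches what an unrecovered panic prints.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// A goroutine that blocks forever, standing in for whatever is hanging.
	go func() { select {} }()
	time.Sleep(10 * time.Millisecond)

	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(dump)
}
```

In the dump, look for the goroutine whose stack sits in the resolver and see what it is waiting on.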
enabling dag.Host()
Tired of waiting for this, might just do it when I have 30mn https://github.com/dagger/dagger/issues/8354
😂
just when I got used to seeing ETOOBIG
@still garnet I tried rebasing llm on main, and getting some light merge conflicts relative to your otel emoji stuff. Would you mind doing it, I'm afraid of breaking something 🙂
(trying to keep up with main to avoid rot)
also to stop the "upgrade to 0.15.4" nag 😛
@tepid nova done
@still garnet @civic yacht gut check. I think we should add token count as an otel metric in the llm branch 🙂 wdyt?
can we show the $ cost? 😛
I noticed every AI tracing/observability product has that, it's like the number one feature
even better (a bunch of products do that, clearly there's demand)
But that would be a Cloud feature IMO - doesn't make sense to hardcode that stuff in the engine
As long as model info & token count is sent up - we can do everything else from there
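To illustrate the split: if the engine only reports model and token counts, the $ figure can be derived downstream from a price table. A sketch with invented names and invented prices (the model name and numbers below are placeholders, not real pricing).

```go
package main

import "fmt"

// Usage is what the engine side would send up: model info and token counts.
type Usage struct {
	Model        string
	InputTokens  int64
	OutputTokens int64
}

// Price is expressed in micro-dollars per million tokens to stay in integers.
type Price struct {
	InputPerMTok  int64
	OutputPerMTok int64
}

// costMicros returns the cost in micro-dollars, or false if the model has
// no entry in the price table (which lives outside the engine).
func costMicros(u Usage, prices map[string]Price) (int64, bool) {
	p, ok := prices[u.Model]
	if !ok {
		return 0, false
	}
	total := u.InputTokens*p.InputPerMTok + u.OutputTokens*p.OutputPerMTok
	return total / 1_000_000, true
}

func main() {
	// Placeholder price table: $2 per million input tokens, $8 per million output.
	prices := map[string]Price{
		"example-model": {InputPerMTok: 2_000_000, OutputPerMTok: 8_000_000},
	}
	u := Usage{Model: "example-model", InputTokens: 1500, OutputTokens: 500}
	if micros, ok := costMicros(u, prices); ok {
		fmt.Printf("$%.4f\n", float64(micros)/1e6)
	}
}
```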
Btw my micro-agent fun cost me almost $20 of OpenAI tokens this month so far...
yeah not sure how all that works, I mean if it's something we can get easily out of the API it seems like it'd be nice to see that in the TUI too
since we're already integrating with OpenAI/etc at that level
I think the API clients all have it
https://github.com/openai/tiktoken, coupled with: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
That's what most people use [to predict] (coupled with the given language implementation)
sgtm, the current metrics only come from exec-ops but the underlying infra will work in any other api call I believe.
to sketch out what you'd need to do the equivalent of using current code as an example:
Nice thanks. Started an issue https://github.com/dagger/dagger/issues/9591
Should we retire #1121837200712142878 ? In practice everyone is using either #engine-dev, or language-specific channels eg. #python , #php etc. Seems redundant to have another channel on top.
When you really need the op binary in your dagger dev environment and you're too lazy to follow the 10-step instructions for installing it on alpine:
⋈ dev | with-file /usr/local/bin/op $(container | from 1password/op:2 | file /usr/local/bin/op) | terminal
$ op --help
1Password CLI brings 1Password to your terminal.
Usage: op [command] [flags]
[...]
@spark cedar before you log off, could you give me pointers for how we could get llm branch connected to a proper client-side config system? My hackish workaround is the number one issue for people getting started with the melvin demo at the moment
there is no proper client side config right now 🙂
i genuinely don't really know what it should look like, but it's probably some stateful api under a new top-level client/session object that the cli hooks into
stateful apis suck, but better in graphql imo, than some side-channel (like more session attachable hell)
Don't we have engine.json wired already? Or something like that?