#maintainers
I guess now that we have secret providers merged, I can just use that to access a host file of my choice
I just need this: https://github.com/dagger/dagger/issues/9507
Would that be the shortest path, @spark cedar?
sure, but that requires an engine restart
that would probably do it for now, but i'd like to have a client-led approach eventually as i proposed in https://github.com/dagger/dagger/issues/9007
yeah like I said, way too subtle to split it up in different files, it will all be dagger.json in the end IMO. I was going to just slap something in engine.json if that's the only file currently wired in, and use it for session-wide settings instead of engine-wide (because I don't care about the separation of files)
Anyway I'm flexible on means to get there at the moment, I just need to make the "hey why does it fail with .env file not found " blockers go away asap
@spark cedar talk live tomorrow? I'll try to start early
yeah
I'm currently on Solomon's llm branch building a local engine with ./hack/dev. This is the second time I've tried it, and again it ran for ~20 minutes without finishing successfully. It seems to be stuck loading dependencies; according to the trace, 3 of them haven't completed yet. Resources shouldn't be a problem
I don't see any reason why the build would fail... Just in case, can you try the build method I document in the melvin README, to see if you get the same issue?
It uses dagger-in-dagger on top of a stable dagger release, for simplicity
It's what I tried first, but it never finished, so I went with a non dagger-in-dagger approach to see if that was the problem, but same thing
I'll retry it with a fresh engine
i had a failure like this randomly on a totally non-LLM-related test run like 30 minutes ago, got totally stuck on dagger/dagger module init
I don't really touch anything in the build itself tbh
It's weird. I started with a fresh engine and it worked now
At least it moved on from the initialize module stuff
yeah whatever that failure was it was non-llm-related
unless you're accidentally using a dev engine to build another dev engine?
that's happened to me before, a stray _DAGGER_WHATEVER in the env
(one reason I try to never use those hack scripts)
Nope, just using 0.15.4
Oh... well I haven't upgraded to 0.15.4 yet
@still garnet rebasing on main, hitting that emoji span conflict again... Looks like it's easy to deal with, can you just give me a pointer? Or maybe we can strategically squash the commit that's causing the conflict?
yo @civic yacht i've been digging into why/how the engine benchmarks have been failing in CI and I think my "bisect" is pointing at https://github.com/dagger/dagger/pull/9483 causing dagger call bench specific --run=BenchmarkModule/BenchmarkLotsOfDeps to get OOMKilled... based on vibes, that kinda makes sense to me, but does it make sense to you? (fyi using 15.4 as the "outside" engine here)
Giving up on rebasing on main for now
Current plan for cleaning up LLM config:
- Move model selection to llm(), deprecate LLM_MODEL in .env
- Patch engine/server.initializeDaggerClient() so that it queries CLI for its LLM credentials via session attachable
- Build from there for clean per-session LLM config
@final star @still garnet @meager summit @civic yacht what's the cleanest way for me to access the "outer" server (external client) from an "inner" server (ie. module), to cleanly implement selective sharing of llm config?
Mmmm, this argument to initializeDaggerClient seems promising?
Kind of struggling, but hopefully will fumble my way to getting this to work...
not quite understanding your callsite here, dagger.Connect() obviously doesn't work where you're at otherwise you'd not be asking... so where are you at?
iiuc you're trying to pass llm config in a way sorta similar to secrets, like attached to the client session?
Yes. I already use session attachable today to get LLM config from the client. The problem is that it's client specific. So when you call llm() from a dagger function, I reach into that function's client context to get LLM config, and of course there's nothing there (no .env). I have a hacky workaround, but what I really want is for the root client of each session to be the one I query. Then for all clients in that session to inherit the real llm config
Later that will have to be configurable (otherwise it means any module can just consume infinite tokens on your Anthropic or OpenAI account...)
But before making it configurable I need to make it possible
So I'm trying to plumb things through. It looks like I have access to everything I need in engine/server.initializeDaggerClient(). Just trying to figure out how to plumb it through, all the way to engine/core.Llm which is where I need it...
that's a lot of plumbing, not sure which wall to drill a hole in
might be possible to hotwire it into engine/server/client_resources.go?
that's where the secrets get propagated around
Not 100% sure, but might be related: https://github.com/dagger/dagger/pull/9530/files. Here, Justin is making sure that the secret is passed to the Directory object. I mean, the call he's making seems to be close to what you need (from what I understand)
lol i was spelunking around looking for that MainClientCallerID
There's also a ParentIDs
That seems to be for a similar purpose (allow a dagger function to use IDs from its parent object)
src.Query.AddClientResourcesFromID ends up assembling that (get or) initializeDaggerClient call for you down the chain a lil bit i think... no it doesn't, nm. but it does let you re-use both existing clients to pass the resource
Oh shoot, I thought the same
it uses clientFromIDs which looks up clients from session/client ID pairs
similar, just doesn't initialize bc i think in the context of core calls that's already done for you
Hmm try --rebase-merges, if that doesn't help I'll rebase it tmrw
like, conceptually, isn't there already a client serving this code? what you really need is the config, a new client is just a means to an end?
Solomon, do you have a branch which I can poke around at where this problem is happening right now
yes. it's all in the llm branch --> See https://github.com/shykes/melvin and setup instructions there
The engine branch is github.com/shykes/dagger@llm
and do you have steps to reproduce the problem.
(sorry, trying to get specific info so that I don't get spiralled into reproducing the issue itself)
@meager summit to reproduce the issue, follow the setup instructions (option 2: not-so-quick-start) from the README of https://github.com/shykes/melvin, but skip step 6: initialize LLM integration. Then try using melvin from the commandline (also explained in the README). You'll see the .env: no such file error
trying this now
thanks, sorry it's not a neat isolated engine-only repro
that is ok. I think as long as we have a clear list of steps to repro the problem, it should be helpful.
I'm not sure why, but when I run my dagger dev branch inside dagger (dagger shell -c 'engine | service foo | up'), telemetry gets lost. Makes it very hard to debug issues
I'm resorting to panic("useful debug information") to get information out
18:32:07 ERR failed to emit telemetry error="traces export: context deadline exceeded: retry-able request failure: Post \"https://api.dagger.cloud/v1/traces\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
I am able to reproduce the issue (I am using hack/dev -> hack/with-dev though).
looking more..
@meager summit so the issue is caused by my workaround. Look for globalLlmConfig
yeah, i am looking at loadGlobalLlmConfig function right now
I allow all clients to share the same llm config, by setting it on a shared global var. The first client to "hydrate" it wins... That has to be an external client. Otherwise module clients trigger a session attachable lookup in their own "host" and of course there is no config there
Generally what I'm trying to do:
- Whenever a new session is initialized (not just a new client in an existing session): use session attachables to load a llmconfig.
- Attach that llmconfig to the session
- Allow all other clients in the session to share the same llmconfig
Note: I'm refactoring how the llmConfig loads its configuration, but the mechanics remain the same: it needs a dagql.Server (and it has to be for the root client of the session), and it uses it to call session attachables
in other words my refactor doesn't change the fact that it's a hacky global variable that needs to be manually hydrated from the right client. It just changes how it does the hydration once called.
I think how this should work is like SSH_AUTH_SOCK..... where when we initialize the main client, it gets loaded into a store and then passed along in all the other sessions.
ah cool. Yeah sound similar. Can it be an arbitrary type, like my core.LlmConfig?
and this is where I think we load this using clientMetadata: https://github.com/shykes/dagger/blob/llm/core/schema/git.go#L157
i have never used .env yet. but does loading .env work outside of the llm feature itself? or is this something new we are adding as part of this feature
.env is orthogonal, I'm refactoring so that llmconfig looks directly in the client env (and optionally will also look for .env, as an experimental convenience)
so it's part of the "how llm config is hydrated" implementation detail
under the hood I only use very standard secret() calls
However, this method doesn't rely on the sessionAttachable -- It would work
yeah, i dont see a sessionAttachable implemented for llm
trying to follow SSH_AUTH_SOCK pattern for llm config....
@meager summit I don't use a custom session attachable. I just use secret()
Oh you're using the secret implementation from Andrea to retrieve the value at .env, and from that
Not sure if I should be using secret(). It's just my hack for reaching into the client system easily
yes exactly
secret("file://.env") and then more secret() calls if needed
Now refactoring to also do secret("env://OPENAI_API_KEY") etc
to be clear I am not surprised by the current behavior of my hack. It makes sense that it works that way. Just looking for a clean design to refactor it to not rely on a global variable, and not require "manual hydration"
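For reference, a minimal sketch of the lookup order being described (client env first, then an optional .env file), in plain Go with no Dagger APIs. The helper names parseDotEnv and lookup are hypothetical, and the precedence between the env and .env is an assumption, not what the llm branch necessarily does. A missing variable is reported as "not set" rather than as an error, which would sidestep the noisy trace errors for unset variables.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseDotEnv parses KEY=VALUE lines, skipping blank lines, # comments,
// and lines without an "=" separator. Surrounding quotes are stripped.
func parseDotEnv(content string) map[string]string {
	vars := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		vars[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(value), `"'`)
	}
	return vars
}

// lookup prefers the environment (assumed precedence), then falls back to
// the parsed .env values. A missing key yields ok == false, not an error.
func lookup(key string, getenv func(string) (string, bool), dotEnv map[string]string) (string, bool) {
	if v, ok := getenv(key); ok && v != "" {
		return v, true
	}
	v, ok := dotEnv[key]
	return v, ok && v != ""
}

func main() {
	// Stand-in for the contents of a .env file on the client host.
	dotEnv := parseDotEnv("# local settings\nOPENAI_API_KEY=\"sk-from-file\"\n")
	for _, key := range []string{"OPENAI_API_KEY", "ANTHROPIC_API_KEY"} {
		if _, ok := lookup(key, os.LookupEnv, dotEnv); ok {
			fmt.Println(key, "is set")
		} else {
			fmt.Println(key, "is not set")
		}
	}
}
```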
Ok, and so, this LLM type is part of the core API I suppose?
So, from memory, for what you're trying to do, which is expose the secret across all the modules, this secret needs to be exposed in the dagql schema -- meaning that dagql needs to know that it exists
The way I did it was: https://github.com/shykes/dagger/blob/llm/core/schema/git.go#L254-L286. But, I had the withAuthToken core API to help me, in your case -- I have to check if Andrea left a helper (or you could create yours, which is ok)
So, in your case, you would need to have the same withLLmToken helper -- with that, in the call calling the sessionAttachable, you would do the same thing (more or less), as the thing we do on the link above
==> This is the path I would take, Rajat
Yes, part of the core API
the path you mentioned @obsidian rover, it seems to use GitCredential attachable and attach the withToken to dagql.Instance[*core.GitRepository]
are you suggesting we do similar thing for llm, where we attach llmToken to dagql.Instance[*core.LLM]?
where does the following fit in:
Oh you're using the secret implementation from Andrea to retrieve the value at .env, and from that
Yes exactly -- Solomon manages to retrieve the token at loading of the module, that's why his "hack" works -- the issue is when other modules / types need it, it is not available -- thus making it accessible in the dagql object
Mmmh, Solomon is already retrieving the secret the way he wants using the secret() helper that relies on another session attachable. What he wants now is for the secret of the main session to always be accessible when other modules or types request for it (which is not the case atm) -- right now when the module needs the secret, it requests the session attachable in the context of that module, not the host
does retrying help?
no
works for me, but not for him, very strange
ah, I think after giving it 15mn, retrying seems to work now
ok it works. Just very bad timing
one diff I am noticing is that when calling dagger core llm --model abc, the client is the MainClient but when calling dagger call -m toy-programmer go-program --assignment "develop a curl clone" terminal, the client that calls dag.Llm is the module's client.
I guess that bit is obvious.
but, what we need, for the solution that @obsidian rover suggested, is to be able to load the .env file during the load of the toy-programmer module. (noticed an open issue for that: https://github.com/dagger/dagger/issues/9584) so that we can load it from dag using withConfig when creating Llm object
@meager summit two notes:
- As of my last push, it's not just a single .env file, but also looking up individual env variables:
  - secret("file://.env")
  - secret("env://OPENAI_API_KEY")
  - secret("env://ANTHROPIC_API_KEY")
  - And a few others
  Not ideal though, as those show up as errors in the trace for each env variable that is not set.
- I was thinking that ideally, you do the loading of the above when a new session is initialized, using that session's root client (so, not a module). Then later, every time a new client is initialized in the same session (so, from a module), just re-use the same llm config that was loaded from the root client. That way you don't have to do anything special at module loading. You just need to hook into client initialization, and handle it differently depending on whether the client is the root of the session or not. One bonus is that this way, it will also work for e.g. nested dagger clients inside a dagger run, even though there is no module involved
@tepid nova do send me a ping once you're online, i'm kind of out of the loop on the details of the llm impl, so would be good to sync.
but generally, i think the place we'll want to distribute config is going to be in daggerSession, probably something generic with a little kv map you can get and set (maybe lazily though)? I think that follows the semantics of what we're going to want, ideally we shouldn't be trying to play client shuffling games in the core api package, since that gets really tricky really quickly. the session is already the place where we have all that shared data, so it makes sense to put the config there
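A rough sketch of what such a session-level kv map with lazy loading could look like, in plain Go. Everything here is hypothetical (sessionStore, Register, Get are not the engine's actual types or methods): the loader is registered once when the session is created, so it is bound to the root client, and any later client in the session just calls Get and shares the cached result.

```go
package main

import (
	"fmt"
	"sync"
)

// entry holds one lazily loaded value; once guarantees a single load.
type entry struct {
	once  sync.Once
	load  func() (any, error)
	value any
	err   error
}

// sessionStore is a hypothetical per-session key/value store.
type sessionStore struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newSessionStore() *sessionStore {
	return &sessionStore{entries: map[string]*entry{}}
}

// Register binds a loader to a key. In this sketch it would be called at
// session initialization, with a loader that talks to the root client.
func (s *sessionStore) Register(key string, load func() (any, error)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[key] = &entry{load: load}
}

// Get runs the loader on first use and returns the cached result afterwards,
// no matter which client in the session is asking.
func (s *sessionStore) Get(key string) (any, error) {
	s.mu.Lock()
	e, ok := s.entries[key]
	s.mu.Unlock()
	if !ok {
		return nil, fmt.Errorf("no loader registered for %q", key)
	}
	e.once.Do(func() { e.value, e.err = e.load() })
	return e.value, e.err
}

func main() {
	store := newSessionStore()
	loads := 0
	store.Register("llmconfig", func() (any, error) {
		loads++ // stands in for a session attachable call on the root client
		return map[string]string{"model": "example-model"}, nil
	})

	// Three clients in the same session ask for the config.
	for i := 0; i < 3; i++ {
		cfg, err := store.Get("llmconfig")
		fmt.Println(cfg, err)
	}
	fmt.Println("loader calls:", loads)
}
```

Binding the loader at registration time avoids the "first client to hydrate wins" problem: a module client can trigger the load, but the lookup still runs against whatever the root client registered.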
but let's chat a bit later, if we can work out something, i don't imagine it'll be too hard to get something sorted this week
@spark cedar handing off the kiddos (they're on vacation) & will arrive shortly
thank you!
Surprised that when I provide a git repo and branch as a Directory arg on the CLI/shell, that I end up on the HEAD commit, but not on the branch. Note green branch.
cypress-test-update https://github.com/jpadams/hello-dagger-ts#green
in terminal debug shell
i've made a dagger/dagger CI health metabase dashboard that aggregates @stray heron's manual per-version test duration/success tables and the data we've got in honeycomb. this works entirely off dagger cloud OTEL span data, and should obviate needing to check historical CI data across #8184 and "Is Dagger @ Dagger production-ready"
currently this dashboard only deals with dagger/dagger main and indexes fairly heavily off of client version, meaning you do have to think a little about the meaning of test jobs, like client version v0.15.3 running test-modules is usually testing a v0.15.4 engine (inside a v0.15.3 engine), but v0.15.3 running testdev-modules is for v0.15.3, but apart from that sharp corner it should be much easier to assess our perf progress in the future
I feel like I ask this question every time... When using interfaces, I need 3 distinct modules:
- A module to define the interface, and implement a function that receives it
- A module to implement the interface
- A module to call the function from 1, with the type from 2 as argument
Is that right?
It's the 3rd module that trips me up every time
@tidal spire I think you have a reference example of interfaces somewhere?
yeah i'll find them
And have you ever been able to use it for something useful? Or is that 3 module minimum just too high a barrier?
I really want to use it for melvin but I don't see how, without making it look scary
I think it could be useful for toy-workspace if you want to have an interface and then go/python/etc-toy-workspace
Ok so I might try this
- Workspace defines interface Middleware
- Go: type GoChecker implements WorkspaceMiddleware
- Programmer calls Workspace with GoChecker as argument
Of course I can't just instantiate a Programmer.GoChecker so if my Go module is a generic "utilities to develop in Go" module, I need additional plumbing like NewChecker from the root.
Maybe the right pattern is to make modules ultra small, like each module is basically one type
Yeah that sounds good to me. I don't know where my working example of interfaces went. From what I remember it worked similarly to the example in the dagger tests
for nested clients, do we support ssh-auth-sock already?
Hey <@&946480760016207902>, I know we've said that the --workdir flag is a legacy option that can be removed, but isn't dagger session using it from automatically provisioned clients (depending on config used in the script), thus this possibly is a breaking change for some users?
@still garnet have we changed anything around the tui/otel in v0.16? difficult to explain, but it feels like some execs spend a lot of time in pending states instead of getting marked as running. not really sure, could just be my eyes playing tricks on me.
--workdir removal
@rancid turret if you're not inside a module... should the current directory be uploaded? if i run dagger shell in my home dir on v0.16, it seems like it's trying to upload the whole thing, while previously i think it just used to immediately jump into a case where no module was loaded
it gets stuck doing:
Depends on your current context. Did you start it with --no-mod?
What do you mean?
Shell outside module
Shell: lazy arguments
is there a way to tell from TUI, how deep we are in terms of nested clients?
uh. what are you trying to do (i don't think there's a way of doing it really, but kinda curious what you're up to, there might be another way of doing it)
I am trying to debug this error when running SDK checks in my branch:
trace here: https://dagger.cloud/rajatjindal/traces/649db4c5a5886f63b406420cb3322d25
this is the PR i am debugging: https://github.com/dagger/dagger/pull/9323
do you have any pointers for this @spark cedar
mm, i'm not actually sure, @civic yacht might have some idea here? or @obsidian rover since it's related to the git config.
it's actually not related to gitconfig. we just want the socket mounted in SDK and it works when calling dagger develop/call etc.
BUT
it fails when running sdk check using hack/dev > hack/with-dev dagger call check --targets=sdk/go (basically from within .dagger module)
Is there a way to bypass compatibility check when connecting with the SDK (i.e dagger.Connect)?
any chance i could get a review on https://github.com/dagger/dagger/pull/9555 ? i'd really like to clean out mage, and simplify our dev engine build process.
it also lets us skip dev and with-dev in quite a few cases (though sadly not on macos)
Is there a way the module runtime can access data from the host?
The goal is to be able to read a specific maven (java) configuration file that is used to configure the repositories to use when the runtime will pull dependencies to build the java module (so this has to happen in all phases, dagger init/develop/call/functions)
For instance would it be possible to add at a global or module level a secret that will be accessible from the runtime code?
I'm not sure I've seen anything in that area, but I might have missed it.
ah, there is no way to do this - no modules (even runtimes) have host access
this feels incredibly related to the talk around #1338841380960866345
I'll read the full thread, but yes this looks related
In the Java world it's very common to have custom configuration to access dependencies, or to use a proxy for instance. Especially in big companies. And it's all defined in a file that it would be great to have access to
ah! okay, so there are two things here, we need two "types" of config
there's per-module config, and per-engine config
i think that in the general case, if it's a company proxy, you probably want to enforce this at the engine level
i want to avoid putting company specific proxy details into a module wherever possible - otherwise we just make it harder to run modules on any engine. the (at least my) goal is that we decouple engines and modules as much as possible - you shouldn't ever feel like you need to spin up a pet engine to run a module with specific requirements
approved!
Shell: errors in otel spans
wonderful, merged it
(for anyone who starts seeing insane build errors, it's probably because of this, apologies in advance if that happens)
Shell: Bubbletea-ify shell
Sorry to bother again, but could I get some reviews from <@&946480760016207902> on
- https://github.com/dagger/dagger/pull/9605 (already approved by @ornate ridge but he doesn't have the permission to put the green tick)
- https://github.com/dagger/dagger/pull/9655 (cache/performance improvement on the runtime, easy one I think)
- https://github.com/dagger/dagger/pull/9523 (I know this one is more complex... I can help to review it if needed. The integration test can also give an interesting overview of the main addition)
And I already have more changes that are ready to be pushed as new PRs (I'll open the PR right after those ones are merged)
I'd love to have them all merged by Monday (so today if possible) as they are all quite important for the Java SDK and docs are based on those.
I added some reviewers. Thanks @leaden glade!
lol already hit one myself: https://github.com/dagger/dagger/pull/9675
fyi, some php tests are failing on main, looks to be related to https://github.com/dagger/dagger/pull/9625, discussing in #php
@leaden glade looks like java jobs are also failing after a merge
somehow looks like https://github.com/dagger/dagger/pull/9605
Here is the fix: https://github.com/dagger/dagger/pull/9677
I ran into some issues with this new build command:
- flags are set to --dagger-cloud-token and --dager-cloud-url but shouldn't they be --cloud-token and --cloud-url instead? https://github.com/dagger/dagger/blob/c77c673232c37810306de56a7013f7c17fb976f4/.dagger/docker.go#L92-L95
  That's also what is exposed by the .doc shell function:
Start the loaded engine container
USAGE
start [options]
OPTIONAL ARGUMENTS
--name string (default: "dagger-engine.dev")
--cloud-token Secret
--cloud-url string
--debug bool
RETURNS
void
- when one has a value (using --dagger-*) the shell command silently fails. The engine | load-to-docker | start is never executed. My guess is it's because the --dagger-* flag is not valid. But it means we are probably missing some kind of error message to be shown to the user. It took me a while to understand that was the issue.
- if I change hack/build to use --cloud-token and --cloud-url, it then fails (I only have a DAGGER_CLOUD_TOKEN env var, containing the token as the value):
Container.withSecretVariable(
  name: "DAGGER_CLOUD_TOKEN"
  secret: no(digest: "xxh3:5c1589dbe03be857"): Missing
): Container! 0.0s
.withExec(args: ["docker", "run", "-d", "-e", "DAGGER_CLOUD_TOKEN", "-v", "dagger-engine.dev:/var/lib/dagger", "--name", "dagger-engine.dev", "--privileged", "localhost/dagger-engine.dev", "--extra-debug", "--debugaddr=0.0.0.0:6060"]): Container! 0.0s
! secret env var not found: "dag..."
oop
nice, i'll dig into this, i know what it is
I really like how the llm type gets WithX functions generated automatically. It would be cool if my modules could generate WithX for each of the type's fields similar to how we generate a getter function. A lot of us end up writing the With funcs for our modules anyway, but we might as well generate them. With a way to opt out like you can with the getter
There's a subtlety between constructor and withX though.
If you add a withX, llms will see it... if you add an argument to the constructor, they will not
For that reason I think generating a constructor might be safer
Not sure I follow. I'm talking about general modules, not specifically related to llm
So if I have the type
type Foo struct {
Bar string
}
I'll codegen Foo.Bar() string as well as Foo.WithBar() *Foo
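Roughly what that generated pair could look like on the client-binding side, as a hedged sketch. Assumptions: the field is unexported here because a Go type cannot have both a field and a method named Bar, and WithBar returns a modified copy (the usual immutable WithX style) rather than mutating the receiver. None of this is what the SDK codegen actually emits today.

```go
package main

import "fmt"

// Foo is a stand-in for a generated client binding of the module type.
type Foo struct {
	bar string
}

// Bar is the getter, analogous to the one already generated for fields.
func (f *Foo) Bar() string { return f.bar }

// WithBar is the proposed generated setter. It copies the receiver so the
// original value is left untouched and calls can be chained.
func (f *Foo) WithBar(bar string) *Foo {
	next := *f
	next.bar = bar
	return &next
}

func main() {
	a := &Foo{}
	b := a.WithBar("hello")
	fmt.Printf("a.Bar()=%q b.Bar()=%q\n", a.Bar(), b.Bar())
}
```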
Yeah, what I mean is that until the llm branch, I felt that between a) WithBar() and b) New(bar..) there was no meaningful difference, purely a matter of personal preference.
But with the llm branch there is a real difference: one will be visible over BBI, the other not, i.e. one is outside the sandbox, the other is inside.
Ah I see. So you're saying in a world with llm our modules generally shouldn't have WithX patterns
I don't know, I personally like the pattern, but it's no longer a straight up equivalent of constructor, there may be new rules to figure out for when to use which
I noticed today that the generated code (at least in the PHP SDK) from the LLM does not match casing? https://github.com/jasonmccallister/laravel-assistant/blob/main/src/LaravelAssistant.php#L62
good catch - the getter function is currently capitalized, probably shouldn't be. will fix
pushed a fix, can tag a release tomorrow
@hasty basin i think you ran into this with typescript last week
yep, had to ask for the CypressWorkspace back with leading-capital PascalCase here. I was using an IDE at the time and so I took the hint
https://github.com/jpadams/cypress-test-llm-ts/blob/main/cypress-test-writer/src/index.ts#L34
then later when I used an LLM to turn the ToyProgrammer from Go to TypeScript, I fed it to ChatGPT and the LLM assumed it should be leading-lower camelCase (as did I, since all the other functions are that way).
let before = dag.toyWorkspace(); <<<< lower
// Run the agent loop in the workspace
let after = dag
.llm()
.withToyWorkspace(before)
.withPromptVar("assignment", assignment)
...
.toyWorkspace();
but it needed to be
.ToyWorkspace(); <<<< upper
Was in the client.gen.ts as
/**
* Retrieve the llm state as a ToyWorkspace
*/
ToyWorkspace = (): ToyWorkspace => {
const ctx = this._ctx.select(
"ToyWorkspace",
)
return new ToyWorkspace(ctx)
}
vs
/**
* Set the llm state to a ToyWorkspace
* @param value The value of the ToyWorkspace to save
*/
withToyWorkspace = (value: ToyWorkspace): Llm => {
const ctx = this._ctx.select(
"withToyWorkspace",
{ value },
)
return new Llm(ctx)
}
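For illustration only, the casing rule the generated getter would presumably need (lower-casing the first letter so ToyWorkspace lines up with withToyWorkspace) can be sketched in Go as below. The helper name lowerCamel is hypothetical, and this is not the fix that was actually pushed. It only touches the first rune, so a name that starts with an acronym would need extra handling.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// lowerCamel lower-cases only the first rune of name and leaves the rest
// untouched. Names that already start lower-case are returned unchanged.
func lowerCamel(name string) string {
	if name == "" {
		return name
	}
	r, size := utf8.DecodeRuneInString(name)
	return string(unicode.ToLower(r)) + name[size:]
}

func main() {
	for _, name := range []string{"ToyWorkspace", "withToyWorkspace"} {
		fmt.Println(name, "->", lowerCamel(name))
	}
}
```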
api question in tests: is there a way to do the equivalent of privileged nesting, but not nested? i need access to the api, but i don't want it to be a nested session, i want to start a new one.
reasoning being, attachables - the git attachables are attached only at the original client, but i'm writing a test to programmatically create one
hm, i'm clearly missing something actually, since it's attached both per session and per client?
@civic yacht @obsidian rover am i missing something really obvious here? i'm trying to flesh out our module tests, i realized there's no full end-to-end integration git credential tests for PATs for private modules as we document in https://docs.dagger.io/api/remote-modules
this is my attempt to work on https://github.com/dagger/dagger/pull/9697
i do not possess these
@stray heron is there any way i can get access to the azure creds? i need to push to our test repo at https://dev.azure.com/daggere2e/public/_git/dagger-test-modules
i've pushed everywhere else
its in 1password
oh it's not -- you're right
i did add a new private+public keypair into the bitbucket e2e tests, since i couldn't find those anywhere
those, you can find with e2e
they weren't before
Ah smart
okay, i'll hold off on the azure ones for now
(resolved out of band)
Added the Azure section in the password manager -- But, the key has write access across all those repos. Sorry, should have been better documented
otel: live progress
Is this issue still correct? https://github.com/dagger/dagger/issues/6599
If so, can we narrow down its title to be more precise, and less ominous/scary?
I renamed the title to bring the FUD to a reasonable level
Started using the new dagger.io/ui.reveal=true UI cue for our tests: https://v3.dagger.cloud/dagger/traces/1baf7de7d6efd7e34a0120b94db10187
Did we make any recent change that would cause dagger shell -c 'engine | service FOO | up' to not be reachable on tcp://localhost:1234? Nevermind, the build was just unusually slow (hanging on an external dep?)
anyone know what's up with TestElixir? seems to be failing across all PRs https://v3.dagger.cloud/dagger/traces/0048afc0f93ac91d028dc5b108dbb63e?span=8ab2816023cdd155
does the engine hang for anyone else when running any of dagger call test {list|specific|telemetry}
./hack/dev is also stuck in that same step 
removed dev-engine-related containers and volumes too
there might be activity offscreen - try pressing - (decrease verbosity)
I think it was stuck because of the change I made to labels.go
but it's weird that it got stuck there
Sorry if this is obvious, but when doing something like asService().up({ports:[{backend:1234}]}) is there a way to retrieve programmatically the tunnel's host-side endpoint as it's in the output logs?
I don't believe so. But I believe you can specify the frontend port (of course there is a risk of conflict)
instead of using Service.up, you can use Host.tunnel(Service).start and grab the Service.endpoint: https://github.com/dagger/dagger/blob/789200f43579a799b237c660e2faa79a83404104/core/integration/services_test.go#L1755-L1788
maybe Service.up should return the ServiceID! instead of Void 
after spending so many hours bikeshedding this, I'm ashamed that I forgot about it
Ah thanks, I thought this was only for a server on the host to which a dagger service would connect as a client. I'll try it, thanks!
thanks @obsidian rover
it's very likely ./hack/dev was stuck in that step because I did not have enough disk space
after freeing up some space I can now build a dev engine
not 100% sure this was the fix, but seems like it
My understanding for the reason why up returns void instead of service ID is because it returns once the service terminates. So it's less useful then.
ah right right - forgot it blocks
That's exactly what we were about to hack on the side
Yeah you got a nasty one -- i think it was failing at a moment where no log would be printed out --'
is it possible to install additional packages into a running engine? (without having to build the image again)
i think you can do apk add?
oh no, you get an error
uh, no you can't sadly - there are some technical constraints, see: https://github.com/dagger/dagger/pull/9117
ok (sorry, I am deep into buildkit/runc world trying to debug why my test won't work. possibly the same issue that you mentioned for PAT).
has anyone ever hit the engine just dying from SIGSEGV and core-dumping somewhere on the system? with no message on stdout/stderr?
i seem to have started seeing this upon super rare occasions, i'm wondering if maybe related to the recent 1.24 go upgrade?
I have seen that multiple times today.
every time, the logs just stopped at the cloning of a git repo, I believe
huh yeah exactly that
i've not seen it in ci, which is very mysterious
only locally
(I have since reset my docker and changed the virtualization to Docker VMM (I had it on QEMU for some reason), and after that i've not seen the issue yet.)
haven't seen it (yet), linux/arm64 fwiw
yeah it seems very ephemeral
maybe seems to be related to running a lot of tests at once?
i hit it while running a lot of git tests in https://github.com/dagger/dagger/pull/9697, and while trying to take memory profiles of https://github.com/dagger/dagger/pull/9395/
huh i can hit this incredibly reliably
will dive into analyzing a core dump tomorrow
On the latest main branch, I tried to call dagger develop on sdk/elixir/dev and got the following error:
$ dagger develop
✔ connect 0.3s
✔ moduleSource(refString: ".", requireKind: LOCAL_SOURCE): ModuleSource! 4.7s
! failed to load sdk for local module source: failed to load local dep: select: local path "/Users/thanabodee/src/github.com/dagger/dagger/sdk/elixir/dev/elixir" does not exist: unknown builtin sdk
! The "elixir" SDK does not exist. The available SDKs are:
! - go
! - python
! - typescript
! - php
! - elixir
! - java
! - any non-bundled SDK from its git ref (e.g. github.com/dagger/dagger/sdk/elixir@main)
Error: failed to get local context directory path: input: moduleSource failed to load sdk for local module source: failed to load local dep: select: local path "/Users/thanabodee/src/github.com/dagger/dagger/sdk/elixir/dev/elixir" does not exist: unknown builtin sdk
The "elixir" SDK does not exist. The available SDKs are:
- go
- python
- typescript
- php
- elixir
- java
- any non-bundled SDK from its git ref (e.g. github.com/dagger/dagger/sdk/elixir@main)
The dagger I use is version 0.16.2, not a dev binary.
put together a bug in https://github.com/dagger/dagger/issues/9759
managed to get a core dump, not incredibly clear what happens but appears to be a bad deref in go's malloc impl
I'm trying with v0.16.1 since I cannot regenerate clients or run tests because of that problem :/
Starting a thread to plan the merging of llm branch
are we planning to revert this commit? I (think) I am hitting this error when running my tests in my branch (although it could be related to my changes, but i dont see any relevant logs)
yeah 😦 thinking we need to revert, i think the go patch release will have a fix for it
sadly, we're actually using features of the new go
will get to it in a little bit 😦
oh, that explains why just reverting that commit didn't work for me on my local setup. thanks Justin.
@spark cedar, just FYI, I am not blocked on this right now and able to run my tests (while I still can't explain the previous failures).
@civic yacht @still garnet would y'all anticipate any problems with tryna set success statuses on OTEL root spans? afaict we never seem to set STATUS_CODE_OK and @stray heron and I suspect that's fucking up our aggregate CI metrics bc I'm forced to treat STATUS_CODE_UNSET as success
per the OTel spec you should treat "unset" as success. Most span statuses are actually unset.
yeah, i'm aware, but i think in our CI context we wanna differentiate, especially bc there are externally enforced timeouts and i cannot tell if, for example, GHA cancels are propagating all the way down to the OTEL spans
not tryna set them everywhere, just on root spans
i also think our in-progress-spans thing complicates the spec a little in this regard
shouldn't a cancellation result in ERROR? either from the interrupt being interpreted (graceful exiting), or from the API timing them out (ungraceful), which sets EndTime at the same time
like i don't think there's a state where a span has an EndTime and Status=Unset if it's interrupted, given enough time for the timeout jobs worker to do its thing (timeout is 5 minutes, i believe it runs every minute)
lemme poke through the data using endtimes, i hadn't thought of that
@still garnet turns out i'm already filtering out null endtimes, but there are some endtimeless spans in our data... just gotta figure out the horrible SQL incantations necessary to handle essentially null durations
there was a period of time where the API wasn't properly converting unfinished endtimes to null/unset in ClickHouse, maybe that's related
when? i don't have a shit ton of them, it's actually tempting to just call it noise... especially if you tell me that that period was 0.15.3-0.15.4
that period was 0.15.3-0.15.4 - it came with the OTel bump https://github.com/dagger/dagger/releases/tag/v0.15.3
the fix was only applied to Cloud on Feb 25
seeing something a bit weird on main - i've updated ./hack/build:
diff --git a/hack/build b/hack/build
index 3020771e1..d1b455c51 100755
--- a/hack/build
+++ b/hack/build
@@ -4,7 +4,7 @@
# the engine in the host's docker runtime.
# HACK: strip "build" from the script path to get the parent module
-.use ${0%/build}/..
+.cd ${0%/build}/..
CONTAINER=${_EXPERIMENTAL_DAGGER_DEV_CONTAINER:-dagger-engine.dev}
VOLUME=$CONTAINER
but now, when running it, there seems to be a little pause in-between loading type definitions and load modules that didn't use to be there: https://asciinema.org/a/CoZQ1BMpuXFC5E4e3Sq40PF6X
fyi - i think we have to downgrade go to 1.23 😢 (and stop using a bunch of cool features)
we're hitting incredibly consistent SIGSEGVs for amd64 which we should probably help out and track down (but since it's not super reproducible, it's hard) - in the meantime, the best fix is to stop using 1.24 until we can get upstream fixed:
https://github.com/dagger/dagger/pull/9766
Just started using rand.Text() yesterday. But the function is small so I can copy it until we're back on 1.24.
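For reference, a local copy of that helper could look roughly like the sketch below. It assumes the behavior documented for crypto/rand.Text in Go 1.24 (a 26-character string over the standard base32 alphabet); `randText` is a hypothetical name, and this is not the code that was actually copied into the repo.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// Standard RFC 4648 base32 alphabet.
const base32Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

// randText is a local stand-in for crypto/rand.Text, usable on Go 1.23.
func randText() string {
	// 26 base32 characters carry 130 bits, enough to cover 128 bits of randomness.
	buf := make([]byte, 26)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // no sensible fallback if the system RNG fails
	}
	for i := range buf {
		// 256 is a multiple of 32, so the modulo introduces no bias.
		buf[i] = base32Alphabet[buf[i]%32]
	}
	return string(buf)
}

func main() {
	fmt.Println(randText())
}
```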
yeahhh the one that really hurts me is we started using generic type aliases, and now without them, some of our function signatures become hundreds of characters long
i've reached out to sebastian at docker to see if they've made any headway on the same thing we're seeing (there's a bug where they get exactly the same backtrace, i suspect it's similar)
API design question for Git 🧵
Power is out here and has been all morning, won't make the architecture call
Who has access to io.dagger namespace on maven central (java repository) and could it be possible to have my user added to it?
I'd like to publish some dev libraries, to test the publish workflow (and the result). It will only be -SNAPSHOT versions, nothing stable, but to ensure everything is on track before to start enabling it for real.
cc @stray heron ?
If you call Directory.AsModule() on a directory with a dagger.json that specifies a non-existent local dependency like "../.." you get stuck in an infinite loop. Don't ask me how I know
Could I get a quick ✅ please: https://github.com/dagger/dagger/pull/9793
publishing via Netlify now
@hasty basin thanks - heads up I just pushed llm.6
hahaha
Note: the latest version is 0.17.0-llm.x. It was probably released moments ago...
fixed
will PR change
thank you
Could anyone have a look at https://github.com/dagger/dagger/pull/9767 please?
This is a PR to prepare publication of the Java libraries. Publication is required to get https://docs.dagger.io/api/sdk/#custom-applications working in a nice way.
This PR also makes a clearer distinction between (I have no idea if there are different names for these) the SDK for modules and the SDK to integrate into custom apps.
The diff is easier to read commit by commit. Each commit has a specific goal, which helps isolate the noise of updated tests.
No real java code involved, just go modules and maven manipulation.
Improve versioning to prepare the publication of the different components to maven central.
Related to #9194
Warning: Breaking change
This requires some changes in your module source code.
In order t...
And if someone has a bit of extra Java capacity, I have this one that's not that big: https://github.com/dagger/dagger/pull/9763
Again, better to review commit by commit. The first commit is the feature itself, with integration tests that help to see what's going on. The second commit changes the indentation of a file, so it's big for nothing...
@civic yacht @spark cedar we had a CI duration spike yesterday... might be back to our better trendline today, but curious if either of y'all already know what the spike was
investigating the glibc issue 🧵 /cc @fair ermine
Thanks a lot!
FYI as discussed today. Blockers for merging llm branch
Overview This issue tracks the work required to ship native LLM support in the Dagger Engine, as prototyped in #9628 . Blockers LLM access control. All modules have access to the host's LLM end...
Added known owners + HELP WANTED
for the first time in a long time, I'm getting an engine build error that doesn't seem related to the engine code: https://v3.dagger.cloud/dagger/traces/c0c81d161633eae3c43f9547eebd3d01
pull main and re-run, you need this commit specifically https://github.com/dagger/dagger/pull/9800
sadly I need to merge main first (edit: no big conflict, yay)
@still garnet feature request: Would be nice to get pagination back in Cloud (and/or filtering by user?)
I got bombed by @dense dust and my trace went immediately on page 2
lol, yeah that's been a menace. you can go to https://v3.dagger.cloud/dagger/follow to follow your own/other people's local traces, but browsing by user would be great too, and/or filtering on the dashboard (maybe split the section in half?)
/follow ohhh sneaky
💣
folks, I don't know what changed, BUT, the output in UI and in TUI (especially during running tests) looks SO READABLE now. Thank you. Thank you.
has anyone else seen "waitid: no child processes" error when running Dagger's integration tests
I know Justin is off today so if @civic yacht or @still garnet have time to review this PR, that would be amazing
It solves the embedding problem of dependencies that we talked about during Wednesday's meeting
I'm trying to understand where loadModuleSourceFromID is implemented on the graphql server side; I only see the parts building the query on the client side
dagql/server.go
InstallObject() I believe
the trick is that all load<XXX>FromID handlers are dynamically registered for each object type XXX
ahhh it's because i couldn't grep for it, it's load%sFromID, thanks
I know by chance just because that's the exact vicinity of where we added the LLM middleware magic
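The reason grep came up empty can be shown with a small Go sketch of the pattern: the field name is built from a format string at registration time, so the literal name never appears in the source. `Registry`, `Handler`, `Has` and `Call` are hypothetical names for illustration; only the `load%sFromID` format and the `InstallObject` name come from the conversation, and dagql's real types and signatures differ.

```go
package main

import "fmt"

// Handler is a hypothetical resolver signature, not dagql's real one.
type Handler func(id string) (string, error)

// Registry maps GraphQL-style field names to handlers.
type Registry struct {
	fields map[string]Handler
}

func NewRegistry() *Registry {
	return &Registry{fields: map[string]Handler{}}
}

// InstallObject registers a loader for the given object type. The field
// name is computed, so "loadModuleSourceFromID" is never written out.
func (r *Registry) InstallObject(typeName string) {
	field := fmt.Sprintf("load%sFromID", typeName)
	r.fields[field] = func(id string) (string, error) {
		return fmt.Sprintf("%s(%s)", typeName, id), nil
	}
}

// Has reports whether a field was registered.
func (r *Registry) Has(field string) bool {
	_, ok := r.fields[field]
	return ok
}

// Call dispatches to a registered field.
func (r *Registry) Call(field, id string) (string, error) {
	h, ok := r.fields[field]
	if !ok {
		return "", fmt.Errorf("unknown field %q", field)
	}
	return h(id)
}

func main() {
	r := NewRegistry()
	for _, name := range []string{"ModuleSource", "Container"} {
		r.InstallObject(name)
	}
	fmt.Println(r.Has("loadModuleSourceFromID")) // true
}
```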
how cautious do we need to be changing linter module rules in dagger/dagger? there's this annoying one from pre-go-1.20 that i keep hitting while changing code in the dev module, basically doesn't allow fmt.Errorf("%w %w", err1, err2), bc you couldn't do that in 1.19.
does this need treatment like a breaking change to a public module? 1.20's been out since 2/2023, so theoretically 1.19 has been out of support for nearly 2 years
yup, let's just change it
it's not really used anywhere else afaik, so i wouldn't worry as long as our linting keeps passing
shell: completions
FYI, regarding the wolfi changes we saw upstream: https://www.chainguard.dev/unchained/wolfi-moves-to-usrmerge-standard
https://github.com/orgs/wolfi-dev/discussions/40270
To enhance standardization and compatibility with modern Linux applications and other major distributions, Chainguard is adopting the usrmerge filesystem layout.
Majority of glibc based distributions use usr-merged file system layout. Some are also moving sbin merged systems. For wolfi to remain a drop-in replacement for these, it is best for Wolfi to adopt...
They fixed up the issue that impacted us too (which was extremely specific to the glibc package) https://github.com/wolfi-dev/os/pull/45154
could version pinning have saved us?
I wonder why I got a cache miss here?
adding terminal
the apt update?
Seems to be cached without terminal and not cached with it.
FYI there seems to be a regression in how the Dagger engine loads the Elixir SDK?? I have a very very confusing repro matrix that I'm working on. Initially thought it was related to the llm branch... but no. It affects 0.16.3 at the very least
@still garnet random services question 🧵
I am looking at fixing https://github.com/dagger/dagger/issues/5691 (Prevent a .dockerignore papercut when instructing users on re-using Dockerfiles)
🧵 about how I am proposing to fix this
omg ./hack/dev is so much faster now - ~34s, with engine changes! feels like just recently it was 2 mins, then 1 min
wait nevermind. it's because there was a compilation error offscreen.
looks like our new hack/dev script doesn't do the "Error Logs:" hoisting
im regularly getting ~40s
and boy is it a huge improvement, my sanity is only mildly tested now
the parallelism is pretty nifty too
makes me want to change export APIs to print a linebreak though
/home/vito/src/dagger/bin/home/vito/src/dagger/bin/engine.tar
i now also need to figure out how to make it ignore CWD for export
i have ~/src/dagger/hack on PATH and frequently invoke dev in directories other than ~/src/dagger, and right now it's dropping output into CWD/bin
which is not desirable
yesssss, have no idea what specific change is making it work better but, yayyyyy!
(either some dagql session caching change, or maybe the asTarball caching?)
anyone know anything about this new looking for module span? it's only visible at higher verbosities, but it takes quite a while, possibly stealing work from the later initializing module span
Yeah, I added that as part of filesystem navigation. It's called when you .cd and when you start dagger without -n. It's used to get more data not in the initial inspection in order to resolve paths, but @dense dust worked on a PR to make those calls not fail if the wrong source (e.g., modSource.commit fails on a local source), so we could preload this data there.
are you thinking on working on top of https://github.com/dagger/dagger/pull/9812 ?
I only need to add some tests, should be mergeable afterwards
sorry if this is obvious, I'm trying to access the current module from the CLI and I'm doing this:
q := querybuilder.Query().Client(engineClient.Dagger().GraphQLClient())
mod := q.Root().Select("currentModule")
var response any
if err := makeRequest(ctx, mod, &response); err != nil {
return fmt.Errorf("error making request: %w", err)
}
fmt.Println(response)
but I get "failed to get current module". I know I'm doing something stupid I just don't understand what exactly.
That will only work inside a module. It refers to the module the function is a part of.
Hey @rancid turret , would you be around for a quick sync on how to query our LLM.MCP() function from the CLI? A bit lost atm 🤣 Basically, we dynamically generate those withFoo functions to pass any module as context to an LLM object and expose its functions as tools. We would like to add an MCP command to the CLI.
Sure
We're on team audio
dev-audio actually
can we please please rename our socket so that it doesn't have "buildkit" in the name?
Yeah, on-brand for Theseus so I'll do that quick. For the sake of avoiding painful back-incompat I'll leave a symlink or something for the old one
private deps support re: PR https://github.com/dagger/dagger/pull/9323 🧵
Fixes #8766 by implementing #7787.
API question: our llm integration tests rely on the ability to "replay" conversations, so the core LLM api needs to support this.
FYI @fair ermine I think this will be a blocker for shipping generated clients: https://github.com/dagger/dagger/issues/9889
super annoying
ye, lol, he actually took out the error, and then i told him to put it back in π
my bad, we should just take it out
Also found this while I was poking around: https://github.com/dagger/dagger/issues/9890
it's related to shell & llm use cases too - relying on modules not just as "buckets of dagger types", but also "buckets of dependencies", and allowing one to be used without the other
If a client is set, the error will not be triggered, oh but in your case it will still fail yeah..
dagql question... How do I add an argument that is 1) optional and 2) an array of strings?
Args: dagql.InputSpecs{
{
Name: "name",
Description: "The name of the variable",
Type: dagql.NewString(""),
},
{
Name: "value",
Description: fmt.Sprintf("The %s value to assign to the variable", typename),
Type: idType,
},
{
Name: "functionMask",
Description: "Array of function names to mask (hide) in this binding. Use this to restrict LLM access to parts of an object",
Type: dagql.Optional{Value: dagql.NewStringArray()}, // this is wrong...
},
dagql.Optional[dagql.ArrayInput[dagql.String]]{} i think
Thanks! Trying to add function mask real quick
Now trying to find an example of processing that array argument in the field function
ah! ToArray() looks like
update:
var functionMask []string
if arg := args["functionMask"].(dagql.Optional[dagql.ArrayInput[dagql.String]]); arg.Valid {
elmts := arg.Value.ToArray()
for i := 0; i < elmts.Len(); i++ {
elmt, err := elmts.Nth(i)
if err != nil {
return nil, err
}
functionMask = append(functionMask, elmt.(dagql.String).String())
}
}
request for comments on: https://github.com/dagger/dagger/pull/9860 (a recursive dagger develop into local dependencies)
little shell question 🧵
@rancid turret is this a regression with live spans support in Python SDK?
https://v3.dagger.cloud/dagger/traces/681539060a3479ae5100c171537ff7d0#9837c54f4361ab1e:EL366
It's a known issue but I expected it to be released by now. Since we're about to cut our own release I'm going to switch it up in the meantime: https://github.com/dagger/dagger/pull/9913
ah ha, thanks!
@still garnet, is there a need for the lock in .refresh because of llm here?
h.mu.Lock()
h.modDefs.Store(def.SourceDigest, newDef)
h.mu.Unlock()
I think I was just being cautious there, it may be unjustified
now I'm wondering if .refresh altogether is even needed still
@tepid nova has it come up recently at all?
Yeah, not justified. h.modDefs is a sync.Map. h.mu is only meant for read/writing to workdir (e.g., .cd).
Is there a way to install the latest main using standard install.sh?
I know you can do curl -L https://dl.dagger.io/dagger/install.sh | DAGGER_COMMIT=9282e31872eb55f330326b883b139e1eabb9ef0b sh to specify an exact commit, don't believe there's just main support
oh wait, apparently you can do curl -L https://dl.dagger.io/dagger/install.sh | DAGGER_COMMIT=head sh to get main 🤯
looks like there's a flake in the new LLM suite?: https://v3.dagger.cloud/dagger/traces/4328a2bf3f992f940fa9de51dade9d52?span=bf41d09f0a159d63
maybe because there's a history diff?
https://v3.dagger.cloud/dagger/traces/4328a2bf3f992f940fa9de51dade9d52?listen=a8970b90c8db01ba&listen=8a85ef5eed6e8289&showHidden=ffb51f32bf9f5b10&span=bf41d09f0a159d63#660b8904ec22cca9:L234
I was using colima, and ./hack/dev seems to have been broken for 2 releases (since 0.16.3) -- will dig more this weekend; but going back to docker fixes the issue
Where are we on live-reload host dirs from the shell? Is it within reach?
do you mean force a reload of an already synced dir? Or the "live continuous" sync (that simulates a bind mount)?
Force reload of already synced dirs is possible now, just needs someone to implement it in the API. On the engine side it should be pretty trivial via the new filesync implementation + new dagql caching features
The first one
I mean obviously live sync would be amazing. But I know that's harder, and also less urgent
Cool yeah that's ready and waiting
pretty low effort required at this point
Calling for bikeshed
I'm on Colima and have no such breakage, what are your errors?
It's probably me -- maybe it was because of a compilation error -- and it's hard to understand ?
i do think there's a known issue which can make the logs more confusing, like it doesn't exit early on error or something like that iirc?
I can confirm that with colima, on my mac host -- i can't compile the mcp branch whereas with docker I can 
(testing again ahahah)
What about main tho
let's try, from clean / system prune 
Same, confirmed. I do have this at the beginning, could it be related ?
And i start from scratch docker system prune --volumes -fa with colima
I'll keep using docker for now, not a big issue -- just a bit surprised ; might have messed up my environment
FYI bitbucket is having issues which seem to be causing some of our integ tests to fail due to getting 500s: https://bitbucket.status.atlassian.com/
Welcome to Atlassian Bitbucket's home for real-time and historical data on system performance.
We need to cut a 0.17.1 release asap... Because of a regression in codegen for dag.LLM() (capitalized as dag.Llm() in 0.17.0).
- Does this need dev work, or is it purely a matter of cutting the release?
- Who can take it?
are you doing colima prune? i occasionally have to restart with colima delete when my disk gets full or the network gets "messed up" (usually happens because of macos hibernate, haven't dug further because i rarely unplug my work computer)
Will try, I think it was due to bad luck + the bitbucket thingy above
colima start --vm-type=vz --vz-rosetta --cpu=8 --memory=16 --disk=500 while you're at it, if you're not already on rosetta it is significantly faster for dagger/dagger build and test
i wanna discord echo @stray heron 's closed PR in case anybody missed the GH notifications @civic yacht @spark cedar @still garnet : we've got some weird long-tail blocking behavior in test-module-runtimes and test-everything-else where splitting these larger GHA jobs into smaller test subsets somehow does not reduce their durations. also @stray heron , do you happen to still have links to the traces for the split? after you closed the PR idk how to link back to its action runs. (edit: actually ignore this, it's easy to get them back, i just reopened the PR)
question around --no-mod - I assume it should be allowed wherever a --mod flag is? it feels weird that it currently isn't.
for example, i can't attach it to dagger query: if there is a module in the current directory, that command will always attempt to load it 😢
makes sense to me yeah
only weird case is that you'd be able to add it to dagger call
though I suppose that would just error out
since you're supposed to use dagger core
kinda thinking out loud here. but had an idea that we could maybe simplify a lot of our service bindings using new dagop-y stuff.
i've been experimenting with putting the service startups into the actual buildkit operation for git + http, and it actually works really neatly
also means that we get caching! so the service isn't started if we never actually evaluate the buildkit operation and it's cached
that last part always bothered me
@spark cedar does it open a path to accessing the state of a service post-execution? That's often requested
not really this exact bit, since this is mostly about the lifetime of service bindings
but! Theseus in general will help us do this - we'll hopefully own every cache ref, every part of the fs - so we can keep the state easily
Problem Dagger SDKs do not bundle 100% of their code and dependencies. This means that when a developer builds their module, their build downloads third party packages from various third-party regi...
I've been meaning to write this one for a while.
While helping @fair ermine design the interface for generated clients, I asked "why?" a few times in a row, to understand the current state of our SDK bundling and distribution system.
I reached the conclusion that the way SDKs deal with their dependencies (including core client library) is completely wrong. We should bundle everything. It will make things better for our users. I already know what the objections will be (they are the reason it's designed this way in the first place). They are reasonable but not enough to change my mind.
LET'S PLEASE BUNDLE EVERYTHING
That will be a fun one (in the right way, that's an interesting thing to solve) in Java. And it will clearly help, based on the feedback regarding access to java deps in corporate environments
is there a "correct" way to handle user package locks with bundled sdk dependencies?
And is there a correct way to bundle node modules at all?
i suspect there is, but it's probably different for every package manager
which is highkey annoying
I'm more on the opposite side, where we should rely on the published library as much as possible to avoid io operations and compatibility issues.
For go it might be easier to vendor everything (even though I'm not a fan of seeing 300 files being copied to my host after a develop) but for Typescript it will be very painful I think... since we have 3 different runtimes and 5 different package managers
I think we could potentially support the option, but we should not rely on that by default
Here's a deep research link from a couple weeks ago: https://chatgpt.com/share/67e5840b-272c-800d-92c4-2cba78ae780a
Looks like:
- Node has several third-party tools including ncc by vercel
- Bun and Deno have built-in bundlers
Note: these are our dependencies, we know them in advance, so we can also solve this with static techniques
Yeah I know about bundler, but I would prefer to avoid that...
The nice and clean clients where you can navigate will become a trash file
tbh for an sdk user when you goto definition and land in dagger client code that's already an unreadable trash file
But yeah, that's doable technically, but not sure we should do it, maybe as an option
Well
Also regarding performance, we might take a hit
Let's find out!
Because that means we would need to rebundle the whole client file, based on the generated clients or we'll need to be very smart how we do it, so we have a dynamic and a static part that can work together
Don't we already have a dynamic and static part that work together?
@rancid turret I think you'll be interested by this thread
I propose we move the discussion to the github issue
I will add a note on possible performance impact
Yes, they work as real code, not bundled files
@rancid turret is it intentional that dagger core can't access all core apis? e.g. i can't do dagger core set-secret ... but dagger shell -M -c "set-secret" is possible
Yes, but it's because it's not very useful when it doesn't allow passing objects by ID. Shell doesn't have the same limitation.
Same with "cache-volume"
hm, so the context was around trying to implement a new secretprovider - i wanted to test it worked by doing dagger core secret --url libsecret://<uri> plaintext
i don't need to pass around by id
Yeah but that's the only use case then, since it's not a good idea to pass the plaintext either. The initial intent was to remove stuff from a big list of functions where it made sense. You can test it out with shell now, but if you think it's a good idea to enable it in core it's hardcoded in dagql/dagui/opts.go (shared by the TUI).
TTL for secrets providers 🧵
Is there a decent story for using secrets within dependencies / modules yet? Does Dagger Shell help at all?
Secrets providers are in (1password and vault) which is awesome, but probably not the thing you're asking about. The best hope for defaults for secrets is https://github.com/dagger/dagger/issues/9584 which I see you're already on
The providers are great, I love it. Sadly the moment you use another module, they no longer work in any capacity without having to pass everything on the CLI
Yeah, I'm subscribed to all the good issues
Awesome, yeah you're on it!
@fair ermine and/or @rancid turret can you jump on a quick call with me? regarding the dagger init no sdk PR ? i'm in #911305510882513037
❤️ thank you so much Helder for all your help!
@Vasek - Tom C. and/or @Helder Correia
what's the setting to see the graphql queries be logged ?
there's no setting for it but if you grep for "break glass" there's some code you can uncomment. (the logs will be in the engine)
Is there a known flake in the rust SDK tests?
I haven't seen any in Rust in recent memory
What about this one?
That one has had problems but has been deflaked, so I'd check the output in cloud and see what's failing
Yeah I just looked at the output, an elixir SDK step failed with output:
88 : Container.withExec DONE [1.1s]
88 : [4.1s] | OS monotonic time stepped backwards!
88 : [4.1s] | Previous time: 28555971801
88 : [4.1s] | Current time: 28555971715

Apparently GHA runners are time traveling 🤷‍♂️
quick, deploy ptp
The Precision Time Protocol (PTP) is a protocol for clock synchronization throughout a computer network with relatively high precision and therefore potentially high accuracy. In a local area network (LAN), accuracy can be sub-microsecond - making it suitable for measurement and control systems. PTP is used to synchronize financial transaction...
I was actually thinking we let them go back further in time and buy NVDA on our behalf
anyone got something like this work in your engine.toml?
[registry."registry-1.docker.io"]
mirrors = ["http://172.18.0.2:5000"]
mirror_only = true
plain_http = true
both my local registry and my custom dagger engine are on the same docker network and the engine can reach the registry at 172.18.0.2. I want it to use HTTP instead of HTTPS, not sure if there is an option that works for that. If not, I'll throw a reverse proxy in front to terminate the HTTPS but was trying to keep it as simple as possible.
Goal is to be able to pull images from my local registry while offline. Like in the poor/nonexistent connection and rate-limited environment of conference wifi.
thanks, had tried that too, but wasn't respected, it seems
hm there's also insecure
the options you use in that snippet aren't actually valid config I don't think
Might have to go for the combo approach here. Still trying with my nginx reverse proxy once more.
sh/interp Env
dagger root/shell script mode
There's a lint error on main, has anyone fixed it in a PR already? \cc @still garnet
Error: input: daggerDev.engine.lint linting failed with 1 issues:
[unparam] dagql/idtui/frontend_pretty.go:1531: `(*frontendPretty).renderRow` - `highlight` is unused
not yet, can put up a quick one
wait nevermind, i don't see this
main lint is green here: https://v3.dagger.cloud/dagger/traces/e900ed4f372328dd32f62dd81b586d7c
Oh, yeah it was fixed here: https://github.com/dagger/dagger/commit/348169795acb9713ff4551260126946d763268f4#diff-353e936c79c84beefdcfc1c393c8df3eb7b4f56a0f59ad2396d62c0f781b538e
Hi @spark cedar when you have few mins, could you please provide your feedback on: https://github.com/dagger/dagger/pull/9997#discussion_r2022388448
just noticed in https://github.com/dagger/dagger/pull/10038
a bit confused why we have LLM.mcp as a property? the naming feels confusing to me, there doesn't seem anything specific to the MCP protocol in mcp.go there? while all the logic for talking that protocol is in mcpserver.go.
cc @charred lotus @obsidian rover - maybe i'm just missing something
I'm in the process of splitting the MCP property from LLM in another PR (to remove the requirement of setting api keys to start an MCP server). I'm thinking of having a separate mcpSchema, still hidden for now. The stuff in mcp.go is related to all the tools that can be used with LLMs, so it is related to MCP, but the wire protocol itself is in mcpserver.go.
imo, i think the naming of mcp should be just for the actual wire protocol
even mcp -> mcpBackend would help
i dunno, i just spent a moment trying to understand why solomon's pr was touching mcp code
given it's not mcp specific
@spark cedar when people think "mcp" they don't just think of the wire protocol, they think of the very concept of connecting external tools to a llm. Instead of distinguishing between "mcp for external clients" and "our own system that's similar to mcp but only works for dagger-to-dagger connection, we call it bbi", it's easier to just call the whole thing mcp.
so, internal mcp and external mcp
but it's only similar right? it's not actually exactly the same, which is where my confusion came from
agreed on not using bbi tho
I would expect things called MCP to be using exactly the MCP protocol documented at https://modelcontextprotocol.io/introduction
well in theory we could set up an actual in-memory http/mcp stream and encode/decode everything over the wire protocol for internal links. It would be easy to do since the internal and external links are exactly the same in every other aspect. It would just be very inefficient to do it that way.
I guess we could call that intermediate layer "virtual mcp" like react has the virtual dom
yeah i don't really mind what we call it hugely - it's internal in our code, so honestly, it doesn't even matter if it stays as mcp
but i just wanted to raise that the current name was confusing to me, and i spent quite a few minutes trying to understand how all of it fits together
specifically seems like this line sent me down a wild goose chase
// The environment accessible to the LLM, exposed over MCP
mcp *MCP
I like this ❤️
i can't decide if this gets more or less confusing as I plug a slice of actual MCP clients into the existing struct we already call MCP 
i do think the current state where we call this "virtual mcp" just plain "MCP" is maximally confusing for new contributors showing up and poking around the AI code. it makes it harder to find what actual MCP support exists today because part of the MCP code exists to support the literal protocol and part of it exists to support generic tool calling
tbh for clarity maybe calling it "ModelContext" would do the job here? like it's not the specific protocol in every situation, but it serves the same purpose and even if you divorce the meaning of those words from the protocol i think it still makes sense
I like ModelContext
same - and I had similar feedback in the original PR. feels worth distinguishing since we'll have things that deal with the literal protocol in the same codebase
"model context" feels like a good balance of playing off MCP and also being technically accurate (it implements the working context for the model, separate from the protocol that conveys it to the model)
but then it's weird to have "context" and "environment" to mean essentially the same thing
im not sure they do need to mean exactly the same thing, like currently an environment is a collection of object-bindings and a modelcontext wraps that and provides the tool-calling "views" into it (via currentSelection and functionMask)
also that argument applies exactly the same whether you call it MCP or ModelContext, doesn't it?
(also just to stir the pot a bit more, i wondered in the past whether contextual dirs should bear any relation to the environments api)
in any case, representing an environment as a context as part of exposing it to a model feels less weird to me than overloading MCP to mean something that's not MCP. I get that the term MCP has spread far and wide and means different things to different people, but that seems like as strong an argument as any to avoid using it for a precise technical concept that it doesn't 1:1 map to
the fact that we were all independently confused by it is probably a sign of that
@spark cedar quick question regarding "experimentalizing" APIs: do you also plan on allowing an object to be hidden from Query ? I'd like to add a hidden __mcp object to Query in https://github.com/dagger/dagger/pull/10050, but the CI is complaining that i didn't generate the sdks. But I don't really want this to surface in the SDK just yet. So is it orthogonal to what you're doing ? If i had to tackle this where do you suggest i look ?
hm so any _ methods shouldn't get codegen at all
if you see codegen for them, there's a bug
but you still miiiight need to regenerate? e.g. if you added a new type
you can still hide the type from modules though
there's a list somewhere, I always forget what its called (something like typesHiddenFromModules)
Took a detour this morning to try to figure out why we (and docker) get random segfaults when trying to upgrade to go 1.24, since I was getting pretty sad not being able to use some stuff from it
Think I figured it out, pretty nasty go runtime bug: https://github.com/golang/go/issues/73141
Is it normal that TypesHiddenFromModuleSDKs is honored by __schemaJSONFile but not by core/schema.SchemaIntrospectionJSON used by cmd/introspect ? SchemaIntrospectionJSON seems to make a graphql query (dagql/introspection/query.graphql) which seems different but similar to __schemaJSONFile which does cmd/codegen/introspection/introspection.graphql. But still only one does the filtering not the other.
It's "correct" in that __schemaJSONFile is what's used to do codegen for modules (where those APIs should be hidden), but it is most definitely ugly and not super apparent by the names of the things involved
Does it mean that if i add a new hidden MCP object to Core API, it is expected to generate new graphql schemas in docs/ when i run dagger -c generate on the repo ?
+type MCP {
+ """A unique identifier for this MCP."""
+ id: MCPID!
+}
+
+"""
+The `MCPID` scalar type represents an identifier for an object of type MCP.
+"""
+scalar MCPID
+
Yeah the schemas in docs (and for external clients generally) are not expected to have any of that filtering of TypesHiddenFromModuleSDKs
If it's hidden, why are we exposing it in the docs ?
Those types are only hidden from the code you write in modules. E.g. we don't allow you to use Host in your module code. The docs/ are currently for all the APIs available to SDKs, including those that are running direct on the host
But you're right that the docs probably don't mention anything about that, they should though
Would it be valuable to add a filter on the API generated in the docs in addition to mentioning the difference between APIs available on the host vs in module code ?
Yeah, hasn't come up previously, but it would make sense to annotate those APIs in the docs as "not available to module code" or similar
Yes but in addition to that, also have a list of hidden APIs that we don't want to show up in docs ?
Oh you don't want MCP to show up at all? For any SDK, including those running on the host? I guess that feels weird since there would be APIs in those SDKs but not doc'd.
Not opposed to it though
i actually do not want to expose the API at all, not even to SDKs for now
it used to be a hidden method on LLM, but in order to remove the dependency on LLM i want it to be its own MCP schema
but as soon as it's on Query, the "underscore hiding" logic doesn't work the same for some reason (or at least that's what i think)
I see, the __ might only work with fields and not object type names
correct
I guess you could add another set of APIs similar to TypesHiddenFromModuleSDKs but like TypesHiddenFromAllSDKs and apply that filter everywhere?
yes
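A rough sketch of what such a filter could look like, applied to a list of introspected type names. The set name hiddenFromAllSDKs and the helper filterTypes are hypothetical stand-ins for illustration, not the engine's actual code; the real filtering would operate on the introspection response rather than plain strings.

```go
package main

import "fmt"

// hiddenFromAllSDKs is a hypothetical set of type names that should not
// reach any SDK codegen or generated docs (assumption for illustration).
var hiddenFromAllSDKs = map[string]struct{}{
	"MCP":   {},
	"MCPID": {},
}

// filterTypes returns the type names that are not in the hidden set,
// preserving the original order.
func filterTypes(names []string, hidden map[string]struct{}) []string {
	kept := make([]string, 0, len(names))
	for _, n := range names {
		if _, skip := hidden[n]; skip {
			continue
		}
		kept = append(kept, n)
	}
	return kept
}

func main() {
	schemaTypes := []string{"Container", "MCP", "Directory", "MCPID"}
	fmt.Println(filterTypes(schemaTypes, hiddenFromAllSDKs))
}
```

Applying the same set at every place that serializes the schema is what would keep the module-facing and host-facing outputs consistent.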
If you want to call it from the CLI you'll need to use some raw gql queries (since the CLI uses the go sdk), but as long as you're okay with that, sgtm
Probably not too bad to use a one-off gql query then, there's prior art in the cli e.g. here https://github.com/sipsma/dagger/blob/818deecc7947c50cc4bf11d2ecb0a58e1f53ab78/cmd/dagger/module_inspect.go#L214-L214
so the way i query using querybuilder in mcp.go is using the SDK ?
Oh I just saw the code you linked to, if you're already using the querybuilder directly then yeah you should be good to go
good to go as-in, it's not using the SDK, so i am okay to implement a TypesHiddenFromAllSDKs ?
Yes. What I mentioned before about not being able to use the Go SDK was more specifically "not being able to use typed codegen bindings", but using the querybuilder is all good
Cool, so i'll send a PR to implement TypesHiddenFromAllSDKs, so i can decouple LLM and MCP
thank you so much for your help
FWIW it may be important to merge https://github.com/dagger/dagger/pull/10050 as it is fixing mcp server support completely broken in main atm. Decoupling will be in a follow up
Hey guys, the dagger helm chart is currently broken for people who still have the magic cache config. This commit changes the indentation of the value: https://github.com/dagger/dagger/commit/dc12edd86f700cdf8e643b4bf17aa7d089485736#diff-7f29c29598c7eecdb3332043e4de48253d5124ac50303c94a226735044ea2d98
it seems the bug was released in v0.16.3
Is there an issue for self-calls?
This is now high priority: https://github.com/dagger/dagger/issues/8987
Is that issue a dupe of this one: https://github.com/dagger/dagger/issues/9934 ?
yes I believe so,..
Didn't realize my PR got dropped for the release: https://github.com/dagger/dagger/pull/10038
env(privileged: Bool): create a privileged environment, with core API…, host and current module access
Deprecates LLM.withQuery()
Another bug fix that could be worth: https://github.com/dagger/dagger/pull/10049
fix for the bug: https://github.com/dagger/dagger/pull/10071 if we're thinking 0.18.2 soon
thx! just released v0.18.2 with this fix
Same problem locally, I don't understand what's happening
i suspect something in the alpine repos has changed
Again? Didn't it break 2 weeks ago and we merged a quick fix because wolfi was unhappy with a dependent version?
i think that was maybe https://github.com/dagger/dagger/pull/9802 ? cc @civic yacht
Yeah that was this one
Seems like a different error entirely
Guess we just need to pin (and remember to update it)
Lmk when you open a PR, I'll approve it directly
okay, so looks like this is wolfi: https://github.com/wolfi-dev/os/blob/main/busybox.yaml
that's where 1.37.0-40 comes from
By bumping this line: https://github.com/dagger/dagger/blob/eaed4d2e767973023e6d178b8bd77c09d42cce3e/engine/distconsts/consts.go#L33 I tried locally and it fixed the problem
BusyboxVersion = "1.37.1"
Still running ./hack/dev but it's not failing at the beginning like before
uh
i think that line is only used in the integration tests
aha
What
./hack/dev just started passing for me locally w/ zero changes
I suspect that some CDN/other cache might have been giving us out of date stuff
Yeah I guess my change was useless too
Btw is it normal that now my return value in shell is 0 even when the call fails?
Yeah I reverted the changes and it works
yeah, known issue
see thread in #1351958226752766096 message
hmm it's still occurring in ci - will block for a bit, i suspect it might shake out in a few moments too?
hopefully... if not lemme know
just wanted to give a heads up, i am seeing a bunch of DNS i/o timeouts in my dagger call dev terminal devloop. If anybody else knows about it, or wonders if others have it, well I do. It's not blocking me as it recovers somehow (either by waiting, or i just lose patience and rerun), but it's something i might need to investigate.
Is it just me, or is there no 'shell' channel?
can't quite find where we discussed this, but yeah, we don't. i think the tl;dr was that the shell is just another part of the dagger experience
@spark cedar now that we have Directory.asGit, can we make github.com/dagger/dagger/version less slow?
not quite yet - you can't control clone depth, I need to add this, but I'm trying to do it as I migrate it out of buildkit
planning on getting this done by end of week!
there's a couple of other weird features we need there sadly too
in the case of our version module, wouldn't the most efficient be to expose its own version to the module? that way we can remove the tag fetching logic altogether
(orthogonal to faster git)
mm that would be kinda neat, but also, modules that aren't tagged just get a commit sha, so we'd need to add that too
I like that though I think
just building most of the version logic into the engine itself
would also make navigating untagged versions in daggerverse a lot easier
then you'd just do dag.CurrentModule().Version() right?
I'm looking at exposing some maven related configuration (to access maven packages, for the java sdk) at the engine level. So this is quite different than, for instance, the goprivate thing that was at the module level.
My first idea is to add a setting to the engine configuration file, but:
- there's no configuration related to package managers yet, so this would be the first, and I'm wondering if that's the right place or if there's a better one
- this is related to one specific SDK
- I think this should be at the engine level, as it's usually a global configuration on devs/ci machines and not project by project
Any thoughts?
@tepid nova made a tracking issue at https://github.com/dagger/dagger/issues/10099
What are you trying to do? Access a module's own version information directly from within the module, without requiring expensive git tag fetching operations. See this conversation in discord: ...
This is becoming urgent -> https://github.com/dagger/dagger/issues/10067
Bug I think...
dagger -M
dir=$(directory | with-new-directory <hit TAB>
more of a cli-dev thing, I suppose
What is the issue? Was using TAB completion in Dagger Shell and got panic when tabbing. Dagger version 0.18.2 Steps to reproduce dagger -M β dir=$(directory | with-new-directory <hit TAB> Log...
Taking a look, shouldn't be too hard. Thanks for the repro
Could anyone help me get this one merged? https://github.com/dagger/dagger/pull/10106
I think the remaining blockers are:
- Failing CI
- Failing eval on withFile (I'm looking into it)
Ah CI fails because of the fix for escaped double quotes
need to re-generate the schema I guess
Getting errors when re-generating Elixir SDK https://v3.dagger.cloud/dagger/traces/71f52abceacb8bd517de7e85d20b5d17?span=d94946b5fa793c82
Ah @spark cedar now I know why there was a \"withExec\"
It was workaround by @obsidian rover and @charred lotus for a bug in Elixir codegen, that tries to interpret double quotes in graphql schema doc
ah hm
I just pushed a commit that removes double quotes from the api doc, escaped or otherwise
all tests seem to pass locally
but we use quotes in other places fine it seems
Oh?
re-generated everything, pushed
Yes thanks. We wanted to advance, not go into that rabbithole. We were tracking down the drop in perf of the eval (which was due to withFile)
@obsidian rover good news we are doing the workshop without the core type haircut, so we don't have to rush the PR, we can take the time to understand the issue with newFile. Do you know why the LLM gets confused with that one in particular? Maybe the wording of the description?
I'm looking at making changes in the Java SDK regarding the way generated files are managed (to help on developer experience)
So I'm generating the new files inside the runtime, but when dagger exports them to the host, I have an issue: if I remove a previously installed dep, the old files will still be there.
I was thinking about using the wipe option. Maybe in a way where the SDK runtime can say whether or not it should use the wipe option (since probably not all SDKs want that).
Among the other ideas I have, I was thinking about using the "WithVCSGeneratedPaths". Like removing those entries before exporting, so there's less chance of things going wrong than with wipe. If those generated paths represent the paths the codegen is handling, they should be removed then recreated with no risk.
Maybe there's a better way to do that? Is that already something happening in other SDKs?
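To make the "remove the generated paths, then recreate" idea concrete, here is a minimal in-memory sketch. It assumes the generated paths are plain directory prefixes (an assumption; the real setting may use patterns), and the names regenerate and underAny are hypothetical helpers, not part of any SDK.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// underAny reports whether file lives under one of the given directories.
func underAny(file string, dirs []string) bool {
	for _, d := range dirs {
		if strings.HasPrefix(file, strings.TrimSuffix(d, "/")+"/") {
			return true
		}
	}
	return false
}

// regenerate drops every existing file under a generated directory, then
// adds the freshly generated files. Files outside those directories are
// never touched, which is what makes this safer than a full wipe.
func regenerate(existing, generatedDirs, fresh []string) []string {
	set := map[string]struct{}{}
	for _, f := range existing {
		if !underAny(f, generatedDirs) {
			set[f] = struct{}{}
		}
	}
	for _, f := range fresh {
		set[f] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for f := range set {
		out = append(out, f)
	}
	sort.Strings(out)
	return out
}

func main() {
	existing := []string{"pom.xml", "src/Main.java", "gen/Client.java", "gen/OldDep.java"}
	fresh := []string{"gen/Client.java"}
	// gen/OldDep.java belongs to a removed dependency and disappears.
	fmt.Println(regenerate(existing, []string{"gen"}, fresh))
}
```

The stale file goes away because it sits under a declared generated path, while hand-written sources are left alone.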
so other sdks don't have this problem, because they generate one big file - so it's always overridden
but imo, we shouldn't require that - having lots of separate files is fine, we just need to work out how to have it work
i like the idea of using WithVCSGeneratedPaths
(although, maybe might be worth a rename to GeneratedPaths? since it's now more than just VCS - though for compat, we can't just rename, we'd have to create a new property, and deprecate the old one)
i don't think we can use --wipe though. because we only do one export of the generated diff, so that would actually wipe the entire module AFAICT
In java that's not really the way to do it
But as I'm also working on the "vendoring" maybe it's not that important. Vendoring is not really something people are doing in Java, but it might solve this issue by grouping all the generated code into a single file (jar)
yeah, exactly - we should fit with each language's way of doing it (e.g. the php sdk does the same as java)
so, your suggestion of VCSGeneratedPaths seems pretty sound to me
even if that might not be strictly necessary here, this might be interesting to be able to wipe out generated files, so that we ensure this is always clean as generated. I'll first see how this can work with the vendoring, and then I can try to use it. That could also be an interesting way to enter more on the way modules are created (not just from the java sdk perspective)
looking for a quick ✅ if anyone's around: https://github.com/dagger/dagger/pull/10131
Looks like this was missed in the past couple of ships
little achievements that come with refactoring all of our git clone logic
every single git command is now its own span
ooh ooh do image fetching next (joking i know time is scarce)
w2b those 2-dimensional image fetching progress bars we had for a hot minute before pivoting to OTel
for anyone interested in api stuff (cc @tepid nova), proposing a couple new apis for dag.Git: https://github.com/dagger/dagger/pull/9980 (amid a load of other changes)
Fixes:
Cloned git branches are replaced with tags #9328
ModuleSource fails with would clobber existing tag #9405
✨ Support --depth when using dag.git #7637
...and a host of other inconsistencies...
i have a ton of other git api changes i want to make, but need to do some dagql hackery first (and will ship that in another pr)
@spark cedar since you're talking about the git api, I have this https://github.com/dagger/dagger/issues/10137 just to keep in mind when you're working on it.
Also, I love idea having more proper git api. That was necessary
replied - i think we already have the fix in main, it'll be part of the next release next week
seeing a lot of flake on TestTelemetry/TestGolden/fail-log
https://v3.dagger.cloud/dagger/traces/a2c3d76190b656964ee666519ac46656?listen=b1a4b55f7b6f6f68&listen=72c12642e4d6c179&listen=6729e0e9c2f738f9#72c12642e4d6c179:L45
--- expected
+++ actual
@@ -17,5 +17,5 @@
β .failLog: Void X.Xs
! process "sh -c echo im doing a lot of work; echo and then failing; exit 1" did not complete successfully: exit code: 1
-β β container: Container! X.Xs
+β $ container: Container! X.Xs CACHED
β $ .from(address: "alpine"): Container! X.Xs CACHED
β β .withEnvVariable(name: "NOW", value: "20XX-XX-XX XX:XX:XX.XXXX +XXXX UTC m=+X.X"): Container! X.Xs
it's not happening particularly consistently, but I'm guessing probably a change since @civic yacht's engine-wide caching changes?
winget-pkgs 🧵
alright dagql is starting to make me feel crazy, i hope this is a dumb question, but in core/schema, how should i type an input that's essentially an arbitrary (module) object? I've tried a bunch of variations, and they either don't compile or segfault:
- InitialSelection dagql.Optional[dagql.InputObject[dagql.Object]] (missing method: typename)
- InitialSelection dagql.Optional[dagql.Object] (missing method: decoder)
- InitialSelection dagql.Optional[dagql.ID[dagql.Object]] (panic: runtime error: invalid memory address or nil pointer dereference, github.com/dagger/dagger/dagql.IDTypeNameFor({0x0?, 0x0?}), looks like it can't deduce the type?)
am i breaking ground here by having an input arg that's so generically typed? @still garnet @civic yacht
yeah that's not possible. technically we could allow an untyped ID but you're definitely breaking new ground
that's why binding for example has separate withFooInput / withFooOutput fields for each type
tempting to add a withFooSelection then, but that's gonna be a lot of tool bloat 
maybe withFooInput(foo, select: bool)? (gotta go read code, maybe it already does this lol)
wooo that did it @still garnet
the way i have the code structured right now is comically unorganized but we'll fix that in review lol, i at least got the desired behavior at prompt startup
nice, that sounds good to me for now - maybe in the future this could turn into function masks, like 'select but only expose these functions'
you'd need like a onlyShow(object) to get the right mask i think, but yeah that sounds right to me too, right now it comes up with:
β what tools u got 5.8s
βπ§ what tools u got
β β 0.0s
β
βπ€ I'll help you understand the available tools and their functions:
β β
β β 1. selectRoot - This lets you return to the root level/main module. It doesn't require any parameters.
β β 2. selectHelloDagger - This lets you select a HelloDagger object by its ID. It requires:
β β β’ id parameter in the format "HelloDagger#number"
β β Once selected, it gives access to two additional tools (HelloDagger_containerEcho and HelloDagger_grepDir).
β β 3. helloDagger - This retrieves the user input of type 'HelloDagger'. It doesn't require any parameters.
β β 4. HelloDagger_containerEcho - This tool returns a container that echoes back whatever string you provide. It requires:
β β β’ stringArg parameter (the string you want to echo)
β β 5. HelloDagger_grepDir - This tool searches for pattern matches in files within a directory. It requires:
β β β’ directoryArg parameter (in format "Directory#number")
β β β’ pattern parameter (the pattern to search for)
although i guess the only thing to mask there would be selectHelloDagger and that might get weird with multiple instantiations of a module? is that a sensible thing to do?
i was thinking more like 'select this type, like Container, but only expose its functions withExec and file', or 'don't expose publish'
like further configurability of function masks
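A small sketch of the allow/deny idea described above, using plain function names. The FunctionMask type and its fields are hypothetical, invented here for illustration; they are not the engine's actual functionMask implementation.

```go
package main

import "fmt"

// FunctionMask is a hypothetical mask over a type's functions.
// An empty Allow list means "everything not denied is visible".
type FunctionMask struct {
	Allow []string
	Deny  []string
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// Visible reports whether fn should be exposed as a tool.
// Deny always wins over Allow.
func (m FunctionMask) Visible(fn string) bool {
	if contains(m.Deny, fn) {
		return false
	}
	return len(m.Allow) == 0 || contains(m.Allow, fn)
}

// Apply returns the subset of fns that the mask exposes, in order.
func (m FunctionMask) Apply(fns []string) []string {
	var out []string
	for _, fn := range fns {
		if m.Visible(fn) {
			out = append(out, fn)
		}
	}
	return out
}

func main() {
	containerFns := []string{"withExec", "file", "publish", "stdout"}

	onlyTwo := FunctionMask{Allow: []string{"withExec", "file"}}
	noPublish := FunctionMask{Deny: []string{"publish"}}

	fmt.Println(onlyTwo.Apply(containerFns))   // only withExec and file
	fmt.Println(noPublish.Apply(containerFns)) // everything except publish
}
```

Both examples from the discussion ("only expose withExec and file", "don't expose publish") fall out of the same structure, which is one argument for a mask over per-type selection tools.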
anyways... i just hit a surprising issue adding opts to WithFooInput: it's exploding when trying to load daggerdev's remote dependencies because they don't have these new input structs? am i correct in assuming this is some sort of sneaky codegen bug around these dynamic/hook apis? did you encounter anything like this when adding those APIs in the first place @still garnet ?
maybe related https://discord.com/channels/707636530424053791/1359007453907386368 @spark cedar
Request for dag.Env(): it would be cool to allow giving access to another module's dependencies
Maybe possible to pass an array of module sources
Ideally just a single module source, of the downstream dependent. Then the engine would inspect that downstream module's dependencies, and expose that.
Important because dependencies can be installed with arbitrary local names
Just one because… Multi-agent?
https://github.com/kpenfound/dagger-modules/pull/9 cc @tidal spire i fixed this just by dagger develop upgrading your dockerd module and un-checking-in its dagger.gen.go. to be totally honest with you i do not understand the chain of causality that made this fix work, but it did so 🤷
Thanks!
@still garnet @tepid nova do you remember what we bikeshedded the name of the selectRoot tool to? i cannot find the thread for the life of me but i remember someone suggesting something that i liked a lot better than selectRoot
Yeah unselect / clear selection
Basically it should not be presented as something to select, but as the absence of selection. Will make more sense to human and LLMs alike
❤️ yes indeed it is much better
@final star @civic yacht I am experimenting with calling dag.Env() from a custom-made module runtime, rather than from the runtime itself. Long story short, I was hoping that dag.Env(privileged: true) would load the query object of the payload module, but I'm worrying that maybe it loads the query object of the runtime's own module?
EDIT: nevermind it works
@final star @still garnet you mentioned bikeshedding the "privileged env" API. Here's what I propose (I will open an issue after a first round of bikeshedding) 🧵
anybody got any tricks for debugging codegen? I've got a syntax error on the inside of the go sdk's baseWithCodegen container, but i cannot get inside the container to see the file with the syntax error... error looks something like this ```β connect 2.1s
β load module 9.2s
! failed to serve module: input: moduleSource.asModule failed to call module "sub2" to get functions: call constructor: process "codegen --output /src --module-source-path
! /src/core/integration/testdata/modules/go/namespacing/sub2 --module-name sub2 --introspection-json-path /schema.json" did not complete successfully: exit code: 1
β β finding module configuration 6.2s
β β initializing module 3.0s
β ! input: moduleSource.asModule failed to call module "sub2" to get functions: call constructor: process "codegen --output /src --module-source-path
β ! /src/core/integration/testdata/modules/go/namespacing/sub2 --module-name sub2 --introspection-json-path /schema.json" did not complete successfully: exit code: 1
β β β ModuleSource.asModule: Module! 3.0s
β β ! failed to call module "sub2" to get functions: call constructor: process "codegen --output /src --module-source-path /src/core/integration/testdata/modules/go/namespacing/sub2
β β ! --module-name sub2 --introspection-json-path /schema.json" did not complete successfully: exit code: 1
β β β β go SDK: load runtime 1.7s
Error logs:
β Container.withExec(args: ["codegen", "--output", "/src", "--module-source-path", "/src/core/integration/testdata/modules/go/namespacing/sub2", "--module-name", "sub2", "--introspection-json-path", "/schema.json"]): Container! 0.2s
generating go module: sub2
Error: generate code: error formatting generated code: 3994:89: expected ')', found 'select' (and 10 more errors)
! process "codegen --output /src --module-source-path /src/core/integration/testdata/modules/go/namespacing/sub2 --module-name sub2 --introspection-json-path /schema.json" did not
! complete successfully: exit code: 1
Error: failed to serve module: input: moduleSource.asModule failed to call module "sub2" to get functions: call constructor: process "codegen --output /src --module-source-path /src/core/integration/testdata/modules/go/namespacing/sub2 --module-name sub2 --introspection-json-path /schema.json" did not complete successfully: exit code: 1```
-i works in this context π
@still garnet @civic yacht @obsidian rover @charred lotus incoming
so yesterday i thought i was sooooo close to having a working API for env.WithFooInput(select:bool) but it turns out i'm tripping hard over what appears to me to be a fairly deep backwards-compat codegen problem... specifically with certain older modules, like in our build k3s and dockerd were pinned back, but also under test, e.g. core/integration/testdata/modules/go/namespacing/sub2. i fixed k3s and dockerd by dagger update the module, but i think the breakage here is such that we can't ship this without fixing the codegen because we're gonna break a ton of old modules being able to build on the new engine. the problem is like this:
- post-codegen, when we go build -o /runtime, these modules have sub2/dagger.gen.go AND sub2/internal/dagger/dagger.gen.go.
- the main-module dagger.gen.go typedefs a bunch of inputOpts like type EnvWithCacheVolumeInputOpts = dagger.EnvWithCacheVolumeInputOpts. BUT the internal/dagger/dagger.gen.go has no such opt structs, instead it has WithCacheVolumeInput(name string, value *CacheVolume, description string, selection bool), as though the new selection arg didn't have a default value?
- i'm defining this api like so within the env install hooks, but this reads as correct to me afaict - i don't know how else you'd define a default-false bool arg.
{
Name: "selection",
Description: "Select this input to scope the available tools to this input's functions. More recent select inputs will override.",
Type: dagql.NewBoolean(false),
Default: dagql.NewBoolean(false),
},
seems like there are two different code paths being hit, and one of them is only checking nullability to determine optionality, instead of also considering whether a default is present?
removing Default gives me a working build, but the wrong SDK bindings- selection shouldn't be a required arg
right
what if i remove Type: lol (edit: unsurprisingly this does not work, engine segfaults immediately)
seems wrong, but maybe it'll infer from Default, their types are the same after all
what i'm saying is there are two ways a value becomes optional: 1. the type is Nullable, or 2. a default is present
so it seems like one path respects both, but maybe another path doesn't respect 2.
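The suspected mismatch can be illustrated with a toy model. The Arg struct and both predicates below are hypothetical simplifications for illustration; they are not the actual codegen functions, only a sketch of how two optionality checks could disagree on an argument that has a default but a non-nullable type.

```go
package main

import "fmt"

// Arg is a hypothetical, simplified stand-in for a schema argument.
type Arg struct {
	Name       string
	Nullable   bool // the declared type accepts null
	HasDefault bool // a default value is declared
}

// optionalFull treats an argument as optional if it is nullable
// OR if it declares a default value.
func optionalFull(a Arg) bool { return a.Nullable || a.HasDefault }

// optionalNullableOnly only looks at nullability and ignores defaults
// (the behavior the second code path is suspected of having).
func optionalNullableOnly(a Arg) bool { return a.Nullable }

// disagreements lists the arguments on which the two checks differ.
func disagreements(args []Arg) []string {
	var out []string
	for _, a := range args {
		if optionalFull(a) != optionalNullableOnly(a) {
			out = append(out, a.Name)
		}
	}
	return out
}

func main() {
	args := []Arg{
		{Name: "name"},
		{Name: "description", Nullable: true},
		{Name: "selection", HasDefault: true}, // non-null Boolean with default false
	}
	fmt.Println(disagreements(args)) // only "selection" differs
}
```

An argument like selection ends up in an opts struct under one check and as a required positional parameter under the other, which matches the two generated files disagreeing.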
yeah sounds about right, i also just have 0 clue where these paths live
should be under cmd/codegen/generator/go/ i think
what do we call the thing where there's a package main generated file? is that something that happens for all modules?
possibly relevant: https://github.com/dagger/dagger/blob/main/cmd/codegen/generator/go/templates/functions.go#L227-L232 - that v0.13.0 compat check may be relevant if you're only seeing this with older modules?
yeah that's the module entrypoint codegen
plausibly, yeah. one weird thing is that the test modules like sub2/ don't actually declare any sort of engine version
what does select:true do?
it selects the provided input for the llm, so like with env(privileged:true).withFooInput(foo, select:true) the llm will start in an initial state that just shows the functions of foo, plus an "unselect" tool that goes up to the root + dep modules. in the future if we deal with the merge conflict around @still garnet 's selectTools PR, it'll just mask all the tools that aren't on the input object
but so far we've tried to keep llm-specific concepts (like tool selection) out of Env. This would break that. Also selection is an implementation detail that I'd rather not have devs thinking about if we can avoid it.
i don't particularly like it either, i'm just trying to get something working
this is impl preference #4 at this point
the type system is fighting this entire concept at every step of the way
i've been digging through the codegen for an hour now trying to figure out why the defs side of things isn't hitting that "version check failed -> defaults are not optional" logic. frankly i'm over my head and need a pair, this is my first foray into our go codegen and the code reads as correct to me, and now i've simultaneously got the solomon backing the little voice in my head that says this API is unacceptable anyways
well - this API is really just a stand-in until we get rid of the idea of 'current selection', it might literally never ship if selectTools is done in time. Right now I'm running the evals in a loop and comparing before/after, things are looking good
i think in the Brave New World this would be replaced with a function-masking-like API which we've discussed exposing at the API level in the past (not just as an LLM impl detail)
i think we "just" need to bikeshed an ideal API that doesn't result in a type explosion on another type
effectively reimplementing using selectTools to create some default behavior that selects module tools when the module object is default-constructible
also @tepid nova , i'm curious if you also have objections to the first thing i was trying: env(privileged: bool, initialSelection: optional[object]) or to hide selection we'd call it scope: optional[object] ... considering selectTools, it'd be more flexible to have something like env(privileged: bool, scope: list[function]) , but i think there's prolly some un-handwaving to do around what list[function] actually is
worth mentioning that optional[object] is not currently a thing we can pass through the api, so its current feasibility is not that different from list[function]
deeply appreciate the quotes on the "just" here lmao, even trying to figure out an un-ideal API here has proven quite challenging
even providing a list of functions to (un)mask is tricky API-wise. stringly typed? camelCase? snake_case? regex? 
is dagql.ID[*core.Function] viable?
maybe requires too much additional SDK faff to make it usable?
worth a shot
also thinking i might be able to go even more full-spike on withFooInput(select:true) to have something that doesn't break old modules: withFooInput(select: optional[bool]) and manually default the null to false prolly works around the codegen thing unless i'm totally on the wrong track as to what the bug is there
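For what the "optional bool, default the null manually" workaround would amount to on the receiving side, a minimal sketch follows. WithInputOpts and selectionOrDefault are hypothetical names used only for illustration; a nil pointer stands in for a null or omitted argument.

```go
package main

import "fmt"

// WithInputOpts is a hypothetical options struct where the flag is
// nullable instead of carrying a schema-level default.
type WithInputOpts struct {
	Selection *bool // nil means the caller did not set it
}

// selectionOrDefault resolves the nullable flag, treating null as false.
func selectionOrDefault(opts WithInputOpts) bool {
	if opts.Selection == nil {
		return false
	}
	return *opts.Selection
}

func main() {
	yes := true
	fmt.Println(selectionOrDefault(WithInputOpts{}))                // unset resolves to false
	fmt.Println(selectionOrDefault(WithInputOpts{Selection: &yes})) // explicit true is kept
}
```

The trade-off is that the default no longer shows up in the schema, so generated docs and SDK bindings cannot advertise it.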
Would it be possible to have an issue that describes the current problem and how select: bool would solve it? Also how developers would be expected to use the API (ie. what would the docs and examples say). How does a dev decide what to select or not. If the issue is that models are bad at selecting stuff - does it mean it is unable to select any object other than the pre-selected one? etc.
i can try to write a generic issue, but it's kinda challenging because the entire structure of the llm's tool selection has been a moving target for 2 months
(rephrased: perhaps more sensible to wait until after selectTools removes the object selection thing)
Hello
i have a long running dagger instance in which a dumb OPENAI_API_KEY env var is set.
When initializing another dagger session on the same engine, in parallel, I have the weirdest error: basically, the env var taking precedence is not the one in the context of the second dagger session, but seems to be the one from the long-running instance:
dagger_dev
Dagger interactive shell. Type ".help" for more information. Press Ctrl+D to exit.
run dagger build 0.2s
run dagger build
0.0s
0.1s
! POST "https://api.openai.com/v1/chat/completions": 401 Unauthorized {
! "message": "Incorrect API key provided: toto. You can find your API key at https://platform.openai.com/account/api-keys.",
! "type": "invalid_request_error",
! "param": null,
! "code": "invalid_api_key"
! }
! input: llm.withEnv.withPrompt.sync select: failed to send query after retries: POST "https://api.openai.com/v1/chat/completions": 401
! Unauthorized {
! "message": "Incorrect API key provided: toto. You can find your API key at https://platform.openai.com/account/api-keys.",
! "type": "invalid_request_error",
! "param": null,
! "code": "invalid_api_key"
! }
0.0s
Setup tracing at https://dagger.cloud/traces/setup. To hide set DAGGER_NO_NAG=1
# Killing all the long running dagger instances
➜ hello-dagger git:(main) ✗ pkill dagger
# retrying
➜ hello-dagger git:(main) ✗ dagger_dev
# works
Is this due to the dev setup?
What are the mechanics for a runtime to create a new instance of an object that it previously declared a typedef for?
For example if my runtime implements this schema:
extend type Query {
  myModule: MyModule!
}
type MyModule {
  hello: HelloResult!
}
type HelloResult {
  message: String!
}
I know how to create the typedefs for all that. But when dispatching a call to MyModule.hello(), how do I return an instance of HelloResult? Looking for an example snippet to get me on the right track
I'm trying to find clues in the source code of existing SDKs, but they are all very complicated...
the basic mechanics are buried
That depends on the language, but API-wise you need to serialize it into JSON and call dag.CurrentFunctionCall().ReturnValue(dagger.JSON(serializedResult))
Nice thank you. So the type is implicit at that point I guess - since the engine already has the schema
Yep
If you mean the values for the function arguments, yes.
You can see Python decoding (json.loads) each in the beginning here https://github.com/dagger/dagger/blob/73ec91427bf9bcdfce710d2191c87b14c98af4f5/sdk/python/src/dagger/mod/_module.py#L217
One more question, I am always tempted to try <type>.WithGraphqlQuery() but never know what to give as argument. Is there a simple example somewhere?
In my present case, it would be particularly well suited, since I'm implementing a sort of middleware that involves Env.with[Type]Input etc. So right now I have to deal with dynamic type mapping from engine into my runtime, then immediately back out to the engine. Would be simpler if I could just pass the raw input bytes to the graphql query for Env.with<relevantType>Input
You have an example here https://github.com/dagger/dagger/blob/73ec91427bf9bcdfce710d2191c87b14c98af4f5/cmd/dagger/llm.go#L178. Basically takes a query builder selection as argument.
@rancid turret if I do querybuilder.Select("foo").Arg("bar", "baz") does that translate to foo(bar: "baz") ?
--> does Arg apply to the last Select?
Yes, it does.
You can also see it being used on a module that defines an interface. Just look at the generated dagger.gen.go (next to main.go, not in internal). It's used to return the underlying generated implementation of the interface (client binding).
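To make the Select/Arg chaining above concrete, here is a toy, stdlib-only sketch. It is not Dagger's querybuilder package: the Selection type, its methods, and the exact rendered string are all assumptions for illustration. It only shows the idea that Arg attaches to the most recently selected field.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Selection is a hypothetical, much-simplified stand-in for a query builder node.
type Selection struct {
	name string
	args []string // already rendered as key:"value"
	prev *Selection
}

// Select starts a new chain with a single root field.
func Select(name string) *Selection { return &Selection{name: name} }

// Select chains a child field under the current one.
func (s *Selection) Select(name string) *Selection {
	return &Selection{name: name, prev: s}
}

// Arg attaches an argument to the most recently selected field.
// Only string values are handled; strconv.Quote is a simplification
// of real GraphQL string escaping.
func (s *Selection) Arg(key, value string) *Selection {
	cp := *s
	cp.args = append(append([]string{}, s.args...), key+":"+strconv.Quote(value))
	return &cp
}

// Build renders the chain as a nested query string.
func (s *Selection) Build() string {
	var chain []*Selection
	for cur := s; cur != nil; cur = cur.prev {
		chain = append([]*Selection{cur}, chain...)
	}
	var b strings.Builder
	b.WriteString("query")
	for _, f := range chain {
		b.WriteString("{" + f.name)
		if len(f.args) > 0 {
			b.WriteString("(" + strings.Join(f.args, ",") + ")")
		}
	}
	b.WriteString(strings.Repeat("}", len(chain)))
	return b.String()
}

func main() {
	fmt.Println(Select("foo").Arg("bar", "baz").Build())
	fmt.Println(Select("foo").Arg("bar", "baz").Select("qux").Build())
}
```

In this sketch the argument stays on foo even after a later Select, which mirrors the "Arg applies to the last Select" answer.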
@civic yacht I'm not sure, but this might be a filesync bug? https://v3.dagger.cloud/dagger/traces/c7695a1849e70aec3725ffe827d5087c?listen=9ef579938cd788d5&listen=be274c8844051c03&span=5d83f496a7a609dd
Granted I am developing a toy custom SDK, so it could be that I'm doing something wrong there - but I can't think of anything I did that would translate to this error
Also I had the same error yesterday, and (seemingly) solved it by removing a +ignore in my module's constructor. Maybe that was just a flake though (since the error is back in spite of no more +ignore)
It looks like it's the bug I'm currently working on fixing most likely, has to do with cross-session cache hits and +private state
is the agent-sdk code somewhere I can look at it? to verify that's what's happening and not something else
Yeah just pushed it. It does have a +private field, would removing that solve it?
Yeah, I repro'd in an integ test and rm'ing +private does avoid it for now
Ah yeah that's it
looks like I got promoted to a new error, thanks!
Just a quick question, are env vars scoped per session? Or is it just secrets that are, on the engine?
@civic yacht one more question sorry - it's a follow-up to what Helder was helping me with earlier... in my little agent-sdk I'm trying to return an object of a type I myself defined. I now understand I just need to return an ID... But I don't know how to create that ID
that ^
Also: how do I drop to raw graphql in Go? (from the standard dag. / dagger. bindings)
For objects that are defined in your module, you currently just return an actual json serialization of it. It's objects defined outside your module (i.e. core objects) that are returned as ID strings. I remember this being a weird annoying incongruency that we wanted to address eventually, but that's what it is for now.
e.g. if you define an object like:
```go
type Foo struct {
	Bar string
	Baz int
}
```
then you'd return `{"Bar": "thevalue", "Baz": 123}`
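A minimal sketch of that serialization step, using only the Go standard library. The Foo type and the serializeResult helper are hypothetical names; in a real runtime the resulting string would be handed back through dag.CurrentFunctionCall().ReturnValue(dagger.JSON(...)) as mentioned earlier in the thread, which is left out here so the example runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Foo is a hypothetical module-defined object with the two fields from the example.
type Foo struct {
	Bar string
	Baz int
}

// serializeResult (hypothetical helper) turns a module-defined object into
// the JSON string a runtime would return for it.
func serializeResult(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := serializeResult(Foo{Bar: "thevalue", Baz: 123})
	if err != nil {
		panic(err)
	}
	// A real runtime would pass this string to the engine instead of printing it.
	fmt.Println(out)
}
```

Core objects would still be returned as ID strings; this only covers objects defined by the module itself.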
Thanks, actually that's perfect for my case (was worried I would have to implement a complex ID system)
there's a .Do method on dagger.Client that lets you do that
Now all I need is a way to query the various outputs of my Env, without actually casting to their native client-side types (hence the "drop to raw graphql")
It looks like it's not exposed in dagger.io/dagger/dag FYI. Should be easy to instantiate my own client instead
Oh interesting, yeah I guess dag has .GraphQLClient() instead, for reasons I don't recall
How do I connect a querybuilder to GraphQLClient() ?
You can call .Build(ctx) on a querybuilder.Select to get the string representation of the query.
Apparently querybuilder.Selection also lets you set the client by calling q.Client(theGraphQLCLient), so that's another route, depending on what you're doing
Has anyone seen this error?
Error: input: <...> no client configured for selection
Repro:
dagger -m github.com/shykes/x/agent-sdk/examples/crashtest -c 'go-program "write a curl clone"'
Error: input: crashtest.goProgram marshal: json: error calling MarshalJSON for type *dagger.Directory: no client configured for selection
oh wow i just hit this too, @obsidian rover or @civic yacht is this the fix commit? https://github.com/dagger/dagger/commit/9eb69cbf8f0d4b419fd63814c400117a8c13c42c
uhhh @civic yacht do we have any new known bugs in Host.Directory caching? dagger call test specific keeps getting cache hits on my machine with stale code that doesn't compile (i'm 100% sure there is not a compile error on the specified line in this trace), i've even checked out main and ran the same thing and it keeps hitting that same compile error somehow. if i do the ./hack/dev & ./hack/with-dev go test ... dance with equivalent args, i get passing tests
Hey Alex,
How could we debug the directory ID issue on your branch ? I am trying, for example, for the LLM to select host directory and save it as directory#3, then passing it along as a source to a call to one of my functions
I am trying to have a nice shell <-> prompt mode back and forth and see how much I can stretch it ahah
--
For the prompt -> shell state retrieval, there's $_ to export back the last state, which is nice
good question - think i just noticed a similar gap
My current pattern is to import helper modules that all have defaultPath to be able to vibe use those helper modules (to retrieve the source code)
But having a pattern where I can easily pass along the container of the top level module (the workspace) and directories (the code) would make all those helper modules generic
$_ returns the last value but sometimes that's not what you want, because the LLM pointlessly called entries or something, and now that it can't 'select' an object, there's no way for it to go back and hand you the Directory
but, maybe it can with prompting, like "save that directory as $foo", and that'll cue it to call save(foo:Directory#1)
and then you can pass that $foo around via prompt var expansion
potentially it could even set an accurate description for the binding, to carry context forward 
does that track?
save being an exposed tool? I don't seem to see it with $agent | tools
yeah, i renamed return to save in https://github.com/dagger/dagger/pull/10134
that PR's just about ready, i'm just doing a self-review now
We're on top of it already ahah -- it's much much better
so my includeDependencies PR raises some weird questions about fn precedence.... TestShell/TestModuleLookup/stdlib doc function demonstrates the issue, you can shadow the stdlib git function with a module called git and if you serve that dep module it'll shadow the stdlib fn at least for the purposes of .help... shoutout @rancid turret for the lovely unit tests, wouldn't have caught the edge case without them. i gotta sign out rn but i'm curious if this is a real problem or something we should just accept, like shadowing stdlib names with module objects is fairly inadvisable
Yeah, we found it was acceptable, that's why there's a .stdlib | git and a .deps | git, so you can disambiguate them if needed. It's like the order in $PATH. You can replace a bin by placing it in the right path.
the weird part is once you've served the dep, i think it thinks that the dep module is part of stdlib
I don't see how it's possible that a dependency would show up in .stdlib.
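The $PATH analogy can be sketched as an ordered lookup where the first scope that defines a name wins, with an explicit scope name to disambiguate. This is only an illustration of the idea: the scope type, the deps-before-stdlib ordering, and the helper names are assumptions, not the shell's actual lookup code.

```go
package main

import "fmt"

// scope is a hypothetical named group of functions, e.g. "deps" or "stdlib".
type scope struct {
	name string
	fns  map[string]string
}

// resolve walks scopes in order and returns the first match,
// the way a shell walks $PATH entries.
func resolve(scopes []scope, fn string) (string, bool) {
	for _, s := range scopes {
		if target, ok := s.fns[fn]; ok {
			return s.name + ": " + target, true
		}
	}
	return "", false
}

// resolveIn looks a name up in one explicitly named scope only,
// analogous to writing .stdlib | git or .deps | git.
func resolveIn(scopes []scope, scopeName, fn string) (string, bool) {
	for _, s := range scopes {
		if s.name != scopeName {
			continue
		}
		target, ok := s.fns[fn]
		if !ok {
			return "", false
		}
		return s.name + ": " + target, true
	}
	return "", false
}

// demoScopes assumes dependencies are searched before the stdlib.
func demoScopes() []scope {
	return []scope{
		{name: "deps", fns: map[string]string{"git": "git module"}},
		{name: "stdlib", fns: map[string]string{
			"git":       "core git function",
			"container": "core container function",
		}},
	}
}

func main() {
	fmt.Println(resolve(demoScopes(), "git"))
	fmt.Println(resolveIn(demoScopes(), "stdlib", "git"))
	fmt.Println(resolve(demoScopes(), "container"))
}
```

Under these assumptions a qualified stdlib lookup should never return the dependency, which is why a dep showing up under .stdlib would be surprising.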
! failed to create base container: failed to get packages: solving "busybox" constraint: resolving "busybox-1.37.0-r40.apk" deps:
! resolving "glibc-2.41-r1.apk" deps:
! resolving "ld-linux-2.41-r2.apk" deps:
! solving "glibc=2.41-r2" constraint: glibc-2.36-r3.apk disqualified because "2.36-r3" does not satisfy "glibc=2.41-r2"
! glibc-2.36-r4.apk disqualified because "2.36-r4" does not satisfy "glibc=2.41-r2"
Is back... @spark cedar @still garnet @civic yacht
I'm not sure what the process to fix it is, but all CI jobs/local builds will fail until it's fixed. I think last time it got solved after a few minutes by Wolfi
Looks like it's fixed now
going to be off friday/monday for uk bank holiday, but leaving this here as an RFC: https://github.com/dagger/dagger/pull/10195
this adds contextual git - essentially the same behavior as contextual directories, but allows for getting the git data that the module was run in
i also find it surprising, but specifically .stdlib | .help git ends up showing the git module instead of the core fn when we serve it prior to inspectModule https://v3.dagger.cloud/dagger/traces/e92e6b0f1a5d34db4372885647588e60?span=4b13e53c7a63faf0
If anyone is looking for something to contribute... https://github.com/dagger/dagger/issues/10112
I've been hopping around different engine builds a lot lately so I've been using this function as glue. In case others felt the same pain: https://github.com/kpenfound/dotfiles/blob/master/.zshrc#L112
@fair ermine a bunch of TestClientGenerator tests started consistently failing about 19h ago out-of-band (no dagger changes). Just sent out this PR with a quick fix and full explanation of why: https://github.com/dagger/dagger/pull/10209
Can you look into a fix to either the client generator code and/or the TestClientGenerator setup so we can avoid this again in the future?
either way need a quick shipit on the above from anyone^
@still garnet Hey, can I freely use core.Tracer(ctx) around? I tried but I don't see anything popping up in cloud
or is it telemetry.Tracer(ctx)?
or maybe it's because i'm in a session handler and ctx is not what it's supposed to be
(an attachable)
Oh
// Disable collecting otel metrics on these grpc connections for now. We don't use them and
// they add noticeable memory allocation overhead, especially for heavy filesync use cases.
IIRC the problem was that buildkit hardcodes this installation of a grpc stats handler: https://github.com/dagger/buildkit/blob/6f5903130a809fae40f475a7f9f256078f1131ed/session/grpc.go#L56
which indeed had an unreasonable amount of overhead.
Since that time we have hard forked that code so we could probably just delete that line that installs the stats handler and then don't need to unset the otel stuff in our code
TIL: if you call Directory.terminal() in a dagger runtime, it panics
Also a question @civic yacht if you're still around: if I call dag.Host().Directory(".") from my runtime, what directory do I get back? Should be the current workdir within the runtime container no?
Am I in the same execution context as a regular code of a module?
Yes, it should point back to the filesystem local to the caller, so in that case the container your code is in
So, if the official go SDK runtime were to call dag.Host().Directory(".") for debugging purposes, would it get the same directory as if the module's source code itself called dag.Host().Directory(".")?
ie. the runtime is calling dag.Host().Directory(".") on behalf of the module it's currently running?
Then, is there an implicit contract governing the runtime container's filesystem? Like what files will be where, what the workdir should be?
IIUC then yes
One thing I noticed is that the workdir seems to be hardcoded to /scratch and the workdir of my runtime image is ignored
Do I need to copy the module files into the container returned by moduleRuntime() or is that done by the engine for me?
--> nevermind, I think the answer is: no files get copied, you get an empty scratch by default
No, we don't codegen dag.Host() for the SDKs available to module code, so we don't really give any guarantees around the layout (even though you are free to use os syscalls to look around technically)
yeah your SDK needs to explicitly put what it needs in the runtime container
Oh right
which could be full source code, just a compiled bin, etc.
That's funny, since my runtime binary imports plain dagger.io/dagger, it does get dag.Host()
Yeah haha, we don't actually block it in the API itself, just on the codegen level, so I guess that happens to work
I guess I should be reading dag.CurrentModule().Source()
thanks for answering my silly questions
That's the official way yes. No problem
I guess I'm realizing that I don't need to worry about copying the module source into the runtime container in advance, since I can always access what I need at runtime. And even export it to the runtime container if needed
I think I copy-pasted that from somewhere
Oooh no
I can access the module source but not the rest of the module context
if I need the module root I need moduleRuntime() to copy it into the container
But on the other hand, normally I shouldn't need anything outside the module source
if I do, my module layout is probably wrong
Yeah the engine only loads what it really needs from the root, which is dagger.json, the source dir and any include manually set in dagger.json by the user
So that's all your SDK would have access to copy in
Oh it looks like I can't call dag.CurrentModule().Source() during initialization:
failed to load config: input: currentModule.source module source not available during initialization
I'm getting confused. Maybe it was necessary to access the module source from moduleRuntime() and copy them into the runtime container?
This is my runtime build function:
func (m *MySdk) ModuleRuntime(ctx context.Context, modSource *dagger.ModuleSource, introspectionJSON *dagger.File) (*dagger.Container, error) {
	return dag.Container().
		From("docker.io/library/alpine:latest@sha256:a8560b36e8b8210634f77d9f7f9efd7ffa463e380b75e2e74aff4511df3ef88c").
		WithFile("/bin/"+m.ToolName, m.bin()).
		WithEntrypoint([]string{"/bin/" + m.ToolName}), nil
}
I guess I'm supposed to copy the stuff I need from modSource into the container, and not try to get it from within the container at runtime by calling dag.CurrentModule().Source()
Well yeah there's a corner case that makes currentModule.source not work right now during initialization, which is when your module gets invoked by the engine to find out what type defs it has. Has to do with the fact that the module is only half-created by that point, so to speak. It's fixable with some more plumbing, but previously none of our SDKs needed to call that during initialization.
For now yeah you could just copy it in there so it's always available
Ok makes sense. My catch-22 is that I need to build a container from a Dockerfile then run it, in order to inspect its contents and then return my typedefs
Technically I could do all that from moduleRuntime(), but then I can't declare the typedefs for the runtime in advance, can I? Only the runtime itself can declare its own typedefs? If that makes sense
Yes, which is a big thorn and leftover from extremely early days before SDKs were modules, but has still managed to stick around. I know Helder+Justin want to fix that as part of getting self calls working
what does this mean again?
engine crashed
I see. Feels like the sky is the limit, like moduleRuntime could even register different containers as handlers for different callbacks... Or even no container at all, just pass interfaces
OK, that's a stable engine 0.18.3 so I might have an issue to file, once I figure out what's going on
Yes please do
docker logs should have it if that's where your engine is running
$ docker logs 67ba2d77e9e5 2>&1 | grep panic
panic: no ':' separator in digest ""
panic: no ':' separator in digest ""
panic: no ':' separator in digest ""
Apparently @charred lotus and @obsidian rover have been hitting that
But I'm the first to hit it in a released version?
panic: no ':' separator in digest ""
goroutine 39360287 [running]:
github.com/opencontainers/go-digest.Digest.sepIndex({0x0, 0x0})
/go/pkg/mod/github.com/opencontainers/go-digest@v1.0.0/digest.go:153 +0x94
github.com/opencontainers/go-digest.Digest.Encoded(...)
/go/pkg/mod/github.com/opencontainers/go-digest@v1.0.0/digest.go:137
github.com/dagger/dagger/network.HostHash({0x0, 0x0})
/app/network/hosts.go:16 +0x24
github.com/dagger/dagger/network.ModuleDomain(0x3130d90?, {0x4010497d40, 0x19})
/app/network/hosts.go:37 +0x4c
github.com/dagger/dagger/core.(*Container).WithExec(0x3126150?, {0x3130d90, 0x4007cea320}, {{0x402b235a40, 0x2, 0x2}, 0x0, {0x4003c20700, 0xdd}, {0x0, ...}, ...})
/app/core/container_exec.go:115 +0x684
github.com/dagger/dagger/core/schema.(*containerSchema).withExec(0x400783ea28?, {0x3130d90, 0x4007cea320}, 0x400623a600, {{{0x402b235a40, 0x2, 0x2}, 0x0, {0x4003c20700, 0xdd}, ...}, ...})
/app/core/schema/container.go:873 +0x180
github.com/dagger/dagger/dagql.Func[...].func1({0x400c140440?, 0x400623a600?, {0x0, 0x1, 0x402aebce40, 0x404807a428}, 0x0, 0x0}, {{{0x402b235a00, 0x2, ...}, ...}, ...})
/app/dagql/objects.go:704 +0x74
github.com/dagger/dagger/dagql.NodeFuncWithCacheKey[...].func1({0x400c140440, 0x400623a600, {0x0, 0x1, 0x402aebce40, 0x404807a428}, 0x0, 0x0}, 0x40176f42a0)
/app/dagql/objects.go:761 +0x10c
github.com/dagger/dagger/dagql.Class[...].Call(0x3188dc0?, {0x3130d90?, 0x4007cea320?}, {0x400c140440?, 0x400623a600, {0x0, 0x1, 0x402aebce40, 0x404807a428}, 0x0, ...}, ...)
/app/dagql/objects.go:266 +0x130
github.com/dagger/dagger/dagql.Instance[...].call.func2()
/app/dagql/objects.go:523 +0xa0
github.com/dagger/dagger/engine/cache.(*cache[...]).GetOrInitializeWithCallbacks.func1()
/app/engine/cache/cache.go:189 +0x64
created by github.com/dagger/dagger/engine/cache.(*cache[...]).GetOrInitializeWithCallbacks in goroutine 39360286
/app/engine/cache/cache.go:187 +0x4e0
my best bet is that somehow CurrentModule has an empty "" InstanceID, but no clues on why or what's expected
me also, also mcp branches, especially easily triggered by claude for some reason
Were you doing anything with custom SDKs? Or changes to existing SDKs?
I could definitely see this with a new SDK that's doing stuff others hadn't before, like what Solomon's working on, but more surprised if it's happening outside one
If anyone has a repro I can run lemme know
lol i wouldn't wish our repro on you... an extra repo, a branch with like 30 commits, install goose and teach it to run dagger mcp
lol no problem, I can work off the stack trace for now
the mcp panic and solomon's do go through the exact same stack too, if that helps
what branch is it? skimming through the diff might be helpful to see if it's changing something that would cause it
it would happen if a module creates a withExec during initialization, which would most likely explain why Solomon hit it, but want to make sure that's the only case
Okay yeah can repro it exactly in an integ test where I have a custom SDK try to create a withExec during initialization. Will fix @tepid nova
Still curious to take a look at the mcp code that's managing to repro it since I wouldn't have expected that to result in an SDK making a call like that, but maybe it does somehow? Or maybe the LLMs are themselves making calls that result in that somehow? That would be close to my dream of "AI-driven fuzz-testing" if so
oh yeah sorry. will push tonight.
there is no llm involved, I just build the mcp server container and exec it to dump the tool list for introspection
im not actually sure off the top of my head that the dagger/dagger branch is even necessary for anything other than triggering the codepath through mcp
No I know for you there wasn't, but for Connor+Tibor it sounds like there was
there was, but an LLM external to dagger
i will try to find a more shareable repro monday, the bug is mega-annoying particularly because we have other bugs on gpt, like claude gets the furthest and works the best but triggers this thing for some very unlucky reason
even just a cloud trace from it happening would help
oh true i can probably get that pretty easy, but no longer have work puter...
fix here: https://github.com/dagger/dagger/pull/10213, want to push a little bit more to ensure if we hit this again it's just an error and not a panic along with a few more integ tests, but it does fix the panic for the "withExec during module init" case so far
okay good to go ^ also enables dag.CurrentModule().Source() during init now since that being prevented had the same root cause as the panic ultimately
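For readers following the stack trace: the panic comes from asking an empty digest for its encoded part. A stdlib-only sketch of the "error instead of panic" idea is below. It is not the change made in the linked PR and does not use the go-digest package; encodedDigest is a hypothetical helper that only mimics the algo:hex split.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// encodedDigest (hypothetical) returns the part after "algo:" and reports
// malformed input as an error rather than panicking.
func encodedDigest(d string) (string, error) {
	if d == "" {
		return "", errors.New("empty digest")
	}
	i := strings.IndexByte(d, ':')
	if i < 0 {
		return "", fmt.Errorf("no ':' separator in digest %q", d)
	}
	return d[i+1:], nil
}

func main() {
	// An empty value stands in for the empty digest seen in the trace.
	for _, d := range []string{"sha256:abc123", "", "abc123"} {
		enc, err := encodedDigest(d)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("encoded:", enc)
	}
}
```

The caller can then surface a normal error for the "withExec during module init" case instead of taking the engine down.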
You are on fire with the bug fixes today, my god
I gotta balance out the days I'm on fire with the bugs
It's strange, the error from traces is timeout due to inactivity
https://v3.dagger.cloud/dagger/traces/be71907299b4592913a0bb16003e3c1e
Have you ever seen this error happen before? The tests succeed and the GH job succeeds but the dagger cloud job fails
FYI v0.18.4 is out now, has quite a few fixes for various issues brought up here the last week:
- LLM env being re-used across different dagger invocations
- contextual dirs failing to load or loading older syncs of local data
- Various permutations of secret-related errors (sessions/clients not found)
- CurrentModule not fully working in SDKs
ModuleSource digest
something incredibly satisfying about squashing 4 different bugs with one giant PR
question to git users - is it expected if i call dag.git("https://github.com/dagger/dagger.git").tags(patterns=["v0.18.3"]) that the output contains all of these:
helm/chart/v0.18.3
sdk/elixir/v0.18.3
sdk/go/v0.18.3
sdk/php/v0.18.3
sdk/python/v0.18.3
sdk/rust/v0.18.3
sdk/typescript/v0.18.3
v0.18.3
to me, this seems wrong
the behavior seems to have been introduced way back at the beginning in https://github.com/dagger/dagger/pull/7742, which references https://github.com/dagger/dagger/commit/260bde5070780b3ce267bde79de3a66c73eca6a6, which also has this bug
just encountered this in the context of realizing that the outputs of git tag -l -- <pattern> and git ls-remote --tags -- <pattern> do not really match (and also the buildkit code it's inspired by is definitely wrong, and has since been fixed upstream)
fyi @still garnet @obsidian rover who worked on the initial impl + tests
does patterns=["v0"] also match everything containing "v0"? i.e. is it always just a wildcard/glob/partial match?
Yeah I believe it's always worked that way, "pattern" implies regexp for me
^v0\.18\.3$ would match specifically the one tag v0.18.3
(or at least used to)
ah it's not regexp, it's a "shell glob" per "section of the ref" - so "v0" won't match anything, you'd need "v0*" to match the entire final component
also not all git tooling takes the same kind of pattern - some take literal strings, some take shell globs, some take fnmatch patterns
why not
note: I'm fine with the current behavior, but just need to make local directory tags match it (it currently doesn't at all). but thought maybe just worth checking anyways
This is a regression, no? I have a distinct memory of using ^vx.y.z in the past
so this definitely doesn't work today
it also isn't a pattern accepted by ls-remote which is how it's implemented in the very first version of this PR: https://github.com/dagger/dagger/pull/7742
so a bit confused
our docs describe the arg as "glob patterns" - but annoyingly, i don't think there's a way to select just v0.18.* for example (to get everything in the v0.18 series), without including all the sdk/ tags as well
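A rough sketch of the "shell glob matched against the tail of the ref" behavior discussed above, using Go's path.Match as a stand-in for git's globbing. This is an approximation under stated assumptions, not Dagger's or git's implementation; matchTail is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchTail (hypothetical) reports whether pattern matches ref starting at
// any "/" boundary, so a pattern is tried against every trailing run of
// path components. path.Match's "*" does not cross "/" separators.
func matchTail(pattern, ref string) bool {
	parts := strings.Split(ref, "/")
	for i := range parts {
		tail := strings.Join(parts[i:], "/")
		if ok, err := path.Match(pattern, tail); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	tags := []string{"v0.18.3", "sdk/go/v0.18.3", "helm/chart/v0.18.3"}
	for _, p := range []string{"v0.18.3", "v0", "v0*", "v0.18.*"} {
		var hits []string
		for _, t := range tags {
			if matchTail(p, t) {
				hits = append(hits, t)
			}
		}
		fmt.Printf("%-8s -> %v\n", p, hits)
	}
}
```

With this model, v0 matches nothing because a glob has to cover a whole component, while v0.18.* also picks up the sdk/ and helm/ tags, which is the limitation raised above.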
@still garnet found a slightly odd bug during https://github.com/dagger/dagger/pull/10183
so in core/telemetry.go we were calling FieldSpec without a view, which meant that we were never showing output from any field that had a View declared - so Container.withExec.stdout would only show output from withExec, and never from stdout.
i've "fixed" this, and now stdout output gets shown as well - but kinda curious if this is actually the desired behavior?
going to leave this here in case it's useful: https://pkg.go.dev/mvdan.cc/sh/v3/pattern
aha it is, we currently use moby/patternmatcher, but yes, this would be much better lol
I just got this when attempting to upgrade from 0.17.x to 0.18.4:
Error: failed to serve module: input: moduleSource.asModule failed to load dependencies as modules: failed to load module dependencies: select: failed to load dependencies as modules: failed to load module dependencies: select: failed to load dependencies as modules: failed to load module dependencies: select: failed to call module "proxy" to get functions: call constructor: process "uv pip install -e ./sdk -e ." did not complete successfully: exit code: 2
Stderr:
Using Python 3.12.10 environment at: /usr/local
error: Distribution not found at: file:///src/xxh3:4c5e95bf6a598fb0/proxy/sdk
There were regressions in the Python SDK, a fix has been merged for 0.18.5: https://github.com/dagger/dagger/pull/10252
Is that what you're hitting perhaps?
cc @rancid turret
I'll wait for 0.18.5
Which repo would you like to understand?
It's out!
That fixed the distribution not found error. Now I'm left with the env redeclared error
Runtime dev question: when currentFunctionCall().parentName() returns "", does that mean we are executing the module constructor? And if so, we should check for constructor arguments?
Yes
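As an illustration of that rule, a custom runtime's dispatch could branch on an empty parent name before looking up regular functions. Everything below is a hypothetical, stdlib-only sketch: the handler signature, miniRuntime, and the "Parent.fn" keying are assumptions, and the real parent and function names would come from currentFunctionCall().

```go
package main

import "fmt"

// handler is a hypothetical signature for one callable in a custom runtime.
type handler func(args map[string]string) (string, error)

// miniRuntime is an illustrative dispatch table, not part of any Dagger SDK.
type miniRuntime struct {
	constructor handler
	functions   map[string]handler // keyed by "ParentName.fnName"
}

// dispatch routes a call: an empty parentName is treated as the constructor.
func (r *miniRuntime) dispatch(parentName, fnName string, args map[string]string) (string, error) {
	if parentName == "" {
		return r.constructor(args)
	}
	h, ok := r.functions[parentName+"."+fnName]
	if !ok {
		return "", fmt.Errorf("unknown function %s.%s", parentName, fnName)
	}
	return h(args)
}

// newDemoRuntime wires up a constructor and a single MyModule.hello function.
func newDemoRuntime() *miniRuntime {
	return &miniRuntime{
		constructor: func(args map[string]string) (string, error) {
			return "constructed with greeting=" + args["greeting"], nil
		},
		functions: map[string]handler{
			"MyModule.hello": func(map[string]string) (string, error) {
				return "hello", nil
			},
		},
	}
}

func main() {
	rt := newDemoRuntime()
	fmt.Println(rt.dispatch("", "", map[string]string{"greeting": "hi"}))
	fmt.Println(rt.dispatch("MyModule", "hello", nil))
	fmt.Println(rt.dispatch("MyModule", "missing", nil))
}
```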
New side quest: a micro-framework for making it easy and fun to develop custom dagger runtimes
I really think their power is underestimated and under-utilized
imagine if platform teams could easily create runtimes for their app teams to use for zeroconf delivery
would have to be at least 2 orders of magnitude easier to develop than today
Did we mess with the core Git API recently?
I'm seeing strange git-related errors on a vanilla module, using a vanilla 0.18.5 release
I think 0.18.4 had the same problem
Another runtime dev question: what's the exact rule for currentFunctionCall().parentName()? Especially when dealing with modules that have multiple layers of objects...
hm okay, so this is one of those annoying cases where the failure here is "expected"
i'll have a look at improving that
but the bit below that is the important bit - there's a context canceled error, that's the bit that actually failed the job
pr to fix the unhelpful false positive here: https://github.com/dagger/dagger/pull/10286
(also that check wasn't really working properly, so fixed it as well)
fix to resolve our recent red streak on ci - looks like an upstream git provider made a breaking change: https://github.com/dagger/dagger/pull/10289
Yet another runtime dev question.
I am fuzzy on the lifecycle of services in general. Is there any way that a runtime could start a service, and that service could continue to run for the duration of the caller's session? So that the service would run across several invocations of the runtime, to serve different function calls by the same session?
Ah, I guess if I return the service as a field, the service would continue to "exist" after my runtime exits, and its lifecycle would work as usual, ie: the service could keep running until the caller's session ends?
Follow-up question... In the other runtime implementations I'm looking at for inspiration and copy-pasting... When the constructor is called, they return a dagger.Module, in other words a module definition. But where do I return the values of my fields?
Theoretically that should be the case yes, and I can't think of why it wouldn't work. But new territory of course so if it doesn't work for whatever reason let me know
FYI, I was reading the engine/client/client.go code and I saw some potential dead code: https://github.com/dagger/dagger/blob/main/engine/client/client.go#L134-L149. Could the values checked by those if c.SessionID == "" conditions ever be non-empty strings?
indeed, does not look like those are set
@tepid nova had a spare moment, so kept hacking away at the monolith of ci by splitting out the engine: https://github.com/dagger/dagger/pull/10296
Mmmmmm... With the way the runtime API works today, in theory any type could have a constructor?
This message is probably more appropriate in this channel. I'm currently running into a few show stopping issues on 0.18.5
You can technically define a constructor for any type but they should be ignored by the engine unless it's the main type. Similarly you can define a name for the constructor but it's ignored by the engine. At one point the constructor was in the list of functions but with an empty name. Right now it's still the same struct as a normal function, but in its own field.
@civic yacht that shell function wrapper idea is growing on me, i will try to do a POC
We discussed it a little more with @final star and one downside emerged which is that it looks like code, when really it's meant to be configuration. But maybe in practice it's fine. "configuration as code" has been done before
Yeah I'd say that extremely simple code (i.e. just setting vars to values) is almost indistinguishable from configuration, practically speaking. And obviously we're very acquainted with the problem where configuration sprawls and becomes so complicated that you have "bash encoded in yaml"-type situations, at which point code is preferable anyways.
@civic yacht If you have a moment, I think this PR can be worth reviewing https://github.com/dagger/dagger/pull/10309
It's the decoupling of the SDK interface + some refactoring to make the code better (spoiler: the GoSDK now has its own little file)
Well here it's setting vars to values AND capturing them in fn closures, no? (Sorta rhetorical question, my imagined interface is likely overcomplicating things)
I would probably skip setting the variables in this solution. shell functions with literal arguments is enough imo
This sounds like a solution to the input sprawl to the CLI. I hope it is but I'm in the dark about the proposal. Excited either way
syntax example maybe? i'm prolly overly hung up on shell func details, and idk the details of our mvdan sh lib dep, but conventional bash fns don't have parameters, they just use positional args, so without "closures" (technically i don't think they capture either, sh scoping is weird and lexical iirc) idk how this looks, are you picturing like funcShort(arg1='str', arg2=host().file('path'), arg3=$env) { fn(arg1, arg2, arg3) } or maybe alias funcShort=fn('str', host().file('path'), $env)? I was initially picturing funcShort { fn('str', host().file('path'), $env) }... tbh of these 3, aliases are prolly least error prone
github() {
.deps | github --username=shykes --token=op://foo/bar/my/credential
}
cool we're picturing the same thing then, and that's open to "extension" like funcShort() { fn 'str' host().file('path') $env } but whatever parts of that we want can be disabled prolly
Speaking of which. Whenever I need to codegen for any of my modules in order to do local dev, I find it awkward when dagger develop automatically bumps dagger engine version to latest (based on my CLI version). I don't want that unless I explicitly request it. Dagger update works for me for this purpose but by instinct I still use dagger develop
I agree
this bugs me too, but i think you can workaround via dagger develop --compat=skip?
(idk how long that's been there, i just noticed the "skip" in the help when i went to check the arg name, previously i had been manually providing the existing version like --compat=v0.18.3 and that was mega annoying)
I wasn't even aware of --compat. Still feels awkward and easy to miss
I guess I need to alias my dagger develop
I'd always want --compat=skip
as maintainers, we do have our ulterior motives - we want ppl to update
I understand. As a module maintainer, I have similar motives, but I want to plan around that update instead of it being offered every time.
Yeah I agree there's probably a better way to get our cake (getting people to update) and eat it too (offering a clean and pleasant DX)
@civic yacht sorry to be late to the party on the secrets caching issue, but have you already considered the option of never caching secret lookups outside the current session?
it's not a problem of caching secret lookups per-se. The problem is more about what the behavior should be when you do something like e.g.
someSecret := dag.Secret("env://BLAH")
dag.Container().
WithSecretVariable("TOKEN", someSecret).
WithExec([]string{"do something w/ token"})
The current one I'm working on specifically is when the cache for the WithExec should bust if the secret changes. We've always made the "name" of the secret the cache key for it, since we just inherited that from buildkit. But for other use cases, especially those involving URI-based secrets (i.e. not the old setSecret) and cross-session cache hits, it is sometimes preferable to cache based on the plaintext value.
However, in other situations, it actually isn't preferable to bust cache whenever the plaintext value changes (i.e. an equivalent secret that rotates often). So need to choose the best default and also support the other use cases.
What I'm going with currently is:
- Use digests of the plaintext as the secret cache key by default (so different than before)
- Allow the user to optionally configure the cache key of the secret if desired, e.g.
dag.Secret("env://BLAH", dagger.SecretOpts{CacheKey: "my-token-group"})
someSecret.WithCacheKey("my-token-group")
--token env://BLAH&cacheKey=my-token-group
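For anyone skimming, the two rules above can be sketched roughly like this in plain Go. secretCacheKey is a hypothetical helper, and the choice of SHA-256 and the "key:" / "sha256:" prefixes are assumptions for illustration only, not what the engine actually does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// secretCacheKey is a hypothetical helper, not engine code.
// An explicit cache key wins when provided; otherwise the key is a digest
// of the plaintext, so a changed value produces a different key.
func secretCacheKey(plaintext []byte, explicitKey string) string {
	if explicitKey != "" {
		return "key:" + explicitKey
	}
	sum := sha256.Sum256(plaintext)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	v1, v2 := []byte("token-v1"), []byte("token-v2")

	// Default: the rotated value gets a new key, so a dependent WithExec would re-run.
	fmt.Println(secretCacheKey(v1, "") == secretCacheKey(v2, "")) // false

	// Explicit key: rotation keeps the same key, so the cache would still hit.
	fmt.Println(secretCacheKey(v1, "my-token-group") == secretCacheKey(v2, "my-token-group")) // true
}
```

The prefixes only keep a user-chosen key from colliding with a digest string; whether that matters in practice depends on how the keys end up being stored.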
@civic yacht @fair ermine I added a repro here https://github.com/dagger/dagger/issues/10300#issuecomment-2845466410
Can a function connect to a service it created?
or, can a function mount a unix socket into a container then connect to it?
basically is there any way for a function & container to send data streams to each other
atm I think it would require calling .Up on the service, but at a cursory glance we only have integ tests for up being called on the host so not 100% sure what the behavior is right now when called in a function. All the session attachable stuff should be setup to enable that theoretically though
ah, was this maybe something I ran into with the vault provider tests @tepid nova ? Here it is working in the tests: https://github.com/dagger/dagger/blob/main/core/integration/secretprovider_test.go#L140