#maintainers
1 messages Β· Page 10 of 1
It's similar to entrypoint in an oci image, just more specialized. If you call my build function and get a container, you may not know what's the right shell to call in the container.
The alternative is to use the actual OCI entrypoint. But it's super overloaded, you have to share it with services running in the contianer, and who knows what else, so I'm not a fan of that option.
To all the core developers on Mac and working inside a VM (due to build constraints), what's your config ? It takes me +9 minutes per dev build on my mac m1, i allocate 16go of ram + 5 cores, inside an alpine arm64 vm π’
My strategy is to not know how fast it would be on native linux
that way I'm not sad
@rancid turret I have a use case in our CI that will greatly take advantage of your upcoming CLI improvements π
Which one? Printing object fields? π
Yes
Just need to add tests to that, have you been able to test it out?
And probably others too (can't remember which right now)
Not yet, will do so tomorrow morning. Also I have 2 nits, but can be follow-ups.
- Would like to default to json rather than yaml (bikeshed alert!)
- For reasons that I will gladly explain live (related to my dogfooding) it would be awesome to broaden the set of printable fields, to include recursive search for scalars in objects
- Recursive field search
So for example, if I dagger call lint which returns a LintReport, which has a function issues() which returns an array of LintIssue, then the result might be:
$ dagger call lint
{
"errorCount": 1,
"warningCount": 0,
"issues": [
{
"text": "You messed up on line 10 my friend",
"isError": true
}
]
}
For a list that should already be supported. I have an example in the description, with a list of containers. EDIT: Err.. I think that's only when the leaf is the list of objects, not when it's in a field.
Not sure what to do with this:
Heads up, when the final leaf happens to be a binary
Filethecontentsbasically dumps the file into the terminal. Bit of usability issue.
We could have a simple heuristic based on filename extension and/or size and/or presence of binary data?
SDK sources in runtime module
Actually if the leaf is a file, then I think we should dump it (possibly with protection if there's binary data)
I was thinking of a case where the leaf is an object, with children that return files (requiring a print/no-print decision)
quick other question πΌ
I've been willing to print the core/schema variables for debug. I used to do: telemetry.globallogger but it doesnt seem to exist anymore (recent change). I'm trying with
slogger := slog.SpanLogger(ctx, InstrumentationLibrary)
slogger.Warn(":jack_o_lantern::jack_o_lantern::jack_o_lantern:")
where slog is the dagger/engine/slog implementation, with no chance. Do you use something else ? π
What command specifically takes 9 minutes? Just building the engine for me takes probably 2 minutes tops with no cache, 30s tops with cache but some changes. Thatβs in a linux vm on m1. I give the machine all cpu cores from the host and 32gb ram (which is overkill).
I use UTM to run the VM, debian testing for the distro
I just hit this error in the middle of my inner devloop... One second dagger call worked fine, the next:
$ dagger call --source=.:default sdk python lint-report
β initialize 4.4s
! get module name: returned error 400 Bad Request: failed to get schema for module "dagger": failed to get field spec: failed to decode default value for arg "runtimeSource": decode "Directory" ID: failed to decode base64: illegal base64 data at input byte 0
Error: get module name: returned error 400 Bad Request: failed to get schema for module "dagger": failed to get field spec: failed to decode default value for arg "runtimeSource": decode "Directory" ID: failed to decode base64: illegal base64 data at input byte 0
Nevermind I found it... it's those aspirational "context" pragmas that got me π
func (t PythonSDK) LintReport(
ctx context.Context,
// Source code of the Python runtime (written in Go)
// +default="../runtime"
runtimeSource *dagger.Directory,
// Dagger documentation (it has python files to lint inside)
// +default="/docs"
// +ignore=docsIgnore
docs *dagger.Directory,
) (*LintReport, error) {
FYI this is how the first pass at "contextual modules" will work @bronze hollow @copper snow π
There is nothing sadly... waiting for someone to have cycles to prototype it
fyi to core devs - we've merged https://github.com/dagger/dagger/pull/7635
this will likely cause testdev to timeout and fail until we fix the underlying engine bug. we think this is related to some of the timeouts we've been seeing before - but we didn't catch it due to this ci misconfiguration :oldmanyellsatyaml:
so don't be too alarmed by seeing failing testdev checks for the time being (it's known, and being discussed in https://discord.com/channels/707636530424053791/1250122361131499664)
How is this going to work when a module is called remotely? How is this going to work with constructor functions?
This is what I do at the moment: https://github.com/openmeterio/openmeter/blob/main/ci/main.go#L36-L58
also, in starting to plan for v0.12, wanted to start a conversation about how we're going to handle backwards-compat in modules (so as to avoid as much disruption to end-users as possible) https://github.com/dagger/dagger/issues/7640
in your case, dagger --context instead of dagger call --ref
default paths are relative to the context dir, which may or may not be a local dir
--context can be a local path or a git repo reference?
that will also allow engine to send accurate git metadata in its telemetry, which will unbreak your Traces UI (at the moment you're not getting grouping per PR and other UI conveniences because of your remote module + git checkout in the module)
correct
cool!
Context directory access: https://github.com/dagger/dagger/issues/7647
@spark cedar @rancid turret Hey π
This PR is ready to be integrated with Go & Python, I let you do the update
https://github.com/dagger/dagger/pull/7498
A small PR that fixes the License generation issue: https://github.com/dagger/dagger/pull/7658
cc @tepid nova @still garnet @rancid turret
Hi, I would like to request for review on this fix https://github.com/dagger/dagger/pull/7661. The elixir runtime cannot run on aarch64 because the image has only amd64 ><
Starting a release thread for v0.11.8 π§΅ (we've been discussing it in other places, this is for consolidation)
π @rancid turret @fair ermine I'm stuck in a cab in NYC if you want to talk live about context dir let me know
I have a call with Gerhard in 10 minutes so I can but fast
Ping if you want to call
I think we're good! thanks @fair ermine @rancid turret
No, just went for lunch.
tldr: keep the idea of dag.contexr().directory() as a default value. forbid calling it outside of a default value
Alright!
Any thoughts on this? Is there a downside to doing this by default? https://github.com/dagger/dagger/issues/7668
What are you trying to do? Creating a "Daggerverse" that has many modules residing in the same Git repository is a common practice today. However, not everyone will commit the generated c...
Just realized with https://github.com/dagger/dagger/pull/7476 that the dagger Cloud integration on our CI links to our Dagger Cloud's team, which seems accessible to just our team.
Any external contributor would have a hard time benefitting from it (without creating an account and running it against its own org), I wonder if we're planning on extending it ? Having like a public/contributors view.
They can still use the local Dagger for sure, but might miss the awesome new experience π₯Ή
Yes we were talking about that just last friday. Since we are focusing on daggerizing other open-source projects, this quickly will become a problem. We need either 1) better support for OSS projects in Dagger Cloud, or 2) a way to manage checks without Dagger Cloud (ie. with code running in the engine).
I would prefer option 1, Traces has been getting better and better, and we already have a free plan, it would be a shame for OSS projects to not benefit
@civic yacht, can we sync on https://github.com/dagger/dagger/issues/7647 in a couple of hours, after my kid goes to sleep? To make sure we're talking about the same thing.
Yep! ping anytime!
@still garnet @spark cedar I'm stuck on a point of architecture in the question for dagger call fix...
Hey Erik, If you have some time today, could you please take a look at the implications of allowing absolute paths (not sure): https://github.com/dagger/dagger/issues/7212#issuecomment-2174255325 ? π Just double-checking as these were forbidden at first + couldn't find on the blame / commit history explanations. Context is probably hidden on some issue's comment somewhere ahah
Let me know when you're available π
will call in 5m
@still garnet, coming back from PTO I noticed that OTel live progress is simpler to implement in SDKs. How much simpler? π
I would be happy to get some thought on https://github.com/dagger/dagger/issues/4855 if you have a minute @rancid turret
@civic yacht @spark cedar @wet mason @wild zephyr on the topic of "dagger behind corporate proxies", which as we know is a frequent source of issue for production deployments... How do you feel about the idea of packaging a pipeline into a self-contained artifact (possibly an OCI image but not necessarily) that contains everything needed to load and execute a given module? Like a "static compilation" option. The thought is growing on me as I see more people get stuck on this (see #1252980234001584150 for a live example)
I'm getting the feeling that 1) downloading engine image 2) downloading dependencies to build SDK 3) downloading dependencies to build the module, will
not be something we can fix for everyone, no matter how many configuration knobs we give them.
I'm not opposed - but how does this interact with the idea of one artifact? Letting anyone make an artifact like this feels a bit different from that goal.
quick thought, esp if I can get to things from my desktop that work for testing locally but break when in a pipeline due to proxy/firewall - would be nice to be able to download the dependencies into the repo as like a cached version then when then repo is downloaded for the GH action to run dagger - then it would be 'local' and ready to go...if thisd is even possible with go.
Two related but orthogonal features I think.
Also it's really heard to define what all the module deps even are - if I take a version number, that might mean a dependency on an image, or a pip package or anything like that - how do we grab every possible one of those?
I think if we're going to keep running pipelines in containers - then they'll always be some degree of needing to do proxy magic, etc.
I mean module loading dependencies . You're thinking of runtime operations correct? We wouldn't bundle those , at least not initially. They are different because they are in the end user's control. It's the user's dependencies. Whereas the deps I'm talking about are ours
@still garnet Hey, if you have 1 minute, I would love to get some hint on this one: https://github.com/dagger/dagger/issues/4855#issuecomment-2180808062
@spark cedar on the topic of "% of free disk or % of total disk?" you explained something about why "free disk" is not desirable. Could you explain it again when you get a chance? π§΅
β¨ Log warning when using deprecated APIs...
I don't know if it's only me but my dagger dev engine is way way faster than it used to be, before a dagger call --source=.:default test custom --run="XXX" would take 4-8minutes, now it takes 2-3 minutes maximum
That's super cool
OK @still garnet custom spans are cool π
I do have nits on the API but will start using the shit out of this in the meantime
haha nice. is this using the python otel sdk?
No I wrote it in Go
Actually it is written in go, I'm just refactoring it
ci/sdk_python.go
The part that makes me sad, is having to write custom spans just to add more human-friendly description on an existing dagger function call
I understand what you said about a "description" argument, which invalidate cache for stupid reasons. But I think there's a good API in there somewhere
like maybe just dag.Context().Description(string)
oh, i have an issue for showing a function's documentation in a tooltip, maybe related?
or maybe just always sending a span with the comments of the function?
talk about an incentive for documenting your functions π
should be easy to implement, all the data's available in the right spot, just need to add the attribute
i guess the main downside is bandwidth, since every API call would have its docs attached
probably compresses well though?
Technically could be cross-referenced with daggerverse index?
probably safer to just send in the telemetry though
btw for the api here, do you mean the otel sdk? or something in dagger
Test failure: terminal timeout
My usual thing about otel-specific vs dagger-specific and related tradeoffs
so Tracer()? or are you skipping that
@stray heron @civic yacht @spark cedar @astral zealot have we made enough progress on the "engine stability" part of prod-readiness, to resume the work on its packaging and prod architecture? One binary / all-in-one image / server mode / etc? It's a major source of issues, currently in pause because an unreliable engine is objectively worse.
(sorry if I'm pinging people on vacation, please ignore in that case!)
If it's too soon, I can snooze and bring it back in say 2 weeks
This issue for example is packaging/architecture: https://discord.com/channels/707636530424053791/1253636253358755840
(and many others)
Everyone using EXPERIMENTAL_DAGGER_RUNNER_HOST, and the associated helm chart, is in limbo until we address it
(that includes our own CI by the way)
on the telemetry draining side of engine issues, I've fixed a few more things that give me more confidence in it, will have a PR up soon
@still garnet @dense dust looking at this trace, it looks like there's a new UI for surfacing errors? https://dagger.cloud/dagger/traces/91c376c4f033a78d517639291213fc24
@still garnet maybe a stupid idea. Have we ever considered decoupling the ID of an object from the "recipe" that produced it? And instead making it a digest of its serialized state? Or is that already how it works?
@vito maybe a stupid idea. Have we ever
Maybe it's a cultural thing but I hate our auto-squashed git commit messages, so verbose
git.Tags(ctx) for gitRepository type ?
As the git.Head() was recently implemented, this: https://github.com/dagger/dagger/blob/66e98c39ef4ed0aa17bbb2c8b47e48b10620bdaa/core/schema/git.go#L232-L252 is not needed anymore, as it can be replaced by (which is lazy):
dagql.Selector{
Field: "head",
}
This means that, when loading a module, we will get one less external call, AND also benefit from the auth of the git implementation πΌ
However, we still have some logic to load the tags of a git repo: gitTags (https://github.com/dagger/dagger/blob/66e98c39ef4ed0aa17bbb2c8b47e48b10620bdaa/core/schema/git.go#L255-L302), used to have a pattern match of a versionTag and a subpath in a monorepo.
I am currently exploring how to add auth to this very call, BUT, I wonder if adding a tags(ctx) functions to the gitRepository implementation would be neater ?
Asking for opinions prior issue / PR πΌ
I think I had a PR for adding this API, will try to dig it up Monday. It was part of other changes but never ended up being merged.
π§΅for v0.12.0
I seem to remember we discussed about restoring the dagger CLI into the dagger engine container at various points - I feel like https://github.com/dagger/dagger/issues/7733 would be a good concrete reason to do this.
@tepid nova I remember you suggested this as a first step towards one artifact.
Any objections if we did this?
We should definitely do it!
@gerhard @Erik Sipsma @jedevc @matipan
Hey @civic yacht , I am currently checking on implementing the git.tags() whilst piggybacking on the git auth (instead of just an os.Exec, as i need it for the private module support).
I am however unsure how / and even if I should find a way to access the gitdns.newGitCLI inside the core/schema/git.go or if the logic shall be located elsewhere? Or if I shall reimplement something like it directly with https://github.com/sipsma/dagger/blob/64f4aaa2f794f4cb92ae0a833b61a953bc63df6f/engine/sources/gitdns/source.go#L286-L286 inside the core git schema
Would you be around for some brain dump ? Otherwise I can fire a draft and you can comment directly on it ?
I'm currently debugging something, how about you send the draft PR and I will take a look this afternoon/tonight?
β¨ v0.11.9 - 24th June 2024?
Hey, got a PR to fix an issue I discover on the TS SDK with default value: https://github.com/dagger/dagger/pull/7740
/cc @rancid turret (you might sleep) so I also ping you @still garnet π
I also updated https://github.com/dagger/dagger/pull/7719 based on your comment @rancid turret π
@civic yacht Is there a way using the current dagger API to get the root context of a module (git dir if it exist, module root if not)?
If so, how can I do it? Is see ModuleFunc has a mod field with a *Module type that has Source but is it only the module source code or it also include parent context dir?
If not, I guess I need to add something in the CLI to send the context (Directory) to the module so it can load any contextual directory or file from it?
Where are you trying to get access to it from? i.e. From the module code itself vs. from the internal engine APIs vs. from an SDK implementation
From the internal engine API, in https://github.com/dagger/dagger/blob/main/core/modfunc.go#L114
I'm going to open a PR soon, so it will be easier to understand what I'm trying to do
Ah okay yeah you should be able to use this mod field: https://github.com/dagger/dagger/blob/ea7fae0716990314ca221fa98f2ed227da5a48c4/core/modfunc.go#L25
That type has a GeneratedContextDirectory field that you can use: https://github.com/dagger/dagger/blob/ea7fae0716990314ca221fa98f2ed227da5a48c4/core/module.go#L39
If you instead need the context before codegen has run, you can use this Source field: https://github.com/dagger/dagger/blob/ea7fae0716990314ca221fa98f2ed227da5a48c4/core/module.go#L21, which has a ContextDirectory() method: https://github.com/dagger/dagger/blob/ea7fae0716990314ca221fa98f2ed227da5a48c4/core/modulesource.go#L247
Yeah I think I don't care if the codegen has run or not, as long as I can get the context path in the typedef
I opened a draft PR to validate the direction I'm taking https://github.com/dagger/dagger/pull/7744
Is there a way to convert this Directory returned to an identifier? I don't have access to the ID method from it
If you use the GeneratedContextDirectory field you can yes because it's dagql.Instance[*Directory], you should be able to call .ID() on that. If it doesn't matter either way whether there's generated code I'd just use that since it's already been generated by this time so you won't be adding an extra overhead using it.
Yes but I need to select a specific path from it, so I need to access the Self.Directory and retrieve it's ID
// We do the path resolution later.
dir, err := fn.mod.GeneratedContextDirectory.Self.Directory(ctx, arg.DefaultPathFromContext)
if err != nil {
return nil, fmt.Errorf("failed to load contextual directory %q: %w", arg.DefaultPathFromContext, err)
}
How do I convert this dir to ID? It's a core.Directory (maybe @spark cedar has an idea?)
Ok so I've check with debug and I'm able to retrieve the files or dir using mod.GeneratedContextDirectory, now I'm stuck with transforming the dir or file into an ID
Why do you need it's ID?
To transform it into an input
but yes, i think you need to convert it to a dagql.Instance somehow
And set it in the function call
Is there any example in the codebase of something like that?
i'm honestly not sure π
I'm setting the input here: https://github.com/dagger/dagger/blob/main/core/modfunc.go#L114
Okay, I'll try to find a way
so the issue is that IDs are from a query
which is represented by the dagql.Instance
so if it's from an input, you can get it as an instance i think
otherwise, you'll need to use Select I think to make it into one
Yeah I guess Select is a good idea
no problem, take your time
It seems AsBlob function can actually convert the directory into an instance
@spark cedar Okay I manage to get an identifier! However, the module source only has its module directory, not the whole contextual repo
Do we have something in the API to access the full context inside mod, or it's only scoped to the module itself and I need to add some features there?
do you not have access to ContextDirectory? that should grab everything
In GeneratedContextDirectory or Source.ContextDirectory?
Source.ContextDirectory surely?
I'd Select the contextDirectory field to get the ID
that directory should have the entire module source
I didn't use select, just
// TODO: path resolution later.
dir, err := fn.mod.GeneratedContextDirectory.Self.Directory(ctx, arg.DefaultPathFromContext)
if err != nil {
return nil, fmt.Errorf("failed to load contextual directory %q: %w", arg.DefaultPathFromContext, err)
}
mm i don't think you want the generated directory
And then I do
server, err := fn.mod.Deps.Schema(ctx)
if err != nil {
return nil, fmt.Errorf("failed to get schema: %w", err)
}
dirInstance, err := dir.AsBlob(ctx, server)
if err != nil {
return nil, fmt.Errorf("failed to get dir instance: %w", err)
}
dirID, err := dirInstance.ID().Encode()
if err != nil {
return nil, fmt.Errorf("failed to encode dir ID: %w", err)
}
return JSON(fmt.Sprintf(`"%s"`, dirID)), nil
right, we should avoid using AsBlob here - that's specifically only for converting local sources to a fixed blob
it's quite expensive - so we shouldn't do it more than we need to (though it's fine as a HACK for now)
Yeah I just want to get it work first, then optimize
Like for enum
So you say:
ctxDir, err := fn.mod.Source.Self.ContextDirectory()
if err != nil {
return nil, fmt.Errorf("failed to get context directory: %w", err)
}
// TODO: path resolution later.
dir, err := ctxDir.Self.Directory(ctx, arg.DefaultPathFromContext)
if err != nil {
return nil, fmt.Errorf("failed to load contextual directory %q: %w", arg.DefaultPathFromContext, err)
}
sure, maybe? i'm honestly not entirely sure, it'll take a bit of poking around
but i think ContextDirectory should have everything you want
Yeah it seems it only returns the module's source
ls
LICENSE backend dagger dagger.json
dagger call test
2024/06/25 12:09:49 WARN failed to get repo HEAD err="reference not found"
β connect 0.2s
β initialize 14.0s
β prepare 0.0s
β context: Context! 0.0s
β Context.test: [String!]! 2.3s
dagger
dagger.json
import { dag, Directory, object, func, context } from "@dagger.io/dagger"
@object()
class Context {
@func()
async test(@context("/") dir: Directory): Promise<string[]> {
return dir.entries()
}
}
backend & LICENSE are missing
I'm going to check if I can work around it with AsGitSource or something
@spark cedar I think there's something not normal with the uploaded context
It misses repo source files and only has dagger.json & source directory but not the complete repo.
Do you have some hint on where the context is uploaded from the dagger CLI, I cannot find it so I might missing something?
I want to check if the source has somehow the full context and if something goes wrong
Hey @civic yacht, could I get some hint on this one ^
I'm stuck with that, the overall context directory DX is done but I'm not able to retrieve the full context directory in the source, I don't understand why
the full context directory in the source
Can you clarify what that means? What's "source"?
Based on this, I assumed that ContextDirectory has the full context dir, but it's only composed of dagger.json and the module source dir
See #maintainers message
What I want is an access to the full context, but it's not what I get when I query the entries of the dir, so I wonder if it's actually uploaded in the ContextDirectory, and if that's the case, where?
If not, I'm searching where do we uploaded the module context dir in the CLI but I cannot find the line
See my draft PR there: https://github.com/dagger/dagger/pull/7744
Oh it's a call to AsModule I guess
Yeah, it's not the CLI. It's done by the API. Remember you can load modules via a graphql query.
Are you using a module with a context dir that's different from the root dir?
How can I know that? here's my dagger.json
{
"name": "context",
"sdk": "typescript",
"source": "dagger",
"engineVersion": "dev-f3bfc77a728042b219f46859c6e748a20118f526"
}
Where's the dagger.json in relation to the git repo's root? Or is it not in a git repo?
I have a git repo,so my context shoudl be the root of the repo, but I only see dagger and dagger.json
At the same level
ls
LICENSE backend dagger dagger.json
I want to see backend in the list of the entries
I don't right now, so I'm exploring why :/
Not all files should appear as you think. They go through the include/exclude patterns. Better put it in a subdir of the git repo to see it more clearly.
Like CI ?
I would assume that if there's no include/exclude directive, it will take everything
Ohhh actually if I add include: "backend", I can see the backend dir in the context
That's funny
{
"name": "context",
"sdk": "typescript",
"source": "dagger",
"engineVersion": "dev-f3bfc77a728042b219f46859c6e748a20118f526",
"include": ["backend"]
}
dagger call test
β connect 0.1s
β initialize 11.7s
β prepare 0.0s
β context: Context! 0.0s
β Context.test: [String!]! 2.6s
backend
dagger
dagger.json
@fair ermine assuming you're trying to implement the "context dir" feature, I don't think you'll be able to reuse the existing Module.contextDirectory() implementation directly, because it optimizes include/exclude to only get what's needed by the SDKs - but you want to control the include/excludes directly without these optimizations. Does this seem right @rancid turret @civic yacht ?
If I specified with include what directories I want in my context, it works fine actually
And yes, I'm working on the implementation
Btw, we assume the dagger.json is at the root of the context right?
Right but the context dir feature should not be affected by the include and exclude fields in the dagger.json, they are unrelated
so if I have
tree -L 2
.
βββ LICENSE
βββ backend
β βββ foo.txt
βββ ci
β βββ dagger
β βββ dagger.json
βββ frontend
βββ foo.txt
We want to be able to access backend because it's at the root of the git repo?
Yes. Either ../backend or /backend. Assuming you're showing the root of the git repo
Ok then I need to add some pieces of logic in the module to load a context as something different of the module context dir
As you said there:
Yes I think so. But it's been a while since I've looked at the internals, so you'll need to double check with an expert π
@civic yacht Any thought?
My guess is that your implementation will look a lot like the implementation of Module.contextDirectory but cannot be built on top of it
Based on my research, I think you're right, the context directory is related to the module itself, not the actual repo
(back from meeting, catching up)
It's the same path, just different contents because you'll use the ignore patterns in the source code instead of the ones in dagger.json, and you have a specific dir to return.
Hmm I don't think it's the same path actually
For example here: https://github.com/dagger/dagger/blob/6200b6dacea33857a850373a3119d2dcf921b5d8/core/schema/modulesource.go#L32
The mod path for something like
tree -L 2
.
βββ LICENSE
βββ backend
β βββ foo.txt
βββ ci
β βββ dagger
β βββ dagger.json
βββ frontend
βββ foo.txt
But I'll also need a contextPath that matches the root of the repo (if the module is inside the repo)
and I'll need to load the directory at the context path inside the module
For example, with this repo I got: kind=LOCAL_SOURCE modPath=./ci repoRoot="<nil>" repoRootSubdir=
Even if it's a git repo, at the root, because dagger.json isn't at the root of the repo
- The context directory is either the git repo root, or the root directory if not in a git repo
- The root directory is the directory where dagger.json is
- The source directory is the directory where dagger.json points to
They could all be different, or all the same, depending on the case.
The context directory is either the git repo root, or the root directory if not in a git repo
Yes, but right now we don't have this context directory implementation in our API
Yes we do, but I don't know what you mean exactly by "implementation".
Can you point me to where then? Because ContextDirectory in the Module isn't it, it's the root directory if I follow the list you mentionned
https://github.com/dagger/dagger/blob/6200b6dacea33857a850373a3119d2dcf921b5d8/core/modulesource.go#L247
And even there: it's using the root dir (of the module) not the contextual
(Which make sense since it's the context of the module) but in my case I cannot use this
Are you sure it's the root directory? I'd ask you for more details on your repro.
Yeah the ContextDirectory should be the .git root dir. If that dir also contains dagger.json then that's still valid
Yes, but not if dagger.json is at a lower level like
tree -L 2
.
βββ LICENSE
βββ backend
β βββ foo.txt
βββ ci
β βββ dagger
β βββ dagger.json
βββ frontend
βββ foo.txt
import { Directory, object, func, context } from "@dagger.io/dagger"
@object()
class Context {
@func()
async test(@context("/") dir: Directory): Promise<string[]> {
return dir.entries()
}
}
Output:
dagger call -m ci test
ci
(My repro is inside a git repo)
l
total 32
drwxr-xr-x 8 tomchauveau wheel 256B Jun 25 19:36 .
drwxrwxrwt 8 root wheel 256B Jun 25 12:36 ..
-rw-r--r-- 1 tomchauveau wheel 358B Jun 25 12:36 .envrc
drwxr-xr-x 12 tomchauveau wheel 384B Jun 25 12:38 .git
-rw------- 1 tomchauveau wheel 10K Jun 25 12:36 LICENSE
drwxr-xr-x 3 tomchauveau wheel 96B Jun 25 12:39 backend
drwxr-xr-x 4 tomchauveau wheel 128B Jun 25 19:36 ci
drwxr-xr-x 3 tomchauveau wheel 96B Jun 25 12:39 frontend
What Solomon asked is that even if dagger.json is in a sub directory, the context shouls be the git repo if the module is inside it
Otherwise, it's the dir of the dagger.json
I'm available for a call btw if you wanna talk about it
That output looks expected, the entries() is running on the dir that contains .git and showing just ci/, which is due to what was mentioned above about context loading currenty being optimized to only load what the sdk needs.
The dir containing .git is the context dir, the ci dir containing dagger.json is the root directory
Okay, make sense, if I include ../backend, it adds the backend in the output
Btw, the loading of the context dir that optimizes to only include what the SDK needs + what the user explicitly specified in their include/exclude settings is here: https://github.com/sipsma/dagger/blob/5231aa7de19266430581da054604ce533cc85898/core/schema/modulesource.go#L641-L641
{
"name": "context",
"sdk": "typescript",
"source": "dagger",
"engineVersion": "dev-f3bfc77a728042b219f46859c6e748a20118f526",
"include": ["../backend"]
}
dagger call -m ci test
backend
ci
Just to avoid misunderstandings, I'm simply describing what is written in the issue https://github.com/dagger/dagger/issues/7647
No change from that spec
So should I also keep a copy of an unfiltered version?
No I don't think so. In the case of huge monorepos loading the entire unfiltered context dir is too expensive. Imo what we should do for this feature is load the context dir as specified by the user's function (i.e. in the @context("/") you have above) at the time it is needed, leaving the current context loading we already do as is for now.
There's a million things we will want to do to optimize this more, but I'd say start simple for now. I'll get some links to give you pointers to how to do this, one min
(This is also just imo and open to discussion/design too)
Okay yeah, so we don't preload any specific context if it's not asked, that make sense and if we load a context, it's only scoped to what the user asked
I like this idea
That's what the context directory is, as it works today.
Yeah, but it depends on the dagger.json include/exclude, I didn't connect the dot on that because I got baited by the requirement of include
No, the path is the same, no matter what's in include/exclude. Just the contents differ.
Yeah, I was talking about the content
Well to be totally clear, we should leave the loading that already exists today, because that is needed by the SDK to even build the module in the first place. But for this new feature of user args that ask for specific directories from the context, load those on demand.
The optimizations I mentioned would allow us to reuse the load cache between all these different calls, but we can just get it working first and optimize later.
mb if that wasn't clear
Okay, let me know
It might be better to just comment on your draft PR so it's not lost on discord, I'll go there
gave some high level pointers here: https://github.com/dagger/dagger/pull/7744
@civic yacht could you remind me the minimal set of env variables needed to run a dev engine?
After you've run ./hack/dev to provision it in docker you can set _EXPERIMENTAL_DAGGER_RUNNER_HOST=docker-container://dagger-engine.dev to point the CLI/SDKs to it
Thanks.
By the way I meant to file an issue, but it's difficult for a dev and stable version of Dagger to co-exist on the same system: each CLI immediately nukes the engine of the other
Yeah that's from early days and just because we didn't want to leave users in a state where the engines accumulate indefinitely w/out manual intervention. But all the recent work to modularize CI and current/planned work around API-ifying engine state, compute drivers, etc. is gonna allow us to improve that as a side-effect
It should be controllable from the CLI I think, in one form or another
If we could get an escape hatch experimental env variable to disable it, in the meantime that would come in handy I think
I just hit an extreme case of this problem, where I tried using dev for a demo today, to show off the interactive debug. But have to re-build cache for extremely expensive operations (CPU speech-to-text inference). If I run into an issue, I can't roll back to my stable engine with its pre-built cache: I have to rebuild the cache for stable too now
Ah yeah, the other part of this is that we currently don't re-use cache across engine versions (docker volumes in the case of engine in local dockerd). That's the right place to have started but a pretty blunt hammer to avoid problems around the cache format/state changing from one version to another. Also ties into storage-driver+compute-driver work.
Yeah I think this would be trivial to implement, worth opening an issue
Yeah to be clear, I wasn't expecting cache to be reused (although obviously that would be great too :), I just was hoping that the old engine (with its separate cache) would still be there for me to fall back to
for this I use ./hack/with-dev and a local dagger-dev shortcut:
β― cat ~/.bin/dagger-dev
#!/usr/bin/env bash
exec ~/src/dagger/hack/with-dev dagger "$@"
which leaves my stable engine alone
Hey @Erik Sipsma , I am currently
Hey @rancid turret @spark cedar
Good news, I made some nice progress on the context directory, I'm going to write some tests soon
However, I have a small question regarding the loading of relative path, let me know what you think of https://github.com/dagger/dagger/pull/7744#issuecomment-2191213961 when you have few minutes π
@wet mason @still garnet am I missing something about the attachable terminals? It seems to pop 2 terminals, one after the other, the second one after the first one exits. this is my code:
// Returns a container that echoes whatever string argument is provided
func (m *MyModule) ContainerEcho(stringArg string) *Container {
return dag.Container().From("alpine:latest").Terminal().WithExec([]string{"echo", stringArg})
}
this is what i get:
β― dagger call container-echo --string-arg=foo
β Attaching terminal: container: Container!Container.from(address: "alpine:latest"): Container!
dagger / $ whoami
root
dagger / $ exit
exit 0
β Attaching terminal: container: Container!Container.from(address: "alpine:latest"): Container!
dagger / $ whoami
root
dagger / $ exit
exit 0
β connect 0.8s
β initialize 5.9s
β prepare 0.0s
β myModule: MyModule! 0.0s
β MyModule.containerEcho(stringArg: "foo"): Container! 40.6s
β Container.sync: ContainerID! 0.0s
Container evaluated. Use "dagger call container-echo --help" to see available sub-commands.
I pushed a first attempt to implement the context directory, it works locally with absolute path for both directory & file.
I'm going to write tests and do the relative path implementation soon
Feel free to give me a first review to catch any mistakes I made
cross-posting this here since I think it could make some of the current mindless boilerplate annoyances in engine development a lot easier if it pans out: #daggernauts message
Let me try to repro β¦ thatβs never been an issue but there was a big rebase right before merging
Looks like itβs getting evaluated twice?
Terminals open sequentially one after the other, so if for some reason that code gets evalβd twice (even in parallel), thatβs what youβd see
func (m *Interactive) Container(ctx context.Context) (string, error) {
return dag.Container().
From("alpine:latest").
WithExec([]string{"sh", "-c", "echo hello world > /foo && cat /foo"}).
Terminal().
Stdout(ctx)
}
$ dagger-dev call -i container β Attaching terminal: container: Container!Container.from(address: "alpine:latest"): Container!Container.withExec(args: ["sh", "-c", "echo hello world > /foo && cat /foo"]): Container!
dagger / $ exit
exit 0
β connect 0.2s
β initialize 1.5s
β prepare 0.0s
β interactive: Interactive! 0.0s
β Interactive.container: String! 3.0s
hello world
@spark cedar that's from the branch as it was merged ... trying on main now, maybe there's been a regression
Nope, same, only one terminal
Ok, I managed to repro by returning the *Container instead
@civic yacht @still garnet Looks like returning a Container causes the terminal to open twice ... I guess because the CLI is re-evaluating the container, it triggers the Terminal instruction again?
Hm yeah I suspect it's because Terminal is impure, so it will get evaluated multiple times. The thing is if it's not impure then you'd get other surprising situations in which you might legitimately be trying to open different terminals on containers that end up being the same, so not sure if just removing impure would be ideal either.
In dagql it's possible to have an impure function return a pure ID, maybe that's what we want in this case?
Ugh. I think you or Alex made a comment about setting it impure and I forgot to integrate it
oh no yeah:
dagql.NodeFunc("terminal", s.terminal).
Impure("Nondeterministic.").
oh ok just read the second part of your comment
Maybe it could just return the original dagql.Object?
Yeah I think it's opening twice because of that. The simplest fix would be to just remove that Impure. It would create surprising situations like I said, but that would be much more obscure than the current behavior. It would be something like you have Terminal calls on just the same alpine base image in two places; in that case you'd only ever get one Terminal. Or replace "the same alpine base image" with any two containers that are equal in their ID, same thing.
I think if you just change this line to return ctr, nil that might do the trick? https://github.com/dagger/dagger/blob/main/core/schema/container.go#L1437
Yeah actually that is probably pretty easy, just return dagql.Instance[*core.Container] from the parent object
That edge case (2 terminals get accidentally de-duplicated) could be solved with an optional string argument? Like description or title or name or something
Also possible (and maybe a nice thing to have independent of all this) but would put more onus on the user to set those, so given the fix should be simple I think it's not strictly necessary here
Sorry I meant a fix to the future edge cases caused by your fix
- Not marked impure
- 2 terminal calls on the same container will only be run once
- Solution: set 2 different names to those terminal calls, to make sure they both run
If we do this #maintainers message there shouldn't be any more edge cases, it can remain impure, different calls to Terminal will open twice even if containers have the same ID but the exact same call to Terminal won't execute twice
Ah I missed that line
I wonder who will publish the first "terminal bomb" module to daggerverse π
@civic yacht btw when you get a chance, this is the terminal on failure part of it: https://github.com/dagger/dagger/pull/7683
Yep on the TODO list!
no rush, π
Radical thought of the day: we should stop bending over backwards to get module source code to work magically with vanilla native toolchains. That means giving up on getting codegen to play nice with those tools, and embracing the dagger way: containerized dev environments running in dagger.
Of course that requires polishing our dev container story (which we need to do anyway). That seems like engineering work better spent, than to add hacks on hacks in dagger develop to try and make dagger-blind linters happy. I don't care if that linter is happy, actually. Just daggerize the linter, problem solved
I would really really like us to start working together to decrease the size of our GHA configuration instead of growing it π
I understand the pressure to just get things done in prod, why take the scenic route when you can just patch the yaml and move on. But if we can't get our own YAML mess under control, what chance does everyone else?
Yeah agreed, I suspect a lot can be trimmed by just modularizing even more. The less straightforward parts involve the need to spread jobs across runners (including eg macos) but I feel like we could improve on it still. One easy option would be to use dagger to generate the yaml, which would honestly make the release process a little easier too
I feel like that idea came up at some point in the past, rings a vague bell
use dagger to generate the yaml,
I have been dreaming of this... It might actually stress test how we do introspection. Because you need to write a module that knows about other modules in ways that may not work (hopefully that makes sense)
The less straightforward parts involve the need to spread jobs across runners
I think pipeline infra is fair to configure in YAML. But we should aim for zero pipeline logic. We're still pretty far from it
Oh sure, I was actually just thinking literally writing code that builds up structs that serialize to the existing yaml, but you could get fancier like that too. The APIs for querying deps exists so should be possible already in theory, or at least close to possible
I just had to run this to (hopefully) unbreak my CI checks... find . -type f -name "dagger.json" -execdir sh -c 'dagger develop' \;
This would be amazing
I updated https://github.com/dagger/dagger/pull/7744 with tests & final implementation, I convert the PR to ready.
Feel free to do the Go & Python implementation when you're ready! @rancid turret @spark cedar
The last things to do is the ignore, which is simply an extra parameter π
Been doing this every time there's an engine update
nice! Don't underestimate the ignore though, it doesn't map directly to llb.local include/exclude. I guess it's the exact opposite of patterns from views though, maybe that will help?
GitHub actions yaml removal
What do you mean?
Race seems to have been introduced on main by https://github.com/dagger/dagger/commit/82b67892fa47038e85f7bf226e25dfedce0aa320 π
Do you have a link to a failure?
https://dagger.cloud/dagger/traces/c71cecd3d42758687c7437e4b7ec98d1?span=1302c9920a784c73, was currently exploring it, but this one seems pretty deep, you might be faster πΌ
And commits prior this one doesn't have the failure, so I presumed it's been introduced by this commit
I mean you will need to convert gitignore format to buildkit include/exclude format
I have a fix, it's a bit ugly but it works: https://github.com/dagger/dagger/pull/7778
race got triggered again. It's possible this is not the root cause π’ I'm at the limit of my knowledge, turning the PR as draft again. Can even close it if someone have better ideas
hmm, might have something to do with testctx. it's really weird that Go detects the race but doesn't seem to actually print the detected race anywhere. :/
Heads up, we're about to merge a PR cleaning up our CI pipelines: https://github.com/dagger/dagger/pull/7766
It's not a major refactor, but a batch of small changes that we'd been discussing with @spark cedar . Sets the stage for more refactoring
Noteworthy:
github.com/dagger/daggeris no longer a dagger module. We're using the "dev module" pattern that @rancid turret started in the SDK modules. This means you will need to dodagger call -m dev. The goal is to fix the UX papercut that makes this necessary.- sub-modules are under
github.com/dagger/dagger/dev/FOO, for examplegithub.com/dagger/dagger/dev/go.
Last time I made a contribution to our CI pipeline, I triggered a hidden problem, so tests passed in my PR but then started failing in main... Fingers crossed this won't happen this time!
@spark cedar I'll add the README change in a follow-up PR, trying to get this in before my time window closes and it falls behind again
OK I just had my first taste of test instrumentation in Dagger Cloud... Pretty magical @still garnet !
Actually I'm having trouble with failing engine tests... π Can anyone reproduce this?
dagger call -m github.com/dagger/dagger/dev@pull/7766/merge \
--source=https://github.com/dagger/dagger#pull/7766/merge \
test \
all --race=true --parallel=16
Saw the ping on the PR, the job was just cancelled for some reason and based on the traces it ran for 5m, got cancelled and then the GHA job was stuck for 17m. So it may not be anything to do with your PR. I'll try re-running it to see if it happens again
Ah okay yeah I'm running locally too in parallel, will look at that trace
Actually that trace doesnt have any errors? Just this
I think I CtrlC-ed it because I wasn't sure if it was hanging or not.
- There was a TUI output with a list of tests that seemed to be running
- But stdout was showing me a go-style list of FAIL lines (without details), with no movement after 45mn
Yeah the tests take a while locally so won't know for a bit. Related side-note: when running locally it's better to drop that --parallel flag or set it to your number of CPUs (unless of course you actually have 16 cores). Overparallelization makes it slower.
Running locally I am seeing absurd memory usage from dagger call -m ... test all itself, currently 8.2GB RSS and climbing. There has been some known unoptimized code there related to the trace collecting for a while, but I've never seen it get this high before. Not sure if related to your PR or not
That could explain jobs getting cancelled if they are OOM'd
I have 32GB of memory in my VM so we'll see if it can hold out
I got this locally
Is that from a race condition getting detected? or is the test crashing for some other reason?
@tepid nova the failing test in your PR passed on a re-run: https://github.com/dagger/dagger/actions/runs/9700566209/job/26822541272?pr=7766
Here's what I have. It's still running, duration counters still moving, but nothing else is moving. You can see the trace still running: https://dagger.cloud/dagger/traces/c0a65fc6829c38d901f137ef46b714c7
Oh signal: killed is consistent with OOM killing
Oh this trace does have errors
You could try watching htop output or equivalent as the tests are running to see. I'm guessing this is on macos for you locally so not sure how to check the kernel logs of the VM used by orbstack/etc.
didnt redirect the logs so lost in the buffer, rerunning
Okay a few more interesting things:
- When I run the tests locally with
dagger --progress=plainI don't get the absurd memory usage, so something about the TUI is related. Also worth noting that this is in the "outer" CLI, which is v0.11.9, so I also want to test if the memory usage goes crazy when using the latest dev CLI. - In my re-run of the tests in your PR that passed, they passed in 17m (which is slow) but looking at the output it all was going fine and expected speed until it hits one part of the
TestExecError, which deals with a large 100KB string being included in error messages, at which point it gets stuck for like 9m before continuing on and finishing the rest of the execution at normal pace.
Overall this feels like some sort of performance issue (or multiple issues interacting), possibly centered around traces+output.
@tepid nova I can't see how this could be related to your PR and like I said the tests in CI did pass on a re-run, so I'd be okay with merging it and I can keep looking at all this separately.
Ah ok thanks! I was wondering if somehow my PR triggered this, but I couldn't see how
Will merge now then, so I have buffer time to be around if main breaks like last time...
Oh, all checks passed on re-run, too!
The only thing I can imagine is something in the PR is tweaking something somewhere and revealing pre-existing problems, but still better to merge and fix those at this point rather than keep it in rebase hell
tweaking something somewhere and revealing pre-existing problems
This seems to be my specialty these days π
It's a very valuable skill!
OK it's merged, I'll keep an eye out for breakage on main
Speaking of which... https://github.com/dagger/dagger/actions/runs/9717679155/job/26824096680
I guess there are additional checks in main, and since my PR changes CI itself, there was a testing blindspot?
Yeah just saw that, the macos error is a known flake (we have to install colima on it, which seems to work like 50% of the time). The Go SDK provisioning one is a new one, looking
Possibly a hardcoded ci/ that we missed and should renamed to dev/?
in some obscure yaml config somewhere
Yeah these tests use actual CLI+Engine builds published off of main and verify we can provision them. There's equivalents in PRs but we don't do really image publishes/CLI uploads on those so they aren't exactly the same
I see this error in the logs, not sure if it's an expected error
Also that test doesn't seem to run in dagger (no failed trace in dagger cloud)
At first glance that error sounds like it is trying to concurrently provision the engine on the same dockerd as another job and hitting some race condition? Can't imagine how it would be related to ci->dev
Or really anything in your PR. But I also have never seen that one before, looking more
yeah if that error seem like it could be the actual cause of the test failing (as opposed to a "normal" error expected by the test) then yeah I agree
I think it is the cause yeah
Glancing, but I think we encountered something like before with @astral zealot
We were doing something weird with the dind jobs
Could be a weird side effect of an artisanal script looking for ci/consts to inject the version, injection doesn't work, CLI doesn't see its version, thinks it needs a new engine?
The engine it's trying to provision looks correct though, we publish an engine off main with tag set to the git commit, which matches the image of the container it's trying to provision
Oh wait... you have to look in the GHA logs for now since there's a problem happening at the very startup of the engine container: sed error.
Which is in this script we run on the entrypoint to setup cgroups...
<old man yells at cgroups>
I think the error message about the Conflict when trying to provision the engine container is actually a red-herring after all. I forget we explicitly check for that and handle it https://github.com/sipsma/dagger/blob/0290df5df67b2514a0dd045652b0083017a86b14/engine/client/drivers/docker.go#L240-L240
(according to git blame I wrote that code 2 years ago, so
)
The provisioning test passed on a re-run, so yeah this is all unrelated to your PR.
I know that I have seen the sed: write error at least once before sometime/somewhere, but never in CI like this. I'm gonna have to just open an issue for now cause my stack of tasks is already about to topple over.
Following up on this, when I use the dev CLI instead of v0.11.9 the memory usage is still high in TUI mode but not completely absurd (1.3G RSS now vs 10+GB before). I saw this commit on main https://github.com/dagger/dagger/pull/7671 which based on the description seems almost certainly to have fixed it, so good to go there π
I'll give running that TestExecError locally a few tries locally to see if I can repro that weird 9m hang
@civic yacht just double-checking that you still think this is unrelated to my PR, and you don't need me to do anything? I don't want to drop the ball
No none of this appears to have anything to do with your PR, you got "lucky" and triggered some weird stuff that appears unrelated. I'm jsut opening issues for now
If what I'm seeing is correct, I think our test execution times are sometimes at least 5m longer due to a weird hang in TestExecError. So silver lining is that once fixed maybe we'll get another nice boost in CI test execution time?
@fair ermine do you use github.com/dagger/dagger/sdk/typescript/dev for SDK development? Is it part of our CI?
Issue for that here: https://github.com/dagger/dagger/issues/7786 really really odd, it seems possible to me at this point it could even be a GHA problem rather than our own
Oh no this is not about that go sdk provision error, I opened an issue for that here: https://github.com/dagger/dagger/issues/7785
The issue right above is something I noticed when I re-ran your PR tests and they took 17m in the GHA job step but the cloud traces showed they only took 9m
To be clear, I looked around our CI on main for the last few commits and found at least one more instance of that same weird test execution time discrepency, so for sure unrelated to your PR
Some similar things https://github.com/dagger/dagger/pull/7644 https://github.com/dagger/dagger/pull/7706
As was discussed here, since #7628 got merged we are using dind runners for SDK jobs. Default dind runners have 32c and 16Gi, which for SDK jobs is excessive. Since we now have support for dynamica...
On mobile, but the weird docker collisions in the jobs themselves were caused here
Oh that error message in the go sdk job was a red herring, see #maintainers message
Potentially also testctx changes might have upset some timing things here?
I forgot about code I wrote 2 years ago lol
Also possible, it's very hard to tell what's going on there between traces in the cloud, what the CLI does in the GHA job, and the GHA infra itself, all of which could be culprits
If there's any infray related suspicions, it's worth cross posting to #infrastructure - I know there were a couple changes in the last couple days, around worker defs, etc, so not impossible that something fragile in our interaction there broke
There's the new main workers?
By "infra" I meant Github in and of itself, not us necessarily. One plausible explanation is that we are trying to print output to the GHA job and getting blocked. The hang always happens when printing a 100KB single line string, so doesn't feel outlandish that we hit some corner cases there. But also possible the CLI somehow doesn't handle printing that well, etc.
^ Adding to that suspicion is the fact that the hang is only apparent when looking at the GHA job timestamps, it doesn't show up in the traces at all (fwict anyways)
Huh that sounds odd, and not great π if it's still around, I'll try and have a poke on Monday morning, but kinda away from a proper kb until then
The silver lining is that it seems like the engine tests are more consistently under 10m than we thought, at least according to the traces. So our metrics on all that may actually be better than they seem π π€·ββοΈ
time to give pocketCI a try? π
Plz yes π―
π’ i've seen the absurd memory use problem on high verbosity settings with the fancy progress again, specifically with --debug mode
it looks different from this (when i fixed this back for v0.11.8): https://github.com/dagger/dagger/pull/7679
@still garnet any ideas? i know we had those tui changes... nothing stood out immediately as suspicious (but I can try a bisect tomorrow)
(also we should have a test for this π’)
@civic yacht @rancid turret any chance we could chat live with @fair ermine about context dir DX and technical constraints?
Hi, I would like to request review on https://github.com/dagger/dagger/pull/7494. I need it to fix the Elixir issue on https://github.com/dagger/dagger/issues/7788. π
Anyone else getting this when building on main?
$ ./hack/dev
~/git/github.com/dagger/dagger/dev/mage ~/git/github.com/dagger/dagger
> dagger call -m dev --source=.:default engine container export --path=/Users/shykes/git/github.com/dagger/dagger/bin/engine.tar --forced-compression=Gzip
Full trace at https://dagger.cloud/dagger/traces/34d7ba597cd6d4b37560dea38527bec8
β initialize 7.0s
β prepare 0.5s
! unknown flag: --forced-compression
β ModuleSource.resolveDirectoryFromCaller(path: ".", viewName: "default"): Directory! 0.5s
Error: unknown flag: --forced-compression
Run 'dagger call engine container export --help' for usage.
Error: exit status 1
exit status 1
Found it:
$ env | grep DAGGER
_EXPERIMENTAL_DAGGER_RUNNER_HOST=docker-container://dagger-engine.dev
$ unset _EXPERIMENTAL_DAGGER_RUNNER_HOST
Can't wait to get rid of this version decoupling
Anyone else notice on main, that TUI doesn't clear at the end?
Stable:
$ dagger call -m github.com/shykes/hello hello
hello, world!
Main:
$ dagger call -m github.com/shykes/hello hello
β connect 0.1s
β initialize 3.1s
β prepare 0.0s
β hello: Hello! 0.0s
β Hello.hello: String! 0.2s
hello, world!
Files an issue to track https://github.com/dagger/dagger/issues/7801
Problem In main, output from the TUI stays on the terminal after completion. This is different from the behavior on the current stable release, 0.11.8. On 0.11.8: $ dagger call -m github.com/shykes...
First use of Terminal() in a real-world debugging session... π π π I'm so happy
Was finally able to do dagger call --docker-sock /var/run/docker.sock: https://github.com/dagger/dagger/pull/7804
Also, have a PR with some new core APIs for querying+pruning local cache here: https://github.com/dagger/dagger/pull/7767
finally getting back to this now!
hey hey
I'm getting an error related to binfmt when trying to run functions of the dev module in dagger/dagger
failed to resolve image docker.io/tonistiigi/binfmt@sha256:e06789462ac7e2e096b53bfd9e607412426850227afeb1d0f5dfa48a731e0ba5: failed to resolve source metadata for docker.io/tonistiigi/binfmt@sha256:e06789462ac7e2e096b53bfd9e607412426850227afeb1d0f5dfa48a731e0ba5: failed to do request: Head "https://registry-1.docker.io/v2/tonistiigi/binfmt/manifests/sha256:e06789462ac7e2e096b53bfd9e607412426850227afeb1d0f5dfa48a731e0ba5": context canceled
@dense dust You need to run dagger call -m dev --source .. The module and the source directory are no longer the same
cool cool
is this outdated then? https://github.com/dagger/dagger/tree/main/dev#dev-environment
Yes I'm about to push a PR fixing it, but you were too fast π
It's an awkward UX, one way or the other that mandatory -m dev needs to disappear. We're just using the dagger repo as a guinea pig to make progress on the big knot of design problems that is "contextual modules"
nice nice
I was actually trying to get some context on contextual modules, as it might be related to labels and metadata collection
any entrypoints or issues/discussions that you recommend to familiarize myself with that subject?
Well right now as a feature it doesn't exist. It's a set of problems in the UX, that we're trying to fix with a combination of features and best practices.
If you read this issue, and the comments, you'll get most of the context and history (note that it's closed and did not get implemented) https://github.com/dagger/dagger/issues/7199
oh nice, this links to what @fair ermine demo'd last week
The full history of relevant issues:
- "Hey this
daggerdirectory is a weird default" https://github.com/dagger/dagger/issues/6963 (default source directory) - "This is a big knot of problems, let's solve it all at once" https://github.com/dagger/dagger/issues/7199 ("contextual modules")
- "OK lots of bikeshedding, here's an alternative design" https://github.com/dagger/dagger/issues/7432 ("toolchain modules")
- "We're not getting anywhere, maybe this is too big, let's narrow the scope for now" https://github.com/dagger/dagger/issues/7647 ("context directory") --> This is what Tom is implementing
- "By the way
daggeris still a weird default..." https://github.com/dagger/dagger/issues/6963 - "Hey why do I need to
call -m devnow?" -> trying out best practices to unblock design process
awesome, thanks
@dense dust to give you an idea of what I mean by "knot of design problems": https://discord.com/channels/707636530424053791/1257562107692712017
Anyone else getting this?
$ dagger call -m github.com/shykes/hello hello
β connect 0.1s
β initialize 0.1s
! failed to get configured module: failed to get module ref kind: invalid character 'i' in literal false (expecting 'l')
Error: failed to get configured module: failed to get module ref kind: invalid character 'i' in literal false (expecting 'l')
Fails with both dev and 0.11.8
ah, this is my favorite error π’
there will be some more useful message in docker logs for the engine
I am not sure how suddenly we started getting this as our generic failure message, but it happens because we end up trying to parse a message fail: ... as a json object
docker logs -f | grep -v debug doesn't print anything
I get the same error, and same empty non-debug logs, on both stable and dev engines
π
It's weird because last night when I closed my laptop I didn't have this error. And this morning I do.
i don't see it weirdly
hmm, me neither 
Restarting Orbstack fixed it
(but the engine containers were running before I did)
I lost my dev engine in the process
(it disappeared from docker ps after restarting orb. Re-provisioning now)
Maybe it's related to my aggressively switching back and forth between dev and stable engines
possibly related question: when running dev I have _EXPERIMENTAL_DAGGER_RUNNER_HOST=docker-container://dagger-engine.dev, but now I see a different engine name:
$ docker ps | grep engine.dev
d7fdc0d5a541 localhost/dagger-engine.dev:latest "dagger-entrypoint.sβ¦" About a minute ago Up About a minute dagger-engine.dev
Is that localhost/ new, and could it be related to my issues?
From the release draft:
In previous releases, we faced hiccups where older clients couldn't connect properly to newer engines, leading to frustrating interruptions. With v0.12, we've implemented version checks directly into the client and engine code. Now, if a client or engine doesn't meet the minimum version requirements, you'll get a clear error message. This means no more unexpected compatibility issues during upgrades β everything should just work seamlessly.
How does this work in practice? How are the minimum version requirements defined?
No that's expected (just meant to reduce chances of trying to push it to a registry unless you explicitly retag it IIRC)
They're defined on initial connection to the engine, both the client checks the engine and the engine checks the client. We've actually been doing this since early v0.11.X, though we never really talked about it widely, and we're enhancing it with the module compat as well
Prior to that we had the sdks doing some checking, but we were just emitting warnings, that often got collapsed and missed in the progress output
There's some additional description and technical details in the linked PRs
They're defined as consts in version.go
So we have hardcoded settings that specify which version of the client can talk to which version of the engine? These settings are manually determined based on things like breaking graphql schema changes?
Sort of yes, we have minimum version requirements. Graphql schema changes have been one reason (hopefully less required going forward), but we've also done it for when telemetry has changed, which required big changes across cli and engine, and left very weird results (no progress) with a mismatch
They are manually determined - again, this is just coding explicitly for something that was implicit before, and had a tendency to lead to weird issues
FYI this is an obstacle to the the "test report" pattern: https://github.com/dagger/dagger/issues/7813
I think @still garnet mentioned this in the core dev sync we did last week π
hmmm, found a weird thing @civic yacht @tepid nova
because we recently moved dagger.json into dev, we're now dependent on finding the "root" by using the .git directory
just got hit by a weird issue
was trying to exec a fresh dev CLI, and when adding the --debug flag, my memory consumption spiked to 100%
this was being printed as the --debug output
for context, was using the --debug flag, as the dev CLI is stuck in the "connecting to engine" step
^ @dense dust
Also noticed that issue btw
@spark cedar Given this issue with debug being on... should I maybe turn that off in our prod engine config?
ah no it should be fine there! it's just in the cli, when the --debug flag is specified
in the engine there's no known issue
(worth clarifying this is only seen in dev builds I think as well?)
Im on a windows machine getting this error from a long time:
PS H:\dagger> go run .\cmd\dagger\main.go init --help
# command-line-arguments
cmd\dagger\main.go:72:3: undefined: listenCmd
cmd\dagger\main.go:73:3: undefined: versionCmd
cmd\dagger\main.go:74:3: undefined: queryCmd
cmd\dagger\main.go:75:3: undefined: runCmd
cmd\dagger\main.go:76:3: undefined: watchCmd
cmd\dagger\main.go:77:3: undefined: configCmd
cmd\dagger\main.go:78:3: undefined: moduleInitCmd
cmd\dagger\main.go:79:3: undefined: moduleInstallCmd
cmd\dagger\main.go:80:3: undefined: moduleDevelopCmd
cmd\dagger\main.go:81:3: undefined: modulePublishCmd
cmd\dagger\main.go:81:3: too many errors
you can't run the cli using go run
for developing dagger, we recommend using https://github.com/dagger/dagger/blob/main/CONTRIBUTING.md
I read the dev-docs and tried dagger develop worked well but when tried to run the test error occures
specifically, for the cli, you want https://github.com/dagger/dagger/blob/main/CONTRIBUTING.md
Do you want to build the engine on Windows? π€
I just realize that the file: secret source are not treat ~ as $HOME directory. So I ran dagger call -v=6 -m ./dev --docker-cfg=file:~/.docker/config.json --source=.:default sdk elixir generate export --path=. and it get
Error: failed to get value for argument "docker-cfg": failed to read secret file "~/.docker/config.json": open ~/.docker/config.json: no such file or directory
Is it make sense we could get expand ~ supported for file: ?
I think so π
I'm open a PR to try tackle this at https://github.com/dagger/dagger/pull/7818. The PR didn't have a tests but I can add looking into it tomorrow if you're ok with the solution. π
yea
So will this solve my issue too?
No
thenπ₯²
If you want a dagger cli binary. You may try dagger call -vv -m ./dev --source=.:default cli file --platform=windows export --path .\bin\dagger.exe on the root of the dagger repository.
Still this error exists:
β Module.initialize: Module! 0.9s
! failed to initialize module: failed to call module "dagger-dev" to get functions: call constructor: process "go build -o /runtime ." did not complete successfully: exit code: 1
β exec go build -o /runtime . 0.9s
! process "go build -o /runtime ." did not complete successfully: exit code: 1
β # github.com/dagger/dagger/dev
β ./docs.go:123:6: not enough arguments in call to dag.Go
β have (*dagger.Directory)
β want (*dagger.Directory, string, string)
β ./main.go:78:33: not enough arguments in call to dag.Go
β have (*dagger.Directory)
β want (*dagger.Directory, string, string)
β ./main.go:97:33: not enough arguments in call to dag.Go
β have (*dagger.Directory)
β want (*dagger.Directory, string, string)
β ./main.go:110:46: undefined: dagger.GoLintOpts
Error: input: module.withSource.initialize resolve: failed to initialize module: failed to call module "dagger-dev" to get functions: call constructor: process "go build -o /runtime ." did not complete successfully: exit code: 1
Stderr:
# github.com/dagger/dagger/dev
./docs.go:123:6: not enough arguments in call to dag.Go
have (*dagger.Directory)
want (*dagger.Directory, string, string)
./main.go:78:33: not enough arguments in call to dag.Go
have (*dagger.Directory)
want (*dagger.Directory, string, string)
./main.go:97:33: not enough arguments in call to dag.Go
have (*dagger.Directory)
want (*dagger.Directory, string, string)
./main.go:110:46: undefined: dagger.GoLintOpts
π₯²
Is there any interest in being able to set the runtime/base image for a module via a Dagger function? Something like:
[tool.dagger]
# base-image = "ttl.sh/python-pulumi-6:1h"
base-image-function="dagger -m github.com/kgb33/daggerverse/runtime_containers call python-pulumi"
If there's interest I'll be happy to take a stab at it over the weekend.
found the cause, putting up a PR soon (silent cc @spark cedar)
tl;dr we log when we log, which gets logged, and then logged, and then logged
nice! testing
works π
@tepid nova being able to build the CLI through a dagger function is 
the only wall I hit while using that was forgetting to set _EXPERIMENTAL_DAGGER_RUNNER_HOST=docker-container://dagger-engine.dev after building it
@civic yacht @still garnet re "double terminal" issue ... looks like it's back after rebasing from main?
Happens when returning the full *Container, wondering if it's related to the recent changes in evaluating all fields of objects /cc @rancid turret
Err -- wrong analysis. The problem was fixed when explicitly calling Terminal(), but still happens with terminal on failure
Only one terminal appears:
func (m *Interactive) Hello() *Container {
return dag.Container().
From("alpine").
WithExec([]string{"sh", "-c", "echo hello world > /tmp/hellO"}).
Terminal()
}
Two terminals appear (with #7683 when running with -i):
func (m *Interactive) Hello() *Container {
return dag.Container().
From("alpine").
WithExec([]string{"sh", "-c", "echo hello world > /tmp/hellO"}).
WithExec([]string{"cat", "/tmp/hello"})
}
Might be related to recent changes evaluating both stdout and stderr, and both failing? /cc @rancid turret
@civic yacht Might be related to your question about leaf/parent failures ... if we evaluate the same exec multiple times (e.g. stdout(), stderr()) then it spawns terminals each time?
Not a problem when mounting a failed exec since it's a single solve, but explicitly calling stdout then stderr is distinct solves?
Also -- maybe because when an exec fails it's not being cached, when we dump a failed object fields we end up evaluating the exec multiple times?
Hm possibly, I'll take a closer look and repro to confirm
doesn't look like the exec is being eval'd twice
maybe just wrapErr'd twice, leading to two terminals?
@civic yacht using this for repro:
func (m *Interactive) Hello() *Container {
return dag.Container().
From("alpine").
WithExec([]string{"sh", "-c", "echo $RANDOM | tee /dev/stderr && exit 1"})
}
Okay yeah, think you're right it's the change to evaluate all fields with no args resulting in stdout+stderr being selected, which in turn results in two "leaf" solves and thus two errors. Ultimately the container.withExec still only runs once because buildkit de-dupes the equal concurrent requests, but we still get two errors.
First thought is to include a random unique ID in execution metadata and track those so we only run the interactive debug terminal once for a given failed exec
ExecutionMetadata is not part of the cache key, so safe to just put totally random stuff in there.
I'l double check if there's anything better, but that doesn't sound too bad if nothing else
Yeah I think that approach is probably the simplest. So can just set a field on ExecutionMetadata in Container.WithExec that's a unique id (identity.NewID() is what we use to get random IDs usually), then add a map[string]struct{} to buildkit.Client for tracking which ones we've opened debug terminals for already.
It should work out because dagql caching results in WithExec only running once for a given Container, so you won't get a new random id every time the same query is re-evaluated.
Unsure about the naming, please nit away: https://github.com/dagger/dagger/pull/7683/commits/5a4381171c9b8185c8b31762d1d3eb4dd51124ff
@civic yacht if I pull in the CallID into the metadata it's not going to make your rebase painful?
I could wait and do that on top otherwise
No go right ahead, that's very easy to rebase on
Are you still getting this error? Could you give me a command to run, to reproduce it please? Thanks π
@spark cedar @rancid turret I added the Go support for context Dir: https://github.com/dagger/dagger/pull/7744/commits/96659da7a0c805d126ea9b1eeccd53b6fc739feb
It works when I test it locally but I need to update integration tests to be more generic so it's not scoped to a language.
I'll update this tomorrow
Only Python left, updated tests, a green CI and we'll be good for 0.12
Python is implemented, just need to test it.
@fair ermine, you can take https://github.com/dagger/dagger/issues/6963 next.
Are you still getting this error? Could
I'm currently hitting some DNS issues with nested dagger using the SDK. In magicache's integration tests we are running nested engines manually by:
- Creating a container with
ExperimentalPriviligedNestingwerego testis run - Test uses the dagger SDK to create a container with a new dagger engine
- Targeting the engine container that was created we create an SDK client instance
- Using the SDK client we make API calls to the engine
Problem is that the internal engine, the one created for the tests, cannot resolve external DNS names. It is failing with a connection refused when trying to resolve storage.googleapis.com:
dial tcp: lookup storage.googleapis.com on 10.88.0.1:53: read udp 10.88.0.1:34065->10.88.0.1:53: read: connection refused"
The odd thing is that when the engine starts dnsmasq says the nameserver to use is 10.87.0.1:53, but in the error we are seeing that it goes to 10.88.0.1:53:
23 : [1m40.5s] | dnsmasq[35]: reading /etc/dnsmasq-resolv.conf
23 : [1m40.5s] | dnsmasq[35]: using nameserver 10.87.0.1#53
This started happening with the upgrade to v0.11.9. We were using v0.10.0 previously so its a big jump. There are quite likely many changes that happen in the middle, but do you have any theory as to which could be the one causing the issue? Maybe @spark cedar?
I'm updating the integration tests btw, don't write the test for now
hmmm a lot has definitely changed here, we did fix something that sounds similar to this before: https://github.com/dagger/dagger/pull/7501
ohhhh actually i have an idea of what that might be mayyyybe as well
can you confirm that there are no engines that are overlapping their ip ranges
all the inner ones/all the outer ones
if these are misconfigured, then traffic all ends up in the wrong places
I'll check!
Oh I think I know what might be the issue. Looking closer at the logs, it seems dnsmasq is exiting which makes sense because we are indeed stopping the engine. The problem is that syncing the cache happens once we receive the sigterm which in this case seems to be happening after dnsmasq has already gone away. So the connection refused kind of makes sense, the process is no longer running
I'll check with v0.10.0 and see if I can confirm dnsmasq continues running
Indeed. In v0.10.0 dnsmasq is exiting afte the cache sync is done
makes sense but why is it only happening when running dagger-in-dagger? I don't see the same behavior if I run v0.11.9 locally π€
Tests done, you can do Python π
I'll work on this issue now which may impact the tests again because of the different gen dir
Maybe the behavior of tunnel stop/service stop changed? The engine is being run as a service and is being stopped using the stop function
The interesting thing is that in v0.10.0 it is receiving a SIGKILL. But in v0.11.9 is exiting on SIGTERM
noticed the Attaching terminal: message was a bit strange, so fixed it to be a touch more readable π https://github.com/dagger/dagger/pull/7825
Noticed something odd - the engine is running the same queries over and over in this test:
Which seems to be because it's re-initializing the same session over and over:
- https://dagger.cloud/dagger/traces/14bee421794b09d2012b0a65aed7e286?span=59de326987a2595b&logs#59de326987a2595b:L2083
- (Ctrl+F "initializing new session" for repeats, and "!!!" showing different cache instances)
The test in question makes a series of calls chaining one big Container:
I think this happens because we clean everything up with the MainClientCaller goes away. This is all one dagger session command, but that doesn't actually hold a 'main client' open the whole time afaict, so each sequential call becomes a main client and then cleans up after itself. That slows things down quite a bit because Container.from (and all other queries) get re-evaluated on every call.
I'm guessing this used to work because we established a persistent gRPC connection, but now we make independent HTTP requests. Maybe we need to represent the client lifecycle more explicitly now, so it's tied to dagger session? cc @civic yacht
DagQL + Buildkit cache misses
@rancid turret Could I get an approval on https://github.com/dagger/dagger/pull/7827 please ? π
Fishing out a few issues from the deep backlog, for post-0.12:
- Namespace cache volumes. The longer we wait, the higher the chances that someone is depending on the broken behavior. https://github.com/dagger/dagger/issues/7211
- Quickest win ever. Actual user request. π https://github.com/dagger/dagger/issues/7399
- Add an underscore
_in name mangling: https://github.com/dagger/dagger/issues/6952 - Check if a file or directory exists: still no easy way to do this. https://github.com/dagger/dagger/issues/6713
- Service API has sharp edges: https://github.com/dagger/dagger/issues/7399
Schema views
@spark cedar thanks for #7825!
I've ported the changes to #7683: https://github.com/dagger/dagger/pull/7683/commits/3efaed6c68702a48d01f8e3c9543534ff37677fe
@rancid turret getting a sdk python test failure on [dumb-init] foobar: No such file or directory ... not sure if it's hiding another problem
Re-running the job fixed it, seems flaky: https://dagger.cloud/dagger/traces/c0c97b62c01a23543121eb359adca48c
@civic yacht ok to merge https://github.com/dagger/dagger/pull/7683 ?
Yep go for it!
Saw this fail here: https://github.com/dagger/dagger/actions/runs/9509605842/job/26212800646?pr=7660 Don't have the cloud trace on hand because it got re-run and couldn't find it in the his...
Good morning, any release blockers I can help with?
Hey Solomon! What do you want to do with Erik's concerns in https://github.com/dagger/dagger/issues/7821#issuecomment-2211250114, re: continue printing all functions or limit to state?
Context Since #6515, the CLI prints objects returned by a pipeline. To do this, it calls certain functions in the returned object, specifically functions which 1) return a printable value, and 2) h...
If we only printed fields, what would Container look like?
Ironically, I am now OK with printing stdout/stderr, assuming we fix the behavior of Container.sync so that it doesn't fall back to defaultArgs. But @silent @still garnet doesn't want it to be printed, because his test pipeline returns a Container, and stdout/stderr would be too verbose
Yes, that's a different issue. I was going to remove stdout and stderr in that regard.
Well if we have a rule that "only precomputed fields are printed", we will need to apply that rule to core types, even if the underlying mechanics are not the same.
OK but why? What's the rationale for it, and how to we generalize it - that's the question that started the whole bikeshedding session in the first place
It's not sync that's falling back to defaultArgs, it's stdout and stderr.
Ok, that's a breaking change and I think it makes sense, but I wonder if there's a use case or need for it that we're missing. So if you call stdout and there's no withExec, do we just error? Doesn't happen on sync because it's not trying to output the result of a command. That's why stdout depends on some command.
So there's 3 things here:
- β
Fix
stdoutandstderrso it doesn't fallback todefaultArgs - β Keep or remove
stdoutandstderrfrom being printing onContainer? - β Address Erik's concerns and limit user modules to print only state or keep unrestricted? Implies figuring out a generalization for core types:
Containerwould just excludestdoutandstderr(this also addresses Alex's concern on big outputs):defaultArgsentrypointmountsplatformuserworkdir
Directorywould stay the same:entries
Filewould stay the same:namesizecontentsmanually excluded until we resolve the binary issuedigestin the future, etc.
either error or empty string IMO
Right π
I replied in the issue to keep things moving
LOL inspector @rancid turret π
"but you said last year that you wanted the opposite" https://tenor.com/view/columbo-detective-confused-thinking-peter-falk-gif-13175857
And that issue was one of the longest bikeshed sessions ever, with the concept of image vs container..
Alex on that thread:
For example, if I do container.From("redis").Publish("redis2") - does that mirror the redis image directly to redis2, or does it implicitly run the default command and hang forever instead?
I honestly can't remember any of the user feedback that led me to open that issue 4833. That would definitely help answer your question..
We can just return a clarifying error message if you use stdout without an exec. Hopefully that'll ease confusion:
no command set: stdout requires an exec, at minimum with empty args, to execute the default command
Do you remember what use case led to getting the error?
The typical workaround is to add an explicit withExec(nil), which is very confusing.
Who had to do this and why?
I think people were just confused about building a container and needing to add WithExec(nil) before Stdout(ctx). Wasn't obvious to them.
Maybe returning an empty string would solve our new problem without bringing back our old one?
Wouldn't returning an empty string be more confusing? Some people may expect that the CMD in a Dockerfile is run when you Build(src).Stdout(ctx), so if output comes out empty they start wondering what's wrong.
At least with the error message you can explain what to do.
Yeah possibly. Now I remember that withExec with nil argument will apply the entrypoint + defaultargs, is that still correct?
Yes, it is. But with https://github.com/dagger/dagger/pull/7136 it'll only use the entrypoint if you explicitly tell WithExec to use it.
Fixes #6574 (although with an alternative solution)
WarningIt's a breaking change for anyone currently depending on the entrypoint in withExec, unless using skipEntrypoint (except in Go since y...
OK, so I see how we made the jump from "add withExec(nil) as a workaround to "just add that withExec(nil) implicitly when calling stdout() or stderr()"
Yep.
In retrospect it seems like maybe it was too big of a jump
It used to raise an error: https://github.com/dagger/dagger/pull/4716, but then came https://github.com/dagger/dagger/issues/4833.
Especially since the behavior is very different depending on the number of past withExecs. That doesn't seem like it should be relevant
Out of that laziness confusion issue, came out two changes that are related:
- Users that want to stdout on the CMD of a Dockerfile: implicit
WithExec(nil)onStdout - Users that want to build, without exec of CMD: use
Syncinstead ofExitCode(orStdout)
I think explicit WithExec(nil) is better, with clarifying error message on Stdout if there's no exec.
But, in the CLI, if you keep stdout but remove the implicit WithExec(nil) then you can't just:
dagger core container from --address=alpine
You'll get an error on stdout. It's weird to require this:
dagger core container from --address=alpine with-exec=
Solution is to not include stdout.
βοΈ whack-a-mole
An alternative for the CLI could be to check if the response is a "no command set" error, then make a new request without stdout and stderr and add manually to the response, with empty strings. Bit hacky maybe.
Another alternative would be to add a noFail option (bikeshed needed, not to confuse with exec failure) to Stdout which the CLI could set when printing the object. In that case, the API would return the empty string if no command set.
Yeah I really would prefer avoiding anything that relies on error types. It's a PITA to use, and SDK-specific.
Let's assume for a minute that we exclude stdout/stderr from the autoprint. And let's assume we have a good, generalizable justification for it.
In that case, how do we feel about stdout/stderr returning a clear error on no execs?
In that case I think it's good because it's explicit.
I still kind of like the option of returning an empty string. I've always wanted stdout to be the concatenation of all commands anyway
(like docker logs)
Options:
- Flag in CLI to include stdout somehow (can't think how to generalize this), but default to not adding it.
- Option in API to return empty string instead of error if no command set.
- A new API field with the new behavior. We've always wanted an
Output(ctx)that combines both, but there's details to work out on interleaving and such. - Just don't include the
stdoutandstderronContainer
I think we can handle Container.stdout and Container.stderr the same way we handled File.contents: exclude it manually for now, but generalize it based on (future) type, specifically Either bytes or stream.
Rationale: neither of those 3 functions should return a string: they only do as a stopgap (we don't have a better type at the moment). If they returned a proper non-string type today, it would be obvious that they shouldn't be printed.
Agreed. π
Does anyone have a list of VCS we support for fetching modules now?
Is it still git only?
still only git yup
That title of "multi-VCS" might be misleading then
I changed the title to "Host modules on any git server"
I ended up splitting in 2 issues after all, it made more sense. I just created the missing issue for @still garnet 's PR π
- Don't run a container's default args when printing it. https://github.com/dagger/dagger/issues/7821
- Exclude stdout and stderr when printing a Container. https://github.com/dagger/dagger/issues/7853
hm i was looking at picking up some stuff with the context directories and hit a weird thing @civic yacht - i think ResolveContextPathFromCaller's implementation is actually wrong:
- it first gets the root subpath (relative to the context)
- it then tries to look to find that root subpath in the current directory on the client (using StatCallerHostPath, which is relative to the current directory)
- but the current directory on the client is not neccessarily the subpath - the client ends up looking for <abspath-to-context>/subpath/subpath somehow
weirdly, this seems to work? except context directories somehow makes it error π’
not sure if you ever noticed this, but i'm struggling to think of a way to actually get it to use the right paths here
I've noticed some issues with those paths. One example is when using an "unbundled" sdk with --sdk and --source. The path to the SDK is calculated in relation to --source, which is wrong.
There's another issue where if a module fails to load, it reverts to a parent module (parent as in the filesystem tree).
@still garnet, how do you see the debug logs? I can no longer see them.
@still garnet @wild zephyr @spark cedar @little stone @hasty basin
Notes from the Traces / TUI integration discussion:
- NOW:
- Keystroke for web visualization (proposal: 'w', 'web'), Logged in and logged out.
dagger call --web: visualize trace in web browser. Logged in and logged out.- Print trace URL
- When logged in: print in progress=plain, and in TUI when "lingering" is enabled (disable with
-qetc). - When logged out: same as logged in, a)
https://dagger.cloud/traces/setupb) super short one-liner c) explain how to disable. Proposal:Setup full trace at https://dagger.cloud/traces/setup. To turn this off, export SHUTUP=1(rotationNOTHANKS=1,GOAWAY=1,STOPIT=1,IAMGERMAN=1(jk)
- When logged in: print in progress=plain, and in TUI when "lingering" is enabled (disable with
- LATER:
- Clickable spans (logged in, and maybe logged out?)
dagger traces(logged in and logged out)
@rancid turret @still garnet @spark cedar @wild zephyr @hasty basin @little stone while discussing something else (Traces / TUI integration), we started then paused a sidetrack topic, which I'm resuming here now.
The topic was: with the new "object print" CLI feature, did we make it harder to use dagger call FOO BAR as a discovery mechanism? I'm starting a thread here so we can discuss.
hm i was looking at picking up some
Service.Stop sends SIGTERM to all container processes since v0.11.5
Is this getting bumped for v0.12?
dagger/go.mod Β» github.com/vektah/gqlparser/v2@2.5.11 β gqlparser denial of service vulnerability via the parserDirectives function (moderate severity) - came up on GoReleaser PR grype scan.
Yep, I see it here: https://github.com/dagger/dagger/pull/7695/files
cc @little stone π
@rancid turret quick question: https://github.com/dagger/dagger/pull/7136
why does this bring dev/wolfi out of the external dep? no objection, just had a conflict, and was curious
That's because the Wolfi module in Solomon's daggerverse uses a very old version of the APK module from Alex which conflicted with the change in that PR. Even though it should work now because of the schema views, I asked Solomon at the time to update but he was away and said he planned on moving it to our repo anyway so I could just do that.
awesome, yup, sgtm β€οΈ
@spark cedar thoughts on the "compatibility mode" name and description?
We are introducing Compatibility Mode, a new feature that improves backwards compatibility of the Dagger Engine by simulating the behavior of older engines. This allows you to upgrade to the latest version of Dagger with confidence, knowing that your dependencies will still work, even if they target older versions.
At Dagger we believe that backwards compatibility is a crucial component of a successful software ecosystem. This is especially true for a young and fast-moving platform like Dagger: at our current rate of improvement, API changes cannot be entirely avoided. Compatibility Mode will allow us to improve backwards compatibility over time, without sacrificing the velocity which makes the Dagger community so special.
Note that Compatibility Mode still has limitations. Most importantly, we donβt guarantee 100% backwards compatibility, although we aim to get there eventually. We also donβt provide feature granularity: modules canβt pick and choose which API changes they upgrade, and which they donβt. We plan on adding this feature later, when the scale and maturity of the platform justifies it.
Is this accurate? π
@spark cedar your recent refactor of DaggerDev.engine().lint() in our dev module... is that purely cosmetic? It involves moving code across the goroutine/errgroup boundary, but I couldn't find a clear change in behavior.
I'm hoping it wasn't purely cosmetic because it's adding a bunch of merge conflicts to my always-growing queue of linting PRs π
yea, there's a change now - there's a new method called LintGenerate that is called as part of this
Ah I see it now
that probably wipes out a bunch of my own PRs
(which I've been struggling to keep up to date for review..)
will rebase and see what's left that is still useful
Ah actually it doesn't - looks like it's mostly a cleanup of logic that was already there. So might make my PR actually easier (minus the one-time merge conflict)
argh, you should have pinged π’ sorry, i missed the notification when it was marked ready-for-reivew
will look now
I think the one that is affected never left draft (because of CI check issues perhaps)
It's not urgent, the one that is ready for review is so narrowly scoped, that I'm confident it can survive until the release rush is behind us
I'm mostly thinking about the next ones in my queue
i've reviewed the golangci one
thank you! note that I've scales back my ambitions of high-level check interface, combined reports etc. I'm taking a baby step approach (which is why my queue is growing larger, and more vulnerable to conflicts)
@spark cedar I have a question about compat mode. When loading a targeting an older engine version, and that module is the top-level (not a dependency) - how is the decision made to update the engineversion field vs. activate compat mode? Is the answer just "no compat mode for the top-level module, only for deps"?
exactly the second π
also we're in #911305510882513037 now for low level chats
@still garnet @spark cedar Is it normal these tests are broken: https://github.com/dagger/dagger/blob/55d4cf65e18103a2980f2189392909eb899d83b4/core/integration/engine_test.go#L365
# github.com/dagger/dagger/core/integration [github.com/dagger/dagger/core/integration.test]
core/integration/engine_test.go:365:36: cannot use dagger.ContainerWithNewFileOpts{β¦} (value of type dagger.ContainerWithNewFileOpts) as string value in argument to clientCtr.WithNewFile
core/integration/engine_test.go:366:4: unknown field Contents in struct literal of type dagger.ContainerWithNewFileOpts
core/integration/engine_test.go:368:33: cannot use dagger.ContainerWithNewFileOpts{β¦} (value of type dagger.ContainerWithNewFileOpts) as string value in argument to clientCtr.WithNewFile("/work/dagger.json", dagger.ContainerWithNewFileOpts{β¦}).WithNewFile
core/integration/engine_test.go:369:4: unknown field Contents in struct literal of type dagger.ContainerWithNewFileOpts
core/integration/engine_test.go:379:36: cannot use dagger.ContainerWithNewFileOpts{β¦} (value of type dagger.ContainerWithNewFileOpts) as string value in argument to clientCtr.WithNewFile
core/integration/engine_test.go:380:4: unknown field Contents in struct literal of type dagger.ContainerWithNewFileOpts
core/integration/engine_test.go:382:33: cannot use dagger.ContainerWithNewFileOpts{β¦} (value of type dagger.ContainerWithNewFileOpts) as string value in argument to clientCtr.WithNewFile("/work/dagger.json", dagger.ContainerWithNewFileOpts{β¦}).WithNewFile
core/integration/engine_test.go:383:4: unknown field Contents in struct literal of type dagger.ContainerWithNewFileOpts
core/integration/engine_test.go:393:36: cannot use dagger.ContainerWithNewFileOpts{β¦} (value of type dagger.ContainerWithNewFileOpts) as string value in argument to clientCtr.WithNewFile
core/integration/engine_test.go:394:4: unknown field Contents in struct literal of type dagger.ContainerWithNewFileOpts
Got this error after a rebase, thought I messed up something but no
Or then, how do I run my tests? Cause it seems it's because of the new version compat thing
hm this is related to @rancid turret's change
It has been introduce in https://github.com/dagger/dagger/pull/7692/files
Oh sorry π¦
nah, that's my bad
fix: https://github.com/dagger/dagger/pull/7867
i'm headed off for the evening now, might be back later, so merge when tests pass π
@spark cedar not urgent but when you get to it: the new DaggerDev.engine().lintGenerate() is specific to 'go generate', and seems orthogonal to whether 'dagger develop'-style codegen has been called. Does it matter to that function whether 'dagger develop' has been called or not?
No - it just lints the generated files, specifically the protobuf (not the dagger developed files)
Thanks. Context is I'm preparing a PR to clean up my GoToolChain.WithCodegen mess, and make handling of dagger-develop more sane (per your suggestion a while back)
β I'm stuck on a mysterious long-running trace, unsure if it's ever going to return... https://dagger.cloud/dagger/traces/662f791e5db823ea7604b7f1db76db6a
Maybe this has already been caught, but: it seems impossible to pass all dagger checks locally on our own codebase with a dev engine, because calling dagger codegen in the pipeline will change engineVersion to a dev version, which linting will detect to be different from the current checked in version. The only workaround is to first call dagger develop with the dev engine on all modules; then call the linting checks; then manually call dagger develop again from a stable engine to revert the engineVersions. Seems a bit complex
@civic yacht any chance we could make it so stable & dev dagger don't wipe each other's engine?
the dev engine created by ./hack/dev doesn't wipe stable's cache. Do you mean dev engine as in one built off main (with git commit tag, etc.)?
Ah, good to know!
What about the opposite: stable CLI removing dev engine?
I just upgraded to 0.11.9, saw "removing engine-<sha>" and immediately panic-control-C-ed
And it looks like that briked my dagger install π
$ dagger functions
Full trace at https://dagger.cloud/dagger/traces/2e74476798105cffad0334bef6329bb9
β connect 0.7s
! start session: connect buildkit session: unexpected status 200: update session state: initialize client: failed to create buildkit session: failed to create go sdk content store: mkdir /usr/local/share/dagger/content/ingest: read-only file system
β starting engine 0.7s
β create 0.7s
β starting session 0.0s
! connect buildkit session: unexpected status 200: update session state: initialize client: failed to create buildkit session: failed to create go sdk content store: mkdir /usr/local/share/dagger/content/ingest: read-only file system
Error: start session: connect buildkit session: unexpected status 200: update session state: initialize client: failed to create buildkit session: failed to create go sdk content store: mkdir /usr/local/share/dagger/content/ingest: read-only file system
(I get the same error manually removing the 0.11.9 engine container)
no they operate independently of each other, won't stomp on each other's cache.
That message you saw means that the previous stable engine (i.e. v0.11.8 or whatever) is being replaced with v0.11.9, so the new stable engine is removing the cache of the old stable engine. In theory it's trivial to not do that, but it's risky in that we don't guarantee/test that caches are compatible between versions.
OK that makes sense and yeah, I don't mind stable stomping previous stable
Now I'm getting docker/orbstack filesystem errors but perhaps unrelated
I think I may have triggered an orbstack and/or docker race condition by cancelling the docker rm -v half way
That is probably related to the "read-only file system" errors you're getting now. That's extremely odd and I can only think of filesystem corruption being the cause
Yup my orbstack install appears to be completely broken:
$ docker pull alpine
Using default tag: latest
Error response from daemon: error creating temporary lease: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown
Hi @civic yacht !
Somehow my signed-off DCO line fell off squash in last PR. Okay if I git push origin main --force-with-lease with my amended commit?
Or another fix better.
From a legal POV, we can live without a DCO in that one commit FYI. Especially by an employee.
I'm not sure if we allow force pushes to main at all. I'm also not sure if it really matters
Yeah if it's fine legally then I can't think of any other reason it would break. The commit is still verified and such since it was from a squash+merge PR
Works for me. Just the jaundiced "eye of GitHub" was upon me.
I think the yellow dot just means CI is still running in this case
We disabled DCO check for (reason)
I think it was broken?
current status: removing orbstack and re-installing docker for mac... Karma
@civic yacht @spark cedar fwiw, switched Docker over to its btrfs storage driver (since that's my underlying fs) and now ./hack/dev takes 48s π (possibly related to the fact that I nuked everything, but using the native fs also just feels more appropriate)
huh interesting, generally speaking we do seem to be pretty bottlenecked by disk performance (https://github.com/dagger/dagger/pull/7863), but that's still a pretty dramatic improvement over 5m
Perhaps a few strategically placed Container.withMountedTemp might help?
π€·ββοΈ maybe overlay2 on btrfs just isn't a good combo, or I just had a ton of cruft lying around. rm -rf /var/lib/docker took upwards of 7 minutes
idk yet, we need to drill down more. Something about it all feels off and like there's some extreme inefficiency somewhere (like way more read/writes than necessary, too many fs syncs, etc.), but just a gut feeling
highly possible, kernel version would be relevant too since they're always optimizing more stuff in that whole area
β¬’ [fedora-toolbox:38] β― uname -a
Linux dev.fedora 6.8.9-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 2 18:59:06 UTC 2024 x86_64 GNU/Linux
Well that's pretty new, so whatever it is probably still exists (if it is the kernel not handling this well)
More conflicts on dev pipelines...
Damn, removing go aliases makes those lines pretty long...
but nicely explicit at the same time
./hack/dev on main, on a Mac, produces an elf linux binary..
Have we considered getting rid of local checkout in our own CI, and doing the checkouts in Dagger instead, like @copper snow does? One pretty awesome benefit is that the checks in the PR, would be reproduceable by anyone: copy-paste the command, run it locally, boom it works.
Edit: not just the checkout, but the -m MODULE_ADDRESS as well.
eg.
dagger call -m github.com/$GITHUB_REPOSITORY/dev@pull/$GITHUB_PR_NUMBER/merge call --source https://github.com/$GITHUB_REPOSITORY#pull/$GITHUB_PR_NUMBER/merge engine test important
I feel summoned
lol sorry!
Do you agree it would be cool if the Github Checks added by Dagger, were always copy-pasteable straight into your local machine?
One step closer to the CI-less dream
Absolutely! I want to be able to switch between CI systems every week, just for fun! If I canβt do that, my CI is not portable enough.
Obviously, this is an extreme, but moving between CI systems is way more common than people would think
CI tied to SCM is a HUGE step back for developersβ¦Iβm sure itβs good for business though
But trust you will keep distrupting that π
@still garnet hey -- on a fresh install, dagger takes over a minute to start the first time (26s connect plus 50s initialize)
https://dagger.cloud/dagger/traces/dfc0e3edc9c094fc4b85dc987eb3ed70#93ed389bc1e03622
Was wondering if we could add some custom spans to indicate what's going on and make it clear it's a one-time thing (e.g. "Setting up Go module environment" or whatnot)
I had the same conversation recently, so my educated guess is that Alex will answer: "finding the right level of detail has been a game of whack-a-mole. Some people complain it's too much detail, others it's not enough"
I think the way to address that "wack-a-mole" feeling, would be to open an issue, and try to get consensus on a scalable rule (I proposed a rule at some point, but of course can't find it in the maze of archives)
An example of "it's too noisy" I filed just today π https://github.com/dagger/dagger/issues/7866
(I also agree with what you're saying @wet mason. So IMO there a scalable rule that justifies more detail in some places, and less in others)
Yeah, adding an extra datapoint for too noisy / too quiet: maybe timing matters as well. First time I use dagger I may get the wrong impression that it's slow ("connect" is doing more than connect the first time, "initialize" is not just initializing the module but also doing bootstrap work)
That's the problem -- not sure if making it noisier it the solution though. Random first thoughts:
- Maybe make the installer (e.g. brew) trigger that stuff so the "slow start" is attributed to the install, not first run
- Don't be "raw" verbose on initialize/connect: maybe put some strategically, human crafted spans in there ("bootstrapping the engine", "setting up the go environment for the first time")
- Verbose, but hide on cached: so it's verbose the first time around, but quiet the next time (so it's quiet enough day to day, but verbose for first time users
You can also press + once you get impatient and want to see what's happening beneath initialize, which helps a bit. Though I think you need to press it twice at the moment to reach the appropriate level of verbosity
Some of those bespoke hand-crafted spans already exist beneath connect - I'm pretty sure it shows image fetching progress etc. now
initialize is a small levee holding back a deluge of info though without further demarcation
could maybe use some work there
@still garnet Thinking in the context of "I'm new to dagger, just trying the getting started" rather than existing users wanting more details
(or me just now after running dagger on a different machine and wondering why it's very slow -- forgot I just ran an upgrade)
Connect takes 1,5-2 minutes for me π
Every time?? That's not normal!
Yep. As far as I understood, it's due to caching: it downloads all cache volumes every time from cloud. No, I don't think it's normal either, but it is what it is.
For reference: the latest CI job:
https://github.com/openmeterio/openmeter/actions/runs/9878694926/job/27283200190#step:4:236
Ooooh! Yeah that's a known issue. A fix is underway (or perhaps even ready for next release?)
Last I heard the fix was done, but the integration test for that fix surfaced an unrelated issue deep in the engine, which @astral zealot has been chasing down the rabbit hole. I don't know if the fix itself has been merge, or is pending the integration test being finished
@silent @civic yacht a flag for explicitely disabling compat mode would be useful I think. I'm running into that while switching between dev and stable engines (re: Terminal())
@silent @Erik Sipsma a flag for
how's this?
After a couple weeks of usage, I'm starting to have doubts about this "dev module" pattern... wdyt @rancid turret @spark cedar ?
@spark cedar how's the release looking? Anything we can do to help?
you can look at https://github.com/dagger/dagger/pull/7870 π
i'm reviewing now
but more eyes on this one will be good
Done. Left one nit
These test flakes are killing me π
@astral zealot @stray heron @spark cedar could I get a quick LGTM on https://github.com/dagger/dagger/pull/7876 ? It's a module in dev that is not wired in anywhere. There are a bunch of failed CI checks but I don't see how they're not flakes
A centralized "checks factory", separate from the dagger repo, that can process all repo events, live or in batch, but also ingest commits directly, and maintains a database of which module in which commit passed which checks. The pipeline logic for all this would be centralized in its own repo - but it would dynamically import submodules of the target repos, and look for a very specific Check interface.
Then I look at the web ui and/or query a simple API to get the checks for a given commit of a given repo.
THEN I integrate that into my PR
Cut out CI completely
Note: this would also simplify the questions of checking the PR vs. checking the pipelines to check the PR (re: "should the lint pipeline lint itself?" @spark cedar )
@astral zealot my dream would be for pocketCI to give me all the building blocks to create this, as a collection of modules (as discussed yesterday)
@rancid turret can you spare a LGTM? π https://github.com/dagger/dagger/pull/7876
I really wish we had a high-level issue for storage drivers / configurable cache backends. We often say it's our number one feature, but never took the time to write it up like we did for others. I'll get around to it eventually, but would be delighted if someone beats me to it π
We did, it's in Linear (I'll send the link separately since it's not public). We can migrate it to a public GH issue if it makes sense though
Oh nice! Yes I think making it public would be fantastic
Oh btw -- are root-level codegen types (e.g. Container) gone from Go codegen? Just noticed by re-running dagger develop
yup, these are officially gone now - https://github.com/dagger/dagger/pull/7831
Completes the work from #6680 by removing the aliases, which were really tricky to work with from a code completion perspective.
By using #7759, we can avoid breaking the daggerverse, and just appl...
@spark cedar sweet! I think the docs snippets are still using those though
i'll do all the updates for go docs, including @rancid turret's WithNewFile one
Already started that one: https://github.com/dagger/dagger/pull/7755
Also this one: https://github.com/dagger/dagger/pull/7768
lol, hell:
anyone who feels like reviewing that is fun, go for it π https://github.com/dagger/dagger/pull/7884
the PR says "docs" π€
bait and switch
"let me do an easy one, read some markdown"
Oh I see they're snippets
PR 7884
I'll do that tomorrow π
π€ @civic yacht I just noticed DaggerEngineCache types in module codegen - we don't see dag.EngineCache as expected, buuut - is having dag.LoadDaggerEngineCacheFromID exposed an issue?
Hm yeah I guess that's different than when we hid Host from modules in that in this case there are IDable types that end up needing to be hidden.
As long as codegen compiles (which it must I guess), I'd say that it's not ideal but not a huge deal (as in a blocker). Honestly I don't think any of those Load* functions should be exposed since they are highly internal only
I can make a follow up issue for that
sadly, we have to have LoadFromID functions public (again, fancy interface things)
though we could just copy-paste the impl
I think we're at a point where we don't need netlify preview deployments anymore
We just need a bot that leaves a comment saying "to test this docs PR, run dagger call -m github.com/dagger/dagger/dev@pull/$PR/merge --source https://github.com/dagger/dagger#pull/$PR/merge docs server up and browse http://localhost:8000
@rancid turret hehe @copper snow caught a fun thing in our release notes:
Seems like it was added manually. I added that change log in the 3 sdks because the change was done in their codegen alone. But Erik pointed out that with modules this gets released with the cli instead of the lib.
i'm gonna update that i think
here's what i'm thinking
at the top, call out the sdks, and under each item, give update instructions
I made that change because for go/python/ts module authors, the separate sdk release is completely irrelevant. So it feels very odd to tell them to go read some notes on an irrelevant release
Hi, I would like to request the review on this PR to fix the Elixir runtime at https://github.com/dagger/dagger/pull/7844. π
Now that some release dust has settled, will take this opportunity to lobby for reviews on the socket/secret isolation + support for unix sockets as function args PR, which is good to go: https://github.com/dagger/dagger/pull/7804
@civic yacht @rancid turret quick q regarding custom types as argument:
type Test struct {
Foo string
}
func (m *Sandbox) Echo(stringArg *Test) *Test {
return nil
}
I noticed that as soon as a I use *Test as the input type, the function "disappears" (no longer shows up in dagger functions etc).
I'm assuming since it's not supported we're just silently skipping through that function?
From the CLI's perspective, yes we hide those for now since we don't have clean ways to support passing custom object args yet. It does show up in codegen though
Yeah the codegen is actually reasonably fleshed out in terms of handling args/returns that are custom objects, lists of those, custom objects with fields that are other custom objects, etc.
@wild zephyr btw - good call on the stateful TUI cloud URL thing, I just noticed the button didn't show up briefly, and I'd much rather that than press w too early and have it send me to a signup page when I expected to go to a trace
Lol, yes. It was totally considered π
The bug causing flakes in TestLegacy (and possibly the ones in PythonLayout) is proving to be a real treasure trove of wtf https://github.com/dagger/dagger/issues/7900#issuecomment-2226726249
It's dependent on the fact the Test and Bare are both modules names with the same number of characters 
POV: it's monday morning, rainy outside, you're upgrading to Dagger 0.12 on train wifi
Hm, looked like during our github label cleanup we lost our kind/regression label π€ Any objections if I bring it back? cc @twin crow (I really liked it for being able to glance at the backlog/milestone to prioritize)
Any chance we could get the target URL visible for these HTTP spans?
for sure, where would you want the url?
I guess anywhere on that line
Context: I'm running dagger -m github.com/dagger/dagger/dev functions after upgrading to 0.12 on slow train wifi, and staring at this for the last 14mn
grr, yeah, this sucks
When I zoom in, I see things like this which doesn't really help
I tried to press 'w' for web view, but that's hanging in the browser (blank page)
That feels like a Dagger Cloud problem, since after all Discord is clearly working
as a potential future idea - could we have a mini-project called "offline / slow internet". i'd love to have this be better, but i think it's a bit much for just a couple prs to sort out
Yes I've been calling this "offline mode" and it would be super useful IMO. For both 1) Geographical locations with slow internet in general, 2) travel (train and plane wifi), 3) airgapped or restricted networking environments.
@spark cedar is https://dagger.cloud responsive for you?
yeah it is
"offline mode"
One more frustrating issue on slow wifi. Even after installing the latest engine, dagger call will occasionally last 30 seconds, with a progress that implies that it's "creating" the engine (?).
My guess is that it does an image tag lookup on our registry?
What does create mean in this context? Increasing verbosity gives no more detail. (EDIT: I'm looking at the source and will open a PR with more verbosity to answer my own question)
β οΈ Starting a thread to coordinate response to the broken dagger login issue.
Starting a thread π§΅ for v0.12.1
@spark cedar in the last commit we merged to main we broke the helm chart due to an invalid label value (v"0.12.0"). Quick fix here: https://github.com/dagger/dagger/pull/7930
ah, would you prefer https://github.com/dagger/dagger/pull/7923?
Reported in #kubernetes message
The fix in #7917 was bad.
It would generate a bad label for app.kubernetes.io/version:
apiVer...
no preference really
Oh nice!
. mmm, to quote or not to quote
We already do no quotes in the rest of the labels. Looking at some other sample applications it seems no quotes shows up more
How realistic would it be to make some of our core types interfaces? Especially Directory, Container, Secret...
Did we get any closer to that goal with all the recent cleanup and refactoring?
I am at a point where having this would unblock some pretty cool use cases for me (I think)
interfaces in go? or interfaces as in, dagger interfaces?
Dagger interfaces
I'm looking at replacing all instances of apk add with apko in our main pipelines. This is built on the apko module by @still garnet (built in bass btw).
I have one issue around platform selection.
Consider this snippet:
base = dag.
Container(dagger.ContainerOpts{Platform: build.platform}).
From(consts.AlpineImage).
// NOTE: wrapping the apk installs with this time based env ensures that the cache is invalidated
// once-per day. This is a very unfortunate workaround for the poor caching "apk add" as an exec
// gives us.
// Fortunately, better approaches are on the horizon w/ Zenith, for which there are already apk
// modules that fix this problem and always result in the latest apk packages for the given alpine
// version being used (with optimal caching).
WithEnvVariable("DAGGER_APK_CACHE_BUSTER", fmt.Sprintf("%d", time.Now().Truncate(24*time.Hour).Unix())).
WithExec([]string{"apk", "upgrade"}).
WithExec([]string{
"apk", "add", "--no-cache",
// for Buildkit
"git", "openssh", "pigz", "xz",
// for CNI
"dnsmasq", "iptables", "ip6tables", "iptables-legacy",
}).
This is a pretty juicy target (we can get rid of that cachebuster hack altogether). And that's in the critical path of building the engine...
My problem is in the beginning of this pipeline: dag.Container(dagger.ContainerOpts{Platform: build.platform}).
@still garnet can I address this with apko.withArchs? But not sure what string value to pass from a dagger.Platform
I think you need to do this: _, arch, _ := strings.Cut(string(platStr), "/") and then pass arch as withArch - should probably make that easier
Looking at the apko source, I don't think archs is used anywhere..
ooh never mind config is serialized directly to the apko config file?
(let [config (mkfile ./config.yml (json self:config))]
yep, which I think should give us a properly-platformed container, but not 100% sure
Ok I'll try
depends on how loading OCI tarballs works I guess
Is there a dagger.Arch type? Or only Platform with OS & arch mixed?
yeah only Platform
any platform-parsing utilities?
I guess this is the utility : #maintainers message π
yeah nope, it's just a scalar string π’
I guess the default work because it's the apko tool itself that selects it?
so the buildkit withExec default carries to apko default?
https://github.com/vito/daggerverse/blob/36e8ce2f9eb52cd73c4d5397ed4ed65912d7b6ac/apko/apko.bass#L50 <- this TODO does not bode well
i think the root problem is: apko is capable of building multi-arch images, but Dagger's Container only represents a single architecture, which is also what results in that weird multi-platform export UX
Eek 5 +1s, maybe I should do this sometime
It's withArchs plural, not sure if we want to take advantage of that?
My earlier though exactly π
Do we have a documented good reason for that restriction of Container?
yeah, apko supports building a multi-arch image which is great, but a bit of a dead-end atm i think when it comes to bringing it back in to Dagger
There is no good reason imo, just that we haven't done it (there's some weird implications of it potentially that need thinking through)
I vaguely recall @civic yacht saying it seemed like a good idea and/or was necessary at the time, but can probably be revisited now
we've done soooo many refactors/rearchitectures since then
multi platform has always been hard dating back to the cue days..
I do remember some aspects of publishing being quite difficult to use in the current API, can't remember, something about insufficient state in Container... probably the single type you just mentioned
I don't think we can just have a single container be multiplatform. If you do a withExec then you'd be executing N containers (one for each platform, using qemu emulation when not matching the host platform), each of which have their own independent exitCode, stdout, stderr etc. which just doesn't make any sense at all with the Container API.
IIRC I wanted to add a separate type for this like MultiplatformContainer but there were objections which left us in the state where you do the multiplatform publish with that weird arg that provides a list of platform variants.
ah I would have anticipated that it just runs the container using your default platform if it's among the set, not that it would run all of them
but I guess therein lies the debate
The fact that we can run execs on platforms that don't match the host via emulation is a feature, so only running an exec for your host platform would create extreme logical inconsistencies
Couldn't you make it logically consistent like this?:
- A container's set of platforms enable it to run on those platforms.
- When a container runs, it finds an engine for one of its platforms.
- If no match is found, it emulates instead.
(mostly driving discussion, still feels a bit fuzzy, I see your point)
You could, but then each withExec is an independent container with independent outputs (exitCode, stdout, stderr, etc.). One could fail while others succeed, they could output entirely different files/dirs, etc. The current Container api is just not equipped to handle that.
I still like the idea of a separate MultiplatformContainer that has an API meant to handle it. Maybe you could do it on the current Container API by sprinkling optional platform args everywhere too?
Either way, that's why it's tricky π
(Terminal would be a fun case to handle too, do I want N terminals for each platform? etc.)
Another way of thinking about it might be more generalized than just platform details: the idea is just that "I want to chain the same operations onto a set of Containers", so you could think of it like MultiContainer that supports doing just that, with the use case of "run the same operations on these set of platform variants" just being a subset of cases handled
or, withExec takes an optional argument that specifies the platform (singular) to execute on. defaults to the engine's default platform. publish publishes all the platform by default
That's what we shouldn't do in order to avoid weird hidden inconsistencies I think. If a Container actually consists of N platform variant containers then it gets extremely subtle and confusing if some operations end up just affecting one of those containers. The rest of your chained operations that don' t specify a specific platform now operate on a set of containers that are inconsistent in terms of what operations have run, even though it's modeled as a single object.
The route that might make sense to me is to add optional platform args to every API like exitCode, stdout, stderr that allows you to retrieve the value from a single platform variant.
- But does raise the question of what to return in case that arg is not set. Returned the combined stdout/stderr of each container that executed in the last
withExec? etc.
That's what a MultiContainer would avoid
Also, this in theory could all just be a module actually. I can't see any reason this has to be in core anymore. The current core api has just the bare minimum necessary to create higher level abstractions like this in "userspace" (I think)
Can I inject the question of interfaces here? π what if Container were an interface and say ArmContainer, X86Container, MultiPlatformContainer were implementations?
Yeah that would help a lot if we model this support as a separate type, whether it's a separate core type or implemented as a userspace module.
Coincidentally... π
The refactoring was in a completely separate area from all of that, but yeah there's no huge technical blockers on making our core types interfaces. I think it's mainly DX questions like how to make it easy to implement vast APIs like "Container".
- In Go at least, the whole "embedded struct field" seems like an obvious route, but need something across SDKs of course.
It would be cool if you could do this:
type MyCustomType struct {
*dagger.Container
extraStuff string
}
type (m MyCustomType) ExtraAPI() string {
...
}
And then MyCustomType is usable as a Container (comes with a AsContainer API) but also has your extra stuff
I remember we had this feature early on in the modules work but removed it. I think at that time the DX was that you attached methods to the Container type, but something like the above feels like a much better + more sustainable approach
Mainly just need to know what the equivalent would be in Python/TS/etc. Need to make sure it's consistently possible to have a reasonable DX regardless of language-specific features (e.g. go struct embedding, which is highly Go-specific obviously)
I think the biggest bang for the buck would be Directory
Yeah, I can imagine. Should be equivalently possible to do for any Core type
I want to virtualize Directory in all sorts of ways
like imagine a composite Directory that is actually a mux-like router of dynamic handlers, assembled by me out of who knows what insane logic, and I can give it to you and your functions can just use it
Yeah totally agree, you could do all sorts of fun stuff with that
technically we could POC that concept in userland with an alternative core api implementation, but it would be a silo
Yeah wouldn't want that as a requirement. I guess right now the types that tie different modules together are core types and interfaces, but if we get core and interfaces to play nice with each other we could really start just centralizing on interfaces as the way types are shared. Or something like that.
Is there a transparent swap out, like if we renamed all core type implementations to CoreFoo, took over the original type names with perfectly matching interfaces, and touched nothing else, would everything just work? would anyone notice?
I think this is what we can and should do, yeah. Interfaces were implemented in such a way that they look the same as objects, which trimmed massive amounts of per-SDK work to support them on the consumer side (effort required is only on authoring-side). And I think that would also be our saving grace here.
Core types would look the same to all the SDKs, the only difference would be an additional ability to call e.g. AsDirectory to convert module custom types into a virtual core Directory implementation (when those custom types implement the Directory APIs)
And at that point the "standard" core Directory implementation that exists today would just be another implementation of that interface
amazing
that would unlock so many hacks
sorry I mean, so much synergistic innovation!
https://github.com/dagger/dagger/issues/6365 The blocker mentioned there was merged lifetimes ago (relatively speaking) and poking around the current code I don't see this would be too hard to get working. Interfaces are tricky so could be a tricky gotcha here or there but nothing fundamental that I can see
Actually, the way that issue is phrased is slightly different, it's mentioning using core types to implement interfaces, but we are talking about the inverse direction. I'll just update the wording of it slightly. We should handle both directions
No pb, makes sense
Hello,
I am considering changing the GitModuleSource's cloneURL to cloneRef ; It is a breaking change on the API, but it is supposed to be an internal call ...
This is not a mandatory change, I could keep it as is, but if we want to keep clarity, it makes more sense: https://github.com/dagger/dagger/pull/7708/files#diff-128f243f08ccf77a1f421d85c681f55e40096de957474b46c60602c9a0d94d38L429
WDYT ?
Hello,
I'm a big fan of this idea, I really like it - the interface wouldn't neccessarily be able to include all the methods currently on Container, like stdout/stderr/similar, as @civic yacht mentioned above. But we could have those methods exist only on a single arch container, while MultiarchContainer wouldn't have those. The issue is, something like that's a pretty big change, changing the Container type to no longer have those fields.
I think it's fine, either have those fields return an error, or merge the outputs of all archs executing each withExec concurrently. Either way MultiPlatformContainer would be used primarily for publishing (and maybe for test matrices)
I encountered another rough edge with compat mode, while developing on our own pipelines.
- While developing I find myself calling
dagger develop, to re-generate client bindings for a new dependency for example - Since I upgraded to 0.12,
dagger developalso upgradesdagger.jsonto require the latest engine - This disables compat mode and triggers compat errors, which are now my problem to fix
- Reverting this is confusing if you're not familiar with the mechanics
compat mode upgrade issues
@civic yacht @spark cedar Thinking of reworking the engine OTel pipeline into something more foolproof. Pretty tired of troubleshooting the current approach, and it still doesn't work properly in nesting scenarios, which is blocking my error simplification refactor. I've got a loose idea in my head that I'll investigate, hoping I can land something safely before next Thursday (PTO => Tuesday). It's starting to feel necessary to avoid these doubts of "is draining hosed again or is this a spot instance getting reclaimed."
eliminating OTel draining
During try to relocate the Elixir runtime (https://github.com/dagger/dagger/pull/7667), I just realize that include and exclude options in dagger.json file doesn't work when specify --sdk with local sdk path. For example, cd sdk/elixir && dagger init --sdk=.. potato, you will never get any files in include option, while initializing with dagger init --sdk=<github url> potato, the include works as expected. I curious now on how include and exclude works?
Got a quick CI-only PR if anyone's around for a β - I think it's uncontroversial: https://github.com/dagger/dagger/pull/7933
I suppose running in compat mode, I encountered this error in the docs which I'm updating.
Original command is: dagger -m github.com/dagger/dagger/linters/markdown@88d89e8d15ab6ad9ca4043a920d3cd735a6405fd call rules contents
Which errors, because at least in part, because of an issue with a dependency: failed to call module "wolfi" to get functions
https://github.com/dagger/dagger/blob/88d89e8d15ab6ad9ca4043a920d3cd735a6405fd/linters/markdown/dagger.json
shows
"dependencies": [
{
"name": "wolfi",
"source": "github.com/shykes/daggerverse/wolfi@f1481924568b8b317ab3854ded7c62028e484e2e"
}
],
https://github.com/shykes/daggerverse/tree/f1481924568b8b317ab3854ded7c62028e484e2e/wolfi
which has a custom engine version
"engineVersion": "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
But since that linter is no longer around, I'm going to use ruff for now.
It took some kernel code spelunking but I think I fully figured out the bug that's causing flakes and mismatched go.mod files: https://github.com/dagger/dagger/issues/7900#issuecomment-2231773987
Ahh this thread is genuinely such a fun thing to follow along with, thanks for following through with this π
This would definitely make a good blog post if you decided βΊοΈ
Imagine the nanosecond timestamps colliding ahahaha, it's so horrible
Haha yeah it was a good one, my main takeaway is to never trust ns precision again, I think it's usually a lie π
What's most insane to me is how we never hit this before
Except with the weird python flake
I still need to look, but I think it could plausibly explain those python flakes too. But yeah it just requires so many things to align perfectly: generated files having same length but different contents, being loaded from the same paths in different nested clients that have the same hostname, being modified within 10ms of each other for the kernel to return the same cached current_time value, etc.
I'm almost more surprised we got "lucky" enough to hit it. But that's what happens when you have like 1000+ integ tests (when including subtests)
Hello! Just checking in on contextual dirs, is there a blocker there? Anything we can do to help?
the current status is still in #1260259252400820367 message, don't think there's any newer info
@tepid nova where does GOLANGCI_LINT_CONFIG come from in our linting config?
i noticed our linting isn't being correctly done, and this env variable is confusing to me
i can't find any reference to it in the golangci-lint codebase
or the docs
probably a cobra flag? We set it in dev/golangci
or on google either actually
i suspect something is wrong with it, and it doesn't exist, i can't find any reference to it from any other project
ah, it might be a gpt hallucination then π
so it's been silently using another file?
it's silently been using all the defaults rip π¦
also, not sure why, but issue.Severity is all empty strings?
and so Assert should have been erroring but wasn't
opened a pr https://github.com/dagger/dagger/pull/7950
I guess we need tests in some of these modules
sorry about this!
golangci config
Does Container.WithExec not support expand argument? Did I dream this?
(I did dream it. Really wish I had this right now https://github.com/dagger/dagger/issues/7951 )
Yeah, we always resort to "sh", "-c", "..." for that. π
I also need it in WithNewFile π
My pride forbids me from doing WithExec([]string{"sh", "-c", "cat > $HOME/bla"}, ContainerWithExecOpts{Stdin: contents})
Bumping this to keep it alive
@spark cedar I have proper wifi again. Anything i can help with?
bikeshedding on https://github.com/dagger/dagger/pull/7948 π
π§΅ on weird new v0.12 hangs
@rancid turret This PR is ready for a review: https://github.com/dagger/dagger/pull/7824
I have fix all tests π
Could anyone please try to reproduce this?
dagger call \
-m github.com/dagger/dagger/dev@pull/7950/merge \
--source https://github.com/dagger/dagger#pull/7950/merge \
engine \
lint
trying now
@tepid nova is there anything you think we need for https://github.com/dagger/dagger/pull/7824 that @fair ermine is working on? because this changes the default, aside from the docs update, are there other things we need to co-ordinate?
Ah, I don't think we should rush to merge this
Since these changes are very painful to reverse as we've learned, and this decision is linked to other decisions, which are not resolved (specifically: dev module or no dev module?)
Relevant discussion: #1260894336883626025 message
Don't get me wrong I hate the current default
had a successful run in https://dagger.cloud/jedevc/traces/1f0a1e4e1b0c4ae40ac56106a65c920e
just need an approval on that pr, and then we can have the fixed linter
done
Is there another way to pass information from the client to the engine without relying on env var ? As our tests are parallel, I might be interested in passing the SSHAuthSockPath via another channel on all our current tests. I've seen the engine.ContextWithClientMetadata, but the context does not seem accessible on server.client initialization
Like engine.toml?
Or from the Dagger Client during a run?
Clients should pass anything to the server through the client metadata, initialized client side here: https://github.com/sipsma/dagger/blob/c4f9c8672b294d11b8ab0b1a2ac576a02cdc5afc/engine/client/client.go#L1005-L1005
Yes thanks π , but I currently do os.Getenv("SSH_AUTH_SOCK") on that client metadata, however, if I wanted to scope each test with and without this option, relying on an env var might be easily broken, as most of the tests are running in parallel?
You can run nested clients, so the clients are in a container and can each have their own env vars. That's what we do for most of the module tests
Also, the support for passing sockets as args to functions is merged, so if that helps at all it's available on main now
@spark cedar sorry to add to your worries, my "streamline codegen" PR is done, I believe it's implemented properly, but it seems to be surfacing some weird interference between codegen and compat mode, I think?
@tepid nova, anything missing in this Python dev module iteration?
why do you say that?
I don't know, I've never used that module so not sure what to look for. I assume if it's good for you, it's good for me π
it looks greener than any other pr atm π
Asking for review π
on it β€οΈ
aha most of the line diff is linting π
You can filter the commits to get that out of the way π
OK then let's merge it
i have a funny suspicion that the use of the channel might do some fun things with cache, but it also might not - i left a comment about it, i can fix it up myself if ya like
Is there a way for a dagger function to infer its current default platform?
I guess just make a regular syscall for it from inside the runtime container (as long as I don't need the client's platform)
yeah, getting the current platform using something like containerd's platforms.Current is what we do in the dev module
for the client platform, you can pass it as an argument, and use the special current value as a flag on the cli
Would be great to make this the next use of the "contextual default" pattern. Take a dagger.Platform as argument, default to the client's platform
@spark cedar @rancid turret I'm thinking it's time to consider reverting the root dev module... While we untangle the design knot of "default source dir" / "contextual modules"... It's quite painful to have to juggle the -m flag while developing (sorry)
just a note about continually doing this - it means anyone outside of us using this module (like downstream builders, our own docs, etc) have to update as well
not an objection - but just noting that switching has a cost here
Right, I've been continuously warning potential downstreams to avoid importing our submodules for now, for that reason.
It's a tradeoff between using our reop as 1) a stable source of modules to import (which would be nice), or 2) a sandbox to experiment with best practices
At the moment we need the latter more IMO.
(this makes me want the equivalent of Go's "internal", similar to @copper snow asking to hide some of his modules from daggerverse)
yes, please
Does anyone know why our CLI publish function installs nix on top of goreleaser?
ctr := dag.Container().
From(fmt.Sprintf("ghcr.io/goreleaser/goreleaser-pro:%s-pro", goReleaserVersion)).
WithEntrypoint([]string{}).
WithExec([]string{"apk", "add", "aws-cli"})
// install nix
ctr = ctr.
WithExec([]string{"apk", "add", "xz"}).
WithDirectory("/nix", dag.Directory()).
WithNewFile("/etc/nix/nix.conf", `build-users-group =`).
WithExec([]string{"sh", "-c", "curl -L https://nixos.org/nix/install | sh -s -- --no-daemon"})
path, err := ctr.EnvVariable(ctx, "PATH")
if err != nil {
return nil, err
}
ctr = ctr.WithEnvVariable("PATH", path+":/nix/var/nix/profiles/default/bin")
// goreleaser requires nix-prefetch-url, so check we can run it
ctr = ctr.WithExec([]string{"sh", "-c", "nix-prefetch-url 2>&1 | grep 'error: you must specify a URL'"})
sounds like it needs nix-prefetch-url, which is a bit odd. maybe we use it for publishing our nix repo too, so it needs it?
Exactly this
Pulling on the thread of how we use goreleaser (started out just tracking down individual calls of "apk add")... I understand the appeal of goreleaser BUT in the case of our own repo, there's no excuse for not just using dagger instead
Context: I just realized that what I thought was a reusable daggerization of goreleaser, was actually just wrapping this:
- name: publish-latest-version
cmd: sh -c "echo {{ .Version }} | aws s3 cp - s3://{{ .Env.AWS_BUCKET }}/dagger/latest_version"
env:
- PATH={{ .Env.PATH }}
- AWS_EC2_METADATA_DISABLED=true
- AWS_ACCESS_KEY_ID={{ .Env.AWS_ACCESS_KEY_ID }}
- AWS_SECRET_ACCESS_KEY={{ .Env.AWS_SECRET_ACCESS_KEY }}
- AWS_REGION={{ .Env.AWS_REGION }}
- name: publish-latest
cmd: sh -c "echo {{ .Version }} | aws s3 cp - s3://{{ .Env.AWS_BUCKET }}/dagger/versions/latest"
env:
- PATH={{ .Env.PATH }}
- AWS_EC2_METADATA_DISABLED=true
- AWS_ACCESS_KEY_ID={{ .Env.AWS_ACCESS_KEY_ID }}
- AWS_SECRET_ACCESS_KEY={{ .Env.AWS_SECRET_ACCESS_KEY }}
- AWS_REGION={{ .Env.AWS_REGION }}
- name: publish-latest-major-minor
cmd: sh -c "echo {{ .Version }} | aws s3 cp - s3://{{ .Env.AWS_BUCKET }}/dagger/versions/{{ .Major }}.{{ .Minor }}"
env:
- PATH={{ .Env.PATH }}
- AWS_EC2_METADATA_DISABLED=true
- AWS_ACCESS_KEY_ID={{ .Env.AWS_ACCESS_KEY_ID }}
- AWS_SECRET_ACCESS_KEY={{ .Env.AWS_SECRET_ACCESS_KEY }}
- AWS_REGION={{ .Env.AWS_REGION }}
@civic yacht related to what we were just discussing live. How would you feel if the Go SDK by default created the boilerplate in ./internal/dagger/main/main.go instead of ./main.go?
The idea being - what if the code for your dagger module were always a library, that could fit in either a standalone project, or an existing project, without having to change the location
each SDK would be responsible for figuring out what "a library" means
In Go it seemed to make sense to make that a sub-package under internal/dagger, since it's reasonable to reserve that in every go project
You can move some of the main boilerplate there, but there would still be other generated code in the package with the user's code since we need to attach methods to objects in there and you can only do that in Go in the same package.
That wouldn't solve the problem we were discussing afaict though. Users want generated code bindings to module dependencies, so essentially they just want to be able to import what's currently in ./internal/dagger/dagger.gen.go. The main func is completely unrelated I think.
We need to do a little more to make what's in ./internal/dagger/dagger.gen.go importable and usable as a library, but that would be the route I think
You're right that it wouldn't solve the problem of calling modules from an external client, but it seems related, because it could give us a common framework for all projects using Dagger, whether they only want the client, or also want to write "server" code.
So the following would be true for every Go project that uses Dagger:
- Your Dagger module is also a Go module. The Go SDK can create a new go module for the occasion, or daggerize an existing one
- The Go SDK will write the generated client bindings (including your dagger dependencies) in
./internal/dagger. - If you want to extend the Dagger API with Functions, you develop that in
./internal/dagger/main(or maybe./internal/dagger/self?). The Go SDK can generate boilerplate for you there. - <something missing about getting the
dagobject?>