CI improvements | Dagger | Page 1

lusty grove Sep 15, 2025, 8:48 PM

#

Lets start it here!

Categories I brought up earlier:

[ ] dagger-for-github improvements to make it compatible with our own workflows
[ ] dagger git improvements. Currently we do some git checkout dance to get a shallow clone that also has some extra bits we need. In an ideal world, we can just run dagger call -m github.com/dagger/dagger@ref instead of doing a checkout which requires some work in dagger such as https://github.com/dagger/dagger/pull/11030
[ ] stop building the dev engine in every workflow. Instead, the current set of workflows will run against a release engine. This is ok because they already build their own dev engine as a service in the test pipeline. To make sure we still validate the module loading the same way building a dev engine does, we will create a new workflow that creates a dev engine and runs a simple job against it. See https://github.com/dagger/dagger/pull/9853 for more context on the type of error that this is intended to catch

The hopeful outcome is that each workflow can be a very simple event definition + dagger call

warm anvil Sep 15, 2025, 9:21 PM

#

CI improvements

#

Kind of a stretch goal, but maybe we take the opportunity to further break up and simplify the monolith? 🙂

#

If there's an easy target I can take that on

#

I still cringe every time I have to open /.dagger/*.go

glacial moth Sep 16, 2025, 10:14 AM

#

lusty grove Lets start it here! Categories I brought up earlier: - [ ] dagger-for-github im...

SGTM 🎉

#

based on a discussion in #maintainers curious about reworking the changie flow as well? i've wanted to remove it for a while.

#

chatting with @lime kestrel i actually really like the idea of just using labels (like changelog/feature, changelog/fix, etc) to decide which PRs are relevant, and then only extracting the PR title in the changelog.
that forces us to write nice, user-readable PR titles, which ain't a bad thing.
not a huge fan of using an agent, but if we changed the flow to this, it would be a lot easier to give that a try later.

glacial moth Sep 16, 2025, 10:19 AM

#

warm anvil Kind of a stretch goal, but maybe we take the opportunity to further break up an...

.dagger/scan.go could mostly be a resuable modules/trivy
sdk_typescript.go doesn't use the sdk/typescript/dev module
sdk_go.go should mostly be split into a dev module for sdk/go/dev

warm anvil Sep 16, 2025, 2:19 PM

#

another one: you mentioned path filtering @lusty grove .

Hear me out, what if we generated gha path filters from scanning dagger module dependencies? It would get us closer to the goal of "smart events" in native CI (where you don't need explicit path filtering at all).

wdyt?

glacial moth Sep 16, 2025, 2:22 PM

#

i don't think it's entirely trivial, since we'd need to extract include/exclude filters from dagger.json and from the context args

#

(those are also buildkit-style ignore patterns, which may-or-may-not be convertible to shell-style globs)

warm anvil Sep 16, 2025, 2:25 PM

#

right. What's interesting is that we need to do that anyway for snart events. So could be a stepping stone. cc @brittle sphinx @opal minnow

glacial moth Sep 16, 2025, 2:26 PM

#

if we do it, imo, it should be part of the engine functionality - i don't think we should have extra logic that's not part of the engine to detect this.

warm anvil Sep 16, 2025, 2:27 PM

#

yeah it's going to be in cloud in the end

#

but there could be helpers in the engine, just not sure what the interface would be

#

I guess just a list of paths?

#

I'm guessing you don't want an actual gha config generator in the engine 🙂

glacial moth Sep 16, 2025, 2:36 PM

#

yeah, probably a list of paths? i'm not really sure

#

obviously, there might be a super easy way to do this, but my guestimation is that getting something that matches the compat of include/excludes of modules is gonna be fiddly

glacial moth Sep 16, 2025, 3:22 PM

#

ci improvements: https://github.com/dagger/dagger/pull/11066

#

dagger.io update: https://github.com/dagger/dagger.io/pull/4608

warm anvil Sep 16, 2025, 3:54 PM

#

glacial moth obviously, there might be a super easy way to do this, but my guestimation is th...

yeah I see the appeal.

warm anvil Sep 16, 2025, 6:05 PM

#

~~@kpiroddi~~ @lusty grove do you think we can get our CI config so standardized and vanilla, that the YAML could eventually be generated without custom code? 🙂

(Sorry Kelvin for the accidental ping!)

#

Just look at the modules, boom, spit out some yaml

#

(not as a short term thing, more like an aspirational goal)

#

also a stepping stone to native CI 🙂

lusty grove Sep 16, 2025, 6:25 PM

#

warm anvil also a stepping stone to native CI 🙂

yes i've 100% been thinking of it as a stepping stone. we'll keep pushing for the config to be as minimal as possible

warm anvil Sep 19, 2025, 5:53 PM

#

glacial moth obviously, there might be a super easy way to do this, but my guestimation is th...

As discussed: https://github.com/dagger/dagger/issues/11098

#

I love the idea of providing the primitive in the engine, it makes perfect sense. It's non-trivial, but feels clean, and once we have it, it will make it trivial to generate path filters for all CI systems 😍

lusty grove Sep 19, 2025, 7:43 PM

#

draft: https://github.com/dagger/dagger/pull/11099

GitHub

Use released engine in CI by kpenfound · Pull Request #11099 · da...

As discussed, this removes the dev engine from our CI steps and adds one small new step that uses the dev engine to make sure it can load our CI module.
TODO

Merge avoid downloading dagger if req...

#

it wont actually run until the first 2 TODOs are done

#

the 3rd TODO is a bigger one so maybe we can pair on that Monday @brittle sphinx ?

brittle sphinx Sep 19, 2025, 8:46 PM

#

lusty grove the 3rd TODO is a bigger one so maybe we can pair on that Monday <@6283920878800...

Sounds good! We should have a bit more data on parc by monday so we could potentially start migrating some workflows as part of this work

lusty grove Sep 22, 2025, 6:53 PM

#

lusty grove draft: https://github.com/dagger/dagger/pull/11099

@brittle sphinx the key parts of this PR:

https://github.com/dagger/dagger/pull/11099/files#diff-8faf949e9daf6a3189631345b957f26bf1df9843f7fbbebc5a314e49bc9edfcc .github/main.go: disables the 3 spots where we specify that we want a dev engine and adds a new workflow that runs dagger call check --targets=SDKs with a dev engine
https://github.com/dagger/dagger/pull/11099/files#diff-45807afeb464794b5fe50db55cfca3642aca0555d095315263f84a45dcab4ed1 modules/gha/steps.go: replace the install-dagger bash step with dagger-for-github@v8.2.0. This is using the action as a "setup" action - it just installs the CLI. In runs where the correct version is already installed, it's a no-op

The remaining TODO that I mentioned in standup is to update /modules/gha/steps.go:callDaggerStep to use dagger-for-github instead of the bash its currently using, however this can totally be done in a separate PR to keep things simple.

Currently CI is failing due to the ongoing network/docker/ns issues, so I'm not sure how close the current PR is to being ready

brittle sphinx Sep 23, 2025, 12:31 PM

#

I know you are on PTO, just sharing here for context. I'll take a look at that last change. I was reviewing the PR, one thing we should change is the name of the jobs to not include the dev in their name. This way we can visiblity into the switch on honeycomb

brittle sphinx Sep 23, 2025, 1:19 PM

#

Switching to dagger-for-github as is means we are loosing the summary attached to each workflow (see screenshot). We could add this capability to dagger-for-github, Its nice to have the trace immediately there without having to search through the logs. WDYT? @glacial moth. In that case I would merge this PR as is and do the callDaggerStep change in a follow up as Kyle suggested

glacial moth Sep 23, 2025, 2:10 PM

#

uhhh

#

i don't use the summary that much honestly

#

having the links is useful!

#

there's an output from the dagger-for-github actions

#

see https://github.com/dagger/dagger-for-github/issues/179

GitHub

✨ Allow fetching the trace URL from the call · Issue #179 · dag...

What are you trying to do? There should be an action output (like there is for stdout), where we can get the trace URL for the run (if the token was provided, etc). Why is this important to you? I ...

#

so wouldn't be that hard to plumb into the summary? we just need output it

brittle sphinx Sep 23, 2025, 4:00 PM

#

Are the engine logs on engine & cli tests used?

glacial moth Sep 23, 2025, 4:58 PM

#

yesss - you can find them under the AsService logs span

#

takes a bit of clicking around, but they're there

warm anvil Sep 23, 2025, 5:01 PM

#

It would make me happy if we didn't throw away the summary feature, it took me a long time to debug 😭 By the way, the generator made the devloop on the summary easier, because I could call the individual dagger function that generates the markdown, with mocked inputs, and locally check that it looked ok.

glacial moth Sep 24, 2025, 10:38 AM

#

regardless of how long it took to work on, is it something that you use? i find the github summaries are not super useful to navigate to. if we do keep them, i'd rather have them as just a link directly to the trace, instead of having another way to visualize traces.

again, it's the dogfooding argument - if traces isn't good enough, we should make it good enough, but if we're relying on a tool our users aren't using, then we're going to miss those things

warm anvil Sep 24, 2025, 4:15 PM

#

Well, it's a native feature of Github Actions that makes our integration with it slightly better. I am all in on Dagger Cloud, and the CI integration itself is not something I actively debug and maintain these days, so I go straight to the Dagger Cloud traces. But that won't be true of all our users. Having gone to the trouble of implementing a native summary is a sign of a well-maintained action that worried about the details (IMO)

#

Occasionally when the CI config itself is what I care about, then yes, I absolutely use the summary. Much better than rawdogging the logs

brittle sphinx Sep 25, 2025, 1:38 PM

#

Added a summary to dagger-for-github here: https://github.com/dagger/dagger-for-github/pull/192

I included the actual command being executed (or script if running dagger shell), the link to the trace and the output of dagger version

brittle sphinx Sep 29, 2025, 5:44 PM

#

Sharing some more details of today's update re: https://github.com/dagger/dagger/pull/11133 @lusty grove

There's two separate changes in that PR:

Use dagger-for-github instead of the exec script. This removes the install-dagger as well
Using PARC by setting dagger-flags: --cloud and cloud-token: oidc on the dagger-for-github action

For 2. to go out we need tomorrow's release. We could split the PR in two but since we only had to wait until tomorrow I thought it didn't matter that much. This is the change we need before using PARC in CI: https://github.com/dagger/dagger/commit/e669c2a22a319691f82ae2cfbe4d9925ef26e0c0#diff-a779dc2d07845bab6445b177d3ab8e5b5f3ffb6e48d72198b9b92cce6837402eR287. It's a hacky version of scale out, developed as a temporary meassure to start testing the performance of PARC in alternative CI engines.

Next steps:

Develop the v0 scale out we mentioned on today's call. Module + function
Refactor our CI to have separate functions for github actions jobs: check --targets=docs -> check-docs, etc. This way our github actions YAML won't have anything special, it will just call functions with no required arguments

lusty grove Sep 29, 2025, 7:00 PM

#

brittle sphinx Sharing some more details of today's update re: https://github.com/dagger/dagger...

left a comment on the PR!

lusty grove Sep 29, 2025, 8:12 PM

#

glacial moth chatting with <@108011715077091328> i actually really like the idea of *just* us...

dont want to lose this thread (regarding changelogs / removing changie). What tooling do we need to land on a better flow?

lime kestrel Sep 30, 2025, 3:38 AM

#

@brittle sphinx looks like the evals job doesn't work anymore since it runs with final engine instead of dev: https://dagger.cloud/dagger/traces/39defc61d65f33469406c3c8e090d2af

Dagger Cloud

Browse and visualize Dagger traces.

#

will just run em locally for now

glacial moth Sep 30, 2025, 10:01 AM

#

lusty grove dont want to lose this thread (regarding changelogs / removing changie). What to...

i would suggest that use a github label enabled flow

#

i would add changelog/feature, changelog/fix, etc labels to correspond with the changie categories

#

then, when we generate a changelog, simply get all the PRs since the last release, grab em from github, filter by labels, and run the titles + authors through a go template

#

then, implement something like the changie merge functionality to get it into the CHANGELOG

#

that's my vision at least!

#

if we wanna add llm stuff after that that's potentially interesting, but i wouldn't start with that

lime kestrel Sep 30, 2025, 10:04 AM

#

github has basically all that built in btw, there's just a tiny straightforward config file: https://docs.github.com/en/repositories/releasing-projects-on-github/automatically-generated-release-notes#example-configurations

GitHub Docs

Automatically generated release notes - GitHub Docs

You can automatically generate release notes for your GitHub releases

glacial moth Sep 30, 2025, 10:04 AM

#

glacial moth then, implement something like the `changie merge` functionality to get it into ...

i still would really like the option to manually tidy up changelogs if i don't like it

lime kestrel Sep 30, 2025, 10:04 AM

#

lime kestrel github has basically all that built in btw, there's just a tiny straightforward ...

i use this for bass: https://github.com/vito/bass/blob/main/.github/release.yml

glacial moth Sep 30, 2025, 10:05 AM

#

lime kestrel github has basically all that built in btw, there's just a tiny straightforward ...

sgtm, if we're happy using a github tool, i thought part of the goal was to detach from a github-centric process?

lime kestrel Sep 30, 2025, 10:05 AM

#

well, we're still publishing github releases right? this is just a bool/flag on the github release creation api

#

i guess the question becomes whether we value a CHANGELOG file?

#

there's https://github.com/rhysd/changelog-from-release which seems ok. but i'm not strongly opinionated. just trying to raise options for minimal effort 😛

glacial moth Oct 6, 2025, 4:43 PM

#

@lusty grove was this one of the things you were interested on working on?

lusty grove Oct 6, 2025, 7:49 PM

#

glacial moth <@135620352201064448> was this one of the things you were interested on working ...

yeah I'm happy to carry this one. Sounds like a good discussion topic for next week!

warm anvil Oct 7, 2025, 4:45 PM

#

@lusty grove @lime kestrel I'm polishing my codegen improvements PR... A small issue with node_modules which is now getting pulled into the Changeset. Looking for the cleanest way to omit it. Is it acceptable to mount node_modules in a tmpfs?

lusty grove Oct 7, 2025, 4:48 PM

#

I think that should be ok in our case. Is it not possible to exclude it from the directory?

warm anvil Oct 7, 2025, 4:49 PM

#

lusty grove I think that should be ok in our case. Is it not possible to exclude it from the...

In my case, no practical way...

There's no Directory.Changes(exclude)
Also no Container.Directory(exclude)

So it's possible but I have to refactor a bit

lusty grove Oct 7, 2025, 4:50 PM

#

Got it, I was thinking like Directory.WithoutDirectory("/src/node_modules").Changes() but maybe that doesn't make sense

warm anvil Oct 7, 2025, 4:53 PM

#

Yeah that's what I'm going to do for now..

#

Just not sure what is more or less efficient..

warm anvil Oct 7, 2025, 9:03 PM

#

Update: codegen refactor is looking good! will open PR soon

#

I kind of want to attack our version module next... So slow to load..

#

#

I added a compat layer, so we can upgrade each individual SDK codegen to Changeset, incrementally (the module supports both)

#

Also simplified sdk_all.go which had a lot of unnecessary abstractions

#

@lusty grove how do you feel about porting all the official SDK CI modules back to Go?

#

Right now we're in an awkward middle ground, where:

We still have centralized per-SDK CI in the monolith (.dagger/sdk_xxx.go)
Separately, each SDK has a standalone "dev" module, written in its native language.
Some parts of the monolith do a pass-through to the dev module, for some functions. For example Python and Elixir pass-through to the dev module for generate and lint.
Other parts of the monolith still duplicate what's in the dev modules, so there's drift and dead code.
We get the downsides of multi-language (slowness, more complexity, more boilerplate), but we don't get the benefits

lusty grove Oct 7, 2025, 9:20 PM

#

the alternative would be to fix number 4 i guess?

warm anvil Oct 7, 2025, 9:20 PM

#

My proposal (for discussion):

We port the dev modules to Go
Once everything is in Go, much easier to finish the job, and spin out 100% of SDK CI in sub-modules
This is a forcing function for cleaning up the very complex relationship between SDK build and engine build. There's a circular dependency because: engine build bundles SDKs; and SDK tests require engine. Until we clear 1 and 2, we can't touch 3.

#

As a bonus, it would make our entire CI faster

#

everything is so complicated...

#

🤮

#

https://tenor.com/view/squirrel-brain-squirrel-math-hard-school-gif-24731714

Tenor

warm anvil Oct 7, 2025, 10:36 PM

#

https://tenor.com/view/alice-wonderland-maze-labyrinth-gif-7202023

Tenor

warm anvil Oct 7, 2025, 11:15 PM

#

Running into a fundamental UX question: different generation functions have different lifecycles

#

In devloop: generate SDKs and docs
In release loop: generate changelog (because that's how Changie works)

So far we made the assumption that all generation is always safe to run during devloop, and therefore should be required by CI (ie. there's a check that fails if you forgot to generate anything). But we can't do that for changelog... But it's still a generation function! So... how to expose this lifecycle difference in a way that makes sense to the user?

#

cc <@&946480760016207902>

#

Maybe we'll eventually need a first-class concept of "loop" or "gate"?

#

maybe to be reconciled with the Concourse-like declarative model? Here we would be declaring the stages of your lifecycle

#

And how to graduate from one to the other

#

(just spitballing)

warm anvil Oct 8, 2025, 12:00 AM

#

Starting to get there: https://github.com/dagger/dagger/pull/11161

warm anvil Oct 8, 2025, 1:26 AM

#

warm anvil - In devloop: generate SDKs and docs - In release loop: generate changelog (beca...

Update, looks like I got it wrong, the check for generated Changelog is called in devloop (and checked in every PR). I was missing the fact that new entries from changie new don't flow directly into the generated changelog - they have to be staged with changie batch first

lusty grove Oct 8, 2025, 4:24 PM

#

I think we're ready to simplify the changelog generation so that will go away probably

warm anvil Oct 8, 2025, 4:28 PM

#

Just playing devil's advocate, I don't think changing how we generate changelog is the lowest-hanging fruit - even though it sounds cool and I'm not a changie fan

#

(raw reaction from spelunking in our monolith... it's pretty gnarly in there still)

lusty grove Oct 8, 2025, 7:56 PM

#

Definitely not lowest-hanging, but it fits somewhere in the CI improvements roadmap

warm anvil Oct 8, 2025, 7:57 PM

#

@lusty grove when you're around I'd love to show you around my monolith simplification PR, it's pretty satisfying

warm anvil Oct 9, 2025, 12:27 AM

#

@lusty grove not urgent - what's the context for this GHA workflow added last week?

        withWorkflow(
            ci.AltRunner,
            true,
            "Dev Engine",
            "check-sdks",
        ).

#

It seems redundant with this other one:

    withSDKWorkflows(
            ci.AltRunnerWithCache,
            "SDKs",
            "python",
            "typescript",
            "go",
            "java",
            "elixir",
            "rust",
            "php",
            "dotnet",
        ).

Except for the different runner

lusty grove Oct 9, 2025, 12:29 AM

#

When we stopped running everything on dev engines, we agreed to have one small task that still runs against a dev engine to verify our CI module loading against dev and all that. So the specific check is somewhat arbitrary

warm anvil Oct 9, 2025, 12:33 AM

#

Ah I see.

#

@lusty grove I'm trying to organize the "check matrix" in the monolith..

#

The older workflow (withSDKWorkflows) runs each SDK check in a parallel GHA job. Whereas yours runs all SDK checks together in a single job. I'm wondering if I could nuke the parallel one, and just apply the same pattern as you in both cases

#

At the moment a SDK check is 1) sdk lint 2) sdk test 3) test-publish

I would have assumed it was too big to group all 3 for all sdks in a single gha job. But maybe not?

#

(assumption based on the pre-existing parallel workflow)

#

if it all fits in a single GHA job, I could nuke some complexity in the module

#

will check prod traces to get a sense for the size of these jobs

#

🤔

#

OK .... if I'm not mistaken, your combined "all in one" job is actually WAY FASTER... don't understand how that's possible

#

I guess it's because of the new runners

#

?

#

Based on these numbers it seems very reasonable to nuke the parallel jobs and just run all-in-one SDK checks in a single job, in both old and new runners

#CI improvements