#maintainers


spark cedar
tepid nova
still garnet
spark cedar
still garnet
#

that was the "more to bikeshed" i mentioned 😛

spark cedar
#

but yeah, uh

still garnet
spark cedar
#

i guess if they're part of the same thing it implies that they're ordered

#

jinx lol

#

uh

#

on the trickiness level, it would be the result of merging FilterFS and gitignoreFS into one struct

#

dunno, is undoing gitignores later on in a pattern an actual "legitimate" use case? like, i can't understand why that's useful
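
A minimal sketch of why ordering matters once include/exclude filters and gitignore rules live in one list. It assumes a simplified "last matching pattern wins" rule with a leading `!` for re-inclusion, and uses Go's `path.Match` rather than git's wildmatch, so it illustrates the idea only. The `ignored` helper is hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// ignored reports whether name is excluded by an ordered pattern list.
// Patterns are evaluated top to bottom and the last matching pattern wins;
// a leading "!" re-includes a name that an earlier pattern ignored.
// Simplified illustration: this is not git's wildmatch.
func ignored(patterns []string, name string) bool {
	result := false
	for _, p := range patterns {
		negate := strings.HasPrefix(p, "!")
		p = strings.TrimPrefix(p, "!")
		// path.Match only errors on malformed patterns; those are skipped.
		if ok, err := path.Match(p, name); err == nil && ok {
			result = !negate
		}
	}
	return result
}

func main() {
	ignoreThenKeep := []string{"*.log", "!keep.log"}
	keepThenIgnore := []string{"!keep.log", "*.log"}

	fmt.Println(ignored(ignoreThenKeep, "keep.log"))  // false: re-included later
	fmt.Println(ignored(keepThenIgnore, "keep.log"))  // true: later *.log wins
	fmt.Println(ignored(ignoreThenKeep, "debug.log")) // true
}
```

The same two patterns give opposite answers depending on order, which is the property a merged struct would have to preserve.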

tepid nova
#

Yeah team +gitignore here

#

(I replied in the issue)

spark cedar
#

(some modules already have this problem, which grrr)

#

potentially we should take the opportunity to do codegen for the supported pragmas

#

e.g. for creating the helpers in python/ts

still garnet
still garnet
#

right right

spark cedar
#

tl;dr the main difference is that git patterns support a trailing / to only match directories. docker ignore patterns don't support anything like this, and you can't really emulate it. see https://github.com/dagger/dagger/pull/10870
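
To make the trailing-slash difference concrete, here is a small sketch. It assumes a single path component and uses `path.Match` as a stand-in matcher; `matchEntry` is a hypothetical helper, not code from the PR.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchEntry applies one gitignore-style pattern to a single path component.
// A trailing "/" restricts the pattern to directories, so the caller has to
// know whether the entry is a directory. A matcher that only sees the path
// string has no way to express that.
func matchEntry(pattern, name string, isDir bool) bool {
	if strings.HasSuffix(pattern, "/") {
		if !isDir {
			return false
		}
		pattern = strings.TrimSuffix(pattern, "/")
	}
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

func main() {
	fmt.Println(matchEntry("build/", "build", true))  // true: directory named build
	fmt.Println(matchEntry("build/", "build", false)) // false: regular file named build
	fmt.Println(matchEntry("build", "build", false))  // true: no trailing slash, matches both
}
```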

#

it feels like you should be able to right 😛

still garnet
#

yeah 😭

#

hot take: what if we ONLY did gitignore-style? more people know that than buildkit

tepid nova
#

My assumption from the very beginning has been that we will not get 100% compatibility between buildkit-style include/exclude, and git ignores. And should be using git's actual implementation to determine what is or isn't ignored

spark cedar
#

but bleurgh, implementing git's pattern matching in go is not incredibly trivial (and i couldn't find anyone who'd done it - although maybe for this variant?)

still garnet
#

or shell out to git plumbing somehow? 😄

spark cedar
#

if we decided that route, i would probably either:

  • painstakingly copy git's wildmatch logic into go, as close a translation as possible
  • bite the bullet and just use CGO
spark cedar
#

(one thing to note is that the git docs they reference are actually not what git does)

#

e.g. character classes are not mentioned! but they are supported 😛

#

sorry, welcome to a special and unique hell that i spent ages in

still garnet
#

perhaps we can elect to not care about character classes notsureif

tepid nova
#

There is a git plumbing CLI tool
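
For reference, a rough sketch of what shelling out could look like, assuming the tool meant here is `git check-ignore` (the message does not name it). Its documented exit codes are 0 when a path is ignored, 1 when none are, and 128 on a fatal error. `interpretExit` and `isIgnored` are hypothetical helper names.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// interpretExit maps git check-ignore's documented exit codes to a result:
// 0 = the path is ignored, 1 = the path is not ignored, anything else
// (128 in practice) = fatal error.
func interpretExit(code int) (bool, error) {
	switch code {
	case 0:
		return true, nil
	case 1:
		return false, nil
	default:
		return false, fmt.Errorf("git check-ignore failed with exit code %d", code)
	}
}

// isIgnored asks git whether a single path inside repoDir is ignored.
// -q suppresses output and is only valid with one pathname.
func isIgnored(repoDir, p string) (bool, error) {
	cmd := exec.Command("git", "-C", repoDir, "check-ignore", "-q", "--", p)
	err := cmd.Run()
	if err == nil {
		return interpretExit(0)
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return interpretExit(exitErr.ExitCode())
	}
	return false, err // e.g. git is not installed
}

func main() {
	ignored, err := isIgnored(".", "node_modules")
	fmt.Println(ignored, err)
}
```

One process per path would be slow for large trees; `git check-ignore --stdin` exists for batching, but this sketch does not cover it.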

#

Does anyone know why there is an explicit finding module configuration step hardcoded in the CLI when loading a module, before actually loading the module?

tepid nova
#

@still garnet I'm trying to wrap up CI refactor part 2, to prep for toolchains & checks... But later today, could I pick your brain live? I have a creeping fear that checks is using errors wrong... So far I haven't gotten a single useful error check message (because I use the error message, and that is also useless on its own - ie. the error might as well say "read the logs")

still garnet
tepid nova
still garnet
tepid nova
still garnet
#

sounds fun

tepid nova
#

Code is getting simpler, but somehow things are slower? 😭

#

It gets even stranger when I zoom into one particular linting task

why is the sum 43s????

#

Ah I have to sync... Can't have a function be lazy and with nice looking custom spans

#

somehow getting slower and slower

#

it all traces back to module loading... being very slow.... incredibly slow....

#

seriously

#

Is it possible that PARC machines are slower today @wild zephyr @astral zealot ?

#

20 seconds to remove a mountpoint (just randomly sampled what was running)

#

still going

still garnet
wild zephyr
tepid nova
#

Actually, looking at traces from our current CI... it's just as slow. So this is "normal"

still garnet
astral zealot
civic yacht
tepid nova
#

And now for some variety... Has anyone seen this crash in the typescript codegen?

2025/10/29 21:27:20 INFO generating SDK library language=typescript
panic: template: pattern matches no files: `src/api.ts.gtpl`

goroutine 1 [running]:
text/template.Must(...)
    /usr/lib/go/src/text/template/helper.go:26
github.com/dagger/dagger/cmd/codegen/generator/typescript/templates.New({0xc000673c80, 0x7}, {{0x7ffcd05ddca4, 0xa}, {0x7ffcd05ddcb2, 0x19}, {0x0, 0x0}, {0x0, 0x0}, ...})
    /app/cmd/codegen/generator/typescript/templates/templates.go:30 +0x55a
github.com/dagger/dagger/cmd/codegen/generator/typescript.generate({{0x7ffcd05ddca4, 0xa}, {0x7ffcd05ddcb2, 0x19}, {0x0, 0x0}, {0x0, 0x0}, 0xc0002e92f0, 0x0, ...}, ...)
    /app/cmd/codegen/generator/typescript/generator.go:65 +0x18e
github.com/dagger/dagger/cmd/codegen/generator/typescript.(*TypeScriptGenerator).GenerateLibrary(0xc000275950?, {0x4bb2a4?, 0x0?}, 0xc000275968?, {0xc000673c80?, 0xc000010f30?})
    /app/cmd/codegen/generator/typescript/generator.go:35 +0x8a
main.Generate({0x1265b98, 0xc0002e8480}, {{0x7ffcd05ddca4, 0xa}, {0x7ffcd05ddcb2, 0x19}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...)
    /app/cmd/codegen/codegen.go:62 +0x1ca
main.GenerateLibrary(0xc000247000?, {0x101da6f?, 0x4?, 0x101d967?})
    /app/cmd/codegen/generate_library.go:39 +0x319
github.com/spf13/cobra.(*Command).execute(0x19e7940, {0xc0000cb9c0, 0x4, 0x4})
    /go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1015 +0xb02
github.com/spf13/cobra.(*Command).ExecuteC(0x19e84c0)
    /go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1148 +0x465
github.com/spf13/cobra.(*Command).Execute(...)
    /go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1071
main.main()
    /app/cmd/codegen/main.go:33 +0x1a
still garnet
tepid nova
#

Yes, going through my PR to figure out how/why.. probably a dumb typo

#

It looks like the codegen binary itself is wiping the source checkout?

#

ok nevermind. it was my dang module (which builds cmd/codegen) that had too conservative ignore filters. Which resulted in missing files from a go-embed (not caught at build) which in turn caused runtime error

coral vector
tepid nova
#

@still garnet maybe too late for you? But I'm ready to give it a try

#

(I think we lost Kyle)

still garnet
obsidian rover
tepid nova
still garnet
tepid nova
#

@still garnet @tidal spire hear me out: inline dang middleware instead of json arg overrides

tidal spire
#

It sounds interesting... What could it look like? Would it be unnecessarily confusing for someone unfamiliar with the API that just wants to set some argument values?

tepid nova
fair ermine
#

@civic yacht @still garnet @rocky plume @spark cedar Hey dagger internal experts 😄

Does this error ring a bell for any of you?

Error: make request: input: packageDetection.containerEcho failed to loa            
ailed to load runtime: select: failed to load runtime: failed to call sd            
ime: select: load host.directory(path: "/work", include: ["./dagger.json            
 gitignore: true).directory(path: "src"): Directory!: load: load base: f            
 content hash: failed to get directory: failed to get snapshot: failed t            
failed to receive stat message: rpc error: code = NotFound desc = get fu            
: rpc error: code = NotFound desc = eval symlinks: lstat /work: no such             
ctory

It happens on my TypeScript optimization refactor: https://github.com/dagger/dagger/pull/11309

I don't quite understand why exactly it's happening. It seems that when we lazy-load the runtime, we can't get the modulesource context directory snapshot when multiple calls happen in parallel: https://dagger.cloud/Quartz/traces/79dd59c08a37e61eb9987cf4917a2afd?listen=d0b8fdcaf31b041f&listen=7cabdf90e2c2109e&listen=14877eea57aa17e7&listen=ffe1200d6fa02c8e&listen=0440542341b8a3f1&listen=0ca779a763fb52ca&listen=0f5713370ceace87&listen=52a9060445ed383b&listen=c150f77cf0751ead&listen=ff45543849790c7b

I'm quite confused by that error and I'm not sure how to debug it. Do you guys have ideas?

My guess is that the call to host.Directory in that lazy context isn't quite right but it's very weird, like some traces are missing when loading the runtime (probably because they are cached?)

spark cedar
#

this introduces a new type Watcher, and plumbs it into just a manual call. there is currently no service integration.

#

however, all the host<->engine refreshing is implemented - it shouldn't be too hard to implement a WithMountedWatcher or something for use in Services

#

this should be possible for someone else to pick up when i'm gone!

fair ermine
#

@civic yacht @leaden glade @obsidian rover

Hey guys!

My TS optimization PR is pretty much ready, I'm gonna publish some benchmarks tomorrow morning 😄

I'll follow up later with optimization on the client generation so I can fully cleanup the TS SDK module and remove a lot of files 🙂

Link: https://github.com/dagger/dagger/pull/11309

GitHub

This PR contains a lot of improvements for the TypeScript SDK module support (ModuleRuntime / Codegen)
I may then split that into multiple PRs (or not), but this is a global PR so I can easily shar...

civic yacht
#

Was wondering why the finding module configuration step was so slow doing filesyncs and added some more spans. Turns out the filesync itself is essentially instantaneous and all the time is spent blocked on buildkit worker cache metadata operations that involve mutexes and writing to boltdb..... the create copy ref, finalize copy ref and release copy ref steps are the bottlenecks by far....

I think that subsystem needs to be the next target for theseus, it just keeps coming up every time I look into pretty much any perf issues cc @charred lotus

tepid nova
#

For you @civic yacht 🙂

civic yacht
tepid nova
#

😭 😭

censored trace of my stupidity

#

yeah dumb bug on my part

still garnet
#

is there an issue for 1password prompting multiple times for multiple secrets? swore someone mentioned it recently. (will create if not, didn't find one)

civic yacht
# civic yacht Was wondering why the `finding module configuration` step was so slow doing file...

Hm yes, tugged on this thread a little bit more (disabled sync in boltdb in every db across buildkit+containerd) and just ran the full TestModule suite on my laptop in 9m43s... https://dagger.cloud/dagger/traces/7bc02c44646b3db013cf5835d3ed9bef?listen=b6ee1b64784edaf5&listen=8748a72037aa0dea#b6ee1b64784edaf5

Before this that would take well over 30m locally mindblown

This definitely seems worth pursuing 😂

civic yacht
#

Dagger Cloud

still garnet
tepid nova
civic yacht
#

There have been a lot of jobs getting stuck at loadPackages in the go sdk today too, which also touches the network, which made me wonder if there's something going wrong there (also highly possible that's something else entirely though)

tepid nova
#

I wonder if it's related to the pull-through cache injected by Namespace?

Nope, I reproduced on my local machine (with and without PARC)

#

That's the only common denominator I can think of between 1) a pull by dagger on docker hub failing with a DNS resolve error, and 2) a pull by the docker CLI on our own registry, failing with a 500 http code

tepid nova
#

One of the issues seems related specifically to the php unit tests, but only when running dagger-in-dagger??

#

I ruled out PARC machines.. It happens on my local machine also.

tawny iris
#

flaky php unit tests

glossy stratus
obsidian rover
#

I am trying to ensure that the SDK runtime always gets rebuilt whenever I call a function (never cached in this debug mode), controlled by a hidden debug field inside the dagger.json.

I can't seem to find where to bust this very specific cache (inside modulesource)... Does anyone have hints 🙏

At the moment, when I call a function and then call it again, the runtime is always cached. I'm running out of ideas.

This comment here makes me believe that there's a caching layer involved that I don't understand 🤣

civic yacht
#

I am trying to ensure that the SDK

tepid nova
#

Any objections to disabling local checkout in our CI? Ie. go full dagger -m $GITHUB_THIS/$GITHUB_THAT

charred lotus
tepid nova
obsidian rover
#

Stupid question: I can see a "runtime"

fair ermine
charred lotus
#

Im missing the step where the string "moduleRuntime" is matched in core/schema

wild zephyr
#

seems like dagger develop and dagger functions don't check anymore that the code in the module actually compiles. Is this part of the optimization we recently shipped with the typedef SDK split @leaden glade ?

#

mostly checking if we should do something else here given that I was a bit surprised to see that after upgrading a module to a newer version of Dagger which had a breaking change (removal of container.Build), I ran dagger functions and everything seemed to be working fine. It wasn't until I ran dagger call that I was presented with the actual compilation error

leaden glade
# wild zephyr seems like `dagger develop` and `dagger functions` don't check anymore that the...

Yes, it's due to the moduleTypes SDK split and, more than that, to the preparation for self calls.
With self calls, the module code can contain calls to the module itself through the generated code. But this generated code depends on the module code.
We need to know the types and functions exposed by the module to generate the code that the module will use.
This means the body of a function using self calls can never build before the engine receives the types and the SDK uses them to generate the corresponding code.

As a result, moduleTypes (at least the implementations we have) only cares about the definitions/signatures of the exposed types and functions. Usually it still requires the code to be syntactically valid.

One positive aspect is performance: it no longer requires a build, and it doesn't even require the dependencies to be fetched at that point.
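
A minimal sketch of the "signatures only, but still syntactically valid" idea, using the standard library's `go/parser`. This is an illustration under assumptions, not how the Go SDK's moduleTypes step is actually implemented; `exportedFuncs` and the `MyMod` sample module are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// exportedFuncs parses src and returns the names of exported top-level
// functions and methods. Only syntax is checked, so function bodies may
// reference generated code that does not exist yet.
func exportedFuncs(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err // a syntax error still fails
	}
	var names []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok && fn.Name.IsExported() {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

// sample does not compile: dag is undefined until code generation runs.
const sample = `package mymod

type MyMod struct{}

func (m *MyMod) Build() string { return dag.MyMod().Helper() }

func (m *MyMod) helper() string { return "x" }
`

func main() {
	names, err := exportedFuncs(sample)
	fmt.Println(names, err) // [Build] <nil>
}
```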

#

Regarding dagger functions, I think that makes sense, since what we want there is the list of exposed functions.

#

But maybe dagger develop or dagger update should go one step further and ensure the module can build. I don't know, but we can discuss/try that.

tidal spire
#

I do think we need a built-in way to make sure a module (and its dependencies) compile. It doesn't necessarily need to be a side effect of some other task, like many of us were using dagger functions for

leaden glade
#

Maybe only for dagger functions, that way we keep dagger develop less strict as it can help while working with self calls (and working with code that doesn't build yet)

#

I can propose something in that way and open a PR if you think that's better we go back to this behaviour for functions

tidal spire
#

I like dagger functions being faster, so I'd be in favor of some new command

leaden glade
#

Maybe something like a dagger verify? (or a name more scoped to module) A command that will ensure dependencies can be fetched, that will build the module to ensure it works, etc. A bit like a call but a call of nothing. No files will be exported on the host (it's the goal of develop) and no function list will be printed (it's the goal of functions)

One main difference with functions is functions can be used by users of the module where develop/verify (or any other name) is really for the developer of the module.

tepid nova
#

We discussed this briefly with @fair ermine when he was preparing to merge lazy runtime loading. But I didn't realize the behavior was already present with the SDK interface split.

#

I worry about adding a random command that you just have to memorize. We already have dagger develop that is already like that... We could add it to develop but then it becomes even more of an "everything command"

civic yacht
#

For the go sdk, it might be possible to catch compilation errors during the typedefs step because the package we use to parse out the schema is also capable of reporting compilation errors (it's the same package go linters use). Not 100% sure, but may be worth a check if not done already
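
A rough sketch of what "reporting compilation errors during the typedefs step" could look like. The standard library's `go/types` stands in here for whatever package the SDK actually uses, and `typeErrors` is a hypothetical helper. Real module code has imports and would need an importer configured, which this sketch skips.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// typeErrors parses and type-checks a single import-free source file and
// returns every type error found instead of stopping at the first one.
func typeErrors(src string) ([]error, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", src, 0)
	if err != nil {
		return nil, err
	}
	var errs []error
	conf := types.Config{
		// Collect errors rather than aborting the check.
		Error: func(err error) { errs = append(errs, err) },
	}
	// The error returned by Check repeats the first collected one.
	_, _ = conf.Check("mymod", fset, []*ast.File{file}, nil)
	return errs, nil
}

func main() {
	broken := "package mymod\n\nfunc Build() string { return missing() }\n"
	errs, err := typeErrors(broken)
	fmt.Println(len(errs) > 0, err)
	for _, e := range errs {
		fmt.Println(e)
	}
}
```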

tidal spire
#

wouldn't dagger call --help basically work the same way dagger functions used to?

leaden glade
#

Maybe that could be a flag on the develop command? So not a new command. And it's really like a command to run in development phase, so a develop --verify that ensures the module is valid could be nice

tepid nova
tepid nova
#

HEADS UP dagger call dev is now dagger call playground

tepid nova
#

@still garnet re: pretty-printing checks logs. Should I implement my own custom idtui.Frontend, and hardcode using that in the checks subcommand? Or should I modify / hook into an existing Frontend?

still garnet
#

the latter, preferably

tepid nova
#

Or maybe I don't need a full-blown Frontend, just need to implement a small part of what it does, and call that explicitly?

still garnet
#

yeah, i'd start by seeing if you can just extend frontend_pretty.go

tepid nova
still garnet
#

quick suggestion: add a DB.CollectChecks, very similar to CollectErrors, and use it in a very similar way, with whatever UI tweaks are appropriate

#

can hop on in a sec too

tepid nova
#

Ah I see, so keep calling the exact same UI code, but have it behave differently if checks data are available (indirectly detecting that dagger checks has been called)

still garnet
#

yeah exactly

tepid nova
#

But how do I detect that a check has been called?

#

Look for the actual function call CheckGroup.run() or whatever?

still garnet
#

default answer would be span attributes

still garnet
#

you can look for Call information too yeah (it's available on each span in the DB), but it might be easier if there's something on the spans themselves. where are they created at the moment?

#

i'd just add dagger.io/ui.check=true or something
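
A sketch of the attribute-based approach, assuming spans carry string attributes. `Span` and `collectChecks` are hypothetical stand-ins (the real dagui types differ), and `dagger.io/ui.check` is only the name floated above, not a confirmed attribute.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for a collected span.
type Span struct {
	Name       string
	Attributes map[string]string
}

// checkAttr is the attribute name suggested in the discussion (assumption).
const checkAttr = "dagger.io/ui.check"

// collectChecks returns the spans marked as checks, in their original order.
func collectChecks(spans []Span) []Span {
	var out []Span
	for _, s := range spans {
		// Reading from a nil map is safe and yields "".
		if s.Attributes[checkAttr] == "true" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	spans := []Span{
		{Name: "load module"},
		{Name: "lint", Attributes: map[string]string{checkAttr: "true"}},
		{Name: "test", Attributes: map[string]string{checkAttr: "true"}},
	}
	for _, s := range collectChecks(spans) {
		fmt.Println(s.Name)
	}
}
```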

tepid nova
#

On the frontend though: it still feels very indirect to modify the standard frontend code, instead of just calling a special render function from the checks command

still garnet
tepid nova
#

especially since I don't want to change the in-call TUI. Only what's printed after

still garnet
tepid nova
#

I may not even need Frontend at all, since really I only need to query the events DB, no TUI-related code at all.

#

I guess I need that DB

still garnet
#

well, you just need to make sure you're not printing straight to stdout/stderr

#

maybe just printing to cmd.Out() is enough?

#

that'll write to the "primary span" and it's the output that'll be printed on exit

tepid nova
#

It's the actual "useful" output of the command, eg. I would expect to be able to | grep etc

still garnet
#

yeah, while the frontend is running if something else prints to os.Stdout/os.Stderr it'll just be garbled

still garnet
tepid nova
#

Ah I see, no matter how I integrate, I can't escape the Frontend, so need to play nice with it

still garnet
#

yeah, it's downstream of having a TUI at all, really

tepid nova
#

@still garnet does each implementation of Frontend handle its own DB? Looks like I can't just get the DB from only the interface

#

Also what does DB.RowsView() do exactly?

#

Also another issue - I don't want dagger --format=.. to affect the output of dagger checks

still garnet
tepid nova
#

Maybe I should bypass the dagui stuff altogether, and register my own otel collector?

still garnet
# tepid nova Also what does `DB.RowsView()` do exactly?

sorry the names are really terrible - it just constructs an intermediary phase where you have all the data to display from whatever the zoomed scope is. the tree, basically, with convenience for looking up sub-trees by span ID

still garnet
tepid nova
tepid nova
#

That's really the only part I need - a way to query spans for the current session. I don't need any of the actual UI stuff

#

(since I'm just going to print a bunch of logs anyway)

still garnet
#

yeah, that's essentially what the current dagui.DB is for

#

a higher level representation of what we saw from otel

tepid nova
#

OK, so I guess I just need a way to access that DB without tight coupling to how it's live-rendered to the screen

tepid nova
#

claude is suggesting the getter approach it seems 🙂

still garnet
#

the only real coupling between the DB and live-rendering is the DB assumes locking is handled externally (for perf reasons)

#

so MAYBE there should be a Frontend.WithDB(func(db *dagui.DB) { ... }) if that becomes an issue

#

side note - dagui.DB is also the code that's shared between the web UI and the TUI

still garnet
#

yeah, could be something to keep in mind when deciding boundaries/interfaces

tepid nova
#

@still garnet does this look like a good starter prompt?

Subject: Prototype Request - Custom Log Display for dagger checks

Could you prototype adding custom log display to the dagger checks command? Here's the context and approach:

Problem: We want dagger checks to show the logs from each check execution after completion, in addition to the current table output. This should happen after the regular TUI finishes, so it doesn't interfere with --progress flags.

Current Architecture: The frontend (like prettyFrontend) acts as an OpenTelemetry exporter and accumulates all telemetry data in an internal fe.db field during execution. This database contains all spans, logs, and execution traces, but it's currently trapped inside the frontend implementations.

Proposed Solution:

  1. Extend the Frontend interface in dagql/idtui/frontend.go:

    type Frontend interface {
        // ... existing methods ...
        GetDB() *dagui.DB  // Add this
    }
    
  2. Implement in frontendPretty (dagql/idtui/frontend_pretty.go):

    func (fe *frontendPretty) GetDB() *dagui.DB {
        return fe.db
    }
    
  3. Create a standalone LogRenderer utility that can query the database and render logs for specific spans (like checks).

  4. Modify runChecks() in cmd/dagger/checks.go: After the normal table output, call Frontend.GetDB() and use the LogRenderer to display logs from check spans.

Flow:

dagger checks → withEngine() → Frontend.Run() → [normal TUI] →
Frontend.GetDB() → Custom log rendering

This gives us clean separation, doesn't break existing behavior, and provides full access to the telemetry data for custom formatting.

The key files to look at:

  • cmd/dagger/checks.go - current implementation
  • dagql/idtui/frontend.go - interface definition
  • dagql/idtui/frontend_pretty.go - main frontend with the database
  • dagql/dagui/db.go - database structure and query methods
#

(I'll have to implement GetDB() in the other frontends also)

still garnet
tepid nova
still garnet
#

the DB doesn't do any locking on its own to handle concurrent reads/writes, so there could be a race condition on accessing internal maps etc

#

so you might want something that takes a callback, locks, calls fn with db, unlocks

tepid nova
#

Ah I see. not give unlimited access to the DB

still garnet
#

yeah, or at least make it harder

tepid nova
#

If it helps, I only need access to the db for post-execution render. Maybe that helps? I can pass a callback, but the implementation is simpler because that callback doesn't need to be called concurrently with live-rendering

#

Frontend.withPostRun(hook func(*DB)) ?

Or while we're at it:

Frontend.withPostRun(hook func(*DB, io.Writer)) ?

still garnet
#

hmmmm there's still a chance, i think, with how otel is wired up, but feel free to punt until -race reveals something

tepid nova
#

I'll still need to implement it in all frontends, but I'm guessing pretty is by far the most complex?

still garnet
#

yeah

#

they'll probably all look pretty similar tbh, i think they all have locking

tepid nova
#

(going through my list of questions so I can give this a shot autonomously before you logout 🙂

#
  • I think I can handle setting attributes somewhere in checks (famous last words)
  • Not sure how to search for attributes in my post-run hook
#

I see a dagui.FindResource() that seems to be relevant?

#

Is it crazy that I want to use an ID from an object I built, and use that ID to get all collected spans emitted by that ID? 🙂

#
checks := mod.Checks()
checkSpans := db.SpansForID(checks.ID())
still garnet
#

if you're wondering why SpanSnapshot exists - that's the subset of data that we're able to send over-the-wire from the web UI backend->frontend. in the web UI there's actually a dagui.DB on the backend loading everything from Clickhouse, and then sending only snapshots to a dagui.DB in the frontend, to keep required data transfer low

tepid nova
#

Reading through the general telemetry flow end-to-end, to get general awareness... wow the complexity of the otel stack.

still garnet
#

yeah, otel is pretty dense

still garnet
#

could be stockholm, but i can't say it's entirely redundant either

#

there's just a lot of things to consider 😬 (like batching rates)

obsidian rover
#

On a branch that adds a Debug field to the SdkConfig type, I am having a dagger call go lint linter issue that I'm not sure how to properly fix.

It seems that this linter takes the engine version in the dagger.json (at python/sdk/runtime) instead of the local version to generate the clients. So, when it regenerates the client bindings, it doesn't find my Debug() client method (it's normal, it's part of my PR)

It feels like a regression. I remember being able to extend the schema and not having such edge case (but maybe it's a new check that we added with this generic go lint function) thinkies

tepid nova
#

I'm trying to understand the basic "connector" between Frontend and the telemetry stream.

  • withEngine wraps everything inside Frontend.Run()
  • then within that it calls Frontend.SetClient()

--> 🤔

#

Frontend.withPostRun(hook func(*DB)) ?

tepid nova
#

Getting weird filesync error on very vanilla contextual dir upload

civic yacht
#

#team message, you can add this to workaround for now:

diff --git a/cmd/dagger/.dagger/main.go b/cmd/dagger/.dagger/main.go
index 9f68298c2..617f340e5 100644
--- a/cmd/dagger/.dagger/main.go
+++ b/cmd/dagger/.dagger/main.go
@@ -6,6 +6,7 @@ import (
        "github.com/dagger/dagger/cmd/dagger/.dagger/internal/dagger"
 )
 
+// +cache="session"
 func New(
        ctx context.Context,
 

Working on the fix here, hopefully can get merged + release tomorrow

civic yacht
tepid nova
#

Oh nice!

#

Thanks

tepid nova
#

Should we have an AGENTS.md / CLAUDE.md?

obsidian rover
tepid nova
#

(look at the diff)

obsidian rover
#

I am surprised by the TUI's time spent logic

#

the TestModule's time seems to rotate and is > its parent's time 😇

spark cedar
#

👀 something don't look right in the most recent release notes there @civic yacht

rocky plume
#

It's been a slog, but I got directory.File caching working under https://github.com/dagger/dagger/pull/11329
The PR itself is still really rough, and is littered with TODOs and printfs, but the (majority of) tests are passing -- and in one case TestTelemetry was updated for a case where a File op is now cached.

I still need to

  1. clean it up, and deal with linting issues,
  2. maybe even refactor it to use a file-equivalent version of maintainContentHashing,
  3. double check I haven't slowed things down (since there's more content hashing that's occurring now) -- so far it looks like some of the tests have been taking longer; however, I've seen a lot of network-related failures this afternoon.
  4. remove some hacks related to forcing host directories to copy
  5. figure out how to return container mounts as deps which then get passed to core.NewDirectoryDagOp

Either way, I wanted to share a brief update before I take off.

GitHub

working on a fix for #10992 -- this is a very rough draft so far.

tepid nova
#

Question, how come so much of the frontend code is under dagql/? Superficially it seems like 2 unrelated layers?

still garnet
#

i wish we had a pkg/ repo layout or something, part of the hesitation is just not wanting to put it in another toplevel dir

tepid nova
#

Or does util/ imply slightly more coupling to our repo?

#

Like "technically you could import this, but probably shouldn't"

still garnet
#

we're in true bikeshedding territory now, but for me util/ implies a grab-bag of packages that could live in a separate repo but we can't be bothered to maintain each of them, vs. pkg/ which to me is like "ALL of the core code lives here"

#

the role of pkg mainly being to distinguish from other not-even-Go-code directories like docs

still garnet
#

i'd say it's a superset

still garnet
#

imo pkg/ also implies "don't import from here, well you can if you want, but no guarantees, if I wanted these things to be for external use you'd be importing from a separate smaller repo"

so it's like a util/ that you just also put all your main code into

#

that's just what i've landed on for monorepos / application repos, plus cmd/ at the root level too, just to distinguish package main from importable packages

still garnet
#

does trivy not have a 'human readable output' mode? 39k lines of JSON seems a bit much

tidal spire
#

is that from the flag that says show us everything, not just the important stuff?

still garnet
#

i guess we are saying --format=json --show-suppressed. y tho

tepid nova
civic yacht
#

I would just remove it

#

What's the worst that can happen

#

(as in the flag, not the vuln scanner)

tepid nova
#

Update: cleaned up the checks PR, squashed

#

There are remaining failed CI checks, where I could use some help

#

Also the weird otel attribute errors are still there, but that's not blocking for merge (adding to TODO, forgot earlier)

still garnet
#

@tepid nova possible rationalization for pragmas: you might want something to be both a check and an artifact - with pragmas you can apply both, with return types you'd have to choose one

tepid nova
#

(somehow that was in Lint() but let's ignore that 😛 )

#

btw @still garnet I'm doing monolith cleanup, an interesting pattern is emerging for generated files, will show you when the PR is up.

TLDR the module changes its Workspace field in place, and can give you changes on demand. I have this pattern in 2 places now:

  • PHP SDK (weird requirement to chain client generation then docs generation from the generated client)
  • Go toolchain (generate dagger runtimes, then lint from that)
#

Kind of a "virtual context"

#

Also: need to carry a bunch of paths relative to the workspace. Can't just carry Source *dagger.Directory around - instead SourcePath string which is eg. sdk/php

still garnet
#

@civic yacht [less of a nerd snipe and more fishing for a quick 'yep', can look further myself later] - is this making the mistake of calling a method (Directory.Without) that normally presumes it's called from within a DagOp? https://github.com/dagger/dagger/blob/0c0028796eb5252e373196dfa4964f07295ab546/core/directory.go#L1338

I added rm/mv tools to Doug but after it calls them any further operations on workspace fail with this error:

select: failed to compute cache key for Query.doug: load contextual arg "source": load contextual directory "/": select: failed to load contextual directory: failed to select env directory: select: failed to remove paths: unlinkat /var/lib/dagger/worker/cachemounts
tepid nova
#

As the complexity unravels, it's becoming more and more tempting to port big chunks of it to dang 😛

obsidian rover
#

Does anyone have ideas on how to systematically approach isolating the root cause of our ci_in_ci flake (which is quite consistent)? https://dagger.cloud/dagger/traces/235542b1ab326872993b6595a2637bc5

I am inside the container of one of the Python failures, and whenever I try to introspect the schema or talk to the engine I get:

dagger.ClientConnectionError: Failed to establish client connection to the Dagger session: Failed to build schema from introspection query: get or init client: client "m18nhcf20secz91bs24z3u6xr" already exists with different secret token

Running out of ideas on how to isolate the bug 😢

civic yacht
civic yacht
obsidian rover
#

Yeah it's above my level atm I think

civic yacht
tepid nova
#

That _contextDirectory span looks new?

civic yacht
tepid nova
#

Generally it's sometimes hard to mentally map a filesync to the line of code causing it

#

But maybe it's a me problem

#

Are self-calls available as an opt-in now? And are they still slower, or did all our recent speed improvements actually make them affordable?

civic yacht
# tepid nova But maybe it's a me problem

No the spans show up in confusing places. Actually now that you mention it, I think it's because we have to load context dirs to compute the cache key, and the cache key computation happens outside of the actual call

tepid nova
#

It's happening again

#

trying without pARC

#

granted I've been messing with ignore filters in this branch... But this doesn't look like a misconfigured filter

still garnet
#

"written bytes: 0" seems sus (also really cool that we have a metric for that, if it is indeed a sign)

civic yacht
# tepid nova Not sure if it's engine or infra/parc... https://dagger.cloud/dagger/traces/662...

ooo I used my new power to connect trace->namespace instance and tracked that down, engine logs show:

2025-11-06 23:15:58.640 dagger panic: runtime error: invalid memory address or nil pointer dereference
2025-11-06 23:15:58.640 dagger [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x19454e2]
2025-11-06 23:15:58.640 dagger
2025-11-06 23:15:58.640 dagger goroutine 17436986 [running]:
2025-11-06 23:15:58.640 dagger github.com/dagger/dagger/engine/filesync.(*localFS).Sync.func5.1()
2025-11-06 23:15:58.640 dagger     /app/engine/filesync/localfs.go:331 +0x762
2025-11-06 23:15:58.640 dagger golang.org/x/sync/errgroup.(*Group).Go.func1()
2025-11-06 23:15:58.640 dagger     /go/pkg/mod/golang.org/x/sync@v0.17.0/errgroup/errgroup.go:93 +0x50
2025-11-06 23:15:58.641 dagger created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 17434501
2025-11-06 23:15:58.641 dagger     /go/pkg/mod/golang.org/x/sync@v0.17.0/errgroup/errgroup.go:78 +0x95

https://github.com/sipsma/dagger/blob/0c0028796eb5252e373196dfa4964f07295ab546/engine/filesync/localfs.go#L331-L331

panic from the new telemetry added for filesync stuff recently

#

cc @wild zephyr

#

I think it was supposed to be werr instead of err
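
A minimal sketch of that bug class, not the actual localfs.go code: when two error variables are in scope and the code checks one but dereferences the other, a nil `err` panics with exactly this kind of nil pointer dereference. The function names, the roles of `err` and `werr`, and the sample error are assumptions for illustration.

```go
package main

import "errors"

// errDisk is a made-up write error used for illustration.
var errDisk = errors.New("disk full")

// describeFailure builds a telemetry message. err is the error from an
// earlier step (possibly nil); werr is the error from the write itself.
func describeFailure(err, werr error) string {
	if werr != nil {
		// Correct: dereference the same variable that was checked.
		return "write failed: " + werr.Error()
	}
	if err != nil {
		return "sync failed: " + err.Error()
	}
	return "ok"
}

// buggyDescribe reproduces the mix-up and recovers from the panic so the
// effect can be observed without crashing the process.
func buggyDescribe(err, werr error) (msg string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	if werr != nil {
		// Wrong variable: err may be nil even though werr is not.
		return "write failed: " + err.Error(), false
	}
	return "ok", false
}
```

Calling `buggyDescribe(nil, errDisk)` reports `panicked == true`, while `describeFailure(nil, errDisk)` returns the message normally.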

still garnet
wild zephyr
#

So if the filesync fails for whatever reason, the engine panics. Noice

civic yacht
#

interestingly I just saw in nvim that a linter named nilderef is complaining about that line, might be worth enabling that in our linter config if possible

wild zephyr
tepid nova
#

😛

#

@tidal spire hey you mentioned a toolchain can't have toolchains. Would that be a big change to make? 😇

tepid nova
still garnet
#

@tepid nova here's a separate PR for pragmas, in case you want to just pull it into your PR: https://github.com/shykes/dagger/pull/448 - done for Go, Python, and TypeScript - can look into more tomorrow (wanna see how far automating this can go)

#

tested with dagger check and it seemed to find and be able to run them all 👍 - (plus additional tests for TS/Python)

tepid nova
tidal spire
tepid nova
#

@tidal spire I have a toolchain working in our monolith 😁

leaden glade
#

blob_help Since 0260062 some tests are constantly failing. I thought it was on my PR but in fact it's on main.
I'm not seeing an obvious reason it should fail based on the diff of this commit. So I guess the error is somewhere else.
If anyone has an idea

dagger call test specific --pkg="./core/integration" --run="TestContainer/TestLoadHostContainerd"
tepid nova
#

@tidal spire FYI I am seeing my toolchains pop up as dependencies in the parent module's code. But honestly I'm glad it's there 😛

tidal spire
tepid nova
#

@tidal spire 2 toolchains 🙂 Go SDK & PHP SDK

#

one of them in dang

#

My only papercut so far with toolchains: I want the correct module description in dagger functions and dagger toolchain list 🙂

stuck bloom
#

v0.19.6 engine container has 2 HIGH findings (trivy)

civic yacht
# leaden glade <:blob_help:1436291972515102821> Since [0260062](https://github.com/dagger/dagge...

https://github.com/dagger/dagger/pull/11376 fix, on the bright side I'm fairly sure this was a timing flake in the test that had always been possible, but only became an issue recently because of the perf improvements, which got rid of the multi-second hangs between operations 🤷‍♂️ 🎉

GitHub

Seeing if this fixes the flake in TestContainer in CI. I can't confirm locally because there I get seemingly unrelated errors related to cgroup setup 😵‍💫, possibly some kernel-related and/o...

obsidian rover
tidal spire
tepid nova
#

@wild zephyr thanks for dealing with the releasing issues...

wild zephyr
tepid nova
#

Are there follow-up issues that still need investigation - of the kind a mere mortal can do? 😛

#

On the flakes etc

#

I was going to look into the weird changelog generation issue

wild zephyr
wild zephyr
wild zephyr
#

but for the moment nothing that I can think of which has a high priority

tepid nova
#

Need help getting Checks merged... 🙏 https://github.com/dagger/dagger/pull/11211

I think all the remaining failed checks are either 1) flakes or 2) already in main.

Could someone confirm, and maybe give me a quick review? The code is experimental, so the bar is lower for merging. We just need to make sure it doesn't break stable codepaths.

obsidian rover
#

I keep getting this error on CI (like a looot) on https://github.com/dagger/dagger/pull/11366:

Error response from daemon: No such image: registry.dagger.io/engine:v0.19.6
connect
starting engine
create container
exec docker pull registry.dagger.io/engine:v0.19.6
failed to run command [docker pull registry.dagger.io/engine:v0.19.6]: exit status 1

Error response from daemon: Head "https://registry.dagger.io/v2/engine/manifests/v0.19.6": Get "https://registry.dagger.io/token?scope=repository%3Aengine%3Apull&service=ghcr.io": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

@wild zephyr Could it be a cached instance on namespace that isn't able to query registry.dagger.io/engine:v0.19.6? Seemed to work fine locally

When I retry, does it spin up a new instance with a clean cache?

tepid nova
obsidian rover
#
  1. seems fine (now)
civic yacht
tepid nova
# civic yacht Wouldn't 1 imply that namespace is mitm'ing our registry? Or do they apply some ...

Two things:

  1. It's possible that I'm misremembering and they don't provide a pull-through cache. I might be confusing it with our own. @wild zephyr / @astral zealot will know
  2. Namespace definitely does inject custom docker credentials in our CI runners at the machine level. So if they do inject a mitm proxy, that's how they would make it work. Or maybe they only inject the credentials and not a pull-through cache - still useful as it avoids their IPs getting rate-limited
tepid nova
wild zephyr
#

I can check really quick if you want

civic yacht
#

I think part of the issue is that some of the tests purposely close clients and then run other tests in the same container, which is breaking nested execs I think

#

The Version thing is a little weird but I have an idea on why that might have failed the way it did, checking that out too

#

Either way, that's all stuff which is in deeply obscure territory, and the tests work when they are running as intended, so nothing super broken thankfully

wild zephyr
civic yacht
# wild zephyr I see.. it's still strange that they passed using a nested session when I revert...

Well they would pass sometimes w/out that commit, but not always (hence the frequent flakes we saw before). That commit made them just always fail, but only after the release because of the bug where they weren't hitting the dev engine (they were hitting the "outer" stable engine).

The reason that made them always fail is what I was referring to in

The Version thing is a little weird but I have an idea on why that might have failed the way it did, checking that out too

#

I have a theory I'm testing, but if it's real it's an extremely obscure problem that would only happen with a very tangled web of nested execs and function calls, for which our CI is the only real use case

wild zephyr
civic yacht
obsidian rover
# civic yacht merged the fix, can pick up in rebase

Saw your comment on how you debugged it 🙏 I guess the difference is that I was trying to fix the root cause instead of trying to make the dagger-in-dagger-in-dagger not happen to unlock the pipeline

#

which i gave up ahahahah 🤣

civic yacht
# obsidian rover Saw your comment on how you debugged it ๐Ÿ™ I guess the difference is that I was...

I guess what I do is:

  1. Repro
  2. Poke around to see if there's something plainly obvious right away
  3. If still lost, just go to the beginning of where everything starts (so in this case, here), trace through exactly what is supposed to happen and compare that against what actually happened, each step-by-step
    • In this case, that just meant comparing the code and the trace. But to figure out what actually happened you may need println debugs, etc.
#

That's pretty much a foolproof algorithm I suppose, it's just sometimes finding that exact point where "expected behavior" deviates from "actual behavior" may vary in effort required. e.g. if the deviation comes from the OS or compiler or something like that it may take a while 😄

still garnet
tepid nova
#

Current status: spinning out a 4th SDK toolchain from the monolith (Typescript) cc @fair ermine @tidal spire

tepid nova
#

almost done carving out toolchains/engine-dev. that's the largest remaining piece of the monolith

still garnet
tepid nova
#

@still garnet I just noticed custom spans from util/parallel are no longer visible in TUI or cloud... super weird. No leads from git blame 🤷‍♂️

#

(in 0.19.6)

#

will bisect

still garnet
#

could check &debug

civic yacht
tepid nova
#

@tidal spire toolchains spun out so far:

  • go sdk
  • python sdk
  • php sdk
  • typescript sdk
  • cli dev
  • engine dev
  • helm dev
  • ci
  • almost go (missing default arg)

Monolith is so small it can be ported to dang 🙂

tidal spire
tepid nova
#

ambiguity of contextual dirs will be an issue. but solvable

#

so far monolith did not get faster to load... if anything it's slightly slower. But we haven't harvested all the potential gains yet, in particular we're still paying the price for most SDKs not supporting lazy loading yet. Also the field is wide open to port most of these modules to dang (I ported 2 so far).

I think in another week this thing will fly

tidal spire
#

Interesting I didn't know configs only worked for strings. I thought a test used a file but I must be thinking about something else

#

The dang parts look sweet. Were those ported via claude?

#

asking because I saw the go-sdk one had

pub sourcePath: String! = "sdk/go"

and then

pub name: String! {
  "go"
}
tepid nova
#

yeah go-sdk I used claude. typescript-sdk I did by hand (refactored straight to dang instead of 2-step refactor->dang)

tidal spire
#

nice, that's cool that claude is able to do that fairly easily. Feel like it's easier for claude to write dang than dagger go sdk

tepid nova
#

It wasn't perfect on the first try but it was pretty good

#

I had it write a manual

still garnet
#

is this pretty-printing for core APIs worth it? notsureif (it's all client-side based on call structure)

#

Above is 100% tightly coupled, just frontend detecting certain fields and rendering them specially. Goes back to normal when you bump the verbosity. Maybe it could be applied to everything with a heuristic: <funcName> <reqArg1> <reqArg2> (opt1: ..., opt2: ...)
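
A rough sketch of that heuristic, assuming a simplified argument model: required arguments are printed positionally by value, and optional ones are grouped in parentheses with their names. The `Arg` type and `renderCall` function are hypothetical stand-ins, not the frontend's actual code.

```go
package main

import "strings"

// Arg is a hypothetical stand-in for one argument of an API call.
type Arg struct {
	Name     string
	Value    string
	Required bool
}

// renderCall formats a call as:
//   <funcName> <reqArg1> <reqArg2> (opt1: ..., opt2: ...)
// Required arguments appear by value only; optional arguments keep their
// names and are grouped in trailing parentheses.
func renderCall(funcName string, args []Arg) string {
	parts := []string{funcName}
	var opts []string
	for _, a := range args {
		if a.Required {
			parts = append(parts, a.Value)
		} else {
			opts = append(opts, a.Name+": "+a.Value)
		}
	}
	out := strings.Join(parts, " ")
	if len(opts) > 0 {
		out += " (" + strings.Join(opts, ", ") + ")"
	}
	return out
}
```

With a required `args` value of `go test ./...` and an optional `useEntrypoint` of `true` (illustrative values), `renderCall("withExec", ...)` yields `withExec go test ./... (useEntrypoint: true)`.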

#

TODO: pretty-print addresses, which is a separate kind of beast that wouldn't fit that heuristic

tepid nova
still garnet
#

the bottom part, specifically the withExec ... part, which would normally say .withExec(args: ["..."]). but welcoming feedback on anything

#

just looking for angles to reduce noise, noticed for bracketed logs you opted to semantically inspect the call, so just seeing what else we can buy with that approach

still garnet
#

side note: the check path glob syntax is growing on me. only issue i think is you have to quote it, but you have that for regexes too anyway. (example real use: dagger-dev checks 'test/{lang,tree}*')
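
For readers unfamiliar with the pattern, here is a sketch of how a glob like 'test/{lang,tree}*' could be evaluated: expand the brace alternatives, then match each result with a regular glob. Whether dagger resolves check paths exactly this way is an assumption; the check names used with it are made up, and nested braces are not handled.

```go
package main

import (
	"path"
	"strings"
)

// expandBraces expands {a,b} alternatives into separate patterns.
// Nested braces are not supported; an unbalanced brace is kept literally.
func expandBraces(pattern string) []string {
	open := strings.Index(pattern, "{")
	if open < 0 {
		return []string{pattern}
	}
	rel := strings.Index(pattern[open:], "}")
	if rel < 0 {
		return []string{pattern}
	}
	closeIdx := open + rel
	var out []string
	for _, alt := range strings.Split(pattern[open+1:closeIdx], ",") {
		// Recurse so that later brace groups in the pattern are expanded too.
		out = append(out, expandBraces(pattern[:open]+alt+pattern[closeIdx+1:])...)
	}
	return out
}

// matchCheck reports whether a check path matches the brace-and-glob pattern.
// As with path.Match, "*" does not cross a "/" separator.
func matchCheck(pattern, name string) bool {
	for _, p := range expandBraces(pattern) {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}
```

The quoting mentioned above matters because an unquoted pattern would be expanded by the shell before the CLI ever sees it.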

tepid nova
tepid nova
#

Same as always, my brain struggles with logs & traces woven together in the same view. I think it's because it breaks my understanding of the "time arrow". They're like 2 different visual languages, using the same screen layout to express different things, and the result is garbled (for me at least)

still garnet
tepid nova
#

I know I'm a broken record and it's easier said than done

tepid nova
#

(to be clear I was saying: when a check is expanded, it should either show only its flattened logs, or only its span tree; but not both one after the other like in that screenshot)

still garnet
tepid nova
#

I think the gains of doing this would grow linearly with the number of checks and their complexity

civic yacht
#

@Vasek - Tom C. when you have time,

warm musk
#

dang / php caused me some pain when I cloned on Windows; the famous CRLF / LF problem strikes again. I ran into issues running the dagger functions from the main dagger repo because of this, and it caught me off guard. Not sure how best to resolve it, but .gitattributes on certain files might help. Somehow, with Windows clones, files loaded into dagger aren't dos2unix'ified and then blurhghghghg happens. This happened when I was playing around with the dagger SDK builds/tests while making a PR for the dotnet update on the experimental dagger dotnet module.

Any ideas on a smart way to solve this? .gitattributes on the repo, or CRLF conversion at the dagger level when copying files, or... hmm

I am tempted to clone dagger into WSL and see if I have the same problems, but I have a feeling I wouldn't reproduce the issues I had.
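
As a sketch of the second option only (conversion when copying files), line-ending normalization could be as small as the function below. Where it would hook into dagger, and whether it should apply to every file, are open questions; `normalizeEOL` is a hypothetical helper, and binary files would have to be excluded.

```go
package main

import "bytes"

// normalizeEOL converts CRLF line endings to LF and leaves all other bytes
// untouched. It must only be applied to text files.
func normalizeEOL(data []byte) []byte {
	return bytes.ReplaceAll(data, []byte("\r\n"), []byte("\n"))
}
```

The .gitattributes route avoids touching file contents at load time, which is why it may be the less surprising of the two.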

still garnet
#

@tepid nova quick q: should the top-level 'rolled up logs' spans keep showing their logs if they fail, or collapse when they're done no matter what?

still garnet
#

going with collapse-by-default for now, feels tidier and you get the root cause logs at the end anyhow. in -E you can just toggle it open

tepid nova
#

I'll have to play with it directly to be sure I understand 🙂 But seems reasonable

#

btw my preference is to not get that big ERRORS section at the end at all

#
  • I already know there's an error because the check says "ERROR" at the top.
  • I already have the logs (rendered in the usual way)
  • I already have the tree span with each span already having a clear error indicator

--> Normally, I should have everything I need from the regular render, without needing to append an ERRORS section at the end which duplicates info

#

My initial approach for post-execution print was:

  • Always show the list of all checks, with red/green
  • For failed checks, show all the logs in context
  • I didn't address sub-spans, so not sure how that factors in
  • No separate "errors" section. Just the above
#

In the context of dagger checks, IMO the whole output is the error context. There shouldn't be a difference

still garnet
#

for Dang where I'm dogfooding it I definitely wouldn't want all the rolled-up logs in my scrollback, because that includes noisy codegen stuff too

tepid nova
tepid nova
#

Only show the logs of failed subspans?

#

The part that bothers me about the ERRORS section is repeating the whole structure twice

still garnet
#

wouldn't that be strictly worse / more repetitive than just showing the root cause, considering we have that exact info anyway?

tepid nova
still garnet
#

yea sure

warm musk
still garnet
#

': No such file or directory
wat

still garnet
#

@tepid nova progress! feels like something more should be done to tidy it up, but not sure what exactly notsureif

tepid nova
tepid nova
#

merge ready?

still garnet
#

can get it ready soon, just need to regen telemetry tests once the dust settles

#

will ping when ready

#

oh, need to double check the codegen logs fix too

still garnet
tepid nova
#

@still garnet is there a standard way for the CLI to say "telemetry for this context should be hidden by default"?

still garnet
#

telemetry.Internal()

tepid nova
#

example: dagger checks -l that queries the list of checks; or dagger toolchain list that queries info about each toolchain

still garnet
#

oh i see

civic yacht
#

Dagger Cloud

tepid nova
#

At the moment I use the "tuck away" strategy in checks -l 😛

still garnet
#

that's what i did for dagger checks, and dagger prompt/shell mode

tepid nova
#

Ah, so the reverse of tuck away

still garnet
#

lol yes

tepid nova
#

How do I zoom?

still garnet
#

there's a bit of a dance to it:

ctx, shellSpan := Tracer().Start(ctx, "checks", telemetry.Passthrough())
defer telemetry.End(shellSpan, func() error { return nil })
Frontend.SetPrimary(dagui.SpanID{SpanID: shellSpan.SpanContext().SpanID()})
slog.SetDefault(slog.SpanLogger(ctx, InstrumentationLibrary))

the last line just wires slog up so the user will still see it, since the TUI shows logs for the zoomed span at the bottom. mostly for debugging

#

SetPrimary is the key part. probably overdue for some refactoring now that we want it in so many spots

#

one downside is that if something does fail outside it'll be out of view, i think. may want to test that, can figure out how to address if needed

tepid nova
#

@still garnet in this pattern, I should keep the old context, to make queries that I don't want visible right?

still garnet
#

yep

#

the alternative is to just make a telemetry.Internal() span and tuck everything in there, but that won't clear progress up until that point

tepid nova
#

btw this is another boundary of the "otel-driven UI" topic we keep discussing. In dagger toolchains list I don't actually have any spans to display. Just a few "printfs to tabwriter". But in the future, maybe we want even that display code to be owned by the engine? Via a table UI component perhaps?

still garnet
still garnet
#

i've also thought about making the 'zoom' mechanism able to be driven by attributes, but it feels a little OP

#

(kinda like the old .Focus() API, which i don't even remember how exactly that worked anymore, that was pre-otel i think?)

tepid nova
#

@still garnet if I don't use that inner "display context", does it never get created? eg. is it ok to do _, shellSpan := Tracer().Start(ctx, "checks", telemetry.Passthrough())

still garnet
#

it'll clear the screen as soon as you create the span

still garnet
#

Dagger Cloud

#

@tepid nova ah ha - lazy loading means sometimes we don't actually run codegen until the check function itself runs if it calls another module, so those logs slip through. need to guard against that too (easy fix, but was confusing for a bit)

still garnet
#

@civic yacht ack - i'm rebased on main and just hit SQLITE_BUSY right after a ./hack/dev, maybe something got worse with those tweaks? never saw this before

still garnet
tepid nova
still garnet
#

oh huh, it's getting a noopTracerProvider

#

ack, figured it out. it's kind of annoying

#

it's doing trace.SpanFromContext(ctx).TracerProvider(), which is a pretty common thing to do, but in this case it backfires because we don't have an actual span, it's just a container exec inheriting a span context, so SpanFromContext(ctx) returns a noop span

#

maybe this did an otel.GetTracerProvider() before and it worked?

tepid nova
#

Looks like it was otel.Tracer().Start() before my last CI refactor.

tepid nova
still garnet
# tepid nova @still garnet I tried to revert this 👆 but after some superficial testi...

this fixed it for me:

func (job Job) startSpan(ctx context.Context) (context.Context, trace.Span) {
    attr := job.attributes
    attr = append(attr, attribute.Bool("dagger.io/ui.reveal", true))
    return otel.Tracer("dagger.io/util/parallel").
        Start(ctx, job.Name, trace.WithAttributes(attr...))
}

but, I think that'll break its engine-side usage since it won't be sending to the per-client telemetry DB anymore

#

i'm not using it engine-side for checks anymore, so haven't run into that

tepid nova
#

Ah... So you're saying it either works fine client-side, or it works fine engine-side, but not both?

Just to confirm - the current implementation in main does work engine-side without issues?

still garnet
tepid nova
#

FYI on a fresh checkout of your branch, dagger checks go/lint does not show the custom spans emitted from within the module. Is your fix currently applied in that branch?

#

Repro:

dagger -m github.com/vito/dagger@checks-pragma call \
 playground \
 with-directory --path=. --source=https://github.com/shykes/dagger#ci-faster-load: \
 with-workdir --path=dagger \
 terminal
still garnet
# tepid nova > SpanFromContext is a somewhat common practice What does that do? I'm a bit lo...

in OTel there are two ways to get a TracerProvider (the thingy that lets you create [the thing that lets you create 🙃] spans)

  1. otel.GetTracerProvider() gets the singleton globally configured TracerProvider, if there is one
  2. trace.SpanFromContext(ctx).TracerProvider() gets the TracerProvider that created the current span, if there is one

did it find one? you don't know! the TracerProvider you get back will just silently be a nopProvider sending your telemetry into a black hole, so you can't even try one-and-then-the-other

#

(and otel.Tracer(...) is shorthand for otel.GetTracerProvider().Tracer(...))

#

IMO #2 is always better since it's more local to the code path, in the engine for example this is paramount because we have per-client TracerProviders that write to each client's separate SQLite DB, whereas ostensibly otel.GetTracerProvider() would be more for engine-side system-level traces, like periodic GC or whatever - things that are outside of any individual client
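
The two lookups can be modelled without the OTel SDK. The sketch below is a stdlib-only stand-in, not the OpenTelemetry API: `Provider`, `Span`, `globalProvider`, and `contextProvider` are hypothetical names that mimic option 1 (global singleton) and option 2 (provider behind the current span), including the silent no-op fallback.

```go
package main

import "context"

// Provider is a hypothetical stand-in for a TracerProvider.
type Provider struct {
	Name string
	Noop bool
}

// Span is a stand-in for a live span that remembers its creating provider.
type Span struct{ provider Provider }

type spanKey struct{}

var noop = Provider{Name: "noop", Noop: true}

// global mimics the process-wide provider; it is a no-op until configured.
var global = noop

// globalProvider mimics option 1: the globally configured provider.
func globalProvider() Provider { return global }

// contextProvider mimics option 2: the provider that created the current
// span. With no live span in ctx it silently returns a no-op provider.
func contextProvider(ctx context.Context) Provider {
	if s, ok := ctx.Value(spanKey{}).(*Span); ok {
		return s.provider
	}
	return noop
}

// withSpan stores a live span created by p in the context.
func withSpan(ctx context.Context, p Provider) context.Context {
	return context.WithValue(ctx, spanKey{}, &Span{provider: p})
}

// bareContext models a context that carries no live span, such as one that
// only inherited propagated trace metadata.
func bareContext() context.Context { return context.Background() }
```

In this model, per-client providers are reachable only through `contextProvider`, and a caller cannot tell a configured provider from the fallback without inspecting it, which is the trap described above.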

tepid nova
#

OK I see. So what prevents us from using SpanFromContext() is that connect() does not immediately send a top-level span, so there is no span to retrieve from the context, so we get a nopProvider?

still garnet
#

yep - it specifically won't work inside a module function call if it's right on the span boundary, because there's no outer span created yet. The actual trace.Span live object is on the outside - we propagate it in as traceparent but SpanFromContext won't have access to the trace.Span, you'll only have SpanContextFromContext (the propagation metadata, from traceparent, propagated through env vars)

still garnet
tepid nova
still garnet
#

oh, wait, i think this is because it's using code from your ci-faster-load branch, which has it broken still right?

tepid nova
#

But, normally those tweaks should not change the default behavior

tepid nova
#

Oooh wait it's the module linking against a different version of parallel...

#

the brain... it hurts...

tepid nova
still garnet
#

Yeah, though to its credit we're squeezing every ounce of that complexity to support our complex use case, and it's had a knob for everything we've needed to turn lol

#

</stockholm>

tepid nova
#

So how do we fix it?

#

Separate PR that combines fix to parallel + go sdk?

still garnet
# tepid nova Separate PR that combines fix to `parallel` + go sdk?

lazy fix: don't use it engine-side (current state of my branch) and just swap it back to the global tracer
general fix: have the Go SDK create a passthrough span so trace.SpanFromContext(ctx) returns a proper trace.Span that the user doesn't see, but, theoretically this is a problem for every SDK too

tepid nova
still garnet
#

i don't think so - the problem is local to each language's OTel SDK, so we can't solve it in one place

#

maybe the real fix is to just finish the statuses PR

tepid nova
still garnet
#

yeah but it'd be all in the engine. the SDKs wouldn't need to touch the OTel plumbing anymore

tepid nova
#

OK I will start with a super short term, yet clean-ish, fix: make the behavior configurable in parallel

still garnet
tepid nova
# tepid nova OK I will start with a super short term, yet clean-ish, fix: make the behavior c...

@still garnet how does this feel:

// Default: don't use the "contextual tracer". Works client-side 
jobs := parallel.New() // implicit
jobs := parallel.New().WithContextualTracer(false) // explicit

// Custom: enable the "contextual tracer". Works engine-side
jobs := parallel.New().WithContextualTracer(true)

Once we've done the "deep fix", I can flip to make contextual tracer the default. Nothing should break
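
One way the proposed option could be shaped, as a sketch only: a value-receiver builder that returns a modified copy, so the default stays untouched. `Jobs`, `New`, and `tracerSource` here are hypothetical stand-ins and not the real parallel package.

```go
package main

// Jobs is a hypothetical stand-in for the value returned by parallel.New().
type Jobs struct {
	contextualTracer bool
}

// New returns a job set with the default behavior: the global tracer,
// which is the mode described as working client-side.
func New() Jobs { return Jobs{} }

// WithContextualTracer returns a copy with the tracer source toggled.
// true selects the tracer of the span found in the context (the mode
// described as working engine-side).
func (j Jobs) WithContextualTracer(enabled bool) Jobs {
	j.contextualTracer = enabled
	return j
}

// tracerSource names which lookup a job would use when starting its span.
func (j Jobs) tracerSource() string {
	if j.contextualTracer {
		return "contextual"
	}
	return "global"
}
```

Because the zero value is the default, flipping the default later would mean inverting the field's meaning rather than changing any call sites that set it explicitly.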

still garnet
#

lgtm

tepid nova
#

@still garnet maybe I'm misunderstanding, but wouldn't this whole problem go away if otel had a trace.TracerProviderFromContext(), that worked whether or not the context had emitted a span? It's not like the context doesn't have a tracer provider configured... It's just not passed through for an arbitrary API design reason. Or am I missing something ?

still garnet
still garnet
tepid nova
still garnet
#

i guess we could add a WithTracerProvider but it wouldn't buy us much, since that'd be just as Go SDK -specific as creating a passthrough span

tepid nova
#

(testing my fix now)

#

I think I'm going to port the monolith to dang 😛

#

Oh wait no I can't - parallel

tepid nova
#

Did you have those gifs lined up? 🤣

still garnet
#

lucky searches lol

tepid nova
#

gifs/reactions/whats-his-face/

#

welp it appears my fix did not work

#

but it should...

#

oh wait maybe I didn't wait long enough

#

ah that was it. it's working!

#

@still garnet ugly error alert

still garnet
#

these errors are created by the graphql client the go sdk uses, i think

#

we clean up the error returned by the function, but don't have that ability for errors that are directly stamped onto custom spans

#

we should maybe just change the Go SDK to clean all that up - alternatively, this is another thing that could be handled by the statuses PR

#

or, for now, your parallel package

#

i can take a swing at that on my branch, to see what it could look like

tepid nova
# tepid nova @still garnet ugly error alert

Breaking down my understanding of issues in this first screenshot:

  1. Error in sub-check is super verbose (and in my case I don't need it at all, exit code 1 is not useful info for me)

  2. Logs for all sub-checks are mixed at the top-level (I think), so I don't know what went wrong in this particular sub-check. (each sub-check executes its own golangci-lint process)

  3. Normally golangci-lint is configured to prefix its error messages with the full path of the file. This would make it easier to use the combined logs. But for some reason it's not working here, perhaps a bug in my module
    --> looks like golangci-lint is indeed getting the correct --path-prefix <foo> but choosing to ignore it..

still garnet
#
  1. If you don't need that error, seems solvable at application layer (just return higher level error instead of bubbling up Sync error)
  2. Maybe we should add context to the line prefixes? e.g. [.dagger/golangci-lint]
  3. Looks like it is, but maybe they're just relative, and they all run in separate containers anyway (edit: oh)
tepid nova
#
  1. Maybe we should add context to the line prefixes? e.g. [.dagger/golangci-lint]

(In this case it would be hard to do with a heuristic, it would have to be controllable by the module dev somehow. mmm maybe we just use the name of the nearest subcheck name as the prefix, instead of a heuristic? That way the names in the brackets would match the names in the span tree)

Yesterday I think we settled on "don't expand the subcheck logs by default". But now I'm having second thoughts... In my case the top-level check doesn't actually stream any logs of its own... Each sub-check has its own withExec. So for this particular case distributing logs in the subchecks would be strictly better.

But, I know go tests are different, because they share one big withExec.

Is there a way to differentiate those 2 situations somehow?

still garnet
#

there's a new Boundary attribute I just added for that sort of thing

#

i think you'd want RollUpLogs + Boundary on each of them, and RollUpSpans if you also want the fancy dots on each sub-span

tepid nova
#

Can you explain RollUpLogs and Boundary? Honestly all those attributes (eg. Internal, Reveal) are like random buttons I try to push in different combinations until it looks ok. I don't feel in control of the model at all.

#

(not a judgement on the buttons, just raw user feedback)

#

i want to make all those attributes easily accessible in parallel, so it's easier to try them in the first place

still garnet
# tepid nova Can you explain `RollUpLogs` and `Boundary`? Honestly all those attributes (eg. ...

RollUpLogs <- when applied to a span, logs from descendant spans will be rolled up to this span's logs in the UI, with prefixes
Boundary <- prevents logs from rolling-up past this span
so in this case I think you want both - RollUpLogs to set each sub-span as the new roll-up target, and Boundary to prevent them from rolling up to the parent (this is also how codegen logs are hidden btw)
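
One reading of those two attributes, sketched as a tiny model: to find where a span's logs are shown, climb the ancestors, remember the outermost RollUpLogs span reached, and stop after a Boundary span. The `Span` struct and `logTarget` function are hypothetical and simplified; the real UI logic may differ.

```go
package main

// Span is a hypothetical, simplified span carrying the two UI attributes.
type Span struct {
	Name       string
	Parent     *Span
	RollUpLogs bool
	Boundary   bool
}

// logTarget returns the span under which logs emitted by s would be shown.
// It walks from s toward the root, keeps the outermost RollUpLogs span it
// reaches, and stops climbing once it has processed a Boundary span.
func logTarget(s *Span) *Span {
	target := s
	for cur := s; cur != nil; cur = cur.Parent {
		if cur.RollUpLogs {
			target = cur
		}
		if cur.Boundary {
			break
		}
	}
	return target
}
```

With RollUpLogs and Boundary on each sub-check, a withExec underneath resolves to its own sub-check; with RollUpLogs alone, its logs keep climbing to the top-level check.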

tepid nova
still garnet
#

found a fix for the input: foo.bar.baz.buzz: ... crazy errors. it's kinda silly. only happens for exec errors

#

still long, but less so

tepid nova
# tepid nova

My issues with this 2nd screenshot:

  • There are too many errors in the picture
  • The errors get increasingly more verbose and less useful as they bubble up

--> Solution: only print the error in the leaf?
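
A sketch of what "only print the leaf" could mean for wrapped Go errors: follow the Unwrap chain to the innermost error and show just that. `rootCause` and `wrapChain` are hypothetical helpers, and the layer strings are illustrative; errors joined with multiple children are not handled.

```go
package main

import (
	"errors"
	"fmt"
)

// rootCause follows the Unwrap chain and returns the innermost error.
// It returns nil for a nil input.
func rootCause(err error) error {
	for err != nil {
		next := errors.Unwrap(err)
		if next == nil {
			return err
		}
		err = next
	}
	return nil
}

// wrapChain builds a nested error, wrapping the leaf once per layer so the
// outermost message grows longer at every level, as in the screenshots.
func wrapChain(leaf string, layers ...string) error {
	err := errors.New(leaf)
	for _, l := range layers {
		err = fmt.Errorf("%s: %w", l, err)
	}
	return err
}
```

Each parent span could then show only a short status while the leaf span carries the full message from `rootCause`.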

still garnet
#

Second screenshot looks to me like we don't do any error origin tracking for self-calls (like within the engine, dagql.Select), blind guess but if true it could mean there's one spot where we can fix it

civic yacht
tepid nova
civic yacht
#

that's all I can think of

tepid nova
#

Ah yes...

#

Probably. I'm on it, sorry

civic yacht
still garnet
#

Reveal means "bubble me up through parent spans so the user sees it" and corresponds to the "Hiding noisy spans" toggle in the web UI. Boundary keeps that contained now, too - we had some tests specifically testing Reveal behavior and it was getting pretty annoying seeing the revealed spans dominate the whole test output, so now that's fixed. I suspect Reveal might be retired at some point in the future though, feel like we're chipping away at the use case, by e.g. adding more semantic flags like 'check'

tepid nova
still garnet
#

and also auto-expand parents

#

it's how e.g. when you run our tests, you don't see the withExec or any outer stuff, you just see DaggerDevTest.all > TestContainer | TestDirectory | ...

still garnet
# tepid nova

do you have a repro for this screenshot, or similar? gonna look into it

#

ah maybe this is close enough

โฏ mv docs/dagger.json{,.bak}

dagger checks-pragma*โ€‹โ€‹ โ‡กโ‰ก
โฏ dagger-dev functions
โœ” connect 0.1s
โœ˜ load module: . 7.9s ERROR
! failed to serve module: failed to load dependencies as modules: failed to load module dependencies: module requires dagger , but support for that version has been removed
โ”œโ•ดโœ” finding module configuration 3.7s
โ•ฐโ•ดโœ˜ initializing module 4.2s ERROR
  ! failed to load dependencies as modules: failed to load module dependencies: module requires dagger , but support for that version has been removed
  โ•ฐโ•ดโœ˜ ModuleSource.asModule: Module! 4.2s ERROR
    ! failed to load dependencies as modules: failed to load module dependencies: module requires dagger , but support for that version has been removed
    โ•ฐโ•ดโœ˜ load dep modules 4.2s ERROR
      ! failed to load module dependencies: module requires dagger , but support for that version has been removed
      โ•ฐโ•ดโœ˜ ModuleSource.asModule: Module! 0.0s ERROR
        ! module requires dagger , but support for that version has been removed
Error: failed to serve module: failed to load dependencies as modules: failed to load module dependencies: module requires dagger , but support for that version has been removed
still garnet
#

@tepid nova progress on screenshot 2 -

this works by replacing telemetry.End(span, func() error { return rerr }) with telemetry.End(span, &rerr) (extremely longstanding TODO), which now also handles error origin tracking, by transparently reassigning rerr to one that's stamped with span

not sure why it didn't work for the last step, but already much better, looking into that now
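For anyone skimming, here is a minimal sketch of the error-pointer pattern described above. It is a toy model, not the actual telemetry package: Span, originError, End and Origin are hypothetical stand-ins, and the real helper works on OpenTelemetry spans.

```go
package main

import (
	"errors"
	"fmt"
)

// Span is a hypothetical stand-in for a tracing span.
type Span struct {
	Name   string
	Failed bool
	Ended  bool
}

// originError stamps an error with the span where it first surfaced.
type originError struct {
	err    error
	origin *Span
}

func (e *originError) Error() string { return e.err.Error() }
func (e *originError) Unwrap() error { return e.err }

// End ends the span. If *errp is non-nil, the span is marked failed and,
// when the error has no origin yet, *errp is reassigned to a stamped copy.
func End(span *Span, errp *error) {
	span.Ended = true
	if errp == nil || *errp == nil {
		return
	}
	span.Failed = true
	var oe *originError
	if !errors.As(*errp, &oe) {
		*errp = &originError{err: *errp, origin: span}
	}
}

// Origin returns the span an error was first stamped with, or nil.
func Origin(err error) *Span {
	var oe *originError
	if errors.As(err, &oe) {
		return oe.origin
	}
	return nil
}

func inner(span *Span) (rerr error) {
	defer End(span, &rerr) // named result, so End sees the final error
	return errors.New("boom")
}

func outer(outerSpan, innerSpan *Span) (rerr error) {
	defer End(outerSpan, &rerr)
	return fmt.Errorf("outer: %w", inner(innerSpan))
}

func main() {
	in, out := &Span{Name: "inner"}, &Span{Name: "outer"}
	err := outer(out, in)
	fmt.Println(err, "| origin:", Origin(err).Name)
}
```

The point of the pointer is that the deferred call can both read the final error and swap it for a stamped one, so outer spans keep the innermost origin instead of overwriting it.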

tepid nova
#

(back from late lunch)

still garnet
#

got the last one

tepid nova
still garnet
#

it is, but it helps when it's less clear imo, like $ => CACHED

tepid nova
still garnet
# tepid nova nice!

opened a separate PR for this since it touches so many things: https://github.com/dagger/dagger/pull/11410

just did a self review, went over each change and squinted, not sure how to explain the CI failures yet notsureif

GitHub

replaces the clunky telemetry.End fn pattern with a simple error pointer
tracks the span as the origin of the error by re-assigning the pointer, if the error does not already have an origin

tepid nova
#

I was trying to fix a check-generated failure in engine, but now also getting a go/check-tidy failure in sdk/typescript/runtime 🤷‍♂️

tepid nova
#

@still garnet I just merged the fix for failed check-generated (thanks for the ✅). Want to rebase and see if it fixes your CI errors?

#

@still garnet do you think we can merge checks-pragma today? 😇 would allow me to start calling actual checks from GHA

still garnet
#

i see it's approved already (thanks @civic yacht) - can merge whenever! i re-ran a CI failure to see if it de-flakes, but gotta run to dinner, if it goes ✅ feel free to push the button

I was also gonna address the feedback on the .contributing AI slop but figure we can do that in post

tepid nova
#

go check-tidy fails on main in CI, but I cannot reproduce it locally on the same commit...

tepid nova
tepid nova
#

Still have a failed go check-tidy in main... Still can't reproduce it

#

🙋 can anyone get this command to fail?

dagger call -m github.com/dagger/dagger@main go check-tidy
still garnet
tepid nova
#

can go mod tidy be flaky somehow? Or perhaps it depends on current engine version?

#

since check-tidy generates the dagger modules first

leaden glade
#

I think I got it. The good thing is it's not flaky, there's a reason for that 😉
I tried different ways, and it wasn't clear to me why dagger call -m github.com/dagger/dagger@main go check-tidy was not passing, but a dagger call go check-tidy on the main branch of my checkout was passing 🤔
The explanation lies in some unversioned files I had in my clone.

  • on a fresh clone of main, dagger call go check-tidy does not pass
  • after a dagger call go tidy it works (the fix is there: https://github.com/dagger/dagger/pull/11412)
  • on a fresh clone of main with some unversioned files (in my case, generated files from the python-sdk toolchain), dagger call go check-tidy passes without the above changes
toolchains
└── python-sdk-dev
    ├── dagger.gen.go
    └── internal
        ├── dagger
        │   └── dagger.gen.go
        ├── querybuilder
        │   ├── marshal.go
        └── telemetry
            ├── attrs.go
...
leaden glade
#

@still garnet I have not followed everything, but is there a way to declare that a function is a check in dang?

still garnet
#

eh actually i can probably merge it now. but yeah, will need newly shipped or dev engine for it to work

still garnet
#

gonna tweak go check-tidy to print the diff, which could give an idea - flying blind at the moment. can second @leaden glade's point that usually when this happens it's because of uncommitted files

leaden glade
tepid nova
#

@fair ermine @leaden glade 👋 before you log off today, can you tell me which of the kill-monolith tasks I should not touch?

fair ermine
#

I also opened a PR for the eager loading so you can test the monolith

leaden glade
tepid nova
#

Thanks guys

leaden glade
#

@still garnet more dang questions:

  • are regexps available in dang?
  • is there a way to raise an error in a specific case?
tepid nova
#

<@&946480760016207902> any objections to implementing currentModule().checks()? Would be cool if a toolchain could introspect all the checks in the current context - including but not limited to its own. Is that what currentModule() does? (or does it point to the toolchain/blueprint's own module?)

#

Or, maybe this is when we unshelve the concept of "current env" @still garnet and try again to make it more general, not just for LLMs?

#
currentEnv().checks()
currentEnv().workspace()
currentEnv().toolchains()
currentEnv().functions() // ?
currentEnv().functions().build() // actual function invocation with codegen? a path to self-calls?
#

My immediate use case is a GHA config generator 😇 inspect all checks in the current context, auto-generate a config

still garnet
#

@vito more dang questions:

tepid nova
#

@still garnet would it be easy to change the default checks view so that "sub-checks" are visible by default? (but not the other spans)

#

(sorry if we already had that discussion)

still garnet
#
  1. define sub-checks 😛
  2. anything is possible, though i'm not totally sure you want that, at least I wouldn't want a bunch of checks to push other things offscreen (Dang's suite for example is about 100 tiny scripts so maybe this goes back to 1.)
tepid nova
#

Sorry I'm using "sub-checks" as an alias for "custom spans emitted by checks, which we hope to formalize as a 'sub-checks' feature later"

tepid nova
#

It's just that right now if I don't expand the span tree, I only see raw logs but no useful information, including whether any sub-check has failed

#

but if I expand, I get a firehose

#

default vs. expanded

#

@still garnet quick question on a change you made to parallel. From telemetry.Internal() to a new dagger.io/ui.internal attribute.

I notice in the engine code we still use telemetry.Internal(). Is it safe for me to call parallel.New().WithInternal(true) within the engine, knowing that it will use the new attribute and not telemetry.Internal()?

(My goal is to always guarantee that parallel can safely be used both engine-side and client-side)

still garnet
#

those are just helpers for adding attributes so it doesn't matter at the end of the day

tepid nova
#

Nice! Thanks

#

(doing a quick pass at making engine traces more readable with a few judiciously placed custom spans)

still garnet
#

i do wish for the Go SDK we did something like replace dagger.io/dagger => ./internal/dagger but there might be a good reason not to

#

TypeScript SDK seems to work that way

tepid nova
still garnet
leaden glade
tepid nova
#

Thank you!

tepid nova
#

@still garnet dogfooding screenshot:

  • Command: dagger checks go/lint
  • Output: post-run (not live TUI viz)
  • Dagger version: dagger/dagger@main
  • CI module version: running shykes/dagger@ci-faster-load

Issues:

1. Error is too verbose.

  • What I need: exit code 1
  • What I get: ! input: container.from.withMountedCache.withMountedCache.withMountedCache.withWorkdir.withMountedDirectory.withWorkdir.withExec.sync process "golangci-lint run --path-prefix toolchains/security// --output.tab.path=stderr --output.tab.print-linter-name=true --output.tab.colors=false --show-stats=false --max-issues-per-linter=0 --max-same-issues=0" did not complete successfully: exit code: 1

2. Too much span context.

  • What I need: nothing or maybe a link to that span in dagger cloud (I'm not debugging my module, just running a linter)
  • What I get: 14 lines per check of low-level dagger calls and their arguments
#

The same screenshot with only the information I need:

tidal spire
#

@tepid nova trying to catch up - whats the status of checks + toolchains? is it working on any branch somewhere or should I put up a PR since checks are merged?

tepid nova
still garnet
tepid nova
#

missed that one

still garnet
#

probably still more to do after, but getting there. I think your parallel package might need to use the telemetry.EndCause helper internally to fix some of it, which gets back to the dagger.io/dagger/telemetry dependency

tepid nova
#

@still garnet I ran into an interesting situation in that example:

  • All my errors except for one were golangci-lint failures. So the context tree was completely useless (don't care how the tool is run, just about its logs). I also decided I don't care about exit code either. If exit code is meaningful, then it makes sense for the module code to expect & emit a custom error. The happy path should be: don't care

  • one error was a failed moduleSource().asModule() caused by missing dagger.json files in a few places. First reaction: "oh no in this case I do need the context tree! There are no logs, only the errors and those need context!". Second reaction: "actually if we're leaning on logs for user functions, we should do the same for system functions". Instead of printing an error with "no such file or directory" wrapped with a bunch of noise, wrapped in a complicated context tree, there should be an actual log message saying "no such file or directory" and we should print that to the user

TLDR if we're embracing logs for user-facing error context, we should embrace it all the way, including core functions. Then IMO we won't need to show special context trees around errors at all. Either you look at the logs, or you want to dig deeper, and you look at the complete trace

tidal spire
tepid nova
#

We're trying to get the monolith refactor ready to merge... It's a big lift but getting close! You can run its checks+toolchains and everything

tepid nova
#

(not compatible with 0.19.6 checks. need main)

tidal spire
#

dev build from 96a42e7e0 (head of ci-faster-load)

tepid nova
#

Ah I haven't rebased on main yet (Alex's PR also changed the CI module so I have to resolve that)

#

Until I do (today), you should test ci-faster-load with dagger@main

#

Just try to use it, and as soon as you find issues (you will), add them to the checklist in the issue 🙂 That's what I'm doing

tidal spire
#

oh yeah looks great with a main build

tepid nova
#

@tidal spire I'm taking a quick break, but in 15mn if you want, we can stress test it together 🙂

#

Could be a good demo too if anyone else is interested

tidal spire
#

nice i have to step away for a few minutes soon but i'll be back after!

tepid nova
#

I'm going to schedule an event. Should I schedule it for 12h30 PT? (in an hour)?

tidal spire
#

(violets home sick and shannon has a meeting at 3 est)

tepid nova
#

What time works for you

tidal spire
#

yeah 1230 or 1. If you're demoing I can watch whenever

tepid nova
#

Too late to run a release? 🙂

tepid nova
#

@still garnet we're live-dogfooding with @tidal spire, there's a weird glitch where the output of dagger checks is.. flaky somehow? Same command will output different things. Like some checks just aren't there

tepid nova
tepid nova
#

@civic yacht I looked at the code but couldn't figure it out. Is ModuleSource.configExists() cheaper than ModuleSource.sync()?

#

Or does accessing that field imply sync()?

civic yacht
tepid nova
#

Will sync() fail if there's no valid dagger.json?

civic yacht
#

I'm assuming you are trying just like dag.ModuleSource("...").Sync(ctx). If you chain more calls to ModuleSource then the answer starts to vary

tepid nova
#

OK thanks! I was looking at the current code in CLI to initialize a module, to choose the most accurate & clear wording for the spans we show users

#

I'm using validate for ConfigExists which I think is accurate and clear. The only problem is that it takes a long time... Because I guess it triggers some actual file uploads/downloads. So I was thinking of calling it "materialize" or "transfer files" or other sync-ish word.

#

TLDR seeing ConfigExists taking a long time got me wondering if I really understand what it does and why we call it in the CLI

#

@civic yacht will it make things faster if I call Sync() and then ConfigExists()?

#

That way I can say:

  • transfer files ✅
  • validate ✅

And not be lying 🙂

#

I know it's a small thing, it just always bothered me that when reading those initialization spans I have no idea what each step actually does

civic yacht
# tepid nova <@949034677610643507> will it make things faster if I call `Sync()` *and then* `...

no:

  • Sync by itself will just trigger execution of dag.ModuleSource("foo")
  • ConfigExists triggers execution of dag.ModuleSource("foo") and then returns a bool field set during that execution

So they are almost the same thing, ConfigExists has probably microseconds of overhead to retrieve the bool field too, but that's it. Calling sync and then configExists would just add a little bit of extra overhead.

I'm using validate for ConfigExists which I think is accurate and clear. The only problem is that it takes a long time... Because I guess it triggers some actual file uploads/downloads. So I was thinking of calling it "materialize" or "transfer files" or other sync-ish word.
Yeah whether it's a local or git source, it's essentially just pulling of sources and reading various pieces of configs from dagger.json. It also does the same for each dependency (recursively). So those words would make sense to me
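A toy model of the Sync / ConfigExists relationship described above, in case it helps with naming the spans. Everything here is hypothetical (the moduleSource type, the load counter, and the once-only evaluation are assumptions for illustration); it is not the engine's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// moduleSource is a toy stand-in for a lazily evaluated module source.
type moduleSource struct {
	once         sync.Once
	loads        int  // how many times the expensive load actually ran
	configExists bool // set as a side effect of loading
}

// load simulates pulling sources and reading dagger.json.
// Assumption: the result is reused, so it runs at most once.
func (m *moduleSource) load() {
	m.once.Do(func() {
		m.loads++
		m.configExists = true
	})
}

// Sync only forces evaluation.
func (m *moduleSource) Sync() { m.load() }

// ConfigExists forces evaluation, then reads a field set during it.
func (m *moduleSource) ConfigExists() bool {
	m.load()
	return m.configExists
}

func main() {
	src := &moduleSource{}
	src.Sync()
	fmt.Println(src.ConfigExists(), src.loads)
}
```

In this model, calling Sync first does not make ConfigExists cheaper in total; it only moves the cost of the load to the earlier call.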

tepid nova
#

@tidal spire @still garnet I figured out the root cause of that persistent build error in .dagger. It's not that the dagger runtime was not generated - it's that toolchains are no longer included in the codegen 🤷‍♂️ This is since I rebased on main. So something happened on main that changed the codegen behavior of toolchains... Seems related to the other unexplained issue where dagger functions no longer prints the correct description for toolchains, when it definitely did since I fixed it in main. Maybe same root cause for both mysteries?

#

(but I can't find any recent commit that does anything suspicious)

charred lotus
tepid nova
#

@still garnet re telemetry.EndWithCause(). Should I just always use it in parallel? Or make it configurable whether to call a) End() or b) EndWithCause() ?

#

Also should I stop calling span.SetStatus(codes.Error, err.Error()) ?

Answering my own question: EndWithCause() already handles it, so yes I can safely remove that

still garnet
tepid nova
still garnet
#

oh, euch that leads to hairy dependency issues, since the modules won't have the right dagger.io/dagger (unless you do a go mod replace i guess? if that works in a module? had trouble with that in Dang)

tepid nova
#

So far:

diff --git a/.dagger/go.mod b/.dagger/go.mod
index 17b249f3e..e0c7807cb 100644
--- a/.dagger/go.mod
+++ b/.dagger/go.mod
@@ -9,6 +9,7 @@ require (
 
 replace (
     github.com/dagger/dagger => ..
+    dagger.io/dagger => ../sdk/go
     github.com/dagger/dagger/engine/distconsts => ../engine/distconsts
     github.com/dagger/dagger/sdk/typescript/runtime => ../sdk/typescript/runtime
 )
diff --git a/dagger.json b/dagger.json
index 2e3436cd2..f9ba52f12 100644
--- a/dagger.json
+++ b/dagger.json
@@ -9,7 +9,8 @@
     "sdk/typescript/runtime/**/*",
     "go.mod",
     "go.sum",
-    "util/parallel"
+    "util/parallel",
+    "sdk/go"
   ],
   "dependencies": [
     {
#

I have graduated from a runtime build error to a codegen error 🙂

Error: load package ".": no packages found in .
! process "codegen generate-typedefs --module-source-path /src/.dagger --module-name dagger-dev --introspection-json-path /schema.json --output typedefs.json" did not
  complete successfully: exit code: 1
#

I've done this sort of "go.mod replace + dagger.json include" tweak many times, to import parallel from various dagger modules. This one is slightly different though

#

Oh no I have to do the extra replace in every module that imports parallel

still garnet
#

yeah 😭

still garnet
#

i'm this 🤏 close to just yeeting error origin metadata into error strings and parsing it out lol

tepid nova
#

Funny I actually removed parallel from almost every dagger module in my branch

#

thanks to dagger checks, tons of aggregator functions just aren't needed anymore

#

btw... we can make doug a toolchain now 🙂

#

(I know technically it's the dev module but doug is too cool a name not to use it 😛 )

#

(welp adding the go.mod replace everywhere did not fix the codegen error..)

#

Now go mod tidy complains in each module where i added the replace. Some sort of interference between github.com/dagger/dagger and dagger.io/dagger?

#
diff --git a/.dagger/go.mod b/.dagger/go.mod
index 17b249f3e..e0c7807cb 100644
--- a/.dagger/go.mod
+++ b/.dagger/go.mod
@@ -9,6 +9,7 @@ require (

 replace (
        github.com/dagger/dagger => ..
+       dagger.io/dagger => ../sdk/go
        github.com/dagger/dagger/engine/distconsts => ../engine/distconsts
        github.com/dagger/dagger/sdk/typescript/runtime => ../sdk/typescript/runtime
 )
$ go mod tidy
go: finding module for package github.com/dagger/dagger/.dagger/internal/dagger
go: github.com/dagger/dagger/.dagger imports
    github.com/dagger/dagger/.dagger/internal/dagger: module github.com/dagger/dagger@latest found (v0.19.6, replaced by ..), but does not contain package github.com/dagger/dagger/.dagger/internal/dagger
#

@still garnet permission to copy-paste? 😛

still garnet
#

lol yes

tepid nova
#

I don't know how to fix this

still garnet
#

that's what I ended up doing

tepid nova
#

just the whole telemetry package?

still garnet
#

you can just yoink the one function and handful of other types it needs

#

it'll just be more code we can triumphantly delete once we figure out the statuses/subchecks API

tepid nova
#

And it builds 🙂

#

Now to add the logRollup

#

To confirm my understanding:

  • For now, to get log & span rollup in my "sub-checks", I need to set the corresponding attributes in my check function. This is a stopgap (we don't want to require all toolchain devs to do this)

  • Soon, we will have an official API for sub-checks, and the engine will handle setting those attributes on behalf of the checks dev.

--> Agree?

still garnet
#

yep!

tepid nova
#

Question for the theseus masters... @civic yacht @rocky plume . How hard would it be to start collecting some sort of "cache hit rate" metric from the engine? I think we need a number to quantify, even roughly, how much an engine is using its cache. If only to see if that number improves or degrades over time.

In the context of scale-out and parc, it's difficult to evaluate whether a given load distribution strategy is working or not without that number.

For example, I'm looking at an engine I'm currently allocated to. It's 7 days old, and has a great variety of modules in its cache. On the one hand keeping instances around longer should give us better cache use. But on the other hand, re-using the same instance for many different modules might hurt cache performance. Hard to tell, without some sort of measure... You get the idea.
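To make the ask concrete, a rough sketch of the kind of counter this could start from. The cacheStats type and its methods are hypothetical, and where the engine would call Record is left open; this only shows the arithmetic of a hit rate.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheStats counts cache lookups; the name and fields are hypothetical.
type cacheStats struct {
	hits   atomic.Int64
	misses atomic.Int64
}

// Record notes the outcome of one cache lookup.
func (s *cacheStats) Record(hit bool) {
	if hit {
		s.hits.Add(1)
	} else {
		s.misses.Add(1)
	}
}

// HitRate returns hits / (hits + misses), or 0 when nothing was recorded.
func (s *cacheStats) HitRate() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	var s cacheStats
	for _, hit := range []bool{true, true, true, false} {
		s.Record(hit)
	}
	fmt.Printf("hit rate: %.2f\n", s.HitRate())
}
```

Even a rough number like this, tracked per engine over time, would show whether a load distribution strategy is improving or degrading cache use.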

#

Ironically when we slice up an engine for multiple tenants, it's easier to slice up CPU, memory and disk than it is to slice up the engine cache

tepid nova
tepid nova
#

It's looking better 🙂

still garnet
#

oh btw, random idea re: dagger generate-ing from the path you want to regenerate: when we run a // +generator func we can inspect its returned Changeset and record the changed files in dagger.json, something like this:

{
  "generators": {
    "dagql/foo.go": "TypeName.funcName"
  }
}

then when you do dagger generate ./dagql/foo.go or dagger generate ./dagql it'd run TypeName.funcName.

#

could get spammy in some cases (e.g. elixir/php have a bunch of separate files iirc), but you get the idea
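A quick sketch of the lookup side of that idea, assuming the "generators" map has already been read out of dagger.json. The generatorsFor helper is hypothetical and only covers path resolution (exact file or containing directory), not running anything.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// generatorsFor returns the generator functions recorded for target, which
// may be a generated file or a directory containing generated files.
// Hypothetical helper; paths are slash-separated and relative to the module.
func generatorsFor(generators map[string]string, target string) []string {
	target = path.Clean(target)
	seen := map[string]bool{}
	var out []string
	for file, fn := range generators {
		file = path.Clean(file)
		inDir := target == "." || strings.HasPrefix(file, target+"/")
		if (file == target || inDir) && !seen[fn] {
			seen[fn] = true
			out = append(out, fn)
		}
	}
	sort.Strings(out) // deterministic order regardless of map iteration
	return out
}

func main() {
	gens := map[string]string{
		"dagql/foo.go": "TypeName.funcName",
		"sdk/bar.go":   "Other.generate", // made-up second entry
	}
	fmt.Println(generatorsFor(gens, "./dagql/foo.go"))
	fmt.Println(generatorsFor(gens, "./dagql"))
}
```

Deduplicating by function name keeps the spammy case manageable: a generator that writes many files still only runs once per invocation.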

tepid nova
#

a conundrum: how hard would it be to wrap our integration tests in go test?

#

our go and engine-dev toolchains are competing to run the same test suites

fair ermine
#

@still garnet There's a problem with the dang SDK, all our CI and local dev build are failing because of:

# dagger/dang/entrypoint
entrypoint/main.go:445:20: funDef.WithCheck undefined (type *"dagger/dang/internal/dagger".Function has no field or method WithCheck)

We are currently investigating with Yves to understand what's going wrong and why CI was green 2 days ago even though you made your changes to the dang SDK 5 days ago (when adding the checks).

Yves noticed that the issue may be triggered because you are using an unreleased API in the Dang SDK (it still doesn't explain the CI issue tho)

I also tried to run dagger functions on your dang module and got the same error (with dagger v0.19.15 and v0.19.16).

tepid nova
#

@tidal spire can you help me get 11373 merged today? 🙏 We should be close

rocky plume
tepid nova
tepid nova
#

@still garnet FYI in 11373, errors still get the unneeded "context tree", even though I changed parallel to use your fix with error hoisting. Is that normal, or a sign of a bug in my parallel implementation?

still garnet
tepid nova
#

loadPackage -> 4m30s and counting

tepid nova
#

@fair ermine there's only Elixir SDK left to merge/spin out right?

tepid nova
#

🙋 for those who are familiar with our release flow. Do we ever call dagger -m releaser publish? Or is that actually dead code?

#
$ grep -r releaser RELEASING.md .github/
RELEASING.md:  dagger call -m releaser bump --version="$ENGINE_VERSION"
RELEASING.md:  export CHANGIE_MAINTAINERS=$(dagger call -m releaser get-maintainers --github-org-name dagger --github-token="cmd://gh auth token" --json)
RELEASING.md:  dagger develop --recursive -m ./releaser
.github/workflows/publish.yml:          module: github.com/${{ github.repository }}/releaser@${{ github.sha }}
.github/workflows/publish.yml:            --goreleaser-key=env:GORELEASER_KEY \

--> No relevant reference to it anywhere EDIT: last line is the relevant line

fair ermine
tepid nova
#

Could this possibly be dead code??? nevermind I found it: module: github.com/${{ github.repository }}/releaser@${{ github.sha }}

#

btw we can use a .env for release...

#

Note @still garnet : our own publish function hand-rolls the pattern that many users have been asking us for on the tests: "give me a report artifact even if it fails"

#

Which is only possible if we suppress errors at the app layer ("dagger errors mean something went wrong with dagger")

#

OR, at some point down the road, we offer a core feature that can produce a report artifact for any dagger call, from its otel trace

#

(it does feel like that's what our own publish report code is trying to replicate. "We called this publish function, with these arguments, and then we got this error as a result", etc)

#

--> dagger --export-report=./report.md call ...

#

or as an intermediary step: allow checks to return a File or Directory? Since we have +check now

still garnet
#

hmm even with that you'd need a way to still 'fail and return'

tepid nova
#

yeah

#

doesn't work as is

#

So maybe in the same vein as adding support for a "subcheck" interface-ish, we could also support a "check-result" interface-ish, with a pass(): Bool! and report(): File!

#

(in lieu of void)
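Roughly what that "check-result" shape could look like on the Go side. This is only a sketch of the idea: CheckResult, lintResult and finish are hypothetical names, and Report returns a string here where the proposal says File.

```go
package main

import (
	"fmt"
	"strings"
)

// CheckResult mirrors the proposed pass(): Bool! / report(): File! pair.
// Hypothetical interface; a string stands in for the report file.
type CheckResult interface {
	Pass() bool
	Report() string
}

// lintResult is a made-up implementation backed by a list of issues.
type lintResult struct{ issues []string }

func (r lintResult) Pass() bool { return len(r.issues) == 0 }

func (r lintResult) Report() string {
	if r.Pass() {
		return "no issues\n"
	}
	return strings.Join(r.issues, "\n") + "\n"
}

// finish always produces the report, and separately derives the exit code,
// which is the "fail and return" behavior that a plain error cannot express.
func finish(r CheckResult) (report string, exitCode int) {
	report = r.Report()
	if !r.Pass() {
		exitCode = 1
	}
	return report, exitCode
}

func main() {
	report, code := finish(lintResult{issues: []string{"main.go:3: unused variable"}})
	fmt.Print(report)
	fmt.Println("exit code:", code)
}
```

The caller gets the artifact whether or not the check passed, and the pass/fail signal travels next to it instead of replacing it.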

tepid nova
#

<@&946480760016207902> I'm going to make the following changes to our CI. Any red flags?

  • Engine tests default to --race=false, but CI config explicitly sets --race=true. Objections to changing the default to true, so the CI config can be dumber?
  • Engine tests default to --parallel=0, CI config sets --parallel=16. OK to set default to 16 for same reason?
#

Alternatively, I can move those settings to a .env

#

(but then we need to talk about how to get that .env into CI)

tepid nova
civic yacht
tepid nova
#

What about race enabled by default? (setting aside the DX complication)

civic yacht
tepid nova
#

@still garnet we're redoing the GHA config plumbing on top of dagger checks, and ditching the generator... There's one workflow that is a little more custom, your llm test workflow. I wanted to check with you what we should do:

  • Path filters. Easier if we remove it, I'm guessing it will make things slightly less efficient in the short term (the job will trigger when it's not needed), but that all goes in the trash soon and replaced with smart checks anyway. Just checking that it doesn't actually break anything to remove those test filters
"on":
    push:
        paths:
            - core/llm.go
            - core/mcp.go
            - core/env.go
            - core/llm_*.go
            - core/llm_*.md
            - core/schema/llm.go
            - core/schema/env.go
            - modules/evaluator/**
            - modules/evals/**

The shell job:

 call: --allow-llm all test specific --env-file file://.env --pkg ./cmd/dagger --run CMD/LLM

--> Not sure what that even does or what to do with it

still garnet
tepid nova
#

That's a highly motivating test case for smart checks 🙂

still garnet
#

if you need to punt for now you can, since that job currently is broken in CI

#

i just run them locally atm

tepid nova
#

For now I'll "eject" the yaml to be manually edited if that's ok?

still garnet
#

sgtm

tidal spire
#

what if we change that one to just be dispatch?

still garnet
#

that'd be fine too 👍

tepid nova
#

@tidal spire FYI I had to eject 2 workflows:

  1. llm.yml 👆
  2. daggerverse-preview.yml
tidal spire
#

I'm not actually sure what the purpose of the daggerverse-preview workflow is. I'll check into it but if anyone has context please educate me

#

Looks like it checks that crawling modules works as expected when we're creating a new release. I think we could find another way to test this that fits better in our test stack

#

Especially now that the crawling code is in a separate library which wasn't the case when this pipeline was created. The library is private but we can move it if it makes sense

#

we dont have to make that change now, ejecting is fine. just noting for the future

tepid nova
#

We're getting closer 🙂

#

This 👆 is all that's left of the monolith

#

To be fair engine-dev is a beast

#

the monolith within the monolith

#

But we'll get to that later

#

You fix 90%, and suddenly the remaining 10% looks huge

still garnet
#

Any hesitation with me shortening this error message?

process "go test -ldflags -X github.com/dagger/dagger/engine.Version=v0.19.7-251117095547-480c5d59242f -X github.com/dagger/dagger/engine.Tag=bfeb1a7cec3757a60e9e2c5bf176c1131e9af388 -parallel=24 -timeout=60m -count=1 -run TestModule ./..." did not complete successfully: exit code: 1

=>

exit code: 1
diff --git a/core/container_exec.go b/core/container_exec.go
index 7d8ea6e90..2efe87f8f 100644
--- a/core/container_exec.go
+++ b/core/container_exec.go
@@ -557,7 +557,7 @@ func (container *Container) WithExec(
        }

        if execErr != nil {
-               return nil, fmt.Errorf("process %q did not complete successfully: %w", strings.Join(metaSpec.Args, " "), execErr)
+               return nil, execErr
        }

        return container, nil

cc @civic yacht

civic yacht
still garnet
#

ah good call, will do! yeah it's redundant with telemetry, and SUPER verbose when like 10 things fail

tepid nova
#

<@&946480760016207902> if you recognize this GHA config, this is your warning that it will get nuked soon, and you should start thinking about how to migrate it off of GHA 🙂

  • _dagger_on_depot_local_engine.yml
  • _dagger_on_depot_remote_engine.yml
  • alternative-ci-engines-1.yml
  • alternative-ci-runners-1.yml
  • benchmark-engine.yml
  • benchmark.yml
  • changelog.yml
  • daggerverse-preview.yml
  • deploy-docs.yml
  • llm.yml
  • publish.yml
  • stale.yml
  • trace-workflows.yml

"soon" as in: in the next few weeks.

#

@tidal spire I'm taking the PR live... let's see what happens!

#

To play with it:

dagger -m github.com/dagger/dagger@main call playground with-directory --path=. --source=https://github.com/shykes/dagger#ci-faster-load terminal
#

@still garnet would it be easy to get dang sdk to work with 0.19.6 and also fully support checks?

still garnet
#

in a hacky kind of way yeah - i could just sidestep the codegen'd WithCheck call and use the underlying graphql client. wouldn't be too bad

#

assuming theres a way to detect thinkies

tepid nova
#

Mmm what is WithCheck?

still garnet
#

Function.withCheck, it's the typedefs API that the pragmas translate to

tepid nova
#

Ah! of course

still garnet
#

(good time to bikeshed btw, hastily chosen name)

tepid nova
#

Those CI tests are passing a little too successfully...

#

I wonder if we remembered to exit 1 when dagger checks fails ๐Ÿค”

still garnet
#

that should be in, came with the TUI

#

maybe our tests are just passing padme_right

tepid nova
#

Looks like it!

#

I had a typo in the new split-test (made that a toolchain)

#

dagger checks test-split/* -l

#

There is one issue left... Our glob path matching for checks is confusing our GH actions... It doesn't escape it properly, so if any file matches, it expands it on files 😭

#

Note that this check succeeds... Because no checks match, so it happily checks nothing 🙂

#

@tidal spire the test-split checks are run successfully in the matrix, but seeing a bunch of timeouts... are you sure these get scheduled to separate machines?

tidal spire
tepid nova
#

the other issue I see is the unescaped globbing, tried to fix it in d4gh but didn't seem super easy

#

other than that, looking promising ๐Ÿ™‚

#

i'm happy with the separate test-split pattern, it's a stopgap, but uses toolchains in a way that feels clean

still garnet
tepid nova
#

forgot to add that to the todo list

#

also tomorrow I will switch to : as separator

tidal spire
#

just pushed up the wip elixir module @tepid nova ! left a comment on the PR too, but it has the same FIXMEs as typescript since we have several spots that need a Strings.replace

#

happy to attempt that on dang @still garnet but you can probably do it faster 😛

still garnet
#

ah fun, yeah lemme take a swing

#

alternatively: file("foo.txt", content).withReplaced("x", "y", all: true).contents 😛

#

@tidal spire pushed: "foo".replace("o", "x") returns "fxx", there's a count arg if needed
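For comparison, the Go standard library equivalent, which is presumably what a Go-package hack would call. The replaceN wrapper is just a hypothetical name for the demo, and whether Dang's count argument follows the same convention as Go's is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceN wraps strings.Replace: count < 0 replaces every occurrence,
// otherwise at most count occurrences are replaced.
func replaceN(s, old, repl string, count int) string {
	return strings.Replace(s, old, repl, count)
}

func main() {
	fmt.Println(replaceN("foo", "o", "x", -1)) // fxx
	fmt.Println(replaceN("foo", "o", "x", 1))  // fxo
}
```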

tidal spire
#

Wow that was fast! Thanks!!

tepid nova
#

toolchain devloop is getting faster 💪

tepid nova
fresh harbor
still garnet
tidal spire
leaden glade
tidal spire
#

I know at some point it wasn't working and there was an unsuccessful effort to rewrite it

tidal spire
#

yeah

leaden glade
tidal spire
tepid nova
#

honestly we can do the recorder toolchain in a followup

#

(back)

#

Is the todolist up-to-date @tidal spire ? I don't want to conflict

#

(looks like it)

#

@astral zealot @wild zephyr I may need help understanding this timeout on test splitting...

#

Also is it possible that I accidentally triggered scale-out from GHA @civic yacht ?

civic yacht
tidal spire
#

well elixir is done-ish. As much as the typescript one is. They're both missing bump

tepid nova
#

@tidal spire I'm hanging out in lounge trying to figure out those timeouts. Whenever you're out of the zone 🙂

obsidian rover
#

Hello, to access the args of a parent inside the dag, would this be the only way (for non-sensitive args):

parentID := dagql.CurrentID(ctx).Receiver() // same as parent.ID()

if arg := parentID.Arg("url"); arg != nil { // look up by name
    if url, ok := arg.Value().ToInput().(string); ok {
        // use url ...
    }
}
still garnet
#

github having issues for anyone else? the site works, but getting 500s when my engine tries to clone/resolve

still garnet
leaden glade
#

I'm stopping there for tonight, but I have something good for the toolchain descriptions + list. I just can't build and test it right now because of the current GitHub incident, so I'll finish that first thing tomorrow morning

tepid nova
#

One day we'll have caching & lockfiles so good, github outages won't even affect us 🙂

civic yacht
tidal spire
#

same, we're so back

#

it would satisfy IsSemver (regexp.match) too since we're doing a hack with a go package at the moment

tidal spire
#

I guess we could do a similar hack for replace for now

tepid nova
#

@tidal spire I'm still seeing a bunch of mysteriously canceled / timed out jobs even after pushing your one-line yml fix

#

(almost done fixing the last typescript-sdk error)

tidal spire
tepid nova
#

Not quite there yet

charred lotus
#

@still garnet @civic yacht my git bisect says https://github.com/dagger/dagger/pull/11439 changed something I was relying on. Basically I have a println in HandleChanges() and unlazy and neither are printed when i do this:

foo=foo-$RANDOM
dagger -M -c 'directory | with-new-file bar BAR | changes $(directory | with-new-file '$foo' foo) | added-paths'

Am I missing something obvious ?

tepid nova
#

@tidal spire picking up kids. Still have timeouts on split tests 🤷‍♂️

civic yacht
tidal spire
#

Yeah I can't imagine it's the workflow file at this point

tepid nova
tepid nova
leaden glade
tepid nova
#

thank you! if you have any ideas on why the remaining checks are failing, I'm interested! 🙏

leaden glade
leaden glade
#

We have:

  • dagger functions
  • dagger call --help
  • dagger toolchain list
  • dagger checks -l

Can we/should we align them more?
leaden glade
#

There's something odd I don't quite understand, if anyone has an idea.
I made a commit that fixes the go/check-tidy complaints. I did this by running dagger call go tidy from inside an instance based on main (not from my machine). The commit is pushed to the ci-faster-load branch.
With it, dagger checks go/check-tidy is happy.
But dagger checks test-split/test-base now complains that go mod tidy must be run. And if I revert my commit, the tests run again.
So something is wrong somewhere: the go.mod/go.sum changes are good in one case and bad in the other, and vice versa.

tepid nova
tidal spire
leaden glade
#

I always have timeout errors when trying to run the tests from a playground (local one or using cloud)

Head "https://mirror.gcr.io/v2/library/alpine/manifests/latest?ns=docker.io": dial tcp: lookup mirror.gcr.io: i/o timeout

For instance when running dagger checks test-split/test-base
Is that a known limitation? Or something I'm doing wrong?

tidal spire
#

@leaden glade @fair ermine since we're all looking at the same thing, I think I have a fix for the tidy issue and i'll push it up in a moment. Just verifying i've caught everything

leaden glade
fair ermine
tepid nova
leaden glade
fair ermine
tidal spire
tepid nova
#

So we're saying the remaining timeouts are because of telemetry getting lost in dagger cloud?

#

Trying to understand the way forward to merging this thing

tidal spire
#

Seems like it if telemetry is also hanging outside of ci for those checks. Maybe there's a missing defer somewhere 🤔

tepid nova
#

Thanks for catching that go mod tidy issue, it was driving me crazy yesterday

#

Could it be the fact that it's all running on a main engine?

#

but that would affect other checks also...

#

Something about dang checks then?

#

No, some of those have passed

tidal spire
#

I'll keep poking around. It seems isolated to test-split but i dont know how that matters yet

tepid nova
#

yeah same

#

I'll call the linter in the meantime 🙂

#

And try to get those : separators done before release (which we're supposed to do today!)

tidal spire
#

nice! I'm still working out the ignore thing. I think it might need to be more specific, like core/integration/testdata/modules/**

tepid nova
#

<@&946480760016207902> how do I "update the tests"? The go tool tells me to run go test -update but I'm assuming we need to run a wrapper equivalent?

still garnet
#

(we have a few that use the golden output pattern)

tepid nova
still garnet
#

that one's call test telemetry --update

tepid nova
#

Ha ha I was so close:

 dagger call test update --run TestTelemetry

--> 💥

still garnet
#

that's the one we use for the LLM tests, could maybe consolidate

tepid nova
#

btw @still garnet the reason it took me a while to answer your question: that particular PR of mine seems to really mess with the default trace view in cloud... 😬 . I think it's because of parallel's use of reveal

#

"our module loading telemetry is so user-friendly, we hide everything else so you can really appreciate how user friendly it is"

still garnet
#

eh yeah i'm pretty hesitant to use Reveal for internal stuff. is it crucial for the PR?

#

imo we should be trending towards deprecating it, but i'm not 100% sure yet (and we're not there yet)

tepid nova
#

Jokes aside, it happens to be what parallel does by default, and all I know is when I change any of the attributes, it starts a whole game of whack-a-mole. No opinion on reveal itself though

#

I could just remove the reveal from parallel completely? Will the custom spans still be visible? Will try

still garnet
#

my guess is you'll want reveal when it's used within a module function, and not when it's used within the engine. but, while we're on the topic of deprecation, might as well try removing it entirely and see how it looks in modules too

tepid nova
#

@still garnet which commit of the dang SDK should I use for it to work with dagger stable?

#

(trying the vendor-by-blueprint trick to quickly swap the version when I need to run it locally with stable)

still garnet
tepid nova
still garnet
tepid nova
#

nice

#

that would be ideal

#

(also don't worry about the tag, I have 2 dagger.json and a symlink 🙂)

still garnet
# tepid nova that would be ideal

pushed, with a caveat that modules that use the @check directive will still fail because that's an unknown directive. can work around that too if needed

tepid nova
#

ha ha I just ran into the exact same issue with my workaround

still garnet
#

welp

tepid nova
#

(that's what happens if I load with 0fc10d961e421944c1f3dbff4961dd5b0a59998c)

#

if it's easy to gracefully ignore unknown directives (or at least @check as a special case) that would be ideal

tepid nova
obsidian rover
tepid nova
#

Also vendor-by-blueprint is really nice

tidal spire
#

@still garnet running checks on the kill the monolith branch - if I dont expand go/lint it just shows me 1 error at the end when there are actually several. If I expand it, no problem

still garnet
#

@tidal spire just fixed that on main

tidal spire
still garnet
#

oh yeah, that's been a thing for ages, it's from the wolfi module

#

would be nice if it simplified it to the wolfi call

tepid nova
#

@still garnet but what's weird is that the call to the wolfi module is not visible in the trace

still garnet
#

how is that container even being passed in?

#

ah it's the default in ./modules/go?

tidal spire
#

yeah in the constructor

still garnet
#

what's probably happening is: 1. Go module is constructed, calls Wolfi to create that container, stores it as a property, 2. CLI runs checks, engine sets WithRepeatedTelemetry for the ctx, which causes all those cache hits to show up

#

maybe we just don't need that WithRepeatedTelemetry anymore?

#

@tepid nova do you remember the original motivation?

tepid nova
still garnet
#

@civic yacht a weird one...: https://dagger.cloud/dagger/traces/589323d9cf8e741614d3b9bfbc07c2a9?listen=e0964f87b1161272&listen=071224767ba23292

failed to sync: conflict at "cmd": change kind changed from "delete" to "add" during sync

I get what this is trying to guard against (concurrent changes during upload) but I feel like it's happening when it shouldn't. I wasn't making any changes and don't think any background thing was.
To add to the weirdness (or maybe answer for some), it looks like the filesync tried to touch /home?? Maybe it's looking from / instead of . in some situations?

tepid nova
wild zephyr
wild zephyr
# wild zephyr

@tepid nova it's so we don't get rate-limited by dockerhub and our pulls are authenticated

#

that config file in our CI is currently injected by namespace

#

with an account that they own which has some special limits for their fleet

tepid nova
#

I remember that part, but I thought there was also a pull-through cache that we inject at the engine config level, that made the docker config less important?

still garnet
wild zephyr
#

but whenever the image is not in that mirror, It'll use the config.json to fetch it from dockerhub authenticated

#

this config is not in the config.json though. That lives in the engine.json file

tepid nova
# wild zephyr but whenever the image is not in that mirror, It'll use the config.json to fetc...

We're seeing a weird network error, and trying to determine if it's caused by 1) unauthenticated Docker Hub pulls (we haven't re-enabled that docker config feature yet), or 2) a return of the dagger-in-dagger-in-dagger CIDR issue? 3) some other unknown issue...

@wild zephyr @civic yacht do you recognize the error in a way that helps diagnose?

https://dagger.cloud/dagger/traces/86cc7f665d07a2f3140df38f5d247084?span=f5e8ad5cff94ffbe

#
1) Dagger\Tests\Integration\ClientTest::testContainer
GraphQL\Exception\QueryError: failed to resolve image "docker.io/library/alpine:3.16.2" (platform: "linux/amd64"): failed to resolve source metadata for docker.io/library/alpine:3.16.2: failed to do request: Head "https://registry-1.docker.io/v2/library/alpine/manifests/3.16.2": dial tcp: lookup registry-1.docker.io: i/o timeout [traceparent:86cc7f665d07a2f3140df38f5d247084-29913271c3172612]
civic yacht
#

it's hard to tell what's happening since there's so much nesting but the first thing that comes to mind is that if we're spinning up multiple nested engines, not only does each one need a CIDR that's different than the parent engine's CIDR, they themselves also need different CIDRs from each other
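
To make the constraint concrete: every engine in the tree (parent and all nested siblings) needs a network range that is pairwise disjoint from the others. A minimal Go sketch of that check using the standard library's net/netip; the `firstOverlap` helper and the CIDR values are hypothetical and are not taken from the engine's actual configuration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// firstOverlap returns the first pair of CIDRs that share any address.
// found is false when all ranges are pairwise disjoint.
func firstOverlap(cidrs []string) (string, string, bool, error) {
	prefixes := make([]netip.Prefix, len(cidrs))
	for i, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return "", "", false, fmt.Errorf("parse %q: %w", c, err)
		}
		prefixes[i] = p
	}
	// Compare every pair, so siblings are checked against each other
	// and not only against the parent.
	for i := range prefixes {
		for j := i + 1; j < len(prefixes); j++ {
			if prefixes[i].Overlaps(prefixes[j]) {
				return cidrs[i], cidrs[j], true, nil
			}
		}
	}
	return "", "", false, nil
}

func main() {
	// Hypothetical layout: parent engine first, then two nested engines
	// that both differ from the parent but collide with each other.
	cidrs := []string{"10.87.0.0/16", "10.88.0.0/16", "10.88.0.0/16"}
	a, b, found, err := firstOverlap(cidrs)
	if err != nil {
		panic(err)
	}
	fmt.Println(found, a, b)
}
```

A check that only compares each nested engine against its parent would pass this layout, which is the failure mode described above.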

tepid nova
civic yacht
tepid nova
#

@still garnet so there's good news and bad news.

  • Good news: the TUI showing memory metrics makes it really easy to spot infinite recursions blowing up your stack
  • Guess the bad news 😛 (it's happening in the dang runtime)
#

narrowing down a repro

tepid nova
#

This crashes with infinite stack recursion:

  pub bump(version: String!): Changeset! {
    #let versionFilePath = "sdk/elixir/lib/dagger/core/version.ex"
    # let before = directory().withFile(versionFilePath, workspace.file(versionFilePath))
    #let before = directory()
    container().
    from("alpine:3").
    withWorkdir("/app").
    withDirectory(".", directory()).
    directory(".").
    changes(directory())
    #withExec(["sed", "-E", "-i", "", "-e", "s/@dagger_cli_version \"([^\"]+)\"/@dagger_cli_version \"" + version + "\"/g", versionFilePath]).
#    directory(".").
#    changes(directory())
  }
  pub bump(version: String!): Changeset! {
    container().
    from("alpine:3").
    withWorkdir("/app").
    withDirectory(".", directory()).
    directory(".").
    changes(directory())
  }
still garnet
#

do you have the stack trace handy?

tepid nova
tepid nova
#

(most of the other runs I canceled before the end)

still garnet
#

hmm kinda looks like an actual stack overflow in the dang code. is it pushed somewhere?

tepid nova
#

(to my great confusion)

still garnet
#

tried that in my own module but no cigar

#

do you have another function named directory or container by any chance that depends on bump?

tepid nova
#

arg I checked for directory() but just realized, there is indeed a container().....

#

it doesn't look like it should trigger a recursion though. But yeah that can't be a coincidence

still garnet
#

could be over-indexing but i wonder if this sort of thing is a point in favor of requiring explicit self.. depends on how common of a footgun it is

#

it's nice and terse being able to refer to sibling functions without it, but you don't have the dag. escape hatch that you normally would with dag.Container()

#

i guess alternatively we could be binding the dagger API to a var instead of having all of Query.* in the toplevel scope

#

tradeoffs all around...

tepid nova
#

ok I found the recursion I think

#

container() calls withBase() which calls container()...

still garnet
#

that'll do it

tepid nova
#

fixing

#

I think I had another instance of shadowing, in another module, whenever I reference source() I get a mystery string instead of my pub source: Directory!. Couldn't track down where that string comes from. Meant to open an issue for you later

Got me wondering if there's a special case source symbol somewhere in the dang runtime

still garnet
#

Feels like the paths forward are:

  1. Instead of Dagger's API providing a global container(), should Dagger's Query be bound as e.g. Dag.container?
  2. Instead of container() resolving to object-local container field, should it always resolve to the outer container(), and for local you need to do self.container()?
  3. This is fine thisisfine
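
Option 2 happens to match how Go scopes methods, which makes for a quick illustration. A minimal sketch with all names hypothetical; this is Go, not Dang, and it says nothing about how the Dang runtime resolves names today.

```go
package main

import "fmt"

// container stands in for the global Dagger API call (hypothetical).
func container() string { return "global container" }

// Module is a hypothetical module that also defines its own container.
type Module struct{ base string }

// container is the object-local method. It lives in the method namespace,
// so it never shadows the package-level function of the same name.
func (m Module) container() string { return "module container from " + m.base }

// bump shows both lookups side by side: the bare name resolves to the
// outer function, and the local method needs the explicit receiver.
func (m Module) bump() (string, string) {
	return container(), m.container()
}

func main() {
	outer, local := Module{base: "alpine:3"}.bump()
	fmt.Println(outer)
	fmt.Println(local)
}
```

With this rule, a method body that calls the bare name cannot accidentally recurse into a sibling of the same name.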
still garnet
tepid nova
still garnet
#

oh i see, didn't get past typechecking

still garnet
# still garnet Feels like the paths forward are: 1. Instead of Dagger's API providing a global ...

leaning heavily towards 2 for a few reasons atm, gonna try it out.

  1. Precedent in Python (familiarity points)
  2. Much easier to tell what's a local var vs. what's a local method call (readability points)
  3. Much easier to support copy-on-write semantics. Currently Dang has a carve-out so that self.a = 1, siblingMethod() calls siblingMethod with the latest self (previously it called the lexically bound siblingMethod which resulted in each call having the same self => pollution). If I go this route, self can just go back to being a regular variable, rather than a stand-in for this "dynamic scope" mechanism
tepid nova
still garnet
#

if we went that route I'd probably opt to have an explicit import at the top. which is already there, it's just half-baked

tepid nova
#

@still garnet dang feature request: print a string to "stdout" 🙂 to show up in function logs

still garnet
#

there's print(x)

civic yacht
#

I know it's fixed in the current CI refactor PR, but I noticed we are currently running all SDK tests and lints 2 times, all in parallel. Quick fix in the meantime: https://github.com/dagger/dagger/pull/11449 (also has a small java sdk test improvement that saves like 15s)

still garnet
#

dang: escaping shadowing (aka self or import?)

tepid nova
#

Hallelujah!

leaden glade
#

Improve test-split toolchain so that new test suites don't silently get dropped in the future
Could anyone explain this in a bit more detail (first ToDo entry of https://github.com/dagger/dagger/pull/11373)? I'm not sure I understand what it means.
If I'm right, since some tests are run specifically and testBase runs everything else, any new test suite should be picked up by testBase, no? Or is the issue at a different level?

tepid nova
#

I forgot that testBase uses -skip and not -run. So future new test suites will be automatically included.

wild zephyr
# tepid nova Hallelujah!

noice. I guess you were able to find what was happening with the dagger-in-dagger-in-dagger network errors I assume?

tepid nova
tepid nova
#

networking issue is unresolved but also very niche (only affects our ci) so leaving it as a followup is ok imo

tepid nova
#

@leaden glade thank you for the extra dangification

tepid nova
#

Rebasing on main and dealing with last minute merge conflicts...

tepid nova
#

I wanted to clean up kill-monolith into a few large but clean commits, it's proving difficult