Lockfile on egraph 🧵 | Dagger | Page 1

tawny apex Apr 8, 2026, 6:28 PM

#

FYI @earnest spruce @uneven plinth @unique robin @whole ibex

Starting this thread to track

#

@earnest spruce let me know if you need help today

#

Oh wait there was already another thread... facepalm

#

Sorry

earnest spruce Apr 8, 2026, 6:30 PM

#

I have https://github.com/dagger/dagger/pull/12922 in place, which was a result of first rebasing lockfile onto main, then merging that into the egraph branch. It's compiling and passing some tests there, but not all of them, which I'm trying to sort out today.

GitHub

[ignore] Lockfile on egraph by alexcb · Pull Request #12922 · dag...

tawny apex Apr 8, 2026, 6:30 PM

#

--> https://discord.com/channels/707636530424053791/1491197590174302208

earnest spruce Apr 8, 2026, 6:31 PM

#

I'm currently working on TestLockfile/TestLiveDiscoversQueryEntries in particular -- I think there's some sort of race where the client is shutting down before the lockfile gets written.

uneven plinth Apr 8, 2026, 6:31 PM

#

@tawny apex were the lockfile tests passing originally before the rebase?

tawny apex Apr 8, 2026, 6:32 PM

#

I would not bet on that

#

Very possible that it's a genuine bug

earnest spruce Apr 8, 2026, 6:32 PM

#

hahah..... oh dear. yeah that's a good question to see if the failing ones were initially failing.

tawny apex Apr 8, 2026, 6:32 PM

#

Or a dumb test, or both

earnest spruce Apr 8, 2026, 6:32 PM

#

here's the current state after my merge:

--- FAIL: TestLockfile (0.00s)
    --- PASS: TestLockfile/TestWorkspaceUpdateRequiresLockfile (1.22s)
    --- PASS: TestLockfile/TestFromLockfilePinnedUsesPinEntry (1.93s)
    --- PASS: TestLockfile/TestFromLockfileFrozenUsesFloatEntry (1.95s)
    --- FAIL: TestLockfile/TestLiveDiscoversQueryEntries (3.07s)
    --- PASS: TestLockfile/TestFromLockfileDisabledIgnoresEntry (3.76s)
    --- FAIL: TestLockfile/TestFromLockfileLiveRefreshesEntry (4.10s)
    --- FAIL: TestLockfile/TestFromLockfilePinnedRefreshesFloatEntry (4.16s)
    --- PASS: TestLockfile/TestFromLockfileFrozenRequiresEntry (1.14s)
    --- FAIL: TestLockfile/TestLiveDiscoversGitEntries (10.30s)
    --- PASS: TestLockfile/TestLockUpdateRefreshesExistingEntry (7.30s)
    --- PASS: TestLockfile/TestGitBranchFrozenUsesFloatEntry (10.43s)
    --- PASS: TestLockfile/TestLockUpdateRefreshesExistingGitEntry (6.46s)
    --- FAIL: TestLockfile/TestLiveNestedQuery (6.53s)
    --- FAIL: TestLockfile/TestModuleSourceFrozenUsesLockedEntry (8.71s)
    --- PASS: TestLockfile/TestWorkspaceUpdate (8.73s)
    --- FAIL: TestLockfile/TestLockUpdateRefreshesExistingModuleResolveEntry (10.73s)
    --- FAIL: TestLockfile/TestWorkspaceUpdateNestedQuery (9.51s)
    --- FAIL: TestLockfile/TestLiveModuleCall (6.22s)
    --- FAIL: TestLockfile/TestLiveDiscoversModuleSourceEntries (11.15s)
    --- FAIL: TestLockfile/TestGitBranchPinnedRefreshesFloatEntry (12.20s)

tawny apex Apr 8, 2026, 6:33 PM

#

If it's only lockfile tests failing post rebase/merge, that's good news IMO 🙂

uneven plinth Apr 8, 2026, 6:33 PM

#

No there's still other test failures but there the known ones I have left on egraph too

earnest spruce Apr 8, 2026, 6:33 PM

#

I just re-ran it, and see that some flipped from passing to failing, so defintely feels like a race

uneven plinth Apr 8, 2026, 6:36 PM

#

@earnest spruce Test failures I have left right now are:

some non-deterministic output for TestTelemetry, squashed most of them now but the last non-determinism bits are tricky
a few TestLLMs
Legacy enum support and legacy service support... which is a real nightmare but tackling it now
Some unit test failures due to stale tests which would show up in test-base

So those ones will still be red for you. But if you see something outside that lemme know

earnest spruce Apr 8, 2026, 6:37 PM

#

thanks for that heads up, yeah I noticed a TestContainer ...network.None label related test was failing and went and confirmed it was also failing under your branch so I simply ignored it for now.

uneven plinth Apr 8, 2026, 6:38 PM

#

uneven plinth <@691818044368093268> Test failures I have left right now are: * some non-determ...

Goal is to finish squashing those today so egraph is green+clean ready to merge tomorrow or whenever the release is done by

uneven plinth Apr 8, 2026, 6:38 PM

#

earnest spruce thanks for that heads up, yeah I noticed a TestContainer `...network.None` label...

Oh yeah that one I think I pushed already, was just a fat fingered codebase-wide rename

#

@earnest spruce #1488639315255890012 message I forget what the state of the lockfile PR was at the time you took it over but I had a bunch of feedback on the approach when I had reviewed it last. There were a bunch of issues so it may be worth just addressing those, might fix any race conditions as a side effect

tawny apex Apr 8, 2026, 6:52 PM

#

@uneven plinth quick summary:

You did your initial review. Had specific concerns
I incorporated your feedback, but did not fully test or deslopify the result
Asked Alex for help, he took over the branch (2nd priority after the Dockerfile work)

#

So, yes your first round of feedback was taken into account, but no I can't tell whether it was applied fully and to your satisfaction 🙂

#

I think in spirit we agreed on the direction - 1) avoid round trip on every lockfile lookup; 2) don't force every lookup to go through a recursive dagql call; 3) treat the problem as a specialized sync problem ("locksync" rather than "filesync") because of multiple clients sharing a workspace & therefore its lockfile;

earnest spruce Apr 8, 2026, 6:56 PM

#

uneven plinth <@691818044368093268> https://discord.com/channels/707636530424053791/1488639315...

good point regarding the back and forth bits between the client and potential races there. I'll take a closer look and fix anything up I come across.

tawny apex Apr 8, 2026, 7:01 PM

#

Important context for that:

1 workspace = 1 lockfile
1 workspace may be shared by multiple clients (because modules inherit their caller's workspace, and its lockfile with it)
Concretely: all modules in my dependency chain will populate the same lockfile
Consequence 1: one lockfile shared by multiple clients (so don't couple the lockfile too tightly to a client)
Consequence 2: one session may involve multiple workspaces (so don't couple the lockfile too tightly to a session)

earnest spruce Apr 8, 2026, 8:40 PM

#

so I think I have a better understand of the race, BuildkitSessionServer.Run() is called and will wait on a context before closing: https://github.com/alexcb/dagger/blob/407eebeeff14ee5bccdd13441dd61dbf6458b7b5/engine/client/client.go#L743-L760

and it seems like this context gets closed before the POST /shutdown request comes back under https://github.com/alexcb/dagger/blob/407eebeeff14ee5bccdd13441dd61dbf6458b7b5/engine/client/client.go#L1034-L1037

The issue is that the POST triggers the lockfile sync, which then performs a filesync which depends on the filesyncer (created under https://github.com/alexcb/dagger/blob/407eebeeff14ee5bccdd13441dd61dbf6458b7b5/engine/client/client.go#L542 ), which is an attachable on the buildkit session.

uneven plinth Apr 8, 2026, 9:06 PM

#

earnest spruce so I think I have a better understand of the race, `BuildkitSessionServer.Run()`...

(I still gotta rename BuildkitSessionServer, it's not buildkit anymore 😄)

That does sound plausible to me though. We should feel free to re-arrange the shutdown logic however we need.

tawny apex Apr 8, 2026, 9:15 PM

#

uneven plinth (I still gotta rename BuildkitSessionServer, it's not buildkit anymore 😄) That...

I was just going to ask about that 🙂

uneven plinth Apr 8, 2026, 9:16 PM

#

tawny apex I was just going to ask about that 🙂

I'm saving the final renaming and trimming for last, mostly cause there's still bigger fish to fry left, but alos will be satisfying to finish on that note guy_fieri_chef_kiss

earnest spruce Apr 8, 2026, 9:17 PM

#

are the protos still the original buildkit protos, just moved into engine/session/...?

uneven plinth Apr 8, 2026, 9:18 PM

#

earnest spruce are the protos still the original buildkit protos, just moved into engine/sessio...

for the session attachables? yes

#

there's probably more trimming we could do of those since we only use a subset, but yeah saving that sort of superficial stuff for last now

earnest spruce Apr 8, 2026, 9:19 PM

#

yeah, just like a bunch of bk variables could eventually be renamed, but best to keep it as-is for now. we don't want to have to change every single line in the repo for this PR

earnest spruce Apr 8, 2026, 10:58 PM

#

@uneven plinth any chance you have an idea where the attachables session is getting closed? I've traced it to waiting on a c.ctx.Done() under https://github.com/alexcb/dagger/blob/407eebeeff14ee5bccdd13441dd61dbf6458b7b5/internal/buildkit/session/manager.go#L151

I think it's the engine that's initiating the shutdown, i.e. marking the context as done, or closing the attachbles connection rather than the client initiating the closing of the attachables connection.

#

I just changed ctx, cc, err := grpcClientConn(ctx, conn) to ctx, cc, err := grpcClientConn(context.TODO(), conn), and that solved the race, so it's defintely the ctx that gets passed into handleConn(....) that's triggering the premature shutdown

#

maybe this.... https://github.com/alexcb/dagger/blob/407eebeeff14ee5bccdd13441dd61dbf6458b7b5/engine/server/session.go#L1226 ?

uneven plinth Apr 8, 2026, 11:04 PM

#

earnest spruce <@949034677610643507> any chance you have an idea where the attachables session ...

It can be multiple places technically, i.e. if the main client disconnects entirely the attachables will close.

But https://github.com/sipsma/dagger/blob/0e8d78c59290c34b08265270a2aa767525f13703/engine/server/session.go#L1132-L1132 is the area you'd hit typically during graceful shutdown.

That'll get canceled after beginClosing is called https://github.com/sipsma/dagger/blob/0e8d78c59290c34b08265270a2aa767525f13703/engine/server/session.go#L340-L340

#

Among other places that is called in the handler for /shutdown itself here: https://github.com/sipsma/dagger/blob/0e8d78c59290c34b08265270a2aa767525f13703/engine/server/session.go#L1277-L1277

#

so I'd make sure that you take care of anything that needs session attachables before you call beginClosing()

earnest spruce Apr 8, 2026, 11:06 PM

#

ahh, that's super helpful, i'll see if I can move sess.beginClosing() further down.

uneven plinth Apr 8, 2026, 11:10 PM

#

yeah sorry there's a lot of cancellation and shutdown paths that have accumulated over the years.

For theseus I also had to modify this whole area of the code significantly to help ensure that all clients are really for sure fully disconnected before trying to persist the cache, to ensure it is in a settled consistent state (aka I spent like 2 days debugging why the cache was occasionally off after reloads 😅)

#

It's probably worth a cleanup+consolidation pass sometime in the future

earnest spruce Apr 8, 2026, 11:17 PM

#

woohoo, moving it further down solved the TestLockfile/TestLiveDiscoversQueryEntries failure I was hunting down. and in fact many other ones, now I'm down to only 4 failures:

--- FAIL: TestLockfile (0.00s)
    --- PASS: TestLockfile/TestWorkspaceUpdateRequiresLockfile (1.36s)
    --- PASS: TestLockfile/TestFromLockfilePinnedUsesPinEntry (2.75s)
    --- PASS: TestLockfile/TestLiveDiscoversQueryEntries (4.55s)
    --- PASS: TestLockfile/TestFromLockfilePinnedRefreshesFloatEntry (4.84s)
    --- PASS: TestLockfile/TestFromLockfileDisabledIgnoresEntry (5.39s)
    --- PASS: TestLockfile/TestFromLockfileLiveRefreshesEntry (5.80s)
    --- PASS: TestLockfile/TestLockUpdateRefreshesExistingEntry (3.30s)
    --- PASS: TestLockfile/TestWorkspaceUpdate (6.27s)
    --- PASS: TestLockfile/TestFromLockfileFrozenRequiresEntry (1.22s)
    --- PASS: TestLockfile/TestWorkspaceUpdateNestedQuery (7.43s)
    --- PASS: TestLockfile/TestGitBranchPinnedRefreshesFloatEntry (7.46s)
    --- PASS: TestLockfile/TestFromLockfileFrozenUsesFloatEntry (1.76s)
    --- FAIL: TestLockfile/TestLiveDiscoversGitEntries (7.79s)
    --- FAIL: TestLockfile/TestLiveNestedQuery (3.30s)
    --- PASS: TestLockfile/TestLockUpdateRefreshesExistingGitEntry (5.48s)
    --- FAIL: TestLockfile/TestLiveModuleCall (5.52s)
    --- FAIL: TestLockfile/TestGitBranchFrozenUsesFloatEntry (5.30s)
    --- PASS: TestLockfile/TestLockUpdateRefreshesExistingModuleResolveEntry (20.95s)
    --- PASS: TestLockfile/TestModuleSourceFrozenUsesLockedEntry (19.89s)
    --- PASS: TestLockfile/TestLiveDiscoversModuleSourceEntries (21.28s)

tawny apex Apr 8, 2026, 11:19 PM

#

I sense a rebase in my near future 🙂

earnest spruce Apr 8, 2026, 11:20 PM

#

uneven plinth yeah sorry there's a lot of cancellation and shutdown paths that have accumulate...

yeah no doubt, it's pretty gnarly code with the various connection callbacks. The cache debugging issue sounds rather challenging.

tawny apex Apr 8, 2026, 11:20 PM

#

https://giphy.com/gifs/pudgypenguins-blocks-bricks-building-2dus6CIKNBlERhNa9B

Giphy

Pudgy Penguins

earnest spruce Apr 8, 2026, 11:22 PM

#

tawny apex https://giphy.com/gifs/pudgypenguins-blocks-bricks-building-2dus6CIKNBlERhNa9B

ae804b5d25ca71392a6ffb5675609e6e2b7444c0f8f8232c577de67cdfd69d6f.png

uneven plinth Apr 8, 2026, 11:26 PM

#

speaking of which I do still need to update my branch after all the merges to main last 24h... lemme finish my current task and I'll get to that next. I'm currently working on a fix to TestLegacy/TestContainerAsService, which despite being a stupid test for obscure things (why I saved it for the end) did manage to reveal a flaw in the new lazy evaluation logic elmofire Half-way through fixing it though

#

I'll let you know once I've done a merge commit from main. Might be easier for me to do that and then you to merge that from my branch, so then you don't have to resolve new conflicts on your own outside of the code you were actually changing

whole ibex Apr 8, 2026, 11:28 PM

#

my branch after all the merges to main last 24h...
sorry about that 😇 hope its gonna be fine 🤞

earnest spruce Apr 8, 2026, 11:28 PM

#

uneven plinth I'll let you know once I've done a merge commit from main. Might be easier for m...

yeah best to have only one of us resolve conflicts, rather than two different ways to resolve them 😆

#

plus you know your branch much more 😛

uneven plinth Apr 8, 2026, 11:29 PM

#

whole ibex > my branch after all the merges to main last 24h... sorry about that 😇 hope it...

No apology needed, I appreciate you checking before merge. Yeah it's not gonna be a bad one at all based on what I saw, just saving it for after I finish up my current changes

uneven plinth Apr 9, 2026, 12:51 PM

#

Merged main into my PR. Also got all of the tests that were failing passing locally! So provided I didn't break any other tests in the process, should have all green tests in CI hyperfastparrot

earnest spruce Apr 9, 2026, 7:38 PM

#

failed to read file: failed to create diff copy client: rpc error: code = Unavailable desc = rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: only one connection allowed", weeeeee.

I wonder if the filesync client needs a lock?

uneven plinth Apr 9, 2026, 7:38 PM

#

earnest spruce `failed to read file: failed to create diff copy client: rpc error: code = Unava...

You might be hitting the wrong client

earnest spruce Apr 9, 2026, 7:38 PM

#

weird thing is I only see a single call to the lockfile flush?

earnest spruce Apr 9, 2026, 7:39 PM

#

uneven plinth You might be hitting the wrong client

good idea, this seems to be occuring when there's nested clients.

uneven plinth Apr 9, 2026, 7:40 PM

#

earnest spruce good idea, this seems to be occuring when there's nested clients.

Yeah I forget the exact context, but I hit a similar error once with secrets and it came down to me trying to get a secret from a nested client that had exited

#

Make sure there's no clientIDs or other per-client state stored on cached objects which is then being used here.

Funnily, one large area of work with theseus has been finding places where we stored per-client state on objects, which just so happened to work because caching was broken enough that we never got cache hits and thus never hit bugs where we leaked client information across cache

#

But then when caching actually works, the bug gets revealed elmofire

earnest spruce Apr 9, 2026, 7:47 PM

#

yeah that sounds very similar to what I'm seeing:

    if client.clientID == sess.mainClientCallerID {
        slog.Info("main client is shutting down")
        flushCtx := context.WithoutCancel(ctx)
        if err := srv.flushWorkspaceLocks(flushCtx, client); err != nil {

the main calling client is getting passed to flushWorkspaceLocks, but I suspect the flush needs to happen in the nested clients.

uneven plinth Apr 9, 2026, 7:48 PM

#

earnest spruce yeah that sounds very similar to what I'm seeing: ``` if client.clientID ==...

Is flush the place where it's written back to the client's filesystem?

#

If so this might actually be more like a design and/or test harness issue

earnest spruce Apr 9, 2026, 7:49 PM

#

https://github.com/alexcb/dagger/blob/33eb5f0a73d041d9daf54866732ebfe8580c2713/engine/server/session.go#L1711-L1765

#

it pulls out the session, which then has a bunch of lockfiles which are tied to workspaces which then have a workspaceCtx which pulls out a connection from there, AFAIK

uneven plinth Apr 9, 2026, 7:51 PM

#

uneven plinth If so this might actually be more like a design and/or test harness issue

It's kind of ambiguous what the behavior should be here. In real-world use cases you do only want to export back to the main client. But our tests make heavy use of nested clients, and we in this test case essentially want the nested client to be more like a simulated main client

#

I guess the most coherent thing would be to always export back to the client that loaded the lockfile in the first place

#

that might be a significant design departure though. I don't remember whether nested CLI clients call /shutdown or not

tawny apex Apr 9, 2026, 7:53 PM

#

@uneven plinth @earnest spruce will catch up as soon as release emergency is over

earnest spruce Apr 9, 2026, 7:54 PM

#

I'm not entirely sure if this is a nested CLI client issue, it's getting triggered by

            "container": {
                "from": {
                    "file": {
                        "contents": "3.23.3\n"
                    }
                }
            }
        }

#

where as it seems to work when it's simply a container.from (without any file)

uneven plinth Apr 9, 2026, 7:54 PM

#

earnest spruce I'm not entirely sure if this is a nested *CLI* client issue, it's getting trigg...

can you link to the code for the test case that fails with this error

earnest spruce Apr 9, 2026, 7:54 PM

#

https://github.com/alexcb/dagger/blob/33eb5f0a73d041d9daf54866732ebfe8580c2713/core/integration/lockfile_test.go#L241-L255

uneven plinth Apr 9, 2026, 7:55 PM

#

earnest spruce https://github.com/alexcb/dagger/blob/33eb5f0a73d041d9daf54866732ebfe8580c2713/c...

ok yeah it is a nested client in that case https://github.com/alexcb/dagger/blob/33eb5f0a73d041d9daf54866732ebfe8580c2713/core/integration/lockfile_test.go#L246

#

Some of the other tests use hostDaggerExec, which would avoid this issue technically

#

but more in the sense of hiding from it rather than solving it

earnest spruce Apr 9, 2026, 7:56 PM

#

uneven plinth Some of the other tests use `hostDaggerExec`, which would avoid this issue techn...

ahh I see now, it's a hostDaggerExec (which works), vs With(daggerExec(....)), which doesn't work

tawny apex Apr 9, 2026, 8:56 PM

#

Now catching up...

earnest spruce Apr 9, 2026, 10:00 PM

#

I'm thinking it's that the flush is occurring under the main client shutdown; however at that point the nested clients will already have been closed.

#

maybe we need to scan through the workspaces and if any are associated with the client shuttingdown (either main, or nested), then we do a flush?

uneven plinth Apr 9, 2026, 10:06 PM

#

earnest spruce maybe we need to scan through the workspaces and if any are associated with the ...

i would check if nested CLI clients even call /shutdown today. If they do, then you'd need to update the accounting code for all of this to track which client a workspace+lockfile were loaded from and then trigger the export back on their shutdown. If they don't call /shutdown, then we'd need a broader change

#

The other option of course is to not care about nested clients since it's outside the realm of 99% of real world use cases, but we would rule out some legit uses like with the engine-dev playground container

earnest spruce Apr 9, 2026, 10:09 PM

#

I have a printf that's being hit when if client.clientID == sess.mainClientCallerID { ... } else { printf("here") }, so I think it's being called by nested.

#

this is feeling pretty hacky, but the test is now passing with this change: https://github.com/alexcb/dagger/commit/e068b39b403bc81efbe52e838517d619a00acd0c

GitHub

hacky fix · alexcb/dagger@e068b39

Signed-off-by: Alex Couture-Beil alex@mofo.ca

#

apologies for all the noisy printfs in this

#

the main changes are that I'm calling srv.flushWorkspaceLocks from all shutdown instances now, and that I also changed the flushing code to only flush matching clients via a if state.ws.ClientID == client.clientID {

uneven plinth Apr 9, 2026, 10:17 PM

#

earnest spruce the main changes are that I'm calling `srv.flushWorkspaceLocks` from all shutdow...

that sounds reasonable tbh

#

at least directionally

earnest spruce Apr 9, 2026, 10:18 PM

#

looks like my change has now caused TestGitBranchPinnedRefreshesFloatEntry to fail though, it used to pass before this 😞

uneven plinth Apr 9, 2026, 10:19 PM

#

https://klipy.com/gifs/blinds

Klipy

Peter Griffin Struggles with Blinds

▶ Play video

tawny apex Apr 9, 2026, 10:22 PM

#

ok now REALLY catching u

earnest spruce Apr 9, 2026, 10:22 PM

#

oh it's failing with a

it could very well be that it used to be passing because the string used to contain "error: this is not possible", at least that's me being optimistic 😆

tawny apex Apr 9, 2026, 10:22 PM

#

@earnest spruce is the architecture of lockfile itself an issue after all;?

earnest spruce Apr 9, 2026, 10:24 PM

#

tawny apex <@691818044368093268> is the architecture of lockfile itself an issue after all;...

I'm not sure to be honest, it seems like I'm making progress on adding some extra steps in how the flush is called. it might work.

#

I'm avoiding refactoring anything right now just to see if I can get the tests passing first, then maybe see if we can optimize things later.

tawny apex Apr 9, 2026, 10:25 PM

#

Can i help get to the answer?

earnest spruce Apr 9, 2026, 10:26 PM

#

I only have two failings tests now:

    --- FAIL: TestLockfile/TestLiveDiscoversGitEntries (2.79s)
    --- FAIL: TestLockfile/TestGitBranchPinnedRefreshesFloatEntry (1.84s)

if I can get these working I think we're in good shape.

tawny apex Apr 9, 2026, 10:26 PM

#

thanks for carrying it

#

I'm very curious to see what happens when I rebase workspace to egraph (via your new lockfile)

earnest spruce Apr 9, 2026, 10:28 PM

#

tawny apex I'm very curious to see what happens when I rebase `workspace` to egraph (via yo...

yeah that'll be another big moment

earnest spruce Apr 9, 2026, 10:33 PM

#

earnest spruce oh it's failing with a ``` Error: "\n{\n \"git\": {\n ...

so I checked against an earlier commit and it was legit passing earlier, so I gotta figure out what changed with this.

earnest spruce Apr 9, 2026, 11:47 PM

#

so this git test is a bit racy, when I run ./hack/build && ./hack/with-dev go test -count=1 -v -run TestLockfile/TestGitBranchPinnedRefreshesFloatEntry ./core/integration/ it passes, however when I change my test regex to TestLockfile/TestGit, it fails since it runs two tests:

--- FAIL: TestLockfile (0.00s)
    --- PASS: TestLockfile/TestGitBranchFrozenUsesFloatEntry (2.75s)
    --- FAIL: TestLockfile/TestGitBranchPinnedRefreshesFloatEntry (3.96s)

i think the only real difference between these tests is that dagger is run with either "--lock=pinned" or "--lock=frozen". I wonder if these args are missing from the arg digest and are getting deduped?

uneven plinth Apr 9, 2026, 11:59 PM

#

earnest spruce so this git test is a bit racy, when I run `./hack/build && ./hack/with-dev go t...

yeah looks like the lockfile operations run in GitRepository.ref, which at least for public repos will be cached correctly between clients. So the first client gets what it expects, second client probably hits cache GitRepository.ref and thus never even executes anything lockfile related

#

I was under the impression when I last reviewed that all the lockfile operations ran in operations that were per-client (which would avoid this), but I must have missed this case

#

Kind of gets at the broader problem that the approach of trying to do this stuff "inline" in operations is gonna be finnicky for these sorts of reasons

#

conceptually it would be better to just integrate it into the cache itself

#

the cache has some client/session awareness already to deal with stuff like secrets/sockets

earnest spruce Apr 10, 2026, 12:02 AM

#

yeah it's a whole other dimension of things that can affect the execution, and changes the op's result 😐

earnest spruce Apr 10, 2026, 12:03 AM

#

uneven plinth conceptually it would be better to just integrate it into the cache itself

do you think that's a very big detour to do that?

tawny apex Apr 10, 2026, 12:03 AM

#

Pre-theseus I wouldn't have either dared to explore that direction

#

https://giphy.com/gifs/wreckedtbs-funny-tbs-yNFg0qdiJTX1sCTjNc

Giphy

Wrecked

uneven plinth Apr 10, 2026, 12:04 AM

#

earnest spruce do you think that's a very big detour to do that?

yeah it would be, but not as bad as it used to be. it'd be like 1 day of designing w/ codex and implementing

earnest spruce Apr 10, 2026, 12:04 AM

#

tawny apex https://giphy.com/gifs/wreckedtbs-funny-tbs-yNFg0qdiJTX1sCTjNc

https://tenor.com/ifLwickW0rU.gif

Tenor

tawny apex Apr 10, 2026, 12:04 AM

#

What's finnicky about the current design though?

#

as opposed to less elegant / fast / simple

uneven plinth Apr 10, 2026, 12:07 AM

#

tawny apex What's finnicky about the current design though?

it actually is the same sort of problem that @whole ibex hit with git caching. Query.git itself is executed per-client but it returns an object cache-scoped to the repo (https://github.com/alexcb/dagger/blob/33eb5f0a73d041d9daf54866732ebfe8580c2713/core/schema/git.go#L730-L755)

So subsequent calls chained off of it can share cache

#

Actually, now that I say this out loud... That "worked" in the previous system because the cache just dissappeared and wasn't actually persistent. But now in theseus, cache is actually retained when it's referenced by something that is persistent (a.k.a. it actually makes sense now)

But that may actually not be what we desire anymore. We don't want to have persistent cache for the per-client Query.git op

#

(This has been the story of my life for the last 2 months, some part of our system relied on broken caching behavior and thus once caching actually works it needs updating)

tawny apex Apr 10, 2026, 12:11 AM

#

After "the engine is too fast", introducing "the cache works too well"

#

reverse bottlenecks

uneven plinth Apr 10, 2026, 12:11 AM

#

@earnest spruce does the flake disappear if you delete this chunk of code? https://github.com/alexcb/dagger/blob/33eb5f0a73d041d9daf54866732ebfe8580c2713/core/schema/git.go#L730-L755

uneven plinth Apr 10, 2026, 12:11 AM

#

tawny apex reverse bottlenecks

yes, good problems to have in the long term

earnest spruce Apr 10, 2026, 12:13 AM

#

uneven plinth <@691818044368093268> does the flake disappear if you delete this chunk of code?...

why yes it does

uneven plinth Apr 10, 2026, 12:14 AM

#

earnest spruce why yes it does

I know I'm a broken record but it feels soooooooooooo good when it's actually possible to predict the behavior of caching, it's such a goddamn relief after years of nonsense

uneven plinth Apr 10, 2026, 12:15 AM

#

earnest spruce why yes it does

Ok so I think we probably do want to delete that code. But we need replacements elsewhere. Basically need to "shift" the places where we scope cache by content.

E.g. by deleting that chunk of code each GitRepository operation will be per-client (so run once per-client), but we definitely want to attach a content digest at the end of GitRepository.ref. That way we retain content caching but in a more fine grain manner

#

That should avoid the issue for the lockfile stuff, we can probably put off trying to do deep cache integration (even though that might be a good long term direction)

earnest spruce Apr 10, 2026, 12:17 AM

#

uneven plinth Ok so I think we probably do want to delete that code. But we need replacements ...

ah that seems reasonable, since we'll force the ops to execute which will let us check the lockfile, then once we know exactly what the ref will be, that will go into the content digest key.

tawny apex Apr 10, 2026, 12:17 AM

#

@uneven plinth if we punt on deep cache integration, does the overall shape of the lockfile feature (CLI flags, "lock modes", location and format of the lockfile itself...) allow doing it later without breaking the contract?

uneven plinth Apr 10, 2026, 12:20 AM

#

tawny apex <@949034677610643507> if we punt on deep cache integration, does the overall sha...

Yeah for sure. It'd be an implementation detail only. Changing the implementing to have deep cache integration just helps avoid situations like what we currently hit. We were lucky in this case since we actually really wanted to delete that chunk of code anyways for basic correctness, and it also happened to unblock the lockfile problem.

But highly possible we might hit a situation in the future where we aren't so lucky. Deeper cache integration would make it all way easier

tawny apex Apr 10, 2026, 12:25 AM

#

BTW one stopgap available to us if needed: we could reduce the scope of lockfile-enabled operations in this branch. The only one we really need for merging workspace, is module.resolve. I added git and container operations mostly to make sure it generalized as I hoped, and also because it's a cool feature to have with little marginal cost.

But, if removing container and git ops from lockfile helps get this branch merged, it doesn't affect workspace

earnest spruce Apr 10, 2026, 12:31 AM

#

tawny apex BTW one stopgap available to us if needed: we could reduce the scope of lockfile...

it's only the git related tests that are failing, the container.From lockfile tests are working

tawny apex Apr 10, 2026, 12:34 AM

#

If you want to pull the ripcord and punt on git, it's an option

#

(we need module lookup though)

#

your call

earnest spruce Apr 10, 2026, 12:36 AM

#

tawny apex (we need module lookup though)

do the modules use any of the dagger git API?

tawny apex Apr 10, 2026, 12:39 AM

#

earnest spruce do the modules use any of the dagger git API?

What I mean is: all the lockfile features are awesome, but only one is a blocker for merging the workspace branch: patching moduleSource() to use lockfile when looking up pins for module addresses. That's because the new workspace config file .dagger/config.toml doesn't contain pins; so we need a lockfile to persist module pins in .dagger/lockfile. The beauty of the lockfile feature is that it generalizes beyond just pinning dependencies for dagger modules installed in your workspace: really it can pin any engine lookup operation, at runtime. It just happens that one of those operations is looking up the pinned ref of a module

earnest spruce Apr 10, 2026, 12:44 AM

#

tawny apex What I mean is: all the lockfile features are awesome, but only one is a blocker...

I do see some references to dag.Select(ctx, gitRef, so maybe the pinning of modules depends on git? It's probably more robust to also get the git pinning working, but I really don't know if it's a blocker for module pinning or not.

tawny apex Apr 10, 2026, 12:46 AM

#

I'm relatively confident that they are decoupled - yes there is a dependency between module ref lookup and git ref lookup. But from a lockfile perspective they are cleanly decoupled. (so a single module lookup on a remote git address, will produce 2 lockfile entries, one of each type)

tawny apex Apr 10, 2026, 12:47 AM

#

tawny apex I'm relatively confident that they are decoupled - yes there is a dependency bet...

100% confident that I designed it that way. Moderately confident that the agent didn't change the implementation behind my back 😛

earnest spruce Apr 10, 2026, 12:48 AM

#

I can give it a try, if you're feeling a bit of pressure to get the workspace PR rebased. We can always work on the git changes in a seperate PR

tawny apex Apr 10, 2026, 12:48 AM

#

It depends how you feel about the remaining git issues

earnest spruce Apr 10, 2026, 12:50 AM

#

I would be tempted to Erik's idea of moving the digests elsewhere, and see if that doesn't break any of the existing regular (non-lockfile) git tests. But if those explode then maybe yes cut out the git pinning otherwise

tawny apex Apr 13, 2026, 6:56 PM

#

@earnest spruce aloha 🙂 How is this looking?

#

(Trying to make a plan for workspace for this week)

earnest spruce Apr 13, 2026, 6:58 PM

#

I got nerdsniped on cgroups stuff 😁

#

I wrote a TestGit/... test to verify caching is still working after I removed the content digest overrides, and it still is, so I'm wondering if we even need to add them back in.

#

here's the new, rather simple test: https://github.com/alexcb/dagger/blob/306a0964c709ad97ca08c77d4793e9e3771f1610/core/integration/git_test.go#L1834-L1851 but it's working

#

so I think we would be good to try rebasing workspace on the lockfile-on-egraph branch we have.

tawny apex Apr 13, 2026, 7:02 PM

#

ah! nice 🙂

#

maybe we just overwrite lockfile with the egraph version? And keep the current version as backup somewhere

#

It's not like we have any path to shipping the non-lockfile version..

earnest spruce Apr 13, 2026, 7:04 PM

#

tawny apex maybe we just overwrite `lockfile` with the egraph version? And keep the current...

that sounds good, can I just do a git push -f when the time comes?

tawny apex Apr 13, 2026, 7:05 PM

#

Yes 👍

#

(don't forget the backup though)

earnest spruce Apr 13, 2026, 7:06 PM

#

I don't think I have direct dagger push access (which is a good thing), so I pushed a backup to https://github.com/alexcb/dagger/tree/lockfile-before-egraph

#

well maybe not a good thing since I would be able to directly push to lockfile, but I can push it to my own fork and maybe you can do the push on my behalf when its ready?'

tawny apex Apr 13, 2026, 7:08 PM

#

earnest spruce well maybe not a good thing since I would be able to directly push to lockfile, ...

Weird you have write access to the repo

#

@earnest spruce can you try again? I upgraded from "write" to "maintain"

#

Weird that "write" doesn't let you push a branch

earnest spruce Apr 13, 2026, 7:10 PM

#

$ git push dagger lockfile-before-egraph 
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
remote: error: GH013: Repository rule violations found for refs/heads/lockfile-before-egraph.
remote: Review all repository rules at https://github.com/dagger/dagger/rules?ref=refs%2Fheads%2Flockfile-before-egraph
remote: 
remote: - Cannot create ref due to creations being restricted.
remote: 
To github.com:dagger/dagger.git
 ! [remote rejected]     lockfile-before-egraph -> lockfile-before-egraph (push declined due to repository rule violations)
error: failed to push some refs to 'github.com:dagger/dagger.git'

#

looks like i'm running into https://github.com/dagger/dagger/rules/1146738?ref=refs%2Fheads%2Flockfile-before-egraph

GitHub

Rulesets · dagger · dagger/dagger

Automation engine to build, test and ship any codebase. Runs locally, in CI, or directly in the cloud - Rulesets · dagger · dagger/dagger

earnest spruce Apr 13, 2026, 7:12 PM

#

tawny apex Weird that "write" doesn't let you push a branch

looks like it's specific to creating a new branch

tawny apex Apr 13, 2026, 7:21 PM

#

Ah I see

#

@earnest spruce so that means you should be able to do the push -f lockfile then

earnest spruce Apr 13, 2026, 7:22 PM

#

tawny apex <@691818044368093268> so that means you should be able to do the `push -f lockfi...

probably... I could try it right now if we really want.

tawny apex Apr 13, 2026, 7:23 PM

#

If it's ready, then yeah - no point in waiting. Nobody else is using that branch

earnest spruce Apr 13, 2026, 7:25 PM

#

looks like I need to do another rebase onto egraph first, plus fix a lint issue. I'll do that and if I get my tests passing I'll do the push.

earnest spruce Apr 13, 2026, 10:12 PM

#

tawny apex <@691818044368093268> so that means you should be able to do the `push -f lockfi...

I just tried it now, and I can't force push either 🙁

#

my latest is on https://github.com/alexcb/dagger/tree/lockfile-on-egraph under f3ca85ef0e67bacf031eb4b1b53dab1eed3262f3

tawny apex Apr 13, 2026, 10:14 PM

#

@earnest spruce try again? I upgraded you to maintain again

whole ibex Apr 13, 2026, 10:14 PM

#

earnest spruce looks like I need to do another rebase onto egraph first, plus fix a lint issue....

did you do the merge commit for egraph ? Looking into it rn otherwise

earnest spruce Apr 13, 2026, 10:15 PM

#

whole ibex did you do the merge commit for egraph ? Looking into it rn otherwise

I only ever resolved it on my branch, here's the patch I had to apply to resolve the conflict: https://gist.github.com/alexcb/d93a89c7122815e61f6cada1c57e732f

#

I was sort of afraid to step on people's toes if I pushed it directly to the egraph branch

whole ibex Apr 13, 2026, 10:16 PM

#

earnest spruce I only ever resolved it on my branch, here's the patch I had to apply to resolve...

if youre confident with it, push on it, i think

#

its just you / lockfile

earnest spruce Apr 13, 2026, 10:16 PM

#

whole ibex if youre confident with it, push on it, i think

will do! give me a sec

earnest spruce Apr 13, 2026, 10:17 PM

#

tawny apex <@691818044368093268> try again? I upgraded you to `maintain` again

thanks! I'll try it once I get the above egraph conflict updated.

#

wow, and here I thought I was going to run into more permissions because the egraph branch is under Erik's fork, but apparently it worked: https://github.com/sipsma/dagger/commit/e482d5c5a1e021fd0b949ec4bfe3f3632a6ba78a

GitHub

GitHub - sipsma/dagger at egraph

A portable devkit for CI/CD pipelines. Contribute to sipsma/dagger development by creating an account on GitHub.

GitHub

Merge remote-tracking branch 'dagger/main' into egraph · sipsma/da...

tawny apex Apr 13, 2026, 10:29 PM

#

@earnest spruce nice, so you were able to push lockfile?

#

https://giphy.com/gifs/nobodiestv-tv-land-tvland-3og0ICZh82LEsNjHoc

Giphy

nobodies.

earnest spruce Apr 13, 2026, 10:30 PM

#

another rebase laughcry

#

then trying the --force push

earnest spruce Apr 13, 2026, 10:33 PM

#

tawny apex <@691818044368093268> try again? I upgraded you to `maintain` again

💀 I have no idea why it's still not working:

remote: error: GH013: Repository rule violations found for refs/heads/lockfile.
remote: Review all repository rules at https://github.com/dagger/dagger/rules?ref=refs%2Fheads%2Flockfile
remote: 
remote: - Cannot force-push to this branch
remote: 
To github.com:dagger/dagger.git
 ! [remote rejected]     lockfile -> lockfile (push declined due to repository rule violations)

#

@tawny apex do you want to just try running git checkout lockfile && git reset --hard c9aeb2d70079ef27e96cc55a8d4cf199657289ef && git push --force-with-lease?

tawny apex Apr 13, 2026, 10:39 PM

#

@earnest spruce try again?

earnest spruce Apr 13, 2026, 10:40 PM

#

https://tenor.com/is2DrZRFUtf.gif

Tenor

#

it worked!

earnest spruce Apr 13, 2026, 10:56 PM

#

@tawny apex should I try out the workspace branch next, and see if I can get it rebased onto the new branch I just force pushed?

tawny apex Apr 13, 2026, 11:18 PM

#

@earnest spruce sorry got pulled into something else..

tawny apex Apr 13, 2026, 11:19 PM

#

earnest spruce <@488409085998530571> should I try out the `workspace` branch next, and see if I...

Yes please! That would be awesome

tawny apex Apr 15, 2026, 8:35 PM

#

@earnest spruce any luck? (ignore if you're off today, I don't know how you allocate your time throughout the week 🙏 )

whole ibex Apr 15, 2026, 8:37 PM

#

Happy to sync this week too if you wanna have a rubber duck, once release if out, will try to resolve some of the egraph conflicts, in the meantime

earnest spruce Apr 15, 2026, 8:53 PM

#

tawny apex <@691818044368093268> any luck? (ignore if you're off today, I don't know how yo...

just got back from a nice lunchtime walk, I've managed to get all the workspace commits cherry-picked, and pushed it to https://github.com/alexcb/dagger/tree/workspace-on-rebased-lockfile but so far it's only compiling, and now I'm sorting out some failing tests.

tawny apex Apr 15, 2026, 8:53 PM

#

nice!

#

@earnest spruce just to be clear, the tests were not passing before

earnest spruce Apr 15, 2026, 8:54 PM

#

ah well that's good news 😅

tawny apex Apr 15, 2026, 8:54 PM

#

So unfortunately it will be hard to identity if the rebase itself broke anything - moving target

#

BTW make sure to get latest workspace tip if you're going to mess with the tests, I'm actively refactoring them

earnest spruce Apr 15, 2026, 8:54 PM

#

I think I introduced a race/cache related bug to TestDirectory/TestDiff, because it's intermittently failing, and is also doing that on the lockfile branch too.

tawny apex Apr 15, 2026, 8:55 PM

#

@whole ibex is egraph itself very far behind main, with all the release fire-fighting?

earnest spruce Apr 15, 2026, 8:55 PM

#

tawny apex BTW make sure to get latest workspace tip if you're going to mess with the tests...

that's good to know, I've only picked up to 20399820f5b5b1c337bcc95cfaab6533f0fbfc9e which is from 2026-04-08

tawny apex Apr 15, 2026, 8:57 PM

#

@earnest spruce since we don't have the objective truth of "tests are green again" to tell us if rebase is successful: we should define an arbitrary truth, let's call it "Alex gut check" 🙂

If you think we're close to passing the Alex Gut Check on latest tip, I can freeze while you catch up to tip, then can do the switch?

#

I'm doing a pass on docs right now anyway

#

I think you only have very superficial changes to catch up to (like UI polish, changing the text of an error message, docs, and tests)

earnest spruce Apr 15, 2026, 8:58 PM

#

gut check defintely has some food poisioning right now, since TestDirectory should be failing 😁

tawny apex Apr 15, 2026, 8:59 PM

#

New success metric: "tests are red again" 😛

earnest spruce Apr 15, 2026, 8:59 PM

#

do we think we'll be able to merge egraph this or next week?

tawny apex Apr 15, 2026, 8:59 PM

#

earnest spruce do we think we'll be able to merge egraph this or next week?

cc @whole ibex @hidden current on that

#

It's been non-stop firefighting on main... 0.20.4, .5, now struggling to get .6 out. So everyone has been distracted.

#

But we can't afford to wait too long on egraph, or it will get harder and harder

earnest spruce Apr 15, 2026, 9:00 PM

#

yeah, i've been slightly afraid to ask.... seems like it's been a struggle

tawny apex Apr 15, 2026, 9:00 PM

#

We're in the final stretch though

#

The root cause for most of the problems (although not all) has been our attempt to merge workspace-plumbing fast, to sneak in as much of the workspace branch as we could without breaking backwards compat. Turns out, that was harder than planned, we had gaps in backwards compatibility. And, the problem was compounded by us not catching it before merge; or even after merge... we only caught it after release. Which created more problems because Dagger Cloud could no longer upgrade to latest release, but customers did start to upgrade...anyway

#

So anyway:

We are still committed to merging egraph asap
I don't think anyone had time to actually work on that
So in practice I don't know what we can realistically aim to merge it
Are you interested in working on that? 😛

earnest spruce Apr 15, 2026, 9:04 PM

#

I can help with another merge from main if needed

tawny apex Apr 15, 2026, 9:09 PM

#

@earnest spruce I believe there's a specific failing test holding back merge

#

(plus right now, main is probably a few commits ahead too because of exceptions made to the freeze for the release)

earnest spruce Apr 15, 2026, 9:12 PM

#

before our cloud/release issues I think it was only the python-sdk tests that were failing, but now it looks like there's a merge conflict plus many other tests that were failing due to release issues.

tawny apex Apr 15, 2026, 9:15 PM

#

assuming we get the tests green again on main (should be imminent), then the remaining issue are 1) the original python-sdk test, 2) the merge conflicts - to be expected since the release fixes involved module loading, routing of context directory... but hopefully the conflicts are superficial

whole ibex Apr 15, 2026, 9:22 PM

#

tawny apex <@274903880343748619> is `egraph` itself very far behind main, with all the rele...

i think its ok, there might be a cache conflict with the PR we introduced from Tibor, but it should be well scoped.

whole ibex Apr 15, 2026, 9:23 PM

#

tawny apex cc <@274903880343748619> <@336241811179962368> on that

Yes, we still wanna merge asap (I'll let Marcos and Tibor confirm), but it seems that the few conflicts are scoped. I wanna add 1 cache integration test I think, regarding

earnest spruce Apr 15, 2026, 9:31 PM

#

I just tried a quick merge, and the one that's confusing me is:

func (mod *Module) GetContextSource() *ModuleSource {
    if !mod.ContextSource.Valid {
        return nil
    }
<<<<<<< HEAD
    return mod.ContextSource.Value.Self()
=======

    return hashutil.HashStrings(
        contentDigest,
        contentCacheScope,
        "asModule",
        mod.AsModuleVariantDigest,
    ).String()
>>>>>>> dagger/main
}

#

from core/module.go

#

@unique robin added AsModuleVariantDigest under b35e37b0e042e8ca148e8580b52bec65846576b6 however that was to func (mod *Module) ContentDigestCacheKey() string { ....}, but in the egraph source the function it's trying to merge into is a different (or maybe renamed?) function called func (mod *Module) GetContextSource() *ModuleSource {

whole ibex Apr 15, 2026, 9:33 PM

#

earnest spruce I just tried a quick merge, and the one that's confusing me is: ``` func (mod *...

yes that's the thing i was mentioning above

#

I think its ok to remove that cache key change, and implement it in egraph style. But we need the test coverage for that (as a followup)

earnest spruce Apr 15, 2026, 9:36 PM

#

whole ibex I think its ok to remove that cache key change, and implement it in egraph style...

yeah a test will help a lot 🥺

unique robin Apr 15, 2026, 9:38 PM

#

That cache commit is about the following case: Two modules using the SAME toolchain, but with differing customizations fields (example: ignore pattern for a directory arg)

#

One should not cache hit for the other

tawny apex Apr 15, 2026, 10:03 PM

#

@earnest spruce @whole ibex can we clarify ownership of getting egraph merged? Can I / should I make you Alex the owner of that? Not to do all the work necessarily, just making sure someone knows what's going on and can authoritatively pull others to help

whole ibex Apr 15, 2026, 10:03 PM

#

tawny apex <@691818044368093268> <@274903880343748619> can we clarify ownership of getting ...

Alex is fine by me totally 👍

tawny apex Apr 15, 2026, 10:04 PM

#

IMO it makes sense for Alex to own it, since 1) merging egraph, and 2) rebasing lockfile on egraph, and 3) rebasing workspace on lockfile, are tightly coupled and would require a lot of back-and-forth - better to have the same person own all 3 IMO

#

It boils down to: do you have the bandwidth this week @earnest spruce , since I know you're not full time

earnest spruce Apr 15, 2026, 10:08 PM

#

I only have a few more hours left this week (I got some house stuff to tackle tomorrow and friday), I have a halfway merge here locally, just no test or anything for it.

#

I could see if my merge can pass a quick local test.

whole ibex Apr 15, 2026, 10:09 PM

#

earnest spruce I only have a few more hours left this week (I got some house stuff to tackle to...

Sorry if youre waiting on me, still on the release above, will switch soon, let me hack the test real fast

earnest spruce Apr 15, 2026, 10:18 PM

#

I just pushed my untested merge to https://github.com/alexcb/dagger/tree/egraph-merge , @whole ibex if you're able to come up with an integration test for it, that'll make me feel more comfortable.

#

I can have a quick try at manually testing it while you're wrapping up with the release

whole ibex Apr 15, 2026, 10:24 PM

#

earnest spruce I just pushed my *untested* merge to https://github.com/alexcb/dagger/tree/egrap...

https://dagger.cloud/dagger/traces/f77ef6ebebff48a20b8593e838d8197d?listen=baf8a9761217ea6d

Dagger Cloud

Browse and visualize Dagger traces.

#

let me push

whole ibex Apr 15, 2026, 10:38 PM

#

earnest spruce I just pushed my *untested* merge to https://github.com/alexcb/dagger/tree/egrap...

https://github.com/grouville/dagger/tree/add-caching-regression-test @unique robin the fail still seems to fail until I add the second commit, which actually salts the and relies on the internal bindings you added. I remember your PR actually fixed a cache issue, but this very test might still trigger another variation ?

whole ibex Apr 15, 2026, 10:42 PM

#

whole ibex https://github.com/grouville/dagger/tree/add-caching-regression-test <@707011193...

To run it (pre-egraph): dagger call engine-dev test --run '^TestToolchain/TestToolchainCustomizationCacheKey$' --test-verbose

whole ibex Apr 15, 2026, 10:45 PM

#

whole ibex https://github.com/grouville/dagger/tree/add-caching-regression-test <@707011193...

To me, I'm ok removing the salting of the cache key @earnest spruce because I think it's been failing for a while (never really worked)

earnest spruce Apr 15, 2026, 10:47 PM

#

I'm currently running the test via ./hack/with-dev go test -count=1 -v -run TestToolchain/TestToolchainCustomizationCacheKey ./core/integration/ and it seems to have stalled 😕

#

but this is under egraph

#

I'll try it via dagger call engine-dev test --run '^TestToolchain/TestToolchainCustomizationCacheKey$' --test-verbose in case it's different

#

yeah it's failing here too

=== CONT  TestToolchain/TestToolchainCustomizationCacheKey                                                                                                                                        
    testctx.go:174:                                                                                                                                                                               
                Error Trace:    /app/core/integration/toolchain_test.go:522                                                                                                                       
                                                        /go/pkg/mod/github.com/dagger/testctx/oteltest@v0.1.2/trace.go:133                                                                        
                                                        /go/pkg/mod/github.com/dagger/testctx@v0.1.2/middleware.go:25                                                                             
                                                        /go/pkg/mod/github.com/dagger/testctx@v0.1.2/testctx.go:150                                                                               
                Error:          Should not be: "1776293353521819777"                                                                                                                              
                Test:           TestToolchain/TestToolchainCustomizationCacheKey                                                                                                                  
                Messages:       different toolchain customizations must not cache-hit

#

I'm ok removing the salting of the cache key
I can push my merge in that case, and see how it goes.

#

give me a quick sec to try a few unrelated tests first.

#

I just pushed e104ce1 onto https://github.com/dagger/dagger/pull/11856

whole ibex Apr 15, 2026, 11:20 PM

#

earnest spruce I just pushed `e104ce1` onto https://github.com/dagger/dagger/pull/11856

LGTM 😍🙏

hidden current Apr 15, 2026, 11:29 PM

#

earnest spruce I just pushed `e104ce1` onto https://github.com/dagger/dagger/pull/11856

@earnest spruce are you available tomorrow? Our plan is to rebase Erik's PR with main and merge it so we can start dogfooding egraph ASAP. My understanding is that you can keep rebasing your Lockfile + workspaces on top of that, correct?

earnest spruce Apr 15, 2026, 11:30 PM

#

hidden current <@691818044368093268> are you available tomorrow? Our plan is to rebase Erik's ...

no sorry, I have plans tomorrow.

hidden current Apr 15, 2026, 11:30 PM

#

earnest spruce no sorry, I have plans tomorrow.

perfect, yes it's what I understood

#

are you ok in us merging the egraph branch if we get it green on CI?

earnest spruce Apr 15, 2026, 11:31 PM

#

oh yes please! the only last test failure are the python-sdk errors (and some linting, which seems easy enough to clean up).

#

once egraph is in main, I can rebase the lockfile and workspace branches onto it -- no need to worry about those.

hidden current Apr 15, 2026, 11:32 PM

#

earnest spruce once egraph is in main, I can rebase the lockfile and workspace branches onto it...

awesome, thx 🙏 . Looking forward to hopefully merge that tomorrow

#Lockfile on egraph 🧵