It happened once but we shrugged it off | Dagger | Page 1

kind roost Jul 9, 2025, 3:14 AM

#

Does it happen to multiple people? Something weird must be going on since I can't repro it by just having v0.18.12 CLI installed, engine running and then doing a go run of:

package main

import (
    "context"
    "fmt"

    "dagger.io/dagger/dag"
)

func main() {
    ctx := context.Background()

    out, err := dag.Container().From("alpine:latest").WithExec([]string{"echo", "Hello, Dagger!"}).Stdout(ctx)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    fmt.Println("Output:", out)
}

where the go.mod specifies v0.18.11

#

If you have a cloud trace where it happened there could be something useful there in the early steps where it's connecting to the engine

woven leaf Jul 9, 2025, 4:00 AM

#

kind roost Does it happen to multiple people? Something weird must be going on since I can'...

I hit it once in the past -- haven't bumped to v0.18.12 yet

azure vortex Jul 9, 2025, 10:03 AM

#

yeah, i can't repro either.

one thing that would be good to know is what the env looks like when this happens

#

actually wait the original error shared here #p-envelope message is interesting
it feels like it's not the sdk provisioning code

deep relic Jul 9, 2025, 3:42 PM

#

@left otter @paper sigil this happened before to one of you I think?

paper sigil Jul 9, 2025, 3:42 PM

#

Happened to me

#

I'm honestly not sure why either. I PR'd the upgrade of the dagger deps to container-use

azure vortex Jul 9, 2025, 3:43 PM

#

if anyone has a full trace, logs, etc, would be a lot easier to debug 👀

#

one thought - you wouldn't have upgraded your local dagger cli and also been using the older container-use without the bump?

#

and been running things at the same time

paper sigil Jul 9, 2025, 3:45 PM

#

this is the error/when I reported: #p-envelope message

#

Going to look in cloud under mccallister-dev org for a trace @azure vortex

paper sigil Jul 9, 2025, 4:11 PM

#

ok, I believe it's one of these (sorry I can't be more specific):

https://dagger.cloud/mccallister-dev/traces/69b8db6f68c6428ee5bc5378a71f0182
https://dagger.cloud/mccallister-dev/traces/58c62d4e7e82d2ed5de540bf37b02b0d

(timeout due to inactivity)

woven leaf Jul 9, 2025, 11:12 PM

#

So, I've been looking into the "No such container" error recently, and here's a quick update:

The error seems to be coming from BuildKit’s connection helper library when it runs this:

docker exec -i dagger-engine-v0.18.11 buildctl dial-stdio

I couldn't replicate the exact root cause Andrea encountered yesterday, but I did reproduce a similar issue by intentionally running the command against a non-existent container.

Simple Repro Steps
Create a container with a mismatched name

docker run -d --name dagger-engine-2e6d64d8564a25d1 \
  --privileged registry.dagger.io/engine:v0.18.11

Attempt to exec using the expected (but incorrect) name

docker exec -i dagger-engine-v0.18.11 buildctl dial-stdio

This gives exactly the error we're seeing:

Error response from daemon: No such container: dagger-engine-v0.18.11

Underlying Flow

The SDK provisions containers using a Docker driver (docker.go).
The SDK returns a connection helper URL with the expected container name (docker.go).
The connection helper (from the BuildKit library) uses docker exec internally to establish the connection.

If the container names don't match, we get the error above.

Where GC Logs Go

The logs in the garbage collection function, which could explain what's happening, are sent via OpenTelemetry and don't show up in stderr by default

Current theory

User has container dagger-engine-v0.18.12 running.
SDK v0.18.11 expects a different container (dagger-engine-v0.18.11).
SDK tries (but possibly fails silently) to garbage collect the incorrect container (v0.18.12).
Something prevents the correct container (v0.18.11) from starting properly.

Connection helper tries to connect, finds nothing, and throws "No such container."
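
To make the theory concrete, here is a minimal Go sketch of the name mismatch. It assumes the container name is simply "dagger-engine-" plus the SDK version, as in the docker exec command above. The helpers expectedEngineName and diagnose are hypothetical and not part of the Dagger SDK.

```go
package main

import (
	"fmt"
	"strings"
)

// enginePrefix matches the container names seen in this thread,
// e.g. "dagger-engine-v0.18.11". Treat it as an assumption about naming.
const enginePrefix = "dagger-engine-"

// expectedEngineName returns the container name an SDK of the given
// version would try to exec into (hypothetical helper).
func expectedEngineName(sdkVersion string) string {
	return enginePrefix + sdkVersion
}

// diagnose reports whether the expected container is among the running
// ones, and lists any other engine containers that are present instead.
func diagnose(sdkVersion string, running []string) (found bool, others []string) {
	want := expectedEngineName(sdkVersion)
	for _, name := range running {
		switch {
		case name == want:
			found = true
		case strings.HasPrefix(name, enginePrefix):
			others = append(others, name)
		}
	}
	return found, others
}

func main() {
	// Names as they might appear in `docker ps --format '{{.Names}}'`.
	running := []string{"dagger-engine-v0.18.12", "postgres"}

	found, others := diagnose("v0.18.11", running)
	if !found {
		fmt.Printf("No such container: %s (other engines: %v)\n",
			expectedEngineName("v0.18.11"), others)
	}
}
```

Running a check like this right before the connection helper dials would at least tell us which engine containers were present when the error fired.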

azure vortex Jul 10, 2025, 10:20 AM

#

paper sigil ok, I believe its one of these (sorry I can't be more specific): https://dagger...

hm interesting. these kind of look as expected? one thing that's weird is that we don't actually have any way of seeing which version of the engine we actually connected to

#

i'll work on adding that log in

azure vortex Jul 10, 2025, 10:23 AM

#

woven leaf So, I've been looking into the "No such container" error recently, and here's a ...

i don't really understand this, but sure, yeah, the container somehow stops existing or wasn't created in the first place

azure vortex Jul 10, 2025, 12:37 PM

#

azure vortex i'll work on adding that log in

so not a fix at all, but this should at least help produce some more informative traces: https://github.com/dagger/dagger/pull/10709

GitHub

chore: tidy up engine startup and connection telemetry by jedevc ·...

This patch improves the telemetry at client startup in a few ways:

Earlier notification of cloud telemetry info - this ensures we can open the web view before the engine is provisioned.
Adds loggi...

woven leaf Jul 10, 2025, 5:26 PM

#

azure vortex i don't really understand this, but sure, yeah, the container somehow stops exis...

Oh, it's just a checkpoint on the understanding -- some, like me at first, might not know 😇

I dug a bit more, there is a potential Time‑of‑check vs. time‑of‑use (TOCTOU) race that i'm trying to reproduce: the docker engine doesn't release the name right away -- if the engine is big, it can take a few secs / minutes for the docker daemon to release the name -- then, we enter into the edge case of "in-use" which silently fails, which could explain everything -- I think 🤔
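
A toy model of that race, in Go. Everything here is hypothetical: fakeDaemon stands in for the Docker daemon and provision for the SDK startup path. The only behaviours modelled are the ones described above: a removed container keeps its name reserved for a while, and the resulting "in-use" error is dropped silently.

```go
package main

import (
	"errors"
	"fmt"
)

var errNameInUse = errors.New("container name is already in use")

type state int

const (
	running state = iota + 1
	removing
)

// fakeDaemon is a toy stand-in for the Docker daemon.
type fakeDaemon struct {
	containers map[string]state
}

func newFakeDaemon(names ...string) *fakeDaemon {
	d := &fakeDaemon{containers: map[string]state{}}
	for _, n := range names {
		d.containers[n] = running
	}
	return d
}

func (d *fakeDaemon) isRunning(name string) bool {
	return d.containers[name] == running
}

// remove starts removal; the name stays reserved until finishRemoval.
func (d *fakeDaemon) remove(name string) {
	if _, ok := d.containers[name]; ok {
		d.containers[name] = removing
	}
}

// finishRemoval models the daemon releasing the name some time later.
func (d *fakeDaemon) finishRemoval(name string) {
	if d.containers[name] == removing {
		delete(d.containers, name)
	}
}

func (d *fakeDaemon) create(name string) error {
	if _, taken := d.containers[name]; taken {
		return errNameInUse
	}
	d.containers[name] = running
	return nil
}

// provision is a check-then-act startup that swallows the "in-use" error,
// on the assumption that somebody else has just created the container.
func provision(d *fakeDaemon, name string) error {
	if d.isRunning(name) { // time of check
		return nil
	}
	err := d.create(name) // time of use
	if errors.Is(err, errNameInUse) {
		return nil
	}
	return err
}

func main() {
	name := "dagger-engine-v0.18.11"
	d := newFakeDaemon(name)
	d.remove(name) // e.g. GC triggered by another client

	fmt.Println("provision error:", provision(d, name))
	fmt.Println("running after provision:", d.isRunning(name))

	d.finishRemoval(name)
	fmt.Println("retry error:", provision(d, name))
	fmt.Println("running after retry:", d.isRunning(name))
}
```

In this model the first provision call returns no error even though nothing is running, which is the window where a later docker exec would report "No such container".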

azure vortex Jul 10, 2025, 10:57 PM

#

yeah, but removing the container is done as a gc thing at the very end? it can take a while to release, but we've already selected which engine to use at that point

#

I guess this is explained a bit if you're running both a new dagger cli v0.18.12 and an older cu binary with v0.18.11
@paper sigil @deep relic could this have been the case for you?

#

if so, there are some things we could do here - we could "namespace" these container names such that they don't use the same resources at all.
alternatively, we could play a fun game of rebuild the garbage collection to just "work better somehow" 😭

woven leaf Jul 10, 2025, 11:13 PM

#

azure vortex I guess this is explained a bit if you're running both a new dagger cli v0.18.12...

That was the case for Andrea, yes. And I think I inadvertently encountered it today -- trying to extract a repro from it, but getting more and more confident that's the issue

azure vortex Jul 10, 2025, 11:14 PM

#

mmm I wonder if we should avoid fixing this "obviously" and just work out a better way to do gc

#

I had an idea, been playing in containerd code a lot recently, I wonder if we could use its concept of "leases"

#

essentially, when you use an engine you "lease" it for a period of time - it can't be gc-ed while it's under lease

#

the lease expires some amount of time after you've used it

#

that would mean that you could have a lease set so that old engines would get cleaned up... eventually

#

if set to like 24 hours or so

#

but part of the problem is we'd need to change who runs the GC code. I wonder if we could have engines just "delete themselves"? then each engine garbage collects itself when its leases expire
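
A rough Go sketch of the lease bookkeeping, to pin down the semantics. This is not containerd's lease API. The leases type, its methods, and the fixed start time are all made up for illustration, and the 24 hour TTL is just the figure floated above.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// leaseTTL is the "24 hours or so" figure from the discussion.
const leaseTTL = 24 * time.Hour

// start is an arbitrary fixed instant so the example is deterministic.
var start = time.Date(2025, time.July, 10, 0, 0, 0, 0, time.UTC)

// leases tracks, per engine name, when its lease runs out (hypothetical type).
type leases struct {
	ttl     time.Duration
	expires map[string]time.Time
}

func newLeases(ttl time.Duration) *leases {
	return &leases{ttl: ttl, expires: map[string]time.Time{}}
}

// use takes or renews a lease: the engine is protected until now+ttl.
func (l *leases) use(engine string, now time.Time) {
	l.expires[engine] = now.Add(l.ttl)
}

// collectable returns, sorted, the engines whose lease has run out at now.
func (l *leases) collectable(now time.Time) []string {
	var out []string
	for engine, exp := range l.expires {
		if !now.Before(exp) {
			out = append(out, engine)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	l := newLeases(leaseTTL)
	l.use("dagger-engine-v0.18.11", start)
	l.use("dagger-engine-v0.18.12", start.Add(12*time.Hour))

	// 25 hours in, only the engine last used at the start has lapsed.
	fmt.Println(l.collectable(start.Add(25 * time.Hour)))
}
```

The point of the design is that GC never looks at versions at all: an engine is only eligible once nobody has used it for a full TTL, so a v0.18.11 SDK and a v0.18.12 CLI could not remove each other's engines while both are active.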

woven leaf Jul 10, 2025, 11:21 PM

#

Like it 💯

azure vortex Jul 10, 2025, 11:22 PM

#

azure vortex but part of the problem is we'd need to change who runs the GC code. I wonder if...

maybe need to investigate if this is possible (I guess otherwise we could maybe just have all the containers watch each other????? or have some sort of watchdog container)

#

I don't like those alternatives though

woven leaf Jul 10, 2025, 11:23 PM

#

All of this is kind of related to another tangential issue we have on SDK / Dagger CLI cohabitation -- cu uses the dagger cli to jump into the terminal, and the SDK sometimes is not up-to-date with the latest CLI version installed (which Andrea, Connor or I usually update right away) -- meaning that we encounter cache busts between terminal jumps on the same container.

The current DAGGER_LEAVE_OLD_ENGINE creates a new container so we don't benefit from the warm cache ...

Maybe your "engine lifecycle" could indeed unlock a better foundation / UX around that ?

azure vortex Jul 10, 2025, 11:24 PM

#

azure vortex maybe need to investigate if this is possible (I guess otherwise we could maybe ...

maybe unless-stopped works. the engine can stop itself when it's run out of work to do. and we could gc sweep just stopped containers. will have a poke around tomorrow.
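
For reference, the sweep rule would look roughly like this in Go. It is a sketch only: the container type and sweepable function are hypothetical, and it assumes engines are named with the "dagger-engine-" prefix and that a self-stopped engine shows up in the "exited" state.

```go
package main

import (
	"fmt"
	"strings"
)

// container holds the two fields this sketch needs; both could come from
// something like `docker ps -a --format '{{.Names}} {{.State}}'`.
type container struct {
	name  string
	state string // e.g. "running" or "exited"
}

// sweepable returns the engine containers a GC pass could remove under the
// proposed rule: only engines that have already stopped themselves.
func sweepable(all []container) []string {
	var out []string
	for _, c := range all {
		if strings.HasPrefix(c.name, "dagger-engine-") && c.state == "exited" {
			out = append(out, c.name)
		}
	}
	return out
}

func main() {
	all := []container{
		{name: "dagger-engine-v0.18.11", state: "exited"},
		{name: "dagger-engine-v0.18.12", state: "running"},
		{name: "postgres", state: "exited"},
	}
	fmt.Println(sweepable(all))
}
```

A running engine is never touched, whatever its version, so the sweep can't pull a container out from under a client that is still connected.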

azure vortex Jul 10, 2025, 11:25 PM

#

woven leaf All of this is *kind of* related to another tangent issue we have on SDK / Dagge...

yeah exactly! would be cool to support more dagger versions coexisting at the same time

woven leaf Jul 10, 2025, 11:25 PM

#

azure vortex yeah exactly! would be cool to support more dagger versions coexisting at the sa...

cc @deep relic -- just to make sure you see it: #1392339595009458227 message ⏫

azure vortex Jul 10, 2025, 11:25 PM

#

we can't support cache shared between multiple versions (maybe one day)

woven leaf Jul 10, 2025, 11:26 PM

#

My aim is to isolate a good repro, then I'll fire an issue with the findings, Justin, would that work ? Happy to explore your idea(s) too, I don't know how busy you are on other stuff -- and how relevant I am in that context ahah :-p (at least i can do the repro ahahah)

azure vortex Jul 10, 2025, 11:29 PM

#

that would be perfect - I'm still not reallllllly understanding the root cause of the issue, but better gc behavior does feel like part of the solution

#

I'll have a poke around with docker tomorrow and see if I can work out what we would need to do - I can write up an issue proposal as well

paper sigil Jul 11, 2025, 1:30 AM

#

azure vortex I guess this is explained a bit if you're running both a new dagger cli v0.18.12...

this was the case for me

azure vortex Jul 11, 2025, 11:20 AM

#

there's already an issue here 😄 https://github.com/dagger/dagger/issues/3849

#

trying to gauge vibes on just... removing the auto-gc step - make it a breaking change, sorry, now you need to do docker system prune occasionally. maybe work a bit with upstream, to see if we can get some better mechanisms in place for long term.

#

cc @trim bolt? i know DAGGER_LEAVE_OLD_ENGINE is kind of a workaround for a lot of this
