# Really strange bugs started happening in CI, like `apt-get` not in PATH on an `ubuntu:22.04` image.
👋 does it happen consistently? Or is it just random?
by any chance, in the engine logs of this pipeline, could you see the digest that got used for that image?
maybe ubuntu:22.04 pushed an incorrect image for a moment?
particularly since you're using a 22.04 tag, which could potentially have been replaced with an incorrect image
Thanks for responding, sorry I missed it.
It's been happening pretty consistently all day, yeah, but not locally.
by any chance, in the engine logs of this pipeline, could you see the digest that got used for that image?
What would I look for?
I should clarify: It is happening all day but we have like 8 or so different dagger pipelines running in parallel in 1 dagger run, and it happens in some but not others. So it's.. persistent but not consistent.
So, the dagger logs while the pipeline is running should output something like:
11: resolve docker.io/library/alpine@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48
11: resolve docker.io/library/alpine@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48 [0.01s]
11: pull docker.io/library/alpine:latest DONE
11: pull docker.io/library/alpine:latest
11: > in from > from alpine
11: > in sync
11: pull docker.io/library/alpine:latest DONE
11: pull docker.io/library/alpine:latest CACHED
11: > in from > from alpine
11: > in sync
11: pull docker.io/library/alpine:latest CACHED
11: pull docker.io/library/alpine:latest
11: > in from > from alpine
11: > in sync
11: sha256:661ff4d9561e3fd050929ee5097067c34bafc523ee60f5294a37fd08056a73ca 0B / 3.251MiB
11: sha256:661ff4d9561e3fd050929ee5097067c34bafc523ee60f5294a37fd08056a73ca 1MiB / 3.251MiB
11: sha256:661ff4d9561e3fd050929ee5097067c34bafc523ee60f5294a37fd08056a73ca 2MiB / 3.251MiB
11: sha256:661ff4d9561e3fd050929ee5097067c34bafc523ee60f5294a37fd08056a73ca 3.251MiB / 3.251MiB [1.88s]
11: pull docker.io/library/alpine:latest DONE
As you can see there, the sha256 is the image digest. It'd be great if we could get that so we can validate the digest of the image being pulled.
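If the engine output can be captured to a file, the resolved digests can be pulled out with a small filter. This is only a rough sketch: the function name `extract_digests` and the file name `dagger.log` are hypothetical, and it assumes the `resolve <image>@sha256:<digest>` line format shown in the sample output above.

```shell
# Print each unique image reference that the engine resolved to a digest.
# Reads log text on stdin. Assumes lines of the form:
#   "<step>: resolve <image>@sha256:<64 hex chars> ..."
extract_digests() {
  grep -oE 'resolve [^ ]+@sha256:[0-9a-f]{64}' \
    | sed 's/^resolve //' \
    | sort -u
}
```

With the output saved, `extract_digests < dagger.log` would list one `image@sha256:...` reference per line, which can then be compared against what the registry currently serves for that tag.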
Ah I'll take a look real quick. We run it in our drone with --silent because it crashes if we don't 🥲. So hopefully those are in dagger.cloud
It doesn't look like there's a way for us to see that unfortunately.
I've seen a number of times that CI will just about always pull a new image but your local environment might be retaining a cached image with the same tag. I might try force pulling a new image. I think you can just run docker pull ubuntu:22.04 and see if you can reproduce it locally. Just a thought
maybe. I'll try that. The 22.04 image that's in the official repo is 6 days old; if there was an issue with it I think it would have been revealed by now.
There's been a couple other logs where it seems like a container / step that ran very late in the dag had some kind of filesystem issue, like a path suddenly not being writeable.
I'll try to gather more information and report back.
Here was another interesting one: https://dagger.cloud/grafana/runs/dbd0725b-38ea-4b2a-a2fc-5f393cead46b
this passes locally for me and everyone else so far but it hasn't changed in quite a while, but we started getting errors on it last night.
I'd assume those errors are random, correct? Sometimes it passes and some other times it fails?
Yeah I think that's fair. This is also a really resource intensive part of the pipeline, too.
In that link there are other containers that are imported that are able to run "/run.sh", and they were all built from the same Dockerfile
If I went and downloaded those images they'd have it in there too. I'm out right now but I wonder if I can figure out a way to get this to fail consistently.
I'm really struggling to reproduce this but it is still happening in our CI. These runners are ephemeral.
I ran exactly what CI was running locally (same program, same arguments).
The differences that I can think of currently:
- OS: these are running on AWS EC2 runners, probably on AWS linux, but the job itself is running in the docker container drone/drone-runner-docker:1.8.2.
- Docker daemon or docker version?
I'll try disabling --silent and seeing if I can get any better information from CI, but I appreciate any suggestions on anything I can try to reproduce this locally.
Here's the result of running that without --silent:
1926: resolve image config for docker.io/library/ubuntu:22.04 DONE
1926: > in from ubuntu:22.04
1926: resolve image config for docker.io/library/ubuntu:22.04 DONE
1934: pull docker.io/library/ubuntu:22.04
1934: > in from ubuntu:22.04
1934: resolve docker.io/library/ubuntu:22.04@sha256:e6173d4dc55e76b87c4af8db8821b1feae4146dd47341e4d431118c7dd060a74 [0.00s]
1934: pull docker.io/library/ubuntu:22.04 DONE
...
1933: exec apt-get update -yq
1933: [0.12s] panic: exec: "apt-get": executable file not found in $PATH
1933: [0.12s]
1933: [0.12s] goroutine 1 [running, locked to thread]:
1933: [0.12s] main.shim()
1933: [0.12s] /app/cmd/shim/main.go:267 +0xfd8
1933: [0.12s] main.main()
1933: [0.12s] /app/cmd/shim/main.go:64 +0x5d
1933: exec apt-get update -yq ERROR: process "apt-get update -yq" did not complete successfully: exit code: 2
And locally:
[~/Work/Grafana/grafana-build] (main*) [i] » docker run --rm -it ubuntu:22.04@sha256:e6173d4dc55e76b87c4af8db8821b1feae4146dd47341e4d431118c7dd060a74 which apt-get
/usr/bin/apt-get
[~/Work/Grafana/grafana-build] (main*) [i] » docker run --rm -it ubuntu:22.04@sha256:e6173d4dc55e76b87c4af8db8821b1feae4146dd47341e4d431118c7dd060a74 apt-get update -yq
Get:1 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
The common theme with the few errors that we're seeing is:
- They're pretty deep into the dag. We can get around them by reducing the size / complexity of the dag. The current dag has about 1000 steps.
- They're filesystem related; the container is missing a file or folder, or the folder / path isn't writeable.
regarding point 1, it makes sense why we would be seeing this now despite no code changes in the failing steps, because we just enabled building Grafana for more architectures (150-200 extra steps each)
Looking more closely, our Docker container that has our dagger pipeline is using the docker client version 23.0, while our server version in our CI runners is now on 25.0. I'll see if using this combination locally causes it to fail the same way.
That did it.
I updated my docker daemon to 25.0.1 and ran it and that caused the failure.
maybe. double checking.
yup. That was it. I had to install docker via edge in the pipeline container, but after I upgraded the client to 25.0.1 it worked. Very strange, but I bet it's related to this: https://github.com/moby/moby/pull/44598
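For anyone hitting something similar, a guard like the following could flag a client/server mismatch before the pipeline starts. It's a rough sketch, not something from the thread: the function names are hypothetical, and treating "different major version" as the warning condition is an assumption based only on the 23.0 vs 25.0 combination seen here. The `docker version --format` calls only run when `check_docker_versions` is invoked, and they need a reachable daemon.

```shell
# Extract the major component of a version string such as "25.0.1".
major_version() {
  printf '%s\n' "${1%%.*}"
}

# Succeed (exit 0) when the two versions differ in their major component.
versions_mismatch() {
  [ "$(major_version "$1")" != "$(major_version "$2")" ]
}

# Query the real client and server versions and warn on a mismatch.
# Returns 2 if docker cannot be queried, 1 on mismatch, 0 otherwise.
check_docker_versions() {
  client=$(docker version --format '{{.Client.Version}}') || return 2
  server=$(docker version --format '{{.Server.Version}}') || return 2
  if versions_mismatch "$client" "$server"; then
    echo "docker client $client and server $server differ in major version" >&2
    return 1
  fi
}
```

Calling `check_docker_versions` as an early step in the pipeline container would have surfaced the 23.0 client talking to the 25.0 daemon, though a mismatch on its own doesn't prove anything is broken.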