#unauthorized when pulling image from public (ECR) registry after resuming from idle


winter condor
#

os: Fedora Silverblue
builder: Podman
client: Go SDK

After resuming my laptop from a few hours of idle, I get a 400 when Dagger tries to pull an image from a public (ECR) registry.

#1 resolve image config for public.ecr.aws/docker/library/node:18-alpine3.15
#1 ERROR: unexpected status from HEAD request to https://public.ecr.aws/v2/docker/library/node/manifests/18-alpine3.15: 400 Bad Request
------
 > resolve image config for public.ecr.aws/docker/library/node:18-alpine3.15:
------
panic: input:1: container.from unexpected status from HEAD request to https://public.ecr.aws/v2/docker/library/node/manifests/18-alpine3.15: 400 Bad Request

After killing the Dagger container and running the pipeline again, it was fine.

Some root/client cert issue on waking up?

Putting it here if this deserves more investigation.

grand wave
#

interesting... next time that happens can you check what docker logs <engine_container> show please?

winter condor
#

I am going to run this from work today to rule out my home network

grand wave
#

I don't think it's an issue with Dagger. What I've seen out there with Podman is that containers lose network connectivity after the machine has been asleep for some time. Another way to test this is to launch an alpine container with sleep infinity and, when the issue occurs, check whether that container can still reach the outside world. You can easily verify this by exec-ing into the alpine container and running ping or curl.

#

My assumption is that it's somehow related to Podman. Docker Desktop had this problem for a long time and it was "finally fixed" not so long ago

winter condor
#

thanks for looking into this 🙌

must be something with how the container is resolving DNS from the host

dial tcp: lookup public.ecr.aws on 10.87.0.1:53: read udp 10.87.0.1:57508->10.87.0.1:53: read: connection refused

p.s. I have been running Podman containers for weeks/months without restarts, and with no network issues
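
The error above names the resolver the container was using (10.87.0.1:53), so one way to narrow this down is to query that resolver directly and see whether it answers. Below is a rough Go sketch of that, assuming it runs inside the affected container. The helper names `parseLookupError` and `lookupVia` are made up for this example. The regular expression only targets the `lookup <host> on <resolver>:` fragment that Go's `net` package prints, as seen in the message above.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"regexp"
	"time"
)

// lookupRe matches the "lookup <host> on <ip:port>:" fragment of a Go DNS error.
var lookupRe = regexp.MustCompile(`lookup (\S+) on (\S+:\d+):`)

// parseLookupError extracts the queried host and the resolver address
// from an error message. ok is false if the fragment is not present.
func parseLookupError(msg string) (host, resolver string, ok bool) {
	m := lookupRe.FindStringSubmatch(msg)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

// lookupVia resolves host using only the given resolver address,
// bypassing whatever /etc/resolv.conf says.
func lookupVia(ctx context.Context, resolver, host string) ([]string, error) {
	r := &net.Resolver{
		PreferGo: true, // use Go's resolver so the Dial hook is honored
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, resolver)
		},
	}
	return r.LookupHost(ctx, host)
}

func main() {
	// Error text copied from the message above.
	msg := "dial tcp: lookup public.ecr.aws on 10.87.0.1:53: read udp 10.87.0.1:57508->10.87.0.1:53: read: connection refused"

	host, resolver, ok := parseLookupError(msg)
	if !ok {
		fmt.Println("no resolver found in error message")
		return
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := lookupVia(ctx, resolver, host)
	if err != nil {
		fmt.Printf("resolver %s did not answer for %s: %v\n", resolver, host, err)
		return
	}
	fmt.Printf("resolver %s answered for %s: %v\n", resolver, host, addrs)
}
```

A "connection refused" here would suggest that nothing is listening on the resolver address inside the container network, which would fit a DNS forwarder on the Podman side not running, although that is only a guess from the error text.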

winter condor
#

yeah, now it's happening without idle

#

INFO[2023-03-25T19:17:23Z] trying next host                              error="failed to do request: Head \"https://registry-1.docker.io/v2/library/node/manifests/18-alpine3.15\": net/http: timeout awaiting response headers" host=registry-1.docker.io spanID=914f25deb34a0bd7 traceID=2cb8b151dd8da27a18f6a10ad8c57989
ERRO[2023-03-25T19:17:23Z] /moby.buildkit.v1.frontend.LLBBridge/ResolveImageConfig returned error: rpc error: code = Unknown desc = failed to do request: Head "https://registry-1.docker.io/v2/library/node/manifests/18-alpine3.15": net/http: timeout awaiting response headers 
failed to do request: Head "https://registry-1.docker.io/v2/library/node/manifests/18-alpine3.15": net/http: timeout awaiting response headers
1 v0.0.0+unknown /usr/local/bin/dagger-engine --config /etc/dagger/engine.toml /bin/sh

grand wave
#

@winter condor are you on a Linux host?

winter condor
#

yes, Fedora Silverblue, so I am being extra difficult 😆

#

I am podman unsharing /var/lib/dagger now, since I also have other SELinux issues

grand wave
#

Nice. Can you check if you're using the latest 0.4.2 engine?

We found some DNS issues this week on Linux hosts related to AppArmor, and we made a new release to fix them. Just checking whether that could be the problem

winter condor
#

yep, on v0.4.2

grand wave
#

Cool. In that case it might be another DNS issue for us to look into. cc @terse cloud

#

@winter condor if you can help us troubleshoot based on the discussion you shared before, we'd appreciate it