#Weird gql transport error when updating to newest version of Dagger

supple cradle
#

I get the following exception when using Dagger on CentOS-9 and Podman.

gql.transport.exceptions.TransportQueryError: {'message': 'failed to compute cache key: mount callback failed on /tmp/containerd-mount1905428027: failed to convert whiteout file "usr/lib/.wh.gnupg": unlinkat /tmp/containerd-mount1905428027/usr/lib/gnupg: invalid argument', 'locations': [{'line': 7, 'column': 9}], 'path': ['container', 'from', 'withExec', 'sync']}

Here is the Dagger code in the python SDK that triggers this error

def spack_base(client):
    return (client.container()
            .from_("ecpe4s/ubuntu20.04")
            .with_exec([
                "bash", "-l", "-c",
                """source /spack/share/spack/setup-env.sh
                git clone https://github.com/robertu94/spack_packages ./robertu94_packages
                spack repo add  ./robertu94_packages
                spack install libpressio"""
                ])
            .sync()
            )

Any ideas?

signal folio
#

👋 have you checked that the engine running in your machine is the same one as your SDK is using?

supple cradle
#

I just installed both of them.

#

The rest of the pipeline has worked well so far.

#

This also worked on a previous version of dagger (modulo replacing exit_code -> sync)

signal folio
#

hmm could be a buildkit bug indeed. cc @empty wren @lilac plinth . My initial guess is that it's related to how that custom ecpe4s image was built
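
For context on the error text: in OCI/Docker image layers, an entry named `.wh.<name>` is a whiteout marker, meaning "remove `<name>` from the lower layers" when the layer is applied. The sketch below is plain Python, not Dagger or BuildKit code, and `whiteout_target` is a hypothetical helper name. It only shows that mapping for the entry in the error, which lines up with the `unlinkat .../usr/lib/gnupg` path that failed.

```python
from pathlib import PurePosixPath
from typing import Optional

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"  # marks a whole directory as opaque, not a single file


def whiteout_target(entry: str) -> Optional[str]:
    """Return the path a whiteout entry removes, or None if it is not a plain whiteout."""
    path = PurePosixPath(entry)
    name = path.name
    if name == OPAQUE_MARKER or not name.startswith(WHITEOUT_PREFIX):
        return None
    target_name = name[len(WHITEOUT_PREFIX):]
    if not target_name:
        return None  # a bare ".wh." names nothing to remove
    return str(path.with_name(target_name))


# The entry from the error message maps to the path that unlinkat failed on.
print(whiteout_target("usr/lib/.wh.gnupg"))
```

This does not explain why the unlink returns "invalid argument" on this setup; it only identifies which path the layer is trying to delete.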

supple cradle
#

Just following up here

signal folio
#

@supple cradle can you validate if it only happens with that image?

supple cradle
#

So far that is the only image I’ve seen it fail with. I haven’t tested an exhaustive list, but basic CentOS, Fedora, and Ubuntu images all work. It also worked with an older version of Dagger.

supple cradle
#

Is there a way to get better debugging information out of dagger so I can try and root cause this?

signal folio
#

hey @supple cradle, just checking if you were able to find anything new here. Are you currently running 0.9.3 or 0.9.4?

supple cradle
#

I don’t know enough about dagger internals to debug this without at least an architecture document and some debug flags to get the raw queries.

#

I’ll check what version I have though

#

0.9.3

signal folio
#

Got it. Could you give 0.9.4 a try and see if that helps?

supple cradle
#

No dice with 0.9.4

#

I get the same error.

signal folio
#

hey @supple cradle is there any chance you could share that ecpe4s/ubuntu20.04 image to repro? Or does it contain some sensitive information?

supple cradle
#

It’s totally public on dockerhub. I didn’t make it

signal folio
#

@supple cradle if you change the with_exec to ["-l", "-c", ...] (without the bash) does it fix it?

#

I'm trying locally and that image seems to act somewhat funky

#

i.e. this fails on my machine: docker run --rm -ti ecpe4s/ubuntu20.04:latest bash -l -c "echo hello"

#

I have to remove bash in order for it to work
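
A rough way to repeat that check under both runtimes is sketched below, assuming `docker` and/or `podman` are on the PATH. The helper names `run_command` and `compare_runtimes` are made up for illustration, and `-ti` is dropped because the script has no TTY attached. It runs the image with and without the leading `bash` and reports the exit code for each.

```python
import shutil
import subprocess

IMAGE = "ecpe4s/ubuntu20.04:latest"

# The two argument variants discussed above: with and without the leading "bash".
VARIANTS = {
    "with bash": ["bash", "-l", "-c", "echo hello"],
    "without bash": ["-l", "-c", "echo hello"],
}


def run_command(runtime, image, args):
    """Build the `<runtime> run --rm <image> <args...>` command line."""
    return [runtime, "run", "--rm", image, *args]


def compare_runtimes(image=IMAGE, runtimes=("docker", "podman")):
    """Run every variant under every installed runtime and collect exit codes."""
    results = {}
    for runtime in runtimes:
        if shutil.which(runtime) is None:
            continue  # skip runtimes that are not installed
        for label, args in VARIANTS.items():
            proc = subprocess.run(
                run_command(runtime, image, args),
                capture_output=True,
                text=True,
            )
            results[(runtime, label)] = proc.returncode
    return results
```

Calling `compare_runtimes()` pulls the image if needed and returns a dict such as `{("podman", "with bash"): <exit code>, ...}`, which makes it easy to see whether the behaviour differs between Docker and Podman.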

supple cradle
#

I’ll give this a shot, thanks @signal folio

supple cradle
#

@signal folio no that didn't fix it for me either.

signal folio
#

Can you podman run that image?

supple cradle
#

let me try that

#

yes. I can podman run the image.

signal folio
#

Ok.. that's strange indeed. I'll try to get a fedora VM tomorrow with Podman to give it a try