#what is cached by the remote registry cache?


bold moth
#

Hi,

I'm new to dagger, and right now I'm trying to speed it up in a gitlab-ci setup. We're running dagger-engine in docker-in-docker, so the engine only lives as long as the gitlab-ci job.

I've enabled the remote registry cache, using the "_EXPERIMENTAL_DAGGER_CACHE_CONFIG" env var, with something like:

in gitlab UI I can see that the layers have been pushed - something like 1GB of data (my dagger pipeline is very simple for the moment: golang image to run the unit tests, and golangci-lint)

in the output of the dagger job I can see that the cache is used:

and I can see an operation as CACHED (still in the output of the job):

  • copy /builds/vbehar/test-dagger CACHED

so I guess that it's somehow working.

but I would have expected to see more cached operations - for example the unit tests - as the source code hasn't changed.

instead, I'm seeing that dagger is still downloading the container images such as golang or golangci-lint:

  • pull golang:1.20
  • sha256:aec14dce7e7846bbadf40f19d3d871f619c54c1fa6296cc3cfaa810b3c66024d 95.56MiB / 95.56MiB [1.08s]
  • extracting sha256:aec14dce7e7846bbadf40f19d3d871f619c54c1fa6296cc3cfaa810b3c66024d [3.38s]

and running the unit tests:

  • [5.28s] === RUN TestRun

so my question is: what is cached exactly? shouldn't the base image layers at least be re-used from the cache? and what about the unit tests?

thanks!

#

note that I can confirm that the layers retrieved from dockerhub are being pushed to the gitlab "cache" registry:

  • exporting cache to registry
  • preparing build cache for export
  • writing layer sha256:aec14dce7e7846bbadf40f19d3d871f619c54c1fa6296cc3cfaa810b3c66024d
  • writing cache image manifest sha256:bf56d550cc1e41ebcfba58ee6c6c77cb37e4b3daa8b73a66eb892e808026e658
  • exporting cache to registry DONE

and in the dagger-engine logs (in debug mode) I can see the HTTP requests being made to dockerhub to fetch the same layer...

verbal bay
#

~~I am testing this on a self-hosted Gitlab instance but Dagger hangs when writing layers, I do get a 202 response though 🤷~~

I'm not providing a solution here, just encountered this when trying this out

Edit:
It just took a while, see below.

export _EXPERIMENTAL_DAGGER_CACHE_CONFIG="type=registry,ref=<masked>,mode=max,image-manifest=true"

Would you mind sharing a code snippet?

#

Dagger output

43: exporting cache to registry
43: preparing build cache for export
43: writing layer sha256:0d24aa0af8fb9faa5095e381086835ce2e72f3621b46b4f486d427ba593937da
43: writing layer sha256:0d24aa0af8fb9faa5095e381086835ce2e72f3621b46b4f486d427ba593937da [0.20s]
43: writing layer sha256:10e206ad2a7e45692e257fb38fcae0548ba49093238e21592e1b6a114600873e
43: writing layer sha256:10e206ad2a7e45692e257fb38fcae0548ba49093238e21592e1b6a114600873e [0.16s]
43: writing layer sha256:16cc2f08857b36c096a6bb68be6ce2be8ce8e4cb02bed9647fea2ba49636d75f
43: writing layer sha256:16cc2f08857b36c096a6bb68be6ce2be8ce8e4cb02bed9647fea2ba49636d75f [0.04s]
#

Dagger logs

time="2023-07-07T10:38:15Z"
level=debug msg="fetch response received"
digest="sha256:c7f90aba8cae994da8f63e110e087d12a12a24a0cb7925083d6d4f23146da18c"
mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip
response.header.connection=keep-alive
response.header.content-length=0
response.header.date="Fri, 07 Jul 2023 10:38:15 GMT"
response.header.docker-distribution-api-version=registry/2.0
response.header.docker-upload-uuid=b5fe84d4-e484-4bbd-9b00-cf2fce41a07d
response.header.location="https://<masked>/v2/<masked>/blobs/uploads/b5fe84d4-e484-4bbd-9b00-cf2fce41a07d?_state=OE5jKzvcdeM5AKOjbv-<masked>"
response.header.range=0-0
response.header.server=nginx
response.header.x-content-type-options=nosniff
response.status="202 Accepted"
size=528690166
spanID=9a621a3523cb8d97
traceID=7bbf32a11d2a846f71730e0cde0a2c38
url="https://<masked>/v2/<masked>/blobs/uploads/"
time="2023-07-07T10:38:15Z"
level=debug
msg="do request"
digest="sha256:c7f90aba8cae994da8f63e110e087d12a12a24a0cb7925083d6d4f23146da18c"
mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip
request.header.content-type=application/octet-stream
request.header.user-agent=buildkit/v0.0-dev
request.method=PUT
size=528690166
spanID=9a621a3523cb8d97
traceID=7bbf32a11d2a846f71730e0cde0a2c38
url="https://<masked>/v2/<masked>/blobs/uploads/b5fe84d4-e484-4bbd-9b00-cf2fce41a07d?_state=OE5jKzvcdeM5AKOjbv-<masked>"
#

ahh no!

65: exporting cache to registry DONE

It just took a really, really, really long time

#
65: preparing build cache for export [191.0s]
verbal bay
#

next run 🥳

65: preparing build cache for export [2.40s]
fossil whale
#

cc @late spindle. ephemeral runners caching

bold moth
#

thanks @verbal bay. I see the same behavior: it's properly exporting to my registry.

I'm just wondering what's being done with this cache at the next run. In my case, I can still observe dagger pulling base images from their original registries instead of using the layers from the cache. Do you observe the same behavior or not?

I'll try next week with a more complex pipeline to see which steps are being cached (or not)

#

my pipeline is very simple:

client.Container().
        From("golang:1.20").
        WithMountedDirectory("/src", srcDirectory(client)).
        WithWorkdir("/src").
        WithMountedCache("/go/pkg/mod", client.CacheVolume("go-mod")).
        WithMountedCache("/go/build-cache", client.CacheVolume("go-build-cache")).
        WithEnvVariable("GOCACHE", "/go/build-cache").
        WithExec([]string{"go", "test", "-v", "."}).
        Sync(ctx)

and srcDirectory just returns client.Host().Directory(".") with the right includes/excludes

verbal bay
#

I am also facing a caching issue in a different project (legacy React version) and I think it is related to how webpack is handling the transpiling

#

the source is the same and should be cached, but Dagger is maybe looking at the output artifacts to determine the cache?

late spindle
#

@bold moth @verbal bay I know that default BuildKit caching doesn't cache the contents of CacheVolumes/cache mounts, so that is likely one of the things you're seeing.

We are developing a cache service as part of Dagger Cloud (visibility of runs, logs, and DAGs + caching service) that will work automatically and combines layer caching, cache volumes (for dependencies like go mods and node modules), and more. This will be a paid service, but I'm working with some folks in early access now. Let me know if you're interested.

verbal bay
#

@late spindle yes, take my money!