Cache management | Dagger | Page 1

golden night Nov 8, 2022, 4:38 PM

#

Yes, that is actually possible in Dagger engine 0.2, and we’ll bring it back soon in 0.3. Under the hood, BuildKit does the heavy lifting.

Sadly buildkit does not yet support persisting cache volumes. That is a known issue and we’re eager to find a solution, either downstream or upstream.

#

Copying @rapid brook, your fellow early adopter, who has been digging pretty deep on this topic, in case you want to exchange notes.

And of course @covert wasp @pliant osprey @native kestrel

low cave Nov 8, 2022, 4:42 PM

#

I was thinking about this today, and it might be possible to do something like this: at the end of a build, create a Docker image containing the volumes you want to cache, then push it to a private registry.

#

Then at the start of a run you could pull that image and restore the folders from it.

#

That way you could have, say, a cache per branch. As long as the registry you push to is private, the cache should be secure.
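A minimal sketch of the per-branch naming part of this idea, assuming the cache image is tagged after the git branch. The repository name and the `cacheRef` helper are hypothetical. The only hard constraint used here is Docker's tag format: up to 128 characters drawn from letters, digits, `_`, `.` and `-`, not starting with `.` or `-`.

```go
package main

import (
	"fmt"
	"strings"
)

// cacheRef builds a registry reference for a per-branch cache image.
// repo is a hypothetical private repository; branch is a git branch name.
func cacheRef(repo, branch string) string {
	var b strings.Builder
	for _, r := range branch {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '_', r == '.', r == '-':
			b.WriteRune(r)
		default:
			// Characters such as '/' are not allowed in a tag.
			b.WriteByte('-')
		}
	}
	// A tag may not start with '.' or '-'.
	tag := strings.TrimLeft(b.String(), ".-")
	if tag == "" {
		tag = "default"
	}
	// Tags are limited to 128 characters; the tag is pure ASCII at this point.
	if len(tag) > 128 {
		tag = tag[:128]
	}
	return repo + ":" + tag
}

func main() {
	fmt.Println(cacheRef("registry.example.com/team/build-cache", "feature/login-form"))
}
```

Note that two different branches can map to the same tag after sanitizing (for example `feature/x` and `feature-x`), so a real implementation might append a short hash of the original branch name.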

#

So I wonder then if you could leverage this "hack" and wrap it in some syntactic sugar. You could mark a directory as cached and the engine would do this for you under the covers.

rapid brook Nov 10, 2022, 6:41 AM

#

@low cave I'd be really interested to see what could be done with persisting cache volumes -- it would be a great complement to my current approach with my docker builds, which is to aggressively leverage buildkit's inline cache and remote caching. Specifically, I use git to construct tags that should contain recent builds -- details here https://github.com/dagger/dagger/discussions/3359

#

This strategy does a great job with dependencies as long as you don't change them too often. But when you do change deps, that requires a full redownload/reinstall, and I completely miss out on the benefits of e.g. binary caching for Go builds.
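A rough sketch of what a tag-based remote caching setup like this could look like, assuming the build runs through `docker buildx build` and that the list of candidate tags (for example the current commit, its parent, and the main branch) has already been derived from git. The image name, the tags, and the `buildArgs` helper are all hypothetical and are not taken from the linked discussion.

```go
package main

import (
	"fmt"
	"strings"
)

// buildArgs assembles arguments for `docker buildx build`.
// Each candidate tag becomes a --cache-from source, so BuildKit can reuse
// layers from whichever of those images exist in the registry.
func buildArgs(image string, candidates []string, pushTag string) []string {
	args := []string{"buildx", "build"}
	for _, tag := range candidates {
		args = append(args, "--cache-from", "type=registry,ref="+image+":"+tag)
	}
	// type=inline embeds cache metadata in the pushed image itself.
	args = append(args,
		"--cache-to", "type=inline",
		"--tag", image+":"+pushTag,
		"--push", ".",
	)
	return args
}

func main() {
	// Hypothetical values; in practice these would come from git.
	args := buildArgs("registry.example.com/team/app", []string{"abc123", "main"}, "abc123")
	fmt.Println("docker " + strings.Join(args, " "))
}
```

The function only builds the command line, so it can be inspected or logged before anything is executed.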

low cave Nov 10, 2022, 11:09 AM

#

So I was thinking about something like this but I will take a look at your repo as it might be a better way.

#

// create a cache using a docker container as source
// gpgkey is used to encrypt and decrypt all files added to the cache
cache.LoadCache(&Options {
  GPGKey: "./file",
  Address: "nicholasjackson/build-cache:branch",
})

// add a cache folder to the build
build := container.Create().From("blah").WithMountedDirectory(cache.GetDirectory("/src/apps", "/src/apps"))

// build application producing artifacts
build.Exec()

// add a directory to the cache, encrypting files with gpg
cache.AddDirectory("/src/apps", build.Directory("/src/apps"))

// save the cache and push to a docker registry
// address can be changed from original in case you do not wish to overwrite the previous cache entry 
cache.Save("nicholasjackson/build-cache:branch")

rapid brook Nov 10, 2022, 7:13 PM

#

Yep -- I think these two strategies actually complement each other really well. Mine is focused on basically recreating the minimal-available-filesystem behavior of a good Dockerfile (copy dependency manifests first, then install deps, then copy full source, then run build) and then using the registry to keep that around -- so it creates a lot of situations where no work is done at all, because there's a cache hit at the buildkit level
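To illustrate why that ordering produces so many cache hits, here is a deliberately simplified model of layer caching, in which each step's cache key depends on the previous key, the instruction, and a digest of the files the step reads. This is an assumption made for illustration, not BuildKit's actual algorithm, and the step list and digests are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// step is one build instruction plus a digest of the files it reads.
type step struct {
	instruction string
	inputDigest string
}

// layerKeys returns one cache key per step; each key chains on the previous one.
func layerKeys(steps []step) []string {
	keys := make([]string, len(steps))
	parent := ""
	for i, s := range steps {
		sum := sha256.Sum256([]byte(parent + "\n" + s.instruction + "\n" + s.inputDigest))
		parent = hex.EncodeToString(sum[:])
		keys[i] = parent
	}
	return keys
}

// reusable counts the leading steps whose keys match a previous build.
func reusable(prev, next []string) int {
	n := 0
	for n < len(prev) && n < len(next) && prev[n] == next[n] {
		n++
	}
	return n
}

// pipeline mirrors the ordering described above: manifests, install, source, build.
func pipeline(manifestDigest, sourceDigest string) []step {
	return []step{
		{"COPY go.mod go.sum ./", manifestDigest},
		{"RUN go mod download", ""},
		{"COPY . .", sourceDigest},
		{"RUN go build ./...", ""},
	}
}

func main() {
	base := layerKeys(pipeline("deps-v1", "src-v1"))
	srcChange := layerKeys(pipeline("deps-v1", "src-v2"))
	depChange := layerKeys(pipeline("deps-v2", "src-v2"))
	fmt.Println("source-only change reuses", reusable(base, srcChange), "of 4 steps")
	fmt.Println("dependency change reuses", reusable(base, depChange), "of 4 steps")
}
```

In this model a source-only change still reuses the manifest and install steps, while a dependency change invalidates everything after the first step, which is the gap a persisted package cache directory would fill.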

#

But it's really hard to make that work at a granular level, i.e. for the package-level caching that's generally implemented by language-specific tooling. Your approach lets the language-specific tooling pick up the slack there by keeping the filesystem around.

#

one thing I have discovered with caching in CircleCI (which is kinda like what you're doing just without a container registry) is that extracting the cache can actually take a meaningful amount of time on its own (this was one of the reasons I started looking into layer caching, since it can prevent having to download/extract things in ideal cases).

#

Actually, it'd be really cool if there were some OSS version of image streaming (https://cloud.google.com/blog/products/containers-kubernetes/introducing-container-image-streaming-in-gke) to offset that cost (allow you to start pulling/using cache layers without waiting for the full layer to download and extract). Maybe there is? Not sure.

golden night Nov 10, 2022, 8:09 PM

#

@low cave you can probably implement such a caching strategy yourself in userland already. But it’s kind of like bundling your own malloc implementation statically linked in your program, instead of using the operating system’s stdlib. Not necessarily a bad idea per se, but not something you should do unless you absolutely positively know what you’re doing and are OK with the cost

#

the main cost being that you don’t benefit from your engine’s transparent caching with site-specific configuration and backend
