# Cache management
Yes, that is actually possible in Dagger engine 0.2, and we’ll bring it back soon in 0.3. Under the hood, BuildKit does the heavy lifting.
Sadly, BuildKit does not yet support persisting cache volumes. That is a known issue and we’re eager to find a solution, either downstream or upstream.
CC'ing @rapid brook, your fellow early adopter, who has been digging pretty deep on this topic, in case you want to exchange notes.
And of course @covert wasp @pliant osprey @native kestrel
I was thinking about this today: at the end of a build, it might be possible to create a Docker image containing the volumes you want to cache, then push it to a private registry.
At the start of the next run, you could pull that image and restore the folders.
That way you could have, say, a cache per branch, and as long as the registry you push to is private, the cache should be secure.
So I wonder if you could leverage this "hack" and wrap it in some syntactic sugar: you would mark a directory as cached and the engine would do this for you under the covers.
@low cave I'd be really interested to see what could be done with persisting cache volumes -- it would be a great complement to my current approach with my docker builds, which is to aggressively leverage buildkit's inline cache and remote caching. Specifically, I use git to construct tags that should contain recent builds -- details here https://github.com/dagger/dagger/discussions/3359
This strategy does a great job with dependencies as long as you don't change them too often. But when you do change deps, that requires a full redownload/reinstall, and I completely miss out on the benefits of things like binary caching for Go builds.
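One way to pick registry tags that are likely to hold a recent build is to try them in order of specificity. The sketch below assumes a hypothetical scheme (exact commit, then current branch, then default branch); it is not necessarily what the linked discussion does. Branch names like `feature/x` contain characters that are invalid in image tags, so they are sanitized first. The repository name and commit are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Docker/OCI tags must match [A-Za-z0-9_][A-Za-z0-9_.-]{0,127}.
var invalidTagChars = regexp.MustCompile(`[^A-Za-z0-9_.-]`)

// sanitizeTag maps a git ref such as "feature/go-cache" onto a valid tag.
func sanitizeTag(name string) string {
	tag := invalidTagChars.ReplaceAllString(name, "-")
	tag = strings.TrimLeft(tag, ".-") // a tag may not start with '.' or '-'
	if tag == "" {
		tag = "_"
	}
	if len(tag) > 128 {
		tag = tag[:128] // safe byte slice: only ASCII remains
	}
	return tag
}

// cacheRefs returns cache sources in priority order: the exact commit, the
// current branch, then the default branch as a fallback. Duplicates are
// dropped (e.g. when building the default branch itself).
func cacheRefs(repo, commit, branch, defaultBranch string) []string {
	var refs []string
	seen := map[string]bool{}
	for _, name := range []string{commit, branch, defaultBranch} {
		ref := repo + ":" + sanitizeTag(name)
		if !seen[ref] {
			seen[ref] = true
			refs = append(refs, ref)
		}
	}
	return refs
}

func main() {
	// Each ref could be passed as a separate `--cache-from type=registry,ref=...`
	// to `docker buildx build`.
	for _, ref := range cacheRefs("registry.example.com/app-cache", "3f9c2ab", "feature/go-cache", "main") {
		fmt.Println(ref)
	}
}
```

This prints `registry.example.com/app-cache:3f9c2ab`, then `registry.example.com/app-cache:feature-go-cache`, then `registry.example.com/app-cache:main`.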
So I was thinking about something like this, but I'll take a look at your repo, as it might be a better way.
```go
// Proposed API sketch: the cache package does not exist yet.

// create a cache using a container image as the source;
// GPGKey is used to encrypt and decrypt all files added to the cache
cache.LoadCache(&cache.Options{
	GPGKey:  "./file",
	Address: "nicholasjackson/build-cache:branch",
})

// add a cache folder to the build
build := container.Create().From("blah").
	WithMountedDirectory("/src/apps", cache.GetDirectory("/src/apps"))

// build the application, producing artifacts
build.Exec()

// add a directory to the cache, encrypting files with GPG
cache.AddDirectory("/src/apps", build.Directory("/src/apps"))

// save the cache and push it to a Docker registry; the address can differ
// from the original if you do not want to overwrite the previous cache entry
cache.Save("nicholasjackson/build-cache:branch")
```
Yep, I think these two strategies complement each other really well. Mine basically recreates the minimal-available-filesystem behavior of a good Dockerfile (copy dependency manifests first, then install deps, then copy the full source, then run the build) and uses the registry to keep those layers around, so there are a lot of situations where no work is done at all because there's a cache hit at the BuildKit level.
But it's really hard to make that work at a granular level. Package-level caching is generally implemented by language-specific tooling, and your approach lets that tooling pick up the slack by keeping the filesystem around.
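To make the "copy manifests first" point concrete: the dependency step can be reused exactly as long as its inputs (the manifests) are unchanged. The sketch below computes such a key by hashing only the manifest files. BuildKit computes its own content checksums for `COPY` inputs, so this illustrates the idea rather than BuildKit's algorithm, and `depsCacheKey` is a hypothetical helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// depsCacheKey hashes only the dependency manifests (for Go: go.mod and go.sum).
// Editing application source does not change the key, so an "install deps"
// step keyed on it can be reused; touching a manifest produces a new key.
func depsCacheKey(manifests map[string][]byte) string {
	names := make([]string, 0, len(manifests))
	for name := range manifests {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort so the key is deterministic
	h := sha256.New()
	for _, name := range names {
		// Length-prefix each field so different splits of the same bytes hash differently.
		fmt.Fprintf(h, "%d:%s%d:", len(name), name, len(manifests[name]))
		h.Write(manifests[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	manifests := map[string][]byte{
		"go.mod": []byte("module example.com/app\n\ngo 1.21\n"),
		"go.sum": []byte(""),
	}
	before := depsCacheKey(manifests)
	// A source-only change (say, editing main.go) never enters the key.
	unchanged := depsCacheKey(manifests)
	manifests["go.mod"] = []byte("module example.com/app\n\ngo 1.21\n\nrequire example.com/dep v1.0.0\n")
	after := depsCacheKey(manifests)
	fmt.Println(before == unchanged) // true
	fmt.Println(before == after)     // false
}
```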
One thing I have discovered with caching in CircleCI (which is kind of like what you're doing, just without a container registry) is that extracting the cache can take a meaningful amount of time on its own. This was one of the reasons I started looking into layer caching, since in ideal cases it avoids the download/extract step entirely.
Actually, it'd be really cool if there were some OSS version of image streaming (https://cloud.google.com/blog/products/containers-kubernetes/introducing-container-image-streaming-in-gke) to offset that cost, letting you start using cache layers before the full layer has been downloaded and extracted. Maybe there is one? Not sure.
@low cave you can probably implement such a caching strategy yourself in userland already. But it’s kind of like statically linking your own malloc implementation into your program instead of using the operating system’s standard library. Not necessarily a bad idea per se, but not something you should do unless you absolutely, positively know what you’re doing and are OK with the cost.
The main cost is that you don’t benefit from your engine’s transparent caching, with its site-specific configuration and backend.