#External S3 Cache
@plush urchin I've been keeping an eye out for this as well. I don't know how up-to-date this info is but it may be worth a look if you've not seen it already. https://github.com/dagger/dagger/pull/4543 suggests it's available with the _EXPERIMENTAL_DAGGER_CACHE_CONFIG environment variable set using the buildkit remotecache config described here: https://docs.docker.com/build/cache/backends/s3/
There's a test (albeit not using s3) in that PR: https://github.com/dagger/dagger/pull/4543/files#diff-3ef7e04319aa08cfefa3c03061e71273999adfa09161227662ecec316d4c2802
@muted moth thank you for the tips. Let me see if I can get this working. I guess it's a matter of authenticating to the AWS role and then passing the auth tokens in the env var.
It should be yes. I'd be interested to hear if you get it working.
@plush urchin @muted moth happy to chat more about your specific needs for S3 cache. There are some examples of using external caches like this GitHub Actions cache around, but the external cache is still in an experimental state (thus its current name). #1078344535756787792 message
Quick questions - what's left to do to get this out of an experimental state? Any ideas on timelines for that? And, I mention it a lot but for good measure, any chance of a guide showing it in use that I can share?
I tried setting _EXPERIMENTAL_DAGGER_CACHE_CONFIG but it doesn't seem to have any effect on the build. I set it to some incorrect values but the build is either hiding an error or is ignoring it.
where are you setting the env var? I don't think it will work if you set it on the client side, it probably needs to be set in the dagger engine container, which effectively needs to be provisioned manually
haven't tested s3 cache with dagger but it works fine with buildx in general. if using sso, just make sure to include the session token.
This is untested, and more as a clarification on how s3 cache could be used with a custom buildkit host endpoint
Also see the s3 cache docs for more info
https://docs.docker.com/build/cache/backends/s3/
profile=some-profile-name
bucket=some-bucket
prefix=some-bucket-prefix/
# read region and credentials from the named AWS CLI profile
region=$(aws --profile "$profile" configure get region)
access_key=$(aws --profile "$profile" configure get aws_access_key_id)
secret=$(aws --profile "$profile" configure get aws_secret_access_key)
session_token=$(aws --profile "$profile" configure get aws_session_token)
# probably not needed
# docker buildx create --use --driver=docker-container
_EXPERIMENTAL_DAGGER_CACHE_CONFIG="type=s3,region=$region,bucket=$bucket,access_key_id=$access_key,secret_access_key=$secret,session_token=$session_token,prefix=$prefix"
# quote the value so it reaches the container as a single argument
docker run -d --name dagger-buildkitd -e _EXPERIMENTAL_DAGGER_CACHE_CONFIG="$_EXPERIMENTAL_DAGGER_CACHE_CONFIG" --privileged --network=host docker.io/moby/buildkit:latest
export BUILDKIT_HOST=docker-container://dagger-buildkitd
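A minimal sketch of one way to assemble that config string, in case it helps. The helper names `append_kv` and `build_cache_config` are made up for illustration and are not part of Dagger or BuildKit. The keys are the ones used in the snippet above. The only difference is that empty values are left out, so a profile without a session token does not produce a dangling `session_token=`. Equally untested against a real engine.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Dagger or BuildKit.

# append_kv KEY VALUE: prints ",KEY=VALUE", or nothing when VALUE is empty.
append_kv() {
  if [ -n "$2" ]; then
    printf ',%s=%s' "$1" "$2"
  fi
}

# build_cache_config REGION BUCKET PREFIX ACCESS_KEY_ID SECRET [SESSION_TOKEN]
# Prints a comma-separated S3 cache config on one line.
# Assumes none of the values contain a comma.
build_cache_config() {
  printf 'type=s3'
  append_kv region "$1"
  append_kv bucket "$2"
  append_kv prefix "$3"
  append_kv access_key_id "$4"
  append_kv secret_access_key "$5"
  append_kv session_token "${6:-}"
  printf '\n'
}
```

With the variables from the snippet above, usage would look like `_EXPERIMENTAL_DAGGER_CACHE_CONFIG=$(build_cache_config "$region" "$bucket" "$prefix" "$access_key" "$secret" "$session_token")`.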
this. You can set the variable client side, but it needs to be present before the engine container is created so it gets passed to it on creation.
so basically you can do:
1. docker rm -fv $engine_container
2. set the cache env var locally
3. run your pipeline again and your newly spawned engine container should be set
You can always check if your engine has the variable set with: docker inspect $(docker ps -q --filter name=dagger-engine-) | jq ".[].Config.Env"
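If you want that check as a pass/fail step in a script rather than eyeballing the output, here is a rough sketch. `has_cache_config` is a hypothetical helper name. It reads `KEY=value` lines on stdin and succeeds only when the cache variable is present and non-empty. It assumes the engine container name starts with `dagger-engine-`, as in the command above, and it uses `docker inspect --format` instead of jq.

```shell
#!/bin/sh
# Hypothetical helper: reads "KEY=value" lines on stdin and exits 0 only if
# _EXPERIMENTAL_DAGGER_CACHE_CONFIG is present with a non-empty value.
has_cache_config() {
  grep -q '^_EXPERIMENTAL_DAGGER_CACHE_CONFIG=.'
}

# Usage against a running engine (prints one env entry per line, then checks):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
#     $(docker ps -q --filter name=dagger-engine-) | has_cache_config \
#     && echo "cache config is set on the engine"
```

If the check fails after you exported the variable, the engine container was most likely created before the variable was set, so remove it and rerun the pipeline as in the steps above.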
Any idea on when this is coming out of the experimental phase?
Can't provide info here since I'm not currently working on this. I'll defer it to @visual shell
@rapid viper was making quite a few caching-related commits. Any info on a timeline there or guidance for folks wanting to give it a test drive?
The S3 cache export as it exists today is roughly in its final shape: it exposes the underlying buildkit cache export feature. The interface might shift a little (env variable or flag name) but other than that, it won’t move. It is a naive caching method but it is usually better than no cache (depends on the exact workload though)
Meanwhile we are working on a global caching service that will be much better, because it will coordinate caching across all your engines, move data between cold and hot cache asynchronously, persist cache mounts, and learn from runtime telemetry to make better caching decisions over time. That’s where most of Erik’s commits are going. The engine side of that feature will be open source but it will require a Dagger Cloud subscription to work (because Cloud orchestrates caching decisions based on runtime telemetry). DM me if you are interested in early access to that
and just to add, buildkit s3 cache will not be of any benefit unless you use custom ec2 runners that fetch the cache from within the aws internal network. Without it, the speed is terrible.
That's precisely the setup I'm working with - ephemeral self-hosted gitlab runners
Right, we also want to bring a cold cache as close to your engines as possible - ideally same region on the same hosting provider
As well as manage the hot cache (local storage) of each node. That way we can orchestrate moving blobs between runners and cold cache; but also direct transfer between runners on the same cluster
Initial results are… very promising 😇
Looking forward to hearing/seeing more about Dagger Cloud!
We can show you a demo if you want 🙂 DM me and we can schedule a zoom
👋 closing this thread
hi there, I'm wondering if this thread is still accurate in 2025 -- naive s3 cache for self-managed, smart cache for Dagger Cloud customers? @rapid viper @visual shell
And a follow-up: can Dagger Cloud be licensed for on-prem installation?
One important change: on Dagger Cloud we are working on integrated cache+compute. The performance is much better that way. It also makes it possible to collapse the entire CI stack, and process git events directly in a dagger cluster, with dynamic scale-out. We think this will blow existing CI platforms out of the water in terms of speed and efficiency, and unlock additional features like automatic test splitting.