#External S3 Cache


plush urchin
#

I posted this in general too but it probably suits here better. I have seen some development happening around using a S3 remote cache. I am interested to try this out as I have ephemeral CI agents that don't persist cache. Is it still not ready for primetime? Is there a way to try it out right now? I am using the Go SDK.

muted moth
plush urchin
#

@muted moth thank you for the tips. Let me see if I can get this working. I guess it's a matter of authenticating to the AWS role and then passing the auth tokens in the env var.
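A minimal sketch of that idea, assuming the role's credentials are already exported in the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` variables. The helper name `build_cache_config` is hypothetical; the key names are the ones the BuildKit S3 cache backend documents. The session token is only appended when it is present, since long-lived keys do not have one.

```shell
#!/bin/sh
# build_cache_config is a hypothetical helper, not part of Dagger or BuildKit.
# It prints an S3 cache config string built from the standard AWS credential
# environment variables.
build_cache_config() {
    # $1 = bucket name, $2 = bucket region (both required)
    bucket=$1
    region=$2
    if [ -z "$bucket" ] || [ -z "$region" ]; then
        echo "bucket and region are required" >&2
        return 1
    fi
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS credentials are not set in the environment" >&2
        return 1
    fi
    config="type=s3,region=$region,bucket=$bucket"
    config="$config,access_key_id=$AWS_ACCESS_KEY_ID"
    config="$config,secret_access_key=$AWS_SECRET_ACCESS_KEY"
    # Temporary (assumed-role or SSO) credentials also carry a session token.
    if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        config="$config,session_token=$AWS_SESSION_TOKEN"
    fi
    printf '%s\n' "$config"
}
```

Usage would look something like `export _EXPERIMENTAL_DAGGER_CACHE_CONFIG="$(build_cache_config some-bucket eu-west-1)"`, with the bucket and region being placeholders.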

muted moth
outer vapor
#

@plush urchin @muted moth happy to chat more about your specific needs for S3 cache. There are some examples around of using external caches, like this GitHub Actions cache, but the external cache is still in an experimental state (thus its current name). #1078344535756787792 message

muted moth
plush urchin
#

I tried setting _EXPERIMENTAL_DAGGER_CACHE_CONFIG but it doesn't seem to have any effect on the build. I set it to some incorrect values but the build is either hiding an error or is ignoring it.

frozen mango
#

Where are you setting the env var? I don't think it will work if you set it on the client side. It probably needs to be set in the dagger engine container, which effectively needs to be provisioned manually.

#

Haven't tested S3 cache with dagger, but it works fine with buildx in general. If using SSO, just make sure to include the session token.

This is untested, and meant more as a clarification of how S3 cache could be used with a custom buildkit host endpoint.

Also see the s3 cache docs for more info

https://docs.docker.com/build/cache/backends/s3/

# Placeholder values: replace with your own profile, bucket, and prefix
profile=some-profile-name
bucket=some-bucket
prefix=some-bucket-prefix/

# Read the region and credentials from the named AWS CLI profile
region=$(aws --profile "$profile" configure get region)
access_key=$(aws --profile "$profile" configure get aws_access_key_id)
secret=$(aws --profile "$profile" configure get aws_secret_access_key)
session_token=$(aws --profile "$profile" configure get aws_session_token)

# probably not needed
# docker buildx create --use --driver=docker-container

_EXPERIMENTAL_DAGGER_CACHE_CONFIG="type=s3,region=$region,bucket=$bucket,access_key_id=$access_key,secret_access_key=$secret,session_token=$session_token,prefix=$prefix"

# Pass the cache config into the container; quoted so the value stays one argument
docker run -d --name dagger-buildkitd -e _EXPERIMENTAL_DAGGER_CACHE_CONFIG="$_EXPERIMENTAL_DAGGER_CACHE_CONFIG" --privileged --network=host docker.io/moby/buildkit:latest

# Point the client at the manually started container
export BUILDKIT_HOST=docker-container://dagger-buildkitd

boreal grotto
# frozen mango where are you setting the env var? I dont think it will work if you set in on th...

this. You can set the variable client side, but it needs to be present before the engine container is created so it gets passed to it on creation.

so basically you can do:

1. docker rm -fv $engine_container
2. set the cache env var locally
3. run your pipeline again and your newly spawned engine container should be set

You can always check if your engine has the variable set with: docker inspect $(docker ps -q --filter name=dagger-engine-) | jq '.[].Config.Env'
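The check above can also be scripted without jq. This is a rough sketch, not something from the Dagger docs: `has_cache_config` is a hypothetical helper that reads `NAME=value` entries from stdin and succeeds only if the cache variable is present with a non-empty value. The commented wiring assumes an engine container whose name starts with `dagger-engine-`, as in the command above.

```shell
#!/bin/sh
# has_cache_config is a hypothetical helper. It reads environment entries
# (one NAME=value per line) from stdin and returns 0 if the cache config
# variable is present with a non-empty value, 1 otherwise.
has_cache_config() {
    # The "|| [ -n ... ]" part handles a final line without a trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        case $entry in
            _EXPERIMENTAL_DAGGER_CACHE_CONFIG=?*) return 0 ;;
        esac
    done
    return 1
}

# Example wiring (needs a running engine container, so it is left commented):
# id=$(docker ps -q --filter name=dagger-engine-)
# if docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$id" | has_cache_config; then
#     echo "cache config is set on the engine"
# else
#     echo "cache config is missing: remove the engine container and re-run the pipeline"
# fi
```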

muted moth
boreal grotto
plucky root
#

@rapid viper was making quite a few caching-related commits. Any info on a timeline there or guidance for folks wanting to give it a test drive?

visual shell
#

The S3 cache export as it exists today is roughly in its final shape: it exposes the underlying buildkit cache export feature. The interface might shift a little (env variable or flag name) but other than that, it won’t move. It is a naive caching method but it is usually better than no cache (depends on the exact workload though)

Meanwhile we are working on a global caching service that will be much better, because it will coordinate caching across all your engines, move data between cold and hot cache asynchronously, persist cache mounts, and learn from runtime telemetry to make better caching decisions over time. That’s where most of Erik’s commits are going. The engine side of that feature will be open source but it will require a Dagger Cloud subscription to work (because Cloud orchestrates caching decisions based on runtime telemetry). DM me if you are interested in early access to that.

frozen mango
#

and just to add, buildkit s3 cache will not be of any benefit unless you use custom ec2 runners that fetch the cache from within the aws internal network. Without it, the speed is terrible.
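One rough way to sanity-check that before enabling the cache is to compare the bucket's region with the runner's region. This is only a sketch under assumptions: the helper names are hypothetical, and being in the same region is a proxy for "inside the AWS network", not a guarantee of it. The normalization covers two S3 quirks: buckets in us-east-1 report an empty location constraint (printed as `None` by the AWS CLI with `--output text`), and some old buckets report the legacy value `EU` for eu-west-1.

```shell
#!/bin/sh
# normalize_bucket_region is a hypothetical helper that maps the
# LocationConstraint value reported for a bucket to a plain region name.
normalize_bucket_region() {
    case $1 in
        ""|None|null) echo "us-east-1" ;;  # empty constraint means us-east-1
        EU) echo "eu-west-1" ;;            # legacy value for eu-west-1
        *) echo "$1" ;;
    esac
}

# same_region is a hypothetical helper: $1 = bucket location constraint,
# $2 = region the runner is in. Returns 0 when they match.
same_region() {
    [ "$(normalize_bucket_region "$1")" = "$2" ]
}

# Example wiring (needs AWS credentials, so it is left commented; the bucket
# name and runner region are placeholders):
# location=$(aws s3api get-bucket-location --bucket some-bucket \
#     --query LocationConstraint --output text)
# same_region "$location" "eu-west-1" || echo "bucket is in another region; expect slow cache transfers"
```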

muted moth
#

That's precisely the setup I'm working with - ephemeral self-hosted gitlab runners

visual shell
#

Right, we also want to bring a cold cache as close to your engines as possible - ideally same region on the same hosting provider

#

As well as manage the hot cache (local storage) of each node. That way we can orchestrate moving blobs between runners and cold cache; but also direct transfer between runners on the same cluster

#

Initial results are… very promising 😇

muted moth
#

Looking forward to hearing/seeing more about Dagger Cloud!

visual shell
boreal grotto
#

👋 closing this thread

reef oar
#

hi there, I'm wondering if this thread is still accurate in 2025 -- naive s3 cache for self-managed, smart cache for Dagger Cloud customers? @rapid viper @visual shell

reef oar
#

And a follow-up: can Dagger Cloud be licensed for on-prem installation?

visual shell
# reef oar hi there, I'm wondering if this thread is still accurate in 2025 -- naive s3 ca...

One important change: on Dagger Cloud we are working on integrated cache+compute. The performance is much better that way. It also makes it possible to collapse the entire CI stack, and process git events directly in a dagger cluster, with dynamic scale-out. We think this will blow existing CI platforms out of the water in terms of speed and efficiency, and unlock additional features like automatic test splitting.