#Caching across ephemeral instances


cyan panther
#

Hey there. I'm evaluating the possibility of using Dagger in my team. I'm wondering what's the state of play re. caching across ephemeral runs/instances. I noticed that in some of the original blog posts/launch docs, Dagger Cloud was pitched as providing caching, so you could just use ephemeral instances in CI and still get the caching gains of Dagger via the cloud integration. From what I can see now this isn't an advertised feature on the https://dagger.io/cloud page, so I'm unsure if this offering has been scaled back or removed?

We are most likely looking at deploying Dagger into Kubernetes, so advice in that direction would be useful. This page https://docs.dagger.io/integrations/kubernetes/ notes the problem that scaling nodes down causes their caches to perish, but doesn't really propose a solution afaict.

hushed knot
# cyan panther Hey there. I'm evaluating the possibility of using Dagger in my team. I'm wonder...

Hi Owen. There is no single perfect solution but there are several viable ways to achieve cache persistence, each with their own tradeoffs:

  1. Our hosted cache service is still in technical evaluation. It has production users, but we stopped taking new signups while we figure out the technical tradeoffs.

  2. Self-hosting on Kubernetes & relying on the nodes' local storage is a popular option, as you noticed. Of course, the longer your nodes live, the longer your cache will live. You can complement this with your own storage persistence, e.g. start all your nodes with an EBS/ceph/gluster snapshot of a pre-warmed Dagger engine state.

  3. We are partnering with hosted CI providers to offer the above as a managed solution. The first partner is depot.dev, with others planned.

  4. You can use buildkit's cache export feature. It has limitations of its own but can be a good stopgap as well.

Besides these solutions, we're overhauling Dagger's caching system to be less dependent on buildkit. That dependency is the root cause for this problem not being solved. Once that's done, things will get much smoother: the engine will become much more stateless and there will be options for storage backends that "just work".

cyan panther
#

Thanks for the comprehensive reply!

frail nexus
#

@hushed knot if we wanted to do option 2, is there a lean option for copying a directory to capture the cache? We are using k8s nodes with NVMe disks for speed, but if we could pre-warm with a capture in an EBS volume that might be worth exploring. Hurts startup time, but might be worth it...

hushed knot
frail nexus
#

Yeah, I think that is where we are mounting the nvme disk.

hushed knot
#

just remember dagger doesn't support concurrent writers or in-flight snapshots
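
A minimal sketch of what a capture step could look like under that constraint. It assumes the engine using the directory has already been stopped, which the code cannot verify. `capture_snapshot` and both paths are hypothetical names for illustration, not part of Dagger.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def capture_snapshot(state_dir: str, archive_path: str) -> str:
    """Archive state_dir into a gzip tarball at archive_path.

    Assumption: nothing is writing to state_dir while this runs.
    The caller is responsible for stopping the engine first.
    """
    src = Path(state_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"state directory not found: {src}")

    dest = Path(archive_path)
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temp file next to the destination, then rename it, so a
    # half-written archive is never visible under the final name.
    fd, tmp_name = tempfile.mkstemp(dir=str(dest.parent), suffix=".partial")
    os.close(fd)
    try:
        with tarfile.open(tmp_name, "w:gz") as tar:
            # arcname="." stores paths relative to the state directory.
            tar.add(str(src), arcname=".")
        os.replace(tmp_name, str(dest))
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return str(dest)
```

The archive should live outside `state_dir`, otherwise it would try to include itself.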

frail nexus
#

Yeah, it would just be to seed the node on startup.

#

Don't allow scheduling until that is done
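
A rough sketch of that seed-then-gate idea, assuming the snapshot is mounted as a plain directory and that a node-level check (for example a Kubernetes startup or readiness probe) looks for a marker file before work is scheduled. `seed_state`, `is_ready`, the marker name, and the paths passed in are all hypothetical, not Dagger features.

```python
import shutil
from pathlib import Path

MARKER_NAME = ".seed-complete"  # hypothetical marker file name


def seed_state(snapshot_dir: str, state_dir: str) -> bool:
    """Copy snapshot_dir into state_dir once; return True if a copy happened."""
    src = Path(snapshot_dir)
    dst = Path(state_dir)
    marker = dst / MARKER_NAME

    if marker.exists():
        return False  # already seeded on an earlier start

    dst.mkdir(parents=True, exist_ok=True)
    if any(dst.iterdir()):
        # Unknown or half-copied content: refuse rather than merge into it.
        raise RuntimeError(f"{dst} is not empty and has no seed marker")

    shutil.copytree(
        src,
        dst,
        symlinks=True,        # keep symlinks as links instead of following them
        dirs_exist_ok=True,   # dst already exists (Python 3.8+)
        ignore=shutil.ignore_patterns(MARKER_NAME),  # never copy a stale marker
    )

    # Written last, so its presence means the copy ran to completion.
    marker.touch()
    return True


def is_ready(state_dir: str) -> bool:
    """The check a probe could run before the node accepts work."""
    return (Path(state_dir) / MARKER_NAME).exists()
```

Nothing else should write to the target directory until `is_ready` returns true. Note that `shutil.copytree` does not preserve file ownership, so a block-level snapshot or an archive tool that keeps owners may be a better fit if the state depends on it.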

hushed knot
#

but I believe you can spawn any number of instances from the same state snapshot

frail nexus
#

K. Wanted to check since I know sometimes metadata sneaks into those kinds of things and makes them non-transferable.

hushed knot
#

I believe not but @pliant prism @novel pulsar @tacit sorrel or @solid whale can confirm

#

if by chance there is, it would be superficial and easy to scrub

#

since the engine is designed to be stateless