#How does DAG save computation


honest dome
#

Moving to a thread (should have done so sooner, sorry!)

toxic kayak
#

All good, again I appreciate your valuable time.

honest dome
#

It boils down to caching. A lot of the benefits of Dagger derive from automatic caching of everything, all the time. Every time you execute a tool as part of your pipeline, you're effectively building or downloading the container image for that tool on the fly.

So in a typical Dagger pipeline, even if your source code changes and you have to re-run the app build and tests, a lot of the work needed to get to that point will already be cached.
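
To make the caching point concrete, here's a minimal Python sketch of the general idea. It is a toy model, not Dagger's actual implementation: the `Engine` class, the step names, and the `golang:1.22` / `go.sum` values are all made up for illustration. Each step is keyed by a hash of its name and inputs, so a step only re-runs when something upstream of it actually changed.

```python
import hashlib


class Engine:
    """Toy stand-in for a caching engine (hypothetical, not Dagger's API)."""

    def __init__(self):
        self.cache = {}     # step key -> cached result
        self.executed = []  # names of the steps that actually ran

    def run(self, name, inputs, fn):
        # The key covers the step name and its exact inputs, so any change
        # to an input yields a new key and forces the step to run again.
        key = hashlib.sha256(repr((name, inputs)).encode()).hexdigest()
        if key not in self.cache:
            self.executed.append(name)
            self.cache[key] = fn(*inputs)
        return self.cache[key]


def pipeline(engine, source):
    # Illustrative three-step pipeline: toolchain -> dependencies -> build.
    toolchain = engine.run("pull-toolchain", ("golang:1.22",),
                           lambda image: f"image({image})")
    deps = engine.run("fetch-deps", (toolchain, "go.sum v1"),
                      lambda tool, lock: f"deps({lock})")
    return engine.run("build-and-test", (deps, source),
                      lambda d, src: f"binary({src})")


engine = Engine()

pipeline(engine, "main.go v1")
first_run = list(engine.executed)

engine.executed.clear()
pipeline(engine, "main.go v2")  # only the source code changed
second_run = list(engine.executed)

print(first_run)   # all three steps ran
print(second_run)  # only the build step ran again
```

On the second run only `build-and-test` executes, because the toolchain and dependency steps have the same inputs as before and are served from the cache.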

#

Ideally, you would run any number of ephemeral Dagger engines, and they would magically share cache data. We are driving towards that, but in practice the architecture requires a long-running service to coordinate access to the storage. In other words, you cannot have N engines sharing a state directory.

#

So for architectural reasons, today you need 1 long-running service per hot cache (local state directory)

#

On top of that, there is container nesting. Dagger needs to run Linux containers. If Dagger is running inside a container (for example in a Kubernetes pod), it still needs a way to run containers. Docker-in-Docker is in theory an option, and it does work, but in production you often want to avoid it. So having a companion daemonset also addresses that problem.

#

As a result, the most pragmatic way (that we know of) to run Dagger on a Kubernetes cluster today is to split it in two:

  1. Run as much as possible inside the pod, or in an ephemeral container instrumented by the pod

  2. Run as little as possible in a companion daemonset, to broker access to the host filesystem and container runtime

honest dome
#

To clarify @toxic kayak I understand the concern you explained initially: how to make sure this isn't a step backwards in efficient resource allocation. We obviously need to articulate our answer better, and make it available in a document somewhere.

jade frigate
#

It's also worth mentioning that in the DAG / Dagger model, having fewer build nodes is "theoretically" better: if two build pipelines land in the same Dagger engine, they can efficiently de-dup operations between each other and make the best use of caching. That doesn't generally happen with traditional CI platforms, where CI jobs are completely isolated from and unrelated to each other, even though they probably overlap on several of the build steps.
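
A rough Python sketch of that de-dup effect, as a toy model only (the `Engine` class, the step names, and the `api` / `worker` services are hypothetical, and this is not how Dagger is implemented internally). It counts how many operations actually execute when two overlapping pipelines each get their own engine versus landing on the same one.

```python
class Engine:
    """Toy engine that caches operations by name and inputs (hypothetical)."""

    def __init__(self):
        self.cache = {}
        self.executions = 0  # operations that really ran (cache misses)

    def run(self, op, *inputs):
        key = (op, inputs)
        if key not in self.cache:
            self.executions += 1
            self.cache[key] = f"{op}{inputs}"
        return self.cache[key]


def build_service(engine, service):
    # Both pipelines share the first two steps and differ only in the last.
    base = engine.run("pull-base-image")
    deps = engine.run("install-deps", base)
    return engine.run("build", deps, service)


services = ("api", "worker")

# Traditional CI: every job gets a fresh, isolated engine.
isolated_total = 0
for service in services:
    engine = Engine()
    build_service(engine, service)
    isolated_total += engine.executions

# Both pipelines land on the same long-running engine.
shared = Engine()
for service in services:
    build_service(shared, service)
shared_total = shared.executions

print(isolated_total, shared_total)
```

In this toy setup the isolated jobs execute 6 operations in total, while the shared engine executes 4, since the base image and dependency steps are only done once.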

honest dome
#

It just occurred to me that this is a very relevant topic for #kubernetes 😁