#How does Dagger use BuildKit under the hood?


runic flicker
#

Good morning! I'm coming to Dagger after using Earthly. Earthly used a forked version of BuildKit as their daemon under the hood. I'm curious how Dagger uses BuildKit - do you have a forked version as well, or do you use the stock BuildKit version? Also, how deeply integrated is Dagger with BuildKit - are there plans for other "backends"?

brisk sable
# runic flicker Good morning! I'm coming to Dagger after using Earthly. Earthly used a forked ve...

👋
we do have a forked version of it but it's not heavily modified. We try as much as we can to push our changes upstream but sometimes we need to use our fork given that upstream release cycles don't match ours.

Regarding other backends, we're currently decoupling ourselves from some of the buildkit layers (like caching) so we have more autonomy and flexibility to deliver better solutions on that front. Wondering what use case you have in mind when you're thinking about other type of backends

runic flicker
#

awesome, that's a great overview, thanks! in case it's helpful to you/your team, the Earthly team recently did a write up of the modifications they made to their buildkit fork... it's here if it's useful to y'all at all: https://docs.google.com/document/d/1cakoPsXaw_IBUFUhPkz8aTyGJamyLGq91dxeAVWTgpM/edit?usp=sharing

runic flicker
# brisk sable 👋 we do have a forked version of it but it's not heavily modified. We try as ...

Wondering what use case you have in mind when you're thinking about other type[s] of backends

I'm thinking more about if users ever wanted to use Dagger as a general orchestration engine over resources that don't live within Docker containers...

For example, if I wanted to manage a server over SSH, how easy would it be for me to write an "ansible" module, to enable writing code like:

dag.ssh_server("xyz.example.com")
.copy_directory("/opt/application", source)
.exec("/opt/application/install.sh")

I know this starts to deconstruct the caching and sandbox guarantees that Docker provides you, but it could be nice to allow developers access to build workflows like that (as long as they're willing to accept the tradeoffs... something like dag.ssh_server(...).exec(...) could never be cached by default (without something like a cacheKey or some other caching logic))
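To make that caching tradeoff concrete, here is a minimal Python sketch of the idea. Everything in it is hypothetical: `SshServer`, the `cache_key` parameter, and the injected `run` callable are names made up for illustration, not Dagger APIs. Remote commands always re-run unless the caller opts in with an explicit cache key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SshServer:
    """Hypothetical non-container primitive; not part of Dagger's API."""

    host: str
    # `run` stands in for a real SSH transport so the sketch stays self-contained.
    run: Callable[[str, str], str]
    _cache: Dict[Tuple[str, str, str], str] = field(default_factory=dict)

    def exec(self, command: str, cache_key: Optional[str] = None) -> str:
        # No cache key: the remote host is mutable state, so always re-run.
        if cache_key is None:
            return self.run(self.host, command)
        # With a cache key: reuse the stored result for (host, command, key).
        key = (self.host, command, cache_key)
        if key not in self._cache:
            self._cache[key] = self.run(self.host, command)
        return self._cache[key]


calls = []


def fake_run(host: str, command: str) -> str:
    """Records each 'remote' call instead of opening an SSH connection."""
    calls.append((host, command))
    return f"ran {command} on {host}"


server = SshServer("xyz.example.com", fake_run)
server.exec("/opt/application/install.sh")                  # runs
server.exec("/opt/application/install.sh")                  # runs again
server.exec("/opt/application/install.sh", cache_key="v1")  # runs, then stored
server.exec("/opt/application/install.sh", cache_key="v1")  # served from cache
print(len(calls))  # prints 3
```

Whoever picks the cache key takes on the job of deciding when the remote state has changed, which is the tradeoff described above.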

brisk sable
#

all the DAG properties of the engine will still apply to the workflow that you're describing

#

what I'm saying is that all the copy_directory and exec operations can be written in userland (module)
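As a rough illustration of what "written in userland" could mean, the sketch below builds the `scp` and `ssh` command lines that a module might run inside an ordinary container step that has an SSH client installed. The helper names are made up for this example and the hand-off to Dagger is left out, so treat it as a sketch under those assumptions, not a published module.

```python
import shlex
from typing import List


def copy_directory_argv(host: str, source: str, dest: str) -> List[str]:
    """argv for recursively copying a local directory to the remote host."""
    return ["scp", "-r", source, f"{host}:{dest}"]


def exec_argv(host: str, command: str) -> List[str]:
    """argv for running one command on the remote host over SSH."""
    return ["ssh", host, command]


# Hypothetical workflow mirroring the ssh_server example earlier in the thread.
steps = [
    copy_directory_argv("xyz.example.com", "./source", "/opt/application"),
    exec_argv("xyz.example.com", "/opt/application/install.sh"),
]
for argv in steps:
    print(shlex.join(argv))
```

Each argv list could then be passed to a container exec step (for example `with_exec` in the Python SDK), so the workflow still starts from a container and stays inside the engine's DAG.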

runic flicker
#

oh, interesting... are there any published examples of workflows that don't start with the container primitives? and does that mean caching is implemented at the module level for those?

brisk sable
runic flicker
#

ah - so everything still has to start with the container(...) primitive today... I see

brisk sable
#

it's pretty much the only way that you can guarantee that you can run your workflows from anywhere and that they should still work

runic flicker
#

but the existing dagger setup at least requires you to run the dagger engine inside a container, which requires docker/podman/etc...

is there a path for a developer today to provide an API compatible with dagger to use a different base primitive? if we say, OK, we're going to agree that the only way you can run this workflow is if you set up an Ansible environment on your local machine...

if the developer wants to choose to make a workflow not portable and skip docker, is there a path to do that?

brisk sable
#

but the existing dagger setup at least requires you to run the dagger engine inside a container, which requires docker/podman/etc...

correct. We currently run the engine in docker/podman on the user's machine mostly for convenience. Because the engine requires an environment with some custom binaries and libraries, using a container runtime seemed like the best approach. We do see a future where we provide something similar to a Flatpak bundle where you can just run dagger on your machine without the Docker or Podman dependency.

#

if the developer wants to choose to make a workflow not portable and skip docker, is there a path to do that?

that hasn't been discussed at length since it kind of goes against the problem Dagger is trying to solve. Besides the DAG caching, I'm wondering which other benefit Dagger could bring in a world where portability and reproducibility is completely removed from the equation. I'd think there are tools that might already be solving this problem, like Airflow for instance.

runic flicker
#

I'm wondering which other benefit Dagger could bring in a world where portability and reproducibility is completely removed from the equation.
I don't know that I would go that far. I'd frame it more as choosing which underlying platform to treat as the "ground truth" for building portable workflows.

Maybe it was a mistake for me to phrase it as a developer choosing to make a workflow not portable... what I really imagine is something like a team who all uses the same software stack ("ground truth") deciding to use that as the basis for their dagger workflows.

brisk sable
#

Maybe it was a mistake for me to phrase it as a developer choosing to make a workflow not portable... what I really imagine is something like a team who all uses the same software stack ("ground truth") deciding to use that as the basis for their dagger workflows.

That's where this becomes a challenge. How does this look in practice? I have never seen a group of people in the wild with a "ground truth" setup where everything on their machines is exactly the same. This is the reason why tools like Vagrant, Docker, etc. became so popular.

If I had to do this in my organization, I guess I'd choose one of the following strategies:

  1. The current way Dagger works.

  2. If running Dagger locally becomes a challenge for compliance reasons, I'd spin up the engines remotely and then point the developer's engines to a remote VM. This not only addresses the compliance issue but also brings better connectivity and better cache reuse.

#

If you think [2] is something that might be interesting to you but you don't want to deal with the task of running and managing those engines, happy to chat about that 🙂