# How does Dagger use BuildKit under the hood?
Good morning! I'm coming to Dagger after using Earthly. Earthly used a forked version of BuildKit as its daemon under the hood. I'm curious how Dagger uses BuildKit: do you have a forked version as well, or do you use stock BuildKit? Also, how deeply integrated is Dagger with BuildKit, and are there plans for other "backends"?
We do have a forked version, but it's not heavily modified. We try as much as we can to push our changes upstream, but sometimes we need to use our fork because upstream release cycles don't match ours.
Regarding other backends, we're currently decoupling ourselves from some of the BuildKit layers (like caching) so we have more autonomy and flexibility to deliver better solutions on that front. I'm wondering what use case you have in mind when you're thinking about other types of backends.
Awesome, that's a great overview, thanks! In case it's helpful to you/your team, the Earthly team recently did a write-up of the modifications they made to their BuildKit fork. It's here if it's useful to y'all: https://docs.google.com/document/d/1cakoPsXaw_IBUFUhPkz8aTyGJamyLGq91dxeAVWTgpM/edit?usp=sharing
> Wondering what use case you have in mind when you're thinking about other type[s] of backends
I'm thinking more about whether users might ever want to use Dagger as a general orchestration engine over resources that don't live inside Docker containers...
For example, if I wanted to manage a server over SSH, how easy would it be to write an "Ansible"-style module that enables code like:
```python
(
    dag.ssh_server("xyz.example.com")
    .copy_directory("/opt/application", source)
    .exec("/opt/application/install.sh")
)
```
I know this starts to erode the caching and sandboxing guarantees that Docker provides, but it could be nice to give developers access to workflows like that, as long as they're willing to accept the tradeoffs. Something like dag.ssh_server(...).exec(...) could never be cached by default (not without something like a cacheKey or some other caching logic).
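The caching caveat can be made concrete with a toy model. This is not Dagger's API or its caching implementation; `OpCache`, `run`, and the `cache_key` parameter are hypothetical names. The sketch assumes an operation against an external server is only safe to reuse when the caller supplies an explicit cache key, and otherwise re-runs every time.

```python
import hashlib
import json
from typing import Callable, Optional


class OpCache:
    """Toy result cache for operations on external resources (hypothetical, not Dagger's API)."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.executions = 0  # how many times an operation actually ran

    @staticmethod
    def _digest(op: str, args: list[str], cache_key: str) -> str:
        # Hash the operation, its arguments, and the user-supplied key together.
        payload = json.dumps({"op": op, "args": args, "key": cache_key}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, op: str, args: list[str], fn: Callable[[], str],
            cache_key: Optional[str] = None) -> str:
        if cache_key is None:
            # No key: nothing tells us the remote server is unchanged, so never cache.
            self.executions += 1
            return fn()
        digest = self._digest(op, args, cache_key)
        if digest not in self._results:
            self.executions += 1
            self._results[digest] = fn()
        return self._results[digest]


def install() -> str:
    return "installed"


cache = OpCache()
script = ["/opt/application/install.sh"]
cache.run("exec", script, install)                        # runs
cache.run("exec", script, install)                        # runs again (no key)
cache.run("exec", script, install, cache_key="release-1")  # runs
cache.run("exec", script, install, cache_key="release-1")  # cache hit
print(cache.executions)  # 3
```

The design choice here is that "uncached" is the safe default, and reuse is an explicit opt-in that the user takes responsibility for.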
But you can already get this kind of experience today with a module that connects to the target server(s) over SSH.
All of the engine's DAG properties will still apply to the workflow you're describing.
What I mean is that the copy_directory and exec operations can all be written in userland (i.e., in a module).
Oh, interesting... Are there any published examples of workflows that don't start with the container primitives? And does that mean caching is implemented at the module level for those?
The way I was thinking about it, your module would still require a container with the SSH client. That's actually a good thing, since in any case you'd probably want to use a stable client no matter where you're running your pipelines from.
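A rough sketch of what such a userland module could do, assuming a hypothetical `SshServer` builder (not part of Dagger) that records `copy_directory` and `exec` steps and renders them as `rsync`/`ssh` argument lists. Nothing runs here; in an actual module you would presumably execute each list inside a container with a stable OpenSSH client installed (for example via `with_exec`). It also assumes `rsync` is available on both ends and that `source` is a local path rather than a Dagger `Directory`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SshServer:
    """Hypothetical userland 'SSH module' (not a Dagger API).

    Each method returns a new SshServer with one more recorded step, mirroring the
    chained style of dag.ssh_server(...).copy_directory(...).exec(...).
    """

    host: str
    user: str = "root"
    steps: tuple[tuple[str, ...], ...] = ()

    @property
    def target(self) -> str:
        return f"{self.user}@{self.host}"

    def copy_directory(self, remote_path: str, local_path: str) -> "SshServer":
        # A trailing slash on the source makes rsync copy the directory's contents
        # into remote_path (assumes rsync is installed locally and remotely).
        src = local_path.rstrip("/") + "/"
        step = ("rsync", "-a", src, f"{self.target}:{remote_path}")
        return replace(self, steps=self.steps + (step,))

    def exec(self, command: str) -> "SshServer":
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        step = ("ssh", "-o", "BatchMode=yes", self.target, command)
        return replace(self, steps=self.steps + (step,))

    def commands(self) -> list[list[str]]:
        # Render the recorded steps as argv lists, e.g. to hand to with_exec(...)
        # in a container that has an SSH client.
        return [list(step) for step in self.steps]


plan = (
    SshServer("xyz.example.com")
    .copy_directory("/opt/application", "./build")
    .exec("/opt/application/install.sh")
)
for argv in plan.commands():
    print(argv)
```

Because each call returns a new immutable value, a partially built plan can be reused as a base for several workflows without one branch leaking steps into another.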
Ah, so everything still has to start with the container(...) primitive today. I see.
Correct.
It's pretty much the only way to guarantee that you can run your workflows from anywhere and that they will still work.
But the existing Dagger setup at least requires you to run the Dagger engine inside a container, which requires Docker/Podman/etc...
Is there a path today for a developer to provide a Dagger-compatible API that uses a different base primitive? Say we agree that the only way you can run this workflow is if you set up an Ansible environment on your local machine...
If the developer wants to choose to make a workflow not portable and skip Docker, is there a path to do that?
> But the existing Dagger setup at least requires you to run the Dagger engine inside a container, which requires Docker/Podman/etc...
Correct. We currently run the engine in Docker/Podman on the user's machine mostly for convenience. Because the engine requires an environment with some custom binaries and libraries, using a container runtime seemed like the best approach. We do see a future where we provide something similar to a Flatpak bundle so you can just run Dagger on your machine without the Docker or Podman dependency.
> If the developer wants to choose to make a workflow not portable and skip Docker, is there a path to do that?
That hasn't been discussed at length, since it kind of goes against the problem Dagger is trying to solve. Besides the DAG caching, I'm wondering which other benefit Dagger could bring in a world where portability and reproducibility are completely removed from the equation. I'd think there are tools that might already be solving this problem, like Airflow for instance.
> I'm wondering which other benefit Dagger could bring in a world where portability and reproducibility are completely removed from the equation.
I don't know that I would go that far; I'd frame it more as choosing which underlying platform to make the "ground truth" for building portable workflows.
Maybe it was a mistake for me to phrase it as a developer choosing to make a workflow not portable... What I really imagine is something like a team that all uses the same software stack (the "ground truth") deciding to use it as the basis for their Dagger workflows.
That's where this becomes a challenge. How does this look in practice? I've never seen a group of people in the wild with a "ground truth" setup where everything on their machines is exactly the same. That's the reason tools like Vagrant, Docker, etc. became so popular.
If I had to do this in my organization, I'd choose one of the following strategies:
1. The current way Dagger works.
2. If running Dagger locally becomes a challenge for compliance reasons, I'd spin up the engines remotely on VMs and point developers' Dagger clients at them. This not only addresses the compliance issue but also brings better connectivity and better cache reuse.
If you think [2] might be interesting to you but you don't want to deal with running and managing those engines, happy to chat about that.