Hi folks! I’m Maarten from HackerOne. | Dagger

bright summit Jul 3, 2024, 5:00 PM

#

Hey! Welcome 🙂

It's definitely possible, and I don't think it's a bad practice.

Dagger is just code, so you can do anything that you can do in Go, Python, or TypeScript.

The way I would imagine the thing you're describing working is:

1. Write a test function that accepts some parameters (like the directory of the thing you're trying to test, and maybe a test command if it differs across sub-projects).
2. Write a function or a long CLI command string (or both!) that calls function 1 with all the monorepo parameters defined.
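
As a rough sketch of the second step, the per-project parameters can live in one table from which the CLI strings are generated. The project paths, the function name `test`, and the `--source` and `--command` flags are all hypothetical here; they stand for whatever the parametrized test function from the first step actually exposes.

```python
import shlex

# Hypothetical monorepo layout: sub-project directory -> test command.
PROJECTS = {
    "services/api": ["pytest", "-q"],
    "services/worker": ["pytest", "-q", "tests/unit"],
    "libs/common": ["python", "-m", "unittest"],
}


def dagger_call(directory: str, command: list[str]) -> str:
    """Build the CLI string for one sub-project.

    Assumes a Dagger function named `test` that takes `--source` and
    `--command`; the function and both flags are hypothetical.
    """
    args = [
        "dagger",
        "call",
        "test",
        f"--source=./{directory}",
        f"--command={shlex.join(command)}",
    ]
    # shlex.join quotes any argument containing spaces for the shell.
    return shlex.join(args)


def all_calls(projects: dict[str, list[str]]) -> list[str]:
    """One command line per sub-project, in a stable (sorted) order."""
    return [dagger_call(d, c) for d, c in sorted(projects.items())]


if __name__ == "__main__":
    for line in all_calls(PROJECTS):
        print(line)
```

Keeping the table in code means a new sub-project is a one-line change, and the same table could just as well feed a wrapper function instead of CLI strings.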

Is there a particular SDK that you were most interested in?

magic pollen Jul 3, 2024, 9:27 PM

#

SDK is going to be Python!

bright summit Jul 3, 2024, 10:36 PM

#

Great choice 🙂

Here's a fairly complex Python example that runs a concurrent build matrix across Python versions: https://github.com/levlaz/boundary-layer/blob/master/dagger/src/main/__init__.py

Perhaps it can serve as inspiration, let me know if you have any questions!

GitHub

boundary-layer/dagger/src/main/__init__.py at master · levlaz/bound...

Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform - levlaz/boundary-layer

magic pollen Jul 4, 2024, 6:00 AM

#

Thanks for sharing! Then I would guess that, given a monorepo with multiple projects, you would:

Create a Dagger module per project
Implement a test function in each module for each project
Create a root module which includes the other modules and awaits each test invocation
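
The "awaits each test invocation" part of the root module could look roughly like the following. This is a plain `asyncio` sketch, not Dagger SDK code: `run_project_tests` is a hypothetical stand-in for calling one sub-module's test function, and the project names are made up.

```python
import asyncio


async def run_project_tests(name: str) -> str:
    """Stand-in for one project's test function.

    In a real root module this would await the sub-module's test function;
    here it only yields control once so the example runs anywhere.
    """
    await asyncio.sleep(0)
    return f"{name}: ok"


async def run_all(projects: list[str]) -> list[str]:
    """Start every project's tests concurrently and wait for all of them."""
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(run_project_tests(p) for p in projects))


if __name__ == "__main__":
    print(asyncio.run(run_all(["api", "worker", "common"])))
```

Awaiting the calls together rather than one after another is what lets independent projects run in parallel.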

An approach like this raises a couple of questions:

Can you control the parallelism of the generated dag by submitting the job to a cluster of machines? Or is the entire dag executed on a single machine?
Each project might require its own host dependencies (files, secrets, env variables). Do you require passing all of these as arguments to the cli? Or is there also a way to store these in code? Similar to Bazel .bazelrc for config and BUILD files for files/directories?

bright summit Jul 5, 2024, 5:21 PM

#

Can you control the parallelism of the generated dag by submitting the job to a cluster of machines? Or is the entire dag executed on a single machine?

Right now the entire DAG is executed on a single machine. It can do many things in parallel, but you are limited to scaling vertically. Vertical scaling works quite well for many people, though, and it would be great to hear about your experience if you give it a try.

Horizontal scaling is still a WIP. One approach is to use your existing CI config to dispatch different steps to different Dagger engines.
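
One way to picture the CI-dispatch workaround: each CI job gets its own engine and picks a slice of the projects. The sketch below assumes the CI system hands every job a 0-based `index` and a `total` job count (for example through a job matrix); the helper name `shard` and the project names are hypothetical.

```python
def shard(projects: list[str], index: int, total: int) -> list[str]:
    """Return the projects that CI job `index` (0-based) of `total` should run."""
    if not 0 <= index < total:
        raise ValueError("index must be in range(total)")
    # Sort first so every job computes the same partition.
    return sorted(projects)[index::total]


if __name__ == "__main__":
    projects = ["api", "worker", "common", "docs"]
    for i in range(2):
        print(i, shard(projects, i, 2))
```

Every project lands in exactly one slice, so the jobs together cover the monorepo without overlapping.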

Each project might require its own host dependencies (files, secrets, env variables). Do you require passing all of these as arguments to the cli? Or is there also a way to store these in code? Similar to Bazel .bazelrc for config and BUILD files for files/directories?

The concept of "host" changes a bit in Dagger: remember, everything is going to be executed in a container runtime. I would suggest using a function like base() to define all the dependencies. Check out this example from the project I shared before: https://github.com/levlaz/boundary-layer/blob/master/dagger/src/main/__init__.py#L22

In this case it returns the project-specific build container, but the fact that it returns a container means that you can chain it indefinitely. So in your case, you may have a global base() function that includes the most common dependencies shared across all projects, and then a similar local base() function within a given project that chains on the things that project needs.

The benefit of this approach is that you only have to build the original base() once and can then add things incrementally; everything will be cached as much as possible automatically.
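
To make the global/local base() idea concrete, here is a plain-Python sketch of the chaining pattern. `Recipe` is a hypothetical stand-in for a container, not a Dagger type, the build steps are invented, and `@cache` only loosely mimics the engine's automatic caching.

```python
from dataclasses import dataclass
from functools import cache


@dataclass(frozen=True)
class Recipe:
    """Plain-Python stand-in for a container: an immutable tuple of build steps."""

    steps: tuple[str, ...] = ()

    def with_step(self, step: str) -> "Recipe":
        # Return a new recipe instead of mutating, so a shared base stays reusable.
        return Recipe(self.steps + (step,))


@cache
def base() -> Recipe:
    """Global base: dependencies common to every project (hypothetical steps)."""
    return Recipe().with_step("from python:3.12-slim").with_step("pip install pytest")


def api_base() -> Recipe:
    """Project-local base: chains project-specific needs onto the global one."""
    return base().with_step("pip install -r services/api/requirements.txt")


if __name__ == "__main__":
    print(api_base().steps)
```

Because each call returns a new value, the global base is never modified by a project, which is what makes it safe to share and reuse.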


magic pollen Jul 7, 2024, 4:56 PM

#

Thanks for the context! I’ll try to experiment!

#

Also, I read somewhere that all files exported to the host are written once, at the end of the DAG execution? I’m asking because I have the following use case:

1. Build a NixOS VM inside a container using https://github.com/nix-community/nixos-generators
2. Export the image to the host
3. Run the image using Lima (because I assume you cannot do virtualization inside a Dagger container). Communicate with Lima on the host from within a container using https://github.com/msoap/shell2http
4. Start a testinfra test suite inside a container against the running VM

If it’s possible to export the image during execution, instead of at the end, all these steps could become a single Dagger invocation!

GitHub

GitHub - nix-community/nixos-generators: Collection of image builde...

Collection of image builders [maintainer=@Lassulus] - nix-community/nixos-generators

GitHub

GitHub - msoap/shell2http: Executing shell commands via HTTP server

Executing shell commands via HTTP server. Contribute to msoap/shell2http development by creating an account on GitHub.

#

Alternatives to step 3 would be:

Ability to run a dagger function on the host, not in a container
Ability to run a VM from within a container
Ability to run a Dagger function directly in a vm

bright summit Jul 8, 2024, 5:17 PM

#

What is Lima?

(because I assume you cannot do virtualization inside a Dagger container).

You can run Docker in Docker with Dagger, so if Lima can run inside of a container then you can run it as a service as a part of your pipeline.

Alternatives to step 3 would be:

Ability to run a dagger function on the host, not in a container

Ability to run a VM from within a container

Ability to run a Dagger function directly in a vm

It's not possible to do any of these, afaik.

magic pollen Jul 9, 2024, 5:52 AM

#

Lima is an easy way to create VMs: https://github.com/lima-vm/lima

GitHub

GitHub - lima-vm/lima: Linux virtual machines, with a focus on runn...

Linux virtual machines, with a focus on running containers - lima-vm/lima

#

Gotcha! Then I’m going to try to upload the image during the build to the macOS host and boot the vm there.

bright summit Jul 9, 2024, 7:07 AM

#

Sorry, I am confused about how VMs are getting into the mix 🙂

What is the purpose of Lima here? Why not just run the container without Lima?

magic pollen Jul 9, 2024, 7:40 AM

#

Haha no worries, it’s also a bit complicated!

The VM part is necessary to run an ISO created by a Dagger build. Dagger is used to create an OS image which I want to run tests against. Unfortunately, it’s not possible to load an OS image into a container (afaik), and therefore a VM is necessary.
