Run code on the host | Dagger

cyan gust Dec 12, 2023, 11:08 AM

No, Dagger doesn't allow this. The properties that make Dagger's graph useful require all operations to be containerized. But you can do this from your own code, intermixed with calls to the Dagger API.

One "escape hatch" that is available is accessing host services. Dagger can orchestrate containers so that they have access to specific network endpoints running on your host network. You could set up an SSH service, then SSH into it from the containers. Or, if the software you need to run on the host has a server mode, just expose that instead.

But if you're using Dagger, containerizing should be the rule and "escaping" the containers the exception. Otherwise you're going against the grain of the platform.

formal ledge Dec 12, 2023, 11:12 AM

I'm not enough of a Linux ninja to know, but is it possible to run containers in a way that a process will outlive the container? I'm still trying to find ways to make stateful Bazel work with Dagger. There is no advertised way to start the Bazel server and then configure the client to talk to a specific server, but I'll see if I can find some hidden flags for it; there are plenty of them in Bazel 😛

cyan gust Dec 12, 2023, 11:13 AM

You can have a container connect to another container as a service, but it will only last for the duration of your Dagger session - so the Bazel server will be wiped after each run, which is not what you want.

One great solution would be to persist the state of the Bazel server each time; then you can use cache volumes like we discussed in the beginning. But I don't know enough about Bazel to tell you whether that's possible.

I think you mentioned long start times

formal ledge Dec 12, 2023, 11:32 AM

Yeah, it's not really possible to persist the state since it's just stored in memory of the Bazel process. I think I'll just have to leave Dagger out of the Bazel part of the pipeline for now.

formal ledge Dec 12, 2023, 12:12 PM

Could I mount a named pipe into the Dagger container and then pipe commands to the host? That does feel very hacky 😉

cyan gust Dec 12, 2023, 12:37 PM

If the only thing you need to run on the host is the Bazel server, I would just run it as a "regular" service, without involving Dagger, and forward the TCP port via Dagger's C2H (container-to-host) networking feature.

I don't think it's worth rigging a way for your Dagger logic to run commands on the host if the only command you'll run is that Bazel server.

unreal rampart Dec 12, 2023, 1:57 PM

Reading a bit more about the Bazel client/server architecture (https://bazel.build/run/client-server), it seems you'll also have to take into account the user ID and the path of the base workspace directory.

Is the Bazel server currently running on your laptop?

formal ledge Dec 12, 2023, 2:15 PM

During local dev the client and server run on my laptop

And in CI they run on the CI host. AFAIK there is no way to start a remote Bazel server and then issue commands to it with the client. I at least have not found a way to do so. It might also not make sense to do so since the client and the server need access to the same set of source files.

cyan gust Dec 12, 2023, 2:32 PM

formal ledge And in CI they run on the CI host. AFAIK there is no way to start a remote Bazel...

That's the part that's not clear to me. How do you start the Bazel server today? If 1) your bazel server is long lived, and 2) your CI jobs which use that bazel server are short-lived, doesn't that require "starting a remote bazel server and then issuing commands to it with the client"?

unreal rampart Dec 12, 2023, 3:37 PM

cyan gust That's the part that's not clear to me. How do you start the Bazel server today?...

I've skimmed the Bazel docs, and it seems the server is automatically started by the client if it's not running (similarly to what Dagger does). Since the CI machines are stateful, the server always uses the same caching directories when it starts.
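
A minimal Python sketch of the "connect to a running server, otherwise start one" pattern described here. The host, port, and `start` callback are hypothetical stand-ins; the real Bazel client finds its server through files in the output base and pid checks, not a fixed port.

```python
import socket
import time
from typing import Callable


def can_connect(host: str, port: int, timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_server(host: str, port: int, start: Callable[[], None],
                  retries: int = 50, delay: float = 0.05) -> bool:
    """Reuse a running server, or start one and wait until it accepts connections.

    Returns True if this call started the server, False if one was already up.
    """
    if can_connect(host, port):
        return False
    start()  # hypothetical: this is where a client would spawn its server
    for _ in range(retries):
        if can_connect(host, port):
            return True
        time.sleep(delay)
    raise TimeoutError(f"server on {host}:{port} did not come up")
```

The first call returns `True` (it started the server) and later calls return `False` (they reuse it), which is what makes repeat invocations on a stateful CI machine cheap.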

Additionally, Bazel starts different servers depending on the user ID and workspace directory of the build. Apparently that's how they support multi-tenancy across multiple projects and users.
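
To make the "one server per user and workspace" point concrete, here is a Python sketch of how the output base path is derived, based on our reading of Bazel's output directory layout docs (`<outputRoot>/_bazel_<user>/<MD5 of the workspace path>`). Treating `~/.cache/bazel` as the output root is an assumption that holds for Linux defaults only. Because the server and its state are keyed by this directory, a container that mounts the workspace at a different path, or runs as a different user, gets a different server.

```python
import getpass
import hashlib
import os
from typing import Optional


def bazel_output_base(workspace: str, user: Optional[str] = None,
                      output_root: Optional[str] = None) -> str:
    """Sketch of <outputRoot>/_bazel_<user>/<md5(workspace path)>."""
    user = user or getpass.getuser()
    # Assumption: Linux default output root; other platforms use other locations.
    output_root = output_root or os.path.expanduser("~/.cache/bazel")
    workspace_hash = hashlib.md5(workspace.encode("utf-8")).hexdigest()
    return os.path.join(output_root, f"_bazel_{user}", workspace_hash)


# Same repo mounted at two different paths -> two different output bases,
# and therefore two different Bazel servers with separate in-memory state.
print(bazel_output_base("/src/repo", user="ci"))
print(bazel_output_base("/work/repo", user="ci"))
```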

@formal ledge let me check if I can find anything on how to make a client connect to an existing server

cyan gust Dec 12, 2023, 3:42 PM

fking bazel

that's what happens when you want your software to wrap everything, and never the other way around. Headaches.

If that's true, then cache volumes should be viable

specifically a cache volume with the shared option

@unreal rampart what docs page are you looking at?

unreal rampart Dec 12, 2023, 3:46 PM

https://bazel.build/run/client-server

now skimming through the code to see how the client <> server communication works

cyan gust Dec 12, 2023, 3:47 PM

FYI @burnt grove 🙂

formal ledge Dec 12, 2023, 3:50 PM

Yeah, this is pretty much correct @unreal rampart
The issue is not really the caches that Bazel uses. They can be either on disk or on remote servers, which works fine with Dagger.

The issue is that Bazel also keeps in-memory state to make incremental builds faster. Rebuilding that state on a cold start can add anywhere from a few seconds to minutes to your build, depending on how many Bazel targets you have in your repo.

cyan gust Dec 12, 2023, 3:50 PM

Right, I now remember that you explained that earlier

So what happens when the first client transparently spawns the server, then exits - it just keeps running in the background, detached from the initial process group?

a sort of auto-daemon mode in the user's session I guess
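
A generic POSIX illustration of that "auto-daemon" behavior in Python. This is not Bazel's code (Bazel ships its own `daemonize` helper in the install base). Starting the child with `start_new_session=True` runs `setsid()` in it, so the child leads its own session and process group and keeps running after the launching client exits.

```python
import os
import subprocess
import sys
from typing import List


def spawn_detached(argv: List[str]) -> subprocess.Popen:
    """Start argv in its own session so it is not tied to the caller's process group."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # the child calls setsid() before exec
        close_fds=True,
    )


if __name__ == "__main__":
    child = spawn_detached([sys.executable, "-c", "import time; time.sleep(2)"])
    print(f"child {child.pid} runs in session {os.getsid(child.pid)}")
    # This script can exit now; the child keeps running on its own.
```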

and @formal ledge do you know how the client connects to that server? A named pipe or unix socket at a specific location in the workspace?

Ironically this is quite similar to how Dagger handles its engine container 🙂

unreal rampart Dec 12, 2023, 3:53 PM

here's the code that connects to the server: https://github.com/bazelbuild/bazel/blob/351b9a079d528b073d1f1405597d5bc11f6a8dbd/src/main/cpp/blaze.cc#L1593

seems like gRPC over TCP

seems like the architecture is very "same sandbox" oriented. I can see from the code that the connect function does some pid level checking to make sure the server is running, etc.
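
A Python sketch of the kind of pid check described here, assuming the client has already read a pid from the server's info file. Signal 0 only tests whether the pid exists. Pids are only meaningful inside one pid namespace, which is one reason a client in one container cannot simply check on a server started in another.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Probe a pid with signal 0: nothing is delivered, only existence is checked."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process in *this* pid namespace
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```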

it also expects to have access to the path where the server writes some files to read the info: https://github.com/bazelbuild/bazel/blob/351b9a079d528b073d1f1405597d5bc11f6a8dbd/src/main/cpp/blaze.cc#L1596 😬

there might be a way to hack this by running the Dagger engine in non-sandbox mode (no pid namespace, basically) and then sending the server_info.rawproto file to the Bazel client container. However, it's very, very hacky and will probably require some very custom, ugly setup

cyan gust Dec 12, 2023, 4:29 PM

I strongly discourage doing that 😛

formal ledge Dec 12, 2023, 4:41 PM

Thanks for really digging into this. This is all very interesting, I'm curious if you've ever had other stateful processes that haven't really fit into the Dagger model?

Would it make sense for Dagger to have some kind of persistent containers that can live through many invocations?

cyan gust Dec 12, 2023, 5:02 PM

formal ledge Would it make sense for Dagger to have some kind of persistent containers that c...

Hadn't encountered this particular problem before, it definitely opens the question

formal ledge Dec 13, 2023, 8:34 AM

Alrighty, do you want me to create a GH issue to track this?

cyan gust Dec 13, 2023, 9:06 AM

formal ledge Alrighty, do you want me to create a GH issue to track this?

would definitely be worth it 🙏

formal ledge Dec 13, 2023, 9:43 AM

https://github.com/dagger/dagger/issues/6263

✨ Support for stateful processes · Issue #6263 · dagger/dagger

What are you trying to do? I've been looking into using Dagger for our CI pipeline in which one of the main tools we use is Bazel. The issue with using Bazel in Dagger is that the Bazel startup...

burnt grove Dec 14, 2023, 8:18 PM

formal ledge And in CI they run on the CI host. AFAIK there is no way to start a remote Bazel...

Share the execroot between runs; that way you'll only have to rebuild the analysis graph

Which got a lot faster in Bazel 7 with Skymeld, btw

And yes, there is no official way to separate the “server” and the CLI; actually, the server is shipped inside the CLI. The server exists only to keep the state in memory and to watch the workspace. Same with Buck2.

unreal rampart Dec 14, 2023, 8:37 PM

burnt grove Share the execroot between runs, you’ll only have to rebuild the analysis graph ...

thx for sharing! good to know all this

cyan gust Dec 14, 2023, 9:28 PM

burnt grove Share the execroot between runs, you’ll only have to rebuild the analysis graph ...

Thank you Steeve! Does this mean re-running a server each time would be less overhead that way?

burnt grove Dec 14, 2023, 10:35 PM

cyan gust Thank you Steeve! Does this mean re-running a server each time would be less ove...

yeah, starting the server is like subsecond, what can take a long time is fetching artifacts and the graph building phase (called the analysis phase)

fun part, the server is actually a jar file that lives in the install base:

```
~/c/g/z/zml (master)> ls $(bazel info install_base)
total 238328
drwxr-xr-x  14 steeve  wheel   448B Dec 11 23:25 ./
drwxr-xr-x   5 steeve  wheel   160B Dec 11 22:58 ../
-rwxr-xr-x   1 steeve  wheel   116M Dec  8  2033 A-server.jar* <----------- BAZEL SERVER
-rwxr-xr-x   1 steeve  wheel     5B Dec  8  2033 build-label.txt*
-rwxr-xr-x   1 steeve  wheel    72K Dec  8  2033 build-runfiles*
-rwxr-xr-x   1 steeve  wheel    50K Dec  8  2033 daemonize*
drwxr-xr-x   8 steeve  wheel   256B Dec 11 22:58 embedded_tools/
-rwxr-xr-x   1 steeve  wheel    32B Dec  8  2033 install_base_key*
-rwxr-xr-x   1 steeve  wheel    51K Dec  8  2033 libcpu_profiler.dylib*
-rwxr-xr-x   1 steeve  wheel    16K Dec  8  2033 linux-sandbox*
drwxr-xr-x   6 steeve  wheel   192B Dec 11 22:58 platforms/
-rwxr-xr-x   1 steeve  wheel   106K Dec  8  2033 process-wrapper*
drwxr-xr-x   9 steeve  wheel   288B Dec 11 22:58 rules_java/
-rwxr-xr-x   1 steeve  wheel   150K Dec  8  2033 xcode-locator*
```

cyan gust Dec 14, 2023, 10:37 PM

burnt grove yeah, starting the server is like subsecond, what can take a long time is fetchi...

and reusing the execroot allows skipping the artifact fetching phase?

burnt grove Dec 14, 2023, 10:37 PM

cyan gust and reusing the rootexec allows skipping the artifact fetching phase?

indeed

in theory you can also leverage --repository_cache and --disk_cache; they are made mostly for cross-WORKSPACE sharing, but they work for (somewhat) stateless builds
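
A Python sketch of a Bazel invocation that keeps the output base (which contains the execroot) and both caches on persistent paths, for example a Dagger cache volume mounted into the container. The `/cache/bazel/...` paths and the `bazel_argv` helper are hypothetical. The detail that matters is that `--output_base` is a startup option and must come before the command, while `--repository_cache` and `--disk_cache` come after it.

```python
from typing import List


def bazel_argv(command: str, targets: List[str], output_base: str,
               repository_cache: str, disk_cache: str) -> List[str]:
    """Build a Bazel command line that keeps state on persistent (cache-volume) paths."""
    return [
        "bazel",
        # Startup option: must appear before the command. The execroot lives under it.
        f"--output_base={output_base}",
        command,
        # Command options: downloaded external repositories and action outputs.
        f"--repository_cache={repository_cache}",
        f"--disk_cache={disk_cache}",
        *targets,
    ]


argv = bazel_argv("build", ["//..."], "/cache/bazel/output_base",
                  "/cache/bazel/repo", "/cache/bazel/disk")
print(" ".join(argv))
```

Inside the container, this list would be passed to `subprocess.run(argv, check=True)`.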

but for a truly stateless system a remote cache is the way to go, since you only download the invalidated leaves in the build graph (google "Build without the Bytes")
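
A toy Python sketch of why a Merkle tree lets you fetch only the invalidated leaves. Each directory's digest covers its children's names and digests, so any subtree whose digest is unchanged can be skipped without looking inside. This is a simplified encoding for illustration, not the Remote Execution API's actual `Directory` format.

```python
import hashlib
from typing import List, Optional, Union

# A node is either file content (bytes) or a directory: {name: node}.
Node = Union[bytes, dict]


def digest(node: Node) -> str:
    """Content digest; a directory's digest covers its children's names and digests."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"blob\0" + node).hexdigest()
    h = hashlib.sha256(b"tree\0")
    for name in sorted(node):
        h.update(f"{name}\0{digest(node[name])}\n".encode("utf-8"))
    return h.hexdigest()


def changed_leaves(old: Optional[Node], new: Node, prefix: str = "") -> List[str]:
    """Paths of files in `new` that must be fetched, skipping unchanged subtrees."""
    if old is not None and digest(old) == digest(new):
        return []  # identical digest: nothing below here needs downloading
    if isinstance(new, bytes):
        return [prefix]
    old_children = old if isinstance(old, dict) else {}
    paths: List[str] = []
    for name in sorted(new):
        path = f"{prefix}/{name}" if prefix else name
        paths.extend(changed_leaves(old_children.get(name), new[name], path))
    return paths
```

For brevity this recomputes digests at every level; a real implementation would cache them.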

cyan gust Dec 14, 2023, 10:39 PM

Now we're getting somewhere 🙂

cyan gust Dec 14, 2023, 10:40 PM

burnt grove but for true stateless system a remote cache is the way to go, since you only do...

"Remote cache" being the long-running server (one per workspace per user) that was discussed earlier, or something else?

burnt grove Dec 14, 2023, 10:40 PM

cyan gust "Remote cache" being the long-running server (one per workspace per user) that w...

no, something running remotely: an actual always-running CAS server
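
A minimal in-memory Python sketch of what a CAS (content-addressable storage) server holds: blobs keyed by their own content digest (SHA-256 plus size, modeled on Remote Execution API digests). `find_missing` mirrors the idea behind that API's `FindMissingBlobs` call, so a client only uploads what the server lacks. The `InMemoryCAS` class itself is hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# (sha256 hex, size in bytes), modeled on Remote Execution API digests.
Digest = Tuple[str, int]


class InMemoryCAS:
    """Toy content-addressable store: every blob is keyed by its own digest."""

    def __init__(self) -> None:
        self._blobs: Dict[Digest, bytes] = {}

    def put(self, data: bytes) -> Digest:
        key = (hashlib.sha256(data).hexdigest(), len(data))
        self._blobs[key] = data  # same content always maps to the same key
        return key

    def find_missing(self, digests: Iterable[Digest]) -> List[Digest]:
        """Which of these blobs the store lacks, i.e. what a client still has to upload."""
        return [d for d in digests if d not in self._blobs]

    def get(self, key: Digest) -> bytes:
        return self._blobs[key]
```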

you can use GCS itself as a bazel cache, too, but it's rather slow

i wrote a GCS-backed bazel cache at Zenly (https://x.com/steeve/status/1367530898598592526?s=20), which was open sourced and supported build without the bytes (it leverages a merkle tree), but since Snap closed the company, the repo vanished (asshole move...)

burnt grove Dec 14, 2023, 10:43 PM

burnt grove you can use GCS itself as a bazel cache, too, but it's rather slow

actually, since bazel has dynamic scheduling (meaning it'll race the local build with the download), this may not be that big of a deal nowadays, but you do lose a bit in GCS latency
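
A Python sketch of the racing idea only, not Bazel's scheduler: run the local build and the cache download concurrently and take whichever result arrives first. The two callables are hypothetical placeholders.

```python
import concurrent.futures as cf
import time
from typing import Callable, Tuple


def race(local: Callable[[], str], remote: Callable[[], str]) -> Tuple[str, str]:
    """Run both strategies concurrently; return (winner, result) of the first to finish."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    names = {pool.submit(local): "local", pool.submit(remote): "remote"}
    done, _pending = cf.wait(names, return_when=cf.FIRST_COMPLETED)
    winner = next(iter(done))
    # Python threads cannot be killed: the loser keeps running until it returns.
    pool.shutdown(wait=False)
    return names[winner], winner.result()


# Hypothetical strategies: a slow local build vs. a fast cache hit.
name, result = race(lambda: time.sleep(0.5) or "built locally",
                    lambda: "downloaded from cache")
print(name, result)
```

A real scheduler would cancel the losing branch instead of letting it finish in the background.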

cyan gust Dec 14, 2023, 10:44 PM

Well, in the case of running Bazel in Dagger, the execroot directory will be persisted on a Dagger cache volume, which can be distributed across nodes just like the rest of the Dagger cache. So the end result, I think, will be stateless Bazel, or at least stateless enough.

burnt grove Dec 14, 2023, 10:44 PM

yeah, although the problem is that the cache is huge, so snapshotting/restoring is something to watch out for

for Zenly iOS that was like 5 to 8 GB for a cold build

cyan gust Dec 14, 2023, 10:46 PM

@formal ledge I believe we have a possible solution to your problem, one that won't require a long-running Bazel server on the host after all! Thanks to @burnt grove

I'm going to create a "Dagger and Bazel" channel to celebrate 🙂

burnt grove Dec 14, 2023, 10:47 PM

cyan gust Well in the case of running Bazel in Dagger, the rootexec directory will be pers...

honestly, i'd just start with the cache in the volume strategy, i'm curious!

it's just that on GitHub Actions, snapshot/restore is painfully slow

cyan gust Dec 14, 2023, 10:49 PM

We don't use the GitHub Actions cache 😛

With Dagger Cloud distributed caching, the snapshots go to the closest storage bucket. We have an edge in each AWS and GCP region, so for self-hosted GHA runners in either of those clouds it should be very, very fast. If running on managed GitHub Actions (Azure), it will fall back to Cloudflare R2, which is quite good too.

burnt grove Dec 14, 2023, 10:51 PM

how does that work? is it a blob?

cyan gust Dec 14, 2023, 10:53 PM

For the layer cache it's just the BuildKit state directory, so a bunch of layers. Cache volumes (just a bunch of bind mounts) are directories that we snapshot. I don't know what the snapshotting method is; @midnight tinsel will know. I'm guessing something straightforward, like one tarball per volume maybe? 🤷‍♂️
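
A Python sketch of the "one tarball per volume" guess, not Dagger Cloud's actual snapshot mechanism. The volume directory is archived to a temporary file and atomically renamed into place, so a reader never sees a half-written snapshot. The `data` extraction filter is used when the running Python supports it.

```python
import os
import tarfile
import tempfile


def snapshot_volume(volume_dir: str, archive_path: str) -> None:
    """Archive one cache-volume directory into a single .tar.gz, atomically."""
    target_dir = os.path.dirname(os.path.abspath(archive_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    os.close(fd)
    try:
        with tarfile.open(tmp_path, "w:gz") as tar:
            for name in sorted(os.listdir(volume_dir)):
                tar.add(os.path.join(volume_dir, name), arcname=name)
        os.replace(tmp_path, archive_path)  # readers never see a partial archive
    except BaseException:
        os.remove(tmp_path)
        raise


def restore_volume(archive_path: str, target_dir: str) -> None:
    """Unpack a snapshot into target_dir (created if missing)."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target_dir, filter="data")  # refuses paths escaping target_dir
        else:
            tar.extractall(target_dir)  # older Pythons without extraction filters
```

With caches in the 5 to 8 GB range mentioned in this thread, compression and transfer time dominate, which is the snapshot/restore cost being discussed.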

burnt grove Dec 14, 2023, 10:54 PM

cyan gust For layer cache it's just the buildkit state directory, so a bunch of layers. Fo...

so it actually does look like you could get good performance with low maintenance
