Engine | Dagger | Page 1

late plank Oct 31, 2022, 8:15 AM

#

Yeah I like that this will let us do the work to bundle engine+buildkit into an image together independently of figuring out all the rest of the problems that need to be solved.

Just to double check we’re on the same page: SDKs would basically all start out with the docker provisioner in this approach, right? I.e., if nodejs needs an engine, it will shell out and run “docker run -v /:/host … dagger/engine:v0.3” or something, right?

This fully enshrines “local docker daemon” as the one and only way to run the dagger engine, but gotta start somewhere.

#

I forget if -v /:/host includes submounts under /, could be minor corner cases if not, but likely handleable
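A minimal sketch of the shell-out described above, written in Python purely for illustration. The helper name `engine_run_argv` and its `extra_args` parameter are hypothetical; the image tag is the one floated in the message, and whatever the "…" stands for is left to the caller rather than guessed.

```python
import shlex


def engine_run_argv(image="dagger/engine:v0.3", host_root="/", extra_args=()):
    """Build the argv an SDK could hand to subprocess for a docker provisioner.

    Hypothetical helper: only `docker run` and `-v SRC:DST` are real Docker CLI
    usage here. Anything elided in the discussion goes through `extra_args`.
    """
    argv = ["docker", "run", "-v", f"{host_root}:/host"]
    argv.extend(extra_args)  # caller-supplied flags, placed before the image
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so this works without docker.
    print(shlex.join(engine_run_argv()))
```

Building the argv as a list (rather than a shell string) avoids quoting issues if the host path ever contains spaces.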

gaunt glen Oct 31, 2022, 10:25 AM

#

late plank Yeah I like that this will let us do the work to bundle engine+buildkit into an ...

Oh, so we are moving the provisioner part to the SDKs? Why not continue on the bundling PR and keep this logic in Dagger? (Not building the image, just running it.)

gaunt glen Oct 31, 2022, 11:33 AM

#

@ember grove, your questions regarding the CLI. I think the engine is going to be bundled as said above, for the next SDK releases

ember grove Oct 31, 2022, 11:37 AM

#

we don’t need to define local dirs ahead of time

Wait, what? I need to learn more about that!

#

I was doing work now on subprocess when I realized the CLI hasn't been released so I guess I'll skip it for now until this is decided.

#

My provisioner code is decoupled now btw, so I can swap easily.

broken mulch Oct 31, 2022, 2:59 PM

#

Doesn’t this completely break remote engines?

nova raft Oct 31, 2022, 4:11 PM

#

ember grove > we don’t need to define local dirs ahead of time Wait, what? I need to learn ...

here's the PR: https://github.com/dagger/dagger/pull/3560

broken mulch Oct 31, 2022, 4:15 PM

#

gaunt glen @ember grove, your questions regarding the CLI. I think the engine is g...

not decided

ember grove Oct 31, 2022, 4:15 PM

#

nova raft here's the PR: https://github.com/dagger/dagger/pull/3560

Oh I see 🙂 In v0.2, reading a directory from the host (Client API) with a dynamic path is not possible. Writing dirs or files and reading files is not a problem; reading directories is the only exception. The reason is that we need to send the list of localdirs when initializing buildkit.

cloud sun Oct 31, 2022, 5:36 PM

#

late plank Yeah I like that this will let us do the work to bundle engine+buildkit into an ...

Yeah. That by default, or (not needed right away, but fairly easy) you point to an already running engine (DAGGER_HOST) and we add docs on how to provision that
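A rough sketch of that selection logic, assuming the SDK simply checks the environment before provisioning. `DAGGER_HOST` is the variable named in the message; the function name, the return shape, and the example address format are all hypothetical.

```python
import os


def choose_engine(env=None):
    """Pick how to reach an engine.

    Returns ("remote", host) when DAGGER_HOST points at an already running
    engine, otherwise ("docker", None) to fall back to the local docker
    provisioner. The tuple shape is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    host = env.get("DAGGER_HOST", "").strip()
    if host:
        return ("remote", host)
    return ("docker", None)


if __name__ == "__main__":
    print(choose_engine())
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.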

cloud sun Oct 31, 2022, 5:38 PM

#

broken mulch Doesn’t this completely break remote engines?

Yes, that's the downside (and performance maybe? not sure). Worth entertaining the idea though as it flips the problem around: we get to switch to engine as an OCI image right away, and "remote engine" becomes a feature for later (dependent on SDKs supporting filesync)

late plank Oct 31, 2022, 5:39 PM

#

cloud sun Yes, that's the downside (and performance maybe? not sure). Worth entertaining t...

Yeah, I would say that since whatever changes we make here will be breaking anyways (BUILDKIT_HOST won't work anymore, and our new docs never mention it yet), it'd be okay if our very first step only supported the local docker daemon provisioner.

cloud sun Oct 31, 2022, 5:40 PM

#

Also gives a sleek path for the playground: just -v /tmp:/host and done 🙂

#

no software sandboxing needed

nova raft Oct 31, 2022, 5:42 PM

#

i'd very much like to get to an OCI-provisioned buildkit asap so I can start playing with services support, since it required installing CNI plugins + changing buildkit config for Bass's rendition

#

much easier to do by building a custom image

late plank Oct 31, 2022, 5:43 PM

#

nova raft i'd very much like to get to an OCI-provisioned buildkit asap so I can start pla...

I started working on this, based on top of the multiplatform PR so we can do it via dagger and push multiple arches

nova raft Oct 31, 2022, 5:44 PM

#

oo, started working on what? services?

late plank Oct 31, 2022, 5:44 PM

#

Oh no, just a magefile step that builds buildkit into an image alongside whatever else we want in there

nova raft Oct 31, 2022, 5:44 PM

#

oh nice

cloud sun Oct 31, 2022, 5:44 PM

#

nova raft i'd very much like to get to an OCI-provisioned buildkit asap so I can start pla...

Like the engine+buildkit bundle suggestion I made above, or you mean buildkit-only packaged up?

nova raft Oct 31, 2022, 5:47 PM

#

cloud sun Like the engine+buildkit bundle suggestion I made above, or you mean buildkit-on...

engine + buildkit, ultimately, but no need for scope creep if that's not what you had in mind yet. part of bass's services implementation is to healthcheck ports, which will be easier to do from the same host. but this is all bass's take on it, don't know what we'll land on 🙂

broken mulch Oct 31, 2022, 5:47 PM

#

I think we need to discuss the multi tenancy aspect of these changes

#

we’ve looked at it in terms of “what is remote” but it’s also “what is multi-tenant”

late plank Oct 31, 2022, 5:53 PM

#

I agree we have to figure out multitenancy but afaict this change would be independent of that. It's possible for multiple tenants to use an engine today (run cloak dev and let multiple users connect), it'll still be possible after this change too with the same problems and constraints.

I think that no matter what we want engine to exist with the builder in the OCI image, so it is appealing to be able to make that change independently while figuring out multitenancy, session/pinning issues, localdir sync from sdks, etc.

broken mulch Oct 31, 2022, 6:05 PM

#

Note I updated the diagrams in this issue to reflect the multi-tenancy aspect of the problem: https://github.com/dagger/dagger/issues/3595

GitHub

Explain the engine architecture · Issue #3595 · dagger/dagger

Problem In the documentation, we talk about the Dagger Engine in these terms: Using the SDK, your program opens a new session to a Dagger Engine: either by connecting to an existing engine, or by p...

#

Current architecture

Screen_Shot_2022-10-31_at_11.05.57_AM.png

#

Proposed "grenade" architecture

Screen_Shot_2022-10-31_at_11.06.13_AM.png

#

I agree it makes sense to consider this option, since it's a first step in the direction of the "grenade", that we could implement quickly. But, just because it's easy doesn't make it a no-brainer either. I think we should carefully discuss pros and cons. My first reaction is fear - but need to take a little time to think through why.

late plank Oct 31, 2022, 7:55 PM

#

A highly related issue around session stickiness: https://github.com/dagger/dagger/issues/3613

Just an initial summary of the previous discussion to get started, but ties into multitenancy and other architecture stuff too

GitHub

Replace session stickiness with source pinning · Issue #3613 · dagg...

Continuing the discussion here: #3421 (comment) This is all just my interpretation and summary of that discussion, @vito please correct me if I misrepresent anything. Problem Buildkit sessions are ...

late plank Oct 31, 2022, 11:23 PM

#

One thing worth noting: unless we make other changes, running engine w/ -v /:/host right now implies that either:

- Engine is long-running, in which case there would be a single buildkit session forever, even after the client disconnects.
  - In this case, the only solution I know of is to solve the problems in that source pinning issue above (in addition to a few other misc problems)
- Engine continues to be short-running and scoped to one client connection.
  - So this would mean that whereas today provisioners do docker run only when buildkitd isn't running, now they call that every time and block until it's complete.
  - I guess it would also be possible to have a long-running container where buildkitd is persistently running but engine is run w/ docker exec
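To make the short-running variants concrete, here is a hedged sketch of the two commands a provisioner might build. The function names, the `--rm` and `-i` flags, and the container and command arguments are illustrative assumptions, not what any SDK actually runs.

```python
def run_per_client_argv(image, engine_args=()):
    """Short-running option: a fresh engine container per client connection.

    The caller would block on this process until the client is done.
    `--rm` and `-i` are assumptions for this sketch.
    """
    return ["docker", "run", "--rm", "-i", "-v", "/:/host", image, *engine_args]


def exec_per_client_argv(container, engine_cmd):
    """Long-running container variant: buildkitd stays up inside `container`,
    and the engine is started per client with `docker exec`."""
    return ["docker", "exec", "-i", container, *engine_cmd]


if __name__ == "__main__":
    # Hypothetical container name and engine command, for display only.
    print(run_per_client_argv("dagger/engine:v0.x.x"))
    print(exec_per_client_argv("engine-container", ["engine"]))
```

The second form keeps buildkitd's state warm between clients, while each client still gets its own engine process.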

late plank Oct 31, 2022, 11:43 PM

#

I keep losing track of all the stuff we'd need to do to make the engine fully remote (just remembered another one while reviewing Alex's PR: oci tarball export, which is different than local dirs). So I made this issue to track everything and link to subissues: https://github.com/dagger/dagger/issues/3624

nova raft Oct 31, 2022, 11:45 PM

#

maybe we're all on the same page already, but to me, this feels like the next big epic/top priority. I mentioned here (https://github.com/dagger/dagger/pull/3348#discussion_r1009954916) that whatever host API we have feels temporary until we figure all this out

late plank Oct 31, 2022, 11:48 PM

#

nova raft maybe we're all on the same page already, but to me, this feels like the next bi...

Yes I completely agree (among all the other big epics and top priorities 😅 ) It's becoming a bottleneck on quite a bit

#

My plan is to try to prototype running engine w/ -v /:/host as quickly as possible just to get a concrete idea on whether it can serve as our "next stopgap". Worst case we end up with a magefile target for running a custom buildkitd 🙂 But if it works out, we can decide which parts of the remote engine tasks listed there make sense to address next. Hopefully we can find a path like that where we can just tackle this one piece at a time

late plank Nov 1, 2022, 3:12 AM

#

A couple of downsides I've noticed so far:

- On docker desktop for macos, docker run -v /:/host is weird for a bunch of reasons. You get the files that are under / on macos, but also a bunch of other stuff. And worst of all, /Users is a broken symlink to /host_mnt/Users for some reason, so you can't access your home dir by default. If you try to mount in your homedir directly, you get a ton of auth prompts whenever you try to do anything
- If you are already in a docker container where the docker socket was mounted in, -v /:/host will not behave as expected

#

The first one sucks a lot, second is not great but fairly obscure

#

FWIW I did otherwise get it to work so far. Packaged buildkit and cloak into an image; pulled down your stdio PR @cloud sun and then re-purposed it to enable connecting to cloak dial-stdio over docker exec. I also got log streaming to work by sending it over stderr on the commandconn.

late plank Nov 1, 2022, 5:38 PM

#

^^ After thinking about it more and also based on what we just discussed in the graphql api meeting, I don't think we should go down the whole -v /:/host route. The downside of not really working on docker desktop for macos is pretty huge IMO.

Plus, as just discussed, it would make sense to always have the option of shelling out to a local binary to enable easier bootstrapping of new SDKs (even if it's not a firm requirement for all of them forever).

So, here's my thinking on the shortest possible path to an architecture that works for now but also lets us rearrange things as we keep iterating:

- Continue relying on a local engine binary that SDKs talk to, but over stdio (using Andrea's PR). The engine binary will be pretty much the same as today; it will still include the graphql router.
- That engine binary no longer provisions moby/buildkit:v0.x.x; we now have our own published images like dagger/engine:v0.x.x. Inside that image, all we do initially is package up buildkit and its dependencies and run them.

I think if we do that, the "visible" components of the architecture will be "local binary" plus "engine image". Then our future iterations can basically just consist of moving internal pieces of functionality between those two components. Eventually, for some SDKs, we will move everything from local binary to engine image and thus don't need the local binary at all.

late plank Nov 1, 2022, 5:56 PM

#

If we go the above route the immediate things we'd have to do are:

- Figure out how to package the local binary w/ whichever SDK we release next
  - Literally putting the binary for each platform in the npm/pypi package is an option
  - Could have the SDK download the binary from somewhere
  - Could have the SDK provision the dagger/engine:v0.x.x image themselves (i.e. move that to native SDK code), then we can put the local engine binary for each platform in that image and have the SDKs grab it out of there (docker cp or equivalent)
- Publish the dagger/engine:v0.x.x images.
  - If we are starting out with just plain old buildkitd in there, we could literally just retag moby/buildkit and push it to our own repo 🙃
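Whichever packaging option wins, the SDK has to pick the right binary for the current platform. A minimal sketch, assuming a made-up `dagger-engine-<version>-<os>-<arch>` naming scheme (the scheme, the function name, and the alias table are all hypothetical):

```python
import platform

# Map common platform.machine() values to a single arch label (assumed set).
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def engine_binary_name(version, system=None, machine=None):
    """Return the hypothetical file name of the engine binary for a platform.

    Defaults to the current platform; raises ValueError for unknown arches.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = _ARCH_ALIASES.get(machine)
    if arch is None:
        raise ValueError(f"unsupported architecture: {machine}")
    suffix = ".exe" if system == "windows" else ""
    return f"dagger-engine-{version}-{system}-{arch}{suffix}"
```

The same lookup works whether the binary is bundled in the npm/pypi package, downloaded, or copied out of the engine image; only the place it is fetched from changes.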

#

I'll make issues for all this stuff if there's agreement, just brain dumping initially to get other thoughts

nova raft Nov 1, 2022, 5:59 PM

#

thinking out loud: could we just put the binary in the OCI image and run it with docker exec?

late plank Nov 1, 2022, 6:00 PM

#

nova raft thinking out loud: could we just put the binary in the OCI image and run it with...

That's what I implemented yesterday. The problem is localdirs: -v /:/host doesn't really work on macos and also has corner cases on other platforms

nova raft Nov 1, 2022, 6:00 PM

#

oh right

nova raft Nov 1, 2022, 6:35 PM

#

this plan sounds good to me. I'm thinking of poking at the filesync API/protocol to see if we can have it transfer over websockets and straight into the buildkit session. and also whether we can re-use it for export. but if someone gets to that before me, by all means!

#

I can open an issue for that too

late plank Nov 1, 2022, 6:39 PM

#

nova raft this plan sounds good to me. I'm thinking of poking at the filesync API/protocol...

That sounds amazing, please do. I think if we had that change in place, the docker exec approach becomes more viable. We'd need to solve multiplexing over stdio and also the registry auth tokens (we currently use buildkit's session attachable, which just reads from your local docker creds file), but the filesync aspect is the biggest blocker on that by far.
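For reference, one generic way to multiplex several logical streams over a single stdio pipe is length-prefixed frames tagged with a stream id. This is only a sketch of that general technique, not the approach the engine uses or will use; the frame layout and the function names are assumptions.

```python
import io
import struct

# Frame header: 2-byte stream id, 4-byte payload length, big-endian (assumed layout).
_HEADER = struct.Struct(">HI")


def write_frame(out, stream_id, payload):
    """Write one frame (header + payload) to a binary file-like object."""
    out.write(_HEADER.pack(stream_id, len(payload)))
    out.write(payload)


def read_frame(inp):
    """Read one frame; return (stream_id, payload), or None at clean EOF.

    Assumes `inp.read(n)` returns n bytes unless EOF is hit, as buffered
    binary streams do.
    """
    header = inp.read(_HEADER.size)
    if not header:
        return None
    if len(header) < _HEADER.size:
        raise EOFError("truncated frame header")
    stream_id, length = _HEADER.unpack(header)
    payload = inp.read(length)
    if len(payload) < length:
        raise EOFError("truncated frame payload")
    return stream_id, payload


def demux(inp):
    """Collect all frames from `inp` into per-stream byte strings."""
    streams = {}
    while (frame := read_frame(inp)) is not None:
        stream_id, payload = frame
        streams.setdefault(stream_id, bytearray()).extend(payload)
    return {sid: bytes(buf) for sid, buf in streams.items()}


if __name__ == "__main__":
    pipe = io.BytesIO()  # stands in for a real stdio pipe
    write_frame(pipe, 1, b"api ")
    write_frame(pipe, 2, b"log line")
    write_frame(pipe, 1, b"request")
    pipe.seek(0)
    print(demux(pipe))
```

A real implementation would also need flow control and per-stream close signals, which this sketch leaves out.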

#

I don't think we have an issue for it yet

nova raft Nov 1, 2022, 8:33 PM

#

opened https://github.com/dagger/dagger/issues/3629

GitHub

Investigate using filesync protocol/service with remote engine · Is...

related to #3624 Now that we've hoisted the filesync implementation from Buildkit into our codebase, can we modify it to support dynamically syncing from remote filesync protocol streams? T...
