#Engine

late plank
#

Yeah I like that this will let us do the work to bundle engine+buildkit into an image together independently of figuring out all the rest of the problems that need to be solved.

Just to double-check we’re on the same page: SDKs would basically all start out with the docker provisioner in this approach, right? I.e., if nodejs needs an engine, it will shell out and run “docker run -v /:/host … dagger/engine:v0.3” or something, right?
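
For concreteness, the shell-out described above might look roughly like this sketch. The function name `engine_run_argv` is made up for illustration, and the flags that the message elides with "…" are left out rather than guessed.

```python
import shlex
from typing import List


def engine_run_argv(image: str, host_mount: str = "/host") -> List[str]:
    """Build the argv an SDK could use to start the engine container.

    Only the pieces named in the discussion are included: the bind mount of
    the host root and the image reference. Any other flags are omitted.
    """
    return ["docker", "run", "-v", f"/:{host_mount}", image]


if __name__ == "__main__":
    # Print the command instead of running it, so no docker daemon is needed.
    print(shlex.join(engine_run_argv("dagger/engine:v0.3")))
```

An SDK would hand this list to its process-spawning API (for example `subprocess.run` in Python) rather than going through a shell string.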

This fully enshrines “local docker daemon” as the one and only way to run the dagger engine, but gotta start somewhere.

#

I forget whether -v /:/host includes submounts under /. There could be minor corner cases if not, but they're likely manageable.

gaunt glen
#

@ember grove, regarding your questions about the CLI: I think the engine is going to be bundled as described above for the next SDK releases.

ember grove
#

we don’t need to define local dirs ahead of time

Wait, what? I need to learn more about that!

#

I was working on subprocess just now when I realized the CLI hasn't been released, so I guess I'll skip it until this is decided.

#

My provisioner code is decoupled now btw, so I can swap easily.

broken mulch
#

Doesn’t this completely break remote engines?

ember grove
# nova raft here's the PR: <https://github.com/dagger/dagger/pull/3560>

Oh I see 🙂 In v0.2, reading a directory from the host (Client API) with a dynamic path is not possible. Writing dirs or files, or reading files, is not a problem; reading directories is the only exception. The reason is that we need to send the list of localdirs when initializing buildkit.

cloud sun
# broken mulch Doesn’t this completely break remote engines?

Yes, that's the downside (and performance maybe? not sure). Worth entertaining the idea though as it flips the problem around: we get to switch to engine as an OCI image right away, and "remote engine" becomes a feature for later (dependent on SDKs supporting filesync)

late plank
cloud sun
#

Also gives a sleek path for the playground: just -v /tmp:/host and done 🙂

#

no software sandboxing needed

nova raft
#

i'd very much like to get to an OCI-provisioned buildkit asap so I can start playing with services support, since it required installing CNI plugins + changing buildkit config for Bass's rendition

#

much easier to do by building a custom image

late plank
nova raft
#

oo, started working on what? services?

late plank
#

Oh no, just a magefile step that builds buildkit into an image alongside whatever else we want in there

nova raft
#

oh nice

cloud sun
nova raft
broken mulch
#

I think we need to discuss the multi tenancy aspect of these changes

#

we’ve looked at it in terms of “what is remote” but it’s also “what is multi-tenant”

late plank
#

I agree we have to figure out multitenancy, but afaict this change would be independent of that. It's possible for multiple tenants to use an engine today (run cloak dev and let multiple users connect), and it'll still be possible after this change, with the same problems and constraints.

I think that no matter what we want engine to exist with the builder in the OCI image, so it is appealing to be able to make that change independently while figuring out multitenancy, session/pinning issues, localdir sync from sdks, etc.

broken mulch
#

Current architecture

#

Proposed "grenade" architecture

#

I agree it makes sense to consider this option, since it's a first step in the direction of the "grenade" that we could implement quickly. But just because it's easy doesn't make it a no-brainer either. I think we should carefully discuss pros and cons. My first reaction is fear, but I need to take a little time to think through why.

late plank
#

A highly related issue around session stickiness: https://github.com/dagger/dagger/issues/3613

Just an initial summary of the previous discussion to get started, but ties into multitenancy and other architecture stuff too

late plank
#

One thing worth noting, unless we make other changes, running engine w/ -v /:/host right now implies that either:

  1. Engine is long-running, in which case there would be a single buildkit session forever, even after the client disconnects.
    • In this case, the only solution I know of is to solve the problems in that source pinning issue above (in addition to a few other misc problems)
  2. Engine continues to be short-running and scoped to one client connection.
    • This would mean that whereas today provisioners do docker run only when buildkitd isn't running, they would now call it every time and block until it's complete.
    • I guess it would also be possible to have a long-running container where buildkitd runs persistently but the engine is run w/ docker exec
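
A rough sketch of that last variant (a persistent container, with the engine run per connection via docker exec). The helper names and the container name are hypothetical, the docker runner is injectable so the logic can be exercised without a daemon, and any extra flags buildkitd needs are left out.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes docker CLI arguments and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def docker(args: Sequence[str]) -> str:
    """Run a docker CLI command and return its stdout."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def ensure_engine_container(name: str, image: str, run: Runner = docker) -> bool:
    """Start the long-running container only if it is not already running.

    Returns True if a new container was started.
    """
    running = run(["ps", "--format", "{{.Names}}"]).split()
    if name in running:
        return False
    # Assumption: no stopped container with this name exists; a real
    # provisioner would also have to handle that case.
    run(["run", "--detach", "--name", name, image])
    return True


def engine_exec_argv(name: str, engine_cmd: Sequence[str]) -> List[str]:
    """Per-connection command: run the engine inside the persistent container."""
    return ["docker", "exec", "-i", name, *engine_cmd]
```

With this split, only the first client pays for `docker run`; every later connection is a `docker exec` into the same container, so buildkitd's state outlives individual clients.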
late plank
#

I keep losing track of all the stuff we'd need to do to make the engine fully remote (just remembered another one while reviewing Alex's PR: oci tarball export, which is different than local dirs). So I made this issue to track everything and link to subissues: https://github.com/dagger/dagger/issues/3624

nova raft
late plank
#

My plan is to try to prototype running engine w/ -v /:/host as quickly as possible just to get a concrete idea of whether it can serve as our "next stopgap". Worst case we end up with a magefile target for running a custom buildkitd 🙂 But if it works out, we can decide which of the remote engine tasks listed there make sense to address next. Hopefully we can find a path like that where we can just tackle this one piece at a time

late plank
#

A couple downsides I've noticed so far:

  • On Docker Desktop for macOS, docker run -v /:/host is weird for a bunch of reasons. You get the files that are under / on macOS, but also a bunch of other stuff. Worst of all, /Users is a broken symlink to /host_mnt/Users for some reason, so you can't access your home dir by default. If you try to mount your home dir in directly, you get a ton of auth prompts whenever you try to do anything

  • -v /:/host will not behave as expected if you are already inside a docker container that has the docker socket mounted in (the bind mount resolves against the daemon's host filesystem, not the container's)

#

The first one sucks a lot, second is not great but fairly obscure

#

FWIW I did otherwise get it to work so far. Packaged buildkit and cloak into an image; pulled down your stdio PR @cloud sun and then re-purposed it to enable connecting to cloak dial-stdio over docker exec. I also got log streaming to work by sending the logs over stderr on the commandconn.
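
Roughly the shape of that setup, as a sketch rather than the actual prototype: the child's stdin/stdout act as the connection and stderr carries logs. The container name used in the usage note is hypothetical, and a stand-in child process replaces docker so the snippet runs anywhere.

```python
import subprocess
import sys
from typing import List, Sequence


def docker_exec_argv(container: str, command: Sequence[str]) -> List[str]:
    # -i keeps stdin open so the exec'd process can be used as a byte stream.
    return ["docker", "exec", "-i", container, *command]


def dial_stdio(argv: Sequence[str]) -> subprocess.Popen:
    """Spawn a process whose stdin/stdout act as the connection.

    stderr is captured separately so it can be used for log streaming.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# Stand-in child so the sketch runs without docker: it echoes stdin to
# stdout and writes one "log" line to stderr.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('engine log\\n'); sys.stdout.write(sys.stdin.read())",
]

if __name__ == "__main__":
    proc = dial_stdio(STAND_IN)
    reply, logs = proc.communicate(b"ping")
    print(reply, logs)
```

Against a real container the call would be something like `dial_stdio(docker_exec_argv("dagger-engine", ["cloak", "dial-stdio"]))`, with the client protocol spoken over `proc.stdin` and `proc.stdout`.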

late plank
#

^^ After thinking about it more and also based on what we just discussed in the graphql api meeting, I don't think we should go down the whole -v /:/host route. The downside of not really working on Docker Desktop for macOS is pretty huge IMO.

Plus, as just discussed, it would make sense to always have the option of shelling out to a local binary to enable easier bootstrapping of new SDKs (even if it's not a firm requirement for all of them forever).

So, here's my thinking on the shortest possible path to an architecture that works for now but also lets us rearrange things as we keep iterating:

  1. Continue relying on a local engine binary that SDKs talk to, but over stdio (using Andrea's PR). The engine binary will be pretty much the same as today; it will still include the graphql router.
  2. That engine binary no longer provisions moby/buildkit:v0.x.x, we now have our own published images like dagger/engine:v0.x.x. Inside that image, all we do initially is package up buildkit and its dependencies and run them.

I think if we do that, the "visible" components of the architecture will be "local binary" plus "engine image". Then our future iterations can basically just consist of moving internal pieces of functionality between those two components. Eventually, for some SDKs, we will move everything from the local binary to the engine image and thus won't need the local binary at all.

late plank
#

If we go the above route the immediate things we'd have to do are:

  1. Figure out how to package the local binary w/ whichever SDK we release next
    • Literally putting the binary for each platform in the npm/pypi package is an option
    • Could have the SDK download the binary from somewhere
    • Could have the SDK provision the dagger/engine:v0.x.x image itself (i.e. move that to native SDK code), then we can put the local engine binary for each platform in that image and have the SDK grab it out of there (docker cp or equivalent)
  2. Publish the dagger/engine:v0.x.x images.
    • If we are starting out with just plain old buildkitd in there, we could literally just retag moby/buildkit and push it to our own repo 🙃
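
To make the docker cp and retag options above concrete, here is a sketch of the command sequences involved. The function names, the temporary container name, and the binary path inside the image are all hypothetical, and `v0.x.x` is kept as the placeholder it is in the discussion.

```python
from typing import List


def extract_binary_cmds(
    image: str, src_path: str, dest_path: str, tmp_name: str = "engine-extract"
) -> List[List[str]]:
    """Commands to copy a file out of an image without running it.

    `docker create` makes a stopped container from the image, `docker cp`
    copies from its filesystem, and `docker rm` cleans it up. The image
    must define a command or entrypoint for `docker create` to succeed.
    """
    return [
        ["docker", "create", "--name", tmp_name, image],
        ["docker", "cp", f"{tmp_name}:{src_path}", dest_path],
        ["docker", "rm", tmp_name],
    ]


def retag_cmds(source: str, target: str) -> List[List[str]]:
    """Commands to republish an existing image under a new name."""
    return [
        ["docker", "pull", source],
        ["docker", "tag", source, target],
        ["docker", "push", target],
    ]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for cmd in retag_cmds("moby/buildkit:v0.x.x", "dagger/engine:v0.x.x"):
        print(" ".join(cmd))
    # The source path below is an assumed location, not a real one.
    for cmd in extract_binary_cmds(
        "dagger/engine:v0.x.x", "/usr/bin/dagger-engine", "./dagger-engine"
    ):
        print(" ".join(cmd))
```

An SDK taking the docker cp route would pick `src_path` based on the client's OS and architecture, which is where the "binary for each platform" layout inside the image would matter.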
#

I'll make issues for all this stuff if there's agreement, just brain dumping initially to get other thoughts

nova raft
#

thinking out loud: could we just put the binary in the OCI image and run it with docker exec?

late plank
nova raft
#

oh right

nova raft
#

this plan sounds good to me. I'm thinking of poking at the filesync API/protocol to see if we can have it transfer over websockets and straight into the buildkit session. and also whether we can re-use it for export. but if someone gets to that before me, by all means!

#

I can open an issue for that too

late plank
#

I don't think we have an issue for it yet

nova raft