#Engine
Yeah I like that this will let us do the work to bundle engine+buildkit into an image together independently of figuring out all the rest of the problems that need to be solved.
Just to double-check we’re on the same page: SDKs would basically all start out with the docker provisioner in this approach, right? I.e., if nodejs needs an engine, it will shell out and do “docker run -v /:/host … dagger/engine:v0.3” or something, right?
This fully enshrines “local docker daemon” as the one and only way to run the dagger engine, but gotta start somewhere.
I forget if -v /:/host includes submounts under /; if not, there could be minor corner cases, but they're likely handleable
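A minimal sketch of the docker provisioner described above, assuming an SDK shells out to the docker CLI. The image tag comes from the message; the helper names (engine_run_argv, to_container_path, start_engine) are hypothetical, as are the --rm/-i choices. The path helper shows where a host path ends up under the /host mount, which is where the submount and symlink corner cases would surface.

```python
import posixpath
import subprocess
from typing import List

ENGINE_IMAGE = "dagger/engine:v0.3"  # tag from the discussion; illustrative only


def engine_run_argv(host_root: str = "/", mount_point: str = "/host",
                    image: str = ENGINE_IMAGE) -> List[str]:
    """Build the `docker run` argv for a hypothetical local-docker provisioner.

    host_root="/" exposes the whole host filesystem; a playground could pass
    "/tmp" to expose only a scratch directory.
    """
    return [
        "docker", "run",
        "--rm",          # clean up the container once the engine exits
        "-i",            # assumption: the SDK talks to the engine over stdio
        "--privileged",  # buildkitd normally needs this to run in a container
        "-v", f"{host_root}:{mount_point}",
        image,
    ]


def to_container_path(host_path: str, host_root: str = "/",
                      mount_point: str = "/host") -> str:
    """Translate an absolute host path to where it appears inside the container."""
    rel = posixpath.relpath(posixpath.normpath(host_path),
                            posixpath.normpath(host_root))
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{host_path} is outside the mounted root {host_root}")
    return mount_point if rel == "." else posixpath.join(mount_point, rel)


def start_engine(**kwargs) -> subprocess.Popen:
    """Spawn the engine container; requires a local docker daemon."""
    return subprocess.Popen(engine_run_argv(**kwargs),
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```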
Oh, so we are moving the provisioner part to the SDKs? Why not continue on the bundling PR and keep this logic in Dagger? (not building the image, just running it)
@ember grove, regarding your questions about the CLI: I think the engine is going to be bundled, as said above, for the next SDK releases
we don’t need to define local dirs ahead of time
Wait, what? I need to learn more about that!
I was working on subprocess just now when I realized the CLI hasn't been released, so I guess I'll skip it for now until this is decided.
My provisioner code is decoupled now btw, so I can swap easily.
Doesn’t this completely break remote engines?
here's the PR: https://github.com/dagger/dagger/pull/3560
not decided
Oh I see 🙂 In v0.2, reading a directory from the host (Client API) with a dynamic path is not possible. Writing dirs or files, or reading files, is not a problem; reading directories is the only exception. That was the reason: we need to send the list of localdirs when initializing buildkit.
Yeah. That by default, or (not needed right away, but fairly easy) you point to an already running engine (DAGGER_HOST) and we add docs on how to provision that
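A sketch of that selection logic: use an already running engine if DAGGER_HOST is set, otherwise fall back to provisioning one through local docker. DAGGER_HOST is the variable named above; its value format isn't settled in this thread, so the sketch passes it through untouched. The EngineTarget type, its "remote"/"docker" tags, and the fallback argv are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Hypothetical fallback: provision the engine image via the local docker daemon.
DEFAULT_DOCKER_ARGV = ("docker", "run", "--rm", "-i", "--privileged",
                       "-v", "/:/host", "dagger/engine:v0.3")


@dataclass(frozen=True)
class EngineTarget:
    kind: str                              # "remote" or "docker" (hypothetical tags)
    address: Optional[str] = None          # set when kind == "remote"
    argv: Optional[Tuple[str, ...]] = None  # set when kind == "docker"


def resolve_engine(env: Mapping[str, str]) -> EngineTarget:
    """Prefer an engine the user already runs; otherwise provision one locally."""
    host = env.get("DAGGER_HOST", "").strip()
    if host:
        return EngineTarget(kind="remote", address=host)
    return EngineTarget(kind="docker", argv=DEFAULT_DOCKER_ARGV)
```

An SDK would typically call resolve_engine(os.environ) once at connect time.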
Yes, that's the downside (and performance maybe? not sure). Worth entertaining the idea though as it flips the problem around: we get to switch to engine as an OCI image right away, and "remote engine" becomes a feature for later (dependent on SDKs supporting filesync)
Yeah, I would say that since whatever changes we make here will be breaking anyway (BUILDKIT_HOST won't work anymore, and our new docs don't mention it yet), it'd be okay if our very first step only supported the local docker daemon provisioner.
Also gives a sleek path for the playground: just -v /tmp:/host and done 🙂
no software sandboxing needed
I'd very much like to get to an OCI-provisioned buildkit ASAP so I can start playing with services support, since Bass's rendition of it required installing CNI plugins and changing the buildkit config
much easier to do by building a custom image
I started working on this on top of the multiplatform PR, so we can do it via dagger and push multiple arches
oo, started working on what? services?
Oh no, just a magefile step that builds buildkit into an image alongside whatever else we want in there
oh nice
Like the engine+buildkit bundle suggestion I made above, or you mean buildkit-only packaged up?
engine + buildkit, ultimately, but no need for scope creep if that's not what you had in mind yet. part of bass's services implementation is to healthcheck ports, which will be easier to do from the same host. but this is all bass's take on it, don't know what we'll land on 🙂
I think we need to discuss the multi-tenancy aspect of these changes
we’ve looked at it in terms of “what is remote” but it’s also “what is multi-tenant”
I agree we have to figure out multitenancy but afaict this change would be independent of that. It's possible for multiple tenants to use an engine today (run cloak dev and let multiple users connect), it'll still be possible after this change too with the same problems and constraints.
I think that no matter what we want engine to exist with the builder in the OCI image, so it is appealing to be able to make that change independently while figuring out multitenancy, session/pinning issues, localdir sync from sdks, etc.
Note I updated the diagrams in this issue to reflect the multi-tenancy aspect of the problem: https://github.com/dagger/dagger/issues/3595
Current architecture
Proposed "grenade" architecture
I agree it makes sense to consider this option, since it's a first step in the direction of the "grenade", that we could implement quickly. But, just because it's easy doesn't make it a no-brainer either. I think we should carefully discuss pros and cons. My first reaction is fear - but need to take a little time to think through why.
A highly related issue around session stickiness: https://github.com/dagger/dagger/issues/3613
Just an initial summary of the previous discussion to get started, but ties into multitenancy and other architecture stuff too
Continuing the discussion here: #3421 (comment) This is all just my interpretation and summary of that discussion, @vito please correct me if I misrepresent anything. Problem Buildkit sessions are ...
One thing worth noting, unless we make other changes, running engine w/ -v /:/host right now implies that either:
- Engine is long-running, in which case there would be a single buildkit session forever, even after the client disconnects.
  - In this case, the only solution I know of is to solve the problems in that source pinning issue above (in addition to a few other misc problems)
- Engine continues to be short-running and scoped to one client connection.
  - So whereas today provisioners do docker run only when buildkitd isn't running, now they would call it every time and block until it's complete.
  - I guess it would also be possible to have a long-running container where buildkitd runs persistently but the engine is run with docker exec.
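The two short-running variants above, sketched as the docker commands a provisioner would issue. This assumes the image's default entrypoint starts buildkitd; the container name dagger-engine and the in-container command engine are hypothetical placeholders. The is_running callback could be backed by something like docker inspect.

```python
from typing import Callable, List

ENGINE_IMAGE = "dagger/engine:v0.3"  # illustrative tag
ENGINE_CONTAINER = "dagger-engine"   # hypothetical container name
ENGINE_CMD = ["engine"]              # hypothetical in-container engine command


def short_lived_argv() -> List[str]:
    """Variant 1: one `docker run` per client; everything exits with the client."""
    return ["docker", "run", "--rm", "-i", "--privileged",
            "-v", "/:/host", ENGINE_IMAGE, *ENGINE_CMD]


def long_running_argvs(is_running: Callable[[str], bool]) -> List[List[str]]:
    """Variant 2: persistent buildkitd container, engine started per client via exec.

    Returns the commands to run in order; `docker run -d` is only needed when the
    container isn't up yet (assumes the image entrypoint starts buildkitd).
    """
    cmds: List[List[str]] = []
    if not is_running(ENGINE_CONTAINER):
        cmds.append(["docker", "run", "-d", "--privileged", "--name",
                     ENGINE_CONTAINER, "-v", "/:/host", ENGINE_IMAGE])
    cmds.append(["docker", "exec", "-i", ENGINE_CONTAINER, *ENGINE_CMD])
    return cmds
```

Variant 1 pays container start-up on every connection; variant 2 pays it once, while each client still gets its own short-lived engine process.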
I keep losing track of all the stuff we'd need to do to make the engine fully remote (just remembered another one while reviewing Alex's PR: OCI tarball export, which is different from local dirs). So I made this issue to track everything and link to subissues: https://github.com/dagger/dagger/issues/3624
maybe we're all on the same page already, but to me, this feels like the next big epic/top priority. I mentioned here (https://github.com/dagger/dagger/pull/3348#discussion_r1009954916) that whatever host API we have feels temporary until we figure all this out
Yes I completely agree (among all the other big epics and top priorities 😅 ) It's becoming a bottleneck on quite a bit
My plan is to try to prototype running engine w/ -v /:/host as quickly as possible, just to get a concrete idea of whether it can serve as our "next stopgap". Worst case, we end up with a magefile target for running a custom buildkitd 🙂 But if it works out, we can decide which of the remote engine tasks listed there make sense to address next. Hopefully we can find a path like that where we can tackle this one piece at a time
A couple downsides I've noticed so far:
- On Docker Desktop for macOS, docker run -v /:/host is weird for a bunch of reasons. You get the files that are under / on macOS, but also a bunch of other stuff. Worst of all, /Users is a broken symlink to /host_mnt/Users for some reason, so you can't access your home dir by default. If you try to mount your home dir directly, you get a ton of auth prompts whenever you try to do anything.
- Mounting -v /:/host will not behave as expected if you are already in a docker container where the docker socket was mounted in (the / that gets mounted is the daemon host's, not the calling container's).
The first one sucks a lot, second is not great but fairly obscure
FWIW I did otherwise get it to work so far. I packaged buildkit and cloak into an image, pulled down your stdio PR, @cloud sun, and repurposed it to enable connecting to cloak dial-stdio over docker exec. I also got log streaming to work by sending logs over stderr on the commandconn.
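Roughly what that connection looks like from the client side, as a sketch: exec into the container, keep stdin/stdout for the protocol, and drain stderr on a separate thread so log streaming never stalls the protocol stream. The command cloak dial-stdio is from the message above; the helper names and the line-based log handling are assumptions, not how the PR actually does it.

```python
import subprocess
import threading
from typing import Callable, List, Tuple


def dial_stdio_argv(container: str) -> List[str]:
    """docker exec into the engine container and run `cloak dial-stdio` there."""
    return ["docker", "exec", "-i", container, "cloak", "dial-stdio"]


def open_stdio_conn(argv: List[str],
                    on_log: Callable[[str], None]) -> Tuple[subprocess.Popen, threading.Thread]:
    """Spawn argv; stdin/stdout carry the protocol, stderr carries logs.

    A background thread forwards each stderr line to on_log so a chatty log
    stream can't fill the pipe buffer and block the child.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)

    def pump_logs() -> None:
        for raw in proc.stderr:
            on_log(raw.decode(errors="replace").rstrip("\n"))

    t = threading.Thread(target=pump_logs, daemon=True)
    t.start()
    return proc, t
```

Against a real engine you would pass dial_stdio_argv(name) and hand proc.stdin/proc.stdout to the client library.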
^^ After thinking about it more, and also based on what we just discussed in the graphql api meeting, I don't think we should go down the whole -v /:/host route. The downside of it not really working on Docker Desktop for macOS is pretty huge IMO.
Plus, as just discussed, it would make sense to always have the option of shelling out to a local binary to enable easier bootstrapping of new SDKs (even if it's not a firm requirement for all of them forever).
So, here's my thinking on the shortest possible path to an architecture that works for now but also lets us rearrange things as we keep iterating:
- Continue relying on a local engine binary that SDKs talk to, but over stdio (using Andrea's PR). The engine binary will be pretty much the same as today; it will still include the graphql router.
- That engine binary no longer provisions moby/buildkit:v0.x.x; instead we have our own published images like dagger/engine:v0.x.x. Inside that image, all we do initially is package up buildkit and its dependencies and run them.
I think if we do that, the "visible" components of the architecture will be "local binary" plus "engine image". Then our future iterations can basically just consist of moving internal pieces of functionality between those two components. Eventually, for some SDKs, we will move everything from the local binary to the engine image and thus won't need the local binary at all.
If we go the above route the immediate things we'd have to do are:
- Figure out how to package the local binary w/ whichever SDK we release next
  - Literally putting the binary for each platform in the npm/pypi package is an option
  - Could have the SDK download the binary from somewhere
  - Could have the SDK provision the dagger/engine:v0.x.x image itself (i.e. move that to native SDK code); then we can put the local engine binary for each platform in that image and have the SDKs grab it out of there (docker cp or equivalent)
- Publish the dagger/engine:v0.x.x images.
  - If we are starting out with just plain old buildkitd in there, we could literally just retag moby/buildkit and push it to our own repo 🙃
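Two of the packaging options above, sketched: picking the right per-platform binary out of an npm/pypi package, and the docker cp route for pulling it out of the engine image. The binary naming scheme, the temporary container name, and the in-image path are all hypothetical; nothing in the discussion fixes them.

```python
import platform
from typing import List, Optional

# Hypothetical naming scheme for per-platform binaries bundled in a package.
_OS = {"Linux": "linux", "Darwin": "darwin", "Windows": "windows"}
_ARCH = {"x86_64": "amd64", "AMD64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def engine_binary_name(system: Optional[str] = None,
                       machine: Optional[str] = None) -> str:
    """Pick the bundled binary for this platform, e.g. dagger-engine-linux-amd64."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        os_name, arch = _OS[system], _ARCH[machine]
    except KeyError:
        raise RuntimeError(f"no bundled engine binary for {system}/{machine}") from None
    suffix = ".exe" if os_name == "windows" else ""
    return f"dagger-engine-{os_name}-{arch}{suffix}"


def extract_from_image_argvs(image: str, path_in_image: str, dest: str,
                             container: str = "dagger-engine-extract") -> List[List[str]]:
    """The `docker cp` route: create a stopped container, copy the file out, clean up."""
    return [
        ["docker", "create", "--name", container, image],
        ["docker", "cp", f"{container}:{path_in_image}", dest],
        ["docker", "rm", container],
    ]
```

Retagging moby/buildkit as a first engine image would similarly be a docker pull, docker tag, docker push sequence.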
I'll make issues for all this stuff if there's agreement, just brain dumping initially to get other thoughts
thinking out loud: could we just put the binary in the OCI image and run it with docker exec?
That's what I implemented yesterday. The problem is localdirs, since -v /:/host doesn't really work on macOS and also has corner cases on other platforms
oh right
this plan sounds good to me. I'm thinking of poking at the filesync API/protocol to see if we can have it transfer over websockets and straight into the buildkit session. and also whether we can re-use it for export. but if someone gets to that before me, by all means!
I can open an issue for that too
That sounds amazing, please do. I think if we had that change in place, the docker exec approach would become more viable. We'd need to solve multiplexing over stdio and also registry auth tokens (we currently use buildkit's session attachable, which just reads from your local docker creds file), but the filesync aspect is by far the biggest blocker.
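One way to picture "multiplexing over stdio": tag every chunk written to the single stdin/stdout pair with a stream id, so API traffic, filesync data, and auth requests can share the pipe. The frame layout below (4-byte stream id, 4-byte length, payload) is a hypothetical illustration, not buildkit's or dagger's wire format; a real implementation might reuse an existing multiplexer such as yamux instead.

```python
import struct
from typing import List, Tuple

# Hypothetical frame header: stream id and payload length, both big-endian uint32.
_HEADER = struct.Struct(">II")


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Prefix payload with its stream id and length."""
    return _HEADER.pack(stream_id, len(payload)) + payload


def decode_frames(buf: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split buf into complete (stream_id, payload) frames; return leftover bytes."""
    frames: List[Tuple[int, bytes]] = []
    offset = 0
    while len(buf) - offset >= _HEADER.size:
        stream_id, length = _HEADER.unpack_from(buf, offset)
        end = offset + _HEADER.size + length
        if end > len(buf):
            break  # partial frame: keep the bytes and wait for more input
        frames.append((stream_id, buf[offset + _HEADER.size:end]))
        offset = end
    return frames, buf[offset:]
```

The reader keeps the leftover bytes and prepends them to the next read from stdout, since a pipe read can end mid-frame.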
I don't think we have an issue for it yet