#Concerns about having a local engine


trim surge
#

I would rank that as number 1, but there's some other tricky stuff we'd have to port into each language

#

The support for gateway containers has a lot of custom go code on top of the underlying grpc api

timber osprey
#

Yeah I know, just food for thought 🙂

trim surge
#

No totally I agree with the general sentiment

timber osprey
#

But yeah, if that were the case, then each SDK would just spin up the dagger docker image

#

Nothing else

#

No dependency on the binary

hearty matrix
#

What problem are we talking about?

timber osprey
#

No bootstrapping of dagger-buildkitd with dagger

trim surge
#

Yes the dependency on the binary is a product of lack of resources to reimplement lots of code in each SDK, which can always be addressed by supplying more resources, but we'll need to go through carefully and make sure we know how much we're taking on

#

It's worth more thought though I agree

hearty matrix
#

Ok, going on a call in 5 min but will catch up after. At first glance that list looks like a list of design decisions, rather than a list of problems

trim surge
#

^ Yeah it ties into our architecture (we spin up the local engine as binary which just talks to the separate daemon binary, currently buildkitd, soon to be our buildkitd wrapper). It also thus ties into multitenant support, everything we were talking about in terms of session-state yesterday, etc. etc.

timber osprey
#

Or the fact that we have a weird arch (see my comment on buildkit embedding PR)

hearty matrix
#

but if we packaged it as a container, wouldn't we lose host { } ?

trim surge
#

Another thing we'd have to reimplement in each SDK: provisioners (just thinking out loud)

timber osprey
#

It’s not the case for buildkit because the SDK has filesync

trim surge
#

(probably)

timber osprey
#

If the SDK were to upload local, then cloak could run anywhere (container, hosted, etc). Doesn’t matter anymore

hearty matrix
#

have you been talking to Olivier recently Andrea? 😉

timber osprey
#

Nope

#

Haha 🙂

#

It ties into the virtualized host stuff, sessions, local dirs on the playground

#

With filesync all of that doesn’t matter anymore

hearty matrix
#

that's a pretty major design change that would affect a lot of things. I think we should talk about the actual problem part a bit more before going too deep down this rabbit hole

trim surge
#

It's all coming to a head in multiple areas (cloud, embedding buildkit, etc.) so we should have that discussion soon, but I agree there's no time to dive fully at this precise moment 🙂

hearty matrix
#

is the problem "packaging a binary in a SDK is too hard"?

hearty matrix
#

I also don't see the relationship to packaging buildkit (wouldn't we have to do that inside the container anyway?)

hearty matrix
#

Concerns about having a local engine

timber osprey
#
  • Concerns about having a local engine AND a containerized engine at the same time

To illustrate the issue

#

Two engines

#

Because one must run on the host, the other in a container

hearty matrix
#

Right I understand (but that's too long for a discord channel name 🙂)

timber osprey
#

Either we make buildkit run on the host (not possible), or we make cloak run in a container (limitation: filesync)

#

Anyway

#

It’s incredibly hard to do so it’s just day dreaming at this point

trim surge
#

If the problem was filesync alone I'd actually categorize it at the edge of possible (many possibilities besides just literally reimplementing the filesync api), but I think the random pile of other assorted issues probably puts it over the edge of realistic in the medium term.

But worth double-checking those assumptions

#

And thinking through what the shortest possible path to getting something working would be

hearty matrix
#

back, catching up

hearty matrix
#

if the engine is always remote, then we would have to either 1) add file streaming to the graphql API or 2) expose parts of the buildkit/grpc API to all clients, right?

#
  • in both cases, all SDKs would have to implement it
#

we would also have to change the secrets API to support sending secret value in the api (similar options 1 or 2 to do that)

trim surge
#

That's a lot of "ifs" obviously

hearty matrix
#

we could ship the engine with an ssh server

#

make ssh part of the api

#

gql + ssh

#

rsync or sftp for sync

#

regular ssh session for attach

trim surge
#

We need native ssh clients in each language then. Unless we shell out to ssh, but shelling out to ssh is not conceptually different than shelling out to local engine

hearty matrix
#

v1 just a black box container with duct tape. we could bundle it in a binary later but less urgent now

trim surge
#

(I'd also suggest replacing ssh with websockets if we are just going to run rsync through it, but implementation detail)

hearty matrix
#

duct tape basically 🙂

trim surge
# hearty matrix oh I didn’t mean anything as clean as implementing the ssh protocol. I meant lit...

Oh totally, but if we want to connect to the openssh server in the engine from a script in, e.g., javascript, then we either need a native javascript ssh client or to shell out to ssh. I was just thinking websockets because that's what we are using for service otherwise. But doesn't matter, point is we need a transport.

Curiosity got the best of me and I took a brief detour to google native lang rsync implementations:

  1. https://github.com/gokrazy/rsync
  2. https://github.com/isislovecruft/pyrsync
  3. https://github.com/WebDeltaSync/WebRsync

I don't really trust the python/js ones, but having given them a very brief glance I am surprised how simple they are. It makes me wonder if it would be easier to implement the rsync protocol in python/js than implementing the buildkit filesync stuff. I truly have no idea, it just is an intriguing line of thought.

#

I'm sure this all occurred to Tonis and he made his decision to implement the filesync the way he did for some reason, so I expect it's more complicated than it seems

hearty matrix
#

there’s also sftp

#

my head hurts

#

Not to stress you out, but once we ship a design built around a concept of local engine, it will be very hard to rip it out later

#

OK another idea: how about a fork-exec helper of reduced scope, designed as a stopgap for a future where we reimplement it all in each SDK?

#

for example a dagger-sync-workdir tool?

trim surge
#

Router goes into our buildkitd wrapper

#

Yeah this needs more thought

#

But it is probably right. I gotta finish up the other stuff at this exact moment obviously but this can't wait long either

hearty matrix
#

We would still need to bundle buildkit though. At best we get the luxury of bundling it in a container image for now, instead of a binary. But we still need the bundling for a bunch of reasons.

hearty matrix
#

In this scenario would we still want to bundle both the engine + client utility in the dagger binary?

#

Another issue is the provisioners, it seems wrong to reimplement them in every SDK, even long term

timber osprey
#

catching up

timber osprey
#

Erik you know the internals of bk better than me, but I think rsync or whatever would be a big downgrade compared to native right?

#

AFAIK a ton of work went into filesync to be snappy for that particular use case

hearty matrix
#

@timber osprey tally so far:

  1. Problem solved by removing local engine
    • ? unclear for me at the moment
  2. Implications of removing local engine
    • Filesync
    • Secrets
    • Provisioners
    • Host API (workdir, env)
    • API is no longer 100% graphql?

timber osprey
#

(although it doesn't feel like that at times 😂)

trim surge
#

I would be biased to agree with what you're saying though because I doubt all the work went into filesync for nothing

timber osprey
#

provisioning could be done in one offs

#

not that it changes anything but yeah

timber osprey
#

does the sdk need to provision?

hearty matrix
#

with the current assumptions (avoid managing stateful daemons) yes

hearty matrix
#

part of that requires illusion/polyfill, but it's worth it because fundamentally there's no reason for buildkit daemons to be pets, should be cattle

timber osprey
#

by provisioning we mean spinning up the container? or more complex provisioning (e.g. k8s etc)?

hearty matrix
#

unclear

trim surge
#

@timber osprey what do you think about the idea of retaining the local binary, but not running it as an engine, just shelling out to it for some utilities we don't want to rewrite in every language? (mentioned above somewhere I think)

timber osprey
#

the former -- it's really nice to have. however:

  • if we don't, not that weird (that's what you do with virtually all tooling)
  • and if we do, not rocket science for every SDK to do, right?
#

e.g. not rocket science --> compared to the freaking codegen and xdx 🙂

#

the bar is already high

hearty matrix
#

Ideally it would all be a single, idempotent action that can be called on demand by the SDK

#

i.e. the first engine.Start may do a lot. Subsequent calls may be faster because images are already downloaded, kub services deployed, etc. But conceptually it's all bundled together
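
(Editor's sketch of the idempotent start described above. This is only an illustration under assumptions: `EngineState`, `start_engine`, and the socket address are hypothetical names, not part of any existing SDK. The first call does the slow steps, later calls skip them and return the same address.)

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EngineState:
    """Hypothetical bookkeeping of what has already been provisioned."""
    image_pulled: bool = False
    service_deployed: bool = False
    log: List[str] = field(default_factory=list)


def start_engine(state: EngineState) -> str:
    """Idempotent start: each expensive step runs at most once.

    Returns the address the SDK should connect to (placeholder value).
    """
    if not state.image_pulled:
        state.log.append("pull image")      # slow on the first call only
        state.image_pulled = True
    if not state.service_deployed:
        state.log.append("deploy service")  # e.g. a kub service
        state.service_deployed = True
    return "unix:///run/example-engine.sock"


state = EngineState()
first = start_engine(state)   # does the real work
second = start_engine(state)  # no-op, same answer
```

Calling it on demand from every script is then safe, since repeated calls change nothing after the first one.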

hearty matrix
#

to get an answer? 🙂

timber osprey
#

vs having an official helm chart, CF template, etc

#

same feeling as "use the language you're familiar with" -- "use the native tooling you're already familiar with"

#

but debatable

hearty matrix
#

If your kub admin won't let you deploy stuff on kub, then you don't use the kub provisioner

#

in that case "provision" may just be "connect to this known URL"

#

maybe provisioner is the wrong term?

timber osprey
#

like, ok, deploy to my k8s cluster. but I happen to use istio or whatever service mesh, so i need those extra labels. Oh and rbac is configured this way so blah blah

timber osprey
#

but anyway -- doesn't change anything for filesync, secrets, etc

hearty matrix
#

but we either 1) need something pluggable in the client to invoke stateless engines remotely (possibly pre-provisioned) or 2) everyone needs to manage pet daemons
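
(Editor's sketch of option 1, "something pluggable in the client". Everything here is an assumption made for illustration: the `Plan` type, the URL schemes, and `example-engine-session` are hypothetical. The only real commands referenced are `kubectl exec` and `ssh`, and the sketch only builds their argument lists without running them.)

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from urllib.parse import urlsplit

# Hypothetical name for whatever command starts a stateless engine session.
ENGINE_CMD = "example-engine-session"


@dataclass
class Plan:
    """How the client reaches an engine: a known address, or a command to exec."""
    address: str = ""
    argv: List[str] = field(default_factory=list)


PROVISIONERS: Dict[str, Callable[[str], Plan]] = {}


def provisioner(scheme: str):
    """Register a provisioner under a URL scheme (the pluggable part)."""
    def wrap(fn):
        PROVISIONERS[scheme] = fn
        return fn
    return wrap


@provisioner("tcp")
def known_url(host: str) -> Plan:
    # Pre-provisioned engine: "provision" is just "connect to this known URL".
    return Plan(address="tcp://" + host)


@provisioner("kube")
def kube_exec(pod: str) -> Plan:
    # One-off exec into an existing pod rather than a declarative apply.
    return Plan(argv=["kubectl", "exec", "-i", pod, "--", ENGINE_CMD])


@provisioner("ssh")
def ssh_exec(host: str) -> Plan:
    # Remote server: ssh + exec.
    return Plan(argv=["ssh", host, ENGINE_CMD])


def resolve(target: str) -> Plan:
    """Pick a provisioner from the target's scheme, e.g. kube://my-pod."""
    parts = urlsplit(target)
    if parts.scheme not in PROVISIONERS:
        raise ValueError(f"no provisioner for scheme {parts.scheme!r}")
    return PROVISIONERS[parts.scheme](parts.netloc)
```

The SDK would only ever call `resolve`; whether the engine is pre-provisioned or started on demand stays behind that one call.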

#

on kubernetes I think correct invocation is actually with a one-off exec (as opposed to declarative kubectl apply)

#

on a remote server: ssh + exec

hearty matrix
#

this helper binary would need to be a local proxy for the duration of the session. Since it needs to handle callbacks from the server

#

so the http2/grpc plumbing needs to flow through the helper or too much work left to the sdk

#

this actually solves the provisioning problem too

#

gives the helper a hook to handle that too

#

same helper can wrap your shell script to solve session management

#

host API could remain: callbacks would flow back from remote engine to local helper

#

oh and socket forwarding too (adding that to tally)

#

dagger client ?

#

or dagger-client

trim surge
#

Yes to all the above, dagger client is fine, in my current thinking this would be a hidden command from end users, just meant to be shelled out by sdks

#

If we continue to use buildkit's session management (as we do today), then we'll need to either A) fix session sharing upstream or B) have dagger client be long-running for full duration of session and accept commands over pipes (brings us back to two daemons technically I guess, but in slightly different form)

Or we can make our own session concept and not expose buildkit's from daggerd. Usual tradeoff of more work for more customizability.
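
(Editor's sketch of option B, a long-running helper that accepts commands over pipes. The wire format is an assumption: newline-delimited JSON with `id`, `cmd`, and `args` fields is a hypothetical choice, and the `workdir` and `env` commands are placeholders for host API callbacks. In-memory streams stand in for the helper's stdin and stdout so the loop can be run as is.)

```python
import io
import json
import os

# Hypothetical host-side commands the helper answers on behalf of the SDK.
HANDLERS = {
    "workdir": lambda args: os.getcwd(),
    "env": lambda args: os.environ.get(args["name"], ""),
}


def serve(requests, responses, handlers):
    """Answer one JSON command per line until the request stream closes."""
    for line in requests:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        handler = handlers.get(msg["cmd"])
        if handler is None:
            reply = {"id": msg["id"], "error": f"unknown command {msg['cmd']!r}"}
        else:
            reply = {"id": msg["id"], "result": handler(msg.get("args", {}))}
        responses.write(json.dumps(reply) + "\n")
        responses.flush()  # the SDK on the other end of the pipe is waiting


# Stand-ins for the pipes an SDK would hold to the helper process.
requests = io.StringIO(
    '{"id": 1, "cmd": "workdir"}\n'
    '{"id": 2, "cmd": "nope"}\n'
)
responses = io.StringIO()
serve(requests, responses, HANDLERS)
replies = [json.loads(l) for l in responses.getvalue().splitlines()]
```

A real helper would pass `sys.stdin` and `sys.stdout` to `serve` and stay alive for the whole session, which is what makes it the second long-running process mentioned above.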

#

Just thinking out loud

hearty matrix
#

doesn’t the problem go away if we do everything in one buildkit client connection?

hearty matrix
#

that’s what I’m saying, we need that anyway because of the callback-driven nature of filesync (not to mention socket forwarding)

#

no way around it imo

#

it’s basically exactly the same as today except instead of running the engine locally you’re running a local proxy to the engine

hearty matrix
#

it’s suddenly way less weird if you fork-exec a helper proxy rather than a server talking to another server

#

Now actually feel that we could keep the fork-exec forever (but keep the option open to change our minds later)

#

remaining part is the API now having a non-gql component

hearty matrix
#

still would be a pretty major change not running a server locally, plus lots of new grpc plumbing

#

but smaller diff than I feared at the beginning of this thread

trim surge
#

If we are sending commands over pipes to a local proxy running for the duration of the session, then why not just do what we're doing now and run the router in that local proxy and send graphql through it?

#

idk I need to sleep on it, brain is sputtering

hearty matrix
#

That’s where making a list of current pain would be helpful

#

then we can weigh cost & benefit

timber osprey
#

Catching up

#

Sorry about the “food for thought”, it turned into a full banquet

timber osprey
# trim surge If we are sending commands over pipes to a local proxy running for the duration ...

Good point. Need to think more about it. I guess the major difference is it’s scoped to a much smaller functionality rather than being the full engine

Basically, SDKs would use that binary as an “implementation stopgap”, with the end goal of eventually supporting it natively (and the SDK could actually support it natively from day 1)

That point of view changes things quite a bit. For instance it would be ok for the node sdk to just embed the proxy binary inside the npm package itself (I’ve seen a few packages doing that). Since it’s more of a “.dll” than an engine.

Embedding the full engine has bigger implications. Every script has its own instance, no multi tenancy. Services die off as soon as the script is done. We talked about how cloak attach was weird because it wanted to attach to an existing server rather than spawning its own

#

On the operations side, taking playground as an example:

Marcos will probably run dagger in a container, and mount the docker socket inside so that dagger can launch itself again as a daggerd container and communicate through stdio, which is plenty weird (although we could probably make that better regardless for the use case of running dagger in a container)

But then if it runs as a container: what to do with local files etc? (problem we do have in playground)

timber osprey
# hearty matrix or `dagger-client`

Related to point above

Could potentially be a much smaller binary embedded in the SDK. Basically a dagger.so library-ish, but over fork exec, to stopgap whatever the SDKs can’t implement on their own

trim surge
#

I have to call it a night but now that I have a slight bit of breathing room plan is to think about this so we can draw out options some more. But yes first step will be to just write out the problems explicitly, ensure they are real, etc. I'll update this discussion (which is currently just my dumping grounds, but feel free to dump thoughts too, I'll clean it up once thoughts are formed): https://github.com/dagger/dagger/discussions/3280

"Could potentially be a much smaller binary embedded in the SDK. Basically a dagger.so library-ish, but over fork exec, to stopgap whatever the SDKs can’t implement on their own"

I'd like this a lot; there are devils in the details related to session crap but I have been thinking about it on and off today and have some vague ideas on shortcuts we can look into. The local dirs specifically have an insanely complicated caching scheme, there's a small chance we might be able to use them independent of sessions, which will simplify things greatly I think

timber osprey
#

(Hope i’m not adding to confusion here, just brainstorming, not convinced either way)

timber osprey
#

I’ve been thinking about filesync/rsync etc, and what would be the simplest possible thing to do (relatively speaking).

Turns out, you can “wrap” gRPC over websockets (found an example repo on GitHub). And you can “proxy” buildkit (Erik you implemented that in the early days of the typescript SDK when TS was talking gRPC directly).

So in theory: engine could proxy buildkit’s filesync as is. Wrap it in WS.

Go SDK natively imports bk, wraps it in WS, exposed as SDK functions

Small binary uses the Go SDK to do just that. It’s embedded by SDKs that can’t implement the session stuff

Someday: we replace gRPC over WS by our own thing over WS, SDKs use that directly, no more binary

#

(e.g. /ws/v1/filesync is just a proxy for the filesync bits in gRPC over WS. /ws/v2/filesync can be a simpler protocol in the future that can be implemented natively in TS etc)
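
(Editor's sketch of the versioned endpoints idea. Only the two paths come from the message above. The route table, `dispatch`, and the `backend` callable are hypothetical, and no real websocket or gRPC library is involved: v1 just hands the payload to the backend as opaque bytes, and v2 is deliberately left unregistered.)

```python
from typing import Callable, Dict

Backend = Callable[[bytes], bytes]


def proxy_v1(frame: bytes, backend: Backend) -> bytes:
    """v1: forward the payload unchanged; the engine does not interpret it."""
    return backend(frame)


# A simpler native protocol could later be registered under /ws/v2/filesync
# without touching v1 clients.
ROUTES: Dict[str, Callable[[bytes, Backend], bytes]] = {
    "/ws/v1/filesync": proxy_v1,
}


def dispatch(path: str, frame: bytes, backend: Backend) -> bytes:
    """Route one incoming payload by endpoint path."""
    if path not in ROUTES:
        raise LookupError(f"no handler for {path}")
    return ROUTES[path](frame, backend)


def echo_backend(frame: bytes) -> bytes:
    """Stand-in for the proxied filesync backend."""
    return frame
```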

trim surge
# timber osprey I’ve been thinking about filesync/rsync etc, and would would be the simplest pos...

Yes to this, sync-as-a-service (or similar) in the very long term feels like a nice approach. Interestingly, we might actually replace local source llb ops with cache mounts running on gateway containers if we do that (maybe).

Slight tangent, but I was thinking a few days ago about how you’d implement “hot reload” of generated code clients whenever a change is made to your code first extension schema. One possibility would be to run mutagen (two way remote syncing service) over a cloak service. The HLB authors actually told me about that idea, they ran it directly over bk ssh sockets. So then you sync local extension code changes to the remote service over websocket, it generates client code and syncs that back to you.

I guess you could enable general hot reload use cases with this approach too? Like frontend dev tools and stuff. Or hot reload of extensions into the cloak engine too I suppose.

Either way, not high priority but kind of interesting and semi related in that it also involves file syncing. But mutagen doesn’t have native clients in non-go (I think, didn’t check though), so doesn’t really answer any of those questions.

hearty matrix
#

So, I understand the concern about having a complicated architecture with one local engine and possibly more remote engines. I find the potential solution (local engine is replaced by a helper proxy) elegant.

But. I have a concern of my own on the UX side: no more local engine means there is always a stateful daemon that needs to be installed and managed out of band. No more “install SDK, it just works!”. This leads to a UX similar to docker engine: soon you need to manage infrastructure on the side before you can do anything. Enter Docker Machine, boot2docker, Docker Toolbox, Vagrant, and of course podman, kubernetes... That is to say, a horrible and fragmented UX. Everyone endlessly tending to pet-like stateful engines, and arguing over the best tool to do so. Turning everyone into grumpy sysadmins.

How do we avoid that?

#

Unlike docker engine, our engine can get away with being stateless because it is based on buildkit which only has a cache to worry about. It would be a shame if we wasted that opportunity with a pet management UX a-la docker machine

trim surge
# hearty matrix So, I understand the concern about having a complicated architecture with one lo...

I agree that's one of the things we need to figure out and that we should find any way we can to hide the requirement of persistently running daemons. But I also think it's mostly orthogonal to whether the local engine exists or not.

In today's current state w/ the local engine, there is still a persistent stateful daemon (buildkitd). We could in theory fix that by making our buildkitd wrapper ephemeral while still retaining the local engine.

In a world where we get rid of the local engine (either have a helper binary or put everything in the client SDKs), we will still need to solve the problem of running buildkitd functionality ephemerally.

So it's not that the problems have no interaction with one another, it's more that making buildkitd ephemeral and cattle-like will require its own independent set of solutions.