simplifications to runtime bootstrap | Dagger

wraith osprey Oct 11, 2023, 9:06 PM

#

Have one vague idea on an overall simplified approach to the whole thing, curious what disadvantages it has relative to current approach. I feel like I'm forgetting some versioning considerations, but not sure.

We partially go back to the previous approach in that we have at least one runtime that is hardcoded in our core code (like how we had goRuntime and pythonRuntime before, except now we also handle codegen too rather than just the runtime).
Anyone writing a new runtime/codegen implementation will have to either
1. Open a PR with us to add it to the set of hardcoded implementations (though we should be very conservative in what we add here)
2. Write a Module w/ ModuleRuntime and Codegen functions that itself uses one of the SDKs/runtimes we have hardcoded in core.
dagger.json can either specify one of those hardcoded SDKs or they can specify a Module reference that points to the external SDK implementation (i.e. pushed to git somewhere).

So let's say we have hardcoded implementations of the Go+Python SDK runtime/codegen in core.

Now I want to write a Zig SDK implementation. The dagger team doesn't want to include that as a hardcoded SDK in core, so I will just write an external Module implementing it. I have to choose between Go and Python to do so, which makes me slightly sad, but at least I have a few options.

We can try to find a reasonable set of SDK langs to hardcode in core that give enough options to at least not upset someone by forcing use of a language they consider their mortal enemy.

So I finish that, now there's 3 SDKs in the world: hardcoded Go+Python and externally defined Zig.

Now I want to write a Rust SDK, once again as an external module rather than hardcoded. When I implement that, I can choose to use Go/Python, but I can also now choose to use the Zig one by specifying the Zig SDK (just pointing to the module's location in Git + branch/tag/commit).

Now there's 4 SDKs in the world, etc. etc.

#

The modifications from the current state (as I understand it) are that there's no bootstrap command and no need to publish images, open an automated PR to update the hardcoded image reference of "well known SDKs", etc.

Basically, "well known SDKs" revert back to our approach before where it's just code in core, but we retain the ability to also define more SDKs as Modules. Externally defined SDKs will have an "ancestry" that ultimately points back to one of our hardcoded SDK implementations, but as more SDKs are added out in the ecosystem, more and more options for how to bootstrap become available.
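
A minimal sketch of the "ancestry" idea, assuming each SDK records which SDK it was implemented with. The `SDK` type, its fields, and the `Ancestry` function are hypothetical names for illustration, not anything in the Dagger codebase. It walks from an external SDK back to the hardcoded one it ultimately depends on, using the Go/Python/Zig/Rust example from this thread:

```go
package main

import "fmt"

// SDK is a hypothetical record of one SDK implementation.
type SDK struct {
	Name      string
	BuiltIn   bool   // true for SDKs hardcoded in core
	BuiltWith string // SDK used to implement this one; empty for built-ins
}

// Ancestry returns the chain from the named SDK back to the built-in SDK
// it ultimately depends on. It fails on unknown names and on cycles.
func Ancestry(sdks map[string]SDK, name string) ([]string, error) {
	var chain []string
	seen := map[string]bool{}
	for {
		sdk, ok := sdks[name]
		if !ok {
			return nil, fmt.Errorf("unknown SDK %q", name)
		}
		if seen[name] {
			return nil, fmt.Errorf("cycle in SDK ancestry at %q", name)
		}
		seen[name] = true
		chain = append(chain, name)
		if sdk.BuiltIn {
			return chain, nil
		}
		name = sdk.BuiltWith
	}
}

func main() {
	sdks := map[string]SDK{
		"go":     {Name: "go", BuiltIn: true},
		"python": {Name: "python", BuiltIn: true},
		"zig":    {Name: "zig", BuiltWith: "go"},
		"rust":   {Name: "rust", BuiltWith: "zig"},
	}
	chain, err := Ancestry(sdks, "rust")
	if err != nil {
		panic(err)
	}
	fmt.Println(chain) // prints [rust zig go]
}
```

The cycle check matters because nothing stops two external SDKs from naming each other; in that case there is no hardcoded root to bootstrap from.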

lunar iron Oct 11, 2023, 9:20 PM

#

Agree in principle that it'd be nice to avoid the complexity of bumping/bootstrapping the images. The biggest challenge I see there is the "except now we also handle codegen too" - where/when/how would that run? The previous hack was running the mounted-in dagger binary (...btw I think we never undid that change to ExperimentalPrivilegedNesting 😅) since the dagger CLI had the codegen logic built-in, and I couldn't think of a nice way to run custom codegen logic in the context of the runtime otherwise. But that led to scary issues like caches not busting when codegen needs to re-run. Right now the runtimes just use off-the-shelf images, so don't have to deal with putting custom logic into the build. Maybe I'm missing something simple though.

#

So for me it's weighing the complexity of that solution vs. the complexity of the shipping/bumping process for well-known SDKs (which isn't mapped out yet at all and could maybe possibly not be awful)

wraith osprey Oct 11, 2023, 9:26 PM

#

lunar iron Agree in principle that it'd be nice to avoid the complexity of bumping/bootstra...

The previous implementation of pythonRuntime was technically Go code that specified how to build the container image, but the implementation of the actual runtime itself was still Python code (specified via a dependency by the user's project code, though we can also easily just embed that code in the engine itself).

I feel like that exact same idea would work for codegen, right? There's an implementation of codegen in the python SDK already, if we want to run it, it's just a matter of loading it up into a container and running it.

lunar iron Oct 11, 2023, 9:27 PM

#

So some sort of embed.FS => Container shenanigans?

wraith osprey Oct 11, 2023, 9:27 PM

#

Not even that (unless we wanted to), we can just literally put the code in the engine image

#

or a packaged up shiv thing, etc.

lunar iron Oct 11, 2023, 9:30 PM

#

How would it get hoisted into the runtime container? Ideally that would happen in a way that doesn't lead to the same overly-aggressive caching quirks I ran into before

wraith osprey Oct 11, 2023, 9:31 PM

#

We can just use local sync

#

I did that before to load files from the engine image itself into the engine

#

Several iterations ago, I think for the branch we used to demo

lunar iron Oct 11, 2023, 9:32 PM

#

Oh ha, that's something we can only do now that the server runs in the engine right

wraith osprey Oct 11, 2023, 9:32 PM

#

Yes exactly, that is what makes this possible

lunar iron Oct 11, 2023, 9:33 PM

#

That solves that problem for me then 👍 - I'd still be interested in finding a way to not have to special-case a handful of SDKs, though

wraith osprey Oct 11, 2023, 9:34 PM

#

Another thing I just remembered is that we have some users w/ incredibly strict firewall policies that have requested we remove any hard dependencies on pulling from external registries (even dockerhub) at runtime. IIUC having all runtimes be container images would probably not work for them (or at least be a huge headache)

wraith osprey Oct 11, 2023, 9:34 PM

#

lunar iron That solves that problem for me then 👍 - I'd still be interested in finding a w...

One option is to only special case Go

#

And then when we add Python, it has to be a Go module. But then when we add Node, we can choose between Go+Python. Etc.

#

I'm pretty sure there would even be a path to e.g. first implement Python in Go, but then use that module to implement a pure Python Module SDK. Similar to how building Go from source has to first go through their old C implementation and then compile Go w/ Go. Though not sure if that's ever worth the headache for us

lunar iron Oct 11, 2023, 9:37 PM

#

wraith osprey Another thing I just remembered is that we have some users w/ incredibly strict ...

IMO we should address this by solving the general problem (constantly resolving refs even when they're already pinned to digests, also something something registry mirrors/pre-fetching/etc) rather than using it as an incentive here one way or another

wraith osprey Oct 11, 2023, 9:39 PM

#

lunar iron IMO we should address this by solving the general problem (constantly resolving ...

Sure I guess the equivalent would be to not put directories in the engine image but instead put a container image tarball and load it (ish, something like that)

#

But same basic idea I think

wraith osprey Oct 11, 2023, 9:43 PM

#

wraith osprey One option is to *only* special case Go

I guess if we want to avoid hard network dependencies for our basic functionality, then it probably makes sense for our "official" SDKs to be hardcoded in the engine. So we can at least tell users that as long as you use one of the official ones, you don't need to update your firewalls.

lunar iron Oct 11, 2023, 10:12 PM

#

Taking a step back to try to unpack the complexity of the current everything-is-a-module + bootstrapping process. Does this sound accurate?

Glossary:

Client = the "bare" API client generated by the SDK via schema introspection, i.e. dagger.io/dagger as it is today
Runtime = whatever your language needs for evaluating a module written in the target language (for Go that's codegen + func main(), for Python it's dynamic I presume)
Runtime binary = the binary [or script] that implements the Module calling convention
Module container = the container with a runtime binary as its entrypoint
SDK Module = a Module that knows how to build Module containers from source code in the target language

Bootstrapping steps:

1. Implement Schema => Client codegen in your language of choice
2. Use step 1 to implement Runtime, whether that's part of codegen (Go) or evaluated at runtime (Python?)
3. Implement your SDK module
   a. Use step 2 to implement SDK Module in your target language
   b. Or use some other SDK
4. Build your SDK module into a Module container
   a. Use step 1 to build a Module container "by hand" (./cmd/bootstrap)
   b. Or use some other SDK
5. [Optional] Use step 4 to build the SDK module using itself
6. Build + publish Module container image somewhere

wraith osprey Oct 11, 2023, 10:17 PM

#

lunar iron Taking a step back to try to unpack the complexity of the current everything-is-...

From my understanding, that's accurate (cc @modest canyon)

#

The parts that would be gone with what I'm suggesting are 4+5. And 6 would be replaced with:

If a hardcoded SDK, update Dagger CI to include your SDK in the engine image directly (to avoid internet dependencies)
If an external SDK, push it to Git (or theoretically we could support images too, etc.)

lunar iron Oct 11, 2023, 10:26 PM

#

[I've typed maybe 5 beginnings of sentences trying to relate this to the idea of turning core/ into a module, I think it's aligned somehow but don't have anything coherent]

#

When you say "what would be gone" you're mainly talking about for us Dagger maintainers right? i.e. anyone incentivized (or for us, forced) to build a "Canonical Well-Known SDK." I would imagine most folks passionate enough to maintain an SDK for their pet language would rather just figure out the full bootstrapping process

wraith osprey Oct 11, 2023, 10:31 PM

#

lunar iron When you say "what would be gone" you're mainly talking about for us Dagger main...

Well it would be gone for anyone, us and them. Implementing an external SDK would be:

Implement Codegen
Implement Runtime
Package up as Module (using an existing SDK)
Push to Git

lunar iron Oct 11, 2023, 10:32 PM

#

Isn't that already something you can do with the current process? (see 3b and 4b)

wraith osprey Oct 11, 2023, 10:33 PM

#

Yeah, it's just simpler in that there's no need to always publish a container image, no external bootstrap command to figure out, etc.

#

Basically just trims the need for a "Module Container" as described in step 4

#

It's just Module

lunar iron Oct 11, 2023, 10:35 PM

#

I don't think you need a module container anyway; we should probably just replace sdkRuntime with a module ref

#

Then it'd just use the SDK Module's SDK to build its own module, like any other module

#

We'd still need an image ref for bootstrapping somewhere, granted, just saying the external SDK authoring flow can be simplified in either scenario

#

Either way I'm good with baking our officially supported SDKs into the engine as source code built at runtime, since it seems like it'd really simplify our dev/publishing flow, and it avoids the caching issues I ran into before. Sorry for the extended back and forth, just trying to make sure everything's clear on both sides since it's a tricky system only partially realized across all of our heads 😅

wraith osprey Oct 11, 2023, 10:44 PM

#

No it's all good, thanks for going through it! I am still wrapping my head around things and am still evolving my own thoughts, so it's all helpful.

lunar iron Oct 11, 2023, 10:46 PM

#

It seems like in the end we want dagger.json to have something like "sdk": "go" for built-in SDKs and "sdkModule": "github.com/vito/daggerverse/bass@abcdefg" for externally developed ones, which has to be implemented using either a built-in SDK or their favorite language's SDK (...which is, somewhere at the bottom, implemented using a built-in SDK)

wraith osprey Oct 11, 2023, 10:46 PM

#

lunar iron It seems like in the end we want `dagger.json` to have something like `"sdk": "g...

Yes exactly, built-in ones are the roots of the DAG
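
A rough sketch of how the `dagger.json` shape proposed above could be read. The `"sdk"` and `"sdkModule"` field names and the example ref come from the message above; everything else (`ModuleConfig`, `SDKRef`, `ParseSDKRef`, and the rule that exactly one field is set and that a ref must carry a version after `@`) is an assumption made for illustration, not the actual Dagger config format.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ModuleConfig mirrors only the two proposed dagger.json fields.
type ModuleConfig struct {
	SDK       string `json:"sdk,omitempty"`       // built-in SDK, e.g. "go"
	SDKModule string `json:"sdkModule,omitempty"` // external SDK module ref
}

// SDKRef is the resolved choice: either a built-in name or a pinned source.
type SDKRef struct {
	BuiltIn string
	Source  string
	Version string
}

// ParseSDKRef reads dagger.json contents and returns which SDK to use.
func ParseSDKRef(data []byte) (SDKRef, error) {
	var cfg ModuleConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return SDKRef{}, fmt.Errorf("parse dagger.json: %w", err)
	}
	switch {
	case cfg.SDK != "" && cfg.SDKModule != "":
		return SDKRef{}, errors.New(`set only one of "sdk" and "sdkModule"`)
	case cfg.SDK != "":
		return SDKRef{BuiltIn: cfg.SDK}, nil
	case cfg.SDKModule != "":
		// Split on the last "@" so the version is whatever follows it.
		i := strings.LastIndex(cfg.SDKModule, "@")
		if i <= 0 || i == len(cfg.SDKModule)-1 {
			return SDKRef{}, fmt.Errorf("sdkModule %q must look like <source>@<version>", cfg.SDKModule)
		}
		return SDKRef{Source: cfg.SDKModule[:i], Version: cfg.SDKModule[i+1:]}, nil
	default:
		return SDKRef{}, errors.New(`one of "sdk" or "sdkModule" is required`)
	}
}

func main() {
	for _, raw := range []string{
		`{"sdk": "go"}`,
		`{"sdkModule": "github.com/vito/daggerverse/bass@abcdefg"}`,
	} {
		ref, err := ParseSDKRef([]byte(raw))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", ref)
	}
}
```

A built-in ref is a root and needs no further resolution; an external ref would itself be a module with its own `dagger.json`, which is where the chain back to a built-in SDK comes from.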

modest canyon Oct 11, 2023, 10:49 PM

#

I'd prefer building the python runtime with a go module if that simplifies the bootstrapping. Right now the issue is that mixing languages doesn't work, since the server expects the module container's entrypoint to be the built entrypoint of the included module. In go that's done by passing the sdk-go module through its own moduleRuntime function. That only works if they're the same language.

#

Pointing to a module ref instead is simpler.

wraith osprey Oct 11, 2023, 10:53 PM

#

modest canyon I'd prefer building the python runtime with a go module if that simplifies the b...

Yep that's the idea. I was originally saying at the beginning of this thread to essentially go all the way back to the previous pythonRuntime func (only for our "official" SDKs, external SDKs can still be modules). That's maybe still where I'll start but I do see a route where only Go could be special, Python SDK is a Go Module, etc.

Honestly the newest thing to me is that we need to avoid internet dependencies for our built-in SDKs, which I'm realizing is going to be a PITA in terms of bloating our engine image... I'll keep thinking about that.

#

Gonna go dive into actually typing the code for this, we'll see what happens in practice 😄

modest canyon Oct 11, 2023, 10:55 PM

#

Yep, I think it's actually nice to be able to use the higher level go code in a go module rather than the lower level core functions. I also need to include the sdk itself in the runtime container somehow. How do you suggest doing that with the previous pythonRuntime?

wraith osprey Oct 11, 2023, 10:57 PM

#

modest canyon Yep, I think it's actually nice to be able to use the higher level go code in a ...

For each official SDK that we want to support without having a hard dep on the internet, we're gonna have to build it into our engine image. We then have various options to efficiently load it as a container thanks to the fact that the graphql server is in the engine container now. We can do local dir syncs, load oci tarballs, etc.

#

By "build it into our engine image", I mean just update our CI to do that. Which is funny because our CI is now a Go Module :laughcry:

modest canyon Oct 11, 2023, 10:59 PM

#

Ok, cool 🙂 I just need the includes in bootstrap.py (draft PR from earlier) and add "/sdk" as an arg to shiv.

wraith osprey Oct 11, 2023, 11:00 PM

#

modest canyon Ok, cool 🙂 I just need the includes in `bootstrap.py` (draft PR from earlier) a...

Something like that, yeah. Actually, does shiv include the python interpreter too? Or does that still have to be external?

#

That's where my concerns about image bloat are coming from (python interpreter, go compiler, etc.)

modest canyon Oct 11, 2023, 11:01 PM

#

Ah, but that's only needed during build of the runtime container. For the engine container, I just need those files to be copied into the runtime later.

#

shiv is similar to go build after codegen'ning a module.

#

No interpreter, it's just the python packages archived in a single executable.

wraith osprey Oct 11, 2023, 11:05 PM

#

modest canyon Ah, but that's only needed during build of the runtime container. For the engine...

Well if we want to actually run any user modules, they will need to be compiled/interpreted. But also, we currently aren't planning on pre-packaging any modules, so I suppose atm the only way to avoid internet deps would be for them to always load from a local dir... :hyperthinkspin:

#

I'm getting too lost in my own thoughts now, I'm just gonna try doing it and see what happens. But either way, yeah as a baseline we'll shiv stuff up and put it in the engine image
