#simplifications to runtime bootstrap
Have one vague idea on an overall simplified approach to the whole thing, curious what disadvantages it has relative to current approach. I feel like I'm forgetting some versioning considerations, but not sure.
- We partially go back to the previous approach in that we have at least one runtime that is hardcoded in our core code (like how we had goRuntime and pythonRuntime before, except now we also handle codegen too rather than just the runtime).
- Anyone writing a new runtime/codegen implementation will have to either
  - Open a PR with us to add it to the set of hardcoded implementations (though we should be very conservative in what we add here)
  - Write a Module w/ ModuleRuntime and Codegen functions that itself uses one of the SDKs/runtimes we have hardcoded in core.
dagger.json can either specify one of those hardcoded SDKs or they can specify a Module reference that points to the external SDK implementation (i.e. pushed to git somewhere).
So let's say we have hardcoded implementations of the Go+Python SDK runtime/codegen in core.
Now I want to write a Zig SDK implementation. The dagger team doesn't want to include that as a hardcoded SDK in core, so I will just write an external Module implementing it. I have to choose between Go and Python to do so, which makes me slightly sad, but at least I have a few options.
- We can try to find a reasonable set of SDK langs to hardcode in core that give enough options to at least not upset someone by forcing use of a language they consider their mortal enemy.
So I finish that, now there's 3 SDKs in the world: hardcoded Go+Python and externally defined Zig.
Now I want to write a Rust SDK, once again as an external module rather than hardcoded. When I implement that, I can choose to use Go/Python, but I can also now choose to use the Zig one by specifying the Zig SDK (just pointing to the module's location in Git + branch/tag/commit).
Now there's 4 SDKs in the world, etc. etc.
The modifications from the current state (as I understand it) are that there's no bootstrap command and no need to publish images, open an automated PR to update the hardcoded image reference of "well known SDKs", etc.
Basically, "well known SDKs" revert back to our approach before where it's just code in core, but we retain the ability to also define more SDKs as Modules. Externally defined SDKs will have an "ancestry" that ultimately points back to one of our hardcoded SDK implementations, but as more SDKs are added out in the ecosystem, more and more options for how to bootstrap become available.
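A minimal sketch of the "ancestry" idea, not taken from the actual codebase: each external SDK records which SDK it is implemented with, and resolution walks that chain until it reaches a hardcoded one. The map names, the `ResolveAncestry` helper, and the `github.com/example/...` refs are all hypothetical and only illustrate the shape of the lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// builtinSDKs is a hypothetical set of SDKs hardcoded in core.
var builtinSDKs = map[string]bool{"go": true, "python": true}

// externalSDKs maps a hypothetical external SDK module ref to the SDK it is
// implemented with (a built-in name or another external ref).
var externalSDKs = map[string]string{
	"github.com/example/zig-sdk@v1":  "go",
	"github.com/example/rust-sdk@v1": "github.com/example/zig-sdk@v1",
}

// ResolveAncestry follows the "implemented with" links from sdk until it
// reaches a built-in SDK. The returned chain starts at sdk and ends at the
// built-in root. Unknown refs and cycles are reported as errors.
func ResolveAncestry(sdk string) ([]string, error) {
	seen := map[string]bool{}
	var chain []string
	for cur := sdk; ; {
		if seen[cur] {
			return nil, fmt.Errorf("cycle in SDK ancestry at %q", cur)
		}
		seen[cur] = true
		chain = append(chain, cur)
		if builtinSDKs[cur] {
			return chain, nil
		}
		parent, ok := externalSDKs[cur]
		if !ok {
			return nil, errors.New("unknown SDK: " + cur)
		}
		cur = parent
	}
}

func main() {
	chain, err := ResolveAncestry("github.com/example/rust-sdk@v1")
	if err != nil {
		panic(err)
	}
	// Prints the Rust SDK ref, then the Zig SDK ref, then the built-in root.
	for _, sdk := range chain {
		fmt.Println(sdk)
	}
}
```

In this sketch the Rust SDK resolves through the Zig SDK down to the built-in Go SDK, which matches the 4-SDK scenario described above.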
Agree in principle that it'd be nice to avoid the complexity of bumping/bootstrapping the images. The biggest challenge I see there is the "except now we also handle codegen too" - where/when/how would that run? The previous hack was running the mounted-in dagger binary (...btw I think we never undid that change to ExperimentalPrivilegedNesting) since the dagger CLI had the codegen logic built-in, and I couldn't think of a nice way to run custom codegen logic in the context of the runtime otherwise. But that led to scary issues like caches not busting when codegen needs to re-run. Right now the runtimes just use off-the-shelf images, so don't have to deal with putting custom logic into the build. Maybe I'm missing something simple though.
So for me it's weighing the complexity of that solution vs. the complexity of the shipping/bumping process for well-known SDKs (which isn't mapped out yet at all and could maybe possibly not be awful)
The previous implementation of pythonRuntime was technically Go code that specified how to build the container image, but the implementation of the actual runtime itself was still Python code (specified via a dependency by the user's project code, though we can also easily just embed that code in the engine itself).
I feel like that exact same idea would work for codegen, right? There's an implementation of codegen in the python SDK already, if we want to run it, it's just a matter of loading it up into a container and running it.
So some sort of embed.FS => Container shenanigans?
Not even that (unless we wanted to), we can just literally put the code in the engine image
or a packaged up shiv thing, etc.
How would it get hoisted into the runtime container? Ideally that would happen in a way that doesn't lead to the same overly-aggressive caching quirks I ran into before
We can just use local sync
I did that before to load files from the engine image itself into the engine
Several iterations ago, I think for the branch we used to demo
Oh ha, that's something we can only do now that the server runs in the engine right
Yes exactly, that is what makes this possible
That solves that problem for me then - I'd still be interested in finding a way to not have to special-case a handful of SDKs, though
Another thing I just remembered is that we have some users w/ incredibly strict firewall policies that have requested we remove any hard dependencies on pulling from external registries (even dockerhub) at runtime. IIUC having all runtimes be container images would probably not work for them (or at least be a huge headache)
One option is to only special case Go
And then when we add Python, it has to be a Go module. But then when we add Node, we can choose between Go+Python. Etc.
I'm pretty sure there would even be a path to e.g. first implement Python in Go, but then use that module to implement a pure Python Module SDK. Similar to how building Go from source has to first go through their old C implementation and then compile Go w/ Go. Though not sure if that's ever worth the headache for us
IMO we should address this by solving the general problem (constantly resolving refs even when they're already pinned to digests, also something something registry mirrors/pre-fetching/etc) rather than using it as an incentive here one way or another
Sure I guess the equivalent would be to not put directories in the engine image but instead put a container image tarball and load it (ish, something like that)
But same basic idea I think
I guess if we want to avoid hard network dependencies for our basic functionality, then it probably makes sense for our "official" SDKs to be hardcoded in the engine. So we can at least tell users that as long as you use one of the official ones, you don't need to update your firewalls.
Taking a step back to try to unpack the complexity of the current everything-is-a-module + bootstrapping process. Does this sound accurate?
Glossary:
- Client = the "bare" API client generated by the SDK via schema introspection, i.e. dagger.io/dagger as it is today
- Runtime = whatever your language needs for evaluating a module written in the target language (for Go that's codegen + func main(), for Python it's dynamic I presume)
func main(), for Python it's dynamic I presume) - Runtime binary = the binary [or script] that implements the Module calling convention
- Module container = the container with a runtime binary as its entrypoint
- SDK Module = a Module that knows how to build Module containers from source code in the target language
Bootstrapping steps:
1. Implement Schema => Client codegen in your language of choice
2. Use step 1 to implement Runtime, whether that's part of codegen (Go) or evaluated at runtime (Python?)
3. Implement your SDK module
   a. Use step 2 to implement SDK Module in your target language
   b. Or use some other SDK
4. Build your SDK module into a Module container
   a. Use step 1 to build a Module container "by hand" (./cmd/bootstrap)
   b. Or use some other SDK
5. [Optional] Use step 4 to build the SDK module using itself
6. Build + publish Module container image somewhere
From my understanding, that's accurate (cc @modest canyon)
The parts that would be gone with what I'm suggesting are 4+5. And 6 would be replaced with:
- If a hardcoded SDK, update Dagger CI to include your SDK in the engine image directly (to avoid internet dependencies)
- If an external SDK, push it to Git (or theoretically we could support images too, etc.)
[I've typed maybe 5 beginnings of sentences trying to relate this to the idea of turning core/ into a module, I think it's aligned somehow but don't have anything coherent]
When you say "what would be gone" you're mainly talking about for us Dagger maintainers right? i.e. anyone incentivized (or for us, forced) to build a "Canonical Well-Known SDK." I would imagine most folks passionate enough to maintain an SDK for their pet language would rather just figure out the full bootstrapping process
Well it would be gone for anyone, us and them. Implementing an external SDK would be:
- Implement Codegen
- Implement Runtime
- Package up as Module (using an existing SDK)
- Push to Git
Isn't that already something you can do with the current process? (see 3b and 4b)
Yeah, it's just simpler in that there's no need to always publish a container image, no external bootstrap command to figure out, etc.
Basically just trims the need for a "Module Container" as described in step 4
It's just Module
I don't think you need a module container anyway; we should probably just replace sdkRuntime with a module ref
Then it'd just use the SDK Module's SDK to build its own module, like any other module
We'd still need an image ref for bootstrapping somewhere, granted, just saying the external SDK authoring flow can be simplified in either scenario
Either way I'm good with baking our officially supported SDKs into the engine as source code built at runtime, since it seems like it'd really simplify our dev/publishing flow, and it avoids the caching issues I ran into before. Sorry for the extended back and forth, just trying to make sure everything's clear on both sides since it's a tricky system only partially realized across all of our heads
No it's all good, thanks for going through it! I am still wrapping my head around things and am still evolving my own thoughts, so it's all helpful.
It seems like in the end we want dagger.json to have something like "sdk": "go" for built-in SDKs and "sdkModule": "github.com/vito/daggerverse/bass@abcdefg" for externally developed ones, which has to be implemented using either a built-in SDK or their favorite language's SDK (...which is, somewhere at the bottom, implemented using a built-in SDK)
Yes exactly, built-in ones are the roots of the DAG
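A hedged sketch of how the proposed dagger.json shape could be parsed and validated. Only the "sdk" and "sdkModule" keys come from the message above; the `ModuleConfig` type, the "name" field, and the exactly-one-of rule are assumptions for illustration, not the actual dagger.json schema.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ModuleConfig is a hypothetical view of dagger.json. "sdk" names a built-in
// SDK and "sdkModule" points at an externally developed SDK module.
type ModuleConfig struct {
	Name      string `json:"name"` // assumed field, for illustration only
	SDK       string `json:"sdk,omitempty"`
	SDKModule string `json:"sdkModule,omitempty"`
}

// ParseModuleConfig decodes the config and enforces (as an assumption) that
// exactly one of "sdk" and "sdkModule" is set.
func ParseModuleConfig(data []byte) (*ModuleConfig, error) {
	var cfg ModuleConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parsing dagger.json: %w", err)
	}
	switch {
	case cfg.SDK != "" && cfg.SDKModule != "":
		return nil, errors.New(`set only one of "sdk" and "sdkModule"`)
	case cfg.SDK == "" && cfg.SDKModule == "":
		return nil, errors.New(`one of "sdk" or "sdkModule" is required`)
	}
	return &cfg, nil
}

func main() {
	builtin := []byte(`{"name": "my-module", "sdk": "go"}`)
	external := []byte(`{"name": "my-module", "sdkModule": "github.com/vito/daggerverse/bass@abcdefg"}`)

	for _, raw := range [][]byte{builtin, external} {
		cfg, err := ParseModuleConfig(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("sdk=%q sdkModule=%q\n", cfg.SDK, cfg.SDKModule)
	}
}
```

Keeping the two keys separate, rather than overloading one string, makes it unambiguous whether a value is a built-in name or a module ref, at the cost of one extra validation step.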
I'd prefer building the python runtime with a go module if that simplifies the bootstrapping. Right now the issue is that mixing languages doesn't work, since the server expects the module container's entrypoint to be the built entrypoint of the included module. In go that's done by passing the sdk-go module through its own moduleRuntime function. That only works if they're the same language.
Pointing to a module ref instead is simpler.
Yep that's the idea. I was originally saying at the beginning of this thread to essentially go all the way back to the previous pythonRuntime func (only for our "official" SDKs, external SDKs can still be modules). That's maybe still where I'll start but I do see a route where only Go could be special, Python SDK is a Go Module, etc.
Honestly the newest thing to me is that we need to avoid internet dependencies for our built-in SDKs, which I'm realizing is going to be a PITA in terms of bloating our engine image... I'll keep thinking about that.
Gonna go dive into actually typing the code for this, we'll see what happens in practice
Yep, I think it's actually nice to be able to use the higher level go code in a go module rather than the lower level core functions. I also need to include the sdk itself in the runtime container somehow. How do you suggest doing that with the previous pythonRuntime?
For each official SDK that we want to support without having a hard dep on the internet, we're gonna have to build it into our engine image. We then have various options to efficiently load it as a container thanks to the fact that the graphql server is in the engine container now. We can do local dir syncs, load oci tarballs, etc.
By "build it into our engine image", I mean just update our CI to do that. Which is funny because our CI is now a Go Module 
Ok, cool. I just need the includes in bootstrap.py (draft PR from earlier) and add "/sdk" as an arg to shiv.
Something like that, yeah. Actually, does shiv include the python interpreter too? Or does that still have to be external?
That's where my concerns about image bloat are coming from (python interpreter, go compiler, etc.)
Ah, but that's only needed during build of the runtime container. For the engine container, I just need those files to be copied into the runtime later.
shiv is similar to go build after codegen'ning a module.
No interpreter, it's just the python packages archived in a single executable.
Well if we want to actually run any user modules, they will need to be compiled/interpreted. But also, we currently aren't planning on pre-packaging any modules, so I suppose atm the only way to avoid internet deps would be for them to always load from a local dir... 
I'm getting too lost in my own thoughts now, I'm just gonna try doing it and see what happens. But either way, yeah as a baseline we'll shiv stuff up and put it in the engine image