#Toolchain Overlays POC by kpenfound · Pu...
1 messages · Page 1 of 1 (latest)
Does this solution for middleware make sense?
Is this better than just writing a main module for composition?
One concern I have is that its adding complexity to how modules are loaded. To understand the API, we now have
- Load the main module (or blueprint)
- Load the main module's dependencies
- Load toolchains and add proxy functions to the toolchain module's constructor
- Load overlay toolchains and apply proxies to the toolchain functions
What we gain is the ability to change how a toolchain function is called (like middleware) without writing a new check in a main module (we might not even have a main module) and disabling the toolchain check in the dagger.json
NOTE: this is a pretty pessimistic take having not used it for the use cases it's intended for, there's a good chance it's actually pretty nice all things considered, but leaving raw initial feedback because I'm guessing I won't be the only one
My 10 mile read on it is that it feels like an intimidating amount of complexity. Creating a third module that modifies how another module is used by your module feels like a lot to keep in your head. I'd be struggling to understand how all the components get joined together to result in the final interface, like you said.
I feel like I'd end up yearning for the relative simplicity of just using a module as a plain old dependency. Like the complexity of adding my own check that calls the module's check (~5 lines of code) would be preferable to the complexity of installing it as a toolchain and babysitting/modifying the checks that it brings along (through a combination of dagger.json and .env edits and overlay modules).
tl;dr this seems like a well designed solution for the problems we have, but I wonder if we could instead try to solve problems further upstream to avoid a complicated solution downstream
Some things to consider:
- Can we have toolchains act as super-powered dependencies, and "just" support calling them in the user module and returning their types? This would be distinct from plain dependencies, as their types would be effectively copied into the target module, avoiding the diamond dependency problem. We could maybe even support overriding methods on toolchain-provided types if we need to, just like we used to support that for
Containerfor a hot second near the start of Zenith. I'm not sure how you'd call the original though. - Do we need to have a "come to Jesus" moment with contextual dir args and try to find a new design? Feels like a lot of this hardship comes from how those work. (Failing to remember the details but I remember it coming up a lot when going over this, and they're in a sense the original reason toolchains exist)
- Overlays feel like monkey-patching in Ruby, but in Ruby it's a last resort. Through that lens, it makes me wonder if there's a better way to do module composition that's less brittle. But I don't know what that is. What I do know is if people design modules such that they expect users to define overlays to fully use them as the primary configuration/customization/parameterization technique, that would feel like pretty bad DX. Maybe something around dependency injection?
- How much of this can be solved by solid module design patterns? Things like "builder modules should always accept a configurable
base Container" This example feels like poor module design (sorry I know it's just a quick demo) - if a user wanted to build with a different base image entirely they'd have to completely ignore the originalBuildand do their own thing which feels crazy.
Sorry for all the words, trying to throw things at the wall and see what sticks 😅
thanks for the detailed analysis @compact folio ! I think we're on the same page, thats why I wanted to have this discussion
Can we have toolchains act as super-powered dependencies
Thats basically the alternative today. You can already use a toolchain as a dependency, but you can't override types. I don't know if thats really needed
Do we need to have a "come to Jesus" moment with contextual dir args and try to find a new design?
We definitely circled around this a few times when @deft arch and I were brainstorming this. I think on Monday someone (maybe @cobalt night ?) mentioned creating a way to find the true module source separate from context dirs, which feels like a prerequisite for handling the main context in a smarter way
Overlays feel like monkey-patching in Ruby
100%. The original idea was more like a middleware, but where we ended up was essentially monkey-patching. I think this leads to the next point
Things like "builder modules should always accept a configurable base Container"
Yes, but here's where it gets tricky. Taking our CI for example, to use a generic Go toolchain we would need to pass in a base container configured with multiple custom Services, some from other toolchains. So we have some cross-toolchain composition required. With a middleware, we could do some lightweight composition to create that base container that Go.Test needs. Or in a main module, we create a Test check that does the composition, and then we disable Go.Test (you can disable individual checks in the dagger.json). The later works today, but we're trying to see if a middleware solution could be better.
I'm not convinced the overlay solution is worth it when the current workaround isn't that much different and doesn't add to module loading complexity. But maybe there's still something to the original middleware idea?
I've also been considering that we might need a more advanced system for routing api calls to function calls now that we have blueprints and toolchains. It was simple enough when we just had individual modules, but the proxying stuff is a hack that could be handled by a proper router. And with a router, the middleware concept becomes much less complex
Hello! I will dive in after the week-end, but dropping a few thoughts:
-
I think we can salvage this feature with a chiropractic adjustment. Instead of "module FOO has a toolchain BAR but loaded as an overlay to BAZ", it should be: "module BAR extends module BAZ". This moves the feature to the toolchain's dagger.json - and makes it orthogonal to toolchains. It narrows the focus to a specific goal: a less verbose DX for wrappers.
-
Separately from all this, I agree that we should revisit context directories (
+defaultPathand+ignore). With the new toolchain pattern, they are even more in the UX hotpath, so it's worth looking for every opportunity to improve it.
module BAR extends module BAZ
would you then install BAR as a toolchain instead of BAZ?
Correct me if I'm wrong: in the current implementation you are required to install both as toolchains, whether you actually need both or not. What I'm proposing is that you would not be required to do that. Instead:
- If you need BAR, you install BAR
- If you actually need BAZ without any overlays, you install BAZ
- If you need both, you install both
And regardless of which toolchains you install, overlays are irrelevant at that point
Yep makes sense, was just making sure thats what you were saying. So the toolchain being overlaid is basically irrelevant to the user, they just care about the API of the toolchain they actually install. Fairly big departure from a middleware concept but I think it solves the problem
Do we need to have a "come to Jesus" moment with contextual dir args and try to find a new design? Feels like a lot of this hardship comes from how those work. (Failing to remember the details but I remember it coming up a lot when going over this, and they're in a sense the original reason toolchains exist)
@compact folio did you have anything specific in mind here? I'm interested..
not really 🫣
just extremely vague thoughts around dependency injection/discovery, Env, all that
but have to keep caching / cache key computation in mind, too
What do you think of @sullen venture 's proposal for https://github.com/dagger/dagger/issues/11287#issuecomment-3440807340
Seems related somehow
tldr: create a new type for the underlying "file server" that can produce directory snapshots; support passing that as argument to create your own snapshots
The part at the end feels intriguing to me, since what paths to load tends to change infrequently, and knowing that as early as possible seems like it'd have bigger performance implications for more potential systems (like smart checking, cache key computing, etc). I may have some bias though since it reminds me of my 'multiple .ignore files' idea 🙂
That's more of a "I want to know what we can find out about that approach" than a "I prefer that over LiveDir" - incrementally expanding a Directory does sound cool, but I wonder if it's more dynamic than we really need and would have a higher performance floor (as in higher price of admission)
I think the biggest part of the problem is there being one contextual dir - sometimes we want it to be the original module's source (e.g. Dang SDK), sometimes we want it to be the toolchain-using module's source, sometimes we want that to be an Env workspace
Yeah how aggressively we could optimize it, is an important question.
The original use case for the feature is very real, though. Basically having a way for users to not repeat themselves. If there's filtering info somewhere in the repo, even locked inside a native toolchain, a dagger module should be able to use it
I think starting by separating ModuleSource from context will help. Especially if we can expose the context directory in the API too, not just via magic arguments
expose the context directory in the API too, not just via magic arguments
That's the hard part, because of caching considerations (but yeah agreed)
ah right, so then further separating "here's the whole context you can access in the API" from "here's a special version of that you can get with a magical argument"
or just avoiding the API completely so there's not a wrong way to do the same thing
ok fine its a hard problem 😛
here's the whole context you can access in the API
But we can't expose that without breaking our whole caching system (specifically cache invalidation of the function calling that hypothetical API).
It's a "chicken and egg" problem
The idea in that issue (dynamic filtering) is to narrow the scope of the magic argument, and move as much logic as possible in regular code, inside an entrypoint function that is never cached.
The whole idea hinges on how fast we can make that never-cached entrypoint function (which is what @compact folio was concerned about earlier in this thread I think)
This "never-cached entrypoint" design does require a special argument type, and the exact shape of that type hasn't been defined. All we know is that it needs a way to create directory snapshots.
In that issue I call the type "FileServer", but it could also be broader in scope, and basically be "env 2.0".
OR, maybe it's not an argument at all, but a special pragma? "This function is my entrypoint" -> therefore it can call dag.CurrentEnv() or whatever. If another function does, it gets an error
OR, maybe you don't even need a pragma, you just have to do it in the module constructor? So we would assing more meaning to the concept of "constructor", which is neat because it already exists, it's just a little buried
the overhead of module function calls is less than a second I believe, at least last I looked during the boltdb fixes
but think there's some pretty reasonable options to reduce it further if needed
personally I like the argument approach the most because it's so explicit and obvious, but functionally speaking I think any of the options you listed end up equivalent
I'm personally super used to it, but still on the fence on whether fresh new users find it more or less confusing than alternatives
I like the fact that it's an argument, but I don't love that ignore lists become part of the argument spec - it's not so bad in some SDKs, but big JSON list comments in Go are unfortunate, and I worry about drift between those lists in general
(but it is convenient to define it all together, to be fair)
yeah that's an important distinction
That's part of the appeal, to me, of pulling those "specs" out as something that can be committed and possibly generated. Like maybe it would be nice to have names for different views of the source tree, and map those names to Directory arguments somehow
We did use to have views in dagger.json... But IMO the best place for those specs to live, is in code (if possible)
ah right, views is similar but not the best DX - caller has to specify it, no way to default to one for an arg, per-module JSON config instead of files in the repo at appropriate paths
i like the idea of driving them from code, e.g. to deal with replace rules in a go.mod, but generating them feels like a good next step on top of that, since these things rarely change and are almost never dynamic afaik
sorry i realize i'm mostly nibbling around the edges of an idea without putting forward a proposal to pick apart
But in the case of #fanvue for example, there doesn't seem to be a clear benefit of generating, since they're still manually maintaining a list at the end - plus dealing with the complication of generating from that. When ideally they would just manually maintain the dagger code directly (if the DX were good enough)
well, the generating is only a thing you'd do if you needed it, otherwise it'd be about the same as maintaining your .gitignore, which is a well-exercised muscle for most people i'd expect
(also was there a particular message re: their contextual dir constraints? or more a stand-in for the 'simple case'? or is their case actually hard?)
.gitignore support is something we should have regardless. IMO it's a bug that we're not able to apply .gitignore by default like we initially tried to.
imo the other stuff we're discussing here is when you need to filter even more
yeah, i'm just suggesting when you need more, it'd just be more repeating the same .gitignore pattern, just as other files representing views of your repo, instead of learning another ignore/path specification system
If we're talking about fanvue: here's the first message I got from him
like instead of // +defaultPath="/" +ignore=["**", "!**/*.go"] it'd be // +defaultDir="/.gopaths" or something, with .gopaths providing the ignore list in a .gitignore-compatible format
which could either be hand-maintained or generated, but that's an external exercise
Mmmm... We did ship a way to attach ignore filters to specific arguments of specific functions... Via "toolchain customizations". So we could lean on that if we wanted.
Different design from your example - instead of the code referencing an ignore file, it's a config file referencing the code... But same general idea (maintain your filters in a standard config format alongside your code, instead of mixed within it)
It'd be nice if we could make it more transparent, so you can configure filters once in a generic way and have things generally "just work" across toolchains.
Strawman idea: what if we could map file extensions to filters? e.g. %.go could represent all .go files and all the files they need , i.e. (**/*.go + and all // go:embeds + all go.mod replaced paths). The Go toolchain would use something like // +sources=["%.go"] for its source arg. %.go could default to **/*.go and be overridden by reading a conventional filename in the target directory (like .paths/go). The Go toolchain could provide a generator that finds the other dependencies and writes them to that file. In the CLI, foo/%.go could apply the same filters but scoped to foo/.
That's sort of designed on the theory that most args are generally accepting one or a handful of primary file types, and the dependencies descend from there.
Here's that https://github.com/dagger/dagger/pull/11642
I think its pretty cool actually. I'm also less concerned about this than I was about overlays. Wdyt @compact folio ?