#linting with not-committed autogen code
1 messages ยท Page 1 of 1 (latest)
potentially, could we go back to committing the autogen code? but still generate it at runtime (and maybe warning if there was a difference).
that way normal linting setups will "just work" again
we could, but I quite liked not committing 4k lines of code per module D:
I think the counter argument is that having to commit tons of generated code isn't really that unusual. E.g. this is just one of many pb.go files in buildkit, which has 13k lines: https://github.com/moby/buildkit/blob/master/solver/pb/ops.pb.go
As long as we combine it all into a single file, set the gitattributes to not show by default, etc. I think I'd be okay w/ it
Sure, but it's usually a bit more localized. Right now folks are putting a ton of modules in the same repo
If using protobuf meant generating 4k lines of client code at every call site I'm guessing people would be somehow less pleased with it
I guess I just don't see the practical downsides since it's just a git add still
(hopping into one-on-one, will check back after)
I don't think it's a matter of practicality, mostly aesthetics tbh. Some people feel passionately about repo bloat and clutter, and committing 90%-the-same code repeatedly in a repo isn't very pretty, practical or not. But then again committing un-compileable code is also not very pretty.
Maybe this can be solved by restructuring how codegen works; the biggest explosion came when we changed it to generate a full client instead of "just" the module dependency code, which was mostly to support monkey-patching, which is also a subject of debate at the moment (vs "safer" models like wrapping and namespacing type definitions, even though our monkey-patching model is still at least relatively safe because of the everyone-gets-a-schema-view behavior)
Related: https://discord.com/channels/707636530424053791/1156394226389622874 (cc @fiery grove @iron zenith in case y'all have any recent thoughts/opinions on codegen/monkey-patching now that you've used them for a bit)
If there's a way to do it that doesn't sacrifice the DX, then I'm all for it. I just couldn't find any other way due to the various constraints/quirks in Go. E.g. type aliases don't work because once you've defined the alias it's awkward to "go back" and have access to the original methods on the source of the alias.
If the concern is from having multiple modules in a single repo and thus multiplying 4k lines by number of modules, then I feel like we could just find a way for the generated code to be shared. So Some Wayโข๏ธ of being able to write logically different modules that can all be loaded separately, but they all use the same generated code.
We have a ton of flexibility in terms of changing how that's structured. The current approach of forcing use of a main package, always putting the generated code in the same package, etc. are just because they were most obvious to get going with. The x/tools/packages library is pretty flexible, it shouldn't be a massive lift to split things up across packages more.
But overall, I am inclined to see who actually cares about committing generated files. My instinct is that they'd be few and far between and probably get over the hurdle provided it's purely aesthetics, but I'm low conviction. It's mostly just based on how I feel, which is obviously not always gonna be how everyone else feels.
Yeah, we're both just guessing really. I'm always on team-commit-generated-code, but this case that seems to be testing my own personal limits of that (which I didn't even expect to have!), just because of the sheer amount of sprawl. If we can at least de-dupe the generated code somehow, or make it so we don't have to duplicate the "core" generated code, I'd be fully onboard with committing it, though we may still want regen-at-runtime just to ensure compatibility as dependent module's committed code grows stale
(though even that might not be a huge issue once the internal SDK API stabilizes)
though we may still want regen-at-runtime just to ensure compatibility as dependent module's committed code grows stale
Totally agree, basically what @crude magnet said above IIUC in terms of warning when there's a difference too and such
Maybe it's alright to allow both models for now - that way users can pick the policy they like, while for consumability we always regen in the engine
yeah, it's easy to support both, we just need a way for the user to indicate which model they want - basically, do you want it in .gitattributes or in .gitignore
Maybe a codegen section in dagger.json? Or something like that
could be a flag but you probably don't want to pass it every time
yeah dagger.json is probably the most sensible place to put it
"ignore_generated": true 
Yeah agree with the aesthetics part. Especially since it's different per language -- while it's common to commit generated code in Go, it's less so in other languages (IIRC Go was the first to "normalize" this pattern)
And there's this too
Oh on this note, all of this is highly specific to Go. We don't need the other SDKs to take the same approach. In Python/JS/TS for example, it's most likely much easier to monkey patch without needing to jump through so many hoops, so they can do whatever makes the most sense.
I was wondering if it'd be at all possible for generated code to "embed" top-level code, e.g.
//
// SDK
//
type Container interface {
...
}
type UpstreamContainer struct {
...
}
//
// codegen
//
type Container struct {
UpstreamContainer
}
func (c *Container) MyCustomStuff() {
...
}
IIRC I mentioned this to you @hollow pagoda but you explained why it wouldn't work, just can't remember
There's a few annoyances, the biggest being that if something else is returning a Container (i.e. some other method in the SDK), you all of a sudden lose your custom methods
On a side-note: to lessen the pain slightly, maybe we could move some SDK stuff out of internal (e.g. querybuilder) so it can be imported/re-used
Which is something that happens all the time in Go when trying to embed, even outside this use case, so I don't feel like we're doing something wrong or missing something. This is just Go being Go and we have to work with it.
Right! That's the reason
Yeah it would break chaining completely
I mean, chaining with custom stuff
dag.Container().Module().Foo().(*Container).Blargh()
Ok so, this is probably worse / no better, just brainstorming:
- There's an upstream interface (e.g.
Container) - There's a downstream interface, which implements upstream + custom fields (e.g. like a
ReadCloserthat also implementsReader) - Downstream code re-implements all functions. However, the implementation for upstream functions is just "call upstream + cast"
- Downstream types can be passed around to functions requiring upstream types (e.g.
ReadClosercan be passed instead of aReader)
This should reduce the boilerplate (1 one-liner ish per upstream function) while keeping compat
Eh, not necessarily much better, way more complexity on our end
is there a hack we can do with generics?
type Container[T] struct {
T
}
That would live in a common SDK package, and the module would always instantiate it with its own struct that contains additional methods
No clue if generics support that, trying to abuse embedding ๐
IIRC generics are pretty limited in Go. I should preface that by saying I don't know much about generics in Go
./prog.go:10:2: embedded field type cannot be a (pointer to a) type parameter
Had other issues anyway, e.g. how would the embedded container refer back to the "real" container
func (c *Container) Leaf(ctx) (string, error) { return c.Upstream.Leaf(ctx) }
func (c *Container) WithUpstream() *Container {
c.Upstream = c.Upstream.WithUpstream()
return c
}
func (c *Container) WithModule() *Container {
c.Upstream = c.Upstream.XX_Select("withModule", ...)
return c
}
kinda like this. Upstream calls are semi-passthrough. Downstream uses some query-builder functions expoed by upstream
this should shrink down the codegen part to commit quite a bit
and local.*Container implements dagger.Container so it can be passed around to outside code
downside is codegen complexity (now there's two modes, upstream gen and downstream gen)
also in this scenario no need to un-internal querybuilder since downstream would call upstream to add stuff into the query, rather than using querybuilder directly
downside is there needs to be some ugly public facing functions to do so
@hollow pagoda @lethal helm side note: what are your thoughts about core-type extension and namespacing?
A big part of me wants to wait and see how it plays out, i.e. allow everyone to extend core types, but probably draw the line at allowing modules to extend other module's types. We're doing something pretty novel with how the dependency namespaces interact; my initial instict was "oh that's scary" thinking back to Ruby, but Ruby was a total wild west. Our collision problem reduces to your direct dependencies, even excluding their dependencies. Ruby was one big ball of mud, so collisions were a bigger concern, and even that didn't feel like a day-to-day threat.
๐ yeah agree with the feeling
asking now because if we decide not to allow that, it changes completely the codegen problem (e.g. there's only one Container, and it's upstream, nice and easy)
Yeah, I've been trying to push the two problems against each other and hope one "wins" ๐
Yeah I was not sure until I asked about it in one of the Zenith Weekly chats and there was a lot of unambiguous excitement and support for it.
In terms of namespacing, while extending core types definitely increases the opportunities to conflict, the conflicts aren't fundamentally different from the conflicts we already needed to deal with and the solution is the same. It's just a matter of prefixing w/ module name (we were just discussing the details some more here: #1161413645075501186 message) .
So I don't feel bad about enabling it at all anymore. I would advocate for even allowing it on any types, including those from other modules, but could be convinced otherwise.
One thing I just thought of, doesn't Rust allow you to do this with Traits? Add new methods to existing objects you've directly imported? And I think the abstract idea behind it are "type classes".
Just wondering if there's anything we could learn from them. I'm also going way out on a limb w/ my knowledge here, so I could be spouting nonsense, take with salt.
IIUC it seems like Rust has a specific syntax for disambiguating in that case: https://stackoverflow.com/questions/49319935/how-can-i-avoid-a-function-name-clash-while-implementing-a-trait
Which is interesting, but I suppose for our use case the closest thing possible probably does end up just being to have prefixes to disambiguate
Yeah not really sure how the prefixing should work, in a cross-language way. I don't really like the idea of the codegen just transparently switching from Foo to Bar_Foo or Baz_Foo; I'd almost rather it error and give me some way to disambiguate, since I might not even remember the name of the module that provided the API (which you need immediately for autocomplete)
Maybe that's an implementation detail, it's fundamentally the same solution
I hadn't thought of using _ for this before, but it's safer than just BarFoo since you can end up with second-order conflicts there. But I don't think we can cheat as easily at the GraphQL layer as we can with the type names. Don't think we'd want all our GraphQL demos to look like container { netlify_deployToNetlify() }, so we probably only want to do the prefixing when necessary, unlike types which we can do all the time
Yeah error + some way to have the user tell us how to disambiguate is another approach for sure.
Another consideration in this are docs (i.e. go docs). Since we emphasize the consumer-side so heavily, I think we'll want to make it easy for consumers to view docs for their custom client too (as opposed to go.dev where docs are generated based on the "producer"). That was going to be needed even disregarding the namespacing, but definitely becomes more important w/ namespaces. Fortunately I think we have plenty of good options here, even all the way to using services + host proxying to serve docs as webpages on the fly
Agreed, the autoprefixing feels like it could very easily become a total pain.
I wonder if the go-level disambiguation could be done using a generic .As[MyContainer] func, or .As[YourContainer] - internally we could use reflection for constructing the right type, but we get a nice type-safe interface on the outside.
Quick poll on this topic, how do you all feel about not mixing all dependencies into a single GraphQL schema , but instead exposing each dependency in its own GraphQL schema, and allowing IDs from one schema to be used in other schemas? I know @hollow pagoda mentioned it would be "a lot of refactoring work in the client", so wouldn't be cheap. But setting aside feasibility for a minute, would we want to do it? Would it solve our schema multiplexing problems (above) and result in a better DX?
I don't think it would solve the hardest (imo) problem, which is how to deal with name conflicts in the codegen'd client SDKs, particularly Go. Having separate schemas would get rid of naming conflicts in the raw graphql level (in exchange for much less chainability), but you'd still have the problem of naming conflicts between structs+methods when generating Go code
Ah, you're right. I was thinking of the problem of namespacing function names, if each module could got its own global codegen var (basically splitting up dag). But it wouldn't solve the other half of the problem which is types.
Overall, I think I'm pretty inclined for our very first starting place to be no namespacing and returning an error if there's a conflict (should be the case already today). Then whenever (/if ever) people complain to us, we'll get the data on how often it happens and why the conflicts are arising in the first place.
I think if we try to solve it too early we probably end up less informed in the long run, maybe. But the good part is that once we have data we have a whole bunch of options available, almost all of which I'd be perfectly happy with; much better than having no clues at all ๐
I guess the most important thing in the meantime is to just keep our options open and make sure we don't take any one-way doors that rule out potential solutions if/when we get there.
I don't see any scenario where that doesn't become dll hell..
And I worry it's not a clean two-way door: once modules are out there, and we start seeing conflicts (IMO inevitable) it will break everyone to fix it
Is DLL hell the same as Ruby's ball-of-mud namespacing? If so, I don't think the comparison does justice to what we've actually built (see #1161333209594855464 message)
Ruby's "ball of mud" was the fact that everything existed in a global namespace, with "modules" (and classes) really acting more like named fence-posts that anyone could jump over freely at any time, so any Gem you loaded could add methods to any other class, or to Object, etc. - which is precisely not what we have, in fact you could argue we namespace relatively aggressively to avoid that. Instead each module has its own 'namespace' formed via the union of its direct module dependencies, which don't leak their transitive dependencies, instead each dependency has its own view, and so on, and so on. So the likelihood of collision is waaay smaller than you'd find in a lot of other languages.
We've got a sort of hybrid: classes are "open" in the sense that you can extend them, but those extensions are only exposed to those who directly request them via a module dependency. Closer to Ruby's "Refinements" than full-blown monkey-patching: https://docs.ruby-lang.org/en/2.4.0/syntax/refinements_rdoc.html
I'm just talking about a very basic scenario like:
- I import my supergit module, which has a type
Repository - I import, say a Debian module, which also has a type called
Repository - My module is broken because I now have two different types with the same name
This will happen all the time
I think of it as completely separate from monkeypatching. I no longer feel the need to monkeypatch core types (or other module types) anymore, I just create my own types and wrap as needed.
The only practical use of monkeypatching right now is to make it easier to pass directories and containers as arguments from a raw graphql query, while we wait for dagger call etc
Oh, yeah that part 100% needs to be fixed, but I don't think it needs to be super complicated
Sure, make it SupergitRepository and DebianRepository and I'm happy ๐
(My naive guess as how we would do it)
Probably something like Supergit_Repository and Debian_Repository, since the _ keeps us safe (there's no way to put that in a module/type name, and it's the only GraphQL-valid char available)
Right.
The nice thing is you never see type names in GraphQL queries - so it's mostly out-of-sight
There's some more discussion from yesterday on the details of this approach here https://discord.com/channels/707636530424053791/1161413645075501186
I agree that prefixing is the gist of all the possible solutions. And I think it almost certainly makes sense to just always prefix on the level of raw graphql schema.
Only question mark remaining for me is for the level of SDK client codegen; do we always do the prefixing by default? Do we only prefix when there's a conflict? Do we make this controllable by the user, e.g. dagger mod use --prefix <optional prefix, defaults to module name>? etc.
All of which ultimately is potentially a per-sdk decision too. Those are the questions where I'm not sure we can have an informed answer until we have more data. If conflicts are super common, then always prefixing begins to make more sense in my mind. If it's rarer, then I'd become more biased towards dagger mod use --prefix type solutions.
there was a specific request by @onyx turtle to always prefix, for consistency . That makes sense to me
otherwise things change too much based on adding a dependency. Too much surprise.
I agree the potential surprise is the tradeoff. I think the downsides of always doing it are increased verbosity when writing code and potentially awkward resulting names.
One example I can think of right away would be creating an object named PyPIRepository in a module named python, which for consumers would result in an object named PythonPyPIRepository, which starts to feel like one of those super verbose autogenerated names you see in Java all the time.
And w/ monkeypatching, a module named Foo that adds a function WithBar to Container would result in a method called FooWithBar, which is going to sound very weird in many cases.
That's why I'm still undecided on whether it should be done by default or only when conflicts are actually detected. I just wouldn't want to make everything way more verbose + ugly for everyone in order to handle a case that may or may not end up being relatively rare. If however it turns out to happen all the time then I think the surprise factor takes over in importance and we should probably just always prefix.