#linting with not-committed autogen code

1 messages ยท Page 1 of 1 (latest)

crude magnet
#

potentially, could we go back to committing the autogen code? but still generate it at runtime (and maybe warning if there was a difference).
that way normal linting setups will "just work" again

lethal helm
#

we could, but I quite liked not committing 4k lines of code per module D:

hollow pagoda
#

As long as we combine it all into a single file, set the gitattributes to not show by default, etc. I think I'd be okay w/ it

lethal helm
#

Sure, but it's usually a bit more localized. Right now folks are putting a ton of modules in the same repo

#

If using protobuf meant generating 4k lines of client code at every call site I'm guessing people would be somehow less pleased with it

hollow pagoda
#

I guess I just don't see the practical downsides since it's just a git add still

#

(hopping into one-on-one, will check back after)

lethal helm
#

I don't think it's a matter of practicality, mostly aesthetics tbh. Some people feel passionately about repo bloat and clutter, and committing 90%-the-same code repeatedly in a repo isn't very pretty, practical or not. But then again committing un-compileable code is also not very pretty.

#

Maybe this can be solved by restructuring how codegen works; the biggest explosion came when we changed it to generate a full client instead of "just" the module dependency code, which was mostly to support monkey-patching, which is also a subject of debate at the moment (vs "safer" models like wrapping and namespacing type definitions, even though our monkey-patching model is still at least relatively safe because of the everyone-gets-a-schema-view behavior)

hollow pagoda
#

If the concern is from having multiple modules in a single repo and thus multiplying 4k lines by number of modules, then I feel like we could just find a way for the generated code to be shared. So Some Wayโ„ข๏ธ of being able to write logically different modules that can all be loaded separately, but they all use the same generated code.

We have a ton of flexibility in terms of changing how that's structured. The current approach of forcing use of a main package, always putting the generated code in the same package, etc. are just because they were most obvious to get going with. The x/tools/packages library is pretty flexible, it shouldn't be a massive lift to split things up across packages more.

#

But overall, I am inclined to see who actually cares about committing generated files. My instinct is that they'd be few and far between and probably get over the hurdle provided it's purely aesthetics, but I'm low conviction. It's mostly just based on how I feel, which is obviously not always gonna be how everyone else feels.

lethal helm
#

Yeah, we're both just guessing really. I'm always on team-commit-generated-code, but this case that seems to be testing my own personal limits of that (which I didn't even expect to have!), just because of the sheer amount of sprawl. If we can at least de-dupe the generated code somehow, or make it so we don't have to duplicate the "core" generated code, I'd be fully onboard with committing it, though we may still want regen-at-runtime just to ensure compatibility as dependent module's committed code grows stale

#

(though even that might not be a huge issue once the internal SDK API stabilizes)

hollow pagoda
#

though we may still want regen-at-runtime just to ensure compatibility as dependent module's committed code grows stale
Totally agree, basically what @crude magnet said above IIUC in terms of warning when there's a difference too and such

crude magnet
#

Maybe it's alright to allow both models for now - that way users can pick the policy they like, while for consumability we always regen in the engine

lethal helm
#

yeah, it's easy to support both, we just need a way for the user to indicate which model they want - basically, do you want it in .gitattributes or in .gitignore

crude magnet
#

Maybe a codegen section in dagger.json? Or something like that

lethal helm
#

could be a flag but you probably don't want to pass it every time

#

yeah dagger.json is probably the most sensible place to put it

#

"ignore_generated": true shrug

fiery grove
hollow pagoda
fiery grove
#

I was wondering if it'd be at all possible for generated code to "embed" top-level code, e.g.

//
// SDK
//
type Container interface {
  ...
}

type UpstreamContainer struct {
  ...
}

//
// codegen
//

type Container struct {
  UpstreamContainer
}

func (c *Container) MyCustomStuff() {
   ...
}
#

IIRC I mentioned this to you @hollow pagoda but you explained why it wouldn't work, just can't remember

hollow pagoda
fiery grove
#

On a side-note: to lessen the pain slightly, maybe we could move some SDK stuff out of internal (e.g. querybuilder) so it can be imported/re-used

hollow pagoda
#

Which is something that happens all the time in Go when trying to embed, even outside this use case, so I don't feel like we're doing something wrong or missing something. This is just Go being Go and we have to work with it.

fiery grove
#

Yeah it would break chaining completely

#

I mean, chaining with custom stuff

#

dag.Container().Module().Foo().(*Container).Blargh()

#

Ok so, this is probably worse / no better, just brainstorming:

  • There's an upstream interface (e.g. Container)
  • There's a downstream interface, which implements upstream + custom fields (e.g. like a ReadCloser that also implements Reader)
  • Downstream code re-implements all functions. However, the implementation for upstream functions is just "call upstream + cast"
  • Downstream types can be passed around to functions requiring upstream types (e.g. ReadCloser can be passed instead of a Reader)
#

This should reduce the boilerplate (1 one-liner ish per upstream function) while keeping compat

#

Eh, not necessarily much better, way more complexity on our end

lethal helm
#

is there a hack we can do with generics?

type Container[T] struct {
  T
}

That would live in a common SDK package, and the module would always instantiate it with its own struct that contains additional methods

#

No clue if generics support that, trying to abuse embedding ๐Ÿ˜‚

fiery grove
#

IIRC generics are pretty limited in Go. I should preface that by saying I don't know much about generics in Go

lethal helm
#

./prog.go:10:2: embedded field type cannot be a (pointer to a) type parameter

#

Had other issues anyway, e.g. how would the embedded container refer back to the "real" container

fiery grove
#

kinda like this. Upstream calls are semi-passthrough. Downstream uses some query-builder functions expoed by upstream

#

this should shrink down the codegen part to commit quite a bit

#

and local.*Container implements dagger.Container so it can be passed around to outside code

#

downside is codegen complexity (now there's two modes, upstream gen and downstream gen)

#

also in this scenario no need to un-internal querybuilder since downstream would call upstream to add stuff into the query, rather than using querybuilder directly

downside is there needs to be some ugly public facing functions to do so

#

@hollow pagoda @lethal helm side note: what are your thoughts about core-type extension and namespacing?

lethal helm
# fiery grove <@949034677610643507> <@108011715077091328> side note: what are your thoughts ab...

A big part of me wants to wait and see how it plays out, i.e. allow everyone to extend core types, but probably draw the line at allowing modules to extend other module's types. We're doing something pretty novel with how the dependency namespaces interact; my initial instict was "oh that's scary" thinking back to Ruby, but Ruby was a total wild west. Our collision problem reduces to your direct dependencies, even excluding their dependencies. Ruby was one big ball of mud, so collisions were a bigger concern, and even that didn't feel like a day-to-day threat.

fiery grove
#

๐Ÿ‘ yeah agree with the feeling

#

asking now because if we decide not to allow that, it changes completely the codegen problem (e.g. there's only one Container, and it's upstream, nice and easy)

lethal helm
#

Yeah, I've been trying to push the two problems against each other and hope one "wins" ๐Ÿ˜›

hollow pagoda
#

Yeah I was not sure until I asked about it in one of the Zenith Weekly chats and there was a lot of unambiguous excitement and support for it.

In terms of namespacing, while extending core types definitely increases the opportunities to conflict, the conflicts aren't fundamentally different from the conflicts we already needed to deal with and the solution is the same. It's just a matter of prefixing w/ module name (we were just discussing the details some more here: #1161413645075501186 message) .

So I don't feel bad about enabling it at all anymore. I would advocate for even allowing it on any types, including those from other modules, but could be convinced otherwise.

#

One thing I just thought of, doesn't Rust allow you to do this with Traits? Add new methods to existing objects you've directly imported? And I think the abstract idea behind it are "type classes".

Just wondering if there's anything we could learn from them. I'm also going way out on a limb w/ my knowledge here, so I could be spouting nonsense, take with salt.

lethal helm
#

Yeah not really sure how the prefixing should work, in a cross-language way. I don't really like the idea of the codegen just transparently switching from Foo to Bar_Foo or Baz_Foo; I'd almost rather it error and give me some way to disambiguate, since I might not even remember the name of the module that provided the API (which you need immediately for autocomplete)

#

Maybe that's an implementation detail, it's fundamentally the same solution

#

I hadn't thought of using _ for this before, but it's safer than just BarFoo since you can end up with second-order conflicts there. But I don't think we can cheat as easily at the GraphQL layer as we can with the type names. Don't think we'd want all our GraphQL demos to look like container { netlify_deployToNetlify() }, so we probably only want to do the prefixing when necessary, unlike types which we can do all the time

hollow pagoda
# lethal helm Yeah not really sure how the prefixing should work, in a cross-language way. I d...

Yeah error + some way to have the user tell us how to disambiguate is another approach for sure.

Another consideration in this are docs (i.e. go docs). Since we emphasize the consumer-side so heavily, I think we'll want to make it easy for consumers to view docs for their custom client too (as opposed to go.dev where docs are generated based on the "producer"). That was going to be needed even disregarding the namespacing, but definitely becomes more important w/ namespaces. Fortunately I think we have plenty of good options here, even all the way to using services + host proxying to serve docs as webpages on the fly

crude magnet
iron zenith
#

Quick poll on this topic, how do you all feel about not mixing all dependencies into a single GraphQL schema , but instead exposing each dependency in its own GraphQL schema, and allowing IDs from one schema to be used in other schemas? I know @hollow pagoda mentioned it would be "a lot of refactoring work in the client", so wouldn't be cheap. But setting aside feasibility for a minute, would we want to do it? Would it solve our schema multiplexing problems (above) and result in a better DX?

hollow pagoda
iron zenith
hollow pagoda
#

Overall, I think I'm pretty inclined for our very first starting place to be no namespacing and returning an error if there's a conflict (should be the case already today). Then whenever (/if ever) people complain to us, we'll get the data on how often it happens and why the conflicts are arising in the first place.

I think if we try to solve it too early we probably end up less informed in the long run, maybe. But the good part is that once we have data we have a whole bunch of options available, almost all of which I'd be perfectly happy with; much better than having no clues at all ๐Ÿ™‚

I guess the most important thing in the meantime is to just keep our options open and make sure we don't take any one-way doors that rule out potential solutions if/when we get there.

iron zenith
#

I don't see any scenario where that doesn't become dll hell..

#

And I worry it's not a clean two-way door: once modules are out there, and we start seeing conflicts (IMO inevitable) it will break everyone to fix it

lethal helm
#

Is DLL hell the same as Ruby's ball-of-mud namespacing? If so, I don't think the comparison does justice to what we've actually built (see #1161333209594855464 message)

#

Ruby's "ball of mud" was the fact that everything existed in a global namespace, with "modules" (and classes) really acting more like named fence-posts that anyone could jump over freely at any time, so any Gem you loaded could add methods to any other class, or to Object, etc. - which is precisely not what we have, in fact you could argue we namespace relatively aggressively to avoid that. Instead each module has its own 'namespace' formed via the union of its direct module dependencies, which don't leak their transitive dependencies, instead each dependency has its own view, and so on, and so on. So the likelihood of collision is waaay smaller than you'd find in a lot of other languages.

iron zenith
#

I'm just talking about a very basic scenario like:

  • I import my supergit module, which has a type Repository
  • I import, say a Debian module, which also has a type called Repository
  • My module is broken because I now have two different types with the same name
#

This will happen all the time

#

I think of it as completely separate from monkeypatching. I no longer feel the need to monkeypatch core types (or other module types) anymore, I just create my own types and wrap as needed.

The only practical use of monkeypatching right now is to make it easier to pass directories and containers as arguments from a raw graphql query, while we wait for dagger call etc

lethal helm
#

Oh, yeah that part 100% needs to be fixed, but I don't think it needs to be super complicated

iron zenith
#

Sure, make it SupergitRepository and DebianRepository and I'm happy ๐Ÿ™‚

(My naive guess as how we would do it)

lethal helm
#

Probably something like Supergit_Repository and Debian_Repository, since the _ keeps us safe (there's no way to put that in a module/type name, and it's the only GraphQL-valid char available)

iron zenith
#

Right.

lethal helm
#

The nice thing is you never see type names in GraphQL queries - so it's mostly out-of-sight

hollow pagoda
# iron zenith Sure, make it `SupergitRepository` and `DebianRepository` and I'm happy ๐Ÿ™‚ (My ...

There's some more discussion from yesterday on the details of this approach here https://discord.com/channels/707636530424053791/1161413645075501186

I agree that prefixing is the gist of all the possible solutions. And I think it almost certainly makes sense to just always prefix on the level of raw graphql schema.

Only question mark remaining for me is for the level of SDK client codegen; do we always do the prefixing by default? Do we only prefix when there's a conflict? Do we make this controllable by the user, e.g. dagger mod use --prefix <optional prefix, defaults to module name>? etc.

All of which ultimately is potentially a per-sdk decision too. Those are the questions where I'm not sure we can have an informed answer until we have more data. If conflicts are super common, then always prefixing begins to make more sense in my mind. If it's rarer, then I'd become more biased towards dagger mod use --prefix type solutions.

iron zenith
#

otherwise things change too much based on adding a dependency. Too much surprise.

hollow pagoda
#

I agree the potential surprise is the tradeoff. I think the downsides of always doing it are increased verbosity when writing code and potentially awkward resulting names.

One example I can think of right away would be creating an object named PyPIRepository in a module named python, which for consumers would result in an object named PythonPyPIRepository, which starts to feel like one of those super verbose autogenerated names you see in Java all the time.

And w/ monkeypatching, a module named Foo that adds a function WithBar to Container would result in a method called FooWithBar, which is going to sound very weird in many cases.

That's why I'm still undecided on whether it should be done by default or only when conflicts are actually detected. I just wouldn't want to make everything way more verbose + ugly for everyone in order to handle a case that may or may not end up being relatively rare. If however it turns out to happen all the time then I think the surprise factor takes over in importance and we should probably just always prefix.