#`moduleTypes`, self calls, codegen, `dagger generate` and commit generated files


olive bobcat
#

I have an issue with moduleTypes and I'm not sure how to solve it.

First, some history: moduleTypes was extracted when I worked on self calls. The way self calls work is fairly simple: the module itself is present in the type schema returned by the engine, so we can generate the necessary code. Before moduleTypes, we invoked the module with an empty function name and it responded with its types. Since the module needs those types for codegen, moduleTypes was extracted as a separate phase.
One nice side effect is that dagger functions doesn't need to execute the module at all, it only needs the types.
moduleTypes only depends on base types, since a module can't expose types from a dependency.

Outside of self calls, it gives cleaner phase separation, but it's not critical.

Now, the problem: I'm working on replacing dagger develop with dagger generate. The idea is to commit generated files, moving all that logic from runtime to a dev-time phase. The runtime shouldn't need any codegen at all.
But the generated code only contains the invocation boilerplate; it no longer includes the moduleTypes part. So even with generated code present, the module can't answer dagger functions. It also can't serve the types needed to generate self-call bindings, which makes the generation somewhat useless.

My idea is to also generate moduleTypes. During generation we'd have something like:

  • analyze module source code
  • generate type defs (moduleTypes) with local persistence
  • send types to the engine
  • generate bindings and invocation boilerplate (the entrypoint), using the schema with dependencies (and with the analyzed types if self calls are enabled)
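A minimal sketch of the last step above, assuming a simplified model: `TypeDef`, `Schema`, and `bindingSchema` are hypothetical names for illustration, not Dagger APIs. It only shows that the binding generator always sees the dependency schema, and sees the module's own analyzed types only when self calls are enabled.

```go
package main

import "fmt"

// TypeDef is a hypothetical, simplified stand-in for one object a module exposes.
type TypeDef struct {
	Name      string
	Functions []string
}

// Schema is a hypothetical flat list of the types bindings are generated from.
type Schema []TypeDef

// bindingSchema returns the types the binding generator would see:
// dependency (and base) types always, plus the module's own analyzed
// types only when self calls are enabled.
func bindingSchema(deps Schema, own []TypeDef, selfCalls bool) Schema {
	out := append(Schema{}, deps...) // copy so the caller's slice is not mutated
	if selfCalls {
		out = append(out, own...)
	}
	return out
}

func main() {
	// Result of the "analyze module source code" step (illustrative values).
	own := []TypeDef{{Name: "MyModule", Functions: []string{"ContainerEcho", "GrepDir"}}}
	deps := Schema{{Name: "Container"}, {Name: "Directory"}}

	fmt.Println(len(bindingSchema(deps, own, false))) // without self calls
	fmt.Println(len(bindingSchema(deps, own, true)))  // with self calls
}
```

With self calls disabled, the analyzed types are still produced and persisted; they are just not fed back into binding generation.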

This also means the engine needs a way to query the module for its types, a bit like the legacy empty-function-name approach. But what I'd like is to get the types without any build step.

The benefits: we remove all codegen from runtime. And the cost of self calls becomes negligible 🎉, since it only exists at dagger generate time. If that's the case, we could imagine taking self calls out of experimental and enabling them by default. Performance was a blocker.

My question is: I'm not sure how to persist this moduleTypes result. If we could describe it as a single query, that would help a lot, but today that's not the case. Maybe it's something we can make work. Or other ideas?

Any ideas or feedback welcome

#

cc @uncut wraith @brisk loom @granite sleet @sly reef

sly reef
olive bobcat
#

no problem, I have a hard time explaining my problem 😅
So, my issue shows up once we remove all codegen from the runtime.
Whether it's develop or generate doesn't matter, in fact. But let's say we generate the files of a Go module and commit them. And then we want to list functions and call functions from this module.
The generated files contain the entrypoint, something like this:

func invoke(ctx context.Context, parentJSON []byte, parentName string, fnName string, inputArgs map[string][]byte) (_ any, err error) {
        _ = inputArgs
        switch parentName {
        case "MyModule":
                switch fnName {
                case "ContainerEcho":
                        // ...
                        return (*MyModule).ContainerEcho(&parent, stringArg), nil
                case "GrepDir":
                        // ...
                        return (*MyModule).GrepDir(&parent, ctx, directoryArg, pattern)
                default:
                        return nil, fmt.Errorf("unknown function %s", fnName)
                }
        default:
                return nil, fmt.Errorf("unknown object %s", parentName)
        }
}

With that, the SDK runtime can just grab those files, pull the dependencies and build. No more introspection, codegen, etc. And we can dagger call FUNCTION because we just use the entrypoint.
But when we want to run dagger functions for instance, or dagger check, etc, we must have access to the different types exposed by the module.
Before self calls, the problem didn't exist because the entrypoint was used to return the types exposed by the module. If that were still the case, it would be easy.
But that was removed for self calls, because the exposed types are required before the codegen.
And getting those types means introspecting the code, using the SDK to create the module (containing the types), etc.
All that is a bit like a second, more limited, entrypoint/runtime execution, and the corresponding code is never exported to the host, not committed.
One of the ideas I have is to keep it like that, but export the moduleTypes result to the host. During a generate it's created, then exported with the rest of the code. Then when the engine needs it, it's already there: no more introspection / codegen.

...
rubberduck
...

I think I just got an idea 🤔
Maybe I can bring back the entrypoint branch with the empty function name, but in a way that works with self calls. That could be nice, and even help to debug this aspect. Not 100% sure it can work, but I can investigate that
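A rough sketch of what that branch could look like, assuming the type table is emitted as a static value at generate time. `ObjectDef`, `moduleTypes`, and `dispatch` are hypothetical names, and the function bodies are placeholders; whether this shape actually works with self calls is the open question.

```go
package main

import "fmt"

// ObjectDef is a hypothetical, ID-free description of an exposed object.
type ObjectDef struct {
	Name      string
	Functions []string
}

// moduleTypes is assumed to be written out at generate time, so serving
// it requires no introspection or codegen at runtime.
var moduleTypes = []ObjectDef{
	{Name: "MyModule", Functions: []string{"ContainerEcho", "GrepDir"}},
}

// dispatch mirrors the shape of the generated invoke function, with the
// legacy convention added back: an empty function name returns the types.
func dispatch(parentName, fnName string) (any, error) {
	if fnName == "" {
		return moduleTypes, nil
	}
	switch parentName {
	case "MyModule":
		switch fnName {
		case "ContainerEcho":
			return "echo result (placeholder)", nil
		case "GrepDir":
			return "grep result (placeholder)", nil
		default:
			return nil, fmt.Errorf("unknown function %s", fnName)
		}
	default:
		return nil, fmt.Errorf("unknown object %s", parentName)
	}
}

func main() {
	types, err := dispatch("", "")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", types)
}
```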

sly reef
#

All that is a bit like a second, more limited, entrypoint/runtime execution, and the corresponding code is never exported to the host, not committed.

That's the part I don't understand. Why is that part not exported to the host? Isn't it part of codegen?

#

In my mind, when you split out the moduleTypes function, you split the runtime interface: instead of exposing a single entrypoint (dispatch), a runtime now has to expose 2 entrypoints (dispatch and moduleTypes). So when an SDK generates a runtime, it now generates a runtime with 2 entrypoints. So whether that 2-entrypoint runtime is generated at runtime or not, what's the difference? What's special about that 2nd entrypoint?

#

Ah is it more like 2 different runtimes, each with their own entrypoint - to avoid the overhead of building one big runtime? (Because moduleTypes requires a much smaller runtime?)

#

Or, should the second entrypoint just be literally a graphql schema?

brisk loom
#

Yeah, can’t you just generate a graphql schema of the types and export it on the host?

sly reef
#

But those schema files would have to be exposed via a standard runtime interface, which at the moment is not defined

sly reef
#

OK @olive bobcat I'm starting to understand. It's not that dagger functions breaks, it's just that it's still doing codegen at runtime. So at the moment only half the problem is solved: dagger call doesn't do codegen anymore, but dagger functions (and dagger call --help, and dagger check -l...) still do

sly reef
#

Digging a little deeper...

It looks like moduleTypes() already returns a Container... With a weird runtime contract but I'm sure there are excellent legacy reasons for the weirdness 🙂

In any case, if there's a container, that means there's an entrypoint... And if there's an entrypoint, there's source code somewhere - probably generated. So why not generate that moduleTypes entrypoint source code, and commit that?

sly reef
sly reef
olive bobcat
#

Yes, I know the issue is a bit weird, and I have a hard time explaining it clearly 😅
I can explain why this is a container, like the runtime (TL;DR: it's mostly because of the way we deal with the engine version)
I'll have a look at your prototype. I also prototyped something in that spirit that stored the module def in dagger.json, but it wasn't that nice. Just a quick experiment.
The first version of moduleTypes wrote JSON representing the types. But at that time we had a lot of discussions about whether this would cause more pain, since it means a second kind of serialization for something the API already covers. But maybe I'll go back to that. Especially since it means we don't need the API to get the types: we are not making calls, we just analyse the source code and generate a representation of the signatures. I'd love to have that in the API directly, but I'm not sure that's doable (because of references).

sly reef
olive bobcat
#

@brisk loom What do you think about that approach? This reminds me a lot of the JSON attempt I made at the beginning of moduleTypes/self calls (but more generic). Maybe it's the right time to do that again.

brisk loom
#

So if I understand well, we would have:

  • Introspection of the modules to extract types
  • Store the result as a JSON artifact (or even better if it's a GQL schema)
  • Have a way in the engine to load that artifact as a module so we can answer dagger functions

Am I right, or is there something missing?

olive bobcat
#

I think that's the main idea. The goal for me is that when a dagger call happens, there's no codegen at all.
For that, dagger generate (the develop phase) must perform both moduleTypes and codegen, and the result of both must be exported.
That way dagger functions will be way quicker, since it only has to read and load a representation of the module, even if there's no cache.
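A minimal sketch of that "read and load a representation" path, assuming the types are persisted as a plain JSON file next to the generated code. The file name `module-types.json`, the struct layout, and the helper names are all hypothetical; nothing here calls the Dagger API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// FunctionDef and ObjectDef are hypothetical, ID-free type descriptions.
type FunctionDef struct {
	Name       string `json:"name"`
	ReturnType string `json:"returnType"`
}

type ObjectDef struct {
	Name      string        `json:"name"`
	Functions []FunctionDef `json:"functions"`
}

// saveTypes is what the generate phase would do: write the analyzed types to disk.
func saveTypes(path string, defs []ObjectDef) error {
	data, err := json.MarshalIndent(defs, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// loadTypes is what a later `dagger functions` could do: read the file back,
// with no introspection and no codegen.
func loadTypes(path string) ([]ObjectDef, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var defs []ObjectDef
	if err := json.Unmarshal(data, &defs); err != nil {
		return nil, err
	}
	return defs, nil
}

// roundTrip saves defs into a temporary directory and loads them back.
func roundTrip(defs []ObjectDef) ([]ObjectDef, error) {
	dir, err := os.MkdirTemp("", "moduletypes")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "module-types.json") // hypothetical file name
	if err := saveTypes(path, defs); err != nil {
		return nil, err
	}
	return loadTypes(path)
}

func main() {
	defs := []ObjectDef{{
		Name: "MyModule",
		Functions: []FunctionDef{
			{Name: "ContainerEcho", ReturnType: "Container"}, // return types are illustrative
			{Name: "GrepDir", ReturnType: "String"},
		},
	}}
	loaded, err := roundTrip(defs)
	if err != nil {
		panic(err)
	}
	fmt.Println(loaded[0].Name, len(loaded[0].Functions))
}
```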

The persistence above, even if scoped to moduleTypes for now, is more general. It's "allow storing and loading a representation of an object". So that means without IDs. In that way it's closer to the JSON representation I did at the start.
But my fear is that we'd be introducing a second way to serialize objects. Maybe that's not a problem, but it's a bit more complexity.
GQL schema, maybe it's possible, but I guess we will not do that for all types, it's more like a new type defs api. no?

brisk loom
#

A totally different idea, but can't we generate another entrypoint specifically to return these moduleTypes?

#

GQL schema, maybe it's possible, but I guess we will not do that for all types, it's more like a new type defs api. no?
If we only need an artifact for typedefs, I think a GQL schema is enough, no?
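A small sketch of what a typedef-only GraphQL artifact could look like, assuming the analyzed types are rendered as schema definition language text. `ObjectDef`, `FieldDef`, `ArgDef`, and `renderSDL` are hypothetical names, and the field name, argument, and return type shown are illustrative, not taken from a real module.

```go
package main

import (
	"fmt"
	"strings"
)

// ArgDef, FieldDef and ObjectDef are hypothetical, simplified type descriptions.
type ArgDef struct{ Name, Type string }

type FieldDef struct {
	Name string
	Args []ArgDef
	Type string
}

type ObjectDef struct {
	Name   string
	Fields []FieldDef
}

// renderSDL writes each object as a GraphQL SDL `type` block,
// one field per line, with arguments in parentheses when present.
func renderSDL(defs []ObjectDef) string {
	var b strings.Builder
	for _, d := range defs {
		b.WriteString("type " + d.Name + " {\n")
		for _, f := range d.Fields {
			b.WriteString("  " + f.Name)
			if len(f.Args) > 0 {
				parts := make([]string, 0, len(f.Args))
				for _, a := range f.Args {
					parts = append(parts, a.Name+": "+a.Type)
				}
				b.WriteString("(" + strings.Join(parts, ", ") + ")")
			}
			b.WriteString(": " + f.Type + "\n")
		}
		b.WriteString("}\n")
	}
	return b.String()
}

func main() {
	defs := []ObjectDef{{
		Name: "MyModule",
		Fields: []FieldDef{{
			Name: "containerEcho",
			Args: []ArgDef{{Name: "stringArg", Type: "String!"}},
			Type: "Container!", // illustrative return type
		}},
	}}
	fmt.Print(renderSDL(defs))
}
```

The engine would still need a defined way to load such a file as a module, which is the missing runtime interface mentioned earlier in the thread.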

sly reef