#withExec, module and engineVersion


tribal vault
#

Context:

  • To discover the types exposed by a module, the SDKs provide a Runtime container. This container has an entrypoint designed to invoke functions (like the function invoked by a dagger call ...). As a special case, if the function name is empty, it returns the type definitions.
  • Those type definitions are built through API calls, like dag.Module().WithEnum(dag.TypeDef().WithEnum().WithEnumMember())
    • This call in particular behaves differently depending on the engineVersion defined for the module
    • For instance, before v0.18.11, WithEnumMember falls back to WithEnumValue
  • When dagger functions is called, the Runtime container is created and invoked with an empty function name, which returns the type definitions. Those calls (like the dag.Module().... seen above) are made against a schema aware of the module's version: if the module version is v0.18.10, WithEnumMember falls back to WithEnumValue as expected.
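
The version-gated fallback above boils down to a version comparison. Here is a minimal, hypothetical Go model of it (not the engine's actual code): the helper names are invented, and it assumes plain vMAJOR.MINOR.PATCH strings and the v0.18.11 cutoff mentioned above.

```go
package main

import "fmt"

// versionBefore reports whether v (e.g. "v0.18.10") is strictly older than
// cutoff. Hypothetical helper: assumes the "vMAJOR.MINOR.PATCH" form.
func versionBefore(v string, cutoff [3]int) (bool, error) {
	var got [3]int
	if _, err := fmt.Sscanf(v, "v%d.%d.%d", &got[0], &got[1], &got[2]); err != nil {
		return false, fmt.Errorf("parse version %q: %w", v, err)
	}
	for i := range got {
		if got[i] != cutoff[i] {
			return got[i] < cutoff[i], nil
		}
	}
	return false, nil // equal versions are not "before"
}

// enumMemberField returns the schema field that effectively serves a
// WithEnumMember call for a module pinned to engineVersion (hypothetical model).
func enumMemberField(engineVersion string) (string, error) {
	old, err := versionBefore(engineVersion, [3]int{0, 18, 11})
	if err != nil {
		return "", err
	}
	if old {
		return "WithEnumValue", nil // legacy behavior: fall back
	}
	return "WithEnumMember", nil
}

func main() {
	for _, v := range []string{"v0.18.10", "v0.18.11", "v0.19.0"} {
		field, err := enumMemberField(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", v, field)
	}
}
```

The important point is the input: the decision depends on whichever engineVersion the schema was configured with, which is exactly what goes wrong below.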
#

Changes:

  • The type definitions of a module have been extracted into their own function, ModuleDefs, which directly returns a dagger.Module instance representing the types instead of a container
    • The indirection container -> call a function through the container's entrypoint -> get the module from the result is gone
    • This is now a direct call through the SDK interface in the engine
  • Currently, it appears the dag.Module().WithEnum(...) call is made against a schema configured with the SDK's version rather than the module's version
    • This means that if the module sets engineVersion v0.18.10 to keep the old enum behavior and the SDK version is newer, the schema will not fall back to WithEnumValue
#

How is it supposed to work:

  • in both situations, the code that exposes the types is executed through a WithExec with experimentalPrivilegedNesting set, so that the connection to the Dagger engine is propagated
  • when a call is made from a module, the module has already been loaded, and the engineVersion defined in its dagger.json is used to configure the schema
  • but in our specific case, the module is not loaded yet: type registration is one of the steps of loading the module, since we need the types it exposes to load it correctly
  • so when the module's SDK performs a withExec to access the types, the client uses the version defined in the SDK's dagger.json, and this causes the issue
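
The resolution rule above can be modeled in a few lines. This is a hypothetical sketch, not the engine's code: nestedClient and schemaVersion are invented names, and the SDK version v0.19.0 is just an example value.

```go
package main

import "fmt"

// nestedClient is a hypothetical summary of what the engine knows about a
// client connecting through experimentalPrivilegedNesting.
type nestedClient struct {
	SDKEngineVersion    string // engineVersion from the SDK's dagger.json
	ModuleEngineVersion string // from the user module's dagger.json; "" while the module is not loaded yet
}

// schemaVersion models the rule described above: the module's engineVersion
// wins once the module is loaded; otherwise the SDK's version is used.
func schemaVersion(c nestedClient) string {
	if c.ModuleEngineVersion != "" {
		return c.ModuleEngineVersion
	}
	return c.SDKEngineVersion
}

func main() {
	loaded := nestedClient{SDKEngineVersion: "v0.19.0", ModuleEngineVersion: "v0.18.10"}
	loading := nestedClient{SDKEngineVersion: "v0.19.0"} // type registration happens in this state
	fmt.Println(schemaVersion(loaded))  // v0.18.10
	fmt.Println(schemaVersion(loading)) // v0.19.0: the SDK's version leaks into type registration
}
```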
#

But how did this work with the previous version, using the container runtime?

  • the SDK returns a Runtime container holding the user module's code and all the generated code needed to invoke functions (and get types)
  • this is not a service, so the entrypoint is not run that way
  • a withExec with no args is performed, with useEntrypoint set to true -> this lets the container describe its command through an entrypoint (which makes sense) while still executing it like any other command
  • some extra execMD metadata are also part of the dance!
    • execMD? What's that? It's not part of the API 🤔
    • Well, sort of. It is a valid argument we can add when using Select, but indeed it is not exposed through the API.
    • It is in fact BuildKit execution metadata
    • one of these metadata fields, set when the function is called on the container runtime, is EncodedModuleID
      • when this ID is used to load the corresponding module, it also gives access to the engineVersion defined in the module
      • the client session reads it, loads the module, reads the engine version and configures the API schema accordingly
      • -> the call is made against the right schema version
#

And, of course, this extra execMD dance is missing from the new, container-less approach.

So, how do we fix that? One difficulty is that the withExec to which we'd like to add the same execMD metadata is now defined by each SDK. Using the entrypoint to define the command to execute is a very nice trick: it lets the engine inject that metadata even though the SDK never sets it (and cannot, since it is impossible to set from the API). The SDK should not even be aware of it; these are really internal details.

I explored different ways to fix that, without much success. One option is to use the exact same trick: return a container instead of a dagger.Module directly, so that the engine runs the container's entrypoint with WithExec and injects the right execMD at the same time. I can't see any reason why that wouldn't work.
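
The engine side of that trick can be sketched like this. All types are hypothetical stand-ins; execArgs models the documented useEntrypoint behavior of withExec (prepend the container's entrypoint to the args), and "/runtime" is an invented entrypoint path.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in: only the entrypoint matters here.
type container struct {
	Entrypoint []string
}

// execMD is a hypothetical stand-in for the engine-only exec metadata.
type execMD struct {
	EncodedModuleID string
}

// execRequest is what the engine would hand to the executor in this sketch.
type execRequest struct {
	Args []string
	Meta execMD
}

// execArgs models withExec's useEntrypoint option: when true, the
// container's entrypoint is prepended to the given args.
func execArgs(ctr container, args []string, useEntrypoint bool) []string {
	out := []string{}
	if useEntrypoint {
		out = append(out, ctr.Entrypoint...)
	}
	return append(out, args...)
}

// typeDefsExec models the trick: the SDK only returns a container, and the
// engine builds the exec (no args, useEntrypoint=true) and attaches metadata
// the SDK cannot set through the API.
func typeDefsExec(ctr container, encodedModuleID string) (execRequest, error) {
	args := execArgs(ctr, nil, true)
	if len(args) == 0 {
		return execRequest{}, errors.New("runtime container has no entrypoint")
	}
	return execRequest{Args: args, Meta: execMD{EncodedModuleID: encodedModuleID}}, nil
}

func main() {
	runtime := container{Entrypoint: []string{"/runtime"}} // hypothetical entrypoint path
	req, err := typeDefsExec(runtime, "encoded-module-id")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Args, req.Meta.EncodedModuleID) // [/runtime] encoded-module-id
}
```

The SDK never touches execMD here: it only decides what the entrypoint is, which is why the metadata can stay an internal detail.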

The main drawback I see is that the new SDK interface expects a function returning a dagger.Module: when we want a dagger.Module, asking for one rather than a generic dagger.Container makes more sense, since it better expresses the intent and the requirements. Returning a container can work, but I'd love to avoid it.

Another solution, partially implemented at some point, is to keep a dedicated signature but return another representation of the types, which the engine then translates into a dagger.Module. That way the engine fully controls version management. But it's more work and more complexity, and the benefit is not very clear to me.
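
A rough sketch of that alternative, under loud assumptions: enumDef is an invented version-neutral representation, and translate only records which field the engine would call rather than calling the real API. It shows the engine reading the module's engineVersion itself, so the SDK's version never matters.

```go
package main

import "fmt"

// enumDef is a hypothetical, version-neutral description of one enum that
// an SDK could return instead of a dagger.Module.
type enumDef struct {
	Name    string
	Members []string
}

// call records one registration call the engine would issue.
type call struct {
	Field, Enum, Member string
}

// versionBefore reports whether v ("vMAJOR.MINOR.PATCH") is older than cutoff.
func versionBefore(v string, cutoff [3]int) (bool, error) {
	var got [3]int
	if _, err := fmt.Sscanf(v, "v%d.%d.%d", &got[0], &got[1], &got[2]); err != nil {
		return false, fmt.Errorf("parse version %q: %w", v, err)
	}
	for i := range got {
		if got[i] != cutoff[i] {
			return got[i] < cutoff[i], nil
		}
	}
	return false, nil
}

// translate models the engine owning version management: it uses the
// module's engineVersion to pick the field for every enum member.
func translate(defs []enumDef, moduleEngineVersion string) ([]call, error) {
	legacy, err := versionBefore(moduleEngineVersion, [3]int{0, 18, 11})
	if err != nil {
		return nil, err
	}
	field := "WithEnumMember"
	if legacy {
		field = "WithEnumValue"
	}
	var calls []call
	for _, e := range defs {
		for _, m := range e.Members {
			calls = append(calls, call{Field: field, Enum: e.Name, Member: m})
		}
	}
	return calls, nil
}

func main() {
	defs := []enumDef{{Name: "Status", Members: []string{"ACTIVE", "INACTIVE"}}}
	calls, err := translate(defs, "v0.18.10")
	if err != nil {
		panic(err)
	}
	for _, c := range calls {
		fmt.Printf("%s.%s(%s)\n", c.Enum, c.Field, c.Member)
	}
}
```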

#

That said, I'm all ears if you have any ideas, or any comments on what I wrote: some aspects may be imprecise or wrong.

finite coral
#

@tribal vault do you discuss these things with an LLM?

#

if not, you should

tribal vault
#

yes, I started to discuss that with Claude Code.
I had already figured out most of it, but it helps me understand some aspects

#

I tried to use it to fix the issue, but it didn't go very well: given the complexity involved, the impact of the changes it suggested was hard to understand

#

For instance, it never proposed going back to a container, which for now would be my choice, as I think it would work without many changes

finite coral
#

cc <@&946480760016207902>

bronze vector
tribal vault
#

The "container entrypoint - with-exec" dance 👯 did the trick!
It's now working as expected.
I'm still not super happy with the result: I wanted to avoid returning a container, as I think it's not a good API, not expressive enough. Asking for a container means any container can be returned, and there is no real way to validate that it's the container you want. Asking for a dagger.Module is already far more constraining.
That said, the container solution works and tests are passing as expected 🤷