#Persistent Env & default secrets
1 messages ยท Page 1 of 1 (latest)
cc @twilit venture @whole imp I know you guys are busy so I'm going to try a first pass on my own, with a focus on persisting secrets as the killer feature... I'm not going to lie, I'm not 100% sure where to start ๐
(this is about https://github.com/dagger/dagger/discussions/10370 )
Persistent Env & default secrets
@whole imp the context of my question re: object IDs in default arg, is for my POC of default secrets
(cc @quick birch since I know you're interested in that)
@whole imp I was thinking about attaching the env-specific default to the schema as a default value... But if it pollutes the schema too much, I can also remove the argument from the schema, and inject it on invocation. It's less transparent (the client can't see the argument at all, and can't override it) but maybe in practice it doesn't matter
One interesting side effect in both cases is that different schema means different codegen...
So technically each change to .env.dagger.json * requires re-generating the client
๐ hm
that doesn't seem great
It also means that, if you have multiple environments that can be selected at runtime, you may have multiple possible schemas, so at any given time your local environment might only be compatible with one of the environments
It makes perfect sense when you think about it - we made configuration code. So it follows that configuration overrides also affect that code
mmm, i think it's confusing, especially if you imagine platform modules - if i modify my env config, does that affect the code that the module sees? (which is in a different repo)
It's still strictly better than non-code overrides. At least it's possible to generate a client for any override, and catch errors at build time
Oh wait! Forget everything I just said...
we also take a lot of care to avoid rebuilding modules where neccessary - if you have one module used many times, we only need to build it once. if all of those have different config, they'd all need to rebuild it
which feels like taking a huge performance hit
None of this applies for env overrides on the top-level module because we're overriding command-line args
It only becomes an issue if we allow overriding constructor arguments for dependencies which I know you were already worried about, and I wasn't particularly focusing on for now
right
is there a reason that we need to be modifying the schema itself? or does it make sense to provide an alternative mechansim?
It just seemed easiest and simplest. If I can inject the default value using the existing mechanism for default values, I'm done. I don't need to hook a middleware into execution for example
it also seemed like a plus that it's introspectable by the client
My plan was to try that first, and if I run into obstacles that make it impractical, look for alternatives
Beyond that, I'm not particularly attached to that approach vs. another
@whole imp how would I go from a core.SecretID to a core.JSON for that default value? Should I explicitly json encode it?
Side note: it makes me sad that the way we handled default directories seems so full of special cases all the way down. Function.WithArg has arguments like defaultValue but also defaultPath, ignore...
i raised this during the initial implementation https://github.com/dagger/dagger/pull/7744#discussion_r1668416694
json.Encode should just work
If I click that link, will I find a message of myself contradicting what I just said? ๐
I hate when that happens
Oh no that's different, I remember we had to make difficult choices on the dev-facing APIs (ie. what's in a pragma vs. a client-side default value). But I had no opinion (or visibility) in how it worked in the lower layers
What's sad (but maybe was unavoidable) is that we special-cased it all the way down in the engine internals
i mean it's fixable
there's no current interpretation of defaultValue for File/Directory
we can just deprecate defaultPath in the api, and reroute it to defaultValue - if we want
But defaultValue for directory would have to be an ID, and at the moment you can't construct an ID that reads files from the context right?
we could change how default values are interpreted there
atm, i think they map to ids and just json unmarshalling
but, we can change that
we already have to do some dancing to make this work for enums
Baby steps ๐ https://github.com/dagger/dagger/pull/10697
@whole imp do you have an opinion on the design for that first POC?
I need to decide if I take this POC and flesh it out, or if I should prototype a different approach instead
yeah, i'm confused because i don't see how it's actually implementing the proposal in 10370
i thought the point was to avoid reading from env vars
Ignore the env://... that's just a hardcoded value for the POC. Initially I hardcoded it to op://fooo/bar go read a secret from my 1password vault, but then it didn't work when doing dev | terminal because op isn't installed
The arg name and value will be read from a user-supplied file, .env or .env.dagger.json or something. I'm just not including that in the POC for now
Focusing on the means of injection instead
But, it shows a big ugly ID in usage message (see my descripton in the PR)
i would say, i'm curious about @full tapir's and @cerulean lintel's opinions on the modification of the schema like this
it feels weird to me, but i'm having difficulty understanding the implications
The alternative is a shadow schema, where what you see in the schema is not 100% correct
(ie you see a required argument "token", but actually you don't need to specify it)
My guess is that's probably worse
well if it's only for cli/shell + such, they don't use the graphql schema anyways
they use the typedefs api
Oh yeah I'm not changing the graphql schema directly, just using the typedef api
Was using "schema" in a loose sense
This is the entire patch:
--- a/engine/server/session.go
+++ b/engine/server/session.go
@@ -11,6 +11,7 @@ import (
"runtime"
"runtime/debug"
"slices"
+ "strings"
"sync"
"time"
@@ -1241,6 +1242,44 @@ func (srv *Server) ServeModule(ctx context.Context, mod *core.Module, includeDep
client.stateMu.Lock()
defer client.stateMu.Unlock()
+ // FIXME: POC for injecting env-specific argument overrides, only in the top-level
+ // Find the eponym object (object with same name as module)
+ err = func() error {
+ slog.Info("hooking into main module install", "module", mod.Name(), "module_bis", mod.OriginalName)
+ var mainObj *core.ObjectTypeDef
+ for _, typeDef := range mod.ObjectDefs {
+ if typeDef.AsObject.Valid {
+ objDef := typeDef.AsObject.Value
+ if strings.ToLower(objDef.OriginalName) == strings.ToLower(mod.OriginalName) {
+ mainObj = objDef
+ break
+ }
+ }
+ }
+
+ if mainObj == nil {
+ return fmt.Errorf("no main object found") // No eponym object found
+ }
+
+ // Check if it has a constructor
+ if !mainObj.Constructor.Valid {
+ slog.Info("main module has no constructor: %s", mod.Name())
+ return nil // No constructor
+ }
+
+ slog.Info("main module has a constructor: %s", mod.Name())
+
+ constructor := mainObj.Constructor.Value
+ dag, err := srv.Server(ctx)
+ if err != nil {
+ return err
+ }
+ return injectDefaultSecret(ctx, dag, constructor, "token", "env://TOKEN")
+ }()
+ if err != nil {
+ return fmt.Errorf("env override: %s", err.Error())
+ }
+
err = srv.serveModule(client, mod)
if err != nil {
return err
@@ -1256,6 +1295,34 @@ func (srv *Server) ServeModule(ctx context.Context, mod *core.Module, includeDep
return nil
}
+func injectDefaultSecret(ctx context.Context, srv *dagql.Server, fn *core.Function, argName, argValue string) error {
+ slog.Info("injecting secret construvtor arg %s with default value %s", argName, argValue)
+ for _, arg := range fn.Args {
+ if arg.OriginalName == argName {
+ var secretID core.SecretID
+ if err := srv.Select(ctx, srv.Root(), &secretID,
+ dagql.Selector{
+ Field: "secret",
+ Args: []dagql.NamedInput{{Name: "uri", Value: dagql.NewString(argValue)}},
+ },
+ dagql.Selector{
+ Field: "id",
+ },
+ ); err != nil {
+ return err
+ }
+ secretIDJSON, err := json.Marshal(secretID)
+ if err != nil {
+ return err
+ }
+ arg.DefaultValue = secretIDJSON
+ slog.Info("injection complete. Default argument = %s", secretIDJSON)
+ return nil
+ }
+ }
+ return fmt.Errorf("Inject default secret: no such argument: %s", argName)
+}
+
// not threadsafe, client.stateMu must be held when calling
func (srv *Server) serveModule(client *daggerClient, mod core.Mod) error {
// don't add the same module twice
i can read the pr ๐
i think if the typedefs modification is kept in ServeModule it should be fine, but we should avoid it spiraling out everywhere. it looks like the code is somewhere missing a clone though (i think as is, it'll actually modify every instance of that module in the graph?)
We could make it a CLI-only feature (there is some logic to that, since that's where the arg parsing. logic already lives, would avoid duplication). So .env would be used like a CLI argument overlay. But it feels too shallow, feels kind of wrong for the engine to not be aware at all
But maybe we can start in the CLI, then move it to the engine along with the rest of the CLI logic to be moved engine-side as part of "evolution of env"?
to me, it feels very much tied into shell+call+how we parse args
so, i think it makes sense to have that live where those live - we should move it all engine side eventually, but i think it's a layer of abstraction above the raw typedefs + graphql schema
i'm aware this is a really annoying point for me to take, but it feels weird to mix user-provided info and module-provided info into the same schema - for one, it means that module loading cache is now not only just influenced by the module, it's influenced by the user config as well.
it feels like this is a calling convention on top of graphql, and it's worth keeping the distinction between module defaults, and defaults from the config
It's just that when the users use the CLI for introspection, they won't be getting the truth. We'll have to special case .help and other CLI introspection features to handle env overlays... Whereas if it's in the schema/typedefs, we get that for free
In a very real sense, we will be implementing a second layer of "meta-typedef"
yeah, but isn't that what it is? it is a seaprate feature, those defaults aren't defined by the module.
we have to special case introspection in some way at least, so that we don't get huge ids showing up in the typedefs help
But in the end it's the same user who will have to merge those 2 layers of defaults to actually know what arguments are required and which are optional in the CLI
We could handle those IDs in the engine
We could also remove the argument altogether for that session, instead of setting its default value
(although then end users would lose the ability to override in the CLI)
In the end what you're introspecting is an API endpoint that is influenced by a bunch of factors, including but not limited to a module. You're not literally introspecting the module, you're introspecting the API endpoint the engine decided to show you.
It doesn't shock me that the engine would take into account the contents of .env.dagger.json alongside its other various inputs, when deciding what endpoint to show you
I'll try the CLI-only version and see how it feels. It's a plausible option, we would just split up the change in 2 phases
But I can already tell it's going to be annoying having to add custom introspection in the CLI to show the user a "special kind of default, basically the same but different!"
not looking forward to that
potential idea:
- keep the modified typedefs, but instead of overwriting a
defaultValue, have a new field calledconfigValueor something. the cli could use that to determine an optional arg (but not display the value there)
the cli would still have to provide it in it's graphql request, the engine wouldn't auto fill it in
it's a "hint" to the caller, not a full on override of how the graphql schema works
Actually another issue with doing it CLI-side: one benefit of pre-configuring arguments to the constructor, is that you don't have execute the constructor explicitly at all in the shell.
- Before:
dagger -c '. foo env://TOKEN --repo=bar | build' - After:
dagger -c 'build
That's another thing I have to special case in the shell
It's reasonable to start client-side, it's just that our client-side UX is quite fragmented at the moment which complicates things
there's also privileged LLM environments which literally don't work if your constructor has required arguments, but could magically work with this feature. But not if we do it client-side..
re: defaultPath and ignore - are we any closer to lazy filesyncing/globbing being possible with Theseus by any chance? wonder if that could be a piece of the puzzle
this is essentially my interpretation too, but i may be overlooking dramatic caching / module-reuse downsides
one thing i'd say is, if we do have object defaults, they should just be in the GraphQL schema, even if they're giant IDs. that's bias coming from a sideproject of mine, which treats the GraphQL schema as the source of truth
imo, the module re-use/caching argument only applies if we're intending to propagate these kind of things down really deep.
if it only influences the schema seen by ServeModule, it's not very far reaching and should be fine
but we can't reach in and modify the schema of dependencies deep in the graph without potentially breaking them
personally, i'd argue we only care about these kind of user args at the top most level, and so this is fine
but if we want that feature, then i don't think we can do it with schema modification
(side note, i do want to allow configuration of just dependencies somehow, e.g. i want to be able to install modules/go but, in dagger.json/dagger.env/whatever, be able to override the default version - whether that's this mechanism or another)
modifying the schema that your module presents, or the schema that your module consumes is safe - no other schema modification could be safe
if that's the case and we agree we don't need more than that, the approach taken by the POC works for me
Yeah I don't think we need more depth if that's what you mean.
similar-ish pre-existing art exists for this in that you can rename dependencies in dagger.json, which just changes the name that is presented in the API to the end-caller (needed to avoid name conflicts between deps w/ the same name)
filesync stuff was swapped out a while ago, so it's certainly an available quest. It's just not straightforward because we still need content-based cache keys for the synced contents even if the sync itself is lazy. The engine does the content hashing atm, but if the engine isn't syncing the contents into it then that means the client needs to do the content hashing, send that over to the engine, etc. All very doable but a lot to work through to get the content hashing efficient on every client platform
Update: I'm working on the "load from a user-specified config file" part of the POC
Punting on the aesthetic problem of big ugly IDs in introspection - any ideas for dealing with that are welcome ๐
My immediate thought is that we might just want a prettier display of IDs than the current base64 encoded protobuf. There is id.Display() already, which gives a string that looks close to a graphql query, so that would be a step in the right direction. But then we could just flesh out "nice looking representation of ID" further. Since it would just be an aesthetics layer on top, it would be totally fine IMO to even make special casing for certain types, etc.
I do like the idea of centering the implementation around IDs, and it would obviously be a shame to move away from that just to make the introspection more readable. So I feel like that leaves us with just making IDs more readable
Yeah makes sense
also feels symmetrical to string-to-ID parsing currently done by the CLI for passing secrets, directories etc. And same format I want to parse from the .env file for this feature
if we could somehow make it so you could go from ID to string representation and back in a lossless way that could be very powerful