#secrets

heavy egret
#

I've been thinking about your idea that maybe we can get rid of setSecret altogether. I think it's right

#

I also think that instead of replacing it with mapSecret, we should put all built-in secret providers under dag.Host

#

eg.

  • dag.Host().Secret("op://foo/bar")
  • dag.Host().Secret("file://foo/bar")
  • dag.Host().Secret("env://foo")
#

Or perhaps (open for bikeshedding)

  • dag.Host().OnePassword().Read("op://foo/bar")
  • dag.Host().HashicorpVault().Read("foo/bar")
  • dag.Host().File("foo/bar").AsSecret()
  • dag.Host().Env("foo").AsSecret()
#

Benefits:

  • No more concerns around caching. The host controls every avenue for populating a secret, so it can ensure it's not cached
  • No more concerns about sandbox abstraction leak. I don't want a function 5 layers down fetching op://foo/bar in my 1password account.
  • Still possible to support dynamic secrets: your function just has to write to a file in the local filesystem (making sure it's in tmpfs, to avoid caching) and call dag.Host().File("foo/bar").AsSecret(). This can be done dynamically.
#

cc @rose star

mossy shuttle
#

The concern is finding the right migration path: we'll probably have to keep setSecret around for a few releases alongside the new API to give folks a chance to migrate.

Migration depends on the use case for setSecret. I'm trying to figure out which ones those are. I can think of:

  • "Secret Provider Modules" (e.g. Vault module) which can just be replaced by native secret providers
  • [can't think of anything else yet]

@rose star ?

Also /cc @azure grove: AFAIK setSecret is the main thing preventing function call caching

azure grove
#

> Still possible to support dynamic secrets: your function just has to write to a file in the local filesystem (making sure it's in tmpfs, to avoid caching) and call dag.Host().File("foo/bar").AsSecret(). This can be done dynamically.
I don't think we could do this in any reasonable way any time soon. If the secret is written to a tmpfs, the tmpfs can't be persistently cached (either locally across engine start/stop or especially for remote cache use cases). You'd need to somehow identify that the secret is being accessed from a tmpfs and then go back to the previous operation and say "this operation can never be cached and must always re-run", but that itself is a game of finding which previous op was the one that touched that file most recently, etc. etc.

Never say never, but it would be complicated to the point that I'd say it's non-viable for now.

#

I think dynamic secrets can probably fit into the secret provider model, especially if we add support for function calls serving as secret providers

mossy shuttle
azure grove
#

Yeah, we would have to actually cache it on disk for it to work and not have the problems of SetSecret, but like you said that's fine since it's just a dummy testing secret anyways. Probably worth a name like "insecure-plaintext" or something

#

just to emphasize that it's not actually secret

mossy shuttle
azure grove
#

The SSH keygen use case there is interesting too, basically using a module to generate a cert or similar

mossy shuttle
#

first thing that comes to mind is, for those, he'd probably want to disable function caching yeah? Not even considering the secrets aspect of that

azure grove
#

Those use cases are tough when the cert is meant to be cached and persistent (as opposed to throwaway and regenerated every time the functions run).

The only thing I can think of is to tell users to persist those certs to disk as normal Files but use encryption, with the password that decrypts being a secret obtained from a provider

rose star
#

I'm not clear on this part

> No more concerns about sandbox abstraction leak. I don't want a function 5 layers down fetching op://foo/bar in my 1password account.

Would only the original module (the one I dagger call) have access to dag.Host()?

Separate question - with the current thinking would it be possible for me to use a different provider based on the client's configuration? or is the code explicitly pointing at a specific provider?

mossy shuttle
azure grove
#

The sandbox leak problem is already solved by isolating secrets per client and granting access only when a secret is explicitly provided to a client through function args/returns. The implementation would carry over to this new model by applying the same rule to secret providers, so I don't think there's any difference before or after in that respect

azure grove
# mossy shuttle Yeah ... I'm wondering about the caching aspect because I'm wondering what the u...

Yeah exactly, not sure what Mark's exact use case was there. But either way I can imagine someone wanting to create a dagger module for generating certs (it's a super annoying problem once you get to a certain level of complexity, would be nice to modularize) but in such a way that the certs are actually cached. It seems like a legit use case (though that doesn't mean it's priority 0 to support it immediately necessarily)

mossy shuttle
#

(e.g. for the latter, your best bet would be to export that file locally? in which case you don't really want caching and are back to the "no caching" use case?)

azure grove
mossy shuttle
#

basically wondering if there's yet another use case for something like an "ephemeral://" secret, which would only work on non-cached functions

azure grove
mossy shuttle
#

e.g. (don't mind the syntax)

"hardcoded secrets use case":

    // User defaults to "postgres".
    if user == nil {
        // user = dag.SetSecret("postgres-default-user", "postgres")
        user = dag.NewSecret("insecure-plaintext://postgres")
    }

"ssh key use case":

    // +no-cache
    // ...
    return &KeyPair{
        // PrivateKey: dag.SetSecret(name, string(pem.EncodeToMemory(sshPrivateKey))),
        PrivateKey: dag.NewSecret("ephemeral://" + string(pem.EncodeToMemory(sshPrivateKey))),
    }, nil
#

(the latter is basically today's implementation)

azure grove
#

Oh wait, I just remembered why this is extra hard... It's a pain to explain but it's a real non-obscure use case.

  • Function A creates an ephemeral secret, it's correctly marked as "never cached"
  • Function B is a cached function call. It calls out to Function A and gets a return value that contains the secret (either the return is the secret or the secret is embedded in a returned Container as secret env/file, etc.)
  • Function C is a cached function call. It calls to Function B. Say Function B was cached from a previous run. If Function B returns a value that has the secret in it (either direct or embedded), the secret will not be found.

Basically, you'd need to cascade function call cache invalidation, but only when an ephemeral secret is involved.

There's a zillion variations on the above idea too. It would actually matter in the real world

mossy shuttle
#

right

azure grove
#

The only world I can imagine that working is one where we've 100% taken over the entire cache logic from buildkit; it just departs completely from the model. And like yeah we probably should do that for many reasons but that's an enormous pre-req to take on for this work 😄

And even in that scenario, it would still be some incredibly gnarly logic. So it may be worth thinking through ways of avoiding ephemeral

mossy shuttle
#

and even in that scenario, it would be incredibly confusing for the user

#

even if it works 100% correctly

azure grove
#

Yes 100%

mossy shuttle
#

"why is my function not cached?" "well, see, 20 layers deep in the stack, someone used this"

azure grove
#

The only thing I can think of to avoid it is to support a secret provider that's backed by either:

#
  • a Function Call (which would be never cached when used as a secret provider)
  • a Service (which is already never cached in execution)
#

I think that would create the same end effect as ephemeral but avoid those issues

#

Basically just ways of getting secrets on-demand every time they are needed, but in this case sourced from other dagger-native things (functions and/or services) rather than from the host. But it would all be in the same on-demand model, so to speak

mossy shuttle
#

Yeah. Kinda beefy though

azure grove
#

It might not be as bad as it seems, especially the Function call approach. I don't think the engine would need a ton of new features, or possibly any. We can already call out to arbitrary functions based on a given call.

#

The dagql call for a secret provider backed by a Function call would just have the metadata on what function to call (which is not much). Then all the plumbing needed to actually dynamically make that call exists today.

#

Services would be harder because you need to decide on some sort of protocol. But function calls avoid that problem entirely

heavy egret
#

ok I'm lost on the part about dynamic secrets and implications for caching (will catch up on all the messages above).

But also would love feedback on the other part - moving secret providers to Host()

heavy egret
azure grove
# heavy egret The difference is that if a function calls `mapSecret("op://Solomon/OpenAI/token...

Oh okay I misunderstood what you were saying, I agree that functions shouldn't be able to make those calls and that putting the API on Host thus makes sense since that's not available to functions already.

I was referring to the fact that when the CLI/shell invoke a function, it will be making those calls to Host and then passing the secret providers to functions as args (which can then pass the secret provider around to other function calls if needed). That's where the pre-existing logic around ensuring functions only have access to secret providers they were explicitly passed will kick in and ensure there's no weird leaks possible.

So we're on the same page there I think.

#

Putting the API on Host SGTM, the wrinkle is figuring out whether to/how to support the ephemeral use case mentioned above. If we go with my suggestion to support that by allowing function calls to serve as secret providers, then there would be an additional way to create a secret that doesn't involve the Host API. Namely, if an object implements a SecretProvider interface (that we'd add as part of this), then you can turn that object into a secret provider too.
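One way to picture the function-call-backed provider: the provider stores only the name of the function to call and invokes it on every read, so nothing secret is ever cached. This is a sketch, not a proposed API. The `SecretProvider` method set, `functionProvider`, and the in-memory `registry` standing in for the engine's function dispatch are all assumptions.

```go
package main

import "fmt"

// SecretProvider is a hypothetical interface; the real shape is undecided.
type SecretProvider interface {
	Plaintext() (string, error)
}

// functionProvider holds only metadata about which function to call.
type functionProvider struct {
	functionName string
	// registry stands in for the engine's ability to call arbitrary functions.
	registry map[string]func() (string, error)
}

// Plaintext invokes the backing function on every read and stores nothing.
func (p functionProvider) Plaintext() (string, error) {
	fn, ok := p.registry[p.functionName]
	if !ok {
		return "", fmt.Errorf("unknown function %q", p.functionName)
	}
	return fn()
}

func main() {
	calls := 0
	registry := map[string]func() (string, error){
		"generate-token": func() (string, error) {
			calls++
			return fmt.Sprintf("token-%d", calls), nil
		},
	}

	var p SecretProvider = functionProvider{
		functionName: "generate-token",
		registry:     registry,
	}
	a, _ := p.Plaintext()
	b, _ := p.Plaintext()
	fmt.Println(a, b, calls) // token-1 token-2 2
}
```

Since the provider value itself is just a function name, it can be cached and passed between functions freely; only the call behind `Plaintext` has to run every time.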