#content-addressed secrets footgun
1 messages · Page 1 of 1 (latest)
this is only an issue for content-addressed files created via withNewFile, other files (including stdout/stderr) should be fine as long as the LLB that created them used secrets appropriately
content-addressed secrets footgun
maybe we could have an InMemoryFile (or better name) that doesn't expose secret but still has an id : FileID?
hm except that's not possible because withNewFile just builds on a Directory which might not be originally in-memory
Ugh … right.
And it’s not the first big problem we encounter we secrets.
Fundamentally:
-
Either the secret gets leaked in plaintext (e.g. withNewFile aside, even with Exec(“vault get -o /secret”).File(“/secret”).Secret(), the secret ends plaintext on a layer and cache exported)
-
Or if it doesn’t leak into layers, then it won’t work if the step that produced it is cached (as Kyle pointed out in the vault demo, it worked only the first time. Once cached, the secretid mapped to nothing)
withNewFile fits in the “leaking bucket”, except instead of being a cache layer it’s in the instructions themselves
Had a couple of half baked ideas:
-
We design secrets like in buildkit with providers that gets JIT invoked when a secret is needed, bypassing cache. POC here: https://github.com/dagger/dagger/pull/3124
-
We rethink SecretIDs so they work a lot more like FileIDs. Except it’s always an in-memory file.
/cc @sacred magnet @autumn hawk
Obviously the long-term fix is to remove the "IDs are cookies" part.. So we're talking short-term remediation right?
@opal rampart for the first one, some consolation is that reading from stdout (i.e. not using -o) will be fine since that'll end up in the /dagger mount which won't get exported
JIT secret fetching sounds most ideal to me, but there are tricky parts there, e.g. secrets may have multiple fields (aws access key + secret key) which must be fetched and used together, but passed as different env vars. so you have to avoid fetching the same secret multiple times because it might be regenerated in-between.
Making IDs not be cookies will require persisting the actual value somewhere so it can be mapped from the ID (probably in our daggerd boltdb, at least in my most recent thinking), so then we'd be responsible for persisting user secrets. I guess the shortest reasonable path to that would be that we have built-in support for vault or similar? I'm not sure, but there's a lot of implications and design to think through I believe.
Agree that JIT fetching is probably the fastest path and is not really that bad even in the long term provided we make it easy for users to setup these secret vendor services (or whatever we want to call them)
I must say I’m not super familiar with output mounts, but if we can grab the stdout without running the command again (cached), doesn’t that mean it’s still committed to disk somewhere?
oh, yeah, it's on disk somewhere, just pointing out it won't end up in a pushed image
which isn't all that helpful i guess (when would you push a vault read)
Oh yep yep. It’ll be in the cache export though, less bad but still
ah yeah true
What if we embraced the cookie model but split it from IDs?
- change IDs from being a “recipe” to a simple hash of the recipe
- a lookup table of recipes (keyed by hash) is sent back and forth as a http header. Like an actual cookie 🙂
- both client and server use their shared lookup table to resolve IDs
Benefits:
- IDs are short
- IDs contain no plaintext secrets
- No complicated persistence to implement in the engine
- No complicated logic in SDKs either: send empty cookie on first request; for each subsequent gql request, send the cookie received from the previous response
- Cookie header may contain plaintext secrets BUT much easier to keep secure because that header remains private to the SDK and engine. Doesn’t touch client code or extension code; people don’t copy paste it around; Biggest risk is http request dump leaking into logs in case of error
- Bonus: playground could use the lookup table to make graphql stitching easier: auto-complete IDs instead of copy-pasting them? 😇
That was the dream. @sacred magnetattempted an implementation in the past -- didn't work out in the end because we have no place to store " a lookup table of recipes (keyed by hash)" persistently
(when you run a cached thing, you'll only get the "hash" -- but the code that fills in hash->llb hasn't run)
Also -- the ID leak is a a symptom of the underlying problem, not the main problem.
-
Loading a secret leaks plaintext secrets to layers and is exported to cache (e.g. File().Secret() --> leak). Leaking the secret in the ID is a worse shape of the same problem.
-
When secrets are not leaking, they're actually broken by cache. e.g. Kyle's vault extension calling AddSecret --> first time it works, second time it's called it will be cached and NOT AddSecret, returning the secret hash from cache --> broken secret
basically, with the current design, secrets "have" to leak in order for them to work