@vito maybe a stupid idea. Have we ever | Dagger | Page 1

timber pumice Jun 21, 2024, 8:08 PM

#

What's the goal? (And what do you mean by serialized state?)

real raft Jun 21, 2024, 8:09 PM

#

the serialized fields of the object, that the engine collects to persist objects when chaining calls

#

no specific short term goal, was just pondering my experience instantiating my module's types; and whether there are lurking DX improvements there

#

for example we have `loadXXXfromID` at the root query which is horrible

#

and no generic new for a given type

timber pumice Jun 21, 2024, 8:13 PM

#

real raft for example we have `loadXXXfromID` at the root query which is horrible

this could in principle be replaced with standard `node(id: ID!)` - but the trade-off is losing the separate ID types, which are a critical part of the SDK codegen DX

real raft Jun 21, 2024, 8:13 PM

#

and separately, the same exact object might be instantiated by different chains of calls - leading to different IDs for the same object

real raft Jun 21, 2024, 8:14 PM

#

timber pumice this could in principle be replaced with standard `node(id: ID!)` - but the trad...

yeah I don't have a specific proposal or request, just accumulating a pile of observations that feel related

timber pumice Jun 21, 2024, 8:15 PM

#

real raft and separately, the same exact object might be instantiated by different chains ...

in scenarios like this it's possible to have a query return a "canonical" form of the value with its own ID, fwiw. For example container.from("alpine:latest") and container.from("alpine:3") could both return the exact same object + ID assuming it resolves to the same digest
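A minimal sketch of the "canonical form" idea above, assuming a made-up lookup table in place of a real registry. The names `fakeRegistry`, `recipeID` and `canonicalID` are hypothetical and are not part of Dagger's API; the point is only that two different tags can share one ID once the ID is derived from the resolved digest rather than from the call as written.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fakeRegistry stands in for a real registry lookup. The digests are made up.
var fakeRegistry = map[string]string{
	"alpine:latest": "sha256:1111",
	"alpine:3":      "sha256:1111",
	"alpine:3.18":   "sha256:2222",
}

// hashID shortens a SHA-256 of the input into a readable hex ID.
func hashID(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:8])
}

// recipeID identifies the call exactly as it was written.
func recipeID(ref string) string {
	return hashID("container.from(" + ref + ")")
}

// canonicalID identifies the call after the tag is resolved to a digest,
// so tags that point at the same image share one ID.
func canonicalID(ref string) (string, error) {
	digest, ok := fakeRegistry[ref]
	if !ok {
		return "", fmt.Errorf("unknown ref %q", ref)
	}
	return hashID("container.from(" + digest + ")"), nil
}

func main() {
	for _, ref := range []string{"alpine:latest", "alpine:3", "alpine:3.18"} {
		id, err := canonicalID(ref)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-14s recipe=%s canonical=%s\n", ref, recipeID(ref), id)
	}
}
```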

real raft Jun 21, 2024, 8:17 PM

#

but what about foo.bar() with:

func (foo Foo) Bar() *Container {
  return dag.Container().From("alpine:latest")
}

timber pumice Jun 21, 2024, 8:17 PM

#

should work the same way yea

#

IMO recipe-addressed content is strictly better than content-addressed values, since it gives you the magical power to derive the value if it's missing in the destination. It also lets us be as lazy as possible. One of the DagQL experiments I tried was to have queries like foo { bar { baz { id } } } actually just statically return the ID without even evaluating foo.bar.baz, so things can be evaluated with maximal parallelism later
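A rough sketch of the laziness described above, assuming a toy `Call` type (hypothetical, not DagQL's real data structure). The ID is computed from the shape of the chain alone, and the `evals` counter shows that nothing is resolved until the value is actually asked for.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// evals counts how many fields were actually resolved.
var evals int

// Call is one hypothetical link in a query chain such as foo.bar.baz.
type Call struct {
	Parent *Call
	Field  string
}

// Recipe renders the chain as text without resolving anything.
func (c *Call) Recipe() string {
	if c.Parent == nil {
		return c.Field
	}
	return c.Parent.Recipe() + "." + c.Field
}

// ID hashes the recipe. It is static: no field is evaluated.
func (c *Call) ID() string {
	sum := sha256.Sum256([]byte(c.Recipe()))
	return hex.EncodeToString(sum[:8])
}

// Value resolves the chain from the root; the string building stands in
// for real work, and each resolved field bumps the evals counter.
func (c *Call) Value() string {
	evals++
	if c.Parent == nil {
		return "<" + c.Field + ">"
	}
	return c.Parent.Value() + "<" + c.Field + ">"
}

func main() {
	baz := &Call{Parent: &Call{Parent: &Call{Field: "foo"}, Field: "bar"}, Field: "baz"}

	fmt.Println("id:", baz.ID(), "evals so far:", evals)
	fmt.Println("value:", baz.Value(), "evals now:", evals)
}
```

Because the ID doubles as a recipe, a destination that is missing the value can rebuild it by walking the same chain, which is the "derive it if it's missing" property mentioned above.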

#

It's also deeply integrated into the TUI and Cloud UI. I know you're not arguing against it, just pointing out how leveraged it is at the moment 😛

real raft Jun 21, 2024, 8:19 PM

#

Yeah I can definitely see that

#

I'm wondering if content-addressed has a role though, perhaps complementary to recipe-based IDs?

timber pumice Jun 21, 2024, 8:20 PM

#

timber pumice IMO recipe-addressed content is strictly better than content-addressed values, s...

caveat: they're doing different things (recipe-addressed without content hashing vs. content hashing without recipes)

real raft Jun 21, 2024, 8:20 PM

#

It seems like a gap that, when different recipes lead to the exact same content, only buildkit knows about that, and dagger doesn't

timber pumice Jun 21, 2024, 8:20 PM

#

best would be to have both, yeah. I wanted to do this for artifact publishing to Daggerverse

#

basically an ID with an attestation for what content digest it should produce

real raft Jun 21, 2024, 8:20 PM

#

(and in some cases even buildkit may not know, in the case of recipes that don't compile to llb at full resolution)

timber pumice Jun 21, 2024, 8:21 PM

#

timber pumice basically an ID with an attestation for what content digest it _should_ produce

then, when you publish an artifact to Daggerverse, it builds it on its own and verifies that it gets the same content hash

timber pumice Jun 21, 2024, 8:23 PM

#

real raft It seems like a gap that, when different recipes lead to the exact same content,...

I'd be curious to see if Buildkit actually would have the content hashes match in this type of scenario (different API call paths to set up module object state); I would guess that it wouldn't, for one reason or another (e.g. different exec ops)

real raft Jun 21, 2024, 8:23 PM

#

Is it fair to say that we are conflating 1) function memoization and 2) addressable objects?

#

There is loss of information there, and perhaps that information is currently worthless, but in the abstract we are losing information today - right?

#

For example given an object, the engine doesn't know how many different chains returned it

#

(this might just be "Solomon learns DAG 101", sorry)

timber pumice Jun 21, 2024, 8:25 PM

#

this probably ties into Erik's mention of giving Dagger its own content store and swapping out LLB (or was it integrating with LLB? don't remember)

real raft Jun 21, 2024, 8:26 PM

#

This traces back specifically to custom ID-able types, which in my defense are very new, and still not fully understood by us mere mortal developers using the platform 🙂

#

and also don't exist in buildkit

real raft Jun 21, 2024, 8:26 PM

#

timber pumice this probably ties into Erik's mention of giving Dagger its own content store an...

yes sounds very similar

#

it's like we introduced a whole new layer of capabilities, that are incredibly powerful, currently mixed into the pre-existing "almost llb" design, and eventually we will go back and layer it more cleanly once we understand it better

#

Perhaps eventually object ID will be flipped: the ID is the content; and recipes (plural) are attached metadata saying "this is how to reproduce this artifact"
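One way to picture the "flipped" design speculated about above, as a toy sketch only: an index keyed by content digest, with every known recipe attached as metadata. The `Index` type and the digest strings are hypothetical. It also shows how an engine could know how many different chains produced the same object.

```go
package main

import "fmt"

// Index maps a content digest to every recipe known to produce it.
type Index struct {
	recipes map[string][]string
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{recipes: make(map[string][]string)}
}

// Record attaches a recipe to a content digest, ignoring duplicates.
func (ix *Index) Record(digest, recipe string) {
	for _, r := range ix.recipes[digest] {
		if r == recipe {
			return
		}
	}
	ix.recipes[digest] = append(ix.recipes[digest], recipe)
}

// Recipes lists the recipes recorded for a content digest.
func (ix *Index) Recipes(digest string) []string {
	return ix.recipes[digest]
}

func main() {
	ix := NewIndex()
	ix.Record("sha256:1111", `container.from("alpine:latest")`)
	ix.Record("sha256:1111", "foo.bar")
	for _, r := range ix.Recipes("sha256:1111") {
		fmt.Println(r)
	}
}
```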

timber pumice Jun 21, 2024, 8:28 PM

#

real raft Is it fair to say that we are conflating 1) function memoization and 2) addressa...

yeah, I think so - IDs are sort of like 'promises' for arbitrary values, right now they always refer to Objects in GraphQL, but the format is technically any typed GraphQL value, and that's what actually becomes the DagQL cache key iirc

real raft Jun 21, 2024, 8:28 PM

#

not sure I believe my own BS here 😛

#

since the "content" in that content-addressed ID will most of the time itself be a recipe... Like a Container field in a custom App object...

#

All the way down to actual blobs that we're not permanently storing anyway

#

and therefore cannot be retrieved without a recipe to create them...

#

so not sure what these "not recipe" IDs would even be used for

timber pumice Jun 21, 2024, 8:34 PM

#

there'll probably be a few details to shake out if/when we start using IDs for persistence (beyond the in-memory cache) - whole different set of concerns there. the ID design is meant to accommodate it (e.g. the 'impure' metadata so we can know not to persist the result), but it hasn't seen a trial by fire yet.

#

the blob() API underlying local file syncs is a good example of an actually content-addressed ID, but as you would expect, it only works if the content is already in the store 😛

timber pumice Jun 21, 2024, 8:36 PM

#

timber pumice there'll probably be a few details to shake out if/when we start using IDs for p...

oh, and the meta: true/false property, which was 100% added with persistence in mind, but is such a huge pain in the butt to support (even though it's conceptually pretty simple) that I'd rather just try to keep metadata out of our API (hence my tirade about just embracing OTel for anything not content-affecting)

real raft Jun 21, 2024, 8:54 PM

#

I have no idea what meta: true/false is, first time I've heard about that

timber pumice Jun 21, 2024, 9:04 PM

#

https://github.com/dagger/dagger/blob/66e98c39ef4ed0aa17bbb2c8b47e48b10620bdaa/dagql/call/callpbv1/call.proto#L51-L60

#

We don't use it anywhere now, since the only things that needed it before were pipeline() and withFocus(), which we've moved away from

#

The idea was that you could take an ID and remove any 'meta' calls from its DAG to yield a canonical representation that you can use for persistent cache keys, so APIs like pipeline (or I suppose your ideal OTel alternative) don't bust cache keys

#

But it's pretty expensive and complicated to do that conversion

#

Interestingly it's a case where you'd want that canonicalization to apply to persistent caches but not the query cache - because pipeline and withFocus actually do affect the in-memory content, at least with how they were implemented before (they would store their value in a field on the object)
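A simplified sketch of the canonicalization described above, assuming a flat call chain instead of a real DAG. The `Call` struct, `Canonical` and `Key` are hypothetical and much simpler than the linked `call.proto`. The full key plays the role of the in-memory query cache key, while the key of the stripped chain plays the role of a persistent cache key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Call is a hypothetical, flattened stand-in for one call in an ID.
type Call struct {
	Field string
	Meta  bool // true for calls that do not affect content
}

// Canonical returns the chain with every meta call removed.
func Canonical(chain []Call) []Call {
	out := make([]Call, 0, len(chain))
	for _, c := range chain {
		if !c.Meta {
			out = append(out, c)
		}
	}
	return out
}

// Key hashes the fields of a chain into a cache key.
func Key(chain []Call) string {
	h := sha256.New()
	for _, c := range chain {
		h.Write([]byte(c.Field))
		h.Write([]byte{0}) // separator so field boundaries stay unambiguous
	}
	return hex.EncodeToString(h.Sum(nil)[:8])
}

func main() {
	plain := []Call{{Field: "container"}, {Field: `from("alpine")`}}
	withPipeline := []Call{
		{Field: "container"},
		{Field: `pipeline("build")`, Meta: true},
		{Field: `from("alpine")`},
	}

	// Query cache: the meta call still changes the key.
	fmt.Println("query keys equal:", Key(plain) == Key(withPipeline))
	// Persistent cache: the meta call is stripped first.
	fmt.Println("persistent keys equal:", Key(Canonical(plain)) == Key(Canonical(withPipeline)))
}
```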
