#@vito maybe a stupid idea. Have we ever
the serialized fields of the object, that the engine collects to persist objects when chaining calls
no specific short term goal, was just pondering my experience instantiating my module's types; and whether there are lurking DX improvements there
for example we have loadXXXFromID at the root query, which is horrible
and no generic new for a given type
this could in principle be replaced with standard node(id: ID!) - but the trade-off is losing the separate ID types, which are a critical part of the SDK codegen DX
and separately, the same exact object might be instantiated by different chains of calls - leading to different IDs for the same object
yeah I don't have a specific proposal or request, just accumulating a pile of observations that feel related
in scenarios like this it's possible to have a query return a "canonical" form of the value with its own ID, fwiw. For example container.from("alpine:latest") and container.from("alpine:3") could both return the exact same object + ID assuming it resolves to the same digest
but what about foo.bar() with:
func (foo Foo) Bar() *Container {
	return dag.Container().From("alpine:latest")
}
should work the same way yea
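A minimal sketch of the "canonical form" idea above, in plain Go rather than the Dagger SDK. The `resolveDigest` lookup table and its digests are made up for illustration, and `canonicalID` is a hypothetical helper, not an engine API: it hashes the call after the tag has been replaced by the digest it resolves to, so two tags that point at the same digest share one ID.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// resolveDigest is a hypothetical stand-in for a registry lookup.
// The tags and digests below are invented for this example.
func resolveDigest(ref string) string {
	table := map[string]string{
		"alpine:latest": "sha256:aaaa",
		"alpine:3":      "sha256:aaaa",
		"alpine:3.18":   "sha256:bbbb",
	}
	return table[ref]
}

// canonicalID hashes the canonical form of a from(ref) call, where the
// mutable tag has been replaced by the digest it currently resolves to.
func canonicalID(ref string) string {
	canonical := "container.from(" + resolveDigest(ref) + ")"
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two different tags, same digest: same canonical ID.
	fmt.Println(canonicalID("alpine:latest") == canonicalID("alpine:3")) // true
	// A tag that resolves elsewhere gets a different ID.
	fmt.Println(canonicalID("alpine:latest") == canonicalID("alpine:3.18")) // false
}
```

Under this assumption it does not matter whether the call is made directly or from inside a module function such as Bar(): the ID depends only on the canonical call.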
IMO recipe-addressed content is strictly better than content-addressed values, since it gives you the magical power to derive the value if it's missing in the destination. It also lets us be as lazy as possible. One of the DagQL experiments I tried was to have queries like foo { bar { baz { id } } } actually just statically return the ID without even evaluating foo.bar.baz, so things can be evaluated with maximal parallelism later
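To illustrate the laziness point, here is a rough sketch (not DagQL's actual implementation) where an ID is derived from the recipe alone. `Call`, `Evaluator`, and the hashing scheme are all assumptions made for this example; the point is only that `ID()` never touches the evaluator, so the ID of foo.bar.baz is available before anything has run.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Call is one step of a hypothetical recipe: a field selected on a parent.
type Call struct {
	Parent *Call
	Field  string
}

// ID is derived purely from the chain of field names; nothing is evaluated.
func (c *Call) ID() string {
	parent := ""
	if c.Parent != nil {
		parent = c.Parent.ID()
	}
	sum := sha256.Sum256([]byte(parent + "." + c.Field))
	return hex.EncodeToString(sum[:])
}

// Evaluator counts how many steps were actually executed.
type Evaluator struct{ Steps int }

// Eval walks the recipe and "runs" each step (here it just builds a path).
func (e *Evaluator) Eval(c *Call) string {
	e.Steps++
	if c.Parent == nil {
		return c.Field
	}
	return e.Eval(c.Parent) + "." + c.Field
}

func main() {
	baz := &Call{Parent: &Call{Parent: &Call{Field: "foo"}, Field: "bar"}, Field: "baz"}
	ev := &Evaluator{}

	id := baz.ID()
	fmt.Println(len(id), ev.Steps) // the ID exists, zero steps evaluated

	// The value can still be derived later, from the recipe behind the ID.
	fmt.Println(ev.Eval(baz), ev.Steps)
}
```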
It's also deeply integrated into the TUI and Cloud UI. I know you're not arguing against it, just pointing out how leveraged it is at the moment
Yeah I can definitely see that
I'm wondering if content-addressed has a role though, perhaps complementary to recipe-based IDs?
caveat: they're doing different things (recipe-addressed without content hashing vs. content hashing without recipes)
It seems like a gap that, when different recipes lead to the exact same content, only buildkit knows about that, and dagger doesn't
best would be to have both, yeah. I wanted to do this for artifact publishing to Daggerverse
basically an ID with an attestation for what content digest it should produce
(and in some cases even buildkit may not know, in the case of recipes that don't compile to llb at full resolution)
then, when you publish an artifact to Daggerverse, it builds it on its own and verifies that it gets the same content hash
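The "ID plus attestation" idea could look roughly like the sketch below. Everything here is hypothetical: `AttestedID`, `Verify`, and the `build` callback are names invented for illustration, and the recipe is just a string. The publisher records the digest they expect; the verifier rebuilds from the recipe and compares.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// AttestedID pairs a recipe with the content digest it is claimed to produce.
type AttestedID struct {
	Recipe         string
	ExpectedDigest string
}

// digestOf returns the hex-encoded SHA-256 of some built content.
func digestOf(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Verify rebuilds the artifact from its recipe and checks the attestation.
func Verify(a AttestedID, build func(recipe string) []byte) error {
	got := digestOf(build(a.Recipe))
	if got != a.ExpectedDigest {
		return fmt.Errorf("digest mismatch for %q: got %s, want %s",
			a.Recipe, got, a.ExpectedDigest)
	}
	return nil
}

func main() {
	// A toy, deterministic "build" standing in for actually running the recipe.
	build := func(recipe string) []byte { return []byte("built:" + recipe) }

	recipe := "container.from(alpine:3)"
	ok := AttestedID{Recipe: recipe, ExpectedDigest: digestOf(build(recipe))}
	bad := AttestedID{Recipe: recipe, ExpectedDigest: "not-the-real-digest"}

	fmt.Println(Verify(ok, build) == nil)  // true
	fmt.Println(Verify(bad, build) != nil) // true
}
```

This only works when the build is reproducible, which is the open question raised in the next message.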
I'd be curious to see if Buildkit actually would have the content hashes match in this type of scenario (different API call paths to set up module object state); I would guess that it wouldn't, for one reason or another (e.g. different exec ops)
Is it fair to say that we are conflating 1) function memoization and 2) addressable objects?
There is loss of information there, and perhaps that information is currently worthless, but in the abstract we are losing information today - right?
For example given an object, the engine doesn't know how many different chains returned it
(this might just be "Solomon learns DAG 101", sorry)
this probably ties into Erik's mention of giving Dagger its own content store and swapping out LLB (or was it integrating with LLB? don't remember)
This traces back specifically to custom ID-able types, which in my defense are very new, and still not fully understood by us mere mortal developers using the platform
and also don't exist in buildkit
yes sounds very similar
it's like we introduced a whole new layer of capabilities, that are incredibly powerful, currently mixed into the pre-existing "almost llb" design, and eventually we will go back and layer it more cleanly once we understand it better
Perhaps eventually object ID will be flipped: the ID is the content; and recipes (plural) are attached metadata saying "this is how to reproduce this artifact"
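As a thought experiment only, the "flipped" model could be sketched as an index keyed by content digest, with the known recipes attached as metadata. `Index`, `Record`, and `Recipes` are hypothetical names, and the digest and recipe strings are placeholders. Such an index would also answer the earlier question of how many different chains returned a given object.

```go
package main

import (
	"fmt"
	"sort"
)

// Index maps a content digest to the set of recipes observed to produce it.
type Index map[string]map[string]struct{}

// Record notes that a recipe produced the content with the given digest.
func (ix Index) Record(digest, recipe string) {
	if ix[digest] == nil {
		ix[digest] = map[string]struct{}{}
	}
	ix[digest][recipe] = struct{}{}
}

// Recipes lists, in sorted order, every known way to reproduce the content.
func (ix Index) Recipes(digest string) []string {
	out := make([]string, 0, len(ix[digest]))
	for r := range ix[digest] {
		out = append(out, r)
	}
	sort.Strings(out)
	return out
}

func main() {
	ix := Index{}
	ix.Record("digest-1", "container.from(alpine:latest)")
	ix.Record("digest-1", "foo.bar")
	ix.Record("digest-1", "foo.bar") // recording the same recipe twice is a no-op

	fmt.Println(ix.Recipes("digest-1")) // [container.from(alpine:latest) foo.bar]
}
```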
yeah, I think so - IDs are sort of like 'promises' for arbitrary values, right now they always refer to Objects in GraphQL, but the format is technically any typed GraphQL value, and that's what actually becomes the DagQL cache key iirc
not sure I believe my own BS here
since the "content" in that content-addressed ID will most of the time itself be a recipe... Like a Container field in a custom App object...
All the way down to actual blobs that we're not permanently storing anyway
and therefore cannot be retrieved without a recipe to create them...
so not sure what these "not recipe" IDs would even be used for
there'll probably be a few details to shake out if/when we start using IDs for persistence (beyond the in-memory cache) - whole different set of concerns there. the ID design is meant to accommodate it (e.g. the 'impure' metadata so we can know not to persist the result), but it hasn't seen a trial by fire yet.
the blob() API underlying local file syncs is a good example of an actually content-addressed ID, but as you would expect, it only works if the content is already in the store
oh, and the meta: true/false property, which was 100% added with persistence in mind, but is such a huge pain in the butt to support (even though it's conceptually pretty simple) that I'd rather just try to keep metadata out of our API (hence my tirade about just embracing OTel for anything not content-affecting)
I have no idea what meta: true/false is, first time I hear about that
We don't use it anywhere now, since the only things that needed it before were pipeline() and withFocus(), which we've moved away from
The idea was that you could take an ID and remove any 'meta' calls from its DAG to yield a canonical representation that you can use for persistent cache keys, so APIs like pipeline (or I suppose your ideal OTel alternative) don't bust cache keys
But it's pretty expensive and complicated to do that conversion
Interestingly it's a case where you'd want that canonicalization to apply to persistent caches but not the query cache - because pipeline and withFocus actually do affect the in-memory content, at least with how they were implemented before (they would store their value in a field on the object)
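A simplified sketch of that canonicalization, assuming an ID is a linear chain of calls rather than a full DAG. `Call`, its `Meta` flag, and `key` are illustrative names, not the engine's real ID format. The query-cache key hashes every call, while the persistent key skips calls flagged as meta.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Call is one step in a hypothetical ID; Meta marks calls that do not
// affect the content (e.g. the old pipeline() or withFocus()).
type Call struct {
	Field string
	Meta  bool
}

// key hashes a chain of calls. With stripMeta set, meta calls are dropped
// first, yielding a canonical key suitable for a persistent cache.
func key(chain []Call, stripMeta bool) string {
	h := sha256.New()
	for _, c := range chain {
		if stripMeta && c.Meta {
			continue
		}
		h.Write([]byte(c.Field))
		h.Write([]byte{0}) // separator so field boundaries stay unambiguous
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	plain := []Call{{Field: "container"}, {Field: "from(alpine:3)"}}
	labeled := []Call{
		{Field: "pipeline(build)", Meta: true},
		{Field: "container"},
		{Field: "from(alpine:3)"},
	}

	// Persistent keys match: the meta call does not bust the cache.
	fmt.Println(key(plain, true) == key(labeled, true)) // true
	// Query-cache keys differ: the meta call still changes the in-memory object.
	fmt.Println(key(plain, false) == key(labeled, false)) // false
}
```

On a real DAG the stripping step has to rewrite every reference to a removed call, which is presumably where the cost and complexity mentioned above come from.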