#Cross-session IDs
Quoting @timber zephyr :
Would it make sense to switch to a global session model, where only one session is allowed per process, to avoid edge cases like this?
I think that would just shift the problem to sharing IDs across separate processes (and in languages like Python, there are frameworks that let you treat functions running in separate processes the same way you treat in-process functions).
One possibility would be to embed a session ID in the ID itself and error out if someone tries to use an ID incorrectly (i.e. from a different session). Previously, that would have been a practical impossibility, but @rustic anvil and I have both had to deal with so many issues around IDs being unstable in the last few months (service hostnames needing stable digests, most recently the changes in Zenith Checks to replace local:// sources w/ blob://, etc.) that I'm pretty sure we could actually do that now, since we already have all the plumbing in place to account for IDs having something unstable embedded in them.
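A minimal sketch of the "session ID embedded in the ID" idea, just to make it concrete. Everything here is hypothetical: `encode_id`, `decode_id`, `SessionMismatchError`, and the JSON-in-base64 layout are made up for illustration and are not how the engine actually serializes IDs.

```python
import base64
import json


class SessionMismatchError(Exception):
    """Raised when an ID is used in a session other than the one that minted it."""


def encode_id(session_id: str, payload: dict) -> str:
    # Hypothetical layout: store the minting session next to the payload,
    # then serialize the whole thing into an opaque string.
    raw = json.dumps({"session": session_id, "payload": payload}, sort_keys=True)
    return base64.urlsafe_b64encode(raw.encode()).decode()


def decode_id(current_session: str, id_str: str) -> dict:
    data = json.loads(base64.urlsafe_b64decode(id_str.encode()))
    # Error out instead of silently re-interpreting the ID in a new context.
    if data["session"] != current_session:
        raise SessionMismatchError(
            f"ID was created in session {data['session']!r}, "
            f"not usable in session {current_session!r}"
        )
    return data["payload"]
```

With something like this, passing an ID to another process in the same session still works, while an ID that leaks into a different session fails loudly at decode time.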
But if I'm sharing IDs across processes, don't I expect them to be re-interpreted in a different context? For example host.file("foo") may return a completely different file, but that's expected - kind of like if I saved the ID to a file, emailed it to someone else, and they loaded it in their own engine?
basically, shouldn't an ID be treated as bytecode?
I understand the idea there, but in practice that model creates such an enormous number of problems that it just doesn't work.
We previously had essentially that behavior, but it resulted in concurrent local directory loads from entirely different clients running on different hosts (connected to the same remote engine) being de-duplicated, which is really, really bad to say the least and caused a big fire for us to put out.
We patched that by including the unique client session ID in the local dir ID (and then patched on top of that all sorts of workarounds to account for the fact that the ID was now unstable, which had lots of side effects on various digests).
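To illustrate the failure mode and the patch, here's a toy model, assuming the engine de-duplicates local dir loads by a cache key. The `Engine` class and both key functions are invented for this sketch; the real engine and Buildkit are far more involved.

```python
import hashlib


def _digest(*parts: str) -> str:
    # NUL-join the parts so ("a", "bc") and ("ab", "c") cannot collide.
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()


def key_path_only(session_id: str, path: str) -> str:
    # Old behavior in this toy model: the client session is ignored.
    return _digest("local", path)


def key_with_session(session_id: str, path: str) -> str:
    # Patched behavior: the unique client session ID is part of the key.
    return _digest("local", session_id, path)


class Engine:
    """Toy remote engine that de-duplicates local dir loads by cache key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.cache = {}

    def load_local_dir(self, session_id, path, read_dir):
        key = self.key_fn(session_id, path)
        if key not in self.cache:
            self.cache[key] = read_dir()  # only the first caller is actually read
        return self.cache[key]


buggy = Engine(key_path_only)
buggy.load_local_dir("client-a", "./src", lambda: {"main.go": b"A"})
leaked = buggy.load_local_dir("client-b", "./src", lambda: {"main.go": b"B"})
# `leaked` holds client A's files even though client B asked for its own ./src.
```

The flip side is also visible here: `key_with_session` gives a different key every session, which is the instability that then ripples into any digest derived from it.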
The next chapter of this epic saga is actually the Zenith Checks PR. Zenith introduces the new ability for a cache ref to itself contain IDs (e.g. your entrypoint returns a Container or a Check that is based on Container execution). This throws another wrench into everything because now you can load an ID from the cache and it will be referring to a client session that no longer exists (there are a few other variations on the same sort of problem).
Long story short, I just moved us entirely off ever embedding local:// LLB in IDs at all in that PR. The new implementation uses the same local dir sync as before, but we now do an extra step that puts the synced dir into the content store and uses a new custom LLB source called blob:// that lets us refer to it just by its content digest.
To summarize:
- Local dir syncs used to become bytecode instructions for "load this dir from the caller". But that creates many, many problems.
- Now, local dir syncs become bytecode instructions for "use this exact content", with that content being what was loaded from a particular client at a particular time, but no longer attached to that client anymore.
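The difference between the two bullets can be sketched roughly like this. It's a toy content store, assuming a synced dir is just a mapping of file names to bytes; `ContentStore`, `sync_local_dir`, and the exact `blob://sha256:...` string shape are assumptions for illustration, not the real LLB source format.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressed store: snapshots are keyed by their digest."""

    def __init__(self):
        self._blobs: Dict[str, Dict[str, bytes]] = {}

    def put(self, files: Dict[str, bytes]) -> str:
        h = hashlib.sha256()
        for name in sorted(files):  # sorted so the digest is order-independent
            data = files[name]
            # Length-prefix each field so file boundaries are unambiguous.
            h.update(len(name.encode()).to_bytes(8, "big"))
            h.update(name.encode())
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        digest = "sha256:" + h.hexdigest()
        self._blobs[digest] = dict(files)
        return digest

    def get(self, digest: str) -> Dict[str, bytes]:
        return self._blobs[digest]


def sync_local_dir(store: ContentStore, client_files: Dict[str, bytes]) -> str:
    # The returned reference names the content only: no client, no session.
    return "blob://" + store.put(client_files)


def resolve(store: ContentStore, ref: str) -> Dict[str, bytes]:
    return store.get(ref[len("blob://"):])
```

Two clients that sync identical content end up with the same reference, and the reference keeps resolving after the client that produced it has disconnected, which is the property the cached-ID case needs.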
Probably a naive answer here - but isn't the first problem caused by too much intermixing of llb and dagger ID?
A Dagger ID includes local directory path
B That becomes llb local(foo)
C Buildkit has weird default behavior in deduplicating local ID
D bad problem
--> sure you can solve at A ("always prefix the local path with a session ID!") but it could just as well be solved at B instead ("local path in dagger id becomes local path + session id when converted to llb")
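A rough sketch of the "solve it at B" option, where the ID itself stays session-free and the session is only attached when the ID is lowered. The types and the `lower` function are hypothetical stand-ins; this is not the actual Dagger-to-LLB conversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalDirID:
    """Step A: the ID only records the local path, nothing session-specific."""
    path: str


@dataclass(frozen=True)
class LoweredLocalSource:
    """Step B: what would be handed to the solver, scoped to one session."""
    session_id: str
    path: str


def lower(dir_id: LocalDirID, session_id: str) -> LoweredLocalSource:
    # The session is injected at conversion time, so the ID stays stable
    # while the lowered form never collides across clients.
    return LoweredLocalSource(session_id=session_id, path=dir_id.path)
```

In this model the same `LocalDirID("foo")` is equal in every session, but it still means "whatever foo is for the caller", so it doesn't help with an ID loaded from the cache after its session is gone.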
Wow didn't know about blob:// that seems pretty awesome actually
If I understand correctly, this definitely bifurcates IDs from a possible save/load format. So if we wanted to implement 3923, we wouldn't do it on top of IDs anymore, but would have to be a subtly different thing, correct?
Does this also mean that copy pasting IDs manually in graphql queries will no longer work?
It does, but we've honestly been bifurcated for a while now. It's just more complete and obvious now.
Does this also mean that copy pasting IDs manually in graphql queries will no longer work?
It works provided it's in the same session (i.e. a single dagger listen instance or a single dagger run instance), not if it's across many. But that again has been the case for a while now.
So if we wanted to implement 3923, we wouldn't do it on top of IDs anymore, but would have to be a subtly different thing, correct?
Yes, this is the crux of the issue. In the past we've conceptualized IDs as client-independent bytecode, but they just can't practically operate like that. I agree, and I know @rustic anvil agrees even more since we've talked about this before, that we would love to have something like a "totally reproducible save/load format".
The problem is that local dirs and reproducibility are totally incoherent with one another. But I'm pretty much just echoing @rustic anvil's voice at this point
There's some more details here on all that: https://github.com/dagger/dagger/pull/5659#issuecomment-1696035407
It works provided it's in the same session (i.e. a single dagger listen instance or a single dagger run instance), not if it's across many. But that again has been the case for a while now.
So if I craft a query in play.dagger.cloud, with copy-pasted IDs for stitching, it may not run correctly for you?
I forget but I thought we set up stickiness such that you pretty much always hit the same dagger listen and thus avoid this problem (cc @hexed root)
Yeah, then it won't work. The IDs aren't eternal
OK. I do think that leaves a gap, would be cool to have a stable bytecode format, but I understand that's like priority #99 on your list to even think about that. And pretty low on my list too
Also it makes our pure graphql experience just that much more broken. But it wasn't great to begin with, so like you said - mostly making more clear a pre-existing situation
I still hope that one day we can add in-doc references, so manually crafted graphql queries can reference other queries by name in the same graphql doc; then convert all client libraries to use that feature also, for full parity
Yeah agreed; the other thing that came up when Alex and I talked about this was that we could distinguish between what he calls "pure IDs" and impure ones. So there could be a way of asking for a "pure ID" that is 100% reproducible and thus essentially like bytecode. But if we want bytecode to also support un-reproducible stuff like local dirs, then that's another problem
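One way to picture "asking for a pure ID", as a sketch only. It assumes an ID is a chain of operations and that a fixed set of host-dependent operations counts as impure; the op names in `IMPURE_OPS`, plus `Op`, `pure_id`, and `ImpureIDError`, are all illustrative and not an existing API.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumption for this sketch: anything that reads from the caller's host
# is not reproducible, everything else is.
IMPURE_OPS = frozenset({"host.directory", "host.file"})


@dataclass(frozen=True)
class Op:
    name: str
    args: Tuple[str, ...] = ()


class ImpureIDError(Exception):
    """Raised when a pure ID is requested for a chain with impure ops."""


def is_pure(ops: Sequence[Op]) -> bool:
    return all(op.name not in IMPURE_OPS for op in ops)


def pure_id(ops: Sequence[Op]) -> str:
    if not is_pure(ops):
        bad = [op.name for op in ops if op.name in IMPURE_OPS]
        raise ImpureIDError(f"not reproducible, depends on: {bad}")
    # Deterministic serialization, so the same chain always gets the same ID.
    raw = json.dumps([[op.name, list(op.args)] for op in ops])
    return hashlib.sha256(raw.encode()).hexdigest()
```

A chain like `from("alpine") -> withExec("echo hi")` would get a stable pure ID here, while anything touching `host.file("foo")` would be rejected rather than silently made unstable.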
Also that bytecode could be used for callbacks...
I think that would be orthogonal since they would all be in the same session
still agreed though
Honestly I think the word "ID" makes that level of conversation hard
Yeah, it's just a serialization of a runtime object
not a permanent ID
Yes it's just that it would unbreak the pure GraphQL DX, and we wouldn't need to copy-paste IDs anymore
That's the thing: "ID" could be interpreted as either 1) code, or 2) serialized state.
With things like "pure and impure IDs" we're just trying to distinguish between code and state; which are just different
I'm picturing an API where you could ask for either 1) the pipeline's code (to serialize and share), or 2) the reproducible state produced by running it
the former would reference the latter
"here is a run of bytecode <xxx>, with the inputs <....> and the resulting values were <...>"
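The shape of that sentence could look something like the record below. This is only a sketch of the code vs. state split; `RunRecord`, `record_run`, and the `execute` callback are hypothetical names, and the "bytecode" is just opaque bytes here.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class RunRecord:
    """Serialized state: which code ran, with what inputs, producing what."""
    bytecode_digest: str
    inputs: Dict[str, Any]
    results: Dict[str, Any]


def record_run(
    bytecode: bytes,
    inputs: Dict[str, Any],
    execute: Callable[[bytes, Dict[str, Any]], Dict[str, Any]],
) -> RunRecord:
    results = execute(bytecode, inputs)
    # The record points at the code by digest instead of embedding it,
    # so the code can be shared on its own and the state stays separate.
    digest = "sha256:" + hashlib.sha256(bytecode).hexdigest()
    return RunRecord(bytecode_digest=digest, inputs=dict(inputs), results=dict(results))
```

The bytecode alone is the shareable part; a `RunRecord` is a statement about one particular execution of it.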
anyway I'm just procrastinating checks PR review, back to that
OT: pure and impure reminds me of my old Serializable Java days, which in practice is the same thing that's happening here