Cross-session IDs 😥 | Dagger | Page 1

quiet verge Aug 31, 2023, 10:11 PM

#

🧵

#

Quoting @timber zephyr :

Would it make sense to switch to a global session model, where only one session is allowed per process, to avoid edge cases like this?

I think that would just shift the problem to trying to share IDs across separate processes (and in languages like Python, there are frameworks that let you treat functions running in separate processes the same way you treat functions in-process).

One possibility would be to include a session ID embedded in the ID itself and error out if someone tries to use an ID incorrectly. Previously, that would have been a practical impossibility, but @rustic anvil and I have both had to deal with so many issues around IDs being unstable in the last few months (service hostnames needing stable digests, most recently the changes in Zenith Checks to replace local:// sources w/ blob://, etc.) that I'm pretty sure we could actually do that now since we already had all the plumbing in place to account for IDs having something unstable embedded in them.
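A minimal sketch of the "session ID embedded in the ID" idea, assuming an ID is just an encoded envelope. The `Session` class, the envelope layout, and `CrossSessionIDError` are hypothetical names for illustration, not Dagger's actual ID format:

```python
import base64
import json
from dataclasses import dataclass


class CrossSessionIDError(Exception):
    """Raised when an ID is used outside the session that minted it."""


@dataclass(frozen=True)
class Session:
    session_id: str

    def encode_id(self, payload: dict) -> str:
        # Embed the minting session next to the payload (hypothetical layout).
        envelope = {"session": self.session_id, "payload": payload}
        raw = json.dumps(envelope, sort_keys=True).encode()
        return base64.urlsafe_b64encode(raw).decode()

    def decode_id(self, encoded: str) -> dict:
        envelope = json.loads(base64.urlsafe_b64decode(encoded.encode()))
        # Fail loudly instead of silently re-interpreting the ID elsewhere.
        if envelope["session"] != self.session_id:
            raise CrossSessionIDError(
                f"ID minted in session {envelope['session']!r}, "
                f"used in session {self.session_id!r}"
            )
        return envelope["payload"]
```

With this shape, an ID decoded in the session that created it round-trips, and the same ID handed to another session raises instead of resolving to something different.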

#

But if I'm sharing IDs across processes, don't I expect it to be re-interpreted in a different context? For example host.file("foo") may return a completely different file, but that's expected - kind of like if I saved the ID to a file, emailed it to someone else, and they loaded in their own engine?

#

basically, shouldn't an ID be treated as bytecode?

#

--> https://github.com/dagger/dagger/issues/3923

GitHub

Save & load pipelines · Issue #3923 · dagger/dagger

Overview I propose adding a new feature to the Dagger API: saving & loading of pipelines. Save: snapshot the state of a pipeline, encode it as text, send it to the client Load: restore the stat...

timber zephyr Aug 31, 2023, 10:24 PM

#

quiet verge But if I'm sharing IDs across processes, don't I expect it to be re-interpreted ...

I understand the idea there, but that model creates such an enormous number of problems that it just doesn't work in practice.

We previously had essentially that behavior, but it resulted in concurrent local directory loads across entirely different clients running on different hosts (connected to the same remote engine) being de-duplicated, which is really, really bad to say the least and caused a big fire for us to put out 🙂

We patched that by including the unique client session ID in the local dir ID (and then patched on top of that all sorts of workarounds to account for the fact that the ID was now unstable, which had lots of side effects on various digests).

The next chapter of this epic saga (😩) is actually the Zenith Checks PR. Zenith introduces the new ability for a cache ref to itself contain IDs (e.g. your entrypoint returns a Container or a Check that is based on Container execution). This throws another wrench into everything because now you can load an ID from the cache and it will be referring to a client session that no longer exists (there are a few other variations on the same sort of problem).

Long story short, I just moved us entirely off ever embedding local:// LLB in IDs at all in that PR. The new implementation uses the same local dir sync as before, but we now do an extra step that puts the synced dir into the content store and uses a new custom LLB source called blob:// that lets us refer to it just by its content digest.

To summarize:

Local dir syncs used to become bytecode instructions for "load this dir from the caller". But that creates many many problems.
Now, local dir syncs become bytecode instructions for "use this exact content", with that content being what was loaded from a particular client at a particular time, but no longer attached to that client anymore.
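To make the "use this exact content" idea concrete, here is a rough sketch of a content-addressed store. `ContentStore`, `sync_local_dir`, and the exact shape of the `blob://` reference are assumptions for illustration; the real source in the PR may look quite different:

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy in-memory content store keyed by digest (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, Dict[str, bytes]] = {}

    def put(self, files: Dict[str, bytes]) -> str:
        h = hashlib.sha256()
        for path in sorted(files):  # sorted so the digest is deterministic
            name, data = path.encode(), files[path]
            # Length-prefix each field so boundaries are unambiguous.
            h.update(len(name).to_bytes(8, "big") + name)
            h.update(len(data).to_bytes(8, "big") + data)
        digest = "sha256:" + h.hexdigest()
        self._blobs[digest] = dict(files)
        return digest

    def get(self, digest: str) -> Dict[str, bytes]:
        return self._blobs[digest]


def sync_local_dir(store: ContentStore, client_files: Dict[str, bytes]) -> str:
    # After the sync, the reference carries only the content digest:
    # no client, session, or host path is left in it.
    return "blob://" + store.put(client_files)
```

Two clients syncing identical content end up with the same reference, and two clients syncing different content under the same path get different ones, so de-duplication becomes safe by construction.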

quiet verge Aug 31, 2023, 10:29 PM

#

Probably a naive answer here - but isn't the first problem caused by too much intermixing of llb and dagger ID?

A Dagger ID includes local directory path
B That becomes llb local(foo)
C Buildkit has weird default behavior in deduplicating local ID
D bad problem

--> sure you can solve at A ("always prefix the local path with a session ID!") but it could just as well be solved at B instead ("local path in dagger id becomes local path + session id when converted to llb")
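A small sketch of the "solve it at B" option, assuming the de-duplication key is derived when the Dagger-level ID is converted to LLB. All names here (`LocalDirID`, `to_llb_key_naive`, `to_llb_key_scoped`) and the key format are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalDirID:
    # Dagger-level ID: only the path, so it stays stable across sessions.
    path: str


def to_llb_key_naive(local: LocalDirID) -> str:
    # Steps B/C as described: the key is only the path, so two clients collide.
    return local.path


def to_llb_key_scoped(local: LocalDirID, session_id: str) -> str:
    # Option B: add the session at conversion time, not inside the ID.
    return f"{session_id}:{local.path}"
```

The ID itself is identical for every client, while the scoped key differs per session, so the collision is avoided without making the ID unstable.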

#

Wow didn't know about blob:// that seems pretty awesome actually

#

If I understand correctly, this definitely bifurcates IDs from a possible save/load format. So if we wanted to implement 3923, we wouldn't do it on top of IDs anymore, but would have to be a subtly different thing, correct?

#

Does this also mean that copy pasting IDs manually in graphql queries will no longer work?

timber zephyr Aug 31, 2023, 10:37 PM

#

quiet verge If I understand correctly, this definitely bifurcates IDs from a possible save/l...

It does, but we've honestly been bifurcated for a while now. It's just more complete and obvious now.

> Does this also mean that copy pasting IDs manually in graphql queries will no longer work?
It works provided it's in the same session (i.e. a single dagger listen instance or a single dagger run instance), not if it's across many. But that again has been the case for a while now.

> So if we wanted to implement 3923, we wouldn't do it on top of IDs anymore, but would have to be a subtly different thing, correct?
Yes, this is the crux of the issue. In the past we've conceptualized IDs as client-independent bytecode, but they just can't practically operate like that. I agree, and I know @rustic anvil agrees even more since we've talked about this before, that we would love to have something like a "totally reproducible save/load format".

The problem is that local dirs and reproducibility are totally incoherent with one another. But I'm pretty much just echoing @rustic anvil's voice at this point 🙂

timber zephyr Aug 31, 2023, 10:37 PM

#

quiet verge Wow didn't know about `blob://` that seems pretty awesome actually

There's some more details here on all that: https://github.com/dagger/dagger/pull/5659#issuecomment-1696035407

quiet verge Aug 31, 2023, 10:38 PM

#

> It works provided it's in the same session (i.e. a single dagger listen instance or a single dagger run instance), not if it's across many. But that again has been the case for a while now.

So if I craft a query in play.dagger.cloud, with copy-pasted IDs for stitching, it may not run correctly for you?

timber zephyr Aug 31, 2023, 10:39 PM

#

quiet verge > It works provided it's in the same session (i.e. a single dagger listen instan...

I forget but I thought we set up stickiness such that you pretty much always hit the same dagger listen and thus avoid this problem (cc @hexed root)

quiet verge Aug 31, 2023, 10:39 PM

#

But what happens when the server restarts?

#

Or you visit the link a month later?

timber zephyr Aug 31, 2023, 10:39 PM

#

quiet verge But what happens when the server restarts?

Yeah, then it won't work. The IDs aren't eternal

quiet verge Aug 31, 2023, 10:41 PM

#

OK. I do think that leaves a gap, would be cool to have a stable bytecode format, but I understand that's like priority #99 on your list to even think about that 😛 And pretty low on my list too

#

Also it makes our pure graphql experience just that much more broken. But it wasn't great to begin with, so like you said - mostly making more clear a pre-existing situation

#

I still hope that one day we can add in-doc references, so manually crafted graphql queries can reference other queries by name in the same graphql doc; then convert all client libraries to use that feature also, for full parity

timber zephyr Aug 31, 2023, 10:43 PM

#

Yeah agreed; the other thing that came up when Alex and I talked about this was that we could distinguish between what he calls "pure IDs" and impure ones. So there could be a way of asking for a "pure ID" that is 100% reproducible and thus essentially like bytecode. But if we want bytecode to also support un-reproducible stuff like local dirs, then that's another problem 🙂
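One way to picture the pure/impure distinction is a recursive check over the operations an ID is built from. This is only a sketch: the `Op` type, the operation names, and the `IMPURE_OPS` set are assumptions, not an existing API:

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: operations that read from the caller's machine are impure.
IMPURE_OPS = frozenset({"host.directory", "host.file"})


@dataclass(frozen=True)
class Op:
    name: str
    inputs: Tuple["Op", ...] = ()


def is_pure(op: Op) -> bool:
    # An ID is pure only if every operation it depends on is pure.
    return op.name not in IMPURE_OPS and all(is_pure(i) for i in op.inputs)
```

Asking for a "pure ID" could then simply fail when `is_pure` is false, rather than handing back something that only works in one session.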

quiet verge Aug 31, 2023, 10:43 PM

#

Also that bytecode could be used for callbacks...

timber zephyr Aug 31, 2023, 10:44 PM

#

quiet verge I still hope that one day we can add in-doc references, so manually crafted grap...

I think that would be orthogonal since they would all be in the same session

#

still agreed though

quiet verge Aug 31, 2023, 10:44 PM

#

Honestly I think the word "ID" makes that level of conversation hard

timber zephyr Aug 31, 2023, 10:45 PM

#

quiet verge Honestly I think the word "ID" makes that level of conversation hard

Yeah, it's just a serialization of a runtime object

#

not a permanent ID

quiet verge Aug 31, 2023, 10:45 PM

#

timber zephyr I think that would be orthogonal since they would all be in the same session

Yes it's just that it would unbreak the pure GraphQL DX, and we wouldn't need to copy-paste IDs anymore

quiet verge Aug 31, 2023, 10:46 PM

#

timber zephyr Yeah, it's just a serialization of a runtime object

That's the thing: "ID" could be interpreted as either 1) code, or 2) serialized state.

With things like "pure and impure IDs" we're just trying to distinguish between code and state; which are just different

#

I'm picturing an API where you could ask for either 1) the pipeline's code (to serialize and share), or 2) the reproducible state produced by running it

#

the former would reference the latter
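A rough sketch of that split, with code and state as two separate things linked by a digest. `code_digest` and `RunRecord` are hypothetical names, and which side holds the reference is an assumption of this sketch:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict


def code_digest(code: Dict[str, Any]) -> str:
    # The shareable "code" is identified by a digest of its canonical JSON.
    canonical = json.dumps(code, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    # State from one execution: which code ran, with what, and what came out.
    bytecode: str
    inputs: Dict[str, Any]
    results: Dict[str, Any]


def record_run(code: Dict[str, Any], inputs: Dict[str, Any],
               results: Dict[str, Any]) -> RunRecord:
    return RunRecord(bytecode=code_digest(code), inputs=inputs, results=results)
```

Two runs of the same pipeline with different inputs share one `bytecode` value but produce distinct records, which keeps "what to run" separate from "what happened when it ran".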

#

"here is a run of bytecode <xxx>, with the inputs <....> and the resulting values were <...>"

#

anyway I'm just procrastinating checks PR review 😛 back to that

hexed root Aug 31, 2023, 10:50 PM

#

quiet verge That's the thing: "ID" could be interpreted as either 1) code, or 2) serialized ...

OT: pure and impure reminds me of my old serializable Java days, which in practice is the same thing that's happening here 😛
