nesting + Unix sockets/C2H networking | Dagger | Page 1

lavish spoke Sep 5, 2023, 8:09 PM

#

@stone ruin Whenever you have time, wondering if you have any objections with this change: https://github.com/vito/dagger/compare/services-v2-c2h-h2c...services-v2-nested-sockets (currently pushed to a different branch to avoid services-v2 PR scope creep)

tl;dr: right now all Socket traffic is sent to MainClientCaller, which also affects C2H IP networking since it uses the exact same mechanisms as C2H Unix socket forwarding. I think it makes sense to support nesting instead, so that e.g. a Zenith function can use C2H to route container traffic to the module's "local" Host. Same for Unix sockets in principle.

I remember you ran into a lot of complications here, but don't remember if there were any major dealbreakers.

#

I figure this might raise some cache busting concerns from putting ClientID into an object, but those concerns already exist with sockets/h2c in general (e.g. proxying to an ever-changing socket path or port number), so there might be a general solution like bring-your-own-stable-ID/name (similar to what I mentioned in https://github.com/dagger/dagger/issues/5373).

stone ruin Sep 5, 2023, 9:33 PM

#

Yeah I just went with MainClientCaller in the previous re-arch because it was simple and good enough at the moment, and we had too many other problems to solve as part of all that PR 😄

I figure this might raise some cache busting concerns from putting ClientID into an object, but those concerns already exist with sockets/h2c in general (e.g. proxying to an ever-changing socket path or port number), so there might be a general solution like bring-your-own-stable-ID/name (similar to what I mentioned in https://github.com/dagger/dagger/issues/5373).
(As a side-note the re-arch should have fixed that exact issue you linked, left a comment on it).

General problem still exists though. My concern w/ allowing users to set the name of the socket is that I think it recreates the equivalent headaches we currently have around the names of cache volumes... Two clients could call different sockets the same thing and unintentionally dedupe w/ each other.

I still totally agree with your point that "those concerns already exist with sockets/h2c in general (e.g. proxying to an ever-changing socket path or port number)" though.

Thinking out loud:

When a socket/service is originating from a module, we can setup the caching to behave such that it's tied to the cache key of the module function being executed (i.e. the digest of the Function, including any invocation args). So if an exec depends on a socket/service from a function, the exec cache is busted when that function changes.
Otherwise, it's coming from the MainClientCaller. As a baseline, we can just retain the current behavior today (which isn't ideal) and say that if you want better safety you should use modules.

But for this case we can consider other ideas like what you suggested w/ bring-your-own-stable-name (still not ideal, but probably better). I also played around with using the hostname of the MainClientCaller + socket path as a heuristic for the "stable name", which once again not ideal but better than today's behavior

lavish spoke Sep 5, 2023, 9:41 PM

#

stone ruin Yeah I just went with `MainClientCaller` in the previous re-arch because it was ...

General problem still exists though. My concern w/ allowing users to set the name of the socket is that I think it recreates the equivalent headaches we currently have around the names of cache volumes... Two clients could call different sockets the same thing and unintentionally dedupe w/ each other.

Sorry, I wasn't clear; what I meant was using a user-supplied name/ID only when computing stable digests (e.g. to calculate the hostname for a service that depends on a socket/c2h), but when it comes to actually using the socket we use its embedded ClientID to ensure that socket traffic goes to the appropriate client. Not using a user-supplied tag for determining where traffic goes. Does that make sense? (Haven't read your two thoughts yet, just wanted to clarify)

#

Would be cool if there was a way to automatically infer something stable instead, but at some level it seems fundamental: what if your function mounts two sockets, both from unstable paths. I guess we could identify them by their mount point, but there might also be some application-specific knowledge to indicate whether the cache should be busted or reused 🤔 (based on knowing what's on the other side of the socket)

#

[there's a good chance I'm missing an aspect of how Buildkit handles caching here]

stone ruin Sep 5, 2023, 10:07 PM

#

lavish spoke > General problem still exists though. My concern w/ allowing users to set the n...

The type of tricky case I'm thinking about is:

Two different clients are connected to buildkit and both trying to forward a host socket at /var/run/foo.sock. If we allow clients to choose their own stable ID, then let's say both clients choose to use the name foo.
The two clients submit identical execs in parallel that depend on the socket. Buildkit will see the execs are identical and de-dupe their execution. It wouldn't matter that we still do all the setup w/ ClientIDs to route to the "correct" socket, there won't even be two separate requests to route correctly.
In a lot of cases, that's totally fine. But it's also highly possible that the client expected their /var/run/foo.sock to be talked to and would be surprised if it wasn't.

This is definitely not a common case, but it's not totally implausible either when dealing with shared engines running CI jobs in parallel.

Basically, what I'm suggesting is that we actually just leave that problem as is for the case when the socket is coming from a MainClientCaller. It's obscure enough that I wouldn't consider it a hair-on-fire problem. Allowing clients to choose their own stable ID and/or using the the client's hostname as a heuristic for that stable ID would improve the situation, and we should consider that, but not eliminate it.

Then, when it comes to sockets served by Functions, I think we can use a combination of the Function's digest and the path of the socket as the stable ID of the socket. That way, if the "thing behind the socket" changes, the Functions must change too, so the caching should work out like we want.

what if your function mounts two sockets, both from unstable paths.
This could still indeed be a problem if you don't have control over the path of the socket being setup, but one nice part of Functions being containerized is that it's easy to do stuff like create a stable symlink to an unstable path and then just use that instead. So I think I'd be okay with that sort of thing just needing to be documented. But also, I'd have no problem with additionally layering on support for optionally setting a stable name for the socket to deal with this too. That would be more convenient.

#

Hopefully I'm understanding and we aren't just talking past each other 😅 Let me know if I'm missing the point

lavish spoke Sep 5, 2023, 10:24 PM

#

Yeah, that's the same case I have in mind, but now I see that it doesn't seem possible to support both stable hostnames and per-client exec op caching/deduping in a clean way. I think we're on the same page now.

So basically the idea is to make them as content-addressed as possible, which we have the liberty of doing once Modules arrive by basically doing moduleID+function+path-or-port. neat

The symlink workaround is a good point, and applies to C2H too in the sense that you have your own network namespace to mess with, so you might as well use a stable port within it (even if that means mirroring in some obtuse situation). So maybe we just don't care about dynamic paths, at least until people complain enough about the host/non-module situation.

#

The only concern I have is if there's some surprisingly-common situation where someone needs nesting to work, in a way that doesn't really fit modules/functions. Example:

You're running tests in Dagger
Your tests use Dagger to start a Testcontainers-style service (H2C)
You want to pass it back into a different Dagger container (C2H) to talk to it

It's goofy, because you could just use C2C at that point, but maybe there are stupid code reasons it has to be this way (e.g. you don't control some aspect of it).

#

Eh, maybe the answer is "just use a modules" anyway. 😛 I can shelve this, it's just a distraction for now. We'll see if anyone runs into it. Thanks

#nesting + Unix sockets/C2H networking