#service de-duping


abstract gulch
#

I think services are currently deduplicated not only within a single session but also across sessions (right?). If so, two clients concurrently running pipelines that spin up a db as a service would end up sharing that db, which probably results in unexpected behavior quite often.

I was then thinking that clients could at least just set an env var or something on the service instance that uniquely identifies the client/pipeline instance somehow, but then realized that would change the service's content-based DNS name and thus always invalidate the cache for the execs depending on the service (right?)

If that's all true, I feel like maybe we should just dedupe services ourselves inside the session rather than relying on the buildkit solver deduping. It's unfortunate not to be able to take advantage of the solver there, but I can't think of another way to get consistent behavior.

sleek ingot
#

Yeah this is all true

#

Once we have the new progress stream integrated it'll be really easy to implement per-session deduping, since we can forward gateway container stdout/stderr to a custom progress vertex, and just use an in-memory lock/map
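
A rough sketch of what the in-memory lock/map could look like, assuming one registry per session keyed by the service's content digest. The names here (`sessionServices`, `running`, `Start`) are hypothetical and not taken from the codebase; the actual container start and the progress forwarding are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// running is a hypothetical handle to a started service.
type running struct {
	Digest string // content digest of the service definition
	Addr   string // address the service is reachable at
}

// entry holds the result of starting one service, computed at most once.
type entry struct {
	once sync.Once
	svc  *running
	err  error
}

// sessionServices dedupes service starts within a single session.
// Each session owns its own instance, so nothing is shared across clients.
type sessionServices struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newSessionServices() *sessionServices {
	return &sessionServices{entries: make(map[string]*entry)}
}

// Start runs start at most once per digest. Concurrent and later callers
// with the same digest get the same result (a failed start is also kept).
func (s *sessionServices) Start(digest string, start func() (*running, error)) (*running, error) {
	s.mu.Lock()
	e, ok := s.entries[digest]
	if !ok {
		e = &entry{}
		s.entries[digest] = e
	}
	s.mu.Unlock()

	// The map lock is released before starting, so a slow service
	// start does not block lookups for other digests.
	e.once.Do(func() { e.svc, e.err = start() })
	return e.svc, e.err
}

func main() {
	starts := 0
	start := func() (*running, error) {
		starts++
		return &running{Digest: "sha256:db", Addr: "10.0.0.2:5432"}, nil
	}

	a := newSessionServices() // session A
	b := newSessionServices() // session B

	a.Start("sha256:db", start) // starts the db for session A
	a.Start("sha256:db", start) // deduped within session A
	b.Start("sha256:db", start) // session B gets its own db

	fmt.Println("starts:", starts)
}
```

Holding a `sync.Once` per digest rather than the map lock during startup is just one option; it keeps two different services in the same session from serializing on each other.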

The remaining concern would be making sure we don't end up with the exact same hostname across sessions; I think you could end up with funky load balancing behavior there. I guess we could inject a per-session value?

abstract gulch
# sleek ingot Once we have the new progress stream integrated it'll be really easy to implemen...

"I guess we could inject a per-session value?"
Yeah I think that would make sense, just include a random id generated once per session in the hash. If we're using gateway containers now we wouldn't need to worry about that impacting the caching of the service itself anymore.

But actually, I guess we do need to worry about the hostname still invalidating the execs depending on it... Worst case there are some tricks to inject information into an exec without it impacting caching: the http proxy settings in LLB don't break the cache (would be a total abuse, but could make it work robustly nonetheless I think), secret values don't actually invalidate the cache when it comes to the raw LLB I think, etc.
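
To make the per-session value idea concrete, here's a minimal sketch assuming the hostname is derived by hashing the session ID together with the service's content digest. `newSessionID` and `serviceHostname` are made-up names, and the 16-character truncation is arbitrary. It shows both halves of the tradeoff: the name is stable inside a session, but differs across sessions, which is exactly why execs that embed it would stop sharing cache.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newSessionID returns a random ID, meant to be generated once per session.
func newSessionID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// serviceHostname mixes the session ID into the hash of the service digest,
// so the same service definition gets a different name in each session.
func serviceHostname(sessionID, serviceDigest string) string {
	h := sha256.New()
	h.Write([]byte(sessionID))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(serviceDigest))
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	s1, err := newSessionID()
	if err != nil {
		panic(err)
	}
	s2, err := newSessionID()
	if err != nil {
		panic(err)
	}

	const digest = "sha256:db" // placeholder for the service's content digest

	fmt.Println("session 1:", serviceHostname(s1, digest))
	fmt.Println("session 1 again:", serviceHostname(s1, digest))
	fmt.Println("session 2:", serviceHostname(s2, digest))
}
```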

sleek ingot
#

Oh yeah good point. Hmm - maybe we could resolve hostnames to IPs using the per-session map and just inject them in /etc/hosts? We already do something like that for service aliases
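
Something like this, as a sketch: the exec keeps referring to the stable content-based hostname, and the per-session map decides which IP that name resolves to. `hostsFile` and the addresses are hypothetical, and how the rendered file gets into the container without affecting the cache key isn't shown here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// hostsFile renders /etc/hosts lines from a per-session hostname -> IP map.
func hostsFile(ips map[string]string) string {
	names := make([]string, 0, len(ips))
	for name := range ips {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s\t%s\n", ips[name], name)
	}
	return b.String()
}

func main() {
	// Hypothetical state for one session: the same hostnames would map
	// to different IPs in another session's map.
	sessionIPs := map[string]string{
		"db":    "10.87.0.5",
		"cache": "10.87.0.6",
	}
	fmt.Print(hostsFile(sessionIPs))
}
```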

#

@abstract gulch Have you seen any cases where you use Hostname / Entrypoint without WithServiceBinding? That's something that works today, but might be hard to support depending on what option we go with here

abstract gulch
sleek ingot
#

Oops yeah that's what I meant. I just remember seeing Hostname/Endpoint used in some of our bootstrapping/test code, and was wondering about folks who run dockerd too, wanted to make sure they're all binding the service. Sounds like probably