`Service.Stop` sends SIGTERM to all container processes since v0.11.5 | Dagger | Page 1

karmic yarrow Jul 9, 2024, 7:40 PM

I'm currently troubleshooting an issue with the graceful shutdown process of a dagger Service.

TL;DR: since v0.11.5 (possibly since the removal of the shim: https://github.com/dagger/dagger/pull/7367/files) the shutdown process of a Service behaves differently. Previously, only the main process of the container would receive a SIGTERM when you did a Service.Stop. Since v0.11.5 every process in the container receives a separate SIGTERM. This means the main program cannot control the shutdown gracefully, if it has a few processes that it needs for its own graceful shutdown they will likely be terminated either before or while they are being used. For example: in the engine we ran dnsmasq, upon a SIGTERM we begin the cache syncing process that uploads layers and cache mounts to some object storage. If dnsmasq exists due to receiving its own SIGTERM we are not able to resolve DNS names and do the upload of the layers/cache mounts. This is how I stumbled upon this issue, while upgrading magicache the integration tests broke (more info here: #maintainers message). I was able to repro this behavior by building two simple programs that are chained:

Program A starts Program B in the background. Waits on SIGINT/SIGTERM, waits a few seconds after and prints before exiting
Program B simply waits for SIGINT/SIGTERM and prints

I wrapped the execution of Program A in a dagger service with a Stop call and executed it with both dagger version v0.11.4 and version v0.11.5. In version v0.11.4 we can see that the only program that prints to stdout claiming it received the SIGTERM is the main one, the child process never receives the signal. However, with version v0.11.5 we can see that both programs receive the SIGTERM, print and then exit. In the screenshot: terminal on the left is dagger v0.11.9 showing that the top level process and the internal one receive separate SIGTERM signals and on the right is v0.11.4 showing that only the top level program receives the signal and prints.

I think I got somewhat close to knowing what might be the issue, but with my complete ignorance of the engine and the container space I'm a bit stuck 👼. Looking at https://github.com/dagger/dagger/pull/7367/files and how the executor combines with the shutdown process, my assumption is that the way the container gets terminated is by running runc kill --all as a last step (in theory this happens because the process we ran in the container is pid 1) which is causing every single process within the container to receive the sigterm. What I yet fail to understand is why that wasn't the behavior previously. The dagger shim was doing some mangling to the config.json but I don't see any changes to the pid namespace (https://github.com/containerd/containerd/issues/2558) or something else that would cause runc to want to kill the process differently.

I did not want to report an issue yet because this might be the behavior we want. I checked docker (with docker stop) and containerd (did a test with a kubernetes deployment) and their default behavior when terminating is to send SIGTERM to the main process only, like we did previously. This is why the sync process of the engine works fine in our CI runners or when testing the engine manually with a docker run.

I can create an issue if you believe this should be tracked there! I will also continue investigating since we need this fixed for magicache tests to clear

Pinging @half marlin since you might be more familiar with this process given the removal of the shim!

half marlin Jul 9, 2024, 8:37 PM

Huh, great find! This is actually the intended behavior of dumb-init (and most inits I believe), search for "In its default mode, dumb-init establishes a session rooted at the child, and sends signals to the entire process group" here https://github.com/Yelp/dumb-init

dumb-init does mention it's flag --single-child that results in it only forwarding signals to the first child process it spawns, which would revert us back to the previous behavior.

So we can at least go back to the previous behavior if needed very easily (trivial to append a flag to it). But I need to think about what's better in the long term. The fact that this is the default behavior of an init gives me a gut feeling it's actually better and more standard than what we were doing before, but worth more thought.

It's also pretty trivial to make this a setting on the Service API, but I want to be highly cautious about exposing low-level + possibly somewhat Linux-specific knobs like that.

If we do decide to keep the new behavior, we can probably update the dnsmasq process to just handle SIGTERM and shutdown nicely.

I'll think more and look around at other inits some more and get back.

Apparently it's actually somewhat inconsistent whether inits default to sending signals to the whole process group or just the direct child, tini also supports both modes but defaults to the opposite: https://github.com/krallin/tini?tab=readme-ov-file#process-group-killing

One important point is that no matter what, I do think we'll at some point want to add a Container/Service option to our core API that disables the automatic init. It's very important we default to it on because init-less containers often behave in extremely surprising+confusing ways, but removing the shim got rid of a hard dependency on an automatic init, so it can become an option now (without getting overly linux-specific).

That's important because at least whichever way we go here, users would always have the option to disable the automatic init and get whatever behavior they want.

Which in turn means that we should just go with the least surprising behavior by default and leave the advanced use cases to be handled by extra knobs on the API.

Reading around it seems like dumb-init's default of forwarding to the whole process group is probably closer to default expectations (even tini's docs mention that their default behavior of not doing that is surprising in some cases). So I'm sort of biased towards leaving this behavior as is and just updating the dagger engine (dnsmasq specifically) to handle this better. That would potentially make us more resilient to various custom provisioned engine deployments that may run into this same issue

#`Service.Stop` sends SIGTERM to all container processes since v0.11.5