I'm currently troubleshooting an issue with the graceful shutdown process of a dagger Service.
TL;DR: since v0.11.5 (possibly since the removal of the shim: https://github.com/dagger/dagger/pull/7367/files) the shutdown process of a Service behaves differently. Previously, only the main process of the container would receive a SIGTERM when you did a Service.Stop. Since v0.11.5 every process in the container receives a separate SIGTERM. This means the main program cannot control the shutdown gracefully, if it has a few processes that it needs for its own graceful shutdown they will likely be terminated either before or while they are being used. For example: in the engine we ran dnsmasq, upon a SIGTERM we begin the cache syncing process that uploads layers and cache mounts to some object storage. If dnsmasq exists due to receiving its own SIGTERM we are not able to resolve DNS names and do the upload of the layers/cache mounts. This is how I stumbled upon this issue, while upgrading magicache the integration tests broke (more info here: #maintainers message). I was able to repro this behavior by building two simple programs that are chained:
- Program A starts Program B in the background. Waits on SIGINT/SIGTERM, waits a few seconds after and prints before exiting
- Program B simply waits for SIGINT/SIGTERM and prints
I wrapped the execution of Program A in a dagger service with a Stop call and executed it with both dagger version v0.11.4 and version v0.11.5. In version v0.11.4 we can see that the only program that prints to stdout claiming it received the SIGTERM is the main one, the child process never receives the signal. However, with version v0.11.5 we can see that both programs receive the SIGTERM, print and then exit. In the screenshot: terminal on the left is dagger v0.11.9 showing that the top level process and the internal one receive separate SIGTERM signals and on the right is v0.11.4 showing that only the top level program receives the signal and prints.
I think I got somewhat close to knowing what might be the issue, but with my complete ignorance of the engine and the container space I'm a bit stuck 👼. Looking at https://github.com/dagger/dagger/pull/7367/files and how the executor combines with the shutdown process, my assumption is that the way the container gets terminated is by running runc kill --all as a last step (in theory this happens because the process we ran in the container is pid 1) which is causing every single process within the container to receive the sigterm. What I yet fail to understand is why that wasn't the behavior previously. The dagger shim was doing some mangling to the config.json but I don't see any changes to the pid namespace (https://github.com/containerd/containerd/issues/2558) or something else that would cause runc to want to kill the process differently.
I did not want to report an issue yet because this might be the behavior we want. I checked docker (with docker stop) and containerd (did a test with a kubernetes deployment) and their default behavior when terminating is to send SIGTERM to the main process only, like we did previously. This is why the sync process of the engine works fine in our CI runners or when testing the engine manually with a docker run.
I can create an issue if you believe this should be tracked there! I will also continue investigating since we need this fixed for magicache tests to clear