Currently, when load testing a NestJS app with terminus and shutdown hooks enabled during a rolling update on Kubernetes, I still get a small percentage of failed requests.
So I assume the zero-downtime deployment is not correctly configured. FYI, I use app.enableShutdownHooks(); and a simple terminus health check as the readiness probe (https://github.com/nestjs/terminus), and I use dumb-init to make sure the process receives the SIGTERM signal.
Are there any good practices on how to achieve a fully clean shutdown?
Basically:
- on SIGTERM signal, set the readiness probe to fail, to tell the orchestrator to stop sending requests
- wait X seconds (at least the interval of the readiness probe, so the orchestrator has seen the failing probe) so that traffic stops being forwarded
- wait another X seconds to be sure the pod's IP address is removed from all tables
- proceed to close the webserver (process last requests if there are still some running)
- proceed to close database connections and other connections
- shut down the app
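
To make the question concrete, here is a minimal sketch of the sequence above in plain TypeScript. It is not a NestJS or terminus API: `gracefulShutdown`, `ShutdownOptions`, and all field names are hypothetical, the delays are placeholders to tune against the probe settings, and the three callbacks stand in for whatever the app actually does to fail its readiness check, close the HTTP server, and close its connections.

```typescript
// Hypothetical helper illustrating the ordering only; none of these names
// come from NestJS or @nestjs/terminus.
type Step = () => Promise<void> | void;

interface ShutdownOptions {
  markNotReady: Step;          // make the readiness endpoint start failing
  closeServer: Step;           // stop accepting connections, finish in-flight requests
  closeConnections: Step;      // database pools and other connections
  probeDelayMs: number;        // assumption: at least the readiness probe interval
  propagationDelayMs: number;  // assumption: extra time for the IP to be removed everywhere
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the steps strictly in order and returns the list of completed phases.
// Exiting the process is left to the caller so the function stays testable.
async function gracefulShutdown(opts: ShutdownOptions): Promise<string[]> {
  const phases: string[] = [];

  await opts.markNotReady();
  phases.push('not-ready');

  await sleep(opts.probeDelayMs);
  phases.push('probe-wait');

  await sleep(opts.propagationDelayMs);
  phases.push('propagation-wait');

  await opts.closeServer();
  phases.push('server-closed');

  await opts.closeConnections();
  phases.push('connections-closed');

  return phases;
}
```

The idea would be to call this once from a SIGTERM handler and exit when the returned promise resolves. In a NestJS app with app.enableShutdownHooks(), the first three phases could probably live in a provider's beforeApplicationShutdown hook instead, leaving the closing of the server and connections to the framework, but I have not verified that this removes the failed requests. The total time also has to stay below the pod's terminationGracePeriodSeconds, otherwise the container is killed before the sequence finishes.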
Mandatory reading: https://learnk8s.io/graceful-shutdown