#0-downtime deployment on Kubernetes with NestJS and readiness probes


odd stump
#

When load testing a NestJS app with Terminus and shutdown hooks enabled during a rolling update on Kubernetes, I still get a small percentage of failed requests.
So I assume the zero-downtime deployment is not configured correctly. FYI, I call app.enableShutdownHooks(), use a simple Terminus health check (https://github.com/nestjs/terminus) as the readiness probe, and run the app under dumb-init to be sure it receives the SIGTERM signal.

Are there any good practices on how to achieve a fully clean shutdown?

Basically:

  • on SIGTERM, set the readiness probe to fail, to tell the orchestrator to stop sending requests
  • wait X seconds (should match the interval of the readiness probe, so the orchestrator has seen the failure) until traffic stops being forwarded
  • wait another X seconds to be sure the IP address is removed from all tables
  • close the web server (finish any requests that are still in flight)
  • close database connections and other connections
  • shut down the app

Mandatory: https://learnk8s.io/graceful-shutdown
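
For illustration, the steps above could be sketched roughly like this. It is a minimal sketch, not Terminus or Nest code: gracefulShutdown, ShutdownOptions, and every option name are hypothetical, and the sleep function is injectable so the order can be checked without really waiting.

```typescript
// Hypothetical sketch of the shutdown order listed above (not a library API).
type AsyncStep = () => Promise<void>;

interface ShutdownOptions {
  markNotReady: () => void; // make the readiness endpoint start failing
  probeWaitMs: number; // time for the orchestrator to notice the failing probe
  propagationWaitMs: number; // extra time for the pod IP to leave routing tables
  closeServer: AsyncStep; // stop accepting connections, finish in-flight requests
  closeResources: AsyncStep[]; // database pools, caches, queues, ...
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip real waiting
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

async function gracefulShutdown(options: ShutdownOptions): Promise<string[]> {
  const sleep = options.sleep ?? defaultSleep;
  const log: string[] = [];

  // 1. Fail the readiness probe first, while the server keeps serving.
  options.markNotReady();
  log.push('not-ready');

  // 2. Give the orchestrator time to see the failing probe.
  await sleep(options.probeWaitMs);
  log.push('probe-wait');

  // 3. Give the network layer time to drop the pod IP.
  await sleep(options.propagationWaitMs);
  log.push('propagation-wait');

  // 4. Only now stop the web server.
  await options.closeServer();
  log.push('server-closed');

  // 5. Close the remaining connections one by one.
  for (const close of options.closeResources) {
    await close();
  }
  log.push('resources-closed');

  return log; // the caller exits the process once this resolves
}
```

In a real app, markNotReady would flip whatever flag the readiness endpoint reads, closeServer would wrap the HTTP server's close call, and the function would be called from the SIGTERM handler before exiting.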

odd stump
#

So I tested locally, and it seems the app doesn't wait for any readiness interval when it receives a SIGTERM.

Is there an easy way to wait, let's say, 15 seconds after setting the readiness probe to fail, before closing the server?

rapid dirge
#
odd stump
#

Thanks!
Although, I was hoping for something a bit more production-ready, with bigger community support than a GitHub repo with 13 stars & 2 commits 😅 (I'm not against small projects, but as NestJS is huge in the JS community, I am a bit surprised that no well-established solution exists. Or is no one doing proper graceful shutdown/Kubernetes usage?)

rapid dirge
#

"I am a bit surprised that no well-established solution exists"
To be honest, proper shutdown isn't a problem people often ask about. I'd venture to say most people are just taking the hit of a small percentage of failed requests. Or not many are using k8s to run their apps. (I'd venture to say it is more the latter.)

And, in the world of OSS, you should simply be grateful that anything is available that you can either use or "borrow" from and that gets you one or more steps closer to solving your problem.

odd stump
#

100% true about the OSS part, I am just surprised it is not a more popular problem

rapid dirge
#

I think this can be chalked up to the fact that, if this problem is being faced, it is usually within an enterprise system, and they'll more than likely not put their solution up in the OSS space (unfortunately).

odd stump
#

I was wondering with some colleagues how people can be running at scale in prod without proper graceful shutdown, and we came to the same conclusion 😅: people might just build it in-house and never share it.

Btw, coming back to the package: it doesn't seem to handle this exact use case. It is more for shutting down long-lived connections on SIGTERM, am I correct?
I was looking more for something that would wait X seconds before triggering the shutdown. Maybe I should open a PR on nestjs-terminus... 🤔

rapid dirge
#

nestjs-graceful-shutdown ensures graceful communication with clients currently receiving responses from your server during the shutdown process.
gracefulShutdownTimeout seems to be the wait time you are looking for.

/**
 * The duration in milliseconds before forcefully
 * terminating a connection.
 * Defaults: 5000 (5 seconds).
 */
#

From http-terminator:

The main benefit of http-terminator is that:

  • it does not monkey-patch Node.js API
  • it immediately destroys all sockets without an attached HTTP request
  • it allows graceful timeout to sockets with ongoing HTTP requests
  • it properly handles HTTPS connections
  • it informs connections using keep-alive that server is shutting down by setting a connection: close header
  • it does not terminate the Node.js process
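
As a rough illustration of the same idea with Node's built-in server methods: this is not http-terminator's code, drainServer and DrainableServer are hypothetical names, and closeIdleConnections/closeAllConnections assume Node 18.2 or later. Unlike http-terminator, this sketch does not set a connection: close header on keep-alive responses.

```typescript
// Structural subset of Node's http.Server, so the sketch needs no imports.
interface DrainableServer {
  close(callback?: (err?: Error) => void): unknown;
  closeIdleConnections(): void;
  closeAllConnections(): void;
}

// Hypothetical helper: stop the server, letting in-flight requests finish
// for at most gracefulTimeoutMs before the remaining sockets are dropped.
function drainServer(server: DrainableServer, gracefulTimeoutMs: number): Promise<void> {
  return new Promise<void>((resolve) => {
    // Last resort: drop whatever is still open once the grace period is over.
    const forceTimer = setTimeout(() => server.closeAllConnections(), gracefulTimeoutMs);

    // Stop accepting new connections; the callback runs once all connections have ended.
    server.close(() => {
      clearTimeout(forceTimer);
      resolve();
    });

    // Keep-alive sockets with no request in flight can be closed immediately.
    server.closeIdleConnections();
  });
}
```

A real http.Server instance should satisfy DrainableServer as-is, so it could be passed in directly, for example drainServer(server, 5000).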

odd stump
#

Ah yes, I didn't notice nestjs-graceful-shutdown was using http-terminator under the hood
I thought nestjs-terminus was using terminator 😅 But it turns out they removed it from the package
Confusing, confusing 😄

#

We do wrap nestjs-terminus in our own module to share it across a monorepo
I will just add a beforeApplicationShutdown hook with a sleep, I guess; it should be enough 🙂

odd stump
#

Basically, a service like this seems to do the trick with nestjs-terminus:

import { BeforeApplicationShutdown, Injectable } from '@nestjs/common';

@Injectable()
export class GracefulShutdownService implements BeforeApplicationShutdown {
  // Nest awaits this hook before it closes existing connections.
  // eslint-disable-next-line class-methods-use-this
  async beforeApplicationShutdown(signal?: string): Promise<void> {
    if (signal === 'SIGTERM') {
      console.log('SIGTERM received');
    }
    console.log('start sleep 20s');
    // Keep serving requests while the orchestrator stops routing traffic to this pod.
    await new Promise<void>((resolve) => {
      setTimeout(resolve, 20000);
    });
    console.log('stop sleep');
  }
}

odd stump
#

@rapid dirge Would it make sense for me to open a PR on the terminus repo to add this feature (the waiting time)?

rapid dirge
#

@odd stump - I'm not really the right person to ask about that. I'm just a moderator here in the Discord server for Nest. I'd say, if it isn't too much work and if it will also help others, then yeah, make the PR.

odd stump
#

Oops, I got misled by the NestJS badge ^^ But sure, will do. I don't think it is a lot of work.

iron sapphire
#

Hi @odd stump
Could you share an example of how to shut down gracefully when using nestjs-terminus in a Nest application that has connections to a MySQL database (TypeORM) and a Redis cache?

odd stump
#

Hi @iron sapphire, just use the module as the docs specify (https://docs.nestjs.com/recipes/terminus#graceful-shutdown-timeout):

TerminusModule.forRoot({
  gracefulShutdownTimeoutMs: 1000, // <-- change here
}),

Set gracefulShutdownTimeoutMs higher than your readiness probe interval.
Then it should close the web server and close all connections.
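
To make "higher than your readiness probe" concrete, here is a small sketch of how the value could be derived from the probe settings. minShutdownDelayMs is a hypothetical helper, the 5-second buffer is an arbitrary assumption, and periodSeconds and failureThreshold are the Kubernetes readiness probe fields of the same name.

```typescript
// Field names mirror the Kubernetes readiness probe spec.
interface ReadinessProbe {
  periodSeconds: number; // how often the kubelet runs the probe
  failureThreshold: number; // consecutive failures before the pod is marked unready
}

// Hypothetical helper: a lower bound for the wait before closing the server.
function minShutdownDelayMs(probe: ReadinessProbe, bufferSeconds = 5): number {
  // Worst case: the kubelet needs failureThreshold failed probes, one per period.
  const detectionSeconds = probe.periodSeconds * probe.failureThreshold;
  // bufferSeconds is an assumed safety margin for the change to propagate.
  return (detectionSeconds + bufferSeconds) * 1000;
}

// A probe every 5 s with failureThreshold 2 gives (5 * 2 + 5) * 1000 = 15000 ms.
const delayMs = minShutdownDelayMs({ periodSeconds: 5, failureThreshold: 2 });
```

The result could then be passed as gracefulShutdownTimeoutMs. The pod's terminationGracePeriodSeconds (30 seconds by default) should stay larger than this delay plus the time needed to close everything, otherwise the process is killed before it finishes.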

iron sapphire
#

Thanks for sharing, @odd stump
Do you have any examples of how to close the Redis and MySQL (TypeORM) connections?

odd stump
#

It should be handled by the package, I believe.