#NestJS microservices & Heavy jobs

sharp linden
#

I have the following services:
API
ProcessorService

They run in Docker, and the ProcessorService has 3 replicas.

The API starts a job, which can take a while. One of the ProcessorService replicas starts handling that job.

An ID is created. Now I want to get the progress from the ProcessorService. How would I be able to get the progress from the specific replica that's processing this job?

P.S. ProcessorService can scale horizontally at will, so I don't want to add 3 clients (configured with ClientsModule), because at some point there could be 5 instances during runtime.

Questions:

  • Can I possibly target a specific replica of the microservice?
  • What's the best event/message broker for this case?
drowsy sphinx
#

This question needs to be better written. A docker instance of what? RabbitMQ? Start a job on 1 pod? Of the three? What? RabbitMQ doesn't work with jobs. Please explain better.

sharp linden
#

@drowsy sphinx I have updated the main question, since you've not looked into my question in #hangout 🙃

Thanks for your time

drowsy sphinx
#

So, Docker alone doesn't do load balancing. You'd need some sort of load balancer in front of the processor service replicas. It also has no idea about scaling an application. So, not sure how you will get "scale at will".

There is no such thing as a microservice broker either. I think you mean a message broker for communication between the microservice and your gateway/controller. Seems like any type of messaging system will work.

Also, if you need a job queue, you'd probably be better off with something purpose-built for job queues, like Bull: https://docs.nestjs.com/techniques/queues

#

Going a bit further, microservices come with a ton of complexity, and running microservices should be done in an environment like Kubernetes, where load balancing and scaling are handled for you. Docker wasn't really made to be a runtime for applications in production, even though some people use it as such.

sharp linden
#
  1. It's running in Kubernetes. That's still Docker. And replicas are still part of ReplicaSets.
  2. Yes, the wording "message broker" is better.
  3. The service itself is already using a job queue, but the mentioned solution like Bull is not what I am trying to do here.
#

I've tried RabbitMQ, but that doesn't seem to be right. For example, this is the scenario:

  1. API: DO_JOB_X
  2. One of the services responds with "Hey I picked that up, here is the ID of said job"
  3. API requests: JOB_PROGRESS with data { ID: <the id it received>}

How would I respond to JOB_PROGRESS with the correct pod that's handling that request?

#

But RabbitMQ just randomly assigns it to one of the connected clients.
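
To make the problem concrete, here is a minimal sketch of that scenario. It is not RabbitMQ: `WorkQueue`, `makeReplica` and the job ID format are made up for illustration, and the queue simply hands each message to the next consumer in turn (RabbitMQ's default dispatch on a shared queue is round-robin rather than truly random, but the effect on this scenario is the same).

```typescript
// Hypothetical in-memory stand-in for a work queue with competing consumers.
// It only mimics "each message goes to exactly one consumer".
type Handler = (pattern: string, data: Record<string, string>) => string;

class WorkQueue {
  private consumers: Handler[] = [];
  private next = 0;

  subscribe(handler: Handler): void {
    this.consumers.push(handler);
  }

  // Deliver to one consumer, rotating through them (round-robin).
  send(pattern: string, data: Record<string, string> = {}): string {
    const handler = this.consumers[this.next % this.consumers.length];
    this.next += 1;
    return handler(pattern, data);
  }
}

// Each replica only knows about the jobs it started itself.
function makeReplica(name: string): Handler {
  const ownJobs = new Set<string>();
  let counter = 0;
  return (pattern, data) => {
    if (pattern === "DO_JOB_X") {
      const id = `${name}-job-${++counter}`;
      ownJobs.add(id);
      return id;
    }
    // JOB_PROGRESS: only the owning replica can give a useful answer.
    return ownJobs.has(data.id)
      ? `${name}: working on ${data.id}`
      : `${name}: unknown job ${data.id}`;
  };
}

const queue = new WorkQueue();
["replica-1", "replica-2", "replica-3"].forEach((n) => queue.subscribe(makeReplica(n)));

const jobId = queue.send("DO_JOB_X"); // lands on replica-1
const reply = queue.send("JOB_PROGRESS", { id: jobId }); // lands on replica-2
console.log(jobId, "->", reply);
```

The second message ends up on a replica that never saw the job, which is exactly the situation I'm trying to avoid.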

#

also, please don't go into the specifics of the job or whatever it's doing, that's not any part of the problem at hand here and is just distracting

drowsy sphinx
#

"That's still docker."
Um, no it isn't. Not unless you are using an old version of k8s (1.23 or older). And knowing that you are running things in k8s would have saved me from even pointing out the environment.

"How would I respond to JOB_PROGRESS with the correct pod that's handling that request?"

If you are looking for a request/response cycle for the status of the jobs, then you must have a way to collect the status of the jobs. Your ProcessorService could be sending out event messages to update another service with status information (per jobId). Your API would then query that "JobStatusService" (or whatever you want to call it) to get the latest status of a job.

Or, in your ProcessorService, you store status information with jobIds in some store central to the service (like Redis). When you poll the ProcessorService for status information, any one of the pods can call up the store for the current status of a job. It won't matter what pod it is running in.
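
A rough sketch of the central-store idea, under loud assumptions: `StatusStore`, `InMemoryStatusStore` and `ProcessorReplica` are hypothetical names, and a plain `Map` stands in for Redis so the snippet runs on its own. In a real setup the store would be an external service shared by all pods.

```typescript
// Shape of the status record each replica writes (illustrative only).
interface JobStatus {
  percent: number;
  updatedBy: string;
}

// Hypothetical interface for a shared key-value store.
interface StatusStore {
  set(jobId: string, status: JobStatus): void;
  get(jobId: string): JobStatus | undefined;
}

// Stand-in for Redis: one Map shared by every replica in this process.
class InMemoryStatusStore implements StatusStore {
  private data = new Map<string, JobStatus>();

  set(jobId: string, status: JobStatus): void {
    this.data.set(jobId, status);
  }

  get(jobId: string): JobStatus | undefined {
    return this.data.get(jobId);
  }
}

class ProcessorReplica {
  constructor(
    private readonly name: string,
    private readonly store: StatusStore,
  ) {}

  // Called by whichever replica is actually running the job.
  reportProgress(jobId: string, percent: number): void {
    this.store.set(jobId, { percent, updatedBy: this.name });
  }

  // Can be answered by any replica, because the state is not local.
  getProgress(jobId: string): JobStatus | undefined {
    return this.store.get(jobId);
  }
}

const sharedStore = new InMemoryStatusStore();
const podA = new ProcessorReplica("pod-a", sharedStore);
const podB = new ProcessorReplica("pod-b", sharedStore);

podA.reportProgress("job-42", 60); // pod-a runs the job
const progress = podB.getProgress("job-42"); // pod-b answers the status request
console.log(progress);
```

Only the small status record goes into the store, not the whole state of the job.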

#

In general, with microservices, you have to think a lot more in terms of "decoupling". Your services should not be reliant on a particular instance of another service. If you are doing that, you aren't doing microservices.

sharp linden
#

1.23 and older have been EOL for a long time.

#

The job in question is handling a Discord bot. Unfortunately, I can't have multiple Discord bots running in the same instance of NestJS (Necord limitation). So I've created a generic NestJS service whose instances are each fed their own token through environment variables etc.

#

But if Discord bot A is responsible for server A, I want the API to tell Discord bot A to do the thing for server A.

#

The number of replicas is finite. But Bot A can't access servers of Bot B.

#

Storing all bot states to a central place is also very heavy and overkill

drowsy sphinx
#

If your server instances have to be unique, then you can't create a single generic service and have them service all servers. You'll have to create three (or however many servers you need to service) services. One service for each server. At that point, you can have the load balancing you are asking for, but external to the cluster. In fact, at that point, you could just either use subdomains for each bot, or have a different path (if your ingress can handle path mapping, which most do) for each bot. You wouldn't need any kind of special load balancing. And, your services are still elastic/scalable, which I believe was the main goal, right?

sharp linden
#

Okay I get what you're trying to explain, I also get that you're looking for solutions here. But it simply does not work like you think it does. Let's just say any of your solutions aren't possible. And keep our focus on the main question, because it seems like the whole architecture is distracting you.
All I want to have is a decoupled way of communicating that supports both "Send event message to the first one that acknowledges" and "Send event to specific instance".

drowsy sphinx
#

@sharp linden - As I noted above, as soon as you say "specific instance" then it has to be a specific service. There is no way to have a set of replicas of a single service and point to one of them only. That isn't how k8s services work. Let me put it this way, as soon as a microservice/service has to do anything in a particular way, with a particular config or a particular data set or a particular output, it must then be its own service. At that point you can communicate practically any way you'd like with it, because it is a particular instance/service.

sharp linden
#

I am not sure how to make it any clearer to you that I am not looking for a solution to the infrastructure. It's pretty clear to me that we are not talking about one and the same thing, and I feel like explaining why said thing isn't what you think it is would be a waste of both my time and yours...

Can you please, for the sake of us both, just imagine that the service I am talking about, is one thing and does one thing the same way over each and every replica.

All I want to know is whether there is a solution where the API could send an event and get a response from a specific replica. The use case of this service or the infrastructure is not relevant.

drowsy sphinx
#

"where the API could send an event and get a response from a specific replica"

A specific replica of what? And I don't even really need to ask that question, because the context of a "replica" is already in the realm of k8s infrastructure. I'm not trying to give you a solution to the infrastructure, I'm trying to tell you how k8s works. You said, your services are in containers in k8s, right? And you are trying to solve your problem inside k8s, right?

I'll stop there. Unless you can agree that the environment your services are run in does matter, we won't be able to speak about the answer you need, or rather you won't get to an understanding of what the answer is.

sharp linden
#

It doesn't matter if I have it in k8s or I spin up the server three times with just node dist/main.js

#

They connect to RabbitMQ or any other broker the same way, etc.

drowsy sphinx
#

It does matter. In k8s, you have a service defining the connectivity to your pods. The service turns any number of replica pods into a unit. With VMs/servers, you have fixed IPs. In k8s with pods, you have no control over what happens past the service in terms of communication. With VMs/servers, you control how they are "spoken" to.

See the difference now?

The general difference is discoverability and routing. k8s has this object called a "service", so you don't have to wonder what IP address has been given to a certain pod, and thus pods can be killed and replaced at any point in time without downtime (assuming you have more than one pod running behind the service). That is one of the bigger advantages of k8s: no downtime and easy horizontal scalability. Yet this service system is also why you can't address a single pod for any kind of special information. Pods behind a single service in k8s work as a unit, all doing the same work. You could have many, many pods, and you'll never know what each one is doing in particular (unless you do as I noted above), because you can't address a single pod in any particular way.

So, are you running your service in k8s or not? What you can do depends on the answer.

sharp linden
#

You're wrong simply because I asked in the beginning what the best message/event broker would be. The k8s service is not the one connecting to that broker. It's the pod itself. Therefore it does not matter.

I wanna know if I can get a message broker that for example takes client_ids and I can target a client_id through that. Not through k8s or directly over IP. And at the same time give it a message that can be taken by any of the connected clients.

drowsy sphinx
#

"I wanna know if I can get a message broker that for example takes client_ids and I can target a client_id through that. Not through k8s or directly over IP. And at the same time give it a message that can be taken by any of the connected clients."

So, now I have to ask, what does "target a client_id" mean? You noted above you wanted to be able to query for a particular job status. Is that via the "client_id" and what you mean by targeting it?

If your answer is yes, then my answer is absolutely, yes.

Any messaging service can take and pass forward the message, "give me a status on the job for client_id X". And, you'd listen for a response from the service that runs jobs and stores the status or collects status updates from the jobs.

How you ultimately keep track of the job statuses is back to what I told you above. If you have a service that is scaled out, you MUST somehow store the status changes somewhere central to the replicas used, so any replica can answer the request message "give me the status of client_id X". If you only have a VM with the service, then you can store that in its memory as a cache. Just note that if that VM goes down, you lose the status conditions (which is why event sourcing is a thing). You could also have multiple VMs for high availability, and then you must have either the local cache plus a load balancer that can do sticky sessions, or a central storage mechanism for storing the statuses.

Or, you send out events from the service, which communicate "I've got client_id X and I'm at status Y" and another service listens to these status changes and stores them. That service would then reply to your request for "give me a status on the job for client_id X". This is the pub/sub pattern.
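
Roughly, that pattern could look like the sketch below. Everything here is an assumption for illustration: `EventBus` is a tiny in-process stand-in for a real broker, and `JobStatusService` and the event shape are made-up names, not anything from NestJS.

```typescript
// Event a processor replica emits whenever a job changes state (illustrative shape).
type StatusEvent = { jobId: string; status: string };
type Listener = (event: StatusEvent) => void;

// Minimal in-process publish/subscribe bus standing in for a message broker.
class EventBus {
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  publish(event: StatusEvent): void {
    for (const listener of this.listeners) listener(event);
  }
}

// Separate service that only listens to status events and remembers the latest one.
class JobStatusService {
  private latest = new Map<string, string>();

  constructor(bus: EventBus) {
    bus.subscribe((event) => {
      this.latest.set(event.jobId, event.status);
    });
  }

  getStatus(jobId: string): string {
    return this.latest.get(jobId) ?? "unknown";
  }
}

const bus = new EventBus();
const statusService = new JobStatusService(bus);

// Any processor replica publishes; the API never talks to that replica directly.
bus.publish({ jobId: "job-7", status: "started" });
bus.publish({ jobId: "job-7", status: "50%" });

const latestStatus = statusService.getStatus("job-7");
const missingStatus = statusService.getStatus("job-8");
console.log(latestStatus, missingStatus);
```

The API asks the status service, so it doesn't matter which replica is doing the work.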

You can also have a message broker like NATS, that allows for pushing messages to certain services. However, that doesn't negate the storage needs I've noted.
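
As a sketch of what that could look like: one shared subject that any instance may pick up, plus one subject per instance for targeted requests. `TinyBroker`, `startProcessor` and the subject names are hypothetical and only imitate the idea in memory. With a real broker, the shared subject would typically be a queue group in NATS or a shared queue in RabbitMQ, and the per-instance one a subject or queue that only that instance subscribes to.

```typescript
type MsgHandler = (data: string) => string;

// In-memory imitation of a broker: subscribers on the same subject share the
// work, so each request is answered by exactly one of them (round-robin).
class TinyBroker {
  private subscribers = new Map<string, MsgHandler[]>();
  private cursor = new Map<string, number>();

  subscribe(subject: string, handler: MsgHandler): void {
    const members = this.subscribers.get(subject) ?? [];
    members.push(handler);
    this.subscribers.set(subject, members);
  }

  request(subject: string, data: string): string {
    const members = this.subscribers.get(subject);
    if (!members || members.length === 0) {
      throw new Error(`no responders on ${subject}`);
    }
    const i = this.cursor.get(subject) ?? 0;
    this.cursor.set(subject, i + 1);
    return members[i % members.length](data);
  }
}

function startProcessor(broker: TinyBroker, instanceId: string): void {
  const jobs = new Map<string, number>(); // job name -> percent done

  // Shared subject: whichever instance gets the message owns the job
  // and tells the caller its own ID.
  broker.subscribe("jobs.start", (jobName) => {
    jobs.set(jobName, 0);
    return instanceId;
  });

  // Private subject: only this instance subscribes to it.
  broker.subscribe(`jobs.progress.${instanceId}`, (jobName) =>
    jobs.has(jobName)
      ? `${jobName} at ${jobs.get(jobName)}% on ${instanceId}`
      : "unknown job",
  );
}

const broker = new TinyBroker();
["p1", "p2", "p3"].forEach((id) => startProcessor(broker, id));

const ownerA = broker.request("jobs.start", "job-a"); // first instance in line
const ownerB = broker.request("jobs.start", "job-b"); // next instance in line
const answer = broker.request(`jobs.progress.${ownerB}`, "job-b");
console.log(ownerA, ownerB, answer);
```

Note that the progress still lives only in that one instance's memory here. If the instance is replaced, the status is gone, which is the storage concern again.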