Deployment problem | Learn AI Together

hybrid breach Feb 21, 2025, 11:18 AM

I am deploying a multimodal model, and a question came up during deployment.
My question is as follows.
Assume three requests arrive concurrently. Each comes from a different user and needs the LLM to generate something. I have deployed the LLM locally. Do I need to create three model instances, one per request? What is the general solution? The only approach I can think of is a pool with a fixed number of LLM instances, where the system picks an instance when a request comes in. However, my GPU is an A40 with only 48 GB of memory, so it would be hard to build a pool holding ten instances of a model like Janus-7B. Are there other ways to handle this?

latent garden Feb 24, 2025, 3:23 AM

You could try batching (if supported), queueing, or both: push requests onto a queue, process them in batches, then split the results and respond to each request.
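
A rough sketch of the "queue, then batch, then split" idea, in case it helps. It keeps a single model in memory and lets one worker collect waiting requests into a batch. `generate_batch` is a made-up placeholder for whatever batched call your model supports, and `MAX_BATCH` and `MAX_WAIT_S` are assumed values you would tune yourself. It needs Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio

MAX_BATCH = 8       # assumed cap on prompts per model call
MAX_WAIT_S = 0.02   # assumed time to wait for more requests to arrive


def generate_batch(prompts):
    # Hypothetical placeholder for one batched call into the single loaded model.
    return [f"reply to: {p}" for p in prompts]


async def batch_worker(queue):
    loop = asyncio.get_running_loop()
    while True:
        # Block until at least one request is waiting.
        batch = [await queue.get()]
        deadline = loop.time() + MAX_WAIT_S
        # Collect more requests until the batch is full or the wait expires.
        while len(batch) < MAX_BATCH:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        prompts = [prompt for prompt, _ in batch]
        # Run the blocking model call in a thread so the event loop stays free.
        outputs = await asyncio.to_thread(generate_batch, prompts)
        # Split the batch output and answer each caller separately.
        for (_, future), output in zip(batch, outputs):
            future.set_result(output)


async def handle_request(queue, prompt):
    # Called once per user request: enqueue the prompt and wait for its result.
    future = asyncio.get_running_loop().create_future()
    await queue.put((prompt, future))
    return await future


async def demo():
    queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue))
    # Three concurrent requests, as in the question.
    results = await asyncio.gather(
        *(handle_request(queue, p) for p in ["a", "b", "c"])
    )
    worker.cancel()
    return results
```

Running `asyncio.run(demo())` sends the three requests through one worker. Whether batching actually helps depends on whether your model and serving code can take several prompts in a single call.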

tawny basalt Mar 5, 2025, 7:26 PM

hybrid breach I am working with deploying a multi-modality model. I suddenly have a question a...

Has bro not thought of queues?
I mean an LLM doesn't take much time to generate if you have an A40 GPU.

hybrid breach Mar 7, 2025, 2:33 AM

tawny basalt Has bro not thought of queues? I mean an LLM doesn't take much time to generate ...

No... I don't have any engineering experience. Thanks for your reply. It's very useful.

hybrid breach Mar 7, 2025, 2:34 AM

latent garden Could try batching (if supported) or queueing, (or both, push onto a queue and b...

That's a great idea. I know about queues but never considered one as a solution. Thanks for your reply. It's useful!

tawny basalt Mar 7, 2025, 8:28 PM

hybrid breach No... I dont have any engineer experience before. Thanks for your reply. It's ve...

Programming is mostly like engineering.
But a queue is simple to implement.
Np mate :)
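
For what it's worth, here is a minimal sketch of a plain queue with no batching: one model instance, one worker thread, requests served in arrival order. `generate` is a hypothetical stand-in for the real model call, and `SingleModelServer` is just a name picked for the example.

```python
import queue
import threading
from concurrent.futures import Future


def generate(prompt):
    # Hypothetical stand-in for a call into the one loaded model.
    return prompt.upper()


class SingleModelServer:
    """Serves every request through one worker, so only one model is needed."""

    def __init__(self):
        self._jobs = queue.Queue()
        # Daemon thread so it does not keep the process alive on exit.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, prompt):
        # Enqueue the request and hand back a Future the caller can wait on.
        future = Future()
        self._jobs.put((prompt, future))
        return future

    def _run(self):
        while True:
            prompt, future = self._jobs.get()
            try:
                future.set_result(generate(prompt))
            except Exception as exc:
                # Report the failure to the caller instead of killing the worker.
                future.set_exception(exc)
```

Each request handler would call `server.submit(prompt).result()` and block until its turn comes up. Requests wait in line rather than running in parallel, so latency grows with queue length.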
