#hanging after 500 concurrent requests

keen lodge
Hi, I loaded Llama 8B in serverless with 1 active A100 worker and 1 idle worker. I wanted to benchmark how many requests I can handle at the same time before going to production, but when I send 500 requests at once the server just hangs and I don't get any error. What could be the issue? How do I find out how much load 1 GPU can handle, and how do I optimize it for maximum concurrency?
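Instead of firing all 500 requests at once, one way to locate where a single worker saturates is to ramp concurrency in steps and put a timeout on every request, so a stall shows up as a counted failure rather than a silent hang. Below is a minimal sketch under these assumptions: you supply your own `send_request` coroutine that calls your endpoint, and the names `run_level`, `ramp`, and `send_request` are hypothetical (not part of any serverless SDK). The concurrency levels and timeout are placeholders to tune.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical: `send_request(i)` is any coroutine you write that sends one
# request to your endpoint and returns once the response has arrived.
SendFn = Callable[[int], Awaitable[object]]


async def run_level(send_request: SendFn, total: int, concurrency: int,
                    timeout_s: float) -> dict:
    """Send `total` requests with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []
    counts = {"timeouts": 0, "errors": 0}

    async def one(i: int) -> None:
        async with sem:
            start = time.perf_counter()
            try:
                # The per-request timeout turns a hang into a countable failure.
                await asyncio.wait_for(send_request(i), timeout=timeout_s)
                latencies.append(time.perf_counter() - start)
            except asyncio.TimeoutError:
                counts["timeouts"] += 1
            except Exception:
                counts["errors"] += 1

    await asyncio.gather(*(one(i) for i in range(total)))
    latencies.sort()
    p50 = latencies[len(latencies) // 2] if latencies else None
    return {"concurrency": concurrency, "ok": len(latencies),
            "timeouts": counts["timeouts"], "errors": counts["errors"],
            "p50_s": p50}


async def ramp(send_request: SendFn, levels: list[int], per_level: int,
               timeout_s: float) -> list[dict]:
    """Step concurrency up and stop at the first level with any failure."""
    results = []
    for level in levels:
        r = await run_level(send_request, per_level, level, timeout_s)
        results.append(r)
        if r["timeouts"] or r["errors"]:
            break
    return results
```

For example, `asyncio.run(ramp(send_request, [8, 16, 32, 64, 128], per_level=64, timeout_s=120))` reports successes, timeouts, errors, and median latency per level; the last clean level is only a rough ceiling for one worker at that prompt and output length.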

empty phoenix
@glass pawn any idea?

haughty cobalt
Hangs? What does it look like?

No output, just running?

glass pawn
Yeah, could you please expand on how it hangs?