#Model load time affected when PODs are running on the same server


worldly trout
#

I was trying to debug the latency on my test PODs, and I found that PODs running on the same physical machine slow down heavily on IO access.

After profiling, I got these results.

Example:
Initial test on POD

  • running on a single POD, model load time for a 6 GB model is 2 sec
  • when I pulled 2 GPUs from the same server, model load time increased to 40 sec
    Even inference is affected. RAM leaking?

On Serverless:

  • Same GPU (4090) gets different inference and load times as well
  • load time is 30 sec on some machines and 4 sec on others
  • inference is non-uniform as well: 20 sec on some machines and 10 sec on others

All running the same Docker image and the same scripts with the same libraries.
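For anyone who wants to reproduce this comparison, here is a minimal timing sketch. It is not the profiling script used for the numbers above; `time_calls` and `summarize` are hypothetical helper names, and the callable you pass in stands for whatever your own model load or inference step is.

```python
import statistics
import time


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Summarize a list of durations so runs on different machines can be compared."""
    fastest = min(durations)
    slowest = max(durations)
    return {
        "min": fastest,
        "median": statistics.median(durations),
        "max": slowest,
        # Ratio of slowest to fastest run; 1.0 means perfectly uniform.
        "spread": slowest / fastest if fastest > 0 else float("inf"),
    }
```

Running `summarize(time_calls(load_model))` on each POD or worker, where `load_model` is your own loading function, gives numbers that can be posted side by side. Note that a repeated load on the same machine may be served from the OS page cache, so the first run is usually the one that reflects disk speed.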

Do we have any work in place to ensure uniformity across hardware?
Are we requiring servers to have a separate SSD / NVMe for each GPU, with a separate IO path for each?

I need some idea whether this is a persistent issue. I'm pretty sure the Mbps figures in the descriptors do not reflect reality at all.
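One way to check the advertised disk figures against reality is to time a plain sequential read of the model file. The sketch below is an assumption about how to measure it, not an official benchmark; `read_throughput_mb_s` is a hypothetical helper and the path you pass is your own model file.

```python
import time


def read_throughput_mb_s(path, chunk_size=16 * 1024 * 1024):
    """Read the file at `path` sequentially and return the observed MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total_bytes / (1024 * 1024) / max(elapsed, 1e-9)
```

Run it once right after the POD starts, before the file has been read, so the result reflects the disk and not cached memory. Running it at the same time on two PODs from the same server should show whether they share one IO path.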

EDIT: I'm using the US region now; on Global the problem is worse.

worldly trout
#

Do we have any answers here?

worldly trout
#

I was using Global before, where the problem was worse, and now GPUs in the same region are showing discrepancies as well. There is no uniformity in inference performance.

Maybe cap?

unique coral
#

or else you may have to ask this in a ticket

unique coral
#

i heard some pods have this problem

worldly trout
#

there is no support here?

#

This is extremely important to share on a board, so we can see whether the problem repeats

#

yeah the problem is on Serverless and PODs, I'm stress testing

#

and it's clear now it's a hardware issue

unique coral
#

there is actually, but they're not very active here because it's easier to report problems on their own platform

worldly trout
#

interesting, I'll post that, but leave this open so other users can see it. I'm already seeing a lot of complaints about the same thing, so it's getting hard to push to production.

Yes, software cap on docker host.

unique coral
#

alright