Hi. I deployed a Serverless app. Some workers are experiencing the error shown in the images, but not all of them. I deployed 5 workers, and 1 or 2 of them were affected, even though all the workers have the same deployment.
#Some workers failed with GPU Acceleration
I've got the same issue. Could someone check this, please?
@abstract temple
Escalated To Zendesk
The thread has been escalated to Zendesk!
One of the errors is OOM, meaning you're out of VRAM. If you want it to run smoothly, choose another GPU with more VRAM.
Yeah, I see. But I don't think it's OOM from my model. The model is lightweight, just a small TTS model with about 0.3B parameters that takes around 3 GB of VRAM.
No concurrency at runtime.
@paper gale
@abstract temple
Ticket ID: #30408
It's more likely that either your code is doing something wrong or RunPod is doing something wrong before handing the worker to you.
I'm sure my code is correct, because it works well on my local machine and on another platform.
And not all workers have the same issue. The others work well.
@paper gale any updates? I've been getting this error very often recently.
You haven't opened the ticket BTW, you need to press that button above to open it
It seems to happen inconsistently, on some workers only. But what if you used a GPU with much more VRAM for safety, like an H100 or a 48 GB card?
Which one are you using right now?
And which TTS model is this?
You can see my screenshot above. My model is F5-TTS running on an RTX 4090, which has 24 GB of VRAM.
It's a really lightweight TTS model. I can run it on a card with 6 GB of VRAM as well.
You can also see here.
I didn't turn on the concurrency handler, which means only one process should be running at a time. So why does it show that the worker has 3 processes here?
Btw, even if all the processes were created by my code, their total memory is less than 6 GB. That's a small amount compared with the total VRAM of the RTX 4090.
Do you have some kind of batching for that model inference code? (not runpod request batching)
And I actually can't see the 3 processes. Did you mean the 3 requests that the worker took, or is it on a different screenshot?
Maybe watch your nvidia-smi output while running inference. Use the web terminal and keep an active worker for debugging.
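For reference, the nvidia-smi suggestion above could be scripted roughly like this, so the per-process GPU memory can be logged from inside the worker instead of watched by hand. This is only a sketch: the helper names `parse_compute_apps` and `gpu_processes` are made up for illustration, and inside some containers nvidia-smi may report `[N/A]` or nothing at all for per-process memory, which the parser tolerates.

```python
import subprocess

# nvidia-smi query for the compute processes currently using the GPU.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV rows into dicts; memory is in MiB, or None if unavailable."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        used = parts[-1]
        rows.append({
            "pid": int(parts[0]),
            "name": ",".join(parts[1:-1]),
            "used_mib": int(used) if used.isdigit() else None,
        })
    return rows


def gpu_processes():
    """Run nvidia-smi once and return the parsed process list."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

Calling `gpu_processes()` right before and right after an inference request would show whether more than one process is really holding VRAM on that worker.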
I mean the error message says that the worker has 3 running processes (the current process, 180, and 181, which look like they were spawned from my main process).
Anyway could you take a look at the GPU memory?
I couldn't. As I said, not all workers have the same error, and 90% of requests complete fine, so I don't know in advance which worker will fail and can't debug it.
And once the error happens, all later requests to the same worker fail too.
Ah, I found that 5 to 15 minutes after the error, the worker is deleted by the RunPod serverless system.
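Since the failing worker can't be predicted and disappears a few minutes later, one option is to have every worker log its own diagnostics at the moment a request fails. Below is a minimal sketch, not RunPod's API: `with_diagnostics` and the `snapshot` callback are hypothetical names, the handler is assumed to be a plain function that takes a job dict, and reading the worker ID from a `RUNPOD_POD_ID` environment variable is an assumption (it falls back to the hostname).

```python
import os
import socket
import traceback


def with_diagnostics(handler, snapshot=None):
    """Wrap a job handler so a failure logs which worker failed and its GPU state."""
    def wrapped(job):
        try:
            return handler(job)
        except Exception as exc:
            info = {
                # Assumed env var; the hostname is used when it is not set.
                "worker": os.environ.get("RUNPOD_POD_ID", socket.gethostname()),
                "pid": os.getpid(),
                "error": repr(exc),
                "traceback": traceback.format_exc(),
            }
            if snapshot is not None:
                try:
                    info["gpu"] = snapshot()  # e.g. a function that runs nvidia-smi
                except Exception as snap_exc:
                    info["gpu"] = f"snapshot failed: {snap_exc!r}"
            print(info, flush=True)  # ends up in the worker logs
            raise  # re-raise so the request is still reported as failed
    return wrapped
```

The wrapped function would then be registered in place of the original handler, so the logs of the broken worker contain its ID and GPU state even after the worker has been deleted.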
Ohh right
Hmm, I see. But if you retry the same input, can it succeed?
I think you really have to ask in the support ticket so they can check what's going on. I'm not sure either, and the best I can do right now is guess, because I lack context.
Btw, have you checked whether your code might start another process through Python's multiprocessing (mp) library?
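One way to check the multiprocessing question is to list every process descended from the main handler process and see what they are. This is only a sketch under assumptions: it relies on a `ps` binary being available in the container, and the function names are made up for illustration.

```python
import os
import subprocess


def parse_ps(ps_text):
    """Parse `ps -eo pid=,ppid=,comm=` output into (pid, ppid, command) tuples."""
    procs = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return procs


def descendants(procs, root_pid):
    """Return (pid, command) for every process below root_pid in the process tree."""
    by_parent = {}
    for pid, ppid, comm in procs:
        by_parent.setdefault(ppid, []).append((pid, comm))
    found, stack, seen = [], [root_pid], {root_pid}
    while stack:
        for pid, comm in by_parent.get(stack.pop(), []):
            if pid in seen:
                continue  # guard against odd self-parented entries
            seen.add(pid)
            found.append((pid, comm))
            stack.append(pid)
    return found


def my_descendants():
    """List the processes spawned, directly or indirectly, by the current process."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return descendants(parse_ps(out), os.getpid())
```

If `my_descendants()` returns entries while a request is running, something in the code or one of its libraries is starting extra processes, which would be consistent with the extra PIDs in the error message.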
If you can share your repo, maybe I can take a look later (DM me if you want).
Or re-ping me after you send the code.