# Runpod Serverless really unreliable, delay time is sometimes way too high


sweet prism Mar 7, 2025, 3:05 AM

I'm using a 24 GB VRAM serverless endpoint, and it is far too unstable. About 90% of the time, the "QUEUE" stage takes a couple of seconds and then inference with the Omniparser v2 model takes 3-8 seconds.

This result is reproducible in Google Colab and on other GPUs. However, every once in a while Runpod takes more than 40 seconds, or even minutes, to process a request. This happens when a specific worker bugs out and multiple requests are routed through it. The worker fails for no apparent reason and takes several minutes to do a job that should take seconds. It only happens on some workers, and when the same worker is reused multiple times. It makes no sense, and Runpod charges you for multiple minutes of DELAY TIME. Sometimes the request never goes through at all: it stays "IN_PROGRESS", as seen in the image, for several minutes without finishing, while Runpod charges you for every second.

In any other environment, and even on Runpod, this process takes seconds; the "IN_PROGRESS" status is printed only 3-8 times. This makes the endpoint highly unstable and far too expensive for a model that does not even use half of the VRAM.

Stable endpoints (most of the time): delay 5-10 seconds, inference 3-8 seconds
Unstable endpoints: delay 1 minute to endless; inference unknown (sometimes it never finishes, other times we cannot tell whether the loading or the inference is slow)
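
One way to tell the delay phase apart from the inference phase is to read the timing fields from each finished job's status response. The sketch below assumes the status JSON exposes `delayTime` and `executionTime` in milliseconds (check this against the current Runpod serverless docs). The function name `classify_job` is hypothetical, and its default budgets are simply the "stable endpoint" numbers quoted above.

```python
def classify_job(status: dict, max_delay_s: float = 10.0, max_exec_s: float = 8.0) -> str:
    """Label a finished job by which phase exceeded its budget.

    Assumption: `status` carries `delayTime` and `executionTime` in milliseconds.
    Default budgets come from the stable-endpoint figures (delay 5-10 s, inference 3-8 s).
    """
    delay_s = status.get("delayTime", 0) / 1000
    exec_s = status.get("executionTime", 0) / 1000
    slow = []
    if delay_s > max_delay_s:
        slow.append("delay")  # queueing / cold start / model loading took too long
    if exec_s > max_exec_s:
        slow.append("execution")  # the handler itself ran too long
    return "+".join(slow) if slow else "ok"
```

Logging this label next to whatever worker information the response provides would show whether the slow phase is loading or inference, and whether it clusters on particular workers.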

The workers were tested separately; only some of them fail.
The requirements of the model are not even close to the capacity of the machine.
It's only some bugged workers, since the others have reproducible times, very similar to Google Colab if you discount the delay time.
I have multiple endpoints, and this seems to be recent behaviour. It is not desired at all: first, because it makes no sense given the hardware the serverless endpoint has, and second, because Runpod charges you for these errors.

Worst part: the endpoints sometimes run until timeout, charging a whole lot.
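
A client-side watchdog can limit how long a stuck job keeps billing: poll the job and cancel it once it has been queued or in progress for longer than expected. This is a minimal sketch, assuming you wire `get_status` to the endpoint's status route and `cancel` to its cancel route for the job (for example `/status/{job_id}` and `/cancel/{job_id}` on the serverless API; verify against the docs). The function name `wait_or_cancel` and the `CANCELLED_BY_CLIENT` marker are hypothetical. The clock and sleep functions are injectable so the logic can be tested without network calls.

```python
import time
from typing import Callable


def wait_or_cancel(
    get_status: Callable[[], str],
    cancel: Callable[[], None],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it leaves IN_QUEUE/IN_PROGRESS; cancel it after timeout_s seconds."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status not in ("IN_QUEUE", "IN_PROGRESS"):
            return status  # terminal status as reported by the endpoint
        if clock() >= deadline:
            cancel()  # stop a stuck worker instead of waiting for the endpoint timeout
            return "CANCELLED_BY_CLIENT"  # hypothetical marker for a client-side cancel
        sleep(poll_s)
```

With normal runs of a few seconds of delay plus 3-8 seconds of inference, a `timeout_s` of around 30 seconds would leave headroom while still cutting off the multi-minute hangs. Cancelling only stops further billing for that job; it does not fix the underlying worker.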

paper bison Jun 7, 2025, 10:36 PM

Could I ask whether you managed to run Omniparser successfully? I have seen a recommendation to use an L40S pod to run it. Is the pricing really as simple as 0.86 per hour in this case? Sorry for the noob questions; I'm trying to scope out a test for a project I'm working on.