Support Ticket responses sound AI-generated and are non-specific | Runpod | Page 1

spiral karma Oct 8, 2025, 7:32 PM

#

Every support ticket response I've gotten sounds like it was written by AI, and in the worst way.

One example is a recent response I got to an ongoing problem, where the agent said, "This is likely because you have only one specific GPU type selected for this endpoint," which is absolutely false. I sent him my endpoint ID. He could have easily looked and seen that I have many possible GPUs selected.

The "likely because" feels like something written by AI, and I feel like the support agents are just sending out rapid AI responses, instead of looking at the specific endpoint to diagnose the problem.

This has resulted in almost every support ticket I've opened going back and forth needlessly, longer time-to-resolution, and much more frustration.

silk spear Oct 8, 2025, 7:47 PM

#

Do you know your ticket id?

spiral karma Oct 8, 2025, 7:50 PM

#

Yes, it is 24807

silk spear Oct 8, 2025, 7:52 PM

#

Hmm, you have 1 outdated worker and 4 throttled.

#

I would delete the outdated one, as it could prevent requests from getting picked up.

spiral karma Oct 8, 2025, 7:53 PM

#

I thought about that, but I was scared that it would make my endpoint stop working b/c the only non-throttled worker was deleted

#

I also don't understand why the other workers are throttled since I have a variety of GPUs available

silk spear Oct 8, 2025, 7:54 PM

#

Throttling is a bit of a dynamic thing.

#

It's all based on supply and demand.

spiral karma Oct 8, 2025, 7:54 PM

#

Actually - even though I have 1 active worker, there are now 63 jobs in queue...and no worker is running

#

Oh wait - the outdated one just started and is processing them

#

So, since it's throttled - does that mean that ALL of the GPUs I have selected are not available? That is a bit scary

silk spear Oct 8, 2025, 7:56 PM

#

To be honest, I would not worry; just delete the outdated worker so requests can go to the new workers.

spiral karma Oct 8, 2025, 7:57 PM

#

If H200s are throttled - why would it not use H100s or other GPUs? Why would it just show throttled?

#

Was that outdated worker never going away just a bug in RunPod?

#

Obviously I wouldn't expect to have to log in and manually terminate workers.

silk spear Oct 8, 2025, 8:00 PM

#

It did not go away because you had a lot of requests.

#

I suspect you used the new release function.

spiral karma Oct 8, 2025, 8:01 PM

#

Well - if you look at the original ticket, no workers were pulling requests at all...no worker was running. So, I was specifically trying to trigger a new release just to try and get the system unstuck.

There were really two issues:

My initial issue is that I had a bunch of idle workers and none of them were running jobs.

#

Then, to try and get it "unstuck," I made changes to my endpoint configuration to try and "force" a release. After that worked, I ended up with a bunch of throttled workers, despite the fact that the way I triggered the release was by adding more GPU types.

#

The only weird thing I noticed for issue (1) was that I had a SIGTERM in the logs...so I thought maybe a worker crashed and that caused the system to get stuck.

silk spear Oct 8, 2025, 8:03 PM

#

It's because the service will prevent sending tickets until everything is ready.

#

As for the SIGTERM, it does not look like it came from us.

spiral karma Oct 8, 2025, 8:04 PM

#

So...any advice on how to prevent this in the future?

silk spear Oct 8, 2025, 8:05 PM

#

https://docs.runpod.io/tutorials/sdks/python/101/error
Try to avoid a new release while there are a lot of requests; sometimes it is better to raise max workers if you plan to do so.
You can kill problematic workers with runpodctl.

spiral karma Oct 8, 2025, 8:06 PM

#

Ok - right now I'm just using your standard vLLM template - not a custom serverless function

#

(although we are building one of those for another service)

#

One last question: why won't my endpoint let me add A100 as an option? Every time I click those checkboxes under "Advanced," save, and then reopen the edit dialog, they are unchecked again.

silk spear Oct 8, 2025, 8:08 PM

#

Oh, another tip: remove CUDA versions under 12.4 from the filters, and also uncheck CUDA 12.9.

spiral karma Oct 8, 2025, 8:09 PM

#

oh - so maybe the A100 doesn't support some of those versions?

#

Or some of those versions don't support A100

silk spear Oct 8, 2025, 8:09 PM

#

Nah, all machines run 12.4 at minimum.

spiral karma Oct 8, 2025, 8:10 PM

#

Well, after making the CUDA change, it now lets me save A100

silk spear Oct 8, 2025, 8:10 PM

#

Hmm, I think with A100s it might be a bug.

spiral karma Oct 8, 2025, 8:13 PM

#

While I have you - sorry one other question.

One other "problem" I see happen daily is that if we are at 0 workers and jobs start to come in, the worker starts running. But because we are using a 70B model, it takes a while to start up.

While it's starting up, we hit the Queue Delay threshold - and more workers start up...so now we have 2, 3, 4 workers. Of course - we only really needed 1 worker, but the scaling up happens faster than the startup time of the worker.

I don't really want to increase the Queue Delay, because under a scaling scenario I would want them to go up. I guess what I really want is for the clock to start ticking once we have at least one worker running.

I've run into this with other autoscaling services before, where the scaling criteria from 0-to-1 is often different from 1-to-2. Although I guess this same thing could happen when going from 1 worker to 2...where it adds a 3rd or 4th prior to the 2nd one starting to take jobs.

Any advice? Thanks!

#

Other autoscaling services have a "minimum time between scaling events" setting or something like that to help with this too
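
To make the overshoot concrete, here is a rough toy simulation of the behavior described above. It is only a sketch under stated assumptions, not how Runpod's scaler is actually implemented: the function name `simulate`, the `cooldown_s` parameter (standing in for a "minimum time between scaling events" setting), and all of the numbers are hypothetical.

```python
def simulate(startup_s, queue_delay_s, max_workers, cooldown_s=0, tick_s=1):
    """Count how many workers get started before the first one is ready.

    Toy model (assumption, not Runpod's real logic): a job is queued at t=0,
    nothing is processed until the first worker finishes starting, and on every
    tick the scaler adds a worker if the oldest job has waited longer than
    queue_delay_s, the worker cap is not reached, and the cooldown has elapsed.
    """
    started = []  # start times of workers, in seconds
    t = 0
    while True:
        # Stop as soon as the first worker has finished its cold start.
        if started and t >= started[0] + startup_s:
            break
        oldest_wait = t  # the oldest job arrived at t=0
        cooled_down = not started or (t - started[-1]) >= cooldown_s
        if not started:
            # The first queued job triggers the first worker right away.
            started.append(t)
        elif oldest_wait > queue_delay_s and len(started) < max_workers and cooled_down:
            started.append(t)
        t += tick_s
    return len(started)


# Hypothetical numbers: 180 s cold start, 4 s queue delay, cap of 4 workers.
no_cooldown = simulate(startup_s=180, queue_delay_s=4, max_workers=4)
long_cooldown = simulate(startup_s=180, queue_delay_s=4, max_workers=4, cooldown_s=200)
short_cooldown = simulate(startup_s=180, queue_delay_s=4, max_workers=4, cooldown_s=60)
print(no_cooldown, long_cooldown, short_cooldown)
```

In this toy model, with no cooldown the scaler hits the worker cap within seconds of the queue delay expiring, while a cooldown at least as long as the cold start keeps it at a single worker until that worker is ready. A cooldown shorter than the cold start only slows the overshoot down.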

silk spear Oct 8, 2025, 8:15 PM

#

OK, so I checked: my old vLLM endpoint can't enable A100, but on a new one it works.

spiral karma Oct 8, 2025, 8:15 PM

#

weird - any idea why?

silk spear Oct 8, 2025, 8:15 PM

#

you might want to play around here:

#

Are you using network storage, or is the model baked into the Docker image?

spiral karma Oct 8, 2025, 8:16 PM

#

We are passing a Hugging Face model to the vLLM image.

#

Using this container image: registry.runpod.net/runpod-workers-worker-vllm-main-dockerfile:2becd3534

silk spear Oct 8, 2025, 8:17 PM

#

So you are using the new model cache?

spiral karma Oct 8, 2025, 8:19 PM

#

I think so (not sure how to verify, as another engineer set it up initially) - and we have FlashBoot enabled

#

We are using the "Model (optional)" setting to set the model - which I think enables the "cached model" feature

silk spear Oct 8, 2025, 8:20 PM

#

yup it is

spiral karma Oct 8, 2025, 8:56 PM

#

Yea - it's happening right now. Despite having active workers set to 1....I have 90 jobs in queue...0 jobs in progress and 4 running workers. The first worker took over 3 minutes before it processed the first job
