# How to resolve throttling on RunPod Serverless endpoints?

tame topaz

Hi,

I’m using a RunPod Serverless endpoint, and I occasionally see the endpoint being throttled during inference.

The workload is GPU-heavy (video / image generation), and throttling seems to happen when requests run for a bit longer or when CPU-side processing (e.g. ffmpeg / preprocessing) is involved.

I’d like to understand:
• What are the common causes of throttling on Serverless endpoints?
• What are the recommended ways to mitigate throttling (e.g. limiting concurrency, splitting CPU/GPU workloads, adjusting endpoint settings, or using a different deployment type)?

Any guidance or best practices would be appreciated. Thanks!
