#Serverless endpoint spun up in the middle of the night

vast crescent
#

I use a serverless endpoint to run LLM inference for personal use. My endpoint spun up in the middle of the night and sat in "initializing" for hours, running up about $50 in charges. As far as I can tell, it's not receiving any requests (I'm the only user, so this makes sense); it's just stuck in some kind of error loop. There's nothing in the metrics except the worker state (the blue on the right shows the workers trying to initialize for about six hours). I've submitted a ticket, but it's pretty alarming that an endpoint can spin up in the middle of the night and charge you a bunch of money for no reason. Has anyone else had this happen? Hope I can get a refund.

vast crescent
#

I deleted and redeployed the endpoint, and I'm getting the same error loop that keeps the workers in "initializing." The logs show the same error as before:

[info] engine.py :170 2026-02-10 08:01:47,041 Error initializing vLLM engine: Qwen2Tokenizer has no attribute all_special_tokens_extended\n

I'm using the Runpod-provided vLLM image; it seems like maybe they updated the vLLM version and it broke something. Sad trombone. Gonna delete the endpoint for now; if anyone has any ideas, let me know.
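
One way to narrow this down, assuming the error comes from a version mismatch between vLLM and the tokenizer library inside the image (a guess, not something the logs confirm), is to print the installed package versions and check whether the loaded tokenizer still exposes the attribute named in the error. This is only a rough sketch: the helper names `installed_version`, `has_tokenizer_attr`, and `check_model_tokenizer` are made up for illustration, and the model name is whatever your endpoint is configured to load.

```python
from importlib import metadata
from typing import Optional

# Attribute name taken verbatim from the error message in the worker log.
MISSING_ATTR = "all_special_tokens_extended"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_tokenizer_attr(tokenizer: object, attr: str = MISSING_ATTR) -> bool:
    """Report whether a tokenizer object exposes the attribute the engine asked for."""
    return hasattr(tokenizer, attr)


def check_model_tokenizer(model_name: str) -> bool:
    """Load the tokenizer for `model_name` and check it for the attribute.

    Needs the `transformers` package and access to the model files, so it is
    imported lazily and only makes sense inside the pod or image being tested.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return has_tokenizer_attr(tokenizer)


if __name__ == "__main__":
    # Print the versions that are actually installed in this environment.
    for pkg in ("vllm", "transformers", "tokenizers"):
        print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```

If `check_model_tokenizer` returns `False` for your model in the same environment, the tokenizer class there really does lack the attribute, which would point at the package versions in the image rather than at the endpoint configuration.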

hot edge
#

If you run vLLM in a pod (installing it yourself), does it work?

hot edge
#

I think the vLLM on Runpod is v0.11.0.

spiral cove