Until last week I was able to use the following startup parameters without any issue:
--host 0.0.0.0 --port 8000 --model Qwen/Qwen3-Coder-Next-FP8 --dtype auto --gpu-memory-utilization 0.94 --api-key ###### --max-model-len 131072 --tensor-parallel-size 1 --enable-auto-tool-choice --tool-call-parser qwen3_xml
This was always my go-to setup for using the pods.
For the last few days, no matter what I do, once the model has loaded and is past the warm-up phase, it throws several errors and crashes, then tries to reload the shards.
I've tried this with 4 pods today and every single one of them failed. The AI bot thing tells me the hardware might have changed, but going by the pod settings it's still the same.
I use the RTX Pro 6000 with 96GB VRAM for $1.89/hr, with 100GB storage.
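In case it helps with diagnosis, here is a rough sketch of the memory budget those settings imply. It assumes the full 96GB is actually visible to the process and that --gpu-memory-utilization is simply a fraction of total VRAM the server may claim; the function name is made up for illustration, and I have not measured any of this on the pod.

```python
# Rough memory budget implied by the startup parameters above.
# Assumption: the pod exposes the full 96 GB of VRAM to the process.
TOTAL_VRAM_GB = 96.0
GPU_MEMORY_UTILIZATION = 0.94  # value passed to --gpu-memory-utilization


def memory_budget_gb(total_gb: float, utilization: float) -> tuple[float, float]:
    """Return (budget, headroom) in GB for a given utilization fraction.

    Hypothetical helper: budget is what the server may claim for weights,
    activations, and KV cache; headroom is what is left for everything else.
    """
    budget = total_gb * utilization
    headroom = total_gb - budget
    return budget, headroom


budget, headroom = memory_budget_gb(TOTAL_VRAM_GB, GPU_MEMORY_UTILIZATION)
print(f"budget: {budget:.2f} GB, headroom: {headroom:.2f} GB")
```

If the pods now report less usable VRAM than before, the same 0.94 fraction would leave a smaller budget, which might be worth checking against what the startup log reports.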
Any help on this? :/ I've already wasted $2-3 just trying to get it to start up.