I've been rotating through GPU tiers to test inference, and I've found that workers can hang while reloading onto a new GPU. I've tested many tiers, and the problem is worse when I select higher VRAM counts. For example, I was going to test 80GB, but the worker hung for about 10 hours while I was away.
I've noticed it's particularly bad if you push a change to the GitHub repo while also changing the VRAM count.
My workaround so far has been to delete the hung workers and let them reload.
Since I tagged this as a feature request, here's a specific request regarding this issue: are you able to come up with a solution? Perhaps a reload timer setting would do it (if a worker is stuck initializing for more than [time], reload it).
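
To make the request a bit more concrete, here's a rough sketch of the behavior I have in mind. This is only an illustration: `Worker`, `init_timeout_s`, and `reload_worker` are names I made up, not anything from the platform's actual API, and I'm assuming the platform tracks when a worker entered its current status.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical worker record; field names are assumptions for illustration.
    id: str
    status: str          # e.g. "initializing" or "running"
    status_since: float  # Unix timestamp (seconds) when it entered this status


def find_stuck_workers(workers, now, init_timeout_s):
    """Return workers that have been initializing longer than init_timeout_s."""
    return [
        w for w in workers
        if w.status == "initializing" and now - w.status_since > init_timeout_s
    ]


def reload_stuck_workers(workers, now, init_timeout_s, reload_worker):
    """Call reload_worker on each stuck worker and return the reloaded ids.

    reload_worker is a hypothetical callback standing in for whatever the
    platform does today when I delete a worker and it gets recreated.
    """
    stuck = find_stuck_workers(workers, now, init_timeout_s)
    for w in stuck:
        reload_worker(w)
    return [w.id for w in stuck]
```

The timeout would ideally be a per-endpoint setting, since higher-VRAM tiers may legitimately take longer to initialize than smaller ones.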