Hello there. On January 23rd, at approximately 5pm, my serverless endpoint was reachable here: https://kviznxeq34txwt.api.runpod.ai (i.e., it was receiving requests, spinning up workers accordingly, and sending responses once the workers were running).
Shortly thereafter, it suddenly stopped sending requests to workers. This morning, after forcing a worker to stay online permanently, my request did land on that worker and it spun up my services, but my health checks were not returning anything to my app server (i.e., I could see from within the RunPod user interface that all of my services had started up successfully, but the worker was unable to respond).
I believe this endpoint has become corrupted. I would have no problem spinning up a new endpoint in the same region so I can attach my network volume to it, but I worry I won't be able to have the GPUs I need allocated to it. Is there any way the RunPod team could either force a reset of these workers or allocate me the GPUs I would be forgoing if I terminated this endpoint? (I have terminated workers and done everything I can on my end to try to force a reset. The new workers are successfully pulling in new Docker images; they are just continually failing to respond to health checks.)