# No CUDA GPU available after not using GPU for a while


round jay
#

Hi! I need some help with my GPU pod. It often reports that no CUDA GPU is available, seemingly out of nowhere, and the only fix is restarting the pod.

nvidia-smi output:
Failed to initialize NVML: Unknown Error

It's on Secure Cloud and On-Demand.
If anyone has faced a similar issue, please help.
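
For anyone hitting the same error, a small script like the one below can capture the pod's state before a restart wipes it. This is only a sketch: the `classify` and `check_gpu` helpers are made-up names for illustration, and it assumes `nvidia-smi` is on the PATH inside the pod. It tells you whether the NVML error is present, not why.

```python
import shutil
import subprocess

# Error text taken from the nvidia-smi output quoted above.
NVML_ERROR = "Failed to initialize NVML"


def classify(returncode, output):
    """Map an nvidia-smi exit code and its output to a short status string."""
    if NVML_ERROR in output:
        return "nvml-error"
    if returncode != 0:
        return "failed"
    return "ok"


def check_gpu():
    """Run `nvidia-smi -L` (list GPUs) and return (status, raw output)."""
    if shutil.which("nvidia-smi") is None:
        return "missing", "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return "failed", "nvidia-smi timed out"
    output = (proc.stdout + proc.stderr).strip()
    return classify(proc.returncode, output), output


if __name__ == "__main__":
    status, output = check_gpu()
    print(status)
    print(output)
```

Saving the printed status and output alongside the pod ID gives support something concrete to look at.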

rough ether
#

Same problem, bro (

next loom
#

Could you provide more info: what exactly is happening, which template you use, etc.?

round jay
#

Hi @next loom.

I am using the following template:

runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
On-Demand - Secure Cloud

Instance details:
1 x H100 80GB PCIe, 32 vCPU 188 GB RAM

#

I don't think I have 0 GPUs assigned. I haven't stopped my instance since I started it, and it was working fine at the start. Now it shows no CUDA GPUs available, and that is only fixed when I restart it. I have noticed it happens when I don't use the GPU for a couple of hours (while the instance is running the whole time and I am being charged for it).
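
To pin down when the GPU actually disappears, a polling loop along these lines could be left running in a background shell. It is a rough sketch, not a fix: `gpu_visible` and `watch` are hypothetical helper names, the 5-minute interval is an arbitrary choice, and it assumes `nvidia-smi -L` either exits nonzero or prints the NVML error once the GPU is gone.

```python
import subprocess
import time
from datetime import datetime, timezone


def gpu_visible():
    """Return True if `nvidia-smi -L` succeeds without the NVML error."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    output = proc.stdout + proc.stderr
    return proc.returncode == 0 and "Failed to initialize NVML" not in output


def watch(probe=gpu_visible, interval_s=300, max_checks=None, sleep=time.sleep):
    """Poll `probe` and record a (UTC timestamp, visible) pair on each change."""
    transitions = []
    last = None
    checks = 0
    while max_checks is None or checks < max_checks:
        visible = probe()
        if visible != last:
            stamp = datetime.now(timezone.utc).isoformat()
            transitions.append((stamp, visible))
            print(f"{stamp} gpu_visible={visible}", flush=True)
            last = visible
        checks += 1
        # Skip the final sleep when a check limit is set.
        if max_checks is None or checks < max_checks:
            sleep(interval_s)
    return transitions
```

Calling `watch()` with the defaults runs until interrupted and prints one line per state change, so the timestamp of the first `gpu_visible=False` line shows roughly how long the pod had been idle when the GPU went away.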

next loom
#

Do you mind sharing the pod ID?

round jay
#

dmri8voyci6eh2

next loom
#

Do you mind sending me the email connected to that RunPod account in a private message?