# CUDA driver initialization failed


fossil wraith

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

(Serverless RTX4090)
(FROM runpod/base:0.6.2-cuda11.8.0)


At first, I was getting the error below at random times, on random workers:

CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.64 GiB total capacity; 22.98 GiB already allocated; 4.81 MiB free; 23.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
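
The OOM message itself points at PYTORCH_CUDA_ALLOC_CONF and max_split_size_mb. Below is a minimal sketch, assuming the worker is a Python process using PyTorch. It sets the variable before torch is imported (the allocator reads it when CUDA is first initialized) and checks the message's "reserved >> allocated" condition against the reported figures. The 128 MiB split size and the 1.25 ratio are hypothetical illustration values, not settings from this thread. With the numbers above, reserved exceeds allocated by only about 0.12 GiB, which suggests fragmentation is not the main problem and the job is simply using nearly all 23.64 GiB.

```python
import os

# Configure PyTorch's CUDA caching allocator, as the OOM message suggests.
# This must run before PyTorch initializes CUDA (safest: before `import torch`).
# 128 MiB is an illustrative value, not a tuned recommendation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")


def reserved_minus_allocated_gib(allocated_gib: float, reserved_gib: float) -> float:
    """Memory PyTorch has reserved (cached) but not handed out to tensors."""
    return reserved_gib - allocated_gib


def looks_fragmented(allocated_gib: float, reserved_gib: float, ratio: float = 1.25) -> bool:
    """Hypothetical heuristic for the message's 'reserved >> allocated' condition."""
    return reserved_gib > ratio * allocated_gib


# Figures copied from the OOM message above.
slack = reserved_minus_allocated_gib(allocated_gib=22.98, reserved_gib=23.10)
print(f"reserved - allocated = {slack:.2f} GiB")
print("fragmentation likely:", looks_fragmented(22.98, 23.10))
```

If the slack were several GiB instead of a fraction of one, lowering max_split_size_mb would be the more promising fix; here, reducing per-job memory (smaller batches or resolution) or a larger GPU fits the numbers better.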

Then I added {"refresh_worker": True}, and the CUDA driver initialization error above started occurring at random times, on random workers, instead. It replaced the out-of-memory errors.
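
For context, here is a minimal sketch of where refresh_worker usually goes, assuming a standard RunPod Python handler. As I understand RunPod's serverless docs, the flag is returned from the handler together with the job output under a "job_results" key; verify this against the current docs. `run_inference` is a hypothetical placeholder for the actual model code. With the flag set, RunPod restarts the worker after the job completes, so GPU memory held by one job is not carried over into the next.

```python
def run_inference(job_input: dict) -> dict:
    """Hypothetical placeholder for the real model call."""
    return {"echo": job_input}


def handler(job: dict) -> dict:
    # RunPod passes the request payload under job["input"].
    result = run_inference(job["input"])
    # Ask RunPod to restart this worker once the job finishes,
    # so the next job starts in a fresh process with a clean GPU context.
    return {"refresh_worker": True, "job_results": result}


def main() -> None:
    # Imported here so the handler can be tested without the SDK installed.
    import runpod  # RunPod Python SDK

    runpod.serverless.start({"handler": handler})
```

The container's entry point would call main(); keeping the SDK import inside it means handler() can be exercised locally without a GPU or the runpod package.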

distant quarry

It looks like you’re running into memory issues. Maybe try a more powerful GPU with more VRAM.

fossil wraith

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

Do you think this is also memory-related? @distant quarry