One serverless worker on an endpoint fails to initialize CUDA under PyTorch 2.11.0+cu128 on an RTX 5090, while a second worker in the same endpoint runs the identical workload successfully.
The only observed difference between the two workers is the NVIDIA driver build:
- Failing worker: 580.126.09
- Working worker: 580.126.20
On the failing worker, the pod accepts jobs but fails repeatedly during startup because CUDA initialization never succeeds. As a result, affected requests use up their retry budget without ever completing.
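One way to stop a worker in this state from accepting jobs is a fail-fast check that runs before the handler starts. This is a minimal sketch. It assumes that exiting the process with a non-zero status is better than looping. This report does not establish whether the serverless platform would then replace the worker or restart it on the same host. `cuda_init_error()` and `fail_fast_on_cuda_error()` are hypothetical names. `torch.cuda.init()`, `torch.cuda.get_device_capability()` and `torch.cuda.get_device_name()` are existing PyTorch calls.

```python
import sys
from typing import Callable, Optional


def cuda_init_error(init_fn: Callable[[], None]) -> Optional[str]:
    """Call init_fn and return the error text if it fails, or None on success."""
    try:
        init_fn()
    except (RuntimeError, AssertionError) as exc:
        # torch raises RuntimeError for runtime/driver failures such as
        # "CUDA unknown error", and AssertionError if built without CUDA.
        return str(exc)
    return None


def fail_fast_on_cuda_error() -> None:
    """Hypothetical startup hook: exit non-zero if CUDA cannot be initialized."""
    # Imported lazily so cuda_init_error() can be exercised without torch.
    import torch

    def init() -> None:
        torch.cuda.init()                    # runs torch._C._cuda_init()
        torch.cuda.get_device_capability(0)  # confirm device 0 answers a query

    error = cuda_init_error(init)
    if error is not None:
        print(f"CUDA initialization failed; exiting before accepting jobs: {error}",
              file=sys.stderr)
        sys.exit(1)
    print(f"CUDA ready on {torch.cuda.get_device_name(0)}")
```

Call `fail_fast_on_cuda_error()` once at process start, before the handler begins taking jobs. Keeping the check in a separate `cuda_init_error()` function means it can be tested with stand-in init functions on machines without a GPU. The check only surfaces the failure. It does not address the driver difference itself.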
Affected resources
- Data center: EUR-NO-1
- GPU types enabled: RTX 4090, RTX 5090
- Minimum CUDA version configured: 12.8
Observed workers
- Failing pod: r4olk6c2f93dny
  - GPU: RTX 5090
  - Driver: 580.126.09
- Working pod: h9q3efquin7ztc
  - GPU: RTX 5090
  - Driver: 580.126.20
Observation window
2026-04-20 from 15:18 UTC to 15:35 UTC
Both workers were running:
- the same container image
- the same Python virtual environment on the shared network volume
- the same handler code
Worker A (failing)
Pod: r4olk6c2f93dny
GPU Info: gpu_name=NVIDIA GeForce RTX 5090, compute_cap=12.0, vram_mb=32607,
driver_version=580.126.09, torch_version=2.11.0+cu128, torch_cuda_version=12.8,
device_capability=None, arch_list=[]
ComfyUI traceback on this worker (originating from torch._C._cuda_init() in torch/cuda/__init__.py:478):
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment,
e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the
available devices to be zero.
Worker B (working)
Pod: h9q3efquin7ztc
GPU Info: gpu_name=NVIDIA GeForce RTX 5090, compute_cap=12.0, vram_mb=32607,
driver_version=580.126.20, torch_version=2.11.0+cu128, torch_cuda_version=12.8,
device_capability=[12, 0], arch_list=['sm_75', 'sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120']
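The two GPU Info lines differ only in `driver_version` and in the PyTorch-side fields (`device_capability`, `arch_list`). A log-triage helper like the one below could check other workers on the endpoint for the same signature. It is a sketch that assumes each GPU Info line keeps the key=value layout shown above, joined onto a single line. The function names are hypothetical and not part of ComfyUI or PyTorch.

```python
import ast
import re

# Hypothetical triage helper; assumes the key=value layout of the GPU Info
# lines in this report. Bracketed lists are captured whole so their internal
# commas do not split the value.
_PAIR = re.compile(r"(\w+)=(\[[^\]]*\]|[^,]*)")


def parse_gpu_info(line: str) -> dict:
    """Parse one GPU Info line into a dict of field name -> value."""
    info = {}
    for key, raw in _PAIR.findall(line):
        value = raw.strip()
        if key in ("device_capability", "arch_list"):
            # 'None', '[12, 0]' and "['sm_75', ...]" are valid Python literals.
            value = ast.literal_eval(value)
        info[key] = value
    return info


def driver_tuple(version: str) -> tuple:
    """Turn '580.126.09' into (580, 126, 9) so driver builds compare numerically."""
    return tuple(int(part) for part in version.split("."))


def cuda_initialized(info: dict) -> bool:
    """Heuristic: PyTorch reported a device capability and a non-empty arch list."""
    return info.get("device_capability") is not None and bool(info.get("arch_list"))
```

The failure marker is `device_capability=None` together with an empty `arch_list`. This mirrors the difference between Worker A and Worker B, but it is a heuristic rather than a documented PyTorch contract. `driver_tuple()` sorts 580.126.09 before 580.126.20 numerically, so failing workers can be grouped by driver build.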