Depending on the GPU type and availability region I choose, I get the following error up to around 50% of the time.
{"requestId": null, "message": "Fitness check failed: _cuda_init_check | RuntimeError: CUDA initialization failed: Failed to initialize GPU 0: CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n", "level": "ERROR"}
It seems to me that either the image isn't always loading with CUDA, or I'm receiving instances that don't have GPUs attached. I'm happy to DM details about the specific endpoint I'm running this on. Has anyone experienced something similar? RunPod team, please help me out!
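
In case it helps narrow this down: as far as I know, "no kernel image is available for execution on the device" is usually what CUDA reports when the installed build has no kernels compiled for that GPU's compute capability, which could also explain why it depends on the GPU type. Below is a rough sketch of a check that could run at worker start to tell that case apart from a missing GPU. It assumes PyTorch is installed in the image; `parse_arch`, `arch_supported`, and `report` are hypothetical helper names of my own, and the compatibility rules are simplified, so treat the result as a hint and not a verdict.

```python
import re


def parse_arch(entry):
    """Split an entry such as 'sm_86' or 'compute_90' into its parts.

    Returns (kind, major, minor, suffix), or None if the entry is not
    in the expected form. The last digit is taken as the minor version.
    """
    m = re.fullmatch(r"(sm|compute)_(\d{2,})([a-z]?)", entry)
    if m is None:
        return None
    kind, digits, suffix = m.groups()
    return kind, int(digits[:-1]), int(digits[-1]), suffix


def arch_supported(capability, arch_list):
    """Simplified check: can a build with `arch_list` run on `capability`?

    Assumptions (a sketch, not PyTorch's own logic):
    - 'sm_XY' binaries run on devices with the same major and minor >= Y.
    - 'compute_XY' PTX can be JIT-compiled for any capability >= X.Y.
    - Suffixed entries (e.g. 'sm_90a') are treated as exact-match only.
    """
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, a_major, a_minor, suffix = parsed
        if suffix:
            if (a_major, a_minor) == (major, minor):
                return True
            continue
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report():
    """Print what the worker sees; does nothing useful without PyTorch."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("No usable CUDA device: GPU missing or driver/runtime problem")
        return
    name = torch.cuda.get_device_name(0)
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(f"GPU 0: {name}, compute capability {cap[0]}.{cap[1]}")
    print(f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}")
    print(f"Compiled architectures: {archs}")
    if not arch_supported(cap, archs):
        print("Likely mismatch: this build has no kernels for this GPU")


if __name__ == "__main__":
    report()
```

If the failing workers print a GPU name and a capability that is not covered by the compiled architectures, the GPU is attached and the problem is the PyTorch/CUDA build in the image. If they report no usable CUDA device, that points back at the instance itself.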