#Passing through GPU proxmox -> vm -> docker -> frigate. Grey hairs incoming fast


mental horizon
#

Hello,

I am trying to pass an NVIDIA GTX 1650 through to Frigate, which runs in Docker in an Ubuntu VM on my Proxmox host. The passthrough itself is done, but I cannot for the life of me get the drivers to load. I've tried different packages, the NVIDIA installer, etc. I can't get the drivers installed on the Proxmox host, in the VM, or in Docker. However, I am under the impression that this should not be needed?

If anyone could provide some guidance, that would be great. Thanks

vapid forum
mental horizon
#

Wow, what a nice guide, I'll try it. Thanks

#

Oh, so when passing through to a VM, it is reserved for that VM? I didn't know that.

vapid forum
#

Unless you have a vGPU-capable GPU, yeah.

mental horizon
#

Just so we don't confuse any terminology here: should the drivers be installed both on the Proxmox host and in the VM? (node = Proxmox host)

vapid forum
#

In your case you only need the drivers in the VM.

#

To elaborate: in the case of a VM, the VM owns the GPU, and it's often a good idea to blacklist the device/driver on the node so the host doesn't even attempt to use it.
In the case of CTs the installation has different goals. The kernel is shared, so installing gives the node the ability to load the driver and gives the CT the userland libraries and tools.
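
A minimal sketch of the host-side blacklisting mentioned above, assuming the module names that lspci lists for the card in this thread (nvidiafb, nouveau, nvidia_drm, nvidia). The helper name emit_blacklist is made up for illustration, and it only prints the lines rather than writing any file.

```shell
# Print one modprobe "blacklist" line per kernel module given as an argument.
# Nothing is written to disk here; the caller decides where the output goes.
emit_blacklist() {
  for mod in "$@"; do
    printf 'blacklist %s\n' "$mod"
  done
}

# Assumed module list: the modules lspci reports as candidates for the card.
emit_blacklist nouveau nvidia nvidiafb nvidia_drm
```

On the node, the output could be redirected into a file under /etc/modprobe.d/ (for example a hypothetical blacklist-gpu.conf), followed by update-initramfs -u -k all and a reboot. This applies to the VM passthrough case only; for CTs the node needs the driver loaded.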

mental horizon
#

Nothing failed when running your scripts, but nvidia-smi still fails after reboot. 😦

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
#

Maybe I'll try it on the node as well?

vapid forum
#

Can you check lspci -nnk | awk '/VGA/{print $0}' RS= inside the VM?
If the GPU is passed to the VM then the node has no access to it anyways.
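
The check being asked for comes down to whether the NVIDIA device shows a "Kernel driver in use:" line. A rough sketch that reads lspci -nnk output on stdin and prints the bound driver, if any. The function name nvidia_driver_in_use is hypothetical, and matching on "VGA" or "3D controller" is an assumption about how the card is listed.

```shell
# Read `lspci -nnk` output on stdin. For each NVIDIA VGA/3D device, print the
# driver named on its "Kernel driver in use:" line. Exit status is 0 if at
# least one such line was found, 1 otherwise.
nvidia_driver_in_use() {
  awk '
    # Device header lines start at column 0 with the PCI address.
    /^[0-9a-f]/ { in_nv = ($0 ~ /NVIDIA/ && $0 ~ /VGA|3D controller/) }
    in_nv && /Kernel driver in use:/ { print $NF; found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage inside the VM:
#   lspci -nnk | nvidia_driver_in_use || echo "no driver bound to the GPU"
```

If only "Kernel modules:" is listed for the card, the modules exist on disk but none of them is bound to the device.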

mental horizon
#

Ok! Suppose this is what we're looking for?

        Subsystem: Hewlett-Packard Company TU116 [GeForce GTX 1650 SUPER] [103c:87a5]
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
vapid forum
#

Yep. It's supposed to have a driver in use like this

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
        Subsystem: ASUSTeK Computer Inc. GA102 [GeForce RTX 3090] [1043:87b3]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
#

I think I'll try to reproduce. I have to see why the awk script fails to properly limit to just VGA anyways.

mental horizon
#

Appreciate it!

vapid forum
#

Can you tell me the ubuntu version you use?

mental horizon
#

24.04.1 LTS

vapid forum
#

The ubuntu installer is so slow 😐
Can you try apt -V install libnvidia-compute-575 nvidia-dkms-575 && reboot and see if this changes anything?

mental horizon
#

Same result

vapid forum
#

Sorry, this takes a while before I have some results. Did you play around with any blacklisting in the VM?
Try this to find out

grep -sRE "softdep|blacklist" /etc/modprobe.d/

Also check journalctl -k or journalctl -k -g nvidia for hints why the driver might not have loaded.
You can also try modprobe nvidia manually and see if it complains.

mental horizon
#

Not sure anything relevant is in the blacklist:
https://pastebin.com/uTpmL3Hg

journalctl -k -g nvidia returns: -- No entries --

modprobe nvidia : modprobe: ERROR: could not insert 'nvidia': Key was rejected by service

vapid forum
#

It's possible you already installed nvidia from the ubuntu repos and there's a conflict.
dpkg -l | grep nvidia should tell.

#

What you linked aren't the results of the grep command but < 1s worth of the initial boot log 😄

#

Is secure boot enabled for the VM?
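
For context: "Key was rejected by service" from modprobe usually means the kernel refused a module whose signature it does not trust, which is what happens with Secure Boot on and no enrolled signing key. A small sketch for checking the state from inside the VM, assuming the mokutil package is installed and that mokutil --sb-state prints "SecureBoot enabled" or "SecureBoot disabled". The function name sb_state is made up.

```shell
# Classify the text printed by `mokutil --sb-state` (read from stdin).
sb_state() {
  if grep -qi 'SecureBoot enabled'; then
    echo enabled
  else
    echo disabled-or-unknown
  fi
}

# Usage inside the VM:
#   mokutil --sb-state | sb_state
```

If this reports enabled, the options are to enroll the machine owner key used to sign the DKMS module or to turn Secure Boot off in the VM firmware.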

mental horizon
vapid forum
#

Did you enroll the key during the reboot via noVNC?
I could finally test this myself and it works fine, but gcc was missing so building the module failed. Debian installs this by itself. Try apt install gcc.
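
One way to tell whether the module build actually succeeded is dkms status. A rough sketch that reads its output on stdin, assuming lines of the form "nvidia/<version>, <kernel>, <arch>: installed" (older dkms releases use "nvidia, <version>, ..."). The function name nvidia_dkms_installed is hypothetical.

```shell
# Succeed if `dkms status` output (on stdin) lists an nvidia module as installed.
nvidia_dkms_installed() {
  grep -Eq '^nvidia[/,].*: installed'
}

# Usage inside the VM:
#   dkms status | nvidia_dkms_installed || echo "nvidia module not built for this kernel"
```

When the build failed, the DKMS build log (typically make.log under /var/lib/dkms/nvidia/<version>/build/) tends to show the reason, such as a missing compiler.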

#
root@ubuntugpu:~# dpkg -l | grep nvidia | awk '{print $2,$3}' | column -t
libnvidia-cfg1-575:amd64       575.57.08-0ubuntu1
libnvidia-common-575           575.57.08-0ubuntu1
libnvidia-compute-575:amd64    575.57.08-0ubuntu1
libnvidia-decode-575:amd64     575.57.08-0ubuntu1
libnvidia-egl-gbm1:amd64       1.1.2.1-1ubuntu1
libnvidia-egl-wayland1:amd64   1:1.1.19-1ubuntu1
libnvidia-egl-xcb1:amd64       1.0.2-1ubuntu1
libnvidia-egl-xlib1:amd64      1.0.2-1ubuntu1
libnvidia-encode-575:amd64     575.57.08-0ubuntu1
libnvidia-extra-575:amd64      575.57.08-0ubuntu1
libnvidia-fbc1-575:amd64       575.57.08-0ubuntu1
libnvidia-gl-575:amd64         575.57.08-0ubuntu1
libnvidia-gpucomp-575:amd64    575.57.08-0ubuntu1
nvidia-compute-utils-575       575.57.08-0ubuntu1
nvidia-dkms-575                575.57.08-0ubuntu1
nvidia-driver-575              575.57.08-0ubuntu1
nvidia-firmware-575            575.57.08-0ubuntu1
nvidia-kernel-common-575       575.57.08-0ubuntu1
nvidia-kernel-source-575       575.57.08-0ubuntu1
nvidia-modprobe                575.57.08-0ubuntu1
nvidia-persistenced            575.57.08-0ubuntu1
nvidia-utils-575               575.57.08-0ubuntu1
xserver-xorg-video-nvidia-575  575.57.08-0ubuntu1
#
root@ubuntugpu:~# lspci -k | grep -A3 VGA
00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
        Subsystem: Red Hat, Inc. Device 1100
        Kernel driver in use: bochs-drm
        Kernel modules: bochs
--
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
        Subsystem: ASUSTeK Computer Inc. GA102 [GeForce RTX 3090]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
#
root@ubuntugpu:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.2 LTS
Release:        24.04
Codename:       noble
mental horizon
#

gcc was already installed

#

Maybe I should just spin up a CT instead and start over

vapid forum
#

I'd try disabling secure boot.