GPU is detected and working but ML doesn't use it at all for any of its related tasks.

vapid coyote Aug 6, 2025, 11:10 PM

#

I have attached the docker compose and hwaccel along with the logs from the ml_container stating that it reverts back to CPU even though it is trying for CUDA.

📎 docker-compose.yml 📎 hwaccel.ml.yml 📎 ml_logs.txt

proven sparrowBOT Aug 6, 2025, 11:11 PM

#

:wave: Hey @vapid coyote,

Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich.

References

Container Logs: docker compose logs
Container Status: docker ps -a
Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

#

Checklist

I have...

:ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
:ballot_box_with_check: read applicable release notes.
:ballot_box_with_check: reviewed the FAQs for known issues.
:ballot_box_with_check: reviewed Github for known issues.
:ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
:ballot_box_with_check: uploaded the relevant information (see below).
:ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

Information

In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:

Your docker-compose.yml and .env files.
Logs from all the containers and their status (see above).
All the troubleshooting steps you've tried so far.
Any recent changes you've made to Immich or your system.
Details about your system (both software/OS and hardware).
Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
The version of the Immich server, mobile app, and other relevant pieces.
Any other information that you think might be relevant.

Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

vapid coyote Aug 6, 2025, 11:12 PM

#

the .env file

#

Setup is - Proxmox as host, Immich is set up and installed in an LXC container. (i5-6500, 16GB RAM, Quadro P1000)

I can confirm through nvidia-smi that the GPU, a Quadro P1000, gets detected in all 3 places mentioned above.

The mount is a Backblaze B2 bucket.

The container has 4vCPUs, 4GB RAM and 90GB of storage allotted to it.

#

The image is nvidia-smi output

#

I have played around as much as I can with the docker files, and hwaccel.

Finally found the discord and saw someone pointed to the immich_machine_learning container using the internal IP of docker on the web interface.

I have spent more time on this than I'd like to admit. Any help is appreciated.

proven sparrowBOT Aug 6, 2025, 11:20 PM

#

proven sparrow ## Checklist I have... 1. :ballot_box_with_check: verified I'm on the latest rel...

Successfully submitted, a tag has been added to inform contributors. :white_check_mark:

hollow grove Aug 7, 2025, 8:15 AM

#

Is a P1000 even compatible?

#

Ah yep, compute 6.1 is >= 5.2
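
For anyone wanting to repeat that check on their own card, a rough sketch follows. It assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field; `cap_at_least` is a made-up helper name, and the 5.2 minimum is simply the figure quoted above.

```shell
#!/bin/sh
# cap_at_least HAVE MIN: succeed if compute capability HAVE (e.g. 6.1) is >= MIN (e.g. 5.2).
# Hypothetical helper; compares the major and minor parts as integers.
cap_at_least() {
  have_major=${1%%.*}; have_minor=${1#*.}
  min_major=${2%%.*};  min_minor=${2#*.}
  [ "$have_major" -gt "$min_major" ] ||
    { [ "$have_major" -eq "$min_major" ] && [ "$have_minor" -ge "$min_minor" ]; }
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One line per GPU is expected, e.g. "6.1".
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader | while read -r cap; do
    case "$cap" in
      [0-9]*.[0-9]*) ;;                                   # looks like major.minor
      *) echo "unexpected output: $cap"; continue ;;      # e.g. driver too old for this field
    esac
    if cap_at_least "$cap" 5.2; then
      echo "compute $cap: OK"
    else
      echo "compute $cap: below 5.2"
    fi
  done
else
  echo "nvidia-smi not found"
fi
```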

#

Not sure what the NVIDIA env vars are supposed to achieve?

#

We don't have those

#

By installed in an LXC, do you mean directly, or is docker running in your LXC @vapid coyote ?

vapid coyote Aug 7, 2025, 10:20 AM

#

hollow grove By installed in an LXC, do you mean directly, or is docker running in your LXC <...

Yes there’s docker installed and it all runs off of that in the container

hollow grove Aug 7, 2025, 10:29 AM

#

Could try setting:

  cuda:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities:
                - gpu

In hwaccel instead @vapid coyote ?

vapid coyote Aug 7, 2025, 3:23 PM

#

Ok will try and update here

vapid coyote Aug 7, 2025, 5:05 PM

#

Didn't work

#

hollow grove Aug 7, 2025, 5:17 PM

#

dang

#

maybe

group_add:
  - "xy"
  - "zw"

Where xy zw are the gid numbers for video and render 👀
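
A small sketch for finding those numbers, assuming `getent` is available on the machine where Docker runs (inside the LXC in this setup). `getent group` prints `name:password:gid:members`, so the third field is the GID. `gid_of` is a hypothetical helper name, and either group may not exist on a given system.

```shell
#!/bin/sh
# gid_of GROUP: print the numeric GID of GROUP, or nothing if the group does not exist.
# Hypothetical helper built on getent.
gid_of() {
  getent group "$1" | cut -d: -f3
}

for g in video render; do
  gid=$(gid_of "$g")
  if [ -n "$gid" ]; then
    echo "$g: $gid"
  else
    echo "$g: group not found"
  fi
done
```

The printed numbers are what would replace "xy" and "zw" in the `group_add` list.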

#

But I'm just guessing at this point

vapid coyote Aug 7, 2025, 5:56 PM

#

I assume that's where you wanted me to add it

hollow grove Aug 7, 2025, 5:57 PM

#

indeed

vapid coyote Aug 7, 2025, 5:57 PM

#

What I can't get out of my mind is that the error log still mentions "device_id". I got rid of that, but the error log still mentions it.

#

hollow grove Aug 7, 2025, 5:57 PM

#

It mentioned that before you switched to it

#

my guess is there is a rights issue preventing the GPU from being properly accessed

#

resulting in GPU=-1

vapid coyote Aug 7, 2025, 5:59 PM

#

hmm, the error log at the top that I shared doesn't have the white error from now

#

I do agree it might just be too many layers of virtualisation: Proxmox > LXC > Docker

#

causing permission issues

#

Installed immich on a vm first, worked great, then realised maybe adding a GPU for the ML loads would be a good idea considering the CPU is probably not the best for this stuff.

That was 3 days ago when I dropped $100 on the Quadro, first my wallet hurt, now my brain hurts.

hollow grove Aug 7, 2025, 6:17 PM

#

The joke is your iGPU probably would've worked fine 👀

vapid coyote Aug 7, 2025, 6:19 PM

#

https://tenor.com/view/are-you-the-one-why-would-you-do-that-to-me-ayto-mtv-whats-wrong-with-you-gif-9989969


hollow grove Aug 7, 2025, 6:23 PM

#

So to give you the real picture here

#

CPUs have their own accelerators

#

OpenVino for instance

#

And outside of the initial processing, you don't really need anything but RAM

#

usually loading the model is what takes the longest

#

using an external ML container for initial processing is very popular

vapid coyote Aug 7, 2025, 6:27 PM

#

hollow grove using an external ML container for initial processing is very popular

Yeah I have been thinking of 2 options -

1. Use a spare disk and test run a desktop environment, probably Mint. Try running Immich on there and if it works well then maybe I move my entire stack there: HAos, SMB, Minecraft servers.
2. My main machine is an RTX 3060 laptop running Windows. Run the ML container on there? The web UI lets me control when the jobs run, and I can control when the container is running on the laptop.

#

Also to note: I tried sending through 7000 files and that started crashing the container (LXC). It seems I loaded the CPU too much, since I had all the ML stuff enabled because I thought the GPU was working.

#

But on the VM, the CPU seemed to handle it beautifully, didn't crash even though the CPU was going ham the same way - this was before the GPU install.

hollow grove Aug 7, 2025, 6:34 PM

#

I ran my ML on windows as well, took a good few hours but not days

vapid coyote Aug 7, 2025, 6:37 PM

#

Yeah, I did look into it. Nvidia and Microsoft have done some cool stuff to bring proper GPU access to WSL.

There's just so much conflicting and unclear information in the docs

hollow grove Aug 7, 2025, 6:37 PM

#

Our docs?

vapid coyote Aug 7, 2025, 6:37 PM

#

oh no developer.nvidia.com

#

don't think I found anything about that on the immich docs

lunar idol Aug 8, 2025, 4:20 PM

#

vapid coyote Setup is - Proxmox as host, Immich is setup and installed in a LXC Container. (*...

How are you passing through your GPU to the LXC container?

vapid coyote Aug 8, 2025, 6:02 PM

#

lunar idol How are you passing through your GPU to the LXC container?

Well technically I don’t think an lxc container passes it through or gets direct access to the GPU.

It lets the proxmox host have control of it and shares resources from there

#

So with a VM, one VM will have access to the GPU.

With LXC Containers, host keeps the GPU but multiple containers can use it

#

I first tried a VM, but there I couldn’t even get the GPU detected properly; with LXC it is detected, it just doesn’t get used.

lunar idol Aug 8, 2025, 7:24 PM

#

With LXC you still need to write rules in the LXC config file and you'll likely have to mount the GPU

#

If you do ls /dev/nvidia* in the container does anything show up?
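
A slightly more detailed version of that check, as a sketch: it prints each NVIDIA character device together with its major:minor numbers, which is usually what LXC device rules are written against. Running it on both the Proxmox host and inside the container makes the two easy to compare. `dev_numbers` is a hypothetical helper, and it assumes a `stat` that supports `-c` (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# dev_numbers PATH: print "major:minor" in decimal for a device node.
# Hypothetical helper; stat reports %t (major) and %T (minor) in hex, so convert.
dev_numbers() {
  printf '%d:%d\n' "0x$(stat -c %t "$1")" "0x$(stat -c %T "$1")"
}

found=0
for dev in /dev/nvidia*; do
  [ -c "$dev" ] || continue        # skip directories and an unmatched glob
  found=1
  echo "$dev $(dev_numbers "$dev")"
done
[ "$found" -eq 1 ] || echo "no /dev/nvidia* character devices found"
```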

vapid coyote Aug 10, 2025, 3:44 AM

#

@lunar idol mb don't know how I missed this message, but my config for the lxc has all these things in it

#

lunar idol Aug 10, 2025, 7:28 AM

#

Please provide me with an ls /dev/nvidia* from both the Proxmox host and from inside the container

vapid coyote Aug 10, 2025, 4:21 PM

#

This is the Immich Container

#

That's Proxmox

lunar idol Aug 10, 2025, 4:42 PM

#

Just to test the passthrough, if you download ffmpeg within the container and try to transcode a video, does it do that properly?

vapid coyote Aug 10, 2025, 4:53 PM

#

I’ll try that now

lunar idol Aug 10, 2025, 5:20 PM

#

(Make sure to use the nvenc encoder to trigger GPU usage)
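
A minimal sketch of such a test, assuming the ffmpeg build inside the container was compiled with the `h264_nvenc` encoder. The file names are placeholders, and `has_nvenc` is a made-up helper. Note that `-hwaccel cuda` on its own only requests GPU decoding; the encoder has to be selected with `-c:v`.

```shell
#!/bin/sh
# Placeholder file names; pass a real input video as the first argument.
IN="${1:-input.mov}"
OUT="${2:-test_nvenc.mp4}"

# has_nvenc: succeed if this ffmpeg build lists the h264_nvenc encoder.
# Hypothetical helper.
has_nvenc() {
  ffmpeg -hide_banner -encoders 2>/dev/null | grep -q h264_nvenc
}

if ! command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg not installed"
elif ! has_nvenc; then
  echo "this ffmpeg build has no h264_nvenc encoder"
elif [ ! -f "$IN" ]; then
  echo "input file $IN not found"
else
  # Decode on the GPU, encode with NVENC, and drop audio to keep the test simple.
  ffmpeg -hide_banner -hwaccel cuda -i "$IN" -c:v h264_nvenc -an "$OUT"
fi
```

If the GPU is reachable, `nvidia-smi` in a second terminal should show an ffmpeg process while this runs.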

vapid coyote Aug 10, 2025, 5:30 PM

#

Yeah did that and seems it failed on the container

#

ffmpeg -hwaccel cuda -i P1066118.MOV test.mp4 that was my command

lunar idol Aug 10, 2025, 5:59 PM

#

I just noticed you're passing through driver libraries from your host; is there any reason for this?

vapid coyote Aug 10, 2025, 6:15 PM

#

That's just what I found when searching online

#

The way I set it up was -

1. Got the driver file from Nvidia.com and ran it on the Proxmox host.
2. Ran the same file in the container, but with the parameter that skips building the kernel module, because a plain install of the same file was erroring out since the LXC was not detecting the driver at all.

#

Also my reason to use the file from Nvidia.com was because of version discrepancy between what the pve host was installing from apt and the lxc.

#

Plus this way I assume less chance of breaking when updates are run

lunar idol Aug 10, 2025, 6:22 PM

#

The driver in the Debian repository is a bit older (it's from February 2024), but for something like Immich I wouldn't be chasing the newest version, especially with a Debian-based OS; I'd want stability instead

#

If I were you I'd install the Nvidia driver from the apt repository and then try this all again and avoid mounting driver directories; just install nvidia-smi inside the LXC container and it will pull along any necessary libraries it needs

#

Let me know if you need help

#

If I remember correctly, installing the Nvidia drivers through apt will undo the .run installation by itself

vapid coyote Aug 10, 2025, 6:26 PM

#

I get that - PVE was installing 570.20 and immich was installing 570.06 something.

In my mind it made sense to have the same driver versions so I ran the .run

#

But you mean to say - let the host have the drivers or not? Because isn't the container dependent on the host for the drivers?

lunar idol Aug 10, 2025, 6:28 PM

#

https://tenor.com/view/think-emoji-thonk-meme-gif-11987870


#

Immich comes with its own driver blobs?

vapid coyote Aug 10, 2025, 6:32 PM

#

wait a min, you mean to say the docker container should have the drivers in there?

hollow grove Aug 10, 2025, 6:34 PM

#

FYI new drivers often break ML, don't try to be clever here and stick with the "old" ones

lunar idol Aug 10, 2025, 6:42 PM

#

Break ML or break Debian; there's no winning here

#

The issue is that installing Nvidia drivers from anything that is not the repository is a hole deeper than hell

lunar idol Aug 10, 2025, 6:53 PM

#

hollow grove FYI new drivers often break ML, don't try to be clever here and stick with the "...

I assume you mean old drivers break ML?

#

Or am I reading it wrong and staying with old is good?

#

I'd think: see if the repo driver works and, if so, don't change it, right?

hollow grove Aug 10, 2025, 7:20 PM

#

I specifically said new 👀

vapid coyote Aug 10, 2025, 8:06 PM

#

Well, I have cleared out the driver install. Will reboot and try what's suggested here once the badblock test on one of my disks ends.

lunar idol Aug 10, 2025, 8:55 PM

#

Make sure to get rid of the library directory mounts; that should not be necessary if you have nvidia-smi installed within the container.

vapid coyote Aug 15, 2025, 6:09 AM

#

I am so effin happy

#

What I did was nothing short of a full system reset -

Proxmox 9 was recently released, and I still use Proxmox 8. Decided imma install Proxmox 9 on a spare disk that will have none of my stuff on it besides the default and try passing through the GPU to a Windows VM. WORKED RIGHT AWAY

So I tried to recreate it on my actual boot disk but failed, spent the day backing up and then restoring my containers and vms to the new install and finally voila!!!

Don't know what was wrong in my old install that was messing the passthrough up so bad, but having attempted this so many times now over the past 10 days, I have nailed down the process and know exactly what is needed.

#

Should it have taken this long? - NO.

Does this remind me of the covid lockdown days, where teenage me spent a lot of time understanding and figuring out how networking, reverse tunnels and java work, so I could play minecraft with friends? - Absolutely.

vapid coyote Aug 15, 2025, 4:54 PM

#

Seems my video transcoding is failing now

lunar idol Aug 15, 2025, 5:08 PM

#

You're missing libcuda

#

Is this still an LXC container?

vapid coyote Aug 15, 2025, 5:09 PM

#

Nope it’s an Ubuntu server VM now.

lunar idol Aug 15, 2025, 5:09 PM

#

Make sure the GPU is blacklisted on the Proxmox host

vapid coyote Aug 15, 2025, 5:10 PM

#

Oh hmm, could that also be why I had to restart docker earlier?

It had stopped detecting a cuda device according to the errors

lunar idol Aug 15, 2025, 5:11 PM

#

lspci -nn on the host and paste the output

vapid coyote Aug 15, 2025, 5:15 PM

#

lunar idol Aug 15, 2025, 5:16 PM

#

Sorry, -nnk instead of -nn

vapid coyote Aug 15, 2025, 5:18 PM

#

lunar idol Aug 15, 2025, 5:20 PM

#

Looks good, it's attached to vfio
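
To pull just that detail out of the `lspci -nnk` output, a sketch along these lines may help. `nvidia_drivers` is a hypothetical helper name; for a card passed through to a VM, the expected result on the host is `vfio-pci`.

```shell
#!/bin/sh
# nvidia_drivers: read `lspci -nnk` output on stdin and print the kernel driver
# bound to each NVIDIA device. Hypothetical helper.
nvidia_drivers() {
  awk '
    /^[0-9a-fA-F]/ { nv = ($0 ~ /NVIDIA/) }   # an unindented line starts a new device
    nv && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use: */, "")
      print
    }
  '
}

if command -v lspci >/dev/null 2>&1; then
  lspci -nnk | nvidia_drivers
else
  echo "lspci not found"
fi
```

A GPU usually shows up as two PCI functions (video and audio), so two lines of output are normal.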

#

And the GPU is properly visible in the Windows VM? In device management?

vapid coyote Aug 15, 2025, 5:22 PM

#

Well I have since gotten rid of the Windows VM, but it shows up nicely in my Ubuntu Server VM now.

It's chugging through the files

lunar idol Aug 15, 2025, 5:22 PM

#

Nice!

vapid coyote Aug 15, 2025, 5:23 PM

#

Just not sure why ffmpeg is misbehaving, especially considering what I have found.

Immich uses the jellyfin fork of ffmpeg and I was gonna set up jellyfin next.

lunar idol Aug 15, 2025, 5:25 PM

#

Well, it says right in the output

vapid coyote Aug 15, 2025, 5:36 PM

#

The extends portion was commented out, let's see if it works now
