I'm trying to get Whisper to run on the GPU in my QNAP NAS. I have a T400 4 GB GPU, which I bought mostly for transcoding, but I figure it should be able to run a medium.en or small Whisper model as well.
That said, I haven't been able to tell whether Whisper is actually being offloaded to the GPU. When I run nvidia-smi, I only see about 400 MB of memory in use and no processes listed, yet GPU-Util goes up to 100%. That makes me think it's running on the GPU, but I would have expected much more VRAM to be in use once the model is loaded.
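For anyone who wants to reproduce the check, here is a minimal sketch of how the same numbers could be polled from a script while a transcription is running. It assumes nvidia-smi is on the PATH inside whatever container or shell it runs in; the function names (`parse_gpu_stats`, `read_gpu_stats`) are just illustrative, and the parser assumes every field comes back as a plain integer (it would need extra handling if the driver reports `[N/A]`).

```python
import subprocess

# Fields requested from nvidia-smi: memory in MiB, utilization in percent.
QUERY = "memory.used,memory.total,utilization.gpu"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output.

    Returns one dict per GPU line. Assumes each field is an integer.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        used, total, util = (field.strip() for field in line.split(","))
        stats.append(
            {"used_mib": int(used), "total_mib": int(total), "util_pct": int(util)}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats for every GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` every second or so during a transcription and comparing `used_mib` before and after the model loads should show whether the memory figure moves at all.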
Am I missing something here? It could be a quirk of how QNAP sets up its Docker configuration. I've had other strange issues in the past, so I wouldn't rule it out.