#Any resources on how to run whisper outside HA and load custom models?


shadow phoenix
#

What I really want is the ability to run some kind of distilled version of Whisper for Spanish, but the add-on only offers the distilled version for English. The reason is that the tiny, base and small models are pretty bad at understanding even the clearest speech unless you are right next to the microphone doing your absolute best to speak like the voice-over of a nature documentary. The medium model is the first one that kind of does its job, but it's a bit too slow on my 10-core 12th-gen Intel NUC, so the large one is out of the question. I thought that maybe a distilled version for Spanish could be just fast enough.

My understanding is that right now the only way to run a custom model would be to run whisper yourself and not as an addon, but I couldn't find much info about it.
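For anyone landing here with the same question, a minimal sketch of what "running Whisper yourself" could look like, assuming the `wyoming-faster-whisper` Python package is installed on the host or in its own container. The helper name `build_whisper_command`, the `medium-int8` model name, the `/data` directory and port 10300 are illustrative assumptions, not something stated in this thread; the flags shown are the ones I believe the project's command line accepts, so check `--help` on your installed version.

```python
import shlex


def build_whisper_command(model, language="es", port=10300,
                          data_dir="/data", device="cpu"):
    """Return the argv list for a standalone wyoming-faster-whisper server.

    All default values here are assumptions for illustration only.
    """
    return [
        "python3", "-m", "wyoming_faster_whisper",
        "--uri", f"tcp://0.0.0.0:{port}",   # address the Wyoming server listens on
        "--model", model,                   # model name to load
        "--language", language,             # fix the language instead of auto-detecting
        "--data-dir", data_dir,             # where models are looked up
        "--download-dir", data_dir,         # where models are downloaded to
        "--device", device,                 # "cpu" or "cuda"
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(build_whisper_command("medium-int8")))
```

Once a server like this is listening, HA can be pointed at it through the Wyoming Protocol integration using the host and port, instead of using the add-on.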

steel cliff
#

But I'm not sure if distilled versions even exist for other languages:

Note: Distil-Whisper is currently only available for English speech recognition. We are working with the community to distill Whisper on other languages. If you are interested in distilling Whisper in your language, check out the provided training code. We will soon update the repository with multilingual checkpoints when ready!
From https://github.com/huggingface/distil-whisper

GitHub (huggingface/distil-whisper): Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

shadow phoenix
#

Officially no, but I've seen community-made models on Hugging Face that are specialized for Spanish.

#

Also, I'm not sure whether I could squeeze out more performance by running it independently in its own LXC container (I use Proxmox, so the Whisper add-on is actually a Docker container running inside HA, which is itself an LXC container).
My server only has an iGPU, and I'm not sure how much help that would be. I have loads of RAM, though.

fringe monolith
#

The Wyoming faster-whisper implementation can use CUDA, but that requires an Nvidia GPU. You also need to make some tweaks to get that working; the stock Docker image won't do it as-is.

#

I have mine running in k8s making use of CUDA
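A small sketch of how one might check whether CUDA is actually usable before asking the server for it. faster-whisper runs on the CTranslate2 library, and this assumes the `ctranslate2` Python package is what is installed alongside it; the helper name `pick_device` is hypothetical. It does not cover the Docker image tweaks mentioned above, only the device check.

```python
def pick_device():
    """Return "cuda" if CTranslate2 can see an Nvidia GPU, else "cpu"."""
    try:
        # Imported lazily so the function still works where ctranslate2 is absent.
        import ctranslate2
    except ImportError:
        return "cpu"
    # get_cuda_device_count() reports how many CUDA devices are visible.
    return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

On a machine with only an integrated Intel GPU this should come back as `cpu`, since CUDA is Nvidia-only.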

fringe monolith
#

Also, as of v2.0.0:

#

So you can just set the model directly to a Hugging Face model name

#

actually just tried this and set my model to Systran/faster-distil-whisper-large-v3 and it worked great 🙂
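To try a Hugging Face model outside HA before pointing the server at it, something like the following sketch might help. It assumes the `faster-whisper` Python package is installed and uses the model name from the message above; the helper names `join_segments` and `transcribe_file`, the CPU device and the `int8` compute type are assumptions for illustration. faster-whisper loads models in CTranslate2 format, so a community Spanish checkpoint would need to be published (or converted) in that format to be used the same way.

```python
def join_segments(texts):
    """Join transcribed segment texts into one line, trimming stray spaces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_file(audio_path,
                    model_name="Systran/faster-distil-whisper-large-v3",
                    language="en"):
    """Transcribe one audio file with a model pulled from the Hugging Face Hub."""
    # Imported lazily so the helpers above can be used without the package.
    from faster_whisper import WhisperModel

    # device and compute_type are assumptions suited to a CPU-only box.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language=language)
    # segments is a generator; iterating it is what runs the transcription.
    return join_segments(segment.text for segment in segments)
```

Calling `transcribe_file("sample.wav")` on a short recording (the file name is a placeholder) and timing it gives a rough idea of whether a model is fast enough on a given machine.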

shadow phoenix
#

My only regret when buying my home server is not anticipating that in 18 months AI would take over the world, so I didn't deem it important to have a GPU in it

#

I still have hopes that external NPUs like the Hailo-8 become widespread for running AI models without a huge power cost (my entire NUC sips <10 W of power on average)

fringe monolith
#

Just for your info, here's my power usage over the last 24 hours:

#

basically 99% of the time the GPU idles at 6 watts, and only spikes for a second or less when inferring

#

A lot of people think the GPU is cranking at 200 W all the time when you run AI stuff, but in reality it sits idle and only spins up when something needs to happen 🙂
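The arithmetic behind that point can be sketched as a simple duty-cycle average. The 6 W idle figure and the "99% of the time" share come from the messages above; using 200 W as the draw during a spike is an assumption borrowed from the "200 W" misconception, not a measurement, and the function names are hypothetical.

```python
def average_power_watts(idle_watts, peak_watts, busy_fraction):
    """Time-weighted average power for a device that is mostly idle."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return idle_watts * (1.0 - busy_fraction) + peak_watts * busy_fraction


def daily_energy_kwh(average_watts):
    """Energy used over 24 hours at the given average power."""
    return average_watts * 24 / 1000


if __name__ == "__main__":
    # 6 W idle and 1% busy are from the thread; 200 W peak is an assumed value.
    avg = average_power_watts(idle_watts=6, peak_watts=200, busy_fraction=0.01)
    print(f"average: {avg:.2f} W, per day: {daily_energy_kwh(avg):.2f} kWh")
```

Even with the spike pegged at the full 200 W, the average stays close to the idle figure because the busy share is so small.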

rough lodge
#

@fringe monolith what kind of hardware are you using? I'm starting to think about an Nvidia Jetson AGX.

fringe monolith
#

Running a custom built system 😁

#

Ryzen 9 7950 with 96 GB RAM and an RTX 4060 Ti for now

#

Running Proxmox with an HA VM and a k8s VM

rough lodge
#

Very nice hardware 🙂 However, I plan to go in the Nvidia Jetson AGX 64GB direction 🙂 The plan is to run only Whisper (distil-large-v3 model), Piper, an LLM and Frigate on this device 🙂

#

This whole device really consumes just a few watts.