At the moment I run HA on a small Optiplex system, with a 10th gen Intel CPU.
I keep wanting to get into voice assistants, but am not sure if I need a system with a ‘proper’ GPU or if what I have will suffice. My understanding is that speech-to-text, and thus Whisper, is the step that would benefit most from faster hardware.
So the options, as I see them:
- Stay on my current system. It has plenty of CPU capacity, with very low load numbers. It has 16GB of RAM now, but I could increase this to 64GB. Processing would run on the CPU, but the extra RAM would let me use large models.
- Use another small system I have, for which I could buy a low-profile GPU with 6GB VRAM.
To make the right decision, I am wondering if someone has any comments regarding the following:
A) Can the Whisper add-on use either a CPU or a GPU, and is this user-configurable?
B) Using for example the ~5GB medium model, is there a massive difference between CPU and GPU in terms of how fast speech gets converted to text?
C) Most people won’t have a GPU on their HA server. Am I overcomplicating this, and is CPU-based STT already fast enough?
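For question B, the number I would want to compare between the two setups is the real-time factor: processing time divided by the length of the audio clip, where anything below 1.0 is faster than real time. Below is a minimal sketch of how I would measure it. Note that `real_time_factor` and `fake_transcribe` are names I made up for illustration, and `fake_transcribe` only sleeps; it would have to be swapped for whatever call actually runs Whisper on each machine.

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Return processing time divided by audio length (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe()  # hypothetical: wraps whatever STT call is being measured
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def fake_transcribe() -> str:
    # Placeholder standing in for a real Whisper call; it only sleeps.
    time.sleep(0.05)
    return "turn on the kitchen light"


if __name__ == "__main__":
    # Assumes a 3-second voice command; use the real clip length in practice.
    rtf = real_time_factor(fake_transcribe, audio_seconds=3.0)
    print(f"real-time factor: {rtf:.3f}")
```

Running the same clip through the CPU box and the GPU box with the same model should give two numbers that can be compared directly, which is really what I am after with B and C.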