# Could voice recognition be offloaded to an accelerator?
STT would greatly benefit from having a GPU.
Any neural network (STT, TTS if it is neural-based, and LLMs) is ultimately limited in performance by VRAM bandwidth and capacity, as well as by the number of compute cores (CUDA cores, or Tensor Cores if supported). Edge TPU devices are largely meant for simple things like basic face or sound recognition, or other simple ML tasks. Your best bet for local accelerated STT would be a cheap used GPU that supports CUDA. A lot of people use secondhand GTX 1070/1080 cards and get very good performance, under 400 ms I believe. I bought an RTX 4060 Ti on sale for something like $400 and am able to run STT as well as some LLMs. An added bonus is that I also set up a game streaming pod that uses the GPU to play games as well 😄
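A back-of-envelope check for the VRAM capacity side: weight memory is roughly parameter count × bytes per parameter, plus headroom for activations, caches and runtime overhead. The parameter counts, the 20% overhead factor and the 8/16 GiB budgets below are assumptions for illustration, not measured figures.

```python
# Rough check of whether a model's weights fit in VRAM.
# Only weights are counted exactly; activations, KV cache and runtime overhead
# are approximated by a hypothetical multiplicative overhead factor.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(params: float, dtype: str) -> float:
    """Size of the weights alone, in GiB."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


def fits(params: float, dtype: str, vram_gib: float, overhead: float = 1.2) -> bool:
    """True if weights times the assumed overhead factor fit in the given VRAM."""
    return weight_gib(params, dtype) * overhead <= vram_gib


if __name__ == "__main__":
    # Approximate parameter counts (assumptions, for illustration only).
    models = {"STT model (~0.8B)": 0.8e9, "8B LLM": 8e9}
    for name, n in models.items():
        for dtype in ("fp16", "int4"):
            print(
                f"{name:18s} {dtype}: {weight_gib(n, dtype):5.1f} GiB weights, "
                f"fits 8 GiB: {fits(n, dtype, 8)}, fits 16 GiB: {fits(n, dtype, 16)}"
            )
```

Under these assumptions, a ~0.8B-parameter STT model fits easily on either card even at fp16, while an 8B LLM at fp16 (about 14.9 GiB of weights alone) would not fit in 16 GiB once overhead is added; quantized to 4 bits it fits in 8 GiB.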
I offload STT (Whisper large turbo) to a Mac mini. STT takes about 1 second, with better accuracy than Whisper small-en, which took about six seconds on the NUC running HA. I've read that Google offers Whisper as an API, but I have no experience with that service.
A Mac mini to replace my Intel NUC for Home Assistant... hmm... could be tempting 😉
Not all neural networks are memory-bandwidth-bound, but apparently Whisper is, which stinks.
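A rough roofline sketch of what "memory-bandwidth-bound" means, assuming a simple dense layer: its runtime is at least the larger of FLOPs / peak compute and bytes moved / bandwidth. When a layer processes one input vector at a time (as in autoregressive decoding, where tokens are generated one by one), each weight is loaded once for only about two FLOPs, so the layer waits on VRAM; processing many inputs at once reuses each weight and shifts the bottleneck to compute. The GPU figures and the 1280-wide layer are hypothetical, and the model ignores caches, kernel overhead and everything else a real Whisper run involves.

```python
# Roofline-style estimate for one dense layer Y = W @ X, where W is (m, k)
# and X is (k, n). All hardware and layer numbers are illustrative assumptions.


def arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: float) -> float:
    """FLOPs performed per byte moved to/from memory."""
    flops = 2.0 * m * k * n  # one multiply and one add per multiply-accumulate
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read W and X, write Y
    return flops / bytes_moved


def layer_time(m: int, k: int, n: int, bytes_per_elem: float,
               peak_flops: float, bandwidth: float) -> tuple[float, str]:
    """Lower-bound runtime in seconds and which resource limits it."""
    flops = 2.0 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    t_compute = flops / peak_flops
    t_memory = bytes_moved / bandwidth
    bound = "memory" if t_memory > t_compute else "compute"
    return max(t_compute, t_memory), bound


if __name__ == "__main__":
    PEAK_FLOPS = 20e12  # hypothetical GPU: 20 TFLOP/s at fp16
    BANDWIDTH = 300e9   # hypothetical GPU: 300 GB/s VRAM bandwidth
    D = 1280            # assumed layer width, roughly Whisper-large-sized
    for n, label in [(1, "one token at a time"), (1500, "1500 frames at once")]:
        t, bound = layer_time(D, D, n, 2, PEAK_FLOPS, BANDWIDTH)
        ai = arithmetic_intensity(D, D, n, 2)
        print(f"n={n:5d} ({label}): {ai:6.1f} FLOP/byte, {t * 1e6:8.2f} us, {bound}-bound")
    print(f"ridge point: {PEAK_FLOPS / BANDWIDTH:.1f} FLOP/byte")
```

With these assumed numbers, the n = 1 case comes out at about 1 FLOP/byte and memory-bound, while n = 1500 lands well above the ~67 FLOP/byte ridge point and is compute-bound. Faster VRAM helps the first case; more cores help the second.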