#Could voice recognition be offloaded to an accelerator?

blissful vine
#

I know that the EdgeTPU doesn’t have enough memory to accelerate speech-to-text, but I’m wondering if other accelerators do. Is speech-to-text fundamentally limited by memory bandwidth?

grand cloak
#

STT would greatly benefit from having a GPU.

timid plume
#

Any neural network (STT, TTS if it is neural-based, and LLMs) is ultimately limited by VRAM speed and capacity, as well as by the number of compute units (CUDA cores, or Tensor Cores if supported). EdgeTPU devices are largely for simple things like basic facial or sound recognition, or simple ML tasks. Best bet if you want local accelerated STT would be to get a cheap used GPU that supports CUDA. A lot of people were using used GTX 1070/1080 cards and getting very good performance, under 400 ms I believe. I bought a cheap 4060 Ti on sale for something like $400 and am able to run STT as well as some LLMs. Added bonus is I also set up a game streaming pod that can use spare GPU cycles to play games as well 😄
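
To put a rough number on the "limited by VRAM speed" point, here is a back-of-envelope sketch. It assumes the token-by-token decode step has to read every weight from memory once per token, which gives a ceiling of bandwidth divided by model size. The 1.6 GB weight size and the 256 GB/s bandwidth are illustrative inputs, not measurements, and the function name is made up for this example. Real STT models also have an encoder stage that tends to be compute-bound, so treat this as an upper bound only.

```python
def max_decode_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Ceiling on decode speed if each generated token reads all weights once."""
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("weight size and bandwidth must be positive")
    return bandwidth_bytes_per_s / weight_bytes


# Illustrative inputs (assumptions, not measurements):
ASSUMED_WEIGHT_BYTES = 1.6e9   # about 1.6 GB of half-precision weights
ASSUMED_BANDWIDTH = 256e9      # about 256 GB/s of memory bandwidth

ceiling = max_decode_tokens_per_second(ASSUMED_WEIGHT_BYTES, ASSUMED_BANDWIDTH)
print(f"Bandwidth-bound ceiling: {ceiling:.0f} tokens/s")
```

Swapping in the bandwidth of a small edge accelerator or a desktop GPU shows quickly why the two end up in different leagues for this kind of workload.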

coral brook
#

I offload STT to Whisper large turbo running on a Mac mini. STT takes about 1 second, with improved accuracy, compared to six seconds for Whisper small-en on the NUC running HA. I've read that Google offers Whisper as an API, but I have no experience with that service.

viscid patrol
#

A Mac mini to replace my Intel NUC for Home Assistant ... mmm ... could be tempting ... 😉

dim marsh
#

I just run an NVIDIA GPU, works great
🚀

#

large-v3-turbo takes about 2 GB of VRAM for me
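
That figure is roughly what you'd expect from the weights alone. A quick sanity check, assuming the model has around 809 million parameters (the count I've seen published for the turbo model, treated here as an assumption) and ignoring activations, caches, and runtime overhead, which add more on top:

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumed parameter count for large-v3-turbo (not measured here).
ASSUMED_PARAMS = 809_000_000

# Bytes per parameter for a few common storage precisions.
PRECISIONS = {"float32": 4, "float16": 2, "int8": 1}

for name, size in PRECISIONS.items():
    print(f"{name}: {weight_memory_gb(ASSUMED_PARAMS, size):.2f} GB")
```

At half precision the weights come to about 1.6 GB, so seeing about 2 GB in use once the runtime is loaded is plausible.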

blissful vine