# Whisper slow to respond


idle ravine
#

Hi all. I have been running Whisper small-int8 en on a Home Assistant virtual machine that has been allocated 4 CPU cores and 5 GB of RAM. Neither appears to be strained, and RAM usage usually hovers around 2.5 GB. The initial speech-to-text request takes 15-20 seconds to respond, while subsequent requests take 2-3 seconds. However, after an extended period with no requests (5 to 6 hours), the first request takes 15-20 seconds again.

When I monitor RAM and CPU, I see a big spike on the first request after Whisper has not been used in a while, almost as if it dumped its memory due to inactivity and now needs to load things back in. Is this expected, or might something else be causing these symptoms?
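One way to confirm the pattern rather than eyeball it is to time a few back-to-back requests and compare the first against the rest. The following is a minimal sketch: `request` is a hypothetical zero-argument callable that you would wire up to send one short utterance to your speech-to-text service, and the `factor` threshold of 3 is an arbitrary assumption, not a value from Whisper or Home Assistant.

```python
import time
from typing import Callable, List


def time_requests(
    request: Callable[[], object],
    count: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Call `request` `count` times in a row and return each duration in seconds."""
    durations = []
    for _ in range(count):
        start = clock()
        request()  # hypothetical: sends one utterance and waits for the transcript
        durations.append(clock() - start)
    return durations


def looks_like_cold_start(durations: List[float], factor: float = 3.0) -> bool:
    """Return True when the first call is at least `factor` times slower
    than the (upper) median of the calls that followed it."""
    if len(durations) < 2:
        return False
    rest = sorted(durations[1:])
    median = rest[len(rest) // 2]
    return durations[0] >= factor * median
```

Running this once right after a long idle gap and once shortly afterwards should show whether only the first request of each idle period is slow, which would point at model loading rather than a general shortage of CPU or RAM.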

whole hazel
#

Often the service will unload the model after some time to free up resources. Does the add-on have a TTL setting of some sort that you can use to tell it to never unload?
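To illustrate the idle-unload behavior being described, here is a minimal sketch of a model holder with a TTL. It is not the Whisper add-on's actual code, and `IdleUnloadingModel`, `loader`, and `ttl_seconds` are hypothetical names chosen for the example. It only shows why the first request after a long gap would pay the load cost again, and how a "never unload" setting would avoid that.

```python
import time
from typing import Callable, Optional


class IdleUnloadingModel:
    """Loads a model lazily and drops it once it has been idle longer than the TTL.

    ttl_seconds=None means "never unload".
    """

    def __init__(
        self,
        loader: Callable[[], object],
        ttl_seconds: Optional[float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # hypothetical: the slow step, e.g. reading weights from disk
        self._ttl = ttl_seconds
        self._clock = clock
        self._model: Optional[object] = None
        self._last_used = 0.0
        self.load_count = 0  # how many times the slow load has happened

    def evict_if_idle(self) -> None:
        """Drop the model if it has been unused for longer than the TTL.

        A real service would likely call this from a background timer so that
        memory is freed during the idle period; here it is also called from get().
        """
        if self._model is None or self._ttl is None:
            return
        if self._clock() - self._last_used > self._ttl:
            self._model = None

    def get(self) -> object:
        """Return the model, reloading it first if it was evicted."""
        self.evict_if_idle()
        if self._model is None:
            self._model = self._loader()  # the slow first request happens here
            self.load_count += 1
        self._last_used = self._clock()
        return self._model
```

With a finite TTL, any request arriving after the idle window triggers a fresh load. Passing `ttl_seconds=None` keeps the model resident at the cost of holding its memory permanently.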

idle ravine
#

Thanks. I don’t see a TTL option in the add-on configuration.

#

I’m running it under HA Supervised, not in Docker.

#

Is this a common issue or do most Assist users use HA Cloud?

mortal willow
#

I built a separate WSL Ubuntu instance with NVIDIA drivers to run Whisper in GPU memory, and I use the largest turbo model for the best quality and performance. It does STT pretty much instantly and perfectly despite my accent. The model stays in GPU memory all the time at around 2 GB, ready to go.

whole hazel
#

On a k8s cluster with a GPU