Hi all. I have been running Whisper small-int8 en on a Home Assistant virtual machine that has been allocated 4 CPU cores and 5 GB of RAM. Neither appears to be strained, and RAM usage usually hovers around 2.5 GB. The initial speech-to-text request takes 15-20 seconds to respond, while subsequent requests take 2-3 seconds. However, after an extended period with no requests (5 to 6 hours), the first request takes 15-20 seconds again.
When I monitor RAM and CPU, I see a big spike on the next request after Whisper has not been used in a while, almost as if it dumped its memory due to inactivity and now needs to load things back in. Is this expected, or might something else be causing these symptoms?
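
In case it helps anyone reproduce this, here is a minimal sketch of how the slow-first-request pattern could be measured. It assumes a hypothetical `send_request` callable that wraps one speech-to-text call against your own setup; that name, the helper functions, and the `factor` threshold of 3 are illustrative choices, not part of Whisper or Home Assistant.

```python
import time
from typing import Callable, List


def time_requests(send_request: Callable[[], object],
                  count: int,
                  idle_seconds: float = 0.0) -> List[float]:
    """Call send_request `count` times and return each call's duration in seconds.

    `send_request` is a hypothetical wrapper around one speech-to-text call.
    `idle_seconds` is an optional pause inserted between consecutive calls.
    """
    durations = []
    for i in range(count):
        if i > 0 and idle_seconds > 0:
            time.sleep(idle_seconds)
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return durations


def is_cold_start(durations: List[float], factor: float = 3.0) -> bool:
    """Return True if the first call took at least `factor` times as long
    as the (upper) median of the remaining calls."""
    if len(durations) < 2:
        return False
    rest = sorted(durations[1:])
    median = rest[len(rest) // 2]
    return durations[0] >= factor * median
```

Running `time_requests` once right after a long idle gap and once shortly afterwards, then comparing the two results with `is_cold_start`, would show whether the 15-20 second delay is tied only to the first request after inactivity.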