Hi Everyone,
I've been working on a project to get the fastest, most efficient voice assistant for HA.
Speech-to-text has been the sticking point for me, as its speed is critical to the latency of the pipeline. I think I've got a good solution working with an Intel iGPU. I wanted to share how I did it, as I'm sure lots of people are running HA on an Intel CPU with an iGPU and could benefit from Whisper acceleration (and the reduction in CPU use). In my testing, the iGPU is just as fast for the small.en model as my Intel A770 with OpenVINO.
The approach is as follows:
whisper.cpp built with OpenVINO support according to the README here: https://github.com/ggerganov/whisper.cpp
Edit /examples/server/server.cpp and change the OpenVINO device from "CPU" to "GPU" before building.
wyoming-whisper-api-client https://github.com/ser/wyoming-whisper-api-client
This lets you use a custom whisper.cpp server instance for your HA Wyoming interface.
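If you want to check the whisper.cpp server on its own before pointing the Wyoming client at it, here is a rough Python sketch that posts a WAV file and times the round trip. It assumes the server is listening on 127.0.0.1:8080 and accepts a multipart upload on /inference with the file, response_format and temperature fields, as in the curl example in the whisper.cpp server README. Adjust the URL if you started it with a different host or port. The function names and the sample.wav filename are just mine for illustration.

```python
import json
import sys
import time
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio goes in its own part, with a filename and content type.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode()
        + file_bytes
        + b'\r\n'
    )
    parts.append(f'--{boundary}--\r\n'.encode())
    return b''.join(parts), f'multipart/form-data; boundary={boundary}'


def time_transcription(wav_path, url="http://127.0.0.1:8080/inference"):
    """Send one WAV file to the server; return (text, seconds taken)."""
    # Assumption: URL and field names follow the whisper.cpp server README.
    with open(wav_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"response_format": "json", "temperature": "0.0"},
        "file",
        "sample.wav",
        audio,
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    elapsed = time.perf_counter() - start
    return result.get("text", ""), elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    text, seconds = time_transcription(sys.argv[1])
    print(f"{seconds:.2f}s: {text.strip()}")
```

Run it with the path to a short 16 kHz WAV recording as the only argument. Sending the same clip a few times with the device set to "CPU" and then "GPU" should give a rough feel for the difference, though the first request after startup may be slower than the rest.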
If anyone wants to follow in my footsteps, let me know and I can give you a more detailed guide! It might be something that could be Dockerized as well, but that is beyond my skills.