I am very happy with the VoicePE in combination with OpenAI and Nabu Casa Cloud.
In fact, I have already been able to replace my Alexa with it.
The next step would be to run the assist pipeline locally. However, I am a bit lost with the various STT solutions available.
I started with Faster-Whisper, but both the speed and the recognition quality were unsatisfactory.
I got similar results with Whisper-cpp.
I am not even sure what Rhasspy-Speech is doing differently.
Currently, I am exploring VOSK, as it is surprisingly fast and looks promising.
What I do not understand is why VOSK is so much faster than Faster-Whisper on the same hardware (a Synology NAS).
What are the differences between the various solutions and which one should I choose?
My goal is to have a local pipeline for a set of pre-defined sentences (in German), such as turning lights on and off, since that is what I mostly use my voice assistant for.
Anything else should be forwarded to OpenAI.
Ideally, it would run on my NAS or a Raspberry Pi 5. I would like to avoid running a server that draws a lot of power.
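
To make the routing idea concrete, here is a rough sketch of what I have in mind. It is only an illustration and not tied to any particular STT engine: the command table, the intent names, and the `route` function are hypothetical placeholders, and the actual forwarding to OpenAI is left out.

```python
import re

# Hypothetical table of pre-defined German sentences and made-up intent names.
LOCAL_COMMANDS = {
    "licht an": "light_on",
    "licht aus": "light_off",
}


def normalize(text: str) -> str:
    """Lowercase the transcript, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def route(transcript: str) -> tuple[str, str]:
    """Return ("local", intent) for a known sentence, else ("cloud", transcript)."""
    intent = LOCAL_COMMANDS.get(normalize(transcript))
    if intent is not None:
        return ("local", intent)
    # Anything that is not a known sentence would be forwarded to the cloud agent.
    return ("cloud", transcript)
```

With this, `route("Licht an!")` would be handled locally, while a free-form question such as "Wie wird das Wetter morgen?" would fall through to the cloud branch.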
Any help is much appreciated!
KR,
Wlad