#STT Tools Comparison

rare mist
#

I am very happy with the VoicePE in combination with OpenAI and Nabu Casa Cloud.
In fact, I was already able to replace my Alexa with that.

The next step would be to run the assist pipeline locally. However, I am a bit lost with the various STT solutions available.
I started with Faster-Whisper, but the results in terms of speed and recognition were not satisfying.
Similar results with Whisper-cpp.
I am not even sure what Rhasspy-Speech is doing differently.
Currently, I am exploring VOSK, as it is surprisingly very fast and promising.

What I do not understand is, why is VOSK so much faster than Faster-Whisper on the same hardware (Synology NAS)?
What are the differences between the various solutions and which one should I choose?

My goal is a local pipeline for a set of pre-defined sentences (in German), such as light on/off, since that is what I mostly use my voice assistant for.
Anything else should be forwarded to OpenAI.
Ideally, it can run on my NAS or RPI5. I would like to avoid running a server which requires a lot of power.
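
A minimal sketch of that routing idea, assuming the STT engine has already produced a transcript. The command table, the action names, and the `fake_cloud` stand-in for an OpenAI call are all hypothetical placeholders, not part of any real integration:

```python
from typing import Callable, Tuple

# Hypothetical command table: normalized German phrase -> made-up action name.
LOCAL_COMMANDS = {
    "licht an": "light.turn_on",
    "licht aus": "light.turn_off",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def route(transcript: str, fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Handle known commands locally; hand everything else to the fallback."""
    key = normalize(transcript)
    if key in LOCAL_COMMANDS:
        return ("local", LOCAL_COMMANDS[key])
    return ("cloud", fallback(transcript))


def fake_cloud(text: str) -> str:
    """Placeholder for a cloud call (assumption: replace with a real client)."""
    return f"forwarded: {text}"


if __name__ == "__main__":
    print(route("Licht an!", fake_cloud))
    print(route("Wie wird das Wetter?", fake_cloud))
```

Exact lookup keeps the local path cheap, so only sentences outside the table ever leave the machine.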

Any help is much appreciated 🙂

KR,
Wlad

broken lintel
#

If I recall correctly, Vosk uses some kind of limited phonetic matching, I think, so it works well for short commands and simple words or terms. It's basically a small model (50 MB or so).
Whisper, by contrast, runs large neural models: the larger the model, the more accurate the transcription and the better it handles speech errors or noise. The larger models need more compute, so they will be slow on a CPU. You can run it on a GPU and get much faster response times (I run Whisper large-v3 on a 4060 Ti and get responses in around 250 ms). For that to work, though, you have to run Wyoming-faster-whisper in its own Docker container with the GPU exposed to it.

rare mist
#

Thanks for the insights.
I will continue with my tests, but it seems that with my hardware VOSK is the tool of choice.
Do you think the NVIDIA Jetson Orin can run Faster-Whisper effectively?
I would like to run it on SBC-style hardware.

broken lintel
#

I'm not sure how well it would run it. Others on here have been ordering the device, so maybe someone with the hardware can chime in if they have tried it 🙂

rare mist
broken lintel
#

They seem to be more or less the same thing. rhasspy-speech just takes a predefined set of sentences that it matches against, so there is no open-ended STT.
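
To illustrate what matching against a predefined set means in practice, here is a rough sketch that snaps a noisy transcript to the closest known sentence, or rejects it. This is not how rhasspy-speech works internally; it swaps in plain string similarity from Python's `difflib`, and the sentence list and the 0.8 cutoff are assumptions picked for the example:

```python
import difflib
from typing import Optional

# Hypothetical closed set of sentences the assistant should understand.
SENTENCES = ["licht an", "licht aus", "rollladen hoch", "rollladen runter"]


def snap(transcript: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest predefined sentence, or None if nothing is similar enough."""
    candidate = transcript.lower().strip()
    matches = difflib.get_close_matches(candidate, SENTENCES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(snap("licht ann"))                   # slightly wrong transcript
    print(snap("wie wird das wetter morgen"))  # not in the closed set
```

Anything that returns `None` here is not one of the known commands, so it could be handed to an open-ended STT or a cloud service instead.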