I'm new to HA and diving into the deep end with Voice Assistant. I've tested on an RPi4 w/8GB and an i7-7700T. In both cases I'm using an Atom Echo for the microphone. Using local voice processing I can only get good enough speed on the i7, the RPi4 being too slow. I've also tried HomeLLM in HAOS and remote LLMs pointed at some other systems I have in the house. The time to action gets worse in every case. The ONLY situation where the time between my voice command and the switch toggling is acceptable is with the i7 and local processing. That would be fine, except about 50% of the time the speech to text gets it wrong. I get a lot of the "I don't see a device called off" responses. My first question is, is the Atom Echo up to the task? If it should be, what else do I need to tweak to get faster-whisper to be accurate closer to 100% of the time?
#Hardware performance and recommendations
the AE is a limited device and you will probably get better results using a voice-pe but that's perhaps not the main issue.
are you just running whisper locally as haos addon?
what language are you speaking?
Yes, local whisper. US English.
how much ram you working with on your i7 setup?
16GB
ok try running ONNX ASR instead of whisper
Home Assistant Add-on: ONNX ASR Addon repository Home Assistant add-on that uses onnx-asr for speech-to-text. Notably, provides access to the NVIDIA NeMo Parakeet-TDT model which should be significantly faster and more accurate than Whisper for English in most cases. Faster and better speech to text This addon provides an English language ...
you have the ram for it and it runs very quickly with good success using the parakeet stt model
Nice! I'll give that a try
i should point out that this is a very new thing
but from my testing it works great
once you install and add the addon it can take a few minutes to come up the first time but you can see its progress in the addon's log
First pass wasn't much better. I'm restarting just to see if it makes a difference. Not expecting it to.
did you update the pipeline config to switch over to it?
what type of mistakes is it making?
I'm practicing with a Matter device I've called "test plug". It just told me "sorry, I'm not aware of any device called touch plug". It's different every time. Sometimes it doesn't know a device called "on". I think I speak relatively clearly but I'm also not trying to change my voice. Don't want to think I have to slow down every time I use it.
on the voice assistant settings page next to your pipeline hit the dots then select debug
this will show you the trace of a call and you can see exactly what it "thought" you said
a better quality microphone (e.g. a voice-pe instead of an AE) may help with this but it's hard to be sure
Worked twice and then failed with this.
Yeah, I can try something else.
Any other options out there that lean towards free or cheap?
also remember that the basic conversation agent is very basic and picky about phrasing
I'm probably ending up with the voice-pe anyway, just would like to prove this out before I do that.
if you don't particularly need/want full recognition you can try using "speech to phrase" instead
it can't recognise wildcard text but it specifically looks for phrases that can be used and work
I'll give it a try
so it precalculates turn on/off test plug (or whatever else you call stuff)
but it can't be used for something like "play **songtitle** by **artist**"
because it looks for specific patterns instead of actual decoding of voice
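To make that distinction concrete, here is a rough Python sketch of the general idea: build a fixed list of phrases up front, then snap whatever was heard to the closest one. This is only an illustration, not how Speech-to-Phrase is actually implemented. The device names, templates, the `build_phrases` and `match_phrase` helpers, and the 0.6 cutoff are all made up for the example.

```python
import difflib

# Hypothetical device names and sentence templates, for illustration only.
DEVICES = ["test plug", "kitchen light"]
TEMPLATES = ["turn on {name}", "turn off {name}"]


def build_phrases(devices, templates):
    """Precompute every phrase the assistant is allowed to recognise."""
    return [t.format(name=d) for d in devices for t in templates]


def match_phrase(heard, phrases, cutoff=0.6):
    """Return the closest known phrase, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(heard.lower(), phrases, n=1, cutoff=cutoff)
    return matches[0] if matches else None


PHRASES = build_phrases(DEVICES, TEMPLATES)

# A slightly misheard command still lands on a known phrase.
print(match_phrase("turn on touch plug", PHRASES))

# Open-ended text has no precomputed phrase to land on.
print(match_phrase("play bohemian rhapsody by queen", PHRASES))
```

With a closed phrase list, a near miss like "touch plug" can still resolve to "test plug", but anything with free-form content (song titles, artists) has nothing to match against.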
which I might want when I get around to setting up MA
yes, this is definitely a thing
I'm testing with the app on my phone and it's doing pretty well with the recognition. Missed the first one. Been fine since that.
yeah the phone mic will likely be better
Ok, so that's my problem now. It's fast enough, I just need a good mic
yeah. the mic (and speaker) in the AE is limited. the voice pe has a 2 mic array and a dedicated audio chip to help clean it up
the AE is a great cheap device for testing/debugging stuff. i have one that i mess around with. but for practical purposes it falls down
I appreciate all the help! I'll see about picking up a PE. Probably leave the AE in an area where I can be the only one impacted by its challenges. 🙂
yeah, it's fine for your desk but if other people want to use it then it's an issue
As a follow up, I've now tried the same pipeline on both the RPi4 and the i7. The STT is my problem now. On the i7 it takes less than a second, 0.25s in some cases. On the RPi it's consistently around 5s to get through the STT step. I've tried both faster-whisper and onnx-asr. Doesn't seem to be dramatically different between the two engines.
Is there anything to be done to get faster STT on the RPi4?
not really, you are just hardware limited running STT on rpi
getting a pc with an N100 or N150 cpu is a great middle ground if you want it to be quicker but still keep low power
That's what I'm thinking. I have an N97 board that I can test with. Roughly the same performance as the N100, maybe slightly better. I'll give that a go before I spend money.
likely won't be as fast as the i7 but likely a lot quicker than the rpi