I still use Amazon Alexa, though all I use it for is alarms and timers (waking up in the morning, cooking, etc.), and I'd happily ditch it for an HA-based solution.
I did set up a voice assistant in the past (a year or two ago), but it didn't quite do the job, so I stuck with my Amazon one.
Have things changed since then? If so, what setup (hardware and software) is generally recommended? I already use Sonos speakers to output other audio from HA, and ideally I'd like to use them here as well. I run HA on a Pi 4 with an SSD.
#current state of HA voice assistants?
so voice has come a long way. but is it a direct replacement for an alexa? maybe not.
the Voice PE satellite is good and can output via its 3.5mm jack, so if your speakers have an input you can easily pass audio to them instead of using the built-in speaker.
for the actual voice processing you can probably run "speech to phrase" (STP) on an rpi. it's a little limited, as it only looks for specific patterns based on your entities.
if you want full speech-to-text then you will need something more powerful, or a cloud service
Regarding the "alexa replacement", for me personally it would qualify as one if it does alarms and timers perfectly (or well enough)
This is my one and only use case for alexa anyways
So if that worked then it could replace it for me
I know it should do timers pretty well via STP as it has most of the expected inputs. don't expect to be able to do "2 hours, 3 minutes and 6 seconds" but "45 minutes" should be fine
i am not sure about alarms at a specified time. although i would probably do that via automation anyway
The way i use my alexa is like these examples:
- in the evening before a workday i would say "Alexa, set an alarm for 07:15 AM" or similar
- when cooking food that needs to stay in the oven for 30 minutes, i would say "Alexa, 30 minutes from now" / "Alexa, set a timer for 30 minutes" or similar
"okay nabu, set a timer for 30 minutes" - works great
alarms in that method though I don't think will work, but you could set an automation to trigger on your work days at a specified time
something like this
then set the action to TTS or play music or whatever
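The screenshot from the message above is not preserved. In Home Assistant this kind of alarm is normally an automation with a time trigger plus a weekday condition. As a rough sketch of just the scheduling logic, written in plain Python rather than HA config (the 07:15 time, the Monday to Friday set and the `next_alarm` name are all illustrative assumptions):

```python
from datetime import datetime, time, timedelta

# Hypothetical settings for illustration; not recovered from the lost screenshot.
ALARM_TIME = time(7, 15)
WORKDAYS = frozenset({0, 1, 2, 3, 4})  # Monday=0 ... Friday=4, as in datetime.weekday()


def next_alarm(now: datetime,
               alarm_time: time = ALARM_TIME,
               workdays: frozenset[int] = WORKDAYS) -> datetime:
    """Return the next moment strictly after `now` at which the alarm should fire."""
    if not workdays:
        raise ValueError("workdays must not be empty")
    candidate = datetime.combine(now.date(), alarm_time)
    # Step forward one day at a time until the candidate is in the future
    # and falls on a configured workday.
    while candidate <= now or candidate.weekday() not in workdays:
        candidate += timedelta(days=1)
    return candidate
```

Called on a Friday after 07:15, this skips the weekend and returns the following Monday at 07:15, which is the behaviour the time trigger plus weekday condition gives you.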
Okay cool, thanks buddy
Looks like i could finally ditch my echo dot for good
Regarding the Home Assistant Voice device, is its microphone good enough to properly pick up my voice from anywhere in the room (as the amazon one)?
I could install additional satellite mics, but I'd prefer not to ofc
Though i would prefer to have my existing sonos speakers output the voice assistant's sound, not via 3.5mm jack
the mics are pretty good, realistically they are not as good as the amazon ones. but most people find them fine.
you "CAN" get the TTS responses redirected to another media player, but it's unsupported and requires you to mess with the firmware a bit, and there are some issues that can arise from it. using the jack is the best way to interface it with other speakers
I use the speakers via network though, and jack and network don't work simultaneously.
But then I'll just use the built-in one. If it doesn't satisfy me, I'll try to modify the firmware a bit, or if that's too much of a hassle, just DIY a dedicated speaker for it. I still have two 3W 8 Ohm speaker drivers that were supposed to be added to my ESPHome microphones back then.
this is one of my setups https://gist.github.com/MichaelMKKelly/5033dec56c5ab6ee6b7db52f690b84e0
for just TTS the internal speaker is fine. it's only if you want to start playing music on it that it becomes an issue. but i guess you will be playing music via your other speakers anyway
Thank you my Friend!
Do you know of a budget-friendlier alternative to the HA voice? The 70 bucks is fine for me if it works well, but maybe there's an option that e.g. lacks a speaker and costs less.
there is the M5Stack Atom Echo, which is cheap, but it's only really good for development and testing tbh. i wouldn't use it in production
if you want to get something cheap to test stuff out before getting something better like the VPE then it's not a bad option
You may DIY it with Respeaker Lite. https://github.com/formatBCE/Respeaker-Lite-ESPHome-integration
The mics and software overall are on par with Voice PE.
Also it would solve the speaker problem. 😉
But be aware that it supports 4 Ohm speakers, not 8 Ohm (same for the VPE DAC).
oh yeah thats a shout, if you were thinking of building your own anyway and have some stuff the respeaker boards might be a direction
I own one already, as you said, for testing it's neat but that's about it
Alright, then the HA voice it is 🙂
How has the overall logic handling improved? Will i be fine without any additional LLM? If so, what are the limits here for you personally in terms of "possible without one"?
My Ollama server can only run tiny to small models at acceptable speed (like 3B). 7-8B is possible, but too slow for a voice assistant
Will i be fine without any additional LLM?
you're the only one who can answer that. you have to look at the building blocks for ratings instead of the pipeline as a whole
first off, what's your language? English?
You're correct, though this question was targeted at my desired "set alarm for 10 oclock" type of use case. I thought maybe someone has experience here, that's it. Ofc i need to evaluate for myself if i need one for a custom use case
German
You don't need LLMs to set an alarm (which is not supported by default). You can define a custom intent + custom sentence combo or (maybe less likely, considering the numbers involved) an automation with a sentence trigger
There are good and bad STTs for German, local and cloud. There are many good TTSs, both local and cloud
You have the M5Stack already. Test it
you would probably need to set up whisper on your other machine to do the STT though, as STP doesn't support variables
or use cloud for STT
Oh i've got everything to work already in the past, but i used an LLM with it that made it either too dumb (3B model) or too slow (7B model)
My hardware hasn't changed (Vega 8 iGPU on my Proxmox server), so i need to either change the LLM part accordingly, outsource it (e.g. OpenAI API) or leave it out entirely
This is why i'm asking where the limits are in terms of having my system understand me and act accordingly (either without LLM, or with small-ish ones)
the actual TTS and STT weren't the bottleneck, it was the LLM processing in between
so you can set up "local processing" with llm fallback.
so it will try to deal with it using the basic assist, and if it can't work out what you mean it will ask the llm if it knows what's going on
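A minimal sketch of that "local first, LLM as fallback" idea, assuming a single hard-coded timer pattern. This is not how Assist is implemented; `handle_locally`, `process` and the regex are hypothetical names made up for illustration, and the LLM is just a callable you pass in:

```python
import re
from typing import Callable, Optional

# Hypothetical fixed pattern, standing in for the sentence templates
# a local (non-LLM) agent would match against.
TIMER_PATTERN = re.compile(r"^set a timer for (\d+) (second|minute|hour)s?$")
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def handle_locally(text: str) -> Optional[str]:
    """Return a response if the utterance matches the known pattern, else None."""
    match = TIMER_PATTERN.match(text.strip().lower())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return f"Timer set for {amount * UNIT_SECONDS[unit]} seconds"


def process(text: str, llm_fallback: Callable[[str], str]) -> str:
    """Try the cheap local matcher first; only call the LLM when it fails."""
    local = handle_locally(text)
    return local if local is not None else llm_fallback(text)
```

With this, "set a timer for 30 minutes" never touches the LLM, while something like "2 hours, 3 minutes and 6 seconds" misses the pattern and is handed to whatever `llm_fallback` you supply. The slow model only costs you time on the requests the simple path can't handle.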
I'll have a look thank you!
A 3060 12gb can run 8b models just fine and is pretty responsive
That's what I do
But the 8b models are kinda stupid
And for the 14b models you'd need more vram unfortunately
Whisper & piper also need some vram
I use it with German too and it's decent enough
how much vram does a 14b model need?
12-16GB
considering your other services also need some, a 2060/3060 12GB won't cut it
4060 has 8GB vram
I tried running qwen 2.5 14B on mine and it's slow af
i am considering a 4060ti with 16gb for the server. hoping i have enough room for whisper too
I use qwen 2.5 8b
here is my nvtop
you see ollama & whisper
4060ti with 16GB vram might just do it
(ignore the 2 ffmpegs they just do video stuff)
yeah, some background transcoding will happen for me too but that doesn't use a lot of vram
I got the 3060 for like 220€ used
my brother-in-law was talking about upgrading his 3090 with 24gb. my hope is that he does this and i can lay claim to the old one
if i get a 4060ti now then if i randomly get the 3090 later i can move the 4060 to my desktop which has a 3070 currently
that could be an insane deal!
some more 50 series cards are announced next week. i am hoping that shakes up some deals on 4060ti's
yeah probably but might shake up a few ebay opportunities
maybe I should look for one too, I wish I could run 14b models tbh
you might even be able to run qwen 2.5 32b with a 3090
that'd be so wonderful if this finally worked. was googling around for it a lot, but only found feature requests with curiously few upvotes 😦 Edit: I just found one I'd not seen yet and which is already kind-of working! https://community.home-assistant.io/t/assist-create-reminders-by-voice/707287/12 🙂
I'm doing this on a measly GTX 960 running llama3.2 with the fallback option, by the way. Anything that combines traditional pre-processing with partial AI processing works great. The "remind me to do x at y" linked to above has a combined response time (STT + LLM + TTS) of under 20 seconds, which is practical enough.
NAME                           ID              SIZE     PROCESSOR    UNTIL
qwen2.5:14b-instruct-q4_K_M    7cdf5a0187d5    15 GB    100% GPU     Forever
That's with 32K context, flash attention and q8_0 KV cache.
According to nvtop the process uses 12130MiB or 11.85G. I hope that answers that.
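For anyone wanting to sanity-check numbers like these before buying a card, here is a back-of-the-envelope estimate of weights plus KV cache. The architecture values (48 layers, 8 KV heads, head dimension 128) are my assumption from the published Qwen2.5-14B model card, and the 9 GB weight size is an assumed approximate figure for the q4_K_M file; none of these were stated in the thread.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_element: float) -> float:
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_element


# Assumed values (not from the thread): Qwen2.5-14B architecture per its model card.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128
CONTEXT = 32 * 1024        # "32K context"
Q8_0_BYTES = 34 / 32       # q8_0 stores 32 int8 values plus one fp16 scale per block
WEIGHTS_BYTES = 9.0e9      # assumed approximate size of the q4_K_M weights

MIB = 1024 ** 2
kv_mib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, Q8_0_BYTES) / MIB
total_mib = kv_mib + WEIGHTS_BYTES / MIB
print(f"KV cache: {kv_mib:.0f} MiB, weights + KV cache: {total_mib:.0f} MiB")
```

Under those assumptions the estimate comes out a little below the 12130 MiB that nvtop shows, with the rest plausibly being compute buffers and runtime overhead. Halving `CONTEXT` halves the KV cache part, which is the quickest lever if the card is too small.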
oh about 12g... see now that's not fair. now i definitely have to buy a 16gb card 😛
Or use less context 🙂
i want overhead space for whisper too anyway
You'd need it anyway. Anything lower than that won't hold a decent model, and you will face constant hallucinations...
16gb card should be fine for a 14b model and whisper it seems
7B models are proven to be useless with context...
I'm currently using > 16G for LLM, whisper and piper so you might want to plan for more. Get a 3090 or something. Not that piper would absolutely need a GPU. I only did it because I can.
At this stage, think about Mac Mini 🙂
apple can fuck right off
What's the 15 GB figure in your screenshot?
That's a single GPU? o_O
That's the output of ollama ps. That's the amount of VRAM it thinks the model uses.
What do you mean? All the voice stuff I listed is running on a single 3090. > 16G is the total usage for all I listed. I can share more details later if you're curious.
I just didn't think there were gaming GPUs with that much VRAM yet
Is it possible to use multiple smaller-VRAM GPUs together btw?
Well the 5090 has 32G but I can't and don't want to spend whatever they ask for it . Yes, ollama can use multiple GPUs.
there is a 3090 with 24gb
running multiple 12g cards is also possible. but i am not sure i have the pci slots available with the other cards in that server
ok, this thread actually answered a lot of my questions.
sadly it seems like my plan to run a llm off a p4 will be a dud lol
is it more of a ram or processing speed issue?
and would more ram on the system help make up for that instead?
Hi. I've been following this excellent thread. Do you do any cpu over clocking on the pc with the 3090? I have one of the first Gen of the with 16GB, and it has a big power supply. Not sure of the electricity tradeoff vs going to chatgpt.com (local vs cloud notwithstanding). My pc has 64gb ram, as it was built for music production
speaking as someone with a desktop with a 3090 you dont want to run that thing constantly unless you have cheap power or the cost isnt a huge concern
also idk about you but I had to waterblock mine to keep it cool XD
Thanks, that's why I asked. No cheap electricity... In NY
yeah, im in the process of a massive downsize of my lab and trying to reduce power usage
once I have the money the plan is to get rid of all my servers and replace them with an epyc system
was planning to try and run a ha llm off a tesla p4 but based on this thread that will be an issue
also I didnt realize the 3090 had a 16g version
your power may be lower than mine because of it
I think mine draws 300-350w under load
yeah the 4060ti seems to be the more sensible option power wise i think
I initially bought a 3060 because it was cheaper and has more memory bandwidth than the 4060 Ti.
Has anyone here tried the llm with a P4? Curious how much I could hope to get if I used 1-2 of them
My server doesnt have any power cables for a gpu which limits what I can do a bit.
Also would really prefer to not have to deal with the extra power draw of a high end card, not sure I can do it on my breaker setup in my new apartment
with a P4 (if that's supposed to mean Pentium 4) the problem is so much with processing speed that it's not even funny anymore
think he means the NVIDIA p4
oic
Tesla p4
that i would love to have instead of a 2GB GTX 960 🙂
Yeah, they are on the older side, but im a bit limited since my server doesnt support any cards that need psu power, only slot power
8gb ram but like you said, older architecture and definitely low processing power compared to modern cards
if you already own one then go with it and see whether it ends up being good enough for you. a lot of difference in response time and RAM usage between number of exposed entities and size of loaded model
Ideally I could grab an A2000 Ada, but that would run me like 900 USD including the single slot mod
I have one thats currently in use for another application. I dont have the voice assistant yet, but will prob grab one in the next few months after I finish this move and get settled. I can use my current p4 to test if I need something more powerful or should grab a second one for the llm
I have so much lab gear I need to sell when I move, and that should give me the money for all these fun toys lol
If anyone wants 150 hdds lmk lmao
starting to wonder what kind of lab that is