#current state of HA voice assistants?

icy bison
#

I still use Amazon Alexa currently, though all I use it for is alarms and timers (for waking up in the morning, cooking, etc.), and I'd happily ditch it for an HA-based solution.
I did set up a voice assistant in the past (a year or two ago), but it didn't quite do the job, so I stuck with my Amazon one.
Have things changed since then? If so, what setup (hardware and software) is generally recommended? I already use Sonos speakers to output other audio from HA; ideally I'd like to use them here as well. I run HA on a Pi 4 with an SSD.

sturdy veldt
# icy bison I still use Amazon Alexa currently, though all i use it for is alarms and timers...

so voice has come a long way. but is it a direct replacement for an Alexa? maybe not.
the Voice PE satellite is good and can output via its 3.5mm jack, so if your speakers have an input you can easily pass audio to them instead of using the built-in speaker.
for the actual voice processing you can probably run "Speech-to-Phrase" on an RPi, which is a little limited as it only looks for specific patterns based on your entities.
if you want full speech-to-text then you will need something more powerful, or use a cloud service

icy bison
#

This is my one and only use case for alexa anyways

#

So if that worked then it could replace it for me

sturdy veldt
#

I know it should do timers pretty well via STP (Speech-to-Phrase) as it has most of the expected inputs. don't expect to be able to do "2 hours, 3 minutes and 6 seconds", but "45 minutes" should be fine

#

i am not sure about alarms at a specified time, although i would probably do that via an automation anyway

sturdy veldt
#

"okay nabu, set a timer for 30 minutes" - works great
I don't think alarms will work that way, though, but you could set up an automation to trigger on your work days at a specified time

#

something like this

#

then set the action to TTS or play music or whatever
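
As a rough illustration of the logic such an automation encodes (a time trigger plus a workday condition), here is a minimal Python sketch. It is not Home Assistant configuration; the function name `alarm_should_fire`, the Monday-to-Friday workday set and the 06:30 alarm time are all assumptions made for the example.

```python
from datetime import datetime, time

# Assumption: workdays are Monday (0) through Friday (4); public holidays are ignored.
WORKDAYS = frozenset({0, 1, 2, 3, 4})

def alarm_should_fire(now: datetime, alarm_time: time, workdays=WORKDAYS) -> bool:
    """Return True when `now` is on a workday at exactly the alarm's hour and minute."""
    on_workday = now.weekday() in workdays
    at_alarm_time = (now.hour, now.minute) == (alarm_time.hour, alarm_time.minute)
    return on_workday and at_alarm_time

# Hypothetical alarm time for the example.
if alarm_should_fire(datetime.now(), time(6, 30)):
    print("Wake up!")  # in HA this would be the TTS or media action
```

In a real setup the check would be expressed as an automation trigger and condition, and the `print` would be replaced by the TTS or music action mentioned above.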

icy bison
#

Looks like i could finally ditch my echo dot for good

#

Regarding the Home Assistant Voice device, is its microphone good enough to properly pick up my voice from anywhere in the room (like the Amazon one)?
I could install additional satellite mics, but I'd prefer not to ofc

#

Though I would prefer to have my existing Sonos speakers output the voice assistant's sound directly, not via the 3.5mm jack

sturdy veldt
#

the mics are pretty good. realistically they are not as good as the amazon ones, but most people find them fine.

you "CAN" get the TTS responses redirected to another media player, but it's unsupported and requires you to mess with the firmware a bit, and there are some issues that can arise from it. using the jack is the best way to interface it with other speakers

sturdy veldt
#

for just TTS the internal speaker is fine. it's only if you want to start playing music on it that it becomes an issue. but i guess you will be playing music via your other speakers anyway

icy bison
#

Thank you my Friend!

#

Do you know of a budget-friendlier alternative to the HA voice? The 70 bucks is fine for me if it works well, but maybe there's an option that e.g. lacks a speaker and costs less.

sturdy veldt
#

if you want to get something cheap to test stuff out before getting something better like the VPE, then it's not a bad option

nova spoke
#

Also it would solve the speaker problem. 😉

#

But be aware that only 4 Ohm speakers are supported, not 8 Ohm (same for the VPE DAC too).

sturdy veldt
#

oh yeah that's a shout. if you were thinking of building your own anyway and have some stuff, the ReSpeaker boards might be a direction

icy bison
#

My Ollama server can only run tiny to small models at acceptable speed (like 3B). 7-8B is possible, but too slow for a voice assistant

haughty flower
#

Will i be fine without any additional LLM?
you're the only one who can answer that. you have to look at the ratings for the individual building blocks instead of the pipeline as a whole

#

first off, what's your language? English?

haughty flower
#

You don't need LLMs to set an alarm (which is not supported by default). You can define a custom intent + custom sentence combo or (maybe less likely, considering the numbers involved) an automation with a sentence trigger

There are good and bad STTs for German, local and cloud. There are many good TTSs, both local and cloud

You have the M5Stack already. Test it

sturdy veldt
#

you would need to probably set up whisper on your other machine to do the STT though, as STP doesn't support variables

#

or use cloud for STT

icy bison
# sturdy veldt you would need to probably set up whisper on your other machine to do the STT th...

Oh, I already got everything working in the past, but I used an LLM with it that made it either too dumb (3B model) or too slow (7B model)

My hardware hasn't changed (Vega 8 iGPU on my Proxmox server), so i need to either change the LLM part accordingly, outsource it (e.g. OpenAI API) or leave it out entirely

This is why i'm asking where the limits are in terms of having my system understand me and act accordingly (either without LLM, or with small-ish ones)

sturdy veldt
#

so you can set up "local processing" with LLM fallback.
it will try to deal with the request using basic Assist first, and if it can't work out what you mean it will ask the LLM if it knows what's going on
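
The "local first, LLM as fallback" idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Assist is implemented: the regular expression, `handle_locally`, `respond` and the stand-in `fake_llm` are all hypothetical names made up for the example.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern for the illustration; real Assist sentences are far richer.
TIMER_PATTERN = re.compile(r"set a timer for (\d+) (second|minute|hour)s?", re.IGNORECASE)

def handle_locally(text: str) -> Optional[str]:
    """Try to match a known sentence pattern; return None if nothing matches."""
    match = TIMER_PATTERN.search(text)
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2).lower()
    suffix = "" if amount == 1 else "s"
    return f"Timer set for {amount} {unit}{suffix}"

def respond(text: str, llm: Callable[[str], str]) -> str:
    """Prefer the cheap local matcher and only call the LLM as a fallback."""
    local = handle_locally(text)
    return local if local is not None else llm(text)

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (for example to an Ollama server)."""
    return f"LLM handled: {prompt}"

print(respond("okay nabu, set a timer for 30 minutes", fake_llm))
print(respond("remind me to water the plants", fake_llm))
```

The point of the design is latency: common commands never touch the model, so a slow 7B model only costs time on the requests the pattern matcher cannot handle.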

long aspen
#

A 3060 12gb can run 8b models just fine and is pretty responsive
That's what I do

#

But the 8b models are kinda stupid

#

And for the 14b models you'd need more vram unfortunately

#

Whisper & piper also need some vram

#

I use it with German too and it's decent enough

long aspen
#

4060 has 8GB vram

#

I tried running qwen 2.5 14B on mine and it's slow af

long aspen
#

I use qwen 2.5 8b

#

here is my nvtop

#

you see ollama & whisper

#

4060ti with 16GB vram might just do it

#

(ignore the 2 ffmpegs they just do video stuff)

sturdy veldt
#

yeah, some background transcoding will happen for me too but that doesn't use a lot of vram

long aspen
#

I got the 3060 for like 220€ used

sturdy veldt
#

my brother-in-law was talking about upgrading his 3090 with 24gb. my hope is that he does this and i can lay claim to the old one

#

if i get a 4060ti now then if i randomly get the 3090 later i can move the 4060 to my desktop which has a 3070 currently

long aspen
#

that could be an insane deal!

sturdy veldt
#

some more 50 series cards are announced next week. i am hoping that shakes up some deals on 4060ti's

long aspen
#

eh, prob not for a while

#

stock is terrible after all

sturdy veldt
#

yeah probably but might shake up a few ebay opportunities

long aspen
#

maybe I should look for one too, I wish I could run 14b models tbh

#

you might even be able to run qwen 2.5 32b with a 3090

grave cave
# icy bison Regarding the "alexa replacement", for me personally it would qualify as one if ...

that'd be so wonderful if this finally worked. was googling around for it a lot, but only found feature requests with curiously few upvotes 😦 Edit: I just found one I'd not seen yet and which is already kind-of working! https://community.home-assistant.io/t/assist-create-reminders-by-voice/707287/12 🙂

I'm doing this on a measly GTX 960 running llama3.2 with the fallback option, by the way. Anything that combines traditional pre-processing with partial AI processing works great. The "remind me to do x at y" linked to above has a combined response time (STT + LLM + TTS) of under 20 seconds, which is practical enough.

austere fog
# sturdy veldt how much vram does a 14b model need?
NAME                           ID              SIZE     PROCESSOR    UNTIL
qwen2.5:14b-instruct-q4_K_M    7cdf5a0187d5    15 GB    100% GPU     Forever

That's with 32K context, flash attention and q8_0 KV cache.
According to nvtop the process uses 12130MiB or 11.85G. I hope that answers that.
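
A back-of-the-envelope check of these numbers: the KV cache grows linearly with context length, so it can be estimated as 2 (keys and values) × layers × KV heads × head dimension × context tokens × bytes per element. The sketch below does that arithmetic. The architecture values (48 layers, 8 KV heads, head dimension 128) are an assumption for a Qwen2.5 14B class model and are not stated in the thread; treating q8_0 as 1 byte per element is also an approximation that ignores the format's small per-block overhead.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_element):
    """Estimate KV cache size: keys and values for every layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element

# Assumed architecture values for a 14B-class model with grouped-query attention.
full_context = kv_cache_bytes(48, 8, 128, 32 * 1024, 1)
half_context = kv_cache_bytes(48, 8, 128, 16 * 1024, 1)

print(f"32K context: {full_context / GIB:.1f} GiB")
print(f"16K context: {half_context / GIB:.1f} GiB")
```

Under these assumptions the cache alone is roughly 3 GiB at 32K context on top of the quantized weights, and it halves when the context is halved, which is the main knob for freeing VRAM for Whisper on a smaller card.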

austere fog
#

Or use less context 🙂

sturdy veldt
#

i want overhead space for whisper too anyway

sturdy veldt
#

16gb card should be fine for a 14b model and whisper it seems

nova spoke
#

7B models are proven to be useless with context...

austere fog
#

I'm currently using > 16G for LLM, whisper and piper so you might want to plan for more. Get a 3090 or something. Not that piper would absolutely need a GPU. I only did it because I can.

nova spoke
#

At this stage, think about Mac Mini 🙂

sturdy veldt
#

apple can fuck right off

austere fog
# grave cave That's a _single_ GPU? o_O

What do you mean? All the voice stuff I listed is running on a single 3090. > 16G is the total usage for all I listed. I can share more details later if you're curious.

grave cave
#

Is it possible to use multiple smaller-VRAM GPUs together btw?

austere fog
#

Well the 5090 has 32G but I can't and don't want to spend whatever they ask for it. Yes, ollama can use multiple GPUs.

grave cave
#

crazy

#

good to know though

sturdy veldt
#

there is a 3090 with 24gb

#

running multiple 12g cards is also possible. but i am not sure i have the pci slots available with the other cards in that server

half nymph
#

ok, this thread actually answered a lot of my questions.

#

sadly it seems like my plan to run a llm off a p4 will be a dud lol

#

is it more of a ram or processing speed issue?

#

and would more ram on the system help make up for that instead?

near plank
#

Hi. I've been following this excellent thread. Do you do any CPU overclocking on the PC with the 3090? I have one of the first Gen of the with 16GB, and it has a big power supply. Not sure of the electricity tradeoff vs going to chatgpt.com (local vs cloud notwithstanding). My PC has 64GB RAM, as it was built for music production

half nymph
#

speaking as someone with a desktop with a 3090: you don't want to run that thing constantly unless you have cheap power or the cost isn't a huge concern

#

also idk about you but I had to waterblock mine to keep it cool XD

half nymph
#

yeah, im in the process of a massive downsize of my lab and trying to reduce power usage

#

once I have the money the plan is to get rid of all my servers and replace them with an epyc system

#

was planning to try and run an HA LLM off a Tesla P4, but based on this thread that will be an issue

#

also I didnt realize the 3090 had a 16g version

#

your power may be lower than mine because of it

#

I think mine draws 300-350w under load

sturdy veldt
#

yeah the 4060ti seems to be the more sensible option power wise i think

austere fog
#

I initially bought a 3060 because it was cheaper and has more memory bandwidth than the 4060 Ti.

half nymph
#

Has anyone here tried running an LLM with a P4? Curious how much I could hope to get if I used 1-2 of them

#

My server doesnt have any power cables for a gpu which limits what I can do a bit.

#

Also would really prefer to not have to deal with the extra power draw of a high end card, not sure I can do it on my breaker setup in my new apartment

vestal bane
#

think he means the NVIDIA p4

grave cave
#

oic

vestal bane
#

Tesla p4

grave cave
#

that i would love to have instead of a 2GB GTX 960 🙂

vestal bane
#

Yeah, thing is it's older architecture so not sure how well it'd do

#

Pascal I think

half nymph
#

Yeah, they are on the older side, but I'm a bit limited since my server doesn't support any cards that need PSU power, only slot power

#

it has 8GB of VRAM, but like you said, it's an older architecture and definitely low on processing power compared to modern cards

grave cave
#

if you already own one then go with it and see whether it ends up being good enough for you. response time and RAM usage vary a lot with the number of exposed entities and the size of the loaded model

half nymph
#

Ideally I could grab an A2000 Ada, but that would run me like 900 USD including the single-slot mod

half nymph
#

I have so much lab gear I need to sell when I move, and that should give me the money for all these fun toys lol

#

If anyone wants 150 hdds lmk lmao

grave cave
#

starting to wonder what kind of lab that is