#Best solution for HA install with local voice

1 messages Β· Page 1 of 1 (latest)

sand hazel
#

I know this question has been asked a thousand times but i can't seem to get the answer im looking for from reading countless posts.

What i have right now is HA on a desktop but i want to get rid of that and do a low power solution. I've been thinking of installing HA OS on a mini pc and i'd prefer to not do virtualization as that is what i have going now on my desktop and generally it's been fine but doing updates etc means restarting ha. I'd rather have a mini pc be dedicated soley to HA so it's quicker. But what i also want to do is local voice, i don't mean local llm as the more i read the more complex and confusing the answers become and there never is a good answer for it. i'm guessing i will run piper and faster whisper as i dont really need llm as i don't ask ridiculous questions that would justify the additional cost in hardware.

I'd like to put all this on one machine i wont have to worry about. was thinking a beelink maybe and doing the ha os install but i'm open to ideas on the best device. I will be running MA as well as it just seems to work reliably now. I might use grok or chat gpt as a fall back for more confusing questions as it would be cheaper.

marble shoal
#

If you want local voice that is on par or close to on par with Alexa/Google, you'd want to get a GPU and do a CUDA accelerated docker container for wyoming-whisper, and run the whisper large-v3 or large-v3-distil model. If you run that on just CPU it will be painfully slow πŸ˜„

sand hazel
#

your taking me down the wrong road lol.so your saying local piper and faster whisper would be crap on a mini pc?

#

your method has me picturing a room full of servers. I only want local assist for basic everyday commands. i can use cloud as fallback if need be

quartz forum
# sand hazel your method has me picturing a room full of servers. I only want local assist fo...

if you only want basic commands then speech to phrase works well on lower end hardware but its limited to specific commands and cant fallback to llm.

if you want full STT using whisper then only smaller models will run with any level of responsiveness on a mini pc and you may find you get more errors in the text. running a larger model to get less errors requires more serious hardware.

sand hazel
#

i've never used dockers etc and it sounds like a new road. what about a coral tpu in a mini pc?

quartz forum
#

whisper cant make use of a coral tpu

#

coral doesnt have alot of memory which is needed. coral is good for smaller object inference jobs on stuff like frigate

marble shoal
#

Don't need a room of servers lol, just a GPU that docker can make use of.

#

I have one box that runs everything in my house, and that is running a k8s cluster lol, so not a room full of servers πŸ™‚

quartz forum
#

for whisper, the tiny model should work fairly well on something like a n100 mini pc but you millage may vary on error rate

sand hazel
#

so what would be a good solution for the large v3 in terms of hardware? i seriously have no experience running docker..windows guy..yea ..i know

marble shoal
#

docker can run in windows via Docker Desktop, though most people run it on a dedicated linux server πŸ™‚

#

GPU doesn't have to be crazy, especially if all you want is STT/TTS

#

many people run used GTX 1070 or GTX 1080 GPUs from ebay that you can get for like $100 USD

quartz forum
#

yeah i currently use a 1650 super for whisper which is overkill for it tbh but it also does transcoding and stuff for me

sand hazel
#

my desktop uses a 2060 super

#

my thing is wanting to get ha on a mini pc in general. is there a way to have whisper on the desktop just for that so ha isn't effected

quartz forum
#

you could run docker plug whisper on that and offload the STT to your desktop. but you will obviously eat up a chunk of your vram. so if you play games you would be impacted

marble shoal
#

Yeah whisper doesn't have to be on the same box as HA

#

fo rinstance my HA runs in it's own VM, and my whisper runs on my k8s cluster, and HA just accesses it remotely

#

wyoming protocol handles all that πŸ™‚

sand hazel
#

any good tutorial for idiots on how to do that lol

quartz forum
#

you could try running HA + whisper on something like a jetson nano too i guess. that would be low power and combined

sand hazel
#

confused on what that is vs something like a beelink

#

ah and it's not avail in the u.s.

quartz forum
#

the CPU is more designed to be a bit more like a GPU in terms of processing this sort of thing.

#

i am sure you could find a US supplier

#

although you would need one with at least 8gb of memory to do both, that one i linked has 4 i think

#

but was more of an example of a possible route

#

it depends on your exact requirements. if you go for somethign a big bigger then a beelink. like a thin client with a pcie slot you could add a gpu

sand hazel
#

those seem a bit expensive just to be able to attach a gpu

sand hazel
quartz forum
#

i haven't run docker on windows really. i have it on 1 server i have but i run the linux version through WSL instead of the windows "docker desktop" thing.

#

i am sure theres a million youtube videos on the subject though

sand hazel
#

i'll have to look in to setting that up via docker then

#

i tried large v3 on my vm but it eats 80% of the ram in the vm. so yea docker looks like the way

quartz forum
#

i run large v3 turbo on my 1650 and its pretty great

#

thats in one of my servers though πŸ˜›

#

i have a rack in the cupboard under the stairs with all my servers and most networking equipment

sand hazel
#

for fun i'm trying ollama3.2 but god is it slow

#

i have 12gb of vram why does it take 10 seconds for an answer o_o

#

i have 237 entities exposed, response on ollama server is fast as hell. why such the slow down?

marble shoal
#

Context window most likely

#

When you chat using ollama, the context is basically just the system prompt

#

in HA it adds entity data, general prompt, tools, scripts/automations that are exposed, etc. All eats into the context

sand hazel
#

so decreasing exposed entities should decrease load time

marble shoal
#

It may,

sand hazel
#

should i change the context size in ollama itself or leave it at default?

marble shoal
#

well you can set it on the integration config in HA

#

in ollama it's just a default context, API can adjust it. But be aware increasing context window increases vram usage

sand hazel
#

is 12gb just not enough in general

marble shoal
#

there's not really a one size fits all answer for that, it really depends on the model and its size, the quantization, flash attention, context window quantization, context window size, exposed entities/tools, etc.

#

Lots of experimentation lol. But bascally if you want to do serious LLM stuff I think alot of people are gonna say to get at least 16GB, and if you are really serious then something like a 3090/4090 with 24GB Vram, or if money is no object, a 5090 with 32GB 🀣

#

The above calculator I linked lets you input the model, context window desired, and quantization, and gives an idea of how much VRAM is needed.

plush fulcrum
marble shoal
#

for STT/TTS maybe, dunno how LLM would do. With LLM at least, memory bandwidth is one of the biggest speed/performance factors. GPU's excel here because their memory bandwidth is on the order of several hundred gigabytes per sec, and on stronger GPUs it's over 1TB/sec. On these unified memory devices, the bandwidth tends to be arond 200-300gbps

plush fulcrum
#

I just really don't want to put another full-fledged PC in my house. Something small in the corner is much better. πŸ™‚

marble shoal
#

Yeah like I said it depends on the goal πŸ˜‰ For STT/TTS it's probably fine, those models aren't generally as crazy and demanding as an LLM. But if you wanted to, for instance, run a 32b LLM or even a 14B, it might not be super fast. That's more or less what I was trying to get at there πŸ˜‰

#

If your focus is lower power, small form factor, and mainly just for STT/TTS then yeah it's probably not a bad choice πŸ™‚

sand hazel
#

Would running faster whisper on the large v3 model in a docker on the desktop work fine and run ha on a mini. Tried the large v3 in the vm but it was hitting constant 85% cpu which is limited by the vm of. I'm just thinking a vm is to limiting when running something like that. As I said before no experience with docker but if putting it in there via docker desktop does it utilize then entire capability of the PC? Assume I'm a moron with this plz lol

I'm giving up on llm. I don't think it's ready for prime time. Was trying Allama for windows and pointed ha to it but aside from response times, it wasn't turning things on and off like it said it was doing. For instance "it's too bright in the office" shed respond something like "I have adjusted the brightness lower to a more comfortable setting" nothing happens. Sometimes it would even turn a completely different light on

sand hazel
quartz forum
plush fulcrum
plush fulcrum
quartz forum
#

I have said it before and ill say it again... "apple can fuck right off"

#

ill admit they got something close to reasonable with the latest mini until you decide you might need something higher than the lowest model and they are like "oh you want to use it for actual stuff, well that gunna cost you a kidney"

sand hazel
quartz forum
#

no idea, specially for windows docker

sand hazel
#

ok so tried ollama again and seems to be a little less dumb then it was yesterday. got my exposed entities down to 78. however it cant change the temp in the home saying it needs more information. how do i point it to know the thermistat? or would i write something in instructions for it to understand. curious about some good examples for that. and the time response is like 6 seconds with my 2060. if i decide to just do ollama 3.2 is this the best model right now for HA?. it's running via windows install and verified from resources when it's processing a command the gpu spikes. there is so much info out there to the point it is mind boggling to figure out as this is only a part time thing for me and i piss the wife off as it is messing with it this much lol

#

one thing im trying to figure out though is why it sometimes gets lights wrong all together like the room etc. anyone know why? they are in their own room via HA

sand hazel
#

when installing ollama there are model options. is 3.2 best or is there a more reasonable one?

plush fulcrum
sand hazel
#

thats alien to me lol

plush fulcrum
#

Basically llama 3.2 with 3B parameters and 14B parameters will behave absolutely differently.
For HA you want at least 14B. I tried 7B models and they're unreliable most of the time.

jagged tundra
#

qwen 2.5 7b is okayisch
Can't really run qwen 2.5 14b on your 2060
Assuming it has 12GB vram

sand hazel
#

how do i see if its 3b or not?

plush fulcrum
#

Will show you your models

jagged tundra
#

Assuming you didn't specify a specific xb it's gonna be 3b since that's the default

plush fulcrum
#

Yeah

#

And there's nothing better for 3.2

#

So use 3.1

jagged tundra
#
#

Or qwen 2.5

plush fulcrum
#

But as @jagged tundra said, qwen 7B seems to be the best of 7B models.

jagged tundra
#

Should use like 8gb vram of your 12gb vram I think

plush fulcrum
#

Just for model

jagged tundra
#

For a better model you'd need like a 4060 ti (16gb vram)

plush fulcrum
#

Plus couple gigs for context, if "control" is on

jagged tundra
#

And that should just about run the 14b version

#

Yeah you absolutely want tools support

plush fulcrum
#

I use 7b model on CPU, no control.

#

Just for fallback

#

Because even 14B models are hit or miss

jagged tundra
#

I am seriously considering swapping my 3060 12GB for a higher tier card with more vram to run 14b models

jagged tundra
plush fulcrum
#

So I don't want them hallucinate and turn on something I didn't meant to.

sand hazel
#

in the model list i see qwen2.5 but i dont see 7b

plush fulcrum
#

Local AI is just not ready, from my perspective.

jagged tundra
#

7b is the default for qwen 2.5

sand hazel
#

ok ok

jagged tundra
sand hazel
#

so it would work better then the 3.2 default?

jagged tundra
#

Since it often understands garbage when I don't talk extra clear and slow

jagged tundra
plush fulcrum
jagged tundra
#

That's what I run right now

sand hazel
#

maybe thats been my problem

plush fulcrum
#

7B is decent for casual speaking

#

3B is child with mental issues

jagged tundra
#

I had the 7b make up stories and play "games"

sand hazel
#

is the processing time much longer?

jagged tundra
#

It will be longer yes

#

Especially since you are on a 2060

#

But at least it will be usable, just give it a try

sand hazel
#

so if its to long whats a good gpu for that model that doesnt break the bank. assuming 4060 16gig

#

so do you think the problem with it not turning on the right lights was due to the 3.2 model mostly im an idiot with ai lol

jagged tundra
quartz forum
jagged tundra
#

And then I recommend using the local assist phrases

#

Then there won't be an LLM in the mix and it can't get stuff wrong

jagged tundra
sand hazel
#

so do people run multiple cheaper gpu to achieve this? last time i did that was back in the day with sli

jagged tundra
#

Most people are probably just burning way too much money on this

sand hazel
#

i do have nabu going i was using for fall back so if i prefer local it will want o use the llm all the time right

#

thats what im afraid of doing

quartz forum
sand hazel
#

youtubers make it seem like it works easily. kinda ticked with chuck as he obviously did a ton of editing in his vid for it

quartz forum
sand hazel
#

its a shame really cause it makes ha look bad in my opinion

jagged tundra
plush fulcrum
jagged tundra
plush fulcrum
plush fulcrum
sand hazel
#

is there a set of instructions i should add that might help it in conjunction with the default ha adds

#

last time i read ha was working with nvidia for an llm specifically for ha. anyone know if thats true

plush fulcrum
sand hazel
#

god i see that one being useful already lol

plush fulcrum
jagged tundra
quartz forum
sand hazel
#

whats with the mac minis lol

quartz forum
#

i have a few custom intents as sentence triggers in automations

jagged tundra
jagged tundra
quartz forum
sand hazel
#

ah. does seem useful. never been a mac guy

quartz forum
#

plus they probably hold their value more than other systems so they can resell them if then need to

jagged tundra
#

The framework desktop seems like a good alternative
If you don't mind the price 🀣

quartz forum
sand hazel
#

lol

jagged tundra
#

Up to 128gb unified memory!

#

"only" 2400€

quartz forum
#

the price is not totally unreasonable to be fair

jagged tundra
#

Absolutely, that memory is expensive

#

If you paid the apple tax it would be like 6000€

plush fulcrum
#

Yeah, Framework is also ARM and using unified memory, how could i forget...

#

2.5k Euro for 128 GB is great price.

jagged tundra
#

Framework is not using arm on this chip
It's an amd chip

#

Or did I miss something

plush fulcrum
#

Does x86 support unified memory o_O

jagged tundra
#

Apparently

#

AMD Ryzenβ„’ AI Max 300 Serie

#

This is an x86 chip

sand hazel
#

is it normal to sometimes get code in the response

jagged tundra
#

Make your context bigger

sand hazel
#

its 8192

plush fulcrum
jagged tundra
#

I have it at like 4 times of that

#

32k something

sand hazel
#

o.o

#

wont that make a response take like 2 minutes to process

jagged tundra
#

The context size needed will depend on how many devices you expose to assist

sand hazel
#

78

jagged tundra
#

That's a lot

#

More devices need bigger context

#

If you look into what home assistant sends ollama you will basically see it dumps all the devices as yaml

sand hazel
#

well thats another question i want to ask. i originally had like 230 entities. what devices do you guys usually expose. light bulbs i get

jagged tundra
#

Just try doubling it and see how it goes

plush fulcrum
jagged tundra
plush fulcrum
#

But bulbs can be supported by inbuilt intents

#

Oh damn

sand hazel
#

and 78 is a lot lol

plush fulcrum
#

There's no separate exposal, i just realized

jagged tundra
#

You could also group devices together like lights to reduce the number of entities exposed to assist

#

In home assistant you can create a helper group entity

sand hazel
#

group lights for a particular room right?

jagged tundra
#

Maybe not room, but certain lights in a room

#

Fully depends on what devices you have or want to expose

sand hazel
#

one thing im noticing is the llm not seeing the thermistat or being able to change temp

jagged tundra
#

Are you running qwen 2.5 7b now?

sand hazel
#

yes

#

told her to talk sassy and damn is she..

jagged tundra
#

I don't remember if there was a tool call for the temp change

sand hazel
#

response time isnt much longer then default 3.2 was

jagged tundra
#

You could check your ollama logs and see what tool calls are included in there

sand hazel
#

already wishing i had more vram though

jagged tundra
#

Just set that env var to 1
- OLLAMA_DEBUG=1

#

Then ollama should log the conversations

#

And you can see which tools home assistant presents to the model

sand hazel
#

wait where do i do that?

jagged tundra
#

Do you run ollama in docker?

sand hazel
#

in windows install

jagged tundra
#

Uh oh

#

Then you will need to set that environment variable in windows

#

I think if you search that in the windows search it should pop up

#

"environment variable"

#

Key should be OLLAMA_DEBUG
and value 1

#

After that restart ollama

sand hazel
#

i have no experience with docker but i do think i wanna do whisper and piper in it with docker desktop i think

plush fulcrum
#

At this stage you already need a Docker server, preferably on some separate Linux machine πŸ™‚

sand hazel
#

sigh..never ran linux either lmao..

#

fkin aye isnt windows good for anything..

plush fulcrum
#

That's not that bad.

#

Proxmox -> Docker LXC -> Portainer -> Docker Compose

#

Easiest way to go.

plush fulcrum
sand hazel
#

going to have to find tutorials on docker now lol

plush fulcrum
sand hazel
#

lxc?

plush fulcrum
#

No reason for Piper/Whisper, but Ollama could benefit

plush fulcrum
#

On Proxmox

sand hazel
#

well i have piper and whisper running as an addon on in a vm but its limited to the resources i allow the vm

quartz forum
jagged tundra
#

gpu passthrough into VM incoming!

#

Always fun

sand hazel
#

i hear a lot run in proxmox. this also makes me wanna get ha on a mini and run the llm and whisper on the desktop

sand hazel
jagged tundra
#

(I also run proxmox)
A dedicated VM for HA OS
& A dedicated VM where I run my docker containers

jagged tundra
# sand hazel how?

If you end up running proxmox and a VM you will have to pass the hardware into the VM
Which needs some extra setup

#

(works pretty good after you figure it out)

quartz forum
#

makes it so the graphics card is "plugged into" the vm

#

i have that setup

sand hazel
#

i heard its a pain to do

jagged tundra
#

It's okay, there is a great Reddit guide

quartz forum
#

not massivly, as long as your motherboard supports it

#

stick a line in the bootloader and add the device to the vm. i think that was about it?

#

its been a while since i set it up

#

speaking of proxmox

#

my servers want their updates apprantly

jagged tundra
#

Best thread!

sand hazel
#

sweet. i'll have to take a read on it tmrw

jagged tundra
#

Although this is for a windows VM
It's mostly similar for a Linux VM

quartz forum
#

yeah its for a vm config, the guest OS doesnt really mater

sand hazel
#

if doing ha install via the generic x86 method do you still get supervised?

quartz forum
sand hazel
#

sorry for picking your brains so much tonight but you guys have been really helpful

jagged tundra
#

You would install HA OS as a vm

#

So you'd get all the benefits

quartz forum
jagged tundra
#

I run that setup and it's so good tbh

quartz forum
#

haos has the supervisor but its not a "supervised install"

jagged tundra
#

I can also backup the vm at any point
Double backups so to speak

sand hazel
#

i was meaning if i put ha on a seperate low power device off the desk top. so i would do it in proxmox there to then

quartz forum
#

i run proxmox on my mini pc that runs HA

sand hazel
#

ok ok. im overthinking this lol

quartz forum
#

it also runs frigate

jagged tundra
#

Proxmox can run many little computers

jagged tundra
sand hazel
#

thats something else i wanna do is frigate for facial rec one day. but one thing at a time. tried blue but they hate reolink cams

jagged tundra
#

Everyone hates them, they aren't recommended with frigate either
But mine (reolink) do work no issue

sand hazel
#

they really are the best bang for the buck imo

quartz forum
#

i have a reolink on my frigate setup

jagged tundra
#

And they work 100% offline

#

Although I got played by their true colour night stuff
I wanted infrared, but that's on me for not reading :)

sand hazel
#

thats been my goal entirely. went from ring cams then got tired of them and ran ethernet in the attic and added reolink. havent looked back. only bad thing is once in a blue moon the ai goes on the fritz

#

i heard their cx cams are epic

#

lifehackster probably does the best reviews of reolink that ive seen

jagged tundra
#

I have the E1 outdoor CX

quartz forum
#

i think mine is the e1 outdoor pro maybe?

jagged tundra
#

The night vision without a spotlight is not as advertised lol

quartz forum
#

yeah mine has the spot

jagged tundra
#

I do too

sand hazel
#

well dang

jagged tundra
#

But I don't want to spook my turtles at night with a light

sand hazel
#

i heard theyre coming out with a duo version of cx

#

but they screwed the fov in duo 3 so not even sure id get it

#

personally i don't think the night vision is that bad on their cams in general. but coming from ring so i know thats not much of a basis lol

jagged tundra
#

This is at 3am right now

#

There is a bit of light from outdoors

sand hazel
#

that does seem bad

quartz forum
jagged tundra
#

To be fair, I am using them for close up shots which they are probably not intended for either

quartz forum
jagged tundra
#

Because they have no infrared

#

They just work with the surrounding light

quartz forum
#

ah right

#

i follow you now

jagged tundra
#

(which I miss interpreted when I got these)

quartz forum
#

yeah i can see why

sand hazel
#

my normal cam seems to look better and its just a trackmix.

jagged tundra
#

The pro are 30€ more, not too bad tbh

sand hazel
#

no night vision on but yours does look grainy

jagged tundra
#

But I decided it works well enough

#

They don't move in the night anyway, nothing to see!

sand hazel
#

i had hopes for the cx tbh

jagged tundra
#

This one is right next to it
Little bit better maybe

sand hazel
#

is that on high?

jagged tundra
#

High?

#

You mean brightness?

sand hazel
#

or clear i should say. highest bitrate

jagged tundra
#

Yes

sand hazel
#

thats a bummer. i was looking in to the cx410 to replace the 811a i have

sand hazel
#

Is it the norm right now to not give llm control? I tried with control and it wasn't recognizing custom sentences at all to activate automations. Is it best to just stay with whisper?

quartz forum
sand hazel
#

Ok. Well problem is I exposed entities but had to limit how much I exposed because of the load time. 40 at the moment. Still figuring out the best way to have it control lights without exposing them all singly. I didn't expose the automations no

#

Correct me if I'm wrong so the stt has more to do with assist not sounding like a robot but more life like?

quartz forum
plush fulcrum
#

@sand hazel in short:

  1. When you speak, first there's wake word. It waits for exact set of sounds and triggers the pipeline.
  2. After that, your voice goes to the STT engine (Whisper or NabuCasa or Porcupine etc.). That part of pipeline is responsible for generating text based on your words, because it's easier to process logic in text than in audio format.
  3. After that recognised text is sent to the conversation agent. It might be local intent engine in HA (with predefined set of sentences) or LLM or anything that will process the text and act on it.
  4. If the conversation agent responded with text, this text is sent to TTS engine (Piper or Nabu cloud or Kokoro etc.), that will generate audio file from that text with corresponding voice and intonation. Then this audio is sent to media player in the satellite for playback.
sand hazel
#

Is it possible to have llm in an automation give a response for a blueprint i have? for instance i have a tts tell me if i have low batteries but its a dumb response, like can llm "jazz" it up

#

Just dont want to hear the same response everytime i mean. i know i can add random to it but a llm response would be more realistic in a sense

quartz forum
plush fulcrum
sand hazel
#

you suspected correctly

quartz forum
#

brutal honesty is welcomed sometimes πŸ™‚

sand hazel
#

well i need to learn how to walk before i run

quartz forum
plush fulcrum
quartz forum
plush fulcrum
quartz forum
plush fulcrum
jagged tundra
#

As a German I can relate 🫑

sand hazel
#

whats the best model for writing yaml?

nimble tide
#

What's the best model for 16gb on a 5060Ti? I see the HA page recommendeds llama3.2 but didnt know which size or if people have tried other models

quartz forum