#Translation/aliases for areas
You can add aliases to areas, no problem. Settings > Areas, labels & Zones > Areas (default tab) > edit an area > Aliases section
Oh, nice. Thanks. Confusing that they are not exposed like entities.
But this won't work with Assist.
I have a PR to make it work (or at least get it started), but it's been waiting for review for several months...
What do you mean? Sure they work, that's how i've been using Assist from the beginning
Areas are not entities. I'm not sure what you would like to see
Ah sorry. They will work for default stuff. The problem I had is that Hassil isn't returning the area in the slot, but the exact alias that was spoken, and there's no way to use that in intent_script because there's no "get area by alias" method.
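To make the "get area by alias" gap concrete, here is a minimal sketch of what such a lookup could look like in plain Python. The `Area` class, the sample areas and both helper functions are hypothetical stand-ins for illustration; none of this is Home Assistant or Hassil API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Area:
    """Hypothetical stand-in for an area record (not a Home Assistant class)."""
    area_id: str
    name: str
    aliases: set = field(default_factory=set)


def build_alias_index(areas):
    """Map every case-folded name and alias to the area that owns it."""
    index = {}
    for area in areas:
        for label in {area.name, *area.aliases}:
            index[label.strip().casefold()] = area
    return index


def area_by_alias(index, spoken: str) -> Optional[Area]:
    """Return the area matching the spoken name or alias, or None."""
    return index.get(spoken.strip().casefold())


# Made-up sample data, including Swedish aliases.
AREAS = [
    Area("office", "Office", {"study", "kontor"}),
    Area("living_room", "Living Room", {"lounge", "vardagsrum"}),
]
INDEX = build_alias_index(AREAS)
```

With this, `area_by_alias(INDEX, "Kontor")` resolves to the area whose `area_id` is `office`, while an unknown name gives `None`.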
that's a different story and more of a tinkerer's issue than a common use case
I'd argue that the whole Assist thing is a tinkerer's issue right now. It just isn't ready to replace Alexa/GHome without lots of writing and tuning.
Overall the assist config is a bit all over the place. I guess that is what it comes down to. Setup is far from straightforward for getting it to be useful, imo, after having the PE for two days.
I expected it to be lacking in Swedish, but I haven't got it to understand a single command yet, even simple things like turning on lights. So I'm forced to use English at the moment, which requires me to set aliases for almost every device.
So far the experience has been quite a let down tbh. But I hope it will get better in the future.
what STT are you using? is it any good? you can check in https://my.home-assistant.io/redirect/voice_assistants/ > Debug
faster_whisper
is it any good? does it properly transcribe what you are saying? my best guess is no, which is why your experience is hindered
I don't know? But I guess not. What should I use instead?
I told you how to check: #1324731699082563625 message
In Whisper options you can increase beam size and other options to make it better. But it'll require beefier hardware.
you can use a larger Whisper model with larger beam size, but that's going to need a GPU
well... it won't need a GPU, but your patience probably will
Ok, I thought you only meant the stt. No it seems to be struggling a lot.
You can add alias "case that's office"
My wife has a name that is quite hard to pronounce correctly in English. Are there any... workarounds for that?
Besides a nickname
i meant in swedish, which i expect to be much worse
no, sorry
Is the tiny-int8 = whisper_faster?
ouch, that's small! it's bound to suck, unfortunately
set up a 1mo trial for NC cloud and see if you get much better results with the cloud STT
Just an assumption, since it states it here... and whisper_faster isn't listed in the dropdown
I could give my vm more cpu, but I would need to know the starting point to beef it up
How do I know which one is used?
whisper_faster (or faster_whisper) is the STT engine. tiny-int8 is the model used
you select the engine in the pipeline definition and the model in the engine settings. I understand you're running it as an addon (which is ok), so you select the model in the addon settings
Ok, thanks! But can I know which model is currently used when it is set to auto?
CPU helps, but only up to a point. larger models and larger beam sizes (which transcribe better) will eventually choke the CPU. you will need a GPU at some point as you use larger models, and a version of Whisper which can use that GPU
well, it says in the description: tiny-int8
i have no reason to believe anything else is used
you could probably check the addon logs and see if it says what model it loaded or is using
I assumed it would adapt to the hardware, from the definition of auto
but as tetele said, if you are running larger models on just CPU, expect to wait several seconds before it even converts your speech to text
Would I need to run Whisper in a separate vm for that?
if the response times are presently fast, I'd think it is running tiny or small
and yeah, you'd need to run wyoming faster-whisper in a separate VM, in Docker, with GPU exposed for best speeds/performance
or k8s if that's your thing
Nice. So you're pretty much bound to use a gpu or nb cloud to get anything useful out of it. Doesn't exactly say on the box.
not just NB cloud. there are other cloud providers with (custom) integrations
As in OpenAI and such?
Doesn't exactly say on the box.
it kinda does https://www.home-assistant.io/voice-pe#choice-to-voice
Google works well, IIRC
no, that's the conversation agent, not the STT. one second
well, yeah, it's not local
The recently announced voice stuff in 2023.5 is pretty neat, and of course both the local (whisper) and cloud speech-to-text are awesome. But the more choices we have the better, so I made an integration that allows using Google Cloud Speech-to-Text in HA. It's pretty fast, supports a ton of languages and can be included in an assist pipeline...
oh, wait, apparently now it's supported in the core integration https://www.home-assistant.io/integrations/google_cloud/#google-cloud-speech-to-text
STT is resource-intensive. I've said this many times before: you need to pay for it one way or the other. Your options are:
- with time (i.e. slow transcription)
- with hardware costs (i.e. get a GPU and run Whisper)
- with subscription/usage fees (i.e. use someone else's hardware and pay a fee, but also lose out on privacy)
thanks
I assume this would be enough? Characters = characters in a stt sentence?
Not sure if all this is worth it to replace the google homes though
4M sounds like a lot, but i am not familiar with the pricing model. If that's for Google, then I've heard about monthly costs of pennies
that's entirely up to you
I mean, going through all this setup and still sending the data to google. I don't see the point really, when their hardware is also better. I'd probably be better off contributing to the sv translations then.
What are you looking for in a GPU for running Whisper? VRAM?
if you're gonna use it just for Whisper, make sure it has CUDA cores (i.e. is nvidia). you'll never cap out the VRAM with Whisper alone
going through all this setup and still sending the data to google
yes, but it's dissociated data. you're merely sending voice and wanting transcriptions, they can't know it's actual commands without analyzing it. whereas when you issued commands to a GH, they knew it was just that
NC Cloud, for example, uses Azure as the STT provider, but all users go to the same Azure account (for NC). it's anonymity by numbers in that case
however, if you're not keen on supporting HA development, it's much more expensive than a simple STT service subscription
I'd probably be better off contributing to the sv translations then.
You're always more than welcome to do so ❤️ https://github.com/home-assistant/intents
for TTS/STT you don't need crazy hardware. a decent nvidia GPU with 8GB of vram would be plenty
Think people run it on used GTX 1070 GPUs and get responses in under a second
I am using a 4060ti and my stt takes about 250ms
and that's running large-v3-turbo in my case
pretty sure you can run the large model on a 1070 and get responses in under a second tbh
I've tried that on a Tesla P4 and i could, so yeah, it works on a 1070 as well
I might actually buy one, I've been considering it for video transcoding for a while anyway. Need something that fits an OptiPlex SFF though.
Is there documentation for running whisper in a separate vm for HA?
not official, if that's what you're asking
Anything will do, I just want to have a grasp of how complicated it would be before diving in
Or could I pass through a gpu to HA and benefit from that?
how are you running things?
i have an LXC running docker on my proxmox host (yeah, i know, but whatever works). I have passed the GPU to that LXC so docker can use it. it's then a matter of installing drivers and finding a docker image capable of using CUDA cores
installing the Linux drivers for nvidia was marginally complicated, but required on any system you choose. they had to be installed on both the host and the LXC. keeping the drivers in sync between the 2 is a bit of a bitch
don't think I'll have the time for the maintenance required tbh
well... it's not mandatory
so, there's no way to utilize a gpu from within HA?
as in, passing it through HA and simply selecting a larger model. Or something like that.
no, HA doesn't utilize GPU hardware in addons or anything like that
an alternate approach you can do is just spin up another VM in proxmox and pass the GPU to that. For instance I have a k8s cluster, and I just have a VM that is my worker node with the gpu passed to it
then any containers I run on that worker have GPU access
so you can run multiple docker containers, like whisper, piper, a transcode server like Plex
I have all those running and sharing the GPU
I even have a sunshine container that runs steam so I can stream games to my TV/Phone
Yeah, but that would still require the driver maintenance etc I suppose?
that's what i suggested, except it was an LXC, not a VM
only if you care about that
Yeah LXC has the caveat you mentioned, where you have to install the driver at the host and LXC level
I just care for it to work π
With a VM you just maintain it on that one VM, so maintenance is a little bit easier
but LXC has the benefit that multiple LXCs could use the same GPU
How do you tell HA to use the external Whisper/Piper VMs? Can you direct it to an IP in the integration or something?
Or do you need to tinker with the esp code perhaps
yes. you add an instance of the Wyoming integration and it asks for host and port
yup
Ah, great
that's how mine is set up
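Since the Wyoming integration only asks for a host and port, a quick way to sanity-check the external service before adding it is to see whether that port accepts TCP connections at all. A minimal sketch, assuming a made-up address (`192.168.1.50`) and the ports commonly used for the Whisper and Piper services (10300 and 10200, adjust to whatever your containers expose). It only tests reachability and does not speak the Wyoming protocol.

```python
import socket


def wyoming_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical host and ports; replace with your own VM/LXC address.
    for name, port in (("whisper", 10300), ("piper", 10200)):
        state = "reachable" if wyoming_reachable("192.168.1.50", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

If the port is not reachable from the machine running HA, the integration setup will not get any further either, so this narrows things down to networking or the container itself.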
Well I might start to hunt for a gpu then. And probably find out that I need a new host from that. It will be an expensive voice assistant
https://docs.linuxserver.io/images/docker-faster-whisper/
https://docs.linuxserver.io/images/docker-piper/
I use these two docker images to run the services, they're pretty good and well maintained
i ended up buying an nvidia Tesla P4 GPU (which I now no longer use) and a rackable case, which needed a new PSU (the former case was an mITX, the new one was an ATX)
yeah but if you are serious about privacy, getting the good hardware pays off
I got a powerhouse system, and everything is local, even the LLM. And performance in terms of speed is on-par with Google/Alexa IMO.
I'm serious about the relation between caring about privacy and not being forced to sleep on the couch for the next month
with the LLM you can issue more complex commands and it understands quite well, at least in my experience
Will a 1070 be enough to run a decent llm as well? Or do you need beefier stuff
also, i was among the first 5 customers for the Onju Voice board (probably even the first), of which I had to buy 5 and get another 2 GH mini units. so don't tell me about expensive VAs
it definitely will not. you need beefier stuff
3060? I'm not sure what to look for really, except I know that vram is good
But that's probably another discussion isn't it
Anyway, thanks for the help. It's certainly not plug and play yet, that's my conclusion, for my language anyway
with LLM I'd personally try to shoot for at least 16GB vram
that depends on what LLM you want to run as a conversation agent
I'll probably leave llms to another year lol
fair enough. Good to consider though if you are gonna make the investment, in terms of future-proofing
yeah, a 1070 is dirt cheap and will get you 80% there in terms of STT/TTS
Definitely, if you only want TTS/STT a 1070 will give great performance for the value
I'm pretty sure it won't fit in my case. Do you know about the rx6400? Or do you need nvidia for these things.
3060 of course will be even better
you need nvidia
need CUDA, so has to be nvidia
A new host it is...
there are low profile nvidia cards, though
also PCIe accessories which allow you to mount GPUs outside enclosures or at an angle
Not sure height is so much an issue as overall space tbh, I'm gonna open it up and have a look
just don't be an idiot like i was and buy a server GPU (i.e. the Tesla P4) and then realize it has passive cooling
ouch yeah
though if you have the space in your case, people have made printable shrouds for those to inject cooling
that's what i started doing
...after buying another case
which was the limiting factor for the size in the first place
4gb for the one I'm looking at
but it will be worse than the 1070, though
yeah 4gb won't be enough for decent models
6-8GB should do
at least I don't *think
yeah, think the large model for whisper uses something like 4.5-5gb
This is what I am running for my home server
Nice
but that runs a full k8s cluster. I do a lot of research stuff for work
I seem to at least be able to run base-int8 instead of tiny without problems
is small-int8 more capable than base? Or what's the hierarchy
so I checked on my server,
evidently the large-v3-turbo model is only using 1.866GB of VRAM
that sounds promising, maybe I'll buy a 1650 and give it a go, it's only 70 euros used
i wish i could provide more insight, but i haven't played with the GPU for a while (mostly because i couldn't be bothered to print a cooling shroud)
This is the small model running
INFO:faster_whisper:Processing audio with duration 00:03.750
INFO:wyoming_faster_whisper.handler: What's the weather like?
INFO:faster_whisper:Processing audio with duration 00:03.590
INFO:wyoming_faster_whisper.handler: Turn off the lights
Decent I think?
decency is subjective in this case. it is what it is. if it feels slow to you, then it's slow. otherwise it's decent
it's by no means fast, i can tell you that
this is the performance I have with large-v3-turbo:
that's before the model is primed, so first time talking to it
3 seconds?!
465ms
the audio duration is how long the speech audio is
so 3 seconds of audio converted to text in 465ms
Now my PE flashed red when trying to wake it, what does that mean?
the primed request comes in at 219ms
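The distinction above (audio duration vs. processing time) is easy to turn into a number: divide the time spent transcribing by the length of the audio. A small sketch, assuming the `mm:ss.mmm` duration format seen in the log lines earlier in the thread. The helper names are made up, and the 3.0 s figure is just the rounded "3 seconds" mentioned above, not a value read from a log.

```python
import re

# Matches e.g. "duration 00:03.750" (minutes:seconds.milliseconds).
DURATION_RE = re.compile(r"duration (\d+):(\d+(?:\.\d+)?)")


def audio_seconds(log_line: str) -> float:
    """Extract the audio length in seconds from a faster_whisper log line."""
    match = DURATION_RE.search(log_line)
    if match is None:
        raise ValueError(f"no duration found in: {log_line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def real_time_factor(processing_seconds: float, audio_length_seconds: float) -> float:
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    if audio_length_seconds <= 0:
        raise ValueError("audio length must be positive")
    return processing_seconds / audio_length_seconds


line = "INFO:faster_whisper:Processing audio with duration 00:03.750"
print(audio_seconds(line))                        # 3.75
print(f"{real_time_factor(0.465, 3.0):.3f}")      # 0.155 (first, unprimed request)
print(f"{real_time_factor(0.219, 3.0):.3f}")      # 0.073 (primed request)
```

The log lines themselves only give the audio length; the processing time has to come from somewhere else, for example the pipeline debug view mentioned earlier.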
oh, ok, i misread that. thanks for the clear up
error, but it could be anything
you'll need to check logs (preferably ESPHome logs) to understand what's wrong
Now Whisper is stuck in a reboot loop with [16:46:53] INFO: Service exited with code 256 (by signal 9)
might want to turn on debug logging to see why it is crashing
but if you are using a larger model size, it uses more ram, is the system running out of memory maybe?
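For reference, the "signal 9" in that log message is SIGKILL on Linux. The process cannot catch it, and it is also what the kernel's out-of-memory killer sends, which is why memory is a reasonable first suspect. Other things can send it too, so it is a hint rather than proof. A tiny sketch for looking up a signal number (the helper name is made up, and it assumes a Unix-like system):

```python
import signal


def signal_name(number: int) -> str:
    """Translate a signal number into its symbolic name on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"


print(signal_name(9))  # SIGKILL on Linux
```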
Nope
But I'll try a full ha reboot
Trying to resume download...
Things are not going great
Up and running again, the cpu spiked from trying too big a model
Could one use a TPU instead of a gpu for local stt?
nope, TPU doesn't have the capacity to handle those kinds of models :/