#Upgrading Voice PE microphones?

keen coyote
#

Hello everybody. I am pretty happy with the voice PE, but the microphone has trouble picking up my wake-word. I was wondering if there is any way to improve the microphones… also, it’s really not good at filtering a single voice source. If anybody else in the room is talking it will mix into the command. I also think the time from wake-word until it actually listens to your voice could be way shorter.

agile hamlet
keen coyote
#

The firmware is open source, isn’t it? Couldn’t it be installed on an ESP32 with better hardware connected?

#

I wonder though if it’s a hardware problem or a software problem

violet elm
#

It's not a hardware problem. For your cases to work, real beamforming and voice prints would have to be implemented to separate the actual request from other voices. There's no firmware for that yet, at least no open-source solution for any hardware you can buy.

agile hamlet
#

you could potentially build something with a better microphone array. but at that point you're not really talking about a VPE anymore.
you could definitely look at the hardware design + the firmware of the ESP32 and XMOS chip as a reference to build something.

as for filtering 1 voice out of a crowd, this is realistically not going to happen in hardware. you could instead set it to stream the microphone to an external service where you do your own audio processing, but that is probably going to add quite a bit of delay when using it. and there are no recommendations for such a thing currently.

keen coyote
#

Does anybody have some experience with the M5Stack Atom Echo in regards to microphone capabilities? Is it at least somewhat reliable?

#

I wish the Voice Preview Edition would feature AirPlay… that way it could sync in a music group within Music Assistant :/

astral shell
#

doesn't work with the generic sync group?

agile hamlet
keen coyote
keen coyote
agile hamlet
#

the AE is cheap enough that it's nice to have on the desk to mess with for fun and initial testing of some things

keen coyote
# agile hamlet the AE is cheap enough that its nice to have on the desk to mess with for fun an...

I do have a couple of Echo 4s, which have audio-in. Right now I've got a couple of audio-casts connected to them so I can stream my local videogame soundtrack through them. I can still use Alexa commands while also being able to listen to my music. I would like to at least replace the audio-casts with voice assistants… the audio-casts have AirPlay… which is cool for syncing… but the ESP32 assistant is lacking that… maybe I should just have 3 devices: a good speaker with audio-in, an audio-cast (which has all the capabilities regarding good music playback) and a custom ESP32 with good microphones to use for voice commands…

#

Also, my voice PE just now completely stopped listening to wake up words and I have to press the button

agile hamlet
agile hamlet
limber reef
#

one thing I feel like is important to mention here
I assume you use the VA in German?
naturally English is a way better supported language

#

especially for the STT part

limber reef
#

the wake words also suffer from not having enough German-dialect "okay nabu" samples

#

considering we are up against companies like amazon/google/apple I am surprised how good this works tbh
I am always so excited when a new HA release kicks in and I can update my yaml

agile hamlet
#

yeah, Home Assistant has a few thousand volunteers helping out with samples. Amazon just slurps up a few million samples from their users without asking

limber reef
#

I run both a VA PE & respeaker lite and I don't really see a difference between them when it comes to wake word or STT performance
that being said I haven't done a fair comparison either

agile hamlet
limber reef
#

they run pretty similar hardware after all

#

so you have a lot more choice over the hardware

keen coyote
keen coyote
agile hamlet
#

i am English and have some trouble with it too, to be fair

keen coyote
limber reef
#

so do I
especially my gf
the training set for female voices is even smaller I assume

agile hamlet
#

i find "okay nar boo" to work a bit better. or do a silly voice

keen coyote
#

Heh, gotta try that 😄

agile hamlet
#

the WW models will improve

limber reef
keen coyote
#

I love how highly customizable all this is. Just sad the voice PE can’t start a conversation… I have my pipeline set up to use Nabu Casa cloud

limber reef
#

the voice pe can start a conversation and even continue one
pretty sure that was added recently

keen coyote
#

I thought so too, but I did try, and I remember getting an error about it not being supported. Will have to check again. Would make things easier

limber reef
#

maybe your software on the VA isn't updated?

#

April 2, 2025

agile hamlet
#

yeah you can start conversation now

#

just need updated firmware and HA

keen coyote
#

Shouldn’t I have gotten an update prompt?

limber reef
#

did you adopt the device?

keen coyote
#

Oh, it’s not showing up in ESPHome as an installed device… but as discovered

#

I should pair it, right?

agile hamlet
#

no

limber reef
#

if you adopt it you need to update manually

keen coyote
#

Okay, good

agile hamlet
#

don't adopt it

keen coyote
#

Alright. Let’s see the firmware version

agile hamlet
#

check on the device page if the firmware entity is enabled

keen coyote
#

Firmware: 25.3.4 (ESPHome 2025.3.3)

#

Seems I just messed up the automation?

limber reef
#

you want 2025.4.0

keen coyote
#

Turned on the “beta firmware” entity

agile hamlet
#

with 25.3.4 and home assistant 2025.4 you should be good to use start conversation
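
For reference, a minimal sketch of what calling start conversation from outside Home Assistant could look like through the REST API. It assumes the `assist_satellite.start_conversation` action and its `start_message` field; the host, token and entity ID are placeholders, not anything from this thread. The script only builds the request and does not send it.

```python
import json
import urllib.request


def build_start_conversation_request(base_url, token, entity_id, message):
    """Build (but do not send) a POST for the start_conversation action."""
    url = f"{base_url.rstrip('/')}/api/services/assist_satellite/start_conversation"
    # entity_id and start_message are sent as the action data
    payload = {"entity_id": entity_id, "start_message": message}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with your own host, token and satellite entity
request = build_start_conversation_request(
    "http://homeassistant.local:8123",
    "YOUR_LONG_LIVED_TOKEN",
    "assist_satellite.voice_pe",
    "The front door is still open. Should I lock it?",
)
# To actually send it: urllib.request.urlopen(request)
```

If the firmware or HA is too old, the call should fail with an error rather than speak, which would match the "not supported" message mentioned earlier.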

keen coyote
#

It seems to be up to date

agile hamlet
keen coyote
#

Will try again later then.

agile hamlet
#

do you have an LLM attached?

keen coyote
#

I must have messed up earlier then.

keen coyote
#

So, that’s the problem then

#

I should start paying for an LLM or run my own… which is too power-intensive

agile hamlet
#

it only ramps up power during queries

keen coyote
#

Had planned to get a Jetson Orin Nano Super earlier, but people say it’s too weak

agile hamlet
#

i got a new gpu for my server the other day and it works great

keen coyote
#

Which one?

agile hamlet
#

it has power spikes but they are short so actual energy usage is not that much

#

i got a 5060ti 16gb

keen coyote
#

Not cheap…

agile hamlet
#

got it at MSRP for 400 GBP which isn't awful

#

previous gen stuff was selling for more than that on ebay

limber reef
#

3060 12gb here used for like 240€
But I'd recommend more than 12gb vram to run 14b models tbh
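
A rough back-of-the-envelope sketch of why 12 GB gets tight for 14B models. It only counts the weights; the KV cache, context length and runtime overhead come on top and are not modelled here, and the bit widths are just common quantization levels picked for illustration.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"14B model at {bits}-bit: {weight_memory_gb(14, bits):.1f} GB for weights")
```

So a 14B model fits in 12 GB only at around 4-bit, and even then the remaining headroom has to cover the context.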

agile hamlet
#

you could go with multiple 12gb too depending on your server setup

limber reef
#

Other pcie slot is used by my hba sadly haha

agile hamlet
#

i could maybe squeeze another card in but i am running out of lanes

#

gpu/nic/hba

limber reef
#

Gotta saw that slot open!

agile hamlet
keen coyote
#

How much did the whole setup cost? Looks pricey. Can’t one hook up a public AI agent like Mistral or OpenAI or similar?… at least until I have enough money to invest in a dedicated home-run AI server

violet elm
#

You sure can use a third-party LLM

keen coyote
#

Would probably be cheaper for now. But token costs must be watched for sure

agile hamlet
keen coyote
#

Is that server exclusively for HA?

agile hamlet
#

the mini pc on the shelf with the external drive runs HA and frigate

limber reef
limber reef
#

can't exactly run big models locally lol

agile hamlet
#

pcie speed doesn't make much of a difference if it's all in vram

limber reef
#

the set still needs to be loaded into vram from system ram, no?

agile hamlet
#

yeah but once the model is loaded it's there. unless you're changing models a lot it's not a huge thing

limber reef
#

I think I could split my x16 up into 2x8 👁️

agile hamlet
#

that system is a supermicro server board so it does all sorts of stuff. it's all pretty configurable

keen coyote
agile hamlet
keen coyote
#

Okay. So, I probably won’t be willing to spend huge money on a local LLM in the future, but I’d still like my local voice devices to act as conversation starters… which can’t be done with the local voice pipeline… and thus I’d have to at least set up the TTS/STT pipeline and then connect to an external LLM… which means I’d have to make sure the machine HA is running on is capable enough. Gosh, if I were to do this I’d have to bind all Zigbee devices again… well, trying to figure out which Intel NUC (or similar) would be a good choice now

agile hamlet
#

zigbee devices should "in theory" move with your adaptor. just setting up the adaptor on a new system should allow you to pick up the network where you left it

#

although i am not really that experienced with that tbh

#

my views on hardware including my setup can be found here

keen coyote
#

So, TTS is good on the listed Beelink devices?

#

The Beelink S12 is currently available for around 180€ (new)

#

Beelink S13 for 190€ (new).

#

Though the S13 is capped at 16GB ram? Not sure, product description is not quite clear in that regard

thorn stag
keen coyote
#

I would like to have HA running reliably on it with many Zigbee devices and Music Assistant

#

And of course local TTS/STT

keen coyote
limber reef
#

assuming it has a gpu that faster-whisper can use, yes

#

there seem to be alternative containers for intel igpu

#

(I can't vouch for how well it works tho)

keen coyote
#

Hm, I hadn’t planned to add any more hardware to the NUC. Especially not an external GPU. Don’t want to spend that much money on it right now

limber reef
#

the nuc should have an igpu no?

keen coyote
#

Oh, I think it does. Hold on. I thought I’d be talking bout additional hardware

#

Intel Graphics 1000MHz (24EUs)

#

That’s the gpu it seems

limber reef
#

Hardware requirements

Intel UHD Graphics for 11th generation Intel processors or newer
Intel Iris Xe graphics
Intel Arc graphics
Intel Server GPU
Intel Data Center GPU Flex Series
Intel Data Center GPU Max Series
#

this container has those hardware requirements

#

your igpu should have a bit more info
which cpu does the nuc have?

limber reef
#

yeah I am clueless if this igpu is supported lol

agile hamlet
#

they are mostly limited to 16gb ram, yes. i just use standard piper on the CPU for TTS as it's not particularly an issue.
for STT i am not really that sure about running on the box itself, although i have heard that the tiny model does work pretty quickly. not sure if that makes use of the igpu or not, but i don't think so.

i have always offloaded whisper to a bigger machine because i have been able to

keen coyote
#

Hm, maybe I should hold off then until I got a dedicated machine with its own gpu to handle all the inference and heavy loading

limber reef
#

You could always use the cloud if you have problems getting it to work properly

#

Until you have better hardware

agile hamlet
#

this is also true

thorn stag
#

I thought Whisper could run on an Intel N200 CPU.

agile hamlet
#

it will run pretty quick if you're running tiny i imagine. but accuracy may be hit or miss

thorn stag
#

What would it take for an accurate model?

agile hamlet
thorn stag
# agile hamlet with any level of responsiveness? realistically... a gpu

Now I see why so much ML stuff is cloud-based.

You need a lot of compute power with an extremely low duty cycle. That means that client-side compute is massively underutilized. Server-side compute has much better utilization.
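
To put rough numbers on the duty-cycle point, here is a small sketch. Every figure in it (queries per day, seconds per query, idle and peak wattage) is a made-up assumption for illustration, not a measurement from anyone's setup.

```python
SECONDS_PER_DAY = 86_400


def duty_cycle(queries_per_day, seconds_per_query):
    """Fraction of the day the accelerator is actually busy."""
    return queries_per_day * seconds_per_query / SECONDS_PER_DAY


def average_power_watts(idle_w, peak_w, duty):
    """Time-weighted average draw: idle baseline plus the busy-time share of the peak."""
    return idle_w + (peak_w - idle_w) * duty


# Hypothetical household: 50 voice queries a day, 3 s of GPU time each
duty = duty_cycle(queries_per_day=50, seconds_per_query=3)
avg = average_power_watts(idle_w=15, peak_w=180, duty=duty)
print(f"duty cycle: {duty:.4%}, average draw: {avg:.1f} W")
```

With numbers like these the GPU is busy for well under 1% of the day, so the average draw is dominated by idle power and the hardware sits unused almost all the time.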

Do you need a discrete GPU or is an integrated GPU sufficient?