#Disappointed with Voice Assistant PE


solid hill
#

I've had my HA Voice PE for a few weeks now but I'm struggling to get it to work efficiently. Perhaps I was hoping for too much, but it seems poor at wake word detection.

For me (UK English, male), the only wake word that works reliably is Okay Nabu, but I guess I can live with that.

But for my partner (UK English, female), none of the wake words work reliably. If she tries talking with a deeper voice, it sometimes works. Same for my children.

Was it only trained by male adults?

What can be done to make the wake word recognition much better? Can I easily train it myself?

tepid fox
#

Training your own wake word is not for the faint of heart. It requires a lot of trial and error, use of Google Colab stuff, compute time, etc. You would also want to gather a bunch of real-world samples, as the generated faux samples will result in a wake word that benchmarks well but performs poorly in real-world use. The Okay Nabu wake word was trained on over 20,000 real samples provided by community members. If you're comfortable with it, you could look into providing samples, which may help the next iteration of the model perform better. 🙂 Otherwise I'm not sure there's much else to be done; generally Okay Nabu is the best performing of the wake words.

solid hill
#

Thanks for your detailed reply, appreciated. I will try to find out about providing samples, as that seems to be the easiest route to improving it.

tepid fox
#

there's a link for it, lemme see if I can find it..

solid hill
#

Great, thanks for that 🙂

tepid fox
#

No, the above is for openWakeWord; Voice PE uses microWakeWord.

final fog
#

You can also lower the probability cut-off on the model if you adopt the device in ESPHome. It helped me get my Koala satellites to listen to my family.
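
As a rough sketch of what a probability cut-off does, the snippet below assumes the detector averages the model's recent per-frame probabilities over a sliding window and fires when that mean exceeds the cut-off. The function name, window size, and scores are hypothetical and chosen for illustration; they are not taken from the ESPHome or microWakeWord source.

```python
from collections import deque


def detects_wake_word(frame_probs, cutoff, window_size=5):
    """Return True if the mean of any `window_size` consecutive
    probabilities exceeds `cutoff` (assumed detection rule)."""
    window = deque(maxlen=window_size)
    for p in frame_probs:
        window.append(p)
        if len(window) == window_size and sum(window) / window_size > cutoff:
            return True
    return False


# Hypothetical per-frame scores for a voice the model is less sure about.
weak_voice = [0.10, 0.55, 0.80, 0.85, 0.90, 0.85, 0.60, 0.20]

print(detects_wake_word(weak_voice, cutoff=0.97))  # False: strict cut-off
print(detects_wake_word(weak_voice, cutoff=0.75))  # True: relaxed cut-off
```

The trade-off is that a lower cut-off also lets through sounds that merely resemble the wake word, so false activations become more likely.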

solid hill
#

I can adopt it in ESPHome, as I already have that installed for a different ESP32 project. Presumably, though, once adopted it won't pick up any updates automatically anymore (or I'll have to compile them locally)?

final fog
#

Works for me, since I still track all the changes to the PE YAML and make corresponding changes to my config.

unkempt sparrow
#

Was it only trained by male adults?

Looking at GitHub... 👀

#

I also don't love the default wake words @solid hill

#

But from what I understand, making your own that works reliably is very difficult.

#

I wonder if Wake Word Collective could expand to be more open.

I.e. you can suggest new wake words and people vote. To download wake words, you have to contribute voice samples for the 5 highest-voted wake words.

Over time it would work through all the best suggestions.

#

It wouldn't guarantee that your idea would be trained, but maybe there would be a few new ones every now and again.

tepid fox
#

The above are the contributors for microWakeWord, I assume. Nabu Casa obviously isn't listing all the people who donated voices, for privacy reasons and such 😉

#

There were over 20,000 samples submitted. Some could be adults, some women, some kids, etc., but it's hard to really know how many of each.

cyan wigeon
#

And the way to ensure that your voice is heard is to contribute at the link above 🙂

tepid fox
#

Yeah, but what I mean is, say out of 20,000 samples, maybe 10 families of 2 children each contributed child voices and maybe female voices, as a hypothetical example. If each person did 3 samples, that would equate to 90 samples of women and children out of the 20,000. That also assumes they only submitted once; maybe they did more. All I was getting at is that because the data is anonymous, no one is cataloguing it to know it has x child samples, y female samples, z male samples, etc.
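
Spelling out that hypothetical as arithmetic (the figure of one woman per family is an assumption needed to reach 90; none of these are real dataset statistics):

```python
# Hypothetical numbers from the example above, not real dataset statistics.
families = 10
children_per_family = 2
women_per_family = 1          # assumed: one adult woman per family
samples_per_person = 3
total_samples = 20_000

people = families * (children_per_family + women_per_family)
minority_samples = people * samples_per_person
share = minority_samples / total_samples

print(minority_samples)       # 90
print(f"{share:.2%}")         # 0.45%
```

So in this made-up scenario, women and children would account for under half a percent of the samples.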

#

so basically, it's hard to know whether it is bad at hearing those voices because of a lack of samples in those domains, or something else 😄

north heath
#

My best estimate is that around 5% of the samples are from women or children. This is based on a basic logistic regression model that predicts gender from a voice fingerprint, with manual spot checks to verify. It's not perfect by any means, but splitting the samples by gender helps reduce the bias (though it obviously doesn't eliminate it). If I throw all the samples in without separating them, the model is incredibly biased and doesn't work for women and children. Splitting them reduces the bias quite a bit in benchmarking but, as people have reported, it still exists.

#

I think the model has learned to generalize to men beyond those in the training set. For women, it tends to work okay if they have samples in the training set, but it seems the model has trouble generalizing to women not in the set.

cyan wigeon
#

I wonder if simply increasing the pitch of existing samples would help fill the gap
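
For reference, the simplest form of that idea is resampling, sketched below in plain Python. This is only an illustration under stated assumptions: `raise_pitch` is a hypothetical helper, the sine tone stands in for a real recording, and naive resampling also shortens the clip, which real augmentation tools usually avoid. Whether pitch-shifted samples would actually help the model is untested here.

```python
import math


def raise_pitch(samples, factor):
    """Naive pitch shift by resampling with linear interpolation.

    factor > 1 raises the pitch but also shortens the clip.
    """
    n_out = int(len(samples) / factor)
    shifted = []
    for i in range(n_out):
        pos = i * factor                      # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        shifted.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return shifted


# Stand-in for a recording: 0.1 s of a 100 Hz tone at 16 kHz.
rate = 16_000
tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(1_600)]

higher = raise_pitch(tone, 1.25)   # same 10 cycles in less time, so ~125 Hz
print(len(tone), len(higher))      # 1600 1280
```

A real female or child voice differs from a male voice in more than pitch, so shifted male samples would at best approximate the missing data.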

deep token
#

Are samples also collected for the other wake words, like Hey Jarvis?

solid hill
#

So just to clarify: even if, say, 95% of samples were from men, submitting a female sample would still make a good difference (i.e. more than 5%) in helping the wake word model?

north heath
#

Yes, it would be very beneficial. The 95% number is based on the last time I processed all the samples to make a training set; the ratio may have changed since then. When training, I give it the same number of male samples as female samples. In practice, this means there are about 1,200 female samples repeatedly used versus about 20,000 male samples repeatedly used during the training process.

#

1,200 is a small number of samples to train on. That's why I think the model works well for the speakers in those 1,200 samples, but it struggles to generalize to women not in the set.
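
To make the imbalance concrete, here is a minimal sketch of equal-count sampling. It assumes balancing is done by drawing the same number of male and female samples per batch with replacement; the real training pipeline may balance differently, and `balanced_batch` and the sample IDs are hypothetical.

```python
import random

# Approximate counts quoted above; the sample IDs are placeholders.
male = [f"m{i}" for i in range(20_000)]
female = [f"f{i}" for i in range(1_200)]


def balanced_batch(batch_size, rng):
    """Draw half of each batch from each group, with replacement."""
    half = batch_size // 2
    return rng.choices(male, k=half) + rng.choices(female, k=half)


# Each group contributes the same number of draws, so an individual
# female sample is reused far more often than an individual male one.
reuse_ratio = len(male) / len(female)
print(round(reuse_ratio, 1))  # 16.7

batch = balanced_batch(64, random.Random(0))
print(sum(s.startswith("f") for s in batch))  # 32
```

Under that assumption each female sample is seen roughly 17 times as often as each male sample, which fits the observation that the model does well on the speakers it has seen but generalizes poorly beyond them.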

solid hill
#

Great, we are all submitting samples. Presumably a future update of Voice PE will pick them up 🙂