#> I'm using HA Cloud for STT


real cedar
#

I'm having a bit of a hard time understanding the "fix". It would make sense to me that training the model on the problematic cases would be a solution. But then again, I have no idea 🤷‍♂️

dusky zenith
#

That would be THE fix, indeed. But training STT engines is not easy (given the possible lack of open-source training data), and Microsoft Azure's STT (which Nabu Casa Cloud uses) is not open source itself, as far as I know

#

the solution I linked attempts to compute a "relative distance" between words, based on how many characters need to change to get from a mistakenly heard word to the accepted sentence
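For readers who want to see the idea concretely, here is a minimal sketch of a character-level edit distance. It is not the code from the linked solution: the function names are made up for illustration, and dividing by the longer string's length is only one assumed way to make the distance "relative".

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def relative_distance(heard: str, expected: str) -> float:
    """Edit distance scaled to 0..1 by the longer length (an assumed normalization)."""
    longest = max(len(heard), len(expected))
    return levenshtein(heard, expected) / longest if longest else 0.0


def closest_sentence(heard: str, candidates: list[str]) -> str:
    """Pick the accepted sentence that needs the fewest relative changes."""
    return min(candidates, key=lambda sentence: relative_distance(heard, sentence))


if __name__ == "__main__":
    accepted = ["turn off the light", "open the cover"]
    print(closest_sentence("turn of the light", accepted))
```

A real matcher would probably also want a cutoff, so that a transcript far from every accepted sentence is rejected instead of being forced onto the nearest one.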

real cedar
dusky zenith
#

while this works on how words are written, in some less phonetic languages (like English) you have to change a lot of characters to get from one homophone to another

dusky zenith
#

getting back: the proper solution would be to translate whatever the STT heard to phonemes, translate all sentences to phonemes, then compute the "relative distance" (Levenshtein distance) between the phoneme representations, as that would be the most complete way to figure out how "different" two words sound
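A rough sketch of that phoneme-based comparison, under loud assumptions: `TOY_PHONEMES` is a tiny hand-written, ARPABET-style table standing in for a real text-to-phoneme library, and `to_phonemes` and `sequence_distance` are hypothetical names. It only illustrates why comparing sounds treats homophones differently from comparing spellings.

```python
from typing import Sequence

# Hypothetical, hand-written pronunciations (ARPABET-style symbols).
# A real system would need a proper text-to-phoneme library here.
TOY_PHONEMES = {
    "write": ["R", "AY", "T"],
    "right": ["R", "AY", "T"],
    "light": ["L", "AY", "T"],
}


def to_phonemes(word: str) -> list[str]:
    """Look a word up in the toy table; raises KeyError for unknown words."""
    return TOY_PHONEMES[word.lower()]


def sequence_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over any sequences: characters or phoneme symbols."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Compare the spellings first, then the toy phoneme sequences.
    print(sequence_distance("write", "right"))
    print(sequence_distance(to_phonemes("write"), to_phonemes("right")))
    print(sequence_distance(to_phonemes("right"), to_phonemes("light")))
```

With this toy table, "write" and "right" are four character edits apart but identical as phoneme sequences, while "right" and "light" differ by one phoneme.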

#

the problem is that there are no multi-language text-to-phoneme libraries with a suitable license which we could use

real cedar
#

I understand. This is really the reason why I even suggested a "bandaid": as of right now I guess there's no other way to fix it, and the word is one I would mostly use for setting cover positions, for instance

dusky zenith
#

if the issue is common enough among Finnish STTs, you could do the unlawful thing 😛 and add the word "saada". just add a comment that it should be removed once a proper fix is implemented