I'm using HA Cloud for STT
I'm having a bit of a hard time understanding the "fix". To me, training the model on the problematic words would make sense as a solution. But then again, I have no idea 🤷♂️
That would be THE fix, indeed. But training STT engines is not easy (given the possible lack of open-source training data), and as far as I know, Microsoft Azure's STT (which Nabu Casa Cloud uses) is not open source itself.
The solution I linked attempts to compute a "relative distance" between words, based on how many characters need to change to get from the misheard word to the accepted sentence.
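A minimal sketch of that character-level matching, assuming the "relative distance" is the Levenshtein distance divided by the length of the longer string (the linked solution may normalize differently). The function names and the example sentences ("säädä verhot", "sulje verhot") are hypothetical illustrations, not Home Assistant's actual code or intents.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b`."""
    # previous[j] holds the distance between the prefix of `a` seen so far
    # (minus its last character) and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def relative_distance(heard: str, sentence: str) -> float:
    """Levenshtein distance scaled to 0.0 (identical) .. 1.0 (nothing shared).
    Assumption: normalize by the longer string's length."""
    longest = max(len(heard), len(sentence)) or 1
    return levenshtein(heard.lower(), sentence.lower()) / longest


def closest_sentence(heard: str, sentences: list[str]) -> tuple[str, float]:
    """Return the accepted sentence with the smallest relative distance."""
    best = min(sentences, key=lambda s: relative_distance(heard, s))
    return best, relative_distance(heard, best)


print(levenshtein("kitten", "sitting"))  # → 3
print(closest_sentence("saada verhot", ["säädä verhot", "sulje verhot"]))
# → ('säädä verhot', 0.25)
```

Note that each a→ä swap counts as a full substitution, so "saada" is 3 edits from "säädä" but also 3 edits from "avaa" ("open"). With short commands, plain character distance can end up tied between the intended sentence and an unrelated one.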
I see. It's weird that no matter how slowly I say "säädä" ("adjust") into a USB microphone, it hears "saada" ("to get").
Note that this works on how words are written, though: in less phonetic languages (like English), homophones can differ by a lot of characters.
I have no clue how different those two sound in Finnish, sorry, but it's definitely the STT's fault.
Getting back to it: the proper solution would be to convert whatever the STT heard into phonemes, convert all sentences into phonemes, and then compute the "relative distance" (Levenshtein distance) between the phoneme representations, as that would be the most complete way to figure out how "different" two words sound.
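A rough sketch of the phoneme-based variant. Since no suitably licensed multi-language text-to-phoneme library is assumed to exist, the hypothetical `TOY_LEXICON` below stands in for one: a hand-written, English-only table of ARPAbet-style transcriptions with stress markers removed. The same Levenshtein algorithm is simply run over lists of phonemes instead of characters.

```python
from typing import Sequence

# Hypothetical stand-in for a real text-to-phoneme (G2P) engine, which is
# exactly the missing piece. ARPAbet-style symbols, stress markers removed.
TOY_LEXICON: dict[str, list[str]] = {
    "through": ["TH", "R", "UW"],
    "threw": ["TH", "R", "UW"],
    "three": ["TH", "R", "IY"],
}


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance over any sequences (characters or phonemes)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,             # deletion
                               current[j - 1] + 1,          # insertion
                               previous[j - 1] + (x != y))) # substitution
        previous = current
    return previous[-1]


def to_phonemes(text: str) -> list[str]:
    """Look up each word in the toy lexicon and concatenate the phonemes.
    Raises KeyError for words the lexicon does not contain."""
    phonemes: list[str] = []
    for word in text.lower().split():
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def phoneme_distance(heard: str, sentence: str) -> int:
    return edit_distance(to_phonemes(heard), to_phonemes(sentence))


# Characters favour "three" for a misheard "threw"...
print(edit_distance("threw", "through"), edit_distance("threw", "three"))  # → 4 1
# ...while phonemes correctly treat "through" as the same sound.
print(phoneme_distance("threw", "through"), phoneme_distance("threw", "three"))  # → 0 1
```

The comparison shows why spelling alone misleads for homophones: "threw" and "through" are 4 character edits apart but identical as phonemes.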
The problem is that there are no multi-language text-to-phoneme libraries with a suitable license that we could use.
I understand. That's really the reason I suggested a "band-aid" at all: for now, I guess there's no other way to fix it, and it's the word I'd mostly use for setting cover positions, for instance.
If the issue is common enough among Finnish STTs, you could do the unlawful thing 😛 and add the word "saada". Just add a comment saying it should be removed once a proper fix is implemented.