#Multilingual TTS?


still sundial
#

I’ve been playing with different STT and TTS models, and I like that Whisper can transcribe multiple languages seamlessly. However, I haven’t found Piper models (or Kokoro?) that can do the same. If I hold a conversation with an LLM, the LLM responds in the language I spoke, but the TTS model still tries to use English (my default), and the output comes out unintelligible.

Has anyone figured something like this out?

ancient anvil

no but some thoughts:

I’ll preface this by saying that I essentially speak only one language, but I do like messing with stuff. I have seen the same thing when forcing my GLaDOS voice to speak other languages: it seems to garble the output a bit.

I have also messed with getting voice models for other languages to speak English, to see how they do and how the accent comes across. While messing around I discovered that some models are better than others at surviving the language swap, so I would suggest you try a few models and see what works. Maybe your second-language model handles English better than the English model handles your second language.

Automatic "code switching" could be interesting, even if it is only an edge case. I haven’t tried it, but a hacky approach might look something like this:

Maybe you could add a script for "changing language" and tell the LLM in the prompt to call it, which would then change the pipeline. But that would only take effect from the next call.

Alternatively, have the LLM call the script with a target language and the actual response, then return a blank so the original pipeline finishes, and the script can tts.speak the response.

Or tell it to directly call a "respond in German" script if the text is German and then return a blank. The original pipeline ends, and you can tts.speak with a specified model in the script.
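To make the "script with a target language and the response" idea concrete, here is a rough Python sketch of the routing step only. It picks a voice for the language the LLM reports and builds the data for a tts.speak call. Everything named here is an assumption for illustration: the `VOICES` table, the `tts.piper` and `media_player.living_room` entity IDs, and the idea that your TTS entity accepts a `voice` option. Check what your own setup actually exposes.

```python
# Hypothetical language -> voice table; replace with the voices you have installed.
VOICES = {
    "en": "en_US-lessac-medium",
    "de": "de_DE-thorsten-medium",
}
DEFAULT_LANGUAGE = "en"  # assumption: English is the fallback


def build_tts_speak_call(language, message,
                         tts_entity="tts.piper",                    # hypothetical entity ID
                         media_player="media_player.living_room"):  # hypothetical entity ID
    """Return a dict describing a tts.speak call for the given language."""
    # Normalise "de-DE" / "de_DE" / "DE" to the bare language code "de".
    lang = language.lower().split("-")[0].split("_")[0]
    if lang not in VOICES:
        lang = DEFAULT_LANGUAGE
    return {
        "action": "tts.speak",
        "target": {"entity_id": tts_entity},
        "data": {
            "media_player_entity_id": media_player,
            "message": message,
            # Assumption: the TTS entity supports a "voice" option.
            "options": {"voice": VOICES[lang]},
        },
    }
```

The LLM would only need to pass a language code and the text. Unknown languages fall back to the default voice, which is garbled but at least does not fail.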

still sundial
#

Yeah. Good ideas. Interesting. Unfortunately I don’t think there is any kind of action to change pipelines for something like auto switching. I’d like to do that depending on whether or not my desktop is booted up. Instead I’m probably going to trigger WoL when a satellite starts listening (STT is on my NAS, LLM is on my desktop).
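For reference, the WoL part is simple enough to sketch by hand. A magic packet is six 0xFF bytes followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The function names, the broadcast address, and port 9 below are illustrative choices, not anything from my actual setup. Home Assistant also has a Wake on LAN integration that does the same job.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like 'AA:BB:CC:DD:EE:FF'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the 6-byte MAC repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_wol("AA:BB:CC:DD:EE:FF")` when the satellite starts listening would be the trigger. The desktop still needs WoL enabled in its firmware and network adapter settings.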

ancient anvil
solemn brook
#

What I did was switch the pipelines with a command.

https://community.home-assistant.io/t/multi-language-change-with-voice-pe/843954

ancient anvil
solemn brook
#

Yeah, and it works well. With the wife, we just have to remember to change the language back sometimes. I should probably make an automation that switches back to English if no media is playing. We only use the different languages to find certain songs to play.
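The "back to English when nothing is playing" rule could be sketched like this in Python, just to pin down the logic before writing the real automation. It assumes the pipeline is chosen through a select entity on the satellite and changed with select.select_option. The entity ID `select.voice_pe_assistant` and the pipeline name "English" are made up for the example.

```python
def revert_call(media_player_states, current_pipeline,
                default_pipeline="English",                 # hypothetical pipeline name
                select_entity="select.voice_pe_assistant"):  # hypothetical entity ID
    """Return a select.select_option call dict, or None if nothing should change."""
    if current_pipeline == default_pipeline:
        return None  # already on the default pipeline
    if any(state == "playing" for state in media_player_states):
        return None  # media is still playing, keep the current language
    return {
        "action": "select.select_option",
        "target": {"entity_id": select_entity},
        "data": {"option": default_pipeline},
    }
```

For example, `revert_call(["idle", "paused"], "Deutsch")` returns a call that sets the pipeline back to "English", while `revert_call(["playing"], "Deutsch")` returns `None`.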