TLDR: Vedal, what does the setup for Neuro-sama's speech recognition and TTS look like?
-# MS Edge TTS? OpenAI Whisper? AllTalk V2? Kokoro?
-# I wanted to ask privately, but this seems to be the more appropriate approach.
I am an AI enthusiast and, admittedly, I only recently found out about Neuro-sama. Before that, I had been pulling my hair out trying to figure out how to run a speech recognition model and a TTS model that work in tandem with the language models I run.
The TTS options I've explored are mentioned above (excluding Whisper), but the available models and engines lack either speed or quality.
That's why I'd like to ask if you could point me in the direction you took for Neuro-sama. What handles its STT and TTS (if those two are handled by third parties and not original engines/models)?
I'd imagine you're incredibly busy, so thank you in advance for taking a moment to read this.