# Any plans to include Mistral TTS?
PR?
It only beats the old ElevenLabs v2.5.
I like this. Most people pushing a lot of output are on 2.5, because v3 doesn't have a faster/cheaper option the way 2 and 2.5 did. Personally, I like pushing voices to 1.5x speed, so I'll have to test how this handles that myself.
Cap makes a good point, but free always beats paid at similar quality. Most people aren't injecting [emotion] tags by default, and doing so requires a split track in your outputs for the formatting.
MiniMax voice/audio is really good (on par with ElevenLabs v3), with emotions. My agent made a realistic laugh with it. Voice cloning is really good if you want it to be your MiniMe.
Excellent info, will have to try!
Doesn't the vllm-omni example have an OpenAI-style API? If so, you can use it via OpenAI TTS with a custom base URL.
I can't run it locally yet since my GPU is on a Windows machine, so right now I'm implementing the model without the vLLM WSL2 path.
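To make the custom base URL idea concrete, here is a minimal sketch that builds a request against an OpenAI-style `/audio/speech` route using only the standard library. The route and the JSON fields (`model`, `input`, `voice`, `response_format`) follow OpenAI's speech API. The host, port, model name, and voice below are placeholders, and whether the vllm-omni example actually serves this route is an assumption that hasn't been verified here.

```python
import json
import urllib.request


def build_speech_request(base_url, text, model, voice, api_key="not-needed"):
    """Build (but do not send) a POST to an OpenAI-style /audio/speech route."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Local servers often ignore the key; this value is a placeholder.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical local endpoint, model name, and voice: adjust to your server.
req = build_speech_request(
    base_url="http://localhost:8000/v1",
    text="Hello from a local TTS endpoint.",
    model="my-tts-model",
    voice="default",
)

# To actually send it and save the audio (requires a running server):
# with urllib.request.urlopen(req) as resp, open("out.wav", "wb") as f:
#     f.write(resp.read())
```

If the server really is OpenAI-compatible, any client that lets you override the base URL should work the same way, since only the host changes and the request shape stays identical.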
TTS providers shouldn't be hardcoded; they should be completely configurable, like a lot of stuff in OpenClaw. I've used Speechify instead of ElevenLabs, but I just got my bot to wire it up and make a SKILL for it. We did the exact same thing with ImageGenerationModel for Alibaba's AI Studio, since that wasn't "natively" supported by OpenClaw either. It seems very silly to hardcode the available providers for any of this stuff when the landscape changes so quickly and people build their own endpoints.
For TTS I have two local endpoints running that I use. They aren't wired into OC's config because it's hardcoded to ElevenLabs/say/edge. If everything were just .json config files, it would be a breeze to include any endpoint you want.
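As a rough illustration of what config-driven providers could look like, the sketch below loads a JSON document into a small provider registry and picks the default. The schema (`default`, `providers`, `base_url`, `model`, `voice`), the provider names, and the URLs are all made up for this example; this is not OpenClaw's actual config format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsProvider:
    """One TTS endpoint described entirely by config (hypothetical schema)."""
    name: str
    base_url: str
    model: str
    voice: str


def load_providers(config_text):
    """Parse the JSON config and return (providers_by_name, default_name)."""
    config = json.loads(config_text)
    providers = {
        name: TtsProvider(name=name, **fields)
        for name, fields in config["providers"].items()
    }
    default = config["default"]
    if default not in providers:
        raise KeyError(f"default provider {default!r} is not defined")
    return providers, default


# Hypothetical config with two local endpoints; all values are placeholders.
EXAMPLE_CONFIG = """
{
  "default": "local-a",
  "providers": {
    "local-a": {"base_url": "http://localhost:8000/v1", "model": "tts-a", "voice": "default"},
    "local-b": {"base_url": "http://localhost:8001/v1", "model": "tts-b", "voice": "narrator"}
  }
}
"""

providers, default = load_providers(EXAMPLE_CONFIG)
active = providers[default]
```

With something like this, adding a new endpoint would mean adding one entry to the JSON file, with no code change.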