#what s deepgram s language support look
30-plus major languages, and 100-plus for translation. It's actually quite powerful. If you set it up correctly, the latency is low enough that I have headroom to add a separate translation stage. You can see my example here, the "FaF" transcriber, a program I wrote in Python. Even though I'm making a second API call for translation, the translated text from the streaming response is printed in the console before the listener has finished hearing the words.
As far as submitting complete sentences goes, I'm mostly doing the same thing. However, I'm using a special NLP model that pre-processes the text and splits it into the most latency-efficient chunks for ElevenLabs. I actually got the idea from your attempt at creating a fast script using NLTK, but I found NLTK to be too slow and ineffective.
yeah, right now I'm doing faster-whisper > deepL > elevenlabs, which is rather slow
I'd rather stick with deepL as a translation stage as it's simply higher quality than most other offerings I've found
but deepgram seems like a good idea to integrate as well
What is your use case? You're basically trying to take a stream of audio in one language, reproduce it in another, and have it spoken by ElevenLabs?
pretty much
Ah yeah, that's where it gets complex. You can't just trim sentences the way you can in English, because other languages have different grammatical structures, so you need a bit of buffer time.
yeah, it's pretty complex due to all the non-english fun times.
so I can't just get back the partial results from deepgram
I'd need to wait for the complete sentences still
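To make "wait for the complete sentences" concrete, here is a minimal sketch of a text-side buffer that holds streamed transcript chunks until a sentence terminator shows up. It is not the NLP model or the NLTK script mentioned above, just a plain regex stand-in; the `SentenceBuffer` name and the terminator set are assumptions for illustration, and it will mis-split on abbreviations and decimals like "3.5".

```python
import re
from typing import List

# Sentence-final punctuation for Latin-script and CJK text.
# Illustrative and incomplete: many languages need their own rules.
_SENTENCE_END = re.compile(r"[.!?。！？]+\s*")


class SentenceBuffer:
    """Accumulates streamed text and releases only complete sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> List[str]:
        """Add a text chunk and return the sentences it completed."""
        self._pending += chunk
        sentences = []
        last_end = 0
        for match in _SENTENCE_END.finditer(self._pending):
            sentence = self._pending[last_end:match.end()].strip()
            if sentence:
                sentences.append(sentence)
            last_end = match.end()
        # Keep the unfinished tail for the next chunk.
        self._pending = self._pending[last_end:]
        return sentences

    def flush(self) -> str:
        """Return whatever is left (e.g. at end of stream) and reset."""
        tail, self._pending = self._pending.strip(), ""
        return tail
```

Each sentence returned by `feed` could go straight to the translation stage, while the unfinished tail keeps waiting. That wait is the buffer time discussed above.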
so my best bet is probably to still stick with whisper
then again, maybe something can be done on the audio end
Maybe, but there's room to improve it. Deepgram has utterance and endpointing settings you can tweak, and with those you can probably get good results in your target languages.
if I can just detect when a sentence ends from the audio, I wouldn't really need to do the text processing
eh, maybe something to look into
Yeah, Deepgram has a punctuation feature, so I would look into it.
It's a more nuanced challenge, but I'm quite certain with some focused effort you could get it to be pretty fast.
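A rough sketch of what tweaking those settings might look like, with no network code. The query parameters (`language`, `punctuate`, `interim_results`, `endpointing`, `utterance_end_ms`) and the `is_final` / `speech_final` response fields are how I understand Deepgram's streaming API, so check them against the current docs. The default values, `build_listen_url`, and `UtteranceAssembler` are made up for illustration.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(language: str,
                     endpointing_ms: int = 300,
                     utterance_end_ms: int = 1000) -> str:
    """Build a streaming URL; the default values here are arbitrary guesses."""
    params = {
        "language": language,
        "punctuate": "true",
        "interim_results": "true",
        "endpointing": endpointing_ms,
        "utterance_end_ms": utterance_end_ms,
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


class UtteranceAssembler:
    """Collects finalized segments and emits them when speech ends."""

    def __init__(self) -> None:
        self._finalized: List[str] = []

    def on_message(self, raw: str) -> Optional[str]:
        """Return a full utterance once speech_final arrives, else None."""
        msg = json.loads(raw)
        if msg.get("type") != "Results":
            return None  # ignore metadata and other message types
        alternatives = msg.get("channel", {}).get("alternatives", [])
        transcript = alternatives[0].get("transcript", "") if alternatives else ""
        if msg.get("is_final") and transcript:
            self._finalized.append(transcript)
        if msg.get("speech_final") and self._finalized:
            # Joining with a space assumes a space-delimited language.
            utterance = " ".join(self._finalized)
            self._finalized.clear()
            return utterance
        return None
```

Interim (non-final) results are dropped here, so only text that the endpointing has marked as the end of speech reaches the translation stage. Whether that boundary lines up with real sentence ends will depend on the language and the settings.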
ideally I'd like to target all the languages supported by whatever I'm using so that's... probably not something I can tackle
I think I'll just try to figure out how much latency whisper is contributing to begin with, then go from there
if deepL/elevenlabs turn out to be like 80% of the latency anyway, not much point in trying to shave off the little amount I'd save
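One way to get that per-stage breakdown, as a minimal sketch: wrap each stage in a timer and compare the shares. `StageTimer` is a hypothetical helper, and the stage calls in the trailing comments (`transcribe`, `translate`, `synthesize`) are placeholders for whatever the pipeline actually calls, not real library functions.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the stage's total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def shares(self) -> Dict[str, float]:
        """Return each stage's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: elapsed / total for name, elapsed in self.totals.items()}


# Hypothetical usage, with placeholder stage functions:
#   timer = StageTimer()
#   with timer.stage("whisper"):
#       text = transcribe(audio)
#   with timer.stage("deepl"):
#       translated = translate(text)
#   with timer.stage("elevenlabs"):
#       speech = synthesize(translated)
#   print(timer.shares())
```

If `shares()` shows translation plus synthesis near 0.8, swapping the transcription stage can only touch the remaining 20% or so.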
I'll tell you what, I'll DM you a zip file with some of the custom optimization tools I've made for Deepgram. You can switch the languages around and play with the settings, so it should be pretty quick to get up and running and see what sort of difference it makes.
oh, cool stuff, thanks
Happy to share. I've learned a lot from you.