#what's Deepgram's language support look like

neon halo

30+ major languages, and 100+ for translation. It's actually quite powerful. If you set it up correctly, the latency is low enough that I have headroom to add a separate translation stage. You can see my example here, a Python program I wrote that I call the "FaF" transcriber. Even with a second API call for translation, the results come back faster than a listener could hear the words being printed from the streaming response in the console.

As for submitting complete sentences, I'm mostly doing the same thing. However, I'm using a special NLP model that pre-processes the text and splits it into the most latency-efficient chunks for ElevenLabs. I actually got the idea from your attempt at a fast script using NLTK, but I found that approach too slow and ineffective.
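
The NLP model mentioned here isn't shown, so as a stand-in, here is a minimal sketch of the chunking idea using a plain regex heuristic (a swapped-in technique, not the model described): split at sentence and clause punctuation, then merge fragments shorter than a `min_chars` threshold so the TTS engine isn't called for single words. The function name `chunk_for_tts` and the default threshold are illustrative assumptions.

```python
import re

# Split on whitespace that follows sentence or clause punctuation,
# keeping the punctuation attached to the preceding piece.
_BOUNDARY = re.compile(r"(?<=[.!?;:,])\s+")


def chunk_for_tts(text: str, min_chars: int = 20) -> list[str]:
    """Split text at clause boundaries, merging pieces shorter than min_chars.

    Heuristic only (hypothetical helper): each chunk ends at punctuation,
    and short fragments are glued onto the following piece.
    """
    pieces = [p.strip() for p in _BOUNDARY.split(text.strip()) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for piece in pieces:
        buffer = f"{buffer} {piece}".strip()
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        # Attach a short trailing fragment to the previous chunk if there is one.
        if chunks:
            chunks[-1] = f"{chunks[-1]} {buffer}"
        else:
            chunks.append(buffer)
    return chunks
```

For example, `chunk_for_tts("Hi. How are you today, my friend? I am fine.")` yields two chunks instead of four tiny ones. A lower `min_chars` gets audio started sooner at the cost of more TTS calls.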

quasi igloo

I'd rather stick with DeepL for the translation stage, as it's simply higher quality than most other offerings I've found

but Deepgram seems like a good idea to integrate as well

neon halo

What is your use case? You're basically trying to take a stream of audio in one language, reproduce it in another, and have ElevenLabs speak it?

quasi igloo

pretty much

neon halo

Ah, yeah, that's where it gets complex. Because of the different grammatical structures of other languages, you can't just trim sentences the way you can in English, so you need a bit of buffer time.

quasi igloo

yeah, it's pretty complex due to all the non-English fun times.

so I can't just get back the partial results from Deepgram

I'd need to wait for the complete sentences still
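
Waiting for complete sentences can be sketched as a small buffer that accumulates finalized transcript fragments and only releases text once it ends in terminal punctuation. This is a heuristic under stated assumptions (the class name `SentenceBuffer` is hypothetical): it will split on abbreviations like "Dr." and does not handle scripts such as Chinese or Japanese that don't put spaces after sentence punctuation.

```python
import re

# Terminal punctuation, optionally followed by closing quotes/brackets,
# that is followed by whitespace or the end of the buffered text.
_SENTENCE_END = re.compile(r"[.!?]+[\"')\]]*(?=\s|$)")


class SentenceBuffer:
    """Accumulates finalized transcript fragments and releases whole sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, fragment: str) -> list[str]:
        """Add a fragment; return any sentences that are now complete."""
        self._pending = f"{self._pending} {fragment}".strip()
        sentences = []
        start = 0
        for match in _SENTENCE_END.finditer(self._pending):
            sentence = self._pending[start:match.end()].strip()
            if sentence:
                sentences.append(sentence)
            start = match.end()
        # Keep only the unfinished tail for the next call.
        self._pending = self._pending[start:].strip()
        return sentences

    def flush(self) -> str:
        """Return whatever is left (e.g. at end of stream) and reset."""
        rest, self._pending = self._pending, ""
        return rest
```

Each sentence returned by `feed` could then go to the translation stage as a unit, so the translator always sees complete grammar.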

so my best bet is probably to still stick with Whisper

then again, maybe something can be done on the audio end

neon halo

Maybe, but there's actually room to improve, because Deepgram has some unique utterance and endpointing settings you can tweak, and you can probably get good results in your target languages.
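
As a rough illustration of the settings being referred to, this sketch builds a Deepgram live-streaming URL with the `endpointing` and `utterance_end_ms` query parameters set. The helper name, the default values, and the `nova-2` model choice are assumptions; check Deepgram's documentation for which models support your target language and for the current limits on these parameters. Opening the websocket and authenticating (an `Authorization: Token ...` header) are left out.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(
    language: str,
    endpointing_ms: int = 300,
    utterance_end_ms: int = 1000,
    model: str = "nova-2",  # assumption: pick a model that supports `language`
) -> str:
    """Build a Deepgram live-streaming URL with endpointing options (hypothetical helper).

    endpointing: milliseconds of silence before a result is marked speech_final.
    utterance_end_ms: word gap (ms) that triggers an UtteranceEnd message;
    it is used together with interim_results=true.
    """
    params = {
        "model": model,
        "language": language,
        "punctuate": "true",
        "interim_results": "true",
        "endpointing": endpointing_ms,
        "utterance_end_ms": utterance_end_ms,
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"
```

Swapping `language` and sweeping `endpointing_ms` per target language is one way to find the point where sentences stop getting cut off early without adding too much wait.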

quasi igloo

if I can just detect when a sentence ends from the audio, I wouldn't really need to do the text processing
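
Detecting the end of a sentence from the audio side could look roughly like this: rather than parsing text, it relies on Deepgram's endpointing, which sets `speech_final` on a result when it detects a pause, and on the `UtteranceEnd` message sent when `utterance_end_ms` is enabled. The message shape assumed here (`type`, `is_final`, `speech_final`, `channel.alternatives[0].transcript`) follows Deepgram's live-streaming responses; the `EndpointAssembler` class is hypothetical. A pause is not always a sentence boundary, so this may still need pairing with punctuation.

```python
import json
from typing import Optional


class EndpointAssembler:
    """Joins Deepgram live results into utterances using audio-based endpoints."""

    def __init__(self) -> None:
        self._finals: list[str] = []

    def handle(self, raw_message: str) -> Optional[str]:
        """Process one websocket message; return a finished utterance or None."""
        msg = json.loads(raw_message)
        if msg.get("type") == "UtteranceEnd":
            return self._emit()
        if msg.get("type") != "Results":
            return None  # e.g. Metadata or other message types
        transcript = msg["channel"]["alternatives"][0]["transcript"].strip()
        # Interim results get revised later, so only finalized text is kept.
        if msg.get("is_final") and transcript:
            self._finals.append(transcript)
        # speech_final means endpointing detected a pause after this result.
        if msg.get("speech_final"):
            return self._emit()
        return None

    def _emit(self) -> Optional[str]:
        """Return the buffered finalized text as one utterance, if any."""
        if not self._finals:
            return None
        utterance = " ".join(self._finals)
        self._finals.clear()
        return utterance
```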

eh, maybe something to look into

neon halo

Yeah, Deepgram has punctuation support, so I would look into it.

It's a more nuanced challenge, but I'm quite certain with some focused effort you could get it to be pretty fast.

quasi igloo

I think I'll just try to figure out how much latency Whisper is contributing to begin with, then go from there

if DeepL/ElevenLabs turn out to be like 80% of the latency anyway, there's not much point in trying to shave off the little I'd save
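
Measuring how much each stage contributes could be done with a small timer that accumulates wall-clock time per named stage and reports each stage's share of the total. The `StageTimer` class and the stage names are hypothetical; wrap whatever calls the pipeline actually makes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per pipeline stage (hypothetical helper)."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the named stage's total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def shares(self) -> Dict[str, float]:
        """Fraction of total measured time spent in each stage."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}
```

For example, wrap each call in `with timer.stage("whisper"):`, `with timer.stage("deepl"):`, and `with timer.stage("elevenlabs"):`, then inspect `timer.shares()`. If the last two account for most of the total, shaving time off transcription won't change much. For streaming TTS, time to the first audio chunk may matter more than total call time.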

neon halo

I'll tell you what: I'll DM you a zip file with some of the custom optimization tools I've made for Deepgram. You can switch the languages around and play with the settings; it should be pretty quick to get up and running, and you can see what sort of difference it makes.

quasi igloo

oh, cool stuff, thanks

neon halo

Happy to share. I've learned a lot from you.