# What config is best for phonecall live transcription?


pearl folio

Which config is best for you is, of course, subjective.

One thing I would start with is that phonecall and 2-ea are the models, and nova is a tier. We also accept them combined as nova-phonecall and nova-2-ea, which explains some of the complexity you're already seeing in your config. Our docs will be changing soon to make this easier to understand.
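
To make the tier/model naming concrete, here is a small sketch that splits a combined name back into its two parts. `splitModelName` is a hypothetical helper for illustration, not a Deepgram SDK function, and it assumes the tier is always the part before the first hyphen, as described above.

```javascript
// Hypothetical helper (not part of the Deepgram SDK): splits a combined
// "tier-model" string such as "nova-phonecall" into its two parts,
// assuming the tier is everything before the first hyphen.
function splitModelName(combined) {
  const dash = combined.indexOf('-');
  if (dash === -1) {
    // No tier prefix: treat the whole string as the model name.
    return { tier: null, model: combined };
  }
  return {
    tier: combined.slice(0, dash),   // e.g. "nova"
    model: combined.slice(dash + 1), // e.g. "phonecall" or "2-ea"
  };
}

console.log(splitModelName('nova-phonecall')); // { tier: 'nova', model: 'phonecall' }
console.log(splitModelName('nova-2-ea'));      // { tier: 'nova', model: '2-ea' }
```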

smart_format is a newer formatting option that also handles punctuation, so punctuate is no longer needed (it just hasn't been deprecated yet).
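
If your existing options set both flags, a sketch like this drops the redundant one. `dropRedundantPunctuate` is a hypothetical helper, not part of any Deepgram SDK; it only rewrites a plain options object before you pass it along.

```javascript
// Hypothetical helper: returns a copy of a live-transcription options object
// with `punctuate` removed whenever `smart_format` is enabled.
function dropRedundantPunctuate(options) {
  const result = { ...options }; // shallow copy, original is left untouched
  if (result.smart_format === true) {
    delete result.punctuate;
  }
  return result;
}

const before = { model: 'nova-phonecall', smart_format: true, punctuate: true };
console.log(dropRedundantPunctuate(before));
// { model: 'nova-phonecall', smart_format: true }
```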

Sometimes the model needs more context from a sentence before it can be confident in what it has heard and give you the best, most accurate result. interim_results is for the times when you need to see the transcript coming back in real time, before that context is available. The is_final field in the payload indicates when that output is finalised and the model has moved on to the next chunk of transcription.
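
One way to handle this on the client is to commit text only when is_final is true and treat everything else as a preview that may still change. This is a minimal sketch, assuming each message has been parsed into a simplified shape `{ is_final, channel: { alternatives: [{ transcript }] } }`; the function names are hypothetical.

```javascript
// Sketch: keep finalized text separate from the latest interim guess.
// Assumes simplified messages: { is_final, channel: { alternatives: [{ transcript }] } }.
function createTranscriptState() {
  return { finalText: [], interim: '' };
}

function applyResult(state, message) {
  const transcript = message.channel.alternatives[0].transcript;
  if (message.is_final) {
    // The model has settled on this chunk: commit it and clear the preview.
    if (transcript) state.finalText.push(transcript);
    state.interim = '';
  } else {
    // Interim result: may still be revised, so only overwrite the preview.
    state.interim = transcript;
  }
  return state;
}

// Finalized text followed by the current interim preview, if any.
function render(state) {
  return [...state.finalText, state.interim].filter(Boolean).join(' ');
}

const state = createTranscriptState();
applyResult(state, { is_final: false, channel: { alternatives: [{ transcript: 'hello word' }] } });
console.log(render(state)); // hello word
applyResult(state, { is_final: true, channel: { alternatives: [{ transcript: 'hello world' }] } });
console.log(render(state)); // hello world
```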

Enabling endpointing turns on speech_final in the results. Endpointing is used to detect when the natural flow of speech changes, for example at the end of a sentence or when a speaker takes a breath.
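
As an illustration, speech_final can be used to group finalized chunks into utterances. This is a sketch assuming simplified parsed messages of the form `{ is_final, speech_final, channel: { alternatives: [{ transcript }] } }`; `collectUtterances` and `makeResult` are hypothetical names.

```javascript
// Sketch: group finalized chunks into utterances, closing an utterance
// whenever speech_final is true (endpointing detected a pause).
function collectUtterances(messages) {
  const utterances = [];
  let current = [];
  for (const msg of messages) {
    if (!msg.is_final) continue; // interim results are ignored here
    const text = msg.channel.alternatives[0].transcript;
    if (text) current.push(text);
    if (msg.speech_final && current.length > 0) {
      utterances.push(current.join(' '));
      current = [];
    }
  }
  // Anything left over has not been endpointed yet.
  if (current.length > 0) utterances.push(current.join(' '));
  return utterances;
}

// Hypothetical helper for building test messages.
function makeResult(transcript, isFinal, speechFinal) {
  return { is_final: isFinal, speech_final: speechFinal, channel: { alternatives: [{ transcript }] } };
}

const messages = [
  makeResult('hi', false, false),
  makeResult('hi there', true, false),
  makeResult('how are you', true, true),
];
console.log(collectUtterances(messages)); // [ 'hi there how are you' ]
```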

I would suggest interim_results=true with model=nova-phonecall for phone call transcription. In the near future, nova-2 will have a phonecall model ready as part of its GA launch, and the accuracy will improve further.
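
For reference, here is a sketch of how those options map onto query parameters for a raw WebSocket connection. It assumes the streaming endpoint `wss://api.deepgram.com/v1/listen` and an 8 kHz mu-law phone stream; `buildLiveUrl` is a hypothetical helper, and the audio settings should match what your telephony provider actually sends.

```javascript
// Sketch: turn an options object into a live-streaming URL.
// The endpoint and audio settings below are assumptions for a typical phone stream.
function buildLiveUrl(options) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const liveUrl = buildLiveUrl({
  model: 'nova-phonecall',
  interim_results: true,
  smart_format: true,
  encoding: 'mulaw',
  sample_rate: 8000,
});
console.log(liveUrl);
```

Building the URL from a plain object keeps the same options usable whether you connect through an SDK or open the socket yourself.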

split topaz

Hi @pearl folio, I am trying to use nova-phonecall with Spanish (language 'es') and I get a 400 error:

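```javascript
// Deepgram Node SDK v2 setup (assumed; not shown in the original snippet)
const { Deepgram } = require('@deepgram/sdk');
const deepgram = new Deepgram(process.env.DEEPGRAM_API_KEY);

const deepgramLive = deepgram.transcription.live({
  language: 'es',
  model: 'nova-phonecall',
  sample_rate: 8000,
  encoding: 'mulaw',
});
```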

Is this model supported for language 'es'?

pearl folio