Voice Agent API: How to pass the language of the STT model? | Deepgram | Page 1

livid comet · 2025-04-22T15:38:24.096Z

Hey there! This channel allows you to ask Kapa.ai for assistance, but it looks like you haven't connected your Discord profile to your Deepgram account yet. You can do so at https://community.deepgram.com. Once connected, you'll also be able to ask questions if you need human assistance!

Hi @knotty badge. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

To configure the language for the Speech-to-Text (STT) model in the Voice Agent API, specify it in the agent.listen section of the settings configuration message, which you send immediately after establishing the WebSocket connection.

Here's how you can set the language parameter:

{
  "agent": {
    "listen": {
      "model": "nova-3",
      "language": "en"
    }
  }
}

The language parameter should be set to the language code you want to use for transcription. For example, en for English, es for Spanish, etc. You can find the full list of supported languages and compatible models in the Models & Languages Overview.
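As a rough illustration, the agent.listen fragment above could be built and serialized in Python before being sent over the WebSocket. This is only a sketch: the helper name build_settings_message is hypothetical, the WebSocket object in the comment is assumed to exist already, and any other fields your settings message requires are left out because the answer above shows only the agent.listen portion.

```python
import json


def build_settings_message(language: str = "en", model: str = "nova-3") -> str:
    """Serialize the agent.listen fragment shown above to a JSON string.

    Hypothetical helper: it covers only the fields from the answer above,
    not a complete settings message.
    """
    settings = {
        "agent": {
            "listen": {
                "model": model,
                "language": language,  # e.g. "en" for English, "es" for Spanish
            }
        }
    }
    return json.dumps(settings)


# With an already-open WebSocket connection (assumed `ws` object), the
# message would be sent right after connecting, for example:
#     ws.send(build_settings_message(language="es"))
payload = build_settings_message(language="es")
print(payload)
```

Keeping the language as a function argument makes it easy to switch languages per session without editing the JSON by hand.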

If you're working with multilingual content, you can also use the Nova-2 or Nova-3 models with language=multi to enable Multilingual Codeswitching, which allows the model to detect and transcribe multiple languages within the same audio stream.

For English dialects specifically, when using Nova-2 or Nova-3, you can simply set language=en and the model will automatically detect the dialect of English being spoken.
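The model and language pairing described above can be guarded with a small client-side check before the settings message is built. The sketch below is an assumption-laden convenience, not Deepgram behavior: the helper listen_config is hypothetical, and it assumes model identifiers begin with "nova-2" or "nova-3", as in the "nova-3" value used earlier.

```python
# Model families the answer above names as supporting language=multi.
# Assumption: model identifiers start with these prefixes.
MULTI_CAPABLE_PREFIXES = ("nova-2", "nova-3")


def listen_config(model: str, language: str) -> dict:
    """Return an agent.listen fragment, rejecting 'multi' on other models.

    Hypothetical local check; the API itself may validate differently.
    """
    if language == "multi" and not model.startswith(MULTI_CAPABLE_PREFIXES):
        raise ValueError(
            f"language='multi' expects a Nova-2 or Nova-3 model, got {model!r}"
        )
    return {"model": model, "language": language}


# Multilingual codeswitching on Nova-3.
multilingual = listen_config("nova-3", "multi")

# English with automatic dialect detection, per the answer above.
english = listen_config("nova-3", "en")

print(multilingual, english)
```

Failing early on an unsupported combination is a design choice for easier debugging; the Models & Languages Overview remains the source of truth for what each model supports.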

The Voice Agent API uses the same underlying STT capabilities as Deepgram's other speech recognition services, but integrated into a conversational AI framework that combines speech recognition, natural language understanding, and text-to-speech.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: