To set the language for the Speech-to-Text (STT) model in the Voice Agent API, specify it in the agent.listen section of the settings configuration message, which you send immediately after establishing the WebSocket connection.
For example, to set the language parameter:
{
  "agent": {
    "listen": {
      "model": "nova-3",
      "language": "en"
    }
  }
}
Set the language parameter to the code of the language you want transcribed, for example en for English or es for Spanish. The full list of supported languages and compatible models is in the Models & Languages Overview.
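If you build the settings message in code, a minimal Python sketch could look like the one below. It produces only the agent.listen fragment shown in this answer; a real settings message needs further fields that are left out here. The helper name build_settings is hypothetical, not part of any Deepgram SDK.

```python
import json


def build_settings(language: str, model: str = "nova-3") -> str:
    """Return a JSON string holding only the agent.listen fragment.

    Hypothetical helper: the other fields a complete settings message
    requires are deliberately omitted from this sketch.
    """
    settings = {
        "agent": {
            "listen": {
                "model": model,
                "language": language,
            }
        }
    }
    return json.dumps(settings)


# Example: request Spanish transcription with the default model.
message = build_settings("es")
```

You would then send message as the first text frame on the WebSocket connection, using whichever WebSocket client you already have.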
For multilingual content, you can use the Nova-2 or Nova-3 models with language=multi to enable Multilingual Codeswitching, which lets the model detect and transcribe multiple languages within the same audio stream.
For English dialects specifically, when using Nova-2 or Nova-3, you can simply set language=en and the model will automatically detect the dialect of English being spoken.
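If you want to catch an incompatible combination before sending it, a small client-side check along these lines may help. It is only a sketch: the function name resolve_language is hypothetical, and matching models by the prefixes nova-2 and nova-3 is an assumption based on the model names mentioned above, not an official compatibility rule.

```python
# Assumption: models that support language=multi have names starting with
# one of these prefixes. Check the official model list before relying on it.
MULTI_CAPABLE_PREFIXES = ("nova-2", "nova-3")


def resolve_language(model: str, language: str) -> str:
    """Return the language code unchanged, rejecting language=multi on
    models that this sketch does not treat as multilingual-capable."""
    if language == "multi" and not model.startswith(MULTI_CAPABLE_PREFIXES):
        raise ValueError(
            f"language=multi is assumed to need Nova-2 or Nova-3, got {model!r}"
        )
    return language


# Example: multilingual codeswitching on Nova-3 passes the check.
language = resolve_language("nova-3", "multi")
```

For English-only audio with dialect detection, you would pass "en" instead, which this check accepts for any model name.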
The Voice Agent API uses the same underlying STT capabilities as Deepgram's other speech recognition services, but integrates them into a conversational AI framework that combines speech recognition, natural language understanding, and text-to-speech.