#What variant of Chinese is supported by Deepgram?


remote jolt
#

In the docs, zh is used as the language code for Chinese. But the language has variants, e.g., Mandarin and Cantonese. How do I know which one zh maps to in my application?


woven fulcrumBOT
#

Hi @remote jolt. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Deepgram supports multiple variants of Chinese, and the language codes you use in your application should match the specific variant you want to transcribe.

From the documentation, here are the supported Chinese variants and their corresponding language codes:

  • Mandarin, Simplified:
    • zh, zh-CN, zh-Hans
  • Mandarin, Traditional:
    • zh-TW, zh-Hant
  • Cantonese, Traditional:
    • zh-HK

If you use zh, it maps to Mandarin (Simplified). For Cantonese, use zh-HK. The distinction matters for accurate transcription because the models are trained on these specific variants. The codes are listed under the Nova-2 model options in the official documentation (Models & Languages Overview).

Summary for mapping:

  • Use zh, zh-CN, or zh-Hans for Mandarin (Simplified)
  • Use zh-TW or zh-Hant for Mandarin (Traditional)
  • Use zh-HK for Cantonese

If your application needs to distinguish between Mandarin and Cantonese, ensure you use the correct code as above. If you use just zh, you will get Mandarin (Simplified) transcription, not Cantonese or Traditional Mandarin.
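
As a rough illustration of the mapping above, here is a minimal Python sketch. The names `CHINESE_VARIANT_CODES` and `language_code_for` and the variant keys are hypothetical, chosen only for this example; the codes themselves are the ones listed in this thread.

```python
# Hypothetical lookup table; the codes are copied from the list in this thread.
CHINESE_VARIANT_CODES = {
    "mandarin-simplified": "zh-CN",   # zh and zh-Hans are listed as equivalents
    "mandarin-traditional": "zh-TW",  # zh-Hant is listed as an equivalent
    "cantonese": "zh-HK",
}


def language_code_for(variant: str) -> str:
    """Return the language code for an application-level variant name."""
    key = variant.strip().lower()
    try:
        return CHINESE_VARIANT_CODES[key]
    except KeyError:
        # Fail loudly instead of silently falling back to plain "zh",
        # which would give Mandarin (Simplified).
        raise ValueError(f"Unknown Chinese variant: {variant!r}") from None
```

Using explicit codes such as zh-CN rather than bare zh keeps the intended variant visible in your application code.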

For more details, see the official documentation: Nova-2 Supported Languages.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@remote jolt Kapa.ai is still learning and improving, please let me know how I did by reacting below.

remote jolt
#

this mapping works for nova-2. how does it apply to nova-3-general?

#

@woven fulcrum this mapping works for nova-2. how does it apply to nova-3-general?

woven fulcrumBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

For the nova-3-general model, the documentation does not explicitly list support for Chinese (Mandarin or Cantonese) among its available languages or as part of its multilingual (multi) mode. The supported languages for nova-3-general are:

  • Multilingual (multi): English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch
  • Individual codes: English (en), German (de), Dutch (nl), Swedish (sv), Danish (da), and their regional variants

Chinese (zh), including the Mandarin and Cantonese variants, is not listed as supported for nova-3-general. The nova-2 mapping (e.g., zh, zh-HK) therefore does not carry over to nova-3-general, since the available documentation (Models & Languages Overview) does not currently list Chinese for that model.

If you need Chinese transcription, you should use nova-2 and the appropriate language codes for the variant you require. If nova-3 adds support for Chinese in the future, the documentation will be updated to reflect that.

Summary:

  • nova-2: Use zh, zh-CN, zh-TW, zh-HK, etc., for Chinese variants.
  • nova-3-general: Chinese (Mandarin or Cantonese) is not listed as supported in the current documentation (Models & Languages Overview).
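
One way to apply this in an application is to fall back to nova-2 whenever a Chinese code is requested. The sketch below only builds the request URL for the pre-recorded `/v1/listen` endpoint with its `model` and `language` query parameters; the helper name `build_listen_url`, the `CHINESE_CODES` set, and the fallback rule are assumptions made for illustration, based on what this thread says the documentation lists.

```python
from urllib.parse import urlencode

LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"

# Chinese codes listed for nova-2 in this thread.
CHINESE_CODES = {"zh", "zh-CN", "zh-Hans", "zh-TW", "zh-Hant", "zh-HK"}


def build_listen_url(language: str, preferred_model: str = "nova-3-general") -> str:
    """Build a transcription URL, using nova-2 for Chinese codes (assumed rule)."""
    model = "nova-2" if language in CHINESE_CODES else preferred_model
    query = urlencode({"model": model, "language": language})
    return f"{LISTEN_ENDPOINT}?{query}"
```

For example, `build_listen_url("zh-HK")` selects nova-2, while `build_listen_url("en")` keeps the preferred model. Check the current documentation before relying on this rule, since model support may change.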

If you need further clarification, please refer to the official Models & Languages Overview.
