#Voxtral TTS voice parameter unclear (voice vs cloning / supported voices?)


signal quarry

Hey — I’m trying to use mistralai/voxtral-mini-tts-2603 via OpenRouter and I’m confused about voice handling.

  • The OpenAI-style endpoint /audio/speech accepts a voice parameter

  • This works for openai/gpt-4o-mini-tts

  • But for Voxtral:

    • voice doesn’t seem documented
    • behavior is inconsistent (ignored? default? unsupported?)
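For reference, the request shape being described can be sketched as below. This is a minimal illustration, not a confirmed OpenRouter recipe: the base URL is an assumption (the post only names the /audio/speech path), the helper name build_speech_request is hypothetical, and "alloy" is simply a voice name known from OpenAI's TTS models. The request is built but never sent.

```python
import json
import urllib.request

# Assumption: OpenRouter's OpenAI-compatible base URL. The thread itself
# only mentions the /audio/speech path.
BASE_URL = "https://openrouter.ai/api/v1"


def build_speech_request(model, text, voice, api_key):
    """Build (but do not send) an OpenAI-style /audio/speech request.

    Hypothetical helper for illustration; the body fields follow the
    OpenAI speech API shape: model, input, voice.
    """
    body = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        BASE_URL + "/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same call shape for both models; only the model id changes.
req = build_speech_request(
    "openai/gpt-4o-mini-tts", "Hello there.", "alloy", "sk-placeholder"
)
```

Swapping the model id for mistralai/voxtral-mini-tts-2603 keeps the body identical, which is exactly the case where the handling of voice is unclear.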

Questions:

  1. Does Voxtral support named voices at all via OpenRouter?
  2. If yes, is there a list of valid voices?
  3. Or is the intended usage reference audio (voice cloning) only?
  4. If cloning is supported, what is the correct request format on OpenRouter?

Right now it's unclear whether:

  • voice is silently ignored
  • mapped internally
  • or not supported at all
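One way to narrow these cases down is to send two requests, one with a voice name that works on another model and one with a deliberately bogus name, and compare the HTTP status codes. The sketch below only interprets the two status codes; the function name interpret_probe and the result labels are hypothetical, and the outcome is a hint rather than proof of how the provider handles the parameter.

```python
def interpret_probe(status_known: int, status_bogus: int) -> str:
    """Interpret status codes from two /audio/speech calls.

    status_known: response status when sending a plausible voice name.
    status_bogus: response status when sending a made-up voice name.
    Labels are illustrative, not official terminology.
    """
    known_ok = 200 <= status_known < 300
    bogus_ok = 200 <= status_bogus < 300

    if known_ok and not bogus_ok:
        # Only the bogus name is rejected: names are checked against a list.
        return "validated"
    if known_ok and bogus_ok:
        # Anything is accepted: the value is ignored or mapped to a default.
        return "not validated"
    if not known_ok and not bogus_ok:
        # Both fail: voice may be unsupported, or the request fails for
        # some other reason (auth, model id, body shape).
        return "unsupported or failing"
    # Bogus accepted but the plausible name rejected: no clear reading.
    return "inconclusive"
```

A "not validated" result cannot by itself distinguish a silently ignored voice from one mapped internally; listening to the two outputs is still needed for that.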

Would appreciate clarification or docs 🙏

shut flax (BOT)

I'm an AI assistant - my answers may be incorrect and I cannot see screenshots. Please verify information with the OpenRouter docs.

Here's what is currently documented for mistralai/voxtral-mini-tts-2603 and voice handling on OpenRouter:

  1. Named voices via OpenRouter:
    The OpenRouter docs and the Voxtral model page do not state that Voxtral supports named voices or a voice parameter, and they do not list valid voices for Voxtral models. The audio guide documents named voice support only for OpenAI models (e.g., openai/gpt-4o-audio-preview) and does not mention Voxtral in this context (Audio Docs).

  2. List of valid voices:
    There is no published list of valid voice names for Voxtral on OpenRouter. The voice parameter appears to be undocumented and is likely unsupported for this model.

  3. Voice cloning / reference audio:
    The documentation does not mention reference audio or voice cloning for Voxtral on OpenRouter. The only documented audio support is for audio input (for transcription/analysis), not for TTS with selectable or cloned voices (Voxtral Small API).

  4. Correct request format:
    For Voxtral, the documented use case is sending an audio file for transcription/understanding, not TTS voice synthesis with named or cloned voices. The API examples only show input audio for analysis, not speech generation with voice selection (Audio Docs).

Summary:

  • The voice parameter is not docum...