#Urgent: Major Regression in Polish Speech-to-Speech (Quality degradation since last week)
Thanks for flagging this. I can’t find any current, documented incident affecting Voice Changer/Speech‑to‑Speech (STS); please keep an eye on the status page (it currently only notes limited availability of some support forms). (status.elevenlabs.io)
What to check right now (most common cause of accent/clarity issues in STS)
- Force the correct STS model for Polish: set model_id=eleven_multilingual_sts_v2. The STS convert endpoint defaults to the English‑only model (eleven_english_sts_v2) if you don’t specify one, which will harm non‑English results. (elevenlabs.io)
- Polish is supported in Voice Changer, and the output should preserve the accent and language of the source audio. If you’re hearing a Spanish‑like lisp or “muffled English,” that strongly suggests the wrong model was used. (elevenlabs.io)
- Voice settings for STS: when your input performance is already expressive, set style to 0 and stability high (close to 1.0) to maximize consistency. You can pass these via voice_settings on the request. (elevenlabs.io)
- If there’s any environmental noise, enable remove_background_noise=true on the STS request. (elevenlabs.io)
- Keep segments under 5 minutes for optimal processing (and split longer files). (elevenlabs.io)
- Accepted input formats (for uploads): MP3, M4A, FLAC, OGA/OGG, WAV; video: MKV, WEBM, MP4, MOV. If you feed raw PCM for lowest latency, declare file_format=pcm_s16le_16 (16‑bit mono 16 kHz). (help.elevenlabs.io)
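For the 5-minute guidance above, here is a minimal sketch of splitting a long WAV into shorter segments before upload. It uses only Python's standard `wave` module and assumes uncompressed PCM WAV input; the function name `split_wav` and the constant are illustrative, not part of any ElevenLabs tooling.

```python
import io
import wave

MAX_SEGMENT_SECONDS = 5 * 60  # the 5-minute guidance quoted above


def split_wav(wav_bytes: bytes, max_seconds: float = MAX_SEGMENT_SECONDS) -> list[bytes]:
    """Split a PCM WAV file into consecutive chunks no longer than max_seconds."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * params.framerate)
        if frames_per_chunk <= 0:
            raise ValueError("max_seconds is too small for this sample rate")
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out = io.BytesIO()
            # Each chunk keeps the channel count, sample width and sample rate
            # of the source, so no resampling or re-encoding happens here.
            with wave.open(out, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```

This cuts at fixed frame counts, so a cut can land mid-word; for dubbing work you would probably want to cut at pauses instead.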
Minimal STS API checklist (to rule out regressions/misconfig)
- Endpoint: POST /v1/speech‑to‑speech/:voice_id with model_id=eleven_multilingual_sts_v2 and (optionally) remove_background_noise=true. (elevenlabs.io)
- Choose a target voice_id, then pass voice_settings such as {"stability":1,"style":0}. (elevenlabs.io)
- If needed, verify models via GET /v1/models (look for can_do_voice_conversion and the eleven_multilingual_sts_v2 entry). (elevenlabs.io)
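The checklist above can be sketched as a small helper that assembles the URL and form fields for one STS call, so the model is always set explicitly. This is an illustration under assumptions: the host `https://api.elevenlabs.io`, the JSON-string encoding of `voice_settings`, and the helper name `build_sts_request` are not confirmed by this thread.

```python
import json
from urllib.parse import quote

API_BASE = "https://api.elevenlabs.io"  # assumed public API host


def build_sts_request(
    voice_id: str,
    *,
    model_id: str = "eleven_multilingual_sts_v2",
    stability: float = 1.0,
    style: float = 0.0,
    remove_background_noise: bool = False,
):
    """Return (url, form_fields) for POST /v1/speech-to-speech/:voice_id."""
    if not voice_id:
        raise ValueError("voice_id is required")
    url = f"{API_BASE}/v1/speech-to-speech/{quote(voice_id, safe='')}"
    fields = {
        # Always sent explicitly, because the endpoint default is the
        # English-only model according to the docs cited above.
        "model_id": model_id,
        # Assumption: voice_settings travels as a JSON-encoded string.
        "voice_settings": json.dumps({"stability": stability, "style": style}),
        "remove_background_noise": "true" if remove_background_noise else "false",
    }
    return url, fields
```

With an HTTP client such as the third-party `requests` library, the call would look roughly like `requests.post(url, headers={"xi-api-key": key}, data=fields, files={"audio": fh})`; the header and file-field names there are assumptions to check against the API reference.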
What to send us so we can investigate
- Two or three paired samples: original input WAV/MP3 and the resulting STS output. (help.elevenlabs.io)
- Exact request metadata: history_item_id and request_id for each run (you can fetch these via GET /v1/history or GET /v1/history/{id}). Also include voice_id, model_id, timestamp, and whether remove_background_noise was set. (elevenlabs.io)
- Your input format/codec and any pre‑processing applied (e.g., sharpening, sample rate). (help.elevenlabs.io)
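If it helps when gathering the metadata above, here is a rough sketch that reduces one history item (the JSON object returned by GET /v1/history/{id}) to the fields requested. The key names `history_item_id`, `request_id`, `voice_id`, `model_id` and `date_unix` are assumptions about the response shape; `summarize_run` is a hypothetical helper.

```python
from datetime import datetime, timezone


def summarize_run(item: dict, remove_background_noise: bool) -> dict:
    """Pick out the fields support asked for from one history item."""
    ts = item.get("date_unix")  # assumed: seconds since the Unix epoch
    return {
        "history_item_id": item.get("history_item_id"),
        "request_id": item.get("request_id"),
        "voice_id": item.get("voice_id"),
        "model_id": item.get("model_id"),
        "timestamp_utc": (
            datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
            if ts is not None
            else None
        ),
        # Not assumed to be in the history item, so it is passed in by hand.
        "remove_background_noise": remove_background_noise,
    }
```

Missing keys come back as `None` rather than raising, so gaps in the report are visible instead of silently dropped.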
How to file the report
- Submit a ticket with the above artifacts via the official support form. If the form is temporarily unavailable (see status page banner), reach out as directed there. (help.elevenlabs.io)
Reference docs you may find useful
- Voice Changer (capabilities, best practices, supported languages, parameters).
Subject: CRITICAL BUG: Polish STS routing to English encoder despite Multilingual v2 selection
Introduction: I am a professional creator using ElevenLabs for high-end Polish dubbing. I am reporting a severe regression in Speech-to-Speech (STS) that has made the tool unusable since last week.
The Evidence:
Model Selection is Correct: I explicitly use eleven_multilingual_sts_v2. However, the output sounds as if it is being processed by the English-only model.
Evolution of Phonetic Errors: Last week it was a Spanish lisp; now it is a muffled, "mushy" English accent. The Slavic "sharpness" of Polish consonants is gone.
High-Quality Input: I am using sharpened, studio-grade dialogues (processed from Veo 3). Two weeks ago, the system handled lower-quality audio perfectly. Now, it fails even on pristine, high-bitrate files.
TTS vs. STS: My Text-to-Speech (TTS) results are still perfect. The bug is 100% isolated to the STS encoder/mapping.
My Settings:
Model: Multilingual v2 (Confirmed)
Stability: Tested at 0.8 - 1.0
Style Exaggeration: 0
Input Language: Polish
Request: Please investigate why the eleven_multilingual_sts_v2 model is currently failing to preserve Polish phonetics. This looks like a server-side routing error where Polish requests are being treated with English phonetic weights.
Thanks for the detailed report. Here’s what I can verify from the official docs and what you can do to isolate whether requests are being routed to the English STS model.
What the docs confirm
- The Speech-to-Speech (Voice Changer) endpoint is POST /v1/speech-to-speech/:voice_id and accepts an audio file plus parameters including model_id. Critically, its default model is eleven_english_sts_v2 unless you explicitly set eleven_multilingual_sts_v2. There is no language_code parameter for STS; language/accent are inferred from the source audio. (elevenlabs.io)
- Voice Changer (STS) “preserves the source audio’s accent and language,” and Multilingual v2 supports Polish. (elevenlabs.io)
- Known UI quirk: when switching between Text-to-Speech and Voice Changer, the selected model can fail to switch correctly; reselecting the model (or toggling to another model and back) resolves it. (help.elevenlabs.io)
- You can confirm what model actually ran by fetching the history item: the History API returns model_id and source (e.g., STS) for each generation. (elevenlabs.io)
- STS input constraints and options: max 5 minutes per clip; accepted input formats include WAV/MP3/FLAC/M4A/OGG; optional remove_background_noise and low-latency raw PCM option file_format=pcm_s16le_16. (help.elevenlabs.io)
- Official model IDs include eleven_multilingual_sts_v2 and eleven_english_sts_v2. (help.elevenlabs.io)
- As of today (February 2, 2026), the public status page shows all systems operational; no incident is posted for STS. (status.elevenlabs.io)
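To cross-check the model IDs listed above against what your account actually reports, a small filter over the GET /v1/models response might look like this. It assumes the response is a JSON array of objects carrying `model_id` and `can_do_voice_conversion`; the function name is illustrative.

```python
def sts_capable_model_ids(models: list[dict]) -> list[str]:
    """Return the IDs of models that report voice-conversion (STS) support."""
    return [
        m["model_id"]
        for m in models
        if m.get("can_do_voice_conversion") and "model_id" in m
    ]


def has_multilingual_sts(models: list[dict]) -> bool:
    """True if eleven_multilingual_sts_v2 is among the STS-capable models."""
    return "eleven_multilingual_sts_v2" in sts_capable_model_ids(models)
```

The input is the already-decoded JSON, so fetching it (and authenticating) is left to whatever HTTP client you use.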
Targeted checks to rule out silent routing to the English STS model
- Verify the endpoint and payload
- Ensure you are calling /v1/speech-to-speech/:voice_id (not a TTS endpoint) and that the multipart form contains model_id=eleven_multilingual_sts_v2. The STS default is English if model_id is omitted or overridden by a wrapper. (elevenlabs.io)
- Confirm what ran server-side
- After a failing run, call GET /v1/history/:history_item_id and check:
- source is STS
- model_id equals eleven_multilingual_sts_v2
If model_id shows eleven_english_sts_v2, the request was not processed by the multilingual STS model. (elevenlabs.io)
- If using the web UI
- Reopen Voice Changer and manually reselect “Multilingual STS v2,” or temporarily switch to English STS and back. This addresses the documented model-switching issue. (help.elevenlabs.io)
- Input/audio sanity
- Keep a short Polish diagnostic clip (10–20 s) with clear Polish sibilants/consonants; avoid multi-language content.
- Style/stability: for expressive source performances, set Style=0% and Stability high (near 100%) as recommended. (elevenlabs.io)
- If you suspect any environmental spill, try remove_background_noise=true to minimize artifacts.
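The server-side check described above (source is STS, model_id is the multilingual model) can be sketched as a function over the decoded history item. The `source` and `model_id` keys follow the description in this thread; the function name and message wording are illustrative.

```python
EXPECTED_MODEL = "eleven_multilingual_sts_v2"
EXPECTED_SOURCE = "STS"


def check_history_item(item: dict) -> list[str]:
    """Return a list of mismatches; an empty list means the run looks as expected."""
    problems = []
    source = item.get("source")
    model_id = item.get("model_id")
    if source != EXPECTED_SOURCE:
        problems.append(f"source is {source!r}, expected {EXPECTED_SOURCE!r}")
    if model_id != EXPECTED_MODEL:
        problems.append(f"model_id is {model_id!r}, expected {EXPECTED_MODEL!r}")
    return problems
```

An empty result only shows that the recorded model and source match; it says nothing about audio quality, which still has to be judged by ear.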
Hi,
Thank you for the detailed troubleshooting guide.
I have performed all the targeted checks you suggested to rule out "silent routing" to the English model, but the issue persists. Here are my findings:
Server-side verification: I checked the GET /v1/history for the failed generations. The response explicitly confirms that model_id was eleven_multilingual_sts_v2. It was not routed to eleven_english_sts_v2.
UI/Model State: I am aware of the UI switching quirk. I have manually toggled between models to ensure the correct state.
Input Audio: My input is a clear Polish TTS sample (Voice: Amy).
The Core Issue: Despite confirming server-side that eleven_multilingual_sts_v2 is running, the output audio sounds slurred and lacks diction (it sounds "mushy"), which was not happening previously with the exact same workflow.
Since I have confirmed that the request is not defaulting to the English model, your previous diagnosis does not apply here. The issue seems to be a quality regression within the eleven_multilingual_sts_v2 model itself (specifically when converting Female Input -> Male Output in Polish), rather than a routing error.
Please escalate this to the technical team, as the standard troubleshooting regarding model_id has been ruled out.
Best regards, Pawel
Thanks for running the server-side checks. Based on our docs:
- For Voice Changer (STS), the correct model for multilingual conversion is eleven_multilingual_sts_v2, and Polish is one of the 29 supported languages. The source audio’s accent/language should be preserved in the output. If this is not happening, it warrants a technical review. (elevenlabs.io)
Targeted checks you can try (doc-backed) before we escalate:
- Ensure you explicitly pass model_id=eleven_multilingual_sts_v2 in POST /v1/speech-to-speech/:voice_id (default is eleven_english_sts_v2). (elevenlabs.io)
- Try a lossless or higher-fidelity output_format (e.g., wav_44100 or mp3_44100_192) to rule out codec “mushiness.” Avoid optimize_streaming_latency, which trades quality for latency and is deprecated. (elevenlabs.io)
- If any room noise/reverb is present, set remove_background_noise=true. (elevenlabs.io)
- Keep inputs under 5 minutes and use a supported input format (e.g., WAV/FLAC/M4A/MP3). (help.elevenlabs.io)
- For consistency, use Stability near 100% and Style at 0% (as you already do). If you’ve customized voice_settings, verify speed/similarity/use_speaker_boost via the Voice Settings endpoints. (elevenlabs.io)
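One way to run the checks above systematically is to convert the same short clip under every combination of output format and noise-removal setting, then compare the results side by side. The sketch below only builds the parameter sets; the format strings are the ones named in this thread, and whether `output_format` is sent as a query parameter or a form field is left to the API reference.

```python
from itertools import product


def build_ab_matrix(output_formats, noise_flags=(False, True)):
    """Return one parameter dict per (output_format, remove_background_noise) pair."""
    return [
        {
            "model_id": "eleven_multilingual_sts_v2",
            "output_format": fmt,
            "remove_background_noise": flag,
        }
        for fmt, flag in product(output_formats, noise_flags)
    ]


# Example: the two formats suggested above, each with and without noise removal.
runs = build_ab_matrix(["wav_44100", "mp3_44100_192"])
```

Keeping the input clip, target voice and voice_settings fixed across all runs means any audible difference can be attributed to the parameter that changed.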
If the issue persists after the above, please open a support ticket and include the following so engineering can investigate model behavior for Polish, especially female input → male output:
- 2–3 example history_item_id values (and request_id if present) from GET /v1/history showing the affected generations. (elevenlabs.io)
- The exact target voice_id, your full request parameters (model_id, output_format, voice_settings JSON, remove_background_noise/file_format), and the original input audio files. (elevenlabs.io)
- A short note on the timestamps where “mushy/slurred” articulation is audible.
You can submit this via the Help Center “Submit a request” form or email team@elevenlabs.io (include IDs, audio, and screenshots). I’m happy to forward internally once you share those artifacts. (help.elevenlabs.io)
References:
- Voice Changer (capabilities, languages, best practices): accent/language preserved; stability/style guidance. (elevenlabs.io)
- STS API (parameters, defaults, output formats, noise removal, deprecated latency flag). (elevenlabs.io)
- Models overview (STS model selection). (elevenlabs.io)
- History API (
Hi,
Thank you for the additional checks. I have proceeded with the "Targeted checks" as requested in your last message:
Output Format & Codecs: I verified that the issue is not caused by compression artifacts. I have tested with high-fidelity settings and standard output formats (MP3/WAV), avoiding latency optimizations. The "mushiness" persists regardless of the file format.
Model Verification: Confirmed again that model_id is explicitly eleven_multilingual_sts_v2.
Background Noise: Tested with remove_background_noise=true to rule out reverb interference.
The Issue Remains: The generated audio lacks diction and sounds slurred when converting Polish Female Input (Amy) to Male Output (George). Since standard troubleshooting is exhausted, I am providing the data requested for engineering investigation.
Requested Artifacts:
Target Voice ID: George - Generic British Royal
Example History Item IDs (Failed Generations):
ID 1: tQUYVW533oHd06GxtfNL
ID 2: aLyz4wqFPOVAifQp6itE
Input Files: Attached to this email (Original "Amy" TTS file).
Description: The articulation fails specifically on Polish consonants, turning them into a slur/mumble.
Please forward this to the engineering team to investigate the model behavior for Polish female-to-male conversion.
Best regards, Pawel