#[Bug] WebSocket TTS quality degradation: mispronunciation & abnormal pitch since ~11PM PST Mar 10

light dove
#

Hi Fish Audio team,

We're experiencing a degradation in TTS output quality via the WebSocket API.

Timeline:

Working correctly: ~6:00 PM PST, Mar 10
Issue onset: ~11:00 PM PST, Mar 10

Symptoms:

Incorrect word readings (misread kanji/words)
Abnormal pitch and accent in Japanese TTS output

Isolation testing:

Method                                  | Result
----------------------------------------+------------------
Fish AI Studio (web)                    | Correct
WebSocket API via Pipecat (production)  | Mispronunciation
WebSocket API standalone (no Pipecat)   | Mispronunciation

Since the issue reproduces when calling the WebSocket endpoint directly without any framework, we believe the root cause is on the WebSocket API side.

Minimal repro:
git clone https://github.com/yuki901/fish-audio-test.git
cd fish-audio-test
pip install ormsgpack httpx-ws python-dotenv
echo "FISH_AUDIO_API_KEY=<your_api_key>" > .env

python fish_tts_ws.py \
  "お電話ありがとうございます。お問い合わせ窓口でございます。恐れ入りますが、御社名とお名前をお願いいたします。" \
  -o output.mp3 \
  --voice 46745543e52548238593a3962be77e3a \
  --model s2-pro

WebSocket details:

Endpoint: wss://api.fish.audio/v1/tts/live
Protocol: start → text → stop events (ormsgpack-encoded)
Params: format=mp3, sample_rate=44100, latency=balanced, model header s2-pro
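
For readers who do not want to open the repo, here is a rough sketch of the start → text → stop sequence described above. It is not the actual code in fish_tts_ws.py. The field names (`event`, `request`, `reference_id`, and so on) and the helper names `build_tts_events` and `pack_events` are assumptions for illustration; only the parameter values come from this report. Packing uses `ormsgpack.packb`, which needs the third-party ormsgpack package.

```python
from typing import Any


def build_tts_events(
    text: str,
    voice_id: str,
    audio_format: str = "mp3",
    sample_rate: int = 44100,
    latency: str = "balanced",
) -> list[dict[str, Any]]:
    """Build the three events in the order start -> text -> stop.

    NOTE: all key names here are assumed for illustration and are not
    confirmed by this thread; check the API reference before relying on them.
    """
    start = {
        "event": "start",
        "request": {
            "text": "",                # assumed: text is streamed in later events
            "reference_id": voice_id,  # assumed key for the voice ID
            "format": audio_format,
            "sample_rate": sample_rate,
            "latency": latency,
        },
    }
    return [start, {"event": "text", "text": text}, {"event": "stop"}]


def pack_events(events: list[dict[str, Any]]) -> list[bytes]:
    """Encode each event as MessagePack bytes (requires ormsgpack)."""
    import ormsgpack  # third-party; imported lazily so the builder works without it

    return [ormsgpack.packb(event) for event in events]


if __name__ == "__main__":
    events = build_tts_events("こんにちは", "46745543e52548238593a3962be77e3a")
    print([event["event"] for event in events])
```

Each packed message would then be sent as one binary WebSocket frame to the endpoint above, with the model passed as a header as noted in Params.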

Expected: Natural Japanese pronunciation matching Studio output
Actual: Incorrect readings and unnatural pitch/accent

Were any changes made to the WebSocket API around 11 PM PST yesterday? Any ETA on a fix would be appreciated.

Thanks!

#

Audio comparison (both in repo):
output_studio.mp3: Generated via Fish AI Studio (correct)
sample_output.mp3: Generated via WebSocket API (broken)

violet sky
#

they are sharing the same backend, will take a look soon

light dove
#

Thank you! Any progress?

light dove
#

Oh, it is fixed. Thank you so much!