Hi Fish Audio team,
We're experiencing a degradation in TTS output quality via the WebSocket API.
Timeline:
Working correctly: ~6:00 PM PST, Mar 10
Issue onset: ~11:00 PM PST, Mar 10
Symptoms:
Incorrect word readings (misread kanji/words)
Abnormal pitch and accent in Japanese TTS output
Isolation testing:
┌────────────────────────────────────────┬──────────────────┐
│ Method                                 │ Result           │
├────────────────────────────────────────┼──────────────────┤
│ Fish AI Studio (web)                   │ Correct          │
├────────────────────────────────────────┼──────────────────┤
│ WebSocket API via Pipecat (production) │ Mispronunciation │
├────────────────────────────────────────┼──────────────────┤
│ WebSocket API standalone (no Pipecat)  │ Mispronunciation │
└────────────────────────────────────────┴──────────────────┘
Since the issue reproduces when calling the WebSocket endpoint directly without any framework, we believe the root
cause is on the WebSocket API side.
Minimal repro:
git clone https://github.com/yuki901/fish-audio-test.git
cd fish-audio-test
pip install ormsgpack httpx-ws python-dotenv
echo "FISH_AUDIO_API_KEY=<your_api_key>" > .env
python fish_tts_ws.py \
  "お電話ありがとうございます。お問い合わせ窓口でございます。恐れ入りますが、御社名とお名前をお願いいたします。" \
  -o output.mp3 \
  --voice 46745543e52548238593a3962be77e3a \
  --model s2-pro
WebSocket details:
Endpoint: wss://api.fish.audio/v1/tts/live
Protocol: start → text → stop events (ormsgpack-encoded)
Params: format=mp3, sample_rate=44100, latency=balanced, model header s2-pro
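The event sequence above can be sketched roughly as follows. This is a minimal illustration, not the exact code in the repro script: the field names (`event`, `request`, `reference_id`, `text`) and the helper names `build_events` and `encode_events` are assumptions made for illustration, and the packer is passed in so the sketch runs without extra dependencies.

```python
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]


def build_events(
    text_chunks: Iterable[str],
    voice_id: str,
    audio_format: str = "mp3",
    sample_rate: int = 44100,
    latency: str = "balanced",
) -> List[Event]:
    """Build the start -> text -> stop sequence as plain dicts.

    Field names here are assumed for illustration and may not match
    the real API schema exactly.
    """
    start: Event = {
        "event": "start",
        "request": {
            "text": "",
            "format": audio_format,
            "sample_rate": sample_rate,
            "latency": latency,
            "reference_id": voice_id,  # assumed to carry the --voice value
        },
    }
    texts: List[Event] = [{"event": "text", "text": chunk} for chunk in text_chunks]
    stop: Event = {"event": "stop"}
    return [start, *texts, stop]


def encode_events(events: Iterable[Event], packb: Callable[[Event], bytes]) -> List[bytes]:
    """Encode each event into one binary frame using the supplied packer."""
    return [packb(event) for event in events]
```

In an actual run the packer would be `ormsgpack.packb`, and each resulting frame would be sent as a binary WebSocket message, with the model (s2-pro) supplied as a header when connecting.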
Expected: Natural Japanese pronunciation matching Studio output
Actual: Incorrect readings and unnatural pitch/accent
Were any changes made to the WebSocket API around 11:00 PM PST on Mar 10? Any ETA on a fix would be appreciated.
Thanks!