Hey team! We're building a real-time voice coaching app using Pipecat + Daily WebRTC + Deepgram
nova-3 streaming STT.
Problem: When users connect via Bluetooth headsets, their mic audio is captured over BT SCO (8kHz
narrow-band CVSD, or 16kHz mSBC). The audio arrives at Deepgram as 16kHz linear16 (WebRTC upsamples
it), but the actual frequency content tops out at roughly 4kHz for CVSD and 8kHz for mSBC. We're
seeing significantly degraded WER, with garbled transcriptions like "exercises to redeem my short of
aim" instead of "exercises to reduce my shortness of breath."
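To make the "16kHz container, narrow-band content" point concrete, here is a minimal sketch of how we could flag such audio on the backend: measure what fraction of a short frame's spectral energy sits at or above 4kHz. The function names, the 256-sample frame, and the 4kHz cutoff are our own illustrative choices, not anything from Pipecat, Daily, or Deepgram, and the pure-Python DFT is only for illustration (numpy.fft would be the practical choice). Note this would only flag CVSD-like audio, since mSBC still carries content above 4kHz.

```python
import cmath
import math


def sine_frame(freq_hz, n=256, sample_rate=16000):
    """Generate a test frame containing a single sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def high_band_energy_ratio(samples, sample_rate=16000, cutoff_hz=4000.0):
    """Fraction of spectral energy at or above cutoff_hz in one short frame.

    A value near 0 on speech sampled at 16kHz suggests the content was
    band-limited upstream (for example, 8kHz CVSD upsampled to 16kHz).
    """
    n = len(samples)
    if n < 2:
        return 0.0
    # Hann window to limit spectral leakage between bins.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    total = 0.0
    high = 0.0
    # Naive DFT over the positive-frequency bins, skipping DC.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate / n >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


if __name__ == "__main__":
    print(high_band_energy_ratio(sine_frame(1000)))  # tone below the cutoff
    print(high_band_energy_ratio(sine_frame(6000)))  # tone above the cutoff
```

In practice we would average this over many voiced frames before deciding anything, since a single frame of silence or a vowel with little high-frequency energy would look narrow-band too.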
Questions:
- For narrow-band BT SCO audio upsampled to 16kHz, should we use nova-2-phonecall instead of
  nova-3?
- Does nova-3 have any built-in robustness to narrow-band/upsampled audio, or is it strictly
  optimized for wideband?
- Should we pass sample_rate: 8000 when we know the source is BT SCO, even though the actual
  data is in a 16kHz container?
- Any other recommendations for handling degraded BT mic input?
Usage params: model=nova-3, encoding=linear16, sample_rate=16000, channels=1, interim_results=true,
smart_format=true, endpointing=500 (ms).
Also, if you have any tips on Bluetooth packages/libs, please let us know. Our backend is in Python.
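For reference, here is roughly how the usage params above would look as a raw streaming request URL, which is also how we would A/B nova-3 against nova-2-phonecall. This is a sketch under assumptions: we normally go through Pipecat's Deepgram service rather than building the URL ourselves, build_listen_url is a hypothetical helper of ours, and the endpoint and parameter names reflect our understanding of Deepgram's live streaming API.

```python
from urllib.parse import urlencode

# Deepgram live-streaming endpoint, as we understand it.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-3", sample_rate=16000):
    """Build the streaming URL for our current params (hypothetical helper)."""
    params = {
        "model": model,
        "encoding": "linear16",
        "sample_rate": sample_rate,  # rate of the PCM we actually send
        "channels": 1,
        "interim_results": "true",
        "smart_format": "true",
        "endpointing": 500,  # milliseconds of silence
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url())
    print(build_listen_url(model="nova-2-phonecall"))
```

Only the model changes between the two variants; sample_rate stays at 16000 in both because that is the rate of the PCM we actually send.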