#experiencing an issue with Twilio when trying to use it for real-time audio process

halcyon cypress

Hi everyone,
I'm experiencing an issue with Twilio when trying to use it for real-time audio processing with OpenAI's Whisper and GPT models. Here's a summary of the problem:

The Setup:

I’m using Twilio to handle phone calls (incoming and outgoing).
The audio from these calls is sent to OpenAI's Whisper API for transcription, and the transcribed text is then processed by OpenAI's GPT model to generate a response.
The response is sent back to Twilio to deliver it to the caller.

The Problem:

When I process the audio directly with OpenAI (bypassing Twilio), the transcription and response generation are highly accurate.
However, when the audio goes through Twilio, the recognition accuracy drops significantly, especially for dates, numbers, and other precise terms.
Twilio reports no packet loss, jitter, or other network issues. Despite this, the audio quality seems to degrade in a way that impacts OpenAI's processing.

Debugging Steps Taken:

I recorded audio from Twilio and compared it to the original raw audio. Twilio’s output seems to have subtle distortions or artifacts.
I simulated "phone-quality" audio using tools like sox (downsampling, μ-law encoding, highpass/lowpass filters) and sent it directly to OpenAI. Even with lower quality, the accuracy was fine, ruling out inherent limitations of the audio format.
Adjusting Twilio's settings (e.g., gain, sampling rate) didn’t resolve the issue.
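
To compare like with like, the μ-law step can also be checked inside the backend. Below is a minimal sketch of standard G.711 μ-law decoding to 16-bit PCM. It assumes the call audio reaches the backend as raw 8 kHz μ-law bytes (the format Twilio Media Streams uses for its base64 `media.payload`); if the audio arrives some other way, this does not apply as written. The function names are illustrative and not part of any Twilio or OpenAI SDK.

```typescript
// Decode one G.711 mu-law byte into a signed 16-bit PCM sample.
// Mu-law bytes are stored bit-inverted: 1 sign bit, 3 exponent bits, 4 mantissa bits.
function muLawDecodeSample(byte: number): number {
  const u = ~byte & 0xff;
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 (132) is the standard mu-law bias added during encoding.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole buffer of mu-law bytes into PCM16 samples.
// In Node.js the bytes could come from Buffer.from(payload, "base64"),
// where `payload` is an assumed base64 string of mu-law audio.
function muLawDecode(bytes: Uint8Array): Int16Array {
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = muLawDecodeSample(bytes[i]);
  }
  return pcm;
}
```

Decoding the same utterance this way and comparing it against the sox-generated reference may help show whether the distortion is already present in the bytes received or is introduced by the backend's own conversion.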

Current Thoughts:

The issue may be related to how Twilio processes or encodes the audio before delivering it to my backend.
I suspect some hidden transformation or compression is degrading the audio in a way that Whisper struggles to handle.
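
One thing that may be worth ruling out alongside this is a mismatch between the audio data and the header it is wrapped in before being sent on for transcription (for example, a wrong sample rate or encoding declared in the file). The sketch below wraps 16-bit mono PCM in a plain 44-byte WAV header so the audio can be saved and listened to exactly as the backend sees it. The 8000 Hz default is an assumption based on telephone audio, and `pcm16ToWav` is a hypothetical helper name.

```typescript
// Wrap mono 16-bit PCM samples in a minimal 44-byte RIFF/WAVE header.
// sampleRate defaults to 8000 Hz, which is an assumption for phone audio.
function pcm16ToWav(samples: Int16Array, sampleRate: number = 8000): Uint8Array {
  const dataBytes = samples.length * 2;
  const out = new Uint8Array(44 + dataBytes);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      out[offset + i] = text.charCodeAt(i);
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // remaining file size after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2 bytes
  view.setUint16(32, 2, true);              // block align = channels * 2 bytes
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);

  // Samples are written little-endian, as WAV requires.
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * 2, samples[i], true);
  }
  return out;
}
```

In Node.js the returned bytes could be written to disk with `fs.writeFileSync("capture.wav", wavBytes)` (the file name is arbitrary) and then played back or submitted for transcription directly, which separates "the audio is degraded" from "the audio is mislabeled".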

Help Needed:

Has anyone experienced similar issues with Twilio?
Are there alternative APIs or services that integrate better with OpenAI for handling phone calls and real-time audio processing?
Any advice or insights would be greatly appreciated. Thanks in advance!

My backend is Node.js with TypeScript.