#Stream to Twilio Voice?
I haven't found a solution to this. I now temporarily store the file in R2... Did you end up finding a better solution?
Hey guys, any luck with this? Trying to figure this out
Same. Any luck?
bump on this
Still nothing. Any luck, guys?
I just reached out to them about it
To ElevenLabs support?
Hey hey! Sounds similar to some of the stuff we do at Bland.ai. We're an API for AI phone calling; like twilio, we make it really easy for developers to add inbound and outbound AI phone calling to their applications.
Check us out! 🙂
Why would I use you over Twilio?
For context: we do AI phone calling - not regular phone calling. Meaning, our AI agent can make a call on your behalf, armed with an objective (a given task) and a set of parameters (info it needs to complete the task).
Coming back to your question:
You could either 1) Build your own phone calling infra on top of Twilio or 2) Use Twilio's programmable voice.
Option 1: Building your own AI phone calling infra on top of Twilio is hard. Largest challenges are 1) Enabling LLMs to understand the nuances of human speech and 2) Doing so with low latency
Option 2: If you use Twilio's voice API, the voice sounds super robotic and the talk-track is pre-scripted. You don't have the same flexibility, or the ability to engage someone in natural conversation.
So nobody figured this out with ElevenLabs?
I believe you. If you want to share how you did that, I'd appreciate it; if not, this is a very stupid place to try to sell your service.
@pastel coral we're going to do a HN launch shortly; happy to send you the early version of our post though 😄
Will update this thread once it's live
Lol I’m not paying bland.ai money for a tiny feature that other tts providers already have for free
How about you share with the developers if you actually know something instead of finding customers who aren’t a good fit for your product through this
Ok, I managed to do it with ffmpeg converting the stream on the fly. Works great:
`voice.textToSpeechStream(elevenLabsApiKey, voiceID, answerText).then((res) => {
  // Pipe the ElevenLabs audio stream into ffmpeg, which converts it on the fly
  res.pipe(ffmpeg.stdin);
  const CHUNK_SIZE = 320; // bytes per media message (40 ms of 8 kHz mu-law audio)
  let buffer = Buffer.alloc(0);
  ffmpeg.stdout.on('data', (chunk) => {
    buffer = Buffer.concat([buffer, chunk]);
    // Send every complete chunk; keep the remainder for the next 'data' event
    while (buffer.length >= CHUNK_SIZE) {
      const chunkToSend = buffer.slice(0, CHUNK_SIZE);
      buffer = buffer.slice(CHUNK_SIZE);
      const base64EncodedChunk = chunkToSend.toString('base64');
      const mediaMessage = {
        event: 'media',
        streamSid: streamSid,
        media: {
          payload: base64EncodedChunk,
        },
      };
      ws.send(JSON.stringify(mediaMessage));
    }
  });
})`
@spare fern Let me get this straight: you are sending this line to Twilio? ws.send(JSON.stringify(mediaMessage));
Sending the media message through the web socket. Not good? (Amateur developer here)
I see, but sending media through the socket was never the problem. The problem was converting it to the right format before sending it to Twilio
Here are the ffmpeg settings:
const ffmpeg = spawn('ffmpeg', [ '-i', 'pipe:0', '-f', 'mulaw', '-ar', '8000', '-ac', '1', 'pipe:1' ]);
Thanks mate !!!!!! LIFE SAVERRRRRRRR 🥹 ! Have been looking for solution for the whole day !!!!!!
Hi @spare fern , just got a quick question. Did you send a mark message and clear message after the media message was sent ? 🤔🤔
No, I didn’t. Works fine for my use case
Got it ! Thanks for your reply 🤩🫡
Hey, I managed to do the same. Can anybody share how they connect and send these media chunks to Twilio? I am using connect.stream(url) to connect to Twilio and then trying to send raw media through the socket, but I am not hearing anything on my phone. Can anybody help me? I can provide the Python code
I think you should check if the media is in MULAW format. I tested with @spare fern 's solution, works well.
I used the same method for conversion, but I am using socketio for emitting the data, maybe that is the problem? Also, what exactly is the streamSid, and where can I find it?
It's from the start message
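A small sketch of pulling it out in Python, assuming the message shape Twilio sends when a media stream opens (an `event` of `start` with the ID under `start.streamSid`). The function name is hypothetical.

```python
import json
from typing import Optional


def stream_sid_from_message(raw: str) -> Optional[str]:
    """Return the streamSid if this websocket message is the 'start' event."""
    message = json.loads(raw)
    if message.get("event") == "start":
        return message["start"]["streamSid"]
    return None
```

Keep the returned value around for the rest of the call, since every outgoing media message has to carry it.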
Sorry for bothering you, could you share the code where you connect to Twilio so I can see an example, please?
Hi, if you are using Python, I think this is relevant:
https://stackoverflow.com/questions/75475925/stream-audio-back-to-twilio-via-websocket-connection
Will look into it. Thank you ❤️💪🏻
Managed to connect everything, thank you, but I have conversion issues. I am using a very similar approach to @spare fern 's in Python, but all I hear is noise. I am not sure what I am doing wrong, so I will paste the code here and maybe someone can help me fix it.
`def convert_raw_audio_chunk_to_mulaw(raw_audio_chunk):
    try:
        # Define the FFmpeg command
        ffmpeg_command = [
            'ffmpeg',
            '-f',
            's16le',  # Input format (16-bit little-endian)
            '-ar',
            '44100',  # Input sample rate
            '-ac',
            '1',  # Input channels (mono)
            '-i',
            'pipe:0',  # Input from stdin
            '-f',
            'mulaw',  # Output format (mulaw)
            '-ar',
            '8000',  # Output sample rate
            '-ac',
            '1',  # Output channels (mono)
            '-loglevel',
            'error',  # Suppress FFmpeg logs
            'pipe:1'  # Output to stdout
        ]
        # Start the FFmpeg process
        ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                          stdin=subprocess.PIPE,
                                          stdout=subprocess.PIPE,
                                          stderr=subprocess.PIPE)
        # Write the chunk to FFmpeg's stdin
        ffmpeg_process.stdin.write(raw_audio_chunk)
        # Close stdin to signal the end of input
        ffmpeg_process.stdin.close()
        # Wait for FFmpeg to finish
        ffmpeg_process.wait()
        # Check for errors
        if ffmpeg_process.returncode != 0:
            raise Exception(
                f'FFmpeg error: {ffmpeg_process.stderr.read().decode("utf-8")}')
        # Get the mulaw-encoded audio from FFmpeg's stdout
        mulaw_audio = ffmpeg_process.stdout.read()
        # Encode the mulaw audio as base64
        base64_audio = base64.b64encode(mulaw_audio).decode()
        return base64_audio
    except Exception as e:
        print(f'Error: {str(e)}')
        return None`
This is my other attempt, based on the Stack Overflow discussion you sent me:
`class AudioConverter:
    def __init__(self):
        self.input_sample_rate = 44100
        self.output_sample_rate = 8000
        self.audio_buffer = b''
    async def convert_audio_chunk_to_xmulaw(self, audio_chunk):
        try:
            if len(audio_chunk) == 2048:
                # Perform sample rate conversion from 44100Hz to 8000Hz
                converted_audio = audioop.ratecv(audio_chunk, 2, 1,
                                                 self.input_sample_rate,
                                                 self.output_sample_rate, None)
                # Convert the chunk to mulaw using lin2ulaw
                mulaw_audio = audioop.lin2ulaw(converted_audio[0], 2)
                # Encode the mulaw audio as base64
                base64_audio = base64.b64encode(mulaw_audio).decode("utf-8")
                return base64_audio
            else:
                pass
        except Exception as e:
            print(f'Error: {str(e)}')
            return None
`
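For reference, here is roughly what a mu-law encode step does to each sample, written as a pure-Python sketch of the common G.711 reference algorithm (not the `audioop` source, and the function names are made up). It may be handy because `audioop` was removed from the standard library in Python 3.13. Note that it only makes sense on 16-bit signed PCM: if the incoming chunks are actually compressed audio such as MP3 (an assumption, the thread does not say), encoding the raw bytes this way would produce noise.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_mulaw_byte(sample: int) -> int:
    """Encode one 16-bit signed PCM sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, 7 down to 0
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16le_to_mulaw(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit mono PCM bytes as mu-law bytes."""
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])
    return bytes(linear_to_mulaw_byte(s) for s in samples)
```

This does not resample, so the PCM still has to be at 8000 Hz before encoding.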
In my experience, I never set the input sample rate or anything else about the input; just make sure the output sample rate is 8000
I stuck to @spare fern 's ffmpeg setup and didn't change anything at all. Just like this:
const ffmpeg = spawn('ffmpeg', [
'-i', 'pipe:0',
'-f', 'mulaw',
'-ar', '8000',
'-ac', '1',
'pipe:1'
]);
Managed to fix the problem, thank you so much!
@pastel coral also looking to fix this in Python, what ended up working for you?
I will share it with you tomorrow. I am having some issues. The first is latency: I cannot get it below 3s, and I think it is a Python library issue. The second is the way the AI generates chunks, because the punctuation makes the audio sound like shit.
If someone has solved that, I would love to read about it.
Currently I'm just splitting it by sentences and that works fine for me (I just send a POST request to the API endpoint, not the Python library)
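A rough sketch of the sentence-splitting idea, assuming the LLM output arrives as a stream of small text chunks. The function name and the regex are illustrative, not what anyone in the thread actually used.

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(text_chunks: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text and yield it one complete sentence at a time."""
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left once the stream ends
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can then go to the TTS endpoint as its own request. A simple regex like this will also split after abbreviations such as "Dr.", so treat it as a starting point.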
I'm trying to use the library for input streaming but running into the mulaw conversion issue
I'll play around with it today
I am using input streaming in the ElevenLabs Python library. Is there input streaming via POST request?
I will share my conversion code with you, I'm just not home right now
This is the way I am using the converter:
`class AudioConverter:
    async def convert_audio_chunk_to_xmulaw(self, audio_chunk):
        try:
            # Define the FFmpeg command
            ffmpeg_command = [
                'ffmpeg', '-i', 'pipe:0', '-f', 'mulaw', '-ar', '8000', '-ac', '1',
                'pipe:1'
            ]
            # Start the FFmpeg process
            ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                              stdin=subprocess.PIPE,
                                              stdout=subprocess.PIPE,
                                              stderr=subprocess.PIPE)
            # Feed the audio chunk to FFmpeg's stdin
            mulaw_audio, stderr = ffmpeg_process.communicate(input=audio_chunk)
            # Check for errors
            if ffmpeg_process.returncode != 0:
                raise Exception(f'FFmpeg error: {stderr.decode("utf-8")}')
            # Encode the mulaw audio as base64
            base64_audio = base64.b64encode(mulaw_audio).decode("utf-8")
            return base64_audio
        except Exception as e:
            print(f'Error: {str(e)}')
            return None
# Usage
converter = AudioConverter()`
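One difference from the Node version earlier in the thread: that code pipes the whole audio stream through a single ffmpeg process, while this converter starts a new process for every chunk. A minimal sketch of the single-process approach in Python, assuming `ffmpeg` is on the PATH. `build_ffmpeg_command` and `convert_stream` are hypothetical names, and whether this removes the pauses reported in this thread is untested.

```python
import subprocess
import threading
from typing import Iterable, Iterator, List


def build_ffmpeg_command(sample_rate: int = 8000) -> List[str]:
    """Output flags from the thread's ffmpeg settings, plus '-loglevel error'."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", "pipe:0",
        "-f", "mulaw", "-ar", str(sample_rate), "-ac", "1",
        "pipe:1",
    ]


def convert_stream(audio_chunks: Iterable[bytes], frame_size: int = 320) -> Iterator[bytes]:
    """Feed all chunks to one ffmpeg process and yield mu-law frames as they appear."""
    process = subprocess.Popen(build_ffmpeg_command(),
                               stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE)

    def feed() -> None:
        # Write on a separate thread so the read loop below cannot deadlock
        for chunk in audio_chunks:
            process.stdin.write(chunk)
        process.stdin.close()  # signals end of input to ffmpeg

    threading.Thread(target=feed, daemon=True).start()
    while True:
        frame = process.stdout.read(frame_size)
        if not frame:
            break
        yield frame  # the final frame may be shorter than frame_size
    process.wait()
```

Each yielded frame can be base64-encoded and sent as one media message. In async code the blocking reads would need to run in an executor or use `asyncio.create_subprocess_exec` instead.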
@craggy delta So you managed to get the response time below 4 seconds? Because I cannot get that for some reason
Ah that implementation worked for a while but eventually I ran into this issue:
I also noticed that for some reason, after a little bit, it would briefly pause and then keep going, which sounded pretty unnatural
Here's how I tried to reproduce it:
def call_gpt(message_list: list, model: str):
    for chunk in openai.ChatCompletion.create(
        model=model,
        messages=message_list,
        max_tokens=100,
        stream=True
    ):
        # Extract the content from the chunk if available
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk
text_stream = call_gpt(message_list, request.app['model'])
for chunk in elevenlabs.generate(
    text=text_stream,
    voice="Thomas",
    stream=True,
    latency=4,
):
    convert_audio = await convert_audio_chunk_to_xmulaw(chunk)
    await send_async(twilio_ws, stream_sid, convert_audio)
send_async just sends it to twilio
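Since `send_async` itself was not posted, here is a guess at what it could look like, using the same media message shape as the Node example earlier in the thread. It assumes `ws` exposes an async `send(str)` method, as the `websockets` library does; with an aiohttp websocket the call would be `send_str` instead.

```python
import asyncio
import json
from typing import Optional


async def send_async(ws, stream_sid: str, base64_audio: Optional[str]) -> None:
    """Wrap base64 mu-law audio in a Twilio 'media' message and send it."""
    if not base64_audio:
        # The converters in this thread return None on error; skip those chunks
        return
    message = {
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64_audio},
    }
    await ws.send(json.dumps(message))
```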
Yeah, for some reason i am getting the same error, but it is not consistent, and i am not sure why. But that is what i have for now.
lmao
Has anyone tried any Twilio alternatives or ElevenLabs alternatives?
No, we managed to do that, but it is not working perfectly atm. I am trying to improve it, but it is very buggy
Ummm, not sure if this is helpful, but I found that 11labs streaming seems to need some init time...
Not sure if you guys have implemented this already...
The left-hand side is with init, the right-hand side is without...
The major difference is in the first batch, and you can see a big difference in the total run time.
I created a dummy function that sends an empty string to 11labs when my server starts up.
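A sketch of that warm-up idea, kept independent of any particular SDK. `synthesize` is a placeholder for whatever function streams audio from the TTS provider in your code, not a real library call.

```python
import time
from typing import Callable, Iterable


def warm_up(synthesize: Callable[[str], Iterable[bytes]], text: str = "") -> float:
    """Run one throwaway TTS request at startup and return how long it took."""
    started = time.perf_counter()
    for _ in synthesize(text):
        pass  # drain the stream and discard the audio
    return time.perf_counter() - started
```

Calling this once when the server boots, and logging the returned duration, makes it easy to compare against the first real request.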
Interesting
How do I send an OpenAI completion stream to ElevenLabs and stream the audio response to a Twilio call? I'm using Node.js
Can someone help me with this? I am ready to work with them