#Stream to Twilio Voice?


true otter
#

I haven't found a solution to this. I now temporarily store the file in R2... Did you end up finding a better solution?

spare fern
#

Hey guys, any luck with this? Trying to figure this out

timber hearth
#

Same. Any luck?

craggy delta
#

bump on this

pastel coral
#

Still nothing, any luck guys ?

craggy delta
#

It looks like play.ht has native support

#

Maybe we have to move there?

timber hearth
#

I just reached out to them about it

grand prism
#

Hey hey! Sounds similar to some of the stuff we do at Bland.ai. We're an API for AI phone calling; like twilio, we make it really easy for developers to add inbound and outbound AI phone calling to their applications.

Check us out! 🙂

craggy delta
#

Why would I use you over twilio

grand prism
#

For context: we do AI phone calling - not regular phone calling. Meaning, our AI agent can make a call on your behalf, armed with an objective (a given task) and a set of parameters (info it needs to complete the task).

Coming back to your question:

You could either 1) Build your own phone calling infra on top of Twilio or 2) Use Twilio's programmable voice.

Option 1: Building your own AI phone calling infra on top of Twilio is hard. Largest challenges are 1) Enabling LLMs to understand the nuances of human speech and 2) Doing so with low latency

Option 2: If you use Twilio's voice API, the voice sounds super robotic and the talk-track is pre-scripted. You don't have the same flexibility, or the ability to engage someone in natural conversation.

pastel coral
#

So nobody figured this out with eleven labs ?

grand prism
#

We did 🙂

pastel coral
#

I believe you. If you want to share how you did that, I would appreciate it; if not, this is a very stupid place to try to sell your service.

grand prism
#

@pastel coral we're going to do a HN launch shortly; happy to send you the early version of our post though 😄

#

Will update this thread once it's live

craggy delta
#

Lol I’m not paying bland.ai money for a tiny feature that other tts providers already have for free

#

How about you share with the developers if you actually know something instead of finding customers who aren’t a good fit for your product through this

spare fern
#

Ok, I managed to do it with ffmpeg converting the stream on the fly. Works great:
`voice.textToSpeechStream(elevenLabsApiKey, voiceID, answerText).then((res) => {
    // Pipe the ElevenLabs MP3 stream into ffmpeg, which emits 8 kHz mu-law
    res.pipe(ffmpeg.stdin);

    // 320 bytes of 8 kHz mono mu-law is 40 ms of audio
    const CHUNK_SIZE = 320;
    let buffer = Buffer.alloc(0);

    ffmpeg.stdout.on('data', (chunk) => {
        buffer = Buffer.concat([buffer, chunk]);

        // Send every complete CHUNK_SIZE frame; keep the remainder for the next event
        while (buffer.length >= CHUNK_SIZE) {
            const chunkToSend = buffer.slice(0, CHUNK_SIZE);
            buffer = buffer.slice(CHUNK_SIZE);

            const base64EncodedChunk = chunkToSend.toString('base64');

            const mediaMessage = {
                event: 'media',
                streamSid: streamSid,
                media: {
                    payload: base64EncodedChunk,
                },
            };

            ws.send(JSON.stringify(mediaMessage));
        }
    });
});`
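For anyone following along in Python, the framing step in that snippet could look roughly like this. It is only a sketch: `split_frames` and `media_message` are made-up helper names, and the 320-byte frame size is simply copied from the Node.js code.

```python
import base64
import json
from typing import List, Tuple


def split_frames(buffer: bytes, frame_size: int = 320) -> Tuple[List[bytes], bytes]:
    """Cut complete frames off the front of buffer; return (frames, leftover)."""
    usable = len(buffer) // frame_size * frame_size
    frames = [buffer[i:i + frame_size] for i in range(0, usable, frame_size)]
    return frames, buffer[usable:]


def media_message(stream_sid: str, frame: bytes) -> str:
    """Build the JSON text of a Twilio 'media' message for one mu-law frame."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(frame).decode("ascii")},
    })
```

The leftover bytes are meant to be prepended to the next batch of ffmpeg output, the same way the `buffer` variable carries over between `data` events in the Node.js version.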
pastel coral
#

@spare fern Let me just get it straight, you are sending this line to twilio ? : ws.send(JSON.stringify(mediaMessage));

pastel coral
#

I see, but sending media through the socket was never the problem. The problem was converting it to the right format before sending it to Twilio

spare fern
#

Right, I’m using ffmpeg to convert from mp3 to mulaw

#

It’s near real time

spare fern
#

Here are the ffmpeg settings:
const ffmpeg = spawn('ffmpeg', [ '-i', 'pipe:0', '-f', 'mulaw', '-ar', '8000', '-ac', '1', 'pipe:1' ]);

tardy jay
#

Hi @spare fern , just got a quick question. Did you send a mark message and clear message after the media message was sent ? 🤔🤔

spare fern
#

No, I didn’t. Works fine for my use case
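For reference, Twilio's Media Streams protocol also accepts `mark` and `clear` messages over the same WebSocket. A `mark` asks Twilio to notify you once the audio sent before it has finished playing, and a `clear` drops audio that is buffered but not yet played (handy for barge-in). A minimal sketch of the two payloads, with `mark_message` and `clear_message` as made-up helper names:

```python
import json


def mark_message(stream_sid: str, name: str) -> str:
    """JSON text of a 'mark' message; Twilio echoes the name back after playback."""
    return json.dumps({
        "event": "mark",
        "streamSid": stream_sid,
        "mark": {"name": name},
    })


def clear_message(stream_sid: str) -> str:
    """JSON text of a 'clear' message; empties Twilio's playback buffer."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```

As noted above, neither is required just to get audio playing.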

pastel coral
#

Hey, I managed to do the same. Can anybody share how they connect and send these media chunks to Twilio? I am using connect.stream(url) to connect to Twilio and then trying to send raw media through the socket, but I am not hearing anything on my phone. Can anybody help me? I can provide the Python code

pastel coral
#

I used the same method for conversion, but I am using socketio for emitting the data, maybe that is the problem? Also, what exactly is the streamSid, and where can I find it?
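On the streamSid question: Twilio sends it to your WebSocket server in the `start` message at the beginning of each stream, and every `media` message you send back has to carry it. Twilio's media stream is a plain WebSocket connection, so a Socket.IO server is unlikely to work as-is. A small sketch of pulling the SID out of incoming messages, where `stream_sid_from` is a made-up helper name:

```python
import json
from typing import Optional


def stream_sid_from(raw_message: str) -> Optional[str]:
    """Return the streamSid if raw_message is Twilio's 'start' event, else None."""
    message = json.loads(raw_message)
    if message.get("event") == "start":
        return message["start"]["streamSid"]
    return None
```

You would call this on each incoming WebSocket text frame until it returns a value, then reuse that value for all outgoing messages on the same call.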

pastel coral
#

Sorry for bothering you, could you provide me the code where you connect to Twilio so I can see the example, please?
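For context, the connection side is normally just a TwiML response that tells Twilio to open a bidirectional stream to your WebSocket server. The sketch below builds that XML with the standard library only, so nothing here depends on the Twilio helper library; `connect_stream_twiml` is a made-up name and the URL in the usage note is a placeholder.

```python
import xml.etree.ElementTree as ET


def connect_stream_twiml(websocket_url: str) -> str:
    """Return TwiML that connects the call's audio to a WebSocket server."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream url="wss://..."> is what connect.stream(url=...) produces in the helper library
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")
```

Your voice webhook would return something like `connect_stream_twiml("wss://example.com/media")` with an XML content type, and Twilio then opens the WebSocket and starts sending `connected`, `start`, and `media` events.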

tardy jay
pastel coral
#

Will look into it. Thank you ❤️💪🏻

pastel coral
#

Managed to connect everything, thank you, but I have conversion issues. I am using a very similar approach to @spare fern, in Python, but all I hear is noise. I am not sure what I am doing wrong, so I will paste the code here in case someone can help me fix it.
`import base64
import subprocess


def convert_raw_audio_chunk_to_mulaw(raw_audio_chunk):
    try:
        # Define the FFmpeg command
        ffmpeg_command = [
            'ffmpeg',
            '-f', 's16le',         # Input format (16-bit little-endian)
            '-ar', '44100',        # Input sample rate
            '-ac', '1',            # Input channels (mono)
            '-i', 'pipe:0',        # Input from stdin
            '-f', 'mulaw',         # Output format (mulaw)
            '-ar', '8000',         # Output sample rate
            '-ac', '1',            # Output channels (mono)
            '-loglevel', 'error',  # Suppress FFmpeg logs
            'pipe:1'               # Output to stdout
        ]

        # Start the FFmpeg process
        ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                          stdin=subprocess.PIPE,
                                          stdout=subprocess.PIPE,
                                          stderr=subprocess.PIPE)

        # Write the chunk to FFmpeg's stdin
        ffmpeg_process.stdin.write(raw_audio_chunk)

        # Close stdin to signal the end of input
        ffmpeg_process.stdin.close()

        # Wait for FFmpeg to finish
        ffmpeg_process.wait()

        # Check for errors
        if ffmpeg_process.returncode != 0:
            raise Exception(
                f'FFmpeg error: {ffmpeg_process.stderr.read().decode("utf-8")}')

        # Get the mulaw-encoded audio from FFmpeg's stdout
        mulaw_audio = ffmpeg_process.stdout.read()

        # Encode the mulaw audio as base64
        base64_audio = base64.b64encode(mulaw_audio).decode()

        return base64_audio

    except Exception as e:
        print(f'Error: {str(e)}')
        return None`

pastel coral
#

This is my other attempt, based on the Stack Overflow discussion you sent me:
`import audioop
import base64


class AudioConverter:

    def __init__(self):
        self.input_sample_rate = 44100
        self.output_sample_rate = 8000
        self.audio_buffer = b''

    async def convert_audio_chunk_to_xmulaw(self, audio_chunk):
        try:
            if len(audio_chunk) == 2048:
                # Perform sample rate conversion from 44100Hz to 8000Hz
                converted_audio = audioop.ratecv(audio_chunk, 2, 1,
                                                 self.input_sample_rate,
                                                 self.output_sample_rate, None)

                # Convert the chunk to mulaw using lin2ulaw
                mulaw_audio = audioop.lin2ulaw(converted_audio[0], 2)

                # Encode the mulaw audio as base64
                base64_audio = base64.b64encode(mulaw_audio).decode("utf-8")

                return base64_audio
            else:
                pass

        except Exception as e:
            print(f'Error: {str(e)}')
            return None
`

tardy jay
#

In my experience, I never set the input sample rate or anything else about the input; just make sure the output sample rate is 8000

#

I stuck to @spare fern's ffmpeg setup and didn't change anything at all. Just like this:

const ffmpeg = spawn('ffmpeg', [
'-i', 'pipe:0',
'-f', 'mulaw',
'-ar', '8000',
'-ac', '1',
'pipe:1'
]);

pastel coral
#

Managed to fix the problem, thank you so much!

craggy delta
#

@pastel coral also looking to fix this in Python, what ended up working for you?

pastel coral
#

I will share it with you tomorrow. I am having some issues. The first is latency: I cannot get it below 3s, and I think it is a Python library issue. The second is the way the AI generates chunks, because the punctuation makes the audio sound like shit.

#

If someone has done that, i would love to read about that.

craggy delta
#

currently im just splitting it by sentences and that works fine for me (just send post request to the api endpoint not the python library)
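A rough sketch of that sentence-splitting idea for a streamed LLM response: buffer the incoming text chunks and only hand a piece to TTS once it ends in sentence punctuation. The function name `sentences` and the simple regex are illustrative assumptions; abbreviations like "Dr." would be split too early by this version.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def sentences(text_chunks: Iterable[str]) -> Iterator[str]:
    """Regroup arbitrary text chunks into whole sentences."""
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        parts = _SENTENCE_BREAK.split(buffer)
        # Every part except the last is complete; the last may still be growing.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can then go out as its own POST request, so the text-to-speech side always sees complete punctuation.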

#

im trying to use the library for input streaming but running into the mulaw conversion issue

#

i'll play around with it today

pastel coral
#

I am using input streaming in elevenlabs python library, is there input streaming via post request?

#

I will share my conversion code with you, I am just not home right now

pastel coral
#

This is the way I am using the converter:
`import base64
import subprocess


class AudioConverter:

    async def convert_audio_chunk_to_xmulaw(self, audio_chunk):
        try:
            # Define the FFmpeg command
            ffmpeg_command = [
                'ffmpeg', '-i', 'pipe:0', '-f', 'mulaw', '-ar', '8000', '-ac', '1',
                'pipe:1'
            ]

            # Start the FFmpeg process
            ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                              stdin=subprocess.PIPE,
                                              stdout=subprocess.PIPE,
                                              stderr=subprocess.PIPE)

            # Feed the audio chunk to FFmpeg's stdin
            mulaw_audio, stderr = ffmpeg_process.communicate(input=audio_chunk)

            # Check for errors
            if ffmpeg_process.returncode != 0:
                raise Exception(f'FFmpeg error: {stderr.decode("utf-8")}')

            # Encode the mulaw audio as base64
            base64_audio = base64.b64encode(mulaw_audio).decode("utf-8")

            return base64_audio

        except Exception as e:
            print(f'Error: {str(e)}')
            return None


# Usage
converter = AudioConverter()`

#

@craggy delta So you managed to get response below 4 seconds ? Because i cannot get that for some reason

craggy delta
#

yes i did

#

ill dm you

craggy delta
#

Ah that implementation worked for a while but eventually I ran into this issue:

#

I also noticed that for some reason after a little bit it just would briefly pause and then keep going that sounded pretty unnatural

#

Here's just the way I tried to reproduce:

def call_gpt(message_list: list, model: str):
    for chunk in openai.ChatCompletion.create(
        model=model,
        messages=message_list,
        max_tokens=100,
        stream=True
    ):
        # Extract the content from the chunk if available
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk

text_stream = call_gpt(message_list, request.app['model'])

for chunk in elevenlabs.generate(
    text=text_stream,
    voice="Thomas",
    stream=True,
    latency=4,
):
    convert_audio = await convert_audio_chunk_to_xmulaw(chunk)
    await send_async(twilio_ws, stream_sid, convert_audio)
#

send_async just sends it to twilio
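One difference worth noting: the Python converters in this thread start a new ffmpeg process for every audio chunk, while the Node.js version pipes the whole stream through a single process. Arbitrary slices of an MP3 stream are not independently decodable, so that may be a contributor to the noise and pauses, though that is an assumption and not a confirmed diagnosis. A sketch of the single-process approach in Python, where `pipe_through` is a made-up helper and the ffmpeg flags are the ones already quoted in the thread:

```python
import subprocess
import threading
from typing import Iterable, Iterator, List

# Same flags as the Node.js spawn() call, plus -loglevel error to keep stderr quiet.
FFMPEG_CMD = ["ffmpeg", "-loglevel", "error", "-i", "pipe:0",
              "-f", "mulaw", "-ar", "8000", "-ac", "1", "pipe:1"]


def pipe_through(chunks: Iterable[bytes], cmd: List[str] = FFMPEG_CMD,
                 read_size: int = 4096) -> Iterator[bytes]:
    """Feed chunks to ONE long-lived process and yield its stdout as it arrives."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed() -> None:
        # Runs in a background thread so reading and writing cannot deadlock.
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
                proc.stdin.flush()
        except BrokenPipeError:
            pass  # the process exited early
        finally:
            try:
                proc.stdin.close()  # signals end of input
            except BrokenPipeError:
                pass

    threading.Thread(target=feed, daemon=True).start()
    try:
        while True:
            # read1 returns as soon as some bytes are available, up to read_size
            data = proc.stdout.read1(read_size)
            if not data:
                break
            yield data
    finally:
        # All output has been read, or the caller stopped early: clean up the child.
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        proc.stdout.close()
```

In the loop above, that would mean passing the ElevenLabs audio generator in as `chunks` once, then framing and sending whatever `pipe_through` yields, instead of calling a converter on each chunk.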

pastel coral
#

Yeah, for some reason i am getting the same error, but it is not consistent, and i am not sure why. But that is what i have for now.

quasi merlin
#

Has anyone tried any Twilio alternatives or ElevenLabs alternatives?

pastel coral
#

No, we managed to do that, but it is not working perfectly atm. I am trying to improve it, but it is very buggy

tardy jay
#

Ummm, not sure if this is helpful, but I found that 11labs streaming seems to need some init time...
Not sure if you guys have implemented this already...

The left-hand side is with init, the right-hand side is without...

The major difference is in the first batch, and you can see a big difference in the total run time.

#

I created a dummy function that sends an empty string to 11labs when my server starts up.
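A sketch of what such a warm-up could look like. Here `synthesize` is a stand-in for whatever function you already use to stream audio from 11labs, and `warm_up` is a made-up name; nothing below is part of the ElevenLabs library.

```python
import time
from typing import Callable, Iterable


def warm_up(synthesize: Callable[[str], Iterable[bytes]]) -> float:
    """Run one throwaway request at startup and return how long it took (seconds)."""
    started = time.perf_counter()
    # Drain and discard the audio; only the side effect of the request matters.
    for _ in synthesize(""):
        pass
    return time.perf_counter() - started
```

Calling it once before the server starts accepting calls means the init cost is paid up front and not on the first real request.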

pastel coral
#

Interesting

raw fjord
#

how do I send openai completion stream to elevenlabs and stream the audio response to twilio call? im using Node.js
