true otter May 11, 2023, 9:44 PM

#

I haven't found a solution to this. I now temporarily store the file in R2... Did you end up finding a better solution?

spare fern Aug 14, 2023, 5:09 AM

#

Hey guys, any luck with this? Trying to figure this out

timber hearth Aug 15, 2023, 4:36 PM

#

Same. Any luck?

craggy delta Aug 20, 2023, 6:39 AM

#

bump on this

pastel coral Aug 22, 2023, 3:12 PM

#

Still nothing. Any luck, guys?

craggy delta Aug 22, 2023, 9:26 PM

#

It looks like play.ht has native support

#

Maybe we have to move there?

timber hearth Aug 23, 2023, 12:36 AM

#

I just reached out to them about it

pastel coral Aug 23, 2023, 5:49 AM

#

timber hearth I just reached out to them about it

To ElevenLabs support?

timber hearth Aug 23, 2023, 2:30 PM

#

Play.ht

grand prism Aug 23, 2023, 7:33 PM

#

Hey hey! Sounds similar to some of the stuff we do at Bland.ai. We're an API for AI phone calling; like Twilio, we make it really easy for developers to add inbound and outbound AI phone calling to their applications.

Check us out! 🙂

craggy delta Aug 23, 2023, 11:04 PM

#

Why would I use you over Twilio?

grand prism Aug 24, 2023, 1:24 AM

#

For context: we do AI phone calling - not regular phone calling. Meaning, our AI agent can make a call on your behalf, armed with an objective (a given task) and a set of parameters (info it needs to complete the task).

Coming back to your question:

You could either 1) Build your own phone calling infra on top of Twilio or 2) Use Twilio's programmable voice.

Option 1: Building your own AI phone calling infra on top of Twilio is hard. The largest challenges are 1) enabling LLMs to understand the nuances of human speech and 2) doing so with low latency.

Option 2: If you use Twilio's voice API, the voice sounds super robotic and the talk-track is pre-scripted. You don't have the same flexibility, or the ability to engage someone in natural conversation.

pastel coral Aug 26, 2023, 3:28 PM

#

So nobody figured this out with ElevenLabs?

grand prism Aug 26, 2023, 5:21 PM

#

We did 🙂

#

www.bland.ai

pastel coral Aug 26, 2023, 9:13 PM

#

I believe you. If you want to share how you did it, I would appreciate that; if not, this is a very stupid place to try to sell your service.

grand prism Aug 28, 2023, 7:51 AM

#

@pastel coral we're going to do a HN launch shortly; happy to send you the early version of our post though 😄

#

Will update this thread once it's live

craggy delta Aug 29, 2023, 8:49 AM

#

Lol I’m not paying bland.ai money for a tiny feature that other tts providers already have for free

#

How about you share with the developers if you actually know something instead of finding customers who aren’t a good fit for your product through this

spare fern Aug 29, 2023, 9:19 AM

#

Ok, I managed to do it with ffmpeg converting the stream on the fly. Works great:
`voice.textToSpeechStream(elevenLabsApiKey, voiceID, answerText).then((res) => {
  // Pipe the ElevenLabs audio stream into ffmpeg, which writes mulaw to stdout.
  res.pipe(ffmpeg.stdin);

  const CHUNK_SIZE = 320; // bytes per media message
  let buffer = Buffer.alloc(0);

  ffmpeg.stdout.on('data', (chunk) => {
    buffer = Buffer.concat([buffer, chunk]);

    // Send every complete frame; keep the remainder for the next 'data' event.
    while (buffer.length >= CHUNK_SIZE) {
      const chunkToSend = buffer.slice(0, CHUNK_SIZE);
      buffer = buffer.slice(CHUNK_SIZE);

      const base64EncodedChunk = chunkToSend.toString('base64');

      const mediaMessage = {
        event: 'media',
        streamSid: streamSid,
        media: {
          payload: base64EncodedChunk,
        },
      };

      ws.send(JSON.stringify(mediaMessage));
    }
  });
})`
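The same framing step can be sketched in Python for readers who are not on Node. This is a minimal sketch, not the poster's code: `frame_media_messages` is a hypothetical helper name, and the 320-byte frame size is simply carried over from the snippet above (at 8 kHz, 8-bit mono mulaw that is 40 ms of audio). It returns the JSON messages plus the leftover bytes that did not fill a whole frame.

```python
import base64
import json

CHUNK_SIZE = 320  # bytes per frame, carried over from the Node snippet


def frame_media_messages(buffer: bytes, stream_sid: str, chunk_size: int = CHUNK_SIZE):
    """Split mulaw bytes into fixed-size frames wrapped as Twilio `media` messages.

    Returns (messages, remainder); prepend the remainder to the next batch of bytes.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= chunk_size:
        frame = buffer[offset:offset + chunk_size]
        offset += chunk_size
        messages.append(json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(frame).decode("ascii")},
        }))
    return messages, buffer[offset:]
```

Each returned string would then be sent over the Twilio websocket as-is.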

pastel coral Aug 29, 2023, 7:38 PM

#

@spare fern Let me get this straight: is this the line that sends to Twilio? ws.send(JSON.stringify(mediaMessage));

spare fern Aug 29, 2023, 7:40 PM

#

pastel coral @spare fern Let me just get it straight, you are sending this line t...

Sending the media message through the web socket. Not good? (Amateur developer here)

pastel coral Aug 29, 2023, 7:45 PM

#

I see, but sending media through the socket was never the problem. The problem was converting it to the right format before sending it to Twilio.

spare fern Aug 29, 2023, 7:50 PM

#

Right, I’m using ffmpeg to convert from mp3 to mulaw

#

It’s near real time

spare fern Aug 30, 2023, 5:56 AM

#

Here are the ffmpeg settings (spawn comes from Node's child_process module):
const { spawn } = require('child_process');
const ffmpeg = spawn('ffmpeg', [ '-i', 'pipe:0', '-f', 'mulaw', '-ar', '8000', '-ac', '1', 'pipe:1' ]);
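For Python users, roughly the same process can be started with `subprocess`. A sketch, assuming `ffmpeg` is on the `PATH`; `mulaw_command` and `start_converter` are hypothetical names, and `-loglevel error` is an addition to keep stderr quiet. No input format is given, so ffmpeg probes the incoming stream itself.

```python
import subprocess


def mulaw_command(sample_rate: int = 8000) -> list:
    """ffmpeg arguments: probe the input on stdin, write raw mono mulaw to stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",     # addition: only report errors on stderr
        "-i", "pipe:0",           # input from stdin, format auto-detected
        "-f", "mulaw",            # raw mulaw output, no container
        "-ar", str(sample_rate),  # output sample rate
        "-ac", "1",               # mono
        "pipe:1",                 # output to stdout
    ]


def start_converter() -> subprocess.Popen:
    """Start one long-lived ffmpeg process (requires ffmpeg to be installed)."""
    return subprocess.Popen(
        mulaw_command(),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```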

tardy jay Aug 31, 2023, 3:59 PM

#

spare fern Here are the ffmpeg settings: `const ffmpeg = spawn('ffmpeg', [ '-i', 'p...

Thanks mate !!!!!! LIFE SAVERRRRRRRR 🥹 ! Have been looking for solution for the whole day !!!!!!

tardy jay Sep 3, 2023, 4:11 AM

#

Hi @spare fern, just a quick question. Did you send a mark message and a clear message after the media message was sent? 🤔🤔

spare fern Sep 3, 2023, 4:33 AM

#

No, I didn’t. Works fine for my use case
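For reference, the mark and clear messages asked about here are small JSON objects sent on the same websocket as the media messages. A sketch based on Twilio's Media Streams message format; the helper names and the label `"answer-1"` are made up for illustration. A mark sent after the audio is echoed back by Twilio once playback reaches it, and clear drops audio that is still buffered, which is mainly useful for interruptions.

```python
import json


def mark_message(stream_sid: str, name: str) -> str:
    """Ask Twilio to echo `name` back once the audio sent so far has played."""
    return json.dumps({
        "event": "mark",
        "streamSid": stream_sid,
        "mark": {"name": name},
    })


def clear_message(stream_sid: str) -> str:
    """Ask Twilio to discard any audio that is buffered but not yet played."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```

For example, `mark_message(stream_sid, "answer-1")` could be sent right after the last media message of a reply.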

tardy jay Sep 3, 2023, 6:38 AM

#

spare fern No, I didn’t. Works fine for my use case

Got it ! Thanks for your reply 🤩🫡

pastel coral Sep 4, 2023, 9:36 AM

#

Hey, I managed to do the same. Can anybody share how they connect and send these media chunks to Twilio? I am using connect.stream(url) to connect to Twilio and then trying to send raw media through the socket, but I am not hearing anything on my phone. Can anybody help me? I can provide the Python code.
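For context, the call is attached to a websocket by TwiML along these lines. A sketch built with the standard library only; `stream_twiml` and the `wss://example.com/media` URL are placeholders. `<Connect><Stream>` gives a bidirectional stream, which is what allows audio to be sent back to the caller.

```python
import xml.etree.ElementTree as ET


def stream_twiml(websocket_url: str) -> str:
    """Build TwiML that connects the call to a bidirectional media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")


print(stream_twiml("wss://example.com/media"))  # placeholder URL
```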

tardy jay Sep 5, 2023, 4:07 AM

#

pastel coral Hey, i managed to do the same, can anybody share the way they connect and send t...

I think you should check that the media is in mulaw format. I tested with @spare fern's solution and it works well.

pastel coral Sep 5, 2023, 6:14 AM

#

I used the same method for conversion, but I am using Socket.IO for emitting the data; maybe that is the problem? Also, what exactly is the streamSid, and where can I find it?

tardy jay Sep 5, 2023, 7:13 AM

#

pastel coral I used the same method for conversion, but i am using socketio for emiting the d...

It's from the start message
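Concretely, Twilio sends a JSON message with `"event": "start"` once the stream is set up, and the stream SID is inside it. A sketch of pulling it out; `extract_stream_sid` is a hypothetical helper and the SID in the usage example is fake. Twilio speaks plain WebSocket here, so a Socket.IO server, which layers its own protocol on top, may not be able to accept the connection.

```python
import json
from typing import Optional


def extract_stream_sid(raw_message: str) -> Optional[str]:
    """Return the streamSid from a Twilio `start` message, or None for other events."""
    message = json.loads(raw_message)
    if message.get("event") != "start":
        return None
    return message["start"]["streamSid"]


# Usage with a trimmed-down, made-up start message:
sample = json.dumps({"event": "start", "start": {"streamSid": "MZ-fake-sid"}})
stream_sid = extract_stream_sid(sample)
```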

pastel coral Sep 5, 2023, 7:49 AM

#

Sorry for bothering you, could you share the code where you connect to Twilio so I can see an example, please?

tardy jay Sep 5, 2023, 7:51 AM

#

pastel coral Sorry for bothering you, could you provide me the code where you connect twilio s...

Hi, if you are using Python, I think this is relevant:
https://stackoverflow.com/questions/75475925/stream-audio-back-to-twilio-via-websocket-connection

Stack Overflow

Stream audio back to Twilio via websocket connection

I'm trying out Twillio's Programmable Voice feature and have implemented basic audio stream processing by referring to this doc. I'm planning to stream audio back to Twillio using the same websocke...

pastel coral Sep 5, 2023, 7:57 AM

#

Will look into it. Thank you ❤️💪🏻

pastel coral Sep 6, 2023, 9:18 AM

#

Managed to connect everything, thank you, but I have conversion issues. I am using a very similar approach to @spare fern's, in Python, but all I hear is noise. I am not sure what I am doing wrong, so I'll paste the code here in case someone can help me fix it.
`import base64
import subprocess


def convert_raw_audio_chunk_to_mulaw(raw_audio_chunk):
    try:
        # Define the FFmpeg command
        ffmpeg_command = [
            'ffmpeg',
            '-f', 's16le',         # Input format (16-bit little-endian)
            '-ar', '44100',        # Input sample rate
            '-ac', '1',            # Input channels (mono)
            '-i', 'pipe:0',        # Input from stdin
            '-f', 'mulaw',         # Output format (mulaw)
            '-ar', '8000',         # Output sample rate
            '-ac', '1',            # Output channels (mono)
            '-loglevel', 'error',  # Suppress FFmpeg logs
            'pipe:1'               # Output to stdout
        ]

        # Start the FFmpeg process
        ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                          stdin=subprocess.PIPE,
                                          stdout=subprocess.PIPE,
                                          stderr=subprocess.PIPE)

        # Write the chunk to FFmpeg's stdin
        ffmpeg_process.stdin.write(raw_audio_chunk)

        # Close stdin to signal the end of input
        ffmpeg_process.stdin.close()

        # Wait for FFmpeg to finish
        ffmpeg_process.wait()

        # Check for errors
        if ffmpeg_process.returncode != 0:
            raise Exception(
                f'FFmpeg error: {ffmpeg_process.stderr.read().decode("utf-8")}')

        # Get the mulaw-encoded audio from FFmpeg's stdout
        mulaw_audio = ffmpeg_process.stdout.read()

        # Encode the mulaw audio as base64
        base64_audio = base64.b64encode(mulaw_audio).decode()

        return base64_audio

    except Exception as e:
        print(f'Error: {str(e)}')
        return None`

pastel coral Sep 6, 2023, 10:18 AM

#

This is my other attempt, based on the Stack Overflow discussion you sent me:
`import audioop
import base64


class AudioConverter:

    def __init__(self):
        self.input_sample_rate = 44100
        self.output_sample_rate = 8000
        self.audio_buffer = b''

    async def convert_audio_chunk_to_xmulaw(self, audio_chunk):
        try:
            if len(audio_chunk) == 2048:
                # Perform sample rate conversion from 44100Hz to 8000Hz
                converted_audio = audioop.ratecv(audio_chunk, 2, 1,
                                                 self.input_sample_rate,
                                                 self.output_sample_rate, None)

                # Convert the chunk to mulaw using lin2ulaw
                mulaw_audio = audioop.lin2ulaw(converted_audio[0], 2)

                # Encode the mulaw audio as base64
                base64_audio = base64.b64encode(mulaw_audio).decode("utf-8")

                return base64_audio
            else:
                pass

        except Exception as e:
            print(f'Error: {str(e)}')
            return None
`
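`audioop` was removed from the standard library in Python 3.13, so the conversion step in this attempt will not import there. Below is a sketch of G.711 mulaw encoding in pure Python for 16-bit signed samples; `linear_to_ulaw` and `pcm16_to_ulaw` are names chosen here, and the output is not guaranteed to match `audioop.lin2ulaw` byte for byte. It only makes sense on decoded PCM: if the incoming chunks are still MP3, they need decoding first.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before locating the segment
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a single mulaw byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, from bit 14 down.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mulaw bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit mono PCM bytes as mulaw bytes."""
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[:count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```

Resampling to 8000 Hz is a separate step and is not covered by this sketch.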

tardy jay Sep 6, 2023, 10:28 AM

#

In my experience, I never set the input sample rate or anything else about the input; just make sure the output sample rate is 8000.

#

I stuck to @spare fern's ffmpeg setup and didn't change anything at all. Just like this:

const ffmpeg = spawn('ffmpeg', [
'-i', 'pipe:0',
'-f', 'mulaw',
'-ar', '8000',
'-ac', '1',
'pipe:1'
]);

pastel coral Sep 6, 2023, 10:49 AM

#

Managed to fix the problem, thank you so much!

craggy delta Sep 6, 2023, 6:54 PM

#

@pastel coral also looking to fix this in Python, what ended up working for you?

pastel coral Sep 6, 2023, 7:12 PM

#

I will share it with you tomorrow. I am having some issues. The first is latency: I cannot get it below 3 s, and I think that is a Python library issue. The second is the way the AI generates chunks, because the punctuation makes the audio sound like shit.

#

If someone has done that, i would love to read about that.

craggy delta Sep 6, 2023, 7:13 PM

#

Currently I'm just splitting the text by sentences, and that works fine for me (I send a POST request straight to the API endpoint rather than using the Python library).

#

I'm trying to use the library for input streaming but running into the mulaw conversion issue.

#

i'll play around with it today
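The sentence-splitting approach mentioned here could look roughly like this when the text arrives as a token stream. A sketch only: `sentences_from_stream` is a hypothetical helper, and treating `.`, `!` or `?` followed by whitespace as a sentence boundary is a simplification that will mis-split abbreviations such as "e.g.".

```python
import re
from typing import Iterable, Iterator

# Simplifying assumption: a sentence ends at ., ! or ? followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(text_chunks: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed text and yield each sentence as soon as it is complete."""
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can then be sent to the text-to-speech endpoint as its own request.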

pastel coral Sep 6, 2023, 8:05 PM

#

I am using input streaming in the ElevenLabs Python library. Is there input streaming via POST request?

#

I will share my conversion code with you; I am just not home at the moment.

pastel coral Sep 6, 2023, 8:55 PM

#

This is how I am using the converter:
`import base64
import subprocess


class AudioConverter:

    async def convert_audio_chunk_to_xmulaw(self, audio_chunk):
        try:
            # Define the FFmpeg command
            ffmpeg_command = [
                'ffmpeg', '-i', 'pipe:0', '-f', 'mulaw', '-ar', '8000', '-ac', '1',
                'pipe:1'
            ]

            # Start the FFmpeg process
            ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                              stdin=subprocess.PIPE,
                                              stdout=subprocess.PIPE,
                                              stderr=subprocess.PIPE)

            # Feed the audio chunk to FFmpeg's stdin
            mulaw_audio, stderr = ffmpeg_process.communicate(input=audio_chunk)

            # Check for errors
            if ffmpeg_process.returncode != 0:
                raise Exception(f'FFmpeg error: {stderr.decode("utf-8")}')

            # Encode the mulaw audio as base64
            base64_audio = base64.b64encode(mulaw_audio).decode("utf-8")

            return base64_audio

        except Exception as e:
            print(f'Error: {str(e)}')
            return None


# Usage
converter = AudioConverter()`

#

@craggy delta So you managed to get the response time below 4 seconds? I cannot get that for some reason.

craggy delta Sep 6, 2023, 9:36 PM

#

yes i did

#

ill dm you

craggy delta Sep 6, 2023, 10:52 PM

#

Ah that implementation worked for a while but eventually I ran into this issue:

#

I also noticed that, for some reason, after a little while it would briefly pause and then keep going, which sounded pretty unnatural.

#

Here's how I tried to reproduce it:

    def call_gpt(message_list: list, model: str):
        for chunk in openai.ChatCompletion.create(
            model=model,
            messages=message_list,
            max_tokens=100,
            stream=True,
        ):
            # Extract the content from the chunk if available
            if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
                yield text_chunk

    text_stream = call_gpt(message_list, request.app['model'])

    for chunk in elevenlabs.generate(
        text=text_stream,
        voice="Thomas",
        stream=True,
        latency=4,
    ):
        convert_audio = await convert_audio_chunk_to_xmulaw(chunk)
        await send_async(twilio_ws, stream_sid, convert_audio)

#

send_async just sends it to Twilio

pastel coral Sep 7, 2023, 10:51 AM

#

Yeah, for some reason I am getting the same error, but not consistently, and I am not sure why. That is what I have for now.
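One possible cause of the intermittent failures is that every chunk is handed to a fresh ffmpeg process, so a chunk that starts in the middle of an MP3 frame may not decode on its own. This is a guess, not something confirmed in the thread. A sketch of the alternative, feeding all chunks through a single process: `convert_stream` is a hypothetical helper, `ffmpeg` is assumed to be on the `PATH`, and the writer thread is there so that reading stdout never has to wait for all of stdin to be written first.

```python
import subprocess
import threading
from typing import Iterable, Iterator, Sequence

FFMPEG_CMD = ("ffmpeg", "-loglevel", "error", "-i", "pipe:0",
              "-f", "mulaw", "-ar", "8000", "-ac", "1", "pipe:1")


def convert_stream(chunks: Iterable[bytes],
                   cmd: Sequence[str] = FFMPEG_CMD,
                   read_size: int = 320) -> Iterator[bytes]:
    """Pipe every input chunk through ONE converter process and yield its output."""
    proc = subprocess.Popen(list(cmd), stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed() -> None:
        # Runs in a background thread: write all input, then signal end of input.
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
            proc.stdin.close()
        except BrokenPipeError:
            pass  # the converter exited before consuming all input

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    try:
        while True:
            data = proc.stdout.read(read_size)  # blocks until read_size bytes or EOF
            if not data:
                break
            yield data
    finally:
        proc.stdout.close()
        proc.wait()
        writer.join()
```

The generator blocks while it waits for output, so inside an async handler it would need to run in a worker thread or be replaced with an `asyncio` subprocess.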

barren dune Sep 13, 2023, 2:15 PM

#

pastel coral So nobody figured this out with ElevenLabs?

lmao

quasi merlin Sep 15, 2023, 10:48 PM

#

Has anyone tried any Twilio alternatives or ElevenLabs alternatives?

pastel coral Sep 16, 2023, 9:12 AM

#

No, we managed to do that, but it is not working perfectly at the moment. I am trying to improve it, but it is very buggy.

tardy jay Sep 17, 2023, 5:49 AM

#

Ummm, not sure if this is helpful, but it seems like 11labs streaming needs some init time...
Not sure if you guys have implemented this already...

The left-hand side is with init, the right-hand side is without...

The major difference is in the first batch, and you can see a big difference in the total run time.

#

I created a dummy function that sends an empty string to 11labs when my server starts up.
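The warm-up idea can be sketched as a function that is called once at server start. `warm_up` and the `synthesize` callable are placeholders for whatever wraps the ElevenLabs streaming request in your own code; nothing here calls the real API. The text to send is left as a parameter, defaulting to the empty string the poster used.

```python
import time
from typing import Callable, Iterable


def warm_up(synthesize: Callable[[str], Iterable[bytes]], text: str = "") -> float:
    """Run one throwaway streaming request and return how long it took, in seconds."""
    started = time.perf_counter()
    for _ in synthesize(text):
        pass  # discard the audio; the point is only to pay the start-up cost early
    return time.perf_counter() - started
```

Logging the returned duration next to the first real request would show whether the warm-up makes a difference in a given setup.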

pastel coral Sep 21, 2023, 8:44 PM

#

Interesting

raw fjord Mar 6, 2024, 8:04 AM

#

How do I send an OpenAI completion stream to ElevenLabs and stream the audio response to a Twilio call? I'm using Node.js.

raw fjord Mar 6, 2024, 8:16 AM

#

raw fjord how do I send openai completion stream to elevenlabs and stream the audio respon...

Can someone help me with this? I am ready to work with them.

#Stream to Twilio Voice?