pcm | ElevenLabs | Page 1

cedar scroll Sep 10, 2023, 5:38 PM

#

I am trying to use the new output_format parameter, but the output still seems to be MP3. Could you please provide a code sample?

modern jay Oct 5, 2023, 2:53 PM

#

Same for me; I provide the string "pcm_16000" but get mp3 data out. It's a big deal because I have to further convert to mulaw, and streaming mp3 conversion is just a pain to deal with.
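For the mulaw step mentioned above, here is a minimal sketch of G.711 μ-law companding in pure Python, assuming the input really is 16-bit signed little-endian mono PCM. The function names are made up for illustration. It does not resample, so 16 kHz input stays 16 kHz.

```python
import struct

# Constants from the classic G.711 mu-law reference encoder.
_BIAS = 0x84    # added before the segment search
_CLIP = 32635   # largest magnitude that still fits after adding the bias


def linear_to_ulaw_sample(sample: int) -> int:
    """Compress one signed 16-bit sample to one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), _CLIP) + _BIAS
    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16le_to_ulaw(pcm: bytes) -> bytes:
    """Convert a buffer of 16-bit little-endian PCM to mu-law bytes."""
    count = len(pcm) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_ulaw_sample(s) for s in samples)
```

On Python 3.12 and earlier the standard library's audioop.lin2ulaw does the same job. That module was removed in Python 3.13, which is why the sketch spells the algorithm out.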

modern jay Oct 6, 2023, 12:50 PM

#

@cedar scroll Heard in #🤖│api-chat that you have to be at the Independent Publisher tier to get PCM data. I haven't verified that yet, though.

cosmic tusk Oct 6, 2023, 1:28 PM

#

modern jay @cedar scroll Heard in #🤖│api-chat that you have to be at the...

That's not true. That's only for pcm_44100.

modern jay Oct 6, 2023, 1:31 PM

#

cosmic tusk That's not true. That's only for pcm_44100.

I see ... but what about pcm for < 44100?

cosmic tusk Oct 6, 2023, 1:32 PM

#

modern jay I see ... but what about pcm for < 44100?

That should work. I'm using PCM output. I would guess you're not passing the query parameter correctly or misreading the data you're getting.

modern jay Oct 6, 2023, 1:34 PM

#

cosmic tusk That should work. I'm using PCM output. I would guess you're not passing the que...

Perhaps, but it's just a query parameter, right? I'm using:
audio_format = "pcm_16000"
url = f"wss://api.elevenlabs.io/v1/text-to-speech/{voice}/stream-input"
query = f"model_id={model}&optimize_streaming_latency={latency}&output_format={audio_format}"
self.url = f"{url}?{query}"

#

So, wss://api.elevenlabs.io/v1/text-to-speech/{voice}/stream-input?model_id=elevenlabs_multilingual_v2&optimize_streaming_latency=3&output_format=pcm_16000
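One way to rule out a malformed query string is to let the standard library do the encoding. The sketch below rebuilds the same URL with urllib.parse. The endpoint and parameter names are the ones quoted in this thread; the helper name build_stream_url is just for illustration.

```python
from urllib.parse import quote, urlencode


def build_stream_url(voice_id: str, model_id: str, latency: int, output_format: str) -> str:
    """Build the stream-input WebSocket URL with properly encoded query parameters."""
    # Escape the voice ID in case it contains characters that are unsafe in a path.
    base = (
        "wss://api.elevenlabs.io/v1/text-to-speech/"
        f"{quote(voice_id, safe='')}/stream-input"
    )
    query = urlencode(
        {
            "model_id": model_id,
            "optimize_streaming_latency": latency,
            "output_format": output_format,
        }
    )
    return f"{base}?{query}"


if __name__ == "__main__":
    # "VOICE_ID" is a placeholder, not a real voice.
    print(build_stream_url("VOICE_ID", "elevenlabs_multilingual_v2", 3, "pcm_16000"))
```

For plain values like these, the result is the same as the hand-built f-string, so encoding is unlikely to be the cause here.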

cosmic tusk Oct 6, 2023, 1:37 PM

#

That looks fine to me. Have you tried different models? It definitely works for me through WebSocket with elevenlabs_monolingual_v1 and with optimize_streaming_latency set to 4.

modern jay Oct 6, 2023, 1:45 PM

#

I can certainly try that. Right now the first 4 bytes I get are '5a000000', which indeed does not look like an MP3 header, nor do I see a leading 255 (the first MP3 header byte) anywhere. But the audio player (using pydub) only plays it correctly if I pass it format "mp3"; "wav" and "pcm_s16le" both just give noise. It is supposed to be just linear PCM, right?
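A quick way to check what the bytes actually are, sketched with the standard library only. It assumes the audio bytes have already been pulled out of whatever framing the WebSocket messages use, and that PCM means 16-bit signed little-endian mono at 16 kHz for pcm_16000. That layout is an assumption, not something confirmed in this thread. Both helper names are hypothetical.

```python
import io
import wave


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: MP3 data starts with an ID3 tag or an 11-bit frame sync."""
    if data[:3] == b"ID3":
        return True
    # Frame sync: first byte 0xFF, top three bits of the second byte set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap headerless PCM in a WAV container so ordinary players can open it."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Raw PCM has no header at all, so asking a player to read it as "wav" gives noise or an error unless a header is added first. The MP3 check is only a heuristic: PCM samples can start with 0xFF by chance.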

#

It'll be a couple hours, but I'll try a few different things
