#pcm
Same for me; I provide the string "pcm_16000" but get mp3 data out. It's a big deal because I have to further convert to mulaw, and streaming mp3 conversion is just a pain to deal with.
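On the mulaw step: once the stream really is raw 16-bit PCM, the conversion can be done per sample without any codec library. Below is a minimal sketch of standard G.711 μ-law encoding in pure Python, assuming signed 16-bit little-endian mono input. The names `linear_to_ulaw` and `pcm16_to_ulaw` are made up for illustration, and it does no resampling.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number (0..7) is derived from the highest set bit.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Convert a chunk of s16le PCM to mu-law, ignoring a trailing odd byte."""
    usable = len(pcm) - (len(pcm) % 2)
    samples = struct.unpack(f"<{usable // 2}h", pcm[:usable])
    return bytes(linear_to_ulaw(s) for s in samples)
```

When streaming, a chunk can end in the middle of a sample. This sketch simply drops the odd trailing byte; a real pipeline would carry it over and prepend it to the next chunk.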
@cedar scroll I heard in #🤖│api-chat that you have to be on the Independent Publisher tier to get PCM data. I haven't verified that yet.
That's not true. That's only for pcm_44100.
I see... but what about PCM at sample rates below 44100?
That should work. I'm using PCM output. I would guess you're not passing the query parameter correctly or misreading the data you're getting.
Perhaps, but it's just a query parameter, right? I'm using:
audio_format = "pcm_16000"
url = f"wss://api.elevenlabs.io/v1/text-to-speech/{voice}/stream-input"
query = f"model_id={model}&optimize_streaming_latency={latency}&output_format={audio_format}"
self.url = f"{url}?{query}"
So, wss://api.elevenlabs.io/v1/text-to-speech/{voice}/stream-input?model_id=elevenlabs_multilingual_v2&optimize_streaming_latency=3&output_format=pcm_16000
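One way to rule out a malformed query string is to let the standard library build it. This is a minimal sketch: `build_stream_url` is a hypothetical helper, and the parameter names are just the ones quoted in this thread, not checked against the API documentation.

```python
from urllib.parse import quote, urlencode

BASE = "wss://api.elevenlabs.io/v1/text-to-speech"


def build_stream_url(voice: str, model: str, latency: int, audio_format: str) -> str:
    """Assemble the stream-input URL with a properly encoded query string."""
    query = urlencode({
        "model_id": model,
        "optimize_streaming_latency": latency,
        "output_format": audio_format,
    })
    # quote() guards against stray characters in the voice ID path segment.
    return f"{BASE}/{quote(voice, safe='')}/stream-input?{query}"
```

Printing the result and comparing it with the URL actually passed to the WebSocket client is a quick check that `output_format` is not being dropped or overwritten somewhere along the way.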
That looks fine to me. Have you tried different models? It definitely works for me through WebSocket with elevenlabs_monolingual_v1 and with optimize_streaming_latency set to 4.
I can certainly try that. Right now the first 4 bytes I get are '5a000000', which indeed does not look like an MP3 header, and I don't see a leading 255 (0xFF, the first byte of an MP3 frame header) anywhere. But the audio player (using pydub) only plays it correctly if I pass it in with format "mp3"; "wav" and "pcm_s16le" both just give noise. It is supposed to be just linear PCM, right?
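Raw PCM has no header, so a player cannot guess the sample rate, sample width, or channel count, and a "wav" decoder expects a RIFF header that is not there. The sketch below does two things: a rough check for whether a buffer starts like MP3, and wrapping raw samples in a WAV container using only the standard library. It assumes the stream is signed 16-bit little-endian mono at 16 kHz, which is what "pcm_16000" suggests but is not confirmed in this thread. The helper names are made up for illustration.

```python
import io
import wave


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic only: an ID3 tag, or an 11-bit MPEG frame sync (0xFF, 0xEx/0xFx)."""
    if data[:3] == b"ID3":
        return True
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap headerless PCM samples in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

With the bytes quoted above, `looks_like_mp3(bytes.fromhex("5a000000"))` is false, which matches the observation that there is no MP3 header. The check is only a hint, since raw PCM can also happen to start with 0xFF. pydub can also take raw samples directly, e.g. `AudioSegment(data=pcm, sample_width=2, frame_rate=16000, channels=1)`, instead of going through a format string.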
It'll be a couple of hours, but I'll try a few different things.