#I have a Pro subscription. When using the API via a native Python socket implementation, why can't


spring krakenBOT
#
AI-generated Answer for kkberger

PCM data does not include a WAV header, which is why it may not be recognized as a proper WAV file by most audio players. The PCM format provides raw audio data, while WAV files are essentially PCM data with an added header that includes metadata like sample rate and bit depth. This makes WAV files easier for most players to handle.

If you're using the API and want to generate a WAV file, you can output PCM data and then add a WAV header to it. Here's an example of how you can convert PCM data to a WAV file using Python:

import wave

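# `pcm_data` must hold the raw PCM bytes returned by the API
# (for example, the HTTP response body when output_format is a PCM variant).
pcm_data = b'...'  # Placeholder only; replace with the bytes you received
output_file = "output.wav"

# Define WAV file parameters; they must match the PCM data you requested
sample_rate = 44100  # 44.1 kHz, matching output_format=pcm_44100
num_channels = 1  # Mono
sample_width = 2  # 16-bit audio (2 bytes per sample)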

# Write PCM data to a WAV file
with wave.open(output_file, 'wb') as wav_file:
    wav_file.setnchannels(num_channels)
    wav_file.setsampwidth(sample_width)
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(pcm_data)

print(f"WAV file saved as {output_file}")

If your requests keep returning MP3, check that output_format is set to "pcm_44100" and that it is sent as a query parameter on the request URL; MP3 is the default when the parameter is missing or not picked up. If you are still encountering issues, double-check your implementation. To play the raw PCM data directly, you can use numpy and sounddevice, as in the following example:

import numpy as np
import sounddevice as sd

# Convert PCM data to a NumPy array
audio_array = np.frombuffer(pcm_data, dtype=np.int16)

# Play the audio
sd.play(audio_array, samplerate=44100)
sd.wait()

Let me know if you need further assistance!

heavy thicket
# spring kraken

Hi, I'm trying to replicate this code to save the PCM output as a WAV file.

I think I'm missing something about pcm_data:
pcm_data = b'...' # Replace with your PCM data

Can you explain what I'm supposed to write, please?
Thanks!

spring krakenBOT
#
AI-generated Answer for rjanot

Short answer: pcm_data should be the raw bytes you receive from ElevenLabs when you request a PCM output format from the Text-to-Speech API (or when you base64‑decode the “audio” chunks from the WebSocket TTS stream if you chose a PCM format there). You don’t invent or type those bytes yourself—the API returns them when you set output_format to a PCM variant. (elevenlabs.io)

What ElevenLabs sends for PCM

  • Encoding: Linear PCM, 16‑bit, little‑endian (S16LE). So your WAV header must use a 2‑byte sample width. (elevenlabs.io)
  • Sample rates available via API include 8 kHz, 16 kHz, 22.05 kHz, 24 kHz, 44.1 kHz, and 48 kHz. Choose one by setting output_format=pcm_<rate> (e.g., pcm_16000) and use the same rate in the WAV header. (elevenlabs.io, help.elevenlabs.io)
  • PCM is an API output option (normal, streaming, and WebSocket endpoints). If you’re streaming, base64‑decode each “audio” field to get the raw PCM bytes. (elevenlabs.io)

How to turn that PCM into a WAV file (Python outline)

  • Write a standard WAV header that matches the PCM:
    • n_channels: set to the number of channels in your stream (commonly 1)
    • sampwidth: 2 bytes (because S16LE)
    • framerate: the exact rate you requested (e.g., 16000, 22050, 24000, 44100, 48000)
    • then write the raw pcm_data bytes
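
The header fields above can be filled in with Python's standard wave module. The following is a minimal sketch, not code from the ElevenLabs docs: the helper names rate_from_output_format and pcm_to_wav_bytes are hypothetical, and mono output is an assumption.

```python
import io
import wave


def rate_from_output_format(output_format: str) -> int:
    """Extract the sample rate from a value such as 'pcm_16000'."""
    prefix, _, rate = output_format.partition("_")
    if prefix != "pcm" or not rate.isdigit():
        raise ValueError(f"not a PCM output_format: {output_format!r}")
    return int(rate)


def pcm_to_wav_bytes(pcm_data: bytes, output_format: str, channels: int = 1) -> bytes:
    """Wrap raw S16LE PCM bytes in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)  # assumption: mono unless told otherwise
        wav_file.setsampwidth(2)         # 2 bytes per sample (S16LE)
        wav_file.setframerate(rate_from_output_format(output_format))
        wav_file.writeframes(pcm_data)
    return buffer.getvalue()
```

Deriving the frame rate from the same string that was sent as output_format keeps the header and the request from drifting apart.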

Where to get pcm_data

  • REST: call Create speech with output_format set to a PCM variant (e.g., pcm_22050). The response body is the raw PCM bytes; assign that to pcm_data and wrap with a WAV header as above. (elevenlabs.io)
  • WebSocket streaming: read messages, base64‑decode the “audio” field to bytes, and append them; those decoded bytes are your pcm_data. (elevenlabs.io)
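
For the REST case, a rough sketch using only the standard library could look like this. The base URL, the xi-api-key header name, and the minimal {"text": ...} body are assumptions to check against the Create speech reference; build_tts_request and fetch_pcm are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumption: public API base URL


def build_tts_request(voice_id: str, text: str, api_key: str,
                      output_format: str = "pcm_22050") -> urllib.request.Request:
    """Build (but do not send) a Create speech request asking for raw PCM."""
    # output_format travels in the query string, not in the JSON body.
    url = f"{API_BASE}/v1/text-to-speech/{voice_id}?output_format={output_format}"
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {
        "xi-api-key": api_key,  # assumption: API key header name
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_pcm(request: urllib.request.Request) -> bytes:
    """Send the request; the response body is the raw PCM bytes."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With a real voice ID and API key, pcm_data = fetch_pcm(build_tts_request(voice_id, "Hello", api_key)) would give the bytes to wrap in a WAV header.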

Notes

  • WAV is not listed as a direct API output in the current formats list (you can download WAV from the UI, but via API you’d typically choose MP3, PCM, Opus, μ‑law, or A‑law). That’s why wrapping PCM into WAV on the client is the usual approach. (help.elevenlabs.io)

If you’d like, share which output_format you’re using (e.g., pcm_16000) and I can sketch the exact header parameters to use for your WAV writer. (elevenlabs.io)

Sources:
Create speech | ElevenLabs Documentation
Text to Speech | ElevenLabs Documentation
What audio formats do you support? – ElevenLabs
ElevenLabs — PCM output format | ElevenLabs
Generate audio in real-time | ElevenLabs Documentation

heavy thicket
#

pcm_44100

spring krakenBOT
#
AI-generated Answer for rjanot

“pcm_44100” is an output_format value that makes the API return raw 16‑bit linear PCM (S16LE) audio sampled at 44,100 Hz. (elevenlabs.io)

How to use: add output_format=pcm_44100 to Text-to-Speech or Speech‑to‑Speech (including streaming) requests; the response body will be the raw PCM bytes. (elevenlabs.io)

What to put in pcm_data: exactly those raw bytes from the API response. If you need a .wav file, you must wrap the bytes with a WAV header yourself or use a library—WAV is not listed among API output formats (API supports MP3, PCM, Opus, µ‑law, A‑law), though WAV downloads are available in the web UI. (elevenlabs.io, help.elevenlabs.io)

Plan limits: 44.1 kHz PCM requires Pro (or higher) plan. (elevenlabs.io)

Note: The docs specify that PCM is S16LE and give the sample rate, but they do not state the channel count for TTS output. For input on some endpoints, PCM is explicitly mono at 16 kHz; that constraint covers input only and says nothing about TTS output channels. (elevenlabs.io)
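
Because the channel count is not documented, one rough sanity check is to compute the duration implied by the byte count and compare it with how long the speech actually lasts. This is only a sketch; pcm_duration_seconds is a hypothetical helper, and mono is the assumed default.

```python
def pcm_duration_seconds(pcm_data: bytes, sample_rate: int = 44100,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Duration implied by the byte count for the given PCM layout."""
    frame_size = channels * sample_width  # bytes per frame (one sample per channel)
    return len(pcm_data) / (frame_size * sample_rate)
```

If the mono assumption yields a duration about twice as long as the audio you hear, the data is probably stereo, and the WAV header should say so.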

Sources:
Text To Speech - ElevenLabs
Create speech | ElevenLabs Documentation
What audio formats do you support? – ElevenLabs
Create transcript | ElevenLabs Documentation

heavy thicket
spring krakenBOT
#
AI-generated Answer for rjanot

pcm_44100 is an ElevenLabs output_format value that returns PCM audio at a 44.1 kHz sample rate. Note that PCM at 44.1 kHz requires a Pro plan or higher. (elevenlabs.io)

If your goal is simply to save a playable file without extra processing, request an encoded format like mp3_44100_128 instead and write the response to disk as .mp3. The Help Center lists the API-supported output formats (MP3, PCM at various sample rates, Opus, µ-law, A-law). (elevenlabs.io, help.elevenlabs.io)

If you specifically need a .wav file, generate pcm_44100 and then convert that PCM audio to a WAV container in your code (set the WAV header’s sample rate to 44100 Hz). The API documentation covers selecting pcm_44100; conversion to WAV is done client-side. (elevenlabs.io)

Sources:
Create speech | ElevenLabs Documentation
What audio formats do you support? – ElevenLabs

heavy thicket
#

Hi, I'm trying to replicate this code to save the PCM output as a WAV file.

I think I'm missing something about pcm_data:
pcm_data = b'...' # Replace with your PCM data
I use the pcm_44100 output format.

Can you explain what I'm supposed to write, please?
Thanks!

spring krakenBOT
#
AI-generated Answer for rjanot

Short answer: pcm_data should be the raw bytes you get back from the ElevenLabs API when you request output_format=pcm_44100. That format is signed 16‑bit little‑endian PCM at 44.1 kHz. You don’t invent the bytes—use exactly what the API returns, or the concatenation of streamed chunks if you’re streaming.

Details you can rely on:

  • output_format=pcm_44100 is listed, and 44.1 kHz PCM requires a Pro (or higher) tier. (elevenlabs.io)
  • “PCM (S16LE)” is the PCM encoding used by ElevenLabs docs. (elevenlabs.io)
  • For streaming over WebSocket, the API sends base64-encoded audio chunks in the "audio" field; you base64‑decode and append them to form the full pcm_data. (elevenlabs.io)
  • ElevenLabs’ docs and help center enumerate supported formats and sample rates (including 44.1 kHz for PCM). (help.elevenlabs.io, elevenlabs.io)
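
The WebSocket step (base64-decode each "audio" field and append) can be sketched as below. It assumes each message is a JSON string and that some messages may carry no audio; collect_pcm is a hypothetical helper, and receiving the messages from the socket is left out.

```python
import base64
import json
from typing import Iterable


def collect_pcm(messages: Iterable[str]) -> bytes:
    """Decode the base64 "audio" field of each JSON message and concatenate."""
    pcm = bytearray()
    for raw in messages:
        message = json.loads(raw)
        audio = message.get("audio")
        if audio:  # skip messages whose audio field is missing, null, or empty
            pcm.extend(base64.b64decode(audio))
    return bytes(pcm)
```

The returned bytes are the pcm_data to pass to a WAV writer.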

Converting to WAV:

  • To save a WAV, write those PCM bytes into a WAV container with parameters that match the stream: 44,100 Hz sample rate and 16‑bit PCM. Use any standard audio library in your language to create a WAV header with those settings, then write the pcm_data as the payload. (ElevenLabs doesn’t prescribe specific WAV-writing code in docs, but the format details above are what you need.) (elevenlabs.io)
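
If you would rather not depend on an audio library, the canonical 44-byte PCM WAV header can also be built by hand. This is a sketch under the assumption of mono, 16-bit, 44.1 kHz data; wav_header and pcm_to_wav are hypothetical names.

```python
import struct


def wav_header(data_size: int, sample_rate: int = 44100,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build the 44-byte RIFF/WAVE header for uncompressed PCM data."""
    block_align = channels * bits_per_sample // 8  # bytes per frame
    byte_rate = sample_rate * block_align          # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",          # RIFF chunk
        b"fmt ", 16, 1, channels, sample_rate,     # fmt chunk, format tag 1 = PCM
        byte_rate, block_align, bits_per_sample,
        b"data", data_size,                        # data chunk header
    )


def pcm_to_wav(pcm_data: bytes, sample_rate: int = 44100) -> bytes:
    """Prefix raw S16LE PCM bytes with a matching WAV header."""
    return wav_header(len(pcm_data), sample_rate) + pcm_data
```

Writing the result of pcm_to_wav(pcm_data) to a file opened in binary mode should give a file that ordinary players accept.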

If you’re on a plan that doesn’t include 44.1 kHz PCM, switch to another PCM rate (e.g., 16 kHz) or upgrade, per the plan requirements. (elevenlabs.io, help.elevenlabs.io)

Sources:
Create speech | ElevenLabs Documentation
Text to Speech | ElevenLabs Documentation
Generate audio in real-time | ElevenLabs Documentation
What audio formats do you support? – ElevenLabs

heavy thicket
#

pcm data for 44100

spring krakenBOT
#
AI-generated Answer for rjanot

Short answer: with output_format=pcm_44100, pcm_data must contain the raw 16‑bit little‑endian PCM bytes at 44.1 kHz that the ElevenLabs API returns.

Where to get those bytes:

  • HTTP TTS (POST /v1/text-to-speech/{voice_id}?output_format=pcm_44100): the HTTP response body is the audio in the selected format; read the response bytes into pcm_data. PCM at 44.1 kHz is available on paid tiers (Pro+). (elevenlabs.io)
  • HTTP streaming TTS (/v1/text-to-speech/{voice_id}/stream?output_format=pcm_44100): the response stream chunks are the same PCM bytes; concatenate them into pcm_data. (elevenlabs.io)
  • WebSocket TTS guide: each message includes a base64-encoded audio field; base64-decode each chunk and append to build pcm_data. (elevenlabs.io)
  • Conversational AI WebSocket events: audio is delivered as audio_event.audio_base_64; decode and concatenate. The session metadata will show agent_output_audio_format like "pcm_44100". (elevenlabs.io)
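
When the audio arrives in chunks, it can be written to a WAV file as it comes in rather than held in memory first. The sketch below assumes the chunks are already raw bytes (base64-decode first for the WebSocket cases) and that the output is mono; stream_to_wav is a hypothetical helper. A chunk boundary can fall in the middle of a 16-bit sample, so a partial frame is carried over to the next chunk.

```python
import wave
from typing import BinaryIO, Iterable, Union


def stream_to_wav(chunks: Iterable[bytes], target: Union[str, BinaryIO],
                  sample_rate: int = 44100, channels: int = 1) -> int:
    """Write PCM chunks to a WAV file as they arrive; return PCM bytes written."""
    frame_size = 2 * channels  # S16LE: 2 bytes per sample, per channel
    leftover = b""             # partial frame carried between chunks
    written = 0
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(channels)  # assumption: mono by default
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            data = leftover + chunk
            usable = len(data) - (len(data) % frame_size)
            wav_file.writeframes(data[:usable])  # whole frames only
            leftover = data[usable:]
            written += usable
    return written
```

The target can be a file path or a seekable binary file object; any trailing partial frame at the end of the stream is dropped.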

Format details (for validation):

  • “PCM (S16LE) … 16‑bit depth,” with supported sample rates including 44.1 kHz. (elevenlabs.io)
  • ElevenLabs lists PCM 44.1 kHz as an API-supported format. (help.elevenlabs.io)

If you just need a WAV file (not raw PCM), the web UI lets you download WAV directly from history; the API docs do not provide a code sample for wrapping PCM into WAV. (help.elevenlabs.io)

If you share which endpoint you’re using (HTTP vs WebSocket), I can point to the exact decode/read step in the relevant doc.

Sources:
Create speech | ElevenLabs Documentation
Stream speech (API Explorer) | ElevenLabs Documentation
Generate audio in real-time | ElevenLabs Documentation
Client events | ElevenLabs Documentation
Text to Speech | ElevenLabs Documentation
What audio formats do you support? – ElevenLabs
How do I download generated files from Text to Speech? – ElevenLabs

heavy thicket
spring krakenBOT
#
AI-generated Answer for rjanot

Short answer: with output_format=pcm_44100 the API returns raw 16‑bit signed little‑endian PCM at 44.1 kHz. “pcm_data” should be exactly those bytes (no WAV/RIFF header). (elevenlabs.io)
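
Since raw PCM carries no header, a quick look at the first bytes can reveal that a response is not PCM at all, for instance when the default MP3 came back because output_format was not applied. This is a heuristic sketch only: sniff_container is a hypothetical helper, it recognizes just a RIFF/WAVE header and an ID3 tag, and an MP3 stream without an ID3 tag will not be detected.

```python
def sniff_container(data: bytes) -> str:
    """Return 'wav', 'mp3', or 'unknown' based on leading signature bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"      # already has a WAV header; do not add another
    if data[:3] == b"ID3":
        return "mp3"      # MP3 with an ID3 tag, not raw PCM
    return "unknown"      # raw PCM has no signature, so it lands here
```

An "unknown" result does not prove the data is PCM; it only rules out the two signatures checked.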

Details

  • Where the bytes come from:

    • Synchronous TTS (POST /v1/text-to-speech/{voice_id}): the HTTP response body is the audio bytes; assign that directly to pcm_data. (elevenlabs.io)
    • Output streaming: the SDK/HTTP stream yields audio chunks as raw bytes; concatenate them for pcm_data. (elevenlabs.io)
    • “Stream with timestamps” endpoints: audio is provided as base64 strings inside JSON events—base64‑decode each chunk first, then concatenate to form pcm_data. (elevenlabs.io)
  • Format facts you must match when saving or processing:

    • Codec: PCM (S16LE), sample rate: 44,100 Hz. (elevenlabs.io)
    • 44.1 kHz PCM output is an allowed output_format value; note it requires Pro tier or above. (elevenlabs.io)
    • API-supported formats include MP3, PCM, Opus, µ‑law, and A‑law; WAV download is provided in the web app (not the API). (help.elevenlabs.io)

If you specifically need a WAV file and you’re using the web app, generate speech and download the WAV from your history. If you’re using the API, wrap the S16LE 44.1 kHz pcm_data in a WAV container on your side. (help.elevenlabs.io)

Background
ElevenLabs introduced PCM output (including 44.1 kHz) for normal, streaming, and WebSocket endpoints; default remains MP3. (elevenlabs.io)

If anything above doesn’t match your endpoint/SDK flow, tell me which endpoint you’re calling and I’ll map the exact byte handling steps from the docs.

Sources:
Text to Speech | ElevenLabs Documentation
Create speech | ElevenLabs Documentation
Streaming | ElevenLabs Documentation
Stream speech with timing | ElevenLabs Documentation
What audio formats do you support? – ElevenLabs
How do I download WAV files? – ElevenLabs
ElevenLabs — PCM output format | ElevenLabs

heavy thicket
#

pcm_data = b'...'