#Hello, Topic -> Text-to-Speech. Please explain the above topic and share the API details which we can
AI-generated Answer for jayeshkhatik_54264
Below is the officially documented way to use ElevenLabs Text-to-Speech (TTS) from any app, plus the exact API endpoints and parameters.
What it does
- Converts text into speech using a selected voice and model; supports file responses, HTTP streaming, WebSocket real‑time streaming, and optional per‑character timestamps. (elevenlabs.io)
Authentication (required for every call)
- Send your API key in the xi-api-key header (xi-api-key: YOUR_KEY). Do not expose API keys in client-side code; call the API from your backend (or use short‑lived tokens/signed URLs where applicable). Keys can be scoped and given rate or credit limits. (elevenlabs.io)
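As a minimal sketch of the backend-only rule above, the header can be built from a server-side environment variable. The variable name ELEVENLABS_API_KEY and the helper name auth_headers are assumptions made for this example, not names taken from the docs.

```python
import os


def auth_headers(env_var: str = "ELEVENLABS_API_KEY") -> dict:
    """Build the xi-api-key header from a server-side environment variable.

    The environment variable name is an assumption for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} on the server; never ship the key to clients.")
    return {"xi-api-key": api_key}
```

Reading the key at call time keeps it out of source control and out of any bundle that reaches the browser.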
Core TTS endpoints
- Create speech (returns a single audio file)
  - POST /v1/text-to-speech/:voice_id
  - Body (JSON): text (required), model_id (optional), language_code (model‑dependent), voice_settings, pronunciation_dictionary_locators, seed, previous_text/next_text, previous_request_ids/next_request_ids
  - Query: output_format (default mp3_44100_128; 19 enum options), enable_logging
  - Response: audio file
  - Use cases: generate a complete clip in one response. (elevenlabs.io)
- Stream speech over HTTP (chunked)
  - POST /v1/text-to-speech/:voice_id/stream
  - Same request fields as above; audio bytes are streamed progressively (lower time‑to‑first‑byte). (elevenlabs.io)
- Create speech with timestamps (file + alignment data)
  - POST /v1/text-to-speech/:voice_id/with-timestamps
  - Response: JSON with base64 audio and per‑character timing arrays. (elevenlabs.io)
- Stream speech with timestamps (JSON events)
  - POST /v1/text-to-speech/:voice_id/stream/with-timestamps
  - Response: stream of JSON chunks containing base64 audio plus timing info. (elevenlabs.io)
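The four endpoints above differ only in their path suffix, so one request builder can cover them. The sketch below uses only the Python standard library; the helper names are hypothetical, and the response field names audio_base64, alignment, characters, character_start_times_seconds and character_end_times_seconds are assumptions about the with-timestamps payload that should be checked against the API reference.

```python
import base64
import json
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://api.elevenlabs.io"


def build_tts_request(voice_id, text, api_key, model_id=None,
                      output_format="mp3_44100_128",
                      stream=False, with_timestamps=False):
    """Build (but do not send) a POST request for one of the four TTS endpoints."""
    path = f"/v1/text-to-speech/{quote(voice_id, safe='')}"
    if stream:
        path += "/stream"
    if with_timestamps:
        path += "/with-timestamps"
    body = {"text": text}
    if model_id:
        body["model_id"] = model_id
    query = urlencode({"output_format": output_format})
    return urllib.request.Request(
        url=f"{API_BASE}{path}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while chunk := response.read(8192):
            out.write(chunk)


def decode_timestamped(payload):
    """Split a with-timestamps JSON payload into audio bytes and
    (character, start_seconds, end_seconds) tuples.
    Field names are assumptions; verify them against the API reference."""
    audio = base64.b64decode(payload["audio_base64"])
    alignment = payload["alignment"]
    timings = list(zip(alignment["characters"],
                       alignment["character_start_times_seconds"],
                       alignment["character_end_times_seconds"]))
    return audio, timings
```

A call such as synthesize_to_file(build_tts_request(voice_id, "Hello", api_key), "hello.mp3") would run on the backend, with voice_id and api_key supplied by your own configuration.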
Real‑time WebSocket TTS (bidirectional)
- Single‑context: wss://api.elevenlabs.io/v1/text-to-speech/:voice_id/stream-input
- Multi‑context: wss://api.elevenlabs.io/v1/text-to-speech/:voice_id/multi-stream-input
- Query/options include: model_id, language_code (where supported), output_format, enable_ssml_parsing, sync_alignment, auto_mode, inactivity_timeout. Messages include initializeConnection, sendText, audioOutput, finalOutput. (elevenlabs.io)
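The single-context flow can be sketched without committing to a particular WebSocket client. The JSON shapes used here are assumptions about the message schema (a single space as the opening text, an empty string to close the input, and incoming frames carrying a base64 audio field plus an isFinal flag); confirm them in the WebSocket reference before relying on them. Authentication on the handshake is left to whichever client library carries the frames.

```python
import base64
import json
from urllib.parse import quote, urlencode


def stream_input_url(voice_id, model_id, output_format="mp3_44100_128"):
    """Build the single-context WebSocket URL with query options."""
    query = urlencode({"model_id": model_id, "output_format": output_format})
    return ("wss://api.elevenlabs.io/v1/text-to-speech/"
            f"{quote(voice_id, safe='')}/stream-input?{query}")


def outgoing_messages(text_chunks):
    """Yield the JSON frames to send: open, one per text chunk, then close.
    Message shapes are assumptions for this sketch."""
    yield json.dumps({"text": " "})       # assumed initializeConnection shape
    for chunk in text_chunks:
        yield json.dumps({"text": chunk})  # assumed sendText shape
    yield json.dumps({"text": ""})        # assumed end-of-input marker


def collect_audio(incoming_frames):
    """Concatenate decoded audio from incoming frames until a final frame."""
    audio = bytearray()
    for frame in incoming_frames:
        message = json.loads(frame)
        if message.get("audio"):
            audio.extend(base64.b64decode(message["audio"]))
        if message.get("isFinal"):
            break
    return bytes(audio)
```

Keeping message construction separate from the socket makes the protocol logic easy to unit test with plain lists of frames.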
Models and limits
- List available models: GET /v1/models. Choose one with can_do_text_to_speech = true. (elevenlabs.io)
- Character limits per request vary by model; e.g., eleven_flash_v2_5 and eleven_turbo_v2_5 up to 40,000 chars; eleven_multilingual_v2 up to 10,000. Split longer text across requests. (elevenlabs.io)
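Picking a model and staying under its character limit could look like the sketch below. The limits table simply restates the figures quoted above, the model_id field name in the /v1/models response is an assumption, and split_text is a hypothetical helper that splits on whitespace (so line breaks are collapsed to single spaces).

```python
# Limits as quoted above; check the models page for current values.
CHAR_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
    "eleven_multilingual_v2": 10_000,
}


def tts_model_ids(models):
    """Return IDs of models flagged as able to do text-to-speech.
    `models` is the parsed JSON list from GET /v1/models."""
    return [m["model_id"] for m in models if m.get("can_do_text_to_speech")]


def split_text(text, limit):
    """Greedily pack whitespace-separated words into chunks of at most
    `limit` characters; a single overlong word is hard-split."""
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    current = ""
    for word in text.split():
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, for example split_text(long_text, CHAR_LIMITS["eleven_multilingual_v2"]).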
Voices
- You need a voice_id. List your voices with GET /v2/voices, or copy a voice’s ID from the dashboard. (elevenlabs.io)
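Once the voices list has been fetched, looking up an ID by display name is a small helper. This sketch assumes the parsed response is an object with a voices array whose entries carry name and voice_id fields; that shape is an assumption to verify against the voices reference, and find_voice_id is a hypothetical name.

```python
from typing import Optional


def find_voice_id(voices_response: dict, name: str) -> Optional[str]:
    """Return the voice_id whose name matches case-insensitively, else None.
    The response shape is an assumption for this sketch."""
    wanted = name.strip().lower()
    for voice in voices_response.get("voices", []):
        if str(voice.get("name", "")).strip().lower() == wanted:
            return voice.get("voice_id")
    return None
```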
Audio formats
- Select with output_format (codec_sampleRate_bitrate), e.g., mp3_44100_128; additional PCM formats are available (including pcm_48000 as of Apr 28, 2025). (elevenlabs.io)
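Because the format string follows the codec_sampleRate_bitrate pattern, a backend can parse it to set playback or file metadata. The sketch below is a hypothetical helper that handles the two shapes mentioned above: three parts such as mp3_44100_128 and two parts such as pcm_48000, where no bitrate is given.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the format string omits it


def parse_output_format(value: str) -> AudioFormat:
    """Split an output_format string into codec, sample rate and bitrate."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate = int(parts[1])
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(codec, sample_rate, bitrate)
```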
Streaming in front‑end apps
- Recommended pattern: your frontend calls your backend; the backend calls /v1/text-to-speech/:voice_id/stream and relays the stream to the browser. The docs provide a cookbook example that uses ReadableStream and stream.tee() to return the audio to the browser and persist it at the same time.
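For a Python backend, the same relay-and-persist idea can be sketched as a generator that writes each chunk to storage while passing it on to the browser response. This is not the cookbook code, only an analogue of the tee pattern; relay_and_persist is a hypothetical name, and upstream stands for whatever chunk iterator your HTTP client exposes for the /stream response.

```python
from typing import BinaryIO, Iterable, Iterator


def relay_and_persist(upstream: Iterable[bytes], sink: BinaryIO) -> Iterator[bytes]:
    """Yield each audio chunk to the caller while also writing it to sink."""
    for chunk in upstream:
        if not chunk:
            continue  # skip keep-alive or empty chunks
        sink.write(chunk)
        yield chunk
```

Most Python web frameworks accept a generator like this as a streaming response body, so the browser starts receiving audio before the full clip has been saved.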
Hello
I want to add Dial Click functionality to voice calling between an agent and a customer. How can we add this with the ElevenLabs call API?