#Hello, Topic -> Text-to-Speech. Please explain the above topic and share the API details which we can


flint wind (BOT)
AI-generated Answer for jayeshkhatik_54264

Below is the officially documented way to use ElevenLabs Text-to-Speech (TTS) from any app, plus the exact API endpoints and parameters.

What it does

  • Converts text into speech using a selected voice and model; supports file responses, HTTP streaming, WebSocket real‑time streaming, and optional per‑character timestamps. (elevenlabs.io)

Authentication (required for every call)

  • Send your API key in header: xi-api-key: YOUR_KEY. Do not expose API keys in client-side code; call the API from your backend (or use short‑lived tokens/signed URLs where applicable). Keys can be scoped and rate/credit‑limited. (elevenlabs.io)
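A minimal sketch of the backend side of this advice, assuming the key is stored in an environment variable. The variable name ELEVENLABS_API_KEY and the helper name are illustrative choices, not something the docs prescribe:

```python
import os


def build_auth_headers(env_var: str = "ELEVENLABS_API_KEY") -> dict[str, str]:
    """Read the API key from the server environment and return request headers.

    The environment variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} on the server before calling the API")
    # The header name xi-api-key comes from the answer above.
    return {"xi-api-key": api_key}
```

Keeping the lookup on the server means the key never ships in browser or mobile code.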

Core TTS endpoints

  1. Create speech (returns a single audio file)
  • POST /v1/text-to-speech/:voice_id
  • Body (JSON): text (required), model_id (optional), language_code (model‑dependent), voice_settings, pronunciation_dictionary_locators, seed, previous_text/next_text, previous_request_ids/next_request_ids
  • Query: output_format (default mp3_44100_128; 19 enum options), enable_logging
  • Response: audio file
  • Use cases: generate a complete clip in one response. (elevenlabs.io)
  2. Stream speech over HTTP (chunked)
  • POST /v1/text-to-speech/:voice_id/stream
  • Same request fields as above; audio bytes are streamed progressively (lower time‑to‑first‑byte). (elevenlabs.io)
  3. Create speech with timestamps (file + alignment data)
  • POST /v1/text-to-speech/:voice_id/with-timestamps
  • Response: JSON with base64 audio and per‑character timing arrays. (elevenlabs.io)
  4. Stream speech with timestamps (JSON events)
  • POST /v1/text-to-speech/:voice_id/stream/with-timestamps
  • Response: stream of JSON chunks containing base64 audio plus timing info. (elevenlabs.io)

Real‑time WebSocket TTS (bidirectional)

  • Single‑context: wss://api.elevenlabs.io/v1/text-to-speech/:voice_id/stream-input
  • Multi‑context: wss://api.elevenlabs.io/v1/text-to-speech/:voice_id/multi-stream-input
  • Query/options include: model_id, language_code (where supported), output_format, enable_ssml_parsing, sync_alignment, auto_mode, inactivity_timeout. Messages include initializeConnection, sendText, audioOutput, finalOutput. (elevenlabs.io)
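As an illustration of how the two WebSocket URLs and their query options fit together, here is a small URL builder. The helper name is hypothetical, and serializing booleans as lowercase "true"/"false" is an assumption of this sketch. Opening the socket and exchanging messages is left to whichever WebSocket client you use:

```python
from urllib.parse import quote, urlencode


def build_ws_url(voice_id: str, *, multi_context: bool = False, **options) -> str:
    """Return the single-context or multi-context WebSocket URL with query options."""
    endpoint = "multi-stream-input" if multi_context else "stream-input"
    base = (
        "wss://api.elevenlabs.io/v1/text-to-speech/"
        f"{quote(voice_id, safe='')}/{endpoint}"
    )
    # Drop unset options; render booleans as "true"/"false" (an assumption).
    params = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in options.items()
        if value is not None
    }
    return f"{base}?{urlencode(params)}" if params else base
```

For example, build_ws_url("VOICE", model_id="eleven_flash_v2_5", auto_mode=True) yields the single-context URL with both options appended as query parameters.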

Models and limits

  • List available models: GET /v1/models. Choose one with can_do_text_to_speech = true. (elevenlabs.io)
  • Character limits per request vary by model; e.g., eleven_flash_v2_5 and eleven_turbo_v2_5 up to 40,000 chars; eleven_multilingual_v2 up to 10,000. Split longer text across requests. (elevenlabs.io)
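One possible way to split longer text across requests is sketched below. The limits table copies only the figures quoted above. The splitting strategy (break at the last space that fits, otherwise cut hard) is an illustrative choice, not something the docs specify:

```python
# Per-request character limits as quoted in the answer above.
MODEL_CHAR_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
    "eleven_multilingual_v2": 10_000,
}


def split_text(text: str, limit: int) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring space breaks."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Look for the last space at or before the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: fall back to a hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own request, for example split_text(article, MODEL_CHAR_LIMITS["eleven_multilingual_v2"]).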

Voices

  • You need a voice_id. List your voices with GET /v2/voices, or copy a voice’s ID from the dashboard. (elevenlabs.io)

Audio formats

  • Select with output_format (codec_sampleRate_bitrate), e.g., mp3_44100_128; additional PCM formats are available (including pcm_48000 as of Apr 28, 2025). (elevenlabs.io)
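To make the codec_sampleRate_bitrate naming concrete, this small parser splits an output_format value into its parts. It is only a sketch: the type and function names are hypothetical, and reading the third field as a bitrate in kbps is an assumption based on mp3_44100_128:

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # absent for formats such as pcm_48000


def parse_output_format(value: str) -> OutputFormat:
    """Parse strings shaped like codec_sampleRate or codec_sampleRate_bitrate."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate = int(parts[1])
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(codec, sample_rate, bitrate)
```

A client that plays raw PCM could use the parsed sample rate to configure its audio device instead of hard-coding it.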

Streaming in front‑end apps

  • Recommended pattern: your frontend calls your backend; the backend calls /v1/text-to-speech/:voice_id/stream and relays the stream to the browser. The docs provide a cookbook example (ReadableStream + stream.tee() to both return to the browser and persist).
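The cookbook mentioned above uses ReadableStream and stream.tee(). The sketch below is not that code but a rough Python analogue of the same relay-and-persist idea: a generator that forwards each audio chunk to the client while also writing it to a sink. The function name is hypothetical, and `upstream` stands in for whatever chunk iterator your HTTP client exposes for the /stream response:

```python
import io
from typing import BinaryIO, Iterable, Iterator


def relay_and_persist(upstream: Iterable[bytes], sink: BinaryIO) -> Iterator[bytes]:
    """Yield each audio chunk to the caller while also writing it to `sink`."""
    for chunk in upstream:
        if not chunk:
            continue  # skip empty keep-alive chunks
        sink.write(chunk)  # persist a copy (file, object-store buffer, ...)
        yield chunk  # forward the same bytes to the browser response
```

Most Python web frameworks accept such a generator as a streaming response body, so the browser starts receiving audio before the full clip has been generated.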
main bone

Hello
I want to add click-to-dial functionality to voice calls between an agent and a customer. How can we add this with the ElevenLabs call API?