#yo has anyone tried empirellm.com yet? been using it for multi-model orchestration and it genuinely
Thanks for sharing. I can’t speak to empirellm.com, but here’s how to pipe any orchestrator’s output into ElevenLabs TTS, based only on the official docs:
Authenticate
- Create an API key in the dashboard and send it on every request via the xi-api-key header. Keys can be scoped/restricted; keep them server-side or use single‑use tokens where supported. (help.elevenlabs.io)
Pick a voice (get its voice_id)
- Use your “My Voices” entries or list voices via the API and select the desired voice_id. (elevenlabs.io)
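A rough sketch of those first two steps in Python (stdlib only). It assumes the voice list lives at GET /v1/voices and comes back as JSON with a top-level "voices" array whose entries carry "name" and "voice_id"; that's my reading of the API reference, so check it there. The function names are made up for illustration.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the voice list."""
    return urllib.request.Request(
        f"{API_BASE}/v1/voices",            # assumed listing endpoint
        headers={"xi-api-key": api_key},    # key goes on every request
        method="GET",
    )


def pick_voice_id(payload: dict, name: str) -> str:
    """Return the voice_id of the first voice whose name matches (case-insensitive)."""
    for voice in payload.get("voices", []):  # assumed response shape
        if voice.get("name", "").lower() == name.lower():
            return voice["voice_id"]
    raise LookupError(f"no voice named {name!r}")


def find_voice_id(api_key: str, name: str) -> str:
    """Fetch the voice list over the network and look up one voice by name."""
    with urllib.request.urlopen(build_voices_request(api_key)) as resp:
        return pick_voice_id(json.load(resp), name)
```

Run this server-side so the key never reaches a browser or mobile client.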
Convert text to speech (non‑streaming)
- Endpoint: POST https://api.elevenlabs.io/v1/text-to-speech/:voice_id
- Common body fields: text, model_id (e.g., eleven_multilingual_v2), optional voice_settings and language_code. Query parameters: output_format (e.g., mp3_44100_128) and, for enterprise zero‑retention, enable_logging=false. The response body is the raw audio bytes. (elevenlabs.io)
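Here's roughly what that call looks like with Python's stdlib. Treat it as a sketch: I'm assuming output_format and enable_logging travel as query parameters while text and model_id go in the JSON body, and build_tts_request / synthesize are illustrative names, not SDK functions.

```python
import json
import urllib.parse
import urllib.request


def build_tts_request(api_key, voice_id, text,
                      model_id="eleven_multilingual_v2",
                      output_format="mp3_44100_128",
                      enable_logging=True,
                      base_url="https://api.elevenlabs.io"):
    """Build (but do not send) a non-streaming text-to-speech request."""
    query = {"output_format": output_format}   # assumed to be a query parameter
    if not enable_logging:
        query["enable_logging"] = "false"      # enterprise zero-retention
    url = (f"{base_url}/v1/text-to-speech/"
           f"{urllib.parse.quote(voice_id, safe='')}"
           f"?{urllib.parse.urlencode(query)}")
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(api_key, voice_id, text, **options) -> bytes:
    """Send the request and return the whole audio payload as bytes."""
    request = build_tts_request(api_key, voice_id, text, **options)
    with urllib.request.urlopen(request) as resp:  # raises HTTPError on 4xx/5xx
        return resp.read()
```

Usage would be something like open("out.mp3", "wb").write(synthesize(KEY, VOICE_ID, "Hello")), with KEY and VOICE_ID supplied by you.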
Low‑latency HTTP streaming (chunked)
- Endpoint: POST https://api.elevenlabs.io/v1/text-to-speech/:voice_id/stream
- Same request fields as above; audio is returned incrementally via chunked transfer encoding for immediate playback. (elevenlabs.io)
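On the consuming side, the streaming endpoint mostly changes how you read the response. A minimal sketch, assuming you already have a file-like HTTP response (for example from urllib.request.urlopen on the /stream URL); iter_audio_chunks and the default chunk size are my own illustrative choices.

```python
from typing import BinaryIO, Iterator


def iter_audio_chunks(resp: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they arrive instead of waiting for the full body.

    `resp` is any object with a blocking read(n) method; the stdlib HTTP
    response decodes chunked transfer encoding transparently.
    """
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:          # empty bytes means the server closed the stream
            break
        yield chunk


def copy_stream(resp: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Forward each chunk to a player or file as soon as it is read."""
    total = 0
    for chunk in iter_audio_chunks(resp, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total
```

A smaller chunk_size generally shortens the wait for the first audio at the cost of more read calls.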
Real‑time via WebSocket (input streaming)
- Endpoint: wss://api.elevenlabs.io/v1/text-to-speech/:voice_id/stream-input
- Use when you generate text progressively or need word/segment alignment. Supports single‑use tokens for client‑side auth. Note: WebSockets are not available for the eleven_v3 model. (elevenlabs.io)
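For the WebSocket route, the part worth sketching is the message framing, since the socket library is up to you. Everything protocol-specific below is an assumption from my reading of the docs and should be verified there: an opening message with a single space as text, one JSON message per text fragment (each ending in a space), an empty-text message to finish, and replies that carry base64 audio under an "audio" key. The key is shown inside the first message; a header or single-use token may be the better fit for your setup.

```python
import base64
import json
from typing import Iterable, Iterator


def stream_input_url(voice_id: str, model_id: str = "eleven_multilingual_v2") -> str:
    """Build the stream-input URL, refusing the model the thread says is unsupported."""
    if model_id == "eleven_v3":
        raise ValueError("eleven_v3 is not available over the TTS WebSocket")
    return (f"wss://api.elevenlabs.io/v1/text-to-speech/"
            f"{voice_id}/stream-input?model_id={model_id}")


def ws_messages(text_parts: Iterable[str], api_key: str) -> Iterator[str]:
    """Yield the JSON messages to send, in order (assumed protocol shape)."""
    yield json.dumps({"text": " ", "xi_api_key": api_key})   # opening message
    for part in text_parts:
        if part:
            # assumed convention: each fragment ends with a single space
            yield json.dumps({"text": part if part.endswith(" ") else part + " "})
    yield json.dumps({"text": ""})                            # end of input


def decode_audio(message: str) -> bytes:
    """Extract audio bytes from one server message, or b'' if it has none."""
    audio = json.loads(message).get("audio")
    return base64.b64decode(audio) if audio else b""
```

With any WebSocket client you would send each string from ws_messages as the LLM emits text, and pass every incoming message through decode_audio before handing the bytes to your player.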
Model selection (pick for your routing rules)
- Eleven Flash v2.5: ultra‑low latency (~75 ms). Eleven Turbo v2.5: low latency (~250–300 ms). Eleven Multilingual v2: highest stability for long‑form. Eleven v3: most expressive/emotional (not supported over TTS WebSocket). (elevenlabs.io)
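If your orchestrator routes by rule, the model choice can be one small function. This is only a sketch: the latency thresholds are arbitrary numbers I picked to separate the tiers above, and the model_id strings for Flash and Turbo (eleven_flash_v2_5, eleven_turbo_v2_5) are my assumption of the identifiers, so confirm them against the models page.

```python
from typing import Optional


def choose_model(latency_budget_ms: Optional[int] = None,
                 expressive: bool = False,
                 over_websocket: bool = False) -> str:
    """Map simple routing hints to a model_id (illustrative rules only)."""
    # v3 is the expressive option but is not offered over the TTS WebSocket.
    if expressive and not over_websocket:
        return "eleven_v3"
    if latency_budget_ms is not None:
        if latency_budget_ms <= 150:       # hypothetical cutoff for Flash
            return "eleven_flash_v2_5"
        if latency_budget_ms <= 400:       # hypothetical cutoff for Turbo
            return "eleven_turbo_v2_5"
    # Default to the stable long-form model.
    return "eleven_multilingual_v2"
```

Putting the rules in one place keeps the fallback explicit: anything without a latency budget or an expressiveness flag lands on the long-form model.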
Formats and telephony
- Output formats include MP3 (22.05–44.1 kHz, multiple bitrates), PCM S16LE (16–48 kHz), μ‑law and A‑law (8 kHz), and Opus (48 kHz). Choose the output_format enum accordingly (e.g., μ‑law for PSTN). (elevenlabs.io)
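Picking the enum by destination could look like the lookup below. Only mp3_44100_128 appears in this thread; the other enum strings follow the same codec_rate(_bitrate) pattern and are my assumption, so check each one against the output_format list in the API reference. The destination names are hypothetical labels for your own routing.

```python
# Destination label -> output_format value (values other than the MP3 one are assumed).
OUTPUT_FORMATS = {
    "web_playback": "mp3_44100_128",   # general-purpose MP3
    "pstn_us": "ulaw_8000",            # mu-law, 8 kHz telephony
    "pstn_eu": "alaw_8000",            # A-law, 8 kHz telephony
    "asr_pipeline": "pcm_16000",       # raw signed 16-bit little-endian PCM
    "webrtc": "opus_48000_128",        # Opus at 48 kHz
}


def pick_output_format(destination: str) -> str:
    """Return the output_format for a destination, failing loudly on unknown ones."""
    try:
        return OUTPUT_FORMATS[destination]
    except KeyError:
        raise ValueError(f"no output_format configured for {destination!r}") from None
```

Failing on unknown destinations avoids silently sending MP3 into a telephony leg that expects 8 kHz audio.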
Regions/latency control
- For US‑only routing, set the base URL to api.us.elevenlabs.io; otherwise use the global default. (elevenlabs.io)
Minimal request recipe your orchestrator can issue:
- Set header xi-api-key: YOUR_KEY. Put text and model_id in the JSON body, and pass output_format as a query parameter.
- Use POST /v1/text-to-speech/:voice_id for simple jobs, or /v1/text-to-speech/:voice_id/stream for immediate playback; switch to WebSocket stream‑input when you’re generating text incrementally. (elevenlabs.io)
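The endpoint choice in that recipe can be written as a small resolver. A sketch only: the mode names are labels I invented, and I'm assuming the US host exposes the same paths (including the WebSocket one), which the docs cited here don't spell out.

```python
def endpoint_for(voice_id: str, mode: str, us_only: bool = False) -> str:
    """Return the URL to call for a given delivery mode.

    mode: "simple"      -> whole file in one response
          "stream"      -> chunked HTTP audio for immediate playback
          "incremental" -> WebSocket, text is sent as it is generated
    """
    host = "api.us.elevenlabs.io" if us_only else "api.elevenlabs.io"
    path = f"/v1/text-to-speech/{voice_id}"
    if mode == "simple":
        return f"https://{host}{path}"
    if mode == "stream":
        return f"https://{host}{path}/stream"
    if mode == "incremental":
        return f"wss://{host}{path}/stream-input"   # US-host support is assumed
    raise ValueError(f"unknown mode {mode!r}")
```

For example, endpoint_for(voice_id, "stream", us_only=True) gives the chunked-streaming URL on the US host.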
Optional quality/continuity controls you can route by:
- voice_settings per request; pronunciation_dictionary_locators; previous_text/next_text or previous_request_ids/next_request_ids for seamless multi‑chunk synthesis. (elevenlabs.io)
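For multi-chunk jobs, the previous_text/next_text fields just need the neighbouring chunks filled in. A minimal sketch that builds one JSON body per chunk; chunk_bodies is an illustrative name, and how you split the text into chunks is left to your orchestrator.

```python
from typing import Dict, List


def chunk_bodies(chunks: List[str],
                 model_id: str = "eleven_multilingual_v2") -> List[Dict[str, str]]:
    """Build one request body per chunk, each aware of its neighbours."""
    bodies = []
    for i, text in enumerate(chunks):
        body = {"text": text, "model_id": model_id}
        if i > 0:
            body["previous_text"] = chunks[i - 1]   # context before this chunk
        if i + 1 < len(chunks):
            body["next_text"] = chunks[i + 1]       # context after this chunk
        bodies.append(body)
    return bodies
```

Because the neighbours come from the text itself, the chunks can be synthesized in parallel; previous_request_ids/next_request_ids would instead need the earlier responses first.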
Full API reference and SDK quickstarts (Node/Python) are here if you need code snippets. (elevenlabs.io)
If you want, tell me your orchestrator’s HTTP/WebSocket config and I’ll map the exact fields to the right ElevenLabs endpoints and models.