#Please help me!!! I am gonna implement AI agent using 11labs. It must say as voice when user send
You have two supported ways to make an ElevenLabs agent “speak” when the user sends a text message.
Option A — Use the Conversational AI WebSocket (agent speaks back automatically)
- Create/configure an Agent in the dashboard to choose model/voice; note its agent_id. You can also fetch/update agent config via the API.
- Open a WebSocket to wss://api.elevenlabs.io/v1/convai/conversation?agent_id=YOUR_AGENT_ID. For private agents, obtain a signed URL from your server; never expose your API key client‑side.
- Send the user’s text as a WebSocket message: {"type":"user_message","text":"..."}.
- Listen for messages. The agent returns:
  - agent_response (text), and
  - audio events with base64 audio. Also read the initial conversation_initiation_metadata to get agent_output_audio_format (e.g., pcm_16000) so you can decode/play it.
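The message flow in Option A can be sketched in plain Python without any network code. This is a minimal sketch, not the official client: the nested field names `conversation_initiation_metadata_event`, `audio_event.audio_base_64`, and `ping_event.event_id`, and the `pong` reply, are assumptions about the event shapes and should be checked against the client events reference. `user_message` and `handle_event` are hypothetical helper names.

```python
import base64
import json
from typing import Optional


def user_message(text: str) -> str:
    """Serialize a typed user message for the agent WebSocket."""
    return json.dumps({"type": "user_message", "text": text})


def handle_event(raw: str, state: dict) -> Optional[str]:
    """Update `state` from one server event; return a reply frame if one is due."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "conversation_initiation_metadata":
        # Assumed shape: the audio format sits inside a nested *_event object.
        meta = event["conversation_initiation_metadata_event"]
        state["audio_format"] = meta.get("agent_output_audio_format")
    elif kind == "agent_response":
        state["texts"].append(event["agent_response_event"]["agent_response"])
    elif kind == "audio":
        # Assumed field name for the base64 payload: audio_event.audio_base_64.
        state["audio"] += base64.b64decode(event["audio_event"]["audio_base_64"])
    elif kind == "ping":
        # Assumed keep-alive handshake: answer each ping with a matching pong.
        return json.dumps({"type": "pong", "event_id": event["ping_event"]["event_id"]})
    return None


def new_state() -> dict:
    """Fresh accumulator for one conversation."""
    return {"audio_format": None, "texts": [], "audio": bytearray()}
```

Feed every received WebSocket frame to `handle_event`; when it returns a string, send that string back on the same socket. The accumulated `state["audio"]` bytes are in whatever format `state["audio_format"]` reports.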
Option B — If you already have your own LLM (just synthesize its text to voice)
- Synchronous HTTP: POST /v1/text-to-speech/{voice_id} with JSON { "text": "...", "model_id": "..." } and choose output_format (e.g., mp3_44100_128). Play the resulting audio file/bytes.
- Low‑latency streaming over HTTP: use the official Node/Python SDKs’ text-to-speech.stream to consume chunks as they are generated.
- Real‑time via WebSocket (TTS only): connect to wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id=..., send your text frames, and write out audio chunks as they arrive. For lowest latency, the guide recommends the eleven_flash_v2_5 model and explains buffering controls like chunk_length_schedule and flush.
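For the TTS-only WebSocket, the frames you send might look like the sketch below. It only builds the URL and the JSON frames; opening the socket is left to whichever WebSocket library you use. The frame shapes (an opening frame with a single space, `generation_config.chunk_length_schedule`, a `flush` flag, and an empty-text closing frame) are assumptions drawn from memory of the stream-input guide, and `stream_input_url` / `stream_input_frames` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, Optional, Sequence
from urllib.parse import urlencode


def stream_input_url(voice_id: str, model_id: str = "eleven_flash_v2_5") -> str:
    """Build the stream-input WebSocket URL for one voice."""
    query = urlencode({"model_id": model_id})
    return f"wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?{query}"


def stream_input_frames(
    chunks: Iterable[str],
    chunk_length_schedule: Optional[Sequence[int]] = None,
) -> Iterator[str]:
    """Yield the JSON frames for one utterance, in send order."""
    # Assumed opening frame: a single space, plus optional buffering config.
    opening = {"text": " "}
    if chunk_length_schedule:
        opening["generation_config"] = {
            "chunk_length_schedule": list(chunk_length_schedule)
        }
    yield json.dumps(opening)

    pieces = list(chunks)
    for index, piece in enumerate(pieces):
        frame = {"text": piece}
        if index == len(pieces) - 1:
            # Ask the server to synthesize whatever is still buffered.
            frame["flush"] = True
        yield json.dumps(frame)

    # Assumed closing frame: empty text signals end of input.
    yield json.dumps({"text": ""})
```

Authentication is deliberately left out of the sketch; keep this connection on the backend so the API key never reaches the browser.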
If you prefer a no‑code frontend
- The web widget supports “Voice + text” mode so users can type or speak; enable this modality in the widget settings if you want text input with spoken responses. Avoid “Text only mode,” which disables voice output.
Security note
- For browser apps, proxy any REST calls and use signed WebSocket URLs for private agents; do not ship your xi‑api‑key to the client.
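A backend helper for the signed-URL step could look roughly like this. It is a sketch using only the standard library: the endpoint path `/v1/convai/conversation/get-signed-url` and the `signed_url` response field are assumptions to verify against the current API reference, and `signed_url_request` / `extract_signed_url` are hypothetical names.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.elevenlabs.io"


def signed_url_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the server-side request for a signed URL."""
    query = urlencode({"agent_id": agent_id})
    return urllib.request.Request(
        # Assumed endpoint path; confirm it in the API reference.
        f"{API_BASE}/v1/convai/conversation/get-signed-url?{query}",
        headers={"xi-api-key": api_key},  # the key stays on the server
        method="GET",
    )


def extract_signed_url(body: bytes) -> str:
    """Pull the wss:// URL out of the JSON response (assumed field name)."""
    return json.loads(body)["signed_url"]
```

On the server you would send it with `urllib.request.urlopen(signed_url_request(agent_id, api_key))`, pass the response bytes to `extract_signed_url`, and return only the resulting URL to the React app.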
If you tell me your stack (web, iOS/Android, Node/Python), I can point you to the exact snippet to connect, send {"type":"user_message"}, and play the returned audio.
I am going to build a web application using React as the frontend and Python as the backend.
Great—here’s the verified way to wire ElevenLabs Text-to-Speech (TTS) into a React frontend with a Python backend.
Architecture (recommended by docs)
- Never call ElevenLabs directly from React; keep your xi-api-key on the server and forward requests through your Python backend. Include the key in the xi-api-key header only on the server. (elevenlabs.io)
Choose your voice and model
- Get a voice_id your backend can use via GET /v1/voices. (elevenlabs.io)
- Pick a model based on latency/quality needs:
  - eleven_flash_v2_5: ultra‑low latency for conversational uses.
  - eleven_multilingual_v2: highest quality for media.
  - eleven_v3 (Alpha): most expressive, but not for real‑time; API access is limited.
  See the models overview and model guidance. (elevenlabs.io, help.elevenlabs.io)
Backend (Python) options
- One‑shot synthesis (returns an audio file): POST /v1/text-to-speech/:voice_id with JSON body containing text, optional model_id, and output_format (e.g., mp3_44100_128). Your backend returns the audio payload to the browser. (elevenlabs.io)
- Streaming audio (lower perceived latency): POST /v1/text-to-speech/:voice_id/stream. The API streams raw audio bytes over HTTP chunked transfer; the official Python SDK provides helpers to consume the stream. Your backend can relay the stream to the client. (elevenlabs.io)
- Need timestamps for UI highlighting/sync?
  - HTTP: /with-timestamps returns base64 audio plus character‑level alignment arrays. (elevenlabs.io)
  - Streaming + timestamps: /stream/with-timestamps returns a stream of JSON chunks with base64 audio and alignment. (elevenlabs.io)
- Real‑time partial text (WebSocket): wss://api.elevenlabs.io/v1/text-to-speech/:voice_id/stream-input (or multi‑stream variant). Supports sending text incrementally and receiving audio chunks; optional sync_alignment, auto_mode, and SSML parsing flags are available. Use only from your backend or a secured environment. (elevenlabs.io)
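If you use the timestamp endpoints for UI highlighting, the character-level alignment usually needs to be grouped into words before the frontend can use it. The sketch below assumes the response carries `audio_base64` and an `alignment` object with `characters`, `character_start_times_seconds`, and `character_end_times_seconds`; treat those field names as assumptions. `words_with_times` and `decode_timestamped_response` are hypothetical helpers.

```python
import base64
from typing import List, Tuple

Word = Tuple[str, float, float]  # (word, start_seconds, end_seconds)


def words_with_times(alignment: dict) -> List[Word]:
    """Group per-character timings into per-word timings."""
    words: List[Word] = []
    current, start, end = "", 0.0, 0.0
    for ch, s, e in zip(
        alignment["characters"],
        alignment["character_start_times_seconds"],
        alignment["character_end_times_seconds"],
    ):
        if ch.isspace():
            # Whitespace closes the word in progress, if any.
            if current:
                words.append((current, start, end))
            current = ""
        else:
            if not current:
                start = s  # first character sets the word's start time
            current += ch
            end = e  # last character seen sets the word's end time
    if current:
        words.append((current, start, end))
    return words


def decode_timestamped_response(payload: dict) -> Tuple[bytes, List[Word]]:
    """Return raw audio bytes plus word timings from one JSON payload."""
    audio = base64.b64decode(payload["audio_base64"])
    return audio, words_with_times(payload["alignment"])
```

The backend can send the word list to React alongside the audio, so the UI highlights each word when playback time falls between its start and end.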
Request parameters you’ll likely use
- text: the text to speak. (elevenlabs.io)
- model_id: pick from models that “can_do_text_to_speech.” (elevenlabs.io)
- output_format: codec_sampleRate_bitrate (e.g., mp3_44100_128; μ‑law and PCM are available for telephony). (elevenlabs.io, help.elevenlabs.io)
- voice_settings: optional per‑request overrides (speed, stability, similarity_boost). (elevenlabs.io)
- language_code: language enforcement currently supported only on Turbo v2.5 and Flash v2.5. (elevenlabs.io)
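Putting those parameters together, a backend request builder might look like the following sketch. It uses only the standard library and does not send anything. The assumptions here are that `output_format` travels as a query parameter while the other fields go in the JSON body; `tts_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.elevenlabs.io"


def tts_request(
    api_key: str,
    voice_id: str,
    text: str,
    model_id: str = "eleven_flash_v2_5",
    output_format: str = "mp3_44100_128",
    voice_settings: Optional[dict] = None,
    language_code: Optional[str] = None,
    stream: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    path = f"/v1/text-to-speech/{voice_id}" + ("/stream" if stream else "")
    body = {"text": text, "model_id": model_id}
    if voice_settings:
        body["voice_settings"] = voice_settings  # e.g. {"stability": 0.5}
    if language_code:
        body["language_code"] = language_code
    # Assumption: output_format is passed in the query string, not the body.
    query = urlencode({"output_format": output_format})
    return urllib.request.Request(
        f"{API_BASE}{path}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(...)` and reading the response gives the audio bytes to return to the browser; with `stream=True` you would read the response in chunks and relay them instead.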
Frontend (React)
- Call your Python endpoint, then play the returned audio (file bytes from “convert”, a proxied stream from “stream”, or decode base64 from the “with‑timestamps” endpoints). The API shapes and streaming behavior are documented above. (elevenlabs.io)
SSML and pauses
- Natural pauses: use <break time="Xs" /> in your text (up to 3s). Phoneme tags are
How do I get the text from the AI response?
Short answer:
- In real time, read the agent_response event. It carries the assistant’s text.
- From SDKs, use the message/callback hooks that surface that same text.
- After the call, fetch the transcript via the Conversations REST API.
How to do it
- WebSocket (raw):
  - Listen for messages where type === "agent_response". The text is in agent_response_event.agent_response. You may also receive agent_response_correction if the reply was interrupted. (elevenlabs.io)
- JavaScript/React SDK:
  - Provide an onMessage handler and use the event type to pick out the AI’s text. In chat mode, the SDK surfaces message.type === 'agent_response' with message.text. You can also call conversation.getId() later if you need the ID to fetch history. (elevenlabs.io)
  - If you’re using the React hook (@elevenlabs/react), pass an onMessage option and handle agent messages there. (elevenlabs.io)
- Python SDK:
  - Pass callback_agent_response to Conversation(...). The SDK invokes it with the agent’s response text. Optional: callback_agent_response_correction handles truncated text after barge‑in. (elevenlabs.io)
- After the session (server-side/history):
  - GET /v1/convai/conversations/:conversation_id returns transcript[], containing role and message for each turn. Use the conversation ID you got from the SDK (getId) or by listing conversations. (elevenlabs.io)
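For the after-the-session route, a rough backend sketch for fetching the transcript and keeping only the agent's lines is shown below. It relies on the transcript[] entries having role and message as described above; the role value "agent" is an assumption, and `conversation_request` / `agent_lines` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List
from urllib.parse import quote

API_BASE = "https://api.elevenlabs.io"


def conversation_request(api_key: str, conversation_id: str) -> urllib.request.Request:
    """Build (but do not send) the request for one conversation's details."""
    return urllib.request.Request(
        f"{API_BASE}/v1/convai/conversations/{quote(conversation_id, safe='')}",
        headers={"xi-api-key": api_key},  # server-side only
        method="GET",
    )


def agent_lines(details: dict) -> List[str]:
    """Return the agent's messages, in order, from a conversation payload."""
    return [
        turn["message"]
        for turn in details.get("transcript", [])
        # Assumption: agent turns are marked with role == "agent".
        if turn.get("role") == "agent" and turn.get("message")
    ]
```

After sending the request with `urllib.request.urlopen(...)`, pass `json.loads(response.read())` to `agent_lines` to get the AI's text for each turn.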
Related references
- Client events (all server→client event shapes, including agent_response). (elevenlabs.io)
- WebSocket Agent API reference (example event stream). (elevenlabs.io)
- JS SDK docs (onMessage, getId). (elevenlabs.io)
- React SDK docs (onMessage option). (elevenlabs.io)
- Python SDK docs (callback_agent_response). (elevenlabs.io)
- Conversations API (fetch transcript). (elevenlabs.io)
Sources:
Client events | ElevenLabs Documentation
JavaScript SDK | ElevenLabs Documentation
React SDK | ElevenLabs Documentation
Python SDK | ElevenLabs Documentation
Get conversation details | ElevenLabs Documentation
Agent WebSockets | ElevenLabs Documentation
Option A — Use the Conversational AI WebSocket (agent speaks back automatically)
Create/configure an Agent in the dashboard to choose model/voice; note its agent_id. You can also fetch/update agent config via the API.
Open a WebSocket to wss://api.elevenlabs.io/v1/convai/conversation?agent_id=YOUR_AGENT_ID. For private agents, obtain a signed URL from your server; never expose your API key client‑side.
Send the user’s text as a WebSocket message: {"type":"user_message","text":"..."}.
Listen for messages. The agent returns:
agent_response (text) and
audio events with base64 audio. Also read the initial conversation_initiation_metadata to get agent_output_audio_format (e.g., pcm_16000) so you can decode/play it.
Can you give me sample code using React?