When using conversational AI agents, can I reply with text instead of a user audio chunk? | ElevenLabs

timber galleonBOT Feb 22, 2025, 12:24 PM

AI-generated Answer for tvararu

No, the primary function of ElevenLabs Conversational AI is to facilitate live conversations using audio. The system is designed to process audio inputs, convert them into text, and then generate a response. Sending text directly to the agent is not supported as part of its core functionality.

If you need to include specific context or information, you can consider structuring the system prompt or knowledge base for the agent. Alternatively, you could convert text into audio chunks and send them via the WebSocket protocol as a workaround.

ebon sonnet Feb 22, 2025, 12:25 PM

Can I use the ElevenLabs API to generate the audio chunk?

timber galleonBOT Feb 22, 2025, 12:25 PM

AI-generated Answer for tvararu

Yes, you can use the ElevenLabs API to generate audio chunks. The Text-to-Speech API allows you to convert text into audio, which can then be used as input for various applications, including conversational AI agents.

To generate audio chunks, you can use the streaming endpoint of the Text-to-Speech API. Here's an example of how to send a request to the API and receive a stream of audio bytes:

import requests

# Define constants
XI_API_KEY = "<your-api-key>"
VOICE_ID = "<voice-id>"
TEXT_TO_SPEAK = "<text>"
OUTPUT_PATH = "output.mp3"

# Construct the URL for the Text-to-Speech API request
tts_url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"

# Set up headers for the API request
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

# Set up the data payload for the API request
data = {
    "text": TEXT_TO_SPEAK,
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.8,
        "style": 0.0,
        "use_speaker_boost": True
    }
}

# Make the POST request to the TTS API with headers and data, enabling streaming response
response = requests.post(tts_url, headers=headers, json=data, stream=True)

# Check if the request was successful
if response.ok:
    with open(OUTPUT_PATH, "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            f.write(chunk)
    print("Audio stream saved successfully.")
else:
    print(response.text)

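This code streams the generated speech and saves it to output.mp3. To use the audio as agent input, encode it in Base64 and send it as a user_audio_chunk message over the Conversational AI WebSocket API. Note that the request above returns MP3 unless you ask for another format, while the agent typically expects raw PCM input, so you will likely need to request a PCM output format (for example pcm_16000) or convert the audio first. Check the input format configured for your agent. For more details on the WebSocket API, refer to the Conversational AI WebSocket documentation.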
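As a rough sketch of the Base64 step, the snippet below splits raw audio bytes into chunks and wraps each one in a JSON message with a user_audio_chunk field. The helper name build_user_audio_chunk_messages, the 4000-byte chunk size, and the silent placeholder audio are assumptions made for illustration; the exact message shape and the audio format your agent accepts should be confirmed against the WebSocket documentation.

```python
import base64
import json
from typing import List


def build_user_audio_chunk_messages(audio: bytes, chunk_size: int = 4000) -> List[str]:
    """Split raw audio bytes into Base64-encoded JSON messages.

    Hypothetical helper: the {"user_audio_chunk": ...} shape is assumed
    from the answer above, and chunk_size is an arbitrary choice.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    messages = []
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        # Base64 output is ASCII, so it can be embedded in JSON as a string
        encoded = base64.b64encode(chunk).decode("ascii")
        messages.append(json.dumps({"user_audio_chunk": encoded}))
    return messages


# Placeholder input: one second of silence as 16-bit mono PCM at 16 kHz.
# In practice this would be the audio returned by the Text-to-Speech API.
silence = b"\x00\x00" * 16000
messages = build_user_audio_chunk_messages(silence)
print(f"Prepared {len(messages)} messages")
```

Each string in messages could then be sent in order over an open WebSocket connection to the agent, using whichever WebSocket client you prefer.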

ebon sonnet Feb 22, 2025, 12:39 PM

Can I pause a conversational AI agent call and then resume it? Is there a limit on the maximum duration of a conversation?

timber galleonBOT Feb 22, 2025, 12:39 PM

AI-generated Answer for tvararu

Yes, you can pause and resume a conversational AI agent call. The maximum duration of a conversation is limited but configurable: by default it is set to 300 seconds (5 minutes). You can increase this limit by navigating to the Conversational AI page in the website UI, selecting your agent, and adjusting the "Max conversation duration" option under the "Advanced" tab.

Let me know if you have further questions!

ebon sonnet Feb 22, 2025, 12:40 PM

Can I pause automatically after the AI's turn?
