#I'll post in here since I don't want to flood the channel anymore
Windows?
mac
can you run the program again to the point where it gets stuck and leave it like that
sorry not bash zsh
and then open another terminal -- and find the process ID? something like ps aux | grep -i python and one of them might be the one you are running (that is stuck)
once you are sure you have the right process ID
can you kill -SIGINT process_id_here
then take a look at the program that is stuck and see if anything changed
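For reference, `kill -SIGINT <pid>` from a second terminal delivers the same signal that ctrl+c does. A minimal Python sketch of that idea, assuming a POSIX system (mac/Linux). The process signals itself here only so the example is self-contained, and `on_sigint` is an illustrative name, not something from the SDK:

```python
import os
import signal
import time

received = []

def on_sigint(signum, frame):
    # Record the signal instead of letting Python raise KeyboardInterrupt.
    received.append(signum)

signal.signal(signal.SIGINT, on_sigint)

pid = os.getpid()             # the number you would look up with `ps aux | grep -i python`
os.kill(pid, signal.SIGINT)   # same effect as `kill -SIGINT <pid>` in another terminal

# Python runs its handlers between bytecode instructions, so allow a short moment.
deadline = time.monotonic() + 1.0
while not received and time.monotonic() < deadline:
    time.sleep(0.01)

signal.signal(signal.SIGINT, signal.default_int_handler)  # back to the usual ctrl+c behaviour
print(f"pid {pid} received signals: {received}")
```

If the stuck program prints nothing when signalled this way, the signal is probably arriving but the handler (or whatever it calls) never returns.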
i'm gonna see what happens when I try their example code. I've never used their example code so I'm curious now
okay i had to do it in windows due to audio
but... i'm getting what you get with their example--- control+c does nothing
but the call happens
gonna see how to make it work
@livid spear hey you still there
1 file for this
can you take a look at this
```python
import os
import signal
import sys
import threading
import time

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface


def main():
    AGENT_ID = "redacted"
    API_KEY = "redacted"

    if not AGENT_ID:
        sys.stderr.write("AGENT_ID environment variable must be set\n")
        sys.exit(1)

    if not API_KEY:
        sys.stderr.write("ELEVENLABS_API_KEY not set, assuming the agent is public\n")

    client = ElevenLabs(api_key=API_KEY)
    conversation = Conversation(
        client,
        AGENT_ID,
        # Assume auth is required when API_KEY is set
        requires_auth=bool(API_KEY),
        audio_interface=DefaultAudioInterface(),
        callback_agent_response=lambda response: print(f"Agent: {response}"),
        callback_agent_response_correction=lambda original, corrected: print(f"Agent: {original} -> {corrected}"),
        callback_user_transcript=lambda transcript: print(f"User: {transcript}"),
        # callback_latency_measurement=lambda latency: print(f"Latency: {latency}ms"),
    )

    shutdown_flag = False
    conversation_id = None

    def signal_handler(sig, frame):
        nonlocal shutdown_flag
        print(f"\nReceived signal {sig}. Shutting down...")
        shutdown_flag = True
        conversation.end_session()

    # Register signal handler
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)

    conversation.start_session()
    print("Conversation started. Press Ctrl+C to exit.")

    # Non-blocking wait - check periodically if we should exit
    try:
        while not shutdown_flag:
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nKeyboardInterrupt received. Shutting down...")
        conversation.end_session()

    # Try to get conversation ID with a short timeout
    def get_conversation_id():
        nonlocal conversation_id
        try:
            conversation_id = conversation.wait_for_session_end()
        except Exception:
            pass

    id_thread = threading.Thread(target=get_conversation_id, daemon=True)
    id_thread.start()
    id_thread.join(timeout=2.0)  # Wait max 2 seconds for conversation ID

    if conversation_id:
        print(f"Conversation ID: {conversation_id}")
    print("Program terminated.")
```
and adapt it to your code
there are 2 new imports (time, threading)
and then maybe 10-20 lines of code otherwise
their example code might be bugged or sensitive to the environment... i'm not sure. I could test it on a mac
the main differences are the 2 imports
and then-----> (see below)
stuff starting from shutdown_flag
ending right before conversation.start_session()
and then again starting the next line ("print") and extending through that try/catch and all the way to termination message
that should make it more robust to be able to actually end. right now it's just stuck due to some kind of deadlock/sync issue, or maybe something else. once you get out of the convo, you'll be able to do whatever afterwards in the code
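The "wait at most 2 seconds for the conversation ID" part is a general pattern: run the blocking call in a daemon thread and join it with a timeout. A minimal sketch with stand-in functions (`quick`, `stuck`, and `call_with_timeout` are made up for illustration and are not part of the ElevenLabs SDK):

```python
import threading

def call_with_timeout(blocking_fn, timeout):
    """Run blocking_fn in a daemon thread; return (finished, result)."""
    box = {}

    def runner():
        box["result"] = blocking_fn()

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)  # wait at most `timeout` seconds
    return (not worker.is_alive(), box.get("result"))

# Stand-ins for a call that returns quickly and one that never returns.
never_set = threading.Event()

def quick():
    return "conv_123"  # made-up ID, illustration only

def stuck():
    never_set.wait()   # blocks forever, like a wait that is never released
    return "unreachable"

fast = call_with_timeout(quick, timeout=2.0)
slow = call_with_timeout(stuck, timeout=0.2)
print(fast, slow)
```

Because the worker is a daemon thread, a call that never returns does not keep the interpreter alive at exit. It only bounds the wait; it does not fix whatever is blocking.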
i could adapt your code for you (at least the snippet you sent me) --- but the imports you'll have to add
here:
```python
def start_elevenlabs_sdk_conversation(questions: str, agent_id: str = 'place_holder'):
    """
    Starts an intake conversation using the ElevenLabs Conversational AI SDK.
    This version leverages ElevenLabs' built-in conversation management.
    """
    if eleven_client is None:
        print("ElevenLabs client not initialized. Cannot start SDK conversation.")
        return
    else:
        print("ElevenLabs client initialized. Starting SDK conversation.")

    dynamic_vars = {
        "list_of_questions": f"""{questions}"""
    }
    print(f'dynamic_vars: {dynamic_vars}')

    config = ConversationInitiationData(
        dynamic_variables=dynamic_vars,
    )
    conversation = Conversation(
        eleven_client,
        agent_id,
        config=config,
        requires_auth=bool(os.getenv('ELEVENLABS_API_KEY')),  # This checks if key is in env
        audio_interface=DefaultAudioInterface(),
        callback_agent_response=lambda response: print(f'Agent: {response}'),
        callback_agent_response_correction=lambda original, corrected: print(f'Agent: {original} -> {corrected}'),
        callback_user_transcript=lambda transcript: print(f'User: {transcript}')
    )

    shutdown_flag = False
    conversation_id = None

    def signal_handler(sig, frame):
        nonlocal shutdown_flag
        print(f"\nReceived signal {sig}. Shutting down...")
        shutdown_flag = True
        conversation.end_session()

    # Register signal handler
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)

    conversation.start_session()
    print("Conversation started. Press Ctrl+C to exit.")

    # Non-blocking wait - check periodically if we should exit
    try:
        while not shutdown_flag:
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nKeyboardInterrupt received. Shutting down...")
        conversation.end_session()

    # Try to get conversation ID with a short timeout
    def get_conversation_id():
        nonlocal conversation_id
        try:
            conversation_id = conversation.wait_for_session_end()
        except Exception:
            pass

    id_thread = threading.Thread(target=get_conversation_id, daemon=True)
    id_thread.start()
    id_thread.join(timeout=2.0)  # Wait max 2 seconds for conversation ID

    if conversation_id:
        print(f"Conversation ID: {conversation_id}")
    print("Program terminated.")
```
some of the indentation might be fucked up though sorry. hopefully just a space or two or space vs tab
@livid spear let me know if you have questions about that or if it works/ or you choose another solution
yes implementing it rn
still freezes
problem is conversation.end_session() or conversation_id = conversation.wait_for_session_end()
can you show me the current code you are running please
there should be a straightforward fix for this
when the convo is done is when it freezes
so when the convo gets complete nothing in the terminal will work
which is fine
yes
yeah this is kool. best guess is something when it ends creates a deadlock
(two things waiting on each other)
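To make "two things waiting on each other" concrete, here is a small generic illustration of a deadlock. It says nothing about what the library actually does internally (that is still a guess); it only shows the shape of the problem. Timeouts are used so the demo finishes instead of hanging:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_hold_first = threading.Barrier(2)
both_tried_second = threading.Barrier(2)
results = {}

def worker(name, first, second):
    with first:
        both_hold_first.wait()             # make sure each thread holds its first lock
        got = second.acquire(timeout=0.2)  # a plain acquire() here would hang forever
        results[name] = got
        if got:
            second.release()
        both_tried_second.wait()           # keep holding `first` until both attempts are over

t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(results)
```

Neither thread gets its second lock, because the other one is holding it. Without the timeouts, both would wait forever, which from the outside looks exactly like a frozen program.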
but can I still see what the current code looks like just in case
remind me, without that signal part, what happened when you did press control+c
program would respond by exiting right
what worked
i'm curious
but on the website
there are 2 ways to cancel the call
end_call tool (auto-chosen)
or in the system prompt
I think they are conflicting
like both are getting triggered?
end_call tool gets triggered when all questions are answered
hmmmm. no those should work together
the system message tells it the tool is there, reinforces it
the tool is actual tool -- this leads to the thing running the bot/llm/etc. to respond
ctrl + C works when end_call is turned off
interesting!
and all questions are answered
but I still need it to end the call without me hitting ctrl + C
any ideas for that?
is that .end_session()?
what I'm hearing is that whenever whatever is controlled by the end_call tool is turned on, and the convo ends, then it gets stuck. when this end_call option is off in the UI, it does respond
if that's true, reproducible, etc., would be worthy of a bug report
but let me see what you just asked
ideas I have if you cannot use control+c (which I think is still an issue to figure out)
there is an option where it hangs up due to silence
but that might trigger the same thing on the back side if that makes sense
worth a try?
so should I turn it off?
end_call needs to be turned on
the sys prompt can't turn it off
well i'd say figuring out why it gets stuck is probably the "correct" way, but.... might take a while and is hard to do like this over the internet on discord.
if you use the silence criteria to end call from the 11labs side, then that might still lead to the same issue as end_call option
i'm gonna try to reproduce what you said but for me, control+c anytime just didn't work I think...
so ur suggestion is turn it off?
if you can turn it off and set the silence to whatever you are okay with and test it... that could work for you
but like I said, it might still lead to the same problem in your code. you could just try. or wait for someone else's opinion. not trying to waste your time
the thing is, my control+c is stuck the whole time on windows console
ya mine was too
not just when the call ends
had to make new terminal
gonna try to use a debugger.
```python
def wait_for_session_end(self) -> Optional[str]:
    """Waits for the conversation session to end.

    You must call `end_session` before calling this method, otherwise it will block.
```
soo that's a problem
that we are feeling
wait_... is clearly called before end_session is called, because the latter is only triggered by the control+c signal handling...
i guess block until what..... maybe that signal. so perhaps it isn't implying an issue
OKAY
mac, example code, control+c works fine... but it just sometimes takes a second or two to get accepted and end the program
@livid spear probably my last attempt of the day. have to do other stuff sorry. if you have a chance to adapt your code to this and try, let me know how it goes:
```python
import os
import signal
import sys
import threading
import time

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

AGENT_ID = "redacted"
API_KEY = "redacted"


def main() -> None:
    client = ElevenLabs(api_key=API_KEY)
    conversation = Conversation(
        client,
        AGENT_ID,
        requires_auth=bool(API_KEY),
        audio_interface=DefaultAudioInterface(),
        callback_agent_response=lambda r: print(f"Agent: {r}"),
        callback_agent_response_correction=lambda o, c: print(f"Agent: {o} -> {c}"),
        callback_user_transcript=lambda t: print(f"User: {t}"),
    )

    conversation.start_session()

    # Run wait_for_session_end() in a daemon thread -----------------
    def _block_until_done() -> None:
        conv_id = conversation.wait_for_session_end()
        print(f"\nConversation ID: {conv_id}")

    waiter = threading.Thread(target=_block_until_done, daemon=True)
    waiter.start()
    # ----------------------------------------------------------------

    def _sigint_handler(sig, frame):
        print("\nCtrl-C detected — ending session ...")
        conversation.end_session()  # ask ElevenLabs to shut down
        waiter.join(timeout=2.0)    # give it a moment to finish
        sys.exit(0)                 # hard-exit if it’s still stuck

    signal.signal(signal.SIGINT, _sigint_handler)

    # Keep the main thread alive but idle so it can process signals.
    try:
        while waiter.is_alive():
            time.sleep(0.2)
    except KeyboardInterrupt:
        pass  # we never actually get here because we exit in handler


if __name__ == "__main__":
    main()
```
otherwise sorry I couldn't help. i would still personally see if I can ensure it doesn't really work on windows and win -- because it's worthy of a bug report if that remains the case
Nothing worked
ahhh k
Do you work for 11 labs?
nah. i've been just on vacation this week and doing stuff with it like discord bots and phone call bots and stuff
Wurd
so it's kind of fresh but i never used their example code
You write good code
Ty for helping
Not sure what to try after this
I can make another bot
I can try doing it thru JS on the frontend instead
well. there are some benefits to this. not all to you. like if I can show the example code as it is (with just env variables in there) doesn't work on a major platform, reproducible, then that is worthy of a bug report and a fix. and that means we did a good job
but not sure that is the case yet
i did do example code + mac but that DID work
Reason: Bad word usage
lol
What worked?
during the convo, I could control+c to get it to end. in windows, during or after the convo is done, it'll totally block (part of your issue I think)
(even if it is not windows, probably related)
it's deadlocked or eating up all the SIGINT or whatever signals
program just keeps waiting for something
the hard part of this kind of stuff is not knowing their library code. stuff like race conditions and stuff are hard to debug in general, especially if we are working on the outside of the issue. but I'm not gonna blame. might be a me/you issue
normally a program will be killable in the terminal. if not within (keyboard signals) or external signals (like using kill) but sometimes the program is stuck stuck--- like if it has a handler for the signal -- then the OS will let it keep going often... most times ctrl+c or whatever signal will kill the program because the default handling of that does that. but if you manually handle it and then combine that with some kind of race condition or bug --- then all the signals will be "handled" and the program will still stay stuck in some loop possibly
if that makes sense (sorry no idea your level of programming knowledge)
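A small sketch of that difference, assuming Python 3.8+ (for `signal.raise_signal`) and that it runs in the main thread. With Python's default handler a SIGINT turns into `KeyboardInterrupt`; with a custom handler the signal counts as "handled" and the program simply keeps going:

```python
import signal
import time

def send_sigint_to_self():
    """Simulate ctrl+c and give Python a moment to run the handler."""
    signal.raise_signal(signal.SIGINT)
    time.sleep(0.05)

# 1) Default Python behaviour: SIGINT becomes KeyboardInterrupt.
signal.signal(signal.SIGINT, signal.default_int_handler)
try:
    send_sigint_to_self()
    default_result = "kept running"
except KeyboardInterrupt:
    default_result = "KeyboardInterrupt"

# 2) Custom handler: the signal is "handled", so nothing interrupts the program.
seen = []
signal.signal(signal.SIGINT, lambda signum, frame: seen.append(signum))
try:
    send_sigint_to_self()
    custom_result = "kept running"
except KeyboardInterrupt:
    custom_result = "KeyboardInterrupt"

signal.signal(signal.SIGINT, signal.default_int_handler)  # restore the default
print(default_result, custom_result, len(seen))
```

So once a handler is installed, ctrl+c only stops the program if the handler (and the code it unblocks) actually leads to an exit. If that code gets stuck, repeated ctrl+c presses are all "handled" and nothing visibly happens.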
i tried to debug via vscode on Windows platform but it wouldn't let me enter the library code....
let me know if your adaptation of this works/worked thanks
will do 1 sec
do you think the problem is the code is in a function?
and shouldn't be?
not sure. that is part of what I was talking about before re not knowing the context of that function, because clearly it isn't the full code.
but
with all that we know now, not sure that's really the issue at all
i'd like to see what that code I mentioned above does. and if it doesn't work, i'd like to know exactly what you're running (minus stuff you can redact, don't care about that)
ok 1 sec
yes no difference
Ctrl-C detected — ending session ...
^C^C^C^C
but the convo got ended in actuality, just not on the terminal
can I see what the code looks like now. and if you don't want me to show the calling code, then maybe tell me what it is
since clearly it's not the whole program right.... no main
ya sure hold on
```python
elevenlabs_questions = ["What day is it?"]


def main() -> None:
    client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

    dynamic_vars = {
        "list_of_questions": f"""{elevenlabs_questions}"""
    }
    print(f'dynamic_vars: {dynamic_vars}')

    config = ConversationInitiationData(
        dynamic_variables=dynamic_vars,
    )
    conversation = Conversation(
        client,
        agent_id,
        config=config,
        requires_auth=bool(os.getenv("ELEVENLABS_API_KEY")),
        audio_interface=DefaultAudioInterface(),
        callback_agent_response=lambda r: print(f"Agent: {r}"),
        callback_agent_response_correction=lambda o, c: print(f"Agent: {o} -> {c}"),
        callback_user_transcript=lambda t: print(f"User: {t}"),
    )

    conversation.start_session()

    # Run wait_for_session_end() in a daemon thread -----------------
    def _block_until_done() -> None:
        conv_id = conversation.wait_for_session_end()
        print(f"\nConversation ID: {conv_id}")

    waiter = threading.Thread(target=_block_until_done, daemon=True)
    waiter.start()
    # ----------------------------------------------------------------

    def _sigint_handler(sig, frame):
        print("\nCtrl-C detected — ending session ...")
        conversation.end_session()  # ask ElevenLabs to shut down
        waiter.join(timeout=2.0)    # give it a moment to finish
        sys.exit(0)                 # hard-exit if it’s still stuck

    signal.signal(signal.SIGINT, _sigint_handler)

    # Keep the main thread alive but idle so it can process signals.
    try:
        while waiter.is_alive():
            time.sleep(0.2)
    except KeyboardInterrupt:
        pass  # we never actually get here because we exit in handler


if __name__ == "__main__":
    main()
```
ty
literally ur code except I included dynamic vars
so my convo ends when one question gets asked
hold on
```python
import os
import signal
import sys
import threading
import time
from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface
from elevenlabs.conversational_ai.conversation import Conversation, ConversationInitiationData
```
can you change _sigint_handler to be
```python
def _sigint_handler(sig, frame):
    print("[DEBUG] SIGINT handler entered")
    conversation.end_session()
    print("[DEBUG] end_session() returned")
    waiter.join(timeout=2.0)
    print("[DEBUG] waiter.join done – exiting")
    sys.exit(0)
```
and tell me what's different during runtime, if anything -- thanks
@livid spear
Yes just @ gym will get back to you in 45
don't skip leg day
nope no change
I think wait_for_session_end is the culprit
if you have a suggestion on how to end the call immediately
sys.exit(), conversation.session_end, etc.
all ears
because I can retrieve transcript
hmm the interesting thing is with a recent mac, their example code with only the keys being adjusted (agent id, etc.) --- it was interruptable using the default terminal program....
a few hours ago
and you are on mac
but zsh but not sure that matters in this case
do you think you can just https://github.com/elevenlabs/elevenlabs-examples/blob/main/examples/conversational-ai/python/convai/demo.py
use that, adjust the 2 variables (agent id, 11labs key) -- and see if that even works?
to do a basic sanity check....
i understand you won't be able to do more things that you want, but I mean just to see if the actual example code they provide with basic adjustments (the 2 replacements in the variables) --- work in your terminal/shell/environment
because if it doesn't and it worked for me.... then that can help
hmm. well i think a hard part is not doing too many things at once.
i'd just be curious about what i said - -- i'd just copy that code (demo.py) -- adjust the keys (2 of them) and just run it and see what happens
that's a basic sanity check
sometimes when i'm coding, and it's going poorly, I just check basic assumptions --- to make sure i'm not too deep in the weeds in the wrong area
if that makes sense
but if you just assume you were a new user, cloned demo.py, replaced the things that it tells you to replace, and it doesn't work ---> this goes a long way
vs. using a highly edited/altered code with odd terminal or shell, etc. ---> then you don't know what went wrong
but what I linked above is from the official repo, in examples, and demo.py is probably the example you maybe started with
i'm just curious on your setup, what happens when you run minimally altered code from them (alterations are just making sure the keys are right)
are you using default terminal program of Mac
yes
zsh
I also tried on bash didn't work
it worked on ur end?
did u ctrl c?
fyi turning on end call
is terrifying
mac os, control c, that example, Yes it quits
asked it to end the convo 5 times and it wouldn't hahaha
it control c once the entire convo is done
as of a few hours ago
or before?
yes before works for me too
but I need the convo to be finished once it is actually finished
so I can proceed w/ rest of code
well i didn't have the bot configured to ever be able to end the call though
what i'm saying is WITH THAT
it does end, the program
i can kill it w/ ctrl+c
and thus make it move on and do stuff
what I would say is, if you promise to copy/paste this https://raw.githubusercontent.com/elevenlabs/elevenlabs-examples/refs/heads/main/examples/conversational-ai/python/convai/demo.py into some folder/whatever, and then simply change AGENT_ID="put it here" and the next line API_KEY="put it here too", and actually run it without any other changes, and it doesn't escape / work as expected, then mention it
but dude, if you've already done that and I'm just not registering -- i'm sorry
if you can use the example, replace those values , and it fucks up --- then that's a clear issue
yes I did that
hey can you open terminal, run bash and then run the code? just in case it is zsh, doubt it
yes I did no difference
this is a problem
signal.signal(signal.SIGINT, lambda sig, frame: conversation.end_session())
I don't want to use ctrl c to end the convo
I want it to end automatically (through voice)
when I just use conversation.start_session() I can use ctrl + C to end it
the thing is, the example code with appropriate replacements of the keys SHOULD do whatever is natural/expected
I tried doing
```py
conversation.start_session()
conversation.end_session()
return
conversation.start_session()
sys.exit(1)
```
the convo wouldn't end with this
the library isn't simple though. on that backend, it's probably doing a bunch of stuff
if I told it to stop it wouldn't end
this doesn't have anything to do with what you are telling it
ran infinitely
it's that you can't kill the program lol
I understand but I'm saying I can't use demo code
because it won't kill it through voice
because I could
i killed it fine
you use voice or ctrl c?
ctrl+c
it can't do it with voice
I can't ctrl c in production
i never want it to die via voice end_call
why not
i dunno. for me it doesn't matter. the phone call ends when the person calling hangs up
point is
i can still stop it
without hanging up
I don't understand
ok but what if user says they don't wanna talk anymore or they hang up
it needs to die when you ask it to die -- that is the normal way to stop it
isn't that done through voice?
unless you want the agent to self-figure it out
and emit those tool calls
(end call, silence detection, etc.)
okay. if you kill the program, which is an expected thing to be able to do, it will end the call/conversation
I hope we can agree with that
if yes, then the fact that you cannot kill the program on your side, is a problem
and not necessarily seen by others
there are certain scenarios I can kill it on my side
ctrl c before the agent auto-ending it
I am saying I can't kill it once the entire convo is finished
so when the agent does its job
it won't die
like any other program
when it does I can't
so the fact that you cannot is a serious issue
is there something I can do in code
what Mac OS version are you on
when you start a program, it will run. if you are running it in a terminal, then if you do control+c, it'll have to affect the program
meaning it is sent to it via some posix signal
yea I think so
you cannot get the point across to your program, running on your machine
because you can't stop it, you are frustrated that the further programmatic lines are not working
or it isn't ending
the point of the signal handler is so that when you send SIGINT or whatever to the program, it'll capture it and do something
vs just killing the program outright
if you normally send it, program will stop
meaning the next lines will NOT execute
the point of the handler is so it intercepts it
says "okay"
user wants me to die
./stop
then does more stuff
and ends
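A minimal sketch of that "intercept, note the request, clean up, then end" flow, assuming Python 3.8+ and the main thread. The handler only sets a flag and the main loop does the actual shutdown work. The names (`stop_requested`, `run`) are illustrative, and the timer stands in for a person pressing ctrl+c:

```python
import signal
import threading
import time

stop_requested = threading.Event()
log = []

def handle_sigint(signum, frame):
    # Keep the handler tiny: just note that the user asked us to stop.
    log.append("signal received")
    stop_requested.set()

signal.signal(signal.SIGINT, handle_sigint)

def run(max_seconds=5.0):
    deadline = time.monotonic() + max_seconds
    while not stop_requested.is_set() and time.monotonic() < deadline:
        time.sleep(0.05)        # short sleeps keep the main thread responsive
    log.append("cleanup done")  # the "does more stuff" step (close session, save state, ...)
    return "stopped by signal" if stop_requested.is_set() else "timed out"

# Simulate ctrl+c arriving 0.2 s into the run.
threading.Timer(0.2, signal.raise_signal, args=(signal.SIGINT,)).start()
outcome = run()
signal.signal(signal.SIGINT, signal.default_int_handler)
print(outcome, log)
```

Doing the slow or blocking work outside the handler is a design choice: if cleanup hangs, it hangs in ordinary code where a timeout or a second ctrl+c can still get through.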
anyways, you can't blame the example code unless you follow closely. i mean you can , but it's harder
but it failed for you -- sorry
i think probably someone from the team will have to help you- -- sorry I wasted your time. hopefully all this helps.
i was just moving to the bar and hanging out with friends -- so i probably won't be able to help
Reason: Bad word usage
@chilly timber bookmark
@livid spear did you try like control+c like 10 times in a row? lol
like hitting it hard and frequently for like 5 seconds
Yes
@chilly timber r u a dev?
i'm not part of the team here. i've just done stuff and I had time. but no .... i'm a nobody haha
USA, Wisconsin. I'm on vacation right now ----- I am a professional in medicine. So not in the stuff in here.
sorry I didn't fix your issue.
that's very interesting I am working on an AI application for healthcare rn
I'm an AI engineer
but you know a lot about coding
if ur writing threads
for a non-cs person..
if there are any automations needed for healthcare lmk I'm making some stuff for doctors rn
@livid spear i think this is similar to prior code but it did fix the issue -- before it I have the same problem as you on Windows via the traditional command prompt and git bash and powershell --- with this, it works
```python
import os
import signal
import sys
import time

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

# Global flag to handle interruption
interrupted = False


def signal_handler(sig, frame):
    global interrupted
    interrupted = True
    print("\nInterruption signal received. Shutting down...")


def main():
    global interrupted

    AGENT_ID = "redacted"  # TODO: replace with your actual agent ID
    API_KEY = "redacted"   # must exist if agent is private

    if not AGENT_ID:
        sys.stderr.write("AGENT_ID environment variable must be set\n")
        sys.exit(1)

    if not API_KEY:
        sys.stderr.write("ELEVENLABS_API_KEY not set, assuming the agent is public\n")

    client = ElevenLabs(api_key=API_KEY)
    conversation = Conversation(
        client,
        AGENT_ID,
        # Assume auth is required when API_KEY is set
        requires_auth=bool(API_KEY),
        audio_interface=DefaultAudioInterface(),
        callback_agent_response=lambda response: print(f"Agent: {response}"),
        callback_agent_response_correction=lambda original, corrected: print(f"Agent: {original} -> {corrected}"),
        callback_user_transcript=lambda transcript: print(f"User: {transcript}"),
        # callback_latency_measurement=lambda latency: print(f"Latency: {latency}ms"),
    )

    # Set up signal handler
    signal.signal(signal.SIGINT, signal_handler)

    conversation.start_session()

    # Check for interruption periodically instead of blocking
    try:
        while not interrupted:
            time.sleep(0.1)  # Small sleep to avoid busy waiting
    except KeyboardInterrupt:
        # Fallback in case signal handler doesn't work
        interrupted = True

    # Clean shutdown
    print("Ending session...")
    conversation.end_session()
    conversation_id = conversation.wait_for_session_end()
    print(f"Conversation ID: {conversation_id}")


if __name__ == '__main__':
    main()
```
Does that close with ctrl c or voice?
I don’t need a ctrl c close
i need a voice one
let me see
So according to perplexity and docs I think
Signal.sigint is a ctrl c close
But in production no one’s clicking ctrl c
I need something w/ voice
I assigned a twilio number to it
right -- it was still an issue. but that wasn't the original goal
It works
i understand
So through voice it hangs up the call
I also built an ai from scratch but it’s like 70% the conversational abilities of elevenlabs
Also looking for text to text AI agent through elevenlabs conversational api btw
There is an option through widgets but you can’t change the widget size so it’s a little box on the corner of ur screen
No bueno
If u find that would be great
i'll test to see if it ends via the voice and get back to you soon
Ty
If u need any code btw lmk
My twilio bot is great works perfectly
Just eleven labs is a little expensive
$5 for a 20 min convo
Actually really bad pricing..
But pushes applications towards use-cases where people are paid to talk
okay so this is different code -- and it works (when bot hangs up, the program proceeds/ends/isn't stuck) but more code...
not sure that audio interface piece was really needed but the code I sent like 20 minutes ago was throwing some errors sooo....
should fix your issue
also this person has the same issue so i will not file another issue https://github.com/elevenlabs/elevenlabs-examples/issues/183
@livid spear
Will check later tonight 🙂
kk
@livid spear did you get a chance to try it -- or maybe problem is no longer relevant/existent
Hey
What was the problem again
Agent wasn’t hanging up on the terminal through voice ya?
Yes it works for me 🙂
Are you able to find any text to text?
For conversational AI
The docs agent said it couldn’t be done
I think it can
It’s possible through the widget but you can’t customize widget size
Glad it is fixed
Sorry not sure about that — out at a restaurant
And haven’t done it via 11labs
Isn’t text to text …. Usual LLMs lol
ya but the 11 conversational backend is v good
I've made my own version but not as good as theirs
Well you can request in text I believe via api
And its response is audio but also it’ll send a transcript of the response
So I guess you should be able to do text to text that way
can you elaborate on this
how its T to T if they reply in audio
So the reply has a message of the transcript
And even a corrected one if interrupted
but yeah maybe technically not text 2 text if there is some audio somewhere that you are ignoring (and just using the text) -- but I was getting to the point of you can send it a text prompt, and it will send the response back in text (as well as audio, not sure if you can ask it to not send audio back vs just ignoring it on your end)
I am ok it just converting to text
the cost will be the same but it works for the demo
but where are you seeing this?
that line should be where you can register something to accept the response
if you are instead using websockets, then there is an event sent
ur saying that's for text?
it doesn't produce text from that
it does produce text but does not receive text
audio_interface=DefaultAudioInterface(),
I do not know what else to say. I literally have a bot that prints the text to console every time it says something.
no that's not what I am saying
and it receives it directly from the agent
yes I was answering where the text from the agent comes from
you can also send text.
how did u set that up?
i'm sure there is a way using the library you are already using though let me see
can u send me the code you wrote for that?
i am gonna try to make it more concise because there is much to the code that has nothing to do with what you are wanting
(it's a discord voice channel bot -- and it uses the elevenlabs websocket library features)
will get back to you in a few minutes
okay
conversation.send_user_message(text) ---- that should be how you send text to it
```python
def send_user_message(self, text: str):
    """Send a text message from the user to the agent.

    Args:
        text: The text message to send to the agent.

    Raises:
        RuntimeError: If the session is not active or websocket is not connected.
    """
    if not self._ws:
        raise RuntimeError("Session not started or websocket not connected.")

    event = UserMessageClientToOrchestratorEvent(text=text)
    try:
        self._ws.send(json.dumps(event.to_dict()))
    except Exception as e:
        print(f"Error sending user message: {e}")
        raise
```
This is how the library implements that function. it's in the docstring that it is meant to send text from the user to the agent
let me adapt their example and see if it works and then send you the simple code as proof in principle
working on a simple example.
this is what it shows me when I run it:
```console
❯ python main.py
Using Agent ID: <I redacted>
Starting conversation session...
Agent: Hello! Ask me about Pathfinder second edition.
Sending message: Hello, what services do you offer?
Waiting for agent response...
Agent: Hi there! I can answer questions about Pathfinder, including rules, character creation, spells, feats, and lore. I can also help clarify mechanics, suggest builds, and explain how different parts of the game work. If you have general questions or need help with something else, feel free to ask! What would you like to know more about?
```
this is the code (with keys redacted) that produced that
```python
import os
import sys
import time

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface


class NoAudioInterface(DefaultAudioInterface):
    """Audio interface that ignores audio input/output."""

    def __init__(self):
        # Don't call super().__init__() to avoid audio setup
        pass

    def start(self, input_callback):
        # Don't actually start audio processing
        pass

    def stop(self):
        # Nothing to stop
        pass

    def output(self, audio_bytes):
        # Ignore audio output
        pass

    def interrupt(self):
        # Nothing to interrupt
        pass


def main():
    AGENT_ID = os.environ.get('ELEVENLABS_AGENT_ID', 'redacted')
    API_KEY = os.environ.get('ELEVENLABS_API_KEY', 'redacted')

    if not AGENT_ID:
        sys.stderr.write("ELEVENLABS_AGENT_ID environment variable must be set\n")
        sys.exit(1)

    print(f"Using Agent ID: {AGENT_ID}")

    if not API_KEY:
        print("ELEVENLABS_API_KEY not set, assuming the agent is public")

    client = ElevenLabs(api_key=API_KEY)

    def handle_agent_response(response):
        print(f"Agent: {response}")

    def handle_user_transcript(transcript):
        print(f"User transcript: {transcript}")

    conversation = Conversation(
        client,
        AGENT_ID,
        requires_auth=bool(API_KEY),
        audio_interface=NoAudioInterface(),
        callback_agent_response=handle_agent_response,
        callback_user_transcript=handle_user_transcript,
    )

    print("Starting conversation session...")
    conversation.start_session()

    # Wait for the session to initialize
    time.sleep(3)

    # Send hardcoded user message
    hardcoded_message = "Hello, what services do you offer?"
    print(f"Sending message: {hardcoded_message}")
    try:
        conversation.send_user_message(hardcoded_message)
    except RuntimeError as e:
        print(f"Error sending message: {e}")
        return

    # Wait for the agent's response which when it happens is handled elsewhere
    print("Waiting for agent response...")
    time.sleep(10)

    # End the conversation
    print("Ending conversation...")
    conversation.end_session()
    conversation_id = conversation.wait_for_session_end()
    print(f"Conversation ID: {conversation_id}")


if __name__ == '__main__':
    main()
```
the agent, under Advanced, has the user_transcript, agent_response, etc. events enabled; otherwise it'll probably not work
in this case it is just a hard coded user message to prove it -- of course in real life it can be dynamically/run-time set like from a conversation, etc.
but it's just meant to show it can be done. I'm sure there are better ways to structure it for more meaningful use
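A rough sketch of the "set at run time" idea: instead of one hard-coded string, forward each line of input until the user types a stop word. `relay_messages` and its parameters are made-up names for illustration; `send` stands in for something like `conversation.send_user_message` once a session is active, which is an assumption about how you would wire it up.

```python
from typing import Callable, Iterable, List, Tuple


def relay_messages(
    lines: Iterable[str],
    send: Callable[[str], None],
    stop_words: Tuple[str, ...] = ("exit", "quit"),
) -> List[str]:
    """Forward each non-empty line to `send` until a stop word appears.

    `send` is a placeholder for the real sender (hypothetical wiring).
    Returns the messages that were actually sent.
    """
    sent: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank input
        if text.lower() in stop_words:
            break  # user asked to end the conversation
        send(text)
        sent.append(text)
    return sent


if __name__ == "__main__":
    # Demo with print as a fake sender; "never sent" comes after the stop word.
    relay_messages(["are you there?", "", "quit", "never sent"], print)
```

For interactive use, `lines` could be something like `iter(lambda: input("Your message: "), None)`, with a `KeyboardInterrupt` handler around the call so Ctrl+C still ends the session cleanly.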
trying ur code rn
not working for me
Will take a look after work today. Thanks for trying it out
thanks for helping!
using this exact code with my api key and agent, I get this (I redacted some basic things like agent ID (my own agent)):
❯ python another_try.py
Using Agent ID: (REDACTED, THIS IS MY AGENT)
ElevenLabs client initialized. Starting SDK conversation.
Dynamic variables: {'starting_important_points': "Welcome! I'm here to help you with your inquiries.", 'list_of_questions': '\n 1. What is your name?\n 2. What brings you here today?\n 3. How can I assist you?\n ', 'ending_important_points': "Thank you for the conversation. Is there anything else you'd like to know?"}
Starting conversation session...
Initializing session...
Agent: Hello! How can I help you today?
==================================================
Conversation started! Type 'exit' or 'quit' to end.
==================================================
Your message: are you there?
Waiting for agent response...
Agent: Yes, I'm here. How can I help you today?
Your message: how old are you?
Waiting for agent response...
Agent: I don't have an age like a person does. I'm an AI designed to help you. How can I assist you?
Your message: ^C
Interrupted by user. Ending conversation...
Conversation ended. ID: REDACTED
Conversation completed successfully!```
as I have said, for the agent used, under advanced, if you don't have the right events selected, it will of course not work.
@livid spear
Ok good will check it out when I get back thanks
@chilly timber hey just wanted to say thanks for helping me solve this
it works
just wondering how you did?
did you see it in docs the text example or you went into source code and tinkered?
what's weird is if you don't wrap the conversation initiation in a try block it shows user input before the agent's starting message
that might be because there is stuff the library does behind the scenes asynchronously and that takes time, maybe the sleep also helps. but I'm not sure
i knew 11labs sent back the agent response via text because in my voice channel bot thing, i had seen that logged to console. and I also know from agent setup they do have events that are for their text. I also knew (from earlier code that I ended up changing after I got it working reasonably well) that I could SEND text to it. so I just had to refer to that in code and make use of it. longer answer is I've been coding since like 7th grade, non-professionally but still worked on stuff, including an open source project (reasonably large), and other small projects. I use that experience + AI now --- that combo is insanely powerful. It's not the same as someone who doesn't know coding at all. I usually know what I need the AI to do and if something isn't working, I usually have a feel for what is wrong -- so I can help guide it and work with it well. So often it's a mixture of me asking it to do my busy work, and me reading the code, understanding it, and when I need something done, I know well what to ask of it. I also do manual code edits too on top of that. For harder debugging I still stick to a debugger if needed.
yes the sleep was necessary I believe async is being done too
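If the ordering problem really is down to callbacks firing asynchronously (that is the guess above, not something confirmed here), one alternative to a fixed `time.sleep` is to block until the first agent response has actually arrived. A minimal sketch using only the standard library; `FirstResponseWaiter` is a made-up helper, and passing its method as `callback_agent_response` is an assumption about how it would be hooked in.

```python
import threading
from typing import Optional


class FirstResponseWaiter:
    """Blocks until a callback has fired, instead of sleeping a fixed time."""

    def __init__(self) -> None:
        self._event = threading.Event()
        self.last_response: Optional[str] = None

    def on_agent_response(self, response: str) -> None:
        # Intended to be used as the agent-response callback (hypothetical wiring).
        self.last_response = response
        print(f"Agent: {response}")
        self._event.set()

    def wait(self, timeout: float) -> bool:
        """Return True if a response arrived within `timeout` seconds."""
        return self._event.wait(timeout)

    def reset(self) -> None:
        # Clear the flag before waiting for the next response.
        self._event.clear()
```

The idea would be to call `wait(...)` after starting the session and only prompt for user input once it returns True, then `reset()` before each new message.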
why did you become a doctor? you are at least a mid level engineer
why not do a startup?
ahhh.. i'm 37 now. So it's been a while since I decided on all that. most of us can be happy doing many things. sometimes we don't have time to do it all professionally. I enjoy the path I took. I just took my path and then do other stuff for hobby/fun like this.
haha fair enough
but consider a startup
because youre hacky
you understand low level stuff could probably make something really cool
thanks. glad i was of help
I know ur busy but if something comes up on my end I might reach out
and just a heads up my agent still has a few errors (call ends 1 second early due to twilio error 31921, talking when it hears background noise)
but I think first can be solved through webhook, and 2nd can be solved through webhook -> voice isolator system tool. Will be doing that 👍
then just need to make it hipaa compliant and my prototype is done
you part of a startup?
yeah I started one
@chilly timber do you know if it's possible to retrieve each audio chunk?
in conversational ai
all the elevenlabs tooling right now needs audio chunks to clean audio
but something that can be done is using an external model or writing code and putting it as a system tool
but their tooling is better just not sure how to access each chunk
audio = base64.b64decode(event["audio_base_64"])
self.audio_interface.output(audio)
"user_audio_chunk": base64.b64encode(audio).decode(),```
you should be able to inherit from AudioInterface and then customize it
@livid spear can see this code (will have control+c issue, but that isn't the point of the code) -- of course it logs it to console but you can do with it as you wish inside the code-- just giving you an example of how you can obtain the data yourself
particularly, output method of that class. input is audio_bytes
the logging it does is just the example way to use the data vs other more meaningful ways
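To make the "customize the audio interface" idea concrete, here is a small sketch that wraps another interface and hands every output chunk to a hook before passing it along. It is duck-typed so it runs without the library installed; `TappedAudioInterface` and `on_chunk` are made-up names, and in real use you would probably subclass the library's `AudioInterface` instead, keeping the same four methods seen in the code earlier in this thread.

```python
from typing import Any, Callable


class TappedAudioInterface:
    """Wraps another audio interface and passes each output chunk to `on_chunk`."""

    def __init__(self, inner: Any, on_chunk: Callable[[bytes], None]) -> None:
        self._inner = inner        # e.g. a DefaultAudioInterface (assumption)
        self._on_chunk = on_chunk  # your own hook: save, analyze, forward, ...

    def start(self, input_callback: Callable[[bytes], None]) -> None:
        self._inner.start(input_callback)

    def stop(self) -> None:
        self._inner.stop()

    def output(self, audio: bytes) -> None:
        self._on_chunk(audio)      # the hook sees the chunk first
        self._inner.output(audio)  # then normal playback continues

    def interrupt(self) -> None:
        self._inner.interrupt()
```

Something like `TappedAudioInterface(DefaultAudioInterface(), chunks.append)` would collect every agent audio chunk in a list while still playing it.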
ok so ur solutions are hacky
you rewrite the source code instead of using available tooling from elevenlabs
or u make ur own classes and use them?
well it's a valid way of accessing and it'll be more hacky because it's a wrapper around their websocket api that you are using
the websocket api will give you more direct access to the events, like audio, etc.
is what my voice discord bot thing uses. but it is in nodejs
let me see if using the conversational ai api has other ways of directly accessing the audio without customizing the interface
i do not know.
i do not think there is another way to access that audio within the python library known as elevenlabs-python
their API gives you power by allowing you to pass in the audio interface. so it's inherently something you can use/adjust. I do not think it is hacky.
have you tried this code?
my goal is to clean the audio before it is passed
through webhook or as a 3rd party model
just wondering if u tried self.default_interface.output(audio_bytes)
yes. i ran the code. it runs. it might have issues at the end on Windows (the whole ctrl+c issue, but I did not address that because I did not care).
i wasn't trying to give you production code. just proving how to do it.
of course it would have to be adapted into your code if used with alterations. but I think this might be the only or least-hacky way of doing it if using that python library
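For the "clean the audio before it is passed" goal, the same hook point works on the input side: wrap the `input_callback` that the interface receives in `start` so each microphone chunk is processed before it is sent. The sketch below uses a crude noise gate as a stand-in for real cleaning; it is not ElevenLabs' voice isolator, and it assumes the chunks are signed 16-bit little-endian PCM, which you should verify for your setup. `rms_16bit`, `gated`, and the threshold value are all made up for illustration.

```python
import array
import math
import sys
from typing import Callable


def rms_16bit(chunk: bytes) -> float:
    """Root-mean-square level of signed 16-bit little-endian PCM (assumed format)."""
    samples = array.array("h")
    samples.frombytes(chunk[: len(chunk) // 2 * 2])  # drop a dangling odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # data is little-endian; fix up on big-endian hosts
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gated(
    input_callback: Callable[[bytes], None], threshold: float = 500.0
) -> Callable[[bytes], None]:
    """Return a callback that replaces quiet chunks with silence before forwarding."""

    def wrapper(chunk: bytes) -> None:
        if rms_16bit(chunk) < threshold:
            chunk = bytes(len(chunk))  # same length, all zeros
        input_callback(chunk)

    return wrapper
```

Inside a custom audio interface's `start(input_callback)`, the wrapped callback `gated(input_callback)` would be handed to whatever actually captures the microphone, so quiet background noise never reaches the agent. A real denoiser or an external model could replace the gate in the same spot.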
there is also the possibility of using a tool and telling the AI to use it to clean the audio before receiving the call
through elevenlabs client tools
but not sure how reliable that is i'm trying ur implementation tho
yeah sorry I was just trying to see if I could help with your question of accessing each piece of audio
i'm less aware of the other aspects of the situation
i'm massaging a file together to see if it can help explain the tactic I used in the code. or other ways of using it, etc.
how did u make that so fast?
some of those are illustrative examples --- like where it is saying "look you can do this and this and this" -- like some of those methods would of course have to be implemented -- but the point is that -- you can make them and control what happens
it's not that every single one of them is something provided by 11labs
nah like how did u put the document together that fast
ohhhh. I asked one AI to make it -- that has access to the official docs, search, etc. And I asked it in the right way. then I crosschecked it with two state of the art ones on top of that. and then finally I looked at it myself.
you have an agentic system?
and how did u design the ui of that?
to get it in that output
If you know enough about coding, then AI right now will make you prolific.
if you know nothing, then it's not as great. but you know coding, so if you haven't recently used AI in coding and gotten the hang of it...
you are missing out
right now I use multiple things
Codex (which is in preview) -- openai, but very costly and I don't use it much
Cursor ($20/month plan, but I went over by a few hundred bucks this last period because I had some vacation time)
With cursor I mostly use Sonnet, and otherwise I use opus or o3-pro. I want to use the newest Grok but I think it's not that compatible with cursor IDE yet
otherwise, rarely, I would use chatgpt the app and use my same sub
yes I am familiar w/ all except codex
codex is an api call?
you prompted the AI to put it in that specific format
so probably ran search to retrieve the docs then code to output it to that format
you got a github?
I feel like a girl asking a dude for his number after being impressed
I think codex is using a custom coding model based off of o3, and maybe you need to be on some special plans to use it, and it like spins up some VMs and stuff and has tool access, etc. and then shows you the results, and then auto-creates a PR if you want, etc. But it's in preview and .... the plan I have now supports it but I'm not sure which others do
wdy make as a doctor per year?
https://github.com/dv8silencer nah i'm not that special
I'm happy with what I make. haha
like 350k ish ya?
I think u should try to make something useful at scale..
you have the chops
not lying
thanks. but I'm hoping what I sent above will help your use case
it will ya
so to confirm you just had an agentic system and asked for that output format?
so I was using Cursor. I have my own rules I provide it that aren't that special probably but i'll post them here
Verify, don’t guess. Be intentional in your actions and avoid wishful thinking. You must be prudent.
Never assume how a dependency, API, CLI, or build tool behaves — either be certain already or check additional docs and other resources (e.g., web) first and cite evidence.
Use Internet search, local files, debugging, tests, and REPLs instead of guessing.
If it is a complex problem, summarize/restate the problem, list unknowns as well as possibilities and probabilities, make a plan, then execute and verify.
Summarize the root cause and update documentation once an issue is fixed.
Let me know if you have any questions about what I'm requesting before you proceed.
If you are using Python, prefer uv and remember that there might already be a virtual environment.
If the context7 MCP is available, make sure to use it to find docs whenever appropriate.```
honestly they might be bad. not sure
then I have an MCP server for it
context7
it is free, publicly available
and then
the models I use are Sonnet, sometimes opus, o3-pro, etc. and then the rest is just how I talk to it
and make my requests
so sometimes asking it the right way helps is what I mean
it has access to internet which I auto-allow
the MCP when it wants to use it -- I have to say yes
and yes in that sense it is agentic
sorry for the long winded answer.
i asked it for .md format. and then I just converted to pdf on my end externally to that
though honestly if I had asked it to do it... it woulda done it just fine
ya word I can fix ur system prompt if u want
you need less prompt engineering if ur using the big models
ohhhh i'd be happy to hear your suggestions. I'm loving how it is treating me thus far with the models I use though.
give it a system persona (start of the prompt). for very specific info use few shot examples (try to keep it at 2 - more than that makes the model more deterministic which is kind of a bad thing), add "think through your answer step by step" at the end of the prompt, if something slightly fucks up use "pay special attention to" after your "think through ..." line.
the smarter the model the less you have to prompt engineer and the fewer examples you should give it
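The structure described above (persona first, at most two few-shot examples, the step-by-step line at the end, then an optional "pay special attention to" line) can be sketched as a small helper. This only illustrates the ordering; `build_prompt`, its parameter names, and the "Example input/output" labels are invented here, not taken from any library.

```python
from typing import List, Optional, Tuple


def build_prompt(
    persona: str,
    task: str,
    examples: Optional[List[Tuple[str, str]]] = None,
    attention: Optional[str] = None,
) -> str:
    """Assemble a prompt in the order: persona, examples, task, closing lines."""
    parts = [persona.strip()]
    for question, answer in (examples or [])[:2]:  # cap at two few-shot examples
        parts.append(f"Example input: {question}\nExample output: {answer}")
    parts.append(task.strip())
    parts.append("Think through your answer step by step.")
    if attention:
        # Only added when something keeps going slightly wrong.
        parts.append(f"Pay special attention to {attention.strip()}.")
    return "\n\n".join(parts)
```

With a stronger model you would likely pass fewer examples or none, as noted above.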
but for reference i automated google forms entry from the phone call we were just working on and gemini and openai couldn't do it, but claude 3.5 and above could
example prompt:
Thanks I’ll look at it
if you'd like you can try my voice call AI
once this work stretch ends then I'll look into it more and maybe give it a try -- thank you for offering
I am probably going to transition ideas after
hey are you able to help me with this? I'm trying to send a video to the real time api. The docs say it can be done but didn't supply an example
Docs: https://github.com/google-gemini/cookbook/blob/main/quickstarts/Get_started_LiveAPI.py https://ai.google.dev/gemini-api/docs/live
My code
have you tried running your code and it has failed? or just want to know if it looks good before trying to run it?
anyways you can try this.
?
hey sorry bro I never got notifications for this, hence me not replying - you should @ me next time
will try ur code tonight ty for trying