#ad_discordbot (Fork of Fork of xNul's bot)


terse folio
#

I can get you a working example when I get home.
but it wasnt that bad to get working iirc

valid crypt
#

🥳

terse folio
#

I'm not sure about the encryption stuff, will have to test it

terse folio
#

I think I spammed discord's ratelimits too much tonight.
I'm stripping out all the unnecessary stuff like external libs and complexity for an example to echo your voice back to you

halcyon quarry
#

The next code I’m going to add is to make it possible to run the bot in a non-TGWUI environment for the image generation capabilities only

#

Most of the code to allow this already exists in the bot

#

I basically just need to update the batch files that launch the bot so that they create a new virtual environment if the text generation one is not found, install the few requirements that the bot currently relies on from the text generation environment, and finally skip trying to import text generation modules when it’s configured for image generation only
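A rough sketch of the last step (skipping the text generation imports), assuming a hypothetical `textgen_enabled` flag read from the bot's config. The module name `modules.text_generation` is only a placeholder here, not necessarily what the bot imports:

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional_module(name: str, enabled: bool) -> Optional[ModuleType]:
    """Import `name` only when the feature is enabled; return None otherwise.

    A failed import is also reported as None, so an image-only install
    keeps running without the text generation environment.
    """
    if not enabled:
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical usage: `textgen_enabled` would come from the bot's config.
textgen_enabled = False
textgen = load_optional_module("modules.text_generation", textgen_enabled)
print("text generation available:", textgen is not None)
```

The rest of the code would then check `textgen is not None` before touching anything text related.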

valid crypt
#

still wait for it though

valid crypt
#

im getting it! a little slow, it should run on gpu though, but the model is the base whisper which shouldnt be too slow even on cpu...

valid crypt
#

a bunch of bugs

#

well one big bug

halcyon quarry
#

Ok I see that this bot writes your text

terse folio
valid crypt
#

alr the thing works, it is pretty fast. what happens is that it should detect 1s of silence to chunk and 2s of silence to do the final stt and send the message. the timing works, but only when you speak again.
its something like: i spoke (3 years later),
a little bit before i speak again: "oh, it has been more than 2s, send the message", then "yes yes marcos im hearing you"

#

i spoke 2025-03-06 00:04:50,083 - INFO - Started hearing from Marcos
I spoke for 2.38s but the transcribe log is much after 2025-03-06 00:04:58,946 - INFO - Transcribing chunk for Marcos with audio length 2.38 seconds
It happened a couple of milliseconds before i spoke 2025-03-06 00:04:59,181

```
2025-03-06 00:04:58,946 - INFO - Transcribing chunk for Marcos with audio length 2.38 seconds
2025-03-06 00:04:58,946 - INFO - Starting transcription for temp_323088470241312774_199565.wav
2025-03-06 00:04:59,181 - INFO - Started hearing from Marcos
2025-03-06 00:05:00,388 - INFO - Transcription completed for temp_323088470241312774_199565.wav:  Hello, hello, hello
2025-03-06 00:05:02,837 - INFO - Sending transcription for Marcos:  Hello, hello, hello
```
and also the silence log disappeared here D:
it feels like if i dont speak it pauses
#

gonna continue tomorrow

#

i hope i get it done before sunday

terse folio
#

have you looked into live whisper repos?
They do something like collecting audio as it comes in and adding it to a buffer (like 30s)
Process on that and return the output of high confidence tokens.
And trim the audio/text.

There are some issues with this too, like if it doesn't hear you finish your sentence it might continue retrying to process the same buffer over and over until it gets your final words.

#

What would be cool is if we can run the speech detection model that preprocesses before being sent to whisper to do the cutoffs more intelligently.

#

If you're going to do turn based voice chat,
I recommend starting with voice messages because there you don't have to worry about pauses and it could be a cool interface!
The user decides when they're done

valid crypt
#

i did think about realtime whisper, and instead of sending everything it hears, only start when there's a keyword like **hey** and send the message when there's a "that's it"
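A minimal sketch of that keyword idea, working on already transcribed text. The class name and the naive substring matching are assumptions for illustration; a dedicated wake-word library would do this on audio instead:

```python
from typing import List, Optional


class KeywordGate:
    """Collect transcribed text between a start keyword and an end phrase."""

    def __init__(self, start: str = "hey", end: str = "that's it") -> None:
        self.start = start.lower()
        self.end = end.lower()
        self.active = False
        self.parts: List[str] = []

    def feed(self, text: str) -> Optional[str]:
        """Feed one transcribed chunk; return the message once the end phrase is heard."""
        lowered = text.lower()
        if not self.active:
            # Naive substring match: "they" would also open the gate.
            idx = lowered.find(self.start)
            if idx == -1:
                return None  # not addressed to the bot, drop the chunk
            self.active = True
            text = text[idx + len(self.start):]
            lowered = text.lower()
        idx = lowered.find(self.end)
        if idx == -1:
            self.parts.append(text.strip())
            return None
        self.parts.append(text[:idx].strip())
        message = " ".join(p for p in self.parts if p).strip(" ,.")
        self.active = False
        self.parts = []
        return message


gate = KeywordGate()
for chunk in ["just chatting", "Hey, what is the weather", "like today? That's it."]:
    print(repr(gate.feed(chunk)))
```

Chunks heard before the keyword are dropped, so nothing outside the two markers would need to be sent on.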

#

if the problem is for the future me, it not my problem 👍

#

gonna leave more features for the future me

terse folio
# valid crypt didnt understand

I was thinking the idea is pretty similar, but just taking a slightly different route to achieve.

Also that's a good idea to use a trigger keyword, that can save a lot on processing.
There are some libraries dedicated for that too

valid crypt
#

getting a somehow working version but just takes random time to send the message

#

also gonna fix that

valid crypt
#

im dying

terse folio
#

what's going on?

valid crypt
#

fixed audio processing getting paused, but takes random time around 1-12s

#

and i cant get it fixed

#

but anything else is pretty good

#
GitHub

Real time transcription with OpenAI Whisper. Contribute to davabase/whisper_real_time development by creating an account on GitHub.

GitHub

Whisper realtime streaming for long speech-to-text transcription and translation - ufal/whisper_streaming

terse folio
#

I could take a look at it,
i have a few ideas, like how do you check that it has been silent for a certain amount of time with background noise

#

wow looks like a lot going on in the bot

#

one idea is that the audio processing stops when there's silence, have you tried printing when silence is detected?

#
```python
sink_cb = voice_recv.BasicSink(callback)
sink_silence = voice_recv.SilenceGeneratorSink(sink_cb)
vc.listen(sink_silence)
```

Wrapping the voice sink in a silence generator will create silent packets when the user stops talking so the silence detection loop has silent packets to work with
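To illustrate why the silent packets matter, here is a small sketch of a silence timer that only advances when a packet arrives. `SilenceTracker` and the 1 s / 2 s values mirror the description earlier in the chat but are otherwise made up, and no Discord library is involved:

```python
from typing import List, Optional, Tuple


class SilenceTracker:
    """Emit 'chunk' after 1 s of silence and 'final' after 2 s.

    Time only advances when feed() is called, so if no packets arrive
    during silence, nothing can fire until the user speaks again.
    """

    def __init__(self, chunk_after: float = 1.0, final_after: float = 2.0) -> None:
        self.chunk_after = chunk_after
        self.final_after = final_after
        self.last_voice: Optional[float] = None
        self.chunked = False

    def feed(self, timestamp: float, is_silent: bool) -> Optional[str]:
        if not is_silent:
            self.last_voice = timestamp
            self.chunked = False
            return None
        if self.last_voice is None:
            return None  # silence before any speech
        quiet = timestamp - self.last_voice
        if quiet >= self.final_after:
            self.last_voice = None
            return "final"
        if quiet >= self.chunk_after and not self.chunked:
            self.chunked = True
            return "chunk"
        return None


def run(packets: List[Tuple[float, bool]]) -> List[Tuple[float, str]]:
    """Feed (timestamp, is_silent) pairs and collect the events that fire."""
    tracker = SilenceTracker()
    events = []
    for ts, silent in packets:
        event = tracker.feed(ts, silent)
        if event:
            events.append((ts, event))
    return events


# Speech at 0.0-0.5 s, then no packets at all until 9.0 s.
sparse = [(0.0, False), (0.5, False), (9.0, False)]
# Same speech, with a silent packet every 0.5 s filling the gap.
filled = sparse[:2] + [(0.5 + 0.5 * i, True) for i in range(1, 17)] + [(9.0, False)]

print(run(sparse))  # []
print(run(filled))  # [(1.5, 'chunk'), (2.5, 'final')]
```

With the gap filled, the events fire on time instead of waiting for the next spoken word.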

#

i havent tested running it yet

valid crypt
#

i tried to implement silent, but there is something stopping it

#

the silent threshold i made is nearly useless

terse folio
#

Silence thresholds arent easy, it will have to be tweaked for everyone's mic

Since you're using the whisper-stream library, you could perhaps track it based on whether the sentence has ended
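One way to avoid a hardcoded threshold is to calibrate it from a short stretch of background noise. A sketch, assuming 16-bit little-endian PCM frames; the function names and the `margin` factor of 2.0 are assumptions that would need tuning:

```python
import math
import struct
from typing import Iterable, List


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian PCM samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def calibrate_threshold(noise_frames: Iterable[bytes], margin: float = 2.0) -> float:
    """Derive a silence threshold from frames recorded while nobody speaks."""
    levels: List[float] = [rms(f) for f in noise_frames]
    if not levels:
        raise ValueError("need at least one noise frame")
    return max(levels) * margin


def is_silent(frame: bytes, threshold: float) -> bool:
    """A frame counts as silent when its level stays under the threshold."""
    return rms(frame) < threshold


def tone(amplitude: int, n: int = 160) -> bytes:
    """Synthetic test signal: a square wave whose RMS equals its amplitude."""
    values = [amplitude if i % 2 else -amplitude for i in range(n)]
    return struct.pack("<%dh" % n, *values)


threshold = calibrate_threshold([tone(50), tone(80)])
print(threshold, is_silent(tone(100), threshold), is_silent(tone(3000), threshold))
```

Calibrating per user when they join would sidestep the "tweaked for everyone's mic" problem to some degree, though sudden noise would still fool it.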

#

Also whisper large v3 has an issue with never finishing sentence punctuation iirc.

#

large v2 works great

valid crypt
#

actually whisper base is really fast

#

the silent should work

#

but for some reason there is a 10s waiting time that i dont know why is it there

valid crypt
#

i think there something about the way it retrieves voice data and processing and etc, making it really slow?

#

i think thats it, i normally disable turbo to save energy and not waste power, but then the cpu runs at a low frequency

#

shouldn't be a big problem for gaming and others stuffs but my python code 😓

#

i hate my life

#

me--> 😪

halcyon quarry
#

🤗

terse folio
#

Running with the tiny model it seems fine, responds within a second of finishing talking

marsh harness
#

With the reasoning models, is it the stopping strings that need to be modified/updated so that the bot doesn't keep speaking beyond its initial response, as well as not outputting </think> at the end of its responses, or are those two separate issues?

#

Even going through deepseek r1's release, I haven't seen any examples of what would need to be specified in order to remove the </think> from the text output. I get that these models are meant to be run where you can see the thinking/context window so you otherwise wouldn't see that, but it outputs it in ooba and when speaking to the bot on discord.
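If no setting covers it, the tags could also be removed in post-processing. A minimal sketch, assuming the reasoning is delimited by literal `<think>` and `</think>` tags. `strip_reasoning` is a hypothetical helper, not something from ooba or the bot:

```python
import re

# A complete reasoning block, including its content.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Any leftover opening or closing tag without a partner.
_STRAY_TAG = re.compile(r"</?think>")


def strip_reasoning(text: str) -> str:
    """Drop complete <think> blocks, then remove any unpaired tags."""
    text = _THINK_BLOCK.sub("", text)
    text = _STRAY_TAG.sub("", text)
    return text.strip()


print(strip_reasoning("<think>plan the reply</think>\nHi there!"))
print(strip_reasoning("Hi there!</think>"))
```

Note the limitation: when only a closing tag is present, the text before it is kept as-is, because the sketch cannot tell reasoning from the answer in that case.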

#

Both deepseek r1 and now QwQ do it, and it makes sense if it's related to the thinking/reasoning tokens. Bartowski has both exl2 quants and ggml files on HF for QwQ if you haven't tried it out yet.

halcyon quarry
#

Of course it could be used as a stopping string if you wanted

#

If the responses are generally too verbose, there are other settings for that

marsh harness
#

👍

valid crypt
#

but man, if i dont remove the cap for clock speed, 35w for doing nothing and 40~70w for moving my mouse is

valid crypt
#

these days my pc is crashing, i think my intel cpu is cooked

fickle ember
#

is it possible to make the bot capable of being user installed

#

to make it usable in dms

valid crypt
#

it is usable in dms

#

in your setting file you can turn on dm

#

config.yaml

#

although it is not installed but it can be used

valid crypt
halcyon quarry
#

And yes as Marcos said you’ll also need to enable that one setting in the config.yaml

#

By default most commands are hardcoded as disabled for users via DM, but the bot owner (you) can use almost all cmds via DMs

halcyon quarry
#

I could try to make some time to integrate the feature

valid crypt
#

works just fine, with some flaws that i dont think it is my problem

#

the time it takes to collect all the packets is demonic

#

im confused if it is my cpu, the extension or discord's fault

valid crypt
#

the water of retrieving audio is deeper than i thought

valid crypt
#

@halcyon quarry can you try those benchmark scripts and give the results? and is it possible to somehow use partially pycord? also how hard would it be to move to pycord?

requirements are pynacl for both i think
and discord-ext-voice-recv @ git+https://github.com/Aviana/discord-ext-voice-recv for the py

i got really good result with pycord and demonic with discord.py

GitHub

Voice receive extension package for discord.py. Contribute to Aviana/discord-ext-voice-recv development by creating an account on GitHub.

halcyon quarry
#

ad_bot already installs pynacl btw

valid crypt
#

ad_bot cant have pycord 💀

terse folio
#

pycord is a different discord bot library.
But discord.py was the first in python and probably still the most feature packed.

But yes there are other libraries in other languages that do voice receive out of the box

terse folio
valid crypt
#

forgot to upload that one 😱

#

should work now

#

pycord is made on top of discord.py, the voice recording was supported in discord.py but dev abandoned it while pycord even improved it

valid crypt
terse folio
#

what happened to the previous extension on discord.py?

valid crypt
#

?

#

if you mean the original and not the fork, it was pretty much abandoned, and a few months later discord deprecated the decrypt method it supported

#

tho fork added support for the new encryption

#

and it is super slow

#

at least for me

halcyon quarry
#

Sure seems like the decryption method is supported in discord.py

terse folio
#

oh I see,
I guess I havent updated yet or something

I was trying to get an echo test version of voice receive working, but joined the voice channel too much and discord locked me out for a day Xd.
hadn't really gotten there to test latency

halcyon quarry
#

hmm

#

yeah my discord.py was outdated as well, I activated the TGWUI venv and used pip install discord.py -U which updated it to current, which includes the support for that encryption

#

So if we are to add this feature I'd just need to update requirements.txt to ensure it specifies v2.5 or newer

terse folio
halcyon quarry
#

yes my updater scripts first do a git pull, then install requirements.txt

valid crypt
#

?

halcyon quarry
#

You said it was aead_xchacha20_poly1305_rtpsize yes? That's the method that the changelog says was implemented in discord.py 2.5

valid crypt
#

pure chaos

halcyon quarry
#

That Issue is from 2023 my dude

#

Seems that this encryption method is implemented as of merely 3 weeks ago

valid crypt
#

why would a guy update an extension when the main project actually does what the extension does

#

😓

#

they are playing with my mind

halcyon quarry
#

Well let's say I made a bitching extension that does something that doesn't work in the main project

#

Now some time later, I imagine they want to just update a few blocks of code rather than rewrite or discontinue the extension

#

the methods etc are probably very similar if not identical to how it works in this extension you found

terse folio
valid crypt
halcyon quarry
#

Just took a quick look through the asr bot extension and it's not a crapton of lines or anything...

valid crypt
#

asr bot extension?

halcyon quarry
#

This is what you said works yes?
<#1154970156108365944 message>

valid crypt
#

thats is my humble bot, using the extension to retrieve packets

halcyon quarry
#

Ok I see that there is not a method like receive_audio_packet only a send_audio_packet

valid crypt
#

could be encryption for sending

halcyon quarry
#

actually...

valid crypt
#

s

halcyon quarry
#

might be part of this Opus library that its referencing all over in this section

terse folio
#

a few years ago

halcyon quarry
valid crypt
#

i get 2025-03-08 22:31:46 - __main__ - ERROR - Benchmark failed: aead_xchacha20_poly1305_rtpsize

halcyon quarry
#

I made this image way way way back when TGWUI was juuuuuuust getting off the ground

terse folio
#

okay, will do a sanity check as it's been a moment since i updated ^^

#

i never got my echo test working for other reasons, but I should be able to get it to write to a wav file

halcyon quarry
#

Here's a quick LTX-Video image2video I just executed in about 1 minute on my 4070ti (12gb vram)

valid crypt
halcyon quarry
#

I'm extremely interested in actually getting the comfy UI / swarm support added in, for users to easily configure execution of various workflows via the bot and send to channel the expected output

#

The one thing that's a bit of a bummer about current video generation is that the models generate the whole video as essentially a full length diffusion process - not a sequential process that could be paused and resumed

terse folio
#

That's fascinating to me too, i did a little research into it and found that you can upload custom comfyui nodes via api to be processed?

That would be cool for things like assigning entire workflows to tags

halcyon quarry
#

So those sort of requests would stall the bot bigtime in a busy server

#

Yes, I believe there's some extreme flexibility for using the Comfy API

terse folio
#

can help there when the time comes

halcyon quarry
#

Some pretty ambitious thoughts going through my head lately in regards to managing the generative endpoints

#

I've currently got some pretty fixed and rigid definitions for what an image request payload entails -- I was thinking how it would be much better to make it so the current payload stuff I have is like an example template, but that the bot could accept, process and send whatever the user defined without any errors.
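The "template plus whatever the user defines" idea could look roughly like a recursive merge. This is only a sketch; `merge_payload` and the field names in the example are assumptions, not the bot's actual payload handling:

```python
import copy
from typing import Any, Dict


def merge_payload(template: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `template` with `overrides` applied recursively.

    Keys the template does not know about are passed through untouched,
    so an unfamiliar user-defined payload is not rejected.
    """
    merged = copy.deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_payload(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical template and user payload; the field names are examples only.
template = {"prompt": "", "steps": 20, "override_settings": {"CLIP_stop_at_last_layers": 1}}
user = {"steps": 30, "override_settings": {"sd_vae": "auto"}, "custom_field": True}
print(merge_payload(template, user))
```

Client-specific filtering for A1111 / Forge / ReForge would then be a separate step applied to the merged result.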

valid crypt
halcyon quarry
#

Doing this wouldn't really be too hard but I'd want to make like, a dedicated method to filter some of the client specific features I've written for A1111 / Forge / Reforge

terse folio
terse folio
halcyon quarry
#

I think it already does this... I just have a lot of micromanagement going on that I could cut back on

#

But mainly, the overhaul I need would make it very very simple to manage payloads and settings for various APIs

terse folio
#

Oooh, now I get the encryption error

#

weird that it worked a couple days ago...

halcyon quarry
#

like a user directory for storing payloads, then just having one setting in config/py to specify the one you'll be using, or something

terse folio
#

interesting, be safe with that.
Storing payloads in json/yaml files could mess things up if they aren't escaped properly

halcyon quarry
#

Well I currently already do something like this, with basesettings.yaml

#

BTW - I've been using a lot of ComfyUI workflows lately, there's a ton of cool crap you can do without having to actually play with the spaghetti

valid crypt
#

comfyui is cool

halcyon quarry
#

Followed by, lots of additional cool crap you can do once you do know how to play with the spaghetti

terse folio
#

yes, yesterday or the day before, everything was working.
And my other test bot was working.

today I'm having the encryption issue

#

I'll try out the fork later

halcyon quarry
#

The default example i2v workflow for LTX-Video includes this prompt enhancing LLM that wrote a very good caption for this image

valid crypt
halcyon quarry
#

So my input was the image, and a prompt a caveman reading something on his computer monitor

valid crypt
#

what i found interesting about comfyui is that you can plug in a lot more things and have super complex workflows, i even saw people selling comfyui workflows 💀

halcyon quarry
#

Yeah it’s the YouTubers all going full blown Patreon lol

#

It’s insanity really anyone who can invest an hour or so just tinkering can figure out how to connect different workflows together and stuff

#

It’s so simple now that you can create Groups of nodes

#

You can toggle a complex feature on and off by just Bypass Group now

terse folio
#

I had a similar view, and still do kinda...
My family often told me I should create tutorials or classes for friends. (random topics like programming)
And I often replied like "anyone could figure this out with a few hours of research"
or "it's really nothing special"
But sometimes we forget how niche the things we know are.

Here for example, a lot of us have some programming knowledge, and the node networks of comfyui are reminiscent of programming where one thing pipes into another.

But, especially nowadays where AI tools are mainstream, everyone wants to do it.
I can understand why people sell shortcuts.
But I appreciate it when there's an open source solution that you can compile yourself for free or download a working version for a donation

valid crypt
#

people are getting dumber with tiktok

terse folio
#

Yea, I can see that with ai tools as well, taking away reason to do your own problem solving 😭

valid crypt
#

i still remember when my teacher said that the secondary school nowadays is

#

🤪

halcyon quarry
#

Memebenders

fickle ember
fickle ember
halcyon quarry
#

Also check your server settings - Integrations > Bot

#

And Roles. Make sure the bot has privileges

fickle ember
#

this is for making it userinstallable yes?

halcyon quarry
#

It could be installed without any intents scopes etc, but then it’s not very useful

fickle ember
#

i kinda got it to show up at all in another server its not installed in, albeit it didnt work fully

halcyon quarry
#

In the developer portal, Bot, OAuth2 (I mentioned this yesterday)

#

This is where you generate an invitation link for your bot to join a server

#

The link changes as you check off the various permissions you want to allow the bot to have

fickle ember
#

yeah

halcyon quarry
#

So check off Bot, which expands a new section, and check basically everything. Copy paste the link in browser and invite the bot to the server. Repeat for additional servers

#

You may also need to give the bot a Role with more permissions like, Send Messages etc

#

In the server settings, or channel settings, etc

valid crypt
halcyon quarry
#

He has the settings templates in correct place and all that, for sure

#

the bot copies them automatically if the user did not already

valid crypt
#

wanted to fork pycord and change its name and misclicked and did a PR 😓 never used codespace and do a merge with vs code :v

#

got blocked really quick XD

#

also i think i misclicked a lot of thing 😦

valid crypt
#

how can i have pycord and discord.py at the same time?

#

this is killing me

#

i really dont want to use the extension of discord.py ;-;

#

actually i believe that they can be installed together

#

dont tell me that they really can be together .-.

#

dreaming too much f

#

actually got an idea, gonna try it tomorrow

halcyon quarry
valid crypt
#

although discord.py's extension can be added to the same bot, i'd prefer adding another bot with pycord as it is faster for me

halcyon quarry
#

Does the bot send the transcribed text to the channel? If so, then my bot should already be able to handle it... although I'd probably want to add a tag like "regex_text" that could be a regex string to update the user text. In this case, ignore or modify the prefix that other bot adds

valid crypt
#

what i was thinking is a deeper connection, but if a separated bot sending the transcription is good enough i will be happier, right now my bot sends displayname: transcribed text
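For the `displayname: transcribed text` format, a regex along these lines could separate the prefix from the text. The pattern is an assumption about what such a "regex_text" value might contain (that tag does not exist yet), and it treats anything before the first colon as a name:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a short name without colons, a colon, then the text.
PREFIX_PATTERN = re.compile(r"^(?P<name>[^:\n]{1,32}):\s*(?P<text>.*)$", re.DOTALL)


def split_transcription(message: str) -> Tuple[Optional[str], str]:
    """Split 'displayname: text' into (name, text); leave other messages alone."""
    match = PREFIX_PATTERN.match(message)
    if match is None:
        return None, message
    return match.group("name").strip(), match.group("text")


print(split_transcription("Marcos: Hello, hello, hello"))
print(split_transcription("no prefix here"))
```

A message that merely contains a colon early on would be misread as having a prefix, so restricting the tag to the transcription bot's messages would be safer.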

halcyon quarry
#

Just trying to stay flexible 🤗

#

besides a customizable Regex tag has been on my mind for some time

valid crypt
#

is there tag to stop tts?

#

actually, for voice chat anything should stop the tts

#

and to make it better, add "you were interrupted" to the context

halcyon quarry
#

Suppose I need a "should_tts" tag

valid crypt
#

i'll try to add multi speaker for now

valid crypt
terse folio
valid crypt
#

Before, it took me 5x the time to collect all the packets

#

My hypothesis is lack of optimisation

terse folio
#

makes sense makes sense,
That's something i'm interested in testing timings with when i have free time

But i'm happy it's working good for you now 😸

valid crypt
#

Although the biggest problem it has now is recording the whole voice channel in a single file

halcyon quarry
#

I'm done with my video game vacation now so I'll be making progress again

halcyon quarry
#

Result of the recent new image command option, LLM gave a very good prompt for Flux

#

This NeuralBeagle model will never cease to amaze me

valid crypt
#

bot at the top :O

halcyon quarry
#

Interesting, I had made a PR to add the bot but I definitely did not put it at the top

#

Either ooba checked out the bot and thought it deserved the recognition? Or someone else sneaky moved it in a PR lol

halcyon quarry
#

Yeah, he totally moved it up

#

oobabooga ❤️

terse folio
#

Congrats, that's really cool :)
The best way to get beaten to something,
oh misread a little, I thought you were considering the PR but someone did it before you

halcyon quarry
#

I'm really stoked about it

#

That repository has instructions for requesting to have an extension added to the list, which is simply to clone the repo, update the list and send it as a PR.

I did this, but I had inserted my bot way down the list, just below Oobabot

#

It looks like shortly after, he decided to reorganize to sort what he thought to be most noteworthy to the top

terse folio
#

that's epic!

halcyon quarry
#

I only feel bad you don't have more recognition, you wrote some of the most impressive features that it offers

#

Well that's in my control eh 😛

terse folio
#

I was glad to make some friends here, and support something cool ^^

halcyon quarry
#

I’m grateful, very. I hope you’re also proud of this ascension of the bot in that list

#

Could not be in that coveted spot without you

#

I remember every contribution you made, notably you sorted out my chaotic single file making my life so much easier

terse folio
halcyon quarry
#

@valid crypt thanks for pointing this out I’m literally drinking champagne over this

#

Mind is blown

valid crypt
#

🫡

halcyon quarry
halcyon quarry
#

I think it could make sense to do the following:

  • ship the bot with text and image generation each disabled by default.
  • make it possible for the text generation to be run in 2 modes, as API mode, or the custom TGWUI integration
#

While the bot is configured for anything besides text gen enabled + TGWUI integration, do not attempt to activate and rely on the TGWUI venv. Instead create own venv.
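The rule above could be reduced to one small decision function. A sketch with hypothetical names (`textgen_enabled`, `textgen_mode`, the `"tgwui"` / `"api"` values), since the real flags have not been decided in this conversation:

```python
from pathlib import Path


def needs_own_venv(textgen_enabled: bool, textgen_mode: str, tgwui_env: Path) -> bool:
    """Decide whether the bot should create and use its own virtual environment.

    Only the combination 'text gen enabled + TGWUI integration' relies on the
    TGWUI environment, and only if that environment actually exists.
    """
    uses_tgwui = textgen_enabled and textgen_mode == "tgwui"
    return not (uses_tgwui and tgwui_env.is_dir())


# Hypothetical path; the current directory stands in for an existing env folder.
print(needs_own_venv(True, "tgwui", Path(".")))
print(needs_own_venv(True, "api", Path(".")))
```

The launcher scripts could call something like this once and branch on the result.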

#

I suppose these would be better controlled via CMD Flags

halcyon quarry
#

Yes, making some good progress on this... borrowed a lot of code from TGWUI setup so that it will create a venv and install requirements

#

I’m going to have it detect and ask how to handle venv setup

valid crypt
#

i need a tag to stop tts on new message, and add "you were interrupted while speaking" or custom text to the context

valid crypt
#

wait a second, why bot is replying to my bot when i speak but not when bot speaks? :O magic

valid crypt
#

by taking a shower, i think this can be done with tags!

#

a tag to pause tts, to continue and to abort

halcyon quarry
#

I feel a bit over my head with adjusting this installation procedure...

halcyon quarry
#

In this process, I noticed that a requirement of the bot, pydub, has been expected to be present in the TGWUI environment -- but TGWUI does not seem to install this by default. It seems to only get installed by some common extension we've been using

#

In any case, I figured out the few requirements needed if not relying on TGWUI environment, and they're all now in requirements.txt

halcyon quarry
valid crypt
#

A simple stop tts would be fine for now

valid crypt
halcyon quarry
#

I’m sure someone has thought of this but I was thinking how it would be really cool if there was a such thing as future conditioning, when generating text “in the middle”

#

Something like this must already exist, but I just haven’t heard of it

#

The bot’s history management allows generating in the middle but only by omitting the future text. Would be neat if there was a mode where including it had an influence

halcyon quarry
#

Eh nvm this is achievable by summarizing the future text and just better prompting including it

valid crypt
#

at least a simple, fast tag to stop tts pls

halcyon quarry
#

Alright, I’ll see about doing this tonight

valid crypt
halcyon quarry
#

Working on the "should_tts" tag now

#

I added this tag...
toggle_vc_playback: str

Changes playback in guild's voice channel where tag is triggered. Use with 'for_guild_ids_only' condition for selective control. Valid values: 'stop', 'pause', 'resume', 'toggle' (pauses or resumes)
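For readers following along, the four values could map onto a voice client roughly like this. `apply_vc_playback` is a sketch, not the bot's implementation; it only assumes an object exposing `stop()`, `pause()`, `resume()` and `is_paused()`, which discord.py's `VoiceClient` does:

```python
VALID_ACTIONS = ("stop", "pause", "resume", "toggle")


def apply_vc_playback(voice_client, action: str) -> str:
    """Apply a playback action; return the action actually carried out."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if action == "toggle":
        # 'toggle' resolves to pause or resume depending on the current state
        action = "resume" if voice_client.is_paused() else "pause"
    if action == "stop":
        voice_client.stop()
    elif action == "pause":
        voice_client.pause()
    else:
        voice_client.resume()
    return action


class FakeVoiceClient:
    """Stand-in with the same method names, used only for the demonstration."""

    def __init__(self) -> None:
        self.state = "playing"

    def is_paused(self) -> bool:
        return self.state == "paused"

    def pause(self) -> None:
        self.state = "paused"

    def resume(self) -> None:
        self.state = "playing"

    def stop(self) -> None:
        self.state = "stopped"


vc = FakeVoiceClient()
print(apply_vc_playback(vc, "toggle"), apply_vc_playback(vc, "toggle"))
```

Nothing here generates or suppresses TTS; it only changes what is currently playing.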

halcyon quarry
#

Ok I just added a should_tts tag as well - which is only useful if using value false to prevent TTS on the current interaction

#

@valid crypt Please let me know if these additions work as expected, as I think they should!

valid crypt
#

ok updating now

valid crypt
#

toggle_vc_playback could have should_gen_text: false as default

#

actually with that off, it does not work

halcyon quarry
#

toggle_vc_playback is not intended to have any influence on TTS generation - simply affecting the current playback state in the voice channel

valid crypt
#

but it triggers generation

halcyon quarry
#

You mean, the absence of this tag does not result in TTS generation?

#

Adding this tag makes TTS generation happen explicitly?

#

I have the other should_tts tag which is intended to explicitly prevent TTS from generating

valid crypt
#

a tag that triggers the toggle does toggle the tts, which means it works, but the tag itself triggers text generation, and adding the tag that turns off generation makes the toggle stop working

halcyon quarry
#

Please tell me how you feel the logic should be applied

valid crypt
#

my intention was that with the tag the vc could be toggled, but the message that contains the tag should not generate text

halcyon quarry
#

What happens if you include both tags?

valid crypt
#

i thought it could be done with this tag but it makes the toggle stop working

halcyon quarry
#

toggle_vc_playback and should_tts

#
```
should_tts: false
```
valid crypt
halcyon quarry
#

Right well I think you're expecting toggle_vc_playback to do more than I think it should do

valid crypt
#

😓 just it shouldn't take it as a message

halcyon quarry
#

Maybe you have it triggering on the wrong condition

valid crypt
#

there must be audio playing for it to be toggled, but by sending the toggle, you get a new audio (new response)

valid crypt
#

with the comment it works, removing the # it doesnt work

valid crypt
halcyon quarry
#

oops

#

Sorry :< Just pushed the fix for that

#

Line 1673 in bot.py needed an await
tts_sw = await self.check_tts_before_llm_gen()
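Why a missing `await` can flip the logic: a coroutine object is always truthy, whatever the function would have returned. A small standalone demonstration, where `check_tts` is a stand-in and not the bot's method:

```python
import asyncio


async def check_tts() -> bool:
    """Stand-in for an async check that legitimately returns False."""
    return False


async def main() -> tuple:
    pending = check_tts()          # missing await: this is a coroutine object
    without_await = bool(pending)  # coroutine objects are always truthy
    pending.close()                # avoid the "never awaited" warning
    with_await = await check_tts()
    return without_await, with_await


print(asyncio.run(main()))  # (True, False)
```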

valid crypt
halcyon quarry
#

erm

#

It wouldn't make sense because the TTS would have already been generated

#

They are generated simultaneously

#

The tags system does not currently review and apply tags to every response "chunk" as it is generating text - it applies tags after the response has completed generating

valid crypt
#

oh

halcyon quarry
#

That'd be a bit of a tricky one to implement...

valid crypt
#

add it to the to do list then, no hurry 😋

#

is that?

halcyon quarry
#

I do have a special system in place for reviewing response chunks for "censored" text

#

Could slip in the special handling for TTS here

#

at the expense of, even more code complexity

#

It's currently running this code here for every response chunk. When initially building tags, it looks specifically for censor tags and keeps separate tabs on them for this function

#

Would have to do the same - make a list of should_tts tags as they are being built, then look for trigger text in every response chunk
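Looking for trigger text in every chunk has one catch: a trigger can be split across two chunks. A sketch of a scanner that keeps a short tail of earlier text for that case. `ChunkTriggerScanner` is hypothetical and independent of the existing censor handling:

```python
from typing import Iterable, List, Set


class ChunkTriggerScanner:
    """Detect trigger phrases in streamed text, including ones split across chunks."""

    def __init__(self, triggers: Iterable[str]) -> None:
        self.triggers = [t.lower() for t in triggers if t]
        # Keep enough trailing text to complete the longest trigger.
        self.keep = max((len(t) for t in self.triggers), default=1) - 1
        self.tail = ""
        self.seen: Set[str] = set()

    def feed(self, chunk: str) -> List[str]:
        """Return triggers that first appear once this chunk is added."""
        window = self.tail + chunk.lower()
        found = [t for t in self.triggers if t in window and t not in self.seen]
        self.seen.update(found)
        self.tail = window[-self.keep:] if self.keep else ""
        return found


scanner = ChunkTriggerScanner(["remain silent"])
for chunk in ["I will rem", "ain sil", "ent now"]:
    print(scanner.feed(chunk))
```

Each trigger fires once per scanner, so a fresh instance per response would be needed.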

valid crypt
#

no hurry :p

valid crypt
halcyon quarry
#

Ok I solved one of 2 bugs there

#

this error here is due to the TTS streaming feature...

#

Try adding this little bit here which I think should resolve it

#

not sure if this will actually resolve it

#

Was TTS streaming working on the main branch?

#

with the new remote Alltalk v2?

#

It's possible that this feature is bugged for the new AllTalk (I haven't had time to test it out yet)

valid crypt
valid crypt
halcyon quarry
#

If that fails,

#

Add a print statement here and print chunk

#

print("chunk:", chunk)

#

Also one here - to print vis_resp_chunk

#

print("vis_resp_chunk:", vis_resp_chunk)

#

In any case I should probably add a line here before searching for the audio pattern: if vis_resp_chunk:
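A guarded version of that search might look like the following. The regex is an assumption modelled on the `<audio src="file/...">` markup that shows up in the `vis_resp_chunk` log later in this conversation, and `split_audio` is a made-up name:

```python
import re
from typing import Optional, Tuple

# Assumed pattern for the audio tag a TTS extension adds to the visible reply.
AUDIO_PATTERN = re.compile(r'<audio src="file/(?P<path>[^"]+)"[^>]*></audio>')


def split_audio(vis_resp_chunk: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (audio_path, remaining_text); tolerate empty or missing chunks."""
    if not vis_resp_chunk:
        return None, ""  # nothing generated, nothing to search
    match = AUDIO_PATTERN.search(vis_resp_chunk)
    if match is None:
        return None, vis_resp_chunk
    text = vis_resp_chunk[: match.start()] + vis_resp_chunk[match.end():]
    return match.group("path"), text


print(split_audio(None))
print(split_audio('<audio src="file/out/a.wav" controls autoplay></audio>Hi'))
```

As noted in the chat, the guard only prevents the error; it does not explain why the chunk was empty.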

valid crypt
#

ok

halcyon quarry
#

I did add this line now... which will prevent error but won't fix unexpected bug

valid crypt
#

there is not chunk

#

only vis

halcyon quarry
#

It generated 0 tokens

valid crypt
#

only when i trigger should tts

#

only the first time 😓

halcyon quarry
#

Are you sure this only happens when using should_tts?

valid crypt
#
```
============================================================
C:\text-generation-webui-main\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py:1240: RuntimeWarning: Detected duplicate leading "<|begin_of_text|>" in prompt, this will likely reduce response quality, consider removing it...
  warnings.warn(
Llama.generate: 1334 prefix-match hit, remaining 99 prompt tokens to eval
llama_perf_context_print:        load time =    2194.38 ms
llama_perf_context_print: prompt eval time =     769.07 ms /    99 tokens (    7.77 ms per token,   128.73 tokens per second)
llama_perf_context_print:        eval time =    4166.61 ms /    45 runs   (   92.59 ms per token,    10.80 tokens per second)
llama_perf_context_print:       total time =    5032.58 ms /   144 tokens
Output generated in 5.04 seconds (8.93 tokens/s, 45 tokens, context 1465, seed 984739868)
vis_resp_chunk: <audio src="file/C:\text-generation-webui-main\extensions\alltalk_remote_tts\Ganyu_20250318-180936.wav" controls autoplay></audio>I&#x27;ve been thinking about our last conversation, and the way you described the view from the mountain still brings a smile to my face.
18:09:36.446 #3170   INFO [bot.__main__]: Marcos: "remain_silence"
```
halcyon quarry
#

everywhere that I have this code, I see no evidence that it should affect text generation at all

valid crypt
#

i typed correctly right?

halcyon quarry
#

tags don't update as you modify the file though - use a command like /character to refresh tags

#

Or reload the bot

#

You have that defined correctly

valid crypt
#

i wrote that a long time ago, and in the dict tag file

#

and rebooted a lot of times

halcyon quarry
#

hmm..

valid crypt
#

turning off streaming? or changing tts?

halcyon quarry
#

try updating this block of code I have at line 1650

#

Replace with this block

    async def check_tts_before_llm_gen(self:Union["Task","Tasks"]) -> bool:
        # Toggle TTS off if not sending text, or if triggered by Tags
        if (not self.params.should_send_text) or (self.params.should_tts == False and tts.enabled):
            return await tts.apply_toggle_tts(self.settings, toggle='off')
        # Conditions which are only valid for guild interactions
        if hasattr(self.ictx, 'guild') and getattr(self.ictx.guild, 'voice_client', None):
            # Toggle TTS off if interaction server is not connected to Voice Channel
            if not voice_clients.guild_vcs.get(self.ictx.guild.id) and int(tts.settings.get('play_mode', 0)) == 0:
                return await tts.apply_toggle_tts(self.settings, toggle='off')
        return False
#

I might know the real issue here...

valid crypt
#

yes of course my skill issue :v

#

just a joke

halcyon quarry
#

I don't think the extension params are controlling AllTalk

#

What happens if you try using a different voice with the /speak command? Does it speak using a different voice?

#

Or, if you use a different voice filename in the character file?

valid crypt
#

i remember that you didnt add /speak for all talk, anyways ill go with kokoro

halcyon quarry
#

I did add /speak for alltalk

valid crypt
#

the remote

halcyon quarry
#

Does that still reside in a directory called alltalk_tts in the extensions folder?

valid crypt
#

it is called alltalk_remote

#

what i did was changing it to alltalk_remote_tts and it works

halcyon quarry
#

try changing it to alltalk_tts

valid crypt
#

kokoro, there is tts,

halcyon quarry
#

We didn't figure out how to control kokoro via extension params either

#

Try edge_tts or try renaming alltalk

valid crypt
#

renaming all talk

#

i remember that streaming broke edge

halcyon quarry
#

These are the "supported" ones:
'alltalk_tts', 'coqui_tts', 'silero_tts', 'elevenlabs_tts', 'edge_tts', 'vits_api_tts'

valid crypt
#

i could try vits later

#

renaming is not very...

halcyon quarry
#

hum

#

When I have time I need to try getting alltalk remote working on my end

#

see what's up with that...

#

I expect that the TTS will correctly be prevented when using tts apps that respect the extension parameters

#

alltalk TTS remote may have different labels for the extension params

valid crypt
#

i somehow killed vits-simple-api let me fix it

valid crypt
#

alr, idk why it died, i got to reinstall it, and i hope that everything will be fine

halcyon quarry
#

Same!

valid crypt
#

i dont know wt is going on with this

halcyon quarry
#

I'd be looking into this now if I wasn't super busy

valid crypt
#

alr vits working (just the tts)

#

actually we can delay should_tts, and make the toggle tag not trigger a response :v

#

i think i didnt change anything but, does the bot play the tts locally???

#

it is a feature that i would like to have though

halcyon quarry
#

toggle_vc_playback applies specifically to the voice channel

#

should_tts applies to the TTS generation entirely

valid crypt
#

here buddy, python is playing the tts

#

i was wondering why i was hearing echo, it was discord and python

halcyon quarry
#

lmao

valid crypt
#

was that a new feature only with vits?

halcyon quarry
#

Probably

valid crypt
#

its your bot buddy

halcyon quarry
#

Ok I thought you meant, that the actual vits code when triggered by TGWUI, might open some python player

#

b/c tgwui is running vits code when TTS is triggered as an extension

valid crypt
#

vits is running on my main pc and bot on my remote pc

#

and the remote is playing locally the tts

halcyon quarry
#

🤷‍♂️

#

never heard of this

valid crypt
#

for now the should_tts is not very working

#

as it shouldn't generate the tts from the beginning

#

that was on me

#

typo

#

vits works

halcyon quarry
#

As in, the tag works correctly with vits yes?

valid crypt
#

double tested, vits works

#

lemme reboot and triple test it

halcyon quarry
#

The bot can only modify TTS behavior if the extension parameters are valid

#

as defined in base_settings or your character file

valid crypt
#

vits ✅

#

also tested that vc toggle gives error if there is no audio

#

vc toggle + should tts = 0 tokens

valid crypt
halcyon quarry
#

Please elaborate

valid crypt
#

i should be able to pause the tts without triggering text generation

halcyon quarry
#

You can

valid crypt
#

how

halcyon quarry
#

You may be correct

valid crypt
#

this is what happened when i used should_gen_text: false

#

basically nothing

#

it does not trigger text generation but same for the tag to pause

#

i suppose that you will fix it, so i'll be adding those features to the asr bot

#

👍

halcyon quarry
#

ehhhh

valid crypt
#

DDD:

halcyon quarry
#

yeah I'll figure something out

valid crypt
#

👍

halcyon quarry
#

I need to review whether there are any other tags irrelevant to text generation, and process them regardless

valid crypt
#

fixed and pushed?

halcyon quarry
#

Probably tomorrow 😆

#

Still trying to make progress on “unrequire TGWUI” logic

#

Bat file is getting pretty complex but almost have the installation/launch logic figured out

valid crypt
#

oh

halcyon quarry
#

I'm taking steps towards the bot being used in either of 2 ways:

  • With TGWUI integration (as it is currently)
  • As a Standalone where TGWUI is not required, but can be used via API
  • Image generation capabilities and other bot functions will work in either setup
#

The logic I'm adding into the launcher script:

  1. It is checking for a txt file that confirms whether the bot is installed, which will specify the conda environment.
  2. If the file is not found it is assuming it is the first run of the bot.
    2a. It will detect if the bot is nested in TGWUI. If so, it will have both install options.
    2b. If TGWUI is not detected, it will mention that TGWUI was not found for an integration option, and only provide Standalone option.
  3. Depending on the option, it will activate the appropriate environment, check for the bot's requirements there and install as necessary.
    3a. For TGWUI integrated, the bot will not create its own environment.
    3b. For Standalone, the bot will download git / Miniconda as necessary and install them automatically and create environment - in the same fashion that TGWUI does.
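
The launcher steps above could be sketched roughly like this in Python (the real launcher is a batch file, so this is only an illustration of the decision logic). The marker filename `installed.txt`, the `server.py` check for detecting TGWUI, and all function names are assumptions, not the bot's actual code:

```python
from pathlib import Path
from typing import Dict, List, Optional


def detect_tgwui(bot_dir: Path) -> bool:
    # Assumption: "nested in TGWUI" means the parent folder holds TGWUI's server.py.
    return (bot_dir.parent / "server.py").is_file()


def read_install_marker(bot_dir: Path, marker_name: str = "installed.txt") -> Optional[str]:
    # Step 1: the marker file (hypothetical name) records which environment was installed.
    marker = bot_dir / marker_name
    if not marker.is_file():
        return None
    return marker.read_text(encoding="utf-8").strip() or None


def install_options(bot_dir: Path) -> List[str]:
    # Step 2a / 2b: both options if TGWUI is detected, otherwise Standalone only.
    if detect_tgwui(bot_dir):
        return ["tgwui", "standalone"]
    return ["standalone"]


def plan_launch(bot_dir: Path) -> Dict[str, object]:
    # Step 2: no marker means first run, so the user must pick an install option.
    env = read_install_marker(bot_dir)
    if env is None:
        return {"first_run": True, "environment": None, "options": install_options(bot_dir)}
    # Step 3: otherwise activate the recorded environment and check requirements there.
    return {"first_run": False, "environment": env, "options": []}
```

Calling `plan_launch(Path("ad_discordbot"))` on a fresh checkout would report a first run and list whichever install options apply.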
#

I plan on segregating the TGWUI integrated features such as Extension management, and anything else that won't be compatible with API.

#

Finally - I'm replacing the Updater scripts with update wizards just like TGWUI has. The wizard will have option to switch from TGWUI to Standalone (vice versa if TGWUI is detected)

valid crypt
#

also i dont think it would work now

halcyon quarry
#

My ultimate vision is that the bot can host any number of APIs, so long as user wants to duplicate a block of code and fill in some lines

#

use the tags system to call whatever API and do whatever with the response

#

Need to inch my way in that direction and it starts with still recommending, but un-requiring TGWUI

valid crypt
#

i want to turn my lights on :p

halcyon quarry
#

ad_discordbot has ya covered 👍

valid crypt
#

i think there is a open source api for that 👍

halcyon quarry
#

From the TGWUI installer:

@rem figure out whether git and conda needs to be installed
call "%CONDA_ROOT_PREFIX%\_conda.exe" --version >nul 2>&1
if "%ERRORLEVEL%" EQU "0" set conda_exists=T

@rem (if necessary) install git and conda into a contained environment

Realizing that this startup script actually doesn't install git though 😛

#

at the end it calls another script which does, but still

halcyon quarry
#

Sorry I didn't finish up the TTS stuff yet

valid crypt
#

no hurry

valid crypt
#

technically 2 bot can log into the same account but cant use the same voice channel

#

hmmmmm

#

why stt if no tts :(

halcyon quarry
#

I'm actually glad to be splitting up these "generic tags" processing, the few things being added here are essentially duplicates in both llm tag processing and img tag processing functions

#

Will just call this ahead of those with the 'phase' as positional argument

#

@valid crypt I pushed the changes to the same branch

#

This now processes "generic" tag matches immediately after matching them, and before handling LLM and Img specific functions

#

So you should now be able to toggle the voice channel playback regardless of whether anything is being generated or not

#

this handles the following tags: flow, toggle_vc_playback, send_user_image and persist
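
As a rough illustration of that split (not the bot's actual code), the generic tags could be separated from the phase-specific ones right after matching. The function name and the dict shape of the matched tags are assumptions; only the four tag names come from the message above:

```python
from typing import Any, Dict, Tuple

# Tag names taken from the message above; everything else here is hypothetical.
GENERIC_TAGS = ("flow", "toggle_vc_playback", "send_user_image", "persist")


def split_generic_tags(matched: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (generic, phase_specific) tag values from one set of matched tags."""
    generic = {key: value for key, value in matched.items() if key in GENERIC_TAGS}
    specific = {key: value for key, value in matched.items() if key not in GENERIC_TAGS}
    return generic, specific


# The generic part can be applied immediately, even when nothing is generated.
generic, specific = split_generic_tags(
    {"toggle_vc_playback": "pause", "should_gen_text": False}
)
```

Here `generic` would hold the playback toggle while `specific` keeps `should_gen_text` for the LLM phase.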

#

If you confirm it handles the VC playback expectedly I'll push it to main

valid crypt
#

got it

#

oops

#

wrong channel :v no one saw anything

halcyon quarry
#

rats, missed it

valid crypt
#

at least with alltalk remote should tts doesnt work, and when a new message is sent the current audio cant be paused

valid crypt
halcyon quarry
#

right, the issue with that is the extension params must have changed or something, or the directory name is messing it up

#

the bot controls the TTS by hijacking TGWUI's extension loader and updating parameters

#

I should be able to figure that one out - this is an alltalk-remote specific issue though.

valid crypt
#

pause and resume tags are pretty good

halcyon quarry
#

Thanks a lot for beta testing these improvements

#

And the suggestions, all good ones

halcyon quarry
#

@valid crypt if you want to help debug this alltalk thing

#
        print("EXTENSION ARGS:", shared.args.extensions)
        print("EXTENSION ARGS:", shared.settings)
valid crypt
#

🫡

#

erm, line?

#

well gonna search

halcyon quarry
#

def on_ready

#

Did you have to do anything special to get the remote thing working, or just follow the steps carefully?

valid crypt
#

nothing special

halcyon quarry
#

I'm going to go ahead and try getting things up and running on my end since I have a little bit of time on my fingertips

valid crypt
#

settings are changed either through webui or directly in alltalk

halcyon quarry
#

Ok so in your basesettings or in your character, etc

#

There is the bots custom extension support

#

If you have an alltalk_tts dictionary key just try renaming it to alltalk_remote_tts

valid crypt
#

?

#

i dont think that would work, those settings are for v1 and im using v2 remote

halcyon quarry
#

try it

valid crypt
#

oki

#

should tts dont work

halcyon quarry
#

rats

halcyon quarry
#

These are the correct extension args Im pretty sure

#

But I need to update the bot to ensure certain keys behave correctly...

halcyon quarry
#

I’m working out some more kinks, like adding exceptions for when current method to “get voices” fails

#

Plan to add TTS API.

#

It seems like the remote extension ignores settings defined from settings.yaml, and only applies changes via gradio

halcyon quarry
#

I expect the author of alltalk will clarify whether the alltalk v2 remote extension can be controlled in the same way as other extensions (via TGWUI extension arguments).

valid crypt
#

:V

valid crypt
#

ive noticed something funny, at least for all talk remote, an api request to tgwui will trigger tts XD

halcyon quarry
#

I have API working for /speak command now

#

@valid crypt I just pushed changes to that tts branch, which prevents the script from crashing when trying to collect voices for the /speak command

#

Also a few api settings added to config.yaml

#

The bot can now use the /speak command with alltalk v2.

#

You can safely rename the extension folder to anything with the phrase 'alltalk' in it and it should behave the same (alltalk, alltalk_remote, etc)

#

These additions are kind of a hotfix - I have much bigger plans for API stuff, it's really going to be an overhaul on the bot

#

gotta change my focus back to the install process, update wizard scripts, etc

valid crypt
halcyon quarry
#

The one you shared in the past when the directory was named "alltalk_tts" occurred because it tried importing a function that didn't exist in the new alltalk v2, and I did not have it in a "try / except" block

#

I revised the logic so that if there is a specified tts voices endpoint, it will first attempt to collect the voices using it. If it fails, it will try using the original methods I had. If that fails, it now just disables the voices option in /speak cmd

#

rather than just crashing and burning
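
A minimal sketch of that fallback order, assuming each way of collecting voices is wrapped in a callable. The names `collect_voices`, `from_endpoint` and `from_extension` are made up for the example and are not the bot's functions:

```python
from typing import Callable, List, Optional, Sequence


def collect_voices(sources: Sequence[Callable[[], List[str]]]) -> Optional[List[str]]:
    """Try each source in order; return the first non-empty voice list, else None."""
    for source in sources:
        try:
            voices = source()
        except Exception as exc:
            # A failing source is logged and skipped instead of crashing the bot.
            print(f"Voice source {getattr(source, '__name__', source)} failed: {exc}")
            continue
        if voices:
            return list(voices)
    # None signals the caller to disable the voices option in the command.
    return None


def from_endpoint() -> List[str]:
    raise ConnectionError("endpoint not reachable")  # stand-in for a failed API call


def from_extension() -> List[str]:
    return ["female_01.wav", "male_01.wav"]  # stand-in for the older import-based method


voices = collect_voices([from_endpoint, from_extension])
```

If every source fails, `collect_voices` returns `None` and the caller can simply hide the voice choices.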

valid crypt
#

after some testing, i think if there is a message with no gen text, and gets deleted after some milliseconds, the input that comes afterward is not detected

valid crypt
#

example, my bot sends pause tag after receiving audio packets and deletes it

valid crypt
halcyon quarry
#

I don't really understand what you're describing

#

Whenever the bot is triggered to interact in any way, such as a message request, etc - it collects the information it needs, creates a task, and queues it.

It then processes each task sequentially

#

It can queue up new tasks while it is processing the current task
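
For illustration only, that queue-then-process-in-order behavior looks something like this with `asyncio.Queue`. This is a generic sketch and not the bot's task classes; `worker`, `run_demo` and `job` are hypothetical names:

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

TaskFactory = Callable[[], Awaitable[str]]


async def worker(queue: "asyncio.Queue[Optional[TaskFactory]]", results: List[str]) -> None:
    # Process one task at a time, in the order they were queued.
    while True:
        factory = await queue.get()
        if factory is None:  # sentinel: no more tasks
            break
        results.append(await factory())


async def run_demo() -> List[str]:
    queue: "asyncio.Queue[Optional[TaskFactory]]" = asyncio.Queue()
    results: List[str] = []
    worker_task = asyncio.create_task(worker(queue, results))

    async def job(name: str, delay: float) -> str:
        await asyncio.sleep(delay)
        return name

    # The slower job is queued first and still finishes first,
    # because the worker never runs two tasks at once.
    await queue.put(lambda: job("first", 0.02))
    await queue.put(lambda: job("second", 0.0))
    await queue.put(None)
    await worker_task
    return results
```

Running `asyncio.run(run_demo())` returns the names in queue order, regardless of how long each job takes.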

valid crypt
#

i dont know either :v

#

just my bot is getting ignored

halcyon quarry
#

There is a behavior setting called "chance to reply to other bots"

#

maybe it's at 0.0?

valid crypt
valid crypt
#

but if i speak longer which means sending the text later, works

#

weird

halcyon quarry
#

By default, chance_to_reply_to_other_bots: 0.0

#

In dict base settings

valid crypt
valid crypt
halcyon quarry
#

Here is the code that applies the vc playback tag

#
    async def toggle_playback_in_voice_channel(self, guild_id, action='stop'):
        if self.guild_vcs.get(guild_id):          
            guild_vc:discord.VoiceClient = self.guild_vcs[guild_id]
            if action == 'stop' and guild_vc.is_playing():
                guild_vc.stop()
                log.info(f"TTS playback was stopped for guild {guild_id}")
            elif (action == 'pause' or action == 'toggle') and guild_vc.is_playing():
                guild_vc.pause()
                log.info(f"TTS playback was paused in guild {guild_id}")
            elif (action == 'resume' or action == 'toggle') and guild_vc.is_paused():
                guild_vc.resume()
                log.info(f"TTS playback resumed in guild {guild_id}")
#

If the value is stop and something is currently playing, it will stop.
If the value is pause or toggle and it's currently playing, then it will pause.
If the value is resume or toggle and it is currently paused, then it will resume.

valid crypt
#

also i was trying to get as little latency as possible so i changed the stream chance to 2 and it is not always splitting (exclamation mark should split)

#

same here

halcyon quarry
#

I'm contemplating this now

valid crypt
halcyon quarry
#

The text splitting is super complicated btw lol

#

What makes it super complicated, is that longer syntax such as \n\n will never trigger without some very complicated logic

halcyon quarry
#

I made a system where it creates a little "window of text" to check (it is only evaluating like 5 characters at a time).
If it matches on a shorter syntax (like ".") it will set a flag to not split the text, and wait one more iteration.

#

Test something for me, we'll just increase the window a little bit

#

print("matched syntax:", syntax, "window:", check_window)

#

Try increasing the window size here to 3 or 4

#

I can test this out myself in a bit but right now I'm working on something important work related

valid crypt
#

ok

halcyon quarry
#

I'll have to look into it a bit more, I thought I had it figured out 100% but apparently not

valid crypt
#

this time worked, looks like that it isn't very consistent

valid crypt
#

i think ive discovered something

halcyon quarry
#

What did you discover? 😛

valid crypt
#

if the sentence has rain or pain it doesnt work if the message was sent by my bot

#

what logic is this .-.

halcyon quarry
#

I suppose you have those associated with tags?

valid crypt
#

but if i send it myself it works

halcyon quarry
#

You could put a print before the return to see if it is ignoring messages

valid crypt
#

i logged into my bot's account

#

:O

halcyon quarry
#

I don't think I have anything hardcoded for "ai"

#

This is where it decides whether it will reply to a message or not

valid crypt
#

only happens to my bot

#

what do i have to print?

halcyon quarry
#

Ahhh

valid crypt
#

😱

#

discrimination

#

XD

halcyon quarry
#

I think I'm starting to understand what's happening here...

#

If you change reply_to_bots_when_addressed to 1.0 I believe it will solve this

#

I need to revise the logic to avoid this issue

#

I think this logic sort of makes sense if it is matching the whole word for the bot name, but it's not; it is triggering if the bot's name is anywhere in the text string at all

halcyon quarry
#

Like if you only want it to reply to another bot which said "Hey ai, tell me about"

#

It's currently rolling probability for reply_to_bots_when_addressed if the bot's name (ai) is literally anywhere in the text

valid crypt
#

fixed

#

XD

halcyon quarry
#

Could you please put it back to 0.0, and update line 6645 with this?

if message.author.bot and re.search(rf'\b{re.escape(last_character.lower())}\b', text) and main_condition:

#

this should now only trigger on whole words
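
To show the difference outside the bot, here is a small standalone check comparing a plain substring test with a whole-word regex. The helper name `name_mentioned` is made up, and the case-insensitive flag is an assumption added for the example (the line above lowercases only the name):

```python
import re


def name_mentioned(name: str, text: str) -> bool:
    """True only if `name` appears in `text` as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(name)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Substring matching is what makes "rain" and "pain" look like mentions of "ai".
substring_hit = "ai" in "rain and pain"
whole_word_hit = name_mentioned("ai", "rain and pain")
greeting_hit = name_mentioned("ai", "Hey AI, what's up?")
```

With this helper, words that merely contain the name are ignored, while "Hey AI" style messages still count as addressing the bot.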

#

You could test with rain, pain, etc - and also "Hey ai, what's up?"

valid crypt
#

ok

halcyon quarry
#

Ok - I had it wrong 😛

#

nvm... ehhhh

#

I'll play around with this myself

#

@valid crypt Try with the bots actual casing like is it "AI" ?

valid crypt
halcyon quarry
#

how about Hey AI dude

valid crypt
halcyon quarry
#

you did change that setting back to 0.0 right?

valid crypt
#

i removed it

halcyon quarry
#

and you restarted the script?

valid crypt
#

i rebooted and reselected 👍 i dont think i missed

halcyon quarry
#

thanks 🙂

#

alright I'll have to tinker with it later on

valid crypt
#

i didnt know that tags could take effect in real time

valid crypt
#

by tweaking to get maximum performance i've noticed that the token per second is wrong, it includes the time that the tts takes to generate the audio

#

tts off

#

:P

#

average 3s to get the first stream tts, arrrgh, i want it to be faster and faster

valid crypt
halcyon quarry
#

Pushed the recent changes to Main

halcyon quarry
valid crypt
#

i really think that it is being affected

halcyon quarry
#

likely!

#

well, just the TTS model being loaded means less ram and/or vram available to TGWUI (I think that's how it works)

valid crypt
valid crypt
halcyon quarry
#

aha

#

Well you have 1000 more tokens in context

valid crypt
#

from tgwui, with tts, toggled off tts

valid crypt
#

ill dig deeper tomorrow

halcyon quarry
#

🤔

valid crypt
#

No building for /speak or anything

#

Well I'm dying right 😪

#

Now

halcyon quarry
#

I just pushed an update

#

I did overlook a few little things 😛

#

Note that the bot currently requires the extension name to be in config.yaml - even if you have TGWUI flags to launch it

#

so maybe you updated the folder name but not the name in config.yaml

halcyon quarry
#

There’s a key under “tts_settings” offhand I think it is “tts_client” which is expecting one string value

#

You can launch more extensions with the bot via CMD_FLAGS

valid crypt
#

lets say, streaming tts works, but it modifies the token/s from tgwui, and tts makes bot much slower

halcyon quarry
#

I think that typically, you'll generate all the text while it maximizes VRAM usage, then memory moves around as the TTS model gets to do its thing.
But with streaming we need to jump back and forth between both models so the memory gets shifted around more

valid crypt
#

something that i've noticed is that pause tag works just fine but it takes around 0.05 to 0.2s, while my discord ping is 4ms

halcyon quarry
#

ah yeah derp

#

keep forgetting you have 2 machines

#

But yes, I'm sure TGWUI starts determining the tokens /sec at the beginning of text generation, and stops at the end of the entire job, but with the TTS streaming the bot hijacks normal behavior adding all the TTS time to the total time.
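
As a back-of-the-envelope sketch of that effect: if the time spent in TTS were measured separately, it could be subtracted before computing the rate. The function name and the idea of tracking `tts_seconds` are assumptions, since neither TGWUI nor the bot is known to expose this:

```python
def tokens_per_second(tokens: int, total_seconds: float, tts_seconds: float = 0.0) -> float:
    """Generation rate with any separately measured TTS time removed from the total."""
    generation_seconds = total_seconds - tts_seconds
    if generation_seconds <= 0:
        raise ValueError("TTS time must be smaller than the total time")
    return tokens / generation_seconds


# Figures from the log earlier in this channel: 45 tokens in 5.04 seconds.
reported = tokens_per_second(45, 5.04)
# Hypothetical: if 1.04 s of that total had been spent waiting on TTS.
adjusted = tokens_per_second(45, 5.04, tts_seconds=1.04)
```

The `reported` value matches the 8.93 tokens/s in the log, while `adjusted` shows how much higher the rate reads once the TTS wait is excluded.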

halcyon quarry
valid crypt
#

could be python or too much code, although i think that tts api could improve the speed

#

how do i log the time when the bot gets the tts?

halcyon quarry
#

There are limits to how fast discord can accept data from a single source - and these are applied automatically

#

So, when we are sending text as well as a "pause()" or "stop()" cmd, etc - these are all automatically throttled

#

If you search for " def apply_extensions" you'll find the function that applies the tts

#

could add a print statement there before and after the process is called

#

print("TIME START:", time.time())

print("TIME END:", time.time())

valid crypt
halcyon quarry
#

Yep thats good

#

might also want to add

#
print("TIME START:", start)
#
print("TIME END:", end)
#

print("TIME SPENT:", end - start)
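
If the same timing is needed in several places, a small context manager keeps it tidy. This is just a convenience sketch; `timed` and the `sink` dict are made-up names, and `time.perf_counter()` is swapped in for `time.time()` because it is better suited to measuring short intervals:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


@contextmanager
def timed(label: str, sink: Optional[Dict[str, float]] = None) -> Iterator[None]:
    """Print (and optionally store) how long the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if sink is not None:
            sink[label] = elapsed
        print(f"TIME SPENT ({label}): {elapsed:.3f}s")


timings: Dict[str, float] = {}
with timed("tts", timings):
    time.sleep(0.01)  # stand-in for the call that applies the TTS extension
```

Wrapping the TTS call and the Discord send separately would show which of the two accounts for the missing time.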

valid crypt
#

the instant i hear the audio it gets printed, where 0.02s is from network (actually lower), 0.73s from alltalk server round it to 1.25s there is a lot of time missing

#

hmmm

valid crypt
halcyon quarry
#

yep

valid crypt
#

api might help

#

if you didnt know what would happen when i try to use speak, i suppose that it is meant to work

valid crypt
halcyon quarry
#

I had successfully used the speak command with my setup

#

with alltalk remote

#

need to look into your error when I have a sec and see what's going on

halcyon quarry
#

Ok I see I do have an issue with some of the logic initializing the TTS extension

#

particularly when it does not end with _tts

#

fixing this

halcyon quarry
#

@valid crypt I have it fixed nice now

#

will be pushing it in a sec

#

I changed it so that whatever the heck extension is set in config.yaml as the tts client, it's gonna load it up.

#

If any additional TTS extensions try loading from flags, etc - it's going to warn that only one client can load and it will only load the configured tts extension

#

Might revisit this logic later when I get the API suite all worked out

valid crypt
#

discord just did an ui overhaul .-.

#

i think that it still doesnt work

#

how did you name your alltalk, did you put any params?

halcyon quarry
#

Just pushed the fixes to Main

valid crypt
#

i updated already

#

oh

#

you mean now?

#

there was another update that confused me :v

valid crypt
halcyon quarry
#

All should be good in the update I pushed 6 mins ago

#

"Improve TTS / Extension loading"

valid crypt
#

main or not main

halcyon quarry
#

Main

#

I deleted the TTS branch

valid crypt
#

then idk why it doesnt work

#

i touched this

halcyon quarry
#

Where is that last screenshot from?

valid crypt
valid crypt
halcyon quarry
#

🤷‍♂️

#

Can't see how you could be getting that error

valid crypt
#

and i dont have params in character

halcyon quarry
#

My setup is basically the same. Same, no params in character.

#

That log message isn't 100% accurate

#

Does your startup look similar to the screenshot I posted?

#

Loading your configured TTS extension "alltalk_remote"

#

It will only print this if the extension is tts.client, and the key that speak command is trying to access is that value

valid crypt
#

i just renamed it back with _tts

halcyon quarry
#

try as alltalk_remote

valid crypt
#

the first try got new error this time

#

but ended with the same

#

i missed something nothing

#

are you using the xtts?

#

got this

valid crypt
#

also i gtg 💤 i wish, my bot could be added to your readme as temporal stt solution, i tried my best... it sends tts tags!
at least ill dream it

halcyon quarry
#

Yeah I’m using xtts so that could be part of the issue…

halcyon quarry
# valid crypt got this

Upon reviewing the handling for speak cmd, that warning just means no voice was selected and there was no voice param. But this shouldn't yield an actual error if you have alltalk running with RVC or whatever

#

the api request will just be request = {'text_input': self.text} which is the minimum required information needed for a successful response

halcyon quarry
#

Do you know any very simple extensions that respect default args set in TGWUI’s settings.yaml? I think I just need to review how that’s applied, tweak alltalk remote code and send a PR

#

Ok offhand I think it was like setup() or atsetup() - just need to take a peek there in the remote…

valid crypt
halcyon quarry
#

Could be the other way around > TGWUI sets the extensions parameters so long as they are formatted expectedly

#

in any case it's the reason why for alltalkv1, edge, vits, etc - the bot is able to update parameters on the fly

#

(because you can set parameter values in TGWUI settings.yaml and they actually take effect)

valid crypt
#

then edge tts

#

?

halcyon quarry
#

eh... I'll figure it out

#

@valid crypt Is the /speak command working, aside from that warning message?

valid crypt
#

vit simple api too

#

if you did something after i went to sleep idk

halcyon quarry
#

No, but as far as I can tell it should be sending the minimal API request and should not yield an actual error

#

such as an invalid voice etc... if none selected

valid crypt
#

i wont be able to test anything for a few days as my intel cpu is definitely cooked...

valid crypt
#

forgot that i have a second machine 😓

valid crypt
#

my second machine also gives the same error and it is not working, the no voice selected happens when the extension or the name is alltalk_remote without _tts

#

you could try downloading a vits model

valid crypt
#

brain aint braining, im downloading xtts :v

#

i'll do a clean install of the bot

valid crypt
halcyon quarry
#

Can't reproduce on my end

#

The only thing you shared that was different from what I have, is I do not include the extension in TGWUI's settings.yaml

#

I only put it in the bot's config.yaml

valid crypt
#

let me try that

#

that might be the only difference, i just did a clean install and got the same error so...

halcyon quarry
#

I hadn't said anything because I was thinking if that could happen when I was looking at the code but it didn't seem likely...

valid crypt
#

failed ;-;

#

i literally did a clean install, entered the token set voice channel, add the extension name in config... hmmm

#

wait, i removed alltalk from settings but still loading it

valid crypt
halcyon quarry
#

Yes, thats what I said - I only load the TTS from bot's config

#

If that's actually the solution to your issue I should be able to prevent that kind of conflict...

valid crypt
#

still dont work

valid crypt
# valid crypt

literally fresh install maybe i missed something that i have to do?

halcyon quarry
#

I'm super busy but Ill see if I can help for a sec

valid crypt
#

from the fresh install i only added the extension in config this time as it is on my second machine i didnt change the ip

halcyon quarry
#
            loop = asyncio.get_event_loop()
            if tts.api_mode == True:
                request = {'text_input': self.text}
                print("tts.client:", tts.client)
                print("tts_args:", tts_args)
                client_args:dict = tts_args[tts.client]
#

hmm

#

Alright, I see the problem finally

#

Honestly unsure why I didn't get error when I tested

#

Actually, the reason I didn't get error was because I chose a voice each time

valid crypt
halcyon quarry
#

I can test this myself later but I believe if you just replace the whole process_speak_args() function with this,

#

it should do the trick

#
async def process_speak_args(ctx: commands.Context, selected_voice=None, lang=None, user_voice=None):
    try:
        tts_args = {tts.client: {}}
        if lang:
            if tts.client == 'elevenlabs_tts':
                if lang != 'English':
                    tts_args[tts.client].setdefault('model', 'eleven_multilingual_v1')
                    # Currently no language parameter for elevenlabs_tts
            else:
                tts_args[tts.client].setdefault(tts.lang_key, lang)
                tts_args[tts.client][tts.lang_key] = lang
        if selected_voice or user_voice:
            tts_args[tts.client].setdefault(tts.voice_key, 'temp_voice.wav' if user_voice else selected_voice)
        elif tts.client == 'silero_tts' and lang:
            if lang != 'English':
                tts_args = await process_speak_silero_non_eng(ctx, lang) # returns complete args for silero_tts
                if selected_voice: 
                    await ctx.send(f'Currently, non-English languages will use a default voice (not using "{selected_voice}")', ephemeral=True)
        elif tts.client in tgwui.last_extension_params and tts.voice_key in tgwui.last_extension_params[tts.client]:
            pass # Default to voice in last_extension_params
        elif f'{tts.client}-{tts.voice_key}' in shared.settings:
            pass # Default to voice in shared.settings
        else:
            await ctx.send("No voice was selected or provided, and a default voice was not found. Request will probably fail...", ephemeral=True)
        return tts_args
    except Exception as e:
        log.error(f"Error processing tts options: {e}")
        await ctx.send(f"Error processing tts options: {e}", ephemeral=True)
halcyon quarry
#

did it work?

#

Actually I just realized an even easier fix, I know it will work

#

I just pushed it now

valid crypt
#

it worked, and i trust your easier fix, how do you get the audio file with the api, same as capturing the scr?

#

hijacking the extension?

halcyon quarry
#

Nope

#

er

#

When using the /speak command the bot uses an API call and gets the response directly.
When chatting normally, TGWUI runs the remote extension which makes the API call which returns the audio file, and it's detected in the response from TGWUI same as all the others

#

Not really hijacking the extension per se - more like hijacking the entire TGWUI chatbot wrapper function

#

Normally it will only apply TTS extensions after the complete text generation

#

But I monkeypatch that function, and apply TTS extensions any time the bot wants to split text
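
The general idea can be shown with a toy example. Nothing below is TGWUI's real API: `webui`, `stream_text` and `patch_stream` are stand-ins invented for the sketch, and the sentence check is far simpler than the bot's actual text splitting:

```python
from types import SimpleNamespace
from typing import Callable, Iterator


def stream_text(prompt: str) -> Iterator[str]:
    """Stand-in streaming generator: yields the cumulative reply so far."""
    reply = "Hello there. How are you?"
    for i in range(1, len(reply) + 1):
        yield reply[:i]


# Stand-in for an imported module whose function gets replaced at runtime.
webui = SimpleNamespace(stream_text=stream_text)


def patch_stream(module: SimpleNamespace, on_sentence: Callable[[str], None]) -> Callable:
    """Monkeypatch module.stream_text so each finished sentence is handed off early."""
    original = module.stream_text

    def patched(prompt: str) -> Iterator[str]:
        sent_upto = 0
        text = ""
        for text in original(prompt):
            # Naive split rule for the sketch: hand off whenever a sentence just ended.
            if text.endswith((".", "!", "?")) and len(text) > sent_upto:
                on_sentence(text[sent_upto:].strip())
                sent_upto = len(text)
            yield text
        if len(text) > sent_upto:  # flush any trailing text without punctuation
            on_sentence(text[sent_upto:].strip())

    module.stream_text = patched
    return original  # keep a handle so the patch can be undone
```

Callers of `webui.stream_text` still receive the same stream, but `on_sentence` (where a TTS call would go) now fires per sentence instead of once at the end.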

valid crypt
#

hmmm, it feels slow for a direct call, the first video is raw, in the second one i added a black screen after completing the generation and when it sends the audio, which is when the warn appeared

halcyon quarry
#

What's the problem?

valid crypt
#

slow

#

just that

halcyon quarry
#

Enable Deepspeed 😛

#

Alright, maybe it's due to the discord message sending over and over

valid crypt
halcyon quarry
#

Try deleting these 3 lines

#

in async def process_speak_args

#

Beyond that, there's nothing I can really do to improve the speed

valid crypt
#

slimming down >:D

halcyon quarry
#

well, until I have true API stuff implemented

halcyon quarry
#

The bot functions don't add any time

valid crypt
halcyon quarry
#

it's just generational tasks and discord interactions that slow things down

valid crypt
#
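import json
import requests
from threading import Thread

# Note: voice_path, notice(), play_voice() and tts_menu are presumably defined elsewhere in the script (not shown here)
# Configure the OpenAI-compatible (AllTalk) TTS endpoint URL (change IP and port as needed)
openai_tts_url = "http://192.168.1.14:7851/v1/audio/speech"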
def generate_openai_tts(text: str, voice: str = "nova", speed: float = 1.0,
                          model: str = "any_model_name", response_format: str = "wav") -> bool:
    """
    Call the OpenAI-compatible (AllTalk) TTS endpoint to generate speech audio.
    
    Parameters:
        text: The text input (max 4096 characters).
        voice: The voice to use. Supported values: 'alloy', 'echo', 'fable', 'nova', 'onyx', 'shimmer'.
        speed: Playback speed (between 0.25 and 4.0; default 1.0).
        model: Model identifier (currently ignored but required).
        response_format: Audio format (e.g. 'wav').

    Returns:
        True if audio was successfully saved, False otherwise.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed
    }
    headers = {"Content-Type": "application/json"}
    try:
        resp = requests.post(openai_tts_url, data=json.dumps(payload), headers=headers)
        if resp.status_code == 200:
            with open(voice_path, "wb") as f:
                f.write(resp.content)
            return True
        else:
            notice(f"Error: {resp.status_code} - {resp.text}")
            return False
    except Exception as e:
        notice(f"OpenAI TTS request error: {e}")
        return False

        elif tts_menu.get() == "OpenAI TTS":
            # Use the new OpenAI-compatible API endpoint
            # You can customize the voice and speed here if desired.
            if generate_openai_tts(text, voice="nova", speed=1.0):
                play_voice()

i might have some import missing

valid crypt
#

100% ~~not~~ stolen 👍

halcyon quarry
#

I've got big plans my dude

#

big big plans

valid crypt
halcyon quarry
#

oops yeah I started typing this warning

#

yeah clear those 2 lines

#

I'm deleting this message

#

I just deleted those 2 lines and pushed it

valid crypt
#

deleting those i get double warns but works just fine

valid crypt
halcyon quarry
#

I'll try to debug this later

#

What warning?

valid crypt
#

actually this is useful for me :p ill pass the footage

halcyon quarry
#

ugh

valid crypt
#

there is definitely 0.5s of room to improve, if the second delay is not discord that would be another 1s

halcyon quarry
#

That error is not discord so there's no improvement

#

that's just a warning from the bot

valid crypt
#

i wrote delay, it took 1s to send the audio file

halcyon quarry
#

Ok - by "There is no improvement" I mean, when using TGWUI the way the bot is now

valid crypt
#

oh ok

halcyon quarry
#

One more thing....

#

I noticed that your print statements included that your character will go idle

#

Offhand I don't think the responsiveness setting / go idle crap would add delay to this processing

valid crypt
#

it has no responsiveness setting

valid crypt
#

Trying to fix for one last, I messed up with my system and I only can boot into safe mode, hope I can fix it

#

lesson of the day, do restore point in different drives and never touch display drivers

halcyon quarry
#

When I have a chance, I’ll update the extension argument handling, so it doesn’t spam warnings when there’s really no problem

#

When I coded this feature, the main focus was handling extensions with XTTS, and I was doing a lot of shooting from the hip to actually get it working fast

#

Still a few rough edges to smooth out

valid crypt
halcyon quarry
#

The bot code currently cannot directly support TTS API for normal requests - it's the remote extension for TGWUI that is doing the api calls

#

It is using the API for the /speak command

#

Need to update a crapton of logic in the bot and I don't have the time to do that at the moment

valid crypt
#

alr

halcyon quarry
#

Think about it though, it wouldn't save any time

#

Well

#

I'd have to include an option to simultaneously generate text and TTS

#

and the only sane reason for a user to do this would be if they have dedicated computer for each

valid crypt
#

what i was wondering is why is it taking so much (0.5s) to do a call, while by using the web (remote) it is almost instant

halcyon quarry
#

The bot could then go ahead and happily buzz along generating the text uninterrupted - and on the bot end, I'd need for the responses to wait for TTS gen to complete so it can deliver both chunks at once

valid crypt
#

same for the extension, the text is generated for 0.5s and then it send the call

halcyon quarry
#

Yes - that's because there is no stop and go

#

generates 100% text, generates TTS - sends both

#

Bot, back and forth back and forth

#

It will take a very sophisticated framework to allow both at once, with dedicated computers

#

due to the streaming responses feature

#

otherwise, very simple

#

The current framework is already very complicated - so it's going to be quite a can of worms overhauling it

#

Not saying it won't happen but it's going to be a little while - and I am moving in that direction
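
A minimal sketch of the "generate text and TTS at once" idea from the messages above, assuming text arrives in chunks and TTS runs on a separate machine. This is not the bot's actual code: `generate_text_chunks`, `generate_tts`, and `respond` are hypothetical stand-ins that only sleep. Each chunk's TTS starts as soon as the chunk exists, and the consumer waits for the audio so text and audio are delivered together, in order.

```python
import asyncio

async def generate_text_chunks():
    # Stand-in for streamed text generation: yields one chunk at a time.
    for chunk in ["Hello there.", "How are you?", "Goodbye."]:
        await asyncio.sleep(0.01)  # pretend the LLM is working
        yield chunk

async def generate_tts(chunk):
    # Stand-in for a TTS request sent to a separate machine.
    await asyncio.sleep(0.03)
    return f"<audio for {chunk!r}>"

async def producer(queue):
    async for chunk in generate_text_chunks():
        # Start TTS right away; text generation continues while it runs.
        task = asyncio.create_task(generate_tts(chunk))
        await queue.put((chunk, task))
    await queue.put(None)  # sentinel: no more chunks

async def consumer(queue, send):
    while (item := await queue.get()) is not None:
        chunk, task = item
        # Wait for this chunk's audio so both parts go out together.
        send(chunk, await task)

async def respond(send):
    queue = asyncio.Queue()
    await asyncio.gather(producer(queue), consumer(queue, send))

if __name__ == "__main__":
    asyncio.run(respond(lambda text, audio: print(text, audio)))
```

On a single machine the two generators would still compete for the same GPU, so this ordering only pays off with separate hardware, as noted above.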

valid crypt
#

actually i can try all in the same machine, and then i'll tell if there's a lot of performance bottleneck or an actual problem

halcyon quarry
#

If you stick a print statement anywhere in the bot code, you'll see that code execution is almost instant, everywhere.
The only slowdowns are discord interaction and generative tasks
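
A small sketch of the print-statement approach, assuming nothing about the bot's internals: `timed` is a hypothetical helper and `fake_generation_task` is a stand-in that only sleeps. Wrapping a section in it prints the elapsed time, which makes it easy to see whether a delay comes from plain code or from a generative call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, log=print):
    """Log how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log(f"{label}: {elapsed_ms:.1f} ms")

def fake_generation_task():
    # Stand-in for a generative call (text, TTS, image); it just sleeps.
    time.sleep(0.05)

if __name__ == "__main__":
    with timed("plain bot logic"):
        sum(range(1000))
    with timed("generation task"):
        fake_generation_task()
```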

valid crypt
# valid crypt ?

if i want to print when does it do the api call, where would it be, in the all talk extension? i want to generate text entirely and no streaming to test the latency

#

i also want to print when all talk receives the api call

halcyon quarry
#

search for chatbot_wrapper in TGWUI/modules/chat.py

#

wayyy at the bottom, it will trigger TTS immediately before returning the complete text response

#

ad discordbot applies this logic whenever the bot decides to split text, instead of waiting until 100% text generated

valid crypt
#

aww tgwui

#

was searching in ad bot :p

halcyon quarry
#

If you disable the bot's streaming text setting in config.yaml

#

it will behave the same as TGWUI

#

You can find the modified version in the bot by also searching chatbot_wrapper

valid crypt
#

right before blaming your bot should test tgwui first :P

#

where i print when it finishes?

#

tgwui

halcyon quarry
#

in the screenshot I shared, the first print is when text is done generating and it is about to request TTS

#

the second print is when it's going to return the text and audio.

valid crypt
#

alr tgwui has a huge problem

halcyon quarry
#

oh?

valid crypt
#

around 2.5s and the tts is done in 1.05 like with the bot

valid crypt
halcyon quarry
#

🤓

#

Bot checks out

#

the huge optimizations that you'll see in open source crap is like, different computational algorithms and stuff

#

the bot is just a whole bunch of normal simple braindead operations wrapped around these complex generative models

#

I've been messing around with it enough to know, there are no intentional pauses anywhere unless user wants to play around with responsiveness

valid crypt
#

i think that the text is generated and saved instantly, but the way it uses the extension is not very efficient

#

nope, alltalk is slow i think, the last thing is the time when it does the call

valid crypt
#

there is something making everything slower, the first print is when it receives the api call, and the second one is when it sends

valid crypt
#

ayo i found a way to do this lightning fast but idk how

#

example of tgwui api call

#

the part that is taking extra time is completing the tts to send or just generation completed

#

but this only happens using the tgwui api call

#

instead, if you use the alltalk :7851 to generate tts it is done instantly and skips the completing process

valid crypt
halcyon quarry
#

ok so I finally have the installer changes hammered out. With this framework the bot could theoretically support additional deep integrations like the TGWUI one

halcyon quarry
#

This bit was definitely not my forte

halcyon quarry
#

Wizard 🌈

halcyon quarry
#

OK - I can't say for sure if these are bug free or not but I updated all the launchers and replaced all the updaters with wizards

#

This is on the "unrequire TGWUI" branch https://github.com/altoiddealer/ad_discordbot/tree/unrequire_tgwui

halcyon quarry
#

Marcos you only use windows yes?

valid crypt
#

im a windows boy

halcyon quarry
#

#

Tried linux in the past to more efficiently run a little budget HTPC (home theater PC) and it was a very painful experience

valid crypt
#

XD, i only use wsl if windows is not supported

halcyon quarry
#

Are you able to try running the WSL launcher on that branch?

#

That’s the one I most expect to be broken lol

valid crypt
halcyon quarry
#

Ah yeah

valid crypt
#

but i can test with windows 🙂

halcyon quarry
#

The windows one will work up until it actually loads the bot haha

#

The hard part is done though I can probably finish the job tonight or tomorrow

#

On first run it determines if the parent directory is TGWUI. If it is, it gives 2 options: integrated install (uses TGWUI environment and will enable more features) or Standalone (create and use own env)

#

Also included a method to check if parent is a fork of TGWUI and allow that
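
A rough sketch of the first-run check described above, not the launcher's real logic. It assumes a TGWUI checkout (or a fork) can be recognized by a few files in the parent directory; the marker list, `parent_is_tgwui`, and `choose_install_mode` are all hypothetical.

```python
from pathlib import Path

# Assumption: these files are treated as the fingerprint of a TGWUI checkout or fork.
TGWUI_MARKERS = ("server.py", "modules/chat.py", "modules/shared.py")

def parent_is_tgwui(bot_dir):
    """Return True if the bot's parent directory looks like TGWUI."""
    parent = Path(bot_dir).resolve().parent
    return all((parent / marker).is_file() for marker in TGWUI_MARKERS)

def choose_install_mode(bot_dir, ask=input):
    """Offer the integrated install only when the parent looks like TGWUI."""
    if not parent_is_tgwui(bot_dir):
        return "standalone"
    answer = ask("Install mode? [1] integrated (TGWUI env)  [2] standalone: ")
    return "integrated" if answer.strip() == "1" else "standalone"

if __name__ == "__main__":
    # Assumes this script sits in the bot's root directory.
    print(choose_install_mode(Path(__file__).parent))
```

Checking for files rather than the folder name is what lets a renamed fork pass the same test.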

#

The update wizard includes an option to switch the install from TGWUI/Standalone

#

The standalone install will not be able to do text generation until I slap in a TGWUI api method. The ultimate plan is for the bot to accept any configured software for api calls for a number of specific purposes

#

(text gen, img gen, video gen, tts gen, etc) so long as the API has a get method for it or the user predefines the expected payload structure

halcyon quarry
#

Also planning to add a user “command builder” that can utilize configured apis

#

So a user could easily configure their own custom command like “/set_tts_voice” etc

#

Or a command for a specific comfyui workflow like “/comfy_wan_img2vid”
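
One way the "command builder" plan could look, as a sketch only: the feature does not exist yet, so `ApiCommand`, `COMMANDS`, the URL, and the payload fields are all invented for illustration. The user predefines a payload template and the command fills in the placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class ApiCommand:
    name: str
    url: str
    payload_template: dict = field(default_factory=dict)

    def build_payload(self, **options):
        """Fill "{placeholder}" strings in the template with user options."""
        payload = {}
        for key, value in self.payload_template.items():
            if isinstance(value, str):
                payload[key] = value.format(**options)
            else:
                payload[key] = value  # non-string defaults pass through unchanged
        return payload

# Hypothetical user configuration for a custom "/set_tts_voice" command.
COMMANDS = {
    "set_tts_voice": ApiCommand(
        name="set_tts_voice",
        url="http://127.0.0.1:7851/hypothetical/voice",  # made-up endpoint
        payload_template={"voice": "{voice}", "speed": 1.0},
    ),
}

if __name__ == "__main__":
    print(COMMANDS["set_tts_voice"].build_payload(voice="nova"))
```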

halcyon quarry
#

@valid crypt about recommending your fork of STT, if you could write something up like a word doc, I could probably do that. Needs to include how to install, troubleshooting, etc

#

In terms of usage with the bot, or any deviation from normal instructions from your project page

valid crypt
#

my readme is pretty complete, i dont think i missed anything, although i can improve a little

#

i could make a bat to install or something

#

ill look into those

valid crypt
#

some bugs

#

forgot to include usage but i think this is enough

#

also you have to mention that these are for those tts tags

halcyon quarry
#

Could you share some example tags, like the minimum for it to play nicely with my bot?

valid crypt
#

as i use it with /main i didnt include ping or a field to add the name of the bot

#

i could add replacing to do some voice command, something like if i say "create image", replace it with the tag as no asr will include things like _ (create_image), but the tag can adapt to the transcription.
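
A small sketch of that replacement idea, assuming the transcript is a plain string. The `[create_image]` tag format and the `VOICE_COMMANDS` mapping are made up here; the real tags depend on how the bot is configured.

```python
import re

# Hypothetical mapping from spoken phrases to bot tags.
VOICE_COMMANDS = {
    "create image": "[create_image]",
}

def apply_voice_commands(transcript, commands=VOICE_COMMANDS):
    """Replace spoken phrases with their tags, ignoring case and extra spaces."""
    for phrase, tag in commands.items():
        # Allow any run of whitespace between the words of the phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        # A function replacement avoids backslash handling in the tag text.
        transcript = re.sub(pattern, lambda _m, t=tag: t, transcript, flags=re.IGNORECASE)
    return transcript

if __name__ == "__main__":
    print(apply_voice_commands("Please create image of a cat"))
```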

valid crypt
#

¯_(ツ)_/¯

valid crypt
#

i found out why my all talk is taking more time, all because of me being dumb

halcyon quarry
#

Do tell

valid crypt
#

tell?

halcyon quarry
#

Yeah, tell me 😛

valid crypt
#

didnt understand

halcyon quarry
#

Just curious what you did that caused slowdown with alltalk

valid crypt
#

rvc 💀

halcyon quarry
#

Seems like I'm picking a good time to try making TGWUI API possible.

terse folio
#

That's cool!

halcyon quarry
#

Hit a bit of a snag with my new install method

#

On this branch

#

I'm not sure what exactly is causing this, but now when I import TGWUI's modules.shared - the args parser in that module is now parsing my bot's args

#

On the plus side, the bot will now successfully launch and handle image generation tasks etc - without TGWUI.

#

Just need to debug this particular issue here and the TGWUI integration should also be back in place

#

This has something to do with how the bot initializes... previously, os.getcwd() would return the directory to TGWUI, but now even running from TGWUI env os.getcwd() is returning root dir of the bot

#

I was also parsing args in bot.py but I moved that to utils_shared.py

halcyon quarry
#

Ah okay… the bot code was originally popping them as they were read in. I tweaked it, there’s probably a straggler.
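
A sketch of one common way to avoid this kind of clash, assuming the other module calls `parse_args()` on `sys.argv` at import time. The bot's flags are consumed first with `parse_known_args`, and only the leftovers are put back in `sys.argv` before the import. The flag names below are hypothetical, not the bot's real arguments.

```python
import argparse
import sys

def consume_bot_args(argv):
    """Parse the bot's own flags and return (bot_args, leftover_argv)."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--bot-config", default=None)            # hypothetical bot flag
    parser.add_argument("--limit-history", action="store_true")  # hypothetical bot flag
    bot_args, leftover = parser.parse_known_args(argv[1:])
    # Keep the program name and pass through anything the bot did not recognize.
    return bot_args, [argv[0]] + leftover

if __name__ == "__main__":
    bot_args, sys.argv = consume_bot_args(sys.argv)
    # Only after this point import modules that parse sys.argv on import,
    # for example: import modules.shared
    print(bot_args, sys.argv)
```

This has the same effect as popping arguments as they are read, but in one place, so a stray flag is less likely to be left behind.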

halcyon quarry
#

Oobabooga actually reopened 4 issues I opened that were closed as Stale

calm rain
#

every single closed-as-stale issue was reopened

#

i got like 50 notifs

halcyon quarry
#

Ah - that’s very cool

calm rain
#

i guess he noticed that a lot of the closed-as-stale issues were wrong and wanted to unfuck it a bit

#

just need somebody with a massive amount of patience given access to manually close them, to then go through and actually sort things out

halcyon quarry
#

I remember feeling a little dismayed when no response, they were legitimate issues

valid crypt
#

2.5k issues is 💀

halcyon quarry
#

Yeah at the time I think it was like > 1k as well

calm rain
#

"victim of its own success" situation - repo gets far more issue posts than the one guy running it has time to manage

#

need either a team or a dedicated ~~crazy person~~ extremely high patience individual to sort it

#

ComfyUI has the same problem
-# (for a bit of time I was in charge of solving it re comfy but I wasn't enough and I'm not on that team anymore so rip)

halcyon quarry
#

Meanwhile I get excited when an issue is reported - evidence someone is using the bot lol

#

I was starting to feel something like a team member in Forge, I had solved a few medium difficulty issues. But interest waned when I couldn’t get Illyasviel to review something… the project is like dead to him immediately after he added flux support. I improved the logic of module handling for model changes, they would have just merged it but I changed enough that I felt it warranted the author’s blessing

#

I can’t even imagine how good Forge would be if Illyasviel kept plugging away at it from time to time, the guy’s a genius

#

I heard there’s this cool app called SwarmUI too

halcyon quarry
#

@calm rain just curious to know - did you come up with the idea to use the calculations to factor rounding precision of image resolutions? Or did you basically lift that from trainer code / stability / etc?

#

So simple yet such an effective method for applying nice res values. Love it

calm rain
# halcyon quarry @calm rain just curious to know - did you come up with the idea to us...

it's literally how the resolutions for training data in many models (including SDXL but also pretty much most models out there) get selected. The base set of resolutions available in the UI are literally just the SDXL trainset res list. The unique bit in Swarm is just the UI to select it, and reusing the same math to estimate good values for models that work at other scales (since they almost certainly calculate train res's the same way anyway)
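
A sketch of the kind of math being described, not SwarmUI's or any trainer's actual code. It assumes a target pixel count of `base_side * base_side` and snaps each side to a multiple of 64; the function name and both defaults are assumptions for illustration.

```python
import math

def resolution_for_aspect(aspect_w, aspect_h, base_side=1024, multiple=64):
    """Pick width/height near base_side**2 pixels for the given aspect ratio."""
    target_pixels = base_side * base_side
    ratio = aspect_w / aspect_h
    # Solve width * height = target_pixels with width / height = ratio.
    width = math.sqrt(target_pixels * ratio)
    height = width / ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)

if __name__ == "__main__":
    for aspect in [(1, 1), (4, 3), (16, 9), (9, 16)]:
        print(aspect, resolution_for_aspect(*aspect))
```

Changing `base_side` is how the same calculation would be reused for models that work at other scales.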