#ad_discordbot (Fork of Fork of xNul's bot)

1 messages · Page 19 of 1

halcyon quarry
#

but obviously it would sound a bit jarring for it to generate each word separately

halcyon quarry
#

Do you do this with other software?

visual dagger
#

I'm still theorarising of how to approach this

#

the poblem is I will need both the llm and the tts model working at the same time for real time

#

VRAM issue

#

maybe 2 gpus will help? but I don't hsve 2 gpus 😭

#

I might get a used cheap one

halcyon quarry
#

right well this is something I certainly couldn't solve 😛

visual dagger
#

the theory is there, it's possible to make it real time

#

it has been done before on yt, fully locally

#

but there is a hardware problem + software problem

#

both should be optimised for real time use

#

somehow

#

and I remember @terse folio did an experiment where you can cut off the AI, the script reality made adjusts the end of the chat history a bit to make it clear to the llm that it got cut off

#
LLM: so what I was saying is-
Me: No no shut up

someth like this, but there is more to it

halcyon quarry
#

In the works

#

But when, unsure 😛

visual dagger
valid crypt
#

i dont think realtime tts is possible, by sentence would be the maximun
this is really simple to prove, to be more acerated, bring 5 person and make them say a word with no context, or you can think how to say the word with no context

#

im sure that it is impossible to sound natural(continuing?)

#

would sound like cutting

valid crypt
#

and for stt is almost a must

visual dagger
#

if a sentence is 10 tokens, and you getting at least 10 tokens per second, then the voice will start almost instantly with 1s delay
(talking abt the LLM generation)

#

or 2s delay adding the tts process too

#

1-2s delay between you submitting the text prompt and the voice playing

halcyon quarry
#

I think Marcos just means the whole generating one word at a time, the words not flowing well, etc

visual dagger
#

ah yes I get it

#

that wont be nice yeah

halcyon quarry
#

There would need to be a lot of research into it like the way they are able to get some degree of reasonable consistency in video generation now

visual dagger
#

but a whole sentence will be possible

#

if the quality is bad then we should find a better tts

halcyon quarry
#

previous generation would need to guide future

visual dagger
#

inconsistency

visual dagger
valid crypt
visual dagger
#

make sense, like taking the last 0.5 second as a starting point for the next generation

halcyon quarry
#

No I mean the models and implementation etc would need to be some new tech that does not exist yet

#

Where it looks back a bit or knows a bit what's coming in order to still generate one word at a time but not sound like random trash

visual dagger
#

real time doesn't have to be generating speech word for word

#

if the llm is capable of 10 t/s or more then the tts model can crunch a whole sentence

#

and while the user is listening to the first sentence, the second sentence will be ready already

#

and will be played next automatically

#

queuing

halcyon quarry
#

uh huh, OH! Like the bot already does?

visual dagger
#

between speeches (sentences) there will be no delays

#

the only delay is the first time

#

1s or 2s

halcyon quarry
#

Audio is queued as it generates

visual dagger
halcyon quarry
#

It does not pause and wait for a sentence to be spoken before generating the next sentence

visual dagger
#

so you already implemented that?

halcyon quarry
#

As soon as the TTS is generated, it simultaneously plays it while generating the next text and subsequentyly more TTS

#

which can be finished by the time the sentence is spoken

#

yes

visual dagger
#

what a champion

halcyon quarry
#

If you can get the text and TTS fast enough it will stream it nicely

valid crypt
#

here

visual dagger
#

.
but we are greedy, so how can we reduce that first delay to almost zero? any ideas?

valid crypt
#

i suggested to set the first split trigger % to 100%

halcyon quarry
valid crypt
halcyon quarry
visual dagger
#

there has to be a way, there is always a way

halcyon quarry
#

well what he proposes is string manipulation 😛

#

We are manipulating string

visual dagger
#

but did it reduce the initial delay?

halcyon quarry
#

Currently sentences are split via chance_to_stream

#

so it will roll dice and split or not split

#

Marcos said it could be a good idea to make it guaranteed to split on the first sentence completion or whatever, then use the normal logic to roll random on other factors

visual dagger
#

does it play the very very first ms once it's generated?

halcyon quarry
#

I didn't explicitly time it but probably?

visual dagger
#

different than the rest of the reply

halcyon quarry
#

right

#

I could do it right now and it'll be done in 10 mins really

visual dagger
#

and how much the first delay will be?

halcyon quarry
#

nothing complicated about treating the first "chance to stream" differently from the rest

#

however long it takes to generate that text + the TTS

#

1 sentence

visual dagger
#

a human delay to first token (lol) is about 0.5-1 secs

halcyon quarry
#

you'll be welcome to configure it to send and generate each word separately

#

will just sound like dogshit, but user preference is fine 🙂

visual dagger
#

word by word isnt gonna cut ut

#

it

#

the goal is to reduce the first delay without affecting the quality

#

or affecting it slightly

halcyon quarry
#

yes, please come back and let us know if and when you solve this problem

visual dagger
#

I have an idea

#

generating 100 of starter phrases and saving them to disk, and always using those starters (text) and also the speech therefore real time voice

#

you know the "start with" concept in ooba?

#

those 100 voice starters will be pregenerated and saved for future use

#

you can allow the user to provide a list of starters and click a button "Generate all & save to disk"

#

then give the user those options/boxes to tick

  • Use saved voice starters
  • Use saved text starters
halcyon quarry
#

Well, adding a play_audio tag parameter could be a good start

#

I could implemented the same logic I have for send_user_image which can accept either a direct image file, or a folder to randomly choose from

visual dagger
#

oh and also cache first sentences

#

so in the future when the first sentence matches an already saved first senetence then play it right awya

#

caching is cool to reduce latency

halcyon quarry
#

That's getting a bit too niche

#

for the bot at least

visual dagger
#

believe me it's not

#

alot of models at some point will output the exact same first sentence

halcyon quarry
#

I do already have a tag parameter begin_reply_with - I imagine it will not actually generate that text

#

So you are already welcome to proactively prefix the LLM's reply with a specific string

visual dagger
#

just a food for thought

halcyon quarry
#

I'll think about it

visual dagger
#

you can target the most repetitve sentences ever

#

oh wait the user have to

#

but if you're the user, lol, then you just generate the most probable first sentences

halcyon quarry
#

in the meantime, a good way to battle repetition could be to add some preconfigured randomness to your prompts, in the background

visual dagger
#

I tried man

#

nope

#

some models are very stubborn, no matter what you do they insist on repeating

halcyon quarry
#

I also have the llm_param_variances tag where you can preconfigure ranges for different parameters, and each generation it will randomly select values within those ranges

visual dagger
halcyon quarry
#

There's a lot of tools in here that should be able to put prompting and generations through the blender

visual dagger
halcyon quarry
#

I don't think so

#

But you will get a whole history reevaluation if someone writes something in a different channel ("per-channel history" setting)

visual dagger
#

btw a way to battle inconcistency in voice is adding a little bg noise, like winds or whatever the user chooses, the human brain can't tell the difference

halcyon quarry
#

that's kind of a funny idea

visual dagger
#

bcz the winds are consistent

#

yeah I tried it it works

halcyon quarry
#

there's probably some functions to mix audio together and split on the original length of one

halcyon quarry
#

or loop one to the length of the other, then mix and split

visual dagger
halcyon quarry
#

LLM says "I'm at the beach" and it mixes in the sound of waves crashing and seagulls cawing

visual dagger
halcyon quarry
#

I like it 😛

visual dagger
#

like a game

#

no need to merge audios

halcyon quarry
#

oh right

visual dagger
#

alot of effort went to it

halcyon quarry
#

Never used it

visual dagger
#

me too, but just looking from outside it is very cool

halcyon quarry
#

u trying out my LLM streaming feature?

visual dagger
#

ooba doesnt work for me since I broke it like earlier

#

conda issues

halcyon quarry
#

fix it 😛

visual dagger
#

I tried alot to make it work

#

I'm now a free man

#

my webui is any ui

#

I use llama.cpp and koboldcpp nowadays

halcyon quarry
#

You're here looking for an excuse to fix tgwui

visual dagger
#

and alot of scripting with python

#

I tried again and again but it just doesnt install

#

tried to fix those conda env issues but still no hope

#

screw it I will make my own webui

halcyon quarry
#

Just run the 1 click installer and be done with it

visual dagger
#

brother

#

do you think I didnt?

halcyon quarry
#

it installs and runs from its own miniconda

visual dagger
#

I know

#

it didnt work

#

I spent hours trying to fiz ereos

halcyon quarry
#

Idk how this can go wrong for anyone

visual dagger
#

fix errors*

#

you want to make a game like tab?

#

I can contribute a bit, but can't promise doing the whole thing

halcyon quarry
#

wdym

visual dagger
#

some sort of a game interface

#

it will be beside the chat tab

halcyon quarry
#

The bot needs a settings interface but settings interfaces are a big PITA to code

visual dagger
#

like with visuals

halcyon quarry
#

would do that before thinking of a game interface

visual dagger
#

we can make something from scratch

halcyon quarry
#

not interested 🙂

visual dagger
#

dont upvote it

#

: /

valid crypt
valid crypt
#

just suggested

#

:p

halcyon quarry
#

I'd think something more like "Let me think about that..." Or "Well, let's see..." or "hmmmmmm...."

#

Unless its a frat bro LLM char "Yo like, uhhh..."

valid crypt
#

just for inspiration, should be customizable though

halcyon quarry
#

I'm going to fix that progress bar tomorrow, no matter what

#

the image gen progress bar that likes to vanish

terse folio
#

It's okay to have a second of silence,
People need time to think.

While people talk to us in voice, we are building up an output response in our minds, just not speaking it yet.

That's why it seems like humans can respond so quickly, because they have been thinking about it during the whole sentence.

if the llm is fast enough, you could do this all quickly at the end.
if not, it may be better to generate a little big mid sentence so the llm is more prepared for when you're done speaking and can get to tts immediately

I need to rebuild that system

halcyon quarry
#

I figured out the reason why the progress bar poofs. Got it on the first guess really... should've debugged it sooner

halcyon quarry
#

apparently the API call can report measurable "progress" before it will return a positive "job count"

terse folio
#

that's interesting, hmm

#

Does it include the current job count in the progress data?

halcyon quarry
#

I fixed it though

#

Now I just ignore job count - if it returns progress data it keeps going

#

I'll see if I can improve it a bit more such as updating the message if it stalls, etc

terse folio
#

yea, some sort of timeout for how long it takes for progress to change.
but if a model is being offloaded to ram, it could take too long to make meaningful progress

halcyon quarry
#

Pushed fixed and improved SD Img Gen Embed

#

If I can figure out how to effectively use the "Cancel" api endpoint I may add a Cancel button to the embed

terse folio
#

going to also want to make sure the button is disabled on completion so users cant mess with others' generations

halcyon quarry
#

ah yes... well it could check if the user is the original user. Or I could make it separate and ephemeral

terse folio
#

I mean, even if it is the original user,
A malicious user could do an image request then cancel it after completion

#

because SD doesn't know what job you are canceling, it just knows to stop everything

#

similar to tgwui

halcyon quarry
#

Not quite

#

The cancel endpoint seems to want a specific event ID

terse folio
#

oh nice!

halcyon quarry
#

which I believe will be present in the progress data...

terse folio
#

okay, that's pretty good ^-^

valid crypt
#

^-^

halcyon quarry
#

Just made a very nice commit for SD Forge API handling

#

Forge API users will rejoice

halcyon quarry
#

Mainly Illyasviel had removed the functionality of “override_settings” whilst revamping the code for Flux support, but never plugged it back in

valid crypt
#

forge can use flux?

halcyon quarry
#

Yep!

#

And I can take a fair share of credit for it working via API

valid crypt
#

🤯

halcyon quarry
#

There is example code in more recent versions of the bot (dict_imgmodels, basesettings) showing Forge specific values that can be used for managing Flux models

halcyon quarry
#

Forge got Flux, then I waited… and no one fixed the api related code… so took a stab at it and made it happen (a few times now)

valid crypt
#

:v

halcyon quarry
#

I’m going to add 2 new tag params:

  • name - will be used in cmd print statements, as well as for…
  • if_tags_matched - a new condition, a list of tag names. If any tags with a name are matched the condition will be true
halcyon quarry
#

For the latter, the main benefit will be if you have a crap ton of triggers for a tag, but you want another tag to trigger on those triggers but maybe another condition… or just want 2 separate tags with same triggers, can just use name trigger

tepid needle
#

Quick question
Can the bot connect to a comfyui server?

terse folio
tepid needle
#

Noted
Thanks for the quick response

halcyon quarry
#

Er, I’m working to make it work with SwarmUI which is a frontend for Comfy

#

Might also just work for Comfy directly

#

My PR at Forge got merged so override_settings now works for Forge again

halcyon quarry
#

Pushed new setting to define imgmodel filters per-server

#
  • Enhances usefulness of the per_server_imgmodels setting
#

I'll likely add another code block for per_channel_filters because why not

#

Not necessarily to split SFW / NSFW bot uses, but can help with only enabling relevant models, like if you have a server dedicated to cartoons and such can keep realistic modls out, etc

valid crypt
#

me while changing settings

tepid needle
#

im loving the discord bot API

#

took me a bit to set up the response formatting and configuration and figuring out the yaml settings but after that it works like a charm

#

im using a llama 3.1 70B exl2 4.0 model and image generation alongside it
fits snugly within my specs

#

the tag system is very fun

valid crypt
#

:) leave him a star

halcyon quarry
#

@tepid needle thanks! Glad you're enjoying it. Open to any feedback or suggestions

#

The tags system was this eureka idea I had to wrap up most of my existing features into one package, with easy expansion/enhancement/all that. I was in way over my head when I got into coding it, and its one of those things I'll always be proud of.

tepid needle
#

the tag system works wonderfully with image generation

#

minty swaps in and translates their prompts into something usable for image generation

#

really, really good
makes it intuitive and easy for the user who doesnt know how to work with image generation

#

the only thing I wished was that image generation swapped to a different model depending on a tag

halcyon quarry
#

Well that can be done

tepid needle
#

Huh, then I did something wrong then cause I tested it out using selfie with a character

halcyon quarry
#

Coincidentally, earlier today I noticed my comment for that param was not correct... well, it was correct at one point

#

The value for the swap_imgmodel or change_imgmodel tag param should be the "model name"

#

The model names can be fetched from the sd-models endpoint

#

its basically the filename except any subdirectories willbe prefixed before it

tepid needle
#

so to use an example
lets say I want to swap over to the 'juggernautXL_juggXIByRundiffusion.safetensors [33e58e8668]'

would I type it like this in the character file?

swap_imgmodel: 'juggernautXL_juggXIByRundiffusion.safetensors [33e58e8668]'

halcyon quarry
#

Such as /sdxl/leosamsHelloworldXL_helloworldXL70 - the model name is:
sdxl_leosamsHelloworldXL_helloworldXL70

tepid needle
#

ah

#

I see

#

so thats the proper formatting

halcyon quarry
#

I'll fix that comment now

#

swap_imgmodel: juggernautXL_juggXIByRundiffusion would be correct unless its in a subdir

tepid needle
#

theyre not, this is the file structure
\stable-diffusion-webui-reForge\models\Stable-diffusion

halcyon quarry
#

yep, then as I said should work

tepid needle
#

got it, I'll try that out later

#

Greatly appreciate the help

halcyon quarry
#

appologies for the confusion

tepid needle
#

man, I cant wait to get that settled in
now on key words I can swap to a specific model with the proper Lora to match an aesthetic

#

fun stuff

halcyon quarry
#

Yep! Lots of fun stuff to do with it

#

What amazes me is that when I promote this no one even looks lol

#

Baffling really, but I'm plugging away all the same

tepid needle
#

thats crazy
the tag system makes it really intuitive for a user who has no idea how prompting works to get what they want

#

i can just throw the bot into a server and not be pinged to get a specific prompt for an image

#

similar prompt but from two different users using different keywords
the tags activate and the proper Loras are used

halcyon quarry
#

So looking at my code - it looks like it should actually work either way

#

whether using juggernautXL_juggXIByRundiffusion.safetensors [33e58e8668] or juggernautXL_juggXIByRundiffusion

tepid needle
#

got it

halcyon quarry
#

Noice

#

In Forge... my instructions will fail though because for whatever reason, the UI does not show the [HASH] if it was calculated

#

You're using reforge though, that value from the UI should match the value in the API call

tepid needle
#

im updating my repos before testing this out

halcyon quarry
#

It tries matching these values marked in the left

#

here in Forge the damn dropdown values dont match anything

#

I guess I could also check if filename.endswith(value)

#

Another thing to consider, is trying to come up with your own prompting characters - using the same approach as the M1nty-SDXL character

#

just need to provide different example responses

tepid needle
#

fuck yeah it worked

#

it didnt change back to the previous model but it did swap

halcyon quarry
#

I'm not going to dive in but I believe that it is now just prepared to load that model again on the next request

#

If not - then I have something to fix 🔧

#

Did you use the swap_imgmodel param?

tepid needle
#

i did generate an image after it

#

yep

halcyon quarry
#

I'll look into it tomorrow

#

outta time 😛

tepid needle
#

ditto
gotta get ready for work tomorrow

#

but with this i can use the tag system to swap to the appropriate model

halcyon quarry
#

Some more cool image stuff to check out:

  • Flows tag is very powerful once you get it, check out the few examples I provided. For example, you can gen an image then use it as input to gen a second image, such as img2img and/or as a controlnet input.
  • I have an explanation in the “tips” folder on how to make some advanced workflows… the bot has a very elaborate method of choosing random images from nested directories. If you have a number of different image inputs in the same tag (img2img, controlnet inputs, reactor input, etc), all set to randomly select, it will try finding the others in the same directory
#

Such as, if I have a number of directories of “person posing with product” - and each directory has a package of the same inputs except they are different poses - the bot is able to basically pick one of those directories and apply all the matching inputs

tepid needle
#

oooh, I see

halcyon quarry
#

So I could have someone pose with the product in all different ways, then make some inpainting masks / etc for each one

tepid needle
#

ill first need to figure out these other parts of stable diffusion because i started two days ago reading up on stuff

halcyon quarry
#

There’s a lot to digest!

tepid needle
#

what I would like is to use the upscaler for higher quality images when specified by the user

#

which im assuming can be done with the flows tag

halcyon quarry
#

I’d recommend using the Flows tag for that - and use a series of incremental scale ups

#

Exactly

tepid needle
#

i have something similar in comfyui so I get the idea

halcyon quarry
#

There’s a cool extension you can get called Loopback Scaler

#

You’ll find it in extensions list

tepid needle
#

found it

halcyon quarry
#

Basically you’d do something like double your resolutions, then use 4+ steps with a medium-low denoise

#

I believe I added a custom payload param in the bot, “scale”

#

(Might be tag param)

tepid needle
#

found it

#

gonna set that to 2 for double the resolution

halcyon quarry
#

Well with Flows tag you’d wanna do something like
Step 1: scale 1.3
Step 2: scale 1.2
Scale 1.1

#

Or similar

#

Note that it will round dimensions to 64px precision

#

I may add a tag to adjust rounding precision

tepid needle
#

interesting tool

halcyon quarry
#

I got some great results using the xinsir Tile controlnet model with that - along with Reference controlnet

#

This allows a high denoise value

#

While keeping very close composition

tepid needle
#

im getting the hang of it

halcyon quarry
#

Definitely my favorite way to upscale

halcyon quarry
#

You should consider switching over to Forge from ReForge

tepid needle
#

What are the benefits?

halcyon quarry
#

The model loading/memory management may be marginally better than ReForge as well

#

I have a 4070ti (12gb vram) and quantized versions of Flux generate about the same timeframe as XL

tepid needle
#

oh really

#

ill give it a shot then

valid crypt
visual dagger
valid crypt
#

search Doherty Threshold

visual dagger
#

I did

#

"addicted to the website"

#

I like that, lol

valid crypt
#

wut

#

ill search for you :)

visual dagger
valid crypt
#

it is just a theory

#

like a goal

visual dagger
#

a game theory?

#

jk

valid crypt
#

the goal is to make it <400ms

visual dagger
#

hums?

#

hmmmmmmm...

#

voice humming? or sound effects

valid crypt
#

to lengthen the time could be hmmm

visual dagger
#

premade sounds that can be played right away, while the actual generation is being baked in the backend

valid crypt
#

Doherty Threshold only says that under 400ms we dont feel the waiting and by giving any kind of feedback can lengthen those 400ms

visual dagger
#

you mean we got like 0.4 secs for free?

#

I mean if the actual generation is late a bit it's alright because we can play a premade sound even after 0.4 secs

valid crypt
#

this tells us that we have to process the first sentece+tts in less than 800ms in worst scenarios

visual dagger
#

possible right?

#

I mean if the humming is long enough

valid crypt
visual dagger
#

or whatever premade sounds we choose

#

if they are long engough

valid crypt
#

if humming is 1s which i think is alright in total we have 1.8s

visual dagger
#

1.8s of free time to generate (speech) the actual llm response

#

seems cool

#

you just add the humming or whatever default starters we choose to the llm response before hitting start

#

so in the end the llm response will make sense bcz the voice and the llm response are matched perfectly

valid crypt
#

🤗

#

demanding a toggle to trigger tts for the first split :V

visual dagger
#

you will choose a random starter to play?

#

from the list?

#

Example list

hmm...
hmm.. {user}
{user}
I think...
So...
You know what?
I... hmm..
umm...
#

starters/premade starter voices

valid crypt
valid crypt
#

have you tried the streaming tts feature?

halcyon quarry
#

This guy doesn’t even use text generation web UI

visual dagger
#

ooba doesnt want me

#

: /

#

ooba hates me

halcyon quarry
#

Yeah I’ll add the parameter to trigger the first TTS split I’ve just been busy

#

I’ve been pushing some pretty significant Paul requests to forge the past few days

visual dagger
# valid crypt that looks good

you can allow the users to provide a custom list, then click a button "Generate Starters", all those starters will be voiced and cached for the future

#

just ideas for you guys

valid crypt
visual dagger
#

sometimes it's better to control the users behavior than the code

#

so add a note "Preferebly write long starters for good performance"

#

a note under the feature

#

longer starters will give you more headroom, more secs, more time to work with

halcyon quarry
#

Sometimes I'll get an error message when it tries saving History to file.
This was reported before.

Pushed an update that resolves this

valid crypt
#

idk if you were referring the problem i encountered before but indeed there was a problem with the history file

halcyon quarry
#

Probably it

halcyon quarry
#

Pushed an update

  • Tags can now have a "name" parameter
  • For the moment, this doesn't do much, only:
    • prints the name for some tag logging
    • Is now required for tags which include a "persist" param.
#

Coming soon, new condition which will be True if the value matches the name of a matched tag

#

Using 'name' to log/check 'persist' tags sits much nicer with me, than what I was having to do in order to capture the tag value and compare it for equality

terse folio
#

For easier at-a-glance readability, every log for tags could be displayed as [TAGS | {name}] since it's a stored attribute that gets passed around ^^

halcyon quarry
#

I might update the 'trumps' behavior to operate on 'name' instead of 'triggers'

halcyon quarry
#

Pushed update adding new condition 'only_with_tags'

  • only_with_tags is a list of tag names (the new name param)
  • This condition is only True if one of the named tags was matched.
#
  • Condition will also be True for any named tags that are persistently applied
  • If a named tag was matched, then trumped, this tag won't trigger.
tepid needle
#

@halcyon quarry one quick question before I forget
Is it possible to allow a tag, when triggered, to input a randomly chosen set of values for image generation?

halcyon quarry
#

Yes it’s the img_param_variances tag @tepid needle

#

You predefine the ranges for each setting you want randomized

#

For number values integers and floats it does not use the value that it picks rather it will add or subtract the selected value from whatever the default value is like if you have 30 steps edit choose is five image will generate with 35 steps

#

I have some comments included for the tag for him so go check it out

#

Voice input

valid crypt
#

im 😵‍💫

halcyon quarry
#

As long as the tag with persist also has a name parameter, it will work

#

Instead of retaining a copy of the entire tag value, as I was doing, it is now only retaining the name

#

During the tag matching, it fetches any persistent names that were captured.
As it iterates over the tags, if a tag has a matching name it will be automatically applied

valid crypt
#

the persist tag must have a name
if the trigger matches ✅
if the name matches ✅

halcyon quarry
#

The name does not have to match anything for the tag to trigger in the first place

#

It's only used to re-match the tag

valid crypt
#

you mean that before it retains the content of the tag and now it retains the name and looks for the content?

#

😵‍💫

halcyon quarry
#

Yes, before it made a copy of the entire tag and kept it

#

Now when a persistent tag is matched, it just captures the name

#

It applies the entire tag

#

Look, as far as you are concerned all you need to do is slap a name on it and it will behave as it has been

valid crypt
#

what are the benefits?

halcyon quarry
#

I manipulate the tags as they are being processed

#

by taking a snapshot of the tag value, then trying to compare it again later for equality, I had to mdofy a lot of code

#

so instead of capturing something like {'trigger': 'some text', 'should_gen_image': true, 'insert_text': 'some shit', 'text_replace_method': insert'} etc etc, then trying to match it later

#

Now I just capture: persist_tag_names: ['some shit', 'another persistent tag']

#

These are "tupled" with the number of remaining persistency

#

so it can be deducted by 1 every time until zero

valid crypt
#

ah

#

benefits are for the coder

halcyon quarry
#

benefits are for the code itself!

#

The way I had it was not sustainable

valid crypt
#

healthy code healthy dev

halcyon quarry
#

I found a bug which in some cases, caused a number of tags to be completely skipped, when using the /image command.

#

Just pushed a fix for that

valid crypt
#

is civit half down for you?

halcyon quarry
#

civit seems fully OK as far as I can tell

valid crypt
#

i have a big problem then

halcyon quarry
#

Your network or work network?

valid crypt
#

looks fine

halcyon quarry
#

if you are at work they may have put some specific block or something

#

try a different browser

valid crypt
#

i bet my intel cpu is dying

#

sd 3.5 🥳

valid crypt
#

when i look

#

when i look away and look back

#

ha got you

halcyon quarry
#

one line I needed 😄

tepid needle
#

Wooo

valid crypt
#

👏

halcyon quarry
#

Happy Halloween to you too

terse folio
#

Happy Halloween!

visual dagger
#

ded 💀

#

it feels so empty in here. hello!

terse folio
#

Hello!

halcyon quarry
#

Hi!

terse folio
#

wave wave How's the project going?
^-^

halcyon quarry
#

Been on break 🙂

terse folio
#

That's a mood, I've been focusing on a lot of life stuff lately

visual dagger
#

I don't pay you 0$ per month for nothing

valid crypt
#

😶

terse folio
visual dagger
#

congrats!!

halcyon quarry
#

This is actually pretty huge... Panchovix (SD ReForge) was able to bring the lora control extension into the Forge memory handling, which has been in demand since the initial release of Forge

#

I'll be pushing an update in the next day or so to allow ReForge to use my automatic loractl scaling feature

#

which is currently only enabled for A1111

halcyon quarry
#

Ok well he made some other changes that screwed it up and now he has the feature back in dev lol

valid crypt
#

donwloading reforge...

halcyon quarry
#

It's probably the best UI if you don't care about Flux, and are iffy about Comfy / Swarm

#

he definitely broke it on the next commit or one of the next ones

calm rain
#

don't be iffy about swarm

#

assimilate

halcyon quarry
#

I began adding Swarm support to the bot, but have been tied up with a video game lately / lost motivation atm

#

So far all it can do is detect if Swarm is running and capture the session id and that's it 😆

valid crypt
#

XD

#

im looking for mativation too

valid crypt
#

forge

halcyon quarry
#

Ok so now the loractl feature ACTUALLY WORKS in ReForge

#

I’ll push a quick update today to allow the bot’s auto loractl scaling feature to be enabled with ReForge

#

Actually works very well if you want to try setting up a whole crapton of tags with Loras, and triggering multiple

halcyon quarry
#

Pushed an update which allows loractl with ReForge

halcyon quarry
#

Pushed an edge case minor update

The bot can now apply the aspect ratio from an img2img image, by using the value 'from img2img' for the aspect_ratio tag parameter
This is useful for applying multi-controlnet, and other multiple-image-input tags using "random directory" method. The subdirectories can now have different resolutions.

valid crypt
#

never tried img2img :v

halcyon quarry
#

You can do a number of things with img2img via Tags

#

the most recent image generated is retained in the user images/temp location - so you can use '__temp/temp_img_0.png' as a valid img2img

#

Ya know, to use the last image as input. Can make a tag for that

halcyon quarry
#

Been daydreaming about how to make a very flexible integration of ComfyUI, with the bot’s tags system. It would be super cool to be able to run all sorts of workflows, prompt for required inputs, handle whatever the response is whether image video audio etc

#

text2video, img2video, vid2vid - all very accessible now to anyone with a 3060+ with acceptable quality and generation time

valid crypt
#

nice dream :)

#

awww comfyui

#

man there are too many uis :v

halcyon quarry
#

🎅

terse folio
valid crypt
#

what happened to alltalks :O

#

its amazing

halcyon quarry
#

Still works for me

#

Someone just posted this on resdit… supposedly the web search extension works?

#

Don’t have time to check it out myself atm

valid crypt
#

i remember that what didnt work was when you plug it to the bot

#

although ive never tried adding it

valid crypt
#

@halcyon quarry please add support for the remote version of alltalks v2

halcyon quarry
#

@valid crypt sounds like you've tested out alltalk v2? Does it work with the bot?

#

I havent played with it yet

valid crypt
#

not even for tgwui :v

#

the remote version works

halcyon quarry
#

seems like there is a "TGWUI Remote Extension" that alltalk v2 is compatible with

valid crypt
#

but with bot just no response

halcyon quarry
#

so its not a feature of alltalkv2 per se

#

ah

#

it is part of alltalk...

valid crypt
#

the is the message

#

and of course i tried and failed :)

halcyon quarry
#

ah yeah I think I also saw that message recently... and stuck with v1

#

which still works fine

#

I'll look into this a bit more

valid crypt
#

:)

burnt patrol
#

Thanks

remote thistle
#

Hello! I've ran into an issue that I suspect is my own doing but I'm kind of at a loss. At some point I created a conflict in the installer environment, so opted to just reinstall the whole stack (oobabooga, ad_discordbot and all) rather than try to figure that out. Should have been a clean wipe, but now I'm getting this error when I try to interact with the bot.

ERROR [bot.__main__]: An error occurred in llm_gen(): 'static_cache' Traceback (most recent call last): File "/home/mole/text-generation-webui-main/ad_discordbot/bot.py", line 2003, in llm_gen async for chunk in process_responses(): File "/home/mole/text-generation-webui-main/ad_discordbot/bot.py", line 1953, in process_responses async for resp in generate_in_executor(func): File "/home/mole/text-generation-webui-main/ad_discordbot/modules/utils_asyncio.py", line 161, in generate_in_executor result, is_done = await loop.run_in_executor(None, get_next_generator_result, gen) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mole/text-generation-webui-main/installer_files/env/lib/python3.11/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mole/text-generation-webui-main/ad_discordbot/modules/utils_asyncio.py", line 29, in get_next_generator_result result = next(gen) ^^^^^^^^^ File "/home/mole/text-generation-webui-main/ad_discordbot/modules/utils_tgwui.py", line 342, in custom_chatbot_wrapper for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)): File "/home/mole/text-generation-webui-main/modules/text_generation.py", line 42, in generate_reply for result in _generate_reply(*args, **kwargs): File "/home/mole/text-generation-webui-main/modules/text_generation.py", line 97, in _generate_reply for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat): File "/home/mole/text-generation-webui-main/modules/text_generation.py", line 305, in generate_reply_HF if state['static_cache']: ~~~~~^^^^^^^^^^^^^^^^ KeyError: 'static_cache'

I know it's an issue on the bot's end as the oobabooga client on its own works just fine. But other than that I haven't a clue. Also tested it using one of the known good example characters. Same thing.

This'll probably end up being silly, so I accept any teasing coming my way. 😛 I just want to get the bot working.

terse folio
# remote thistle Hello! I've ran into an issue that I suspect is my own doing but I'm kind of at ...

This isn't your fault, it's likely that you're using a version of tgwui that is more modern than the ad_discordbot

TGWUI is expecting a certain variable from the bot that the bot doesn't know to provide.
I'm running on a pretty outdated version of tgwui, not going to take the risk updating just yet!

But here's how you can patch that:
In tgwui/ad_discordbot/dict_base_settings.yaml
you can add static_cache: False
as one of the options under llmstate > state

remote thistle
#

Could be! I had installed tgwui a while back to originally use with oobabot before making the switch, so it's very much possible that what I grabbed a second time is a bit too new. I'll try that patch and see if it works!

terse folio
#

If after this, you continue to have similar errors about something missing from the state variable you can find the defaults under tgwui/modules/shared.py

remote thistle
#

Looks like that worked perfectly! It's up and working just like it was before. Thanks for your help!

valid crypt
valid crypt
halcyon quarry
#

Need to update the bot to add static_cache

#

This is the first time in awhile that ooba added new params that didn’t fallback to a default value when not in the payload

halcyon quarry
#

updating to latest TGWUI and testing...

#

yes, works fine with this key added.

#

pushed! 2 lines! 😄

#

Yes I see that the original alltalk_tts is not working on the latest TGWUI

#

this is very unfortunate, the dev was so dedicated

#

yes seems like we need to go to the dev v2 version...

valid crypt
#

although even v2 doesnt work ._ .

#

but it is amazin

#

g

halcyon quarry
#

this is crazy, the standalone app takes like 20 minutes to install

valid crypt
#

is it too much or too fast

halcyon quarry
#

what is the good xtts model again? 2.0.2?

valid crypt
#

?

#

i used the default one

#

in the past

halcyon quarry
#

Mistakenly did not retain my model

valid crypt
#

the base model can be easily downloaded here

halcyon quarry
#

Right, but if memory serves me right 2.0.3 was considered a step back…

valid crypt
#

the quality does not matter if there's nothing

#

i had 2.0.2 and never used 2.0.3

valid crypt
#

i think that the whisper stt stoped working after the update

halcyon quarry
#

That sucks if a number of extensions just suddenly don’t work anymore

valid crypt
#

i have an old tgwui in my another pc and i tried it and it worked, but the latest didnt work

#

you can try if it works for you

terse folio
#

Seems the newest version breaks a lot of things including the openAi extension

valid crypt
#

i might get an old tgwui just for alltalk wihtou remote

valid crypt
#

maybe wasnt ooba but if ppl updated it so it should work right?

terse folio
#

I'll have to check it out soon, but just saw someone having an issue with the static_cache variable not being present

valid crypt
#

ask them if they have it fresh installed or updated

#

i updated and im fine

#

😁

terse folio
#

That's great to hear ^^

halcyon quarry
#

I’ve still been mostly engrossed in this game I’m playing but plan on testing alltalk v2 / remote - more, soon

twin thunder
#

I think I forgot to send a message but i've been submitting issues on the github for while now

terse folio
twin thunder
#

this seems incorrect...

terse folio
#

What's this from?
TGWUI updater?
was there an updater for the bot? it's been a while

twin thunder
#

just the update_windows.bat from the folder

#

oh crap I gotta redo my api keys

terse folio
#

that's odd, wonder what happened.
You could try deleting the .git folder if something is missing and have it redownload

twin thunder
#

the correct answer is i'm stupid and didn't install it right

#

nvm I installed it right what is happening

#

I'm just gonna retry everything but through gitbash instead of terminal

terse folio
#

Tgwui is supposed to create it's own environment, that might come with it's own version of git in the conda environment

#

I'm not sure if that would affect things

twin thunder
#

im not using tgwui

terse folio
#

ahh, what are you using?

twin thunder
#

windows???

#

like I just installed the repo from github

terse folio
#

Tgwui is short for text-gen webui ^-^

twin thunder
#

OH

#

lol i'm stupid

terse folio
#

no worries!

twin thunder
#

should update that while i'm at it tho

terse folio
twin thunder
#

ad_discord bot

terse folio
#

ah, okay, will check on that

#

are you updating from a very old version of the ad_discord bot?

twin thunder
#

the updater seems to be working but i'm getting another error

terse folio
#

hm, looks like the wrong path was put in the script, that should be pointing to

textgen_webui/installer_files...

#

ad_discordbot is meant to be inside textgen_webui

twin thunder
#

oh whoops

#

I just deleted all my models

terse folio
#

ohno!

twin thunder
#

thats fine i know what they were

terse folio
#

I would recommend moving your models to a seperate folder

#

and launching the webui with an argument to tell it to read from that folder

#

Like a symlink

#

I personally keep my models on a dedicated drive ^^

twin thunder
#

oh yeah, i should move them onto my m.2 once it gets here

terse folio
#

that way you can move things around without transfering large files or deleting things

#

PS: I wouldn't recommend storing large projects on your desktop

#

windows has to load all those files as your pc boots up
and can slow things down

twin thunder
#

boot time hasn't really been an issue but thats good to know

#

oh my god

#

the web ui version i had downloaded was 1.18

terse folio
#

Wow, that's suprizing, usually it defaults to latest for downloads

twin thunder
#

no, i'm getting the latest now, I had 1.18 on my pc before

terse folio
#

ahh, okay

#

but I could walk you through what needs to be changed
fixed ^^

halcyon quarry
#

There’s no bugs with the bot atm oobabooga

terse folio
#

great!

halcyon quarry
#

I had pushed the update yesterday to add that one new key

#

Although I should really make a new “Release”

#

@twin thunder be sure to see the current install instructions for the bot

twin thunder
#

I was being quite silly the whole time (did not clone the repo to the right spot)

#

what the hell

#

how many models did I have installed

#

I just freed 150gb of space

terse folio
#

😅

#

nice

twin thunder
#

what's ya'lls prefered models?

terse folio
#

I'm a few months behind on the latest stuff, but I found that I could manage to fit gemma 27b at 2bits on my gpu which worked surprisingly well.
But I also use some llama3 8b finetune Hathor_Tahsin for simpler things

halcyon quarry
#

I’ve been running the same model forever… NeuralBeagle 7b

#

Great model

valid crypt
#

literally XD

halcyon quarry
#

works just fine eh?

valid crypt
#

that.. looks like remote...

#

the extension is the same? just using the same environment I think, but I have to try if it works!

#

i'll just ask why the start up was tgwui mode and then remote...

#

never mind, bot can't even load

valid crypt
#

yea stick to 2.0.2, 2.0.3 i feel a slight improvement but it takes me 70% more time...

terse folio
#

I believe those versions refer to XTTS models

valid crypt
burnt patrol
sullen plover
#

hey guys how do i stop people dm'ing my bot?

#

i gotta do all this seems a bit over the top? To disable DMs for your bot while using the ad_discordbot plugin, you can modify its behavior based on its structure and configuration. Below are steps and examples for implementing this functionality:

  1. Modify the Message Event in ad_discordbot
    The ad_discordbot plugin processes messages through a message event listener. You can add a check to ignore DMs. Look for the section in the code handling the on_message event or similar and update it to include a guild check.

Example:
python
Copy
Edit
@bot.event
async def on_message(message):
# Ignore messages sent in DMs
if message.guild is None: # DM channels don't belong to any guild
return
# Continue processing messages in servers
await bot.process_commands(message)
2. Bot Settings for Scope Restriction
Check if ad_discordbot has configuration settings or a config.json file to define bot behavior. If such a file exists, look for options to disable or restrict DM responses.

  1. Ignore DMs Globally
    If ad_discordbot uses decorators for command definitions (e.g., @bot.command()), you can add a global DM filter to enforce the restriction across all commands.

Example:
Modify or wrap the command logic:

python
Copy
Edit
def no_dm_check(ctx):
return ctx.guild is not None # Allow only messages from guilds

@bot.command()
@commands.check(no_dm_check)
async def my_command(ctx):
await ctx.send("This command only works in servers.")
4. Update the ad_discordbot Core Logic
You may need to update ad_discordbot's source to handle this at a higher level:

Locate the part of the code where the bot reads incoming messages or processes events.
Implement a DM filter as shown in the examples above.
5. Redirect DM Senders (Optional)
If you want to send a polite response to DM users instead of silently ignoring them, you can modify the behavior to include a reply.

Example:
python
Copy
Edit
@bot.event
async def on_message(message):
if message.guild is None: # Check if the message is from a DM
await message.author.send("I do not respond to direct messages. Please use the bot in a server.")
return
await bot.process_commands(message)
6. Testing and Validation
Restart the bot after making changes.
Test it by sending DMs and ensuring the bot does not respond.
Ensure commands work correctly in servers.
If you encounter specific issues with ad_discordbot integration or need help pinpointing where to add these changes in its structure, provide snippets of its core processing logic, and I can assist further.

#

also stop it replying to other ai's bots too

terse folio
#

chance_to_reply_to_other_bots in base_settings.yaml

#

ah, it wasn't that texting was disabled in dms, just some commands

#

~~it shouldn't be too hard to add a setting for that and an extra if statement in the on_message

I wont be able to test it as i'm using an older version of TGWUI and dont want to update yet but can make a branch for you to try in a few hours.
Busy atm~~

Edit: there is actually a setting, different file

#

discord > direct_messages > allow_chatting in config.yaml
and can disable all commands in dms too with the next setting allowed_commands

sullen plover
#

thanks checking now 🙂

#

i dont see this in my config.yaml discord > direct_messages > allow_chatting in config.yaml
and can disable all commands in dms too with the next setting allowed_commands

#

reply_to_itself: 0.0 # 0.0 = never happens / 1.0 = always happens
chance_to_reply_to_other_bots: 0.0 # Chance for bot to reply when other bots speak in main channel
reply_to_bots_when_addressed: 0.0 # Chance for bot to reply when other bots mention it by name
only_speak_when_spoken_to: true # This value gets ignored if you're talking in the bot's main channel
ignore_parentheses: true # (Bot ignores you if you write like this)
go_wild_in_channel: true # Whether or not the bot will always reply in the main channel
conversation_recency: 600

terse folio
#

chance_to_reply_to_other_bots already being 0 and still happening might be a bug, interesting

#

what kind of bot does it reply to?
Are these bots mentioning the AdDiscordbot?

terse folio
sullen plover
#

thanks 🙂

halcyon quarry
#

Just run the updater bat file

terse folio
#

what I mean is, do the configs get updated too?
because they're editable

valid crypt
#

i think configs dont get updated, only the first launch will copy from example, but after that, you have to copy or editing them manually

halcyon quarry
#

The config templates get updated-
On startup, the bot compares user settings to the settings templates. Any missing user settings default to what is in the templates, while warning in the cmd window

marsh harness
#

Love all of you

#

Have a good year

twin thunder
#

getting this error while using ExLlama as a loader (i've already posted a issue on github about it)

terse folio
twin thunder
#

ah, just downgrade or wait for ad-discord bot to update

terse folio
#

Sure, but I'm also not sure if updating the bot will add the missing paramater to the settings file as it's meant to be editable.

That means it's probably in the gitignore file.
Ill have to look into that

twin thunder
#

adding the missing parameters to the config appears to have worked! thanks!

valid crypt
terse folio
#

Oh nice!
I wonder what the license is like on that and if it supports cloning/finetuning
ahh personal only
... actually not sure, maybe it's just the demo

valid crypt
#

idk if it can be finetuned, but if the quality is good, rvc is your best friend

valid crypt
halcyon quarry
#

Ah sorry just noticed this is old comment, mb

valid crypt
#

Not even in tgwui

#

From the comments it is supposed to prepare everything at the first startup, but mine didn't, maybe there's something that I had to do but only programmers would know

halcyon quarry
#

I think there’s been a number of significant changes on TGWUI side recently… maybe try on a version as old as latest commit from extension

#

If it works, and can pinpoint commit that breaks it, that would be a good place to start fixing it

valid crypt
#

Tried

#

The extension does nothing

#

You may try it, I can't spot the bug and chatgpt doesn't help

halcyon quarry
#

Are you using the correct model v1.0?

halcyon quarry
#

I looked at the Issues, and the author had just closed one 5 days ago… idk I’d expect that the extension should be doing what it claims to be doing with an active dev

valid crypt
#

look at that

#

i believe

halcyon quarry
#

ah very cool

valid crypt
#

👍 wasnt my problem

#

i remember that the extension refers to it self as KokoroTtsTexGernerationWebui, and to be recognized as a tts extension must have _tts at the end

#

I suspect that my ISP cut off my internet, it should have no limits but I downloaded the deepseek r1 to try if it works...

valid crypt
#

Nah my ISP is down not my fault

#

Today is not the day to try if the bot works

valid crypt
valid crypt
#

nah it doesnt work, with the bot, not a big deal as i cant plug rvc to it

visual dagger
#

hi old fellas

#

still going hard at this.. I see

halcyon quarry
#

not really, haven't changed much in 3 months

valid crypt
#

i quit coding 😓

terse folio
# valid crypt

sounds like missmatched framerates?
It looked like that tts generated 24kfps audio
Perhaps rvc is expecting 16k as other tts systems output?

valid crypt
#

😋

valid crypt
#

gonna make it open source 👍

#

im ready to get roasted 👍

#

gonna fing a way to plug it to the bot 👏

halcyon quarry
#

It should just work probably

valid crypt
halcyon quarry
#

You need to make sure to put the correct extension name in the config file. And add the relevant parameters to your character file

valid crypt
#

alltalks remot didnt work too

#

older tts is like text book examples, i can see, but these new tts, i dont see

terse folio
halcyon quarry
#

Right so you need to just see the example parameters in the minty character and replicate it in your own character with the parameters

valid crypt
halcyon quarry
#

Eh the params are hiding in the code somewhere

valid crypt
#

thats the problem

valid crypt
#

let me do a final test, as ive only checked that the preview workds ._ .

#

it works, im not touching that

halcyon quarry
#

You could've fixed the spelling issue ya know

#

gerneration lol

valid crypt
#

i didnt see that one XD

halcyon quarry
valid crypt
#

it is too late to fix, theres a lot of files that uses the path with the wrong name, i'm not touching that

halcyon quarry
#

Ok so I looked into it, and so far, all the other TTS extensions had added a string to the internal response in the format of:
'audio src="file/(.*?\.(wav|mp3))" - This is the regex that captures it

#

Looking into the code of this, it actually returns a string such as this example:
<audio controls><source src="file/path/to/audio/123456.wav" type="audio/mpeg"></audio>

#

As a quick test @valid crypt you could edit the bot file shared/utils_shared.py - find the audio_source = ... (in SharedRegex)

#

Replace that line with this:
audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3))"', flags=re.IGNORECASE)

#

I believe the bot would then be able to play back the TTS response

valid crypt
#

got it

halcyon quarry
#

in other words it is formatting using source src= instead of what seemed to be standardized... audio src=

#

In your fork, you could also instead try just changing this to audio src= and see how it behaves in TGWUI (via the UI) / the bot

#

I believe this is the actual correct answer... I think source src= is like the generic catch-all for extra file types that can be appended to the internal response.

#

Since my regex also requires the extension to be mp3 or wav, I should be able to safely make this change (drop the "audio") without falsely trying to potentially process other response types as audio

valid crypt
#

i didnt work with my fork, maybe is because it has no default selected voice, all talk should work, wait for me

#

my code's problem

#

all talk works XD

#

👍

valid crypt
valid crypt
valid crypt
#

not my problem, his extension has problems 😠

valid crypt
#

help 😭

valid crypt
#

@halcyon quarry 🥹

halcyon quarry
#

Help do what? lol

valid crypt
#

the extension

#

no audio

#

😭

#

i changet the output modifier to ```def output_modifier(string, state):
# Escape and clean the text
string_for_tts = html.unescape(string).replace('*', '').replace('`', '')

# Generate audio file
msg_id = run(string_for_tts, rvc_params=RVC_PARAMS)

# Create relative path from webui root directory
audio_path = pathlib.Path(__file__).parent / 'audio' / f'{msg_id}.wav'

# Get relative path from webui working directory
relative_path = os.path.relpath(audio_path, start=os.getcwd())

# Convert to web-style path and add cache busting
web_path = f"file/{relative_path.replace(os.sep, '/')}?v={int(time.time())}"

# Add audio element with proper relative path
return f'{string}<audio controls><source src="{web_path}" type="audio/mpeg"></audio>'```

making it use relative path and accessble from local network but i still dont know why bot dont work

halcyon quarry
#

Like I said the bot currently does not expect source src= it expects audio src=

#

on ur last line

valid crypt
#

i changed it alr

#

so alltalk works now

halcyon quarry
#

Does it generate TTS, and save a local version of the output? Amd just fail to play it?

valid crypt
#

webui works

#

with bot, it did generate the file

#

but no playing

halcyon quarry
#

ok this seems to be the problem here

#

in bot.py search for audio_src - there are 2 instances

#

Youll see something like:

                if 'audio src=' in vis_resp_chunk:
                    audio_format_match = patterns.audio_src.search(vis_resp_chunk)
#

Try removing that first condition, and then nudge all the lines below it so they are indented correctly

#
            def apply_extensions(chunk_text:str, was_streamed=True):
                vis_resp_chunk:str = extensions_module.apply_extensions('output', chunk_text, state=self.llm_payload['state'], is_chat=True)
                audio_format_match = patterns.audio_src.search(vis_resp_chunk)
                if audio_format_match:
                    stream_replies.streamed_tts = was_streamed
                    setattr(self.params, 'streamed_tts', was_streamed)
                    self.tts_resp.append(audio_format_match.group(1))
valid crypt
#

🫡

halcyon quarry
#

Well you can ignore the one in speak_task()

#

but that would change to

            audio_format_match = patterns.audio_src.search(vis_resp_chunk)
            if audio_format_match:
                self.tts_resp.append(audio_format_match.group(1))
#

This should work, on the assumption that you also updated the thing in Shared Regex in utils_shared.py as I had said earlier

#

audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3))"', flags=re.IGNORECASE)

valid crypt
#

ah

halcyon quarry
#

right this was some dumb oversight of mine

valid crypt
#

first one correct?

halcyon quarry
valid crypt
#

oki

halcyon quarry
#

An easy way to shift the indents is to highlight all the lines and press Ctrl+[

#

To nudge them to the right, Ctrl+]

valid crypt
#

check?

halcyon quarry
#

yep looks good

#

And make sure that the regex pattern is updated in utils_shared.py

valid crypt
#

didnt work?

#

i checked

#

i think it didnt work

halcyon quarry
#

Add this print statement
print("RESPONSE:", vis_resp_chunk)

#

When you use the bot, it will print the extra crap that the extension adds to the response -

#

then I'll ask ChatGPT why the regex pattern is not finding it

valid crypt
#

🫡

#

You&#x27;re right on track with your dream city floating above the clouds - that&#x27;s an amazing concept! Now, let&#x27;s add some more features to make it even more incredible.

Here are a few ideas:

* A network of sky gardens and vertical farms to provide fresh produce for its inhabitants.
* An advanced transportation system using hyperloops or vacuum tubes to transport people quickly and efficiently throughout the city.
* A unique waste management system that converts trash into energy, water, and nutrients for the ecosystem.

Now it&#x27;s your turn! What features would you add to this floating city?

(I&#x27;ll wait patiently for your response)<audio controls><source src="file/extensions/KokoroTtsTexGernerationWebui_tts/audio/8b837f97-4ac1-421f-ab3c-7cae1ed10050.wav?v=1738705660" type="audio/mpeg"></audio>```
halcyon quarry
#

?v=1738705660" - it has to do with this bit at the end I'm sure

valid crypt
#

idk what is that

halcyon quarry
#

Try with this regex

#

audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3)(\?[^"]*)?)"', flags=re.IGNORECASE)

#

actually

#

this is the one

#

audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3))\b', flags=re.IGNORECASE)

#

This will ignore the extra query that appeats after the file extension

valid crypt
#

🫡

#

check

#

works

halcyon quarry
#

Pushed the changes!

#

Thanks for helping to debug that

valid crypt
#

wait, i dont know if i did something wrong, only the first audio is being played

halcyon quarry
#

Try disable tts streaming

#

I may have made it so only confirmed clients can steam

valid crypt
#

not by that mean, literally only the first audio is being played

#

the first of all

#

ps: audios are generated

halcyon quarry
#

You mean like an old file that was generated in a previous session? The oldest file in the directories?

valid crypt
#

as i'm having a lot of network problem with my laptop lately could be my fault

halcyon quarry
#

If you have TTS streaming feature enabled it could be due to it

#

Otherwise, maybe your network issue. Otherwise, you could probably further debug it with print statements at that apply_extensions()

valid crypt
#

i think is the split of the extension

#

the extension it self has a split function as kokoro only supports 500tokens

halcyon quarry
#

Xtts works the same. All talk splits the text into Individual sentences

valid crypt
#

gonna give a deep test another day 😪

#

it seems to work...

valid crypt
#

after testing the bot with a lot of hello, it is scared of it XD

#

anyways, the tts works good

halcyon quarry
#

Yeah my bot isn’t happy when I write “test” over and over

#

Eyyy nice, tell me if you had to mess with anything beyond just updating the bot

#

With the changes you helped me debug

valid crypt
#

i didnt update, i only changed everything you told me, i have to officially update it now

halcyon quarry
#

May have to delete the files to fetch fresh.

#

Or if using the github desktop app, right click discard changes

valid crypt
valid crypt
#

1h trying to fix tts, im the stupid one with 2 tts extensions XD

#

works perfectly

halcyon quarry
#

You should try to push your changes (RVC support?) to the main project

halcyon quarry
#

I could verify kokoro as a supported TTS extension

valid crypt
valid crypt
#

although only 300 are mine :)

halcyon quarry
#

Ah well

valid crypt
#

mine has _tts suffix and works too

#

I have to say, they are a good combination :) i'm proud of myself 😎

halcyon quarry
#

I'll look into adding support but the thing that's going to be a bummer is if there are actually no parameters

#

I was trying to figure out if they were hidden somewhere but couldn't find them

#

Couldn't find them via printing TGWUI code either - I think the extension independently manages its parameters

valid crypt
#

haha told you, the only parameters you can find are my rvc params hahaha

#

although idk if i used them correctly ._ .

halcyon quarry
#

the other TTS extensions set parameters to TGWUI's shared.args class

valid crypt
#

only 3 is real

#

kokoro is just no emotion

#

old ones, also the good thing about 4 is its insane speed of less than a seconds :v

halcyon quarry
#

Wow! That is very very good

#

Number 1 quite good

#

4 is pretty good indeed given you say it processes fast AF

valid crypt
#

2: 3.5x speed (with rvc)
1: 11x speed
4: 40x (lol)
5: 3.5x
a text that should be 24s long and divide by the average of 5 tries

halcyon quarry
#

4 is very impressive except at the end sounds like "young one" instead of pronouncing it correctly "woman"

#

So what, you used kokoro for all these?

valid crypt
#

only 2 is rvc

#

1 is our good friend alltalk

halcyon quarry
#

Alltalk ftw

#

You’re making me jump through all hoops to make other extensions work but just need alltalk 🤗

valid crypt
#

the misconception is that alltalk wasnt that good, I only asked for edge tts and vits, and these days I asked for kokoro and alltalkv2, but by fixing kokoro, now alltalkv2 works 😏

halcyon quarry
#

Pushed some changes regarding /image cmd

  • The user's prompt would be part of the embed. Now, it is sent as normal text along with the embed so it can actually be copied when using discord on mobile (can't copy embed text on mobile).
  • Added another selection for the use_llm option - to automatically prefix the prompt with Provide an image prompt description (without any preamble or additional text) for:
valid crypt
#

what file of the bot is related to user input? and what line of the bot.py?

halcyon quarry
#

I didn't feel like over-complicating this new option, if that's what you're inquiring about.

#

Can optionally just write the full prompt without a preconfigured prefix 😛
Or use the tags system to prefix prompts

valid crypt
#

i want to steal some stt/asr code and plug it with some black magic

halcyon quarry
#

Personally, this quick prefix stops the LLM from begining the reply with "Sure! Here's an image prompt:"

#

Although, this just made me realize it would be a great idea to add a "Generate image" option to the /prompt command, even if redundant to some degree

valid crypt
#

i was asking to make my life easier, i want to add stt but maybe i can use another program and with a little bit of inspiration make bot think that it was a message form user and 🥳

halcyon quarry
#

The bot code is relatively easy to navigate... relatively.

#

lemme see...

valid crypt
#

i mean 7000 lines is not very friendly, maybe just the name of those modules?

#

like i never thought that to fix tts i have to touch shared utils :O

halcyon quarry
#

There's a number of ways users can input... now the main listener is def on_message()

#

It determines if the bot should reply or not.
If so, it creates a task and queues it.

#

The TaskManager class processes the tasks

valid crypt
halcyon quarry
#

For user message type tasks, it will run one of these code blocks

#

There's modules that simplify/streamline a lot of the code used in these main blocks

#

For instance lots of tags related code is in the tags.py module

valid crypt
#

not touching that very soon

halcyon quarry
#

I comment so much stuff becomes I'll completely forget why the heck I do anything without it lol

#

Want a list of codes to look at if you seriously want to try helping add STT?

#

that one is likely a slippery one...

valid crypt
#

dont expect fancy results from me

halcyon quarry
#

Welcome to hear any proposal on, what your thoughts were on actually handling it.

#

like, a TGWUI extension?
Native discord functions?

valid crypt
#

i was thinking at tgwui but im not sure if it is going to work

#

as whiper stt is not working anymore ._ .

#

if extension can directly do inputs and its compatible with the bot, i could think that way

halcyon quarry
#

Well since the bot is designed to run on its own TGWUI instance and not via API, I'm not sure exactly how the extension could be beneficial...

#

What does it do that discord voice input cannot?

valid crypt
#

right now what im thinking is just make it work, and leave all the problem for a further future

halcyon quarry
#

There likely just needs to be some research on how voice input from voice channel can be captured appropriately to text

#

A listener function that uses discord code

valid crypt
#

make a separated program, a second bot just for audio input, steal some code, make fake inputs to the main bot

halcyon quarry
#

I'd likely just need to add some new "Task" or parameter to existing task, that will ensure no text response from bot, only play response on VC

#

I've been engrossed in this game lately, the ladder season is almost over and I'm definitely sitting out the next one

#

will be back in the saddle

valid crypt
#

:v

valid crypt
#

ohhhh

halcyon quarry
#

Well it's not technically an extension

#

I could ask ooba if he thinks it could be considered an exception... (disclaimer about what it actually is, etc)

valid crypt
#

your is not too far away

halcyon quarry
#

I'll see!

valid crypt
#

although i never understood why your bot cant use the webui or the api

halcyon quarry
#

If you enjoy character specific TTS settings (voices, etc), and TTS streaming - these are not possible via the API

#

Well, the TTS streaming may be possible... really not sure about that.
But definitely cannot adjust extension parameters via API.

#

Pushed small update - added new option to '/prompt' cmd

  • Can now force the response type (text / image / text+image)
halcyon quarry
#

@valid crypt I submitted a PR to the extension list to add the bot, thanks for the suggestion!

halcyon quarry
#

I know - openai API is the TGWUI API

valid crypt
#

I meant let others use the api while bot is running

halcyon quarry
#

It may be possible to run 2 separate instances of TGWUI, if using custom flags with the bot such as unique port, etc

#

I know what you mean is like, an option to run all the UI related code as well instead of how the bot currently executes the backend code on startup

#

I have an (outdated) dev version of the bot which successfulyl uses the openai API - TGWUI launches normally and can be used in the UI simultaneously, etc

#

but this version of the bot does not launch TGWUI

#

and also complicates the settings management, and also makes some features impossible like the TTS voices

valid crypt
#

f

halcyon quarry
valid crypt
#

👍

halcyon quarry
#

Pushed another update for /prompt cmd

  • Yet another option, load_history to specify how much history to load for the interaction
  • The /prompt cmd can now be explicitly disabled from use in DMs via config.yaml
halcyon quarry
#

What I need to add is for the bot to reply to show the user message

halcyon quarry
#

OK - Now the bot will immediately send an embed reflecting the user's prompt and params used for /prompt cmd

#

(don't think system message is applicable for this model/mode)

halcyon quarry
#

I had an idea for a new tag which could be pretty useful… “run_code” which would be a filename, and a companion tag “send_code_result” which would be a format to listen for and send (text, audio, video, etc)

#

Would be a bit advanced for some but would add a lot of flexibility to what the bot could actually do

valid crypt
#

didnt understand

#

exams are driving me crazy

halcyon quarry
#

like a user could define a tag that triggers for some phrase, and maybe I make some syntax that can optionally pass values into whatever code is being executed

#

like multiply >>678<< and >>2000<< and the tag will run a code that multiplies 2 values. Crude example.

valid crypt
#

ahhh

halcyon quarry
#

But it could be whatever code, could be something that generates and returns a video for instance

#

It would just add another advanced tool for users to think about using

valid crypt
#

some model are trained to be able to use tools

#

the next step is make it an agent huh

fickle ember
#

is it possible to specify which gpu the bot uses? i have a dual gpu setup and my main gpu does not have enough vram to load my models.

terse folio
halcyon quarry
#

There is a CMD Flags file for both TGWUI, and one with the bot. Should be able to set the flag for this with the bots cmd flags file

fickle ember
halcyon quarry
#

On the TGWUI repo there is an expanding text labeled List of command-line flags

#

So Ctrl+F to jump to that and open it up

halcyon quarry
#

@fickle ember welcome to the channel btw, let me know if you have any feedback on the bot 🤗

fickle ember
# halcyon quarry <@1276221874275352646> welcome to the channel btw, let me know if you have any f...

ive been using the bot for some time now. I have some feedback.

  1. I notice that as conversations stretch the AI more or less loses its personality and forgets details about itself which are specified in the character.yaml files
  2. The bot is unable to identify what youre talking about when you reply to someone and talk to the bot. i reviewed the console and it only seems to read the message outright not taking the message being replied to into account, in some cases this information would be vital.
halcyon quarry
#

Thanks! In regards to 1. this is not exclusive to the bot; this will happen in the webui as well. You can try using a system message, or maybe even limit chat history

fickle ember
halcyon quarry
#

About 2.- if you are suggesting that the bot might get an automatic prefix to the message like “(user X is replying to user Y’s original message which was ‘blah blah blah’)” then this could be an interesting idea, assuming I can get the message content from replies (would have to check into this)

fickle ember
halcyon quarry
#

System message is only applicable if your model’s template supports it and you’re in that mode (ei: chat template, or instruct template). The TGWUI code chooses the most appropriate template automatically so you’ll have to look into it a bit to see what template is loading for the model

#

Chat instruct mode might also help… i believe this prefixes your prompts with an instruction

fickle ember
#

appreciated. i will try this.

halcyon quarry
#

I’ll def look into that idea, hadn’t considered that before. I do have some other things on the backburner to make it behave more natural as well

fickle ember
halcyon quarry
#

A few features I want to add at once all under a “server mode” setting

#

Honestly, within a month or two. Been engrossed in this game that has a ladder season which doesn’t have a fixed date but ending relatively soon

#

Ttyl though goin to be now

#

Bed*

valid crypt
halcyon quarry
#

for image generation I retain zero chat history for this reason - chat history will incrementally make the responses get worse and worse from desired result

valid crypt
#

finally got some time to do stt, gonna start with the ez way, another bot only for stt, right now gonna go with whisper although these are interesting too.
(~~https://github.com/modelscope/FunASR~~)
https://github.com/FunAudioLLM/SenseVoice
https://github.com/k2-fsa/sherpa-onnx

GitHub

Multilingual Voice Understanding Model. Contribute to FunAudioLLM/SenseVoice development by creating an account on GitHub.

GitHub

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC...

#

it's a really good way, being really simple, and should be really effective as users can create a private text channels and let bots chat there, while in voice channel feels like totally nothing weird :v

#

not the most elegant way but yeah

halcyon quarry
#

Two bots then?

valid crypt
#

some day when i get better i might be able to fuse them

#

another benefit of this is the environment? and i can use another machine for it :)

halcyon quarry
#

Very cool

valid crypt
#

as you are using discord.py i think i should go with the extension...

valid crypt
#

bruh NotImplementedError: aead_xchacha20_poly1305_rtpsize

#

nah, big win

halcyon quarry
#

So is it working?

valid crypt
#

no more error but im still fighting for it

#

the fork works

#

i think