#ad_discordbot (Fork of Fork of xNul's bot)
Do you do this with other software?
I didn't implement a tts mechanism for me yet
I'm still theorizing about how to approach this
the problem is I will need both the llm and the tts model working at the same time for real time
VRAM issue
maybe 2 gpus will help? but I don't have 2 gpus 😭
I might get a used cheap one
right well this is something I certainly couldn't solve 😛
the theory is there, it's possible to make it real time
it has been done before on yt, fully locally
but there is a hardware problem + software problem
both should be optimised for real time use
somehow
and I remember @terse folio did an experiment where you can cut off the AI, the script reality made adjusts the end of the chat history a bit to make it clear to the llm that it got cut off
LLM: so what I was saying is-
Me: No no shut up
something like this, but there is more to it
AGI .gguf when??
i dont think realtime tts is possible, by sentence would be the maximum
this is really simple to prove. to be more accurate, bring 5 people and have each of them say a word with no context, or think about how you would say the word with no context
im sure that it is impossible to sound natural (continuing?)
would sound like cutting
the interrupt feature is cool
and for stt is almost a must
why not possible?
if a sentence is 10 tokens, and you're getting at least 10 tokens per second, then the voice will start almost instantly with 1s delay
(talking abt the LLM generation)
or 2s delay adding the tts process too
1-2s delay between you submitting the text prompt and the voice playing
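The estimate above can be written as a quick calculation. This is only a sketch of the arithmetic in these messages: the function name is made up, and the 1s TTS time is the figure assumed in the chat, not a measurement.

```python
def time_to_first_audio(sentence_tokens: int, tokens_per_second: float, tts_seconds: float) -> float:
    """Seconds between submitting the prompt and the first sentence starting to play."""
    llm_seconds = sentence_tokens / tokens_per_second  # time to generate the first sentence
    return llm_seconds + tts_seconds                   # plus time to voice it


# 10-token sentence at 10 t/s, with an assumed 1s of TTS processing
print(time_to_first_audio(10, 10.0, 1.0))  # 2.0
```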
I think Marcos just means the whole generating one word at a time, the words not flowing well, etc
There would need to be a lot of research into it like the way they are able to get some degree of reasonable consistency in video generation now
but a whole sentence will be possible
if the quality is bad then we should find a better tts
previous generation would need to guide future
inconsistency
voice2voice?
that's the streaming tts feature
make sense, like taking the last 0.5 second as a starting point for the next generation
to make the voice consistent
No I mean the models and implementation etc would need to be some new tech that does not exist yet
Where it looks back a bit or knows a bit what's coming in order to still generate one word at a time but not sound like random trash
real time doesn't have to be generating speech word for word
if the llm is capable of 10 t/s or more then the tts model can crunch a whole sentence
and while the user is listening to the first sentence, the second sentence will be ready already
and will be played next automatically
queuing
uh huh, OH! Like the bot already does?
between speeches (sentences) there will be no delays
the only delay is the first time
1s or 2s
Audio is queued as it generates
yup yup
It does not pause and wait for a sentence to be spoken before generating the next sentence
so you already implemented that?
As soon as the TTS is generated, it simultaneously plays it while generating the next text and subsequently more TTS
which can be finished by the time the sentence is spoken
yes
what a champion
If you can get the text and TTS fast enough it will stream it nicely
thats why i understood word by word, bcs buddy already mentioned the feature
here
I did some testing on my own, I made something like that a week ago, and it was playing all audio chunks at the same time which is not what I want bcz it gets too noisy etc.. so I figured I should make a queue mechanism and behold it worked, the only delay was the first secs
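A minimal sketch of the queue idea described here, assuming asyncio: one task generates clips while another plays them in order, so clips never overlap and the next sentence is usually ready by the time the current one finishes. `synthesize` and `play` are stand-ins, not the bot's actual functions.

```python
import asyncio


async def synthesize(sentence: str) -> str:
    """Stand-in for a TTS call; returns a fake 'clip' label."""
    await asyncio.sleep(0.01)
    return f"clip({sentence})"


async def play(clip: str, played: list) -> None:
    """Stand-in for audio playback; records the order clips were played in."""
    await asyncio.sleep(0.02)
    played.append(clip)


async def producer(sentences, queue: asyncio.Queue) -> None:
    for sentence in sentences:
        await queue.put(await synthesize(sentence))
    await queue.put(None)  # sentinel: nothing more to play


async def consumer(queue: asyncio.Queue, played: list) -> None:
    while True:
        clip = await queue.get()
        if clip is None:
            break
        await play(clip, played)  # one clip at a time, never overlapping


async def speak(sentences) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    played: list = []
    # Generation and playback run concurrently; only the first clip has to be waited for.
    await asyncio.gather(producer(sentences, queue), consumer(queue, played))
    return played


if __name__ == "__main__":
    print(asyncio.run(speak(["Hello there.", "How are you?", "Bye."])))
```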
.
but we are greedy, so how can we reduce that first delay to almost zero? any ideas?
i suggested to set the first split trigger % to 100%
there's not much to do, it's already splitting the response and generating by chunks, if you want it to be faster, you need better hardware
This is sensible, might add an option for this
there might be some string manipulation? or something else
there has to be a way, there is always a way
but did it reduce the initial delay?
Currently sentences are split via chance_to_stream
so it will roll dice and split or not split
Marcos said it could be a good idea to make it guaranteed to split on the first sentence completion or whatever, then use the normal logic to roll random on other factors
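The suggestion could look roughly like this. It assumes chance_to_stream behaves as a probability between 0 and 1; the function name and signature are hypothetical, not the bot's code.

```python
import random


def should_split(is_first_sentence: bool, chance_to_stream: float, rng: random.Random) -> bool:
    """Decide whether to cut the reply here and send this chunk to TTS."""
    if is_first_sentence:
        return True  # always split on the first completed sentence to cut the initial delay
    return rng.random() < chance_to_stream  # otherwise roll the dice as usual


rng = random.Random()
print(should_split(True, 0.2, rng))   # True
print(should_split(False, 0.2, rng))  # True or False, depending on the roll
```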
does it play the very very first ms once it's generated?
I didn't explicitly time it but probably?
so a special method for only the first sentence?
different than the rest of the reply
and how long will the first delay be?
nothing complicated about treating the first "chance to stream" differently from the rest
however long it takes to generate that text + the TTS
1 sentence
a human delay to first token (lol) is about 0.5-1 secs
you'll be welcome to configure it to send and generate each word separately
will just sound like dogshit, but user preference is fine 🙂
word by word isnt gonna cut it
the goal is to reduce the first delay without affecting the quality
or affecting it slightly
yes, please come back and let us know if and when you solve this problem
I have an idea
generating 100 starter phrases and saving them to disk, then always using those starters (the text and also the speech), therefore real time voice
you know the "start with" concept in ooba?
those 100 voice starters will be pregenerated and saved for future use
you can allow the user to provide a list of starters and click a button "Generate all & save to disk"
then give the user those options/boxes to tick
- Use saved voice starters
- Use saved text starters
Well, adding a play_audio tag parameter could be a good start
I could implement the same logic I have for send_user_image which can accept either a direct image file, or a folder to randomly choose from
oh and also cache first sentences
so in the future when the first sentence matches an already saved first sentence then play it right away
caching is cool to reduce latency
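A rough sketch of the caching idea, assuming clips are stored on disk under a hash of the sentence text. `StarterCache` and the `synthesize` callable are hypothetical names; any TTS function that returns audio bytes could be plugged in.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class StarterCache:
    """Disk cache mapping a sentence to its already generated audio."""

    def __init__(self, cache_dir: Path, synthesize: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.synthesize = synthesize  # hypothetical TTS callable: text -> audio bytes
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, text: str) -> Path:
        # Normalize so "Hmm..." and "hmm... " share one cached clip.
        key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.wav"

    def get_audio(self, text: str) -> bytes:
        path = self._path(text)
        if path.exists():
            return path.read_bytes()   # cache hit: no TTS delay
        audio = self.synthesize(text)  # cache miss: generate once, keep for next time
        path.write_bytes(audio)
        return audio


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = StarterCache(Path(tmp), lambda text: text.encode("utf-8"))
        for starter in ["Hmm...", "I think...", "So..."]:  # "Generate all & save to disk"
            cache.get_audio(starter)
        print(len(list(Path(tmp).iterdir())))  # 3
```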
I've been battling llm repetition for the last year or so
believe me it's not
a lot of models at some point will output the exact same first sentence
I do already have a tag parameter begin_reply_with - I imagine it will not actually generate that text
So you are already welcome to proactively prefix the LLM's reply with a specific string
yeah you can just copy paste that function and edit it a bit
just a food for thought
I'll think about it
you can target the most repetitive sentences ever
oh wait the user has to
but if you're the user, lol, then you just generate the most probable first sentences
in the meantime, a good way to battle repetition could be to add some preconfigured randomness to your prompts, in the background
I tried man
nope
some models are very stubborn, no matter what you do they insist on repeating
I also have the llm_param_variances tag where you can preconfigure ranges for different parameters, and each generation it will randomly select values within those ranges
llama and mistral models are among them
There's a lot of tools in here that should be able to put prompting and generations through the blender
it doesnt cause a whole history reevaluation?
I don't think so
But you will get a whole history reevaluation if someone writes something in a different channel ("per-channel history" setting)
btw a way to battle inconsistency in voice is adding a little bg noise, like winds or whatever the user chooses, the human brain can't tell the difference
that's kind of a funny idea
there's probably some functions to mix audio together and split on the original length of one
ffmpeg!!!
or loop one to the length of the other, then mix and split
is faster than python libs
LLM says "I'm at the beach" and it mixes in the sound of waves crashing and seagulls cawing
or yk just play the winds on loop forever once the webui is accessed
I like it 😛
I think a lot of people will want that
like a game
no need to merge audios
oh right
we're getting to silly tavern territory, but seriously the amount of features they have is insane
a lot of effort went into it
Never used it
me too, but just looking from outside it is very cool
u trying out my LLM streaming feature?
fix it 😛
I tried a lot to make it work
I'm now a free man
my webui is any ui
I use llama.cpp and koboldcpp nowadays
You're here looking for an excuse to fix tgwui
and a lot of scripting with python
I tried again and again but it just doesnt install
tried to fix those conda env issues but still no hope
screw it I will make my own webui
Just run the 1 click installer and be done with it
it installs and runs from its own miniconda
Idk how this can go wrong for anyone
fix errors*
you want to make a game like tab?
I can contribute a bit, but can't promise doing the whole thing
wdym
The bot needs a settings interface but settings interfaces are a big PITA to code
like with visuals
would do that before thinking of a game interface
we can make something from scratch
not interested 🙂
i think i suggested a similar idea: prepare a bunch of voices or sounds that are always normal to begin a reply with, like umm, uhhh, ehhh, i think..., i..., then randomly choose one and play it at the beginning. with about 2s we get enough time to process the rest, 1s of audio plus 1s of silence feels natural
you tried it?
I'd think something more like "Let me think about that..." Or "Well, let's see..." or "hmmmmmm...."
Unless its a frat bro LLM char "Yo like, uhhh..."
just for inspiration, should be customizable though
I'm going to fix that progress bar tomorrow, no matter what
the image gen progress bar that likes to vanish
It's okay to have a second of silence,
People need time to think.
While people talk to us in voice, we are building up an output response in our minds, just not speaking it yet.
That's why it seems like humans can respond so quickly, because they have been thinking about it during the whole sentence.
if the llm is fast enough, you could do this all quickly at the end.
if not, it may be better to generate a little bit mid-sentence so the llm is more prepared for when you're done speaking and can get to tts immediately
I need to rebuild that system
I figured out the reason why the progress bar poofs. Got it on the first guess really... should've debugged it sooner
apparently the API call can report measurable "progress" before it will return a positive "job count"
I fixed it though
Now I just ignore job count - if it returns progress data it keeps going
I'll see if I can improve it a bit more such as updating the message if it stalls, etc
yea, some sort of timeout for how long it takes for progress to change.
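A sketch of how the fix plus a stall timeout could fit together. Everything here is an assumption for illustration: `fetch` is a hypothetical callable returning a dict with a "progress" fraction, and treating 1.0 as "done" is a simplification rather than how any particular SD API reports completion.

```python
import time
from typing import Callable, Iterator


def watch_progress(
    fetch: Callable[[], dict],
    stall_timeout: float,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[float]:
    """Yield each new progress value until done; raise if progress stops changing."""
    last_value = None
    last_change = clock()
    while True:
        data = fetch()
        progress = float(data.get("progress", 0.0))
        # Any "job count" field is deliberately ignored: progress data alone keeps us going.
        if progress >= 1.0:
            yield 1.0
            return
        if progress != last_value:
            last_value = progress
            last_change = clock()
            yield progress
        elif clock() - last_change > stall_timeout:
            raise TimeoutError(f"no progress change for {stall_timeout}s")
        sleep(poll_interval)
```

If a model is being offloaded to RAM, stall_timeout would need to be generous, since progress can sit still for a long time without anything being wrong.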
but if a model is being offloaded to ram, it could take too long to make meaningful progress
Pushed fixed and improved SD Img Gen Embed
If I can figure out how to effectively use the "Cancel" api endpoint I may add a Cancel button to the embed
going to also want to make sure the button is disabled on completion so users cant mess with others' generations
ah yes... well it could check if the user is the original user. Or I could make it separate and ephemeral
I mean, even if it is the original user,
A malicious user could do an image request then cancel it after completion
because SD doesn't know what job you are canceling, it just knows to stop everything
similar to tgwui
oh nice!
which I believe will be present in the progress data...
okay, that's pretty good ^-^
^-^
Just made a very nice commit for SD Forge API handling
Forge API users will rejoice
Mainly Illyasviel had removed the functionality of “override_settings” whilst revamping the code for Flux support, but never plugged it back in
forge can use flux?
🤯
There is example code in more recent versions of the bot (dict_imgmodels, basesettings) showing Forge specific values that can be used for managing Flux models
Forge got Flux, then I waited… and no one fixed the api related code… so took a stab at it and made it happen (a few times now)
:v
I’m going to add 2 new tag params:
- name - will be used in cmd print statements, as well as for…
- if_tags_matched - a new condition, a list of tag names. If any tags with a name are matched the condition will be true
For the latter, the main benefit will be if you have a crap ton of triggers for a tag, but you want another tag to trigger on those same triggers plus maybe another condition… or you just want 2 separate tags with the same triggers, you can just use the name as the trigger
Quick question
Can the bot connect to a comfyui server?
#1154970156108365944 message
I think that or something similar is on the todo list
Noted
Thanks for the quick response
That is literally all it can do right now, but I didn’t code anything in to actually use it yet 🤣
Er, I’m working to make it work with SwarmUI which is a frontend for Comfy
Might also just work for Comfy directly
My PR at Forge got merged so override_settings now works for Forge again
Pushed new setting to define imgmodel filters per-server
- Enhances usefulness of the per_server_imgmodels setting
I'll likely add another code block for per_channel_filters because why not
Not necessarily to split SFW / NSFW bot uses, but can help with only enabling relevant models, like if you have a server dedicated to cartoons and such can keep realistic models out, etc
im loving the discord bot API
took me a bit to set up the response formatting and configuration and figuring out the yaml settings but after that it works like a charm
im using a llama 3.1 70B exl2 4.0 model and image generation alongside it
fits snugly within my specs
the tag system is very fun
:) leave him a star
@tepid needle thanks! Glad you're enjoying it. Open to any feedback or suggestions
The tags system was this eureka idea I had to wrap up most of my existing features into one package, with easy expansion/enhancement/all that. I was in way over my head when I got into coding it, and its one of those things I'll always be proud of.
the tag system works wonderfully with image generation
minty swaps in and translates their prompts into something usable for image generation
really, really good
makes it intuitive and easy for the user who doesnt know how to work with image generation
the only thing I wished was that image generation swapped to a different model depending on a tag
Well that can be done
Huh, then I did something wrong then cause I tested it out using selfie with a character
Coincidentally, earlier today I noticed my comment for that param was not correct... well, it was correct at one point
The value for the swap_imgmodel or change_imgmodel tag param should be the "model name"
The model names can be fetched from the sd-models endpoint
it's basically the filename except any subdirectories will be prefixed before it
so to use an example
lets say I want to swap over to the 'juggernautXL_juggXIByRundiffusion.safetensors [33e58e8668]'
would I type it like this in the character file?
swap_imgmodel: 'juggernautXL_juggXIByRundiffusion.safetensors [33e58e8668]'
Such as /sdxl/leosamsHelloworldXL_helloworldXL70 - the model name is:
sdxl_leosamsHelloworldXL_helloworldXL70
I'll fix that comment now
swap_imgmodel: juggernautXL_juggXIByRundiffusion would be correct unless its in a subdir
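The naming rule described in these messages, as a small sketch. It only mirrors what is said here (subdirectories prefixed with underscores, extension dropped); the function name and the list of extensions are assumptions, and the values returned by the sd-models endpoint remain the source of truth.

```python
MODEL_EXTENSIONS = (".safetensors", ".ckpt")  # assumed set of checkpoint extensions


def model_name_from_path(relative_path: str) -> str:
    """Build a model name from a path relative to the models folder."""
    parts = [p for p in relative_path.replace("\\", "/").split("/") if p]
    name = "_".join(parts)  # subdirectories become a prefix joined by underscores
    for ext in MODEL_EXTENSIONS:
        if name.endswith(ext):
            return name[: -len(ext)]
    return name


print(model_name_from_path("/sdxl/leosamsHelloworldXL_helloworldXL70"))
# sdxl_leosamsHelloworldXL_helloworldXL70
print(model_name_from_path("juggernautXL_juggXIByRundiffusion.safetensors"))
# juggernautXL_juggXIByRundiffusion
```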
theyre not, this is the file structure
\stable-diffusion-webui-reForge\models\Stable-diffusion
yep, then as I said should work
apologies for the confusion
man, I cant wait to get that settled in
now on key words I can swap to a specific model with the proper Lora to match an aesthetic
fun stuff
Yep! Lots of fun stuff to do with it
What amazes me is that when I promote this no one even looks lol
Baffling really, but I'm plugging away all the same
thats crazy
the tag system makes it really intuitive for a user who has no idea how prompting works to get what they want
i can just throw the bot into a server and not be pinged to get a specific prompt for an image
similar prompt but from two different users using different keywords
the tags activate and the proper Loras are used
So looking at my code - it looks like it should actually work either way
whether using juggernautXL_juggXIByRundiffusion.safetensors [33e58e8668] or juggernautXL_juggXIByRundiffusion
got it
Noice
In Forge... my instructions will fail though because for whatever reason, the UI does not show the [HASH] if it was calculated
You're using reforge though, that value from the UI should match the value in the API call
im updating my repos before testing this out
It tries matching these values marked in the left
here in Forge the damn dropdown values dont match anything
I guess I could also check if filename.endswith(value)
Another thing to consider, is trying to come up with your own prompting characters - using the same approach as the M1nty-SDXL character
just need to provide different example responses
I'm not going to dive in but I believe that it is now just prepared to load that model again on the next request
If not - then I have something to fix 🔧
Did you use the swap_imgmodel param?
ditto
gotta get ready for work tomorrow
but with this i can use the tag system to swap to the appropriate model
Some more cool image stuff to check out:
- Flows tag is very powerful once you get it, check out the few examples I provided. For example, you can gen an image then use it as input to gen a second image, such as img2img and/or as a controlnet input.
- I have an explanation in the “tips” folder on how to make some advanced workflows… the bot has a very elaborate method of choosing random images from nested directories. If you have a number of different image inputs in the same tag (img2img, controlnet inputs, reactor input, etc), all set to randomly select, it will try finding the others in the same directory
Such as, if I have a number of directories of “person posing with product” - and each directory has a package of the same inputs except they are different poses - the bot is able to basically pick one of those directories and apply all the matching inputs
oooh, I see
So I could have someone pose with the product in all different ways, then make some inpainting masks / etc for each one
ill first need to figure out these other parts of stable diffusion because i started two days ago reading up on stuff
There’s a lot to digest!
what I would like is to use the upscaler for higher quality images when specified by the user
which im assuming can be done with the flows tag
I’d recommend using the Flows tag for that - and use a series of incremental scale ups
Exactly
i have something similar in comfyui so I get the idea
There’s a cool extension you can get called Loopback Scaler
You’ll find it in extensions list
found it
Basically you’d do something like double your resolutions, then use 4+ steps with a medium-low denoise
I believe I added a custom payload param in the bot, “scale”
(Might be tag param)
Well with Flows tag you’d wanna do something like
Step 1: scale 1.3
Step 2: scale 1.2
Scale 1.1
Or similar
Note that it will round dimensions to 64px precision
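To see what incremental steps with 64px rounding do to a resolution, here is a small sketch. It assumes rounding to the nearest multiple of 64 (the bot may round differently), and the function name is made up.

```python
def scale_dimensions(width: int, height: int, scale: float, precision: int = 64) -> tuple:
    """Scale a resolution and snap each side to the nearest multiple of `precision`."""
    def snap(value: int) -> int:
        return max(precision, round(value * scale / precision) * precision)
    return snap(width), snap(height)


size = (1024, 1024)
for scale in (1.3, 1.2, 1.1):  # the three steps suggested above
    size = scale_dimensions(*size, scale)
    print(size)
# (1344, 1344)
# (1600, 1600)
# (1792, 1792)
```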
I may add a tag to adjust rounding precision
interesting tool
I got some great results using the xinsir Tile controlnet model with that - along with Reference controlnet
This allows a high denoise value
While keeping very close composition
im getting the hang of it
Definitely my favorite way to upscale
You should consider switching over to Forge from ReForge
What are the benefits?
Mainly, Forge supports Flux
The model loading/memory management may be marginally better than ReForge as well
I have a 4070ti (12gb vram) and quantized versions of Flux generate in about the same timeframe as XL
i got the answer, Doherty Threshold, the time must be <400ms and you can lengthen this time by giving any response/feedback 😄
what's that I didnt understand wdym
search Doherty Threshold
so how would you go about implementing that, exactly?
the goal is to make it <400ms
to lengthen the time could be hmmm
premade sounds that can be played right away, while the actual generation is being baked in the backend
Doherty Threshold only says that under 400ms we dont feel the waiting, and that giving any kind of feedback can lengthen those 400ms
you mean we got like 0.4 secs for free?
I mean if the actual generation is late a bit it's alright because we can play a premade sound even after 0.4 secs
this tells us that we have to process the first sentence+tts in less than 800ms in worst scenarios
yes, if we dont get the response in 0.4 we can play a sound to get another 0.4
if humming is 1s which i think is alright in total we have 1.8s
1.8s of free time to generate (speech) the actual llm response
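The budget worked out above, as arithmetic. The idea that one piece of feedback buys exactly another 0.4s is this chat's reading of the Doherty Threshold, not an established figure, and the names below are made up.

```python
DOHERTY_THRESHOLD_S = 0.4  # wait that reportedly goes unnoticed


def generation_budget(filler_seconds: float, feedback_extensions: int = 1) -> float:
    """Time available to generate the first real sentence + TTS before the silence feels long."""
    # Initial 0.4s, plus 0.4s per feedback cue, plus the length of the premade filler clip.
    return DOHERTY_THRESHOLD_S * (1 + feedback_extensions) + filler_seconds


print(round(generation_budget(1.0), 2))  # 1.8 with a 1s hum
print(round(generation_budget(0.0), 2))  # 0.8 with no filler clip
```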
seems cool
you just add the humming or whatever default starters we choose to the llm response before hitting start
so in the end the llm response will make sense bcz the voice and the llm response are matched perfectly
you will choose a random starter to play?
from the list?
Example list
hmm...
hmm.. {user}
{user}
I think...
So...
You know what?
I... hmm..
umm...
starters/premade starter voices
this is a different approach to make the bot better: setting the split to 100% gets you bombed by discord, and at a lower % it's a matter of luck
that looks good
wdym set the split to 100%?
have you tried the streaming tts feature?
This guy doesn’t even use text generation web UI
Yeah I’ll add the parameter to trigger the first TTS split I’ve just been busy
I’ve been pushing some pretty significant pull requests to Forge the past few days
you can allow the users to provide a custom list, then click a button "Generate Starters", all those starters will be voiced and cached for the future
just ideas for you guys
after thinking this idea, if we set start reply with "Hmmm"
we start a reply, play a preprocessed "Hmmm", after generating the reply it will have another "Hmmm", so in total we will hear 2 "Hmmm"
sometimes it's better to control the users behavior than the code
so add a note "Preferably write long starters for good performance"
a note under the feature
longer starters will give you more headroom, more secs, more time to work with
Sometimes I'll get an error message when it tries saving History to file.
This was reported before.
Pushed an update that resolves this
idk if you were referring the problem i encountered before but indeed there was a problem with the history file
Probably it
Pushed an update
- Tags can now have a "name" parameter - For the moment, this doesn't do much, only:
- prints the name for some tag logging
- Is now required for tags which include a "persist" param.
Coming soon, new condition which will be True if the value matches the name of a matched tag
Using 'name' to log/check 'persist' tags sits much nicer with me, than what I was having to do in order to capture the tag value and compare it for equality
That's awesome!
I think that would have been great to test a theory a while ago while debugging what was up with the censoring bypass.
For easier at-a-glance readability, every log for tags could be displayed as [TAGS | {name}] since it's a stored attribute that gets passed around ^^
I might update the 'trumps' behavior to operate on 'name' instead of 'triggers'
Pushed update adding new condition 'only_with_tags'
- only_with_tags is a list of tag names (the new name param)
- This condition is only True if one of the named tags was matched.
- Condition will also be True for any named tags that are persistently applied
- If a named tag was matched, then trumped, this tag won't trigger.
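A sketch of how the three rules above could combine. The function and argument names are hypothetical; only the behavior (matched, persistent, trumped) comes from the update notes.

```python
def only_with_tags_met(only_with_tags, matched_names, trumped_names, persistent_names) -> bool:
    """True if at least one listed tag name is effectively active this turn."""
    # Matched tags that were later trumped do not count; persistent tags always do.
    live = (set(matched_names) - set(trumped_names)) | set(persistent_names)
    return any(name in live for name in only_with_tags)


print(only_with_tags_met(["beach"], matched_names=["beach"], trumped_names=[], persistent_names=[]))         # True
print(only_with_tags_met(["beach"], matched_names=["beach"], trumped_names=["beach"], persistent_names=[]))  # False
print(only_with_tags_met(["beach"], matched_names=[], trumped_names=[], persistent_names=["beach"]))         # True
```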
@halcyon quarry one quick question before I forget
Is it possible to allow a tag, when triggered, to input a randomly chosen set of values for image generation?
Yes it’s the img_param_variances tag @tepid needle
You predefine the ranges for each setting you want randomized
For number values (integers and floats) it does not use the value that it picks directly. Rather, it will add or subtract the selected value from whatever the default value is. Like if you have 30 steps and it chooses five, the image will generate with 35 steps
I have some comments included for the tag, so go check it out
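Roughly what that add-or-subtract behavior looks like. The config shape here (one symmetric spread per parameter) is an assumption made for illustration; the real format is in the comments shipped with the tag.

```python
import random


def apply_variances(defaults: dict, variances: dict, rng: random.Random) -> dict:
    """Offset numeric defaults by a random amount within +/- the configured spread."""
    result = dict(defaults)
    for key, spread in variances.items():
        base = defaults.get(key)
        # Only plain numbers are offset; bools, strings and missing keys are left alone.
        if isinstance(base, bool) or not isinstance(base, (int, float)):
            continue
        if isinstance(base, int):
            offset = rng.randint(-int(spread), int(spread))
        else:
            offset = rng.uniform(-spread, spread)
        result[key] = base + offset  # the picked value is added to the default, not used as-is
    return result


defaults = {"steps": 30, "cfg_scale": 7.0, "sampler_name": "Euler"}
print(apply_variances(defaults, {"steps": 5, "cfg_scale": 1.5}, random.Random()))
# e.g. steps somewhere in 25..35, cfg_scale somewhere in 5.5..8.5
```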
Voice input
😵💫 So I have a trigger, the trigger must match and then if the name matches too then activates the persist tag?
im 😵💫
As long as the tag with persist also has a name parameter, it will work
Instead of retaining a copy of the entire tag value, as I was doing, it is now only retaining the name
During the tag matching, it fetches any persistent names that were captured.
As it iterates over the tags, if a tag has a matching name it will be automatically applied
the persist tag must have a name
if the trigger matches ✅
if the name matches ✅
The name does not have to match anything for the tag to trigger in the first place
It's only used to re-match the tag
you mean that before it retains the content of the tag and now it retains the name and looks for the content?
😵💫
Yes, before it made a copy of the entire tag and kept it
Now when a persistent tag is matched, it just captures the name
It applies the entire tag
Look, as far as you are concerned all you need to do is slap a name on it and it will behave as it has been
what are the benefits?
I manipulate the tags as they are being processed
by taking a snapshot of the tag value, then trying to compare it again later for equality, I had to modify a lot of code
so instead of capturing something like {'trigger': 'some text', 'should_gen_image': true, 'insert_text': 'some shit', 'text_replace_method': 'insert'} etc etc, then trying to match it later
Now I just capture: persist_tag_names: ['some shit', 'another persistent tag']
These are "tupled" with the number of remaining persistency
so it can be deducted by 1 every time until zero
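A minimal sketch of that bookkeeping, assuming each entry is a (name, remaining) tuple. The function name is hypothetical and the real bot may count slightly differently.

```python
def tick_persistent_tags(persisting: list) -> tuple:
    """Return (names to re-apply this turn, updated list with counts reduced by one)."""
    active = [name for name, remaining in persisting if remaining > 0]
    # Drop entries once their remaining count reaches zero.
    updated = [(name, remaining - 1) for name, remaining in persisting if remaining - 1 > 0]
    return active, updated


persist_tag_names = [("beach scene", 2), ("night mode", 1)]
active, persist_tag_names = tick_persistent_tags(persist_tag_names)
print(active, persist_tag_names)  # ['beach scene', 'night mode'] [('beach scene', 1)]
active, persist_tag_names = tick_persistent_tags(persist_tag_names)
print(active, persist_tag_names)  # ['beach scene'] []
```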
healthy code healthy dev
I found a bug which, in some cases, caused a number of tags to be completely skipped when using the /image command.
Just pushed a fix for that
is civit half down for you?
civit seems fully OK as far as I can tell
Your network or work network?
looks fine
if you are at work they may have put some specific block or something
try a different browser
Finally got around to resolving this
one line I needed 😄
Wooo
👏
Happy Halloween to you too
Happy Halloween!
Hello!
Hi!
wave wave How's the project going?
^-^
Been on break 🙂
That's a mood, I've been focusing on a lot of life stuff lately
get back to work 😐
I don't pay you 0$ per month for nothing
😶
Wow that's more than me, need to ask for a raise
you get -100$
congrats!!
This is actually pretty huge... Panchovix (SD ReForge) was able to bring the lora control extension into the Forge memory handling, which has been in demand since the initial release of Forge
I'll be pushing an update in the next day or so to allow ReForge to use my automatic loractl scaling feature
which is currently only enabled for A1111
Ok well he made some other changes that screwed it up and now he has the feature back in dev lol
downloading reforge...
It's probably the best UI if you don't care about Flux, and are iffy about Comfy / Swarm
This was the commit I tested where the "lora ctl" feature was still working.
https://github.com/Panchovix/stable-diffusion-webui-reForge/commit/1e950bc12e3a4f690ef6afcf96826e02e4d24ee9
he definitely broke it on the next commit or one of the next ones
I began adding Swarm support to the bot, but have been tied up with a video game lately / lost motivation atm
So far all it can do is detect if Swarm is running and capture the session id and that's it 😆
i wanted to try a model that for some reason doesn't work in forge
forge
Ok so now the loractl feature ACTUALLY WORKS in ReForge
I’ll push a quick update today to allow the bot’s auto loractl scaling feature to be enabled with ReForge
Actually works very well if you want to try setting up a whole crapton of tags with Loras, and triggering multiple
Pushed an update which allows loractl with ReForge
Pushed an edge case minor update
The bot can now apply the aspect ratio from an img2img image, by using the value 'from img2img' for the aspect_ratio tag parameter
This is useful for applying multi-controlnet, and other multiple-image-input tags using "random directory" method. The subdirectories can now have different resolutions.
never tried img2img :v
You can do a number of things with img2img via Tags
the most recent image generated is retained in the user images/temp location - so you can use '__temp/temp_img_0.png' as a valid img2img
Ya know, to use the last image as input. Can make a tag for that
Been daydreaming about how to make a very flexible integration of ComfyUI, with the bot’s tags system. It would be super cool to be able to run all sorts of workflows, prompt for required inputs, handle whatever the response is whether image video audio etc
text2video, img2video, vid2vid - all very accessible now to anyone with a 3060+ with acceptable quality and generation time
🎅

Still works for me
Someone just posted this on reddit… supposedly the web search extension works?
Don’t have time to check it out myself atm
i remember that what didnt work was when you plug it to the bot
although ive never tried adding it
@valid crypt sounds like you've tested out alltalk v2? Does it work with the bot?
I havent played with it yet
nope
not even for tgwui :v
the remote version works
seems like there is a "TGWUI Remote Extension" that alltalk v2 is compatible with
but with bot just no response
so its not a feature of alltalkv2 per se
ah
it is part of alltalk...
ah yeah I think I also saw that message recently... and stuck with v1
which still works fine
I'll look into this a bit more
:)
Thanks
Hello! I've run into an issue that I suspect is my own doing but I'm kind of at a loss. At some point I created a conflict in the installer environment, so opted to just reinstall the whole stack (oobabooga, ad_discordbot and all) rather than try to figure that out. Should have been a clean wipe, but now I'm getting this error when I try to interact with the bot.
ERROR [bot.__main__]: An error occurred in llm_gen(): 'static_cache'
Traceback (most recent call last):
  File "/home/mole/text-generation-webui-main/ad_discordbot/bot.py", line 2003, in llm_gen
    async for chunk in process_responses():
  File "/home/mole/text-generation-webui-main/ad_discordbot/bot.py", line 1953, in process_responses
    async for resp in generate_in_executor(func):
  File "/home/mole/text-generation-webui-main/ad_discordbot/modules/utils_asyncio.py", line 161, in generate_in_executor
    result, is_done = await loop.run_in_executor(None, get_next_generator_result, gen)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mole/text-generation-webui-main/installer_files/env/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mole/text-generation-webui-main/ad_discordbot/modules/utils_asyncio.py", line 29, in get_next_generator_result
    result = next(gen)
             ^^^^^^^^^
  File "/home/mole/text-generation-webui-main/ad_discordbot/modules/utils_tgwui.py", line 342, in custom_chatbot_wrapper
    for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
  File "/home/mole/text-generation-webui-main/modules/text_generation.py", line 42, in generate_reply
    for result in _generate_reply(*args, **kwargs):
  File "/home/mole/text-generation-webui-main/modules/text_generation.py", line 97, in _generate_reply
    for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
  File "/home/mole/text-generation-webui-main/modules/text_generation.py", line 305, in generate_reply_HF
    if state['static_cache']:
       ~~~~~^^^^^^^^^^^^^^^^
KeyError: 'static_cache'
I know it's an issue on the bot's end as the oobabooga client on its own works just fine. But other than that I haven't a clue. Also tested it using one of the known good example characters. Same thing.
This'll probably end up being silly, so I accept any teasing coming my way. 😛 I just want to get the bot working.
This isn't your fault, it's likely that you're using a version of tgwui that is more modern than the ad_discordbot
TGWUI is expecting a certain variable from the bot that the bot doesn't know to provide.
I'm running on a pretty outdated version of tgwui, not going to take the risk updating just yet!
But here's how you can patch that:
In tgwui/ad_discordbot/dict_base_settings.yaml
you can add static_cache: False
as one of the options under llmstate > state
Could be! I had installed tgwui a while back to originally use with oobabot before making the switch, so it's very much possible that what I grabbed a second time is a bit too new. I'll try that patch and see if it works!
If after this, you continue to have similar errors about something missing from the state variable you can find the defaults under tgwui/modules/shared.py
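For anyone hitting the same kind of KeyError, here is a rough sketch of how you could spot which keys the bot's state is missing compared to TGWUI's defaults. The two dicts are hypothetical stand-ins with made-up values (the real defaults are in tgwui/modules/shared.py, the bot's state is under llmstate > state), and `missing_state_keys` is an illustrative helper, not part of the bot:

```python
# Hypothetical stand-ins with made-up values. In practice the defaults come from
# tgwui/modules/shared.py and the state from dict_base_settings.yaml (llmstate > state).
tgwui_defaults = {"max_new_tokens": 512, "temperature": 1.0, "static_cache": False}
bot_state = {"max_new_tokens": 400, "temperature": 0.7}


def missing_state_keys(state, defaults):
    """Return the default keys that the bot's state does not provide yet."""
    return sorted(key for key in defaults if key not in state)


missing = missing_state_keys(bot_state, tgwui_defaults)
print(missing)
```

Anything it prints is a candidate to add under llmstate > state, the same way static_cache was added above.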
Looks like that worked perfectly! It's up and working just like it was before. Thanks for your help!
must be something with a fresh installed tgwui, i updated like a week ago and i have no problems :p
i discovered something, it actually does something, but idk why theres no audio
Need to update the bot to add static_cache
This is the first time in awhile that ooba added new params that didn’t fallback to a default value when not in the payload
updating to latest TGWUI and testing...
yes, works fine with this key added.
pushed! 2 lines! 😄
Yes I see that the original alltalk_tts is not working on the latest TGWUI
this is very unfortunate, the dev was so dedicated
yes seems like we need to go to the dev v2 version...
yes you need
although even v2 doesnt work ._ .
but it is amazing
this is crazy, the standalone app takes like 20 minutes to install
is it too much or too fast
what is the good xtts model again? 2.0.2?
Mistakenly did not retain my model
the base model can be easily downloaded here
Right, but if memory serves me right 2.0.3 was considered a step back…
i think that the whisper stt stopped working after the update
That sucks if a number of extensions just suddenly don’t work anymore
i have an old tgwui on my other pc and i tried it and it worked, but the latest didnt work
you can try if it works for you
Seems the newest version breaks a lot of things including the openAi extension
i might get an old tgwui just for alltalk without remote
i saw ooba updating that extension 4 days ago
maybe wasnt ooba but if ppl updated it so it should work right?
I'll have to check it out soon, but just saw someone having an issue with the static_cache variable not being present
That's great to hear ^^
I’ve still been mostly engrossed in this game I’m playing but plan on testing alltalk v2 / remote - more, soon
I think I forgot to send a message but i've been submitting issues on the github for while now
it was a fresh install for docker
this seems incorrect...
What's this from?
TGWUI updater?
was there an updater for the bot? it's been a while
that's odd, wonder what happened.
You could try deleting the .git folder if something is missing and have it redownload
the correct answer is i'm stupid and didn't install it right
nvm I installed it right what is happening
I'm just gonna retry everything but through gitbash instead of terminal
Tgwui is supposed to create its own environment, that might come with its own version of git in the conda environment
I'm not sure if that would affect things
im not using tgwui
ahh, what are you using?
Tgwui is short for text-gen webui ^-^
no worries!
should update that while i'm at it tho
is this an updater for the ad_discord bot, or the webui?
im not sure if the bot has its own updater script
ad_discord bot
ah, okay, will check on that
are you updating from a very old version of the ad_discord bot?
the updater seems to be working but i'm getting another error
hm, looks like the wrong path was put in the script, that should be pointing to
textgen_webui/installer_files...
ad_discordbot is meant to be inside textgen_webui
ohno!
thats fine i know what they were
I would recommend moving your models to a separate folder
and launching the webui with an argument to tell it to read from that folder
Like a symlink
I personally keep my models on a dedicated drive ^^
oh yeah, i should move them onto my m.2 once it gets here
that way you can move things around without transferring large files or deleting things
PS: I wouldn't recommend storing large projects on your desktop
windows has to load all those files as your pc boots up
and can slow things down
boot time hasn't really been an issue but thats good to know
oh my god
the web ui version i had downloaded was 1.18
Wow, that's surprising, usually it defaults to latest for downloads
no, i'm getting the latest now, I had 1.18 on my pc before
There’s no bugs with the bot atm 
great!
I had pushed the update yesterday to add that one new key
Although I should really make a new “Release”
@twin thunder be sure to see the current install instructions for the bot
I was being quite silly the whole time (did not clone the repo to the right spot)
what the hell
how many models did I have installed
I just freed 150gb of space
what are y'all's preferred models?
I'm a few months behind on the latest stuff, but I found that I could manage to fit gemma 27b at 2bits on my gpu which worked surprisingly well.
But I also use some llama3 8b finetune Hathor_Tahsin for simpler things
literally XD
guess what I've found after installing alltalkv2 as an extension
works just fine eh?
that.. looks like remote...
the extension is the same? just using the same environment I think, but I have to try if it works!
i'll just ask why the start up was tgwui mode and then remote...
never mind, bot can't even load
yea stick to 2.0.2, 2.0.3 i feel a slight improvement but it takes me 70% more time...
Of TGWUI?
I believe those versions refer to XTTS models
xtts
Ah
hey guys how do i stop people dm'ing my bot?
do i gotta do all this? seems a bit over the top:
To disable DMs for your bot while using the ad_discordbot plugin, you can modify its behavior based on its structure and configuration. Below are steps and examples for implementing this functionality:
1. Modify the Message Event in ad_discordbot
The ad_discordbot plugin processes messages through a message event listener. You can add a check to ignore DMs. Look for the section in the code handling the on_message event or similar and update it to include a guild check.
Example:
@bot.event
async def on_message(message):
    # Ignore messages sent in DMs
    if message.guild is None:  # DM channels don't belong to any guild
        return
    # Continue processing messages in servers
    await bot.process_commands(message)
2. Bot Settings for Scope Restriction
Check if ad_discordbot has configuration settings or a config.json file to define bot behavior. If such a file exists, look for options to disable or restrict DM responses.
3. Ignore DMs Globally
If ad_discordbot uses decorators for command definitions (e.g., @bot.command()), you can add a global DM filter to enforce the restriction across all commands.
Example:
Modify or wrap the command logic:
from discord.ext import commands

def no_dm_check(ctx):
    return ctx.guild is not None  # Allow only messages from guilds

@bot.command()
@commands.check(no_dm_check)
async def my_command(ctx):
    await ctx.send("This command only works in servers.")
4. Update the ad_discordbot Core Logic
You may need to update ad_discordbot's source to handle this at a higher level:
Locate the part of the code where the bot reads incoming messages or processes events.
Implement a DM filter as shown in the examples above.
5. Redirect DM Senders (Optional)
If you want to send a polite response to DM users instead of silently ignoring them, you can modify the behavior to include a reply.
Example:
@bot.event
async def on_message(message):
    if message.guild is None:  # Check if the message is from a DM
        await message.author.send("I do not respond to direct messages. Please use the bot in a server.")
        return
    await bot.process_commands(message)
6. Testing and Validation
Restart the bot after making changes.
Test it by sending DMs and ensuring the bot does not respond.
Ensure commands work correctly in servers.
If you encounter specific issues with ad_discordbot integration or need help pinpointing where to add these changes in its structure, provide snippets of its core processing logic, and I can assist further.
also stop it replying to other ai's bots too
pretty sure there were settings ^^
chance_to_reply_to_other_bots in base_settings.yaml
ah, it wasn't that texting was disabled in dms, just some commands
~~it shouldn't be too hard to add a setting for that and an extra if statement in the on_message
I wont be able to test it as i'm using an older version of TGWUI and dont want to update yet but can make a branch for you to try in a few hours.
Busy atm~~
Edit: there is actually a setting, different file
discord > direct_messages > allow_chatting in config.yaml
and can disable all commands in dms too with the next setting allowed_commands
thanks checking now 🙂
i dont see this in my config.yaml discord > direct_messages > allow_chatting in config.yaml
and can disable all commands in dms too with the next setting allowed_commands
reply_to_itself: 0.0 # 0.0 = never happens / 1.0 = always happens
chance_to_reply_to_other_bots: 0.0 # Chance for bot to reply when other bots speak in main channel
reply_to_bots_when_addressed: 0.0 # Chance for bot to reply when other bots mention it by name
only_speak_when_spoken_to: true # This value gets ignored if you're talking in the bot's main channel
ignore_parentheses: true # (Bot ignores you if you write like this)
go_wild_in_channel: true # Whether or not the bot will always reply in the main channel
conversation_recency: 600
You can put code in code blocks by surrounding them in 3 backticks ```code```
chance_to_reply_to_other_bots already being 0 and still happening might be a bug, interesting
what kind of bot does it reply to?
Are these bots mentioning the AdDiscordbot?
It's possible you have an older version of the config.
I'm not sure how updating it works
thanks 🙂
Just run the updater bat file
what I mean is, do the configs get updated too?
because they're editable
i think configs dont get updated, only the first launch will copy from example, but after that, you have to copy or edit them manually
The config templates get updated-
On startup, the bot compares user settings to the settings templates. Any missing user settings default to what is in the templates, while warning in the cmd window
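A minimal sketch of that "fill in whatever is missing from the template and warn about it" behaviour, for anyone curious. This is not the bot's actual code: `fill_missing_settings` is a hypothetical helper, and the values in the example dicts are made up (only the setting names come from the chat):

```python
import copy


def fill_missing_settings(user, template, path=""):
    """Copy keys missing from `user` out of `template`; return a list of warnings."""
    warnings = []
    for key, default in template.items():
        location = f"{path} > {key}" if path else key
        if key not in user:
            # The user's file lacks this setting entirely: fall back to the template value
            user[key] = copy.deepcopy(default)
            warnings.append(f"Missing setting '{location}', using template default")
        elif isinstance(default, dict) and isinstance(user[key], dict):
            # Both sides are sections: check the nested settings too
            warnings.extend(fill_missing_settings(user[key], default, location))
    return warnings


# Made-up values for illustration
template = {"discord": {"direct_messages": {"allow_chatting": True, "allowed_commands": []}}}
user_config = {"discord": {}}

for line in fill_missing_settings(user_config, template):
    print("WARNING:", line)
```

Settings the user already has are left alone, so an edited config keeps its values and only gains the new keys in memory.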
getting this error while using ExLlama as a loader (i've already posted a issue on github about it)
A new parameter has been added, these instructions will work the same ^-^ #1154970156108365944 message
ah, just downgrade or wait for ad-discord bot to update
Sure, but I'm also not sure if updating the bot will add the missing parameter to the settings file as it's meant to be editable.
That means it's probably in the gitignore file.
Ill have to look into that
adding the missing parameters to the config appears to have worked! thanks!
im playing https://huggingface.co/spaces/TTS-AGI/TTS-Arena
and im impressed with kokoro, gonna plug it to the bot
Oh nice!
I wonder what the license is like on that and if it supports cloning/finetuning
ahh personal only
... actually not sure, maybe it's just the demo
idk if it can be finetuned, but if the quality is good, rvc is your best friend
found this but i cant make it work https://github.com/h43lb1t0/KokoroTtsTexGernerationWebui
I closed this issue as completed because I added the parameters 🤓 just update
Ah sorry just noticed this is old comment, mb
Working in TGWUI?
Not even in tgwui
From the comments it is supposed to prepare everything at the first startup, but mine didn't, maybe there's something that I had to do but only programmers would know
I think there’s been a number of significant changes on TGWUI side recently… maybe try on a version as old as latest commit from extension
If it works, and can pinpoint commit that breaks it, that would be a good place to start fixing it
Tried
The extension does nothing
You may try it, I can't spot the bug and chatgpt doesn't help
Are you using the correct model v1.0?
I looked at the Issues, and the author had just closed one 5 days ago… idk I’d expect that the extension should be doing what it claims to be doing with an active dev
ah very cool
👍 wasnt my problem
i remember that the extension refers to itself as KokoroTtsTexGernerationWebui, and to be recognized as a tts extension it must have _tts at the end
I suspect that my ISP cut off my internet, it should have no limits but I downloaded the deepseek r1 to try if it works...
i was right, the extension refers as KokoroTtsTexGernerationWebui, not very hard to modify
nah it doesnt work, with the bot, not a big deal as i cant plug rvc to it
not really, haven't changed much in 3 months
as someone might guess, i tried to add rvc, but my skill is 😅
original
i quit coding 😓
sounds like mismatched framerates?
It looked like that tts generated 24kfps audio
Perhaps rvc is expecting 16k as other tts systems output?
could be, i stole those code from https://github.com/marcos33998/edge_tts so...
😋
i did something weird, idk what happened, but as what my teacher said, if it works dont touch it
gonna make it open source 👍
im ready to get roasted 👍
gonna find a way to plug it to the bot 👏
It should just work probably
i already tried, no audio
You need to make sure to put the correct extension name in the config file. And add the relevant parameters to your character file
alltalk's remote didnt work too
older tts is like text book examples, i can see, but these new tts, i dont see
i could take a look if the code isn't too long.
But depending on how you copied the code from edge_tts you shouldn't have a problem as the tts result is resampled? then saved as 44k which I imagine gets imported into rvc.
Right so you need to just see the example parameters in the minty character and replicate it in your own character with the parameters
that is older tts, new tts are just 🙈
Eh the params are hiding in the code somewhere
thats the problem
if you consider ~300lines as short do it https://github.com/marcos33998/KokoroTtsTexGernerationWebui_tts
it should work for everyone I believe
let me do a final test, as ive only checked that the preview works ._ .
it works, im not touching that
i didnt see that one XD
it is too late to fix, theres a lot of files that uses the path with the wrong name, i'm not touching that
Ok so I looked into it, and so far, all the other TTS extensions had added a string to the internal response in the format of:
'audio src="file/(.*?\.(wav|mp3))" - This is the regex that captures it
Looking into the code of this, it actually returns a string such as this example:
<audio controls><source src="file/path/to/audio/123456.wav" type="audio/mpeg"></audio>
As a quick test @valid crypt you could edit the bot file shared/utils_shared.py - find the audio_src = ... (in SharedRegex)
Replace that line with this:
audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3))"', flags=re.IGNORECASE)
I believe the bot would then be able to play back the TTS response
got it
in other words it is formatting using source src= instead of what seemed to be standardized... audio src=
In your fork, you could also instead try just changing this to audio src= and see how it behaves in TGWUI (via the UI) / the bot
I believe this is the actual correct answer... I think source src= is like the generic catch-all for extra file types that can be appended to the internal response.
Since my regex also requires the extension to be mp3 or wav, I should be able to safely make this change (drop the "audio") without falsely trying to potentially process other response types as audio
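Quick sanity check of that idea as a standalone sketch. The pattern is the one suggested above. The first sample string uses a made-up path in the older audio src= style, the second is the Kokoro-style example quoted above, and the third is a made-up non-audio attachment:

```python
import re

# Pattern suggested in the chat: only require src="file/....wav|mp3", no tag name in front
audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3))"', flags=re.IGNORECASE)

# Older style used by other TTS extensions (path is made up for illustration)
old_style = '<audio src="file/extensions/some_tts/outputs/hello.wav" controls></audio>'
# Style returned by the Kokoro extension, as quoted in the chat
new_style = '<audio controls><source src="file/path/to/audio/123456.wav" type="audio/mpeg"></audio>'
# A non-audio file should not be picked up (made-up example)
non_audio = '<img src="file/extensions/some_ext/outputs/picture.png">'

for snippet in (old_style, new_style, non_audio):
    match = audio_src.search(snippet)
    print(match.group(1) if match else None)
```

Both audio forms give back the file path, and the .png is skipped because the extension group only allows wav or mp3.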
i didnt work with my fork, maybe is because it has no default selected voice, all talk should work, wait for me
my code's problem
all talk works XD
👍
the drive that had been used as virtual ram went wrong, it should pretty new and high end
got surpassed by an old high end + chipset m.2 slot 😓
something is not right with mine
not my problem, his extension has problems 😠
help 😭
@halcyon quarry 🥹
Help do what? lol
the extension
no audio
😭
i changed the output modifier to
```python
import html
import os
import pathlib
import time

# run() and RVC_PARAMS are defined elsewhere in the extension
def output_modifier(string, state):
    # Unescape HTML entities and strip markdown characters before TTS
    string_for_tts = html.unescape(string).replace('*', '').replace('`', '')
    # Generate audio file
    msg_id = run(string_for_tts, rvc_params=RVC_PARAMS)
    # Build the path to the generated file inside the extension folder
    audio_path = pathlib.Path(__file__).parent / 'audio' / f'{msg_id}.wav'
    # Get relative path from webui working directory
    relative_path = os.path.relpath(audio_path, start=os.getcwd())
    # Convert to web-style path and add cache busting
    web_path = f"file/{relative_path.replace(os.sep, '/')}?v={int(time.time())}"
    # Add audio element with proper relative path
    return f'{string}<audio controls><source src="{web_path}" type="audio/mpeg"></audio>'
```
making it use a relative path and accessible from the local network but i still dont know why the bot doesnt work
Like I said the bot currently does not expect source src= it expects audio src=
on ur last line
Does it generate TTS, and save a local version of the output? And just fail to play it?
ok this seems to be the problem here
in bot.py search for audio_src - there are 2 instances
You'll see something like:
if 'audio src=' in vis_resp_chunk:
    audio_format_match = patterns.audio_src.search(vis_resp_chunk)
Try removing that first condition, and then nudge all the lines below it so they are indented correctly
def apply_extensions(chunk_text:str, was_streamed=True):
    vis_resp_chunk:str = extensions_module.apply_extensions('output', chunk_text, state=self.llm_payload['state'], is_chat=True)
    audio_format_match = patterns.audio_src.search(vis_resp_chunk)
    if audio_format_match:
        stream_replies.streamed_tts = was_streamed
        setattr(self.params, 'streamed_tts', was_streamed)
        self.tts_resp.append(audio_format_match.group(1))
🫡
Well you can ignore the one in speak_task()
but that would change to
audio_format_match = patterns.audio_src.search(vis_resp_chunk)
if audio_format_match:
    self.tts_resp.append(audio_format_match.group(1))
This should work, on the assumption that you also updated the thing in Shared Regex in utils_shared.py as I had said earlier
audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3))"', flags=re.IGNORECASE)
right this was some dumb oversight of mine
oki
An easy way to shift the indents is to highlight all the lines and press Ctrl+[
To nudge them to the right, Ctrl+]
check?
yep looks good
And make sure that the regex pattern is updated in utils_shared.py
Add this print statement
print("RESPONSE:", vis_resp_chunk)
When you use the bot, it will print the extra crap that the extension adds to the response -
then I'll ask ChatGPT why the regex pattern is not finding it
🫡
```
You're right on track with your dream city floating above the clouds - that's an amazing concept! Now, let's add some more features to make it even more incredible.
Here are a few ideas:
* A network of sky gardens and vertical farms to provide fresh produce for its inhabitants.
* An advanced transportation system using hyperloops or vacuum tubes to transport people quickly and efficiently throughout the city.
* A unique waste management system that converts trash into energy, water, and nutrients for the ecosystem.
Now it's your turn! What features would you add to this floating city?
(I'll wait patiently for your response)<audio controls><source src="file/extensions/KokoroTtsTexGernerationWebui_tts/audio/8b837f97-4ac1-421f-ab3c-7cae1ed10050.wav?v=1738705660" type="audio/mpeg"></audio>
```
?v=1738705660" - it has to do with this bit at the end I'm sure
idk what is that
Try with this regex
audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3)(\?[^"]*)?)"', flags=re.IGNORECASE)
actually
this is the one
audio_src = re.compile(r'src="file/([^"]+\.(wav|mp3))\b', flags=re.IGNORECASE)
This will ignore the extra query that appears after the file extension
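To see why the ?v=... bit matters, here is a small standalone check against the exact tag from the printed response. `strict` is the earlier pattern that wants the closing quote right after the extension, `relaxed` is the \b version. Both variable names are just labels for this sketch:

```python
import re

# Earlier pattern: the closing quote must come directly after .wav / .mp3
strict = re.compile(r'src="file/([^"]+\.(wav|mp3))"', flags=re.IGNORECASE)
# Final pattern: a word boundary after the extension, so a trailing ?v=... is tolerated
relaxed = re.compile(r'src="file/([^"]+\.(wav|mp3))\b', flags=re.IGNORECASE)

# The tag the extension appended, copied from the printed response
tag = ('<audio controls><source src="file/extensions/KokoroTtsTexGernerationWebui_tts/audio/'
       '8b837f97-4ac1-421f-ab3c-7cae1ed10050.wav?v=1738705660" type="audio/mpeg"></audio>')

print(strict.search(tag))            # no match because of the query string
print(relaxed.search(tag).group(1))  # path up to and including .wav
```

The captured group stops at the extension, so the bot gets a clean file path without the cache-busting query.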
wait, i dont know if i did something wrong, only the first audio is being played
not by that mean, literally only the first audio is being played
the first of all
ps: audios are generated
You mean like an old file that was generated in a previous session? The oldest file in the directories?
the extension doesnt delete its audio files, although only the first audio was played, next messages's tts are generated
as i'm having a lot of network problem with my laptop lately could be my fault
If you have TTS streaming feature enabled it could be due to it
Otherwise, maybe your network issue. Otherwise, you could probably further debug it with print statements at that apply_extensions()
worse, only that time and that audio worked
i think is the split of the extension
the extension itself has a split function as kokoro only supports 500 tokens
Xtts works the same. All talk splits the text into Individual sentences
after testing the bot with a lot of hello, it is scared of it XD
anyways, the tts works good
Yeah my bot isn’t happy when I write “test” over and over
Eyyy nice, tell me if you had to mess with anything beyond just updating the bot
With the changes you helped me debug
i didnt update, i only changed everything you told me, i have to officially update it now
May have to delete the files to fetch fresh.
Or if using the github desktop app, right click discard changes
that is something that i should start using
You should try to push your changes (RVC support?) to the main project
I could verify kokoro as a supported TTS extension
too complex for the little bro to merge, he has exams, and i have 5000 suspicious lines of code XD
rvc is 🔥
although only 300 are mine :)
Ah well
mine has _tts suffix and works too
I have to say, they are a good combination :) i'm proud of myself 😎
I'll look into adding support but the thing that's going to be a bummer is if there are actually no parameters
I was trying to figure out if they were hidden somewhere but couldn't find them
Couldn't find them via printing TGWUI code either - I think the extension independently manages its parameters
haha told you, the only parameters you can find are my rvc params hahaha
although idk if i used them correctly ._ .
the other TTS extensions set parameters to TGWUI's shared.args class
insane improvements
only 3 is real
kokoro is just no emotion
old ones, also the good thing about 4 is its insane speed of less than a seconds :v
Wow! That is very very good
Number 1 quite good
4 is pretty good indeed given you say it processes fast AF
2: 3.5x speed (with rvc)
1: 11x speed
4: 40x (lol)
5: 3.5x
a text that should be 24s long and divide by the average of 5 tries
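In other words, the speed figure is the audio length divided by the mean generation time. A tiny sketch of that arithmetic, where the timings are made up for illustration and are not the measurements behind the numbers above:

```python
from statistics import mean


def speed_factor(audio_seconds, generation_times):
    """How many times faster than real time the TTS ran, averaged over several tries."""
    return audio_seconds / mean(generation_times)


# Made-up timings in seconds for five tries on a clip that plays for 24 s
example_times = [2.0, 2.5, 1.5, 2.0, 2.0]
print(speed_factor(24.0, example_times))
```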
4 is very impressive except at the end sounds like "young one" instead of pronouncing it correctly "woman"
So what, you used kokoro for all these?
Alltalk ftw
You’re making me jump through all hoops to make other extensions work but just need alltalk 🤗
the misconception is that alltalk wasnt that good, I only asked for edge tts and vits, and these days I asked for kokoro and alltalkv2, but by fixing kokoro, now alltalkv2 works 😏
Pushed some changes regarding /image cmd
- The user's prompt would be part of the embed. Now, it is sent as normal text along with the embed so it can actually be copied when using discord on mobile (can't copy embed text on mobile).
- Added another selection for the use_llm option: automatically prefix the prompt with "Provide an image prompt description (without any preamble or additional text) for:"
I didn't feel like over-complicating this new option, if that's what you're inquiring about.
Can optionally just write the full prompt without a preconfigured prefix 😛
Or use the tags system to prefix prompts
i want to steal some stt/asr code and plug it with some black magic
Personally, this quick prefix stops the LLM from beginning the reply with "Sure! Here's an image prompt:"
Although, this just made me realize it would be a great idea to add a "Generate image" option to the /prompt command, even if redundant to some degree
i was asking to make my life easier, i want to add stt but maybe i can use another program and with a little bit of inspiration make the bot think that it was a message from a user and 🥳
i mean 7000 lines is not very friendly, maybe just the name of those modules?
like i never thought that to fix tts i have to touch shared utils :O
There's a number of ways users can input... now the main listener is def on_message()
It determines if the bot should reply or not.
If so, it creates a task and queues it.
The TaskManager class processes the tasks
For user message type tasks, it will run one of these code blocks
There's modules that simplify/streamline a lot of the code used in these main blocks
For instance lots of tags related code is in the tags.py module
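The flow described above (listener decides, task gets queued, a manager processes it) can be sketched roughly like this. It is a heavily simplified illustration and not the bot's real code: `Task`, this `TaskManager`, and `should_reply` are hypothetical stand-ins, and the parentheses rule is only borrowed from the ignore_parentheses setting as an example filter:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    text: str


class TaskManager:
    """Hypothetical, simplified stand-in for the bot's TaskManager."""

    def __init__(self):
        self.queue = asyncio.Queue()
        self.handled = []

    async def run(self):
        # Process queued tasks one at a time until a None sentinel arrives
        while True:
            task = await self.queue.get()
            if task is None:
                break
            self.handled.append(f"{task.name}: {task.text}")


def should_reply(text):
    # Example rule only: skip messages written entirely in parentheses
    return not (text.startswith("(") and text.endswith(")"))


async def on_message(manager, text):
    # The listener only decides and queues; the manager does the actual work
    if should_reply(text):
        await manager.queue.put(Task("message", text))


async def main():
    manager = TaskManager()
    worker = asyncio.create_task(manager.run())
    await on_message(manager, "hello bot")
    await on_message(manager, "(ignore me)")
    await manager.queue.put(None)  # tell the worker to stop
    await worker
    return manager.handled


handled = asyncio.run(main())
print(handled)
```

For STT, a transcribed voice message could in principle be queued the same way as a typed one, which is why the listener is the natural place to start looking.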
not touching that very soon
I comment so much stuff because I'll completely forget why the heck I do anything without it lol
Want a list of codes to look at if you seriously want to try helping add STT?
that one is likely a slippery one...
dont expect fancy results from me
Welcome to hear any proposal on, what your thoughts were on actually handling it.
like, a TGWUI extension?
Native discord functions?
i was thinking at tgwui but im not sure if it is going to work
as whisper stt is not working anymore ._ .
if extension can directly do inputs and its compatible with the bot, i could think that way
Well since the bot is designed to run on its own TGWUI instance and not via API, I'm not sure exactly how the extension could be beneficial...
What does it do that discord voice input cannot?
right now what im thinking is just make it work, and leave all the problem for a further future
There likely just needs to be some research on how voice input from voice channel can be captured appropriately to text
A listener function that uses discord code
make a separated program, a second bot just for audio input, steal some code, make fake inputs to the main bot
I'd likely just need to add some new "Task" or parameter to existing task, that will ensure no text response from bot, only play response on VC
I've been engrossed in this game lately, the ladder season is almost over and I'm definitely sitting out the next one
will be back in the saddle
:v
why didnt you add you bot to https://github.com/oobabooga/text-generation-webui-extensions
ohhhh
Well it's not technically an extension
I could ask ooba if he thinks it could be considered an exception... (disclaimer about what it actually is, etc)
your is not too far away
I'll see!
although i never understood why your bot cant use the webui or the api
If you enjoy character specific TTS settings (voices, etc), and TTS streaming - these are not possible via the API
Well, the TTS streaming may be possible... really not sure about that.
But definitely cannot adjust extension parameters via API.
Pushed small update - added new option to '/prompt' cmd
- Can now force the response type (text / image / text+image)
@valid crypt I submitted a PR to the extension list to add the bot, thanks for the suggestion!
I meant openai api
I know - openai API is the TGWUI API
I meant let others use the api while bot is running
It may be possible to run 2 separate instances of TGWUI, if using custom flags with the bot such as unique port, etc
I know what you mean is like, an option to run all the UI related code as well instead of how the bot currently executes the backend code on startup
I have an (outdated) dev version of the bot which successfully uses the openai API - TGWUI launches normally and can be used in the UI simultaneously, etc
but this version of the bot does not launch TGWUI
and also complicates the settings management, and also makes some features impossible like the TTS voices
f
👍
Pushed another update for /prompt cmd
- Yet another option, load_history to specify how much history to load for the interaction
- The /prompt cmd can now be explicitly disabled from use in DMs via config.yaml
What I need to add is for the bot to reply to show the user message
OK - Now the bot will immediately send an embed reflecting the user's prompt and params used for /prompt cmd
(don't think system message is applicable for this model/mode)
I had an idea for a new tag which could be pretty useful… “run_code” which would be a filename, and a companion tag “send_code_result” which would be a format to listen for and send (text, audio, video, etc)
Would be a bit advanced for some but would add a lot of flexibility to what the bot could actually do
like a user could define a tag that triggers for some phrase, and maybe I make some syntax that can optionally pass values into whatever code is being executed
like multiply >>678<< and >>2000<< and the tag will run a code that multiplies 2 values. Crude example.
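A crude sketch of how that >>value<< syntax might be parsed and handed to some code. Nothing here exists in the bot yet: the tag idea is only a proposal, and `TAGGED_CODE`, `extract_values`, and `run_tagged_code` are hypothetical names made up for this example:

```python
import re

# Values the user wants to pass along are wrapped like >>678<<
VALUE_MARKER = re.compile(r">>(.*?)<<")


def extract_values(text):
    """Return every value wrapped in >> << markers, in order of appearance."""
    return VALUE_MARKER.findall(text)


def multiply(a, b):
    return float(a) * float(b)


# Hypothetical registry: trigger phrase -> code to run
TAGGED_CODE = {"multiply": multiply}


def run_tagged_code(message):
    """Run the first registered code whose trigger phrase appears in the message."""
    for trigger, func in TAGGED_CODE.items():
        if trigger in message.lower():
            return func(*extract_values(message))
    return None


result = run_tagged_code("multiply >>678<< and >>2000<<")
print(result)
```

A real version would load the code from the filename given in run_code and format whatever comes back according to send_code_result, but the parsing step would look much the same.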
ahhh
But it could be whatever code, could be something that generates and returns a video for instance
It would just add another advanced tool for users to think about using
some model are trained to be able to use tools
the next step is make it an agent huh
is it possible to specify which gpu the bot uses? i have a dual gpu setup and my main gpu does not have enough vram to load my models.
You should be able to set it in the webui in the models tab where you load your models.
Try saving settings there for how you want that model to be loaded.
You could also try the cmd_flags in ad_discordbot (was that a thing?) or just the normal cmd_flags to specify the gpu split
noted will try ty
There is a CMD Flags file for both TGWUI, and one with the bot. Should be able to set the flag for this with the bots cmd flags file
what cmd flags should i use? is there somewhere i can find a list of them?
On the TGWUI repo there is an expanding text labeled List of command-line flags
So Ctrl+F to jump to that and open it up
@fickle ember welcome to the channel btw, let me know if you have any feedback on the bot 🤗
ive been using the bot for some time now. I have some feedback.
- I notice that as conversations stretch the AI more or less loses its personality and forgets details about itself which are specified in the character.yaml files
- The bot is unable to identify what youre talking about when you reply to someone and talk to the bot. i reviewed the console and it only seems to read the message outright not taking the message being replied to into account, in some cases this information would be vital.
Thanks! In regards to 1. this is not exclusive to the bot; this will happen in the webui as well. You can try using a system message, or maybe even limit chat history
can you explain what a system message is and how i can set one up? i appreciate you getting back to me so quickly.
About 2.- if you are suggesting that the bot might get an automatic prefix to the message like “(user X is replying to user Y’s original message which was ‘blah blah blah’)” then this could be an interesting idea, assuming I can get the message content from replies (would have to check into this)
i think if you could figure out a way to do this, it would make the bots conversation abilities way better and flow more naturally like another user.
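A rough sketch of what that automatic prefix could look like as a plain function. `add_reply_context` and its parameters are hypothetical. In discord.py the replied-to message is normally reachable through `message.reference`, but whether its content is reliably available would still need checking, as said above:

```python
from typing import Optional


def add_reply_context(author: str, content: str,
                      replied_author: Optional[str] = None,
                      replied_content: Optional[str] = None,
                      max_chars: int = 200) -> str:
    """Prefix a message with a note about the message it replies to, if any."""
    if replied_author is None or replied_content is None:
        return content  # not a reply, leave the text untouched
    quoted = replied_content[:max_chars]  # keep long originals from eating the context
    return f"({author} is replying to {replied_author}'s message: '{quoted}') {content}"


print(add_reply_context("user X", "what do you think about this?",
                        replied_author="user Y",
                        replied_content="pineapple belongs on pizza"))
```

Truncating the quoted text is a design choice for this sketch so that one long original message does not crowd out the rest of the chat history.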
System message is only applicable if your model’s template supports it and you’re in that mode (i.e., chat template or instruct template). The TGWUI code chooses the most appropriate template automatically so you’ll have to look into it a bit to see what template is loading for the model
Chat instruct mode might also help… i believe this prefixes your prompts with an instruction
appreciated. i will try this.
I’ll def look into that idea, hadn’t considered that before. I do have some other things on the backburner to make it behave more natural as well
how long do you think it will take to get some of these ideas out? my friends and i are really enjoying the bot
A few features I want to add at once all under a “server mode” setting
Honestly, within a month or two. Been engrossed in this game that has a ladder season which doesn’t have a fixed date but ending relatively soon
Ttyl though goin to be now
Bed*
for problem 1 i think it could be either too much context or cutting context, the model that you are using also affects the quality, i would like to know about the thing in the image
for image generation I retain zero chat history for this reason - chat history will incrementally make the responses get worse and worse from desired result
finally got some time to do stt, gonna start with the ez way, another bot only for stt, right now gonna go with whisper although these are interesting too.
(~~https://github.com/modelscope/FunASR~~)
https://github.com/FunAudioLLM/SenseVoice
https://github.com/k2-fsa/sherpa-onnx
it's a really good way, being really simple, and should be really effective as users can create a private text channels and let bots chat there, while in voice channel feels like totally nothing weird :v
not the most elegant way but yeah
Two bots then?
some day when i get better i might be able to fuse them
another benefit of this is the environment? and i can use another machine for it :)
Very cool
So is it working?