#ad_discordbot (Fork of Fork of xNul's bot)
I'll think about trying to monkey patch it
I downloaded mcmonkey's SwarmUI yesterday - downloaded the smallest Flux model (Schnell) and generated an image using the default bare bones settings (1024 x 1024, a mere 20 steps, Euler sampler).
I'm using a 4070ti (12GB Vram) - Goddamned picture took 2 minutes, and was shit because 20 steps isn't enough.
Flux is a huge milestone for open source but only if you've got a freakin supercomputer
Although I understand most LLM models are also restricted to supercomputers
i've seen people mention 2bit quants for that on reddit, perhaps you can get even smaller with the Schnell version at 2 bits?
but the quality will probably be as bad as you expect from 2 bits
i didn't mention it, but https://github.com/Artrajz/vits-simple-api has 4 different TTS engines in 1
I think by the time "good" quantized versions of Flux are available to the little guys, SD3.1 could be around by then and likely will be better by comparison (probably will not hold a candle to the normal Flux models)
my friend GPT said that i can modify the code of this extension, https://github.com/Arondight/vits_api_tts, and use a different TTS easily
i haven't used SD3.0 yet, but i remember that it was pretty similar to Flux
¯\_(ツ)_/¯
3.0 is very good at some things, but the fools intentionally poisoned the model and as a result it has a lot of issues
?
They recently made an announcement that they fucked up and are going to release a model that isn't borked
Have you not seen the results of "woman laying in the grass"?
It's like some kind of eldritch horror
?
nope
show me
isn't that something to do with the prompt?
on civit i see normal results
i like the bottom one more
and XL is the top one
https://stability.ai/news/license-update
Improving Model Quality
Before we released SD3 Medium, our initial testing indicated that it was, in most cases, a much better base model compared to SDXL, in terms of prompt adherence, diversity, detail, and overall quality. However, the community quickly identified some critical quality issues mainly related to body poses and words that were too rarely seen in the training set. To address these concerns, we have focused on two key areas:
Continuous Improvement: SD3 Medium is still a work in progress. We aim to release a much improved version in the coming week
now i understand
Can't find it, but there were also some other threads where people compared identical prompts / settings shared by Stability AI and could not reproduce anything close to their results, even when using their API for the larger models
When SD3 was released these posts were endless
not yet but if they're actually going to do it, shouldn't be too much longer
I had seen an update shared recently which acknowledges that they are still working on it
I run flux.dev on ComfyUI currently with my 3080 10GB at 1024x1024: 100 seconds for 20 steps using fp8_e5m2 with the t5xxl_fp16 clip, and I have gotten really amazing results. I haven't bothered to try Schnell yet though, and anecdotally the dev version's generations don't look right below 12 steps.
I can give SwarmUI a try later this week too. I'm not sure if there's an API for ComfyUI or SwarmUI since I'm still just testing the model, but my users are definitely interested, even if it just runs 20 steps.
My SDXL I always had set up for them at 70 steps, 1024x1024 by default, which I think is just a tad faster.
Very different from A1111 or Forge UI...
According to McMonkey, there is an API - at least for their version https://github.com/mcmonkeyprojects/SwarmUI
On the bot to-do list is support for this
hmm... Tags based on "role" seem to require an additional privileged intent ('members')
Seems to be more trouble than it's worth
May need to require that intent at some point for something more important...
got the guild IDs and channel ID tags working though
indeed, the hack I had in mind wouldn't work after all, because it would generate TTS for the entire message not just the continued response
The whole:
"This
This is
This is a
This is a message!"
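That "This / This is / This is a" pattern is what you see when each streamed chunk carries the whole response so far rather than only the new text. A minimal sketch, assuming the stream yields cumulative snapshots (deltas_from_snapshots is a hypothetical helper, not part of the bot), that recovers just the newly added piece of each chunk:

```python
from typing import Iterable, Iterator


def deltas_from_snapshots(snapshots: Iterable[str]) -> Iterator[str]:
    """Turn cumulative snapshots ("This", "This is", ...) into only the newly added text."""
    previous = ""
    for snapshot in snapshots:
        if snapshot.startswith(previous):
            new_text = snapshot[len(previous):]
        else:
            # The snapshot diverged from what we had (e.g. text was rewritten): emit it whole.
            new_text = snapshot
        if new_text:
            yield new_text
        previous = snapshot


chunks = ["This", "This is", "This is a", "This is a message!"]
print(list(deltas_from_snapshots(chunks)))  # ['This', ' is', ' a', ' message!']
```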
Issue?
stream response not working properly
you can try to pull latest version
or maybe he pushed the wrong one ._ .
Is that a thing now? Sending some TTS server the /chat/completions streaming message instead for streaming voice? Sounds like a boss move for some TTS project.
no
is only for text
for now
he is trying to make it work for tts
he could do it for one in particular, but he wants a more universal fix
¯\_(ツ)_/¯
I have some ideas about how to handle the chunks that come through /chat/completions, but I'm not sure how something like AllTalk TTS handles them, or if it'd even be fast enough..
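One common way to feed streamed text chunks to a TTS engine is to buffer them and only hand over complete sentences. This is a sketch of that general idea, not how AllTalk or the bot actually does it; sentences_for_tts is a hypothetical name, and "a sentence ends at . ! or ? followed by whitespace" is a simplifying assumption:

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_for_tts(deltas: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text deltas and yield complete sentences for a TTS engine."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may still be growing.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


stream = ["Hello the", "re. How are", " you? Fine"]
print(list(sentences_for_tts(stream)))  # ['Hello there.', 'How are you?', 'Fine']
```

Each yielded sentence could be sent to TTS as soon as it is complete, which trades a short delay for not having to wait on the entire response.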
just tell him your suggestions, these are his words
More than likely he'll have a working solution first x.x It's part of my list of speech features I need to fix back up for my example lightweight website.
- fix my basic STT-to-TTS
- Hands-free 'Alexa'-like mode for transcribing a keyword to prompt a speech request
- Streaming speech with an abort feature.
you may have missed it, but the bot can now send message chunks
What I was thinking of yesterday, was trying to stop the generation every time it would split - which would result in it only generating so much TTS - then updating the payload with that response and use "Continue"
But after more consideration, realized it would regenerate the entire TTS response
I just finished my testing bot for the website under Hermes-2-Theta-Llama-3-8B-32k.i1-Q5_K_M.gguf, to continue working on it while the partial version is live. I'll have more of a look at the TTS streaming. Supposedly, per the AllTalk notes, you can't stop the streaming mid-stream?
This bot doesn't use the TGWUI API, nor any particular TTS API
But I see what you mean combining chunks together to make less of a constant I think?
It lets TGWUI's extension handling invoke the TTS stuff, which can't be chunked
The text responses can (and are) chunked - but the TTS comes all as one response
So the viable options to enable that are to use a dedicated API, which limits the current flexibility of clients.
Or I need to possibly monkey patch something in TGWUI to make it return partial TTS responses
@calm rain I've embraced the footer as you use in your bot 🤓
(although I know you include the image within the embed)
Any info on this? I got my 2nd discord bot running and he's hallucinating a lot and came up with some 'persona' google doc links.
Overhauled a lot of stopping strings. Almost no issues now x.x until he finds a new phrase to abuse
@terse folio quick FYI, in case the gears start turning again for this project, send_long_message() now returns a list of sent msg IDs and the last message object. It no longer handles assigning Bot HMessage IDs, which is now handled outside the function.
got a few time consuming things to do today, but could take a look later.
My first plan was that send_long_message would create an hmessage for each submessage and assign the ID within that function.
Returning the hmessage for the last item (which also could have related messages connected to the submessages)
this could be the only tts we will need
it literally has every tts
I prefer it this way, I think it's more flexible. For instance, when I was driving in to work today I had the idea that I can now use the returned message IDs and store them in the database, when using the Post Active Settings feature - which will allow splitting the settings by category, and replacing only the setting categories that have been modified
I'd rather not have send_long_message() returning an HMessage, the IDs list, and a discord Message, but just the latter 2
It literally does not have XTTS
the message ids would be accessible through the hmessage,
also interesting idea ^^
maybe you could edit the existing ids in the settings channel
Yes I was sending a dummy HMessage to collect them, but I don't like it 😄
what do you mean sending?
Nah, just going to delete and replace the messages. Don't want to spend too much time figuring out how to programmatically detect when the new message response will exceed the permitted text per message, and decide to edit or send for multiple message blocks
Better terminology would be "passing a dummy HMessage"
ahh, okay
dummy_hmsg = local_history.new_message(save=False)
dummy_hmsg = await send_long_message(text, whatever, dummy_hmsg)
then collect its ID and related IDs... nahhh 😛
oh why do you need to pass a dummy hmessage into the send_long_message function?
to get the channel id?
For occasions where I need the sent message IDs but it is not sending messages that I want in history whatsoever
But like I said, I changed it up now - I just get the IDs back from the function and assign them to the HMessage after
I see i see
id = sent_msg_ids.pop(-1)
related_ids = sent_msg_ids
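The ID assignment from the two-line snippet above could be wrapped roughly like this. A minimal sketch: HMessage here is a simplified stand-in with only the ID fields (the real class has more), and assign_sent_ids is a hypothetical helper. It copies the list before pop(-1) so the caller's sent_msg_ids is not mutated:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HMessage:
    """Simplified stand-in for the bot's history message (ID fields only)."""
    text: str
    id: Optional[int] = None
    related_ids: List[int] = field(default_factory=list)


def assign_sent_ids(hmsg: HMessage, sent_msg_ids: List[int]) -> HMessage:
    """The last sent chunk becomes the primary ID; earlier chunks become related IDs."""
    if not sent_msg_ids:
        return hmsg  # nothing was sent, leave the history entry untouched
    ids = list(sent_msg_ids)  # copy so pop() does not mutate the caller's list
    hmsg.id = ids.pop(-1)
    hmsg.related_ids = ids
    return hmsg


reply = assign_sent_ids(HMessage("long reply"), [111, 112, 113])
print(reply.id, reply.related_ids)  # 113 [111, 112]
```

Keeping this outside send_long_message() matches the design above: the function just reports what it sent, and the caller decides whether those IDs belong to a history entry, a settings index, or nothing at all.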
One thing I was struggling with yesterday and gave up - still thinking for a solution...
Is trying to run the bot with privileged intents, then falling back to "False" if an exception occurs.
Like, I don't want to screw up someone's bot because I want to have a for_roles_only tag - which needs members intent
The intents are set while creating the client (bot) object at the beginning of the script, then all the commands are set, yadda yadda, and finally the runner is where it may error due to intents
I can catch the correct exception but don't think I can simply update an intent without recreating the client object again
I don't think you can update intents, i'm pretty sure that's part of the login payload
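Since intents are part of the login, the only fallback is to throw the client away and build a new one with fewer intents, re-registering commands on it. A sketch of that try/rebuild pattern, with loud assumptions: PrivilegedIntentsRequired and FakeClient are local stand-ins (discord.py's real exception is discord.errors.PrivilegedIntentsRequired), intents are modeled as a plain dict, and start_with_fallback is a hypothetical name:

```python
import asyncio
from typing import Any, Callable, Dict


class PrivilegedIntentsRequired(Exception):
    """Stand-in for discord.py's discord.errors.PrivilegedIntentsRequired."""


class FakeClient:
    """Stand-in client: refuses to log in if the 'members' intent is requested."""

    def __init__(self, intents: Dict[str, bool]):
        self.intents = intents
        self.closed = False

    async def start(self) -> None:
        if self.intents.get("members"):
            raise PrivilegedIntentsRequired("members intent not enabled in the developer panel")

    async def close(self) -> None:
        self.closed = True


async def start_with_fallback(build_client: Callable[[Dict[str, bool]], Any],
                              preferred: Dict[str, bool]) -> Dict[str, bool]:
    """Try the preferred intents; if login is refused, rebuild the client without 'members'."""
    client = build_client(preferred)
    try:
        await client.start()
        return preferred
    except PrivilegedIntentsRequired:
        await client.close()  # intents are fixed at login, so this client is discarded
    reduced = {**preferred, "members": False}
    # build_client must re-register every command on the new client object.
    client = build_client(reduced)
    await client.start()
    return reduced


used = asyncio.run(start_with_fallback(FakeClient, {"message_content": True, "members": True}))
print(used)  # {'message_content': True, 'members': False}
```

With a real client, start() runs until the bot shuts down, so in practice the useful part is the except branch; the factory function is what makes "recreate the client" cheap, because all command registration lives inside it.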
I would create a little wiki for users on how to set intents for their bot in the developer panel
it's just a few clicks!
There is a privileged intent I've been setting for a long time, message_content, which is required, and I had no exception / custom log error message for it. Now I do, but yes, I will shortly be updating my install steps to explicitly explain setting it
The members one, if I can't find an elegant solution to try / fall back, I'm just going to forget that for now since it's such a minor feature
yea, that's kinda required for most interactions, especially for chatbots.
You can enable all intents for the bot without needing to verify it while it is under 75 guilds.
I do use the default intents, which set True for all non-privileged intents (there are only 3 that need explicit permission)
I understand, you can do it per-guild, or globally
no no, I mean the privileged intents on the developer panel,
you can access them all iirc while your bot is small
and there isn't really a reason not to enable the basics like message content/members/roles...
most discord bots will use this information in some way.
like moderation bots will need member/role info as well as message content for moderation
yes, then discord wants you to verify with them else the bot can't join new guilds
Been overhauling the Post active settings feature all morning, to not suck
Adding a command to set a settings channel per server (like voice channels)
It will create a list of message IDs for each settings category, and index it for the channel ID
When updating, will iterate over the stored IDs and fetch/delete the messages, then replace the list after sending the new settings
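The update flow just described (delete the stored messages for a category, post the new text, replace the stored ID list) can be sketched like this. Everything is hypothetical: the index layout, the function name, and the delete/send callbacks, which stand in for the bot's Discord calls (a real delete may also need to tolerate messages that were already removed by hand):

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical index layout: {channel_id: {category: [message_ids]}}
SettingsIndex = Dict[int, Dict[str, List[int]]]
DeleteFn = Callable[[int, int], Awaitable[None]]     # (channel_id, message_id)
SendFn = Callable[[int, str], Awaitable[List[int]]]  # (channel_id, text) -> sent IDs


async def repost_category(index: SettingsIndex, channel_id: int, category: str,
                          text: str, delete_msg: DeleteFn, send_chunks: SendFn) -> List[int]:
    """Delete the previously posted messages for one category, post the new text, store the new IDs."""
    categories = index.setdefault(channel_id, {})
    for msg_id in categories.get(category, []):
        await delete_msg(channel_id, msg_id)
    new_ids = await send_chunks(channel_id, text)
    categories[category] = new_ids  # replace, not append, so stale IDs never pile up
    return new_ids


async def _demo() -> None:
    index: SettingsIndex = {5: {"imgmodel": [1, 2]}}
    deleted: List[int] = []

    async def fake_delete(channel_id: int, msg_id: int) -> None:
        deleted.append(msg_id)

    async def fake_send(channel_id: int, text: str) -> List[int]:
        return [100, 101]

    await repost_category(index, 5, "imgmodel", "new imgmodel settings", fake_delete, fake_send)
    print(deleted, index)  # [1, 2] {5: {'imgmodel': [100, 101]}}


asyncio.run(_demo())
```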
Made a lot of progress but still needs some debugging
alright fixed one that wasn't working, and amazing results
but not every model has the same error
The Post Active Settings feature will actually be quite nice, since it will post a header for each setting.
And config.yaml will now allow the feature to be customized to only include certain settings.
Will make seeing TAGS much easier: planning to extract them from all sources and collect them as one settings block, under subheadings like Character Tags and Imgmodel Tags etc.
It's working O_O
When using the command - if changing the server's settings channel from one to another, it will delete all the settings from the previous channel, then post all settings to the new channel.
If it's the first channel set, it just posts all the settings
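The channel-change behavior above (wipe everything from the previous settings channel, then post every category to the new one, or just post when no channel was set before) could look like this. A sketch only: the index layout, set_settings_channel, and the delete/send callbacks are hypothetical stand-ins for the bot's own code:

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

SettingsIndex = Dict[int, Dict[str, List[int]]]  # hypothetical: {channel_id: {category: [message_ids]}}
DeleteFn = Callable[[int, int], Awaitable[None]]
SendFn = Callable[[int, str], Awaitable[List[int]]]


async def set_settings_channel(index: SettingsIndex, old_channel: Optional[int], new_channel: int,
                               settings: Dict[str, str], delete_msg: DeleteFn,
                               send_chunks: SendFn) -> None:
    """Remove everything posted in the old channel (if one was set), then post every category."""
    if old_channel is not None:
        for ids in index.pop(old_channel, {}).values():
            for msg_id in ids:
                await delete_msg(old_channel, msg_id)
    posted = index.setdefault(new_channel, {})
    for category, text in settings.items():
        posted[category] = await send_chunks(new_channel, text)


async def _demo() -> None:
    index: SettingsIndex = {1: {"imgmodel": [10, 11]}}
    deleted = []
    next_id = iter(range(200, 300))

    async def fake_delete(channel_id: int, msg_id: int) -> None:
        deleted.append((channel_id, msg_id))

    async def fake_send(channel_id: int, text: str) -> List[int]:
        return [next(next_id)]

    await set_settings_channel(index, 1, 2, {"imgmodel": "...", "tags": "..."}, fake_delete, fake_send)
    print(deleted, index)  # [(1, 10), (1, 11)] {2: {'imgmodel': [200], 'tags': [201]}}


asyncio.run(_demo())
```

Re-running the command with the same channel also works: the old messages are deleted and a fresh set is posted, so nothing is duplicated.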
It's done - just need to update the code to use it more often (currently only triggering from the new slash command, or when changing imgmodels)
@terse folio Wondering if you have a solution for this...
my new /set_server_settings_channel command is working like the voice channel one - it has a prepopulated list of channels to choose from.
Big difference here though... it seems to be limited to 10 channels. D'ya know offhand a good solution for that?
this is on dev branch if you wanna check it out
if you're using the text channel typehint
you can start typing the channel name and it will narrow down your search results, then you click the one you want or presumably it can auto complete with tab
Also add a wiki to the discord docs about slash commands.
It may be beneficial to users who are completely new.
Not to toot my own horn but I think I have some pretty slick logic going on in this post active settings feature
Actually... yes to toot my own horn 😛
🚃
This feature also showcases how good the send_long_message() function is at chunking code blocks
which I had spent a lot of time on back in the day
Also patching a pretty big bug with 'history reactions' feature
it's not reacting to the last message if sent in msg chunks. It also errors if sending a single message.
Resolved.
Pushing to Main
Pushed to main - start posting those settings XD
This feature is triggered to run in the background too, so it does not interfere with main tasks
Just took the time to revamp this
Wiki is coming along
Looks good ^^
a video or something will definitely bring more users
i literally found this project by joining this server and asking in general
i suggest changing this a little
it explains the project very well, but it's not good at attracting people
hmmmmm
not good for being found on Google
i think the majority that uses tgwui is because they want it to be local
so i suggest adding the keyword "local" :D
this might surpass 2 search results
or maybe a post in oobabooga reddit
adding (oobabooga) could be good
Thanks for the nudge - I just updated the main description to:
Discord bot which transforms your servers into hubs for limitless local AI-driven interaction and content creation. Features cutting-edge tools for professionals, and unlocks creative fun for casual users. Integrates text-generation-webui and Stable Diffusion Web UIs.
most of the models are not very good because they are mostly trained with https://github.com/Plachtaa/VITS-fast-fine-tuning/tree/main at a sampling rate of 22050 instead of 44100 or 48000
or it is just im bad at searching ._ . idk
someday my bot will have a perfect voice...
That new Forge update with NF4 quant for Flux.Dev is interesting news.
Might make the speed for my discord bot Ecne's generations decent.
Just tested on my RTX 3080 10GB card:
normal simple prompt
1024x1024 20 step
7641MiB / 10240MiB
fp8 = 4.88 s/it
NF4 = 1.4 s/it
100sec versus 31sec generations. Very good. Will test more later.
Indeed, for me I went from 2 mins to 40 seconds
Unfortunately forge API is in shambles for now, can’t really reap the benefits via the bot just yet
That duck looks an awful lot like me 👀
joking about the blank screen being a reflection
You guys update anytime recently?
Any feedback on new features like the streaming replies, etc?
I have updated recently, but I haven't done that. I have used start_reply_with to good success, though.
If you have any thoughts on something that should obviously be added to the new /prompt command, I'm all ears
I never use that, actually. I find it easier to just type everything out without using the slash command.
yeaaaaah, since you're familiar with the instant tags syntax
before I forget again, I'm going to add to the TODO list a command to create Tags
And yes, like save locally (reusable)
To an entirely separate json?
yes
I'll be sure to have a config option for the feature
I was thinking that I could make these tags available only to the user who created them
How difficult would it be to limit tag creation to certain roles?
Well if its a slash command, then you can just adjust command permissions in Server Settings > Integrations > Your bot
So very easy
Ever have any issues with the instant tags syntax?
Oh I see. Yeah, so ideally the tag would be useable by everyone (at least for my purposes), but limiting the slash command behind a role would be nice
Nope
sweet
Maybe I could have some mechanism for admin to click a button to approve it for global use?
I could just envision one person setting up all sorts of tags that drive everyone else crazy lol
eh, maybe just config option again... to set whether they are for the user who created it only, or works globally
I don't know what kind of response you'll get with expert terminology
One more option for the command
@keen palm Something you may appreciate... I noticed that the param begin_reply_with was omitting the text it was continuing from.
I've fixed that. Now, it only omits that text from the reply when using the continue context command
Pushed updates to /prompt command
Anyone here use Docker for anything? There's someone asking me if it would ever be possible to run this bot with Docker... from my quick research looks a lot like "no"
Docker is just a tool for running containers (virtual machines?) basically
there's a docker image for tgwui somewhere.
it just needs to be updated to include pulling from your bot's repo as well I assume.
I'm not really sure how docker images are created.
if it's like a script of commands to create the vm, or if you package up the frozen snapshot of that vm...
it should be possible
Tomorrow, I'm going to attempt to clean up the internal settings management, to subsequently achieve per-guild settings and perhaps even characters
only caveat with characters would be avatars
One problem I need to solve… some command options are based on settings… so would need to be able to register guild specific commands… 🤔
Nvm this is a non-issue.
Making good progress on per-guild settings
Hmm. I can't get [[begin_reply_with:]] to actually output anything past the initial phrase
This is after updating?
I'm reviewing this closely... may have screwed up the logic a little bit
yeah I can see what I fudged up...
testing...
What's the issue?
I overlooked 2 lines of my existing code, when adding code to tweak the behavior of "continue" (which is what this tag uses)
the original code and new code kind of negated each other 😛
seems good now
Btw I'm very excited about the per server settings... I think I did a very solid job on reworking the core settings management
Again- wouldn't be possible without Reality's contributions - the database code is stellar.
@keen palm The fix is pushed to main
I think per server characters will also be a thing - there's a few more hurdles to jump for that one
the setting would require using a shared avatar though
Which is only really a problem when people want different pictures of their AI waifu bots
The send_user_image tag could help but adds a lot of line breaks
Hmm, it still isn't actually continuing even after updating
My bot is dumb 😦
Apparently I gave it too much to start with and thought it should be the end
Weird. I'll have to tinker with it some more, but I got it to work that time.
Hmm I’ll test with longer inputs
Oh yeah, it’s certainly possible that the LLM just didn’t add anything
This model seems to not like continuing in general
You can always use Edit History cmd, add something open ended like “Then, “ then Continue cmd
I did one attempt at begin_reply_with ending with "that " and it still didn't continue
Actually the one with "that " worked but the one with "against " didn't
you could double-check the internal history log to ensure it isn’t just the sent discord message being sent short
Or, the Edit History command will show the logged text by default
Yeah it's not cut short. It's just the model being weird.
Regenerating the prompt with "that " showed that sometimes it continues and sometimes it doesn't.
I might even have per server characters working...
unless there's some catastrophic detail I'm overlooking, this settings update is going very well
I think if Stable Diffusion WebUI setting enabled for "keep multiple models in VRAM", this may even allow per-server imgmodel handling
Ain't nobody got that much VRAM
Yay, coded everything in... idk, 12 hours? Debugging time
Reality is it a big deal if I willy nilly import discord in modules?
Or does the entire bot script as a whole, only import it once regardless of import statements in modules?
sorry I could just ask chatgpt this 😛
yes chatgpt says it's cached on first import so NBD
files are imported only once, but you have to reference that import in other files to create the variable your code will use.
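A quick way to see the caching for yourself. json stands in for discord here only so the snippet runs without discord.py installed; the mechanism (sys.modules) is the same for any module:

```python
import importlib
import sys

import json  # first import: the module is executed once and cached in sys.modules

cached = sys.modules["json"]

import json as json_again  # later imports (in any module) just bind a name to the cached object

assert json_again is cached
assert importlib.import_module("json") is cached
print("same module object:", json_again is cached)  # same module object: True
```

So an `import discord` at the top of each module costs a dictionary lookup after the first time; each module still needs its own import statement to get the name in its namespace.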
Learned a harsh lesson last night with circular imports… wasted about 2 hours
Well not a total waste as I did learn stuff
Just realized I needed to add this condition to "change_X" tags processing (such as 'change_character'):
and not is_direct_message(self.ictx)
I should probably add a setting to control whether images may be generated in DMs via tags
Welp - finally got everything working again - without per_server_settings enabled 😛
Time to see how that goes now...
So far so good, but only have this bot on one server. Need to test my other one
Note to self: allow anything via DM if bot owner
@terse folio you may be interested to know: Most of the idiocy of my settings management was resolved by simply importing BaseFileMemory from database to main. I also resolved the multiple instances of Config by importing BaseFileMemory to shared and initializing Config there
That had me thinking, oh, I can just initialize all sorts of main bot class instances in shared… but ended up creating unsolvable circular import hell
Interesting, nice!
And yes, whenever you find yourself having circular imports one solution is to make a 3rd file that stands outside the previous ones which you import. (In this case shared.py)
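The "third file" fix can be demonstrated end to end by writing three tiny modules to a temp folder. The module names and contents are hypothetical; the point is the shape: bot_main and settings_mgr both import shared, shared imports neither, so there is no cycle, and both see the same shared.config object:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical modules: shared sits "outside" and imports nothing from the others.
MODULES = {
    "shared.py": "config = {'per_server_settings': False}\n",
    "settings_mgr.py": (
        "import shared\n"
        "def per_server_enabled():\n"
        "    return shared.config['per_server_settings']\n"
    ),
    "bot_main.py": (
        "import shared\n"
        "import settings_mgr\n"
        "def enable_per_server():\n"
        "    shared.config['per_server_settings'] = True\n"
        "    return settings_mgr.per_server_enabled()\n"
    ),
}

tmp_dir = Path(tempfile.mkdtemp())
for filename, source in MODULES.items():
    (tmp_dir / filename).write_text(source)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()  # make sure the import system sees the freshly written files

bot_main = importlib.import_module("bot_main")
print(bot_main.enable_per_server())  # True: both modules share one config object via shared
```

If settings_mgr instead did `import bot_main` while bot_main imported settings_mgr, whichever loaded second could see a half-initialized module, which is the usual circular-import failure.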
Ok in addition to all this cool settings crap - now also allowing the bot owner to use most commands via DM
Nice!
Starting to run out of bugs, I'm digging deep now
This is going to be a super sweet update
I had to add a little inbetween function for all 16 instances of ‘local_history = get_history_for(ictx.channel)’ in order to first get the character and mode (which now has ictx.guild.id as an arg), to subsequently pass to the history function
I first modified the custom history functions (get/set history for) to accept “ictx” instead of only the channel ID - but then there were some instances the parent class was calling with just the id (channel id)
Just a funny little story 🤗 it’s working and that’s all that matters
Just need to test the per-guild characters feature a bit more before I push this to main
The per server characters is totally working
I'm pushing this now - in addition to expanding the settings management, I've also cleaned up a lot of redundant settings crap and fixed a number of bugs along the way
It's much more efficient now
AND - there is a separate option to allow per-server-imgmodel-settings. I tested this and it will work if you have enough VRAM. In my case, I can use a separate SD 1.5 model in 2 separate servers
Will I continue using an SD 1.5 model in 2 servers? Heck no - but it does work!
Pushed to MAIN
different character, same photo... i still don't understand why it would be useful 🙃
and what about the name
The names are unique!
I still have the bot create a "delayed update task" to ensure profiles do not update more than once every 10 minutes
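A rough sketch of the "at most one profile update every 10 minutes" idea: compute how long to wait since the last update, sleep that long, then apply it. ProfileUpdateThrottle and its methods are hypothetical, not the bot's delayed update task; the clock is injectable so the timing can be checked without waiting:

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class ProfileUpdateThrottle:
    """Hypothetical helper: allow at most one profile update per min_interval seconds."""

    def __init__(self, min_interval: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_update: Optional[float] = None

    def delay_needed(self) -> float:
        """Seconds to wait before the next update is allowed (0 if it can run now)."""
        if self.last_update is None:
            return 0.0
        elapsed = self.clock() - self.last_update
        return max(0.0, self.min_interval - elapsed)

    async def run_delayed(self, update: Callable[[], Awaitable[None]]) -> None:
        """Sleep out the remaining interval, then apply the update and record the time."""
        await asyncio.sleep(self.delay_needed())
        await update()
        self.last_update = self.clock()
```

If several changes arrive while one is waiting, a real implementation would probably keep a single pending task and only apply the latest name/avatar, rather than queueing one update per change.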
I'll review this to see if I can reduce it when only changing the name
wait
what about changing bodies
2 bots, one program
is that a thing?
instead of opening 2 adbots, just 1
Anyway - it is allowing different names in each guild, and it is managing their settings separately. The only caveat is indeed that they need to share image
right
The config file has a field for a shared avatar image
different server can have diff name
if the image is not included there, it will just change the image to the new character every time
One more thing I updated... it now allows png, jpg and gif.
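A minimal version of that format check, assuming it is done by file extension (the bot's actual check may differ, and an extension check does not validate the file contents). avatar_mime_type is a hypothetical helper; the extension-to-MIME mapping itself is standard:

```python
from pathlib import Path
from typing import Optional

# The three formats mentioned above (.jpeg accepted as an alias of .jpg).
AVATAR_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
}


def avatar_mime_type(path: str) -> Optional[str]:
    """Return the MIME type for a supported avatar file, or None if the extension is not allowed."""
    return AVATAR_MIME_TYPES.get(Path(path).suffix.lower())


print(avatar_mime_type("characters/Ava.PNG"))  # image/png
print(avatar_mime_type("characters/Ava.webp"))  # None
```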
It was originally only allowing png
Oh yea, that's cool, I saw some bots with animated profiles, and banners recently!
what
rythm i think
Well, the animated gif will probably be a static image if the bot is not nitro
Idk how the heck that works, I just looked up "discord avatar supported formats" - and revised my code to allow those 3 it says are supported
bots can't have nitro,
bots just have some nitro capabilities such as the ability to use any emoji from shared servers
whaaaaaaaaaaaaaat
im not sure if there are restrictions on who can upload banners (like if it needs to be verified)
Interesting, hey maybe the bot can use animated gifs now XD
bot has more rights than most users...
mhm,
funny thing is some people have asked if they are allowed to use discord bots as user accounts because of that
Which I think you can do? Because you're not automating a user account.
Just using the bot api to run a bot which is fine
well anyways, i remember that there was a setting to make the bot reply to another bot, i thought it was running 2 adbots @halcyon quarry am i right?
sounds like the random continuation setting?
ye
i don't think it would be useful because they are going to use double the VRAM, which i don't have...
im thinking that instead of running 2 adbots, plug 2 tokens
if it's running off the same instance of tgwui, it wouldn't have to.
But i'm not sure that was a bot feature
that why i asked
it could be funny
and easier to do
like man, VRAM is expensive
no, not at the moment,
running 2+ discord bots isn't a simple task
Also you'd need to figure out some communication layer internally between them.
The best solution that would fix a lot of things is to create an AD_Discord bot extension that hosts an api with everything it needs from tgwui.
Then you could start up as many discord bots as you want that connect to that extension and take turns sharing the vram
oof
but an extension is no easy task either,
Because you'll now have separate configs.
One for tgwui (what tts extension to load) for example
And other for the bot (character, history... info)
it's something I want to attempt later, but i'm still busy until the end of this month
gg buddies https://github.com/aiko-chan-ai/DiscordBotClient?tab=readme-ov-file#windows
burn my computer if die
no wait wrong
i'll burn my computer if it gets infected
i trust!
feels like good?
it's smart to test things in a vm of course,
but it being on github is a pretty okay indicator that it could be safe most of the time
As others can look through the code.
I would run it on a new bot account
in case there's a possibility it could grab tokens
wow 58k downloads,
Its probably fine ^^
don't login as a user (discord doesn't like that)
it really has nitro
a bot can create server :O
back before discord was super strict about client side mods, I saw some that mimicked features of nitro.
Like letting you send custom emojis as plain text (others would need the mod to see them)
They can't, this program is for both user and bot accounts so they support all endpoints
as a bot account?
yea, what's cool is the discord.py docs have user endpoints too,
Added for completion, but are labeled that they wont work for bots.
Can be confusing sometimes for those who don't realise
imagine the server owner is a bot XD
heh yea
I had a joke about this
With my moderation bot, if the server owner tried to kick/ban themselves with it.
it would display the image of "transfer server ownership" with the bot's profile
🤯
amazing :3
Thanks again for your contributions (I’ll never stop thanking)
So much would not be possible
do a test using screenshare?
I don't think bots can send video.
Make sure to test by viewing these things from the official client because I feel like a lot of it would be client side
screenshare dont work
Your database and history manager made Settings() instances and management “easy” (just time consuming to ensure all my little features play nice together )
Also if you have OBS, you can use the OBS webcam as a camera ^^
i can press the camera button but i don't have a camera
OBS is the GOAT
Oh yes and it gets even better, being able to program for it via obs-websockets and a python interface!
I thought Reality already clarified recently that bots can’t use video in VC
i literally logged in a bot account
Yes, so can’t use video 😛
and i could press the button 😢
the button lied to us!
i bet that you never saw a bot creating a server and being the owner
cant accept invites tho
it was a few years ago when I needed to create a bunch of servers and looked into doing it through a bot.
maybe they changed something lately?
Huhh, have to check that out!
The docs are huge, every time I need to look something up it’s a scavenger hunt
Oh interesting, this must be new!
maybe came out when discord started the bot verification thing because that also has server count limits
it can upload 50mb+
yea, bots have their own upload limits
I think it's 25 for normal users, 50mb for bots as you said, and 500mb for nitro?
got it 100mb
I may be wrong but I think uploads were recently reduced…
Or announced to be reduced…
bot banners might be changed from the developer panel
like you would change the aboutme section for a bot
true
yup, there's a button to upload them there
that might not be a bot feature, just a feature of the client if used in user mode
i closed it and it got deleted by my av

I love how I pour my heart into this and slowly lose github stars lol
Some nice github stars you got there. Would be a shame if something were to...happen to them....
if any of you guys end up trying out the per-server settings/characters features, I'd be very interested in feedback
I've only got one active server, but I can try at some point.
Same - I only tested enough to confirm it works, along with most features... anyone using the bot in multiple servers could end up finding something I missed
am i doing it right?
gonna believe and try first
forgot to make activate true :P
some small error but working ^_^
buddy there is a problem
there is no description xd
idk how google grab info but google is skipping the most important part
you must explain why you call it ad discordbot 🙄
https://github.com/DaddyLazarus/AdBot
although it reminds me of ad astra
Yep - so any issues with vits_api_tts?
nah, just got the speaker name wrong
the code itself has a problem encoding " ' "
and always outputs %26%23x27 when there is a '
gpt fixed it
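%26%23x27 is what you get when an apostrophe is HTML-escaped (to &#x27;) and the result is then URL-encoded. This is a guess at the cause, since the actual fix isn't shown; the sketch just reproduces the symptom with the standard library and shows the plain URL encoding the server would expect:

```python
import html
from urllib.parse import quote

text = "it's"

# HTML-escaping first turns ' into &#x27;, and URL-encoding that gives %26%23x27%3B.
double_encoded = quote(html.escape(text))
print(double_encoded)  # it%26%23x27%3Bs

# URL-encoding the raw text sends a plain %27, which decodes back to '.
print(quote(text))  # it%27s
```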
and for some reason changing segment_size to 0 makes the tts much better
but the parameter is not included in the extension's code, so i asked gpt to add it :)
Maybe you should try pushing a Pull Request to the original repo?
I’m starting to get tempted to re-publish the project under a new name, the crappy name likely does have a huge impact
👀
never done that before
dont know how to do :V
- Create a GitHub account
- Go to the repo and "fork" it, and give it a name (like adding -marcos to the end of it)
- You can make a duplicate of the main branch by going to the "branches" page, clicking "New branch", and calling it something like "fix bugs"
I use the GitHub desktop app, but I think from this point all you would need to do is (in the GitHub page) click Add file to upload the updated files, which I think will trigger a commit (which updates the existing file)
Finally, from the top bar Pull requests > New pull request and it will let you compare two branches.
In this case, you would want to compare your fixed branch to that other guy's main branch
Then that guy can approve the PR and it is officially committed
I submitted these 2 PRs to sd-webui-forge this morning
It sounds like a lot of steps, but once you do it once or twice it gets easier
i always forget the command to make bot join voice channel
- The character file needs to have the use_voice_channel: True setting.
- In config.yaml you need to have the extension name plugged in (you probably do).
- In your server you need to use /set_server_voice_channel
About use_voice_channel: True, I think I'll make that True by default
(like the 'in character menu' setting)
The command works only if you launched the bot with the TTS enabled
i remember it was toggle tts or something but can't find it now
/toggle_tts
If you don't see the command, you may need to restart your discord client
or try the /sync command
Well, each character
that you want to use on voice
I'm changing that default now...
like i don't see the command
what do you mean by tts enabled
like it is sending audio files
but i only added some setting to the character card
When the bot starts up, it registers commands
That command does not register unless the bot was started with TTS enabled
If you opened your discord client while the bot was online and not connected - the command will not be there.
Reloading the bot with TTS enabled does register the command, but it likely will not appear in the discord window until you restart it
I'm testing now... allowing that command to always be there, but just saying "Can't use it" if no TTS client is in the config file
this will prevent it from mysteriously not being there when deciding to launch the bot with TTS
:p
what if the bot didn't detect my tts
rebooted
logged in with browser
still missing
Ok... last things to check
In config.yaml in the tts_settings, is it play_mode 2? (default)
Ahh ok, I think I know
👀
I have it currently set up so it needs to be in the 'supported clients' list...
confirm the name of that tts client and I'll add it
Actually, scratch that
I have another idea
👁️ 👄 👁️
I added a 'fallback client' logic when initializing extensions
It will consider setting up an extension that ends with "_tts" so long as it doesn't find another one that is in the supported clients list
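A rough sketch of that fallback order, assuming the enabled extension names are available as a list. SUPPORTED_TTS_CLIENTS and pick_tts_client are hypothetical names, not the bot's actual code:

```python
from typing import Iterable, Optional

# Hypothetical list for illustration; the bot's real "supported clients" list may differ
SUPPORTED_TTS_CLIENTS = {"alltalk_tts", "coqui_tts", "silero_tts", "elevenlabs_tts"}


def pick_tts_client(extensions: Iterable[str]) -> Optional[str]:
    """Return a supported TTS extension if one is enabled, else any '*_tts' fallback."""
    names = list(extensions)
    # First pass: a client from the supported list always wins
    for name in names:
        if name in SUPPORTED_TTS_CLIENTS:
            return name
    # Second pass: fall back to any extension whose name ends with "_tts"
    for name in names:
        if name.endswith("_tts"):
            return name
    return None
```

For example, ["gallery", "vits_api_tts"] picks vits_api_tts, but if alltalk_tts is also enabled it is preferred.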
👍
Pushed to Main:
- Any extension ending with "_tts" can now be used, including in voice channels.
- Characters are no longer required to include use_voice_channel to use VC. It will join so long as it is otherwise configured, and the character does not explicitly disable VC with use_voice_channel: False
- /toggle_tts will now always register, even if TTS is disabled... to prevent the command from missing in the discord UI when TTS IS enabled.
@valid crypt give it a shot and lmk if it works out
my first pull request 🥳
sync reloads the discord commands iirc for your client
The bot can't be hot-reloaded without restarting the bot.
It would have to have been built with a cogs-like system
Yeah, I wasn't so sure if it would make commands appear which were missing from the UI (despite those commands being registered)
i was wondering why vits_api_tts is so slow, and changing the power plan makes it faster
asked gpt to add some debug codes and
2024-08-23 02:20:42 [INFO] Using CPU on 13th Gen Intel(R) Core(TM) i5-13600KF with 14 cores and 20 threads. Total memory: 32GB [in ModelManager.log_device_info:181]
using cpu
huh
Chatgpt turning everyone into coders 🤗
i feel the speed is very improved, until i see the log
daaaam
cpu was taking 8s at max power
Huge!
i think you should make this default
I noticed that TGWUI made that default recently
and a little detail
works but poor
i have to toggle it and toggle back to let bot join
¯_(ツ)_/¯
bruh, having respond-to-itself set to 10%, i got 3 replies in a row, a 0.1^3 probability .-.
that's rare enough
wew rare!
I have some code to create "fake" randomness that feels better for humans.
because [10% yes, 10% yes, 10% yes] is random, just unlikely; our minds find patterns and don't like that
A similar thing happened with Apple's music shuffling long ago which inspired me to make such a function.
One that more deterministically randomises data
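One common way to get that kind of "fake" randomness is a shuffle bag: instead of rolling 10% independently every time, put exactly 1 hit and 9 misses in a bag and draw without replacement. This is a generic technique, not the function mentioned above; ShuffleBag is a hypothetical name:

```python
import random
from typing import Optional


class ShuffleBag:
    """Draw True/False so that every full bag of `size` draws has exactly `hits` Trues."""

    def __init__(self, hits: int, size: int, rng: Optional[random.Random] = None):
        if not 0 <= hits <= size:
            raise ValueError("hits must be between 0 and size")
        self.hits, self.size = hits, size
        self.rng = rng or random.Random()
        self._bag: list[bool] = []

    def draw(self) -> bool:
        if not self._bag:
            # Refill with exactly `hits` True values, then shuffle the order
            self._bag = [True] * self.hits + [False] * (self.size - self.hits)
            self.rng.shuffle(self._bag)
        return self._bag.pop()
```

With a 1-in-10 bag, at most two hits can land back to back (the end of one bag and the start of the next), so three self-replies in a row cannot happen.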
I know that, fake randomness feels more random
That’s a feature that came with the parent project. I recommend trying out my spontaneous message feature instead, using variables for the prompts
Not that it would change the randomness but you may be able to tweak the prompt a bit
Such as adding a system message or prefixing the prompt or something
The chance to reply to itself feature actually takes its last reply and uses it as the user prompt so it may be effective but also may confuse the history or something
I think that’s how it works I really haven’t looked too much into it
There’s been a lot of new major features since you last sent anything here, such as streaming responses and per-server settings (including characters), among other things
wdym by server settings including chars?
If you enable both new settings per_server_settings and per_server_characters, it will manage characters completely independently in each server. So in one server if you use /character and choose a new character, only that specific server will change to using that character.
The only caveat to this is that the avatar cannot be set independently. I have an additional config option for avatar_image so you can specify a dedicated shared image. If unset, the avatar for the bot will change in all servers every time a new character is loaded
This is all possible due to a big overhaul in settings management
you guys added a ton of features and i have a bunch to catch up on now
Reality hasn't added anything recently, been just little old me 🙂 But they gave me incredible tools to work with
The bot does work with many extensions, so you could always try using an extension that adds that
At minimum, I kind of recently added some new "tags" for prefix_context and suffix_context, meant to mimic the complex memory extension
giving it final options like, [agree, disagree, middle ground, wrong, true, yes, no]
the llm in the end will choose one of those but before that there gotta be an analysis that happens
before the final decision
I'd mentioned this before, but you could use a flows tag to essentially pass the history along with a background prompt to a dedicated character context, to make a decision that would then give a specific prompt to your normal character
You could have the flows tag trigger for every user message, or only on certain key words
you mean sending it to a second llm? that will be the actual decision maker?
For example if it triggers your flows tag, flow_step 1 may be swap_character: Decision Maker who has a dedicated context with example dialogue like system_message: You make decisions, valid responses are X Y and Z, user: an example of a prompt, Decision Maker: X, user: another example of a prompt, Decision Maker: Z, <YOUR PROMPT GETS INSERTED BY THE format_prompt tag>
flow_step 2 may then be format_prompt: {llm_0}, representing what the last character just wrote
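To make the {llm_N} idea concrete, here is a minimal sketch of running flow steps in order, where each step's output becomes available to later format_prompt templates. The dict keys mirror the tag names above, but the {prompt} placeholder, run_flow and the generate callback are hypothetical stand-ins for the bot's real flow handling:

```python
from typing import Callable

# Hypothetical flow: step 1 asks the decision maker, step 2 hands its answer to the main character
flow = [
    {"swap_character": "Decision Maker", "format_prompt": "{prompt}"},
    {"swap_character": "Main Character", "format_prompt": "The decision was: {llm_0}. Respond accordingly."},
]


def run_flow(flow: list[dict], prompt: str, generate: Callable[[str, str], str]) -> str:
    """Run steps in order; step i's reply is exposed to later steps as {llm_i}."""
    outputs: dict[str, str] = {"prompt": prompt}
    reply = ""
    for i, step in enumerate(flow):
        step_prompt = step["format_prompt"].format(**outputs)
        reply = generate(step["swap_character"], step_prompt)
        outputs[f"llm_{i}"] = reply
    return reply
```

With a fake generate that makes the Decision Maker answer "agree", the main character receives "The decision was: agree. Respond accordingly."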
etc
those are good tools but the problem is when it comes to the decision making part
the llm acts stupid
You'll need to read my comments in dict_tags.yaml and in my Wiki, look over my examples, look at all the possible tags, look at the Variables
i tried dozens of prompts for decision making
It could be a matter of prompting
Here's what most people do: Make a huge long winded explanation of how the LLM behaves
I recommend trying having literally nothing else except for some solid example dialogue
and like, a concise system message that You make decisions
you mean a cloned character but the cloned version will be customised for only decision making?
the system prompt will be fine-tuned for only that
You need to see all the crazy crap my Tags System can do
You can make all of that apply to a flow step
completely different character context, different parameters, manipulating the history it sees, manipulating the prompt, etc etc
Hiding or keeping that characters response in history, yadda yadda
wouldn't that steer the new future decisions left or right? bcz the data is biased? idk
by data I mean the examples
You see that as a negative, but it's actually stronger that it is biased... towards responding the way you actually want it to
should I include examples of each option? hmm..
Good example dialogue will cover a variety of user prompt scenarios that are not going to be exact copies of prompts you or your users will actually make
But similar
like, in the ballpark
[agree, disagree, middle ground, wrong, true, yes, no]
The benefit of using example dialogue only is you can use all those tokens for it without the huge long winded explanations
Fit as many examples as you can for diverse scenarios, and if you are very smart about it you can really emphasize how that character makes its decisions
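A sketch of the "concise system line plus lots of examples" approach, with a small parser that maps the model's reply back onto one of the valid options. The example dialogue and function names are made up for illustration; in the bot, this would live in a character's context rather than in Python:

```python
from typing import Optional

OPTIONS = ["agree", "disagree", "middle ground", "wrong", "true", "yes", "no"]

# Hypothetical example dialogue: varied prompts, each answered with one bare option
EXAMPLES = [
    ("Pineapple belongs on pizza.", "middle ground"),
    ("The Earth is flat.", "wrong"),
    ("Should I back up my files before upgrading?", "yes"),
    ("Tabs are objectively better than spaces.", "disagree"),
]


def build_decision_prompt(user_prompt: str, name: str = "Decision Maker") -> str:
    """Concise system line + example dialogue + the real prompt, ending on the bot's turn."""
    lines = [f"You make decisions. Valid responses: {', '.join(OPTIONS)}."]
    for prompt, answer in EXAMPLES:
        lines += [f"User: {prompt}", f"{name}: {answer}"]
    lines += [f"User: {user_prompt}", f"{name}:"]
    return "\n".join(lines)


def parse_decision(raw: str) -> Optional[str]:
    """Map the model's output onto a valid option, or None if it went off-script."""
    cleaned = raw.strip().lower()
    # Longest options first so "middle ground" is checked before shorter words
    for option in sorted(OPTIONS, key=len, reverse=True):
        if cleaned == option:
            return option
        # Accept "yes." or "no, because..." but not "nothing"
        if cleaned.startswith(option) and not cleaned[len(option)].isalnum():
            return option
    return None
```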
without having to explain it
You can see the example characters I provided... they work
for decision making?
Well, Imgmodel_Selector
When my flows tag is triggered, it works - it picks appropriate image models depending on the prompt
that's a nice classifier
Yes, I also made one for selecting Aspect Ratios, which worked well too... unsure where the heck I misplaced that character...
the problem with dialogues: it will link unrelated dialogues together, just bcz they came afterwards
this is what I mean
ai
This is giving me an idea for a new Tag... which is possible thanks to Reality's history manager
something like filter_history_for
Adding another layer of history manipulation, such that you could show that Decision Maker character its previous responses to your prompts
Without showing those prompts/replies to the main context
Currently, the save_to_history tag is the main thing to prevent sharing unwanted history among contexts
yep, definitely adding this tag
Logging new item to history management

The upcoming tag filter_history_for will search both name and impersonated_by
and collect those exchanges
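A minimal sketch of what matching on both name and impersonated_by could look like, assuming history has already been grouped into (user, bot) exchanges. HMessage here is a stripped-down stand-in, not the bot's real class:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HMessage:
    """Hypothetical stand-in for the bot's HMessage; only the fields used here."""
    name: str
    text: str
    impersonated_by: Optional[str] = None


def filter_history_for(exchanges: Iterable[tuple[HMessage, HMessage]],
                       names: Iterable[str]) -> list[tuple[HMessage, HMessage]]:
    """Keep (user, bot) exchanges whose bot reply matches one of `names`,
    either by name or by impersonated_by (e.g. after swap_character)."""
    wanted = set(names)
    return [
        (user, bot)
        for user, bot in exchanges
        if bot.name in wanted or bot.impersonated_by in wanted
    ]
```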
hey am back
my phone died on me in the middle of me writing
😐
you mean the decision maker character will have a history of ALL PAST DECISIONS?
that can be both good and bad, if all the decisions it made in the past were bad then rip
it's the same as providing examples that you talked about
I have a lot of experimentation to do to get this decision-making process working, and more importantly, making it choose the correct decision
this method can act like a memory which will help with consistency of the LLM overall "thinking"
There are other tags for manipulating history, load_history could restrict the depth of history to only X # of exchanges
Using /reset_conversation wipes history.
@terse folio hoping you might take a look at something... I don't 💯 understand how history works
usage:
filtered_history = self.history.filter_history_for_names(names_list)
i_list, v_list = filtered_history.render_to_tgwui_tuple()
self.llm_payload['state']['history']['internal'] = i_list
self.llm_payload['state']['history']['visible'] = v_list
The thing I'm uncertain of, is if it matters what order I collect the hmessages
Although... yeah... .render_to_tgwui_tuple() won't work on a list of hmessages
I should never call for backup until after I’ve showered - because that’s when I always solve my problems
Just do all your coding in the shower
waterproof laptop and good to go
I’m going to get the message pairs and collect to a list… if multiple bot replies do smart things to get the right one… then, iterate over a copy of history using the collected list and pop any messages not in the list
Imagine landing a 6 figure job and this is one of the terms - must be able to code in a shower during work hours
i have better brain while 💩
voice activated code assistant chatbot for hands free shower coding!
will take a look shortly
Thanks - I think I’ve got the right idea now though #1154970156108365944 message
We could change that by creating a static function for the class that takes a list of hmessages as input.
But ideally the way youd do this is create an empty history object and add your messages to it instead of a list object.
The cost in performance when instancing classes vs lists is negligible
(Lists are classes anyway)
so dont worry about doing things that way!
there are still some optimization routes we could take: Like implementing "Slots" for history so the class only reserves memory for a set list of attributes
Ahhh yeah didn’t think of that… add them to an empty history instance
Thanks!
there should be some functions for that like history.append(hmessage)
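A minimal sketch of the "start from an empty History and append" approach, including the __slots__ idea. Both classes are simplified stand-ins for the bot's real HMessage/History; only the append-with-type-check behavior is taken from the discussion, and filtered is a hypothetical helper:

```python
from typing import Callable, Iterator


class HMessage:
    # __slots__ reserves storage for exactly these attributes (no per-instance __dict__)
    __slots__ = ("name", "text")

    def __init__(self, name: str, text: str):
        self.name = name
        self.text = text


class History:
    __slots__ = ("_items",)

    def __init__(self):
        self._items: list[HMessage] = []

    def append(self, hmsg: HMessage) -> None:
        # Reject anything that is not an HMessage
        if not isinstance(hmsg, HMessage):
            raise TypeError(f"expected HMessage, got {type(hmsg).__name__}")
        self._items.append(hmsg)

    def __iter__(self) -> Iterator[HMessage]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


def filtered(history: History, keep: Callable[[HMessage], bool]) -> History:
    """Build a new, empty History and append only the messages we want to keep."""
    result = History()
    for hmsg in history:
        if keep(hmsg):
            result.append(hmsg)
    return result
```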
hey @terse folio how is it going
Going good, just got home pretty much, taking care of pets for neighbors while gone!
How're you doing?
I think so, I don't think I was sorting them by time.
But we can add a function to sort them if needed.
I may add another bool attribute like “manipulated_reply” or “from_tampered_history” to convey that history was manipulated to produce the reply
Could be good for logging at least, maybe for a future filtering method
I think this should do it
although, seems like the custom .append() method is eager to save the history
ah ok, this is the answer
the _ prefixing the attribute name is to indicate it's supposed to be treated like a private variable (not meant to be modified externally)
the reason being, history.append does checks to make sure the added object is an hmessage type.
How about adding a "nosave" kwarg to .append!
but yes, for now this will work.
just leave a little # TODO so it can be easily found later to be changed to something else
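A sketch of the proposed nosave kwarg, using a simplified hypothetical History where save() just counts calls instead of writing to disk:

```python
class History:
    """Hypothetical minimal History: append() saves unless nosave=True."""

    def __init__(self):
        self._items = []
        self.save_count = 0  # stand-in for how many times history was written out

    def save(self) -> None:
        self.save_count += 1  # the real bot would write history to a file here

    def append(self, item, nosave: bool = False) -> None:
        self._items.append(item)
        if not nosave:
            self.save()
```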
yea!
interesting, also something I could implement later is running slices on a history object
Like this could become history[-num:] if that's of any use
I take that back, different stuff going on as this is after extracting pairs, looks good ^^
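If slicing is ever added, one option is to have __getitem__ return a new History for slices, so history[-num:] keeps the same API. A minimal sketch on a hypothetical History class:

```python
from typing import Iterable, Optional


class History:
    """Hypothetical History that supports both indexing and slicing."""

    def __init__(self, items: Optional[Iterable] = None):
        self._items = list(items) if items is not None else []

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice returns a new History; the message objects themselves are shared
            return History(self._items[index])
        return self._items[index]
```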
I improved the get_history_pair() method too...
Added the check in the red box
If/when there can be multiple possible user messages for a bot reply, will need to expand that part
forgot break
(should only be one max anyway)
that's interesting, good idea,
should do a test to see if that lines up with the "reply_to" and all that stuff
There could be some edge cases where you get unrelated bot replies
nvm
I read it as all bot replies in the history at first
I get it now
It's been working well for the App Commands (regen / continue / toggle as hidden) but it may have been occasionally getting a less than perfect bot reply
User1: 1+1
User2: 2+3
Bot: 2
Bot: 5
Theoretically this should output in pairs:
1+1 = 2
2+3 = 5
I do have a separate method to return all possible replies as lists
i'd have it write the prompt to a file so it can be analysed
rather than this one which filters it down to one pair
A mix of both could be to gather all the replies from the first function, and "\n".join() them into a single reply to be used as a pair
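A sketch of the 1+1 / 2+3 pairing example, assuming each bot reply stores the uid of the user message it answers in a reply_to field (hypothetical field names), with an option to join multiple replies into one:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Hypothetical minimal message; reply_to holds the uid of the message being answered."""
    uid: int
    role: str  # "user" or "bot"
    text: str
    reply_to: Optional[int] = None


def get_pairs(messages: list[Msg], join_replies: bool = False) -> list[tuple[str, str]]:
    """Pair each user message with its first bot reply, or all replies joined with newlines."""
    pairs = []
    for msg in messages:
        if msg.role != "user":
            continue
        replies = [m for m in messages if m.role == "bot" and m.reply_to == msg.uid]
        if not replies:
            continue
        bot_text = "\n".join(r.text for r in replies) if join_replies else replies[0].text
        pairs.append((msg.text, bot_text))
    return pairs


history = [
    Msg(1, "user", "1+1"),
    Msg(2, "user", "2+3"),
    Msg(3, "bot", "2", reply_to=1),
    Msg(4, "bot", "5", reply_to=2),
]
print(get_pairs(history))  # [('1+1', '2'), ('2+3', '5')]
```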
Uh oh, the gears are turning
Might need to come burn some of that idea fuel XD
got a few more days of really busy stuff!
The filter.... I think it's working, my server is too dead to test effectively lol
human coordination isn't always easy,
I have a couple test discord accounts for that!
Oh right I can test it for the impersonated_by attribute (from triggered 'swap_character')
you can login to one on a webbrowser, or alternate official discord client like Discord Canary
There is an issue with the method I used for filter history...
maximum recursion depth exceeded in comparison
I'm not sure why this is causing an infinite recursion...
Could appending items to the copy of history, be causing them to also append to the history it was copied from?
Ok well for some reason self.fresh() is actually just giving me a copy of history, not an empty history instance
seems like it should be giving me empty history but here is what is printed
fixed by copy.deepcopy() 🤓
The error I'm getting is due to comparing HMessage objects to each other
it bugs out when they match 🙂
What chatgpt suggests (and what I'll do) is make a set of the message IDs that it already appended, and check hmessage ID against that
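A minimal sketch of the seen-IDs approach. Because it only compares integer uids, HMessage.__eq__ never runs; the class and function are simplified hypothetical stand-ins:

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: instances compare by identity; we only compare uids
class HMessage:
    uid: int
    text: str


def collect_unique(messages: list[HMessage]) -> list[HMessage]:
    """Keep the first message for each uid, comparing plain ints instead of objects."""
    seen_ids: set[int] = set()
    unique = []
    for hmsg in messages:
        if hmsg.uid in seen_ids:
            continue
        seen_ids.add(hmsg.uid)
        unique.append(hmsg)
    return unique
```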
the hmessage should be using the uid attribute for matching
make sure you have a unique one for unique messages
those are the message ids iirc
like for discord
but if it works, great!
I actually generate fake IDs for messages that are not sent to discord 😛
I see I see
and whenever there are multiple IDs, the extras are all 'related_ids'
So yeah, for some reason if hmessage == hmessage: triggers an infinite recursion
do you have a code snippet that causes that?
nvm
i never implemented an __eq__ function it seems
because dataclasses do it for you, and there was some bug with labeling which attributes i wanted to be used for eq checking
in the meantime, you could get around it by defining your own .equals() function that compares the uids or ids
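One way to keep the generated __eq__ from walking into the History back-reference is to exclude that field with field(compare=False, repr=False), alongside an explicit equals() on the uid. This is a sketch with simplified classes, assuming the recursion comes from comparing that back-reference:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class History:
    items: list = field(default_factory=list)


@dataclass
class HMessage:
    uid: int
    text: str
    # Back-reference to the owning History, excluded from the generated __eq__ and __repr__
    history: Optional[History] = field(default=None, compare=False, repr=False)

    def equals(self, other: "HMessage") -> bool:
        """Explicit check by uid, independent of the generated __eq__."""
        return self.uid == other.uid
```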
Welp, in my testing I realize another nice tag to add would be include_hidden_history
Currently, the only ways to include hidden history are:
- to toggle it back to visible
- or perform an App Command on a hidden item, it toggles it temporarily 😛
Added new tag: include_hidden_history
I implemented this by adding a new argument for render_to_tgwui_tuple()
include_hidden = False  # default when the tag is absent
if include_hidden_history:
    include_hidden = True
# Render history for payload
i_list, v_list = history_to_render.render_to_tgwui_tuple(include_hidden)
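A sketch of what the include_hidden argument could do inside a render method, assuming TGWUI's history format of parallel "internal" and "visible" lists of [user, bot] pairs. The HMessage fields here are simplified and hypothetical:

```python
from dataclasses import dataclass


@dataclass
class HMessage:
    """Hypothetical message with separate model-facing and UI-facing text."""
    role: str          # "user" or "bot"
    text: str          # text the model sees ("internal")
    text_visible: str  # text shown in the UI ("visible")
    hidden: bool = False


def render_to_tgwui_tuple(messages: list[HMessage], include_hidden: bool = False):
    """Return (internal, visible) lists of [user, bot] pairs, skipping hidden items
    unless include_hidden is True."""
    internal, visible = [], []
    pending_user = None
    for hmsg in messages:
        if hmsg.hidden and not include_hidden:
            continue
        if hmsg.role == "user":
            pending_user = hmsg
        elif hmsg.role == "bot" and pending_user is not None:
            internal.append([pending_user.text, hmsg.text])
            visible.append([pending_user.text_visible, hmsg.text_visible])
            pending_user = None
    return internal, visible
```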
I first tried deepcopying self.local_history._items and iterating over the hmessages, but that actually still referenced the original attributes and they were getting modified
i don't think i created a deepcopy method for the hmessage,
will have to check if there are separate versions
In this case, was much more efficient to just expand the render method
There are methods that copy history… unsure how unique those copies are…
i found out that json can't have comments D: does yaml have comments?
yaml can use # comments
Besides most of the code I'm sharing here is from .py
# style comments wouldn't work; standard JSON has no comment syntax at all.
you would have to check whether your parser supports them; some JSON variants (like JSON5 or JSONC) allow // and /* */ comments, but standard json parsers reject them
Also depending on how one's code reads the json, you might be able to get away with adding extra key:values that the program ignores
like "comment":"your text here"
but yaml would be better suited ^^
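A quick standard-library check of both points: a # comment makes json.loads fail, while an extra "_comment" key parses fine and is simply never read. (YAML's # comments need a YAML parser such as PyYAML, which is not shown here.)

```python
import json

hash_comment = '{"name": "Bot"  # the character name\n}'
comment_key = '{"_comment": "the character name", "name": "Bot"}'

# Standard JSON has no comment syntax, so the parser rejects the '#'
try:
    json.loads(hash_comment)
    rejected = False
except json.JSONDecodeError:
    rejected = True

# Workaround: an extra key that the program simply never reads
data = json.loads(comment_key)
print(rejected, data["name"])  # True Bot
```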
I had it set to copy the data in the lists (making a new list)
But keeping the same hmessage objects so they would be up to date with whatever is going on.
I can look into making a separate deepcopy function that also deepcopies the hmessages
eh, don't sweat it - we'll do it if the need arises 🙂
almost needed it, but not quite 😄
lol having a good time ha? am good except I suffer from what's called llm frustration syndrome 😆
Need a therapy finetuned llm to talk about the llm related frustrations to 😸
Yea, tuning your prompt to do something perfectly can be a pain, especially with smaller models
There is an LLM therapist model called Carl
Probably a better one by now… that was over a year ago
Isn’t everything better on Linux?
Even gaming in some cases!
i disagree
It's not a matter of opinion. There are Linux-based OSs built for gaming (SteamOS, Bazzite, Batocera), and some games run objectively better that way.
For what
training
gonna pause vits training for a few days :v
if i want to add stt to the bot, do i modify an extension or bot.py?
stt might be too hard for me
to add streaming tts, do i modify the extension or bot.py?
?
I'm really not sure what's involved with STT, but thanks for the reminder.
I may look into that in the coming days
stt feels really hard
I need to see how whisper_stt works, and figure out how to implement it
so what do i have to modify to add streaming tts?
You'd need to implement a separate API call to a TTS specific API, with the responses that are yielded from llm_gen() to the process_response_chunk() function
such as alltalk_tts's API
in the correct format, with the parameters, etc
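A minimal sketch of that pipeline: build a request URL for each text chunk the LLM yields and hand it to a synthesize callback. The endpoint path, port and parameter names are ASSUMPTIONS for illustration; check the vits-simple-api docs or its api_test.py for the real ones:

```python
from typing import Callable, Iterable, Iterator
from urllib.parse import urlencode

# ASSUMPTION: illustrative endpoint; the real path, port and params may differ
BASE_URL = "http://127.0.0.1:23456/voice/vits"


def build_tts_url(text: str, speaker_id: int = 0) -> str:
    """Build a GET URL; urlencode percent-encodes the raw text exactly once."""
    return f"{BASE_URL}?{urlencode({'text': text, 'id': speaker_id})}"


def stream_tts(chunks: Iterable[str], synthesize: Callable[[str], str]) -> Iterator[tuple[str, str]]:
    """For each text chunk yielded by the LLM, call the TTS and yield (chunk, result)."""
    for chunk in chunks:
        if chunk.strip():  # skip whitespace-only chunks
            yield chunk, synthesize(chunk)
```

In real use, synthesize would fetch the URL (for example with urllib.request.urlopen) and save the returned audio to a file, returning its path.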
😵💫
and then the responses from the TTS API would need to use the voice client code
i let chat gpt to do the tech part
i think that we are not speaking the same english
😵💫
It's unfortunate that the "continue" function has the TTS extensions generate the entire bot reply, and not just for the continued text
does that affect the speed?
Yes, but it also returns the filepath to the generated audio which is the entire response
if it affects the speed, then it is tough
i thought you could write some code to do some subtraction
Let's say I have max new tokens set to 2.
Example:
User: How's it going?
Bot: G
TTS Audio: G
Continue request:
Bot: Go
TTS Audio: Go
Continue request:
Bot: Goo
TTS Audio: Goo
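On the text side, the "subtraction" is simple: only the part after the previous reply is new. A sketch (new_suffix is a hypothetical helper); the catch is that the TTS extension currently receives the whole reply, so this only helps if the bot controls what text gets sent to the TTS:

```python
def new_suffix(previous: str, current: str) -> str:
    """Return only the text a continue request added on top of `previous`."""
    if current.startswith(previous):
        return current[len(previous):]
    # Earlier text changed, so there is no clean suffix; fall back to the whole reply
    return current


previous = ""
for reply in ["G", "Go", "Goo"]:
    print(repr(new_suffix(previous, reply)))  # 'G', then 'o', then 'o'
    previous = reply
```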
hmm
There probably is a way to trim the subsequent audio files, based on the length of the previous
But it will generate the entire reply for every time it sends a response chunk
if continue doesn't affect the output speed, you could just write some code to delete the previous response or smth
but
if it restarts every time...
It does affect the output speed because it has to generate the entire response each time, yes
thats the problem
but i remember that when i press regenerate, it starts where it was and wasn't taking too long
I'll make a deal with you.
If you can get python code up and running that uses the Alltalk TTS API, with the parameters and everything, I'll integrate that code into the bot
my vits tts uses api
When I looked at the alltalk TTS documentation for the API, it looked very complicated
the extension only does an api call
Well if you can send me functioning .py files that use the API directly I'll see what I can do
this is the extension
https://github.com/Arondight/vits_api_tts
this is the tts
https://github.com/Artrajz/vits-simple-api
script.py of the extension
this is the official example code i think
https://github.com/Artrajz/vits-simple-api/blob/main/api_test.py
I'll look into it, but it looks pretty goddamned complicated lol
it says simple api tho
also i feel vits like, very fast, mimics voices very good, but training it feels like eating s*
there is an ez training repo but 22050 Hz is a pain (for me)
I'm outta here, I will look into that.
What are the odds that this vits is bugged like the other one?
?
nvm
I guess Artrajz is probably the original author, and the other guy ported it as an extension
the extension was bugged, you pushed a PR that will languish unmerged for an eternity XD
not a deadly bug
just flaws
1 big flaw
1 small flaw
very simple to fix
1st flaw: ' was output as %26%23x27 (something like that)
2nd flaw: a missing parameter that improves quality when added
this is the gui
accessing the url below generates the audio
translated page by google
streaming responses don't work???, but since i discovered that i wasn't using the gpu, it takes too little time to need that function
this isn't the best, because the TTS considers the whole text (possibly)? or at least a window long enough so that it can create realistic sounding speech.
Unlike old TTS that just mixes phonemes together and sounds all robotic.
It would be better to split TTS on the sentence.
But better than doing audio analysis, just feed the TTS sentences at a time!
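A minimal sketch of feeding the TTS one sentence at a time from streamed text: buffer tokens and emit everything up to the last sentence boundary. The boundary regex is a simple assumption and will also split on abbreviations like "Mr.":

```python
import re
from typing import Iterable, Iterator

# A boundary is whitespace right after . ! or ?, or a run of newlines
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text and yield each sentence as soon as it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is finished; the last may still be growing
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # whatever is left when generation ends
```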
I’m going to give a good try at monkeypatching whatever I need to, to get the desired result, either from Continue function or making message chunking occur in patched function
Maybe patch chatbot_wrapper
i said i was pausing vits but man it is addictive
terminal says nothing but gpu is working
i hope everything is fine
my pc is dying
so laggy
if training at 22 kHz i'm out of RAM
i don't think i could survive 48 kHz
committing Spongedoku
I've duplicated chatbot_wrapper() locally, imported the few required functions... now should be able to customize it...
Hmmm, bot seems to be having some issues
got a 503 error on load, but now it's just freezing....
Other bots not working as well, so seems discord broke some things with that update.
Login issue? Any chance your token was changed?
but try updating discord.py
chatbot_wrapper is a TGWUI internal function that accepts the payload and returns the response
It also interacts with the extensions - I’m going to see if I can customize it when TTS and response streaming are enabled, to not send the entire response to the extension each time it splits
This will be optional, it could have adverse effects with other extensions. It will otherwise use the unmodified function
No it was all discord's issue
already making good progress
I think streaming TTS responses will be a thing soon
There's a variety of ways that extensions can modify normal behavior.
alltalk_tts (and I suspect all other TTS extensions) change the state['stream'] variable to False.
This makes it so it must wait until complete generation is done before it lets extensions modify the "output".
I changed the behavior to force the stream flag to True, and now it at least lets the text stream... and at the end, it processes the full TTS.
I think I know what needs to be done to make it process TTS for each chunk...
I have it working, but something is making alltalk just hang sometimes... may need to add some arbitrary sleep or something.
when you say soon it will give you surprises :V
In a way I already have it generating TTS after each response chunk
Just need to clean it up
And make it not hang and crash 🙂
If I can’t figure it out the guy from alltalk may have some insight, super nice guy
Noticed character is not automatically joining voice channel on startup... will resolve that as well
Ok, the TTS did not hang and error here at other location. I may have an outdated alltalk install or something at other location
For anyone wondering how I am making this work, I've extracted this code from chatbot_wrapper() (TGWUI internal function) which is the very last code it executes, after all text has been generated:
output['visible'][-1][1] = extensions_module.apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
yield output
This is what actually triggers the TTS extensions to generate the TTS.
After the audio generates, this yields the "visible" response... when TTS is enabled, the message includes the path to the audio file.
- I've extracted the code (it no longer executes at the end of chatbot_wrapper())
- I now execute it on my end every time the bot decides to chunk text, with the chunked text in the arguments
if should_chunk:
    last_checked = ''
    already_chunked += partial_response
    audio_path = extensions_module.apply_extensions('output', partial_response, state=self.llm_payload['state'], is_chat=True)
    print("audio_path:", audio_path)
The TTS extensions only use the "state" dict for applying character name to filename, etc.
Figured out why bot was not joining VC on startup. Fixed
Also noticed that per_server_characters was not handling Voice Channels correctly. Fixed.
One more unrelated bug to fix (noticed Regenerate was broken), then I'm pushing
There must be some new thing I added to HMessage objects that is triggering the infinite recursion when comparing them, because old functions that used to work via comparison are now also triggering the error.
I'm doing what you suggested which is adding an .equals() method to compare ids
def equals(self, hmsg: "HMessage"):
    return self.id == hmsg.id
something is super screwed up, ugh
just going to git checkout until I find where things went South
Ok, it turns out that the way I solved the one infinite recursion bug, created a different one
(go figure)
yes, History() did not like when I used copy.deepcopy on it
solved! Ok everything is looking very good now
PUSHED STREAMING TTS REPLIES TO MAIN
@keen palm @valid crypt
gotcha
TTS still isn't something I've dabbled with at all.
died
hmm... so I'm testing edge_tts and see this one is not working
AttributeError: 'Task' object has no attribute 'streamed_tts'
21:33:59.618 #2846 ERROR [bot.main]: An error occurred while processing "on_message" request: 'Task' object has no attribute 'streamed_tts'
21:33:59.618 #3875 ERROR [bot.main]: An error occurred while processing task on_message: Type is not JSON serializable: AttributeError
TypeError: Type is not JSON serializable: AttributeError
something like that
that's fast
Well, realized my mistake pretty quick after you said it 😛
The character needs to have chance_to_stream_reply behavior, some value > 0.0
The TTS streaming happens in sync with the text streaming
When it splits a sentence, it generates that portion of the TTS and plays it
If 1.0 it will split on every period or line break
0.5 means 50% chance to split on one of those
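A sketch of that behavior with an injectable random generator so it can be tested; chunk_reply is a hypothetical helper, not the bot's actual chunking code:

```python
import random
import re
from typing import Optional


def chunk_reply(text: str, chance: float, rng: Optional[random.Random] = None) -> list[str]:
    """Split at each period followed by whitespace, or at a line break, taking each
    boundary with probability `chance`."""
    rng = rng or random.Random()
    chunks: list[str] = []
    start = 0
    for match in re.finditer(r"\.(?=\s)|\n", text):
        # random() is in [0.0, 1.0), so chance=1.0 always splits and 0.0 never does
        if rng.random() < chance:
            piece = text[start:match.end()].strip()
            if piece:
                chunks.append(piece)
            start = match.end()
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks
```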
working 👀
working for you?
edge_tts is failing, because it calls some async function which apparently is not allowed while TGWUI is generating text
You're using that vits tts?
I'm getting very consistent results with alltalk, at least
Is there no rhyme or reason for the bits that aren't working?
If you could walk me through that extension I could try reproducing it
very ez
https://github.com/Artrajz/vits-simple-api follow steps
models provided by https://github.com/Plachtaa/VITS-fast-fine-tuning
this: https://huggingface.co/spaces/Plachta/VITS-Umamusume-voice-synthesizer
use my fork https://github.com/marcos33998/vits_api_tts-marcos (change the name)
A simple VITS HTTP API, developed by extending Moegoe with additional features. - Artrajz/vits-simple-api
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion - Plachtaa/VITS-fast-fine-tuning
i remember that some dependencies can't be installed automatically, so you have to install them manually, but i think you won't have problems
these are what you need, they are under configs folder and pretrained models
ah yes,
because there are references to itself
An hmessage contains its parent
and history contains the hmessages.
That way they can reference eachother.
hmmm, now that I say that out loud I believe there's a bug with the history copy method.
It should create a new timeline and each hmessage item should point to the new history, not old.
i'll write that down on my todo
For what I was doing, which was making a copy of history... I just used copy.copy() (like originally), then set its _items to []
things are working fine now, may be some other stuff not working 100%
try history.fresh() see if that fits your usecase
it also sets _items to []
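A sketch of a copy that starts a new timeline, where each copied message points at the new History instead of the old one. It avoids copy.deepcopy entirely; the classes are simplified stand-ins, and copy_timeline is a hypothetical name:

```python
import copy
from typing import Optional


class HMessage:
    def __init__(self, text: str, history: Optional["History"] = None):
        self.text = text
        self.history = history  # back-reference to the owning History


class History:
    def __init__(self):
        self._items: list[HMessage] = []

    def append(self, hmsg: HMessage) -> None:
        hmsg.history = self  # every message points at the History that holds it
        self._items.append(hmsg)

    def fresh(self) -> "History":
        """Shallow copy of the History's other attributes, but with no messages."""
        new = copy.copy(self)
        new._items = []
        return new

    def copy_timeline(self) -> "History":
        """New timeline: copied messages whose back-reference is the new History."""
        new = self.fresh()
        for hmsg in self._items:
            new.append(copy.copy(hmsg))  # append() re-points .history at `new`
        return new
```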
and clears one of the other attrs
I’m using the microphone so this might not come out good
Now that I have a custom version of chatbot_wrapper, I don’t think we need to use the custom async-for-partial thingy, and can instead just make it an asynchronous function
I’m wondering if the inconsistent results from the TTS streaming I have is due to the method that we are making it asynchronous
Can always leave that for another commit as it wont change behavior too much ^-^
I'll have to check it out, will be free over the next week
I might need some kind of thread lock and unlock thing added to it I don’t know
if the tts code is inside a loop.run_in_executor call
it's in another thread and shouldn't affect the bot
what kind of bug are you getting?
It’s outside 🤓
Trying not to move all my text chunking logic inside that function (I am processing TTS every time text gets chunked)
But starting to look like I may have to
Er, actually it is still in the run_in_executor block
Use a generator that iterates on the output chunks; that way you can keep them as separate functions, but one is technically inside the other
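A minimal sketch of that pattern: a blocking generator is advanced inside loop.run_in_executor, and each chunk is passed to a blocking TTS call in the executor too, so the event loop stays free. llm_chunks and blocking_tts are hypothetical stand-ins for the real generation and TTS code:

```python
import asyncio
import time
from typing import Iterator


def llm_chunks() -> Iterator[str]:
    """Stand-in for a blocking generator that yields text chunks."""
    for chunk in ["Hello there.", "How are you?"]:
        time.sleep(0.01)  # pretend to generate tokens
        yield chunk


def blocking_tts(chunk: str) -> str:
    time.sleep(0.01)  # pretend to synthesize audio
    return f"{len(chunk)}.wav"


async def stream_with_tts() -> list[str]:
    loop = asyncio.get_running_loop()
    gen = llm_chunks()
    paths = []
    while True:
        # next(gen, None) runs in a worker thread; None signals the generator is done
        chunk = await loop.run_in_executor(None, next, gen, None)
        if chunk is None:
            break
        paths.append(await loop.run_in_executor(None, blocking_tts, chunk))
    return paths
```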
but tts probably doesn't need to know about previous iterations
When you get an opportunity to look at it it will make much more sense
Okie!
I’m making TGWUI do something it doesn’t want to, which is run apply_extensions() at times it usually doesn’t
i thought there was some kind of cooldown but no???
i thought there was a limit on messages but no???
i thought it disabled the tts after a while but no???
i thought i had to manually generate speech but no???
what?
Everything I said about TTS streaming not working before, are out the window now
This new idea that popped into my head the other day is definitely working
Just need to smooth out some of the wrinkles
I know you don't like alltalk but for a proof of concept you could try it and see that is at least working 100%
That author is extremely dedicated and constantly improving their code, so their efficiency and code structure can probably be credited for why this technique is working well for it
I think the solution I came up with could be further tweaked to make it work consistently for all TTS extensions
🤯
vits is not working at 100%, it is like 90%, if i don't count the problem that the tts stops working
i suspect that the api call has a rate limit and the limit disappears after a manual generation so might not be your fault
I do have some wierd bug that I can't quite put my finger on, where it is uploading a file twice
What's odd about it is that the function that uploads TTS files like shown here, also sends them to voice channel.
But, this voice clip is not playing twice on the voice channel
It's very odd that this clip uploads twice
after 50+ tries it does
just very rare
probably something to do with the custom code I added to make edge_tts work 😛
because of that one model output being a bugged format
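import io
from discord import File
from pydub import AudioSegment

with io.BytesIO() as buffer:
    if file.endswith('wav'):
        audio = AudioSegment.from_wav(file)
    elif file.endswith('mp3'):
        audio = AudioSegment.from_mp3(file)
    else:
        log.error('TTS generated unsupported file format: %s', file)
        return  # otherwise 'audio' would be undefined below
    audio.export(buffer, format="mp3", bitrate=f"{bit_rate}k")
    buffer.seek(0)  # rewind so File() reads from the start
    mp3_file = File(buffer, filename=mp3_filename)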
need to have chatgpt review
i think it's some kind of protection in vits-simple-api, since i run the tts and llm on different devices...
vits-simple-api has a config file, and i wonder why it wasn't using the gpu
Ok, also found a flaw in my logic
Could account for the 10% of the time it's not working for you
Currently it is only triggering a TTS if the text response gets split at least once
testing my fix
and also found the issue causing the extra file upload
man I'm sloppy :p
And yes my fix worked for the no-split replies
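One plausible shape of that bug, as a hypothetical sketch (not the bot's actual code). If TTS only fires when a split happens, a short reply that never hits a split point is never spoken. Flushing whatever text is left when the stream ends covers the no-split case.

```python
from typing import Callable, Iterable


def stream_reply(tokens: Iterable[str], on_chunk: Callable[[str], None],
                 split_on=('.', '\n'), flush_remainder: bool = True) -> None:
    """Call on_chunk (e.g. a TTS trigger) for every chunk of a streamed reply."""
    pending = ''
    for token in tokens:
        pending += token
        if pending.endswith(split_on):
            on_chunk(pending)
            pending = ''
    # Without this final flush, a reply that never splits never reaches TTS
    if flush_remainder and pending:
        on_chunk(pending)


if __name__ == "__main__":
    spoken = []
    stream_reply(["Hello", " there"], spoken.append, flush_remainder=False)
    print(spoken)  # [] -> the no-split reply was never spoken
    stream_reply(["Hello", " there"], spoken.append)
    print(spoken)  # ['Hello there']
```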
Pushed the fixes to main
Now hopefully you have 100% success with that
wanna hear a small detail that could be improved?
ofc you want
the message after resetting the conversation has no tts :3
Oh, the greeting
Could make an option for that
Sure, I'll do it
Glad you asked because I found some flawed logic in reset conversation
:)
The command queues a task... but it was actually replacing history with a fresh copy immediately before queueing the task
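A hypothetical sketch of the fix. The class and method names are made up, and the real bot's queue is surely different. The idea is to do the reset inside the queued job, so anything already waiting in the queue still sees the history it was queued against.

```python
import asyncio
import contextlib


class Bot:
    """Hypothetical bot with a single task queue processed in order."""

    def __init__(self):
        self.history = ["Hi!", "Hello, how can I help?"]
        self.queue = asyncio.Queue()
        self.log = []

    async def worker(self):
        while True:
            job = await self.queue.get()
            try:
                await job()
            finally:
                self.queue.task_done()

    async def enqueue_reply(self, prompt: str):
        async def job():
            # Records how much history this queued reply could see
            self.log.append(("reply", prompt, len(self.history)))
        await self.queue.put(job)

    async def enqueue_reset(self):
        async def job():
            # The reset happens when the task runs, not when it is queued
            self.history = []
            self.log.append(("reset", len(self.history)))
        await self.queue.put(job)


async def demo():
    bot = Bot()
    worker = asyncio.create_task(bot.worker())
    await bot.enqueue_reply("continue")  # queued before the reset
    await bot.enqueue_reset()
    await bot.queue.join()
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return bot.log


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [('reply', 'continue', 2), ('reset', 0)]
```

If the history were replaced before `enqueue_reset` queued its job, the earlier reply job would see an empty history instead of 2 entries.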
Also noticed this which is wrong... should be text not text_visible
testing if new setting works...
eyyy also found a bug in continue() while adding this
Pushed the fixes and the new TTS Greeting setting to main
Was able to use the existing speak_task() function
?
Showing that the greeting message is producing TTS
thought that was showing the bug
The bug was that at the end of the continue function it was processing any TTS response.
But, it already processed all TTS responses by that point
would pretty much just replay the last one
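A hypothetical sketch of avoiding that replay. It remembers how many TTS responses were already handled, so a final "process pending TTS" step only touches the ones after that index. The names are illustrative only.

```python
from typing import List


class TTSQueue:
    """Hypothetical tracker so a final pass doesn't replay handled TTS clips."""

    def __init__(self):
        self.handled = 0  # index of the first response not yet played
        self.played: List[str] = []

    def process_pending(self, responses: List[str]) -> None:
        for clip in responses[self.handled:]:
            self.played.append(clip)  # stand-in for playing/uploading the clip
        self.handled = len(responses)


if __name__ == "__main__":
    q = TTSQueue()
    responses = ["part1.wav", "part2.wav"]
    q.process_pending(responses)  # during streaming
    q.process_pending(responses)  # end of continue(): nothing new, nothing replayed
    print(q.played)  # ['part1.wav', 'part2.wav']
```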
nice
Seeing if I can improve the message chunking detection...
Mainly, to allow \n\n (double linebreak) to be detected and given more weight.
Current code triggers on \n or . before having an opportunity to consider \n\n
I had given up on it back then, giving it another go now
Aaaand I'm giving up again. So complicated
Unfortunately no. The reply streaming would definitely be of higher caliber, but I've already got so many variables involved, and then there's making the shorter triggers get temporarily ignored, factoring in a match offset, blah blah… I can't seem to make it work right
Well, I withdraw my previous gif
chunking detection would be hard to improve
keeping it random gets you 80% of the way there
i don't think that's a high priority if you can't do it easily
let me try something...
What I was aiming for was this kind of logic:
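chunk_syntax = ['\n\n', '\n', '.']  # ordered from strongest to weakest

def should_chunk(matched_syntax, chance):
    if matched_syntax == '\n\n':
        chance_to_chunk = chance * 1.5  # favor paragraph breaks
    elif matched_syntax == '\n':
        chance_to_chunk = chance  # single newline keeps the base chance
    elif matched_syntax == '.':
        chance_to_chunk = chance * 0.5  # sentence ends split less often
    else:
        return False
    return check_probability(chance_to_chunk)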
But it is very complicated to try checking for \n\n because \n or . trigger before it can happen.
When it rolls probability for a trigger, I make it ignore that text for future checks.
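One way around the "\n or . triggers first" problem, as a sketch under assumptions. The weights mirror the pseudocode above, `roll` stands in for `check_probability`, and every other name is hypothetical. The approach treats a whole run of `.`/`\n` characters as one boundary and doesn't roll on it until a later character shows the run is finished. It then classifies the run by its strongest component and splits at the end of the run. A `checked` offset keeps already-rolled boundaries from being rolled again.

```python
import re
from typing import Callable, Iterable, Iterator

# Weights relative to the base chance, mirroring the pseudocode above
WEIGHTS = {'\n\n': 1.5, '\n': 1.0, '.': 0.5}
SEP_RUN = re.compile(r'[.\n]+')  # a run like ".", "\n", ".\n\n"


def classify(run: str) -> str:
    """Label a separator run by its strongest component."""
    if '\n\n' in run:
        return '\n\n'
    if '\n' in run:
        return '\n'
    return '.'


def chunk_stream(tokens: Iterable[str], chance: float,
                 roll: Callable[[float], bool]) -> Iterator[str]:
    pending = ''  # text not yet emitted as a chunk
    checked = 0   # boundaries before this index were already rolled
    for token in tokens:
        pending += token
        for run in SEP_RUN.finditer(pending, checked):
            if run.end() == len(pending):
                break  # run touches the end: the next token may extend it
            checked = run.end()  # never roll this boundary again
            if roll(chance * WEIGHTS[classify(run.group())]):
                yield pending[:run.end()]
                pending = pending[run.end():]
                checked = 0
                break
    if pending:
        yield pending  # whatever is left when the stream ends


if __name__ == "__main__":
    import random
    tokens = ["Hey here's some text that is generati", "ng", ".\n", "\nN",
              "ow here's a new paragraph."]
    for chunk in chunk_stream(tokens, chance=0.5, roll=lambda p: random.random() < p):
        print(repr(chunk))
```

Waiting one extra token before rolling is what lets ".\n" grow into ".\n\n" before a decision is made. The cost is that each split lags the stream by about one token.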
you mean
this and
this?
Yes.
LLMs like to use double newlines like this
Wish I could add more weight to make it split on double newlines, compared to single newlines
what does it look like now, and what do you want it to look like?
The third code block it gave actually worked for detection, despite me not really understanding the logic of its solution.
Then I couldn't get it to actually split at the right place in the text.
Hey here's some text that is generating.\n\nNow here's a new paragraph.
Because the tokens being generated are like "ng", ".\n", "\nN", "ow",... the code is not checking via .endswith(syntax) because it could never match \n\n
The code is instead checking a 4 character window to see if \n\n is anywhere within it
i understand why it is not working now
hat , t is , is g, gen, etc etc
but what do you want it to do
So eventually it will find either .\n\nN or g.\n\n or \n\nNo
like how should the results be
All of which are true
so then it needs to offset the text splitting based on where it matched in the window
But I also have 2 other variables tracking the text chunking
1 which stores all text that has already been chunked
Another which holds the text it is currently figuring out where to chunk
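A hypothetical helper showing the offset arithmetic, using the same two-variable idea: `chunked` holds text already chunked, and `pending` holds text still being decided. It searches only the last few characters, then translates the match position in the window back to a position in `pending` before splitting.

```python
from typing import Tuple


def split_at_window_match(chunked: str, pending: str, syntax: str = '\n\n',
                          window_size: int = 4) -> Tuple[str, str, bool]:
    """If `syntax` appears in the last `window_size` chars of `pending`,
    move everything up to and including it into `chunked`."""
    window_start = max(len(pending) - window_size, 0)
    window = pending[window_start:]
    pos = window.find(syntax)
    if pos == -1:
        return chunked, pending, False
    # Translate the in-window match position to an index in `pending`
    split_at = window_start + pos + len(syntax)
    return chunked + pending[:split_at], pending[split_at:], True


if __name__ == "__main__":
    text = "Hey here's some text that is generating.\n\nN"
    print(split_at_window_match('', text))
```

Whichever window position matches (".\n\nN", "g.\n\n" or "\n\nNo"), the split lands right after the \n\n.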
It doesn't matter if it splits before or after "\n\n" because Discord clears the whitespace
it's catching the little bits before and after
due to the "sliding window" of text it is analyzing
is this what you want it to do?
2linebreak| --> 2linebreak(message 1)
| 2linebreak(message 2)
2linebreak|