#ad_discordbot (Fork of Fork of xNul's bot)
1 messages ยท Page 14 of 1
With that resolved I can see my delayed timing is good but I'm off the mark slightly, need to tweak a bit
hmm!
Most of this is working and it's very cool
Awesome!
might be perfect actually... just noticed it wasn't factoring the typing speed because I didn't update the method for setting the tokens generated
Yes - all working!
Need to tweak the values/weights for the random delay selections.
Then see about possibly continuing to respond to multiple messages
Revised database last_msg methods so it stores both last user msg and last bot msg.
It can get the most recent for each in a channel, or the most recent from either
She seems nice and not like a mega bitch
ayo thats cool
I'm pushing a very nice update in the next hour or so, just checking over a last few details
This update will make all the recent new behavior settings I released... actually work XD
just one thing can the bot take multiple messages as one input
That's planned
as a person sometimes you want to correct yourself or say a lot in many messages
And coming soon
example of one thing but in several messages
where is the update log ๐
my solution to that would be to generate smaller batches of text (like 5-10 tokens) if the user has typed recently.
Similar to my voice proof of concept.
But after the user has stopped sending messages/typing for maybe 0.5s, the bot will generate a full 256 tokens (or whatever you set your max to)
This way it can update what it's saying as your messages come in
i think that slow responses are not so bad as humans takes time at typing
yes, this is to mimic how a human might respond mid sentence without hearing your message out fully, but they also have something to say so far
I have 3 checkpoints before sending the response:
1 - when user message is received, it sets an initial response delay if configured.
2 - After the text is generated, it checks to see if the bot is now online, if so it removes the initial response delay and updates is typing..., "come online" and send message timing.
3 - When the message is just about to send it checks the timing one more time.
I'll be able to work in extra behaviors like continuing if another message drops in the channel (same user / maybe other user)
Maybe listen for Message is edited that it is responding to, cancel and rewrite
the "is typing..." is brilliant
oh good point, if the bot is "idle"
you can skip the whole generating slower.
Since we are not trying to return a reply as fast as possible, we can just wait for the idle time to finish, or the user to finish sending their message before gathering up all the text so far and working with that
I made an IsTyping() class which there is an instance of for each main Task the bot does (ones with no typing is None).
There's 3 delays that are possibly factored depending on the user config.
- response delay (delay bot has if "idle" - applies only to first message when idle)
- Read text delay (
msg_size_affects_delaysetting) - Writing delay (
maximum_typing_speedsetting)
The is typing... occurs after the response / read text delay
The bot will only go idle is the responsiveness setting is less than 1.0 (maximum)
many developers want quick responses in bots (voice), but to imitate a person we can add some random sounds like uhhhhh or eeee, which makes the bot feel more real for me ๐
Any non-message task will instantly wake the bot though
That can be done in the character context ๐
All of the delays are weighed by responsiveness (except writing speed).
It also affects how quickly the bot tends to go idle
oh wait i have another idea
can you cut responses?
something like instead of making a tts at the end, after a comma generate a parcial tts
Not easily done with my current implementation (relies on TGWUI integration for the TTS)
I don't think it can stream TTS responses
what about divide the reply in many messages sent in discord
Yes - but dividing the one audio file it generates is the issue with what you mentioned ๐
What you suggested (and Reality previously suggested) will come as another setting in the not-too-distant future
When they meant was sending smaller chunks to TTS to generate
one of the suggestion is to divide the tts to speed up a little the response
(Chance to send partial response after period / newline / etc)
Also, XTTS sometimes hallucinates words if your text is too short (like 1-3 words)
For current bot method of using TGWUI, this is essentially dividing the user prompt to get divided audio responses
another one is just about feeling
oh, I see I see, you're not directly triggering tts to run
For the way the bot works, in theory any TTS extension that works in TGWUI should also work in the bot
if you have the ability to use TTS streaming, I can share code to stream audio to discord (different from sending a audio file)
which is why I don't really want to commit to any particular TTS API, and I'll be damned before I make support for multiple TTS APIs on demand
unless they were kind of standardized like SD Forge / SD A1111
thats is another point offf
so far i'm aware of alltalktts api.
are there others?
Alltalk uses the TTS library iirc, that library supports many backends like XTTS, Vits?, and more.
there are too many tts and makes me crazy
my brain tells me that vits is good but never used it xd
I don't have the time to look into it XD
And I hardly ever use the TTS myself in the first place (same with these new behaviors) - I can only dedicate so much time to aspects I don't even use haha
I think our TTS handling is pretty slick already
I've probably put like 70 hours into these behaviors, no exaggeration.
If things were busy at my real job I'd have simply not done it
i still remember a long time a go, i just plugged https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md into edge tts, using a virtual audio cable ๐คฉ and had a lot of fun
Yea, that's something I want to expirement with next.
Passing my finetuned tts output through a finetuned RVC to get a voice clone sounding even more accurate!
I wonder if it has a websocket api rather than using mic inputs.
i dont know about technical thing but for some reason a have lots of virtual audio cables
๐
I'm working on small API server wrappers for TTS and Whisper STT
for real time processing. ^^
Yea, the project is basically a voice changer
i wanted to plug this into the discord bot https://github.com/BuffMcBigHuge/text-generation-webui-edge-tts
but i cant even run it normally ;-;
Personally, I would use standalone projects and connect them via api
Reason being, when you install requirements for AI stuff
it's rare that they perfectly match
TTS lib might want one version of torch, so it uninstalls the existing one
then your textgen no longer works, or smething else breaks
for me it was a terrible experience with Stable diffusion and massive addons
Compiling issues?
I mean, does Edge TTS install properly in the first place?
not talking about the discord bot right now
With the bot - like I said, it currently should be able to use anything that TGWUI supports
(extension support)
PS: also with APIs we could also split up the workload across multiple machines @halcyon quarry
cant find extension ._ .
From the 3 or 4 different extensions I had tested, they used uniform settings keys
(for the most part)
the official web is a little bit dead
updated 1year ago
most of them
so i really dont know if the problem is on me or the new tgwui
@terse folio As it turned out, that bit I shared yesterday self.send_message instead of self.send_message() - is part of what caused my confusion with idle cancellation.
The way I had it coded was actually fine
The only time it was being cancelled, was if the response was not delayed (skipped message_manager() queueing and un-queueing it)
?
If its a TGWUI thing you're tinkering with, be sure you've activated the TGWUI venv
if TGWUI is using numpy 2.0.1 then... you may need to do some custom stuff
its me trying to install requirements of
cant i just have 2 numpy ๐
and where can i find help
how do i check the version of my numpy?
pip list?
gpt says
pip uninstall numpy
pip install "numpy<2"
pip install your_module # Replace 'your_module' with the name of the module you're trying to install
it did nothing ;-;
๐คทโโ๏ธ
installing yourmodule will uninstall numpy2 if it requires numpy1
this is a dpendancy issue.
there should be a flag that doesn't uninstall current versions of libraries
now i just hope everything goes as expected
didnt breake anything error at loading the extension .-.
alr
broke tgwui
broke alltts
didnt work ;-;
I finally got around to setting up an โauto-promptingโ image generator character, using the Spontaneous Messaging feature paired with the Dynamic Prompting feature
With the auto-change imgmodels feature, will have lots of interesting output
I would VERY much appreciate it if any one you jokers could try out some of those new behavior settings and let me know any feedback ๐ค
@terse folio when it comes to Continuing, to respond to multiple user messages in one reply - unsure the proper way to manage HMessages for that scenario
I know how to make it make sense for responding to multiple messages from same user via Continuing
which part?
the reply_for attribute?
there could be a GroupedHMessage class.
But other logic in the bot would have to know how to handle that
Well usually thereโs one user message - then one bot message that makes anothrt bot message marked as a coninuation
the messages should contain which ones were used to make up the reply
at the moment the bot is set up for single pairs
Hereโs a thoughtโฆ probably what you are getting at nowโฆ maybe it would still make 2 separate bot messages for 2 user messages, but it will slice the continued response for HMessage assignment
in the part of the bot where you reply to multiple messages, the "reply_for" can be a groupedHMessage (Soonโข๏ธ) so later when running continue we can use that grouped message to figure out the prompt
sure that might simplify some things by only having one branch to follow.
But now you need to somehow differenciate between the 1+ messages that share the same id
Could probably filter it by user when searching for the item - specifically when it yields multiple matches
What would be really cool if anyone actually ends up using these super cool features XD
Yea!
This should all be very interesting in a busy server / multiple servers
Finally made a page for Variables, too
https://github.com/altoiddealer/ad_discordbot/wiki/Variables
well, I have a model loaded, but it keeps saying its not
23:17:51-686328 INFO llama.cpp weights detected: "models/llama-2-7b-chat.Q4_K_M.gguf"
23:17:51.687 #444 ERROR [bot.__main__]: An error occurred while loading LLM Model: 'NoneType' object has no attribute 'Llama'
23:17:51.961 #3527 INFO [bot.__main__]: LLM model changed to: llama-2-7b-chat.Q4_K_M.gguf
23:18:11.985 #4027 INFO [bot.__main__]: Processing message #1 by CygnusXI.
23:18:11-994739 ERROR No model is loaded! Select one in the Model tab.
23:18:11.996 #1237 ERROR [bot.__main__]: Error matching tags: expected string or bytes-like object```
I think there's an issue with your model An error occurred while loading LLM Model: 'NoneType' object has no attribute 'Llama'
if you open tgwui in the browser, does it generate text there?
yes
im downloading a different model to try right now
test the model outside of the bot (running tgwui normally)
it's likely a tgwui issue
Some issues say they fixed it by using slightly older versions
The bot has its own separate CMD Flags, but itโs not needed for model settings so long as you save your model configs in the TGWUI model loader window. (Is needed to load a default model / skip model menu on startup)
The bot does not use the TGWUI API, btw. It launches its own instance of TGWUI
main reason for this is TTS handling. The API will not return the tts response, but TGWUI has an internal function (the bot imports) which returns both the text response and tts response
I also monkeypatch the extension loader to allow updating params during runtime (tts voices, narrator, etc)
I mentioned a week or so ago that there was some odd issue happening with "Tag Trumping".
Finally now just figured that out. Made a really dumb simple mistake when updating the code there, causing the "Unmatched tags" to remain unchanged after the first round of tag matching
buddy
Its in the wiki ๐ You have wrong URL
then update that
Fixed it
There's a shorthand url for linking across wiki articles - apparently that doesn't work from linking Readme to wiki article
erm
yeah... wtf
It is actually fixed now
I pushed an update a few minutes ago that makes Spontaneous Messages behave more like regular discord message requests.
Mainly, so they can get the effect of maximum_typing_speed. They ignore all other delay types (response delay, reading message text, etc)
After some more chatting and changing settings, I decided it doesn't make sense to truncate the typing_speed depending on the max_reply_delay value. Now, that only truncates the initial response delay / "read text" delay
Im trying my install again now. I think maybe I messed up the folder structure, it looks like the contents of the github should be dropping in the text-gen-ui folder without creating a subfolder for all of it?
My install instructions are pretty updated, if you just follow them step by step you shouldn't have any trouble
Before install:
\text-generation-webui\
After install:
\text-generation-webui\ad_discordbot\bot.py, CMD FLAGs, etc etc
Open the cmd prompt in the TGWUI directory then git clone https://github.com/altoiddealer/ad_discordbot
ok, thats what I did. Im currently trying this on WSL, and ended up doing a few things manually since the wsl.bat doesnt work with the WSL UNC path
Well! This is a unique opportunity for me if you don't mind
My WSL bat files, like the others, were written with help from ChatGPT.
Never had anyone to actually test the WSL
the wsl install / update files should just be a .sh file
Ah, I probably got the correct code from chatgpt just saved to wrong format
Ive found a few other projects that do the same thing lol, I guess GPT just says hey heres a bat file for WSL every time its asked ๐คฃ
Well I probably just copy/pasted it then saved it wrong
jokes aside โค๏ธ GPT
So the scripts work if just changing the format from .bat > .sh?
I just read the bat files in an editor to find what items to install
will try that now
update_wsl.sh: 1: @echo: not found
update_wsl.sh: 3: REM: not found
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 25 (delta 14), reused 14 (delta 8), pack-reused 0
Unpacking objects: 100% (25/25), 106.83 KiB | 848.00 KiB/s, done.
From https://github.com/altoiddealer/ad_discordbot
4dc2eed..537089c main -> origin/main
94e97e0..16ec975 dev -> origin/dev
Updating 4dc2eed..537089c
Fast-forward
README.md | 2 +-
bot.py | 124 +++++++++++++++++++++++++++++++++++++++---------------------------------------
settings_templates/dict_tags.yaml | 6 ++--
3 files changed, 66 insertions(+), 66 deletions(-)
update_wsl.sh: 5: Syntax error: "(" unexpected (expecting "then")```
Linux doesn't really care about filetypes (at least commandline)
You just set files as executable and run them with ./file.sh
it doesn't even need an extension!
chmod +x file to make executable (x for executable flag)
Ill try something else...
See if you can just run the .bat file like Reality suggests
and lmk if it errors
then I can consult the great and powerful chatgpt
Or Reality can just fix it XD
you can use a shebang I think it's called, at the start of your shell file to specify what type it is
it's like !# something something
i forgot
still giving the syntax error
ahaha, github runs on addiscordbot!
as it also ignored the "text like this"
I have the fix ๐
afk for a little bit need to run and do a quick task
did a sanity check just to make sure, yup, running an .bat file works (with executable flag) just fine even though thats a windows file type
wonder why TGWUI uses bat by default for that... start / update scripts
are you using WSL?
I'm not, but I have an ubuntu VM I use to host discord bots
there in is the difference, I get the following trying to run the bat file
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.
fatal: not a git repository (or any of the parent directories): .git
Failed to pull from the repository.```
theres work around for that, but the sh file I made will be user friendly for WSL
I should clarify, I'm not running ad_disordbot on there,
just my own projects for various utils.
The server has no GPU anyway.
These WSL scripts haven't been tested yet as @halcyon quarry mentioned
thats with chmod +x from wsl
no worries, I know how it goes, I will help fix these ๐
that's interesting, will have to test that out later.
I expected wsl to just be running a vm on windows
yes! I have it working โ ๐
Im excited to use this bot setup, by far the best feature list Ive seen on a project like this
awesome, tested and working nicely. Will make a pull request to add these .sh files, itll make life easier for the next person using WSL
I've been very dedicated to this for well over a year now, just adding in whatever features come to mind
What I dubbed the "Tags system" was the most impressive feature for some time, until Reality came around and implemented their history management, which was absolutely massive
Ill be diving into those features asap. I was just about to make my own short and long term memory system from scratch, so this is exactly what I needed. I will definitely stick around and help wherever I can with development and testing.
Just came across a very interesting bug
I used hide or reveal history on a bot reply which came from a Spontaneous Message prompt (silent message not in Discord).
And the bot reacted to its reply with the hidden emoji... as well as my most recent message, which was not the prompt!
I see... our history manager is currently assigning the same ID to the spontaneous message message
It is actually toggling the correct message as hidden, while reacting to the other message
is there a way to lock down certain menu options to admins only?
Yes, I need to write a wiki on that ๐
If you go into your Server settings > Integrations > Your bot - you can restrict all commands to users/roles
perfect thanks!
I believe you can also restrict command permissions on lower levels like, Category, channel
I'm testing whether I can just assign a random number as the ID for Spontaneous messages ๐ค
Yep, seems to do the trick. Just needed to add an exception for fetch_message() when its given a bogus ID
use the message Id from send_message if you can
spontaneous messages are sent as seperate discord messages right?
It uses an internal prompt to make the bot reply in Discord with its response
I think it makes sense to keep that an internal prompt in history
It could be marked as internal yes
a lil off topic but check this out https://github.com/2noise/ChatTTS?tab=readme-ov-file
I mean, to keep the internal prompt and the bot response both as visible
So many functions in the message loop want the interaction, username, channel - the Spontaneous Message feature (and Flows feature) just hangs on to the interaction that triggered it
Including the message ID - but for scenarios where the prompt has nothing to do with the original message, need to log a unique ID (not discord msg ID) or just None ID
The HMessage.id referred to the Id of the message on discord,
If theres no discord message, it can be left as None
I might be missing something
Nah you got it all ๐
have you guys ever tried putting .com or .net after your names and try to access them?
Now thatโs a suspicious link
at least i can order a pizza
Enjoy authentic Italian quality pizza. Dough made fresh every day. Sauce from the original Giammarco recipe. Order online for delivery or carry-out. Find locations near you!
guy this is what he promised
this is what i got
I haven't gotten that extension to work in a while
i followed @halcyon quarry this guy's advise and asked gpt
did this and got a lot of new errors but as tgwui can launch i just ignored all of them
literally didnt thought it was just delete a word and done
ChatGPT has cracked some very tough nuts for me
didnt learn codes, but delete a word is doable for me :)
I did all the delayed message timing in my head mainly, idk how the heck I did it.
I think I'm slightly off with the typing timing, finally now illustrating this logic
buddy you said "this this"
Crunching numbers is making me loopy
When it receives the user message, it picks a response delay and schedules typing based on that
Sorry I need to like write this out also lol
I'm not scheduling typing until it's unqueued actually... because can't know how long it is queued
ok so in this scenario, the send time will be 120. I think there's a certain situation I'm currently overlooking in regards to the seconds to write
I just use average tokens per word ๐
I use TGWUI count_tokens() on the output, then divide it by 4 for the approximate word count
Then factor the value of typing speed (value is UOM words per minute)
you know that am so ----- and in a 40~ token reply i counted 23 words?
I am calculating this bit accurately. The slippery part is the logic for scheduling everything
It's going to be less accurate for smaller responses
i feel like each 2 token is a word
or I should say, more volatile
google it is says 4. I think all the little stuff adds up like parenthesis periods commas that crap
i think that in a conversation is not very human to type something like:
zdfgsdhzfikugh asoiufhg osuhfdgohsdofuhg olsduhdfg ouisdhdfoug h szdloiufgh loszddfhgo szdh floghszdzlofgoi h jdfhg lkjdfhglk dflkg hdlfk ghldkfgh lkdfjg lkjfglkdjh flk ghdlkfhg lkdhfglk hdfli gldkfjg lkidfjg lkdjh f ggjdflk ghjldkf jglkdf jglkdf ldkj f glkdj f glkjddf gdf gd f fgdf gdf gdf gdf g dfg dfg srety ghsdfghsdxzfgsdtyhusr tysdjfkglg ho
lemme count how many words are them
Just copy/paste the output into a website with word counter
I've done it before, the 4 tokens average per word is pretty accurate
i think ill never type something that long in normal scenarios
ill test it again ๐ซก
maybe i shouldnt
test 1: 51token 25words
I define typing speed by characters, 7char/s worked for me in old bots ^^
things like emojis could be given longer times, but for ascii characters you find on your keyboard, those can by typed fast
test 2: 42token 23word 1.82ratio
test 3: 66 39 1.69
test 4: 58 36 1.6
test 5: 50 29 1.72
test 6: 118 82 1.43
feels like with shorter replies is 2token/word
and longer replies is getting closer to 1
i think that taking 2 as average is fine
I'm going with 1 token = 0.75 words
as short replies are short so still can be fast, and long replies adjust to the real values, feels fair
maybe i was wrong .-.
if bot_behavior.maximum_typing_speed > 0 and self.last_tokens is not None:
words_generated = self.last_tokens*0.75
words_per_second = bot_behavior.maximum_typing_speed / 60
# update seconds_to_write, increase delay
seconds_to_write = (words_generated / words_per_second)
self.seconds_to_write = max(seconds_to_write, self.llm_gen_time)
idk
checked to see if there are any more updated forks?
?
for complex memory
the original is dead as hell
i can check its forks?
how?
i thought i only can do this
the little down arrow should list other forks
ah okay
yea, I see now, there's a "forks" button lower down on the page that would lead to what Altoiddealer shared
๐๏ธ ๐ ๐๏ธ
never thought it was clickable
and here ^-^
for any future forking business
i still remember this https://github.com/BuffMcBigHuge/text-generation-webui-edge-tts
it broke everything i had to reinstall everything
oh interesting, complex memory looks like a tags (different tags) system that was talked about here, not sure if it was the same person but again cool idea!
god blessed me
which one?
it is a keyword based memory, what i understand is that if i mention a word in the list it will add it to context
If you get the extension working in TGWUI, it should also work for the bot
so speeds up the generation...?
that is what i wanted
and edge tts bcs its better
xd
my only concern is that i don't know where does it save the memory ๐๏ธ ๐ ๐๏ธ
I'm assuming you can update its values from the extension folder (not required via the UI)
gonna ask magic conch
It's burried somewhere above,
All I remember was talks about someone named john, the mention of his name would trigger a fact to be put in context, that he doesn't like potatoes
gonna inspire a little
?
That was after the conversation where we were trying to automate fact extraction
Not sure if Fire was making an addon with it or something
Hmmm
uhhhhhhhhh
i found nothing
im pretty sure that im gonna lose everything after a reboot
ill take a look
i think that the first look i had in that extension was everything is loaded in context
but for that i can just add them in the character card
it will save to extensions/complex_memory/saved_memories.yaml
or characters/{character}.yaml
probably in multiple files, I see mentions of .pkl (which is a way of storing python classes/structures)
are you using the fork you shared, or the original?
that fork changed the json to use yaml
original
didnt changed yet
characters/{character}.yaml smells like a mess
bcs the character card has the same name
idk
ah, good point
maybe it's worth forking it and adding a _memories on the end of that file name
I think this is what I was missing from my message logic
update_timimg() is called immediately after LLM Gen
the bot may or may not have already began typing at this time/
response_time is the predetermined timestamp when the message is first received
but the message will be unqueued and processed before or after that time
the typing task is initiated when it is unqueued, and schedules typing for response_time
should have that attribute
Whole lot of maths O:
brain is killin me
if you see the bug, congratulation
are you running on ancient tgwui?
check url
gonna do a backup ._ .
where do i add
i knew it
XD
im smart
uhhhhh
didnt work
should use an IDE like vscode.
Some times windows notepad saves files with some different encoding that breaks some languages.
I think shell script.
Not sure how python handles it.
but you'll get a "invalid character at X" error
what happens if you create a memory and save it
i used to find json why i think that you all like yaml?
no clue, yaml is probably easier to edit for the average person.
json is strict with opening and closing parenthesises, and if you add an extra comma it will also complain.
is there a traceback?
missing json file, therefore it imported "nothing" instead of the expected dict.
The code should read data = json.load(f, {})
nvm, that's wrong
data = json.load(f) or {}
around like 110
I guess json.load() doesn't support defaults
crashed
add a or {} on the end of json.load(f)
that will set a default value, which will let the other code run
it will detect there are no memories
because it's empty
and do whatever it needs to do later
Likely
yea, it probably doesn't support a default arg either
so "or {}" is a simple way to do it
But you'll likely get a different error before then
since it's running code to open the file, which will error that a file doesn't exist
data = yaml.load(f, Loader=yaml.Loader)or {}?
or data = yaml.load(f, Loader=yaml.Loader) or {}
what they should do is have code to check if the file exists, then import it, and check if that imported data has any content
yes
Next, keyerror
.
actually thats not a big problem
but it didn't save it in extensions\complex_memory either
forgot that i dont have to fix it if i have more forks hehe
this looks so good
amazing
solved
its amazin
g
give him a star :) https://github.com/Imitationman/complex_memory
time to ๐ด
Awesome!
a logic question for all of you :), there are 8 batteries, 4 good and 4 bad, you have a flashlight, it only turns one with 2 good ones a test is considered as putting two batteries in and try to turn on the flashlight
how many tests are needed to turn on the flashlight in the worst case scenario?
explain your strat ^_^
Not quite on topic here
0 tests because I check the batteries with a voltmeter
what model do you guys use?
I'm running a 34B RP Merge model
life is hard
the 12th time reading this
what if i change the folder's name to edge_tts
right now its name is
success???
The bot skips extensions that are not already installed for TGWUI
Are those extensions installed for TGWUI?
ofc
openai is an extension that it checked by default
although I don't even know what it does
Iโll try to reproduce in an hour
just in case if you want to reproduce on these
https://github.com/Imitationman/complex_memory
https://github.com/Unorthodox-oddball/text-generation-webui-edge-tts
to lauch edge tts you have to rename the folder to "edge_tts"
Are you actually using complex memory with the bot?
trying now
ok so I misunderstood what the setup attribute was all about
I restored that bit, and it is loading complex memory. It may even be working...
Wanna try something?
In bot.py Ctrl+F to load_extensions
Update the bit regarding setup:
if hasattr(extension, "setup"):
extension.setup()
#log.warning(f'Extension "{name}" is hasattr "setup".')
#continue
complex memory is not going to work.
But the others might
complex memory could probably be tweaked to work with the bot, but as it is it seems to use some weird method...
add two #?
Yes to comment them out
add the line extension.setup()
same indent level
I may need to add a little bit of code to make EdgeTTS work. looking into that
feels like alltalk
with my 3am brain at 100% load
you have to add some code to edgetts's code
Need to put edge_tts in the config file
you gonna add support for edge_tts!?
yes working on it now
ok it errors because it returns an .mp3 file but the bot is always expecting a wav, I think
so just need to convert it to wav if mp3
it's likely you can just tell the bot to use the mp3
where is it expecting that?
we already use the mp3 for sending the file to discord/voice chat
hmm
discord voice client uses ffmpeg to convert the audio to whatever discord wants.
that will be fine importing mp3/wav/...
the problem that im encountering with the bot is that by default there is nothing selected
while using the ui you have to press the load button and select the voice, model and turn on the rvc
mine does nothing and no error
Did you add edge_tts in the config file?
no
well there's your main problem XD
you said that you made the bot non api bcs then you can get the audio file
funfact, pydub also uses ffmpeg, so it can read most formats as well
so what i understand is the file is what matters and not the config
maybe the config makes life easier bit that is for the future
@terse folio looks good?
By config, I mean the bot config file - config.yaml
It has a tts settings section
if from_mp3 is valid, yes that looks good
yep, pylance found it
complex memory is a little bit weird
on the web ui what supposed to be for the character is for general
and the bot has no reaction
...
Complex memory will not work with the bot
f
If it has parameters for TGWUI then yes
it gives the code that i have to replace in the edgetts script.py
a big chunk but not all, and no big changes
did ctrl+f "row" to know where it ends and pasted it
params = {
'activate': True,
'speaker': None,
'language': 'en',
'show_text': False,
'autoplay': False,
'rvc': False,
'rvc_model': None,
'transpose': 2,
'index_rate': 1,
'protect': 0.33
}
this is from script.py
Find my example character M1nty
gpt told me to change some of them
Don't change anything ๐
tgwui uses these params. the bot hijacks the extensions loader to change params on the fly
you just need to use a valid speaker value, in your character's extensions / edge_tts / dictionary
mine worked
just one problem
no commando to toggle tts
and didnt join channel
ill leave it for you ๐
command is /set_server_voice_channel
currently you need to paste the voice channel ID
need to update it to provide a list of valid channels to choose from - OR, Reality can just merge their update Reality said they already updated it
reload the bot
Make sure your character has the setting enabled use_voice_channel
maybe its because of my gpt method ._ .
really there is no way to make complex memory work? ๐ญ
i thought i merged that long ago
That was a pretty old update as I went around fixing some little things
check if it works
๐ธ
Derp!
I just pushed updates that allow setup for extensions, and enable processing .mp3 tts responses
so you can undo the changes to bot.py and update.
Or delete it and update
where
idk what the valie voice names are but should be something like this
See example character M1nty
You need to add an extensions dictionary to your character file
here
yes, you can copy/paste that into your characters
Then rename one to edge_tts
and use valid parameters
#1154970156108365944 message
it should work
Those are the default values - so you only need to include ones that you want to modify
You could just use speaker
ill modify all of them ๐
But Im not sure what valid values are.
None is default, if None it plays the first generic female voice
valid speakers probably in the UI
i know that part
could be delay or pushed the wrong one
Woops
I fudged it up
er no I didn't
Ignore the warning message
I decided to include a warning message when running setup() for extensions - but it is not skipping those extensions anymore
I'm just using tags to mimic what complex memory does
@valid crypt The Tags system has tags to mimic complex memory
Does edge tts only support English or what
Can't find any other options...
is not english
Ohhh I see
the speakers have the language set in....
thats rihgt
then hmmmmmmm
why languajge
xd
the speaker part is not very important
as you chage their voice using rvc
but each has a unique pitch and style
so a little difference between them
The bot has a /speak command, I'm trying to whip up some code to support edge TTS
I'm just going to support the English speakers for now, in the /speak command
thats cool
in the web ui has a preview button which is pretty useful
ok found a bug in your code
well not you fault but
cant change voice and model
just the rvc and speacker does nothing
i think you have to do something with edge tts's code
it is somehting with edgetts
gpt:To turn on RVC by default and select an RVC model by default, you need to modify the params dictionary to set rvc to True and rvc_model to the desired default model. Additionally, ensure the default model is present in the rvc_models list when the UI is set up.
i noticed this bug before
so what model are you trying to change to
ok and then what RVC models appear
If any models appear in the RVC Models list in the UI, you should be able to use any of those values in the rvc_models param for your character specific extension settings
if i dont press the button there is no models
The models are there and valid values
this is what chat gpt gave me as answer and worked
Get Voices
voices = asyncio.run(edge_tts.list_voices())
print(f"Loaded {len(voices)} voices.")
voices = [x['ShortName'] for x in voices]
# Get RVC Models
folders, files = get_all_paths('extensions/edge_tts/rvc_models', '.pth')
rvc_models = files
print(f"Found {len(rvc_models)} rvc models.")
# Ensure the default RVC model is in the list of available models
if params['rvc_model'] not in rvc_models:
params['rvc_model'] = rvc_models[0] if rvc_models else None
if params['speaker'] not in voices:
params['speaker'] = 'en-US-MichelleNeural'
return [gr.update(value=params['speaker'], choices=voices), gr.update(value=params['rvc_model'], choices=rvc_models)]
def setup():
global voices, current_params, rvc_models, rmvpe_model, hubert_model
print("Loading hubert model...")
hubert_model = load_hubert()
print("Hubert model loaded.")
print("Loading rmvpe model...")
rmvpe_model = RMVPE("extensions/edge_tts/models/rmvpe.pt", rvc_config.is_half, rvc_config.device)
print("rmvpe model loaded.")
# Cannot run async on main gradio thread
# This works, but does not refresh gradio
thread = Thread(target=refresh, args=(None,))
thread.start()
but it worked
yes
they show after i press the button
just try using it as a value for the character rvc_model param
pretty sure is this code
Ensure the default RVC model is in the list of available models
if params['rvc_model'] not in rvc_models:
params['rvc_model'] = rvc_models[0] if rvc_models else None
if params['speaker'] not in voices:
params['speaker'] = 'en-US-MichelleNeural'
return [gr.update(value=params['speaker'], choices=voices), gr.update(value=params['rvc_model'], choices=rvc_models)]
#1154970156108365944 message
default value is rvc_model: None
replace the default with a valid value and try it
also need to set rvc: True
ye
tried restarting bot?
gonna add it buddy
It should detect models at launch, I had errors when models were missing on launch
those are required models
I don't have any RVC models though XD Could you link me to one?
https://huggingface.co/WoomyPearl/RVC-Model-Palace/resolve/main/MinionsJamesArnoldTaylor.zip
Locally trained.
Credit me if used for unlimited bananas!
And yes, Johnny Test's VA voiced the Minions in the Despicable Me game
ofc credit
oh if used for unlimited bananas
.pth that is the file
got it?
working on it
50mb is not that big
there are 2 required models that you have to put into the model folder
and the .pth into the rvc model folder
It used the rvc model
The bot had an error with decoding the output mp3 file though
Yes, that
Chatgpt helped me resolve the decoding isue
Now it is playing the RVC outputs fine
i though it was for sd or smth
https://github.com/altoiddealer/ad_discordbot/wiki/Tags#flow-chart-overview-of-the-tags-system this one?
That part is one-time tags, but you can make persistent ones that can be triggered by text
I'm using tags to inject context information for characters, places, etc., for my RPG campaign
bro this feels like the redstone in minecraft
simple but can do big stuffs if you are smart enough
leave a tutorial text here ill read it in 8hours
- trigger: '<Character Name>'
search_mode: user
suffix_context: "<Information about that character>"
got /speak to populate the menu with all the english speakers for edge_tts
The unfortunate thing about fixing the decoding error from RVC output, is it is resolved by converting the mp3 to mp3 (lossy conversion generally a big no-no)
Better than nothing in this case, though
can I get the full error?
might install RVC and try later to see what's up
Main error was the file is missing a header
"cannot find codec params for stream"
tells me rvc is just writing raw data to some file and not giving anymore information about how to read it.
yea
Hmm
you should be able to just specify a header to pydub or whatever instead of reencoding entirely
not sure what the format rvc outputs is
I think also mp3โฆ will also double check that tomorrow
The issue is with edgetts?
Yep and specifically with RVC enabled
does the bot work with rvc disabled?
Yep
can you send 2 audio samples with and without, maybe I can use ffprobe to compare the codecs/whatever params
Will do tomorrow ๐ค in bed
alrights
this is a wav file with a .mp3 extension
PSA: saw a reddit thread that the Layerdiffuse extension (for generating SD images with transparency) has been updated to allow img2img and inpainting
when i was using the gpt crap solution it was working fine
ยฏ_(ใ)_/ยฏ
you didnt see those commas xd
Your solution is to modify the projectโs code, but the ideal solution for adding support in the bot is to not require users to modify their extension code
ye
how do i fix the audio output?
Pushed it to dev branch last night, you could copy paste the code from that commit
Or hang tight while I grasp for time
anyway to only send audio
https://github.com/justoboy/complex_memory the guy was smart
this is the character card file
it adds memory there
but even this one dont work with the bot
๐ข
Yea, the bot adds some custom stuff to the character card,
i dont think the bot changes the cards but maybe the memory extension doesnt save all the values properly
feels like not doing anything
I'm putting all my information tags in my character file
gimme an example
didnt understand
wait
awww
???
no i dont understand
That's the code block for a tag that triggers when the user has <Character Name> in the prompt. Characters in the story, not the AI bot character.
So an example:
- trigger: 'Bob'
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"
Any time the user includes the word "Bob" in a prompt, the LLM gets "(Bob is a masochistic panda.)" injected at the end of the current context.
trigger dont have to be the characters name simply the keyword?
search_mode: user means search from users input?
Right
- trigger: 'keyword1, keyword2'
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"
or - trigger: 'keyword1,keyword2'
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"
i mean
keyword1,keyword2 or keyword1, keyword2
space
No space
It's not as convenient as being able to put memories in the webui, but it still works
You could use either suffix_context or prefix_context
to add before or after right?
i think is one of these: what comes first is more important or what comes after overwrites
I was experimenting with this earlier,
I'm making a project to mimic the discord experience.
Some things I have tried:
Initializing user info for all participants in the current context window.
Like name: Reality, nickname: Kat, is_bot: False...
The idea is that as events happen this data might change.
And I want the bot to understand that if I change my nickname, that I'm no longer "Kat" but new name.
and so on.
My latest test was reversing the idea, and appending the flattened user data on the end, with events leading up to it.
instead of giving a initial starting point, then all events are changes to that start.
In my testing, using the suffix preformed better for some questions.
tests are still ongoing
@keen palm
im not sure if im doing it right
i added the code to the character card
Could you paste exactly what's in the character card for that?
tags:
- trigger: 'test'
search_mode: userllm
suffix_context: "123"
`tags:
- trigger: 'test'
search_mode: userllm
suffix_context: "123"`
I don't know. I can't see a problem there
Yeah
tags:
- trigger: 'test'
Etc
write me a simple one ill paste it
#1154970156108365944 message
what if there are more than one
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"
- trigger: 'Bob'
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"```
like this?
tags:
- trigger: 'Bob'
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"```
Both would be triggered in that case, I believe
same
when i add
- trigger: 'Bob'
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"```
breaks the character card
Would you be able to paste the entire card here?
the character card works fine
but breaks when i add ```- trigger: 'Bob'
search_mode: user
suffix_context: "(Bob is a masochistic panda.)"
greeting: hi"
context: "be cool"
use_voice_channel: true
behavior:
reply_to_itself: 0.3 # 0.0 = never happens / 1.0 = always happens
tags:
- trigger: 'Birthday'
search_mode: userllm
suffix_context: "your birthday is at 5th of december"```
breaks even with that
format it like this
tags:
- trigger: 'Birthday'
search_mode: userllm
suffix_context: "your birthday is at 5th of december"
the '-' is used to create a list
tags: is a list of dicts
so all the dict items should be on the same column since they are part of the same object
โ
Ohh, I always format it like that but didn't realize it was that important
take a look at this to get live feedback
Online YAML Parser online helps to parse, expand and collapse YAML data.
๐
the complex memorie that modifies the character card ignored that
totally misleading
xd
I don't know how complex memory was written, but they might have a list of valid keys for the file.
when adding the memories to the file, it adds back those known keys and ignores "invalid" ones
I would have to check out the fork you're using to know for sure
yea, that makes sense since it's importing data, then writing it back to the file.
instead of name greeing context .-.
it will delete comments and other information that isn't data in the process
so deadly
the python yaml parser might be a little less strict, I have no idea.
But, yaml is a format that uses indents.
unlike json for example that uses parenthesises and doesn't care about whitespace/indents.
wait @keen palm what gpu do you have to run 34b
I use 2x 3060s
do you get the speed of 2 3060?
I don't think so. I just get the benefit of the extra VRAM
How so?
Oh gotcha. How much VRAM is that?
8 ;-;
not a expensive laptop
ยฏ_(ใ)_/ยฏ
using 15% of the power of a desktop
getting the 80% of performance
W
i have a desktop aswell with a 3060
26w doing nothing ;-;
ok guys a math question, knowing that this code make the bot be able to reply to itself and can reply to the reply, what is the average number of replies if i set it to 0.5
My 2 GPUs don't even draw that much power combined when they're idling
Yes your indents are incorrect (as was said before).
The main "tags" key is a list. Lists have no "keys" in them - only "values"
This is what a list of strings (text) looks like:
states: ['New Jersey', 'Ney York', 'Vermont', (etc)]
Each little bundle of information in Tags is a dictionary, except the dictionary doesn't have a key name. Each of those dictionaries as a whole is considered as a "value" in the list.
tags: [
{'trigger': 'Bob', 'search_mode': 'user_llm', 'suffix_context': 'your birthday is at 5th of December'},
{'trigger': 'Jane', 'search_mode': 'user', 'prefix_context': 'Jane loves to smoke weed all day, every day. Close friends call her Mary.'}
]
yaml is more user friendly for formatting this stuff though
Different formatting, but same value
If gradio wasn't a total pain in the ass to work with, I'd make a gradio interface for everything.
But it is a total pain in the ass to work with
I worked with it one time... the main thing that sucks about it is that the LLMs also suck at it
And the documentation isn't crystal clear
not noob friendly
@valid crypt did you know that everything in the Tags page on the wiki is stuff that can be triggered from a text match?
swap character, modify text, change models, modify parameters, etc
๐๏ธ ๐ ๐๏ธ
I'll take that as a yes? ๐
did you know you can build a computer that can play minecraft with just redstones?
Someone built a image classifier with redstone,
We need to go the next step and build an llm in minecraft!

๐ตโ๐ซ
Instead of just interacting with the LLM one time, it interacts with the LLM as many times as you defined
I included some relatively simple examples in the dict_tags.yaml file
i know that if i set that to 1.0 it talks to itself ๐ค
If you read up on the Variables (I just made a wiki page for it), and the format_prompt tag - those are the main things for the prompting during Flow steps
One concept from my examples, is having a specialized character context - and all it does is replies with a value to use in one of your 'tag' definitions.
So I made a character context which, all it does is picks an aspect ratio for image prompts
Using Flows, you can have one character come up with an image prompt, then share it with the second context to get an aspect ratio - then finally, generate the image with the prompt and the aspect ratio.
something like you give the base idea to your employee and that employee adds detail and send it to another employee to think what aspect ratio is good and send it to stable diffusion
when i use stable diffusion i can just ask a couple of time and check the process, if i make it a flow is like getting off control
maybe better maybe not
this would be interesting for the bot
but as it is keyword based??? then...
ex: I want your selfie
bot: i wont give you my selfie
flow: img output
actually that can be done with tags kinda
gonna look for uses another day
If you have any crazy 'what if I could...' ideas, the tags system may be able to make it happen
If not, then any good idea I could just add in
what are the core features that you are working on
in your to do list
Add optional Behaviors to be more humanlike (โ
?)
Discord based conditional Tags (?)
Per-guild Characters (at least i dont care)
User Variable assignment (?)
Segment Anything extension support (stable diffusion?)
I'm still not satisfied with the behaviors, I'm screwing around with that right now
I want to expand on it once I have the timing nailed down perfectly, to reply to multiple messages at once, other stuff
Just added another item to the to-do list before I forget ( add 'begin reply with' command and Tag)
somethinng i would suggest is segment the text output and send couple short texts
I wonder how comfyui handles model loading/unloading.
For example, what if there was support to run a comfyui workflow for a tag.
But between 2 comfyui workflows, it might use some of the same models, and some new ones.
What I want to know is if it unload/loads models every run, or tries to smartly keep them in ram for a little bit...
But anyway, that would also give people a huge amount of freedom to do anything
but idk if it is possible
Yep, that's part of it
yeah buddy some ui
All the UIs have settings to allow multiple models to stay in VRAM
back when I tried it out, I saw that comfyui had an endpoint to execute workflows as json
you can say that right now you can change a lot of settings but
is pretty tough
and stt support :v
STT support now on the list as well
These things are all coming, it takes time though ๐
I've been plugging away for > 1 year
for now ill explore more extension and try sd with the bot
I use the bot mainly for SD so a lot of the features are focused on that
wait
how do i turn off the tts
๐
i remember that alltalk has a command to turn it off without modifying files
From a trigger phrase, you can pick random controlnet input from directory, or have 3 controlnets working together, etc
Just remove the name from config.yaml
Don't include it in TGWUI's settings.yaml (under default extensions)
to use it with TGWUI use their CMD Flags
to use in the bot, just put the name back in config.yaml
one more suggestion idk if it's hard
in case of using tts
add a folder to put some random sounds(voices) like uhhhh, eeeee, aaaaaa
when the bot receives the input wait a few seconds and play randomly those sound for a few seconds to gain time for the output
not very crucial actually
I always appreciate suggestions - that is probably not going to happen, though ๐
At some point we may be able to 'stream responses' so it may speak sentences as they generate
(probably what Reality is typing)
thats it
in my experience XTTS adds filler words like that sometimes on it's own.
you also could achieve it through prompting.
Making the bot type "eee" or "uhh" in the sentence sometimes.
Also the filler words can be influenced by finetuning a TTS model.
I trained one and it mimicked some of the background noises, breathing, and speed I spoke in the reference recordings.
that is something that i could explore but
alltalk only starts working when the text is done
I think to achieve streaming with TGWUI, it would need to be written into the TTS extension the user is using.
The TTS library supports streaming for many backends.
it's a different function call
filler words is not what i want but a way to gain time
I could add TTS streaming now if I wanted - the problem would be that it would be dedicated to just AllTalk TTS (probably).
I only used alltalk for a few minutes,
But I have a smaller lightweight tts server I put together that uses the same backend as alltalk (TTS lib)
for XTTS.
It also has streaming ability, it will start responding within 200ms of your request
pretty good then
That's when you have TGWUI handling it, but for streaming TTS we'd be using the TTS API directly
So we would get streaming text from TGWUI and send chunks to the TTS API
i wonder if edge could do that
they prob don't have an API. And that's what I don't like about the idea atm - not very flexible.
I got edge_tts supported in like 30 minutes b/c of the way the bot handles extensions / tts
how about this,
create a TTS hub server
that hosts api for various engines
alltalk (TTS lib)
edgetts... whatever else
You do it XD I have my sights set on all this other stuff I barely know how to do
and provides an api on common grounds
is there an api format I should mimic?
like elevenlabs
what do people consider a standard?
idk
I really don't know, I'm not that into the TTS really. There's people that eat sleep and breathe it though
i heard a lot of elevenlabs but paid ._ .
yea, I mention elevenlabs because it's one of the big ones
similar to how we use openai as a standard for text generation apis?
because everyone built apps around chatgpt, so if we want local models to mesh easily, adapt them to their api
openai is a company
api means your interface to talk to a program.
but i dont know how to use the openai api of tgwui
if you're not programming things, then it's nothing you have to worry about ^^
that's what the webui is for
an api key is a password to an account basically
if you have a subscription to chatgpt for example
your code can access it via api + api key
what does the openai api do in tgwui
wheres the key?
so that you can plug your local hosted model into programs that use chatgpt
and pretend that it's chatgpt by mimicking how it commicates over the web
i need to know where is the key to connect everything in brain
your locally hosted version of the openai api doesn't need a key
because it's not being hosted publicly
so you can put anything there I think
or just 111111111111111111 (there's an example in the docs)
hmmmmmmmm
these are my api endpoints I have so far, pretty basic, some allow for chaining processes so you don't do back and forth which will accumulate latency.
One thing I like about my server over others is I have the option to save latents to file.
And hold them in ram so you don't have to upload reference audios every time you want to generate speech!
Nifty ๐
and I have another one for STT
cool thing there is I got STT, TTS, and some other voice related models all working from within the same Venv.
Less stuff to install!
Need to figure out how to allow parallel STT streams.
It should work, as whisper can transcribe audio at pretty high speeds!
30-60 seconds, in a second of processing.
But the thing about streaming is, it's processing this window of 30-60 seconds of audio every second
as new data comes in
have to change the api adress to use the key right?
yes, instead of going to
https://openai.com/v1/chat/complete
it will go to
localhost:5000/v1/chat/complete
what program?
AdDiscordBot doesn't use the api
a game
if the game anticipated people changing the api url, it would be found in the settings probably
if the game is written in python and you still don't see any urls
it's likely using the openai library
read the docs, there's a keyword arg you can pass to the api class that will change the api url to a custom one instead
Also, i wrote "openai.com" as an example, I have no idea if that's the actual url