#ad_discordbot (Fork of Fork of xNul's bot)
1 messages · Page 7 of 1
active_settings = copy.deepcopy(bot_active_settings.get_vars())
if not active_settings:
bot_active_settings.init_activesettings()
Use the load_defaults function that runs on class start, that's designed for creating the default variables
I'll make the switch when I get back
back, and okay
Next, I can probably figure this out - is I think put all user setting files in a ‘settings_templates’ folder, with a txt instruction to copy up one level to put into effect
With this and internal changed, I think users could git pull without issues, maybe just miss out on features if they don’t pay attention
need to write an updater
little python script maybe that runs install of requirements, git pull and all
active_settings generates when switching characters
Ideally, it will have empty dicts when it isn’t found - then when botsettings initializes it will assign all the default values during update_settings()
Which is where I showed the snippet earlier
it does now, but initializing with defaults isn't necessary because the bot will fill out what it needs anyway
for example I didn't have SD online
imgmodel was left as {}
but everything else was populated
Since users wont be modifying this file, I think that's fine
the code already handles defaults with the .get statements, right?
or is there something else Im missing
Before I changed it I was getting error during character loader, the update_dict function errors because the llmstate dict doesn’t exist yet
Then yes, all is well now 🤗
Well almost well 🙂 Did you do anything in regards to the error messages?
Which ones?
There's just a mention of file not found, but that's fine, it will be created when it needs it
User doesnt have to do anything
FileNotFound - but this is expected, so I don’t think we should flag as error right away
okay
As I mentioned earlier I think it should first give a logging info or maybe warning, but if file fails to create then error
added a "missing_okay" flag to load_file
Also, at the top of the function data = None is defined.
It's actually okay without this because in all states of the if statement "data" will be assigned.
And the "with" statement doesn't change the scope of the variable.
So it just passes down until it's returned ^^
I understand there could be fears about this causing bugs, but it's all good!
Python is interesting like that.
some variables can leak out in unexpected ways
Like if you were to write
for i in range(10):
pass
Then later write
print(i)
This would actually print "9" and not cause an error.
Compiled languages on the other hand require assigning variables before they're used 😸
Well pass does basically nothing
I’m just contemplating your indent level on the print statement 
Exactly, it doesn't seem obvious at first because the indents are wrong
But python interprets it as this
i = None
for i in range(10):
do stuff
print(i)
But the "i = None" is implied
Of course, this only works if it's ensured that the loop will at least iterate once
because if the loop never runs, i is never defined
In the case of the if statements though,
if both "if" and "elif" and "else" initialise "i"
then it will never cause a variable error
Ahhh so your saying that after a loop, the last key, value, whatever would still be the variable outside the loop?
Yes, but if you're working with data where it's possible the loop will be empty, you'll want to set a default value
Like you did here
Of course
I think I only fixed that recently 
Now I know better for things like that
And I was having a lot of errors with missing embeds when trying to delete/edit them in exceptions
Until I set them as None right away
and of course ‘if embed:’
yea, the order you assign things can also be confusing
like a variable you're going to use has to be created before you use it.
But if creating functions/classes, they can be defined after
IF you use them within another function that is used later.
def main():
test()
def test():
pass
main()
This can also affect typehints too!
So, python lets you write a typehint in quotes, that lets ot evaluate in the future
What is a typehint btw
This would work for your IDE to know what class "db" takes
Mostly just useful hints for developers to know what items a function takes/outputs
Python doesn't enforce this
For example
This is very new for me
if I comment out the typehint
python doens't know what "self.llm" is
because it doesn't exist yet
it's loaded from a function where a lot could happen
it could be loaded as a Nonetype, maybe an int, maybe something else
there's uncertainty here
made a mistake in the test function
Heh
forgot to add self as a variable
Wow, python is actually smarter than that ^^
it evaluated the chain of functions to figure out that self.llm will be initlaized as a "_Statistic"
I understand… if you put a type hint, VScode and the like will make better suggestions and it will be more apparent what values are expected
This is because, self.load_defaults() will always run in __init__
so python is safe to assume what self.llm is
yes
new example
let's assime self.llm is a dict from the statistics file
data.pop will return the value of llm, or {}
so there's a posibility it wont be a dict according to python
Well, the dict
because of this, python doesn't give me a hint
by telling python it will be a dict after importing from the file
I now get all the dict hints 😸
when it's not obvious what a variable is, yes
but if you define a variable like
i = 10
python knows it's an int and will give you the typehints for it
thing is, I didn't write this code, I don't know what is being passed into all these functions and what comes out.
So it can be harder to debug
Seems like you figured out a lot pretty fast
So I began tracing back variables to their source and documenting what they are and what code should return
What does -> do?
it tells what a function will return
Ahh
def f(arg: type) -> output:
If its returns many things that probably doesn’t look so hot
Typehints aren't always purely for debugging
if you take a discord command for example
when you create an arg for the command, you specify a typehint
discord.py reads the typehints and creates code in the backend to enforce them.
they're called converters
What’s interesting is that I’ve done an absolute massive crapton of coding with chatgpt but it never provided code with typehints
so if you defined input:int
and ran a command /test 100
It would return an int in python, instead of a string "100"
the same works for "discord.User"
you could write a user's name, and discord would figure out to convert taht arg to a discord.User type
Because this isn't the "correct" use for typehints
It's just a syntax hack thing discord.py did
Ahhh
but it's technically correct in the sense it shows you what the arg is 😸
Discord.py specific then
just in an unofficial way
Well I understand there’s probably other libraries with similar hints then
yes, and you can write such code yourself!
you can access a function's doc strings and typehints using "__doc__" and "__annotations__"
I do have to run now for sure, thanks for the hints 😉
Cya!
Super secret cool thing:
You can also assign custom data to functions!
Discord.py also uses this, that's how it creates commands/events internally.
It's the way it figures out how to route the right event to the right function
Merged 👍
Just brushed teeth, picked up phone, see you had more info 🤗 good stuff
I’m looking through the commits… I don’t think you were caught up with main?
is main different from dev?
Could’ve sworn I did change bot_statistics.llm_statistics to just bot_statistics.llm
You did, I also changed it to bot_statistics.llm
maybe both of us doing it made it revert?
i'll have to check
Ok I guess I just didn’t get there yet
no no, you did make the change, but something went wrong if it's reverted again
both main and dev use bot_statistics.llm for me
It was this commit I noticed it
https://github.com/altoiddealer/ad_discordbot/pull/29/commits/df15d9a0a79f9a2ea7a2f3712a62944a1a7f8eb0
Yea, but after merging (next commits)
you the code matched so it wasn't listed in the PR
whew
Look at the files changed tab
nothing changed in bot.py ^^
because the change I made already equalled the change you made (after all the commits were applied)
You went through the Recent Changes in the readme and removed what looks like 40 line break instances, lol
Ahhh man, I try and avoid modifying most of the user setting files unless its essential - because updating the bot is still an unpleasant task.
That is quite the cleanup though I must say, must have eraditaed 1k spaces
i just ran a regex on the folder
:3
users don't have to update those files, it can only be applied for new people who freshly download it
thats fine
So long as they actually follow the install / update instructions
if they just clone it into textgen-webui and try git pull, they'll have to back up their user files first
I think you should look into making the bot an extension for tgwi, that way people could just run git pull and it updates with no need to move files
I'm aiming to change the instructions back to that (clone into TGWUI)
Now that 'internal' and those settings are created and not part of the package, thats half the battle
👍
@terse folio
My idea is to move the user settings files to a settings_templates directory.
In the main directory will be included a file called:
User settings are in settings_templates directory.txt
In that directory will be included files called:
Copy these files up to ad_discordbot directory.txt
Do not edit these files.txt
Then I modify the bot code to ensure it works without the user setting files.
Let me know if you have a better idea
it's worth a shot, what I also see is using gitignore to ignore changes to configs when running git pull?
But it downloads them on git clone.
Not sure if it would say there's a merge conflict
I believe what would happen is that it would just ignore the files
The downside to this is the users won't know anything changed in those files when it happens, unless they visit the project page
For quite some time now, I've been careful to use .get() and even the config transition code I did - to ensure the script runs for users stubborn to update the settings files
great!
(As is good practice, you're doing it too of course)
I'm going to start working on that idea next, unless you have any better idea
Have some work to do 😛
work work
I'm currently working on a wrapper for xtts so I can use my finetuned model from an api.
The webui I was testing is a bit of a mess :P
I think manually moving the files will work,
I've seen other projects do it.
And it could be implemented in a installer script to move the files if not found
Custom models?
it seems you need to load them wit different methods as tts.api doesn't have get_conditioning_latents method
for instance, Alltalk_TTS has a parameter that works with TGWUI to specify the model
I'll search alltalk, I don't think it comes with tgwi by default
Fairly certain the behavior is enabled by default
Also another thing I want to implement is holding the audio files in ram only.
I don't need a disk full of generated audio 😸
already deleted 1Gb from tests
haven't checked it out yet
I think I have one to delete the output...
text-generation-webui\extensions\alltalk_tts\confignew.json
These parameters can be included in your character file under extensions key
And the bot will load them
Here is one default parameter:
"tts_model_name": "tts_models/multilingual/multi-dataset/xtts_v2"
it would be cleaner to just never write the file, it's more performant to keep things in ram because you're not limited by read/write speeds.
And I like to move the AI stuff to an HDD so i'm not wasting SSD cycles on huge amounts of generated content
But that's a thing I'm willing to implement myself if it doesn't exist, no worries!
Yeah seems like alltalk does not have a setting to keep file in ram, seems like it wants to always write the output.
There is a setting to delete the output. But I don't think you want to use that as it will likely delete it before the bot plays it 😛
with most programs you can tell them to write to a bytesIO() object.
This is a temporary file in ram ^-^
from there you can convert it to bytes and send over a webapi to your application (in my case)
in the meantime, happy to test with it!
Does alltalk support piper models?
I want to look into alternatives to Xtts because it doesn't permit commercial use.
just want to feel less restrictions on whatever I do
I don't know all too much about it, don't know anything about piper
Seems not, piper is only mentioned in issues when talking about feature requests
If memory serves me right piper was good but more vram required
interesting, okay
I had zero interest in TTS until the advent of xtts
even then I'm just barely interested 😛
So easy to add voices though - idk if you noticed but our /speak command allows users to attach their own voice clip to generate with
can be mp3 or wav
interesting, is that saved?
Nope, one time use
need to drag n drop every time 😛
You may lose sleep to learn it writes a temp file then plays it and removes it 😄
sometime on the todolist, could add a feature to save a voice permanently or just for the current session.
And assign that voice to a name.
yea, lots of temp files everywhere haha
Gradio writes temp files when you upload an audio as well
tbh I'm not sure anyone has actually used the feature, who knows 😛
Its good enough as is IMO
borderline excessive
I could be overreacting about SSD write cycles, not sure,
I just am in the habit of building optimized things :)
We should strive for perfection, indeed
Beautiful work on the updates
It's perfect. For a sec I thought it should say something about creating the files, but I don't think users need to be told this
Actually, I do want a message for when the internal message is created only
Less instructions means less points for users to screw it up 😸
class SharedPath:
dir_root = 'ad_discordbot'
dir_internal = os.path.join(dir_root, 'internal')
if not os.path.exists(dir_internal):
logging.info('Creating dir "/internal/" for persistent settings not intended to be modified by users.')
os.makedirs(dir_internal, exist_ok=True)
seem about right?
certainly works
Sure
Streaming with alltalk is nice
But it feels like all the generation is half the speed of the other webui (even through deepspeed is enabled)
maybe it's doing some other processing to clean up the audio ontop
also that's a cool UI for alltalk, nice settings
Xtts-webui:
resemble enabled: 24s
disabled: 11s
Alltalk: 17s
all with the same text prompt
but with shorter prompts, it's pretty consistent with speeds
and sometimes faster ^^
I can't imagine why it would fail, because if path already exists, it will skip.
maybe a permission error.
but if that was the case, os.makedirs would raise an exception and not reach the log code.
Nice idea though!
Reusing code 😸
Ill just axe the second bit then
I'm not sure if you can run the function like that
it's working
wildcard message doesnt appear b/c I have it symlinked from Forge extensions folder
Random python info :P
There are decorators that enable you to run methods in a function without needing to instance it.
like @classmethod
or @staticmethod
classmethod lets you define a "cls" arg instead of "self"
so you can use other functions in the class outside of an instance.
Static method has no access to other parts of the class iirc
I've used cls before, but very little experience with it
Only lately getting any experience with self method
nice nice, it works :)
Now to offload these user settings...
Yes, on startup we're just going to go ahead and copy the files from the settings_templates directory
Woooo!
Looks great!
Guess I should include a warning for if the src_path does not exist
I don't think the code would be running in that case
that warning would have to be at the top of bot.py before you import other files
Well in this case the src_path is the file in settings_templates
so if the file does not exist in the main path, and also not in the settings_templates, raise an error
oh, thought you were talking about the ad_discordbot folder 😸
x_Copy all files from 'settings_templates' to here.txt
x_Copy all files up to 'ad_discordbot'.txt
I think naming the folder "settings_templates" is obvious
just need to mention to copy them up a level
Committed to Main
Everyone may now git clone the project into /text-generation-webui/ and be able to git pull moving forward without issues!
@viral lagoon a long time ago you suggested making the user settings as templates - finally got around to that. Thanks for the tip 🙂
What'd you add on this update?
Just need to replace bot.py and /modules/
Reality went nuts clearing up trailing spaces everywhere (why many files changed) 😛
Other than that, you could move your installation temporarily so that you can git clone https://github.com/altoiddealer/ad_discordbot into /text-generation-webui/
Then copy your settings files and /internal/ back to the new install.
Moving forward you can git pull without issues
Hmmm, I keep getting this at the end of every response:
@keen palm" Reply Delete
Three different models producing the same thing
That's not an option in config
Good 😛
Try reset conversation
Maybe one time it decided to output that, and now it's mimicking itself
Yeah, that seems to be the case. Bad bot
I'm testing a new model, and it loves using emojis and winks
I...don't know how I feel about that
😉
I think it's hitting on me
My character sent a pretty impressive image involving a dildo
A user sent an Issue that they are using the bot on multiple servers, and that it is sending TTS to the one voice channel regardless of server 😆
I'm pushing the fix for this, just pretty funny thought haha
Apparently they have the bot on 3 servers at once
Yes, I remember that one. I had a user asking about tomatoes and it'd pop up in the private voice channel I set.
I'm resolving this by checking if i.guild.voice_client == voice_client (variable representing the VC bot is connected to).
If not, it sets tts_resp = None and behaves like only text response was received
To disable the TTS from processing would require reoloading extensions constantly though
You have a plan in place for putting user name in prompts, right?
Yes
That info is being sent to the LLM, though
Not yet - when I mentioned server_mode earlier I forgot that's only on the dev branch atm
Well, I mean, when the bot responds, it responds with my user name
Oh - that's automated and the bot does not know it is @ mentioning you
But the user who wrote the messages is dynamically assigned to the user1 parameter
The LLM does see each users name as the user
The @ mentioning occurs if the bot is not responding to the same person consecutively
I'm not talking about the @ mentioning, though
LLM sees each username
If you ask the bot, what is my name?
It will (probably) reply with it - unless you use the custom thing I have for stopping strings
Yeah, I've tested that before, and it works.
From example character:
# Stopping strings you may include which this bot will dynamically replace:
# "name1" (the user's name)
# "name2" (the character's name)
custom_stopping_strings: '"### Assistant","### Human","</END>","\nname1:","\nname2:"'
stopping_strings: '"### Assistant","### Human","</END>","\nname1:","\nname2:"'
Oh, it will only stop if the name follows a \n newline from my example
I haven't yet gotten to test out whether the bot can associate a user name with a particular game character. My guess is it will get confused.
I definitely need to put in some stop strings, though
What we were discussing earlier, about names... Will only have any effect if/when the new Behaviors are implemented - when multiple messages may be merged to a single prompt
(optional).
Why are there custom_stopping_strings and stopping_strings settings?
no clue but they're both required by TGWUI
Classic
@terse folio did something dumb - ended up deleting and recreating the dev branch
Anything in the works for delete/replace last response?
On the todo list
I’ll update the one and only pinned comment to include the actual todo list
(Soon)
Had a cool little idea you could also implement using Flows.
An ability for the TTS to change how it speaks based on the text.
Maybe an LLM would decide what emotion the text is meant to convey and pull the correct reference audio for that speaker+emotion.
Same for other tasks like whispering or talking louder.
I decided to put our RPG character information into the Gamemaster context, like so:
- [Character name], played by [user name]: [Information about character]
And so far, generally speaking, the bot can differentiate the character based on the user that spoke to it
Well, the extensions would have to be reloaded to apply the params but seems like that happens almost instantly
I have no experience with this, but absolutely, if there's a syntax for that then a character could have specialized context to apply the syntax
Pinned msg has been updated with ToDo list
It's probably much longer than that
I was wondering why the heck my Aspect Ratio helper character kept giving the same answer no matter what
its because I forgot to put mode: chat - so it saw zero context
lol
It's the grammar string, grammar is awful.
That is one thin image
infinitely thin :O
Works good without the grammar
It picked the correct ratio
- trigger: 'draw,generate'
insert_text: ''
insert_text_method: replace
search_mode: user
on_prefix_only: true
save_history: False
load_history: -1
swap_character: M1nty-SDXL
should_send_text: false
should_gen_image: false
flow:
- flow_base:
save_history: False
load_history: -1
- flow_step: Ask LLM for best Aspect Ratio
format_prompt: '{llm_0}'
swap_character: '_Aspect_Ratio_Selector'
should_send_text: false
should_gen_image: false
- flow_step: Gen image with the LLM's selected AR
format_prompt: '{llm_1}'
aspect_ratio: '{llm_0}'
should_gen_image: true
should_gen_text: false
should_send_text: true
Don't actually need to use the last step there, could just use the variable to gen image on the same step as prompting the AR helper
nvm
I need to better document it - the variables get their values updated immediately before the next flow step
Anyway, from now on I'm letting the AR selector get a piece of the action for all image requests
good!
There has been a lot of great progress these past few days
The install and update instructions have changed, and will likely remain as they are now for awhile
The bot can now be git cloned into the text-gen-webui directory, and you can use git pull to update without conflicts due to modified user setting files
This is epic, I put together an XTTS api server.
You can upload audio files to create latents that will be stored during the session.
You then can use those latents to generate tts.
And everything stays in ram ^-^
I could port this to work in TGWI as well, it would just create a webserver like the openai Extension.
But this is eventually one last thing we need to worry about needing to hook directly into tgwi for.
Working on this also taught me how to fix the documentation of the openai extension for the transcription endpoint which requires a file upload.
It currently has no documentation because it's a little complicated to have both
👏
Can params be modified onthefly?
Meanwhile I added a simple function to toggle TTS activate on/off.
Also set the “loading extension X” to warn once (per extension) so it doesn’t spam it when modifying extension args.
I havent checked the full list, but I copied what the xttswebui had.
Temperature, speed, topk, topp,...
And I got some warnings about needing to use num-beams, so will add that too.
It has a low vram mode, but I'll add an endpoint to load/move the model to CPU.
Now that I think about it, there should also be an optional timeout to keep the model in memory for a few seconds after the request so you could process multiple in one go without waiting to move from ram.
Another idea I had was an endpoint to the tgwi api to list the active extensions.
This could be useful to find out what other web servers are running in tgwi for a client to call.
Like your function that picks what tts client to use.
An external bot could check what tts models are running and call the correct urls
Well yeah, in time the function may be updated to do just that, if your vision turns out like that
I think no one ever used the transcript endpoint with the openai extension.
having to install python packages that weren't part of the requirements 🤔
I'll open up internal functions too if you want to access them directly!
yay got that working!
Almost time to open your own resources thread 
😸
Just took a look at alltalk's streaming, looks like a straight forward implementation!
so I have audio streaming, but something is terribly wrong with pyaudio and it just plays sound extremely loud.
The audio is int16, all the values are correct.
So weird
is that in lieu of FFmpeg?
from the bot code:
# Otherwise, play immediately
source = discord.FFmpegPCMAudio(file)
PCM is uint8 iirc, gen_stream sends int16 (same as alltalk)
normal gen sends float32
But this isn't the issue, i can write the stream/generation to a file and play it.
But using pyaudio to stream it will blow out my headphones haha
The bot code would have to support playing from an iterator, or live buffer
not yet, this feels pretty obscure, searched a lot
I tried asking a few things to Chatgpt, wasnt of much help
Tomorrow I plan to start up alltalktts and use the streaming api and see if I get the same corrupted wav issues.
The code was nearly identical.
Cya for now!
adieu!
Pushed an update to improve TTS handling
If the bot is on multiple servers, TTS generation is now handled gracefully.
Additionally, it no longer generates TTS when a tag is triggered with should_send_text: False
I accidentally broke Cont and Regen 2 days ago.
Fixed
(they still don't work correctly... coming soon)
I'd gone through and cleaned up the user and channel references.
Apprently on_message(), the commands, etc, are happy with i.author.display_name - except for those two App commands which do not have an author attribute
must be i.user.display_name for those
About to hit 500 commits 👀
@terse folio thanks to pylance, I noticed that one of the original imports from the source project was not used at all… 'torch' 😆
(Sounds expensive)
Im afk didn’t try this yet, but if I clone the repo anywhere but TGWUI will pylance list all the missing requirements?
it's imported in the tgwi i'm sure, so it's not adding any load times as packages are only loaded once and using import just provides a connection to the already loaded lib ^^
But yes, if the bot was standalone from tgwi then torch would add 1-2 seconds of load time!
I don't keep the repo in tgwi, I work on it from outside.
The only failed to import parts are the modules and shared stuff (from tgwi)
I need to test if it runs without TGWUI
oop, this stuff still hasn't been fixed, will add to todo later
hmm, it might!
I never tried.
all your tgwi imports are hidden behind an if statement, so it's possible.
I made a lot of updates past two days, mostly cleanup. But added a nifty TTS toggler
was just looking through them this morning, 3/4th way through the changes 😸
I had refactored all the code to not actually require either program but did not actually test lol
I forget where but I noticed one false hint you wrote and fixed it 😛
idk maybe it was correct now that I think of it… forget it 😛
I cleaned up the user / use_name so it’s more obvious what variable is what
Also making better use of the ‘params’ dict that gets passed down
user_mention: str
perhaps?
Yes, discord.User().mention is a str.
I don't know how discord.User.mention is evaluated as a typehint, i had it noted to test out what that does.
Yea, it doesn't know what to do with .mention
I think that's because .mention is a property not an attribute
typehints are for types, they don't take instances/variables.
str is a type, but str() is a string and wont work.
@property
def mention(self) -> str:
""":class:`str`: Returns a string that allows you to mention the given user."""
return f'<@{self.id}>'
While User.mention has output typehints, it doesn't seem that pylance lets you use those as the typehint for other items.
The way I do typehints is for the less obvious things.
For example in the past there were arguments called "user", which would lead me to believe they were of the discord.User type.
They were not, so I marked them as str.
And in the most recent commits, you change the argument to user_name which is much more descriptive ^-^
Another thing I noticed was removing user+channel for i
"i" is a pretty commonly used variable for for-loops, like for i in ....
This could cause bugs later on where i gets replaced with the last item in the for-loop like I talked about earlier.
So I would replace i with ctx
or inter short for interaction if that's what it is
Let me know if you’re working on anything so I don’t inadvertently start working on the same thing
I will change all those interaction variables to just enter because that is a good point you make
Also the on_message event provides a discord.Message
not discord.Interaction
I'll fix that myself later
Oh interesting
Some of your functions can be activated from on_message, and interactions.
So we can use a Union[discord.Message, discord.Interaction] typehint
which means this or that
I'll look more into that, not exactly sure, I didn't trace it.
It could be context, and not a message type
When I was making my coffee this morning I decided how I want to resolve one little mess... the split on_message_gen and _hybrid_llm_img_gen.
I'll be merging those together as hybrid_llm_img_task - and probably divide it to 2-4 subfunctions
currently on_message_gen leads to hybrid_llm_img_gen and nothing else leads directly to the latter (and never will)
Another pretty big change I want to make is remove many of the large if statements.
like if SD_enabled
define x,y,z...
I think it would be better to define all the functions and bot commands in main.
But have the if statement in the interaction to tell the user this command is disabled.
This would cut down on the need to restart your discord client after enabling SD or another feature.
Because all the commands would already be there.
Yes I;ve been intending to make that simple little change.
This actually won't all happen if at the beginning, it does not toggle sd_enabled based on the client check
Just need to undo that dumb toggle
I put it there becasue I was originally collecting all imgmodels at init - but now thats fixed
its just one line wreaking havoc, just delete it 😛
You want the honors or shall I?
Even with TGWI,
We could create a TGWI class, give it a "connect" function which would run all the imports and toggle a flag "TGWI enabled"
You then could access TGWI features through that class.
So all those functions could also be there not hidden behind an if statement ^^
just running as "just in time" where it imports when it needs it.
I'll probably work on that myself since I have experience there.
I have a large library of all my useful code utils, and I keep some machine learning stuff in there too.
But importing Torch is expensive, can add a lot of wait time!
So I use JIT importing there too!
To only import torch when a model is being loaded.
We do want to disable all the commands and everything, if the config file specifically has it disabled
I understand - the way I have it works but isn't the clean way to handle it
I don't know if that would require a discord client restart.
but worth checking out.
I think a warning "this command is disabled" on run would be sufficient if not.
I'm going to add a check in the sd_api() function - when there is a successful response, if SD_CLIENT == None It will go fetch the actual client name.
(If it is None that means it was not online during startup)
This variable isn't required for anything to work, its just to enhance the user experience
will function*
Since its currently a global variable I'm just going to remove the return at init and set it in get_sd_sysinfo, to more easily call the function from sd_api()
I'm updating the get_settings_dict() from the main settings so it may return a top level dict instead of the entire settings dict
def get_settings_dict(self, key=None):
if key:
return self.settings.get(key, {})
else:
return self.settings
(not like it's used much... yet)
fixed those bad behavior assignments in init...
Added a method in ImgModel() to refresh the extension support if SD WebUI is found to be online later on
def refresh_enabled_extensions(self):
self.init_sd_extensions()
imgmodel_dict = bot_settings.get_settings_dict('imgmodel')
merge_base(self.img_payload, imgmodel_dict['payload'])
slight change to that
def refresh_enabled_extensions(self):
self.init_sd_extensions()
imgmodel_dict = bot_settings.get_settings_dict('imgmodel')
new_payload = merge_base(self.img_payload, imgmodel_dict['payload'])
update_dict(bot_active_settings['imgmodel'], new_payload)
bot_active_settings.save()
I know how much you despise love the save function 🤓
Also check out what alltalk did with the ability to install standalone or in the webui.
They have a script.py in the main dir that acts as an entry point to tgwi to start it.
You could do the same for the bot.
And people wouldn't have to move any files. just git clone.
I did attempt to start this, but found it overwhelming with the amount of functions that mimicked the startup of tgwi.
update_dict(bot_active_settings['imgmodel']['payload'], new_payload)
I looked at the documentation for the Alltalk API and it looked way too complicated for me to screw around with, when current method is working simple and effectively
haha it's all good,
I just don't want to save on every assignment like this:
settings.set(test=1)
settings.set(name='someone')
settings.set(character='example')
This would trigger 3 saves in a row
So it's best to set the attributes through settings[test] = 1, settings[name] = 'someone' ... etc
and at the end save it ^-^
I know the top level dictionary imgmodel kind of sucks for what it is - we can change that if you come up with some brilliant migration lol
I don't care enough though it can stay
I meant to look at how they made it run as an extension.
If running as an extension within tgwi, you still have access to all the internals, and can contact other extensions just like you do right now.
It can just run as an extension in TGWUI - you can leave the form field blank in config.yaml, but use the --extensions flag in CMD_FLAGS
TTS will work
(I think 😛 )
hmm interesting, haven't tried
I think it tries to match the tts_client value from extensions list...
nah. hmm...
Can easily update that
I just need to add the extension args as a source to check for the value
That field in config.yaml is meant to just be a shortcut to make it plain and simple
I mean, it will enable the extension if its in CMD_Flags - the /speak command will probably just be bugged
ANWYAY - fixing that 😛
good catch
I haven't digged around config.yaml too much, not sure which field you mean.
I'll read this back again later, have some things to do outside ^^
...
tts_settings:
extension: ''
api_key: ''
...
I think we were talking about different things :)
Maybe 😛
You're probably talking about how to handle things if/when we transition to use TGWUI API
some of the hacks currently in place may still work
or maybe can be tweaked
not exactly.
When I installed alltalk a few days ago, I noticed it prompted me if it was being installed as an extension or standalone.
Which tells me the same code could work for either!
So what if we did something similar for the bot.
Creating a script.py with a setup function that starts up the bot when tgwi loads with the --extensions bot flag.
From the bot (as an extension) you can still access all the internals of tgwi and the internals of other extensions like for tts!
If it's something you believe in, then feel free to make it happen, I don't quite understand 😛
Maybe you mean like, the ability to toggle Deepspeed or a few other settings that can't be toggled from shared params
tts_client = ''
# Initialize shared args extensions
for extension in shared.settings['default_extensions']:
shared.args.extensions = shared.args.extensions or []
if extension not in shared.args.extensions:
shared.args.extensions.append(extension)
# Get supported TTS client found in TGWUI CMD_FLAGS
if extension in supported_tts_clients:
tts_client = extension
added that bit at the end to snag the client name
It's a pretty big change to make,
Because I don't really know what functions I can remove (the ones that do the TGWUI startup)
Since that would be handled by TGWUI now.
There will be some guess and check, for sure
The Controlnet and Reactor options in the /image command were initialized depending on whether they were responsive to an API check
Revised that to just use config settings (so they can be used if SD client launched later)
cnet_data was a global variable used by the image cmd.
Now its just fetched if the controlnet option was used in /image
just got around to testing alltalk streaming api.
It also writes a corrupted wav file.
Trying to play it says there's 0 seconds.
But same as before, could open it in vscode and play it there. Hmm.
That's fine I guess, just need to figure out what's wrong with pyaudio now!
get_settings_dict() was also dumb method from Settings() - the way it's structured can just as easily write the direct value
Ditched that
Welp, all the features of SD are now working after starting it later
I'm bringing my 2 bots back online now that my dev one is complete. My dev server has a free rtx 3060 12gb, and seeing about setting up a SD server on there for the bots to use.
The one with 35GB Vram.. What Model should run that now? I was using Mixtral on it for the longest time. Is there anything better now?
ugh
I'm fairly certain the bot would successfully load extensions if they were in CMD_FLAGS.... doesn't seem to be happening now
I'm blaming @terse folio who must be the culprit
either that, or I borked it when I monkeypatched load_extensions()
I think maybe Reality is to blame here...
😄
Oh yes they are
maybe not... ugh
I give up for now. the monkeypatch is probably what is preventing extensions from CMD_FLAGS from loading
I'm making it so that the value for Image Models in Tags and dict_imgmodels.yaml can be either the title OR sd_model_checkpoint
title is the filename minus .safetensors, and is prefixed with {subdir}_ for each subirectory
sd_model_checkpoint is the exact value including the hash from SD WebUI model list
Hmm,
15:30:58.653 #209 ERROR [bot.main]: http://192.168.1.249:7861/sdapi/v1/cmd-flags response: 404 "Not Found"
15:30:58.653 #240 ERROR [bot.main]: Error getting SD sysinfo API: Failed to connect to SD api, make sure to start it or disable the api in your "ad_discordbot/config.yaml"
I'm running https://github.com/lllyasviel/stable-diffusion-webui-forge
Was there something additional to set?
Does your client have flags --api --listen ?
that's probably it. forge does some weird things, but I might have it down now
I use forge
The one thing that is not going to work correctly out of the boxfor Forge is ControlNet support.
See here https://github.com/altoiddealer/ad_discordbot/wiki/troubleshooting#image-command-does-not-initialize-with-controlnet
If you switched to their dev2 branch, then you won't have this issue
(they merged my PR)
15:41:29.181 #3656 INFO [bot.main]: Dundell2 used "/image": "anime style asian man with hat"
15:41:29.181 #3202 ERROR [bot.main]: An error occurred in img_gen_task(): 'payload'
I'll keep attempting some things
You may have made an error migrating old settings
I’m on the road right now but I’ll try cloning a fresh copy of the bot to see if there’s any issues
I’m always updating my own personal instance as I go so I may have overlooked something
Who knows
No luck with both bots. One has additional issues, but that might be something to do with no Characters, but they both have the payload error. My forge ui server does show api enabled, and allowed for local use on 192.168.1.249 port 7861
But that's it for me for a while. I did get both bots atleastrunning for now which is great
I'll check it out momentarily
The third bot, the 70B I might keep just on api though. Too slow for discord
working out one little kink on fixing the change/swap imgmodels
It's a complicated function
Got it resolved. Now to see about this payload thing...
Try again
That message occurs if the menus changed while your discord client is running
17:20:42.234 #687 INFO [bot.main]: Bot is ready
17:21:29.690 #4150 INFO [bot.main]: Dundell2 used "/imgmodel": "Juggernaut-X-RunDiffusion-NSFW"
17:21:30.076 #4079 ERROR [bot.main]: Error guessing selected imgmodel data: [WinError 3] The system cannot find the path specified: 'C:\forgeui\webui\models\Stable-diffusion\Juggernaut-X-RunDiffusion-NSFW.safetensors'
17:21:30.103 #634 WARN [bot.main]: One or more "tags" are improperly formatted. Please ensure each tag is formatted as a list item designated with a hyphen (-)
17:21:30.606 #2026 INFO [bot.main]: Image model changed to: Juggernaut-X-RunDiffusion-NSFW
17:21:47.262 #3656 INFO [bot.main]: Dundell2 used "/image": "rubber ducky on a lake"
17:21:47.262 #1439 ERROR [bot.main]: Error getting tags: can only concatenate list (not "NoneType") to list
17:21:47.264 #3137 ERROR [bot.main]: Error matching tags for img phase: list indices must be integers or slices, not str
17:21:47.266 #3202 ERROR [bot.main]: An error occurred in img_gen_task(): cannot access local variable 'key' where it is not associated with a value
Ok Im booting up fresh install
Ok I am also getting the payload error
that is no good
The error is because I never thought of how to handle the first image model
when none has ever been loaded via the bot
All those other errors you have are due to something misformatted in your tags
erm
hmm
Is that path valid? C:\forgeui\webui\models\Stable-diffusion\Juggernaut-X-RunDiffusion-NSFW.safetensors
It's all new server stuff. I can take a closer look once I get home. I'd like to setup two options for users 30 step and 70 step models, and add in some doifferent sdxl models later on. Be neat
C:\forgeui\webui\models\Stable-diffusion\Juggernaut-X-RunDiffusion-NSFW.safetensors
Might be a C: directory issue?
My family came home so can’t look again for a few hours
I may be missing an os.join() that could cause issue with non-Windows
I hear you, my son is in my lap with an apple watching youtube. Trying to work with one arm
My wife wants to strangle me when son is home and I’m on computer lol
I need to see if there’s an api call to get current imgmodel from SD
Also need to prompt for bot token if it’s not set, and save it. Instead of saying set it manually and exiting
There may be something wrong with hyphens in the name?
What I can say is that from a fresh install with nothing changed besides my bot token, I could change imgmodel then prompt images
Here’s something that’s going to change…
The image model is no longer going to be explicitly saved to activesettings anymore. When the bot starts, if the field is blank that will be whatever current model is.
This error with payload is due to a flaw in the new activesettings initialization
@terse folio another huge thanks - this new framework you set up is super easy to work with now that I get the gist of it
Just deleted that big lump of Config at the beginning, replaced with simple block in database.py
Migration is working to convert old config.py to config.yaml
It's loading.
And I added a prompt for bot token, and now can use that simple config.save()
Beautiful
@vestal python I tried reproducing the Image Model error you encountered, and I did. Then, I did resolve it
I added a hyphen to the model name and got the same error.
But the problem is not due to the filename including hyphens.
The error was due to Forge having stored information about the model, and then I changed it without relaunching Forge
After I closed and relaunched Forge, it loaded the model without errors
does this make it faster at generating ?
hello again
I made a lot of good progress today on bug fixes and ease of use
you are hussling day in, day out
I'm probably going to take it easy this weekend, though
yeah take some deep breath and go on a vacation with your AI friend, lol
any news on an open source Her (GPT4-o replika) or not yet?
Update pushed
It's based on https://github.com/daswer123/xtts-webui
But I stripped all the unnecessary code and turned it into a webserver.
My goal is to build nodes that I could distribute across multiple machines, that's why I don't just use extensions.
As for speed, that webui is slightly faster than alltalktts sometimes.
I don't know what the pattern is.
hey Reality, how is it going
This is my observation on timings
Good, bit exhausted just working outside all day!
Now to catch up on all the online stuff 😸
How're you?
you are decentralising the work needed to be done? therefore making the genetation faster?
like 3 devs doing the job instead of just one miserable dev 😭
3 devs! Now there's a juicy thought
lol
Yes, but there's a limit to how fast that can go.
Batches will be faster with multiple machines yes, but you'll still need to wait the initial time for the first generation to come in.
Also there's the drawback that sending custom trained models could be an expensive task, they're 1.5Gb each
This project has 3 devs - Reality counts for 2, for how efficient they are
Yea, but at the moment without communication with eachother
think of it as a render farm
a central hub splits tasks out to workers
I actually should look into if there's a way to do parallel streaming/generation with xtts.
Because it has some interesting model stuff going on, I'm not sure how i'd accomplish that.
Like your ability to use a custom speaker for each generation
I get you, there is a limit of how fast things can go, like 3 devs, 2 can get the job done, so there is no need for the third one
With text generation, McMonkey explained to run each token in batch, which makes sense.
I think Xtts also generates voice tokens?
therefore even if we add another dev the speed of progress will stay the same
Scaling horizontally (more devs) would allow you to process more requests at a time.
But scaling vertically (better gpus) is faster results
faster results vs more results, why you made that distinction?
It depends on your usecase.
Like some people might value the TTS output coming in really fast, immediately so the bot can respond in real time like a human.
as for more results, this is nessecary if you are running a large service, with 100s... 1000s of active users generating voice
Because at the moment, Xtts only generates one request at a time
Same with TGWI actually, text generation is locked to one at a time.
I looked into making it parallel, but found some roadblocks, like the model class for Exllamav2 was coded to ignore paralell requests.
I'm not sure if this was on purpose or not for the sake of caching, but that's something that needs testing
signing off, have a great night. Don't report any bugs tonight or I won't sleep well 😛
Sleep well!
so in theory you can chunk things up and make multiple gpus work on it
and get faster resukts
Yup, absolutely
I talked to someone who was running 3 instances of tgwi to utilize all their GPUs
oh
Not for big models, they were small, like 7Bs.
But this was for a web service
so is tgwi a fork of a tts pacakge?
Tgwi is short for textgeneration webui ^^
putting his hand on his face in a moment of silence
No worries ahah
so it was easy to setup for him? just plug and play
this
and are we talking text gen only? or also xtts
Yea, I think you can define which GPU it uses
I'm not too sure about tts, but was Text and image generation.
was getting faster results?
faster inferance
about voice, if you have bunch of gpus or a relatively mid speed gpu you can do some tricks here and there to make the illusion of a "realtime" conversation, probably openai done that with gpt4o
I didn't ask, but from my own experience running multiple things on a single GPU, yes.
If you run multiple processing tasks on a GPU, they fight for the same Cuda cores, even if they both can fit in memory.
Like running text generation would utilize 100-90% of cuda in task manager iirc.
Same for Stable diffusion, so they'd slow eachother down
Yea, and openai has the benefit of some pretty big infrastructure.
So they wouldn't have the issue of copying big models across so many machines.
They might even have a large network storage system.
like you said spreading the chunks across multiple gpus then add the mp3 files to the queue
making things seamless and smooth
If using a model that each machine has, yes.
You could split your query into sentences and send each one to a different machine along with the speaker you want to use.
yup yyp
oh so gpus are bad at parallel processing?
one gpu I mean
trickery, illusion, and magic lol
they're designed for parallel processing!
But the little issue is that some programs are so intense they use all the Gpu's resources
conflicts betwn prgrams
TTS generation uses about 20% of my gpu while generating.
It's just not a big enough model to use all the cores perhaps.
I don't know the specifics of what's going on there
i think it's possible to get a gpu dev (or God knows what the job title 😅 is) to make a customised script that takes advntge of everything in an optimal fashion
that is usually how it works
Every task tries to complete as fast as possible
Like if you're running a videogame, the frames render as fast as they can. (this is also based on how fast the CPU can put out frame data too)
But you optionally have the setting to limit the framerate if you wish
"guys relax we just need to coordinate the team"
programs stealing resources "NO IT'S ALL MINE. AND ONLY ME"
yeah i got you
where it lets those higher priority programs run as fast as they need, then in the downtime between instructions/waits run everything else.
(that's just a guess)
you think it's possible to jave a mid range npu, gpu or whatever in the future that enables crazy stuff whike being affordable to the average consumer?
something on rtx 4090 level on ai, but relatively cheap and affordable
cheap to mid range
I've heard people are working on such things.
From custom chips for processing transformer type models, to using analogue for the matrix multiplication since it's okay to lose some accuracy with ML.
Not sure when they will be for public consumption.
I imagine that could take some time, either having to build some sort of interface to existing motherboards, or create your own.
How that works is a bit beyond me
maybe such a thing would work with the normal PCIe lanes that your gpu uses.
you think nvidea will let those ppl/companies sell freely?
Competition would be nice, hopefully that would lower prices for everyone.
Not sure, they could also buy the company.
lots could happen
just hoping for the best
if you are a company wanting to sell to consumers it's better to use existing infrastructure instesd of inventing a new one
you either give the consumer more reasons or less reasona to buy
if you need to change also the mb,you will give it much thought
Mhm, such a company could also go the route of selling to datacenters
it's annoying how nvidea does things these days
hopefully thimgs change to the better soon
ywah sadly it makes sense more to sell to the businesses instead of customers
like groq is doing noe
mhm, and scary to think about those businesses trying to stop people from running their own models.
Trying to create restrictions, hmm
yeah things aren't looking good
sure we get 100b and 200b opensource models
but who's gonna run that?
just 2 or 3 ppl and the rest are other businesses
I like to see a never endng trend of small models emerging 7b and below
and how to stick them together to achieve a gpt4 or 5 level of results
I don't think that could happen with 7b models,
in my testing, most couldn't do simple decision making with multishot prompting.
But they are good at writing well, and fast!
Checked the logit probabilities, and they were all around the same for each choice, showing that the model has no idea
like how mixture of experts works but taking the logic behind it and applying it in a creative way to other stuff
like downloading only the exeprts that you want to use and are useful for your usecase
I don't have the ability to run Mixtral, but would be interested to see how much better that would feel than 7B on its own
each expert is a 3b model
I can't too, this is why I am interested in the logic behind it
not the math, but the higher level logic
that can be applied to alot of other affordable solutions
I feel like all the models would have to be loaded in vram for it to be fast.
Because having one model decide which expert to use, then loading that from ram to vram would take too long.
And perhaps this switching of experts can happen per token? not sure
I think it's possible using mutiple models finetuned to laser focused tasks
and the user can choose the experts
Yea, maybe that's it, the models are just so generalized to text they fail at logic
depending on the use case
I mean.. using small models will be faster
imagine 3b models highly focused on a specific task
laser focused
all of them are finetunes
Yup, I used a 124m gpt2 model finetuned fro turn based conversation by Facebook iirc.
It was way too overfitted 😸
but for speed, that is what I needed
a 3b that can do ml with python, and just that, no other python expertise rather than only this
we didn't have quantization back then
I see, that's very specific
we might realise in the future "wait we don't need a 7b model for this xyz task"
like someone was using gpt4 to tag some text, then found out that it was a waste of money and moved to gpt3.5
it was enough to do the job
mhm, some things sure.
But I think more abstract reading comprehension tasks like understanding the context of a chat log and picking the right function would have to be bigger.
There are so many possibilities for functions a user could define can't finetune them all.
On a similar topic, everyone is trying to throw AI at tasks to solve them.
But people are forgetting about other NLP processing, or things like regex for example!
yes lazer focused, and maybe we can have an easy way to finetune lazer focused models and the market decides the best ones on xyz category
Some simple things like extracting a mentioned email address in a text doesn't need a whole llm query.
I remember this example from somewhere
and those jackpots will get popular
yup, you see what am saying?
but not all ppl know that
you can simply use a python script for that
no llm or nlp model needed
yes yes, we just go "bigger better.." but we forget that a simple python script can solve the task sometimes
that's important the coding part, if we can somehow make llms generate code that helps them it will be reallu good
generating code on demand depending on the task without the user prompting for a code
like email extraction, the model can just go "hmm.. I don't need to waste my resources for that, I can just make and run a simple python script for that"
Mhmm!
and I'd love to see LLMs having the ability to work on large code projects that are more than a little snippet.
But that's a lot of context space needed
or you give the llm access to a giant database of small snippets of code
and the llm just chooses the best one
no need for code generation at all
just choose the best one and run it
kind of AI Coding
Coding+ LLMs is an insane combination
the db is offline or online, if online download code on demand
if offline just use it right away
scripts are tagged ofc, categorised and classified
if an llm is making the decision here, it could probably be faster to just get the emails using the llm.
As it would spend more time writing a script than outputting the list of emails.
But perhaps the context size of processing long texts for the email would cost more time than just writing the script.
Maybe some caching for tasks.
If task = find email -> create code or run existing code.
Now sure what kind of project that would be. it was a strange example
we can use a ready to go code instead of generating from scratch
just a quick search on the database
if the llm finds anything usefull it uses it, if nothing is useful then the llm goes to generate it from scratch and run it
but if the database is really massive, I doubt that there will be nothing useful
Yup that pretty much describes AI using tools ^-^
the irony in this is we can make llms make the entire db
lol
you basically getting free inferance
coz all the code is pregenerated and stored on the db
Of course, it's just caching!
yup and you can make a highly focused db that suits your usecase
just set things up and let the llm create ton of code while you are sleeping at night
this code can be used later
it has to be saved, categorised, classified the right way, to be retrieved later
maybe rag can help
😸 I like the "just in time" approach.
wasting less resources, because you're only generating what you need the first time you need it
ye and the llm won't even generate that code again
I mean.it just has to detect the right code then it will be instantly run
like this
the best code for the task is "generateImage.py"
running the file..
you see, there is no need to turtore the llm (generating the same py file again)... it will just run right there and then
Yup!
and the llm also can manupilate the code using another code instead of generating all that
sometimes a simple .replace is all you need
lol
if you know any crazy someone who made something like that I will have to know about it, so let me know : )
ai agents looks similar, but there gotta be a better way to merge llms with code
so they can handle a large db of coding tools/snippets with almost no inference time needed
First part to work on is the ability to recognise and use tools properly,
I've experimented with using embedding to try matching text to tools.
But the best results came from using 13B models with multishot prompts.
Still made mistakes with some complicated chat logs because llms have trouble with time.
Thinking something the user asked a while ago is the current task.
you mean identifying/choosing the best tool for the job?
oh yeah time is...
time is confusing sometimes
tags might helo
tagging the tools
like this os, system package, system cmds
thing is, sometimes the task depends on multiple messages in the history.
So I can't just cut it off.
Maybe you're having a conversation about a pet, the bot asks questions about it, you describe it.
at the end asking to create an image.
It would have to take into account all the previous information.
interesting
On another note, I discovered that the speaker in Xtts is the last step in the audio processing.
Maybe one could generate the audio tokens at the same time, then loop over the results applying the correct voices to each.
But that would require editing the generation code/internals, I really don't want to do that!
But possible, one day!
hmm you gotta make a tool called "objectDetailsIdentifier"
or sceneDetailsIdentifier
a script that read the logs and extract relevant info
mhm, breaking tools down into more tool calls.
would help!
Yea, but I have an example where the bot calls the wrong tool as it's stuck on the previous one.
I'll see if I have a screenshot
okk
#general message
Not sure if I posted a broken example here.
But this is the kind of tests I was doing.
And I see now why the LLM got confused.
because one of the tools is "chatbot"
And I guess it doesn't think chatbot should be used as much, and tries to use different tools.
Especially if you mention something, it might mistake it as user info to save or something continuously.
Also just noticed we're in the ad_discordbot thread 😸
Hope @halcyon quarry wont mind
If I could get grammar working, I bet results would be better
I don't, but I do give a list of the tools I want to use.
I don't enforce it using grammar though
like "the only tools you can use are xyz rtg bvc "
no, just through multishot prompting,
during tests I also like to see the new function ideas the llm might come up with if it ever does.
Like that idea of cats[0] and cats[1] I wouldn't have thought of that
the llm is miswording your tools names?
It's not, and i'm not doing anything with the tools, i'm just testing if it can understand which to use for the given chat log
Another thing I did is include
system: used tool (generate image... whatever)
in the chat log
So the llm knows it already ran that
It came with it's pros and cons, some models got confused by this, some got the point
hahah had this saved in my last run of that python notebook
Somethng I did lately might help you, so I made a dict, the pairs are trigger_word, fact
like this
{'Alex':'Alex is John's childhood friend, is living near the park', 'Potato':'John hates potatos'} etc..
so when I prompt the llm
hey John, lets visit Alex
I have script that detects any trigger words then adds a notes as a fact
hey John, lets visit Alex (Note: this is a fact, Alex is John's childhood friend etc... )
the llm answers is way better this way
same with the potato. even if I include that John hates potatos in the system prompt nothing happens, I mean the llm is acting dumb
Yup that's another problem, getting LLMs to listen to RAG.
Also nice, the dict thing is similar to this bot's tag system it sounds!
I like that idea, a simple solution like we talked about earlier
llm: I brought some potatos
john: potatos!!
llm: yes they are delicious
without the trick ^
llm: I brought some potatos
john: potatos!! (Note: This is a fact, John hates potatos)
llm: oh no, sorry I forgot you don't like potatos
with the trick ^
this is really powerful
I didn't try giant text facts
but a small sentences as facts
I don't think a giant wall of text wilk work
but a tiny fact will
brief and straight to the point
Yea, that's where that "save_user_info" idea is about.
Building these key value pairs while chatting so they can be called upon later
the insane thing is, it becomes nornal means I don't notice it, it just work on the backend
automatically
Amazing!
if the scriot detects any trigger words in my prompt it adds notes before sending it to the llm
Could also scan the llm's output and put in these annotations, maybe have it rerun the generation
maybe it comes up with something that wasn't in the original prompt, but that thing has some important context
I diddn't get it explain a bit
Something more roleplay like, where you might be moving through a space.
The LLM decides there should be a potato store for some reason to your left.
in the 2nd pass, the LLM learns John doesn't like potatos, and corrects it's response to say "ohno, john has found himself by a potato store now"
(of course the bot wouldn't generate "potato store" without being told about the potatos in the first place, but I can't think of a better example atm)
Like letting the LLM "remember" while it's talking
I’ve been missing out on some deep talk here
I love potatoes
They are nice
Just got home, I'll reboot all 4 machines and try again
my imaginary John hates them
So a little bird may have told him
hmm like generating the space? location? and the objects in it?
Also git pull if you can
Good updates
Like with this example from my message earlier.
The bot generates a picture of a dog as it thinks this is a pet that it could have.
Let's say Bot+Dog = trigger for some context.
It could trigger that the specific type of dog is a German Shepard and update the prompt for image generation to match this
I forgot something that might be important, the script is set to mention the fact only one time in X amount of last chat
I mean the fact will be mentioned IF it wasn't mentioned already in the last 4 chats (or any number)
so 2 conditions here, the presence of the trigger word potatos and the fact not being mentioned in the last 3 chats
I do this to avoid repetition of the fact which might lead to an overeaction, if the fact is mentioned a lot of times in the last 4 chats the character will overreacte
like this
I am very sorry, *cries* I will nerver give you potatos again *histericly crying*
😂😂
Absolutely!
I'd have some sort of flag set on the triggers that indicates it's currently in use until that message with the (note: here) is deleted then it releases
also wow, over reaction!
would be funny to encounter that while browsing the web, repeatedly trying to finetune what your search is because the bot isn't searching the right things
unexpected in that context
wait so you are extracting facts and saving them automatically?
that's insane
not doing anything with them yet, this is just testing the prompt design to see what would work later down the line
but yes, some simple regex would work on those results!
then you can add them to the dict and get a never ending loop of improvements
key, value pairs
trigger, fact pairs
good luck with that
This is just some fancy code to create the system prompt through random shuffling and picking random examples.
But this is how the examples are defined.
yep, would have to create a new tool, or add a lifetime value to the save_fact function
this will be saved and you will always be sick in the eyes of the llm
that's an interesting idea
you will save a temporary fact as permanant
you are not alwyas sick
so that's a bug
a solution is to update/remove those facts
with a new one
mhm,
Another idea I have been playing around with is figuring out how to build a RAG database that also considers time, favoring more recent information
I am well now, woow it was a rough 3 days
the sickness "fact" will be deleted
or updated with "John was sick but he's well now"
maybe a call to look for other facts relating to this and updating them as needed.
Could use embedding to search
"I am well" "I am sick" probably would be similar in some way, not sure, another thing to put on the todo list to try!
if you generalise on all facts that's might cause bugs (that's how I would do.it)
i would make the llm check every fact against each other to delete or update the contradactory ones, which will cause a mess
so your way of doing is better, narrowing down things
i wonder how tags can help in this?
Tags can help with anything 🌈
Mhm, would want to have them stored in a tree,
maybe the prompt should be more sophisticated like include the user's name.
save_user_info('cats', '2')
actually saves to dict(users[-1]: {cats: 2})
How do we feel about a ‘regex_prompt’ tag?
I dont know how that works
Could open up a lot of possibilities, I like it
Good news is you don’t need to - LLM knows this stuff
For optimization, you could also let people decide the range they want the regex to run on.
The whole chat? or just the last message(s)?
Well that’s the beauty of the tags system
oh that's already part of it? ^-^
still haven't messed around with it!
Well, I haven’t added any tags yet to Edit History but that’s pretty interesting thought there
edit history?
tags either have a condition (trigger and/or RNG) or they don’t (apply always). Eithrr to user msg, llm reply or both
okk
Reality suggested that regex could be applied to the entire conversation and I thought perhaps they meant including history 😛
I thought we were talking about tags like image tags etc
nothing naughty :/
This channel is nsfw no worries
Aye, figured out what was wrong with pyaudio!
It was the audio format.
I had int16 correct, width 2.
Issue was that pyaudio assumed it's a signed int, only 0 to 65536
But what we want is -32768 to 32767
Figured this out by hovering over the open function and realising the default value is wrong.
I didnt mean that 😭
I thought you meant regex for searching the tags.
Like match anything \#[0-9a-f]+\ and trigger a tag that converts this hex code to a color name.
Nice
I had a feeling it was going to be something like that
Regex substitution would also be cool
but don't know what usecase it would have yet
It's good to have a library of tools!
The use case I have already personally, is that I have an effective prompt to get the chat context to write a pretty nice image prompt, with chat history intact. But they always say Oh blah blah blah and this that, here’s your prompt:
grammar is failing to help
Regex would trim out everything before “prompt” easy
It happens like clockwork
I would take this from the other side and have the image tag just extract the image prompt.
Because that whole text prompt "generate an image" is temporary, so no need to change it
Reality I think categorising the facts might help in narrowing down
it's like folder - sub folder - sub sub folder relationship
Character
- John
-
- School Related
-
-
- Grades
- 90/100 on math
- 80/100 on english
- 60/100 on geography
-
-
-
- Club
- Joined basketball club
- Have a match next June 22th
-
at this point AI will have to come up with the categories haha
yeah totally automatically
Yes that’s the idea
generated
The text related tag behaviors can apply to either user msg or llm resp
all saved in a yaml, json file or a actual folders wih text files
Anyway, more tools more power
pawah
you gotta give autonomy to the agent so the agent can use those tools on it's own when sees fit
Someone wrote a fork/patch or something for discord.py that enables voice receive.
I used this to create a talking status indicator.
But I think it's capturing the raw wav data for each user.
if we decode that we could use whisper to STT to LLM to TTS 😸
I also think I remember seeig some forks for Whisper that enable streaming of text from stt.
so the response times could be pretty fast if done right
last time I did something like that was with Gpt2-tiny haha.
That was a lightning fast model.
As well as a really fast tts.
And the small voice rec model I used was pretty bad with the outputs, so model got confused a lot
I suppose they chunk the audio in silent moments
bla blabla ----- so yeah let's go there ----- yeh yeah
---- representing silence
that could be it.
I was thinking something closer to streaming per-word
Like Zoom's STT.
It writes as you're speaking and corrects itself if context changes the meaning of a word to another that is spelled different
hmm but it's.too late when the bot or agent speaks on real time
like a human, you can't erase what you already said
loudly
yup, you'd have to track what the bot said as it's talking so it can handle interuptions
well about the STT with zoom,
if we have something like that, you would put the text in a buffer, like wait half a second to make sure it's correct
Also you wouldn't be streaming text to the llm, that's a lot of prompts per second
it would be done in chunks when needed
why stt > llm > tts?
Ok so my Ogma bot is now working. This line in he errors. I think my old character yaml was part of the problem.:
WARN [bot.main]: ** No extension params for this character. Reloading extensions with initial values. **
swapped him to your example minty yaml and he worked
i think it will be a while before someone trains something line Gpt4o with audio input/output built in
Oh, I sould get back to working on STT->TTS
is it really one model? or multiple models?
no clue,
But I remember seeing a post that someone made a model that accepted speech directly and this was way faster than STT->LLM
So it's not a stretch to assume openai has similar tech
why are you doing this? why going from stt to tts?
repeating back what already said.. but why?
Transcripts on transcripts
voice to voice native model? I can't think of something like that, I'm like in disbelief
interesting, doing evaulation tests of stt models?
but I guess it's all tokens in the end of day
predecting the next token (voice chunks)
Like LLava for example, the model is trained to accept the embeddings of an image.
Instead of doing something like:
Image -> Blip (captioning) -> LLM
or:
Image -> Blip for initial caption -> LLM (with prompt ask about image) -> Blip (ask it questions) -> LLM
I've done that with Blip, it didn't work amazing, but it was something!
I did get the 'Hey Chat' to work in a webapp example, to then send a speech request -> whisper -> LLm -> Alltalk -> text+speech back. Hands free response which wasn't terrible with llama 3 8B 4Q
But... Y'know talking is overrated. I barely have conversations all day to begin with..
similarly, speech (the tone of your voice, the emotions...) can be encoded as a vector.
That could be fed to the LLM that's finetuned for that
Again, just guessing, I didn't read too much into that model
that's cool!
llava actually converts the image to text then processes the text using the llm?
Yea, I feel that,
human interaction is also important!
it converts it to special tokens, it doesn't convert it to text.
A LOT of special tokens, like 800 of them haha
If you sometimes use TTS and sometimes don’t, you’ll see this warning frequently
but why?
As long as we see no errors 😅
tokens. more tokens!! tokens everywhere!!!
am I a token 😐?
If you appear enough in a training dataset, maybe there is a special token just for you :P
Happened to a few people actually
yaay my token
Good news fellas, the bot now prompts for your discord bot token if it’s not in config.yaml
those usernames got pruned from the training data because it was garbage to train on haha.
And saves it !
and then the LLM lost meaning to those names, and they became glitch tokens
(from a youtube video about this topic)
hey openai, I demand you removing my this chat from your training data. now!!
guys say hi to openai
nice!
even more user friendly

😭
The only manual step now (besides setting up discord bot technically) is moving bot.py
lol
Well...I'll need to spend part of this weekend to create my character yaml's, and learn more about that. Swapping to M1nty surprised my users 'Who's this bot??' 
same, even through we talked about it moments before, it came as a surprize the first time I ran it on my main bot.
Did he speak of cryogenics? 😛
No, but lots of Tech comments
if ('change_llmmodel' in tag)
and
(not (llm_payload_mods.get('change_llmmodel'))
or llm_payload_mods.get('swap_llmmodel')):
this is how python interprets it
I did this so Incould initialize the mods as an empty dict
i'm not familiar enough with tags to know why "not change_model" but "yes swap_character"
I want to ensure that the first tag processed with either change_X or swap_X is the one that will have effect
Because I’ve taken care to keep their priority in check
Reality is it possible to make a group chat of multiple characters, each one have their own system prompt etc.. that doesn't break?
If you can write it better, please 😛
like if you provide enough details for evey charavcter in system prompt
and swap system prompts aka swap characters
each character is it's own prompt with it's own system message.
So this should work.
@halcyon quarry should be able to answer how to do that better, probably using tags to swap the character prompts
just trying to understand it first
I was originally capturing both tags, then just prioritizing “change” over “swap”
so if that will work, why the current approach doesnt work (idk exactly what it is)
but I hear often that groups chats of llms doesnt work
okay I think I see,
with the first item
If flow is a tag, but not part of the params, add it to params.
maybe bcz one character/system prompt/agent generates all the group chat?
thta's the problem?
Yes, only want to get those keys on the first instance. If it exists in the dictionary, ignore that key pair
But with swap/change.
You have some possible bugs I think.
What happens if swap_character is in llm_payload.
But not change_character?
The next line of code would indicate that it writes tag[change_character] to llm payload
and then on the next if statement, you have the same condition, which will then write "swap_character" to llmpayload
so to get both, you only need "swap_character"
If X and not (A or B)
If either A or B is in the dict, it will be ignored
When you switch characters, you're switching the prompt that the history gets placed into.
This would change the system prompt and all character information if you wish
For example through using the tag system to generate an image
I believe everything is replaced to the image generation prompt/character
Woops
Got it,
I had a feeling it was a binary operations issue
it can be confusing to write sometimes!
But, if you want one or the other, you can use if, elif statements.
It will run the first, and pass
or skip the first because it doesn't match, and run the next
your code is written as
If X and (not A) or (B)
if B exists, the 2nd half will always be True
ohh
sorry
you're right
it's git's hightlight that messed me up
Sorry for that confusion 😅
Planned changes:
- Will stop saving sd_model_checkpoint and vae to activesettings. This was necessary before I added api model loading.
- Consolidate on_msg_gen n hybridllmimggen to hybrid_llm_img_task as mentioned before
- regex_prompt tag
- all things on TODO list (📍pinned msg)
ahaha just noticed this
some of them might be CTX from commands
i'll take a look at that now
I was contemplating using ictx in places where I send ctx and i to same place
ctx is short for context, it would be fine to use that meaning interaction.
it's the context of the interaction/command/message!
So far, haven’t needed to do anything limited to one or the other
i and ctx objects have a few differences
yea, I'm just talking about the name
yep 😛
anyway, this is an example script.py
import gradio as gr
import torch
from modules import chat, shared
from modules.text_generation import (
decode,
encode,
generate_reply,
)
params = {
"display_name": "AD Discord Bot",
"is_tab": False,
}
def setup():
"""
Gets executed only once, when the extension is imported.
"""
pass
this is the entry point to TGWI
we could put the bot.run() call in setup()
I only want to distinguish the labels in case some day I do have a reason to handle one different from the other
spawn it in a new thread
if you ever need to check what an object is (because sometimes you will have A or B)
You can use
isinstance(X, A)
this also works for baseclasses, like checking if discord.Member is a discord.User which is True.
And my idea was to have a version of "Cmd_flags"
that you could pass to the bot as well if you wanted to.
But I think that's covered by prompting the user in cmd now
That could be good bc apparently it’s not intercepting TGWUI flags anymore
from the file
this seperates the flags because I don't think there would be a way to inject a new arg parser into tgwi if this is an extension
Since extensions load after that file runs
Sounds good to me!
Anything you’d like to do, it’s our shared canvas to paint upon
I believe this does the same
using the elif statement it will ignore the other if the first matches
Leave that as I had it though 😛 Pretty sure that won’t work there
Sure!
it is iterating one tag key at a time, so first one is change_char. Ok, now its added. Next key is swap_char…. Also added!
Because it’s not checking that the other is added yet