#ad_discordbot (Fork of Fork of xNul's bot)
dim the lights 🤣
Lady chatbot, set the mood 🕯️
but can it be triggered by the bot itself?
Like all tags, yes
i meant if the bot says dim the lights instead of me
'search_mode: llm'
you made all tags work with search_mode: llm?
last time when i logged into the bot's account the tts tag didnt work
It should work
search_mode: userllm works?
Of course
if those work too, i think i have a very easy plan for stt, if i'm correct
But that would trigger for user and llm
As far as I’m aware all bot features work as documented 🤓
last time i messed around with tags you told me that most of them only work with user
I had moved the TTS tag handling to a “process_generic_tags()” method
and you were thinking about using the censor related code to make tag work for llm or something
¯\_(ツ)_/¯
I think I did do that
if it works i'll try everything later
It reviews TTS replies to check for censoring before sending
I’m not 100% sure if API response_handling / workflows are injecting saved variables correctly - I need to take another look there.
Actually I’m pretty sure it is but I just need to make it very clear you need to include an “evaluate” step to convert the string to list/dict/int/float/etc
idk I just need to look again
Can definitely see the light at the end of this tunnel I’ve been in the past 6 weeks though
I see that both match_tags() and apply_generic_tag_matches() are applied one time if no LLM gen, and twice if yes LLM gen
Traceback (most recent call last):
File "D:\text-generation-webui\ad_discordbot\bot.py", line 2089, in llm_gen
async for resp_chunk in process_responses():
File "D:\text-generation-webui\ad_discordbot\bot.py", line 2050, in process_responses
chunk = await stream_replies.try_chunking(base_resp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\bot.py", line 1983, in try_chunking
await apply_tts_and_extensions(chunk) # trigger TTS response / possibly other extension behavior
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\bot.py", line 2000, in apply_tts_and_extensions
audio_fp = await api.ttsgen.post_generate.call(input_data=tts_payload, extract_keys='output_file_path_key')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1413, in call
results = await handler.run()
^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1712, in run
step_result = await method(result, config) if asyncio.iscoroutinefunction(method) else method(result, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1765, in async_wrapper
raw_result = await func(self, data, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1880, in _step_call_api
client_name, endpoint_name = self.resolve_api_names(config, 'call_api')
i just updated the bot
🙉
Seems like the response handling returned wrong format data
I debug this tonight
i've noticed that the alltalk extension stopped working, idk if i touched anything, and with the extension the bot does not join the voice chat
Actually I think it’s something else, but in any case, I need to improve the error logging here
If the TTS Api client is enabled, it will override the TTS extension
So if you go into the api setting dict and change the alltalk API to disabled, the extension will work
i turned api off as it is failing
I’ll add a log statement for that behavior
It’s mainly so you can manually kick the bot from voice channel and still have it TTS but not play it in VC
Then rejoin it
The only other alternative is the /toggle_tts command which will make the bot leave/join VC but also enables/disables TTS
idk why i dont have that command
but i do have speak
Try closing / opening your discord
i did that. but anyway, search_mode: llm means from the llm and not from discord?
userllm means it can trigger from either user text or LLM reply
user means from user text only
llm from llm only
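A minimal sketch of how those three search_mode values could gate tag matching. `should_search` and `find_matching_tags` are hypothetical names, not the bot's actual code, and falling back to "user" when a tag has no search_mode is an assumption made only for this sketch.

```python
from typing import Literal

SearchMode = Literal["user", "llm", "userllm"]
Source = Literal["user", "llm"]

def should_search(search_mode: SearchMode, source: Source) -> bool:
    """'user' checks user text only, 'llm' checks the LLM reply only, 'userllm' checks both."""
    if search_mode == "userllm":
        return True
    return search_mode == source

def find_matching_tags(tags: list[dict], text: str, source: Source) -> list[dict]:
    """Hypothetical helper: tags whose trigger appears in `text` and whose search_mode allows `source`."""
    text_lower = text.lower()
    return [
        tag for tag in tags
        # "user" as the fallback search_mode is an assumption for this sketch
        if should_search(tag.get("search_mode", "user"), source)
        and tag["trigger"].lower() in text_lower
    ]

tags = [{"trigger": "dim the lights", "search_mode": "llm"}]
print(find_matching_tags(tags, "Okay, I'll dim the lights", "llm"))  # [{'trigger': 'dim the lights', 'search_mode': 'llm'}]
print(find_matching_tags(tags, "please dim the lights", "user"))     # []
```

In this model "llm" means the text the bot generated, checked before it is sent, not a Discord message the bot has already posted.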
so discord message from bot doesnt count?
From another bot?
from the same bot
i was thinking haha i have the stt done, i just make the bot itself send the message and add the tag and voilà, stt done :v
The bot does not analyze its sent messages to trigger tags - it analyzes the text it generated, and will trigger the tag match before sending the reply
that made my life tougher
but it doesnt work either
so these are my tags
and i didnt say the word but made the bot say it
and it was my fault
😅
it didnt work either for me
Of course the tag triggers, it's just that TTS was already processed by then
As you said, I do need to slip in some special handling specifically for this scenario, in the same place that censoring can be applied
i think that the silence doesn't work because my bot with the extension is a little bit bugged
i'd like a tag that makes the bot itself generate text; i think "should_gen_text" is not the thing i was looking for ;-;
Check that the names here actually match the names in your enabled TTS client
the main issue here though is just bad error logging on my end
The actual error is a bit ambiguous from your error log
a little busy right now, ill try later
@valid crypt I found the issue
it was bad code on my end
I just pushed the fix
really dumb mistake
amateur level 😛
resolve_api_names() was async (and I was not awaiting it) but was not supposed to be async
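Why that bites: calling an `async def` without `await` returns a coroutine object instead of running the body, so unpacking it into two names fails, which is likely why the first traceback stops on the `client_name, endpoint_name = ...` line. A small reproduction with hypothetical class names and an assumed config shape:

```python
import inspect

class BrokenResolver:
    # Mirrors the mistake: declared async even though nothing inside awaits
    async def resolve_api_names(self, config: dict, step_name: str) -> tuple[str, str]:
        return config["client_name"], config["endpoint_name"]

class FixedResolver:
    # The fix: a plain synchronous method
    def resolve_api_names(self, config: dict, step_name: str) -> tuple[str, str]:
        return config["client_name"], config["endpoint_name"]

config = {"client_name": "alltalk", "endpoint_name": "post_generate"}

coro = BrokenResolver().resolve_api_names(config, "call_api")
print(inspect.iscoroutine(coro))  # True: the method body never ran
try:
    client_name, endpoint_name = coro  # unpacking a coroutine object raises TypeError
except TypeError as err:
    print(f"TypeError: {err}")
finally:
    coro.close()  # avoid the "coroutine was never awaited" RuntimeWarning

client_name, endpoint_name = FixedResolver().resolve_api_names(config, "call_api")
print(client_name, endpoint_name)  # alltalk post_generate
```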
Not possible really beyond should_gen_text / should_send_text
It makes sense to honor “should_tts” from bot reply - I will add this
but should_gen_text does not make the bot generate text, and is there any reason not to look for tags in the bot's discord message?
a tag that sets chance to reply to itself to 100% once?
but it must detect the tag from the discord message and not from llm :(
i accidentally updated tgwui and got an error launching the bot; later i did git reset --hard, updated the bot, and got this:
23:14:11.696 #2098 ERROR [bot.__main__]: An error occurred in llm_gen(): attribute name must be string, not 'NoneType'
Traceback (most recent call last):
File "D:\text-generation-webui\ad_discordbot\bot.py", line 2089, in llm_gen
async for resp_chunk in process_responses():
File "D:\text-generation-webui\ad_discordbot\bot.py", line 2076, in process_responses
await apply_tts_and_extensions(full_llm_resp, was_streamed=False)
File "D:\text-generation-webui\ad_discordbot\bot.py", line 2000, in apply_tts_and_extensions
audio_fp = await api.ttsgen.post_generate.call(input_data=tts_payload, extract_keys='output_file_path_key')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1413, in call
results = await handler.run()
^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1712, in run
step_result = await method(result, config) if asyncio.iscoroutinefunction(method) else method(result, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1765, in async_wrapper
raw_result = await func(self, data, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1881, in _step_call_api
api_client:APIClient = api.get_client(client_name=client_name, strict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 206, in get_client
main_client = getattr(self, client_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: attribute name must be string, not 'NoneType'
also i didnt touch that, i have it too
also, like in my asr bot, i could send those tts tags, but if one day i managed to add stt, how could i do it then?
Sorry that you’re having bugs - it’s helping me though 😆
Updating TGWUI shouldn’t be an issue with my bot
Unless he like, just made some big change yesterday
Again, this bug is probably my fault here
Will solve this soon
What other issues were you having with TGWUI?
before doing the git reset --hard i got a syntax error, but it disappeared 🤷♂️
so no more
TGWUI used to expect a few params to be a comma separated string. dry_multiplier something, and custom_stopping_strings / stopping_strings
If you check dict_basesettings from settings templates, the values are good for new TGWUI
i've noticed that tts doesn't work if it's not written here
and i think it was buggy because i had 2 at the same time or something
¯\_(ツ)_/¯
In latest bot version it’s nested under ttsgen
right, i cloned main
On main, the ttsgen dict is ignored
i cloned to debug ^_^
On main, there is no TTS api so that makes sense
but i should clone the api branch :V
Er I think I did have some rough hack thing
Yeah, I’ll see if I can fix that Nonetype bug
you got an extra } in the template
i have to sleep as soon as i can, and i spotted this new thing that i've never seen before i think, from TGWUI maybe?
Do you use --extensions flag with TGWUI CMD flags?
Or have it enabled by default in TGWUI settings_template.yaml?
i noticed that i had alltalk_remote in settings.yaml as an extension
so no idea why alltalk_tts
I fixed the bug with Nonetype
i removed settings.yaml, only left that, and same issue
again, my bad
ok I see the issue with alltalk
extension method just isn't going to work with alltalk_v2 on my bot moving forward - will have to be API method
also i'm 99.99% sure that the should_tts tag is not working after doing a fresh install; by changing should_gen_text i proved that the tag was detected (the error is just because the server is not on :v)
should_tts tag works if it's the user's text
But it does not work if its the bot's text
i did that
until I add that modification I said I need to add
i triggered that tag :)
hmmm
i even thought that i mistyped silence or something, but changing should gen text to false, i was sure that the tag is triggered
Yeah, I see. I actually didn't move it to "generic" tag processing because it wouldn't matter, currently
I could move it right now and it would trigger after LLM response, but TTS would already be handled
I'll see about sneaking it in like the llm censoring...
as in, I'll see right now
also just asking, why look for tags in the llm response instead of the message sent by itself in discord? the result wouldnt be too different but makes me happier :v
hmmmmmmmmmmmmm
This is for your super niche use case here
The point of even checking for tags on the response, before sending it, is to further manipulate the result before sending it
Maybe you could do something with 'Flows' or 'persist'
I looked into it, and having the LLM reply trigger should_tts: false is way more trouble than it's worth
as for not playing tts through voice chat, i think that works with the pause tag
Well it would still generate the TTS
with streaming it actually does not take too much time, if energy is not a concern
The censoring really makes sense to me, so I did come up with some creative way to check that. Specifically checking if LLM reply text should trigger should_tts seems excessive to me
that makes sense, but a lot of things can be done easily if it checks messages too, maybe only check if not from llm?
Final answer, not adding that
I have "make bot able to read recent discord messages (ones not in bot history)" in my pinned messages.
the pause tag is very useful; maybe in the future the bot could literally create time-based events, like turn off the light in 2h
or a separate script that sends the dim the lights tag itself at a programmed time
When I get around to it, it's possible I could have some sort of 'tags' handling for this
I've had a thought for a "create_task" tag - asyncio tasks can be scheduled like that
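A rough sketch of the "create_task" idea: schedule a coroutine that sleeps, then applies a tag. `apply_tag` and `schedule_tag` are hypothetical stand-ins for whatever the bot would actually do, and the delays are in seconds so the demo finishes quickly.

```python
import asyncio

applied: list[str] = []

async def apply_tag(tag_name: str) -> None:
    # Hypothetical stand-in for whatever the bot does when a tag fires
    applied.append(tag_name)

async def _delayed(tag_name: str, delay_seconds: float) -> None:
    await asyncio.sleep(delay_seconds)
    await apply_tag(tag_name)

def schedule_tag(tag_name: str, delay_seconds: float) -> asyncio.Task:
    # Must be called while an event loop is running; keep the returned Task so it can be cancelled
    return asyncio.create_task(_delayed(tag_name, delay_seconds))

async def main() -> None:
    task = schedule_tag("dim the lights", 0.1)
    later = schedule_tag("turn off the light", 10)
    later.cancel()  # a scheduled tag can be cancelled before it fires
    await task
    print(applied)  # ['dim the lights']

asyncio.run(main())
```

Tasks like these only live as long as the bot process, so anything meant to survive a restart would need to be persisted somewhere.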
Well actually....
Scheduling stuff sounds interesting to me, just need an idea how to streamline it in a sensible way
The only scheduling stuff I have atm is auto-change imgmodels, and spontaneous messaging features
i was thinking of checking a folder of yamls or jsons that contain the tag that should be sent and when (one time or on a schedule; if one time, delete the file afterwards), if the bot checks its own messages
Now, I could add something like another parameter for tags like the call_api tag or run_workflow tag - param like send_in_x_minutes - which would not be practical to use directly.
HOWEVER
If you used a Flows tag that secretly asks a specialized character context to decide the timeframe
It can see some recent history and then reply with the minutes value
flow tag
🤷♂️ neat things but not quite practical lol
Maybe, maybe. idk. There could be some neat ideas there, with using the flow tag to have a character decide when to run a workflow/api call
The flow tag is super cool though, you should look into it sometime
why couldn't it check both? i don't think they conflict: from your words, checking before sending is to manipulate the result before it's sent, and a message that was already sent can't have tags that manipulate it before sending
In bot.py search for “process_generic_tag” and also search for “process_img_tag”
Can also look at “match_tags”
the LLM’s reply is checked in the same way as user text
Generic tags are applied, and img tags if applicable
only found this
Maybe I forget the exact names heh
if you allow me, my approach to stt would be to match tags from the bot's discord message and make should_gen_text: true actually generate text (if sent by the bot)
although i dont know if i can do the second part ._ .
Try flow tag
The generated text can be the “user prompt” for the next flow step
but for the case of stt, there is no generated text
what im looking for is to trigger llm with the result from the stt
i was reading on_message, queue_message_task, ...
and im 😵💫
erm i got an idea
message_manager just factors any of the "human-like" behaviors (delayed responses, etc), before queueing it to the message_queue in task_manager
message_manager also stores and sends the final messages if they are supposed to be delayed
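A minimal sketch of that split, assuming an `asyncio.Queue` stands in for the task_manager's message_queue and a random sleep stands in for the "human-like" delays. The class and method names are hypothetical, not the bot's.

```python
import asyncio
import random

class MessageManager:
    """Hypothetical: applies human-like delays, then hands the message to the task queue."""

    def __init__(self, message_queue: asyncio.Queue, max_delay: float = 0.05):
        self.message_queue = message_queue
        self.max_delay = max_delay

    async def queue_message(self, message: str) -> None:
        # Simulated response delay before the message reaches the task manager
        await asyncio.sleep(random.uniform(0, self.max_delay))
        await self.message_queue.put(message)

class TaskManager:
    """Hypothetical: processes queued messages one at a time, in arrival order."""

    def __init__(self) -> None:
        self.message_queue: asyncio.Queue = asyncio.Queue()
        self.processed: list[str] = []

    async def run(self, count: int) -> None:
        for _ in range(count):
            message = await self.message_queue.get()
            self.processed.append(message)  # real code would run the LLM/image task here
            self.message_queue.task_done()

async def main() -> list[str]:
    tasks = TaskManager()
    messages = MessageManager(tasks.message_queue)
    await asyncio.gather(
        tasks.run(count=2),
        messages.queue_message("hello"),
        messages.queue_message("dim the lights"),
    )
    return tasks.processed

processed = asyncio.run(main())
print(sorted(processed))  # ['dim the lights', 'hello']
```

Processing order depends on the random delays, which is the point of putting them in front of the queue.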
i just checked that if reply_to_itself: 1 it actually matches tags
yes, its own message would be read in as a "user" message
my brain stopped working
ah, if the bot always includes should_gen_text: false before sending, then we get tag matching on the bot's message without doing a chain
uhhh, smells like sh, i'd better sleep first
i was trying to understand and add stt result as input
well 😴
I would need to start messing with STT to understand, I don't quite get how that works / factors in
i mean, i'm already done with the job: my code gives the transcription for the voice channel the bot is in, grouping messages if multiple users speak at the same time,
something like this
John: yes
Marcos: bruh
based on display name, although i think that it only works for 1 guild...
and i think i'll remove the grouping mechanic, as it is more useful just grouping them
this is how i did it, i think it was under STT PROCESSING or something
the bot already does the stt but i just dont know how to process it so i made another bot to read the .txt :v
works
so that is the progress ive done
Looks like a good place to manage that attribute
been spinning my wheels all day trying to generalize the image model management
(●'◡'●)
Hoping to wrap it up tomorrow in 15 mins or so
buddy! how does it look? 😃
it should match tags from the bot's message, and at least it worked with the tts pause tag
You've stripped out a lot of important lines from on_message(). Ok, I understand, the existing code is below / cropped out from the screenshot
Alright I see what you've got going on...
the thing I don't like about that is that it's not configurable, and can bypass current configuration
:(
I applaud your effort though 🙂 I'll mull that over
erm ive just updated and
Did you null / remove the output_file_path_key: in the ttsgen settings?
i removed
Is that AllTalk printout?
🤷♂️
this happens when i remove or null that
I might just have a debug print statement in there somewhere that is printing the "bytes" response from alltalk
Is it otherwise working correctly?
local AllTalk works, and that one does not work
Default - this is what yours looks like? (aside from URL)
yes
and i dont think so, the remote alltalk console didnt print any requests
Well then I think maybe something is borked in the TGWUI settings
If alltalk is not generating anything, then that is a pretty strange printout....
hmm.
the thing is too fast
i can send more messages but that is all, the bot is not replying and etc
another one, although i dont think that this info is useful :v
That's very odd...
In modules/apis.py
Go way down to line 1721 and uncomment this one
And if you don't mind,
try just doing the first step
response_handling:
  - extract_key: output_file_url
    save_as: output_url
(remove the last 2 steps)
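Roughly what that trimmed step asks for: pull one key out of the API response and keep it under a new name for later steps. This is a simplified sketch with a hypothetical `apply_response_steps` function, not the StepsExecutor in modules/apis.py; only the `extract_key` and `save_as` fields come from the config above, treating both keys of one list item as actions of a single step is an assumption, and the sample response is made up.

```python
from typing import Any

def apply_response_steps(response: Any, steps: list[dict]) -> tuple[Any, dict]:
    """Apply each step to the current result; return the final result and any saved variables."""
    result = response
    saved: dict[str, Any] = {}
    for step in steps:
        if "extract_key" in step:
            result = result[step["extract_key"]]  # narrow the result to one key
        if "save_as" in step:
            saved[step["save_as"]] = result  # keep it for later steps / formatting
    return result, saved

# Made-up response, loosely shaped like a TTS API reply
response = {"status": "generate-success", "output_file_url": "/audio/out.wav"}
steps = [{"extract_key": "output_file_url", "save_as": "output_url"}]

result, saved = apply_response_steps(response, steps)
print(saved)  # {'output_url': '/audio/out.wav'}
```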
Yeah - I'm working on updates so my lines shifted a little
Could uncomment both of those
Ya know what,
The thing that gets me is, why is alltalk not printing anything....
in its cmd window
Alright - I'm going to go out on a limb that you're trying to feed text into this from the other bot or something?
Maybe something you're messing with is the cause?
Anyway -
I can see in your video that it is indeed triggering the response_handling
which is what it should be doing
i think you can reproduce it too: nulling it with local AllTalk also causes that
yes - alright lemme see if I can reproduce
it does do the request 😅 but nothing else works
Alright - that's good to know
yeah... bug... hmm
Does not seem to be saving the file
Looking into it more...
Ok I think I must have screwed something up in the call_api step
yes something very strange happening...
Yeah, I'm a dummy
think I got it, lemme test real quick...
change `return step_result` to `result = step_result`
I had tweaked something else in this run() code and I screwed this up somehow
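The difference that one line makes, shown in a hypothetical step runner (not the actual run() code): returning inside the loop ends processing after the first step, while reassigning lets each step's output feed the next. The host and port in the second step are placeholders.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]

def run_buggy(data: Any, steps: list[Step]) -> Any:
    result = data
    for step in steps:
        step_result = step(result)
        return step_result  # bug: exits after the first step

def run_fixed(data: Any, steps: list[Step]) -> Any:
    result = data
    for step in steps:
        step_result = step(result)
        result = step_result  # fix: the next step receives this output
    return result

steps: list[Step] = [
    lambda d: d["output_file_url"],              # extract a key
    lambda url: f"http://localhost:7851{url}",   # build a full URL (placeholder host/port)
]
response = {"output_file_url": "/audio/out.wav"}
print(run_buggy(response, steps))  # /audio/out.wav
print(run_fixed(response, steps))  # http://localhost:7851/audio/out.wav
```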
Big thanks for helping me bug test this branch
My settings management can be a nightmare to upgrade
As I'm finding with this image models crap
I'm in the process of generalizing the Progress bar that appears when generating images
In a way that users can easily apply to any other task
Well, so long as there is an endpoint to fetch progress
How this will work is via a "group" step - which is defined by sub-lists of steps
The step groups are collected and executed with asyncio.gather() - like the image gen / get progress tasks are already handled
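A small sketch of that "group" idea: a flat list of steps where any sub-list runs concurrently via asyncio.gather(), like image generation alongside a progress poller. `run_step_groups` and the three step coroutines are hypothetical placeholders, with sleeps standing in for HTTP calls.

```python
import asyncio
from typing import Any, Awaitable, Callable, Union

AsyncStep = Callable[[dict], Awaitable[Any]]
StepSpec = Union[AsyncStep, list[AsyncStep]]

async def run_step_groups(steps: list[StepSpec], ctx: dict) -> list[Any]:
    results: list[Any] = []
    for spec in steps:
        if isinstance(spec, list):
            # A sub-list is a "group": its steps run concurrently
            results.append(await asyncio.gather(*(step(ctx) for step in spec)))
        else:
            results.append(await spec(ctx))
    return results

async def generate_image(ctx: dict) -> str:
    await asyncio.sleep(0.05)  # stands in for the txt2img request
    ctx["done"] = True
    return "image.png"

async def track_progress(ctx: dict) -> int:
    updates = 0
    while not ctx.get("done"):  # stands in for polling a progress endpoint
        updates += 1
        await asyncio.sleep(0.01)
    return updates

async def send_result(ctx: dict) -> str:
    return "sent"

ctx: dict = {}
results = asyncio.run(run_step_groups([[generate_image, track_progress], send_result], ctx))
print(results[0][0], results[1])  # image.png sent
```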
I'm excited about this 
It's going to be something like this (there will be changes)
I just made a huge overhaul for the progress fetching... lots of complicated things... seems to be working 100% on the first test
My mind is blown
I was thinking to myself: There's probably some other reason besides "checking progress" for "polling" an API (repeatedly sending a request)
I was able to generalize the .poll() method so it can be sensibly used for other reasons.
I brought all the "check progress" logic from that outside to a .check_progress() which will in turn use the correct arguments/etc to run a .poll()
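A generic polling helper along those lines: call a fetch function repeatedly until a condition holds or a timeout passes, with "check progress" as just one caller. The names `poll` and `check_progress` echo the message, but these signatures and the fake progress endpoint are assumptions.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional

async def poll(
    fetch: Callable[[], Awaitable[Any]],
    is_done: Callable[[Any], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
    on_update: Optional[Callable[[Any], None]] = None,
) -> Any:
    """Repeatedly await fetch() until is_done(result) is true; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        result = await fetch()
        if on_update:
            on_update(result)  # e.g. update a progress bar
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("polling timed out")
        await asyncio.sleep(interval)

async def check_progress(fetch_progress: Callable[[], Awaitable[dict]], interval: float = 0.01) -> Any:
    # One use of poll(): stop once progress reaches 1.0
    return await poll(fetch_progress, lambda r: r["progress"] >= 1.0, interval=interval, timeout=5)

_state = {"progress": 0.0}

async def fake_progress_endpoint() -> dict:
    # Fake endpoint that advances 25% per call
    _state["progress"] = min(1.0, _state["progress"] + 0.25)
    return dict(_state)

final = asyncio.run(check_progress(fake_progress_endpoint))
print(final)  # {'progress': 1.0}
```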
Also had a lot of duplicate code in the StepsExecutor (response handling) and in the ImgGenClient (the API that is the "main" imggen client)
Now very clean
At this point, I mainly just need to dial in the websocket support, then make sure I can run ComfyUI workflows
Textgen API for main functions, will come further down the road
/imgmodels command - ComfyUI 🥹
Need to make some logic to actually apply this to main txt2img / img2img workflows
Will have to be some comfyui specific code
(basically just find the node in the payload and create/retain an override)
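A sketch of "find the node in the payload", assuming the API-format workflow JSON that ComfyUI exports (node ids mapped to `class_type` and `inputs`). The helper name is hypothetical; `CheckpointLoaderSimple` with its `ckpt_name` input is the stock checkpoint loader node.

```python
import copy

def set_node_input(workflow: dict, class_type: str, input_name: str, value) -> list[str]:
    """Set `input_name` on every node of `class_type`; return the ids of the nodes changed."""
    changed = []
    for node_id, node in workflow.items():
        if node.get("class_type") == class_type and input_name in node.get("inputs", {}):
            node["inputs"][input_name] = value
            changed.append(node_id)
    return changed

base_workflow = {
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "old_model.safetensors"}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat", "clip": ["4", 1]}},
}

# Work on a copy so the default payload is retained for the next request
workflow = copy.deepcopy(base_workflow)
print(set_node_input(workflow, "CheckpointLoaderSimple", "ckpt_name", "new_model.safetensors"))  # ['4']
print(base_workflow["4"]["inputs"]["ckpt_name"])  # old_model.safetensors
```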
Naturally I got sidetracked
As I'm trying to get Comfy in, I find myself writing if api.imggen.is_comfy() / if api.imggen.is_sdwebui_variant() / etc all over the place.
I had a moment of clarity, realizing that a few months ago when I restructured the Settings management, I wisely made an ImgModel() class that since hasn't been doing much - I can just dump all the model management code in there (where it belonged all this time) and now subclass ImgModel() for those variants to do specific stuff
Bonus side effect - the "auto-change imgmodels" feature can now work with "per guild" settings
I had an idea to allow “Dummy endpoints” to be set up which would just return preconfigured data. For example, in the “/image” command I had meticulously made a ControlNet option that reads a uniquely structured response from A1111-like clients only. The response is essentially a schema for what options are valid for each controlnet model. Comfy unfortunately doesn’t have this, but I could put an example response in “examples” for Comfy users to manually populate - they could have a dummy “get_cnet_control_types” endpoint that simply returns it. They could use {cnet_model}, {cnet_module}, etc. in their workflow json and the bot would format the selected values in
Seems like I’ll need to make a Comfy workflow that can optionally use some of the extra features depending on bot config without having to hotswap workflows
… might need to reach out for a comfy expert on that one
If / else / eval nodes are so clunky in comfy that I haven’t figured out how to use them
Ok so I think it makes sense that a “dummy endpoint” would be one where the method is explicitly “null” (opposed to GET/POST/PUT) - and the input would just be returned
zero understanding, pure belief 👍
the thinking mode for qwen3 is disabled by adding /no_think to the prompt i think :v
Basically, if an API does not have an endpoint to return certain data for main bot functions, that data could be prepared by the user and put in “user/payloads/“ (e.g.: cnet_data.yaml) then used as the “payload” for an endpoint, with method: null
When the bot tries to use that “main endpoint” it won’t actually make an API call, just receive that data
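A minimal sketch of the dummy endpoint behavior: when the method is null (None), skip the HTTP call and hand back the configured payload. The `Endpoint` class, its fields, the injected `send_request` callable, and the sample ControlNet data are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional

@dataclass
class Endpoint:
    name: str
    method: Optional[str]  # "GET" / "POST" / "PUT", or None for a dummy endpoint
    payload: Any = None

    async def call(self, send_request: Callable[[str, Any], Awaitable[Any]]) -> Any:
        if self.method is None:
            # Dummy endpoint: no API call, just return the user-prepared data
            return self.payload
        return await send_request(self.method, self.payload)

async def fail_if_called(method: str, payload: Any) -> Any:
    raise AssertionError("a dummy endpoint should not make a request")

# Illustrative data only, standing in for a user-populated cnet_data.yaml
cnet_data = {"depth": {"module_list": ["depth_midas"], "model_list": ["diffusers_xl_depth_full"]}}
dummy = Endpoint(name="get_cnet_control_types", method=None, payload=cnet_data)
print(asyncio.run(dummy.call(fail_if_called)) == cnet_data)  # True
```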
i accidentally uninstalled the nvidia gpu of my laptop and it is gone 🙁 as it is a laptop, i can plug and unplug so...
im cooked, although i know that reinstalling windows will fix the problem
it is time for attempt number idk-what at adding stt! muahahahaha
the driver? just reinstall the driver?
no, the device ;-;
that thing ;-;
dont try it on a laptop ;-;
at least the system is fine i just cant use the dedicated gpu
The device is the driver 🤓
If you click Uninstall device you are only uninstalling the driver
cant*
I assure you, maybe you are just downloading the wrong driver package or something
so as you see there is just an igpu,
Go to the website for your laptop model and get the latest recommended driver package from there
and this is what happens
Get it from your laptop site
a fresh windows install without drivers still shows the gpu under other devices, but i only have a useless usb4 thing
from the laptop site, it gives me these little things
and after checking, it is the same driver from nvidia but extracted
although it gives me this
Maybe try an intermediate driver version between that one and the latest
If the error changes try higher or lower
i solved it somehow: laptops have a switch that can turn off (physically?) the gpu, and since i messed with the device, yeah, a lot of weird stuff, definitely windows' fault
not doing that again
:P
idk how the hell it went to npu 5 and gpu7, nice experience
Npu?
Is this one of those ai enabled laptops?
yes, but it is nearly useless
too weak to run big stuff, too few users to add support for it
i think that the only features that have support are some camera effect and noise suppression that does not work with the laptop's mic 😅
not future proof at all, so pure marketing!
So here's the system that is going to get bot variables into ComfyUI workflows (and any other API) for "main functions".
The default payload will need this block copy/pasted into it (with more/less details), populated with whatever default values the user wants.
"__overrides__": {
    "pos_prompt": "beautiful scenery nature glass bottle landscape, purple galaxy bottle,",
    "neg_prompt": "text, watermark",
    "width": 1024,
    "height": 1024,
    "ckpt_name": "sdxl\\artistic\\leosamsHelloworldXL_helloworldXL70.safetensors",
    "seed": "156680208700286",
    "character": "M1nty",
    "cnet_image": "input.png",
    "cnet_mask": "input_mask.png",
    "cnet_model": "diffusers_xl_depth_full",
    "cnet_module": "depth_midas",
    "cnet_weight": 1.0,
    "cnet_processor_res": 64,
    "cnet_guidance_start": 0.0,
    "cnet_guidance_end": 1.0,
    "cnet_threshold_a": 64,
    "cnet_threshold_b": 64
},
Then wherever the dynamic content should actually go in the payload will be mapped like this:
"6": {
    "inputs": {
        "text": "{pos_prompt}",
        "clip": [
            "4",
            1
        ]
    },
If the prompt the bot will use is something like Jerry Garcia playing guitar it would update the value in __overrides__ before the injection
"6": {
    "inputs": {
        "text": "Jerry Garcia playing guitar",
        "clip": [
            "4",
            1
        ]
    },
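A sketch of that injection, assuming placeholders are `{name}` strings matching keys in `__overrides__`, and that a value consisting of exactly one placeholder keeps the override's original type (so `width` stays an int). Dropping the `__overrides__` block from the output is a choice made in this sketch; none of it is the bot's actual formatter.

```python
import re
from typing import Any

PLACEHOLDER = re.compile(r"\{(\w+)\}")

def inject(node: Any, overrides: dict) -> Any:
    """Recursively replace {key} placeholders in strings with values from overrides."""
    if isinstance(node, dict):
        return {k: inject(v, overrides) for k, v in node.items() if k != "__overrides__"}
    if isinstance(node, list):
        return [inject(v, overrides) for v in node]
    if isinstance(node, str):
        whole = PLACEHOLDER.fullmatch(node)
        if whole and whole.group(1) in overrides:
            return overrides[whole.group(1)]  # keep the override's type (int, float, ...)
        # Otherwise substitute known placeholders inside the string; leave unknown ones as-is
        return PLACEHOLDER.sub(
            lambda m: str(overrides[m.group(1)]) if m.group(1) in overrides else m.group(0), node
        )
    return node

payload = {
    "__overrides__": {"pos_prompt": "beautiful scenery", "width": 1024},
    "5": {"inputs": {"width": "{width}", "height": 1024}},
    "6": {"inputs": {"text": "{pos_prompt}", "clip": ["4", 1]}},
}

overrides = dict(payload["__overrides__"])
overrides["pos_prompt"] = "Jerry Garcia playing guitar"  # the bot's prompt replaces the default
result = inject(payload, overrides)
print(result["6"]["inputs"]["text"])   # Jerry Garcia playing guitar
print(result["5"]["inputs"]["width"])  # 1024
```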
I'm also going to make it so that model-specific values can be defined by the user (via dict_imgmodels.yaml)
As in, for Flux models they could define variables for the extra modules (vae, clip, text encoders, etc)
hmm
Of course it doesn't work that simple for Comfy to switch between model types, because the nodes would have to be bypassed because they don't accept "None"
welp, Comfy users won't be swapping model types that need more or less models so easily... I don't have a good solution for this.
They'd need some conditional node to ignore the extra modules
Actually I have the solution
Been wondering why my trial comfy API requests keep failing, it’s because the whole payload needs to be the value for a “prompt” key
Pretty unintuitive structure
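For reference, a sketch of wrapping a workflow the way ComfyUI's `/prompt` route expects, with the whole API-format workflow as the value of a `prompt` key. It only builds the request with urllib; the address (ComfyUI's usual default of 127.0.0.1:8188) and the one-node workflow are assumptions.

```python
import json
import urllib.request

def build_comfy_prompt_request(workflow: dict, base_url: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    # The workflow is not the body itself; it is the value of the "prompt" key
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

workflow = {"6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat", "clip": ["4", 1]}}}
req = build_comfy_prompt_request(workflow)
print(req.full_url, req.get_method())  # http://127.0.0.1:8188/prompt POST
print(list(json.loads(req.data)))      # ['prompt']
# urllib.request.urlopen(req) would send it to a running ComfyUI instance
```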
I need to get back into the discord bot. I've got a decent 40 t/s Qwen3 30BA3 on a llama.cpp server and just need to test the difference.
How's tool calling with the discord bot? I've got a few automated research Python tools I might look to add and such :/ maybe just adding them to the application command list instead of asking directly..
I guess look into think/no_think application command settings for qwen3, and how it handles showing it or not.
I'll branch and take a look. I've been dealing with some discord bot designs recently for auto-posting reddit/YouTube/news and summarizations. Usually just with Gemini flash
For the past ~2 months I've been working on a "universal API system" for the bot and it's really starting to come together
I wrote a step-based system to handle data, which is pretty versatile... this right here is actually working to get ComfyUI result image using generalized logic (Not some comfy-specific hardcoded methods - a user could potentially navigate the response and manipulate the data like this for any API)
So the response handling for this txt2img API call triggers a subsequent API call, and yet another API call
Each comfy workflow will not require this big code block. This can just be a “preset” and each one could just have a
“preset: Save Comfy Image”
And now I need to check if nested presetting works… because for video output I think the ending steps will be slightly different
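One way nested presets could be resolved: expand any `preset:` step into that preset's steps, recursively, and refuse cycles. The preset names and step contents are made up for illustration; only the `preset: <name>` form comes from the messages above.

```python
def expand_presets(steps: list, presets: dict, _stack: tuple = ()) -> list:
    """Replace {'preset': name} steps with that preset's steps, recursively."""
    expanded = []
    for step in steps:
        if isinstance(step, dict) and set(step) == {"preset"}:
            name = step["preset"]
            if name in _stack:
                raise ValueError(f"preset cycle: {' -> '.join(_stack + (name,))}")
            expanded.extend(expand_presets(presets[name], presets, _stack + (name,)))
        else:
            expanded.append(step)
    return expanded

presets = {
    "Fetch Comfy Outputs": [{"call_api": "get_history"}, {"extract_key": "outputs"}],
    "Save Comfy Image": [{"preset": "Fetch Comfy Outputs"}, {"save_as": "image"}],
}
steps = [{"preset": "Save Comfy Image"}]
print(expand_presets(steps, presets))
# [{'call_api': 'get_history'}, {'extract_key': 'outputs'}, {'save_as': 'image'}]
```

A video preset could then reuse "Fetch Comfy Outputs" and only swap the final steps.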
Although I haven’t tested it at this point, it should be capable of generating videos and sending those results to discord chat via Tags
ComfyUI is supposed to enable multimodality, yes?
Yep! The multimodality wont be practical until I do the commands thing though… although before I do that, maybe I should see about the bot being able to process image attachments on normal messages
Things are still going great
I have the progress tracking for ComfyUI working - which is via websocket
Still going good...
I've been structuring the system in a way that makes it very easy to define how to handle things from "known APIs" (A1111 / Forge / ReForge / Comfy / Alltalk / TGWUI / etc).
So the user configuration will be very simplified when using these for "main bot functions".
I had to come up with a very creative solution to do the progress tracking via web socket due to the way web socket messages are received
That’s working flawlessly now
The tricky part about it is that when you use websocket.receive() and filter for the data type, and get the data based on the queued 'prompt_id' it returns each result sequentially. So if you use something like asyncio.sleep(5) (wait 5 seconds) the next message received is still the next progress message and not “the latest progress”
If the bot edits the discord Embed for each update, the whole script gets throttled
The strat is to get all messages but ignore most of them. But then the “last message” is almost always ignored, and then it stalls waiting for another message that never comes
Solution was to buffer the last response while otherwise ignoring messages based on a time interval. Then intentionally setting a low “timeout” value for ws.receive() so it doesn’t get stuck waiting for that last msg that’s never coming
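A sketch of that buffering strategy, with an `asyncio.Queue` standing in for the websocket so it runs without ComfyUI: keep only the newest message, act on it at most once per interval, and use a short receive timeout so the last buffered message still gets flushed. The function name, interval, and timeout values are assumptions.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

async def relay_progress(
    receive: Callable[[], Awaitable[dict]],
    on_update: Callable[[dict], None],
    is_final: Callable[[dict], bool],
    interval: float = 1.0,
    receive_timeout: float = 0.2,
) -> None:
    buffered: Optional[dict] = None  # newest progress message not yet shown
    last_sent = float("-inf")
    while True:
        try:
            msg = await asyncio.wait_for(receive(), timeout=receive_timeout)
        except asyncio.TimeoutError:
            # Nothing new arrived: flush the buffered message instead of waiting forever
            if buffered is not None:
                on_update(buffered)
                buffered, last_sent = None, time.monotonic()
            continue
        if is_final(msg):
            on_update(msg)  # the final message supersedes anything still buffered
            return
        buffered = msg  # older unseen messages are simply dropped
        if time.monotonic() - last_sent >= interval:
            on_update(buffered)  # e.g. edit the Discord embed
            buffered, last_sent = None, time.monotonic()

async def demo() -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(1, 11):
        queue.put_nowait({"type": "progress", "value": i})  # a burst of progress messages

    async def finish_later() -> None:
        await asyncio.sleep(0.5)
        await queue.put({"type": "done"})

    shown: list[dict] = []
    finisher = asyncio.create_task(finish_later())
    await relay_progress(queue.get, shown.append, lambda m: m["type"] == "done",
                         interval=1.0, receive_timeout=0.1)
    await finisher
    return shown

shown = asyncio.run(demo())
print([m.get("value", m["type"]) for m in shown])  # [1, 10, 'done']
```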
I'm very, very close to pushing this to Main
new tgwui 3.4, after seeing this i smelled that vision support is not very far away
and with these, im ready to throw ollama and lm studio to the trashcan :v
Going to add one more “step type” - an “ask_for_file” step which will have the bot send an ephemeral message asking for input
This will be a crutch to enable complex workflows like a Comfy workflow that uses multiple image/video inputs, while I work on the user commands feature (that will possibly obsolete needing that “step”)
Step is prompt_user. Pretty simple and effective
Got image2image working for comfy as well. File uploads are really tricky
I lied when I said I had websocket progress tracking working flawlessly. A last detail of it is driving me nuts
I need to add logic to optionally check for a "completion flag".
Quick report - I updated TGWUI to latest and bot is working fine
Well, chatgpt kind of solved my problem. It created a generalized "completion condition" checker thing, and when I test it with certain values it works.
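A guess at what a generalized "completion condition" check can look like: a message counts as the completion signal when every key in a condition dict matches, recursively. The example condition follows the check in ComfyUI's own websocket example script (an `executing` message whose `node` is null for our `prompt_id`); the function name and sample messages are hypothetical.

```python
from typing import Any

def matches(message: Any, condition: Any) -> bool:
    """True if `message` contains everything in `condition` (dicts compared key by key)."""
    if isinstance(condition, dict):
        return isinstance(message, dict) and all(
            key in message and matches(message[key], expected) for key, expected in condition.items()
        )
    return message == condition

prompt_id = "abc123"
done_condition = {"type": "executing", "data": {"node": None, "prompt_id": prompt_id}}

progress_msg = {"type": "progress", "data": {"value": 5, "max": 20, "prompt_id": prompt_id}}
node_msg = {"type": "executing", "data": {"node": "9", "prompt_id": prompt_id}}
final_msg = {"type": "executing", "data": {"node": None, "prompt_id": prompt_id}}

print([matches(m, done_condition) for m in (progress_msg, node_msg, final_msg)])  # [False, False, True]
```

Requiring the key to be present means a `None` in the condition does not accidentally match a message that simply lacks that key.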
The problem is that Comfy documentation seems to be lying about the websocket output messages? I'm printing the raw outputs and the condition they say you need to check for never actually appears
nvm I think I found the issue...
Ok. The whole issue is mainly because Comfy documentation totally blows
the payload needs to be sent with a client_id variable bundled in, otherwise the websocket doesn't send all messages
Going to start working on the Wiki for this
@valid crypt please let me know if I’m remembering this correct…
- since you were using alltalk remotely it gave a file not found error when trying to access the file locally?
- when using the URL from output instead to get the audio, it returned it in bytes?
For user convenience I’m trying to ensure certain things just work for known clients even with faulty configuration
the first statement im sure that's true
the second one not very sure but should be true
ahh crud
it sure looks like comfyui API does not have a route to upload a video
image inputs only
erm, looks like the bot could just be configured to allow directly downloading content to specific locations outside the bot's local environment... hmm.
e.g.: the comfyui input directory where the /upload/image route saves the images it receives
yeah, perhaps I could allow a configuration for each API client
this is the obviously best solution
There’s now quite a number of “context variables” that can be formatted during response handling steps, Workflow steps, etc. Can be from the running Task (prompt, neg prompt, etc), websocket variables (client_id, session_id, etc), and saved data during Steps.
I added logging that will indicate what and why formatting happened so unexpected formatting can be noticed and fixed
Also! strings with placeholders that would take a 2-step process to convert to a python value, now happens automatically.
“‘prompt_id’: {prompt_id}”
This will sub in the value then convert it to a dict. And logs it.
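One way the substitute-then-convert step could look, as a sketch only: `resolve_value` and the regex are hypothetical, and it assumes string placeholders are quoted inside the literal (e.g. `"{'prompt_id': '{prompt_id}'}"`) so that `ast.literal_eval` can parse the result safely. A string that is exactly one placeholder keeps the value's own type.

```python
import ast
import logging
import re

logger = logging.getLogger(__name__)
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_value(template: str, context: dict):
    """Substitute {name} placeholders from context, then try to turn the
    result into a Python value (dict, list, int, ...) in one go."""
    whole = PLACEHOLDER.fullmatch(template)
    if whole and whole.group(1) in context:
        return context[whole.group(1)]  # exactly one placeholder: keep its type

    formatted = PLACEHOLDER.sub(
        lambda m: str(context[m.group(1)]) if m.group(1) in context else m.group(0),
        template,
    )
    try:
        value = ast.literal_eval(formatted)
    except (ValueError, SyntaxError, TypeError):
        value = formatted  # not a Python literal: keep it as a plain string
    if value != template:
        logger.info("Formatted %r -> %r", template, value)
    return value
```

Note that `literal_eval` will also turn strings such as "True" or "1e5" into Python values, which is usually what a payload wants but is worth keeping in mind.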
In regards to file saving - I’m going to add a config setting for “allowed save locations”; by default the bot is only allowed to save in the working directory. It can check the config when saving. That solves the “non-image inputs” problem for comfyui
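A sketch of such a check, assuming the config supplies a list of extra allowed directories; `is_allowed_save_path` is a hypothetical name. Resolving the path first collapses `..` segments, so a relative path cannot escape the working directory. `Path.is_relative_to` needs Python 3.9+.

```python
from pathlib import Path


def is_allowed_save_path(target, allowed_dirs=(), working_dir=None) -> bool:
    """True if target resolves to somewhere inside the working directory
    or one of the configured allowed directories."""
    base = Path(working_dir or Path.cwd()).resolve()
    roots = [base] + [Path(d).resolve() for d in allowed_dirs]
    path = Path(target)
    if not path.is_absolute():
        path = base / path  # relative paths are relative to the bot dir
    path = path.resolve()  # collapses ".." so it cannot escape a root
    return any(path.is_relative_to(root) for root in roots)
```

With the ComfyUI input directory listed as an allowed location, a downloaded video could be written there directly and then referenced by filename in the workflow.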
Also almost done with making all settings go to /user/settings/
bot_token.yaml will be a separate file there. This will prevent all the comments from getting wiped from config.yaml when first time users input their token via the CMD window
Pushed that
this user_apis branch is a bit mislabeled it's more like a major version upgrade
It will automatically move old settings to that dir and log it.
It will also automatically snag the existing bot token from config.yaml if it's there and save it to the new bot_token.yaml
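The one-time migration could be sketched like this (hypothetical `migrate_settings`, stdlib only). It assumes a file already present in the new directory should win, so nothing gets overwritten on later runs.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def migrate_settings(old_dir, new_dir, filenames) -> list[str]:
    """Move legacy settings files into new_dir, skipping any that already
    exist there, and log each move. Returns the names that were moved."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in filenames:
        src, dst = old_dir / name, new_dir / name
        if src.is_file() and not dst.exists():
            shutil.move(str(src), str(dst))
            logger.info("Moved %s -> %s", src, dst)
            moved.append(name)
    return moved
```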
Just added the allowed save path logic
claims to be better than xtts? https://github.com/index-tts/index-tts
The first thing they emphasize in each section is superior handling of chinese language, so that’s the main focus among other things
Lemme know if you try it!
actually im more interested in gptsovits, its devs have been cooking a lot lately
although ill try it :P
Any new TTS clients you’re interested in with an API, and you want to try making it work with the bot, let me know
the fair one to judge with is with 5, but the quality is more like 4, and the speed is not great
absolutely gonna try it to see how good it is at chinese
ahhh, it actually is pretty impressive at both languages, good quality speech but low quality audio
i think that under 32khz, the audio matters more than the emotion for me :v
¯_(ツ)_/¯
bro is leaking 😱
btw the 5 is gptsovits v2, and the latest gptsovits v2 pro plus is around x3 speed, i think you definitely should add gpt sovits to the template
it's a zero-shot model that can be fine-tuned easily, and it provides a portable 7z; the webui bat just comes with a chinese argument
I've said this before but now I'm very very close to merging API branch to main
probably 1-2 more days
Yay
Created a new thing in /utils where a payload file can be drag/dropped onto the bat file, which automatically injects most of the bot's dynamic variables into it.
Will make it very quick and easy to convert exported ComfyUI workflows (potentially others) into the correct format for the bot to use with the injection system I dreamed up
Figured out how to dynamically set Loras for ComfyUI payloads via the tags system - using same syntax expected for SD WebUIs (A1111 / Forge / ReForge)
Which is working
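For illustration, a sketch of turning A1111-style `<lora:name:weight>` tags into chained ComfyUI `LoraLoader` nodes in an API-format workflow. This is not the bot's implementation; the function names, the starting node id and the `.safetensors` suffix are assumptions. `LoraLoader` outputs MODEL on index 0 and CLIP on index 1.

```python
import re

LORA_TAG = re.compile(r"<lora:([^:>]+)(?::([-+]?\d*\.?\d+))?>")


def extract_loras(prompt: str):
    """Pull <lora:name:weight> tags out of a prompt.
    Returns (cleaned_prompt, [(name, weight), ...]); weight defaults to 1.0."""
    loras = [(m.group(1), float(m.group(2)) if m.group(2) else 1.0)
             for m in LORA_TAG.finditer(prompt)]
    cleaned = re.sub(r"\s{2,}", " ", LORA_TAG.sub("", prompt)).strip()
    return cleaned, loras


def lora_nodes(loras, model_ref, clip_ref, start_id=100):
    """Chain one LoraLoader node per LoRA.
    Returns (nodes, final_model_ref, final_clip_ref) for downstream nodes."""
    nodes = {}
    for offset, (name, weight) in enumerate(loras):
        node_id = str(start_id + offset)
        nodes[node_id] = {
            "class_type": "LoraLoader",
            "inputs": {
                "lora_name": f"{name}.safetensors",
                "strength_model": weight,
                "strength_clip": weight,
                "model": model_ref,
                "clip": clip_ref,
            },
        }
        model_ref, clip_ref = [node_id, 0], [node_id, 1]
    return nodes, model_ref, clip_ref
```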
actually this can be really interesting, as from the sample audio it can change the emotion, and i remember that you had a tag to change some values of the api call or something
I'm merging this to Main tomorrow. I have most of the documentation available in the Wiki now
Need to detail StepExecutor (what runs response_handling / workflows)
Ideally I think a Flow tag would be used, and the character’s reply would be shown to a specialized character that would revise it to include emotion syntax
The initial response could be sent to channel as is, while the second response is for TTS purpose only
Although I’m not sure if that behavior actually works sending the TTS response without sending the text
Going to see if I can successfully run an image to video generation workflow for ComfyUI via this system, using prompt_user for the input image. Will also try one with video input.
Once I have this example working, I'm merging
This is an important read about injecting bot variables / StepExecutor syntax into payload / response handling values
https://github.com/altoiddealer/ad_discordbot/wiki/APIs-‐-Payload-Injection
Oof. Yeah I'm glad I tried testing this prompt user step
Yeah this is a bummer. I think I have to axe this step for now.
hmm... have an idea to handle it
yes I've added a mechanism to temporarily ignore a user via on_message() while the client is "waiting" for their input on something else.
They won't trigger message responses, etc, while providing expected input to the bot
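A minimal sketch of the "temporarily ignore" mechanism: a set of user IDs the bot is waiting on, managed by an async context manager so the user is always un-ignored even if the prompt times out. `InputWaiter` and the simplified `on_message` stand-in are hypothetical; in discord.py the input itself would typically be collected with `client.wait_for("message", check=...)`.

```python
from contextlib import asynccontextmanager


class InputWaiter:
    """Tracks users the bot is currently waiting on for input, so the
    normal on_message() handling can skip them."""

    def __init__(self):
        self._waiting: set[int] = set()

    def is_waiting_on(self, user_id: int) -> bool:
        return user_id in self._waiting

    @asynccontextmanager
    async def waiting_for(self, user_id: int):
        self._waiting.add(user_id)
        try:
            yield
        finally:
            self._waiting.discard(user_id)  # always stop ignoring, even on timeout


waiter = InputWaiter()


async def on_message(user_id: int, content: str) -> str:
    """Simplified stand-in for the real discord.py event handler."""
    if waiter.is_waiting_on(user_id):
        return "ignored"  # this message is input for a pending prompt
    return f"reply to {content!r}"
```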
Merged user_apis branch to Main 🎉
Should be a smooth upgrade:
- on first run, settings files will move automatically where they need to be now.
- the config.yaml file was reorganized a bit. Just back up your current one, use the new one. Update the few values you need to.
- Beyond that, have fun with the new api settings
The only logic I still need to figure out in terms of “main image gen functions” for ComfyUI, is changing model types via /imgmodels. If only the VAE / Text Encoder nodes were designed to accept “None”, life would be easy
Guess I’ll just stick a ComfyUI specific setting in dict_imgmodels called delete_nodes: [“list of nodes”, “that should”, “be deleted”]
I've finally successfully used run_workflow tag to execute a ComfyUI task where it prompts the user for the text as well as the input image, and executes an Img2img call, with progress tracking, saves the image and sends it to channel
with the generalized system logic - Good stuff
Should work all the same for running an image to video workflow
New preset logic - response handling and workflow steps can now be bundled up into presets, which get inserted in-line on script init
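One way in-line preset expansion could work, as a sketch: the `{"preset": name}` step syntax and the step names in the usage are hypothetical, not the bot's actual schema. Presets may reference other presets, so expansion is recursive with a cycle guard, and each preset is deep-copied so later edits to one expanded step don't leak into others.

```python
import copy


def expand_presets(steps: list, presets: dict, _stack: tuple = ()) -> list:
    """Replace {"preset": name} entries with a copy of that preset's steps,
    recursively, so nested presets are flattened too."""
    expanded = []
    for step in steps:
        if isinstance(step, dict) and set(step) == {"preset"}:
            name = step["preset"]
            if name not in presets:
                raise KeyError(f"Unknown preset: {name}")
            if name in _stack:
                raise ValueError(f"Preset cycle: {' -> '.join(_stack + (name,))}")
            expanded.extend(
                expand_presets(copy.deepcopy(presets[name]), presets, _stack + (name,))
            )
        else:
            expanded.append(step)
    return expanded
```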
cant wait to get vision models working
Should work already via Tags
i need to figure out how all that works
just not as the "main textgen" functions
I'm running out of bugs to squish, things are looking pretty damn good
is there documentation for setting this up?
as far as i know im going to need a vision model for this
i went and downloaded Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf to use for testing
If TGWUI can run one via API, then the TGWUI API should be able to be set up, and triggered via Tags (call_api / run_workflow tags)
alr
Otherwise you can run vision models via ComfyUI
This workflow here is executing perfectly but it is like a mile long.
I'm planning to try making a ComfyUI specific "Step" that handles most of this automatically
I'm calling this with this tag:
- trigger: image from prompt
  should_gen_text: false
  run_workflow:
    name: Comfy Prompt for Img2img
i will def check this out asap
Really, the last big chunk of steps I could just move to 'response_handling' for the endpoint
My goal at this point is to try to simplify it as much as possible.
Allowing the steps to be grouped into "presets" was a big win for this - most of the steps in what I shared could be slapped into a preset
What I need to add to the wiki is what each main endpoint response should be returning back to the bot script
actually if you mess a little with unreleased versions of tgwui you might get it working right now, theoretically if you get this guy's llama.cpp https://github.com/ggml-org/llama.cpp/pull/14016 then this branch of tgwui https://github.com/oobabooga/text-generation-webui/pull/7027 it should work
ref: #13872
Currently passing media(image/audio) to mtmd is only supported under chat/completion in llama-server.
It is still necessary for allowing mtmd in /completion endpoint, since /completion ...
@valid crypt I just pushed an update that should make the TTS post_generate endpoint handle a remote computer response by default (for Alltalk), without user having to fiddle around with response handling.
If you ever get a chance to try it out, let me know
ill probably wait for a stable release
in the mean time i want to try figuring out how to get the bot talking in voice chat
i think that might be a little more attainable
That's very attainable
To work:
- your chat character has this value in their character card: use_voice_channel: true
- ttsgen / enabled: true in config.yaml
- ttsgen API needs to be configured in dict_api_settings.yaml
talking is already super easy, but if you want it to listen 😅 either you wait for my good news or his good news :V
ok
@halcyon quarry
Traceback (most recent call last):
  D:\text-generation-webui\ad_discordbot\bot.py:7497 in <module>
    7496
  ❱ 7497 bot_history = CustomHistoryManager(class_builder_history=CustomHistory, **config.textgen
    7498
TypeError: CustomHistoryManager.__init__() got an unexpected keyword argument 'greeting_or_history'
idk if i did something
I removed that from config 🤗
Guess I should just pop it on script init
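Popping it on init could look like this sketch. `DEPRECATED_TEXTGEN_KEYS` and `strip_deprecated` are hypothetical names; the idea is to drop removed keys from an old config before it is splatted into a constructor with `**`, logging instead of crashing.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical list: keys removed from config.yaml that old copies may still contain.
DEPRECATED_TEXTGEN_KEYS = ("greeting_or_history",)


def strip_deprecated(settings: dict, deprecated=DEPRECATED_TEXTGEN_KEYS) -> dict:
    """Return a copy of settings without keys the constructor no longer accepts."""
    cleaned = dict(settings)
    for key in deprecated:
        if key in cleaned:
            cleaned.pop(key)
            logger.warning("Ignoring deprecated config key %r", key)
    return cleaned
```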
ah
It wasn’t working and I didn’t feel like spending time on trying to figure it out
Traceback (most recent call last):
  File "D:\text-generation-webui\ad_discordbot\bot.py", line 2066, in llm_gen
    async for resp_chunk in process_responses():
  File "D:\text-generation-webui\ad_discordbot\bot.py", line 2027, in process_responses
    chunk = await stream_replies.try_chunking(base_resp)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\ad_discordbot\bot.py", line 1960, in try_chunking
    await apply_tts_and_extensions(chunk) # trigger TTS response / possibly other extension behavior
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\ad_discordbot\bot.py", line 1977, in apply_tts_and_extensions
    audio_fp = await api.ttsgen.post_generate.call(input_data=tts_payload, main=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 1913, in call
    expected_response_data = await self.get_expected_response_data(response)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\ad_discordbot\modules\apis.py", line 2100, in get_expected_response_data
    if isinstance(response.body, bytes):
                  ^^^^^^^^^^^^^
AttributeError: 'bytes' object has no attribute 'body'
@halcyon quarry
The endpoint is not sending the correct response type, and probably also does not have the correct headers
Speech to text
Oh never mind I’m just an idiot
Thank you for checking that. I’m going to fix it in about 20 minutes.
Really, this is strange... sure looks correct in the code...
Before the error, did it print something like this?
<endpoint name> has 'null' method. The input data will be returned as response data.
eh, even that scenario doesn't make sense...
This is the code that leads into get_expected_response_data()
results = response.body
if main:
    # Automatically handle responses from known APIs
    expected_response_data = await self.get_expected_response_data(response)
    if expected_response_data:
        return expected_response_data
It doesn't make sense that the error isn't already raised on this line:
results = response.body
ah alr
yes, I understand the issue now. Thanks a lot for testing
What I'm aiming for is to automatically handle the second API call, but it's supposed to be in a safe way that verifies the end result is indeed .mp3 or .wav format
Just didn't analyze that second response correctly
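A sketch of the "verify the end result is really .mp3 or .wav" check by sniffing file signatures. Both helper names are hypothetical. WAV files start with `RIFF....WAVE`; MP3 files start with an `ID3` tag or an MPEG frame sync (11 set bits). The frame-sync test is rough, since AAC ADTS streams share those sync bits. The wrapper also accepts either a response object or raw bytes, which is the mix-up behind the `'bytes' object has no attribute 'body'` traceback.

```python
def sniff_audio_format(data: bytes):
    """Return 'wav', 'mp3', or None by inspecting the file signature."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:3] == b"ID3":
        return "mp3"  # MP3 with an ID3v2 tag in front
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync
    return None


def audio_bytes_or_none(payload):
    """Accept a response object with .body, or raw bytes, and return the
    bytes only if they look like a WAV or MP3 file."""
    body = getattr(payload, "body", payload)
    if isinstance(body, (bytes, bytearray)) and sniff_audio_format(bytes(body)):
        return bytes(body)
    return None
```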
@valid crypt I just pushed a fix that should work
oki
wait
i wait
messed up something 😛
Ok now its good
err
🤯 idk how I keep overlooking details over and over
now it is 100% good to go
Let me know if it does indeed work - this attempts to bypass response_handling when this known scenario is detected
As an extra safety layer, I'm just wrapping this "expected response handling" logic in a try/except block, so if it fails it will still default back to response_handling
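The safety layer described above, reduced to a sketch: `handle_response`, `expected_handler` and `response_handling` are placeholder names for "try automatic handling first, otherwise run the configured steps". Any exception or empty result falls through to the user's response_handling, and the failure is logged rather than swallowed silently.

```python
import logging

logger = logging.getLogger(__name__)


async def handle_response(response, expected_handler, response_handling):
    """Try automatic known-API handling first; on any failure (or no
    result) fall back to the configured response_handling steps."""
    try:
        result = await expected_handler(response)
        if result is not None:
            return result
    except Exception:
        logger.exception("Automatic response handling failed; using response_handling")
    return await response_handling(response)
```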
I’ve had a lot of bad commits today
fixed the last bug of the day - working from dev branch now and double checking everything
Going through the steps with a fresh install with my gtx 1080ti 11gb GPU and see what I can run, and hook it up to the discord bot.
That's good, Q4 Qwen 3 14B UD XL with 16k context fits with 16~12 t/s between 0~5k context filled. I need to hook it up and test it out with personas.
I see your notes for edge_tts; I'll look into it. Really, anything simplified is great ty
if you mean this edge_tts in readme... 😅
the project died
There’s a lot of options now because any TTS with an API should work - no longer limited to TGWUI extensions
actually, the edge_tts was special since it has rvc :O 👏
but i remember that you broke it or something
i literally copied his repo https://github.com/marcos33998/edge_tts 👍
If I remember correctly the edge tts extension would generate one format but save it to the wrong format - may have been vits tts
Don’t think I broke anything
i dont remember already
The only thing that stopped working really was alltalk extension - the v2 version
.
The original alltalk still works
was that?
Ahhhhhh yeah
So edge does work, just can’t use the streaming tts option
Chatgpt is a bit smarter now maybe I can look into that again, thanks for referencing the message
May not be solvable though
Marcos the fix is likely on your end
i have no idea i just uploaded the copy i had in my drive
asyncio.run() is mainly for running async code during script init, when the event loop isn’t running yet
If it was just an await - no error on my end
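The distinction in a few lines, with hypothetical function names: `asyncio.run()` creates and runs a new event loop, so it works at script init but raises `RuntimeError` when called from code that is already running inside a loop (like a bot event handler), where a plain `await` is the right call.

```python
import asyncio


async def load_something() -> str:
    await asyncio.sleep(0)
    return "loaded"


def init_sync() -> str:
    # Script init: no event loop is running yet, so asyncio.run() is fine.
    return asyncio.run(load_something())


async def inside_bot() -> str:
    # Already inside a running loop (e.g. an event handler): just await.
    return await load_something()


async def wrong_inside_bot() -> str:
    coro = load_something()
    try:
        return asyncio.run(coro)  # raises RuntimeError inside a running loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"error: {exc}"
```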
So after some simplifying character card I had and setting max new tokens to 150 I don't see any hallucinations so far?
I might need to set the max tokens more
Yeah looks good.
I upped it to 2000 max tokens and 5000 truncation length just to ignore that to test with. I haven't noticed any hallucinations or character breaks yet.
I have a second bot to implement later: a new one that will run Qwen 3 30Ba3 with 32k context at 40 t/s @ 0 context. Then I'll add in edge_tts and the basic forge server I have
That one is llama-server based api
Comfy is also working now
Sorry mcmonkey if you’re reading this but I haven’t tested Swarm yet
Also, is your main machine Windows?
Yeah, and my only OS 😛
I don’t know for sure if my installer / updater scripts work for the other OS
I was using Ubuntu on this server with my GTX 1080TI. There are some errors with start_linux.sh
I just had flash 2.5 in vscode do some changes to make it work.
Does the update_linux script work?
Also, is this on a relatively new-ish bot install? (Within last 3 months)
Yeah it's brand new everything. Nvidia gtx 1080ti w/ 570 drivers and cuda toolkit 12.8:
(venv) dundellsdxl@dundellsdxl-box:~/text-generation-webui/ad_discordbot$ chmod +x update_wizard_linux.sh
(venv) dundellsdxl@dundellsdxl-box:~/text-generation-webui/ad_discordbot$ ./update_wizard_linux.sh
usage: bot.py [-h] [--multi-user] [--model MODEL] [--lora LORA [LORA ...]] [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--settings SETTINGS] [--extensions EXTENSIONS [EXTENSIONS ...]]
[--verbose] [--idle-timeout IDLE_TIMEOUT] [--loader LOADER] [--cpu] [--cpu-memory CPU_MEMORY] [--disk] [--disk-cache-dir DISK_CACHE_DIR] [--load-in-8bit] [--bf16] [--no-cache]
If you’re able to tell me what the issue was with the start_linux that’d be nice 🤗 Did you modify it to work? Just share it if so
one min it'd be easier to show as a git compare
I added a lot of complexity with the new logic - to install it as a standalone or using TGWUI venv
escaping regex special characters, "The script is using goto commands (lines 44, 49, 67, 70, 87) which don't exist in bash.", let me check additional notes
I basically shared the windows bat with chatgpt and asked to make the same thing for linux 😛
Oh yeah kind of makes sense. I'm not too fond of chatgpt beyond asking for phone help and registry edits for vague work issues.
I really enjoy Flash 2.5 for most simple lookup and debug. Sonnet 4 has been interesting but it's an intense "Yes" man.
It’s been a bit hit or miss but there’s usually a correlation with how lazy I was with the prompting
I’ve had some very, very impressive results for certain requests
I had a complex set of requirements for what I wanted to do with my new task management system. I shared the entirety of my current version. The new code it provided was the ideal solution, worked absolutely perfectly on the first run, and included all my script-specific logic for certain things
The new task system is beautiful
Yummy spaghetti
This is going to allow switching between SD 1.5 / SDXL / Flux / Flux GGUF models with the bot
I have a standard process I've used in 4 projects now for doing some simple research based on a request. I want to see if I can implement it with trigger words like "Please research how this game uses this item": put up a buffering message while it does the research in the background (similar to how you handle image generations), and once the results are formulated, have it provide an answer or a report depending on the request's wording. See how it goes.
It'd be interesting to see if it works later on tonight
The bot now supports multiple queues, so it can handle that while processing additional tasks
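Roughly how multiple queues let a long job run alongside normal replies, as a sketch with `asyncio.Queue`: one worker per queue, each processing its own tasks in order. The queue names and sleep durations are made up for the demo.

```python
import asyncio


async def worker(name: str, queue: asyncio.Queue, log: list):
    """Process one queue's tasks in order, independently of other queues."""
    while True:
        job, seconds = await queue.get()
        try:
            await asyncio.sleep(seconds)  # stand-in for the real work
            log.append((name, job))
        finally:
            queue.task_done()


async def run_demo() -> list:
    log = []
    queues = {"research": asyncio.Queue(), "chat": asyncio.Queue()}
    workers = [asyncio.create_task(worker(n, q, log)) for n, q in queues.items()]
    await queues["research"].put(("long report", 0.05))
    await queues["chat"].put(("reply 1", 0))
    await queues["chat"].put(("reply 2", 0))
    await asyncio.gather(*(q.join() for q in queues.values()))
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return log
```

The chat replies finish while the slow "research" job is still running, because neither queue waits on the other.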
Another food for thought, you can configure wildcard values and use the dynamic prompting syntax in a list of prompts for “spontaneous messaging” feature, and set max concurrent replies to -1 (infinite) or some high number
Can even include a trigger phrase for a tag to modify history, replace the trigger with “”
Spontaneous messaging is a configurable character behavior. It’s basically an auto-prompt feature
good news?
not yet
Going through this process ass backwards. Just going to try implementing directly my project https://github.com/ETomberg391/Ecne-AI-Report-Builder and restrict the single-command down to only 3 results, and keyword is the topic unless specified in the /research discord command.
Liking that idea a lot more... Just have /research push a request to report_builder.py with proper arguments to limit the search to a single brave api search, 3~5 urls from that search, plus some subreddit searches, let it build the report and wait for the raw final report txt. Then take that and feed it to the discord bot's backend LLM with a prompt like "This is a report from the user's request The Request Text, please formulate a response to the user's request with the information provided in this report". That way it can probably stay within Discord's text limit...
could it not before?
i assumed discords messages were within the context window
It’s been able to send messages of any length since day one
The method existed when I forked the bot but it would just split randomly at 2000 characters, I added logic to fall back to last sentence completion, and also to maintain discord markdown syntax across breaks
It never reaches 2000 chars now that it has streaming responses anyway
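A sketch of the splitting behaviour described above (not the bot's actual method): chunks stay under Discord's 2000-character limit, prefer to break after the last sentence end or line break, and an unclosed code fence is closed at the end of one chunk and re-opened, with its language tag, at the start of the next. It does not handle a hard split landing in the middle of a fence marker.

```python
import re

CLOSE_FENCE = "\n```"
FENCE_RE = re.compile(r"```(\w*)")


def split_message(text: str, limit: int = 2000) -> list[str]:
    chunks = []
    reopen = ""  # e.g. "```py\n" when the previous chunk ended inside a code block
    while text:
        # Reserve room for a re-opened fence and a possible closing fence.
        budget = max(limit - len(reopen) - len(CLOSE_FENCE), 1)
        if len(text) <= budget:
            piece, text = text, ""
        else:
            window = text[:budget]
            # Prefer the last sentence end or line break inside the window.
            cut = max(window.rfind(". "), window.rfind("! "),
                      window.rfind("? "), window.rfind("\n"))
            cut = cut + 1 if cut >= 0 else budget  # hard split as a last resort
            piece, text = text[:cut], text[cut:].lstrip(" ")
        body = reopen + piece
        fences = FENCE_RE.findall(body)
        if len(fences) % 2:  # odd count: this chunk ends inside a code block
            body += CLOSE_FENCE
            reopen = f"```{fences[-1]}\n"
        else:
            reopen = ""
        chunks.append(body)
    return chunks
```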
I managed to get this complex Comfy workflow and logic all working
dict_imgmodels.yaml now supports a delete_comfy_nodes list, so each imgmodel type can delete the conflicting nodes from the workflow
The "Any Switch" nodes make the workflow run correctly
So, it's possible to switch between SD 1.5 / XL / Flux / Flux GGUF from the same workflow. Could "easily" be expanded for other model types like Chroma, SD3 etc
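The delete_comfy_nodes idea as a sketch against an API-format ComfyUI workflow (`{node_id: {"class_type", "inputs", "_meta"}}`). Matching on node id, `_meta` title or class_type is an assumption about what the list contains, and the node class names in the usage are illustrative. Links that pointed at a deleted node are dropped too, which only works because switch-style nodes tolerate an empty input and fall through to whichever loader is still connected.

```python
def delete_comfy_nodes(workflow: dict, targets: list) -> dict:
    """Return a copy of workflow without nodes whose id, title or
    class_type is in targets, and without input links pointing at them."""
    targets = set(targets)
    doomed = {
        node_id for node_id, node in workflow.items()
        if node_id in targets
        or node.get("class_type") in targets
        or node.get("_meta", {}).get("title") in targets
    }
    pruned = {}
    for node_id, node in workflow.items():
        if node_id in doomed:
            continue
        inputs = {
            name: value for name, value in node.get("inputs", {}).items()
            # a link looks like [source_node_id, output_index]
            if not (isinstance(value, list) and len(value) == 2
                    and isinstance(value[0], str) and value[0] in doomed)
        }
        pruned[node_id] = {**node, "inputs": inputs}
    return pruned
```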
When I find a moment tomorrow I need to update the img2img workflow then will push this to main - I know you guys aren't using Comfy anyway 😛
Update
Bot can now switch between different model types for ComfyUI (Sd 1.5 / XL / Flux / FluxGGUF / and more)
- Example ComfyUI workflow payloads that use Any Switch nodes
- New logic in dict_imgmodels.yaml to delete comfy nodes from payload, per model preset.
- Users can follow the same logic to add more loaders / utilize even more model types.
Added a new "util" to resolve placeholder values back into payloads which have the {placeholder_syntax} within them - basically, to "undo the changes". Motivating use case is to restore a ComfyUI payload to its original state after applying all the syntax to it, to update it within the UI.
Automatically resolves sampler names and schedulers from user's settings that may be formatted for different software (A1111>Comfy and vice versa).
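A partial sketch of that sampler/scheduler translation. Only a handful of well-known names are mapped, the helper names are hypothetical, and anything unrecognised passes through unchanged. It also handles the older A1111 convention of baking the scheduler into the sampler label (e.g. "DPM++ 2M Karras").

```python
# Partial, illustrative mapping: A1111 sampler label -> ComfyUI sampler_name.
A1111_TO_COMFY_SAMPLER = {
    "Euler": "euler",
    "Euler a": "euler_ancestral",
    "Heun": "heun",
    "DPM2": "dpm_2",
    "DPM2 a": "dpm_2_ancestral",
    "DPM++ 2M": "dpmpp_2m",
    "DPM++ SDE": "dpmpp_sde",
    "DPM++ 2M SDE": "dpmpp_2m_sde",
    "LMS": "lms",
    "DDIM": "ddim",
    "UniPC": "uni_pc",
}
A1111_TO_COMFY_SCHEDULER = {
    "Karras": "karras",
    "Exponential": "exponential",
    "SGM Uniform": "sgm_uniform",
}


def to_comfy(sampler: str, scheduler: str = "normal") -> tuple[str, str]:
    """Convert A1111 names to ComfyUI (sampler_name, scheduler)."""
    for label, comfy_sched in A1111_TO_COMFY_SCHEDULER.items():
        suffix = " " + label
        if sampler.endswith(suffix):  # old-style "DPM++ 2M Karras"
            sampler, scheduler = sampler[: -len(suffix)], comfy_sched
            break
    scheduler = A1111_TO_COMFY_SCHEDULER.get(scheduler, scheduler)
    return A1111_TO_COMFY_SAMPLER.get(sampler, sampler), scheduler


def to_a1111(sampler_name: str) -> str:
    """Reverse lookup for the sampler name only."""
    reverse = {v: k for k, v in A1111_TO_COMFY_SAMPLER.items()}
    return reverse.get(sampler_name, sampler_name)
```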
I pulled the update thanks. I'm taking a look at some things for it today. For Ubuntu there's an issue with utils_twgui.py line: from modules.chat import chatbot_wrapper, load_character, save_history, get_stopping_strings, generate_chat_prompt, generate_reply
something about circular imports; the imports have to be set up dynamically within utils_twgui.py to make it work correctly. This is the second, fresh test I'm doing before I attempt the extra /research addition I wanted.
I noticed you had added something about that on the bot fork you messaged with. Does your update resolve it?
Yeah, but I don't know if it would affect your Windows version. It would need to be tested.
I should just setup an RDP to my Windows box and test them both at the same time with the 2 different Discord Bots.
I'm also trying to fix this stupid vscode issues with commits.
Well I can definitely test that solution for Windows... will check it out at some point today
Trying something, but not too sure if it will pan out..
Your changes seem to be working fine on Windows
For some reason it won't let me create a pull request - clicking the button is doing nothing
I might have to just update the file locally and push it
Oh there it goes
I'm like... 60% sure it works. Trying to see what else it needs.
What are you up to 🧐
That failed because TGWUI load_model just wants a string but you passed a different type
Dynamic prompting - you might be using the wrong syntax - it’s slightly different from SD
see the wiki
Wildcard syntax is ##wildcard
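To illustrate the `##wildcard` syntax, a sketch that swaps each `##name` for a random entry from a list. In the bot the entries would come from wildcard files; here they are passed in as a dict, and `expand_wildcards` is a hypothetical name. The bot's other dynamic prompting syntax is not covered, see the wiki for that.

```python
import random
import re

WILDCARD = re.compile(r"##([\w/-]+)")


def expand_wildcards(prompt: str, wildcards: dict, rng=None) -> str:
    """Replace each ##name with a random entry from wildcards[name].
    Unknown names are left untouched."""
    rng = rng or random.Random()

    def pick(match):
        options = wildcards.get(match.group(1))
        return rng.choice(options) if options else match.group(0)

    return WILDCARD.sub(pick, prompt)
```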
If you're restructuring the bot, that would be pretty awesome
Something I'd love to do but just thinking about it is painful
I started working on the User Commands feature
It can already dynamically build the commands from yaml - including all different option types.
The tricky part is how to make the resulting processing steps useful and configurable
I'm trying to bring it down from a 7,500-line single script into submodules in modules/bot_modules, with commands, core, events, processing, and utils folders. It's just a matter of making sure everything is still in place and working...
🫡
Started adding support for SwarmUI
@calm rain could you share a detailed (or any) txt2img / img2img payload example?
I fetched the prompt schema but it's just a giant wall of text to me XD
hero 🫡
Hopefully I don't have a ginormous merge conflict to deal with when he's done
but of course I'll deal with it 🤗
Have a lot of swarm logic worked out, just need to figure out the image payload 😛
ok ChatGPT gave me a method to dump a payload from that monstrous api response with the default values
there are examples in the API docs https://github.com/mcmonkeyprojects/SwarmUI/blob/master/docs/API.md
img2img is just feed "initimage": "data:image/png;base64,whateverthefuck" data image in the json
it's ultra straightforward, just, whatever the parameters are in the normal UI? Those are the API keys, the structure is a json, and data is put in whatever the most obvious way to encode that data as a string in json is
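A small sketch of building that img2img payload. The data URI format is `data:<mime>;base64,<payload>`; the decode helper is also handy when images come back inline in the JSON. The keys `prompt`, `images` and `initimage` come from this discussion (`images` turned out to be required); everything else is passed through as extra params, and the helper names are hypothetical.

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as data:<mime>;base64,<payload>."""
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def from_data_uri(uri: str) -> bytes:
    """Decode a base64 data URI back to raw bytes."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


def img2img_payload(prompt: str, init_png: bytes, **params) -> dict:
    # Only include keys that are actually in use; omit everything else.
    return {"prompt": prompt, "images": 1, "initimage": to_data_uri(init_png), **params}
```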
@vestal python let me know if you abandon the idea, hit a snag, etc 🙂
I have actually, but I have something of a different design I've used for a project for work that did wonders before. Trying to remember how it worked.
It might also be good to take a look and pull your current updates and try again
I personally never have any trouble navigating my code structure, but that's because I know where everything is, what it's called, etc
But yeah, it's not particularly friendly for any potential collaborators to easily just jump in and get their hands dirty with me
What's in bot.py is mainly these massive objects that are interconnected and need values to initialize, which makes them hard to modularize
For awhile now, at every opportunity I could find I've been moving code to modules - ChatGPT had helped me with an issue I was facing with the main API() class
_api = None

async def get_api():
    global _api
    if _api is None:
        from modules.apis import API
        _api = API()
        await _api.init()
    return _api
It's in the back of my head to try applying a strategy like this for some other things, but I've been too focused adding new features
what do you think the next big update is gonna look like?
Well Dundell2 and I are talking about back end cleanup
The next major feature (aside from SwarmUI support - almost done) will be the User Commands feature, which I have a good start on already
With this new API system, and internal settings management rewrite for Image Gen - It's very easy to add dedicated support for new Img Gen clients
noted
I need to do the same sort of settings rewrite for Text Gen but it's going to be painful
once you do for textgen thats when we start getting the big new features yes?
in the database.yaml file i noticed this
take_notes_about_users: null
what does it do?
i assume null has it disabled
There's a few random lines here and there from the original project - this is actually a fork
The original author had some WIP ideas drafted and I had left those variables
do you plan to see if that wip is doable? i think notes on users in chat is a cool idea
noted
There are a lot of interesting params for Swarm payload...
@calm rain Any chance you could skim this and let me know if I misunderstand any applicable settings?
you should not be setting any parameters you don't need to set
eg clipstopatlayer: -1 is going to wonk out any model that isn't SD1
or gridgenpromptreplace: '' is utterly chaotic to have at all
initimage: null no
initimage: null is OK for normal txt2img or needs to just be omitted entirely without an input image?
omit params that aren't in use
it's parsed as None in python btw
if you sent everything in this file as an API request it would be the worst mess of a gen ever with 10 different errors due to conflicting impossible feature combos
lmao
Also I see you wrote # group labels but like
yaml comments
they come in groups, and those groups are covered in docs?
you don't have to make up your own
Seems like it could be much easier to support Controlnet features for Swarm versus Comfy
@calm rain sorry for the pings over and over - last one....!
websocket
I don't see much info on it in the Wiki
Can the websocket be connected to by default? Or does this have to be created as a separate "backend" config?
the most important thing to remember re swarm api is. it is not complicated. do not overthink
go open the UI in your browser, hit F12 for browser tools, click network, type a prompt, hit generate, and look at the request it makes
the UI uses a websocket request by default
the websocket and non-websocket gen request are identical, difference is just the websocket version gives live preview updates as it goes and the non-websocket doesn't
Seems like the progress is just returned from the http request?
When I see websocket I think it's like Comfy where you need to explicitly connect to the websocket and listen
nope, comfy's setup is way overcomplicated
seriously
Thanks for confirming that
If you're still lingering - is there payload-driven model changes at all? Or explicitly from API call?
it selects the model based on your gen request, you can hit SelectModel if you want to force load in advance https://github.com/mcmonkeyprojects/SwarmUI/blob/master/docs/APIRoutes/ModelsAPI.md#http-route-apiselectmodel
When I dumped the payload from t2i params endpoint, I did not see any model related key in there
I do understand the API usage though, which I have working
Alright so working with Swarm compared to ComfyUI is making me realize I have no idea how websockets work
I was under the impression that a websocket connection did not have "endpoints" per se
a lot of apps for some reason just do a /ws endpoint and then blindly shove everything over one socket
that's a ... valid design choice
but broadly, a websocket is just: a post request, but you can keep sending data back and forth for a while
this can be anything from what Swarm does (literally, a post request, but it sends multiple stages of data back) up to just being a network tunnel for a video game or something
the entire concept is predicated on abusing http connections to form temporary persistence, which isn't in official specs but you can cheese it into any engine
Thanks for the explanation! I had an error when posting I need to look into it maybe I just had the wrong response type or something
Or I need to send a bona fide web socket message
I’m using aiohttp, which has dedicated methods for HTTP requests and websocket messages
I had a bit of trouble in regards to model loading... if I go into the UI and click "load now" for any model, when I send an API payload without a 'model' parameter, it errors
Resolved it by always including a 'model' parameter in the payload.
Ok finally got this working
The only annoying thing is that post model change doesn't seem to actually do anything.
Suddenly, it's all turned to shit
My system was designed around a websocket that doesn't want to close itself at every possible moment
I send the payload on the websocket, and milliseconds later it's closed
async def post_for_images(self, img_payload:dict, ictx=None) -> list[str]:
    if not self.ws or self.ws.closed:
        await self.connect_websocket()
    img_payload['session_id'] = self.session_id
    await self.ws.send_json(img_payload)
    msg = await self.ws.receive()
    print("Message type:", msg.type)
    print("Message:", msg.data)
    print("Is closed?", self.ws.closed)
    print("Exception?", self.ws.exception())
    results_list = await self.call_track_progress(ictx=ictx)
    final_result = results_list[-1]
    return final_result
12:57:55.764 #1522 INFO [bot.modules.apis]: [SwarmUI] WebSocket connection established (ws://localhost:7801/API/GenerateText2ImageWS)
Message type: 8
Message: 1000
Is closed? True
Exception? None
sendpayloadclose
🤷
I've undone every line of code I added one by one and I simply cannot get back into any form of progress I was on
Goddammit
@calm rain Your whole thing just silently errors if the key 'images' is missing from the payload. I want my 2 hours back
This finally appears after like 60 seconds of sending the payload missing images
Most painful line of code I've written
SwarmUI is now working with the bot
Oop. The http route handled that properly but the WS route was accidentally eating the error message and just closing the socket. Can't give your 2h back, but I fixed it to properly render the error for the next person
I have another complaint lol
The logic of the progress values is a bit odd to me.
The progress within each node does not seem to have any effect on the "overall percent" until that node is done
So when the bot is checking progress it quickly gets to like 60% and when it hits the KSampler it basically just stalls until 100%
oh yeah i meant to fix that before but forgot
eh I guess there's probably a logical way to factor the per node progress with the overall. I'll consult mr Chat GPT
generally I just render both current and overall
wherein current is the one people usually care about
overall is useful info but a bit misleading - it's the progress through the comfy workflow, and most nodes are instant, then there's the fat ksampler in the middle, then some more instant nodes
indeed... I didn't look too hard into it but out of the box the image gen tasks with Comfy yield a current step / max steps
So it increments smoothly. Seems to just completely ignore those instant nodes I guess
alright, combined the current into overall now.
comfy API returns current_percent for some nodes, nothing for others, and a node ID progress report
in other words: if you want to copy comfy api percent reads, just use current_percent
the overall is node progress which doesn't particularly matter much unless you're doing several samplers or something
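A sketch of blending the two numbers so progress counts up through a long sampler instead of stalling. It assumes `current_percent` is a 0 to 1 fraction for the node currently running, and the function name is hypothetical. With equal weighting per node you get the "jumps to 75 then counts up" behaviour on a 4-node workflow where 3 nodes finish instantly.

```python
def combined_progress(nodes_done: int, total_nodes: int, current_percent=None) -> float:
    """Overall percent = completed nodes plus the running node's fraction,
    divided by the node count, clamped to 0..100."""
    if total_nodes <= 0:
        return 0.0
    overall = (nodes_done + (current_percent or 0.0)) / total_nodes
    return round(min(max(overall, 0.0), 1.0) * 100, 1)
```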
Another question... again I'm being kind of lazy by asking this rather than printing the results again.
After sending the ws payload, does it return a specific ID associated with the request?
With comfy, you post the request to /prompt and it returns an ID - which you can then filter the websocket data with to ensure you are tracking the correct task
I'm just wondering how to ensure the bot is tracking the correct progress if there's multiple simultaneous gens (assuming that's possible with Swarm)
I direct you once again to the "don't overthink it" thing
the websocket only tracks progress on generation(s) requested by the websocket
there's a batch_index to differentiate gens within a group
also a request_id as a globally unique id for each gen
My brain is maybe half the size of yours so overthinking required XD
Nice update btw - the progress tracking does not stall now
Jumps to 75 then does count up instead of freezing 😄
I tried tinkering for a few minutes with how to handle a request with images > 1
I had set a condition that if “image” is in the response, progress tracking is complete - triggering it to use View to get the bytes
ye
btw if you use "donotsave": true it will give you the image directly as a data:image URL in the json instead of a link, if you want that
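If the `image` field comes back as an inline data URL (the assumed format with `"donotsave": true`), the bytes can be decoded locally instead of making a separate View request. A small sketch; `data:<mime>;base64,<payload>` is the standard data URL layout, but confirm it matches what your Swarm version sends.

```python
import base64
from typing import Tuple


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split 'data:image/png;base64,<payload>' into ('image/png', raw bytes).

    Anything that isn't a base64 data URL raises ValueError, so the caller
    can fall back to fetching the image by link instead.
    """
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```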
That's ideal, thanks for the tip there
So for images > 1 does it basically loop with the progress? Counts to completion and yields a dict with image result after each one?
for more than 1 image, use batch_index or request_id to separate em
it will be sequential if you only have 1 gpu, not if you have more
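For images > 1, one way to avoid treating the first `image` message as "done" is to collect results keyed by `batch_index` until the requested count arrives. A sketch over already-parsed messages; the message shape used here is an assumption for illustration, not Swarm's documented format.

```python
from typing import Dict, Iterable, List


def collect_images(messages: Iterable[dict], expected: int) -> List[str]:
    """Gather finished images from parsed websocket messages, ordered by batch_index.

    Hypothetical message shape: {"image": "<data URL or path>", "batch_index": "0"}.
    Messages without an "image" key (progress, status) are skipped, and we only
    stop once `expected` images have arrived rather than at the first one.
    """
    images: Dict[int, str] = {}
    for msg in messages:
        if "image" not in msg:
            continue
        images[int(msg.get("batch_index", 0))] = msg["image"]
        if len(images) >= expected:
            break  # every requested image is in
    return [images[i] for i in sorted(images)]
```

With one GPU the images simply arrive one after another, so the same loop works whether they come sequentially or interleaved.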
Maybe you’ll give the bot a try sometime?
@halcyon quarry just wanna say thanks for the bot. Finally some remote way to use my SD, with a stable connection lol. Still trying to figure out all the settings and aspects, but it's great work!
Thanks! I'm very passionate about this project, am mostly on my own in terms of development, very few beta testers - any feedback is always appreciated. Also promoting it would be appreciated haha
The latest development is that it now supports a variety of API software out of the box, and can theoretically be configured to use other software I don't even know about
A1111 / Reforge / Forge / Comfy / Swarm
Sure thing, let me get my bearings and I will get back to you if anything comes up 🙂 btw, has the text integration changed somehow? I could only install it as standalone; it wouldn't find TGWUI
I've been too busy with development but theoretically it can also run advanced Comfy workflows such as image to video, and return the video result
Holy... Maybe it will be the reason to get back to comfy lol
I need to take a look at my installer logic - I believe it checks the parent directory to see if it is a git repository. If so, it checks if it is TGWUI or a fork of TGWUI
In either case it would present the option
Aha, so it wont work with a portable one-click installer?
lemme see
When I get an idea I sometimes have a bit of tunnel vision and overlook some scenarios - like that one
Ahhahahah gotcha. No problem at all. Keep up the god-like work lol
u using the default dir name for TGWUI portable?
I'll add another condition for if the parent dirname starts with text-generation-webui
I had renamed it to text-generation-webui afterwards, didn't catch that either. Reinstalling via git clone to see if that works
Will be making this update shortly, trying to work out some other little thing first...
yeah, so with clone method installation works perfectly
Nice - I'm going to go add that logic now anyway 😛 Finished what I was tinkering with
Ahahahah nice. Encountered another issue. Not sure how to change which ports the apps are using. Basically Forge and text are on the same port, 7860. If I change Forge to 7861, the bot cannot find imgmodels at all
11:12:45.611 #961 ERROR [bot.modules.apis]: HTTP 404 Error: {"detail":"Not Found"}
11:12:45.611 #968 ERROR [bot.modules.apis]: [SD Forge] HTTP Error 404 on http://127.0.0.1:7860/sdapi/v1/sd-models: Not Found
11:12:45.611 #6586 ERROR [bot.main]: Error fetching image models: 404, message='Not Found', url='http://127.0.0.1:7860/sdapi/v1/sd-models'
Or it happens because of something else?
Well you can manage the ports in the CMD flags for each software
You may not have the required flags set for Forge, --api --listen ?
I recommend copying your webui-user.bat and calling it something like webui-user-api.bat and include the flags there - so you can launch it either way
It can be annoying to always launch with API enabled, because the UI will not allow you to modify settings
It worked before without the text integration, so I don't think anything is wrong with Forge. Let me try to change the port of the text thing. Does your bot need the text app on a specific port?
lol ok
It directly imports modules from TGWUI and runs them
For API configurations you only need to focus on Imggen and TTSgen
I need to rewrite a lot of code in order to get the textgen flexible for APIs
I'm not interested in converting it rigidly to TGWUI API - when I update this code I'll be scratching my head constantly on how to generalize the logic for handling everything
I did just now add a check for if the parent directory name starts with "text-generation-webui" - bypasses checking git status
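The two installer checks described above (a git checkout of TGWUI or a fork, plus the new name-prefix bypass) might look roughly like this. `looks_like_tgwui` is a hypothetical name, and reading `.git/config` for the repo name is a simplification; the real installer's fork detection may work differently.

```python
from pathlib import Path


def looks_like_tgwui(parent: Path) -> bool:
    """Guess whether `parent` is a text-generation-webui install.

    1. A directory name starting with "text-generation-webui" is accepted
       outright (covers portable builds that aren't git repositories).
    2. Otherwise it must be a git checkout whose config mentions the repo
       name, which also catches most forks cloned from GitHub.
    """
    if parent.name.startswith("text-generation-webui"):
        return True
    git_config = parent / ".git" / "config"
    if not git_config.is_file():
        return False
    return "text-generation-webui" in git_config.read_text(errors="ignore")
```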
Yeah, so culprit was a port conflict. Changing only text ui resolved it
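A quick way to spot this kind of clash before launching is to check whether something is already listening on a port. A standard-library sketch; 7860 is the port both apps defaulted to here, and the imggen URL in dict_apisettings.yaml still has to match whichever port Forge ends up on.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(7860)` returning True before you start Forge means another app (here, TGWUI) already owns that port.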
Now off to check your wiki and api docs lol
Things you'll probably be most interested in:
- Understanding how the Tags system works
- Managing "presets" in dict_imgmodels.yaml - including Tags management
Also, I need to add this to the Wiki... it's strongly recommended to use a good code editor for managing settings, like Visual Studio Code
Once you select a bunch of lines and press Ctrl + [ or Ctrl + ] it will be life altering
(this changes the indentation level for everything selected)
Indentation is something that's been bugging me forever lol
listen, any good LLM models you can recommend? I'm getting a bunch of gibberish using deepseek somehow
Also Ctrl + / will toggle whether things are # Commented or not
It's likely just faulty parameters for that model. You might want to play around with settings in the UI then write them back to your character file
oh, ok
See example character M1nty for some extra settings that the bot can manage
If you go into dict_base_settings.yaml that's all the defaults.
You can update those. If any of those settings are in the character file, they will have priority
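The priority rule works like a recursive dict merge where the character file wins. A minimal sketch (the bot's own merge code may differ; `temperature` is just an illustrative key, while `llmstate` and `max_new_tokens` are the names used later in this thread):

```python
import copy


def merge_settings(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied on top, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # character value takes priority
    return merged


base = {"llmstate": {"max_new_tokens": 512, "temperature": 0.7}}
character = {"llmstate": {"temperature": 1.1}}
print(merge_settings(base, character))
# {'llmstate': {'max_new_tokens': 512, 'temperature': 1.1}}
```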
A lot of these settings have no effect though
When you toggle between model loaders in TGWUI you'll see settings get hidden and appear
Basically, you should focus on the settings that are relevant to your model loader
can't figure out how to set up the bot's LLM settings. It gives endless gibberish responses, and generates very mad pictures lol
if you support that re comfy you presumably support that by default re swarm too, yeh?
turning an image to a video in swarm is just set a few params and go
With effort, yeah 😛 Got so many balls in the air
For starters you can lower the max_new_tokens while you debug
Are you having the same issue in TGWUI? Or just in the bot?
Only in the bot. I hadn't figured out which settings to migrate i guess
One goal of my bot is for users to be able to switch between main APIs without having to modify all sorts of client specific settings - been spending most of my efforts trying to solve these issues
A1111-like UIs have the easiest and most basic syntax for the Lora triggers; they don't require the subdirectory names.
So for each relevant API subclass (Comfy / Swarm / possibly more to come) I have a method to fetch a list of the valid Lora values.
The bot uses regex to capture the Lora syntax, check if the name is a substring of a "valid value" and automatically update it.
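A sketch of that substring autocorrect, assuming A1111-style `<lora:name:weight>` tags and a list of valid values that include subdirectories. Requiring exactly one candidate is an assumption added here so ambiguous names are left alone; the bot's actual rule may differ.

```python
import re
from typing import Iterable

LORA_PATTERN = re.compile(r"<lora:([^:>]+)((?::[^>]*)?)>")


def autocorrect_loras(prompt: str, valid_loras: Iterable[str]) -> str:
    """Rewrite <lora:name:weight> tags so `name` matches a value the API accepts.

    If the short name appears inside exactly one valid value (e.g. one with a
    subdirectory prefix), that full value is swapped in; otherwise the tag is
    left as written.
    """
    valid = list(valid_loras)

    def fix(match: re.Match) -> str:
        name, rest = match.group(1), match.group(2)  # rest is ":weight" or ""
        candidates = [v for v in valid if name in v]
        if len(candidates) == 1:
            name = candidates[0]
        return f"<lora:{name}{rest}>"

    return LORA_PATTERN.sub(fix, prompt)
```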
For Comfy, I actually pop the whole lora syntax so that it can inject the name(s) and strength(s) into the Lora stack loader node
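Popping the syntax out of the prompt for Comfy could look like this: strip every `<lora:name:strength>` tag and return the (name, strength) pairs separately. The wiring into the Lora stack loader node is not shown, and the default strength of 1.0 for tags without a weight is an assumption.

```python
import re
from typing import List, Tuple

LORA_WITH_STRENGTH = re.compile(r"<lora:([^:>]+)(?::(-?\d+(?:\.\d+)?))?>")


def pop_loras(prompt: str, default_strength: float = 1.0) -> Tuple[str, List[Tuple[str, float]]]:
    """Remove <lora:...> tags from `prompt`; return (cleaned prompt, [(name, strength), ...])."""
    loras = [
        (name, float(strength) if strength else default_strength)
        for name, strength in LORA_WITH_STRENGTH.findall(prompt)
    ]
    cleaned = LORA_WITH_STRENGTH.sub("", prompt)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()  # tidy leftover double spaces
    return cleaned, loras
```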
I spend way too much time on these details to get meaningful work done
Similarly I added autocorrecting for sampler names and schedulers
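One generic way to autocorrect sampler and scheduler names, shown with Python's `difflib` as a swapped-in technique (the bot's actual rules aren't shown in this thread): normalize case and separators first, then fall back to the closest fuzzy match. Only case, `++` and separator differences are normalized here.

```python
import difflib
from typing import Iterable, Optional


def _normalize(name: str) -> str:
    # "DPM++ 2M" -> "dpmpp_2m"
    return name.strip().lower().replace("++", "pp").replace(" ", "_").replace("-", "_")


def autocorrect_choice(value: str, valid: Iterable[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a loosely written name onto one the API accepts, or None if nothing is close."""
    normalized = {_normalize(v): v for v in valid}
    key = _normalize(value)
    if key in normalized:
        return normalized[key]
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[close[0]] if close else None
```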
And autocorrecting for various other things - example for Swarm
key_map = {'cfg_scale': 'cfgscale',
           'negative_prompt': 'negativeprompt',
           'CLIP_stop_at_last_layers': 'clipstopatlayer',
           'sd_vae': 'vae',
           'distilled_cfg_scale': 'fluxguidancescale',
           'denoising_strength': 'initimagecreativity',
           'sampler_name': 'sampler'}
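Applying a map like that to an outgoing payload is a one-line dict comprehension. A sketch (`remap_keys` is a hypothetical name); keys not in the map pass through untouched.

```python
from typing import Any, Dict

# A1111-style setting names -> Swarm parameter names (the key_map above)
KEY_MAP: Dict[str, str] = {
    'cfg_scale': 'cfgscale',
    'negative_prompt': 'negativeprompt',
    'CLIP_stop_at_last_layers': 'clipstopatlayer',
    'sd_vae': 'vae',
    'distilled_cfg_scale': 'fluxguidancescale',
    'denoising_strength': 'initimagecreativity',
    'sampler_name': 'sampler',
}


def remap_keys(payload: Dict[str, Any], key_map: Dict[str, str] = KEY_MAP) -> Dict[str, Any]:
    """Return a new payload with known keys renamed; unknown keys pass through."""
    return {key_map.get(key, key): value for key, value in payload.items()}
```

If a payload carries both spellings (e.g. `sampler_name` and `sampler`), whichever comes later in the dict silently wins, so logging collisions may be worthwhile.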
Ok, so I rolled back to the default user settings. Starting from scratch based on the git info, "draw something" should work; for me it tries, gives me a huge text where it answers in my place, then botches the image (worse than what SD1.5 did lol)
generation via /image works great, as intended, however that llm integration drives me nuts lol
The one thing the instructions do not say, is to actually copy the example character Prompt_Enhancer_XL.yaml into your characters directory
The tag which has the "draw" trigger, has swap_character: Prompt_Enhancer_XL.yaml - It swaps the character (context / params) before prompting
I did not think of that.... wow... Ok, ill finish setting up a preset for illustrious and try that
and yeah VS Code is a blessing lol
That tag also has some other stuff that improves the quality - hides history, does not save the interaction to history
If you are able to use Flux models / ones that like long-winded natural language prompting, you should try out the /image command option use_llm (with the "prefix my prompt" setting)
Yup, that was it lol
For flux I use gguf, and I havent figured out yet how to set clip, vae and t5
You can also move either of the sd payloads from examples, into user/payloads
The advanced one is recommended
There's an example of this in the dict_imgmodels.yaml - since you are using Forge this is handled with the forge_additional_modules setting
So long as you have things configured correctly in there, the bot can easily change between model types, even with the "auto-change imgmodels" feature
ok got it, will take a look, thanks!
It will work more consistently / predictably if you organize your models into subdirectories (the subdir name becomes part of the value that is checked)
that's for later I guess. Tried NSFW - the LLM flagged it as inappropriate. Need to fix that lol PRIORITY #1 lol
For image generation, I am a huge fan of this model... https://huggingface.co/LoneStriker/NeuralBeagle14-7B-8.0bpw-h8-exl2
Here's the idea for a NSFW prompting character
Nice, thanks! Yeah tried uncensored qwen - good but boring. Will try that beagle on!
Yeah, there was a line in config that was blocking the nsfw content in bot. all good now lol beagle actually not bad
Damn, I'm going to update that to false by default
lol
Ahhahaha that it blocks nsfw? Lol
You'll get there man!
🤓
I need to finish the next planned feature, 'user commands'
Then I'm making some youtube vids on the bot
There will be yet another configuration file, where the bot owner (you) will be able to create your own bot commands that will do custom things
oh yes...
I've got this feature about 1/3 done
the possibilities...
I'm not a coder by any means, but if you need help in some capacity - let me know lol
There's already tons of possibilities with the Tags system
👏
Tags are still confusing as hell for me..
from my understanding they are just some keywords to activate certain mechanics
but they can stack in an insane way :v
Each "tag" is a dictionary (key values)
If there are no "conditional" tags (such as trigger, etc) then that tag is considered "matched"
Otherwise, it needs to meet the conditions
When you add parameters to the tag definition, they go into effect.
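The matching rule above, as a simplified sketch: a tag with no condition keys always matches, otherwise its conditions must hold, and the remaining keys of matched tags become active params. Only a comma-separated, case-insensitive `trigger` is modelled here; the bot's real condition keys and trigger syntax are richer. `swap_character` comes from the "draw" tag mentioned earlier, while `some_param` is a hypothetical key.

```python
from typing import Dict, List

# Keys treated as conditions in this sketch; the bot's real list is longer
CONDITION_KEYS = ("trigger",)


def tag_matches(tag: dict, text: str) -> bool:
    """A tag without condition keys always matches; otherwise a trigger must appear in text."""
    if not any(key in tag for key in CONDITION_KEYS):
        return True
    triggers = [t.strip().lower() for t in str(tag["trigger"]).split(",")]
    return any(t and t in text.lower() for t in triggers)


def active_params(tags: List[dict], text: str) -> Dict[str, object]:
    """Merge the non-condition params of every matched tag (later tags win)."""
    params: Dict[str, object] = {}
    for tag in tags:
        if tag_matches(tag, text):
            params.update({k: v for k, v in tag.items() if k not in CONDITION_KEYS})
    return params
```

This also shows why a tag with no trigger applies to every generation.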
the best example,
we have words, then we have what it does,
in this case when these are matched the llm is blocked 👍
Right well I just fixed that default minutes ago haha
If there's no trigger, it just blocks every generation
Certain tag params are only applicable to the text generation, and others only for the image generation
:c
Thx)
If you want to get into the really advanced stuff the bot can do, play around with the "flow" tag
A typical message request looks like:
User prompts ---> Match Tags > LLM > Match Tags > Img Gen
If a Flow is triggered, it loops through this, except you are basically defining "pre-matched tags" for each iteration.
For instance you could make the LLM response get fed back to another chat character (or even trigger an LLM model change first)
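A rough sketch of that loop with the stages passed in as functions, so the control flow is visible without any real models: one normal pass, then one pass per flow step, where each step's dict stands in for its "pre-matched tags" and the previous reply becomes the next prompt. The `character` and `img_gen` keys are hypothetical, not the bot's actual tag params.

```python
from typing import Callable, Dict, List, Optional

Params = Dict[str, object]


def run_message(prompt: str,
                match_tags: Callable[[str], Params],
                llm: Callable[[str, Params], str],
                img_gen: Callable[[str, Params], None],
                flow: Optional[List[Params]] = None) -> List[str]:
    """Run User prompt > Match Tags > LLM > Match Tags > Img Gen, then repeat per flow step."""
    replies: List[str] = []
    for pre_matched in [{}] + list(flow or []):
        params: Params = dict(pre_matched)
        params.update(match_tags(prompt))   # tags matched on the incoming prompt
        reply = llm(prompt, params)
        params.update(match_tags(reply))    # tags matched on the LLM reply
        if params.get("img_gen"):           # hypothetical flag for image generation
            img_gen(reply, params)
        replies.append(reply)
        prompt = reply                      # feed the reply into the next pass
    return replies
```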
Yeah checked that file you sent, interesting stuff
There's very interesting use cases for it that people with big brains could think up
hopefully I can comprehend all this one day lol, 'cause for now that's the best way I can use my SD while remote
appreciated!
Interesting. For some time the LLM gave me prompts in Illustrious style, with 1girl and all. Then it began to just do natural language, then a mix lol
If the history isn't being manipulated, that will happen
Yeah, figured
By default for the 'draw' tag, it should be, though
Got it
I'm going on vaca so, no development for a week or so
Lucky you! Gives us time to root into existing stuff lol
I do have one last tip for something you’d probably be interested in
You can use a combination of the dynamic prompting feature and the spontaneous messaging behavior feature
To make an automatic image-prompting character
Just change the maximum replies to -1, and it will continuously re-prompt the LLM
With dynamic prompting syntax, those can all be unique prompts
You can pretty much just make an automatic image-generating character that you can switch to and from
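The combo could look something like this: expand a dynamic prompt template each time the spontaneous messaging loop fires, with a negative reply limit meaning "keep going". The `{a|b}` syntax is the common Dynamic Prompts style and `max_replies` stands in for the bot's maximum-replies setting; both are assumptions here, not the bot's code.

```python
import random
import re
from typing import Iterator, Optional

CHOICE = re.compile(r"\{([^{}]*)\}")


def expand_dynamic(prompt: str, rng: Optional[random.Random] = None) -> str:
    """Replace each {a|b|c} group with one random option, innermost groups first."""
    rng = rng or random.Random()
    while True:
        expanded = CHOICE.sub(lambda m: rng.choice(m.group(1).split("|")), prompt)
        if expanded == prompt:  # no groups left to expand
            return expanded
        prompt = expanded


def spontaneous_prompts(template: str, max_replies: int = -1,
                        rng: Optional[random.Random] = None) -> Iterator[str]:
    """Yield a freshly expanded prompt per reply; max_replies < 0 means never stop."""
    sent = 0
    while max_replies < 0 or sent < max_replies:
        yield expand_dynamic(template, rng)
        sent += 1
```

Take a bounded slice of the endless case, e.g. `itertools.islice(spontaneous_prompts("a {red|blue} {cat|dog}"), 5)`.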
Oh interesting
Sorry, which file is that in? My brain melts by the end of workday lol
ok got it
@calm rain quick feedback... after sending a swarm payload, this is an example of the first message emitted:
DATA: {'status': {'waiting_gens': 2, 'loading_models': 0, 'waiting_backends': 1, 'live_gens': 0}, 'backend_status': {'status': 'running', 'class': '', 'message': '', 'any_loading': False}, 'supported_features': ['comfyui', 'refiners', 'controlnet', 'endstepsearly', 'seamless', 'video', 'variation_seed', 'freeu', 'yolov8', 'comfy_latent_blend_masked', 'comfy_just_load_model', 'comfy_loadimage_b64', 'comfy_saveimage_ws', 'folderbackslash']}
Feel like the request ID should be part of this
I know, thinking too much into it 😄
i was originally going to do that in Swarm - in fact, you can still technically use the auto webui backend on swarm. The problem is... who gives a shit about auto-based UIs anymore? The only reason to use it over comfy is the interface, and, well, when you're not using the interface... comfy is far and away better to the point of not making any sense to bother with anything else.
that message isn't related to a specific request, that's just a general status dump, you can ignore it for the bot, it's mainly intended for the UI to keep state updated
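Since that first message is just a general status dump, the bot can skip anything that isn't gen-specific. A tiny sketch: the status keys come from the DATA sample above, while `image` and `gen_progress` as the per-gen keys are assumptions to verify against Swarm's actual output.

```python
from typing import Optional


def classify_ws_message(msg: dict) -> Optional[str]:
    """Sort a parsed Swarm websocket message into 'image', 'progress', or None (ignore).

    'image' and 'gen_progress' are assumed key names for per-gen messages;
    anything else, such as the general status dump, returns None.
    """
    if "image" in msg:
        return "image"
    if "gen_progress" in msg:
        return "progress"
    return None
```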
Just saying it takes too long to get a prompt ID
eh? I could have it emit one earlier with no data, if you need that?
not sure why you would though
The logic of it makes sense in Comfy, to me. You post and get the ID and you’re sure all data you get afterwards is associated with that ID
how do I actually set it up? I tried using the discordbot outside the text-generation-webui folder and it doesn't work, tried putting it inside and it doesn't work either. can anyone help me with it? I use arch btw
btw I tried to use it with text-generation-webui and want to set up image generation as well
Vloth here posted an Issue on the repo but I couldn’t help, seems to be an OS specific problem
so can anyone tell me how to set it up from scratch?
could you tell me how to set it up from scratch? like where to put it? might be because of that
There are install instructions on the repo that are straightforward. First you install TGWUI.
Then while in the root TGWUI folder you git clone the bot. So the dir is ‘../text-generation-webui/ad_discordbot/<bot files>’
Then just run the launcher file for your OS
do i launch the text-generation-webui first or the ad discordbot first?
You just launch the bot only - the bot does not use the TGWUI API - it directly imports modules and runs them
When you run the bot it basically runs TGWUI backend code without the UI
I’m planning to rewrite the code at some point, make it API
For image generation - copy the ‘Prompt Enhancer.yaml’ character from the ‘examples’ dir, into user/characters
Also copy the sdwebui payload (or Comfy, or swarm) from examples/payloads - put in user/payloads. (I definitely need to update the Wiki with this…)
wait, it did work, it just didn't launch because it was in the wrong directory
now how do i add image generation?
btw is this normal?
What r u using? Forge? Comfy?
i havent set it up yet
btw this is the error output
I’m on vacation btw, working overtime here 😛 I don’t know what that’s about…
Maybe try copy/paste that to chatgpt
Forge is probably the easiest to get into… has the most supported features with the bot
could you help me set it up?
Download and install it. Download some SDXL models from civitai and put them in models/Stable-Diffusion/
That’s it - you can generate images. To work with the bot you need to launch Forge with command flags --api --listen
You need to check the bot's config.yaml and ensure imggen is enabled. Check dict_apisettings.yaml and ensure the URL:port are correct for Forge. Ensure the Imggen client is Forge - it must be “enabled: true”
When you launch the bot, on startup it will either say the imggen API is working or will give an error
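Roughly what that startup check amounts to for Forge, as a standard-library sketch; the bot's real check and messages may differ, and the base URL would come from dict_apisettings.yaml. It probes `/sdapi/v1/sd-models`, the same endpoint as in the 404 log earlier.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def check_imggen_api(base_url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Probe an A1111/Forge-style API; return (ok, human-readable status).

    A 404 on this endpoint can mean the server is up but isn't serving the
    API, e.g. another app on that port or Forge launched without --api.
    """
    url = base_url.rstrip("/") + "/sdapi/v1/sd-models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            models = json.load(resp)
        return True, f"imggen API OK: {len(models)} models at {url}"
    except urllib.error.HTTPError as err:  # server answered with an error status
        return False, f"HTTP {err.code} on {url}"
    except (urllib.error.URLError, OSError) as err:  # refused, timed out, bad host
        return False, f"could not reach {url}: {err}"
```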
If it’s working you can use “/image” command, or by default if you start your message to the LLM with “draw” it will trigger image generation
What model formats are currently supported? Like GGUF? Safetensors? Bin?
Forge, Comfy and Swarm can run Flux models, including GGUF
Most flux models do not have the text encoder, clip, and vae baked in - they need to be downloaded separately and loaded in tandem
For most SDXL models you just load the model and that’s it, all baked in
in the case of Swarm you don't need to worry about the secondary files, it's all auto-managed
huge list of image model classes supported here https://github.com/mcmonkeyprojects/SwarmUI/blob/master/docs/Model%20Support.md, use civitai to find your favorite finetune, they basically all work in swarm
(in forge, god help you if it's a recent model class)
@halcyon quarry are you open for suggestions btw?
Since I have a lot of suggestions for the new update if you want
Hope you have a nice vacation.
Pretty sure there’s a parameter for thinking
where?
and how do i disable it
user/settings/base_settings.yaml
Also check out example character M1nty for usage of per-character settings overrides
Close enough 
how do i make it stop thinking though?
it's just too long and sometimes the answer gets cut off
thinking: false
where do i place it?
dict_base_settings.yaml
Go to llmcontext > state > I think it’s already there defaulted to true
there isn't one for thinking
I'll bet it’s enable_thinking
🤔
@halcyon quarry so most of the text keeps getting cut off for the AI, what do I change to extend the maximum words for the AI?
Keep getting cut off like this
The parameter name is max tokens, I think. And the default I had was 2048, I think. Sorry, don't have access to my PC to answer exactly
In the same file where you changed the thinking setting
Also, copy the whole text of that file and insert into chatgpt, ask it to explain all the options. It will help
Under dict_base_settings.yaml -> llmstate -> max_new_tokens
if you are a user of TGWUI and you have your preset, you can fill in the preset name, not sure if it works though :p
“Preset” does indeed work
Why does it answer stuff for me