#✨|sdxl
1 messages · Page 134 of 1
Yeah same, I've tried to make some sort of understanding regarding it, but yeah it's just been pretty vague really
it seems like in general 4096 works. tried 8012 but not sure if that improved anything
also wonder if sometimes using different aspect ratios would be preferable
I think (mobile atm so can't verify till at pc) I've been testing sticking with matching my latent ratios, but I've not been doing any serious images yet as I've been finalizing my main workflow
man, this ipadapter setup I'm working on is decent
still trying to balance things. the cats slot is kind of overwhelming the other 2. but getting there
what I'm working with currently
can't deny the cat is looking good, but needs more spock and dinosaur
using stylers but nothing that's going to alter the style of the output too much
top left has transcended and taken on it's next form
I would have guessed a porcelain plate, river rock art, and a cat were your inputs lol
i have it on hdr photograph and enhance for stylers, so the hdr photograph one might be causing them all to look like ceramic figurines, lol
probably isn't a magic ticket ideal setting for the parameters either
these are pretty quality though
kitty
switched it up
something is nanning
pathetic
2 out of 4 came out right. better than 0 I guess
Do we have a tile controlnet for SDXL yet?
Their results looked pretty good
I suspect they did move to latent diffusion, pixel diffusion has a certain look you can notice
I think it similar to SDXL, but maybe more parameters. the text in images might be more consistent
I don't think stable diffusion will be in trouble. they have different approaches to things
dalle is like a nice car that doesn't give the driver access to it's engine
Yeah I've no concern for the usecase of SDXL
it certainly looks like it works well, but different target audience I believe
maybe some overlap
idk, I think SAI might need to keep developing SDXL if we want to keep up with OAI
sam altman seems like a high level scumbag. but decent pr team. hopefully they sort of fizzle out over time
idk. I think if Dall-e 3 will actually outdo SDXL we might need to figure some stuff out
not bad
I don't think stable diffusion, or anything else, will ever take the runaway lead because they all feed off each other. the tech and ideas spread around so quickly
btw, this is a new setup. took out all the non-ipa stuff
idk what to tell you. SDXL did take the lead, but now not really sure about that because dalle 3
I don't see why we should be concerned with dalle. it's not likei t's going to cause sdxl to cease to exist
and whatever they're doing will get ported over eventually anyway
clarify "ported over"
well I didn't mean it as a literal thing. I just mean whatever they're doing will eventually get picked up or at least considered by all the competition
and it's different target audiences anyway. so I don't know. I really don't know what they could do to impact what I'm doing
in a negative way at least
I did remember Emad talking about an SD3 in the works; that might possibly be an equivalent eventually
I believe that was mentioned on the SDXL release show stage?
also, I think OAI might need to figure out something similar to IPA to use Dalle3 for blending images. since it's (probably) not pixel diffusion; they would need to figure out that component
well I'm sure they have people that understand the concept thoroughly
if I can sit here at my crappy computer and do what I'm doing surely they're able to do a whole lot more
and SAI doesn't? IPA wasn't made by SAI
no, where did I say that?
I don't think stability had much if anything to do with any of the ipa stuff so far have they?
Terminator 2: Judgment Day Scene Stars: Arnold Schwarzenegger, Edward Furlong Director: James Cameron Writers: James Cameron, William Wisher Jr. Producers: Stephanie Austin, James Cameron, Gale Anne Hurd, Mario Kassar, B.J. Rack Production: Carolco Pictures Distributon: TriStar Pictures Released: 1991
► watch Terminator 2: Judgment Day o...
I mean, that's the only way to blend images using SDXL, so I suppose they might have
I'm guessing they might have stuff in the works. but I actually don't know anything. so who knows
Just think there always working on it, its there baby
Looks like alot of the process is sorting pictures with word tags
Then telling it if its right or wrong
not really, there is more stuff going on, with things like IPA and MJ blend the subjects in outputs look like imageA or B
like this for instance, obviously- the bench is really similar to the bench in image input
what's going on there?
so it's not [BLIP OUTPUT FROM IMGa]+[text]
my theory is IPA might be injecting image inputs in to the model itself, while mixing with CLIP_vision
what do you mean by "image inputs?" I'm slow sometimes
provided image to be encoded and used as prompt
for instance, here it was [IMAGEa] + [in minecraft]
and with this it's [IMAGE_A] + [IMAGE_B]
the injection process can cause the model to lose many of it's qualities, so the IPA is meant to be pre-sampling then passed onto a sampler with the original model with BLIP output from the unfinished samples as conds, with denoising strength<1
well it inserts between layers as I understand it
so IPA is meant to give the normal model an idea and the unfinished latent that's a blend of imageA+B
this way it allows for AIT and style nodes to be implemented for better use
I'm assuming OAI also knows all this stuff, so dalle3 might have the ability for image input in prompt and image blending. so the way to outdo that with open source will be to improve SDXL itself
speed isn't as problematic due to things like OneFlow and AIT
which I'm also sure OAI has closed source alternatives of
of course
I think their biggest weakness is also their biggest strength in a sense
they have access to so many resources and things. huge amounts. but then, something that big can't move too quickly either
Yeah. Again, the thought of closed source AI outdoing open source AI sickens me and probably most people in here. So if dalle3 will actually beat SDXL we would need to take action
And the worst part is, OpenAI was intended to be open source in the first place. So they are defeating their own purpose like this
I don't think it's a winner take all sort of thing though
wish I knew how to pinpoint what causes things to get cutoff
should probably start using those solid masks
Always to the right?
always
same
I also encountered that when working on my AIT blend workflow I used a second ago. Are you using IPA VIT-H or the normal one?
normal one. they're moving to vit h though
This is IPA and never gets cut off
For sure, it works really well. I'm not at home at the moment - but I plan on releasing it very soon
Also the img+txt version
Both take ~16s per image for me, and they beat MJ's stuff most of the time
well I think mine is doing alright. maybe approaching a bit differently though
how do you take care of things moving to the right?
You'll have to wait and see ;]
my flow is goofy, but it's getting there
I don't know if it's really possible to perfectly balance 3 images without adjusting things somewhat for different combinations of things
yeah, you have a lot more nodes I believe
keep mixing it up to see what happens
interesting how it transfers the concepts to the output
pretty sure openai was originally made to be open source but they changed their mind once they realized they were the current best and could make hella money off it
Dalle3 isn't pixel diffusion I think
We could outdo it if SDXL would keep getting developed
I fucking hate how a company called OpenAI is always making closed source AI though
I did hear about an SD3 being in the works, Emad mentioned that on the same day SDXL was released, so that might be the next phase.
What size images was SD 2.0 train on ?
768^2
SDXL is 1024^2, but insanely more params
I believe dalle3 is somewhat similar to SDXL, judging by the outputs they showcased, it's definitely latent diffusion similar to SDXL. But they said it would be used via a multimodal inference. We could easily achieve this if there would be an interface for LLaMa2 x SDXL
Like giving an idea to LLaMa, then have it make a good prompt, send to SDXL - and boom shakalaka, multimodal inference
Have you done any video with sdxl?
trade the refiner for an LLM
Oh yeah, for sure. With modern SDXL finetunes the refiner would degrade results
refiner also breaks down really fast at higher res
I think LLaMa 3b would be a great fit with SDXL
deviate more than like 15% above 1024 and it just turns into soup
Also on 1024 stuff, refiner isn't necessary
Slows down speed and worsens outputs
yea I run it without refiner by default and only turn it on if I wanna re-render the seed and see if it like fixes the face or something
life's been easier since I started doing that instead of fiddling perpetually trying to make the refiner objectively better
The latest AIT workflow I made doesn't screw up faces, I usually bypass the refiner haha. But results are insane and equally as impressive speedwise
I am liking XL base though so I haven't even used any tunes yet
waiting for the civit.ai space to mature a bit more
what ait nodes does everyone use I kinda wanna try it on rocm at some point
assuming compiling the kernels for my gpu isnt too involved
You would have to compile AIT modules for AMD, it wasn't done yet.
yea but like is it a whole 50 step process or is it just python compile.py models/sdxl-base.safetensors
It's easier than that even. Especially on Linux. When I'll be home I could help you compile for AMD if you want
On windows it is a hell though, I'm the only person that could provide the precompiled modules for batch range 1-4 on windows
well rocm doesn't even support windows yet so guess that makes it easy
You would need VRAM>12G to compile for batch size>1 though
the main hangup might be idk if amd's AIT fork for my gpu follows upstream AIT
or follows it closely enough to have the required updates for whatever the nodes use
24 gigs but no flash attention so make of that what you will
You need flash attention 2 for compiling AIT
I dont even have flash attention 1
No Xformers?
AMD has 0 flash attention support for anything
they have a fork of it but its out of date and fails to compile on newer ROCm versions
that's why my gpu uses like twice the vram of other people for SD
24 gigs might be enough to still compile though
so 24 gigs behaves like 12
maybe if it can fallback to a math impl instead of hard depending on flash attention
Yeah, you might be able to compile anyways
ExLLama uses flash attention a lot but it works great on my card with the fallback method anyways
I could run 30b @ 4 bit pretty fast
I quantized mythomax 13b into 8bit exl yesterday and I get like >40 token/s
I think exLLaMa is similar to AIT? I heard it's a massive speed boost
yea its automated though. it compiles the kernels for you
Jesus, nice!
so you just feed it a model and when it loads it hot compiles the kernels
so the first load takes like a minute then after that its instant
I also made a 13b 4bit version that hit 65 token/s
thats not the hot compile though that's requantizing the whole model specifically for exllama
hot compile from a gptq based model is slower
I thought exLLaMa is only for GPTQ, how did you use it on 8bit?
exllama2 has its own custom quantization format
exl2
so you can requantize base models into exl2 and it does mixed bits between 2 and 8
so you can have like 6.5 bits average or something i guess
only problem is taking the measurements to find the error rate of all the different bit quantizations takes ages. mythomax L2 13b took like two hours just for the measurements...
so if you have a model you really like and want that huge speed boost I'd give it a try sometime
the exllama dev has pre-quantized versions of the llama2 base models
ig the 70b can run entirely on a 3090 with like 2.4 bits average or something
Oh cool! Is that for Oobabooga?
yea
works out of the box on rocm too
just yells at me for not having flash attention lol
wonder how much exllama2 takes
how much what?
vram wise
depends on the model, how you quantize it, and your context
the llama2 based mythomax 13B quantized to an average of 8 bits with 4k context uses like 17 gigs of vram without flash attention
oh wow, went down quite a bit
Oh, so not the stuff TheBloke uploads? You quantise yourself?
i thought it would take ~24 45
and and inference for me is 40 tokens/s
ooh woow that's fast as hell
I use his GPTQ stuff but he doesnt make exl2 versions
I quantize it against the same calibration data he does so I assume they're comparable quality
idk what most of the settings do though lol
the convert script from the exllamav2 repo seems straightforward enough
https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
so you just convert.py -i mythomax13b/ -o /tmp/quant -cf mythomax13b-exl2/ -c wikitext-v2-test/ -hb 8 -b 8 then wait two hours and you have a quantized model
if you re-quantize the same model to a different bit depth its a lot faster cause you can re-use the bit depth error measurements
Then you just load it using exLLaMa 2 through ooba?
yea the output folder from -cf loads like a normal model folder
i have 3 measurement files for MythoMax L2 13b and Mythallion 13B if you want them
I'll look into this. Does the bloke share his quantisation setting or we can come up with better ones?
just says he uses the wikitext-v2-test as his calibration data but other than that not much. I think the EXL2 quantization settings are different anyways
measurements for mythomax/mythallion. First two are measured with 4k context and a higher line count, last one is just mythomax with default settings.
i have no idea how optimal they are but higher numbers are better I assume. the r400 one took like 3 hours to compute so it better be good lol
Will TheBloke also include EXL2 or do we have to do this on our own
Also, how do you make these files?
-om flag takes measurements, -m flag uses them. It measures the error for quantizing each part of the model in 20 different ways and then using the measurement data when you quantize it for real it mixes the bit depths for different parts to be less lossy I guess
I don't have much knowledge about LLMs. What does R mean?
I just got started with all this yesterday so im new to this too
its the -r flag from the exl2 convert script
how many rows of calibration data it uses
default is 100 so r400 means i used 4x the default which was probably way overkill
Does that effect performance?
no idea
my hypothesis is it just affects quantization accuracy
like if you use more calibration rows it quantizes better
so token/s is the same but the quality of outputs might improve?
the docs arent very specific so im just guessing tbh
quick gen using 8bit AutoGPTQ vs 8bit exllama2
Is 8bit really better than 4bit?
¯_(ツ)_/¯
it has a lower error in the measurements by a few %
for 13b models even 8bit doesnt come close to using my full 24 gigs so
guess 4bit is faster though
60 token/s vs 38.65
Obviously, but by how much and is there degradation?
Both will be instant so the one with better outputs will be the obvious choice
idk how to objectively measure degradation. The calibration err values were a few % lower for 8.13 bits vs something like 4.13 bits is all I can say
yea that's why I just went with 8 cause I assume its better
both are many times faster than I can read
Interesting. I'll look into this. It might be a huge breakthrough for using SDXL and these things at a multimodal inference
coming from wizard-vicuna 30b 4bit I prefer the results of mythomax l2 13b 8bit so far, but there's many differences between them besides just bit depth ofc
the 30b is based on llama 1 so probably why
Oh yeah, this will also work on 33b models
it works on anything llama based
Is there any 33b models that beat Mythomax?
all the 33b models are llama 1 and mythomax is llama 2 so idk
depends what you're going for I guess
apparently mythallion is also better than mythomax for chat specifically but I havent played with it yet
Also what about VRAM usage? Is 4bits half as much than 8bit with VRAM?
lemme see
We might be able to fit SDXL and MythoMax at the same time if it's that efficient
9.7 gigs with 4 bits EXL2, 15.5 gigs with 8 bits EXL2
just loaded no extra context or anything which'll add a bit more
so not quite half but I mean close
I have 12gb of VRAM. Maybe flash attention could resolve this?
one note is 8 bit is a target depth, not absolute. so if 6 bit 128g performs better than 8bit 32g for any specific layer then it'll use 6bit for that layer instead
so really the 8bit is probably more like 7.5 bits or so
you can quantize in any float between 2.0 and 8.0
so you could do 5.5 or 6.0 and easily fit within 12g
so the final model will be a mix of 4/6/8 bit layers depending on which layers benefit more or less from high bit depth
to get your target average
but yea this message implies that flash attention might lower your vram costs quite a bit
I checked the rocm flash attention fork and they cherrypicked some stuff from the flash attention 2.0 tag but still wont compile for me
and im not smart enough to fix clang errors
Does EXL2 make engines for running? I'm thinking we could make it even more efficient with something like AIT
maybe? im not sure where the cache for kernels would be
there is a bunch of exllama2 cuda files in my torch cache dir so
Anyways, I'll try EXL2 with 8bit when I get back; I'll update if it's better quality than 4bit
is there a way to benchmark relative quality?
like in some objective way with numbers
curious how my 8bit exl2 conversion compares to TheBloke's 8bit gptq
while you do that ig ill try to make AIT work on rocm
tomorrow its 1 am here
It might not be as easy. If you fail to do so, I'll help you when I get back
what's the repo for the nodes & compiler?
im on Arch Linux so any random tools are just an AUR install away
That's the thing, I had to make a compile script for scratch to make modules for SDXL
bru
that sounds super involved
python isnt that hard but idk shit about numpy/pytorch
Yeah, but that compile script should be universal
I can give it to you when I get the chance
Oh right
I gave that script to FizzleDorf, it's included in the node's repo
So you just install Bdist wheel for AIT and run that script and if you're on Linux it should work right away
Interestingly, you don't need the AIT pip module to use the modules themselves, but you do need it to compile them. This is why the node is shipped with precompiled modules
so the ait package is just for compiling and the modules are ran with pure torch?
probably helps my chances of it working lol
That's the thing, AIT enables the models to be loaded via MSVC engines, this is why it's universal
But yeah, it's only used for compiling the engines
so the ait package is only needed to load the modules on windows? Linux it just uses torch directly?
all this shit's confusing. maybe intel has the right idea with that UXL thing they announced
No, on windows loading the modules is just as easy as on Linux. The compilation process is just much more complex on windows
does nvidia cuda use clang too? ik rocm does
Specific cuDNN and CUDA versions.. compiling AIT on windows is a leaving hell
Linux handles all that on it's own, so you just need the PIP package and the script
But after someone compiled the modules for Linux, no one needs the pip package
And that someone might be you when it comes to AMD
maybe. the rocm sdk packages on arch automatically pull all the compilers too so my dependencies should be fine.
hopefully its just run script and go
I compiled my own pytorch before it officially supported rocm 5.5 so im hoping AIT is easier than that
It should be
when it comes to arch gfx1100. Dont think the kernels are cross-compatible between arches
so 7900 XTX and XT
They are. The modules you will be compiling will work for all people that use Linux and has the same GPU architecture as you
yea so gfx1100? just the 7900 cards
rocm has separate flags for gfx1100 compared to other rdna 3 cards which is gfx1102 I think
Should work for all the 7000 series cards really
so it might work but idk
rocm discriminates but maybe it doesnt matter as much for ait
With Nvidia cards modules I compiled with my 4070ti work for people with 3000 series even
Yeah, should be
aight
guess thats my project tomorrow
when its not 2 am
rocm's ait fork is just python setup.py install so I assume since that goes smoothly the rest should too
About that- to use the compile script made for SDXL you need a specific pre release, or maybe now the normal one works
Hmm. I used 0.3.33 for providing the modules, idk how this will behave with that script
there's no new commits on the rocm navi 3x branch so unless the mainline one works thats all I have
It should work, but not 100% sure about this
we in the weeds now bois
thats assuming they even finished the AIT port and it doesnt just segfault lmao
I had read that someone else got it to work with the 7900 XTX on sd 1.5
I do love AMD way more than Nvidia though, so I hope this makes AMD cards run SDXL as fast as Nvidia cards
im hoping flash attention lets me beat the 3090 at least
in blender I beat 3090's without raytracing by a decent margin
Nvidia are so greedy, AMD needs to put them in their place lol
blender doesnt support amd raytracing yet so idk how it compares with optix
maybe intel arc will save us all
saw an a780 16gb on amazon for $320 today
granted their ML toolkit is even less complete than AMDs so
no exllama for them
Maybe. Does that use CUDA cores or MATRIX cores?
its their own thing✨
Pog?
One as for OneFlow?
¯_(ツ)_/¯
OneFlow is faster than AIT but less compatible
Ah, I see
hence OneApi
so the spec is there for nvidia or amd to use too
but AMD already tried that with rocm and you see how that goes
amd HIP can backend to nvidia cards fun fact just no one does it
Yeah, I wonder if AIT gives AMD cards a bigger boost than it gives Nvidia
probably less
AMD has weaker tensor/matrix/whatever cores
so there's probably less optimization
also their AIT fork is stilll WIP so probably not complete
I'd actually guess more, look at what exLLaMa did to your speed
hell supposedly ROCm doesnt even make optimal use of RDNA3's techyes
try it yourself before you make judgement calls lol
you might 5x speed too
with AutoGPTQ I was still behind Sytan's 3090 by a small bit
Nope, I get 10T/s with GPTQ and 37T/s with exLLaMa
on the same model?
Yeah
idk i never tested the same model on both, only my re-quantized models
I have a 30b 4 bit that should work on both
lemme test
I think AIT and exLLaMa just "unlock" the potential that the software stack is limiting, so these should give a bigger boost to AMD @nimble heart
the thing is
this person said they went from 18 it/s to 25 it/s with AIT
so
That's still huge though
but idk how optimal usage that was. it sure as hell wasnt comfyui
this was way before XL dropped
hello everyone 🙂 hope you could help with my doubts - I am in the phase of thinking/planning the project, I would like to fine-tune SD with ideally a dataset of few hundreds of pics, ideally few different styles - if it cannot work, I'll scale down to one stile less pics. I plan to use vast.ai, I have good python skills and knowledge in ML/DL. However I haven't touched generative in years and now I am a bit lost. I am looking at Dreambooth and Everydream but it's not clear the way I have to go, nor I find examples that seems good/relevant for my case
yea but nvidia cards like double with ait dont they
if you have any thoughts they I'd be very grateful 🙂
They do, but AMD might get an even bigger boost when used correctly.
Like AMD gets a bigger boost than Nvidia from exLLaMa
We'll find out if that theory is true after you try it
Wizard-vicuna 33b 4bit
Autogptq vs exllama.
so not even 2x where you almost 4x'd
I 4x'd on 13B, and so did you
with EXL2
I didn't try EXL2 yet, I'll test that when I get the chance
what model was that specifically I'll DL it rn
TheBloke MythoMax GPTQ
what depth
I have no idea, I'll check when I get back
4bit 128g I'd assume for your card
Yeah that one if I remember correctly
112MiB/s lets go
I am here!!
What have I missed?!
what gpu do you have? 4070?
everything
Ti*
dang. those almost cost as much as my XTX did
Whats the latest in control nets for the XL’s? Wanna try make summin tonight 🤔
aight something's wrong
it ran @ 50 token/s on AutoGPTQ and just barfed german garbage
Tf?
Screenshot model settings?
50 token/s, german garbage
its just defaults
bloke's files usually work with defaults so idk
let me try exllama
This looks like something GPT2 would do when it was used on cleverbot
exllama works fine. 68 token/s
Cool
exllama 2 specifically. I dont know if 1 works on rocm
2 just got rocm support like a few days ago
so now im converting all my shit to EXL
reminded me of when I tried increasing the max context on OPT
like the moment it gets passed its limit just barf
wonder if flash attention would make inference faster too or if its just memory
like 70 tokens/s is cool and all but it can always be faster
It might. 68T/s is godly
'godly' on a 9 gig model
i'll have to try the llama chat 70b
the exllama dev uploaded exl2 conversions so I can just download that, no need to spend hours quantizing
Might be a bad idea, I always wanted to try 33 and 70b and could never fit them in my GPU. I'll try it again with EXL2 when I get the chance
llama.cpp supposedly supports hipblas for splitting between rocm/cpu, and while it compiled fine and the model loaded inference crashed python
Ohhhh is exLLaMa not dynamic?
I was wondering about that
multiple gpus but it cant split to cpu I dont think
might be wrong
according to the exllama2 dev, he has 70b running on a single 4090 with like 2.4 bits or something
ig I could just set my gpu mem limit to like 8 gigs and see if it uses the cpu too
This is the reason why AIT is faster than TRT, TRT isn't dynamic so AIT can move around between GPU VRAM and CPU RAM
it just ignored my limit
speaking of exllama, I made some nodes for comfy
I havent run a too-big-for-gpu model in exllama yet so idk how it handles that
what depth should I try...
the lowest, probably...
Cool! I was thinking about including something like this in my AIT workflow
Multimodal incoming
yeah it works well enough with a good prompt or lora
wish there was some good 3B model though
how recent is the llama2 data? will it know what diffusion prompts are?
feeling the pinch with 10GB usage for Base SDXL + 7B
wonder if you could make a llama tune based on those prompt websites
That's the thing. I think the goal is to make an interface where you just write "make me an image of (........)" then LLaMa will know how to prompt with SDXL, send the generated prompt to a workflow, then send back output images
not sure about the llama2 dataset but it's not like it can't hallucinate a prompt without knowing what SD is
i mean even llama 1 can guess but I never got it to be good at it
it just kidna throws random words together
which I mean tbf that's what a lot of actual 'prompters' do but still
I remember @ionic dragon saying they made a LLaMa2 LoRA for making SD prompts
maybe with llama2's 4k context you could just fill it with example prompts then have it give you some new ones
oh right, I tested that lora with my nodes, it's pretty cool
older ss but you can see the difference
the top without lora is just hallucinating cause it has no idea what SD is
exllama loras are also very easy to load/unload
yea
but I dont see many loras being used for anything
its always tunes and merges
So we could make an interface that you would ask LLaMa for an image, then send generated prompt to SDXL and send back output images
true that
on HF
pretty much what the dall-e 3 page shows with chatgpt
holy shit it loaded
there's already addons like that for ooba iirc (sillytavern too)
Yeah, but do they use SDXL with AIT?
I mean no, it's just some basic api call
to use with comfy and ait uhh I'd guess you'd either need some template first or even have the llm generate the workflow in addition to image
I say we could make something like Comfybox except it's multimodal
llama2-chat-70b @ 2.30 bits average EXL2
totally, just need somebody to make it kek
so is my vram usage
are there prebuilt wheels for exllamav2 yet?
wheels?
on oobabooga you can just clone the repo into ./repositories/exllamav2/ and it loads it
no compiling or building wheels
no wheels that I know of
It's inevitable I'd say.. just an interface where you provide an SDXL workflow, an LLM, then you could chat with the LLM and ask it to make images
huh that's weird, pretty sure it errors out on windows without visual studio, ninja etc
no idea. im on linux with amd/rocm
hence exllama2 specifically cause its one of the few things that actually supports rocm well
llama.cpp segfaults
ah that explains it, compiling exllama ext on linux is a breeze
We might not even need to make a UI from scratch. You said Oobabooga has a feature for LLM+diffusion right? What if they make the diffusion part work on ComfyUI
the cuda extension gets compiled on first run, it's just fast (when it works) so you probably didn't even notice
ah yea rocm does the same thing. first time you load a model it take a bit to compile a kernel
yeah, iirc ooba uses the auto1111 api right now
but it should be possible to switch to comfy without too much effort
but on Arch Linux I just install rocm-ml-sdk and it pulls everything I need to compile everything rocm
Yeah, then make some kind of node that places the prompt the LLM made in the prompt, then you could use any Comfy workflow with an LLM
i mean it did say it wanted tobe the auto1111 of llms
It feels like it's almost in our grip, all we need is a way to execute this.. does Oobabooga plan to do this?
Hi
comfy does have some api since the swarm-something ui uses it
haven't really looked into how easy or hard it would be to implement in ooba/sillytavern though
Which model has the best war/military/agents related things like extraction/mission impossible series? (best realistic)
Shouldn't be too hard. By making it Comfy, we could use custom workflows even, just would need to specify where is the primitive with the positive prompt
yup, there's an example here, looks like you can just provide any workflow you like and then plug prompts
https://github.com/comfyanonymous/ComfyUI/blob/master/script_examples/basic_api_example.py
Does Oobabooga know about this? I feel like this is a goal they could focus on
most likely, I can't find any plugin for it currently though
most focus on implementing llms in comfy, not the other way around (I'm guilty as charged too)
there's also the issue of gradio being hell to work with (subjectively)
I reckon this could be adapted to comfy https://github.com/oobabooga/text-generation-webui/tree/main/extensions/sd_api_pictures
The only thing Oobabooga would need to change about the LLM+diffusion feature is to have the LLM unload before executing the diffusion part, most people can't fit SDXL and 13b models at the same time
Heck, maybe even keep one of them in CPU RAM at all times and have them switch each time being used
that's also a problem on comfy's side I think, when you just want to chat you should be able to unload all the sd models
not sure it's possible to do that currently
That shouldn't be too difficult.. it's definitely possible as of now
We just need to get Oobabooga to do this
Which they don't have to, so idk if this will get executed
there's some complicated memory management stuff in comfy but no 'unload everything' button
Maybe have Oobabooga do that? Pytorch commands should be viable on the GPU regardless of the environment
So having them switch places from VRAM to CPU RAM and back each time they are being used won't be too difficult
no clue how that works exactly, but if you're calling the comfy api from ooba's side you should be able to also send an 'unload everything' command, since it could also be calling a remote pc
well, not that you can call the comfy api at all for now so just theory
Or "send model to CPU" command to save even more time when the switching is happening
yeah, same idea
Same like ComfyUI does when VRAM gets full when using the refiner, have the refiner and base switch when being used
Having this done could be the future for multimodaling
@indigo carbon I messaged you
I'm sure this will happen eventually, even soon..
anyone?
no idea about that
also I just checked sillytavern and there's a bunch of apis already
(just not the one everybody's hoping for)
Silly tavern is just Oobabooga wifeu edition, not really relevant
A ComfyUI extension for Oobabooga would be amazing imo
Wow, this is beautiful!
It should be about eh 4h 10 min I release the lora then U can have a blast with it.
Still trying to collect better demo pics.
took one of the dall-e 3 promtps and tried it on dreamshape xl (leg gone
)
Waiting for @dapper dragon to get home come home raccoon so we can release this lora.
@nimble heart how did you get EXL2 working? I got the pip package and built from source, but idk how to quantize using it
Legs not gone, is under skirt.
@cyan crown what model?
o.k. nice!
How are you guys navigating terrible faces in fullbody images in SDXL?
it is thing of faces in distance. Dont know what with it. Probably there could be workflow with codeformer or gfpgan.
is it him?
BRO 🔥
for some reason my router disconnected me for 10 minutes
during that time I've found that lama cleaner does not work without internet
Hey Picturesonpictures, how often do you get one of those right sided image outputs from using IPA?
comes and goes. I started turning up width cropping. not sure what it does but we'll see
That's exactly what I was gonna test out
Was reading through the paper again to see if I could gain any other insight and came across this part again
well I kind of ran out of other ideas. but can't completely tell what it's doing. it's not like it literally crops the image or the render
Under the 2.3 "multi-aspect training" section
you know, I feel like I'm supposed to understand what's goin on there
I've got a better grasp of things now, but early on there were some things that were just so new and foreign to my mind that it took a little while for my brain to acclimate to being aware of their existence
like I'd read about some AI thing that was so far removed from anything I'd considered prior that it just couldn't be absorbed
Yeah I still want to know what the width/height/target/crop etc fully does, I feel like I barely have an understanding on them, to the point where I kinda "think" I start to understand, but then don't really know how/when I'd actually change them
what really grinds my gears is how they always just put these images up with 3 word captions and we're left to infer what sort of magic is occurring
i just got the best worst possible image i ever have from sdxl, and i'm sure i can't post it here
Rule #4 is waiting
yeah, that one doesn't bring me joy
it's insane how realistic it looks
why did the ai make that?
0 negatives, 20 steps restart, with the prompt I feel alive. My self-portrait suicide Tonight is no time to die Now I'm awake on my model
that's it
nothing else
just let the ai figure something out from that prompt
am i losing my mind (probably) but didn't at one point all images saved in "output" folder by default save workflow in image? dragging and dropping some now and none of them seem to pull in workflow anymore
are those lyrics or what?
yeah, they're lyrics
i'm addicted to lyricsprompting
that's why i have my verser script which converts songlyrics into perfect SD snippets 😄
a1111 -> dynamic prompts
| Slow, we step into the light. Inebriate in sights Unveil the truth that guides us home
| Look inside. With darkness my tears collide Sinking into the shrine of me I'm ready to be set free I'm sick as the dying, the air I breathe
| I feel alive. My self-portrait suicide Tonight is no time to die And now I'm awake
| Truth can break your soul in two. Just when you think it's you The trеachery will breed inside
| Embers, sweet and unrеfined. So bright but not so kind Will start a fire in your mind
| I feel alive. My self-portrait suicide Tonight is no time to die Now I'm awake
| Silence fuels the deadly cries. Ripped apart by knives in lovers' lies Curse my innocence goodbye Take my hand then take my life
| I feel alive. My self-portrait suicide Tonight is no time to die Now I'm awake Oh, now I'm awake Yeah}```
I mean, just seeing that snippet, I could see how it'd come to that. Some strong words in there are probably weighted heavily
Day for stupid questions, where do I find an updated good styles.csv for the WAS node?
You ever use the twri styles?
Not that I know of, thanks will give it a go.
It integrates your prompt into the styles.
And has like 70+ or something to choose from
i finally got around to porting it: https://github.com/mcmonkeyprojects/sd-dynamic-thresholding#comfyui
https://civitai.com/models/149395?modelVersionId=166843
CircuitCraft - Infuse Electronics Into Your Visuals
Unveiling CircuitCraft, a cutting-edge LoRA (Text to Image) model that ingeniously converts everyday images into intricate printed circuit board (PCB) designs, adorned with capacitors, transistors, and electronic wonders. With CircuitCraft, you have the power to seamlessly merge the realm of electronics with your favorite visuals, creating a captivating fusion of art and technology.
Use trigger word: 3l3ctronics.
Trained on 3000 steps from a highly detailed by hand captioned large dataset.
Special thanks at:
@dapper dragon my SD buddy who helped me to make all my LoRA's and be able to output wicked stuff.
Also special thanks to my my upfront image generators and people who I care a lot about:
@masslevel
@osiworx
@mix
@abstract depot
whoa! awesome!
this is going to be interesting
question for a buddy.
He has an AMD card, 8gb. radeon 5700xt.
I've seen snippets of people talking about some issues with AMD cards using SD, but mostly ignored since I have an rtx 2060.
Is there any limitations he'll hit if he tries to jump in?
my testing model rejects liked realism too much, it seems. they make great realism tho ❤️ too bad i can't get this ported to my main model without screwing up everything else ... or can i... sigh, another dumb idea for model mixing enters my head
time to start model merging again.
AMD cards have all sorts of weird limits. Modern cards should work mostly, but yeah expect surprise snags and/or performance issues
I had 5700XT and switched to 4070 for Stablediffusion
Gotcha, I just didn't want to point him towards SDXL only for him to not be able to use it.
Mooie.
fuck. i needed this merge to fail instantly, not actually give me the most awwesome fire/ice/stone/water roses 😮
FUCK. i wanted to be done merging XD
no biggie. just gotta find a prompt i know this merge will screw up so i can discard it and go on with whatever except model merging again
@hardy cipher not almost, that's cooked 😦 i'm sorry
@fair stag ah now i get it lol
thanks. I guess I should give up
perhaps a certain weight too high?
I'm experimenting with something new
but I also think a lot of people's criticisms in this regard are sort of evolving to mimic musical theory in the sense that people begin to treat subjective opinions as objective truths
I went to look for the 4x Sharp upscale now there a whole Civitai type website for just different type of upscale
I used A1111 with SD1.5, but with SDXL I only use ComfyUI so I can kinda see whats happen behind the scenes
i love comfy's down to the gears interface. node graphs have a lot of benefit. Automatic makes it so easy to switch resolutions, styles, img2img, and just have a very dynamic workflow
they are both great
comfy isn't very conduscive to creativity. it's a back end. needs a front end like stable swarm to have that grease for the process
be sure to give my model a try if you want ❤️
😄
shameless self-promo 😄
link?
will try
you can disable refiner if you use it
yes I use
at the moment I'm exploring various painters and illustrator and there are tons in base model"
I always got lost in the A1111 interface, But with comfyui what in your flow and the path its taking is great for me to get a more desired outcome
stick to what you're comfortable with 🙂
i'm very used to the [this:to this:0.5] output which doesn't work well, or even at all in comfy given i'm very often over 150+ tokens
@Eface I hope one day you can have the Gundam deli tray in real life.
you saw my gundam deli tray? ❤️
that was actually such a hard prompt to pull off actually
@supple knot ChaiNNer is father of comfyui, some sort, only for resizing. And there are some good workflows.
any expert developer with SDXL looking to get hired for a job?
I never built a pip package. If you clone the whole exllamav2 repo into oobabooga/repositories/ it can load it directly. I was able to use the convert.py script inside the exllamav2 repo with my oobabooga virtualenv active.
I figured it out on my own. one thing, flash attention doesn't seem to be compatible with windows, so I guess it won't be any faster than EXL1 except it can do q8 now
it can only do q8 with the exl2 format which is why I'm converting everything
it can't do q8 gptq still
exl2 is supposedly better quality but for bit because it does that measurement pass first and produces a mixed bit binary
so -l 4096 always gave up on me. I had to remove it
it's been going for over 2 hours.. I think it doesn't cause issues for me?
also I think -r 400 only affects the final compile, you need to set -mr to affect the measurement rows
yea the measurement worked with -ml 4096 but the actual compile failed with -l 4096
are you still measuring?
oh, yeah, it failed haha
that's still good, then you can load it with -m and skip the 2 hour measuring phase
ok, how tf do I do that?
in output folder
can we go to DM? this channel isn't meant for this
I bet someone could calculate the total number of unique images SDXL could produce... Nope. You can always concatenate a longer prompt, and you can always start from a different img2img source. Because it accepts variable length, infinite inputs, SDXL is mathematically infinite, as far as computer programs go.
omg yeah no way to calculate it haha
Just with txt2img, img2img, ipadapter, revision, controlnet, plus any other combinations of those, then you add in the variables for seed, steps, schedulers/samplers, cfg's, and the infinite amount of prompting you could do, plus so much more I'm missing. Astronomical numbers it would come to
and that is the point.. Every possibile picture you can imagine is in it.......it's marvellous
Well, that would be impressive by itself, but consider this: If you had infinite VRAM, you could just plug in an infinite img2img source, and get out an infinite img2img. It is literally infinite, in the fancy way we define infinity in maths. 🙂
(You might get some duplication in an infinite image though, just saying.)
This system looked clearly harmful, so I used RLHF to make sure it's safe.
The crazy thing to think is how much variation you get with just a single seed number. Then just multiple that by whatever the maximum "if there even is one" number of seed numbers
Yeah, it would be interesting to know something like: What if you stuck to 75-word prompts, and just 1 sampler, and just 1024*1024 images? Use every combination of unique prompts + unique seeds. How many images do you get??? It's a mindbogglingly huge number. More images than there are particles in the observable universe.
Yeah it's like the card shuffling statement, that no deck of cards, shuffled together, has ever been the same. And that's just 52 cards
80.658.175.170.943.878.571.660.636.856.403.766.975.289.505.440.883.277.824.000.000.000.000
for 52 cards
hahaha so nuts
LOL I never thought about the deck of cards stats, but yes. Definitely true.
One caveat though: This is assuming you believe in God creating this as a unique universe. Obviously if you believe in a multiverse, there are infinite universes where you shuffled that exact same sequence.
❤️
Lads, what m I doing worng in inpaiting, I'd like to remove this furniture but cant seem to do it:
What would you suggest for cfg?
Working thank you! 🕺🏿
Final question, there seems to be an outline, is this fixable?
increase your blur a little
Doing, would you reccomend masking JUST the object?
I tried experiment more with Controlnet-Lllite training and for this kind of out/inpainting I think I need a lot more training data and training time than I care to spend on it.
It was supposed to keep more of the input image, but instead it made its own Yann Lecun as "batman" as I couldn't run the controlnet at full strength. Still, it allows for easier composition so it's not completely useless.
Up to you, but I would probably want to mask just a little bit larger of an area. Just kinda something you have to mess with. Also, sometimes just re-running it will give you a different result you might like more.
do you have a single gpu? You can load a 70b model in a single gpu with 24gb vram? 😮
thats with only 2.3 bits average quant
I could probably push it a little higher to 2.4 or something but any more it might OOM once the context fills up
How well does it work with that level of quantiziation?
hm, I never got any of these 33b models with 8k context size running, not even with 4k context size
no idea I havent extensively used it. I just wanted to see if it ran
33b is all based on llama 1 which was trained for 2k context iirc
the llama 2 13b and 70b models can do 4k apparently
yeah, there are some 8k lora finestunes
I just once tried a few of these models by TheBloke
I think they were 4bit quantisized
but as soon as I go over 2k context size I get OOM
For inpaiting is No objects a positive prompt? Or should I worry about putting objects in the negative prompt?
my fav part of Dalle-3 DALL-E 3 incorporates safety measures that restrict the generation of violent, adult, or hateful content. Moreover, it has mitigations in place to avoid generating images of public figures by name, thereby safeguarding privacy and reducing the risk of misinformation.
well they don't wanna get sued after all
demons made out of corn 🤣
who's got the best controlnet setup thus far?
not having best of luck
cannot get shields or weapons to work in controlnet to save my life, canny or depth
can definately see controlnet working, did a 100 image batch, not a single shield
shields and weapons in general can be a pain in the butt
if it was a full body shot maybe it will understand better
I don't see shield mentioned in the prompt
maybe start off with a simple prompt like viking holding a shield, then build on it
Who's got the the best hotdog dog setup thus far?
oops yeah i replaced it last minute from viking shield to space weapons, just trying anything at this point
figure this would be easy with controlnet, you can clearly see where the shield is supposed to go lol
might be easier to inpaint the shield section
Corn cat
Train your own ControlNet-lllite on shields and weapon.
yeah i really need to learn training, with the new cards no excuse anymore
yeah i was able to that, but i need to batch run without user input, so maybe i need to try a generic enough mask that would cover all weapon sizes
I've not tried with 24GB VRAM, but you might be able to squeeze in Controlnet training on a 24GB card if it's that you have. If you've something smaller then you kind of have to train in the cloud.
lol... doubt
Well...using a lora in that context is kinda like cheating. 😄 But it's all good.
TOUR: http://www.themidnightofficial.com/shows
BUY: http://smarturl.it/TheMidnightSTORE
STREAM: http://smarturl.it/TheMidnightSTREAM
MERCH: http://smarturl.it/TheMidnightMERCH
Instagram: https://www.instagram.com/themidnightofficial
Twitter: https://twitter.com/TheMidnightLA
Facebook: https://www.facebook.com/TheMidnightOfficial
WEB: http://ww...
dont worry;
owned lulz
that music is actually good up loud
Interesting, a face shot finally
wtf, at the next one:
do I need to use the sdxl_vae with kohya_ss on sdxl based models?
as in the flag --pretrained_vae_path_or_name
I love me some good Zombie pics
🙂
Made flowers for everyone
Anyone use depth controlnet?
If so, what preprocessor do you choose, and why?
I'm trying to decide on which option I prefer
I'm interested
MiDaS Depth Map
Any reasoning or just the first one you tossed in your mix?
midas is kinda The Standard Option ™️
Yeah that's what I've gathered from reading around the web, but don't really see anything set in stone on why one vs the other offers better/worse performance.
Beyond leres offering remove bg options
zoe depth is out for me. Randomly will throw errors for no apparent reason, while all the other options haven't a single time.
Just grabbed this off civit
perfect
smallest sd negative
heres my chungus negative with 27 neg embeddings (low quality, worst quality, lowres,third-party watermark:1.5),polymorphic, washed-out low-contrast (deep fried) watermark, cropped, out-of-frame, (mutated hands and fingers:1.4),blurry,Auroranegative,KFHB,bad_prompt_version2, ng_deepnegative_v1_75t,verybadimagenegative_v1.3,EasyNegative,bad-artist,bad-hands-5,bad-artist-anime,Vile_prompt3,Bad_quality,Bad-image-v2-39000, NEGS Bad Hands,negativeembed,NEGS Bad Image v4,NEGS Bad Prompt v2,badv5,deformityv6,wdbadprompt,badquality-test13-graupel-last ,badquality, colorfixerv1,re-badprompt,realisticvision-negative-embedding,an6,bhands-neg,negative_hand-neg,
I need to make some for XL as I had 2 I never did not use for 2.1
need to find their dataset if I can
yea i only see like 5 on civitai 😔
Nearly completed.
Could you guys reccomend a way to add more detail to a scene like this:
To me it just looks very blurry, not hyper detailed.
when is scribble controlNET coming to SDXL?
how you do it? blending?
it's a conglomeration of quite a few things actually
ipadapters, blip interrogators, cfg scheduling, stylers, conditioning tweaks, other things
something beyond my understanding 🙂 I tried A1111 again today
how'd that work out?
it's all I used until a couple months ago, but was glad to find something that gave me more options and control. I like to tinker with things, and comfy is ideal for that
I love A1111 now it hs sorted some of its memory problems - ADetailer for faces is a cool extension
@peak dove going to test adetailer
added your flowers
unfortunately it starts stucking...
How do you turn OFF Discord Notification Sounds? I have tried everything ... ?
mute channel?
originality 0
i am in a1111 sorry. Cant check.
@peak dove you mean afterdetailer?
yes it is it
cataracts
Is there any timeline for SDXL inpainting timeline releasing?
There is already an existing Model. https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1
wanna try to see through spiderweb something, i must try it again and again.
New update to my checkpoint model
hey has someone a video or something that explains how and why model merging works ( i dont want a guide on how to do it with a tool iwant to understand how it works) :3
Easy to do with ControlNet.
Yes, with SDXL
result is not the same
It certainly can be.
it's goodm but with QRMOnster it's great
👋 Hey there, Art and Tech Enthusiasts! Ever wondered how to create an image that hides another within it? Well, you're in for a treat! 🎨🔍
In today's video, we'll delve into the captivating world of hidden images and texts using StableDiffusion 1.5 and ControlNet. Transform an ordinary mountain landscape into a secret canvas that holds the Mona ...
what is this?
upscaler lol 
usually much bigger but i am testing smaller (the tests are not going well)

