#✨|sdxl

hasty smelt
#

I'm a new user, so someone more qualified could give a better answer, but it seems the VAE step is when a lot of computation happens. It looks like the model loads everything from the SSD into RAM at that moment, so more RAM is better; you can see the peak on the graph right where the VAE pass begins.

cyan crown
#

look for sdxl_vaefp16.safetensors

hasty smelt
viscid warren
cyan crown
#

I have 64 GB of system RAM and 12 GB of VRAM, and it speeds up the last step a lot

viscid warren
cyan crown
#

it works for every checkpoint that needs vae

viscid warren
#

Sometimes after closing many Firefox tabs my RAM is still drained, and it cleans it up

viscid warren
#

I also have 12 GB VRAM on an RTX 4070

cyan crown
#

4070 me too

#

DDR5 yes

viscid warren
#

CPU is an AMD Ryzen 9 7900X3D

#

Should I upgrade it to 64gb ram?

viscid warren
#

It can overheat more easily

cyan crown
viscid warren
cyan crown
#

22

viscid warren
#

GPU

cyan crown
#

2

viscid warren
#

Dual version too ?

cyan crown
#

yes

#

66° at 100%

#

celsius

#

75 max

viscid warren
#

Nice

#

What CPU do you have with it?

cyan crown
#

5950x

viscid warren
#

AMD Ryzen?

cyan crown
#

yes

#

Noctua NH-U12A as cooler

raw saffron
#

won't downloading models break my PC? each model is like 10 GB, agony

#

xd

cyan crown
raw saffron
#

where can i download it? i'm so confused with all of these models xd
scared to download the wrong one

viscid warren
#

Tutorial

viscid warren
#

They also have link

raw saffron
#

ty 🙂

cyan crown
#

download sdxl base, sdxl refiner, sdxl vae to start

raw saffron
#

how do these 3 differ

cyan crown
#

you use all of them together
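The three files play different roles: the base model generates the image, the refiner is a second model meant for the last, low-noise part of the denoising, and the VAE decodes the result into pixels. A common way to combine base and refiner is to switch models part-way through the steps. Below is a minimal sketch of just that step arithmetic; the 0.8 switch point is an assumed example value, not a setting taken from this thread.

```python
def split_steps(total_steps: int, switch_at: float = 0.8) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for a base -> refiner handoff.

    switch_at is the fraction of the schedule the base model runs before the
    refiner takes over; 0.8 is an assumed example value, not a required setting.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= switch_at <= 1.0:
        raise ValueError("switch_at must be between 0 and 1")
    base_steps = round(total_steps * switch_at)
    return base_steps, total_steps - base_steps


# 30 steps with the handoff at 80%: the base runs the early, noisy steps
print(split_steps(30))
```

Raising `switch_at` gives the base model more of the work; 1.0 means the refiner is skipped entirely.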

raw saffron
#

oh yeah nvm xD

#

and the ui

cyan crown
#

for UI look this

real vine
#

i want to make people sitting around a meeting table, what kind of techniques should i use?

cyan crown
#

it's the same for 1.5 models and sdxl

raw saffron
#

ty

real vine
#

i target for photo realism

#

one person is easy, but when several people are gathered together the results get weird

raw saffron
#

wait, is there a difference between sdxl base 1.0 and xl base 1.0

#

i think its the same?

cyan crown
#

yes

real vine
#

@cyan crown can we make realistic people's faces with embeddings? are they meant for this kind of thing

raw saffron
#

and does it let me use characters? dalle3 doesn't allow that

#

so you have to trick it by saying "an image of someone familiar to... "

cyan crown
#

dalle 3 is another thing......

raw saffron
#

yeah ik

#

but is it restricted like dalle or na?

crisp owl
#

Not when run locally. If you're running on a colab or something, the person hosting it may or may not restrict it

cyan crown
#

what you mean with restricted?

jade hill
#

is there a sampling method considered as the "base one" for XL ?

cyan crown
#

no

#

I use DPM++ 2M Karras
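In "DPM++ 2M Karras", the "Karras" part is the noise schedule (from Karras et al., 2022) rather than the solver: the noise levels are spaced evenly in sigma^(1/rho) with rho = 7, which packs more steps into the low-noise end. A minimal sketch of that schedule; the sigma_min/sigma_max values in the example are assumptions that roughly match SD/SDXL's default noise schedule.

```python
def karras_sigmas(n: int, sigma_min: float, sigma_max: float,
                  rho: float = 7.0) -> list[float]:
    """n noise levels from sigma_max down to sigma_min, plus a final 0.0."""
    if n < 2:
        raise ValueError("need at least 2 steps")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    sigmas = []
    for i in range(n):
        t = i / (n - 1)  # ramps linearly from 0 to 1
        sigmas.append((max_inv + t * (min_inv - max_inv)) ** rho)
    sigmas.append(0.0)  # the last step lands on a noise-free image
    return sigmas


# Assumed values, roughly those of SD/SDXL's default noise schedule
for s in karras_sigmas(10, sigma_min=0.0292, sigma_max=14.6146):
    print(f"{s:.4f}")
```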

jade hill
#

ok, thanks 🙂

raw saffron
cyan crown
#

ok?

lusty wolf
#

Peeking in...

raw saffron
#

ah ok ty xD

#

what's the difference between the base and the base vae

#

there are 2 files of the base model, the same except that one of them has vae in its name

crisp owl
#

0.9 is the good one. 1.0 is borked

cyan crown
#

look at the picture i posted before

crisp owl
#

There's a base model released now specifically with the 0.9 VAE baked into it

raw saffron
#

im confused

cyan crown
#

look at the picture!

#

you have 3 files to use

crisp owl
#

1.0 base model with 0.9 vae is good

cyan crown
raw saffron
#

oh this

raw saffron
# cyan crown

oh so u need both of them
whats the "vae" for exactly? what does it do

crisp owl
#

decodes latent space to pixel space
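The VAE works in both directions: its encoder compresses an image into a small latent, and its decoder turns the finished latent back into pixels, which is why decoding is the last step of a generation. Assuming the usual SD/SDXL VAE layout (8x smaller on each side, 4 latent channels), the shape arithmetic looks like this:

```python
def latent_shape(width: int, height: int,
                 factor: int = 8, channels: int = 4) -> tuple[int, int, int]:
    """(channels, height, width) of the latent for a given image size."""
    if width % factor or height % factor:
        raise ValueError(f"width and height must be multiples of {factor}")
    return channels, height // factor, width // factor


def compression_ratio(width: int, height: int) -> float:
    """How many RGB pixel values correspond to each latent value."""
    c, h, w = latent_shape(width, height)
    return (3 * width * height) / (c * h * w)


print(latent_shape(1024, 1024))       # SDXL's native resolution
print(compression_ratio(1024, 1024))
```

The sampler only ever works on the small latent; the full-size image exists only after the VAE decode.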

cyan crown
#

decode

raw saffron
#

ah ok ty 🙂

shy kelp
#

i recommend 1.5 before sdxl

#

at least to know how things work

nimble heart
#

starting with xl is fine, just don't use the refiner

raw saffron
#

whats the refiner?

shy kelp
#

mans is going to hit generate and wait 30 minutes for 1 image lol
start with 1.5

raw saffron
#

why though?

cyan crown
#

why 30min?

#

what is your gpu?

shy kelp
#

exaggerated

nimble heart
#

XL makes an image in like <10 seconds with the right settings

shy kelp
#

and 1.5 is <5

crisp owl
#

impatience nowadays

nimble heart
#

why would you need an image every 2 seconds

#

XL produces substantially better quality

#

I understand if you're vram constrained and XL takes like 50x as long due to offloading but otherwise idk why you'd still use 1.5 outside of those animate tools

cyan crown
#

and qr monster 😄

shy kelp
#

people still doing qr stuff? lol

nimble heart
#

it shouldn't be hard to train a qr monster for XL, i'm surprised it hasn't been done

shy kelp
#

there is one

nimble heart
#

everyone's too focused on waifus

crisp owl
#

you can use it to do other stuff than just QRs

shy kelp
#

and it works fine

nimble heart
#

oh? is it good?

crisp owl
#

And the official qr monster team is working on an sdxl model

cyan crown
rustic garnet
nimble heart
#

I feel like we're getting pretty far from QR codes lol

rustic garnet
#

yeah, but its great

#

you still see the mona lisa but only from looking at the image from far away

shy kelp
#

pretty sure this is xl, i forget

kind pendant
#

where do you guys get your models from?

shy kelp
#

custom

crisp owl
#

civitai mostly

cyan crown
#

with SDXL you can have almost any style with base model

kind pendant
#

ah.. thank you ^^

crisp owl
rustic garnet
# kind pendant löl

it's true. Base model is already very good. Custom models are not better in general, only for very limited styles

#

I would always try base model first before downloading custom models

cyan crown
#

I'm trying all the styles of base model

shy kelp
#

XL, no Loras

cyan crown
#

there are thousands

kind pendant
rustic garnet
#

that might be the case for 1.5

#

for SDXL I cannot agree on that

shy kelp
cyan crown
kind pendant
hasty smelt
#

guys, I am following the tutorial to use --medvram-sdxl, but I can't find the file in my folder that matches the one in the video (webui-user) to open in Notepad. Does anyone know what I should do? Thanks

crisp owl
#

scroll down more?

shy kelp
#

lol

cyan crown
#

😄

hasty smelt
crisp owl
#

.bat

cyan crown
#

the .bat

hasty smelt
#

thanks guys

steady grove
#

you can edit a .sh if you just drag and drop it into notepad or another editor. regardless, you'll want the .bat because .sh is a linux shell script

#

god speed pickle rick

crisp owl
cyan crown
steady grove
# crisp owl

reminds me of that weird final season of American Gods, well, i mean, they were all weird i guess

cyan crown
crisp owl
#

it keeps trying to make the branches made out of jormungandr

shy kelp
#

(jormungandr serpent:1) -> 1.4
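In AUTOMATIC1111's prompt syntax, `(text:1.4)` scales the attention given to those tokens by 1.4, so the suggestion is to raise the weight from 1 to 1.4. Below is a minimal, hypothetical parser for only the explicit `(text:weight)` form, to show how a prompt splits into weighted chunks. It ignores nesting, bare `(text)`/`[text]` emphasis and escaped brackets, which the real parser handles, and other UIs may treat weights differently.

```python
import re

# Matches an explicit "(text:weight)" group. Nesting, bare "(text)" and
# "[text]" emphasis are deliberately NOT handled in this sketch.
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\s*\)")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unweighted text gets 1.0."""
    chunks = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            chunks.append((before, 1.0))
        chunks.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    rest = prompt[pos:].strip(" ,")
    if rest:
        chunks.append((rest, 1.0))
    return chunks


print(parse_weights("yggdrasil, (jormungandr serpent:1.4), stars"))
```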

crisp owl
#

Yeah testing weights currently

I just process batches of 5 to go through a full process and do work on the side while waiting for em to finish

#

~200 seconds for a full process including upscale

raw saffron
#

is there any way to use sdxl on my phone? like maybe somehow run it on some kind of server

shy kelp
#

like your server or from a host?

#

because if it's a host, you're going to be paying a lot

south horizon
#

colab is free

shy kelp
#

for xl?

#

lol

cyan crown
#

for xl it's your own PC or pay

south horizon
#

mm I dunno, never tried it with xl

shy kelp
#

then why say it's free?

south horizon
#

things like runpod aren't that expensive are they?

shy kelp
#

for XL it will be, 1.5, whatever

cyan crown
#

since it's like a drug....yes

crisp owl
#

I've seen people using --listen 0.0.0.0 when they talk about using from a phone, but that's about all I know. No clue about the specifics.

cyan crown
#

adding --listen to webui-user

#

I tried and it works fine
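`--listen` makes the web UI accept connections on 0.0.0.0 instead of only localhost, so a phone on the same network can open it at the PC's LAN address. A small stdlib helper to print the URL to type into the phone; it assumes the default port 7860 and is only a best-effort guess at the right network interface.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess at this PC's LAN address, falling back to loopback.

    connect() on a UDP socket sends no packets; it just makes the OS pick the
    local interface it would route through.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(("192.0.2.1", 80))  # documentation-only address, never contacted
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"


def webui_url(port: int = 7860) -> str:
    """URL to open from another device (7860 assumed as the web UI port)."""
    return f"http://{lan_ip()}:{port}"


print("Open on a device on the same network:", webui_url())
```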

crisp owl
#

No Yggdrasil here, but still kinda neat

shy kelp
#

that requires your pc to be on the same network though right?

cyan crown
#

or you can do port forwarding on your router and use something like duckdns to reach outside

shy kelp
#

my port forwarding days on my own pc are long over lol

cyan crown
#

well, to reach your sd interface from outside your LAN you need to do port forwarding (PF)

#

on the router

fallow prism
#

What about share=true and using the gradio url?

shy kelp
#

people are too trusting, i just vpn or disconnect the internet altogether

cyan crown
#

yes VPN is the best solution

#

because you don't risk people using exploits of the interface

olive perch
#

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.61 GiB already allocated; 0 bytes free; 1.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

hey i keep getting this error when trying to generate using the base sdxl model or just the 1.5

#

what can i do about it?

crisp owl
#

If in A1111, make sure you have optimization flags set in your webui-user.bat file

olive perch
#

whats A1111?

#

oh

#

what optimization flags?

crisp owl
#

When I still used A1111, these were my settings.

TBH I'm not sure you still need the first two flags, but I'm guessing you don't have a super beefy GPU, so at least keep --medvram

set COMMANDLINE_ARGS= --opt-sdp-attention --no-half-vae --medvram
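The error text itself points at `PYTORCH_CUDA_ALLOC_CONF` and `max_split_size_mb`. For A1111 that would typically be one more line in webui-user.bat, `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`; from Python it has to be in the environment before PyTorch makes its first CUDA allocation. Hedged: this only helps with fragmentation, and in the error above reserved (1.65 GiB) is barely above allocated (1.61 GiB) on a 2 GiB card, so it probably won't fix this particular case. The value 128 is an arbitrary example.

```python
import os

# Set before PyTorch makes its first CUDA allocation; doing it before
# "import torch" is the safe choice. 128 MB is an arbitrary example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"


def parse_alloc_conf(value: str) -> dict[str, str]:
    """Split the 'option:value,option:value' format into a dict."""
    options = {}
    for item in filter(None, value.split(",")):
        key, _, val = item.partition(":")
        options[key.strip()] = val.strip()
    return options


print(parse_alloc_conf(os.environ["PYTORCH_CUDA_ALLOC_CONF"]))
```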

olive perch
#

ill try ty

#

ill add the --medvram only, if it wont work ill just add the others

crisp owl
#

the alternative to the first one, --opt-sdp-attention, is --xformers,
so that's another you can try.

olive perch
#

oh ty

olive perch
crisp owl
#

yah

olive perch
#

okki

#

still get the error after adding the medvram parameter oof

crisp owl
#

What size image are you trying?

hasty smelt
#

What do you think is the best to use?

olive perch
crisp owl
crisp owl
olive perch
#

how do i check that

crisp owl
olive perch
#

0.5GB/4

#

i mean im on a laptop rn though.. xd

#

didn't expect the images to be generated quickly, but didn't expect them to not generate at all either

#

is 4gb not enough?

viscid warren
glad grove
#

for 1.5 it's barely enough, for sdxl no, especially on a laptop gpu

crisp owl
#

so 4....that's probably not gonna work in A1111 at least for SDXL.
Maybe you can squeeze em for 1.5 models in A1111.

Comfyui is more memory friendly, so perhaps you'd have better luck with it, but, it's more complex also

olive perch
#

i see

#

ty

glad grove
#

post the name of the gpu here

crisp owl
#

same screen I posted, top right name

olive perch
#

idk the full name, its from intel though

glad grove
#

if its intel iris its not good enough for ai

olive perch
#

i see

#

ill try on my pc when i get back home

#

was curious how it would be on my laptop

#

ty

crisp owl
#

super slow if you set everything up perfectly lol

#

tiled processing, low vram flag, a bunch of dumped stuff to CPU processing...it would be slow.

#

probably possible, but not worth it likely

hasty smelt
half cedar
#

You cant handle that sampler

spring fulcrum
#

Has anyone found a way to use the GPT 4 idea2img in comfyui?

maiden gale
#

I'm trying to batch process images in Comfy. I have a directory with images in name order; does anyone know of a custom node that has this feature?

I tried WAS suite, but it doesn't pull the images in name order.

crisp owl
maiden gale
#

It has some random order, couldn't figure out what it is

#

I'm trying to batch prompt them at the same time so the order will matter

maiden gale
#

I've found a solution by changing the script with the help of chatgpt, what a time we live in lol

#

thanks @crisp owl , I changed the load order used in the WAS suite loader
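Directory listings come back in filesystem order, which can look random, and a plain alphabetical sort puts `img10.png` before `img2.png`. A natural sort compares runs of digits as numbers. This is a standalone sketch, not the actual change made to the WAS suite loader; the set of image extensions is an assumption.

```python
import re
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed extensions


def natural_key(name: str) -> list:
    """'img10.png' -> ['img', 10, '.png'] so digit runs compare as numbers."""
    # re.split with a capture group alternates text and digits, so the
    # types line up position by position and the lists stay comparable.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def images_in_name_order(folder: str) -> list[Path]:
    """Image files in `folder`, sorted the way a human reads the names."""
    files = [p for p in Path(folder).iterdir()
             if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    return sorted(files, key=lambda p: natural_key(p.name))


print(sorted(["img10.png", "img2.png", "img1.png"], key=natural_key))
```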

crisp owl
static berry
ivory blaze
#

ChatGPT is catching up to SD!

#

like for real LOL

#

I doubt I can decode that, no extended characters... but lol. I guess it thinks since I can supply images now it can make them some how

nimble heart
#

copy paste the data into an html file so your browser decodes it for you

#

looks like a valid png header so it might actually work
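Assuming what ChatGPT printed is base64 text (the thread doesn't show it), a real PNG starts with the 8-byte signature `\x89PNG\r\n\x1a\n`, which in base64 begins with `iVBORw0KGgo`. Wrapping the text in a `data:` URI inside an `<img>` tag lets the browser try to decode it. A stdlib sketch; a valid header alone doesn't mean the rest of the file is valid.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def looks_like_png(b64_text: str) -> bool:
    """Decode base64 text and check for the PNG file signature."""
    try:
        raw = base64.b64decode(b64_text.strip())
    except ValueError:  # binascii.Error (bad padding etc.) is a ValueError
        return False
    return raw.startswith(PNG_SIGNATURE)


def to_html(b64_text: str) -> str:
    """Wrap base64 PNG data in an <img> tag for a browser to render."""
    return f'<img src="data:image/png;base64,{b64_text.strip()}">'


sample = base64.b64encode(PNG_SIGNATURE).decode()
print(sample, looks_like_png(sample))
```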

static berry
half cedar
#

That would be my luck. Escaping an inferno in a Subaru wagon

indigo carbon
#

do you see it?

crisp owl
#

Frey's scalable ship

indigo carbon
#

I've hid something here

indigo carbon
# ivory blaze like for real LOL

that's more similar to how pixel diffusion works actually, it diffuses each pixel individually instead of diffusing the latents

#

I doubt it will make anything by writing PNG data, but it might be trying to color each pixel individually

#

very dumb approach to generating images though, but it'd be interesting to see an LLM being able to actually create a PNG

half cedar
indigo carbon
crisp owl
#

ah

#

now I see

indigo carbon
#

I can see it, but that's probably because I see the origin

#

I've hidden a word this time

#

it's really COOL that it's possible with SDXL now, but it's not quite as good as QR monster imo

crisp owl
#

Theirs is quite good, wonder how long until they release their sdxl version

indigo carbon
#

the one I'm using is also somewhat decent, but it really struggles when the inputs have more complex shapes

#

it also deteriorates quality way more than QR monster used to

crisp owl
indigo carbon
# crisp owl

I'm assuming he's talking about quality deterioration? because that's the only flaw I see with the one I'm using (which isn't QRmonster)

crisp owl
#

No clue, I can't make assumptions on that

indigo carbon
#

link to that post? I'd keep track of that

hoary saddle
indigo carbon
# hoary saddle

oh, that's actually a model I already released a week or so ago. I just use it as a base model in my workflows

hoary saddle
#

i gotta figure out how to train i guess, 2 4090 24G cards and i'm rendering 1024x1024 jpgs....boring

#

too lazy to watch a 2 hour tutorial on youtube

crisp owl
#

I'll borrow one

hoary saddle
#

will email it over to ya 😉

crisp owl
#

Sweet, looking forward to downloading it!

indigo carbon
hoary saddle
indigo carbon
hoary saddle
#

that works great tho if you play around with the cnet settings

indigo carbon
# hoary saddle

huh, yeah, that's better than what I usually get with the available QR controlnet model.. what'd you do here?

#

this is what I'm currently working with. I was thinking the only thing keeping this from getting better is the controlnet model?

uncut fiber
#

how about white subject and black background, inverted image?
in controlnet it is definitely better.

indigo carbon
#

I tried the other way, it wasn't as good

uncut fiber
#

o.k.

indigo carbon
#

but idk, I feel like when you try to control SDXL too much it loses the ability to make stuff like this

#

again, a flaw caused by CLIP

#

CLIP never likes being controlled too much, it's just really good at the things it's intended to do. CLIP is SD's bottleneck imo; it's the only thing limiting complex multimodal inferences that won't cause quality degradation

analog roost
#

Hi guys, has anyone noticed weird color issues with dynavisionXL after installing and enabling the TensorRT extension and switching to the dev branch of Automatic1111?

#

More than half of the images generated now have the issue in my case

#

In some cases to kind of extremes like this

#

Even more ridiculous 😅

hoary saddle
analog roost
#

Though admittedly some images generated like this are cool like this skirt

finite thunder
#

Hi all, is there a way to check the list of artists' images that are included in the dataset and haven't opted out?

crisp owl
finite thunder
#

interesting! thanks for sharing.

uncut fiber
indigo carbon
#

interestingly, it will always perform the same styles if you make up artist names

#

I used a made up name once, then did it again with an entirely different prompt that has the made up name; both images had a similar style

#

not sure if this is a good example:

#

"by jeff fuctional"

#

"bucket of water, by jeff fuctional"

#

similar color accents, but not as persistent with the made-up style as I remember it being

icy brook
solar merlin
#

I keep getting red spotting visuals on my texture generations using SDXL. I have tried using the 0.9 VAE and different schedulers but they all still seem to produce this type of distortion, any ideas?

uncut fiber
#

There used to be a tiling option, which I believe can be found in quicksettings; are you using it?

solar merlin
#

This is using diffusers, so I am manually applying the circular convolution for the Conv2D function. But the tiling is fine. It is the red artifacts that appear with SDXL in both tiled and untiled images
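For seamless textures, a commonly shared diffusers trick is to set `padding_mode = "circular"` (one of `torch.nn.Conv2d`'s built-in padding modes) on every conv layer in the UNet and VAE, so each edge is padded with pixels from the opposite edge. solar merlin's exact code isn't shown, so this is only an illustration, in plain Python on a tiny grid, of what circular padding computes:

```python
def circular_pad(grid: list[list[float]], pad: int) -> list[list[float]]:
    """Pad a 2-D grid by wrapping around its edges, like 'circular' padding."""
    if pad < 0:
        raise ValueError("pad must be non-negative")
    h, w = len(grid), len(grid[0])
    # Output cell (r, c) copies the input cell shifted back by `pad`,
    # wrapped with modulo so the left edge borrows from the right, etc.
    return [[grid[(r - pad) % h][(c - pad) % w] for c in range(w + 2 * pad)]
            for r in range(h + 2 * pad)]


for row in circular_pad([[1, 2], [3, 4]], 1):
    print(row)
```

Because every convolution then "sees" the opposite edge, the output tiles seamlessly; this padding change is independent of the red-artifact problem being discussed.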

#

a better image

#

(watermark is also off)

uncut fiber
#

i don't know. Maybe try a different SDXL model with a baked-in VAE

solar merlin
#

I get it using the default baked in VAE too. But I guess maybe trying some other models is a good idea

minor glacier
#

Have you got invisible-watermark in your environment? That looks a bit like the watermark.

solar merlin
strong copper
mellow tendon
#

I'm loving the 2x faster generation speed with the TensorRT SD UNets.
But now I just find myself running the Heun sampler at the same speed as I used to run DPM++ 2M Karras, just because I slightly prefer the output of Heun (but it is 2x slower than the others).
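Heun is a second-order sampler: each step evaluates the model once for a prediction and again for a correction, while DPM++ 2M reuses the previous step's output and needs one evaluation per step. Assuming k-diffusion-style implementations, where Heun's final step down to sigma = 0 is a plain Euler step, counting model calls shows where the roughly 2x comes from. The sampler names are hypothetical labels for this sketch.

```python
def unet_calls(steps: int, sampler: str) -> int:
    """Approximate model evaluations per image for a few samplers."""
    if steps < 1:
        raise ValueError("steps must be positive")
    if sampler == "heun":
        # prediction + correction on every step except the last,
        # which goes to sigma = 0 with a single Euler evaluation
        return 2 * steps - 1
    if sampler in ("euler", "dpmpp_2m"):
        # one evaluation per step; DPM++ 2M reuses the previous output
        return steps
    raise ValueError(f"unknown sampler: {sampler}")


for name in ("dpmpp_2m", "heun"):
    print(name, unet_calls(25, name))
```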

fierce hollow
#

is that with the new nvidia extension?

#

guess that doesn't work with comfy for now

mellow tendon
uncut fiber
#

is it working for SDXL for you as well?

mellow tendon
#

Yes it works in SDXL.

uncut fiber
mellow tendon
#

if you want to upscale to 2k you need to generate an extra UNet (well, for just about everything you need an extra UNet)

uncut fiber
#

yes, but it doesn't work for me with SDXL

mellow tendon
#

Did you generate the Unet with the SDXL model you want to use selected?

#

you need to do it for every model and lora you want to use.

#

I tried optimising a Lora and just got a load of errors, I think that is in Beta.

frozen terrace
#

Does SDXL + Controlnets (IP-Adapter, depth etc) work when using TensorRT in A1111?

uncut fiber
#

yes and it doesnt work

#

@mellow tendon yes

frozen terrace
#

Thanks, don't see a limitation there regarding my needs.

mellow tendon
mellow tendon
fierce hollow
#

does every resolution require compiling a new module? or is it similar to AITemplate, where, say, a 2048 module can be used with 1536 resolution

#

or wait, more like: can I do 2048x1536 and 2048x2048 with the same module, or would that require 2 different ones

#

exporter seems kind of confusing

vale eagle
mellow tendon
mellow tendon
fierce hollow
#

ah sounds good, thanks

noble shoal
mellow tendon
viscid basin
#

Is it okay to generate images of my favourite kpop idol for personal use?

mellow tendon
#

Oh but I suppose that is generating a load of image but that is happening in the cloud.

noble shoal
noble shoal
#

In my opinion, and I can only speak for myself, I think it's a bad idea 💡.

mellow tendon
#

I still think if it can generate the image you want in one go (with a small batch of drafts) it could be efficient; I just ran a batch of 40 images and still didn't get quite what I was looking for.

#

I got some "happy accidents" along the way, with these learned dragons.

viscid basin
#

Those are cool

#

Can you do cute dogs doing human things?

uncut fiber
#

i got speedup in exported models.

#

i got error when tried to export SDXL

fierce hollow
#

repo says you need to be on dev branch for sdxl

uncut fiber
#

yes, and one was there and it took him 3 mins to generate an image, so it's 9 times slower 🙂

#

i think i need to be logged in to nvidia and download that zip file, probably. But SD is working o.k.

mellow tendon
mellow tendon
uncut fiber
#

it says it should be installed from URL?
I only have the TensorRT tab, but haven't tried the advanced tab; I suppose I have to choose some converted model for it to appear? For SD models it is working, but I'm still getting some errors about entry points when starting A1111. Probably I need to download the zip from nvidia, which means signing up and logging in?

#

It wasnt Qwerty_Qwer, it was @oblique swan i think

indigo carbon
#

this would make sense if TensorRT is faster than AIT, but that's not really the case if both are done correctly

uncut fiber
#

AIT is already working?

indigo carbon
#

the next target for optimizing diffusion is pulling off something like exLLaMa for diffusion

indigo carbon
uncut fiber
#

o.k.

indigo carbon
#

the speed difference between pure PyTorch and AIT or TRT on a modern GPU is about 2-3 times the normal speed, with no degradation when done correctly. HOWEVER, with language models there's something called exLLaMa that gives a WHOPPING 8x boost by having actually optimized kernels. The day this happens to diffusion is the same day diffusion becomes an instant AI

whole kettle
indigo carbon
whole kettle
#

yeah so it just tells it what nodes to go to without any understanding of what a direct object is right? It just sees "Dog is node 52521 in model"

#

or even their relation to each other

indigo carbon
#

the only reason SDXL can have text in the images is because the UNET is a masterpiece, that's it. CLIP is a bottleneck

indigo carbon
mellow tendon
# whole kettle Was kinda thinking about this before I joined. It doesn't really understand sent...

This video shows it well: "a plate without a banana on it" cannot be done with just CLIP right now https://youtu.be/TL2A8MYXsCE?si=MgROx4PW4kfHzocl&t=383

rustic garnet
#

the problem in my opinion is the limited training data for captions

#

initially, people used the ALT attribute in images to caption them

#

which means most image captions are rather uninformative

indigo carbon
#

BLIP, on the other hand, can encode both text and images just as easily; because it has an additional LLM component, it can have an excellent understanding of language. SDXL completely masters txt2img, but that's it. I can almost guarantee that if SDXL had a more modern encoder, it would DESTROY other stuff completely

rustic garnet
#

just think about an image of a battlefield with many soldiers and corpses, smoke and everything. The caption of such an image is not "battlefield, dead soldiers, corpses and blood on the ground, Napoleonic era", it's rather something like "Waterloo"

#

thus, CLIP was the best tool for this kind of data

#

it is trained to assign captions to images and embed them in the same space. This makes it extremely robust to bad captioned data

indigo carbon
rustic garnet
#

I'm not sure if we are much better in this regards nowadays. My experiences with BLIP are rather... bad. Sometimes it's okay, but most time BLIP gives me totally wrong captions

indigo carbon
#

when it comes to the QUALITY of the images, SDXL pretty much destroys everything else, but it won't be as creative as Dall-e 3 because CLIP is a bottleneck.

rustic garnet
#

DALL-E, Imagen and DeepFloyd are models that use LLMs trained on text data only. So these models don't have the disadvantage of being trained only on bad captions; they are trained on the whole internet text corpus. HOWEVER, they are NOT trained on images and, thus, have no idea about visual components

#

they have a better text understanding, but probably a worse "style" understanding, as they don't have any knowledge about visuals.

#

I'm pretty sure the reason SDXL is using CLIP is because it turned out to be better than the alternatives

indigo carbon
#

BLIP2 has knowledge about visuals

rustic garnet
#

if you look into the SDXL source code you will see that they tried different text encoders, too, such as Flan T5

rustic garnet
indigo carbon
rustic garnet
#

in theory BLIP should have a better text understanding as it is instruction trained. But I wouldn't say this is guaranteed. As said, it always depends on the quality of your training data

rustic garnet
#

also, BLIP2 is a really large model. You cannot have both, a model that fits into consumer hardware and a model that is state of the art

indigo carbon
#

for me it even distinguished the styles and expressions. you must've not inferred it correctly then

rustic garnet
indigo carbon
#

maybe if SAI came up with a new text/image encoder..

vale eagle
#

SDXL base image

indigo carbon
# vale eagle

again, I wasn't complaining about the quality, it's just that CLIP is definitely a bottleneck. Look at IPAdapter, for instance: that's the only thing that ACTUALLY enables it to take image input

#

if the text encoder itself had image input capabilities, IPAdapter wouldn't be necessary and there wouldn't be any degradation when doing image input

vale eagle
#

with current text encoder

indigo carbon
#

the text encoder itself is fine, it just won't have a good understanding feeding off of small prompts and it won't get image input

whole kettle
#

Yeah if you roll the right seed and hit the right nodes in just the right way it does a good job.

vale eagle
#

Using new techniques that came out within the last few months could be a huge improvement.

indigo carbon
#

the solution to this seems to be SAI making a new text encoder that is best at all worlds

rustic garnet
#

SDXL is trained on CLIP text tokens. In principle you can include images, as CLIP embeds images and text into the same space, but then your image would be only a single token, which does not make much sense

indigo carbon
rustic garnet
#

you can condition SDXL on images, too, like ControlNets are doing.

rustic garnet
#

that has nothing to do with the text encoder

#

you can train SDXL with image input if you want. It's just a decision

mellow tendon
#

Dall.e 3's prompt following/understanding is simply amazing when compared to SDXL, when your output isn't being block by the filters...

rustic garnet
#

Controlnets are doing that. IPAdapter is doing something similar. They encode an image into "text-like tokens" like CLIP does and train a conditioning on that

indigo carbon
uncut fiber
#

@mellow tendon and do you know the minimum requirements for it? I'm happy SAI keeps making it work on, say, 4GB GPU cards or even lower.
And day by day there are more tags on the blacklist.

rustic garnet
#

I'm pretty sure you could even train a controlnet to blend multiple images

#

the most natural way of blending images, though, is just using their CLIP embedding
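One way to blend two images' CLIP embeddings is to interpolate between them. Since CLIP embeddings are usually compared by cosine similarity, spherical interpolation (slerp) keeps the blend at unit length, while a plain average shrinks it. A sketch with plain Python lists standing in for real embeddings; which interpolation a particular UI or model actually uses isn't stated in this thread.

```python
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("zero vector")
    return [x / n for x in v]


def slerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Spherical interpolation between two embeddings; t=0 gives a, t=1 gives b."""
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)  # angle between the two embeddings
    if theta < 1e-6:  # nearly identical: linear blend is fine
        return normalize([(1 - t) * x + t * y for x, y in zip(a, b)])
    if math.pi - theta < 1e-6:
        raise ValueError("opposite vectors: the interpolation path is undefined")
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```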

indigo carbon
#

one flaw with controlnet is it almost always causes the model's quality to degrade the more you try to force it to do something

rustic garnet
#

the more your conditioning moves away from the training data, the more difficult it is to get good results

indigo carbon
rustic garnet
#

it has nothing to do with the number of text tokens

indigo carbon
#

probably token normalization, isn't it?

rustic garnet
#

SDXL is always using at least 75 tokens

#

if you give a short caption it just fills it with blank tokens
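CLIP-style text encoders take a fixed window of 77 token ids: a start token, up to 75 prompt tokens and an end token, with shorter prompts padded out. A1111 handles longer prompts by splitting them into 75-token chunks and encoding each chunk separately. A rough sketch; the ids 49406/49407 are the start/end ids of OpenAI's CLIP tokenizer, and padding with the end token is an assumption, since the pad id differs between encoders.

```python
from typing import Optional


def chunk_and_pad(token_ids: list[int], bos: int = 49406, eos: int = 49407,
                  pad: Optional[int] = None, chunk: int = 75) -> list[list[int]]:
    """Split prompt tokens into `chunk`-sized pieces, each wrapped as
    start + tokens + end and padded to chunk + 2 (= 77) ids.
    An empty prompt still produces one all-padding window."""
    pad = eos if pad is None else pad  # assumption: pad with the end token
    pieces = [token_ids[i:i + chunk]
              for i in range(0, len(token_ids), chunk)] or [[]]
    windows = []
    for piece in pieces:
        ids = [bos] + piece + [eos]
        ids += [pad] * (chunk + 2 - len(ids))
        windows.append(ids)
    return windows


short = chunk_and_pad([320, 1929, 539])  # hypothetical ids for a 3-token prompt
print(len(short), len(short[0]))
```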

#

what I mean is if you force SDXL to follow a strange prompt then it will degrade image quality. Long texts give good results because they often give SDXL a lot of freedom

#

if you make an image from a song lyrics there is no "correct outcome" you enforce. You are happy with any nice looking result

indigo carbon
#

anyways, whenever SD3.0 comes out, I'm assuming it'll have image conditioning like @rustic garnet mentioned is possible, that'll probably be a huge step

#

or even possibly the next focus is mastering other components of the model? it seems like SDXL mastered the UNET, but idk about all the other stuff

mellow tendon
rustic garnet
#

SDXL IS the unet 😅
all other components (vae, text encoder) are independent of sdxl

indigo carbon
rustic garnet
#

yes, but it's trained independently from sdxl and can be used independently

indigo carbon
#

true, VAE is just pixel to latent and the other way around

#

but I think conditioning is dependent..? if you use another model's conditioning on a UNET with a different architecture, the KSampler will fail

rustic garnet
#

yes, the unet depends on the vae and the text encoder input. Use a different conditioning and you have to retrain the unet

#

I just mean the unet is the main component that is trained - all other components were trained too, but independently from it (and sometimes by different labs and on different data). If you look into the source code of SDXL you will see that many different conditionings were implemented. They didn't choose CLIP for no reason. I guess it was the best trade-off between hardware requirements and visual appeal

#

if you compare SDXL with DeepFloyd IF, which uses an LLM trained on pure text, you will see that DeepFloyd IF has a MUCH better text understanding than SDXL. However, I don't find the images from DeepFloyd IF visually appealing... maybe it's because they still haven't published the highres model. But I think it might also have something to do with the text encoder not being good with styles and aesthetics

vale eagle
#

Needing the model to run on consumer-level hardware is a limitation. They could do better without it.

tribal lantern
#

Well, Dalle-3 has shown you can have both aesthetics (NOT really detailed styles) and careful prompt following, sometimes. But overall, styles/aesthetics are definitely SDXL's strength; organic and fine details especially (a jungle), SDXL manages to create much, much better. At first I was impressed by Dalle-3's prompt following, and I still am, but I've started to notice it also tends to fail once scenes get really out there. At the same time, it's awesome that it works for simple things where SDXL tends to flat out ignore aspects of a prompt: at worst it won't do them at all, at best it needs carefully formulated prompts. Numbers 1, 2 and 3 on my wishlist for SD-next are better prompt following, especially for coarse "details" (fine details like eye color are solvable, e.g. https://rich-text-to-image.github.io/), but I've yet to find a good solution for coarser ones that affect the composition of the whole scene.

#

Maybe one model won't need to do it all: just create the basic scene in a model that follows the prompt, then enhance it in SDXL. Kinda like sketch to image, but the sketch comes from a different model

cyan crown
#

Dalle3 is better with writings and understanding prompt. SDXL is better with quality

#

So one good idea could be creating base image with DE3 and then use it with Controlnet in SDXL for example

hoary saddle
finite thunder
hoary saddle
cyan crown
urban fjord
#

The problem with using controlnet to blend images is that you still need training images for that, but if you have it then it is quite doable.

uncut fiber
#

@finite thunder it's not my github 🙂 but the owner did present it there. One moment,
it is @cursive warren i think

wet nacelle
static berry
cyan crown
#

SDXL Version

native knot
cyan crown
#

😂

noble shoal
#

I once again screwed SDXL by training a Lora on mostly 80x112px images of Faces. Getting some glorious output here at the extreme resolution of 88x120px 😅

#

At least i get 11.51it/s 🤷‍♂️

steady grove
#

cool i guess

noble shoal
steady grove
#

i bet he really nose how to party

noble shoal
hoary saddle
#

dont be so picky

hoary saddle
crisp owl
#

something smells fishy

upbeat summit
solar merlin
#

I keep getting this dot mess on the white images I create. Is it something to do with trying to create white images?

#

I am using sdxl + refiner

native knot
solar merlin
#

@native knot I am using the base 1.0 model but then swapping the VAE with this code

#

im kinda stumped since I "think" I am applying all the various fixes

crisp owl
solar merlin
hasty smelt
#

Hi everyone, I'm wondering if there is a command to update the "run_nvidia_gpu.bat" file. I've never updated it, so I don't know if it's necessary.

crisp owl
#

Are you generating outside of standard size ratios? Adding any other models/ControlNets/LoRAs/etc.?

solar merlin
crisp owl
crisp owl
#

don't run the python_dependencies file

#

you don't need to unless you know what you're doing.

crisp owl
hasty smelt
crisp owl
#

Yup, you can run that file whenever; the update only applies once you restart your entire instance.

hasty smelt
#

thanks buddy

rustic shadow
icy brook
lusty wolf
#

Just waiting for things to get better in South Africa...

crisp owl
#

You from there? Where at?

wet nacelle
crisp owl
#

looks like the passageway between Riverside County and Orange County in SoCal lol

wet nacelle
crisp owl
#

There we go, had to download the app to replicate the sun. Even sets in the same direction lol

soft bone
crisp owl
#

hmmmm... well, I'd guess somewhere close to 1024

#

but I haven't used ultimate upscale for SDXL

#

can always test and see if the outcome is wonky though

static prawn
static prawn
crisp owl
#

Nice, probably worth it if your PC can handle it. Any speed increase with 1024 tiles vs 512 tiles?

static prawn
#

i felt like it doesnt make a big difference

#

bec i have doubled tiles with 512x512

#

with 1024x1024 i have bigger tiles

#

having fewer tiles is definitely an advantage anyway

#

im running a gtx 1070, still happy with it 🙂

#

best gpu purchase of my life so far

crisp owl
#

I notice with my vae tiled nodes in ComfyUI, if I keep the tiles at 512 vs changing to 1024, I lose about 30 seconds, and I haven't noticed a difference in quality.
I'm running a 2060S, this thing has been a trooper lol
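For a rough sense of the tile counts being compared here, the sketch below counts the square tiles needed to cover an image. It assumes no tile overlap or padding (tiled upscalers and tiled VAE nodes usually add some overlap, so real counts are a bit higher), and the 2048x2048 target is just an illustrative size. Halving the tile side doubles the tiles along each axis, so the total goes up roughly fourfold:

```python
import math


def tile_count(width: int, height: int, tile: int) -> int:
    """Square tiles needed to cover a width x height image (no overlap/padding assumed)."""
    return math.ceil(width / tile) * math.ceil(height / tile)


# Illustrative target size: an image upscaled to 2048x2048
for tile in (512, 1024):
    print(f"{tile}px tiles: {tile_count(2048, 2048, tile)}")
```

Fewer, larger tiles mean fewer passes (and fewer seams), but each pass needs more VRAM, which is why the larger tile size only helps if the card can handle it.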

steep wave
static prawn
#

I just don't get why SDXL is so sensitive to prompts; I often get completely oversaturated, overexposed results

crisp owl
#

I've mostly seen that with specific checkpoints.
I had ProtoVision really disliking some prompts, but if I changed to a different checkpoint, it'd be completely perfect

lilac wren
#

I finally managed to train a LoRA with my 8 GB of VRAM, at 512x512 since I was going for speed (2h40). The results are very impressive, despite the small number of images and steps.
(model : Leah Dizon)

icy brook
#

Aether Bubbles & Foam, coming tomorrow on Civitai.

wet nacelle
icy brook
mellow tendon
nimble heart
#

Colored latents are fun

wet nacelle
native knot
#

And don't ever talk to my son again!

wet nacelle
indigo carbon
#

I feel like that would make it even harder to blend images; it means both model A and model B would need image conditioning to go beyond what current SD can do

#

though I know Kandinsky 2.2 can blend images without needing something like IP-Adapter; I'm unsure if that's because of the different encoder or because it's pixel diffusion

lilac wren
indigo carbon
#

also idk about SDXL having NO understanding of language; I just wrote: "a polite and friendly octopus drinking tea"

#

it figured out the top hat on its own, so unless polite means wearing a top hat, it did get creative here

lilac wren
weary yacht
#

nice.. got the Intel A770 doing 1920x1080 SDXL

weary yacht
#

dude.. this might be my new wallpaper

nimble heart
#

what's the speed you get on that?

weary yacht
#

doing 1920x1080 is like 5-6 seconds per iteration... smaller SD1.5 stuff was under 1.3 s/it

#

so an image like the one above, a straight, non-upscaled 1920x1080, takes like 9-10 minutes

#

probably 4-5 if I wasn't using an insanely high number of steps

#

10 minutes for this.. I'll use the same prompt and do it at 50 steps

crisp owl
#

Fenrir walking away from Thor who is busy making the second attempted chain

weary yacht
#

3 minutes, 54 seconds, 1920x1080, 50 steps
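The timing numbers above are easy to sanity-check. A minimal sketch, treating total time as steps times seconds per iteration and ignoring model loading and the final VAE decode (both of which add some time in practice):

```python
def seconds_per_iteration(total_seconds: float, steps: int) -> float:
    """Average sampling speed implied by a finished run."""
    return total_seconds / steps


def steps_for(total_seconds: float, sec_per_it: float) -> float:
    """How many steps fit in a given time at a given speed."""
    return total_seconds / sec_per_it


# 3 minutes 54 seconds for 50 steps at 1920x1080
spi = seconds_per_iteration(3 * 60 + 54, 50)
print(f"{spi:.2f} s/it")

# The earlier 9-10 minute runs at the quoted 5-6 s/it
print(steps_for(9 * 60, 6), steps_for(10 * 60, 5))
```

The 50-step run works out to about 4.7 s/it, a little under the 5-6 s/it quoted earlier; at 5-6 s/it, 9-10 minutes corresponds to roughly 90-120 steps.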

zinc cargo
# icy brook

foam is going to get so nsfw suggestive very fast 😛

crisp owl
#

That was my first thought also 😆

#

But the bubbles are neat for sure

zinc cargo
#

foam also gonna be cool, but you know 🙂

crisp owl
#

people gonna people thomas

prime juniper
#

Looking for an SDXL specialist to create a photorealistic DreamBooth model for me (no more cameras 🙂 or photography studios). Can anyone help me create this?

pure crystal
analog fern
#

Curious to ask, can AI generate characters or animations like this?

nimble heart
#

there's some 1.5 tunes/loras meant for making spritesheets

#

you'll have to either inpaint or doctor them to make them consistent though

nimble heart
#

mermaid skeleton found by researchers in the abyssal zone

rigid lagoon
#

mermaid skeleton found by researchers in the abyssal zone

nimble heart
#

that's not the prompt lol

vale eagle
#

One image explains Dalle3

nimble heart
vital ermine
nimble heart
#

oh, in general news, the AMD Nod.AI acquisition completed

vital ermine
#

I wish Nod was for training

#

Maybe now it will be

nimble heart
#

sounds like they hired them for their general torch knowledge

#

not just SHARK

#

some of this black latent shit's kinda nightmare fuel lmao

#

she has the omae wa mou eyes

vital ermine
#

black latent?

nimble heart
#

yea

vital ermine
#

never heard of it

nimble heart
#

instead of feeding the ksampler torch.zeros() I feed it a latent with the approximate VAE values of black

#

its a custom node I wrote

vital ermine
#

Oh, nice. Yeah, I barely touched latent

nimble heart
#

it lets you get near-black images

#

I'm 99% sure its what Midjourney does

vital ermine
#

Has to be

nimble heart
#

it picks up on keywords like "dark, black, bright, white" etc and changes the input latent color

#

instead of just neutral grey like SD does by default
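The keyword idea can be sketched as a lookup from prompt words to a starting color and strength. This is purely hypothetical: nobody in the chat knows what Midjourney actually does, and the keyword table and strengths below are made-up placeholders. The returned color names match keys of the `XL_CONSTS` table in the node code, so the pair could feed a colored-latent node; `None` means keep the default empty (grey) latent:

```python
import re

# Hypothetical keyword -> (latent color, strength) table; values are placeholders
KEYWORD_COLORS = {
    "dark": ("black", 0.5),
    "night": ("black", 0.3),
    "black": ("black", 0.7),
    "bright": ("white", 0.3),
    "white": ("white", 0.5),
    "snow": ("white", 0.4),
}


def pick_latent_color(prompt: str, default=(None, 0.0)):
    """Return the (color, strength) for the first keyword found in the prompt."""
    for word in re.findall(r"[a-z]+", prompt.lower()):
        if word in KEYWORD_COLORS:
            return KEYWORD_COLORS[word]
    return default


print(pick_latent_color("a dark alley at night"))
print(pick_latent_color("a sunny meadow"))
```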

vital ermine
#

dynamic latent iow?

nimble heart
#

iow?

vital ermine
#

in other words

nimble heart
#

ah

#

yea

#

so I made a little node that creates colored latents in the 6 main colors + black/white

#

at any strength

#

it works super well

vital ermine
#

yeah, MJ has always been about the tricks behind the curtains

nimble heart
#

if you want just a night time scene you can use not-quite-black

vital ermine
#

I have zero knowledge of node creating

nimble heart
#

its like 50 lines

#

if that

vital ermine
#

this one needs that

nimble heart
#
import torch
import nodes  # ComfyUI's built-in nodes module, used here for nodes.MAX_RESOLUTION

# Approximate SDXL latent values (one per latent channel) for solid colors
XL_CONSTS = {
    "black" : [-21.675981521606445, 3.864609956741333, 2.4103028774261475, 2.579195261001587],
    "white" : [18.043685913085938, 1.7262177467346191, 9.310612678527832, -8.135881423950195],
    "red" : [-19.665550231933594, -19.79644012451172, 10.68371868133545, -12.427474021911621],
    "green" : [-3.530947685241699, 14.075841903686523, 26.489261627197266, 8.67661190032959],
    "blue" : [0.45569008588790894, 16.3455867767334, -17.67197036743164, 4.145791053771973],
    "cyan" : [12.434264183044434, 26.013031005859375, 4.298962593078613, 7.954266548156738],
    "magenta" : [-0.9616246223449707, -5.109368801116943, -12.062283515930176, -9.02152156829834],
    "yellow" : [-6.609264373779297, -10.563915252685547, 32.47910690307617, -8.209832191467285],
}
class BSZColoredLatentImageXL:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "color": (list(XL_CONSTS.keys()),),
            "strength": ("FLOAT", {"default": 0.5, "min": 0.0, "max": 1.0, "step": 0.1}),
            "width": ("INT", {"default": 1024, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 8}),
            "height": ("INT", {"default": 1024, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 8}),
            "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
        }}
    RETURN_TYPES = ("LATENT",)
    FUNCTION = "generate"
    CATEGORY = "latent"

    def generate(self, color: str, strength: float, width: int, height: int, batch_size: int):
        # SDXL latents have 4 channels at 1/8 of the pixel resolution
        samples = torch.empty([batch_size, 4, height // 8, width // 8])
        cols = XL_CONSTS[color]
        for batch in samples:
            # fill each channel with the color's value scaled by strength;
            # strength 0.0 gives all zeros, i.e. a normal empty latent
            for ch in range(4):
                batch[ch].fill_(cols[ch] * strength)
        return ({"samples": samples},)

entire code for it.

nimble heart
vital ermine
nimble heart
#

yea that's the node. 0 strength is an empty latent like you're used to. 1.0 strength is pure black

#

or pure white/blue/red/etc

#

only works on XL

vital ermine
#

0.5 is the mid grey?

nimble heart
#

No, 0.5 would be approx. 25% lightness

#

SD is mid grey by default.

#

so 50% black is 25% lightness if that makes sense

#

cause 0% black is gray

#

it works like that because I cant blend latent colors, only multiply
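The 25% figure follows from treating strength as a linear mix between the neutral grey that an empty latent decodes to and the pure target color. That linearity is a simplifying assumption (decoded brightness is not exactly linear in latent values), so treat this as back-of-the-envelope arithmetic:

```python
def approx_lightness(strength: float, target: float, neutral: float = 0.5) -> float:
    """Linear estimate of decoded lightness.

    neutral: lightness of the empty (all-zeros) latent, roughly mid grey.
    target:  lightness of the pure color, e.g. 0.0 for black, 1.0 for white.
    """
    return neutral + (target - neutral) * strength


print(approx_lightness(0.0, target=0.0))  # empty latent: mid grey
print(approx_lightness(0.5, target=0.0))  # 50% black: halfway from grey to black
print(approx_lightness(1.0, target=0.0))  # 100% black
```

Because 0% strength already sits at grey, half strength only gets halfway from grey to black, i.e. about 25% lightness rather than 50%.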

vital ermine
#

I already feed a latent in, and this node has no pass-through input 😦

nimble heart
#

if you have my whole pack, the Offset node takes a -1.0 to 1.0 value and just adjusts your existing latent

#

so 0.0 is gray, -1.0 is black, 1.0 is white

vital ermine
#

I have your pack

#

oh, sweet

#

let me try that now

nimble heart
#

might be more your style

#

use it before the noise is added

#

cause it's multiplicative

vital ermine
#

what is it doing?

#

wow

#

just tried 0.5

nimble heart
#
    def offset(self, latent, offset: float):
        # work on a copy so the upstream latent is not modified in place
        samples = latent['samples'].clone()
        if offset == 0:
            # nothing to do; return the (copied) latent unchanged
            return (latent | {'samples': samples},)
        # positive offsets blend toward white, negative toward black
        cols = XL_CONSTS['white'] if offset > 0 else XL_CONSTS['black']
        offset = abs(offset)
        for batch in samples:
            # shrink the existing latent, then add offset * target color per channel
            batch.mul_(1 - offset)
            for ch in range(4):
                batch[ch].add_(cols[ch] * offset)
        return (latent | {'samples': samples},)

basically it adds offset% white or black and multiplies the existing latent by (1 - offset) to compensate
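Per latent channel, the offset node computes `latent * (1 - |offset|) + color * |offset|`, i.e. a linear interpolation toward the white or black latent value. A minimal pure-Python sketch on a single 4-channel latent vector; the `offset_pixel` name is made up for illustration, and the color values are copied from `XL_CONSTS`:

```python
BLACK = [-21.675981521606445, 3.864609956741333, 2.4103028774261475, 2.579195261001587]
WHITE = [18.043685913085938, 1.7262177467346191, 9.310612678527832, -8.135881423950195]


def offset_pixel(pixel, offset):
    """Interpolate one 4-channel latent vector toward white (offset > 0) or black (offset < 0)."""
    if offset == 0:
        return list(pixel)
    target = WHITE if offset > 0 else BLACK
    t = abs(offset)
    return [p * (1 - t) + c * t for p, c in zip(pixel, target)]


empty = [0.0, 0.0, 0.0, 0.0]  # an empty latent is all zeros
print(offset_pixel(empty, -1.0))  # pure black latent value
print(offset_pixel(empty, 0.5))   # half of the white latent value
```

At 0 the input passes through unchanged, and at plus or minus 1.0 the original latent is scaled to zero, leaving only the pure color.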

vital ermine
#

0.5 and 0. At 0 it's like it was without

nimble heart
#

yea the node does nothing at 0

vital ermine
nimble heart
#

except consume a few MB of memory by caching the latent I guess

#

yea 1.0 is usually a bit too strong unless you're literally making a black background with a single thing on it

vital ermine
#

well, I like it

#

thank you for this

nimble heart
#

sometimes -1.0 looks good though

vital ermine
#

the -1.0 is perfect

nimble heart
#

for positive values +1.0 will basically turn it into a digital doodle on a pure white photoshop canvas lol

#

so +0.5 is usually the limit unless it's like an angel in white robes in a blizzard

#

if you ever use 2.1 or 1.5 the RGBA node can do something similar since it has arbitrary colors.

#

for img2img or other non-empty latent scenarios you'd have to merge it though

#

comfy has nodes for latent blending/merging too btw

#

built in

#

they might be in _for_testing still

vital ermine
#

yes, I think that is where I saw them

#

I needed this last night as I was fighting with way too dark

nimble heart
#

yea hypothetically you could use that to mix my colored_latent_image node with an existing one instead of just using the offset

#

but idk that's a lot of effort hence why I just made the offset

vital ermine
#

I just did

nimble heart
#

but yea I'm playing with -1.0 black to make some sketchy research footage and its great

vital ermine
#

I need to remake this now, since I'll use it to color the latent instead of encoding the image back into a latent

nimble heart
#

if you're blending then leave the colored latent at 1.0 strength and just change the blend

#

should work similarly maybe
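Why leaving the colored latent at 1.0 and changing only the blend behaves like the offset node can be checked with a small sketch. It assumes the blend is a plain linear `(1 - w) * a + w * b`; check which input your blend node's factor actually weights before relying on this. At strength 1.0 the two give the same result, while at a lower strength the existing latent is still shrunk by `(1 - w)` but the color only contributes `w * strength`, so the result also drifts toward the neutral zero latent:

```python
import math

BLACK = [-21.675981521606445, 3.864609956741333, 2.4103028774261475, 2.579195261001587]


def colored_latent(color, strength):
    # colored-latent node: every channel is color value * strength
    return [c * strength for c in color]


def blend(a, b, w):
    # assumed plain linear blend: (1 - w) * a + w * b
    return [x * (1 - w) + y * w for x, y in zip(a, b)]


def offset(latent, color, amount):
    # offset node: scale by (1 - amount), then add amount * color
    return [x * (1 - amount) + c * amount for x, c in zip(latent, color)]


existing = [1.0, -2.0, 0.5, 3.0]  # an arbitrary example latent vector
full = blend(existing, colored_latent(BLACK, 1.0), 0.3)
half = blend(existing, colored_latent(BLACK, 0.5), 0.3)
reference = offset(existing, BLACK, 0.3)
print(all(math.isclose(a, b) for a, b in zip(full, reference)))  # same as offset
print(all(math.isclose(a, b) for a, b in zip(half, reference)))  # not the same
```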

vital ermine
#

yes

#

I would think

#

unless it overpowers it

nimble heart
#

life pro tip if you clamp the values in addition to the black latent you can straight up make pure black values

#

also the refiner does indeed still work with colored latents without totally breaking down but I'd say its even less useful tbh

vital ermine
#

I wonder if I can do the post-processing I removed for my look in latent space instead? gonna try

nimble heart
#

i still post process

#

that's what that screenshot is

vital ermine
#

see, I prefer to get rid of post and work entirely in latent if possible

#

yeah, I can.

nimble heart
#

with pure black its not really necessary but with other values it's still allergic to black/white pixels

#

it'll limit itself to like 10% 90% lightness

vital ermine
#

lol

nimble heart
#

also some other values are fun for specific looks.
Yellow does good for golden hour sunlight
cyan for underwater coral reef stuff
etc

#

cyan can also be used for images that have a lot of sky

#

then white if it's a bright image and SDXL starts darkening things to compensate

vital ermine
#

it sure did mix them as red + white is pink

nimble heart
#

color mixing doesnt work though

#

if you mix red + blue it doesnt make magenta
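The reason latent colors don't mix like paint: the four SDXL latent channels are not RGB, so averaging the red and blue latent values gives some other latent, not the magenta one. A quick check using the values from `XL_CONSTS` (Euclidean distance in latent space is only a rough proxy for how different two colors look once decoded):

```python
import math

RED = [-19.665550231933594, -19.79644012451172, 10.68371868133545, -12.427474021911621]
BLUE = [0.45569008588790894, 16.3455867767334, -17.67197036743164, 4.145791053771973]
MAGENTA = [-0.9616246223449707, -5.109368801116943, -12.062283515930176, -9.02152156829834]

# Channel-wise average of the red and blue latent values
mix = [(r + b) / 2 for r, b in zip(RED, BLUE)]

print("mix:", [round(v, 2) for v in mix])
print("distance to magenta:", round(math.dist(mix, MAGENTA), 2))
print("magenta distance from empty latent:", round(math.dist(MAGENTA, [0.0] * 4), 2))
```

The averaged vector lands more than 10 units from the magenta entry, in the same ballpark as magenta's own distance from the all-zeros empty latent.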

vital ermine
#

no, it was a nice accident

nimble heart
#

sometimes it makes the ocean stuff have black bars. must be sourced from movie stills?

vital ermine
#

I had that and it is weird

#

something from source, it has to be

nimble heart
#

This one's fun. Has that effect I was going for of a research submarine finding some eldritch shit

vital ermine
nimble heart
#

also seems like the black significantly reduces the overtuning effect of "mermaid" to just make hot women with nylon fish tails. must force it to approach the image totally differently

nimble heart
#

also one thing I do if I'm really in the colored weeds is to connect the latent to an ImagePreview node before it samples so you get a little preview of the color you're feeding the sampler each time you run it

vital ermine
#

Yep, that is what I do

#

WHOA

nimble heart
#

white?

vital ermine
#

I accidentally went from 0.3 white to 1.0 on the above image

nimble heart
#

lol

#

bones r white

vital ermine
#

I meant to go to 0.4

#

wish I could do 0.35

supple knot
#

can I move this folder anywhere and it'll still work the same? ComfyUI_windows_portable

nimble heart
#

ah I could probably change the steps to be 0.05 instead of 0.1

#

I didnt think it'd make a big difference tbh

supple knot
#

or does it have to be at the top of a drive?

nimble heart
#

unless you're talking about the comfyui node which I cant change

supple knot
#

I'm getting an error with the qrcode system that I can't fix

#

Error occurred when executing ControlNetLoader:

module 'comfy.sd' has no attribute 'ModelPatcher'

vital ermine
nimble heart
#

yea slight color changes affect the composition a lot

#

I pushed to git it should step at 0.05 now

#

I'm not gonna do 0.01 lol

#

should be able to just update it with the manager

vital ermine
#

well, for testing I would like to see if .01 matters. I think it might

nimble heart
#

you can change it yourself in the file it's just the "step": 0.1 value on the node input thingy

slow yoke
#

Hi guys, I hate to interrupt and I'm not sure if this is the correct channel, but I used the Realistic Vision and ChilloutMix models and everything works fine; as soon as I switch to SDXL, everything comes out like this. Any help is appreciated!!!! :)))))

nimble heart
#

just restart the comfy server and it'll take

rustic garnet
supple knot
#

in general, SD 1.5 is 512 x 512 and SDXL is 1024 x 1024 @slow yoke

nimble heart
# nimble heart

hypothetically you could also allow the min/max to be greater than they are too but that's actually insane

#

like how do you have 150% black?

slow yoke
vital ermine
#

class BSZLatentOffsetXL:
    # {{{
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "latent": ("LATENT",),
                "offset": ("FLOAT", {
                    "default": 0.0,
                    "min": -1.0,
                    "max": 1.0,
                    "step": 0.01,
                }),
            }
        }

nimble heart
#

idk what UI you're using but make sure you're using automatic or the XL vae

nimble heart
rustic garnet
#

hm, weird, it looks like it is not fully denoised. But yeah, you definitely should use a resolution of at least 1024x1024, otherwise the image will look very ugly

nimble heart
#

I'm 99% sure it's the 1.5 vae on an XL latent

#

which is why it works fine with ChilloutMix

#

somewhere in the UI the VAE is manually set to sd-ft-mse or something

rustic garnet
#

for sampler I wouldn't use Euler A. If you want a non-deterministic sampler, use one of the Karras DPM SDE samplers. Or simply use DDIM or UniPC for deterministic.

rustic garnet
#

I mean, technically, yes, but I would have expected you get pure noise back then

nimble heart
#

they're both 8x latent space so they technically work

#

just XL was trained from the ground up so it's totally different
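Both VAEs work on 4-channel latents at 1/8 of the pixel resolution, so an SDXL latent has exactly the shape the SD 1.5 VAE expects; decoding runs without errors but interprets the values in its own, differently trained latent space. A minimal sketch of the shape arithmetic; the scaling factors are the `scaling_factor` values from the diffusers VAE configs, included only to show the two latent spaces are not interchangeable:

```python
# scaling_factor from the diffusers VAE configs of each model family
SCALING_FACTOR = {"sd15": 0.18215, "sdxl": 0.13025}


def latent_shape(width: int, height: int, channels: int = 4, factor: int = 8):
    """(channels, latent_height, latent_width) for a given pixel size; same for SD 1.5 and SDXL."""
    return (channels, height // factor, width // factor)


print(latent_shape(512, 512))    # typical SD 1.5 size
print(latent_shape(1024, 1024))  # typical SDXL size
print(SCALING_FACTOR["sd15"] != SCALING_FACTOR["sdxl"])
```

Matching shapes are why the wrong VAE "technically works" instead of crashing; the mismatch only shows up in the decoded image.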

vital ermine
#

I guess I changed the wrong thing

#

it only does 0.05 now

nimble heart
#

LMAO when you see your neighborhood abyssal demon on your way to work 👋

nimble heart
vital ermine
nimble heart
#

i think the comfyui web app caches what the nodes' value sliders are

vital ermine
#

yeah

nimble heart
#

so if you already restarted the server gotta F5 as well

slow yoke
#

Thank you guys for all the help, let me give it a try

vital ermine
#

I noticed this flash by

#

Failed to download lbpcascade_animeface.xml

slow yoke
vital ermine
#

OUCH, 2011

nimble heart
#

despite the stuff looking complicated my nodes are probably the most mundane of all the node packs. just slightly altered existing comfyui nodes for the most part

vital ermine
#

yes, 0.33

#

slight change

#

I prefer that one

nimble heart
#

that's mostly gonna be seed variance at that point

vital ermine
#

0.01 is good stuff

#

my seed is locked, as is the no-memory stuff (no-mem SDP)

nimble heart
#

actual colors are the same

vital ermine
#

don't care it made a good change because the latent color changes content as we know

nimble heart
#

its like a variance seed

vital ermine
#

yeah. I will stick with .01 changes as I like this the best.

nimble heart
#

so the tiniest of changes will affect an image even if the actual color is the same

vital ermine
#

yep

#

latent space is a weird, wondrous, and freaky place

nimble heart
#

One thing i give XL credit for is most of my "mermaid" things dont have legs. in 1.5 you had to blacklist like feet legs knees etc and it'd still most likely fuck up. XL i dont have to blacklist anything

vital ermine
#

2.0/2.1 had legs too

nimble heart
#

never used it runs like shit

#

2 is slower than XL

#

if you autocast 2 it just NaNs instantly.

#

and fp32 is like 1/5th the speed of fp16

#

sometimes it makes them red 🤔

supple knot
#

I was trying sand castles you all got any good ones

nimble heart
#

i saw sytan with some earlier

#

when he was showing off his photography lora

supple knot
#

on this channel?

nimble heart
#

dont remember

#

think so

#

4k waifu

vital ermine
nimble heart
#

on XL?

vital ermine
#

yes

nimble heart
#

bf16 probably works

#

since its a scale issue on fp16
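The "scale issue" is about range: fp16 tops out at 65504, while bf16 keeps float32's 8-bit exponent (range up to about 3.4e38) at the cost of precision, with only 8 significant bits. Both are 2 bytes per value, so bf16 should use the same memory as fp16 and half that of fp32. A stdlib-only sketch; `to_bf16` emulates bf16 by truncating a float32's low 16 bits, whereas real conversions normally round to nearest:

```python
import struct


def fits_fp16(x: float) -> bool:
    """True if x can be packed as IEEE half precision without overflowing (max 65504)."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


print(fits_fp16(65504.0), fits_fp16(70000.0))  # fp16 overflows just past 65504
print(to_bf16(70000.0))                        # bf16 keeps the magnitude, loses precision
```

An activation that grows past 65504 becomes inf in fp16 and then NaN downstream, while bf16 just stores a slightly less precise value.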

vital ermine
#

yeah, I use that on comfy now but still the same amount of vram being sucked up

nimble heart
#

damn really

vital ermine
#

on automatic1111 we have to use the fp32 for vae

nimble heart
#

i thought auto had bf16 support

#

sd.next does

vital ermine
#

yes, so I wonder why use it? I think speed as bf16 is faster than fp32? I dunno

nimble heart
#

yea bf16 should be speed comparable to fp16 I think

#

thought it was supposed to use less mem though

vital ermine
#

I despise vlad with a passion. I mean pure hatred the kind you probably do not know. in other words, no thanks.

nimble heart
#

interesting.

vital ermine
#

trust me, I have a legitimate reason for it so I don't go anywhere near him, or his work.

nimble heart
#

I've only spoken to Auto in pull requests and never to vlad so im not sure what the whole deal is

vital ermine
#

I have no idea. Auto isn't really in control; his ragtag band of devs are all over the place.

nimble heart
#

sd.next has a HF diffusers backend so its pretty nice. UI is a little jank sometimes though

#

idk anything else that supports arbitrary Diffusers models

#

I guess i could write my own CLI script it wouldnt be too hard

vital ermine
#

I can't even find any trainers that use diffusers BUT one from Hugging Face. Really janky, but damn, I like the quality of using diffusers directly

nimble heart
#

but making a full UI with live previews and everything is pain

nimble heart
#

unless you mean loras

vital ermine
#

been dead for a long time now

#

I mean all of them, yes and for xl

nimble heart
vital ermine
#

OneTrainer really took its place

#

I used to be on the ST discord then they said it was dead and I just left it

nimble heart
#

gets updated all the time lol. I might try to get it working with Lora later once ROCm isnt having a crisis

vital ermine
#

Don't know now as it was a while back. They were asking for someone to take it over, but honestly it just never was my thing. For DB I used Shiv's. FT? No way in hell am I going to hand-curate 3k+ images and captions.

#

This was before XL was even being talked about

nimble heart
#

ST can use DeepSpeed now for 24gig cards so if you're doing full checkpoints it might be spicy.

indigo carbon
vital ermine
#

DeepSpeed spanked me on Windows. It being Microsoft's, I was shocked, but I didn't like being spanked by it. I decided it won, I lost, and moved on.

indigo carbon
nimble heart
#

i got rocm's deepspeed fork to actually work but it miscompiles if you use stage 2 cpu offloading

soft bone
#

this accident has lotr level bloom

nimble heart
#

accidents are always fun

indigo carbon
#

it's just a shitty version of AITemplate except it's only for LLMs; the best optimization for LLMs is exLLaMa, which gives about 8x speed

nimble heart
#

one of my favorite 1.5 images was with the completely wrong settings

nimble heart
vital ermine
#

I haven't heard a word, for the last month, about the new xformers being released. Supposedly it is done but was waiting on torch.

#

more mem efficient and faster

nimble heart
#

yea it's based on Flash Attention 2 now

vital ermine
#

problem is it needs the tensor cores of Ampere and Ada cards only
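Whether a card qualifies can be read off its CUDA compute capability: Ampere (8.0/8.6), Ada (8.9) and Hopper (9.0) all have a major version of 8 or higher, while Turing cards such as the 2060S are 7.5. The helper below only encodes that NVIDIA requirement as stated in the FlashAttention 2 README; AMD/ROCm builds have their own constraints, and the function name is made up:

```python
def supports_flash_attention_2(major: int, minor: int) -> bool:
    """NVIDIA-only check: Ampere, Ada and Hopper report compute capability 8.x or 9.x."""
    return major >= 8


# With PyTorch installed, the capability of the current GPU comes from:
#   major, minor = torch.cuda.get_device_capability()
for name, cap in [("RTX 2060 Super", (7, 5)), ("RTX 3090", (8, 6)), ("RTX 4070", (8, 9))]:
    print(name, supports_flash_attention_2(*cap))
```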

indigo carbon
vital ermine
#

well, I want it for training

nimble heart
#

exllamav2 uses Flash Attention 2 natively but I haven't been able to successfully compile it on rocm yet

#

there's a PR to merge the Flash Attention 2 changes into pytorch 2.2 as well

#

so scaled dot product will get the same speedup eventually too
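For reference, this is the math that PyTorch's `scaled_dot_product_attention`, xformers and FlashAttention all compute: `softmax(Q K^T / sqrt(d)) V`. The fast versions differ in how they tile and fuse the work on the GPU (FlashAttention avoids materializing the full score matrix), not in what they compute, up to floating-point differences. A deliberately slow, single-head, unmasked pure-Python sketch:

```python
import math


def scaled_dot_product_attention(q, k, v):
    """softmax(q @ k^T / sqrt(d)) @ v on nested lists: q is n x d, k is m x d, v is m x dv."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) * scale for kj in k]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # weighted sum of value rows, one output column at a time
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Two identical keys get equal weight, so the output is the mean of the values
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]]))
```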

vital ermine
#

sdp is worse than xformers for Nvidia cards

#

especially training

nimble heart
#

hypothetically you could directly use the flash attention lib on stable diffusion instead of through xformers/sdp

indigo carbon
#

maybe they could make something like exLLaMa for training?

vital ermine
#

that would rock

nimble heart
#

exllama is hyperoptimized for inference specifically isnt it

vital ermine
#

I feel like it is the Commodore64 days again and every single byte counts.

nimble heart
#

it compiles a microkernel for the model/context shape/gpu

#

so similar to AIT i guess

indigo carbon
#

but exLLaMa for diffusion seems far for now.

nimble heart
#

i mean you already have something like that with AIT. it just doesnt seamlessly compile the kernels for you and just work™️

indigo carbon
nimble heart
#

exllama works always. not depending on your system

indigo carbon
#

if you have a 3000 series card and above it will work right away

indigo carbon
nimble heart
#

exllama hot-compiles a kernel if your gpu isnt included in the pre-shipped ones

#

so it works on AMD and everything too

#

the first gen takes +30 seconds while it compiles then it's gucci

indigo carbon
#

AITemplate compiles engines, exLLaMa compiles kernels, right?

nimble heart
#

idfk

#

I have no idea what the difference is

#

they call it a kernel in the readme so

indigo carbon
#

they are different, optimized kernels are more flexible than optimized engines

#

and faster in this case

vital ermine
#

I do not get it. I trained YET again and it's still blurry, but none of my data was blurry

nimble heart
#

SHARK does the compiled kernel thing for diffusion models too

#

but it's pretty fiddly I've found.

vital ermine
#

if I switch from base the blurry goes away but base is what I trained on

nimble heart
#

blacklist "blurry"

indigo carbon
nimble heart
#

comparing exllama to transformers is apples oranges

#

exllama runs on quantized models

vital ermine
#

I can't release this like this having people type blurry in the neg

nimble heart
#

you cant quantize SD afaik so it'll not be 8x

vital ermine
#

let's see if it works

#

I typed blurry in the neg

nimble heart
#

and if you're talking about exllama 1, that's a 4bit quant which is substantially smaller than a fully fp16 PyTorch model

#

so it's going to be a lot faster

#

exllama2 can use mixed precision quants, and 8bit cuts the speed in like half compared to the default 4bit

indigo carbon
#

idk, I feel like if we'll have something like exLLaMa for diffusion we could get close to instant image generation

vital ermine
nimble heart
#

i mean it might be possible to do an exllama2 approach mixed-precision quant by brute-forcing all the layers to find which can be tuned down without spitting NaNs

#

so you could have like 9.82bit SDXL

#

or whatever
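The "9.82bit SDXL" idea is just a parameter-weighted average of per-layer bit widths. A sketch of that arithmetic plus the weights-only size it implies; any layer split you feed it is illustrative, and real quantized files also store scales and other metadata, so they come out somewhat larger:

```python
def average_bits(layers):
    """layers: list of (num_params, bits). Returns parameter-weighted average bits per weight."""
    total = sum(n for n, _ in layers)
    return sum(n * b for n, b in layers) / total


def size_gb(num_params: float, bits: float) -> float:
    """Weights-only size in GB (1e9 bytes), ignoring activations, cache and quant metadata."""
    return num_params * bits / 8 / 1e9


# Illustrative split: a quarter of the weights kept at 16-bit, the rest at 8-bit
print(average_bits([(100, 16), (300, 8)]))
print(size_gb(7e9, 4), size_gb(7e9, 16))  # a 7B model at 4-bit vs fp16
```

By this arithmetic, a 7B model at 4 bits is about 3.5 GB of weights versus 14 GB at fp16, before counting activations or context cache.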

uncut fiber
#

13B Q5 is very slow compared with 7B.
Only optional, otherwise 50% of users can't use it.

nimble heart
#

yea 7b models are lightning fast even at 8bit

#

7b 4bit is like 100T/s

uncut fiber
#

yes should i install xformers?

soft bone
nimble heart
uncut fiber
#

i got it as option in gradio for oob*ga

indigo carbon
#

I get 8T/s with GPTQ and 62T/s with exLLaMa... I would love to see the day this will happen to diffusion models

nimble heart
#

yea that's a pretty spicy gain.

vital ermine
nimble heart
#

try exllama 2 with flash attention if you can

#

should be even faster

#

if it's a 4bit gptq model you dont need to convert it to exl2

indigo carbon
soft bone
nimble heart
#

like exllama 2 doesnt use xformers

vital ermine
soft bone
#

oh idk i only used adam before prodigy

uncut fiber
#

o.k. i am now using gguf models. llama.cpp

nimble heart
#

not sure about llama.cpp

#

since it offloads xformers might not make a difference?

vital ermine
#

ada short for adam

nimble heart
#

lol the adam mixed-cpu kernel is exactly what miscompiled in DeepSpeed for me

soft bone
#

could be rank or lr doing that to you as well. or an optimization like mem effn attn

indigo carbon
#

so ideally, since exLLaMa is 8x as fast as GPTQ, if you had something identical to exLLaMa for diffusion, your it/s should also be 8x as fast

vital ermine
#

prodigy is so mem hungry I can barely get BS2-4 (forgot now)

nimble heart
vital ermine
#

well, that is straight up Dreambooth

nimble heart
#

else no way in hell

#

probably more like 2-2.5x

indigo carbon
vital ermine
#

I finally got it to train and it blurs like that :/

soft bone
nimble heart
#

exllama's black magic is in it's handling of the quantization

indigo carbon
#

exLLaMa also has its own attention? I tried it with and without Xformers and it was a little faster without Xformers

nimble heart
#

no idea

#

exllama 2 uses flash attention 2

#

I've never used the original exllama

#

it uses flash attention or something else as a fallback. not sure what the fallback is

vital ermine
indigo carbon
#

I think exLLaMa 1 just has a built in version of Xformers

soft bone
indigo carbon
nimble heart
#

i dont think so, which is why GPTQ and exllama became a thing

#

so we'll need something like an SDQ first then after an ExSD can be made

indigo carbon
#

I guess it's kinda bound to happen eventually, but the question is when

vital ermine
#

wow, 0.35 for the decay?

#

constant for the scheduler or cosine? annealing is too much mem

#

I am stumped as to which one?

vital ermine
#

Those are 500 step difference checkpoints

nimble heart
#

sometimes I wonder if I should've gotten the 16 core...

vital ermine
#

Next year I'm getting Zen 5, and 16c/32t is what I'm after. For Python stuff it won't help, but it does if I do anything else alongside it

nimble heart
#

i have zen 4 12c and it compiles pretty fast

#

tbh not often i can peg all 24 threads like that

vital ermine
#

I went from 1600 to 5600 a few months ago and glad I did but I already have issues with just 6

nimble heart
#

most of the time it only uses 24 for a few seconds then decreases as jobs complete

vital ermine
#

6 cores 12t not enough

nimble heart
#

so unless you're compiling the linux kernel or pytorch it'll fall off after 12c

soft bone
#

interesting

nimble heart
#

hey flash attention actually compiled?

#

now lets see if it dies

#

:O

vital ermine
#

kill it, kill it real good

nimble heart
#

okay it's not faster but exllama didn't bitch about Flash Attention not being installed

uncut fiber
#

anything to enable using only gguf models @nimble heart ?

nimble heart
#

no idea I don't use gguf

uncut fiber
#

o.k. what model can you suggest that I can run with 16GB RAM and 8GB VRAM?

vital ermine
#

double both of those

#

min

fierce hollow
nimble heart
#

i mean with ROCm on Linux

fierce hollow
#

oh that's a whole another can of worms I guess

nimble heart
#

should work just fine on nvidia

vital ermine
#

I know my next pc will have 64, or 128gb of rams right off, all slots filled.

nimble heart
#

anyways the exllama "you dont have flash attention" warning went away but it's functionally identical. Same vram usage, perf, etc. So I'm guessing it fails and falls back later or the rocm flash attention is just all stubbed functions to pass tests currently.

uncut fiber
#

will try sdp_attention and see if any difference

fierce hollow
#

you can check by importing attn from exllamav2

#

like uhh

#

python -c "from exllamav2 import attn; print(attn.has_flash_attn)"

nimble heart
#

False

vital ermine
#

@soft bone your "T_max=25" is out of whack.

nimble heart
#

shit

vital ermine
fierce hollow
#

yeah then it's just failing silently

nimble heart
#

ah it gates it to Flash 2.2.1 and my fork is 2.0.4

#

let me bypass it real quick

#

what's the worst that could happen
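A version gate like that is usually a plain comparison of version numbers. A hypothetical sketch (not exllamav2's actual code): it compares numeric tuples rather than strings, since "2.10.0" < "2.2.1" is true as a string comparison, and it strips a local "+rocm"-style suffix. Versions with pre- or post-release tags would need `packaging.version.Version` instead:

```python
def parse_version(v: str) -> tuple:
    """'2.0.4+rocm' -> (2, 0, 4); only handles plain dotted numeric versions."""
    return tuple(int(part) for part in v.split("+")[0].split("."))


def flash_attn_usable(installed: str, minimum: str = "2.2.1") -> bool:
    """Hypothetical gate: accept the installed flash-attn only if it meets the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


print(flash_attn_usable("2.0.4"))  # an older fork gets rejected
print(flash_attn_usable("2.2.1"))
```

Bypassing such a gate just skips the check; it does not make the older build support whatever the newer version added.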

soft bone
vital ermine
#

Well, I'm still a bit unsure about T_max, since the net says epochs and steps but never says EXACTLY what it is.
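For what it's worth, `T_max` in PyTorch's `CosineAnnealingLR` is counted in calls to `scheduler.step()`, so it is in steps if the trainer steps the scheduler every optimizer step and in epochs if it steps once per epoch. The closed form from the PyTorch docs, as a stdlib sketch (`base_lr` and `eta_min` stand for the starting and minimum learning rates):

```python
import math


def cosine_annealing_lr(step: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """LR after `step` scheduler.step() calls, without restarts.

    Reaches eta_min at step == t_max; past that, the cosine climbs back up.
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


for step in (0, 12, 25, 50):
    print(step, cosine_annealing_lr(step, base_lr=1e-4, t_max=25))
```

With per-step scheduling, `T_max=25` means the learning rate hits its minimum after just 25 steps and then rises again, which would look "out of whack" on a run of thousands of steps.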

nimble heart
#

amd fucking card gated it

nimble heart
#

I'm hoping that changes once the 7900 cards are "officially" supported because they should have the architecture to do that

#

they have the WMMAs that their flash attention fns use

peak dove
#

SDXL from Distillery@Discord (free in Alpha Test)