#✨|sdxl


azure oxide
#

i feel like comfy was saying the same thing as i just did about a1 with his comment above 😂

rustic garnet
#

you have 77 tokens, each token consists of the encoding from CLIP-L and CLIP-G. So if your prompts are "a dog" and "national geographics" then you get two tokens, one "a"+"national" and one "dog"+"geographics". These tokens are then used in cross attention

elfin cobalt
#

@visual glade I've been sketching out an easier-to-use API for comfyui. Something that's like the prompt parameter, but where nodes can be named instead of numbered, and where node inputs are named instead of indexed. I was going to just put that in my own code and use it myself (all Rust), but is this something you'd be interested in getting a PR for?
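
A minimal sketch of that idea, not ComfyUI's actual API. It assumes the numbered prompt format is a dict of node id to `{"class_type", "inputs"}`, where a link to another node is a two-element list of `[source node id, output index]`. The function name `to_numbered_prompt`, the node names, and the checkpoint file name are all hypothetical.

```python
def to_numbered_prompt(named_graph):
    """Convert a graph keyed by node name into one keyed by numeric string ids.

    Assumption: a link is a two-element list [source_node_name, output_index];
    any other input value is a plain literal and is copied unchanged.
    """
    # Assign ids "1", "2", ... in insertion order.
    ids = {name: str(i) for i, name in enumerate(named_graph, start=1)}

    def resolve(value):
        is_link = (
            isinstance(value, list)
            and len(value) == 2
            and isinstance(value[0], str)
            and value[0] in ids
            and isinstance(value[1], int)
        )
        return [ids[value[0]], value[1]] if is_link else value

    return {
        ids[name]: {
            "class_type": node["class_type"],
            "inputs": {key: resolve(val) for key, val in node["inputs"].items()},
        }
        for name, node in named_graph.items()
    }


# Hypothetical two-node graph: a checkpoint loader feeding a text encoder.
named = {
    "loader": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base.safetensors"},
    },
    "positive": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a dog", "clip": ["loader", 1]},
    },
}
numbered = to_numbered_prompt(named)
```

Output names are left as indices here, because resolving them by name would need each node's definition.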

uneven dove
#

i doubt national is one token 😛

visual glade
rustic garnet
#

just to give an example: if you make one prompt as "a dog" and another is "a cat" then you have one token that is "cat+dog". So each pixel in the image has to be assigned to "cat+dog", it cannot be assigned ONLY to dog or ONLY to cat

elfin cobalt
#

Hm? Hang on.

rustic garnet
uneven dove
#

chibiYell is overwhelmed by all the conversations at once

visual glade
#

I added an option to export in api format in the latest version, to see it enable the dev options in the settings

elfin cobalt
#
"22": {
    "inputs": {
      "add_noise": "enable",
      "noise_seed": __SEED__,
      "steps": __STEPS_TOTAL__,
      "cfg": __BASE_CFG__,
      "sampler_name": "dpmpp_2m_sde_gpu",
      "scheduler": "karras",
      "start_at_step": 0,
      "end_at_step": __FIRST_PASS_END_AT_STEP__,
      "return_with_leftover_noise": "enable",
      "model": [
        "10",
        0
      ],
      "positive": [
        "75",
        0
      ],
      "negative": [
        "82",
        0
      ],
#

What is the "0" in positive?

sudden cliff
rustic garnet
civic sigil
#

Does anyone have an idea why some ints would be incompatible with each other in comfyui?

steady chasm
#

There's so many... Terms being stated.
Anyone happen to know a guide on the technology of diffusion, specifically written for dunces such as i

sudden cliff
#

or vectors

grizzled warren
#

There's no such thing as "pushing people" here. There are just different tools, some fit certain jobs better than others. Sometimes fundamentally, sometimes because it's more developed or mature. People are free to choose though.

The only kind of pressure you can possibly apply involves A1111 and Vlad. The better your software gets, the more important it is for them to stay on your level. But that's a very good kind of pressure, if you ask me! The community surely benefits from that.

rustic garnet
visual glade
uneven dove
rustic garnet
#

so each token in your input prompt becomes a vector. The k-th token in the CLIP-L prompt is concatenated with the k-th token in the CLIP-G prompt

civic sigil
sudden cliff
rustic garnet
#

(and with the pooled prompt from CLIP-G)

sudden cliff
#

or these AIs wouldn't work at all

rustic garnet
#

it does

#

its even worse 😉

sudden cliff
#

no

rustic garnet
#

because it always uses 77 tokens, so they are filled with blanks
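
A rough sketch of that fixed-length padding, under stated assumptions: the start and end ids default to the values used by the CLIP tokenizer, while the padding id differs between the two SDXL encoders, so `pad_id` is left as a parameter. The function name `pad_to_length` and the two prompt token ids are placeholders, not real tokenizer output.

```python
def pad_to_length(token_ids, length=77, bos_id=49406, eos_id=49407, pad_id=49407):
    """Wrap token ids in start/end markers and fill up to a fixed length."""
    # Leave room for the start and end markers; longer prompts are truncated.
    body = list(token_ids)[: length - 2]
    padded = [bos_id] + body + [eos_id]
    # Everything after the end marker is "blank" padding.
    padded += [pad_id] * (length - len(padded))
    return padded


ids = pad_to_length([320, 1929])  # two placeholder ids standing in for "a dog"
```

Even a two-word prompt ends up occupying all 77 positions, which is why the blanks still take part in attention.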

visual glade
elfin cobalt
sudden cliff
#

maybe some of the vectors become 0 after cross attention?

elfin cobalt
#

It's named on the UI, after all.

uneven dove
sudden cliff
#

at least they can't be vectors with any strength

rustic garnet
#

yes, their attention can be 0

visual glade
rustic garnet
#

but I'm sure the "blank" tokens are still used somehow

rustic garnet
azure oxide
#

ive always wondered about token concatenation, how does it work in regards to not using concatenation? say if i prompt a scene with 75tokens. then i prompt with those same tokens but add 100 more to describe it more finely. Does the second one have less weight on the original 75 tokens or something? or does concatenation work flawlessly and can just simply overcome the 75 token limit and also add the extra tokens?

rustic garnet
#

you will get different results when using no padding. However, it would be more performance efficient without padding for sure

sudden cliff
#

LLMs don't work in such a way that they consider a catalog to be a cat and a log

elfin cobalt
strange mist
rustic garnet
visual glade
elfin cobalt
#

Only if they're completely equivalent, I think.

sudden cliff
# rustic garnet hm? I never claimed that

Is the core idea you're trying to convey that you basically have a prompt 'a cat licking a dishwasher' and the CLIP-L and G are creating encodings like 'a a cat cat licking licking a a dishwasher dishwasher' and cross attention is applied afterwards?

rustic garnet
#

your prompts are

[a, cat, is, licking, a, dishwasher] CLIP-G
[a, cat, is, licking, a, dishwasher] CLIP-L

and then your vectors are [a++a], [cat++cat], [licking++licking], and so on

#

but this does not mean you concatenate words or something

#

you just concatenate their vector embeddings

#

basically for SDXL they couldn't decide which text embedding works best, CLIP-L or CLIP-G, so they just used both

naive inlet
#

can anyone tell me what ToBasicPipe is? i imported a workflow and it showed me that node is missing and i can't seem to find where to install it

elfin cobalt
#

For prompt extensions, would it make any sense to just... average them?

visual glade
#

except the output vector embedding don't match that much with the words

rustic garnet
#

the cross attention gets one token in which "cat" is encoded once by CLIP-L and once by CLIP-G

sudden cliff
#

that's why it doesn't matter

azure oxide
uneven dove
#
encoder1: "a dog"  --> [1, 2, 3]
encoder2: "a dog"  --> [4, 5, 6]
encoder1: "a cat"  --> [7, 8, 9]
encoder2: "a cat"  --> [10, 11, 12]

per_encoder = [
    [[1, 2, 3], [7, 8, 9]],  # embeddings from encoder 1
    [[4, 5, 6], [10, 11, 12]],  # embeddings from encoder 2
]

concatenated = [
    [1, 2, 3, 4, 5, 6],  # embeddings for "a dog"
    [7, 8, 9, 10, 11, 12],  # embeddings for "a cat"
]

an example of how it looks.
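
The same position-by-position concatenation as a small Python sketch. The helper name `concat_per_token` and the toy widths are assumptions for illustration. In SDXL the two encoders produce 768-wide (CLIP-L) and 1280-wide (CLIP-G) vectors, giving 2048 features per position.

```python
def concat_per_token(clip_l, clip_g):
    """Join the k-th vector of one encoder with the k-th vector of the other."""
    if len(clip_l) != len(clip_g):
        raise ValueError("both encoders must produce the same number of positions")
    # Concatenate along the feature axis; the number of positions stays the same.
    return [l_vec + g_vec for l_vec, g_vec in zip(clip_l, clip_g)]


clip_l = [[1, 2], [7, 8]]            # 2 positions, width 2 (toy values)
clip_g = [[4, 5, 6], [10, 11, 12]]   # 2 positions, width 3 (toy values)
combined = concat_per_token(clip_l, clip_g)
```

The alignment is purely by position, so it pairs whatever happens to sit at index k in each prompt.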

rustic garnet
eternal fog
uneven dove
sudden cliff
sudden cliff
#

human language

rustic garnet
#

I doubt so

#

if you trace back which pixels are attended to which tokens you clearly see that the tokens still keep their meaning

#

in the sentence "a cat is licking a dishwasher" you will still see that the latent pixels in the image that belong to a cat are more strongly associated with the word "cat"

sudden cliff
#

you could connect the latents to the tokens in a way, but the tokens themselves aren't relevant

#

a rose is still a rose

rustic garnet
#

maybe we just talk about different things, cause I don't use the word token precisely

#

I talk about the vectors in the cross attention

#

each vector is connected to a token in the original sentence

uneven dove
#

just because the vectors are concatenated doesn't mean they lose their meaning, i think is what kai is trying to say. but that's precisely why it works to have "a dog" in one prompt and "national geographic" in the other.

rustic garnet
#

and of course, some words will consist of more than one token, then their vectors are probably very similar to each other

rustic garnet
#

I think it works because you can mess up with SDXL in soo many ways and it still works

#

and yes, sometimes messing up makes it even better

#

I just say that it is strange because you technically align both prompts with each other token by token, although this alignment has no meaning

sudden cliff
#

I guess I'm trying to understand what kai is saying means wrt the 2 clips

uneven dove
#

yeah that's why i said it's not THE way, it's just Different, and that allows you to access a wider subset of the data distribution than just doing it the same way every time you prompt the models.

hushed bobcat
#

my refiner doesnt run automatically in comfyui, is there a button I am missing?

rustic garnet
#

for example I wonder if the following works similarly well:

sudden cliff
uneven dove
#

OpenCLIP didn't have much knowledge loss, if anything it knows more and is more precise than CLIP-L

rustic garnet
#

CLIP-L: "a cat is licking a dishwasher BLANK BLANK"
CLIP-G: "BLANK BLANK BLANK BLANK BLANK BLANK national geographics"

#

anyways, I guess there is still a lot to experiment

sudden cliff
uneven dove
rustic garnet
#

I just wanted to say that it's not so obvious that you can use two different prompts and, therefore, I don't find it shocking that auto1111 has not implemented that yet

sudden cliff
#

I get what you're saying

rustic garnet
#

it's extremely inefficient, but I think you cannot simply drop that

uneven dove
#

it ensures you still get unseen prompt features aiui

stray mantle
#

ComfyUI SDXL 0.9

rustic garnet
#

cause due to the transformer layers you not only change the text tokens but also the blank tokens

uneven dove
rustic garnet
#

like the embeddings for BLANK might still contain knowledge about "a cat is licking a dishwasher"

uneven dove
#

when you misalign the timesteps of the two models, it does that reliably

stray mantle
uneven dove
#

oh, i was just stating something randomly

rustic garnet
#

so I guess if you removed all blank tokens, the model would lose expressive power

stray mantle
boreal bough
uneven dove
#

forehead jewels

#

misaligned timesteps are badass

#

it kind of "cracks" the image apart

#

you can see these thick black sharp lines form on faces quite often

meager canopy
uneven dove
#

@visual glade one of the best aspects of AUTOMATIC1111 getting SDXL support is that it seems to be putting a rush on resolving issues that cropped up with SD 2.0 support there, and were never resolved

#

like models not loading the correct VAE or having hidden errors that just silently fail to load the model, fallback to the prev model etc

#

idk why --no-half-vae isn't the default now, the only model that works with that is 1.5

visual glade
#

probably because that's their solution to the high vram usage of the vae

sudden cliff
uneven dove
visual glade
#

I'm pretty sure it doesn't actually and it's an extension

uneven dove
sudden cliff
uneven dove
#

the zero or one vectors are tweaked by the inputs that have no captions to them - the empty caption is replaced with all zeroes or all ones, depending on the encoder, since they use different tokenizers

#

this i think is how a lot of models end up improving their negative latent space so that you don't need negative prompts

#

you can run the text encoder on very high quality data with caption dropout around 5-10% and it definitely stops needing negative prompts. but without a very diverse set of captioned images, you will start losing knowledge that exists in this 'empty space'

uneven dove
#

yep

#

that's the Weird Stuff i alluded to

boreal bough
#

there's even a second dishwasher there!XD

sudden cliff
uneven dove
#

it's just that they didn't ask for a coherent cat, so, that didn't happen

#

when you don't use classifier free guidance, your output is pretty well-pinned to the prompt
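
For reference, a minimal sketch of how classifier-free guidance is usually combined, assuming the standard formulation where the final prediction is pushed away from the unconditional one. The function name `cfg_combine` and the toy numbers are illustrative only.

```python
def cfg_combine(uncond, cond, scale):
    """Standard CFG mix: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond_pred = [0.0, 1.0]  # toy prediction for the empty prompt
cond_pred = [1.0, 3.0]    # toy prediction for the actual prompt

no_guidance = cfg_combine(uncond_pred, cond_pred, 1.0)
with_guidance = cfg_combine(uncond_pred, cond_pred, 2.0)
```

At scale 1.0 the result is just the conditional prediction, so the unconditional branch (and with it any negative prompt) has no effect.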

#

if you don't ask for something, it doesn't happen

#

unseen prompt features? fuck 'em. never needed em 😄

autumn forum
coral orchid
#

dish, washer

uneven dove
sudden cliff
uneven dove
#

have you not seen the way my family eats? a hybrid dish/clothes-washer would be awesome

coral orchid
sudden cliff
#

and is that because of insufficient self-attention?

uneven dove
grizzled warren
#

Incoherent cat would suffice!
Cross Attention, probably 🤣

uneven dove
coral orchid
#

what I like to do for fun with negative prompts... generate with no prompt, and then add a negative based on what the unconditional output looks like

uneven dove
#

SAG scaling never worked for 1.5 or 2.0 or 2.1 but DAMN IT things can be different

sudden cliff
uneven dove
#

read the SAG paper

sudden cliff
#

screen actors guild?

uneven dove
boreal bough
green python
#

yeeeesss, i managed to do this after 20 attempts lol, someone on reddit said it was impossible

sudden cliff
uneven dove
#

each image takes 4 minutes, so that's about an hour and 20 minutes for 20 attempts

uneven dove
sudden cliff
#

I had no idea that there wasn't a self-attention component to all these diffusion impl. I realize that self-attention wouldn't be perfect, but thought it was doing some.

lament rune
#

It was really cool when it worked

uneven dove
#

SAG only works on v1.4 with non-square resolutions, on v1.5 with square resolutions, and nowhere else

green python
uneven dove
#

i got tired of keeping track of which models do/don't work with it.

green python
#

the guy on reddit said that sdxl wasn't capable of creating that image

uneven dove
uneven dove
#

so a single image takes 4 minutes on a 1660

#

yeah it sucks but it's better than what LLaMA would do on that hardware, which is, nothing.

sudden cliff
#

Does anyone know, does even encoding do self-attention?

#

encoding ONLY

#

in these impl

boreal bough
#

"sir mitten, a 1 year old kitten, taking on the adventure of prevailing over its archnemesis the dishwasher "sir brrrrrrs a lot""

uneven dove
#

Self-Attention Guidance (SAG) is an advanced method that uses a model's own attention maps to improve the generated images. You can think of an attention map as a heat map that shows the parts of the image that the model is currently focusing on. By blurring only these parts of the image, the model is better able to focus on the most important features of the image, which leads to better results.
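
A toy sketch of only the "blur the attended parts" step described above, assuming a simple 3x3 mean blur and a fixed threshold on the attention map. The function names and the threshold are hypothetical. As far as I understand it, the real method uses a Gaussian blur and then turns the difference between the model's predictions on the original and the blurred input into a guidance signal, which is not shown here.

```python
def box_blur(image):
    """3x3 mean blur on a 2D list of floats (edges use the available neighbours)."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            blurred[y][x] = sum(window) / len(window)
    return blurred


def blur_attended(image, attention, threshold=0.5):
    """Replace pixels by their blurred value only where attention exceeds the threshold."""
    blurred = box_blur(image)
    return [
        [b if a > threshold else p for p, b, a in zip(img_row, blur_row, attn_row)]
        for img_row, blur_row, attn_row in zip(image, blurred, attention)
    ]


image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
attention = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
result = blur_attended(image, attention)
```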

green python
#

fine-tuned sdxl will be the same as midjourney 5.2 and sometimes maybe a bit better

urban fjord
#

SDXL fine-tuned or not isn't dependent on the developers to develop new features like with MidJourney. People seem to celebrate new features in MidJourney that have been in SD for a year.

boreal bough
sudden cliff
#

OK so what I'm asking is this, if I typed 'a cat licking a dishwasher' I could maybe type 'ing cat a washer dish lick a' and the only difference would be the vectors having a different order, there's no attention applied prior to sampling?

sudden cliff
green python
#

will i be able to run finetuned sdxl in 8gb vram?

uneven dove
#

some huge LoRAs might end up increasing VRAM requirements depending on how they're handled during runtime

boreal bough
sudden cliff
#

So that's where the non-equivalence comes in

green python
#

stable doodle is really good

sudden cliff
#

I always assumed the encodings would be wholly different

#

because of LLM self-attention

#

but now I see it's not used that way

#

I guess

#

it's raw encoding only...

boreal bough
uneven dove
#

transformer models are just different

#

almost nothing transfers over

sudden cliff
#

OK whelp

inner ruin
#

ahhh finally got a LoRA to kind of work, maybe I overbaked it? 1e-5 LR and 90 epochs

uneven dove
#

oh hell yeah you overbaked the living fuck out of that

inner ruin
#

but it wouldn't capture the face otherwise

#

I tried left and right

uneven dove
#

that's ... surprising lol

sudden cliff
#

OK well now I'm sad we don't have self-attention at any part

inner ruin
uneven dove
#

@sudden cliff same lol it was pretty good for 1.4

glossy tusk
#

hi

uneven dove
#

@inner ruin i was going to suggest you ping Caith 😛

sudden cliff
#

OK now I'm really wondering how DeepFloyd IF did as well as it did. Did that have SAG?

inner ruin
sudden cliff
inner ruin
# boreal bough pic?

it only works kind of though. The moment I change the prompt too much it just defaults back to random girl

boreal bough
#

ah... yeah. face

#

easiest solution is overfitting, for now

inner ruin
boreal bough
#

while there are better options, overfitting is a lot easier for now

inner ruin
sudden cliff
#

wonder if accurate...

boreal bough
inner ruin
sudden cliff
inner ruin
sudden cliff
#

ok DF IF DOES use SAG

grizzled warren
inner ruin
boreal bough
#

nop.
so easiest solution I found for now, is overtrain the face only - base model wont break for A LONG time, so you shouldn't have issues there, then around 600% it makes nice faces

grizzled warren
inner ruin
boreal bough
inner ruin
boreal bough
#

training images - 30 to eliminate any possible problems. more images = better results, up to around ~150 at which point it just takes longer
if you have less than 30, it can work, just try a bit around, and see if you can avoid training the background as well XD

#

under 10, make sure your captions are good, and always caption the background!

sudden cliff
#

Final actual question for anyone that knows:
Is the lack of SAG why MJ, SD, SDXL, Dall-E2 cannot do 'a boy with red hair and a girl with blue hair' (extrapolate with various associations) reliably (and DF IF can)?

inner ruin
#

that makes sense, like regular 1.5 LoRAs

inner ruin
rustic garnet
boreal bough
#

yep.
my 2B lora, where I trained a face as well for it - should have been done around 50 epochs, but I let it run to 350 to 'fix' the face. remaining model didn't suffer any damage though, clothing even improved and didn't get overbaked

sudden cliff
uneven dove
# sudden cliff Final actual question for anyone that knows: Is the lack of SAG why MJ, SD, SDXL...

In a transformer text encoder (as in GPT models), self-attention is used to capture dependencies between all words in a given text regardless of their position. For each word, it computes an attention score for all other words to determine their relative importance. The word embeddings are then weighted according to these scores to produce the final output. This mechanism allows the model to focus on relevant parts of the input sequence when generating each word in the output sequence.
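
A minimal sketch of that scoring-and-weighting step, assuming scaled dot-product attention and, as a simplification, using the token vectors themselves as queries, keys and values. A real encoder applies learned projections and multiple heads. The function names and toy vectors are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Each output vector is a weighted average of all token vectors."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Score the query against every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy token vectors
mixed = self_attention(tokens)
```

Every output vector now carries some information from every other position, which is the mixing effect discussed in this thread.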

The U-Net architecture is typically used in tasks such as image segmentation, where the model needs to output a pixel-wise classification of an input image. The self-attention guidance in a U-Net-like model isn't used in the same way as in a transformer model. Instead, it is used to better incorporate global context and guide the generation process in diffusion models. This guidance helps to improve the image generation quality by allowing the model to attend to different parts of the image at different stages of the generation process.

#

they're different and not the same form of SAG.

rustic garnet
sudden cliff
#

That's why I kept saying self-attention not SAG (which i didn't even know)

uneven dove
#

i'm stupid and use the wrong words sometimes

sudden cliff
#

OK so the encoding does use self-attention, I am satisfied then

#

I was horrified about that mainly

#

(I was thinking that other than how it was trained, token order didn't matter AT ALL)

boreal bough
inner ruin
uneven dove
sudden cliff
#

well see that's why I'm surprised that dishwasher and dish, washer wouldn't be more different

rustic garnet
sudden cliff
#

maybe it's because the CLIP isn't that huge

uneven dove
#

i still want self-attention guidance for SDXL KEK

rustic garnet
#

like image captions are usually quite bad and general

uneven dove
grizzled warren
# inner ruin yeah exactly. They were very smart though because 1.5 was so hard to prompt, so ...

But if you had something specific in mind, you had to dilute the prompt with tons of synonyms, weak supporting tokens or outright bogus input so the LLM doesn't add too much on top of the meaningful prompt. It was a good solution for an inexperienced user, not so much for someone who can actually prompt 1.5 well enough. And it seems they moved in the same general direction SDXL is moving, because I heard current versions also benefit from natural prompting which is closer to an actual sentence instead of 1.5 notation.

boreal bough
rustic garnet
#

you rarely have captions like "a photo of a girl with blond hair and a boy with brown hair"

sudden cliff
#

Well why is dish and washer not encoded more differently from dishwasher

inner ruin
uneven dove
#

not sure how accurate they are but it was Good Enough for Me

rustic garnet
#

also, CLIP is trained to create a pooled embedding. You don't care about the single words in the caption, but you want to compare a complete image against a complete caption
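
A small sketch of that whole-image versus whole-caption comparison, assuming each side has already been reduced to a single pooled vector and that matching is done by cosine similarity. The helper names and the toy embeddings are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_embedding, caption_embeddings):
    """Return the caption whose pooled embedding is closest to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
    )


image_embedding = [1.0, 0.0]  # toy pooled image vector
caption_embeddings = {"a cat": [0.9, 0.1], "a dog": [0.0, 1.0]}  # toy pooled text vectors
match = best_caption(image_embedding, caption_embeddings)
```

Nothing in this objective looks at individual words, only at the one pooled vector per caption.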

urban fjord
boreal bough
jaunty adder
#

I read some youtube comment about how people aren't appreciating what is possible with SDXL and something about you'll be able to make cars for shoes. And I thought really? Oh yes really.

rustic garnet
#

so it is very likely that the transformed word embeddings in CLIP in the last layers carry a lot of information about the complete image. That's why clip skip worked so well in SD

sudden cliff
urban fjord
sudden cliff
#

If that's the only reason, then I can remain sane

inner ruin
boreal bough
sudden cliff
#

dish<-washer(.9) => encoding that is a little bit 'washer' and more the concept of 'dishwasher'

rustic garnet
#

don't get that. The last layer of CLIP contains the pooled embedding. However, nothing stops the model from letting the layer before the last one already contain pooled embeddings

sudden cliff
rustic garnet
#

for the loss function clip is trained on it would be totally fine if in the last layers all words have exactly the same embedding

#

it would be just a waste of parameters

#

but it would mean that if you use the embedding from these layers you lose the individual meaning of your words

sudden cliff
rustic garnet
#

of course this is not the case. As said, if you look at self attention maps in SD you see that it can differentiate between different words in the sentence. It's still the case that sometimes words get mixed up a bit, and in "a woman with blond hair and a boy with brown hair" the vector for "woman" contains both blond and brown hair information

inner ruin
boreal bough
sudden cliff
boreal bough
#

its a dumb solution, but it works painfully well :/

sudden cliff
#

that makes sense tho

rustic garnet
urban fjord
rustic garnet
#

like the sentence "girl with blond hair and boy with brown hair". In the first layer each word is isolated from the others. The more layers you go through, the more context is transferred to the words, such that "girl" is associated with "blond" and with "hair". In the last layer, the complete sentence has to be associated. So this means that it's very likely that in the last layers every word is associated with every other word in some way

sudden cliff
rustic garnet
#

yes

#

because the output of CLIP is just a single word

sudden cliff
#

OK yep I'm understanding then

rustic garnet
#

which contains information about the complete sentence

#

in SD the last layer is removed and the layer before is used, where HOPEFULLY the words still have their individual meaning

sudden cliff
#

OK tbh this WHOLE conversation though kai, I thought you were saying that there are NO associations in ANY layer

rustic garnet
#

but it is still very likely that a little bit of attention is leaked in each word

#

sorry, I'm probably bad at explaining 😅

sudden cliff
#

It's fine because I am just more familiar with the LLM stage

boreal bough
# sudden cliff interesting

forgot what it was, but essentially I wanted a very specific chinese flower dress, and obviously it couldn't make even remotely close. was gonna train a lora. then I threw it into vit-h, it gave me back an artist name XD put the artist name into the prompt as well. works 100% how i wanted it, and only produces the right dress. wtf right?
turns out there's a photographer who does nothing but photograph people in that type of dress. the weight on his name is stronger than the real name of the dress XD

sudden cliff
grizzled warren
rustic garnet
#

there is CLIP Interrogator

sudden cliff
rustic garnet
#

it works really nice for these cases

sudden cliff
#

So for my job I actually created a multimodal captioning software

#

that is focused on accuracy

#

it way outperforms even KOSMOS-2 etc

boreal bough
sudden cliff
#

CLIP interrogator has a lot of different models you can run

rustic garnet
#

I mean, diffusers has them all ;D

boreal bough
sudden cliff
#

Thanks all for the discussion, confirmed the bits I suspected and did actually understand but also learned a lot of things that I didn't know about at all or didn't understand

#

And I'm grateful that my whole world isn't shattered

inner ruin
idle pasture
#

where can i prompt the sdxl 1.0?

cursive saddle
#

Today, with collaborators at @Google , we're excited to announce 🥳🥳HyperDreamBooth🥳 🥳! It's like DreamBooth, but smaller, faster and better. 25x faster. Think of 30 minutes vs. 14 hours for 100 models. And works on a single image!
(Thread 👇)
webpage: hyperdreambooth.github.io

Seen on twitter

#

A new dreambooth

grizzled warren
civic sigil
#

So is it a lora or a hypernetwork

#

Results dont look that amazing, I wonder what model they used

#

I guess Im curious what the difference is between that and traditional hypernetworks

#

Not sure why they glossed over it

rustic garnet
boreal bough
#

but... 1.5/2 only, right?

#

since the old techniques no longer apply to sdxl

patent badger
#

hey guys, I know it doesn't 100% belong here but I guess it could be related to SDXL as well,
if I'm training a lora for the openjourney v4 model, should i train the lora on the model itself or on the 1.5 base model?

rustic garnet
#

I don't think its about SD at all

boreal bough
lusty raptor
#

in most cases, you're better off training on the model you intend to use the lora with

#

no hard rules, of course

rustic garnet
#

oh, I'm wrong, they applied it to SD

#

anyways. I don't think that it is so interesting either. It is very similar to an older paper by google which was doing the same just with "rank-1 lora" instead of what they call "lightweight dreambooth"

#

it might be interesting for applications and cloud services that want to create personalized images on the fly for their users

boreal bough
#

porting it to sdxl is not the issue - rather the theory behind its speed up is probably no longer applicable to sdxl. due to the larger model, we no longer have to worry about so many of the issues of training on 1.5.
hell, I trained the same dataset on sdxl in 6 different ways to see which work, some completely wrong for the hell of it. and they all worked

rustic garnet
#

but anyone here could just wait a few minutes longer and train a, probably much better, model using Lora

rustic garnet
boreal bough
#

if its speed you want, 2e-3 is the fastest you can go to achieve good results. While it can't be overfitted too much, that is rarely what you want to do to begin with - and then training is a speedrun

grizzled warren
#

They said they used Stable Diffusion, but they didn't specify the version. Chances are it's either 1.5 or 2.1.

civic sigil
rustic garnet
#

in the end their model is similar to controlnet in the sense that it uses a pre-trained network for faces. It's not exactly like controlnet, and I guess it's because the results with a controlnet were not good enough. But the point is you have to train a model that is able to finetune a model for face images

#

which means it works ONLY for faces which makes it kinda boring imo

civic sigil
#

Ohh yeah maybe that will be useful for like phone apps to personalize AI filters and stuff

rustic garnet
#

yes

#

I guess thats the point

civic sigil
#

Probably exciting for some startup out there lol

grizzled warren
civic sigil
#

But not for me

rustic garnet
#

maybe also game development where you get a personal avatar based on a photo and stuff like that

civic sigil
#

Ohh yeah like you can put your pic in and it will generate a bunch of images personalized for you

rustic garnet
civic sigil
#

Yeah my Loras learned super quick but it takes a lot of vram

grizzled warren
boreal bough
uneven dove
#

jim carrey as shrek?

lusty raptor
#

is it just me or is the comparison image in that hyperdreambooth a bit misleading?

uneven dove
urban fjord
#

I guess it is all single-image datasets comparisons. Any comparison made behind closed doors will always be misleading.

uneven dove
#

it's not hard to improve on their original research paper

boreal bough
#

I take it back. if you're ok with this, then 4e-3 is your limit XD

urban fjord
#

But yeah I don't feel like the outputs are that good.

#

But it might have some uses still.

rocky geode
#

face reveal

boreal bough
#

casually pretends lora doesn't exist
though it definitely works a lot better in niche applications. just not generalized

though I also question their prompts, since you can't just compare "A Pixar character of a [V] face" when that prompt was never intended to work on the default model... while there IS a prompt that does work.

uneven dove
#

i don't understand that test grid at all

#

that thing belongs in the Facebook group of scientific charts that look like shitposts

urban fjord
#

Yeah without prompts it is kind of worthless, and where is normal LoRA...

boreal bough
#

just finished reading. I feel bamboozled. They just made a new variant LoRA and gave it a fancier name...

uneven dove
#

i think they want a line down the middle separating the men output from women output? looks like a god damn continuum where they gradually shift the weights

urban fjord
#

I didn't read too much of the paper as I don't understand the fine-details too well. But if you're improving on LoRA you should compare this to the other LoRA variants.

boreal bough
#

basically it's a 1/0.5 lora XD

urban fjord
#

It is like me developing a new screw and making comparisons to bolts and nails but not other screws.

uneven dove
#

welcome to the wild world of machine learning research where the comparisons don't mean anything and the demonstrations don't matter

uneven dove
rustic garnet
#

in this paper they say directly that they do lora, but they use a hypernetwork to predict initial weights

boreal bough
uneven dove
#

also, how the fuck does midjourney's bot send its partially denoised outputs to the discord message

#

😭

rustic garnet
#

also they use a random vector factorization before doing the lora to further shrink down the number of parameters

uneven dove
#

i can't update an old message with a new embed

boreal bough
#

but end result is still a 120 KB lora, right?

rustic garnet
#

yes

#

but from this view you could say Lora is the same as dreambooth

#

when you add the lora to a model you get a normal model back
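
A minimal sketch of that merge, assuming the usual low-rank formulation where the update is the product of a tall "up" matrix and a wide "down" matrix added onto the original weight. The names `merge_lora`, `up`, `down` and `scale` are illustrative and not taken from the paper.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, up, down, scale=1.0):
    """Return weight + scale * (up @ down), which has the same shape as weight."""
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # original 2x2 layer weight (toy values)
up = [[1.0], [2.0]]                # 2x1, so the update has rank 1
down = [[0.5, 0.0]]                # 1x2
merged = merge_lora(weight, up, down)
```

Only `up` and `down` need to be stored, which is why the file can be tiny while the merged result is an ordinary full-size weight.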

boreal bough
#

LiDBx2 would be the proper name XD

eternal fog
#

lmao, my training has gone back to not working again. I don't understand lmao

paper phoenix
uneven dove
#

yeah i'm well aware of how groups like SAI and Google all burn capital just because they have it. i have always done everything i do, with much less than these groups spend. but i'm not working at their level i'm sure 😁

boreal bough
uneven dove
#

LoRA is probably the best thing Microsoft ever did

boreal bough
#

Vit models go brrrrrrr on Stability Cluster

uneven dove
#

oh, looked at wandb logs?

paper phoenix
#

but it must utterly suck to be wrapped up in your own red tape and having to form an orderly queue on an idea while people outside your window are all running at it from every angle like wacky races.

uneven dove
#

ahahahaha

#

@paper phoenix the caucus race scene from Alice in Wonderland

paper phoenix
#

its kind of nasa versus the redbull flugtag and inexplicably the flugtag is competing!

uneven dove
#

this scene is an incredible metaphor for so much time wasting resource expenditure we have in life

shy kelp
#

why doesnt google release an image gen I wonder

paper phoenix
urban fjord
#

Why does Google even research these things if no one gets to do anything with it?

visual glade
uneven dove
#

nice

urban fjord
#

Does ComfyUI have support for that?

uneven dove
#

yeah i wanna do what MJ did for image gen and show the images as they generate

#

i have a progress bar but that's boring

urban fjord
#

Look at how Automatic1111 is doing live-preview.

paper phoenix
#

do you mean preview every x steps?

eternal fog
#

Although it's not that fast either

visual glade
#

if you want preview in comfy it's: --preview-method auto

uneven dove
visual glade
#

except a1111 is not fast, both diffusers and comfyui beat it in speed

#

so it's actually the slowest

uneven dove
#

it's fast AND shitty which means it's the best at being the worst. it's like how 1+1=3.. for large values of 1

eternal fog
#

I'm going insane wtf

#

I can't even train 512,512 anymore

paper phoenix
#

this is a bit... weird.

eternal fog
#

What is happening

uneven dove
eternal fog
#

It just does this

#

And OOM

#

But it worked 2 days ago on 1024x1024 at batch 2

uneven dove
#

wow

urban fjord
#

Ah, I thought there was a node for it. I want to replace the normal conversion with the faster one, as I have a bit of an issue with the VAE atm.

uneven dove
#

it even dips into the iGPU at the end

#

lmfao

eternal fog
uneven dove
#

oh

#

what the hell is that technique called KEKL

eternal fog
#

But I don't understand why it's suddenly started doing this, when I trained one the other day and it did work.

#

So why now is it fucked

uneven dove
#

did you update?

eternal fog
#

updated, didn't work, downgraded worked. Went to sleep, ran again, didn't work.

uneven dove
#

if you haven't updated, my guess is there's some state file it's picking up on from the last run

eternal fog
#

I don't save states

#

And the commands look the exact same

uneven dove
#

well upgrading and downgrading is ... uh... well, did you look at the code changes before doing it to verify it'd be okay?

eternal fog
#

When I say downgrade, I checked out an old commit

visual glade
#

nvidia driver update?

uneven dove
#

i've heard a lot of this "works one day, does not the next"

eternal fog
uneven dove
#

have you tried turning it off and on again?

eternal fog
#

Many times

uneven dove
#

hm

#

well, something changed

eternal fog
#

Let me check to make sure it's not some corrupt cached latent or something, I'm going to clear the whole training folder.

uneven dove
#

that's what i meant when i said some saved state

eternal fog
uneven dove
#

cached latents or aspect buckets can i guess do that too

eternal fog
#

I'll see if this works

uneven dove
#

well it sounds truly frustrating, i hope you figure it out, because maybe it's the same issue for all

rustic garnet
eternal fog
#

It's just bizarre

#

Goes from 1024x1024 batch 2 working

#

to 512x512 batch 1 not lmao

#

I wonder if it's doing something fucky with some cached latents and trying to load them multiple times or something

eternal fog
trail bay
rustic garnet
#

the VAE approximation in auto1111 is super fast and a great idea

shy kelp
#

I want to see a model or method where you can get multiple subjects in on the first go with no merging

#

thats my challenge to all the eggheads

boreal bough
eternal fog
lilac wren
civic sigil
eternal fog
#

I suspect somewhere some settings on the Kohya scripts are getting fucked up

#

And it's not doing what it's telling me it's doing

civic sigil
#

Dang, I need to switch to Linux, I can't get anywhere near that

eternal fog
#

I mean it doesn't work at all anymore

#

I would OOM with a resolution of 1x1 lmao

#

I think I'm going to delete this whole thing and start again

#

Because something is obviously broken

civic sigil
#

Been there done that

#

Derriens repo worked for me out of the box tho

eternal fog
#

How much VRAM?

civic sigil
#

12GB

eternal fog
#

Hmm, maybe I'll try that

civic sigil
#

Yeah, it's based on Kohya, it just installed correctly for me, unlike the actual Kohya

#

It also has some nice QoL features

eternal fog
#

I hate windows sometimes

#

It won't let me delete an empty folder because it's "In Use"

#

how... by what lmao

civic sigil
#

What you're on windows?

eternal fog
#

yes

civic sigil
#

How did you get such good results

eternal fog
#

¯_(ツ)_/¯

#

Thats what I'm trying to work out

thin nova
civic sigil
#

I would love to train at full res

eternal fog
#

How would an open folder be using an open folder XD

lusty raptor
eternal fog
#

I have a tool, but it does it on files and not folders

#

Because an empty folder shouldn't ever be locked lol

thin nova
#

if you have the empty folder open, viewing the empty contents

sage basin
#

Do you have a command line open in that folder?

thin nova
#

or if you are inside the folder in a terminal

eternal fog
#

It's fine, I'm just getting irritated because this thing randomly stopped working

civic sigil
#

Are you sure it was working in the first place?

eternal fog
#

100%, it made a working LoRA

civic sigil
#

What if you were accidentally resizing to 512 or something

eternal fog
#

Well it OOMs at 512 so...

#

But my tests worked, not brilliantly, but they absolutely worked.

civic sigil
#

Hmm

eternal fog
#

And that's bang on the character style

#

It's just overfit to all hell

civic sigil
#

Im interested now I want to get it working on my machine too lol

eternal fog
#

Well I'm going to try re-install it all

#

Lets see if that works again

civic sigil
#

Good luck

delicate grotto
#

anime with bikes

lilac wren
eternal fog
gentle mirage
#

could you yoink me your parameters?

eternal fog
gentle mirage
#

oh? i thought it was from that pic

#

all g then

eternal fog
#

It was

#

It was working

#

Now it doesn't

#

¯_(ツ)_/¯

gentle mirage
#

😂

eternal fog
#

Just re-installed all of Kohya's scripts and it still doesn't work

#

Exact same training settings

#

So dumb

lilac wren
gentle mirage
eternal fog
#

I don't remember tbh

#

It worked but wasn't good enough so I deleted it

green python
#

parti vs sdxl (no cherrypicking)

eternal fog
#

{\"resolution\": [768, 1152], \"count\": 2}, \"1\": {\"resolution\": [768, 1216], \"count\": 1}, \"2\": {\"resolution\": [832, 1088], \"count\": 2}, \"3\": {\"resolution\": [832, 1152], \"count\": 4}, \"4\": {\"resolution\": [832, 1216], \"count\": 4}, \"5\": {\"resolution\": [896, 1024], \"count\": 1}, \"6\": {\"resolution\": [1024, 896], \"count\": 1}}, \"mean_img_ar_error\": 0.010579165855970867}"

This is the info from another test one I trained, so it was using the correct 1024x1024 with bucketing

#

Thats from inside the safetensors file

#

So I'm so confused as to how that somehow took less than 10GB VRAM but now it takes more than 10GB VRAM to try with 512x512
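
For anyone who wants to check the same thing on their own LoRA: a `.safetensors` file starts with an 8-byte little-endian length followed by a JSON header, and trainer metadata lives under the `__metadata__` key as string values. The sketch below reads that header with the standard library only. The key name `ss_bucket_info` and the outer `"buckets"` wrapper are assumptions based on the fragment quoted above, so adjust them if your file differs.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the __metadata__ dict from a .safetensors header (empty if absent)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def bucket_summary(metadata, key="ss_bucket_info"):
    """Decode the bucket info string into {(width, height): image_count}.

    The key name and the "buckets" wrapper are assumptions, not a documented API.
    """
    info = json.loads(metadata[key])  # metadata values are JSON-encoded strings
    return {tuple(b["resolution"]): b["count"] for b in info["buckets"].values()}
```

Calling `bucket_summary(read_safetensors_metadata("my_lora.safetensors"))` (hypothetical file name) would list which resolutions the run actually bucketed into, which is a quick way to confirm a run really trained at 1024 rather than silently resizing.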

inner ruin
eternal fog
eternal fog
#

If the Colab works I might just use that. Although it's annoying having to upload all the images

lusty raptor
inner ruin
#

but if I change anything it just forgets the face

#

it's dumb af

eternal fog
uneven dove
#

reimagining Dragon Fruit

lusty raptor
#

i tried derrien's for the first time recently and it didn't need admin

eternal fog
#

I just ran it, it tries to change the PowerShell execution policy and then shows a UAC prompt for admin

uneven dove
eternal fog
#

Call PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& {Start-Process PowerShell -ArgumentList 'Set-ExecutionPolicy Unrestricted -Force' -Verb RunAs}"

It doesn't need to do this

eternal fog
#

Yeah so did I

#

the Bat loads up that

#

Which does this

#

```python
import os
import subprocess

# Try the main policy-changing batch file; fall back to the backup one if it fails.
try:
    subprocess.check_call(f"{os.path.join('installables', 'change_execution_policy.bat')}")
except subprocess.SubprocessError:
    try:
        subprocess.check_call(f"{os.path.join('installables', 'change_execution_policy_backup.bat')}")
    except subprocess.SubprocessError as e:
        print(f"Failed to change the execution policy with error:\n {e}")
```
#

Which then runs another BAT that runs that code to change the policy

lusty raptor
#

so why didn't i get a UAC prompt?

eternal fog
#

Have you disabled it?

lusty raptor
#

no

eternal fog
#

Or were you already running the bat in an admin command window

lusty raptor
#

also no

eternal fog
#

Not sure, but it did it for me

#

And I'm not installing stuff that asks for admin for no reason

molten gull
eternal fog
# lusty raptor also no

Actually it might be because your powershell policy is already on bypass, so it didn't need to change it.

molten gull
#

that's weird pencils 🙂 the one in the middle is blue and black 🙂

lusty raptor
#

oh good call

eternal fog
#

But I don't see any powershell scripts that would need that changing to run

#

Think I'll just wait for 1.0 and improved tools before I train anything

#

I need food

molten gull
#

🙂

rustic garnet
uneven dove
#

for dreambooth, yeah training the text encoder is pretty strong, have to do it very carefully

lusty raptor
inner ruin
rustic garnet
#

hm, that's weird. I had no problems training on a subject.

#

I did first train text encoder for a few epochs and then, much longer, the unet

eternal fog
rustic garnet
#

I did textual inversion first, but I'm pretty sure you can skip that if you have a good trigger word

inner ruin
#

it

rustic garnet
#

LR 5e-4

inner ruin
#

it's funny because it still gets the demographic data (like white man with long hair) but it loses face details

lusty raptor
rustic garnet
#

Steps in total? I don't know anymore. Guess ~400 for text encoder and ~800 for unet. Actually, I just train until I see severe overfitting. Then I use the last model and a model a few epochs before

#

I would try with training text encoder. This is really powerful

inner ruin
true hazel
#

anatomy why...

uneven dove
#

cursed seed, forbidden token, or both

rustic garnet
true hazel
#

leg status?

rustic garnet
#

I have to say, SDXL often does anatomy surprisingly well. I very rarely see the wrong number of fingers

uneven dove
#

diffusers > *

rustic garnet
#

it still sometimes messes up composition. But it feels like they trained it well on legs and fingers

uneven dove
#

brb reducing RAM use in my bot by 3GB 😄

normal frost
#

Which do you guys think is better? I like V1 more.

civic sigil
lilac wren
true hazel
#

amazing it got the shadows right

clever verge
#

Fingers seem to be very hard to get even close to looking decent. Will 1.0 be better?

#

Sometimes there's the correct number of fingers (five) but there's a normal finger instead of a thumb. 😄

rustic garnet
quasi remnant
#

a little but not super massive, 0.9 is pretty indicative of what you're getting

#

there is a decent quality bump in 1.0 though

#

there are specific models for inpainting hands afaik

#

also negative textual inversion embeddings you can use to reduce extra fingies & stuff, nothing's really perfect though

primal hatch
clever verge
#

Eyes from a steep angle (profile shot) are hard too.

#

But it's at about the same level as some good 1.5 models, I feel.

sudden cliff
#

In Comfy, I guess if you batch it doesn't save the seed for each image in the batch generated?

clever verge
#

I understand that it's pretty hard for a model to predict how hands can be shaped, they are very handy tools after all.

clever verge
lilac wren
glad fulcrum
#

I was told ComfyUI was faster, but it's very slow.

#

When I press queue... it takes a lot of time before it starts to generate

#

I have 16GB RAM and 12GB VRAM

sudden cliff
clever verge
#

If you clear the UI and drag the image back does it show the seed?

#

Make sure you saved the UI first.

shy kelp
sudden cliff
#

I mean looking at the raw data in the output it's identical so there's no way

#

wonder how I could fix that

clever verge
vast narwhal
#

What you think about my upscale result? Too sharp?

glad fulcrum
#

Like you have to load the model for every image...

#

And it takes like forever

#

in fact, you have to load 2 models for every image.

clever verge
#

Yep, load it twice for every image.

#

Other model, possibly higher resolution.

civic sigil
clever verge
#

It's much faster if you run DDIM, as Sytan's flow does.

lilac wren
#

Yeah @vast narwhal, looks terrible, could you share your workflow?

vast narwhal
eternal fog
#

Not all SDXL 😢

#

I wonder if they'll give us a controlnet tile for SDXL

#

It might fix the detail loss

vast narwhal
shy kelp
steady chasm
shy kelp
#

im assuming you used a 1.5 model for upscaling

#

or 2.1 or whatever

eternal fog
#

He did, he said he used Juggernaut

shy kelp
#

ahh okay. i need to try that out.

boreal bough
autumn forum
#

Juggernaut xl will be in the works soon and oml will that be good

boreal bough
#

I've basically finished all my LoRA tests. If anyone has a dataset they want me to test, feel free to ping me

uneven dove
#

a stunning portrait of StabilityAI deepfrying the VAE
"yep, looks done?"
"better give it another hour"

boreal bough
#

@uneven dove
my favorite response to the comfy inside A1111 extension XD

elder rose
eternal fog
#

If you read the blog it tells you it's a T2I-Adapter

steady chasm
visual glade
uneven dove
#

that's not as much vmem savings

visual glade
#

both are 16 bit though so it should be the same?

uneven dove
#

it's not, the dynamic range is higher in bf16

#

also slower

#

bf16 is great but only because it is convenient in terms of development costs, same as tf32 for fp32-sensitive applications
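
To put numbers on the fp16 vs bf16 exchange: both formats are 16 bits wide, so weights take the same storage, but they split those bits differently. A small sketch, assuming the usual layouts (fp16: 5 exponent bits and 10 mantissa bits; bf16: 8 exponent bits and 7 mantissa bits); the helper name `float_format_stats` is made up for this example.

```python
def float_format_stats(exponent_bits, mantissa_bits):
    """Largest finite value and machine epsilon for an IEEE-style binary float."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent code is reserved for inf/NaN, so the top usable code is 2**e - 2.
    max_exponent = (2 ** exponent_bits - 2) - bias
    largest = (2 - 2.0 ** -mantissa_bits) * 2.0 ** max_exponent
    epsilon = 2.0 ** -mantissa_bits  # gap between 1.0 and the next representable value
    return largest, epsilon


fp16_max, fp16_eps = float_format_stats(exponent_bits=5, mantissa_bits=10)
bf16_max, bf16_eps = float_format_stats(exponent_bits=8, mantissa_bits=7)

print(f"fp16: max={fp16_max:.6g}, eps={fp16_eps:.3g}")
print(f"bf16: max={bf16_max:.6g}, eps={bf16_eps:.3g}")
```

bf16 trades mantissa precision for a much wider exponent range, which is the "dynamic range" point above. Whether it runs as fast as fp16 depends on the hardware, which this sketch says nothing about.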

elder rose
#

Thx

vernal cloak
#

How long do you lot think we will have to wait after release until we get an anime fine-tuned model of SDXL?
Really looking forward to exploring XL's better understanding of context in prompts tho.

boreal bough
#

but also waifuxl for easy use

vernal cloak
boreal bough
#

Interrogator -> ViT-L model

vernal cloak
#

Oh right! Ok. Nice! Thanks. Haha

glad fulcrum
boreal bough
#

removal should only happen if you lack enough vram

glad fulcrum
#

oh so 12 must not be enough I guess

boreal bough
#

it is, based on people here

#

however stuff in background may be taking some of it

#

photoshop or similar apps

#

or other uis

thin nova
#

i just use --highvram

boreal bough
glad fulcrum
glad fulcrum
thin nova
#

keeps the models loaded on the gpu so it doesn't load them every time

visual glade
glad fulcrum
visual glade
#

as long as the hardware supports it speed should be the same as fp16

glad fulcrum
thin nova
#

yeah

glad fulcrum
#

ok will try

eternal fog
#

lmao, I tried out a tiled sampler node

#

It did not go well

uneven dove
#

not on a 4090, an A100, or an A6000

#

maybe on a TPU it works better... hm

eternal fog
#

ok this time it worked, as in it made something, but it does not keep it coherent at all

stray mantle
eternal fog
#

I think it's loading in the Text Encoders that takes time

#

As it takes that extra time every time you change the prompt

static prawn
#

feel like prompting on sdxl is pretty hard, i only get medium good results always, sometimes blurry, grainy, missing details

sage basin
#

I've noticed it takes an extra 30~40 seconds to start generating when using a lora as well, even on subsequent generations

visual glade
#

yeah loras are not currently handled in a very memory efficient way so if you only have 16GB ram it's going to be slow but I'm fixing that

static prawn
#

dont understand why my results always get so blurry

green python
#

Congratulations to the stability.ai team, you have done a very good job with this model

#

it's better than google's models

tight fjord
#

hey guys, did we ever get official information regarding 1 vs 2 positive prompts and clip_g clip_l?

#

cause i tested both ways and i'm still not sure what is best

#

and similarly, ascore seems to have 0 effect at all if I change the int value of both positive and negative

vast narwhal
eternal fog
#

Which of these looks better?

upbeat summit
tight fjord
#

something in between these 2 @eternal fog , either blurry or overly sharp to me

green python
sharp robin
rustic garnet
green python
eternal fog
tight fjord
#

ok thanks guys, so its still mostly speculation with no real consensus

green python
rustic garnet
#

yes, and as we had a very long discussion a few hours ago: from a theoretical standpoint it's awkward using different prompts

tight fjord
#

yeah, i feel UIs won't adapt to have 2 prompts just for sdxl

elfin cobalt
rustic garnet
uneven dove
eternal fog
#

ok what about these two?

sharp robin
#

textures look more real, it doesn't feel "fake" or plasticky, but if you look around the breast area it starts to look artifacted and burned

gilded plinth
#

what is the impact of ascore?

azure oxide
#

doesnt comfy already have it

eternal fog
uneven dove
#

it looks like misaligned timesteps

eternal fog
uneven dove
#

hmm

#

fair enough but when there's more noise than it knows what to do with, it does that kind of facial lining

eternal fog
#

I'll experiment a bit more I think I'm getting somewhere though

rustic garnet
tight fjord
#

try 'hairless demon' 😄

uneven dove
#

a more severe example

eternal fog
#

Instead of generating then upscaling then doing the img2img pass with refiner.

I'm generating then going straight to the img2img refiner pass, THEN Upscaling and then doing another img2img refiner pass. It seems to keep detail a lot better and only takes a few seconds longer.

sharp robin
eternal fog
#

Let me try 2 more without facial lines this time.

uneven dove
#

less misaligned makes it into some kind of excusable crayon lines. after all, he is a jester. but it looks odd

boreal bough
uneven dove
#

here's the effect you get when you randomly add noise during denoising

eternal fog
molten gull
uneven dove
#

in fact i think the random noise added during inference is possibly the best example of the face cracking in an 'artistically acceptable way'

molten gull
#

realistic ? i'm okay with it ... hands? no freaking way 🙂

uneven dove
tight fjord
#

@eternal fog just merge both 😄

gilded plinth
#

what is the standard ascore for positive and negative?

uneven dove
#

5/1

rustic garnet
#

I think 6 and 2

uneven dove
#

it's definitely 5 and 1

sharp robin
uneven dove
#

they ain't great values

eternal fog
#

These two are a bit closer, although it's buggered up the eyes on one of them

azure oxide
glad fulcrum
#

do SDXL prompts work differently?

#

and if so, in what way, do you have examples

boreal bough
uneven dove
eternal fog
molten gull
#

those two look like skyrim 6 🙂

glad fulcrum
rustic garnet
#

but you can also just copy an old prompt and it will usually work

shy kelp
rustic garnet
#

nope

uneven dove
#

not the overly stupid prompts kai lmao

#

'realistic' makes it look plastic

uneven dove
#

masterpiece, trending on artstation, they make real people look like vector graphics

molten gull
uneven dove
#

you need to remove a lot of that crap

glad fulcrum
#

for realistic, what do you use?

uneven dove
#

nothing

glad fulcrum
#

please share some prompts

uneven dove
#

just say what you want

molten gull
#

pure luck, i would say 🙂

uneven dove
#

a stunning portrait of a 1985 adult in leggings

green python
uneven dove
#

it'll do fine

#

you don't NEED too much more

tight fjord
#

be careful with negative prompts that look innocuous, i just figured out that 'blurry' was making my photos paintings, then i added 'painting' and now everyone looks wrinkled

molten gull
#

then add "wrinkled", too 🙂

#

and see if that makes it blurry again 🙂

tight fjord
#

did that, all im saying is without any of those, it looks better 🙂

#

negative prompting can have a lot of strong effects that are hard to predict

rustic garnet
uneven dove
#

ಠ_ಠ

boreal bough
#

"a photo of jim the plumber working hard on pipes as he ponders the world and its meaning"
using bot v1.0

molten gull
#

EVERYTHING is hard to predict 🙂

#

yeah, that's definitively jim 🙂

shy kelp
tight fjord
#

wow, 100% believably jim

rustic garnet
#

yeah, you usually don't need negative prompts. Avoid them

#

use them only if you really need them

uneven dove
#

usually for excluding overfitted subjects

rustic garnet
#

not like 2.1 where we by default used complex negative prompts

elfin cobalt
#

I think I might need negative prompts.

uneven dove
#

2.1 just needs like one neg embed lol

soft bone
#

i have massive success with tiny negatives on 2.1

boreal bough
#

"a photo of jim the plumber working hard on pipes as he ponders the world and its meaning"
using mimizukari setup. no style/no negative

uneven dove
rustic garnet
#

I tried some Dragon Ball Z prompts and got crappy images back. In these cases you have to improve your positive prompt, not the negative one

#

e.g. add artist names that describe the image style you want

sharp robin
sage basin
#

https://www.midlibrary.io/ has a tonne of useful artist names that work with SDXL as well. Using photographers will usually give you decent quality photos

molten gull
elfin cobalt
#

I've had a great deal of luck using GPT-4 for first pass prompt engineering; 80% of the time it produces great pictures, although not always what I want.

#

Mind you, 10% of the time it outputs what I posted above.

molten gull
#

question: in comfyUI there's a KSampler (Advanced) node, that has a start_at_step and end_at_step ... what are those for ?

rustic garnet
#

if you want to change the model during sampling, for example

#

or other situations where you want to stop the denoising process, do something else with the latents, and continue

molten gull
#

that's freaking crazy 🙂

rustic garnet
#

e.g. stop in between and continue with the refiner model

#

or change the prompt or model in between
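
As a rough sketch of the base-to-refiner handoff described above, here are two `KSamplerAdvanced` nodes in ComfyUI's API (exported JSON) format, written as Python dicts. The input names match the export shown earlier in this channel; the node ids, the `cfg` default, and the helper `two_stage_samplers` are made up for illustration and would come from your own graph.

```python
# Hypothetical node ids; in a real export they come from your own workflow.
BASE_MODEL, REFINER_MODEL = "10", "11"
BASE_POS, BASE_NEG = "75", "82"
REFINER_POS, REFINER_NEG = "120", "81"
EMPTY_LATENT = "5"
BASE_SAMPLER, REFINER_SAMPLER = "22", "23"


def two_stage_samplers(total_steps, switch_at, seed, cfg=8.0):
    """Build two KSamplerAdvanced nodes that split one schedule at switch_at."""
    if not 0 < switch_at < total_steps:
        raise ValueError("switch_at must fall strictly inside the schedule")

    def sampler(model, positive, negative, latent, first):
        return {
            "class_type": "KSamplerAdvanced",
            "inputs": {
                "noise_seed": seed,
                "steps": total_steps,  # both stages share one schedule
                "cfg": cfg,
                "sampler_name": "dpmpp_2m_sde_gpu",
                "scheduler": "karras",
                # Only the first stage adds fresh noise; it hands over a
                # partially denoised latent with the leftover noise intact.
                "add_noise": "enable" if first else "disable",
                "start_at_step": 0 if first else switch_at,
                "end_at_step": switch_at if first else total_steps,
                "return_with_leftover_noise": "enable" if first else "disable",
                # [node_id, output_index]: 0 is the first output of that node.
                "model": [model, 0],
                "positive": [positive, 0],
                "negative": [negative, 0],
                "latent_image": [latent, 0],
            },
        }

    return {
        BASE_SAMPLER: sampler(BASE_MODEL, BASE_POS, BASE_NEG, EMPTY_LATENT, True),
        REFINER_SAMPLER: sampler(REFINER_MODEL, REFINER_POS, REFINER_NEG, BASE_SAMPLER, False),
    }


nodes = two_stage_samplers(total_steps=30, switch_at=24, seed=1234)
```

The second sampler picks up exactly where the first one stopped, so the two together cover steps 0 to 30 once. Swapping the prompt instead of the model mid-run would use the same split with different `positive`/`negative` inputs on the second node.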

eternal fog
molten gull
elfin cobalt
#

...now if you give the right prompt to GPT-4, it produces this.

vast narwhal
molten gull
autumn forum
visual glade
rustic garnet
clever verge
visual glade
#

--help will show all of them

autumn forum
shy kelp
clever verge
eternal fog
eternal fog
#

It does in fact fix black images when you run it at fp16, though

mystic cosmos
uneven dove
#

prompt: i installed Clippy today to show my grandkids how we used to talk to a paperclip and they said "grandpa's sunsetting again" and i was rushed to the doctors who adjusted my medication and insist i don't have grandchildren and that i need to stop going off my meds

timid sonnet
lament rune
glad fulcrum
#

I tried the --highvram but now...

#

So without it every image needs to load both models constantly

#

and with it it runs out of memory

#

on 12gbVram ?

thin nova
#

each model is pretty big

amber fulcrum
urban fjord
#

With only 12 GB VRAM you shouldn't run refiner and base with highvram

visual glade
#

yeah don't do highvram with SDXL if you only have 12GB, both unets on the gpu take that amount of memory

elfin cobalt
# amber fulcrum whats the prompt if I may ask? Looks amazing

Still experimenting. At the moment...

Given input such as "A picture of a boat", generate a creative description such as "Digital painting of a boat on the stormy ocean", deferring to user input when convenient. Also output a style, selecting relevant artists and stylistic choices that go well with the prompt. Series/character names don't work, so describe the scene or character instead. Always include artists. While the prompt should be regular English, the style should be comma-separated keywords.

Respond using JSON, in the format {"prompt": "{prompt}", "style": "{style}", "aspect_ratio": "{e.g. 4:3}"}
Which, with this request:
Machikado Mazoku.
Produced this:
Anime-style digital artwork depicting a young girl with horns and a spiky tail, surrounded by a mysterious aura in suburban scenery --style Contemporary, Manga, Modern, Magic Realism, Hayao Miyazaki, Yoshitoshi Abe --ar 16:9
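
A minimal sketch of the glue step, assuming the reply really is JSON in the requested shape: parse it and flatten it into the `prompt --style ... --ar ...` line shown above. The function name `to_generation_line` and the example reply (including its style keywords) are invented for illustration.

```python
import json


def to_generation_line(response_text):
    """Turn the model's JSON reply into 'prompt --style ... --ar ...'."""
    reply = json.loads(response_text)
    parts = [reply["prompt"].strip()]
    # Style and aspect ratio are treated as optional so a partial reply still works.
    if reply.get("style"):
        parts.append("--style " + reply["style"].strip())
    if reply.get("aspect_ratio"):
        parts.append("--ar " + reply["aspect_ratio"].strip())
    return " ".join(parts)


# Made-up reply in the requested format.
example = (
    '{"prompt": "Digital painting of a boat on the stormy ocean", '
    '"style": "Romanticism, J. M. W. Turner", "aspect_ratio": "4:3"}'
)
line = to_generation_line(example)
```

`json.loads` raises `json.JSONDecodeError` if the model wraps the JSON in extra prose, so a real bot would probably want to catch that and retry.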

urban fjord
#

Highvram is useful if you're only doing the base, but not with both.

glad fulcrum
urban fjord
#

It shouldn't be that slow really to load both.

glad fulcrum
#

last year they were talking about 24 fps and now 6 months later we need to wait 5 minutes for an image.

#

yeah it takes like 500 seconds per image

#

diffusion is fast but the model takes too long to load

urban fjord
#

And if you create a batch of latent images then you've got less model switches to worry about

soft zealot
#

I use a 1080 Ti (11GB). The only time I notice the SDXL models being slow(ish) to load is when I first start the server and load them for the first time; after that it's almost instant, even if I switch models and use a 1.5/2.1 workflow

shy kelp
#

man its so funny seeing people claiming their upscaling setup works great then showing absolutely horrendous results

soft zealot
shy kelp
#

nothing more to it

elfin cobalt
#

A well-matched 1.5 model works fine for upscaling. But SDXL is so flexible, there isn't any single model that'll work.

shy kelp
#

also idk why you wouldn't just like to look at an image with more detail than 1024... your take doesn't really make sense to me at all actually

uneven dove
#

a lot of the quality improvements are just people showing off that they CAN do it

#

doesn't need prompt comprehension worth a damn

quasi remnant
shy kelp
amber fulcrum
shy kelp
glad fulcrum
elfin cobalt
#

"What", but flatter.

shy kelp
#

it doesn't work

#

da ting dont do wat da ting should do

quasi remnant
glad fulcrum
#

I removed the highVram already

boreal bough
#

Ufff. No bf16 support, right?

shy kelp
#

if your 7x upscaling workflow doesnt take at least 7 minutes to complete a single image, you're not taking your image generation seriously and should probably just give up

glad fulcrum
#

I think it is working faster now

#

It seems it loaded both models now

#

maybe the fp16 vae helped

amber fulcrum
# glad fulcrum

try 536.40 - reverting helped me, as with the updated driver one image was taking 2-3 minutes instead of 15 secs

amber fulcrum
#

Cool, Maybe they fixed the bug with latest update ^^

glad fulcrum
#

I mean it's not 17 seconds to me

#

but it doesn't load the model every time now

urban fjord
#

Is that with base + refiner?

soft zealot
autumn forum
soft zealot
autumn forum
eternal fog
boreal bough
#

to not break immersion

soft zealot
urban fjord
#

You can always just upscale in batch over night or something.

sharp robin
autumn forum
#

I wish I had a better use case for ai art, I know some people would pay for a high quality image of something specific but I wouldn’t even know how to sell that service😅

boreal bough
shy kelp
static prawn
#

kinda wish i would get some consistent results, i always fail with sdxl

shy kelp
#

is sdxl free?

boreal bough
#

or wait 3 more days for local full release

soft zealot
#

Workflow is left in there

shy kelp
#

thanks @boreal bough

glad fulcrum
#

It wrote my friend's name pretty well

#

and the image is nice as well with very simple prompt

static prawn
#

u know where i can find prompts from sdxl?

glad fulcrum
#

hmmm I'm starting to believe this is a good model

static prawn
#

especially with neg prompts?

elfin cobalt
#

Negatives aren't needed, really.

#

For positives... it understands English much better than 1.5. Stick to simple language without prepositions, and it'll work fine.

#

Well, pronouns. Prepositions you can try to use.

static prawn
#

mh ok , dunno i always tend to get blurry or extremly grainy results

eternal fog
#

Negatives do some things. I've been putting in a few, like deformed and blurry, and this is the sort of difference you can get between using negatives and none

shy kelp
#

Friday night hype leggo

static prawn
#

i dunno my results are just always blurry or extremly grainy

#

i cant get result like yours

eternal fog
#

What sort of steps, samplers and cfg are you using

static prawn
eternal fog
# static prawn

Don't change samplers and noise schedules between the base and refiner

#

And DDIM should be using DDIM Uniform

static prawn
#

oh ok someone suggested its way better using diff samplers on base and refiner

sharp robin
boreal bough
# static prawn u know where i can find prompts from sdxl?

A. get a good setup
B. either prompt SDXL properly (trial and error) / or write a sentence in natural language that is around 10~15 words long, no commas / Use Interrogator with ViT-H to get prompts from existing images
C. no negatives, unless you know what you want them to do
D. generate an image to make sure you didn't include a word that messes everything up (rare, but can happen)

eternal fog
autumn forum
amber fulcrum
#

I find prompting in SDXL better and worse than in 1.5 at the same time XD
Better is consistency - worse is consistency.
I mean, I would like more randomness in the output - often the same prompt gives 90%+ the same results.
That's a plus, as it seems to be working as intended, but it's also a negative if you've found a good style and need to experiment with each prompt just to get an image that differs from the previous one.
They are very similar.
Another issue is color - if I add "white background" it usually adds a lot of white to the whole scene/artwork.
What issues do you have and how do you overcome them?

sharp robin
urban fjord
#

If you want consistency then you can make a LoRA out of it.

boreal bough
static prawn
# eternal fog

im probably not happy with the overall result of sdxl, i think urs is super blurry too, 1024x1024 on 1.5 look so much cleaner, and crisp imo

urban fjord
#

People will fine-tune SDXL so results will improve.

eternal fog
urban fjord
#

Remember how poor 1.5 was and how well fine-tunes work now..

static prawn
#

i dunno everything i tried got kinda messed up with a lot of grain or blurry

boreal bough
static prawn
#

😂

paper phoenix
#

No negative, just his posts 😉

boreal bough
#

second image generated, seed+1

static prawn
#

trust me i look that depressed every day haha

paper phoenix
#

even if SDXL was absolutely irreproachably perfect in every way. void: "oh well, I bet I'll go blind soon and not be able to see it."

boreal bough
nimble heart
amber fulcrum
#

SDXL improved a lot with fantasy but on the other hand - in some areas is overfitted as hell. Still - lot of improvement overall

urban fjord
paper phoenix
boreal bough
#

hahahaha

paper phoenix
#

oldest trick in the book. (the book being around 18 months old)

urban fjord
#

Yeah I hid that it didn't have 5 fingers.