#🔧｜finetune | Stable Diffusion | Page 20

ruby pond Jan 11, 2024, 5:13 AM

#

not dreambooth, but I've been running fine tune for sdxl with batch size 8 fine on my 4090

restive bridge Jan 11, 2024, 5:17 AM

#

ruby pond not dreambooth, but I've been running fine tune for sdxl with batch size 8 fine ...

Since I'm using ground truth for reg, I'm pretty sure I'm literally just finetuning at this point. But dang, must be nice. I do have xformers and other optimizations off and full bf16, though, for top quality, so it is very, very heavy compared to what it could be. But I just can't tell if batch 2 is worse for faces or not.

ruby pond Jan 11, 2024, 5:23 AM

#

restive bridge Since I'm using ground truth for reg I'm pretty sure I'm literally just finetuni...

seems to be just adafactor that can manage the higher batch sizes. I tried prodigy and it couldn't even do batch size 1 for fine tune
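A rough sketch of why a factored optimizer fits where an Adam-style one may not: it counts optimizer *state* for a single weight matrix only. It assumes Adam-style optimizers (Prodigy included) keep at least two full-size moment buffers per parameter, and that Adafactor runs without momentum, storing only per-row and per-column second-moment averages. The 1280x1280 shape is hypothetical; weights, gradients and activations are ignored.

```python
# Rough, assumption-laden estimate of optimizer state for one 2-D weight matrix.
# Ignores the weights themselves, gradients, and activations.

def adam_like_state_elems(rows: int, cols: int) -> int:
    # Adam-style optimizers keep (at least) two full-size buffers:
    # a first-moment and a second-moment estimate per parameter.
    return 2 * rows * cols


def adafactor_state_elems(rows: int, cols: int) -> int:
    # Adafactor with momentum disabled keeps a factored second moment:
    # one running average per row plus one per column.
    return rows + cols


def to_mib(elems: int, bytes_per_elem: int = 4) -> float:
    # Convert an element count to MiB, assuming fp32 state by default.
    return elems * bytes_per_elem / 2**20


if __name__ == "__main__":
    rows, cols = 1280, 1280  # hypothetical layer shape, for illustration only
    a = adam_like_state_elems(rows, cols)
    f = adafactor_state_elems(rows, cols)
    print(f"Adam-like state: {a} elems, {to_mib(a):.2f} MiB")
    print(f"Adafactor state: {f} elems, {to_mib(f):.4f} MiB")
```

Summed over every layer of a full fine-tune, that per-matrix gap is the kind of difference that decides whether a given batch size fits in 24 GB.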

restive bridge Jan 11, 2024, 5:30 AM

#

hmm, I'm also on adafactor. I think if you turned off all optimizations you'd be limited to batch size 2 as well. It takes 23.7 GB. I don't think there's any VRAM difference between DB and finetune with the same settings.

elder hull Jan 11, 2024, 6:58 PM

#

Same training procedure and database + VAE, different models.

Original NAI (VAE: vae-ft-mse-840000-ema-pruned) toshaka_v1_nai (subject) toshaka_v1_nai (subject_filewords)
vs Nothing v2.3 (VAE: vae-ft-mse-840000-ema-pruned) toshaka_v1_nothingv23 (subject_filewords)

I would have assumed the curse of recursion hit, but...
If I use the TI trained relative to NAI to produce images with Nothing v2.3, it works just fine, so the concept should be representable within Nothing v2.3 (given that inference is just like training without the gradient update).

gentle flame Jan 12, 2024, 1:44 AM

#

Perturbed noise messes with ZSNR stuff. I know someone finetuning with it, making a v-pred ZSNR model. The ZSNR node from Comfy stopped being scuffed when perturbed noise was removed.

#

Just putting that out there for anyone who plans on messing with perturbed noise.
It also makes outputs noisy, btw. I know because the problems with noise disappeared after removing it completely.
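For context on "ZSNR": zero-terminal-SNR training rescales the noise schedule so the final timestep is pure noise (signal-to-noise ratio exactly 0), which is why it is usually paired with v-prediction. Below is a minimal pure-Python sketch of that rescaling applied to an SD-style "scaled linear" schedule; the 0.00085/0.012 endpoints and 1,000 steps are assumed defaults, not something stated in the thread.

```python
import math


def scaled_linear_betas(n: int = 1000, start: float = 0.00085, end: float = 0.012) -> list[float]:
    # SD-style schedule: linear in sqrt(beta), then squared.
    s, e = math.sqrt(start), math.sqrt(end)
    return [(s + (e - s) * i / (n - 1)) ** 2 for i in range(n)]


def rescale_zero_terminal_snr(betas: list[float]) -> list[float]:
    # Cumulative product of alphas (alpha = 1 - beta).
    alphas_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alphas_bar.append(prod)
    sqrt_ab = [math.sqrt(a) for a in alphas_bar]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last step reaches 0, then scale so the first step is unchanged.
    sqrt_ab = [(x - last) * first / (first - last) for x in sqrt_ab]
    alphas_bar = [x * x for x in sqrt_ab]
    # Convert cumulative products back to per-step alphas, then to betas.
    alphas = [alphas_bar[0]] + [alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    base = scaled_linear_betas()
    for name, betas in (("original", base), ("zero-terminal-SNR", rescale_zero_terminal_snr(base))):
        ab = 1.0
        for b in betas:
            ab *= 1.0 - b
        # SNR at the last timestep = alpha_bar / (1 - alpha_bar)
        print(f"{name}: terminal SNR = {ab / (1.0 - ab):.6f}")
```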

#


grizzled jungle Jan 12, 2024, 3:55 PM

#

I'm getting results like this when I trained a lora. I checked: the model works just fine without the lora in place. These are my sd-scripts prodigy settings. Clip skip is 2 since I'm training on a model that's good with clip skip 2. Is anything wrong with these settings?

dusky urchin Jan 12, 2024, 4:41 PM

#

restive bridge hmm I'm also on adafactor. I think if you turned off all optimizations you'd be ...

on 24GB and adafactor, i observe the same limitation as you. I can go from 2 to 3 by enabling gradient checkpointing. you are using loras for your fine tuning, correct?

dusky urchin Jan 12, 2024, 4:46 PM

#

restive bridge Since I'm using ground truth for reg I'm pretty sure I'm literally just finetuni...

but I just cant tell if batch 2 is worse for faces or not
i haven't investigated faces very much, but imo if you are impatient, these choices matter, and if you are patient, they matter a lot less. since i think you are the patient type, is there a specific issue you are trying to address? it's not very insightful to say that "your dataset is probably too small," but maybe you can gather more images and captions directly.

restive bridge Jan 12, 2024, 8:41 PM

#

dusky urchin > but I just cant tell if batch 2 is worse for faces or not i haven't investigat...

my thinking is that a higher batch might improve quality if it's kind of averaging out each pair of images, which sounds like it could help with likeness consistency. if the images have enough variety, they might seem like different people to the AI, but if it has to average 2 together each time, it should learn the face more consistently in theory.. I think?

stiff dust Jan 13, 2024, 12:29 AM

#

gradients are averaged all the time. I have no clue why fine-tuning sometimes works better with lower batch size, but this idea of the dnn getting confused if it sees two images at the same time is just wrong
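A toy check of "gradients are averaged all the time": with a mean-reduced loss, the gradient of a batch equals the average of the per-example gradients, so a larger batch gives a less noisy estimate of the same update direction. The one-parameter model (prediction `w * x`, squared error) is purely illustrative.

```python
# Toy model: prediction = w * x, per-example loss = (w*x - y)^2.

def grad_single(w: float, x: float, y: float) -> float:
    # Analytic per-example gradient: d/dw (w*x - y)^2 = 2 * (w*x - y) * x
    return 2.0 * (w * x - y) * x


def batch_loss(w: float, xs: list[float], ys: list[float]) -> float:
    # Mean-reduced loss over the batch, as most trainers use.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def numeric_grad(f, w: float, eps: float = 1e-6) -> float:
    # Central finite difference, independent of the analytic formula above.
    return (f(w + eps) - f(w - eps)) / (2 * eps)


if __name__ == "__main__":
    xs, ys, w = [1.0, 2.0], [3.0, 1.0], 0.5
    per_example = [grad_single(w, x, y) for x, y in zip(xs, ys)]
    batch = numeric_grad(lambda v: batch_loss(v, xs, ys), w)
    print("average of per-example grads:", sum(per_example) / len(per_example))
    print("gradient of the batch loss:  ", batch)
```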

sharp prawn Jan 13, 2024, 3:20 PM

#

I think I was supposed to post this here: Say I generate a person with a black t-shirt in txt2image, I bring it to img2img and I want to apply my own person png design on that shirt, how would I go about doing that? Do I use control net?

dusky urchin Jan 13, 2024, 9:45 PM

#

stiff dust gradients are averaged all the time. I have no clue why fine-tuning sometimes wo...

lower batch sizes usually converge to a lower loss in backpropagation https://stats.stackexchange.com/questions/316464/how-does-batch-size-affect-convergence-of-sgd-and-why


#

i think you know that but it has been my experience, for at least the decade i've been using accelerated backpropagation, that lower batch sizes take longer but converge to a lower loss

#

fine tuning conditional unets in particular: no clue.

dusky urchin Jan 13, 2024, 9:47 PM

#

restive bridge my thinking that a higher batch might improve quality is if it's kind of averagi...

higher batch will make it train faster. it comes back to whether you are patient or impatient

#

i haven't had the bandwidth yet to experiment with (1) pretraining a LoRA (2) creating contrastive validation sets, but my expectation is that both of those things will be a big improvement on dreambooth style regularization

stiff dust Jan 13, 2024, 10:00 PM

#

no, I never heard of this effect and never experienced it myself. Also, none of the answers in the blog post is convincing. Yes, larger batches have more stable gradients which could increase overfitting. But this happens for very large batches, not for batch size of 4 or 8.

pure crypt Jan 13, 2024, 10:26 PM

#

Hello guys! I recently started my journey with SD and Loras. I have already made some models using real people (like the ones you can find on youtube tutorials), but this time I have the chance to create a unique one thanks to a model that is willing to help me. What would the "perfect" img folder be like? I mean, for my past models I used varied pictures of women with different clothes and backgrounds...this time, I can actually use a studio to get the pictures. Any advice?
Sorry if this isn't the right section for this question...

jade hornet Jan 14, 2024, 5:16 PM

#

If you take 30 photos of a model in a studio, you're likely to get that kind of output in your inference. The advice is the same as always, variation of portraits, medium shot, full shot... Try to vary the background or the training will pick that up

pure crypt Jan 15, 2024, 4:45 AM

#

jade hornet If you take 30 photos of a model in a studio, you're likely to get that kind of ...

hmm, I see, I will keep that in mind, thanks a lot!
And now that you mention the background, is there a way to make the LORA training NOT pick up the background info that much? other than variation of course.

jade hornet Jan 15, 2024, 4:47 AM

#

Caption helps, but it'll still come through if you overtrain

dusky urchin Jan 15, 2024, 8:17 AM

#

pure crypt hmm, I see, I will keep that in mind, thanks a lot! And now that you mention th...

you are at the start of a long journey. you should maybe start by trying to train a lora into a mixamo character. you are not going to get this right on your first two tries.

is there a way to make the LORA training NOT pick up the background info that much? other than variation of course.
for every concept you want to have a hope and a prayer that CLIP learns, you will need a contrast. this basically means at least 8 images per concept: the positive and negative of the concept for each of training, validation and test, and a regularization for the positive and negative.

so for example, if you want to be able to show the concept of any background versus a solid background:

white background w/ person, contentful A background w/ person
blue background w/ person, contentful B background w/ person
green background w/ person, contentful C background w/ person
regularization: red background w/ different person, contentful background w/ different person

then you actually have to use validation and test...
okay you're reading all this and thinking: wait a minute, nobody writes about this online. basically people are running datasets that are way too small, and they are usually overtraining or undertraining, ad-hoc testing some random prompts they wanted. it's up to you. if you like the results for the generality (or lack thereof) that you are trying to achieve, using community methods, you know, nothing else matters.
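The 8-images-per-concept scheme described above can be written out as a plan. This sketch only enumerates the rows from the background example (three splits, each with a positive and a negative, plus two regularization images of a different person); the function name and arguments are hypothetical.

```python
SPLITS = ("train", "validation", "test")


def contrast_plan(positives, negatives, reg_positive, reg_negative):
    """Return (split, description) rows for one concept: 3 x 2 + 2 = 8 images."""
    if len(positives) != len(SPLITS) or len(negatives) != len(SPLITS):
        raise ValueError("need one positive and one negative example per split")
    rows = []
    for split, pos, neg in zip(SPLITS, positives, negatives):
        rows.append((split, f"{pos} w/ person"))  # concept present
        rows.append((split, f"{neg} w/ person"))  # contrasting case
    # Regularization: same contrast, different person.
    rows.append(("regularization", f"{reg_positive} w/ different person"))
    rows.append(("regularization", f"{reg_negative} w/ different person"))
    return rows


if __name__ == "__main__":
    plan = contrast_plan(
        ["white background", "blue background", "green background"],
        ["contentful A background", "contentful B background", "contentful C background"],
        "red background",
        "contentful background",
    )
    for split, desc in plan:
        print(f"{split:>14}: {desc}")
```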

pure crypt Jan 15, 2024, 12:43 PM

#

dusky urchin you are at the start of a long journey. you should maybe start by trying to trai...

Very interesting, thanks a lot! I did a training test with 4 different backgrounds and I can see SD always including some of its elements. For example, in one of the backgrounds we had a wooden wall, so it's trying to put wood in every "indoors" prompt. I will have to make a dataset in many places then. Regarding the regularization, that is pretty useful I will see how I can make use of that.
Lastly, an extra question: the same will apply to her clothing, right? For example, if most of the real shots are taken in one set of clothing, it will try to make that part of the results too, right? Or is clothing easier to manipulate?

dusky urchin Jan 15, 2024, 3:08 PM

#

pure crypt Very interesting, thanks a lot! I did a training test with 4 different backgroun...

all of this depends on how you caption. 30 images are not enough for the correct advice, "caption everything you see, including the mundane." the community will usually advise the opposite. for example, without a contrast, yes, the word "indoors" will become associated with the wood in the background.

dusky urchin Jan 15, 2024, 3:09 PM

#

pure crypt Very interesting, thanks a lot! I did a training test with 4 different backgroun...

it will try to make that part of the results too right? Or is clothing easier to manipulate?
you will need images of every different kind of clothing to robustly allow stable diffusion to transfer knowledge about other clothes onto how it would work in the model, and then you'd have to caption it properly

pure crypt Jan 15, 2024, 3:27 PM

#

dusky urchin > it will try to make that part of the results too right? Or is clothing easier ...

aah, could you give me an example please? I more or less understand what you mean here but I would like a more detailed explanation. Thanks a lot btw for answering!

dusky urchin Jan 15, 2024, 4:30 PM

#

pure crypt aah, could you give me an example please? I more or less understand what you mea...

the more examples of different clothing the model will see, the better your fine tuning will generalize to clothes that didn't appear in your dataset, especially in the details.

let's say your model is wearing a specific brand of Leota dress in all the shots.

rare_token is a woman wearing a leota dress

will make all dresses and the brand leota associated with generalizable features about your actress. as you start to overtrain, these features will appear more and more in test queries. you can use regularization to prevent the dress from being associated with rare_token: provide regularization images captioned leota dress of the exact dress being worn by different people.

rare_token is a woman

will have no impact on dresses. you can use regularization to prevent all women from looking like your actress.

at this stage, even without regularization, your fine tuning's ability to generalize will correspond to how much it is under/overtrained. it will still be perfectly capable of generalizing, and it will: your actress will be able to wear other clothes, even if you only had data of one outfit. why? because somewhere in the history of denoising steps that would create your image's dress, there is a common ancestor of noisy image between your image and another image with a different dress.

#

the captions are used to make the text encoder be able to express something specific in your image, and to provide more coherent contrasts between different ways your image interacts with concepts. so having your actress wear multiple outfits will make clothes fit much better on her generally.
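Many trainers read each caption from a text file stored next to its image with the same base name (kohya-style datasets commonly work this way, but check your trainer's caption-extension setting; `.txt` here is an assumption). This sketch writes the captions from the leota dress example, with regularization images of the same dress on different people captioned without `rare_token`. All file names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_captions(folder: Path, captions: dict[str, str], ext: str = ".txt") -> list[Path]:
    """Write one sidecar caption file per image, named after the image's stem."""
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, caption in captions.items():
        path = folder / (Path(image_name).stem + ext)
        path.write_text(caption + "\n", encoding="utf-8")
        written.append(path)
    return written


# Subject images (hypothetical names). The alternative caption from the thread
# would be "rare_token is a woman", which leaves dresses untouched.
SUBJECT = {
    "shot_01.png": "rare_token is a woman wearing a leota dress",
    "shot_02.png": "rare_token is a woman wearing a leota dress",
}
# Regularization: the exact dress worn by different people, no rare_token.
REGULARIZATION = {
    "reg_dress_01.png": "leota dress",
    "reg_dress_02.png": "leota dress",
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub, caps in (("subject", SUBJECT), ("reg", REGULARIZATION)):
            for p in write_captions(root / sub, caps):
                print(p.relative_to(root), "->", p.read_text(encoding="utf-8").strip())
```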

pure crypt Jan 15, 2024, 6:50 PM

#

dusky urchin the more examples of different clothing the model will see, the better your fine...

Thanks a lot for the explanation! It's a bit clearer, but I still need to study the documentation a lot more. This brings me to one more question: hypothetically, what would happen if my actress were (almost) naked? Could a lora work under these circumstances?

Oh, and one more that came to my mind while typing the first: what would be the best tool for captioning, other than manually doing it?

dusky urchin Jan 15, 2024, 6:50 PM

#

pure crypt Thanks a lot for the explanation! it's a bit clearer but I still need to study t...

not sure, i think you can start experimenting with pre-existing stuff

pure crypt Jan 15, 2024, 7:00 PM

#

ok, thanks a lot for real!

frigid pier Jan 16, 2024, 7:20 PM

#

Haaaalp, how do I fix one part of an image? In my case a belly: it has weird anatomy in some parts and a belly button is missing.

SOLVED: a combination of low step count and CFG scale with a medium denoising value was able to do it. I’m guessing that when inpainting small areas, smaller values work best. At least with the JuggernautXL checkpoint.

#

Tried with inpaint but the results suck. Denoising from 0.3-1.0, bad results. Cfg scale 3-8 also, and combination of these two. Sampling rate, different masking etc same thing.

#

Maybe I just patch it in Photoshop and do inpainting again to fuse things together

blissful vine Jan 16, 2024, 8:15 PM

#

Can anyone recommend some good upscalers for use in a1111? I want something that smooths the images like an anime upscaler, but not as much. I want to lose the texture effects that many images have.

brisk elbow Jan 17, 2024, 5:37 PM

#

Is there a good resource for personalization fine-tuning I should start with?

icy silo Jan 17, 2024, 5:56 PM

#

can i change the vae of an already generated image? if yes, how? I don't want the image to change at all

latent charm Jan 17, 2024, 7:17 PM

#

icy silo can i change the vae of an already generated image? if yes, how? I dont want the...

no. prompt->clip->unet->vae->image.

#

You need to regenerate the image and use another vae

icy silo Jan 17, 2024, 7:31 PM

#

latent charm You need to regenerate the image and use another vae

it was made with a lot of inpainting and I got lucky too

#

so that wont work

latent charm Jan 17, 2024, 7:32 PM

#

I don't know what the purpose of using another VAE would be if you already did a lot of inpainting. But you could load the image and pass it through another VAE in ComfyUI.

icy silo Jan 17, 2024, 7:52 PM

#

i guess it's time to switch after all. i'm still on a1111; everybody's been recommending comfy

#

i hope the vram req aint higher?

stiff dust Jan 17, 2024, 8:33 PM

#

icy silo so that wont work

as XiaoZhi said: changing the vae afterwards makes no sense. What do you want to achieve?

amber bay Jan 17, 2024, 9:33 PM

#

Hi, do you know any fully transparent diffusion model on hugging face or other ? (-> a model where we exactly know which data were used for the training?).

stiff dust Jan 17, 2024, 9:59 PM

#

I think SD 1.5 was just using LAION, wasn't it?

dusky urchin Jan 17, 2024, 10:19 PM

#

amber bay Hi, do you know any fully transparent diffusion model on hugging face or other ?...

what's the objective?

icy silo Jan 18, 2024, 6:37 AM

#

stiff dust as XiaoZhi said: changing the vae afterwards makes no sense. What do you want to...

make the image pop off more, more vibrant colors

stiff dust Jan 18, 2024, 9:51 AM

#

the vae is just a compression. It won't make your image more colorful or anything. The only thing that can happen is that when you pass the image through the vae, it loses some quality and color. However, you cannot get that back afterwards. It's like compressing an image as JPEG and seeing JPEG artifacts: you cannot remove them easily afterwards. You would have to do a whole img2img pass.
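A toy illustration of the compression point. The real SD VAE maps each 8x8 pixel patch to a 4-channel latent; here "encode" is just 2x2 average pooling and "decode" is nearest-neighbour upsampling, a deliberately crude stand-in for any lossy bottleneck. Fine detail lost on the way through does not come back on a second pass.

```python
# Toy stand-in for a VAE round trip (NOT the real SD VAE).
# Assumes image height and width are divisible by f.
Image = list[list[float]]


def encode(img: Image, f: int = 2) -> Image:
    # Average each f x f block into one "latent" value.
    h, w = len(img), len(img[0])
    return [
        [sum(img[y + dy][x + dx] for dy in range(f) for dx in range(f)) / (f * f)
         for x in range(0, w, f)]
        for y in range(0, h, f)
    ]


def decode(latent: Image, f: int = 2) -> Image:
    # Repeat each value f times horizontally and each row f times vertically.
    return [[v for v in row for _ in range(f)] for row in latent for _ in range(f)]


if __name__ == "__main__":
    board = [[float((x + y) % 2) for x in range(4)] for y in range(4)]  # fine checkerboard detail
    once = decode(encode(board))
    twice = decode(encode(once))
    print("after one pass:", once[0])
    print("second pass recovers nothing new:", once == twice)
```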

icy silo Jan 18, 2024, 12:21 PM

#

stiff dust the vae is just a compression. It wont make your image more colorful or anything...

i see

#

but img2img is changing the intricate details even on low denoising strength

#

you got any tutorial on that?

stiff dust Jan 18, 2024, 2:23 PM

#

to be honest, if the image looks washed out it might be the vae but in most cases it's somewhat different

#

if you do a lot of inpaintings with blurred mask this leads to a washed out effect

grizzled jungle Jan 18, 2024, 3:30 PM

#

Can anyone help me with a lora training thing? I'm kinda at a loss and I need help.

#

Basically, I'm trying to train a lora on a character, and it seems like the results I'm getting are... subpar compared to the capabilities of the model I trained it on... First four are with the lora, the next four are without the lora... Main model is VividOrangeMix, the metadata should be in the pics, and these are my current settings for training said LORA.

#

I hope someone can help...

cold cliff Jan 18, 2024, 4:37 PM

#

Hey. I've been trying to find out how to train the detection models for ADetailer, but I cannot find any documentation on it. Are there guides out there for building a dataset and training, and can it be done with a consumer GPU?

Use case: whole head detection for non-human characters like Mass Effect asari where face detection works but the errors I'd like to correct are often made in the details around the face.

dusky urchin Jan 18, 2024, 11:10 PM

#

grizzled jungle Basically, I'm trying to train a lora on a character, and it seems like the resu...

i don't understand - what is the source image? what is the character?

dusky urchin Jan 18, 2024, 11:12 PM

#

cold cliff Hey. I've been trying to find out how to train the detection models for ADetaile...

i don't think the detection models play a role in your problem. they return bounding boxes.

#

adetailer itself uses the bounding box to create the mask

#

the mediapipe models make a tighter mask, but i am not sure how significant it is

#

you can certainly try to refine the output of mediapipe, but it's not designed to be trained with more images

#

you don't have the gradient weights, and it's not architected around a workflow of "pretraining" versus "training"

#

you could write code to use insightface instead

cold cliff Jan 18, 2024, 11:15 PM

#

I meant training a custom model to get the bounding box around the whole head. I've seen models on CivitAI for eyes, but no doc on how they trained it.

dusky urchin Jan 18, 2024, 11:15 PM

#

can you send me a link?

cold cliff Jan 18, 2024, 11:16 PM

#

https://civitai.com/models/150925/eyes-detection-adetailer


dusky urchin Jan 18, 2024, 11:16 PM

#

i think you are misunderstanding what the model is. there aren't any amateurs in the civitai community training eye detection systems. they simply adapted something else that already exists.

#

that file is basically a serialized python blob that configures one of the pre-existing models to match eyes

#

@cold cliff does that make sense?

cold cliff Jan 18, 2024, 11:19 PM

#

I got it. I wanted to know if it was something mortals could train, and that these models aren't trained by the amateur community answers my question.

dusky urchin Jan 18, 2024, 11:20 PM

#

it will probably be easier to just expand the bounding box

#

a little bit
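"Expand the bounding box a little bit" could look like the following generic sketch: grow a detector's `(x1, y1, x2, y2)` box by a fraction of its size and clamp it to the image, so an eye- or face-sized box covers more of the head before the mask is built. This is not ADetailer's own code; the function and the `ratio` parameter are hypothetical.

```python
def expand_bbox(box, img_w, img_h, ratio=0.2):
    """Grow an (x1, y1, x2, y2) box by `ratio` of its width/height, clamped to the image."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * ratio / 2  # half of the extra width goes on each side
    pad_y = (y2 - y1) * ratio / 2  # half of the extra height goes on each side
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(img_w, x2 + pad_x),
        min(img_h, y2 + pad_y),
    )


if __name__ == "__main__":
    face_box = (100, 100, 200, 200)  # hypothetical detector output
    print(expand_bbox(face_box, 1024, 1024, ratio=0.5))
```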

#

that specific eyes detection file is yolov8 aka ultralytics

cold cliff Jan 18, 2024, 11:21 PM

#

Or just not be lazy and use img2img inpaint, I guess. 😛

grizzled jungle Jan 19, 2024, 1:06 AM

#

dusky urchin i don't understand - what is the source image? what is the character?

There is no source image. The first four represent the Lora. And the second batch of four images represent the same prompt without the Lora.

dusky urchin Jan 19, 2024, 6:29 PM

#

grizzled jungle There is no source image. The first four represent the Lora. And the second batc...

yeah i know, so what is your example source image?

#

like i don't know this character or what it is trying to learn

grizzled jungle Jan 19, 2024, 6:30 PM

#

dusky urchin yeah i know, so what is your example source image?

You mean… the images I trained on?

dusky urchin Jan 19, 2024, 6:30 PM

#

grizzled jungle You mean… the images I trained on?

yes

#

an example from your dataset

#

just 1

grizzled jungle Jan 19, 2024, 6:31 PM

#

This is technically one of them, except the one in the dataset isn’t transparent.

#

What do you think?

dusky urchin Jan 19, 2024, 6:34 PM

#

do you have images with backgrounds (you don't have to share it)? and how many total images do you have?

grizzled jungle Jan 19, 2024, 6:41 PM

#

dusky urchin do you have images with backgrounds (you don't have to share it)? and how many t...

12 images in all, she’s a relatively new character.

dusky urchin Jan 19, 2024, 6:42 PM

#

grizzled jungle 12 images in all, she’s a relatively new character.

12 images isn't going to generalize that well. it really depends on how flexible of an asset you want. you might have better results experimenting with controlnets, ip adapter and attention masks, to recreate the character's look exactly.

#

your best bet is to be patient and wait until there is more art of the character though.

#

stuff like your choices of optimizer and such, the net impact is that training goes faster (or slower). if you're patient, your result is going to reflect your dataset one way or another

grizzled jungle Jan 19, 2024, 6:45 PM

#

Well, I managed to generate a pretty good Lora with 13 images once, I just lost the config for it and am trying to make things work.

dusky urchin Jan 19, 2024, 6:45 PM

#

hmm

#

i'm saying from a scientific, hard facts point of view, what i am saying is true. you can use basically any configuration, as long as you are patient. does that make sense?

#

there isn't a lost config idea that will make this "work."

grizzled jungle Jan 19, 2024, 6:46 PM

#

dusky urchin i'm saying from a scientific, hard facts point of view, what i am saying is true...

Yes, it does.

dusky urchin Jan 19, 2024, 6:46 PM

#

yeah

#

i mean it's tough with 12 images

grizzled jungle Jan 19, 2024, 6:47 PM

#

Yeah…

grizzled jungle Jan 19, 2024, 6:47 PM

#

dusky urchin i mean it's tough with 12 images

So there isn’t a definitive config to account for that, in other words?

dusky urchin Jan 19, 2024, 6:47 PM

#

yeah i think your config looks good. like you did everything right

#

you can generate more content from those 12 images, to help it generalize stuff like backgrounds, but it's not going to get a pixel-perfect recreation of the artistic concepts without a lot more data. in my experience, for characters that are art-directed like anime and video games, you need something like 100-1,000 unique instances to get the art direction right every time, and closer to 1,000-10,000 to get the exact representations right in the style they were presented in.

#

so the art direction to me means the character's silhouette, proportions, wardrobe & jewelry, palette

#

to get the face perfectly right you need a lot more representations. i don't think people in the community are releasing stuff that actually looks like the characters they think they do, because they are biased by what their own eyes focus on when comparing dataset images to generated outputs

#

something like spiderman probably appears in sdxl's aesthetic laion2b handcrafted database 10,000-100,000 times, at least, and in different styles

#

another way to think about this is, your training set is really O(k × C(n, m)), where k is the number of artistic scenes, n is the number of concepts you want to generalize, m is the number of those concepts used simultaneously, and C(n, m) is "n choose m". if you are okay with having exactly 1 additional thing you want to generalize in addition to your character, such as "character on a different background" or "character wearing a blue hat" but never "character wearing a blue hat on a different background", you need 10-100 scenes and each one has to illustrate on the order of n² contrasts over (background, hat, ..., other concepts), so the total is O(k × C(n, 2)) ≈ O(k × n²). the real limitation is creating contrasts, not gathering enough different poses, which is just 1 concept.
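One reading of that formula, sketched as arithmetic: k scenes, each needing every combination of m out of n concepts. For m = 2 the per-scene count is n(n-1)/2, which is where the roughly quadratic growth comes from. The function name is hypothetical and the numbers are illustrative only.

```python
from math import comb


def contrast_budget(k_scenes: int, n_concepts: int, m_simultaneous: int) -> int:
    """k scenes, each illustrating every combination of m out of n concepts."""
    return k_scenes * comb(n_concepts, m_simultaneous)


if __name__ == "__main__":
    # Lower and upper ends of the "10-100 scenes" range, pairs of concepts (m = 2).
    for n in (2, 4, 8, 16):
        print(f"n={n:>2}: k=10 -> {contrast_budget(10, n, 2):>5}, k=100 -> {contrast_budget(100, n, 2):>6}")
```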

#

am i making sense?

#

this stuff requires way way way more data than the community thinks it does

grizzled jungle Jan 19, 2024, 6:58 PM

#

dusky urchin am i making sense?

I understand.

#

It’s just a shame because I genuinely like this character.

grizzled jungle Jan 19, 2024, 7:15 PM

#

dusky urchin this stuff requires way way way more data than the community thinks it does

I just can’t use any more images because there really isn’t any others that work, you know?

latent charm Jan 19, 2024, 8:54 PM

#

You could generate more similar images using style aligned and ip adapter to create the diversity. Also, you could add the character to different background.

fast ocean Jan 20, 2024, 5:43 PM

#

For anyone who's trained an SDXL Lora on a person (realistic not anime), what Net Dim and Alpha do you use? I'm struggling to find the right combo. Struggling with all the settings, actually. It seems to be quite a bit harder to find the right settings for training SDXL Loras than it was for 1.5.

stiff dust Jan 20, 2024, 6:55 PM

#

dim 16-48 for unet, text encoder can be much less

#

I keep alpha=1, but you can try higher if you want
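For what dim and alpha do: in common LoRA implementations (kohya's networks included, as far as I understand them), the low-rank update is scaled by alpha / dim, so alpha=1 with dim 16-48 gives a small effective scale that the learning rate (or an adaptive optimizer) has to make up for. A minimal pure-Python sketch of one LoRA-wrapped linear layer; the matrices are toy values.

```python
# y = W x + (alpha / rank) * B (A x), with A: rank x in, B: out x rank.

def matvec(M: list[list[float]], v: list[float]) -> list[float]:
    # Plain matrix-vector product.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_scale(alpha: float, rank: int) -> float:
    return alpha / rank


def lora_forward(W, A, B, x, alpha, rank):
    base = matvec(W, x)                # frozen base weight
    delta = matvec(B, matvec(A, x))    # low-rank update: down-project, then up-project
    s = lora_scale(alpha, rank)
    return [b + s * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    for dim in (16, 32, 48):
        print(f"dim={dim}, alpha=1 -> update scaled by {lora_scale(1, dim):.4f}")
```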

fast ocean Jan 21, 2024, 2:47 AM

#

stiff dust I keep alpha=1, but you can try higher if you want

Honestly the biggest issue I have is that when I train an SDXL lora on the Base XL model, it only works with that model when loaded into A1111. For example, if I tried to use the lora with Juggernaut XL or Realism Engine, it loses most of its resemblance to that person. I used to have the same issue with 1.5 loras until I started training with Photon instead of the base 1.5 model, but with XL I don't really know what would be the equivalent of that.

stiff dust Jan 21, 2024, 11:56 AM

#

this is rather a problem of Juggernaut than sdxl

#

juggernaut is just highly overtrained

#

however, training on Juggernaut instead of base is also extremely difficult, because juggernaut uses a lot of strange training settings such as pyramid noise. I never succeeded in training on Juggernaut and, therefore, stopped using the model altogether

#

currently I'm mostly using Dreamshaper XL, which is not overfitted (all loras work normally on it) while still being good in all styles and photorealism. I bet there are other good models, too, which are not severely overfitted

jade hornet Jan 21, 2024, 3:58 PM

#

Agree with most of that, but would add a comment. In general I would simply say that certain trained models have different strengths than others, even the base. I find that juggernaut handles multiple subjects really well as an example. You can get an idea of whether that's the case by how certain loras perform with the various different models, how they draw anatomy varies, how they render detail varies, etc. So I wouldn't necessarily discourage training against some of these models as a rule.

fast ocean Jan 21, 2024, 7:22 PM

#

stiff dust currently I'm mostly using Dreamshaper XL which is not overfitted (all loras wor...

Do you mean that you train your loras using Dreamshaper XL?

#

I considered it but since it's a turbo model i thought perhaps it wouldn't be the best thing to train on.

stiff dust Jan 21, 2024, 7:34 PM

#

no, I train on sdxl base

#

I said that loras trained on base are transferable to most models, including dreamshaper xl

#

if they don't work well on particular models (like Juggernaut XL or RealVision) then this might be an indication that this model is overfitted

fast ocean Jan 21, 2024, 7:37 PM

#

stiff dust if they don't work well on particular models (like Juggernaut XL or RealVision) ...

oh ok, gotcha. I just wish there was a model I could train on that would work as consistently well as Photon did for me with 1.5. You think that my SDXL base model trained lora showing almost no resemblance to the person with any model except for the base model is an indication of overtraining?

stiff dust Jan 21, 2024, 7:38 PM

#

no, I said that Juggernaut is overtrained and that's why it doesn't work with your lora

fast ocean Jan 21, 2024, 7:38 PM

#

oh, ok. But my lora doesn't work with ANY model besides base SDXL.

stiff dust Jan 21, 2024, 7:38 PM

#

people overtrain their models and then they merge them with other overtrained models and at some point nothing works anymore xD

stiff dust Jan 21, 2024, 7:39 PM

#

fast ocean oh, ok. But my lora doesn't work with ANY model besides base SDXL.

did you try Dreamshaper? Or SDXL Turbo?

fast ocean Jan 21, 2024, 7:40 PM

#

yeah got no resemblance with dreamshaper. Honestly I don't like the turbo models. But I've tried all the most popular models and nope, doesn't work. I had the same issue with 1.5 until I started training on a non-base model.

stiff dust Jan 21, 2024, 7:41 PM

#

hm, that's strange. Do you use any weird training parameters like pyramid noise or a too high or too low offset noise?
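For readers unfamiliar with the term: offset noise adds one random constant per channel on top of the usual per-pixel Gaussian noise, which lets the model learn to shift overall brightness. A pure-Python sketch of the idea; the 0.1 default is a hypothetical value, not a recommendation, and pyramid (multi-resolution) noise is not shown.

```python
import random


def offset_noise(channels: int, h: int, w: int, offset: float = 0.1, rng=None):
    """Per-pixel Gaussian noise plus one random constant per channel."""
    rng = rng or random.Random()
    noise = []
    for _ in range(channels):
        shift = offset * rng.gauss(0.0, 1.0)  # same shift for every pixel in this channel
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(w)] for _ in range(h)])
    return noise


if __name__ == "__main__":
    with_offset = offset_noise(4, 8, 8, offset=0.1, rng=random.Random(0))
    plain = offset_noise(4, 8, 8, offset=0.0, rng=random.Random(0))
    # Same seed, so the difference per channel is exactly the channel's shift.
    for c, (a, b) in enumerate(zip(with_offset, plain)):
        print(f"channel {c}: shift = {a[0][0] - b[0][0]:+.4f}")
```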

fast ocean Jan 21, 2024, 7:42 PM

#

I haven't seen those settings in Kohya SS so I don't think so. I'm not the most experienced person when it comes to training loras.

dusky urchin Jan 22, 2024, 5:46 PM

#

fast ocean Honestly the biggest issue I have is that when I train an SDXL lora on the Base ...

what's your objective? you want someone's likeness to flexibly appear in sdxl generations? how flexible? and what's the application?

fast ocean Jan 22, 2024, 5:47 PM

#

dusky urchin what's your objective? you want someone's likeness to flexibly appear in sdxl ge...

I would like to make a Lora model based on a real person and be able to make realistic looking photos. It doesn’t have to be super flexible; at this point I’d be happy just to have it work with any model. I can’t even get it to work properly with the SDXL base model. I’m not sure what you mean by “what’s the application”.

dusky urchin Jan 22, 2024, 5:51 PM

#

fast ocean I would like to make a Lora model based on a real person and be able to make rea...

“what’s the application”
like what kind of realistic looking photos? what is the person doing? or is it unconstrained? like what is the idea? for example, "i want to create a #vanlife instagram except with this personality's likeness" means, okay making it look like instagram is 90% of the way there, but i don't know if ~~juggernaut will ever be able to generate someone sitting on top of a van correctly,~~ juggernaut did put the lady on the van pretty plausibly! or maybe a specific image that you can think of that you would like to recreate this person's likeness in

#

when you say can't get it to work properly - it depends mostly on your patience. how long have you run a training, on how many images?

#

to diagnose your fine tuning, take a look at tensorboard.

fast ocean Jan 22, 2024, 5:55 PM

#

dusky urchin > “what’s the application” like what kind of realistic looking photos? what is t...

Oh, ok, I understand. I think just being able to make generic instagram style pics would suffice for me. With SD 1.5 I would make ones with the person camping, standing in urban/suburban areas, backyard photos, pics inside restaurants etc just casual candid style photos. I wouldn’t have even bothered with SDXL had I not been intrigued by the fact that it seems easier to pose people differently with SDXL than with 1.5. And as far as the part about not getting it to work properly, I’ve tried everything from 2000 steps to more than 10000, tried training with a very small photo dataset and a large one, etc doesn’t seem to change the fact that the pics come out strange.

dusky urchin Jan 22, 2024, 5:57 PM

#

fast ocean Oh, ok, I understand. I think just being able to make generic instagram style pi...

just to confirm, how much time does 2,000 steps or 10,000 steps take for you?

#

and how many photos is a small dataset versus a large one?

#

you are able to generate images correctly with other community LoRAs, right?

fast ocean Jan 22, 2024, 5:59 PM

#

Yes I haven’t had any issues with other loras. Small would be anywhere from 10 photos to 20, large would be from 80 to 100. 2000 steps would probably take me about 45 mins to an hour at the most. 10,000 probably 3 to 4 hours.

dusky urchin Jan 22, 2024, 5:59 PM

#

okay

#

so for my application, replicating a likeness for flexible creative cinema-style scenes, i use 1,000-10,000 images for a "basic" level of flexibility. A LoRA based training for me is at least 50,000 steps (aka 50 epochs), which is about 16h on my fastest configuration and hardware.
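The steps-versus-epochs arithmetic above can be made explicit. A minimal sketch, assuming the common kohya-style convention that one step is one optimizer update over one batch and that each epoch visits every image `repeats` times; `total_steps` and `hours_for` are hypothetical helpers, and the seconds-per-step figure is simply back-derived from the quoted "about 16h" for 50,000 steps, not a measured benchmark:

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Optimizer steps for a run: ceil(images * repeats / batch) per epoch, times epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def hours_for(steps: int, seconds_per_step: float) -> float:
    """Wall-clock estimate from a seconds-per-step figure."""
    return steps * seconds_per_step / 3600


# 1,000 images, 1 repeat, batch size 1: 50 epochs gives the quoted 50,000 steps.
steps = total_steps(num_images=1_000, repeats=1, epochs=50)
sec_per_step = 16 * 3600 / steps  # back-derived from "about 16h", roughly 1.15 s/step

# Same settings with 10,000 images: ten times the steps.
big_run = total_steps(num_images=10_000, repeats=1, epochs=50)
print(steps, big_run, round(hours_for(big_run, sec_per_step) / 24, 1))
```

Under these assumptions the 10,000-image run comes out at about 6.7 days, which is where "close to a week" comes from.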

#

this is what i mean by patience. i think you are probably off by an order of magnitude for the amount of patience you need, even for a LoRA training, on commodity hardware. you should be ready to wait close to a week for 10,000 images.

#

or, you can choose a configuration that happens to work very well for faces, and lean into the fact that people recognize celebrities better than ordinary people, so they're going to be much more forgiving when you don't get the appearance exactly right.

#

then you can expect things to go faster. but every misconfiguration can be solved by patience

#

how did you caption 10,000 images?

#

or is this with no text encoder training?

fast ocean Jan 22, 2024, 6:06 PM

#

Oh I think you misunderstood me. I didn’t use 10,000 images

dusky urchin Jan 22, 2024, 6:06 PM

#

oh i'm sorry

#

100 images

fast ocean Jan 22, 2024, 6:06 PM

#

Yes. I manually captioned the 100 images myself.

dusky urchin Jan 22, 2024, 6:07 PM

#

so you should still be spending in the 10s of hours if you are not sure if your configuration is correct

fast ocean Jan 22, 2024, 6:07 PM

#

dusky urchin so you should still be spending in the 10s of hours if you are not sure if your ...

Not sure what you mean.

dusky urchin Jan 22, 2024, 6:08 PM

#

hmm. well maybe a better question is, are you using prodigy as your optimizer?

fast ocean Jan 22, 2024, 6:08 PM

#

Adafactor.

dusky urchin Jan 22, 2024, 6:09 PM

#

all problems with adafactor can be solved by patience.

#

if you are impatient, try prodigy

fast ocean Jan 22, 2024, 6:10 PM

#

I have patience, it’s just that I’ve probably done about 50 tests in the last week and seeing no improvement. It’s just confusing 🙂

dusky urchin Jan 22, 2024, 6:10 PM

#

it's complicated, but the community guides make it sound like 30 minutes is enough time to train something

#

in my experience, it almost never is

#

however, you might have some other issues. usually the text encoder learning rate is too high for sdxl. this is an issue that prodigy can deal with for you

#

prodigy is an optimizer that was overfit, in a sense, for training facial likenesses on instagram style generations

fast ocean Jan 22, 2024, 6:11 PM

#

Oh I know it takes many hours. I’ve basically been running steady tests for the last 8 days straight lol. So if I try prodigy, what learning rate, unet learning rate and text encoder learning rate should I use?

dusky urchin Jan 22, 2024, 6:11 PM

#

you don't make those choices with prodigy

#

can you give me an example of a caption you authored?

fast ocean Jan 22, 2024, 6:12 PM

#

So with prodigy I should set them all to 1?

dusky urchin Jan 22, 2024, 6:13 PM

#

i actually never deal with the configuration files of kohya directly, i use the objects, but i think the documentation says exactly what it should be. my guess is it does not matter, given how prodigy works

#

i am surprised you are getting no improvement whatsoever

#

so maybe there are some other flaws

#

8 days is a lot for a casual or enthusiast application too. what is this really for?

#

i am supportive but it would help me understand your expectations

#

short of showing the images themselves (i assume this isn't an anime character, it's a real person)

fast ocean Jan 22, 2024, 6:15 PM

#

the weird thing is that often the samples look quite good as it's training the model, and then when i load it into a1111, they look poor. Oh I'm just on holidays and recently got a new computer so I've been playing with it a lot, that's all, lol. It's just for fun and I like learning.

dusky urchin Jan 22, 2024, 6:15 PM

#

hmm

#

what are you loading into A1111? if you are doing a LoRA training, you will have a file that is ~150-300MB, ending in .safetensors, with _NumEpochs suffixed to it, until it is done

#

how are you visualizing samples? you mean you configured it to generate something, for which prompts?

#

another POV is you should be using ComfyUI

fast ocean Jan 22, 2024, 6:20 PM

#

yes, I get safetensor files that I load into the Lora folder of A1111. I'm using the sample feature of Kohya SS, where you enter a prompt and it generates a photo every set number of steps. unfortunately comfyUI is way too far out of my comfort zone as a newbie to AI generating.

dusky urchin Jan 22, 2024, 6:21 PM

#

fast ocean yes, I get safetensor files that I load into the Lora folder of A1111. I'm using...

okay, i mean if the samples it generates are fine, something else is messed up. you will be able to figure out comfyui, it isn't as daunting as it seems

fast ocean Jan 22, 2024, 6:24 PM

#

Often the samples look good but not great. And then in A1111, they often look blurred, squished, etc. maybe it’s actually an issue with the way I’m generating my pics? I know I’m doing something wrong I’m just trying to figure out what lol.

normal ember Jan 22, 2024, 6:28 PM

#

At what size do you generate?

fast ocean Jan 22, 2024, 6:28 PM

#

normal ember At what size do you generate?

1024 by 1024

normal ember Jan 22, 2024, 6:29 PM

#

Should be good. If you had been generating at less than a megapixel, you would have issues like the ones you describe.

dusky urchin Jan 22, 2024, 6:30 PM

#

fast ocean Often the samples look good but not great. And then in A1111, they often look bl...

have you tried using an sdxl lora from the community?

#

try this - https://civitai.com/models/188525/pixar-style-sdxl - and copy one of the prompts & settings of a reference image

#

anyway i think you will figure this out

#

i gotta go

normal ember Jan 22, 2024, 6:32 PM

#

Text encoder overfits quickly so it's probably a good idea to only train the unet to begin with and see where that gets you.

fast ocean Jan 22, 2024, 6:32 PM

#

Oh I can train with only one or the other?

dusky urchin Jan 22, 2024, 6:32 PM

#

if the sampled results look fine, i think there's an issue with how you are using the web ui

#

there isn't really a fine tuning bug here anymore

fast ocean Jan 22, 2024, 6:37 PM

#

I just tried that "Pixar" Lora you shared above using the same prompts in one of their images, and it did not work. So Maybe it is a problem with my A1111? Also, I didn't say all the samples looked fine, most of them are demented, lol. So I think it's probably a mix of both issues

#

i appreciate your help, i'll try a few more things and see if anything helps.

normal ember Jan 22, 2024, 6:43 PM

#

Don't give up! 🙂

dusky urchin Jan 22, 2024, 6:46 PM

#

fast ocean I just tried that "Pixar" Lora you shared above using the same prompts in one of...

it sounds like you are probably mixed up about which checkpoints you are using where, and why. you are probably mixing sd 1.5 and sdxl settings

fast ocean Jan 22, 2024, 6:47 PM

#

I don’t think so. I used the exact settings from the prompt they used on the Lora page you gave me. I’m not using any SD 1.5 vae or models, I used the SDXL base model to generate the pic

#

I thought the same thing at first.

fast ocean Jan 22, 2024, 7:00 PM

#

normal ember Don't give up! 🙂

Thank you, I feel like giving up but once I start a project I find it hard to stop til I get it right so I will continue on, lol.

fast ocean Jan 23, 2024, 5:29 AM

#

dusky urchin it sounds like you are probably mixed up about which checkpoints you are using w...

Just wanted to report back and let you know I discovered the solution to both problems: my SDXL photos were coming out so bad in A1111 because I had my CFG turned down to 1 (don’t even know how I missed that). As far as my Lora training goes, I went back to my original settings which worked fairly well, and I turned off “don’t upscale bucket resolution” which for some reason immensely helped me! My images are now coming out normal and my Lora looks really good. Thanks everyone for your help 🙂
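For context on that last fix: kohya's trainer groups images into aspect-ratio buckets, and the "don't upscale bucket resolution" option (`--bucket_no_upscale` on the command line) keeps images smaller than the target area at roughly their native size instead of scaling them up. The sketch below is a simplified, hypothetical model of that choice, not kohya's actual bucketing code; the 1024×1024 target area and the 64-pixel rounding step are assumptions. One possible reading of why turning it off helped (not confirmed in the thread) is that with it on, small source photos were trained well below SDXL's ~1 megapixel working size.

```python
import math


def pick_bucket(width: int, height: int, max_area: int = 1024 * 1024,
                step: int = 64, no_upscale: bool = False) -> tuple[int, int]:
    """Simplified aspect-ratio bucketing: return a (w, h) training resolution."""
    if no_upscale and width * height <= max_area:
        # Keep the image at (roughly) its own size; only round down to the grid step.
        return (width // step) * step, (height // step) * step
    # Otherwise rescale to about max_area while preserving the aspect ratio.
    scale = math.sqrt(max_area / (width * height))
    w = int(width * scale) // step * step
    h = int(height * scale) // step * step
    return w, h


for flag in (True, False):
    w, h = pick_bucket(640, 480, no_upscale=flag)
    print(f"no_upscale={flag}: {w}x{h} ({w * h / 1e6:.2f} MP)")
```

In this model a 640×480 photo stays at 640×448 (about 0.29 MP) with the option on, versus 1152×832 (about 0.96 MP) with it off.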

patent moat Jan 26, 2024, 10:14 PM

#

Hi ! How can i have the rights to use the bot ?

pure prairie Jan 27, 2024, 3:39 PM

#

hi

urban halo Jan 27, 2024, 10:51 PM

#

https://discord.com/channels/1002292111942635562/1047610792226340935 and https://discord.com/channels/1002292111942635562/1100170153829871686

dusky urchin Jan 28, 2024, 9:49 PM

#

@stiff dust i am exploring full fine tunes on multiple machines, do you have any experience or opinions about this?

stiff dust Jan 28, 2024, 9:58 PM

#

I've never worked on full finetunes, as LoRAs already work fine. PseudoTerminalX works a lot with full finetunes

stone garden Jan 29, 2024, 8:58 AM

#

how can i create an image here?

zenith delta Jan 29, 2024, 10:26 AM

#

Hello everyone!

I've trained a whole bunch of LoRAs and checkpoints on real persons/styles/objects/concepts previously with overall great success, but there was one thing that I was never able to achieve properly.

How can I train a real person for the purpose of anime generation? I've tried with both SD 1.5 and SDXL and achieved only a semi-success with SD 1.5! Has anyone tried something similar before?

For my semi-successful attempt, I trained the NAI model with 25 of my images, removing the background from images that had complex backgrounds. The images featured me wearing various outfits in 2 different lighting setups (frontlit and backlit). I also made sure to avoid overtraining by adding 10 full-body images, whereas I only use upper-body shots and face close-ups for realistic trainings. Overall the images were of high quality, and performed really well when I used the same dataset for a realistic training. I used booru-style captioning: an activation phrase for myself, with no details about my appearance, only my pose and outfits. During the training, I used a Network Rank of 64 and an Alpha of 32. I used the Adafactor optimizer, and my learning rate for both the Text Encoder and UNET was 0,0001.

I was able to generate images featuring myself using this LORA, but the LORA would completely change the style of the base model, usually for the worse. It generally changed the style from a 2D drawing to either a 3D illustration, or in worse cases a 3D Model. The likeness however, was usually great!

Looking forward to hearing your opinions, and tips!

stiff dust Jan 29, 2024, 10:36 AM

#

In general that is not a problem. Use validation images with anime or cartoon prompts and check how they behave during training. If you overtrain, it does indeed happen that the images turn into photos, but if you stop early enough this shouldn't happen. Check whether your training photos are all captioned with "photo of" or similar tags. I would also reduce the rank from 64 to something smaller. In particular for the text encoder, you can use a MUCH smaller rank (like 6 or 8). I found text encoder training very vulnerable to style overfitting (= everything turns into a photo), but I also found it hard to train on unet only. Maybe try stopping text encoder training early enough and continuing with unet only. In general it's hard to get a model that is equally good at both photorealism and anime, so you are better off focusing on an anime/cartoon-only model

zenith delta Jan 29, 2024, 10:39 AM

#

I see. The best LORA in my case was the Epoch2 model for the style; but the likeness was off on that one. I could use the LORA's up to epoch4 depending on the complexity of the prompt; with more complex prompts working better with the higher epochs.

stiff dust Jan 29, 2024, 10:40 AM

#

yeah, prompt matters a lot

#

try not just "anime of [token]" but rather something like "anime illustration of [token] by makoto shinkai and studio ghibli"

#

(makoto shinkai is a strong prompt modifier, it turns everything quite reliably into anime, although quite realistic drawn anime)

#

if I want less realistic anime I also always add an anime lora on top of my face lora

#

but in general I found it easier to train my face for drawn styles like anime or cartoon than for photorealism

zenith delta Jan 29, 2024, 10:44 AM

#

I see

stiff dust Jan 29, 2024, 10:44 AM

#

if your results get better with longer prompts that could also be an indication that your text encoder is overfitted on your character prompt

#

I always trained text encoder and unet separately and with different dimensions (text encoder only low dimensions like rank 4 ), but I cannot say how much that helped

zenith delta Jan 29, 2024, 10:45 AM

#

I generally used prompts like; (1man, masterpiece, best quality, high quality:1,4), brk(my token), and then booru style prompts.

#

This is for SD1.5 though

#

I have never separated the Unet and TE training before. Can it be done in Kohya?

#

I also thought your network alpha was supposed to be lower than your network rank. If I lower the rank to 8, should I have the alpha at 4?

stiff dust Jan 29, 2024, 10:48 AM

#

yes, it should. You can also keep alpha at 1

#

makes training a bit slower, but that might be rather a good thing
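Why alpha matters here: in kohya-style LoRA the low-rank update is multiplied by `network_alpha / network_dim` before it is added to the frozen weight, so rank 64 / alpha 32 and rank 8 / alpha 4 share the same 0.5 multiplier, while rank 8 / alpha 1 scales the update down to 0.125. That smaller multiplier is one way to read "makes training a bit slower". A minimal pure-Python sketch of the scaling; the function names are illustrative, not kohya's API:

```python
def lora_scale(network_dim: int, network_alpha: float) -> float:
    """Multiplier applied to the LoRA update: delta_W = (alpha / dim) * (up @ down)."""
    return network_alpha / network_dim


def lora_delta(down, up, network_dim, network_alpha):
    """Scaled low-rank update for small list-of-lists matrices.

    down: dim x d_in matrix, up: d_out x dim matrix; returns a d_out x d_in matrix.
    """
    scale = lora_scale(network_dim, network_alpha)
    return [
        [scale * sum(up[i][k] * down[k][j] for k in range(network_dim))
         for j in range(len(down[0]))]
        for i in range(len(up))
    ]


for dim, alpha in [(64, 32), (8, 4), (8, 1)]:
    print(f"rank {dim}, alpha {alpha}: scale {lora_scale(dim, alpha)}")
```

The same learned `up`/`down` weights therefore produce an update four times smaller at rank 8 / alpha 1 than at rank 8 / alpha 4, so the optimizer has to move them further to reach the same effect.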

zenith delta Jan 29, 2024, 10:48 AM

#

I see! I'll test that out tonight!

#

I'll keep the rank at 8 and alpha at 1. I'll use the same dataset and the same lr.

#

Shall I lower the text encoder lr from 0,0001 to 0,00009?

#

But I think using Adafactor needs me to use the same LR for both TE and Unet

stiff dust Jan 29, 2024, 10:52 AM

#

for training unet/te separately: I think newer kohya versions have options like stopping TE training after some epochs and so on. But what I usually do is:

1. make a training with TE only (--network_train_text_encoder_only) and with very low rank (--network_dim=4)
2. train a few epochs, check what the images look like and whether they change style. Stop very early! (Like when the images start turning into a photo at epoch 8, don't use epoch 7; use epoch 5 instead.)
3. start a new training with unet only (--network_train_unet_only) and a higher network dim (--network_dim=16), save after one step, and immediately cancel training as soon as the output file is written out
4. next, merge the two output files (the text encoder one and the unet one). Now you have one output file that contains both
5. now start training again from this output file (--network_weights=myfile) with unet only (--network_train_unet_only)
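Step 4 (merging the text-encoder-only and UNet-only files) can be as simple as taking the union of the two state dicts, because the two halves use disjoint key prefixes. A minimal sketch, assuming kohya's naming where text-encoder modules start with `lora_te` (`lora_te1`/`lora_te2` for SDXL) and UNet modules with `lora_unet`; `merge_lora_parts` is a hypothetical helper, and the strings stand in for tensors:

```python
def merge_lora_parts(te_state: dict, unet_state: dict) -> dict:
    """Union a text-encoder-only LoRA with a UNet-only LoRA.

    Only keys with the expected prefix are taken from each file, so a stray
    text-encoder key in the UNet file cannot overwrite the early-stopped one.
    """
    te_part = {k: v for k, v in te_state.items() if k.startswith("lora_te")}
    unet_part = {k: v for k, v in unet_state.items() if k.startswith("lora_unet")}
    return {**te_part, **unet_part}


# Toy stand-ins for tensors; real files would hold torch tensors.
te = {"lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight": "te_down",
      "lora_te_text_model_encoder_layers_0_mlp_fc1.alpha": "te_alpha"}
unet = {"lora_unet_down_blocks_1_attentions_0_proj_in.lora_down.weight": "unet_down",
        "lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight": "fresh_te"}
merged = merge_lora_parts(te, unet)
print(len(merged), merged["lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight"])
```

With real files, `safetensors.torch.load_file` and `save_file` can read and write these dicts; this sketch does not carry over the training metadata stored in each file's header.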

stiff dust Jan 29, 2024, 10:52 AM

#

zenith delta Shall I lower the text encoder lr from 0,0001 to 0,00009?

I found lower learning rates not effective. Even with extremely low learning rates the text encoder can overfit, so you can also keep it at 1e-4. I use AdamW, though.

#

as said, the workflow with separate trainings for unet and text encoder is quite complicated and I don't know if it's necessary at all

#

but my experience so far was that the text encoder is extremely vulnerable to overfitting, and such a workflow allows you to observe and validate the text encoder during training

zenith delta Jan 29, 2024, 10:54 AM

#

Aha

#

That makes quite a lot of sense

#

I'll bring the kohya ui up now to see if it has a checkbox for only unet or only te training

#

It seems not; I guess I'll use the script to start the training then

stiff dust Jan 29, 2024, 10:57 AM

#

that would surprise me. I think the UI should have those options - they are quite old

zenith delta Jan 29, 2024, 10:58 AM

#

It has a setting for Stop Text Encoder training

zenith delta Jan 29, 2024, 11:00 AM

#

stiff dust that would surprise me. I think the UI should have that options - they are quite...

But there is also an additional parameters tab, where I should be able to use the arguments you've provided

#

The results are much better when I train with images of my girlfriend btw. Is this because there are more female images in anime models?

normal ember Jan 29, 2024, 11:21 AM

#

@stiff dust I've found eyes overfitting way quicker than anything else. Do you have any workaround for that?

dusky urchin Jan 29, 2024, 4:57 PM

#

zenith delta The results are much better when I train with the images of my girlfriend btw, t...

does your dataset have drawn images of whom you're trying to train?

#

like do you have a mix of photographic and anime images of your person to train on? it's okay if the answer is no

#

regularization should help, but it works better if you already have a large dataset of images

final patio Jan 29, 2024, 5:16 PM

#

I want to finetune SD on my face and have 8 GB of VRAM. Would a LoRA be the best way to do so? Are there any up to date, ideally simple guides on how to do this?

shut relic Jan 29, 2024, 7:43 PM

#

dusky urchin does your dataset have drawn images of whom you're trying to train?

What does regularization actually do? From my experience it just makes the lora overfit on the regularization images

#

Isn't regularization supposed to tell the trainer that the images are NOT supposed to look like this

dusky urchin Jan 29, 2024, 7:44 PM

#

shut relic What does regularization actually do, from my experience it just makes the lora ...

regularization is really poorly explained online, so you have to have a lot of patience with this

#

before i say anything more, do you program? do you have some experience with probability as an idea in math?

shut relic Jan 29, 2024, 7:44 PM

#

Yes for like 7 years xd

#

I'm a software engineer

#

Doctor pangloss is typing up a storm

dusky urchin Jan 29, 2024, 7:48 PM

#

let's start with what the dreambooth paper actually says about regularization:

Encouraging diversity with prior-preservation loss.
Naive fine-tuning can result in overfitting to input image context and subject appearance (e.g. pose). PPL acts as a regularizer that alleviates overfitting and encourages diversity, allowing for more pose variability and appearance diversity.

#

there is a lot of jargon explaining what this concretely is:

Prior Preservation Loss Ablation

We fine-tune Imagen on 15 subjects from our dataset, with and without our proposed prior preservation loss (PPL). The prior preservation loss seeks to combat language drift and preserve the prior. We compute a prior preservation metric (PRES) by computing the average pairwise DINO embeddings between generated images of random subjects of the prior class and real images of our specific subject. The higher this metric, the more similar random subjects of the class are to our specific subject, indicating collapse of the prior. We report results in Table 3 and observe that PPL substantially counteracts language drift and helps retain the ability to generate diverse images of the prior class. Additionally, we compute a diversity metric (DIV) using the average LPIPS [73] cosine similarity between generated images of same subject with same prompt. We observe that our model trained with PPL achieves higher diversity (with slightly diminished subject fidelity), which can also be observed qualitatively in Figure 6, where our model trained with PPL overfits less to the environment of the reference images and can generate the dog in more diverse poses and articulations.

shut relic Jan 29, 2024, 7:50 PM

#

Do you have something better formatted

dusky urchin Jan 29, 2024, 7:50 PM

#

lol well... listen it's not super informative. the key thing is that regularization should really be called "prior preservation loss"

#

once you have this, here's a good explanation from hugging face:

Prior preservation loss is a method that uses a model’s own generated samples to help it learn how to generate more diverse images. Because these generated sample images belong to the same class as the images you provided, they help the model retain what it has learned about the class and how it can use what it already knows about the class to make new compositions.

shut relic Jan 29, 2024, 7:51 PM

#

hmm

dusky urchin Jan 29, 2024, 7:51 PM

#

so what is really concretely happening? it comes down to understanding the role of a text encoder, why it's called a "conditional" unet, why the word "conditioning" is used, and why "classifier-free guidance" is used

#

basically a bunch of things about probability. it's actually essential to what "denoising" is, and what you are actually training

shut relic Jan 29, 2024, 7:52 PM

#

I know that the text encoder decides which vector to give each word

#

And that textual inversion works by figuring out a magic vector that gives you the image you want

dusky urchin Jan 29, 2024, 7:53 PM

#

suppose you knew what all these things meant. prior preservation loss is a way to ensure that conditioned denoising on [X, Y, Z] where you are giving examples of [Z] does not make the original [X, Y] less likely.

shut relic Jan 29, 2024, 7:55 PM

#

Isn't denoising just the process of transforming the pixels into something else, hopefully an arrangement you like?

dusky urchin Jan 29, 2024, 7:56 PM

#

in other words, when you write a caption and provide an example image for:
taylor swift at a football game with many football players
you will want to create a regularization caption and image
a football game with many football players
which can even be an image that the generation process itself creates; that's what the dreambooth paper does, and that's where the images in the regularization github repos that the community uses come from
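The caption pairing above is what the DreamBooth objective consumes: the instance pair and the regularization (class) pair are each scored with the ordinary denoising loss, and the class term is added with a weight. kohya's scripts and the diffusers DreamBooth example both call that weight `prior_loss_weight`. A minimal sketch with toy numbers standing in for noise predictions; the function names are illustrative:

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(inst_pred, inst_target, prior_pred, prior_target, prior_loss_weight=1.0):
    """Instance loss plus weighted prior-preservation (class) loss."""
    instance_term = mse(inst_pred, inst_target)  # "taylor swift at a football game ..."
    prior_term = mse(prior_pred, prior_target)   # "a football game with many football players"
    return instance_term + prior_loss_weight * prior_term


# Toy predicted vs. true noise for one instance sample and one class sample.
loss = dreambooth_loss([0.1, 0.2], [0.0, 0.0], [0.3, -0.3], [0.0, 0.0])
print(round(loss, 4))
```

Raising `prior_loss_weight` penalizes drift on the class prompt more heavily; setting it to 0 reduces this to plain fine-tuning on the instance images.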

dusky urchin Jan 29, 2024, 7:56 PM

#

shut relic Isn't denoising just the process of transforming the pixels into something else,...

denoising is the forward pass of the neural network. it takes a slightly noisier image in latent space, and then makes it slightly less noisy in latent space.

#

so the thing you are training is adding a bunch of noise to the VAE[your training image], making that your output, and then adding slightly more random noise, making that your input, and then backpropagating
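The description above is a loose paraphrase. In the standard epsilon-prediction formulation used by SD 1.5 and SDXL base, the clean latent is noised in one shot to a randomly chosen timestep, and the network is trained to predict the noise that was added, scored with mean squared error (v-prediction models use a different target). A minimal sketch with plain lists standing in for VAE latents; `alpha_bar_t` is the cumulative noise-schedule value at the sampled timestep, and the two toy "models" are hypothetical:

```python
import math
import random


def add_noise(x0, eps, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def training_loss(model, x0, alpha_bar_t, rng):
    """One training example: noise the latent, predict the noise, score with MSE."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, eps, alpha_bar_t)
    pred = model(x_t, alpha_bar_t)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


latent = [0.5, -1.0, 0.25, 0.0]  # stand-in for VAE(training image)


def oracle(x_t, alpha_bar_t):
    """Cheating 'model' that knows the clean latent, so it can solve for the noise exactly."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(xt - a * x) / b for xt, x in zip(x_t, latent)]


def zeros(x_t, alpha_bar_t):
    """Untrained 'model' that always predicts zero noise."""
    return [0.0] * len(x_t)


rng = random.Random(0)
print(training_loss(oracle, latent, 0.5, rng), training_loss(zeros, latent, 0.5, rng))
```

The oracle's loss is zero up to floating-point error, while the always-zero model pays roughly the full noise variance; training pushes the real network from the second case toward the first on the images in your dataset.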

#

okay - so you can see how if you are generally making all noisier images less noisy in the direction of your training images, you will make taylor swift appear "everywhere"

shut relic Jan 29, 2024, 7:58 PM

#

Eh my issues may be because of other mistakes i made

dusky urchin Jan 29, 2024, 7:58 PM

#

does that make sense?

shut relic Jan 29, 2024, 7:58 PM

#

Wait i gotta read that again

dusky urchin Jan 29, 2024, 7:58 PM

#

when people say "overfitting" this is what they actually mean

#

you have to also think about what is happening in the beginning of sampling - aka, when you set your ksampler in comfyui steps to 50, what is happening at steps 0-5 versus steps 45-50

shut relic Jan 29, 2024, 7:59 PM

#

That's a good question

#

Why does it not just do it in one step 4head

dusky urchin Jan 29, 2024, 7:59 PM

#

this also touches on what the meaning of your sampler choice is - why dpm... 3 is "Better" (really preferred for certain outputs) compared to euler A

dusky urchin Jan 29, 2024, 8:00 PM

#

shut relic Why does it not just do it in one step 4head

that's what sdxl turbo does

#

all of these things are related. the underlying reason it's so hard to understand regularization is that it directly relates to the arcane details of image generation

shut relic Jan 29, 2024, 8:03 PM

#

My issues were more that i was (to stick with the same analogy) training for taylor swift, and all my images were of her at football games. Without regularization, my Lora thought that taylor swift is football studios, and with regularization images of a bunch of football studios, my Lora model thought that taylor swift is football studios, faster. Basically, 'overfitting' faster. But do note that in this test case i did not use any captioning

dusky urchin Jan 29, 2024, 8:03 PM

#

here's an analogy i've been auditioning:

diffuser:
set your grid size to 1x1.
visit each grid square. roll dice. if it's a 1,2, or 3, do nothing. if it's 4,5, or 6, use the dice to consult a big book that maps dice rolls and your grid point to a color you should paint in your grid square.

increase your grid size to 2x2.
visit each grid square. roll dice, if it's a 1,2, or 3, do nothing. if it's a 4, 5 or 6, use the dice to consult a big book that maps dice rolls and your grid point to a color you should paint in your grid square.

....

#

training is determining the contents of that big book.
diffuser and conditioning: let's introduce conditioning:

draw a picture of a "cat"

increase your grid size to NxN.
visit each grid square. roll dice, if it's a 1,2, or 3, consult your book as usual, which maps dice rolls and your grid point to a color you should paint. if it's a 4, 5 or 6, reroll with dice weighted for cat.

#

CFG is the number on the dice at which you switch between unweighted and weighted dice
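Outside the dice analogy, classifier-free guidance is a single line of arithmetic per sampling step: the model is run with and without the prompt, and the two noise predictions are extrapolated by the CFG scale. A minimal sketch over plain lists; the function name and the toy values are illustrative. At a scale of 1 the result is just the conditional prediction, with no extra push toward the prompt, which fits the poor SDXL results that were traced to a CFG of 1 earlier in this thread.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """eps = eps_uncond + scale * (eps_cond - eps_uncond), elementwise."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


uncond = [0.0, 0.5]  # prediction with an empty prompt
cond = [1.0, 1.0]    # prediction with the user's prompt
for scale in (0.0, 1.0, 7.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

Scale 0 ignores the prompt entirely, scale 1 follows it without amplification, and typical values such as 7 extrapolate past the conditional prediction.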

#

control net is a different kind of dice weighting

#

this is a diffuser with conditioning

#

okay, now let's introduce captionless training: you're modifying the contents of the book to paint the grid squares more frequently towards your data set.

#

that's it

#

now let's introduce captioned (i.e. text encoder) training: you're also modifying the weights of all the dice in your caption.

shut relic Jan 29, 2024, 8:09 PM

#

Something's wrong with your order of messages

#

Oh NVM

dusky urchin Jan 29, 2024, 8:10 PM

#

so when you have cat and hat in a caption, you can devise a training method where you don't want to change the weights of hat "as much" as you change the weights of cat.

#

that is what prior preservation loss quite literally is

shut relic Jan 29, 2024, 8:11 PM

#

Okay, so what is the way to eliminate bias in datasets

dusky urchin Jan 29, 2024, 8:11 PM

#

this way you can use images of your cat in a hat for training the look of your specific cat, AND make sure to generalize your cat wearing other stuff - because hat's conditioning, when using CLIP, contributes to other conditioning relating to things on top of people's heads

#

the only way to eliminate bias in datasets is to make them larger

shut relic Jan 29, 2024, 8:12 PM

#

E.g. many models are biased on white women

dusky urchin Jan 29, 2024, 8:12 PM

#

ultimately the goal is to preserve the generalization power of pretraining, aka having this nice, generalized checkpoint of weights

dusky urchin Jan 29, 2024, 8:13 PM

#

shut relic Okay, so what is the way to eliminate bias in datasets

a simple strategy is to provide contrastive examples directly in your data. this is different than dreambooth

shut relic Jan 29, 2024, 8:13 PM

#

What if I ONLY have images of taylor swift in football studios
Is there no way to tell the trainer "this i don't want"

dusky urchin Jan 29, 2024, 8:13 PM

#

PPL is a "shortcut" that was designed just for dreambooth. the achievement was using just a few images to get a diffuser to learn a detailed look and feel of something

dusky urchin Jan 29, 2024, 8:13 PM

#

shut relic What if I ONLY have images of taylor swift in football studios Is there no way t...

that is kind of* what regularization is

shut relic Jan 29, 2024, 8:13 PM

#

Argh

zenith delta Jan 29, 2024, 8:14 PM

#

Eyo you guys basically started a topic from my question

#

I was able to get much more stylized results by lowering the network rank, and using 100 reg images.

#

The problem is now that the model is kinda overfitting on the reg images itself.

shut relic Jan 29, 2024, 8:15 PM

#

EXACTLY

zenith delta Jan 29, 2024, 8:15 PM

#

I also stopped the text encoder at 30%

#

It works perfectly for me (a male) this time

#

But my girlfriend's features get mixed up

shut relic Jan 29, 2024, 8:15 PM

#

clearly regularization is far more than "this i don't want"

dusky urchin Jan 29, 2024, 8:16 PM

#

the community focuses a LOT on getting an exact human face that generalizes to many instagram-style casual full body pose photography.

#

dreambooth was not designed for this. the best thing to do, if that is your goal, is to just wait.

zenith delta Jan 29, 2024, 8:16 PM

#

I'll make another attempt, stopping the text encoder at 50%

dusky urchin Jan 29, 2024, 8:16 PM

#

IP Adapter and similar is dealing with this issue directly

zenith delta Jan 29, 2024, 8:17 PM

#

Oh no I'm not trying to get realistic results.

#

I'm trying to see how I can turn a male and a female real person to anime style.

dusky urchin Jan 29, 2024, 8:17 PM

#

i guess what i am saying is that IP adapter can also do that

zenith delta Jan 29, 2024, 8:18 PM

#

dusky urchin i guess what i am saying is that IP adapter can also do that

It sucked at stylizing images in my case

dusky urchin Jan 29, 2024, 8:18 PM

#

dreambooth is too general, it doesn't deal with the salient issues of "perceptual biases when looking at faces of humans"

#

i think it is challenging to use, but if your goal is to make stylized human faces, ipadapter is the framework of today and the future

shut relic Jan 29, 2024, 8:18 PM

#

When training loras, say i have 50 tags on each image, and then prepend my keyword to the tags. Would that drown out my keyword, meaning I should reduce the other tags, or will it recognize the keyword fine?

dusky urchin Jan 29, 2024, 8:19 PM

#

dreambooth is a training shortcut, is another way to think about it

#

just like lora is

#

LoRA is a cheaper-to-train "full fine tune"

dusky urchin Jan 29, 2024, 8:20 PM

#

shut relic When training loras, say i have 50 tags on each image, and then prepend my keywo...

with enough data, it matters less

shut relic Jan 29, 2024, 8:20 PM

#

But it does matter, for say 100 images

dusky urchin Jan 29, 2024, 8:20 PM

#

i think the hardest thing is to accept that many community members have "loras" that "look like someone" essentially by accident

#

100 images is too little data "in general," but if you are generating images for a celebrity who isn't a TV personality but also appears "a lot" in the base checkpoint in general, people's perceptual biases will help in accepting the likeness.

#

do you see?

#

there's a reason the dreambooth paper does not do people and does dogs and shit

shut relic Jan 29, 2024, 8:22 PM

#

Does having more images make it understand the concept in fewer steps

dusky urchin Jan 29, 2024, 8:22 PM

#

well having more images definitely increases steps 🙂

shut relic Jan 29, 2024, 8:22 PM

#

Or is it just for diversity

shut relic Jan 29, 2024, 8:23 PM

#

dusky urchin well having more images definitely increases steps 🙂

Not necessarily, since guides say to use 1000 steps no matter what

dusky urchin Jan 29, 2024, 8:23 PM

#

i can't speak for all community guides

shut relic Jan 29, 2024, 8:23 PM

#

So divide 1000 by count to get repeat
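The rule of thumb being quoted here (aim for a fixed step count and derive repeats from the image count) is easy to compute; whether 1,000 steps is the right target is exactly what the thread disputes. A minimal sketch, assuming kohya-style counting where the repeat count is the number prefixed to the dataset folder name (e.g. a folder named `10_mytoken`) and one step covers one batch; `repeats_for_target` is a hypothetical helper:

```python
import math


def repeats_for_target(target_steps: int, num_images: int, epochs: int = 1,
                       batch_size: int = 1) -> int:
    """Smallest repeat count whose run length reaches target_steps."""
    steps_per_repeat = num_images * epochs / batch_size
    return max(1, math.ceil(target_steps / steps_per_repeat))


for n in (100, 30, 2000):
    r = repeats_for_target(1_000, n)
    print(f"{n} images -> {r} repeats -> {n * r} steps")
```

With 2,000 images even a single repeat overshoots the target, which is one reason a fixed step budget stops making sense as the dataset grows.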

dusky urchin Jan 29, 2024, 8:23 PM

#

they are "darwin fine tuned"

shut relic Jan 29, 2024, 8:24 PM

#

Darwin fine tuned?

#

Natural selection?

dusky urchin Jan 29, 2024, 8:24 PM

#

the ones you hear about that go viral accidentally worked for accidental configuration for accidentally common use cases

#

yes.

shut relic Jan 29, 2024, 8:24 PM

#

Bruh

#

Hold on

dusky urchin Jan 29, 2024, 8:24 PM

#

lol

#

i mean what can i say? you are following the guides to the T, i'm sure, and getting crap results

#

and you're probably wondering why

shut relic Jan 29, 2024, 8:25 PM

#

https://rentry.org/59xed3

THE OTHER LoRA TRAINING RENTRY

Stable Diffusion LoRA training science and notes
By yours truly, The Other LoRA Rentry Guy.
This is not a how to install guide, it is a guide about how to improve your results, describe what options do, and hints on how to train characters using bad or few images.
Due to the higher prevalence of...

dusky urchin Jan 29, 2024, 8:25 PM

#

yeah exactly

#

this thing exactly

shut relic Jan 29, 2024, 8:25 PM

#

This is my holy grail

#

It has worked best

#

And I've tried others

dusky urchin Jan 29, 2024, 8:25 PM

#

it either works or it doesn't

#

it doesn't provide guidance really on getting any closer

#

because it can't create more training images for you

#

and it can't convince you to be more patient - indeed, it does the opposite! because the community is very impatient

#

something that says "train with a lower learning rate, for more steps, on more images" would not be very popular even if it were good

shut relic Jan 29, 2024, 8:26 PM

#

It does provide guidance, it says what you can do to improve results

dusky urchin Jan 29, 2024, 8:26 PM

#

well

#

like i said you've tried it

#

there's a lot of emphasis on configuration choices. if you followed my strategy (wait) you would use prodigy

shut relic Jan 29, 2024, 8:26 PM

#

Yeah but my results are not good enough and I don't really understand them

#

I use prodigy

dusky urchin Jan 29, 2024, 8:26 PM

#

prodigy is a very smart thing, and it is also darwinian-fine-tuned on training images of celebrities

#

it didn't exist 3 months ago or whatever

shut relic Jan 29, 2024, 8:26 PM

#

Which the guide suggested

#

Does putting caption files next to regularization images do anything

dusky urchin Jan 29, 2024, 8:27 PM

#

let me put it this way - the fact that prodigy exists means that a lot of so-called best practices parameter selection didn't matter

shut relic Jan 29, 2024, 8:27 PM

#

Or is it purely psychological

dusky urchin Jan 29, 2024, 8:27 PM

#

hmm

zenith delta Jan 29, 2024, 8:28 PM

#

Hmm, I need you guys opinion on the likeness

#

Is it allowed to share a generation and then a real photograph?

#

Completely sfw of course

shut relic Jan 29, 2024, 8:28 PM

#

Yeah why not

#

Unless it's illegal

zenith delta Jan 29, 2024, 8:30 PM

#

It is completely legal

#

But I pmed you anyway

dusky urchin Jan 29, 2024, 8:30 PM

#

shut relic Or is it purely psychological

actual dreambooth, and the objects inside kohya's scripts (which are essentially formulaic), do use captions with regularization, i.e. dreambooth training, i.e. fine-tuning the conditioning together with the unet

#

i don't know what happens when you modify a config.toml, or what you put in which directories, because i don't use kohya's scripts that way

#

let's step back a bit though. your results will be crap if your dataset is crap

#

end of story

shut relic Jan 29, 2024, 8:31 PM

#

What makes a dataset good

dusky urchin Jan 29, 2024, 8:32 PM

#

is your goal to flexibly present a non-celebrity person in casual instagram-style generations?

shut relic Jan 29, 2024, 8:32 PM

#

no

dusky urchin Jan 29, 2024, 8:32 PM

#

what is your goal

#

☝️ gotchya

#

jk

shut relic Jan 29, 2024, 8:33 PM

#

you did

dusky urchin Jan 29, 2024, 8:33 PM

#

lol

shut relic Jan 29, 2024, 8:33 PM

#

Difficult question to answer

dusky urchin Jan 29, 2024, 8:33 PM

#

okay well let me talk about a real goal of mine, that's actually super hard

#

so maybe that's more interesting

#

one thing i am trying to do is introduce the concept of a place to diffusers. you should be able to express with words a subplace of a place - for example, "behind the big ben", and correctly get a shot from behind the big ben.

#

that's an easy one, right? the thing is fucking symmetrical

#

how about "from bathroom in long hall in cs_office"

shut relic Jan 29, 2024, 8:36 PM

#

Ok

#

That's a good example

dusky urchin Jan 29, 2024, 8:36 PM

#

someone who has played counterstrike, incredibly, can visualize exactly everywhere, in all the POVs, from inside cs_office

#

and the words are more than sufficient to narrow things down to a very small set of valid POVs

#

and you can close your eyes and get coherence between these places and shots without motion

shut relic Jan 29, 2024, 8:38 PM

#

Is the lora supposed to place a person behind the big ben, or in any arbitrary place near the big ben?

dusky urchin Jan 29, 2024, 8:38 PM

#

so the dataset will probably need 10,000-100,000 images of POV shots of cs_office. for regularization, i would show it a vast amount of other locations, but NOT offices

dusky urchin Jan 29, 2024, 8:39 PM

#

shut relic Is the lora supposed to to place a person behind the big ben, or in any arbitrar...

i think a lora for monuments is actually really easy, since they are already iconic

#

and that's why 30 images of the "big ben" would be sufficient. anyway, it already has a ton of big bens in the dataset

#

the way a dreambooth fine tuning would help is to make all towers you generate look like the big ben

zenith delta Jan 29, 2024, 8:40 PM

#

@shut relic eyo

#

Can you check pm?

shut relic Jan 29, 2024, 8:40 PM

#

you didn't pm me

#

Oh you did

#

It's just barely visible

dusky urchin Jan 29, 2024, 8:42 PM

#

it could also help with generating unaesthetic images of the big ben, especially ones that albumentations doesn't do

#

for example, dall-e3, which is trained on incredible amounts of synthetic data, can't do negative space due to its aesthetic synthetic data

#

it CAN do white space when you ask it to make creative art

#

this is to say that everything is dependent on your training data

#

i should say that it struggles with making white space or similar concepts when you ask for it

#

they created a huge amount of synthetic data for localization concepts, and you'll see that you'll create a lot of flawed situations if you take it even slightly outside the bounds of the synthetic data

zenith delta Jan 30, 2024, 5:02 PM

#

Hey continuing from yesterdays talk

#

Reg images are definitely not ''what you don't want to generate''

#

Had like 6 different attempts since yesterday

#

And the best result is when I generated images that looked somewhat like myself using an anime model for the reg images

#

For context I'm trying to train a lora on my likeness, for anime generation purposes

dusky urchin Jan 30, 2024, 9:47 PM

#

zenith delta For context I'm trying to train a lora on my likeness, for anime generation purp...

regularization images are a strategy from dreambooth to improve generalization in conditioning. use regularization images paired with text prompts that are unrelated to your target training concept (like your likeness) and which you do not explicitly want associated with your concept.

this is confusing because the community goal is almost always "make my likeness in single character anime card portrait illustrations" or "make this person appear in instagram style casual photographs"

let's say your goal is to train the likeness of some person who doesn't appear in the training set at all. Let's say this person is Michelle Williams. you want to generate casual instagram images of this person.

let's say you have basically 3 photos of michelle williams wearing different outfits.

all of your captions should be "michelle williams, a woman wearing" followed by a bit of mundane details about the outfits.

you should not use regularization at all!

you actually do want all women who are ever generated by stable diffusion to look like michelle williams, because your goal is to generate instagram casual single person portraits.

you don't want regularization for the outfits because wherever those outfits appear on other women, you want them to "morph" into whatever meager appearance / fit you have for the three michelle williams examples.

does this make sense?

#

so @zenith delta in your concrete example you should not use regularization at all, and you should most definitely not use the regularization github repo which will really achieve the opposite of what you want to achieve

#

since you will always be rendering your likeness as anime, you want all likenesses to look like you. if you want someone else's likeness, you would disable your lora.

shut relic Jan 30, 2024, 10:52 PM

#

zenith delta Hey continuing from yesterdays talk

How did using sd 1.5 as base go

shut relic Jan 30, 2024, 10:55 PM

#

zenith delta Reg images are definitely not ''what you don't want to generate''

I wonder if there's a modified variant of lora that calculates loss to maximize similarity to the training data and minimize similarity to negative data

stiff dust Jan 31, 2024, 9:18 AM

#

shut relic I wonder if there's a modified variant of lora that calculates loss to maximize ...

yes, concept slider loras are implemented that way

shut relic Jan 31, 2024, 9:18 AM

#

stiff dust yes, concept slider loras are implemented that way

could you point me towards a resource i could learn from?

#

aha i found this https://github.com/rohitgandikota/sliders
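For reference, here is a simplified form of the training target used by the Concept Sliders repository linked above, as I understand the accompanying paper. The paper also sums over "preservation" concepts, which are omitted here. The LoRA-adapted model $\varepsilon_{\theta^*}$ is trained with a mean-squared-error loss toward a target built from the frozen base model $\varepsilon_\theta$. That target is shifted toward a positive prompt $c_+$ and away from a negative prompt $c_-$, with strength $\eta$, for a target prompt $c_t$:

```latex
\varepsilon_{\theta^*}(x_t, c_t, t) \;\leftarrow\; \varepsilon_\theta(x_t, c_t, t)
  + \eta \,\big[\, \varepsilon_\theta(x_t, c_+, t) - \varepsilon_\theta(x_t, c_-, t) \,\big]
```

In other words, the "toward positive, away from negative" part is a guidance direction between two prompts that gets distilled into the LoRA. It is not a per-image contrastive loss over a negative dataset.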

grizzled wraith Jan 31, 2024, 9:29 AM

#

Fine

zenith delta Jan 31, 2024, 9:26 PM

#

dusky urchin so @zenith delta in your concrete example you should not use regularizat...

@stiff dust Exactly came down to this! So in my case, whenever I trained without reg images, the style would go from anime to 3D model. So reg images were a must.

#

But what I did was; used the LORA I trained on my partners likeness with low weight to generate anime style images that looked "somewhat" like her.

#

Then I repeated the training, using those images as the reg images. I also changed a few parameters, and voila! It worked wonders!

#

I'll now do the same thing with my own likeness! If anyone is interested in the details, feel free to pm me!

dusky urchin Jan 31, 2024, 9:30 PM

#

zenith delta @stiff dust Exactly came down to this! So in my case, whenever I didnt...

why not use IP adapter though? it's purpose built for this use case

#

you can use a community anime checkpoint instead of an anime lora - they are the same thing in terms of what the process is, but the lora is a way to save money and time, not a way to get better results

zenith delta Jan 31, 2024, 10:22 PM

#

IP Adapter doesn't work well for me in most cases 😦 I guess I'm not that good with ControlNet

zenith delta Jan 31, 2024, 10:22 PM

#

dusky urchin you can use a community anime checkpoint instead of an anime lora - they are the...

I'm doing exactly that. I've trained the LORAs on my partner's and my own likeness and use community models to make generations.

stiff dust Jan 31, 2024, 10:42 PM

#

zenith delta Then I repeated the training, using those images as the reg images. I also chang...

interesting. It might be that training on close-but-different reg images helps CFG, since CFG works by amplifying the difference between the conditioned prediction (your character) and the unconditioned one (general human faces). So the closer the prior is to the real data, the better CFG might work. That's also why a small amount of caption dropout might improve results.
Definitely something to research deeper!
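The CFG reasoning above can be made concrete. At sampling time, classifier-free guidance combines the conditional and unconditional noise predictions as eps = eps_uncond + s * (eps_cond - eps_uncond). Caption dropout during training is what keeps the unconditional prediction meaningful. This is a minimal sketch with plain lists standing in for noise tensors, and the function names are illustrative. It does not show whether close-but-different reg images actually help CFG. That remains the open question raised above.

```python
import random
from typing import List

def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: start from the unconditional prediction and
    move `scale` times along the (conditional - unconditional) direction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def maybe_drop_caption(caption: str, dropout_rate: float, rng: random.Random) -> str:
    """Caption dropout: with probability `dropout_rate`, train on an empty
    caption so the model also learns the unconditional prediction CFG needs."""
    return "" if rng.random() < dropout_rate else caption

# Toy 2-element "noise predictions" standing in for real UNet outputs.
print(cfg_combine([0.0, 1.0], [1.0, 1.0], scale=7.0))  # [7.0, 1.0]
```

With `scale=1.0` the result is just the conditional prediction. Larger scales exaggerate whatever separates "your character" from the unconditional guess.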

dusky urchin Jan 31, 2024, 11:09 PM

#

stiff dust interesting. It might be that training on close-but-different reg images helps c...

i'm not sure. it sounds like it didn't have any effect at all, and something else changed. it doesn't make any sense

#

i think the user meant that there's a LORA that does "anime style"

#

and the regularization images weren't used at all, and it just trained more

#

@zenith delta if you had actually correctly used the regularization images in the manner you described you would get terrible results

proper shell Feb 1, 2024, 12:18 AM

#

i keep having the images mess up with extra body parts and nipples, i don't know how to fix it

jade junco Feb 1, 2024, 4:41 AM

#

Hi, I've been trying out this epicrealism_pureevolutionv5-inpainting inpainting model to generate realistic backgrounds for people and objects, however, I was also wondering if I could finetune this to make it work better for certain things like countertops. Does anybody have any advice on how to go about doing this?

zenith delta Feb 1, 2024, 7:19 AM

#

dusky urchin i think the user meant that there's a LORA that does "anime style"

Oh no. Let me explain with a little more detail.

#

What I wanted to do was to train a LORA on a real person's likeness that would be able to generate the trained person using a community anime checkpoint, without changing the style.

#

If I trained a LORA on a real person's likeness without changing any of the parameters, the model would quickly start to override the style of the checkpoint I was using.

#

So then I tried using regularization images. I was training on AnyLora, and therefore needed anime-styled images for my regs. I went ahead, generated the images, and started another training run.

#

This LORA was able to generate images without changing the style of the checkpoint I was using, but the likeness took a massive hit. This LORA was also trained with a really low rank and an alpha of 1.

#

I've tried multiple different settings at this point, but none really helped me.

#

Yesterday I tried generating some images with the LORA I trained on my girlfriend's likeness without any reg images, using realistic LORA training parameters (high rank, high res). The style was a bit off, but by using really low weights, about 0.1-0.2, I was able to get some images that looked like my girlfriend and were also in the style I wanted.

#

I generated a whole regularization dataset this way, and then trained another LORA, once again on AnyLORA checkpoint. This time however; I increased the network rank from 32 to 128, as if I was training for a realistic model. I was relying on the reg images to not overtrain.
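Some context on the rank and alpha settings discussed here. In the usual LoRA formulation, a layer's update is ΔW = (alpha / rank) · B·A. As far as I know, kohya's scripts use the same alpha / rank scaling. Rank sets capacity and file size, and alpha / rank scales every update. Below is a minimal sketch. The 320-wide layer is only an example size. One side effect worth checking: if alpha stays at 1 while rank goes from 32 to 128, the scale drops 4x. That behaves a bit like lowering the learning rate.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update B @ A (alpha / rank convention)."""
    return alpha / rank

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer:
    A is (rank x d_in) and B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Example: a 320 -> 320 projection, rank 32 vs 128, alpha held at 1.
for rank in (32, 128):
    print(rank, lora_params(320, 320, rank), lora_scale(1.0, rank))
# 32 20480 0.03125
# 128 81920 0.0078125
```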

#

This final LORA turned out almost perfect! It generates perfectly stylized images with pretty high likeness for the anime style. I'll improve this LORA by training another LORA on the same images, but cropped to 512x768. Then I'll merge the 768x768 LORA with the 512x768 LORA to increase the flexibility.

#

The reg images had an effect for sure. When I used the same images and the same parameters without them, the LORA was already changing the style at epoch 4-5. The LORA I trained with the reg images works perfectly between epochs 18-20!

#

I don't want to share images from this LORA, as it is trained on my girlfriend's likeness. But I'll share images when I do the same training on my own likeness.

mighty rock Feb 1, 2024, 9:23 AM

#

I’m looking to fine-tune SDXL with LoRA. I’m wondering if I should go with TPU or GPU for this task. Can anyone tell me which one would give me better performance and be more cost-efficient?

stiff dust Feb 1, 2024, 10:36 AM

#

hm, super weird. To be honest, I've never had a problem training on faces for different styles, so all these workarounds you did shouldn't be necessary. I would train on the base model (not on the anime model) and then apply your lora to the anime model. Ranks of 128 are way too high in my opinion; I would try lower ranks, but I mostly have experience with SDXL. Also, you have to be careful with text encoder training, as it can overfit very quickly. Maybe the reason your style is taking over is that you train the text encoder too much. Try training only the unet (or use textual inversion in combination with unet training).

#

but anyways, if you are happy with your results that's also fine

dusky urchin Feb 1, 2024, 5:32 PM

#

mighty rock I’m looking to fine-tune SDXL with LoRA. I’m wondering if I should go wi...

even with the least optimized configuration, such as by using a huggingface space GPU accelerated with a drag-and-drop-to-lora script with prodigy, it would cost like $4 for 10000 steps. i wouldn't stress too much about the kopecks you'd save here or there

stone garden Feb 3, 2024, 12:04 AM

#

is dreambooth still the go-to for SD 1.5 finetuning?

#

checkpoint finetuning, not LoRA

latent charm Feb 3, 2024, 8:41 PM

#

Do I need to create captions for regularization images?

stiff dust Feb 3, 2024, 10:50 PM

#

yes

muted scaffold Feb 3, 2024, 10:51 PM

#

yes

dusky urchin Feb 4, 2024, 7:06 AM

#

why is pixart-a trained on so few images

latent charm Feb 4, 2024, 7:52 AM

#

less cost, might be

sacred grail Feb 4, 2024, 9:03 PM

#

I made a custom GPT in ChatGPT that gives good, decent-length descriptions. Try it out, and if you have any suggestions for what could be better, let me know 🙂

https://chat.openai.com/g/g-EQFzMkKHZ-image-descriptor

You can upload multiple images at once; you don’t need to add any text. It will just give you a zip file with the descriptions in the txt files 🙂
I think if we use this to make a big dataset and train a better captioner, we could get SD to the level of DALL-E 3
(You need ChatGPT Plus to use this.) Let me know if you want to help compile a good dataset.
You can caption up to 400 images every 3 hours with this due to ChatGPT limits

lusty stag Feb 5, 2024, 12:30 PM

#

sacred grail I made a custom GPT in ChatGPT that gives good, decent-length descriptions. Try ...

Can you make it on Hugging Face

#

How can I train a model with few images

sacred grail Feb 5, 2024, 6:12 PM

#

lusty stag Can you make it on Hugging Face

No, unfortunately this is a closed model because it's on OpenAI. If we gather enough captions from this we could fine-tune an open-source CogVLM model, but that’s not something I would be able to do alone: gathering the captions would take a long time (it needs a very large dataset, probably 400k or more) and fine-tuning the model would cost too much money for me

hot breach Feb 6, 2024, 6:34 PM

#

cogvlm is outstanding as it is

warm agate Feb 7, 2024, 8:50 PM

#

hot breach cogvlm is outstanding as it is

Yup, it's really good, better than ShareGPT4V

#

But cogvlm is kinda GPU intensive

hot breach Feb 7, 2024, 8:51 PM

#

yes, but it's sort of one of those one-time costs

warm agate Feb 7, 2024, 8:55 PM

#

hot breach yes, but it's sort of one of those one-time costs

Yea true, but captioning millions is kinda tough

hot breach Feb 7, 2024, 8:56 PM

#

training millions is as well

thorny gazelle Feb 7, 2024, 8:57 PM

#

question: why are a lot of my samples in orientations that don't reflect my dataset? all of them are sideview shots but i'm getting front views in the samples...

warm agate Feb 7, 2024, 8:57 PM

#

hot breach training millions is as well

It would be pretty costly though

#

Cuz captioning each image on a 3090/4090 would take around 10sec

hot breach Feb 7, 2024, 8:58 PM

#

1 beam 4bit is ~4s on an Ada card, maybe 6-7 on a 3090?

#

but yeah it's not fast

warm agate Feb 7, 2024, 8:59 PM

#

hot breach 1 beam 4bit is ~4s on an Ada card, maybe 6-7 on a 3090?

Captioning 1 million would take around 4 months
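The back-of-the-envelope numbers in this exchange are easy to check. Here is a minimal sketch. It assumes a constant per-image time, no loading or I/O overhead, and perfectly parallel GPUs. The function is illustrative.

```python
SECONDS_PER_DAY = 86_400

def captioning_days(num_images: int, seconds_per_image: float, num_gpus: int = 1) -> float:
    """Wall-clock days to caption `num_images`, assuming a constant per-image
    time and perfectly parallel GPUs (no loading or I/O overhead)."""
    return num_images * seconds_per_image / num_gpus / SECONDS_PER_DAY

for secs in (10, 6.5, 4):
    print(f"{secs} s/image -> {captioning_days(1_000_000, secs):.1f} days")
```

At 10 s/image, one million images takes about 115.7 days (roughly 3.8 months), which matches the ~4 month estimate. At the ~4 s/image quoted for an Ada card, it takes about 46 days. The ChatGPT limit mentioned earlier in the thread (400 images per 3 hours, i.e. 27 s/image) works out to 125 days for a 400k dataset.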

hot breach Feb 7, 2024, 9:00 PM

#

not much you can do with 1 million images and just one 3090

warm agate Feb 7, 2024, 9:04 PM

#

hot breach not much you can do with 1 million images and just one 3090

Yea, 1million images aren't really useful

#

Can I DM?

thorny gazelle Feb 7, 2024, 10:10 PM

#

is this over training?

thorny gazelle Feb 8, 2024, 6:17 AM

#

are these good dataset images? 512

#

I have 54 images so far

#

they are all sideview, and i've tried a set without skids and it gave me images with skids and a front view...

tame vortex Feb 8, 2024, 10:59 AM

#

maybe incorrect labeling then ?

#

Also there are probably already too much stuff linked to "helicopter" in the model already. Maybe try labelling your helicopters as MLGCopter or something like that.

#

just a thought

hot breach Feb 8, 2024, 4:15 PM

#

things like helicopters and airplanes are pretty rough to train, maybe due to all the appendages they have, but trying to just do the side shots is smart, if your screenshot is truly representing that fact

thorny gazelle Feb 8, 2024, 4:23 PM

#

ok it was in fact my keywords getting influenced by 1.5 models

#

also how do you use this graph? when it flattens out what does that mean when training?

#

using constant

#

does it mean that after 30 steps it's not learning???

hot breach Feb 8, 2024, 4:25 PM

#

learning rate is an input not an output
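Suppose the plotted curve is the learning rate, as this reply suggests. Then a flat line is simply what a constant schedule looks like. If it only flattens after ~30 steps, a short warmup is one plausible cause. To judge whether the model is still learning, look at the loss and, better, at sample images. Below is a minimal sketch of two common schedules. The function name and warmup shape are illustrative, not any specific trainer's implementation.

```python
import math

def lr_at(step: int, base_lr: float, total_steps: int,
          schedule: str = "constant", warmup_steps: int = 0) -> float:
    """Learning rate a scheduler would hand the optimizer at `step`.
    It is a chosen hyperparameter (an input), not a measure of progress."""
    if step < warmup_steps:
        # Linear warmup from near 0 up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    if schedule == "constant":
        return base_lr
    if schedule == "cosine":
        # Decay from base_lr to 0 over the post-warmup steps.
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule!r}")

# Constant schedule with a 30-step warmup: it ramps up, then is flat by design.
curve = [lr_at(s, 1e-4, 1000, warmup_steps=30) for s in range(1000)]
```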

#

my guess is you'll simply be challenged to get good helicopters, but I'd probably worry more about your data than anything, both the actual images and the captions you use. Add more data and experiment with how you label them.

#

you are training SD1.5?

#

lora or fine tune?

thorny gazelle Feb 8, 2024, 4:30 PM

#

i'm doing a model; right now i'm using a small dataset with 5 images just to play around with keywords. i'm using dreambooth with 1.5 as a base

#

so far im getting better results

hot breach Feb 8, 2024, 4:31 PM

#

better results compared to what?

thorny gazelle Feb 8, 2024, 4:32 PM

#

better results with the unique keywords.

#

#

more sideviews. this is a 50 epoch model

hollow spruce Feb 8, 2024, 11:50 PM

#

https://github.com/jhc13/taggui

(its been recommended here once before)
Can vouch for this. good tagging tool + does cogvlm and other vlms really well on windows

tiny heron Feb 9, 2024, 3:05 PM

#

Is it possible to train a style with an alpha channel?

true flint Feb 9, 2024, 6:57 PM

#

Does anybody know why it could be that Stable Diffusion seems to change the seed every few frames?

dusky urchin Feb 9, 2024, 9:09 PM

#

tiny heron Is it possible to train a style with an alpha channel?

What is your objective?

#

You would need a lot of data, computational resources and patience

dusky urchin Feb 9, 2024, 9:10 PM

#

thorny gazelle are these good dataset images? 512

What is your goal?

thorny gazelle Feb 9, 2024, 9:11 PM

#

my goal was to create sideview concept art for helicopters

#

I've gotten good results after changing the keywords to be more unique, so that they didn't call on the base 1.5 model, which would add dross I didn't want

dusky urchin Feb 9, 2024, 9:18 PM

#

thorny gazelle my goal was to create sideview concept art for helicopters

But these are photographs and renders of helicopters

thorny gazelle Feb 9, 2024, 9:20 PM

#

i'm not seeing the error, could you explain further why this is a poor dataset?

dusky urchin Feb 9, 2024, 10:55 PM

#

thorny gazelle i'm not seeing the error, could you explain further why this is a poor dataset?

concept art usually means silhouettes, sketches and such

#

and something creative and zany

#

things like the motion blurred rotor do not achieve your goal

#

and you're training on a lot of motion blurred rotors and stuff that's on green

tiny heron Feb 9, 2024, 10:59 PM

#

dusky urchin What is your objective?

img2img, I wanna convert images to that style

thorny gazelle Feb 9, 2024, 11:00 PM

#

yeah I can understand the blur being bad. I can disable blur on the renders, but most IRL images are taken in flight; on the ground the rotors sag and ruin the fuselage. The concept art can be done by hand in Blender or GIMP, but the point of this concept art dataset is just to create ideas

#

I'd also want to try blueprint 3-views or other similar vector drawings

dusky urchin Feb 9, 2024, 11:09 PM

#

thorny gazelle yeah I can understand the blur being bad. I can disable blur on the renders but ...

what precisely are you trying to train for? like what do you want it to learn?

#

training on 50 images will make it generalize less compared to the base model, not more

#

and be less creative

thorny gazelle Feb 9, 2024, 11:10 PM

#

engine placement, fuselage shape, cockpits, landing gear, tail boom, rotor system

dusky urchin Feb 9, 2024, 11:10 PM

#

much rougher right?

#

so like very creative

#

something more crazy like this?

thorny gazelle Feb 9, 2024, 11:18 PM

#

mmm kind of. I'm looking at it more as a tool, used recursively: you choose a few rough designs and expand on them in a photo editor, picking out features and exchanging them, and maybe later focus more on detail if need be. Having detailed panel lines is not a huge concern for me, though; more realistic in nature and less outlandish

dusky urchin Feb 9, 2024, 11:18 PM

#

"more realistic in nature and less outlandish"
okay i think you gotta firm up your brief lol

#

it sounds like you don't know what you want

#

you want your exact helicopter on green background renderings, except "better"

thorny gazelle Feb 9, 2024, 11:19 PM

#

hmm I want it to take those photos and mash a totally new design that looks like it would have been manufactured

dusky urchin Feb 9, 2024, 11:20 PM

#

thorny gazelle hmm I want it to take those photos and mash a totally new design that looks like...

okay, so you want the opposite of concept art

#

you want photorealistic side view shots of helicopters that you plan to dissect and kitbash

#

into a new helicopter side view photorealistic shot, but fundamentally a pretty conventional one

#

a photocollage

#

it's too bad because i really like the concept art helicopter pipeline i just made

thorny gazelle Feb 9, 2024, 11:25 PM

#

not against other perspective shots, but I think that focusing on orthographic views are more controlled

dusky urchin Feb 9, 2024, 11:26 PM

#

you basically want this

#

with maybe more variety in the helicopter body plan

#

@thorny gazelle does that seem right?

#

it's just hard to reconcile creative and something you can kitbash, but anyway, i think you should try to achieve this some other way. a lora will not help you. the most challenging thing for it to learn is "side view"

thorny gazelle Feb 9, 2024, 11:40 PM

#

yeah, more like what i'm after. i'm also not making a lora, just a 1.5 model, since dreambooth hates lora or I haven't found a way to make it work... I need to look into kohya ss when I get more time. i've made my own unique keywords that are unfamiliar to the base 1.5; so far it's given me sideview shots since that's all it knows. sideview is also the only keyword that is not unique

dusky urchin Feb 9, 2024, 11:41 PM

#

thorny gazelle yeah more like what im after, im also not making lora. just a 1.5 model since dr...

it's too bad discord strips metadata

thorny gazelle Feb 9, 2024, 11:42 PM

#

such as?

dusky urchin Feb 9, 2024, 11:42 PM

#

but here's all you need to reproduce the workflow

📎 workflow6.json

#

noise in clip gives you creative concepts. noise in latents gives you creative silhouettes
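The linked workflow isn't reproduced here, because the JSON isn't visible in this log. The idea in the message above can still be sketched: perturb the text conditioning for concept variety, and perturb the initial latent for silhouette variety. Plain lists stand in for the CLIP embedding and the latent tensor. `perturb` and its parameters are hypothetical, and how well this works depends on the strengths you pick.

```python
import random
from typing import List, Tuple

def add_gaussian_noise(values: List[float], strength: float, rng: random.Random) -> List[float]:
    """Return a copy of `values` with independent N(0, strength^2) noise added."""
    return [v + rng.gauss(0.0, strength) for v in values]

def perturb(text_embedding: List[float], init_latent: List[float],
            text_strength: float, latent_strength: float,
            seed: int) -> Tuple[List[float], List[float]]:
    """Hypothetical helper. Per the message above: noise on the text conditioning
    varies *what* gets drawn (concepts); noise on the starting latent varies the
    *layout* (silhouettes). Same seed -> same perturbation, so results are repeatable."""
    rng = random.Random(seed)
    return (add_gaussian_noise(text_embedding, text_strength, rng),
            add_gaussian_noise(init_latent, latent_strength, rng))
```

In a real pipeline the two inputs would be the text-encoder output and the initial latent tensor. Sweeping `seed` while holding the strengths fixed gives a batch of variations to pick from.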

thorny gazelle Feb 9, 2024, 11:46 PM

#

i haven't gone too far into the rabbit hole, what am I supposed to use this json file in?

dusky urchin Feb 9, 2024, 11:48 PM

#

#🧣｜comfy-ui

thorny gazelle Feb 9, 2024, 11:51 PM

#

oh ok, I haven't installed comfy yet, just auto1111. thanks for your time btw

sullen nebula Feb 10, 2024, 3:35 AM

#

Hello, does anyone know how to become an authorized user for finetuning on Stability ais API?

hardy geyser Feb 10, 2024, 3:05 PM

#

does anyone have a tutorial for fine-tuning xl with hundreds/thousands of images and multiple prompts?

bold ether Feb 11, 2024, 12:53 AM

#

I want to show my girlfriend the power of ai because she doesn't believe it can be that good. so she challenged me to make realistic pictures of her that are nsfw, but she doesn't want me to use her nude pictures to train the ai for privacy reasons

dusky urchin Feb 11, 2024, 1:03 AM

#

hardy geyser anyone has a tutorial for finetuning xl with hundreds/thousands of images and mu...

do you have the hardware or budget to do that

#

and have you ever used python

bold ether Feb 11, 2024, 2:20 AM

#

Can I get help?

dusky urchin Feb 11, 2024, 2:28 AM

#

bold ether Can I get help?

i think there are a lot of guides to help you do what you want to do

bold ether Feb 11, 2024, 2:35 AM

#

dusky urchin i think there are a lot of guides to help you do what you want to do

But where? I am very new to stable diffusion

open summit Feb 11, 2024, 2:36 AM

#

^ can you two continue this convo in DM? We try not to promote/chat on any NSFW on the server

bold ether Feb 11, 2024, 2:39 AM

#

okay

shy basalt Feb 11, 2024, 2:39 AM

#

Can I get some recommended settings for training a subject LoRA in Kohya? I am having a hell of a time with this.

#

I think I burned my cookies.

bold ether Feb 11, 2024, 2:41 AM

#

dusky urchin i think there are a lot of guides to help you do what you want to do

It says I can't dm you

dusky urchin Feb 11, 2024, 2:49 AM

#

shy basalt Can I get some recommended settings for training a subject LoRA in Kohya? I am h...

have you had any success generating images well?

#

before you jump into training

shy basalt Feb 11, 2024, 2:50 AM

#

Yes. For a few years.

dusky urchin Feb 11, 2024, 2:50 AM

#

hmm

#

what are you trying to do?

shy basalt Feb 11, 2024, 2:51 AM

#

Train a lora to give me camera-headed people. I've harvested a ton of images from Dall-E because it understands the prompt fairly well, photoshopped a few, and gently reprocessed and blended everything in juggv6.

#

Got a few over 100 images, learning rate 5e-5

#

40 epochs, but I'll just stop it once it starts working.

dusky urchin Feb 11, 2024, 2:52 AM

#

okay, do you want to turn every person into a camera headed person?

shy basalt Feb 11, 2024, 2:52 AM

#

Yep.

#

I'm not using regularisation images. Could that be a factor?

dusky urchin Feb 11, 2024, 2:59 AM

#

shy basalt I'm not using regularisation images. Could that be a factor?

you can try doing regularization images of actual cameras

#

regularization images of people would be the opposite of what you want

#

because it looks like most of the flaws are in the camera itself

#

so you don't want it to learn dall-e3's flawed representation of cameras

#

you should also consider a full fine tune instead of a lora. if you have patience and a 24gb+ card

shy basalt Feb 11, 2024, 3:01 AM

#

I'm not sure what that would entail. I am a slow learner, but I do have that big GPU energy.

bold ether Feb 11, 2024, 3:49 AM

#

Is anyone able to help me in a DM or guide me somehow? I don't know what I am doing

dusky urchin Feb 11, 2024, 6:07 AM

#

shy basalt I'm not sure what that would entail. I am a slow learner, but I do have that big...

i think you should start with some regularization images of real photographs of cameras. you should also improve all your prompts to include mundane details

#

you should read a lot of coco-style captions

#

so you can see what CLIP was actually trained on

hardy geyser Feb 11, 2024, 6:20 AM

#

dusky urchin do you have the hardware or budget to do that

sure, i can code 👍

dusky urchin Feb 11, 2024, 6:54 AM

#

hardy geyser anyone has a tutorial for finetuning xl with hundreds/thousands of images and mu...

have you successfully generated complex images using something like comfyui?

#

i don't mean loading a workflow that someone else authored. i mean like, completing a brief

hardy geyser Feb 11, 2024, 6:54 AM

#

never used comfy

dusky urchin Feb 11, 2024, 7:08 AM

#

hmm

#

i think this is going to be a stretch

#

you have to learn what all the parameters actually mean, and what is going on

#

if you wanna do something innovative

#

if you can find a guide for exactly what you want, great

stiff dust Feb 11, 2024, 10:54 AM

#

bold ether I want to show my girlfriend the power of ai because she doesn't believe it can ...

wtf, dude. This story is so fucked up. This should not be a place where people learn how to make fake porn -_-

ornate ruin Feb 11, 2024, 12:07 PM

#

dusky urchin it's too bad discord strips metadata

discord hasn't been stripping metadata from images for almost a year btw.

bold ether Feb 11, 2024, 3:31 PM

#

stiff dust wtf, dude. This story is so fucked up. This should not be a place where people l...

Hey, we are into what we are into. If there is a better place please let me know

stiff dust Feb 11, 2024, 3:37 PM

#

Psychiatry? Jail?
I don't care if you want to generate porn, but don't abuse other people by putting their face into porn.

bold ether Feb 11, 2024, 3:39 PM

#

She literally asked for it

dusky urchin Feb 12, 2024, 12:27 AM

#

shy basalt Can I get some recommended settings for training a subject LoRA in Kohya? I am h...

maybe just augment your dataset with a lot of real images of cameras

hollow spruce Feb 12, 2024, 7:17 AM

#

shy basalt Got a few over 100 images, learning rate 5e-5

a bit late to reply, but if you up your dataset to roughly 400 images, then you can just brute force the lora. at 400 images, the settings start to become a bit less important. just use adamw + 1e-4 unet lr, 5e-5 te lr + batch 7 + min snr 5 + offset noise 0.1 + dim 32 alpha 1
for fully automated tagging use cogvlm via this app:
https://github.com/jhc13/taggui

don't forget to add a trigger word to the whole thing, something like "camhead" which doesn't have any highly biased words in it

save every 5 epochs. epochs 45~60 will be your target epochs for the finished lora
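A minimal sketch of the settings above written out as kohya sd-scripts arguments, plus the step arithmetic behind "save every 5 epochs, target epochs 45~60". The flag names are my reading of sd-scripts (e.g. sdxl_train_network.py) and should be checked against `--help` for your version; the 400-image dataset and 1 repeat per image are assumptions for the example.

```python
import math

# Settings from the message above, expressed as kohya sd-scripts argument names.
# Flag names are an assumption based on sd-scripts; verify against your install.
SETTINGS = {
    "network_module": "networks.lora",
    "optimizer_type": "AdamW",
    "unet_lr": 1e-4,          # 1e-4 = 0.0001
    "text_encoder_lr": 5e-5,  # 5e-5 = 0.00005
    "train_batch_size": 7,
    "min_snr_gamma": 5,
    "noise_offset": 0.1,
    "network_dim": 32,
    "network_alpha": 1,
    "save_every_n_epochs": 5,
    "max_train_epochs": 60,
}


def to_cli_args(settings: dict) -> list[str]:
    """Render a settings dict as --key=value command-line flags."""
    return [f"--{key}={value}" for key, value in settings.items()]


def steps_per_epoch(num_images: int, batch_size: int, repeats: int = 1) -> int:
    """Optimizer steps per epoch; a partial last batch still counts as one step."""
    return math.ceil(num_images * repeats / batch_size)


def saved_epochs(every: int, max_epochs: int) -> list[int]:
    """Epochs at which a checkpoint is written when saving every `every` epochs."""
    return list(range(every, max_epochs + 1, every))


if __name__ == "__main__":
    per_epoch = steps_per_epoch(400, SETTINGS["train_batch_size"])
    candidates = [e for e in saved_epochs(5, 60) if 45 <= e <= 60]
    print(to_cli_args(SETTINGS))
    print(per_epoch, [(epoch, epoch * per_epoch) for epoch in candidates])
```

With 400 images and batch 7 that works out to 58 steps per epoch, so the candidate checkpoints at epochs 45, 50, 55 and 60 sit at roughly 2,610 to 3,480 steps.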

hollow spruce Feb 12, 2024, 7:29 AM

#

bold ether Hey, we are into what we are into. If there is a better place please let me know

sdxl doesn't do nudity, since it was censored. Discussion of how to get around this censoring isn't allowed on this server. There are guides, but this discord isn't the place to ask about it, since it's the official workplace for stability.ai staff.

#

ah, my bad. didn't see that fruit already replied earlier

livid rapids Feb 12, 2024, 8:05 AM

#

Does anyone know how to disable RAM usage when VRAM is maxed on NVIDIA cards? I remember reading something about that being an option on the latest driver update but can't find it.

normal ember Feb 12, 2024, 9:32 AM

#

livid rapids Does anyone know how to disable RAM usage when VRAM is maxed on NVIDIA cards? I ...

https://nvidia.custhelp.com/app/answers/detail/a_id/5490

livid rapids Feb 12, 2024, 7:52 PM

#

normal ember https://nvidia.custhelp.com/app/answers/detail/a_id/5490

Thank you

lethal osprey Feb 13, 2024, 2:05 AM

#

Hello. I am new to the stable diffusion world and I tried making a lora of an art style but nothing I do seems to make it work. I used about 234 images from an artist and the style doesn't come through.

lethal osprey Feb 13, 2024, 2:24 AM

#

I am using Anime Art Diffusion XL checkpoint and this is what I get
I am trying to get the art style of My700 with the lora I trained

jade hornet Feb 13, 2024, 3:03 AM

#

hollow spruce a bit late to reply. but if you up your dataset to roughly 400 images, then you ...

genuinely curious why you recommend adamw over an adaptive optimizer.

hollow spruce Feb 13, 2024, 6:26 AM

#

jade hornet genuinely curious why you recommend adamw over an adaptive optimizer.

the short answer: to avoid overfitting, and achieve the best lora I can possibly make

For context, all of this refers only to sdxl.

the long answer: this will get a bit technical...

we're gonna separate this into 3 groups.
• AdamW + offshoots like AdamW8bit
• adafactor + similar adaptive ones like dadapt
• prodigy

Resources
• AdamW + constant does the math directly, and correctly, with not nearly as much approximation going on. The downside is that it uses more VRAM: it basically sets a barrier to entry of 16gb vram, and is used most efficiently with 24gb vram
• Adafactor does a lot of approximation, hence requiring less VRAM. With every vram-saving technique that exists applied, you can get the barrier to entry down to 8gb vram if you're willing to only do style loras, or 10gb vram for any kind of lora.
• Prodigy gets complicated, since you can get the vram requirements down, but doing so essentially defeats the point of using prodigy in the first place. If your goal is to avoid overfitting, then prodigy has a barrier to entry of 24gb vram. If you use the methods that require less vram, then you're better off switching to adamW, since you'll get significantly better results with little more effort. Ideally you want a shit ton of resources if you wanna use prodigy efficiently (like 40 or 80gb vram)
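To put rough numbers on the "approximation vs. VRAM" point, here's a sketch of optimizer-state size for one 2-D weight matrix. It assumes fp32 state, AdamW keeping two full-size moment buffers, and Adafactor in its factored mode with no first-moment buffer (beta1 = None). Weights, gradients and activations are not counted, and with a LoRA only the small adapter matrices carry optimizer state, so real-world VRAM savings are much smaller than this ratio suggests.

```python
def adamw_state_bytes(rows: int, cols: int, bytes_per_value: int = 4) -> int:
    """AdamW keeps two full-size buffers per weight: exp_avg and exp_avg_sq."""
    return 2 * rows * cols * bytes_per_value


def adafactor_state_bytes(rows: int, cols: int, bytes_per_value: int = 4) -> int:
    """Factored Adafactor (beta1=None) keeps only per-row and per-column
    running averages of the squared gradient."""
    return (rows + cols) * bytes_per_value


# A 1280x1280 matrix, and a rank-32 LoRA down matrix for the same layer.
for shape in [(1280, 1280), (32, 1280)]:
    adamw = adamw_state_bytes(*shape)
    adafactor = adafactor_state_bytes(*shape)
    print(shape, adamw, adafactor, round(adamw / adafactor))
```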

Conceptual Complexity
• AdamW - If you're teaching one single concept, then you won't have any downsides with AdamW. If you're teaching multiple concepts (26 in the case of my dnd lora), then you're best off with AdamW thanks to its consistency.
• Adafactor - if you want a quick and dirty lora, then adafactor will work just fine. If you care about the nuances of overfitting, then you'll quickly hit a ceiling, especially once you deal with more and more concepts.
• Prodigy - Can equal the quality of AdamW, without any of the knowledge required to make it work. The downside? An inhumane amount of resources used. If you use it with low vram settings, then you're just turning it into an adafactor alternative, at which point you could just use adafactor to begin with.

Actual Results
Theory crafting is all well and good, but results speak louder.
So I've trained the same lora, cross-testing just about every setting in kohya. I've done all my big loras in prodigy, adafactor, AdamW & AdamW8bit.
Going simply by results, AdamW wins every time. Prodigy also works consistently and I have nothing bad to say about it, though I've stopped using it since I can't exactly tell people to get more vram, while with adamW my techniques at least work in 16gb vram environments.
Adafactor loses every time, and I only ever recommend it if you're running on 10~12gb vram environments

Misinformation
So most tutorials recommend adafactor, which can be traced back to the first tutorials during the sdxl 0.9 release, when SECourses made his youtube videos and declared his methods "the best", saying he tried all the settings and these work best. In fact he tried only a few settings on a single dataset of himself. Due to a fundamental misunderstanding of how network rank works, he arrived at the conclusion that adafactor works + net rank of 256. Both of which are the worst possible options, in general, but give the illusion of working. From there, a culture of using adafactor formed, since misinformation spreads fast. Nowadays it's getting better, but people are still using adafactor without knowing what differentiates it from the rest, and when to use it vs when not to use it.
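A quick sketch of what network rank (dim) and alpha do to a single linear layer, assuming the standard LoRA form W + (alpha / rank) · B·A, which as far as I know is how kohya's LoRA module scales its update. The 1280×1280 layer size is just an illustrative example, not a specific SDXL layer.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer:
    A (down) is rank x d_in, B (up) is d_out x rank."""
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the B @ A update: alpha / rank."""
    return alpha / rank


full_layer = 1280 * 1280  # parameters in the original weight matrix
for rank in (32, 256):
    added = lora_extra_params(1280, 1280, rank)
    print(f"rank {rank}: {added} params ({added / full_layer:.1%} of the layer), "
          f"scale with alpha 1 = {lora_scale(1, rank)}")
```

Rank 256 trains 8x as many adapter parameters as rank 32 for the same layer, which is more capacity to memorize a small dataset, and with a fixed alpha the per-step update is scaled down accordingly.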

#

#✨｜sdxl message
^ DND lora.
26 core concepts, which have been completely retaught
around 100 minor concepts, which have been merely influenced (like hands always having 5 fingers, pupils being round, etc.)

I attached a list of all the major concepts ("Indian" was taught as well, but doesn't show up in that list)
green wasn't taught, that's just for statistics, so I can verify I'm not accidentally teaching a bias towards any gender

#

It has a roughly 80% success rate, meaning if you generate 5 images from seeds 1~5, then 4 of those will be "actually usable" and would qualify to be printed and used in a game of DND. (All the images that I linked from general chat are from seed 1, just to make a point.)
All of this, on top of the default sdxl base. No checkpoints or other loras applied.

It's also not working by overfitting: once you add a new concept it wasn't trained on, it still works. And as a few friends already tested, it works with subject loras, translating any subject into a "dnd portrait" while keeping the style.

#

For full transparency, I attached the training settings, which can be used with Derrian's distro for lora training (just a different GUI for the kohya backend)

📎 auto_save_npcportrait_v2.toml

dusky urchin Feb 13, 2024, 7:35 AM

#

@shy basalt everything that @hollow spruce said aligns with my choice of parameters almost exactly. especially

at 400 images, the settings start to become a bit less important.
the better the dataset and the longer your patience, the less the settings matter.

dusky urchin Feb 13, 2024, 7:38 AM

#

hollow spruce the short answer: to avoid overfitting, and achieve the best lora I can possibly...

Due to a fundamental misunderstanding of how network rank works, he arrived at the conclusion that adafactor works + net rank of 256. Both of which are the worst possible options, in general, but give the illusion of working.
i also agree with this

#

i am surprised you haven't tried full fine tunes

#

if you have the patience, like a week, and the extra data needed to regularize

jade hornet Feb 13, 2024, 2:22 PM

#

hollow spruce the short answer: to avoid overfitting, and achieve the best lora I can possibly...

Thanks for taking the time for the detailed answer. I'll do more testing

viral jackal Feb 13, 2024, 9:53 PM

#

hey, why is the training script for Cascade just idling with no errors? been bashing my head here for a few hours trying to figure out what's going on.

dusky urchin Feb 13, 2024, 10:51 PM

#

viral jackal hey why is the training script for Cascade just idling with no errors? been bash...

why do you figure it can be trained on a 24GB card?

#

did you try attaching a debugger?

viral jackal Feb 13, 2024, 10:51 PM

#

dusky urchin why do you figure it can be trained on a 24GB card?

they explicitly state that it can be

#

and i have 8x3090s

dusky urchin Feb 13, 2024, 10:51 PM

#

and stepping through

viral jackal Feb 13, 2024, 10:52 PM

#

it does not move past step 1

dusky urchin Feb 13, 2024, 10:52 PM

#

it can be maybe LoRA fine tuned on a single card

viral jackal Feb 13, 2024, 10:52 PM

#

it just idles

dusky urchin Feb 13, 2024, 10:52 PM

#

can you maybe provide some more context

viral jackal Feb 13, 2024, 10:52 PM

#

sdxl can be tuned on a single 24gb card

#

the script runs and then just idles at step 1 for hours

#

on 8x3090s

dusky urchin Feb 13, 2024, 10:52 PM

#

additional cards do not make a model that requires X amount of VRAM to be trained trainable

#

you should know this

viral jackal Feb 13, 2024, 10:53 PM

#

i know

dusky urchin Feb 13, 2024, 10:53 PM

#

so

viral jackal Feb 13, 2024, 10:53 PM

#

they said it can be done on a card with 24gb

#

of vram

#

its not that

dusky urchin Feb 13, 2024, 10:53 PM

#

can you provide some more context for what you are trying to do? i am telling you it probably cannot be fine tuned on a single card in 24GB

#

hmm

#

well what version of torch are you using?

dusky urchin Feb 13, 2024, 10:54 PM

#

viral jackal they said it can be done on a card with 24gb

can you show me where?

viral jackal Feb 13, 2024, 10:54 PM

#

--find-links https://download.pytorch.org/whl/torch_stable.html
accelerate>=0.25.0
torch==2.1.2+cu118
torchvision==0.16.2+cu118
transformers>=4.30.0
numpy>=1.23.5
kornia>=0.7.0
insightface>=0.7.3
opencv-python>=4.8.1.78
tqdm>=4.66.1
matplotlib>=3.7.4
webdataset>=0.2.79
wandb>=0.16.2
munch>=4.0.0
onnxruntime>=1.16.3
einops>=0.7.0
onnx2torch>=1.5.13
warmup-scheduler @ git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git
torchtools @ git+https://github.com/pabloppp/pytorch-tools

#

https://vxtwitter.com/StabilityAI/status/1757444330757796350

Stability AI (@StabilityAI)

Stable Cascade is now available in research preview for non-commercial use. This innovative text to image model introduces a three-stage approach, featuring enhancements for fine-tuning and training efficiency with a focus on further eliminating hardware barriers.

#

featuring enhancements for fine-tuning and training efficiency with a focus on further eliminating hardware barriers.

dusky urchin Feb 13, 2024, 10:55 PM

#

it doesn't say anything about 24GB cards

#

are you trying to run train_c_lora.py?

viral jackal Feb 13, 2024, 10:56 PM

#

nope

dusky urchin Feb 13, 2024, 10:57 PM

#

i don't think a full fine tune will work on 24GB VRAM at 16bit. the regular RAM usage is suggestive that it requires a 48GB card

#

can you try the lora fine tuning instead?

#

wuerstchen stage C is designed to be trained on an A100 80GB
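Rough arithmetic behind the "won't fit in 24GB" argument, as a sketch: bf16 weights and gradients (2 bytes each) plus AdamW's two fp32 moment buffers (8 bytes) come to about 12 bytes per parameter before any activations; keeping fp32 master weights adds 4 more. The 3.6B parameter count for the large Stage C is an assumption from the model card as I remember it, so check it. Under plain data-parallel training every GPU holds its own full copy of all of this, which is why adding more 3090s doesn't lower the per-card requirement.

```python
# Bytes per parameter for full fine-tuning with AdamW (activations not included).
BYTES_BF16_WEIGHTS = 2
BYTES_BF16_GRADS = 2
BYTES_ADAMW_STATE_FP32 = 8  # exp_avg + exp_avg_sq, 4 bytes each
BYTES_FP32_MASTER = 4       # only if the trainer keeps fp32 master weights

STAGE_C_PARAMS = 3.6e9  # assumption: large Stage C; verify against the model card


def bytes_per_param(fp32_master: bool = False) -> int:
    total = BYTES_BF16_WEIGHTS + BYTES_BF16_GRADS + BYTES_ADAMW_STATE_FP32
    return total + (BYTES_FP32_MASTER if fp32_master else 0)


def full_finetune_gb(num_params: float, per_param: int) -> float:
    """Decimal gigabytes needed on EACH GPU under plain data parallelism."""
    return num_params * per_param / 1e9


print(round(full_finetune_gb(STAGE_C_PARAMS, bytes_per_param()), 1))
print(round(full_finetune_gb(STAGE_C_PARAMS, bytes_per_param(fp32_master=True)), 1))
```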

dusky urchin Feb 13, 2024, 11:00 PM

#

viral jackal nope

are you trying to run train_c.py?

viral jackal Feb 13, 2024, 11:00 PM

#

lmao i run out of memory for the lora

#

train_c? or tran_c?

dusky urchin Feb 13, 2024, 11:01 PM

#

"don't play games with me" as the president says

#

i think you are perhaps biting off more than you can chew

viral jackal Feb 13, 2024, 11:02 PM

#

? i specifically have hardware to train image models

#

so the model cannot be trained nor a lora with a 24gb card?

dusky urchin Feb 13, 2024, 11:02 PM

#

you have one of the worst possible setups for training image models

viral jackal Feb 13, 2024, 11:02 PM

#

i can train a 240B llm on the hardware

#

yeah i know

dusky urchin Feb 13, 2024, 11:02 PM

#

so...

viral jackal Feb 13, 2024, 11:02 PM

#

i was working within a budget

dusky urchin Feb 13, 2024, 11:03 PM

#

your deficits are in the programming department. the train_c script expects the job to be run against SLURM

viral jackal Feb 13, 2024, 11:03 PM

#

so it flat out cannot be trained on anything else than h100s or a100s?

dusky urchin Feb 13, 2024, 11:04 PM

#

so does train_c_lora

#

i don't know. i think you are "just" running the script without comprehending what it does

#

it's likely you can do a lora fine tuning on a 24GB card if you configure training for bf16 and use adamw (which is what it says)

viral jackal Feb 13, 2024, 11:05 PM

#

i am merely asking for some friendly advice and you're coming in here out of the gate lecturing me about my intelligence and making jabs at me?

#

really?

dusky urchin Feb 13, 2024, 11:06 PM

#

i'm sorry i do not want to sap your excitement

#

if you buy $4,000-$8,000 of GPUs it's worth comprehending all these issues

#

i think you should try using the train_c_lora script with only one CUDA device visible, and then configure your training for 16bit (16bit backward pass, 16bit optimizer state/gradients)

#

i haven't used hugging face's config from the file for a long time so i can't tell you exactly what to put in there
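A small sketch of the "only one CUDA device visible" part. CUDA_VISIBLE_DEVICES is the standard CUDA environment variable for this; setting it in the child process's environment hides the other GPUs, and the visible one shows up as cuda:0. The script path and config file in the usage comment are placeholders, and the 16-bit settings themselves live in the training config, which isn't shown here. From a shell the equivalent is `CUDA_VISIBLE_DEVICES=0 python <script> <config>`.

```python
import os
import sys


def single_gpu_env(device: str = "0") -> dict:
    """Copy of the current environment with only one CUDA device visible."""
    env = dict(os.environ)  # copy, so this process's environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = device  # the child sees this GPU as cuda:0
    return env


def build_command(script: str, config: str) -> list:
    """Command line for launching a training script with the current interpreter."""
    return [sys.executable, script, config]


# Usage (not executed here); script path and config name are placeholders:
#   import subprocess
#   subprocess.run(build_command("train/train_c_lora.py", "my_lora_config.yaml"),
#                  env=single_gpu_env("0"), check=True)
```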

viral jackal Feb 13, 2024, 11:15 PM

#

okay, memory was not the issue: even when it loads up to 14-15 on the card, it merely idles

#

not doing anything

dusky urchin Feb 13, 2024, 11:25 PM

#

you will have to step through in the debugger to see what is going on

dusky urchin Feb 13, 2024, 11:25 PM

#

viral jackal not doing anything

i don't know if observing just the side effects of the VRAM usage is going to be informative

#

do you know which line in train_c_lora you are observing a hang?
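One low-effort complement to a full debugger session, sketched with Python's built-in faulthandler module: arm a watchdog near the top of the training script and, if the process is still running after N seconds, it prints the Python stack of every thread, showing which line "step 1" is stuck on. The helper names are mine; the faulthandler calls are standard library.

```python
import faulthandler
import sys


def arm_hang_dump(seconds: float, repeat: bool = True, file=None) -> None:
    """Dump every thread's Python stack if the process is still alive after `seconds`.

    With repeat=True the dump recurs every `seconds`, so a run that sits at
    "step 1" shows which line it is blocked on (e.g. a dataloader, or a call
    waiting on other processes).
    """
    faulthandler.dump_traceback_later(
        seconds, repeat=repeat, file=file if file is not None else sys.stderr
    )


def disarm_hang_dump() -> None:
    """Cancel a pending dump, e.g. once training is visibly making progress."""
    faulthandler.cancel_dump_traceback_later()


# Usage near the top of the training script (hypothetical placement):
#   arm_hang_dump(300)  # print all stacks every 5 minutes while the run is alive
```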

lethal osprey Feb 14, 2024, 12:24 AM

#

lethal osprey I am using Anime Art Diffusion XL checkpoint and this is what I get I am trying...

Is anyone able to help me?

normal ember Feb 14, 2024, 3:06 PM

#

lethal osprey Is anyone able to help me?

screenshot of thumbnails of your dataset could help you get better answers

lethal osprey Feb 14, 2024, 3:17 PM

#

Unfortunately it is NSFW

buoyant violet Feb 14, 2024, 7:10 PM

#

Hey peeps, not sure if anyone might be interested, but I co-founded a start up that provides cheap and easy to use GPUs for AI training (https://www.tromero.ai/) I would be over the moon if anyone decides to check it out (first couple of hours of compute are free)!

slender lagoon Feb 15, 2024, 9:17 PM

#

anyone tried training a lora yet? how much vram did you end up needing on the 3b?

dusky urchin Feb 15, 2024, 11:42 PM

#

slender lagoon anyone tried training a lora yet? how much vram did you end up needing on the 3b...

24GB

lethal osprey Feb 16, 2024, 2:07 AM

#

Does anyone know how I can get absolute black skin? I am trying to make a bat character but it keeps having her with dark brown skin, I am looking for straight up black

#

I also need help getting pure black eyes

slender lagoon Feb 16, 2024, 3:31 PM

#

dusky urchin 24GB

I am on 24 and it was still ooming regardless of batch size, what was your config?

#

sorry by 3b I meant the new cascade model btw

#

not sure that was clear

slender lagoon Feb 16, 2024, 3:32 PM

#

lethal osprey Does anyone know how I can get absolute black skin? I am trying to make a bat ch...

google noise offset or look up the lora for it
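For context on the noise offset suggestion, a sketch of the trick as it's commonly described: during training, each latent channel's noise gets one extra random constant shift, scaled by the offset (kohya exposes this as noise_offset). Plain Gaussian noise has almost no per-channel mean, so without it the model struggles to learn images whose overall brightness is far from average, such as pure black. Pure Python lists stand in for tensors here just to keep it dependency-free.

```python
import random


def offset_noise(channels: int, height: int, width: int,
                 offset: float = 0.1, rng=None) -> list:
    """Gaussian noise for one latent, plus offset * N(0, 1) added per channel."""
    rng = rng if rng is not None else random.Random()
    noise = []
    for _ in range(channels):
        shift = offset * rng.gauss(0.0, 1.0)  # one random shift shared by the whole channel
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                      for _ in range(height)])
    return noise


latent_noise = offset_noise(4, 8, 8, offset=0.1, rng=random.Random(42))
```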

dusky urchin Feb 16, 2024, 4:05 PM

#

slender lagoon I am on 24 and it was still ooming regardless of batch size, what was your confi...

i wrote the training code from scratch, because the one in the stable cascade repo is buggy

slender lagoon Feb 16, 2024, 4:08 PM

#

dusky urchin i wrote the training code from scratch, because the one in the stable cascade re...

have you published it?

dusky urchin Feb 16, 2024, 4:16 PM

#

slender lagoon have you published it?

not yet, i'm still waiting to see some results

slender lagoon Feb 16, 2024, 4:21 PM

#

dusky urchin not yet, i'm still waiting to see some results

fair enough, if you do publish it, please ping me, because indeed the one available OOMs way past 24 gigs

small shadow Feb 16, 2024, 9:48 PM

#

Hey everyone I have a quick question.
I'm pretty new to Stable Diffusion, but I'm planning to start training my first lora. I know the basics when it comes to training a face, but does the same apply if I want to train a specific body type? I already get great results for the face I want with CyberRealistic in Automatic 1111, but I'm not completely satisfied with the rest of the body. Any tips/tricks/advice, or even a tutorial link, would be greatly appreciated!

unreal trench Feb 17, 2024, 8:45 AM

#

Hi! Could you help me with 2 questions? Suppose a LoRA/LyCORIS is downloaded from the internet. How can you determine the hyperparameters (especially the decomposition rank) it was trained with, and how exactly does it affect the diffusion model (since LyCORIS supports many approaches)?

What metrics can be used to determine which of the LoRA adaptations to stable diffusion most accurately conveys the subject/person in the case of photorealism?

torn warren Feb 17, 2024, 8:51 AM

#

Hi. I want to train my model, but not on people, but on furry art of one artist. Tell me, maybe someone was engaged in training a model not only on portraits of people? I have a couple questions to ask.

normal ember Feb 17, 2024, 10:20 AM

#

unreal trench Hi! Could you help me with 2 questions? Suppose LoRA/LyCoris is downloaded from ...

Try to see if this can parse the metadata. https://lora-inspector.rocker.boo/
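If the inspector can't read a file, the same information can be pulled out with a few lines of standard-library Python, as a sketch: a .safetensors file starts with an 8-byte little-endian header length followed by a JSON header holding every tensor's shape plus an optional "__metadata__" block. The ss_network_dim / ss_network_alpha / ss_network_module keys are what kohya-based trainers usually write there (other trainers may not). The rank is also inferred from the first dimension of each lora_down weight, which only applies to LoRA-style tensor naming; some LyCORIS types (LoHa, for example) use different names.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return json.loads(f.read(header_len).decode("utf-8"))


def lora_summary(header: dict) -> dict:
    """Training metadata (if embedded) plus ranks inferred from tensor shapes."""
    meta = header.get("__metadata__", {})
    ranks = sorted({
        info["shape"][0]
        for name, info in header.items()
        if name.endswith("lora_down.weight")
    })
    return {
        "network_dim": meta.get("ss_network_dim"),
        "network_alpha": meta.get("ss_network_alpha"),
        "network_module": meta.get("ss_network_module"),
        "ranks_from_shapes": ranks,
    }


# Usage: print(lora_summary(read_safetensors_header("some_lora.safetensors")))
```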

normal ember Feb 17, 2024, 11:07 AM

#

torn warren Hi. I want to train my model, but not on people, but on furry art of one artist....

I'm sure there are many in here that have done just that. My advice is to just ask the question.

torn warren Feb 17, 2024, 11:10 AM

#

Well, I have two questions:

1. Is it possible to train a model on different art with different characters (and poses), but so that they end up with the same style and don't turn out ugly?
2. If so, what should be followed in preparing sources, etc.? If not, how can you achieve a similar result?

normal ember Feb 17, 2024, 1:48 PM

#

torn warren Well, I have two questions: 1. Is it possible to train a model on different art...

Yes, that is possible. You want to caption as much as possible that is not the style, i.e. when you later prompt for a character, it will know what the rest should look like. It's probably a good idea to include some kind of keyword for the style too.

torn warren Feb 17, 2024, 1:52 PM

#

normal ember Yes that is possible. You want to caption as much as possible that is not the st...

I hope I understood you correctly. So when training the model, I need to add a prompt to each image, with the keyword (name) of the style and a brief description of the image/character?

#

Does the description need to be limited to a couple of words or it is better to make it more detailed?

normal ember Feb 17, 2024, 1:58 PM

#

I like to see the captioning as letting the model know the separation of the objects in the image.

#

Let's say you only caption “a man”: anything in the image will be associated with that caption. If you further describe the image, e.g. the clothing, the model will associate the clothing with that part of the caption to some extent, which can make it easier to change the clothing during inference.

#

Let’s say it’s foggy and you don’t want to associate the fog with “a man”: you caption the fog too.

#

If that makes any sense.

livid silo Feb 17, 2024, 2:04 PM

#

Ooo. I think I got it. So I write all this in the text files that SD itself creates when analyzing images?

normal ember Feb 17, 2024, 2:05 PM

#

Most trainers do, like kohya or OneTrainer

#

This I trained with dark gothic fantasy art, <caption>

https://civitai.com/models/293532/dark-gothic-fantasy

livid silo Feb 17, 2024, 2:09 PM

#

Oh, wow. That looks cool. I'll definitely try to train my model one of these days. Thank you so much!

small shadow Feb 17, 2024, 5:05 PM

#

Just to chime in on this conversation. When training a model are you tagging your dataset images with words that you DO NOT want to see generated in your desired output, or are you tagging the things that you would like to see?

normal ember Feb 17, 2024, 6:25 PM

#

Well, captioning something you don't want to see, as you've probably seen people recommend, will work, but it's not like the model won't learn those things. Since your dataset probably contains more things you want it to learn than things you don't, it will learn the wanted things faster and you won't notice the other things change.

serene jacinth Feb 17, 2024, 8:41 PM

#

Hi everyone! I need some advice from the community. I need to custom-train SD with a dataset of images that also has segmentation maps and depth maps.

Are those 2 extra maps relevant / will they help SD produce better results for my dataset?
Do you know if there is an already-developed training system I can use (one that includes a seg map and depth map loader), e.g. LoRA, Dreambooth, etc.?

PS the training dataset is not human/face, it's an object

lethal osprey Feb 18, 2024, 2:33 AM

#

Why did my renders go from the first image to the second one? I didn't change anything.

#

It starts out great and then just breaks around the 70% mark. The first image was taken at about 50% while the second was when it finished

charred scarab Feb 18, 2024, 3:28 AM

#

I'm trying to use Dreambooth to train Loras of specific people with a set of 6 photos which I understand is enough but it doesn't ever seem to work. Auto1111 doesn't run on this system at all, I use SDNext if that matters

small shadow Feb 18, 2024, 3:50 AM

#

Does anyone know how to use the masking feature in OneTrainer?
I can't seem to figure out how to use it.

normal ember Feb 18, 2024, 6:26 AM

#

small shadow Does anyone know how to use the masking feature in OneTrainer? I can't seem to ...

I can recommend their Discord channel. Searching in there will get you on the right path.

serene jacinth Feb 18, 2024, 9:56 AM

#

normal ember I can recommend their Discord channel. Searching in there will get you on the r...

Can you share the link 🔗 to the discord? 🙏

normal ember Feb 18, 2024, 9:57 AM

#

serene jacinth Can you share the link 🔗 to the discord? 🙏

It's blocked by the server

#

but the repo link is here: https://github.com/Nerogar/OneTrainer

serene jacinth Feb 18, 2024, 10:00 AM

#

Thank you!

hardy bridge Feb 18, 2024, 11:46 AM

#

spaceship

hollow spruce Feb 18, 2024, 9:19 PM

#

lethal osprey Why did my renders go from the first image to the second one? I didn't change an...

post your training settings. (print command in kohya / .toml file in derrian)

hollow spruce Feb 18, 2024, 9:21 PM

#

torn warren Hi. I want to train my model, but not on people, but on furry art of one artist....

I did -> https://civitai.com/models/285182/rivet-xl-for-pony-xl-ratchet-and-clank-rifts-apart?modelVersionId=320839

lethal osprey Feb 18, 2024, 9:24 PM

#

hollow spruce post your training settings. (print command in kohya / .toml file in derrian)

How do I reopen kohya?

hollow spruce Feb 18, 2024, 9:27 PM

#

torn warren Hi. I want to train my model, but not on people, but on furry art of one artist....

here's a mini guide for sdxl
A.) get a large enough dataset (100/400/1000 = good, very good, perfect)
B.) write captions (I'll attach sample images with captions)
C.) get a checkpoint that is compatible with furry art -> Pony Diffusion V6 XL (the only one currently in sdxl)
D.) get derrian or kohya, then use settings that are proven to work -> adamw + 1e-4 unet lr, 5e-5 te lr + batch 7 + min snr 5 + offset noise 0.1 + dim 32 alpha 1 | save every 5 epochs. epochs 45~60 will be your target epochs for the finished lora
D.2) train directly on that custom checkpoint
E.) test your lora first while using similar prompts to the ones used in your training data, then try using different prompts. both should work, with the first one being almost always perfect.

#

rivet, artwork, standing, hearts in background, drawing of a game character pointing at herself
rivet, full body, standing, 3d render, outdoors, standing on a platform
rivet, artwork, full body, standing, grey background, cartoon network, a cartoon network drawing of a game character

so when you do captions, follow this style:

<trigger word>, character (basically who or what is in this image, like a name, or 'woman'/'man' if you don't know), pose, details, details, details, caption
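A minimal sketch of that caption order, writing one caption file next to each image (image.png → image.txt), which is the layout kohya-style trainers read. The .txt extension is an assumption; kohya lets you pick it with the caption_extension setting. The helper names are hypothetical.

```python
from pathlib import Path


def build_caption(trigger, subject, pose=None, details=(), caption=None) -> str:
    """<trigger word>, subject, pose, details..., free-text caption (empty parts skipped)."""
    parts = [trigger, subject, pose, *details, caption]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def write_caption(image_path, text: str, extension: str = ".txt") -> Path:
    """Write the caption next to the image: img001.png -> img001.txt."""
    caption_path = Path(image_path).with_suffix(extension)
    caption_path.write_text(text, encoding="utf-8")
    return caption_path


example = build_caption("rivet", "full body", "standing",
                        ("3d render", "outdoors"), "standing on a platform")
```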

if you wanna be lazy, you can fully automate your captions via cogvlm in this app:
https://github.com/jhc13/taggui

#

lora results, for context, showing that this method works

hollow spruce Feb 18, 2024, 9:39 PM

#

lethal osprey How do I reopen kohya?

if you trained via kohya, it usually saves a config file in your logging folder (or output folder). check in those folders; odds are high the settings you used are saved there
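A small helper for hunting those files down, as a sketch: it searches the given folders recursively for .toml and .json files and lists the newest first. Which file types and names kohya or Derrian's GUI actually writes depends on the version, so the patterns are an assumption, and the folder paths in the usage comment are placeholders.

```python
from pathlib import Path


def find_training_configs(*folders, patterns=("*.toml", "*.json")) -> list:
    """Config-looking files under the given folders, newest first."""
    found = set()
    for folder in folders:
        root = Path(folder)
        if not root.is_dir():
            continue  # skip folders that don't exist
        for pattern in patterns:
            found.update(root.rglob(pattern))
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


# Usage (placeholder paths):
#   for path in find_training_configs("path/to/logging", "path/to/output"):
#       print(path)
```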

lethal osprey Feb 18, 2024, 9:43 PM

#

This is the only thing in my log

hollow spruce Feb 18, 2024, 10:05 PM

#

lethal osprey I am using Anime Art Diffusion XL checkpoint and this is what I get I am trying...

just checked. my700 does a lot of furry art, so the guide I literally just posted is basically perfect for your goal as well XD
this one: #🔧｜finetune message

lethal osprey Feb 19, 2024, 3:05 AM

#

hollow spruce just checked. my700 does a lot of furry art, so the guide I literally just poste...

Sorry to bother you so late but I can't figure out these parts
5e5 te lr

hollow spruce Feb 19, 2024, 3:26 AM

#

lethal osprey Sorry to bother you so late but I can't figure out these parts 5e5 te lr

Text encoder learning rate of 0.00005 (5e-5)
Unet learning rate of 0.0001 (1e-4)
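For reference, the chat shorthand drops the minus sign in the exponent. Python's float parsing shows the difference; the minus sign is what makes these small learning rates.

```python
# "1e4 unet lr" / "5e5 te lr" in chat shorthand mean 1e-4 and 5e-5.
unet_lr = float("1e-4")          # 0.0001
text_encoder_lr = float("5e-5")  # 0.00005
without_minus = float("1e4")     # 10000.0, not a usable learning rate
print(unet_lr, text_encoder_lr, without_minus)
```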

lethal osprey Feb 19, 2024, 3:28 AM

#

Ah okay, I was able to stumble my way through most of it, thank you.

#

Is this how I am supposed to do the pony diffusion?

lethal osprey Feb 19, 2024, 3:30 AM

#

hollow spruce Text encoder learning rate of 0.0005 Unet learning rate of 0.001

This is everything I have. Is this correct?

#

hollow spruce Feb 19, 2024, 3:31 AM

#

lethal osprey Is this how I am supposed to do the pony diffusion?

Yep

hollow spruce Feb 19, 2024, 3:32 AM

#

lethal osprey Is this how I am supposed to do the pony diffusion?

You need to enable sdxl

lethal osprey Feb 19, 2024, 3:32 AM

#

where is that at?

hollow spruce Feb 19, 2024, 3:33 AM

#

Max Resolution is also 1024,1024

hollow spruce Feb 19, 2024, 3:33 AM

#

lethal osprey where is that at?

Where you selected the model

#

Everywhere it says fp16, switch to bf16

lethal osprey Feb 19, 2024, 3:34 AM

#

Okay, also does it matter that all the pictures I have are scaled down to 512?

hollow spruce Feb 19, 2024, 3:34 AM

#

lethal osprey Okay, also does it matter that all the pictures I have are scaled down to 512?

Sdxl works at 1024px scale, so it needs images at that size
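On the 512 vs 1024 question, and the later one about whether images must be square: SDXL trainers usually use aspect-ratio bucketing, where what matters is roughly a 1024×1024 pixel area, not a square shape. Here's a simplified sketch of picking a bucket; it is not kohya's exact algorithm, and the 64px step and 256/2048 side limits are assumptions for the example. In this sketch a 512×512 source maps to the 1024×1024 bucket, so it would have to be upscaled, which is why native-resolution images are preferred.

```python
def bucket_for(width: int, height: int, max_area: int = 1024 * 1024,
               step: int = 64, min_side: int = 256, max_side: int = 2048) -> tuple:
    """Pick a (w, h) training bucket for an image of the given size.

    Bucket sides are multiples of `step`, the area stays <= max_area, and the
    bucket's aspect ratio is the one closest to the source image's.
    Simplified illustration only, not kohya's exact bucketing code.
    """
    target = width / height
    best = None
    for w in range(min_side, max_side + 1, step):
        h = (max_area // w) // step * step  # tallest multiple of `step` with w * h <= max_area
        if not min_side <= h <= max_side:
            continue
        error = abs(w / h - target)
        if best is None or error < best[0]:
            best = (error, w, h)
    return best[1], best[2]
```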

#

How much vram do you have?

lethal osprey Feb 19, 2024, 3:36 AM

#

no clue, how do I check it?

hollow spruce Feb 19, 2024, 3:36 AM

#

What graphic card do you have?

lethal osprey Feb 19, 2024, 3:37 AM

#

3060ti

#

do all images need to be at 1024x1024 or does the width not matter?

hollow spruce Feb 19, 2024, 3:39 AM

#

lethal osprey 3060ti

Oh damn x_x that's not enough to run these settings

lethal osprey Feb 19, 2024, 3:39 AM

#

crap

hollow spruce Feb 19, 2024, 3:42 AM

#

lethal osprey crap

https://rentry.org/59xed3

#

That guide will get everything working for you. It's a bit detailed, but many of the things mentioned are optional

#

If you stick to sd1.5 based models, then 512px works, and everything will train fast and easy for you ^^

lethal osprey Feb 19, 2024, 3:47 AM

#

hollow spruce https://rentry.org/59xed3

What would happen if I run as is? I am fine with it taking longer

hollow spruce Feb 19, 2024, 3:52 AM

#

lethal osprey What would happen if I run as is? I am fine with it taking longer

I think it would take a year 🤣

#

If you wanna work with sdxl, there are ways to make it work; a 3060ti can make it work. But sd1.5 based checkpoints will be much easier to work with in your case

lethal osprey Feb 19, 2024, 3:55 AM

#

okay, now what if I turned off the SDXL and stay with the pony diffusion?

lethal osprey Feb 19, 2024, 3:58 AM

#

hollow spruce If you wanna work with sdxl, there's ways to make it work. 3060ti can make it wo...

Also I tried using the pony diffusion as a checkpoint in stable diffusion and nothing but blobs come out

wet mauve Feb 19, 2024, 6:26 AM

#

hollow spruce I did -> https://civitai.com/models/285182/rivet-xl-for-pony-xl-ratchet-and-clan...

I really like details on this LoRA.

neat fox Feb 19, 2024, 6:50 AM

#

lethal osprey Also I tried using the pony diffusion as a checkpoint in stable diffusion and no...

Did you set clip skip to -2?

lethal osprey Feb 19, 2024, 9:51 AM

#

neat fox Did you set clip skip to -2?

I don't know how to do that

#

If it is the one in settings>stable diffusion
the lowest it goes is 1

lethal osprey Feb 19, 2024, 11:24 AM

#

I was wondering if there is a way to be able to use stable diffusion from my computer on my phone?

hollow spruce Feb 19, 2024, 12:07 PM

#

lethal osprey I was wondering if there is a way to be able to use stable diffusion from my com...

Best to start by reading the guide, as any tips you'll get in this channel will assume a basic understanding of the tools and the most common settings. The guide answers most questions

torn warren Feb 19, 2024, 1:03 PM

#

hollow spruce I did -> https://civitai.com/models/285182/rivet-xl-for-pony-xl-ratchet-and-clan...

O, wow

torn warren Feb 19, 2024, 1:06 PM

#

hollow spruce here's a mini guide for sdxl A.) get a large enough dataset (100/400/1000 = good...

OOoo, thank you. I'll try it (but I think the biggest problem is finding so many images...). But if I run into unusual problems/errors, may I ask you?

hollow spruce Feb 19, 2024, 2:37 PM

#

torn warren OOoo, thank you. I`ll try it(but I think the biggest problem is finding so many ...

Sure! As long as you have 16 or 24gb vram I can help 🙂

torn warren Feb 19, 2024, 2:42 PM

#

hollow spruce Sure! As long as you have 16 or 24gb vram I can help 🙂

I have only 8gb khe khe...

hollow spruce Feb 19, 2024, 2:50 PM

#

torn warren I have only 8gb khe khe...

You can still train SD1.5 loras then
https://rentry.org/59xed3
This guide should help

torn warren Feb 19, 2024, 3:28 PM

#

hollow spruce You can still train SD1.5 loras then https://rentry.org/59xed3 This guide should...

Oke

next tapir Feb 19, 2024, 6:27 PM

#

Do full finetunes/dreambooth trainings always look like noise at their first sample generation? I was under the impression that, similarly to a LoRa, you'd start with the initial weights being set to the base model so the initial sample images would be similar to the base model. Otherwise, any idea what I'd be doing wrong? This doesn't happen when LoRa training for SDXL/1.5, nor does it happen for Cascade finetuning. It just happens for SDXL/1.5 finetunes for me

dusky urchin Feb 19, 2024, 7:27 PM

#

slender lagoon fair enough, if you do publish it, please ping me, because indeed the one availa...

we're still figuring it out. i don't think you can train stage C in less than fp32, it performs too poorly.

#

maybe stage A and B in fp16 will work fine

#

with stage C in fp32, it's possible to do a full fine tune with 48gb

slender lagoon Feb 19, 2024, 7:38 PM

#

dusky urchin with stage C in fp32, it's possible to do a full fine tune in with 48gb

single gpu with 48gb or 2x24?

jade hornet Feb 19, 2024, 7:58 PM

#

next tapir Do full finetunes/dreambooth trainings always look like noise at their first sam...

are you seeing 'nan' in the loss output? you said initial sample, but you can literally sample whenever so that's not very clear

next tapir Feb 19, 2024, 8:00 PM

#

jade hornet are you seeing 'nan' in the loss output? you said initial sample, but you can l...

No, eventually the image changes from what you see above into an identifiable result. It just takes a few hundred steps, depending on LR and such, before something normal emerges.

jade hornet Feb 19, 2024, 8:23 PM

#

technically every image looks like that during inference, it starts with noise, that's nothing unusual

dusky urchin Feb 19, 2024, 10:04 PM

#

slender lagoon single gpu with 48gb or 2x24?

2x24

#

it only makes sense with ampere though to do that

#

with 2xA5000 or 2x3090s nvlinked in tcc

slender lagoon Feb 19, 2024, 10:09 PM

#

dusky urchin with 2xA5000 or 2x3090s nvlinked in tcc

oh nvlink, unfortunate, I have 4090+3090

dusky urchin Feb 19, 2024, 10:10 PM

#

i mean we have definitely, successfully trained a LoRA on top of stage C. it's just not very good. it worked, but not as well as SDXL

agile inlet Feb 19, 2024, 11:08 PM

#

I can use Google Colab to train a lora with 15 images. It works well and produces an 18mb lora. I'd like to run kohya_ss locally because Colab always runs out of free GPU time. I have kohya running, but when I train, the file is half the size, only 9mb. When I use that lora it does not change the image at all; it doesn't work.
Here are the local kohya settings. All else is the default.
Source model: custom: /dataset/models/cyberrealistic_v41BackToBasics.safetensors
save as: safetensors
Folders: Not using regularization photos
Parameters - Basic: Epochs: 10, Save every N epoch: 2
Parameters - Advanced: CrossAttention: xformers (also tried setting this to none)
Flip augmentation: checked (I also check this on google colab)

#

What things can I do to see what is going wrong. I'm using the same dataset with both.

hollow spruce Feb 19, 2024, 11:28 PM

#

agile inlet I can use google colab to train a lora with 15 images. It works good and produce...

size of lora, is determined by network rank setting.
(Higher = bigger)
also, odds are high the colab had some settings on by default, since there's no reason you'd ever want them off. in local kohya, you can turn off settings that you absolutely should use, so your issue may be there

agile inlet Feb 20, 2024, 12:03 AM

#

I agree totally. I also wonder if the Google Colab has some regularization images. I think I should be able to capture the commands from both; would that show the differences?

#

Although to run it in google colab I have to wait for GPU to be available or run in CPU mode. Maybe I could grab the command then cancel it.

#

I followed this guide, and the person does not use any regularization images. https://medium.com/@dminhk/3-easy-steps-lora-training-using-koyhas-gui-on-amazon-sagemaker-notebook-573b151b4add The guide is for SageMaker, but that shouldn't matter.

agile inlet Feb 20, 2024, 12:34 AM

#

hollow spruce size of lora, is determined by network rank setting. (Higher = bigger) also, odd...

I don't see settings in kohya for a network rank; are they called something different? I have seen those settings in automatic1111 with dreambooth.

hollow spruce Feb 20, 2024, 12:41 AM

#

agile inlet I don't see settings in koyha for a network rank, are they called something diff...

you are on the right tab, right?
you're not accidentally trying a full finetune XD

hollow spruce Feb 20, 2024, 12:41 AM

#

agile inlet I don't see settings in koyha for a network rank, are they called something diff...

agile inlet Feb 20, 2024, 12:45 AM

#

I see it now, good grief.

#

So I need to figure out what settings google colab is using for that?

#

Looks like colab spits out a config file into google drive.
unet_lr = 0.0005
text_encoder_lr = 0.0001

#

I'll give that a shot. I suspect there must be more differences than that.
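One way to find "more differences" is to capture both command lines and diff them flag by flag. This is a hypothetical helper, not part of kohya_ss; it assumes both runs end up as a sd-scripts `train_network.py` command using `--flag value` or `--flag=value` syntax, and that valueless flags are not immediately followed by a positional argument. The flag names (`--network_dim`, `--network_alpha`, `--unet_lr`, `--text_encoder_lr`, `--caption_extension`, `--flip_aug`) are real sd-scripts options, but the values in the example are made up for illustration.

```python
import shlex

def parse_flags(command):
    """Map each --flag in a command line to its value.

    Handles both '--flag=value' and '--flag value'. Flags with no value
    (e.g. --flip_aug) map to True. Non-flag tokens such as the interpreter
    and script path are skipped.
    """
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            if "=" in token:
                key, value = token[2:].split("=", 1)
                flags[key] = value
            elif i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                flags[token[2:]] = tokens[i + 1]
                i += 1  # the value was consumed along with its flag
            else:
                flags[token[2:]] = True
        i += 1
    return flags

def diff_flags(a, b):
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs.
    A flag missing on one side shows up as None there."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }

# Hypothetical commands, for illustration only.
colab = ("python train_network.py --network_dim=32 --network_alpha=16 "
         "--unet_lr=0.0005 --text_encoder_lr=0.0001 --caption_extension=.txt --flip_aug")
local = ("python train_network.py --network_dim 8 --network_alpha=1 "
         "--unet_lr=0.0005 --text_encoder_lr=0.0001 --flip_aug")
print(diff_flags(parse_flags(colab), parse_flags(local)))
# {'caption_extension': ('.txt', None), 'network_alpha': ('16', '1'), 'network_dim': ('32', '8')}
```

Flags that match on both sides drop out, so only the settings worth investigating remain.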

charred scarab Feb 20, 2024, 2:28 AM

#

I'm trying to train models of specific faces with SDNext and Dreambooth, it doesn't seem to ever work

vestal arrow Feb 20, 2024, 2:44 AM

#

Hello everyone,

I'm planning to train a model checkpoint to generate models from images of mannequins wearing clothes. I want the models to look realistic, and the background images should resemble real-life scenes such as streets, parks, beaches, fashion shops, etc. I've been using some models for a while, but this is my first time training a model checkpoint. Could anyone with experience in training checkpoint models share some advice?

Here are a few specific questions I have:

Should I use SDXL or SD1.5?
Is it advisable to base the model on an inpaint model? (I've tested some inpaint models, but the generated background images didn't look as good as those from regular models.)
How should I set up the parameters?
I'm planning to use kohya_ss for this project. Thank you very much for your help.

hollow spruce Feb 20, 2024, 2:12 PM

#

vestal arrow Hello everyone, I'm planning to train a model checkpoint to generate models fro...

break your project down into smaller pieces. then get each piece working. finally, combine all your trainings into 1 to make the final product.

In case of SD1.5, making a model is just fine, probably ideal for you, since you can keep scaling up the dataset to make it better over time. the only downside is training time, but that should stay relatively harmless as long as you have 24gb vram or more.
the other downside? even with upscaling, you'll probably hit a 'reliable' limit of 768px, with a max of 1k

In case of SDXL, you can achieve this via a lora. (to be more exact, genuine full finetuning is on a level where you should plan to have $10k ready for it. you'll save money by hiring someone who can already do this)
so for an sdxl lora, you break it down into about 3 individual loras that get combined:
Lora 1) train on faces that meet your client's or business's preference (basically get the ethnicity bias you want to represent), then merge with the rundiffusion checkpoint. this will be your base
Lora 2) get a dataset with images of fashion photoshoots at various locations (streets, parks, beaches, fashion shops, all the ones important for you). train that into one trigger word. this triggerword will be useable even for scenes not trained, like coffee shop, etc...
Lora 3) make a regularization dataset, filled with 50% mannequins, all wearing different clothing pieces. then 50% real people wearing any fashion clothing (make sure the people are all unique, else you'll get bias towards a person)
take a bunch of photos of your current outfit, train a 1 hour lora with your regularization data added in. that lora can then be used alongside your base and location lora, to generate the images you want.

If you get a new outfit, you only need to remake Lora 3, which is easy since the regularization data & training settings stay the same

#

if you don't get new outfits, then you can just combine the datasets from loras 1~3. add the regularization data as a normal dataset. train one single lora from all of them. <- also works, but you shouldn't do this until they work individually

#

if you need to supply only a 'checkpoint'. then just make the loras individually, and merge them in at the end

hollow spruce Feb 20, 2024, 2:18 PM

#

vestal arrow Hello everyone, I'm planning to train a model checkpoint to generate models fro...

for the more basic things like settings, refer to this guide so you can get a base understanding: https://rentry.org/59xed3

dusky urchin Feb 20, 2024, 5:12 PM

#

lol

#

i don't need a speech

#

spare me

#

i think it's actually that it was trained on just 700 images

#

so my attention went to zero

hollow spruce Feb 20, 2024, 5:12 PM

#

it worked in comfyui, when I was using it about a year ago? (moved to a1111 since it's what's used by the majority, and easier to explain)
Basically commas ended tokens early

hollow spruce Feb 20, 2024, 5:12 PM

#

dusky urchin i don't need a speech

I wasn't writing 50% of the time XD

dusky urchin Feb 20, 2024, 5:12 PM

#

commas are their own token

#

i don't think any of that stuff really matters

#

clip isn't strong enough for that to matter

#

i mean that rundiffusion's checkpoint isn't adding much

hollow spruce Feb 20, 2024, 5:14 PM

#

dusky urchin i mean that rundiffusion's checkpoint isn't adding much

ahhhh. yeah

#

they overfit specific facial features

dusky urchin Feb 20, 2024, 5:14 PM

#

it's more that when you use a small dataset, it is more likely that sdxl already is very close to producing the results you want, and you could achieve the same results with "just" a prompt

#

it's why we can have working celebrity loras, but hardly anyone succeeds in training people that sdxl has never seen before with less than 1,000 images

#

whether lora or full fine tune doesn't matter

agile inlet Feb 20, 2024, 6:27 PM

#

I tell you this will be the end of my hair. I can build 100 different loras, different image sets, in kohya_ss and they will do Nothing At All. Might as well not even be included in the prompt. Use the same images on google colab and it works fine. I have matched up every single setting to be the same, doesn't matter. Use the default settings, doesn't matter.

#

Only thing I haven't tried is SageMaker to see if it's any different

hollow spruce Feb 20, 2024, 10:34 PM

#

agile inlet I tell you this will be the end of my hair. I can build 100 different loras, dif...

then you have some dumb error x_x
friend of mine once managed to not include ".txt" for caption extension. so something like that may be glitching you out

hollow spruce Feb 20, 2024, 10:41 PM

#

agile inlet I tell you this will be the end of my hair. I can build 100 different loras, dif...

wanna print your command, and paste it here?

agile inlet Feb 20, 2024, 11:25 PM

#

This is one I tried earlier. I was matching each single command line argument to the google colab. I don't think it needs to be the same. I've tried with and without regularization images. Makes no difference.
Config: https://pastebin.com/3NyT9dBm

./train_network.py" --bucket_no_upscale--enable_bucket --min_bucket...

#

I've tried with the generated txt files that Google Colab makes and by generating them in kohya. Same text files in both.

hazy herald Feb 21, 2024, 1:37 AM

#

how critical is it to have exactly 1024x1024 images for training SDXL lora? Is it good enough if you're close enough like within 10% of that?

#

and how important is aspect ratio

agile inlet Feb 21, 2024, 5:46 AM

#

hazy herald and how important is aspect ratio

I read posts of people that say they don't bother to crop/square or resize their training photos and get great results. What if you tried one training with 10% square and one without changing any aspect, and see which produce better results?

#

When I'm testing a lora I like to try the 7th/8th/9th/10th saved epoch at weights 0.7/0.8/0.9/1.0, and see which combination looks best.
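That epoch-by-weight sweep is easy to script. A minimal sketch, assuming A1111's `<lora:filename:weight>` prompt syntax and kohya's default naming for intermediate saves (`name-000007.safetensors`; the final epoch is usually saved without the number suffix, so adjust that one by hand). `mydog_lora` and the prompt are placeholders.

```python
from itertools import product

def lora_test_prompts(base_prompt, lora_name, epochs, weights):
    """Build one prompt per (epoch, weight) pair using A1111's <lora:file:weight> syntax.

    Assumes intermediate checkpoints follow kohya's 'name-000007' naming
    (epoch number zero-padded to six digits).
    """
    prompts = []
    for epoch, weight in product(epochs, weights):
        tag = f"<lora:{lora_name}-{epoch:06d}:{weight}>"
        prompts.append(f"{base_prompt} {tag}")
    return prompts

grid = lora_test_prompts("photo of mydog", "mydog_lora",
                         epochs=range(7, 11), weights=(0.7, 0.8, 0.9, 1.0))
print(len(grid))  # 16 prompts: 4 epochs x 4 weights
print(grid[0])    # photo of mydog <lora:mydog_lora-000007:0.7>
```

A1111's built-in X/Y/Z plot script with Prompt S/R can run the same sweep inside the UI if you'd rather not script it.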

#

(I'm super noob at this)

polar smelt Feb 21, 2024, 8:15 AM

#

Hey guys, if I finetune a model which is under license xyz.
Is it still the same model with same licensing?

finite marsh Feb 21, 2024, 9:18 AM

#

Any ideas on how we can improve masking (https://platform.stability.ai/docs/api-reference#tag/v1generation/operation/masking) It seems to generate a completely different image every time and ignores the masking.

hollow spruce Feb 21, 2024, 9:52 AM

#

hazy herald and how important is aspect ratio

depends entirely on what you want to achieve.
Training on only one aspect ratio makes inference at all other aspect ratios a bit worse. But in return, you don't get any bias trained into an aspect ratio.
(Like that portrait photos in base sdxl look better if you generate an image with a 2:3 aspect ratio)

as for smaller images. it can work. Make sure to have upscaling turned on. in most cases its not too much of an issue. If you mess up, or your dataset is really bad, then you may end up accidentally training on the noise within your images, and generate something like my "youtube compression noise" lora when I just wanted to copy the style of a music video 😭
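To see why "close to 1024x1024" is usually fine, here is a simplified sketch of aspect-ratio bucketing. It is not kohya's exact algorithm (its bucket list depends on settings like `--min_bucket_reso`, `--max_bucket_reso` and `--bucket_reso_steps`); it just assumes a 1024x1024 area budget, sides in steps of 64, and nearest-aspect-ratio assignment. A roughly square image lands in the square bucket or a neighbouring one, and the scale factor shows when an image would need upscaling.

```python
import math

def make_buckets(max_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """(width, height) buckets: sides are multiples of `step`, and each width
    gets the largest height that keeps width * height <= max_area."""
    buckets = []
    for width in range(min_side, max_side + 1, step):
        height = (max_area // width) // step * step
        if min_side <= height <= max_side:
            buckets.append((width, height))
    return buckets

def nearest_bucket(width, height, buckets):
    """Bucket whose aspect ratio is closest to the image's (compared in log
    space, so 2:1 and 1:2 are treated symmetrically)."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))

def resize_scale(width, height, bucket):
    """Scale factor so the image covers the bucket; the overflow gets cropped.
    A value above 1.0 means the image has to be upscaled."""
    bucket_w, bucket_h = bucket
    return max(bucket_w / width, bucket_h / height)

buckets = make_buckets()
print(nearest_bucket(1100, 1000, buckets))   # (1088, 960)
print(resize_scale(512, 512, (1024, 1024)))  # 2.0
```

A 10% deviation from square only moves the image one bucket over, with a small crop, which is why it rarely matters much in practice.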

hollow spruce Feb 21, 2024, 10:17 AM

#

polar smelt Hey guys, if I finetune a model which is under license xyz. Is it still the sam...

yes and no. depends on the model and specific license. The last generative image model with a sort of open license was SDXL1.0. go read that license for more info.
but its not really a topic for finetuning per se, so best ask your question in #1010934719455707218 if you have a real life usecase, else #🌶｜off-topic if you're just asking to get a general feeling for it

polar smelt Feb 21, 2024, 12:46 PM

#

hollow spruce yes and no. depends on the model and specific license. The last generative image...

awesome thank you!

hazy herald Feb 21, 2024, 3:20 PM

#

@hollow spruce @agile inlet thanks for the tips. 🙂 I'm going to try having my images be approximately square but some will be a bit rectangular, nothing too crazy though. Sounds like it won't be a huge issue for my use cases

dusky urchin Feb 21, 2024, 4:38 PM

#

polar smelt Hey guys, if I finetune a model which is under license xyz. Is it still the sam...

what is your objective?

stiff dust Feb 21, 2024, 5:44 PM

#

dusky urchin it's why we can have working celebrity loras, but hardly anyone succeeds in trai...

I trained a working model of a friend with 6 images in SDXL.... oO

dusky urchin Feb 21, 2024, 5:46 PM

#

stiff dust I trained a working model of a friend with 6 images in SDXL.... oO

hmm... listen there is a scientific way to measure face resemblance. it would be a major discovery to transfer likeness into flexible scenes. if someone had a way to do it, it would be a great paper
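One common way to put a number on face resemblance is to embed faces with a face-recognition model (deepface is one option) and compare the embeddings with cosine similarity. This sketch assumes you already have embedding vectors; the toy 2-D vectors in the example are made up, and any "same person" threshold depends on the recognition model, so none is given here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_resemblance(reference_embeddings, generated_embeddings):
    """For each generated face, take its best match among the reference photos,
    then average those scores over all generated images."""
    scores = [
        max(cosine_similarity(g, r) for r in reference_embeddings)
        for g in generated_embeddings
    ]
    return sum(scores) / len(scores)

# Toy vectors standing in for real face embeddings.
references = [[1.0, 0.0]]
generated = [[1.0, 0.0], [0.0, 1.0]]
print(mean_resemblance(references, generated))  # 0.5
```

Scoring a batch of generations across varied scenes this way gives a repeatable number to compare LoRAs or training runs against.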

dusky urchin Feb 21, 2024, 5:55 PM

#

stiff dust I trained a working model of a friend with 6 images in SDXL.... oO

looking into this further maybe it has just been kiboshed by the big AI labs because all the pieces are lying around

#

to achieve this

#

it would be generalizing deepface to "deephead," then "deephead adapter (controlnet)"

#

maybe SD is particularly weak for head orientations. sora figured it out, but sora is trained on synthetic data to specifically address the problem

safe tiger Feb 22, 2024, 6:57 PM

#

is there an api endpoint for creating/listing finetunes?

dusky urchin Feb 23, 2024, 9:22 PM

#

slender lagoon single gpu with 48gb or 2x24?

i have succeeded in a stable cascade lora fine tune and it performs well

#

for my dataset, which is for games IP, it seems like stage C needs a lot more data to learn non-face details like wardrobe correctly, but i'm fairly confident that this is my fault and not the model's fault. it delivers superior silhouettes, proportions, adherence to face details and flexibility.

slender lagoon Feb 23, 2024, 10:04 PM

#

dusky urchin for my dataset, which is for games IP, it seems like it needs a lot more data fo...

thanks for the update! funny enough now SD3 is around the corner, not sure if cascade is worth it still

dusky urchin Feb 23, 2024, 10:24 PM

#

it's a very, very good model

#

it's not as good as IF though...

#

the direct pixel space models are the best for people like me with lots of fine tuning data

#

the real latent models are really good if you don't want to fine tune subjects but instead want creative generation

#

SD fake latent is a compromise between both and i wonder if SD3 is fake latent, real latent, or neither

frigid pier Feb 24, 2024, 6:00 PM

#

Got my first LoRA trained, it looks somewhat distorted / deformed haha.

frigid pier Feb 24, 2024, 7:14 PM

#

Reading through the sd training info on github, there's "Regularization data or Class data (pictures of diverse other things)", so does that mean the regularization data should be different from the dataset I use / my concept?

frigid pier Feb 24, 2024, 7:41 PM

#

This doesn't fully make sense; it uses odd language in places https://github.com/Guizmus/sd-training-intro

GitHub - Guizmus/sd-training-intro: This is a guide that presents h...

This is a guide that presents how Fine tuning Stable diffusion's models work. This is an entry level guide for newcomers, but also establishes most of the concepts of training in a single p...

hollow spruce Feb 25, 2024, 5:37 PM

#

frigid pier Reading through the sd training info on github, there's "Regularization data or ...

terminology is important here, since "should be different from the dataset I use / my concept" will mean different things to different people

also relevant if you're training SD1.5 based or SDXL

here's a small breakdown for training a specific subject (let's say "Obama")
Training Dataset of obama: 20 images (or 200 for sdxl)
Regularization Dataset: 5~10 times the size of your dataset. Here we would want about 50% random black people, 50% other ethnicities.
If you have access to a good regularization dataset that someone made, then great. If you're training on synthetic (self-generated), then just generate them yourself. (real photos will make the final lora better, but you have to ask yourself if the effort is worth the result) <- if you plan to train a lot of lora, then just gather your own regularization dataset, or ask around for a good one. If its a one time thing, then the increase in quality is rarely worth it.

Don't forget to use repeat 5~10, so that all the images in your regularization dataset get used. else it will just reuse the same 20 images, despite more being available. <- lots of threads talk about this online, if you google around for a bit.
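The repeat advice comes down to a ratio. A minimal sketch, assuming the behaviour commonly described for kohya's sd-scripts: each epoch draws as many regularization images as there are training samples (training images x repeats), so reg images beyond that count go unused. With the 20-image example and a reg set 10x larger, that works out to 10 repeats. The function name is hypothetical.

```python
import math

def min_train_repeats(num_train_images, num_reg_images, reg_repeats=1):
    """Smallest training repeat count for which one epoch draws at least as many
    reg samples as the reg folder holds:
    train images x repeats >= reg images x reg repeats."""
    return math.ceil(num_reg_images * reg_repeats / num_train_images)

print(min_train_repeats(20, 200))  # 10
print(min_train_repeats(20, 150))  # 8
```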

#

Do you actually need regularization?
not if your dataset is good/big enough!

but occasionally you're stuck with a mere 5~15 images, and have absolutely no way to increase those. So regularization has its place at all levels of training

frigid pier Feb 26, 2024, 6:13 AM

#

Thanks @hollow spruce for explaining it deeper, really appreciate it!

#

Just finished training my second test lora with a regularization set; I only used 1 repeat for those and might have gone wrong there by not having enough steps.

median sun Feb 26, 2024, 7:31 AM

#

do regularization images make sense for a pixel art style? If yes what kind of images should I use?

frigid pier Feb 26, 2024, 9:23 AM

#

My second attempt did not go too well:

With the first test I did not use any regularization images at all, did only one epoch with 30 steps, and the Lora responded as expected, even though the quality wasn't there. I trained the first Lora on the SDXL base 1.0 vaefix model.

My second attempt was 4 epochs and 30 steps (10560 steps in total, with 44 images), and I used a regularization set of 150 images. Now the Lora is super unresponsive and only works (not well) with the model I trained it on. I trained the second Lora on the limitlessvisionxl model.
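The 10560 figure can be reproduced if the "30 steps" are read as 30 repeats per image and the regularization set doubles the step count, which is how kohya's sd-scripts is generally reported to count steps when reg images are present. A minimal sketch under that assumption (the function name is hypothetical):

```python
import math

def total_steps(num_images, repeats, epochs, batch_size=1, with_reg=False):
    """Optimizer steps for a kohya-style run.

    Samples per epoch = images x repeats, doubled when a regularization folder
    is used (each training sample is paired with one reg sample).
    """
    samples_per_epoch = num_images * repeats * (2 if with_reg else 1)
    return math.ceil(samples_per_epoch / batch_size) * epochs

print(total_steps(44, 30, 4))                 # 5280
print(total_steps(44, 30, 4, with_reg=True))  # 10560
```

Half of those 10560 steps are spent on regularization images, which pull the LoRA back toward the base model's idea of the class.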

#

Anything I could try with my next test? How do I make my LoRA smaller? By adjusting the bucket size?

#

A 6 gig LoRA is far from good haha

stiff dust Feb 26, 2024, 9:53 AM

#

oh damn, you have to set the rank

#

or dim, however its named

#

--network_dim=8 or --network_dim=16

#

if you train a 6GB Lora then it's no surprise that it won't work well
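Why rank drives file size: a LoRA stores two small matrices per adapted layer, so its size grows linearly with `--network_dim`. A rough sketch with a made-up layer list (real SDXL LoRAs adapt many more layers, and conv layers and metadata are ignored here); going from rank 8 to 256 multiplies the size by 32 no matter which layers are involved.

```python
def lora_param_count(layer_shapes, rank):
    """Parameters LoRA adds for a list of (out_features, in_features) linear layers.

    Each layer gets a down-projection (rank x in) and an up-projection (out x rank),
    so the count grows linearly with rank.
    """
    return sum(rank * (out_f + in_f) for out_f, in_f in layer_shapes)

def lora_size_mb(layer_shapes, rank, bytes_per_param=2):
    """Approximate file size in MB, assuming fp16/bf16 weights (2 bytes each)."""
    return lora_param_count(layer_shapes, rank) * bytes_per_param / 1e6

# Hypothetical layer list: 100 layers of 640x640 and 100 of 1280x1280.
layers = [(640, 640)] * 100 + [(1280, 1280)] * 100
for rank in (8, 32, 128, 256):
    print(rank, round(lora_size_mb(layers, rank), 1), "MB")
```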

#

I found it often easier to train on SDXL Base and then just apply the Lora to whatever model I want. If the model you use is not too overfitted, it can deal with base loras without problems

frigid pier Feb 26, 2024, 10:14 AM

#

Thanks, will check the settings for that. @stiff dust

#

These two, right?

stiff dust Feb 26, 2024, 10:33 AM

#

only rank, yes

frigid pier Feb 26, 2024, 10:51 AM

#

Ok, thanks

#

I think someone suggested 128/256 on YouTube, I guess that explains the size haha

stiff dust Feb 26, 2024, 10:56 AM

#

yeah, but that's already way too high

#

use something between 8 and 48

#

try lower first and only increase if it helps

frigid pier Feb 26, 2024, 1:06 PM

#

Thanks, I'll experiment with it

jade hornet Feb 26, 2024, 1:34 PM

#

frigid pier Reading through the sd training info on github, there's "Regularization data or ...

generally you use the model you're training against to generate pictures of whatever class describes your subject (man, woman, dog), assuming it's subject training vs style training. the purpose of the reg images is to maintain the integrity of what the model knew about the class by injecting those back in. I would recommend someone try without them first to get an idea of training progress before adding in that variable. most of the issues in training come from either bad training images or bad captions

frigid pier Feb 26, 2024, 2:27 PM

#

jade hornet generally you use the model you're training against to generate pictures of what...

I see, thanks for explaining!

frigid pier Feb 26, 2024, 2:48 PM

#

Do I need to pay attention to learning rate, text Encoder Learning Rate and Unet Learning Rate?

neat fox Feb 26, 2024, 6:28 PM

#

Maxing out around 1.3 it/s with a 4090 and 5950x CPU... Seems the GPU isn't running at full load...

One of the CPU cores does however appear to be hitting 100% pretty often... Not using official monitors here, just hwinfo, but wondering if this really is a CPU limit or not
I'm willing to build another PC to house the 4090 if it'll double my training speed by going from <50% load to near 100% on the 4090
I'm on Windows 11, 64gb ddr4 ram @ 3600mhz
This is with OneTrainer, which I've found to be a bit faster than kohya, which has me at 1.1 it/s max

#

for comparison, here's what i see when generating images... each spike is during diffusion, each dip is when the system is completely idle

neat fox Feb 26, 2024, 8:33 PM

#

Yeah tried disabling SMT just in case and to clear things up

#

Cores 0&1 are maxed out

#

Looks like a CPU bottleneck

bronze igloo Feb 27, 2024, 1:43 AM

#

anyone know what the difference between the training in the original controlnet training vs the huggingface diffusers one is?

hazy herald Feb 27, 2024, 2:16 AM

#

does anyone have favorite upscalers for training images?

vestal arrow Feb 27, 2024, 2:26 AM

#

hollow spruce break your project down into smaller piece. then get each piece working. finally...

Hi @hollow spruce, this may be late, but thanks for your help

frigid pier Feb 27, 2024, 11:18 AM

#

hazy herald does anyone have favorite upscalers for training images?

Do you mean like when building your dataset, you want to upscale some images to use in the set?

#

4x_NMKD-Siax_200k is one of my current favourites I've been using almost exclusively lately.

jade hornet Feb 27, 2024, 1:33 PM

#

I like lollypop

hazy herald Feb 27, 2024, 4:48 PM

#

frigid pier Do you mean like when building your dataset, you want to upscale some images to ...

Yes, and thanks for the tip!

median sun Feb 27, 2024, 4:56 PM

#

how many epochs do you guys usually use for your loras?

dusky urchin Feb 27, 2024, 5:18 PM

#

neat fox Maxing out around 1.3 it/s with a 4090 and 5950x CPU... Seems the GPU isn't runn...

can you talk a little bit more about what you are trying to do? the graphs are kind of meaningless, you will have to do real profiling, like with nsight, to know what is going on. you can use nvidia-smi for a correct, instantaneous GPU usage (really "occupancy") measurement.
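A minimal sketch of sampling GPU utilization from a script instead of eyeballing a graph, assuming `nvidia-smi` is on the PATH. `--query-gpu` and `--format=csv,noheader,nounits` are standard nvidia-smi options; `utilization.gpu` is the share of time a kernel was running, which is coarser than what a profiler like Nsight Systems shows. Only the first GPU is read, and the helper names are hypothetical.

```python
import subprocess
import time

# Standard nvidia-smi query: one CSV line per GPU, e.g. "97, 18234".
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def parse_sample(line):
    """Turn 'util, mem' (percent, MiB) into a pair of ints."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)

def sample_gpu(seconds=10, interval=1.0):
    """Poll nvidia-smi once per `interval` seconds for the first GPU."""
    samples = []
    for _ in range(int(seconds / interval)):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_sample(out.splitlines()[0]))
        time.sleep(interval)
    return samples

def average_utilization(samples):
    """Mean of the utilization field across samples."""
    return sum(util for util, _ in samples) / len(samples)
```

Running `average_utilization(sample_gpu(seconds=30))` during training gives a rough figure; a consistently low number while one CPU core is pegged is consistent with a CPU bottleneck.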

#

if you care about performance in training, you should not be using windows.

dusky urchin Feb 27, 2024, 5:21 PM

#

median sun do regularization images make sense for a pixel art style? If yes what kind of i...

no. regularization images make sense in the context of where they were invented, for dreambooth, when you are trying to learn a concept from a few examples. to create a pixel art lora, you will be using thousands of pixel art images, hopefully of diverse subjects, and you will always want all things pictured inside of them to be represented as pixel art.

dusky urchin Feb 27, 2024, 5:29 PM

#

frigid pier Reading through the sd training info on github, there's "Regularization data or ...

same for you too. here is a super concrete example of what the meaning of regularization is:

let's say you are trying to train a picture of your dog, and for some reason, you only have 3 photographs of your dog.

in one of the three photographs, there is a ball. in another of your three photographs, there is a plant. so 33% of your dataset is also training balls, and 33% of your dataset is also training plants.

do you want plants and balls to be generated in your images "1/3" of the time whenever you are asking for your dog? no.

so your regularization dataset could be many things:

use a variety of images of other dogs, which have a diversity of random crap in the foreground. pros: you will not see balls and plants in your images. cons: you actually do want every dog to look like your dog, so this will actually increase training time / reduce performance of your dreambooth fine tuning.
use a variety of images of adjacent concepts, such as shots of other animals. pros: you will see fewer balls and plants in your images. cons: you actually do want every portrait of an animal to be a portrait of your dog, so this will actually increase training time / reduce performance of your dreambooth fine tuning.

okay... it should be clear there's no obvious choice for regularization images. let's look at some alternative solutions:

get more, better pictures of your dog.
caption better.
photoshop out the foreground elements like plants and balls.
moderate your expectations.
cure the disease of impatience.
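For the "caption better" and "photoshop out the foreground elements" options, it helps to know which extra elements recur. A small hypothetical audit script, assuming comma-separated tag captions in .txt files next to the images (the kohya convention); `mydog` and the 0.2 threshold are placeholders, and it only catches elements that are already captioned. In the three-photo example, the ball and the plant each show up in a third of the captions.

```python
from collections import Counter
from pathlib import Path

def load_captions(folder):
    """Read every .txt caption file in a kohya-style image folder."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))]

def tag_frequencies(captions):
    """Fraction of captions that contain each comma-separated tag."""
    counts = Counter()
    for caption in captions:
        tags = {tag.strip().lower() for tag in caption.split(",") if tag.strip()}
        counts.update(tags)
    return {tag: n / len(captions) for tag, n in counts.items()}

def likely_leaks(freqs, subject_tags, threshold=0.2):
    """Non-subject tags frequent enough to get baked into the subject."""
    return sorted(
        tag for tag, share in freqs.items()
        if tag not in subject_tags and share >= threshold
    )

captions = ["mydog, ball", "mydog, plant", "mydog"]
freqs = tag_frequencies(captions)
print(likely_leaks(freqs, subject_tags={"mydog"}))  # ['ball', 'plant']
```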

frigid pier Feb 27, 2024, 5:32 PM

#

dusky urchin same for you too. here is a super concrete example of what the meaning of regula...

Perfect, thank you!

dusky urchin Feb 27, 2024, 5:34 PM

#

so you pretty much never want to use them :/

#

that's how it goes

#

what are you trying to do?

frigid pier Feb 27, 2024, 5:36 PM

#

I gathered some info from an article I came across somewhere (I have a link somewhere) about reg images: they can be handy when the LoRA is bleeding too much into the checkpoint currently in use. For example, if you create red hair with a LoRA and it makes all the people not only red-haired as wanted but also similar-looking to each other, i.e. it's bleeding the trained data into the checkpoint, wiping away some of the data present in the main checkpoint and replacing it with what the LoRA was trained with. Not just the hair alone but other characteristics as well, such as body type, hair style, clothes etc. @dusky urchin

#

Not 100% sure that's the case though, haven't tested it successfully yet

dusky urchin Feb 27, 2024, 5:39 PM

#

frigid pier I gathered some piece of info on one article I came across somewehre (I have a l...

but what are you trying to make?

frigid pier Feb 27, 2024, 5:40 PM

#

I will test at least for sure

#

and see how that goes

dusky urchin Feb 27, 2024, 5:40 PM

#

what is the goal?

frigid pier Feb 27, 2024, 5:40 PM

#

I currently have some bleeding occurring, to fix that

dusky urchin Feb 27, 2024, 5:40 PM

#

SDXL already knows how to make people with red hair...

#

so what are you trying to do?

frigid pier Feb 27, 2024, 5:42 PM

#

The red hair was just an example. I'm not doing anything serious, training my dog and my own face

dusky urchin Feb 27, 2024, 5:42 PM

#

okay

neat fox Feb 27, 2024, 5:43 PM

#

dusky urchin if you care about performance in training, you should not be using windows.

Unfortunately right now nothing I can do about that, need windows on here for computational stuff for work

frigid pier Feb 27, 2024, 5:44 PM

#

dusky urchin okay

I just want to learn the process and understand the AI better. Maybe in the future train some actual LoRAs idk.

frigid pier Feb 27, 2024, 5:46 PM

#

dusky urchin SDXL already knows how to make people with red hair...

Yes, exactly, but if what I'm describing is the case (not sure yet, as it's just one article I've read about it), that means when training some specific kind of red hair, let's say with polka dots, the trained LoRA bleeds into the checkpoint currently in use and soon all red-haired ppl look the same, not just with polka dots but also with the same hair style, similar face etc

#

I've definitely noticed this kind of behaviour with some LoRAs, not just my test loras, where a LoRA completely changes the appearance into something different from what it originally was, for example body type. You create a plump girl with a Nike shoe lora and the originally plump girl gets transformed into a skinny girl.

median sun Feb 27, 2024, 6:00 PM

#

dusky urchin no. regularization images make sense in the context of where they were invented,...

oh thousands? maybe that is why my loras always look like this:

#

I'm trying to create 16x16px textures

#

I'll try to up the img rep count

signal dust Feb 27, 2024, 8:43 PM

#

Quick question guys, hope this is the right place.
I've been getting back into Stable Diffusion and I wanted to fine-tune a model with my face. I've done this in the past, but now I have better pictures and would like to try again.
What is the newest working Google Colab notebook I could use for this?

dusky urchin Feb 27, 2024, 11:10 PM

#

median sun oh thousands? maybe that is why my loras always looks like this:

the best thing to do is to resume training from the pixel art XL LoRA on Civitai

#

do not use it for a commercial purpose though

#

there isn't any legitimate secret sauce to anything you see. ordinary community members don't have the capital or knowledge to generate synthetic data.

median sun Feb 27, 2024, 11:29 PM

#

I'm using my own art tho

#

also I'm just gonna use it for ideas anyway

#

and resuming from custom models doesn't work for me, it crashes kohya

#

setting to 100 reps did help a lot tho:
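For context on the "reps" here: in kohya_ss the repeat count is usually set through the image folder name, `<repeats>_<name>` (for example `100_texture`), and each epoch walks every image in that folder that many times. A rough sketch of the resulting step count, assuming no aspect-ratio bucketing (buckets can add a few partial batches); the helper names and the 12-image example are hypothetical:

```python
import math

def parse_subset_dir(dirname):
    """Split a kohya-style folder name like '100_texture' into (repeats, name)."""
    repeats, _, name = dirname.partition("_")
    if not repeats.isdigit() or not name:
        raise ValueError(f"expected '<repeats>_<name>', got {dirname!r}")
    return int(repeats), name

def steps_per_epoch(num_images, repeats, batch_size=1):
    """Optimizer steps in one epoch: every image is seen `repeats` times."""
    return math.ceil(num_images * repeats / batch_size)

repeats, name = parse_subset_dir("100_texture")
print(repeats, name)                      # 100 texture
print(steps_per_epoch(12, repeats, 2))    # 600
```

With a small dataset, raising repeats mostly means many more steps over the same images, which is likely why going to 100 helped here.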

#

it's still not usable but hopefully if I resume from the training data it'll get better and if not then I'll just try again next year

tropic moon Feb 28, 2024, 2:01 AM

#

Anyone know which CLIP text encoder Cascade uses? It seems like it might be a fine-tuned, half-precision variant of CLIP bigG?

Strange that there's so little information about it. I've looked through the docs and the release statement.

dusky urchin Feb 28, 2024, 4:11 AM

#

median sun I'm using my own art tho

you should do what i say.

dusky urchin Feb 28, 2024, 4:14 AM

#

tropic moon Anyone know what text encoder CLIP model Cascade uses? Seems like it might be a ...

it uses openai/clip-vit-large-patch14 for vision* and laion/CLIP-ViT-bigG-14-laion2B-39B-b160k for text

median sun Feb 28, 2024, 9:39 AM

#

dusky urchin you should do what i say.

I can't, tho. I've tried it as a test and kohya crashes when I use any custom model

jade hornet Feb 28, 2024, 7:26 PM

#

kohya_ss shouldn't crash with a custom model; there's something wrong with your install. Try doing the same training with kohya_ss on vast.ai or RunPod

faint citrus Feb 28, 2024, 10:13 PM

#

hey, dumb basic questions about Kohya

there are DreamBooth and fine-tune methods for training checkpoints; which is better?

is there a good, up-to-date guide to fine-tuning checkpoints for multiple concepts anyone can recommend?

#

or presets/settings people generally agree are decently good

hollow spruce Feb 28, 2024, 10:48 PM

#

median sun I can't tho I've tried as a test and kohya crashes when I use any custom model

don't forget to enable all the matching settings, e.g. for SDXL models, enable the SDXL option, plus the VRAM optimizations

#

and to load it under the model settings, not through either of the two continue options XD

#

or, if you ARE continuing a LoRA, don't load it as the model; instead continue from its weights, while picking SDXL base (or whichever base you want) as the model

median sun Feb 29, 2024, 12:15 AM

#

I've just trained with the base SDXL model and it actually worked:

#

the textures aren't good but it at least does the 16x16px grid most of the time

#

I assume it isn't good at such low resolutions, especially with so few images to train with. So I'm going to train it some more after I've drawn more textures

faint citrus Feb 29, 2024, 12:37 AM

#

trying to do a kohya DreamBooth XL training, but it's attempting to pull the SDXL VAE from Hugging Face, which is down for maintenance, so I'm getting an error

faint citrus Feb 29, 2024, 12:59 AM

#

burnt raptor Feb 29, 2024, 8:25 AM

#

slender lagoon fair enough, if you do publish it, please ping me, because indeed the one availa...

hey, I have the same problem. Did he share his code already?

sonic narwhal Feb 29, 2024, 9:55 AM

#

should you caption reg data when training lora in Kohya?

slender lagoon Feb 29, 2024, 3:14 PM

#

burnt raptor hey, I have the same problem. Did he share his code already?

don't think so

burnt raptor Feb 29, 2024, 3:15 PM

#

slender lagoon dont think so

alright, thx

burnt raptor Feb 29, 2024, 4:04 PM

#

Hi guys, I have a problem training a Stable Cascade LoRA on the 1B version. During training I get intermediate output where the model generates a grid of images. There are 5 in total: the first one is the ground truth and the others are generated. However, the last 2 suddenly show something totally different from what I'm fine-tuning. I'm currently training a LoRA for some shirts, and the first 2 generated images look promising, but the last 2 are just strange. Does anyone know what's happening?

onyx carbon Mar 1, 2024, 8:23 AM

#

What the FUCK

jade hornet Mar 1, 2024, 3:31 PM

#

excellent question

dusky urchin Mar 1, 2024, 6:28 PM

#

burnt raptor hey, I have the same problem. Did he share his code already?

no, still working on it. our fork of kohya, which is installable (https://github.com/hiddenswitch/sd-scripts), fixes some issues, but the basic answer is that you need more than 24GB to fine-tune with pivotal tuning, aka adding new vocab/tokens. I think this is because pivotal tuning requires more gradients than the LoRA layer for the CLIP text encoder implies; or it's possible that train_c_lora is misconfigured


stiff dust Mar 2, 2024, 10:10 AM

#

I implemented pivotal tuning in kohya_ss. It takes 11GB VRAM with batch size 1.

#

In general, pivotal training takes no more VRAM than training text encoder and UNet LoRAs

#

results are... nah. Pivotal training works as badly in Stable Cascade as it does in SDXL. I switched to text encoder training a long time ago, as it just works better than textual inversion

neat fox Mar 2, 2024, 3:28 PM

#

stiff dust results are... nah. Pivotal training works as bad in Stable Cascade as it does i...

haven't tried it yet... heard it was amazing somewhere, but only heard about it once really

stiff dust Mar 2, 2024, 3:39 PM

#

dunno, I think it is way overhyped to be honest ^^°°°

#

pivotal training came up a year ago as an alternative to DreamBooth: instead of using the strange sks token, you learn a token via textual inversion and then do DreamBooth on that

#

the reason nobody ever used it was that everyone started using LoRAs afterwards, and they worked better anyway

#

but if somebody has good experiences with pivotal training feel free to tell me about it
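To make the "learn a token via textual inversion" half concrete: the new token gets its own row in the text encoder's embedding table (often initialized from a related existing word), and only that row is trained while the rest of the table stays frozen; DreamBooth or LoRA training then happens on top of it. A toy pure-Python sketch of just the embedding part; the vocabulary, vectors, and helper names are made up for illustration, and this is not kohya_ss's or the hiddenswitch fork's implementation:

```python
def add_token(vocab, embeddings, new_token, init_token):
    """Append new_token to the vocab with a copy of init_token's embedding row."""
    if new_token in vocab:
        raise ValueError(f"{new_token!r} already exists")
    vocab[new_token] = len(embeddings)
    embeddings.append(list(embeddings[vocab[init_token]]))  # copy, not a shared list
    return vocab[new_token]

def masked_sgd_step(embeddings, grads, trainable_ids, lr):
    """Plain SGD update applied only to the rows in trainable_ids."""
    for row_id in trainable_ids:
        embeddings[row_id] = [w - lr * g
                              for w, g in zip(embeddings[row_id], grads[row_id])]

vocab = {"dog": 0, "cat": 1}
embeddings = [[0.5, 0.5], [1.0, -1.0]]
new_id = add_token(vocab, embeddings, "<my-dog>", init_token="dog")

grads = [[1.0, 1.0], [1.0, 1.0], [1.0, -1.0]]   # pretend gradients for every row
masked_sgd_step(embeddings, grads, trainable_ids=[new_id], lr=0.5)
print(embeddings)   # [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
```

Only the new row moves; "dog" and "cat" keep their original embeddings even though gradients exist for them.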

dusky urchin Mar 2, 2024, 11:23 PM

#

stiff dust I implemented pivotal tuning in kohya_ss. It takes 11GB VRAM with batch size 1.

well i mean you can do anything in 11GB of VRAM by unloading everything and reloading everything 🙂

#

maybe that doesn't make training take 10x as long, but 2-3x as long is still bad

#

I'm just teasing, it's good to hear it works

#

and that it gives bad results

stiff dust Mar 3, 2024, 12:04 AM

#

there is no offloading to CPU

#

of course there is gradient checkpointing

thorny gazelle Mar 3, 2024, 2:54 AM

#

can you train on non-square images?

stiff dust Mar 3, 2024, 10:12 AM

#

yes
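On how non-square images work: kohya's scripts handle them with aspect-ratio bucketing (the `--enable_bucket` option, with `--min_bucket_reso`, `--max_bucket_reso` and `--bucket_reso_steps`), where each image is assigned to a resolution bucket of roughly the same pixel area as the training resolution and resized/cropped to it. A simplified sketch of the idea; the bucket enumeration and the nearest-aspect-ratio rule below are assumptions for illustration, not a copy of kohya's code:

```python
def make_buckets(max_area=512 * 512, min_size=256, max_size=1024, step=64):
    """Enumerate (width, height) pairs, both multiples of `step`, with area <= max_area."""
    buckets = set()
    for width in range(min_size, max_size + 1, step):
        height = min(max_size, (max_area // width) // step * step)
        if height >= min_size:
            buckets.add((width, height))
            buckets.add((height, width))   # portrait twin of each landscape bucket
    return sorted(buckets)

def pick_bucket(width, height, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))

buckets = make_buckets()
print(pick_bucket(1000, 1000, buckets))   # (512, 512)
print(pick_bucket(768, 512, buckets))     # (640, 384)
```

Every bucket keeps the pixel count at or under 512x512, so a 3:2 image trains at 640x384 instead of being squashed into a square.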

gentle flame Mar 4, 2024, 12:32 AM

#

token downsampling seems pretty good. Someone added a PR for it to kohya. Don't know if it's as good for fine-tuning, but it's still worth looking into.
https://github.com/kohya-ss/sd-scripts/pull/1151


Faster training with token downsampling by feffy380 · Pull Request ...

This PR adds support for training with token downsampling and replaces my token merge PR (#1146).
Token downsampling is a lossy optimization that significantly speeds up inference and training. It ...

#

basically a better ToMe

stiff dust Mar 4, 2024, 11:31 AM

#

it's just a heuristic to speed up SD 1.5 a bit, because SD 1.5 has attention in the first and last layers of the UNet, where the token count is highest (which is somewhat crazy and ineffective). SDXL doesn't benefit much from it
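A rough sketch of the idea behind token downsampling, as I understand it: queries keep full resolution, but the keys/values in self-attention come from a spatially downsampled copy of the token grid, so with a factor of 2 each query attends over 4x fewer tokens. This pure-Python toy uses 2x2 average pooling and skips the Q/K/V projections entirely; both are simplifications, and the PR's actual downsampling method may differ. It also shows why the gain depends on token count, which fits stiff dust's point about high-resolution attention layers.

```python
import math

def avg_pool_tokens(tokens, h, w, factor=2):
    """Average-pool a row-major h x w grid of feature vectors by `factor`."""
    dim = len(tokens[0])
    out = []
    for i in range(0, h, factor):
        for j in range(0, w, factor):
            block = [tokens[y * w + x]
                     for y in range(i, min(i + factor, h))
                     for x in range(j, min(j + factor, w))]
            out.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return out

def attention(queries, keys, values):
    """Scaled dot-product attention with a numerically stable softmax."""
    scale = 1 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[d] for wt, v in zip(weights, values)) / total
                    for d in range(len(values[0]))])
    return out

def downsampled_self_attention(x, h, w, factor=2):
    """Full-resolution queries, downsampled keys/values (no projections)."""
    kv = avg_pool_tokens(x, h, w, factor)
    return attention(x, kv, kv)

grid = [[float(i), 1.0] for i in range(16)]   # a 4x4 grid of 2-d tokens
out = downsampled_self_attention(grid, 4, 4)
print(len(grid), "queries attend over", len(avg_pool_tokens(grid, 4, 4)), "keys")
# 16 queries attend over 4 keys
```

The output still has one vector per query token, so the rest of the network sees the same shape; only the attention cost (and some detail) is reduced.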

sonic narwhal Mar 4, 2024, 2:26 PM

#

Training in kohya. Why did it only create an NPZ file for 40 of my regularisation data images when I have 472? btw 40 is the same number of images as I have in my training data img folder

hollow spruce Mar 4, 2024, 2:57 PM

#

sonic narwhal Training in kohya. Why did it only create a NPZ file for 40 of my regularisation...

that's the specific reason why repeats were created.
use 10 repeats on your training data, then 400 reg data images will be used (10 x 40 = 400)

(or just add the regularization data as normal training data; then 100% of it is used.) <- while the math behind it changes, the result is on par or occasionally better, depending on your dataset.
I'd say try adding it as normal training data once. then you know for future runs whether that dataset works when used like that

sonic narwhal Mar 4, 2024, 2:59 PM

#

hollow spruce thats the specific reason why repeats were created. use 10 repeat on your traini...

Okay, thank you 🙂

sonic narwhal Mar 4, 2024, 2:59 PM

#

hollow spruce thats the specific reason why repeats were created. use 10 repeat on your traini...

Should you have 10 repeats on the reg folder as well?

hollow spruce Mar 4, 2024, 3:03 PM

#

sonic narwhal Should you have 10 repeats on reg folder aswell?

nope. that would result in this: 10 x 40 / 10 = 40
so you always want repeat 1 for regularization
(fun fact: if the reg data folder is too small, it auto-repeats anyway! XD)
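The repeat arithmetic above can be sketched as a small simulation. This is a simplified re-implementation written from memory of how kohya's sd-scripts pair regularization images with training images: reg images are taken, with their own folder repeats, until they match the total number of training samples, and if the reg folder runs out, already-used reg images get extra repeats. Treat it as an approximation, not the library's code; the function name is hypothetical. It would also explain the 40 NPZ files, since latents are presumably only cached for the reg images that actually get used.

```python
def reg_images_used(num_train_images, train_repeats, num_reg_images, reg_repeats=1):
    """Return (distinct reg images used, total reg samples per epoch)."""
    if num_reg_images == 0:
        return 0, 0
    target = num_train_images * train_repeats   # reg samples needed to match training
    repeats = []                                # effective repeats per registered reg image
    total = 0
    first_pass = True
    while total < target:
        for i in range(num_reg_images):
            if first_pass:
                repeats.append(reg_repeats)     # register image i with its folder repeats
                total += reg_repeats
            else:
                repeats[i] += 1                 # folder too small: repeat images again
                total += 1
            if total >= target:
                break
        first_pass = False
    return len(repeats), total

print(reg_images_used(40, 1, 472))       # (40, 40)   only 40 reg images used
print(reg_images_used(40, 10, 472))      # (400, 400) 10 repeats on the training data
print(reg_images_used(40, 10, 472, 10))  # (40, 400)  10 repeats on reg too: back to 40
print(reg_images_used(40, 10, 100))      # (100, 400) small reg folder gets auto-repeated
```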

tropic iron Mar 4, 2024, 7:38 PM

#

Hey all! I have a bunch of questions about training a LoRA on a specific style

#

Starting with detail and resolution, I guess. I understand my current checkpoint is based on SD 1.5, which has a native resolution of 512x512. When I try to make an image at that resolution, though, I get low resolution images. 768x768 or even 1024x1024 produces drastically better results

#

Can I make images at 1024x1024 and scale them down while still preserving the detail for training?

pliant drift Mar 4, 2024, 7:41 PM

#

tropic iron Can I make images at 1024x and scale them down, while still preserving the detai...

scaling down is inherently a destructive process. the details are in the pixels and the pixels are being removed.

that being said, the fine details don't really matter too much. SD15 is going to wash them out and smooth things over either way, as its base resolution is 512.

#

what you can do is make a second set of your images with close-up crops instead of sizing the whole thing down, e.g. cropping just the face or another key focal point in the image
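A minimal sketch of the close-up cropping idea: pick a focal point (for example the face) by hand and cut a fixed-size square around it from the full-resolution image, shifting the box when it would run off the edge. The helper name is hypothetical; the returned (left, top, right, bottom) tuple is the box format Pillow's `Image.crop` takes, if you use Pillow for the actual cropping.

```python
def focal_crop_box(img_w, img_h, cx, cy, size=512):
    """Square crop box of `size` px centred on (cx, cy), clamped inside the image."""
    if size > min(img_w, img_h):
        raise ValueError("crop size larger than the image")
    left = min(max(cx - size // 2, 0), img_w - size)
    top = min(max(cy - size // 2, 0), img_h - size)
    return left, top, left + size, top + size

print(focal_crop_box(1024, 1024, 512, 512))   # (256, 256, 768, 768)
print(focal_crop_box(1024, 1024, 980, 60))    # (512, 0, 1024, 512)
```

From a 1024x1024 render this yields a 512x512 close-up with no downscaling at all, to sit alongside the normally downscaled full image.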

tropic iron Mar 4, 2024, 7:43 PM

#

Okay, cool

#

How about switching from 1.5 to SDXL? Recommended?

#

I just took a cursory glance at the specs, and SDXL certainly seems better

stiff dust Mar 4, 2024, 7:47 PM

#

I always found training on SDXL much better and easier than training on 1.5. But I also know people who say it is more difficult to train SDXL than SD 1.5. So I would say just try it out

#

for SDXL I would say: train on SDXL base and then use the LoRA on some other/better model

tropic iron Mar 4, 2024, 7:48 PM

#

Lemme see if I can wrap my mind around this

#

Let's say I use my current workflow: producing high resolution 1024x1024 images using a model that's based on 1.5

#

Then I train THAT on the base SDXL model