#🔧|finetune

covert pagoda
#
  • a certain lighting setup
#
  • makeup
#

all within one LoRA

#

@hollow spruce so would that be a separate img folder for each, or would you keep each concept as recurring common token in the captions?

hollow spruce
covert pagoda
#

but how exactly do you develop each concept so that you can visualise the group of images that contain that concept feature ?

hollow spruce
#

my personal rule of thumb is - 100 images tagged with the same concept to work properly, when I train them together into one finetune style lora

covert pagoda
#

for instance in EveryDream, you can use yaml args in a folder to inject a caption token during training, so each folder would group the images for that concept

latent charm
covert pagoda
#

i guess in your case you are doing it manually on Danbooru tag editor

hollow spruce
covert pagoda
#

yea, i've heard about hydra, but isnt that more for anime?

hollow spruce
#

the moment I hit 2k images in total, things started to work out

covert pagoda
#

whereas i use instagram and photo websites to scrape

hollow spruce
#

at 4k images right now, it's pretty damn good

latent charm
#

Oh, nice to know

hollow spruce
#

but if you do just the cosplay training, and use keep n tokens = 1, and shuffle rest, then it should work with around 150 images, and 400 images for "close-to-perfect" results
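As a rough illustration of what "keep n tokens = 1" plus shuffling does to a caption: the first tag stays pinned as the trigger and the remaining comma-separated tags are reordered each time the caption is read. This is a minimal sketch, not kohya's actual code; the function name `shuffle_caption` and the example caption are made up here.

```python
import random

def shuffle_caption(caption, keep_tokens=1, rng=None):
    """Shuffle comma-separated tags but leave the first `keep_tokens` in place."""
    rng = rng or random.Random()
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # only the tags after the pinned ones move around
    return ", ".join(fixed + rest)

# Hypothetical caption: "2b cosplay" is the concept tag we want to stay first.
caption = "2b cosplay, white hair, black dress, blindfold, outdoors"
print(shuffle_caption(caption, keep_tokens=1, rng=random.Random(0)))
```

The pinned first tag always sits in the same position, while the other tags turn up in a different order from one epoch to the next.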

#

(I tried it with a Nier Automata 2B cosplay) <- no clip training, since '2b cosplay' was already understood, just not producing the right images most of the time

#

it learned a total of around 10 concepts, with the '2b cosplay' working roughly 4/5 times to produce an image I consider good enough to post

tribal frigate
#

I see, thanks 🙂

#

Guys do you have a tip for a good tutorial on training loras for SDXL with google colab?

covert pagoda
hollow spruce
#

while normal loras stay similar in training, with sdxl we now have options for bigger and much more complex loras, which approach finetune level of improvements - across hundreds of concepts

stone garden
#

Hello, everyone. I'm using scale_weight_norms = 1 but my average key norm doesn't stop going up and eventually approaches 1 and my keys scaled become very high after the 4th epoch. Any way I can fix this?

covert pagoda
#

My main use case for this is to attempt mixing characters... like multiplying concepts, like two characters generalized. Does that make sense? That's sort of been my desire for a while. Will give it a go this week. Any recommendations besides the ones you've given me already?

jade hornet
#

If you have one concept with 50 images, and one with 100, would you do 2 repeats on the set with 50? Or on the 100? I struggle with that

hollow spruce
jade hornet
#

Yah the kohya_ss page says its original intent was to allow you to match up with #reg images, but like you stated there is probably good reason to use it with multiple concepts even if not using regs

hollow spruce
#

I can't vouch for how well it works, since whenever I did that, it didn't really pay off, as I didn't have enough images to make it work (20 images repeated 5 times, really aren't enough to fully teach a concept - so not worth trying to make this work if you're teaching multiple concepts into one lora - better to just make 2 loras if the difference is that big)

#

on the other hand - for concepts where I have 200 images, and another with 700 images, I don't bother repeating the smaller one, as they'll still get learned at roughly the same rate

#

(the bigger dataset gets learned slower, but more flexibly - balancing everything out)

#

sub 100 images 🤷‍♂️ hard to tell without a lot of testing

#

I usually find it easier to just increase my dataset, than messing with repeat settings

jade hornet
#

Yah what I saw was one burns out the whole model before the other is trained

pliant drift
#

wonder what the documentation is on about then.

covert pagoda
#

I'm getting the vibes that adaptive algos aren't for fine tuning small datasets but rather for custom larger models

#

Getting way better with adamW8bit on my Lora

#

Guess I'll stick to constant/cosine schedulers for Lora's

#

And might go back to adaptive for say a large diverse dataset for a custom model

normal ember
#

Is it crucial to ensure that num_train_images is evenly divisible by train_batch_size?

young crater
stable spruce
#

when does stablecode come out to use

signal warren
signal warren
#

hmmm I wonder what learning rate I should use for thousands of images

elfin raven
#

Not sure if this is the right place for this, but I got sdxl lora running at 1024x1024 on a 16gb gpu. My loss is crap, but I will experiment more. I'm happy to share my settings if people want to contribute and make suggestions.

hollow spruce
# signal warren Do you seperate each concept images into different sub folders? I like the idea ...

depends. If all the concepts you're training fall under some greater class, then you don't need different folders.
In my case I currently have two folders: 1_girl, 1_woman <- since one of the goals of my lora is to give these two words very specific age brackets
but even if I put everything into 1_woman, it would still work

However, additional folders are great if you're lacking enough images for a specific concept! My "tracer cosplay" concept felt undertrained, so I added an additional folder 1_tracer cosplay, exported all those tagged images there once more, and now it learned it just how I wanted it (keep in mind, those images are now essentially duplicated - since they're already a subset of the woman/girl images. But this time they got loaded into kohya with the class prompt "tracer cosplay" - so it worked out better)

#

in short, once you have around 2k images, you can use various methods to train your lora, and they will all work 'good enough'. The magic happens when you make all the important concepts work with weights of the lora set to 1. and with no need for (tracer cosplay:1.4) or anything like that

signal warren
#

ah thanks, yeah that makes sense. I guess part of the fun really is to experiment yourself to get that perfect balance.

My idea at the moment is to make a lora which focuses on jdrama/movies, so that I can get the style of my favourite different dramas and characters all in one lora! 💦

mortal lance
#

What are best implementations available for the following methods to fine-tune the sdxl?
-- full fine tuning
-- Dreambooth + LoRA
-- Dreambooth

lethal hinge
#

What base model do you all use for training a lora with a person? just the base 1.5? (not jumping into sdxl just yet)

stone garden
#

can anyone point me in the right direction to train some text embeddings/inversion i have the sdxl1.0_0.9vae versions, thanks

hollow spruce
stiff dust
pliant drift
#

Aitrepreneur out here this week telling his audience that 99% of loras are made wrong because people are using rare tokens. I'm out here looking on civit and 1/20 users publish with rare tokens. maybe 1/20. probably less.

i think he just made that 99% figure up

stone garden
#

whats rare tokens?

#

and how is that bad?

signal warren
#

The guy in the video (not me) said the rare tokens such as a random ewfew word like that has less effect than choosing a person the SDXL already knows that looks similar to the person you want to train, for example using the name Jessica Alba to train someone who has a similar look. He believes that training with a name SDXL already knows will make the training faster/better.

stone garden
#

It's going to work faster yeah. But given the right settings and time the other way will also work perfectly.

stone garden
#

help pls

#

can anyone give me some advice on lora training? i found some guides but they're a bit confusing. maybe i could ask a couple of questions, like how do i make the captions? there seem to be different ways to apply them. can i just make a folder with images?

orchid yoke
# stone garden can anyone give me some advice on lora training ? i found some guides but they'r...

Starting out, I would think your best option is to use the gui https://github.com/bmaltais/kohya_ss - In Lora > Tools > Deprecated you can fill in the details, click prepare training data and then "copy info to folders tab" which will handle the folder creation. There's then utilities/captioning/blip captioning to create the core captions. So then you have the folders in the right format, and the captions created with ease. You likely will want to make some manual edits to the captions to make them better, but you have a core framework to go off that way.

#

(screenshot taken from https://youtu.be/sBFGitIvD2A) but as you say there are lots of guides. It's still pretty new and I think there are going to be even more amazing things to come, especially around multiple concepts

In this tutorial, you will learn how to install Automatic1111 Web UI for SDXL. How to use LoRAs with Automatic1111 SD Web UI. How to install Kohya SS GUI scripts to do Stable Diffusion training. How to train LoRAs on SDXL model with least amount of VRAM using settings. All of the details, tips and tricks of Kohya trainings. How to do x/y/z plot ...

sour eagle
#

why does my loss stay around 0.125 when using adafactor? is that normal for variable learning rate ?

stiff dust
#

the loss has nothing to do with the optimizer or the learning rate. The learning rate is applied on the gradient that is computed from the loss
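To make that distinction concrete, here is a minimal sketch of one plain gradient-descent step on a made-up one-parameter model (adaptive optimizers such as Adafactor rescale the step further, which this does not model). The loss and gradient at the current step are computed first; the learning rate only decides how far the weights then move, so it shapes later losses rather than the one just measured.

```python
def loss_and_grad(weight, x, target):
    """Squared error of the toy model y = weight * x, and its gradient w.r.t. weight."""
    error = weight * x - target
    return error ** 2, 2 * error * x

def sgd_step(weight, grad, lr):
    """One plain gradient-descent update: the learning rate only scales the step."""
    return weight - lr * grad

w = 0.0
loss, grad = loss_and_grad(w, x=2.0, target=4.0)  # same loss and gradient for both runs
small = sgd_step(w, grad, lr=0.01)                # cautious update
large = sgd_step(w, grad, lr=0.1)                 # ten times bigger update
print(loss, grad, small, large)
```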

latent charm
#

Did anyone try LoRA-FA?

latent charm
#

I have tried this LoRA-FA. It basically achieved a better result in half the epochs and it is memory efficient. I was able to do batch size 10 at around 21/24 GB using a 3090.

signal warren
#

Interesting, seems like worth a try then

signal warren
latent charm
signal warren
#

aaaah cheers!

normal ember
#

Anyone tried Salesforce/blip2-opt-6.7b-coco or something similar to auto-caption images? I think I'm starting to have a baseline for training.

covert pagoda
# hollow spruce that's a whole essay's worth of topics, which each have multiple answers which r...

What are your thoughts on finding the right scheduler for a given dataset? Are you one to play with runs on wandb and compare sequentially the effect of applying constant vs cosine with startups vs cosine annealing etc.? Maybe seeing the different feedback from each LR on strengths or weaknesses of various LR gamma helps to identify where convergence is highest? Do you have graphological conversations? I've noticed this is a hot debate amongst the big model fine tuners, less so for the dreamboothers.

stone garden
normal ember
latent charm
#

My approach is to tag with wd14 and remove unnecessary tags

normal ember
#

I feed it a data structure with all the questions and then I combine it to the caption that I then write to file.

questions = {
    "p_style": "Choose the mood of the photo: Desolate, Tense, Lonely, Stark, Quiet, Dark.",
    "p_subject": "Describe it in detail.",
    "p_mood": "Describe the mood in the image.",
    "p_colors": "Describe the colors.",
    "p_framing": "Choose the framing of the subject: Close-up, Medium, Wide.",
    "p_setting": "Describe the general setting or environment in a few words.",
    "p_lighting": "What is the lighting like? E.g.: Natural, Low, Soft, Harsh.",
    "p_angle": "What angle is the picture taken from? E.g.: Straight on, Low, High.",
    "p_dof": "What depth of field is used? E.g.: shallow depth of field, deep depth of field."
}
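The "ask every question, join the answers, write to file" flow could look roughly like this. It is a sketch under assumptions: `ask` stands in for whatever call sends one question about one image to the captioning model (the real model call is not shown in the chat), and `build_caption` / `write_caption` are hypothetical helper names.

```python
from pathlib import Path

def build_caption(questions, ask):
    """Ask each question in dict order and join the non-empty answers with commas."""
    parts = []
    for key, question in questions.items():
        answer = ask(question).strip().rstrip(".")
        if answer:
            parts.append(answer)
    return ", ".join(parts)

def write_caption(image_path, caption):
    """Write the caption next to the image as a .txt file with the same stem."""
    caption_path = Path(image_path).with_suffix(".txt")
    caption_path.write_text(caption + "\n", encoding="utf-8")
    return caption_path

# Stand-in for a real model call: canned answers keyed by question text.
canned = {
    "Choose the framing of the subject: Close-up, Medium, Wide.": "Wide.",
    "What is the lighting like? E.g.: Natural, Low, Soft, Harsh.": "low",
}
questions = {
    "p_framing": "Choose the framing of the subject: Close-up, Medium, Wide.",
    "p_lighting": "What is the lighting like? E.g.: Natural, Low, Soft, Harsh.",
}
caption = build_caption(questions, lambda q: canned[q])
print(caption)
```

Writing the caption to a same-named .txt file beside the image matches the sidecar layout that caption-based trainers commonly read.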

latent charm
#

It should work better for fine tuning. It is unnecessary for lora

stone garden
hollow spruce
# covert pagoda What are your thoughts on finding the right scheduler for a given dataset? Are y...

funnily enough, I don't use the graphs at all.
I usually just rigorously test my checkpoints, as the loss values can appear perfect, yet upon real life testing - it turns out only 3/5 concepts were learned.

Tracking Loss is great to ensure that your settings don't have some massive error, or that the model didn't implode on itself. But other than that, it's not really worth it for me, as I have some loras that work great, despite less than ideal loss values.

I usually stick with AdamW or AdamW8bit + constant. For anything that isn't faces or anatomy - this will give so close to 'perfect' results, that it wasn't worth trying with other settings for me.

#

While I'm confident, that cosine with restarts, or cosine with annealing restarts can be used to get even a bit higher quality, on harder concepts such as faces/anatomy - getting that setting just right for your specific dataset + tagging is hard enough that I find it hard to recommend.

#

Prodigy can be good - as you can use it as a set & forget optimizer. One training setting fits all - literally. It won't give you perfect results, but it gets you 80% of the way, 80% of the time. Regardless of how complex your dataset is

stone garden
hollow spruce
#

so I've seen it used successfully with genuinely insane datasets of like 30k images

hollow spruce
stone garden
# hollow spruce now if only min snr didn't kill contrast 🥲

Eh? Min snr gamma only smooths the loss so the optimiser or LR scheduler have a better chance of optimising the thing. I saw that in the code, but didn't check if it does anything to the image before it's fed into the model. Otherwise with a model as complex as SD it's sure to go crazy loss-wise.
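For reference, the Min-SNR weighting as it is usually described only rescales the per-timestep loss: for noise prediction, each timestep's loss is multiplied by min(SNR, gamma) / SNR, so nearly clean timesteps with a very large SNR are down-weighted and the input image is left alone. A minimal sketch, assuming noise (epsilon) prediction and made-up `alpha_bar` values; it is not copied from any training script.

```python
def snr(alpha_bar):
    """Signal-to-noise ratio of a diffusion timestep with cumulative alpha `alpha_bar`."""
    return alpha_bar / (1.0 - alpha_bar)

def min_snr_weight(alpha_bar, gamma=5.0):
    """Loss weight min(SNR, gamma) / SNR for a noise-prediction objective."""
    s = snr(alpha_bar)
    return min(s, gamma) / s

# Hypothetical timesteps, from almost clean (0.99) to very noisy (0.05).
for alpha_bar in (0.99, 0.5, 0.05):
    print(alpha_bar, round(snr(alpha_bar), 3), round(min_snr_weight(alpha_bar), 3))
```

Timesteps whose SNR is already below gamma keep a weight of 1, so only the low-noise end of the schedule is damped.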

#

I shall remember to check that.

hollow spruce
# stone garden Eh? Min snr gamma only smooths the loss so the optimiser or LR scheduler have a ...

it's a ?bug?. I mean I'm not sure if it's a bug since it's more about how the base sdxl model was trained itself, than anything else. But if you finetune sdxl enough, especially using bright images, you'll notice that the finetune always tries to converge on 50% grey. First backgrounds turning greyer, then blacks turning less black, whites turning less white.
Some settings can speed up that effect - the biggest offenders I've found are offset noise + min snr

stone garden
#

Interesting. I shall check what they do in the code in more detail. Noise offset moved some things between the up and down pass of the Lora so that the scale was okay from what I saw, but I didn't look in too much detail.

hollow spruce
#

since sdxl base model was trained with offset noise (and not just one consistent value either, but rather various levels of offset noise, at various intervals)
our running theory is that we can't exactly match that offset noise - hence this is the effect we get
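Offset noise, as commonly described, adds one random shift per channel on top of the usual per-pixel Gaussian noise, which is what lets a model move overall brightness. The sketch below uses plain Python lists and a single fixed `offset` value; both are assumptions for illustration, and per the message above the base model's own schedule of offsets is not known here.

```python
import random

def offset_noise(channels, height, width, offset=0.1, rng=None):
    """Per-pixel Gaussian noise plus one random shift per channel, scaled by `offset`."""
    rng = rng or random.Random()
    noise = []
    for _ in range(channels):
        shift = offset * rng.gauss(0.0, 1.0)  # same value for every pixel of this channel
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                      for _ in range(height)])
    return noise

# Hypothetical 4-channel 2x2 latent; prints the mean of each channel's noise.
sample = offset_noise(4, 2, 2, offset=0.1, rng=random.Random(0))
print([sum(sum(row) for row in ch) / 4 for ch in sample])
```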

#

it's even worse on full finetuning

stone garden
#

@hollow spruce I hope you don't mind me asking, does the loss of your LoRA always go down? Mine fluctuates a bit between 0.68 and 0.72 for example.

hollow spruce
#

but it's nothing about the code - its 100% implemented correctly. It's just what the finetune is learning.
(SAI themselves don't experience this though, fyi - so their finetuning workflow, using custom scripts is immune to this issue)

stone garden
#

Oh not saying it's implemented wrong, just saying I probably don't understand it fully or have seen it enough yet.

normal ember
hollow spruce
stone garden
#

Means it's something in my settings probably. I have a big enough dataset I'm confident I don't overfit.

#

But I still have to prune and caption it a bit more.

hollow spruce
#

keep in mind, invisible things can also overfit XD I've trained more than one "noise" lora by now, on accident

stone garden
#

Yeah. If the background is too similar or the subjects are always wearing the same clothes or stuff like that.

stone garden
hollow spruce
stone garden
hollow spruce
#

that way, nothing overfits, and everything gets trained equally

stone garden
#

Super helpful. Thanks.

hollow spruce
#

I would really really recommend training faces + anything else via 2 different loras though. saves you soooo much trouble

#

for me, I can't do that since I'm doing a finetune style lora with (currently) 4k images, which is just a literal finetune like experience, rather than teaching a specific concept.

stone garden
#

And upscale?

normal ember
hollow spruce
# stone garden And upscale?

using all high quality source images, with 0 upscale needed was the biggest improvement so far. Really helps with keeping fine detail consistent across all images generated.
My current dataset is manually edited - so all images are cropped to my ideal standards (2:3, 4:5, 1:1, 16:9, etc.)

but when you're training just a single concept (emphasis on concept, not style), then 1:1 crop will give the same results as using completely mixed buckets.
If you only use one non 1:1 resolution, like for example 2:3, then expect your lora to perform better when generating images at that exact aspect ratio, and somewhat worse at all other aspect ratios.

If you're doing a style lora, then just use the images in various aspect ratios, and maybe crop a few so you can also have some 1:1 examples in there - in case there were none
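A quick way to see how a dataset spreads across aspect ratios is to assign every image to the closest bucket and count. The bucket list below is a small illustrative set of resolutions around the 1024x1024 pixel budget, not the trainer's actual list, and the helper names are made up; real bucketing also resizes and crops, which this skips.

```python
import math
from collections import Counter

# Illustrative (width, height) buckets; an assumption, not the trainer's own list.
BUCKETS = [(1024, 1024), (832, 1216), (1216, 832), (896, 1152),
           (1152, 896), (768, 1344), (1344, 768)]

def nearest_bucket(width, height, buckets=BUCKETS):
    """Pick the bucket whose aspect ratio is closest to the image's, compared in log space."""
    ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ratio))

def bucket_counts(sizes, buckets=BUCKETS):
    """Count how many images land in each bucket."""
    return Counter(nearest_bucket(w, h, buckets) for w, h in sizes)

# Hypothetical dataset: two 2:3 portraits, one 16:9 landscape, one square.
sizes = [(2000, 3000), (4000, 6000), (1920, 1080), (2048, 2048)]
for bucket, count in bucket_counts(sizes).items():
    print(bucket, count)
```

If almost everything lands in one non-square bucket, that is the situation described above where the lora ends up favouring that aspect ratio.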

#

fyi, this can also be misused to essentially make an aspect ratio lora, to give more consistent results with 1:4 or 4:1 aspect ratios, which usually only work around half the time, if all your source images are 2048:512

normal ember
#

Base vs LoRA. Not sure what to think about it.

covert pagoda
#

@hollow spruce what do you use AdamW for? Usually more the 8bit variant? I'm using an A100 on runpod, but wondering on what scale are the benefits of the added precision if needed

tall condor
#

Hi Guys, I'm trying to train bodyparts "Fingers" so I made a library of like 300 images of Hands and Fingers. I'm creating a lora from that which generally works ok, but my issue is that as soon as I apply the lora the resulting images get very narrow

#

is there a tag / way i can use to basically mark it as bodyparts and avoid that

normal ember
#

Loss from tensorboard. I picked the last epoch.

sonic narwhal
stone garden
#

anyone had size mismatch errors with training sdxl ?

hollow spruce
hollow spruce
#

(would probably be worth running a few tests to compare them with the same dataset + settings, and see if there's a difference)

normal ember
stone garden
normal ember
stone garden
normal ember
#

--full_bf16 does not bomb on me with --sdpa. Use that instead of --xformers.

normal ember
#

DoF

stone garden
#

cool that just looks like that street photography i was going for

normal ember
#

I'm running another dataset now that I've made today, 225 images total.

stone garden
#

how do you train it?

#

i was trying to but it kept failing on me

normal ember
#

With kohya_ss using sdxl_train_network.py. I took some inspiration from Caith's configs but removed some for the defaults and changed some based on this: https://hoshikat-hatenablog-com.translate.goog/entry/2023/05/26/223229?_x_tr_sl=sv&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp

In the previous article, I explained how to install "kohya_ss", a WebUI environment for additional training of Stable Diffusion models. This time, I will give a rough explanation of how LoRA works and then go through the LoRA training settings in kohya_ss. Note: this article is very long! This article only explains the meaning of each setting. It does not cover how to prepare training images, how to caption images, or how to run the training. I would like to explain how to run training in a separate article. Understanding how LoRA works: What is a "model"? LoRA adds a small neural network …

hollow spruce
#

so. yeah.
It works? šŸ˜•

#

why does it work 🤣

normal ember
sonic narwhal
normal ember
signal warren
#

I think he just wants the json file to have an idea what settings, learning rate etc you used lol

#

Up to you though!

normal ember
#

But there's probably better settings so take it with a big grain of salt

signal warren
sonic narwhal
normal ember
#

Doing another run on another dataset with about the same number of images and I can't understand why I get about the same loss? Only thing I can think of is the captioning is similar format.

#

Ok, not the same but similar curve.

open merlin
#

Yea loss curves tend to flatten out relatively quickly on average I've noticed. You should probably check on your training data. And maybe store (accumulate) the gradients so you can simulate a larger batch size. Up the learning rate too then. Then use cosine with restarts (like 2 or 3 cycles). That seemed to work for me when I trained my last lora. Also just use repeat 1 and up the number of epochs, so you can save intermediate results.
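The "store the gradients to simulate a larger batch" idea is gradient accumulation: run several small batches, average their gradients, and only then take one optimizer step. The toy example below (a made-up one-parameter regression, pure Python) shows that averaging equally sized micro-batches reproduces the gradient of the full batch.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the toy model y = w * x over (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Average the gradients of equally sized micro-batches before a single update."""
    chunks = [data[i:i + micro_batch_size]
              for i in range(0, len(data), micro_batch_size)]
    return sum(grad_mse(w, chunk) for chunk in chunks) / len(chunks)

# Hypothetical data: four samples processed as one batch or as two micro-batches of 2.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, micro_batch_size=2)
print(full, accum)
```

Memory use stays at the micro-batch level, which is why this helps on smaller GPUs; the cost is more forward and backward passes per update.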

ruby pond
#

Is ~5.7s/it about normal for batch size 8 lora training on a 4090?

torn spade
#

hey yall - having some issues with a lora im currently making right now. I'm trying to train on the bberny belt/skirt by diesel, and was wondering if my image selection is a little poor.
would appreciate if yall have any feedback on image selection and labeling
https://github.com/matthew2k/bberny_lora

GitHub

training images + data for lora training. Contribute to matthew2k/bberny_lora development by creating an account on GitHub.

young crater
# torn spade hey yall - having some issues with a lora im currently making right now. I'm tra...

I am by no means good at lora training, but I noticed that I got the best results when I could easily describe everything in the scene I didn't want to be trained.

For instance, Image 15 in your set, I have no idea what the top is or how to describe it and I suspect the ai would assume that's part of what you want from your training data (also the skirts being covered by the top, which will hinder training). Image 18, in comparison, is probably a solid image as the ai knows what a Shiny Jacket is and a person standing in a white photo studio.

**As for the caption, for image 18:**

mini skirt, a fashion photo of a woman standing, wearing a silver metallic jacket, high heels, holding a purse, black hair, brown skin, white photo studio background

This gives a keyword the ai already knows (mini skirt), a simple description of the subject, a description of each part of the subject you don't want the lora to remember, and a description of the background.

**For Image 19:**

mini skirt, close up photo of a torso, brown skin, white background

Should be enough to get the point across. In both of these cases, I believe it is important to specify skin tone or else the LORA will err towards whatever skin tone is most prevalent in your training data.

Image 24 looks too compressed. Images 27-30 as well. Since SDXL training is 1kx1k, any visible compression will leak into the LORA. I've found even high res photos downscaled to 1080p do better than still frames of 4k movies in terms of compression/sharpness. It looks like, in your outputs, the image compression/resolution is baking into the LORA. Also a single bad image can screw up a training. Err towards fewer images of higher quality rather than more of lesser quality.

torn spade
torn spade
torn spade
young crater
#

But it's not really describing what you don't want, just describing each feature of the image

latent charm
#

I trained two test loras on the peace sign hand. The current issue is that nails are usually drawn wrong and the thumb gets connected to the ring finger. Where could I find high quality training images? I think it should perform better if the training set is improved. @hollow spruce Do you have any suggestions?

steady jackal
#

Hi guys, I was wondering if it's possible to train a lora from an existing .safetensors file? I've been looking all over but can't find a clear answer or way to do it. From what I see I can do it from some webuis, but I can't find how to do it in python. Do any of you maybe know of a guide or something? When I google it I get results where you train from base and end up with a safetensors file instead of starting with one.

latent charm
#

The original LECO git doesn't support SDXL yet. Someone modified it and published it on civitai. Also, he was trying to improve it to support fancy training, like generating with model A and training on model B, etc.

steady jackal
#

Looks like that does what i want, but it seems to be mainly focussed on erasing concepts?

#

I dont really see an example for training on images. only prompts to remove from the model.

latent charm
#

The idea is to use prompts to generate "images" from the model and train on them

steady jackal
#

Do you have a github link or something for me of one of these modifications? I'm currently looking at https://github.com/p1atdev/LECO but this might not be the right one?

latent charm
#

That is the one which doesn't support sdxl yet

stiff dust
steady jackal
stiff dust
#

safetensors is just a file format. Do you mean: train a lora from an existing lora? Or training a lora from another sdxl checkpoint?

#

but what is the issue then? In kohya_ss you give the safetensor file of the model as parameter

#
--pretrained_model_name_or_path="/path/to/sdxl/model.safetensors"
steady jackal
stiff dust
#

ah, okay. I mean you can use diffusers if you want to write the code yourself

steady jackal
#

i have tried but i dont think this accepts .safetensors format?

stiff dust
#

otherwise kohya sd-scripts is a nice collection of python scripts for lora training

latent charm
#

For general lora training, just using kohya ss is enough

steady jackal
stiff dust
steady jackal
stiff dust
#

.ckpt or .safetensors doesn't matter. It's how the model is stored within these formats

#

models are usually stored as python dictionaries of key -> tensor. .ckpt is just a pickle of these, .safetensors is a more restricted serialization routine
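The "restricted serialization" part can be seen directly in the file layout: a .safetensors file starts with an 8-byte little-endian header length, followed by a JSON header mapping each tensor name to its dtype, shape and byte offsets, followed by the raw tensor bytes. The sketch below writes and reads a tiny one-tensor file with only the standard library; the helper names and the tensor name are made up, and for real work the `safetensors` package is the sensible choice.

```python
import json
import struct

def write_tiny_safetensors(path, name, values):
    """Write a single float32 tensor in the safetensors layout (demonstration only)."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {name: {"dtype": "F32", "shape": [len(values)],
                     "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header size
        f.write(header_bytes)                          # JSON: name -> dtype/shape/offsets
        f.write(data)                                  # raw tensor bytes

def read_safetensors_header(path):
    """Return the JSON header without loading any tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))
```

Reading only the header is a cheap way to list a checkpoint's key names, for example to check which naming scheme a file uses before feeding it to a particular loader, and unlike a pickle it never executes code from the file.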

steady jackal
#

it didnt accept the .safetensors etc.

stiff dust
#

the problem is that diffusers has different key names than auto111/kohya/sai

steady jackal
#

i will for sure have a look at kohya

#

i wanted to use diffusers but it got really confusing as it just refused to accept the format.

stiff dust
steady jackal
#

might have been doing something wrong, but also couldn't find a guide on it at all

steady jackal
stiff dust
#

I just don't know if they already work for sdxl

steady jackal
#

it's fine if it doesn't, at least then i can try to get the hang of it and start learning 🙂

#

thanks a lot both of you

stone garden
#

kohya script train network keeps crashing after trying to load the network in UNet2DConditionModel print

#

Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight",

#

any suggestion?

hollow spruce
#

fun fact. I'm also doing anatomy training right now

#

plz send help. RTX4090 isn't fast enough 🥲

hollow spruce
latent charm
#

Thanks for sharing. I gathered my training set from pexels. Most of them are jpg and I got the jpg effect burned into the lora.🤣

hollow spruce
#

too many selfies in my dataset 🤣

latent charm
#

I think the lora learned the general shape of the pose but failed in detail. Do you have advice for that?

hollow spruce
#

probably a valid reason to switch to cosine. Though I have no advice at what rate to start at

latent charm
#

How big is enough? My dataset only has 26 for now.

hollow spruce
#

if they are similar enough, then 30~50 for overfitting. 100~200 should allow you to get away with only a small amount of overfitting
200~500 for "flexible". <- but that's not really needed, as you probably always want a peace sign if you load your lora. so just do it via overfitting.

#

general rule of thumb, is 10x the dataset for a flexible lora, and 100x for full finetuning

#

I currently have 200~250 images per body part I'm training x_x which is why its training forever 😭

latent charm
#

I would try 50 first then continue gather more.

west plank
#

I just used Kohya to make my first LoRA and it was surprisingly easy. Made a lot of first time mistakes but it still turned out super functional. Used way too large of a data set, like 2000 images. I don't need that much >.>

sullen locust
#

Hello, I am using stable diffusion xl and I am trying to create my avatar using lora training via kohya_ss, and I am getting the error attached. Help me, thanks

young crater
young crater
# sullen locust

it looks like kohya is unable to find your training images, they should be setup as:

/[project]/img/[repeats]_[keyword]/[images]
for example:
/avatar/img/1_man/image_01.png
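A small check like the following can confirm that a project's img folder follows that layout before launching a run. It is a sketch with made-up helper names: it looks for subfolders named `<number>_<name>`, counts the images in each, and estimates steps per epoch on the assumption that the leading number is a per-image repeat count (which is how the kohya documentation describes it) and that bucketing does not change the batch count.

```python
import re
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def scan_dataset(img_dir):
    """Return (number, name, image count) for every '<number>_<name>' subfolder."""
    found = []
    for folder in sorted(Path(img_dir).iterdir()):
        match = re.fullmatch(r"(\d+)_(.+)", folder.name)
        if folder.is_dir() and match:
            count = sum(1 for p in folder.iterdir()
                        if p.suffix.lower() in IMAGE_EXTS)
            found.append((int(match.group(1)), match.group(2), count))
    return found

def steps_per_epoch(found, batch_size):
    """Sum of images times the folder number, divided by batch size, rounded up."""
    total = sum(number * count for number, _, count in found)
    return -(-total // batch_size)  # ceiling division
```

An empty result from `scan_dataset("/avatar/img")` would point to the same problem as the error above: no correctly named subfolder with images in it.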

brittle plaza
#

hey there! curious if anyone has had issues with distortions with training lora's on people. here's an example of a lora i trained recently (V* shaking hands with natalie portman) but there's some pretty significant facial distortions.

I used kohya ss with ~15 images, celeb token, no captions, no regularization. could this be due to lack of regularization?

stone garden
#

Does anyone have guides for fixing teeth if one doesn't have close-ups of the subject's teeth? Something like an embedding?

stone garden
#

bitsandbytes says no gpu support on amd anyone faced this?

normal ember
stiff dust
normal ember
#

Yep, few passes of base on it after upscale can fix it.

hollow spruce
# hollow spruce

40 hours later on my rtx4090. 1.4k image dataset - 2x 40-epoch runs with constant (5e-4) vs cosine with restarts (1e-3 + restart every 5 epochs)
constant with significant warmup won hands down. Lora is working well enough after 12 hours. This is essentially a finetune when it comes to body anatomy. But realistically it would have to run for about 60 more hours until it achieves perfection 🥲
cosine with restarts was my attempt to speed this up, but after 12h of training, it's already worse than the other lora, rather than better. While it might just converge at a later point, it definitely defeats the point of saving time 🤷‍♂️

no dropout, no offset noise, no min snr gamma - since I didn't want to damage the base sdxl capabilities.

Results? Near perfect with constant. Model works almost identical to base, backgrounds aren't influenced at all, but everything about anatomy is now working on the first attempt. I'd show results but for nsfw reasons this obviously isn't an option XD
I'll probably have to move to an A100 stack once I combine this with my master dataset 🥲

But yeah, if anyone wants to train hands/feet/different body shapes/nsfw/skin detail - feel free to hit me up in a dm. I now have well working settings, which don't rely on good captions - but oh god is it training slow as hell.
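For anyone comparing the two schedules from this run, these are the usual shapes: constant with warmup ramps up linearly and then holds, while cosine with restarts decays along a half cosine and jumps back to the peak at the start of every cycle. The peak rates 5e-4 and 1e-3 come from the message above; the warmup length and steps per cycle below are placeholders, since only "significant warmup" and "restart every 5 epochs" were given.

```python
import math

def constant_with_warmup(step, base_lr, warmup_steps):
    """Ramp linearly from 0 to base_lr over warmup_steps, then hold base_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

def cosine_with_restarts(step, base_lr, steps_per_cycle):
    """Decay from base_lr towards 0 along a half cosine, restarting every cycle."""
    progress = (step % steps_per_cycle) / steps_per_cycle
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Placeholder values: 100 warmup steps and 500 steps per cosine cycle.
for step in (0, 50, 100, 250, 499, 500):
    print(step,
          constant_with_warmup(step, 5e-4, warmup_steps=100),
          cosine_with_restarts(step, 1e-3, steps_per_cycle=500))
```

With restarts, the rate spends much of each cycle well below its peak and then jumps back up, so the two runs are not seeing the same effective learning rate even when the peak values look comparable.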

hazy schooner
jade hornet
jade hornet
#

That's the one I use

#
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++ DEBUG INFORMATION +++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Running a quick check that:
    + library is importable
    + CUDA function is callable

SUCCESS!
Installation was successful!
#

that will show you if it's installed correctly

tepid sundial
hollow spruce
tepid sundial
#

I was more so asking about the code for the training loop you use. Do you write your own or use one that's available?

hollow spruce
#

ah, yeah. I use standard kohya gui - as I'm still writing a guide

tepid sundial
#

Ah, I see

hollow spruce
#

hard to make a good guide with all custom code XD

tepid sundial
#

Hmm, perhaps 🤷‍♂️
A well annotated notebook is something the community would benefit from I guess. There are many training scripts out there right now, and the one that was recently merged into diffusers to train SDXL for txt2img has major warnings that results aren't good, and would require heavy hyperparam search. Most scripts out there are fairly similar, but differ on many small (but maybe important) details. Like offset, min-snr, terminal snr, etc.

It just feels in general like there's too little data being shared around successful training runs.

hollow spruce
#

checks out. diffusers training is hard as hell

#

simpletuner, while still hard, is definitely your best bet

#

does require coding knowledge though

#

(for full finetuning that actually works)

#

A6000 required or better

tepid sundial
#

The author of SimpleTuner helped and did a thorough review of the PR that introduced that training script; there definitely were many suggestions from him that weren't implemented in the final script that got merged.

#

In my own script I've tried combining strategies from diffusers, simpletuner and sd-scripts, but without good public data on successful training runs, it requires so much wasted time to search the param space. I truly wish model makers would be more open to sharing data. But alas they view their models as IP; rather than wanting to partake in research, they will sit on their "secret sauce".

hollow spruce
#

training via kohya is essentially fully working though. my best lora so far has achieved what would in 1.5 have been called a full finetune

tepid sundial
#

And forgive my ignorance, but when you say kohya, you simply mean a GUI frontend over the logic in sd-scripts?

hollow spruce
#

I'm mainly using the gui frontend for the simplicity of sharing settings, but the launch cli command is the same

tepid sundial
#

Okay, just wasn't sure if the GUI did additional things. I've never checked it out.. but the code for sd-scripts.. I've read one too many times by now..

hollow spruce
#

the gui does have the advantage of having all the new 'working' parts from the dev branches of kohya-ss integrated

#

so you don't need to mess with the dev branches yourself

tepid sundial
#

What's been your experience with training diverse aspect ratios (clipped to the ratios mentioned in the SDXL paper, however).

hollow spruce
#

other than that, nothing really

hollow spruce
#

so as long as you have multiple mixed aspect ratio buckets, it works even better than standard 1:1 ratios
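
A rough sketch of what "mixed aspect ratio buckets" means in practice: each image is assigned to the training resolution whose aspect ratio is closest to its own. The bucket list and the `pick_bucket` / `group_by_bucket` helpers below are hypothetical and only illustrate the idea; this is not the code kohya or any other trainer actually uses.

```python
# Hypothetical bucket list: (width, height) pairs with roughly 1024x1024 pixels of area.
BUCKETS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]


def pick_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's ratio."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


def group_by_bucket(sizes, buckets=BUCKETS):
    """Group (width, height) image sizes by the bucket they fall into."""
    groups = {}
    for size in sizes:
        groups.setdefault(pick_bucket(*size, buckets=buckets), []).append(size)
    return groups


if __name__ == "__main__":
    images = [(4000, 3000), (3000, 4000), (1920, 1080), (1080, 1080)]
    for bucket, members in group_by_bucket(images).items():
        print(bucket, members)
```

Each batch is then drawn from a single bucket, so landscape, portrait and square images can all be trained without cropping everything to 1:1.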

tepid sundial
#

What's been your experience on bf16 vs f16, had issues with bnb AdamW8?

hollow spruce
#

had enough people recommend me to switch from AdamW8bit to AdamW + full bf16
I've seen no negative impact from this. Vram is roughly the same, so I've just stuck with it.
Would need to run identical tests though, to see if the additional accuracy actually improves things or not

#

not really a priority for me though, as there's no downside of running full bf16 for now

tepid sundial
#

Have you tried running full UNET training with batch size one on a 4090?

hollow spruce
#

did various 1 batch tests, with additional accuracies, including one run with gradient checkpointing off XD
yeah. while results were pretty different from what I was expecting, I can't say that a single one of them was actually better. just different?

#

offset noise I've had to stop using though, as it was making my backgrounds greyer. hence why most of my shared settings have 0 offset noise

tepid sundial
#

What's your stance on the viability of only training loras as opposed to full unet training for the further development of the SDXL ecosystem? (text encoders too, for that matter)

hollow spruce
#

more complicated of a matter

#

clip training is great, but nothing like SD1.5

#

any old tutorial/knowledge is no longer valid when it comes to sdxl clip training

tepid sundial
#

Likely not the bottleneck anytime soon; but the lack of full unet trainings has me concerned

hollow spruce
#

but when done right, its really really good

#

full finetune is ... yeah. resource heavy

#

even my rtx4090 isn't really good enough

#

batch size 1, even with GA, isn't a true solution

tepid sundial
#

Can't say I'm happy with the setup on a 4090, either. Which is a bummer

hollow spruce
#

A6000 is now working, if you go the diffusers route

#

but as I dont have one XD I cant really speak much about it

#

instead I'm just seeing how far I can take lora training

tepid sundial
#

While it's going to be slow, I think a setup that at least achieves good results on a 4090 would benefit the community, as it would increase the number of unet trainings available

#

Would distribute the effort across more people

hollow spruce
#

🤷‍♂️ genuinely not sure, if I just look at all the loras that are currently publicly available

#

the bad lora echo chamber is real

tepid sundial
#

There's always going to be noisy signal, but unless there's good tooling available, you can't hope to filter out decent signal from all the noise to begin with

hollow spruce
#

true that

#

I'd get into it if it runs on 24gb vram

#

but at this rate, chances are higher I'll throw my full dataset at the owner of simpletuner once its complete XD

#

currently at 6k images. final will be around 50k images. (all manually edited/cropped/captioned)

tepid sundial
#

Well, with batch size one it runs for only unet, so it's wicked slow. But slow and working is better than nothing at all

hollow spruce
tepid sundial
#

I've tried trainings with datasets ranging from 5k images to 20k images, batch size one. And results have been very mixed depending on what techniques I incorporate. Sampling data on training results is a ... very slow process.

hollow spruce
#

at that point its cheaper to rent a runpod A100 stack, just to offset the electricity costs

tepid sundial
#

Yes, obviously it's the way to go. I'm just trying to optimise for all the people out there that simply will never run a training if it means renting on cloud, but if it means leaving their computer on during the night for two weeks, they'll give it a go.

hollow spruce
tepid sundial
#

That depends on what we're talking about specifically when we say "cheat"

hollow spruce
#

higher learning rates, or adaptive optimizers, which scale up the learning rate for you

tepid sundial
#

Yeah, and this is what I want to find more data on : D
It's hard to argue these things without any collected evidence

latent charm
#

I once tried a fine-tune with 10k images. It took 5x24 hours on my 3090 and I predicted it would need two more rounds (5x24x2) to get things done. Then I gave up on the full fine-tune and used a smaller dataset to fine-tune the first output.

hollow spruce
tepid sundial
#

Have you tried doing FID?

hollow spruce
#

if I can beat or match 3/5 images, using the seed 2, then I consider my lora as 'good enough' XD

hollow spruce
hollow spruce
#

nop. have never used FID yet. but that was an interesting read x_x

stiff dust
tepid sundial
#

Right, we were talking about full unet training runs

stiff dust
#

doesn't matter unless you train the unet on a complete dataset of many different subjects

#

but fid doesn't really measure aesthetics anyways

marsh brook
#

Anyone into the following

I am a sock manufacturer.
I am looking to have AI create image Files from images given by customers and utilize AI image creativity.

Image files created are used to transmit data to a machine to engage functions.
Data transmission or data signal designators to the machine are represented by the RGB colors located in the file. Machine capability is limited, so the RGB colors that can be in the file must be limited. Currently AI image generators use shading, gradients, etc. in creating images. You also cannot designate the image size, more specifically the image size in pixels.

Example
168 pixels wide 400 pixels height.

168 represents the 168 needles that are in the cylinder of the machine.
400 represents how many courses are in the sock. or how many times the cylinder has rotated picking up different colored yarn at its yarn intake points.

RGB colors in the file are used by technicians to designate fixed yarn takeup points on the machine.

Transmitting the data to the machine is not what I am looking for. I am just looking to create images

#

Current reasons that AI images on all AI platforms do not communicate with machine equipment that makes textiles: 1. not able to mandate the size of the file in pixels. 2. not able to mandate the number of allowable RGB colors in the file.
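
One possible workaround for both constraints is to post-process whatever the generator outputs: resize it to exactly 168 x 400 and snap every pixel to the nearest color of a fixed palette. The sketch below is plain Python on a grid of RGB tuples, so it makes no assumption about any image library; the function names, the 4-color palette and the synthetic source image are all made up for illustration, and the mapping from palette colors to yarn takeup points is left to the technicians.

```python
def nearest_color(pixel, palette):
    """Return the palette entry closest to `pixel` (squared RGB distance)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))


def resize_nearest(pixels, out_w, out_h):
    """Nearest-neighbour resize of a row-major grid of RGB tuples."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_knit_grid(pixels, palette, needles=168, courses=400):
    """Resize to needles x courses, then restrict every pixel to the palette."""
    resized = resize_nearest(pixels, needles, courses)
    return [[nearest_color(p, palette) for p in row] for row in resized]


if __name__ == "__main__":
    # Hypothetical palette: one RGB value per available yarn.
    palette = [(0, 0, 0), (255, 255, 255), (200, 30, 30), (30, 60, 200)]
    # Synthetic gradient standing in for an AI-generated image.
    source = [[(x * 4, y * 4, 128) for x in range(64)] for y in range(64)]
    grid = to_knit_grid(source, palette)
    used = {color for row in grid for color in row}
    print(len(grid[0]), "x", len(grid), "pixels,", len(used), "colors used")
```

Nearest-neighbour resizing is used on purpose here: smoother resampling would blend neighbouring pixels and reintroduce in-between colors before the palette step.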

hollow spruce
ruby pond
#

Is it normal to see images that are closer to the reg images than training images in the earlier epochs of training a lora?

plain bolt
#

Is captioning really important if I am training a subject that is scifi and doesn't look anything like real life?

normal ember
#

Do you think that the clip vision model or something like it could be used to "caption" images during training?

gloomy prairie
#

Is this red and green noise a common occurrence? 🤨

quiet eagle
#

What's the current state of fine-tuning sdxl with 4090 - Lora in a few hours, full fine-tune not possible?

stone garden
#

so at least i can get to the train steps now but it keeps getting killed with SIGKILL 9 for some reason

#

only got to like 70 steps last time and it took like half an hour. not sure if this is normal speed or if the gpu isn't being used, it didn't make the fans spin like a jet about to take off

#

another debugging week i guess not sure if this is worth it

jade hornet
#

and run that command I posted above to make sure your bitsandbytes is good

stone garden
#

my version errors out

#

also i dont have rocm-smi?

jade hornet
#

that wont work, the pip will install the cuda version of it, you have to git pull and compile it yourself

stone garden
#

it says needs nvcc but thats nvidia right

#

python dependencies are a pain to work with lol

jade hornet
#

ignore that, read the readme

#

remember this was patched to work with rocm

#
git clone https://git.ecker.tech/mrq/bitsandbytes-rocm
cd bitsandbytes-rocm
make hip
CUDA_VERSION=gfx1030 python setup.py install # assumes you're using a 6XXX series card
python3 -m bitsandbytes # to validate it works
stone garden
#

cool thanks

#

make says no nvcc in path

jade hornet
#

did you edit the makefile to point to your rocm location?

#

make sure you have all the rocm packages

ruby pond
#

Is this 'done'? Will it just get worse from now? Or should I let it keep going?

jade hornet
#

never really decided based on a graph of loss, I always look at the quality of the sample images to figure out if it's getting better or worse

#

honestly, if you were putting in a prompt and getting an image out, how would you decide if it sucked? not a graph surely

ruby pond
#

It's still climbing

ruby pond
#

yeah it's done 😄 the output from the latest save is complete garbage, totally scrambled

#

good to have my gpu back after 3 days of training

opal jacinth
#

Hey, what are your experiences with using triggers while training your own lora for a person with SDXL? Because I've seen some tutorials where it states random triggers like "sks" are better, but at the same time some other tutorials mention that the person's name is just fine and random triggers aren't working that well

normal ember
ruby pond
normal ember
ruby pond
normal ember
#

tensorboard --logdir path

ruby pond
normal ember
#

What does 16 or earlier look like?

sullen locust
#

Hello, respected ones. I'm having some issues while training a lora in my cmd; it seems like my lora doesn't get prepared. I shared some screenshots, please check them. Thanks.

ruby pond
normal ember
#

Try even earlier

ruby pond
#

I tested all the saves

normal ember
#

Great!

ruby pond
#

I'm glad I got something out of it, since it was running for 3 days 😄

normal ember
#

I’m in the process of trying to see what impact the parameters and dataset have on the result. That’s why I’m curious.

#

Ran during the night but haven’t had a chance to look at the results yet.

#

I know I have to adjust the dataset though. But it’s fun to see that a specific feature gets into the LoRA.

#

I think it’s really good to have a good stable source when learning so you have the possibility to change the dataset and iterate.

ruby pond
#

This was the first photographic lora I trained with over 9000 images. I've done a few style loras with a few hundred images to train on that turned out pretty good.

gloomy prairie
#

Has anyone else encountered these horizontal artefacts and/or a solution to eliminate them? 🤔

marble zodiac
normal ember
#

Yes

marble zodiac
gloomy prairie
marble zodiac
#

it's not in your training data

#

these artifacts only happen when an image is decoded - so only when the RGB image is created. the encoding is not being influenced by it as far as I know. so your training / fine-tuning is fine

gloomy prairie
#

Oh right, okay 🤔

#

So nothing wrong with the checkpoint files, just to be clear? I just need to use a different VAE for the decode step?

gloomy prairie
marble zodiac
marble zodiac
gloomy prairie
#

Great. Thanks for explaining that! 🙌

marble zodiac
# gloomy prairie Great. Thanks for explaining that! 🙌

Sure. You're welcome. I'm not even training 😄 but I have troubleshot that a couple of times already for people. I use a separate VAE at all times. You can use it in a1111 and ComfyUI without a problem.

But if your model uses the SDXL 1.0 VAE and you distribute the model to Civit or any service and they are not aware of it, people will make images with lots of bad artifacting - which isn't good.

latent charm
#

Try to overfit an image into a lora.
Experiment setting:
Lora Type: LoRA-FA
Dim 128, Alpha 1
Learning rate 0.01
Text encoder rate 0
Repeat_600, batch 10, epoch 10

Image 1: loss graph
Image 2: training set
Image 3: reproduce image

#

The final loss is around 0.00367

#

Using the final epoch to reproduce the image with only the class token in positive and no negative. The reproduced image is slightly darker than the original.

versed crescent
#

Is the ideal graph shape for loss when training to have it slowly drop in a linear way?

snow pawn
#

Hello Community
Could anyone suggest how to train a finetuned model of Indian Hindu Gods?

latent charm
versed crescent
tall condor
#

with that LR in the end your model will not be able to produce anything else than your image

#

it will completely overfit even if you dont even call the concept

versed crescent
#

Ah so that means LR is only an indication of overfitting?

latent charm
#

Maybe it is because the training didn't train the text encoder? It seems it only affected the class token to reproduce the image

tall condor
#

ah ok in this case its a lil different

glass sorrel
#

Hello Community!
How can we fine tune on the Indian Hindu gods? Could anyone suggest how to achieve this?

Thank you

versed crescent
#

Is there any document or video that discusses the various different types of LoRA that people experiment with in SD ?

versed crescent
hollow spruce
#

use google or bing translate for it

#

not too much info on the types themselves, but a lot of info about everything that the types enable you to do - so in theory very helpful

versed crescent
#

Ok I'll see if I can get some understanding from them. Whenever I dig into this stuff I end up with a thousand tabs 😄

hollow spruce
#

that site should cover about 80% of what you need to know about all the different settings of training (and when you need a non LoRA training type)

versed crescent
#

I'm still persevering with your recommended LoRA training setup, and it's fascinating to see everyone's choices as more and more tutorials/videos get published

hollow spruce
#

last 20% are trial and error, as sdxl is quite different from SD1.5 in terms of anecdotal information, which there is a lot of online

versed crescent
#

yeah, I am constantly looking to see if I've accidentally stumbled into an older v1.5 guide

#

I had my first training session crap out this evening with a NaN for the loss, and black sample images. I'm glad I caught that before heading to bed

ruby pond
#

there is a model that has 0.9vae in the filename. use that

#

It's the 1.0 model checkpoint, with the 0.9 vae baked in

hollow spruce
restive bridge
#

I started getting black sample outputs during XL lora training, but the loss isn't NaN and the checkpoints work as expected 🤔 guessing it's something to do with switching mixed precision to fp16 from bf16 and disabling full bf16 training. I did those things hoping for better quality, should it help or should I stick to bf16?

ruby pond
restive bridge
versed crescent
#

I find it odd that in some tutorials people seem to be adjusting Network Alpha like it’s the same as Network Rank. From what I understand, it’s more like a multiplier on the values stored in the LORA?

#

It’s not like Rank and Alpha are the x and y dimensions of some tensor. Maybe I have it wrong?

#

Ah no I’m not wrong, this is from Caith’s recommended link from yesterday.

versed crescent
#

@hollow spruce Hey I’d appreciate a hand with prompting and workflows for trying my freshly trained LORA. I’ve tried some basic Comfy workflow, but as the strength of the LORA goes up, it pulls the resulting image straight into looking like one of the training images, regardless of prompt. I don’t know if this is an issue with my lower image count while training, or if it’s overfitting.

Looking at the checkpoint images made during training, the likeness converges in a nice linear way across the 300 epochs, and it looks correct after 250, so maybe I am not using positive and negative prompts correctly? Or style prompts? Because Comfy is so much of a blank slate I don’t know if I’m just approaching this in too simplistic a way.

dusky apex
dusky apex
dusky apex
#

Here is my current command, just in case any of you spot something strange. On a 3090, target is SDXL.

.\accelerate launch
--num_cpu_threads_per_process=2 "C:\Users\daf\automatic\kohya_ss\sdxl_train_network.py"
--enable_bucket
--min_bucket_reso=512
--max_bucket_reso=2048
--pretrained_model_name_or_path="C:/Users/daf/automatic/models/Stable-diffusion/sdvn6Realxl_detailface.safetensors"
--train_data_dir="blah"
--resolution="1024,1024"
--output_dir="blah"
--logging_dir="blah"
--network_alpha="1"
--training_comment=trigger=blah_0.4
--save_model_as=safetensors
--network_module=networks.lora
--network_args rank_dropout="0.15" module_dropout="0.15"
--text_encoder_lr=0.0005
--unet_lr=0.0005
--network_dim=24
--output_name="blah_v0_4"
--lr_scheduler_num_cycles="70"
--scale_weight_norms="1"
--no_half_vae
--network_dropout="0.2"
--full_fp16
--learning_rate="0.0005"
--lr_scheduler="constant_with_warmup"
--lr_warmup_steps="166"
--train_batch_size="2"
--max_train_steps="3325"
--save_every_n_epochs="10"
--mixed_precision="fp16"
--save_precision="fp16"
--seed="1234"
--caption_extension=".txt"
--cache_latents
--cache_latents_to_disk
--optimizer_type="AdamW8bit"
--max_train_epochs=70
--max_data_loader_n_workers="0"
--bucket_reso_steps=64
--mem_eff_attn
--gradient_checkpointing
--xformers
--bucket_no_upscale
--noise_offset=0.0
--sample_sampler=k_dpm_2_a
--sample_prompts="blah\prompt.txt"
--sample_every_n_epochs="4"

versed crescent
dusky apex
#

Can you share your training command please? On my side my subject is already recognizable at the first epoch preview, there is few difference between the first preview and the last one.

dusky apex
#

Other question: how do Windows users display their training log in TensorBoard? I saw that Google Colab provides a board but I don't know how to load my log files there.

versed crescent
dusky apex
#

All right. Thank you for the link, I'll follow this guide. Regarding the webUI, I'll have to find out why I get python errors when using the "Train Model" button from kohya_ss UI. I spent hours trying to understand what was wrong. I don't have the console right now but the main problem was a bad interpretation of the generated training command, for instance the console said "resolution is mandatory" while it was correctly specified in the UI. I reinstalled the Bmaltais kohya_ss UI twice with no improvement. Thank you again.

#

Ah, last question (I hope), what should I use as inference model for my woman face training? I didn't understand if the base sdxl is better than specific models from civitai. Another tutorial maker wrote somewhere that the base SDXL model was too wide for that.

covert pagoda
#

anybody know how to pass model metadata such as model output name and group name on kohya to wandb using args? do i need to go into WandBtracker Class to set parser?

versed crescent
sonic mantle
#

if i want to set different text and unet learning rates, what should i be inputting in the red box? seems like you have to put a value of some kind in there.

stone garden
sonic mantle
#

sweet, thanks

restive bridge
#

does Memory efficient Attention, Gradient checkpointing, Xformers, or Full bf16 noticeably lower quality at all?

It's really hard to figure out which parameters were made for low vram cards vs. which ones are standard optimizations that everyone should use (like xformers for inference)

dusky apex
sonic mantle
#

is DPM++ SDE Karras not available when training?

dusky apex
#

If I launch the same command from powershell, it works.

sonic mantle
#

If i'm training with a mix of these resolutions, do i need to enable bucketing? or is bucketing only necessary for resizing?

restive bridge
#

Increasing batch size absolutely wrecks fine details on faces. Time to try GA instead

sonic mantle
versed crescent
#

@sonic mantle I think they divide the rank by the alpha to determine a maximum strength of learning. Either its the max value that the lora stores, or a multiplier to the learning rate. Either way having it higher just makes your lora 'weaker' so to speak. divided by a larger number

sonic mantle
versed crescent
sonic mantle
#

that doesn't sound like an optimal setting then

versed crescent
#

no, and if you do not specify an alpha specifically, it defaults to 1, so I don't know. Looking through the code to see if I can spot anything

#

Oh sorry that's not true:

  if network_alpha is None:
    network_alpha = network_dim

so it looks like it defaults to the dimension size, thus making the training modifier 1 which will do nothing

sonic mantle
#

im starting to think dim 128, alpha 128 was the better option then

versed crescent
#

yeah, that would effectively do nothing, so you could safely play with other settings

restive bridge
#

lowering learning rate from 4e-4 to 3e-4 is having no effect at all on loss. same with 1e-4. interesting that people use these graphs to determine "overfitting" yet i can clearly over or underfit this training without changing the graph whatsoever.
misinfo abounds

stiff dust
#

I would keep alpha low to give the model more time to adapt and learn 🤷‍♂️ but that also depends on your training data

dusky apex
#

Hello, thanks to your help I could cook a working face LORA tonight! 🐸

#

Now it's time to fine tune the details. I am noticing that the face is correctly used in close-ups or portraits, but as soon as the character is full body or half body, the face is not used at all. My 50 training images are almost all tightly framed around the face, but the captions don't mention close-up or anything regarding the framing. Do I need to change something: add images, change captions?

stiff dust
#

you should have mid range images in your training data

#

changing caption alone won't help

#

having 10 images of your face that all have same angle and distance is usually useless

#

rather use less images but from different perspectives

stone garden
#

Hi, I found that the BNK CLIP text encoder (the suspended node) hits a tensor problem if the prompt is too long and complex. So I tested how I could replace it with another one. I found another SDXL-compatible node that accepts long prompts, but as a point of interest I tried encoding the L and G prompts with separate non-SDXL encoders, and then used nodes to concat/average/combine the encoder outputs. Combining them was the worst, but all of them are useful. I think I like the average node best because of its strength settings. Is this the right way to replace the SDXL CLIP encoders with 2 separate 1.5-compatible ones? Reviews or opinions welcome. (image contains workflow data):

dusky apex
#

it is not upscale, but native resolution

sonic mantle
#

which parameter is failing when your Lora data doesn't integrate well with the rest of the model? I trained a Lora on mostly closeup shots of keira knightley's face. But if i prompt something like standing on a ship in pirate clothes it will only generate a closeup of the face.
Does this mean the network rank was too low?

stiff dust
#

network rank is too high

#

and probably learnt for too long

sonic mantle
stiff dust
#

it's very high

#

some people use 4 or 8

#

For faces I would start at 16 or 24

sonic mantle
stiff dust
#

and increase only if quality is not good enough

dusky apex
sonic mantle
stiff dust
#

it's different. If the model is not able to generalize (e.g. change clothing of the character) it is overfitted. In this case reduce rank and/or learning rate

dusky apex
#

I just added an interesting thing, if you increase the resolution of your generation, you should be able to set more distance with the face (see my example above)

sonic mantle
stiff dust
#

but the model will never be good at showing the character from an angle or perspective it was never trained on

#

to be honest, I wouldn't train the text encoder at all

#

or just train it for one epoch but not more

#

text encoder overfits much faster than unet

dusky apex
#

what is the text encoder? the auto caption tool?

#

In my case I used it, then I cleaned 50% of the data to be compliant with Caith's tutorial

sonic mantle
dusky apex
#

ok

sonic mantle
#

but if you're just lora training someone's face there probably isn't much for it to learn

stiff dust
sonic mantle
#

oh damn i didn't know that

stiff dust
#

the text encoder is frozen

#

training it makes sense if your captions contain something that is unknown to the text encoder

dusky apex
#

I was wondering if I can improve my model by adding new pics with emotions : sad, smile, pensive etc.

stiff dust
#

a name would be a good example. It can make sense to train the text encoder on a new name it doesn't know yet. But it overfits very quickly

stiff dust
dusky apex
#

I will

#

I'm really pleased to see that SDXL generates perfect images when setting +25% resolution on some checkpoints.

#

Some others don't appreciate

#

Quick question : is it possible and easy to install and plug a standalone comfyui? The one embedded in sd.next is broken.

stiff dust
#

not more difficult than installing sd.next 🤷‍♂️

#

it has no builtin venv, though

restive bridge
#

I recently got my best quality and flexibility when using a very low LR. Captions (wd14 tagging with human characteristics pruned) reduced the artifacts on clothing SIGNIFICANTLY, but it also hurt likeness quite a bit and didnt converge. I dropped the captions and got my best likeness and quality ever, but flexibility is lower again with artifacts on clothing.
Still better than anything I was able to achieve when using celebrity name token. And scales across different ppl and img counts

covert pagoda
# stiff dust For faces I would start at 16 or 24

Could you briefly convey your understanding of network alpha? I’ve heard differing opinions and never anything conclusive about what it actually does, nor have I ever seen experiments with useful results with different alphas. Is it useful at all?

latent charm
#

In lora-fa epoch 10, if I put peace sign in G and L, it provides more likeness to original

glass plank
#

What does something like "2e-1" refer to when someone speaks of making a lora?

mental hatch
#

lora-fa vs locon: locon is superior for styles, which is what I train.

#

all settings were exactly the same just switch to fa vs locon.

stiff dust
stiff dust
stiff dust
# covert pagoda Could you briefly convey your understanding of network alpha. I’ve heard differi...

no, just keep it at 1.
The alpha is a scaling factor on the strength of your lora. alpha=1 means your lora is multiplied by 1/dim. An alpha equal to dim means your network is multiplied by 1 (thus, nothing happens).

The reason for alpha is that a network with high dim learns faster and gains more strength in a shorter time. When people experiment with Lora they often change just a single parameter between their workflows. So they change dim from, say, 32 to 128 and find that after epoch 10 the image looks much better with the higher dim. However, dim 32 might look equally good if you trained it until epoch 20. It just looks worse because it trains slower.
alpha counters this effect to some degree by reducing the speed of training for high dim loras. So using alpha=1 just means your loras with different dims are more comparable
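
To put numbers on the alpha/dim relationship described above, here is a tiny sketch. `lora_scale` is a made-up helper name; the default of alpha falling back to dim follows the snippet quoted earlier in the chat, and the alpha/dim formula is the scaling as described in this message, not a guarantee about any particular trainer version.

```python
def lora_scale(network_dim, network_alpha=None):
    """Multiplier applied to the LoRA output: alpha / dim.

    If alpha is not given it falls back to dim, which gives a scale of 1.
    """
    if network_alpha is None:
        network_alpha = network_dim
    return network_alpha / network_dim


if __name__ == "__main__":
    for dim, alpha in [(32, 1), (128, 1), (128, 128), (24, None)]:
        print(f"dim={dim} alpha={alpha} -> scale={lora_scale(dim, alpha):.5f}")
```

With alpha fixed at 1, going from dim 32 to dim 128 cuts the multiplier by 4x, which is the "slowing down" of high dim loras mentioned above.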

normal ember
#

How should one think about the network rank? Complex dataset larger network?

stiff dust
#

yeah, I would say so

#

Note that in large language models people tend to use Loras of rank 1 ^^ So you can learn a lot even with low ranks

#

In general Lora is based on a compression technique, so lower rank means higher compression means more compression artefacts. I found that the unet is a bit more sensitive to these artefacts. So using a too low rank lora for the unet is usually a bad idea. I would at least use rank 8 or 12 for the unet, maybe even higher. But you can try. You will see if artefacts appear.
For the text encoder you can use rank 1 or 2 and thats already enough. However, for some reason most scripts don't offer an easy possibility to use different ranks for unet or text encoder
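
A quick way to see the "compression" side of rank: a LoRA replaces the update to a d_out x d_in weight matrix with two thin matrices of shape d_out x r and r x d_in, so the number of stored values grows linearly with the rank r. The helper names and the 1280 x 1280 layer size below are assumptions picked only for illustration.

```python
def full_params(d_in, d_out):
    """Values in a full update for one d_out x d_in weight matrix."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Values in the two low-rank factors (d_out x rank and rank x d_in)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in = d_out = 1280  # hypothetical layer size
    for rank in (1, 8, 24, 128):
        ratio = full_params(d_in, d_out) / lora_params(d_in, d_out, rank)
        print(f"rank {rank:>3}: {lora_params(d_in, d_out, rank):>7} values, "
              f"{ratio:.0f}x smaller than the full matrix")
```

The lower the rank, the harder the squeeze, which is where the compression artefacts mentioned above come from.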

normal ember
#

Other than getting a large lora, what are the disadvantages of going for too high a network rank?

latent charm
#

I heard from @hollow spruce. Too high network rank would damage the model.

stiff dust
#

yeah, you want the lora to be parameter efficient, i.e. only change the base model as little as possible

#

I'm not sure if it's really just the rank or more a combination of rank and alpha and learning rate, but you just don't want your Lora to overfit and "damage" the model

#

"damage" means: usually you train your model on a large variety of images. Training it only on a single subject damages the model, because it starts forgetting what it had learned before

latent charm
#

I dont know what does the "damage" mean

#

Thanks for explanation

restive bridge
#

Olivia showed an example on Twitter of 256 dim damaging the background of an image vs. 24 dim.

But other factors like LR could've affected that difference too. that was hardly a scientific test.

stiff dust
#

that's what I meant

#

just from mathematical viewpoint I would say it's a combination of LR and dim that causes the effect

restive bridge
#

yes. maybe scheduler too

stiff dust
#

so a too high DIM should not necessarily be a problem if you keep your learning rate low enough. But why would you want to do that?

stiff dust
#

I don't think that you need high dim Lora for that

restive bridge
#

idk my config won't make perfectly circular and symmetrical eyes at any LR or epoch except 1e-4 and lower

normal ember
#

If you decrease the network rank do you also increase the learning rate? Or is that only true if you fiddle with the alpha?

stiff dust
#

for perfect iris I found it helpful to add very few cropped super high res images to the training data

#

but in general problems like asymmetric eyes and so on are problems of SDXL itself

#

if you fix that with your lora then it's probably just because your lora is memorizing your training images

#

which is usually a sign of overfitting

restive bridge
stiff dust
#

in theory, the better strategy would be to train on a large variety of photos of different people to train SDXL to make perfect eyes and THEN train on your face

latent charm
#

I feel SDXL is undertrained in many aspects, like eyes

restive bridge
stiff dust
#

hm, that's strange. My experience so far is that a lora on my face is sometimes making things wrong, but it is doing so at the same rate as SDXL is doing anatomy or eyes wrong on other people

normal ember
#

There seems to be so many parameters affecting the lr one way or the other so it's hard to adjust accordingly.

restive bridge
#

Unfortunately for my use-case I have to assume the worst case scenario of 12 low quality images. I can't control what goes in, but need to guarantee it comes out good. which is easy until a high quality dataset is neutered by my "safe" config

hollow spruce
#

yeah. damage was probably the wrong word to use.
Basically when you have a huge 256dim lora, and don't train it on a metric ton of images, then you'll quickly see things like details being forgotten/replaced. First the backgrounds stop having any detail, and literally fade into these weird contrastless messes. Then colors become more flat, as gradients stop showing up, and this continues as most "detail work" stops existing.

See how... undetailed secourses images look?

normal ember
#

Like you have created a way too big network in front of the base model

#

I wonder how much one should decrease the lr if you go lower on the network.

hollow spruce
#

less about the learning rate, more a matter of how much new information you're actually inputting. how many new images are you working with?

#

like for 4k images, dim32/dim64 give essentially the same results for me.

#

so I find it hard to see genuine usecases where you "need" 128/256dim loras

restive bridge
#

I believe it but I need to see it A/B on the same seed and more levels than max and min, and whether details of the face are improving or not when close-up (surely 64 is nothing like 256). I'll do this test soon enough

#

cuz SEcourses outputs/prompts are nothing like mine

hollow spruce
#

if you go away from photorealistic, and into pure artwork, then high dims might work to some extent, since you dont care about the photo capabilities being forgotten

restive bridge
#

hmm the last time I went low I lost skin details and it felt more artistic

normal ember
#

I'm training for color palettes, grain and such

#

dof

hollow spruce
#

do rarer ethnicities, and it gets a lot harder to replicate his results

latent charm
#

I have a photo dataset and I cropped the face and save as another folder in the dataset. Would it help to increase the likeness?

restive bridge
#

my first test is always a bald Arab (our ceo). he was impossible to train on 2.1, XL is already infinitely better but still doesn't like his face

restive bridge
normal ember
#

do you caption him with arabic?

restive bridge
#

nope

#

that'd be counter productive

#

(captioning a black person as black makes them white in outputs)

normal ember
#

Hmm, then how will the model know what nationality you want to train for an example?

restive bridge
#

you could mention it in the prompt

restive bridge
#

or even use celeb with same ethnicity as token

stiff dust
#

my own name "Kai" for example is a typical German name, but in SDXL it's strongly associated with Japan. That's why I use the name "Christian" instead when training on my face

normal ember
#

Is this only specific to LoRA training or other types of training too?

restive bridge
#

but if there are any Christians weighted heavily in XL dataset then you still have a problem. hence random token for stability, but hard to scale

stiff dust
normal ember
#

How do they train it for different nationalities? Just an example could be whatever concept.

stiff dust
#

I would go for a combination of a common first name and an uncommon surname

normal ember
#

Find it unnatural to train it on names. 😄

restive bridge
#

the nationality should come from the input images, not the captioning or token. it should already know if the person is Asian based on how they look in training

normal ember
#

They must have trained it somehow

#

What's the purpose of the caption?

stiff dust
#

what dal wanna say is that the model should learn to associate your appearance (e.g. skin color) with your name

normal ember
#

Sorry for asking so many stupid questions. I just want to understand the basics good enough.

restive bridge
#

to specify what about the images you want trained into the token. captions remind it what normal things are in the image so not all of it is trained

#

so you should only mention things you DONT want the token to remember about the images

normal ember
#

Ok, so then if I want to adjust the DoF, film grain, colours and such I should probably just caption what's happening in the image?

restive bridge
#

Yes caption everything except for those effects

normal ember
#

Like "a dry, barren landscape with a fence and hills in the background"

restive bridge
#

But XL TE training seems to be a fickle bitch and I have found captions to hurt face training if not utterly perfect

#

no idea for styles

normal ember
#

fence might be in the foreground but... 😄

#

What I've found is that the LoRA seems to activate things from the same era as the items in the dataset, even when those things are not in the dataset.

#

For example cars get picked from the correct era even though they're not the same make, color and such.

restive bridge
#

a caption like "modern car" should fix that. specifying the "age" in captions should keep it out of training

normal ember
#

For style I like it though 😄

stiff dust
#

as said, text encoder overfits extremely fast. If you train it, train it for very short time

#

the unet should be less sensitive to these things

latent charm
normal ember
#

Yeah, I've done that. I will try with --network_train_unet_only next.

restive bridge
hollow spruce
stiff dust
#

I usually train text encoder first with dim=1 for only one or two epochs with low learning rate and then continue training with unet only and higher dim. But that's a bit complicated as kohya does not have commandline options for that

normal ember
#

Not sure if kohya_ss is able to train text only for an epoch.

stiff dust
#

but I think there is a commandline option that allows you to stop text encoder training after certain amount of steps

restive bridge
normal ember
#

I'm searching for it in the options but can't find it.

latent charm
restive bridge
#

I think it was a setting and was dropped due to problems

hollow spruce
#

I usually spend a weekend to increase it by 1~3k images. manually edited & cropped, then manually tagged

#

once I hit around 10k it would probably make sense to train my own blip from it 🤣

restive bridge
#

I knew it was there! right in kohya

hollow spruce
#

simply increasing it isn't hard - as I can just download photoshoots. But getting good diverse images, that look nothing alike is the high effort part. Usually via flickr where I filter out 90% of all images

normal ember
hollow spruce
#

currently I'm still at 50% truly random images that I chose from flickr. 50% from photoshoots, for that super high detail

latent charm
#

Thanks for sharing

restive bridge
#

🙄

latent charm
#

yeah, I also tried.🤣

stiff dust
#

yeah, kohya is sometimes a bit inflexible... I made a lot of changes to the code myself. For subject training it is usually sufficient to only train the cross attention layers. Text encoder training can be helpful if it's done with low dim (e.g. rank 1) and for a short time. Together it's totally possible to make a Lora with filesize <50mb

normal ember
#

It's a bit confusing but there's kohya-ss and kohya_ss. I guess the first is the most upstream since kohya_ss seems to be merging with kohya-ss.

versed crescent
#

bmaltais/kohya_ss is the GUI/web version. kohya-ss/sd-scripts is the original command-line version that the GUI uses

#

kohya-ss/sd-scripts is the original source of training scripts

normal ember
#

I use the scripts in kohya_ss for SDXL

#

And those do not seem to have an equivalent in kohya-ss/sd-scripts no?

stiff dust
#

kohya-ss/sd-scripts is the original implementation

normal ember
#

will it handle sdxl or are modifications needed?

stiff dust
#

you have to checkout the sdxl branch

normal ember
#

Thanks! 👨‍🦯

quiet eagle
#

I'm oom-ing with pretty default settings (e.g. preset SDXL - adafactor 1.0) on a 4090 with kohya_ss. Is there any basic SDXL dataset and settings I can use to figure out if the problem is with my GPU or something else?

#

in this case I CPU Ram OOM-ed after 174 steps (32gb), with other settings I GPU Ram OOM

normal ember
#

What's your batch size?

quiet eagle
#

1

normal ember
#

And you also set these? cache_latents_to_disk gradient_checkpointing xformers

quiet eagle
#

yeah I have those. Currently working with some of the other random presets

#

tho on all of the ones that run I only get 2.5it/s with a 4090 while others seem to report that and more with weaker GPUs so it still feels like something is wrong

#

but at least I can test from here if it works for a full epoch and adjust

normal ember
#

I think bf16 will help too

stiff dust
#

I would definitely use bf16. I would also try AdamW instead of Adafactor

quiet eagle
#

yeh I use bf16 but havent tried adam

#

managed to run 4 epochs and the lora does seem to be more or less working

safe cobalt
#

Hi, I'm looking for some example sets of images and captions for SDXL finetuning (not LoRA), just for starting off learning and testing. Does anyone know if there are any sets out there, paid or otherwise? Does anyone know if SD has made available any of the sets they used as part of the SDXL training?

safe cobalt
#

This will be my first finetune I try on my 3090 using the kohya GUI -> finetune.

next tapir
#

Hi - I'm training an SDXL LoRa to replicate the style of a lithograph artist. However, after 2400 steps I got this as a result. It's got the right vibe, but the actual "style" of the photograph didn't seem to transfer. Instead, a very soft, painterly style transferred over. I started to see the consistent style forming at 1200 steps, but it always stayed "oil/painterly" and never inherited the pencil/etched look. Would this be indicative that I need more steps / training? Or perhaps I've overtrained? This is being done with Kohya GUI, and 24 sample images at 100 repeats.

mental hatch
next tapir
mental hatch
#

XL is a new beast. Do not use unique keywords.

next tapir
#

Oh, really?

mental hatch
#

I was having issues until I saw a vid that said that but my findings were already showing to not use uniques

#

With two TEs they fight each other so only train the unet

next tapir
#

Should I just generate captions for each image instead?
And for style training, should my captions be more related to the content of the image or the styles that most closely match it?

mental hatch
#

I gotta tell you about XL. You know what captions are all about, right? TE. I tested this, and caption or no caption, captions are worthless unless you train the TE. When I did train it, the results were instantly different depending on whether I used captions or not. I no longer screw with captions UNLESS I am daring to screw with the TE.

restive bridge
#

I feel it is important to mention that repeats and epochs are interchangeable when it comes to total step count, but
if you are using regularization, use high repeats on the img folder.
if you use 1 repeat, only a few of your reg images will be used (the same amount as your training images). if you got a big reg folder from SEcourses or Aitrepreneur patreons or elsewhere, use the maximum number of repeats you can before (training imgs * repeats > reg imgs). you want a unique reg image per training image repeat, so you want at least as many reg images as (img count * repeats).

repeat_1 may be the easiest way to calculate max steps but it basically nullifies your regularization
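
The rule of thumb above can be written out as arithmetic. This is only a sketch, and it assumes the pairing behaviour described in the message (one regularization image consumed per training-image repeat); `max_repeats` and `reg_images_used` are hypothetical helper names:

```python
def max_repeats(num_train_images: int, num_reg_images: int) -> int:
    """Largest repeat count with train_images * repeats <= reg_images."""
    if num_train_images <= 0:
        raise ValueError("need at least one training image")
    return max(1, num_reg_images // num_train_images)


def reg_images_used(num_train_images: int, repeats: int, num_reg_images: int) -> int:
    """Reg images drawn per epoch, assuming one reg image per training repeat."""
    return min(num_train_images * repeats, num_reg_images)


print(max_repeats(18, 600))         # 33
print(reg_images_used(18, 1, 600))  # 18
```

With 18 training images and 600 reg images, 1 repeat would touch only 18 of the 600, while 33 repeats would touch 594 of them.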

mental hatch
#

agreed.

#

Though a lot of model makers have ditched repeats for more epochs as they say it makes better, more refined, models.

#

I hate dealing with reg images it is a pain in the ass.

normal ember
mental hatch
normal ember
#

I wonder what Ejektaflex's class token should be if he's skipping captioning. Or can you skip that too somehow?

mental hatch
#

my token is a known style to XL. For instance, my released locon has the keyword Cartoon. The fun one I just did as I learn how to do people was segal (for steven segal).

#

there will be a v2 of segal as I am not satisfied with it but I lacked images mainly

#

for a class token tbh I don't use them as I don't regularize a style as I want it to over take everything

normal ember
#

Can you have multi word tokens?

mental hatch
#

yes

#

steven segal man. That is the activation word(s) + class of man

normal ember
#

did you have only "steven" or "steven man"?

#

I wonder if some stuff is way more heavily trained in the base model and hard to change, or if it doesn't matter much as long as you have a match and don't have to train the text encoder.

opal jacinth
restive bridge
# opal jacinth do you mind sharing your json config for lora training <@338908442603290625> ?

I won't have the config on me for a couple days but it's something like
18 imgs, 1e-4, 20 repeats, 5 epochs, batch size 3, adamw, constant w/ warmup, "ohwx man", bf16, dim 64, 600 real reg photos, no captions.

That's just the current point in a perpetually evolving recipe. It's not something I'd recommend to anyone. It worked well on just one test, I have no idea how it performs at scale yet. not efficient either: for a 24gb gpu there's a lot of vram headroom, but raising batch size seemed to hurt the quality, or maybe that's cuz I didn't and can't evenly divide the image count by the batch size, which is the correct way.
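
For anyone converting a recipe like this into a step count, a rough sketch of the usual arithmetic follows. It assumes steps per epoch are `images * repeats / batch_size`, rounded up, and it ignores regularization images, which may change the count depending on the trainer; `total_steps` is a hypothetical helper name:

```python
import math


def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps, assuming the last partial batch of each epoch still counts."""
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs


print(total_steps(images=18, repeats=20, epochs=5, batch_size=3))  # 600
```

Swapping repeats and epochs (5 repeats, 20 epochs) gives the same total under these assumptions.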

opal jacinth
restive bridge
versed crescent
restive bridge
# versed crescent What workflow are you using to confirm whether the lora is well trained? the sam...

lately I start with a basic photoshoot prompt and check for likeness first. if it passes I try a vintage prompt. if over fitted it will often fail to put the person in black and white, in which case I roll back an epoch til it works right. if likeness is still there at that point I move to a heavily stylized prompt that forces an intricate outfit and environment. it will either pull likeness away, or have a ton of artifacts everywhere, or by some miracle will work good. at which point I'd try a couple more prompts and train again on different images which rarely ends well and the process restarts

ruby pond
#

anyone got any tips for automatically capturing still frames from videos with minimal motion blur?
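
One common heuristic for this is to score each frame by the variance of its Laplacian (motion-blurred frames have fewer sharp edges and tend to score lower) and keep the best-scoring frame in each window. The sketch below is dependency-free and works on grayscale frames given as 2D lists of numbers; a real pipeline would more likely decode the video and compute the same score with OpenCV (`cv2.Laplacian(gray, cv2.CV_64F).var()`). The function names and the window approach are illustrative assumptions:

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D list of grayscale values."""
    h, w = len(gray), len(gray[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4 * gray[y][x])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sharpest_per_window(frames, window):
    """Index of the highest-scoring frame in each run of `window` frames."""
    picks = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        scores = [laplacian_variance(f) for f in chunk]
        picks.append(start + scores.index(max(scores)))
    return picks
```

Frames must be at least 3x3 for the score to be defined. A window of roughly one second of footage keeps one still per second.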

jade hornet
#

That topic belongs in an animation channel

signal warren
#

Not necessarily, he probably wants to capture stills to train

#

Personally I just capture the stills manually while watching, since I want to capture exactly what I want.

#

So I can't help much.

ruby pond
#

yeah training an analog film lora requires stills from analog films 😄

safe cobalt
#

Is there an advantage to using both captions and tags when fine tuning the SDXL base model, or just one or the other?

covert pagoda
#

anyone know the cause of this error on kohya_ss: HFValidationError: Repo id must be in the form 'repo_name' or
'namespace/repo_name':
'/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safete
nsors
'. Use repo_type argument if needed.
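
Judging only from the message as quoted, the model path ends with a line break before the closing quote. A string like that will not match any file on disk, and a loader that accepts either a local path or a Hugging Face repo id may then try to validate it as a repo id, which would produce this kind of error. That is a guess, not a confirmed diagnosis. A small sketch for checking a pasted path, with hypothetical helper names:

```python
from pathlib import Path


def clean_model_path(raw: str) -> str:
    """Strip stray whitespace/newlines and surrounding quotes from a pasted path."""
    return raw.strip().strip("\"'").strip()


def describe(raw: str) -> str:
    """Explain what looks wrong with a model path, if anything."""
    cleaned = clean_model_path(raw)
    if cleaned != raw:
        return f"path had stray characters; cleaned to {cleaned!r}"
    if not Path(cleaned).is_file():
        return f"no file at {cleaned!r}; a loader may then treat it as a Hub repo id"
    return "path looks fine"
```

If `describe` reports stray characters, re-entering the path in the source model field without the trailing newline is the first thing to try.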

gilded kindle
#

Hello! My understanding is that SDXL was trained on images with a variety of aspect ratios. However, most of the Python training scripts I’ve seen involve reshaping the image to a set resolution across the train/val splits. What’s the best resource for training a SD model on varying aspect ratios?
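
The usual answer to this is aspect ratio bucketing: images are grouped into a fixed set of resolutions with roughly equal pixel area, and each batch is drawn from a single bucket (sd-scripts exposes this through `--enable_bucket` and related options). The sketch below only illustrates the idea and is not any trainer's exact algorithm; the function names and default limits are assumptions:

```python
def make_buckets(max_area=1024 * 1024, min_side=512, max_side=2048, step=64):
    """Enumerate (width, height) pairs on a `step` grid whose area fits max_area."""
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest height on the grid that keeps width * height within max_area.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))
    return sorted(buckets)


def pick_bucket(width, height, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))
```

Each image is then resized and cropped to its bucket's resolution, so a 1920x1080 photo keeps a wide shape instead of being squashed into a square.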

gilded kindle
covert pagoda
gilded kindle
covert pagoda
#

I’m running on kohya_ss on runpod. The model in question is local to the volume @gilded kindle

#

It's the source model tab

#

Under LoRA training section

#

here's the full error

#

and the cli command: accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
" --train_data_dir="/workspace/S0r4_train/img" --resolution="512,650" --output_dir="/workspace/stable-diffusion-webui/models/Lora" --logging_dir="/workspace/S0r4_train/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=128 --output_name="S0r4_10-01_p1" --lr_scheduler_num_cycles="12" --no_half_vae --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="6960" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --keep_tokens="1" --bucket_reso_steps=64 --mem_eff_attn --shuffle_caption --gradient_checkpointing --xformers --bucket_no_upscale --noise_offset=0.05 --wandb_api_key="9328358809ad058d08c0f5e53cfc7f91f3d661b4" --sample_sampler=euler_a --sample_prompts="/workspace/stable-diffusion-webui/models/Lora/sample/prompt.txt" --sample_every_n_steps="270"

white imp
#

Hello, I'm trying to use Kohya to train a model of mine so I can switch over to using SDXL. Apparently you need to use an XL base model or something, so I'm told by my friend, for SDXL.

Anyways, I have a 6GB card, but when I try to run Kohya it just stops because CUDA runs out of memory. Is there any way in the files I can tell it my max GPU memory is 6GB?

balmy fable
#

So, gang, anyone have any thoughts on how I would train a SDXL LoRa on what a keytar is? I've done a few test runs, and thus far I've not been able to get it to really get the hang of it at all. At the moment I'm using ~30 images (a few of just keytars alone, the rest of people playing them), and no regularization images (not sure what one would even use for those), and captions along the lines of a woman on a stage playing a keytar, keyboard (instrument) and the results aren't any better than asking stock SDXL for a picture of someone playing a keytar, and might actually be worse. I've trained people before, but not objects like this. Anyone have any pointers on doing this kind of thing?

little patio
#

How do you deal with a size mismatch between the checkpoint/safetensor's file and the "current model"?

little patio
#

Please help. This obstacle is truly frustrating.

sinful rune
#

I find that the model I finetuned from base SD 1.5 works well for txt2img, but when I use it for img2img with ControlNet it doesn't work well, e.g. it generates blurry faces when I use softedge. Some models on Civitai can generate nice faces even with ControlNets like openpose or softedge. My model seems to be more seriously affected by ControlNet. Does anyone know why this happens?

covert pagoda
# stiff dust no, just keep it at 1. The alpha is a scaling factor on the strength of your lor...

Thank you so much for this. It’s really useful to put aside. Interestingly, someone (Robert Jene) suggested alpha is the scaling power of the captions, so the training listens more to the captions than the clip encoder. I’m not sure I remember this entirely correctly, but I gather this means how the text encoder is affected, which sort of makes sense because when the alpha is high, learning of concepts seems to go up but it also seems to damage the underlying model’s knowledge, giving bad anatomy or badly rendered colors (in the case of photos for instance). I don’t know if this is all wrong. Does it sound like there’s anything accurate in this depiction?

stiff dust
covert pagoda
# stiff dust no, that sounds wrong. alpha has nothing to do with captions, you could learn wi...

Ok that makes sense. I have a question that is somewhat related. I’ve noticed when testing a LoRA on webui that using some of the tokens from the training captions will suddenly reveal a very overfitted LoRA. So I imagine a good test for new LoRAs is to also run a prompt using some of the original captions to see if overfitting happens. Would this be a correct way to test overfitting?

stiff dust
#

yes. I would always use trigger words anyways, as this is the fastest way to train loras (except when you use caption dropout)

covert pagoda
#

No, I don’t mean solely the trigger word. I mean certain descriptive words in the caption… like a leather jacket and an accessory. Which upon use in the prompt immediately makes the gen look like one of the dataset images. Whereas the trigger on its own generalises the likeness of the character

stiff dust
#

yeah, if a word that occurs only in part of the training images has a strong effect on the lora, then this is a clear sign of overfitting

tepid sundial
#

In the cloneofsimo Lora repo there was a feature to add new tokens to the tokenizer and introduce new embeddings for them, optionally initialized from existing ones, as part of LoRA training. Are there any reasons this type of approach has fallen out of favour in other training scripts/repos/workflows? One can achieve this with, for example, sd-scripts too, as there is TI. But I'm just wondering if there's a good reason why it's not more often suggested as part of training style and character LoRAs.

#

We're seeing with SDXL that it is fairly common for people to only train the unet, and given that, it would seem that performing a TI before training a LoRA should be beneficial to unet training.

jade hornet
covert pagoda
dusky aurora
#

could someone help me with img2img upscaling ? Ive been trying to use this controlnet ultimate sd upscaling method and the results look like when I look into my trash can.

stiff dust
restive bridge
#

I wish I could set different learning rates between training images and regularization images. I just discovered that it's the regularization giving me clothing/environment artifacts. But dropping the reg wrecks the face quality.

ruby pond
restive bridge
quiet eagle
little patio
stiff dust
restive bridge
restive bridge
opal jacinth
restive bridge
opal jacinth
opal jacinth
restive bridge
latent charm
#

Does anyone know about the cropped training in the SDXL report? What is the benefit of this type of training? I am planning to create a tool that crops the training set based on a user prompt to create a subset for lora training

open merlin
#

Am training using the prodigy learning rate scheduler. Anyone understand it? It seems that learning rate changes over time depending on the ratio of norm and key norm. But I cant find what 'key norm' means and how it is different from norm.

rancid acorn
# dusky aurora could someone help me with img2img upscaling ? Ive been trying to use this contr...

It would be best to use ControlNET Tiles along with Ultimate SD Upscale to get coherent, good quality upscales.

This comment I made on Reddit might help you get better upscales:

https://www.reddit.com/r/StableDiffusion/comments/142qmea/an_realistic_high_resolution_photo_of_an_rocky/jn655ms/?utm_source=share&utm_medium=ios_app&utm_name=iossmf&context=3


glass sorrel
#

Hello community!
Could you please tell me how to fine-tune on many different kinds of images, rather than on a single type of image as with dreambooth?

Ex: the training set would contain images of different items (shirt, pants, shoes, watch, etc.), each with its respective prompt for training.

Could you please let me know how I can do this?

Thank you

normal ember
#

32 vs 128 network rank

gentle flame
west solstice
#

Hey all, still trying to wrap my head around regularization images. What exactly should they be? Are they meant to be a good example of the class or the literal model output of that class?

#

IE I’m training a “Steve face” Lora, my regularization images would be “Man face”

#

Do I use a dataset of high quality examples for the “Man face” class? Or should I hop in to A1111 and crank out 800 literal “Man face” images with no negative prompts, presumably using the checkpoint I am training with?

#

I’m getting even further confused by conflicting information online about the value / necessity of even having regularization images. There is a lot of misinformation going around

gentle flame
#

do not fiddle with them to get "good results"

west solstice
#

so lets say image 1 is - steve, riding bike, blue shirt. the regularization would be man, riding bike, blue shirt?

gentle flame
#

it would be
riding bike, blue shirt

#

Adding man is a good idea, but it should be a part of the regular dataset caption as well, or whatever the man face is on

west solstice
#

gotcha

gentle flame
#

regularization images help to prevent overfitting and limit what the model adds. You don't have to add them, but they can help if you're having overfit issues. I know of an example of a lora with and without reg images, but it's not sfw so can't share here. I can say that the regularization images helped make it more flexible though.

west solstice
#

Thanks so much! Is there a good place to learn these concepts from the engineering side? Much of the current youtube content feels a bit like rando content creators producing "guides" when they don't know what they are doing

gentle flame
#

I don't know any good guides or sites. I learned through discord. You could search here if you want to, but unfortunately there isn't a ton of info on regularization images since few people use them.

#

I also recommend using the sampler that you'll be training on for the reg images, which is usually DDIM.

#

oh yeah, forgot to mention this, but regularization images DO NOT have to match the name of the image dataset. They are not paired. You only have to make sure the reg caption file matches the reg image name.

#

and make sure they're the same resolution as what you'll be training on

latent charm
#

@hollow spruce Hello, I did an experiment with anatomy subsets for lora training. I used GroundingDINO to crop the original training images into face and hand subsets. A human selected the good images and added [face focus, hand focus] to each subset, keeping all the captions from wd14. The main dataset captions were generated by wd1.4, with an added prefix, and human reviewed. It somewhat improved hands in the generations, but I still think it is undertrained. All datasets are set to repeat 3, and I am trying to increase the subset repeats to 10 to see if it helps.

#

new lora vs old lora. The main dataset is the same. The old one was trained multiple times with different setting. The new one was trained once. Both consume around 16 training hours in a 3090.

#

The old lora's text encoder seems to be broken.

#

The new one seems to be undertrained

opal jacinth
normal ember
#

I think one can see tendencies for tiny bit of overfitting in the higher rank. Look at the missing tents in the 128. Prompt included campsite.

latent charm
#

Could we train embedding for SDXL?

stone garden
#

so is it normal that lora training takes over certain haircolors and styles and i cant get rid of them?

#

i cut out all faces without hair and train the lora with only faces lets see if it works

latent charm
#

It is usually because your captions left out all hair colors and styles, so the lora learned them as part of the subject

stone garden
#

or what you mean with caption

latent charm
#

Usually, you would have a .txt file paired with each image: 1.png should have 1.txt. The text file should contain a prompt describing the paired image. The "caption" means that .txt file
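
A quick way to check the pairing before training is to list images that have no same-named caption file. This is a minimal sketch; the extension list and the function name `missing_captions` are assumptions, so adjust them to your dataset:

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def missing_captions(folder, caption_ext=".txt"):
    """Return image files in `folder` that have no same-named caption file."""
    folder = Path(folder)
    return sorted(
        p.name for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and not p.with_suffix(caption_ext).exists()
    )
```

An empty list means every image has its caption; anything listed would be trained without a caption of its own.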

stone garden
#

anyways i trained it with only the cutout face without hair and it worked very well

#

i first cut out the faces with paint.net lasso tool

latent charm
stone garden
#

then i removed background with the a1111 extension (couldve maybe saved them as png to avoid that step) then in the text files i only described the face and now it works much better

latent charm
#

Multiple ways to achieve the same result. It is SD. It has no right and wrong.

stone garden
stone garden
#

i described them pretty good

#

except separating the prompt in the text caption with commas has an effect

#

My summed up theory: If you want to train a lora model just for the face of a person, cutting out only the face/head without the hair and body is the best/fastest method

latent charm
#

If you use reg images, it is fine

stone garden
#

I didnt use them yet 🤔

latent charm
#

You could run an experiment to test your theory

covert pagoda
#

@latent charm hey have you played with any more captioning scripts?

latent charm
#

I made a preprocessing tool which would crop images from dataset as a subset. After that, I would use wd14 to caption the images. WIP

covert pagoda
#

I do all my focus crops manually straight out of topaz crop/upscale

latent charm
#

I didn't do any upscale for the cropped image yet. But I deleted too small images.

#

I use GroundingDINO to auto-select the focus region from a provided prompt like face or hand, and keep the detections scoring > 0.5. After that I review the cropped images and remove some.
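
The filtering step described here (keep confident detections, drop crops that are too small) could look roughly like the sketch below. The detection format (a dict with `box` as `x0, y0, x1, y1` and a `score`) and the 256-pixel minimum are assumptions made for illustration, not GroundingDINO's actual output format or the settings used in the tool:

```python
def keep_detection(det, min_score=0.5, min_side=256):
    """True if a detection is confident enough and its crop is large enough."""
    x0, y0, x1, y1 = det["box"]
    return det["score"] > min_score and min(x1 - x0, y1 - y0) >= min_side


def filter_detections(detections, **kwargs):
    """Keep only the detections that pass `keep_detection`."""
    return [d for d in detections if keep_detection(d, **kwargs)]
```

The manual review pass mentioned above would still come after this, since a confident box can contain a badly posed hand or a partial face.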

stone garden
#

what's the best way to finetune with ~50k images?

latent charm
#

Sharing some selected results with this config.
The images were trained in two cycles.
The first one trained for 16 hours on a 3090.
The second one trained for 8 hours, and I reduced the text encoder learning rate to half of the original, which is 0.000025.
The dataset contains 3 folders: 3_face, 3_hand, 3_woman. Around 750 images in total.
The woman dataset contains the original selected photos of the person.
The face and hand datasets were auto-cropped from the woman dataset with my preprocessing tool.
After that, I removed small images and complicated images (especially in the hand dataset).
I tagged them with wd14 and deleted wrong tags, keeping all recognized tags.
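
Folder names like `3_face` follow the kohya convention of `<repeats>_<name>`, so the number of samples seen per epoch depends on both the image count and the prefix. A small sketch of that bookkeeping follows; the helper names are hypothetical and the per-folder image counts in the usage note are made up, since only the total (about 750) is given above:

```python
import re


def parse_folder(name):
    """Split a '<repeats>_<concept>' folder name into (repeats, concept)."""
    match = re.fullmatch(r"(\d+)_(.+)", name)
    if not match:
        raise ValueError(f"expected '<repeats>_<name>', got {name!r}")
    return int(match.group(1)), match.group(2)


def samples_per_epoch(image_counts):
    """Sum of images * repeats over folders, given {folder_name: image_count}."""
    return sum(parse_folder(name)[0] * count for name, count in image_counts.items())
```

For example, `samples_per_epoch({"3_woman": 400, "3_face": 200, "3_hand": 150})` gives 2250, and renaming the subsets to `10_face` and `10_hand` raises that to 4700 while leaving the main dataset's weight unchanged.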

latent charm
#

set LoRA network weights to your lora and continue the train

latent charm
#

You still use the same model. You need to set the lora network weights to resume from the lora training.

latent charm
#

I don't know what you mean by "low pixel". If you mean blur, noise, or any other effect you don't want that appears in your generations, it might be related to your dataset

#

Image resolution isn't related to the lora

stone garden
#

So i trained lora with a character and i described the hairstyle in every text caption but still it sticks to the hairstyle from the character until i go down to like 0.6 weight

#

whats a good way to fix that?

latent charm
latent charm
#

You might try to add more hairstyles into dataset. It seems this hairstyle is overfitted to the lora

stone garden
#

or ill just use inpaint for other hairstyles

#

i thought maybe theres a way to make the AI mix the hairstyle of the lora with the rest of the dataset

normal ember
#

3000 steps (55 epochs), 220 images, constant adamw, unet only, lr 4e-4, batch size 4.
ref, 8, 6, 32, 64, 128
gen strength model: 0.6
gen strength clip: 0.9
a woman with eyes that have seen too much, enveloped in the twilight of a dense pine forest, with remnants of a long-abandoned campsite

#

Looking at only these I think network of 8 does the trick.

latent charm
#

I think network 8 is undertrained

#

The likeness is not enough for me

north meadow
#

hello, I have a question about training a custom model with dreambooth

#

not sure if this is the right channel tbh

#

but lets go

#

My question is about the instance prompt and how unique each word should be

#

Say, if Ive named my instance prompt as "photo of humanoid2dside person", would the model understand the string "humanoid2dside" as completely new and unique argument during generation?
By no means the model should understand the prompt "photo of humanoid2dside person" as "photo of humanoid 2d side person"

#

I'm not entirely sure how stable diffusion recognizes each word contained in the prompt during generation.
But for my use case, a prompt with "humanoid2dside" should not be the same thing as a prompt with "humanoid 2d side"

latent charm
#

You could install it from ext

north meadow
#

hmmmmm

#

thanks, I was not aware of this

#

so that means with the string "humanoid2dside" that stable diffusion would identify the tokens "human", "oid", "2" and "dside"?

#

I see

#

probably I should use a more unique name then

#

I will try some strings on the tokenizer to see what works
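
One way to try strings against the tokenizer outside the web UI is the Hugging Face `transformers` CLIP tokenizer. The sketch below assumes the SD 1.x tokenizer checkpoint `openai/clip-vit-large-patch14`; `split_report` and `load_clip_tokenizer` are made-up helper names, and nothing here is loaded until you call the loader yourself.

```python
from typing import Dict, List, Protocol


class Tokenizer(Protocol):
    def tokenize(self, text: str) -> List[str]: ...


def load_clip_tokenizer():
    """Load the CLIP tokenizer (needs `pip install transformers` and a one-time download)."""
    from transformers import CLIPTokenizer  # imported lazily so the helper below works without it
    return CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")


def split_report(tokenizer: Tokenizer, words: List[str]) -> Dict[str, List[str]]:
    """Map each candidate trigger word to the sub-word pieces the tokenizer produces."""
    return {word: tokenizer.tokenize(word) for word in words}
```

Calling `split_report(load_clip_tokenizer(), ["humanoid2dside", "humanoid 2d side"])` and printing the result shows how each candidate is split, so the two spellings can be compared directly.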

#

thanks man

latent charm
#

That's fine once you start to train the text encoder. It will learn your token, humanoid2dside, for your images

north meadow
#

but once trained properly, is it guaranteed to always consider humanoid2dside as a unique token?

latent charm
#

You don't need a "unique" long string for your training, you just need something that won't cause problems. That should be enough.

north meadow
#

I will try it out to see if it works then

#

having a means to at least know if my token is being identified properly is already a step forward

#

thank you

normal ember
latent charm
#

that makes sense

normal ember
#

Should probably try dreambooth next to see what I might end up with.

next tapir
#

Is there a way to "negatively caption" things during training? I'm training a style, and every once in a while, real images bleed into the results. I can negative prompt photograph during generation, but I'd like to stop this from happening during the training phase so that it's not a burden for people who use the LoRa, if possible.

I know that I could use regularization images, but since I'm training a generic artistic style, the time needed to create and generate a wide diversity of regularization images seems rather excessive. I was hoping that there would be an easier way of training out specific elements from the resulting style.

latent charm
sonic narwhal
#

How much estimated VRAM will u need to do full finetune of SDXL?

Doing a LoRa on 24gb 3090 with Kohya_ss took about 22 hours with batch size 1, repeats 1 and epochs 90

stiff dust
#

for lora you need not more than 12gb. With a 3090 you can do batch size 10 without problems

#

training should be fast, but of course that also depends on your number of training images. But if you have so many images that it takes 22 hours I would increase batch size

sonic narwhal
#

I had 80 images

#

my goal is later to do full fine tune of a lot of different concepts

#

up to 5000-10000 images

#

will I need more Vram?

stiff dust
stiff dust
sonic narwhal
#

ill send the json I used

#

number of repeats on image folder was only 1

#

all images were 1080x1350

stiff dust
#

looks right. I would try a higher batch size. I have a 3090, too, and it definitely doesn't need that much time
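
To put the 22-hour run in perspective, the reported settings (80 images, 1 repeat, 90 epochs, batch size 1) can be turned into a per-step time. This is a rough sketch that assumes one optimizer step per batch and ignores bucketing effects; both helper names are hypothetical.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Steps for a run, assuming one step per batch and a partial last batch per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def seconds_per_step(hours: float, steps: int) -> float:
    """Average wall-clock seconds spent on each training step."""
    return hours * 3600 / steps


steps = total_steps(num_images=80, repeats=1, epochs=90, batch_size=1)
print(steps, round(seconds_per_step(22, steps), 1))  # 7200 11.0
```

Roughly 11 seconds per step at batch size 1 is the figure to compare against other people's runs on the same card.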

sonic narwhal
#

hm okay, ill try again and see how it goes

latent charm
#

Are you using another program like comfyui or webui together while training?

gentle flame
# sonic narwhal How much estimated VRAM will u need to do full finetune of SDXL? Doing a LoRa o...

I don't know how to work this and it uses jax, but this might help you with vram requirements for a finetune if you have a few more 3090s
https://github.com/lodestone-rock/SDXL-sharding/tree/main

undone bluff
#

I'm trying to train a character with LoRA. Because I only have 8 GB of VRAM I'm using Google Colab. My training images are all realistic, and if I generate images with that LoRA I only get this realistic style, even when I use, e.g., a more drawn, anime-style checkpoint. What could cause this kind of behavior? I had about 30 images with 12 repeats and 10 epochs (I tested all epochs, same result). Also, how would I go about it if I want multiple of these characters in one shot? What tags should I use when training, like "2MY_CHARACTER" and "1MY_CHARACTER"? Or what's the best way to do it so I can use it later in my prompt?

latent charm
#

Use "a photo of [your character]" while training, and replace "photo" with another style at generation time

undone bluff
#

so I don't have to put in the quantity of my characters that are in the images? In some photos the characteristics of my character are seen in 2 entities in my images. So I don't have to put "a photo of 2 (my character)"?
also could you elaborate on "replace the photo to other style"? im not sure what exactly you mean?

stiff dust
#
  • use the word "photo" in your captions
  • use a rare name for your character (not: john hammerfall, better: john hmrzufl)
  • only train unet, never train Text encoder
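
A small sketch of what captions following the first two tips could look like. The tag values are invented for illustration and `build_caption` is a hypothetical helper, not part of any trainer.

```python
from typing import Iterable


def build_caption(trigger: str, tags: Iterable[str], prefix: str = "photo of") -> str:
    """Put the medium word and the rare character name first, then the descriptive tags."""
    parts = [f"{prefix} {trigger}"]
    parts.extend(tag.strip() for tag in tags if tag.strip())  # skip empty tags
    return ", ".join(parts)


print(build_caption("john hmrzufl", ["from side", "outdoors"]))
# photo of john hmrzufl, from side, outdoors
```

At generation time the same caption shape can be reused with a different prefix, for example `build_caption("john hmrzufl", [], prefix="anime drawing of")`.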
latent charm
undone bluff
#

I've put the term realistic in but I will try it with the tag photo as well, thank you ✌️

sonic narwhal
#

seems currently A6000 is minimum requirement to do full finetune of SDXL

#

Looking at SimpleTuner from bghira

stone garden
#

Can i add random photos with hairstyles and another prefix in the caption to my lora training images to be able to mix more hairstyles and colors to my character?

stone garden
latent charm
latent charm
#

I enabled the text encoder in training. When I train the same LoRA multiple times, how can I tell when the text encoder has been trained enough and stop TE training?

sonic narwhal
#

thanks

north meadow
stiff dust
#

the original Dreambooth paper suggested to use "sks person". However, "sks" has a meaning and is not that rare, so using something like "tdjvr" might make more sense. Instead of person you can simply use a real name. It transports more information (john -> male, western culture) and is more natural to prompt.

north meadow
#

is there any documentation about these DW openpose arguments?

latent charm
#

It is finetune channel. You might ask in SDXL

north meadow
#

ok

restive bridge
rain scarab
#

I'm training an SDXL LoRA with 163 images using Kohya at 1024,1024. It says it is going to take 5 hours. Is that normal? GPU: RTX 4080 (16 GB), i9-13900K, 32 GB RAM, M.2 drives

Here is the configuration

{
"LoRA_type": "Standard",
"adaptive_noise_scale": 0,
"additional_parameters": "",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"cache_latents": true,
"cache_latents_to_disk": true,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": "",
"clip_skip": "1",
"color_aug": false,
"conv_alpha": 1,
"conv_block_alphas": "",
"conv_block_dims": "",
"conv_dim": 1,
"decompose_both": false,
"dim_from_weights": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 10,
"factor": -1,
"flip_aug": false,
"full_bf16": false,
"full_fp16": false,
"gradient_accumulation_steps": "1",
"gradient_checkpointing": true,
"keep_tokens": "0",
"learning_rate": 0.0004,
"logging_dir": "<>/KOHYA/LoraPics/finalenvoy\\log",
"lora_network_weights": "",
"lr_scheduler": "constant",
"lr_scheduler_args": "",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 0,
"max_bucket_reso": 2048,
"max_data_loader_n_workers": "0",
"max_resolution": "1024,1024",
"max_timestep": 1000,
"max_token_length": "75",
"max_train_epochs": "",
"max_train_steps": "",
"mem_eff_attn": true,
"mid_lr_weight": "",
"min_bucket_reso": 256,
"min_snr_gamma": 0,
"min_timestep": 0,
"mixed_precision": "bf16",
"model_list": "custom",
"module_dropout": 0,
"multires_noise_discount": 0,
"multires_noise_iterations": 0,
"network_alpha": 1,
"network_dim": 1,
"network_dropout": 0,
"no_token_padding": false,
"noise_offset": 0,
"noise_offset_type": "Original",
"num_cpu_threads_per_process": 2,
"optimizer": "Adafactor",
"optimizer_args": "scale_parameter=False relative_step=False warmup_init=False",
"output_dir": "<>/KOHYA/LoraPics/finalenvoy\\model",
"output_name": "warforged_chk_pt",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "<>/stable-diffusion-webui - dream/models/Stable-diffusion/sd_xl_base_1.0_0.9vae.safetensors",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"reg_data_dir": "",
"resume": "",
"sample_every_n_epochs": 0,
"sample_every_n_steps": 0,
"sample_prompts": "",
"sample_sampler": "k_dpm_2_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "bf16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 0,
"sdxl": true,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": true,
"seed": "",
"shuffle_caption": false,
"stop_text_encoder_training": 0,
"text_encoder_lr": 0.0004,
"train_batch_size": 5,
"train_data_dir": "<>/KOHYA/LoraPics/finalenvoy\\img",
"train_on_input": true,
"training_comment": "",
"unet_lr": 0.0004,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"v_pred_like_loss": 0,
"vae_batch_size": 0,
"wandb_api_key": "",
"weighted_captions": false,
"xformers": "sdpa"
}

jade hornet
opal jacinth
#

does it actually matter if some input training images are rotated?

#

or is it simply a "display setting" of the image for the OS and doesn't matter anyway?

latent charm
regal harbor
#

anyone fine-tuning BLIP2?

stone garden
#

Is separating the prompts in the caption files with commas better?

stiff dust
stiff dust
latent charm
opal jacinth
stiff dust
stiff dust
# latent charm I usually get better result when training character concept with telr.

it might depend on what you want to achieve. I found that text encoder training adapts very fast to the images, BUT the resulting model becomes very inflexible when it comes to drawing the character in a different art style (e.g. from photo to anime or from painting to comic) or when you want to draw the character from a different angle, with different clothing and so on. Text encoder training often ends up giving me images that are too similar to the training images in terms of composition

opal jacinth
stiff dust
#

I assume it has something to do with the pooling. It seems that by text encoder training your trigger words become too dominant and the remaining prompt will be ignored very often

latent charm
stone garden
#

3200 steps in 7 minutes?

stiff dust
stone garden
#

10k steps in 15

stiff dust
latent charm
#

I used photo as a prefix and manually added a camera angle tag to each caption so the LoRA learns how to map camera angles to images. I used wd14 for auto-captioning and manually removed and added the intended tags for more control over the LoRA.

stiff dust
#

hm, nah, adding tags that describe things on the images you don't want to overfit - I'm already doing that

latent charm
#

I also use half unet lr for telr to train it slowly

stiff dust
#

maybe it works better for your data. I can only say I did several subject trainings. I always evaluate by using simple prompts ["photo of xyz"] and unconventional prompts ["xyz as astronaut", "charcoal drawing of xyz", "egyptian hieroglyphics depict xyz"].
I always found that training Textual Inversion or training the Text Encoder will improve very fast for the simple prompts but won't be able to do the unconventional prompts. Unet-only training on rare name tokens is the only strategy so far that also excels on the unconventional prompts.
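
The evaluation habit described in that message is easy to script. The templates below are the ones quoted there; the function name and the dictionary layout are only an illustrative assumption.

```python
from typing import Dict, List

# Prompt templates taken from the message; {name} is the trained trigger name.
SIMPLE = ["photo of {name}"]
UNCONVENTIONAL = [
    "{name} as astronaut",
    "charcoal drawing of {name}",
    "egyptian hieroglyphics depict {name}",
]


def evaluation_prompts(name: str) -> Dict[str, List[str]]:
    """Fill the trigger name into both prompt groups so every checkpoint sees the same tests."""
    return {
        "simple": [t.format(name=name) for t in SIMPLE],
        "unconventional": [t.format(name=name) for t in UNCONVENTIONAL],
    }


for group, prompts in evaluation_prompts("xyz").items():
    for prompt in prompts:
        print(f"{group}: {prompt}")
```

Running the same list with a fixed seed against each saved epoch makes it easier to see when the simple prompts keep improving while the unconventional ones start to fail.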

#

when I have very little training data (e.g. only 1-2 images) THEN I use text encoder only training.

latent charm
#

hmmm, I train with te using 2000 images and train it multiple times. I'll try your evaluation method and see how it goes

stiff dust
#

I mean, if I train on photos of my face, then I have enough images. Training the text encoder then totally allows me to draw the image in different angles and so on

#

but it quickly overfits on the photo style

latent charm
#

But I think that even if the text encoder is overfitted, you could easily reduce its strength in comfyui or use an earlier te from a previous Lora

stiff dust
#

like letting my face be drawn as comic or charcoal drawing won't work anymore

stiff dust
#

also not all styles overfit similarly fast. Like "anime" style usually stays quite robust

stiff dust
latent charm
#

I think it is because anime is quite undertrained in the base model

stiff dust
#

I did A LOT of tests with training my face xD Using rare tokens was BY FAR what worked best

#

and with rare token I mean something like "Christian gjhsar"

#

combination of name + some random characters
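
A minimal sketch of generating such a name. The length of six and the use of plain lowercase letters are assumptions on my part; the result should still be checked against the tokenizer, since random letters can occasionally form a real word.

```python
import random
import string
from typing import Optional


def rare_name(first_name: str, length: int = 6, seed: Optional[int] = None) -> str:
    """Return the first name followed by a random lowercase string, e.g. 'Christian xxxxxx'."""
    rng = random.Random(seed)  # seeded so a chosen name can be reproduced later
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{first_name} {suffix}"


print(rare_name("Christian", seed=0))
```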

latent charm
stiff dust
#

yes

latent charm
#

interesting

stiff dust
#

in fact, you only have to train the cross attention

#

this shrinks down the lora file size to few megabytes ^^

#

training self attention sometimes slightly improves image quality, but 99% of the training has to be done in the cross attention
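
For reference, a sketch of how cross-attention layers can be picked out by name. It assumes diffusers-style UNet module names, where `attn1` is self-attention and `attn2` is cross-attention inside each transformer block; other trainers name their keys differently, so the filter would need adapting. The example names are illustrative.

```python
from typing import Iterable, List


def cross_attention_modules(module_names: Iterable[str]) -> List[str]:
    """Keep only modules that live under an `attn2` (cross-attention) block."""
    return [name for name in module_names if "attn2" in name.split(".")]


names = [
    "down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q",  # self-attention
    "down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k",  # cross-attention
    "down_blocks.1.resnets.0.conv1",                                # convolution
]
print(cross_attention_modules(names))
```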

#

I tried

  • textual inversion first, then unet training
  • textual inversion first, then text encoder, then unet training
  • text encoder, then unet on celebrity names
  • unet only on celebrity names
  • text encoder, then unet on rare tokens
  • unet only on rare tokens

The last one had best generalization capability

#

(it also took the most time for training. unet on rare tokens needs ~10 times more training steps to adapt to the training images than the other methods. But results just looked best by far)

latent charm
#

Thanks for sharing. I'll run some experiments with it.

#

I tried rare token unet only before but it seems very hard to converge

#

might be due to wrong lr

stiff dust
#

yeah, it takes forever ^^

#

I mean, you have to combine rare token with a token that describes the character

#

I use a first name for that

#

similar to Dreambooth's "sks person" I then use "John duqgzsa"

#

(cause sks is not really rare)

opal jacinth
#

you're not providing the class?

stiff dust
#

but then it trains much slower than other training variants. That is definitely the case. I just say the results are also much better than for other training variants

stiff dust
#

in the case above: "John" clearly describes a male character. No need to use "male character" additionally

#

if your name is very misleading (like you are a woman named "Alex", or your name has some other meaning, like your name is "Dick" ^^°)

#

then you should "rename" yourself 😉

#

like my name is "Kai", which is a typical German name, but the name also appears in other cultures and in Stable Diffusion it is strongly associated with Japanese culture. That's why I renamed myself to "Christian" when training on my images

opal jacinth
#

but I did get it right, you actually have the same results as if using the token "ohwx man"?

#

or do you say you had even better results with "Christian duqgzsa"

normal ember
#

How do you know if it's a rare token or not?

#

The offset lora provided by SAI is trained on just “contrast” by the look of the metadata.

stone garden
#

what type of captioning should i use for a clothing style lora?

stiff dust
stiff dust
latent charm
#

Have you tried celebrity names plus random characters?

stiff dust
#

no. The problem with celebrity names was that they are not really better than textual inversion but sometimes blend over (e.g. I once trained a DnD character on Hayden Christensen and sometimes the DnD character holds a light sabre instead of a sword xD)

#

I have to say: I don't care about training time. It might be different if you do that for business on a regular basis. For me, training 2 hours on a subject is totally okay if the results are really good afterwards.

stone garden
#

So whats better, using multiple tokens or just one token and describing all the elements of the picture to add them later in the prompt manually?

latent charm
stone garden
#

will the things from the training images mentioned in the caption still be included in the training dataset?

stiff dust
latent charm
latent charm
#

haven't finished yet.

#

and planned to stop training unet

#

My focus is on maximum likeness to the original picture, which might be a little bit different from your goal

stiff dust
latent charm
#

I have this sample to change the style with te lora

#

from a few days ago. Dataset is 700 images. Training took around 20 rounds.

#

I think it could add more oil painting to adjust the style

#

all images in the dataset are photos, with no reg images

#

It's supposed to output images like this.

normal ember
stiff dust
#

if you have the name "fgzhw" then it is tokenized (e.g. into ["fg", "zh", "w</w>"]) and the tokens are associated with your face

#

if you train the text encoder, the name tokens are "distributed" and amplified through your caption, which then ends up in an overfitting effect

normal ember
#

and if it's not a face, like an image taken with, let's say, kodak vision2, any idea how to best caption that?

#

I've gone with just that, seems to be working but there might be better ways.

stiff dust
#

either you caption it with a unique rare token trigger word, or you simply describe what is on the image

#

former makes sense if there is something uniquely new that cannot be described

#

(like your face)

normal ember
#

Tried to caption the image but I don't think the results were great when you do unet only. captioning the film stock turned out better.

stiff dust
#

I'm not sure if this is comparable. I said: text encoder training overfits on style. You seem to train on a style

normal ember
#

When I trained both unet and text encoder with captioned images it turned out good but it overfitted the characters, clothing and environment a bit too much.

#

Maybe I should have a lower lr for the text encoder when I'm training both?

latent charm
#

I currently use half lr for te compared to unet
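
Expressed as trainer arguments, the "half lr for te" setup might look like the sketch below. It assumes kohya's sd-scripts LoRA trainer, which as far as I know accepts separate `--unet_lr` and `--text_encoder_lr` values; `lr_args` is a hypothetical helper and the 0.5 ratio is simply the value mentioned in the message.

```python
from typing import List


def lr_args(unet_lr: float, te_ratio: float = 0.5) -> List[str]:
    """Build learning-rate arguments with the text encoder rate as a fraction of the unet rate."""
    te_lr = unet_lr * te_ratio
    return ["--unet_lr", f"{unet_lr:g}", "--text_encoder_lr", f"{te_lr:g}"]


print(" ".join(lr_args(4e-4)))
# --unet_lr 0.0004 --text_encoder_lr 0.0002
```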

normal ember
#

My first try, which trained both unet and text encoders, was captioned something like this: cinematic film still of the road is empty, desolate, calm and serene, blue and yellow, close-up, lonely, barren, empty, natural, low, soft, straight on, shallow depth of field . vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy

#

I've also tried something like this for the same image: a road, desert, mountains, day, landscape

#

that didn't turn out as good.

#

I've also tried just kodak vision2 for all images and unet only. It seems to have the least impact on anything else except the color palette and film grain and such.

stiff dust
#

dunno... I would train unet only, honestly.

#

it's also the way SDXL was trained itself

normal ember
#

Maybe I should try to increase the number of steps in training when I do unet only. I've run 3000 steps on 220 images.

#

If I do a rare token like Kai is telling us, I'm not sure if I should just try a random token since these are not images of "Christian" nor "John" 😄

stiff dust
#

you train for a style. I wouldn't use any special token here

normal ember
#

and should i skip kodak vision2?

stiff dust
#

except something like "kodak vision2"

normal ember
#

I do 0.0004 for learning rate. Could probably experiment with that too and see if it picks up the style better or worse.

#

I could try 6000 steps and a batch size of 4 as I've done previously and see what happens. It would probably take 2 hours with that amount of steps.

stiff dust
#

you can also try to play around with the --adaptive_noise_scale parameter

#

setting it to, say 0.05, might speed up training for some concepts

normal ember
#

Adaptive noise scale:
Used in combination with the Noise offset option. Specifying a number here will further adjust the amount of additional noise specified by Noise offset to be amplified or attenuated. The amount of amplification (or attenuation) is automatically adjusted depending on how noisy the image is currently. Values range from -1 to 1, with positive values increasing the amount of added noise and negative values decreasing the amount of added noise.

stiff dust
#

--clip_skip 1
does that make sense for SDXL?

normal ember
#

Not sure, I should probably remove it since you ask. 😄

stiff dust
#

sorry, I meant --min_snr_gamma=5, not --adaptive_noise_scale

normal ember
#

Min SNR gamma
`In LoRA learning, learning is performed by putting noise of various strengths on the training image (details about this are omitted). Depending on the strength of that noise, learning may or may not be stable, moving closer to or farther from the learning target, and Min SNR gamma was introduced to compensate for that. Especially when learning images with little noise on them, the result may deviate greatly from the target, so this option tries to suppress that jump.

I won't go into details because it's confusing, but you can set this value from 0 to 20, and the default is 0.

According to the paper that proposed this method, the optimal value is 5.

I don't know how effective it is, but if you're unsatisfied with the learning results, try different values.`
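
For anyone curious what the setting does numerically: in the paper's formulation for noise-prediction training, each timestep's loss is multiplied by min(SNR, gamma) / SNR. The sketch below only shows that weighting; the SNR values are made up for illustration and the function name is hypothetical.

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Loss weight min(SNR, gamma) / SNR used by Min-SNR weighting for noise prediction."""
    if snr <= 0:
        raise ValueError("SNR must be positive")
    return min(snr, gamma) / snr


# Low SNR = heavily noised step, high SNR = lightly noised step.
for snr in (0.5, 5.0, 50.0):
    print(snr, min_snr_weight(snr))
```

Steps with SNR at or below gamma keep a weight of 1, while lightly noised steps (high SNR) are scaled down, which matches the quoted description of suppressing jumps on images with little noise.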

#

I guess you could say my images are noisy, if grain is noise.