#🔧｜finetune | Stable Diffusion | Page 17

covert pagoda Aug 13, 2023, 11:20 AM

#

a certain lighting setup

#

makeup

#

all within one LoRA

#

@hollow spruce so would that be a separate img folder for each, or would you keep each concept as recurring common token in the captions?

hollow spruce Aug 13, 2023, 11:22 AM

#

#

#

covert pagoda Aug 13, 2023, 11:23 AM

#

but how exactly do you develop each concept so that you can visualise the group of images that contain that concept feature ?

hollow spruce Aug 13, 2023, 11:23 AM

#

my personal rule of thumb is - 100 images tagged with the same concept to work properly, when I train them together into one finetune style lora

covert pagoda Aug 13, 2023, 11:23 AM

#

for instance in EveryDream, you can use YAML args in a folder to inject a caption token during training, so each folder would group the images for that concept

latent charm Aug 13, 2023, 11:23 AM

#

hollow spruce my personal rule of thumb is - 100 images tagged with the same concept to work p...

I tried character cosplay, but when I use the same prompt to reproduce the image after training, the cosplay tag usually messes things up. Should we remove the cosplay tag from training?

covert pagoda Aug 13, 2023, 11:24 AM

#

i guess in your case you are doing it manually on Danbooru tag editor

hollow spruce Aug 13, 2023, 11:24 AM

#

covert pagoda i guess in your case you are doing it manually on Danbooru tag editor

I use hydrus network. (free)
a bit of effort to learn - but scales well for big manually tagged datasets

covert pagoda Aug 13, 2023, 11:25 AM

#

yea, i've heard about hydrus, but isn't that more for anime?

hollow spruce Aug 13, 2023, 11:25 AM

#

latent charm I tried character cosplay. But when I use the same prompt to reproduce the image...

it works once you enable clip training, and have a big enough dataset

#

the moment I hit 2k images in total, things started to work out

covert pagoda Aug 13, 2023, 11:25 AM

#

whereas i use instagram and photo websites to scrape

hollow spruce Aug 13, 2023, 11:25 AM

#

at 4k images right now, it's pretty damn good

latent charm Aug 13, 2023, 11:26 AM

#

Oh, nice to know

hollow spruce Aug 13, 2023, 11:26 AM

#

but if you do just the cosplay training, and use keep n tokens = 1, and shuffle rest, then it should work with around 150 images, and 400 images for "close-to-perfect" results

#

(I tried it with a Nier Automata 2B cosplay) <- no clip training, since '2b cosplay' was already understood, just not producing the right images most of the time

#

it learned a total of around 10 concepts, with the '2b cosplay' working roughly 4/5 times to produce an image I consider good enough to post
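A minimal sketch of what "keep n tokens = 1, and shuffle rest" does to a comma-separated caption: the first tag (here the trigger `2b cosplay`) stays in front and the remaining tags are reordered each time the caption is used. `shuffle_caption` is a hypothetical helper written for illustration, and the example caption is made up; this is not the trainer's own code.

```python
import random
from typing import Optional


def shuffle_caption(caption: str, keep_tokens: int = 1,
                    rng: Optional[random.Random] = None) -> str:
    """Keep the first `keep_tokens` comma-separated tags fixed, shuffle the rest."""
    rng = rng or random.Random()
    tags = [tag.strip() for tag in caption.split(",") if tag.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # in-place shuffle of the non-fixed tags only
    return ", ".join(fixed + rest)


# Hypothetical caption; the trigger tag comes first so it is never moved.
caption = "2b cosplay, 1girl, white hair, black dress, outdoors"
shuffled = shuffle_caption(caption, keep_tokens=1, rng=random.Random(0))
print(shuffled)
```

Because the trigger tag always sits in the same position while everything else moves around, the fixed tag is the one thing that stays constant across every variation of the caption.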

tribal frigate Aug 13, 2023, 11:33 AM

#

I see, thanks 🙂

#

Guys do you have a tip for a good tutorial on training loras for SDXL with google colab?

covert pagoda Aug 13, 2023, 12:15 PM

#

hollow spruce the moment I hit 2k images in total, things started to work out

Is this for fine tune checkpoint?

hollow spruce Aug 13, 2023, 12:18 PM

#

covert pagoda Is this for fine tune checkpoint?

lora

#

while normal loras stay similar in training, with sdxl we now have options for bigger and much more complex loras, which approach finetune level of improvements - across hundreds of concepts

stone garden Aug 13, 2023, 12:32 PM

#

Hello, everyone. I'm using scale_weight_norms = 1 but my average key norm doesn't stop going up and eventually approaches 1 and my keys scaled become very high after the 4th epoch. Any way I can fix this?

covert pagoda Aug 13, 2023, 12:34 PM

#

hollow spruce while normal loras stay similar in training, with sdxl we now have options for b...

Ok, gonna digest this. So, I need to start plotting working concepts like a fine tune; got to be organized to know the concepts learned when prompting

#

My main use case for this is to attempt mixing characters... like multiplying concepts, like two characters generalized. Does that make sense? That's sort of been my desire for a while. Will give it a go this week. Any recommendations besides the ones you've given me already?

jade hornet Aug 13, 2023, 2:35 PM

#

If you have one concept with 50 images, and one with 100, would you do 2 repeats on the set with 50? Or on the 100? I struggle with that

hollow spruce Aug 13, 2023, 2:36 PM

#

jade hornet If you have one concept with 50 images, and one with 100, would you do 2 repeats...

at least that is the theory behind it

jade hornet Aug 13, 2023, 2:37 PM

#

Yah the kohya_ss page says its original intent was to allow you to match up with the number of reg images, but like you stated there is probably good reason to use it with multiple concepts even if not using regs

hollow spruce Aug 13, 2023, 2:38 PM

#

I can't vouch for how well it works, since whenever I did that, it didn't really pay off, as I didn't have enough images to make it work (20 images repeated 5 times, really aren't enough to fully teach a concept - so not worth trying to make this work if you're teaching multiple concepts into one lora - better to just make 2 loras if the difference is that big)

#

on the other hand - for concepts where I have 200 images, and another with 700 images, I don't bother repeating the smaller one, as they'll still get learned at roughly the same rate

#

(the bigger dataset gets learned slower, but more flexibly - balancing everything out)

#

sub 100 images 🤷‍♂️ hard to tell without a lot of testing

#

I usually find it easier to just increase my dataset, than messing with repeat settings
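The arithmetic behind the repeats question can be sketched like this, assuming the goal is for every concept to contribute roughly the same number of samples per epoch (images × repeats). `balancing_repeats` and the concept names are hypothetical; whether balancing actually helps is exactly what is being debated above.

```python
from typing import Dict


def balancing_repeats(image_counts: Dict[str, int]) -> Dict[str, int]:
    """Repeats per concept so each one roughly matches the largest concept."""
    largest = max(image_counts.values())
    return {name: max(1, round(largest / count))
            for name, count in image_counts.items()}


# The case from the question: one concept with 50 images, one with 100.
counts = {"concept_a": 50, "concept_b": 100}
repeats = balancing_repeats(counts)
samples_per_epoch = {name: counts[name] * repeats[name] for name in counts}
print(repeats)
print(samples_per_epoch)
```

Under that assumption the repeats go on the smaller set: 50 images at 2 repeats match 100 images at 1 repeat.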

jade hornet Aug 13, 2023, 2:40 PM

#

Yah what I saw was one burns out the whole model before the other is trained

pliant drift Aug 13, 2023, 3:10 PM

#

wonder what the documentation is on about then.

covert pagoda Aug 13, 2023, 3:20 PM

#

I'm getting the vibes that adaptive algos aren't for fine tuning small datasets but rather for custom larger models

#

Getting way better with adamW8bit on my Lora

#

Guess I'll stick to constant/cosine schedulers for Lora's

#

And might go back to adaptive for say a large diverse dataset for a custom model

normal ember Aug 13, 2023, 8:59 PM

#

Is it crucial to ensure that the num_train_images evenly divides the train_batch_size?

young crater Aug 14, 2023, 1:08 AM

#

normal ember Is it crucial to ensure that the num_train_images evenly divides the train_batch...

It won’t stop the training, but idk if it has an effect on the output.
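For a sense of what an uneven division means in practice, here is a small sketch of the step count per epoch. It assumes the last, partial batch is either kept or dropped depending on the trainer (the `drop_last` flag here is an illustrative parameter, not a confirmed setting), and it ignores aspect-ratio bucketing, which forms batches per bucket.

```python
import math


def steps_per_epoch(num_train_images: int, train_batch_size: int,
                    drop_last: bool = False) -> int:
    """Number of optimizer steps needed to see every image once."""
    if drop_last:
        return num_train_images // train_batch_size
    return math.ceil(num_train_images / train_batch_size)


def last_batch_size(num_train_images: int, train_batch_size: int) -> int:
    """Size of the final batch when the partial batch is kept."""
    remainder = num_train_images % train_batch_size
    return remainder or train_batch_size


print(steps_per_epoch(230, 8), last_batch_size(230, 8))
```

With 230 images and batch size 8, the final batch simply holds fewer images than the others.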

stable spruce Aug 14, 2023, 1:08 AM

#

when does stablecode come out to use

signal warren Aug 14, 2023, 5:29 AM

#

hollow spruce I can't vouch for how well it works, since whenever I did that, it didn't really...

Do you separate each concept's images into different sub folders? I like the idea of making a huge lora with multiple concepts. I might try this too. Are you planning to merge it with a checkpoint in the end or leave it as a lora?

signal warren Aug 14, 2023, 7:14 AM

#

hmmm I wonder what learning rate I should use for thousands of images

elfin raven Aug 14, 2023, 7:39 AM

#

Not sure if this is the right place for this, but I got sdxl lora running at 1024x1024 on a 16gb gpu. My loss is crap, but I will experiment more. I'm happy to share my settings if people want to contribute and make suggestions.

hollow spruce Aug 14, 2023, 8:06 AM

#

signal warren Do you separate each concept's images into different sub folders? I like the idea ...

depends. If all the concepts you're training fall under some greater class, then you don't need different folders.
In my case I currently have two folders: 1_girl, 1_woman <- since one of the goals of my lora is to give these two words very specific age brackets
but even if I put everything into 1_woman, it would still work

However, additional folders are great if you're lacking enough images for a specific concept! My "tracer cosplay" concept felt undertrained, so I added an additional folder 1_tracer cosplay, exported all those tagged images there once more, and now it learned it just how I wanted it. (Keep in mind, those images are now essentially duplicated, since they're already a subset of the woman/girl images. But this time they got loaded into kohya with the class prompt "tracer cosplay", so it worked out better.)
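The folder names above (`1_girl`, `1_woman`, `1_tracer cosplay`) follow a `<repeats>_<class prompt>` pattern. A minimal sketch of reading that pattern, assuming the number before the first underscore is the repeat count and everything after it is the class prompt; `parse_concept_folder` is a hypothetical helper, not part of kohya.

```python
from typing import Tuple


def parse_concept_folder(folder_name: str) -> Tuple[int, str]:
    """Split '<repeats>_<class prompt>' into (repeats, class prompt)."""
    repeats, sep, class_prompt = folder_name.partition("_")
    if not sep or not repeats.isdigit() or not class_prompt:
        raise ValueError(f"expected '<repeats>_<class prompt>', got {folder_name!r}")
    return int(repeats), class_prompt


for name in ["1_girl", "1_woman", "1_tracer cosplay"]:
    print(parse_concept_folder(name))
```

Only the first underscore is treated as the separator, so class prompts containing spaces or further underscores stay intact.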

#

in short, once you have around 2k images, you can use various methods to train your lora, and they will all work 'good enough'. The magic happens when you make all the important concepts work with weights of the lora set to 1. and with no need for (tracer cosplay:1.4) or anything like that

signal warren Aug 14, 2023, 8:11 AM

#

ah thanks, yeah that makes sense. I guess part of the fun really is to experiment yourself to get that perfect balance.

My idea at the moment is to make a lora which focuses on jdrama/movies, so that I can get the style of my favourite different dramas and characters all in one lora! 💦

mortal lance Aug 14, 2023, 2:34 PM

#

What are best implementations available for the following methods to fine-tune the sdxl?
-- full fine tuning
-- Dreambooth + LoRA
-- Dreambooth

lethal hinge Aug 14, 2023, 3:30 PM

#

What base model do you all use for training a lora with a person? just the base 1.5? (not jumping into sdxl just yet)

stone garden Aug 14, 2023, 6:39 PM

#

can anyone point me in the right direction to train some text embeddings/inversion i have the sdxl1.0_0.9vae versions, thanks

hollow spruce Aug 15, 2023, 7:53 AM

#

mortal lance What are best implementations available for the following methods to fine-tune t...

that's a whole essay's worth of topics, which each have multiple answers which range from short to very very very complex. best to scroll up and read the chat here, as multiple of these questions have been answered in various degrees of complexity

mortal lance Aug 15, 2023, 9:57 AM

#

hollow spruce that's a whole essay's worth of topics, which each have multiple answers which r...

thanks

stiff dust Aug 15, 2023, 11:46 AM

#

stone garden can anyone point me in the right direction to train some text embeddings/inversi...

kohya_ss has scripts for that. Works exactly the same as lora training

pliant drift Aug 15, 2023, 12:23 PM

#

Aitrepreneur out here this week telling his audience that 99% of loras are made wrong because people are using rare tokens. I'm out here looking on civit and 1 in 20 users publish with rare tokens. maybe 1 in 20. probably less.

i think he just made that 99% figure up

stone garden Aug 15, 2023, 1:33 PM

#

whats rare tokens?

#

and how is that bad?

signal warren Aug 15, 2023, 2:30 PM

#

The guy in the video (not me) said that rare tokens, such as a random word like "ewfew", have less effect than choosing a person SDXL already knows who looks similar to the person you want to train, for example using the name Jessica Alba to train someone who has a similar look. He believes that training with a name SDXL already knows will make the training faster/better.

stone garden Aug 15, 2023, 3:04 PM

#

It’s going to work faster yeah. But given the right settings and time the other way will also work perfectly.

stone garden Aug 15, 2023, 6:29 PM

#

#💬｜general-chat message

#

help pls

#

can anyone give me some advice on lora training? i found some guides but they're a bit confusing. maybe i could ask a couple of questions, like how do i make the captions? there seem to be different ways to apply them. can i just make a folder with images?

orchid yoke Aug 16, 2023, 4:45 AM

#

stone garden can anyone give me some advice on lora training ? i found some guides but they'r...

Starting out, I would think your best option is to use the gui https://github.com/bmaltais/kohya_ss - In Lora > Tools > Deprecated you can fill in the details, click prepare training data and then "copy info to folders tab", which will handle the folder creation. There's then utilities/captioning/blip captioning to create the core captions. So then you have the folders in the right format, and the captions created with ease. You will likely want to make some manual edits to the captions to make them better, but you have a core framework to go off that way.

#

(screenshot taken from https://youtu.be/sBFGitIvD2A) but as you say, there are lots of guides. It's still pretty new, and I think there are going to be even more amazing things to come, especially around multiple concepts.

YouTube

SECourses

Become A Master Of SDXL Training With Kohya SS LoRAs - Combine Powe...

In this tutorial, you will learn how to install Automatic1111 Web UI for SDXL. How to use LoRAs with Automatic1111 SD Web UI. How to install Kohya SS GUI scripts to do Stable Diffusion training. How to train LoRAs on SDXL model with least amount of VRAM using settings. All of the details, tips and tricks of Kohya trainings. How to do x/y/z plot ...


sour eagle Aug 16, 2023, 5:01 AM

#

why does my loss stay around 0.125 when using adafactor? is that normal for variable learning rate ?

stiff dust Aug 16, 2023, 7:51 AM

#

the loss has nothing to do with the optimizer or the learning rate. The learning rate is applied to the gradient that is computed from the loss
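The order of operations described here can be sketched with a one-parameter toy problem: the loss is measured first, the gradient is derived from it, and only then does the learning rate scale the update. This uses plain gradient descent on a made-up quadratic loss as an assumption for illustration; Adafactor and other adaptive optimizers transform the gradient further before applying it.

```python
from typing import Tuple


def sgd_step(weight: float, learning_rate: float) -> Tuple[float, float]:
    """One plain gradient-descent step on the toy loss (weight - 3)^2."""
    loss = (weight - 3.0) ** 2       # loss for the current weight
    grad = 2.0 * (weight - 3.0)      # gradient of that loss w.r.t. the weight
    new_weight = weight - learning_rate * grad  # learning rate scales the step
    return new_weight, loss


weight = 0.0
for step in range(3):
    weight, loss = sgd_step(weight, learning_rate=0.1)
    print(step, round(loss, 4), round(weight, 4))
```

The loss printed at each step is computed before the update of that step, so the learning rate only shows up in how the loss changes on later steps.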

latent charm Aug 16, 2023, 10:19 AM

#

Did anyone try LoRA-FA?

latent charm Aug 16, 2023, 5:01 PM

#

I have tried this LoRA-FA. It basically achieved better results in half the epochs, and it is memory efficient. I was able to do batch size 10 at around 21 of 24 GB VRAM on a 3090.

signal warren Aug 16, 2023, 5:48 PM

#

Interesting, seems like worth a try then

signal warren Aug 16, 2023, 6:47 PM

#

latent charm Did anyone try LoRA-FA?

How did you get it, I can't see it on my kohya

latent charm Aug 16, 2023, 7:05 PM

#

signal warren How did you get it, I can't see it on my kohya

dev2 branch

signal warren Aug 16, 2023, 7:05 PM

#

aaaah cheers!

normal ember Aug 16, 2023, 8:12 PM

#

Anyone tried Salesforce/blip2-opt-6.7b-coco or something similar to auto-caption images? I think I'm starting to have a baseline for training.

covert pagoda Aug 16, 2023, 11:13 PM

#

hollow spruce that's a whole essay's worth of topics, which each have multiple answers which r...

What are your thoughts on finding the right scheduler for a given dataset? Are you one to play with runs on wandb and compare, run by run, the effect of applying constant vs cosine with restarts vs cosine annealing, etc.? Maybe seeing the feedback from each LR schedule on the strengths or weaknesses of various LR gamma values helps to identify where convergence is best? Do you have graph-driven conversations like that? I've noticed this is a hot debate amongst the big model fine tuners, less so for the dreamboothers.

stone garden Aug 17, 2023, 6:17 AM

#

normal ember Anyone tried Salesforce/blip2-opt-6.7b-coco or something similar to auto-caption...

I had to go through every single image caption personally anyway. It provides a good baseline caption most of the time though.

normal ember Aug 17, 2023, 6:31 AM

#

stone garden I had to go through every single image caption personally anyway. It provides a ...

That's probably the right way to go about it. I steered it quite hard on the dataset I had.

latent charm Aug 17, 2023, 6:33 AM

#

My approach is to tag with wd14 and remove unnecessary tags

normal ember Aug 17, 2023, 6:35 AM

#

I feed it a data structure with all the questions and then I combine it to the caption that I then write to file. questions = { "p_style": "Choose the mood of the photo: Desolate, Tense, Lonely, Stark, Quiet, Dark.", "p_subject": "Describe it in detail.", "p_mood": "Describe the mood in the image.", "p_colors": "Describe the colors.", "p_framing": "Choose the framing of the subject: Close-up, Medium, Wide.", "p_setting": "Describe the general setting or environment in a few words.", "p_lighting": "What is the lighting like? E.g.: Natural, Low, Soft, Harsh.", "p_angle": "What angle is the picture taken from? E.g.: Straight on, Low, High.", "p_dof": "What depth of field is used? E.g.: shallow depth of field, deep depth of field." }
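A rough sketch of the "combine it to the caption" step, assuming the captioning model has already returned one short answer per question key. The `answers` values below are invented placeholders and `build_caption` is a hypothetical helper; the actual script was not shared.

```python
from typing import Dict, List


def build_caption(answers: Dict[str, str], order: List[str]) -> str:
    """Join non-empty answers into one comma-separated caption, in a fixed order."""
    parts = []
    for key in order:
        text = answers.get(key, "").strip().rstrip(".")
        if text:  # skip questions the model left unanswered
            parts.append(text)
    return ", ".join(parts)


# Placeholder answers keyed like the questions dict above.
answers = {
    "p_subject": "A man standing on a pier.",
    "p_mood": "",
    "p_framing": "Wide",
    "p_lighting": "Low",
}
caption = build_caption(answers, ["p_subject", "p_mood", "p_framing", "p_lighting"])
print(caption)
```

Keeping the key order fixed gives every caption the same layout, which may be part of why runs on differently sourced datasets end up with similarly formatted captions.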

latent charm Aug 17, 2023, 6:46 AM

#

It should work better for fine tuning. It is unnecessary for lora

stone garden Aug 17, 2023, 6:52 AM

#

normal ember I feed it a data structure with all the questions and then I combine it to the c...

Replying to bookmark this and modify it for my purposes.

hollow spruce Aug 17, 2023, 6:54 AM

#

covert pagoda What are your thoughts on finding the right scheduler for a given dataset? Are y...

funnily enough, I don't use the graphs at all.
I usually just rigorously test my checkpoints, as the loss values can appear perfect, yet upon real life testing - it turns out only 3/5 concepts were learned.

Tracking loss is great to ensure that your settings don't have some massive error, or that the model didn't implode on itself. But other than that, it's not really worth it for me, as I have some loras that work great, despite less than ideal loss values.

I usually stick with AdamW or Adamw8bit + constant. For anything that isn't faces or anatomy - this will give so close to 'perfect' results, that it wasn't worth trying with other settings for me.

#

While I'm confident, that cosine with restarts, or cosine with annealing restarts can be used to get even a bit higher quality, on harder concepts such as faces/anatomy - getting that setting just right for your specific dataset + tagging is hard enough that I find it hard to recommend.

#

Prodigy can be good - as you can use it as a set & forget optimizer. One training setting fits all - literally. It won't give you perfect results, but it gets you 80% of the way, 80% of the time. Regardless of how complex your dataset is

stone garden Aug 17, 2023, 6:58 AM

#

hollow spruce Prodigy can be good - as you can use it as a set & forget scheduler. One trainin...

I’ve found with prodigy that min snr gamma has quite a pronounced effect. Setting it low for complex stuff and high for simple stuff helps quite a bit.

hollow spruce Aug 17, 2023, 6:58 AM

#

so I've seen it used successfully with genuinely insane datasets of like 30k images

hollow spruce Aug 17, 2023, 6:58 AM

#

stone garden I’ve found with prodigy that min snr gamma has quite a pronounced effect. Settin...

now if only min snr didn't kill contrast 🥲

stone garden Aug 17, 2023, 7:01 AM

#

hollow spruce now if only min snr didn't kill contrast 🥲

Eh? Min snr gamma only smooths the loss so the optimiser or LR scheduler have a better chance of optimising the thing. I saw that in the code, but didn’t check if it does anything to the image before it’s fed into the model. Otherwise with a model as complex as SD it’s sure to go crazy loss-wise.

#

I shall remember to check that.
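For reference, the Min-SNR weighting idea as it is usually described for noise-prediction training scales each timestep's loss by min(SNR, gamma) / SNR, so it acts on the loss and not on the image. The sketch below shows only that formula; `min_snr_weight` and the sample SNR values are illustrative, and whether a given trainer implements it exactly this way is an assumption.

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Per-timestep loss weight: min(SNR, gamma) / SNR."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    return min(snr, gamma) / snr


# Low-noise timesteps (high SNR) get down-weighted; noisy ones keep weight 1.
for snr in [0.1, 1.0, 5.0, 20.0, 100.0]:
    print(snr, min_snr_weight(snr, gamma=5.0))
```

A lower gamma clamps more timesteps, and a higher gamma leaves more of them at full weight, which fits the "low for complex, high for simple" tuning mentioned above being a matter of how strongly the easy timesteps are suppressed.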

hollow spruce Aug 17, 2023, 7:03 AM

#

stone garden Eh? Min snr gamma only smooths the loss so the optimiser or LR scheduler have a ...

it's a ?bug?. I mean, I'm not sure if it's a bug, since it's more about how the base sdxl model itself was trained than anything else. But if you finetune sdxl enough, especially using bright images, you'll notice that the finetune always tries to converge on 50% grey: first backgrounds turn greyer, then blacks turn less black and whites turn less white.
Some settings can speed up that effect - the biggest offenders I've found are offset noise + min snr

stone garden Aug 17, 2023, 7:05 AM

#

Interesting. I shall check what they do in the code in more detail. Noise offset moved some things between the up and down pass of the Lora so that the scale was okay from what I saw, but I didn’t look in too much detail.

hollow spruce Aug 17, 2023, 7:06 AM

#

since sdxl base model was trained with offset noise (and not just one consistent value either, but rather various levels of offset noise, at various intervals)
our running theory is that we can't exactly match that offset noise - hence this is the effect we get

#

it's even worse on full finetuning

stone garden Aug 17, 2023, 7:07 AM

#

@hollow spruce I hope you don’t mind me asking, does the loss of your LoRA always go down? Mine fluctuates a bit between 0.68 and 0.72 for example.

hollow spruce Aug 17, 2023, 7:07 AM

#

but it's nothing about the code - it's 100% implemented correctly. It's just what the finetune is learning.
(SAI themselves don't experience this though, fyi - so their finetuning workflow, using custom scripts is immune to this issue)

stone garden Aug 17, 2023, 7:08 AM

#

Oh not saying it’s implemented wrong, just saying I probably don’t understand it fully or have seen it enough yet.

normal ember Aug 17, 2023, 7:09 AM

#

stone garden I had to go through every single image caption personally anyway. It provides a ...

There are so many models to pick. I'd like to test openflamingo/OpenFlamingo-9B-vitl-mpt7b too. Wish I could clone myself. 😄

hollow spruce Aug 17, 2023, 7:10 AM

#

stone garden @hollow spruce I hope you don’t mind me asking, does the loss of your LoR...

yep. number go down - unless I reach dangerous levels of overfitting.
I usually hover around 0.1 loss in the beginning

stone garden Aug 17, 2023, 7:10 AM

#

Means it’s something in my settings probably. I have a big enough dataset I’m confident I don’t overfit.

#

But I still have to prune and caption it a bit more.

hollow spruce Aug 17, 2023, 7:11 AM

#

keep in mind, invisible things can also overfit XD I've trained more than one "noise" lora by now, on accident

stone garden Aug 17, 2023, 7:12 AM

#

Yeah. If the background is too similar or the subjects are always wearing the same clothes or stuff like that.

stone garden Aug 17, 2023, 7:13 AM

#

normal ember There are so many models to pick. I'd like to test openflamingo/OpenFlamingo-9B-...

Do you have a script you use by any chance?

hollow spruce Aug 17, 2023, 7:13 AM

#

stone garden Yeah. If the background is too similar or the subjects are always wearing the sa...

yeah - the more concepts you train at the same time, the less you can rely on loss values, as they represent your training as a whole - but often it's just one thing going super wrong, which needs to be adjusted for the next training attempt

stone garden Aug 17, 2023, 7:14 AM

#

hollow spruce yeah - the more concepts you train at the same time, the less you can rely on lo...

By feeding back the thing or by adjusting the captions I assume?

hollow spruce Aug 17, 2023, 7:16 AM

#

stone garden By feeding back the thing or by adjusting the captions I assume?

I'm trying to do faces + anatomy + clothing in one single lora. all of them get learned at completely different rates though, so this is what my dataset distribution looks like XD
clothing,full body : anatomy : faces : teeth
1 : 6 : 6 : 30

#

that way, nothing overfits, and everything gets trained equally

stone garden Aug 17, 2023, 7:17 AM

#

Super helpful. Thanks.

hollow spruce Aug 17, 2023, 7:17 AM

#

I would really really recommend training faces + anything else via 2 different loras though. saves you soooo much trouble

#

for me, I can't do that since I'm doing a finetune style lora with (currently) 4k images, which is just a literal finetune like experience, rather than teaching a specific concept.

stone garden Aug 17, 2023, 7:21 AM

#

hollow spruce I'm trying to do faces + anatomy + clothing in one single lora. all of them get ...

Do you crop as well?

#

And upscale?

normal ember Aug 17, 2023, 8:01 AM

#

stone garden Do you have a script you use by any chance?

Not at this moment, it's not releasable. 😄

hollow spruce Aug 17, 2023, 8:15 AM

#

stone garden And upscale?

using all high quality source images, with 0 upscale needed was the biggest improvement so far. Really helps with keeping fine detail consistent across all images generated.
My current dataset is manually edited - so all images are cropped to my ideal standards (2:3, 4:5, 1:1, 16:9, etc.)

but when you're training just a single concept (emphasis on concept, not style), then 1:1 crop will give the same results as using completely mixed buckets.
If you only use one non 1:1 resolution, like for example 2:3, then expect your lora to perform better when generating images at that exact aspect ratio, and somewhat worse at all other aspect ratios.

If you're doing a style lora, then just use the images in various aspect ratios, and maybe crop a few so you can also have some 1:1 examples in there - in case there were none
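To check how a dataset is spread across the crop ratios mentioned above, something like the following sketch could be used. It assigns an image to the closest ratio on a log scale; the `BUCKETS` table and `nearest_bucket` are illustrative assumptions and not the trainer's actual bucketing logic, which works on resolution steps.

```python
import math
from typing import Dict

# Width / height for the crop ratios named above.
BUCKETS: Dict[str, float] = {"2:3": 2 / 3, "4:5": 4 / 5, "1:1": 1.0, "16:9": 16 / 9}


def nearest_bucket(width: int, height: int,
                   buckets: Dict[str, float] = BUCKETS) -> str:
    """Name of the aspect ratio closest to width/height, compared on a log scale."""
    ratio = width / height
    return min(buckets, key=lambda name: abs(math.log(ratio / buckets[name])))


for size in [(1024, 1536), (1000, 1000), (1920, 1080), (832, 1040)]:
    print(size, nearest_bucket(*size))
```

Counting the results over a whole folder shows quickly whether one ratio dominates, which is the situation where a lora tends to favour that ratio at generation time.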

#

fyi, this can also be misused to essentially make an aspect ratio lora, to give more consistent results with 1:4 or 4:1 aspect ratios (which usually only work around half the time), if all your source images are 2048x512

normal ember Aug 17, 2023, 8:49 AM

#

Base vs LoRA. Not sure what to think about it.

covert pagoda Aug 17, 2023, 9:04 AM

#

hollow spruce funnily enough, I don't use the graphs at all. I usually just rigorously test my...

I love seeing different approaches by different camps. Thanks for sharing your experience

#

@hollow spruce what do you use AdamW for? Usually more the 8bit variant? I'm using an A100 on runpod, but wondering on what scale are the benefits of the added precision if needed

tall condor Aug 17, 2023, 10:23 AM

#

Hi guys, I'm trying to train body parts ("Fingers"), so I made a library of around 300 images of hands and fingers. I'm creating a lora from that, which generally works ok, but my issue is that as soon as I apply the lora the resulting images get very narrow

#

is there a tag / way i can use to basically mark it as bodyparts and avoid that

normal ember Aug 17, 2023, 11:37 AM

#

I can't believe it, can it turn out this good on the first try? Here are the images with prompts. https://imgur.com/a/XPXrkKg

Imgur

Custom LoRA

#

Loss from tensorboard. I picked the last epoch.

sonic narwhal Aug 17, 2023, 12:35 PM

#

normal ember There are so many models to pick. I'd like to test openflamingo/OpenFlamingo-9B-...

Open flamingo is best captioner I have tried so far

sonic narwhal Aug 17, 2023, 12:45 PM

#

normal ember I can't believe it, can it turn out this good on the first try? Here's images wit...

can u send the json file?

signal warren Aug 17, 2023, 1:05 PM

#

normal ember I can't believe it, can it turn out this good on the first try? Here's images wit...

Very impressive!

stone garden Aug 17, 2023, 1:11 PM

#

anyone had size mismatch errors with training sdxl ?

hollow spruce Aug 17, 2023, 2:39 PM

#

stone garden anyone had size mismatch errors with training sdxl ?

you trying to merge with base?

hollow spruce Aug 17, 2023, 2:41 PM

#

covert pagoda @hollow spruce what do you use AdamW for? Usually more the 8bit variant? ...

I was told AdamW + full bf16 training is better than AdamW8bit. They need similar vram, so I've just been going with it 🤷‍♂️ if nothing else, it doesn't make the training worse.

#

(would probably be worth running a few tests to compare them with the same dataset + settings, and see if there's a difference)

normal ember Aug 17, 2023, 2:43 PM

#

hollow spruce I was told AdamW + full bf16 training is better than AdamW8bit. They need simila...

Do you know if this is still true?
`This option enables the full bfloat16 training (includes gradients). This option is useful to reduce the GPU memory usage. However, bitsandbytes==0.35 doesn't seem to support this. Please use a newer version of bitsandbytes or another optimizer. I cannot find bitsandbytes>0.35.0 that works correctly on Windows.`

stone garden Aug 17, 2023, 2:45 PM

#

hollow spruce you trying to merge with base?

no im just trying to train it with kohya

normal ember Aug 17, 2023, 2:47 PM

#

sonic narwhal can u send the json file?

Which json are you looking for? I've got plenty to choose from. 😄

stone garden Aug 17, 2023, 2:50 PM

#

normal ember Base vs LoRA. Not sure what to think about it.

are you training a face?

normal ember Aug 17, 2023, 2:50 PM

#

--full_bf16 does not bomb on me with --sdpa. Use that instead of --xformers.

normal ember Aug 17, 2023, 2:51 PM

#

stone garden are you training a face?

More a style, textures and features.

#

DoF

stone garden Aug 17, 2023, 2:52 PM

#

cool that just looks like that street photography i was going for

normal ember Aug 17, 2023, 2:53 PM

#

I'm running another dataset now that I've made today, 225 images total.

stone garden Aug 17, 2023, 2:54 PM

#

how do you train it?

#

i was trying to but it kept failing on me

normal ember Aug 17, 2023, 2:57 PM

#

With kohya_ss using sdxl_train_network.py. I took some inspiration from Caith's configs but removed some for the defaults and changed some based on this: https://hoshikat-hatenablog-com.translate.goog/entry/2023/05/26/223229?_x_tr_sl=sv&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp

Hatena Blog

hoshikat

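Stable Diffusion for Everyone: A Thorough Explanation of LoRA Training Settings Using Kohya_ss - A Blog for Getting Friendly with AI

In the previous article, I explained how to set up "kohya_ss", a WebUI environment for additional training of Stable Diffusion models. This time, I give a rough explanation of how LoRA works and then go through the LoRA training settings in kohya_ss. Note: this article is very long! It only explains "what each setting means". It does not cover "how to prepare training images", "how to caption images", or "how to run the training". I plan to explain how to run training in a separate article. Understanding how LoRA works: what is a "model"? LoRA adds a small neural net …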

#

This series is good too, it's three parts. https://medium.com/@dreamsarereal/understanding-lora-training-part-1-learning-rate-schedulers-network-dimension-and-alpha-c88a8658beb7

Medium

Understanding LoRA Training, Part 1: Learning Rate Schedulers, Netw...

A guide for intermediate level kohya-ss scripts users looking to take their training to the next level.

hollow spruce Aug 17, 2023, 3:29 PM

#

normal ember Do you know if this is still true? `This option enables the full bfloat16 traini...

was about to say kohya gui now includes a compatible bitsandbytes XD but wth?

#

so. yeah.
It works? 😕

#

why does it work 🤣

normal ember Aug 17, 2023, 4:05 PM

#

hollow spruce why does it work 🤣

Seems to work yes and uses like 3.5G less VRAM than without, but I don’t use xformers. Might be why it’s working.

sonic narwhal Aug 17, 2023, 4:55 PM

#

normal ember Which json are you looking for? I've got plenty to choose from. 😄

The one u used for that training? I guess it was movie style training?

normal ember Aug 17, 2023, 5:56 PM

#

sonic narwhal The one u used for that training? I guess it was movie style training?

The dataset was prepared from a movie yes.

signal warren Aug 17, 2023, 6:37 PM

#

I think he just wants the json file to have an idea what settings, learning rate etc you used lol

#

Up to you though!

normal ember Aug 17, 2023, 6:47 PM

#

I see, no json sorry. Custom script but here's probably the important stuff. That did not turn out good. I can see what I can come up with. 😄 Here: https://gist.github.com/twri/1166fd65f30cea4c53d0c16ae0ee4f26

#

But there's probably better settings so take it with a big grain of salt

signal warren Aug 17, 2023, 8:11 PM

#

normal ember I see, no json sorry. Custom script but here's probably the important stuff. Tha...

Cheers love, very much appreciated, I hope that person is happy now.

sonic narwhal Aug 17, 2023, 8:15 PM

#

normal ember I see, no json sorry. Custom script but here's probably the important stuff. Tha...

Thanks, how many total images and repeats?

normal ember Aug 17, 2023, 8:16 PM

#

sonic narwhal Thanks, how many total images and repeats?

230 images, 10 repeats but I don't think it matters since I have no reg images. Just one folder.

#

Doing another run on another dataset with about the same number of images and I can't understand why I get about the same loss? Only thing I can think of is the captioning is similar format.

#

Ok, not the same but similar curve.

open merlin Aug 17, 2023, 9:19 PM

#

Yea, loss curves tend to flatten out relatively quickly on average, I've noticed. You should probably check on your training data. And maybe accumulate the gradients so you can simulate a larger batch size. Up the learning rate too then. Then use cosine with restarts (like 2 or 3 cycles). That seemed to work for me when I trained my last lora. Also just use repeat 1 and up the number of epochs, so you can save intermediate results.
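Two of the suggestions above can be sketched in a few lines: the effective batch size when gradients are accumulated over several steps, and a cosine schedule that restarts a fixed number of times. Function and parameter names are illustrative, and the schedule is the common hard-restart formulation without warmup, which is an assumption about what "cosine with restarts" means in a given trainer.

```python
import math


def effective_batch_size(train_batch_size: int, gradient_accumulation_steps: int) -> int:
    """Images contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps


def cosine_with_restarts(step: int, total_steps: int, base_lr: float,
                         num_cycles: int = 3) -> float:
    """Learning rate at `step` for a cosine schedule with hard restarts."""
    if step >= total_steps:
        return 0.0
    # Position inside the current cycle, in [0, 1); integer math avoids drift.
    cycle_pos = ((step * num_cycles) % total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))


print(effective_batch_size(2, 4))
for step in [0, 50, 99, 100, 150, 299]:
    print(step, cosine_with_restarts(step, total_steps=300, base_lr=1e-4))
```

With 300 steps and 3 cycles the rate falls towards zero just before steps 100 and 200 and jumps back to the base value at each restart, which lines up well with saving an intermediate result at the end of each cycle.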

ruby pond Aug 18, 2023, 12:39 AM

#

Is ~5.7s/it about normal for batch size 8 lora training on a 4090?

torn spade Aug 18, 2023, 2:16 AM

#

hey yall - having some issues with a lora im currently making right now. I'm trying to train on the bberny belt/skirt by diesel, and was wondering if my image selection is a little poor.
would appreciate if yall have any feedback on image selection and labeling
https://github.com/matthew2k/bberny_lora

young crater Aug 18, 2023, 2:51 AM

#

torn spade hey yall - having some issues with a lora im currently making right now. I'm tra...

I am by no means good at lora training, but I noticed that I got the best results when I could easily describe everything in the scene I didn't want to be trained.

For instance, Image 15 in your set, I have no idea what the top is or how to describe it and I suspect the ai would assume that's part of what you want from your training data (also the skirts being covered by the top, which will hinder training). Image 18, in comparison, is probably a solid image as the ai knows what a Shiny Jacket is and a person standing in a white photo studio.

**As for the caption, for image 18: **

mini skirt, a fashion photo of a woman standing, wearing a silver metallic jacket, high heels, holding a purse, black hair, brown skin, white photo studio background

This gives a keyword the ai already knows (mini skirt), a simple description of the subject, a description of each part of the subject you dont want the lora to remember, a description of the background.

**For Image 19: **

mini skirt, close up photo of a torso, brown skin, white background

Should be enough to get the point across. In both of these cases, I believe it is important to specify skin tone or else the LORA will err towards whatever skin tone is most prevalent in your training data.
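The caption recipe above (known keyword first, then the subject, then every feature the lora should not absorb, then the background) could be sketched as a small helper. `build_caption` is a hypothetical name, not part of any training tool.

```python
def build_caption(keyword, subject, other_features, background):
    """Join caption parts in a fixed order, skipping empty entries."""
    parts = [keyword, subject, *other_features, background]
    return ", ".join(p.strip() for p in parts if p and p.strip())

# Image 19 from the example: a close-up with nothing else to describe.
caption_19 = build_caption(
    "mini skirt",
    "close up photo of a torso",
    ["brown skin"],
    "white background",
)
```

Keeping the order fixed across the dataset makes it easier to spot images that need many extra words, which are usually the ones worth dropping.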

Image 24 looks too compressed. Images 27-30 as well. Since SDXL training is 1kx1k, any visible compression will leak into the LORA. I've found even high res photos downscaled to 1080p do better than still frames of 4k movies in terms of compression/sharpness. It looks like, in your outputs, the image compression/resolution is baking into the LORA. Also a single bad image can screw up a training. Err towards fewer images of higher quality rather than more of lesser quality.

torn spade Aug 18, 2023, 3:48 AM

#

young crater I am by no means good at lora training, but I noticed that I got the best result...

super helpful! i think this confirms what i was thinking, which is that image selection rather than parameter tuning seems to be the bigger culprit.

torn spade Aug 18, 2023, 3:49 AM

#

young crater I am by no means good at lora training, but I noticed that I got the best result...

as for labeling, is trying to label literally everything thats not ur Lora the best practice?

torn spade Aug 18, 2023, 4:04 AM

#

also i use this site to see how labeling is sort of interpreted by sd1.5 https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/images?_search=distorted&_sort=rowid

young crater Aug 18, 2023, 4:29 AM

#

torn spade as for labeling, is trying to label literally everything thats not ur Lora the b...

Each tag in your caption is another tag being trained. You just want to make sure the ai understands that the tag you want, belt skirt, is only referring to the belt skirt she is wearing and not the other parts. That is much easier to do with simple, easy to comprehend images. That way you can use fewer words to describe more.

#

But its not really describing what you dont want, just describing each feature of the image

latent charm Aug 18, 2023, 6:29 AM

#

I trained two test loras on the peace sign hand. The current issue is that nails are usually drawn wrong and the thumb gets connected to the ring finger. Where could I find high quality training images? I think it should perform better if the training set improves. @hollow spruce Do you have any suggestions?

steady jackal Aug 18, 2023, 10:04 AM

#

Hi guys, i was wondering if its possible to train a lora from an existing .safetensors file? Ive been looking all over but cant find a clear answer or way to do it.. From what i see i can do it from some webuis but i cant find how to do it in python. Do any of you maybe know of a guide or something? When i google it i get results where you train from base and end up with a safetensor instead of start with one.

latent charm Aug 18, 2023, 10:20 AM

#

steady jackal Hi guys, i was wondering if its possible to train a lora from an existing .safet...

The LECO concept might be what you're looking for. It uses an existing model and prompts to train a lora.

#

The original LECO git doesn't support SDXL yet. Someone modified it and published it on civitai. He was also trying to improve it to allow fancy training, like generating with model A and training on model B, etc.

steady jackal Aug 18, 2023, 10:29 AM

#

Looks like that does what i want, but it seems to be mainly focussed on erasing concepts?

#

I dont really see an example for training on images. only prompts to remove from the model.

latent charm Aug 18, 2023, 10:38 AM

#

steady jackal Looks like that does what i want, but it seems to be mainly focussed on erasing ...

The original paper is about erasing concepts. But people modified it to train a lora using prompts and a model

#

The idea is to use prompt to generate "images" from model and train it

steady jackal Aug 18, 2023, 10:41 AM

#

Do you have a github link or something for me of one of these modifications? I'm currently looking at https://github.com/p1atdev/LECO but this might not be the right one?

latent charm Aug 18, 2023, 10:43 AM

#

That is the one which doesn't support sdxl yet

stiff dust Aug 18, 2023, 10:44 AM

#

steady jackal Hi guys, i was wondering if its possible to train a lora from an existing .safet...

what do you mean with "existing safetensor" file?

steady jackal Aug 18, 2023, 10:44 AM

#

stiff dust what do you mean with "existing safetensor" file?

As in a civitai checkpoint for example

stiff dust Aug 18, 2023, 10:44 AM

#

safetensor is just a file format. Do you mean: train a lora from an existing lora? Or training a lora from another sdxl checkpoint?

#

but what is the issue then? In kohya_ss you give the safetensor file of the model as parameter

#

--pretrained_model_name_or_path="/path/to/sdxl/model.safetensors"

latent charm Aug 18, 2023, 10:45 AM

#

steady jackal Do you have a github link or something for me of one of these modifications? I'm...

https://civitai.com/articles/1766?highlight=106995#comments

Civitai | 【GoodBye,AllQualityWords】bdsqlsz LoRA training Advanced T...

steady jackal Aug 18, 2023, 10:47 AM

#

stiff dust but what is the issue then? In kohya_ss you give the safetensor file of the mode...

There is no issue, im asking for a guide on how to do it in python. I have not heard of kohya_ss.

stiff dust Aug 18, 2023, 10:47 AM

#

ah, okay. I mean you can use diffusers if you want to write the code yourself

steady jackal Aug 18, 2023, 10:48 AM

#

i have tried but i dont think this accepts .safetensors format?

stiff dust Aug 18, 2023, 10:48 AM

#

otherwise kohya sd-scripts is a nice collection of python scripts for lora training

latent charm Aug 18, 2023, 10:48 AM

#

For general lora training, just using kohya ss is enough

steady jackal Aug 18, 2023, 10:48 AM

#

stiff dust otherwise kohya sd-scripts is a nice collection of python scripts for lora train...

awesome, i will have a look.

stiff dust Aug 18, 2023, 10:48 AM

#

steady jackal i have tried but i dont think this accepts .safetensors format?

as said, safetensor is just a format. But diffusers usually wants their checkpoints in their own format, but there are conversion scripts I think

#

https://github.com/kohya-ss/sd-scripts

steady jackal Aug 18, 2023, 10:49 AM

#

stiff dust as said, safetensor is just a format. But diffusers usually wants their checkpoi...

i'm assuming that would be .ckpt format?

stiff dust Aug 18, 2023, 10:49 AM

#

.ckpt or .safetensors doesn't matter. It's how the model is stored within these formats

#

models are usually stored as python dictionaries of key -> tensor. .ckpt is just a pickle of these, .safetensors is a more restricted serialization routine

steady jackal Aug 18, 2023, 10:50 AM

#

stiff dust .ckpt or .safetensors doesn't matter. It's how the model is stored within these ...

when i tried putting a .safetensors file in diffusers i got an error that the string i input wasnt a folder or something like this

#

it didnt accept the .safetensors etc.

stiff dust Aug 18, 2023, 10:50 AM

#

the problem is that diffusers has different key names than auto111/kohya/sai
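A possible way around this in plain Python, as a sketch: `from_pretrained` expects a diffusers-format folder or a hub repo id, which would explain the "not a folder" error, while recent diffusers releases also ship `from_single_file` for single-file checkpoints. This assumes a diffusers version new enough to support SDXL there; `load_sdxl_single_file` is just a wrapper name made up for this example.

```python
from pathlib import Path

def load_sdxl_single_file(checkpoint_path):
    """Load an A1111/kohya-style single-file SDXL checkpoint into a diffusers pipeline."""
    path = Path(checkpoint_path)
    if path.suffix not in {".safetensors", ".ckpt"}:
        raise ValueError(f"expected a .safetensors or .ckpt file, got: {path.name}")
    # Imported lazily so the path check works even without diffusers installed.
    from diffusers import StableDiffusionXLPipeline
    # from_single_file remaps the original key names to the diffusers layout.
    return StableDiffusionXLPipeline.from_single_file(str(path))
```

Usage would be something like `pipe = load_sdxl_single_file("/path/to/sdxl/model.safetensors")`, after which `pipe.unet` and the text encoders are ordinary diffusers modules.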

steady jackal Aug 18, 2023, 10:50 AM

#

i will for sure have a look at kohya

#

i wanted to use diffusers but it got really confusing as it just refused to accept the format.

stiff dust Aug 18, 2023, 10:52 AM

#

otherwise try conversion scripts, like here: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py

steady jackal Aug 18, 2023, 10:52 AM

#

might have been doing something wrong, but also couldn't find a guide on it at all

steady jackal Aug 18, 2023, 10:52 AM

#

stiff dust otherwise try conversion scripts, like here: https://github.com/huggingface/diff...

ah thats great, i will have a look

stiff dust Aug 18, 2023, 10:53 AM

#

I just don't know if they already work for sdxl

steady jackal Aug 18, 2023, 10:54 AM

#

its fine if it doesnt, at least then i can try to get the hang of it and start learning 🙂

#

thanks a lot both of you

stone garden Aug 18, 2023, 11:00 AM

#

kohya script train network keeps crashing after trying to load the network in UNet2DConditionModel print

#

Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight",

#

any suggestion?

hollow spruce Aug 18, 2023, 1:17 PM

#

latent charm I had two test lora on peace sign hand. The current issue is that nail usually d...

ah yes. anatomy training XD
https://unsplash.com/s/photos/peace-sign
there's a good amount there, if you don't already have them in your dataset

#

fun fact. I'm also doing anatomy training right now

#

plz send help. RTX4090 isn't fast enough 🥲

hollow spruce Aug 18, 2023, 1:20 PM

#

latent charm I had two test lora on peace sign hand. The current issue is that nail usually d...

there's also flickr, for more "human" photos, that don't look like photoshoots, but you'll have to filter through A LOT of bad images first XD
https://flickr.com/search/?text=peace sign

latent charm Aug 18, 2023, 1:24 PM

#

Thanks for sharing. I gathered my training set from pexels. Most of them are jpg and I got the jpg effect burned into the lora.🤣

hollow spruce Aug 18, 2023, 1:25 PM

#

latent charm Thanks for share. I gather my training set from pexels. Most of them are jpg and...

rip ❤️
one of my trigger words also learned that phone compression from the selfie camera on phones 🥲

#

too many selfies in my dataset 🤣

latent charm Aug 18, 2023, 1:26 PM

#

I think the lora learned the general shape of the pose but failed in detail. Do you have advice for that?

hollow spruce Aug 18, 2023, 1:27 PM

#

latent charm I think the lora learned the general shape of the pose but failed in detail. Do ...

either a big enough dataset, or go all in on overfitting, and train for about 300~600% too long.
(also training rate scheduler)

#

probably a valid reason to switch to cosine. Though I have no advice at what rate to start at

latent charm Aug 18, 2023, 1:29 PM

#

How big is enough? My dataset only has 26 for now.

hollow spruce Aug 18, 2023, 1:30 PM

#

if they are similar enough, then 30~50 for overfitting. 100~200 should allow you to get away with only a small amount of overfitting
200~500 for "flexible". <- but that's not really needed, as you probably always want a peace sign if you load your lora. so just do it via overfitting.

#

general rule of thumb, is 10x the dataset for a flexible lora, and 100x for full finetuning

#

I currently have 200~250 images per body part I'm training x_x which is why its training forever 😭

latent charm Aug 18, 2023, 1:33 PM

#

I would try 50 first then continue gather more.

west plank Aug 18, 2023, 3:41 PM

#

I just used Kohya to make my first LoRA and it was surprisingly easy. Made a lot of first time mistakes but it still turned out super functional. Used way too large of a data set, like 2000 images. I don’t need that much >.>

sullen locust Aug 18, 2023, 5:41 PM

#

Hello, I am using stable diffusion xl and I am trying to create my avatar with lora training via kohya_ss, and I am getting an error (attached). Please help me. Thanks

#

young crater Aug 18, 2023, 6:08 PM

#

sullen locust

what does your folder layout look like inside of /avatar/img/

young crater Aug 18, 2023, 6:11 PM

#

sullen locust

it looks like kohya is unable to find your training images, they should be setup as:

/[project]/img/[repeats]_[keyword]/[images]
for example:
/avatar/img/1_man/image_01.png
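A quick way to check a layout like this before starting a run, as a sketch. It assumes the `<number>_<keyword>` folder convention shown above, with the leading number read as the repeat count; `scan_img_root` and the extension list are made up for the example.

```python
import re
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
FOLDER_RE = re.compile(r"^(\d+)_(.+)$")

def scan_img_root(img_root):
    """Return (number, keyword, image_count) for each '<number>_<keyword>' subfolder."""
    found = []
    for sub in sorted(Path(img_root).iterdir()):
        match = FOLDER_RE.match(sub.name)
        if match is None or not sub.is_dir():
            continue  # loose files and wrongly named folders are skipped
        n_images = sum(1 for f in sub.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        found.append((int(match.group(1)), match.group(2), n_images))
    return found
```

An empty result for `scan_img_root("/avatar/img")` would mean the images are sitting directly in `img/` or in a folder without the number prefix, which matches the "can't find training images" symptom.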

brittle plaza Aug 18, 2023, 11:19 PM

#

hey there! curious if anyone has had issues with distortions with training lora's on people. here's an example of a lora i trained recently (V* shaking hands with natalie portman) but there's some pretty significant facial distortions.

I used kohya ss with ~15 images, celeb token, no captions, no regularization. could this be due to lack of regularization?

stone garden Aug 19, 2023, 6:06 AM

#

Does anyone have guides for fixing teeth if one doesn't have close-ups of the subject's teeth? Something like an embedding?

stone garden Aug 19, 2023, 11:40 AM

#

bitsandbytes says no gpu support on amd anyone faced this?

normal ember Aug 19, 2023, 12:15 PM

#

brittle plaza hey there! curious if anyone has had issues with distortions with training lora'...

I don’t have a good answer to this but eyes in profile are always tricky to get right even on the base model.

stiff dust Aug 19, 2023, 12:40 PM

#

brittle plaza hey there! curious if anyone has had issues with distortions with training lora'...

I would say its just too low resolution. SD is not good for small faces. Upscale the image and redraw the faces

normal ember Aug 19, 2023, 1:02 PM

#

Yep, few passes of base on it after upscale can fix it.

hollow spruce Aug 19, 2023, 1:32 PM

#

hollow spruce

40hours later on my rtx4090. 1.4k image dataset - 2x 40epoch runs with constant (5e-4) vs cosine with restarts (1e-3 + restart every 5 epochs)
constant with significant warmup, won hands down. Lora is working well enough after 12 hours. This is essentially a finetune when it comes to body anatomy. But realistically seen it would have to run for about 60 more hours until it achieves perfection 🥲
cosine with restarts was my attempt to speed this up, but after 12h of training, it's already worse than the other lora, rather than better. While it might just converge at a later point, it definitely defeats the point of saving time 🤷‍♂️

no dropout, no offset noise, no min snr gamma - since I didn't want to damage the base sdxl capabilities.

Results? Near perfect with constant. Model works almost identical to base, backgrounds aren't influenced at all, but everything about anatomy is now working on the first attempt. I'd show results but for nsfw reasons this obviously isn't an option XD
I'll probably have to move to a A100 stack once I combine this with my master dataset 🥲

But yeah, if anyone wants to train hands/feet/different body shapes/nsfw/skin detail - feel free to hit me up in a dm. I now have well working settings, which don't rely on good captions - but oh god is it training slow as hell.
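For comparison with the restart schedule, a sketch of what "constant with warmup" means: a linear ramp up to the target rate, then flat. The 5e-4 rate is the one quoted above; the warmup length is a placeholder, since the actual value isn't given here.

```python
def constant_with_warmup(step, base_lr, warmup_steps):
    """Linear ramp from 0 to base_lr over warmup_steps, then hold base_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

# 5e-4 target rate as in the run above; 100 warmup steps is an assumed value.
lrs = [constant_with_warmup(s, 5e-4, 100) for s in (0, 50, 100, 10_000)]
```

Unlike cosine with restarts, the rate never drops again after warmup, so every later step trains at the same strength.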

hazy schooner Aug 19, 2023, 3:10 PM

#

jade hornet Aug 19, 2023, 5:27 PM

#

stone garden bitsandbytes says no gpu support on amd anyone faced this?

You need a rocm version, let me find a link

jade hornet Aug 19, 2023, 5:28 PM

#

stone garden bitsandbytes says no gpu support on amd anyone faced this?

https://github.com/Titaniumtown/bitsandbytes-rocm/tree/patch-2

#

That's the one I use

#

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++ DEBUG INFORMATION +++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Running a quick check that:
    + library is importable
    + CUDA function is callable

SUCCESS!
Installation was successful!

#

that will show you if it's installed correctly

tepid sundial Aug 19, 2023, 8:20 PM

#

hollow spruce 40hours later on my rtx4090. 1.4k image dataset - 2x 40epoch runs with constant ...

Have you shared your training script anywhere?

hollow spruce Aug 19, 2023, 8:35 PM

#

tepid sundial Have you shared your training script anywhere?

epochs / warmup are pretty specific to the dataset size
basically the bigger the dataset, the longer it needs to run. 1400 images would be between 40~200 epochs. which takes 12~60 hours on a RTX4090

batch size needs to be adjusted to whatever you can run. (12 is the absolute max on a 4090)
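The numbers above can be turned into a back-of-the-envelope estimate. This sketch simply scales the reported reference point (1400 images, 40 epochs, about 12 hours on a 4090) linearly by images times epochs; that linearity is an assumption and ignores batch size, resolution and everything else.

```python
def estimate_hours(num_images, epochs, ref_images=1400, ref_epochs=40, ref_hours=12.0):
    """Scale the reported reference run linearly by images x epochs."""
    return ref_hours * (num_images * epochs) / (ref_images * ref_epochs)

low = estimate_hours(1400, 40)    # the reported 12 hour end of the range
high = estimate_hours(1400, 200)  # the reported 60 hour end of the range
```

Treat the result as an order-of-magnitude guide only; a different card or batch size will shift it.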

really not the fastest option, not by far, but it works well for anatomy specifically

📎 anatomy_v0.1_20230818-050922_-_Copy.json

tepid sundial Aug 19, 2023, 8:37 PM

#

I was more so asking about the code for the training loop you use. Do you write your own or use one that's available?

hollow spruce Aug 19, 2023, 8:38 PM

#

ah, yeah. I use standard kohya gui - as I'm still writing a guide

tepid sundial Aug 19, 2023, 8:38 PM

#

Ah, I see

hollow spruce Aug 19, 2023, 8:38 PM

#

hard to make a good guide with all custom code XD

tepid sundial Aug 19, 2023, 8:40 PM

#

Hmm, perhaps 🤷‍♂️
A well annotated notebook is something the community would benefit from I guess. There are many training scripts out there right now, and the one that was recently merged into diffusers to train SDXL for txt2img has major warnings that results aren't good, and would require heavy hyperparam search. Most scripts out there are fairly similar, but differ on many small (but maybe important) details. Like offset, min-snr, terminal snr, etc.

It just feels in general like there's too little data being shared around successful training runs.

#

https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/README_sdxl.md

hollow spruce Aug 19, 2023, 8:41 PM

#

checks out. diffusers training is hard as hell

#

simpletuner, while still hard, is definitely your best bet

#

does require coding knowledge though

#

(for full finetuning that actually works)

#

A6000 required or better

tepid sundial Aug 19, 2023, 8:42 PM

#

The author of SimpleTuner helped and did a thorough review of the PR that introduced that training script; there definitely were many suggestions from him that weren't implemented in the final script that got merged.

#

In my own script I've tried combining strategies from diffusers, simpletuner and sd-scripts, but without good public data on successful training runs, it requires so much wasted time to search the param space. I truly wish model makers would be more open to sharing data. But alas they view their models as IP; rather than wanting to partake in research, they will sit on their "secret sauce".

hollow spruce Aug 19, 2023, 8:45 PM

#

training via kohya is essentially fully working though. my best lora so far has achieved what would in 1.5 have been called a full finetune

tepid sundial Aug 19, 2023, 8:46 PM

#

And forgive my ignorance, but when you say kohya, you simply mean a GUI frontend over the logic in sd-script?

hollow spruce Aug 19, 2023, 8:47 PM

#

I'm mainly using the gui frontend for the simplicity of sharing settings, but the launch cli command is the same

tepid sundial Aug 19, 2023, 8:48 PM

#

Okay, just wasn't sure if the GUI did additional things. I've never checked it out.. but the code for sd-scripts.. I've read one too many times by now..

hollow spruce Aug 19, 2023, 8:48 PM

#

the gui does have the advantage of having all the new 'working' parts from the dev branches of kohya-ss integrated

#

so you don't need to mess with the dev branches yourself

tepid sundial Aug 19, 2023, 8:49 PM

#

What's been your experience with training diverse aspect ratios (clipped to the ratios mentioned in the SDXL paper, however).

hollow spruce Aug 19, 2023, 8:49 PM

#

other than that, nothing really

hollow spruce Aug 19, 2023, 8:49 PM

#

tepid sundial What's been your experience with training diverse aspect ratios (clipped to the ...

working perfectly, with the exception of having your whole dataset consist of only a single bucket size that isn't 1:1

#

so as long as you have multiple mixed aspect ratio buckets, it works even better than standard 1:1 ratios
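To see which buckets a dataset would fall into, something like the following sketch can help. The bucket list is an illustrative set of roughly 1-megapixel SDXL resolutions, not the list any trainer necessarily uses (kohya derives its own buckets from its resolution settings), and both helper names are made up.

```python
import math

# Illustrative ~1 megapixel buckets (width, height); an assumption, not a trainer's actual list.
SDXL_BUCKETS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]

def assign_bucket(width, height, buckets=SDXL_BUCKETS):
    """Pick the bucket whose aspect ratio is closest (in log space) to the image's."""
    target = math.log(width / height)
    return min(buckets, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))

def single_non_square_bucket(sizes, buckets=SDXL_BUCKETS):
    """True if every image lands in the same bucket and that bucket is not 1:1."""
    used = {assign_bucket(w, h, buckets) for w, h in sizes}
    if len(used) != 1:
        return False
    w, h = next(iter(used))
    return w != h
```

`single_non_square_bucket` flags exactly the problem case mentioned above: a dataset made up of one non-square aspect ratio only.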

tepid sundial Aug 19, 2023, 8:50 PM

#

What's been your experience on bf16 vs f16, had issues with bnb AdamW8?

hollow spruce Aug 19, 2023, 8:51 PM

#

had enough people recommend me to switch from AdamW8bit to AdamW + full bf16
I've seen no negative impact from this. Vram is roughly the same, so I've just stuck with it.
Would need to run identical tests though, to see if the additional accuracy actually improves things or not

#

not really a priority for me though, as there's no downside of running full bf16 for now

tepid sundial Aug 19, 2023, 8:52 PM

#

Have you tried running full UNET training with batch size one on a 4090?

hollow spruce Aug 19, 2023, 8:53 PM

#

did various 1 batch tests, with additional accuracies, including one run with gradient checkpointing off XD
yeah. while results were pretty different from what I was expecting, I can't say that a single one of them was actually better. just different?

#

offset noise I've had to stop using though, as it was making my backgrounds greyer. hence why most of my shared settings have 0 offset noise

tepid sundial Aug 19, 2023, 8:54 PM

#

What's your stance on the viability of only training loras as opposed to full unet training for the further development of the SDXL ecosystem? (text encoders too, for that matter)

hollow spruce Aug 19, 2023, 8:54 PM

#

more complicated of a matter

#

clip training is great, but nothing like SD1.5

#

any old tutorial/knowledge is no longer valid when it comes to sdxl clip training

tepid sundial Aug 19, 2023, 8:55 PM

#

Likely not the bottleneck anytime soon; but the lack of full unet trainings has me concerned

hollow spruce Aug 19, 2023, 8:55 PM

#

but when done right, its really really good

#

full finetune is ... yeah. resource heavy

#

even my rtx4090 isn't really good enough

#

batch size 1, even with GA, isn't a true solution

tepid sundial Aug 19, 2023, 8:57 PM

#

Can't say I'm happy with the setup on a 4090, either. Which is a bummer

hollow spruce Aug 19, 2023, 8:57 PM

#

A6000 is now working, if you go the diffusers route

#

but as I dont have one XD I cant really speak much about it

#

instead I'm just seeing how far I can take lora training

tepid sundial Aug 19, 2023, 8:59 PM

#

While it's going to be slow, I think it would be great if a setup that at least achieves good results on a 4090 would benefit the community, as it would increase the amount of unet trainings available

#

Would distribute the effort across more people

hollow spruce Aug 19, 2023, 8:59 PM

#

🤷‍♂️ genuinely not sure, if I just look at all the loras that are currently publicly available

#

the bad lora echo chamber is real

tepid sundial Aug 19, 2023, 9:01 PM

#

There's always going to be noisy signal, but unless there's good tooling available, you can't hope to filter out decent signal from all the noise to begin with

hollow spruce Aug 19, 2023, 9:01 PM

#

true that

#

I'd get into it if it runs on 24gb vram

#

but at this rate, chances are higher I'll throw my full dataset at the owner of simpletuner once its complete XD

#

currently at 6k images. final will be around 50k images. (all manually edited/cropped/captioned)

tepid sundial Aug 19, 2023, 9:02 PM

#

Well, with batch size one it runs for only unet, so it's wicked slow. But slow and working is better than nothing at all

hollow spruce Aug 19, 2023, 9:04 PM

#

tepid sundial Well, with batch size one it runs for only unet, so it's wicked slow. But slow a...

I'm not sure if running it for 1000hours is even a 'true' option 🤣

tepid sundial Aug 19, 2023, 9:04 PM

#

I've tried trainings with datasets ranging from 5k images to 20k images, batch size one. And results have been very mixed depending on what techniques I incorporate. Sampling data on training results is a ... very slow process.

hollow spruce Aug 19, 2023, 9:04 PM

#

at that point its cheaper to rent a runpod A100 stack, just to offset the electricity costs

tepid sundial Aug 19, 2023, 9:05 PM

#

Yes, obviously it's the way to go. I'm just trying to optimise for all the people out there that simply will never run a training if it means renting on cloud, but if it means leaving their computer on during the night for two weeks, they'll give it a go.

hollow spruce Aug 19, 2023, 9:05 PM

#

tepid sundial I've tried trainings with datasets ranging from 5k images to 20k images, batch s...

while there are ways to cheat with training time - if you use them, are you even benefitting from full unet training? Cause then I genuinely have to ask if a lora wouldn't be both more efficient and better

tepid sundial Aug 19, 2023, 9:06 PM

#

That depends on what we're talking about specifically when we say "cheat"

hollow spruce Aug 19, 2023, 9:07 PM

#

higher learning rates, or adaptive optimizers, which scale up the learning rate for you

tepid sundial Aug 19, 2023, 9:08 PM

#

Yeah, and this is what I want to find more data on : D
It's hard to argue these things without any collected evidence

latent charm Aug 19, 2023, 9:11 PM

#

I once tried a fine-tune with 10k images. It took 5x24 hours on my 3090 and I predicted it would need two more rounds (5x24x2) to get things done. Then I gave up on the full fine-tune and used a smaller dataset to fine-tune the first output.

hollow spruce Aug 19, 2023, 9:16 PM

#

tepid sundial Yeah, and this is what I want to find more data on : D It's hard to argue these ...

I usually compare my loras to random dreamshaper images that have been shared XD

tepid sundial Aug 19, 2023, 9:17 PM

#

Have you tried doing FID?

hollow spruce Aug 19, 2023, 9:17 PM

#

if I can beat or match 3/5 images, using the seed 2, then I consider my lora as 'good enough' XD

hollow spruce Aug 19, 2023, 9:19 PM

#

tepid sundial Have you tried doing FID?

fid?

tepid sundial Aug 19, 2023, 9:21 PM

#

https://huggingface.co/docs/diffusers/conceptual/evaluation#quantitative-evaluation

Evaluating Diffusion Models

#

https://torchmetrics.readthedocs.io/en/stable/image/frechet_inception_distance.html

#

There were some indications in the SDXL paper that FID might not be a stellar metric

hollow spruce Aug 19, 2023, 9:40 PM

#

nop. have never used FID yet. but that was an interesting read x_x

stiff dust Aug 19, 2023, 10:30 PM

#

tepid sundial Have you tried doing FID?

fid is not suitable for one-topic lora training. It measures distributions over whole datasets
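For context, the standard definition of FID compares the mean and covariance of Inception features over a set of real images ($\mu_r, \Sigma_r$) and a set of generated images ($\mu_g, \Sigma_g$), which is why it needs whole datasets rather than a handful of samples:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Both statistics are estimated from samples, so with only a few dozen images of a single subject the covariance estimates are too noisy to say much.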

tepid sundial Aug 19, 2023, 10:37 PM

#

Right, we were talking about full unet training runs

stiff dust Aug 19, 2023, 10:44 PM

#

doesn't matter unless you train the unet on a complete dataset of many different subjects

#

but fid doesn't really measure aesthetics anyways

marsh brook Aug 19, 2023, 10:52 PM

#

Anyone into the following

I am a sock manufacturer.
I am looking to have AI create image Files from images given by customers and utilize AI image creativity.

Image files created are used to transmit data to a machine to engage functions.
Data transmission or data signal designators to the machine are represented by the RGB colors located in the file. Machine capability is limited, so the RGB colors that can be in the file must be limited. Currently AI image generators use shading, gradients, etc. in creating images. You also cannot designate the image size, more specifically the image size in pixels.

Example
168 pixels wide 400 pixels height.

168 represents the 168 needles that are in the cylinder of the machine.
400 represents how many courses are in the sock. or how many times the cylinder has rotated picking up different colored yarn at its yarn intake points.

RGB colors in the file are used by technicians to designate fixed yarn takeup points on the machine.

Transmitting the data to the machine is not what I am looking for. I am just looking to create images

#

Current reasons that AI images from all AI platforms can't be used with machine equipment that makes textiles: 1. not able to mandate the size of the file in pixels. 2. not able to mandate the number of allowable RGB colors in the file.

hollow spruce Aug 19, 2023, 11:16 PM

#

marsh brook Anyone into the following I am a sock manufacturer. I am looking to have AI cre...

wrong channel. this belongs in #💬｜general-chat
definitely doable, but you need a full (custom) app or at least script in the middle to handle the issues you're facing
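A sketch of what such a script in the middle might do: take whatever image the generator produced, resample it to the exact needle/course grid, and snap every pixel to an allowed yarn color. Everything here is hypothetical (function names, the palette, the tiny source image); a real version would read the generated image with an image library first.

```python
def nearest_color(pixel, palette):
    """Return the palette color with the smallest squared RGB distance to pixel."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))

def to_knit_chart(pixels, width, height, palette):
    """Nearest-neighbour resize rows of (r, g, b) pixels to width x height, then snap to palette."""
    src_h, src_w = len(pixels), len(pixels[0])
    chart = []
    for y in range(height):
        src_row = pixels[y * src_h // height]
        chart.append([nearest_color(src_row[x * src_w // width], palette) for x in range(width)])
    return chart

# Hypothetical 3-yarn palette and a tiny 2x2 "generated" image with off-palette shades.
palette = [(255, 0, 0), (0, 0, 255), (255, 255, 255)]
source = [[(250, 10, 5), (20, 30, 200)],
          [(240, 240, 235), (128, 0, 0)]]
chart = to_knit_chart(source, width=168, height=400, palette=palette)
```

The output always has exactly 168 columns and 400 rows and contains only palette colors, which covers both constraints listed above.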

ruby pond Aug 20, 2023, 12:48 AM

#

Is it normal to see images that are closer to the reg images than training images in the earlier epochs of training a lora?

plain bolt Aug 20, 2023, 1:32 AM

#

Is captioning really important if I am training a subject that is scifi and doesn't look anything like real life?

normal ember Aug 20, 2023, 5:33 AM

#

Do you think that the clip vision model or something like it could be used to ”caption” images during training?

gloomy prairie Aug 20, 2023, 7:03 AM

#

Is this red and green noise a common occurance? 🤨

quiet eagle Aug 20, 2023, 8:25 AM

#

What's the current state of fine-tuning sdxl with 4090 - Lora in a few hours, full fine-tune not possible?

stone garden Aug 20, 2023, 10:24 AM

#

jade hornet You need a rocm version, let me find a link

yeah turns out i needed a rocm version of torch and the script reqs installed the nvidia ones that's why it was giving errors and not showing gpu

#

so at least i can get to the train steps now but it keeps getting killed with SIGKILL 9 for some reason

#

only got to like 70 steps last time and it took like half an hour, not sure if this is normal speed or if the gpu isn't being used. it didn't make the fans spin like a jet about to take off

#

another debugging week i guess not sure if this is worth it

jade hornet Aug 20, 2023, 10:29 AM

#

stone garden only got to like 70 steps last time and it took like half an hour not sure if th...

run rocm-smi, it'll tell you the gpu utilization

#

and run that command I posted above to make sure your bitsandbytes is good

stone garden Aug 20, 2023, 10:54 AM

#

jade hornet and run that command I posted above to make sure your bitsandbytes is good

how do you install the rocm version? it just says pip install bitsandbytes

#

my version errors out

#

also i dont have rocm-smi?

jade hornet Aug 20, 2023, 10:55 AM

#

that wont work, the pip will install the cuda version of it, you have to git pull and compile it yourself

stone garden Aug 20, 2023, 10:57 AM

#

jade hornet that wont work, the pip will install the cuda version of it, you have to git pul...

https://github.com/Titaniumtown/bitsandbytes-rocm/blob/patch-2/compile_from_source.md whats the right cuda target ?

#

it says needs nvcc but thats nvidia right

#

python dependencies are a pain to work with lol

jade hornet Aug 20, 2023, 11:04 AM

#

ignore that, read the readme

#

remember this was patched to work with rocm

#

git clone https://git.ecker.tech/mrq/bitsandbytes-rocm
cd bitsandbytes-rocm
make hip
CUDA_VERSION=gfx1030 python setup.py install # assumes you're using a 6XXX series card
python3 -m bitsandbytes # to validate it works

stone garden Aug 20, 2023, 11:07 AM

#

cool thanks

#

make says no nvcc in path

jade hornet Aug 20, 2023, 11:13 AM

#

did you edit the makefile to point to your rocm location?

#

make sure you have all the rocm packages

ruby pond Aug 20, 2023, 11:27 PM

#

Is this 'done'? Will it just get worse from now? Or should I let it keep going?

jade hornet Aug 21, 2023, 12:31 AM

#

never really decided based on a graph of loss, I always look at the quality of the sample images to figure out if it's getting better or worse

#

honestly, if you were putting in a prompt and getting an image out, how would you decide if it sucked? not a graph surely

ruby pond Aug 21, 2023, 12:34 AM

#

jade hornet honestly, if you were putting in a prompt and getting an image out, how would yo...

good point, I have been testing the files as they save. The file with this spike hasn't saved yet

#

It's still climbing

ruby pond Aug 21, 2023, 1:59 AM

#

yeah it's done 😄 the output from the latest save is complete garbage, totally scrambled

#

good to have my gpu back after 3 days of training

opal jacinth Aug 21, 2023, 8:36 AM

#

Hey, what are your experiences with using triggers while training your own lora for a person with SDXL? I've seen some tutorials that state random triggers like "sks" are better, but other tutorials mention that the person's name is just fine and that random triggers don't work that well

normal ember Aug 21, 2023, 9:44 AM

#

ruby pond Is this 'done'? Will it just get worse from now? Or should I let it keep going?

What does the loss/epoch look like?

ruby pond Aug 21, 2023, 9:46 AM

#

normal ember What does the loss/epoch look like?

when the last file saved, it was about 10.8. but the output was useless

normal ember Aug 21, 2023, 9:47 AM

#

ruby pond when the last file saved, it was about 10.8. but the output was useless

If you wouldn’t mind to paste the graph

ruby pond Aug 21, 2023, 9:48 AM

#

normal ember If you wouldn’t mind to paste the graph

I closed the session. How do you get the graph up?

normal ember Aug 21, 2023, 9:51 AM

#

tensorboard --logdir path

ruby pond Aug 21, 2023, 9:52 AM

#

normal ember tensorboard --logdir path

normal ember Aug 21, 2023, 9:54 AM

#

What does 16 or earlier look like?

sullen locust Aug 21, 2023, 9:54 AM

#

Hello, respected ones. I'm having some issues while training a lora from cmd; it seems my lora isn't being prepared. I've shared some screenshots, please check them. Thanks.

ruby pond Aug 21, 2023, 9:55 AM

#

normal ember What does 16 or earlier look like?

16 is a bit overdone, 17 is good

normal ember Aug 21, 2023, 9:56 AM

#

Try even earlier

ruby pond Aug 21, 2023, 9:57 AM

#

I tested all the saves

normal ember Aug 21, 2023, 9:57 AM

#

Great!

ruby pond Aug 21, 2023, 9:58 AM

#

I'm glad I got something out of it, since it was running for 3 days 😄

normal ember Aug 21, 2023, 9:59 AM

#

I’m in the process of trying to see what impact the parameters and dataset have on the result. That’s why I’m curious.

#

Ran during the night but haven’t had a chance to look at the results yet.

#

I know I have to adjust the dataset though. But it’s fun to see that a specific feature gets into the LoRA.

#

I think it’s really good to have a good stable source when learning so you have the possibility to change the dataset and iterate.

ruby pond Aug 21, 2023, 10:03 AM

#

This was the first photographic lora I trained with over 9000 images. I've done a few style loras with a few hundred images to train on that turned out pretty good.

gloomy prairie Aug 21, 2023, 11:15 AM

#

Has anyone else encountered these horizontal artefacts and/or a solution to eliminate them? 🤔

marble zodiac Aug 21, 2023, 11:49 AM

#

gloomy prairie Has anyone else encountered these horizontal artefacts and/or a solution to elim...

yes. I'm very sure these are the VAE artifacts when using the broken VAE from the initial release of SDXL 1.0

normal ember Aug 21, 2023, 11:49 AM

#

Yes

marble zodiac Aug 21, 2023, 11:50 AM

#

gloomy prairie Has anyone else encountered these horizontal artefacts and/or a solution to elim...

you can just use a separate VAE like the official one (which is actually the 0.9 VAE) https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors

or you can also patch your model with a python script as far as I know

gloomy prairie Aug 21, 2023, 11:50 AM

#

marble zodiac yes. I'm very sure these are the VAE artifacts when using the broken VAE from th...

Like the .safetensors file is outdated? 🤔

marble zodiac Aug 21, 2023, 11:50 AM

#

it's not in your training data

#

if you can't choose a separate VAE for training use this model for merges and fine-tuning: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0_0.9vae.safetensors

it's the updated version by Stability AI with the working VAE that does not produce artifacts

#

these artifacts only happen when an image is decoded - so only when the RGB image is created. the encoding is not being influenced by it as far as I know. so your training / fine-tuning is fine
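Outside a1111 or ComfyUI, the same "use a separate VAE for the decode step" idea can be sketched with the Hugging Face diffusers library. This is an assumption on top of the discussion, not something anyone in the thread used: the function name is made up, and the repo IDs are the two Stability AI repositories linked above. Nothing is downloaded until the function is called.

```python
def load_sdxl_with_separate_vae(
    base_repo: str = "stabilityai/stable-diffusion-xl-base-1.0",
    vae_repo: str = "stabilityai/sdxl-vae",
):
    """Build an SDXL pipeline whose decode step uses a separately loaded VAE."""
    # Imported here so the sketch can be read and imported without diffusers installed.
    from diffusers import AutoencoderKL, StableDiffusionXLPipeline

    # Load the standalone VAE weights first ...
    vae = AutoencoderKL.from_pretrained(vae_repo)
    # ... then pass them in so they replace the VAE bundled with the checkpoint.
    return StableDiffusionXLPipeline.from_pretrained(base_repo, vae=vae)
```

Calling `load_sdxl_with_separate_vae()` and then `pipe("some prompt").images[0]` would decode with the swapped-in VAE; the UNet and text encoders are untouched, which is consistent with the point that only decoding is affected.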

gloomy prairie Aug 21, 2023, 11:52 AM

#

Oh right, okay 🤔

#

So nothing wrong with the checkpoint files, just to be clear? I just need to use a different VAE for the decode step?

gloomy prairie Aug 21, 2023, 11:55 AM

#

marble zodiac if you can't choose a separate VAE for training use this model for merges and fi...

Oh, but this is a replacement checkpoint file?

marble zodiac Aug 21, 2023, 11:55 AM

#

gloomy prairie So nothing wrong with the checkpoint files, just to be clear? I just need to use...

yes, there is. but not in the checkpoint data itself. it's the decoding part (VAE) of the model which translates your latent space data to an RGB output image. but this can be fixed

marble zodiac Aug 21, 2023, 11:56 AM

#

gloomy prairie Oh, but this is a replacement checkpoint file?

yes. if you use sd_xl_base_1.0_0.9vae.safetensors there are no artifacts

gloomy prairie Aug 21, 2023, 11:56 AM

#

Great. Thanks for explaining that! 🙌

marble zodiac Aug 21, 2023, 11:57 AM

#

gloomy prairie Great. Thanks for explaining that! 🙌

Sure. You're welcome. I'm not even training 😄 but I have troubleshooted that a couple of times already for people. I use a separate VAE at all times. You can use it in a1111 and ComfyUI without a problem.

But if your model uses the SDXL 1.0 VAE and you distribute the model to Civit or any service and they are not aware of it, people will make images with lots of bad artifacting - which isn't good.

latent charm Aug 21, 2023, 12:45 PM

#

Try to overfit an image into a lora.
Experiment setting:
Lora Type: LoRA-FA
Dim 128, Alpha 1
Learning rate 0.01
Text encoder rate 0
Repeat_600, batch 10, epoch 10

Image 1: loss graph
Image 2: training set
Image 3: reproduce image

#

The final loss is around 0.00367

#

Using the final epoch to reproduce the image with only class token in positive and no negative. The brightness of the reproduce image is slightly darker than the original.
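For reference, the repeat/batch/epoch settings above translate into optimizer steps roughly as sketched here. The helper name is made up, and `num_images=1` is an assumption: the message says "an image" but does not state the dataset size. Bucketing and gradient accumulation are ignored.

```python
import math


def total_training_steps(num_images: int, repeats: int, batch_size: int, epochs: int) -> int:
    """Approximate optimizer steps for a repeats-based dataset (no gradient accumulation)."""
    # Each epoch sees every image `repeats` times, grouped into batches.
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Repeat_600, batch 10, epoch 10, assuming a single training image
print(total_training_steps(num_images=1, repeats=600, batch_size=10, epochs=10))  # 600
```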

versed crescent Aug 21, 2023, 1:16 PM

#

Is the ideal graph shape for loss when training to have it slowly drop in a linear way?

snow pawn Aug 21, 2023, 1:19 PM

#

Hello Community
Could anyone suggest how to train a fine-tuned model of Indian Hindu gods?

latent charm Aug 21, 2023, 1:23 PM

#

versed crescent Is the ideal graph shape for loss when training to have it slowly drop in a line...

depends on scheduler and optimizer

versed crescent Aug 21, 2023, 1:24 PM

#

latent charm depends on scheduler and optimizer

Ok. I thought that loss would naturally drop over time as it gets to be a better and better fit with the training data, but my current training run isn't showing that

tall condor Aug 21, 2023, 1:24 PM

#

with that LR in the end your model will not be able to produce anything else than your image

#

it will completely overfit even if you dont even call the concept

versed crescent Aug 21, 2023, 1:25 PM

#

Ah so that means LR is only an indication of overfitting?

latent charm Aug 21, 2023, 1:29 PM

#

tall condor it will completely overfit even if you dont even call the concept

It is generated by lora strength 1.0.

#

Maybe it's because the training didn't train the text encoder? It seems to have only affected the class token used to reproduce the image

tall condor Aug 21, 2023, 1:30 PM

#

ah ok in this case its a lil different

glass sorrel Aug 21, 2023, 3:16 PM

#

Hello Community!
How can we fine-tune on Indian Hindu gods? Could anyone suggest how to achieve this?

Thank you

versed crescent Aug 21, 2023, 7:54 PM

#

Is there any document or video that discusses the various different types of LoRA that people experiment with in SD ?

versed crescent Aug 21, 2023, 9:26 PM

#

To reply to myself, I found some english translations of the sd-scripts docs https://github.com/darkstorm2150/sd-scripts#links-to-usage-documentation

hollow spruce Aug 21, 2023, 10:42 PM

#

versed crescent Is there any document or video that discusses the various different types of LoR...

this site goes into a fair amount of detail
https://hoshikat.hatenablog.com/entry/2023/05/26/223229#LoRA-type

#

use google or bing translate for it

#

not too much info on the types themselves, but a lot of info about everything that the types enable you to do - so in theory very helpful

versed crescent Aug 21, 2023, 10:44 PM

#

Ok I'll see if I can get some understanding from them. Whenever I dig into this stuff I end up with a thousand tabs 😄

hollow spruce Aug 21, 2023, 10:45 PM

#

that site should cover about 80% of what you need to know about all the different settings of training (and when you need a non LoRA training type)

versed crescent Aug 21, 2023, 10:45 PM

#

I'm still persevering with your recommended LoRA training setup, and it's fascinating to see everyone's choices as more and more tutorials/videos get published

hollow spruce Aug 21, 2023, 10:46 PM

#

last 20% are trial and error, as sdxl is quite different from SD1.5 in terms of anecdotal information, which there is a lot of online

versed crescent Aug 21, 2023, 10:46 PM

#

yeah, I am constantly looking to see if I've accidentally stumbled into an older v1.5 guide

#

I had my first training session crap out this evening with a NaN for the loss, and black sample images. I'm glad I caught that before heading to bed

ruby pond Aug 22, 2023, 12:07 AM

#

there is a model that has 0.9vae in the filename. use that

#

It's the 1.0 model checkpoint, with the 0.9 vae baked in

hollow spruce Aug 22, 2023, 8:35 AM

#

gloomy prairie Has anyone else encountered these horizontal artefacts and/or a solution to elim...

in case you're wondering. the 1.0 vae causes this

restive bridge Aug 22, 2023, 11:36 PM

#

I started getting black sample outputs during XL lora training, but the loss isnt NaN and the checkpoints work as expected🤔 guessing its something to do with switching mixed precision to fp16 from bf16 and disabling full bf16 training. I did those things hoping for better quality, should it help or should i stick to bf16?

ruby pond Aug 23, 2023, 12:05 AM

#

restive bridge I started getting black sample outputs during XL lora training, but the loss isn...

I got black sample images on my last lora training, but the saves were still working when I tested them in the workflow

restive bridge Aug 23, 2023, 12:06 AM

#

ruby pond I got black sample images on my last lora training, but the saves were still wor...

same, i guess I'll just leave it as is. samples were never useful anyways

versed crescent Aug 23, 2023, 11:56 AM

#

I find it odd that in some tutorials people seem to be adjusting Network Alpha like it’s the same as Network Rank. From what I understand, it’s more like a multiplier on the values stored in the LORA ?

#

It’s not like Rank and Alpha are the x and y dimensions of some tensor. Maybe I have it wrong ?

#

Ah no I’m not wrong, this is from Caith’s recommended link from yesterday.

versed crescent Aug 23, 2023, 12:21 PM

#

@hollow spruce Hey I’d appreciate a hand with prompting and workflows for trying my freshly trained LORA. I’ve tried some basic Comfy workflow, but as the strength of the LORA goes up, it pulls the resulting image straight into looking like one of the training images, regardless of prompt. I don’t know if this is an issue with my lower image count while training, or if it’s over fitting.

Looking at the checkpoint images made during training, the likeness converges in a nice linear way across the 300 epochs, and it looks correct after 250, so maybe I am not using positive and negative prompts correctly? Or style prompts? Because Comfy is so much of a blank slate I don’t know if I’m just approaching this in too simplistic a way.

dusky apex Aug 23, 2023, 12:37 PM

#

restive bridge I started getting black sample outputs during XL lora training, but the loss isn...

I had the same problem tonight. It solved by itself the following time, with this difference : I added this argument : --no_half_vae
By the way, would you mind sharing your training command? Mine is not satisfying for a face training.
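One plausible reason the fp16 switch and `--no_half_vae` are connected to black samples is the narrow range of half precision: anything above 65504 overflows, whereas bf16 keeps the same exponent range as fp32. The sketch below only demonstrates that fp16 limit with the standard library; that the VAE activations actually exceed it in this run is an assumption, not something shown in the thread.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value: float) -> bool:
    """Return True if `value` can be stored as a finite half-precision float."""
    try:
        # "e" is the struct format code for a 16-bit float.
        struct.pack("e", value)
        return True
    except OverflowError:
        return False


print(fits_in_fp16(FP16_MAX))  # True
print(fits_in_fp16(70000.0))   # False
```

Once a value overflows inside a half-precision model it typically turns into inf or NaN further down the line, and an all-NaN image is rendered black, which would fit the symptom of black samples with a normal-looking loss.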

dusky apex Aug 23, 2023, 12:41 PM

#

versed crescent @hollow spruce Hey I’d appreciate a hand with prompting and workflows for...

Can you please illustrate or explain "the likeness converges in a nice linear way", on my side I get "over baked" images from the start to the end, I don't know what must be changed. In the end my LORA is over baked, I get acceptable results only using a 0.1 weight.

dusky apex Aug 23, 2023, 12:57 PM

#

Here is my current command, just in case anybody of you find something strange. On a 3090, target is SDXL.

.\accelerate launch
--num_cpu_threads_per_process=2 "C:\Users\daf\automatic\kohya_ss\sdxl_train_network.py"
--enable_bucket
--min_bucket_reso=512
--max_bucket_reso=2048
--pretrained_model_name_or_path="C:/Users/daf/automatic/models/Stable-diffusion/sdvn6Realxl_detailface.safetensors"
--train_data_dir="blah"
--resolution="1024,1024"
--output_dir="blah"
--logging_dir="blah"
--network_alpha="1"
--training_comment=trigger=blah_0.4
--save_model_as=safetensors
--network_module=networks.lora
--network_args rank_dropout="0.15" module_dropout="0.15"
--text_encoder_lr=0.0005
--unet_lr=0.0005
--network_dim=24
--output_name="blah_v0_4"
--lr_scheduler_num_cycles="70"
--scale_weight_norms="1"
--no_half_vae
--network_dropout="0.2"
--full_fp16
--learning_rate="0.0005"
--lr_scheduler="constant_with_warmup"
--lr_warmup_steps="166"
--train_batch_size="2"
--max_train_steps="3325"
--save_every_n_epochs="10"
--mixed_precision="fp16"
--save_precision="fp16"
--seed="1234"
--caption_extension=".txt"
--cache_latents
--cache_latents_to_disk
--optimizer_type="AdamW8bit"
--max_train_epochs=70
--max_data_loader_n_workers="0"
--bucket_reso_steps=64
--mem_eff_attn
--gradient_checkpointing
--xformers
--bucket_no_upscale
--noise_offset=0.0
--sample_sampler=k_dpm_2_a
--sample_prompts="blah\prompt.txt"
--sample_every_n_epochs="4"

versed crescent Aug 23, 2023, 1:02 PM

#

dusky apex Can you please illustrate or explain "the likeness converges in a nice linear wa...

I can't really show you as I'm training on personal photographs of friends, but I'm generating images every 5 epochs from 300 in total. The sample images slowly converge from 0-250 after which I'm seeing a recognisable likeness to the training images.

dusky apex Aug 23, 2023, 1:05 PM

#

Can you share your training command please? On my side my subject is already recognizable at the first epoch preview, there is few difference between the first preview and the last one.

versed crescent Aug 23, 2023, 1:10 PM

#

dusky apex Can you share your training command please? On my side my subject is already rec...

I'm following Caith's original post here #🔧｜finetune message

dusky apex Aug 23, 2023, 1:10 PM

#

Other question, how do windows users do for displaying their training log in a tensor board ? I saw that Google Colab provides a board but I don't know how to load my log files there.

versed crescent Aug 23, 2023, 1:13 PM

#

dusky apex Other question, how do windows users do for displaying their training log in a t...

If you run the GUI in kohya_ss, there's a button in the webUI to launch tensorboard and point it to the log folder

#

dusky apex Aug 23, 2023, 1:40 PM

#

All right. Thank you for the link, I'll follow this guide. Regarding the webUI, I'll have to find out why I get python errors when using the "Train Model" button from kohya_ss UI. I spent hours trying to understand what was wrong. I don't have the console right now but the main problem was a bad interpretation of the generated training command, for instance the console said "resolution is mandatory" while it was correctly specified in the UI. I reinstalled the Bmaltais kohya_ss UI twice with no improvement. Thank you again.

#

Ah, last question (I hope), what should I use as inference model for my woman face training? I didn't understand if the base sdxl is better than specific models from civitai. Another tutorial maker wrote somewhere that the base SDXL model was too wide for that.

covert pagoda Aug 23, 2023, 2:15 PM

#

anybody know how to pass model metadata such as model output name and group name on kohya to wandb using args? do i need to go into the WandBTracker class to set the parser?

versed crescent Aug 23, 2023, 2:32 PM

#

dusky apex Ah, last question (I hope), what should I use as inference model for my woman fa...

I cannot think of a reason why you'd ever want to train a LoRA on top of a different base instead of SDXL base. That way you keep it most compatible with every other workflow

sonic mantle Aug 23, 2023, 2:49 PM

#

if i want to set different text and unet learning rates, what should i be inputting in the red box? seems like you have to put a value of some kind in there.

stone garden Aug 23, 2023, 2:53 PM

#

sonic mantle if i want to set different text and unet learning rates, what should i be inputt...

It doesn’t matter, put anything. It will be overridden by the ones below.

sonic mantle Aug 23, 2023, 2:53 PM

#

sweet, thanks

restive bridge Aug 23, 2023, 5:47 PM

#

does Memory efficient Attention, Gradient checkpointing, Xformers, or Full bf16 noticeably lower quality at all?

It's really hard to figure out which parameters were made for low vram cards vs. which ones are standard optimizations that everyone should use (like xformers for inference)

dusky apex Aug 23, 2023, 6:17 PM

#

versed crescent If you run the GUI in kohya_ss, there's a button in the webUI to launch tensorbo...

I'm going crazy with kohya's UI. I just loaded @hollow spruce Json, adjusted the directories, and bam, same shit again. I have reinstalled kohya three times to be sure. The python version is the right one, I don't know what to do. After these screenshots, I shortened the path to the files (in case it would be the cause) but same problem again.

sonic mantle Aug 23, 2023, 6:19 PM

#

is DPM++ SDE Karras not available when training?

dusky apex Aug 23, 2023, 6:19 PM

#

#

If I launch the same command from powershell, it works.

sonic mantle Aug 23, 2023, 8:20 PM

#

If i'm training with a mix of these resolutions, do i need to enable bucketing? or is bucketing only necessary for resizing?

restive bridge Aug 23, 2023, 8:54 PM

#

Increasing batch size absolutely wrecks fine details on faces. Time to try GA instead

sonic mantle Aug 23, 2023, 9:21 PM

#

versed crescent Ah no I’m not wrong, this is from Caith’s recommended link from yesterday.

i've read this 5 times and i still don't understand. i think im gonna train dim:128, alpha:1 and then dim:128, alpha 128 and see what the difference is.

versed crescent Aug 23, 2023, 9:36 PM

#

@sonic mantle I think they divide the rank by the alpha to determine a maximum strength of learning. Either its the max value that the lora stores, or a multiplier to the learning rate. Either way having it higher just makes your lora 'weaker' so to speak. divided by a larger number

sonic mantle Aug 23, 2023, 9:39 PM

#

versed crescent @sonic mantle I think they divide the rank by the alpha to determine a m...

Thanks, that makes more sense. It's strange because I saw some examples on this blogpost and the lower alpha number always looked worse, but maybe that's because the initial dataset was so limited that the strength of training on a limited dataset worsened the outcome.

#

this is the blogpost btw: https://medium.com/@dreamsarereal/understanding-lora-training-part-1-learning-rate-schedulers-network-dimension-and-alpha-c88a8658beb7

versed crescent Aug 23, 2023, 9:42 PM

#

sonic mantle Thanks, that makes more sense. It's strange because I saw some examples on this ...

I read this:

For example, if Alpha is 16 and Rank is 32, the weight usage intensity will be 16/32 = 0.5, which means that the learning rate will only be half as effective as the Learning Rate setting.

#

From https://hoshikat.hatenablog.com/entry/2023/05/26/223229#LoRA-type translated to English

sonic mantle Aug 23, 2023, 9:46 PM

#

versed crescent I read this: > For example, if Alpha is 16 and Rank is 32, the weight usage inte...

so a rank of 128 and alpha of 1 is 1/128 = 0.0078?

#

that doesn't sound like an optimal setting then

versed crescent Aug 23, 2023, 9:47 PM

#

no, and if you do not specify an alpha specifically, it defaults to 1, so I don't know. Looking through the code to see if I can spot anything

#

Oh sorry that's not true:

  if network_alpha is None:
    network_alpha = network_dim

so it looks like it defaults to the dimension size, thus making the training modifier 1 which will do nothing
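The numbers from the last few messages can be reproduced with a tiny sketch. The function name `lora_scale` is made up; the default handling mirrors the two-line snippet quoted above, and the alpha/rank ratio follows the translated blog quote.

```python
from typing import Optional


def lora_scale(network_dim: int, network_alpha: Optional[float] = None) -> float:
    """Scaling factor alpha / dim, with alpha defaulting to dim as in the quoted snippet."""
    if network_alpha is None:
        network_alpha = network_dim
    return network_alpha / network_dim


print(lora_scale(32, 16))   # 0.5, the blog's example
print(lora_scale(128, 1))   # 0.0078125
print(lora_scale(128))      # 1.0, alpha left unset
```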

sonic mantle Aug 23, 2023, 9:49 PM

#

im starting to think dim 128, alpha 128 was the better option then

versed crescent Aug 23, 2023, 9:50 PM

#

yeah, that would effectively do nothing, so you could safely play with other settings

restive bridge Aug 24, 2023, 3:03 AM

#

lowering learning rate from 4e-4 to 3e-4 is having no effect at all on loss. same with 1e-4. interesting that people use these graphs to determine "overfitting" yet i can clearly over or underfit this training without changing the graph whatsoever.
misinfo abounds

stiff dust Aug 24, 2023, 9:54 AM

#

sonic mantle that doesn't sound like an optimal setting then

the larger your lora, the more strongly it affects your model and the faster it learns. Reducing the learning rate counteracts this. alpha is a way of somewhat automatically adjusting the learning rate based on your lora size

#

I would keep alpha low to give the model more time to adapt and learn 🤷‍♂️ but that also depends on your training data

dusky apex Aug 24, 2023, 10:21 AM

#

Hello, thanks to your help I could cook a working face LORA tonight! 🐸

#

Now it's time to fine tune the details. I am noticing that the face is correctly used in close-up portraits or portraits, but as soon as the character is full body or half body, the face is not used at all. My 50 training images are almost all tightly framed around the face, but the captions don't mention close-up or anything regarding the framing. Do I need to change something: add images, change captions?

stiff dust Aug 24, 2023, 11:28 AM

#

you should have mid range images in your training data

#

changing caption alone won't help

#

having 10 images of your face that all have same angle and distance is usually useless

#

rather use less images but from different perspectives

stone garden Aug 24, 2023, 1:07 PM

#

Hi, I found that the BNK CLIP text encoder (the suspended node) runs into a tensor problem if the prompt is too long and complex, so I tested what I could replace it with. I found another SDXL-compatible node that accepts long prompts, but out of interest I also tried encoding the L and G prompts with separate non-SDXL encoders and then used nodes to concat/average/combine the encoder outputs. Combining them was the worst, but all of them were usable. I think I like the average node best because of its strength settings. Is it a valid approach to replace the SDXL CLIP encoders with two separate 1.5-compatible ones? Reviews or opinions welcome. (image contains workflow data):

dusky apex Aug 24, 2023, 6:21 PM

#

stiff dust you should have mid range images in your training data

By the way, I was saying that SDXL didn't generate nice pics using my new trained LORA using 2048px close up faces. I am currently generating images at 1.25 scale (1440x1120), they are all nice using my LORA. Interesting.

00624-lora_crealie_v1_1_3_crealie_woman_style-crystalClearXL_ccxl.jpg

#

it is not upscale, but native resolution

sonic mantle Aug 24, 2023, 6:52 PM

#

which parameter is failing when your Lora data doesn't integrate well with the rest of the model? I trained a Lora on mostly closeup shots of keira knightley's face. But if i prompt something like standing on a ship in pirate clothes it will only generate a closeup of the face.
Does this mean the network rank was too low?

stiff dust Aug 24, 2023, 7:05 PM

#

network rank is too high

#

and probably learnt for too long

sonic mantle Aug 24, 2023, 7:06 PM

#

stiff dust network rank is too high

I believe i had it set at 128. what seems like a good target?

stiff dust Aug 24, 2023, 7:06 PM

#

it's very high

#

some people use 4 or 8

#

For faces I would start at 16 or 24

sonic mantle Aug 24, 2023, 7:07 PM

#

stiff dust For faces I would start at 16 or 24

thanks so much.

stiff dust Aug 24, 2023, 7:07 PM

#

and increase only if quality is not good enough

dusky apex Aug 24, 2023, 7:29 PM

#

sonic mantle which parameter is failing when your Lora data doesn't integrate well with the r...

This is exactly what I complained this morning. And @stiff dust told that I had to insert mid range pics in my set. He didn't mention the rank.

sonic mantle Aug 24, 2023, 7:30 PM

#

dusky apex This is exactly what I complained this morning. And @stiff dust told t...

as someone who has run 7 training sessions in the last 2 days, i know that feel.

stiff dust Aug 24, 2023, 7:31 PM

#

it's different. If the model is not able to generalize (e.g. change clothing of the character) it is overfitted. In this case reduce rank and/or learning rate

dusky apex Aug 24, 2023, 7:31 PM

#

I just added an interesting thing, if you increase the resolution of your generation, you should be able to set more distance with the face (see my example above)

sonic mantle Aug 24, 2023, 7:32 PM

#

stiff dust it's different. If the model is not able to generalize (e.g. change clothing of ...

do you have any insights on varying the unet LR from the text LR?

stiff dust Aug 24, 2023, 7:32 PM

#

but the model will never be good at showing the character from an angle or perspective it was never trained on

#

to be honest, I wouldn't train the text encoder at all

#

or just train it for one epoch but not more

#

text encoder overfits much faster than unet

dusky apex Aug 24, 2023, 7:34 PM

#

what is the text encoder? the auto caption tool?

#

In my case I used it, then I cleaned 50% of the data to be compliant with Caith's tutorial

sonic mantle Aug 24, 2023, 7:35 PM

#

dusky apex what is the text encoder? the auto caption tool?

it's the rate at which the model learns the relationship between your images and your captions

dusky apex Aug 24, 2023, 7:35 PM

#

ok

sonic mantle Aug 24, 2023, 7:35 PM

#

but if you're just lora training someone's face there probably isn't much for it to learn

stiff dust Aug 24, 2023, 7:36 PM

#

sonic mantle it's the rate at which the model learns the relationship between your images and...

not really. SD was never trained with the text encoder

sonic mantle Aug 24, 2023, 7:36 PM

#

oh damn i didn't know that

stiff dust Aug 24, 2023, 7:36 PM

#

the text encoder is frozen

#

training it makes sense if your captions contain something that is unknown to the text encoder

dusky apex Aug 24, 2023, 7:36 PM

#

I was wondering if I can improve my model by adding new pics with emotions : sad, smile, pensive etc.

stiff dust Aug 24, 2023, 7:37 PM

#

a name would be a good example. It can make sense to train the text encoder on a new name it doesn't know yet. But it overfits very quickly

stiff dust Aug 24, 2023, 7:38 PM

#

dusky apex I was wondering if I can improve my model by adding new pics with emotions : sad...

haven't tried that yet for sdxl

dusky apex Aug 24, 2023, 7:38 PM

#

I will

#

I'm really pleased to see that SDXL generates perfect images when setting +25% resolution on some checkpoints.

#

Some others don't appreciate

#

Quick question : is it possible and easy to install and plug a standalone comfyui? The one embedded in sd.next is broken.

stiff dust Aug 24, 2023, 7:42 PM

#

not more difficult than installing sd.next 🤷‍♂️

#

it has no builtin venv, though

restive bridge Aug 24, 2023, 9:51 PM

#

I recently got my best quality and flexibility when using a very low LR. Captions (wd14 tagging with human characteristics pruned) reduced the artifacts on clothing SIGNIFICANTLY, but it also hurt likeness quite a bit and didnt converge. I dropped the captions and got my best likeness and quality ever, but flexibility is lower again with artifacts on clothing.
Still better than anything I was able to achieve when using celebrity name token. And scales across different ppl and img counts

covert pagoda Aug 24, 2023, 10:45 PM

#

stiff dust For faces I would start at 16 or 24

Could you briefly convey your understanding of network alpha. I’ve heard differing opinions and never anything conclusive about what it actually does, nor have I ever seen experiments with useful results with different alpha’s. Is it useful at all?

latent charm Aug 25, 2023, 2:04 AM

#

Lora vs lora-fa.

#

In lora-fa epoch 10, if I put peace sign in G and L, it provides more likeness to original

glass plank Aug 25, 2023, 4:31 AM

#

What does something like "2e-1" refer to when someone speaks of making a lora?

mental hatch Aug 25, 2023, 6:40 AM

#

lora-fa vs locon locon is superior for styles, which is what I train.

#

all settings were exactly the same just switch to fa vs locon.

stiff dust Aug 25, 2023, 9:46 AM

#

restive bridge I recently got my best quality and flexibility when using a very low LR. Caption...

that cloth captions hurt likeness is strange... maybe you can try to make your name longer. Longer captions give the network more flexibility. So if you think that "photo of Peter wearing a red shirt and blue socks and green pants" works better than "photo of Peter" then you might try something like "Photo of Peter Widdlediddlebuggs" (add some last name with more tokens)

stiff dust Aug 25, 2023, 9:47 AM

#

glass plank What does something like "2e-1" refer to when someone speaks of making a lora?

erm... that's just scientific notation. 2e-1 is 0.2.
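Python reads the same notation directly, so the learning rates mentioned in this channel can be converted like this (the helper name `parse_lr` is just for illustration):

```python
def parse_lr(text: str) -> float:
    """Convert a learning rate written in scientific notation to a float."""
    # "2e-1" means 2 * 10**-1
    return float(text)


for text in ("2e-1", "4e-4", "3e-4", "1e-4"):
    print(f"{text} = {parse_lr(text):.4f}")
```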

stiff dust Aug 25, 2023, 9:51 AM

#

covert pagoda Could you briefly convey your understanding of network alpha. I’ve heard differi...

no, just keep it at 1.
The alpha is a scaling factor on the strength of your lora. alpha=1 means your lora is multiplied by 1/dim. An alpha equal to dim means your network is multiplied by 1 (thus, nothing happens).

The reason for alpha is that a network with high dim learns faster and gains more strength in a shorter time. When people experiment with Lora they often change just a single parameter between their workflows. So they change dim from, say, 32 to 128 and find that after epoch 10 the image looks much better with higher dim. However, dim 32 might look equally good if you trained it until epoch 20. It just looks worse because it trains slower.
alpha somewhat counters this effect by slowing down the training of high dim loras. So using alpha=1 just means your loras with different dims are more comparable
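A minimal sketch of that scaling, assuming the usual LoRA convention where the low-rank update is multiplied by alpha / dim. The function names are made up for illustration and plain lists stand in for tensors.

```python
def lora_scale(alpha, dim):
    """Multiplier applied to the LoRA update (assumed convention: alpha / dim)."""
    return alpha / dim


def lora_delta(up, down, alpha):
    """Scaled low-rank update: (alpha / dim) * (up @ down).

    up is (d_out x dim), down is (dim x d_in); dim is the LoRA rank.
    """
    dim = len(down)
    scale = lora_scale(alpha, dim)
    return [
        [scale * sum(up[i][k] * down[k][j] for k in range(dim))
         for j in range(len(down[0]))]
        for i in range(len(up))
    ]


for dim in (32, 128):
    print(f"dim={dim}: alpha=1 -> {lora_scale(1, dim)}, "
          f"alpha=dim -> {lora_scale(dim, dim)}")
```

With alpha fixed at 1, the multiplier shrinks as dim grows, which is the "more comparable across dims" effect described above.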

normal ember Aug 25, 2023, 9:54 AM

#

How should one think about the network rank? Complex dataset larger network?

stiff dust Aug 25, 2023, 9:55 AM

#

yeah, I would say so

#

Note that in large language models people tend to use Loras of rank 1 ^^ So you can learn a lot even with low ranks

#

In general Lora is based on a compression technique, so lower rank means higher compression means more compression artefacts. I found that the unet is a bit more sensitive to these artefacts. So using a too low rank lora for the unet is usually a bad idea. I would at least use rank 8 or 12 for the unet, maybe even higher. But you can try. You will see if artefacts appear.
For the text encoder you can use rank 1 or 2 and that's already enough. However, for some reason most scripts don't offer an easy way to use different ranks for the unet and the text encoder
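To put rough numbers on the "compression" view: a LoRA stores two thin matrices per layer instead of a full weight update. This is a sketch under that assumption; the 1280x1280 layer size is a hypothetical example, not a measured SDXL layer.

```python
def full_params(d_out, d_in):
    """Parameters in a full weight update for one linear layer."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Parameters in a rank-`rank` LoRA: up (d_out x rank) plus down (rank x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 1280  # hypothetical layer size, for illustration only
for rank in (1, 8, 32, 128):
    ratio = lora_params(d_out, d_in, rank) / full_params(d_out, d_in)
    print(f"rank {rank:>3}: {lora_params(d_out, d_in, rank):>7} params "
          f"({ratio:.1%} of the full layer)")
```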

normal ember Aug 25, 2023, 10:24 AM

#

Other than getting a large lora, what are the disadvantages of going for too high a network rank?

latent charm Aug 25, 2023, 10:32 AM

#

I heard from @hollow spruce that too high a network rank would damage the model.

stiff dust Aug 25, 2023, 10:32 AM

#

yeah, you want the lora to be parameter efficient, i.e. change the base model as little as possible

#

I'm not sure if it's really just the rank or more a combination of rank and alpha and learning rate, but you just don't want your Lora to overfit and "damage" the model

#

"damage" means: usually you train your model on a large variety of images. Training it only on a single subject damages the model, because it starts forgetting what it had learned before

latent charm Aug 25, 2023, 10:34 AM

#

I didn't know what the "damage" means

#

Thanks for explanation

restive bridge Aug 25, 2023, 10:35 AM

#

Olivia showed an example on Twitter of 256 dim damaging the background of an image vs. 24 dim.

But other factors like LR could've affected that difference too. that was hardly a scientific test.

stiff dust Aug 25, 2023, 10:36 AM

#

that's what I meant

#

just from a mathematical viewpoint I would say it's a combination of LR and dim that causes the effect

restive bridge Aug 25, 2023, 10:36 AM

#

yes. maybe scheduler too

stiff dust Aug 25, 2023, 10:36 AM

#

so a too high DIM should not necessarily be a problem if you keep your learning rate low enough. But why would you want to do that?

restive bridge Aug 25, 2023, 10:37 AM

#

stiff dust so a too high DIM should not necessarily be a problem if you keep your learning ...

better eyes

stiff dust Aug 25, 2023, 10:37 AM

#

I don't think that you need high dim Lora for that

restive bridge Aug 25, 2023, 10:38 AM

#

idk my config won't make perfectly circular and symmetrical eyes at any LR or epoch except 1e-4 and lower

normal ember Aug 25, 2023, 10:38 AM

#

If you decrease the network rank do you also increase the learning rate? Or is that only true if you fiddle with the alpha?

stiff dust Aug 25, 2023, 10:38 AM

#

for perfect iris I found it helpful to add very few cropped super high res images to the training data

#

but in general problems like asymmetric eyes and so on are problems of SDXL itself

#

if you fix that with your lora, then it's probably just because your lora is memorizing your training images

#

which is usually a sign of overfitting

restive bridge Aug 25, 2023, 10:40 AM

#

stiff dust but in general problems like asymmetric eyes and so on are problems of SDXL itse...

idk, if I swap the token out for a random name or just man, eyes are nearly perfect

stiff dust Aug 25, 2023, 10:40 AM

#

in theory, the better strategy would be to train on a large variety of photos of different people to train SDXL to make perfect eyes and THEN train on your face

latent charm Aug 25, 2023, 10:41 AM

#

I feel SDXL is undertrained in many aspects, like eyes

restive bridge Aug 25, 2023, 10:41 AM

#

stiff dust in theory, the better strategy would be to train on a large variety of photos of...

Yes waiting on a good fine-tune that does that.

Wyvernmix has given me the most circular eyes but they're way too big (anime weighting)

stiff dust Aug 25, 2023, 10:41 AM

#

hm, that's strange. My experience so far is that a lora on my face is sometimes making things wrong, but it is doing so at the same rate as SDXL is doing anatomy or eyes wrong on other people

normal ember Aug 25, 2023, 10:41 AM

#

There seem to be so many parameters affecting the lr one way or the other, so it's hard to adjust accordingly.

restive bridge Aug 25, 2023, 10:43 AM

#

Unfortunately for my use-case I have to assume the worst case scenario of 12 low quality images. I can't control what goes in, but need to guarantee it comes out good. which is easy until a high quality dataset is neutered by my "safe" config

hollow spruce Aug 25, 2023, 10:45 AM

#

yeah. damage was probably the wrong word to use.
Basically when you have a huge 256dim lora, and don't train it on a metric ton of images, then you'll quickly see things like details being forgotten/replaced. First the backgrounds stop having any detail, and literally fade into these weird contrastless messes. Then colors become more flat, as gradients stop showing up, and this will continue and continue as most "detail work" stops existing.

See how... undetailed secourses images look?

normal ember Aug 25, 2023, 10:46 AM

#

Like you have created a way too big network in front of the base model

#

I wonder how much one should decrease the lr if you go lower on the network.

hollow spruce Aug 25, 2023, 10:48 AM

#

less about the learning rate, more a matter of how much new information you're actually inputting. how many new images are you working with?

#

like for 4k images, dim32/dim64 give essentially the same results for me.

#

so I find it hard to see genuine usecases where you "need" 128/256dim loras

restive bridge Aug 25, 2023, 10:49 AM

#

I believe it but I need to see it A/B on the same seed and more levels than max and min, and whether details of the face are improving or not when close-up (surely 64 is nothing like 256). I'll do this test soon enough

#

cuz SEcourses outputs/prompts are nothing like mine

hollow spruce Aug 25, 2023, 10:50 AM

#

if you go away from photorealistic, and into pure artwork, then high dims might work to some extent, since you dont care about the photo capabilities being forgotten

restive bridge Aug 25, 2023, 10:51 AM

#

hmm the last time I went low I lost skin details and it felt more artistic

normal ember Aug 25, 2023, 10:51 AM

#

I'm training for color palettes, grain and such

#

dof

hollow spruce Aug 25, 2023, 10:51 AM

#

restive bridge cuz SEcourses outputs/prompts are nothing like mine

he has enough source images + regularization images, and has a face that is easy enough to train

#

do rarer ethnicities, and it gets a lot harder to replicate his results

latent charm Aug 25, 2023, 10:54 AM

#

I have a photo dataset and I cropped the faces and saved them as another folder in the dataset. Would that help increase the likeness?

restive bridge Aug 25, 2023, 10:55 AM

#

my first test is always a bald Arab (our ceo). he was impossible to train on 2.1, XL is already infinitely better but still doesn't like his face

restive bridge Aug 25, 2023, 10:55 AM

#

latent charm I have a photo dataset and I cropped the faces and saved them as another folder in the ...

full body images almost never help with likeness so I'd say yes

normal ember Aug 25, 2023, 10:56 AM

#

do you caption him as Arabic?

restive bridge Aug 25, 2023, 10:56 AM

#

nope

#

that'd be counter productive

#

(captioning a black person as black makes them white in outputs)

normal ember Aug 25, 2023, 10:57 AM

#

Hmm, then how will the model know what nationality you want to train for an example?

restive bridge Aug 25, 2023, 10:58 AM

#

you could mention it in the prompt

stiff dust Aug 25, 2023, 10:58 AM

#

normal ember Hmm, then how will the model know what nationality you want to train for an exam...

use a proper name

restive bridge Aug 25, 2023, 10:59 AM

#

or even use celeb with same ethnicity as token

stiff dust Aug 25, 2023, 10:59 AM

#

my own name "Kai" for example is a typical German name, but in SDXL it's strongly associated with Japan. That's why I use the name "Christian" instead when training on my face

normal ember Aug 25, 2023, 11:00 AM

#

Is this only specific to LoRA training or other types of training too?

restive bridge Aug 25, 2023, 11:01 AM

#

but if there are any Christians weighted heavily in XL dataset then you still have a problem. hence random token for stability, but hard to scale

restive bridge Aug 25, 2023, 11:01 AM

#

normal ember Is this only specific to LoRA training or other types of training too?

all the same

stiff dust Aug 25, 2023, 11:02 AM

#

restive bridge but if there are any Christians weighted heavily in XL dataset then you still ha...

a name like "Christian" is too common to be a problem

normal ember Aug 25, 2023, 11:02 AM

#

How do they train it for different nationalities? Just an example could be whatever concept.

stiff dust Aug 25, 2023, 11:02 AM

#

I would go for a combination of a common first name and an uncommon surname

normal ember Aug 25, 2023, 11:02 AM

#

Find it unnatural to train it on names. 😄

restive bridge Aug 25, 2023, 11:03 AM

#

the nationality should come from the input images, not the captioning or token. it should already know if the person is Asian based on how they look in training

normal ember Aug 25, 2023, 11:04 AM

#

They must have trained it somehow

#

What's the purpose of the caption?

stiff dust Aug 25, 2023, 11:06 AM

#

what dal wanna say is that the model should learn to associate your appearance (e.g. skin color) with your name

normal ember Aug 25, 2023, 11:06 AM

#

Sorry for asking so many stupid questions. I just want to understand the basics good enough.

restive bridge Aug 25, 2023, 11:06 AM

#

to specify what about the images you want trained into the token. captions remind it what normal things are in the image so not all of it is trained

#

so you should only mention things you DON'T want the token to remember about the images

normal ember Aug 25, 2023, 11:07 AM

#

Ok, so then if I want to adjust the DoF, film grain, colours and such I should probably just caption what's happening in the image?

restive bridge Aug 25, 2023, 11:08 AM

#

Yes caption everything except for those effects

normal ember Aug 25, 2023, 11:09 AM

#

Like "a dry, barren landscape with a fence and hills in the background"

restive bridge Aug 25, 2023, 11:10 AM

#

But XL TE training seems to be a fickle bitch and I have found captions to hurt face training if not utterly perfect

#

no idea for styles

normal ember Aug 25, 2023, 11:11 AM

#

fence might be in the foreground but... 😄

#

What I've found is that the LoRA seems to be activating stuff that are in the same age of the items in the dataset even though it's not in the dataset.

#

For example cars get picked from the correct age even though it's not the same make, color and such.

restive bridge Aug 25, 2023, 11:14 AM

#

a caption like "modern car" should fix that. specifying the "age" in captions should keep it out of training

normal ember Aug 25, 2023, 11:14 AM

#

For style I like it though 😄

stiff dust Aug 25, 2023, 11:16 AM

#

normal ember What I've found is that the LoRA seems to be activating stuff that are in the sa...

do you train text encoder?

#

as said, text encoder overfits extremely fast. If you train it, train it for very short time

#

the unet should be less sensitive to these things

latent charm Aug 25, 2023, 11:16 AM

#

normal ember How do they train it for different nationalities? Just an example could be whate...

The nationalities came from the images in the SAI training dataset, which were captioned with the nationalities. I think that is why the nationalities are so biased.

normal ember Aug 25, 2023, 11:16 AM

#

Yeah, I've done that. I will try with --network_train_unet_only next.

restive bridge Aug 25, 2023, 11:17 AM

#

stiff dust as said, text encoder overfits extremely fast. If you train it, train it for ver...

have you tried lowering LR on only the TE? one thing I plan to test soon

hollow spruce Aug 25, 2023, 11:17 AM

#

normal ember Hmm, then how will the model know what nationality you want to train for an exam...

"african"
I still feel racist for tagging every black person as african in my master dataset 🥲 but it's the best working word for me so far
(Caucasian|African|Asian|Indian|etc...) <- every person in my dataset has one of these tags. Due to the quantity of total images + clip training, it works really well. But for smaller datasets this would obviously be counterproductive

stiff dust Aug 25, 2023, 11:18 AM

#

restive bridge have you tried lowering LR on only the TE? one thing I plan to test soon

yeah, sure, but that won't be enough

#

I usually train text encoder first with dim=1 for only one or two epochs with low learning rate and then continue training with unet only and higher dim. But that's a bit complicated as kohya does not have commandline options for that

normal ember Aug 25, 2023, 11:19 AM

#

Not sure if kohya_ss is able to train text only for an epoch.

stiff dust Aug 25, 2023, 11:19 AM

#

but I think there is a commandline option that allows you to stop text encoder training after certain amount of steps

restive bridge Aug 25, 2023, 11:21 AM

#

stiff dust I usually train text encoder first with dim=1 for only one or two epochs with lo...

freezing the text encoder. I always used that method for DB since the paper that tested it with Kramer. way better than full TE training and way better than without. never thought it would perform the same on Lora.
I'm pretty sure Kohya does have that setting

normal ember Aug 25, 2023, 11:21 AM

#

I'm searching for it in the options but can't find it.

latent charm Aug 25, 2023, 11:22 AM

#

hollow spruce "african" I still feel racist for tagging every black person as african in my ma...

How do you build your master dataset? Do you have any schedule or planned process to keep scaling up your dataset? I'd really appreciate it if you could share.

restive bridge Aug 25, 2023, 11:23 AM

#

I think it was a setting and was dropped due to problems

hollow spruce Aug 25, 2023, 11:23 AM

#

I usually spend a weekend to increase it by 1~3k images. manually edited & cropped, then manually tagged

#

once I hit around 10k it would probably make sense to train my own blip from it 🤣

restive bridge Aug 25, 2023, 11:24 AM

#

I knew it was there! right in kohya

hollow spruce Aug 25, 2023, 11:24 AM

#

simply increasing it isn't hard - as I can just download photoshoots. But getting good diverse images, that look nothing alike is the high effort part. Usually via flickr where I filter out 90% of all images

normal ember Aug 25, 2023, 11:25 AM

#

restive bridge I knew it was there! right in kohya

Only in GUI it seems. 😦

hollow spruce Aug 25, 2023, 11:25 AM

#

currently I'm still at 50% truly random images that I chose from flickr. 50% from photoshoots, for that super high detail

latent charm Aug 25, 2023, 11:26 AM

#

Thanks for sharing

restive bridge Aug 25, 2023, 11:26 AM

#

🙄

latent charm Aug 25, 2023, 11:27 AM

#

yeah, I also tried.🤣

stiff dust Aug 25, 2023, 1:54 PM

#

yeah, kohya is sometimes a bit inflexible... I made a lot of changes to the code myself. For subject training it is usually sufficient to only train the cross attention layers. Text encoder training can be helpful if it's done with low dim (e.g. rank 1) and for a short time. Together it's totally possible to make a Lora with a file size <50 MB

normal ember Aug 25, 2023, 2:11 PM

#

It's a bit confusing but there's kohya-ss and kohya_ss. I guess the first is the most upstream since kohya_ss seems to be merging with kohya-ss.

versed crescent Aug 25, 2023, 2:20 PM

#

bmaltais/kohya_ss is the GUI/web version. kohya-ss/sd-scripts is the original command-line version that the GUI uses

#

kohya-ss/sd-scripts is the original source of training scripts

normal ember Aug 25, 2023, 2:35 PM

#

I use the scripts in kohya_ss for SDXL

#

And those do not seem to have an equivalent in kohya-ss/sd-scripts no?

stiff dust Aug 25, 2023, 2:39 PM

#

kohya-ss/sd-scripts is the original implementation

normal ember Aug 25, 2023, 2:41 PM

#

will it handle sdxl or are modifications needed?

stiff dust Aug 25, 2023, 2:46 PM

#

you have to checkout the sdxl branch

normal ember Aug 25, 2023, 2:49 PM

#

Thanks! 👨‍🦯

quiet eagle Aug 25, 2023, 5:17 PM

#

I'm oom-ing with pretty default settings (e.g. preset SDXL - adafactor 1.0) on a 4090 with kohya ss. Is there any basic SDXL dataset and settings I can use to figure out if the problem is with my GPU or what?

#

in this case I ran out of CPU RAM after 174 steps (32 GB); with other settings I run out of GPU RAM

normal ember Aug 25, 2023, 5:39 PM

#

What's your batch size?

quiet eagle Aug 25, 2023, 5:51 PM

#

1

normal ember Aug 25, 2023, 5:54 PM

#

And you also set these? cache_latents_to_disk, gradient_checkpointing, xformers

#

You can search for memory in here: https://hoshikat-hatenablog-com.translate.goog/entry/2023/05/26/223229?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp

quiet eagle Aug 25, 2023, 6:18 PM

#

yeah I have those. Currently working with some of the other random presets

#

tho on all of the ones that run I only get 2.5it/s with a 4090 while others seem to report that and more with weaker GPUs so still feels like something is wrong

#

but at least I can test from here if it works for a full epoch and adjust

normal ember Aug 25, 2023, 6:29 PM

#

I think bf16 will help too

stiff dust Aug 25, 2023, 7:24 PM

#

I would definitely use bf16. I would also try AdamW instead of Adafactor

quiet eagle Aug 25, 2023, 7:44 PM

#

yeh I use bf16 but havent tried adam

#

managed to run 4 epochs and the lora does seem to be more or less working

safe cobalt Aug 26, 2023, 5:24 PM

#

Hi, I'm looking for some example sets of images and captions for SDXL finetuning (not LoRa), just for starting off learning and testing. Does anyone know if there are any sets out there, paid or otherwise? Anyone know if SD have any sets available that they used as part of the SDXL training?

#

safe cobalt Aug 26, 2023, 6:09 PM

#

This will be my first finetune I try on my 3090 using the kohya GUI -> finetune.

#

This is the config I plan to use, I think I've gotten most of the settings right.

📎 MyConfig.json

next tapir Aug 26, 2023, 8:26 PM

#

Hi - I'm training an SDXL LoRa to replicate the style of a lithograph artist. However, after 2400 steps I got this as a result. It's got the right vibe, but the actual "style" of the photograph didn't seem to transfer. Instead, a very soft, painterly style transferred over. I started to see the consistent style forming at 1200 steps, but it always stayed "oil/painterly" and never inherited the pencil/etched look. Would this be indicative that I need more steps / training? Or perhaps I've overtrained? This is being done with Kohya GUI, and 24 sample images at 100 repeats.

mental hatch Aug 27, 2023, 2:03 AM

#

next tapir Hi - I'm training an SDXL LoRa to replicate the style of a lithograph artist. Ho...

Try with less repeats and more epochs. Did XL have any idea about this concept, or did something different come out each time you tried? That last part will tell you if a lora will do it (btw, I have never had luck with lora for styles, and it is better to go to locon as it trains 1 layer deeper, which makes it better for styles).

next tapir Aug 27, 2023, 2:07 AM

#

mental hatch Try with less repeats and more epochs. Did XL have any idea about this concept ...

It probably knows about lithographs, but I didn't keyword anything, I only used the unique keyword. Interestingly, it veered away from lithographs and towards the painterly look, probably because it knows a lot more about painterly looks than it does lithographs.

I'll try your suggestions, thanks!

mental hatch Aug 27, 2023, 2:08 AM

#

XL is a new beast. Do not use unique keywords.

next tapir Aug 27, 2023, 2:08 AM

#

Oh, really?

mental hatch Aug 27, 2023, 2:08 AM

#

I was having issues until I saw a vid that said that but my findings were already showing to not use uniques

#

With two TEs they fight each other so only train the unet

next tapir Aug 27, 2023, 2:09 AM

#

Should I just generate captions for each image instead?
And for style training, should my captions be more related to the content of the image or the styles that most closely match it?

mental hatch Aug 27, 2023, 2:11 AM

#

I gotta tell you about XL. You know what captions are all about, right? TE. I tested this: caption or no caption, captions are worthless unless you train the TE. I did that and the results were instantly different depending on whether I used captions or not. I no longer screw with captions UNLESS I am daring to screw with the TE.

restive bridge Aug 27, 2023, 2:26 AM

#

I feel it is important to mention that repeats and epochs are interchangeable when it comes to total step count, but
if you are using regularization, use high repeats on the img folder.
If you use 1 repeat, only a few of your reg images will be used (the same number as your training images). If you got a big reg folder from the SEcourses or Aitrepreneur patreons or elsewhere, use the maximum number of repeats you can before (training imgs * repeats > reg imgs). You want a unique reg image per training image repeat, so you want at least as many reg images as (img count * repeats).

repeat_1 may be the easiest way to calculate max steps but it basically nullifies your regularization
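
The arithmetic above can be sketched like this, assuming the trainer pairs one reg image with each training image repeat as described. The 18 and 600 are example numbers and the function names are hypothetical.

```python
def max_repeats(train_images, reg_images):
    """Largest repeat count with train_images * repeats <= reg_images (at least 1)."""
    return max(1, reg_images // train_images)


def reg_images_used(train_images, repeats, reg_images):
    """How many distinct reg images get paired with a training image repeat."""
    return min(train_images * repeats, reg_images)


train_images, reg_images = 18, 600  # example numbers only
best = max_repeats(train_images, reg_images)
print("max repeats:", best)
print("reg images used at 1 repeat:", reg_images_used(train_images, 1, reg_images))
print(f"reg images used at {best} repeats:",
      reg_images_used(train_images, best, reg_images))
```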

mental hatch Aug 27, 2023, 2:48 AM

#

agreed.

#

Though a lot of model makers have ditched repeats for more epochs as they say it makes better, more refined, models.

#

I hate dealing with reg images it is a pain in the ass.

normal ember Aug 27, 2023, 6:25 AM

#

mental hatch Though a lot of model makers have ditched repeats for more epochs as they say it...

How do you think about batch size?

mental hatch Aug 27, 2023, 7:15 AM

#

normal ember How do you think about batch size?

I mainly do styles and use BS 8. On my 4090 that is the sweet spot for the lowest amount of time. I understand that for people BS 1 is considered best, but that is still debated.

normal ember Aug 27, 2023, 7:24 AM

#

I wonder what Ejektaflex's class token should be if he should be skipping captioning or can you skip that too somehow?

mental hatch Aug 27, 2023, 7:26 AM

#

my token is a known style to XL. For instance, my released locon has the keyword Cartoon. The fun one I just did as I learn how to do people was segal (for steven segal).

#

there will be a v2 of segal as I am not satisfied with it but I lacked images mainly

#

for a class token tbh I don't use them as I don't regularize a style as I want it to over take everything

normal ember Aug 27, 2023, 8:01 AM

#

Can you have multi word tokens?

mental hatch Aug 27, 2023, 8:14 AM

#

yes

#

steven segal man. That is the activation word(s) + class of man

normal ember Aug 27, 2023, 8:20 AM

#

did you have only "steven" or "steven man"?

#

I wonder if some stuff are way more trained in the base model and hard to change or if it doesn't matter much as long as you have a match and don't have to train the text encoder.

opal jacinth Aug 27, 2023, 8:33 AM

#

restive bridge I feel it is important to mention that repeats and epochs are interchangeable wh...

do you mind sharing your json config for lora training @restive bridge ?

restive bridge Aug 27, 2023, 9:11 AM

#

opal jacinth do you mind sharing your json config for lora training @restive bridge ?

I won't have the config on me for a couple days but it's something like
18 imgs, 1e-4, 20 repeats, 5 epochs, batch size 3, adamw, constant w/ warmup, "ohwx man", bf16, dim 64, 600 real reg photos, no captions.

That's just the current point in a perpetually evolving recipe. It's not something I'd recommend to anyone. It worked well on just one test, I have no idea how it performs at scale yet. Not efficient either: for a 24gb gpu there's a lot of vram headroom, but raising batch size seemed to hurt the quality, or maybe that's cuz I didn't and can't evenly divide the image count by the batch size, which is the correct way.
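
As a rough check on the numbers in that recipe, here is a sketch of the usual step arithmetic. It assumes steps per epoch are images * repeats divided by batch size, rounded up, and it ignores regularization images and bucketing, both of which can change the count a trainer actually reports. The function names are hypothetical.

```python
import math


def steps_per_epoch(images, repeats, batch_size):
    """Optimizer steps in one epoch, ignoring reg images and bucketing."""
    return math.ceil(images * repeats / batch_size)


def total_steps(images, repeats, epochs, batch_size):
    """Total optimizer steps over the whole run."""
    return steps_per_epoch(images, repeats, batch_size) * epochs


# 18 imgs, 20 repeats, 5 epochs, batch size 3 (from the message above)
print(total_steps(18, 20, 5, 3))

# Same total with repeats traded for epochs: 1 repeat, 100 epochs
print(total_steps(18, 1, 100, 3))

# Whether the image count divides evenly by a candidate batch size
for batch_size in (3, 4):
    print(batch_size, 18 % batch_size == 0)
```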

opal jacinth Aug 27, 2023, 9:16 AM

#

restive bridge I won't have the config on me for a couple days but it's something like 18 imgs,...

Thanks for sharing 🙂 it's just that it seems you're also trying to find the best configuration for a limited number of images, while most here are talking about the best config with a high number of images.

restive bridge Aug 27, 2023, 9:21 AM

#

opal jacinth Thanks for sharing 🙂 it's just that it seems you're also trying to find the bes...

that's true. I've used all the configs people share here and always have to adjust it a lot to work better on limited data

versed crescent Aug 27, 2023, 3:58 PM

#

restive bridge I won't have the config on me for a couple days but it's something like 18 imgs,...

What workflow are you using to confirm whether the lora is well trained? the samples during training are one thing, but whether the lora is under/overtrained needs to have more styled prompts applied

restive bridge Aug 27, 2023, 5:25 PM

#

versed crescent What workflow are you using to confirm whether the lora is well trained? the sam...

lately I start with a basic photoshoot prompt and check for likeness first. if it passes I try a vintage prompt. if over fitted it will often fail to put the person in black and white, in which case I roll back an epoch til it works right. if likeness is still there at that point I move to a heavily stylized prompt that forces an intricate outfit and environment. it will either pull likeness away, or have a ton of artifacts everywhere, or by some miracle will work good. at which point I'd try a couple more prompts and train again on different images which rarely ends well and the process restarts

ruby pond Aug 27, 2023, 11:32 PM

#

anyone got any tips for automatically capturing still frames from videos with minimal motion blur?

jade hornet Aug 28, 2023, 1:26 AM

#

That topic belongs in an animation channel

signal warren Aug 28, 2023, 11:35 AM

#

Not necessarily, he probably wants to capture stills to train

#

Personally I just capture the stills manually while watching, since I want to capture exactly what I want.

#

So I can't help much.

ruby pond Aug 28, 2023, 12:27 PM

#

yeah training an analog film lora requires stills from analog films 😄

safe cobalt Aug 28, 2023, 12:44 PM

#

Is there an advantage to using both captions and tags when fine tuning the SDXL base model, or just one or the other?

covert pagoda Aug 28, 2023, 2:20 PM

#

anyone know the cause of this error on kohya_ss: HFValidationError: Repo id must be in the form 'repo_name' or
'namespace/repo_name':
'/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safete
nsors
'. Use repo_type argument if needed.

gilded kindle Aug 28, 2023, 2:22 PM

#

Hello! My understanding is that SDXL was trained on images with a variety of aspect ratios. However, most of the Python training scripts I’ve seen involve reshaping the image to a set resolution across the train/val splits. What’s the best resource for training a SD model on varying aspect ratios?
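
For context, the usual approach is aspect ratio bucketing: images are grouped into a fixed set of resolutions with similar pixel counts instead of all being reshaped to one size. The sketch below only illustrates the idea and is not the code of any particular trainer. The 1024x1024 area budget is an assumption, the 256/2048/64 values mirror the `--min_bucket_reso`, `--max_bucket_reso` and `--bucket_reso_steps` flags in the kohya command quoted later in this thread, and the function names are made up.

```python
def make_buckets(max_area=1024 * 1024, min_side=256, max_side=2048, step=64):
    """Enumerate (width, height) buckets whose area fits within max_area."""
    buckets = set()
    width = min_side
    while width <= max_side:
        # Tallest height (a multiple of `step`) that keeps the area in budget
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # also keep the rotated bucket
        width += step
    return sorted(buckets)


def pick_bucket(image_w, image_h, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    ratio = image_w / image_h
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - ratio))


buckets = make_buckets()
for size in [(1000, 1000), (1920, 1080), (800, 1200)]:
    print(size, "->", pick_bucket(*size, buckets))
```

Each batch is then drawn from a single bucket so every image in it shares one resolution.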

gilded kindle Aug 28, 2023, 2:26 PM

#

covert pagoda anyone know the cause of this error on kohya_ss: HFValidationError: Repo id must...

You using `.from_pretrained()` in the HF library? Looks like it’s not a fan of the path you provided. I would try:

- ensure the path itself is correct. Are there any typos? Currently it seems that workspace is under your system’s root directory, is that true?
- specify the path to the parent folder, not to the .safetensors file itself
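
A small sketch of the first check. One thing to look for, which is an assumption and not something confirmed in this thread: the quoted error shows the closing quote on its own line after the filename, which could mean a stray newline got pasted into the path. A path that does not exist as typed may then be treated as a Hub repo id, which would explain the validation error. The helper name is hypothetical.

```python
import os


def check_model_path(path):
    """Return a list of likely problems with a local model path."""
    problems = []
    if path != path.strip():
        problems.append("leading or trailing whitespace/newline in path")
    cleaned = path.strip()
    if not os.path.isabs(cleaned):
        problems.append("path is not absolute")
    if not os.path.exists(cleaned):
        problems.append("path does not exist on this machine")
    return problems


# Example with a trailing newline, like a value pasted with an extra line break
example = "/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors\n"
for problem in check_model_path(example):
    print("-", problem)
```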

covert pagoda Aug 28, 2023, 3:01 PM

#

gilded kindle You using `.from_pretrained()` in the HF library? Looks like it’s not a fan of t...

Yea I’m using a safetensor model locally not sure why this hf library is coming up

gilded kindle Aug 28, 2023, 3:02 PM

#

covert pagoda Yea I’m using a safetensor model locally not sure why this hf library is coming ...

What code are you using to load your model? The HF library is very commonly used to load/run models

covert pagoda Aug 28, 2023, 3:02 PM

#

I’m running on kohya_ss on runpod. The model in question is local to the volume @gilded kindle

#

It's the source model tab

#

Under LoRA training section

#

#

📎 error.txt

#

here's the full error

#

and the cli command: accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
" --train_data_dir="/workspace/S0r4_train/img" --resolution="512,650" --output_dir="/workspace/stable-diffusion-webui/models/Lora" --logging_dir="/workspace/S0r4_train/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=128 --output_name="S0r4_10-01_p1" --lr_scheduler_num_cycles="12" --no_half_vae --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="6960" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --keep_tokens="1" --bucket_reso_steps=64 --mem_eff_attn --shuffle_caption --gradient_checkpointing --xformers --bucket_no_upscale --noise_offset=0.05 --wandb_api_key="9328358809ad058d08c0f5e53cfc7f91f3d661b4" --sample_sampler=euler_a --sample_prompts="/workspace/stable-diffusion-webui/models/Lora/sample/prompt.txt" --sample_every_n_steps="270"

white imp Aug 28, 2023, 7:37 PM

#

Hello, I'm trying to use Kohya to train a model of mine so I can switch over to using SDXL. Apparently you need to use an XL base model or something for SDXL, or so I'm told by my friend.

Anyways, I have a 6GB card, but when I try to run Kohya it just stops because CUDA runs out of memory. Is there any way in the files I can tell it my max GPU memory is 6GB?

balmy fable Aug 28, 2023, 11:06 PM

#

So, gang, anyone have any thoughts on how I would train a SDXL LoRa on what a keytar is? I've done a few test runs, and thus far I've not been able to get it to really get the hang of it at all. At the moment I'm using ~30 images (a few of just keytars alone, the rest of people playing them), no regularization images (not sure what one would even use for those), and captions along the lines of "a woman on a stage playing a keytar, keyboard (instrument)". The results aren't any better than asking stock SDXL for a picture of someone playing a keytar, and might actually be worse. I've trained people before, but not objects like this. Anyone have any pointers on doing this kind of thing?

little patio Aug 29, 2023, 12:41 AM

#

How do you deal with a size mismatch between the checkpoint/safetensors file and the "current model"?

little patio Aug 29, 2023, 2:56 AM

#

Please help. This obstacle is truly frustrating.

sinful rune Aug 29, 2023, 4:04 AM

#

I find that the model I finetuned from base SD 1.5 works well in txt2img, but when I use it in img2img with ControlNet it doesn't work well, e.g. it generates blurry faces when I use softedge. Some models on Civitai can generate nice faces even with ControlNets like openpose or softedge. My model seems to be more seriously affected by ControlNet. Does anyone know why this happens?

covert pagoda Aug 29, 2023, 11:57 AM

#

gilded kindle What code are you using to load your model? The HF library is very commonly used...

Sorted

covert pagoda Aug 29, 2023, 12:03 PM

#

stiff dust no, just keep it at 1. The alpha is a scaling factor on the strength of your lor...

Thank you so much for this. It’s really useful to put aside. Interestingly, someone (Robert Jene) suggested alpha is the scaling power of the captions, so the training listens more to the captions than the clip encoder. I’m not sure I remember this entirely correctly, but I gather this means how the text encoder is affected, which sort of makes sense because when the alpha is high, learning of concepts seems to go up but it also seems to damage the underlying model’s knowledge, giving bad anatomy or badly rendered colors (in the case of photos for instance). I don’t know if this is all wrong. Does it sound like there’s anything accurate in this depiction?

stiff dust Aug 29, 2023, 12:10 PM

#

covert pagoda Thank you so much for this. It’s really useful to put aside. Interestingly, some...

no, that sounds wrong. alpha has nothing to do with captions, you could learn without captions and alpha would still have the same meaning.

It's really just how strongly the lora is applied to the model. higher alpha means stronger lora. However, the lora also gets stronger the longer you train it. So lower alpha means you have to train it longer to reach the same effect
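
A rough sketch of the alpha point, assuming the common LoRA convention (the one kohya's sd-scripts is understood to follow) where the learned update is multiplied by alpha divided by the network dim. The function name is made up for illustration; the example values are the `--network_alpha` and `--network_dim` from the training command posted earlier in the channel.

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Multiplier applied to the LoRA update (assumed convention: alpha / dim)."""
    if network_dim <= 0:
        raise ValueError("network_dim must be positive")
    return network_alpha / network_dim


# Values from the earlier command: --network_alpha=1 --network_dim=128
low = lora_scale(1, 128)      # 0.0078125
# Same rank with alpha equal to the rank: the update is applied unscaled
full = lora_scale(128, 128)   # 1.0

print(low, full)
```

With a small multiplier the weights have to grow larger during training to reach the same effect, which matches the "lower alpha means you have to train longer" remark.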

covert pagoda Aug 29, 2023, 12:26 PM

#

stiff dust no, that sounds wrong. alpha has nothing to do with captions, you could learn wi...

Ok that makes sense. I have a question that is somehow related. I’ve noticed when testing Lora on webui that using some of the tokens from the captions in the training will suddenly show a very overfitted Lora. So I imagine a good test for new Loras is to also run a prompt using some of the original captions to see if overfitting happens. Would this be a correct way to test overfitting?

stiff dust Aug 29, 2023, 12:27 PM

#

yes. I would always use trigger words anyways, as this is the fastest way to train loras (except when you use caption dropout)

covert pagoda Aug 29, 2023, 12:29 PM

#

No, I don’t mean solely the trigger word. I mean certain descriptive words in the caption… like a leather jacket and an accessory. Which upon use in the prompt immediately makes the gen look like one of the dataset images. Whereas the trigger on its own generalises the likeness of the character

stiff dust Aug 29, 2023, 12:59 PM

#

yeah, if a word that occurs only in part of the training images has a strong effect on the lora, then this is a clear sign of overfitting

tepid sundial Aug 29, 2023, 1:02 PM

#

In the cloneofsimo Lora repo there was a feature to add new tokens to the tokenizer and introduce new embeddings for them, optionally initialized from existing ones - as part of LoRA training. Is there any reason this type of approach has fallen out of favour in other training scripts/repos/workflows? One can achieve this with for example sd-scripts too, as there is TI. But just wondering if there's a good reason why it's not more often suggested as part of training style and character LoRA.

#

We're seeing with SDXL that it is fairly common for people to only train the unet, and given that, it would seem that performing a TI before training a LoRA should be beneficial to unet training.

jade hornet Aug 29, 2023, 2:00 PM

#

white imp Hello, Im trying to use Kohya to train a model of mine so I can switch over to u...

6gb is probably not enough for training. You can use cloud compute to do it

covert pagoda Aug 29, 2023, 3:32 PM

#

Anybody here going to CogX Festival ‘23 London? https://stabilityaicogx.splashthat.com/

dusky aurora Aug 29, 2023, 4:18 PM

#

could someone help me with img2img upscaling ? Ive been trying to use this controlnet ultimate sd upscaling method and the results look like when I look into my trash can.

stiff dust Aug 29, 2023, 8:35 PM

#

tepid sundial In the cloneofsimo Lora repo there was a feature to add new tokens to the tokeni...

I agree, it's super annoying that there is no easy way to bundle TI and Lora.

restive bridge Aug 30, 2023, 12:13 AM

#

I wish I could set different learning rates between training images and regularization images. I just discovered that its the regularization giving me clothing/environment artifacts. But dropping the reg wrecks the face quality.

jade hornet Aug 30, 2023, 1:26 AM

#

dusky aurora could someone help me with img2img upscaling ? Ive been trying to use this contr...

Is that bad?

ruby pond Aug 30, 2023, 1:42 AM

#

restive bridge I wish I could set different learning rates between training images and regulari...

I was even getting images that looked more like the reg images than the training images

restive bridge Aug 30, 2023, 2:04 AM

#

ruby pond I was even getting images that looked more like the reg images than the training...

such a bummer. at this point I'm forced to choose between likeness or flexibility. if only captions worked better

quiet eagle Aug 30, 2023, 6:58 AM

#

restive bridge I wish I could set different learning rates between training images and regulari...

how much did you experiment with having less/more regdata and the ratio of training data to reg data (e.g. via repeats)

little patio Aug 30, 2023, 1:14 PM

#

little patio How do you deal with a size mismatch between the checkpoint/safetensor's file an...

Has no one else encountered this issue?

stiff dust Aug 30, 2023, 1:43 PM

#

restive bridge I wish I could set different learning rates between training images and regulari...

I thought there is a regularization strength parameter

restive bridge Aug 30, 2023, 4:43 PM

#

quiet eagle how much did you experiment with having less/more regdata and the ratio of train...

I tried the same amount of reg as training, twice as many, and 600, with 1-20 repeats on imgs. 600 (one unique reg per img repeat) gave the best faces.

restive bridge Aug 30, 2023, 4:43 PM

#

stiff dust I thought there is a regularization strength parameter

where is that 👀

opal jacinth Aug 30, 2023, 7:15 PM

#

restive bridge where is that 👀

I assume he is referring to https://github.com/bmaltais/kohya_ss/wiki/LoRA-training-parameters#prior-loss-weight
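
A minimal sketch of what that setting is understood to do, assuming it works the way the linked wiki describes: the loss from regularization images is multiplied by the prior loss weight, while training images keep weight 1. The function name and the loss numbers are made up for illustration; this is not the actual kohya code.

```python
def weighted_batch_loss(losses, is_reg, prior_loss_weight=1.0):
    """Mean loss where regularization samples are scaled by prior_loss_weight."""
    if len(losses) != len(is_reg):
        raise ValueError("losses and is_reg must have the same length")
    weighted = [
        loss * (prior_loss_weight if reg else 1.0)
        for loss, reg in zip(losses, is_reg)
    ]
    return sum(weighted) / len(weighted)


losses = [0.2, 0.4]       # made-up per-sample losses
is_reg = [False, True]    # the second sample is a regularization image

equal_weight = weighted_batch_loss(losses, is_reg, prior_loss_weight=1.0)
half_weight = weighted_batch_loss(losses, is_reg, prior_loss_weight=0.5)
print(equal_weight, half_weight)
```

Lowering the weight reduces how hard the reg images pull on the result without removing them, which is roughly the "different learning rate for reg images" that was asked for.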

restive bridge Aug 30, 2023, 7:19 PM

#

opal jacinth I assume he is referring to https://github.com/bmaltais/kohya_ss/wiki/LoRA-train...

wow, cant believe i never realized what that setting meant. That should fix some problems. Thank you both

opal jacinth Aug 30, 2023, 7:21 PM

#

restive bridge wow, cant believe i never realized what that setting meant. That should fix some...

I'm looking forward to hearing your results! 🙂

opal jacinth Aug 30, 2023, 7:27 PM

#

restive bridge wow, cant believe i never realized what that setting meant. That should fix some...

by the way, did you find the reg images more useful from SECourses or Aitrepreneur?

restive bridge Aug 30, 2023, 7:29 PM

#

opal jacinth by the way, did you find the reg images more useful from secourses or aitepreneu...

SECourses if the goal is just faces. Aitrepreneur's had way more dynamic poses, more focus on actions than faces. I never directly compared results tho

latent charm Aug 31, 2023, 7:36 AM

#

Does anyone know about the cropped training in the SDXL report? What is the benefit of this type of training? I am planning to create a tool that crops the training set based on a user prompt to create a subset for lora training

open merlin Aug 31, 2023, 3:06 PM

#

Am training using the prodigy learning rate scheduler. Anyone understand it? It seems that learning rate changes over time depending on the ratio of norm and key norm. But I cant find what 'key norm' means and how it is different from norm.

#

rancid acorn Aug 31, 2023, 10:36 PM

#

dusky aurora could someone help me with img2img upscaling ? Ive been trying to use this contr...

It would be best to use ControlNet Tile along with Ultimate SD Upscale to get coherent, good quality upscales.

This comment I made on Reddit might help you get better upscales:

https://www.reddit.com/r/StableDiffusion/comments/142qmea/an_realistic_high_resolution_photo_of_an_rocky/jn655ms/?utm_source=share&utm_medium=ios_app&utm_name=iossmf&context=3

glass sorrel Sep 1, 2023, 4:15 PM

#

Hello community!
Could you please share how to fine-tune on different kinds of images, rather than training on a single type of image like Dreambooth?

Ex: the training set should contain different images (shirt, pants, shoes, watch, etc.), each with its respective prompt for training.

Could you please let me know how I can do this?

Thank you

normal ember Sep 1, 2023, 8:02 PM

#

32 vs 128 network rank

#

gentle flame Sep 1, 2023, 11:13 PM

#

👀
https://github.com/lodestone-rock/SDXL-sharding

west solstice Sep 2, 2023, 1:22 AM

#

Hey all, still trying to wrap my head around regularization images. What exactly should they be? Are they meant to be a good example of the class or the literal model output of that class?

#

IE I’m training a “Steve face” Lora, my regularization images would be “Man face”

#

Do I use a dataset of high quality examples for the “Man face” class? Or should I hop in to A1111 and crank out 800 literal “Man face” images with no negative prompts, presumably using the checkpoint I am training with?

#

I’m getting even further confused by conflicting information online about the value / necessity of even having regularization images. There is a lot of misinformation going around

gentle flame Sep 2, 2023, 1:50 AM

#

west solstice Hey all, still trying to wrap my head around regularization images. What exactly...

take your captions, remove the instance tag, generate with them. Make sure they have a caption file similar to the image dataset minus the instance tag.

#

do not fiddle with them to get "good results"

west solstice Sep 2, 2023, 1:52 AM

#

so lets say image 1 is - steve, riding bike, blue shirt. the regularization would be man, riding bike, blue shirt?

gentle flame Sep 2, 2023, 1:53 AM

#

it would be
riding bike, blue shirt

#

Adding man is a good idea, but it should be a part of the regular dataset caption as well, or whatever the man face is on
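
The caption rule described here (same caption, minus the instance tag) can be sketched like this. It assumes comma-separated tag captions like the "steve, riding bike, blue shirt" example; the helper name is made up.

```python
def make_reg_caption(caption: str, instance_tag: str) -> str:
    """Return the caption with the instance tag removed, for generating a reg image."""
    tags = [tag.strip() for tag in caption.split(",")]
    kept = [tag for tag in tags if tag and tag.lower() != instance_tag.lower()]
    return ", ".join(kept)


reg_caption = make_reg_caption("steve, riding bike, blue shirt", "steve")
print(reg_caption)  # riding bike, blue shirt
```

The resulting text would be used both as the prompt to generate the reg image and as the contents of that reg image's caption file.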

west solstice Sep 2, 2023, 1:55 AM

#

gotcha

gentle flame Sep 2, 2023, 1:56 AM

#

regularization images help to prevent overfitting and limit what the model adds. You don't have to add them, but they can help if you're having overfit issues. I know of an example of a lora with and without reg images, but it's not sfw so can't share here. I can say that the regularization images helped make it more flexible though.

west solstice Sep 2, 2023, 1:58 AM

#

Thanks so much! Is there a good place to learn these concepts from the engineering side? Much of the current youtube content feels a bit like rando content creators producing "guides" when they don't know what they are doing

gentle flame Sep 2, 2023, 2:01 AM

#

I don't know any good guides or sites. I learned through discord. You could search here if you want to, but unfortunately there isn't a ton of info on regularization images since few people use them.

#

I also recommend using the sampler that you'll be training on for the reg images, which is usually DDIM.

#

oh yeah, forgot to mention this, but regularization images DO NOT have to match the name of the image dataset. They are not paired. You only have to make sure the reg caption file matches the reg image name.

#

and make sure they're the same resolution as what you'll be training on

latent charm Sep 2, 2023, 8:36 AM

#

@hollow spruce Hello, I did an experiment with anatomy subsets for lora training. I used groundingdino to crop the original training images into face and hand subsets. I hand-picked the good images, added [face focus, hand focus] to each subset, and kept all captions from wd14. The main dataset captions were generated by wd14, with a prefix added, and human reviewed. It kind of improved hands in the generations. But I still think it is undertrained. All datasets are set to repeat 3, and I am trying to increase the subset repeats to 10 to see if it helps.

#

new lora vs old lora. The main dataset is the same. The old one was trained multiple times with different settings. The new one was trained once. Both took around 16 training hours on a 3090.

#

The old lora's text encoder seems to be broken.

#

The new one seems to be undertrained

opal jacinth Sep 2, 2023, 9:29 AM

#

normal ember

thanks for sharing. what training settings did you use?

normal ember Sep 2, 2023, 9:41 AM

#

opal jacinth thanks for sharing. what training settings did you use?

I can share that later when I’m at the computer. But basically unet training only, 3000 steps, 4e-4 and a keyword for the dataset. Tried to train for a specific film stock.

#

I think one can see tendencies for tiny bit of overfitting in the higher rank. Look at the missing tents in the 128. Prompt included campsite.

latent charm Sep 2, 2023, 12:08 PM

#

Could we train embedding for SDXL?

stone garden Sep 2, 2023, 2:18 PM

#

so is it normal that lora training takes over certain haircolors and styles and i cant get rid of them?

#

i cut out all faces without hair and train the lora with only faces lets see if it works

latent charm Sep 2, 2023, 2:21 PM

#

It is usually because your caption removed all haircolors and styles. The lora learned with that

stone garden Sep 2, 2023, 3:57 PM

#

latent charm It is usually because your caption removed all haircolors and styles. The lora l...

you mean that i dont have it in my prompt?

#

or what you mean with caption

latent charm Sep 2, 2023, 3:59 PM

#

Usually, you would have a .txt file which is a pair of your image. 1.png should have 1.txt. Inside the text, it should have prompts to describe the paired image. The "caption" means the .txt file
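
A small sketch for checking that pairing before training, assuming captions sit next to the images with the same file stem (1.png with 1.txt). The function name and the extension list are illustrative choices, and the demo uses a throwaway temp folder.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def images_missing_captions(folder, caption_ext=".txt"):
    """List image file names in folder that have no caption file with the same stem."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        is_image = path.suffix.lower() in IMAGE_EXTS
        if is_image and not path.with_suffix(caption_ext).exists():
            missing.append(path.name)
    return missing


# Demo: 1.png has a caption, 2.png does not
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1.png").touch()
    (Path(tmp) / "1.txt").write_text("a woman, red hair, ponytail")
    (Path(tmp) / "2.png").touch()
    demo_missing = images_missing_captions(tmp)

print(demo_missing)  # ['2.png']
```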

stone garden Sep 2, 2023, 4:00 PM

#

latent charm Usually, you would have a .txt file which is a pair of your image. 1.png should ...

ah yeah i know that cause ive done it already 😄 But you mean it happens because i didnt describe the haircolor and hairstyle on the text files properly

#

anyways i trained it with only the cutout face without hairs and it worked very good

#

i first cut out the faces with paint.net lasso tool

latent charm Sep 2, 2023, 4:01 PM

#

stone garden ah yeah i know that cause ive done it already 😄 But you mean it happens because...

If you want to have the flexibility to change the haircolor and hairstyle with the images that aren't cut out, you should mention them in the caption.

stone garden Sep 2, 2023, 4:02 PM

#

then i removed background with the a1111 extension (couldve maybe saved them as png to avoid that step) then in the text files i only described the face and now it works much better

latent charm Sep 2, 2023, 4:04 PM

#

Multiple ways to achieve the same result. It is SD. It has no right and wrong.

stone garden Sep 2, 2023, 4:04 PM

#

latent charm If you want to have the flexibility to change the haircolor and hairstyle with t...

I think with my pics the best decision is to cut out, cause my pics contained a complex haircolor and hairstyle and it seemed like describing it didnt work

stone garden Sep 2, 2023, 4:05 PM

#

latent charm Multiple ways to achieve the same result. It is SD. It has no right and wrong.

i dont think the text files were the problem for me

#

i described them pretty good

#

except separating the prompt in the text caption with commas has an effect

#

My summed up theory: If you want to train a lora model just for a face of a person, cutting out only the face/head without the hairs and body is the best/fastest method

latent charm Sep 2, 2023, 4:11 PM

#

If you use reg images, it is fine

stone garden Sep 2, 2023, 4:14 PM

#

I didnt use them yet 🤔

latent charm Sep 2, 2023, 4:16 PM

#

You might want to test your theory

covert pagoda Sep 2, 2023, 4:26 PM

#

@latent charm hey have you played with anymore captioning scripts?

latent charm Sep 2, 2023, 4:28 PM

#

covert pagoda @latent charm hey have you played with anymore captioning scripts?

I have tried that. But for my lora training, the only useful one is wd14

#

I made a preprocessing tool which would crop images from dataset as a subset. After that, I would use wd14 to caption the images. WIP

covert pagoda Sep 2, 2023, 5:47 PM

#

latent charm I made a preprocessing tool which would crop images from dataset as a subset. Af...

Cropping in to hands and faces? But then you need to upscale with high quality like topaz or latent upscale with SD no? To get proper detail quality of the focus?

#

I do all my focus crops manually straight out of topaz crop/upscale

latent charm Sep 2, 2023, 5:48 PM

#

I didn't do any upscale for the cropped image yet. But I deleted too small images.

#

I use groundingdino to auto-select the focus from a provided prompt like face or hand, and keep detections scoring > 0.5. After that I review the cropped images and remove some.
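
The filtering step can be sketched without the detector itself. The detection format below (dicts with label, score and a pixel box) is a made-up stand-in for whatever the detector returns, and the 256 px minimum side is an assumed value, since the message only says that too-small crops were deleted.

```python
def select_crops(detections, min_score=0.5, min_side=256):
    """Keep detections above the score threshold whose box is not too small."""
    selected = []
    for det in detections:
        x0, y0, x1, y1 = det["box"]
        width, height = x1 - x0, y1 - y0
        if det["score"] > min_score and min(width, height) >= min_side:
            selected.append(det)
    return selected


# Hypothetical detections for one image
detections = [
    {"label": "face", "score": 0.82, "box": (100, 50, 612, 562)},    # 512x512, kept
    {"label": "hand", "score": 0.41, "box": (300, 600, 700, 1000)},  # score too low
    {"label": "hand", "score": 0.77, "box": (10, 10, 90, 120)},      # crop too small
]

kept = select_crops(detections)
print([det["label"] for det in kept])  # ['face']
```

The manual review mentioned above would still happen after this step.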

stone garden Sep 2, 2023, 11:20 PM

#

what's the best way to finetune with ~50k images?

latent charm Sep 3, 2023, 5:56 AM

#

Sharing some selected results with this config.
The lora was trained in two cycles.
The first one trained for 16 hours on a 3090.
The second one trained for 8 hours, and I reduced the text encoder learning rate to half of the original, which is 0.000025.
The dataset contains 3 folders: 3_face, 3_hand, 3_woman. Around 750 images in total.
The woman dataset contains the original selected photos of the person.
The face and hand sets were auto-cropped from the woman dataset with my preprocessing tool.
After that, I removed small images and complicated images (especially in the hand dataset).
I tagged them with wd14 and deleted wrong tags, keeping all recognized tags.
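
For reference, folder names like 3_face follow the kohya convention of a repeat count, an underscore, then the concept name. A minimal sketch of reading that convention (the helper name is made up):

```python
def parse_dataset_folder(name: str):
    """Split a folder name like '3_face' into (repeats, concept)."""
    repeats, sep, concept = name.partition("_")
    if not sep or not repeats.isdigit() or not concept:
        raise ValueError(f"expected '<repeats>_<concept>', got {name!r}")
    return int(repeats), concept


folders = ["3_face", "3_hand", "3_woman"]
parsed = [parse_dataset_folder(folder) for folder in folders]
print(parsed)  # [(3, 'face'), (3, 'hand'), (3, 'woman')]
```

Giving a subset more weight, like the planned change to 10 repeats, would mean renaming its folder, e.g. 10_hand.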

latent charm Sep 3, 2023, 6:33 AM

#

set LoRA network weights to your lora and continue the training

latent charm Sep 3, 2023, 3:13 PM

#

You still use the same model. You need to set the lora network weights to resume from the lora training.

#

latent charm Sep 3, 2023, 5:18 PM

#

I don't know what you mean by "low pixel". If you mean blur, noise or any other effect you don't want that appears in your generations, it might be related to your dataset

#

Image resolution isn't related to the lora

stone garden Sep 3, 2023, 5:22 PM

#

So i trained lora with a character and i described the hairstyle in every text caption but still it sticks to the hairstyle from the character until i go down to like 0.6 weight

#

whats a good way to fix that?

latent charm Sep 3, 2023, 5:26 PM

#

stone garden whats a good way to fix that?

Do you have different hairstyles or only one hairstyle?

stone garden Sep 3, 2023, 5:26 PM

#

latent charm Do you have different hairstyles or only one hairstyle?

only one

latent charm Sep 3, 2023, 5:27 PM

#

You might try to add more hairstyles into dataset. It seems this hairstyle is overfitted to the lora

stone garden Sep 3, 2023, 5:28 PM

#

latent charm You might try to add more hairstyles into dataset. It seems this hairstyle is ov...

i would have to manipulate the pics then because the character i use only has 1 hairstyle yet, well one other hairstyle i could add

#

or ill just use inpaint for other hairstyles

#

i thought maybe theres a way to make the AI mix the hairstyle of the lora with the rest of the dataset

normal ember Sep 3, 2023, 6:40 PM

#

3000 steps (55 epochs), 220 images, constant AdamW, unet only, lr 4e-4, batch size 4.
ref, 8, 6, 32, 64, 128
gen strength model: 0.6
gen strength clip: 0.9
a woman with eyes that have seen too much, enveloped in the twilight of a dense pine forest, with remnants of a long-abandoned campsite

#

ref, 8, 6, 32, 64, 128

#

ref, 8, 6, 32, 64, 128

#

ref, 8, 6, 32, 64, 128

#

Looking at only these I think network of 8 does the trick.

latent charm Sep 3, 2023, 6:58 PM

#

I think network 8 is under train

#

The likeness is not enough for me

north meadow Sep 3, 2023, 6:59 PM

#

hello, I have a question about training a custom model with dreambooth

#

not sure if this is the right channel tbh

#

but lets go

#

My question is about the instance prompt and how unique each word should be

#

Say, if Ive named my instance prompt as "photo of humanoid2dside person", would the model understand the string "humanoid2dside" as completely new and unique argument during generation?
By no means the model should understand the prompt "photo of humanoid2dside person" as "photo of humanoid 2d side person"

#

Im not entirely sure how stable diffusion recognizes each word contained in the prompt during generation.
But for my use case, a prompt with "humanoid2dside" should not be the same thing as a prompt with "humanoid 2d side"

latent charm Sep 3, 2023, 7:07 PM

#

#

You could install it from ext

north meadow Sep 3, 2023, 7:08 PM

#

hmmmmm

#

thanks, I was not aware of this

#

so that means with the string "humanoid2dside" that stable diffusion would identify the tokens "human", "oid", "2" and "dside"?

#

I see

#

probably I should use a more unique name then

#

I will try some strings on the tokenizer to see what works

#

thanks man

latent charm Sep 3, 2023, 7:11 PM

#

That's fine once you start to train the text encoder. It will learn your token, humanoid2dside, for your images

north meadow Sep 3, 2023, 7:12 PM

#

but once trained properly, is it guaranteed to always consider humanoid2dside as a unique token?

latent charm Sep 3, 2023, 7:12 PM

#

You don't need a "unique" long string for your training, you just need something that won't cause conflicts. That should be enough.

north meadow Sep 3, 2023, 7:14 PM

#

I will try it out to see if it works then

#

having a means to at least know if my token is being identified properly is already a step forward

#

thank you

normal ember Sep 3, 2023, 7:30 PM

#

latent charm I think network 8 is under train

I haven't been clear enough. I'm not training for characters.

latent charm Sep 3, 2023, 7:32 PM

#

that makes sense

normal ember Sep 3, 2023, 7:34 PM

#

Should probably try dreambooth next to see what I might end up with.

next tapir Sep 3, 2023, 11:48 PM

#

Is there a way to "negatively caption" things during training? I'm training a style, and every once in a while, real images bleed into the results. I can negative prompt photograph during generation, but I'd like to stop this from happening during the training phase so that it's not a burden for people who use the LoRa, if possible.

I know that I could use regularization images, but since I'm training a generic artistic style, the time needed to create and generate a wide diversity of regularization images seems rather egregious. I was hoping that there would be an easier way of training out specific elements from the resulting style.

latent charm Sep 4, 2023, 5:20 AM

#

next tapir Is there a way to "negatively caption" things during training? I'm training a st...

There is no way to "negatively caption" during training. Your caption is paired with the image. You should add a tag that describes what you want to extract, for example "photograph". Once photograph is in your captions, you can try adding photograph to the negative prompt at generation time.

sonic narwhal Sep 4, 2023, 12:43 PM

#

How much estimated VRAM will u need to do full finetune of SDXL?

Doing a LoRa on 24gb 3090 with Kohya_ss took about 22 hours with batch size 1, repeats 1 and epochs 90

stiff dust Sep 4, 2023, 2:11 PM

#

for lora you need not more than 12gb. With a 3090 you can do batch size 10 without problems

#

training should be fast, but of course that also depends on your number of training images. But if you have so many images that it takes 22 hours I would increase batch size

sonic narwhal Sep 4, 2023, 2:12 PM

#

I had 80 images

#

my goal is later to do full fine tune of a lot of different concepts

#

up to 5000-10000 images

#

will I need more Vram?

stiff dust Sep 4, 2023, 2:13 PM

#

north meadow but once trained properly, is it garanteed to always consider humanoid2dside as ...

I got really good results with training unet-only on rare tokens.

stiff dust Sep 4, 2023, 2:14 PM

#

sonic narwhal I had 80 images

you don't need 22 hours for 80 pictures...

sonic narwhal Sep 4, 2023, 2:14 PM

#

ill send the json I used

#

@stiff dust

📎 Rough_portrait_20230811-153628.json

#

number of repeats on image folder was only 1

#

all images were 1080x1350

stiff dust Sep 4, 2023, 2:20 PM

#

looks right. I would try a higher batch size. I have a 3090, too, and it definitely doesn't need that much time

sonic narwhal Sep 4, 2023, 2:22 PM

#

hm okay, ill try again and see how it goes

latent charm Sep 4, 2023, 2:24 PM

#

Are you using another program like comfyui or webui together while training?

gentle flame Sep 4, 2023, 3:48 PM

#

sonic narwhal How much estimated VRAM will u need to do full finetune of SDXL? Doing a LoRa o...

I don't know how to work this and it uses jax, but this might help you with vram requirements for a finetune if you have a few more 3090s
https://github.com/lodestone-rock/SDXL-sharding/tree/main

undone bluff Sep 4, 2023, 4:07 PM

#

im trying to train basically a character with lora. because i only have 8gb of vram im using a google colab. my training images are all realistic images and if i generate some images with that lora i only get this realistic style, even though i use e.g. a more drawn, anime style checkpoint. what could cause this kind of behavior? i had about 30 images with 12 repeats and 10 epochs (i tested all epochs, same result). also how would i go about it if i want multiple of these characters in one shot? what tags should i use when training? like "2MY_CHARACTER" and "1MY_CHARACTER"? or whats the best way to do it so i can use it later in my prompt

latent charm Sep 4, 2023, 4:12 PM

#

a photo of [your character] while training and replace the photo to other style while generation

undone bluff Sep 4, 2023, 4:20 PM

#

so i dont have to put in the quantity of my characters which are in the images? in some photos the characteristics of my character are seen in 2 entities of my images. so i dont have to put "a photo of 2 (my character)"?
also could you elaborate on "replace the photo to other style"? im not sure what exactly you mean?

stiff dust Sep 4, 2023, 7:10 PM

#

- use the word "photo" in your captions
- use a rare name for your character (not: john hammerfall, better: john hmrzufl)
- only train unet, never train Text encoder

latent charm Sep 5, 2023, 1:17 AM

#

undone bluff so i dont have to put in the quantity of my characters which are in the images? ...

when you put photo in your caption, the lora learns that it is a photo of your character. After the training finishes, use your lora to generate images. At that point, you can use a style other than photo to generate your character.

undone bluff Sep 5, 2023, 4:43 AM

#

I’ve put the term realistic in but I will try it with the tag photo as well, thank you ✌️

sonic narwhal Sep 5, 2023, 8:38 AM

#

seems currently A6000 is minimum requirement to do full finetune of SDXL

#

Looking at SimpleTuner from bghira

stone garden Sep 5, 2023, 9:09 AM

#

Can i add random photos with hairstyles and another prefix in the caption to my lora training images to be able to mix more hairstyles and colors to my character?

stone garden Sep 5, 2023, 9:10 AM

#

undone bluff im trying to train basically a character with lora. because i only have 8gb of v...

Fun fact: you dont need more than 8GB Vram to train on your pc, you just need to enable two settings in the parameters and it will work with 8GB, but colab or runpod is faster

latent charm Sep 5, 2023, 9:15 AM

#

stone garden Can i add random photos with hairstyles and another prefix in the caption to my ...

it should be able to, if it is trained properly

latent charm Sep 5, 2023, 12:57 PM

#

I enabled the text encoder in training. When I train the same lora multiple times, how can I tell when the text encoder has been trained enough and stop the TE training?

sonic narwhal Sep 5, 2023, 1:18 PM

#

stiff dust looks right. I would try a higher batch size. I have a 3090, too, and it definit...

Did it again with batch size 8 and it took 2h 30 min

#

thanks

north meadow Sep 5, 2023, 2:16 PM

#

stiff dust - use the word "photo" in your captions - use a rare name for your character (no...

wouldnt using a common token like "john" as a rare name be a problem, since the base model might have been trained on other images identified as "john"?

stiff dust Sep 5, 2023, 2:20 PM

#

the original Dreambooth paper suggested to use "sks person". However, "sks" has a meaning and is not that rare, so using something like "tdjvr" might make more sense. Instead of person you can simply use a real name. It transports more information (john -> male, western culture) and is more natural to prompt.

north meadow Sep 5, 2023, 5:02 PM

#

is there any documentation about these DW openpose arguments?

latent charm Sep 5, 2023, 5:11 PM

#

It is finetune channel. You might ask in SDXL

north meadow Sep 5, 2023, 5:13 PM

#

ok

restive bridge Sep 5, 2023, 8:02 PM

#

latent charm I was enbled text encoder in training. While I train the same lora in multiple t...

the "stop text encoder training" parameter doesnt work anyways:( but I'd assume as always the only way to find the right freeze point is experiment and see

rain scarab Sep 5, 2023, 8:14 PM

#

im training a sdxl lora with 163 images using Kohya at 1024,1024. It says it is going to take 5 hours? Is that normal? GPU: RTX 4080 (16 GB), i9-13900K, 32 GB RAM, M.2 SSDs

Here is the configuration

{
"LoRA_type": "Standard",
"adaptive_noise_scale": 0,
"additional_parameters": "",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"cache_latents": true,
"cache_latents_to_disk": true,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": "",
"clip_skip": "1",
"color_aug": false,
"conv_alpha": 1,
"conv_block_alphas": "",
"conv_block_dims": "",
"conv_dim": 1,
"decompose_both": false,
"dim_from_weights": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 10,
"factor": -1,
"flip_aug": false,
"full_bf16": false,
"full_fp16": false,
"gradient_accumulation_steps": "1",
"gradient_checkpointing": true,
"keep_tokens": "0",
"learning_rate": 0.0004,
"logging_dir": "<>/KOHYA/LoraPics/finalenvoy\\log",
"lora_network_weights": "",
"lr_scheduler": "constant",
"lr_scheduler_args": "",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 0,
"max_bucket_reso": 2048,
"max_data_loader_n_workers": "0",
"max_resolution": "1024,1024",
"max_timestep": 1000,
"max_token_length": "75",
"max_train_epochs": "",
"max_train_steps": "",
"mem_eff_attn": true,
"mid_lr_weight": "",
"min_bucket_reso": 256,
"min_snr_gamma": 0,
"min_timestep": 0,
"mixed_precision": "bf16",
"model_list": "custom",
"module_dropout": 0,
"multires_noise_discount": 0,
"multires_noise_iterations": 0,
"network_alpha": 1,
"network_dim": 1,
"network_dropout": 0,
"no_token_padding": false,
"noise_offset": 0,
"noise_offset_type": "Original",
"num_cpu_threads_per_process": 2,
"optimizer": "Adafactor",
"optimizer_args": "scale_parameter=False relative_step=False warmup_init=False",
"output_dir": "<>/KOHYA/LoraPics/finalenvoy\\model",
"output_name": "warforged_chk_pt",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "<>/stable-diffusion-webui - dream/models/Stable-diffusion/sd_xl_base_1.0_0.9vae.safetensors",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"reg_data_dir": "",
"resume": "",
"sample_every_n_epochs": 0,
"sample_every_n_steps": 0,
"sample_prompts": "",
"sample_sampler": "k_dpm_2_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "bf16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 0,
"sdxl": true,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": true,
"seed": "",
"shuffle_caption": false,
"stop_text_encoder_training": 0,
"text_encoder_lr": 0.0004,
"train_batch_size": 5,
"train_data_dir": "<>/KOHYA/LoraPics/finalenvoy\\img",
"train_on_input": true,
"training_comment": "",
"unet_lr": 0.0004,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"v_pred_like_loss": 0,
"vae_batch_size": 0,
"wandb_api_key": "",
"weighted_captions": false,
"xformers": "sdpa"
}

jade hornet Sep 5, 2023, 8:42 PM

#

rain scarab im training a sdxl lora with 163 images using Kohya at 1024,1024. It says it is ...

Training time depends on the number of epochs and the steps per epoch, which is images * repeats + reg images (divided by the batch size). Your card is 16 GB, so you should consider a smaller batch size, probably 1 with only 163 images
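A rough sketch of the step arithmetic described above. The 163 images, batch size 5 and 10 epochs come from the posted config; `repeats=10` is a hypothetical value, since the repeat count is not in the JSON. Bucketing can shift the real step count slightly, so treat this as an estimate only.

```python
import math

def estimate_steps(num_images, repeats, reg_images, batch_size, epochs):
    """Return (steps_per_epoch, total_steps) for a simple LoRA run."""
    samples_per_epoch = num_images * repeats + reg_images
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# repeats=10 is an assumed value, not taken from the posted config.
steps_per_epoch, total_steps = estimate_steps(
    num_images=163, repeats=10, reg_images=0, batch_size=5, epochs=10
)
print(steps_per_epoch, total_steps)

# If the run really takes 5 hours, this is the implied time per step.
print(5 * 3600 / total_steps, "seconds per step")
```

Plugging in the real repeat count and the seconds per step shown in the console should tell you whether 5 hours is plausible for this setup.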

opal jacinth Sep 5, 2023, 8:55 PM

#

does it actually matter if some input training images are rotated?

#

or is it simply a "display setting" of the image for the OS and doesn't matter anyway?

latent charm Sep 5, 2023, 11:56 PM

#

restive bridge the "stop text encoder training" parameter doesnt work anyways:( but I'd assume ...

Because I train the LoRA multiple times. I could set the TE learning rate to, e.g., 0.00005 in the first training and set it to 0 in the second training.

regal harbor Sep 6, 2023, 6:07 AM

#

anyone fine-tuning BLIP2?

stone garden Sep 6, 2023, 6:11 AM

#

Is separating the prompts in the caption files with commas better?

stiff dust Sep 6, 2023, 6:28 AM

#

opal jacinth does it actually matter if some input training images are rotated?

what do you mean? Training images should look like you want them in the output. Modifications like flipping can help training, though

stiff dust Sep 6, 2023, 6:30 AM

#

latent charm Because I train the lora multiple times. I could set te training rate, e.g. 0.00...

yep, in my opinion it makes more sense anyways to train text encoder and unet separately. So text encoder first, then unet.
However, training text encoder is difficult and in most cases I found training unet only is better for generalization

latent charm Sep 6, 2023, 6:32 AM

#

stiff dust yep, in my opinion it makes more sense anyways to train text encoder and unet se...

I usually get better results when training a character concept with telr.

opal jacinth Sep 6, 2023, 6:36 AM

#

stiff dust what do you mean? Training images should look like you want them in the output. ...

I used a script for automatic cropping to 1:1, but afterwards some images were flipped. So I was wondering if it affects the training

stiff dust Sep 6, 2023, 8:30 AM

#

opal jacinth I used a script for automatic cropping to 1:1, but afterwards some images were f...

flipping is rather good. I would just avoid it if there are features that are side-specific (e.g. a mole that should always be on the left side). Otherwise it will rather improve training

stiff dust Sep 6, 2023, 8:32 AM

#

latent charm I usually get better result when training character concept with telr.

it might depend on what you want to achieve. I found that text encoder training adapts very fast to the images, BUT the resulting model becomes very inflexible when it comes to drawing the character in a different art style (e.g. from photo to anime or from painting to comic) or when you want to draw the character from a different angle, with different clothing and so on. Text encoder training often ends up giving me images that are too similar to the training images in terms of composition

opal jacinth Sep 6, 2023, 8:32 AM

#

stiff dust flipping is rather good. I would just avoid it if there are features that are si...

in my case it's about faces. you think it would even improve the result? that's interesting, how come? 🙂

stiff dust Sep 6, 2023, 8:33 AM

#

I assume it has something to do with the pooling. It seems that by text encoder training your trigger words become too dominant and the remaining prompt will be ignored very often

latent charm Sep 6, 2023, 8:33 AM

#

stiff dust it might depend on what you want to achieve. I found that text encoder training ...

I think it's related to how you prepare your captions

stone garden Sep 6, 2023, 8:34 AM

#

man im gonna use dreamlook.ai for training next time

#

3200 steps in 7 minutes?

stiff dust Sep 6, 2023, 8:34 AM

#

opal jacinth in my case it's about faces. you think it would even improve the result? that's ...

more variety. If you have enough images (say, more than 10) it might not be important. But if you have very few images, the unet tends to produce artefacts. By flipping you increase the number of images artificially (flipped images are not entirely new, but at least they are slightly different)
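A minimal sketch of the flipping idea, with images represented as plain nested lists of pixel values so it needs no image library. The function names are made up for illustration; the config posted earlier also has a `flip_aug` key, which presumably does the same thing inside the trainer.

```python
def hflip(image):
    """Mirror an image given as a list of pixel rows (left becomes right)."""
    return [row[::-1] for row in image]

def with_flips(images):
    """Return the originals followed by a mirrored copy of each image."""
    return list(images) + [hflip(img) for img in images]

tiny = [[1, 2, 3],
        [4, 5, 6]]
print(hflip(tiny))              # [[3, 2, 1], [6, 5, 4]]
print(len(with_flips([tiny])))  # 2
```

The dataset doubles in size, but each flipped image is only a mirrored view of an existing one, which is why side-specific features (the mole example) are a reason to skip it.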

stone garden Sep 6, 2023, 8:35 AM

#

10k steps in 15

#

https://dreamlook.ai/ anybody used it before?

dreamlook.ai

Train Stable Diffusion models in minutes. Scale up to 1000s of runs per day.

stiff dust Sep 6, 2023, 8:35 AM

#

latent charm I think it related to how to prepare your captions

I always use manual captions. If you have some trick to improve on that, I'm glad to hear it. The text encoder overfitting is an annoying issue for me

latent charm Sep 6, 2023, 8:36 AM

#

I used "photo" as a prefix and manually added camera angle tags to the captions so the LoRA learns how to map camera angles to images. I used wd14 for auto-captioning, then manually removed and added the intended tags for more control over the LoRA.

stiff dust Sep 6, 2023, 8:37 AM

#

hm, nah, adding tags that describe things on the images you don't want to overfit - I'm already doing that

latent charm Sep 6, 2023, 8:38 AM

#

I also use half unet lr for telr to train it slowly

stiff dust Sep 6, 2023, 8:40 AM

#

maybe it works better for your data. I can only say I did several subject trainings. I always evaluate by using simple prompts ["photo of xyz"] and unconventional prompts ["xyz as astronaut", "charcoal drawing of xyz", "egyptian hieroglyphics depict xyz"].
I always found that training Textual Inversion or training Text Encoder will improve very fast for the simple prompts but won't be able to do the unconventional prompts. Unet-only training on rare name tokens is the only strategy so far that also excels on the unconventional prompts.
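The evaluation routine described above could be scripted roughly like this. The prompt templates are the ones quoted in the message; the function name and the dictionary layout are just assumptions for illustration.

```python
SIMPLE_TEMPLATES = ["photo of {subject}"]
UNCONVENTIONAL_TEMPLATES = [
    "{subject} as astronaut",
    "charcoal drawing of {subject}",
    "egyptian hieroglyphics depict {subject}",
]

def build_eval_prompts(subject):
    """Fill both template groups with the trained trigger name."""
    return {
        "simple": [t.format(subject=subject) for t in SIMPLE_TEMPLATES],
        "unconventional": [t.format(subject=subject) for t in UNCONVENTIONAL_TEMPLATES],
    }

for group, prompts in build_eval_prompts("xyz").items():
    for prompt in prompts:
        print(f"[{group}] {prompt}")
```

Running the same two groups against every checkpoint makes it easier to see when the simple prompts keep improving while the unconventional ones start to break.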

#

when I have very little training data (e.g. only 1-2 images) THEN I use text-encoder-only training.

latent charm Sep 6, 2023, 8:42 AM

#

hmmm, I train with te using 2000 images and train it multiple times. I'll try your evaluation method and see how it goes

stiff dust Sep 6, 2023, 8:43 AM

#

I mean, if I train on photos of my face, then I have enough images. Training the text encoder then totally allows me to draw the image in different angles and so on

#

but it quickly overfits on the photo style

latent charm Sep 6, 2023, 8:44 AM

#

But I think even if the text encoder is overfitted, you could easily reduce its strength in ComfyUI and use an earlier TE from a previous LoRA

stiff dust Sep 6, 2023, 8:44 AM

#

like letting my face be drawn as comic or charcoal drawing won't work anymore

latent charm Sep 6, 2023, 8:45 AM

#

stiff dust like letting my face be drawn as comic or charcoal drawing won't work anymore

that's what I want to test

stiff dust Sep 6, 2023, 8:45 AM

#

also not all styles overfit similarly fast. Like "anime" style usually stays quite robust

stiff dust Sep 6, 2023, 8:45 AM

#

latent charm But I think even the text encoder is overfitted. It could be easily reduce the s...

yeah, but why would I do that? Then I can just skip the text encoder initially

latent charm Sep 6, 2023, 8:46 AM

#

I think it is because anime is quite undertrained in the base model

stiff dust Sep 6, 2023, 8:46 AM

#

I did A LOT of tests with training my face xD Using rare tokens was BY FAR what worked best

#

and with rare token I mean something like "Christian gjhsar"

#

combination of name + some random characters
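A small sketch of how such a trigger name could be generated. The helper name, the suffix length and the optional seed are all assumptions; it also does not check how the tokenizer actually splits the result, so "rare" here only means "random letters".

```python
import random
import string

def make_rare_name(first_name, length=6, seed=None):
    """Return '<first name> <random lowercase letters>' as a trigger phrase."""
    rng = random.Random(seed)  # a fixed seed makes the name reproducible
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{first_name} {suffix}"

print(make_rare_name("Christian"))
```

Write the generated name down once and reuse it in every caption, since a fresh random suffix per run would be a different trigger.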

latent charm Sep 6, 2023, 8:46 AM

#

stiff dust I did A LOT of tests with training my face xD Using rare tokens was BY FAR what ...

you mean rare token with no te training right?

stiff dust Sep 6, 2023, 8:46 AM

#

yes

latent charm Sep 6, 2023, 8:47 AM

#

interesting

stiff dust Sep 6, 2023, 8:47 AM

#

in fact, you only have to train the cross attention

#

this shrinks down the lora file size to few megabytes ^^

#

training self attention sometimes slightly improves image quality, but 99% of the training has to be done in the cross attention

#

I tried

textual inversion first, then unet training
textual inversion first, then text encoder, then unet training
text encoder, then unet on celebrity names
unet only on celebrity names
text encoder, then unet on rare tokens
unet only on rare tokens

The last one had best generalization capability

#

(it also took the most time for training. unet on rare tokens needs ~10 times more training steps to adapt to the training images than the other methods. But results just looked best by far)

latent charm Sep 6, 2023, 8:52 AM

#

Thanks for sharing. I'll run some experiments with it.

#

I tried rare token unet only before but it seems very hard to converge

#

might be due to wrong lr

stiff dust Sep 6, 2023, 8:57 AM

#

yeah, it takes forever ^^

#

I mean, you have to combine rare token with a token that describes the character

#

I use a first name for that

#

similar to Dreambooth's "sks person" I then use "John duqgzsa"

#

(cause sks is not really rare)

opal jacinth Sep 6, 2023, 8:58 AM

#

you're not providing the class?

stiff dust Sep 6, 2023, 8:59 AM

#

but then it trains much slower than other training variants. That is definitely the case. I just say the results are also much better than for other training variants

stiff dust Sep 6, 2023, 8:59 AM

#

opal jacinth you're not providing the class?

no need for that. A first name IS the class

#

in the case above: "John" clearly describes a male character. No need to use "male character" additionally

#

if your name is very misleading (like you are a woman named "Alex", or your name has some other meaning, like your name is "Dick" ^^°)

#

then you should "rename" yourself 😉

#

like my name is "Kai", which is a typical German name, but the name also appears in other cultures and in Stable Diffusion it is strongly associated with Japanese culture. That's why I renamed myself to "Christian" when training on my images

opal jacinth Sep 6, 2023, 9:11 AM

#

stiff dust if your name is very missleading (like you are a women with name "Alex", or your...

ye that was my first thought, what about for example "Kim", could be both 🙂

#

but did I get it right, you actually have the same results as if using the token "ohwx man"?

#

or do you say you had even better results with "Christian duqgzsa"

normal ember Sep 6, 2023, 9:14 AM

#

How do you know if it’s a rare token or not?

#

The offset lora provided by SAI is trained on just “contrast” by the look of the metadata.

stone garden Sep 6, 2023, 9:19 AM

#

what type of captioning should i use for a clothing style lora?

stiff dust Sep 6, 2023, 9:24 AM

#

opal jacinth but I did get it right, you actually have the same results as if using the token...

dunno, I didn't try "ohwx man". I found using first name and last name way more natural

stiff dust Sep 6, 2023, 9:25 AM

#

normal ember How do you know if it’s a rare token or not?

random characters

latent charm Sep 6, 2023, 9:29 AM

#

Have you tried celebrity names plus random characters?

stiff dust Sep 6, 2023, 9:31 AM

#

no. The problem with celebrity names was that they are not really better than textual inversion but sometimes blend over (e.g. I once trained a DnD character on Hayden Christensen and sometimes the DnD character holds a light sabre instead of a sword xD)

#

I have to say: I don't care about training time. It might be different if you do that for business on a regular basis. For me, training 2 hours on a subject is totally okay if the results are really good afterwards.

stone garden Sep 6, 2023, 9:39 AM

#

So whats better, using multiple tokens or just one token and describing all the elements of the picture to add them later in the prompt manually?

latent charm Sep 6, 2023, 9:39 AM

#

stiff dust I have to say: I don't care about training time. It might be different if you do...

how about your dataset size and lr? 2 hours seems pretty fast

stone garden Sep 6, 2023, 9:39 AM

#

will the things from the training images mentioned in the caption still be included in the training dataset?

stiff dust Sep 6, 2023, 9:40 AM

#

latent charm how about your dataset size and lr? 2 hours seems pretty fast

I think for my face I had around ~50 images in total

latent charm Sep 6, 2023, 9:48 AM

#

stiff dust no. The problem with celebrity names was that they are not really better than te...

It sounds interesting but yes, I also noticed celebrity remaining effect after training.

latent charm Sep 6, 2023, 9:50 AM

#

stiff dust I think for my face I had around ~50 images in total

for my current training dataset, 2000/50 = 40, and 40 * 2 = 80 hrs. I trained with te around 30 hrs. hmmm
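The estimate above is a straight linear extrapolation from "2 hours for ~50 images" to 2000 images. As a sketch, assuming training time really scales in proportion to dataset size (same steps per image, same hardware):

```python
def scale_training_hours(ref_hours, ref_images, target_images):
    """Linearly scale a reference training time to a different dataset size."""
    return ref_hours * target_images / ref_images

print(scale_training_hours(ref_hours=2, ref_images=50, target_images=2000))  # 80.0
```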

#

haven't finished yet.

#

and planned to stop training unet

#

My focus is on the most likeness to the original picture, which might be a little bit different from your goal

stiff dust Sep 6, 2023, 9:57 AM

#

latent charm My focus is for the most likeness to original picture which might be a little bi...

yeah, my issue with overfitting goes in a different direction

latent charm Sep 6, 2023, 10:02 AM

#

I have this sample to change the style with te lora

#

from a few days ago. Dataset is 700 images. Training spent around 20 arounds.

#

I think it could add more oil painting to adjust the style

#

all images in the dataset are photos, with no reg images

#

It's supposed to output images like this.

normal ember Sep 6, 2023, 10:21 AM

#

stiff dust random characters

If you train unet only how does it learn a random token?

stiff dust Sep 6, 2023, 10:22 AM

#

normal ember If you train unet only how does it learn a random token?

the unet cross attention learns to associate tokens with the latent pixels in your image. That's all you need

#

if you have the name "fgzhw" then it is tokenized (e.g. into ["fg", "zh", "w</w>"]) and the tokens are associated with your face
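To make the splitting idea concrete, here is a toy greedy longest-match splitter. It is not CLIP's real tokenizer, which uses learned BPE merges, and the two-entry vocabulary is hypothetical, chosen only so that the output matches the example split above. A real tokenizer may cut the same name differently.

```python
HYPOTHETICAL_VOCAB = {"fg", "zh"}  # made up for this example

def toy_split(word, vocab):
    """Greedy longest-match split of one word; the last piece carries '</w>'."""
    if not word:
        return []
    symbols = list(word)
    symbols[-1] += "</w>"  # end-of-word marker, as in the example above
    pieces, i = [], 0
    while i < len(symbols):
        # Try the longest candidate first, fall back to a single symbol.
        for j in range(len(symbols), i, -1):
            candidate = "".join(symbols[i:j])
            if candidate in vocab or j == i + 1:
                pieces.append(candidate)
                i = j
                break
    return pieces

print(toy_split("fgzhw", HYPOTHETICAL_VOCAB))  # ['fg', 'zh', 'w</w>']
```

The point is only that an unknown name still ends up as a short sequence of known sub-word pieces, and those pieces are what the cross attention sees.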

#

if you train the text encoder, the name tokens are "distributed" and amplified through your caption, which then ends up in an overfitting effect

normal ember Sep 6, 2023, 10:24 AM

#

and if it's not a face, like an image taken with, let's say, kodak vision2, any idea how to best caption that?

#

I've gone with just that, seems to be working but there might be better ways.

stiff dust Sep 6, 2023, 10:25 AM

#

either you caption it with a unique rare token trigger word, or you simply describe what is on the image

#

former makes sense if there is something uniquely new that cannot be described

#

(like your face)

normal ember Sep 6, 2023, 10:26 AM

#

Tried to caption the image but I don't think the results were great when you do unet only. captioning the film stock turned out better.

stiff dust Sep 6, 2023, 10:27 AM

#

I'm not sure if this is comparable. I said: text encoder training overfits on style. You seem to train on a style

normal ember Sep 6, 2023, 10:29 AM

#

When I trained both unet and text encoder with captioned images it turned out good but it overfitted the characters, clothing and environment a bit too much.

#

Maybe I should have a lower lr for the text encoder when I'm training both?

latent charm Sep 6, 2023, 10:30 AM

#

normal ember When I trained both unet and text encoder with captioned images it turned out go...

You might try to add more tokens to describe the image in as much detail as possible

#

I currently use half lr for te compared to unet

normal ember Sep 6, 2023, 10:31 AM

#

My first try, which trained both unet and text encoders, was captioned something like this: cinematic film still of the road is empty, desolate, calm and serene, blue and yellow, close-up, lonely, barren, empty, natural, low, soft, straight on, shallow depth of field . vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy

#

I've also tried something like this for the same image a road, desert, mountains, day, landscape

#

that didn't turn out as good.

#

I've also tried just kodak vision2 for all images and unet only. It seems to have the least impact on anything else except the color palette and film grain and such.

stiff dust Sep 6, 2023, 10:36 AM

#

dunno... I would train unet only, honestly.

#

it's also the way SDXL was trained itself

normal ember Sep 6, 2023, 10:36 AM

#

Maybe I should try to increase the number of steps in training when I do unet only. I've run 3000 steps on 220 images.

#

If I do a rare token like kai is telling us, I'm not sure if I should just try a random token since this is not images of "Christian" nor "John" 😄

stiff dust Sep 6, 2023, 10:38 AM

#

you train for a style. I wouldn't use any special token here

normal ember Sep 6, 2023, 10:38 AM

#

and should i skip kodak vision2?

stiff dust Sep 6, 2023, 10:38 AM

#

except something like "kodak vision2"

normal ember Sep 6, 2023, 10:39 AM

#

I do 0.0004 for learning rate. Could probably experiment with that too and see if it picks up the style better or worse.

#

I could try 6000 steps and a batch size of 4 as I've done previously and see what happens. It would probably take 2 hours with that amount of steps.
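For reference, a quick sketch of what 6000 steps at batch size 4 means for a 220-image dataset, and the per-step time implied by the 2-hour guess. The helper names are made up, and repeats or bucketing are ignored.

```python
def dataset_passes(steps, batch_size, num_images):
    """How many times each image is seen on average."""
    return steps * batch_size / num_images

def seconds_per_step(total_hours, steps):
    """Average wall-clock time per optimizer step."""
    return total_hours * 3600 / steps

print(round(dataset_passes(6000, 4, 220), 1))  # 109.1
print(seconds_per_step(2, 6000))               # 1.2
```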

stiff dust Sep 6, 2023, 10:42 AM

#

you can also try to play around with the --adaptive_noise_scale parameter

#

setting it to, say 0.05, might speed up training for some concepts

normal ember Sep 6, 2023, 10:44 AM

#

This is what I use atm. https://gist.github.com/twri/3b4fdc6adc6a81e6dbd9ea5256997f11

#

Adaptive noise scale:
Used in combination with the Noise offset option. Specifying a number here will further adjust the amount of additional noise specified by Noise offset to be amplified or attenuated. The amount of amplification (or attenuation) is automatically adjusted depending on how noisy the image is currently. Values range from -1 to 1, with positive values increasing the amount of added noise and negative values decreasing the amount of added noise.

stiff dust Sep 6, 2023, 10:46 AM

#

--clip_skip 1
does that make sense for SDXL?

normal ember Sep 6, 2023, 10:46 AM

#

Not sure, I should probably remove it since you ask. 😄

stiff dust Sep 6, 2023, 10:48 AM

#

sorry, I meant --min_snr_gamma=5, not --adaptive_noise_scale

normal ember Sep 6, 2023, 10:50 AM

#

Min SNR gamma
`In LoRA learning, learning is performed by putting noise of various strengths on the training image (details about this are omitted). Depending on the strength of the noise applied, learning may move closer to or farther from the learning target and may not be stable, and Min SNR gamma was introduced to compensate for that. Especially when learning from images with little noise on them, the result may deviate greatly from the target, so this option tries to suppress that jump.

I won't go into details because it's confusing, but you can set this value from 0 to 20, and the default is 0.

According to the paper that proposed this method, the optimal value is 5.

I don't know how effective it is, but if you're unsatisfied with the learning results, try different values.`
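As a sketch of what the option does: the Min-SNR paper weights each timestep's loss by min(SNR, gamma) / SNR when the model predicts noise, so timesteps with little noise (high SNR) are down-weighted. This is a plain-Python illustration of that formula only, not the trainer's actual code, and the function name is made up.

```python
def min_snr_weight(snr, gamma=5.0):
    """Loss weight min(SNR, gamma) / SNR for noise-prediction training."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    return min(snr, gamma) / snr

# Low SNR = heavily noised timestep, high SNR = almost clean image.
for snr in (0.1, 1.0, 5.0, 50.0):
    print(snr, min_snr_weight(snr))
```

With gamma = 5, every timestep whose SNR is at most 5 keeps a weight of 1, while a nearly clean timestep with SNR 50 contributes only a tenth of its loss.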

#

I guess you could say my images are noisy, if grain is noise.