#🔧|finetune


sonic narwhal
#

39 repeats × 39 images, batch size 8, 20 epochs: 3803 total steps, estimated training time 116 hours on an RTX 3090.
Why is my estimated training time for SDXL 1000x that of training 1.5 LoRAs?

young crater
#

This is what the progress should look like

#

on a 3090

sonic narwhal
#

why such a small number of steps, when training 5k+ steps on SD1.5 is no problem?

young crater
#

Different model, different lora training styles

lucid ice
young crater
sonic narwhal
lucid ice
#

40_lora

sonic narwhal
#

3 hours for 100 steps

young crater
# sonic narwhal

113s/step is way too high for a 3090. There's something else up here.
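For sanity-checking numbers like these, the arithmetic is just steps × seconds per step. A minimal Python sketch (the function names are made up for illustration, not part of any trainer), using the figures quoted in this thread:

```python
def eta_hours(total_steps: int, seconds_per_step: float) -> float:
    """Wall-clock estimate in hours: steps times seconds per step."""
    return total_steps * seconds_per_step / 3600


def implied_seconds_per_step(total_hours: float, total_steps: int) -> float:
    """Back out the per-step speed from a trainer's total-time estimate."""
    return total_hours * 3600 / total_steps


# Figures quoted above: 3803 steps estimated at 116 hours, and 113 s/step.
print(f"{implied_seconds_per_step(116, 3803):.1f} s/step")
print(f"{eta_hours(100, 113):.2f} h for 100 steps")
```

Both come out at roughly 110 s/step, consistent with "3 hours for 100 steps", so the per-step speed is the problem rather than the step count.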

young crater
lucid ice
#

1? what does that number even do?

#

someone else told me between 20 and 40

sonic narwhal
young crater
young crater
young crater
young crater
# lucid ice 40_lora

if your SD1.5 LORA settings were:

40 images
4 Batch
30 repeats
1 Epoch

it is now:

40 images
8 Batch
1 repeat
30 Epochs

(estimated)
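The two layouts can be compared with the usual step formula (repeats × images × epochs / batch size). A small Python sketch, rounding up as an assumption: in both layouts each image is still seen 30 times, the SDXL layout just spreads those passes over epochs (so you get a save point per epoch) and uses a bigger batch.

```python
import math


def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps = repeats x images x epochs / batch size, rounded up."""
    return math.ceil(images * repeats * epochs / batch_size)


layouts = {
    "SD1.5": dict(images=40, repeats=30, epochs=1, batch_size=4),
    "SDXL": dict(images=40, repeats=1, epochs=30, batch_size=8),
}
for name, cfg in layouts.items():
    passes = cfg["repeats"] * cfg["epochs"]  # times each image is seen
    print(name, total_steps(**cfg), "steps,", passes, "passes per image")
```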

lucid ice
young crater
lucid ice
young crater
lucid ice
young crater
orchid yoke
# young crater should be 1_lora

Dang it, I've been sitting here for hours trying to figure out why everything Caith said was gold but not working for me. Thank you

sonic narwhal
#

225s/it

lucid ice
sonic narwhal
#

yes

lucid ice
sonic narwhal
#

yup

sonic narwhal
young crater
# sonic narwhal

Try running it with half the batch size; it could be a VRAM issue

sonic narwhal
#

batch size 4, epoch 20

young crater
#

are your images scaled to 1024x1024?

sonic narwhal
#

no

lucid ice
#

that might be it

sonic narwhal
#

ahh okay xD

#

thought the bucket would take care of it

#

very well then, saving that for tomorrow

#

thanks for the help

warm fog
spring sun
#

If I remember correctly, in ML the loss should decrease constantly over iterations. Is this true for diffusion models? Is it OK if my loss goes up a little and then down again across iterations, or should it always decrease?

When you are finetuning, does the loss always go down if the LR is right?

open merlin
#

Depends. The loss landscape is very large; sometimes to get to a global optimum you have to go through suboptimal solutions. I don't know exactly how this applies to SDXL, though. This is what my current model in training looks like, using cosine with restarts; hopefully it will work:
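On the noisy-loss question: in diffusion training each step samples random noise and a random timestep, so the per-step loss jumps around a lot, and the smoothed trend is what matters. A minimal exponential-moving-average sketch, similar in spirit to TensorBoard's smoothing slider (the loss values below are invented for illustration):

```python
def ema_smooth(losses, weight=0.9):
    """Exponential moving average; a higher weight gives a smoother curve."""
    smoothed = []
    avg = None
    for x in losses:
        avg = x if avg is None else weight * avg + (1 - weight) * x
        smoothed.append(avg)
    return smoothed


# Hypothetical per-step losses: noisy, but trending down.
raw = [0.15, 0.12, 0.16, 0.11, 0.13, 0.09, 0.12, 0.08]
trend = ema_smooth(raw)
print([round(v, 4) for v in trend])
```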

spring sun
#

@open merlin thanks, it helps

distant halo
#

Has anyone had issues with their LoRAs doing very well on some prompts (simpler ones, typically), but struggling to reproduce the training subject accurately on others (usually larger and more complex ones)?

spring sun
#

Training with a transparent vs. a white background: does anyone have an idea which is better? I trained with transparent and it's giving me a lot of colorful backgrounds. Don't know if that had an influence.

hollow spruce
hollow spruce
distant halo
#

"amcm, a woman wearing a black dress, smiling at the camera with a white curtain behind her, head shot"

spring sun
hollow spruce
# distant halo "amcm, a woman wearing a black dress, smiling at the camera with a white curtain...

changing your trigger word to something that the model already knows exists and is close to what you're trying to make will significantly improve your experience. but this shouldn't really cause any issue other than longer training time. (in the past, this was addressed by CLIP training, but since we're doing UNet-only training right now, using a properly fitting word as the trigger word saves you time)
other than that all good. "with a white curtain behind her" is good! always tag background 🙂

distant halo
spring sun
#

Btw in kohya, should I add the trigger and class words at the start of all captions?

hollow spruce
hollow spruce
spring sun
#

Oh thank you! It was not clear whether I should add the trigger or whether it was already doing that behind the scenes.

junior owl
#

anyone had any luck with SDXL textual inversion?

spring sun
junior owl
hollow spruce
junior owl
hollow spruce
analog sinew
#

For some reason when setting --num_cpu_threads_per_process=2 with sd-scripts, accelerate deadlocks. Very odd. Anyone see this before?

#

I think i found the issue

#

max_data_loader_n_workers

primal isle
#

@hollow spruce do u still recommend vit-h for auto tagging datasets for sdxl lora training? i want to experiment with ppl/faces. i will check captions manually, but i want to auto tag the dataset first as a base

hollow spruce
primal isle
hollow spruce
pliant drift
sonic narwhal
#

What is a good upscaler for an image like this that needs to go from 512 to 1024?

sonic narwhal
warm agate
sonic narwhal
#

a1111

warm agate
sonic narwhal
#

thank you

meager spade
#

CUDA Unified Memory is saving the day for me with LoRA training on a 3070; it OOMs without it

sonic narwhal
young crater
sonic narwhal
#

can they be more than one megapixel?

young crater
#

But I've had a 100% success rate with 1.5 LoRAs and a 0% success rate with SDXL, so I may not be the best source of information

signal nimbus
#

have a question: when training an SDXL LoRA, why do some people set the bucket max size above 1024px?

young crater
signal nimbus
#

yeah i guess

hollow spruce
# sonic narwhal can they be more than one megapixel?

with buckets turned on, and resolution set to 1024,1024 - everything that is too large gets scaled down to the best size for sdxl.
all aspect ratios work (but you'll save vram by not having all too many of them)

also, although your images can be bigger, don't go complete overkill. If you have multiple 4000x7000 pictures, you'll get weird issues while the script is starting, and may run out of RAM or just have it run super slow. Keep the size at a humane level, like under 4000px in the largest dimension
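A rough Python sketch of the two size rules described above: pre-shrinking sources to at most 4000px on the longest side, and the approximate bucket an image lands in at resolution 1024,1024. This is a simplification, not kohya's bucketing code: it assumes 64px bucket steps and simply scales to the target area, whereas sd-scripts builds a fixed list of buckets and picks the one with the closest aspect ratio (and, as I understand it, only skips upscaling small images when bucket_no_upscale is set).

```python
import math

MAX_AREA = 1024 * 1024  # resolution = 1024,1024
STEP = 64               # assumed bucket granularity in pixels


def cap_longest_side(w: int, h: int, limit: int = 4000) -> tuple[int, int]:
    """Pre-shrink huge sources so the longest side is at most `limit` px."""
    longest = max(w, h)
    if longest <= limit:
        return w, h
    s = limit / longest
    return max(1, round(w * s)), max(1, round(h * s))


def approx_bucket(w: int, h: int, max_area: int = MAX_AREA, step: int = STEP) -> tuple[int, int]:
    """Scale to roughly max_area pixels, keep the aspect ratio, round down to `step`."""
    scale = math.sqrt(max_area / (w * h))
    bw = max(step, int(w * scale) // step * step)
    bh = max(step, int(h * scale) // step * step)
    return bw, bh


src = cap_longest_side(4000, 7000)  # the oversized example from above
print(src, approx_bucket(*src))
```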

signal nimbus
#

is this considered fast? 🤔

hollow spruce
signal nimbus
#

172 img, 12 repeats

hollow spruce
#

at ~2k images (172 × 12 = 2,064), I'd say you're at about average speed

signal nimbus
#

cooking the new Blame lora :D

hollow spruce
#

'cooking' well defined XD hope you don't burn it

signal nimbus
#

improved the dataset, all hand-picked and edited + processed

#

gonna look better than before for sure

hollow spruce
#

you better be saving every epoch with that high repeat rate!

signal nimbus
#

i do!

#

previous LoRA was just 3 epochs, i'll see what it does up to 5

open merlin
#

Would it make sense to use an LLM like Llama 2 to adjust the automatically generated prompts into the correct format? It might be able to distinguish the background from the main character.
Then you could just write a Python script that goes over the folder of automatically labelled images and cleans it up. Then you could train Llama on the cleaned-up prompts to get a copilot. Is this project feasible?

open merlin
#

@hollow spruce How do you make sure your training prompts are in the correct format? Do you really go through all images every time?

hollow spruce
open merlin
#

hmm, thanks. Do you think using open-source LLMs could improve captioning?

hollow spruce
hollow spruce
covert pagoda
sonic narwhal
#

my dataset is 50 images so far, no background in any of them, only a solid white color

#

All 50 images are similar to this

signal nimbus
#

results of two and a half hours of training my Blame! LoRA. Not bad!

#

"ryan gosling by nihei tsutomu"

ashen field
#

@hollow spruce your finetune config works with bitsandbytes 0.35, but once upgraded to 0.41 (when using the dev2 branch of kohya) the loss would diverge after a few epochs. It's an upstream bug, but I think you should be aware.

hollow spruce
ashen field
hollow spruce
# ashen field Thanks for sharing the config. Any thoughts on using prodigy instead of adamw8bi...

16~32 should be fine, but the 8/1 ratio for dim/alpha should be kept (so 16/2). This significantly increases the learning time. From my tests I can say it works great, but at the same time it's not like I've run into any issues at 8/1 that weren't dataset- or captioning-related.
But at 64~256 you should take care not to accidentally overwrite the general knowledge of the SDXL model. Basically everything starts getting a bit worse if you use those sizes and don't take a lot of preventative measures.
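For context on the dim/alpha ratio: in kohya-style LoRA modules the low-rank update is multiplied by alpha / dim, so keeping an 8:1 ratio keeps that multiplier fixed while the rank changes. A tiny Python illustration (plain arithmetic, not sd-scripts code):

```python
def lora_scale(dim: int, alpha: float) -> float:
    """Multiplier applied to the LoRA update (up @ down) in kohya-style LoRA."""
    return alpha / dim


for dim, alpha in [(8, 1), (16, 2), (32, 4), (16, 16)]:
    print(f"dim={dim:>2} alpha={alpha:>2} -> scale {lora_scale(dim, alpha):.3f}")
```

The first three rows share the same 0.125 multiplier; setting alpha equal to dim applies the update at full strength.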

ashen field
#

And how about conv dim? Keep at 4?

hollow spruce
#

apart from AdamW8bit, I've only tested Adafactor: I did 4 LoRAs on the same dataset, alternating between Adam and Adafactor. AdamW8bit turned out better/faster every time. But I'm assuming Adafactor will arrive at the same detail, just slower.
will be testing Prodigy soon, as it sounds promising. just haven't found the time for it yet

hollow spruce
sour eagle
#

I tried AdamW8bit yesterday and it said it was going to take 3 hours lmao, so I switched to AdamW and it only took 1 hour. Is there a reason for that?

hollow spruce
hollow spruce
#

how many steps was it in total?

sour eagle
#

Well, I could probably push 4. I think 3 was like 15.8GB of VRAM

ashen field
hollow spruce
ashen field
#

So the learning rate is stuck at 0

hollow spruce
ashen field
hollow spruce
ashen field
hazy elbow
sacred grail
#

you're not putting --w 1024 and --h 1024 in the sample prompts, right? that might be the problem

ivory yew
#

has anyone tried finetuning the XL unet instead of finetuning a LoRA?

analog sinew
#

waifu diffusion xl

young crater
analog sinew
#

now this is overfitting 😎

sour eagle
#

Is there a way to get a fancy graph of my loss? I’m using kohya gui.

analog sinew
#

i think you can configure wandb

young crater
sour eagle
pliant drift
#

in the gui

analog sinew
#

got masked lora training working on kohya-ss/sd-scripts sdxl branch

#

(mask covers the text/drawing in the training data)

#

anyone have a masked hands dataset? 😄

young crater
young crater
sour eagle
#

Ohhhh okay thanks

#

Sorry didn’t see that as I’m not home😂

sinful rune
#

Hi, does anyone know how to train on multi-GPU setups with EveryDream2?

hollow spruce
sonic narwhal
#

Sample images of sdxl sketch style training looking good 😆

hollow spruce
sonic narwhal
#

what does training comment do?

#

Have you fixed this?

hollow spruce
# sonic narwhal

nothing functionally. some UIs let you see the comment if one was attached

sonic narwhal
#

cursed sample images

dull bramble
restive bridge
#

why does training pull 450w from my gpu 🙃

young crater
restive bridge
#

ftw3

young crater
#

Some gpus have a way to change that with a switch on the top of the gpu, but I’m not sure about the ftw3

restive bridge
#

my sensor logs say it's pulling 110% TDP

restive bridge
young crater
#

550W between the two; it's possible that you're getting a spike in power draw that causes it to shut down.

Assuming you are using caith’s settings, if you lower batch size from 8 to 4, you will lower your memory usage quite a bit and, in turn, wattage.

restive bridge
young crater
#

By running way slower and offloading to system memory

#

The extra performance (a factor of something like 5-10x in this case) is what is drawing so much power

#

And as Caith notes, running batch 4 means you can check your training in ComfyUI while you're training, rather than relying on the not-great sample output

#

Which means you’d technically be running faster as you can check your work sooner

restive bridge
#

something else is off. i'm using caith's config file with no changes and getting crashes even at batch 6, while thousands of others are running it fine on worse GPUs and higher batches, and if anything they only get OOM.

#

if i got OOM i wouldn't mind

young crater
#

🤔

In that case, it sounds like there’s something up with the GPU specifically. Have you tried running stress tests on it recently?

restive bridge
#

yes a few today. I also use vr heavily and have never had problems. temps are good

ocean dune
#

Hoi, for training, do the images have to be a 1:1 ratio, like 1024x1024? And if so, how would I train one for, say, a game character? Most of the in-game shots of the character are in a 21:9 ratio or around there. Should I just Photoshop the images to remove the backgrounds, make each image as wide as it is tall, and save them as transparent PNGs? :P

young crater
# ocean dune Hoi, for training, do the images have to be 1:1 ratio? Like 1024x1024? And if so...

You can train at different aspect ratios, just make sure of a few things:

1. The image size should be one megapixel in total (1024x1024, 2048x512, etc). Here is a calculator for this purpose: https://www.scantips.com/mpixels.html (there are others as well, they're just hard to find). You can use Presize.io to crop your whole dataset at once
2. Set max image size in the settings to your source image resolution
3. When testing in ComfyUI, make sure to set your CLIP resolution to the same aspect ratio as your generated image or else SAI staff may post your lora in the sdxl chat and call you out for claiming to fix double characters
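For point 1, a small Python sketch that picks a size at roughly a given aspect ratio without exceeding one megapixel. Rounding down to multiples of 64 is an assumption (it matches common SDXL bucket sizes), and the function name is made up:

```python
import math


def one_megapixel_size(aspect_w: float, aspect_h: float,
                       target: int = 1024 * 1024, step: int = 64) -> tuple[int, int]:
    """Step-aligned (width, height) near aspect_w:aspect_h with width*height <= target."""
    ratio = aspect_w / aspect_h
    h = math.sqrt(target / ratio)
    w = h * ratio
    return int(w) // step * step, int(h) // step * step


for ar in [(1, 1), (4, 1), (21, 9)]:
    print(ar, one_megapixel_size(*ar))
```

For a 21:9 capture this gives 1536x640, so the wide shots can be cropped to that instead of being padded out to a square.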

dawn pawn
# restive bridge why does training pull 450w from my gpu 🙃

You could power-limit the GPU. The command is nvidia-smi -pl 350 if you want to limit it to 350W (it needs admin/root rights).

I run my 3090 at 300W and lose about 3% performance in training compared to the normal power and it's much quieter as well (i.e. it's very worth it).
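Why the power limit pays off, in numbers: energy per step is power divided by speed. A back-of-the-envelope Python sketch, assuming a 350W stock limit (the reference RTX 3090 figure; factory-overclocked cards often ship with a higher one) and that the card actually sits at its limit while training:

```python
def energy_per_step_ratio(limited_w: float, stock_w: float, relative_speed: float) -> float:
    """Energy per training step vs. stock: (power ratio) / (speed ratio)."""
    return (limited_w / stock_w) / relative_speed


# 300 W limit vs. an assumed 350 W stock limit, ~3% slower (figures from above)
r = energy_per_step_ratio(300, 350, 0.97)
print(f"~{(1 - r) * 100:.0f}% less energy per step")
```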

young crater
dawn pawn
ocean dune
#

There's no clip res for loras as far as i can see. Used this guide https://www.youtube.com/watch?v=AY6DMBCIZ3A

#

Also, is it normal for LoRA training to use all 24GB and then some? (Training for SDXL.)

sour eagle
#

i dont trust that guide lmao

young crater
ocean dune
#

Thanks :)

#

So how much video memory should a lora training consume? Like all of it?

young crater
#

Batch 4: 10gb
Batch 8: 16gb
Batch 10: 24gb

or something along those lines

ocean dune
#

Will take 19 sec per iteration, so 15 hours for this quick test lol.

young crater
#

what are your repeats at o.o?

#

An SDXL LoRA of any reasonable size on a 3090 should take 15-50 minutes

#

should be around 80 epochs, 1 repeat and batch 8

ocean dune
#

I used the config in the post you linked to, and for some reason training just stopped.

young crater
#

press space on the command prompt, but you have way too many repeats

ocean dune
young crater
#

repeats at one, 35 images, epoch 100, batch 8 takes about 2 hours

ocean dune
#

Ah, seems like I forgot this one. Gonna do a thorough folder structure tomorrow and do the deprecated folder part as well :P

ocean dune
young crater
ocean dune
#

Huh, odd. And yeah, something is amiss lol

young crater
#

whats your img folder name?

#

(You can censor the prompt name if you want, just want to know the first number)

ocean dune
#

100_link. Testing making a game character lora.

young crater
#

you're putting the epoch count into the folder's repeat number; with Caith's workflow the repeats should be 1 and the epochs do the work

ocean dune
#

Ah, thought number indicated steps per image

ocean dune
#

Not too good with text sadly, though the guide you linked to was fairly easy, just some terms i wanna dig deeper into :P

young crater
#

Steps = (Repeats x Img Count x Epochs) / Batch Size

Aim for 150-200 steps with Caiths workflow

I honestly don't know what they mean either, sadly. Hopefully one day there will be a solid video tutorial, but most today are either bloated or confusing.
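The step formula above in Python, plus its inverse: how many epochs hit a step target (e.g. 150-200) with 1 repeat. Rounding up is an assumption; trainers may round per epoch or per bucket, so the count they report can differ slightly. Function names are hypothetical:

```python
import math


def total_steps(repeats: int, images: int, epochs: int, batch_size: int) -> int:
    """Steps = (Repeats x Img Count x Epochs) / Batch Size, rounded up."""
    return math.ceil(repeats * images * epochs / batch_size)


def epochs_for_target(target_steps: int, images: int, batch_size: int, repeats: int = 1) -> int:
    """Smallest epoch count whose total reaches target_steps."""
    return math.ceil(target_steps * batch_size / (repeats * images))


print(total_steps(39, 39, 20, 8))     # 3803, the 116-hour run at the top of the channel
print(total_steps(1, 35, 100, 8))     # 438, 35 images, epoch 100, batch 8
print(epochs_for_target(175, 35, 8))  # 40 epochs for ~175 steps
```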

hollow spruce
#

but I've seen the VRAM issues people run into, so I'll be adding presets for 12GB, 16GB and 24GB of VRAM in the next few days; that should solve the first issue people usually encounter

sour eagle
hollow spruce
sour eagle
#

i just realized all of them are in the same pose. lmao

#

except one

restive bridge
#

why does training with regularization double the training steps even though reg repeats is 1 and img repeats is 20?

hollow spruce
#

1 reg image per dataset image is the general rule of thumb

#

I think kohya automates that
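A sketch of why the step count doubles, based on my reading of kohya's sd-scripts (treat it as an assumption): regularization images are cycled until there are as many reg samples as repeated training samples, so a reg repeat count of 1 does not shrink anything and each epoch holds twice the samples. The function name and the 20-image count are hypothetical:

```python
import math


def steps_per_epoch(train_images: int, repeats: int, batch_size: int, use_reg: bool = False) -> int:
    samples = train_images * repeats
    if use_reg:
        # Assumed behaviour: reg samples are topped up to match the repeated
        # training samples, regardless of the reg folder's own repeat count.
        samples *= 2
    return math.ceil(samples / batch_size)


# hypothetical 20 images at 20 repeats, batch 1
print(steps_per_epoch(20, 20, 1), steps_per_epoch(20, 20, 1, use_reg=True))
```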

restive bridge
young crater
restive bridge
#

ohh

signal nimbus
#

guys, what's the max batch size you usually use on a 24GB GPU when training an SDXL LoRA?

signal nimbus
#

damn stupid me using batch 2 lol

hollow spruce
orchid yoke
hollow spruce
#

for a more "full" workflow, I can recommend Sytan's highres fix (since it uses the base model again at the end, you get more LoRA detail)
but for testing the capabilities of your lora - this is the most efficient way

proud robin
#

what's the easiest/best way to generate images of myself in any style?

orchid yoke
sour eagle
#

I wonder when any actually good finetunes will come out for SDXL. DreamShaper is meh and all the anime ones are meh as of right now.

tall condor
#

does anybody have a proper description of how exactly finetuning in kohya_ss works?

#

also is it possible to merge 2 lora models to 1 model?

restive bridge
# hollow spruce for a more "full" workflow, I can recommend sytans highres fix (since that uses ...

I was very excited about Sytan's 3rd pass to bring back LoRA detail, but it made details a bit too perfect. For my use case, if someone is ugly IRL they need to stay ugly in the output lol.

So I discovered that if a LoRA is sufficiently trained, the details of a face make it into the first base pass and aren't removed in the refiner pass. The refiner does its job on details but doesn't remove any likeness. It's quite nice

young crater
restive bridge
young crater
restive bridge
young crater
#

Ahh, I misread the post, tyty

tall condor
#

does it make sense to train a 1.5 model with max size 768 by 768? i read that it can pick up more details this way? is it true?

queen matrix
#

I have usually trained 1.5 at 768x768 and everything always worked out fine. But I never compared to training at 512x512 with the same settings.

young crater
#

I mainly trained at 512x512 for faces, with a good starting checkpoint, the faces looked great.

I think 768x768 could help with more complex trainings?

hollow spruce
#

There's barely any documentation on this if you Google it, so I thought I'd post it here. Advanced settings in kohya:

Dropout caption every n epochs
Usually, images and captions are learned as a pair, but it's possible to train just on "images without captions" every certain number of epochs.

This option allows you to specify "drop out captions every ○ epochs."

For instance, if you set this to 2, you will conduct image training without captions every 2 epochs (2nd epoch, 4th epoch, 6th epoch...).

By training on images without captions, it is expected that your LoRA will learn a more comprehensive feature set from the images. It can also help prevent the image features from being tied too closely to specific words. However, if you use captions too sparingly, your LoRA could become ineffective at prompts, so be cautious.

The default is 0, and in the case of 0, caption dropout is not performed.

Rate of caption dropout
This is similar to the "Dropout caption every n epochs" mentioned above, but during the entire learning process, you can train on "images without captions" for a certain proportion of the time.

Here, you can set the proportion of images without captions. 0 means "always use captions during training," and 1 means "never use captions during training."

Which images will be trained as "images without captions" is determined randomly.

For example, if you train LoRA with 20 images, reading each image 50 times for just 1 epoch, the total number of image learnings is 20 images x 50 times x 1 epoch = 1000 times. If you set the rate of caption dropout to 0.1, 1000 times x 0.1 = 100 times, you will train on "images without captions."

The default is 0, meaning all images are trained with captions.
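The two options combine roughly like this. A minimal Python sketch of the behaviour described above, not kohya's implementation (in sd-scripts the matching flags are, as far as I know, --caption_dropout_every_n_epochs and --caption_dropout_rate); epochs are counted from 1 and the function name is hypothetical:

```python
import random


def maybe_drop_caption(caption: str, epoch: int, every_n_epochs: int = 0,
                       rate: float = 0.0, rng=random) -> str:
    """Return '' if this sample should be trained without its caption."""
    if every_n_epochs > 0 and epoch % every_n_epochs == 0:
        return ""  # the whole epoch goes captionless (2nd, 4th, 6th...)
    if rate > 0 and rng.random() < rate:
        return ""  # random per-sample dropout
    return caption


# 20 images x 50 repeats x 1 epoch = 1000 samples at rate 0.1
rng = random.Random(0)
dropped = sum(maybe_drop_caption("amcm, woman", 1, rate=0.1, rng=rng) == "" for _ in range(20 * 50))
print(dropped, "of 1000 samples had no caption")  # about 100, varies with the seed
```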
opal jacinth
compact trellis
#

has anyone managed to get the refiner to train?

hollow spruce
#

once it's supported, you'll see it via commit there first

stone garden
#

Question, is there a nice automated way to generate captions, and a UI to edit them? I want something to work with initially, and then edit the captions myself.

#

Also to curate the dataset, delete some images, etc.

narrow kraken
#

Question: is it possible to merge the XL 1.0 base with the refiner?

#

@stone garden lol, sup

hollow spruce
# stone garden Question, is there a nice automated way to generate captions, and a UI to edit t...

Use the Interrogator in the webui to auto-tag with the ViT-H or ViT-bigG model (ViT-bigG is better, but requires more resources to run).
For curation, I recommend Adobe Bridge for the initial "delete, rate, move" part.
Resize all images above 4000px to below that (can be automated in many apps).
Then move to Hydrus Network, where you can import all the tags and efficiently edit them.

I'll have a comprehensive guide up eventually, on how to do it. but this is it in a nutshell

stone garden
#

How much do all these cost lol?

hollow spruce
#

those are all free ^^

#

hydrus network is somewhat complicated to learn though. so if your dataset is like around 50~100 images, you could use this app instead (not as good, but much much easier to use)
https://github.com/lukemoore66/FastCaption

hollow spruce
stone garden
analog sinew
sour eagle
#

Does anyone know if it’s possible to train to a negative amount instead of 1? Like, if I train a Lora I want to train the negative variant so the positive does the opposite? If that makes sense.

tepid sundial
#

Haven't experimented much with this yet, but super interested in its impact (also the impact of training with a different scheduler). If you give it a try I'd love to hear what you find.

hollow spruce
#

tried it with my sketch lora and got pretty weird (but not bad) results

sour eagle
hollow spruce
tepid sundial
# tall condor does anybody have a propper description on how exactly finetuning in kohyass wor...

The code in sd-scripts is fairly straightforward and makes sense in terms of adopting just the minimal required changes from 1.5. There aren't very significant differences compared to, for example, the LoRA training setup in diffusers, and there's an open fine-tuning PR for diffusers right now that is also very similar in approach (some things differ, like the inclusion of SNR weighting). Diffusers PR here https://github.com/huggingface/diffusers/pull/4401

What's still quite unclear to me is what the impact of all these slight differences is; there's not much data available. For example, how good the results are that people get with float16, bfloat16 vs float32 precision, SNR or not, batch sizes, learning rates, etc. There's simply quite a lack of benchmarks and data still. Model trainers are notoriously BAD at sharing details of their findings as well.


hollow spruce
sour eagle
hollow spruce
#

yes! XD

sour eagle
analog sinew
#

I made an inverse aesthetic lora, but the results weren't obvious

sour eagle
hollow spruce
#

you can just do weight = -1 to quick test it. results are the same as if you inverted it
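Why weight -1 is equivalent to inverting: a LoRA adds a low-rank delta to each weight matrix, and the weight (multiplier) just scales that delta, so -1 subtracts exactly what +1 adds. A toy pure-Python illustration with a 2x2 matrix and rank 1 (all numbers are made up; a real LoRA patches many layers, but the sign logic is the same):

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w, up, down, alpha, dim, weight):
    """W' = W + weight * (alpha / dim) * (up @ down)."""
    delta = matmul(up, down)
    s = weight * alpha / dim
    return [[wij + s * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
up, down = [[1.0], [2.0]], [[0.5, -0.5]]  # rank-1 factors
plus = apply_lora(W, up, down, alpha=1, dim=1, weight=1)
minus = apply_lora(W, up, down, alpha=1, dim=1, weight=-1)
print(plus)   # [[1.5, -0.5], [1.0, 0.0]]
print(minus)  # [[0.5, 0.5], [-1.0, 2.0]]
```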

#

and hot damn its working

#

my dumb accidental youtube artifact lora is now high detail lora

#

right is with lora applied with -1

sour eagle
#

Is that the same res? If so I’d say that did add some detail. Interesting

hollow spruce
#

big image for the brave!
Artifact lora is using weight -1
BASE ONLY | Artifact Lora | Face Lora | Artifact + Face Lora

#

also some cherry picked results:

#

first of all, this makes using 2 loras at the same time a damn lot easier.
and the negative lora helps a lot with following the prompt XD which... I can't really explain

either way, this opens up a whole new box of stuff to research

young crater
analog sinew
#

has there been any proper research on negative prompting?

stone garden
young crater
restive bridge
#

anyone got foolproof background removal i can run locally?

hollow spruce
#

but yeah, to properly run the flavor chain each time you'll need 48/80GB of VRAM depending on 8-bit or 16-bit

#

it's not really a local solution, as the 3090/4090 can barely run it.
most that I'm aware of, run it via an A100 runpod or other hosted solution

stone garden
sour eagle
#

anyone messed with specifying down and up weights when training a LoRA? i get an error saying no parameters specified

hollow spruce
sour eagle
#

i looked it up, still 12, at least it should be. may have some hidden parameters that are not in the gui that are missing for me idk

restive bridge
rain scarab
stone garden
#

How are regularization images supposed to be created ideally???

#

If I just generate regularization images using "photo of a woman", I never get full body shots, face closeups, sitting or lying poses, or pictures from behind, while the training images include those shots.

#

On https://rentry.org/59xed3#hard-route I read:
"regularization images are reduced to latents and then trained on how to produce them back, using DDIM as sampler"
"You will want to generate an AI reg image for every training image you have. The names will have to match. So every training image will have a matching regularization image."
"Same prompt as the caption for the training image."
"DDIM sampler, resolution equal to your training resolution (not the same as the training image!), seed equal to your training seed (420 if you didn't touch it in the scripts below)."

#

I would imagine the AI can learn the difference between training and regularization images best if the prompt is the same for both images, just with the trigger word in the training image!?
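The rentry recipe (one reg image per training image, matching names, same caption, DDIM, fixed seed, training resolution) plus the idea of dropping the trigger word can be turned into a job list. A hypothetical Python sketch: it only plans the jobs (you'd still feed them to whatever generator you use), and the helper names, the .txt-caption-next-to-image layout, the comma-separated tags and the 1024x1024 default are all assumptions:

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def load_captions(train_dir: str) -> dict[str, str]:
    """Map image file name -> caption text from a sibling .txt file (if any)."""
    captions = {}
    for img in sorted(Path(train_dir).iterdir()):
        if img.suffix.lower() in IMAGE_EXTS:
            txt = img.with_suffix(".txt")
            captions[img.name] = txt.read_text(encoding="utf-8").strip() if txt.exists() else ""
    return captions


def reg_jobs(captions: dict[str, str], trigger: str, seed: int = 420,
             width: int = 1024, height: int = 1024) -> list[dict]:
    """One reg-image job per training image: same name, caption minus the trigger."""
    jobs = []
    for name, caption in captions.items():
        tags = [t.strip() for t in caption.split(",") if t.strip() and t.strip() != trigger]
        jobs.append({"output_name": name, "prompt": ", ".join(tags), "sampler": "DDIM",
                     "seed": seed, "width": width, "height": height})
    return jobs


print(reg_jobs({"001.png": "amcm, a woman wearing a black dress, head shot"}, "amcm"))
```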

warm fog
#

I am renting 1x A100 SXM 80GB on RunPod. What are some good initial settings for DreamBooth SDXL training that optimize for performance? I am using the huggingface/diffusers library with the latest PyTorch, the latest diffusers, and the training pipeline from examples/dreambooth. I guess fp16 precision is a good choice for performance on an A100? I get something like 3.44s/it during training.

```bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0"
```

and ~/.cache/huggingface/accelerate/default_config.yaml looks like this:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
warm fog
tidal silo
#

anyone have any good recommendations for a tool to manage / tag / edit datasets?

orchid dew
#

seems like a good place to ask, but how can someone make a sdxl checkpoint? I can't seem to figure out... anything

tall condor
#

quick question, what is "CV"?

#

"8bit Adam is slower but saves memory and results in a higher CV" - what does CV stand for?

jade hornet
orchid dew
tall condor
#

and what does that mean?

primal isle
ashen field
tall condor
#

ok, then i guess my question is: should i use AdamW or AdamW8bit? i can't find much about the difference

ashen field
tall condor
#

i am wondering what the difference in the model is at the end of the day

#

so far i always used AdamW

#

i am wondering whether i can expect the model to get worse with AdamW8bit

ashen field
tall condor
#

took me 2 months to find mine with AdamW, so better not touch that lol

ashen field
#

Adam8bit will allow higher batch size which can train faster
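Where the memory saving comes from: Adam keeps two state buffers per trainable parameter, 4 bytes each in fp32 versus roughly 1 byte each with bitsandbytes' 8-bit optimizers (ignoring their small per-block quantization constants). A back-of-the-envelope Python sketch; the ~2.6B parameter figure for the SDXL UNet describes a full finetune, and a LoRA trains far fewer parameters, so its absolute saving is much smaller. Weights, gradients and activations are not counted here:

```python
def adam_state_gb(num_params: float, bytes_per_value: float) -> float:
    """Memory for Adam's two moment buffers (m and v), in GB."""
    return 2 * num_params * bytes_per_value / 1e9


params = 2.6e9  # assumption: roughly the SDXL UNet size, i.e. a full finetune
print(f"AdamW (fp32 states): {adam_state_gb(params, 4):.1f} GB")
print(f"8-bit Adam states:   {adam_state_gb(params, 1):.1f} GB")
```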

tall condor
#

does it make sense to train a 1.5 model with 768x768 max size? i read online that it can pick up more details this way

bronze oyster
#

Hi,
How deep should captioning go when training a person/character lora?
I have seen conflicting information on this and was wondering what results people here have had.
At the moment I am not captioning very deep.
Eg:

Uniquetrigger, man wearing tank top, close-up, from side
uniquetrigger, man wearing jacket and pants

Should I be captioning even less?
Eg:

Uniquetrigger, man, from side, close-up

Or more

Uniquetrigger, man with a beard standing in room wearing blue tank top, from side, close-up
Uniquetrigger, man with a beard standing on a balcony wearing blue tank top, from side, close-up

or excessive

Uniquetrigger, man with a beard standing in room wearing blue tank top, from side, close-up, white wall, cabinet, brown eyes, photo
Uniquetrigger, man with a beard standing on a balcony wearing green jacket with black sleeves, from side, close-up, trees, ocean, islands, houses, photo
latent charm
#

Tags or prompts describe your image. In training, your LoRA tries to learn to use your caption to describe the paired training image. With fewer tags, you'd be able to produce a similar image with a shorter caption. Anything you don't mention in the caption gets learned into the tags you did mention.

#

Correct me if I am wrong

jade hornet
#

The guideline I follow is to describe everything but the character. You can describe their clothing if you don't want the AI to associate the clothing in the training images with the character. I'd avoid things like hair, eye color, etc unless you want to be able to change those things in your inference

pliant drift
tall condor
#

which resolution did you use?

#

and what batch size?

pliant drift
# bronze oyster Hi, How deep should captioning go when training a person/character lora? I have ...

my usual captioning strategy. i collect a bunch of tags i want to consistently use. like "muscles, shades, looking cool" whatever. Those might not apply to all your images but you want them to be consistently tagged where they are used. so you form a consistent tag set. thats step 1.

step 2. captioning each image with a template pattern. "token class, tags, clothes, background" so it could be "jack man, muscles, shades, looking cool, lumberjack flannel jacket, heavy duty jeans, in the woods"

step 3. there is no step 3. ez pz.
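The template above is easy to enforce with a few lines of Python. A hypothetical sketch: allowed_tags stands in for the consistent tag set from step 1, and tags outside it raise an error so the set stays consistent:

```python
def build_caption(token_class, tags=(), clothes=(), background=(), allowed_tags=None):
    """Join 'token class, tags, clothes, background' into one comma-separated caption."""
    if allowed_tags is not None:
        unknown = [t for t in tags if t not in allowed_tags]
        if unknown:
            raise ValueError(f"tags outside the tag set: {unknown}")
    parts = [token_class, *tags, *clothes, *background]
    return ", ".join(p.strip() for p in parts if p and p.strip())


TAG_SET = {"muscles", "shades", "looking cool"}
print(build_caption("jack man", ["muscles", "shades", "looking cool"],
                    ["lumberjack flannel jacket", "heavy duty jeans"], ["in the woods"],
                    allowed_tags=TAG_SET))
```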

#

end of the day, just go for it. captioning is just voodoo imo

#

whenever i try using clip and blip to caption, it's always some junk like "man holding a beer and sitting holding a beer with a beer in his hand holding a beer sitting at a beer table with beer"

#

The thing I don't like about those is that there are no commas, so they don't play into the caption shuffle or dropout mechanisms that kohya uses
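To see why commas matter: kohya's caption shuffle (the shuffle_caption option, with keep_tokens pinning the first N tags) works on comma-separated chunks. A rough Python sketch of that idea, not kohya's code; a sentence-style BLIP caption is a single chunk, so there is nothing to shuffle or drop:

```python
import random


def shuffle_caption(caption: str, keep_tokens: int = 1, rng=random) -> str:
    """Shuffle comma-separated tags, keeping the first keep_tokens in place."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    head, tail = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(tail)
    return ", ".join(head + tail)


rng = random.Random(1)
print(shuffle_caption("jack man, muscles, shades, in the woods", keep_tokens=1, rng=rng))
print(shuffle_caption("man holding a beer and sitting at a table", rng=rng))  # one chunk, unchanged
```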

tall condor
#

for me, what i do for captioning is, as @pliant drift mentioned, create my base captions of the concepts i want to train, like "standing, sitting, whatever", and tag all my images

#

then i use an automated tagger and add all the useful tokens i get from it

#

and then i prefix my own tokens

#

for me DeepDanbooru captioner worked quite well, however you need to get rid of the nsfw tags afterwards
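That workflow (own tokens first, then your manual concept tags, then the tagger's extras with unwanted tags filtered out) can be scripted. A hypothetical Python sketch; the function name and blocklist contents are up to you:

```python
def merge_tags(own_tokens, manual_tags, auto_tags, blocklist=()):
    """Own tokens first, then manual concept tags, then tagger output.

    Case-insensitive duplicates are dropped (first occurrence wins) and
    blocklisted tags are removed.
    """
    blocked = {b.strip().lower() for b in blocklist}
    seen, merged = set(), []
    for tag in [*own_tokens, *manual_tags, *auto_tags]:
        key = tag.strip().lower()
        if not key or key in seen or key in blocked:
            continue
        seen.add(key)
        merged.append(tag.strip())
    return ", ".join(merged)


print(merge_tags(["jack man"], ["standing"], ["standing", "1boy", "nsfw", "outdoors"],
                 blocklist=["nsfw"]))
```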

#

@pliant drift when you say you trained with higher resolution do you mean to change the max resolution settings higher or just use high res input images?

tall condor
#

flowwolf you should try the DeepDanbooru captioner, you get a comma-separated list of tags, and if you enhance that with your own the results got much more flexible in my tests

pliant drift
# tall condor flowwolf you should try DeepDanbooru captioner, you get a comma seperated list o...

i've never considered that app because i don't do anime or anything like what danbooru hosts. LOL only reason i know about danbooru is because i was wondering what all this talk about booru tags in anything v3 were

would an app like that matter to me? like, i'm training generalized styles for non anime models. I feel like the anime guys just got way better tooling. Booru tagging culture is actually a huge boon to this new field

tall condor
#

I don't do anime either

#

but I have tried about 10 different taggers, and overall this one worked best for me for LoRAs on people

#

I even tested it on landscapes and it did a good job

#

you can test it here

#

what I found most important is that you still create your concepts and tag the primary tags you want yourself

#

but once you have that sorted you can just use the tagger for the details

#

@flowwolf regarding the sizes, may I ask for more details?

hollow spruce
#

The number of images I manually tagged in the last two days.
Only 25 hours of manual reviewing/editing/tagging.
Roughly 20k images were reduced to that.
100% watermark free.

warm agate
#

@hollow spruce I am training a LoRA and I forgot to add these args to one of the sample prompts: --w 1024 --h 1024 --d 1 --l 7.5 --s 35 --n blurry,text,watermark. Can I edit them in the prompts.txt file in the samples folder?

warm agate
#

If I didn't have those, it would have run into an error

hollow spruce
#

each time the next sample image generates, it checks the .txt file again for what it has to generate

warm agate
#

and would terminate the training

#

experienced it yesterday

hollow spruce
#

so you can change it every epoch if you want XD, even change the prompt entirely

warm agate
#

but my main problem was that it's just 600 steps away from generating samples

hollow spruce
#

I've done that before when I had an unlucky seed for what I was training

warm agate
#

so i was kinda worried

#

as i have to sleep

#

and gonna leave my pc up for training

warm agate
hollow spruce
#

but now I'll just wait it out

hollow spruce
#

After waking up I'll just test everything. If you're not in a rush, you can just load up Comfy anyway and let it overflow into RAM. Sure, it takes like 10 minutes to generate 1 image, but if you're not using the PC, that's not a long wait

warm agate
#

I can't run Comfy simultaneously

#

as I only have 16 GB of VRAM

grizzled agate
warm agate
#

@hollow spruce when using Kohya, does the training continue after the PC wakes from sleep?

#

or will it terminate if we put the PC to sleep?

hollow spruce
#

Ah. You may want to try out the CPU-only method and see if that works in the future (not now, though; try it some time when you can afford to just test whether it works).

hollow spruce
warm agate
#

oh ok

hollow spruce
#

I'd assume it crashes since it uses so many systems

#

not many apps are optimized for sleep

pliant drift
#

@tall condor sorry, I've been wrestling with configurations. When I say using 896 images, I mean I'm bucketing and setting the resolution in kohya to 896,896. I try to make my training sets the highest quality I can, and I let kohya downscale them to the appropriate buckets. Common crops help a lot here
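Bucketing at "896,896" means every image is assigned to one of a set of resolutions whose area is at most 896 x 896. The sketch below loosely follows the way kohya's bucketing is usually described: bucket sides in steps of 64, the area capped at the training resolution's area, and each image going to the bucket with the closest aspect ratio. The 256/1024 side limits and the helper names are assumptions, not kohya's exact code or defaults.

```python
import math

def make_buckets(max_reso=(896, 896), min_size=256, max_size=1024, step=64):
    """All (width, height) pairs in multiples of `step` with area <= max_reso's area."""
    max_area = max_reso[0] * max_reso[1]
    buckets = set()
    side = int(math.sqrt(max_area) // step) * step
    buckets.add((side, side))  # the square bucket
    width = min_size
    while width <= max_size:
        height = min(max_size, int((max_area // width) // step) * step)
        if height >= min_size:
            buckets.add((width, height))
            buckets.add((height, width))  # portrait and landscape variants
        width += step
    return sorted(buckets)

def pick_bucket(img_w, img_h, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    aspect = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))

buckets = make_buckets()
print(pick_bucket(4000, 3000, buckets))  # a 4:3 photo gets downscaled into (1024, 768)
```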

#

Where appropriate, I'll even double up the training data a bit. For people especially, I'll do high-quality square crops of faces. If the close-up crop isn't decent quality at the training resolution, I toss it. Quality training data is paramount
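The face-crop step can be sketched as a hypothetical helper. It takes a face box from any detector and grows it into a square with some margin. It then reports whether the crop is at least the training resolution. If it isn't, that's a "toss it". The margin and the 896 threshold are assumptions tied to the settings mentioned above.

```python
def square_face_crop(img_w, img_h, face_box, margin=0.5, train_res=896):
    """Return ((left, top, right, bottom), big_enough) for a square crop around a face.

    face_box is (left, top, right, bottom) in pixels. The square is the larger face
    side plus `margin` of it on each side, shifted (not shrunk) to stay inside the image.
    """
    left, top, right, bottom = face_box
    side = int(max(right - left, bottom - top) * (1 + 2 * margin))
    side = min(side, img_w, img_h)  # a crop cannot be bigger than the image
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = int(min(max(cx - side / 2, 0), img_w - side))
    y0 = int(min(max(cy - side / 2, 0), img_h - side))
    return (x0, y0, x0 + side, y0 + side), side >= train_res  # False -> toss it

print(square_face_crop(4000, 3000, (1800, 1000, 2200, 1500)))
```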

#

High resolution alone isn't a magic bullet. There should be high-quality imagery too, and that's up to you to eyeball

tall condor
#

and what batch size can you do with that? Like 2?

pliant drift
#

I have 16 GB. I guess maybe 3-4. Gradient checkpointing seems to be a magical way of getting bigger batches at the cost of speed. I use it with huge success.

tall condor
#

Thanks for the input, I will try that

pliant drift
#

I tried DeepDanbooru on a few typical photos of the sort I'd like to deal with. I guess the demo doesn't show the tag editing capabilities. It kind of sucks for photographic purposes, and the only thing it sees accurately is the text. I hope the editing abilities are better

#

Right now I use a janky script I milked out of ChatGPT

tall condor
#

I have something for that, one sec

#

it's quite fancy

#

it works quite well but doesn't produce tags; it's more like CLIP

hollow spruce
#

Will post more info on this eventually,
but if you're training a single face, then use dim 24/alpha 1.
That is big enough that no detail should be lost at all.
Emphasis on face only.
For essentially all other LoRAs, dim 8/alpha 1 is still the way to go.

young crater
sour eagle
#

Getting this error when trying to merge LoRAs:
merge_lora.py", line 6, in <module>
import library.model_util as model_util
ModuleNotFoundError: No module named 'library'

tall condor
#

@pliant drift when I increase the max size to 768, my model turns into maximum crap

tall condor
#

any suggestion on the settings?

orchid yoke
# hollow spruce will post more info on this eventually, but if you're **training a single face, ...

Whoa, a quick test (because your workflow allows for such quick tests), and... it seems to have made a lot of difference. 🙂 I'm particularly focused on quick LoRAs, as I use them for reference, so I usually have about 10 face shots (with 1.5, 100-odd repeats on each image and random crap tagging was fine). Looking forward to your detailed guide. If you have a Buy Me a Coffee or some such, let me know. Keenly following, as you've been a massive help.

latent charm
#

@hollow spruce Sorry to bother you, but if we are not training the text encoder on SDXL, why would the trigger words work with the LoRA? As I understand it, in previous LoRAs the trigger was learned into the text encoder. When we loaded the LoRA with a model, we used the trigger words to trigger the related trained features. In my SDXL LoRA testing I am still doing the same thing, without training the text encoder. I don't understand how the model + LoRA could know my trigger word and trigger the related features.

hollow spruce
# latent charm <@211089689652887552> Sorry to bother you, if we are not training the text encod...

CLIP is like a translator. It takes your prompt and translates it into AI language.
If you are training a specific car and give it the caption "car", then CLIP will take the caption and do its best to translate it into car data.
If you give it an image of a monkey and call it "car", then CLIP will still translate it into car data, and the UNet will just stare at CLIP in a very confused way and mumble: "well, if you say so...", causing all future cars to very, very slowly converge into monkeys.
But if you used the caption "monkey", then it would make a lot more sense, and the UNet would learn it much faster, since it's data that already makes sense when the UNet looks at the converted word

#

If you actually train CLIP,

then in the first example it will look at your car and make the AI word for "car" sound a little more like your English word "car". It makes the whole translation process smoother and easier.
In the second example, telling it that monkeys are cars... it's gonna REALLY mess up everything about both cars and monkeys XD. But after enough epochs, and after having seen enough monkeys that are all called "car", the AI will essentially learn the new meaning of the word, and translation will be smoother.

So why don't we do CLIP training for everything?
Because just telling it that its translation is wrong often does more harm than good. You only give it a word, with very little context, while the original meaning the AI had before you came along was incredibly complex and linked to many things.

#

For cars this isn't much of an issue, as it will learn quickly. But if you dare to teach it eyes, neck, hands, feet or any such words, which are extremely complex when viewed in the context of the bigger picture, then unless you're willing to provide it with around 5k images, you'll only teach it wrong things.

#

CLIP training is great. Half-assed CLIP training is bad.

#

Here's an example. I'm currently retraining "girl", since CLIP's understanding of that word is pretty wrong.

#

vertical is clip training, horizontal is unet training

#

basically, once my training is done, I'll be able to specify age groups, without bias from the words "girl" or "woman", since I'm completely retraining the meaning of those words in both clip & unet

#

but it comes at the cost of 4200 manually tagged images 🤣

#

still not nearly enough, as I'll need about 3000 images per ethnicity

latent charm
#

I think I get it after seeing your explanation. For example, my trigger word is a string of characters, xeanlwan, and the text encoder is frozen. The trigger word would be split into something like xe, an, nl, w, and the UNet got trained on those pieces, learned from the training images. When I load the LoRA, the text encoder splits my trigger word into the same combination, and because of that, the UNet can bring up the trained features.
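The key point in that reasoning is determinism. A frozen tokenizer and text encoder turn the same string into the same pieces, and so the same embeddings, at training time and at inference time. That is what lets the UNet associate them with the trained features. The sketch uses a toy vocabulary and greedy longest-match splitting as stand-ins. CLIP's real tokenizer is a byte-pair encoder with a much larger vocabulary, so its actual split of "xeanlwan" will differ.

```python
# Toy vocabulary for illustration only; not CLIP's real vocabulary.
TOY_VOCAB = {"xe", "an", "nl", "lw", "w", "a", "e", "l", "n", "x"}

def greedy_tokenize(word, vocab=TOY_VOCAB):
    """Split a word by repeatedly taking the longest matching vocabulary entry."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocab entry matches {word[i:]!r}")
    return pieces

training_time = greedy_tokenize("xeanlwan")
inference_time = greedy_tokenize("xeanlwan")
print(training_time, training_time == inference_time)  # same pieces both times
```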

#

thanks for sharing

hollow spruce
#

It's usually best to change your trigger word to one that matches your content.
If you can't change it (you're training many, many concepts and don't want to write a guide for how to use your LoRA),
then you have no option but to train the text encoder to make all your trigger words work

latent charm
#

Does that mean changing my trigger word to the closest word in the vocab would make the training much faster?

#

For example, when I train an anime character called jianguo, rather than using his name as the trigger word, I should use "male anime character" to trigger the LoRA?

hollow spruce
#

it's literally like a multiplier

#

if you're training a face for example, you should find out which celebrity looks closest to you, then use their name to train your own face XD

latent charm
#

But does that mean I need to input his name to trigger the LoRA?

#

lin the celebrity example

#

Thank you for sharing this useful information

supple lynx
#

Has anyone tried textual inversion with the XL model? I'm getting mixed results

hollow spruce
latent charm
hollow spruce
#

and I was hinting at gold medalist lin sheng

latent charm
#

Oh, I didn't realize the name was a celebrity's. Lol

hollow spruce
#

you don't actually use your own name. You use the name of a famous person that looks like you! XD

latent charm
#

Now I get that

pliant drift
tall condor
#

which one worked best for you?

slim thistle
#

Does anyone know if you can get the entire CodeFormer algorithm into Stable Diffusion? There is an extension, but it seems it doesn't come with the uber face upscaling and enhancement

covert pagoda
sacred grail
pliant drift
#

LoRAs have an essential requirement according to the kohya documentation: --network_train_unet_only (I think that's correct), so keep that in mind.

Now, those other learning rates are optional. If they're both 0, then they both fall back to the main learning rate. But since you're only training the UNet, the TE setting here doesn't matter anyway. Understanding what the settings are helps a lot. They're optional settings for when you don't want the TE and the UNet to use the same LR.

As usual, I'm always coming back to this: RTFM. I could've mentioned --network_train_unet_only at the beginning too, but I thought it was implied and already known. I tend to trust people have at least read the man pages when they're looking for help.
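The fallback described above, where 0 means "not set, use the main learning rate", can be sketched like this. The helper and its argument names are hypothetical. They mirror the GUI fields, not kohya's code.

```python
def effective_lrs(learning_rate, unet_lr=0.0, text_encoder_lr=0.0, train_unet_only=False):
    """Resolve per-module learning rates; 0 means 'not set, use learning_rate'."""
    unet = unet_lr or learning_rate
    # With UNet-only training the text encoder isn't trained, so its LR is irrelevant.
    text_encoder = None if train_unet_only else (text_encoder_lr or learning_rate)
    return {"unet": unet, "text_encoder": text_encoder}

print(effective_lrs(1e-4))
print(effective_lrs(1e-4, unet_lr=5e-4, train_unet_only=True))
```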

latent charm
#

@hollow spruce How would you find the closest celebrity for LoRA training? I tried one, but I think the celebrity I chose wasn't the closest, and it did affect the likeness of the LoRA.

supple lynx
supple lynx
#

originally from diffusers, with lots of my own edits

supple lynx
#

🤷‍♀️ i can share if you interested

covert pagoda
hollow spruce
latent charm
hollow spruce
latent charm
sacred grail
#

I have a question: is Micro-Conditioning from the paper also related to training, or is it only related to generation?
If I select "don't upscale buckets" when training a LoRA (training at the original resolution of the images) and then use Micro-Conditioning when generating in Comfy, does it know that it should use the quality of the images above 1024 px but still use the knowledge from the images under 1024 px?
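For context, SDXL's size/crop micro-conditioning is a short list of numbers fed to the model alongside the prompt. The list holds the original image size (height, width), the crop's top-left corner, and the target size, in the order the diffusers SDXL pipeline concatenates them. During training, each image's entry records its real size and crop. At generation you choose the values, for example an original size of (1024, 1024) to ask for high-res-looking output. So the conditioning is learned during training and used during generation. Whether a given LoRA trainer passes each image's true original size when "don't upscale" is on is an assumption to check in its code. The helper below is hypothetical.

```python
def sdxl_size_conditioning(original_size, crop_top_left, target_size):
    """The six size/crop conditioning numbers: original (h, w), crop (top, left), target (h, w)."""
    return [*original_size, *crop_top_left, *target_size]

# Training: a hypothetical 800x600 (w x h) image used at its native size, no crop.
print(sdxl_size_conditioning((600, 800), (0, 0), (600, 800)))
# Generation: ask for output that "looks like" it came from a 1024x1024 original.
print(sdxl_size_conditioning((1024, 1024), (0, 0), (1024, 1024)))
```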

stone garden
pliant drift
hollow spruce
latent charm
hollow spruce
#

the more uncropped they are, the longer it takes, and the more images you need

latent charm
#

My dataset is 15 images, mostly upper body.

#

I am trying 400+ images now with celebrity training.

rough crypt
#

I see many fine-tuned anime XL models on Civitai now, but all of them perform badly, even worse than 1.5. I think training SDXL isn't just about training the base; the refiner model is even more important. That's why there is still no good anime XL model.
SDXL is good 50% because of its strong CLIP and 50% because of the refiner model. Without the refiner model, SDXL's results aren't impressive.
It seems that most people still don't realize how important the refiner model is.

#

If there is a good anime SDXL model, it must have a matching anime refiner model.

sacred grail
#

or the other way around maybe

latent charm
#

I do agree a good fine-tune should have its own paired refiner model, especially an anime model, which is a totally different thing from the current model.

ashen field
#

SAI should provide insights on how to train properly, or at least how they did it. It's kind of weird that they did beta testing with finetuners, yet no best practices are being shared.

#

For example, to this day we don't know whether 1.0 uses offset noise or zero terminal SNR.
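Since these two terms come up here, here is a minimal pure-Python sketch of what each one means. Offset noise adds one random constant per latent channel on top of the usual Gaussian noise, which is the idea behind kohya's `--noise_offset`. Zero terminal SNR rescales the noise schedule so the last timestep is pure noise, as proposed by Lin et al. The schedule is the SD-style "scaled_linear" one (beta 0.00085 to 0.012 over 1000 steps), and the 0.1 offset is an arbitrary example value. None of this says which technique, if either, SDXL 1.0 actually uses.

```python
import math
import random

def offset_noise(channels, height, width, noise_offset=0.1, rng=None):
    """Gaussian noise plus one random constant per channel ("offset noise")."""
    rng = rng or random.Random()
    noise = []
    for _ in range(channels):
        shift = noise_offset * rng.gauss(0.0, 1.0)  # same shift for the whole channel
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                      for _ in range(height)])
    return noise

def scaled_linear_betas(n=1000, beta_start=0.00085, beta_end=0.012):
    """SD-style 'scaled_linear' schedule: betas evenly spaced in sqrt space."""
    a, b = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(a + (b - a) * i / (n - 1)) ** 2 for i in range(n)]

def terminal_snr(betas):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at the last timestep."""
    alpha_bar = math.prod(1.0 - beta for beta in betas)
    return alpha_bar / (1.0 - alpha_bar)

def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last timestep has SNR exactly 0."""
    alpha_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    s = [math.sqrt(x) for x in alpha_bar]
    s0, s_last = s[0], s[-1]
    s = [(x - s_last) * s0 / (s0 - s_last) for x in s]  # keep s[0], force s[-1] to 0
    alpha_bar = [x * x for x in s]
    alphas = [alpha_bar[0]] + [alpha_bar[t] / alpha_bar[t - 1] for t in range(1, len(alpha_bar))]
    return [1.0 - a for a in alphas]

betas = scaled_linear_betas()
print(terminal_snr(betas), terminal_snr(rescale_zero_terminal_snr(betas)))
```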

latent charm
#

And the training method is also different

ashen field
sacred grail
ashen field
latent charm
ashen field
#

To my best knowledge, 1.0 doesn't use offset noise, but we shouldn't be the ones making guesses when such things could easily be explained by SAI officially

ashen field
latent charm
#

I do agree SDXL is somewhat lacking in documentation. Many things need to be experimented with. It would be nicer if the dev team could provide more information.

ashen field
#

A release shouldn't be "let's throw it out there and let people solve puzzles"; documentation is important

#

Especially given the fact that there was a beta test

latent charm
#

I could imagine 0.9 was rougher than 1.0, and they couldn't postpone 1.0 again

tall condor
#

Prodigy and Adafactor use way too high a learning rate for me. Is there any way to control that?

latent charm
#

Both of them should auto-adjust during training

ashen field
#

LoRA does tend to use a high LR; nothing inherently bad about that

stone garden
#

Could someone edit one caption for me so it’s in the proper format (I could work on editing the rest myself):

The image shows a woman with long red hair wearing a black top and looking up at the sky with a pensive expression on her face. The background is a cityscape with skyscrapers and other buildings visible in the distance. The image is well lit, with the sun shining down on the woman's face and casting shadows on her body. The overall mood of the image is contemplative and introspective.

pliant drift
#

It's a machine-learned neural network. Releasing it with documentation would mean 3 years of research into figuring out how to document it

#

These tend to be black-box systems, even to the people who make them. It's sort of a key aspect of the entire field of machine learning.

ashen field
pliant drift
#

Also, due to the nature of open-source fields, most of the components are made by different people. The optimizers, for instance: want to learn about D-Adaptation? The documentation is in the project. https://github.com/facebookresearch/dadaptation

pliant drift
ashen field
# pliant drift how are they to determine best practices?

You think they didn't do any research during training and just hit run and released the one model they got? There's a lot of trial and error during fitting and a lot of learning during beta testing, and not much of that has been shared yet

pliant drift
#

These complaints have strong armchair-expert energy to them. You're coming in here demanding help and blaming the world because you haven't found it yet. A lack of documentation on cutting-edge software is kind of expected, and always has been. There is so much information for you to search out and sink your teeth into, but instead you're coming here and wasting energy blaming Stability for not releasing documentation about a black-box system

#

Have you read the paper they released for SDXL on arXiv? They published one

#

Complaining about a lack of documentation when you've not even studied the paper or taken any notes on it

ashen field
pliant drift
ashen field
#

Such as offset noise

pliant drift
#

I can say that because I've managed to find my way around just fine. I haven't relied on any YouTube tutorials at all since the 2.1 release, when I figured out that none of them really knew what they were talking about

pliant drift
#

1.0 is a progression of 0.9 too, so all the information in the earlier version of the paper still matters

ashen field
#

It’s not like they don’t have the info, but not everyone is as talented as you or as patient to dig stuff out

pliant drift
#

Have you even read kohya's training manual? All of that information still applies too

ashen field
pliant drift
#

Because you're making unreasonable demands when we break it down, like I have.

#

believe me when i say i'm not angry about this.

ashen field
pliant drift
#

"The community shouldn't be digging for information" feels really lazy and more like "I don't want to do legwork", since nobody researching any field will find all the information in one place. Research and learning will always require legwork. I don't understand people's aversion to research. Learning is awesome. Spoon-fed learning is just doing what you're told. Going out and discovering exactly the knowledge you need is what being human is all about.

ashen field
pliant drift
#

Users won't read the manual; always assume that. The lazy people will always want someone else to do it for them. That'll never end, regardless of all of Stability's efforts. We are in the Eternal September

#

#oldmemes

#

Example: Linux is arguably better documented than Windows. Windows costs money while Linux is free. People still use Windows because they don't want to read manuals, ever

#

I really don't buy the altruism schtick, if I wasn't being clear

tall condor
#

I understand that LoRA uses a high LR, but the resulting LoRAs are so dominant

ashen field
tall condor
#

I mean that as soon as I add it, it becomes very dominant in the results. I need to turn it down to 0.4 or less to generate regular concepts, but if I turn it down that far, it barely generates the face right

#

if this was a regular model i would say its very overfit

#

I have tried about 10 different settings with the LoRA, but apparently it's either too strong or too weak. It seems much harder to get right than creating actual models

#

Also, after around 50 epochs the loss drops below 0.02 and keeps going

#

i think its just very fast overfitting

#

How many epochs do you guys usually run?

latent charm
#

You could just save the lora more frequently and test which one is the best

tall condor
#

Yes, obviously I do that, but I find it quite hard to find proper settings

latent charm
#

If the loss comes down to <0.02 at 50 epochs, you might try 10, 20, 30, 40 and 50 and find out which one is overcooked. Once you find it, move to the previous one and test it out.

#

Repeat the process and you might find the perfect one
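The search above narrows down the saved checkpoints by hand. Suppose overcooking only gets worse with more epochs. Then the same idea can be written as a binary search over the saved epochs, which needs fewer test renders. This bisection is a swapped-in technique, not literally what the message prescribes. `is_overcooked` is a hypothetical stand-in for you rendering samples and judging them.

```python
def last_good_epoch(saved_epochs, is_overcooked):
    """Binary-search the saved checkpoints for the last one that isn't overcooked.

    Assumes overcooking is monotonic: once an epoch looks overcooked,
    every later one does too. Returns None if every checkpoint is overcooked.
    """
    epochs = sorted(saved_epochs)
    lo, hi, best = 0, len(epochs) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_overcooked(epochs[mid]):
            hi = mid - 1          # look at earlier checkpoints
        else:
            best = epochs[mid]    # good so far; look at later ones
            lo = mid + 1
    return best

# Pretend everything after epoch 30 looks overcooked when you eyeball it.
print(last_good_epoch([10, 20, 30, 40, 50], lambda e: e > 30))
```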

tall condor
#

But even at 50 epochs some concepts tend to underperform, so I can't just reduce epochs

#

What learning rate are you guys using, how many epochs, and how many repeats in each epoch?

latent charm
#

If your loss is less than 0.02, it should mean you could reproduce the training image from its training caption with very high similarity, let's say 98%.

tall condor
#

Yes, but it usually also means that it can't create anything else

latent charm
#

yes, how come some concepts underperformed?

tall condor
#

well some concepts are just harder to learn and take more epochs

#

I try to balance that out with a higher repeat count, but you can only push that so far, because otherwise it will break the model
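One way to set the repeat counts described above is to aim for every concept being seen roughly equally often per epoch, with a cap so repeats can't be pushed too far. This is a hypothetical helper. The cap of 20 is an arbitrary example, not a recommended value.

```python
def balance_repeats(image_counts, target_per_concept=None, max_repeats=20):
    """Pick per-concept repeats so each concept is seen about equally often per epoch.

    image_counts maps concept -> number of images. The default target is the
    largest concept's image count; repeats are clamped to [1, max_repeats].
    """
    target = target_per_concept or max(image_counts.values())
    return {concept: max(1, min(max_repeats, round(target / n)))
            for concept, n in image_counts.items()}

print(balance_repeats({"standing": 40, "sitting": 10, "jumping": 4}))
```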

latent charm
#

Do the concepts perform correctly in the 50-epoch result? If yes, you might try the 40-epoch one. If 40 is great, you could use 40 epochs next time, or rerun the same dataset with 40 epochs and see if it's OK.

tall condor
#

With Adafactor, the resulting LoRA already produces almost nothing but the trained subject at 50 epochs

#

I need to do some research on the behavior, I guess

#

prodigy is even worse

latent charm
#

And you don't need to run the full training if you can see the result is already good enough

tall condor
#

what dim/alpha ratio are you guys using?

small eagle
#

possible for a mod to pin this?

sacred grail
tall condor
#

So for me it is very unexpected that a rank 8 LoRA has almost the same effect as a rank 256 one. Why is that?

restive bridge
#

I need clarification on something. Online guides are very conflicted on this.

As I understand it: if my 20 training images are in the training img folder "20_ronald", that's 400 after repeats. Now if my reg folder, "1_man", also has 20 images in it, then there's one reg image for every training image.

BUT if I put way more images into the reg folder and keep the "1_man" name, training still runs for the same number of steps. So theoretically, if I put 400 images into reg rather than 20, there would be one unique reg image per training image repeat, rather than 20 reg images repeated 20 times.

Correct? Why does this not lengthen training time? Why are people being told to use the same number of images in both? Is there even a benefit to putting more images in "1_reg" than in "20_img"?
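Here is the arithmetic in the question as a sketch. Kohya-style folder names are `<repeats>_<name>`, and the number of images seen per epoch is the sum of repeats x images over the training folders. Reg folders are left out here. The helper names are hypothetical.

```python
import math

def parse_folder(name):
    """Split a kohya-style '<repeats>_<name>' folder, e.g. '20_ronald' -> (20, 'ronald')."""
    repeats, sep, token = name.partition("_")
    if not sep or not repeats.isdigit():
        raise ValueError(f"expected '<repeats>_<name>', got {name!r}")
    return int(repeats), token

def samples_per_epoch(folders):
    """folders maps training folder name -> image count; returns images seen per epoch."""
    return sum(parse_folder(name)[0] * count for name, count in folders.items())

def steps_per_epoch(folders, batch_size=1):
    return math.ceil(samples_per_epoch(folders) / batch_size)

print(samples_per_epoch({"20_ronald": 20}))  # 400, as in the question
```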

jade hornet
#

Unless you get the guy who wrote kohya_ss to comment, or someone who actually understands the code, it's just guesswork

restive bridge
#

@safe pecan 🤔

jade hornet
#

The way I try to think of it: every comparison of your training image to the class results in a score, which gives you the loss. But the class itself changes as you fine-tune the model. The reg images serve as a stable, unchanging representation

#

That's why people recommend 1 for 1

rough crypt
#

Does anyone know how to train an SDXL refiner model on custom data?

hollow spruce
restive bridge
hollow spruce
#

Essentially, the moment you declare a regularization folder, your total steps are multiplied by 2. Images from the regularization folder are taken until they match the number of (repeated) training images, or repeated until they match it
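Here is a model of the behaviour described above, assuming it is accurate. Reg images are drawn in a cycle until they match the number of repeated training images. The size of the reg folder therefore only changes which reg images get used, not the step count. That would explain why 400 reg images don't lengthen training compared with 20. This is a hypothetical sketch, not kohya's code.

```python
from itertools import cycle, islice

def reg_images_per_epoch(train_images, train_repeats, reg_images, reg_repeats=1):
    """Reg images drawn for one epoch: cycle through the reg folder (honouring its
    repeat count) until they match the number of repeated training images."""
    needed = len(train_images) * train_repeats
    pool = [img for img in reg_images for _ in range(reg_repeats)]
    return list(islice(cycle(pool), needed))

train = [f"train_{i}" for i in range(20)]              # "20_ronald" with 20 images
few = reg_images_per_epoch(train, 20, [f"reg_{i}" for i in range(20)])
many = reg_images_per_epoch(train, 20, [f"reg_{i}" for i in range(400)])
print(len(few), len(set(few)), len(many), len(set(many)))
```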

restive bridge
hollow spruce
#

I've currently sidestepped this by not using a regularization folder at all and instead including it as an additional training folder. While I can confirm that this works for big, more complex LoRAs trained on datasets of over 1k images (for both the training images AND the reg images), I haven't tested it with smaller datasets yet

restive bridge
hollow spruce
#

I don't train faces enough to have tested this properly, but I can tell you the 2 ideal theoretical ways.

Option 1:
Training token: <celebrity name>
Regularization: man (or woman)

Option 2:
Training token: <celebrity name> man
Regularization: man

Option 2 follows the original DreamBooth intention a bit more. I can't confirm whether it is better or not, though

latent charm
#

@hollow spruce I tried the celebrity method, and I think it gives less likeness than the unique-token method. Some of the celebrity's features remain in the results, compared to the same training with a unique token.

#

I also tried using your reg files. Compared to no reg, they kind of prevent my LoRA from learning the features in the training images.

hollow spruce
#

oh no D:
thanks for letting me know. I'll mess around with it tonight, and see what changes work best

restive bridge
#

but to be fair, my LR is 0.0003, so it's less relevant to Caith's params

hollow spruce
latent charm
#

I trained for 400 epochs and the result was still the same as 20-40 epochs with celebrity + reg

hollow spruce
latent charm
#

Yeah, my training data is Asian, which seems to conflict with your reg set.

restive bridge
latent charm
#

also some unusual features

hollow spruce
#

I finally got all the images though, so now its just a matter of filtering and tagging ^^

covert pagoda
restive bridge
covert pagoda
# restive bridge yes you can't spread across epochs but you can spread across repeats. if a 10 im...

Thanks for simplifying and clarifying. So, if we can only randomize within an epoch's repeats, it stands to reason to only use as many class images as there are in an epoch (200 in the case above). Furkan from SECourses suggested to Bernard that he implement a way to link class image randomization across epochs, but unfortunately he said he didn't have the time. I think the Dreambooth extension solves this by using global repeats for class image randomization

versed crescent
#

Can I double check something with folks here? I've read a reddit post on SDXL lora training, and in the example they have 14 training images repeated 7 times, and 200 regularisation images, with a training step count of 3000. In their post they claim it takes 30 minutes on a 4090.

Now I'm trying a broadly similar setup using the same params (as far as I can tell) and it's estimated to be about 20 hours on my 4080. Is there something I've likely got wildly wrong? I can't imagine estimated training time is non-linear. There aren't so many posts on using sdxl_train_network.py so I'm having trouble cross-checking this with other sources.
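Both setups run 3000 steps, so the two claims can be compared per step. 30 minutes works out to 0.6 s per step and 20 hours to 24 s per step, a 40x gap. That is far more than the speed difference between a 4090 and a 4080, which points at a configuration difference rather than the card. The helper names below are hypothetical.

```python
def seconds_per_step(total_minutes, steps):
    return total_minutes * 60 / steps

def estimate_hours(steps, sec_per_step):
    return steps * sec_per_step / 3600

reported = seconds_per_step(30, 3000)   # the reddit claim on a 4090
mine = seconds_per_step(20 * 60, 3000)  # the 20-hour estimate on a 4080
print(f"{reported:.2f} s/step vs {mine:.2f} s/step, ratio {mine / reported:.0f}x")
print(f"3000 steps at the reported speed: {estimate_hours(3000, reported):.2f} h")
```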

rancid tartan
#

How do I train a character LoRA? I tried making a character and style LoRA with 10 images and I got the style but failed on the character
By some miracle, I have access to my desktop again, so I should be able to train in a reasonable time

versed crescent
#

Do LoRA dimensions always need to be powers of 2?

hollow spruce
versed crescent
orchid yoke
#

Caith, kinda curious why your posts aren't pinned... it's like, anything you say should be revered (IMO). Anyone poking about in the things you've said already would learn a lot. Felt impolite to @ you, so feel free to ignore it or not even see it 😄

hollow spruce
#

it's fine though. some day once my guide is up, I'll get them to link that!

stone garden
#

Question, what are people's views on reg images? Also, in my LoRA, I can't get people's teeth to come out right. Any tips?

orchid yoke
hollow spruce
stone garden
#

I mean, I'm not picky whose teeth they are, they should look like teeth and not one big blob.

hollow spruce
#

I didn't solve it for my mega lora until I hit 4k images in my dataset 🥲 (essentially a finetune, but in a lora)

sacred grail
# stone garden Question, what are people's views on reg images? Also, in my LoRA, I can't get p...

I had this theory for why reg images are not working properly: I personally think you'd have to generate them with the same seed you set for your training, and with the same prompts as the captions of your training images. Not sure how accurate this is; I didn't have time to test it properly yet...
Reg images in DreamBooth needed the activation word of your model (to target the right part of the model), so that might be the same if you use captioned images in a LoRA

hollow spruce
stone garden
#

Ty so much. 😄

warm agate
#

@hollow spruce i trained the lora

#

but I don't know if it's really good

#

do you have any workflow and sample prompts to test?

hollow spruce
#

I usually mess with random civitai prompts, and generate an image without and with my lora
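The with/without comparison can be made systematic. Fix the prompt and the seed, and only vary the LoRA strength. The sketch below just builds that job list. `<lora:name:weight>` is A1111 prompt syntax; in ComfyUI you would set the LoRA loader's strength instead. The strengths and names are arbitrary examples.

```python
from itertools import product

def ab_test_jobs(prompts, seeds, lora_name, weights=(0.0, 0.5, 1.0)):
    """Same prompt + same seed at several LoRA strengths (0.0 means without the LoRA)."""
    jobs = []
    for prompt, seed, w in product(prompts, seeds, weights):
        text = prompt if w == 0 else f"{prompt} <lora:{lora_name}:{w}>"
        jobs.append({"prompt": text, "seed": seed, "lora_weight": w})
    return jobs

for job in ab_test_jobs(["portrait photo of a woman, city street"], [42], "my_lora"):
    print(job)
```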

warm agate
#

@hollow spruce can you share a workflow for testing a LoRA? I don't have any

versed crescent
#

Can you get over-training with a LoRA? I remember over-training in SD1.5 DreamBooth, and the subject's likeness was present in every single face that was generated

hollow spruce
#

up to a certain point it's usually good, and after a certain time it gets bad

#

especially bad if you accidentally also trained things like low quality noise/backgrounds/a watermark present in all training images

versed crescent
#

Ok, and I guess that's why you have checkpoints generated every n epochs, and you manually test them to find the right balance

versed crescent
#

I haven't quite figured out the maths for a good epoch balance, so my current run has generated 18, which is probably a bit excessive 😄

leaden patio
#

does dreambooth work if you have jpgs, pngs, and webps?

#

i mean in the same dataset

hollow spruce
#

it will work, but if you have sidecar caption files with the same name,
then obviously "image.jpg" and "image.png" can't both have their own "image.txt"

#

that can lead to serious issues - but other than that, you're good to go ^^
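A mixed-format folder can be checked for this clash before training. This is a hypothetical helper that works on a list of file names. It reports every stem that more than one image shares, since those images would all map to the same `.txt` sidecar caption.

```python
from collections import defaultdict
from pathlib import PurePath

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def caption_collisions(filenames):
    """Return {stem: [images]} for stems shared by more than one image."""
    by_stem = defaultdict(list)
    for name in filenames:
        p = PurePath(name)
        if p.suffix.lower() in IMAGE_EXTS:
            by_stem[p.stem].append(name)
    return {stem: names for stem, names in by_stem.items() if len(names) > 1}

print(caption_collisions(["image.jpg", "image.png", "image.txt", "other.webp"]))
```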

versed crescent
#

Are the sidecar caption files suffixed with .npz?

#

Or is that an intermediate cache of the latent values? (maybe this depends on your workflow. I'm using kohya-ss/sd-scripts)

shadow stream
#

When I train my LoRA, after a certain number of epochs I just get black output. Is that overtraining, or what is that? Any idea what I am doing wrong?

stone garden
#

Is the loss going up a bad thing, and what can I do to control it? It goes up, but only slightly. I'm already using prodigy with ["decouple=True", "weight_decay=0.01", "d_coef=2.0", "use_bias_correction=True", "safeguard_warmup=False", "betas=0.9,0.99"], I've noticed it happens when the d*lr/d jumps.

shadow stream
#

I'm getting it with other samplers too

stone garden
#

You probably need a lower learning rate and a different sampler.

shadow stream
stone garden
#

I don't know, I'm still doing SD1.5 first.

restive bridge
shadow stream
restive bridge
#

styles, concepts, objects, animals, mostly anything can be trained much faster than faces if you're going for flexibility + likeness + photoreal

shadow stream
#

Thank you for the explanation. What are the best resources for best practices in captioning, regularization and parameters when it comes to faces? Any up-to-date guides that work for SDXL?

restive bridge
stone garden
restive bridge
covert pagoda
narrow kraken
#

hello everyone

#

I just can't get this to run on Google Colab

#

no matter what i do, it throws the following error

glossy valve
#

Has anyone tried to finetune XL 1.0 refiner for anime style?

hollow spruce
sonic narwhal
#

How did those two guys finetune the base?

stone garden
#

How is ANYONE training SDXL LoRAs? Which commit of kohya-ss do you guys use, the dev and sdxl branches are broken.

hollow spruce
ruby pond
#

Does anyone know how to find and remove Unicode characters from the caption files?
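Here is a minimal stdlib sketch. NFKD normalization folds accented letters to their base letter, and anything still outside ASCII (emoji, curly quotes, CJK) is dropped. This also deletes legitimate non-English text, so check what `non_ascii_chars` finds before rewriting files. The helper names are hypothetical.

```python
import unicodedata
from pathlib import Path

def non_ascii_chars(text):
    """The distinct non-ASCII characters in `text` (the "find" part)."""
    return sorted({ch for ch in text if ord(ch) > 127})

def to_ascii(text):
    """Fold accents (é -> e), then drop any remaining non-ASCII characters."""
    folded = unicodedata.normalize("NFKD", text)
    return folded.encode("ascii", "ignore").decode("ascii")

def clean_caption_dir(folder):
    """Rewrite every .txt caption in `folder` as ASCII; return names of changed files."""
    changed = []
    for path in Path(folder).glob("*.txt"):
        original = path.read_text(encoding="utf-8")
        cleaned = to_ascii(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed.append(path.name)
    return sorted(changed)

print(non_ascii_chars("café, “smile”"), to_ascii("café, “smile”"))
```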

hollow spruce
#

Speed training for faces! it's time

Lora Training Settings - speed training faces edition

(24 GB VRAM version for 3090/4090 or datacenter cards) - no regularization images - trains relatively fast

Use exactly 40 or 45 or 50 or 55 or 60 images (multiple of 5, and as close to 50 as possible)
Do we need captions for images? Yes! Because this trains CLIP, the instructions are a bit more important to make this exact setup work
What captions?

  • A trigger word (caith, sdxl_token, george, sara, ohwx, shirogane-sama <- can be anything, as we're CLIP training. No need for celebrity names. Just please don't use "coffee shop" or "toyota". Any normal names or completely made-up names will work, though)
  • Your class token (boy/man/girl/woman)
  • any features that aren't present in all images (glasses/sweater/suit/outdoor/indoor/shower/red lipstick/black lipstick)
A few example captions for images from my dataset:

girl, glasses, indoor, shirogane-sama
cindy aurum cosplay, girl, shirogane-sama
asuka cosplay, girl, indoor, shirogane-sama

(order doesn't matter - since we use shuffle captions!)

Training Images setup:

  • they don't all need to be 1024x1024 - it's fine to have some lower quality ones, and it's fine to have images like 2048x2048
  • they do need to be perfect squares 1:1, as we won't be using buckets, to reduce the amount of things that can go wrong. (buckets work just fine, we don't use them to keep this as simple to follow as possible)
  • images should be zoomed in on faces, similar to portrait shots. I'll include 4 sample images of my dataset, so you can see what level of zoom is recommended
  • folder name should be your class token that represents your images. Choose 1 from these 4: boy/man/girl/woman
    (my folder name was 1_girl in this case)
  • No need for regularization photos
  • Repeat must be set to 1
  • Caption files need to have the same name as the images:
    1bec16d.jpg
    1bec16d.txt
  • jpg/png/webp all work just fine - but obviously make sure they all have unique names
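The dataset rules above are easy to get wrong by hand, so here is a small, hypothetical Python checker (not part of kohya_ss). It assumes the layout described above: one `<repeats>_<class>` folder such as `1_girl`, jpg/png/webp images, and one `.txt` caption per image with the same base name. It only checks names and counts and never opens the images, so the 1:1 square requirement still has to be checked separately.

```python
import re
from pathlib import PurePath

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
CLASS_TOKENS = {"boy", "man", "girl", "woman"}

def check_dataset(folder_name, filenames):
    """Return a list of problems for one training folder (empty list = looks OK)."""
    match = re.fullmatch(r"(\d+)_(.+)", folder_name)
    if not match:
        return [f"folder {folder_name!r} should look like '1_girl'"]
    problems = []
    repeats, class_token = int(match.group(1)), match.group(2)
    if repeats != 1:
        problems.append(f"repeats is {repeats}, this setup expects 1")
    if class_token not in CLASS_TOKENS:
        problems.append(f"class token {class_token!r} is not one of {sorted(CLASS_TOKENS)}")

    paths = [PurePath(name) for name in filenames]
    image_stems = [p.stem for p in paths if p.suffix.lower() in IMAGE_EXTS]
    caption_stems = {p.stem for p in paths if p.suffix.lower() == ".txt"}

    # a.jpg and a.png would both want a.txt, so image base names must be unique
    duplicates = sorted({s for s in image_stems if image_stems.count(s) > 1})
    if duplicates:
        problems.append(f"duplicate image names: {duplicates}")
    missing = sorted(set(image_stems) - caption_stems)
    if missing:
        problems.append(f"images without captions: {missing}")

    count = len(image_stems)
    if count % 5 != 0 or not 40 <= count <= 60:
        problems.append(f"{count} images; use 40, 45, 50, 55 or 60")
    return problems

files = [f"img{i:02d}.jpg" for i in range(45)] + [f"img{i:02d}.txt" for i in range(45)]
print(check_dataset("1_girl", files))  # []
```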

Settings:

  • Make sure you're actually on the LoRA tab
  • Change Source model path to your own
  • Change folders to your own
  • Under Parameters -> there's a VAE option. Link it to the 0.9 VAE (then samples kind of work)
  • The Sample Prompt needs to be updated to match your own:
<trigger word>, <class token>, flavor text --w 1024 --h 1024 --d 2 --l 7 --s 30
shirogane-sama, girl, indoor, glasses --w 1024 --h 1024 --d 2 --l 7 --s 30
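For reference, in kohya's sd-scripts sample prompts, `--w` and `--h` set the width and height, `--d` the seed, `--l` the CFG scale and `--s` the number of sampling steps (`--n` adds a negative prompt). Below is a tiny hypothetical helper that assembles a line in the order used above:

```python
def sample_prompt(trigger, class_token, *flavor, width=1024, height=1024, seed=2, cfg=7, steps=30):
    """Build a '<trigger word>, <class token>, flavor text --w ... --s ...' sample line."""
    text = ", ".join([trigger, class_token, *flavor])
    return f"{text} --w {width} --h {height} --d {seed} --l {cfg} --s {steps}"

print(sample_prompt("shirogane-sama", "girl", "indoor", "glasses"))
# shirogane-sama, girl, indoor, glasses --w 1024 --h 1024 --d 2 --l 7 --s 30
```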
#

Expectations:

  • This should work straight on the first attempt, as long as you follow the guidelines.
  • Epoch 60 should be perfectly cooked. (but do try a few below and above that - just to be sure)
  • Training time: 8min per 10 epochs with no samples. (so about 80 minutes for 100 epochs)

Explanations of Parameters:

Consider this training to be a bit more... aggressive... to put it mildly.
Essentially we're using Dropout caption every n epochs to literally nuke the model with our training info.
We are using Text Encoder training - hence what captions are used is important, as some captions will break the model quickly. Start out with my recommendations, and then slowly expand from there.
All learning rates are set to 0.0005. This slows down training a bit - but that's needed due to what we're doing to the poor sdxl model with Dropout caption every n epochs
Network Rank (Dimension) is set to 24. This is the highest you should need to go. Higher values may seem to give better results, but don't mistake the whole SDXL model getting worse for your LoRA getting better. Essentially, our dropout setting should emulate the effect that a higher dim setting would give. It's not necessarily the best option that exists, but it's certainly not more destructive than using dim 128 or 256.

F.A.Q.:

Q: Can I enable Buckets?
A: Yes! Just make sure each bucket contains a number of images that is a multiple of 5.

Q: Can I use more images? (Like 100!)
A: Yes! It will most likely increase quality, but the epochs that this needs to run will change. Basically just try out your various checkpoints afterwards, and let it run for longer.

Q: Will this run on 16gb vram?
A: Yes! Batch size & caption dropout will need to be adjusted. To what? That requires testing - feel free to try out various combinations and report back.

Q: Will this run on 12gb vram?
A: Most likely not. Text Encoder requires some vram as well - and that will probably push you above 12gb vram 😦
(But you can still train with other settings that don't use clip training)

Q: Why is it taking longer to train?
A: Because we're generating samples. Feel free to turn them off for an almost 100% speed boost.

Q: For captions, can I write "a photo of a man standing inside a room"
A: No. Captions need to be simple words separated by commas. Simple but effective.

Q: Are more captions better?
A: Usually not. There are a lot of words we really don't want to train, so we're keeping it super simple on purpose.

Q: What if I want to train without captions?
A: Then this is the wrong setup - there are many other ways, just that this one relies on a few captions per image

Q: Should I save the training state?
A: Yes! It will let you pick up right where you left off. Meaning you can set training to 60 epochs, finish it in 48mins, and if you're unhappy with your checkpoint, you can just resume training again.

Q: Should I change the Save every N epochs setting?
A: You can change it to something like 10 if you want. But keep in mind that every 5th epoch is a 'big one', since that's the one that runs with dropout.

Q: Why is this using offset noise of 0, instead of 0.0357?
A: This... is a lot more complicated to answer. But in a nutshell, it will make our images less grey in the end.

#

4 training images for context (I used a total of 45, in random cosplays, random positions, random outfits & hair colors, random backgrounds)
I trained her face, and lightly the aesthetic of her images

#

without/with lora (base only - 1 sampler node only)

#

(images were made using random civitai prompt - so I can be impartial in how well the lora works)

hollow spruce
ruby pond
#

Amazing! Thanks @hollow spruce w00t

latent charm
#

Can't wait to try it

ancient pier
#

@hollow spruce wrote "images were made using random civitai prompt"

Or using random prompts supplied/inspired by the audience lol (I recognise the 70s feeling)

hollow spruce
#

also, in case somebody is wondering, this is what dropout even means

Dropout caption every n epochs
Usually, images and captions are learned as a pair, but it's possible to train just on "images without captions" every certain number of epochs.

This option allows you to specify "drop out captions every ○ epochs."

For instance, if you set this to 2, you will conduct image training without captions every 2 epochs (2nd epoch, 4th epoch, 6th epoch...).

By training on images without captions, it is expected that your LoRA will learn a more comprehensive feature set from the images. It can also help prevent the image features from being tied too closely to specific words. However, if you use captions too sparingly, your LoRA could become ineffective at prompts, so be cautious.

The default is 0, and in the case of 0, caption dropout is not performed.
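Below is a small sketch of which epochs end up caption-free, following the 1-based counting in the example above (2nd, 4th, 6th...). With the value of 5 implied by the FAQ's "every 5th epoch is a 'big one'", a 60-epoch run has 12 caption-free epochs. The function is an illustration, not kohya's code.

```python
def caption_free_epochs(total_epochs, every_n):
    """Epochs (1-based) trained without captions; every_n == 0 disables dropout."""
    if every_n <= 0:
        return []
    return [epoch for epoch in range(1, total_epochs + 1) if epoch % every_n == 0]

print(caption_free_epochs(6, 2))        # [2, 4, 6]
print(len(caption_free_epochs(60, 5)))  # 12
```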

Rate of caption dropout
This is similar to the "Dropout caption every n epochs" mentioned above, but during the entire learning process, you can train on "images without captions" for a certain proportion of the time.

Here, you can set the proportion of images without captions. 0 means "always use captions during training," and 1 means "never use captions during training."

Which images will be trained as "images without captions" is determined randomly.

For example, if you train LoRA with 20 images, reading each image 50 times for just 1 epoch, the total number of image learnings is 20 images x 50 times x 1 epoch = 1000 times. If you set the rate of caption dropout to 0.1, 1000 times x 0.1 = 100 times, you will train on "images without captions."

The default is 0, and all images are learned with captions
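Here is the worked example above as code. The expected number of caption-free passes is simply total passes × rate. The actual count varies a little from run to run, because each image is picked at random. The sketch uses Python's `random` module with a fixed seed so the run is repeatable (the function name is hypothetical):

```python
import random

def caption_dropout_counts(num_images, repeats, epochs, rate, seed=0):
    """Return (total passes, expected caption-free passes, simulated caption-free passes)."""
    rng = random.Random(seed)
    total = num_images * repeats * epochs
    simulated = sum(1 for _ in range(total) if rng.random() < rate)
    return total, total * rate, simulated

total, expected, simulated = caption_dropout_counts(20, 50, 1, 0.1)
print(total, round(expected))  # 1000 100
print(simulated)               # close to 100; the exact value depends on the seed
```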

Consider this option going nuclear. It might be great for style loras, but for anything else it's technically the wrong application. It works here since we're only training for a few epochs, and we're fine with the little damage that it does do. It's still a lot less damage than using network rank 256.

covert pagoda
#

@hollow spruce I saw BMaltais write a review of IA3 LyCORIS where he had two dataset image folders: a regular one, 10_busterkeaton man, and 10_buster Keaton hat. Strangely, there seems to be no class name. But I'd be interested to know whether dividing central concepts during dataset preparation benefits training by cleanly separating the concepts https://www.reddit.com/r/StableDiffusion/comments/14low8y/lora_lycoris_ia3_is_amazing_info_in_1st_comment/

stone garden
covert pagoda
#

Thanks for your response

hollow spruce
hollow spruce
#

I've done it a few times so far

#

especially for my big datasets, where I did this

#

it's also how I include regularization images, when I misuse them as training data rather than regularization data

orchid yoke
# hollow spruce especially for my big datasets, where I did this

Curious what your take on a small dataset is, like on the lower end of 10-20 images. It's pretty easy to get, just photo style of the subject. You can pretty much do that with 1 image, or one image with a couple of different crops. So I was about to try your last method with the real images I can collect, plus generated photo images of that subject, which I guess isn't ideal, but... I was trying to think of the best you can do with the least.

hollow spruce
#

also depends on the quality of those images

orchid yoke
#

so it comes down to snappy tagging? (+ good quality images)

hollow spruce
#

keep in mind that with small datasets, you might accidentally end up training something like a jpg-compression artifact lora 🤣 happened to me once when I trained on not-so-high quality images, and since there weren't enough, I accidentally made a lora that added jpg compression 🥲

#

but yeah. 2 high quality images is the lowest that really "works"

orchid yoke
#

Yeah, I appreciate everything. LoRAs were always a side thing I could just throw in before, on 1.5, so now I'm having to learn a bit just to do what used to be quick mockups, but I'm in love with SDXL and I don't want to go back 😄

hollow spruce
#

at 1 image, you're just building a weird controlnet lora to hopefully reproduce the right thing.
I can only recommend getting it good enough to produce a few good images, then reusing those to train a good lora

hollow spruce
#

and can be fixed with a bit of prompting

#

not my thing, since my long term goal is to make a finetune that is trained on less than 50% professional photography
but I can vouch for a lot of loras trained on nothing but synthetic images, and the lora is 😘 chefs kiss!

#

(especially style loras suffer essentially 0 quality loss for synthetic training)

covert pagoda
#

I'm about to move into style training, and my main use case, as I am a photographer by trade, is to train for specific photographic styles, which include studio flash or very natural soft light, or composition and film look. Usually this is done by using "in style of so and so". But as I want to refine the styles into something much more focused and specific, I wondered if you had some basic art style tenets. I suppose image-to-image cohesion is the most important thing, so that the end result in the LoRA is snappy. I wonder if art/fashion photographic styles as concepts are difficult. I shall know soon. Will test with standard LoRA and Prodigy LoKr @hollow spruce

#

Have had some excellent results with LoHa/Prodigy for character...

onyx thicket
#

Has anyone tested the difference between training a lora on the base model vs. a different model whose images you like better? Like dreamshaperXL. Which is a better base for a lora?

covert pagoda
#

Here's an example of character LoHA. Dataset of 20

#

And gen

#

40 repeats, batch 8, 6 epochs

#

And yes, full-body images are hit and miss as far as face accuracy goes, though I suspect ADetailer and inpainting should be standard for full body

hollow spruce
#

what network + Convolution ranks + alphas were you running on?

covert pagoda
#

Pretty much all defaults from the LoHa/Prodigy preset, except precision at bf16/bf16. Dim/alpha 32/16

#

Nothing really changed but I did curate my dataset a ton

#

For the photo styles I guess I could organise image folders under film look, light, and edgy (for stuff that is more complex to categorize). The first two are pretty standard and don't need much captioning. But the third would be more of a flavour thing. I wish it was possible to use yaml files to increase repeats on specific important subfolders recursively, like in EveryDream

hollow spruce
covert pagoda
#

The LyCORIS discord guys discuss it a little, though they're not always forthcoming with config settings. They talk math more often than not lol

covert pagoda
hollow spruce
warm agate
#

how do I get prodigy in the gui?

covert pagoda
warm agate
#

oh ok tq

warm agate
covert pagoda
#

But if I want to generalize from a variety of similar outfits from a same collection, should I fine tune a model rather than do a loha with several outfits and mix them up with a multiplier?

warm agate
#

@hollow spruce what does this mean?

covert pagoda
warm agate
#

oh ok

covert pagoda
hollow spruce
warm agate
#

i've disabled it for now

#

idk why the vram consumption is high even with basic settings

hollow spruce
warm agate
#

which is better, adamw or prodigy?

simple ivy
#

Can anyone offer assistance on the relationship between training images, epochs and steps? In Kohya LoRA training I have 33 images, set to train with batch size 1, no bucketing, learning rate 1... I'd expect an epoch to amount to 33 steps, right? One step for each image? But some other setting is making that 1 epoch amount to 5445 steps. How do I make sense of this?

warm agate
simple ivy
# warm agate steps = (images x repeats) / batch_size x epochs x 2(only if reg images are use...

repeats? I don't see such a setting. And I am not using reg images right now, so that should mean no doubling. I have every setting I can find set to just x1, so I expect 1 epoch should equal my training set: 33. 5,445 is so weirdly huge I can't figure out what is going awry

-edit: I think I found it; I just needed to type some things out and then do some arithmetic. The conflict came from trying to piece together incomplete directions from multiple sources.
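Here is the step arithmetic from this exchange, written out. Samples per epoch are images × repeats, doubled when regularization images are used. That figure is divided by the batch size, then by the gradient accumulation steps, and multiplied by the number of epochs. This is a simplified estimate (the function name is hypothetical). Exact counts in kohya can differ slightly, for example with buckets.

```python
import math

def total_steps(num_images, repeats, batch_size, epochs, grad_accum=1, reg_images=False):
    """Estimate optimizer steps for a kohya-style LoRA run."""
    samples_per_epoch = num_images * repeats * (2 if reg_images else 1)
    batches_per_epoch = math.ceil(samples_per_epoch / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs

print(total_steps(33, 1, 1, 1))   # 33: one step per image when everything is set to 1
print(total_steps(20, 40, 8, 6))  # 600: 20 images, 40 repeats, batch 8, 6 epochs
```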

hollow spruce
warm agate
#

i'll train it once this gets completed

hollow spruce
warm agate
#

tq

warm agate
#

@hollow spruce do you suggest loha or lora for characters?

warm agate
hollow spruce
hollow spruce
warm agate
hollow spruce
# warm agate oh ok but 100x160

🤷‍♂️ might take fewer epochs, might take more. My preset was designed for 50 images. You'll have to try and find out

warm agate
#

oh ok

#

will use 100 and try

cyan harbor
#

Hi guys. I was working with Stable Diffusion A1111 without a problem today, but since I copied the 7 GB version of the ckpt file, it stopped working with the SDXL refiner. Even after deleting it, when I push the Generate button it gives me a black image 😔

tidal silo
#

Does anyone have any good information or resources on instance & class tokens for training SDXL? I have seen things such as "ohwx" recommended because it's unique & 1 token, but I've also seen people say that with SDXL you should just use the name of the person or something else? Really trying to understand this better.

tidal silo
stone garden
#

It helps if you want to re-use the images later.

stone garden
# warm agate which is better adamw or prodigy?

I’ve found that with Prodigy min_snr_gamma has a significant effect. Set it low for simple Loras (like characters) and high for complicated Loras (style, for example). But outside of that, it’s really been the only parameter I needed to adjust for good results.
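For context on what `min_snr_gamma` does: it is kohya's option for Min-SNR loss weighting, where each timestep's loss is scaled by min(SNR, γ) / SNR (the form used for the usual noise-prediction loss). A low γ down-weights the low-noise, fine-detail timesteps more strongly. Whether that explains "low for characters, high for styles" is the empirical observation above, not something this sketch proves. The schedule below assumes SD's "scaled_linear" betas (0.00085 to 0.012 over 1000 steps) and uses plain Python only; the function names are hypothetical.

```python
import math

def sd_alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative alpha products for a 'scaled_linear' schedule (linear in sqrt(beta))."""
    start, end = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas, prod = [], 1.0
    for i in range(num_steps):
        beta = (start + (end - start) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        alphas.append(prod)
    return alphas

def min_snr_weight(alpha_bar, gamma):
    """Min-SNR weight for one timestep: min(SNR, gamma) / SNR."""
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, gamma) / snr

alphas = sd_alphas_cumprod()
for t in (0, 250, 500, 999):
    print(t, round(min_snr_weight(alphas[t], 5.0), 4))
```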

#

Don’t know about 100% of the way though. Never went that far.

#

And my dataset is pretty shit to say the least. The captions are good, the images are not edited in any way. Just selected/pruned.

latent charm
#

@hollow spruce

#

I also get this effect using your preset. Is it related to the dataset? How do you resolve it?

hollow spruce
# latent charm <@211089689652887552>

the samples are always half-working. same for me.
it's good enough to roughly know whats going on - but actually testing the checkpoints will give proper results

latent charm
#

It has the effect in real run.🤣

hollow spruce
signal warren
#

Maybe you have that setting which ends the steps early?

placid stag
#

Are there no Chinese people here?

south jungle
#

There are Americans.

signal warren
#

If someone wanted to make a SDXL finetune with many concepts and ideas all in one, but can't because they only have 12gb vram, would making a huge lora or multiple loras then merging it with the SDXL model be any good?

stone garden
restive bridge
#

just realized SEcourses is training the text encoder with only an instance token and no captions in his video, and the results look good🤔 trying it for myself, except his params for 16 imgs take 3 fkn hours

hollow spruce
#

text encoder for single words work just fine - but if you pick the wrong one, uff does it go bad quickly XD

#

so only use words you've tested and know are ok

restive bridge
#

random string?

#

i.e. "2jF7DT98L"

hollow spruce
#

in your particular case, I'd rather figure out the perfect parameters for training on the 4 main class words for people though

#

boy/man/girl/woman

#

cause once you got those down, you can train them much easier

restive bridge
hollow spruce
#

did a lot of tests the last two days, and it should work

#

I'm starting to get good results with less and less images

#

(much trial and error though)

restive bridge
hollow spruce
#

that will work if you can get your training to finish in under 100 epochs (1 repeat)

#

so basically play with the learning rate until 50 epochs looks 'perfect'

#

that way you also train much faster obviously

#

I've even pulled off a somewhat successful training on a single image using that method

restive bridge
hollow spruce
restive bridge
#

ok I'll give it a shot thank you

fervent sandal
#

Hey all. I've been interested in training a lora, but my Linux setup means I get a crash with the kohya_ss GUI. Can anyone say if they've had success just using sd-scripts alone (on Linux)?

hollow spruce
latent charm
#

@hollow spruce Your preset is good, but I think the likeness isn't strong enough. My captions use natural language like this: {rare_token}, a {subject}..... Should I reduce the tokens in the caption? I think the result of this attempt is about 70-80% of the original, using the default epochs with 60 images. If I extend the epochs, might that increase the likeness?

fervent sandal
hollow spruce
#

and you're using 50 images? or less?

#

I should probably stop using epochs as a reference. Essentially, 3000 steps (i.e. 600 steps at batch 5) is where my loras turn out ideal.
but the full range is 1500 steps to 4500 steps (so that would be epoch 30~90 with 50 images)
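Here is the conversion behind those numbers, assuming "steps" here counts images seen. At batch 5, 3000 / 5 = 600 optimizer steps. With 50 images at 1 repeat, the 1500~4500 range lands on epochs 30~90, and the same 3000 budget for a 60-image dataset lands around epoch 50. A hypothetical helper:

```python
def epochs_for_budget(image_steps, num_images, repeats=1):
    """How many epochs it takes to show the model `image_steps` images."""
    return image_steps / (num_images * repeats)

for budget in (1500, 3000, 4500):
    # epochs at 50 images / 1 repeat, and optimizer steps at batch 5
    print(budget, epochs_for_budget(budget, 50), budget // 5)
```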

latent charm
#

using 60 images

hollow spruce
#

it's why I initially suggested 100 epochs - just to be sure

hollow spruce
latent charm
#

I would try the 4 token caption way

hollow spruce
#

basically you write a word for everything that actually changes between the images in your dataset

latent charm
#

how detailed should these 4 parts be?

hollow spruce
#

"glasses" isn't needed if the subject wears glasses in all images.
but if they only wear it in half the images, then its important to tag it

latent charm
#

like clothing, white t-shirt and blue jeans or t-shirt and jeans?

hollow spruce
#

to give an example

hollow spruce
latent charm
#

I cropped all of them to 1024 for testing
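If you crop to squares programmatically, the box math is simple. This sketch computes a centered square box in the (left, upper, right, lower) order that Pillow's `Image.crop` expects; you would then resize to 1024x1024. A centered crop can cut off part of a face, so for face datasets a manual crop that keeps the face centered is often safer. The function name is hypothetical.

```python
def center_square_box(width, height):
    """Largest centered square inside a width x height image, as (left, upper, right, lower)."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side

print(center_square_box(1500, 1000))  # (250, 0, 1250, 1000)
# With Pillow (not run here): img.crop(center_square_box(*img.size)).resize((1024, 1024))
```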

ancient pier
#

@hollow spruce L to R
No LORA
SDXL Offset Example at 0.65
Your LORA at 0.65

Nice 🙂

sonic narwhal
ancient pier
restive bridge
#

@hollow spruce would i not see any benefit from adding regularization to your method? considering I don't have captions, and many images have matching clothing, and training set is usually 12-25 imgs, and backgrounds are all white

#

if i can afford the time

hollow spruce
restive bridge
hollow spruce
# restive bridge have you experimented with batch 1? I keep hearing quality is best one image at ...

batch size impacts the learning rate & in the case of clip training, how the actual clip training works
so it's more of a case of people using presets designed for batch 1 - then having worse experiences after using a higher batch size - which makes total sense.

higher batch sizes can deliver better results, but that doesn't mean that training gets automatically better with high batch sizes.
Basically, one setting will never fit all vram options.
(but you can use batch 1 + Gradient accumulation 5 to get extremely close to batch 5) <- so that helps when you already have a good workflow that is designed for a higher-end card
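Batch 1 + gradient accumulation 5 lands close to batch 5 for a simple reason. With a mean loss, averaging the gradients of five one-sample micro-batches gives the same gradient as one batch of five, and the optimizer only steps once per five micro-batches. Here is a toy pure-Python check on a one-parameter linear model (nothing here is kohya code):

```python
def grad_mse(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch):
    """Average the gradients of consecutive micro-batches, as gradient accumulation does."""
    assert len(xs) % micro_batch == 0, "batch must split evenly into micro-batches"
    n_micro = len(xs) // micro_batch
    total = 0.0
    for i in range(n_micro):
        chunk = slice(i * micro_batch, (i + 1) * micro_batch)
        total += grad_mse(w, xs[chunk], ys[chunk]) / n_micro
    return total

xs, ys = [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 8.1, 9.8]
print(grad_mse(0.5, xs, ys), accumulated_grad(0.5, xs, ys, micro_batch=1))
```

In real training, the match is close rather than exact (floating-point summation order alone causes tiny differences), which fits the "extremely close" wording above.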

restive bridge
#

@hollow spruce "man" token worked surprisingly well, but not even 100 epochs was enough, so I may extend it. Artifacts on clothing are WAY fewer than before with celeb and reg, but detail is lower for sure. Could I raise dim just a bit?

hollow spruce
restive bridge
hollow spruce
covert pagoda
# hollow spruce interesting. then it's probably that you have 25 images, instead of my usually t...

What about gradient accumulation on batch 10 on a 80gig A100 card. Is there any benefit to update weights less frequently if using a high batch size with a larger gpu? I'm running right now at batch 4 with excellent results but at 3 hours training. I'd love to bring time down. Would higher batch size plus grad accumulation combo = faster iterations (higher batch) + slower updates (less frequent updates)? What is the correct calculation to predict gains?

hollow spruce
# covert pagoda What about gradient accumulation on batch 10 on a 80gig A100 card. Is there any ...

with an A100 you can do cool things.
I'd probably try a dataset of 50~100 images, and fitting the whole thing into 1 single checkpoint
(so like batch 50 with GA of 2, for exactly 100 images) <- will take a while to get all the settings right, but then you can pull off really cool things

also, you can train at 1536x1536 <- captures more detail, but obviously your images need to be high quality enough, to not accidentally train image noise found in low to mid quality jpgs

versed crescent
#

How does a model like SDXL cope with mixed resolutions? Could a LoRA encode higher resolution information than the base model?

covert pagoda
hollow spruce
#

still a lora.
also, you can do finetune style loras with that gpu (training on 30k images)
which brings similar results to full finetune on SD1.5
^ that's what I'm currently doing, where my rtx4090 can barely keep up, by running for 20~30 hours

restive bridge
#

why would extending epochs make likeness drift further and further 🤔 i would expect it to start overfitting but it's more like it starts un-learning entirely.

signal warren
stone garden
stone garden
versed crescent
#

Ok, so to train within 16GB, one can follow Caith's writeup, drop Batch Size down to 1, and raise Gradient accumulate steps up to 5 under Advanced Config? Would there be any mileage in changing the Optimiser too? It's currently set to AdamW, and am I right in thinking that AdamW does nothing fancy in terms of memory or lower-bit quantisation?

stone garden
sonic narwhal
#

Does anyone have a dataset of regularisation images for faces that they can share?

sonic narwhal
#

Why does Caith not use bf16 for both mixed and saved precision?

fervent sandal
stone garden
#

(For me, system Python was 3.10 but python3-tk installed a version for 3.8 for some reason.)

fervent sandal
# stone garden Which python version are you using? Try installing `apt install python3.10-tk` (...

Looks like others are having the same issue (https://github.com/bmaltais/kohya_ss/issues/873 and others). Using a python that was not installed by the system package manager is going to cause all sorts of issues to be raised.


hollow spruce
#

(also, when I tried it - I saw no difference that I could actually notice, so I just left it on what was recommended to me)

#

might have been a fringe case though - so if you notice a difference, please do share ❤️

stone garden
hollow spruce
stone garden
#

No disagreement there. 🙂 Just saying for those who really want to.

fervent sandal
stone garden
fervent sandal
stone garden
#

Wait, you're mixing apt and yum on the same distro? One comes from Fedora/RHEL and the other from Ubuntu. How's that even possible?

fervent sandal
stone garden
#

Oh -- forgot to say. You need both system and venv installations for tk to work.

fervent sandal
stone garden
stone garden
fervent sandal
# stone garden Do you have the option to use a different AMI? That one seems effed up.

Well this is the one Amazon have optimised for GPUs with python and the frameworks. But I'm guessing as it is a headless environment they won't have bothered with a GUI library like Tk. Docs: https://docs.aws.amazon.com/dlami/latest/devguide/what-is-dlami.html

stone garden
covert pagoda
#

does anybody know if it's possible to resume training on a kohya lora even if i didn't tick save training state on the last run?

#

also, what is this path for: LoRA network weights?

digital dew
hollow spruce
# digital dew Hello, I've been wondering if I could replicate something like this for other ar...

the proper way?
get enough dataset images with a high enough quality (100~500)

the easy way?
Use that lora (or any other lora) + highres fix to generate around 100 images. Then mess with sdxl until you also get about 100 good enough images.
then train on those 200 images. (this is called synthetic training)

the long way?
find whatever high quality images you can on the internet
then use those to train a v1 of your lora.
Then generate 1000 images using that lora, pick your 100~200 favorites.
Combine those with the original dataset -> then train the mix of original + synthetic images to make a really good v2 of your lora.

digital dew
#

When you say high quality, you mean high resolution?

#

I'm thinking proper captioning is also critical? What should I be thinking about when I caption images?

hollow spruce
hollow spruce
#

if that doesn't work - then you should worry about captioning

digital dew
hollow spruce
#

that gets loaded into kohya automatically

digital dew
#

Ah, so I can train with the images alone? I was thinking captions are also required.

#

Sorry, I'm quite the noob at this. 😅

hollow spruce
hollow spruce
warm agate
#

@hollow spruce trained my lora

#

testing it

wind compass
#

Hey everyone, I want to train an inpainting model on traditional dress. Currently working on scraping the dataset; hoping to use 30k-50k images. Which model should I use? And most importantly, how should I get captions for each image (CLIP, BLIP, or manually...) to help get good training results? Last time, I tried training a simple model on SD2.1 using manual captions on 30k images with plain backgrounds. The trained model was not good. Can anyone help?

versed crescent
#

I wonder if flipping face images horizontally would be good for increasing the size of a small dataset

restive bridge
latent charm
#

@hollow spruce Using your preset, same scheduler, same learning rate, same optimizer. 100 epoch, 200 epoch, 250 epoch, 275 epoch, 300 epoch.

latent charm
#

I think around 275 is the sweet point for this type of training. 300 is a little bit overfit.

slim plaza
#

Hello, how is the stable-diffusion-inpainting model trained? And why do we need a separate checkpoint if an SDEdit-based method (which does not require training) can be used?

hollow spruce
#

good results! ❤️

latent charm
hollow spruce
latent charm
#

loss was around 0.15-0.13 for the first few epochs. After that, mostly around 0.13-0.1. At the very end, it became 0.11-0.08.
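On reading these loss numbers: in diffusion training, each step's loss depends heavily on the randomly drawn timestep and noise. The raw value therefore bounces around, and a plateau doesn't necessarily mean nothing is being learned. A smoothed curve is easier to read. Here is a minimal exponential-moving-average sketch in plain Python (beta = 0.9 is an arbitrary choice):

```python
def ema(values, beta=0.9):
    """Exponential moving average; higher beta means smoother (and laggier)."""
    smoothed, avg = [], None
    for v in values:
        avg = v if avg is None else beta * avg + (1 - beta) * v
        smoothed.append(avg)
    return smoothed

noisy = [0.15, 0.10] * 50     # a loss that bounces between two values
smooth = ema(noisy)
print(round(smooth[-1], 3))   # settles near the 0.125 midpoint
```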

latent charm
hollow spruce
latent charm
#

Thanks for advice

latent charm
hollow spruce
latent charm
#

Ok. 👌

hollow spruce
#

but that is specifically for annealing

#

@latent charm just did a quick check - but the preset for prodigy seems to have everything set correctly - including the special optimizer settings, so probably use that if you really want to use prodigy

#

(it was updated a few days ago - so make sure your install is up to date)

latent charm
#

Just updated yesterday. It should be fine.

versed crescent
latent charm
#

I'm just sharing what I saw in my training using @hollow spruce's face preset for 300 epochs with 40 images

versed crescent
#

I'm in epoch 194 and loss is around 0.12, so it looks similar (but I'm using Caith's original settings with Adafactor)

versed crescent
latent charm
#

Even though the loss didn't come down, the model still learned.

#

You can see the difference in the conversation above

versed crescent
#

@hollow spruce Regarding regularisation images, are there basically two choices in training, captions or regularisation ? I saw your comment somewhere above that the current LoRA tutorials on youtube which use regularisation images have a weaker ability to create detailed backgrounds. Is that the weakness behind the regularisation approach?

hollow spruce
#

but reg images are always good for face training - they just change your learning rate & the length of your training to some extent

#

but quality should be better - at the expense of training longer

versed crescent
#

Aaah ok I misunderstood

#

I'm trying your above approach, but I had to reduce the batch size to 1 to fit my 16GB card. I've also changed to Adafactor for the optimiser. I think it's working, but my training images aren't as varied as yours, so many of my captions are the same, so I'm unsure of how successful it will be. All part of learning 😄

hollow spruce
versed crescent
#

Yeah but I don't cosplay enough, clearly. Need some more variety in my life 😄

versed crescent
#

Interesting. The sample prompt I'm using for every 5 epochs has created an image that's much more like the training set, rather than a slowly morphing face. I wonder if this is the start of overfitting

lethal oracle
#

i want to make ai images of my friend's face. which method should i use: textual-inversion, lora, dreambooth, kohya_ss?

stone garden
hollow spruce
#

@urban halo

stiff dust
hardy storm
#

I have a question about the concept of using a celebrity's name for training instead of the generic "ukj", etc...
I recently tried doing this with a friend of mine. I used "Tom Hanks" because my friend kind of, sort of, a little bit resembles him. I trained both a Checkpoint and a Lora. It worked pretty well for the Checkpoint... however, with the Lora (and I assumed this would happen), when I use the Lora with a different checkpoint (say, Photon), it creates a person who looks like a radiated blend of my friend and the actual Tom Hanks.
Am I correct in assuming that this concept does not work for LoRAs?

covert pagoda
#

small hiccup with training: i am having trouble resuming from the last saved state. The error message says the Kohya script could not locate the state folder... any ideas what could be causing this minor error? I am using / in the path.

covert pagoda
sour eagle
#

is it possible to merge checkpoints in kohya or is a1111 the only way?

latent charm
#

@hollow spruce After 5 hours with 300 epochs, it didn't learn enough.

#

Always bouncing between 1.0~1.4, sometimes 1.6~1.7

hollow spruce
#

but really really odd :/

#

cause it should learn everything, even with standard cosine

#

(even if not as aggressive)

latent charm
#

300 epoch. Left is cosine prodigy. Right is constant with warmup adam.

#

prodigy with cosine didn't learn much at all.

#

No lora

#

Maybe it's because I didn't set the Optimizer extra arguments "decouple=True weight_decay=0.5 betas=0.9,0.99 use_bias_correction=False"
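Those extra arguments are plain key=value pairs that end up as keyword arguments to the optimizer. `decouple`, `weight_decay`, `betas`, `use_bias_correction`, `safeguard_warmup` and `d_coef` are all parameters of the `Prodigy` class in the prodigyopt package. Below is a minimal sketch of that kind of parsing (my own illustration, not kohya's code), handy for checking a string before pasting it into the GUI:

```python
import ast

def parse_optimizer_args(arg_string):
    """Turn 'key=value key=value' text into keyword arguments.

    Values are read as Python literals, so 'True' -> True, '0.5' -> 0.5
    and '0.9,0.99' -> (0.9, 0.99).
    """
    kwargs = {}
    for item in arg_string.split():
        key, value = item.split("=", 1)
        kwargs[key] = ast.literal_eval(value)
    return kwargs

args = parse_optimizer_args("decouple=True weight_decay=0.5 betas=0.9,0.99 use_bias_correction=False")
print(args)
# {'decouple': True, 'weight_decay': 0.5, 'betas': (0.9, 0.99), 'use_bias_correction': False}
# With prodigyopt installed (not run here): Prodigy(params, lr=1.0, **args)
```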

livid rapids
#

I'm just getting started training loras on 1.5, using kohya_ss. I looked around for guides but they all seem to be "just copy my example for this 16 image sample of a character! quick and easy!". Is there a more in depth guide that explains all the options/parameters and what they actually do? Like, what should I be thinking about when setting a learning rate? What are the 20 different optimizers and why should I use one over the other? etc.

Also training for a concept rather than character with around 200 images, so the endless tutorials with 16-20 images probably use settings that I should change. I just don't have the info on what and why to change, though.

crimson tundra
#

Is there a way to change a body part shape and size using inpaint?

hollow spruce
hollow spruce
pliant drift
# latent charm https://github.com/konstmish/prodigy/issues/3 refer to this issue. Seems prodigy...

yeh. while i've been getting good results with prodigy, i've noticed it could be better when i look at the lr graphs. I found this https://github.com/kohya-ss/sd-scripts/pull/271


Add argument --lr_scheduler_type and --lr_scheduler_args to use lr_scheduler from another library

For example, to use torch.optim.lr_scheduler.CosineAnnealingLR with T_max=100 as lr_scheduler, we ...

pliant drift
#

these options just showed up in the gui recently. we can add cosine annealing now

#

hmm nevermind, maybe not. the annealing is separate and can't be used. i'll have to craft a command

#

seems like a pita. i'm going to go back to adamw

livid rapids
#
  num train images * repeats: 2770
  num reg images: 0
  num batches per epoch: 1385
  num epochs: 4
  batch size per device: 2
  gradient accumulation steps = 1
  total optimization steps: 5540

It's estimating 4 hours at 2.8 it/s for this LoRA: 270 images at 10 repeats each, 4 epochs, batch size of 2. I'm running on a 12 GB VRAM RTX 3060. Are these normal speeds?
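The log numbers check out as a chain: 2770 image-repeats / batch size 2 = 1385 batches per epoch, × 4 epochs = 5540 steps (2770 at 10 repeats also implies 277 images rather than 270). A 4-hour estimate matches 2.8 seconds per iteration rather than 2.8 it/s; as far as I know, tqdm-style progress bars switch from it/s to s/it once a step takes longer than a second, so the unit is easy to misread. A minimal sketch of the arithmetic, assuming one optimizer step per batch and ignoring aspect-ratio buckets (which can add a few partial batches per epoch):

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps for a run, assuming one step per batch and no bucketing."""
    samples_per_epoch = num_images * repeats
    batches_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return batches_per_epoch * epochs


def eta_hours(steps: int, rate: float, unit: str) -> float:
    """Convert a progress-bar rate into an estimated run time in hours."""
    if unit == "s/it":
        seconds = steps * rate
    elif unit == "it/s":
        seconds = steps / rate
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return seconds / 3600


steps = total_steps(num_images=277, repeats=10, epochs=4, batch_size=2)
print(steps)                                    # 5540
print(round(eta_hours(steps, 2.8, "s/it"), 1))  # 4.3
print(round(eta_hours(steps, 2.8, "it/s"), 1))  # 0.5
```

So if the bar really reads 2.8 it/s, the run should take about half an hour; a 4-hour estimate points to 2.8 s/it.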

restive bridge
#

dim 24 vs. dim 256, from Aitrepreneur's new video
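For context on what "dim" changes: it is the LoRA rank r. Each adapted linear layer of shape d_out × d_in gets two small matrices, A (r × d_in) and B (d_out × r), so its trainable parameter count grows linearly with r. A quick sketch; the 1280×1280 layer is an illustrative size, not a claim about which layers the video trained:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights LoRA adds to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


# Illustrative layer size only; real models adapt many layers of different shapes.
for dim in (24, 256):
    print(dim, lora_params(d_in=1280, d_out=1280, rank=dim))
```

Because the count is linear in r for every adapted layer, going from dim 24 to dim 256 makes the adapter (and, roughly, the file) about 256/24 ≈ 10.7 times larger.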

stone garden
tribal frigate
#

Guys, do you have tips for tutorials on how to train your own embeddings for SDXL using Google Colab?

covert pagoda
#

If anyone's interested

tribal frigate
#

Also... is a LoRA tied to a specific model, or is it like an embedding that you can use on top of a model?
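For reference, this is the standard LoRA update, in which a trained low-rank product is added on top of a frozen base weight W_0. Here r is the rank (the "dim" in kohya's GUI) and α is network_alpha; kohya scales the delta by α/r:

```latex
W' = W_0 + \frac{\alpha}{r} B A, \qquad
B \in \mathbb{R}^{d_{\mathrm{out}} \times r}, \quad
A \in \mathbb{R}^{r \times d_{\mathrm{in}}}
```

Since a LoRA file stores only A and B, it can be applied on top of any checkpoint with the same architecture (SD1.5-family LoRAs on SD1.5-based models, SDXL LoRAs on SDXL). But A and B were optimized against one particular W_0, so results tend to be best on the base model it was trained on and its close derivatives.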

covert pagoda
stone garden
open merlin
#

How can I make the learning rate decay across the restarts when using cosine with restarts? Every restart returns to the initial learning rate, though it would be better if each cycle's peak were a little lower than the one before.
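One way to get this behavior is to multiply each cycle's peak by a decay factor. As far as I know this isn't a built-in option in the kohya GUI; the sketch below is a hypothetical helper (cosine_restarts_with_decay, cycle_steps, gamma and min_ratio are my names) that returns an LR multiplier per step, which is the form torch.optim.lr_scheduler.LambdaLR expects:

```python
import math


def cosine_restarts_with_decay(step: int, cycle_steps: int, gamma: float = 0.5,
                               min_ratio: float = 0.0) -> float:
    """LR multiplier for cosine annealing with restarts and a decaying peak.

    The schedule restarts every `cycle_steps` steps; cycle k starts at gamma**k
    (so 1.0, gamma, gamma**2, ...) and anneals down toward `min_ratio`.
    """
    cycle, pos = divmod(step, cycle_steps)
    peak = gamma ** cycle
    cosine = 0.5 * (1.0 + math.cos(math.pi * pos / cycle_steps))
    return min_ratio + (peak - min_ratio) * cosine


# Peaks of the first three cycles with cycle_steps=100 and gamma=0.5: 1.0, 0.5, 0.25
for step in (0, 100, 200):
    print(step, cosine_restarts_with_decay(step, cycle_steps=100))
```

In plain PyTorch this could be wired up as LambdaLR(optimizer, lr_lambda=lambda s: cosine_restarts_with_decay(s, cycle_steps=500, gamma=0.7)), with those numbers chosen only for illustration. Using it from sd-scripts would likely need a small code change, since a lambda can't be passed through a command-line string.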

hollow spruce
latent charm
#

@hollow spruce I tried Prodigy with the extra parameters suggested in https://rentry.org/59xed3. They work better than the default parameters in the GUI preset.

#

Prodigy, 300 epochs, cosine scheduler
decouple=True weight_decay=0.5 betas=0.9,0.99 use_bias_correction=False
Epochs shown: 100, 200, 250, 275, 300

#

Prodigy, 300 epochs, constant scheduler
decouple=True weight_decay=0.01 d_coef=2 use_bias_correction=True safeguard_warmup=False betas=0.9,0.99
Epochs shown: 100, 200, 250, 275, 300
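To compare the two Prodigy runs above, it helps to see what the extra-argument strings turn into. This sketch assumes each space-separated key=value pair is passed to the optimizer's constructor with its value parsed as a Python literal, which is my understanding of how sd-scripts handles --optimizer_args; treat that as an assumption. The keys themselves (decouple, weight_decay, betas, use_bias_correction, safeguard_warmup, d_coef) are constructor arguments of the prodigyopt Prodigy optimizer.

```python
import ast


def parse_optimizer_args(arg_string: str) -> dict:
    """Turn 'key=value key=value' into keyword arguments.

    Values are parsed as Python literals, so 'True' -> True, '0.5' -> 0.5
    and '0.9,0.99' -> (0.9, 0.99).
    """
    kwargs = {}
    for token in arg_string.split():
        key, value = token.split("=", 1)
        kwargs[key] = ast.literal_eval(value)
    return kwargs


cosine_run = parse_optimizer_args(
    "decouple=True weight_decay=0.5 betas=0.9,0.99 use_bias_correction=False")
constant_run = parse_optimizer_args(
    "decouple=True weight_decay=0.01 d_coef=2 use_bias_correction=True "
    "safeguard_warmup=False betas=0.9,0.99")

# Print only the settings that differ between the two runs (None = not set).
for key in sorted(cosine_run.keys() | constant_run.keys()):
    if cosine_run.get(key) != constant_run.get(key):
        print(key, cosine_run.get(key), constant_run.get(key))

# With prodigyopt installed, a dict like this would be passed as e.g.
#   prodigyopt.Prodigy(model.parameters(), lr=1.0, **constant_run)
```

The diff makes it clear that the runs differ in more than the scheduler: weight decay (0.5 vs 0.01), bias correction (off vs on), and d_coef / safeguard_warmup set only in the constant run.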

#

But overall, I still think constant Adam produces the best results over 300 epochs.

covert pagoda
covert pagoda
latent charm
hollow spruce
covert pagoda
#

Is it worth turning off TE (text encoder) training in the args?

covert pagoda
hollow spruce
#

1e-3 is kinda the best-'value' learning rate I've found: the best performance-to-training-time ratio.
5e-4 gave the highest quality I've achieved, but yeah, for some subjects it needs to run until around epoch 250~350. :/

My next test will be to do a small full fine-tune and see whether training speeds change on that fine-tuned SDXL model, where only the faces are easier to train.

#

not sure if that will work out, but we'll see

covert pagoda
hollow spruce
covert pagoda
#

Batch 6?

latent charm
#

I used batch 5

hollow spruce
#

Repeats are only relevant if you have unevenly weighted datasets, or if you're training a small number of images of a specific face and need a lot of regularization images to run alongside them.

#

But for all other situations, repeat 1 is always the go-to method, since it gives you more control over everything.
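A minimal sketch of the "unevenly weighted datasets" case, assuming kohya-style folders named <repeats>_<name> (like 40_lora): pick each folder's repeats so that small folders contribute about as many samples per epoch as the largest one. balance_repeats is a hypothetical helper, not part of sd-scripts:

```python
from typing import Dict, Optional


def balance_repeats(folder_sizes: Dict[str, int],
                    target: Optional[int] = None) -> Dict[str, int]:
    """Choose repeats so every folder yields roughly `target` samples per epoch.

    `target` defaults to the size of the largest folder, so that folder keeps repeats=1.
    """
    if target is None:
        target = max(folder_sizes.values())
    return {name: max(1, round(target / size)) for name, size in folder_sizes.items()}


# Hypothetical character dataset: 10 face, 10 mid-shot and 3 full-body images.
repeats = balance_repeats({"face": 10, "midshot": 10, "fullbody": 3})
# Prints the resulting folder names, one per line: 1_face, 1_midshot, 3_fullbody
for name, r in repeats.items():
    print(f"{r}_{name}")
```

When every folder already has a similar number of images, this collapses to repeats 1 everywhere, and epochs alone control training length.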

latent charm
#

constant adam

covert pagoda
#

Here are mine: Prodigy constant, then with annealing: --network_train_unet_only --lr_scheduler_type "CosineAnnealingLR" --lr_scheduler_args "T_max=25" and weight_decay=0.5 d_coef=2 use_bias_correction=True
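Two things worth checking with a command like this. PyTorch's CosineAnnealingLR names its argument T_max (with an underscore), alongside eta_min. And its documented closed form is periodic: the LR reaches eta_min after T_max scheduler steps, then climbs back to the starting value over the next T_max. If the scheduler is stepped once per optimizer step (my assumption for how sd-scripts drives it), T_max=25 means 25 steps, not 25 epochs, so a long run would swing up and down many times. A sketch of that curve:

```python
import math


def cosine_annealing_lr(step: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Documented closed form of torch.optim.lr_scheduler.CosineAnnealingLR."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


# base_lr=1.0 mirrors the usual Prodigy setting, where the scheduler output
# multiplies Prodigy's own adaptive step-size estimate.
for step in (0, 12, 25, 37, 50):
    print(step, round(cosine_annealing_lr(step, base_lr=1.0, t_max=25), 3))
```

If the intent is a single decay over the whole run, T_max would need to be on the order of the total step count instead.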

#

a little higher on the weight decay

#

Constant (green) was good. The adapt scheduler went a bit bonkers and didn't learn, probably because my weight decay was too high.

covert pagoda
#

But in my case, I'm working on a character: 23 images at the moment (might go up to 50).

#

10 faces, 10 mid-shots, 3 full-body shots.

latent charm
covert pagoda
#

And I use regularization images of photographs to imbue a photo style... Prodigy constant.

#

dataset:

covert pagoda
hollow spruce
# covert pagoda ok, i didnt know that.

Basically there's no downside to just increasing epochs, and you gain the advantage of settings like 'dropout every n epochs' or cosine with restarts, with a lot more control.
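The reason there's no downside: for the same data, repeats and epochs multiply into the same number of steps (up to rounding of the last partial batch), but more epochs means more epoch boundaries for per-epoch settings. I believe sd-scripts has flags such as --save_every_n_epochs and --caption_dropout_every_n_epochs that work at that granularity. A quick check with a hypothetical 40-image dataset at batch size 8:

```python
import math


def steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps, assuming one step per batch and no aspect-ratio bucketing."""
    return math.ceil(num_images * repeats / batch_size) * epochs


few_long_epochs = steps(40, repeats=30, epochs=1, batch_size=8)   # 1 epoch boundary
many_short_epochs = steps(40, repeats=1, epochs=30, batch_size=8)  # 30 epoch boundaries
print(few_long_epochs, many_short_epochs)
```

Both come out to 150 steps; the second layout just gives 30 points at which per-epoch saving or caption dropout can kick in instead of one.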

latent charm
hollow spruce
covert pagoda
latent charm
#

I didn't use wandb for the testing. Although I have an account, I'm a little too lazy to set it up.

covert pagoda
#

Can you give an example set of concepts in one of your trainings?

#

I wonder if, for instance, for the Asian model above, the concepts could be:

#
  • a certain hair style