stiff dust Sep 6, 2023, 10:53 AM

#

it's not about noise, but about how to weight different timesteps in the diffusion model

#

SD tries to predict the noise in your image. The less noise there is, the harder the task and the higher the loss. With min_snr_gamma you stabilize this a bit by making sure that timesteps with a very low amount of noise are not overweighted during training
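
A minimal sketch of the weighting described above, assuming the usual Min-SNR formulation for noise prediction, where each timestep's loss is multiplied by min(SNR, gamma) / SNR. The function name and the example `alpha_bar` values are illustrative only and are not taken from any trainer.

```python
def min_snr_weight(alpha_bar, gamma=5.0):
    """Loss weight for one timestep under Min-SNR weighting (noise prediction).

    alpha_bar is the cumulative signal fraction at that timestep:
    close to 1.0 means almost no noise, close to 0.0 means almost pure noise.
    """
    snr = alpha_bar / (1.0 - alpha_bar)  # signal-to-noise ratio
    return min(snr, gamma) / snr         # clamp the weight for high-SNR steps


# Illustrative values: a nearly clean image versus a half-noised one.
low_noise_weight = min_snr_weight(0.999)  # SNR is about 999, so the weight is tiny
mid_noise_weight = min_snr_weight(0.5)    # SNR is 1, below gamma, so the weight stays 1
```

With gamma=5, every timestep whose SNR is above 5 is scaled down, which is why low-noise steps stop dominating the loss.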

normal ember Sep 6, 2023, 10:55 AM

#

I think I'll try to remove clip_skip and add min_snr_gamma=5 first and keep steps 3000 and lr 0.0004

stiff dust Sep 6, 2023, 10:56 AM

#

I would say, intuitively, it is more about: do you want to train on high details or on overall composition. min_snr_gamma should be used when you train on composition. You should not use it if you train on details. For example, when you train on faces, you don't want min_snr_gamma, because tiny details like skin structure are super important for you

normal ember Sep 6, 2023, 10:56 AM

#

Yeah, then I think your recommendation is spot on.

#

I've seen examples when they have set gradient_accumulation_steps to 4. It seems to affect learning rate too? I've gone with the defaults of 1.

stiff dust Sep 6, 2023, 10:58 AM

#

nah, wouldn't use that

#

make your batch size as large as possible, that's more efficient

normal ember Sep 6, 2023, 10:59 AM

#

I could go 8 too, does that affect the learning in any way other than it's faster?

stiff dust Sep 6, 2023, 10:59 AM

#

dunno. I would always go as high as possible.
There are other people saying lower batch size is better. I don't believe that. But I never did experiments on that.

normal ember Sep 6, 2023, 11:00 AM

#

That's why it's 4. 😄

stiff dust Sep 6, 2023, 11:00 AM

#

it's not faster, though. High batch size makes it rather slower

normal ember Sep 6, 2023, 11:00 AM

#

1 is slow

stiff dust Sep 6, 2023, 11:00 AM

#

it's only faster if you also increase learning rate. I usually use batch size 10 but still a low learning rate of 4e-4

stiff dust Sep 6, 2023, 11:01 AM

#

normal ember 1 is slow

only because it trains longer. But how long you train the model is your decision anyways

normal ember Sep 6, 2023, 11:01 AM

#

I feel there's more overhead when loading the images for each step the lower batch size you go

#

It spends more time loading than training if you understand what I mean.

stiff dust Sep 6, 2023, 11:02 AM

#

hm... yeah okay, I never had that many images that they would not fit into my RAM

normal ember Sep 6, 2023, 11:03 AM

#

I guess you do without --cache_latents_to_disk then.

stiff dust Sep 6, 2023, 11:03 AM

#

yes

#

I keep the latents in RAM

normal ember Sep 6, 2023, 11:04 AM

#

and with what optimizer?

stiff dust Sep 6, 2023, 11:04 AM

#

AdamW

normal ember Sep 6, 2023, 11:04 AM

#

number of steps?

stiff dust Sep 6, 2023, 11:04 AM

#

as long as necessary

#

just record validation images and stop training if you are happy with the results or nothing happens anymore

normal ember Sep 6, 2023, 11:07 AM

#

I'll go with this then: https://gist.github.com/twri/0df8c1df30a9ed83be2d92261159445e

#

I will see what I can fit in memory

#

batch size 2 is the largest I can go without caching to disk.

#

loss is lower now

normal ember Sep 6, 2023, 3:24 PM

#

@stiff dust I've read that somebody claims SDXL was trained with --noise_offset=0.0357, do you know if there's any truth to that?

stiff dust Sep 6, 2023, 3:24 PM

#

it's the best working noise offset in my experience

#

but Joe Penna said they trained sdxl several times with different parameters, so there is not a single right noise offset 🤷‍♂️

covert pagoda Sep 6, 2023, 3:29 PM

#

Anyone know if there is a ComfyUI workflow for testing LoRAs in an XY-plot fashion?

stone garden Sep 6, 2023, 4:30 PM

#

What is this 1girl token about?

normal ember Sep 6, 2023, 6:53 PM

#

Could it be just a random token?

rain scarab Sep 6, 2023, 9:32 PM

#

stone garden What is this 1girl token about?

the prompt to produce a girl in the image

#

usually followed by solo somewhere

stone garden Sep 6, 2023, 9:36 PM

#

rain scarab the prompt to produce a girl in the image

Hmm i never had problems getting a girl so idk why i would need to include that prompt

#

It is part of the preset for wd14 captioning

#

Solo and 1girl

rain scarab Sep 6, 2023, 9:46 PM

#

maybe the engine understands it as one word to save prompt space?

stone garden Sep 6, 2023, 10:14 PM

#

Even if i dont describe any woman i get a girl because the class prompt i chose is woman

#

I don’t even need to include my instance prompt, the lora thing with <> around is enough

#

It always worked for me

latent charm Sep 7, 2023, 2:19 AM

#

@stiff dust After testing my lora: the text encoder doesn't need to be trained to produce a good enough result. We might want to find out how to train the text encoder properly

covert pagoda Sep 7, 2023, 7:41 AM

#

stiff dust just record validation images and stop training if you are happy with the result...

Hi, just curious, when you say validation images, do you mean sample prompts without the trigger word to see when it starts to overfit the class token?

quiet eagle Sep 7, 2023, 9:15 AM

#

do you usually do (blip) captioning just for your training data or for the reg, too?

#

and do you add the class prompt as a prefix, too or only the main prompt

stiff dust Sep 7, 2023, 9:25 AM

#

covert pagoda Hi, just curious, when you say validation images, do you mean sample prompts wit...

no, I use the trigger words in unusual styles or combinations. e.g., "charcoal drawing of xyz", "watercolor painting of xyz", "comic strip with xyz" and so on

covert pagoda Sep 7, 2023, 10:26 AM

#

Oh I see, kind of a simple genius idea

#

Thanks

covert pagoda Sep 7, 2023, 10:39 AM

#

stiff dust no, I use the trigger words in unusual styles or combinations. e.g., "charcoal d...

Do you use loss graphs at all? What’s your philosophy on it? Or just prompt sampling/cat plot?

stiff dust Sep 7, 2023, 10:48 AM

#

no, I don't think the loss gives you much valuable feedback

#

quite often loss gets higher in the beginning although the image improves

#

also the loss depends heavily on the sampled timesteps. If you want to interpret the loss you have to use proper validation data with fixed timesteps.
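
A rough sketch of what "validation data with fixed timesteps" could look like. Everything here is an assumption for illustration: `predict_noise` stands in for the model, latents are plain lists of floats, and the linear `alpha_bar` schedule is a placeholder rather than the schedule SD actually uses. The point is only that the timesteps and the noise are identical on every evaluation, so the numbers become comparable between checkpoints.

```python
import math
import random

NUM_TIMESTEPS = 1000  # assumed length of the noise schedule


def add_noise(latent, noise, t):
    """Mix latent and noise; the linear alpha_bar is a stand-in schedule."""
    alpha_bar = 1.0 - t / NUM_TIMESTEPS
    return [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
        for x, n in zip(latent, noise)
    ]


def fixed_validation_loss(predict_noise, latents, timesteps, seed=0):
    """MSE on the noise prediction over a fixed grid of latents and timesteps."""
    rng = random.Random(seed)  # same seed, so the same noise on every call
    total, count = 0.0, 0
    for latent in latents:
        for t in timesteps:
            noise = [rng.gauss(0.0, 1.0) for _ in latent]
            predicted = predict_noise(add_noise(latent, noise, t), t)
            total += sum((p - n) ** 2 for p, n in zip(predicted, noise))
            count += len(noise)
    return total / count


def dummy_model(noisy_latent, t):
    """Hypothetical model that always predicts zero noise."""
    return [0.0] * len(noisy_latent)


val_latents = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.5]]
val_timesteps = [100, 500, 900]
loss_a = fixed_validation_loss(dummy_model, val_latents, val_timesteps)
loss_b = fixed_validation_loss(dummy_model, val_latents, val_timesteps)
```

Because nothing is sampled freshly, two evaluations of the same model give exactly the same loss, which is not the case for the training loss.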

normal ember Sep 7, 2023, 6:26 PM

#

Can one transfer lora training parameters when switching to full training? Is there anything that has to change?

#

Should you still train unet only, for instance?

covert pagoda Sep 8, 2023, 12:14 AM

#

stiff dust also the loss depends heavily on the sampled timesteps. If you want to interpret...

Do you mean, look at loss graph next to validation samples at regular timestep intervals? Is that what you mean?

fathom vault Sep 9, 2023, 9:51 AM

#

Any models out there that can generate keywords for my images?

#

Like light description

#

And be fairly accurate

normal ember Sep 9, 2023, 11:54 AM

#

fathom vault Any models out there that can generate keywords for my images?

openflamingo is the best I've tried.

#

Especially if you give it a few images with captions for your dataset

stone garden Sep 10, 2023, 11:24 AM

#

min_snr_gamma with any optimizer other than Prodigy tends to do that.

stone garden Sep 10, 2023, 5:39 PM

#

does the order of the prompt in the caption matter?

stone garden Sep 10, 2023, 7:24 PM

#

hmm yeah ill rework my captions and see if it changes

covert pagoda Sep 10, 2023, 11:17 PM

#

Has anyone done Kohya_ss full fine-tuning training? I am stuck on the metadata preparation. Do I run the Python script in a Jupyter Python kernel in the merge_captions_to_metadata.py directory? See doc https://github.com/bmaltais/kohya_ss/blob/master/fine_tune_README.md#preprocessing-caption-and-tag-information. Not sure, as this was not necessary during LoRA training.

stone garden Sep 11, 2023, 10:41 AM

#

do i need regularization images for SDXL Lora training?

normal ember Sep 11, 2023, 11:06 AM

#

covert pagoda anyone has Kohya_ss full Finetuning training? I am stuck on the metadata prepara...

Only used kohya-ss (sd-scripts) to do dreambooth training. I'm using toml-files for my dataset for both LoRA and dreambooth.

#

I train on different AR so it's much easier with a dreambooth style dataset.

#

Still researching parameters for dreambooth so I have only had it run for a few epochs.

covert pagoda Sep 11, 2023, 11:10 AM

#

someone mentioned dreambooth without trigger, but i dont think its the same

normal ember Sep 11, 2023, 11:14 AM

#

https://github.com/darkstorm2150/sd-scripts/blob/main/docs/config_README-en.md

#

I don't know what's the difference between fine-tune and dreambooth to be honest. I don't specify dreambooth anywhere but my dataset is dreambooth style with captions for each image.

covert pagoda Sep 11, 2023, 12:01 PM

#

normal ember https://github.com/darkstorm2150/sd-scripts/blob/main/docs/config_README-en.md

Thanks. Will test again locally

#

Though I’d imagine there’s a train_network difference in the code

latent charm Sep 11, 2023, 12:59 PM

#

normal ember I don't know what's the difference between fine-tune and dreambooth to be honest...

dreambooth usually means using reg images.

stiff dust Sep 11, 2023, 4:40 PM

#

latent charm dreambooth usually means using reg images.

no, reg images were suggested in the Dreambooth paper, but they were only used for training on faces

#

Dreambooth means you use a rare token as trigger word (the paper suggested the token "sks") to fine-tune the model on a new subject

#

however, the term Dreambooth is used very differently. Sometimes it refers to full-finetuning (in contrast to Lora), sometimes it refers to the style of the caption ("photo of a sks person")

#

I would always recommend writing custom captions when using kohya 🤷‍♂️ That gives you the most flexibility and ensures that nothing strange happens

normal ember Sep 11, 2023, 5:06 PM

#

@stiff dust Is there a reason a full fine tune uses lower learning rate than a lora?

stiff dust Sep 11, 2023, 5:08 PM

#

yes. A lora trains a matrix factorization and not the original matrix.

#

so basically you multiply two numbers to obtain the weight change. Multiplying two small numbers gives you an even smaller number (e.g., 1e-3 x 1e-3 is 1e-6)

#

so you need a much larger change in the two numbers to obtain a noticeable change in the result
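
A toy calculation of the point above, using single numbers instead of matrices. The values are made up for illustration: with a full fine-tune the weight itself moves by the step size, while in the factorized case the weight change is the change in the product of the two factors.

```python
a = 1e-3      # first factor (stand-in for one entry of the LoRA "down" matrix)
b = 1e-3      # second factor (stand-in for one entry of the LoRA "up" matrix)
step = 1e-4   # how far one update moves each trained number

# Full fine-tune: the weight is trained directly, so it moves by `step`.
direct_change = step

# Factorized: both factors move by `step`, the weight is their product.
factored_change = (a + step) * (b + step) - a * b

ratio = direct_change / factored_change  # how much smaller the factored change is
```

Here the product changes by roughly 2e-7 while the directly trained weight changes by 1e-4, which is the intuition for why a lora tolerates (and needs) a higher learning rate.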

normal ember Sep 11, 2023, 5:13 PM

#

Would that also mean longer training when doing a full run given same dataset is used? I know I'm trying to generalize a bit too much but still. 😄

#

There are some fine tuned models claiming 200k steps, not sure how big the datasets are though.

#

200k and let's say 50 epoch is only 4k images.

#

But if it's learning faster then it could be more images I guess.

stiff dust Sep 11, 2023, 5:46 PM

#

not if they used batch size > 1.

But I'm very sure that most fine tuned models are trained on very small datasets (rather 100 images than 4000)
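
The back-of-the-envelope arithmetic from this exchange, as a small helper. It assumes every step processes one full batch and ignores repeats and gradient accumulation; the function name is made up for illustration.

```python
def images_per_epoch(total_steps, epochs, batch_size=1):
    """Dataset size implied by a step count, assuming one batch per step."""
    return total_steps * batch_size // epochs


at_batch_1 = images_per_epoch(200_000, 50)                # the 4k estimate above
at_batch_8 = images_per_epoch(200_000, 50, batch_size=8)  # same steps, larger batch
```

So a "200k steps" claim says little about dataset size unless the batch size and epoch count are known as well.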

normal ember Sep 11, 2023, 5:53 PM

#

If that's the case it must be a very very slow process

#

Can't find much about fine tunes on SDXL. Do you have links?

#

Many of the fine tunes are not much more than LoRA merges too.

#

Wish we could get our hands on some parts of the dataset for SDXL.

gentle flame Sep 11, 2023, 6:50 PM

#

finetuning SDXL is expensive and time-consuming

#

hopefully optimizations happen or GPUs become cheaper. I'm hopeful for sharding.

hazy herald Sep 12, 2023, 3:10 AM

#

hi, I am looking to train my own SDXL lora and there is an auto-captioning script that I found in a guide

#

call .\venv\Scripts\activate.bat
python.exe "finetune/make_captions.py" --batch_size="1" --num_beams="1" --top_p="0.9" --max_length="75" --min_length="5" --beam_search --caption_extension=".txt" "D:/!PhotosForAI/billie/billie-1024" --caption_weights="https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth"

#

but this looks like it's calling out to an online service to generate the captions, is that correct?

#

or is it just going to download the file and use it?

normal ember Sep 12, 2023, 8:45 AM

#

hazy herald but this looks like it's calling out to an online service to generate the captio...

You have to check the code. My guess is that it downloads the weights.

stiff dust Sep 12, 2023, 12:32 PM

#

yes, it only downloads the weights.

fair helm Sep 12, 2023, 6:55 PM

#

hi, does every lora have its own floating-point precision (16 or 32 bit)?
If so, is there any conflict when merging two loras with different precisions? And how do I determine a lora's precision? Thank you very much

stiff dust Sep 13, 2023, 10:08 AM

#

either the script you use for merging is dealing with the conversion, or it's not and you get an exception. But you won't get an incorrect or corrupt file back.
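
For the "how to determine it" part, one option is to read the file header, assuming the lora is stored as a .safetensors file. That format starts with an 8-byte little-endian header length followed by a JSON header that lists a dtype string (such as F16, BF16 or F32) for every tensor. The function name below is made up; only the standard library is used.

```python
import json
import struct
from collections import Counter


def safetensors_dtypes(path):
    """Count the tensor dtypes declared in the header of a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
        header = json.loads(f.read(header_len))         # JSON: tensor name -> info
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )
```

Calling `safetensors_dtypes("my_lora.safetensors")` (a placeholder path) on an fp16 lora should give a counter with a single `F16` entry; a mix of entries would mean the file stores tensors in different precisions.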

sonic narwhal Sep 13, 2023, 11:40 AM

#

Anyone have good json file for training LoRA character or object?

orchid yoke Sep 13, 2023, 4:54 PM

#

sonic narwhal Anyone have good json file for training LoRA character or object?

My gotos at the moment are: for quick, dont want to learn just make it happen

https://education.civitai.com/sdxl-1-0-training-overview/
https://www.reddit.com/r/StableDiffusion/comments/16a2ixm/comment/jz51npm/?utm_source=share&utm_medium=web2x&context=3

Okay i want to learn a bit (caith)
<#🔧｜finetune message>
<#🔧｜finetune message>

and Ai Characters
https://civitai.com/articles/1771/how-to-create-near-perfect-character-and-style-loras-for-sdxl-10-important-update-19082023

All have shared jsons.

sonic narwhal Sep 13, 2023, 5:05 PM

#

orchid yoke My gotos at the moment are: for quick, dont want to learn just make it happen ...

Thank you 🙏👍

stone garden Sep 13, 2023, 9:35 PM

#

man

#

2nd time accidentally training with dreambooth instead of Lora

sonic narwhal Sep 14, 2023, 7:04 AM

#

How many concepts can fit into a LoRA? Thinking about whether a full finetune is even necessary. Going to train on about 15 different products/objects in different categories, and for each category there are going to be at least 3-5 variations of form, design & color. Dataset will probably be at around 10k images

latent charm Sep 14, 2023, 7:06 AM

#

Haven't tested 10k. But I am training with 5k dataset. It is decent.

sonic narwhal Sep 14, 2023, 7:07 AM

#

5k LoRA training?

#

How many different concepts are u training

latent charm Sep 14, 2023, 7:08 AM

#

many

#

It contains multiple characters, anatomy, different angles

sonic narwhal Sep 14, 2023, 7:11 AM

#

Does it train new objects besides humans?

latent charm Sep 14, 2023, 7:11 AM

#

My training mainly focuses on humans.

sonic narwhal Sep 14, 2023, 7:13 AM

#

ok

normal ember Sep 14, 2023, 8:03 AM

#

latent charm Haven't tested 10k. But I am training with 5k dataset. It is decent.

How many steps / epochs do you need for that dataset?

latent charm Sep 14, 2023, 8:04 AM

#

normal ember How many steps / epochs do you need for that dataset?

Set to 100 epochs and 89900 steps with batch size 20 on an A100. It is an experiment. I am not sure which would be best. It's at 70% now

#

The learning rate is 1e-4.

sonic narwhal Sep 14, 2023, 8:11 AM

#

"Additional parameters: –max_grad_norm=0" Why does it say unrecognized when starting training?

normal ember Sep 14, 2023, 8:25 AM

#

latent charm Set with 100 epochs and 89900 steps with 20 bs using A100. It is experiment. I a...

Will be interesting to know what the results will be when you finish. Have you implemented multi aspect training? I will try it soon I hope. Writing tools for this now.

latent charm Sep 14, 2023, 8:28 AM

#

normal ember Will be interesting to know what the results will be when you finish. Have you i...

Yes, multiple aspect ratio with buckets

normal ember Sep 14, 2023, 8:28 AM

#

But not cropped images as in the SDXL paper?

stone garden Sep 14, 2023, 8:29 AM

#

is training with batch size 1 better ?

normal ember Sep 14, 2023, 8:29 AM

#

Seems like SAI feeds same image cropped in different AR into training.

#

Page 5 and 6 in paper.

latent charm Sep 14, 2023, 8:30 AM

#

normal ember But not cropped images as in the SDXL paper?

I have cropped images from the original images with different captions

#

The original dataset is around 2600 images. 2400 are anatomy crops from the original images

normal ember Sep 14, 2023, 8:42 AM

#

I wonder if feeding pre-generated cropped images would do about the same as their Fourier-embedded crop conditioning does, or if that needs to be implemented properly in the trainer.

#

@stiff dust Do you have any knowledge about this?

latent charm Sep 14, 2023, 8:44 AM

#

Might be

stiff dust Sep 14, 2023, 9:03 AM

#

kohya sd-scripts uses the correct cropping conditioning only if it crops the images itself

#

if you provide already cropped images, then this information is not available to kohya

#

my own solution for that was to write the cropping parameters into the meta tags of the png file and made some code additions to kohya to read these tags and add the conditioning

#

I'm not entirely sure, though, how helpful cropping is

#

I used it when training on my own face: next to the normal 1024x1024 images I also added a few extremely high-resolution images (like 4096x4096), cropped them into 1024x1024 blocks, and added them to the training data. It seemed that doing this too often ends up overfitting on close-up faces (even the cropping conditioning doesn't prevent that). When only around 1%-5% of my training data is cropped, it seemed to help improve details (e.g. skin details), but I'm not entirely sure whether the improvement is really due to the cropped images.
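
A small sketch of the tiling step described here, assuming non-overlapping 1024x1024 blocks. The helper names are hypothetical, and the "size" / "crop" text format ("width,height" and "left,top") is an assumption about what gets written into the PNG tags, not a confirmed convention.

```python
def tile_boxes(width, height, tile=1024):
    """Return (left, top, right, bottom) boxes of non-overlapping tiles."""
    return [
        (left, top, left + tile, top + tile)
        for top in range(0, height - tile + 1, tile)
        for left in range(0, width - tile + 1, tile)
    ]


def crop_metadata(width, height, box):
    """Text tags describing where a tile came from (assumed format)."""
    left, top, _, _ = box
    return {"size": f"{width},{height}", "crop": f"{left},{top}"}


boxes = tile_boxes(4096, 4096)             # 4 x 4 grid of tiles
meta = crop_metadata(4096, 4096, boxes[5])  # second row, second column
```

Each tile would then be saved together with its tags, so that the trainer can recover the original size and crop offset for the conditioning.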

latent charm Sep 14, 2023, 9:17 AM

#

I have 1,840 face-cropped images out of 5,827 total images. It does tend to make close-up images, but it is still able to make full shots

stiff dust Sep 14, 2023, 9:30 AM

#

oh, yes, it can. I'm just saying that it more often makes close-up images without "close-up" being specified

#

the reason I came up with cropped images was rather that I hoped it would help SDXL render images in high resolution without making the typical placement errors (two noses and so on). But it didn't really work so well, upscaling is still the better strategy

latent charm Sep 14, 2023, 9:45 AM

#

cropped the character but it is interesting to see the image has two bottom navigation bar.🤣

normal ember Sep 14, 2023, 10:55 AM

#

stiff dust I used it for when training my own face: Next to the normal 1024x1024 images I a...

But shouldn’t kohya tell what resolution the image is when training?

#

I’m cropping and resizing based on the closest resolutions that SDXL was trained on and putting the images into one folder per resolution. This could be extended to resize and crop to other resolutions when there are enough pixels, if there’s any point to it, hence my question.

#

My feelings are that some fine tunes have regressed in rendering resolutions that base handles just fine. My thought was that this might be able to improve if you train on same image but to different resolutions.

stiff dust Sep 14, 2023, 11:31 AM

#

normal ember But shouldn’t kohya tell what resolution the image is when training?

resolution, yes, but not cropping

latent charm Sep 14, 2023, 12:15 PM

#

These images are selected results, and I think the anatomy training might introduce other issues.

#

For example, the following image lost the relation between the legs and upper body. Failed-diffusions~

stone garden Sep 14, 2023, 12:22 PM

#

Nice now my character sticks with a cleft chin😭

stone garden Sep 14, 2023, 1:12 PM

#

what the fuck

#

why

#

im so fed up with this shit

#

they all have this type of chin now

#

completely random my training images didnt even contain those chins

stiff dust Sep 14, 2023, 2:22 PM

#

oh yeah, I know that very well: artefacts that appear in the images although they were not part of the training images. I had the same issues with SD 1.5 and 2.1, though

stone garden Sep 14, 2023, 3:26 PM

#

stiff dust oh yeah, I know that very well: artefacts that appear in the images although the...

funny thing is this is the 5th version of my lora

#

and the one before didnt have this (a tiny bit but not that strong)

normal ember Sep 14, 2023, 4:53 PM

#

stiff dust resolution, yes, but not cropping

What other data is provided other than the cropped image to training? Source size? Source image?

stone garden Sep 14, 2023, 6:18 PM

#

i have over 10 pics with thighhighs in my training set and still they appear when i put them into negative prompts... and all the time

normal ember Sep 14, 2023, 6:23 PM

#

Not sure how it works but it seems like when training it associates with other dimensions in the model that is related to the newly learnt data but not in the training data.

#

It looks like it's nipples from cats or something that has been associated with the ears

#

maybe the chins are related to that too

stiff dust Sep 14, 2023, 10:03 PM

#

normal ember What other data is provided other than the cropped image to training? Source siz...

it can't, because it does not have this information. If you let kohya ss do the resizing and cropping itself, then it provides this information. Otherwise, not.

normal ember Sep 15, 2023, 10:12 AM

#

stiff dust it can't, cause it does not have this information. If you let kohya ss doing the...

First of all I want to thank you for all invaluable information that you provide! My question was what data you provided with the cropped image. By any chance do you have a repo for the numerous changes you made to kohya?

opal jacinth Sep 15, 2023, 11:10 AM

#

what is the reason for saving/using "training states" in the context of dreambooth, if we can actually resume training also by using the last created checkpoint as source model?

opal jacinth Sep 15, 2023, 11:19 AM

#

opal jacinth what is the reason for saving/using "training states" in the context of dreamboo...

it is even possible to save such states for LoRA, even though we can simply put the path to the last saved LoRA to resume in kohya ss gui

jade hornet Sep 15, 2023, 12:46 PM

#

opal jacinth what is the reason for saving/using "training states" in the context of dreamboo...

I've always looked at it as a redundant save feature, personally

stiff dust Sep 15, 2023, 12:54 PM

#

it also stores the optimizer state and, thus, the current momentum of the optimizer

#

using training states you can stop and resume training any time without any drawbacks

#

loading from a checkpoint means the optimizer needs a warmup phase and will probably harm the model for the first few steps until it improves the model again

#

this is not a big issue if you stop once in the middle and resume again. But if you plan to stop and continue several times in close succession, saving states is definitely better
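
A toy illustration of why the optimizer state matters, using a hand-written Adam update on a single number. This is not how kohya stores its training state; the loss, learning rate and step counts are made up. It only shows that resuming with the saved state continues exactly where training left off, while resuming from the weights alone does not.

```python
import math


def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight; state is (m, v, t)."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # running mean of gradients (momentum)
    v = b2 * v + (1 - b2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)


def grad_fn(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2 * (w - 3.0)


def train(w, state, steps):
    for _ in range(steps):
        w, state = adam_step(w, grad_fn(w), state)
    return w, state


fresh = (0.0, 0.0, 0)
w_full, _ = train(0.0, fresh, 200)            # uninterrupted run
w_half, state_half = train(0.0, fresh, 100)   # stop after 100 steps
w_with_state, _ = train(w_half, state_half, 100)  # resume with saved state
w_weights_only, _ = train(w_half, fresh, 100)     # resume from weights only
```

`w_with_state` matches the uninterrupted run exactly, while `w_weights_only` drifts away from it because the running averages have to be rebuilt from zero.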

stiff dust Sep 15, 2023, 12:56 PM

#

normal ember First of all I want to thank you for all invaluable information that you provide...

no, it's on my local git. Also many of the changes are quite specific. If you need something specific I can just send you the code

opal jacinth Sep 15, 2023, 12:59 PM

#

stiff dust this is not a big issue if you do stop in the middle and resume again. But if yo...

nice insights, thanks for the clarification

jade hornet Sep 15, 2023, 1:02 PM

#

stiff dust it also stores the optimizer state and, thus, the current momentum of the optimi...

Nice, good points

astral island Sep 15, 2023, 3:12 PM

#

is there a place to learn about the basics of the LORA training process? (eg. the loss calculation, gradient descent, etc.)

stiff dust Sep 15, 2023, 3:55 PM

#

it's the same as for fine-tuning

valid stream Sep 15, 2023, 5:40 PM

#

Hello!!
I compared 4 popular face upscalers.
Base image 640x640.
Scaled to 960x960.
Results:

4x-UltraSharp details: 3/10 noise: 2/10 light: 5/10
4x_NMKD-Siax_200k details: 7/10 noise: 4/10 light: 6/10
4x_foolhardy_Remacri details: 8/10 noise: 4/10 light: 7/10
4x_NickelbackFS_72000_G details: 4/10 noise: 6/10 light: 7/10
R-ESRGAN 4x+ details: 6/10 noise: 3/10 light: 4/10

#

Please test it yourself and verify my results

#

I'm horrified how everyone says 4x-UltraSharp is good. It is terrible for realistic graphics... maybe decent for some anime or pixel perfect art

#

small update:
for 3x scale 4x_NMKD-Siax_200k details are insane, like 9/10

stiff dust Sep 15, 2023, 6:27 PM

#

I also always use Siax, but haven't evaluated it yet, so thanks for the info

stone garden Sep 15, 2023, 6:32 PM

#

do lora questions go here or somewhere else

#

specifically an anime lora but still

#

i guess ill put it here

#

it seems others have talked about it here before

#

first time trying to make a lora, im using colab and NAI for anime style
am I being too picky? like I feel like it's weird compared to anime ones on civitai. granted, my character is an OC, but theyre at least kindof anime girl looking (and androgynous) so idk.
I use adaptive optimizers, batch size of 6, output samples are good quality and very accurate. but when I apply it to a model the quality (not literal) ranges from broken, to kind of accurate, to pretty good.

tldr; is this normal? should I just pick one and go for it? are loras known to depend on the model you're applying them to?
ill get an example
here's a plot, idk if the "stylization" is overfitting or just my dataset. i put style tags in the images, and some weights are a good balance but it's either not accurate enough or fried

#

#

this is on a custom model, but it's basically just dreamshaper with anime composition/clip. makes it very stylized and a bit messy

stiff dust Sep 15, 2023, 7:23 PM

#

a lora should work with weight= 1. Using higher weights is rather hacky, using lower weights is rather a sign that the lora was overfitted

#

a lora should work with other models in most cases. However, if the other model is bad (e.g., totally overfitted and broken), then it might not work. Dreamshaper XL should work, though

#

I would not be too worried if the anatomy is sometimes wrong (too many legs or fingers). These things happen in the base model without any lora, too. Try different seeds and check if it happens too frequently

valid stream Sep 15, 2023, 9:36 PM

#

Hello!
Do you guys have any pro tips for fine tuning skin texture on 2k image?

I can only think of img2img with Ultimate SD Upscale with 4x_NMKD-Siax_200k with some denoise.
Extras with upscaler is really bad for skin even with the best possible upscaler.
I want advanced opinions on this topic.
Maybe it is actually just impossible as of 2023... Should I really expect any sort of AI to perfectly reproduce human skin?

jade hornet Sep 15, 2023, 9:36 PM

#

stone garden first time trying to make a lora, im using colab and NAI for anime style am I be...

in general loras work best on the model they were trained against, but you will have varying results against other models with the same base (i.e. 1.5 base, sdxl base). as for the sample inference during training vs the actual live inference in auto1111 or comfy, that kind of just depends on all the settings and negative prompts etc

#

basically, dont expect super consistent behavior with so many variables in play

#

think of a lora like a formula that says when I use these tokens I will apply some weights to bias toward training parameters, but those weights will vary depending on the other weights in play

stone garden Sep 15, 2023, 10:09 PM

#

Ahh okay cool, thanks
I won't worry too much then, i have epochs that work at weight 1 and apply generally "this is that character" to the image. plus all the prompting

astral island Sep 15, 2023, 10:37 PM

#

is there a page that documents the inner workings of the LORA learning algorithms in details? i'm trying to learn more about the LORA training process.

stiff dust Sep 15, 2023, 11:12 PM

#

there is no special learning algorithm

#

lora is like fine-tuning

#

the main differences to fine-tuning (sometimes called Dreambooth) are:

1. you freeze the original model M and instead train a difference model D, with M+D = finetuned
2. your difference model D is factorized. You can think of that as a lossy compression algorithm, similar to how you compress images as jpg. It makes the lora smaller than the original model
3. usually you don't train all matrices in the original model but only the most important ones (that's why there are so many subtypes of Lora. Most of the time Lora just trains the Transformers. Lycoris also trains the Resnet and so on)

#

but loras are not much different from normal fine-tuning
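
A tiny sketch of the "M + D with D factorized" idea from the list above, using plain Python lists as matrices. The matrix sizes, the rank and the helper names are made up for illustration, and real implementations usually also apply a scaling factor that is left out here.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def full_params(d_out, d_in):
    """Number of trained values when the weight matrix is trained directly."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Number of trained values for the two factors of shape (d_out, r), (r, d_in)."""
    return rank * (d_out + d_in)


# Frozen original weights M and a rank-1 difference D = B @ A.
M = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape (2, 1)
A = [[0.1, 0.2]]     # shape (1, 2)
D = matmul(B, A)
W = [[m + d for m, d in zip(m_row, d_row)] for m_row, d_row in zip(M, D)]

full = full_params(1280, 1280)          # one 1280 x 1280 matrix trained directly
small = lora_params(1280, 1280, 16)     # the same matrix as a rank-16 lora
```

For a 1280x1280 matrix the rank-16 factorization trains 40,960 values instead of 1,638,400, which is the "lossy compression" effect mentioned above.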

astral island Sep 15, 2023, 11:19 PM

#

is there a place that goes into how normal fine-tuning works? i still need to learn about the very basic like the loss calculation, SGD, backpropagation and such

stiff dust Sep 15, 2023, 11:21 PM

#

uhm, this is the same for any kind of neural network, so lookup any textbook about neural networks

#

SD is using a simple l2 loss (mean squared error) on the noise prediction

#

(squared difference between predicted noise per pixel and real noise per pixel)
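
As a minimal sketch, the loss described here is just the mean of the squared differences between the predicted and the real noise. The flat lists below stand in for the noise tensors, and the numbers are made up.

```python
def mse(predicted, target):
    """Mean squared error between predicted noise and real noise."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


real_noise = [0.0, -1.0]
predicted_noise = [0.5, -1.0]
loss = mse(predicted_noise, real_noise)  # (0.25 + 0.0) / 2
```

Backpropagation then computes how each trained weight should change to make this number smaller, exactly as in any other neural network.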

astral island Sep 15, 2023, 11:27 PM

#

i've been trying to learn about this with GPT4, but the issue is I can't put any of it in the context of Stable Diffusion LORA training

ocean dune Sep 16, 2023, 3:35 AM

#

No idea if it fits here, but what can i finetune/add to prevent third upscale resample to have 2 mouths and quite vastly different mouth/lips? The extra hand on the neck is a quick fix, it's just the extra mouth i can't seem to get rid of. Neither with denoise nor cfg

stiff dust Sep 16, 2023, 9:26 AM

#

astral island i've been trying to learn about this with GPT4, but the issue is I can't put any...

I don't know what you mean 😅 there is really nothing special about lora training

stiff dust Sep 16, 2023, 9:27 AM

#

ocean dune No idea if it fits here, but what can i finetune/add to prevent third upscale re...

maybe tiled upscaling? But that does not exist yet for SDXL. You can also try using control net (e.g. line art control net)

opal jacinth Sep 16, 2023, 9:31 AM

#

does it make a difference if I train a SDXL LoRA or a full fine tuned model with, let's say "medium quality", pictures? Is it possible that LoRA works better for images with lower quality? or should the dreambooth training always yield better results

stiff dust Sep 16, 2023, 9:39 AM

#

in theory, if you set the lora rank to max then lora and dreambooth should be the same. So no difference. In practice, most Lora implementations do not train all weight matrices but only the important ones

#

in fact, if you want to train on subjects it's totally sufficient to only train the cross attention

#

but most lora implementations train the complete transformer

#

anyways, that could be a reason why Loras are sometimes better than Dreambooth. They overfit less, because they don't train all the weights that are not really important for the training

#

you could do the same with dreambooth, though. You always can decide freely which weights in your network should be trained

opal jacinth Sep 16, 2023, 10:00 AM

#

thank you, it is as interesting as ever to read your detailed answers.

normal ember Sep 16, 2023, 11:11 AM

#

valid stream small update: for 3x scale 4x_NMKD-Siax_200k details are insane, like 9/10

Bit off topic but try this upscaler, it's my favorite that doesn't ruin details. https://github.com/Phhofm/models/tree/main/4xLexicaHAT

normal ember Sep 16, 2023, 11:15 AM

#

stiff dust no, its on my local git. Also many of the changes are quite specific. If you nee...

The only two interesting changes I know you have made to kohya are stopping text encoder training after a certain number of steps (or epochs?) and the crop conditioning. You might have made other neat changes that we still don't know about.

stiff dust Sep 16, 2023, 11:17 AM

#

normal ember The only two changes inserting changes I have knowledge about that you have done...

I always train EITHER text encoder OR unet which is already possible in sd-scripts. Also a simple trick to perfectly control what should be trained on is to set "--save-after-n-steps=1", save the lora safetensors, then open it in python and remove all matrices you don't want to train on.
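
A minimal sketch of the "open it in python and remove matrices" step, under stated assumptions: the file is loaded with the `safetensors` package, and cross-attention matrices are recognised by `attn2` in the key name (as in kohya-style LoRA files for the UNet). `keep_matching`, `is_cross_attention` and `prune_lora_file` are hypothetical helper names, and the file paths are placeholders.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def keep_matching(state: Dict[str, T], keep: Callable[[str], bool]) -> Dict[str, T]:
    """Return a copy of `state` with only the entries whose key passes `keep`."""
    return {key: value for key, value in state.items() if keep(key)}


def is_cross_attention(key: str) -> bool:
    # Assumption: cross-attention modules carry "attn2" in their key name.
    return "attn2" in key


def prune_lora_file(src: str, dst: str) -> None:
    """Load a LoRA .safetensors file, drop unwanted matrices, save the rest."""
    # Imported lazily so the helpers above work without safetensors installed.
    from safetensors.torch import load_file, save_file

    state = load_file(src)
    # Note: metadata stored in the original file is not carried over here.
    save_file(keep_matching(state, is_cross_attention), dst)
```

Swap `is_cross_attention` for any other predicate on the key name to choose a different subset.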

#

for cropping you would have to store the cropping information somewhere. I used the PNG meta tag for that, but this is something you would have to do in python yourself. I just added a few lines to the BaseDataset#getitem method to read out this cropping information

normal ember Sep 16, 2023, 11:22 AM

#

Yeah, I have already written metadata to PNGs for other purposes

normal ember Sep 16, 2023, 11:27 AM

#

stiff dust for cropping you would have to store the cropping information somewhere. I used ...

If you have a diff I'd be more than happy!

stiff dust Sep 16, 2023, 12:14 PM

#

############ kaidu
if "size" in img.info:
    ww, hh = img.info["size"].split(",")
    original_size = (int(ww), int(hh))
if "crop" in img.info:
    ww, hh = img.info["crop"].split(",")
    crop_ltrb = (int(ww), int(hh))
############

# augmentation
aug = self.aug_helper.get_augmentor(subset.color_aug)
if aug is not None:
    img = aug(image=img)["image"]

if flipped:
    img = img[:, ::-1, :].copy()  # copy to avoid negative stride problem

latents = None
image = self.image_transforms(img)  # becomes a torch.Tensor in the range -1.0 to 1.0

#

that's the part in library/train_util.py

#

only the 6 lines in the ##kaidu block are relevant

#

should be around line number 1100, in the get_item method

#

it reads the source size and crop coordinates from the "size" and "crop" parameters in your PNG info. Numbers are given comma separated (e.g., "crop=40,20")
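
For the writing side, a minimal sketch assuming Pillow: it stores the same comma-separated "size" and "crop" values as PNG text chunks, which then show up in `img.info` when the file is opened again. The helper names and paths are hypothetical; only the two keys and the "40,20" format come from the snippet above.

```python
from typing import Tuple


def format_pair(a: int, b: int) -> str:
    """Encode two integers in the comma-separated form used above, e.g. "40,20"."""
    return f"{a},{b}"


def parse_pair(text: str) -> Tuple[int, int]:
    """Decode "40,20" back into (40, 20)."""
    first, second = text.split(",")
    return int(first), int(second)


def save_with_crop_info(src: str, dst: str,
                        size: Tuple[int, int], crop: Tuple[int, int]) -> None:
    """Save the image at `src` to `dst` as PNG with "size" and "crop" text chunks."""
    # Imported lazily so the two helpers above work without Pillow installed.
    from PIL import Image
    from PIL.PngImagePlugin import PngInfo

    meta = PngInfo()
    meta.add_text("size", format_pair(*size))
    meta.add_text("crop", format_pair(*crop))
    with Image.open(src) as img:
        img.save(dst, format="PNG", pnginfo=meta)
```

For example, `save_with_crop_info("in.png", "out.png", size=(2048, 1536), crop=(40, 20))` would tag a crop taken 40 px from the left and 20 px from the top of a 2048x1536 original.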

normal ember Sep 16, 2023, 12:56 PM

#

Thanks! I found the functions in that file when I checked

ocean dune Sep 16, 2023, 4:23 PM

#

stiff dust maybe tiled upscaling? But that does not exist yet for SDXL. You can also try us...

Tiling helps more with a lack of video memory, by splitting the generation into smaller squares instead of one large one, no?

#

Cause I'm attempting an SD 1.5 gen that is natively 620x620 (I think), then upscaling and resampling at 2x, then another 2x, so roughly 2400x2400

stone garden Sep 16, 2023, 7:20 PM

#

does this blown out look come from overtraining?

#

this is before hires fix so it seems fine

stone garden Sep 16, 2023, 11:45 PM

#

are CLIP/deepbooru still the go-tos for captioning images in a dataset? or has something better come out?

latent charm Sep 17, 2023, 4:51 AM

#

There are many captioning tools, such as wd14, ML-Danbooru, blip, blip2, openflamingo, etc. The catch is that your captions should fit your purpose. Even with a captioning tool, you still need to manually delete or add tags.

ruby pond Sep 18, 2023, 1:10 AM

#

how do you train a lora that only affects the composition of the image, and doesn't change the colors, lighting, etc? e.g. like a hands or eyes lora

stone garden Sep 18, 2023, 1:33 AM

#

probably correct tagging?

#

also a varied dataset

#

probably easiest to use 3d models or real images

stone garden Sep 18, 2023, 6:54 AM

#

Is there any good Mac software to create/edit captions? Also a guide for captioning images?

#

(I train on RunPod, but I'd like to caption locally)

normal ember Sep 18, 2023, 8:02 AM

#

taggui

#

https://github.com/jhc13/taggui

GitHub

GitHub - jhc13/taggui: Desktop application for quickly tagging images

Desktop application for quickly tagging images. Contribute to jhc13/taggui development by creating an account on GitHub.

stone garden Sep 18, 2023, 8:06 AM

#

Thanks for the suggestion. That isn't on macOS but I found https://github.com/toshiaki1729/dataset-tag-editor-standalone

A related question, is there a guide to captioning images for best results? Specifically for a person LoRA.

GitHub

GitHub - toshiaki1729/dataset-tag-editor-standalone: WebUI to edit ...

WebUI to edit dataset captions for txt2img models. Contribute to toshiaki1729/dataset-tag-editor-standalone development by creating an account on GitHub.

opal jacinth Sep 18, 2023, 8:29 AM

#

hey @restive bridge just curious how your progress is so far? 🙂 I've recently tried dreambooth training but got mixed results... at least not that much improvement over trainings I did with LoRA

normal ember Sep 18, 2023, 8:53 AM

#

stone garden Thanks for the suggestion. That isn't on macOS but I found https://github.com/to...

It's 100% Python, so it should probably work on Mac too

#

It even claims cross platform

latent charm Sep 18, 2023, 9:05 AM

#

My 5000k-image LoRA training is done. It includes multiple people, multiple outfits, an anatomy focus, NSFW, etc. I found some issues during testing. First, element mixing. For example, outfit A has a ribbon element and outfit B also has a ribbon element. Because the 'ribbon' tag was learned from both datasets, using the outfit A prompt to reconstruct the image may produce outfit B's ribbon.

#

Second, "elements on the fly". This occurs in the second half of the epochs. Might it be overfitting? Fingers, arms, or elements of the outfit tear apart, with wrong composition of elements or extra elements from the outfits.

#

Third, anatomy (hand) training. It had the most element mixing during testing. That might be due to my lazy captioning: I used 'hand' as a general tag for images of different hand poses. I think it mixed up the different hand poses, as well as the front and back of the hand. I will test the anatomy training again with more accurate captions.

stone garden Sep 18, 2023, 10:29 AM

#

do i need a classprompt ?

#

if i make a lora for a clothing style, what classprompt should i use

sonic narwhal Sep 18, 2023, 11:01 AM

#

What is strongest automatic captioning model atm?

stone garden Sep 18, 2023, 1:12 PM

#

normal ember Its python 100% so should probably work on mac too

Thanks, it worked, just had to do a little magic to get GPU captioning working properly, but I don't need it anyway.

jade hornet Sep 18, 2023, 5:46 PM

#

stone garden if i make a lora for a clothing style, what classprompt should i use

Generally with a class you want to try and leverage what the base model already knows as much as possible, ex shirt, jacket, pants...

stone garden Sep 18, 2023, 7:59 PM

#

resizing a couple of images and wondering if anyone knows how to resize by the shortest side, ideally in Python. Pillow's Image.thumbnail method and bulkresizephotos.com go by the longest side.

normal ember Sep 18, 2023, 8:26 PM

#

stone garden resizing a couple of images and wondering if anyone knows how to resize by the s...

Like this? https://gist.github.com/twri/a2e1262aa50a030576df25730ca0cdbc
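
Independent of the gist, the basic idea can be sketched as follows, assuming Pillow 9.1 or newer for the actual resize. `size_for_shortest_side` and `resize_by_shortest_side` are hypothetical names used only for this example.

```python
from typing import Tuple


def size_for_shortest_side(width: int, height: int, target: int) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side equals `target`, keeping aspect ratio."""
    scale = target / min(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_by_shortest_side(src: str, dst: str, target: int) -> None:
    """Resize the image at `src` so its shorter side is `target` pixels, saving to `dst`."""
    from PIL import Image  # lazy import: only needed for the actual resize

    with Image.open(src) as img:
        new_size = size_for_shortest_side(img.width, img.height, target)
        img.resize(new_size, Image.Resampling.LANCZOS).save(dst)
```

Unlike `Image.thumbnail`, which fits the image inside a bounding box, this uses `Image.resize` with an explicitly computed size, so the shorter side always lands on the target.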

stone garden Sep 18, 2023, 8:27 PM

#

yes tyvm

normal ember Sep 18, 2023, 8:31 PM

#

Not sure if thumbnail is the best way to resize

stone garden Sep 18, 2023, 8:33 PM

#

resize is probably better but IIRC you have to set the size for height and width and I'm working with a lot of varying resolutions

normal ember Sep 18, 2023, 8:35 PM

#

If it's a dataset for training I've used a list of resolutions and cropped and resized to the one closest to the target and put them in a separate folder by resolution.

stone garden Sep 18, 2023, 8:37 PM

#

in python?

normal ember Sep 18, 2023, 8:37 PM

#

Yea

stone garden Sep 18, 2023, 8:38 PM

#

👉 👈 wouldn't happen to have a copy of that as well, would you?

normal ember Sep 18, 2023, 8:40 PM

#

https://gist.github.com/twri/b1ba07f7d4b840ea44188fedea8c1e44

#

Doesn't have a target dir parameter but I guess you could add that, it puts it into directory called processed
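
This is not the gist's code, just a rough sketch of the "closest resolution, then crop" idea. The resolution list is an assumed example subset (check the SDXL paper for the full table), and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a small example subset of SDXL-style training resolutions.
RESOLUTIONS: List[Tuple[int, int]] = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832),
    (832, 1216), (1344, 768), (768, 1344),
]


def closest_resolution(width: int, height: int,
                       candidates: Sequence[Tuple[int, int]] = RESOLUTIONS) -> Tuple[int, int]:
    """Pick the candidate whose aspect ratio is closest to width / height."""
    ratio = width / height
    return min(candidates, key=lambda wh: abs(wh[0] / wh[1] - ratio))


def center_crop_box(width: int, height: int,
                    target: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Largest centered box (left, top, right, bottom) with the target's aspect ratio."""
    target_ratio = target[0] / target[1]
    if width / height > target_ratio:
        # Image is too wide: trim equally from left and right.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return left, 0, left + new_w, height
    # Image is too tall (or already matches): trim equally from top and bottom.
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return 0, top, width, top + new_h
```

The returned box can be passed to Pillow's `Image.crop`, followed by a resize to the chosen resolution and a save into a per-resolution folder.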

stone garden Sep 18, 2023, 8:41 PM

#

that's perfect, tyvm you saved me a lot of time

normal ember Sep 18, 2023, 8:42 PM

#

Try GPT-4 😉

stone garden Sep 18, 2023, 8:42 PM

#

facepalm

#

ofc

stone garden Sep 19, 2023, 1:46 AM

#

does using xformers while training do anything to a lora

stone garden Sep 19, 2023, 2:03 AM

#

like in a bad way

stiff dust Sep 19, 2023, 9:29 AM

#

stone garden like in a bad way

no

slim talon Sep 19, 2023, 2:51 PM

#

Hey Guys,

I want to train a model on a specific Pixar-like character. I've generated 20 images of the same(ish) character and I want to be able to prompt that character via dreambooth or something similar. Any tips for how I'd be able to do that?

stiff dust Sep 19, 2023, 5:05 PM

#

caption the images (either use the real name of the character, or a custom name without meaning, e.g., "Monica Tdezk").

#

then train a lora on that

#

using the kohya/sd-scripts (or kohya-ss) library

#

you can either train a pure text encoder lora with low dim (e.g. dim=2)

#

or you train the unet (with a bit higher dim, e.g. dim=8 or dim=12)

#

I often found unet training slower but more flexible, but you can just try. Text encoder training is usually fast, you get good results after a few minutes
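
As a rough sketch of those two options, here is a helper that builds the argument list for sd-scripts' `train_network.py`. The helper name and all paths are placeholders, the 4e-4 default mirrors the learning rate mentioned earlier in this chat, and a real run needs more options (resolution, batch size, steps, ...) than shown here.

```python
from typing import List


def lora_train_args(data_dir: str, output_dir: str, base_model: str,
                    target: str = "unet", dim: int = 8,
                    learning_rate: float = 4e-4) -> List[str]:
    """Build an argument list for sd-scripts' train_network.py (paths are placeholders)."""
    if target not in ("unet", "text_encoder"):
        raise ValueError("target must be 'unet' or 'text_encoder'")
    args = [
        "train_network.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--train_data_dir={data_dir}",
        f"--output_dir={output_dir}",
        "--network_module=networks.lora",
        f"--network_dim={dim}",
        f"--learning_rate={learning_rate}",
    ]
    # Train EITHER the UNet OR the text encoder, never both.
    if target == "unet":
        args.append("--network_train_unet_only")
    else:
        args.append("--network_train_text_encoder_only")
    return args
```

For the text-encoder variant you would call something like `lora_train_args("data", "out", "model.safetensors", target="text_encoder", dim=2)` and hand the list to your usual launcher (commonly `accelerate launch`).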

hardy storm Sep 19, 2023, 5:32 PM

#

Regularization image question. Trying to train Lora models in Kohya for a buddy of mine of his kids to make them into superheroes. What would you recommend for regularization images?

stiff dust Sep 19, 2023, 8:18 PM

#

you don't necessarily need them. Just try without

fervent bison Sep 20, 2023, 12:08 AM

#

Is it possible to train a LoRA with 8 gigs of VRAM?

astral island Sep 20, 2023, 5:52 AM

#

question: what does dreambooth/ti/lora/finetune training do with the loss from all the images in a batch? do they use it to find a derivative?

stone garden Sep 20, 2023, 6:49 AM

#

stone garden resizing a couple of images and wondering if anyone knows how to resize by the s...

I put together a couple of scripts with the help of ChatGPT to help me manage my growing SD training datasets: https://github.com/boomerchan/sd_training_scripts
resize_bulk.py is by far the most useful tool as it allows you to crop and resize based on:

- a given height and/or width
- the original SDXL training resolutions
- the shortest side or the longest side

And it doesn't modify your original image(s).
glhf

#

also thanks to twri for reminding me GPT can do Python

normal ember Sep 20, 2023, 7:08 AM

#

stone garden I put together a couple of scripts with the help of ChatGPT to help me manage my...

There are even more resolutions available that SDXL was trained on if you want to include them all. It's in the SDXL paper.

stone garden Sep 20, 2023, 7:09 AM

#

oh? I was going off of this. picked it up somewhere when SDXL first released

#

I'll look up the paper

normal ember Sep 20, 2023, 7:11 AM

#

Yes, that's why I only used them in my resize-tool too but there are more.

#

Will result in less cropping I guess and the model should be able to handle it.

marble zodiac Sep 20, 2023, 7:12 AM

#

stone garden I'll look up the paper

https://arxiv.org/pdf/2307.01952.pdf page 17

normal ember Sep 20, 2023, 7:20 AM

#

A tagging tool could be useful, but depending on the use case there are also caption_prefix and caption_suffix options available in later versions of kohya-ss.

#

I try to keep the unique stuff in the caption file for each image and the general stuff in the config (toml) for the dataset, for flexibility.

stiff dust Sep 20, 2023, 8:27 AM

#

astral island question: what does dreambooth/ti/lora/finetune training does with the loss from...

you take the average of the loss and compute the derivative of the average (which is itself the average of the derivatives).
You can think of it as each image being handled separately, with all the updates then merged by averaging
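
A tiny numerical check of that statement, using a made-up one-parameter model (`w * x` with a squared-error loss) purely for illustration; none of this is taken from an actual training script.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (input x, target y) pairs


def sample_loss(w: float, x: float, y: float) -> float:
    """Squared error of the toy model w * x against the target y."""
    return (w * x - y) ** 2


def sample_grad(w: float, x: float, y: float) -> float:
    """Analytic derivative of sample_loss with respect to w."""
    return 2.0 * (w * x - y) * x


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def mean_of_grads(w: float, batch: Batch) -> float:
    """Handle each sample separately, then average the per-sample gradients."""
    return mean([sample_grad(w, x, y) for x, y in batch])


def grad_of_mean_loss(w: float, batch: Batch, eps: float = 1e-6) -> float:
    """Numerical derivative (central difference) of the batch-averaged loss."""
    def mean_loss(weight: float) -> float:
        return mean([sample_loss(weight, x, y) for x, y in batch])
    return (mean_loss(w + eps) - mean_loss(w - eps)) / (2.0 * eps)


if __name__ == "__main__":
    batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]
    print(mean_of_grads(0.5, batch), grad_of_mean_loss(0.5, batch))
```

Both routes give the same number up to floating-point error, which is all that "the derivative of the average is the average of the derivatives" says.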

hardy storm Sep 20, 2023, 10:26 AM

#

Do we have a definitive answer on the whole "use a celebrity name while training or don't" question? Seems to be another one of those contentious mystery topics - like regularization images.

latent charm Sep 20, 2023, 10:34 AM

#

I don't

#

You could end up with funny results that mix the famous character's features into your training target.

astral island Sep 20, 2023, 11:03 AM

#

stiff dust you take the average of the loss and compute the derivative of the average (whic...

this means that it needs several averages across different epochs to get a derivative right? i'm not very good at math so i'm not sure

stiff dust Sep 20, 2023, 11:34 AM

#

astral island this means that it needs several averages across different epochs to get a deriv...

not sure what you mean...

#

the derivative is a weight change. For each parameter you get a number saying how to change the parameter. When you have a batch of 10 images, you would obtain 10 of these numbers and take the average of them

stiff dust Sep 20, 2023, 11:36 AM

#

hardy storm Do we have a definitive answer on the whole use a celebrity name while training ...

I get better results with random names instead of celebrity names

ripe sleet Sep 20, 2023, 2:37 PM

#

I wanted to experiment today with training a lora so I was wondering for people which tends to work better if anyone knows, locon or loha lycoris?

stiff dust Sep 20, 2023, 6:38 PM

#

I would simply use Lora

#

anything else is probably not necessary. Maybe it helps in rare cases for style training

sonic narwhal Sep 21, 2023, 10:56 AM

#

does anyone have a good comfyUI workflow for evaluating LoRAs?

sonic narwhal Sep 21, 2023, 12:27 PM

#

Also is it possible to set a setting in kohya so that it starts saving epochs after 40 epochs etc?

opal jacinth Sep 21, 2023, 12:45 PM

#

sonic narwhal Also is it possible to set a setting in kohya so that it starts saving epochs af...

that would be --save_every_n_epochs="1"

opal jacinth Sep 21, 2023, 12:46 PM

#

sonic narwhal does anyone have a good comfyUI workflow for evaluating LoRAs?

that's from Caith, it is a very basic workflow to load the base model and a lora

📎 simple_lora.json

sonic narwhal Sep 21, 2023, 1:09 PM

#

opal jacinth that's from Caith, it is a very basic workflow to load the base model and a lora

Thanks

sonic narwhal Sep 21, 2023, 1:10 PM

#

opal jacinth that would be ` --save_every_n_epochs="1"`

I didn't mean "save every n epochs" but "start saving every n epochs after x epochs"

opal jacinth Sep 21, 2023, 1:16 PM

#

sonic narwhal I didnt mean "save every n epoch" but "start saving every n epoch after x epoch"

ah, so skipping the first n epochs before starting to save. I'm not aware of that option, even though I would also find it helpful

latent charm Sep 21, 2023, 1:23 PM

#

Comfyui xyplot workflow using efficient nodes

📎 xylora.json

jade hornet Sep 21, 2023, 5:06 PM

#

sonic narwhal I didnt mean "save every n epoch" but "start saving every n epoch after x epoch"

Sure, but why not set it to save after every 40 epochs, and only run 40. Save the state, and then resume with save every 1
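
A different workaround, sketched here in Python rather than as a kohya option: save every epoch as usual and delete the early checkpoints afterwards. It assumes the per-epoch files end in a six-digit epoch number (for example `mylora-000040.safetensors`); check your own output folder before relying on that pattern. Both function names are hypothetical.

```python
import re
from typing import List, Optional

# Assumption: per-epoch checkpoints are named "<name>-<6-digit epoch>.safetensors".
_EPOCH_RE = re.compile(r"-(\d{6})\.safetensors$")


def epoch_of(filename: str) -> Optional[int]:
    """Return the epoch number encoded in the filename, or None if there is none."""
    match = _EPOCH_RE.search(filename)
    return int(match.group(1)) if match else None


def early_checkpoints(filenames: List[str], first_epoch_to_keep: int) -> List[str]:
    """List the checkpoints saved before `first_epoch_to_keep`, i.e. candidates for deletion."""
    result = []
    for name in filenames:
        epoch = epoch_of(name)
        if epoch is not None and epoch < first_epoch_to_keep:
            result.append(name)
    return result
```

Files without an epoch suffix, such as the final checkpoint, are never listed, so they are left alone.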

real citrus Sep 21, 2023, 9:23 PM

#

stiff dust the derivative is a weight change. For each parameter you get a number saying ho...

Ah, so is that why character training with high batch count results in a worse likeness? Because it's kind of taking an average of multiple images together.

stiff dust Sep 21, 2023, 9:23 PM

#

real citrus Ah, so is that why character training with high batch count results in a worse l...

I'm very sure that is a myth

#

it does not take the average of the images but the average of the weight changes

#

there is no rational reason why character training should work better with lower batch count

jade hornet Sep 21, 2023, 9:43 PM

#

well, there's some rationale: if you leave the learning rate and steps unchanged, they will certainly not yield the same results. So I suppose it's better to say that, when accounting for those variables, results should be similar

#

and then there's what happens if you use an auto-adjusting optimizer or plot the loss; there are things you have to account for when changing batch size

stiff dust Sep 21, 2023, 10:28 PM

#

I'm not saying that the results should be the same. Of course you have to adjust learning rate and step count, as you do anyway. But many guides claim that you should use batch size 1, because the network would get confused when it sees multiple images at the same time and then learns some blended image and stuff like that. THIS is total bullshit. You can achieve similar or even better results with a high batch size.

hardy storm Sep 21, 2023, 11:21 PM

#

Question about captioning. I've heard it said that you should caption what you don't want the model to remember. So, in other words, if training a person you know, who you want to put in different locations and clothing, then you should caption the details of the background and clothing. However, when watching captioning tutorials, people always start their captions off by saying the gender of the person (i.e. "a man" or "a woman"). Am I wrong, or does that go against the "caption what you don't want" philosophy? Because, if that were true, then captioning the sex of the person should make their gender fluid in the model. Would love some insight into this.

stone garden Sep 22, 2023, 12:16 AM

#

more just a general guide. Basically when you caption, especially if you have multiple images with 1 outfit it should know exactly what to do when the prompt is brought up (but only). I have things trained that are "genderfluid", but the face and body still apply to any prompt so I rarely see change. if you don't caption that and use the Lora/mini model, it might freely assume and you can never really "get what you want" if it's not trained specifically enough to do it (if you know how to prompt for an image similar to training, it's easier than guessing). for styles this doesn't matter because they're not specific but for people I'd think its important

#

generally it also varies by dataset. not really something that is for sure going to work, but should. ai can be weird

jade hornet Sep 22, 2023, 2:09 AM

#

hardy storm Question about captioning. I've heard it said that you should caption what you d...

so there's identifying, and there's describing. your scene contains a woman, but if you say a woman with blonde hair, that's describing the woman. And by doing so, you basically tell the AI about the hair and it wont try to learn it. This is very handy if you want her to have red hair sometimes. read this, I think very well written exploration of captions from a reddit post: https://www.reddit.com/r/StableDiffusion/comments/118spz6/captioning_datasets_for_training_purposes/

#

similarly with clothing, there was a guy in one of the discord rooms that had trained a character but always got the same outfit. his issue was not describing the outfit

#

I'll say this, to save you some grief, dont go overboard describing everything, if your dataset is diverse, the AI wont learn it easily. What I mean by that is, if all your images are in diverse settings, you dont necessarily have to say in a bedroom with a side table containing a lamp with a green shade...that's just silly. unless that lamp appears in several images

#

most of my captions are like... no more than 10 words

hardy storm Sep 22, 2023, 2:32 AM

#

jade hornet I'll say this, to save you some grief, dont go overboard describing everything, ...

This is fantastic. I've been struggling to find good info on this. Thank you kindly for the tips and link

neon kelp Sep 22, 2023, 5:21 PM

#

Hey there, I'm wondering if anyone knows how to train a stable diffusion model with a different language? Like Greek, spanish, japanese, etc?

wooden badger Sep 22, 2023, 11:41 PM

#

I have rtx 3090 and I can't do sdxl dreambooth training
The VRAM is full and it takes 12 GB more from shared RAM
I did SD 1.5 training with dreambooth just fine, 3000 steps in 10 minutes
Any help will be appreciated 👍

hot breach Sep 22, 2023, 11:57 PM

#

neon kelp Hey there, I'm wondering if anyone knows how to train a stable diffusion model w...

easiest way tbh would be to put a translator in front of SD, if you want it to run off native non-latin languages it would take a different text encoder and probably significant retraining or starting from scratch

#

the tokenizer/text encoder are really only setup for english

ruby pond Sep 24, 2023, 8:08 AM

#

I think I like the caption output from this setup the best

#

it creates a fairly long caption, with repeating but slightly different descriptions of the same thing, but avoids mentioning artists for the most part

#

e.g. 'two women talking on a couch in an office, sitting on a couch, sitting on couch, calmly conversing 8k, sitting on the couch, sitting on a sofa, sitting in a lounge, giving an interview, on a couch'

#

it has a weird behaviour though when it reads a word, it adds a bunch of captions that are related to that word, so make sure to check the captions for text or signs

minor heart Sep 24, 2023, 9:28 AM

#

sdxl_train.py: error: unrecognized arguments: --network_train_unet_only

minor heart Sep 24, 2023, 9:30 AM

#

minor heart sdxl_train.py: error: unrecognized arguments: --network_train_unet_only

I get this error when adding this parameter (network_train_unet_only) in advanced tab in dreambooth

#

what am I doing wrong here? Training consumes more than 24 GB of VRAM for SDXL

stiff dust Sep 24, 2023, 9:40 AM

#

the parameter does not exist for dreambooth training

#

you can use --stop_text_encoder_training=-1 instead - it should have the same effect

#

(in general: all parameters starting with --network are for loras)

full moat Sep 24, 2023, 9:42 AM

#

Hello, what do I need to change in Sd so that when I boot it up the neg prompt box has a "standard text prompt that I want"?

minor heart Sep 24, 2023, 9:45 AM

#

stiff dust you can use `--stop_text_encoder_training=-1` instead - it should have the same ...

I got this error when used it sdxl_train.py: error: unrecognized arguments: --stop_text_encoder_training=-1

minor heart Sep 24, 2023, 9:46 AM

#

stiff dust the parameter does not exist for dreambooth training

I tried with no parameters but it also consumes shared VRAM

#

what should i do to make it consume less vram ?

stiff dust Sep 24, 2023, 9:46 AM

#

okay, that's yet another class

#

but for sdxl_train.py you don't have to specify anything, the text encoder is not trained by default

minor heart Sep 24, 2023, 9:47 AM

#

stiff dust but for sdxl_train.py you don't have to specifying anything, the text encoder is...

okay

stiff dust Sep 24, 2023, 9:47 AM

#

you can try --cache_text_encoder_outputs --cache_latents though, maybe that helps with vram

minor heart Sep 24, 2023, 9:48 AM

#

stiff dust you can try `--cache_text_encoder_outputs --cache_latents` though, maybe that he...

I did try that, but same VRAM consumption

#

elder zealot Sep 24, 2023, 4:24 PM

#

Is it currently possible to use loras with sdxl img2img? While there is an existing inherited method for this, I'm having the same issues described here:
https://discuss.huggingface.co/t/how-to-use-lora-with-sdxl-img2img/55295

Hugging Face Forums

How to use lora with SDXL img2img?

I am trying to apply a lora to the SDXL refiner img2img pipeline. I’ve tried multiple sdxl loras that work with the base model and pipeline but when i try them with StableDiffusionXLImg2ImgPipeline and the refiner model it errors (I have set low_cpu_mem_usage=False and ignore_mismatched_sizes=True to no avail) StableDiffusionXLImg2ImgPipeline h...

real echo Sep 24, 2023, 5:53 PM

#

anyone have an sdxl kohya config for a person/face? I know I'll have to edit all the paths etc, I just want a complete training config to work from

latent charm Sep 24, 2023, 7:33 PM

#

elder zealot Is it currently possible to use loras with sdxl img2img? While there is an exist...

use loras with sdxl img2img? Yes. Apply lora to SDXL refiner? no

latent charm Sep 24, 2023, 7:33 PM

#

real echo anyone have an sdxl kohya config for a person/face? I know I'll have to edit al...

You could just start with a preset

elder zealot Sep 24, 2023, 7:45 PM

#

latent charm use loras with sdxl img2img? Yes. Apply lora to SDXL refiner? no

This is what I'm trying to do:
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
pipe.load_lora_weights(prj_path, weight_name="pytorch_lora_weights.safetensors")

The tensor shapes don't align, as described by other folks with the same error in the link I posted.

latent charm Sep 24, 2023, 7:51 PM

#

The answer is no, because no LoRA has been trained on the refiner, and the base is different from the refiner.

#

You could use lora with base but not refiner

elder zealot Sep 24, 2023, 8:47 PM

#

Thanks so much—that totally makes sense. Now that I'm running it with the base, I'm experiencing results that I didn't expect. I can sample from the lora-weighted base using the DiffusionPipeline and see images that resemble my training data. However, when I use the same lora weights with Img2Img, I'm not seeing images that resemble the training data, even when I bump up the strength. Any ideas about what might be causing that?

stiff dust Sep 24, 2023, 9:29 PM

#

there is no 0th timestep, so even if you set 100% denoise you start at step 1. If you have 20 steps in total, then 1/20 of the input image is still conserved. The early timesteps usually determine the composition of the image. So even with 100% denoise strength the rough shape of the image as well as the colors and brightness might be taken over from the input image. You can increase the number of steps to negate that (but of course, when you do img2img you usually want that the input image is bleeding into the resulting image)

#

anyways, the LoRA is applied in img2img the same way as in text2img, so if your images look a bit different, it is just because your input image affects the outcome no matter what your denoise strength is
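
As a rough illustration of how denoise strength relates to the number of steps that actually run, here is a sketch modelled on the step calculation used by diffusers' img2img pipelines. `img2img_steps` is a hypothetical helper, other UIs may round differently, and it does not model the "no 0th timestep" effect described above.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that actually run for a strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # The first (1 - strength) share of the schedule is skipped; the input image
    # is noised to the matching level and denoised from there.
    return min(int(num_inference_steps * strength), num_inference_steps)
```

With 20 steps and strength 0.5, only the last 10 steps run, so the composition set in the skipped early steps comes from the input image.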

elder zealot Sep 24, 2023, 9:41 PM

#

stiff dust there is no 0th timestep, so even if you set 100% denoise you start at step 1. I...

That all makes sense. However, when I do the same thing with 1.5, I see an unmistakable relationship with the fine-tuning data that increases dramatically as I bump the strength up. In that case, by the time I hit, say, 0.7 for the strength, the resulting image is very clearly highly conditioned by the fine-tuning data. But for some reason, this setup is behaving entirely differently. I'm sure there's something wrong with my workflow somewhere...

stiff dust Sep 24, 2023, 9:58 PM

#

do you use trigger words for your lora?

elder zealot Sep 24, 2023, 10:06 PM

#

stiff dust do you use trigger words for your lora?

yes

stiff dust Sep 24, 2023, 10:07 PM

#

oh, I haven't looked at your code

#

you are using the refiner model

#

loras trained on base do not work for refiner

#

you would have to train a separate lora on the refiner

#

refiner and base have different architectures and are not compatible with each other. Honestly, I would just skip the refiner. Base is already good enough, sometimes even better than refiner.

elder zealot Sep 24, 2023, 10:10 PM

#

Actually, I was using the refiner, but thanks to a helpful comment, I am now using the base, not the refiner (and getting the results I described to you)

stiff dust Sep 24, 2023, 10:12 PM

#

then I have no clue 🤷‍♂️ there is no reason why the lora should behave differently in img2img than in text2img

elder zealot Sep 24, 2023, 10:14 PM

#

stiff dust then I have no clue 🤷‍♂️ there is no reason why the lora should behave differen...

I'm going to gather up the workflow and see if I can get the relevant parts of the code concise, thanks for your input 🙂

zenith vine Sep 25, 2023, 1:39 AM

#

Hello! I feel like I'm going insane—need a sanity check.
I'm training a LoRA using kohya-dreambooth method in Colab. It literally worked six hours ago. I collected images, I used the notebook to caption them, I trained LoRA, everything worked swell. I try again now, it just plainly doesn't work. The training proceeds without errors, but the end result does not even try to capture the character. The activation word is not even recognized as a signal to try and make a character. Where should I look? What might I be doing wrong?

real echo Sep 25, 2023, 1:56 AM

#

typo?

zenith vine Sep 25, 2023, 1:57 AM

#

real echo typo?

Nope, I checked for typos and I tried two different datasets under two different names. Both fail.

zenith vine Sep 25, 2023, 3:34 PM

#

zenith vine Hello! I feel like I'm going insane—need a sanity check. I'm training a LoRA usi...

I actually came up with a solution on my own. Not sure why I haven't tried this earlier. I changed the optimizer from Adam8bit to just Adam and upped the learning rate 4x. As an ML person, I should have figured out sooner that it's just a learning rate issue. Everything works now. It's not perfect, but I'll play with LR some more to get the best result.

final delta Sep 25, 2023, 5:00 PM

#

Question, I am trying to tune Stability.AI to produce an "app icon" image... Instead it produces many icons... What information path am I looking for in order to "tune" this, or change it into a consistent image? I see here the parameters of the stabilityai python library I am using but... It does not seem to result in what I am looking for. Images attached. Not sure this is even the right place to ask, but pointing me in the right direction would be very useful!

stiff dust Sep 25, 2023, 6:34 PM

#

train a textual inversion or a text encoder lora on app icons

#

training data shouldn't be the problem as there are plenty of icons freely available

lone karma Sep 25, 2023, 9:57 PM

#

final delta Question, I am trying to tune Stability.AI to produce an "app icon" image... Ins...

what happens if you reduce your height and width to approx the size of an icon?

#

may be a bad idea, but its a thought

final delta Sep 25, 2023, 10:02 PM

#

stiff dust train a textual inversion or a text encoder lora on app icons

ok I can look into this... That is doable.

final delta Sep 25, 2023, 10:02 PM

#

lone karma what happens if you reduce your height and width to approx the size of an icon?

Tried that actually, a valid idea, but no, still gives inconsistent results.

elder zealot Sep 25, 2023, 10:37 PM

#

stiff dust then I have no clue 🤷‍♂️ there is no reason why the lora should behave differen...

got this working now, thanks for your help!

rain crag Sep 26, 2023, 11:49 AM

#

Hey there, I'm new to ML and SD so I've got some questions. I learned how to fine-tune an SD model using Dreambooth on a specific person. Now I want the person to wear some specific clothes. I learned on the internet that I can train a textual inversion embedding and combine these two solutions to get the result. But after training an embedding, generated images of any person wearing the clothes have very low quality, the face and body are deformed, and only very rarely do I get anything close to a more or less realistic person. Needless to say, when I try to generate an image of the fine-tuned person wearing the embedding clothes, it results in something awful.

Where could it have gone wrong? Maybe I should learn more about textual inversion and train it better?

normal ember Sep 26, 2023, 11:54 AM

#

final delta Question, I am trying to tune Stability.AI to produce an "app icon" image... Ins...

You might find something from this training on emojis. I think I’ve seen the dataset on huggingface. https://replicate.com/fofr/sdxl-emoji

fofr/sdxl-emoji – Replicate

An SDXL fine-tune based on Apple Emojis

stiff dust Sep 26, 2023, 12:25 PM

#

rain crag Hey there, I'm new in ML and SD so I've got some questions. I learned how to fin...

what do your training images of the clothing look like?
You can't control what aspects textual inversion is learning. If you are unlucky, it is not just learning color and shape of your clothing but also low quality of the image

#

besides that, it is always helpful to add enhancing tags to your prompt (ultrahigh quality, 30mm photo, raw photo, product photo, ...)

opaque rain Sep 26, 2023, 12:42 PM

#

hi! does stability.ai provide API for finetuning image model?

rain crag Sep 26, 2023, 12:55 PM

#

stiff dust how do your training images of the clothing look like? You can't control what as...

I just downloaded photos of a suit from an online shop. Just a woman in a suit with white background. The images are professional, have good quality and I'd say that it should be easy to learn on them.

BTW, maybe I just said it vaguely, but the generated pictures are ok, the clothing is close enough to the images, but the person and their face is deformed.

stiff dust Sep 26, 2023, 1:04 PM

#

then it's probably just a resolution problem. Take a look at SD upscaling

#

SD has problems getting fine details right if they are low resolution

#

https://github.com/Coyote-A/ultimate-upscale-for-automatic1111

rain crag Sep 26, 2023, 2:10 PM

#

stiff dust then it's probably just a resolution problem. Take a look at SD upscaling

don't we use upscalers on already generated images? so if the generated images have bad details it'll just upscale them?

#

Ok, I'll try to simplify the problem:

I generate a photo of a person - everything is fine
I connect my embedding of clothing and generate a photo of a person wearing the clothing - clothing is ok, person face and body are deformed

stiff dust Sep 26, 2023, 2:17 PM

#

maybe you show an example of the generated image?

#

if the face of the generated person is deformed then you can usually fix that by upscaling

rain crag Sep 26, 2023, 2:56 PM

#

stiff dust if the face of the generated person is deformed then you can usually fix that by...

based on Realistic Vision 5.1

#

almost the same prompt just without the embedding token

stiff dust Sep 26, 2023, 3:04 PM

#

yeah, okay, there is something strange with the embedding. Maybe it's overtrained

#

could also be that realistic vision does not work that well with embeddings

rain crag Sep 26, 2023, 3:05 PM

#

used embeddings from civitai, not perfect but way better

#

just noticed that it learned 2 poses from the photos, maybe overtrained indeed

#

maybe you could help me with captions for photos? I used these photos with just "photo of a woman wearing [keyword], white background". Maybe I should describe all the details?

normal ember Sep 26, 2023, 5:02 PM

#

--learning_rate 0.0004 has to be redundant if you do --unet_lr 0.0004 --network_train_unet_only no?

final delta Sep 26, 2023, 5:21 PM

#

normal ember You might find something from this training on emojis. I think I’ve seen the dat...

Right but where's the code for this, it does not do me any good if it is not for development purposes!

#

I can look at how this works but it seems like it is using some sort of custom diffuser... I guess I would be looking for information on how to create these.

#

I think I have found the right path of information.

normal ember Sep 26, 2023, 5:22 PM

#

I doubt that there's anything special about that LoRA except the dataset. I just can't find the dataset now. 😦

final delta Sep 26, 2023, 5:23 PM

#

Not even sure what a lora is at this point! Was just told to do this stuff by my job and here I am lol.

#

RnD weird.

normal ember Sep 26, 2023, 5:23 PM

#

But if I remember correctly it was just a plain simple raw dump of the Apple emojis.

final delta Sep 26, 2023, 5:23 PM

#

So do you know what software they are using to train the data? I am using openAI, Pinecone (and milvus for local), and langchain

#

do you know the Stable Diffusion equivalent?

#

Those are for text -> text, this would be for text -> image

#

Found this image. I will use it as a guide.
https://i.imgur.com/J8xXLLy.png

Imgur

I made a LoRA training guide! It's a colab version so anyone can use it regardless of how much VRAM their graphic card has!

normal ember Sep 26, 2023, 5:25 PM

#

Most are using kohya-ss or kohya_ss, depending on whether you want a GUI or not for training a LoRA.

final delta Sep 26, 2023, 5:25 PM

#

normal ember Most are using kohya-ss or kohya_ss depening if you want a gui or not for traini...

Thank you!

normal ember Sep 26, 2023, 5:26 PM

#

Non GUI: https://github.com/kohya-ss/sd-scripts/tree/sdxl
GUI which uses code base from above: https://github.com/bmaltais/kohya_ss

#

But you could probably also use replicate.com and their API to train it without having to have the hardware.

#

https://replicate.com/stability-ai/sdxl#training-inputs
https://replicate.com/blog/fine-tune-sdxl

woven delta Sep 26, 2023, 9:21 PM

#

Hey,
How should I approach training Lora for specific style of outfits? I was experimenting with object-like captioning but the results were underwhelming.

paper field Sep 27, 2023, 4:19 AM

#

Anyone know if it's practical to train a LoRA using irl images if I plan on using an anime SD model?

#

Specifically, I'm looking to train a LoRA for an irl dog

stiff dust Sep 27, 2023, 10:58 AM

#

I don't even know what "irl" means, but if you refer to "real life photos" or something then yes, that should work

#

when I train on photos of my face I can use the same model to create anime images of me

opal jacinth Sep 27, 2023, 1:00 PM

#

stiff dust when I train on photos of my face I can use the same model to create anime image...

what training settings are you currently using for your own LoRA?

stiff dust Sep 27, 2023, 1:01 PM

#

hm, a lot of custom stuff. But best results so far were with rare tokens, learning rate ~5e-4 unet only training, batch size 10, default noise offset

opal jacinth Sep 27, 2023, 1:04 PM

#

stiff dust hm, a lot of custom stuff. But best results so far were with rare tokens, learni...

and what optimizer? 🙂

stiff dust Sep 27, 2023, 1:04 PM

#

AdamW

#

I don't see any reason using something else

opal jacinth Sep 27, 2023, 1:07 PM

#

thx, I will give it a shot. It's still pretty wild out there with regards to best training settings and also contradictory information. I lately also had quite good results with only 4DIM 😄

stiff dust Sep 27, 2023, 1:16 PM

#

after so many tries I think there is no best setting

#

it depends on your training dataset and what you want to achieve

#

but yes, most models out there use WAY too high dims

normal ember Sep 27, 2023, 2:08 PM

#

@stiff dust I'm trying this concept of tagging. Do you think one should still use Jackie Chan person if you are not using instance and class but captions for each image? Like close-up shot of Jackie Chan person holding chopsticks or should one go with close-up shot of Jackie Chan holding chopsticks? https://arxiv.org/abs/2306.00926

arXiv.org

Inserting Anybody in Diffusion Models via Celeb Basis

Exquisite demand exists for customizing the pretrained large text-to-image model, $\textit{e.g.}$, Stable Diffusion, to generate innovative concepts, such as the users themselves. However, the newly-added concept from previous customization methods often shows weaker combination abilities than the original ones even given several images during t...

#

When trying this training I get a feeling I get best results if I also use Jackie Chan person when generating, but it's somewhat inconclusive.

manic loom Sep 27, 2023, 6:42 PM

#

I realise I didn't really form my comment as a question so I'll try again: Does anyone have any tips on how to consistently get braces in stable diffusion? Is there some models that's better (or even capable) of it than others? Even when I tried to train an embedding on a girl with braces in every shot it turned out without braces as result... I'm out of ideas lol!

woven delta Sep 29, 2023, 9:02 AM

#

manic loom I realise I didn't really form my comment as a question so I'll try again: Does ...

Have you seen this https://civitai.com/models/46690/braces-concept ?

braces concept - v2-e08 | Stable Diffusion LoRA | Civitai

braces my first attempt at an appearance lora; it works, ish. It heavily impacts style, but it works. This is trained at 768, I might retrain at 10...

manic loom Sep 29, 2023, 9:04 AM

#

Yeah, I tried that one but it affects the base model way too much. Trying to understand why braces are so hard to get right, it shouldn't be harder than jewellery or something but it is!

woven delta Sep 29, 2023, 9:06 AM

#

manic loom Yeah, i tried that one but it affect the base model way too much. Trying to unde...

Maybe just inpaint the teeth?

manic loom Sep 29, 2023, 9:07 AM

#

That is a good idea. Didnt really consider that, might be easier!

#

The funny part is that it seems SD understands what I want, since the girl always shows a lot of teeth when I add braces in the prompt, it just doesn't draw the braces... maybe it's trained on those fancy invisible braces!

final delta Sep 29, 2023, 4:27 PM

#

@normal ember Hey! I got pretty far, now the real work begins. Got this set up, and got the data set im going to use set up as well. Not sure what to do next!

stiff dust Sep 29, 2023, 5:41 PM

#

manic loom The funny part is that it seems SD understand what i want, since the girl always...

maybe it's a resolution problem (as so often). Try to upscale the image and only repaint the teeth

gloomy sierra Sep 30, 2023, 2:40 AM

#

I have constant issues with photographic LoRAs "unflattening" my 2D illustration base model outputs -

Has anyone experimented with doing a (for lack of a better word) "2-pass" training flow roughly similar to:

train the lora on photos
img2img the original photos at some appropriate denoising rate using illustration prompts and illustration base model
generate additional images with the lora + illustration model
use these "flattened" 2D source images to train a second lora

Just want to make sure this isn't a known dead-end before I spend much time on it (or if there are any tweaks to the above that would make sense)

latent charm Sep 30, 2023, 5:15 AM

#

Usually people don't use images generated by the same model to train the model. The "error" in the generated images would get learned by the model and make the model collapse.

#

But you could try

real citrus Sep 30, 2023, 9:25 AM

#

Does anyone know how kohya (gui or ss) selects images for a batch?
For example, if I'm training on 5 images - with the same resolution - and a batch size of 4 presumably the first batch will be of images 1,2,3,4 but what about the second batch? Would it be images 5,1,2,3 or images 5,5,5,5 or something more obscure?

Is there a way to get kohya to log what images are actually being used in a batch? That would help 🙂

final delta Sep 30, 2023, 3:44 PM

#

@real citrus Sent you a PM!

broken hemlock Sep 30, 2023, 7:57 PM

#

bit of a technical question - I've been fine-tuning SDXL (not dreambooth, not lora). It's slowly getting somewhere, but I always seem to have garbled outputs when I zoom closely. I recently discovered that I'm training on the first base model, which had a sub-optimal VAE. Not 100% sure that's the problem, but this is why I'm wondering if the VAE is involved at all when fine-tuning? If not, I can just use the updated one at inference time and need to find the cause elsewhere.

ocean cape Oct 1, 2023, 12:23 AM

#

Just use Kohya-ss as I think the issue is that your training script is too old @mental anchor

edgy wharf Oct 1, 2023, 10:14 AM

#

Does anyone here have experience fine-tuning a LoRA for inpainting?

woven delta Oct 1, 2023, 9:32 PM

#

edgy wharf Is anyone here have experience to fine tune lora for inpaint?

Working on this rn. Afaik there is no way to train a separate inpainting Lora at the moment.
https://github.com/kohya-ss/sd-scripts/issues/502

GitHub

Feature Request: Training Lora on inpainting models. · Issue #502 ·...

As a developer, one of the challenges I've encountered is that lora networks, when trained on standard models, tends to underperform when used with inpainting models. The possible solution: All...

ripe scaffold Oct 2, 2023, 7:00 AM

#

How much of the tertiary model is merged into the final result in the A1111 WebUI?

#

The formatting has changed, and many tutorials and videos are outdated, any easy to understand updated guide?

#

Discard weights with matching name? How can I use that?

#

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#checkpoint-merger
The webui wiki is outdated too so no idea where to gather this info

astral island Oct 2, 2023, 11:22 AM

#

i have a pretty weird question. bear with me for a sec.

#

say i have a dataset of a single image, with a prompt called "PROMPT A". I then put it into the training script of my choice and run a single iteration of Dreambooth training, during which it finds the output to have a loss of 0.345.

#

Then I modify that training script to not require any image in my dataset whatsoever, but only a prompt. I then put my original model through a single iteration of Dreambooth training, which uses "PROMPT A" to generate an image (that would have been used for loss calculation), after which I manually input a loss of 0.345 (exactly the same as the previous example) and do backpropagation with it.

#

my question is: all else being equal (the class name, the instance name, etc.), would the resulting model from both examples be identical?

lethal lily Oct 2, 2023, 2:39 PM

#

Hi, I'm trying to create an embedding / textual inversion for the style of an artist, I'm using kohya and the learning doesn't seem to work very well. Can someone help me ?
I'm using this tutorial, but it's not about a style it's about a specific person, so I'm not sure if I should change some settings - I did, but I don't know if it's good. And considering that it takes more than a day to complete an epoch of training, testing blindly does not really work in my case.
https://civitai.com/articles/618/tutorial-kohya-ss-dreambooth-ti-textual-inversion-embedding-creation

viral geyser Oct 2, 2023, 3:41 PM

#

Does anyone here train on a 4070 Ti with 12GB? And how long does it take for you to finetune a model

viral geyser Oct 2, 2023, 5:53 PM

#

Like I really don't get it

#

it takes 31 seconds for 1 step

#

Is 12GB that bad?

normal ember Oct 2, 2023, 6:03 PM

#

viral geyser Does anyone here train on a 4070 Ti with 12GB? And how long does it take for you...

Full model or LoRA?

#

From what I know you need like 16G to just load the model

#

Possibly some tricks here https://twitter.com/kohya_tech/status/1672826710432284673

Kohya Tech (@kohya_tech)

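For SDXL LoRA training, if you cache the Text Encoder outputs, it looks like 12GB of VRAM is enough for batch size 1 and up to a rank (dim) 128 C3Lier (LoCon) (with a lower rank there is a fair amount of headroom). Without caching, it seems 16GB is needed.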

sullen locust Oct 2, 2023, 6:26 PM

#

Hello , everyone I'm getting this error while generating images:
Runtime : m1 and m2 have same dtype

stone garden Oct 2, 2023, 6:29 PM

#

i have a lora but it only really works at 1.4, not 1. just adam8bit. what do i increase to push it earlier? lr: 0.0003, weight decay 0.1, cosine with 5 restarts. testing and changing individual settings barely does anything so its probably a combination of both

#

its got 5000 steps so that shouldnt be the problem...

viral geyser Oct 2, 2023, 6:40 PM

#

normal ember Full model or LoRA?

It’s a SDXL Lora so I suppose a full model. But I am quite new to teaching LoRA

viral geyser Oct 2, 2023, 6:41 PM

#

normal ember From what I know you need like 16G to just load the model

16gb for training is quite insane ngl😅

#

I mean

#

It’s working now

#

But at this rate it will be done in 2 days

normal ember Oct 2, 2023, 6:45 PM

#

viral geyser It’s a SDXL Lora so I suppose a full model. But I am quite new to teaching LoRA

Ok, 12G for LoRA might be enough but you probably need to pull all the tricks there are to not go over 12G and spill into RAM.

viral geyser Oct 2, 2023, 6:49 PM

#

normal ember Ok, 12G for LoRA might be enough but you probably need to pull all the tricks th...

You got any tips on where to find these tricks?

normal ember Oct 2, 2023, 6:51 PM

#

train unet only, gradient checkpointing, xformers, cache latents to disk, small batch size, not too large network dim
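
For reference, the tricks listed above map roughly onto sd-scripts command-line options. This is only a sketch, assuming the `sdxl_train_network.py` script from the SDXL branch linked earlier; the paths, the default dim of 16 and the helper name are made up for illustration, and the command is incomplete (no output dir, learning rate or step count). The option names are the ones I believe sd-scripts uses, so check `--help` before relying on them.

```python
import shlex

# Hypothetical paths: replace with your own model and dataset locations.
BASE_MODEL = "/path/to/sd_xl_base_1.0.safetensors"
TRAIN_DIR = "/path/to/train_images"


def build_low_vram_command(network_dim=16, batch_size=1):
    """Assemble a (partial) sd-scripts LoRA command using the memory-saving tricks."""
    return [
        "accelerate", "launch", "sdxl_train_network.py",
        f"--pretrained_model_name_or_path={BASE_MODEL}",
        f"--train_data_dir={TRAIN_DIR}",
        "--network_module=networks.lora",
        "--network_train_unet_only",      # train unet only
        "--gradient_checkpointing",       # trade compute for VRAM
        "--xformers",                     # memory-efficient attention
        "--cache_latents",                # encode images with the VAE once...
        "--cache_latents_to_disk",        # ...and keep the latents on disk
        "--cache_text_encoder_outputs",   # the caching trick from the tweet above
        f"--train_batch_size={batch_size}",   # small batch size
        f"--network_dim={network_dim}",       # not too large network dim
    ]


cmd = build_low_vram_command()
print(shlex.join(cmd))
```

Swapping the optimizer would be one more `--optimizer_type=...` entry in the list; whether Adafactor actually helps at 12G is left open here, as in the message below.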

#

possibly use optimizer adafactor instead of adamw but I'm not fully sure, maybe someone else knows better

#

check reddit too, I'm sure there are many who want to train LoRA on 12G

viral geyser Oct 2, 2023, 7:00 PM

#

I will check, thanks for the tips!

lethal lily Oct 2, 2023, 7:05 PM

#

Hi, I'm trying to create an embedding / textual inversion for the style of an artist, I'm using kohya and the learning doesn't seem to work very well. Can someone help me ?
I'm using this tutorial, but it's not about a style it's about a specific person, so I'm not sure if I should change some settings - I did, but I don't know if it's good. And considering that it takes more than a day to complete an epoch of training, testing blindly does not really work in my case.
https://civitai.com/articles/618/tutorial-kohya-ss-dreambooth-ti-textual-inversion-embedding-creation

gentle flame Oct 2, 2023, 11:58 PM

#

ripe scaffold https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#checkpoint...

afaik it just prunes the model

weary field Oct 3, 2023, 4:20 AM

#

Who is the best LoRa creator here? I want to create children’s books and have LoRas for multiple characters to be able to create books at scale.

Anyone have experience doing something like this?

oak coral Oct 3, 2023, 1:24 PM

#

Could someone help me complete a render that was too closely zoomed in and ControlNet refused to work with the Checkpoint (ZavyChromax v12) I created it with? Disclaimer: It's NSFW but not overtly so.

opal jacinth Oct 3, 2023, 5:45 PM

#

Any idea why my LoRA came out really bad if I use raw images from my mobile phone with resolution 3024x4032? Kohya didn't log any errors. Trained with training resolution 1024x1024 and the results were really bad.

But after cropping those to 1024x1024 and training with same parameters, the results were excellent.

I usually train with random cropped images and never had issues, it must be the high resolution? Dunno, it's super strange for me.

bitter merlin Oct 3, 2023, 7:00 PM

#

Question about Lora training:
A friend of mine was suggesting to train different spots.. "in between" so you can find which one is the best and not under trained and over trained.

hollow valley Oct 4, 2023, 12:46 AM

#

just wondering if anyone has worked out how to train a person who has a lot of tattoos properly?

gentle flame Oct 4, 2023, 2:38 AM

#

opal jacinth Any idea why my LoRA came out really bad if I use raw images from my mobile phon...

random cropped images is why

#

I don't really know anyone that uses it anymore

#

since bucketing exists now
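
Bucketing, as mentioned above, means each image is resized to a resolution that matches its own aspect ratio instead of being cropped to a square. A rough sketch of the idea only, not kohya's actual implementation; the function name, the 64-pixel step and the 1024x1024 pixel budget are assumptions for illustration.

```python
import math


def pick_bucket(width, height, max_area=1024 * 1024, step=64):
    """Pick a training resolution that roughly keeps the aspect ratio,
    stays within max_area pixels, and has both sides divisible by step."""
    aspect = width / height
    # Largest width that fits the pixel budget at this aspect ratio,
    # rounded down to a multiple of step.
    bucket_w = max(step, int(math.sqrt(max_area * aspect) // step) * step)
    # Matching height, also rounded down to a multiple of step.
    bucket_h = max(step, int((bucket_w / aspect) // step) * step)
    return bucket_w, bucket_h


# A 3024x4032 phone photo would be scaled down to a portrait bucket
# rather than being cropped to 1024x1024.
print(pick_bucket(3024, 4032))
print(pick_bucket(1024, 1024))
```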

latent charm Oct 4, 2023, 5:06 AM

#

I want to make a VAE with 16 latent channels for testing. How could I start?

opal jacinth Oct 4, 2023, 7:30 AM

#

gentle flame random cropped images is why

but I have issues with uncropped raw images from my mobile phone with resolution 3024x4032... that's why I'm confused. There is only little resemblance. But if I crop them to 1024x1024 I get decent results

stark tinsel Oct 4, 2023, 10:11 AM

#

Hi, I'm trying to make a Lora that generates variations of cute animals faces? I've managed to train a Lora to generate the same watercolor texture and style, but the animals faces are always the same, always the same bunny from sdxl base or other models. Is it possible to get this kind of variation just with Loras? Or would I have to train a checkpoint for this?

#

I would like to get variations like these ones
Different animals faces

normal ember Oct 4, 2023, 1:15 PM

#

@stiff dust Tested this?
`--v_pred_like_loss ratio option is added. This option adds the loss like v-prediction loss in SDXL training. 0.1 means that the loss is added 10% of the v-prediction loss. The default value is None (disabled).

In v-prediction, the loss is higher in the early timesteps (near the noise). This option can be used to increase the loss in the early timesteps.`

stiff dust Oct 4, 2023, 1:18 PM

#

no, but it sounds strange. if you want to increase the loss in earlier timesteps, either use min_snr_gamma or use min and max timesteps. v-pred is a quite different objective and you usually need hundreds of thousands of training steps to adapt a model to v-pred
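
To make the min_snr_gamma part concrete, here is a small sketch of Min-SNR loss weighting for a noise-prediction model, assuming the usual formulation weight = min(SNR, gamma) / SNR. The plain linear beta schedule is an assumption chosen for simplicity (SD itself uses a different schedule), and the function name is made up.

```python
import numpy as np


def min_snr_weights(alphas_cumprod, gamma=5.0):
    """Per-timestep loss weights: min(SNR, gamma) / SNR."""
    snr = alphas_cumprod / (1.0 - alphas_cumprod)  # signal-to-noise ratio
    return np.minimum(snr, gamma) / snr


# Assumed toy schedule: 1000 timesteps, linear betas.
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

weights = min_snr_weights(alphas_cumprod, gamma=5.0)
print("weight at the least noisy timestep:", weights[0])
print("weight at the noisiest timestep:", weights[-1])
```

Timesteps with very little noise have a huge SNR and get scaled down, while noisy timesteps keep weight 1, so relatively more of the training signal comes from the noisy end.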

stiff dust Oct 4, 2023, 1:18 PM

#

stark tinsel Hi, I'm trying to make a Lora that generates variations of cute animals faces? I...

I found that the text encoder is very quickly overfitting on your training data. If you have enough sample images, try to train the unet only.

stark tinsel Oct 4, 2023, 1:23 PM

#

stiff dust I found that the text encoder is very quickly overfitting on your training data....

Thank you! I'll try to train unet only and see the results

normal ember Oct 4, 2023, 1:31 PM

#

stiff dust I found that the text encoder is very quickly overfitting on your training data....

Ok, maybe it's just for full training. It does not say.

wanton yoke Oct 4, 2023, 7:18 PM

#

we just published this article about using the new ChatGPT multi modal to help improve and accelerate captioning. Sharing this to this great community https://civitai.com/articles/2436

ChatGPT-4 Multi Modal for Captioning Datasets for Stable Diffusion ...

There are many discussions about the importance and pain that comes from captioning images for Stable Diffusion models. All of us who fined tuned m...

normal ember Oct 4, 2023, 8:08 PM

#

wanton yoke we just published this article about using the new ChatGPT multi modal to help i...

When you can use the API it will get even more useful. Not sure about the cost though.

normal ember Oct 4, 2023, 8:23 PM

#

https://github.com/kohya-ss/sd-scripts/issues/855

GitHub

Text encoder of SD 1.5 model is not trained which is not supposed t...

Here the executed command accelerate launch --num_cpu_threads_per_process=2 "./train_db.py" --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion...

stiff dust Oct 4, 2023, 9:00 PM

#

good that I haven't done a git pull for many weeks ^^°

normal ember Oct 4, 2023, 9:10 PM

#

I train unet only anyway 😄

astral island Oct 5, 2023, 4:43 AM

#

hey can anyone help me with a question?

astral island Oct 5, 2023, 4:43 AM

#

astral island my question is: all else being equal (the class name, the instance name, etc.), ...

this one

stiff dust Oct 5, 2023, 9:30 AM

#

astral island my question is: all else being equal (the class name, the instance name, etc.), ...

no, that doesn't make any sense at all

#

in general: SD inference and SD training are two very different things. What you do when SD generates images is VERY different from what you do when you train SD

edgy wharf Oct 5, 2023, 12:20 PM

#

woven delta Working on this rn. Afaik there is no way to train a separate inpainting Lora at...

Do you know any alternatives?

astral island Oct 5, 2023, 3:18 PM

#

stiff dust in general: SD inference and SD training are two very different things. What you...

can you explain in more details?

#

you mean during training SD doesn't generate an image so that it can calculate an L2 loss with the ground truth image in the dataset?

stiff dust Oct 5, 2023, 3:56 PM

#

yes, during training you don't generate images, you denoise images

#

or better: you use an image, add noise to it, then predict the noise on the image (the difference between the noise you added and the noise predicted is your l2 loss)
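
The training step described above can be sketched in a few lines of NumPy. This is a toy illustration, not real SD code: the real thing works on VAE latents and the unet also gets the timestep and the prompt as inputs. The function name, the two stand-in "models" and the 0.6 / 0.8 mixing factors are all made up.

```python
import numpy as np


def noise_prediction_loss(image, predict_noise, signal_scale, noise_scale, rng):
    """One training step's loss: add noise to the image, predict it, compare."""
    noise = rng.standard_normal(image.shape)               # Gaussian noise
    noisy_image = signal_scale * image + noise_scale * noise
    predicted_noise = predict_noise(noisy_image)            # stand-in for the unet
    return float(np.mean((predicted_noise - noise) ** 2))   # L2 (MSE) loss


rng = np.random.default_rng(0)
image = np.zeros((8, 8))  # toy "image" of all zeros


def zero_model(x):
    """Hypothetical bad model: always predicts 'no noise'."""
    return np.zeros_like(x)


def oracle_model(x):
    """Hypothetical perfect model; exact only because the toy image is all zeros."""
    return x / 0.8


loss_zero = noise_prediction_loss(image, zero_model, 0.6, 0.8, rng)
loss_oracle = noise_prediction_loss(image, oracle_model, 0.6, 0.8, rng)
print(loss_zero, loss_oracle)
```

Note that no image is ever generated here: the loss only compares the added noise with the predicted noise.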

#

if you generate an image "from scratch" (at inference) things are different. You start with pure noise and then denoise it. The problem itself is ill-posed here. If I use a pure noise image and ask for predicting the noise, the answer is trivial (everything is noise). You can't really use any meaningful loss here

#

the reason it still works at inference is that SD does not know that the image is pure noise and tries to find any familiar patterns in the noise. Similar to how humans can look into the sky and see faces in the clouds

#

also things like CFG are inference-only, they don't exist at training time

#

also training is always only one step while in inference you use many steps to generate the image

#

and many other things. It's just that training and inference are two different things in SD. It's not like in most other machine learning problems, where inference and training are more or less the same. Here, you basically solve two different tasks.

astral island Oct 6, 2023, 12:04 AM

#

@stiff dust Thanks for the explanation. I keep looking for some 3blue1brown style deepdive on the training process but there's none.

#

About the noise adding process, is it adding noise until the whole image is completely noise? Or just a little bit of it?

#

And what exactly is the "predicted noise"? Is that yet another separate process?

#

Or maybe you're saying we denoise the image similar to img2img? And you calculate the difference only on the pixels changed by the noise adding and denoising process?

brittle ridge Oct 6, 2023, 1:03 AM

#

Prompt for Steampunk Batman in Victorian London (Inspired by Dishonored)

Visualize Batman in a steampunk attire set against a Victorian-era London street:

Batman's Attire: Dark leather combined with bronze elements. Instead of the traditional Bat-logo, imagine a bat-shaped gear centered on his chest. His utility belt would be a series of leather pouches adorned with copper rivets and dangling chains. His eyes would glow behind amber-colored aviator goggles.

Victorian London Street: Wet cobblestones from a recent rain, with a gentle mist rising from the sewers. Gas-lit lampposts casting a golden glow, creating dancing shadows. Tall red-bricked buildings with smoking chimneys lining the street, and Victorians in period attire casting furtive glances at the Dark Knight.

Background Elements: A blimp displaying a large bat insignia lights up the night sky of London, serving as a steampunk Bat-signal.

Capture the atmosphere and tones reminiscent of the game "Dishonored", blending the old with the futuristic in a unique manner.

stiff dust Oct 6, 2023, 8:51 AM

#

astral island About the noise adding process, is it adding noise until the whole image is comp...

you basically mix the original image with a random noise image. 0th timestep would be 100% noise, 0% image. But you skip that step. You train randomly by drawing a number between 1 and 999. If you, for example, draw the number 200, then you use 20% image and 80% noise. At least for a linear schedule. In practice, other scheduling schemes are in use. What the unet predicts is the noise image (quite unintuitive, as you are interested in the image, not the noise, but you get the image after subtracting the noise)
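
A sketch of the mixing step with a non-linear mix, assuming the common DDPM formulation x_t = sqrt(abar_t) * image + sqrt(1 - abar_t) * noise. The linear beta schedule and the toy 4x4 "image" are assumptions for illustration. Note that this sketch follows the usual DDPM numbering, where a larger t means more noise, which runs the opposite way to the numbering used in the message above.

```python
import numpy as np

NUM_TIMESTEPS = 1000

# Assumed schedule: linear betas, cumulative product gives the signal fraction.
betas = np.linspace(1e-4, 0.02, NUM_TIMESTEPS)
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise(image, noise, t):
    """Mix image and noise for timestep t (larger t = more noise)."""
    signal_scale = np.sqrt(alphas_cumprod[t])
    noise_scale = np.sqrt(1.0 - alphas_cumprod[t])
    return signal_scale * image + noise_scale * noise


rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(4, 4))   # toy "image"
noise = rng.standard_normal(image.shape)      # just Gaussian random noise
t = int(rng.integers(0, NUM_TIMESTEPS))       # random timestep for this step
noisy_image = add_noise(image, noise, t)

print("timestep:", t, "signal fraction:", np.sqrt(alphas_cumprod[t]))
```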

astral island Oct 6, 2023, 9:25 AM

#

stiff dust you basically mix the original image with a random noise image. 0th timestep wou...

ok, but how do you generate the noise? just the random algo that gives you the starting noise during inference?

stiff dust Oct 6, 2023, 9:25 AM

#

astral island ok, but how do you generate the noise? just the random algo that gives you the s...

yes, just gaussian random noise

astral island Oct 6, 2023, 9:28 AM

#

@stiff dust when you say the unet predicts the noise image, do you mean it generates a noise image using a prompt?

stiff dust Oct 6, 2023, 9:59 AM

#

astral island @stiff dust when you say the unet predicts the noise image, do you mea...

no. It gets a mixture of noise and image as input and has to predict the noise part of the image. The prompt is an additional input/conditioning

astral island Oct 6, 2023, 11:21 AM

#

stiff dust no. It gets a mixture of noise and image as input and has to predict the noise p...

when the unet predicts which part of an image is noise, is it simply saying which part of the image it thinks "looks wrong"?

#

or is it something more complicated?

woven delta Oct 6, 2023, 12:18 PM

#

edgy wharf Do you know any alternatives?

You can just train a Lora using base model and use it for inpainting.

edgy wharf Oct 6, 2023, 1:26 PM

#

woven delta You can just train a Lora using base model and use it for inpainting.

I have already done that but results was not good. #🤝｜tech-support message

tall condor Oct 7, 2023, 9:48 PM

#

hey guys, can someone recommend a windows tool to auto crop faces?

unique cloak Oct 7, 2023, 10:00 PM

#

tall condor hey guys, can someone recommend a windows tool to auto crop faces?

I never used it myself, but this one seemed popular amongst contacts I had. https://github.com/leblancfg/autocrop

tall condor Oct 7, 2023, 11:20 PM

#

thx

latent charm Oct 9, 2023, 8:37 AM

#

What does it mean If I fine tune the model without caption?

normal ember Oct 9, 2023, 9:32 AM

#

Wouldn’t it train as responding to empty prompt?

stone garden Oct 9, 2023, 1:31 PM

#

It will train on everything the images are, i.e, taking everything in the image so it will train based on the characters, backgrounds, and even the shadows or stuff as it will really have no idea on what exactly to replicate or what the loss is trying to find out

latent charm Oct 9, 2023, 1:32 PM

#

normal ember Wouldn’t it train as responding to empty prompt?

I think yes but would it improve the overall quality in selected images via training?

latent charm Oct 9, 2023, 1:35 PM

#

stone garden It will train on everything the images are, i.e, taking everything in the image ...

I tried this, but I had an issue with this type of training. If tag a exists in image A and tag a also exists in image B, then tag a is learned from both images and their looks get mixed.

stone garden Oct 10, 2023, 5:00 AM

#

latent charm I had tried this but I had issue with this type of training. If I have tag a exi...

I mean yea, but tag a should correspond to something- for example if tag a is 'forest environment' then it should correspond to the forests in both image a and b

#

Using tag a on an image without a forest will of course result in it being weird, for example if you use tag a on an actual forest but then on a desert, it will force a mix by the loss optimiser as to match both images

latent charm Oct 10, 2023, 5:02 AM

#

Something like that.

#

I use wd1.4 tagger

#

Some tag exists in multiple images. It has the mix issue further

stone garden Oct 10, 2023, 5:04 AM

#

You gotta do some manual tag checking for that tho- or change the training setting so it doesn't take your tags seriously

#

I also used wd1.4 and ngl it's wayyy easier to use it but I am still gonna stick with blip captions- They are just better for sdxl

stone garden Oct 10, 2023, 2:19 PM

#

I used around 120 images to train a LoRa with myself, but it usually only learns my ears (distinctively sticks out) and hair, but my face details usually get lost. (And many times it has artifacts.)

I'd love to use it both with graphic/real ckpts, not sure if possible.

My base model is RealisticVision 2.0, as that got me the "best" results so far.

Ngl I feel like it's a GIGO problem.

What kind of dataset should I provide? I know it has to be "varied", but for example when I include too many "different expression, but looking away" that pose gets overtrained(?), and appears too frequently (even when tagged).

I read a few guides, but the dataset part is always very vague, I'd be thankful for some examples.

(I always crop to 1:1, and remove the bg)

latent charm Oct 10, 2023, 2:23 PM

#

You could use 10 face focus images to get your face lora

stone garden Oct 10, 2023, 2:25 PM

#

latent charm You could use 10 face focus images to get your face lora

now that you say "face lora", I realised that the full-body shots are useless (and probably hurting my results) as I can always tell SD what body to draw (so added flexibility)

thanks, I'll retry like that

stone garden Oct 10, 2023, 2:25 PM

#

latent charm You could use 10 face focus images to get your face lora

should I do diff. expressions, angles?

latent charm Oct 10, 2023, 2:27 PM

#

yes, you could train with different angle face images

stiff dust Oct 10, 2023, 5:09 PM

#

stone garden I used around 120 images to train a LoRa with myself, but it usually only learns...

hi, first of all: what you describe might be a side effect of how CFG works at inference. During training, there is no CFG. But when you generate images, you always use CFG (often a default value of 7 or even higher). CFG is often described as "how strongly your prompt influences your image". But you can also think of it as an enhancer. When you generate a face of yourself, the CFG takes what distinguishes your face from the average face and adds this to the image. So the CFG exaggerates your facial features. Anything that makes your face special will be amplified by the CFG. So when you have the feeling that your images have strange artefacts, the first thing to try is always to decrease the CFG value. This is particularly useful when you want photorealism. Try CFG 4 and check if the images look better
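
A minimal sketch of the "enhancer" view of CFG described above, assuming the standard classifier-free guidance formula (start from the unconditional prediction and move `scale` times along the difference to the conditional one). The function name `apply_cfg` and the toy numbers are illustrative only and stand in for the two UNet outputs.

```python
def apply_cfg(uncond, cond, scale):
    """Combine unconditional and conditional noise predictions.

    Standard classifier-free guidance: take the unconditional prediction
    and add `scale` times the difference (cond - uncond).
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values standing in for two UNet outputs (hypothetical numbers).
uncond = [0.0, 1.0, 2.0]   # "average" prediction without a caption
cond = [0.5, 1.0, 1.0]     # prediction with the caption

low = apply_cfg(uncond, cond, 4.0)   # milder exaggeration
high = apply_cfg(uncond, cond, 7.0)  # stronger exaggeration

# Entries where cond and uncond agree stay put; entries where they
# differ are pushed further apart as the scale grows.
print(low, high)
```

With a scale of 1 the result is just the conditional prediction, so anything above 1 amplifies whatever separates the captioned result from the uncaptioned one, which is why lowering the value tones down exaggerated features.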

#

besides that, I don't think that full body shots are useless.

#

However, you are using SD 1.5, right? I don't have much experience with 1.5 training, but I found that it is quite vulnerable if you train it on too much variety (or too many images in general). I have to say I always struggled training my face on 1.5, but the best results in 1.5 I got with very few, high-quality images. I got MUCH better results when training on SDXL, though, and in SDXL I could use as many images as I wanted and results got better rather than worse

normal ember Oct 10, 2023, 6:00 PM

#

Not related to this, but I've tried many lr_schedulers and optimizers. I seem to have better control over overcooking when using a cosine scheduler with some warmup along with AdamW.

#

Also, an alpha of about half of dim seems to be working nicely.

#

warmup is 100 steps which is about 6-7% of my total steps
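
A rough sketch of what a cosine schedule with linear warmup looks like, for readers who want to see the shape. The function name `lr_at_step`, the total of 1500 steps (picked so that 100 warmup steps is about 6-7%), and the base learning rate of 4e-4 are assumptions for illustration, not settings confirmed in this thread.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr):
    """Linear warmup from 0 to base_lr, then cosine decay down to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run: 1500 steps in total, 100 of them warmup.
for step in (0, 50, 100, 800, 1500):
    print(step, lr_at_step(step, 1500, 100, 4e-4))
```

The learning rate peaks right after warmup and then shrinks smoothly, so late steps make smaller changes, which may be part of why this combination feels easier to control.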

#

Prodigy worked, but the movement between epochs was too big, so it was hard to select a good sample.

stone garden Oct 11, 2023, 5:39 PM

#

stiff dust hi, first of all: what you describe might be a side effect of how CFG works in i...

I've a lot more testing to do, but lowering CFG to 4 gave instantly better results. Thanks for the tip!

latent tiger Oct 11, 2023, 8:55 PM

#

hey! I have a somewhat strange question, I think.
So I'm wondering, is it possible to train a LoRA in parts?
(train it one time on the first part of a data set, then train it in another run on the second part, and so on)
Would this be possible? I'm asking as my Google Colab time is limited and I have a large data set.

jade hornet Oct 12, 2023, 12:37 AM

#

latent tiger hey! I have a some what strange question I think. So I'm wondering is it possibl...

Absolutely yes, I do that all the time. It's like cooking really. The trick is to figure out which concepts train fast, or from which point you want to resume... Maybe take some concepts out so you can focus on others, it gets crazy, but it's doable

jade hornet Oct 12, 2023, 12:45 AM

#

latent tiger hey! I have a some what strange question I think. So I'm wondering is it possibl...

Or even if it's just one concept, save in step increments, and use the last safetensors file to start the next batch

latent tiger Oct 12, 2023, 5:11 AM

#

jade hornet Or even if it's just one concept, save in step increments, and use the last safe...

Oo that's amazing news 😁, thank you! I'm going to try it!

sonic narwhal Oct 12, 2023, 11:26 AM

#

What do you think the training recipe for those "world morph" models on civitai is? As in data set and training

#

how big and how much variation in dataset for "world morph"?

normal ember Oct 12, 2023, 11:41 AM

#

sonic narwhal What do you think the training recipe for those "world morph" models on civitai ...

There could be metadata in the model that could give clues. Some remove it, but quite commonly it’s still present.

sonic narwhal Oct 12, 2023, 1:55 PM

#

normal ember There could be metadata in the model that could give clues. Some removes it, but...

https://github.com/by321/safetensors_util

Using this?

normal ember Oct 12, 2023, 3:56 PM

#

sonic narwhal https://github.com/by321/safetensors_util Using this?

Yeah, that would probably work. Simple file viewer would do too.

#

It's the __metadata__ entry you want to have a look at.
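
A small sketch of reading that entry without any extra tool, assuming the standard safetensors layout: the file starts with an 8-byte little-endian length, followed by a JSON header of that length, and `__metadata__` is an optional string-to-string map inside that header. The function name is made up for this example, and the `ss_` key mentioned in the comment is only what some trainers tend to write, so treat it as an assumption.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the __metadata__ dict of a .safetensors file ({} if absent)."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit size of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


# Usage (hypothetical file name):
#   meta = read_safetensors_metadata("some_lora.safetensors")
#   for key, value in sorted(meta.items()):
#       print(key, "=", value)   # e.g. keys starting with "ss_" if present
```

Only the header is read, so this stays fast even for multi-gigabyte files.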

sonic narwhal Oct 12, 2023, 4:05 PM

#

Thank you

normal ember Oct 12, 2023, 4:52 PM

#

@stiff dust Have I understood the code correctly that network_alpha acts as a brake on how much the weights can change in each training step?

if self.lora_layer.network_alpha is not None: w_up = w_up * self.lora_layer.network_alpha / self.lora_layer.rank

#

If so, is there a logic to increasing the brake if and when you increase the network_dim?

stiff dust Oct 12, 2023, 4:57 PM

#

because it's a matrix factorization. The weight change is w_down @ w_up (@ = matrix multiplication). So a single weight is changed by the dot product between a row in w_down and a column in w_up.

#

the length of these vectors is the rank of the lora

#

if you have rank 1 then you just multiply two numbers

#

if you have rank 100 then you multiply 100 numbers and sum them up

#

now during training each single weight parameter is changed in each step, but the change cannot be arbitrarily high (due to the learning rate)

#

but changing 2 x 100 numbers, multiplying them and summing them up gives you a 100 times larger change than just changing 2 x 1 numbers

#

so in each training step you make a small update on the lora. But a lora of rank 100 has a 100 times stronger effect than a lora of rank 1, so you divide the result by 100 to make both comparable

#

otherwise you would have to decrease your learning rate whenever you increase the rank
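
The argument above can be sketched with plain numbers. This is a toy illustration, not the trainer's actual code: `lora_delta_entry` is a made-up helper, every factor entry is set to the same hypothetical value 0.5 (the case where all terms add up in the same direction), and the `alpha / rank` factor mirrors the `network_alpha / rank` scaling in the snippet quoted earlier.

```python
def lora_delta_entry(down_row, up_col, alpha=None):
    """One entry of the LoRA weight change: a dot product over the rank.

    If alpha is given, the result is scaled by alpha / rank.
    """
    rank = len(down_row)
    value = sum(d * u for d, u in zip(down_row, up_col))
    if alpha is not None:
        value *= alpha / rank
    return value


entry = 0.5  # hypothetical size of every factor entry
ranks = (1, 4, 100)

# Without scaling, the change grows with the rank.
unscaled = {r: lora_delta_entry([entry] * r, [entry] * r) for r in ranks}

# With alpha = 1, dividing by the rank makes all ranks comparable again.
scaled = {r: lora_delta_entry([entry] * r, [entry] * r, alpha=1.0) for r in ranks}

print(unscaled)
print(scaled)
```

In this toy setup the unscaled entry is 100 times larger at rank 100 than at rank 1, while the scaled entries come out roughly equal, which is the reason for dividing by the rank instead of lowering the learning rate by hand.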

normal ember Oct 12, 2023, 5:04 PM

#

Makes sense! How large are these vectors in the base model?

#

And thanks for an excellent reply as always!

#

Or maybe the dot product is always stored?

stiff dust Oct 12, 2023, 5:12 PM

#

this is just how matrix multiplication works. The matrices in the original model are different depending on the layer and so on, but usually they are quite big (~1000 rows and columns)

normal ember Oct 12, 2023, 5:32 PM

#

What I find odd is that when training a LoRA with alpha 1 vs something higher you don't necessarily get overtrained in the same way as a too high learning rate or too many training steps would do.

#

It just seems like the LoRA loses some flexibility in regards to how it can be mixed with other prompts if it's trained with a higher alpha.

#

Like the signal is stronger in the LoRA than in the base model, so it gets preference over the model. But I guess that makes sense given what you explained earlier.

#

Let's say you trained on a photo dataset and try to generate an image in an anime style. With a higher alpha, the base model seems to get less priority and the data in the LoRA gets stronger, which results in an image that's more of a photo or completely a photo, but it could have some traits of the anime style from the base model, like the character becoming Asian instead of looking like the photo dataset.

#

I guess that has to do with the weight vectors having higher values, since we didn't reduce w_up as much when the alpha is higher, which overpowers the base model.

stiff dust Oct 12, 2023, 10:11 PM

#

I never noticed such an effect of the alpha. Should check that myself

latent tiger Oct 13, 2023, 5:29 AM

#

jade hornet Or even if it's just one concept, save in step increments, and use the last safe...

Hey, hope I may ask another question about incremental training. Do you divide your data set into parts, or will the checkpoint save how far it progressed?

latent charm Oct 13, 2023, 4:08 PM

#

@normal ember Do you have any results from giving the crop coordinates in the metadata? I want to improve my fine-tune with hand anatomy training, but I'm not sure whether the coordinates would help or not.

normal ember Oct 13, 2023, 4:11 PM

#

latent charm @normal ember Do you have any result of giving the cropped coordinate in...

I have not taken it any further yet, I've had much else to test first.

#

It would be neat to get tools that could replicate the training SAI has done when training the base model

latent charm Oct 13, 2023, 4:12 PM

#

they released their training tool

#

But I haven't really looked into it

#

https://github.com/Stability-AI/generative-models

restive bridge Oct 13, 2023, 10:16 PM

#

Whatever became of LoRA-FA? Anyone using it? Advantages?

stone garden Oct 14, 2023, 4:53 AM

#

restive bridge What ever became of Lora-Fa? anyone using it? advantages?

From what I have heard it just didn't have any advantage over traditional lora techniques, but if you were to do comparisons, there would be some differences but enough to be worth it? Not at all

jade hornet Oct 14, 2023, 4:52 PM

#

latent tiger Hey hope i may ask another question about incremental training. Do you divide yo...

The only thing saved is the weights. If you stop and start again it'll start over as far as recursing through the image folders

latent tiger Oct 14, 2023, 4:58 PM

#

jade hornet The only thing saved is the weights. If you stop and start again it'll start ov...

Sorry, I'm not sure what you mean. So does it start over from the start or from where the checkpoint stopped?

stone garden Oct 14, 2023, 6:25 PM

#

latent tiger Sorry I'm not sure what you mean. So it starts over from the start or from where...

I may be wrong, but weights are updated continuously in small steps rather than in a single full-sized update, and each update depends only on the current weights, not on the ones loaded before. So once you have the weights at any stage, they should be usable as they are, or can be trained further at any point without needing anything from the previous weights

#

That is they start over every time for each update, using the updated weights-

#

(I am still learning about this so I can be wrong)

stone garden Oct 14, 2023, 9:40 PM

#

Hello everyone. I'm trying to train a LoRA on a person, but I can't seem to get the facial features right. Is there a way to "focus more on the face" when training the LoRA? Or provide some kind of "weights to the image pixels" when doing the training?

jade hornet Oct 15, 2023, 12:42 AM

#

latent tiger Sorry I'm not sure what you mean. So it starts over from the start or from where...

I'll say it another way, basically everything it has learned will be preserved, ie the weights. It sounded like you were asking about the image dataset. It doesn't bookmark what image was being compared

#

For that reason, I normally save at intervals and have multiple saves, when it starts to go off course just make adjustments and start from the last save that was going well

latent tiger Oct 15, 2023, 7:24 AM

#

stone garden That is they start over every time for each update, using the updated weights-

Thank you for your response, from what I can tell you're totally right! And that's why I want to avoid overtraining them on my data, so I'll try to avoid letting them train over and over on the same images. Thank you for your help 😁

latent tiger Oct 15, 2023, 7:26 AM

#

jade hornet I'll say it another way, basically everything it has learned will be preserved, ...

Thank you! Then I'll need to divide my data set into several chunks I guess

#

Thank you all for your help ☺️, really appreciate it!

foggy cradle Oct 15, 2023, 5:00 PM

#

hi folks - is it possible to inference a single image using two LoRa characters?

queen matrix Oct 15, 2023, 9:59 PM

#

Sure, why not. You would just load both loras and use both keywords/names. You probably would have to adjust the strength of each lora for best effect.

foggy cradle Oct 15, 2023, 10:27 PM

#

the concern would be that they merge into a single blended character instead of creating separate characters

queen matrix Oct 15, 2023, 11:03 PM

#

Yeah it is somewhat likely to do that. Probably most loras are trained primarily with images containing only one figure in the image. But you'll generally get some images where it mixed the characters, some images with two of the same character, and some with actually two separate characters like you want. Prompting to be clear that there are two people can help.

foggy cradle Oct 15, 2023, 11:48 PM

#

wonder if we could do it by 1) inferencing LoRa character_A 2) outpainting with LoRa B

queen matrix Oct 16, 2023, 12:13 AM

#

That could work. You could also get an image with two characters and use img2img with a mask to replace one of them.

carmine zinc Oct 16, 2023, 6:47 PM

#

How do I get SD to do more vibrant colours? It always darkens the pictures at the end:

stone garden Oct 17, 2023, 5:39 AM

#

stone garden Hello everyone. I'm trying to train a LoRA on a person, but I can't seem to get ...

For anyone wondering, one cannot do this in Kohya but can in OneTrainer.

quaint viper Oct 17, 2023, 2:45 PM

#

I guess this channel is mostly about LoRA's but does anyone here have experience training ControlNet? (and is there another channel where I should be asking?) I am attempting to train a ControlNet and I keep getting these weird high frequency details that I don't want. For example this cat. I am not sure if I just need to train the model more or this is a signal that I have already overbaked it or what. Training loss has been going down very slowly but the effect remains the same.

#

Any advice would be very welcome

quaint viper Oct 17, 2023, 3:01 PM

#

This is the loss in case anyone is interested. Batch size is 160 and the training set has 125,280 images

wary wedge Oct 17, 2023, 6:24 PM

#

What model is better to use for custom LoRa training in pixel style? SD 1.4, 1.5 or XL? And also why? Should I use the cpp fork of the repo for better performance because my specs are not that good.

#

And also how do I do 16x16

sonic narwhal Oct 19, 2023, 9:11 AM

#

quaint viper I guess this channel is mostly about LoRA's but does anyone here have experience...

There is a discord for controlNET training. Dont remember the name

delicate fractal Oct 19, 2023, 10:18 AM

#

quaint viper This is the loss in case anyone is interested. Batch size is 160 and the trainin...

What is loss? What does this mean?

quaint viper Oct 19, 2023, 1:35 PM

#

sonic narwhal There is a discord for controlNET training. Dont remember the name

Oh really? If you can remember anything that could help me find it, that would be great. I'll try google

quaint viper Oct 19, 2023, 1:39 PM

#

delicate fractal What is loss what does this mean ?

When you train a model, the way it works is by defining a loss function that the optimizer can minimize. Think of it like a metric for how well the model does and the optimizer uses it to improve the model. In this case, the loss is mean square error, so the average of all the (model_output - target) ^ 2 in the batch

#

If the loss goes down, it's going the right way
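
For anyone who wants the formula above spelled out, here is a minimal sketch of mean squared error over a batch of plain numbers. The function name `mse_loss` and the sample values are illustrative; real training computes this over tensors of predicted and target noise.

```python
def mse_loss(outputs, targets):
    """Average of (model_output - target) ** 2 over all elements."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


# Hypothetical predictions from an early and a later training step.
targets = [1.0, 2.0, 3.0]
early = [0.0, 0.0, 0.0]
later = [1.0, 2.0, 2.5]

print(mse_loss(early, targets))  # far from the target, larger loss
print(mse_loss(later, targets))  # closer to the target, smaller loss
```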

latent charm Oct 20, 2023, 7:47 AM

#

@normal ember @stiff dust Would training on an empty token (no caption) affect the overall bias of the model? Or would any training affect the whole model? My friend claimed that training without captions would change the whole model's style and I don't understand. In my understanding, training without a caption would train the empty token, but how would that affect the whole model?

stiff dust Oct 20, 2023, 8:04 AM

#

on inference you do cfg (classifier free guidance) which means you run the unet once with and once without caption

#

so training on empty caption retrains the prior distribution of the model

#

(what it thinks images look like without knowing a caption)

latent charm Oct 20, 2023, 9:14 AM

#

Oh, thanks a lot. That answers my question. I felt the empty caption would affect the result but didn't know how.

versed crescent Oct 20, 2023, 1:34 PM

#

I'm having a really hard time performing a LoRA training on SDXL using a friend's face as input. I have ~25 images, and I'm seeing his likeness in the resulting checkpoints, but I have a SUPER hard time performing any kind of styling with his likeness, like I did with the original Dreambooth workflow on SD1.5. It always wants to pull the resulting image back towards the training images, and when I use earlier checkpoints, his likeness is lost. I'm currently not using regularisation images either, as I just want the LoRA to make images of the tuned person.

stiff dust Oct 20, 2023, 2:19 PM

#

I got best results (regarding styles) with:

using rare tokens (e.g. "photo of chris thsgc")
train unet only (this is very important as the text encoder is very sensitive to overfitting)

#

in general it's hard to get a checkpoint that gives you perfect photorealism AND perfect generalization

#

but the results I got are still thousand times better than what I achieved with SD 1.5

normal ember Oct 20, 2023, 3:41 PM

#

I’ve found it very sensitive to both too high and too low learning rate. A higher alpha has given me much better results too. Everything without reg images.

#

Can’t verify it yet but I feel like I get better results with 10 repeats instead of 10x epochs.

#

A high learning rate looks good on the loss graph, but the results are normally worse.

#

I’ve trained both rare tokens and not. Both possible but I agree that it’s easier to get overfitting when it’s something well trained. Alpha helped when trying to train something known.

versed crescent Oct 20, 2023, 8:52 PM

#

I never touch Alpha, and have Rank/Dimension set to 128. I also train the unet only. Hmmm

#

I don't really have a good idea about what alpha does. If the value is 1, I think it defaults to the dimension size?

versed crescent Oct 20, 2023, 9:31 PM

#

Also, is it strictly necessary to use the class prompt when triggering a lora trained on a specific prompt? If I train with the unique token "ohxw" and the class of "man", do I need to use "ohxw man" ? or just the unique word

stiff dust Oct 20, 2023, 9:46 PM

#

versed crescent Also, is it strictly necessary to use the class prompt when triggering a lora tr...

use same style of captions you use for training. I actually prefer to use manually written captions instead of the automated ones, so you see better how you have to prompt it

stiff dust Oct 20, 2023, 9:46 PM

#

versed crescent I never touch Alpha, and have Rank/Dimension set to 128. I also use train the u...

also try lower rank. Very likely you will get similar results with rank 16 or 24

versed crescent Oct 20, 2023, 9:47 PM

#

yeah I don't use a class prompt in the caption, but it is encoded in the directory name of the training images, for the kohya_ss interface

stiff dust Oct 20, 2023, 9:47 PM

#

versed crescent I don't really have a good idea about what alpha does. If the value is 1, I thin...

it's a scaling factor. To put it very simply: a higher alpha means the lora learns faster. I still haven't tried whether high alphas are really better. It sounds counterintuitive to me, but might be possible

stiff dust Oct 20, 2023, 9:48 PM

#

versed crescent yeah I don't use a class prompt in the caption, but it is encoded in the directo...

yes, that somehow is then encoded in the prompt. It's probably something like "photo of [token] [class]" or so. In this case you should also prompt it like that

versed crescent Oct 20, 2023, 9:48 PM

#

ok

ruby pond Oct 20, 2023, 10:38 PM

#

is there some way to visualise the block weights of the lora, to see which blocks were trained? kind of like a feature importance graph

stiff dust Oct 20, 2023, 11:03 PM

#

just take the norm of the matrices. This is usually a direct readout of the importance
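
One way to read that suggestion, as a rough sketch: build each layer's weight change from its two LoRA factors and compare Frobenius norms. Everything here is illustrative, including the helper names, the layer names `block_a` / `block_b` and the tiny matrices; it follows the `w_down @ w_up` convention used earlier in this channel and ignores any alpha / rank scaling.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def rank_layers_by_norm(lora_layers):
    """Return (name, norm) pairs for w_down @ w_up, largest norm first."""
    norms = {
        name: frobenius_norm(matmul(down, up))
        for name, (down, up) in lora_layers.items()
    }
    return sorted(norms.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rank-1 layers: name -> (w_down, w_up).
layers = {
    "block_a": ([[1.0], [0.0]], [[3.0, 4.0]]),
    "block_b": ([[0.1], [0.1]], [[0.1, 0.1]]),
}
ranking = rank_layers_by_norm(layers)
print(ranking)  # block_a first: its weight change is much larger
```

Plotting these norms per block as a bar chart would give something close to the feature-importance style graph asked about above.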

normal ember Oct 21, 2023, 7:08 AM

#

Would be neat that you could trace the network from a prompt

normal ember Oct 21, 2023, 7:12 AM

#

versed crescent Also, is it strictly necessary to use the class prompt when triggering a lora tr...

I've made experiments on doing both captioning with class and without, and then prompting with and without. I came to the conclusion that it would be better to just caption as you prompt like kaibioinfo said. It made no big difference though. But could be that I mainly use known captions and they are already associated with class in the model.

wanton rampart Oct 21, 2023, 7:20 AM

#

How to teach the bot a new artstyle like madhubani and mural

stone garden Oct 21, 2023, 7:57 AM

#

Anyone know how these guys are getting 10+ it/s on AMD?

#

https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html

#

The best I can do is 7it/s with a 7900xt

#

karmic flame Oct 21, 2023, 8:40 AM

#

Hello, I want to ask: What will be the best way or approach to train a stable-diffusion model on my pre and post-image dataset? For example, consider a dataset with 'pre-images' featuring a man without a beard and 'post-images' depicting a man with a beard after one month. I want to develop a project where, given the pre-image without a beard as the input in an image-to-image task, and with a prompt specifying a given time, such as 'at 2 months,' the output should contain post-images of a man with a similar face but with a beard after 2 months.

So my questions are,

is it even possible with SD, if yes, what should be my best approach to train the model?
if SD should not be my choice, what should be my alternative approach?

Any help would be really appreciated.

stiff dust Oct 21, 2023, 11:11 AM

#

normal ember Would be neat that you could trace the network from a prompt

you could measure the cross attention - but that's already more complicated

stiff dust Oct 21, 2023, 11:12 AM

#

karmic flame Hello, I want to ask: What will be the best way or approach to train a stable-di...

that sounds like a task for control net

#

I guess GANs are a bit better for your task than diffusion models. But it should be possible in SD

#

but you usually need a lot of training data for training a control net. How large is your pre- and post-image dataset?

tropic moon Oct 21, 2023, 2:25 PM

#

Anyone have any idea why training textual inversion gets such crazy results when not on base 1.5 model?

normal ember Oct 21, 2023, 7:15 PM

#

Try #🤝｜tech-support this is mainly for discussing tuning / training of models.

small eagle Oct 21, 2023, 7:34 PM

#

pretty sure lorai offers a few free credits to create a quick lora (currently broken), are there any others that offer a free one before signing up?

normal ember Oct 22, 2023, 12:00 PM

#

This seems neat to possibly be able to train the model with consumer hardware. https://github.com/bghira/SimpleTuner/releases/tag/v0.7.0

GitHub

Release v0.7.0 - tune SDXL in just 12G VRAM! · bghira/SimpleTuner

DeepSpeed lives!
Now correctly integrated, DeepSpeed allows for training SDXL's full U-net at 8 seconds per iteration on just ~12G of VRAM. See the documentation for more information!
Features
...

jade hornet Oct 22, 2023, 5:08 PM

#

stone garden Anyone know how these guys are getting 10+ it/s on AMD?

Linux or Windows? Are you running rocm 5.6? Think 7900 needs that

#

Buying the newest card usually means waiting for drivers to catch up

stone garden Oct 22, 2023, 5:10 PM

#

They're on Linux, but I'm on WSL2 so I don't think it's the same

karmic flame Oct 23, 2023, 3:04 AM

#

stiff dust but you usually need a lot of training data for training a control net. How larg...

I've 30 train, 10 validation and 10 test images..pre-post in each set..

opal jacinth Oct 25, 2023, 8:16 AM

#

if I'm working with e.g. 1024x1024 regularization images and my training data set has other resolutions (= bucketing enabled), is that a problem? Would the results be better if I crop the training data beforehand to 1024? I've read conflicting statements

edgy wharf Oct 25, 2023, 3:29 PM

#

Hi! I want to train the img2img lora model to see a particular sofa model in other rooms. I did a tutorial before using SD XL base 1.09, but the results were not good. I upscaled 14 images and deleted their backgrounds, leaving only the seats. Do you have any suggestions? Sample data;

#

"ohwx, a couch with a blue cushion, modern sectional sofa with a reclining mechanism, basic background" here is my example prompt

tall condor Oct 25, 2023, 9:36 PM

#

does anyone have any experience with creating their own Danbooru Tagger model?

#

i want to create my own tagger model so that it can generate my own tags

jade hornet Oct 26, 2023, 2:00 PM

#

tall condor anyone has any experince with creating his own Danbooru Tagger model

No, but sounds like a lot of work to reinvent the wheel. You could just skip using a tagger and use the tag editor built into webui, you can do any tags on any subset of photos. I've seen others out there also. Not trying to discourage, just know how much work that'll probably be

tall condor Oct 26, 2023, 7:54 PM

#

but i have like 20k images i need to tag

latent charm Oct 26, 2023, 8:09 PM

#

Usually people use wd14 for tagging anime because it uses danbooru tags. If your dataset is anime, just using wd14 is good enough. Otherwise, a lot of research has been using custom LLMs as taggers to provide descriptive prompts for model training, for example GPT-4V or open-source multimodal LLMs

normal ember Oct 26, 2023, 8:15 PM

#

You could try openflamingo, but I have not tested it on tagging. The way it works is that you provide it with a few images and captions and it will use those as a reference for how the captioning should look

#

GPT-4V is great but no API yet

gentle flame Oct 27, 2023, 1:27 AM

#

#1003207327203209236 message

#

Something recent that's supposed to reduce noise for captioning

#

I haven't used it, but it might help

modest igloo Oct 28, 2023, 7:46 AM

#

hey everyone, question: when making a LoRA of, let's say, a specific type of haircut, should I also use tags like 1girl, black shirt etc.? Or should I only describe what I want, for example m0hawk, green hair? My worry is that if I tag too much stuff, using my LoRA will also pollute the model with data I don't want, like faces, backgrounds etc. Am I overthinking this?

stiff dust Oct 28, 2023, 11:00 AM

#

modest igloo hey everyone question: when making a LORA of lets say a specific type of haircut...

I think the idea is that if you describe the image as well as possible, the model will be "polluted" as little as possible. I.e., if the image shows a girl and you tag it as girl, then you won't change the model too much. If you tag it as person, then the model will be changed more.

modest igloo Oct 28, 2023, 11:14 AM

#

stiff dust I think the idea is that if you describe the image as best as possible the model...

I see, my concern was that if I add girl to my LoRA and then I use the LoRA, it will render a different girl even though I just want to render the hair.

But maybe I'm approaching the issue the wrong way and instead I should use something like control net and inpainting etc to swap the hair on the face I want.

Because I noticed that the LoRA I made with pictures of haircuts my wife made, if I add it to my prompts, starts to change things like the pose or even the girl shown after a certain threshold.

tired plank Oct 29, 2023, 2:25 PM

#

Question about Lora captioning

Say I'm training a specific "species" of wyvern(aka i'm training a specific and niche type of fantasy creature)

When I BLIP-caption my training set images, it's almost always "a dragon with a point head on a black background"

Is it better to remove the dragon and fix it to be "<my keyword> head on a black background...<more descriptions here>" or should I leave that and just do "<my keyword> a dragon head on a black background"? Does letting SD know it's similar to a dragon make it better?

#

I'm not sure what the model understands better

normal ember Oct 29, 2023, 2:56 PM

#

I’d say it depends on how you want to be able to prompt it at inference. If you train it to associate with dragon, you obviously can’t also get a wyvern and a dragon in the same image.

#

If this type of wyvern has a specific name I’d use that.

tired plank Oct 29, 2023, 3:01 PM

#

so i should caption more like "closeup of rathalos(<-keyword) head flying through the sky among the clouds" and just drop any reference to dragon or wyvern?

normal ember Oct 29, 2023, 3:02 PM

#

If it’s obvious that it is flying when close-up of the head

#

If not, I’d skip the flying.

#

You could also associate rathalos with wyvern if you want to train several types of named wyverns that share similarities.

tired plank Oct 29, 2023, 3:11 PM

#

The wyverns from Monster Hunter tend to share a very similar body plan so if this Lora doesn't end up being trash I might try that

#

I was using the class as dragon for right now because I figured that since the SDXL model already knows what a dragon is and rathalos is dragonish, it might help with the learning. At least that's what the guides I read suggested. I'll try again without the dragon captioning

#

New to this so i'm just doing this to learn, thanks for the advice

normal ember Oct 29, 2023, 3:15 PM

#

Is this the guy? Here's the prompt used: a majestic wyvern named rathalos soars through the clouds on wings that span over 10 feet. his scales shimmer with every movement, and as he lets out a piercing roar, lightning bolts strike around him. in the distance, a group of hunters can be seen attempting to bring down this formidable creature.

tired plank Oct 29, 2023, 3:15 PM

#

Oh you can post images here...I'll get a sample image from my training set

#

#

If you prompted that image using "rathalos" in sdxl, it's not that far away from Rathalos IRL

normal ember Oct 29, 2023, 3:19 PM

#

Never tried training with a solid background and how it would react, somebody else in here might know.

tired plank Oct 29, 2023, 3:19 PM

#

I'm worried about some of the backgrounds in my training set. I was looking for high-res screenshots from the game but couldn't find stuff over 4 MP

#

Was wondering if I should tag my backgrounds or not, but I have heard mixed things on tagging backgrounds

normal ember Oct 29, 2023, 3:20 PM

#

And you can't capture them yourself from the game?

tired plank Oct 29, 2023, 3:20 PM

#

I can, it was more a laziness thing since I don't have the game installed, but I could do that.

#

Maybe I could also generate a background using sd and just paste ^that guy onto it

normal ember Oct 29, 2023, 3:22 PM

#

Getting him from different angles, medium shot, close-ups and full might help too.

tired plank Oct 29, 2023, 3:24 PM

#

I have a couple different angles in my set now but a lot of the images are like renders or statues
https://1drv.ms/f/s!Aujm3Wog6vazpXlHjqUlkwhzICG_?e=6gasQV

normal ember Oct 29, 2023, 3:25 PM

#

Worth a try to caption if it's a statue or from a game too if you want flexibility that is.

#

And from there, just experiment and compare the results.

tired plank Oct 29, 2023, 3:35 PM

#

Speak of the devil, it just finished. Though an error on the 9th epoch made it so I only got 8, but here's the 6th epoch

#

Not terrible. Well kinda terrible but not as bad as I was expecting
prompt: a dragon flying through the sky, flame in it's mouth,lora:RATHALOS_RATHALOS-000006:1

tired plank Oct 29, 2023, 3:50 PM

#

Is the above image underfitted or overfitted? Not sure how to improve it. My first thought is it's underfitted. Also think some of those training images need to be replaced by better in-game screenshots

stiff dust Oct 29, 2023, 6:15 PM

#

I would say try again with a better prompt.

tired plank Oct 29, 2023, 6:36 PM

#

I tried a chibi version

#

Not bad, even after losing his arm in the war

smoky flame Oct 29, 2023, 9:23 PM

#

Hey all! What's the best repository for training a controlnet on SDXL?

jade hornet Oct 30, 2023, 1:49 PM

#

smoky flame Hey all! What's the best repository for training a controlnet on SDXL?

Haven't tried it, but have you tried following this tutorial https://huggingface.co/blog/train-your-controlnet
Scripts here https://github.com/huggingface/diffusers/tree/main/examples/controlnet

Train your ControlNet with diffusers

GitHub

diffusers/examples/controlnet at main · huggingface/diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch - huggingface/diffusers

smoky flame Oct 30, 2023, 10:54 PM

#

@jade hornet Thank you I've seen that - it's not specific for SDXL - that example is for SD1.5 right?

jade hornet Oct 31, 2023, 12:29 AM

#

Presumably, but there's an sdxl script in that folder

#

Just wondering if you tried it

normal ember Nov 1, 2023, 5:04 PM

#

@latent charm Tried bakllava-1?

latent charm Nov 1, 2023, 5:35 PM

#

normal ember @latent charm Tried bakllava-1?

?

#

https://github.com/THUDM/CogVLM CogVLM seems great. But it requires 24GB*2 to inference

GitHub

GitHub - THUDM/CogVLM: a state-of-the-art-level open visual languag...

a state-of-the-art-level open visual language model | 多模态预训练模型 - GitHub - THUDM/CogVLM: a state-of-the-art-level open visual language model | 多模态预训练模型

#

I am fine tuning SDXL with captions that were generated by a VLM; the captions are selected and modified by me. Let's see what the result will be. The dataset is 2400 text-image pairs using tags, generated captions and empty captions, with 10 repeats.

normal ember Nov 1, 2023, 6:00 PM

#

latent charm ?

It's a multimodal variant of LLaVA, but instead of a Llama 2 base it uses Mistral 7B, which in itself is a great model for its size.

#

It's very GPT-V like

#

It's the first model I've tried that can be controlled consistently to the result you want.

latent charm Nov 1, 2023, 6:04 PM

#

Really, sounds great. I heard mistral 7b is pretty good too.

normal ember Nov 1, 2023, 6:09 PM

#

I've written some code that automates the captioning

#

I'm using llama.cpp as it has a built in rest-api and is fast and lightweight, then some python code that sends the instruction and the images.

latent charm Nov 1, 2023, 6:17 PM

#

After I had done the captioning, I found that calculating the CLIP score helps to find the unwanted captions; a caption that is too short or a score that is too high might indicate a bad caption.
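
A minimal sketch of that kind of filter, assuming the image and caption embeddings were already computed with some CLIP model (not shown here). The function names, the word limit and the score thresholds are made up for illustration, and flagging low scores as well is my own addition, so tune all of it on your own data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_suspect_caption(caption, image_emb, text_emb,
                       min_words=3, low=0.2, high=0.4):
    """Flag captions that are very short or whose score is outside [low, high].

    min_words, low and high are illustrative defaults, not recommended values.
    """
    score = cosine_similarity(image_emb, text_emb)
    too_short = len(caption.split()) < min_words
    return too_short or score < low or score > high
```

Anything flagged this way would then go into a separate folder for manual review rather than being deleted outright.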

#

https://arxiv.org/pdf/2310.20550.pdf CapsFusion

#

Also I just saw this.

normal ember Nov 1, 2023, 6:27 PM

#

and then a full tune of the model?

latent charm Nov 1, 2023, 6:30 PM

#

They didn't release the weights

normal ember Nov 1, 2023, 6:32 PM

#

I'm thinking about your captioned data you are training

rapid meteor Nov 1, 2023, 6:33 PM

#

hey guys has anyone tested training models vs loras for people? Does one produce better results over the other?

latent charm Nov 1, 2023, 6:34 PM

#

normal ember I'm thinking about your captioned data you are training

using dreambooth in kohya

#

rented an A100 40GB, running now

normal ember Nov 1, 2023, 6:35 PM

#

AdamW? What LR?

latent charm Nov 1, 2023, 6:37 PM

#

constant, PagedAdamW8bit, lr 2e-6, noise offset 0.1

latent charm Nov 1, 2023, 6:38 PM

#

rapid meteor hey guys has anyone tested training models vs loras for people? Does one produce...

I heard fine tuning is better but I didn't compare it myself

normal ember Nov 1, 2023, 6:40 PM

#

I'm running on my 4090 now adamw8bit, some offset noise (0.0357) and 3e-6

#

but you are probably running a batch size larger than my 1 😄

latent charm Nov 1, 2023, 6:40 PM

#

batch size 20

normal ember Nov 1, 2023, 6:41 PM

#

What I don't have a clue about is how many epochs it will take

latent charm Nov 1, 2023, 6:41 PM

#

using PagedAdamW8bit you could increase bs to 4 with fp16 in 24 GB VRAM

#

or even 5

normal ember Nov 1, 2023, 6:42 PM

#

I have some room yes, using 18G VRAM atm.

#

I've run 4 times as long as a LoRA would take me. It's moving incredibly slowly, but I guess that's normal since the LR is way lower.

latent charm Nov 1, 2023, 6:43 PM

#

I am going the large bs, large repeat route. It seems to give me better results

normal ember Nov 1, 2023, 6:43 PM

#

I do 10 repeat

latent charm Nov 1, 2023, 6:43 PM

#

I have 3 datasets with repeat 10

normal ember Nov 1, 2023, 6:44 PM

#

How many epochs do you plan to run? I save state so I can resume.

latent charm Nov 1, 2023, 6:44 PM

#

running with max 20 epochs

#

I just rented 3 days of A100 to see the progress, and then I have to decide whether to continue the training or not

#

it would take 300 hours in total

normal ember Nov 1, 2023, 6:46 PM

#

that's 480 000 steps if I calculated it correctly, 24000 batches

latent charm Nov 1, 2023, 6:47 PM

#

around 70000 steps

#

with 20 bs
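
For reference, the arithmetic behind these numbers as a rough sketch. It assumes the usual counting (images × repeats per epoch, one optimizer step per batch); the function name is just for illustration. A single dataset of 2400 images gives the 24000 batches mentioned above; the ~70000 figure presumably covers all three datasets together, whose sizes aren't given here.

```python
import math

def training_steps(num_images, repeats, epochs, batch_size):
    """Return (samples seen, optimizer steps) for a whole run."""
    samples_per_epoch = num_images * repeats
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return samples_per_epoch * epochs, steps_per_epoch * epochs

samples, steps = training_steps(num_images=2400, repeats=10, epochs=20, batch_size=20)
print(samples, steps)  # 480000 24000
```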

normal ember Nov 1, 2023, 6:47 PM

#

runpod?

latent charm Nov 1, 2023, 6:48 PM

#

Another platform

spring arch Nov 3, 2023, 8:32 AM

#

i trained my lora on the base SD1.5 model. Later i merged that lora into another SD1.5 model and retrained it. Now the newly created LoRA doesn't work on other models apart from the ones i retrained on. Any help on this?

latent charm Nov 3, 2023, 12:45 PM

#

Standalone Kosmos-2 auto captions https://github.com/lrzjason/kosmos-auto-captions
python autoCaptionsKosmos.py --input_dir /path/to/input --output_dir /path/to/output --clip_failed_dir /path/to/clip_failed
--clip_failed_dir is optional. Just need to enter input_dir and output_dir

GitHub

GitHub - lrzjason/kosmos-auto-captions

Contribute to lrzjason/kosmos-auto-captions development by creating an account on GitHub.

coral canopy Nov 3, 2023, 2:50 PM

#

latent charm Standalone Kosmos-2 auto captions https://github.com/lrzjason/kosmos-auto-captio...

That looks cool, but it already failed at describing the image on the GitHub page.

latent charm Nov 3, 2023, 3:14 PM

#

coral canopy That looks cool, but it already failed at describing the image on the GitHub pag...

Just a random image with its description. Not cherry-picked LOL.

#

It still hallucinates like other LLMs

coral canopy Nov 3, 2023, 3:27 PM

#

latent charm Just a random image with description. Not cherry pick LOL.

I can't find the part where the repo owner says that it's not cherry-picked. Anyway, the caption is littered with assumptions and things that are not actually happening within the image. That's a bit sub-optimal. I would like to see other examples and whether it does that more often.

latent charm Nov 3, 2023, 3:28 PM

#

I am the repo owner

#

I selected another example after you pointed it out. LOL

coral canopy Nov 3, 2023, 3:29 PM

#

🤣 oh, sorry. I couldn't connect lrzjason with XiaoZhi

latent charm Nov 3, 2023, 3:29 PM

#

That's fine

coral canopy Nov 3, 2023, 3:31 PM

#

Ok, that's more fitting to the image. I guess there is still a lot of cleaning up to do, but it would be interesting to run a training with those captions.

latent charm Nov 3, 2023, 3:32 PM

#

I am already training with 2400 images from other llm generated captions. It is 19% now.

#

kosmos-2 would give captions with more subject composition

coral canopy Nov 3, 2023, 3:33 PM

#

I think it's the highly descriptive captions that make DALL-E 3 king at the moment. Time to reproduce that.

latent charm Nov 3, 2023, 3:34 PM

#

Yeah, I am encouraged by Dall E 3 and pixart-alpha

coral canopy Nov 3, 2023, 3:35 PM

#

What's the current state of pixart-alpha? Are the weights released?

latent charm Nov 3, 2023, 3:36 PM

#

Still no. But they released the LLaVA-captioning inference code.

#

Just checked

#

Interesting. You could use pixart-alpha scripts to caption your dataset.

#

It uses LLaVA-Lightning-MPT-7B-preview

coral canopy Nov 3, 2023, 3:45 PM

#

That's indeed interesting.

mossy condor Nov 3, 2023, 3:50 PM

#

anyone here knows how this works?
https://colab.research.google.com/github/hollowstrawberry/kohya-colab/blob/main/Lora_Trainer.ipynb
i want to create a LoRA of an art style, but don't know how to set it up, please help!

Google Colaboratory

stiff dust Nov 3, 2023, 4:04 PM

#

spring arch i trained my lora on base SD1.5 model.Later i merged that lora into another SD1....

such things can happen if the model you merge your lora into had some special noise training routine (e.g., noise offset or pyramid noise)

plain cloak Nov 3, 2023, 6:05 PM

#

hello everyone im trying to train my model for guns but what tag do i use

#

because when i use a custom tag i get nothing

hardy storm Nov 4, 2023, 2:05 PM

#

Anyone here tried training a model on a house/property?

I've been kicking around this idea of taking photographs of my childhood home (exterior) and the surrounding 1 acre of grounds (trees, pond, barn, etc), and making a LoRA out of it. I'm just not sure how best to do it... specifically:

a) If I should only train it on, for example, just the house, or if I could include photos that focus on other features of the property, so long as they're all connected (trees, pond, barn, etc)?
b) How to caption the images? For example, should I approach it like I'm training an object, as if the 'object' is the entire property?

blazing turret Nov 4, 2023, 4:35 PM

#

hi, i'm just starting with automatic1111 and in my canvas zoom i'm missing this pen/paintbrush option. What should i do to get it? (was trying to update agony)

digital dune Nov 5, 2023, 1:16 AM

#

hardy storm Anyone here tried training a model on a house/property? I've been kicking arou...

ofc u can

#

take as many pictures and describe everything, and preferably use a base model of one that is already trained on architecture, which there are probably already a lot of

#

if you train on kohya it will ask for a keyword and class. something like 01_childhoodhome house would do it. house is the class and childhoodhome is the keyword
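
To make that folder naming concrete, here is a small sketch that splits such a name into its parts. It assumes the `<number>_<keyword> <class>` pattern described above, with the leading number being the repeat count (that's my understanding of the kohya convention); `parse_kohya_folder` is a made-up helper, not part of kohya.

```python
def parse_kohya_folder(name):
    """Split '<repeats>_<keyword> <class>' into (repeats, keyword, class)."""
    repeats, _, rest = name.partition("_")
    keyword, _, class_name = rest.partition(" ")
    return int(repeats), keyword, class_name

print(parse_kohya_folder("01_childhoodhome house"))  # (1, 'childhoodhome', 'house')
print(parse_kohya_folder("20_rathalos wyvern"))      # (20, 'rathalos', 'wyvern')
```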

hardy storm Nov 5, 2023, 1:27 AM

#

digital dune if you train on kohya it will ask for a keyword and class. something like 01_chi...

Those are great tips. Thank you kindly!

pure plume Nov 5, 2023, 12:03 PM

#

Hi, i've got a style lora and i think it's time for a 2nd ver.
what is the method here?
re-train the lora or use the old one and just bring the old data in?

plain cloak Nov 5, 2023, 12:05 PM

#

is it possible to finetune SDXL models

oblique adder Nov 5, 2023, 8:03 PM

#

when i interrupt my render i get the style i want, it usually happens when i interrupt at 80%, what setting do i have to change for it to be consistent and automatically save to my folder?

ocean fractal Nov 5, 2023, 8:07 PM

#

oblique adder when i interrupt my render i get the style i want, it usually happens when i int...

I would also like to know. A lot of the images look better before the final step.

oblique adder Nov 6, 2023, 2:35 AM

#

right now i just interrupt and save manually but i'd love to just pump images out without checking what % its at

grand depot Nov 6, 2023, 8:59 PM

#

Hi! I'm looking for repositories of open datasets that people use to finetune text-to-image models. One such repository I found is https://huggingface.co/datasets?task_categories=task_categories:text-to-image. Are there any other ones?

warm veldt Nov 7, 2023, 6:02 PM

#

oblique adder when i interrupt my render i get the style i want, it usually happens when i int...

skip clip 2?

oblique adder Nov 7, 2023, 6:02 PM

#

warm veldt skip clip 2?

already on 2

tall condor Nov 7, 2023, 9:39 PM

#

hey guys, which tagger are you using atm?

#

i am particularly looking for one for faces, expressions, poses and so on

#

one that generates tokens

normal ember Nov 8, 2023, 8:06 PM

#

tall condor hey guys, which tagger are you using atm?

gpt-4-vision-preview is probably the best way to do it now.

tall condor Nov 8, 2023, 9:23 PM

#

can i deploy that locally?

normal ember Nov 8, 2023, 9:27 PM

#

No

#

It's probably something like 150-200 lines of code, OpenAI's API and some money.

tall condor Nov 8, 2023, 10:01 PM

#

can anyone explain to me why there is a difference between merging model A and B and Merging Model B and A?

spring arch Nov 9, 2023, 2:24 AM

#

Can't we train expressions for a lora, especially eyes and mouth? It's so horrible

coral quail Nov 9, 2023, 11:59 PM

#

Would it be possible to train a version of SD that is good at making emojis in a specific style, like android or ios?

#

Or could this be achieved by simply using input images or prompts without a separate model?

normal ember Nov 10, 2023, 12:45 PM

#

coral quail Would it be possible to train a version of SD that is good at making emojis in a...

Yes, that has been done: https://replicate.com/fofr/sdxl-emoji

fofr/sdxl-emoji – Replicate

An SDXL fine-tune based on Apple Emojis

#

https://huggingface.co/SvenN/sdxl-emoji

SvenN/sdxl-emoji · Hugging Face

coral quail Nov 10, 2023, 10:57 PM

#

thank you for the link! I'll try to get that running locally on my dockerized SD instance. Generally with image file prompts, is there a way to essentially say "this but recolor/redraw in the same style and position", like changing the color of a flag emoji or swapping what is in the hand of a human emoji? Or maybe this should just be handled by the prompt/negative prompt? @normal ember

normal ember Nov 10, 2023, 11:21 PM

#

coral quail thank you for the link! I'll try to get that running locally on my dockerized SD...

The model will generally learn the style. Emojis are probably a good start since they are vector based and can be scaled to the correct resolution and converted to pixels. You've got good descriptions of them too that you can use for building the captions.

hollow spruce Nov 11, 2023, 1:47 AM

#

just saying hi, I'm not ded XD
here's some updates with the cool new toy, juggernaut + my lora based on my master dataset

#

jugger / jugger + lora

#

prompts & settings are taken from civitai (unmodified), the ones used to represent the model. so they're cherry picked to favor juggernaut

#

nsfw + general anatomy works better as well, but I can't show the progress on that here

#

I like how one of the cover images for the model literally uses "150 mm" (headshots), "full body", "sitting" in its tags :"D
my lora really didn't like that original composition of just a headshot when these tags are added. But its only loaded at 0.7 strength, so oh well
||complex 3d render ultra detailed of a beautiful porcelain profile woman android face, cyborg, robotic parts, 150 mm, beautiful studio soft light, rim light, vibrant details, luxurious cyberpunk, lace, hyperrealistic, anatomical, facial muscles, cable electric wires, microchip, elegant, beautiful background, octane render, H. R. Giger style, 8k, best quality, masterpiece, illustration, an extremely delicate and beautiful, extremely detailed ,CG ,unity ,wallpaper, (realistic, photo-realistic:1.37),Amazing, finely detail, masterpiece,best quality,official art, extremely detailed CG unity 8k wallpaper, absurdres, incredibly absurdres, robot, silver halmet, full body, sitting||

#

not sure what to think of this one :/ it doesn't show off juggers abilities. why would they list this as one of the main cover images? not even the people tags are followed in any way
will redo it with one of the people I trained in my datasets
Portrait Photo a portrait, hyperdetailed photography, by Elizabeth Polunin, red haired young woman, Gianna Michaels, brooklyn, looking straight to camera, sweaty, olya bossak, nepal, very accurate photo, suspiria

#

redone with shirogane instead of the people they mentioned
Portrait Photo a portrait, hyperdetailed photography, by Elizabeth Polunin, red haired young woman, shirogane-sama, brooklyn, looking straight to camera, very accurate photo, suspiria

covert pagoda Nov 11, 2023, 5:08 PM

#

Anybody have a good auto cropping tool that sees subject to crop tighter in, while also using a given aspect ratio? With computer vision? Cropping my dataset and just want to speed it up!!

latent charm Nov 11, 2023, 5:18 PM

#

covert pagoda Anybody have a good auto cropping tool that sees subject to crop tighter in, whi...

You might find some tools in this repo https://github.com/longpeng2008/awesome-image-cropping

GitHub

GitHub - longpeng2008/awesome-image-cropping: auto image cropping/c...

auto image cropping/composition methods. Contribute to longpeng2008/awesome-image-cropping development by creating an account on GitHub.

covert pagoda Nov 11, 2023, 5:19 PM

#

latent charm You might find some tools in this repo https://github.com/longpeng2008/awesome-i...

Checking. Nice thanks

covert pagoda Nov 11, 2023, 5:24 PM

#

latent charm You might find some tools in this repo https://github.com/longpeng2008/awesome-i...

Actually don’t see any simple autocropper for portraits of humans

#

All very general

latent charm Nov 11, 2023, 5:26 PM

#

https://github.com/wuhuikai/TF-A2RL

GitHub

GitHub - wuhuikai/TF-A2RL: The official implementation for A2-RL: A...

The official implementation for A2-RL: Aesthetics Aware Rinforcement Learning for Automatic Image Cropping - GitHub - wuhuikai/TF-A2RL: The official implementation for A2-RL: Aesthetics Aware Rinfo...

#

Might be this one

covert pagoda Nov 11, 2023, 5:30 PM

#

Cool I’ll give it a try. At least it’s not just a white paper 🫠

#

Thx

covert pagoda Nov 11, 2023, 5:58 PM

#

latent charm https://github.com/wuhuikai/TF-A2RL

yikes. something is broken

#

(venv) D:\TF-A2RL>python A2RL.py --help
Traceback (most recent call last):
  File "D:\TF-A2RL\A2RL.py", line 14, in <module>
    with open('vfn_rl.pkl', 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'vfn_rl.pkl'

#

after installing all the modules and running python A2RL.py --help

latent charm Nov 11, 2023, 5:58 PM

#

hollow spruce redone with shirogane instead of the people they mentioned `Portrait Photo a por...

Long time no see. Using the same prompt, this is from my latest fine tune. It still needs a bit of adjusting before release.

covert pagoda Nov 11, 2023, 5:59 PM

#

will try to crop one with a crop command now

latent charm Nov 11, 2023, 6:00 PM

#

covert pagoda will try to crop one with a crop command now

I haven't tried that yet.

covert pagoda Nov 11, 2023, 6:00 PM

#

yea, on to the next one. hard to find one that works

latent charm Nov 11, 2023, 6:01 PM

#

I also want to find a good tool to crop to various ratios.

#

I found this but it has no code implementation. https://arxiv.org/pdf/1911.10492.pdf

covert pagoda Nov 11, 2023, 6:03 PM

#

https://github.com/Morpheus2962/Smart_Cropper

GitHub

GitHub - Morpheus2962/Smart_Cropper: Smart Cropper is an image crop...

Smart Cropper is an image cropping tool which also straightens image. It uses python libraries to open and edit image. Along with crop it also can enhance and apply black and white filter. - GitHub...

#

need something that is ready to use

#

testing this one then giving up

#

nothing works at the moment

#

another possible one is https://github.com/xuebinqin/U-2-Net

GitHub

GitHub - xuebinqin/U-2-Net: The code for our newly accepted paper i...

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection." - GitHub - xuebinqin/U-2-Net: The ...

#

with this code:
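`import cv2
import numpy as np
import u2net  # NOTE: hypothetical wrapper module, the U-2-Net repo does not ship this interface

# Load the pre-trained U2-NET model (U2NetModel is an assumed interface)
model = u2net.U2NetModel()

# Load the image you want to crop
image = cv2.imread("image.jpg")

# Predict a saliency mask for the subject
# (assumed: 2D array with values in [0, 1], same height/width as the image)
mask = model.predict(image)

# The bounding box of the salient pixels gives the crop coordinates
ys, xs = np.where(mask > 0.5)
x1, x2, y1, y2 = xs.min(), xs.max() + 1, ys.min(), ys.max() + 1

# Crop the original image to the subject
cropped_image = image[y1:y2, x1:x2]

# Save the cropped image
cv2.imwrite("cropped_image.jpg", cropped_image)
`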

#

U2-net seems a bit more advanced
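
The part the snippet above leaves open is how to get from a subject box to a crop with a given aspect ratio. A rough sketch of just that step, assuming you already have the subject's bounding box from some detector or saliency mask; `crop_box_for_aspect` and its `padding` parameter are made up for illustration.

```python
def crop_box_for_aspect(subject_box, image_size, aspect_ratio, padding=0.1):
    """Return (x1, y1, x2, y2) around subject_box with width/height == aspect_ratio.

    subject_box: (x1, y1, x2, y2) of the detected subject, in pixels.
    image_size: (width, height) of the full image.
    """
    x1, y1, x2, y2 = subject_box
    img_w, img_h = image_size
    # Pad the subject box a little so the crop is not too tight.
    box_w = (x2 - x1) * (1 + padding)
    box_h = (y2 - y1) * (1 + padding)
    # Grow one side so the box matches the target aspect ratio.
    if box_w / box_h < aspect_ratio:
        box_w = box_h * aspect_ratio
    else:
        box_h = box_w / aspect_ratio
    # Shrink uniformly if the crop would be larger than the image.
    scale = min(1.0, img_w / box_w, img_h / box_h)
    box_w, box_h = box_w * scale, box_h * scale
    # Center on the subject, then shift back inside the image bounds.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    left = min(max(cx - box_w / 2, 0), img_w - box_w)
    top = min(max(cy - box_h / 2, 0), img_h - box_h)
    return round(left), round(top), round(left + box_w), round(top + box_h)
```

The returned box can be used directly for slicing, e.g. `image[y1:y2, x1:x2]`, followed by a resize to the training resolution.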

#

@latent charm can you give this one a try?

#

I ll try the first one

latent charm Nov 11, 2023, 6:07 PM

#

I have used u2-net to remove background before

#

But didn't use it to cropping

covert pagoda Nov 11, 2023, 6:12 PM

#

smart cropper is totally off topic lol

covert pagoda Nov 11, 2023, 6:51 PM

#

ok, this seems to work quite well
https://www.youtube.com/watch?v=Fbuyu35TkE4

YouTube

SECourses

SOTA Image PreProcessing Scripts For Stable Diffusion Training - Au...

One of the most important aspect of Stable Diffusion training is the preparation of training images. In this tutorial video I will show you how to fully automatically preprocess training images with perfect zoom, crop and resize. These scripts will hugely improve your training success and accuracy.

Scripts Download Link ⤵️
https://www.patreon...

digital dune Nov 11, 2023, 9:21 PM

#

Anyone has had better results in training by lowering/freezing the text encoder learning rate?

hollow spruce Nov 12, 2023, 12:18 AM

#

digital dune Anyone has had better results in training by lowering/freezing the text encoder ...

yes, if you have bad captioning
basically the worse it is, the less you should be doing TE training. also some words are off limits unless you're dealing in the 4k image range

I use either a super low TE, or UNET only training if my keywords are lazy, or I'm using a single word to describe them

#

inverse is also true. my best tagged dataset, where every tag occurs around 50~1000 times across 5k images, the TE training alone did more good than the actual training did.

digital dune Nov 12, 2023, 12:50 AM

#

Will this setting affect anything if I'm training dreambooth with no captions?

#

The issue I have is that after it finally starts getting my subjects right, the images end up a little "overcooked", probably from being influenced by my reg images some of which were made with loras for good detail. At the same time the gens stop "listening to my prompts" and just end up rendering my subject however it wants

#

I've already been told to lower my prior loss weight which seems to have great results so far but I'm wondering if the UNET/text encoder training also has anything to do with it

tall condor Nov 12, 2023, 1:10 AM

#

do you guys have any tips if my general dataset is learned well but the faces are bad?

#

i tag my dataset by hand + DeepDanbooru tags

#

i dont see too much issue with the tags but yet for some reason faces get messed up quite a lot

#

however the rest is learned fine

#

im on dreambooth btw not lora

#

@digital dune what i recommend is that you run DeepDanbooru tagging on your images and use at least 0.03-0.05 of noise; also, for me, using random crop instead of center crop in combination with flip and color augmentation worked well

#

i recommend much lower learning rates with those settings, and more epochs

#

with dreambooth i mostly use 5e-7 or so as lr

#

also make sure to enable shuffle captions

#

i recommend training with kohya ss

#

and add more weight to the prompts that are hard to generate (in my experience not more than 5-10 times the base weight)
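
Roughly what those settings look like as kohya sd-scripts arguments, as far as I know the flag names; treat it as a sketch and check it against your version. I'm assuming "noise" above means noise offset, 0.04 is just a pick from the 0.03-0.05 range, and `build_args` is a made-up helper.

```python
def build_args(learning_rate=5e-7, noise_offset=0.04):
    """Build a list of command-line arguments for a dreambooth run."""
    return [
        f"--learning_rate={learning_rate}",
        f"--noise_offset={noise_offset}",
        "--random_crop",      # instead of center crop
        "--flip_aug",         # horizontal flip augmentation
        "--color_aug",        # color augmentation
        "--shuffle_caption",  # shuffle the comma-separated tags
    ]

print(" ".join(build_args()))
```

As far as I remember, random crop and color augmentation can't be combined with latent caching, so that would have to be switched off.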

stiff dust Nov 12, 2023, 10:29 AM

#

digital dune The issue I have is that after it finally starts getting my subjects right, the ...

I have very contradictory results from my experiments with text encoder training. Sometimes it overfits MUCH more than the unet, sometimes it gives the model more flexibility :/ So far I can say:

- TE training is often prone to overtraining on the image composition. As you wrote, sometimes it starts ignoring your prompt and puts everything into the same composition as the training image
- TE training sometimes totally overfits on style (makes everything anime or everything photographic), but not always. I have not found a pattern here :/
- all overfitting effects seem to get less severe when I train with a low batch size (e.g. batch size 1). I have no explanation for it, but that observation was made by other people, too
- a low learning rate is not always good. My feeling is that low learning rate + many epochs is much worse than high learning rate + few epochs

stiff dust Nov 12, 2023, 10:34 AM

#

hollow spruce yes, if you have bad captioning basically the worse it is, the less you should b...

but what is good captioning for subject training? I'm not a big fan of the dreambooth method (tagging everything with "photo of xyz"), however, even better captions like "photo of xyz hiking, sunny day, mountains in background" are not "good" captions I guess. In the end what we want with subject training is that the text encoder associates "xyz" with the person in the image.

normal ember Nov 12, 2023, 10:46 AM

#

I've also noticed that not captioning "photo of" but only the subject itself makes it more flexible to use in different styles. If you do it the other way around, more often than not you get a photo even if you want anime or something else.

#

Takes forever to train with BS1, if it would not be for that I would probably almost always use that. BS of 4 seems like a good compromise.

#

@stiff dust What ratio between the unet and te learning rates have you found works best? Same or different rates, so to say.

stiff dust Nov 12, 2023, 10:53 AM

#

I always train them after each other. So I first train text encoder as short as possible. I stop training as soon as the subject in the image looks roughly as I want it and long before I see overfitting effects. Then I restart from that using unet only training

vapid mango Nov 12, 2023, 11:16 AM

#

Hello guys, I'm going to do a full finetuning of SDXL to learn an aesthetic. I'm wondering whether there is a need to finetune the text encoders of SDXL as well, or do I just need to finetune the UNet? Some people told me they always disable training of the SDXL text encoders when doing a full finetune. Many thanks

normal ember Nov 12, 2023, 11:22 AM

#

stiff dust I always train them after each other. So I first train text encoder as short as ...

With --save_state? or just start from the generated safetensors?

stiff dust Nov 12, 2023, 11:23 AM

#

you don't need save-state for that. Just from the generated safetensors

normal ember Nov 12, 2023, 12:02 PM

#

stiff dust you don't need save-state for that. Just from the generated safetensors

Any rough hints on the ratio between te and unet, if you've seen any pattern?

#

(running first test now)

#

and have you tested --debiased_estimation_loss

stiff dust Nov 12, 2023, 1:52 PM

#

what's that option?

normal ember Nov 12, 2023, 2:22 PM

#

stiff dust what's that option?

https://github.com/kohya-ss/sd-scripts/pull/889

GitHub

Debias Estimation loss by sdbds · Pull Request #889 · kohya-ss/sd-s...

paper:https://arxiv.org/pdf/2310.08442.pdf
just change for SNR weight like min-snr-gamma
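
For comparison, a small sketch of the two loss weights as I understand them from the paper and the PR: min-SNR-gamma scales the loss by min(SNR, gamma) / SNR for noise prediction, and the debiased estimation loss scales it by 1 / sqrt(SNR). The clamp at 1000 is from my memory of the implementation, so treat it as an assumption; the function names are mine.

```python
import math

def min_snr_weight(snr, gamma=5.0):
    """Min-SNR-gamma loss weight for noise prediction: min(SNR, gamma) / SNR."""
    return min(snr, gamma) / snr

def debiased_weight(snr, max_snr=1000.0):
    """Debiased estimation loss weight: 1 / sqrt(SNR), with SNR clamped."""
    return 1.0 / math.sqrt(min(snr, max_snr))

# High SNR means little noise in the image, low SNR means a lot of noise.
for snr in (0.1, 1.0, 5.0, 25.0, 100.0):
    print(snr, round(min_snr_weight(snr), 3), round(debiased_weight(snr), 3))
```

Both down-weight the low-noise (high SNR) timesteps; the difference is that min-SNR leaves everything below gamma untouched, while the debiased weight also boosts the very noisy timesteps.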

hollow spruce Nov 12, 2023, 4:52 PM

#

stiff dust but what is good captioning for subject training? I'm not a big fan of the dream...

subject (person or new outfit with new clothing/accessories) training is painful no matter how you do it thanks to sdxl :/
"good captioning + practices" to basically get the best result that sdxl + current training tools can offer is a lot of effort.
In return you get a lora where 5/5 images represent your subject between "good enough" and "utterly perfect"
^ my take on shirogane -> 400ish images + manual tagging + regularization images

Personally I'd rather do 20% of that work, to achieve a 'good enough' result, where I invest the same time into gathering the dataset, but keep my captioning down to keyword + automated
In return I get a lora where 3/5 images are good enough. 1/10 is "utterly perfect"
^my take on 2B cosplay -> 400ish images + keyword + automated tagging / no regularization images

(all of this without relying on absurd dimension settings, to hide any dataset issues)
also in regards to composition, all my (manually tagged) loras follow wording/composition better than default sdxl, so my manual tagging is definitely working

#

#

<subject name>, girl, race, type of photo | optional: unique things about this image - like is it a cosplay? is in front of a window?
important about the optional keywords, as well as the mandatory ones, is to have a regularization set, that is ALSO all tagged in the same style, with the same keywords appearing as well

in doing so, <subject name> actually gets trained on the unique details that make this person, this person.
by having different races in my reg set, it learns what features are racial, which are unique to this person. (also retrains some racial details, which are wrong in default sdxl)
type of photo is important, since it absorbs the background/colors/general composition of the image, rather than letting it drift into the subject name. <- but this only works cause my reg set is ALL tagged in the exact same way by hand

hollow spruce Nov 12, 2023, 5:18 PM

#

for anyone else reading this: there are a lot of ways of doing this, and they are all correct
this is just the best option for me, based on my datasets and interests, and long-term goals with training sdxl

digital dune Nov 12, 2023, 9:52 PM

#

stiff dust I have very contradicting experiments with text encoder training. Sometimes it o...

The lower batch size theory is not a theory but a fact

#

I cant remember where I saw the documentation but it is definitely true

#

People often recommend never doing more than batch size 2, the REAL question is... how much worse does it go from BS2 to BS3? Or from BS1 to BS2? I wish there was a wiki for all of this but I guess since this is all relatively new technology we are all stuck on the trial-and-error phase.

#

Btw. Tysm for that info. It is SO hard to find help on these topics sometimes.

stiff dust Nov 12, 2023, 10:00 PM

#

digital dune The lower batch size theory is not a theory but a fact

yes, it's just totally unclear to me why this is the case 🤷‍♂️ I don't know such a phenomenon from any other machine learning problem

latent charm Nov 12, 2023, 10:08 PM

#

Does it mean low bs would be great to learn the fine details?

#

I usually use high bs which is great for the shape but not much for the fine details.

digital dune Nov 12, 2023, 10:16 PM

#

latent charm I usually use high bs which is great for the shape but not much for the fine det...

I think, but dont quote me on this, that you can mitigate the effects of high BS if you set the gradient accumulation steps to match

#

It was a long time ago since I tried it, but I remember having good results going BS8 and gradient acc steps to 8 as well. It is technically slower than GAS1, but it did work way back when I tested it. Might be worth a shot

stiff dust Nov 12, 2023, 10:53 PM

#

digital dune I think, but dont quote me on this, that you can mitigate the effects of high BS...

yes, gradient accumulation is the slow and less vram hungry variant of batch size. But it does not work "better". If high batch size won't work for you, gradient accumulation won't work either.
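
A tiny numeric check of that point, as a sketch: for a mean loss and equally sized micro-batches, averaging the accumulated gradients gives the same gradient as one big batch. It uses a toy one-parameter model, nothing SD specific, and the function names are made up. By the same logic, BS8 with 8 accumulation steps should behave like an effective batch of 64, assuming the trainer multiplies the two as usual.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    grads = [
        grad_mse(w, xs[i:i + micro_batch_size], ys[i:i + micro_batch_size])
        for i in range(0, len(xs), micro_batch_size)
    ]
    return sum(grads) / len(grads)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
full = grad_mse(0.5, xs, ys)               # one batch of 8
accum = accumulated_grad(0.5, xs, ys, 2)   # 4 micro-batches of 2
print(abs(full - accum) < 1e-9)  # True
```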

tall condor Nov 12, 2023, 11:47 PM

#

@stiff dust when i train with dreambooth i never add my own prefix

#

i just tag the images naturally and create my own model. it works really well. i know the demos tell you to use a unique prefix but IMO it's bullshit. it really makes no sense at all as long as you really finetune a model

#

if you do it properly you are not breaking your model by using dog instead of xyz dog

#

it will just make the dog the way you train the model your dog

#

and regarding batch size. all our batch sizes are low xD even 5-6 are low. allmost makes no difference than 1 or 2. i recommend lower LR with higher batch sized and more epochs

stiff dust Nov 13, 2023, 12:00 AM

#

tall condor if you do it properly you are not breaking your model by using dog instead of xyz...

the whole idea of using "xyz dog" is to keep the model from turning every dog into your dog. Also, it can help training if you add some rare tokens that are not yet used for anything

tall condor Nov 13, 2023, 12:00 AM

#

yes i know but that kind of kills the concept of training a model IMO

hollow spruce Nov 13, 2023, 12:00 AM

#

stiff dust yes, it's just totally unclear to me why this is the case 🤷‍♂️ I don't know suc...

sure this isn't just adafactor or the adaptive rates messing things up? cause on constant I never get this

stiff dust Nov 13, 2023, 12:01 AM

#

tall condor and regarding batch size. all our batch sizes are low xD even 5-6 are low. allmo...

yes, but that is not the point. Lower batch size just works better for whatever reason. Using lower learning rate with higher batch size is also super strange - you do it the other way around usually

tall condor Nov 13, 2023, 12:01 AM

#

i recommend using cosine with 10% warmup

stiff dust Nov 13, 2023, 12:01 AM

#

hollow spruce sure this isn't just adafactor or the adaptive rates messing things up? cause on ...

what do you mean by constant? constant learning rate schedule? Yes, I always use that

hollow spruce Nov 13, 2023, 12:02 AM

#

I'm still doing constant with batch 8 or 10, and have no issues with overfitting. (do note that my smallest datasets are 300 images, and all are genuinely different from one another)

tall condor Nov 13, 2023, 12:02 AM

#

you can use constant with warmup

#

but you should use a small warmup phase

hollow spruce Nov 13, 2023, 12:02 AM

#

yeah, I do 5% warmup
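
For reference, a minimal sketch of the two schedules being compared here: linear warmup followed by either cosine decay or a constant rate. The function name, its parameters, and the linear shape of the warmup are assumptions for illustration; real trainers may ramp or decay slightly differently.

```python
import math

def lr_at_step(step, total_steps, base_lr, warmup_fraction=0.1, schedule="cosine"):
    """Learning rate at `step`: linear warmup, then cosine decay or a constant rate."""
    warmup_steps = int(total_steps * warmup_fraction)
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    if schedule == "constant":
        return base_lr
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example values only: 3000 steps, base LR 4e-4.
for step in (0, 150, 300, 1500, 3000):
    cosine_10 = lr_at_step(step, 3000, 4e-4, warmup_fraction=0.10, schedule="cosine")
    constant_5 = lr_at_step(step, 3000, 4e-4, warmup_fraction=0.05, schedule="constant")
    print(step, cosine_10, constant_5)
```

Either way, the first few percent of steps run at a reduced rate, which is the part meant to soften early overfitting.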

stiff dust Nov 13, 2023, 12:02 AM

#

it might depend on what you are training. It's just: training subjects works better on super low batch size. I experienced that now for many different subjects and training setups.

tall condor Nov 13, 2023, 12:02 AM

#

that's fine. just to tackle the early overfitting

#

subjects, faces, expressions. basically most things except styles

#

i mostly use learning rates between 1e-8 and 1e-6 and 200-300 epochs

#

with random crop, flip augmentation and at least 0.03 of noise

hollow spruce Nov 13, 2023, 12:04 AM

#

stiff dust it might depend on what you are training. It's just: training subjects works bet...

while my shadowheart lora worked out great, I guess I'll retry it with batch 1. see if there's an improvement

tall condor Nov 13, 2023, 12:04 AM

#

also shuffle caption is very important

#

for loras make sure to train the UNET like 10 times more than the text encoder

#

and for my cases lower learning rates with more epochs really do the trick

#

for loras that are like for faces i sometimes go down to 1e-8 for the text encoder even
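
A rough sketch of what caption shuffling means for comma-separated tags, assuming the first few tags (for example a trigger word) should stay in place. The function and its `keep_tokens` parameter are hypothetical helpers written for this note, not any trainer's actual implementation.

```python
import random

def shuffle_caption(caption, keep_tokens=0, rng=random):
    """Shuffle comma-separated tags, leaving the first `keep_tokens` tags in place."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], list(tags[keep_tokens:])
    rng.shuffle(rest)  # in-place shuffle of the non-fixed tags
    return ", ".join(fixed + rest)

# Each epoch would see the same tags in a different order.
rng = random.Random(0)
for _ in range(3):
    print(shuffle_caption("xyz dog, big ears, dark fur, small tail", keep_tokens=1, rng=rng))
```

The idea is that the model should not learn to depend on tag position, only on which tags are present.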

hollow spruce Nov 13, 2023, 12:06 AM

#

tall condor for loras make sure to train the UNET like 10 times more than the text encoder

this depends on a lot of factors, and won't help without the context of your tagging style / dataset size / training settings

tall condor Nov 13, 2023, 12:07 AM

#

with 200 epochs

#

yes you are absolutely right

stiff dust Nov 13, 2023, 12:08 AM

#

all that depends on a lot of factors.
I have the feeling there is no single right way of training. Seems to be different for every dataset and every problem you deal with.

tall condor Nov 13, 2023, 12:08 AM

#

in my experience it all stands and falls with the tagging

#

the better the tagging the better the result

#

atm my issue is that there are really not that many taggers that can produce tokens

#

i am using deepdanbooru but that is very much anime-based

hollow spruce Nov 13, 2023, 12:11 AM

#

deepdanbooru also does double concept words. basically, if you use two or more words to describe the same thing, you eventually mess up both of them :/

stiff dust Nov 13, 2023, 12:11 AM

#

I always use natural prompts, but I should give tagging a chance

tall condor Nov 13, 2023, 12:12 AM

#

tagging is really helpful

#

solved a lot of issues in my models

#

but it's really hard work

stiff dust Nov 13, 2023, 12:12 AM

#

what convinces me is that tagging makes it easier to control if images are consistently prompted and if same tags appear in regularization images
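
A small sketch of that kind of consistency check, assuming captions are plain comma-separated tag strings: it lists tags that appear in the training captions but never in the regularization captions. The function names and the `ignore` parameter are made up for illustration.

```python
from collections import Counter

def tag_counts(captions):
    """Count in how many captions each (lowercased) tag appears."""
    counts = Counter()
    for caption in captions:
        counts.update({t.strip().lower() for t in caption.split(",") if t.strip()})
    return counts

def missing_in_regularization(train_captions, reg_captions, ignore=()):
    """Tags used in training captions that never occur in regularization captions."""
    train = tag_counts(train_captions)
    reg = tag_counts(reg_captions)
    return sorted(t for t in train if t not in reg and t not in ignore)

train = ["xyz dog, dark fur, grass", "xyz dog, big ears, indoors"]
reg = ["dog, dark fur, grass", "dog, indoors"]
# The subject token is expected to be missing from regularization, so it is ignored.
print(missing_in_regularization(train, reg, ignore=("xyz dog",)))
```

The same counts can also be used to spot tags that appear only once or twice, which often points at inconsistent tagging.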

tall condor Nov 13, 2023, 12:12 AM

#

i tagged most of my images by hand

hollow spruce Nov 13, 2023, 12:12 AM

#

automated always results in meh to good enough results. manual tagging gets me there all the way, but oh god it takes long. and I don't even want anyone to learn the horror that is hydrus network, just for the sake of fast manual tagging x_x

tall condor Nov 13, 2023, 12:12 AM

#

with 5-20 tags

#

and then use deepdanbooru to enhance that

#

i even made a program for that

#

that can weight concepts and respect the tags in them

stiff dust Nov 13, 2023, 12:13 AM

#

but manual tagging is still much faster than manual prompting

tall condor Nov 13, 2023, 12:13 AM

#

depends on the amount of images

#

my biggest models have like 50k images

#

really no fun to tag that by hand

stiff dust Nov 13, 2023, 12:14 AM

#

with so many images you can automate that I guess

tall condor Nov 13, 2023, 12:14 AM

#

not really, as none of the taggers will tag what you are looking for

#

like if you want a dog with big ears

#

no tagger will tag that for you

#

or small tail

#

and so on

#

they will tag dog, dark fur for you tho

hollow spruce Nov 13, 2023, 12:15 AM

#

writing your own python script to tag based on your custom word flavor chains is your best option

stiff dust Nov 13, 2023, 12:15 AM

#

you might be able to train one on the clip embeddings

tall condor Nov 13, 2023, 12:15 AM

#

i tried to train the deepdanbooru tagger to learn my tags

#

that failed hard xD

#

also it only runs on cpu for some reason here

#

took 8 days just to not work xD

hollow spruce Nov 13, 2023, 12:16 AM

#

tall condor also it only runs on cpu for some reason here

rip

#

but yeah. training your own classifier wasn't very optimized, last I checked

tall condor Nov 13, 2023, 12:16 AM

#

what rank are you guys using for lora?

hollow spruce Nov 13, 2023, 12:16 AM

#

so unless you rent an A100 cluster, you can forget about it XD

stiff dust Nov 13, 2023, 12:16 AM

#

16-32

#

for text encoder rather 4

tall condor Nov 13, 2023, 12:17 AM

#

im on 128 atm but it generates a very inflicting model in some cases

#

A100 cluster for sure bro xDDD

hollow spruce Nov 13, 2023, 12:17 AM

#

8~32 for normal loras
64~128 for my master dataset (currently at 6k images manually tagged -> goal is around 30k)
256 to make a point of why not to use 256 XD

tall condor Nov 13, 2023, 12:17 AM

#

im happy i can afford 4x 4090

#

i haven't even started with sdxl yet

#

because that would probably train for half a year on that

#

my largest model takes 3 weeks on 4x 4090

#

with 1.5

hollow spruce Nov 13, 2023, 12:19 AM

#

tall condor im on 128 atm but it generates a very inflicting model in some cases

at 128 every small error will kill the general capabilities of sdxl x_x

tall condor Nov 13, 2023, 12:20 AM

#

i once converted a 128 model to a rank 4 model

#

almost no difference

hollow spruce Nov 13, 2023, 12:20 AM

#

not that the lora isn't working, just that standard sdxl loses its composition & detail afterwards, unless you train to reinforce it again inside your lora

tall condor Nov 13, 2023, 12:21 AM

#

im a bit scared of sdxl yet xD

#

still struggling with 1.5

#

but the sdxl results look amazing

hollow spruce Nov 13, 2023, 12:21 AM

#

worst part is that the tutorials have a lot of critical false info included 😭

tall condor Nov 13, 2023, 12:22 AM

#

but that's the same for 1.5

#

so much trial and error

hollow spruce Nov 13, 2023, 12:22 AM

#

basically, rank 32 is the highest you can go without damaging sdxl. anything higher than that, and you need to account for the downsides in your lora, to retrain those

tall condor Nov 13, 2023, 12:23 AM

#

have you trained dreambooth on sdxl?

hollow spruce Nov 13, 2023, 12:23 AM

#

(not that that matters if you don't care about sdxl general capabilities)
artwork / anime / remaking your source images in slight variations XD