#🔧|finetune

stiff dust
#

if you show it images of your car you narrow this view and the model will no longer be able to create cars that don't look like yours.

tall condor
#

so what i need to do is to specify my car better and more clearly, right?

stiff dust
#

using something like "sks car" you basically tell the model that this is just one specific car

#

and the model will still be able to draw, let's say, comic images of your car

tall condor
#

but in theory if i leave the instance prompt empty i can retrain "car" like that, right?

#

what is confusing me tho is the link between instance and model, lets say my instance prompt is xyz and what i want to train is "red car" so i build the advanced captions like "xyz red car driving on the street" and "image of xyz red car on the motorway"

#

and if i dont use xyz in the advanced captions will the model still be able to match my car?

stiff dust
#

no, but that's a good thing

#

you don't want the model to forget what a usual car is

tall condor
#

where does my own instance stop? if i have "xyz red car" and "xyz headlights of car on the motorway", how does the system know my model is "xyz red car" and not "xyz headlights" or just "xyz red"?

stiff dust
#

it's magic 🤷‍♂️
very likely it will associate the words xyz and car together in a sentence

tall condor
#

yea i figured there is a lot of magic here 🙂

stiff dust
#

but that depends a bit on your training captions. If you always write "xyz car" then it will more likely get stuck to that

#

that's why I prefer to randomize training captions as much as possible

tall condor
#

so in reality if i want to create my xyz car model i need to have a base model that really is the type of car i want as xyz car and then specialize it with advanced captions but always pointing back to xyz car?

#

there is a checkbox in kohya_ss "randomize captions" maybe i try that

#

when you say randomize you mean like really randomize or just rephrasing so that it still makes sense

stiff dust
#

it will probably split the sentence by "," and shuffle

tall condor
#

what is the "," doing when it is tokenized

stiff dust
#

I mean with randomize to avoid fixed patterns

tall condor
#

is it like a tag separator?

stiff dust
#

like not caption everything with "xyz car, from front, on street". Also sometimes use "front view of car xyz on street" and so on
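
A minimal sketch of the "split by comma and shuffle" idea mentioned above. This is an illustration only, not the actual kohya_ss implementation; the function name `shuffle_caption` and the `keep_tokens` parameter are hypothetical, chosen so the subject tag (here "xyz car") can stay in front while the rest varies.

```python
import random


def shuffle_caption(caption, keep_tokens=1, rng=random):
    """Shuffle comma-separated tags, leaving the first `keep_tokens` tags in place."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # in-place shuffle of the non-fixed tags only
    return ", ".join(fixed + rest)


rng = random.Random(0)  # seeded so the example is repeatable
for _ in range(3):
    print(shuffle_caption("xyz car, from front, on street, daylight", rng=rng))
```

Tag shuffling only reorders the same words. Rewording whole captions ("front view of car xyz on street"), as suggested in the message above, still has to be done by hand or by a separate captioning step.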

stiff dust
#

and don't overthink it 🤷‍♂️ just try and learn from errors xD in the end every training dataset behaves a bit different anyways

tall condor
#

yea but each cycle takes like 24 hours of training so trial and error is really not that fun xD

#

so the more i understand the better i can prepare xD

stiff dust
#

then first try to train for a few hours only and check how far you get

tall condor
#

i already did that and this is why those questions come up 🙂

#

training each image 10 steps in 10 epochs is the same result as training each image 100 steps in 1 epoch?

stiff dust
#

it didn't work out?

tall condor
#

it's getting there but i still have issues with overfitting or underfitting

stiff dust
#

oh, and one very important thing: use a good prompt for testing

tall condor
#

i used a learning rate of 5e-7 and for some cases it's still overfitting and some are underfitting and that is kind of frustrating

stiff dust
#

like I found that using "xyz car" as prompt I get shitty images all the time, and as soon as I get good images it already overfits

tall condor
#

yes i kind of have the same issue, how did you solve that?

stiff dust
#

so first experiment to find a good prompt that gives you a nice picture of any car and then use the same prompt for your car

tall condor
#

ah ok i see

stiff dust
#

like these crazy "photography of a red car, masterpiece, perfect angle, blablablabla"

tall condor
#

i was hoping i can make a very robust model that is not that specific and thus i used 5e-7

#

so that it is slowly learning the concept

stiff dust
#

and then replace "red car" by "xyz car"
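
The testing approach described here (find a prompt that gives a good generic car, then swap only the subject) can be sketched as a small template helper. The names `make_test_prompts` and the `{subject}` placeholder are hypothetical, used only for illustration; "xyz car" is the instance phrase from this conversation.

```python
def make_test_prompts(template, class_phrase, instance_phrase):
    """Return (baseline, subject) prompts that differ only in the subject phrase."""
    if "{subject}" not in template:
        raise ValueError("template must contain a {subject} placeholder")
    baseline = template.format(subject=class_phrase)
    subject = template.format(subject=instance_phrase)
    return baseline, subject


template = "photography of a {subject}, masterpiece, perfect angle"
baseline, subject = make_test_prompts(template, "red car", "xyz car")
print(baseline)  # run this on the base model first to check the prompt itself
print(subject)   # then run this on the finetuned model
```

Keeping everything but the subject identical makes it easier to tell whether a bad result comes from the prompt or from the training.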

tall condor
#

is the target to have as many different input images with an as detailed caption as possible or can i expect that at some point the system just learns my car and can map it on other cars?

#

and how can i tackle the issue that the model is mainly overfitting, but some specifics that i have added pictures and captions for are underfitting in the same model

#

🙂 so many questions - i really like that stuff

stiff dust
#

it should... I mean, sometimes it can do that with 5 images already 🤷‍♂️ So I think 1000 images are more than enough

tall condor
#

for the overfitting and underfitting - shall i just increase the number of steps for the concepts that are underfitted or shall i just keep training more with a lower learning rate?

stiff dust
#

I think your learning rate is as low as possible 😅

tall condor
#

so if it is now overfitting i use too many steps of a particular concept?

stiff dust
#

but yes, you might try to add more examples of the images which still fail to the training data

tall condor
#

i was thinking of reducing the per image step sizes by /10 and then train 10 epochs while saving each epoch to see what is going on but im not sure if training 100 steps in the same epoch is the same as 10x10 epochs
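
A rough way to reason about the "100 repeats in 1 epoch versus 10 repeats in 10 epochs" question is to count optimizer steps. The sketch below assumes the simple model discussed in the chat (each image is seen `repeats` times per epoch); the function names are hypothetical and this is not taken from any trainer's source.

```python
import math


def steps_per_epoch(num_images, repeats, batch_size=1):
    """Optimizer steps in one epoch when every image is seen `repeats` times."""
    return math.ceil(num_images * repeats / batch_size)


def total_steps(num_images, repeats, epochs, batch_size=1):
    """Optimizer steps over the whole run."""
    return steps_per_epoch(num_images, repeats, batch_size) * epochs


one_long_epoch = total_steps(1000, repeats=100, epochs=1)
ten_short_epochs = total_steps(1000, repeats=10, epochs=10)
print(one_long_epoch, ten_short_epochs)
```

Both setups give the same total number of steps, so the amount of training is the same. The results need not be identical, though: the image order is shuffled differently, and anything tied to epoch boundaries (per-epoch checkpoints, some scheduler settings) behaves differently. Saving every epoch in the 10x10 variant is mainly useful for spotting where overfitting begins.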

stiff dust
#

and experiment with captions and prompts. Like it's super hard to get the model to the point where it does everything right just with "xyz car" as prompt. If something doesn't work, try to describe it in the prompt, maybe that helps already

stiff dust
tall condor
#

so basically what you are saying is that the input prompts with proper image selection are the key, right?

stiff dust
#

not only for training, also for inference

tall condor
#

so what kohya is doing is that it allows for each folder to define how many times each image in that folder is trained

stiff dust
#

if you try if your model works and it gives you bad results, try to improve the prompt

tall condor
#

and currently i train each image in each folder between 100 and 300 times or so

#

i was wondering if i can reduce that to 10 to 30 and just train 10 epochs

tall condor
#

so i would have 30*1000 steps per epoch

stiff dust
#

just use one epoch and train each directory, e.g. 5 times

tall condor
#

if i train only 5 times each image the result is absolute crap xD

#

i found it to start working from 40 times +

stiff dust
#

40 x 1000 images? Really?

#

okay, you use an extremely low learning rate, maybe that's the reason

#

I think you should use larger learning rate and experiment a bit with less steps

tall condor
#

i found if i go to 1e-6 it is overfitting way too much even at 40 times per image

stiff dust
#

yes, so use less steps;)

tall condor
#

but less steps the images get really blurry and stuff

#

well anyways you gave me a whole lot of input thank you so much!

stiff dust
#

hm, if your caption is good it should produce good images from the start that more and more look like your training images

#

yeah, I would say try first with higher learning rates and fewer steps to experiment a bit and find a good training setup

tall condor
#

i have a very wild mix of captions, also with alot of details, like headlights, spoilers and so on

#

and i want the model to be able to also understand those parts

#

thank you very much for your time sir

#

i will do some tests and report back with the results! 🙂

stiff dust
#

sure, good luck

tall condor
#

have a nice day

pure plume
#

I wish to train for a style for the first time, where should i start?
I mean i know i should build a dataset, but is it like 20 pics, 50, 100, 200?
how do I tag such a thing?
in the style of <whatever>?
how to decide if it's TI, Lora, a model?

stiff dust
#

try TI first, then do lora, use detailed captions. "in the style of <whatever>" is fine, but not important, you can also go for "by <whatever>".

tall condor
#

anybody has a link describing how the text tokenizer works for training

#

also is there a way to extend a model rather than retraining it for something specific?

#

and if i finetune what is the best model to base on? currently im working with 1.5 and some i found on the internet that have 50% ema mixed in

spark comet
#

hi

tall condor
#

hi 🙂

unique cloak
#

you won't be able to train SDXL yet, as it is not released for download.
As for "can you", to be honest, there is almost no chance SDXL could be trained on this. I don't think you can train any model on 4GB VRAM currently, the minimum I see is 8GB for LoRA training

#

even just running SDXL has a big chance to require a super fancy GPU

#

you can train some models in google colab though, using notebooks

unique cloak
#

it's not just taking long. it's another process and tool to train, and won't run at all on CPU or on 4GB VRAM 😢
with that card, your best bet is to use google colab for it from time to time, where a small training like 15 pics will be done in 20 minutes

tall condor
#

loui i will remind you when when you make a model with 1000+ images to train on sdxl xDD

#

and it will take half a year lol

stiff dust
#

In my opinion all models have big issues with vikings. It seems that there is just no good training data for vikings. Like they all tend to give them weird horns and stuff

#

you might try different words or cultures that "feel similar", like "celtic" instead of "viking"

tall condor
#

if i train a model what is the best model to base on?

#

also is it safe to use a pruned model?

#

also shall i use ema or nonema for training

#

v2-1_768-ema-pruned.safetensors or v2-1_768-nonema-pruned.safetensors

#

or v1-5-pruned-emaonly.ckpt / v1-5-pruned.ckpt

stiff dust
#

always use ema

#

and yes, you can use pruned models. They just have some parts removed you don't need anyways

cold wyvern
#

Is loss=NaN an issue when training a SD2.1 model??

unique cloak
#

usually, loss=NaN means it's going badly yes. but it can also just bug out. this is very early in your training. unless you used a very high learning rate, this seems like a bug in display.
I would stop it and test though personally

cold wyvern
#

Any suggested rates for 2.1??

unique cloak
#

Dreambooth ? I use 6e-7 personally, with a polynomial scheduler

#

a lot lower than yours there

cold wyvern
#

kohya_ss gui

unique cloak
#

not sure then, especially if LoRA

cold wyvern
#

based on dreambooth yeah

unique cloak
#

yeah, I'm not sure how they implement it. I imagine those were the default values ?

cold wyvern
#

0.0001 was default

unique cloak
#

so 1e-4

cold wyvern
#

I'll try yours and up the epoch count

unique cloak
#

worth a try on 1e-4 already

#

but I do recommend using polynomial LR yeah

#

it makes LR reduce slowly over time

#

this is subjective, but from the tests I did on it, I find it gives better quality in smaller details
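
For reference, a polynomial learning-rate schedule can be sketched in a few lines. This is a generic textbook form without warmup; the exact variant in a given trainer (warmup steps, `power`, end value) may differ, and those parameter names are assumptions here. The starting value 6e-7 is the one quoted above.

```python
def polynomial_lr(step, total_steps, lr_init, lr_end=0.0, power=1.0):
    """Decay from lr_init to lr_end along (1 - step/total_steps) ** power."""
    if step >= total_steps:
        return lr_end
    remaining = 1.0 - step / total_steps
    return (lr_init - lr_end) * remaining ** power + lr_end


for step in (0, 250, 500, 750, 1000):
    print(step, polynomial_lr(step, 1000, lr_init=6e-7, power=2.0))
```

With `power=1.0` this is a plain linear decay; larger powers drop faster early on and spend more of the run at small learning rates.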

stiff dust
#

NaN is usually a problem of your precision

#

if you train with fp16 you need special techniques like mixed precision and gradient scaling
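
A small illustration of why fp16 training tends to need gradient scaling. It uses only the standard library (`struct` format `"e"` is IEEE half precision) rather than a real training framework, and the values `1e-8` and the scale factor `65536` are made up for the demonstration.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # Too large for fp16: a real fp16 tensor would hold +/-inf here.
        return math.copysign(math.inf, x)


tiny_grad = 1e-8   # hypothetical small gradient value
scale = 65536.0    # hypothetical loss-scale factor

print(to_fp16(tiny_grad))                  # the gradient underflows to zero in fp16
print(to_fp16(tiny_grad * scale) / scale)  # scaled first, then unscaled in full precision
print(to_fp16(70000.0))                    # above the fp16 maximum (65504)
```

Values that overflow to infinity are what usually turn into NaN a few operations later (for example inf minus inf). bf16 keeps the same exponent range as fp32, which is one reason switching from fp16 to bf16 often makes NaN losses disappear, at the cost of fewer mantissa bits.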

cold wyvern
#

had mixed precision = "fp16" in there already

#

and gradient checkpointing

#

am trying again with bf16 precision now

#

Thank you @stiff dust !

#

That's a batch size of 4 there at 768x768 on an A4000

unique cloak
#

just under a picture per second. not that bad !

cold wyvern
#

indeed, and it'll get a bit quicker as it goes on, not that it bothers me, I'm heading to bed and will collect the results tomorrow 🙂

tall condor
#

hi guys, im finetuning with dreambooth on RTX4090 but for some reason i can only do batch of 1

#

any idea what i'm doing wrong

#

as soon as i use batch of 2 my vram runs out

#

ResidentChiefNZ: i had the same issue yesterday

#

may i ask how you solved it? for me it worked switching to batch of 1 and no memory efficient attention

#

but i would really like to use batches of 2

#

does switching from fp16 to bf16 help at all on the memory?

unique cloak
#

you can't train XL for now, even on DS. 2.1 can be taught specific things if you need to.
For a professional pipeline, this still seems like a completely valid way to go, and will for quite some time
Same for SD 1.5, it has such a big user base, there are very specialized models doing wonders on those subjects.

I love SD XL, and I really like that you spread that love too, but let's not antagonize other users either over what they use if they want to use it.
1.5 is far from an abandonware, 2.1 is not for old fart with the results it can give, let's water it down and enjoy what we each like, if it's cool with you

#

Using XL doesn't train it, no. it does help for sure on giving feedback on different things, like the tokens you use the most, and stats of use or even just all the great art you've been sharing since you came around, but other ways like pickApick, where human feedback is given back to the machine, are also great ways to help on it

The comparison you are proposing is not directly possible though. A prompt is tuned for the model it's targeted at, using the tokens that "resonate" the most with it. The good question there would be, for a given wanted result, what quality can you achieve by tinkering with the prompt. But this becomes an unfair question, putting XL at a disadvantage since it stays generalistic, and not specialized on a given result

Here is an example where we compared this with TwoDukes, one of those currently finetuning SDXL, if you really want :
SD 1.X results : #🍥|anime message
SDXL results : #🍥|anime message
we'll get more "on point" results from the models trained on the specific prompt, even if SDXL does really really great on it too, but importing more from styles that weren't prompted (like 3D in this example)

It's gotta be kept in mind that SDXL isn't out yet, it's just the beta, and that beta ROCKS, I'm all with you, and I have great hopes for it

unique cloak
#

but to give another comparison I just did

#

prompt is "a realistic picture, professional portrait of a cat octopus creature wearing a suit, unmythical creatures"

#

and this second example is on a 1.5 trained model

median flax
#

The checkpoints on Civitai based off 1.5 beat pretty much everything I've ever seen

cold wyvern
#

As in the community trained/merged checkpoints found on CivitAI.com

median flax
#

You need to get the right one for the right job though

tall condor
#

FYI if i switch to bf16 instead of fp16 i can use batches of 2

#

can i expect changes in the result based on this change?

stiff dust
#

sorry dude, but you seem to have no understanding how ai works 😅

cold wyvern
stiff dust
#

oh sorry, I was referring to louis strange monologue above

stiff dust
#

dude, your comparison means nothing. You can make photos with 1.5 with much better quality than vanilla SDXL, just use the right model and the right prompt

#

Research Model - How to Build Protogen ProtoGen_X3.4 - Enbrace the ugly, if you dare... By Downloading you agree to the Seek Art Mega License , and...

NEW: Download the new User Guide here: RPG User Guide v4.3 Available on: Originally posted to HuggingFace by Anashel Mage: https://www.mage.space/u...

Deliberate All in One / Any Case Version This model provides you the ability to create anything you want. The more power of prompt knowledges you h...

cold wyvern
stiff dust
#

haha, right. Have fun with him ;D

cold wyvern
#

I'm just warning you before you make the same mistake I made!

unique cloak
#

a lot of people went on that argumentative road already.
I think XL is promising, but not the joker that beats everything either.
But let's just agree to disagree and all enjoy our tools, is what I landed on there :p

tall condor
#

why ppl compare sdxl to 1.5 and not 2.1?

#

is 2.1 really that worse than 1.5?

stiff dust
#

depends. Many people like 1.5 more

unborn rain
#

well

stiff dust
#

I would say 2.1 often makes more and better details. However, if you use a good custom model that doesn't matter so much

#

1.5 on the other hand is better in drawing humans, however, 2.1 is not as bad in that as many people say. It just cannot draw nude people

tall condor
#

ah i see

stiff dust
#

some people say 2.1 is a bit more overfitted and less versatile. Hard to say if that is true, though

tall condor
#

from my tests 2.1 draws faces much better than 1.5

#

and with much more detail

stiff dust
#

but I would say that, indeed, 2.1 gives me more of the same look (e.g. its hard drawing people full body, it always wants them close-up, while 1.5 seems a little bit more versatile)

#

yeah, as said, in the end nobody really uses the base models, but custom models like dreamshaper, deliberate and so on. They are also much better in details and faces

tall condor
#

what model do you recommend to finetune on?

#

dreamshaper looks interresting

#

when i finetune do i use the vae version?

#

i still dont really understand that part

cold wyvern
#

I finetune on the base models, as I know they still have their ema data, whereas most of the models have been pruned and may not have the training data

hot crow
cold wyvern
#

Fool's errand - a) quality is subjective and subject to bias; b) some of the fun of AI art is the "pull the handle" pokie machine: roll it and see if you get a good one or not; and c) SD2.1 has long since moved on from being just one model - there are dozens of finetunes out there (thousands for 1.5 models)

That said - I pressed generate once - got this

#

We aren't saying that SD XL is bad; nor that one should not support Stability as they are awesome - we are just saying that there is far more to AI art, and we need the resources of all involved to make this the best it can be...

#

SD2.1 was trained on 768x768...

#

And besides.. that wasn't what you asked for...

#

well each of the individual images in that plot is 640x960

#

dude.. I'm going to politely tell you where you can take your rule...

#

you've been beating this drum for 4 days! it's time to let it go

unique cloak
#

I think they last said they feel it's time to let it go, not sure they are looking for more comparison, nor that you are stupid there.
From what I read, it feels like you both have a different opinion and failed to convince the other or hear them out.

#

I do feel, like I explained yesterday, that your rules don't make sense.
Neither does trying to see what's the best model, for that matter, since the possibilities of training are different, and they can't be integrated in the same pipelines because of it.

#

yes seriously. You start from random noise, so to be fair on the number of pictures, 1 pic only doesn't make sense. it's the average aesthetic score of hundreds of gens that you would need to compare. but it's far from the only point that doesn't make sense in comparing them: parameter counts, different text encoders, finetunes to prompts, loras, hypernetworks, ...
There are numerous differences that make comparing them a statistical nightmare if you want to be fair
If you want to be practical, each has a different target and use on the market currently, the use cases they are intended for, or at least used for, diverge completely
Because of it, each will be the best in a different category

To keep your analogy, you are comparing 2 different sports golden medals at the Olympics

#

try to use reasoning maybe to answer me if you want to debate. this isn't far fetched, it's the practical world people coming for help on this server describe. I'm not attacking you there, you don't need to "screw" everything.
Anyway, since XL can't be finetuned at all currently, let's at least diverge this to #🏞|general-with-images and not clog this channel.

#

finetuning is not modifying the prompt.
#🔧|finetune is the channel for dreambooth, lora, textual inversion, controlnet, and other kinds of finetuning techniques
Prompting work is essential on any model, but you are mixing up the terminology. we mostly work on good prompting techniques, and ways to build coherent prompts, in #📝|prompting-help

unique cloak
#

still not finetuning, please stop this contest as you've been asked multiple times now. nobody is asking for fair rules in this, you can keep having fun with it if you want, but move this either to #✨|sdxl or to #🏞|general-with-images .
This channel still isn't the place for what you are doing.

#

I think it's still not cool to keep using the wrong channel intentionally there, no

#

I didn't subscribe to the rules of your comparison
you did accept the rules of this server though. I'm explaining to you that you are in the wrong channel and need to change channel to respect other users, as I'm here right now to help those rules get applied to everyone.
So keep ignoring me and posting in finetune to prove your point and get timed out, I'm not sure what you are looking for there

unique cloak
#

I have 10k credits currently, and used around 2k credits on SDXL currently.
I know what I'm talking about also on finetuning, having been on this server for months now, and being a moderator on it for around 6 now. So, the theoretical place to talk about a given subject, and find people willing to talk about that subject, I know quite well. And I'm telling you, this is not fine tuning you are doing.
It's prompting, and yes it's essential. As well as working on your sampler, scheduler, steps and all other available settings there.

#

I'm saying this one last time. stop sharing non fine tuning on this channel. I'll time you out, I can only warn you. I am giving you every chances to please not come to this end and move to #✨|sdxl or #🏞|general-with-images

unique cloak
#

but thanks

paper knot
#

I was going to ask something related to LoRA but I got hooked by this surreal conversation, I even forgot what I was going to say

unique cloak
#

nope, don't start this again lol

#

still not the good channel, and still not a drama server

cold wyvern
#

Did anyone have a working script for merging sd2.1 lora INTO a model?

prime briar
#

left is the training data, right is what i get after training a lora from it. are there any lora parameters i should adjust to get more accurate results? i already tried the basic steps, cfg, and tried making the aspect ratios match
idk if i would have to use one of those img2img tools or inpainting tools, or switch to something like textual inversion?

#

also could it be that AI just struggles on guns like how it generates hands? idk anymore

stone garden
#

Oi

olive vapor
#

I'm training dreambooth on generating a character, though the results come out really, uhh, weird

#

I've got a dataset of about 70 images

#

I'm not sure if my config is correct but my settings go like this
instance token: a photo of (name) person
class token: a photo of a beautiful woman

instance prompt: a photo of (name), and then a whole bunch of variables like high quality and all (which give me photorealistic results when I use it in txt2img)
class prompt: a photo of a beautiful woman, then the same parameters
classification image negative prompt is regular stuff like bad hands, bad quality, whatnot (again this stuff gives me photorealistic results in txt2img)

sample prompts is a txt file with prompts corresponding to my dataset images
sample image prompt is blank (not sure what to fill in there)
sample negative prompt is the same negative prompts like before

#

I did set up my class images in this way: each image is named like 001 002 003 etc, and in a separate folder I have a bunch of txt files also named that, so for example txt file 001 has the prompt corresponding to image 001
But I'm not sure how to make dreambooth read in the correct txt file, so what I did was put all those prompts into one txt file which I now read in alone

unique cloak
#

hello there JoJoCa 🙂 Happy to see you post your problem.
You are doing a lot of things right here, but some are double work for nothing, and there are some small errors that hamper your quality

olive vapor
#

Thanks for the reply, yeah I probably did weird stuff, pretty new to SD

unique cloak
#

first of all, if you are training on a single character, usually, 10 to 20 pictures is a good target. Anything above is a danger getting bigger and bigger, because things start to repeat in the pictures and you don't want that : you want variety
So first step I would take, is select the 15 best pictures of that dataset, with varied clothes, lighting, background, pose, and framing (close up, full body, ...)

#

then, if you are using instance prompt and class prompt like you are, then there is no need for a caption file next to the pictures, this is double the work for nothing.
It looks inside the files when you check the option to do so. if not, it takes the class/instance prompt to train

#

next, the "instance token" is not what you are using, it's just name

#

as for class token, in this case, I would use woman

#

it's a "single token" you want in those

#

the prompts in class and instance prompt will help build on it, but those are the main concepts you are targeting : your new token name and the class token woman

#

sample prompt seems good, but it's just a control measure anyway, it's something that shows you during training, what would the result be currently if you were to run that "sample prompt" on the model

#

so it lets you test how good the model performs

#

usually, only 2 or 3 prompts are enough

#

like

portrait picture of name
drawing of name, very detailed, half body shot
full body shot of name

#

(ask if I said something that is not understandable)

olive vapor
#

That makes sense thank you, I dont really understand the part where you said it looks inside the files

unique cloak
#

the captions files you created, the .txt files. Those aren't used at all if you don't check the corresponding checkbox in the UI (not sure of its name)
If you do then, instance prompt becomes unused, it instead takes what's in the txt file linked to each picture
There is a last method available, it's using [filewords] as instance prompt. this will make it so the name of your picture will be used. I personally use that : I write the caption of each picture directly in the filename, like "painting of a cat.png"

The whole goal of those 3 things (caption files, instance prompt, and [filewords]=>filenames) is the same : provide a prompt to train each picture on
The instance prompt is intended to be used when you want the same prompt for each picture. In your case, it's what I would recommend, and I would use a very simple instance prompt : name, nothing else. (with your name of course 😉 )
The caption file and the [filewords] method have the same goal : letting you have a different caption per picture. This can be very potent, especially on bigger trainings, but I found it to be overhyped and more complicated to use correctly, so not the best for single subject training
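
The three caption sources described above (caption files, a fixed instance prompt, and `[filewords]` taken from the filename) can be sketched as one lookup. This follows the description in this message only; it is not the Dreambooth extension's real code, and the function name `resolve_caption` and the `use_caption_files` flag are hypothetical stand-ins for the UI checkbox.

```python
from pathlib import Path


def resolve_caption(image_path, instance_prompt, use_caption_files=False):
    """Pick the training prompt for one image, following the three options above."""
    image_path = Path(image_path)
    caption_file = image_path.with_suffix(".txt")  # e.g. 001.png -> 001.txt

    # 1) Per-image caption file, only if the (hypothetical) checkbox is on.
    if use_caption_files and caption_file.is_file():
        return caption_file.read_text(encoding="utf-8").strip()

    # 2) [filewords]: the filename (without extension) is the caption.
    if "[filewords]" in instance_prompt:
        return instance_prompt.replace("[filewords]", image_path.stem)

    # 3) Otherwise every image gets the same instance prompt.
    return instance_prompt


print(resolve_caption("dataset/painting of a cat.png", "[filewords]"))
print(resolve_caption("dataset/001.png", "name"))
```

For single-subject training as recommended here, only the third branch is needed: one short instance prompt shared by all pictures.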

olive vapor
#

I see, thanks, but if I make my images a simple caption, how do I make it still use the other part (with all the variables like high quality, 4k, ultra res, ....)

unique cloak
#

well, the good thing is : the base model you are training already knows those parts, you don't need to train those

by providing 15 very varied photos of you, and just saying to Dreambooth "learn this : it's 'JoJoCa'", dreambooth will learn each pic as you, and try to find the common part, the part that is in each and that could fit into that "JoJoCa" token.
It will automatically discard what is changing, and not learn the wall behind you if it's a different background each time for example, same for your clothes if you have different clothes.
And it will put everything it finds in common in those pictures inside that single token, JoJoCa.

So when you later use that model, you can prompt "JoJoCa", and get already a "mostly valid but bad" picture of yourself
Then you add to your prompt all the other tokens, the 4k, the realistic, ... and you get the results you wanted

olive vapor
#

I see, thanks a lot, that makes it clear

#

Last question (for now 🗿 ), whats the difference between instance prompt and class prompt (I'm now using the image caption for instance)

cold wyvern
#

instance prompt is the token it's going to use, and the class is the token that it would replace, is how I understood it

#

i.e instance of Emma Watson and class of woman -> the prompt "masterpiece, a woman" would be trained to be equal to "masterpiece, Emma Watson"

unique cloak
#

class is supposed to represent something larger than your instance. just "woman" in your case, or even "person"
this is called "prior preservation" or "regularization data", the class itself, and is completely optional by the way
It's what helps the model remember what a random woman is, and not replace every woman with your face too fast in the model

cold wyvern
#

wonder where I got my info from then :S

unique cloak
#

you are also right

#

the regularisation data is trained too

#

it's a "second concept"

#

trained at the same time as your main concept
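
How the class ("regularization") data is "trained at the same time" can be sketched as a combined loss, in the spirit of DreamBooth's prior-preservation term. This is a toy version on plain Python lists, not a real diffusion training loop; `prior_loss_weight` and the example numbers are assumptions for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_loss_weight=1.0):
    """Instance loss plus a weighted loss on the class (regularization) images."""
    instance_loss = mse(instance_pred, instance_target)  # "learn my subject"
    prior_loss = mse(class_pred, class_target)           # "remember a generic woman"
    return instance_loss + prior_loss_weight * prior_loss


loss = dreambooth_loss([0.2, 0.4], [0.0, 0.0], [0.1, 0.1], [0.0, 0.0])
print(loss)
```

Setting `prior_loss_weight` to 0 corresponds to training without class images, which matches the remark that this part is optional.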

cold wyvern
#

oh sweet lol, last thing I want to do is spread bad info!

unique cloak
#

it's complicated, but yeah, it was all right from what I understand of it too

olive vapor
#

Thanks a lot, lets hope it works well now 🙂

olive vapor
#

Hmm weird thing, I have class images per instance image set to 5 but its only generating 45 images (I have 25 instance images)

unique cloak
#

you already had some of it generated in a previous try maybe ? not sure, not the tool I use
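
If the tool reuses class images left over from an earlier run, as guessed above, the count it still has to generate is simple arithmetic. The function name is hypothetical, and the figure of 80 existing images is only inferred from the numbers in the question (25 x 5 = 125 wanted, 45 generated), not something stated in the chat.

```python
def class_images_to_generate(num_instance_images, class_per_instance,
                             existing_class_images=0):
    """How many class images are still missing, never less than zero."""
    wanted = num_instance_images * class_per_instance
    return max(0, wanted - existing_class_images)


print(class_images_to_generate(25, 5))      # nothing generated yet
print(class_images_to_generate(25, 5, 80))  # assuming 80 were left from a previous try
```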

olive vapor
#

oh yeah I did, my bad, well any way to make it go faster since its doing 8700 images and the eta is 17 hours 🗿

#

16gb vram btw

finite creek
#

Hello, interesting conversation. Can I ask you guys a quick question. Do the classification images need to be square?

unique cloak
unique cloak
#

it's in the wiki usually to be sure

olive vapor
unique cloak
olive vapor
#

yeah

#

yesterday when I still tried with 60 images it put it on like 20k so now its resuming from that I suppose

finite creek
unique cloak
olive vapor
#

aight thanks, do I need to restart for that or does it update automatically

unique cloak
olive vapor
#

oh its training, class images are finished

unique cloak
unique cloak
#

what's that 8700 pictures you were talking about ?

#

total steps ?

#

yeah that is a lot too many in my opinion

#

how much Batch Size are you using ?

#

2500 steps on batch size 1 should be enough for the settings I can see in your screenshot

olive vapor
#

batch size 29
I used the performance wizard and it put it on that automatically

unique cloak
#

ho ok lol so then 8700 is just insanely high

#

batch size 29 means your total dataset is trained each step

#

my recommendation here would be to run 100 steps

#

max

#

to me, the 500 you already did is too many

#

such dataset is trained in 20 minutes or so

olive vapor
#

Yeah I thought it was a bit much having 18 hours for a dataset, but I cant find where to put the max steps

#

oh is it the training steps per image?

unique cloak
#

18 hours is 50% more time than I took to train my mega model on 750 pictures

olive vapor
#

yeah definitely bad settings on my end then 🗿

unique cloak
#

the base recommendation, but it depends on so many things, like the batch size, the gradient accumulation step, the learning rate, ... is to train around 100 times each picture
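
The "around 100 times each picture" rule of thumb can be turned into a step count once batch size and gradient accumulation are known. A minimal sketch with hypothetical function names; the dataset size of 29 is an assumption chosen to match the "batch size 29 means your total dataset is trained each step" remark, and 8700 is assumed to be a count of optimizer steps.

```python
import math


def steps_for_repeats(num_images, target_repeats, batch_size=1, grad_accum=1):
    """Optimizer steps needed so each image is seen about `target_repeats` times."""
    samples_per_step = batch_size * grad_accum
    return math.ceil(num_images * target_repeats / samples_per_step)


def repeats_for_steps(num_images, steps, batch_size=1, grad_accum=1):
    """How often each image is seen, on average, after `steps` optimizer steps."""
    return steps * batch_size * grad_accum / num_images


print(steps_for_repeats(25, 100, batch_size=1))    # 2500
print(steps_for_repeats(29, 100, batch_size=29))   # 100
print(repeats_for_steps(29, 8700, batch_size=29))  # 8700.0
```

Under these assumptions the first two lines reproduce the figures quoted in the chat (2500 steps at batch size 1, 100 steps at batch size 29), and the last line shows why 8700 steps at batch size 29 is far more than the rule of thumb suggests.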

#

I have a guide that comments on lots of things like this

#

it's not specific to Automatic1111 dreambooth though, all training methods are included, it focuses more on the theory behind it, and helps you fix your errors by understanding how it works

olive vapor
#

Damn that looks really interesting, thanks for making that

#

lol the entire character looks perfect aside from the face so far

unique cloak
#

you'll have "fix faces" to help a little when prompting for real

olive vapor
#

true, I'll let it do its thing

finite creek
olive vapor
#

alright 68% and it says training finished, I'll test how it looks with a prompt now

pearl swan
#

Hey not sure if I can post about this on here but I’m keen to pay $500 for someone who will spend say 3 hours with me walking me through fine tuning with Everydream2 — kinda like a learning session

Am doing a lot of things trial and error so could do with some accelerated learning session

olive vapor
cold wyvern
#

You need to pick up a vae

olive vapor
#

Yeah I realised as well, used a VAE and now it looks perfect

unique cloak
#

Just found the multiply.txt feature in EveryDream, for simulating bigger or smaller datasets:
Adding a multiply.txt applies the factor inside it to the current folder.
That means that if I put 0.25 inside a folder of 200 pictures, 50 pictures will be selected at random from it each epoch.
It also works with numbers higher than one if need be, but this is just what I needed to continue working on my mega model without retraining on the whole first 750 pictures each time: I can just use this old dataset as regularisation, adding the right multiply.txt to balance it in size with the new dataset
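
To make the multiply.txt behaviour described above concrete, here is a small sketch of the idea. It is not EveryDream's actual code; the function name and the way fractional factors are handled are assumptions for illustration.

```python
import random


def images_for_epoch(images, multiply, rng):
    """Pick this epoch's images for one folder, given its multiply factor.

    Assumed behaviour: the whole part of the factor repeats the full folder,
    the fractional part adds a fresh random subset each epoch.
    """
    images = list(images)
    whole = int(multiply)
    frac = multiply - whole
    picked = images * whole
    picked += rng.sample(images, round(len(images) * frac))
    return picked


folder = [f"img_{i:03d}.png" for i in range(200)]  # hypothetical file names
rng = random.Random(0)
print(len(images_for_epoch(folder, 0.25, rng)))  # 50
print(len(images_for_epoch(folder, 2, rng)))     # 400
```

Because the subset is re-drawn every epoch, the whole old dataset still gets covered over time even though only a quarter of it is used per epoch.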

finite creek
#

Anybody had this problem? When training in dreambooth A1111, it generates classification images even though I have a folder with classification images in it.

tall condor
#

for some reason my trained model is producing tints and sometimes super high contrast

#

is there a way i can avoid that?

tall condor
#

anyone can help with the tint issue?

#

am i overfitting?

hot breach
#

tweak to avoid catastrophic forgetting, etc

hot breach
#

i think a lot of the offset noise implementations default to 10% because the original blog post that introduced the idea used 10%, but it's too much sometimes

tall condor
#

what would be the setting for the noise?

royal island
#

Can anyone point me in a direction for gaining a deeper understanding of how to train for concepts? I've only had success doing faces via textual inversion but I want to go way deeper than that. I want to be able to prompt images of characters at a party actually interacting with each other, or working on a car/skydiving/skateboarding/anything. I don't quite grasp why I can generate a million images of a trippy giant mushroom and have them all come out different from each other and look amazing, but it's impossible to make even one good image of a crowd of people on a dance floor. What would go into making such a thing possible? Is it something viable for one person with a 4090 to accomplish or is this the sort of thing that would require hundreds of thousands of images and a mining rig running nonstop for an unholy amount of time?

tall condor
#

so i spend the last 5 days capturing 6k images

#

if the result sucks its all your fault xDDD

tall condor
#

*captioning

tall condor
#

does SD understand perspective? like angles on an object? when training is there a proper way to specify angles of view? like looking from top down, looking from the front?

tall condor
#

also shall i use fp16 or bf16?

cold wyvern
serene flicker
#

Hey, someone mentioned it's possible to train on top of a model with images in the dataset from the first training, plus and minus a bunch of images, without overtraining on the consistent images. Is there such a thing?

minor jay
#

Anyone trying to finetune deepfloyd if

cold wyvern
minor jay
serene flicker
#

Ok, I need some help. I am training a new version of a model on top of the original. I can tell that it is training based off the sample images. But when I try to test these models, the outputs are basically the exact same thing as the base model. Does anyone know what is happening and how I could salvage this? It's the third time I have tried training and I really don't want to go again since it takes like 4-5 hours each.

#

There is an example

#

Btw this is a 2.1 768 model

#

I'm going to try one more thing.

oak yew
#

before (base SD 1.5, a painting of flowers in a vase on a table):

#

after:

#

My lora REALLY likes tulips

#

I finally got kohya to work after much reinstalling everything and this is the first lora file I've made that doesn't instantly make things MUCH worse, lol

dull snow
#

any tips how to make the black lines smoother?

visual needle
#

Message🍴

dull snow
tall condor
#

hi guys, im having massive issues with yellow tints in my finetuned model, anyone know how to tackle it?

tall condor
#

also i have another question: i have a model that has like 100 different concepts in it, some concepts have 5 images and some have 500 images. what's happening is that the ones with like 500 images perform well but it's almost impossible to generate the concepts that have only 5 images. i ran it with 50 epochs with multiples of 10 (so every image is learned at least 500 times) - what is the best way to properly weight those concepts?

fickle haven
#

so i need someone to tell me how many u_net steps i need to train a character with 95 images in dreambooth, and also text encoder steps... by default for 10 images it was 1500 unet and 350 text encoder so i multiplied x 9 since i have 90 images...

dull snow
#

everyone is asking

#

nobody is answering

fickle haven
#

Can we tag support?

#

They are ignoring us

cold wyvern
#

It's a community support forum - there is no “@ Support”, you are relying on others passing their knowledge on

#

As for the step count - as with a lot of things with SD it's trial and error - you may find with one dataset 1500 steps is plenty even for 90 images, and another might need 25000 steps depending on the look you want to go for…

tall condor
#

its not like you can expect ppl to wait here and answer you pal. just be happy if you get a reply once in a while!

fickle haven
fickle haven
#

i will try with 15000

tall condor
#

i have a model that has like 100 different concepts in it, some concepts have 5 images and some have 500 images. what's happening is that the ones with like 500 images perform well but it's almost impossible to generate the concepts that have only 5 images. i ran it with 50 epochs with multiples of 10 (so every image is learned at least 500 times) - what is the best way to properly weight those concepts?

fiery rampart
#

👀

tall condor
#

ended up writing a program that helps with the weighting of concepts 🙂

unborn wind
#

keywords work without calling the lora

oak yew
# tall condor also i have another question: i have a model that has like 100 different concept...

The newest version of my Lora is doing great because this time instead of repeating all training images the same amount of times, I have a "tier system" where the best images have duplicates. I only trained for 4 epochs and I'm getting good results because I'm training on the good images 100 times and a bunch of "meh" ones 5-20 times, which is weighting the concept higher
This is also working for concepts I don't have a lot of images of but I want to be strong, so I put them in the high repeat folder
I don't really know what I'm doing yet, but I'm using Kohya as a front end to train LORA and in the images subfolders you name things number_foldername like so and number is how many times your script repeats the folder
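
A small sketch of how the number_foldername convention translates into samples per epoch. The folder names and image counts below are hypothetical, and this only mimics the naming rule described above, it is not Kohya's own code.

```python
def parse_repeat_folder(name):
    """Split a '<repeats>_<label>' folder name into (repeats, label)."""
    repeats, _, label = name.partition("_")
    return int(repeats), label


def samples_per_epoch(folders):
    """folders: mapping of folder name -> number of images in that folder."""
    result = {}
    for name, image_count in folders.items():
        repeats, label = parse_repeat_folder(name)
        result[label] = repeats * image_count
    return result


# Hypothetical "tier system": few great images, many mediocre ones.
tiers = {"100_best": 10, "20_good": 30, "5_meh": 50}
print(samples_per_epoch(tiers))  # {'best': 1000, 'good': 600, 'meh': 250}
```

Here the 10 best images contribute more samples per epoch than the 50 "meh" ones, which is the weighting effect described in the message.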

#

I learned this on this sketchy 4chan lora training guide under "How to set up the directory" or something like that
https://rentry.org/lora_train

#

There's no inappropriate images in that document but 4channy things make me nervous so watch out

oak yew
#

tried making my flower lora better today
turning it up to see which colors really come through, I think I'm gonna try looking for more cool tones to add "high" in the data set

#

I'm collecting before and after every time I add to this LORA so that I know what's working and what isn't

tall condor
#

i have like 250 concepts that i want to train in a model. can i do that using lora or should i just use dreambooth? all my concepts are captioned in folders atm

cold wyvern
#

Lora for each and merge them together?

tall condor
#

im going for dreambooth not for testing, 112hours 1.1mil steps

#

if it doesnt get me where i want to get ill go to lora next or try to fix the weights

#

i still dont understand the difference between lora and dreambooth tbh. with kohya ss, as far as i understand, the textual model is also trained, so what does lora do differently other than the output being kind of a diff?

cold wyvern
oak yew
#

im on 112 of about 700 hand editing, annotating, and sorting my dataset

stiff dust
#

there is basically no difference, just a different way of storing the model. The only limitation of lora is that you set a rank beforehand that limits how fine your model can differ from the base model

cold wyvern
tall condor
#

what im concerned with the lora is that my input dataset is cress dependent

#

*cross

#

meaning that the content is linked via tags. if i separate the concepts and make each of them a different lora im not sure if they can cross match at the end

tall condor
warm ginkgo
#

any suggestions? this is for dreambooth on base v2.1

tall condor
#

how big is the difference if i use something like wd14 on my images and have much more detailed tags rather than concepts?

cold wyvern
#

I know there is a big difference between captioned and uncaptioned.. have only ever hand captioned stuff myself though

stiff dust
#

just train them all together

sonic narwhal
#

any difference between everydream and stabletuner?

#

Trying to pick one for full fine tuning

tall condor
sonic narwhal
#

Also wondering should you add miniconda to PATH in order to get stabletuner?

hot breach
nocturne vale
#

is there a way to get the Triggerwords of a lora (safetensor) file? Cause over time I accumulated quite a lot of loras and either the site have been deleted or i can't find it anymore...

hot breach
#

this is the issue with using weird keywords for everything, you have hundreds of files and can't use them without a magic dictionary

nocturne vale
#

aight, imma start making myself a dictionary for my collection then

oak yew
# tall condor how are you sorting and annotating? in folders?

I'm using BLIP captioning to generate text files for each image, but then editing all these txt files in vscode because I can pull up the image easy and also crop it and stuff with an image editing plugin

then I'm just throwing them all into folders that are how many repeats each image gets during training

tall condor
#

may i ask some examples i just want to understand how extensive your captions are

oak yew
# tall condor may i ask some examples i just want to understand how extensive your captions a...

Keep in mind that I don't know what I'm doing 😅 , but I try to keep them as simple as possible. When I was detailed and had several commas and tried to describe the whole image well, my LORA made the outputs worse.
a man with a flower crown on his head
It doesn't matter if the man has a suit on or a t shirt or he's shirtless, or what he looks like, I just try to keep it simple and pertaining to the subject of my LORA, which is decorating things with flowers. The man is interchangeable.
a painting of a black cat surrounded by flowers
a woman wearing a dress covered in flowers in a garden, depth of field
many flowers in a garden
floral illustration of roses
Less seems to be more

tired plank
#

Has anyone had success training a lora for an action as opposed to an object? Like a jump or a punch

cold wyvern
#

Im guessing by some of the specific loras on civitai thats absolutely possible

warm agate
#

Hello guys,
I want to train a model on landscape photography with a dataset over 100k images

wintry carbon
#

can anyone please help me with training embeddings? i've tried 3 times now and every time it looks nothing like the original reference images, not even close

i'm trying to make an embedding of an original anime character and it's not getting anything right, not even the color of the hair. the prompts are the part i'm really confused about: i did process the images to generate the prompts for me, but the tutorial i was watching said to only keep the tags that aren't integral to the character, so if it always has long hair remove the long hair part, and if it's always wearing a helmet remove the helmet part from the prompt, which i did. settings are:
Embedding learning rate: 0.05:10, 0.02:20, 0.01:60, 0.005:200, 0.002:500, 0.001:3000, 0.0005
prompt template: custom_subject_filewords.txt
max steps: 3000
save image to log directory N steps: 50
save a copy of embedding to log directory every N steps: 50
save images with embedding in PNG chunks: checked
shuffle tags by',' when creating: checked
drop out tags when creating prompts: 0.1
sampling method: deterministic

model im using isnt the 1.5 but ive tried using that one and it got even worse results
i can dm the results and reference images
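
For anyone reading the learning rate line in those settings: as far as I understand the syntax, each `rate:step` pair means "use this rate up to that step", and a trailing rate with no step applies to everything after. The sketch below is my own reading of that format, with made-up function names, not the webui's parser.

```python
def parse_schedule(text):
    """Parse 'rate:step, rate:step, rate' into a list of (rate, until_step)."""
    pairs = []
    for chunk in text.split(","):
        rate, _, until = chunk.strip().partition(":")
        pairs.append((float(rate), int(until) if until else None))
    return pairs


def lr_at(schedule, step):
    """Learning rate for a 1-based step (boundary steps assumed inclusive)."""
    for rate, until in schedule:
        if until is None or step <= until:
            return rate
    return schedule[-1][0]


schedule = parse_schedule(
    "0.05:10, 0.02:20, 0.01:60, 0.005:200, 0.002:500, 0.001:3000, 0.0005"
)
print(lr_at(schedule, 10))    # 0.05
print(lr_at(schedule, 11))    # 0.02
print(lr_at(schedule, 3000))  # 0.001
```

Under that reading, with max steps set to 3000 the final 0.0005 entry would never be reached.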

unreal linden
#

hello, I know how to drop a training image into the SD bot, but I do not know what prompt to use to attach it so I can then do image2image

quiet eagle
#

I do have an 8gb gpu, and i do have colab pro (tho it seems pretty nerfed now and even the A100 shows like 16gb of vram) just curious if it's worth it even trying to set it up on my 8gb gpu or if it will be subpar

cold wyvern
#

The bmaltais/kohya_ss fork works fine on a 1060, if a tad slow

wintry carbon
tall condor
tall condor
wintry carbon
#

i dont wanna use dreambooth because thats only for 1 model

tall condor
#

kohya supports lora also, and it has a nice web ui and helps with creating the folder structure and so on

#

its probably your most easy starting point

#

also it takes care of the dependencies and so on by itself

tall condor
gentle osprey
#

i've had a lot more success making a dreambooth first since it doesn't require captioning

wintry carbon
#

yeah but i cant use infinite loras when creating prompts

#

im pretty sure

gentle osprey
#

what do you mean by infinite loras?

wintry carbon
#

they all together have to = 1 in value no?

gentle osprey
#

don't think there's a hard limit, just that they can interact in weird ways

#

and no, don't need to add up to 1

wintry carbon
#

well what if i want multiple people i cant put 2 loras with different people

gentle osprey
#

i'd just use inpainting for that

wintry carbon
#

seems way too much work to do for every single image until i get something that looks good also inpainting never looks good at least not when ive tried

#

all i want is some help with training TI embeddings .-.

gentle osprey
#

looked at your original statement, how are you doing your captions

wintry carbon
#

if you mean the txt file captions for the images then i generate them with the processing tool and then i remove everything that is integral to the characters design

#

from the captions thats what ive been told to do

gentle osprey
#

yeah, that's basically the gist of it, thought you might be over describing the object you're trying to embed

wintry carbon
#

all i describe is the background sometimes pose and clothes

#

and when i train it even after 3000 steps its not even close not even the color of the hair is right

gentle osprey
#

mind posting a training image and your caption?

wintry carbon
#

ye sure should i send you it in the dms ?

gentle osprey
#

sure

tall condor
#

is there any way to interpret loss rate in correlation with learning rate

#

im using cosine with warmup and my loss is stabilizing to a constant after 20 epochs even though the learning rate keeps decreasing. is there a way to interpret this?

#

also for some reason my speed went from 2.6 It/s to 3.1 it/s why is that?

#

gpus work faster when tortured hard for some time? xDDD

warm agate
stiff dust
#

loss in SD does not tell you much

#

it mostly depends on the sampled noise and time step, so you have to average loss over many steps or large batch sizes to see anything and even then the change in image quality is not necessarily going with a decrease of loss
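
Since the raw per-step loss is that noisy, a running average is the usual way to get something readable out of it. A minimal sketch, assuming an exponential moving average with a smoothing factor I picked arbitrarily (0.98); even the smoothed curve only gives a rough trend, not a measure of image quality.

```python
def smooth(losses, beta=0.98):
    """Exponential moving average of a list of per-step losses."""
    avg = None
    out = []
    for value in losses:
        # The first value seeds the average; later values are blended in.
        avg = value if avg is None else beta * avg + (1 - beta) * value
        out.append(avg)
    return out


# Tiny example with a short memory (beta=0.5) so the effect is visible.
print(smooth([0.0, 1.0, 1.0], beta=0.5))  # [0.0, 0.5, 0.75]
```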

cold wyvern
#

Think of the model as a giant music studio with 895 million channels, training the model is just moving the sliders to get the desired result, learning rate tells you how far you can adjust each slider at each step

warm agate
#

Deliberate is around 2 gb, so with how many images did they train the model?

cold wyvern
warm agate
cold wyvern
#

There's still 895 million parameters, the only difference will be if there is additional information, which there shouldn't be - the training weights will be there, but depending on whether it's saved as fp32 or as fp16/bf16 it will come out to either 3.95gb or 1.99gb for a sd1.5 base trained model, or 2.43gb for sd2.1

warm agate
#

@cold wyvern Can you also collaborate with us on the project?

#

I want to train a model on landscape photography with a dataset over 100k images

cold wyvern
cold wyvern
#

The cliffs notes??

#

0 is usually fine - I think 0.05 is what the original scientific paper recommended when they introduced it, most people go for 0.1 or 0 I think

wintry carbon
#

so i am trying to make a lora and its stuck at 0% and wont move
anyone know whats up?

tired plank
#

Question, say I generate an image of a person and they are wearing a shirt I really like.
However the shirt is patterned(flora, Hawaiian)and is at a not centered angle. It’s also drawn in a particular style and not photorealistic.

How would you generate a bunch of test images of this shirt from different angles, including close-ups, given just this one image?

#

So I have a good training set for a Lora, I was wondering if I had to crop it out and put it in image to image with low denoising but I feel like that pattern will change even with low denoise

stiff dust
#

actually what I really like is to use the alpha layer to mask out the part of the image that is irrelevant (e.g. just keep the shirt). Then you might be able to train with a single image. Textual Inversion in auto111 supports this, for example

hot breach
#

the longer you train the less you need

gentle osprey
#

for LoRA extraction with Kohya SS what should the network dimension (rank) and conv dimension (rank) be set to?

slate ledge
#

There's probably a wiki somewhere that says something along the lines of

Don't ask how many (prompt, image) pairs are required for fine-tuning a stable diffusion model, because every fine-tuning task is a bit different.

On that note, has anybody fine-tuned a stable diffusion model to generate photorealistic images within a specific domain using a custom dataset? What was the size of your dataset? I need to generate a training set of (image, text) pairs for fine-tuning a CLIP model, where each image is a photorealistic scene and each text snippet is a description of what's in the scene (types of objects, and their spatial relation to one another).

#

I know "as many training samples as possible" is a totally valid answer, but I've only got enough motivation to manually label one reasonably small training set. That being said, I suspect the CLIP model will require significantly more samples than the SD model, and so I'm tempted to use a fine-tuned SD model to bootstrap a training set for fine-tuning my CLIP model. Has anyone done this before?

warm agate
#

I am planning to train a model.
What do you guys suggest?

#

Lora or Dreambooth?

#

What's the difference?

cold wyvern
#

@warm agate - I go for Lora as I wanted to be able to take that info and merge it into different models and see what came out. If you are trying to get a single model out, then dreambooth I think

#

I dont know if theres a technical reason to choose one over the other

warm agate
#

B'cuz it goes to through the images more times?

cold wyvern
#

Same reason a lora is smaller than a whole model i think - one is generating a set of instructions on how to edit the model, the other is editing the full model on the fly

warm agate
#

Am I right?

cold wyvern
#

Possibly, they did change the version of torch

warm agate
cold wyvern
warm agate
#

@cold wyvern Can you help me with image scraping from r/EarthPorn?

cold wyvern
warm agate
#

@cold wyvern Do we even need to add description for each image if we want to train.
For example, if a human is wearing a red dress so we add description of Man wearing a red dress on a sunny day?

cold wyvern
#

Again, as with most things stable diffusion related, it's “try it and see”. I have had some success with uncaptioned, and some where the captions were absolutely necessary

tall condor
#

hi guys, in my model some concepts are overfitting while some concepts are almost impossible to generate. is there any proper way of handling that other than trial and error?

stiff dust
#

you can try alpha channel masking to weight down regions in the image it should not put too much effort on. I don't know, though, which scripts support that.

tall condor
#

what is alpha channel masking?

#

also note that i have a set of around 7000 images

#

also another question i have is: for some reason my model is mixing up concepts that cannot be mixed. it's like a car and a train: when i generate something it creates like a cartrain. how can i stop that? also is it possible to have a car and a train generated in one image somehow?

cold wyvern
stiff dust
# tall condor what is alpha channel masking?

you can open your images in a graphic editor and make parts of the images transparent.
SD ignores transparency, but some training methods use the transparency as weighting to emphasize which part of the image the model should train on.
Of course, for 2000 images you don't want to do that manually
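
To illustrate what "transparency as weighting" can mean in the loss, here is a toy sketch. It is an assumption about the general idea (a per-pixel weighted mean squared error), not the code of any particular training script, and it works on flat lists of numbers to stay self-contained.

```python
def masked_mse(pred, target, alpha):
    """Mean squared error where each pixel is weighted by its alpha (0..1).

    Fully transparent pixels (alpha 0) contribute nothing to the loss.
    """
    weighted = sum(a * (p - t) ** 2 for p, t, a in zip(pred, target, alpha))
    total = sum(alpha)
    return weighted / total if total > 0 else 0.0


pred = [1.0, 0.0]
target = [0.0, 0.0]
print(masked_mse(pred, target, alpha=[1.0, 0.0]))  # 1.0
print(masked_mse(pred, target, alpha=[0.0, 1.0]))  # 0.0
```

The error on the first pixel only counts when that pixel is opaque, so the model is not pushed to reproduce whatever sits in the masked-out region.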

stiff dust
#

you can create two images separately, copy paste them into one and run img2img

#

you can use composable diffusion - it sometimes get these things right

#

I think there is even a plugin for composable diffusion in auto111 that allows you to make regions in your image with separate prompts

gentle osprey
#

Regularization images should help prevent classes from merging

stray kindle
#

Any Lora colabs that still work? Please @ me if you know.

tall condor
#

composable diffusion - any more info on that?

tall condor
gentle osprey
#

Are you fine-tuning existing classes? If so regularization should work. Like teaching it 50 different cars and 50 different planes and 50 different ducks could all be done with their own regularization images.

stiff dust
serene flicker
#

Honestly what even are reg images, or class images? I never understood what to use those for

#

Though I have never trained subjects so I guess they don't matter for me?

stiff dust
#

just for preventing overfitting. If you show images of rabbits drawn in a particular style, you also show, for example, photographs of rabbits so that the model does not forget what rabbits without that style look like

gentle osprey
#

if i'm training a face: any reason to use a learning rate other than 0.000001 and a learning rate scheduler of constant/linear?

sonic narwhal
#

is there a way to scrape instagram accounts for images?

gentle osprey
#

jdownloader probably does that

brazen oriole
#

simple question: will an embedding trained using textual inversion on sd 1.5 work with any other sd 1.5 based models, for example the base model of Deliberate https://civitai.com/models/4823/deliberate is 1.5, so should/would my embedding work with Deliberate?

cold wyvern
#

a textual inversion will "work" with any other model made with the same base; but results may vary 🙂

#

@brazen oriole

brazen oriole
#

gotcha, thanks. just kinda wondering if i should be training with the standard model or the variant mostly, since i'd like to be able to use the embedding with different base models (of the same version)

stiff dust
#

use the model you will use the embedding in for best results

chrome breach
#

I have tried fine-tuning dreambooth using 100 images with 1500 regularization images for 10K steps... Interestingly the output i get from it on 5 inference prompts is exactly the same as what the SD1.5 gives on those prompts...

#

It is as if fine-tuning never happened... Anyone faced same issue??

stiff dust
#

guess there is something going wrong. The output is never the same, even after very few steps of training....
You should track progress with some validation images

chrome breach
#

Yeah cool I'll have some validation images and see

#

Also, I am not sure if the data has to be specifically of one subject only or we can keep it like a mixture?

carmine pilot
#

Hello fellow humans,
I was trying to use Dreambooth to create lora from photos of specific girl with glasses.
And I am getting mixed results and idk if it is just bad training or if the glasses are the problem.

Is it generally better to train on photos without those, or can SD handle that just fine?

Thanks in advance,
feel free to DM (or just @ me, but i dont want to spam there) if u are experienced in person training. Ive got few more simple questions. And any help is well appreciated.

waxen grove
#

Hi, I'm running a startup that heavily uses SD. We are looking to expand the team with a freelancer or 2. Searching for somebody that has tons of experience in finetuning SD 1.5 & 2.1. Shoot me a DM if you think that could be you. Happy to explain more there

gentle osprey
tall condor
#

nonfp16?

#

you mean bf16?

#

or you mean fp32 models?

tall condor
#

may i ask your workflow and maybe a sample image of what is not working

tall condor
#

can someone explain how to make regularisation images if i finetune with multiple concepts?

#

also for regularisation images what shall be in them? the current model is not able to create what i finetune for. so how would i create those images?

gentle osprey
#

So like if you're training a woman's face and your instance prompt is "a photo of kdbwh85 woman face" then you have your base model spit out 1500 images with the prompt "a photo of a woman face"
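
The same pattern written out, in case it helps when there is more than one class. This is just a sketch of the prompt bookkeeping: the function name and the template are made up, and the only real example is the kdbwh85 one from the message above.

```python
def build_prompts(token, class_name, template="a photo of {subject}"):
    """Return (instance_prompt, regularization_prompt) for one concept.

    The instance prompt carries the rare token, the regularization prompt
    is the same sentence with only the generic class in it.
    """
    instance = template.format(subject=f"{token} {class_name}")
    regular = template.format(subject=f"a {class_name}")
    return instance, regular


print(build_prompts("kdbwh85", "woman face"))
# ('a photo of kdbwh85 woman face', 'a photo of a woman face')
```

For several concepts you would call this once per (token, class) pair and generate a separate batch of regularization images from the base model with each regularization prompt.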

tall condor
#

but i dont fine tune a single class

#

i have like 300-600 classes

#

thats why i am asking how i would take care of the regularisation image. i understand how it works for 1 single class

gentle osprey
#

You would need regularization images for each class

#

Same process I'd imagine

tall condor
#

but as far as i understand dreambooth can only have 1 regularization folder

#

also my issue is that the class is new so i wouldn't know how to create regularization images

#

its just something the current model cant create

gentle osprey
#

Not sure what your set up looks like, but for each concept I'm trying to train I can specify a different regularization folder

tall condor
#

are you working with kohyass?

gentle osprey
#

Nah, Dreambooth extension for A1111

#

But also not trying to train 300-600 concepts at a time lol

tall condor
#

concepts is like 900 xD

gentle osprey
#

The concepts that are bleeding together, are those the new concepts?

tall condor
#

but i can probably cut the regularization down to 300 to 600 different classes

gentle osprey
#

And is it all of them? Or just certain ones.

tall condor
#

what do you mean with certain one or all together

gentle osprey
#

You said that classes were kind of merging together

tall condor
#

yes

#

its like a cross dependent model

gentle osprey
#

Is it all the classes you're training or just some of them

tall condor
#

all of them

#

i mean the results im getting without regularisation aren't too bad, it's just that some concepts are overfitting and some are hard to create at all

#

which is because for some concepts i have 100 images and for some i have 3

#

what im doing now i weight them down so that the concept with 100 gets as many runs as the concept with 3

#

but it seems that this is what is causing the overfitting

gentle osprey
#

Maybe separate them into different batches

tall condor
#

because those concepts with 3 images are run 30 times in 1 epoch while the others are run once
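
The arithmetic behind that balancing, as a sketch. The concept names and the 50-epoch figure are placeholders taken loosely from the discussion, and the helper names are invented for illustration.

```python
def balanced_repeats(image_counts):
    """Repeats per concept so each concept gets roughly equal samples per epoch."""
    largest = max(image_counts.values())
    return {name: max(1, round(largest / n)) for name, n in image_counts.items()}


def views_per_image(repeats, epochs):
    """How often each single image of a concept is shown over the whole run."""
    return {name: r * epochs for name, r in repeats.items()}


counts = {"common": 100, "rare": 3}  # images per concept
repeats = balanced_repeats(counts)
print(repeats)                       # {'common': 1, 'rare': 33}
print(views_per_image(repeats, 50))  # {'common': 50, 'rare': 1650}
```

Equal samples per concept means each of the 3 rare images is seen far more often than any image of the common concept, which fits the overfitting described for the small concepts.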

gentle osprey
#

Like train all the high image concepts together

tall condor
#

what do you mean by batches?

gentle osprey
#

Then do a separate training for the low image concepts

#

Have no idea if that would work, but might be worth testing

tall condor
#

you mean like training each individually and then merging them?

gentle osprey
#

Yeah

tall condor
#

tbh the level of work required with this is really not an option

gentle osprey
#

Or train the model in steps

tall condor
#

its more likely to define a regularization for each "base concept"

#

what do you mean by train the model in steps?

gentle osprey
#

Like train it with all the high image concepts

#

Then train the fine tuned output with the low image concepts

tall condor
#

i see

#

i have like 1300 class folders

#

and like 7000 images

#

so my average is like 5 images

#

however maybe what i can do is reduce the classes down to like 30 or 40

gentle osprey
#

Hahaha damn

tall condor
#

with the same images

#

and train a base model with them

#

and then specialize it with the full set

#

so basically create 30 classes with 7000 images

#

train the model as base

gentle osprey
#

Yeah, I'd definitely experiment with how much you're feeding it at any given time

tall condor
#

and then run the actual training

#

i was just hoping that weighting alone will help

#

also anyone know if shuffling captions makes a big difference?

#

and also what LR Scheduler are you guys using?

gentle osprey
#

I use 0.000001, constant but I had the same question earlier

tall condor
#

1e-6 right?

#

for me im using cosine with warmup

#

it appears the warmup is quite important

#

i have tested with constant before without warmup and it really didnt turn out too good

#

it appears everybody is using a scheduler that reduces the LR at the end but i dont really understand the benefit yet
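
For reference, this is the shape usually meant by "cosine with warmup": a linear ramp from zero up to the base rate, then a half-cosine decay back towards zero. A minimal sketch with an invented function name; real trainers may differ in details such as the final rate or number of cycles.

```python
import math


def cosine_with_warmup(step, total_steps, warmup_steps, base_lr):
    """Learning rate at a given 0-based step."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 50, 100, 550, 1000):
    print(s, cosine_with_warmup(s, total_steps=1000, warmup_steps=100, base_lr=1e-6))
```

The warmup keeps the first updates small while the optimizer statistics settle, and the decay makes the last updates small so the weights are nudged rather than jolted at the end of the run.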

visual horizon
#

I'm new to AI and I'd like to do a LoRa character training. I found this tutorial that seems super nice and easy https://imgur.com/a/mrTteIt#TjsDxqp but it's using google collab and I don't, so I have struggles following the actual training part

Could anyone explain it to me or have a good link to what I'm trying to learn please? This is the only tutorial that seemed easy and I struggle with whatever else I found

gentle osprey
visual horizon
#

I hope its not too hard to understand

turbid shuttle
#

Not sure where to ask, does the WebUI have a function similar to Remix (MidJourney)?

cold wyvern
#

Webui doesnt but Nerdy Rodent had a video a while back on a SD based image merge

tall condor
#

what are you guys using to latent caption images? DeepDanbooru works well for some but not so well for other images. anyone used WD14 before? is it any better

tired plank
#

I figured you would use deepbooru for stuff trained on danbooru(anythingv3/novelai) , but I've only ever tried to make two loras so I don't have the testing data to say

#

Related to that...I just made a lora for a character and while they got the hair and face kinda right, they completely got the skin tone wrong despite me not including the skin tone in the captioning. Character is much lighter skin than I wanted.

You are supposed to not include attributes in the captioning text files that are attributed to your character right?

#

Also if you are training on a character, is better you leave the 1girl/1guy tags or remove them in your captioning files,i figured you would remove those since being 1girl/1guy is an attribute of any character

gloomy stag
#

I have a collection of faces of different characters I'm trying to train on. Would it make sense to, say, add woman and man to the description? For example should I say Ryu man and Chun li woman? Will that give better results in general?

tall condor
#

from my experience the better you caption your pictures the better your result is

#

i suggest you run something like DeepDanbooru on your images and then kind of optimize the captions.

#

i have just tested a model without detailed tags (just the concept) and when i added the detailed tags the result was much better than before

wise locust
#

does anyone know if training a dreambooth on a model that's already dreamboothed would affect quality in any way? Say if I want to train a new subject on a model where I already added subjects

cold wyvern
#

shouldn't affect quality, but as always, the proof is in the pudding 🙂

wise locust
#

cuz I do know that the reg images affect the entire model

#

so maybe if you train it too much it will start overfitting to ur reggies? so many questions, so little time to test them all

warm agate
#

@cold wyvern Is it important to caption images before training model?

cold wyvern
warm agate
#

I don't find DeepDanbooru useful for landscape photography

cold wyvern
warm agate
cold wyvern
warm agate
#

does dreambooth have a gui?

cold wyvern
sonic narwhal
#

The blip captioning in kohya is pretty shit though

#

Is there any other that works better?

short python
#

i’ve been getting some good results finetuning SD2.1 by freezing all layers of the text encoder except for the last handful (2-6 out of 24). it seems to do a great job of preventing overfitting and catastrophic forgetting even at relatively high learning rates. doing this makes SD2.1 training feel about as “easy” as SD1.5 training (it’s still tricky, it’s just no longer a nightmare). there’s a branch on EveryDream2trainer if anyone is interested in trying it out.

vast dome
#

guys what does "batch count" do? I don't think it increases the speed of training because it/s remains the same regardless of the batch count during Dreambooth

#

it is number of images that it processes before it updates the main model right?

#

wouldn't a batch count of 1 be better in terms of quality because it updates itself with every new sample? Though it might increase the training time because of GPU-CPU transfers

cold wyvern
#

it's the number of images that it processes simultaneously; it decreases the number of steps and gives a slight performance increase

vast dome
#

performance increase is speed it takes to complete the whole training right?

#

what I want to know is its impact on the final quality of the model

#

I get it that it reduces time it takes to train because it reduces the number of bus transfer between cpu-gpu

#

however i don't understand whether it can improve/degrade the quality of the final model

stiff dust
#

during training the point is rather that multiple images are trained together in one update step. So your gradient update is an average across the images in one batch

#

this makes the gradient less noisy and unstable (in SD the gradient is often extremely noisy due to the stochastic nature of the noise sampling)

#

as far as I know, when training diffusion models people tend to use extremely large batch sizes to make training more stable. However, on consumer hardware you cannot do this, so people tend to use small batch size and extremely low learning rate instead

#

so yeah, the quality of the model is probably better with a larger batch size over the same number of steps. But of course, training time also increases a lot.
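
A rough sketch of the "averaging makes the gradient less noisy" point above, using a toy one-dimensional gradient with Gaussian noise. Everything here (the function names, `true_grad = 1.0`, `noise_std = 5.0`) is made up for illustration and is not taken from any trainer:

```python
import random
import statistics

def noisy_gradient(true_grad, noise_std, rng):
    """One per-image gradient: the 'true' direction plus random noise."""
    return true_grad + rng.gauss(0.0, noise_std)

def batch_gradient(batch_size, true_grad, noise_std, rng):
    """Average the per-image gradients of one batch into a single update."""
    grads = [noisy_gradient(true_grad, noise_std, rng) for _ in range(batch_size)]
    return sum(grads) / batch_size

def gradient_spread(batch_size, trials=2000, seed=0):
    """Standard deviation of the batch gradient over many simulated steps."""
    rng = random.Random(seed)
    samples = [batch_gradient(batch_size, 1.0, 5.0, rng) for _ in range(trials)]
    return statistics.stdev(samples)

if __name__ == "__main__":
    for bs in (1, 4, 16):
        print(f"batch size {bs:2d}: gradient std ~ {gradient_spread(bs):.2f}")
```

In this toy setup the spread shrinks roughly with the square root of the batch size, which is the sense in which a bigger batch gives a more stable update.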

tall condor
#

it will create tags for the images as txt file

warm agate
#

But the results aren't that reliable for landscape photography

tall condor
#

i did some tests the other day and it wasnt too bad

#

its better than nothing

tall condor
stiff dust
#

every paper I read used batch size of 20 or more

#

as said: on consumer hardware you cannot use large batch size, so people have to use super small ones

tall condor
#

i think it also matters a lot whether you have a lot of images of the same type or very few images

stiff dust
tall condor
#

if you have only a few images per concept then i think mixing them up will really give bad results. but if you have 100 of one it will not matter

tall condor
stiff dust
tall condor
#

like this example of querty

#

i find the result pretty fair

#

never tested it on anime really. i just didn't find anything better yet

stiff dust
#

I would say CLIP interrogator is WAY better

tall condor
#

do you have a link?

stiff dust
#

I just used the webui by vladic. It has it builtin

#

but I guess you can also download it as a separate extension for auto1111

tall condor
#

thanks i will test that

#

i saw huge improvements in the model when adding more detailed tagging so i guess the better the tags are the better the result

tall condor
stiff dust
#

no, because you always sample a timestep. Let's say you sample a timestep at 1% then the image is completely noise and the model only learns rough shapes. If you sample at 99% then the image is almost perfect and the model just learns the fine details and textures

#

thus, what the model learns is completely different

tall condor
#

what do you mean with sample at timestep?

stiff dust
#

when you generate an image you create a random noise image and then step by step denoise it. You can watch this process in the webui

#

when you do training, you do not start from pure noise. Instead you draw a random time step. Lets say you draw the time step at 50%, then you add as much noise to the image as it would look like after doing 50% of the steps. The model then only removes as much noise as a single step
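
Rough sketch of the "draw a random time step, add that much noise" idea described above, in plain Python. The cosine-shaped `alpha_bar` schedule, `NUM_TIMESTEPS = 1000` and all function names are assumptions for illustration, not what any particular trainer uses. Note that in this sketch t = 0 means a clean image and a large t means mostly noise, so the "1%" case above corresponds to a large t here:

```python
import math
import random

NUM_TIMESTEPS = 1000  # assumed length of the noise schedule

def alpha_bar(t):
    """Fraction of the original image signal kept at timestep t (toy cosine schedule)."""
    return math.cos(0.5 * math.pi * t / NUM_TIMESTEPS) ** 2

def add_noise(pixels, t, rng):
    """Blend the image with Gaussian noise according to timestep t.

    Returns (noisy_pixels, noise); the noise is what the model is asked to predict.
    """
    keep = math.sqrt(alpha_bar(t))
    mix = math.sqrt(1.0 - alpha_bar(t))
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [keep * p + mix * n for p, n in zip(pixels, noise)]
    return noisy, noise

def training_example(pixels, rng):
    """One training sample: a random timestep plus the image noised to that level."""
    t = rng.randrange(NUM_TIMESTEPS)
    noisy, noise = add_noise(pixels, t, rng)
    return t, noisy, noise
```

Because the timestep is drawn fresh for every sample, the same image teaches rough shapes on one step and fine texture on another.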

tall condor
#

i just dont understand what this has to do with the batch size

#

if you mix up the training result of lets say 6 images that are all completely different rather than doing one at a time, i would expect the result to be completely different

#

isnt it?

stiff dust
#

no, it makes it better

#

if you train 6 different images at the same time, each of them will have a very different gradient

#

averaging them is good, cause it makes the gradient more stable

#

in the end you want the model to train on ALL concepts, thus, you do not want it to overfit on a single image

tall condor
#

it is only good if you train for one single concept

#

but if you would want to train a frog and a cat in one concept

#

i dont see how that would be beneficial

#

unless you want a catfrog as result

#

no?

stiff dust
#

the opposite is the case

#

if you train ONLY on a cat, then this will override the model's ability to do something else. Like it will forget how to draw anything except a cat

#

if you train a model for hundreds of thousands of steps only on cat images, it will then only be able to generate cat images

#

the same happens with frog images

#

so training on 100,000 cat images basically destroys the model.

#

but lets say you train on 100,000 cat and 100,000 frog images. Now, order is important. If you train first on the cat images the model is already destroyed

#

but if you train on one cat then a frog then a cat and so on, then the model will never forget what a frog and what a cat is, because you "remind it every step"

#

that's also the reason why we use regularizer images when training

#

this example has nothing directly to do with batch size. I just want to demonstrate why it is a good thing to have variety during training

#

batch size increases variety
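
A small sketch of the cat/frog ordering argument above: it compares "all cats first, then all frogs" with a shuffled order by measuring the longest stretch of steps that see only one concept. The file-name convention (`cat_0`, `frog_0`) and the helper names are hypothetical:

```python
import random

def sequential_order(cats, frogs):
    """All cat images first, then all frog images."""
    return list(cats) + list(frogs)

def interleaved_order(cats, frogs, seed=0):
    """Same images, shuffled so both concepts keep showing up."""
    mixed = list(cats) + list(frogs)
    random.Random(seed).shuffle(mixed)
    return mixed

def longest_single_concept_run(order):
    """Longest stretch of consecutive training steps that see only one concept."""
    longest = 0
    current = 0
    previous = None
    for name in order:
        concept = name.split("_")[0]
        current = current + 1 if concept == previous else 1
        previous = concept
        longest = max(longest, current)
    return longest

if __name__ == "__main__":
    cats = [f"cat_{i}" for i in range(100)]
    frogs = [f"frog_{i}" for i in range(100)]
    print("sequential:", longest_single_concept_run(sequential_order(cats, frogs)))
    print("shuffled:  ", longest_single_concept_run(interleaved_order(cats, frogs)))
```

The shorter that run, the less time the model spends drifting toward a single concept before it is shown the other one again.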

tall condor
#

what you are saying makes a lot of sense. but i still think you need to compensate the bigger batch size with lower learning rate and higher number of runs if you are mixing concepts

stiff dust
#

yes, higher batch size means you need more epochs

#

learning rate, however, can rather be increased

tall condor
#

ah yes because you are learning a mix

#

which is less critical than learning too much of 1 image

#

makes sense

stiff dust
#

I mean, in general you could think of batch size as a purely performance thing. Increasing batch size 10 times means you can increase learning rate 10 times. Training is faster because it can be parallelized more easily

#

but in reality, batch size also stabilizes training. Too high a batch size makes the gradient too stable, too low a batch size makes the gradient too unstable. Somewhere in the middle is the sweet spot
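
The "10 times the batch size, 10 times the learning rate" rule of thumb from above, written out as a tiny helper. This is only the linear scaling heuristic as stated in the chat, not a guarantee, and the function name is made up:

```python
def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: learning rate grows in proportion to batch size."""
    return base_lr * new_batch_size / base_batch_size

if __name__ == "__main__":
    # Hypothetical starting point: lr 1e-6 at batch size 1, moving to batch size 6.
    print(scaled_learning_rate(1e-6, 1, 6))
```

Treat the result as a starting point to test, since the sweet spot mentioned above still has to be found by trying it.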

tall condor
#

what batch size you recommend for "normal ppl"? you think 6 is too high

stiff dust
#

but in deep learning on images, memory requirements are so insanely high that we never even reach this sweet spot. We can only use very low batch sizes. So I would say, take the batch size that fits in your memory

tall condor
#

as for the regularization images, maybe you can help me with that. if i want to finetune a concept for lets say a swimming cat, can i still use a regular cat as regularization?

#

or do i need to have regularization images of swimming cats

#

and then if i want to train for a swimming cat in black fur. do i need cats with black fur as regularization or can i still use the "regular cat"

stiff dust
#

I mean, you can do all these things. If all your images show swimming cats then also showing regular cats during training is a good idea to prevent overfitting

#

but concepts like "swimming cat" can usually be trained purely on textual inversion

#

or just by finetuning the text encoder

#

then you don't need regularization images at all I would say

tall condor
#

i am getting quite good results for most concepts, i just see quite some overfitting for other concepts, especially the ones that have very few images

gentle osprey
weary locust
#

If I train a textual inversion embedding on a new concept, is it possible to train a better LORA using that specific embedding? Can custom embeddings in an interface like automatic1111 affect the training process or are they ignored?

gentle osprey
#

when generating regularization images, do i want a wide distribution of sampling steps?

#

same question for CFG scale

weary locust
# gentle osprey an embedding isn't part of a model, it's a matrix that you feed a model, so no t...

Thanks for the clarification. One more question? So hypothetically if the images contained the custom embedding, and the embedding is loaded into the embeddings folder (within Automatic1111 or comparable webui), would it append the custom vector representation to the corpus of tokens when training a LoRA for an additional boost to fine tuning? Or would there need to be missing functionality added on to achieve this in the form of an extension?

gentle osprey
weary locust
gentle osprey
#

There'd be no token. If you're feeding it images created with an embedding it will treat it like any other image and have no knowledge of the token. At that point it's just pixels. Like the token isn't something the model knows so there's no association between what you feed it and the token.

#

You could specify that token if you wanted the model to associate those images with that token but it would be an approximation of the original.

sonic narwhal
#

I downloaded instaloader using these 3
-m pip install instaloader
pip3 install instaloader
pip install instaloader

But when I run a script that has
import instaloader

it returns with "ModuleNotFoundError: No module named 'instaloader'" Why is this and how to fix?

chrome breach
sonic narwhal
#

"python -m pip install instaloader" I did this today this morning and it fixed it

stiff dust
#

@weary locust In principle you can use embeddings for LORA and it makes total sense to do so. Whether the input embedding is used within the LORA training depends on the implementation you use - would have to check the source code.

#

note that every good implementation of LORA or dreambooth does textual inversion as a first step anyway. So before even starting to train a LORA, a textual inversion is usually trained first

warm agate
sonic narwhal
sonic narwhal
tall condor
#

can anyone recommend an interrogator like CLIP that works standalone with scripts

#

rather than a python module

#

DeepDanbooru works ok but the tags are sometimes really far off

idle crown
#

Hey! I have a total noob question. I have a bunch of custom shapes - I want to have a finetuned SD model that will understand the design style of the shapes and generate a custom shape for any scene I describe - for example : if i give a prompt - chilling at a beach - the model should maybe output a custom shape created out of a picture of sand or a custom shape created out of a picture of sea etc.

#

can anyone please guide me on how I can create a model like this ?

surreal lagoon
#

is there a script that uses captions instead of just a single instance prompt?

#

i can load BLIP and interrogate, as i've got 80G

chrome breach
#

Is it that you are using the library in a virtual env?? but installed the library outside of that env??
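
For the instaloader ModuleNotFoundError above, a quick way to check the "installed outside the env" theory is to ask the interpreter that actually runs the script. Minimal sketch using only the standard library; the helper names are made up:

```python
import importlib.util
import sys

def module_is_installed(name):
    """True if the interpreter running this script can find the module."""
    return importlib.util.find_spec(name) is not None

def install_command(name):
    """pip command tied to the exact interpreter that is running this script."""
    return f'"{sys.executable}" -m pip install {name}'

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    if module_is_installed("instaloader"):
        print("instaloader is visible to this interpreter")
    else:
        print("not found; try:", install_command("instaloader"))
```

If the printed interpreter path is not the one your `pip` belongs to, that mismatch would explain why installing through `python -m pip` fixed it.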

indigo orbit
#

Hello: I lost my upscalers recently. I get "could not find upscaler named R-ESRGAN 4x+, using None as a fallback"

gentle osprey
sonic narwhal
#

Are you guys using any tools for text file batch editing and management?

E.g. now after scraping Instagram I want to batch edit all the text files: remove certain parts of the text, replace hashtags with commas, etc.
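
One possible way to do that kind of batch cleanup with a short script instead of a dedicated tool. This is only a sketch: it assumes one `.txt` caption per image in a single folder, and `clean_caption`, `clean_folder` and the example phrase are hypothetical names, not part of any existing tool:

```python
import re
from pathlib import Path

def clean_caption(text, remove=()):
    """Drop unwanted phrases and turn '#tag' into ', tag'."""
    for phrase in remove:
        text = text.replace(phrase, " ")
    text = re.sub(r"\s*#(\w+)", r", \1", text)   # hashtags become comma-separated tags
    text = re.sub(r"\s+", " ", text)              # collapse runs of whitespace
    text = re.sub(r"(\s*,\s*)+", ", ", text)      # tidy repeated or spaced commas
    return text.strip(" ,")

def clean_folder(folder, remove=()):
    """Rewrite every .txt file in the folder in place; returns how many changed."""
    changed = 0
    for path in Path(folder).glob("*.txt"):
        original = path.read_text(encoding="utf-8")
        cleaned = clean_caption(original, remove)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Since `clean_folder` overwrites files, it is probably worth running it on a copy of the caption folder first.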

echo shuttle
#

hey, my LoRA's are all a bit too strong and look baked on 1 strength, but usually work perfectly on like 0.6 to 0.8. Are there parameters I can adjust to make them a bit weaker to work on 1.0? (I crop them, pre process them with booru tags, remove tags i want my model to be associated with, add a triggerword to every file and then train it with usually 100 iterations each)

sonic narwhal
#

U can make it output multiple versions on different points of training. Then do a test gen with each model and select the one that is not overtrained

echo shuttle
#

hmm lemme find that setting

#

is it save every N steps?

sonic narwhal
#

Save every N epochs

#

if ur using kohyass

echo shuttle
#

yes i am

#

does 1 mean its only one model at the end cuz thats default?

#

these are my usual settings

sonic narwhal
#

ah yea then it will be only one model. Maybe its possible to set the save every N to a number below 1?

echo shuttle
#

well it lets me input 0.1, lemme run it and see

#

while i wait, will my models improve if i use lycoris instead of lora on the same settings?

#

i shouldn't have picked one with 2.5k steps

echo shuttle
#

ok 0.1 also only gave me 1 model

sonic narwhal
#

hm ok

echo shuttle
#

trying 10 now

echo shuttle
#

10 gave me 1 model

#

what is your purpose slider

#

aaaaaa

honest nexus
#

Somebody having problems with kohya dreambooth google colab notebook?

#

seems not working anymore

echo shuttle
#

did you use it a lot recently?

#

i dont use colab but ive read that it starts to lock you out for a certain time if you use it too much, and that time gets longer the more you use it

honest nexus
#

I use multiple google accounts

#

looks like a huggingface problem

echo shuttle
#

did you try a different colab?

swift turret
#

Hey dont know if this the right channel to ask but do you usually prefer to use batch count or use seed variation strength ?

hexed cypress
#

Does anyone have any advice in terms of settings? I am honestly looking to fine tune it a bit as I feel like things are a bit too deformed sometimes

dark gull
stiff dust
#

seed is just for initializing the random generator

#

noise in each image is still different, same way as not every pixel in the noise image is the same

gentle osprey
#

for dreambooth training, what batch size can i use with a 4090

turbid shuttle
#

New to Training, in Dreambooth do I need a text file corresponding to each image?

#

Or is there a better training tool?

gentle osprey
#

so better is hard to quantify

turbid shuttle
#

For context my first attempt used a source model, its hard to tell the difference in the end results

gentle osprey
#

like with no base model? i'd imagine that's something you could only do if you had a server farm at your disposal

#

the base models are trained on absolutely massive data sets

turbid shuttle
gentle osprey
#

that's how each training approach works

turbid shuttle
gentle osprey
#

hahaha, oh yeah

#

there's a masssssssssssssiiiiiiiiivveee learning curve

#

you can get started relatively quickly

#

like took me about two weeks of tinkering to get decent results

#

but the ceiling of what you can learn is really high and always going up

turbid shuttle
#

Unfortunately I started on MidJourney, now I am trying to get the same quality of results

gentle osprey
#

midjourney is definitely a more user friendly product, but less versatile/customizable

gentle osprey
#

hahah that too

turbid shuttle
#

Plus the NSFW filters, I type in something harmless and end up having to appeal it 🙄

#

Okay, Ill keep reading, thank you!

tall condor
#

is there any way to find out what is causing my model to overfit?

#

what is happening is that my model is learning strange patterns and applying them

#

is there any mechanism to avoid those patterns?

gentle osprey
#

or a LoRA and then add a negative weight to it

tall condor
#

as for regularisation images: what happens if they are from a completely different base model like DreamShaper instead of 1.5? does it matter? also how closely do regularisation images have to fit the actual concept i want to train? how far can they be off?

#

how would i filter that pattern and create a negative embedding? also i would prefer to avoid them in the first place

gentle osprey
tall condor
#

what happens if i dont? 1.5 is not even getting close to the concepts i want to train on so its gonna be hard to create regularisation images

gentle osprey
#

it's definitely a kludgy solution though

tall condor
#

im using kohya ss which is running images multiple times based on weights i apply

#

and i guess that i just run certain images too often, but the pattern is so unclear that its hard for me to identify which images are causing it

gentle osprey
tall condor
#

what exactly does dreamstudio do with the regularisation images?

#

is it like a filter or is it like a correction?

gentle osprey
#

dreamstudio or dreambooth?

tall condor
#

dreambooth sorry

gentle osprey
#

regularization images are there to keep your training data from having too much of an effect on the base classes. like if you're trying to teach your model a specific instance of a car, regularization will ensure that all cars don't start looking like your car

gentle osprey
#

do you guys use classification image negative prompts for dreambooth training?

tall condor
#

anybody used regularization images from different models before?

tall condor
#

anyone know how to avoid strange patterns when finetuning? its like it's mixing multiple concepts so that it's neither one nor the other

#

also is there any interpretation of loss?

echo shuttle
#

is there a setting for kohya ss lora training to give me a model after every n% of being done so that i can pick the one that is not overtrained?

tall condor
#

i think you can set that in advanced options

#

i know there is for dream and as far as i saw all the settings are identical so there must be

#

hi guys. what is the point in reducing the Learning Rate in the later stage of the training? LR Scheduler cosine for example? i see that its pretty standard but i dont understand what is the benefit

#

does it like allow to run longer and pick up more details or so?

#

or can i just use constant with warmup and train until the model performs best
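
To make the two options in the question concrete, here is a sketch of what the learning rate looks like over the run for "constant with warmup" versus "cosine with warmup". The formulas are the common textbook shapes and the function names and numbers are assumptions; the schedulers in a given trainer may differ in detail:

```python
import math

def constant_with_warmup(step, total_steps, base_lr, warmup_steps):
    """Ramp up linearly during warmup, then stay at base_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

def cosine_with_warmup(step, total_steps, base_lr, warmup_steps):
    """Ramp up linearly during warmup, then decay along a half cosine to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for step in (0, 99, 100, 550, 1000):
        c = constant_with_warmup(step, 1000, 1e-6, 100)
        k = cosine_with_warmup(step, 1000, 1e-6, 100)
        print(f"step {step:4d}: constant {c:.2e}  cosine {k:.2e}")
```

With the cosine shape the late steps make smaller and smaller updates, while the constant schedule keeps changing the model at full strength until you stop it.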

sonic narwhal
chrome breach
stiff dust
#

actually, I would use real class images if available

#

in particular for people, as they don't have strange artefacts and deformed hands

#

I would say its less important that the regularization images come from SD itself. Its more important that they have high diversity and you use every reg image only once or few times

chrome breach
#

I see...

#

U got any ideas on how to NOT let the model overfit on the training data? (with regularization images made from SD itself by the usual way)

tall condor
#

@stiff dust do you know how well the regularization image have to fit the concept i want to train? or can they be very general?

stiff dust
#

I think the idea is to give them almost the same caption as your training images. However, the best is probably to use random LAION subsets - so they can also be very general

tall condor
#

anyone know if kohya ss is bucketing regularisation images?

#

kaibioinfo do you have any comment on my question regarding learning rate?

tall condor
#

so it appears that a lower learning rate (5e-7) in combination with constant scheduler (with warmup) does much better than 1e-6 with cosine

#

may i ask what settings you guys are using?

gloomy pike
#

in automatic1111's web ui does anyone know how to make batch with masks do "only mask" instead of "whole image"? is there an argument I could use to make only mask the default?

#

I guess I found a way because text2mask uses the inpaint settings instead of the batch settings when masking and generating from batch. I would like to use the masks generated with the batch though so I could use the xyz script instead of the txt2mask.

#

Idk I guess the batch is using inpaint settings too. I just tested with a generated mask. Idk, the last time I tried it would always use whole image regardless of the setting in the inpaint panel.

#

maybe it's because this time I selected it in inpaint upload instead of regular inpaint? Is that correct???

tall condor
#

how many regularisation images should i provide per concept?

#

i read online it shall be 200 per training image is that correct?

devout dome
gloomy pike
# devout dome When you use "only mask", what do you have the padding set at?

default, I've checked, inpaint upload settings are used for batch stuff and txt2mask.

I've run into a new issue now though. I can't use only mask with batch because when I do it never changes the mask to the next one along with the corresponding images, I think the masks work normal in batch when it is generated based on whole image but when I use only mask it keeps the first mask and goes through all the different images with the same first mask. They are all named the same as their corresponding images.

finite creek
#

Hello everyone, after training on 1.5 for a while I’m trying 2.1. Does anybody know if there are specific parameters to take into account that would be different from 1.5?

stiff dust
#

one per epoch per training image

#

but if you have less its not that bad

tall condor
#

ok so if i run 100 epochs i shall have 100 * number of images in concept * number of concepts right

#

so 70k images xDD

#

700k sorry xD
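
The "one per epoch per training image" rule from above as plain arithmetic. The numbers in the example are hypothetical and are not the actual dataset from this chat:

```python
def regularization_images_needed(epochs, images_per_concept, concepts):
    """One fresh regularization image per epoch per training image."""
    return epochs * images_per_concept * concepts

if __name__ == "__main__":
    # Hypothetical run: 100 epochs, 20 images per concept, 5 concepts.
    print(regularization_images_needed(100, 20, 5))
```

As noted above, having fewer than this is not that bad; it just means some regularization images get reused.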

turbid shuttle
#

Inpaint/Sketch: I've noticed the masked area doesn't reset when I add a new image. The paint color is removed but the mask is still there. Anyone else have this issue or a different inpaint/sketch tool?

#

Also after a few uses during the session it begins to alter areas never masked

serene flicker
surreal lagoon
#

has anyone tried altering the sort of the images pulled in by the dreambooth script, to sort by atime in reverse order eg. the oldest images / least-touched images go first?

hot breach
#

I use random shuffle the entire data set every epoch in ED2

#

main gain may just be putting different images together in batches every epoch more than the order matters

#

the original dreambooth repos based on xavier's repo only have batch size of 1 or 2 maybe on 24gb so it may be a bit moot

stiff dust
surreal lagoon
#

^ this

#

i'm using about 122,000 images right now and it's just... amazing. they're well-tagged and varied

warm agate
surreal lagoon
#

they are also called 'class images'

hot breach
# warm agate What are regularization images?

two techniques, dreambooth uses SD generated images mixed in with training, or if fine tuning you can mix in your own, you can use multiply.txt in ED2 to load those images less frequently compared to training images

#

the purpose either way is to avoid overfitting to your training data, i.e. "remind the model what it already knows" so it doesn't forget, in very hand-wavy terms

warm agate
hot breach
#

dreambooth has a particular way it pairs the training images and regularization images, it pairs them up every step

#

ED2 does not, its random shuffle of all the data, no distinction is made inside the software on what "regularization" even means

warm agate
hot breach
#

in dreambooth, batch size 1 would mean 1 training image and 1 regularization image are in 1 batch/step

warm agate
hot breach
#

in ED2 its random selection, everything is just shuffled together, you can just sort of simulate the dreambooth thing though
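
A sketch of the two batching styles being contrasted here: DreamBooth-style steps that always pair one training image with one regularization image, versus one shuffled pool where no distinction is made. These helpers are illustrative only and are not code from either trainer:

```python
import random

def dreambooth_style_batches(train_images, reg_images, seed=0):
    """Each step pairs one training image with one regularization image.

    Assumes reg_images is non-empty; regularization images are reused
    in rotation if there are fewer of them than training images.
    """
    regs = list(reg_images)
    random.Random(seed).shuffle(regs)
    return [(img, regs[i % len(regs)]) for i, img in enumerate(train_images)]

def shuffled_batches(all_images, batch_size, seed=0):
    """Everything goes into one pool, is shuffled, then cut into batches."""
    pool = list(all_images)
    random.Random(seed).shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
```

To roughly simulate the DreamBooth behaviour with the shuffled approach, you would control how often each kind of image appears in the pool rather than pairing them explicitly.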

surreal lagoon
#

the typical setup for DB has a lot of 'repeats' on your training data and fewer repeats on the class data

warm agate
hot breach
#

ED2 was built for general fine tuning, dreambooth is a specific technique, so ED2 is more general training and fine tuning

surreal lagoon
#

they're generated by the checkpoint you're training from.

hot breach
#

in dreambooth the typical technique is regularization images are generated from SD itself, they are inference outputs

warm agate
#

Oh ok

surreal lagoon
#

they usually look like total garbage but it depends on the checkpoint

warm agate
#

So they use the base SD's images as their input for regularization images

hot breach
#

I'm not a fan myself, when you can source LAION or FFHQ or COCO and use those instead if you even need regularization at all

surreal lagoon
#

that's highly likely to burn the model if you don't use reggies from the checkpoint, btw

#

at least this is the case for 2.1

#

it possibly isn't for 1.5. i haven't checked

hot breach
#

not much happens if you just train for one character

warm agate
surreal lagoon
#

ah see i've been trying to do generalised fine-tunes and so class data tends to harm my results

hot breach
#

partial freezing of text encoder and using a separate lower LR seem to help with training sd2.1 a lot

warm agate
surreal lagoon
#

i just use polynomial learning rate and a high warmup run

#

training and fine-tuning are the same but generally when a distinction IS made, training is understood to be from scratch and fine-tuning is providing specific concepts to a pretrained model to bring those weights up and make it more likely that type of output is produced.

hot breach
surreal lagoon
#

StabilityAI

hot breach
#

stability ai

surreal lagoon
#

jinx

warm agate
#

Ok

surreal lagoon
#

there's a group that re-trained SD 1.5 on more than 2.9 million images with thoroughly tagged captions and i personally have trouble declaring that 'fine-tuning' considering the extent of catastrophic loss from the original SD 1.5 model but that IS fine-tuning.

hot breach
#

yeah fine tuning is a pretty generic term, I would say dreambooth is a specific technique inside fine tuning for example

surreal lagoon
#

yeah. dreambooth is a subset

hot breach
surreal lagoon
#

for sure, yes

#

and LoRAs generally are a world of their own, with a lot of similarity to Dreamboothing but different training data setup, different learning rate, different impact on each "delta from zero" for each hyperparameter you change

hot breach
#

you can do some pretty amazing things with just a few thousand images though, train entire fictional worlds worth of characters and scenery and stuff, people underestimate how much "room" is in the model to learn

surreal lagoon
#

well the model has a lot of garbage connections

warm agate
hot breach
#

example of dreambooth or of fine tuning?

warm agate
surreal lagoon
#

@warm agate a general fine tune will have thousands and thousands of ideally, well-captioned data. this results in a "generalization" of your improvements across all of the tags you had in your training data. this can be MONUMENTAL.

dreambooth is trying to insert a single subject into a model so it can be referenced by a single keyword. in other words, add yourself into your favourite model so you can become a subject in its fantasies.

hot breach
#

"fine tuning" is training an already trained/started model with labeled data (i.e. captioned images), that's the most generic version

#

dreambooth does the same thing, but generally the labels are a fixed word like "xyzbob" or "xyzbob person" and regularization images are also mixed in with just some generic label like "person"

#

the point of either is to make the model learn something it doesn't know, can be anything that relates text to a 2D image really, like styles, camera angles, characters, etc

warm agate
hot breach
#

LORA is its own thing, its a trick to try to make training more efficient by training and patching a much smaller submodule, but it isn't actually updating the core model weights at all

warm agate
surreal lagoon
#

yeah, expanding on that last point, you can use Dreambooth to "fix" the model's understanding of a "concept". example: SD 2.1 cannot make aliens.

solution: provide Dreambooth,

  • the instance prompt "aliens" and class prompt "person"
  • about 500-3000 training images of different aliens
  • about 15,000 class images
  • use a VERY low learning rate, and a LARGE number of steps

and that will overload the 'aliens' keyword with your concept from the training data, usually replacing the astronaut it places under 'alien' by default.

warm agate
#

What are class images?

hot breach
surreal lagoon
#

we're going in circles

#

you already asked that

surreal lagoon
#

class images = regularization data

#

you have a 'subject' and a 'class' in Dreambooth training. if your subject is Lara Croft, your class is woman

hot breach
#

you pick your class, you could use "person" too but yeah, the idea is the class is some sort of super-class of your trained thing

surreal lagoon
#

if you're trying to improve the anatomy of humans, providing 'hands' as your subject, your class would be 'human anatomy'

#

and good luck with that

hot breach
#

if you are training your pet dog Chewy, your class would probably be "dog" etc

warm agate
#

For example, I input 100 images of 'Forests', 100 images of 'beaches', 100 images of 'sunset' and 100 images of 'camels'.
So with a prompt like "An aerial view of a beach during sunset with a dense forest located near the beach, camels approaching the beach through the forest"

warm agate
#

@hot breach @surreal lagoon

surreal lagoon
#

yeah you'd want to provide all of that in a single dataset and put what you would prompt for each image as its caption

#

there's many different training tools and so i can't really provide guidance on how you'd use those captions for training, but i name my files by their prompts, with _ in place of spaces and then in my training code, i replace those with spaces and do a bit more cleanup on it
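
The file-name-as-caption approach described above could look something like this. It is a guess at the general idea, not the actual training code being referred to, and `caption_from_filename` is a made-up name:

```python
import re
from pathlib import Path

def caption_from_filename(filename):
    """Turn 'a_photo_of_a_camel.png' into the caption 'a photo of a camel'."""
    stem = Path(filename).stem           # drop the folder and the file extension
    text = stem.replace("_", " ")        # underscores stand in for spaces
    text = re.sub(r"\s+", " ", text)     # a bit of cleanup: collapse repeated spaces
    return text.strip()

if __name__ == "__main__":
    print(caption_from_filename("an_aerial_view_of_a_beach_during_sunset.jpg"))
```

File names have length and character limits, so very long or punctuation-heavy prompts may fit better in a separate caption file.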

hot breach
#

dreambooth works fine for training in one person with like 10-30 images, but if you want to train 8 characters and a bunch of screenshots from your favorite TV series all at once, or reform the entire model to be some special style, I don't think dreambooth technique is very helpful

surreal lagoon
#

nah if you do it right you can train a movie into a single keyword and avoid style bleed that you'd see with general fine-tune. it all depends what you're after.

#

i did this with The Hobbit, and "lotr style" would make everything look very, uh, Peter Jacksony

#

but that's when i realised The LOTR movies actually have a terrible style to them

#

🤣

#

i thought the training just didn't work but i went back and looked at the movie and was like holy cow, it really does look like that

#

maybe i can pre-process them with img2img to make them brighter and more vibrant but the movie is dull and grainy and even just straight-up blurry and it makes all of the images you apply the LOTR Style keyword to appear "decayed"

#

city skyline = vibrant, colourful, alive
city skyline lotr style = Aleppo, Syria

#

i want to try A Scanner Darkly next

warm agate
#

If we train a model with a dataset of 1 million or maybe more as it's easy to get the dataset images of humans, will the faces become way better?

stiff dust
#

yes and you don't need millions. I think most models out there are trained on very few images. You see this, because they tend to generate the same faces over and over

#

so more is better, but a thousand is probably more than enough

#

also there is a limitation. The problem SD has with generating faces of people farther away in the picture (e.g. full body shots) is a limitation of the model rendering in low resolution. You probably won't be able to fix that (except if you train on very high resolution, which takes an insane amount of time and memory)

stiff dust
#

just check models like icantbelieveitsnottrue

#

they achieve really good faces

stiff dust
# warm agate Why only high res for full body?

because SD computes in 8 times lower resolution. So if you compute a 512x512 picture, the internal latent resolution is 64x64. Now if the face in the original image was 64x64 pixels in size, its internal size is 8x8. This is too few pixels to get the details right
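
The arithmetic above as a small sketch. The factor of 8 comes from the message; the function name `latent_size` is made up for illustration.

```python
VAE_DOWNSCALE = 8  # SD works on latents 8x smaller than the image on each side


def latent_size(pixels: int, factor: int = VAE_DOWNSCALE) -> int:
    """Side length in latent space for a region `pixels` wide in the image."""
    return pixels // factor


for label, px in [("512x512 image", 512), ("64x64 face", 64)]:
    side = latent_size(px)
    print(f"{label}: {side}x{side} in latent space")
```

So a face that fills the whole 512x512 frame gets 64x64 latent cells, while the same face in a full body shot gets only 8x8.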

warm agate
#

Oh, so we have to try like 4096x4096 images?

#

But I don't think we can get such high res images

stiff dust
#

no, you can't. Use upscaling and inpaint, or tiled diffusion. There are many techniques to get higher details and fix artefacts in images

stiff dust
#

no, train normally on the native resolution or close to native

#

when you generate images you can use upscaling to make images larger, then img2img to add details.

orchid jay
#

weird discovery, I dreamboothed my favorite model and the whole model ended up looking even better....

#

even when prompting stuff that was not dreamboothed. I think it's cuz of my reg images?

stiff dust
#

I would say that's normal if the images you use are aesthetically better than the random stuff the model generates otherwise

tall condor
#

hi guys, how many times shall i run regularisation images? my training images run like 30 times per epoch per image

#

is 1 run gonna be too little?

orchid jay
#

I use 1500 total reggies and split them up by my instance img ct

#

so if I have 100 instance images, thats 15 reggies a pop
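
That split as a quick sketch. The 1500 and 100 are the numbers from the message; `reg_images_per_instance` is a hypothetical helper, not a setting from any particular trainer.

```python
def reg_images_per_instance(total_reg: int, instance_count: int) -> int:
    """Spread a fixed pool of regularization images over the instance images."""
    if instance_count <= 0:
        raise ValueError("need at least one instance image")
    return total_reg // instance_count


print(reg_images_per_instance(1500, 100))  # 100 instance images
print(reg_images_per_instance(1500, 30))   # a smaller dataset gets more each
```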

stiff dust
#

usually 1 time is enough 😳
ideally, you do not train more than once or a few times on the same regularization image

gloomy stag
#

Has anyone had great success with blip2 or clip captioning? I am trying to find a project preferably with a guide to run either of these for human pictures.

surreal lagoon
#

didn't expect that. i was careful with the input data

surreal lagoon
#

about 140,000 remain R_Flex

#

god bless the A6000

warm agate
#

What does sub 60 images mean?

gloomy stag
#

I'm having trouble finding a blip2 project that works on RunPod

ancient mural
#

I made a LoRA with Dreambooth but every image it generates is miscolored or has a blue tint to everything. Any ideas?

stiff dust
#

I have observed such artefacts when the unet is trained with too low a rank

#

the default for LoRA is 4, which is fine for the text encoder but too low for the unet

#

use larger rank (e.g. 16 or higher)
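
For a feel of what the rank controls: LoRA replaces the update to a weight matrix with two thin matrices, so the trainable parameter count grows linearly with rank. A rough sketch, where the 320x320 layer size is only an illustrative assumption and `lora_params` is a made-up helper:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    The update is factored as B @ A with A of shape (rank, d_in)
    and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


D_IN = D_OUT = 320  # illustrative layer size, an assumption
full = D_IN * D_OUT

for rank in (4, 16, 64):
    added = lora_params(D_IN, D_OUT, rank)
    print(f"rank {rank:>2}: {added} trainable params "
          f"({added / full:.1%} of the full {full}-weight matrix)")
```

This only shows capacity; it does not by itself explain the blue tint.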

surreal lagoon
#

141,212 to go

serene flicker
#

Oh

surreal lagoon
#

mixed-precision training

stiff dust
#

as long as the dataset is not too huge, I prefer training in float32 to avoid these issues altogether

surreal lagoon
#

100% i agree

#

what qualifies as too large in your eyes? i am using a6000, 4090, and a100 80G cards for training

#

i assume each has a different threshold

stiff dust
#

for textual inversion I use 16bit, though, as it is really slow otherwise

#

lora training, in contrast, is often surprisingly fast even with 32bit

surreal lagoon
#

yeah i saw that when helping Sytan figure his overly baked output out

regal trail
#

Hey guys, what are you using to train dreambooth? I used StableTuner because I could use the shuffle after epoch on Windows, but the install is broken. Does anyone have a better alternative? I have a 3090 and a 4090 and have had quite a decent experience training models on 20-50k images. I'm really looking for dreambooth, not fine-tuning, as EDT is good enough for that

hot breach
#

you can simulate dreambooth in everydream2, the bonus is it is actually maintained

regal trail
#

Thank you

surreal lagoon
#

has anyone tried to keep a translation list of common terms and swap the words out randomly when training english datasets so that the encoder is introduced to new languages? eg. say you have a varied dataset of landscapes, subjects, objects, and you want cats to also be gatto, you could change out cat for gatto randomly when you encounter it
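
A rough sketch of what that caption swap could look like. The `TRANSLATIONS` table, the 50% default probability, and the name `swap_terms` are all placeholders for illustration, and nothing here says whether training this way actually works.

```python
import random
import re

# Hypothetical translation list: English term -> possible replacements.
TRANSLATIONS = {
    "cat": ["gatto"],
    "dog": ["cane"],
}


def swap_terms(caption, translations, prob=0.5, rng=None):
    """Randomly replace known words in a caption with a translation."""
    rng = rng or random

    def repl(match):
        word = match.group(0)
        options = translations.get(word.lower())
        # Swap only known words, and only some of the time.
        if options and rng.random() < prob:
            return rng.choice(options)
        return word

    return re.sub(r"[A-Za-z]+", repl, caption)


rng = random.Random(0)  # seeded so runs are repeatable
for _ in range(3):
    print(swap_terms("a cat sitting next to a dog", TRANSLATIONS, rng=rng))
```

Calling this once per caption per epoch means the model sees both the English and the translated word for the same images over time.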

stiff dust
#

wouldn't it be easier to make a new token with the same embedding as e.g. cats
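
The idea as a toy sketch, with a plain dict and list standing in for a real tokenizer vocabulary and embedding matrix. `add_alias_token` is a made-up helper, and a real tokenizer may split a new word into several sub-tokens, which this ignores.

```python
from typing import Dict, List


def add_alias_token(vocab: Dict[str, int],
                    embeddings: List[List[float]],
                    new_token: str,
                    source_token: str) -> int:
    """Add `new_token` and start it from a copy of `source_token`'s vector."""
    if new_token in vocab:
        raise ValueError(f"{new_token!r} is already in the vocabulary")
    source_id = vocab[source_token]
    new_id = len(embeddings)
    embeddings.append(list(embeddings[source_id]))  # copy, not a shared row
    vocab[new_token] = new_id
    return new_id


# Tiny made-up vocabulary with 3-dimensional embeddings.
vocab = {"cat": 0, "car": 1}
embeddings = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]

gatto_id = add_alias_token(vocab, embeddings, "gatto", "cat")
print(gatto_id, embeddings[gatto_id])
```

Because the row is copied, later training can move "gatto" away from "cat" without touching the original.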

warm agate
#

@stiff dust is ED2 better than Dreambooth for landscape photography?

#

What does batch size mean in training?

stiff dust
#

it's all just different scripts implementing fine-tuning

#

but I would say ED2 is most sophisticated

#

batch size should be set as high as possible without getting out of memory errors
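
One way to find that limit is to keep doubling the batch size until a training step runs out of memory. A minimal sketch, where `fake_step` and its limit of 24 are stand-ins for a real training step, and a real trainer would catch its framework's own out-of-memory error rather than Python's `MemoryError`:

```python
def largest_batch_size(try_step, start=1, limit=256):
    """Double the batch size until `try_step` runs out of memory."""
    best = 0
    size = start
    while size <= limit:
        try:
            try_step(size)
        except MemoryError:
            break              # this size does not fit, keep the last good one
        best = size
        size *= 2
    return best


def fake_step(batch_size):
    """Stand-in for one training step; pretends memory runs out above 24."""
    if batch_size > 24:
        raise MemoryError("out of memory")


print(largest_batch_size(fake_step))
```

Doubling only tests powers of two, so the true limit may sit somewhere between the returned value and the next step up.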

warm agate
#

Which one do you suggest for Landscape photography training?