#🔧|finetune


floral pollen
#

Oh sorry, you meant the training aspect. Yes, totally agree

unique cloak
#

yep

#

I tried using SD and controlnet for "logo variation" though

#

using my old job's logo

#

no need for training on such things

#

controlnet is quite strong

floral pollen
#

Oh that is great. My brother did something similar with just text for an album cover of his

#

This technology in the right hands is insanely versatile

unique cloak
#

base logo (real estate)

floral pollen
#

(I think he used depth maps for that though)

unique cloak
#

just magic tbh

floral pollen
#

Okay that is so cool

floral pollen
#

That is so creative

unique cloak
#

I was going for a "never stop dreaming" kind of vibe

#

with the real estate company's logo being a portal to a paradise house

#

I would use it as commercial material

floral pollen
#

Yeah that hit it perfectly, this sand texture is insane

#

Absolutely

#

Crazy

unique cloak
#

last one

#

less realistic

floral pollen
#

But still an interesting vibe, a bit uncanny valley

unique cloak
#

most things don't need training

#

for basic stuff like those at least

#

but training on top brings it to another level

floral pollen
#

Yeah, absolutely

unique cloak
#

the funniest part is, I could make more good variations like those

#

and train on all those

floral pollen
#

I'm trying to get the embedding to work so I can try it out in combination with an automated use of ControlNet for video generation on top of an existing one. I thought that might come in handy

unique cloak
#

it's better with an in-house artist for making the initial dataset though

unique cloak
#

for research purposes I may have. It's scary what you can do

unique cloak
#

sorry, misinterpreted the sentence

floral pollen
#

You just needed to make sure that there are no hands in the pictures

unique cloak
#

I misunderstood you on that one. I meant I may have trained on a random insta I found, just to see if I could emulate another post. Clearly, yes

unique cloak
#

yeah, it moves fast :p

#

what will the next models bring?

floral pollen
#

I don't know but I'm sure it will give us more and more and more and more control over this tool we know so much about already

unique cloak
#

quite a wild ride for sure 🙂 I'm happy I caught that train

#

this took quite a hit on my videogame time lol

floral pollen
#

I'm so happy too. Seeing all the news about it now, I would probably not have been so open to it, since I would have had to learn so much. But yeah, my playtime is down to a minimum too xD

I do actually have a question though, now that I'm setting up my local env: about the batch size and gradient accumulation steps when training embeddings directly in the automatic1111 UI (I think that is dreambooth). Do you know your way around those by any chance?

unique cloak
#

one of those yeah

#

the other one I'm not sure about

#

batch size is exactly like in picture making

#

but for training

#

you train on multiple pictures at once

floral pollen
#

Gradient accumulation steps is just a multiplier for batch sizes per step, so nothing too fancy

unique cloak
#

making you go through your dataset faster

#

but it also has a positive effect on quality

#

like, I did some isolated tests on that one

#

backgrounds were 100% better on batch size 6 than 1

#

for the same dataset and same training seed

#

gradient accumulation I wasn't sure about

floral pollen
#

Oh, that is very interesting.

#

Gradient accumulation is just a way to have a larger batch size per step, at least from what I've gathered. If you have 50 images and your card can only fit 25 of them in VRAM, you set batch to 24 and gradient to 2 (an uneven number, to shuffle it a bit). And then it will go through 48 images per step.

#

But that is also only what I've heard; I haven't been able to actually test multiple things out
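The batch / gradient-accumulation arithmetic from the message above, as a small sketch. The helper names are made up for illustration, and `updates_per_epoch` assumes a final partial batch is kept rather than dropped, which varies between trainers.

```python
import math


def effective_batch(batch_size: int, grad_accum_steps: int) -> int:
    """Number of images that contribute to a single weight update."""
    return batch_size * grad_accum_steps


def updates_per_epoch(num_images: int, batch_size: int, grad_accum_steps: int) -> int:
    """Weight updates needed to see every image once (assumes a partial last batch is kept)."""
    return math.ceil(num_images / effective_batch(batch_size, grad_accum_steps))


# The example from the chat: 50 images, batch size 24, gradient accumulation 2.
print(effective_batch(24, 2))        # 48
print(updates_per_epoch(50, 24, 2))  # 2
```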

#

But in general, without all that, it means I should set my batch size as high as possible without running out of VRAM, right? Or is there a good value that should be used in all cases?

#

Because the colab implementation uses a default batch size of 1 and gets some decent results too

unique cloak
#

basically yes; it's also faster per picture overall

unique cloak
#

if you can fit more, put more in batch size would be my tip

unique cloak
#

it's faster

#

like when making pictures

floral pollen
#

Oh for real? I didn't know that, I thought it would add time

#

that's awesome!

unique cloak
#

making 8 batches of 1 picture is about twice as slow as 1 batch of 8

floral pollen
#

oooooh that is so good to know xD

unique cloak
#

there is only 1 downside that I know of

#

and it's only in some cases

#

it needs to batch pictures of the same size together

#

so if you have lots of ratios

#

it can be a pain, and some pictures can be dropped out of the dataset

#

not all tools support multi ratio anyway
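A minimal sketch of why mixed ratios can drop pictures, assuming a trainer that only batches images with identical resolution and simply skips bucket leftovers that can't fill a whole batch (real tools differ: some resize, crop or pad instead). `bucket_batches` is a hypothetical helper, not any tool's API.

```python
from collections import defaultdict


def bucket_batches(sizes, batch_size):
    """Group image indices by (width, height) and cut each group into full batches.

    Returns (batches, dropped): a batch only ever mixes images of the same size,
    and leftovers that can't fill a whole batch end up in `dropped`.
    """
    buckets = defaultdict(list)
    for idx, size in enumerate(sizes):
        buckets[size].append(idx)

    batches, dropped = [], []
    for indices in buckets.values():
        full = len(indices) - len(indices) % batch_size
        for start in range(0, full, batch_size):
            batches.append(indices[start:start + batch_size])
        dropped.extend(indices[full:])
    return batches, dropped


# 6 square pictures, 3 portrait, 1 landscape
sizes = [(512, 512)] * 6 + [(512, 768)] * 3 + [(768, 512)]
batches, dropped = bucket_batches(sizes, batch_size=4)
print(batches)  # [[0, 1, 2, 3]]
print(dropped)  # [4, 5, 6, 7, 8, 9]
```

With `batch_size=1` every image is its own batch and nothing is left out, which fits training the mixed-size contest submissions at batch size 1.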

floral pollen
#

All of my pictures are 512 so that is not a problem ^^

unique cloak
#

yep

#

edge case

floral pollen
#

yesss

#

Can I ask you one more question?

#

About the dataset setup?

unique cloak
#

I have it every week when I train the model for the PoW

#

yeah

unique cloak
#

those models I make using the submissions to the contest here

#

and almost every submission has a different size

#

so I train on batch size 1

#

weekly contest on the server

floral pollen
# unique cloak yeah

In the video I watched to get started, the person preprocessed the images, meaning an AI looked at each image and placed a text file next to it with a prompt that you then edit. He said: "Everything you describe in detail will NOT be part of your embedding". Meaning if the person wears a black shirt and I write that in there, it does not get saved in the embedding, since it is not defined as part of the character.
So I went through all of the pictures I wanted to train on and created descriptions.
Looking at the Colab version, you cannot do that, or maybe it is not necessary? Do you have any experience with that kind of thing?
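The "text file next to each image" layout usually means one `.txt` per image with the same file name. A minimal loader sketch assuming that layout; `load_captions`, the extension list, and treating a missing caption as empty are illustrative choices, since trainers differ.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def load_captions(folder):
    """Map each image file name to the text of its sidecar caption (img01.png -> img01.txt)."""
    captions = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip the .txt files themselves and anything else
        sidecar = image.with_suffix(".txt")
        captions[image.name] = sidecar.read_text(encoding="utf-8").strip() if sidecar.exists() else ""
    return captions


# Demo on a throwaway folder with empty placeholder "images".
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "img01.png").write_bytes(b"")
    (folder / "img01.txt").write_text("man, black shirt, white wall\n", encoding="utf-8")
    (folder / "img02.png").write_bytes(b"")  # no caption file
    result = load_captions(folder)

print(result)  # {'img01.png': 'man, black shirt, white wall', 'img02.png': ''}
```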

unique cloak
#

reading your message

unique cloak
#

but the UI I was seeing for dreambooth in automatic earlier seemed to let the user use captions

floral pollen
#

Oh yes it does. I was just wondering if you use it, since the Colab version doesn't even mention it. Reading the attention text now

unique cloak
#

that Colab has it if you need it

#

TheLastBen

#

I can also find a Shivam one or everydream one I think

#

if needed

#

don't feel obligated to run every cell in that TheLastBen one. Some are really optional if you don't need them

floral pollen
#

Oh thank you so much! I'm going to take a look at it when I need to test something in colab the next time.

I am a bit confused about the tokens tbh. I currently have this prompt for example for one of the pictures

a man, wearing a green shirt, sitting on a bench in a garden, sunset, his hands folded, garden in the background

If I would want to take the weight away from the objects it sees I should probably remove the posing/position attributes, like "sitting" and "in the background" leaving only garden

a man, wearing a green shirt, garden, sunset, his hands folded

And I should remove extra words like "a", "wearing a" and "his [...] folded". Leaving me with:

man, green shirt, garden, sunset, hands

Or is that too extreme? And should "man" even be in there then? Don't we want "manly features" to be associated with the character, and therefore don't want to reduce the weight from it?

unique cloak
# floral pollen Oh thank you so much! I'm going to take a look at it when I need to test somethi...

it isn't extreme enough in my opinion, but this depends on your dataset.
If your dataset keeps changing the colour of the shirt and there is relative balance in the colour used, then you don't want to use "green" at all
Same goes if there are lots of tops, not only shirts. If you present diversity, then giving the token is not useful, unless you also want to train that token.

The tip about captioning other things is useful if you are stuck in that environment.
Like if all pics are in a garden, it would be essential to caption it, and even then, it would still be learned with your main token because it was in the whole dataset.
But what is your main token in that sentence? What is the concept you are training?
This feels like a caption intended for full fine tuning, that's great too, but requires a beast of a dataset, and I'm not sure it's what you were going for

#

basically, not captioning something, if that something keeps changing in the dataset, makes the AI think that it is not an important part, and that it can be swapped out for similar parts
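A rough way to check a dataset against that rule: list the visible attributes of each picture by hand (plain dicts here) and see which ones never change. Going by the advice above, constant attributes will tend to end up in the trained token whether or not you caption them, while the varying ones are where captioning decisions matter. `split_attributes` and the example data are hypothetical.

```python
def split_attributes(images):
    """Split attribute names into those constant across the dataset and those that vary."""
    keys = sorted({key for attrs in images for key in attrs})
    static, varying = [], []
    for key in keys:
        values = {attrs.get(key) for attrs in images}  # a missing attribute counts as None
        (static if len(values) == 1 else varying).append(key)
    return static, varying


dataset = [
    {"shirt": "black", "background": "white wall", "pose": "sitting"},
    {"shirt": "black", "background": "white wall", "pose": "standing"},
    {"shirt": "green", "background": "white wall", "pose": "sitting"},
]
static, varying = split_attributes(dataset)
print(static)   # ['background']
print(varying)  # ['pose', 'shirt']
```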

floral pollen
#

I actually do not present diversity, 95% are black I just grabbed pretty much the only one that was green as an example xD

I simply want to train on a dataset of myself, with images of myself.

I'm always in front of the same background. Since I do not present diversity there, I should add it to the caption, correct?

unique cloak
#

ok so you've made a dataset basically from a session of selfies

#

in that case some things will happen

floral pollen
#

How would I go about defining the main token in my caption then? I do actually just want my face and hair to be the embedding, at least as best as it can

unique cloak
#

and there is a very simple caption strategy

unique cloak
#

you won't be able to have just your hair and not that wall

#

you have presented that wall in each pic

#

if you were training on the wall, you would have done that too

#

it will be learned

#

do another session in the street, or accept to have the wall as background

#

with or without caption

#

or play in photoshop first

#

and insert backgrounds

#

as for the captioning strategy, very simple

#

caption all your pictures the same way: "nurni" or something

#

a single token
representing what should be static there: you
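The single-token strategy boils down to giving every picture the identical caption. A sketch that writes one sidecar `.txt` per image containing only the trigger word, assuming your trainer reads captions from same-named `.txt` files; `write_token_captions` is a hypothetical helper and "nurni" is just the arbitrary token from the chat.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def write_token_captions(folder, token="nurni"):
    """Write `token` as the entire caption for every image in `folder`; return how many were written."""
    count = 0
    for image in sorted(Path(folder).iterdir()):  # sorted() lists the folder before we add files
        if image.suffix.lower() in IMAGE_EXTS:
            image.with_suffix(".txt").write_text(token, encoding="utf-8")
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "b.jpg", "notes.md"):  # notes.md is not an image and gets no caption
        (Path(tmp) / name).write_bytes(b"")
    written = write_token_captions(tmp)
    captions = sorted(p.read_text(encoding="utf-8") for p in Path(tmp).glob("*.txt"))

print(written, captions)  # 2 ['nurni', 'nurni']
```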

floral pollen
#

Ah okay. So there is literally no way around that, okay. I do have another dataset of a friend of mine, because I put her head on a D&D character for our campaign. Maybe I should use that for testing then

unique cloak
#

if you want to test, yeah, it's a possibility. Or just open your Google Photos and crop headshots of a family member

unique cloak
#

the only thing that would change is that you "erase" what was in the token you choose

#

but you can also test on your face with that static background, or take pictures from your Facebook for some background variety

#

depends on your goal here ^^

floral pollen
#

oh okay. Meaning when all of my dataset pictures have "nurni" inside of them and the only thing that is not being diversified is my face and hair it will learn to simply associate that with that word?

#

(I'm using myself as an example, but I'm going to use the other dataset)

unique cloak
#

exactly

#

if only 1 token is present, every "static" part will be trained in the token

floral pollen
#

Which means I could literally go into my friend's folder and don't need to write 44 captions, one for each image? 0.o

unique cloak
#

it's how I trained that insta I was talking about

#

literally dumped all pics in a folder, selected the 20 best ones, called them all with the same arbitrary token

floral pollen
#

Omg that would have saved me so much time back then 🥹 🥲

unique cloak
#

I just hunted for good pictures, that kept presenting variety

floral pollen
#

I sat there for 2 hours creating fake prompts xD

unique cloak
#

like color of suit changing each time

floral pollen
#

Okay, I can definitely do that with this set then

unique cloak
#

I will or won't say that I had to make a folder for each swimsuit

floral pollen
#

xD

unique cloak
#

to balance the dataset out
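One way to do that kind of balancing with folders (an assumption about the approach, since the details weren't posted): give each folder a repeat count so every group contributes about as many samples per epoch as the largest one. This assumes a trainer that supports per-folder repeats, which several do with their own naming conventions; `balance_repeats` is hypothetical.

```python
import math


def balance_repeats(folder_counts):
    """Repeat count per folder so each folder supplies about as many samples as the largest one."""
    target = max(folder_counts.values())
    return {name: math.ceil(target / count) for name, count in folder_counts.items()}


counts = {"red_swimsuit": 10, "blue_swimsuit": 4, "white_swimsuit": 3}
repeats = balance_repeats(counts)
print(repeats)  # {'red_swimsuit': 1, 'blue_swimsuit': 3, 'white_swimsuit': 4}
print({name: counts[name] * r for name, r in repeats.items()})
# {'red_swimsuit': 10, 'blue_swimsuit': 12, 'white_swimsuit': 12}
```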

floral pollen
#

Okay took a look at the image set and she is always wearing a black shirt too. Does that mean I should add it as a token or not?

unique cloak
#

no, it just means it will be very hard to prompt her in another shirt

#

if you don't have variety on it, how would the AI tell the difference between the shirt and the hair, even if you added it in the caption?

#

(you should ask for her consent maybe here, just saying)

floral pollen
#

Oh you're right, I never thought about it this way. But that also means that putting some effort into Photoshop to turn her clothes into different colors would make the embedding better, right?

floral pollen
#

I did for the d&d thing but I guess I should ask her again

unique cloak
#

photorealistic is another thing imo

#

just look at CivitAI

#

I mean, you could merge her model with some dubious things technically

unique cloak
#

as long as everybody knows what they agree to 👍

floral pollen
#

Okay that is a ton of information(, and making my life so much easier). But I have one more thing I am a bit worried about. Embedding learning rate: 0.005. Yes/No? A dynamic one? Calculated based on input images?

unique cloak
#

well, when in doubt I stay on default values on these but I can help a little

#

The learning rate is the speed at which the model is trained; it represents how much the weights are able to change per step.

#

A higher value will result in a shorter training session. It has some dangerous side effects though:

the training can reach the overtrained state faster.
the training can fail completely and diverge, meaning it gets worse and worse at drawing the pictures you wanted.
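A toy illustration of both effects, using plain gradient descent on the one-dimensional loss f(w) = w² (gradient 2w) instead of a real diffusion model. Each step multiplies w by (1 - 2·lr): a small rate creeps toward the minimum, a bigger one gets there much faster, and any rate above 1.0 overshoots further on every step, so training diverges.

```python
def descend(lr, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final distance |w| from the optimum."""
    for _ in range(steps):
        w -= lr * 2 * w  # the gradient of w**2 is 2w
    return abs(w)


for lr in (0.01, 0.1, 0.45, 1.1):
    print(lr, descend(lr))
# 0.01 barely moves, 0.1 and 0.45 converge quickly, 1.1 blows up (|w| grows every step)
```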

floral pollen
#

meaning I should keep it at base until I find a good section and train on that further with a lower value?

unique cloak
#

dynamic learning rate makes the learning rate evolve in different ways. polynomial, constant, ... depends on the tool. I like polynomial the most, but couldn't argue why exactly

#

not sure about the last question there, "Calculated based on input images?"
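For reference, polynomial decay is commonly defined roughly as below: the rate goes from a start value to an end value following (1 - progress) ** power, so power = 1.0 is a plain linear decay and larger powers drop faster early on. Parameter names and defaults here are illustrative (0.005 is just the embedding rate mentioned above); each tool exposes its own options.

```python
def polynomial_lr(step, total_steps, lr_start=5e-3, lr_end=1e-4, power=1.0):
    """Learning rate at `step` for polynomial decay from lr_start to lr_end over total_steps."""
    progress = min(step, total_steps) / total_steps  # clamp so the rate stays at lr_end afterwards
    return (lr_start - lr_end) * (1 - progress) ** power + lr_end


for step in (0, 1500, 3000):
    print(step, polynomial_lr(step, 3000), polynomial_lr(step, 3000, power=2.0))
```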

floral pollen
#

Based on input image count. The guy in the video divided or multiplied the default value depending on the gradient multiplication
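What the video describes sounds like linear learning-rate scaling: multiply a base rate by the number of images per weight update. As far as I know, the diffusers DreamBooth example script offers this as a `--scale_lr` option (scaling by batch size × gradient accumulation steps × number of GPUs). A sketch of the rule, with a made-up helper name:

```python
def scaled_lr(base_lr, batch_size, grad_accum_steps=1, num_gpus=1):
    """Linear scaling rule: base rate times the number of images per weight update."""
    return base_lr * batch_size * grad_accum_steps * num_gpus


print(scaled_lr(5e-6, batch_size=4))                      # 4x the base rate
print(scaled_lr(5e-6, batch_size=4, grad_accum_steps=2))  # 8x the base rate
```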

#

But I'm not going to use that now. I'm going to test how many pictures I can keep in my vram and use that as the highest, leaving gradient at 1

#

I'm going to make all the adjustments now and let it run. Tomorrow evening I can post the result (the loss graph, not the pictures, I doubt she would be okay with that 😅). It takes around 14 hours for me to train for 3000 steps.

Again thank you so so much for all the help. I learned more in the last few hours than the whole week before that about the topic.

unique cloak
#

yeah, never forget to experiment

floral pollen
#

And maybe I can even share some pipelines since everything is open source anyway

unique cloak
#

and I hope that works for your company too, PMs are open if you need some tips and can't share all publicly

charred sail
#

Hi everyone, I'm desperately trying to train an embedding of myself for a music video. Unfortunately, as soon as I try to switch my model to something other than SD1.5 (let's say RealisticVision), it just looks like an inbred 3rd cousin of mine.

Initialization text: *
Number of vectors per token: I've tried 3 and 8, 8 seemed better

I've trained with a decreasing learning rate up to 24k steps (reaching 0.17 vector strength)

I'm running out of time for this project, so any help is welcome

PS, see my dataset attached below, picture descriptions are basically "a man looking up at something in the sky with a white wall in the background"

river cypress
#

You should be training on the model you want to use

#

I would remove "man" from your captions

#

Replace it with your token

#

A xx looking at etc etc

river cypress
#

You should also tag emotions if you have so many different facial expressions

charred sail
#

Ok thanks, will try this !

sonic narwhal
#

Guizmus keeps dropping gems in this chat 🔥

sonic narwhal
#

What are your thoughts on using LoRAs at full strength? I've heard it's preferable to use them at lower strengths and never go to the full 1

unique cloak
#

it will depend a lot on how strongly the LoRA has been trained, and each can benefit from different values, from what people say
but I use very few LoRAs tbh

oak void
#

Anyone have a nice way of managing their training image tags/descriptions? Having each one in a .txt is getting really painful. Especially when you start copying them around to different folders for resizing, flipping, etc.
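One common workaround (an assumption, not something anyone in the chat confirmed using) is to keep all captions in a single CSV manifest as the source of truth and regenerate the per-image `.txt` files whenever you copy, resize or flip images into a new folder. `export_captions` is a hypothetical helper, and the `file,caption` manifest format is made up for the example.

```python
import csv
import io
import tempfile
from pathlib import Path


def export_captions(manifest, target_dir):
    """Write <stem>.txt for every row of a `file,caption` CSV manifest into target_dir."""
    target = Path(target_dir)
    written = []
    for row in csv.DictReader(manifest):
        stem = Path(row["file"]).stem
        (target / f"{stem}.txt").write_text(row["caption"].strip(), encoding="utf-8")
        written.append(stem)
    return written


manifest = io.StringIO('file,caption\nimg01.png,"man, black shirt"\nimg02.png,man smiling\n')
with tempfile.TemporaryDirectory() as tmp:
    stems = export_captions(manifest, tmp)
    first = (Path(tmp) / "img01.txt").read_text(encoding="utf-8")

print(stems, first)  # ['img01', 'img02'] man, black shirt
```

For resized or flipped copies that keep the same file names in another folder, you would just call it again with that folder as `target_dir`.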

storm rock
#

@unique cloak Finally got some hours to start, and I'm about to create the dataset as we talked about and I suddenly feel overwhelmed. 😂

unique cloak
#

spoiler

#

you are about to make a dataset and a first training

#

and the results are going to be bad

#

but you are about to learn a lot

#

that's how it goes about 95% of the time on a first try

storm rock
#

Mhm. 🙏

unique cloak
#

it's the natural process, don't worry

storm rock
#

Guess I'll start by creating folders named after the filter words and just start pulling stuff into them.

#

and deleting bad ones

unique cloak
#

start by cloning your dataset. new directory, everything, and then organize, delete, ...

#

create a space for yourself to try without risk

storm rock
#

Yeah got a folder called Raw Eyes, which has the 3K+ files.

#

Trying to see if XnView can do a layout where I can have two folders open side by side

storm rock
#

🥹 Aight getting there.

#

So it should be ok if the only real prompt I want viable is the shape of the pupils?

#

So you would be able to do like Eye, Slit Pupil, Orange

unique cloak
#

back sorry

floral pollen
#

@unique cloak Hey there! I hope you had a great day!
For the past 8 hours I've been unsuccessfully trying to set up a GUI that I found through a website, after finding out that the dreambooth training embedded in auto 1111 currently does not save the loss values to TensorBoard 😮‍💨 Would you be so kind as to tell me what you are currently using for training, or what is out there and can be used? So that I don't run around trying to get things to work that don't anymore, like I did until now? 😅 Finding GUIs for training is so much harder than I expected. I'd have no problem with going back to using dreambooth from the bare repository, but if there is a GUI I would like to use it ^^
I'm on Ubuntu now, trying things out, but Windows 10 would not be a problem obviously.

unique cloak
# storm rock 🥹 Aight getting there.

well... it seems like it should work at least.
but like, this is the first time I'm seeing multiple pictures from one of the folders at once. And just to say... this doesn't feel like diversity to me. This feels like a color swap in the 4 I see each time.
if the whole folder is like that, then I would drastically reduce the size of each folder, to something like 2 to 5

unique cloak
#

ok so I use 2 different tools

storm rock
#

Oh yeah, the raw folders have about 40+ in each. Next step should be cherry-picking for the actual training, I believe.

#

the main one I will focus on is the slit and round

#

the others are just extra from sorting.

unique cloak
#

those don't have GUIs...

#

I do include, in my installer, example uses

unique cloak
#

but I know very few with a GUI

unique cloak
#

I'm not sure about the "no pupils" token maybe

#

try "empty pupils" potentially

storm rock
#

oh true

storm rock
#

those are the eyes who are simply just filled with texture, fibers etc.

floral pollen
#

The only thing I need to find out is if that repo saves the loss values, do you know if that is possible by any chance?

unique cloak
#

shivam does

#

the 2 I linked do

#

but I think it's mostly automatic that is the exception here

#

it's stupid not to log loss imo

floral pollen
#

Yeah I feel like it should be in automatic too but it is broken right now, because you can turn it on but nothing happens

#

But that is good to know about the other two repos. I will try those out on a company machine when I can grab one with enough VRAM. My company laptop has a 3080 in it, but that only has 8 GB too

#

Oh and thank you again for the quick help I'll go and see if I can get my examples from back then running again!

floral pollen
#

@unique cloak I got it to work, thank you so much for your help again! The repository that worked for me was the one from Shivam Shrirao with his dreambooth examples + deepspeed. And I even have the loss updating live now, so I can finally work with it properly 💪

charred sail
#

Would you include expressions in your captioning? The dataset I have includes pictures of my subject frowning/smiling/open mouth/closing eyes. Should I mention it in my captioning, or would that prevent my embedding from generating those expressions?

rigid mulch
#

SD 1.5 fine-tuned on NASA images, produces cool-looking galaxies and stars mainly

pure plume
#

Hi,
So I've been playing with TI training, and I see that on my 3090 I can't go above a batch size of 1 or CUDA will crash.
Is there a setting I'm missing somewhere just for this?
I'm doing batch size 1 and 4 gradient accumulation steps, but I'm not really sure if that's what I need to do.

I just wonder how to speed things up

floral pollen
#

So the first one went through tonight. I guess this is a failed attempt right? Or should I continue training this to see if it will go up again?

floral pollen
#

12 pictures running with a training batch size of 4

unique cloak
#

well... that seems like already overtrained, since you'd be at 1000 repeats already

#

but the curve doesn't indicate that

#

it's strange, did you compare the results?

#

it should be overfitting hard at that point imo

#

if I was going only by the graph, I would say you are not trained enough yet lol

#

the descent into low values isn't as fast as it usually is with overtraining

floral pollen
#

I have not tried generating pics with it; I don't know why, but it did not generate any sampling images. I think I'm missing a flag. I'm training a new one right now which is at 700 steps and doing fine, also with a batch size of 4 and 26 images total.

I'm currently doing a "normal" dreambooth training where I can convert it into a model file
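The "1000 repeats" estimate above is simple arithmetic: each image is seen steps × batch size ÷ number of images times. The step count isn't stated for that graph, so the sketch assumes the 3000 steps mentioned earlier; `repeats_per_image` is a made-up helper.

```python
def repeats_per_image(steps, batch_size, num_images, grad_accum_steps=1):
    """How many times each training image has been seen after `steps` weight updates."""
    return steps * batch_size * grad_accum_steps / num_images


print(repeats_per_image(3000, 4, 12))  # 1000.0, first run (assuming 3000 steps)
print(repeats_per_image(700, 4, 26))   # new run so far, roughly 108
```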

unique cloak
#

lol nah

#

really, this would be madness ^^

#

unless you put like 1e-12 Learning rate

#

I don't get how it could still need more training

floral pollen
#

Oh okay xD I guess I should just start from the top then? Simply restart the process?

unique cloak
#

multiple things

#

first, find that flag for pictures

#

it's invaluable to see how it's doing, and it shows them in tensorboard too

#

second, what learning rate do you use?

floral pollen
#

This is how the current graph for the other one looks

unique cloak
#

700 seems potentially good

unique cloak
#

ok. That is a little fast in my personal book (I train really slow), but it's pretty average for what I see going around

#

by the way

#

I found the answer to a question I think you asked recently

#

not sure if it was you though

#

Gradient accumulation steps

unique cloak
# floral pollen Oh yes that was me

Gradient accumulation steps are basically bigger batches, just not processed in parallel.
If you have a batch size of two, the script "looks at" two images and then "learns" from them. It does this at the same time and therefore needs more VRAM, since it's processing both images at once on that step.
Gradient accumulation steps (GAS for short) do the same, but the images are "looked at" one after another. That means it needs two passes to look at the two images: it looks at the first one, holds on to the "learning", then looks at the second one and only then "applies" what it learned.
It's basically the "parallel" and the "series" way of learning.
The parallel way is faster, since it doesn't take an extra pass to learn the two images, but it uses more VRAM. GAS takes longer because it uses extra passes to learn the same amount, but it uses fewer resources.

Since you can combine the two, you can think of it as multiplying: a batch size of 4 and a GAS of 2 looks at 4 images twice before updating, adding up to 8 images learned per step.

#

(thanks @upper prism !)
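A small numeric check of the parallel vs. series picture, using a toy linear model (y = w·x, mean-squared-error loss) in plain Python rather than a real trainer. It assumes, as is common, that each micro-batch gradient is divided by the number of accumulation steps so the accumulated total is an average. Under that assumption, one batch of 4 and two accumulated micro-batches of 2 produce the same update.

```python
def grad_mse(w, batch):
    """Gradient d/dw of mean((w*x - y)**2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_step(w, lr, batch, grad_accum_steps=1):
    """One weight update. The batch is processed as `grad_accum_steps` micro-batches one after
    another, and their gradients are accumulated before the single update.
    Assumes len(batch) is divisible by grad_accum_steps."""
    size = len(batch) // grad_accum_steps
    accumulated = 0.0
    for i in range(grad_accum_steps):
        micro_batch = batch[i * size:(i + 1) * size]
        # divide by the step count so the accumulated sum is an average, like one big batch
        accumulated += grad_mse(w, micro_batch) / grad_accum_steps
    return w - lr * accumulated


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
parallel = train_step(0.5, 0.01, data)                    # batch size 4, GAS 1
series = train_step(0.5, 0.01, data, grad_accum_steps=2)  # batch size 2, GAS 2
print(parallel, series)  # same update, up to float rounding
```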

unique cloak
#

yep

unique cloak
#

I'll add it in my guide in the paragraph on batch size

#

summed up though

floral pollen
#

Okay thank you again, I'm going to dive into the files now to see how I can turn the image generation on ^^

unique cloak
#

in Shivam, right?

#

all options are in the parser at the start

#

or run "python train_dreambooth.py --help"

#

--save_sample_prompt
--save_sample_negative_prompt
--n_save_sample
--save_guidance_scale
--save_infer_steps
all those are for the sample parameters

#

ok

#

so it's active every time you save the model

#

it does pictures as long as you provide at least a "--save_sample_prompt"

#

and will make n_save_sample images each time it saves the model

floral pollen
#

The training looks a lot more solid this time! And even though 3 out of the 4 sample images do not show the person at all, the one image that does looks absolutely incredible already. Only the eyes are a bit scuffed. I'm using 2e-6 as a learning rate now! Again 12 images, batch size 4 💪

oak void
#

anyone know why (token:0) in a prompt would have an effect on the results? Intuitively, I'd think that token's impact should be zeroed out.

#

And the token is still actually having an impact. It's not just affecting padding or something else internally. So (dog:0) will add a dog.

#

Oh, and I'm using automatic1111. So it might be something to do with how it handles prompts.

unique cloak
#

It's possible 0 would be considered an invalid value and just be dropped, treating the ":0" as plain text in the prompt and using the parentheses' default weight, so it would act like ("a dog:0":1.1) (I added the quotes for clarity)

oak void
#

I threw a print() at the end of the parser and it's being included in the array with a weight of 0.

#

so not a parsing issue

#

still trying to follow it from there to see how zero weights are treated

oak void
#

Yeah, it's going all the way through. So I guess the difference is instead of padding tokens of weight 1.0, you're getting your tokens with weight 0. Not sure why those have different results. Interesting mystery for me. Guess I'd need to dig deeper to see what a zero weight token contributes vs a 1.0 weight padding.
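A toy model of why a weight of 0 may not erase a word, assuming (as in A1111's CLIP handling, to my understanding) that prompt weights multiply the text encoder's output vectors, after which everything is rescaled back to the original overall mean. Because the encoder is a transformer, the other tokens' outputs were already computed while "seeing" the zero-weighted word, so its influence survives in its neighbours. The "encoder" below is a stand-in that mixes each token with the average of all tokens, and the numbers are made up.

```python
def mean_of(rows):
    """Mean of every number in a list of vectors."""
    return sum(sum(row) for row in rows) / sum(len(row) for row in rows)


def toy_encoder(vectors):
    """Stand-in for a transformer: each output is half its own token, half the average of all tokens."""
    dims = len(vectors[0])
    avg = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    return [[0.5 * v[d] + 0.5 * avg[d] for d in range(dims)] for v in vectors]


def apply_weights(outputs, weights):
    """Multiply each output vector by its prompt weight, then rescale to the original overall mean."""
    original = mean_of(outputs)
    weighted = [[x * w for x in row] for row, w in zip(outputs, weights)]
    factor = original / mean_of(weighted)  # only breaks if every weight is 0
    return [[x * factor for x in row] for row in weighted]


# Made-up 2-D "embeddings" for the prompt "photo of dog", with dog weighted 0.
photo, of, dog = [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]
with_zero_dog = apply_weights(toy_encoder([photo, of, dog]), [1.0, 1.0, 0.0])
without_dog = toy_encoder([photo, of])

print(with_zero_dog[2])   # [0.0, 0.0]: the dog slot itself is zeroed
print(with_zero_dog[:2])  # roughly [[3.6, 2.4], [2.4, 3.6]]
print(without_dog)        # [[0.75, 0.25], [0.25, 0.75]]: very different
```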

dapper prism
#

How many LoRA iterations would be ideal for 2000+ training images?

stone garden
#

I got 10m images to train guys

#

how can I do it fast and best

true flint
#

Any tips for training a LoRA on outfits? Do I include the face or just the outfit in the dataset? How many epochs for a realistic outfit with 20 images? How many side view/back view images do I have to include?

manic patio
#

When I'm training an embedding, should the images that it creates after so many epochs be really good representations of the input subject matter or person?

#

I'm not sure how close I should be looking for these epoch sample images to be to the desired result since there's nothing else driving these sample photos like a model

oak void
#

@manic patio not really. That'd be a bit on the overtrained side. If you've got a basic description, like "man with his dog", you wouldn't expect the sample image to closely resemble an input. If that happens, you've lost flexibility. Suddenly all dogs are the same breed, all backgrounds are that trail, etc etc.

#

@manic patio I like to use the "use text2img prompt" with a novel prompt using the embed when training. Set a fixed seed that helps you judge the progress of your embed.

#

Then you'll have a better idea if it's getting closer to what you want your results to be.

manic patio
#

So like this is pretty close to the likeness of the input person, but with some very exaggerated features like the chin and the skin texture

#

Is this actually a good result if the initialization text is just "woman with blue eyes"

oak void
#

Yeah, it looks a little overcooked to me. Maybe your learning rate is too high or you have too many vectors.

#

Another thing you can do is reduce the weight of the embedding

#

like "photo of (johnsmith:0.5)"

#

or maybe more like "photo of a man (johnsmith:0.5)"

unique cloak
#

that feels quite burned, yeah, same feeling for me

unique cloak
#

Textual inversion?

#

ok

manic patio
#

Yeah

unique cloak
#

yeah, a few more vectors, a few fewer steps
Also, prompting your TI token + still describing the person can help the final quality

#

you may have some savepoints that happened automatically at less steps though

manic patio
#

You think increasing the vectors would help here?

oak void
#

Maybe more, maybe less. Gotta experiment I think. For me going too high starts capturing way more than I want, like backgrounds.

#

But idk I'm still kinda new to this.

unique cloak
#

increasing the vectors will let the TI store more weights, so potentially better quality.
But overtraining can be fought in lots of ways

#

regularisation, more diverse dataset, less training steps, ...

manic patio
#

Are you all using the:

#

standard training tab for this?

#

or is there a better extension?

oak void
#

That's how I do TIs, yeah

unique cloak
#

it's one of the 2 main ways people do TI yeah

manic patio
#

Okay cool

oak void
#

Certainly other ways to train if you're doing LORAs or new models

manic patio
#

Thank you both for the guidance here

oak void
#

Good luck

unique cloak
#

TI is even available in the diffusers library now

#

no problem, good luck to you !

#

(I'll plug my guide, not specific to TI, but for all types of trainings)

manic patio
#

Oh and one last question.. I feel like it's going to be a stupid question, but I never see anyone state it in tutorials.

#

Is the model you have loaded into WebUI at the time the model that it is training on?

unique cloak
#

quite sure it is yes

manic patio
#

cool

unique cloak
#

I don't see any "model" selector in the Train tab

#

so it must be

oak void
#

@manic patio also when you create an embedding, the * default text seems to have some meaning. It seems to slightly cook my results. If you delete that and leave it empty, you get a 'clean' zeroed embedding to start with. I'm not sure what the default * represents but it's not nothing.

manic patio
#

like "woman" or "woman with blue eyes" for this training session

#

Should I avoid that?

oak void
#

I think that's generally a good starting point. But I'm not sure how it works. Might depend on how many vectors you have, etc.

manic patio
#

It's honestly huge.. The thing that's been throwing me off the most with this process was not having any idea of what a "control" is supposed to be.. like what even is a baseline result.

#

Seeing it change something that I'm already familiar with helps track the progress in a pretty big way, IMO

oak void
#

It's honestly pretty badly positioned in the UI. And the description is poor. I have trouble finding it when I know what it is and where it is.

#

I also wrote a script yesterday to generate 11 images by adjusting the weight of a string in steps of 0.1. So if your prompt is "a photo of mytoken riding a bike" and mytoken is the replacement string, you'll see it go from "a photo of riding a bike" all the way to "a photo of (mytoken:1.0) riding a bike"

#

So I can visualize what some part of my prompt is contributing
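A minimal sketch of the prompt side of such a sweep (not oak void's actual script, which wasn't posted): it builds the 11 prompts from weight 0.0 to 1.0 in A1111's `(text:weight)` syntax, dropping the token entirely at 0.0. Generating the images with a fixed seed is left out, and `weight_sweep` is a hypothetical name.

```python
def weight_sweep(prompt, target, steps=10):
    """Return steps+1 prompts with `target` weighted from 0.0 up to 1.0 in (text:weight) syntax."""
    prompts = []
    for i in range(steps + 1):
        if i == 0:
            replacement = ""  # weight 0: remove the token entirely
        else:
            replacement = f"({target}:{i / steps:.1f})"  # one decimal, fine for steps=10
        prompts.append(" ".join(prompt.replace(target, replacement).split()))  # tidy double spaces
    return prompts


sweep = weight_sweep("a photo of mytoken riding a bike", "mytoken")
print(sweep[0])   # a photo of riding a bike
print(sweep[5])   # a photo of (mytoken:0.5) riding a bike
print(sweep[-1])  # a photo of (mytoken:1.0) riding a bike
```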

manic patio
#

Oh that's super helpful

#

Are you releasing that as an extension or is there somewhere I can grab that script to test it?

oak void
#

Oh I can just send it to you.

#

I haven't figured out how to package it yet

manic patio
#

Sick, if you don't want to post it publicly, feel free to DM me at your convenience

oak void
#

Yeah just sent it.

#

nothing secret about it. It's just not something I'd consider shippable at this point.

#

Oh, and I like to use controlnet's openpose to further restrict samples. Cause otherwise you might see the poses snap to different positions, making it hard to compare progress.

#

Usually I just generate a baseline, then use that as the controlnet input.

unique cloak
#

You can do that with the "X/Y/Z plot" script too, using "Prompt S/R" to replace part of your prompt with different values. It's quite a nice tool tbh, I use almost only that to stress test models and weights
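A minimal sketch of what Prompt S/R does for each cell of the grid, as we understand it: the first value in the list is the search term, and each value (the first one included) is substituted for it, so the first cell is always the unchanged prompt. `prompt_sr` is a hypothetical name; this mimics the behavior and is not A1111's code.

```python
def prompt_sr(prompt: str, values: list[str]) -> list[str]:
    """One prompt per value: values[0] is searched for, every value replaces it."""
    search = values[0]
    if search not in prompt:
        raise ValueError(f"search term {search!r} not found in prompt")
    return [prompt.replace(search, v) for v in values]


# In the UI the values would be entered as a comma-separated list: 1.0, 0.5, 0.0
for p in prompt_sr("a photo of (mytoken:1.0) riding a bike", ["1.0", "0.5", "0.0"]):
    print(p)
```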

manic patio
#

How many epochs on average would you say it takes you to train a face as a TI?

#

Does it take you 5k - 10k steps like some tutorials claim?

oak void
#

@unique cloak ah maybe that's what I was trying to find. I couldn't find something to do it so just threw my own together in a few minutes haha

#

@manic patio I think my best results have been in the 3000-5000 range. But I'll see obvious progress within a few hundred, especially if I'm using a learning rate that starts higher

#

But if you're using a larger batch size, it's going to be different too.

manic patio
#

0.03:100, 0.008:300, 0.005 This is the current learning rate I've got stuck in here
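For reference, A1111's learning-rate field accepts a schedule written as `rate:step` pairs with an optional final open-ended rate, so the value above reads as roughly 0.03 until step 100, 0.008 until step 300, then 0.005 for the rest of the run. A small parser to sanity-check such a string, assuming a rate stays in effect while the current step is below its boundary (the exact off-by-one at each boundary is an assumption, and the function names are hypothetical):

```python
def parse_schedule(spec: str) -> list[tuple[float, float]]:
    """Parse 'rate:step, rate:step, rate' into (rate, end_step) pairs.

    An entry without ':step' is open-ended (end_step = infinity).
    """
    pairs = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            rate, end = part.split(":")
            pairs.append((float(rate), float(int(end))))
        else:
            pairs.append((float(part), float("inf")))
    return pairs


def lr_at(spec: str, step: int) -> float:
    """Learning rate in effect at `step` (a rate applies while step < end_step)."""
    pairs = parse_schedule(spec)
    for rate, end in pairs:
        if step < end:
            return rate
    # Past the last boundary with no open-ended entry: keep the last rate.
    return pairs[-1][0]


for s in (0, 150, 1000):
    print(s, lr_at("0.03:100, 0.008:300, 0.005", s))
```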

#

12 images with batch size at 6 and GA steps at 2
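A quick way to see what those numbers mean per step, assuming the reported step counter counts optimizer updates (trainers differ on this, so check yours). With 12 images, batch size 6 and GA 2, each step already covers all 12 images, i.e. one full epoch per step. The helper names are hypothetical.

```python
import math


def samples_per_step(batch_size: int, grad_accum: int) -> int:
    """Images that contribute to one optimizer update."""
    return batch_size * grad_accum


def steps_per_epoch(num_images: int, batch_size: int, grad_accum: int) -> int:
    """Optimizer updates needed to go through every image once."""
    return math.ceil(num_images / samples_per_step(batch_size, grad_accum))


print(samples_per_step(6, 2), steps_per_epoch(12, 6, 2))  # 12 1
```

Under that assumption, 3000 steps at this setting would mean every image was seen about 3000 times.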

oak void
#

0.03 is very strong. You're going to see the effects very quickly.

#

I'd set GA to 1, no point in doing that.

manic patio
#

Yeah, it was explained to me previously that I should start with an aggressive learning rate and then quickly bring it down to something near 0.005

oak void
#

I'll go down way under that, 0.0005 and 0.00001 if I want to run it overnight.

manic patio
#

But I also can't find where anyone on youtube is showing their intermediate pictures when the 'checkpoint' embeddings are created at whatever interval you choose

oak void
#

yeah. I've been creating an embedding, run for 100 steps, then evaluate it.

#

Find a good starting point.

#

Then run it to 300-500 with the next step

#

If it starts getting overcooked, I back those off.

manic patio
#

I think I'm going to take that approach.

oak void
#

Cause 100 steps is pretty quick. I don't mind repeating that a few times to avoid ruining the entire thing right at the start.

manic patio
#

It's so frustrating too, because everyone doing youtube tutorials thinks I need to know how to install WebUI and how to copy a path into a field.. then they conveniently don't show anything but the end result like that's somehow helpful.

#

The important info is quite literally what gets omitted

royal island
#

Can anyone help me understand the distinction, if any, between training a few dozen images of a specific person or style and what it means when these huge models on civitai say that they're trained on hundreds of thousands of images? Is that a completely different thing or are they actually just opening dreambooth and pointing it at a giant folder? Is this a feasible thing to do for one person with a top of the line GPU to just let happen while they sleep?

It just seems that every tutorial I can find on training is talking about little things like training a face with a dozen or so pictures, nothing about these massive datasets and how to make a unique custom model

oak void
#

The style and such options refer to prompt templates in the textual_inversion_templates folder

#

I made my own template for training people since none of the existing options seemed particularly good for that.

#

Textual Inversion is not creating a full model. It's creating a tiny embedding (a few token vectors) that sits on top of an existing model and tweaks a few things. So it's good for adding a person's likeness.
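To put "tiny" into numbers: a TI embedding is only `num_vectors` vectors, each as wide as the text encoder's token embeddings (768 for SD 1.x's CLIP ViT-L/14, 1024 for SD 2.x's OpenCLIP ViT-H/14). The sketch below computes the raw tensor size, assuming fp32 storage and ignoring file-format overhead; a full checkpoint, by contrast, is measured in gigabytes. `embedding_bytes` is a hypothetical helper.

```python
def embedding_bytes(num_vectors: int, dim: int = 768, bytes_per_value: int = 4) -> int:
    """Raw tensor size of a textual-inversion embedding (num_vectors x dim floats)."""
    return num_vectors * dim * bytes_per_value


for n in (1, 8):
    print(n, "vector(s):", embedding_bytes(n), "bytes")  # SD 1.x width (768), fp32
```

So an 8-vector SD 1.x embedding comes out at 24,576 bytes (24 KiB) of actual data.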

#

So yeah, those hundreds of thousands of images models are quite a bit different in training.

#

You have other options besides TIs. But TIs are a good place to start. Figure them out, then look into LoRAs.

#

And yeah, I think a midrange GPU should be able to train a basic TI. It's just a little slower and you need to be more careful about hitting vram limits.

#

If it's too frustrating, just pay a bit to rent some compute somewhere.

royal island
#

When you say quite a bit different do you mean they aren't done using say, the A1111 dreambooth tab? I'm sorta familiar with textual inversion and have done a few successfully already, and I did try a LORA once and got not so great results- but I'm really curious how these massive models are done, how people go about training whole checkpoints/safetensors with 100k+ images

#

and if that's a different process entirely than using any of the options available within A1111

oak void
#

They probably use command line scripts for larger scale jobs. I don't know what the process there is. I'm not sure if they split the workloads between machines, etc. Not sure how people are doing it specifically.

royal island
#

hmm ok, thank you. Yeah I see the openjourney devs offer a course on it but it's 200 bucks and I'm a bit skeptical because I get pretty mixed results using that model

oak void
#

It's probably great if you're a manager and want to get your team trained up on it.

pure plume
#

any reason for lora to give this muddy feeling? (first lora and all that)

floral pollen
#

I'm sorry for this probably stupid question, but is this training failing? The samples look a bit fucked, but looking at the progress of the sample pictures throughout the training process, it seems really hard for the training to actually change anything, as if it needs more time. I'm using a batch size of 4, with a learning rate of 2e-6 and a polynomial scheduler. 144 class images.

unique cloak
#

If the sample pictures have barely changed over the last few thousand steps, it's overtrained, yeah

floral pollen
# unique cloak Given your numbers, i would say the good point was on the local low around 1750 ...

Okay, I guess I'll definitely stop it then. I have to test the 1700 checkpoint by creating a model from it, since I simply put the name of the instance into the sample prompt... which led to cleaning products being shown, because part of the name was "cleaned" 😅 So the sample images are really not helpful. I'm going to write some actual prompts for this before I start again, so they're more useful next time.

I think your calculation is correct as it is, but I'm using gradient accumulation steps at 7, since I'm currently training on 28 pictures, meaning the training should be at 7000 * 7 = 49000 pictures, right?

#

At that point I mean

#

Okay I'll stop it then, and retry with actual sample prompts

unique cloak
#

It would depend on whether you used scale_lr, but usually yeah, it would explain a lot

#

You should be able to stop imo

#

But yeah, taking the time to set up prompts for correct debugging is important

floral pollen
#

Or how does it scale that..

#

And I guess going over 3000 training steps is overkill even with a learning rate of 2e-6, correct?

unique cloak
# floral pollen I do not, what does that do?

It should be on by default, I think. It makes your learning rate scale with batch size. If it's off, then having a batch size of 2 would mean training twice as slowly on each pic.

#

And when I say batch size, I mean batch size multiplied by GAS
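What scale_lr does can be written out in one line. In the diffusers DreamBooth example script, `--scale_lr` multiplies the learning rate by gradient accumulation steps, per-device batch size and number of processes; there it is an opt-in flag, so whether it is on by default depends on the trainer you use. `scaled_lr` is a hypothetical helper mirroring that multiplication:

```python
def scaled_lr(base_lr: float, batch_size: int, grad_accum: int, num_processes: int = 1) -> float:
    """Learning rate actually used when scale_lr is on.

    Mirrors the multiplication in diffusers' DreamBooth example:
    lr * gradient_accumulation_steps * train_batch_size * num_processes.
    """
    return base_lr * grad_accum * batch_size * num_processes


print(scaled_lr(1e-6, batch_size=2, grad_accum=5))  # total batch 10 -> ~1e-05
print(scaled_lr(2e-6, batch_size=4, grad_accum=7))  # total batch 28 -> ~5.6e-05
```

With a total batch of 10, a base LR of 1e-6 becomes 1e-5; with batch 4 and GA 7 (a total of 28), 2e-6 becomes 5.6e-5, which shows why the setting gets risky with higher GAS and batch sizes.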

floral pollen
#

If that makes it slower I mean, or do I want that?

unique cloak
#

You can try both. Both are interesting in their own way, but yes, if you have it on and have a total batch of 10, you'll speed up the training quite a lot

#

From what I've seen (but this is more speculation than not), I personally tend to pick up the fine details better at a lower LR

floral pollen
unique cloak
#

Well it would mean 10 times shorter in that example

floral pollen
unique cloak
#

It's a dangerous parameter when you use higher GAS and batch size

#

Because a high learning rate can also make your training diverge completely

floral pollen
#

Ah okay, since you're using something around 2e-6 too, if I remember correctly: is 3000 a good point to end, or should I raise the number of training steps? The default was 5e-6, and there the max training steps were 3000.

floral pollen
unique cloak
# floral pollen Ah okay, since you are using something around 2e-6 too if I remember correctly: ...

Going to 1e-5 is also still ok, but I never push higher.
3000 steps is an arbitrary value without context, i.e. without knowing what dataset size they were recommending it for, imo.
3000 steps at 5e-6 seems high for 50 to 100 pictures, so I think it was calibrated mostly for style training.
My value is either 1e-6 or 2e-6. I go back and forth between them, sometimes even starting at 2e-6 and lowering to 1e-6 in a second phase of the training

floral pollen
#

Ah okay thank you so much that helps a lot, I haven't had a single model which was worth training in a second step, I hope the next one will bring good enough results for it 💪

unique cloak
#

Well, it really depends on what I'm doing. Usually I don't do two phases. When I've already done a pass on a dataset and know approximately when it should be cooked, then if I need to retrain from scratch, I train as usual at the start, stop a little before the right step, and start a second training from that point at a lower LR

#

I see it as a bucket that I fill with water, and the first time I don't really know where the top is

#

I reduce the incoming water when I know I'm close

#

Brb

floral pollen
#

Wait okay, does that mean when I train it with the same seed and the exact same parameters the training "path" it will take will also be the same?
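Roughly, yes, within limits: in typical trainers the seed fixes the pseudo-random choices (data order, noise, timesteps), so the same seed and settings should replay the same "path". A tiny illustration of the data-order part, using Python's standard `random` module rather than any trainer's actual code (`epoch_orders` is a hypothetical name). One caveat: some GPU kernels are nondeterministic, so results can still drift slightly between runs.

```python
import random


def epoch_orders(image_ids: list[str], seed: int, epochs: int) -> list[list[str]]:
    """Shuffle the dataset once per epoch, drawing from one seeded RNG."""
    rng = random.Random(seed)
    orders = []
    for _ in range(epochs):
        order = list(image_ids)
        rng.shuffle(order)
        orders.append(order)
    return orders


ids = [f"img_{i:02d}" for i in range(8)]
# Same seed, same settings -> identical order for every epoch.
assert epoch_orders(ids, seed=1234, epochs=3) == epoch_orders(ids, seed=1234, epochs=3)
```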

#

Just answered my own question by looking into the argument descriptions, god damn that's useful 0.0

unique cloak
#

Sorry making food at the same time

floral pollen
#

Oh btw, I had a meeting today with my boss about AI and stuff, and we're going to start talks about implementing an AI-driven workflow for some of our work, combining ChatGPT with Stable Diffusion. I'm probably not going to be allowed to share a lot of the code, but if I end up with some principles and hints for setting up something like that, I'm definitely going to share them

floral pollen
unique cloak
floral pollen
#

Yeah same, I'll be so glad when I can finally get him away from Midjourney xD I explained ControlNet to him and that's why he wants to switch now, and the whole company too

unique cloak
#

As long as you share enough for us to be able to help

floral pollen
# unique cloak As long as you share enough for us to be able to help

Step 1 will be the integration of ChatGPT and Stable Diffusion to help our artists; step 2 is fast embedding training for moods and other stuff so we can generate better. I guess the second one will be the most interesting to you guys, since I'll need to automate a ton of stuff around loss values. We'll have queues running 24/7, and we can't look at the results at midnight to say that one is finished. That will definitely be an interesting task
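A sketch of one way to automate the "is it finished?" call from logged loss values: compare windowed averages and flag a run once it stops improving. Names and thresholds here are hypothetical, and it's worth hedging heavily: diffusion training loss is very noisy and often tracks image quality poorly (which is why this channel leans on sample images), so this is better used to flag runs for review than as a final verdict.

```python
from statistics import mean


def should_stop(losses: list[float], window: int = 50, patience: int = 5,
                min_delta: float = 1e-4) -> bool:
    """True once the windowed mean loss has failed to improve by at least
    `min_delta` for `patience` consecutive windows."""
    n_windows = len(losses) // window
    if n_windows < patience + 1:
        return False  # not enough history to judge yet
    means = [mean(losses[i * window:(i + 1) * window]) for i in range(n_windows)]
    best = means[0]
    stale = 0
    for m in means[1:]:
        if m < best - min_delta:
            best = m
            stale = 0
        else:
            stale += 1
    return stale >= patience
```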

#

No new technology, yet. But a "production" ready setup with results having a lot of influence on our daily business

floral pollen
#

@unique cloak Hey there, sorry for the direct ping. Since I'm pretty close to your setup right now, could you tell me how many regularization images you're using? I found people online saying that a paper gave a calculation for the number of regularization images, namely instanceImages (dataset) * 200. That seems a bit excessive to me, but I can't find any other calculations or information about the amount other than 6-month-old threads and some articles that reference the same. Can you confirm that it should be that high, or is that outdated by now?

unique cloak
#

when I used Shivam's repo, about 100 times the size of the dataset was my baseline

#

more is better, but also having good pictures is important

#

so finding the good balance is hard

floral pollen
unique cloak
#

good regularisation pictures

#

always good dataset for sure

#

but reg gets trained too

#

so having better reg means also bettering your model

floral pollen
# unique cloak but reg gets trained too

Oh okay, but which pictures should I delete or leave out of my reg images then? Distorted pictures, pictures that are painted and don't have a lot of detail, or what are the criteria there?

floral pollen
unique cloak
#

if you can, the best thing you can do is go and scrape LAION-5B

#

it's not that hard but yeah, something else to learn

#

but if you can't, then yes: generate them with enough steps for the pics to be relatively correct, and spend a minute or two going through the folder to delete the really bad ones

floral pollen
unique cloak
#

it's always one, but yeah, data in the style of your base model is better as reg

floral pollen
dapper prism
#

Is there a simple way to strip some of the saved training information from a LoRA model?
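If the LoRA is a `.safetensors` file, the training information lives in the header: kohya_ss writes its settings as `ss_*` string entries under the `__metadata__` key. The file layout is an 8-byte little-endian header length, a JSON header, then the raw tensor bytes, so the metadata can be dropped with nothing but the standard library. A minimal sketch (function names are hypothetical; keep a backup of the original file):

```python
import json
import struct


def strip_metadata(blob: bytes) -> tuple[bytes, dict]:
    """Remove the "__metadata__" entry from .safetensors bytes.

    Tensor "data_offsets" are relative to the start of the tensor bytes,
    so they stay valid when the header changes size.
    """
    (n,) = struct.unpack("<Q", blob[:8])  # header length, u64 little-endian
    header = json.loads(blob[8:8 + n].decode("utf-8"))
    removed = header.pop("__metadata__", {})
    new_header = json.dumps(header, separators=(",", ":")).encode("utf-8")
    new_header += b" " * (-len(new_header) % 8)  # pad with spaces to 8-byte alignment
    return struct.pack("<Q", len(new_header)) + new_header + blob[8 + n:], removed


def strip_metadata_file(src: str, dst: str) -> dict:
    """Write a metadata-free copy of `src` to `dst`; return what was removed."""
    with open(src, "rb") as f:
        out, removed = strip_metadata(f.read())
    with open(dst, "wb") as f:
        f.write(out)
    return removed
```

The `safetensors` package can do the same by loading the tensors and re-saving them with its `save_file(tensors, filename, metadata=...)` (e.g. from `safetensors.torch`); the byte-level version above just avoids needing torch.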

gloomy sierra
#

are there any character restrictions when captioning that would strip symbols?
ie starting to train the concept of 1boy@left, 1girl@right, or 1girl@front, 2girl@back etc?
I just want to make sure that the @ character is not illegal / stripped during consumption

dapper prism
finite creek
hexed bloom
#

Does anyone have any idea what sort of limit there is on the number of images when fine-tuning with Dreambooth? I've had good luck with 15,000, and now I'm going for 25,000. What would the theoretical upper limit be? In the millions maybe?

hot breach
#

I've done hundreds of thousands with ED2, main limit is speed/time

hexed bloom
unique cloak
vast dome
#

Guys, I noticed the quality of my model degrades on other things as I fine-tune more and more on a certain subject

#

Is there no way to "teach" a new concept instead of "overriding old concepts with new concepts"?

unique cloak
#

Nope, this is the nature of training currently. Models don't grow in size during training; they learn by removing, too

vast dome
#

That's sad. I wonder if in the future they'll give us a way to grow the model

#

I had someone saying DONT USE DREAMBOOTH!!! USE stabletuner or everydream!!!!!

#

Why do you think they did that?

#

Are they just an idiot... or do they have a reason? I couldn't ask their reasoning as they were quite provocative...

#

(staring at chat intensely)

hexed bloom
hexed bloom
vast dome
#

I see

hexed bloom
#

Use whatever works for you best

#

And maybe try them all to experience the differences yourself

floral pollen
#

I think I finally managed to get something that looks like actual overtraining, which means it definitely was an issue with the low number of class images.

Due to the now pretty high number of class images, I had to reduce some of my settings to be able to train at all because of VRAM overload.

I've run two models since yesterday. Both started to overtrain at around 650, which means I'll cap the max steps at 1500 next time so I can train two back to back.

It took me almost a day to generate around 700 class images, so I reduced the number of instance images for these tests to:

Dataset 2: 8 images, with batch size 2 and gradient accumulation steps 4

The training results are still not satisfactory though, and I guess a huge problem is that the number of instance/dataset images is still really low. Which means I need to generate more class images to be able to actually train larger datasets. And there's one question I've been asking myself about that.

I don't know if it was yesterday or the day before when I asked about datasets already, but I came to the conclusion that for the workflow I'm working on, I need to generate the class images myself. But what kind of prompt should I use?
Following tutorials, I would usually simply use "man" as the class word. That way a lot of drawings etc. also get generated; is that okay to have in the class images? Or does it make more sense to use a prompt like "A photo of a man", because I'm training on real-life pictures and my class pictures should therefore be as close to reality as possible?

If there is anything else wrong with what I've done feel free to point it out btw 😅
#

(I generated the class images with sampling steps set to 100, that's why it took so long. I read that it improves quality when training faces)

hexed bloom
floral pollen
# hexed bloom Can you not source more dataset images? That way you can do away with class/reg ...

I have around 20 more for each dataset; I pretty much just selected the best ones. But I want the best quality in general, so from what I've learned, dropping the class images doesn't seem like an option?
Which is why I want to generate more class images, but that's where my question about class image prompts comes from: should I use class images that are as close to my dataset images as possible, or should I simply generate pictures that look like a "man"?

#

Do you know the answer to that maybe @unique cloak ?

hexed bloom
floral pollen
hexed bloom
#

As long as your dataset images are different from one another, with various backgrounds, poses, whatever, they'll be great to use

floral pollen
#

Because I only have around 20 to 30 per dataset

hexed bloom
#

Per subject, I would say 20-50, but of course, the more varied the better

unique cloak
#

in your case, given what you're saying, I'm not sure the regularization is the problem

floral pollen
#

I've created a model from step 600, before the downward curve, and when using it, it feels like at that point it was definitely not trained enough

unique cloak
#

yes so

#

you reach an overtrained state, that's for sure

#

adding regularisation can smooth that a little but that is not the main tool to fight it

#

if your concept gets overcooked before it reaches good enough quality, the problem is diversity

#

your 8 pictures aren't enough to represent the level of quality you want, or are too similar

#

you need to add more pictures in more settings if you can

unique cloak
#

I got to go for now, I'll answer more later, sorry

floral pollen
#

I can, but right now it's limited by how many class images I have to use per dataset image: that would allow me to use 16 to 20 dataset images at 100 class images per picture. Or should I follow Rodrigo's advice and simply not use class images at all, so I can use like 30 pictures or so?

unique cloak
#

I do that to you a lot, but our timezones seem to cause that ^^ sorry

floral pollen
#

All good mate thank you for the help either way, got more to try out now ^^ :D

hexed bloom
floral pollen
hexed bloom
#

Yeah the image captioning!

floral pollen
floral pollen
# hexed bloom Yeah the image captioning!

Just to make sure, that means a prompt like this is definitely not enough, right?

"a photo of a ((adriancuone man))"

What should I add or not add to it? Should I describe the hair color for example?

hexed bloom
#

Then to generate I would: a photograph of a woman wearing red sunglasses

#

The captions you use, will dictate how you use your model, and same the other way around, how you use the model, dictates how you should caption

floral pollen
# hexed bloom It all depends how you want to use your model, personally I go for more descript...

Oh wait, you're talking about captions for every image in the dataset, correct? I've done that via Stable Diffusion's UI before but never for Dreambooth per se. I've always only used the instance_prompt flag. I simply want to train one person's face in, not a whole concept; should I still describe the images that way then? I thought that when I'm literally just training a face, I could use one keyword, and if the dataset is diverse enough, the rest gets "exchanged" and only the face is always there

hexed bloom
#

But captions for individual diverse images will work best, as you'll be training multiple vectors on it, making it easier to bring out that face, as it will 'overtake' more words, if that makes any sense

#

Think of it like you're corrupting the original data with your own; if you have different but similar enough captions for each image, you'll be corrupting more vectors than if you used the same caption/words

floral pollen
hot breach
#

if you're training a character and you don't wish to change their outfit/style then you don't really need to caption the outfit/style, it will get baked into the character's name

hexed bloom
#

If you're going to be generating images with those prompts, it would definitely help to bring out those specific frames

hot breach
#

if you have a character with training images of many outfits, it might make sense to caption the outfits

hexed bloom
#

If you're training a character with different hair styles, then you could do the hair thing for sure

floral pollen
#

Okay great that cleared up a lot in my head! Thanks! How does dreambooth actually use the captions or where do I need to put them to associate them with the images?

hot breach
#

I was able to find like fan art and stuff and I mix it in at small amounts, the stickers/anime/plush doll/cosplayman stuff helps improve styling at inference, attached are some samples of "alternate styles" and the "main training images"

hexed bloom
#

In dreambooth you use a .txt file with the same filename as the image within the same directory

#

But I think you can also use the filename as the caption

#

I haven't tried it that way
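A sketch of the sidecar convention described above: for each image, read a `.txt` with the same base name from the same folder. The filename fallback (underscores turned into spaces) is a hypothetical convention for illustration; whether and how a given trainer uses filenames as captions varies, so check its docs. `load_captions` and `IMAGE_EXTS` are hypothetical names.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def load_captions(folder: str) -> dict[str, str]:
    """Map each image filename to its caption.

    Prefers a sidecar .txt with the same stem (photo1.png -> photo1.txt);
    otherwise falls back to the filename stem with underscores as spaces.
    """
    captions = {}
    for img in sorted(Path(folder).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skips the .txt files themselves and anything else
        txt = img.with_suffix(".txt")
        if txt.exists():
            captions[img.name] = txt.read_text(encoding="utf-8").strip()
        else:
            captions[img.name] = img.stem.replace("_", " ")
    return captions


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "img_001.png").write_bytes(b"")
        Path(d, "img_001.txt").write_text("adriancuone, a man on a sunny day", encoding="utf-8")
        Path(d, "adriancuone_smiling.jpg").write_bytes(b"")
        print(load_captions(d))
```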

floral pollen
floral pollen
hot breach
#

I use a feature in ED2 to reduce how many times the alternate stuff actually gets trained, so even with 60 images of like sketches and anime etc it only picks a few of them every epoch and focuses on the main training images
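The general idea behind down-weighting a folder can be sketched like this: each group gets a multiplier, the whole part repeats every image, and the fractional part randomly picks that share of the group for the current epoch. This is an illustration of the concept, not ED2's implementation; `build_epoch` and the group layout are hypothetical.

```python
import random


def build_epoch(groups: dict[str, tuple[list[str], float]], rng: random.Random) -> list[str]:
    """Assemble one epoch's image list.

    `groups` maps a name to (images, multiplier). Every image is repeated
    int(multiplier) times; the fractional part picks that share of the
    group at random, once more (so 0.1 means ~10% of the group per epoch).
    """
    epoch = []
    for images, mult in groups.values():
        whole = int(mult)
        frac = mult - whole
        epoch.extend(images * whole)
        extra = round(len(images) * frac)
        epoch.extend(rng.sample(images, extra))
    rng.shuffle(epoch)
    return epoch
```

With 20 main images at 1.0 and 60 alternate-style images at 0.1, each epoch contains all 20 main images plus 6 randomly chosen alternates.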

floral pollen
#

Does Dreambooth also understand what I mean if I add different lighting descriptions? I have some images that are a bit blown out, but if I add a caption like "on a sunny day", that should probably help separate that "style" so I can put it in a negative prompt later, right?

floral pollen
hot breach
hexed bloom
floral pollen
hot breach
#

you don't necessarily need to use rare tokens, they're painful to remember, if you use a full proper name it will learn

hexed bloom
#

It's why some examples use sks; it's one of the rare tokens, but honestly, just caption it how you would naturally use it

hot breach
#

i.e. I trained the above character as "cloud strife" not a rare token, it works perfectly fine, and "with clouds in the sky" does not suddenly start drawing pictures of my character floating in the air

#

yeah, if you have to remember some special token, it's... too painful to actually use it later without looking at a cheat sheet

hexed bloom
#

The idea is that these models should be learning from real-world usage of language/words, rather than trying to fit data into unused tokens; it'll just be a bigger headache for you in the long term

#

Although it works

hot breach
#

yes the text encoder is pretty smart

hexed bloom
#

Remember, the original models have been trained on millions of images

floral pollen
#

Okay wow that is so cool.

hexed bloom
#

Your hundreds or even thousands of images won't destroy the model just because you're using words its been trained on already

#

As Freon mentioned, the text encoder is smart enough to realize the order of your words (tokens) and make the associations you want from them

floral pollen
hexed bloom
hot breach
#

yeah that's pure pain

hexed bloom
#

Before you know it you'll have hundreds of nonsense words associated and whenever you want to render something, you'll have to look it up, and you'll be more focused on using the right tokens rather than creating images

floral pollen
#

Oh okay xD Well since I'm only training one character right now it should be fine. I'm just going to try it out. And I'm not going to train several characters into one model at the moment anyway, I always use a new one

floral pollen
hexed bloom
#

Also remember that if I were to train a model on my cat, for example, using the caption "cat" would be positive, as the model would have context for what I'm doing, rather than training something completely from scratch

#

cat already looks like a cat, but sks would look like some random stuff, making it more difficult to properly train on

hot breach
#

sks is a rifle

#

it's not even random

hexed bloom
#

lol! i havent actually tried that

#

so yeah

hot breach
#

it's an old Soviet rifle from just after WW2

hexed bloom
#

you'd be transforming a rifle into a cat, which would be more difficult than a cat to my own cat

floral pollen
# hexed bloom Also remember, that if I was to train a model on my cat for example, using the c...

Okay, but since I'm training in a new character, I should probably use something a bit unique, right? Since it has nothing to do with anything but "people" or "man". And since I don't want to train myself over every "person" or "man", I should probably not use something like "adriancuone man" in my caption, right? Or does that actually help with generation, since I won't be able to overwrite "man" anyway, given how many other images have already been trained on that word?

hexed bloom
#

I use Danny Devito for him, for example

floral pollen
hexed bloom
#

Yeah I would use it

#

Danny Devito, a man holding an egg

floral pollen
#

Okay then I'll do that!

hot breach
#

only trick I think is characters with like short single word/syllable names that are common words, it will be harder to train that

hexed bloom
#

You could also just do "Danny Devito holding an egg"; it's just that I use BLIP-2 to caption automatically, so most of the time characters are referred to as man/woman, and I add his name at the front
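Both variants are easy to apply in bulk to auto-generated captions. A hypothetical helper, assuming captions are plain strings like the ones BLIP-style captioners produce:

```python
def name_caption(caption: str, name: str, replace: str = "") -> str:
    """Add a subject's name to an auto-generated caption.

    If `replace` is given and the caption starts with it (e.g. "a man"),
    the name takes its place; otherwise the name is prepended with a comma.
    """
    caption = caption.strip()
    if replace and caption.startswith(replace):
        return name + caption[len(replace):]
    if caption.lower().startswith(name.lower()):
        return caption  # already named, don't add it twice
    return f"{name}, {caption}"


print(name_caption("a man holding an egg", "Danny Devito"))
print(name_caption("a man holding an egg", "Danny Devito", replace="a man"))
```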

hot breach
#

I have a character named "wedge" that has no canonical surname, so I gave him a fake last name of "ff7r" and actually did that for all the characters with short one-word names

floral pollen
hot breach
#

early in training it draws blocks of wood for "wedge" lol or it thinks "wedge ff7r" is some sort of futuristic sports car but it learns it pretty fast it is a male video game character

#

lol

hexed bloom
#

Similarly you can put where they are from, so something like: powerpuff girls, buttercup, a girl flying over a city

#

Which helps with single names like Freon said

hot breach
hexed bloom
#

Don't think of it as single words, but rather "the collection" of them

#

So it helps to add context, even if it's adding a fake last name, source, whatever

hot breach
#

I kinda point out different things in different training images, it seems to help, sometimes I mention the bandana, sometimes I point out the shirt instead, or time of day, or gesture, or what direction the character is looking, without making every single caption ridiculously long

hexed bloom
floral pollen
hot breach
#

you can actually build the captions dynamically from YAML files in ED2, but it's a bit of effort

hot breach
floral pollen
hot breach
#

https://github.com/victorchall/EveryDream2trainer/blob/main/doc/DATA.md#by-caption-yaml you can make a YAML file and split your captions into parts, with a main prompt and tags, then shuffle the tags, assign tag weights, set a max length, etc. You can even change the conditional dropout or flip_p per image, or use local.yaml or global.yaml to control it per folder or for the entire dataset. It's fairly complex if you really wanted to get into that
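The idea of a main prompt plus shuffled, weighted tags can be illustrated in plain Python. This is not ED2's code or its YAML schema (the linked DATA.md documents that); `compose_caption` is a hypothetical function, and the weighting uses the Efraimidis-Spirakis trick (sort by `random() ** (1 / weight)`) so higher-weight tags tend to land earlier and survive truncation.

```python
import random
from typing import Optional


def compose_caption(main: str, tags: list[tuple[str, float]], rng: random.Random,
                    max_tags: Optional[int] = None) -> str:
    """Main prompt first, then tags in a weighted random order, optionally truncated.

    `tags` is a list of (tag, weight) pairs; weights must be > 0.
    """
    # Weighted shuffle: each tag gets key u ** (1 / w); larger keys come first.
    keyed = sorted(tags, key=lambda tw: rng.random() ** (1.0 / tw[1]), reverse=True)
    chosen = [tag for tag, _ in keyed]
    if max_tags is not None:
        chosen = chosen[:max_tags]
    return ", ".join([main] + chosen)


rng = random.Random(7)
tags = [("bandana", 1.0), ("night", 0.5), ("pointing", 2.0)]
for _ in range(3):
    print(compose_caption("wedge ff7r, a man", tags, rng, max_tags=2))
```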

floral pollen
hot breach
#

yeah keep it simple until you are comfortable

unique cloak
#

Your tool still rocks so much Freon. Thanks again for what you do

#

And I keep referencing it in my guide, and I use it exclusively nowadays

floral pollen
unique cloak
#

Everydream2trainer

#

It's the best training suite for models imo right now

hot breach
floral pollen
#

Oh yeah, you've shown me that; double the sadge now that my VRAM isn't high enough for it :\ But I'll try it out once we get a beefy card at work!

hot breach
#

yeah, it's down to 11-12GB now, but I don't think it's really possible to reduce it further without using a non-fine-tuning method like LoRA or TI, etc

#

or something like CPU offloading that will simply destroy performance

finite creek
#

Anybody tried training a material ?

vast dome
unique cloak
#

different. I feel better, but this paper explains more

vast dome
#

"regularization" images

#

It's the class images you generate, right?

#

I don't use a unique token or token class. I just leave them empty. I don't use regularization images either

#

Isn't... what I'm doing basically what EveryDream is doing?

#

It's a bit confusing to follow. What makes it better? I'm dumb. Need a smart person to explain it to me

finite creek
finite creek
# vast dome Isnt... What I am doing basically what everydream is doing?

Not that smart person but found this explanation quite good: The purpose of the "regularization" images is to "preserve" the model so that not everything generated out of the model looks like your training subject, "bob smith" or whatever it may be. So that, hopefully, Tom Cruise does not suddenly look like your Bob Smith training images.

finite creek
vast dome
#

Regularization is good if you already have a good model and want to go further while preserving what it already knows

#

But then again that concept doesnt make sense because i am told finetuning is basically destroying part of it

#

That finetuning is, by nature, destruction

#

Fuck

finite creek
#

It’s interesting, because you can customize your model with specific training data .

vast dome
#

The way I see is

#

If it does not increase the file size

#

Info is not added

#

It means you are DESTROYING what it already knows, then REPLACING the destroyed parts with your data

#

Info is not added. Info is overwritten

#

And that seems like nature of all finetuning?

#

But there are like 3+ fine-tuning things. They all claim to be different

#

Either I am wrong

#

Or they are... Wrongish? I dont know maybe it is different. These things are too difficult and there seems no clear explanation

finite creek
#

I guess you have a list of tokens that you can replace and it’s up to you to chose which ones?

unique cloak
finite creek
deft dust
#

Hello. I'm new to SD 🙂 I'm trying to do my first training. Is this the best channel to ask about it?

#

I'd like to ask about the reference images I'm about to train on. They all have the same width, but different heights. Is that an issue? Should I resize them somehow and fill the empty space with the background color?
I'd rather not crop; it would lose important information in the image.

unique cloak
#

Also i wrote an intro on training guide if you want

deft dust
cold cliff
#

ED2 crops as well, but it uses multiple resolution with around the same pixel count as the resolution you're training. A ~3.5:1 image will be resized to 896x256 at 512 resolution, for example
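A brute-force sketch of that kind of bucketing, assuming sides must be multiples of 64 and the pixel count may not exceed resolution²: pick the size whose aspect ratio is closest to the image's, preferring the larger area on ties. ED2 uses its own predefined bucket list, so its choices can differ; this just reproduces the 896x256 example. `pick_bucket` is a hypothetical name.

```python
def pick_bucket(width: int, height: int, resolution: int = 512, step: int = 64) -> tuple[int, int]:
    """Choose a training size (w, h) with sides in multiples of `step`,
    at most resolution**2 pixels, and the aspect ratio closest to the image's."""
    target_area = resolution * resolution
    aspect = width / height
    best = None
    for h in range(step, target_area // step + 1, step):
        for w in range(step, target_area // h + 1, step):
            # Closest aspect first; among equals, the larger area wins.
            key = (abs(w / h - aspect), -(w * h))
            if best is None or key < best[0]:
                best = (key, (w, h))
    return best[1]


print(pick_bucket(1792, 512))   # a 3.5:1 image
print(pick_bucket(1000, 1000))  # square
```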

deft dust
#

I started a test training with some business card layouts, and the images popping up in the Automatic app are total nonsense, a mix of artsy abstract colored boxes. What have I done wrong?

#

is there any small example of what a dataset for graphic design should look like to render properly?

deft dust
#

How can I mark some of the generated images as totally different from the expected result, so it gets taken in consideration while training?

hexed bloom
hexed bloom
sly leaf
#

So how does regularization work when it comes to LORA training (using kohya_ss)

I know you can point to a folder containing regularization images, do you have to caption them just like the training images? How many would be recommended if say, training on 100 images?

ashen perch
# sly leaf So how does regularization work when it comes to LORA training (using kohya_ss) ...

I'm also interested in this. I've just downloaded these images, and right now I'm simply pointing it at this folder, which contains the person folder
https://github.com/aitrepreneur/REGULARIZATION-IMAGES-SD
But I have no idea if it is actually correct

#

At first, I tried it based on this guide:
https://rentry.org/59xed3#regularization-images

finite creek
ashen perch
slate ledge
#

Has anybody finetuned SD to generate synthetic training data for other downstream models? For example, for scene classification or image captioning tasks?

pure plume
#

can i convert a 1.5 lora to 2.1 lora?

zinc palm
coarse crest
#

with the second one being the most common in the wild, afaik.

digital willow
#

what is the best tutorial for starting with Lora training? (my tests failed, probably cause the input images were not good enough)

pure plume
#

people who use Kohya SS, do you use BLIP for tagging or an external tool?

finite creek
#

If anybody manages to train a Lora in Kohya let me know, doesn’t work for me at the moment.

jolly lagoon
summer oriole
lost idol
#

does anyone know why it is harder to train likeness on 2.1 than 1.5? training an embedding of a person, or rather a robot rn.

primal copper
sharp solstice
#

has anyone experimented with automated finetuning of a 2.x model on a 1.x model?

#

the obvious manual way is to just generate a bunch of images from the 1.x model and use them to finetune a 2.x model maybe with the prompt that was used

spice hinge
#

Hello, I have 5k images and each one with a txt, how can I train SD?

#

Sorry I am noob :'c

ashen perch
#

I'm trying to train Lora with kohya_ss and I keep running out of VRAM.
Only 50MB is used before I start the training, and I know 8GB is enough, because I've trained with the same settings and the same images before. All of them are 512x512. I'm using v1.5 with gradient checkpointing and memory-efficient attention turned on, and cache latents turned off.

#

Any suggestions?

icy canyon
#

Has anyone found a way to make sd consistently leave blank space in parts of the image?

#

I’d like to add my own text, but it usually looks messy when I just put it over a busy image

paper holly
icy canyon
# paper holly maybe inpaint some blank space instead?

Do you know any prompts that give blank space? When I do inpainting, it distorts whatever was behind or next to the mask. The prompts I’ve tried are: an empty double quote, the same prompt as the original image, “blank”, the color I want to add, and some bland scenery like “barren hill”.

upper wasp
#

Looking for someone interested in helping me with a paid POC project. Please DM me if interested.

vast dome
#

Is training done in order? If I always stop at 20%, do I never train the last 80%?

royal island
#

Should it be possible to train attributes of a thing as opposed to the actual thing? For example, say I wanted to create an embedding or a LoRA for puckered lips. Could I take a bunch of closeup pictures of my puckered lips, do a training, and then generate any character doing that? If that is possible, would I want to train just different angles of one person's mouth doing it, or would I get several people to do it and preprocess all of those images into the same training set?

hexed bloom
forest yew
#

Is there any advice for LoRA training someone might have? I have been trying it out trying to train on a person and it "burns" very quickly so I have been halving the learning rate repeatedly but now it only picks up very general features. I found this happens with the TI training I attempted on the same dataset so I am wondering if there might be an underlying reason (it is already very familiar with the face structure perhaps)?

tribal frigate
#

What's the reasonable minimum number of pictures to train your own model? I tried to briefly google it, but most articles talk about training an embedding, so that's quite different, I imagine.

cold wyvern
tribal frigate
cold wyvern
#

Well, you generally need 5-20 images for each token you want to train. A finetune will only train a couple of tokens, whereas from scratch you'd be trying to train most/all tokens (49408).

tribal frigate
#

49k, holy hell 🙂 Do you know where I can read up on this a little bit? Tokens and stuff? I wanted to eventually put together a dataset for my own model, but 49k doesn't seem realistic. I was hoping for something like 2k images.

#

my motivation was to get an amalgamation of styles but with high quality images only and see what it will spit out

cold wyvern
#

So the list of tokens is the vocab.json file, never tried doing a multi concept finetune though. I would suspect captions would be very important for it

tribal frigate
#

ah, thanks. And where exactly should that be located? I don't see it anywhere, and the lookup function didn't find it either.

#

hmm, that actually looks like HTML code for a website. Not sure it's the right file.
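A minimal sketch for checking the token count yourself, assuming you have the raw `vocab.json` from the tokenizer folder of an SD 1.x model (for example `tokenizer/vocab.json` in the SD 1.5 repository on Hugging Face). Saving the web page instead of the raw file gives you HTML, so the sketch checks for that first. `load_vocab` is a hypothetical helper name.

```python
import json
from pathlib import Path

def load_vocab(path):
    """Load a CLIP-style vocab.json (a token -> id mapping) and sanity-check it."""
    text = Path(path).read_text(encoding="utf-8")
    # A saved web page starts with an HTML doctype or tag, not a JSON object.
    if text.lstrip().lower().startswith(("<!doctype", "<html")):
        raise ValueError(f"{path} looks like an HTML page, not the raw vocab.json")
    vocab = json.loads(text)
    if not isinstance(vocab, dict):
        raise ValueError("expected a JSON object mapping tokens to ids")
    return vocab

# Usage (path is an assumption):
# print(len(load_vocab("tokenizer/vocab.json")), "tokens")
```

For the SD 1.x CLIP tokenizer, the count should come out at about the 49408 tokens mentioned above.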

unique cloak
# tribal frigate my motivation was to get an amalgamation of styles but with high quality images ...

For styles, I would recommend around 50 to 200 pictures per style, very diversified in the subjects and settings they present.
If you want to go for multi-concept, it's fine, but it's a little harder. You'd better start by "calibrating" all your concepts, i.e. training each one on its own, to get a working dataset for each, before merging them into one big dataset.
You'll need to balance the sizes of the datasets so that, in this calibration, all the trainings are ready at roughly the same total number of epochs.
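One simple way to do that balancing when merging the calibrated datasets is to give smaller concepts a higher repeat count, so every concept contributes about as many images per epoch as the largest one. This sketch assumes your trainer supports per-folder repeats (kohya_ss's numeric folder prefix is one example); `balance_repeats` and the concept names are hypothetical.

```python
def balance_repeats(counts):
    """Return a per-concept repeat factor so each concept contributes roughly
    as many images per epoch as the largest concept.

    counts: dict mapping concept name -> number of images in its dataset.
    """
    target = max(counts.values())
    return {name: max(1, round(target / n)) for name, n in counts.items()}

print(balance_repeats({"watercolor": 50, "pixel_art": 200, "photo": 120}))
# {'watercolor': 4, 'pixel_art': 1, 'photo': 2}
```

Rounding means the balance is only approximate; it is a starting point to check against your calibration runs, not a guarantee.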

#

I did a guide on finetuning, but it doesn't specifically explain the full fine tune. I do spend quite a lot of words on the dataset making part of the equation, whatever your training goal is

tribal frigate
unique cloak
#

you can check the dataset

#

I tried to present lots of types of things that I wanted to be able to prompt for

#

it was on the low side, 46

tribal frigate
#

will do, thank you 🙂

unique cloak
#

it still worked correctly; for example, there's no Morgan Freeman or Whoopi Goldberg in my dataset

cold wyvern
#

I've taken 20 of the best photorealism images from CivitAI and am training a LoRA on that, will be interesting to see how it goes, given several of them aren't that sharp and so have blurry added to the caption...

#

How many it/s do you get when training @unique cloak

unique cloak
#

around 30 to 40 img/s, depending on the settings I use, for 1.5

#

img/s, not it/s; it completely depends on the batch size

#

and it also depends quite a bit on the image size you use

#

(I need to recheck, this seems too high)

#

nah

#

10 times less

cold wyvern
#

yeah I'm getting 2.5s/step

unique cloak
#

2.91 img/s

#

a slightly old one (last month), but it hasn't changed since

finite creek
#

Or just the speed for training ?

unique cloak
# finite creek Or just the speed for training ?

it affects VRAM, speed and quality.
VRAM usage climbs a little higher each time you increase it.
Speed goes up, but it can feel like the opposite at first: each iteration is now a batch of multiple pictures, so each iteration takes longer, but you need fewer iterations per epoch. In the end, it's an overall speedup, even if not as big as what you get from batch size when generating pictures.
Lastly, quality. From my tests and those of a few friends, it's a net positive there too, though with diminishing returns as you go higher (batch size 2 is usually noticeably better than batch size 1, but the improvements are smaller at batch size 3 and up).
The last thing I see correlated with batch size is aspect ratio. Multiple tools out there now let you train on different aspect ratios at once, with no need to crop to squares. But even so, they can't put pictures of different aspect ratios in the same batch.
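To compare numbers like "2.5 s/step" and "2.91 img/s", you have to account for batch size and gradient accumulation. A small sketch of that arithmetic, assuming one "step" means one optimizer step covering `batch_size * grad_accum_steps` images; the function names are hypothetical, and the batch size of 4 in the first example is an assumption, since it wasn't stated above.

```python
import math

def images_per_second(seconds_per_step, batch_size, grad_accum_steps=1):
    """Images processed per second when one optimizer step covers
    batch_size * grad_accum_steps images."""
    return batch_size * grad_accum_steps / seconds_per_step

def steps_per_epoch(num_images, batch_size, grad_accum_steps=1):
    """Optimizer steps needed to see every image once."""
    return math.ceil(num_images / (batch_size * grad_accum_steps))

# 2.5 s/step at an assumed batch size of 4 -> 1.6 img/s
print(images_per_second(2.5, 4))
# ~450 images at batch size 5 with 2 gradient accumulation steps -> 45 steps per epoch
print(steps_per_epoch(450, 5, 2))
```

This is also why a bigger batch feels slower per iteration while still finishing an epoch sooner.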

finite creek
fading kindle
#

I've spent some time fine tuning stable-diffusion 2 and stable diffusion 1.5 (base model). I'm looking into fine tuning https://huggingface.co/stabilityai/stable-diffusion-2-inpainting Are there any special considerations I need to make here? I read the model card and I was wondering if training using the diffusers fine tune script with a different set of images will interfere with the model in a destructive way (I doubt it, but I have not looked too much into how the mask conditioning is implemented in inpainting models, is it similar to the design of any other conditioning like depth-conditioned diffusion?)

cold wyvern
#

I thought the idea with inpainting was to finetune the main model, then do an A + (B - C) merge, where A is the inpainting model, B is your finetune and C is the base model.
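A minimal sketch of that "add difference" merge over state dicts, with an optional multiplier (A + (B - C) * M). It is written against plain mappings so it works with anything that supports `+`, `-` and `*`; with real checkpoints the values would be torch tensors loaded from each model (e.g. via `safetensors.torch.load_file`). Keys missing from B or C, or whose shapes differ, are simply copied from A. That covers the inpainting UNet's first conv, which takes extra mask/masked-image channels, but it is a simplification: merge tools may handle that layer differently.

```python
def add_difference(a, b, c, multiplier=1.0):
    """Merge state dicts as A + (B - C) * multiplier.

    a: inpainting model, b: your fine-tune, c: the base model b was trained from.
    Keys missing from b or c, or whose shapes differ, are copied from a unchanged.
    """
    merged = {}
    for key, a_val in a.items():
        b_val, c_val = b.get(key), c.get(key)
        # Plain numbers have no .shape, so they all compare as None here.
        shapes = {getattr(v, "shape", None) for v in (a_val, b_val, c_val)}
        if b_val is None or c_val is None or len(shapes) > 1:
            merged[key] = a_val
        else:
            merged[key] = a_val + (b_val - c_val) * multiplier
    return merged

print(add_difference({"w": 1.0, "only_a": 5.0}, {"w": 3.0}, {"w": 2.0}))
# {'w': 2.0, 'only_a': 5.0}
```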

wicked horizon
#

hello @everyone, I need to develop a "background diffusion model" that generates various background and angle images based on prompts while keeping the product in the uploaded image unchanged. If you are interested in exploring and developing in this area, please message me privately. There will be handsome rewards for this cooperation.

minor jay
#

Get your GPUs ready to fine-tune DeepFloyd

fading kindle
cold wyvern
hot breach
#

you may need to bump learning rate up slightly with larger batch
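"Slightly" is a judgment call. Two common rules of thumb give a starting point: scale the learning rate with the square root of the batch-size ratio (gentler) or linearly with it. This is a generic heuristic sketch, not something specific to any trainer mentioned here; `scale_learning_rate` is a hypothetical name.

```python
import math

def scale_learning_rate(base_lr, base_batch, new_batch, rule="sqrt"):
    """Rescale a learning rate tuned at base_batch for new_batch.

    "sqrt" scales with the square root of the batch ratio, "linear" scales
    proportionally. Both are rules of thumb, not guarantees.
    """
    ratio = new_batch / base_batch
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    if rule == "linear":
        return base_lr * ratio
    raise ValueError(f"unknown rule: {rule!r}")

print(scale_learning_rate(1e-6, 1, 4))            # square-root rule
print(scale_learning_rate(1e-6, 1, 4, "linear"))  # linear rule
```

For fine-tuning, the gentler square-root rule is usually the safer first try; check samples before going further.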

spice sentinel
#

Context: Trying to fine-tune icons of AWS services. Dataset with ~30 different icon concepts, each with 15 images, 770px square. So far, in my 770px images the icon is about 50px in size, so I just move it around for variation. Should my captions be "icon lambda right", "icon lambda left"?

Below is my result with 12 epochs of ~450 images with 30 concepts, lr 2e-06, GAS 2, Batch size 5

unique cloak
#

I see, but if there is only 1 type of Lambda icon, all the 3 training samples will be all the same, just translated around? So far in my 770px, the icon is about 50px in size, so I just move it around for variation. Should my captions be "icon lambda right", "icon lambda left" ?

only 1 icon is plenty.
Just look at my "SDArt models". I train each token on only 1 picture in the dataset, and it does great at doing variations of that picture
example base pic
and variations : #1094265318483972096 message

unique cloak
#

can you show a small screenshot of the dataset so I can get it better ?

spice sentinel
#

Here they are. The manifest JSON, the folder structure, the files inside folder, and 2 samples of the dataset for lambda category

unique cloak
#

wait, so you have this same lambda icon just moving around ?

#

first, the icon is far too small; it should take up the whole picture, not move around

#

second, this is not variation

#

to DB, this is the exact same picture

#

you are burning the concept in

#

at most, you are training a model on positioning an icon in the frame there

#

what types of output do you want the model to have ?

#

new types of lambda? or lambdas positioned in other places?

spice sentinel
#

I see. The expected output is also a small icon (50 px in 768 px image). That is the first stage. I plan to fine tune it further to use those concepts in a more complex concept such as this one below

unique cloak
#

I don't think SD is the right tool for this :/

#

You will have a very hard time training the icons for a start, because SD isn't a database that stores pictures; it understands how they are made and tries to make things in the same way, not reproduce them

#

and controlling such lines will also be impossible

#

prompts won't let you do this, controlnet will, and it will require that you almost draw the full output first, making it useless

#

I think training isn't the way to go there

spice sentinel
#

Yeah, I doubted it from the get-go, but wanted to challenge it hehe
Thanks Guizmus. But say I change the use case to producing a different type of Lambda icon, that would be possible, yeah? I should use the full 770px rather than a 50px icon?

unique cloak
#

you can totally train a model to generate "new AWS icons", training on the style itself.
The model would understand that the icons are always square, always a single color gradient, with a hollow white shape inside.
You would train on pictures of the icon at the full 768x768 (the exact size you should use).
You would caption them "AWSicon lambda orange", for example, and have only 1 of each. A total of 30 different icons with different colors captioned like that should already make a good model.
This model would be able to make, for example, "AWSicon horse purple".
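A tiny sketch of that captioning scheme, assuming one image per icon and the "AWSicon <name> <color>" pattern described above. The function name and the file-stem format are hypothetical; adapt them to whatever your trainer expects.

```python
def icon_captions(icons, token="AWSicon"):
    """Build one caption per icon in the "<token> <name> <color>" pattern.

    icons: iterable of (name, color) pairs, one training image per icon.
    Returns a dict mapping a suggested file stem to its caption.
    """
    captions = {}
    for name, color in icons:
        stem = f"{token}_{name}_{color}".replace(" ", "_")
        captions[stem] = f"{token} {name} {color}"
    return captions

print(icon_captions([("lambda", "orange"), ("s3", "green")]))
# {'AWSicon_lambda_orange': 'AWSicon lambda orange', 'AWSicon_s3_green': 'AWSicon s3 green'}
```

Keeping the rare token first and the name/color after it is what would let a prompt like "AWSicon horse purple" reuse the learned style with a new subject.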

spice sentinel
#

Amazing advice. That would be fun too. Thank you.
What about using the concepts in "a photo" kind of prompt. For example "a photo of AWSicon lambda crafted in 3D on mountain surface". Will that work?

unique cloak
spice sentinel
#

Yeah, I was looking at the controlnet while waiting for your reply 🙂 Seems like it tries to lock some part of the model (still trying to comprehend it).

Alright, let me do some experimentation. Thank you so much for your help today @unique cloak

unique cloak
#

this one was my best

spice sentinel
unique cloak
#

thanks 🙂 the power of controlnet !

#

really a great tool

dry jetty
#

does anyone know how to train a LoRA?

cold wyvern
dry jetty
#

I saw it and trained a LoRA, but when I use it, the colors of the generated images are blown out

cold wyvern
#

have you tried using the lora at a lower weight - I found the loras I made worked best at around 0.3

atomic phoenix
#

I have a question about training LoRA (or Dreambooth, it all depends on what hardware I can acquire). I've trained models on a certain face, to produce "avatars" in that face's likeness. Now, for a project, I'd like to train a model to produce pictures in a signature style. Let's say I want to train a model to produce only pictures that look like they're stills from the Simpsons. I take 30 shots of the style I want to use, with many different subjects, backgrounds and such. How do I tag those datasets to capture "the style"?

And for classification pictures, I'll generate 1000-1500 images with "Scene from the Simpsons" or something?

Is there a model/checkpoint that works well with animated cartoon imagery?

unique cloak
#

The dreambooth extension seems to generate a bunch of class images based on the captions, then use those to train against my images (22 pictures of my face)
after training completes, I try to generate some output and I get this

#

so, the class images are normal. It won't generate them again as long as you don't change your class prompt, but it's also something you can remove completely by turning off "prior preservation". This is regularisation data that is there to help on bigger trainings, but I don't think you need it
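For context on what "prior preservation" does: in DreamBooth-style training, each step also denoises class images and adds their loss, scaled by a weight, to the loss on your own images. A plain-Python sketch of that combination, assuming mean-squared error on the noise prediction as in the diffusers DreamBooth example; `prior_weight` stands in for that script's prior-loss weight setting, and the function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def dreambooth_loss(inst_pred, inst_target, class_pred=None, class_target=None,
                    prior_weight=1.0):
    """Instance loss, plus the weighted class ("prior") loss when class images are used."""
    loss = mse(inst_pred, inst_target)
    if class_pred is not None:
        loss += prior_weight * mse(class_pred, class_target)
    return loss

print(dreambooth_loss([1.0, 2.0], [1.0, 0.0]))                            # no prior preservation
print(dreambooth_loss([1.0, 2.0], [1.0, 0.0], [0.0], [1.0], prior_weight=0.5))
```

Turning prior preservation off corresponds to the first call: no class predictions, so no class images are needed.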

#

I don't know what you were training, so this example pic doesn't tell me anything

#

can you show me a screenshot of your 22 pictures dataset, so I can see a little what we are working with ?

dry jetty
rare belfry
#

when fine-tuning a model on a face, it is recommended to have 15/20 pictures maximum. Why is there a limit?
Usually, when training a model, the more data the better.

unique cloak
# rare belfry when fine-tunning a model on a face it is recommended to have 15/20 pictures max...

there isn't really a limit, you can train on more if you like, but this won't bring a better quality model. Instead, you risk :

  • to have some details of the face be trained faster/better than others, making an inconsistent final result.
  • to create more bias potential from any repeating feature outside of the face
  • to deteriorate more the rest of the model, needing to train more because of the number of pictures

1 picture is already a lot for the AI, and more isn't better when it comes to a dataset.
Adding pictures that are too close to what's already in the dataset is detrimental on all sides.

cursive grotto
#

Hey, I'm looking to finetune SD for a use case involving generating realistic human faces. I'm planning to use everydream2 for this. I'm trying to keep it restricted to male faces for now.

I have used Civitai checkpoints from AnalogDiffusion, AnalogMadness, and a few others. I want to create something of that quality in terms of realistic faces, skin, eyes, and hair.

What dataset size would I need? Should I create separate datasets for faces, eyes, skin, etc., or will a bunch of images of real people with good prompts do? How many images are we looking at, roughly?

unique cloak
cursive grotto
unique cloak
#

well, we just double pinged him

#

so he may come around too x)

rapid swallow
# cursive grotto Thanks man. Will DM <@259757918927192075> and check if he got one second 😉

On mobile right now, but heres a copy/paste of another comment I made that has the training info I used for Analog:

For Analog Diffusion I used around 110 images. My quick count has it at 30% closeup-medium, 25% medium-full, and the other 45% as non-people or images where no face is visible (such as silhouetted or facing away). It was trained with prior preservation, 1500 class images made with the prompt "photograph", LR of 1e-6 with polynomial scheduler to 10k steps, and trained at 512x512.
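For anyone unfamiliar with the "polynomial scheduler" part: polynomial decay lowers the learning rate from its starting value toward an end value over the run, following (1 - progress)^power. A plain-Python sketch using the 1e-6 start and 10k steps quoted above; the end value of 0 and power of 1.0 (a straight line) are placeholders, not the settings actually used for Analog Diffusion, and warmup is left out.

```python
def polynomial_lr(step, total_steps, lr_init=1e-6, lr_end=0.0, power=1.0):
    """Polynomial decay from lr_init at step 0 to lr_end at total_steps.

    lr_end and power are placeholders; power=1.0 gives a linear decay.
    Steps past total_steps stay at lr_end.
    """
    step = min(step, total_steps)
    remaining = 1 - step / total_steps
    return (lr_init - lr_end) * remaining ** power + lr_end

for s in (0, 2500, 5000, 10000):
    print(s, polynomial_lr(s, 10_000))
```

With power above 1 the rate drops faster early on; below 1 it stays high longer before falling off.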

Since I did Dreambooth class training, no captioning done for the dataset.

cursive grotto
cursive grotto
unique cloak
#

not sure about 1. but for 2. :
using only the "photograph" token as the prompt trains that token strongly. It won't train other tokens, and it will try to find the common factor of all those photos: the fact that they are analog photos.
Adding more tokens to your captions will slow the training, and won't help capture the style.
The only downside of this method is that you've got to be sure to have a very varied dataset, with a mix of all types of subjects, even no subject at all. That way no single subject really gets trained, only the style does.

oak hamlet
#

hi, I want to ask: if I use photos generated by Stable Diffusion to train a LoRA (with similar faces, but not the same face), and then use this LoRA to generate photos at weight 1.0, will I get a consistent face that is a mixture of the photos I used, or something else?

#

for example, if I use 10 Jisoo photos and 10 Lisa photos to train a LoRA, then use this LoRA at weight 1.0, will I get a consistent face that is half Jisoo, half Lisa, or will it just sometimes look like Jisoo and sometimes like Lisa?

rapid swallow
oak hamlet
rapid swallow
oak hamlet
stiff dust
#

Hi,

I want to use SD to create images of characters from my DnD group. While not all results are bad so far, I still struggle a little bit. My current strategy is to first use textual inversion on a single picture, then create multiple good pictures of the character with heavy prompt refinement. So far so good. Now I would like the model to remember the character's face based on these 5-10 example images, so that I can more easily create new images of the same character without all the inpainting and prompt refinement.
My plan was to use textual inversion and then LoRA. However, I am never really happy with the results. They seem heavily overfitted (e.g. it has trouble showing the character in poses other than those in the examples). The strange thing is that overfitting seems to happen even with textual inversion alone. If I train on the images with TI, it can create relatively good images, but they all look similar to each other and it is completely unable to transfer new styles to the same character. For example, if I use a prompt for anime characters, I still get the same result as when I ask for a photorealistic image. What I find strange is that this does not happen when I train TI on photos of my own face. There it is perfectly able to transfer my face to completely different styles and settings. Why is it overfitting so much on artificial, AI-generated images, but not on real photos?

stiff dust
#

In general, I have the feeling that using e.g. only one token for TI leads to characters that generalize much better but are often broken. You can see that TI tries to place the token between different concepts and then randomly adds artifacts from one of these concepts. For example, one character has a blond braid, so sometimes a woman appears in the images, probably because his token is somewhat shifted toward women with braids. If I use many tokens (like 3) for TI, then overfitting occurs and it seems to memorize too much.
Any recommendations for the number of tokens and number of steps for training TI on digital faces (not real photos)?

dapper prism
#

Are there any tricks to training LoRAs on small datasets?

sonic narwhal
hot breach
#

thanks @rigid sorrel for running test trainings; from his results with the "min-SNR" technique, it seems like not much is going on

#

this is using kohya's implementation hacked into everydream2

rigid sorrel
#

7.5k high-resolution portrait photographs on top of SD 1.5. The gray line is SNR 5, the blue line is SNR off.
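For anyone wondering what "SNR 5" refers to: min-SNR weighting scales each timestep's loss by min(SNR(t), γ) / SNR(t) for epsilon prediction, with γ = 5 in the run above, so low-noise timesteps (high SNR) are down-weighted. A plain-Python sketch assuming SD 1.x's "scaled_linear" beta schedule (0.00085 to 0.012 over 1000 steps). This illustrates the weighting itself, not kohya's or EveryDream2's code.

```python
import math

def sd_alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative alpha products for SD's "scaled_linear" beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, prod = [], 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod

def min_snr_weights(alphas_cumprod, gamma=5.0):
    """Per-timestep loss weights min(SNR, gamma) / SNR for epsilon prediction."""
    weights = []
    for a in alphas_cumprod:
        snr = a / (1.0 - a)  # signal-to-noise ratio at this timestep
        weights.append(min(snr, gamma) / snr)
    return weights

weights = min_snr_weights(sd_alphas_cumprod(), gamma=5.0)
print(f"t=0: {weights[0]:.4f}  t=999: {weights[999]:.4f}")
```

Noisy timesteps (SNR below γ) keep a weight of 1, so the effect is concentrated on the nearly clean end of the schedule, which may be one reason a fine-tune on clean photos shows little difference.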

forest yew
#

I've tried Aitrepreneur and TheAlly's LoRA training guides and neither of them really gave me consistent results. When the Lora did hit, it was very good, but it's about.. maybe 1 or 2 in 12 images which had good results.

stuck cedar
#

Hello everyone. I'm trying to generate photo realistic images of individuals using 10-15 photos of them. Even though I've been training them using dreambooth, there's a problem with the end product not quite looking like the original test subject when the generation is done, but instead looks like a different person that has similar features as the original test subject. This problem is particularly present for women. I've been mainly using realistic vision 2.0 for this task, but also tried 1.5 pruned. Do you have any advice on what I can do to make the final generated images the same as the test subjects? Perhaps there is a better model I can train on?

hot breach
#

any way you can source more images?

#

have you tried just training longer?

wicked horizon
serene flicker
#

I'm thinking about trying training again tonight. I haven't trained in a very long time due to xformers issues. Maybe this new fork of the webui I have will work because it has different optimizers.

#

It's basically for concept art geared towards game developers like myself. I am thinking about training on dreamshaperv5 but I have trained every previous embedding of mine on the base 1.5 model. But I do know dreamshaper is already pretty good at concept art so that's why I wanted to do it.

unique cloak
#

Does anyone know of a way to analyze your captions?
I want to see which words I may have overused, what is trained enough and what could need a change.
I would need to dump my file names in some way and analyze the words, but somebody may have run into this problem already?

stiff dust
#

you could visualize the cross attention maps

#

very manual way of inspecting the prompt, though

unique cloak
#

I'm not sure how I would go about that, but I was looking for a way to check before training, so I can update some words on some pictures to "balance" the concepts out.
In the past, I worked with shorter captions, but here, for example, I'd love to know how many times I use "picture" or "photo", or "anime". I'm targeting a multi-style model with quite a few concepts.
I have a good idea of the balance of the main themes, but the inner details, the hard truth of statistics... I don't have that for now

cold wyvern
#

Some sort of script to collate the captions and then count each word in em?

unique cloak
#

I think I'll ask chatGPT for a little script, this seems like a "simple enough" task
famous last words

cold wyvern
#

powershell should be able to sort that pretty easy

stiff dust
#

or just a few lines python 🤷‍♂️

radiant lake
radiant lake
#

ruby's like python's younger sibling, simpler, prettier syntax... but still the same thing basically

#

(please keep me posted if forking my scripts and making improvements)

#

got a lotta crazy plans for it still 😄

stiff dust
#

I also like ruby much more than python 😉 I just thought it's more likely Guizmus has python installed already

#

I like Lora conceptionally more than Dreambooth and several papers also suggest that it is much better suited for fine-tuning

unique cloak
#

Almost done for a first "pass" :p
115/160 [9:56:52<3:57:01]

unique cloak
#

I'm having nice results. I'm analyzing the tokens now
In the end, I used BulkRenameUtility to be able to copy all file names into the clipboard, notepad++ to replace spaces by \n and excel for the stats
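The same statistics can be pulled with a few lines of Python, as suggested above. A minimal sketch, assuming captions live either in the image file names (as in the workflow just described) or in `.txt` files next to the images; `caption_word_counts` and the extension list are assumptions for illustration.

```python
import re
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed image extensions

def caption_word_counts(folder, use_filenames=False):
    """Count words across captions in a dataset folder (searched recursively).

    use_filenames=True treats each image's file name (minus extension) as its
    caption; otherwise every .txt sidecar file is read.
    """
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        if use_filenames and path.suffix.lower() in IMAGE_EXTS:
            text = path.stem
        elif not use_filenames and path.suffix.lower() == ".txt":
            text = path.read_text(encoding="utf-8")
        else:
            continue
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts

# Usage (folder name is an assumption):
# for word, n in caption_word_counts("dataset", use_filenames=True).most_common(30):
#     print(n, word)
```

Underscores and punctuation act as word separators, so numeric suffixes like "_(2)" on duplicate file names will show up as the word "2"; filter those out if they get in the way.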

trim portal
#

Hey folks. My name is Vatsal. I am a Computer Science student. I have received a research fellowship in deep learning and a data science internship at a top vc backed startup. I’m currently doing ML research at Stanford.

I am looking for people with experience fine tuning stable diffusion for Text2Image with excellent results. I have a custom dataset with image-caption pairs that I built readily available so a decent amount of the work is done already. Didn’t have time to do proper fine-tuning cause I was so busy.

I am really looking for anyone with significant experience fine tuning stable diffusion that can help me wrap up this project with good results.

Ideally we would also package the product and ship it as a web app.

neat grove
#

what's the current state of the art for finetuning for drawing style (cartoon)?

I'm totally out of the loop on finetuning. I want to finetune to match a particular cartoonist's style. What should I look at?

stone garden
#

Hey there, can someone help me in VC? I want to create a LoRA model of myself but don't know how to do it. Can someone help me?

unique cloak
#

Jack

#

you can want help

stone garden
#

yea

unique cloak
#

but not try to ping 150k people

stone garden
#

ok

#

but is there anyone available

unique cloak
#

yeah, I get it. But still, man, you can't just ping the whole server on one this size. It gets caught by the bot, for one, and you also get reprimanded for it.
There are lots of people who do LoRA, and you'll get some help, but the VC channels aren't used much around here. It's mostly text, and it will take a few hours.

stone garden
#

ok

#

@dense ravine I am currently using Vladmandic/Automatic

#

This is what I see.

#

I am not sure how to create one

#

can someone help?

void osprey
#

Hello guys... I don't know much about training Stable Diffusion on image data to generate your own style of images. I know you need a bunch of images of the type you want to train the model on. Now, quick question: can you train a model on a timelapse video of yourself drawing? I suppose the video will show in more detail how the final result comes out...

cold wyvern
#

@unique cloak - maybe someone needs to find the best Nerdy Rodent/Olivio Sarikas/Aitrepreneur videos on TI/LoRA making and pin them

stiff dust
hexed bloom
#

Anyone know if bucketing sizes were changed?

#

I noticed on the latest branch I'm getting 1080 max bucketing size at 768, but on an older build I'm getting 768 max

hexed bloom
hexed bloom
#

I figured it out, they did change bucketing a bit ago

ashen summit
#

hi, i heard there was a "stable diffusion training labs" discord, but i can't find a link anywhere. Was it referring to these channels or can someone DM me the link?

neat oxide
#

Can anyone help me my Lora models keep coming out overbaked...

upper wasp
#

Anyone with experience in fine tuning models ....i have a good dataset....please DM me to discuss.

unique cloak
#

Hey wayne :p

neat oxide
#

but other people said it's because the photos are similar. My issue is I'm trying to train a pose

stiff dust
#

what do you use as rank?

neat oxide
#

so it kinda has to be all the same

unique cloak
#

similar photos will sure create a burn quite fast

stiff dust
#

in general, many people just stay with overtrained Lora and instead reduce the weight at inference

#

training a pose sounds like Textual Inversion would be better suited for that...?

#

or directly using control net

unique cloak
#

that's an option too yeah.
when using the LoRA, you can reduce its weight with this syntax:

`a picture of a sks house <lora:sks:0.8>`
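What that weight does, roughly: a LoRA stores a low-rank update B·A for each targeted layer, and the multiplier scales that update before it is added to the base weight, W' = W + m · (alpha / rank) · B·A. A plain-Python sketch under that usual formulation; the function names are illustrative, not any tool's API, and real tools apply this per layer on tensors.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def apply_lora(w, down, up, multiplier=1.0, alpha=1.0):
    """Return W + multiplier * (alpha / rank) * (up @ down).

    w: base weight (out x in), down: A (rank x in), up: B (out x rank).
    """
    rank = len(down)
    scale = multiplier * alpha / rank
    delta = matmul(up, down)
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

w = [[1.0, 0.0], [0.0, 1.0]]
print(apply_lora(w, down=[[1.0, 1.0]], up=[[2.0], [0.0]], multiplier=0.5))
# [[2.0, 1.0], [0.0, 1.0]]
```

At 0.8 (or the 0.3 mentioned earlier) only that fraction of the learned change is applied, which is why an overtrained LoRA can still look fine at a lower weight.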

stiff dust
#

do you just want to train the pose or also the person which does the pose?

hexed bloom
#

On the latest Dreambooth version, the models (safetensors/ckpts) generated during training are identical to the model they were trained on. I have to manually convert the diffusers weights to safetensors/ckpts for them to work. Anyone else experiencing this?

[+] xformers version 0.0.18 installed.
[+] torch version 2.0.0+cu118 installed.
[+] torchvision version 0.15.1+cu118 installed.
[+] accelerate version 0.18.0 installed.
[+] diffusers version 0.14.0 installed.
[+] transformers version 4.26.1 installed.
[+] bitsandbytes version 0.35.4 installed.
oak hamlet
#

hi, I changed the people in a photo using inpainting. However, it's obvious that the photo was made by AI, as the generated people don't blend with the background (maybe due to the color or shadows). Is there any method to make the image more natural?
I tried putting the generated photo into img2img again with a lower denoising strength to help it blend, but that affects the background of the photo.

neat oxide
#

my lora works perfectly fine

#

but someone in the comments complained that it's overbaked at 0.8

#

in reality it's really different depending on the model you use

#

but i think this is a good example of when not to listen to people sometimes

stiff dust
#

oh, you should use it with the same model you used for training

neat oxide
#

if u look here

#

you'll see my LoRA actually works for all kinds of models

#

including dreamshaper

#

so genuinely i just think i drove myself crazy over 1 comment 😅

stiff dust
#

it will work, but not as well

#

I just don't get it. Your "pose" is a full body shot?

neat oxide
#

it's the general vtuber model pose

stiff dust
#

I have no clue what that is 😅

neat oxide
#

oh so like

#

people take 2d images of anime girls that look like this

#

symmetrical or non symmetrical

#

they rig them up with live 2d

#

and then well u become a vtuber

stiff dust
#

guess I'm too old for that 😂

Anyways, keep in mind that you can achieve such stuff probably with Textual Inversion, too. Somehow I think Lora is the wrong tool here

#

but I might be wrong 🤷‍♂️

neat oxide
#

i know very little about training, so yeah, you're probably right, but i have no idea how to do textual inversion

stiff dust
#

which tool do you use? Automatic1111?

neat oxide
#

kohya gui

stiff dust
#

okay, I don't know that one 😅 usually textual inversion is implemented everywhere, as it's the most basic variant of model training

neat oxide
#

i think kohya has it

#

no idea what to put here

restive plank
# neat oxide no idea what to put here

I followed https://youtu.be/70H03cv57-o, and he has a couple of .json files that will fill all that in

neat oxide
#

yeah that tutorial is outdated

#

i actually had to figure it all out myself

#

cuz following it, it didn't work

#

it didn't work for other people in the comments section either

restive plank
#

Ah, I see now you're doing TI; it works for LoRA, but I haven't tried Kohya for TI.

stiff dust
#

no, what you showed is Dreambooth + TI

#

TI is more or less the first step for all sorts of trainings. So when you do LoRA, you do TI too. The point is, you probably don't need the LoRA part at all, just the TI step

#

but it looks like the kohya tool is a tool for training

#

you probably use another tool for creating the images

#

most likely the automatic1111 webui

#

and TI is in there if you go to the train tab in the webui

#

it's called "embeddings" sometimes

unique cloak
#

@obtuse fern
let's not spam the "contest" channel with training talk
Usually for a single subject, 10-15 is the recommendation. 20 is quite alright too, as long as they stay diversified

obtuse fern
#

I use the fast-dreambooth Colab and I'm not sure about the "UNet-training-steps" part; generally I do 100 per image. Is that OK?

unique cloak
#

it's the tip I would have given, if you keep most other params standard
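The "100 steps per image" rule of thumb is just a multiplication. A minimal sketch, assuming a batch size of 1 so that one step equals one image; the function name is hypothetical and not a colab setting:

```python
def unet_steps_for(num_images: int, steps_per_image: int = 100) -> int:
    """Rule of thumb from this thread: ~100 UNet steps per training image.

    Assumes batch size 1, i.e. one image per step.
    """
    if num_images <= 0:
        raise ValueError("need at least one training image")
    return num_images * steps_per_image


for n in (10, 15, 20):
    print(n, "images ->", unet_steps_for(n), "steps")
# 10 images -> 1000 steps
# 15 images -> 1500 steps
# 20 images -> 2000 steps
```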

#

in particular, the "learning rate" can influence this quite a bit, but with the defaults, this seems like the right way to go

obtuse fern
#

thanx a lot

tall condor
#

so i am trying to train a model and i'm not clear on a couple of things. first of all, i'm using automatic1111 dreambooth training. my first question: how does the learning rate relate to the number of training steps per image class?
i have about 1000 images in several different folders (50-150 each) and i tried training them for around 200-500 steps in 1 epoch
my main question is how i have to create the folder captions so the model trains properly
and how do i tokenize the captions? is there a description of how to structure the captions for training a model like this, where the folders are kind of cross-dependent?

unique cloak
#

hey again 🙂

#

first of all, this is a very complicated project you are presenting here, compared to most first time trainings.

tall condor
#

i know, sorry

unique cloak
#

how is the correlation between learning rate and number of training steps per image class?
Learning rate is how much your model gets updated per step. So its impact depends on how many steps you are doing
Each step, you train on 1 or more pictures, depending on the "batch size" and the "gradient accumulation steps" you are using (sorry, it's complicated)
Usually, my tip is to train for 100 steps per picture in the dataset in total, so closer to 100k steps than 500 in your case
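Putting those pieces together: each optimizer step sees batch size × gradient-accumulation steps images, so the step count a tool reports depends on those settings. A minimal sketch, assuming "100 per picture" counts image presentations and that the tool counts optimizer steps (some tools may count micro-batches instead); the function names are hypothetical:

```python
import math


def images_per_step(batch_size: int, grad_accum_steps: int) -> int:
    """Each optimizer step sees batch_size * grad_accum_steps images."""
    return batch_size * grad_accum_steps


def optimizer_steps(num_images: int, passes_per_image: int,
                    batch_size: int = 1, grad_accum_steps: int = 1) -> int:
    """Optimizer steps needed so every image is seen passes_per_image times."""
    total_image_passes = num_images * passes_per_image
    return math.ceil(total_image_passes
                     / images_per_step(batch_size, grad_accum_steps))


print(optimizer_steps(1000, 100))                                   # 100000
print(optimizer_steps(1000, 100, batch_size=6))                     # 16667
print(optimizer_steps(1000, 100, batch_size=2, grad_accum_steps=3)) # 16667
```

So "100k steps" at batch size 1 corresponds to roughly 16.7k steps at batch size 6 for the same number of image presentations.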

#

this is a training that will take around 20 hours on an RTX 3090 Ti, by the way

#

and that is if you get it right the first time

#

for complicated captions that span folders, and for datasets this big, my second tip would be to use a dedicated training tool rather than Automatic1111.
But if you want to stay in Automatic1111, you can caption each picture in its own text file next to it. The text file should have the same name as the picture and contain the caption
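The same-name caption convention can be sketched and checked with a few lines of Python. This is a minimal sketch, assuming the listed image extensions and UTF-8 text; the helper names are hypothetical, not part of Automatic1111:

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def caption_path(image_path) -> Path:
    """car_001.png -> car_001.txt in the same folder."""
    return Path(image_path).with_suffix(".txt")


def write_caption(image_path, caption: str) -> None:
    caption_path(image_path).write_text(caption, encoding="utf-8")


def missing_captions(folder) -> list:
    """Names of images in `folder` that have no matching .txt caption."""
    return [p.name for p in sorted(Path(folder).iterdir())
            if p.suffix.lower() in IMAGE_EXTS and not caption_path(p).exists()]


# Demo on a throwaway folder with empty placeholder "images"
with tempfile.TemporaryDirectory() as tmp:
    for name in ("car_001.png", "car_002.png"):
        (Path(tmp) / name).touch()
    write_caption(Path(tmp) / "car_001.png",
                  "a red car seen from the front, headlights on")
    demo_missing = missing_captions(tmp)
    demo_caption = caption_path(Path(tmp) / "car_001.png").read_text(encoding="utf-8")

print(demo_missing)  # ['car_002.png']
```

Running a check like `missing_captions` before a long training run catches images that would otherwise train with no caption at all.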

stiff dust
#

hi, WHAT do you want to train?

unique cloak
#

yeah, that's a question I should have asked there; it seems like you may have some basic notions wrong about the right number of pictures to use

tall condor
#

yes, i am aware that training this model will take a while. now here is the confusion i also ran into when googling: you say to train each picture 100 times. i have 1000 images, and if i understand dreambooth in auto1111 correctly, each pic is trained 200-300 times per epoch in my case

#

resulting in 200-300k steps

stiff dust
#

there is no common rule for that. I would say the total number of steps is more important than the number of steps per image

#

200k steps is way too much for most use cases

tall condor
#

my question is: should each image be trained 100 times, and am i training too much? on the internet i found that each image should be trained 1500 times

stiff dust
#

it's totally fine training for only one epoch

#

but that also depends on what you want to train

#

if you want to train a concept or style I would say training each image only once is even better

tall condor
#

another question: if i train 100 times per image, is that the same as training 10 times per image for 10 epochs?

stiff dust
#

yes
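In terms of how often each image is seen, yes: 100 repeats in 1 epoch and 10 repeats over 10 epochs both show every image 100 times. A small simulation, assuming a plain "repeat and shuffle" schedule rather than any specific tool's sampler: the counts match and only the order differs. Anything a given tool does per epoch (checkpoint saving, an epoch-based learning-rate schedule) can still make the two runs behave a little differently.

```python
import random
from collections import Counter


def shuffled_schedule(num_images: int, repeats: int, epochs: int, seed: int = 0):
    """Order in which image indices are shown: each epoch repeats every image
    `repeats` times, shuffled. A simplified model, not a specific tool's sampler."""
    rng = random.Random(seed)
    order = []
    for _ in range(epochs):
        epoch = [i for i in range(num_images) for _ in range(repeats)]
        rng.shuffle(epoch)
        order.extend(epoch)
    return order


one_long_epoch = shuffled_schedule(num_images=50, repeats=100, epochs=1)
ten_short_epochs = shuffled_schedule(num_images=50, repeats=10, epochs=10)

# Same total number of steps and the same count per image (100 each)
print(len(one_long_epoch), len(ten_short_epochs))            # 5000 5000
print(Counter(one_long_epoch) == Counter(ten_short_epochs))  # True
```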

#

from a technical side, having many images and train them as few times as possible is optimal

#

however, it seems like many people have better experiences training very few images many times

#

I think it depends on image quality. A few images with bad quality can heavily impact your training. So I would use as many images as you can do proper quality checks on

#

and for most use cases, training for a few thousand steps is sufficient.

#

learning rate should be as small as possible. In Dreambooth I would use something between 1e-6 and 4e-6

tall condor
#

you mean a few thousand steps all together, or per image?

stiff dust
#

gradient accumulation is, in my opinion, useless. Use higher batch size if you can afford it, but most time the vram is too small for that
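For reference, this is what gradient accumulation computes: with a mean loss and equal-size micro-batches, averaging the gradients of k micro-batches of size b gives the same gradient as one batch of k×b. The toy below uses a made-up one-parameter linear model, not a diffusion model; it shows why the option exists (fitting a "batch of 6" into the memory of a batch of 2) at the cost of extra time.

```python
def mse_grad(w: float, batch) -> float:
    """d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Made-up data standing in for 6 training samples
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0)]
w = 0.5

# Option A: one real batch of 6 (needs memory for 6 samples at once)
grad_batch6 = mse_grad(w, data)

# Option B: batch size 2 with 3 gradient-accumulation steps
micro_batches = [data[i:i + 2] for i in range(0, len(data), 2)]
grad_accum = sum(mse_grad(w, mb) for mb in micro_batches) / len(micro_batches)

print(abs(grad_batch6 - grad_accum) < 1e-9)  # True
```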

#

all together

#

but as said, it also depends on what you want to train

#

but 100k all together is A LOT. Like it makes sense if you want to train a new drawing style or something. If you want to train a single concept it would be too much

#

I think best is to make validation images and check yourself when the model starts overfitting

#

do not only use prompts that describe the concept you train for, but also outlier prompts that check if the model still generalizes well
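One way to organize this: render the same prompts with a fixed seed at regular checkpoints. The outlier prompts (prompts unrelated to the trained concept) are the overfitting check; if a dog or a lake starts to look like the trained car, the model has stopped generalizing. A sketch only: the prompt lists, step interval and seed are hypothetical, and actually generating the images is left to whatever tool you use.

```python
# Hypothetical prompt lists: two that describe the trained concept and two
# "outlier" prompts that have nothing to do with it.
CONCEPT_PROMPTS = ["a photo of sks car", "sks car parked on a beach at sunset"]
OUTLIER_PROMPTS = ["a photo of a dog in a park",
                   "a watercolor painting of a mountain lake"]


def validation_plan(total_steps: int, every: int, seed: int = 1234):
    """List (step, prompt, seed) jobs to render during training.

    A fixed seed keeps images from different checkpoints comparable.
    """
    prompts = CONCEPT_PROMPTS + OUTLIER_PROMPTS
    return [(step, prompt, seed)
            for step in range(every, total_steps + 1, every)
            for prompt in prompts]


plan = validation_plan(total_steps=3000, every=500)
print(len(plan))  # 24: 6 checkpoints x 4 prompts
```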

tall condor
#

i'm using batches of 6, and that was also my question: in most descriptions i find, batches should not be more than 2. why is that?

#

what are outlier prompts?

stiff dust
#

make them as big as you can. It just costs you a lot of memory

tall condor
#

so its ok to use more than 2?

stiff dust
tall condor
#

regarding prompts, let's say my model is "car" and i have folders "car", "car in red", "car in red from the front with headlights on", and so on. what is the proper way to structure the prompt?

#

should i use "car", "car, red", "car, red, headlights, front", or should i structure it as above?

#

how do i make sure that my model learns "car" but also the concept of "car from front" while still using the "car" model?

stiff dust
#

I would rather go for complex prompts. If possible, randomize the prompt as much as possible
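A common way to randomize tag-style captions is to shuffle the comma-separated tags on every step while keeping the identifier in front. A minimal sketch of that idea; as far as I know kohya_ss exposes similar "shuffle caption" and "keep tokens" options, but this function is a hypothetical stand-in, not its implementation:

```python
import random


def shuffle_caption(caption: str, keep_tokens: int = 1, rng=random) -> str:
    """Shuffle comma-separated tags, keeping the first `keep_tokens` in front."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    head, tail = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(tail)
    return ", ".join(head + tail)


example = shuffle_caption("sks car, red, front view, headlights on",
                          rng=random.Random(42))
print(example)  # starts with "sks car"; the other tags come in random order
```

Keeping the identifier fixed means the model always sees it in the same position, while the descriptive tags stop being tied to one word order.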

tall condor
#

does "complex prompts" mean an actual description without the "," separator?

stiff dust
#

doesn't matter

#

but don't just use "car"

tall condor
#

my idea was to have the "car" model as the base and then specialize it with more advanced prompts

stiff dust
#

yes, makes sense

tall condor
#

but i still want the model to use my type of "car"

#

basically retraining car

stiff dust
#

better add a new embedding for that

#

with textual inversion

tall condor
#

i see a lot of prompts on the internet using separators, but i don't understand how they matter. what is the difference between "car in red" and "car, red"?

stiff dust
#

I think the standard Dreambooth approach is to use something like "sks car"

#

but I would rather use textual inversion before starting Dreambooth

tall condor
#

can i do that with auto1111?

#

i thought dreambooth does that with the prompts i give as folder names
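For the kohya_ss side: as far as I know, its Dreambooth-style folder names carry two things, the number of repeats before the underscore and the instance/class prompt after it (for example `100_sks car`), and that prompt serves as the caption for images without a .txt caption file. A small parser makes the convention explicit; the function is hypothetical, not part of kohya_ss:

```python
import re

# "<repeats>_<instance/class prompt>", e.g. "100_sks car"
FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")


def parse_training_folder(name: str):
    """Split a kohya-style folder name into (repeats, prompt)."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"expected '<repeats>_<prompt>', got {name!r}")
    return int(match.group(1)), match.group(2).strip()


for folder in ("100_sks car", "40_sks car red front"):
    print(parse_training_folder(folder))
# (100, 'sks car')
# (40, 'sks car red front')
```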

iron tiger
#

Sorry to interrupt. Has anyone seen anywhere where Stable Diffusion is used to generate software architecture diagrams, that look like those from excalidraw or draw io? I think it would be pretty useful.

stiff dust
#

I don't know 😅 I usually write the training scripts myself

#

check if it's possible, if not just go for "sks car"

tall condor
#

dreambooth has an instance prompt that i've set to something like "123xyz" for now. do you know why i need that and what it does?

stiff dust
#

it's unclear anyway which approach is better. Some say that "sks car" overfits less than using an embedding. It's always difficult to decide on the best workflow 🤷‍♂️

#

do you have a link to the script?

tall condor
#

i dont really want to use 123xyz car, i want to use car

#

no sec

#

thats not it

#

i use kohya_ss

stiff dust
tall condor
#

is it overfitting because i train too many times with too high a learning rate, or why is that?

stiff dust
#

think of it this way: SD was trained on millions of images of cars. Photographs, comics, digital art and so on

#

so it has a very general concept of what a car is

tall condor
#

also, training each image 10 steps for 10 epochs gives the same result as training each image 100 steps in 1 epoch?