#🔧|finetune


manic patio
#

This is similar to what I'm currently exploring except instead of a race of creature/persons, I'm trying to train a very specific facial expression. I can tell you a little bit about it and you can see if it's possibly a direction you'd like to go.

woeful kettle
#

awesome

manic patio
#

So I'm attempting to train a "LoRa model" which is basically a much much smaller and hyper specific model that will have a certain 'bias' when I add it in parallel to a different model for text2image or image2image generation

#

The process (as best I understand it) is to take a small subset of images specific to the concept you're trying to train. In my case, I'm using a base model meant for anime and illustrations to train this hyper-specific LoRa model.

#

The anime and illustrations model is a larger 3GB model that has its own weights and biases, but I like that it generates illustrations and anime-esque artwork

#

So I'm going to create this dataset of this very specific facial expression and I'm going to provide this dataset to the base 3GB anime model and say, "listen here Anime model, I know you think you know what a [insert facial expression here] is, but you don't. However, since I'm a nice person, I'm going to give you these examples so you can use these as inspiration the next time someone asks you to depict [insert facial expression here] in an illustration."

woeful kettle
#

nice, how do you do that?

manic patio
#

What the process ends up creating as an output is a very small model (200mb or so) that I can then take to any text2image generation notebook and apply the biases that came from this "training" to the final output of the image generation. The final result should be something close if not completely what I trained it to make me.

#

Though if you have a GPU with 12GB or more of VRAM, you have more options that are arguably easier to use than a Colab notebook without a polished GUI

#

And fwiw, the process under the hood is much more technical than my description of the events here, but it's the best spur-of-the-moment explanation I have for a 10-thousand foot view of the process.
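
For anyone who wants one step closer to "under the hood": a minimal sketch of the low-rank idea behind LoRA, assuming the usual formulation where a frozen weight gets a small correction `scale * (up @ down)`. The toy matrices, the helper names and the layer sizes are made up for illustration; real trainers work on tensors inside the attention layers, not Python lists.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(weight, up, down, scale=1.0):
    """Return base weight + scale * (up @ down); the base weight is left untouched."""
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def param_counts(d_out, d_in, rank):
    """(parameters in the full weight, parameters in the LoRA pair) for one layer."""
    return d_out * d_in, rank * (d_in + d_out)

# Toy 2x3 layer with a rank-1 LoRA (all numbers are illustrative).
weight = [[1.0, 0.0, 2.0],
          [0.0, 1.0, 0.0]]
up = [[1.0], [2.0]]         # shape (d_out, rank)
down = [[0.5, 0.0, -1.0]]   # shape (rank, d_in)

patched = apply_lora(weight, up, down, scale=1.0)
full, small = param_counts(d_out=320, d_in=768, rank=8)
```

The parameter count is why the output file is so much smaller than the base model: only the `up`/`down` pairs are saved, and `scale` is the knob you turn at generation time to apply more or less of the bias.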

woeful kettle
#

I see, so there are ways like the one you linked of specifically training a baby model

manic patio
#

Yep

woeful kettle
#

understood 🙂

#

thanks a lot, this is helpful

#

I'm pretty worried that my art won't be enough, or that it'll fixate on the style or something, but I'll give it a shot

manic patio
#

No problem. And my apologies if anyone comes along with more experience and proves me wrong about any one concept here. I JUST went through the same learning/exploration process you are doing and finally got a version of my LoRa model to work more often than it doesn't.

woeful kettle
#

haha, nice

manic patio
#

You'll see two keywords a lot in this process ->

#

concept name and class name

#

sometimes concept is called other things, but class name is almost always called class name

#

So for instance, if you wanted to train a LoRa to put Nicolas Cage's face on every male leaning individual in a photo, your concept name is "Nicolas Cage" or "Nicolas Cage man"

#

and your class name is "man"

#

so that the token generated from this is "Nicolas Cage" (or the difference between the two names if you remove common denominators)
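
A tiny sketch of the "remove the common denominators" idea, assuming the token is simply whatever words of the concept name are not also in the class name. `instance_token` is a hypothetical helper for illustration, not something a training tool exposes under that name.

```python
def instance_token(concept_name, class_name):
    """Drop the class-name words from the concept name and keep the rest."""
    class_words = {word.lower() for word in class_name.split()}
    kept = [word for word in concept_name.split() if word.lower() not in class_words]
    return " ".join(kept)

token_a = instance_token("Nicolas Cage man", "man")
token_b = instance_token("Nicolas Cage", "man")
```

Both spellings of the concept name end up with the same token, which is the point of the parenthetical above.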

woeful kettle
#

wait so, what would it be for like... birdperson?

manic patio
#

This can sometimes be the most simple part, and other times seems to be more complicated depending on your specific subject matter

#

There's no one right answer.. the best thing is to test multiple approaches, but I'd say: concept name = "bird person" and class name = "person"

woeful kettle
#

well just to be concrete, my dudes are called "Usaq" so if I want to gen like "usaq men around a campfire" what are my concept and class names?

#

and in this case, would I then need to separately train usaq woman? or how does that work?

manic patio
#

Since there's likely a dichotomy between usaq men and usaq women, my approach would be to train two different LoRa's using the same settings (but different dataset specific to each sex of usaq person) and then merge them into one LoRa when you're done.. or you can merge them both back into the same base model and you'll now have a large custom model that just knows what usaq people are without a LoRa.
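
A rough sketch of both merge options, assuming each LoRA has already been expanded into per-layer weight changes (flattened to lists here) and that merging means a weighted sum. The layer names and numbers are invented for illustration; actual merge scripts operate on checkpoint tensors and may combine the low-rank factors differently.

```python
def merge_deltas(delta_a, delta_b, weight_a=1.0, weight_b=1.0):
    """Weighted sum of two sets of per-layer weight changes."""
    merged = {}
    for key in sorted(set(delta_a) | set(delta_b)):
        a = delta_a.get(key)
        b = delta_b.get(key)
        size = len(a if a is not None else b)
        a = a if a is not None else [0.0] * size   # layer untouched by LoRA A
        b = b if b is not None else [0.0] * size   # layer untouched by LoRA B
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {key}")
        merged[key] = [weight_a * x + weight_b * y for x, y in zip(a, b)]
    return merged

def bake_into_base(base, delta):
    """Return a copy of the base weights with the changes added in."""
    baked = {key: list(values) for key, values in base.items()}
    for key, change in delta.items():
        if key not in baked:
            raise KeyError(key)
        baked[key] = [w + d for w, d in zip(baked[key], change)]
    return baked

# Hypothetical two-layer "model" and two LoRAs trained on different datasets.
base = {"attn.to_k": [1.0, 2.0], "attn.to_v": [0.0, 0.0]}
lora_men = {"attn.to_k": [0.5, 0.0]}
lora_women = {"attn.to_k": [0.0, 0.5], "attn.to_v": [1.0, 1.0]}

one_lora = merge_deltas(lora_men, lora_women)   # option 1: a single combined LoRA
custom_model = bake_into_base(base, one_lora)   # option 2: a full custom model
```

`one_lora` stays small and portable; `custom_model` is as large as the base it was baked into.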

#

The downside there is less flexibility to apply the LoRa to other larger models and just the inherent large file size of said custom model.

woeful kettle
#

yeah, interesting

manic patio
#

You'd end up with 2 LoRa's in the intermediate, one of them that was concept "usaq male" or "usaq man".. and the other that was concept "usaq woman" or "usaq female". Their respective class names being "male" or "man" and then "woman" or "female"

#

Your options are numerous to be honest. This was just the most straight forward approach that made sense to me while I was wrapping my head around this process.

stone garden
#

What’s the best model to generate attractive faces?

manic patio
#

attractiveness is also subjective, so the prompt can be steered toward what you find most attractive while using photorealism to render it

tiny wolf
#

oh im dumb, by base model i kept thinking about a community one i downloaded, but people usually use 1.5/2.x. Yeah I'm fine doing that

arctic jasper
#

Hey, dumbass question: I'm attempting lora training, and the lora .pt file it generates doesn't do anything at all when I use it, but it also generates a massive 7GB checkpoint model which works. I want to use the lora in other models, not a checkpoint...! Is there some magical obvious step I'm missing?

#

anyone have any advice?

sweet otter
#

anyone else's loss chart look like this?

#

usually i see it look more like a hockey stick. not a cardiogram

sweet otter
arctic jasper
#

merge which? the lora file and the checkpoint you just made generating a lora?

sweet otter
#

yeah... you using dreambooth?

arctic jasper
#

yes

#

but all i want is a small lora file to use with other models

#

i guess i'm unclear why you go to train a lora and it generates a lora that doesn't work and a giant 7GB file you didn't ask for haha

#

so whats the merge procedure or is there a good tutorial?

fallen cloud
#

I have never used lora.pt so I don't really know, but judging by the size it either sounds like a Dreambooth mode, or that you've set the network rank waaay too high

arctic jasper
#

is kohya something you install within Automatic1111? i see people talking about it but i still dont know what exactly it even is

fallen cloud
#

no, kohya is a separate application, similar to the UI of Automatic1111

#

The Easy Starter Guide to Installing LORA on Automatic 1111 for Stable Diffusion. Follow my super easy Lora setup guide and learn how to train your Lora files for super-high quality portraits. Use Realistic Vision V1.3 as the base model for extremely detailed and realistic results. Get better portraits with Lora, the super fast training tool tha...

#

Here's a good installation guide

tiny wolf
#

is lora training supposed to slow down as it nears the end 😅

#

over 2 hours it went from 2.5 it/s to 3.5

sweet otter
#

@arctic jasper you have to select both the lora and the new model, and "create new ckpt"

#

then you can use that new ckpt for regular inversion and stuff

#

no idea how its supposed to work as an actual lora

#

ppl be usin the kohya plugin for that

deep sentinel
#

Hey, I have tried full fine-tuning of the SD text2img model, and while trying to convert the model into an inpainting model I got this error: "AssertionError: Bad dimensions for merged layer model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: A=torch.Size([320, 1024]), B=torch.Size([320, 768])". Does anyone have an idea how to solve this?
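
One possible reading of that error, not confirmed in the thread: the cross-attention `to_k` weight is 1024 wide in SD 2.x checkpoints and 768 wide in SD 1.x ones, so the two models being merged may come from different families. A minimal sketch for spotting such mismatches before merging, assuming you have already collected each checkpoint's tensor shapes into plain dicts (`find_shape_mismatches` is a hypothetical helper):

```python
def find_shape_mismatches(shapes_a, shapes_b):
    """List (key, shape_a, shape_b) for every shared key whose shapes differ."""
    mismatches = []
    for key in sorted(set(shapes_a) & set(shapes_b)):
        shape_a, shape_b = tuple(shapes_a[key]), tuple(shapes_b[key])
        if shape_a != shape_b:
            mismatches.append((key, shape_a, shape_b))
    return mismatches

KEY = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"

# Shapes copied from the error message; the second key is made up for contrast.
model_a = {KEY: (320, 1024), "some.other.layer.weight": (320, 320)}
model_b = {KEY: (320, 768), "some.other.layer.weight": (320, 320)}

problems = find_shape_mismatches(model_a, model_b)
```

If a check like this reports anything, a plain weighted merge of the two checkpoints cannot work for those layers.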

arctic jasper
sweet otter
#

yeah im having a hard time myself

#

i think its because we should be using kohya?

#

a question for anyone:
im installing kohya (turns out i didnt have it) and its asking for a dynamo backend during installation
eager
aot_eager
inductor
nvfuser
aot_nvfuser
aot_cudagraphs
ofi
fx2trt
nnxrt
ipex

#

any idea which I should select?

fallen cloud
sweet otter
#

im getting this error when running kohya_ss
ModuleNotFoundError: No module named 'torch._dynamo'
Traceback (most recent call last):
File "C:\Users\kohya_ss\train_network.py", line 507, in <module>
train(args)
File "C:\Users\kohya_ss\train_network.py", line 176, in train
unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
File "C:\Users\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 876, in prepare
result = tuple(
File "C:\Users\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 877, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "C:\Users\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 741, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "C:\Users\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 914, in prepare_model
import torch._dynamo as dynamo
^^^this is where it breaks
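
The traceback shows accelerate trying to `import torch._dynamo`, which as far as I know only ships with PyTorch 2.0 and later. A small sketch for checking this from inside the kohya_ss venv, assuming the likely fixes are either picking the no-dynamo option when configuring accelerate or upgrading PyTorch (`has_module` is a hypothetical helper, and that diagnosis is a guess rather than something confirmed in this thread):

```python
import importlib.util

def has_module(name):
    """True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example torch itself) is missing.
        return False

def dynamo_advice():
    """Human-readable hint based on whether torch._dynamo can be found."""
    if has_module("torch._dynamo"):
        return "torch._dynamo found: a dynamo backend can at least be imported"
    return "torch._dynamo missing: choose no dynamo backend, or upgrade PyTorch"
```

Running `print(dynamo_advice())` with the same Python interpreter that launches `train_network.py` tells you which situation you are in.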

tiny wolf
#

Do you guys recommend any settings for basic 15-image lora training? Am I supposed to mess with epoch or keep it at 1?

#

My first lora model didn't turn out so well

obsidian sand
flint apex
#

Does anyone have a working danbooru scraper for specific searches/tags? I found one, but ran into problems with Cloudflare.

bronze igloo
#

whats the current state of Lora? Is dreambooth still the method of choice for fine tuning faces?

unique cloak
#

LoRA stays a little under DB in my opinion, quality wise, but it has lots of other merits that make it about as popular right now

  • very light to share
  • faster to train
  • you can now compose a single picture using multiple LoRA
peak comet
#

Ok so I just tried training a face in 2.1 and damn, I’m not getting what I need. Not even close. My 1.5 settings are not fit for this. I suspect it has to do with the classification images

#

Where “photo of man” in 2.1 spits out actual black and white photos of a man

#

Or maybe I have to play with the lr a bit more. Anyone have any settings they’d like to share?

stone garden
#

I've been out of the dreambooth/lora loop for a short while (although in the SD community, a short while might as well be an eternity) and I'm a bit confused about the prevailing wisdom when it comes to captioning your input files; it used to be that adding complex descriptive captions was not preferable (so if you wanted to train a person or concept, you only used your custom tag) but recently I've been seeing a lot of tutorials and tools incorporate the idea that captioning your images is actually essential. Is this different for LoRa's than it was for DreamBooth? ...or have the auto-captioning tools just gotten better to the point where captioning every individual image actually became a viable option?

main raft
#

I can't speak to the history but every guide I've read puts emphasis on tagging your dataset properly. Might have been different in the past but then I wonder why we go through the trouble with the text encoder to give control over what's actually being diffused? I suppose you could train a lora with nothing but an identifier, but if I had to guess, then I'd assume that all of the information seen in the images is hardcoded to that identifier, so you essentially lose all control for prompting against a style of clothes, backgrounds, hair color, anything, because it only has a very narrow concept of what's been shown and it recognizes that concept only in that narrow view.

Granted, I haven't been active in this space for very long either, but from everything I've gathered the rule of thumb was this: tag anime models based on nai/anything with booru-style tags separated by commas, and realistic finetuning based on sd 1.5 should use proper, descriptive, short phrases for anything in the image that isn't intrinsic to a character. For example, if a character always has cat ears, it's not necessary to mention that in any of the captions, unless you plan on prompting against it at some point and removing them. Similarly, everything you describe can be exchanged for other vectors in the future. What I mean is that a prompt of "sunny park" can easily be exchanged for a futuristic landscape while invoking the LoRA - if captioning has been done properly. If not, then I'd imagine it being either very difficult if not impossible to exchange certain information, or you'll get some really terrible blends because it tries to squeeze in the information of a sunny park into any of your prompts, regardless of what you type. I haven't actually ever tested training without captions, so take anything I say with a grain of salt. I might be completely wrong in my understanding.
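
A small sketch of that rule of thumb, assuming booru-style comma-separated captions: the trigger word goes first, swappable details (background, clothing) are kept, and tags intrinsic to the character are left out so they get absorbed into the trigger. `build_caption`, the trigger `mychar` and the tags are all hypothetical.

```python
def build_caption(trigger, tags, intrinsic=()):
    """Trigger word first, then every tag that is not intrinsic to the character."""
    skip = {tag.strip().lower() for tag in intrinsic}
    kept = []
    for tag in tags:
        tag = tag.strip()
        if tag and tag.lower() not in skip and tag not in kept:
            kept.append(tag)
    return ", ".join([trigger] + kept)

caption = build_caption(
    "mychar",
    ["cat ears", "sunny park", "red dress", "sunny park"],
    intrinsic=["cat ears"],   # always present, so never captioned
)
```

Because "sunny park" stays in the caption, it remains something you can prompt away from later; "cat ears" does not, which is only what you want if you never plan to remove them.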

#

Ah, the auto tagger tools certainly do a ton of the heavy lifting, but I haven't worked on a case where I didn't have to go in manually and dispose of any unwanted clutter. In other words, working with large data sets will still require a fair share of manual intervention. There are also people who don't touch the auto tags at all and get very favorable results, but those are cases working with anime models with which I haven't spent much time yet.

reef cliff
#

Hi everyone. I'm trying to learn more about using textual inversion. I have successfully trained a few embeddings on my own drawing style. They don't look that much like my drawings, but I like them a lot anyway. Something I'd like to be able to do though is add in other text like 'photorealistic 3D render' (which is very, very far from what my drawings are). But when I do this the result is as if you just made my style very bland. I think what is going on is that it is simply shifting my embedding towards 'photorealistic 3D render', which is losing the essence of how it looks (and not gaining any 3D-render look)

#

I watched a youtube video (How To Do Stable Diffusion Textual Inversion) where the uploader said that if an embedding is overfitted it loses 'flexibility'. I used his suggested tool, the 'embedding inspector', and used only embeddings below 0.2 strength (above which apparently is where they lose flexibility), but it didn't help.

#

I did also use prompts in the generation, like "experimental drawing by [name]", which didn't really help either.
Any ideas?

mental frost
flint apex
#

Yes

hollow niche
#

Hi everyone, we're a games studio working on a title that requires quite a bit of social assets, usually using the same characters, in different poses. Here's a reference of what we'd like to achieve with training SD. We have the 3d models to generate the training materials, but I am very curious to hear what training strategies you'd suggest to achieve accurate variations on these characters.
I was thinking on training using Textual Inversion and controlling with skeletons or sketch in ControlNet.
What are your thoughts? Thanks!!

mental frost
flint apex
#

Thank you!

high venture
#

Can anybody tell me: are the rare tokens for the 2.0 and 2.1 models the same as for the 1.5 model?

#

When training dreambooth on the 2.1 768 model, can I use the same token I am using for the 1.5 model from the "rare token list"?

visual urchin
unique cloak
# hollow niche Hi everyone, we're a games studio working on a title that requires quite a bit o...

Hey !
Just a first idea with what you gave here, here is how I would go about it :
1/ train a dreambooth on each of the characters you want here.
2/ train a TI and/or a hypernetwork on getting the neon/plastic aesthetic down, while still refining your characters
3/ use controlnet for the poses, good prompt for the rest, photoshop for the texts.
yep, doubly training the same character in, so you don't lose quality on him. That would be what I would try, but mostly because you are going after very few shots of very high quality

quaint canyon
#

Hi guys, I'm having a problem with my stable diffusion. After spending a lot of time installing various models and making my tool robust, an error has occurred that I can't solve. Can someone help me with this?

I would love to receive some suggestions on what I can do to solve the problem. If possible, I have attached some screenshots to show what's happening.

Thank you in advance for your help!

high venture
thin saddle
#

Anyone have experience with large training sets for text-to-image fine-tuning? I have 122,000 images (I can easily increase to as many as I need) with very high quality text descriptions ranging from 35 to 77 tokens. I'm going to be training on an A100 40GB (could move to 80GB if needed). I'm trying to figure out max training steps & batch size. Would love to learn from others' experience here..

tribal frigate
#

is it possible to train anything on 12GB vram? is it just too slow or you wont even be able to run the process?

visual urchin
# high venture What is rare token generator? It is an extension to auto1111 webui?

It's part of the original Dreambooth codebase. Unfortunately, almost no one uses it despite it being quite important to training.
https://github.com/Victarry/stable-dreambooth/blob/main/generate_identifier.py

GitHub

Dreambooth implementation based on Stable Diffusion with minimal code. - stable-dreambooth/generate_identifier.py at main · Victarry/stable-dreambooth

#

Everyone just uses SKS, but the Dreambooth paper makes a specific point to mention that you shouldn't just guess at random tokens, but specifically extract one from the model.
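
A toy sketch of what "extracting" an identifier could look like, assuming the criterion I recall from the paper: short tokens (three characters or fewer) drawn at random from a rarely used slice of the tokenizer vocabulary. The vocabulary, ids and id range below are invented, and this is not a reproduction of the linked `generate_identifier.py`, whose contents I have not checked.

```python
import random

def rare_token_candidates(vocab, min_id, max_id, max_chars=3):
    """Short alphabetic tokens whose ids fall inside the chosen 'rare' range."""
    return sorted(
        token for token, idx in vocab.items()
        if min_id <= idx <= max_id and len(token) <= max_chars and token.isalpha()
    )

def pick_identifier(vocab, min_id, max_id, seed=0):
    """Randomly pick one candidate; the seed only makes the choice repeatable."""
    candidates = rare_token_candidates(vocab, min_id, max_id)
    if not candidates:
        raise ValueError("no candidate tokens in that id range")
    return random.Random(seed).choice(candidates)

# Hypothetical vocabulary: token -> id (a higher id stands in for "rarer").
toy_vocab = {"the": 5, "cat": 900, "sks": 7000, "olis": 7100, "zwx": 8200, "a1": 9000}

candidates = rare_token_candidates(toy_vocab, min_id=5000, max_id=10000)
identifier = pick_identifier(toy_vocab, min_id=5000, max_id=10000)
```

With a real tokenizer you would build `vocab` from its token-to-id table and, ideally, also check that the chosen token has no strong meaning in the base model before training on it.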

spare kestrel
#

Which clip model is used for stable diffusion v2/v2-1? From the readme file in github, the laion/CLIP-ViT-H-14-laion2B-s32B-b79K model is used in stable diffusion v2/v2-1. I tested the laion/CLIP-ViT-H-14-laion2B-s32B-b79K model, but the result and parameters are not the same. I'm confused. Does anyone have the same problem? HELP

main breach
#

What is a good base model to train faces on? or does everyone just use SD1.5?

unique cloak
main breach
#

But a LoRA trained on 2.1 can't be used on 1.5 models correct?

unique cloak
#

I'm not 100% sure but i think you are right

#

I do dreambooth mainly, not Lora, so not sure

main breach
#

That would slim my use of it down to less than 1% xD

unique cloak
#

You can also find a very nice photorealistic model based on 1.5

#

And train that

main breach
#

Also all I have online are tutorials on training Anime LoRA, are there any guidelines on what settings to use for photoreal ones? Learning rates and schedulers? etc?

hollow niche
rigid starBOT
#

@summer stag

FAQ: How do I generate images?

Currently there is no bot on the server that generates images. However, there are plenty of other ways such as the official https://beta.dreamstudio.ai/ website or running Stable Diffusion locally using your own system resources! Check out #1080946152318443610 for more details! You can also stop by #1025467151206854736 for any issues you experience while using the website or #🤝|tech-support for any problems you encounter while installing it locally!

arctic steppe
#

First time starting webui-user.bat each day, will take a very long time, why?

It will sit at Commit hash: xxxxxxxxxxxxxx for a very long time, and then move on.

hexed bloom
#

half a mill down, a million to go!

main breach
#

Ok I've trained 3 Lora on a real person today, they are all shit. I've trained plenty of great anime lora but somehow I cannot manage a realistic one?

#

Are there any writeups on the right settings and captioning for realistic LoRA? I watched two youtube videos, one from a beardy bald dude and one from some old dude and both were worthless to my efforts.

#

I managed to get the body relatively right, but the face seems just completely wrong, is there any point in adding a concept folder with a bunch of photos of just the face?

oak void
#

Quick question, when training a textual inversion, lora, etc for a face, does the size of the face matter? Eg I have 10 closeup photos, will it be able to incorporate those into scenes or only be useful for full-sized faces (eg inpainting)?

stone garden
#

does anyone have a good suggestion for a workflow to improve faces while upscaling?

finite osprey
#

Hi guys, I'd like to plot the strength of the lora in Automatic1111, what X/Y/Z settings should I use for it? thx!
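
One approach that should work, assuming the built-in X/Y/Z plot script with the axis type set to Prompt S/R: put a `<lora:name:weight>` tag in the prompt and let the script swap it for the same tag at other weights. The sketch below just builds the comma-separated values string; `myLora` and the weights are placeholders.

```python
def lora_tag(name, weight):
    """Prompt tag for a LoRA at a given strength."""
    return f"<lora:{name}:{weight}>"

def prompt_sr_values(name, weights):
    """Values string for Prompt S/R: the first tag is the one searched for."""
    return ", ".join(lora_tag(name, weight) for weight in weights)

weights = [1.0, 0.8, 0.6, 0.4]
prompt = f"portrait photo of a person, {lora_tag('myLora', weights[0])}"
x_values = prompt_sr_values("myLora", weights)
```

The prompt has to contain the first tag exactly as written in `x_values`, otherwise there is nothing for the search-and-replace to find.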

sturdy rune
# oak void Quick question, when training a textual inversion, lora, etc for a face, does th...

It will still be usable but not as good as if you included 3-5 waist up pictures. It will basically make someone vaguely resembling them if not a closeup...but you will then be able to take that to inpaint and touch up the face with "original area/masked only" and it will work. If you want it to gen nice waist up or full body of your person you need to include that in the training...i like to think of it as... stable diffusion doesnt know your friend has legs unless you show it that they have legs...make sense?
I should mention that with just face pictures you'll need to drop the Lora weight until it actually lets you generate somebody with a body. Then that is the image you're going to inpaint to patch up the face.

sturdy rune
torpid night
#

How to do transfer training on a SD model? Preferably on Google Colab
How to merge models without quality loss on Google Colab?

oak void
#

Or very strongly skewing towards the outfit. I thought maybe i could just start the TI with something generic like “40 year old caucasian man” and only train the face.

sturdy rune
#

If you remove the background on all your data make sure that you switch up the colors

oak void
#

Do you caption the outfit? Maybe i should use something really generic “black shirt and jeans”. Im making a TI of myself so i can pretty much take whatever photos to replace the ones i just pulled from my profile to start with.

sturdy rune
#

Go to the model that you're trying to train it on... and make a prompt using the description that you're planning to caption with

#

If the model you're using doesn't know what a t-shirt is it doesn't matter if you put it in the captions it's going to attach it to your character

oak void
#

I’ve been training on top of 1.5

sturdy rune
#

And be careful with things like jackets that have patches

#

Because you might think that just putting leather jacket is enough next thing you know you're generating a character that has leather patches on a regular shirt

oak void
#

Yeah that’s why im thinking a plain black or white shirt with jeans would be the most generic possible outfit

sturdy rune
#

Are you training someone that's real someone that you know?

oak void
#

Myself

#

With the goal of being able to make creative profile pics, etc

#

“JaRail riding a motorcycle” “JaRail winning bodybuilding competition” 🤣

sturdy rune
#

Then you might try my method... 12 pictures of your face, 5 pictures of your torso and head, one picture full body. Solid white T-shirt and blue jeans. Or if you're comfortable enough shirtless and tighty whities (captioned boxer briefs)... Standing in front of a solid wall with nothing else in the picture.

oak void
#

Yeah i can def do that

sturdy rune
#

Your caption jarail standing, nude, shirtless, nipples, infront of a solid white wall wearing white boxer briefs

oak void
#

Do you just do front on faces with different expressions?

sturdy rune
#

And yes you have to put nipples, The reason nipples show up through clothing on a lot of people's models

#

For some reason stable diffusion is very well trained on nipples and shirtless being two separate things

#

Absolutely no expressions just poker face

#

It will still do expressions unless you want it to learn how your smile is

#

I've had no problems prompting for angry, smiling, pouty lips on a LoRA trained purely on poker face

oak void
#

I was thinking it could learn my teeth, as they aren’t perfect. But I wouldn’t mind perfect teeth either haha

#

Cool thanks for all the advice. Ill let you know how it goes.

sturdy rune
#

Just for a show of good faith that I'm not blindly leading you, i did this last night of my partner

#

Using the exact method that I'm telling you

oak void
#

Damn those are amazing

#

Do you train on 1.5 then use a different model?

sturdy rune
#

LR .000001 ...86 repeats..net rank 128...
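
To put the repeats number in context: a sketch of how repeats turn into a folder name and a step count, assuming kohya's `<repeats>_<instance> <class>` image-folder convention and the 12 + 5 + 1 = 18 image split suggested earlier. The class word `man`, batch size 1, a single epoch and no regularization images are my assumptions, not settings stated here.

```python
import math

def kohya_folder_name(repeats, instance, class_name):
    """Image folder name in the '<repeats>_<instance> <class>' style."""
    return f"{repeats}_{instance} {class_name}"

def total_steps(num_images, repeats, epochs=1, batch_size=1):
    """Optimizer steps when every image is seen `repeats` times per epoch."""
    return math.ceil(num_images * repeats / batch_size) * epochs

folder = kohya_folder_name(86, "jarail", "man")
steps = total_steps(num_images=12 + 5 + 1, repeats=86)
```

Changing the image count, batch size or epochs changes `steps`, so copy the settings as a starting point rather than a recipe.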

oak void
#

I’ve been doing that and seems to work alright

sturdy rune
#

On 1.5sd yes

#

And it works on any model that has even a little bit of 1.5

oak void
#

What did you use for starting tokens and number of vectors?

sturdy rune
#

Use kohya... the instance was just their name... With an underscore between first and last

oak void
#

I mean when you create a TI in automatic, you can “seed” it with some tokens “40 year old man” before training

#

Or start zeroed, or random

sturdy rune
#

Regularization images... I didn't use them

#

I could, but for images like I'm doing here I don't think it mattered

#

This run came out perfect on the first try though I do normally prefer them to be a little teeny bit over fit

#

But the reg images just keeps it from putting the person's face on everybody in the image

oak void
#

Let me find what i mean one sec

#

Okay it calls it "initialization text"

#

and then vectors per token

#

Those are the two I was asking about.

#

I don't care about keeping the TI small. I'm rarely hitting vector limits.

sturdy rune
#

Ah.. Gotcha yeah..i do the train with lora/locon ... If what you're after is what I just did there where I took somebody train them in and I can put them in cool ass pictures maybe you should consider doing Lora training instead of embeddings

oak void
#

But I was worried having a large number of vectors was contributing to it picking up backgrounds, etc

#

Okay. Yeah, I'm fine trying out Lora, dreambooth, etc.

sturdy rune
#

Yeah do a lora man, its far easier, faster, and you see its the kind of results it sounds like you were trying to get

oak void
#

Yeah, I should have checked what you were using before asking. My bad.

sturdy rune
#

I can put him in any scenario, doing anything I went through a shit ton of genres last night

#

As you can see up there I went through Cthulhu Superman cyberpunk steampunk

#

It had no problems and that's exactly what he looks like

oak void
#

Your results are pretty amazing. I'm really impressed.

sturdy rune
#

It was using a sketchy ink punk dream like art mix

#

Which is why it looks drawn

oak void
#

I think my workflows will be a lot more controlled, lots of inpainting to get the scenes I want. But just throwing in a prompt and getting those results is a great start. I love the art styles.

sturdy rune
#

Yeah these were just right out of the gens

#

I didn't modify them

#

I probably could have cleaned up some of the sketchiness that I didn't like

oak void
sturdy rune
#

But I was just testing to see if it trained well I wasn't actually trying to make art with it yet

#

😆 The top half lol

oak void
#

haha

#

Alright, I think this is going to be fun.

#

Will let you know in a day or two how it goes.

sturdy rune
#

And a lot of people say train on the model you're going to use it on and yes it does produce better results for that model

#

But I like to switch the models constantly cuz all of my models do completely different things and training on 1.5 as long as the mix has 1.5 in it it will generally work well

#

And definitely well enough to tell who it is

oak void
#

Okay, gotta run. Thanks!

sturdy rune
#

Cya! Let me know how it worked

main breach
#

More than 16~20 ish pictures better or worse for realistic person LoRA training? What about variety? Make up variation, hair variation? Poses? Face angle?

crimson wasp
# oak void I mean when you create a TI in automatic, you can “seed” it with some tokens “40...

I've been doing a lot of TI of people and artstyles to insert into the model before full finetuning, to try to contain as much of the changes from each concept in just the specific tokens and not damage the model so much as it initially has to lock onto the concept.

A technique I've come up with is using celebrity face match sites to find existing faces which SD might know, figure out how many vectors their name encodes to (and ideally see if their first name or last name gives good approximate results), and sometimes blending multiple names together using the embedding inspector extension for a1111 webui. After that it can be just a few dozen steps to start to get a decent likeness with textual inversion and a high LR like 0.001 - 0.0005, though you have to be careful because it quickly overshoots and begins to get worse
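
A minimal sketch of the blending step, assuming it amounts to a weighted average of token embedding vectors. Whether the embedding inspector extension mixes vectors exactly this way is not something I have verified; the two-dimensional vectors stand in for real 768-dimensional ones.

```python
def blend_embeddings(vectors, weights):
    """Weighted average of equally sized embedding vectors."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    if any(len(vector) != dim for vector in vectors):
        raise ValueError("all vectors must have the same length")
    return [
        sum(weight * vector[i] for vector, weight in zip(vectors, weights)) / total
        for i in range(dim)
    ]

# Hypothetical embeddings of two known names, mixed 3:1 as a starting point.
name_a = [1.0, 0.0]
name_b = [0.0, 1.0]
init_vector = blend_embeddings([name_a, name_b], weights=[3, 1])
```

The blended vector is only the initialization; the short textual inversion run described above then pulls it toward the actual likeness.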

#

Starting with a known person helps capture all the relevant info which describes them in minimal tokens, and seems to help SD understand the concept being worked with

oak void
#

hmmm that'd be interesting.

#

I get a lot of "you look like David Duchovny" comments

#

So I could prolly use that.

#

I've basically just been spending the last day fixing a mess of package requirements. It sucks having automatic1111 be MIA without anyone merging important fixes into webui.

gloomy sierra
#

I've been trying to work out which is the best method for style extraction, and so far it seems like the conclusion is LoRA due to its modularity -
Is this correct, and if so, is there a maximum number of images to train a LoRA on? I often see people using a small number like 20, but I have ~150 that I'm slowly captioning, and will cull it back if there is an upper limit beyond which results begin to degrade.
I've searched fairly extensively around upper limit, but so far I haven't been able to find any info on this, so hoping someone that's experimented with smaller and larger datasets might have some insight!

dense flame
#

When merging two Dreambooth models can I then use both of the "trigger words" for the separate styles/objects?

hardy storm
gloomy sierra
#

After a bit more research the closest (low confidence) guidance I've found is:

Minimum 10 per concept, but much more is much better

[ https://www.reddit.com/r/StableDiffusion/comments/114dxgl/advanced_advice_for_model_training_finetuning_and/ ]

Which I infer means "at least 100" to put a 10x qualifier on "much more"

I'm at ~50/200 and already feeling the pain - luckily modafinil exists 😑 💊

(fwiw this is the guide I'm following through for image preparation in case it helps someone: https://youtu.be/7m522D01mh0?t=671)

hardy storm
gloomy sierra
#

edit: spoke too soon lol - starting to degrade ~ epoch 18/20

#

if I understand the process, stripping it back to the "best" epoch is the next step, though

gloomy sierra
#

interestingly after building the lora, I keep getting this long Failed to match keys when loading Lora '{path}/{name}-000001.safetensors' error each iteration.

fwiw the error array starts with ['lora_te_text_model_encoder_layers_0_mlp_fc1.alpha', ...]

I note on the github repo there was some relation to the locon extension, but I specifically didn't use this during training, and after following that fix the error still occurs. [src: https://github.com/bmaltais/kohya_ss/issues/310 ]

Additional unresolved thread at [ https://www.reddit.com/r/StableDiffusion/comments/11mkyko/failed_to_match_keys_when_loading_lora_error_fix/ ]

I'm trying to work out if there was some value I set in training that somehow has it out of sync with v2-1-768-ema-pruned.safetensors 🤔

#

weirdly, as in the thread referenced, it still kind of works though - so not sure if this is more of a warn than an error

warm anchor
#

How should I read these graphs?

warm anchor
#

Also, if you have 10 times more training images, should you train with 10 times fewer iterations per image?
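
The arithmetic behind that question, as a sketch: if the goal is to keep the total number of training steps fixed, repeats per image scale inversely with dataset size. Whether a fixed step budget is actually the right target for a bigger dataset is an open question this does not answer; the 2000-step budget is a made-up example.

```python
def repeats_for_budget(target_steps, num_images, batch_size=1, epochs=1):
    """Repeats per image that land closest to a fixed total step budget."""
    steps_per_repeat = num_images / batch_size * epochs
    return max(1, round(target_steps / steps_per_repeat))

small_set = repeats_for_budget(target_steps=2000, num_images=20)
large_set = repeats_for_budget(target_steps=2000, num_images=200)
```

Ten times the images gives one tenth of the repeats under this assumption, so each individual image is seen less often while the model still takes the same number of steps.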

wooden zephyr
#

Hey guys, anyone know how to start training in dreambooth without it generating class images? I'd like it to use the images already present in the folder, which it does, but then it still it generates another 30 or so extra images even though there are more than enough images present! How do I skip that step, there doesn't appear to be a setting anywhere...

warm anchor
wooden zephyr
gloomy sierra
#

has anyone found a fiverr or similar supplier that reliably captions/tags image sets well?

candid ledge
#

Hey guys, could anyone tell me good settings for Kohya's Fast LoRA?
I have some settings that work very well, but it feels like there is a lot to improve.
Any ideas to improve them?
Dataset repeats: 60

hardy storm
gloomy sierra
#

Also CLIP skip should be 1 (not 2) for a SD1.5 model which it seems like you're training

unique cloak
#

Could someone around here who's familiar with training on 2.1 and 1.X help a brother out? I'm desperate at this point. I can't manage to find params that train as well on 2.1 as what I get on a 1.X base... How many repeats do you use per picture for a style, for example? Do you play with the learning rate? I'm still on 1e-6 or 2e-6 and not sure if I should change; the tests I did on that didn't work well...

I'm just so lost on this, I want to get it right, but not sure what i'm messing up
or maybe it's a prompting thing... I don't know.
With the same dataset and caption, I get some good checkpoints on 1.X, but on 2.1, I get to the overtrained phase before any good checkpoint shows

serene flicker
torpid night
#

How do I merge 100's of models together without quality loss.

#

I would also want to know how to transfer train.

upper wasp
#

i am looking to train a custom model that can replace a shirt on a person with a specific shirt. does anyone do consulting here that can help me?

unique cloak
# upper wasp i am looking to train a custom model that can replace a shirt on a person with a...

(cut and pasted from the other channel)
well, then you train this as you would train a face. let me detail :
1/ you select 5 to 15 pictures of your piece of clothing, in the most varied scenes possible (or at least in the types of variations you want it to be able to do) : lights, person wearing it, ... Some close and some far pictures.
2/ train like a subject, training all those pics only on your chosen token, nothing more fancy.
3/ use inpaint, mask the piece of clothing to replace, prompt with your new token, using your new model

upper wasp
#

thank you

#

do the image names matter? malemodel-onboat-wearing-sku123 for example

#

and then concepts to train with it - i wasn't sure of the format

#

but the idea is to be able to type "man on boat wearing sku124" and it would replace sku123 with sku124 and give me all variations

#

with that specific shirt on it

#

also would like to set up a pipeline to continuously train as new images come in

gloomy sierra
gloomy sierra
#

Anyone come across a good explanation of LoHa vs LoRA?

gloomy sierra
hardy storm
thorn gorge
#

Which is the section to find embeddings?

grave igloo
#

Hi guys
I'm having an issue training a model of my face in TLB's dreambooth colab.
I got great results training on Runway SD 1.5, but when I tried training on Deliberate I barely get decent results...
Do I need more steps or a bigger dataset when training on previously fine-tuned models?

frigid fog
#

Not sure if this is the right chat to ask this, but is it possible to get good results from a LORA trained on 3D game models for characters that don't have artwork made of them? (Such as custom characters from video games)

gloomy sierra
#

Any thoughts on the improvements ChatGPT 4 modified here?

#

(reference)

#

Some modification to the prompt after a few tests:

I want you to act as a Stable Diffusion AI expert specialising in Dreambooth training. I will give you a series of captions used to describe an image, and you will improve them for use as Dreambooth training captions. Do not use any full stops or punctuation within each caption. Captions should be comma separated. All output should be in lowercase. You will also correct spelling mistakes in US English. Write "..." if you understand. Do not write anything yet.
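
The mechanical part of that prompt (lowercase, comma separated, no full stops or other punctuation) can also be done locally before or after the ChatGPT pass. A minimal sketch; the function name is made up, and it does not attempt the spelling correction or the actual "improving" of the captions.

```python
import re


def normalise_caption(raw):
    """Lowercase, comma-separated tags with no other punctuation."""
    tags = []
    for part in raw.split(","):
        # Keep letters, digits, underscores, spaces and hyphens; drop the rest
        cleaned = re.sub(r"[^\w\s-]", "", part.lower())
        # Collapse runs of whitespace left behind by the removal
        cleaned = " ".join(cleaned.split())
        if cleaned:
            tags.append(cleaned)
    return ", ".join(tags)
```

For example, `normalise_caption("A Man,  standing on a boat., Blue Shirt!")` gives `a man, standing on a boat, blue shirt`.
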
hardy storm
# gloomy sierra Any thoughts on the improvements ChatGPT 4 modified here?

First off, that image ... 🤌😙

Secondly, I'm terribly excited to try out GPT4. I was never able to get great results with ChatGPT in my previous trials, but something tells me I was overcomplicating it by writing a dissertation trying to teach it everything about image prompting. My goal was always that I could simply give it a single subject, just one or two words, and that it would come up with the rest and not have it be nonsense. What usually ended up happening was it would fall apart over time, becoming less and less prompty and more like it was just writing a short story.

Was that using ChatGPT Plus?

stone garden
#

Greetings everyone, I'm a beginner and I'm facing a challenge while attempting to train a Stable Diffusion model with DreamBooth for a specific product, an E-bike. I'm using a 3D model to generate numerous reference images from various perspectives to ensure adequate training data. However, despite my efforts, the trained model is generating weird results for the E-bike (see examples).

Unfortunately, I am unable to find any resources or guidance on how to train models effectively for detailed products like the E-bike. Does anyone have any tips or tricks?

Specs DreamBooth:
Training image count: 50
Reference image size: 512x512
U-net Training steps: 1500
Learning rate: 2e-6

#

these are the reference images

#

This is the output

gloomy sierra
tired wind
#

Thank you @gloomy sierra, I really appreciate the shout-out!

hardy storm
#

What's your guys' workflow for captioning large datasets?
Do you use a specific app or interface?
I'm brute forcing this like a pleb opening each file individually in a text editor and a Windows image previewer and ... maaan, it's a slog
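
One small thing that takes some of the slog out of manual captioning is knowing which files are still left to do. A minimal sketch, assuming the usual convention of a same-named `.txt` file next to each image; the suffix list and function name are only illustrative.

```python
from pathlib import Path

# Illustrative list; extend it with whatever formats the dataset uses
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def missing_captions(folder):
    """Return image files in folder that have no same-named .txt next to them."""
    images = sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [p for p in images if not p.with_suffix(".txt").exists()]
```

Looping over `missing_captions("dataset/")` and printing each name gives a to-do list that shrinks as caption files get written.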

wooden zephyr
#

Any way of getting the seed from a sample image generated by dreambooth? Has a txt file with just the prompt.

abstract plover
#

LoRas are now supported ✨

cold cliff
#

Has anyone used Dreambooth to generate a specific sci-fi species/race? If so, did you do something different to how you'd go about doing it for a character or an art-style wrt. labeling, settings, class/instance?

I've just started using DB since I was short on VRAM before, and the results I'm getting so far are overfitted.

unique cloak
# cold cliff Has anyone used Dreambooth to generate a specific sci-fi species/race? If so, d...

I have trained some original things, like Sid, or TMNT, but not exactly that.
You have 2 ways to approach this:
1/ you can go the "training a subject" way. You have 1 alien, and you take around 5 to 15 pics in diversified environments/lighting/general settings, but of the same alien. You train for 100 repeats at 2e-6 LR if doing it on a 1.5 base. Best to start making checkpoints at 80 repeats, and add more if undertrained.
2/ you can go the "training a style" way. You have lots of different aliens from the same race, 50 to 200 pics presenting the most variety possible in alien features/environment/lighting/general setting. You train for 100 to 200 repeats at 2e-6 LR for a 1.5 base model. I usually do 4 or 5 checkpoints in between those 2 and compare the quality of the results

Sorry, I only have the 1.5 values pinned down for those scenarios. I can't find good values for 2.X on that.

if going the 1st way, you would caption your dataset always using the same single token to represent that specific alien.
if going the second way, you could add to that token some specific features that you have named for your alien species, like a token to represent tentacles, or other features. This will train on multiple tokens and let you call on only the features you'd like. This will require a better dataset than the first though
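
To turn those repeat counts into step numbers for a trainer that wants steps: a small sketch, assuming one repeat means one pass over every image, so steps = images × repeats / batch size. The function is hypothetical, and spacing the checkpoints evenly with both ends included is just one reading of "4 or 5 checkpoints in between".

```python
def checkpoint_steps(num_images, first_repeats, last_repeats, num_checkpoints, batch_size=1):
    """Evenly spaced checkpoint steps between two repeat counts (inclusive)."""
    if num_checkpoints < 2:
        raise ValueError("need at least two checkpoints")
    span = last_repeats - first_repeats
    steps = []
    for i in range(num_checkpoints):
        # Repeat count at this checkpoint, then converted to optimizer steps
        repeats = first_repeats + span * i / (num_checkpoints - 1)
        steps.append(round(repeats * num_images / batch_size))
    return steps
```

For a 50-picture style set, `checkpoint_steps(50, 100, 200, 5)` gives `[5000, 6250, 7500, 8750, 10000]`. It also shows why a bigger dataset at the same repeat count means proportionally more total steps.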

cold cliff
unique cloak
# cold cliff Thanks for the pointers! I guess I was doing the second way with just 8-9 pictur...

I don't always use class images. When I do, I mostly include a second concept in my training that is far away from what I'm training on.
For example, when training on Mosaic art, I added a second concept "rick roll" for photorealistic people, to hit more weights in the model and keep it grounded.

The main idea of regularisation/class data is that you don't overtrain the main weights your new concept is targeting. Instead, you also train on varied things across the whole model. So having a secondary concept of high-quality, widely useful photos (like maybe photos of "full body shot" or I don't know what else could be useful in your case) achieves 2 goals at once: training a useful concept, and having regularisation so your main concept doesn't overtrain too fast

#

but given what you were saying, the main problem is the subject's dataset

#

8-9 would be good if it was the same alien

#

if you go for lots of varied people, I wouldn't go under 50

#

and I would try to keep it balanced

#

like don't have 50% of them grey and the rest different colors, they would all be grey in the end

#

it will try to find things common to most pictures and learn those

#

likewise, you want poses and lighting, and all that, to change a lot

#

if it doesn't, it will also get picked up and learned
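
A quick way to spot that kind of imbalance before training, assuming the images already have comma-separated captions: count how many captions each tag shows up in and flag anything above a chosen share. The function name and the 50% default are illustrative, not a rule from any trainer.

```python
from collections import Counter


def dominant_tags(captions, threshold=0.5):
    """Return {tag: share} for tags present in at least `threshold` of captions."""
    counts = Counter()
    for caption in captions:
        # Count each tag once per image, however often it is written
        counts.update({t.strip().lower() for t in caption.split(",") if t.strip()})
    total = len(captions)
    return {tag: n / total for tag, n in counts.items() if n / total >= threshold}
```

The main token is expected to come back at 100%; anything else near the top of the list (a colour, a background, a pose) is a candidate for getting baked into it.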

cold cliff
#

Or do less relevant details sort of get 'washed out' by there being 50 samples to keep things like that from sticking?

unique cloak
# cold cliff Does it help to label the environment, lighting and pose for the dataset?

yes and no.
Each word that you add in your caption has 2 main effects on the Attention of the algorithm:
1/ it will take some of that Attention away from other tokens, meaning the other words in your caption get a little less trained on each step.
2/ it will train those tokens a little, making 2 main changes:
2.1/ it will reduce the impact those features have on other tokens. For example, if you add "tie" to your caption and the alien is wearing a tie, it will weaken the association that aliens wear ties, so the model won't produce ties without being prompted for them
2.2/ it will train the token itself. Taking the same example, it will make the "tie" token fit the example tie you showed it a little better.

#

if you don't caption those background elements at all, it's not a problem, as long as they change a lot

#

if they don't change a lot, then they get trained into the alien token too

#

at each training step, depending on your learning rate, there is a kind of "Attention budget": it represents how much this step is able to modify the model at once

#

adding more tokens splits that attention, and that has some good and bad effects that I described

#

in a way, and played with well, adding more tokens like that acts as regularisation

#

like if you use a different token describing a different background in each picture

#

this will split a little of the attention of each step on a new token on the model : the same as regularisation does

#

as long as you don't repeat those tokens

cold cliff
#

Ah, thanks again! I don't have enough Mass Effect screenshots lying about to try this right away, but I'll definitely try it sooner rather than later.

unique cloak
#

I get that feeling 🙂

#

I also learned big trainings like that through series and videogames

#

here are some funny experiments https://huggingface.co/Guizmus/Experiments with just the ckpt and some examples
and here is my main library, usually with the dataset and all training parameters used for each model. https://huggingface.co/Guizmus
in particular, the death not model has lots of people in it, and the different PoW models are trained on around 50 different tokens

plucky current
#

what would be the ideal resolution for training images to finetune a model using dreambooth?

gloomy sierra
plucky current
gloomy sierra
#

As I understand it, it would just downscale or bucket them to whatever target setting you use, but I'm not sure exactly if there are any downsides to bucketing other than resource usage
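
For a feel of what bucketing does, here is a simplified sketch: pick the bucket with the closest aspect ratio, then scale the image so it covers that bucket before cropping. This is an illustration only, not the actual algorithm of any particular trainer, and the bucket list in the usage note is made up.

```python
def nearest_bucket(width, height, buckets):
    """Pick the (w, h) bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


def resize_to_cover(width, height, bucket):
    """Size after scaling so the image fully covers the bucket before cropping."""
    scale = max(bucket[0] / width, bucket[1] / height)
    return round(width * scale), round(height * scale)
```

With buckets `[(768, 768), (896, 640), (640, 896)]`, a 1920x1080 screenshot lands in `(896, 640)` and is scaled to 1138x640, so only a strip at the sides gets cropped rather than the image being squashed to a square.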

cold cliff
#

With [filewords], are there tokens/words that are ignored in filenames by Dreambooth? I'm trying to avoid creating text files.

EDIT: I went digging in the code, it appears that Dreambooth (at least the one in the a1111 extension) ignores a number and dash prefix like TI does.
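
A sketch of that filename behaviour, for checking what caption a given file name would turn into. The regex is an approximation of "ignore a number and dash prefix", not copied from the extension, so treat the exact pattern as an assumption.

```python
import re
from pathlib import Path

# Assumption: a leading run of digits and dashes (plus any spaces after it) is dropped
LEADING_INDEX = re.compile(r"^[-\d]+\s*")


def caption_from_filename(filename):
    """Derive a [filewords]-style caption from an image file name."""
    stem = Path(filename).stem
    return LEADING_INDEX.sub("", stem)
```

So `0012-asari blue skin.png` would be read as `asari blue skin`, while a name with no numeric prefix is left as it is.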

hexed bloom
unique cloak
hexed bloom
#

You talk about tokens, and multi-tokens but you don't talk about full sentence captioning

unique cloak
unique cloak
hexed bloom
#

In dataset diversity, for thousands+ of images, you should mention how each subject should have at least X amount of images to train properly

unique cloak
#

(those are great feedbacks, thanks a lot)

hexed bloom
#

The simple UI is great, makes it easy for anyone to navigate

#

I noticed though, I press back too quickly as it loads, the page still loads while not allowing me to go back

cold cliff
#

I got a chance to try out the advice from yesterday's chat.

I'm getting much better results with a set of 53 samples with minimal labeling (species name + color), but it still feels like it's overfitting somewhat. I did 150 epochs with LR 1E-6. The faces ended up looking quite detailed and crisp even at 3.5 CFG while the background got washed out.

Since it's a Mass Effect race, and I'm a bit iffy on using fanart, it's hard to build a bigger dataset without repetition.

unique cloak
hexed bloom
#

ah

unique cloak
#

I may migrate everything to github one day, but it's moving quite fast currently (the AI field) and I really like the quick productivity that this tool gives currently

#

it's great at least to lay out the plan

hexed bloom
#

Something I'd mention, since a lot of people get confused by it, is that a token can be a caption, and that a caption is multiple tokens. A lot of individuals still treat the two as different things, even though they are the same thing

unique cloak
#

yeah, I tried to clarify the most things using that terminology page

#

I'll add lots in it I think still

#

it's really useful

unique cloak
hexed bloom
#

10-20 very different images of the same subject

unique cloak
#

ok so treat every token as a "single concept training"

unique cloak
#

ok.

#

got to go for now, but I'll add it in there

#

thanks again 🙂

hexed bloom
#

good luck! np!

dapper prism
#

What's the go-to project for training LORA models?

gloomy sierra
#

My settings (fwiw)

1.1. Install Dependencies
- branch: <blank>
- install_xformers: true
- mount_drive: true

(I have the SD2.1 safetensors model copied to Drive to avoid re-downloading each time, so a few steps are skipped here. In fact, I deleted all the ones I never use, but ymmv)

3.1. Locating Train Data Directory
train_data_dir: /content/drive/MyDrive/LoRA/projects/<name>/
reg_data_dir: <blank>

(I pre-caption all my images in Drive, so have deleted a few steps here)

5.1. Model Config
- v2: true
- v_parameterization: true
- pretrained_model_name_or_path: /content/drive/MyDrive/LoRA/models/<your-sd2-safetensors-model-name>
- output_to_drive: true


5.2. Dataset Config
- train_repeats: <15 - 20>
- reg_repeats: 1
- instance_token: <project name>
- class_token: <doesn't matter, so use "style", "man", "car" etc>
- resolution: 768
- flip_aug: false
- all rest defaults

5.3. Sample Prompt Config
- optional

5.4. LoRA and Optimizer Config
- network_module: networks.lora
- network_args: <blank>
- network_dim: 128
- network_alpha: 128
- optimizer_type: AdamW8bit
- optimizer_args: <blank>
- train_unet: true
- unet_lr: 1e-4
- train_text_encoder: true
- text_encoder_lr: 5e-5
- lr_scheduler: constant
- lr_warmup_steps: 0


5.5. Training Config
- lowram: false
- noise_offset: 0
- num_epochs: <8 - 16>
- train_batch_size:
- mixed_precision: fp16
- save_precision: fp16
- save_model_as: safetensors
- clip_skip: 2
- gradient_checkpointing: false
- gradient_accumulation_steps: 1

5.6. Start Training
- make sure to check all the `print` outputs from previous steps then run
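
To sanity-check how long a run with these settings will be: a sketch of the usual step arithmetic, assuming no regularisation images and `gradient_accumulation_steps: 1`. The image count and batch size in the usage note are placeholders (the batch size is left blank in the settings above), and the function names are made up.

```python
import math


def total_steps(num_images, train_repeats, num_epochs, train_batch_size):
    """Optimizer steps for the whole run, assuming no regularisation images."""
    steps_per_epoch = math.ceil(num_images * train_repeats / train_batch_size)
    return steps_per_epoch * num_epochs


def lora_scale(network_alpha, network_dim):
    """LoRA updates are scaled by alpha / rank."""
    return network_alpha / network_dim
```

For example, 50 images at 20 repeats, 10 epochs and a batch size of 2 comes to `total_steps(50, 20, 10, 2) == 5000`, and `network_alpha: 128` with `network_dim: 128` gives a scale of `1.0`, i.e. the learned update is applied unscaled.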


cold cliff
#

I put the results of my DB training and dataset on HuggingFace. It can sometimes generate cool results, but it's very overfitted from the games' lack of variety. The larger sample count did really help with quality and prior preservation, though.
https://huggingface.co/vmaple/sd_asari

unique cloak
#

I'll take a look at your dataset if you want when I get the time.
EveryDream uses the dreambooth mechanism; it trains the model the same way. We tend to refer to dreambooth as small trainings, single subjects or style, but it only depends on what you give it, how you caption it,... Even regularization data is optional in dreambooth if you'd like, or you can teach a second concept far from your main concept, and use that as regularization.
The main goal of regularisation data is to spread the training across the model, so that it's not only the weights you want that are trained, and the model stays "grounded" for longer. But it's data that gets trained on too, so you really benefit from "good" regularization.
EveryDream lets you train on as many pics as you'd like, doesn't make a distinction between class data and regularization data. It's all data to it, and you need to balance your dataset with some regularisation if you want to prevent the bad effects of long trainings

unique cloak
cold cliff
#

I'd appreciate that! I want to get a solid understanding of it for one concept wrt labelling and settings, then start working on assembling a bigger Mass Effect dataset.

#

I know I pollute the color labels, since using them alone with this model does make cursed half-asari, but I found that using labels like "blue_skin", as the model on CivitAI does, caused a strong bias towards revealing clothing or nudity with the same training time and parameters.

#

Maybe adding concepts with other blue and purple things would help there.

unique cloak
# cold cliff I'd appreciate that! I want to get a solid understanding of it for one concept w...

I think the biggest problem you're running into here is that you train lots of tokens in total, but with the same number of base pictures you would use for a single-token training. You used my tip and split the attention by adding more tokens, refining colors for example, but this reduces the attention on the alien token itself too. So it needs more and more steps, like you saw, and overtraining happens because the dataset isn't big enough.
The problem is, the dataset won't be able to grow much more while keeping variety, so reducing the token count could help, or increasing the learning rate a little.

I can't really know more on mobile phone rn, I'll need to look at the dataset itself and the mix of colors to understand better

cold cliff
#

The almost full body sample and the backs of heads might have to go, since their expression in the generated results are very distorted. Though for the latter, maybe that will cause generations with the head on backwards. 😛

sturdy dagger
#

Hello everyone, a very beginner question.
Supposing I'm using dreambooth to train on myself, not with the default SD 1.5 model but with an already modified one such as this one for example https://civitai.com/models/4201/realistic-vision-v13-fantasyai.
When I generate new images, will the results lean more towards Realistic Vision than what basic SD 1.5 could offer (with a prompt following the guidelines for Realistic Vision)?

unique cloak
#

this depends a lot on the quality of your dataset though

unique cloak
# cold cliff The almost full body sample and the backs of heads might have to go, since their...

ok so here is my feedback on that dataset :
1/ there are not enough full body shots. We don't see feet in any picture, for example. Some would be better. Around 10% of the dataset seems like a minimum if you don't want to be stuck prompting heads only
2/ good diversity in the settings themselves
3/ The back photos don't bring a lot to the table here... They are not nice artistically, and can be confusing for the AI, so I would remove all of them.
4/ some pictures, like 13 or 43, may be hard to understand, maybe a little too dark ? Not sure, this could also be good.

Given the diversity, 50 pictures more or less, I would go for 2 to 3 tokens.
The main one is "asari" for sure, no doubt about it, and it should be on each and every picture.
The second one could be male/female, not sure. the color can seem tempting but yeah, I'm afraid it takes too much away from the main token to add 4 colors or something...

After that, it's about all. You could also add "closeup", "halfbody" and "fullbody", one or the other depending, to each picture.

Does this need regularisation ? not sure it does. Simplifying your tokens first like this may be enough.
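
A small sketch for checking the "around 10% full body" rule of thumb once each caption carries one of the "closeup" / "halfbody" / "fullbody" tags suggested above. The function name is made up; captions are assumed to be comma separated.

```python
from collections import Counter

FRAMINGS = ("closeup", "halfbody", "fullbody")


def framing_shares(captions):
    """Share of captions carrying each framing tag; untagged ones count as 'none'."""
    counts = Counter()
    for caption in captions:
        tags = {t.strip().lower() for t in caption.split(",")}
        # First framing tag found wins; a caption should only carry one
        framing = next((f for f in FRAMINGS if f in tags), "none")
        counts[framing] += 1
    return {name: counts[name] / len(captions) for name in counts}
```

If `framing_shares(captions).get("fullbody", 0)` comes back below `0.10`, the set is probably still too head-heavy.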

#

one other thing that comes to mind :
this could be reduced to a 20-picture training too, dropping all tokens except "asari", taking only the best most artistic pics, even maybe including an artwork or two. This could work great, there isn't a lot of variety in those asari models

#

it's between a single subject and a style... it feels more like a variation on a single model of asari, if you know what I mean.
Skyrim did better in the variety of Khajiit for example

cold cliff
#

Skyrim had a rig on the face for character creation, whereas the asari are the same 5-6 general head models with textures, aside from major characters like Liara, Peebee and Samara. ME3 had the best texture for the heads, but they were rationing memory hard in that game.

I was hoping for better in Mass Effect Andromeda to have a CC for multiplayer, but that game did even worse by having a very distinct face mesh for all asari except the party member. Dragon Age: Inquisition was done in the same engine, and it had a pretty powerful albeit unwieldy character creator for its four races

#

Thanks for the feedback! I'm using the GPU for other stuff now, so I'll try with these changes tomorrow.

unique cloak
#

they botched it a little here I admit... (I haven't played ME yet, I promised myself to do the trilogy one day though)

unique cloak
peak comet
#

hey folks, so I've trained hundreds of models on people, mostly myself, but I just never really get the results I'm looking for.
My ask:

Would you mind training some 1.5 models on my concept images, using your approach, and explain that process to me? I want to test if it's just biased from looking at my own face and seeing everything wrong with it, or something else. I'm willing to pay too.

unique cloak
peak comet
#

ok so not just me, good to know. Thanks for the reply, lmk when you're free. I can run on my machine too if you walk me through your approach.

unique cloak
#

I'm set up on a lower rig than yours (3090ti) but I should be good, i train a lot on it too

#

Thanks though. In any case I'll document what I do don't worry

peak comet
#

appreciate you!

stone garden
#

I've installed kohya and can train 960x960 at 16-bit on Windows with a 3090. Are 32-bit and 1024x1024 something I could reach with Linux-specific optimizations, or not? Does anyone know? :/

#

where is the upper limit?

gloomy sierra
gloomy sierra
# gloomy sierra One huge breakthrough I had even with crappy input was putting the Analog LoRA o...

this is my current workflow and comparisons if anyone is interested or has some suggestions (ie @unique cloak ) 👀

1. Find an image in the pose and clothing you're after.
2. Activate your custom face LoRA (I generally find anything less than full `1` strength is not enough).
3. Use inpainting in img2img to get a rough version of your target face onto the source image (just needs to get the basic sketch down).
4. Use the sketch as a source for ControlNet on txt2img to get the Canny thresholds right. (Preprocessor "canny", Model "control_sd15_canny")
5. Then keep running "cheap" experimental passes on lowish Steps and no face or hi-res fix until you find the aesthetic you're after.
6. Use that seed to run the same job with Restore Faces and Hi-res Fix (takes a while, so I leave these until I'm fairly certain I've got the overall composition and color grading down).
7. Adjust the Hi-res denoising until it fits.
(Not shown in the image below).
8. Go back and start to inpaint any bugs like extra fingers etc.
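
For the LoRA steps in that list, the A1111 web UI activates a LoRA through a `<lora:name:weight>` tag in the prompt. A tiny sketch for building such prompts when scripting the cheap test passes; the LoRA names in the usage note are hypothetical placeholders for whatever the files are called.

```python
def with_loras(prompt, loras):
    """Append A1111-style <lora:name:weight> tags to a prompt."""
    # ":g" prints 1 as "1" and 0.6 as "0.6" without trailing zeros
    tags = "".join(f" <lora:{name}:{weight:g}>" for name, weight in loras)
    return prompt + tags
```

`with_loras("portrait photo of a man", [("myFace", 1), ("analogFilm", 0.6)])` gives `portrait photo of a man <lora:myFace:1> <lora:analogFilm:0.6>`, i.e. the face LoRA at full strength with a style LoRA mixed in at a lower weight.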

I added examples lower in the image to show the effect of the ControlNet, Analog LoRA, and Hi-Res Fix, stripping them back so you can see how crappy the original output is by comparison.

Before I worked out these other couple of steps (most importantly the Analog LoRA and Hi-Res fix), I found that having the face LoRA on strength 1 created lots of image degradation, but anything less started to morph to a different person, which was frustrating. This flow lets you use the full strength to get correct face correlation, but somehow magically heals all the artifacts.

Analog LoRA: https://civitai.com/models/14826/analog-film-photography-portraits

You can probably use the base 1.5 model by cranking up the weight on the LoRA, but I'd suggest using a photography ckpt as your base and mixing in any LoRAs to your preference.

I don't have my face model trained on 2.1 yet but I imagine you'd get even better results with Realism Engine (https://civitai.com/models/17277/realism-engine)

warped anchor
#

Hello, everyone! I’m new. If I have chosen the wrong channel please redirect me to the proper one.
I’m looking for hints on the pipeline for my task. I do a bit of art and I want to make comics with SD (mostly to get familiar with technology). Do I understand the general idea properly?

  1. To make a comic with my own characters I need to train some models, but as for backgrounds and general style I can use something like the Comic Diffusion model.
  2. I can draw faces of my characters from different angles and feed them to a textual inversion model. After that I’ll be able to get more or less similar faces from SD.
  3. Do I need to train a different model for their bodies and outfits?
  4. Is this line of faces enough to explain to a model how the character looks? Do I need to draw emotions as well?
  5. What are other things that I should know?

As for the database I can do about five angles per character but not dozens, etc. This is why I’m looking at textual inversion. I don’t mind manually fixing things on generated images so I don’t need them to be super precise.

Having several trained models, how do I apply them into one generation? Do I need to switch between them with Inpaint? Any other good ways?

So far I’ve managed to install SD and spend some time there and in MJ but that’s all. I'm a total noob who has watched two dozen videos on YouTube and is now overloaded with info. I’d be super grateful for any tips and help you can provide. Huge thanks in advance!

unique cloak
# gloomy sierra this is my current workflow and comparisons if anyone is interested or has some ...

it seems like a solid workflow to me, yeah.
his problem seems to be high detail quality, and it would help but I think it's also to "manage to train correctly in dreambooth".
Thanks for that realism engine, it seems like a base i'll consider here.
I'm going to stick to Dreambooth here (it's what I know best) and see what it lands me for a start
I think your process for making good quality pictures is perfect here, even a little on the "directive" side of things, leaving little room for AI improv in there.
But if you were targeting Scotty's problem, I think he wants a model that does great on his face without more tricks like those. Not sure how good we can get it

unique cloak
# warped anchor Hello, everyone! I’m new. If I have chosen the wrong channel please redirect me ...

those are a lot of good questions !
1/ this is a good way to go yes. training (not necessarily models) will let you have those characters that stay the same from a page to the next.
2/ textual inversion should work OK, as well as LoRA on that side, and both will let you use the face you trained. I would not include only face shots though. Those 5 in that strip are very good, but :

  • no body each time may make the AI think this is a head that can't have a body
  • having the same armour will make it get learned with the face

so you may want to try to vary a little on those 2 things, adding at least a fullbody shot and a halfbody shot, and changing the outfit in some. Great faces though
3/ I don't think it's necessary, but it can depend. When you start prompting using what you get from 2/, you will already have clothes in the model. You can try to prompt for specific clothes, and in a cartoon style it may be enough to keep the clothes the same from one shot to the next. If it's not the case though, going back to 2/ and training some more TI/LoRA on the main clothes could be good. But at that point I would mostly train face+clothes in a single embedding, it seems a better fit
4/ this should be enough for the AI to get the face quite well.
5/ lots possibly... I wrote a guide on all this if you want, explaining the training process and the different training methods (this stays quite superficial). https://trainingsd-parameters.carrd.co/

I think Lora would be better than TI because it trains faster with a higher quality.
You can use multiple LoRAs in a single prompt (composable lora extension), or even in different parts of a picture (latent couple extension). You can also merge LoRAs into a model but I think that would just bring simplicity at the cost of quality, so I wouldn't.

About your last phrase, here is something I read recently and I want to share with you

warped anchor
# unique cloak those are a lot of good questions ! 1/ this is a good way to go yes. training (n...

Oh thank you very much!

  1. I really wanted to try LoRa but I thought it needs a lot more images, like hundreds. Your advice makes me happier)
  2. Feeding full bodies with heads is even easier, no need to invent new faces!
  3. Thank you for the link, I’ll study that.

And yes, I totally agree on the “go and experiment” idea, but wanted to find out if anyone has done anything already. Somehow I can find lots of info on comics in MJ but zero in Stable Diffusion
P.S. thanks, it feels so good that you like her face)
unique cloak
# warped anchor Oh thank you very much! 1. I really wanted to try LoRa but I thought it needs a...

to be honest, this is quite a common project, making comics, but very few come in here with the technical skill to build their own dataset from drawings they do. So it's more limited to research interest, from what I've seen around here so far.
Doing your own art to build your models like this is a really good skill to have, and it lets you do things for real with all the rights on the pictures you trained.

My personal approach to your problem would be slightly different :

1/ I would train a model, using dreambooth, on the general artstyle you have, with lots of different backgrounds, characters, situations,... keyword is diversity, we don't want any character to be learned, only the global art style you have
2/ I would train a set of LoRAs or TIs, using the model made in 1/ as base.
Each of those embeddings would be a full character for a full scene : they keep the same face/clothes for 2 to 3 pages usually, a full scene, and this seems more appropriate than training clothes and faces individually. You would just have 1 file per character per scene, and 1 base model for your whole style
3/ to push the quality even further, you could train a hypernetwork on top of all that, but this seems overkill.

#

if the model trained in 1 was trained using "kateStyle" token for example, and the characters in 2 would be trained using TI on the tokens "misterX" and "missY", then you could prompt something like :

misterX walking, kateStyle
adding the ControlNet extension, you will be able to just sketch the scene, and pass it through canny mode to see it drawn completely

#

adding the Latent Couple extension, you could prompt both characters in the same scene easily too. This seems useful because it would prevent the misterX and missY tokens from getting mixed in the result

warped anchor
# unique cloak to be honest, this is quite a common project, making comics, but very few come i...

Wow, thanks! Good to get insights on the approach.
Yep, I understand that many do comics with AI. I decided to do it as well cause I think it’s the easiest way to develop a pipeline.

One more question. How many backgrounds do I need? Approximately. Since this is a side project and I don’t have great drawing speed I’m a bit afraid to dive deep into the whole bunch of backgrounds, narrative situations, etc. I do not have such base developed at the moment and my portfolio jumps from style to style tbh due to work and study tasks so I was going to use a Comic Diffusion model I have found on Reddit. Maybe I can tune it somehow.

Also I’m looking through your link and it says that lighting can be remembered hard by the model. Does this mean that I need to add different lighting conditions into the data base? Do I need to make them for every character and location that I have in the story?

unique cloak
# warped anchor Wow, thanks! Good to get insights on the approach. Yep, I understand that many d...

style training usually takes around 50 to 200 pictures, and that's how many I would suggest you target. but given the situation, that could be quite complicated.
The more backgrounds the better. A larger one could also be split into 2 or 3 to train on without repeating things. A minimum of 20 would be my tip. It will directly impact the capacity of your trained model to adapt your style to different situations.
Like it would be complicated for your style to do any animal if it hadn't seen any in the training.
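
To make the "split a larger one into 2 or 3" tip concrete, here is a minimal sketch that only computes the crop boxes. The function name `split_boxes` is made up for this example; the boxes are (left, top, right, bottom) tuples that you could feed to whatever image tool you use for the actual cropping.

```python
def split_boxes(width, height, parts):
    """Cut an image into `parts` non-overlapping boxes along its longer side.

    Returns (left, top, right, bottom) tuples. The last box absorbs any
    leftover pixels when the side is not evenly divisible.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    boxes = []
    if width >= height:
        step = width // parts
        for i in range(parts):
            right = width if i == parts - 1 else (i + 1) * step
            boxes.append((i * step, 0, right, height))
    else:
        step = height // parts
        for i in range(parts):
            bottom = height if i == parts - 1 else (i + 1) * step
            boxes.append((0, i * step, width, bottom))
    return boxes


# A wide 1536x512 background becomes three 512x512 training pictures.
wide_background = split_boxes(1536, 512, 3)
```

Because the boxes don't overlap, no part of the drawing is shown to the model twice.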

Lighting is just an example. Anything that repeats can be picked up by the AI. Lighting, for sure, but also composition, similar objects, or any feature you may think of, in photography terms.
So yes, having at least a little variety in everything you want the model to be able to vary is important
For example, were you to train on the faces you showed, the model would learn that there is a white background in 100% of cases, and try to add it without you prompting for it

#

making a base model on your own style can be optional though

#

you can find a good model on civitAI with suitable usage rights attached to it, and find the right tokens to make it give a style close to what you want

#

and use that model

#

being a side project, I would advise going in that direction, yeah

#

also I'm really open to feedback on the guide I sent, it's quite new, intended for people just discovering SD training, and I want it to be easy to understand, so if something seemed too hard, I'd love to know

warped anchor
unique cloak
#

for sure, for a small animation/comic studio, this is a godsend tech

warped anchor
unique cloak
#

it does the job, for a free tool

#

but I would have loved to have the definitions as tooltips

#

have a good night 🙂

cold cliff
#

Are there good tools out there for managing datasets, labels and such for this kind of training? More than just batch cropping, I mean.

unique cloak
#

I use 3 main tools for working on my datasets :
1/ XNView, lets me do batch work on pictures, like resizing, changing file format, ... it's focussed on image manipulation
2/ Bulk Rename utility, it lets me rename files using lots and lots of options, and do that on lots of pictures at once. essential to modify a token, add something to all pictures, ...
3/ a suite of tools to scrape LAION B or do some auto-captioning: https://github.com/victorchall/EveryDream

#

(I use the name of the picture files as caption)

cold cliff
#

Thanks. I'll take a look at these tools.

warped anchor
# unique cloak it does the job, for a free tool

I understand and it’s a very good guide.
One more thing I came up with this morning. You talk about situations when the model gets something wrong (e.g. concept bleeding, or a white background being used everywhere instead of a regular background, etc) and you say that “This means you need to add better regularization data”.
This seems a bit vague to me, like what kind of data do I need to add? Like characters without a white background?

The vocabulary doesn’t help much in this case since it says:

Regularisation: Pictures of other things you don't need to train specifically. Those will still get trained on too.

indigo orbit
#

do controlnet models get meaningful updates?

unique cloak
unique cloak
#

"better regularization" doesn't mean anything on its own

#

I need to put example images on those things too

#

I'll expand on this

#

I'm in the process of moving it to github for better access

#

and to answer: the "better" points at the main goal of regularization, which I don't state clearly in there : hitting as many weights as possible, spreading the learning across the model with very low attention on everything trained in there, just to make sure everything keeps getting updated, so that no part of the model gets left further and further behind what we are training it on

#

so it would mean more diversified regularisation, more different tokens in its caption if using a captioned regularization, or more diversity in its content/more generic class token

warped anchor
# unique cloak so it would mean more diversified regularisation, more different tokens in its c...

I think the visual will help a lot, yes. I need some time to digest this answer tbh. For now I understand it this way: if I see that model tends to add blue to the background even if it is not asked to do that I need to look at the source images and find which of them confuse the model. Either fix them by recoloring or remove them completely. Or add more images with non-blue background.
Will ‘— no blue background’ command work in this theoretical situation?

unique cloak
warped anchor
unique cloak
#

about the "how to solve it", do you think this blue would come from your instance data, or from your regularization ?

warped anchor
unique cloak
#

because the most likely situation is that it comes from the instance data itself, and is a "bias" of your model
So to tackle it, the best thing to do is to hunt through the dataset for the pictures showing that background
And for each, you want to reduce the attention on it, aka how much the model will learn from it
one way is to add "blue background" to the caption. this will divert the "blue" attention towards that new token, and protect your main token from it
a second way is to remove that picture completely. Sometimes less is more. or swap it for another one

#

I think I had it ? I'll add it if not (token)

unique cloak
#

on shivam/automatic/thelastben you have a "class data" to specify

#

in everydream, you add more pictures to your main dataset, but pictures of other things than your main concept

warped anchor
unique cloak
#

one thing confusing

#

there are 2 vocabulary pages

#

one per guide

#

the "parameters" one and the "methods" one

#

I'll solve this on github

warped anchor
warped anchor
unique cloak
#

a token is a part of a word. it's usually around 3/4th of a word in size. it's a series of letters, and helps split your prompt into bits the model can understand

#

each token has weights linked to it

#

and those weights are what you train when you train a token

#

you will change what "decisions in the creative process" the AI makes for that specific token

warped anchor
unique cloak
#

(that's a big old oversimplification)

warped anchor
unique cloak
#

the more I use it, the more I feel that splitting it in two, even if necessary, was a big problem

warped anchor
#

At least maybe you can add a note about two pages

#

Like this is a second vocabulary guys, check it. It’s different

unique cloak
#

I may have finished transitioning in the next few days, it's my main goal tomorrow and monday

#

I want to put those definitions as tooltips too

unique cloak
#

(quite a nice portfolio you have online by the way)

warped anchor
unique cloak
#

have you already trained and observed a bias, or are you anticipating how to react to it/giving feedback on the gaps in the guide ?

warped anchor
unique cloak
#

since you are making the dataset by hand, I must remind you of one thing

#

don't spend too much time planning, and not enough baking.

#

you can't ever really get how the AI will understand what you give it

#

I've done dozens of models now, and I still can't detect everything that's wrong in a dataset

#

going back to the dataset, and changing your samples, is part of the process

#

given what you are going for, I believe you don't really need regularization that much. It should be good without

warped anchor
#

Yep, I more or less understand that. I think of feeding some MJ generations to a separate model just to practice and see how it goes. I tried doing consistent characters there

unique cloak
#

I gave a hand to someone working on a photoreal consistent face (their own), and it's quite hard to get every detail correct.
Even when training a model/lora, it's preferable to still prompt all the features you need, like the hair or clothes, to help the AI stay on track and raise the quality
I hope it works for you !

warped anchor
unique cloak
#

Me neither, but I believe we understand each other quite well here 🙂

#

except when I go nerdy technical

#

yes, you need to be able to call for it/prompt it, and the beauty of it is, you don't have to caption "bangs" for "bangs" to still train. it will detect those as bangs

#

later on, using the token bangs in your prompt, it will force those specific bangs, and not the random ones SD knew about before

warped anchor
unique cloak
#

if you take a look at the last model I did today, the "PoW Style", all pictures are trained on arbitrary tokens that have no inner meaning to SD. but it still manages to extract the main features of each picture of the dataset, so understanding bangs isn't out of the picture at all.

unique cloak
#

if it doesn't have any examples of it, or too many examples of the contrary, it blocks its "imagination"

#

like, good luck making a horse riding a cat

river cypress
#

Anyone interested in making a model based on realistic asian women willing to learn/collaborate together?

sonic narwhal
#

Anyone got good learning resources on finetuning SD?

gloomy sierra
# sonic narwhal Anyone got good learning resources on finetuning SD?

CAPTIONING GUIDE (2/23/23) by u/SecureWeeb on Reddit: https://www.reddit.com/r/StableDiffusion/comments/118spz6/captionin...

sonic narwhal
#

Do you have anything on how to finetune complete stable diffusion ?

#

Like the controlNET guys and illuminati civitAI model did

unique cloak
#

do a full finetune ? I touch on it just a little in the guide I made for training, in more general terms

#

Usually, you train only on a few concepts at once. Full Fine-Tuning, on the contrary, aims at training the whole model on every concept it can. It requires an enormous dataset to do correctly and keep all concepts high quality.

#
  • When fully fine-tuning a model, you train on several thousand pictures. Each concept included should be treated at least like a single concept would be, with around 20 to 50 pictures each.
#

but I don't have experience with it myself, so I'm not sure how you monitor such a training

#

that seems the hardest part

sonic narwhal
#

Thank you 👍

sonic narwhal
unique cloak
#

regularization, you're the second one to ask for more detail on this, has 1 main goal : keep the whole model training, and prevent overtraining that way

#

to use a metaphor, a model is a point on a big map

#

training is moving that point on the map

#

but when you train, you only hit some of the millions of parameters inside the model

#

so sometimes, your point starts to split into multiple parts : what you are training on goes on its way, while the rest of the model "stays in place"

#

this will create the burning effect we call "overtraining", when one part of the model starts to go too far

#

so to prevent this, we try to train on more of the model at once

#

using regularization lets us do that: you don't train any one thing a lot, but you train lots of things a little

#

like using even "a man" as regularization with 1000 pictures (if using shivam) will bring lots of diversity in what is learned

#

those images also get trained a little, and modify your model weights on purpose, so :

  • you can use that regularization to refine different useful concepts, like picture framing or lighting
  • you can show only superb pictures to train the model passively on aesthetics you like
stone garden
#

i somehow can't find much info about training a vae, and there aren't many vaes floating around either - could someone point me to a place where i can find something explained in a not-too-nerdy way >.<

sonic narwhal
unique cloak
#

Not sure, I haven't used it in webui

hardy storm
#

DING! My LoRa is finished!
Whew. It's been a journey. I would like to thank God, my family, and everyone who's helped get me here. You know who you are 🥹

sullen python
#

Is there a free text-to-image generating tool with no restrictions?

hardy storm
#

... dear lord ... these results from this LoRa are .......... unspeakably terrible

#

i've clearly done something horrifically wrong

sonic narwhal
#

Keep going king

gloomy sierra
hardy storm
# gloomy sierra was it a style or a character?

Character. But, I was chatting with someone in a different server, showing my settings, and it seems I had almost everything set wrong. Of course. I think his exact words were " you just wasted 4 hours for nothing "... Lol. Soooo ... I'm starting over

gloomy sierra
unique cloak
#

you didn't waste those on nothing, you earned precious experience, and understood a little more each parameter you had that was wrong

#

don't worry, failure is part of the path too

hardy storm
gloomy sierra
#

With a LoRA the instance and class prompts don't seem to really have an effect, as I understand it that's more from the Dreambooth training step it has to go through.
What I mean is say you're trying to train 3 characters in the LoRA, you'd call one "foo", one "bar", one "baz" in your prompts when describing the character to let you invoke that specific character rather than a generic "man", and negative prompt down the others if required.

(Obviously replace those variables with a unique name)

dapper prism
slender anvil
#

What I learned from comments in the #1011228667659178055 channel is that SD is not good at (re)generating text, like titles of book covers. I get that in a context where I would be generating books from a text prompt. I was trying to use dreambooth to fine-tune SD to recognize my book cover in order to be able to generate that book in new situations (like other backgrounds). In my head, I was thinking of keeping that book 100 or 99% intact, just in newly generated situations. Is there a way to do that and to tell SD not to deviate from the newly learned book and cover but to keep that intact as much as possible?

paper surge
#

What does the Kohya_ss train resolution setting do when the training dataset is of a different aspect ratio? Let's say the train resolution is set to 512*512, what happens if one image of the training dataset has one of the following dimensions?

  • 768 * 512

  • 768 * 480

  • 768 * 1024

vast dome
#

Can I get some answers on this? The higher the batch size, the more images it samples at once. So if I had 99 well-learned images and 1 poorly learned image, at batch size 100, my 1 poorly learned image will have a hard time being learned, because the 99 images make the AI think it's already good at the training data, right?

#

Gradient Accumulation Step is how sensitive AI is to little details. If you set it too high the AI will get surprised by every little detail. You need to set it to right number because if its too low it cannot capture important details, if its set too high its sensitive to every little detail that it cannot capture the presented concept

floral pollen
#

Hey guys, I'm currently getting deeper into embed training and I found the guide in the #1080946152318443610 channel, which explains at the bottom how loss rates work. Since I only own a 1070, training an embed takes a long time for me, which is why I found it especially interesting: this way I might be able to see if a training is going wrong during the 10 hours it takes me to create one.

So I've enabled tensorboard and tried to access that, but it seems the training has to complete before it actually displays anything or can be used?
So I went ahead and told it to write the loss rate on every step into the csv file. After 4 hours of work with around 1000 steps I loaded the csv file, and added a "trendline" to show me how the value has been trending.
But it seems like it has only been reduced by around 0.002, as you can see in the (extremely "value zoomed") picture. Is this a normal difference to have? It feels to me like the difference should be much higher

#

Which would mean that if I finish training with the 3000 steps I have, it will probably only be at around 0.005 or 0.006
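
Per-step loss is very noisy, so a trendline over the raw values is hard to read on its own. Below is a minimal sketch of smoothing the logged values with a moving average and fitting a straight line to them. The column name `loss` is an assumption, so check the header of the csv your trainer actually writes, and the numbers in the sample are made up for illustration.

```python
import csv
import io


def read_losses(csv_text, loss_column="loss"):
    """Read the per-step loss values from CSV text (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[loss_column]) for row in reader]


def moving_average(values, window):
    """Average each run of `window` consecutive values to damp step-to-step noise."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


def linear_slope(values):
    """Least-squares slope of the values against their index (loss change per step)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Made-up values for illustration only, not real training output.
sample = "step,loss\n1,0.20\n2,0.10\n3,0.18\n4,0.08\n5,0.16\n6,0.06\n"
losses = read_losses(sample)
smoothed = moving_average(losses, window=2)
slope = linear_slope(losses)
```

A negative `slope` means the loss is trending down overall, even when individual steps jump around.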

river cypress
#

usually theres an option to like display loss rate

#

without going into tensorboards idk

floral pollen
# river cypress what are u using to train

I'm using the stable diffusion webui, the integrated training of embeds there (which I believe is dreambooth). There is a loss value being displayed which is being updated. I thought that is probably only the loss rate of the last step or something, or is it adjusting based on all the steps taken, or an average?

river cypress
#

im sure its adjusting

floral pollen
# river cypress im sure its adjusting

Ah okay that is good to know! Can you tell me some “good" and "bad" values for a certain training step amount or something similar? I couldn’t find any advanced guides on how this value works other than that it might go up and down

river cypress
#

steps?

#

easiest way is to gen sample images

#

and see if ur shit starts getting baked

floral pollen
# river cypress easiest way is to gen sample images

Hm okay, I was trying to see whether I can get a bit more in depth through the loss rate, because just generating sample images every few steps seems like a bad way to actually see results on a larger scale, since the images can be hit or miss

storm rock
#

👋

unique cloak
#

hey again !

#

so, you're training for a project ? 🙂

storm rock
#

So I did one model for generating Eye Textures for fun, wanted to see if I could do it!

A1111 Using Dreambooth, did a test set of 80 Images, 100 Steps per image and I'm actually really happy with the results.

unique cloak
storm rock
#

oH i'll give it a read!

unique cloak
storm rock
#

Here is an example of what v1 can do.

unique cloak
#

ho that helps

storm rock
#

Yeah was just generating one real fast.

unique cloak
#

I would have gone maybe a little under 80 images, off the top of my head around 30 to 50 would be enough

storm rock
#

Now I just used the Prompt "Eye Texture" as I trained it. So I don't really have any control over what the eye actually looks like.

#

Yeah I have a larger dataset now which I need to sort.

unique cloak
#

but if you've got enough diversity for 80 that's great

storm rock
#

So my main question is this

unique cloak
storm rock
#

Yeah I'm a bit confused on the whole instance token, instance prompt etc.

unique cloak
#

I expand a little on it in the Dataset chapter, Captioning and Attention are nice to understand

storm rock
#

But I wanted to know if I separate all my datasets into categories like

Slitted Pupils, Circle Pupils, Eyes with highlights, etc.

#

Should I train these into the model separately or?

unique cloak
#

nope, you should train them all simultaneously. for multiple reasons

storm rock
#

So I dont actually need to sort out my dataset?

unique cloak
#

first, if you do different models and merge them, you'll lose quality on every trained concept
Second, there will be a lot of competing concepts : the "eye" token for example will be changed a lot in each model, and merging them could break all of those

unique cloak
#

you want a certain balance in your dataset

storm rock
#

I see!

#

I imagine the balance would be different type of textures?

unique cloak
#

if you train on half red eyes, and 25% blue and 25% green for example, you would have a red eye bias in the end

storm rock
#

Oh!

unique cloak
#

well for multi concept at once like that

#

the best way I found is this

#

first you make a balanced dataset : 25 pics of each for example

#

but the same for each
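
A minimal sketch of that balancing step, assuming you already have a list of (filename, concept) pairs. The helper name `balance_by_concept` and the file names are made up for this example.

```python
import random
from collections import defaultdict


def balance_by_concept(captioned_files, per_concept, seed=0):
    """Pick the same number of files for every concept.

    captioned_files: list of (filename, concept) pairs.
    Raises if a concept has fewer than `per_concept` files, since
    duplicating pictures to fill the gap would make them train too fast.
    """
    by_concept = defaultdict(list)
    for filename, concept in captioned_files:
        by_concept[concept].append(filename)

    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    balanced = {}
    for concept, files in sorted(by_concept.items()):
        if len(files) < per_concept:
            raise ValueError(f"only {len(files)} files for {concept!r}")
        balanced[concept] = sorted(rng.sample(files, per_concept))
    return balanced


# The unbalanced case from above: half red, a quarter blue, a quarter green.
dataset = (
    [(f"red_{i}.png", "red eye") for i in range(50)]
    + [(f"blue_{i}.png", "blue eye") for i in range(25)]
    + [(f"green_{i}.png", "green eye") for i in range(25)]
)
balanced = balance_by_concept(dataset, per_concept=25)
```

After this, every concept contributes 25 pictures, so none of them dominates just by being more numerous.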

storm rock
#

Mhm.

unique cloak
#

then you do a first training session, doing lots of save points

#

the goal of that training session is to see when each concept is "just trained enough"

#

some will train faster than others

#

and so you want to synchronize them for best quality

#

that brings it to the last step : the real training

#

you start by training only the concept that took the longest in the test run

#

and you add more concepts every few epochs, doing just your maths based on the test run
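
The "maths based on the test run" can be sketched like this. The epoch counts are hypothetical, and the sketch assumes a concept trains at the same speed when it is introduced late, which is only an approximation.

```python
def staggered_starts(epochs_needed):
    """Given how many epochs each concept needed in the test run,
    return the epoch at which each one should join the real training
    so that they all finish at the same time."""
    total = max(epochs_needed.values())
    return {concept: total - needed for concept, needed in epochs_needed.items()}


def concepts_active_at(epoch, starts):
    """List the concepts that are part of the training at a given epoch."""
    return sorted(c for c, start in starts.items() if start <= epoch)


# Hypothetical test-run results: epochs until each concept looked "just trained enough".
test_run = {"cat eye": 12, "monster eye": 8, "round eye": 5}
starts = staggered_starts(test_run)
```

Here the slowest concept starts at epoch 0 and the faster ones join later, so nothing gets overtrained while waiting for the slow one.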

storm rock
#

(Oh I see, I can classify each image within its name for the AI to perhaps understand what is different about it?)

unique cloak
#

but that is a lot more complicated than just tagging all of them "eyes"

storm rock
#

Sorry reading here and your write-up.

unique cloak
#

take your time, no problem

storm rock
#

I started SD 3-4 days ago and I got the question "Can you, generate eye textures with this?" So I got real curious.

unique cloak
#

I'm not on automatic, I use another tool, and I can train just on a folder and the subfolders. I think in automatic you'll need to declare each concept as a sub dataset, but it's possible too

storm rock
#

So making a balanced Dataset.

unique cloak
#

about your token, "eye texture". I think you should drop the "texture" completely. just "eye"

#

it's already a strong token to train onto

storm rock
#

So it would be possible to have a Core folder called "Eye" and sub folder to further categorize it

#

Like Slitted Pupils

unique cloak
#

but that just depends on the training tool you use

storm rock
#

That is really cool and would greatly improve the workflow of sorting a dataset.

#

I have about 2.5K eye textures I can build a dataset from so ._.

unique cloak
#

yeah, I have a big "datasets" folder, lots of "preparation" ones, some "captioned" ones, and I prepare a "mix" folder, copy pasting the concepts I want

#

but....

#

I don't think you are on the right path using so many eye textures, though I could be wrong about the diversity

storm rock
#

The plan was to shorten it.

#

As a test perhaps do like 10 of each eye type.

unique cloak
#

I can't imagine 2.5k eyes all being different. and I mean really different: no simple color swaps, no repeating motifs

storm rock
#

with balanced colors.

#

I think I used abyssorangemix3AOM3_aom3 for the source checkpoint, and I also need to figure out what that actually entails.

unique cloak
#

for me, each eye type, like cat eye, or whatnot, would need max 15-20 pics

storm rock
#

Mhm

#

I have enough juice here to get 15-20 each type for sure!

unique cloak
#

but you may have lots more motifs in the background or other

#

the important thing is

#

no duplication

#

if you duplicate motifs or things like that, those will train insanely fast

#

and burn

storm rock
#

Yeah I noticed in the original V1 of the model.

#

I had the same eye repeated in it just with different colors.

unique cloak
#

and that eye shows up on 1 out of 2 seeds now ?

storm rock
#

Haven't noticed it that much, I see some repeating patterns.

#

Doing a batch gen so you can see what it'll do.

#

Me and my friends already consider this version enough for what we want.

#

But I wanted to see if I can make it more fun to use and customizable

unique cloak
#

for sure, I'm going far into what can be done, and DB out of the box is already great

storm rock
#

Like we can easily use these, throw it in PS and mask it up for use.

unique cloak
#

well if you want to do it the simple way, and gain a little control

storm rock
#

I'm new to this AI thing and I'm almost a bit scared how well it does when I set this up without knowing anything. *sweats*

unique cloak
#

I would use 2 tokens at once, max.
meaning "cat eye", "monster eye", ...
and I would just balance the dataset

#

forget the 2-step training I was talking about

storm rock
#

At once does that mean during the final training

#

and if I wanted more different eyes I would make more models?

unique cloak
#

you could, yes

#

but you can only use one model at once

storm rock
#

Yeah.

unique cloak
#

so it's a little problematic, switching like that

storm rock
#

Mhm.

unique cloak
#

and it's the case where I was saying merging wouldn't work great

storm rock
#

In a perfect world we would add the prompt "Cat Eye" and it'll just add the slitted pupils etc.

unique cloak
#

yeah, so for that...

#

can you show me the current UI for training DB in auto ?

#

it's been a long time since I used that one

#

I think the option you need is "caption training"

#

but not sure

storm rock
#

Sure!

unique cloak
#

the one where you set up the concepts mainly

storm rock
#

The settings tab.

#

oh concepts!

#

I'm on 3080 Ti

unique cloak
#

I see an option that could do it already there, but the UI for concepts may be more simple for it

#

(you pasted the wrong UI I believe)

storm rock
#

ops

unique cloak
storm rock
unique cloak
#

I trained on that for some time 🙂

#

ok yeah so

#

there are 4 concept tabs here

#

ok nice

#

they improved it

#

lots of possible ways to do it.

storm rock
#

I see. :o

#

Yeah I picked DB using A1111 cause it's the first results I got when googling.

unique cloak
#

you can start by just naming the pictures as you like in fact

#

like naming the file "cat eye (1).png"

#

and so on

storm rock
#

So 4 Concepts means I can incorporate 4 different concepts into the model.

unique cloak
#

you could yes

#

but if you use the [filewords] thing

#

that is shown in the fields

storm rock
#

Mhm.

unique cloak
#

it will take the name of the files

#

and train on that, as if it was each a different concept

#

since most pics will be named the same, it's not doing 2.5k concepts though

#

just as many as you have given different names
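
To illustrate the idea, here is a minimal sketch of turning file names into captions and counting how many distinct concepts that gives. This is not the extension's actual code, just the naming logic described above; `caption_from_filename` is a made-up helper, so check how your own tool resolves [filewords].

```python
import re
from collections import Counter
from pathlib import PurePath


def caption_from_filename(filename):
    """Use the file name as the caption: drop the extension and a
    trailing "(12)"-style counter, keep the rest as written."""
    stem = PurePath(filename).stem
    stem = re.sub(r"\s*\(\d+\)\s*$", "", stem)
    return stem.strip()


files = ["cat eye (1).png", "cat eye (2).png", "monster eye (1).png"]
captions = [caption_from_filename(f) for f in files]

# One entry per distinct caption, i.e. per concept the training would see.
concept_counts = Counter(captions)
```

Three files, two distinct names, so two concepts get trained, however many pictures sit behind each name.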

storm rock
#

Would it look like this in practice?

unique cloak
#

nope

river cypress
#

i recommend doing textual inversion

unique cloak
#

just put [filewords]

storm rock
#

textual inversion, Haven't gotten to learn this one yet. :o

river cypress
storm rock
#

Ah that's one of the training methods is it? :o

unique cloak
river cypress
#

shit i do name [filewords]

#

Or rather instance token [filewords]

unique cloak
# storm rock Ah that's one of the training methods is it? :o

yes. textual inversion doesn't make a new model. it makes a file that knows how to talk to your existing model in order to get the eyes you want. you can have multiple textual inversions at once, so it's nice in this case
the downside of TI is that, if the model really doesn't know about your eyes, it won't do great quality on those. but good thing : the model does know eyes quite a lot

#

that seems like a better solution than dreambooth here

storm rock
#

So it takes the names of the image file and uses that.

#

So if I then use Cat Eyes as a prompt it'll weigh more on those eyes.

#

?=

unique cloak
river cypress
#

There's no harm in doing name filewords imo

river cypress
#

Since u force db to train the token

storm rock
#

I see.

river cypress
#

I don't think it appears otherwise

#

Also

storm rock
#

So what if I do things like "Cat_Eye_01"

river cypress
#

U don't need an instance token for fine tuning

storm rock
#

Will the underscore separate Cat and Eye?

unique cloak
#

remove the underscores

#

just put the (1) at the end

storm rock
#

Oh (1) with the ()

#

I see!

unique cloak
#

or just put all pictures in a folder, select all, hit "F2" and type "Cat eye", it will rename them all and add numbering
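
If you'd rather script it than use the file explorer, here is a minimal sketch that produces the same "name (n)" pattern. `plan_renames` and `apply_renames` are made-up helpers for this example, and it's worth printing the plan before applying it to a real folder.

```python
from pathlib import Path


def plan_renames(filenames, base_name):
    """Return (old_name, new_name) pairs numbering the files "base (1)", "base (2)", ...
    Files are numbered in alphabetical order and keep their extension."""
    plan = []
    for index, name in enumerate(sorted(filenames), start=1):
        suffix = Path(name).suffix
        plan.append((name, f"{base_name} ({index}){suffix}"))
    return plan


def apply_renames(folder, plan):
    """Rename the files inside `folder`, refusing to overwrite an existing file."""
    folder = Path(folder)
    for old_name, new_name in plan:
        target = folder / new_name
        if target.exists():
            raise FileExistsError(target)
        (folder / old_name).rename(target)


plan = plan_renames(["b.png", "a.jpg"], "cat eye")
```

`plan_renames` only builds the list of new names; nothing on disk changes until you call `apply_renames`.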

river cypress
#

Aren't u supposed to tag everything for db

#

Tag everything remove the thing u wanna train

unique cloak
storm rock
#

"Cat Eye(1)" I see. 🤔 So what if I have 10 Cat Eyes and then 10 Cat Eyes with highlights in them.

#

Would having "Cat Eyes(1)" and "Cat eyes highlights(1)"

river cypress
#

I see

storm rock
#

Do anything there?

river cypress
#

Yeah

unique cloak
# storm rock Would having "Cat Eyes(1)" and "Cat eyes highlights(1)"

Each step of the training, a batch of pictures is trained, and the weights of the model move a little. Those changes happen slower or faster, depending on the learning rate you use. This "budget" of changes that could happen on a single step is called Attention, and is split amongst the tokens you used in your caption.

Adding more tokens to a caption then has multiple effects :

  • it slows down the training on each single token. This may require more total steps to produce the same results, or to use more trained tokens at once in your prompt later on.
  • it looks for the more fitting parts of the picture for that token and associates with it the changes. This means that describing a feature in your caption can prevent that feature from being associated with the other tokens. As an example, if training on a character that has a tie in half the shots, adding the token "tie" would reduce how much of the tie feature is associated with your character.
  • it spreads the training on more weights of the model, and reduces the need for regularization.
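
A toy illustration of the first effect. It assumes the per-step "budget" is split evenly across the tokens of the caption; the text above only says it is split, not how, so treat the even split as a simplification for the sake of the arithmetic.

```python
def per_token_share(caption_tokens, budget=1.0):
    """Split one step's budget evenly across the distinct tokens of a caption.
    The even split is an assumption made for illustration."""
    unique_tokens = list(dict.fromkeys(caption_tokens))  # keep order, drop repeats
    if not unique_tokens:
        raise ValueError("caption has no tokens")
    share = budget / len(unique_tokens)
    return {token: share for token in unique_tokens}


# Captioning only the character: everything in the picture lands on that token.
short_caption = per_token_share(["misterX"])

# Describing the tie and the setting too: the character token gets a quarter.
long_caption = per_token_share(["misterX", "tie", "blue background", "office"])
```

With four tokens the character only receives a quarter of each step, which is why a longer caption may need more steps, and why the tie ends up on "tie" instead of on the character.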

storm rock
#

Now my lack of understanding can't decide if this is a good thing or not.

unique cloak
#

just check the guide, and the terminology

storm rock
#

Will do!

unique cloak
#

this is at the mid point, so I wrote it with the mindset that lots of concepts were already there

storm rock
#

Mhm.

river cypress
#

A lot of this shit is trial and error really

unique cloak
#

yep

storm rock
#

Yeah I've noticed!

river cypress
#

Noone knows for sure what we are saying

#

We could be bullshitting you for all we know

#

Maybe the right thing is to tag half

#

Or to train first half steps tagged and last half untagged

storm rock
#

What I can tell is that having a curated, varied and properly balanced dataset is good. 🤔

unique cloak
#

that is a guide built on confirmation bias for sure, but still on quite a lot of experiments and independent people training and having similar observations.
We could be wrong, but those techniques seem to work at least

river cypress
#

I've heard tons of crazy things ppl have done

#

Yeah those probably work

storm rock
#

Both in Shape, colors etc.

river cypress
#

But u cant say they are 99% optimal

#

Hell I don't even think it's like close to 80% optimal

unique cloak
#

nah

#

mostly dataset building

#

finding bias

#

understanding what the damn machine understood and why it did not do what you wanted

#

that's just doing and doing again

#

and confirmation bias

storm rock
#

So here we have two eyes from the dataset, one has a black background and the other is just a transparent png.

#

Is this something I should watch out for?

river cypress
#

All I'm trying to say is don't take anything we say as gospel truth

river cypress
unique cloak
storm rock
#

I see

river cypress
#

If not it'll never actually spawn without a cat kinda

unique cloak
#

yeah, the token "cat" is strong, if not trained on it will bring the whole cat

storm rock
#

I see.

#

I considered naming it Slitted or Slit something

unique cloak
storm rock
#

Cause it's the pupil I'm after.

unique cloak
#

you can name it as you'd like tbh. the weaker the token you choose, the better

#

to test the strength of a token, generate some pictures with just that word

storm rock
#

My V1 just spit this out.

unique cloak
#

if it comes out as something completely different each time, it means there isn't a lot associated with it in the model, and that it should bring fewer things you don't want into your model

storm rock
#

Just what I'm looking for.

unique cloak
#

or look for a token that already brings something that is in the direction of what you want ^^

storm rock
#

Now when looking for a token

#

Does that relate to doing prompt tests on the model checkpoint I use?

unique cloak
#

you can, yeah, I just described how

#

but you can also use a list of low weight tokens

#

those aren't fun tokens usually

storm rock
#

Cause that's another part that confuses me, deciding on the source model.

unique cloak
#

but they are the best token to train without bringing in unwanted inspirations

#

well, the base model...

#

think of models like points on a map. some are more in the north, in the realistic direction for example, some are more on the cartoon side, ...

#

it's more 4D than that but it's the logic

#

if you train, you push your model in a direction on that map, towards where your dataset would sit basically

#

so using a model that is already close to your needs means less training

storm rock
#

I just did "Cat Eye" on my v1

unique cloak
#

seems ok to me.

unique cloak
#

once again, I give tips towards better quality if you feel it's not enough on that side, but using strong tokens like "cat" will also work, to an extent

storm rock
#

Oh, your information and tips helped me understand a bit how I should progress.

#

I think I'm going to try and create a balanced and named dataset and see how that goes.

unique cloak
#

you could name those eyes "boat on a river eyes", with enough training it would take

storm rock
#

If I did that

unique cloak
#

don't though

#

^^

storm rock
#

that means when I use my model and type in "boat on a river eyes"

#

Does that mean it will use those eyes with that categorization to generate?

#

Like, in practice, is that how it works?

unique cloak
#

it would make eyes like the ones you trained that had that name

#

but it could also make a little of a boat for quite some time

storm rock
#

I see so it is that straight forward in a sense.

#

and the boat part would come from the base model right?

unique cloak
#

yeah

#

it would start from what the model knows

#

and modify that

storm rock
#

So for example if I was going to pick a checkpoint

#

Having something that would work as iris would be optimal 🤔

#

Civitai browsing time. Sweats

unique cloak
#

an example though

#

each week we have a contest on the server

storm rock
#

I saw that, did some browsing around here. :o

unique cloak
#

and each week, I make a model out of the participant's submissions

#

I caption each picture "PoW Style XXXX"

#

PoW Style is the main token, based on the name of the event

#

and XXXX is different for each picture

storm rock
#

Mhm.

unique cloak
#

it's a token I give from a list, one to each user

#

if I then prompt "PoW Style Floaf", I would get variations on the eye you would have drawn

#

and if I prompt "PoW Style Floaf Guizmus", I would get a variation of a mix of both of ours
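
A minimal sketch of that captioning scheme, assuming one token per participant handed out from a prepared list. The file names, user names and tokens below are made up for illustration, and `build_captions` is a hypothetical helper, not part of any training tool:

```python
def build_captions(submissions, main_token, token_pool):
    """Return ({filename: caption}, {user: token}) for (filename, user) pairs."""
    user_tokens = {}
    captions = {}
    unused = list(token_pool)
    for filename, user in submissions:
        if user not in user_tokens:
            if not unused:
                raise ValueError("not enough tokens for every participant")
            # Hand the next free token to a participant seen for the first time.
            user_tokens[user] = unused.pop(0)
        # Every caption shares the main token and adds the participant's token.
        captions[filename] = f"{main_token} {user_tokens[user]}"
    return captions, user_tokens


# Hypothetical submissions: (file name, participant).
submissions = [("eye_01.png", "alice"), ("robot.png", "bob"), ("eye_02.png", "alice")]
captions, user_tokens = build_captions(submissions, "PoW Style", ["zxq", "vrm", "kly"])
for filename, caption in captions.items():
    print(filename, "->", caption)
```

Prompting with the main token plus one participant token should then pull towards that participant's pictures, and combining two participant tokens should give a mix, as described above.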

storm rock
#

I see. 🤔

unique cloak
#

here is a thread where I show lots of examples, having trained 42 different tokens

storm rock
#

Just generating now with the prompt "Eye Texture" and paying attention to the patterns I can see the parts that are weighted more heavily in the dataset.

#

Especially the Highlights

#

Three random generations in a row.

#

Same general highlight, looking at my dataset 20 of the images out of the 80 have that.
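
A quick way to spot that kind of bias before training, sketched under the assumption that each image has been given a small list of hand-written tags. The tag names and the 20% threshold are placeholders, not recommended values:

```python
from collections import Counter


def tag_shares(tags_per_image):
    """Return the fraction of images that carry each tag."""
    counts = Counter(tag for tags in tags_per_image for tag in set(tags))
    total = len(tags_per_image)
    return {tag: count / total for tag, count in counts.items()}


def over_represented(shares, threshold):
    """List the tags that appear in more than `threshold` of the dataset."""
    return sorted(tag for tag, share in shares.items() if share > threshold)


# Hypothetical dataset: 80 images, 20 of them with the same highlight.
dataset = [["round pupil", "top highlight"]] * 20 + [["round pupil"]] * 40 + [["cat pupil"]] * 20
shares = tag_shares(dataset)
print(shares["top highlight"])
print(over_represented(shares, threshold=0.2))
```

Anything this flags is likely to show up in generations even when it is not asked for, so it is a candidate for either more varied examples or an explicit word in the captions.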

#

Really helps to see it happen.

#

Looking through that thread, those are cool!

unique cloak
#

yeah

#

those are bias

#

also try to run with CFG at 0

#

this will show it easily

unique cloak
#

you managed to figure out a lot of the process by the way, you understand this quite well already

#

so yeah, we see the general shape that has started to "bleed" in the model

#

this is not a real problem

#

it is to be expected on such a dataset

storm rock
#

Yeah the dataset is purely Circle with things in it.

unique cloak
#

and you aren't making this model to be able to draw picasso on a boat

storm rock
#

So probably helps.

#

CFG at 30

#

So now it's strictly trying to use the dataset

#

Full blast.

unique cloak
#

CFG 0 is special, your prompt is completely ignored

#

so it shows what the model would do naively
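
For reference, the usual classifier-free guidance formula explains this. The sketch below uses plain lists in place of the real noise-prediction tensors, and `apply_cfg` is an illustrative name, not a function from any particular UI. Some tools may handle a scale of 0 differently, so treat it as a description of the common formula only:

```python
def apply_cfg(uncond, cond, scale):
    """Blend the prompt-free and prompted predictions: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the two noise predictions made at one sampling step.
uncond = [0.0, 1.0]  # prediction with an empty prompt
cond = [1.0, 3.0]    # prediction with the actual prompt

print(apply_cfg(uncond, cond, 0))   # prompt has no influence at all
print(apply_cfg(uncond, cond, 1))   # plain prompted prediction
print(apply_cfg(uncond, cond, 30))  # pushed far past the prompted prediction
```

At scale 0 the prompted term cancels out, so what comes out is whatever the model (and the LoRA on top of it) leans towards on its own, which is why it is a handy bias check.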

storm rock
#

I can recognize some of the images I used.

#

Oh I forgot to mention I used LORA for this!

unique cloak
#

that won't change a lot of things in your case, and it will be faster to train and to share with friends

#

so I would encourage it

storm rock
#

Mhm

#

Also gave me a list of errors when I tried to not use LORA

unique cloak
#

it's a little lower quality than dreambooth on its own, but it's a fringe remark

storm rock
#

Yeah all I want to generate is a circle, with an iris and pupil with some fibers/color thrown in for good measure.

unique cloak
#

who wouldn't

storm rock
#

As a 3D artist, the idea of generating an infinite amount of textures is just fun.

#

Alright so my first step for V2 is dataset sorting.

#

cool_doge I love images.

unique cloak
#

also

#

did you know

#

🐮 Did you know that some cows have been known to moonlight as abstract artists? It's true! These bovine Picassos have been spotted using their tails to create intricate, colorful designs in the dirt. The most talented of these cows have even been known to create what appear to be intricate eye textures, complete with lifelike lashes and irises. Of course, some critics have suggested that these "eye textures" are really just the result of a particularly enthusiastic cow tail flick, but we prefer to believe that these cows are true visionaries of the art world.

#

(sorry random cow fact about eye textures)

storm rock
#

🤔

unique cloak
#

we just love cows, us mods recently, can't really say why

#

and I just tend to drop random cow facts custom made for users

#

using chat gpt

storm rock
#

Sometimes you don't need answers as to why things are the way they are.

#

🙏

#

Last time I played with GPT I convinced him to ignore his filter guidelines and made him write meme fanfics.

#

Good fella'

unique cloak
#

❤️

storm rock
#

So coming up with Tokens for the file names is important then.

#

Words are scary yo

unique cloak
#

yeah

storm rock
#

God help me

#

🥹

unique cloak
#

but i can give you a list of tokens if you'd like, not beautiful ones but tokens

#

what's that ?

#

dataset ?

storm rock
#

That is years

#

YEARS of eye textures.

unique cloak
#

minutes in tomorrow's world

#

thanks to that collection at least

storm rock
#

Everything from Highlights, Iris, Full Eye textures, Round Pupils, Cat Pupils, Oval shape, round shapes,

unique cloak
#

one piece of software I love when manipulating lots of file names: Bulk Rename Utility

#

handles lots of filenames and manipulations

storm rock
#

Yeah I use it religiously.

#

and czkawka to check for dupes.
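
For byte-identical copies only, the same check can be sketched with the standard library. This is not how czkawka works internally and it will not catch resized or re-encoded images; `find_exact_duplicates` is a hypothetical helper:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(folder):
    """Group files under `folder` whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Identical bytes always produce the same SHA-256 digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


# Usage, with a placeholder folder name:
# for group in find_exact_duplicates("dataset"):
#     print([p.name for p in group])
```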

#

I usually use photoshop for batching but I think I'll need something faster here.

#

Need to see if I can process black or white background to all of them.

unique cloak
#

XNview can do that

storm rock
#

Downloading so fast.

unique cloak
#

same concept as bulk rename utility but for pictures

#

picture manipulation, lots of basic ones, some filter ones, ...

storm rock
#

XnConvert right?

#

or XnView?

unique cloak
#

hit CTRL+U and enjoy

#

XNView Classic

#

to be exact
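
What such a batch step does to each pixel, sketched in plain Python so the idea is visible. It assumes 8-bit RGBA values; in practice an image tool or library does this for the whole folder, and `flatten_pixel` and `flatten_image` are illustrative names:

```python
def flatten_pixel(rgba, background=(0, 0, 0)):
    """Composite one 8-bit RGBA pixel over an opaque background colour."""
    r, g, b, a = rgba
    # Integer alpha blend with rounding: colour * a + background * (255 - a), over 255.
    return tuple((c * a + bg * (255 - a) + 127) // 255 for c, bg in zip((r, g, b), background))


def flatten_image(rows, background=(0, 0, 0)):
    """Apply flatten_pixel to every pixel of an image stored as rows of RGBA tuples."""
    return [[flatten_pixel(px, background) for px in row] for row in rows]


# A 1x3 toy image: transparent, half-transparent red, opaque red.
image = [[(255, 0, 0, 0), (255, 0, 0, 128), (255, 0, 0, 255)]]
print(flatten_image(image, background=(0, 0, 0)))
print(flatten_image(image, background=(255, 255, 255)))
```

Picking one background for the whole dataset keeps the transparent PNGs and the black-background images from teaching the model two different surroundings.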

floral pollen
#

Sorry but I have a quick question, I have a file in my logs when training via dreambooth named events.out.tfevents, is that something I can analyze? Is that the file that should be used with tensorboard?

unique cloak
#

yep, that is the tensorboard file usually

storm rock
#

My brain is moving too fast now, I'm asking myself can I train it to generate face textures following a UV. dance Alright time to sort dataset.

#

Thanks Guizmus!

floral pollen
#

Oh that is very helpful thank you so much! I was having issues analyzing the loss value!!

unique cloak
#

this folder/file is even populated WHILE the training is happening

#

so you can stop it if you see it doesn't go well

#

it runs on CPU only

#

you're welcome 🙂

floral pollen
#

Oh wow that is EXACTLY what I was looking for :D

unique cloak
#

well, happy I wrote it then :p that part could use pictures, I just moved it to github

floral pollen
#

I was about to write a script analyzing the loss values through the csv file xD

unique cloak
#

tensorboard should already be in any training environment

#

if not it can be pip installed easily

floral pollen
#

Once I get tensorboard up and running I can give you some feedback on it if you want to

unique cloak
#

it's intended for entry level into training but honestly, it's hard not to go too deep when you start explaining

#

the dataset chapter in particular is a little deep

storm rock
#

@unique cloak Original image size is important, right? I think I read up on this when I did my first model. Having all images in 512 is preferred, right?

unique cloak
#

it's possible to train on other sizes and ratios though

storm rock
#

Oh yeah I forgot 2.1 is a thing, I've mostly used 1.5 models. Would doing this as a 2.1 help me? As I have no need for anything crazy in terms of filtering adult content.

#

I imagine if I did like 1024 this would really push the vram usage. ._.

unique cloak
#

XL is about to come on #1084896022368624640 so I wouldn't sleep on the size for long either. in particular: stay future proof. keep all your highest definition pictures somewhere

storm rock
#

Alright!

unique cloak
#

you will need bigger models with more parameters

#

to keep the same quality

storm rock
#

Oh so running 1K images is a detriment?

unique cloak
#

training on 1k size is a detriment on a 512 model

#

running 1k pictures on a 512 model works but gives duplications without some tricks
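
Rough arithmetic behind this, assuming a model trained at 512x512 and the 8x downscale of the Stable Diffusion VAE. It only shows how much bigger the canvas is than what the model saw in training, which is one plausible reason it starts repeating subjects; the helper names are illustrative:

```python
VAE_DOWNSCALE = 8  # Stable Diffusion latents are 1/8 of the pixel size per side


def latent_size(width, height):
    """Size of the latent grid the model actually denoises."""
    return width // VAE_DOWNSCALE, height // VAE_DOWNSCALE


def area_ratio(width, height, train_side=512):
    """How many training-sized canvases fit in the requested image, by area."""
    return (width * height) / (train_side * train_side)


for width, height in [(512, 512), (1024, 1024), (5120, 1440)]:
    print((width, height), latent_size(width, height), area_ratio(width, height))
```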

storm rock
#

My thought process was have the entire dataset in 512 and the eyes that get the Ok we AI Upscale.

#

Oh! I see.

#

I think I'll just 512 it the entire way then!

unique cloak
#

like I do a lot of desktop backgrounds recently

#

and I have a 32:9 screen

#

so keeping duplications out is really hard

#

example, we get twice the same kind of picture here

storm rock
#

It feels mirrored

#

but it's not an exact mirror

#

Weird..

unique cloak
#

here, it needed to add a second robot head

#

because it was feeling it had no robot on the right side

storm rock
#

I noticed this when I tried to do widescreen images

#

It almost insisted on adding more people

unique cloak
#

from time to time I also got great ones though

#

using lots of extensions to help

#

there are multiple tricks to it

#

mostly using hires fix, and describing the background more in your prompt

#

(I'll stop examples sorry)

floral pollen
#

I find the problem interesting though. That seems like something controlnet would be able to fix

#

And I love ur example pics, nice vibes/styles

unique cloak
#

i'm not the best prompt maker ^^

unique cloak
#

those pictures, at 5120x1440, already took more than 5 minutes on a 3090 Ti

#

using controlnet, I'm looking at 15 minutes from my one try

#

in the end, I prefer to tweak the settings and run more seeds

#

but I switched to 1920x1080 for more fun too, now that I found a good desktop to use

floral pollen
#

xD At least that is somewhat fast, I'm working with a 1070. No way in hell I'm actually going to get anywhere with prompts fast. Which is why I test most of my prompts initially on like a 10 to 15 step set

unique cloak
#

(model name is "realbite" and it's on civitai)

#

I can't link to it, it has some NSFW pics too

#

rule 4 here

unique cloak
#

hehe yeah, I get that feeling

floral pollen
#

The pace at which all of this is going is so insane. Yesterday i stumbled upon a local text generator and needed to try that out OF COURSE, I'm pretty much hopping from topic to topic right now trying to learn as much as possible

unique cloak
#

I had quite a lot more time to take it all in, I was there since this became a public beta

floral pollen
#

Okay That is still more than mine

unique cloak
#

so every piece showed up one after the other

#

but everything is just so fun to play with, to tame

#

it's like a hydra though.
Every time you think you understood something, you find 3 new concepts to try and understand next

floral pollen
#

I've been following along since it came out on github last year and I've tried so much stuff. Stable diffusion, Automating it, dreambooth, loras, live chat bots with whisper, the text stuff now. Every time you turn around 3 new insane checkpoints have been reached and there is something new to discover

floral pollen
#

But it helps that some of the stuff is being used in several projects

unique cloak
#

yeah. the logic is the same

#

it's why I made that guide that way

#

to stay "method agnostic"

#

there are specificities to each for sure

#

but dataset making is the same for example

#

and for diffusion models too, we find the same concepts in each

#

I haven't played locally with the GPT stuff

#

only tried some chatGPT online

storm rock
#

Would you reckon really going ham on naming the files, like Cat Eye Orange (1), would help, or does the AI get colors on its own? 🤔

floral pollen
#

I work as a programmer, and use chatgpt as an alternative to google now when it suits it. Took off around 10 to 15 hours the past few weeks from what I would have usually taken. It helped a lot

#

So having something like that locally is not possible with the amount of data they have, but something similar, like a home assistant, would be nice

floral pollen
#

I actually got the tensorboard to work now so thank you so much for the hint!

storm rock
#

I'm under the assumption that some things the AI itself can figure out, like in this case colors.

#

Obviously like you said if 50% of the eyes in the set is red, it'll weight more.

unique cloak
#

the thing is, it will understand that what keeps changing is to not be learned

#

it's all

#

even if that makes no sense to us/it

#

so colors are a good example, but like, trees are, mostly everything can be, even lighting, photo grain, ... as long as the dataset is diverse on it

#

if you only have 3D renders, it will give 3D renders though

#

brb

floral pollen
# unique cloak don't hesitate to hit us with screenshots of the graphs if needed be

Based on your guide I would have guessed that the lowest point in the red circle is the best since it goes up again after that, but it seems like it might have been able to get trained a bit more.
The part in the red square has the lowest value but I guess with around 490 steps that would probably not be enough right?
How close should the value be to 0 to be an actually useful embed of a person's face?

#

Sorry for all the questions 😅

unique cloak
#

ok back, I realised it was 11PM and I forgot to eat

unique cloak
#

the other point around 1750 could be of interest

#

if the later point was overtrained that is

#

I can't answer on how close to 0.... It would depend on the method, the exact implementation of the tool, ... and I haven't used all

unique cloak
# floral pollen Even though the loss value is that high?

it's a local low. not the best local low but a low either way.
Usually for me, those are "ok but low quality". but there have been some cases where the red circle was just overtrained, with burned outlines, and the higher local low was good
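
A small sketch for listing those local lows from exported loss values, assuming you have the steps and losses as two lists (for example from the CSV export). The numbers are invented, and `moving_average` and `local_lows` are hypothetical helpers:

```python
def moving_average(values, window):
    """Trailing moving average; window=1 leaves the values unchanged."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def local_lows(steps, losses, window=1):
    """Return (step, loss) pairs where the smoothed loss dips below both neighbours."""
    smooth = moving_average(losses, window)
    lows = []
    for i in range(1, len(smooth) - 1):
        if smooth[i] < smooth[i - 1] and smooth[i] <= smooth[i + 1]:
            lows.append((steps[i], smooth[i]))
    return lows


# Invented loss values, one per saved checkpoint.
steps = [250, 500, 750, 1000, 1250, 1500, 1750]
losses = [0.30, 0.20, 0.25, 0.18, 0.22, 0.10, 0.15]
lows = local_lows(steps, losses)
print(lows)
```

The deepest low is not automatically the best checkpoint, since it may be the overtrained one, so the list is only a shortlist of checkpoints worth comparing by eye.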

floral pollen
#

Finally some graphs and data 🤩

unique cloak
#

yay !

#

I took ages before getting into that

#

and back then I had quite a hard time collecting the information I gave you on lows and highs, and loss behaviour

#

mostly I was told "loss value is crap don't bother"

floral pollen
#

Yeah I bet. Even now it is hard to find good information, but your guide is really precise 👍

unique cloak
#

but I have really been able to correlate graphs to those observations that were also given to me/experiments I made

unique cloak
#

yep

#

depending on the tool, they even add more data in the tensorboard

#

like the sample pictures if you ask for some

floral pollen
#

Oh yeah I saw that option in my local installation. But this online version doesn't have it. I'd use the online version because it is 10 times faster than mine, but my gdrive is full so I can't create a save every 100 steps, which I will need when I actually want to get precise states

unique cloak
#

I can feel the pain

unique cloak
#

happy to redistribute info and not let it go to waste

unique cloak
#

training on LoRA would mean smaller save points maybe

floral pollen
# unique cloak happy to redistribute info and not let it go to waste

Yeah and again thank you so much :D Especially since I don't feel like this part of stable diffusion is going to get a lot easier over time, since only a small percentage of people want to also use it in this way. And with everything that is happening right now I can't wait for the first embed creator jobs to pop up. Or model trainers

unique cloak
#

but for the random user, going into this in the first few days is something I wouldn't advise

floral pollen