#🔧｜finetune


split acorn
#

and adjust

#

then resume training

#

it might spike the VRAM about 0.5 GB extra for a little, so keep that in mind

clear lion
#

I mean that I can't get sample images to look anything like mine. The person is only slightly recognizable.

split acorn
#

another thing to try is your initialization text. If it's describing your character, then it might work better

split acorn
#

and I like the method where the text files (in your dataset folder) are only describing the scene, the stuff you don't want the AI to remember. I'm still testing if that works better or not, but it seems to be pretty good!

clear lion
#

It's the beginning

#

Then it will start looking worse. Not even close to the intended person.

split acorn
#

could be your dataset ChillBar_shrug

#

if your vector count is like 8-16 then I'm not sure

clear lion
split acorn
#

if your vector count is like 1, then this is to be expected

#

yeah 5 might be far too low

#

could try 16

clear lion
split acorn
#

And you're using [filewords]?

clear lion
#

yes

split acorn
#

Two methods for filewords, there's the "describe everything" method and the "describe everything you DONT want the AI to learn"

#

(though I'm still figuring out which works best atm, the second seems to be working a little better)
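
To make the two [filewords] approaches concrete, here is a minimal sketch of how a template line could expand into a training prompt under each method. The helper `expand_template`, the embedding name `my-embedding`, and both captions are made up for illustration; this is not the webui's actual code, only plain string substitution.

```python
def expand_template(template: str, name: str, caption: str) -> str:
    """Fill a '[name]' / '[filewords]' style template line (hypothetical helper)."""
    return template.replace("[name]", name).replace("[filewords]", caption)


template = "a photo of [name], [filewords]"

# Method 1: the caption file describes everything in the image.
describe_all = "a woman with red hair, standing on a beach, wearing a blue jacket"

# Method 2: the caption file describes only what the embedding should NOT absorb
# (scene, clothing), leaving the subject itself to the [name] token.
describe_rest = "standing on a beach, wearing a blue jacket"

prompt_all = expand_template(template, "my-embedding", describe_all)
prompt_rest = expand_template(template, "my-embedding", describe_rest)

print(prompt_all)
print(prompt_rest)
```

With the second method, whatever is left undescribed (here the subject's look) has only the [name] token to attach to, which seems to be the idea behind it.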

clear lion
split acorn
#

how many images?

clear lion
#

tried 8-30.

split acorn
#

you generally don't need too many, just high quality. Almost all being close-ups from various angles, lighting, backgrounds, clothing and then maybe a few upperbody

#

anyways, I'm bed alicatBed

clear lion
# split acorn you generally don't need too many, just high quality. Almost all being close-ups...

I know. I did deepfakes before. All of them are completely different in everything (distance, clothes, background, expression, etc.)
For example here
#1026789870884102226 message
a guy tested 18 settings with TI. What they have in common is that the result really looks like the source no matter what settings he used. The difference is subtle, but overall it's the same face.
His settings are described and I just followed them, but I'm not getting any similar result at all.

autumn token
#

Hello, I was trying to find a guide on how to fine-tune a model using images with different aspect ratios, but all I get on Google is dreambooth and textual inversion. Is there a good guide on how to actually do it?

#

Anyone knows a colab for it?

robust urchin
#

I can create it

#

but when I try to use it in Automatic 1111

#

I get an error like dimension or tensor dont match ?

ocean grotto
#

Hello guys,
Does anyone know why I'm getting this image?

#

On SD2.1 768px

final matrix
#

not everyone or most, and youre still using a subject/ip you dont own
so i disagree there

split acorn
#

you disagree and think it’s wrong to create likeness of a subject you don’t own the rights to, in the context of fan art?

#

if that’s the case, then I simply disagree from both a moral and a legal standpoint (as long as it’s fair use / you’re not profiting off it)

stone garden
#

from my knowledge, then you can profit from "fair use." Tho I haven't checked up if that's really true :P

split acorn
#

it’s muddy. I think if it’s seen as a “substitution” and isn’t a parody, and the IP owner doesn’t permit it, then they could be issued something like a cease and desist or a demand for payment (or something or other)

#

if it doesn’t fall under fair use

stone garden
#

as long as it's transformative, then it's okay I'd say. And I don't think the IP owner has any right other than doing something if it isn't fair use

split acorn
#

ye ye

#

And, as a side note, Infringing happens all the time in the art communities, after all CB_nod

#

not saying it’s right or wrong, but that it happens a lot

winter apex
ocean grotto
#

thanks

earnest roost
#

[Epoch 30: 1/1]loss: 0.3550469: 0%| | 30/100000 [00:46<41:55:48, 1.51s/it]

final matrix
#

i am saying fanartists who are against ai art are hypocritical

#

people ask me "how can it be youre paying hundreds of euros for training a single model?"
me:
(i test a lot)

split acorn
#

I also test a lot alicatKEK just spamming TI to learn what everything does

spice bridge
#

How many images did you train on with these settings?

summer minnow
spice bridge
#

that's not much, i'll give it a try, thanks

nocturne cave
#

how do i train my own models with dreambooth? i looked at an online guide on dreambooth but it was outdated, i have no idea what i'm doing

split acorn
hoary frost
split acorn
#

yosh

#

there are a couple in there though that are generated via hypernetworks (auto1111's repo)

hoary frost
#

okay thanks, if i want to have photorealistic images eg of houses, no styles, no cartoon etc, what would be more suitable, textual inversion or hypernetworks?

grave igloo
#

Hi everyone
I was wondering if it is possible to fine-tune the depth model, or if there is a way to use it with 2.1

real grail
#

is it possible to fine-tune multiple classes at once? like if I wanted to do a model with me and my partner captured, or multiple pets

round hare
#

Has anyone found a good way to do dreambooth on SD 2.1 768? Is there any update on joe pena's repo (it seems to be on 1.4)? I didn't get good results with automatic or last ben, but maybe I'm doing it wrong. Any good resources or tutorials?

hoary frost
hoary frost
round hare
#

any tuto ?

hoary frost
# ocean grotto

i have same problem, i am on the ShivamShrirao notebook, how did you solve it? it works for sd 2.1-base with 512 resolution though

ocean grotto
hoary frost
#

so it doesnt resize/crop the images to the correct size during training?

ocean grotto
#

I dont know about this notebook

zealous ginkgo
#

First time training an embedding but how long does embedding take usually? 100k steps, Embedding Rate 0.005 8 Images. It says ~7 hours but dreambooth took me 10-20 minutes. Just wondering if im doing it right
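
The estimate can be sanity-checked with simple arithmetic. The sketch below assumes a constant time per step, which real runs only approximate; the function names are made up for illustration. The numbers come from this message (100k steps in about 7 hours) and from the progress line posted earlier in the channel (step 30 of 100000 at 1.51 s/it).

```python
def seconds_per_step(total_steps: int, total_hours: float) -> float:
    """Average step time implied by a total-duration estimate."""
    return total_hours * 3600 / total_steps


def eta_hours(total_steps: int, done_steps: int, sec_per_step: float) -> float:
    """Remaining wall-clock time in hours at a constant step time."""
    return (total_steps - done_steps) * sec_per_step / 3600


# "100k steps ... ~7 hours" works out to roughly a quarter of a second per step.
implied = seconds_per_step(100_000, 7)

# "30/100000 ... 1.51s/it" works out to roughly 42 hours remaining.
remaining = eta_hours(100_000, 30, 1.51)

print(f"{implied:.3f} s/step, {remaining:.2f} h remaining")
```

Both estimates scale linearly with the step count, so cutting the run from 100k steps to 10k would cut the time by the same factor.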

lime bison
#

I too have a lot of questions about the parameters for creating embeddings in auto1111; everything I have made has been very poor

winter apex
#

Has anybody trained using LoRA? i cant find a single tutorial about it

mild phoenix
#

How many images is too much for an embedding? I want to imitate the style of a game's background, so I could potentially take a whole lot.

hardy bloom
#

hi, anyone managed to train embedding in webui for 1.5sd?

#

getting weird error, it stops right after start and says it finished training at 0 steps

mild phoenix
#

welp, it just did the same for me 😦

#

TypeError: can only concatenate list (not "NoneType") to list

Edit: solved, I mistook what model I had switched to

split acorn
#

mmm the finished training at 0 steps happened when the training template didn’t contain [name]. I think there was another thing that caused it, but i forgot

#

Using the same embedding name as something previous, i think?

#

to make sure the new created embedding is unique
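
The two causes mentioned above (a template without [name], and a reused embedding name) can be checked before starting a run. This is only a sketch of such a pre-flight check: `check_embedding_setup` is a hypothetical helper, not something the webui provides, and the case-insensitive name comparison is an assumption.

```python
def check_embedding_setup(template_lines, new_name, existing_names):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    # The template needs the [name] placeholder so the new token appears in the prompts.
    if not any("[name]" in line for line in template_lines):
        problems.append("template has no [name] placeholder")
    # Assumption: names are compared case-insensitively to be on the safe side.
    if new_name.lower() in {name.lower() for name in existing_names}:
        problems.append(f"embedding name {new_name!r} is already in use")
    return problems


bad = check_embedding_setup(["a photo of [filewords]"], "korra", ["Korra"])
good = check_embedding_setup(["a photo of [name], [filewords]"], "korra-v2", ["Korra"])
print(bad)
print(good)
```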

mild phoenix
#

this is the first time I try. I also tried some of the standard templates to make sure I didn't break something and same error

#

never downloaded embeddings either

dapper prism
#

Have people gotten TI training for SD 2.1 working on Automatic1111 without using --no-half yet?

split acorn
split acorn
#

might help alicatPog

mild phoenix
#

could I pm you those?

mild phoenix
split acorn
#

you can censor them and then pm, sure!

#

whatever works

ocean grotto
#

Hello Guys,
Will removing the background from the images improve the training?

split acorn
#

it can be the opposite, actually

#

varied backgrounds being better than just one

#

I suppose unless you want a specific background, in which case having the same one could be better alicatHm2

#

like a bright pink for easy future background removal

#

I might try that alicatPog

final matrix
stone egret
#

Any estimate on how long this will take? It was my first time clicking the Train Embedding button (auto1111)

cobalt steeple
#

Hi all, I'm trying to learn how to use Textual Inversion, and when I load the colab I get this error

#

Does anyone know how I can fix this?

#

Or can someone recommend another online space where I can train a Textual Inversion model?

rough hamlet
#

Are there any good guides for ti training sd 2.x? My settings from 1.5 don't seem to work very good

jaunty robin
#

Dude. You should not be posting pictures of your child here

covert crest
prisma nacelle
#

does using images with transparency backgrounds mess with the training?

somber summit
#

so i don't know if this is happening for anyone else: but when i train dreambooth models with my face (and my friends' faces too), the closeups photos generated look great but the fullbody photos are just horrible. I already turned on "restore faces" and i have included full body shots in the instance images. is this happening for anyone else?

desert zealot
#

can you train an already trained model?

hoary frost
peak fjord
#

There is very good Midjourney artwork. Instead of taking artists' work, it is possible to take those images and train on them. What is the issue with that?

chrome dust
pure blade
#

i did this test a while back to see how faces perform through the vae as they become smaller. As the face becomes smaller the resolution in the latent just becomes too small to encode the details and it just turns into a blob

#

this was for 512x512

#

that also means if you use images during training where the face is too small you are basically teaching the model that your subject looks like a blob

somber summit
pure blade
#

ye what matters is what resolution is the actual face part of the image in this case, so the larger resolution the image is the smaller the face could be relative to that
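
A rough way to put numbers on this: the Stable Diffusion VAE downsamples each side by a factor of 8 (a 512x512 image becomes a 64x64 latent), so a face's width in latent cells is its pixel width at training resolution divided by 8. The sketch below assumes a simple uniform resize to the training resolution; the function name and the example face sizes are made up for illustration.

```python
VAE_DOWNSCALE = 8  # 512x512 pixels -> 64x64 latent cells


def face_latent_cells(face_px_in_source: float, source_px: int, train_px: int = 512) -> float:
    """Approximate face width in latent cells after resizing the image to train_px."""
    face_px_at_train_res = face_px_in_source * train_px / source_px
    return face_px_at_train_res / VAE_DOWNSCALE


close_up = face_latent_cells(400, 512)            # face fills most of the frame
full_body = face_latent_cells(48, 512)            # small face in a full-body shot
full_body_1024 = face_latent_cells(96, 1024, 1024)  # same framing, trained at 1024

print(close_up, full_body, full_body_1024)
```

In the full-body example the face gets only about 6 latent cells across, which fits the "blob" observation; the same framing at a higher training resolution gets twice as many.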

little hollow
#

So you would always get the char without any background (screwed up lots of embeddings)

round hare
obsidian sand
#

I used the fast dreambooth colab to train one face before and it turned out great, but I'm wondering: is it possible to train many people's faces and many art styles in one model?

round hare
#

I tried dreambooth colab, hugging face repo, automatic 1111, on SD2.1 768, none of them have the same quality I had previously with the same sources images (resized to 512) on joe pena's repo on 1.5 with 1.5 model.

obsidian sand
round hare
#

which one?

obsidian sand
#

The best one

round hare
#

joe pena's ? I tried it on runpod, it's not free but really cheap, and you can have a 24Gb gpu

#

Dreambooth is Google’s new AI and it allows you to train a stable diffusion model with your own pictures with better results than textual inversion. Dreambooth is originally based on Imagen text-to-image model and this technology makes it possible for you to insert any character (yourself, your friends, your family), object or animal you want in...

obsidian sand
#

How many images and steps you train and how long it takes?

round hare
#

12 images, and it run an hour I think. I trained it few weeks ago, don't fully remember

obsidian sand
#

I trained 40 images 3000 steps and it takes 1 hour

#

Now I want to train many person face and many art style in one training, but I'm not sure if it will works.

winter apex
# obsidian sand I use fast dreambooth colab to train one face before and it turns great but I wo...

with this you can train more than 1 person: https://github.com/TheLastBen/fast-stable-diffusion

if you're training 2 people then the minimum steps should be 6000 (3000 each person)


winter apex
#

i guess that you can then merge a human trained model with an artstyle model

obsidian sand
#

How? Copy ckpt link to model download custom ckpt link?

obsidian sand
somber summit
#

oh don't use generic names like john and sarah btw

obsidian sand
#

How about style?

somber summit
#

are you talking about the concept images or?

obsidian sand
#

Like anime or cartoon

somber summit
#

are you trying to train a new style?

#

that's not readily available in SD?

obsidian sand
#

Not really, I'm just wondering.

somber summit
#

i never trained new art styles before i'm sorry

obsidian sand
#

It's okay. Maybe I will use textual embedding for applying a new style.

somber summit
#

guys has anyone run into this error when running dreambooth?

#

RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320, 1, 1]) from checkpoint, the shape in current model is torch.Size([320, 320]).

ocean grotto
#

Hello Guys,
Is there a prompt or something to remove these kinds of pixelated pictures?

bold heath
#

Neg prompt : jpeg artifacts, pixelated, jagged edges

Maybe

ocean grotto
#

will try, thanks 🙂

#

It works

#

love you

dusty turtle
#

is there a way to fix an overtrained model? or just delete and start over again?

split acorn
#

You can add the token further in the prompt, you could turn the CFG down alicatHm2

#

starting over might save you the prompting headaches though alicatKEK

dusty turtle
#

training the instance image token name without adding photos?

rapid solstice
#

@unborn wind A question if you can answer. How do you use styles.txt file while training ?

#

v2.1 using dreambooth

night jackal
#

Hey guys! I've been playing with dreambooth for a week, and I start to have decent results.
Do you have any good resources, or tips on how to correctly use prior preservation loss? For example, is it a good idea to have a classifier prompt a lot larger than the prompt to train on?

wooden atlas
#

Would really appreciate someone helping me unblock on next steps.. I have a highly curated data set of about 12k images with highly accurate descriptions. These are product photographs where the technical specifications have been translated to natural language. So there is industry specific language that I'm trying to capture in the model, hopefully it can pick up on if there is a certain type of widget in the product versus ones that have a different type of one..

My question is, what is the best way to train the model to learn a category of products such as bicycles? I have 1-6 images per product.

mossy heron
# wooden atlas Would really appreciate someone helping me unblock on next steps.. I have a high...

Vlad Kushkin — 19/12/2022 14:58
let's say you got john and sarah, then the images should be john(1).jpg, john(2).jpg....john (20).jpg, and sarah's would be sarah(1).jpg, sarah(2).jpg...sarah (20).jpg,
oh don't use generic names like john and sarah btw
i think if you train "a photo of an object" in instance prompt with images named accordingly to what you want (bicycles) in your dataset directory, you will have the desired result at the end.
need prior testing.
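
The naming scheme quoted above (`john (1).jpg`, `john (2).jpg`, ...) can be applied in bulk with a short script. This is a sketch only: `rename_for_instance` and the token `zxqperson` are made up for illustration, and whether a particular notebook really reads the token from the file name should be checked against that notebook.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def rename_for_instance(folder: Path, token: str) -> list[str]:
    """Rename every image in `folder` to '<token> (<n>).<ext>' and return the new names."""
    images = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    new_names = []
    for index, path in enumerate(images, start=1):
        target = path.with_name(f"{token} ({index}){path.suffix.lower()}")
        # Refuse to overwrite a different file that already has the target name.
        if target != path and target.exists():
            raise FileExistsError(target)
        path.rename(target)
        new_names.append(target.name)
    return new_names


# Demo on a throwaway folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("IMG_001.jpg", "IMG_002.JPG", "scan.png"):
        (folder / name).touch()
    demo_names = rename_for_instance(folder, "zxqperson")

print(demo_names)
```

A rare, made-up token is used here in line with the advice not to pick generic names like john or sarah.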

mossy heron
night jackal
clear lion
mossy heron
tough rampart
#

Got a question for tagging on a HN training
Let's say an image I used had the character in armor and was tagged with that, but I don't care about the armor, just the facial features of said character. Should I keep the tag or get rid of it?

clear lion
#

Is there any general purpose trainer like EveryDream but capable of running on 16Gb GPU?

split acorn
#

The DreamBooth extension for Automatic1111 can be used for both caption training and dreambooth style training

#

and they have a lot of the optimization settings that make it run on lower VRAM usage

upbeat light
#

Anyone familiar with hypernetwork training? Generating images fine in auto1111, but black squares on the hypernetwork sample generations.

median sun
#

does anyone know a style transfer for SD that works like the one from Midjourney?

#

with 2 image files

ripe sleet
#

what is the best way to train the 2.1 model right now?

clear lion
split acorn
#

@clear lion

#

And yeah, it trains the model CB_nod

#

Oh, it's not working for you in colab? alicatHm2 hmm

#

might be better to just use the direct colab version instead

#

just use the [filewords] type of training, instead of the classic DB style, for caption training

clear lion
split acorn
#

yeah, that one seems to work diff CB_thumbs_up

tough rampart
#

How many steps should I pick for a character dreambooth?
Heard it's 800-1200, but that sounds like too few, right?

split acorn
#

You honestly don't need that many most of the time

#

800-1200 is about right CB_nod

#

some might do like 2k but

tough rampart
#

Sweet

stone garden
obsidian sand
#

How do I make this work? Should I rename all the prepared concept images with the same name, like the instance images?

#

It's from TheLastBen fast dreambooth colab.

eternal echo
#

Hi everyone, I'm trying to fine-tune an SD model with dreambooth to generate cartoon animal drawings. The fine-tuned model does a great job even with some animals that aren't in the training dataset, like octopus or deer, but somehow has a hard time producing good-looking cartoon snakes or crocodiles. I tried adding 100 regularization images consisting of different cartoon animal images I found online, but still no luck. Any ideas or hints?

eternal echo
obsidian sand
eternal echo
obsidian sand
prime rivet
#

Hey. Anyone with experience in using images with transparency in training? I found some examples of this but they mention a problem of alpha halos around the trained things.

#

I'm trying to make a sort of a "things in cages" embedding and DB. To test it out.

ripe sleet
#

What's the best way to train a subject into v2.1 right now?

hushed blade
#

don't know why you would use something else than v1.4/v1.5 tbh

#

I tried 2.0 and 2.1 and got very bad results compared to the former

prime rivet
#

Adding new tokens in with text encoding is an artform of its own, but overwriting you can do just with Unet training

#

Also TI is very powerful if you do the init text correctly and have a good dataset, especially for styles

#

It is highly recommended that you check the LAION database for similar images and check the awful SEO/clickbait titles for good terms to describe your thing.

#

A shirt is not just a shirt in the dataset it is: "High Quality Cotton Mens Boys Fashionable Trendy Organic Breathable Soft Smooth OEM China Women Lady Girl Bodywear"

#

Thanks to SEO/Clickbait level google manipulation from Amazon/aliexpress/baba/indiamart/ebay/wish.... etc

ripe sleet
prime rivet
#

Also you can train UNET with that unique thing, then fetch it with an embedding.

#

But this requires you to train both.

ripe sleet
prime rivet
#

And if it isn't working out, you have to consider your dataset

ripe sleet
prime rivet
#

Also you can train 2.1 with 1024x1024 if you REALLY want details

#

All the issues I have had with training 2.1 I have put down to me treating it as if it is 1.5, which it is not.

ripe sleet
chrome oxide
#

Is there any way to gauge if my dreambooth model has overfitted?

#

Some of the samples generated are very close to actual training data

final matrix
#

👀

split acorn
#

or possibly depending on the prompts you're using to test

#

You can save it with lower CFG, but that only saves you for so long

dapper prism
#

Anyone know how I can extract key frames from video within a specific start and end time? My search for specific training images have thus far only turned up videos with the image content I need
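
One common approach is ffmpeg's `select` filter restricted to a time window. The sketch below only builds the command line and does not run it; it assumes ffmpeg is installed, and the file names and timestamps are placeholders. The options used (`-ss`, `-to`, `-i`, `-vf select=...`, `-vsync vfr`) exist in ffmpeg to my knowledge, but newer builds prefer `-fps_mode vfr` over `-vsync vfr`, so check against your version.

```python
import shlex


def keyframe_command(video: str, start: str, end: str, out_pattern: str = "frame_%04d.png") -> list[str]:
    """Build an ffmpeg command that writes only I-frames (keyframes) between start and end."""
    return [
        "ffmpeg",
        "-ss", start,                       # seek to the start time (input option)
        "-to", end,                         # stop reading at the end time
        "-i", video,
        "-vf", "select='eq(pict_type,I)'",  # keep only intra-coded frames
        "-vsync", "vfr",                    # do not duplicate frames to fill the gaps
        out_pattern,
    ]


cmd = keyframe_command("clip.mp4", "00:01:30", "00:02:00")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keyframes are placed by the encoder, so the number of frames returned for a given window varies from video to video.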

final matrix
tawny inlet
split acorn
# tawny inlet https://github.com/cloneofsimo/lora Does anyone know about LORA?

Ever wanted to have a go at training on your own GPU, but you've only got 6GB of VRAM? Well, LORA Dreambooth for Stable Diffusion may just be the thing for you! Faster! Smaller! More Better!

:)

#

good introduction to it CB_nod

#

it's using the extension version though, so it's a tiny bit outdated

#

but they go through the various settings and give a better idea of options

obsidian sand
#

Whats the best anime model as base model?

ashen perch
#

Hi, I'm trying to preprocess images with settings like these:

#

My problem is that I run out of RAM (not VRAM)

#

And I have no idea why

#

I got 32GB total

#

here's my memory usage, I just started the webui at the marked part

weary tusk
#

I have heard that even 8GB should be just fine for TI embedding learning but my 11GB card throws nothing but errors
Tried to allocate 6.33 GiB (GPU 0; 11.00 GiB total capacity; 5.64 GiB already allocated; 3.44 GiB free; 6.00 GiB reserved in total by PyTorch

#

with this enabled

ashen perch
#

Is 8GB VRAM enough to train on 768x768 images?

pearl geyser
#

there is only one way to find out

#

but it won't be much worse if you make 512 model

ashen perch
#

I'm using the v2.1 checkpoint and it's 768 already

#

It is working now, but incredibly slowly

weary tusk
#

I can't even train 512x512 images on 11GB without errors

#

are there any good services for embedding training? I used OpenArt for model training but I would prefer training embeddings for more modular usage

prime rivet
#

So if you can, consider finding a nonsense token that already exists, and just UNET that.

#

HOWEVER Ben does have a beta colab which allows descriptions

#

Go to the GitHub of Fast Ben's and then switch the branch to captions.

#

The reason 1.5 was "easy to train" was because the CLIP was absolute fucking mess. There was lots of margins to organise within it.

#

OpenCLIP that Stability trained is more "clean", so there is less nonsense that you can take use of. But because it is so clean, messing it up bit too much will lead to a disaster

#

I like to imagine it as if welding thin stainless steel. Correct settings and technique and it is just a joy and great show of skill. Slightly bad settings or technique and you are going to be fucking miserable, but you can still get something done,

#

The fucking annoying bit is that because it takes so long to get results, iteration is really hard.

#

But that is just the nature of ML work isn't it. Like when I do embeddings, I leave it and come back 1hr later. I can just about do 2-3 iterations a day. But with just enough notes and analytical thinking you'll get it

#

Like for example I spent like... 5 days fiddling around on Auto's extra tab, just to find the best balance of upscalers to pump up pictures for further correction in Photoshop. Both source images for training and outputs.

#

My current setup for training images is R-ESRGAN General 4xV3, then Nearest as Upscaler 2, with a 0.5 mix of them. Then I take them to Photoshop for colour and cropping, and also use Photoshop's neural photo correction filter.

#

After this I add just a bit of blur and gaussian noise; scale the whole dataset to desired reso (768x768) And drop to the colab.

#

When training anything, it is critical that you remove EVERYTHING unrelated and undesired from the pictures.

#

Unless you are training specific person where details matter, then every picture doesn't need to be absolutely perfect and in focus. As long as all the general elements are spottable.

#

Also! Here is a good thing I have noticed! @ripe sleet Have picture of your subject in different scales. As in a close up, then further away. In neutral background preferably. If you don't have such pictures just make few in photoshop by just taking your subject and sizing it smaller on to the picture.

#

This helps at least Fast-Bens to realise that "This thing can also be further away!"

#

And lower quality seems to allow it to realise it can be in different quality.

#

This I learned by accident in trial and error

slate vessel
#

I'm trying to figure out how to do LoRA training on the Huggingface space. It says it has been trained, but I don't know if it's able to use it inside the space

final matrix
final matrix
#

the old north korean animation has no watermark and no subtitles
i love it

#

i kid you not the old north korean animation looks so much better
this just proves again that 2d > 3d animation smh

prime rivet
final matrix
#

i just went with batch cropping now

dapper prism
#

Is it preferable to train 2.x embeddings on ema or non-ema models?

raven crest
#

what is the best scheduler for training textual inversion?

the colab notebook at sd_textual_inversion_training uses DDPMScheduler. is it any better to switch to EulerAncestralDiscreteScheduler

raven crest
split acorn
#

Also not sure why they didn't release a "full" (ema+non-ema) and not sure why they separated it like that but... i digress

final matrix
#

my north korean animation model has started training. Any claims that this is just another distraction for me to keep delaying the 2.0 Korra model are a lie!

600 training pictures in a... well quality.

two series, both in old and new artstyle.

I downloaded about 25GB of data from archive.org (there were dvds of the series and someone bought them and digitized and uploaded the video material, albeit in not so good quality and very low resolution) and then from the first episode (old artstyle) and last episode (new artstyle) of the two series I extracted frames

However, captions are kept very simple and formulaic because I didn't want to put so much effort into such a joke model. also I still have to finish korra v2.0.

I don't know if these simple captions aren't perhaps too simple and the training won't deliver good results in the end because of that, but we'll see.

the model will of course be called "greatest-diffusion".

stone garden
ripe sleet
crimson wasp
#

does anybody have any thoughts about whether it would be ideal to train individual tokens for each variant of an object such as clothing, or train one token for the common object and then modifier tokens which are paired with it. Such as design1 + high_neck_shirt, design2 + high_neck_shirt, design3 + high_neck_shirt, etc, or do each one as its own token?

real thicket
# final matrix anybody know of a way to batch remove watermarks and subtitles from images (vide...

If you have a bit of scripting/coding skill - there is https://github.com/mindee/doctr - script it to find the text, creating a bounding box at the location, and then script inpaint to fill it
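
The middle step of that idea (turning detected text boxes into an inpainting mask) can be sketched without any OCR library. The code below assumes the OCR step yields boxes as relative coordinates `((xmin, ymin), (xmax, ymax))` in the range 0 to 1; the helper names, the padding value, and the example box are made up for illustration, and the OCR and inpainting calls themselves are left out.

```python
import math


def to_pixel_box(rel_box, width, height, pad=1):
    """Convert a relative box to a padded pixel box (left, top, right, bottom), clamped to the image."""
    (x0, y0), (x1, y1) = rel_box
    left = max(0, math.floor(x0 * width) - pad)
    top = max(0, math.floor(y0 * height) - pad)
    right = min(width, math.ceil(x1 * width) + pad)
    bottom = min(height, math.ceil(y1 * height) + pad)
    return left, top, right, bottom


def build_mask(rel_boxes, width, height, pad=1):
    """Return a height x width grid with 1 where text was found (to be inpainted), 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for rel_box in rel_boxes:
        left, top, right, bottom = to_pixel_box(rel_box, width, height, pad)
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = 1
    return mask


# A subtitle-like box in the lower half of a tiny 8x8 "image".
subtitle_box = ((0.25, 0.5), (0.75, 1.0))
demo_box = to_pixel_box(subtitle_box, 8, 8)
demo_mask = build_mask([subtitle_box], 8, 8)

print(demo_box)
for row in demo_mask:
    print("".join(str(cell) for cell in row))
```

The padding is there because subtitle outlines and watermark edges often extend slightly past the detected box; the resulting mask would then be passed to whatever inpainting tool is used.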


real thicket
#

with its own token

#

otherwise the concepts can bleed into each other with a model such as clip

#

BERT the neural network can separate out the various meanings more easily

crimson wasp
#

But yeah I suspect separate tokens would be better

real thicket
#

well in that case you can train a red silk blouse

#

and the color would go in red, the silk material property would get mapped to silk, and the shape information would be mapped to blouse

#

it wouldn't provide any benefit to do red_silk_blouse likely - unless you have different shades of red

#

and different silk textures

#

that need to be kept distinct

crimson wasp
#

It's just that then if you want other tokens to sometimes modify the shared concept it gets harder, e.g. you might have an extra long neck token which combines with the high neck shirt token sometimes, so having one token which is modified by variation tokens (such as materials and colors) seems better in that case

real thicket
#

shape modifiers are more difficult - and should likely be trained separately

#

long_sleeve_shirt and short_sleeve_shirt

#

should be different concepts

#

(it can learn long and short, but they are such varied modifiers)

#

(that it is easy to confuse the model)

#

or long_sleeve shirt and short_sleeve shirt

#

also depends on how specific you want the concepts - if you are doing it for a fashion catalog

#

from a specific company - i'd train each separately

crimson wasp
#

Yeah that would be a fairly simple case I think, but then if you've got say a costume helmet, which is also sometimes damaged in some scenes, it gets less clear if you'd want to train one costume token and then also sometimes a damaged token which can precede it, to be used in combination. Then you might also have a raised visor token, rather than trying to train each concept separately with so much crossover of the core concept

real thicket
#

since the preexisting concepts might be different enough

#

also realize that most preexisting concepts in CLIP have text they are pointing to also

crimson wasp
#

The current plan is to train each token in textual inversion first, then insert into the model

real thicket
#

depends on the fidelity you want also - do you want your specific damage or the concept of damage in general?

crimson wasp
#

to minimize destruction to the unet as the finetuning happens

real thicket
#

if you want it damaged to stick to your concept, then it needs to be trained on your token more specifically

#

if you want to use the concept of damaged, then you might want to first train the undamaged image, and test to see how effective it is

#

with just prompting

crimson wasp
#

Probably going to take a lot of experimentation to know for sure, but yeah both approaches seem valid to me in theory

#

actually the 4 elements model somebody made for Legend of Korra might answer my question. They used different tokens for the hairstyles and outfits which could be varied along with a token for the character, and a token for the art style.
So they acted like modifiers to the same core 'concept', which was the Korra character, but were all learned together https://huggingface.co/ai-characters/4elements-diffusion

prime rivet
#

However I recommend choosing a colour that isn't in any way relevant to the subject. As in, a white shirt should be on black or a primary colour.

#

Black shoes on white... etc.

split acorn
#

oo that's a good idea

drifting hawk
#

To generate images from a data set of 700k images, how many epochs should I run and also should I start from a pretrained_version or start from random weights?

final matrix
upbeat light
ripe sleet
final matrix
prime rivet
# ripe sleet Would you say that embeddings are more worth to train when wishing to invoke a s...

You can't do specific human subject with TI. Unless you think the model has data of that specific human in it.

To get the likeness of a specific person you need a Dreambooth implementation. HOWEVER you will get the best results and editability with DB first, then using an embedding to reinforce it. But in this case - as far as I know - you must ensure that the DB model is in the sweet spot region where the Unet has the images in it, but they are not dominating it.

#

Training methods are not exclusive, and you shouldn't think of them as such. They should be thought of as complementary. Especially with 2.1.

#

Example: You can add a great variety of... I don't know types of Diapers (I'm still on my quest to make perfect Drump as a angry toddler in nappies without using DB - why? Because why not no one has given me a better challenge yet). However if you want to get a specific nappy from the set, you can TI train to get that specific one.

#

So in a sense you can just Unet train things and not touch the encoder, then TI those things out of it. This way you can keep the model integrity as high as possible without messing up the text encoder or Unet, but still get more specific results.

hearty laurel
ripe sleet
obsidian sand
#

Why can't I train an embedding on colab? It always gives me a bunch of error lines.

final matrix
rustic sedge
#

how can I initiate image generation with the wanted prompt?

spice solar
split acorn
#

The exception being if the face/style is wildly different than what the model has. That, of course, likely wouldn’t work well.

round hare
#

Hi, sometimes I'm pretty happy with the flexibility of my model, but not completely with the likeness. If I push the training further, I lose flexibility with artists in the prompt but gain likeness. Would it make sense to resume a training that has good flexibility, without the text encoder, just to gain likeness?

final matrix
spice solar
prime rivet
#

So if you wanted to just like reinforce the language of the model like show them pictures of a potato so it would learn to name that picture "a potato" then you'd only train text encoder. You can use this to redefine vague things like a difference between two very similar thing. I don't know... White T-shirt and white button shirt. you'd show it White T-shirts and train the text encoder with the term "Shirt".

#

The context training is there to make sure that you are less likely to fuck up the words relating to the custom concept you are trying to train.

#

But you shouldn't mess around with text encoder much, since training it, you can force every token in the model to find only the thing you trained and nothing else.

echo matrix
#

what is the best way to do styles in dreambooth? i find tons of YT videos and written guides how to train people but nothing on how to train styles

elder stream
#

Does fine-tuning make sense if you don't have captions? I am not familiar with all the options, but I am thinking of something like using clip to generate captions and then using those is one. Does this approach work well? Is there a better option?

lapis rivet
#

Are there descriptions for each sampler?

honest nexus
#

after I train a 2.1 custom model, should I change config on the yaml file or just duplicate from another model?

split acorn
#

finetuning being that it's training on each of the tokens and DreamBooth more so training how to produce your Instance Token, is how I understand it alicatHm2

elder stream
grave carbon
#

I am training with Lora but I think there's too many steps?

#

I put 150 per image but it seems its trying to do way more

amber hill
grave carbon
#

Although I have that number of reg images I believe

#

Which shouldnt have anything to do? Right?

grave carbon
#

hmm it seems it trained with my regularization images. I don't know why it would do that though,

grave carbon
#

I think it was because my regularization images were inside a folder which is inside my instances images.... I moved the folder to another directory now and trying again.

#

yeah because I have 360 reg images.. and 18 input
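
The step inflation described above can be checked with simple arithmetic. This is a minimal sketch, assuming the trainer multiplies "steps per image" by every image it finds under the instance folder, nested folders included; `total_steps` is a made-up helper, not part of any extension.

```python
def total_steps(steps_per_image: int, num_images: int) -> int:
    """Total training steps when the trainer runs a fixed number of steps per image."""
    return steps_per_image * num_images


instance_images = 18
reg_images = 360  # regularization images accidentally nested inside the instance folder

intended = total_steps(150, instance_images)
inflated = total_steps(150, instance_images + reg_images)

print(intended)  # 2700
print(inflated)  # 56700
```

If the step counter shows something close to the second number, the regularization folder is probably being picked up as instance data.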

round hare
#

does anyone use dreambooth on 1111 on runpod? I can run a training, but if I cancel it, I'm not able to start a second one, it doesn't do anything and I have to restart everything.

small trench
#

I tried to combine a number of complex concepts in one image today and it was a disaster. So I tried having Superman wear the Infinity Gauntlet, while holding Mjöllnir and wearing four Lantern Corps rings. How could the model be finetuned to give a coherent output for something like that?

buoyant wigeon
wraith socket
#

Not sure if this is the right place for this - I've just picked up playing with this again after not using it for a couple months, I'm using the Automatic1111 github UI trying to train a textual inversion embedding for a friend. But all the results using the trained embedding just end up looking exactly like slightly fudged versions of the input images. I didn't have this problem on any of the past embeddings - I'm using the 1.4 model (not sure what version my original attempts in October were using) has something changed with how to get decent results out of the training process?

summer scarab
#

when training in dreambooth extension , can you add your captions to the image instead of creating text files ... for example the first image that I would be training on would be called " photo of (randomname) wearing an orange hoodie" and image two would be " photo of (randomname) wearing a black shirt in front of tall bushes "

full knot
#

anyone have issue with training without prior on shiro db ?

full knot
#

i'm wondering if the lastben dreambooth haven't changed "too much"

#

an 800 step style with around 200 instances already overfits

tight heart
#

I remember a learning rate in the order of 1e-6. But now the default seems to be 2e-5.

obsidian sand
full knot
#

yeah tried to 1e-6 back but float error x)

#

on default shiro at 800 steps it reflects a lot more what i want

#

do you know if for concepts i create a new token for it or i chose an already existing one ?

#

like aravcreature or simply creature ?

#

i'll try both

orchid nexus
#

So ive watched a few dreambooth tutorials for 1111. And every seems to be getting 7-20min training times. But I am getting about 2+ hours for mine.

Im running on a 3090

#

Any ideas why my training time is so high?

full knot
#

14k steps ?

#

never put my hand on dreambooth for automatic but if thats the same steps we are talking about its a way too much

obsidian sand
obsidian sand
full knot
#

i see thanks, so with concepts we can't really differentiate / use a different keyword to trigger it ?

obsidian sand
#

Yes we can't

full knot
#

like training a style A in concept A and style B in concept B ?

hot breach
#

if you use captions on each image you can label them further to indicate as many "things" per image as you want, i.e. both the character name and a style, ex "a painting of bob smith by claude monet" or "bob smith in a cyberpunk outfit riding a motorcycle"

#

dunno if auto works like that, but traditional "dreambooth" typically uses a pretty simple label per "concept" instead of full labels/captions per image so you can't capture all that

full knot
#

for scenes yea i almost understood, but i'm trying to train two different styles but i'm maybe wrong on how i should do it

#

its like training a all monet painting and all tristan eaton ones, not merged but each separate triggers

obsidian sand
#

I think if you training art style you don't need to use concept images.

orchid nexus
#

How do I lower my actual steps?

full knot
#

maybe thats the embedding / hypernetwork if i want separate, not sure if we can do it

obsidian sand
obsidian sand
#

I've tried many times on colab and still failed

light solar
#

Hi, I am learning and experimenting a fine tuned model called infinitum diffusion based on stable diffusion 1.0

Still in process.

https://twitter.com/Felipe3DArtist/status/1607412835700678657

Hey! long time I haven't posted something. I am working on a fine tuned version of #StableDiffusion 1.0 called #InfinitumDiffusion. Still in progress :D

Here are some samples of what this model can do.

#AIart #AIArtwork #AIartists @EMostaque

fast crater
#

when you're creating a hugging face page for your embed, how do you add images to the readme? only ever used Civitai to share up till now

stoic shoal
#

Hi! Can someone please help me?
I have 780 images (photos, already cropped 512x512) in a certain style that I want Stable Diffusion to remember and generate similar images.
I've used the Automatic1111 webUI and the dreambooth extension to train my model two times, and both times I couldn't get any results that look like the images I trained the model with.
I've been trying to follow tutorials on youtube, but there was no info on my specific setup.

I'm getting really tired with it, my last training lasted for 30+ hours using training wizard (object/style). I'd be really happy and grateful if anyone could assist me or give directions.

silent holly
#

Not sure if this helps but I recommend starting small and following existing guidance. Find a video or tutorial that walks you through training a single subject from 10 images using a class word. Play around with that until you get results you like. Then try moving up to 50 and seeing how it changes the outcome. When you are feeling comfortable consider training by labelling each image using BLIP or something. I haven't seen a single effective guide that got me great results the first time.

orchid nexus
#

Any one get dreambooth to work with SDv2 on automatic1111?

crimson wasp
#

does anybody know if we could theoretically merge models just with the parts which are activated by various tokens? Gradient descent seems to work out which parts are responsible in the model, and quite a lot of the model seems to get left alone more or less, so in theory if we had say a finetuned style model and a finetuned object model, could we extract the activated parts from each and add them to a base model? Maybe averaging the diff of each where they both alter the same parameters

split acorn
#

I've merged the dreambooth training with other models before to pretty good success. I just did a "New model" (A) + (DreamBooth - Whatever base model), with Add Difference of 1
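
For reference, the "Add Difference" merge described above works out to A + (B - C) × multiplier per weight. A minimal sketch of that arithmetic, assuming each model is a plain dict of weight name to list of floats (real checkpoints hold tensors; `add_difference` and the toy values are illustrative only):

```python
def add_difference(a, b, c, multiplier=1.0):
    """Return A + (B - C) * multiplier for every weight name present in all three models."""
    merged = {}
    for name, a_weights in a.items():
        if name in b and name in c:
            merged[name] = [
                wa + (wb - wc) * multiplier
                for wa, wb, wc in zip(a_weights, b[name], c[name])
            ]
        else:
            # Keys missing from B or C are copied from A unchanged (an assumption of this sketch).
            merged[name] = list(a_weights)
    return merged


new_model = {"w": [1.0, 2.0]}    # A: the model you want to move the training into
dreambooth = {"w": [0.5, 0.75]}  # B: the DreamBooth-trained model
base = {"w": [0.0, 0.25]}        # C: the base model B was trained from

print(add_difference(new_model, dreambooth, base))  # {'w': [1.5, 2.5]}
```

With a multiplier of 1, everything the DreamBooth run changed relative to its base gets added on top of A.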

#

not sure what would happen if you did the style + object though alicatHm2

#

would be super interesting to find out

proper mantle
#

what is the difference between diffusers textual inversion
and automatic1111 textual inversion
in automatic1111 i saw a shape of (8, 1024), whereas for diffusers it's just (1024)

red breach
#

For dreambooth training on an art style, should I preprocess the training and regularization images with captions? Should those captions include the style token like "newstyle-12" -> eg a caption like "painting of a landscape, in the style of newstyle-12".
And for the reg images, having captions without the "newstyle-12" word in it?

hearty spoke
#

Guys, have a general question about model training.
What's the difference between dreambooth, finetuning and hypernetworks?
If I understand correctly they all use same main.py file underneath but with different settings for training. So we can say they basically the same?
Biggest difference it seems that dreambooth uses custom prompt and optional class, while finetuning can be done without one, in this case it just changes output for all prompts. Also you can make captions for each train image in ft, but this was already implemented in lastben dreambooth. So I think there is less and less differences between all these methods.

hard peak
#

Is it possible to train SD on a particular pose?

#

Or what about a prop? Ex: Train SD on an AK-47 and be able to generate images of people holding it properly

coarse hemlock
#

having problems training a hypernetwork, just getting black squares as the output, training data is deffo fine (works fine for TI) - does anything specific need to be set to create a 2.1 hypernetwork?

chrome valve
#

I have problem, with finetune sd2.1, using dreambooth with automatic1111, i am getting bad results, and base model gets wrecked, cant draw anything, all prompts starts giving almost exclusively greyscale images

#

any idea what could be wrong ?

gloomy nexus
#

love to see it

#

only 3000 steps with 28 images in the dataset

#

painge

#

If I wanted to train Goku per se, would I probably need to train two separate embeddings for Base and Super Saiyan?

round hare
#

Hello, I need some help on LR schedulers in automatic 1111. I tried polynomial, but I don't really understand how the LR decreases over time and how to set a min learning rate. Same for cosine, how can I tell the training not to go under a learning rate of 1e-6 for example? In the same way, how does the scale learning rate option work?
If you have any good resources, please share.
thanks

upbeat light
#

.0025:50, .0002:200, 1e-4:400, 1e-5:600, 1e-6:1000, 1e-7:2000, 5e-8:3000,1e-8

#

it does .0025 for 50 steps, .0002 until the 200th step, 1e-4 until the 400th, etc....
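
The stepped syntax above can be read as "use this rate until that step". A rough sketch of a parser for it follows; `parse_lr_schedule` and `lr_at_step` are hypothetical helpers rather than the webui's own code, and treating the boundary step as inclusive is an assumption.

```python
def parse_lr_schedule(text):
    """Parse 'rate:step, rate:step, ..., rate' into (rate, last_step) pairs.

    A final entry without ':step' gets last_step None, meaning 'until training ends'.
    """
    schedule = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            rate, step = part.split(":")
            schedule.append((float(rate), int(step)))
        else:
            schedule.append((float(part), None))
    return schedule


def lr_at_step(schedule, step):
    """Return the learning rate used at a 1-based step (boundaries treated as inclusive)."""
    for rate, last_step in schedule:
        if last_step is None or step <= last_step:
            return rate
    # Past the last boundary with no open-ended entry: keep the final rate (assumption).
    return schedule[-1][0]


schedule = parse_lr_schedule(
    ".0025:50, .0002:200, 1e-4:400, 1e-5:600, 1e-6:1000, 1e-7:2000, 5e-8:3000,1e-8"
)
for step in (1, 50, 51, 400, 5000):
    print(step, lr_at_step(schedule, step))
```

Printing a few steps like this is a quick way to sanity check a schedule before committing to a long run.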

#

Might be the same syntax for whatever part of the app you're in for your training learning rate.

#

hypernetwork, finetuning, TI ,and dreambooth numbers will be VERY different

iron depot
#

Would be great if someone could answer my little question. I've been trying to train an art style as embeddings. I heard that when it comes to style it is suitable to train on hypernetwork instead but I found it very difficult to get a good result when it comes to training hypernetwork. I want to ask if its possible and if it is, what should type in initialization text? Would be also great if someone who is experienced in training embeddings answer this.

waxen gulch
#

taro
/

tough flame
#

Do you have good videos on training and fine tuning?

neon patrol
#

Anyone have any tips on training a model with a class of objects rather than a singular one? For example, to fine-tune for the generation of character concepts, or monsters/creatures, not just one character.
Despite using hundreds of images my results have been quite mediocre so far.
I've used learning rates of 1e-6, 2e-6, have used steps anywhere between 100-250 per image, have tried with and without classification images. Am training the text encoder.

#

You don't use steps per instance image?

neon patrol
#

Exactly

#

Is there a quality difference between using Dreambooth ckpt vs diffusers?

white current
#

Could anyone help me with training? I have thousands of images, a lot of time, but only an RTX 3060 Ti with 8GB VRAM

solid hemlock
#

Hi guys! Is there any guide somewhere on how to train an embedding of a face on automatic1111? I can't get good results 😦

iron depot
solid hemlock
solid hemlock
obsidian sand
#

No problem 👍

split acorn
#

trending on artstation, iirc, tends to result in a lot of cropped pictures because "trending on" has thousands of pictures of t-shirts in the LAION dataset

#

See?

#

don't do it GoatUppies

lone wind
#

hi all - just wondering - is the depth checkpoint not great with faces or is it more due to the fact my original image doesn't have enough definition for it to understand? I keep getting blursed faces that look like bad captchas from 2008

oak ermine
#

I have been trying to train an embedding (textual inversion) for a style, and the sample images that it outputs during training look perfect. But when I start trying to use the embedding on original prompts, it doesn't look good at all. Is that a limitation of embeddings, or am I likely just doing something incorrectly?

red breach
#

Does Dreambooth training require to preprocess the training set with captions?
Some guides suggest preprocessing the trainingset like its required for textual inversion but others do not mention that.

split acorn
#

Can help a lot depending

split acorn
#

For early dreambooth training, people were doing it without captioning and it still seemed to work well in some instances CB_nod

red breach
split acorn
#

Yep!

#

Just make sure the name of the txt file matches the image

#

image1.png image1.txt

red breach
#

perfect, thanks. So basically I can use the same training set that I am using for textual inversion

split acorn
#

yep!

#

mmm

#

For textual inversion, I had better luck captioning everything EXCEPT the subject itself (basically, the details I want the model to learn)

#

For dreambooth you can do that too, but it also seems to work well if you include ALL the information in there, as well

red breach
split acorn
#

mmm

#

I'm not exactly sure! I've only been heavily researching subject training, personally

#

one sec

#

It's a bit outdated, but the information remains largely relevant

red breach
#

thanks! I will check that out

red breach
#

@split acorn one more question, do you know if the filename matters? since captions are in the extra txt files it should not matter right?

split acorn
#

mmm, most repos have it so that if you have a txt file with the same name, that takes priority, so the filename wouldn't matter. Some repos have it so that if you don't have a txt file, then it'll use the filename as the caption instead

#

I wouldn't use any special characters in the filename

#

just keep it simple if you're using a txt file

red breach
#

Ok I see, do you know how the automatic 1111 dreambooth extension handles it? (I can also check the code but if you already know it saves a lot of time haha)

split acorn
#

it uses the txt method, not sure if it falls back to image captions if there is no txt

#

I don't think it does tho

unborn wind
#

Anyone ever try fixing an overtrained model by merging it with a different model with SD2.x?

hot breach
#

alternative thinking ahead is to try to get multiple checkpoints as you train, so if the last one is overtrained you can go back to an earlier file

unborn wind
#

I was messing around and merged it with redshift and the results are turning out surprisingly way better than I expected.

unborn wind
#

Maybe try highres fix, or lower the cfg scale?

coarse hemlock
#

for those purists that want to use rare tokens that don't get split by the tokenizer, I put this list together, it's not every combo that works, it's most of them though

#

granted some will be rarer than others

#

just test by using them alone in a prompt if the results are super random, it's a rare token

split acorn
#

And another technique is, generate with a high CFG (it'll have the overfit glow) and then img2img with a low cfg

#

(for subjects). Would be neat to try for styles, not sure though.

stone garden
coarse hemlock
#

I got chatgpt to write a python script that made every possible combo and fed that into a tokenizer web app then threw that result into a text file and got chatgpt to make another python script to extract every contiguous three letter combo in the result and put that into another text file
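
A simplified variant of that idea, not the exact scripts mentioned above: generate every three-letter combo and keep the ones a tokenizer maps to a single token. The `tokenize` argument is a hypothetical callable (string in, list of tokens out) that you would have to supply from whatever tokenizer your model uses.

```python
from itertools import product
from string import ascii_lowercase


def letter_combos(length=3):
    """Yield every lowercase string of the given length ('aaa', 'aab', ...)."""
    for letters in product(ascii_lowercase, repeat=length):
        yield "".join(letters)


def single_token_candidates(tokenize, length=3):
    """Keep combos that the supplied tokenizer maps to exactly one token.

    `tokenize` is a hypothetical callable: str -> list of tokens.
    """
    return [combo for combo in letter_combos(length) if len(tokenize(combo)) == 1]
```

Not being split only means the combo is a single token; whether it is actually rare still needs the "prompt it alone and see if the results are random" test.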

stone garden
coarse hemlock
stone garden
#

or something else

coarse hemlock
regal harbor
#

I want to train concepts of

1 - emotions (laughing, crying, afraid, angry)

2 - train the model to position better (playing sports, fighting/wrestling)

should I train 2 completely different models and merge later? Or could I even train both at the same time (using captions)... so a single training set has a picture of a closeup angry face, a picture of 3 people fighting, a picture of someone doing a backflip, a picture of someone crying while lifting weights etc...

I feel like training it all at once, with natural language that describes all elements of the image I'd like to train makes the most sense?

indigo orbit
#

I want to train SD so I can apply different costumes and accurate ice skates for my characters. What's the best approach to do that? Dreambooth or textual inversion?

royal kayak
#

I keep getting this error when I try to generate a picture from a model that I merged, any ideas?

Traceback (most recent call last):
File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
output = await app.blocks.process_api(
File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 983, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 930, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\components.py", line 3308, in postprocess
file = processing_utils.save_pil_to_file(img, dir=self.temp_dir)
File "D:\stable-diffusion-webui\modules\ui_tempdir.py", line 18, in save_pil_to_file
shared.demo.temp_file_sets[0] = shared.demo.temp_file_sets[0] | {os.path.abspath(already_saved_as)}
AttributeError: 'Blocks' object has no attribute 'temp_file_sets'

white current
split acorn
#

DB is a lot faster, from what I've experienced, but does require more vram

indigo orbit
#

Can I do it on 6gb?

split acorn
#

TI takes 2 to 2.5 GB of VRAM on top of the normal VRAM usage of SD

#

For 512 so

#

Maybe if you did something smaller like 256 x 256?

indigo orbit
#

Do all my images have to conform to the same image dimensions?

split acorn
#

It would out of memory me if I wasn't careful with 8 GB (for 512 TI training)

#

When I did 256 training, I made all my images that size, yeah

fast crater
#

Has anyone found the solution to training an embed on people who are not skinny but not fat either?
Both myself and a mate are not fat but not ideal weight either, but the embed training seems to blow it out of proportion. Doesn't happen in dreambooth training though.

If anyone knows a solution, please ping me

tidal cliff
#

is it better to train an embedding of a person with a variety of pictures from my cell photos? like, at the beach, in a restaurant, etc... vs just getting the subject to stand in front of a white wall and I take 20-30 photos in a perfect pristine environment with no other stuff in the image?

#

im trying to get embeddings for my kids to put them in funny stable diffusion pics

#

i have 100's of photos of them from my cell phone, but you know... at birthday parties, playing in the yard

#

or i could have them stand in front of a white wall and take a bunch of new ones

vivid rapids
tough rampart
#

Question
Are styles better for TI or HN?

split acorn
#

HN, imo

#

I really like TI for subjects though alicatUwU

#

I still need to do more style research, so take that with a grain of salt CB_nod (I'm currently focused on subjects)

stone garden
split acorn
#

Mmm one sec

#

Well

#

I only have settings for anime + subject alicatKEK

stone garden
split acorn
#

Yeah, though just keep in mind this study isn't finished and the examples near the bottom work better than at the top. No example images are added yet though

#

But it should give you an idea

white current
#

@tropic quail

split acorn
#

As per before, it's not done yet

#

just some examples

#

For Initialization text, describing as much of your subject as possible seemed to give much better results

#

at least the parts of the subject that you never want changing

tropic quail
white current
#

WebUI

tropic quail
#

yea

split acorn
#

I'd never use "once" for the latent sampling method

#

Bold is used to determine what setting is being tested

#

though I still need example pictures or it's not as helpful and why it's not done alicatKEK

stone garden
split acorn
#

it shouldn't take that long

#

10k steps should take less than 45 min (with GA steps of 1)

#

I think you might have the slow iteration issue that others have been experiencing with the 4090s

#

there's a fix though, one sec

#

might help!

white current
#

launch your webui

#

go to extensions tab

stone garden
white current
#

@split acorn Could you help me with Finetuning models on DB + Lora?

split acorn
#

Gradient Accumulation steps will greatly increase the training time

#

as a heads up

white current
#

I am already training one but still dont know that many details

split acorn
#

I have no experience with Lora, but I have a lot with DB and subjects

#

what was the question? or

white current
#

Okay so

#

About overfitting

#

does having 2K images, a lot of time, and low learning rate + high step per image help avoiding it?

tropic quail
split acorn
split acorn
#

low learning rate is a good way to avoid it though, more images isn't necessarily better

white current
#

This was my results from the test run:

stone garden
split acorn
#

nono

#

whoever did that guide is

#

questionable

stone garden
#

haha

split acorn
#

Gradient Accumulation is there to avoid VRAM restrictions

#

in exchange for longer training time

#

if you can get away with having it all on batch, that would be better than using gradient at all
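
To see why gradient accumulation stands in for a bigger batch, here is a toy sketch with a one-weight model and made-up data: averaging the gradients of two micro-batches of 2 gives the same update direction as one batch of 4, it just takes two forward/backward passes to get there.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0

# One large batch of 4 samples (needs the most memory).
full_batch_grad = grad_mse(w, data)

# Batch size 2 with 2 gradient accumulation steps: average the micro-batch
# gradients, then do a single optimizer update.
micro_batches = [data[0:2], data[2:4]]
accumulated_grad = sum(grad_mse(w, mb) for mb in micro_batches) / len(micro_batches)

batch_size, ga_steps = 2, 2
effective_batch_size = batch_size * ga_steps

print(full_batch_grad, accumulated_grad, effective_batch_size)  # -30.0 -30.0 4
```

The extra passes per update are where the longer training time comes from.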

white current
#

and any datasets u suggest?

stone garden
split acorn
#

lmaooo yikers

white current
stone garden
white current
split acorn
#

well depending on how many images you're training on

stone garden
#

10 images

split acorn
#

then it wouldn't make sense to go more than 10

stone garden
#

gotcha

split acorn
#

not sure on the best amount though, but ChillBar_shrug

split acorn
stone garden
#

Thanks @split acorn I'll re-run this after this one finishes.

split acorn
#

you could do like 100 per image if you wanted. If you're fine with every tank looking like your tank then regular DB methods are fine without regularization

#

you could also do the finetuning method and avoid using regularization altogether

split acorn
#

Just make sure when you're generating the class images, that you're using the model you're training on and you're using the class/class prompt as the prompt to generate them

split acorn
#

You should be able to get good results with just like 0.0005

#

the other one I tried was "5e-03:200, 5e-04:500, 5e-05:800, 5e-06:1000, 5e-07" but I didn't like the results as much

tropic quail
split acorn
#

obviously the LR would depend on what you're training, how many images, what model you're using, etc. So feel free to adjust as needed and experiment GoatUppies

white current
stone garden
split acorn
#

Or just a different guide haha

white current
#

after you installed it

shrewd wedge
#

does embedding (not Dreambooth) require regularization?

split acorn
#

Nope

shrewd wedge
#

huh ok, then I have no idea why my output images suck shit as if it got seizure

#

it do be looking like this

#

like has anyone experienced this issue before?

split acorn
#

Could try lowering the cfg

stone garden
split acorn
#

Or putting the embedding further into the prompt

shrewd wedge
split acorn
#

Yeah, it should be close to like 4

#

Nono

shrewd wedge
#

oh

stone garden
tropic quail
#

i dont see it for some reason in the tabs though

white current
split acorn
#

Yes, because the image preview is based on either a prompt from your training or the prompt from the txt2img tab. You have to interrupt training to change the prompt if you're using the txt2img option though

tropic quail
white current
stone garden
split acorn
shrewd wedge
#

oh ok i guess i will try that

stone garden
#

@split acorn For your testing so far, how many steps before you start seeing a likeness? On this re-run, 500 steps looks much better. I realized I had set the model in A1111 to something other than what I wanted to train on so that may be part of the reason for the Cronenberg sample pic. 😄

split acorn
#

Not sure how to answer that, tbh

split acorn
#

Depending on the LR it could be less than 1k or more

#

Or your dataset

stone garden
#

I will just wait it out. I wish I could figure out why the it/sec are so slow

white current
split acorn
tropic quail
#

okay

white current
#

First of all

#

do you have your images prepared?

#

(as in, cropped up to 1:1 aspect ratio, and has accompanying text files for the prompt)
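
For the 1:1 crop, the box for the largest centered square is easy to work out by hand. A small sketch (`center_crop_box` is a hypothetical helper; the returned order is left, top, right, bottom, which is the box order Pillow's `Image.crop` takes):

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


print(center_crop_box(768, 512))  # (128, 0, 640, 512)
print(center_crop_box(512, 900))  # (0, 194, 512, 706)
```

A centered crop can cut off a face that sits near the edge, so it is still worth eyeballing the results.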

#

@split acorn One question, could you explain me everything from Instance Token to Class prompts and filewords n stuff?

#

i know them basically but dont know how they affect each other

split acorn
#

Instance token just being a unique token that you use for the subject/style you're training on. Usually people recommend using a rare token (like sks). To test if a token is actually rare, just prompt it and then see if they all look random.

Class prompt being everything but your instance token, to help it learn what the subject/style actually is.

Filewords just meaning having a text file or filename that contains a prompt that describes what the picture looks like. One method being to describe everything and the other method to just describe everything EXCEPT the subject/style.

#

So a class prompt could be
Photo of person
Instance token being olis
Class token being person

#

Instance prompt being
Photo of olis person

#

Or

#

Instance token, [filewords]

#

And [filewords] as the class prompt

#

Is how I understand it
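
Put together, the two setups above look something like this. It is only an illustration of how the pieces fit: `build_prompt` is a made-up helper that swaps the `[filewords]` placeholder for one image's caption, and the caption text is invented.

```python
def build_prompt(template, filewords=""):
    """Replace the [filewords] placeholder with the caption read for one image."""
    return template.replace("[filewords]", filewords).strip()


instance_token = "olis"
class_token = "person"

# Setup 1: fixed prompts, no captions.
instance_prompt = f"photo of {instance_token} {class_token}"
class_prompt = f"photo of {class_token}"
print(instance_prompt)  # photo of olis person
print(class_prompt)     # photo of person

# Setup 2: caption-driven prompts.
caption = "standing on a beach at sunset"
print(build_prompt(f"{instance_token}, [filewords]", caption))  # olis, standing on a beach at sunset
print(build_prompt("[filewords]", caption))                     # standing on a beach at sunset
```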

white current
#

tysm

#

@frank urchin

frank urchin
#

hello

white current
#

alright so

#

i assume you have webui

frank urchin
#

indeed

#

(ive been using it for a while now so i know the basics)

white current
#

Go to extensions tab, press available sub tab, press load

frank urchin
#

just wanted to mention that

white current
#

nice

frank urchin
#

mhm

#

whats next

white current
#

search for "dreambooth" and press install

frank urchin
#

got it already!

white current
#

nice 😄

#

now

#

press Apply and Restart UI in the Installed tab

frank urchin
#

got that done too

#

sorry im really ahead of you LOL

white current
#

now completely restart webui

#

lol

frank urchin
#

whats next

white current
#

open the dreambooth tab

#

now

#

on the "create" section

#

give a name to your model

stone garden
#

Silly question for you @split acorn if you are still there or if anyone else knows. I trained a TI subject and the likeness is good, but it was trained against the standard 1.5 model. If I wanted to add it a custom model as part of the prompt will I be able to get the same likeness or similar as long as the original model was also 1.5?

white current
#

on the source checkpoint, select the model you want to finetune your new model on

frank urchin
white current
frank urchin
#

can you train embeddings as well?

#

or is it just models

white current
#

yeah but i am not experienced on that

frank urchin
#

ah ok

#

cuz im mostly just going for a little style

white current
#

models are better imo

frank urchin
#

ok yeah lets continue

shrewd wedge
frank urchin
#

ill use elysium as the source

white current
#

alright

#

select the scheduler as "euler", do not know if it affects or not, but if it does, euler is best

frank urchin
#

done

white current
#

press create

#

and wait 2-3 minutes

stone garden
#

and then your food is all heated up! :D

frank urchin
#

LMAO

shrewd wedge
split acorn
split acorn
frank urchin
shrewd wedge
split acorn
#

DDIM for realistic, seems to work well

white current
shrewd wedge
white current
#

now we prepare your images

shrewd wedge
#

idk if that's normal or smth, but when I use that embedding, it makes those kinds of images

#

which is very annoying

stone garden
white current
shrewd wedge
frank urchin
#

i crop them to 512x512 right?

white current
#

768 or 512

frank urchin
#

512x512 (pretty sure)

white current
#

alright 512 then

frank urchin
#

99% sure

#

ok

#

and i forgot how to get the prompt thing

#

txt files

white current
#

you do it through the train>preprocess images tab

#

there are 3 options with prompts

#

how many images you have?

frank urchin
#

getting them rn

#

how many should i have?

#

and how should i name them?

white current
#

depends

#

well

frank urchin
#

im trying to just base it off a character

white current
#

name does not matter if they aren't neatly named

#

if they are

#

just keep them

frank urchin
#

can i do like 1, 2, 3, 4

#

for each file?

white current
#

you can ofc,

frank urchin
#

ok lovely

white current
#

so how many images are we talking about approx

frank urchin
#

is 20 enough?

stone garden
#

20 images are often more than enough. If it's for one character that is

frank urchin
#

oh then is like 10 enough?

white current
#

yes

#

yes

#

if you have 10, it is best to manually describe each of them, using as much detail as possible, in a text file with the same name as the image

#

(if you have 2048 images like me you have to automate it with some clever tricks lmao)

frank urchin
#

that is a lot

white current
#

i consider it low lol

#

i more like to train big stuff

#

working on learning image scraping and stuff

frank urchin
#

i dont even really need to make a model

#

just for fun

#

i like learning random things

white current
#

nice

#

same here lul

#

i just want to draw good tanks

#

and vehicles

stone garden
#

depends on the style of the vehicles, then I believe there are quite a lot of models which can draw them :)

split acorn
#

Have you tried fine-tuning?

white current
split acorn
#

Seems to be nicer for bigger datasets and training various things like tank types

#

Nah

white current
#

oo

frank urchin
white current
split acorn
#

EveryDream is an example

frank urchin
#

perfect

white current
#

@split acorn how do i do it oo

split acorn
#

Mmm one sec

white current
#

vision models?

frank urchin
#

sorry bout that

split acorn
white current
frank urchin
#

now for the whole prompting thing

#

how does one get the txt parts

white current
frank urchin
#

mhm

split acorn
#

both are very easy and are good for getting an idea for caption training instead of DreamBooth style training

frank urchin
#

oh and then am i writing in the txt what the prompt would be?

#

like to describe the character basically?

white current
#

and then rename the txt file to same as the image

frank urchin
#

and do i write that as the file name or inside the txt?

#

oh ok got it

#

thank you

split acorn
#

There are two methods if you're doing TI or DreamBooth, that I've had good luck/experience with:
Method 1) Describes all aspects of character & background
Method 2) Describes all aspects OTHER than the character

white current
#

if image is image.jpg text file should be image.txt
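A minimal sketch of that naming rule, in case it helps: it lists images that have no same-named `.txt` caption next to them and can create empty ones to fill in by hand. The function names and the suffix list are made up for illustration; they are not part of any training tool.

```python
from pathlib import Path

# Assumption: these are the image types in the dataset folder.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def missing_captions(dataset_dir):
    """Return image paths in dataset_dir that have no same-named .txt file."""
    folder = Path(dataset_dir)
    images = sorted(
        p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [p for p in images if not p.with_suffix(".txt").exists()]


def create_empty_captions(dataset_dir):
    """Create an empty caption file next to every image that lacks one."""
    created = []
    for image in missing_captions(dataset_dir):
        caption = image.with_suffix(".txt")  # image.jpg -> image.txt
        caption.touch()
        created.append(caption)
    return created
```

Running `create_empty_captions("my_dataset")` on a hypothetical folder leaves existing captions alone and only adds the missing files, so it is safe to rerun after adding more images.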

frank urchin
#

this look good?

vale egret
#

For TI you should use filewords to describe every part of the image that you don’t want it to learn

split acorn
#

@vale egret that's method 2, yosh

frank urchin
#

now im very confused 💀

split acorn
#

That's using Method 1

frank urchin
#

what does TI mean 🥲

#

im sorry im new to training stuff

split acorn
#

Textual Inversion

frank urchin
#

OH

#

oops

vale egret
#

The point of TI is for it to learn the commonalities between your training images. If you’re training a female character X and you put girl in the filewords, you’re saying “this is what X looks like as a girl”

frank urchin
#

ahhh

split acorn
#

yep

vale egret
#

If you remove girl from filewords, it will learn that X is always a girl

split acorn
#

Going from method 1 to method 2 just changes the flexibility and how you prompt to get the results you're looking for

#

both can work though

#

Just method 2 seems to give more consistent results

#

Or like

#

it's much easier to prompt

#

and less forcing haha

vale egret
#

Method 1 using auto captioning always gave me crappy results

split acorn
#

yeahhhh auto captioning doesn't seem to work well enough

#

like it's a good base start

#

but for smaller datasets especially, I think it's important to check them

#

take out the irrelevant parts and add in the missing parts

#

you get WAY better results that way

#

bad data in bad data out

vale egret
#

Like for characters make sure to describe the environment, pose, background, any specific clothing and lighting
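As a rough sketch of the "method 2" cleanup described above (keep the scene, drop what describes the character), assuming captions are comma-separated tags. The function name and the example tags are hypothetical, and this only automates the removal half; adding missing details still has to be done by hand.

```python
def strip_subject_tags(caption, subject_tags):
    """Drop comma-separated tags that describe the subject itself.

    Everything left over (pose, background, lighting, ...) is what the
    caption tells the trainer NOT to bake into the character.
    """
    blocked = {tag.strip().lower() for tag in subject_tags}
    kept = []
    for tag in caption.split(","):
        tag = tag.strip()
        if tag and tag.lower() not in blocked:
            kept.append(tag)
    return ", ".join(kept)
```

For example, `strip_subject_tags("red hair, ballet dress, standing, lake background", ["red hair", "ballet dress"])` keeps only `standing, lake background`.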

frank urchin
#

this is so tedious i really hope this works 😭

split acorn
#

OH a neat technique: if ALL your pictures share the same simple background, make sure to include that background description in the filewords, and then you can help by also putting it in your negative prompt

#

seems to work really well

#

thoooo, with high CFG you run into issues

#

but

#

it's neat!

frank urchin
frank urchin
#

oh wait that made no sense im just stupid

split acorn
frank urchin
#

💀

#

good to know

untold rose
#

Who here is the master of training a model on a person's face? I followed all the steps in the instructions, but they don't cover LoRA. I made a checkpoint of my wife's face and the generated pics look nothing like her.
Or if you know a channel/person I can ask about it, that would be amazing. I'm new and still trying to understand this whole thing lol. I'm not gonna bore you, I just need some pointers in the right direction.

vale egret
#

I’m sure there are guides online you can follow

split acorn
#

An embedding may look like trash at 7.5, but if you pop it up to like 11, all of a sudden it looks really good!
Or vice versa, at 7.5 it looks like trash, but at 3 it looks really good

#

but that's more so for saving a model/embedding that was like under or overtrained alicatKEK2

vale egret
#

Weird, how does changing cfg compare to reweighting?

split acorn
#

reweighting?

vale egret
#

(:0.8)

split acorn
#

Like dreambooth or caption training?

#

oh

#

gottcha

#

yep!

#

that's another method to help

#

it acts similarly to CFG

#

from my experience

frank urchin
#

that was long

please dont make fun of my obscure character

#

please tell me that looks correct 💀

split acorn
#

though... I will say weighting it doesn't do much if you include the instance token/embedding name as an entry in the beginning of the prompt

#

and I've had much better results adding it after like 5 words instead

vale egret
split acorn
#

yeah

white current
#

WHO SHE

vale egret
#

Also, do we have accurate data on how many training images you need if you crank up the embedding vector count to something like 20 or 25?

frank urchin
#

princess tutu!

#

fun lil anime

white current
split acorn
frank urchin
split acorn
#

there's some information where "you need more images for a higher vector count!" but I have zero idea where that comes from

frank urchin
#

whats next!

split acorn
#

because I've had good luck with higher vector count with not that many images. Just depending on the character and model, etc, but

vale egret
#

The idea is to avoid overfitting

split acorn
#

yeah, but I think it's more complicated than just that

vale egret
#

more vectors and few images could make it obsess over the specifics of those images

#

Idk what the actual data is on what numbers cause that though, which is why I’m asking

split acorn
#

ahh yeah

stone garden
#

my embedding had 16 vectors on 20 images and it really took over the entire scene, but I had stille make it for me as I couldn't so I can't really say it's a normal thing or not :P

split acorn
#

16 vectors worked really well with one sec, checking how many pictures

#

13 pictures

#

but that was anime style with an anime model

#

I imagine the vector count thing with more pictures is more so for self photos and still keeping flexibility

vale egret
#

It’s a general rule of AI that the number of training instances needs to be more than the number of parameters and I don’t see why it would be any different for embeddings

split acorn
#

but I'd be super interested in examples BonGoat

vale egret
#

So 13 images seems a bit low for 16 vectors

split acorn
#

You'd think that, but it worked way better than 8

#

and 32 was way too much

#

I think it's heavily model/subject dependent

#

If it's easy for the model to make vs hard?

vale egret
#

32 vectors 100 images character embedding. Someone needs to try this

split acorn
#

mm mm

#

I'll try that later today

#

not that specifically

#

but testing the relationship between vector count and images, up to like 13 images

#

(because I'm lazy alicatKEK)

frank urchin
white current
vale egret
#

I’ve used 25-35 images for my embeddings

white current
#

sorry got distracted

frank urchin
#

oh np!!

white current
#

anyway so

#

now you have the thing

#

lets go train the ai

frank urchin
#

yes

white current
#

open the dreambooth tab

frank urchin
#

yep

split acorn
#

A big believer of "sometimes less is more" and "quality over quantity" alicatUwU so I try to keep them on the smaller side

#

but 25-35 is still super reasonable depending

#

Nitrosocke being known for their high quality style dreambooths too CB_nod

#

though it's a bit outdated, but still largely relevant

10-100 sample images

white current
frank urchin
#

and i dont have that tab?

#

did i do something wrong 😭

white current
#

settings

#

same thing

frank urchin
#

ah ok

#

next!

white current
#

Training steps per image: 1000

#

then go all the way down to learning rate

#

now this will be the settings i personally think are good, based on my understanding of the thing

frank urchin
#

okok

white current
#

This for learning rate: 0.000001
This for Lora Unet: 0.0002
This for lora text encoder: 0.00002

#

after, go to image processing

#

set the resolution to 512 if not set

frank urchin
#

yep

#

got all that

split acorn
#

Quick note: I personally do 100 steps per image and cap out around 300 (though this super depends on other factors as well)

split acorn
#

1000 seems like a lot

white current
#

tru

#

you can reduce to 200~
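To see why 1000 per image looks like a lot, here is a tiny arithmetic sketch. It assumes the total step count is simply images times steps per image, which may not match every version of the extension; the function name is made up.

```python
def total_training_steps(num_images, steps_per_image):
    """Total steps, assuming they scale linearly with the image count."""
    return num_images * steps_per_image


# Compare the values mentioned in the chat for a 10-image dataset.
for steps_per_image in (1000, 200, 100):
    total = total_training_steps(10, steps_per_image)
    print(f"{steps_per_image} steps/image -> {total} total steps")
```

Under that assumption, 10 images at 1000 steps each is 10,000 steps, versus 2,000 at 200 and 1,000 at 100.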

#

sorry

#

bit sleepy

#

after u have done that

#

untick center crop and apply horizontal flip

frank urchin
#

ok ill do that

white current
#

then at the most bottom, open the advanced dropdown menu

vale egret
frank urchin
#

yep

white current
#

Tick Use Lora

#

Tick Use 8bit Adam

stone garden
#

ai art is black magic to me so you're waaaay ahead of me!

white current
#

right SHIT

frank urchin
#

got that

white current
#

forgot to ask sorry

frank urchin
#

are you asking me?

white current
frank urchin
#

yeah i do

white current
#

ok

#

good

#

set mixed precision to fp16

#

set memory attention to xformers

#

tick both dont cache latents and train text encoder

vale egret
#

Tbh if there was a much smaller version of SD specifically for fine tuning, we might not even have needed dreambooth in the first place

split acorn
#

Also bf16 needs more love

#

I really like it alicatUwU

frank urchin
white current
#

yes

#

different in mine but ur correct

frank urchin
#

also i dont see any train text encoder

white current
frank urchin
white current
#

okay not that important

#

now go to concepts tab

frank urchin
#

yep

white current
frank urchin
#

done

white current
#

now uh

#

leave instance token empty,

frank urchin
#

mhm

white current
#

type girl/character/person one of them (or what word you think describes the cwute pwincess the best uwu~)

#

it needs to be 1 word

frank urchin
#

wait so like

#

id just write girl?

#

in here?

white current
#

yes

split acorn
#

if you're using an anime model, use 1girl

white current
#

yeah

frank urchin
#

oh yeah oops

#

ok done

white current
#

for the instance prompt: [filewords]

split acorn
#

I do class token, [filewords]

white current
#

hm

white current
#

i mean you know better than me 😄

split acorn
#

(textual inversion)

#

So when doing my instance token, I went with hta

#

because it's both a rare token and Any3 didn't know what that was

frank urchin
#

im so confused what yall are talking about 😭

split acorn
#

Sorry, mmm you can take over, I just would include the instance token in the instance prompt

#

so:
hta, [filewords]

#

and hta being your instance token, whatever that is
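Roughly what that instance prompt turns into per image, as a sketch: it assumes `[filewords]` is just swapped for the contents of the image's caption file. `build_instance_prompt` is a made-up helper for illustration, not the extension's actual code.

```python
def build_instance_prompt(template, caption, placeholder="[filewords]"):
    """Fill the caption text into the prompt template.

    Assumption: the trainer substitutes the placeholder with the text
    from the image's same-named .txt file.
    """
    return template.replace(placeholder, caption.strip())


# "hta" is the rare instance token from the chat; the caption is invented.
prompt = build_instance_prompt("hta, [filewords]", "standing, lake background\n")
print(prompt)
```

So with a caption of `standing, lake background`, the training prompt for that image would read `hta, standing, lake background`.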

frank urchin
#

is the instance token the thing thats basically telling the prompt to generate my character?

#

im new coolguy

split acorn
#

instance token is how you tell Stable Diffusion "hey, this is the character I want"

frank urchin
#

ok got it

#

so could i write the name of the character?

#

"princess tutu" or would that be stupid

split acorn
#

You could! Though, keep in mind, if you do then it might mix it up with someone else with the same character name

#

people generally recommend a rare token. This means that the model doesn't really know what it is. That way, when you train with it, the ONLY info it knows is the info you're feeding it

frank urchin
#

i dont think theres any other characters named princess tutu?