#🔧|finetune


restive ridge
#

Same thing I'm seeing with my samdoesarts model, which is supposed to give digital painting results. Going to have to rethink my prompts; I'm not getting the same style. The same prompt now looks more realistic / CGI. Might need to be more careful now about using "realism", "ray tracing", etc.

modern lintel
eternal radish
#

is there any option to make a small specialized SD for img2img that would be like 100 times faster and use less vram, with only one hardcoded prompt?

hot breach
#

not sure thats immediately possible but you can reduce steps and/or denoising strength

restive ridge
#

Using a v1.2 model or earlier might be lighter

hot breach
#

they're all the same model, just more or less training

eternal radish
#

i'm more about another model i can create and train

#

just like "cat into dog" dataset, or single style

#

not sure if smaller Unet or something still be able to give good images

viral jay
#

Guys, I'm trying to train it on a Logitech G935 headset, but even after 20k steps it's still not able to really capture it. Any tips?

hot breach
#

those may be too close up

#

and cropped

viral jay
#

it has around 40 photos for learning and BLIP captions, so I'm training the embedding with subject_filewords too

hot breach
#

screenshot of training images?

viral jay
hot breach
#

some of the training images look a bit too close up and may confuse it

viral jay
#

do you think selecting images that are more "clean" would help? or are the ones with text around them, etc. not such a problem?

hot breach
#

ideally you want your training images to be very well cropped, I would remove the images where the entire headset is not visible

#

like, a small portion close up might be ok, but.. I see quite a few there that are cropped weirdly

viral jay
#

🤔 hmm I will try to use just good ones then and see if I get better results, I thought closeup ones would help to improve details

hot breach
#

it can, but they probably need to be a much smaller % of the total training set

#

you can see from your outputs you get heavily cropped images, too

viral jay
#

that's true I haven't paid attention to it

hot breach
#

I would say keep the extreme close ups like 5% max of the total set, just as a rough idea

#

so maybe 1-2 images out of 40, or maybe try once with none of them cropped, only full photos
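
The rough 5% guideline above can be turned into a quick count for any set size. This is only a sketch of that arithmetic; the function name `max_closeups` and the integer-percent parameter are illustrative assumptions, not part of any training tool.

```python
def max_closeups(total_images: int, max_percent: int = 5) -> int:
    """Upper bound on extreme close-ups, given a rough percentage cap.

    Integer arithmetic is used so the result is not affected by
    floating-point rounding.
    """
    return total_images * max_percent // 100


# For the 40-image set discussed above, a 5% cap allows at most 2 close-ups.
print(max_closeups(40))  # 2
```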

viral jay
#

I will take pics of my real one so I can get it more controlled. The main problem is that I'll probably be holding it with my hands, so I'm not quite sure if that's a good idea lol

hot breach
#

maybe get a string or coat hanger or something to hang them from

#

take some photos outside, too

#

or on your own head

#

then caption "a man wearing logitech headset" or whatever

viral jay
#

good idea, Thanks for the advice, lets see if I can get this thing good

#

do you think the filewords are relevant at all?

#

for embedding?

hot breach
#

what sort of training are you doing? TI? dreambooth?

viral jay
#

I'm doing with TI, I've also tried with HN

hot breach
#

I'm not sure how it works with TI

#

I don't know if captions count for that

viral jay
#

well yeah for faces TI didn't do very well like HN, maybe my images aren't helping it so I will try to take more consistent photos and see which one can catch it better

hot breach
#

no real idea on HN but I got the impression it is something more useful for styles than objects/subjects?

viral jay
#

for faces it does a great job

#

not sure if I'm right but I think faces are mainly a subject type?

#

I'm going to try with something else. I took photos of a thermometer; once I manage to get it replicated, I'll go back to the headset. My headset is too dusty for pictures 😅

green flax
#

if i train an embedding with a hypernetwork selected in settings will it use the hypernetwork while training

tribal rapids
gilded crater
#

I got it from huggingface. Google search reddit and huggingface

maiden grail
#

What kind of time averages are people getting for training an embedding? It takes like a half a second per step, and not sure how many steps are recommended.

Should I just reduce the max step field? Or is that going to significantly hurt the model?
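
For a sense of scale, wall-clock time is just steps times seconds per step. A small sketch using the half-second figure mentioned above; the step count in the example is an arbitrary illustration, not a recommendation.

```python
def training_time_minutes(steps: int, seconds_per_step: float = 0.5) -> float:
    """Estimated wall-clock training time in minutes."""
    return steps * seconds_per_step / 60


# Example: 6000 steps at 0.5 s/step.
print(training_time_minutes(6000))  # 50.0
```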

#

And also, trying to track an embedding's "progress". It generates an image every 500 steps? But what even is that image representing? Just an amalgam of all the images so far?

#

Or it is just a random test file created?

hot breach
#

it tries to generate an image of the last class or caption trained I believe

radiant rose
gilded crater
#

I can find in an hour or so. Gotta feed the kids first.

hot breach
#

ripped out regularization from dreambooth as I reform code to just do full fine tuning and seeing quite a drop in vram use, good sign

#

also seems existing codebases (xavier, etc) are using lightning 1.5.9 and there may be improvements from moving to 1.6.0+

frozen forum
ivory veldt
#

Do people know good resources for custom models? I need to generate 2d dashboards, web UI, app UI and logos. The current 1.5 model is focused on art and photography. There might be models out there that have been trained specifically for 2d designers

ivory veldt
#

Btw, the legendary works of Neville Brody, the Designers Republic and Alex Trochut seem to be completely missing from the datasets. All from the 2000s, so internet era.

sand pine
hot breach
#

up to batch size 6 now on local fine tuning

spare magnet
frail thunder
#

Hello colleagues, could somebody point out the information on how to fine-tune SD to introduce new category to the model? Not a dreambooth way

frail thunder
icy olive
# fast current Wondering this myself

/r/StableDiffusion had an app-icon-generating model posted. Generally you can use Dreambooth to finetune, so gather up and describe samples of icons, UIs, logos, etc. and feed 'em to the machine.

next nimbus
#

hey guys, so I'm following Nerdy Rodent tut on dreambooth, but when I run the last command (./my_training.sh), I get this error, how should I fix it, any idea?

train_dreambooth.py: error: the following arguments are required: --pretrained_model_name_or_path, --instance_data_dir Traceback (most recent call last):

eternal radish
#

just python script do not get it

next nimbus
# eternal radish just python script do not get it

thanks for the help, looks like it was due to an extra space during copy/paste. but I'm stuck again with this new error : /

accelerator.py", line 286, in __init__ raise ValueError(err.format(mode="fp16", requirement="a GPU"))

I own gtx 1070 so i tried with and without deepspeed, but the same error even when I chose CPU.

#

"Unable to proceed, no GPU resources available", not sure why.

#

i just ran the command torch.cuda.is_available() and the result was false : (

edit: nvm, probably need to update my windows to allow cuda pass through.

rugged wolf
#

dumb question, what's an epoch? some models say they were trained by X number of steps and others talk about epochs

tribal rapids
#

"An epoch means training the neural network with all the training data for one cycle."

hot breach
#

generally means one look at every training sample, but a lot of the code people are running has repeats so its actually a lot more than 1 look per sample
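
The relationship between steps, epochs and repeats can be sketched as below. The function and parameter names are illustrative assumptions; individual repos count steps and repeats in their own ways, so check the one you are running.

```python
import math


def steps_per_epoch(num_images: int, repeats: int = 1, batch_size: int = 1) -> int:
    """Optimizer steps needed to show every image `repeats` times."""
    return math.ceil(num_images * repeats / batch_size)


def looks_per_image(total_steps: int, num_images: int, batch_size: int = 1) -> float:
    """Average number of times each image is seen over a whole run."""
    return total_steps * batch_size / num_images


# 40 images with 100 repeats at batch size 1 is 4000 steps for one "epoch".
print(steps_per_epoch(40, repeats=100))  # 4000
# 3600 steps over 36 images means each image is seen 100 times on average.
print(looks_per_image(3600, 36))  # 100.0
```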

mellow dock
#

what resolution was NovelAI trained on??

raven pecan
#

god im so lost

#

all i want is to make my midjourney images cleaner and nicer and enlarge

#

I dont know what Im doing. I managed to figure out how to run the notebooks though lol

#

I tried to research but now i know 100 new things that have nothing to do with what im trying to actually do 🤣

gloomy belfry
gilded crater
fallow valley
#

unfortunately there is no guide out there as far as I know on how to further finetune the autoencoder.. if anyone know of one let me know, I am willing to use some of my cloud gpu credits to try and finetune

#

I only know how to use it with the popular gui, it was recently updated to import the weights from the vae file into the model you are running.

woeful goblet
#

How am i supposed to make a mask for the mask upload feature in some guis? (using automatic)
If i paint in the area i want to redo in black, its treated as a negative mask when generating, i have to select "Inpaint not masked" for it to work
So then it surely makes sense that white would be positive. But a white area does nothing at all, it just generates the whole image

foggy hinge
#

Hey what workflow or colab with SD 1.5 are you using to fine-tune with only one to a few base images?

glossy rune
#

Same as 1.4

ivory veldt
next nimbus
#

Do you guys know if it's possible to train/dreambooth a ckpt file instead of diffusers, locally? if so, which guide should I follow?

rugged wolf
next nimbus
next nimbus
rugged wolf
#

oh I didn't know, so dreambooth is not textual inversion?

next nimbus
ivory veldt
#

nice

foggy hinge
#

What to do if we only have one picture?

stone garden
#

Hello I wanna do some dreambooth training

#

what is the best repo for doing that as of today? I have a GPU with 23GB VRAM

#

tried to do it with a repo about 1 or 2 weeks ago but I was getting errors because of lack of VRAM

glossy rune
#

But consider that for Image Generation you are much less flexible in augmentations e.g. of colors vs for the usual classification that most augmentation guides will probably aim at

foggy hinge
#

Hmm right.

#

Has anyone tried to use Stable Diffusion fine-tuning with WebAssembly (with Rust bindings or just Python) so that the client runs the training part?

unborn fulcrum
#

greetings, does anyone know about this error when training HN?

AssertionError: no gradient found for the trained weight after backward() for 10 steps in a row; this is a bug; training cannot continue
hot breach
#

original for SD at least, there are also diffusers versions, some* diffusers don't unfreeze everything though, there's a trade off on VRAM use and how much of the model is unfrozen, and results are different if they don't unfreeze everything

next nimbus
hot breach
#

some of the early diffusers ones were only unfreezing unet I think, and it doesn't work as well, but I haven't messed with diffusers myself so take that with a grain of salt

#

if diffusers unfreezes the same parts of the model I imagine they're close either way, but all the 8gb/10gb/16gb stuff I believe is simply not unfreezing the entire model and that probably leads to worse results, but there's also the concept of "enough" to get your project done

#

there are differences in optimizers in diffusers and such, too

#

people get either to "work" but most people are also doing fairly small scale projects, like 10-50 images of their dog, or their own face, etc

#

my take is if you have higher ambitions, unfreezing the entire (latent diffusion*)model, caption training, etc is far more capable and moves you away from TI/dreambooth towards more general fine tuning, depends on your goals, your hardware capabilities or what you want to rent, etc

next nimbus
#

I see, thanks for the info.
we got a lot of stuff out there in a very short time, I can't keep up with it all.

I want to do a test similar to the pokemon model out there, but instead of generating pokemon I would like to use naruto anime, so I figured I'd train the model with a bunch of naruto pictures. you think this can do that?

#

it's TheLastBen's dreambooth repo.

hot breach
#

don't know, I think most people have to work on their projects a bit to get them to work and gain experience, try different things

#

anime might be harder than more photo real type characters from what I've seen of other people's experiences, but I don't do anime stuff myself, nor do I use diffusers

#

I'm trying to update the lightning trainer (local/xavier/compvis based) to see what can be done about getting it more up to speed, it's still running on some old libraries

cobalt sorrel
#

Hypernetworks don't work with 1.5 yet?
I tried to train a Hypernetwork with the 1.5 model but I get an error message.

tribal rapids
#

could someone summarize what the clear symptoms of overfitting are please? also how CFG would be affected (eg it'd have to be lower and lower etc)

hot breach
#

at least sometimes, faces look sunburnt or high contrast at standard cfg

tribal rapids
#

for instance i trained 36 images 3600 steps with 1500 regs, then another 900 steps to make it 4500. i'm pretty sure at 4500 the results are starting to look both less like the subject (a bad thing), but also just look less like the training images (eg more variety in the outputs so a good thing). i'd expected if i overfitted then it would all look like my training images with not much variety? i need to compare exact sampler parameters between the 2 models though

#

what are we calling standard CFG? 7?

hot breach
#

yeah close enough

#

7.5 was default on compvis launch, auto defaults to 7 i think, either way close enough

tribal rapids
#

what would you say is a good sampler/steps for just a basic unstyled output eg (photo of jmp909 man) as a baseline test?

sonic bobcat
#

normal steps should be fine

tribal rapids
#

you mean 20 Euler a?

sonic bobcat
#

30~ i don't use euler a

#

the big difference is dpm2 a and euler a from the rest

tribal rapids
#

euler a is the default selected in a1111, that's why i was wondering what sampler is a good baseline default for unstyled output

sonic bobcat
#

you can make it render more than 1 sampler

tribal rapids
#

sure i know you can do that in extras, i was just wondering opinion on sampler/steps for outputting a baseline unstyled result to compare 2 sets of training on a photo likeness

#

i know there are a so many variables to it all it makes it tricky to generalize

#

like at 3600 steps i could get a good likeness with DPM adaptive, 20 steps, cfg 8..... now at 4500 steps if i don't lower the CFG to about 3.5 there, i get somebody else's face that has similar features but does not look like the subject

#

with photo of (jmp909 man) wearing yellow hat .. obviously the emphasis has some effect there on both tho

#

actually i think with 4500 i can take the emphasis out and get a better result. what i cant work out is how far to take the training to actually improve things

#

being able to take the emphasis out for the same CFG suggests the model is doing better?

sonic bobcat
#

not necessarily ? try other steps

tribal rapids
#

yeah doing an X/Y on CFG vs steps now thanks

last delta
#

Hi guys, I want to train DreamBooth for a style. I did it a few times for faces, no problem.
Is it the same process for styles?

Setting it up with prior preservation, what is the class for a style?
For example, if I want to train on watercolor images, what kind of class images do I have to supply? Just random other styles for comparison?

icy olive
#

The regularization images would be whatever images the model would generate previously with the class keyword:

Prompt: artstyle -> your regularization images

stone garden
#

oops sorry, sent wrong place

last delta
restive ridge
hot breach
#

I imagine it should work if you use captioned training and describe the art such as "a wolf standing in a forest by Biff Artistman" or whatever for each image, haven't messed too much with outright styles though

last delta
hot breach
#

mrwho added it to joepenna repo

#

there's some setup to name the files and organize them into subfolders

#

works in kanewallman's repo just based on filename but the notebook is probably not maintained

past parrot
#

hey now

stray kindle
#

Yo, looking for advice. If I wanted to train this shit on different species of cavemen, what would be the best option for that? I assume I'd have to do one at a time, yes?

#

Well, more importantly, would it have to be a particular individual, or would different ones of the same category work well together?

icy olive
#

Ok, it's time to step up.

I know dreambooth can be used to train one particular style or subject in, but how would I train the model in general -- as in pick up training of SD 1.5 with my own dataset of various different things?

hot breach
#

you can caption all your images and train as much stuff as you want

#

you'd have to check the diffusers stuff though on 16GB, the lightning/xavier repos take 24gb because they unfreeze the entire model, or rent runpod/vast or colab pro I guess

icy olive
#

I'm already renting on vast

icy olive
hot breach
#

there's some stuff with how you put things in subfolders, unfortunately they hadn't put any documentation on it last I checked

#

there are other options locally, or if you don't mind doing everything from the CLI, but I guess everyone using remote runtimes is likely using a notebook

#

technically all you need is a terminal on the runtime even on a runpod or vast but you need to be kinda familiar with linux command prompt instead of just clicking the play button on a notebook...

#

im working on a general fine tuning trainer, ill see if I can make a notebook for it at some point...

icy olive
hot breach
#

do you know how to push your own files into the rte and move them around in folders? i.e. your training files

icy olive
#

yes

#

already did that for the previous model I trained

hot breach
#

readme on both should explain how to organize files, mine doesn't use regularization per se, but you can put regularization images in the training folder if you want anyway

#

how big is your training set?

#

tbh these work better with larger sets, I used kanes with 600, 900, and 1400 and then forked mine last one I did with 1600

icy olive
#

I have around 400 images (half of which I'm still tagging)

hot breach
#

ah thats probably good enough then

#

im working on some stuff to autoclip tag stuff

icy olive
#

I'm gonna make a tool to help me tag stuff

hot breach
#

both mine and kane wallmann's use the same naming convention of "your caption goes here_n.ext"
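
A minimal sketch of how a caption could be recovered from a file named with that "your caption goes here_n.ext" convention. The helper `caption_from_filename` is hypothetical and is not taken from either repo; it only illustrates the naming scheme described above.

```python
from pathlib import Path


def caption_from_filename(filename: str) -> str:
    """Return the caption part of 'your caption goes here_n.ext'."""
    stem = Path(filename).stem  # drop the extension
    caption, sep, suffix = stem.rpartition("_")
    # Strip the trailing "_n" only when n is a numeric index;
    # otherwise treat the whole stem as the caption.
    if sep and suffix.isdigit():
        return caption
    return stem


print(caption_from_filename("a man wearing logitech headset_3.jpg"))
# a man wearing logitech headset
```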

icy olive
#

"Show image, input tags in text box, click next"

hot breach
#

yeah you can do it in automatic but its not batch, you can probably script calling interrogate.py but I want to build something in myself

icy olive
#

tbh, for datasets less than 1000, I think fully manual tagging is king

#

getting it exactly right seems really important when you have fewer images to train on

hot breach
#

you can batch name them with clip then just replace "a man" with "john cena" or whatever

#

yeah my results have improved greatly, first 4 character one I did was just all "name of character_n.ext" then I ran them through clip to get the surrounding context and it helped quite a bit
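
The "caption with CLIP, then swap in the name" step might look like the sketch below. It assumes the captions already exist as plain strings (for example from an interrogator); `personalize_caption` is a hypothetical helper, and the phrases are just the examples from the messages above.

```python
import re


def personalize_caption(caption: str, generic: str = "a man", name: str = "john cena") -> str:
    """Replace a generic subject phrase with a specific name, on word boundaries."""
    pattern = r"\b" + re.escape(generic) + r"\b"
    # A function is used as the replacement so that backslashes in `name`
    # are inserted literally rather than interpreted as escapes.
    return re.sub(pattern, lambda match: name, caption)


print(personalize_caption("a man standing in a forest"))
# john cena standing in a forest
```

The word-boundary match keeps a caption such as "a mango on a table" from being rewritten by accident.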

#

slowly building training set with a mix of other ground truth data...

restive ridge
icy olive
#

I just trained a model with JoePenna's implementation

#

It seems to work pretty well

gloomy belfry
#

Shivam is good as well

next nimbus
#

What makes the ckpt 2GB? I mean if we remove the full float and unnecessary data?
How can we make sure to add to the ckpt instead of overwriting it?
Or the answer to these questions are still not out there?

fast current
#

Definitely gonna be the new way

fast current
#

Mmm, makes me wanna make an embedding for Attack on Titan scenery

stone garden
#

where can I find thousands of regularization images of woman

ivory veldt
#

using custom models = a new era for humanity,

viral jay
#

man dreambooth is so much better compared to TI and HN, shame I'm not able to run it locally

#

dreambooth with 1200 steps, it's a fiat uno, the damn thing has learned it quite well

#

this is TI after 5k steps

viral jay
#

what I find cool about it is that I'm still able to edit the results. for example, the first image has the car on snow even though none of the training pictures had snow. the bottom one is also on snow, but out of the 10 images I generated only one had anything leaning towards snow, and it wasn't correct at all

#

any tips on how to achieve the same with TI or HN?

tough gazelle
fast current
#

Yeah it sounds tricky for sure. A lot of it probably comes down to consistency in the training images at a guess

tough gazelle
fast current
#

I think thats sort of the idea though? From what im reading it's more of a method of nudging art style in the right direction, rather than adding specific elements

tough gazelle
#

Or you get the opposite. I trained one on vapourwave style images and all it took from them was the colours and it turned everything into a purple/yellow blob

tough gazelle
fast current
#

Yeah that sounds about right. Time shall tell more i suppose

steel ocean
#

hey guys , I am using this repo of optimised dreambooth https://github.com/gammagec/Dreambooth-SD-optimized
the problem is my training always stops after around 19 minutes, like there is a timer or something. how do I edit the code to make it run until it finishes a certain number of steps?


trail flower
#

Has anyone merged with the inpainting model? I'm wondering if it works and retains its inpainting abilities with the trained elements of a secondary model

tough gazelle
tough gazelle
steel ocean
tough gazelle
#

Not at my computer so can't 100% remember but I believe it's in the bottom of this file

configs/stable-diffusion/v1-finetune_unfrozen.yaml

steel ocean
#

does the algorithm choose how many steps it needs?

#

like did the algorithm choose 2264 steps, or is there a variable

tough gazelle
#

Ignore that. It's based on the amount of training data.

#

It will stop at whatever step count you set.

tough gazelle
steel ocean
#

I made 8000

tough gazelle
#

I found the best for speed / decent accuracy was between 4-6k but your mileage may vary.

#

8k should be fine too

#

You don't usually start overtraining until a decent bit over

#

Keep an eye on the renders it makes to track how well it's doing.

steel ocean
#

so it will go beyond 2k since I set the max to 8k

steel ocean
tough gazelle
#

Yeah it will move to the next epoch

tough gazelle
steel ocean
tough gazelle
#

You made it sound like you were planning on stopping changing the data and resuming.

#

You can't do that and have good results in my experience. You need all your training and regularisation data set from the beginning. And then set your steps and let it complete the whole process.

steel ocean
#

ahh , cant I do something to pick good pictures for ai

tough gazelle
#

Well you need to pick your own training images for what to train. For regularisation images what I did was used stable diffusion and created a few hundred images using my chosen class name as the prompt.

steel ocean
#

that's exactly what i did

ivory veldt
#

where do people download image datasets that are great for training? For example I need open source images of everyday women and men, same folks, with different settings, emotions, clothes etc. There are a lot to be found by celebrities, but not from open sourced people.

icy olive
#

LAION, or have the model generate them

#

The latter is mostly acceptable, since the point of the images is to make sure the model can still generate the same stuff it could before finetuning

storm linden
#

If I want to train text encoders with Dreambooth, do I need to change any .json files before running the training script? I'm using gallery-dl to get my images and I have the metadata, I just don't know if or where I should be putting the file

storm linden
icy olive
#

Update: now using ShivamShrirao's repo locally; even faster than JoePenna's repo on the cloud GPU

hot breach
# steel ocean this ?

keep in mind there may be several flags strewn about that limit training, might be settings in the yaml and another setting on the CLI args, make sure to double check, tbh hard to keep track of all the repos args but some have multiple limits

#

might be useful to just delete or set the default on cli args to 99999 and just use the yaml to keep it all in one place if you find other limits so you're not chasing your tail constantly

upper prism
hot breach
#

from my understanding some of the diffusers stuff isn't unfreezing the entire latent diffusion model like the xavierxiao based repos (joe/kane/gamma) do, so be wary of comparisons because it may not be apples to apples

hot breach
frail thunder
next nimbus
#

And when it would be a better choice to finetune and when it would be the best to train using dreambooth?

hybrid pilot
next nimbus
#

@hybrid pilot If I want to train the model with a new art style, what would be the best choice?

hybrid pilot
#

depends really on how you want to use it. I'm still sort of figuring things out, but I will say that dreambooth does great at adding a single specific thing or a very apparent art style. fine tuning is the "best" way, but has the drawback that you really need to know what you're doing to not mess up the weights.

I would say textual inversion actually. the downside being it will take up some context space, but for a quick style, it does the job really well

#

or the new aesthetic diffusion thing I've seen mentioned. that looks like it has room to really set a style

#

hypernetworks are too heavy for just a single style

#

disclaimer: I have no idea what I'm doing and this is purely just things I've read/observed, I could be 100% wrong

icy olive
#

I've used all 3 -- dreambooth gives the best results
for styles, TI is pretty good most of the time
Hypernetworks are hit and miss

icy olive
next nimbus
icy olive
next nimbus
#

oh, thats cool.

#

man I wish I had these tools when I was a kid.

hybrid pilot
#

If this was around when I was a kid, I'd be a data scientist instead of a sysadmin lol

next nimbus
#

I just finished training and testing my first Dreambooth using TheLastBen, with only 800 steps and 24 pictures, I feel like the source images I feed the model leaked all over the ckpt, or am I hallucinating?

upper prism
next nimbus
upper prism
#

can you see the 200 created images in the folder structure of the colab?

tawny inlet
#

Hey guys, can someone please tell me what's the best way to Fine Tune Stable diffusion for Set of Characters from Anime?

upper prism
# next nimbus yes.

I'm lost then, sorry
It's not happening with the other repos like that and I dont know if its missing a configuration or if it doesn't do it correctly

covert gazelle
icy olive
#

I'm not sure, but it's simple to do it yourself if you have python and pytorch installed

hot breach
covert gazelle
icy olive
#

Converting models doesn't require any VRAM. It occurs on the CPU.

hot breach
#

I don't think ckpt convertors touch the gpu at all?

#

ckpt files are just zip archives, you can even open them in 7zip and look around
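
To look around without a GUI tool, Python's standard `zipfile` module can list what is inside. This is a sketch under the assumption that the checkpoint was saved by a recent PyTorch version, which writes a zip archive; `list_checkpoint_members` is a hypothetical helper, and it only reads the archive directory, so nothing in the file is unpickled or executed.

```python
import zipfile


def list_checkpoint_members(path: str) -> list[str]:
    """Return the names of the files stored inside a zip-format checkpoint."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive (older or different format?)")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Usage (the path is a placeholder):
# for name in list_checkpoint_members("model.ckpt"):
#     print(name)
```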

covert gazelle
#

Surprisingly fast!

#

Or maybe I just didn't do research about the conversion process

#

great! 👍

#

actually, maybe not that great, but I did only make a 2 GB model

#

It's not doing so well when it comes to placing the said person in different locations, probably because it was trained in colab???

raven pecan
covert gazelle
#

but what about SD upscale catlurk

stone garden
wild totem
tribal rapids
#

Nice can I ask how you’re training please? Mind you.. he’d be in the model already

#

Also what % of generations are actually decent?

stray kindle
#

If I wanted to train a bunch of caveman species, what should I do?

icy olive
#

Dataset protip: Install gallery-dl, and scrape dozens of images at a time

#

e.g. gallery-dl -D mydataset --range 1-25 <danbooru search results link>

wild totem
sterile pivot
#

cooolll

wild totem
#

but the colab notebooks have improved since then

stray kindle
#

No advice?

tribal rapids
#

thanks

#

has anybody got a technical explanation for why, if i train "photo of jmp90 man", then "photo of jmp91 man" will also give close results? is it just converting jmp90 to a number internally that jmp91 is also close to, so they essentially cross over? not sure how far the difference goes, eg jmp9, jmp etc

next nimbus
#

Some fine-tunes mention "This version uses the new train-text-encoder". Can anyone explain this to us, and how can we use it to train / finetune our model?

icy olive
#

if it's split as ('jmp', '9', '0', 'man') then ('jmp', '9', '1', 'man') will be similar
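
A toy illustration of that point: if two prompts split into mostly the same pieces, most of what the model sees is identical. The splits below are the hypothetical ones from the message above, not real tokenizer output, and `shared_fraction` is only an illustrative measure.

```python
def shared_fraction(tokens_a: tuple, tokens_b: tuple) -> float:
    """Fraction of positions where two token sequences hold the same token."""
    matches = sum(a == b for a, b in zip(tokens_a, tokens_b))
    return matches / max(len(tokens_a), len(tokens_b))


trained = ("jmp", "9", "0", "man")
prompted = ("jmp", "9", "1", "man")
# Three of the four pieces are identical.
print(shared_fraction(trained, prompted))  # 0.75
```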

tribal rapids
#

you think i should just use jmp?

#

sorry jmp man

vale egret
#

use 🏃‍♂️

tawny inlet
#

and which type of training???

ivory veldt
#

FYI: allegedly runpod.io machines running dreambooth create better ckpt, nicer images: https://www.youtube.com/watch?v=mVOfSuUTbSg

dire heath
ivory veldt
#

I was unlucky with google colab today. I even paid $9.00 to get an undisturbed training. 1 out of 6 training processes went through, others just stalled, stopped, "something went wrong" after an hour wait. I think it's a bad quality product from google, the ux is also terrible. Now trying runpod

hot breach
hot breach
icy olive
ivory veldt
#

yes. runpod is unstable - it's a joke. I want a reliable training server, not a shared machine

ivory veldt
#

who has a web service where I can upload images and I get a ckpt file back. Without the painful interruptions, errors and 100s of log files?

limber peak
#

runpod works pretty well

ivory veldt
#

Not for me. Training stops every time

tribal rapids
#

hmm..
(jmp person:1.0) and jmp person are not the same it seems... if you have Use old emphasis implementation ticked in a1111
(jmp person:1.1) and (jmp person) are also not the same in that scenario

I had assumed each bracket was 1.1, with ie 2 brackets being 1.21
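
The assumption stated above (each bracket multiplies the weight by 1.1) works out as follows. This only models that assumption; it says nothing about what the "old emphasis implementation" option actually does, and `nested_emphasis` is an illustrative name.

```python
def nested_emphasis(depth: int, factor: float = 1.1) -> float:
    """Weight implied by `depth` nested brackets if each multiplies by `factor`."""
    return factor ** depth


for depth in range(1, 4):
    print(depth, round(nested_emphasis(depth), 3))
# 1 1.1
# 2 1.21
# 3 1.331
```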

solar shale
#

Hi, does anyone know how can I implement stable diffusion on my own macbook?

woeful sphinx
#

When it comes to improving the quality of the faces around a specific model for 1.4, what comes to mind?

#

We are not happy with the results

#

Should we try training the model longer?

tame aurora
hot breach
#

clip remains frozen

#

unet and autoencoder for the local repos, and I think you can unfreeze them both in diffusers with the right setup if you choose but I'm a bit behind on that

#

at least the how part

#

but I believe it is an option

silk marsh
#

Hey, I've noticed after training an embedding of a character with Auto1111's textual inversion interface, that after a certain point, using the embedding tends to make duplicates of the character. Can it be because of too many steps of training? Is there a way to prevent that? Have any of you had the same thing happen?

tame aurora
hot breach
#

if you have the vram I think unfreezing the whole model is worth it from what I've seen

#

maybe for some projects its not terribly important

tame aurora
#

I’ll add that to the list of experiments, thanks

hot breach
#

yeah the backlog problem is real 😆

woeful goblet
#

how do i remove colors with inpainting? I'm making an image of a wasteland, and it has lots of glowy red spots on the ground that i don't want, i'm trying to just replace them with boring, cracked black dirt. But it keeps maintaining the colors and just painting in red cracks. I've even got black in the prompt, and red in the negative prompt

#

Example: I selected all the red bits that aren't on the pillar, used prompt cracked black ashen ground and negative prompt red pink color
Still red everywhere 😐
using automatic ui

#

i only want red on the pillar not the ground, any thoughts?

tribal rapids
#

What about something like (red_pillar:1.2) in prompt and (red_ground:2) in negative ?

#

Keep the underscores see if it works

#

Play with CFG level

#

Dunno

woeful goblet
tribal rapids
#

Inpaint after?

azure crane
#

I read maybe a few weeks ago that someone had made a SD version that was trained STRICTLY for HANDS (and maybe feet too), but I can't find it anymore... Does anyone know which one it is and whether it's any good?

woeful goblet
upper prism
tribal rapids
#

If there’s multiple people in a photo and you just want to train one of them what’s the best way to exclude the rest? Just mask them out with noise?

upper prism
tawny inlet
#

Hey, can somebody help me with Dreambooth?

#

I'm trying to train a Person Face, should I leave these two values as they are?

upper prism
upper prism
tawny inlet
#

Shivam's Repo.

upper prism
#

Ah yeah, that's the one I'm using.

tawny inlet
#

I don't understand too much about AI and all this stuff, so yeah, please don't ask too complex questions. I would prefer normal English.😅

#

I still don't understand what you meant by class images.

#

I just added face images on training folder...

#

Tutorial I was following said that Class images will be generated automatically. I guess...

upper prism
#

To make it as easy as possible I'd leave the settings as is and try it without any modifications.
If that test runs well and gives you good results you could come back to that ☺️

upper prism
midnight owl
#

Anyone have any suggestions for working with different ckpt / weights quickly? I'm using the AUTOMATIC1111 SD webui. For me it takes up to a minute or so.

hot breach
#

faster hard drive will help, it's fast to load on an NVMe drive

#

loading a 4 GB file from spinning rust will be slow, but it only takes 2-3 seconds on a fast NVMe drive

midnight owl
#

Ok so those YT vids I've seen where folks are switching 'instantly' are time snipped I guess

hot breach
#

1-2GB/sec read on NVMe drive means 4gb is just a couple seconds

midnight owl
#

Oh really OK, 2-3 seconds, wow then I'm doing it wrong, thanks

hot breach
#

I initially started using SD off my NAS, it's a reasonably fast NAS but it would still take 20 seconds or so to load, on an NVMe drive it's just a few seconds

#

SATA SSD is like 6-10 seconds, etc
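
The load times quoted above are mostly throughput arithmetic. A rough sketch, assuming loading is purely read-bound (it often isn't). Only the 1-2 GB/s NVMe range comes from the messages above; the other rates are illustrative assumptions, not measurements:

```python
def load_seconds(size_gb: float, read_gb_per_s: float) -> float:
    """Estimate seconds needed to read size_gb at a sustained read rate."""
    if read_gb_per_s <= 0:
        raise ValueError("read rate must be positive")
    return size_gb / read_gb_per_s


CHECKPOINT_GB = 4.0

# Sustained sequential read rates in GB/s.
# Only the NVMe range is from the chat; the rest are assumed for illustration.
DRIVES = {
    "NVMe (low end)": 1.0,
    "NVMe (high end)": 2.0,
    "SATA SSD (assumed)": 0.5,
    "hard disk (assumed)": 0.15,
}

for name, rate in DRIVES.items():
    print(f"{name}: ~{load_seconds(CHECKPOINT_GB, rate):.1f} s")
```

If a fast drive still takes much longer than this estimate, the time is probably going somewhere other than the disk read.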

midnight owl
#

I symlinked my models from my nvme drive - some of them anyway. I seem to remember it didn't help initial checkpoint weights loading, so never pursued it

#

I very superficially looked at how to point the SD webui at an external location for the models

restive ridge
#

Yeah on my M.2 SSD it's a few seconds. It's also worth trying out embeddings if you haven't. Much lighter than checkpoints and ideal for styles.

grand jay
#

Anyone got a good tutorial link on how to fine tune the model? with 10k+ images. lambdalab's pokemon demo looks horrendous...

tribal rapids
#

on shivam's, for a person what do you currently recommend steps-wise for 12 images @ 1e-6 with 300 regs? I'm going up in steps of 800, up to 4800... I'm not sure 100 per image does it, or maybe 800 was too low and 1600 too much, but the face isn't settling

2e-6 converged (well, a bit) a lot quicker

#

it was defaulted at 5e-6 at one point, wasn't it? I've seen someone here training 20 images for 4040 steps so dunno

#

there's some unnatural data in these photos tho this time, eg i've moved a person from the edge of a photo to the middle, cropped anybody else off and filled with black.. it's probably problematic

delicate stream
#

Out of all the 28 or 29 activation functions of the hypernetwork on Automatic1111, and the Layer weights initialization, what are the best/recommended options I should choose? I AM SO CONFUSED as to which I should go with.

#

i used to do linear, but

#

Linear is not there anymore, and they added layer weights initialization

delicate stream
#

Also this Select Layer weights initialization. relu-like - Kaiming, sigmoid-like - Xavier is recommended

#

is very vague

#

what does that mean? i should mix Relu with Kaiming Normal and always use Sigmoid with Xavier?

#

That is so vague and confusing
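
For what it's worth, one reading of that hint is: use Kaiming init with relu-style activations and Xavier init with sigmoid/tanh-style ones. The two schemes differ only in the standard deviation they give the initial weights. A sketch using the textbook formulas for the normal variants; the function names and the 768-wide layer are assumptions for illustration, not the webui's code:

```python
import math


def xavier_normal_std(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Xavier/Glorot normal: std = gain * sqrt(2 / (fan_in + fan_out))."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


def kaiming_normal_std(fan_in: int, gain: float = math.sqrt(2.0)) -> float:
    """Kaiming/He normal: std = gain / sqrt(fan_in); gain sqrt(2) suits relu."""
    return gain / math.sqrt(fan_in)


# Hypothetical square layer, 768 inputs and 768 outputs.
fan_in = fan_out = 768
print(f"Xavier normal std:  {xavier_normal_std(fan_in, fan_out):.4f}")
print(f"Kaiming normal std: {kaiming_normal_std(fan_in):.4f}")
```

For a square layer, Kaiming starts the weights sqrt(2) times wider than Xavier, which compensates for relu zeroing out roughly half of its inputs.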

alpine blade
#

I made a bunch of interesting photos to say the least

hot breach
#

it's so new maybe people don't know yet

delicate stream
#

Guess i'll have to find out myself.

hot breach
#

try training with the same stuff and just change the init setting and compare after I guess, it's pretty experimental stuff

alpine blade
delicate stream
#

I tried earlier and I'm still a bit confused; either way I'll keep experimenting with it.

alpine blade
#

but what a common theme between all of them is that there is something inherently wrong with their limbs and faces

#

does anyone know how to fix that?

delicate stream
alpine blade
#

uh what

#

its all images

delicate stream
#

Yeah but Discord do be like that

alpine blade
#

47 images in there

#

maybe just because its zipped

delicate stream
#

Those faces look....weird, i see

#

Have you tried using mutated, disfigured in the negative prompt?

#

or any other limb correcting phrase?

alpine blade
#

no

#

I'm new to this

delicate stream
#

Well that certainly can help

alpine blade
#

this was just an initial test of mine where I put in a series of photos with the prompt "anime girl"

delicate stream
#

Negative prompt basically is what you don't want in the image

alpine blade
#

alright

#

any other tips?

delicate stream
#

Well what UI are you using?

#

Automatic1111, CMDR2 or Grisk, etc

alpine blade
#

the webui?

delicate stream
#

like where did you download it from?

alpine blade
delicate stream
#

Then Automatic1111

alpine blade
#

ahh

#

I see

delicate stream
#

well there's LOTS of things to do on it so i cant name all of them, i suggest taking a look at this.

#

the features mainly

#

but tips specifically

#

well:

#
  1. You can use Loopback on img2img to make the image sort of better over time
#
  2. In order to upscale an image you can pass a 512x512 image to img2img with denoise strength at 0.50-0.60, or use ESRGAN
#
  3. 20-50 steps is more than enough, unless you are doing img2img, outpainting or inpainting.
#
  4. The recommended sampling methods are usually Euler_a, Euler or DDIM; you can use the others, but those are the ones I and other people mainly use.
#
  5. Using [] or () with a specific word inside, like:
#

[dark alley] with a (red light) on the ceiling.

#

[] = Doesn't pay much attention to what's inside but it's still there

#

() = Pays more attention

#

the more (()) the more attention, and likewise [[]] for less attention

#
  6. A more complex prompt is sometimes better, but sometimes cutting back helps, so don't go crazy with the prompt.
#
  7. Things like "an anime girl is running down the road" work much better like this: "1girl, anime, running, street"
#

But most of these you can find as well on the Automatic1111 GitHub, just read around and you'll see it.
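
The bracket rules in the tips above come down to a multiplier on the attention given to the enclosed words. A small sketch of the arithmetic, assuming the factor of 1.1 per bracket pair that the AUTOMATIC1111 wiki describes; `emphasis_weight` is a made-up helper for illustration, not a webui function:

```python
def emphasis_weight(round_pairs: int = 0, square_pairs: int = 0) -> float:
    """Attention multiplier for a word wrapped in nested () and [] pairs.

    Each () pair multiplies attention by 1.1, each [] pair divides it by 1.1.
    """
    return 1.1 ** round_pairs / 1.1 ** square_pairs


for label, rounds, squares in [
    ("word", 0, 0),
    ("(word)", 1, 0),
    ("((word))", 2, 0),
    ("[word]", 0, 1),
    ("[[word]]", 0, 2),
]:
    print(f"{label:10s} -> {emphasis_weight(rounds, squares):.3f}")
```

The webui also accepts an explicit weight such as (word:1.5), which sidesteps counting brackets.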

alpine blade
#

tysm

delicate stream
#

No problem

tribal hearth
#

any tips for drastic inpainting? it works fine when I mask an object and try to replace it with something similar, but if (for example) I mask a wall and prompt "hole in the wall" often very little to nothing will change, irrespective of cfg/steps/sampler/etc

pearl merlin
icy olive
#

The slow speed in loading weights seems to have to do with Python's relatively slow pickle deserialization (I should try to profile it). I'm stuck waiting for 10-20 seconds even with an NVMe SSD and 32 GB of RAM.
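
One way to test that hunch is to time the unpickling step separately from the disk read. A toy sketch only: it pickles a made-up dictionary of Python lists rather than a real checkpoint, and real checkpoints go through torch.load, so the numbers will not carry over. It just shows the measurement pattern:

```python
import pickle
import time


def time_pickle_roundtrip(obj):
    """Return (payload size in bytes, seconds spent inside pickle.loads)."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    start = time.perf_counter()
    restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    assert restored == obj  # sanity check: the round trip preserved the data
    return len(payload), elapsed


# Stand-in for a state dict: 50 "layers" of 10,000 floats each (hypothetical).
fake_weights = {f"layer.{i}.weight": [0.0] * 10_000 for i in range(50)}
size, seconds = time_pickle_roundtrip(fake_weights)
print(f"{size / 1e6:.1f} MB unpickled in {seconds * 1000:.1f} ms")
```

Comparing this kind of in-memory timing against the raw file read time shows whether the drive or the deserialization is the bottleneck.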

delicate stream
#

Found a nice guide for the Hypernetwork thing (activation function of hypernetwork)

#

basically explains what they look like

crimson wasp
delicate stream
#

under stable diffusion in settings

#

as far as i can see (currently training) elu is good

#

The best seem to be, relu, rrelu, swish, sigmoid (hard to say), leakyrelu, tanh.

delicate stream
#

you can add selu to the list, it's also good.

dawn trellis
#

Hrm.... I'm playing with textual inversion, and I'm getting results that appear to be overfitting? i.e. if you put * in as a prompt, it tries to literally regenerate one of the sample photos. Such things as 'a * themed lunchbox' just don't work at all.

#

I've been reading all the issues on the original textual inversion github... but there's just heaps of people trying random stuff as far as I can tell.

#

eg. 'set num_vectors_per_token to 60'

#

Even though the original work didn't do anything remotely like that.

#

Does anyone know of a good guide for the right settings to use?

crimson wasp
dawn trellis
#

Yes, so I fail to see how that’s useful despite the advice in various places to do so, and simply set the inference number to some arbitrary lower value like 8

#

…but practically the number of vectors of 6 (used in the original repo) seems to give results that don’t do anything remotely like what they describe.

#

“Banksy art of *” gives me a photo of *

#

Is there a trick to making it do something meaningful, instead of just spitting out the training images?

#

TLDR; he can’t.

#

Meh, I think this may actually just be broken with LD.

oak ether
#

hello, how do I add a new style in SD? To be precise, I don't have it running on my PC, I use it online

tribal rapids
#

to increase negative emphasis in the negative prompt (i.e. I want it even less like the word) do I use (word) or [word]?

#

like instead of red in the negative prompt, i want to say I really really dont want red.. is that ((red)) or [[red]] ?

#

not sure because it's negative

crimson wasp
crimson wasp
crimson wasp
# icy olive Wasn't always broken, at least

Yeah I've had some success with it, but can't get anything good out of it now. I have some very minor modifications, but they're the same as I have in other repos (reading prompts directly from filenames)

delicate stream
#

They are bringing Linear back to Hypernetworks... nice

native bison
young quail
#

Has anyone had any luck determining a more optimal prompt for multiple subjects without distortions or merging taking place?

hot breach
#

is that a fine tuning question or a prompting question?

#

because putting group photos in your training set helps

#

that's a render of a trained model that includes group photos

#

getting outfits on specific characters in group photos is still elusive

young quail
somber roost
#

What is the best way to train an art style in dreambooth? I picked 150 samples and trained with 2000 steps, the results are impressive but seems the model can't understand some concepts that appeared in the dataset

#

Maybe bcs there are few references to those concepts

indigo siren
#

I’m a newb hoping to make a Transformers model, and I have two questions:

  1. If I want to teach the AI to recognize details like a ‘red helm’ (specifically the blocky face-framing element some Transformers have) or a pair of ‘blue pedes’ (the nonhuman feet of a Transformer, trying to avoid toes), is it better to say “red helm/blue pedes”, “red-helm/blue-pedes”, or “redhelm/bluepedes” when teaching? Or something else entirely?

  2. Is it possible to teach the AI multiple tags at the same time? I found SD through NovelAI, which made a model that recognizes most danbooru tags, and I would really like to be able to do something similar. Eg) “whirl-idw1, blue-long-empuratee-helm, yellow-long-empuratee-optic, neck-up, from-side, suspicious” to describe a picture in the dataset, and “swindle-g1-cartoon, black-helm, gray-face, purple-optics, black-neck, yellow-pauldrons, yellow-chest, glass-windshield, purple-torso, waist-up, from-front, fake-happy” to describe another. Is there a way to do this efficiently?

I apologize if these are ridiculous questions with obvious answers. I’m new to this.

delicate stream
#

I have spent...hours trying to determine which Hypernetwork activation function to use. I have determined......SCREW IT!

#

It's Math, it's all different ways to plot a graph and based on that graph your training will go differently. My advice...... There's no better method, they just all produce different things. There, saved you the trouble, just stay on Linear, relu, selu, elu and leakyrelu. Those seem to be stable......for the love of god....don't change the Layer weights initialization from Normal. Just leave it there, DON'T touch it. Also....DON'T use SIGMOID, it's just a mess......if anyone wants to keep trying, go for it. But for me? I'm good with Linear.

#

as for Dropout....well be careful with that. i could do some perfect stuff without it before, but it will drastically change your outputs as well.

#

Maybe for the good or bad.

delicate stream
#

Here are some examples of what the graphs look like.

#

Do with that what you will.

alpine rose
#

@hot breach what ratio have you been using between training images and reg images?

#

1:100 ?

#

i have 400 training images for a model im trying to make, kinda lame to generate 40000 reg images

ivory veldt
#

I installed dreambooth locally for local training, but getting a cuda memory error. RTX3060 6GB VRAM Anyone successfully trained locally?

alpine rose
#

at some point I read you needed 24GB VRAM to train, maybe that changed

ivory veldt
half folio
#

the model is simply too big to be loaded in 6GBs of VRAM

hot breach
viral jay
#

guys I've installed dreambooth on WSL, but when I convert to ckpt and load it on automatic webui I'm getting this error, if I copy the whole folder and execute same command but on windows directly it works, any idea?

hot breach
#

auto has a safety checker on the ckpt that rejects it if there are unexpected things in it so people don't end up getting malware

#

it may be the converter being used on the repo you have from diffusers->ckpt is doing something unexpected in the ckpt pickle file

#

his safety checker is probably far from perfect, but better than nothing

viral jay
#

yeah disabling the checker makes it work, the weird thing is that the same script on windows creates a working ckpt but on debian produces that message

hot breach
#

that's certainly an interesting data point

fair perch
#

hello everyone. I was thinking of a way to produce datasets of generated characters for fine tuning and today I found a simple way to do it.
I did just a few tests using img2img; the idea is cropping every image at each angle and using it in dreambooth

icy olive
hot breach
#

not bad for 13 minutes of training

delicate stream
#

😂

#

Never thought i'd see Ted as superman.

dawn trellis
#

Drops the reg as part of the loss in the unfrozen finetuning?

dawn trellis
mighty igloo
#

I have a dataset that I created by parsing one site, image tags are inserted in the file names, and I want to finetune SD so that it understands the tags of this site. What is the best way to do this?

delicate stream
#

Yo quick question, if im training a hypernetwork on a specific anime character. Is it better if i do this for the textual inversion template?

#

Ichigo, [filewords] ? or [filewords] since I'm thinking adding Ichigo will make it more pronounced, so it knows that the anime character I'm training it on is called Ichigo, and every time I say 1boy, orange hair it won't just generate a random dude with orange hair instead. So that's why I'm thinking if I do Ichigo, 1boy, orange hair it will consistently make it like the guy I trained it on. What do you guys think?

hot breach
# dawn trellis `Prompt is simply "ted bennett"` <-- why no complex prompt?

there's not much need for it, the text encoding is quite smart from what I can tell and it's too painful at inference to remember magic prompts and magic tokens, I don't think the magic tokens really has legs, no one is going to want to have to read a giant prompt guide especially as mega models start flowing in place of having a drive full of a hundred 2GB dreambooth trainings

hot breach
#

the data management to change the ratio of preservation to training data is obtuse if you use kane's, it was sort of fixed at 1:1 without a complex explanation of moving data between "reg" and train folders

#

kane's was essentially just passing pairs of train/reg and training them equally, which actually worked very well for preservation using laion data in place of "regularization" in dreambooth paper terms, captioning already removed token/class so at this point it back to a more general case fine tuner and there's nothing left of the dreambooth paper in there really, and I wanted to be able to more easily manage the ratios of new and preservation data

#

I'm actually fairly pessimistic about "fast" "dreambooth" (no regularization) long term, but people seemed hyped about it so I did a POC for it above and it only took me like 30 minutes of actual work to run a test and write up the readme for ted bennett to show it works

naive wharf
#

getting super off results when training hypernetwork with automatic1111, running 2000 steps and the outputs are nothing that matches the prompt... any suggestions on how to direct the model closer to the prompt?

delicate stream
#

What are your settings? and are you trying to train a person, object or a style?

#

by settings i mean, activation method, layer structure, learning rate, etc.

naive wharf
#

i'm training purple boots / shoes

#

so moreso object

delicate stream
#

By getting super off results do you mean the images degrade (deep-fry) over time or do they just not look as what you are training?

naive wharf
delicate stream
#

i recommend first using this method: but also make sure your images are of your subject and nothing else just to be safe, "Quality over quantity" as the guide says.

naive wharf
delicate stream
#

Enter hypernetwork layer structure: 1, 1 (linear)
Select Layer weights initialization Normal
Use Dropout: Enabled

make sure they are 512x512 or resize them.

Hypernetwork Learning rate: 5e-5:100, 1e-5:1000, 5e-6

also make a custom prompt template with something like
Shoes.txt inside> a picture of purple shoes or Purple shoes

if that doesn't work try using:
Select activation function of hypernetwork: relu, rrelu, elu, swish, leakyrelu
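
The learning rate setting above is a stepped schedule: each `rate:step` pair applies that rate up to the given step, and a bare rate at the end covers the rest of training. A minimal sketch of how such a string can be read. `parse_schedule` and `rate_at` are hypothetical helpers, and whether the boundary step itself still uses the earlier rate is my assumption:

```python
def parse_schedule(text: str):
    """Parse 'rate:until_step, ..., rate' into (rate, until_step) pairs.

    until_step is None for a trailing rate that runs to the end of training.
    """
    schedule = []
    for part in text.split(","):
        part = part.strip()
        if ":" in part:
            rate, until = part.split(":")
            schedule.append((float(rate), int(until)))
        else:
            schedule.append((float(part), None))
    return schedule


def rate_at(schedule, step: int) -> float:
    """Return the learning rate in effect at a given training step."""
    for rate, until in schedule:
        if until is None or step <= until:  # boundary handling is assumed
            return rate
    return schedule[-1][0]  # past the last listed step: keep the final rate


schedule = parse_schedule("5e-5:100, 1e-5:1000, 5e-6")
for step in (50, 500, 2000):
    print(step, rate_at(schedule, step))
```

So this schedule starts at 5e-5, drops to 1e-5 after step 100, and settles at 5e-6 after step 1000.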

#

usually depending on what i am training i get good results at 2k

#

also make sure you have in settings Move VAE and CLIP to RAM when training hypernetwork. Saves VRAM. Enabled

naive wharf
#

ok awesome will try this now

delicate stream
#

and

#

Stop At last layers of CLIP model: 1

#

your results will also depend on what model you are using, SD 1.4~5 is mostly real stuff and Waifu diffusion for anime training

#

and also

#

make sure while training a hypernetwork you don't accidentally have another hypernetwork enabled

#

That's all i can say, it should be pretty straightforward. if even after everything, it still fails. Well....idk

naive wharf
delicate stream
#

Yeah, they changed a few things and I was mad because I had to re-learn hypernetworks again.

#

to help you a bit...

#

These are the activation functions

#

basically they shape how each layer's output gets transformed while it learns

#

something like sigmoid has a high starting point (0.5 at zero input), therefore you might get weird noise and stuff. Linear is straightforward and it gets better over time, however it lets negative values through too; relu is basically linear with the negative side clamped to zero.

#

Hope that helps in visualizing how your hypernetwork might change over time
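
For anyone who wants numbers rather than graphs, here are three of those functions written out in plain Python. These are the standard textbook definitions, not code taken from the webui:

```python
import math


def linear(x: float) -> float:
    """Identity: passes every value through, negatives included."""
    return x


def relu(x: float) -> float:
    """Like linear for positive inputs, but clamps negatives to zero."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Squashes any input into (0, 1); gives 0.5 at zero input."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  linear={linear(x):+.2f}  "
          f"relu={relu(x):+.2f}  sigmoid={sigmoid(x):+.2f}")
```

The only difference between linear and relu is what happens to negative inputs, while sigmoid never outputs zero or a negative value at all.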

naive wharf
#

yeah, that's great, thank you

delicate stream
#

No prob, don't be afraid to change the learning rate btw, just don't go over 0.00005 (5e-5) or your network will die quickly at 1000 steps or more

#

so in short
1e-4 = no
5e-5, 1e-5, 5e-6, 1e-6, 5e-7, etc. = good

naive wharf
#

and, for the sizing, keep it at 512 x 512 ? would for example 768 x 512 inject extra time/error ?

delicate stream
#

well, from what i understand it's better to play it safe at 512x512 but you can do 768x768 but not uneven numbers, they have to have a 1:1 ratio.

#

or else you might get weird stuff

naive wharf
#

ok great, implementing now the updates 🤞

delicate stream
#

Good luck! Make sure to experiment.

viral jay
#

any luck with anything except linear? if I use sigmoid or anything else I just get noise

tribal rapids
#

Anyone trained 30 images or so of a person on shiv’s? And if so what was your LR and steps? Thanks

delicate stream
delicate stream
#

step 50

#

step 900

#

im not using Linear right now so i can't give an example but as you can see selu gives good stuff, obviously this is still training so it's bad rn

#

step 1150

icy olive
#

where is selu?

fathom crane
#

Hi all, is there any relation between the layer structure and learning rate? Should I use a learning rate smaller than 5e-7 when the layer structure is more than [1, 2, 1]?

hot breach
alpine blade
#

anyone running into an issue where the ai understands who some characters are, but not others?

hot breach
#

might need to be more specific on what you're training, posting examples would help

tough gazelle
sage creek
#

Can anyone link a decent tutorial on how to navigate the embedding/textual inversion procedure? Thanks and I am sorry if not the correct place to ask.

hot breach
#

there are some basics on automatic1111's wiki on his github too

#

this is the right place

sage creek
#

Ty. I appreciate the help

edgy raptor
#

How many steps should a dreambooth model be trained on? E.g how many steps should you use relative to the amount of sample images

#

Currently, I'm using a 1:100 ratio. I used 40 images previously, so that's 4000 steps. However, it seemed to have overfitted the model. Thoughts?

alpine rose
stone garden
#

https://huggingface.co/hlky/xynthii-diffusion
dreambooth model of Xynthii (cyclops monster girls)
1000 steps, will be testing different amounts, results from num_train_epochs=24 (1920 steps) are good
the same images (and prompt) were used for both instance and class

alpine rose
#

by using the technique used in the repos above, the model is able to understand the concepts much better
for example you can train painting styles, and get results like this :

#

here the subject was already known by the model and not in the training images, but it "understood" what it meant to represent subject in the style i was training

viral jay
#

playing with dreambooth, any ideas on how to avoid text on the images? the images used for training have no text, but I've used microsphere worlds as prompt so I think its the root of text on the image, maybe using a random word would help?

alpine rose
storm linden
#

Did Shivam’s dreambooth update break training for anyone? I’m having issues after updating it recently. My training setup is:
3080TI
Windows using Ubuntu
Shivam’s DB
CUDA v 11.6
Python 3.9
Everything else basically set up with nerdy rodent’s 10GB dreambooth video

I’m using 300 reg images with 20 training images. I’ve tried 800 steps at 5e-6 and 2000 at 1e-6 but the ckpt ends up giving either all black images or the colored images. What’s frustrating is that the loss is really inconsistent: sometimes it stays at 0.18 and other times it goes to nan by 50%

storm linden
#

I’m trying to do the training locally, not on Colab

prime rivet
#

How knowledgeable are you lot with Textual Inversion embeddings. I can't seem to get anything but guesses about the actual parameters. What I'm struggling with is that the embedding is too dominant, as it overtakes everything. Even with token range of two. However this seems to be the case regardless of the learning rate. I have understood that adjusting the learning rate can be used to influence the scale dominance. Should I try drastically lower rates? Since noise loss doesn't seem to be tied to learning rate.

#

Also, do the initialisation term(s) act as if they were prompts? Should I give it one term or a broader range of terms, as in "Underwear" or "Underwear, briefs, pants", with or without the comma? There doesn't seem to be much useful information, and what there is seems to conflict. I even read the original paper on the topic.

#

However I think all current implementations are different and advanced compared to the original paper.

#

The primary struggle really is ensuring editability of the embedding in use. Currently they seem to work fine, even if dominant, for generic basic SD outputs, but if you try to force a style it refuses to.

stone garden
vocal pawn
#

Hallo all, I want to train a model on a subject but I've only got 20 good images, if I train a baby model, output 100s of pics until I get a new one that looks very very decent - can I do that until I've got 10 new ones and retrain with the new decent fake 10 to get a better 30 model? Thonk Sounds feasible to me but very new to this so not sure if there's some hidden pitfall

restive bridge
vocal pawn
#

Nice, I will try it and see then peepoBlush
Will have to be extremely selective and mind hands of the warp

#

I've also done some others - I trained 2 60 image models on 6000 steps earlier, and the 6000 ckpt seemed a bit wonkier/worse than the 4500ish one Thonk Is that something anyone else has found, that there are sweetspots with image numbers/steps?

tribal rapids
#

I’ve seen n*80 suggested as a figure, but it’s anecdotal obviously

#

Since it depends on your data, I think you just have to hone in on it
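
The rules of thumb floating around the channel (n*80 here, 100 steps per image earlier) are easy to tabulate as starting points for a sweep. A sketch only: `steps_for` is a made-up helper, and neither multiplier is a validated recommendation:

```python
def steps_for(images: int, steps_per_image: int) -> int:
    """Starting-point step count from a steps-per-image rule of thumb."""
    return images * steps_per_image


# Image counts mentioned by various people in the channel.
for n in (12, 20, 40, 60):
    print(f"{n:3d} images: n*80 = {steps_for(n, 80):5d}, "
          f"n*100 = {steps_for(n, 100):5d}")
```

Saving a checkpoint at a few step counts around these values and comparing samples is probably more reliable than trusting either multiplier.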

#

@vocal pawn which trainer?

vocal pawn
#

Using the colab fast db atm

restive bridge
light beacon
#

Which sampler should I use if I want very photorealistic results, like this sample?

#

this is dall-e but I am trying to get as close as possible with SD

#

and any other settings recommendations to get this?

#

the prompt is A photo portrait of a female supermodel, soft neutral expression, long blonde hair, symmetrical face, front facing, looking at camera, studio lighting, 8k. Dramatic, professional photography. UHD.

vocal pawn
#

Ty for that :>

restive bridge
light beacon
restive bridge
tribal rapids
restive bridge
tribal rapids
#

Ah right yeah I can get great results from the core model for sure. Thanks for prompts tho

light beacon
restive bridge
# light beacon OMG those look amazing! Mind sharing the prompts for the last two?

for the sweater one: stunningly beautiful fit woman with shorter hair, wearing knit sweater and denim pants, skinny, full body portrait, award winning photo, sharp focus, detailed, photography, 50mm
Steps: 35, Sampler: Euler a, CFG scale: 8.5, Seed: 4293303383, Face restoration: GFPGAN, Size: 960x1344, Denoising strength: 0.32

and for the blonde one: stunningly beautiful young woman with shorter wavy blond hair, thin, wide shot, award winning photo, sharp focus, detailed, photography, 50mm
Steps: 35, Sampler: Euler a, CFG scale: 9.5, Seed: 4192447683, Face restoration: GFPGAN, Size: 896x1408, Denoising strength: 0.32

light beacon
#

TY

lime anvil
fast current
#

Mostly i just wonder what it does with the "ly" i guess

hot breach
hot breach
#

Added a colab notebook for above, any scuffed nvidia GPU should work, just need maybe 4GB?

edgy raptor
#

Made a script for making 512x512 dreambooth images; it crops to a 1:1 image first and then resizes to 512x512 to preserve as much of the image as possible; for use with TheLastBen's fast-dreambooth
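
The crop-then-resize approach can be sketched in a few lines. This is not the poster's script, just one guess at the idea: a center crop (the original may pick the crop region differently), with Pillow assumed as the image library and `center_square_box` / `crop_and_resize` as made-up names:

```python
def center_square_box(width: int, height: int):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def crop_and_resize(src_path: str, dst_path: str, size: int = 512) -> None:
    """Center-crop to 1:1, then resize to size x size. Requires Pillow."""
    from PIL import Image  # imported lazily so the box math runs without Pillow

    with Image.open(src_path) as img:
        box = center_square_box(*img.size)
        img.crop(box).resize((size, size), Image.LANCZOS).save(dst_path)


# A 768x512 landscape image keeps its middle 512x512 region.
print(center_square_box(768, 512))
```

Cropping before resizing avoids squashing the subject, at the cost of losing whatever sits outside the central square.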

hot breach
edgy raptor
#

Nice tool!

hot breach
#

yeah it's just js in the browser

#

sometimes chokes if you put too many huge images in but works 98%

edgy raptor
#

Could've made something like that myself but I don't need to reinvent the wheel 😄

stone garden
#

Dreambooth
prior-preservation loss
train text encoder
105 images
same images used for both instance and class
prompt of "taylor swift" for both instance and class
69 epochs/7245 steps lr 2e-6

prompt: taylor swift k_euler_a 69 steps

works quite well, another test of my idea to use the same images/prompt for instance and class
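
The epoch and step counts quoted above are consistent with one optimizer step per image per epoch. A quick check of that arithmetic, assuming batch size 1 and no gradient accumulation (neither is stated in the post); `total_steps` is a hypothetical helper:

```python
import math


def total_steps(num_images: int, epochs: int, batch_size: int = 1) -> int:
    """Optimizer steps for a run: epochs times batches per epoch."""
    return epochs * math.ceil(num_images / batch_size)


print(total_steps(105, 69))  # 7245
```

With larger batches the same number of epochs takes fewer steps, so step counts are only comparable between runs that use the same batch size.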

alpine rose
#

Does anyone have content to recommend on regularization images?
Some sort of theory guide ?

#

I know they're supposed to match what you are going for during inference, but there's probably more to it

trail rock
#

I have few dozens pictures of art style I am trying to replicate, but what reg images should I use to train a style in Dreambooth, please?

#

can I use the person dataset?

#

I tried generate 200 images with prompt "graphic style" and I will see how it goes 🙂

shrewd wedge
#

how do I save the values used for training textual inversion?

novel trout
#

Hello channel, any reference code to recommend please if I'd like to finetune SD-1-4 on a custom image-text dataset?

#

Am i supposed to resume from SD-1-4 or SD-1-4-ema?

#

Thanks!

next nimbus
#

what samplers you guys use after training your model with dream booth?

#

I usually use lms but after training my model with dream booth, the only good sampler that gives a good image is euler a, not sure why.

#

my lms gives a really bad result.

stone garden
tame aurora
novel trout
novel trout
stone garden
#

(original is for the CompVis repository and their scripts that don't use huggingface.diffusers)

novel trout
#

I checked this before and it seems no ema weights are included. I might miss sth tho.

stone garden
#

ah, sorry then, I haven't used the Diffusers library yet so I'm not familiar with what's available for it

#

btw is finetuning/training even possible with diffusers (the library)?

novel trout
#

Don't think they have a full training pipeline supported otherwise.

stone garden
#

Yeah, last time I checked I couldn't find anything.
Thanks for sharing! It does seem fresh 🙂

hot breach
alpine rose
#

Ok guess I'm a retard

hot breach
#

I guess png support would be nice but I think at 99 you're losing very little

viral jay
#

dreambooth is pretty crazy, did a model to do emojis

chrome oxide
#

is there a dummy guide to dreambooth for Colab?

#

I am asking, as I am having a hard time getting to understand some of the notebooks, and the performance is not as good as I expected.

fluid folio
frozen ivy
#

Hello everyone! newbie here.
I followed Arki's guide to install InvokeAI. I am now looking for how to train it with custom faces. Is there a guide somewhere by any chance?
Or do I have to use another version than InvokeAi?

light jetty
# chrome oxide is there a dummy guide to dreambooth for Colab?

nerdy rodent does good guides https://youtu.be/VgKDZqAii1I

Want to add things to your AI art but don't have a powerful Nvidia GPU at home? No worries - got you covered with this diffusers version of Dreambooth which can be run for FREE on Google Colab! Works GREAT on a T4 with just 15GB VRAM. No need to install anything - just run straight from your web browser. Even runs on a potato computer ;)

#

might be slightly different, as it was a month ago

#

which in AI time, is about a year

night hound
viral jay
#

a bunch of different images with modelling clay look

#

it produces good results

night hound
manic estuary
#

Anyone tried doing (something like) Dreambooth on JUST the text encoder, while freezing the main part of the model?
I'm experimenting with this now but I'm not sure what to expect

novel trout
manic estuary
# manic estuary Anyone tried doing (something like) Dreambooth on JUST the text encoder, while f...

Following up on this: it worked reasonably well on pictures of my own face, although it didn't replicate my likeness quite as much as normal Dreambooth. I haven't tried textual inversion, but I'd expect that the results are roughly comparable given the similarity between the methods. It's possible that the results would be better if I used a lower learning rate than 5e-6; I've always used 1e-6 or lower for normal Dreambooth.

viral jay
#

I've now created a new model, but instead of 1000 steps 1e-6 I'm with 6000 steps 1e-7, it's a bit more free to create different stuff now, still looking for the right spot

#

with second model I can create logos and other stuff applying the same look which is very nice

dawn trellis
#

(Difference; it’s quicker to do but the results are significantly worse)

frozen ivy
novel trout
viral jay
#

I just found that using TI + my emoji model I can then create emojis based on real people, my wife and me for example

abstract widget
viral jay
#

It's Textual Inversion, but I said it wrong, I'm actually using Hypernetwork + model. I think TI can also help to direct the image, but HN has given me somewhat better results on photos. We can use a custom model like disney or this emoji one with an HN or TI that has been trained on another model, even if it's not recommended, as results may vary

alpine rose
#

if anyone is interested, i made this script for automatic webui to generate regularization images for a set of training images, to then use with kanewallmann's repo
it's pretty ugly but seems to work
for each training image, it first creates a caption using BLIP, then generates X reg images out of it

#

can be used with txt2img or img2img, haven't really tested training yet so I can't tell what yields better results

#

i should probably expand it to automatize the captioning and renaming of training images as well, now that i think about it

summer oriole
#

I am going to attempt to do some hypernetwork training. Do I need to use the bigger 1.5 checkpoint? And once I've done the training, can I use the hypernetwork .pt I create with the smaller 1.5 (emaonly, whatever that means) checkpoint file?

viral jay
#

does anyone have examples or a better explanation of what the prior preservation does on dreambooth?

hot breach
#

its there to keep knowledge in the model, beyond that you probably want to read the dreambooth paper and it gets math heavy fast

#

if you train without any effort to keep the model intact you'll cause "damage" to the model, things will start to look messed up. You will with dreambooth too, but the regularization/prior preservation is there to try to slow that down

neat oxide
#

anyone interested in markiplier finetune

#

i made it

stone garden
# stone garden Dreambooth prior-preservation loss train text encoder 105 images same images us...

Further development on this. Retrained using a larger more refined dataset (235 images total), still experimenting with the idea of using the same images/prompt for instance and class however this time some images were excluded from the instance set (205 remaining out of 235 total).
These results are 19 steps k_euler_a, 512x704, 7.5 cfg scale, gfpgan1.4 + RealESRGAN_x4plus, they are not cherry picked either, 8/10 results at 19 steps are good
prompt: a photograph of taylor swift, outdoors, shot on iphone 14 instagram 2022
Personally I haven't seen anyone else's results with dreambooth produce such an accurate likeness to the person trained.

Just to note: this is purely for research purposes, I have no intention of releasing these models. I do want to write up my findings along with my thoughts on the implications of models like this which can accurately reproduce a person.
Also if you're wondering about the choice of subject, I chose Taylor Swift for two reasons, mainly because she is a celebrity so everyone knows what she looks like, if I tested this method on myself I couldn't really ask people "does this look like me", then the deciding factor was just that I like the new album and I've been listening to it a lot.

crimson meteor
#

hey guys, Python noob here and i'm trying to fine-tune my first custom ckpt model, Kinda like that robo diffusion model, would love it if you can provide me with any links to tutorials or resources to help me get started?

jovial ore
#

Having a hard time making a D&D-style Kenku. Any suggestions for artists or any other prompts to add?

hardy storm
night hound
night hound
night hound
chrome oxide
#

good lighting conditions with many facial expressions etc.

glossy rune
stone garden
abstract widget
half spoke
#

What is a good tool for finetuning with Dreambooth locally? I have a 3090, I've been using n00mkrad's text2image-gui. I am comfortable with the cli.

north stream
alpine rose
#

how do you guys measure model "corruption" when fine tuning, to tell if you are overfitting or not ?

alpine rose
fierce oar
#

Hi guys, I wonder if any of you know the procedure for finetuning stable diffusion for the inpainting task only (the one that they described in v1.5). There seems to be a config file for inpainting inference, but I am not sure about training and how to run the script for inpainting. Hope that someone can help me out! Thank you in advance!

viral jay
# hardy storm This is fantastic. Bravo! I've been trying to do this exact thing. And for your ...

Yup, I've generated around 50 class images, but of several different types, around 5 images per generation; the idea was to not bias the style toward anything specific. I'm also playing more with it, and I found that using no class images also brings good results. I'm kinda lost right now because I did so many tests lol, but I will try to get something more "scientific" with proper results later so I can study and share better info about it.

#

guys, what does the dreambooth train_batch_size do exactly? it's a bit confusing because I thought it would train 2 images per iteration, but it seems to take almost the same time as batch_size 1? or does the speed stay the same while it gets more done? like batch_size 1 = 500 it and batch_size 2 = 1000 it?

fair perch
hardy storm
viral jay
#

img2img with a fully white image produces an image with a white background, use 1.0 for denoise strength

hardy storm
viral jay
hardy storm
viral jay
#

or use an image like this

hardy storm
viral jay
#

it can get some good results, using 0.95 denoise strength, and we can easily change the background color that way, I've used a purple background with noise circle, so it generates the emoji on top of that

viral jay
#

So here's probably the best results I achieved for emoji: 50 class images, 22 emoji images in the style I wanted to replicate (I selected fewer images than I had, reducing the number of faces), 1000 steps, LR=1e-6

#

dreambooth instance prompt was "dreamfoil emoji" and this is the result of "dreamfoil emoji, head girl with colorful hair"

#

here's more examples of results

hardy storm
manic estuary
#

This is kind of vague because I don't have time right now to write more about my experiences with this, but I tried running dreambooth but only optimizing the weights of the attention modules, i.e. 'CrossAttention', 'SpatialTransformer', 'SpatialSelfAttention', and 'LinearAttention', and my first impressions are that it seems to work BETTER than optimizing the entire model (with or without optimizing the cond stage). Better generalizability during inference and harder to overfit.
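
A rough sketch of what "only optimizing the attention modules" could look like, assuming a PyTorch-style model that exposes `named_modules()` and `parameters()`. The helper name `freeze_all_but_attention` is made up for illustration, and whether a given codebase actually uses these class names is an assumption; this is not the exact code used for the experiment above.

```python
# Class names taken from the message above. Which layers a given codebase
# exposes under these names is an assumption.
ATTENTION_CLASS_NAMES = {
    "CrossAttention",
    "SpatialTransformer",
    "SpatialSelfAttention",
    "LinearAttention",
}


def freeze_all_but_attention(model, class_names=ATTENTION_CLASS_NAMES):
    """Leave only the parameters inside attention modules trainable.

    `model` is expected to behave like a torch.nn.Module, i.e. provide
    parameters() and named_modules(). Returns the trainable parameters.
    """
    # Freeze everything first.
    for param in model.parameters():
        param.requires_grad = False

    # Re-enable gradients for parameters that live inside a matching module.
    # Matching modules can be nested (e.g. an attention layer inside a
    # transformer block), so de-duplicate by object identity.
    trainable = []
    seen = set()
    for _name, module in model.named_modules():
        if type(module).__name__ in class_names:
            for param in module.parameters():
                if id(param) not in seen:
                    seen.add(id(param))
                    param.requires_grad = True
                    trainable.append(param)
    return trainable
```

The returned list is what you would hand to the optimizer instead of `model.parameters()`, so the rest of the network keeps its original weights.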

dapper prism
leaden patio
#

Has anybody dreambooth'd it with the best midjourney images yet? You'd think so.

leaden patio
viral jay
#

sphere worlds, 12 images, 1000 steps and no class images, loving dreambooth

hardy storm
night hound
fallen nova
#

finetuned on 24 pieces by Yves Tanguy, 2400 steps

#

prob my fav result so far

#

kay sage is next on the list

drowsy adder
vast aurora
#

Do people recommend using female / woman as a training class vs person? what are the best practices?

viral jay
north stream
#

This is what he meant

viral jay
#

Ah, then no, I'm using shivam repo

#

running locally on 3080ti / WSL2

novel trout
woeful goblet
#

Some of the black inpainting mask is appearing in the output

#

that dark part is there in all of them, its what i painted and not a generated result

#

why/how can this be happening?

runic hatch
#

does anyone have a guide on how to train or further train a model hosting on paperspace

dapper prism
dapper prism
#

With an Nvidia A100 40GB graphic card, I was able to produce 50 images every 1 minute and 22 seconds. The speed could potentially be improved if I could get xformers setup properly

oak ether
glossy rune
#

has anyone tested the effect of including the updated stabilityai vaes into dreambooth training vs adding them after training?

wintry girder
#

Can you use an embedding in the initialisation text for a new embedding?

night hound
split acorn
#

yeah

glossy rune
wintry girder
#

Wellllll?

glossy jasper
#

Anyone with a good config to train Dreambooth on a 3090 on Runpod?

#

I get CUDA out of memory with lots of configs and it seems a bit weird

wintry girder
#

Ok, on another topic, I hear that --medvram is crap for textual inversion, but if I disable it I get out of memory errors. I heard from the interwebs that I could edit "v1-finetune_lowmemory.yaml" to make num_workers = 4 (instead of 8), but I don't see that file in a1111. Please help?

stone garden
# glossy jasper Anyone with a good config to train Dreambooth on a 3090 on Runpod?
Which type of machine are you using? ([0] No distributed training, [1] multi-CPU, [2] multi-GPU, [3] TPU [4] MPS): 0
Do you want to run your training on CPU only (even if a GPU is available)? [yes/NO]:
Do you want to use DeepSpeed? [yes/NO]:
Do you wish to use FP16 or BF16 (mixed precision)? [NO/fp16/bf16]: fp16

or

In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0
Which type of machine are you using? ([0] No distributed training, [1] multi-CPU, [2] multi-GPU, [3] TPU [4] MPS): 2
How many different machines will you use (use more than 1 for multi-node training)? [1]:
Do you want to use DeepSpeed? [yes/NO]:
Do you want to use FullyShardedDataParallel? [yes/NO]:
How many GPU(s) should be used for distributed training? [1]:2
Do you wish to use FP16 or BF16 (mixed precision)? [NO/fp16/bf16]: fp16
hot breach
#

got variable aspect ratio training working, need a touch more work on the code before release but works surprisingly well, posted some tests in #1010579244188958901

#

the real gain is never having to crop or resize images again

glossy rune
#

That’s a nice thought 😁 looking forward to see how you made it work

wintry girder
north stream
#

I don't think you can (it needs 8gb of vram)

glossy rune
#

there's a 12gb 3060 version, but i have not tried TI yet

north stream
#

With 12gb it should work just fine

wintry girder
#

I guess i just need to not use the a1111 version. Shame.

gloomy belfry
#

let me know if you need a tester

hot breach
gloomy belfry
hot breach
#

I forked from kanewallmann, it's one of the xavier forks using lightning trainer, at this point the code is mostly xavier's original DB implementation and my changes

#

kane was the first to put in the ability to fully caption images so I started from his

old igloo
#

I did my first couple of experiments training images with dreambooth. I copied the ckpt files into the models/stable-diffusion directory, and have been able to utilize them, with decent results so far. Probably need to do some retraining with better photos. But my question is this: Do I need to merge the ckpt file into the 1.5 ckpt file for best results?

hot breach
#

there's been a LOT of hacking on the code, I think djbielejeski put in some meaningful changes as well

glossy rune
# old igloo I did my first couple of experiments training images with dreambooth. I copied t...

in my experience 1.4 is a bit easier for dreambooth than 1.5. for 1.5 you probably want to work with lower learning rates and fewer training steps (like 1e-6 and 2000 vs 2e-6 and 7000), but that also depends on the number of your training examples.

i have not merged dreambooth ckpts into original ckpts yet but i assume you get best results by proper training and just using the dreambooth ckpt

old igloo
#

Thank you for that. I only did 25 images, and the only setting I adjusted was steps from 800 to 1000. Sounds like I could/should try much higher number of steps?

glossy rune
#

my first tests were 4/5 images, sd 1.4 and 800 steps and those were pretty decent (photo realistic character). now scaling up from there with much more nuanced expectations...

tame sierra
#

If I'm testing different checkpoints of my embeddings (.pt) in the embeddings folder of automatic's gui, do I have to restart the server to pick up the changes?

gloomy belfry
vast crystal
#

idk if this is the right server to ask my question but my 1050ti went from generating 1 iteration every 1.5seconds to 1 iteration every 6.5seconds and i changed nothing.

hot breach
gloomy belfry
#

nice thx

hot breach
granite portal
#

I'm trying to train a massive dataset (About 100k images or so) using this notebook https://colab.research.google.com/drive/1vrh_MUSaAMaC5tsLWDxkFILKJ790Z4Bl?usp=sharing&authuser=4#scrollTo=Um6kJUmIlDaC (It's the only one I know that allows image names as prompts) but whenever it tried to generate a sample, I get the error "KeyError: 'sample'". I tried changing stuff on my own based on errors I saw on github but it never worked. Is there a better colab for finetuning a network on longer prompt images?

tacit bronze
#

dreambooth'd every king's quest 6 bg, 5000 steps. chef's kiss

crimson wasp
old igloo
#

But I'm very new to dreambooth, so I haven't figured out much yet.

#

When converting the weights to ckpt, what's the purpose of converting to fp16? Does that 50% reduction in size also reduce quality?

tacit bronze
#

@crimson wasp I think you gotta be careful, spread across too many games and you start to lose the style for sure

#

best option in my opinion would be either picking a single game's style for each gen just based on the intended scene (ie kq6 only), training together very similar artists (ie kq5, kq6, sq5 or so), or training only within a particular genre (ie "fantasy" with kq5, kq6, conquest of the longbow)

light jetty
tacit bronze
#

merged can comprehend certain things better that are in the other games that arent in just kq6, though it tends to smooth out finer details and has more perspective (when the flatter scenes are likely more preferable for a 90s style graphic adventure game)

noble cairn
#

Hello,
Anyone know how can I proceed to generate images like this?

tacit bronze
#

hmm, after trying kq6+laurabow2 training, I gotta say that similar detail doesn't work quite as well, worse than similar genres (the fantasy merged one above).

I think composition and nostalgiawise, the best approach is a per-game 5000 step model

orchid imp
#

Could someone point me in the right direction to understand how to train dreambooth with a 'style' rather than a 'model'?

regal harbor
#

anything trained on faces? Especially diverse faces (not all good looking. maybe some ugly. different ages)

viral jay
viral jay
#

Guys, has anyone tried to train normal map images to see if we can get it as a style?

little parcel
#

Hello

#

I have 20 images!

#

What are the correct settings in your opinion?

slow badger
#

Dumb question but do I have to use the full-ema model to generate images if I have trained an embedding on it or can use the same model's pruned version and get the same results?

hot breach
#

if you're just training an embedding I don't know if it will matter much

#

if you're unfreezing the model the "right" thing to do is train on the full file with both ema and nonema weights in it, and only prune to a 2GB ema only file when you're "done"

#

the lightning trainer will use nonema weights to train when present and fall back to ema weights if it can't find the nonema weights

#

again if you're just doing an embedding/TI I don't think it will matter a whole lot

slow badger
#

Alright, I'm just creating embeddings, I don't have enough VRAM for the rest, thanks!

silent spear
#

Does anyone have a publicly-accessible trained model I can use as a test? Nothing sinister, I promise 🙂

glossy rune
limber peak
#

What is best dreambooth repo for my rtx 3080 10gb?

#

I see many of the top repo ask me to give them like 24GB or more

subtle moth
#

So I'm trying to do fine-tuning on SD 1.5 with a large dataset (900k image). I have the training running on a 5-GPU A100 box with 90 cpus and 470 GB Ram.

For some reason anytime I run training with multiple GPUs it runs slower than just a single GPU. I've been trying to figure this out for hours now but can't explain that yet.

Does anyone have a guess as to what I'm missing? Or an example of training on multi-GPU computers?

slow badger
wooden shuttle
# orchid imp Could someone point me in the right direction to understand how to train dreambo...

This video has the info you are looking for. Also @dapper prism put together some regularization images, including a "style" that you can use for your training.

https://youtu.be/7bVZDeGPv6I

Want to run Dreambooth for Stable Diffusion locally so you can train multiple concepts at once really quickly? Not a problem! Runs on Google Colab as well, so you don't actually need a modern computer to train.

Works on Microsoft Windows (partly), but for the lowest VRAM usage you'll need to use Linux (as with most AI stuff). Also remember to c...

dapper prism
#

Then if that doesn't work, I'd start looking at the other options

tacit bronze
#

best jojo results so far, 900 step training on part 4 character portrait renders, cropped at the top square

chrome oxide
#

are there any other image up scaling models? I tried to use ESRGAN to upscale the images generated by SD, the results are good, was easily able to get 4x resolution bump, but I am seeing more artifacts in the images compared to the original outputs from SD. Any Suggestions?

north stream
#

SwinIR

opaque scroll
hot breach
#

no cropping or resizing required at all

#

going to work on updating some stuff so I can bring vram requirements down, then probably work on notebooks

#

example training set directly fed into trainer

swift terrace
#

oh wow.

#

thank you, i'll give this a shot this week

gray phoenix
#

I have a few questions about Dreambooth

So, I wanna give it a try training a model, but I have a question: do you train the model to the style that you want your outputs to follow (let’s say for example anime style), or do you train it using a subject that you want your outputs to be like (let’s say for example an actor)? Or you can do both? And if so, how do you train your model for each purpose?

half spoke
gilded crater
alpine rose
tall badger
tame aurora
tame aurora
hot breach
#

ema and nonema are standard ML terminology, lots of info on the web, but short version is EMA is intended for inference and is a way to keep the model from being biased to the most recent training samples it was trained on

#

for fine tuning, it is generally preferred to use non-ema weights because the ema is going to bias the starting point of your fine tuning

#

I guess in practice with SD using EMA weights for fine tuning doesn't seem like some huge critical failure, but it's the general suggested practice

tame aurora
#

but isn't ema supposed to keep more of the model's original "knowledge"? In that case it's actually good to "keep the model from being biased to the most recent training samples"
(I'm trying to understand it and make up my mind what's best in my case)

#

I'm not sure what you mean by "bias the starting point of your fine tuning". Maybe I should read more indeed 🙂

hot breach
#

biases the start of fine tuning because when you fine tune it is creating nonema weights anyway, ema weights are a byproduct

#

the trainers will (or should) use nonema weights if present in the ckpt

#

if nonema weights are not present, it will copy paste the ema weights to nonema weights, then start training and be training on nonema weights, only producing ema weights as a byproduct

#

it will probably make more sense if you do a bit of reading on ema
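
For anyone following along, a minimal sketch of the EMA bookkeeping described above, in plain Python with a single toy weight. The function name `ema_update` is made up for illustration, and the default decay of 0.9999 is an assumption (the toy loop uses 0.5 so the lag is easy to see); real trainers apply the same update to every tensor in the model.

```python
def ema_update(ema_weights, live_weights, decay=0.9999):
    """Return the new EMA weights after one training step.

    ema <- decay * ema + (1 - decay) * live
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, live_weights)]


# Toy run: the live (non-EMA) weight moves every step,
# while the EMA copy trails behind it.
live = [0.0]
ema = list(live)  # EMA starts out as a copy of the live weights
for step in range(1, 6):
    live = [float(step)]  # stand-in for an optimizer step
    ema = ema_update(ema, live, decay=0.5)
    print(step, live[0], ema[0])
# final line printed: 5 5.0 4.03125
```

The optimizer only ever touches the live weights; the EMA copy is just a running average of them, which is why it comes out as a byproduct of training.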

tame aurora
#

Definitely! I'd do that and thank you for the effort

hot breach
#

again, in practice, I'm not seeing that using normal 4gb/2gb files as a starting point screws stuff up a bunch, maybe its not a big deal for stable diffusion, but its just a best practice type thing

tame aurora
hot breach
#

there are a lot of weird things people are doing that fall out of best practice, so just be careful of what the masses parrot around, I imagine SD is most people's first foray into training machine learning stuff and they usually are sort of "going through the motions" after watching a couple youtube videos, and the popular creators are often just as ignorant

tame aurora
#

I agree, I'm trying to filter those out as well

hot breach
#

I've had a lot of "wtf" moments watching what people do or tell others to do, or when they state opinions on what is best, etc

#

and back to ema/nonema, I'd like to A/B test that and compare, it's just an issue of finding time to do it, but since I did most of my early stuff just off the 4gb ema-only 1.4 and it wasn't a disaster it's just been low priority

#

SD may be more resilient to ema vs nonema due to its architecture vs other ML models? just guessing, or the LR people are using for fine tuning is low enough the recency bias is not that large

tame aurora
#

btw I had the chance to try non-ema before switching to with-ema and it seemed like the model was quicker to "forget" its general knowledge and start producing stuff similar to my training data (a very small dataset)
with EMA it takes noticeably more epochs before it shifts that way (without changing other hyperparameters)
so it's like striking a balance between what you need - more original model or more "your" model. And I believe there are multiple ways to strike this balance, considering other hyperparameters like learning rate

subtle moth
tame aurora
#

there's a progress bar (tqdm?) for each epoch and it reports the time for that epoch.. I just noticed those times are shorter when on 4 gpus. Today I also noticed there's an "Average epoch time" in the logs as well so you could pay attention to it as well, I guess

subtle moth
tame aurora
#

why not make a subset of 900 images and play with it to streamline the process first? 🙂

subtle moth
#

ha yeah, good obvious idea that I hadn't thought of. to be fair, I didn't expect the metrics to be like this

#

I'm also planning to add some better multi-gpu metrics

tame aurora
#

what kind of metrics do you mean?

subtle moth
tame aurora
#

actually, you may be right - I remember something like the # of steps in an epoch being less when I switched from 1 gpu to multiple
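
A quick sketch of why the step count per epoch drops, assuming ordinary data-parallel training where every GPU processes its own batch on each step. The helper name `steps_per_epoch` and the per-GPU batch size of 4 are made up for illustration; the 900k image count and the 5 GPUs come from the messages above.

```python
import math


def steps_per_epoch(num_images, per_gpu_batch_size, num_gpus=1):
    """Number of optimizer steps needed to see every image once."""
    effective_batch = per_gpu_batch_size * num_gpus
    return math.ceil(num_images / effective_batch)


for gpus in (1, 5):
    print(gpus, steps_per_epoch(900_000, 4, gpus))
# 1 225000
# 5 45000
```

Under that assumption a single step on 5 GPUs does 5x the work of a step on 1 GPU, so comparing seconds per step can make multi-GPU look slower even when the time per epoch is shorter.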

old igloo
#

I'm new to Dreambooth but so far I've trained 5 or 6 models with photos of myself and family members. I've noticed that with some of the models, SD struggles to produce images that are significantly different from the input images. I am sure I've made some missteps in the training process, but I'm not sure where to start in terms of correcting those mistakes. For each set, I used 30 images, cropped to 512x512, with training steps of 3000, and 3e-6 learning rate (also did some with 1e-6). I used generic "man" and "woman" class names. I am able to get outputs that look like the people I trained in the models, but can't seem to get it to change their appearance much, such as with a prompt like "XYZ man as superman". Any advice on what I can adjust to correct this?

old igloo
old igloo
#

So, it seems like with a model I trained with Dreambooth and converted into a ckpt file for use with Automatic1111, I need to use a much lower CFG scale than I normally use in order for it to honor the prompt and not just give me recycled versions of the images I trained on. Is that related to the number of images I trained on, the number of epochs, and/or the learning rate?

hot breach
#

struggling to generate something other than training images and having to lower cfg scale is a sign of overtraining

#

adding more images to train and/or decreasing steps might help

#

ideally, you are getting multiple checkpoints when you train at different step intervals and you can test several out and pick the best one, if you only get one ckpt then you're kinda stuck starting over to try fewer steps

old igloo
#

Is 30 images not enough? Is 3000 steps too many for 30 images? I'm using the Dreambooth colab notebook, I don't know if I'm getting multiple checkpoints. My save interval was 4000 when I ran 3000 steps. Does that have anything to do with it?

stone garden
#

your save interval should be lower than your steps. when you have 3000 steps and a save interval of 500 it creates a checkpoint every 500 steps. You would end up with 6 checkpoints (500, 1000, 1500, 2000, 2500 and 3000)
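
A minimal sketch of that arithmetic, assuming the trainer saves whenever the step count is a multiple of the save interval (the helper name `checkpoint_steps` is made up for illustration):

```python
def checkpoint_steps(max_steps, save_interval):
    """Steps at which an intermediate checkpoint would be written."""
    return list(range(save_interval, max_steps + 1, save_interval))


print(checkpoint_steps(3000, 500))   # [500, 1000, 1500, 2000, 2500, 3000]
print(checkpoint_steps(3000, 4000))  # []
```

With an interval of 4000 on a 3000 step run the list is empty, so there are no intermediate checkpoints to compare; whether the script still writes one file at the very end depends on the script.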

old igloo
#

Ok, that makes sense. And is that a recommended save interval for 3000 steps?

sharp solstice
#

is there a comparison doing embedding vs hypernetwork in automatic1111? i understand the differences, but i don't think i've seen any comparisons online

#

i'm trying to train an embedding right now and so far I'm getting much better results than hypernetwork

stone garden
old igloo
#

I see. So the purpose of the checkpoints is to give me the option of choosing which checkpoint produces the best results?

tired wind
#

yes, since overtraining is a common problem

hot breach
#

yes there's a chronic problem because of how dreambooth has been popularized

#

people just try to guess how many steps they need and only generate one ckpt file, if you overtrained you're screwed and have to start over

#

if you use an online service you'll need more volume storage to store the files as you train, but its well worth the small extra cost for the volume storage so you don't have to keep renting the instance again to start over

half spoke
frozen bobcat
#

I've had very good results training a custom character.
But are there best practices for including certain poses and angles among the images used for training?

half spoke
#

With embeddings/hypernetworks, I imagine you could just note the pose in the template [textfiles]

frozen bobcat
sharp solstice
# half spoke My understanding is that hypernetworks are better for style, but I haven't exper...

Yeah I'm getting the hang of it i think. The comparison you posted is also very useful.

So embedding is just like training a word to become like a very specific prompt. So for example if you're using a model trained on real people only, you wouldn't be able to train an embed to fit an anime character

Hypernetwork seems more like a continuation of the checkpoint, where the image data is stored onto the network given the prompts you use, which is closer to (or the same as?) how real training is done

sharp solstice
fallen nova
#

normal 512 dimensioned stuff

thorny sapphire
#

Does anyone have any good tutorials for getting textual inversion/hypernetworks working? I tried myself with some that I downloaded off the Hugging face repository and I cannot seem to get them working. They always throw an error about things not being in the right memory space or something whenever I hit the train button.

#

I am wanting to train for my wife, step daughter, and pets so I can produce some art of them, but I can't seem to figure out what is going on.

#

Seems like a pytorch issue.

half spoke
# sharp solstice Yeah I'm getting the hang of it i think. The comparison you posted is also very ...

if you're doing anything anime related, you're going to want to use an anime model. No matter the training/tuning you do, the other data in the model will still affect the results. As for what you would do for prompts for a hypernetwork, I'm no expert. I'd just use the hypernetwork.txt file under the textual_inversion_templates folder from web ui. Yes, embeddings affect the results of a given prompt, while a hypernetwork can be loaded and have its effects increased/decreased. They're both detached from the actual model file, unlike dreambooth which will create a new checkpoint file

iron tundra
tired wind
# sharp solstice Yeah I'm getting the hang of it i think. The comparison you posted is also very ...

You should be able to do textual inversion on an anime character. Then you could combine it with a hypernetwork of a style. I don't know about dreambooth. One important thing is you can generate an image with multiple textual inversion embeddings, so you can say [person1] and [person2] in X, whereas the hypernetwork is 1 thing right now.

I've been running tests of [person1] and [person2] in [style] with and without using a hypernetwork in addition to that. I don't have any conclusions yet other than it's probably worth training the same thing as textual inversion (do this first), then as a hypernetwork. Also I think hypernetworks should have more training samples, whereas with textual inversion embeddings you may get better results on a low number.

tired wind
# thorny sapphire Does anyone have any good tutorials for getting textual inversion/hypernetworks ...

The AUTOMATIC1111 webui should work out of the box. I was expecting it to be really complicated and it ran with zero problems on a Windows machine with a 3090ti. Running on Google Colab I had lots of problems. If you are getting memory errors, your video card may not have enough VRAM. For textual inversion, follow this: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Textual-Inversion (simplest, only 4-5 images should be enough). For hypernetworks, follow this for settings: https://rentry.org/hypernetwork4dumdums

half spoke
iron tundra
half spoke
reef junco
#

Trying to run hypernetwork training locally, getting a ValueError: the following 'model_kwargs' are not used by the model: ['encoder_hidden_states' , 'encoder_attention_mask']. Any idea what is causing this error? Can't find too much info about it online right now

#

this is during the preprocess step, seems to be an issue with BLIP as some extra info, PP runs fine if i turn off blip captioning

#

Can i just use an external clip interrogator and copy the prompts over for each image instead? Clip interrogator seems broken with the same error as well

glossy rune
#

blip/clip-interrogator seem to depend on transformers==4.15.0 while auto1111 will bump transformers to something more recent (4.24.0 is current)

reef junco
#

So in ubuntu, would i just remove the current transformers package and reinstall the older one?

glossy rune
#

pip install transformers==4.15.0
then run clip-interrogator
when done run pip install -U transformers

reef junco
#

I will try this out in a few minutes and report back if i have any more issues, thank you very much 🙂

#

I ran the hypernetwork training with no captions just to see what would happen while i waited, so imma give it the last 10 minutes it needs to finish, this will at least be a good test as well of seeing what no captions does.

#

also just as a side questions, does it need to have the same name on the photos after the captions as the hypernetwork name? like if i named my hypernetwork CBReal should all the photos have CBReal1..2...3 etc before i preprocess?

glossy rune
#

never worked with hypernetworks, sorry

reef junco
#

No worries! i appreciate your help at any rate, im starting with hypernetworks cuz the local dreambooth training is a bit above my head at the current moment

glossy rune
#

ah nice, thanks

reef junco
#

👌 Noted,

#

well fuck me, its still erroring out, clip interrogate is showing TypeError: Unsupported operand types for += 'nonetype' and 'str'

#

Blip is still the same error as above

reef junco
#

Yeah no luck with downgrading transformers, anyone got any other ideas, clip interrogate was working like a week ago idk what would make it stop

#

found the requirements file from blip seems a bunch of stuff was different version, gunna correct the versions it needs and see what happems

#

also fuck me i cant spell today

reef junco
#

sadge, still not working

half spoke
#

okay, noob/dumb question time. Using d8ahazard's extension would I use my folder with the regularization images for the Classification dataset directory?

#

and for "instance prompt" if I'm training a character on NAI would "masterpiece, best quality, artwork of 'X'" be ideal? In any case how does it know 'X' is the prompt I want? or would I be stuck with using "artwork of 'X'" for the prompt?

frozen bobcat
#

Is it possible to feed images of a 3D created city block and have SD train on that location? Anyone tried it yet?

rustic lava
#

Has anyone updated https://rentry.org/hypernetwork4dumdums or something similar to show the workflow with the new plugin for AUTOMATIC1111's repo?

old igloo
# stone garden that's up to you how many check points you want. if you want 10 check points you...

Ok, so I took your advice, and I retrained my model with 30 images up to 3000 steps with checkpoints every 500 steps. I'm finding that even on the 500 step and 1000 step checkpoints, my model is overtrained. Testing it with a 7.5 GS produces images that look too much like the original images and don't really honor the prompt, which as I understand it, means it's overtrained. Do any of these other settings look off to you?

stone garden
#

So, what if you use like 200 images of a subject? Is there a rule for training steps, or is it overkill?

#

dreamboothing of course

old igloo
#

I read in a Dreambooth guide that you should use 200x the class images as you have training images, and another that said they generally just use 1000 class images generated by the script. So I tried changing the num_class_images parameter to 1000, but as I am watching it run, it is only attempting to generate 238 class images. Any ideas?
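A possible explanation for the 238, sketched under the assumption that the script tops up the class image folder the way the Hugging Face diffusers DreamBooth example does: it counts the images already in the class directory and only generates the difference. The helper names and the figure of 762 existing images below are hypothetical, for illustration only.

```python
def class_images_to_generate(num_class_images: int, existing_class_images: int) -> int:
    """Hypothetical helper: new class images a top-up style script would generate.

    Assumes the script only fills the class folder up to num_class_images
    and generates nothing if the folder already holds enough images.
    """
    return max(num_class_images - existing_class_images, 0)


def class_images_for_ratio(num_training_images: int, ratio: int = 200) -> int:
    """The '200x' rule of thumb from the guide: class images per training image."""
    return num_training_images * ratio


# If 762 class images were left over from an earlier run (assumed number),
# asking for 1000 would only generate the missing ones.
missing = class_images_to_generate(1000, 762)

# The 200x rule applied to a 30-image training set.
by_ratio = class_images_for_ratio(30)
```

If that is what is happening, pointing the class directory at an empty folder (or clearing the old one) should make the script generate the full count.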

stone garden
#

okay, pretty close to the art style of sleepy gimp. now I have to train trench coats I see 🤣 (trained on 13 images without class data and 2400 training steps)

stone garden
#

okay, needs some fine tuning. let's lower the learning rate

old igloo
stone garden
stone garden
#

well there is a little bit of Michelle Pfeiffer in her... okay let's do 1e-7 and reduce the steps

shrewd jewel
#

I've been trying to train embeddings or hypernetwork (only 8gb of VRAM so no dreambooth 😭) and on a certain set of training images I ONLY get results like these: I've tried trimming down the image set (from like 50+ down to 20ish) with different lighting, clothing, a couple full body shots, a couple torso shots. I cropped and resized all the images myself. Went through and edited the pre-processed prompts blip spit out. I've tried both hypernetwork and embeddings (using auto1111). I've tried adjusting the learning rate up and down. I've tried various checkpoints from 1000 steps up to 5000 steps. I've tried changing up the keyword juuust in case it's colliding with something else in the dataset. What gives? These should be photos of a person btw and I've successfully done this with other people.

shrewd jewel
#

rubber ducky you're the one. I think I figured it out, the vectors per token was set to 1 with this embedding and not the others. Raising it seems to have fixed this issue.

surreal mango
#

so I'm using a dreambooth model of myself
last night it worked okay
but then SD updated and the layout is weird like this, but not only that, I get the same photo on every generation even with different seeds

slow badger
surreal mango
#

yes

#

any suggestions?

#

also

#

what the hell keeps happening with the noise?

slow badger
#

The last update to AUTOMATIC1111 was on the 8th at 7AM UTC

surreal mango
#

I don't know then

#

it showed git pull commands working, saying there were changes

glossy rune
clear flume
#

yo

#

I tried to install dreambooth locally

#

but it keeps giving me errors

#

can't import some stuff

#

maybe there is a way to uninstall it and reinstall it?

surreal mango
glossy rune
#

Depends on other variables, like whether you train the text encoder and whether you use prior preservation (and if so, the number of class images). Generally speaking 1000 steps is a good start, and with sd-1.4 as base model I’d recommend lr 2e-6. With 1.5 I prefer 1e-6
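As a rough sketch, the starting points from the message above can be written down as a small lookup. The dictionary keys and the `starting_config` helper are hypothetical names for illustration; the numbers are one person's preference from this chat, not official defaults.

```python
# Suggested starting learning rates per base model, as quoted in the chat.
# These are rules of thumb, not values from any library.
STARTING_LR = {
    "sd-1.4": 2e-6,
    "sd-1.5": 1e-6,
}


def starting_config(base_model: str) -> dict:
    """Hypothetical helper: a first-try DreamBooth setup for a given base model."""
    return {
        "max_train_steps": 1000,  # "1000 steps is a good start"
        "learning_rate": STARTING_LR[base_model],
    }


config_14 = starting_config("sd-1.4")
config_15 = starting_config("sd-1.5")
```

Text encoder training, prior preservation and the number of class images would all shift these values, so treat them as a first attempt to adjust from.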

maiden grail
#

Is there a way to label images with certain words, for the model fine tuning?

For example, let's say I am making magic staffs.

I would train this on images of staffs, and let's say 1 of the staffs is called a "staff of power".

I would want my model to be able to generate a "staff of power" but I don't want to make models for EACH of these descriptors.

I want to have 1 model, that is "staff" model, but I also want to instill into it the concept of a staff "of power"

high venture
#

Should I use a larger batch size when training the model with dreambooth? I am able to set a batch size of 4-6 on my rtx3060, and it runs significantly faster if you multiply the iteration time by the number of batches.

glossy rune
#

I’d probably first remove low-VRAM limitations like fp16, training without text encoder etc. before increasing batch size. But for efficiency you want to use as much of your VRAM as possible. I usually train with full features and bs 2 (on 24 GB VRAM)

hot breach
#

larger batch size is probably better, computes gradient across the whole batch
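A toy illustration of that point, not tied to any real trainer: for a one-parameter model `w * x` with squared error, the batch gradient is just the mean of the per-sample gradients, so a larger batch averages out the swings of individual samples. The function names and the data are made up for the example.

```python
def per_sample_grad(w: float, x: float, y: float) -> float:
    """Gradient of (w*x - y)**2 with respect to w for a single sample."""
    return 2.0 * (w * x - y) * x


def batch_grad(w: float, batch: list) -> float:
    """Gradient of the mean loss over the batch: the mean of per-sample gradients."""
    return sum(per_sample_grad(w, x, y) for x, y in batch) / len(batch)


w = 0.0
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Individual samples pull with very different strengths...
single_grads = [per_sample_grad(w, x, y) for x, y in batch]

# ...while the batch gradient is their average.
averaged = batch_grad(w, batch)
```

With batch size 1 each step would follow one of the `single_grads` values; with the full batch every step follows `averaged`.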

glossy rune
old igloo
#

For anyone running Dreambooth on colab, if you want all of your checkpoint weights converted to .ckpt files, you can modify the conversion cell to the following:

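import os
import subprocess

#@markdown  Whether to convert to fp16, takes half the space (2GB).
fp16 = True #@param {type: "boolean"}

# WEIGHTS_DIR is expected to be set earlier in the notebook.
print("Converting all weights located within " + WEIGHTS_DIR)

for entry in os.scandir(WEIGHTS_DIR):
    # Each saved checkpoint lives in its own subdirectory; skip stray files.
    if not entry.is_dir():
        continue
    modelpath = os.path.join(entry.path, "model.ckpt")
    print(entry.path, "->", modelpath)
    cmd = ["python", "convert_diffusers_to_original_stable_diffusion.py",
           "--model_path", entry.path, "--checkpoint_path", modelpath]
    # Only add --half when requested; passing an empty string as an
    # argument can trip up the script's argument parsing.
    if fp16:
        cmd.append("--half")
    try:
        # check=True raises CalledProcessError if the conversion script fails.
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        # Report the failure and move on to the next checkpoint.
        print("Conversion failed for " + entry.path + ": " + str(err))
        continue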
old igloo