#🔧|finetune

split acorn
#

a rare token is like "olis" or "hta" or "sks"

frank urchin
#

i see what you mean yeah

split acorn
#

You could try it! might work well. I'm not sure

frank urchin
#

correct me if this is dumb but could i like shorten princess to prncs or something 😭

split acorn
#

The longer the instance token the more you have to train

#

You could. But honestly, I think that's something you worry about later once you get the hang of it alicatPog

#

If you're training on Any3, I'd recommend "hta"

frank urchin
#

im using elysium

#

is that bad 😭

split acorn
#

nah, that's fine

frank urchin
#

ok so

#

princess tutu is fine for now then?

#

this isnt serious or anything

#

i just like learning new stuff so its all just for fun

split acorn
#

mmm, if you're learning you could try that, but the next time you try it, try it with "olis" or "sks" or "hta" and you might have better results

frank urchin
#

ok tysm

#

what else should i add? (if anything)

white current
#

instance prompt:

[filewords] or princess tutu, [filewords]

frank urchin
#

wdym by filewords?

#

OH

white current
#

just type [filewords]

frank urchin
#

youre not telling me to replace that are you

#

LMAO

white current
#

no

frank urchin
#

im too tired for this 💀

#

ty guys

#

ok whats next!

white current
#

for class prompt: [filewords]

#

i use sth different but eh need to experiment

frank urchin
#

okey

#

anything else?

white current
#

alright done

#

now make sure you have selected the model you made

#

on the leftmost

#

section

frank urchin
#

look good?

#

great model name i know

#

😌

stone garden
#

it's better than mine, mine crashed the whole webui!

frank urchin
#

LMAOOO

white current
#

and

frank urchin
#

WOOO

white current
#

now u have to wait a bit

#

only a bit

#

3060 ti might have low vram

#

but that processor is mmmmmmmm

stone garden
#

enough time to pray to the art gods!

white current
#

it nom nom iterations

frank urchin
#

8gb isnt thatttttt low

#

its just low for ai 😌

white current
#

fr

frank urchin
#

i heard peoples 4090s dying from stable diffusion

#

thanks to overheating

#

and having to RMA them

stone garden
#

just be careful with the temperatures of your card if you don't already have a hand on it (not literally)

split acorn
#

I think you can only db with 8gb by doing like 256x256 but I could be wrong

frank urchin
#

i have a very well cooled pc!

stone garden
#

as long as it doesn't turn into *has!

frank urchin
#

yall i think it failed

stone garden
#

and then had :P

vale egret
#

My 4090 never seems to go above 65deg. Is that because it only has 3/4 of the power cables plugged in?

frank urchin
vale egret
#

What?

frank urchin
white current
#

plug that 4th one, you arent using it full capacity...

white current
#

Lake.exe has stopped working
Please reboot

vale egret
#

Do i need full capacity? It seems to be like 95% the way there with 75% the power. Looks like good efficiency to me

white current
#

dam

#

i gotta get rtx 4090

#

but how

#

am broke

stone garden
#

just save for a A100 instead, not that much of a price difference!

vale egret
#

Sell your ai pictures on patreon

frank urchin
#

oh heres what happened

#

that sucks

stone garden
#

yeah, didn't you need about 12 gig for training? or did they fix that?

frank urchin
#

LORA

vale egret
frank urchin
#

makes 7gb work fine

#

LMAO

white current
#

That explains why

#

W11

#

sucks

#

hm

vale egret
#

Firefox uses gpu

white current
#

fr

frank urchin
#

true

#

ok i closed like 5 tabs

#

i will try again

#

and it failed again

#

is it using LORA?

#

i feel like its not

#

this should work fine

stone garden
#

if nothing else, then I'm sure there's people here who can help make the embedding if you ask them nicely! :D

white current
#

in the cmd

frank urchin
#

yeah it does

vale egret
#

Embedding? I thought kole was training a model

frank urchin
#

this is a model

#

i dont really need a whole model thats just what got suggested to me

#

💀

vale egret
#

What are you trying to train?

stone garden
#

@split acorn Found the speed issue. I set the batch size to 1 and now I'm getting over 4it/sec

white current
#

it would take like

#

2000/4 = 500 seconds to train
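
The arithmetic above, as a minimal sketch. It assumes the iteration speed stays constant for the whole run; `estimate_training_seconds` is a made-up helper name, and real runs also spend time saving checkpoints and preview images.

```python
def estimate_training_seconds(total_steps: int, iterations_per_second: float) -> float:
    """Estimate wall-clock training time as steps divided by speed."""
    if iterations_per_second <= 0:
        raise ValueError("iterations_per_second must be positive")
    return total_steps / iterations_per_second


# 2000 steps at roughly 4 it/s, the numbers quoted in the chat.
seconds = estimate_training_seconds(2000, 4.0)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")  # 500 s, about 8.3 min
```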

stone garden
#

I'd also recommend an embedding if it's just for one character, that way they might use it on all kinds of models :D

frank urchin
#

thats what i was thinking as well

vale egret
#

You can definitely train a character using an embedding, it is somewhat lower quality

#

But at least you can do it with lower vram

frank urchin
#

yeah this doesnt seem to be going well

stone garden
#

and you might need to tweak the images, and settings if your results aren't to your liking so better start slow with something simple as an embedding! :)

frank urchin
#

do i need to start over to do an embedding?

vale egret
#

I accidentally used one of my anime embeddings in SD 1.5 and it gave me Japanese people with massive bug eyes

stone garden
frank urchin
#

ok thats good at least LOL

vale egret
#

No need for regularization with an embedding

stone garden
round hare
#

Thanks, it's a really good idea, but unfortunately, I can't put this kind of expression in automatic1111. I'm only able to fill the box with one number

frank urchin
#

ok so where would i go from here then to make an embedding?

frank urchin
#

forgot that was a default thing my bad 😭

#

i was in dreambooth

stone garden
frank urchin
#

should i change this?

stone garden
#

you can always play around and test what'll happen, if it works or not. But I fear the vram needs to be at a certain number to even be able to run :/

stone garden
# stone garden Not vague at all. You helped!

you must be hallucinating, I never help people. I mostly stumble over things until I either get kicked out, or someone gives me another drink... wait, this isn't the bar. W-Where am I?!

frank urchin
#

(got this btw dw)

stone garden
#

don't worry about the settings, try and get it to work first. You can always redo it if it turns out bad. And also ask people for more help. There's no limits... other than the electrical bill, but I never pay my bills so

frank urchin
#

LMAO

#

i turned 18 a couple weeks ago

#

and literally 3 days after i turned 18 my dad started making me pay rent 😭

frank urchin
#

ty!

split acorn
#

The Prompt Template had a file that just said [filewords] I think?

#

just going to double check that

white current
#

@split acorn finetuning on stable tuner doesnt work cause vram lul

#

guess back to lora

stone garden
frank urchin
#

LMAOO

#

Yeah it's not fun

white current
frank urchin
#

I'm currently in debt to my dad!

#

Isn't that fun

split acorn
#

oh yeah, Stable Tuner is only for 24 GB

#

EveryDream I think is also 24 GB? alicatHm2

white current
#

@split acorn also i did a clever trick with .txt file prompts

frank urchin
#

for my 18th birthday my grandparents gave me $1000 to use for buying a car

#

and my dads already taken a quarter of it

#

for rent

white current
stone garden
# frank urchin wdym?

Just joking that you can ask your parents for help, but as your dad is the one who you need to pay it'll turn around as a weird scene :P

white current
frank urchin
#

oh yeah 😭

split acorn
frank urchin
#

at least they agreed on not kicking me out

#

they havent decided what they'll do when i run out of money to pay them tho

#

so thats worrying 💀

white current
#

what the f...

#

thats how they treat their kids...

frank urchin
#

yeah its definitely different

#

its weird being THIS cautious about spending money

#

bc im literally this close to being broke bc of my own parents

#

😭

#

the best part is, is both my mom and dad are very against "normal" jobs

#

and would literally shame me if i got a typical job

stone garden
#

@stone garden The secret sauce was just to add the embedding name at the front of the prompt as well as around midway and even with the custom model it looks quite a bit like the source images

frank urchin
#

so this is literally gonna take days right

stone garden
#

if you have it save after X images, then you can always check the quality before continuing, I think. Not sure about that now that I think about it :O

frank urchin
#

its at 500 😭

#

ill just check on it tmr and we'll see how it goes

#

thank you all so much for the help!!

#

ima head to bed now

#

currently 2:48 am here

regal harbor
#

I have 300 carefully edited and captioned images, some are closeups of faces, some are single people, some are 2 people together. I'm training on a 1060 6gb, so it's slow.

Could someone tell me an ideal LR to not overtrain, but to maximize my time?

little hollow
#

let it train as much as possible

split acorn
#

10k takes about 40 mins with a 3090 (and GA Steps of 1)

#

Yosh what Taken said

#

10k should be enough to give you good results

little hollow
#

you need to find the ideal lr to grad to batch -> this is literally throwing rocks and trying to hear if it hits something

#

takes time

split acorn
#

yeppp

little hollow
#

certain styles have a "standard" sort of in different models

in 2.1 anime is between 0.1 to 0.0005

#

realistic is ~ 0.05 to 0.001

split acorn
#

has anyone had luck training on the new Waifu Diffusion 1.4 epoch 1? Results so far are really bad alicatNF4

#

via dreambooth

#

I'll try going up to 1800 steps, 600 and 1200 were pretty yikers (nope 1800 is also yikes... will probably wait for the full release in 5 days and then fiddle around trying to make it work after)

little hollow
#

Can anyone donate their favorite ti templates?

little hollow
#

this can copy style and the drawing style altogether

#

but... it doesn't produce high quality

#

just mid

#

i didn't try just name, filewords yet

split acorn
#

Yeah, I could see that being mid quality

regal harbor
#

I'm confused

I train one model, then I train another model, but it seems to continue from the last step of the first model (the Lora Model). I don't understand what's going on there

split acorn
#

you might have better luck with just [name], [filewords]

#

I've only done this for anime stuff though
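
For anyone unsure what `[name], [filewords]` turns into, here is a rough sketch of the placeholder substitution. It only mimics the idea and is not the webui's actual code; `expand_template` and the sample caption are invented for illustration.

```python
def expand_template(template: str, name: str, filewords: str) -> str:
    """Fill the [name] and [filewords] placeholders of one template line."""
    return template.replace("[name]", name).replace("[filewords]", filewords)


# [name] stands for the embedding name, [filewords] for the image's caption text.
print(expand_template("[name], [filewords]", "princess_tutu", "1girl, ballet, tutu"))
```

With this sketch, a template containing only `[filewords]` yields just the caption, while `[name], [filewords]` puts the embedding name in front of it.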

regal harbor
#

what exactly does Lora Model mean here?

split acorn
#

Danbooru tags make training via filewords a breeze. Also Shuffle Tags works really well, imo alicatUwU
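
A small sketch of what tag shuffling could look like on a comma-separated caption. This is an illustration under assumptions, not the trainer's real implementation; `shuffle_tags` and the `keep_first` parameter are hypothetical names.

```python
import random
from typing import Optional


def shuffle_tags(caption: str, keep_first: int = 0,
                 rng: Optional[random.Random] = None) -> str:
    """Shuffle comma-separated tags, leaving the first `keep_first` in place."""
    rng = rng or random.Random()
    tags = [tag.strip() for tag in caption.split(",") if tag.strip()]
    head, tail = tags[:keep_first], tags[keep_first:]
    rng.shuffle(tail)  # in-place shuffle of the tags that may move
    return ", ".join(head + tail)


# Keep the instance token first, shuffle the descriptive tags behind it.
print(shuffle_tags("princess tutu, 1girl, ballet, tutu, smile", keep_first=1))
```

The idea is that the model sees the same tags in a different order on each pass, so it is less likely to tie meaning to tag position.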

little hollow
#

yeah, sadly not for 2.1

split acorn
#

danbooru tags + 2.1 for training? or

split acorn
#

mind you, I haven't done too many models yet, but... the old settings that worked really well just... aren't

little hollow
#

the danbooru tangs on 2.1 are useless

#

tags*

#

@split acorn

#

you either need to use clip 2.2
or do it by hand, clip 1 works semi well
danbooru like crap

split acorn
#

This includes models like WD 1.4 epoch 1 (which was trained on 2.1)?

little hollow
#

this is for example using clip 1

little hollow
#

2.1 has horrible hands and eyes
look at this pic for example above -->> there are no hands, and the eyes are red

#

even though the entire embedding is B&W or yellow tint

split acorn
#

yeahhhh, my Clip 1 results are horrible too

little hollow
#

eyes in anime usually scored above the 0.995 filter --> meaning that the eyes were flagged as sexual and weren't included in the model

split acorn
#

So I should be training using like Clip 2? and then change the settings to Clip 2 when using it?

#

for better results or

little hollow
#

filter out bad words and such

#

its really easy, you can even filter in the entire dataset or replace words or certain tags

#

useful as hell, i replaced "a drawing of" and "a pencil drawing of" with nothing

#

basically erasing it all in 1 click
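
The same kind of cleanup can be sketched as a script over the caption files. This assumes one UTF-8 `.txt` caption per image in a single folder; `clean_caption` and `clean_caption_files` are made-up names, and it is worth running on a copy of the dataset first.

```python
from pathlib import Path

# Phrases to strip from every caption (the two mentioned in the chat).
PHRASES_TO_REMOVE = ("a pencil drawing of", "a drawing of")


def clean_caption(caption: str, phrases=PHRASES_TO_REMOVE) -> str:
    """Remove unwanted phrases, then tidy leftover whitespace and commas."""
    for phrase in phrases:
        caption = caption.replace(phrase, "")
    return " ".join(caption.split()).strip(" ,")


def clean_caption_files(folder: Path) -> int:
    """Rewrite every .txt caption whose text changes; return how many did."""
    changed = 0
    for txt in sorted(folder.glob("*.txt")):
        original = txt.read_text(encoding="utf-8")
        cleaned = clean_caption(original)
        if cleaned != original:
            txt.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```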

little hollow
#

everyone gets errors

split acorn
#

Run 2.1 models via Clip 2 without getting errors?

little hollow
#

clip 2 doesn't work locally

#

no idea why

split acorn
#

ahhh gotcha

#

going to give that a try just to see what happens

little hollow
#

lots of people tried to make it work, but it just wont cooperate

#

you can use the Colab for clip 2, but 30 secs per image

#

sometimes a full minute

split acorn
#

it runs, just going to double check that it's doing something

little hollow
split acorn
#

ooo the results are better but still kinda poor

#

I probably overtrained it a lot

#

going to try a diff model

#

OHH

#

I think I know what's happening

#

Training dreambooth on 2.1 results in models with the same infrastructure as 1.5, because it doesn't create a yaml?

#

oh it does make a yaml

#

mmm

#

will post example pictures in a sec

#

Really poor quality though

#

but like, it still works

#

I didn't train it via clip skip 2 though

#

Oh that's not the same seed

#

one sec, deleting and trying again

#

Clip Skip 1:

#

Clip Skip 2:

#

super poor quality but no error message

#

and a little bit of a difference

#

but not huge though

#

clip skip 1 being better

#

I'm doing something wrong though. I think the settings definitely need to be different

little hollow
#

Clip skip in 2.1 doesnt do anything

split acorn
#

It did do something, but I'd agree that it looks broken

#

or at least it's not working as intended

little hollow
#

Because it touches floats but it just means that the floats round up to a different close number

split acorn
#

ahhh

little hollow
#

So basically it doesnt do anything

split acorn
#

that would explain the subtle difference

little hollow
#

Say thanks to sinister for explaining that

split acorn
#

alicatLove tyty

#

Sweet! So I'll ignore 2.1 training for now BonGoat

little hollow
#

Dont ignore it, 2.1 gives off much higher quality results

split acorn
#

I could try with the base 2.1 model alicatHm2

little hollow
#

2.1 base is gonna give you different results

split acorn
#

That's what I'm looking for, different results, because the WD 1.4 epoch 1 results were super bad GoatUppies

#

(2.1 would use different prompting and would need me to redo my filewords, so I'll save that for a future project)

little hollow
#

Epoch 1 almost always will be useless, also you can accelerate your training by using good "Init words"
When you are creating the embedding

#

Good init is ~1 to 3 words

split acorn
#

Is that true for dreambooth too?

#

I'll give embeddings a try

native lodge
frank urchin
#

Do it in the training menu

native lodge
frank urchin
#

Im only training for my first time so I probably shouldn't give advice haha

#

OMG

#

it worked!

#

i was able to train on a 3060 ti!!

split acorn
#

also, fyi it should be only "princess tutu, [filewords]"

frank urchin
#

idk how much its gonna suck

split acorn
#

yay GoatUppies

frank urchin
#

but hey im happy it worked

#

do you have to restart webui for it to load the new embeddings?

split acorn
#

nah

#

just type the name of the embedding and it should auto activate. For the "-18000", "-17000" you'll need to click and drag them to your "embeddings" folder

#

Example:
E:\Programs\AI\Auto1111\stable-diffusion-webui\embeddings
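
If dragging files by hand gets tedious, the copy step can be sketched like this. The source folder is an assumption (use wherever your webui saved the intermediate `.pt` files), and `copy_checkpoints` is a hypothetical helper, not part of the webui.

```python
import shutil
from pathlib import Path
from typing import List


def copy_checkpoints(source_dir: Path, embeddings_dir: Path, name: str) -> List[Path]:
    """Copy every '<name>-<steps>.pt' file into the embeddings folder."""
    embeddings_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for checkpoint in sorted(source_dir.glob(f"{name}-*.pt")):
        target = embeddings_dir / checkpoint.name
        shutil.copy2(checkpoint, target)  # copy2 also keeps file timestamps
        copied.append(target)
    return copied


# Example call (both paths are placeholders for your own install):
# copy_checkpoints(Path("path/to/saved/checkpoints"),
#                  Path("stable-diffusion-webui/embeddings"),
#                  "princess_tutu")
```

Each copied file can then be used in a prompt by its file name without the extension, for example `princess_tutu-18000`.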

frank urchin
#

yep!

#

done that

#

time to see how bad it is 💀

split acorn
#

now just type the name of the pt to test it out

#

for example "princess_tutu-18000"

#

or just "princess_tutu" for the default

frank urchin
#

ngl this aint working 💀

#

its like

#

not using my embedding 😭

frank urchin
#

im not sure why its not working

#

ive got it in the prompt and stuff

#

i tried a few prompts

split acorn
#

and the name of the embedding in the embedding folder is "princess_tutu"?

frank urchin
#

yep

#

i put 2 in there just in case

#

with different names

#

and tried both

split acorn
#

weird!

#

That should be working

native lodge
#

can someone just give me an example how to?

split acorn
#

do you have a screen shot of this area?

frank urchin
native lodge
#

something for noobs

split acorn
#

Also just use princess tutu, [filewords] next time for your instance prompt. The instance prompt you used is super wrong alicatLove

native lodge
#

https://youtu.be/HahKXY7AQ8c
It worked for me, but the interface has updated and I don't understand where to click now

Dreambooth local training has finally been implemented into Automatic 1111's Stable Diffusion repository, meaning that you can now use this amazing Google’s AI technology to train a stable diffusion model with your own images. You can train a character, an object, a style, or anything you want! There is also a new option that allows you to use D...

native lodge
#

Or can I just bring back the old interface?

#

Just like in this video

frank urchin
#

you'd have to downgrade versions

#

which idk if you can do

native lodge
#

idk

frank urchin
#

im not sure what youre asking

split acorn
#

I

#

It's really not that bad once you get used to it

#

the new DreamBooth UI

storm parrot
#

hi all!
I really need expert advice on a few questions.
I trained a casual art style for icons of in-game resources.
I used DreamBooth from TheLastBen with the 2.1 768 model.
The dataset consisted of 100 images. I repeated the training several times, which came to about 30k steps in total.
The result is disappointing, with many images having severely distorted proportions. Also, the model practically stopped responding to the CFG scale values...

Who has already encountered training 2.1, what is the optimal UNet_Training_Steps and UNet_Learning_Rate for a dataset of 200 images?

frank ibex
#

What learning rates have you tried?

storm parrot
sleek yoke
#

Hi all. Happy new year!! May I ask whether there is some fine-tuning examples for stable diffusion, especially for inpainting model?

honest nexus
oak spear
#

Did anyone have any semblance of success training a Textual Inversion with Anything V3 through Automatic Web UI? Because so far it’s failing really badly for me.

tropic quail
#

I trained my own model, but it doesnt really follow prompts, seems to just churn out random images in the style of the instance images I used to train it

#

any idea what could be wrong?

amber musk
#

This one is for Textual Inversion

real dust
#

is there a useful guide somewhere to explain a little better how to decide how to configure ti/hypernetwork training? using automatic1111 there are a lot of options such as reversing images, deepbooru for anime tags vs BLIP interrogator (or both) for automatic image tagging, what prompt templates are recommended - the guides I've found have been very light on detail. Since training takes 8+ hours it takes a long time to conduct experiments.

white current
#

Collected 9K images for my diffusion project
gotta collect 18K more
mmm

obsidian idol
#

Last night's training was with 1300 images. V2 training has been .. interesting

white current
#

What are your settings if ok to ask

obsidian idol
#

I'm a newb I'm terrible person to ask for good advice. But webui embed in this case.

#

5950x/3090

#

In terms of introducing many unique subjects with few shared classifications, I've had much more success with dreambooth.

white current
#

some say its same but programs like StableTuner differentiates it

split acorn
#

They only differentiate it to make life easier

#

DB for training on a token and Finetuning training on captions

#

They don't need the same settings, so it just disables the settings it doesn't need

white current
#

whats the difference between tokens and captions

split acorn
#

Have been having good luck with 2.1 training on TI and Hypernetworks but I think I'm doing something wrong with Dreambooth 2.1

#

Token is just the instance token. People use a rare token like "sks". Captions just means the filewords, the words that describe the picture

#

so you train on those words

#

is how I understand it

obsidian idol
#

In 1.4, i used the same steps count and samples for embed vs. hypernetwork vs. dreambooth. For my scenario, embed and hypernetwork were similar, and dreambooth was exceptional. ~1000 artistically drawn pokemon with species and types. My measure of success generally was applying typing to different species. "dragontype eevee" for example. an attempt was made at "gmax style."

#

"dugtrio GMAX"

little hollow
#

For example: bi pedal, quad pedal, wtf this isn't legs pedal etc...

white current
#

@little hollow Wait, so more pictures aren't always good? I am probably going to train a sci fi diffusion soon, with a dataset im collecting which has over 26K images

#

Well, sci fi is a really broad topic so i think its fine?

obsidian idol
#

I think the issue @little hollow is pointing out is with the singular uniqueness of the samples and names.

#

For my case, I separate them into "styles" by identifying the types (firetype, poisontype), so the result is a dozen or so styles with 60-90 samples each.

white current
#

i first write a script so it makes a textfile with the filename and the filename written in it, then i append the clip interrogation prompt to the txt file

#

or vice versa
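
A minimal sketch of that kind of script, assuming one caption `.txt` next to each image. The interrogation step itself is left out: `extra_captions` is a hypothetical dict of already-generated captions keyed by image file name, and `write_filename_captions` is an invented name.

```python
from pathlib import Path
from typing import Dict, Optional

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def write_filename_captions(folder: Path,
                            extra_captions: Optional[Dict[str, str]] = None) -> int:
    """Write '<image stem>.txt' holding the file name, then any extra caption."""
    extra_captions = extra_captions or {}
    written = 0
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        parts = [image.stem]  # the file name without its extension
        extra = extra_captions.get(image.name)
        if extra:
            parts.append(extra)  # e.g. an interrogator caption
        image.with_suffix(".txt").write_text(", ".join(parts), encoding="utf-8")
        written += 1
    return written
```

Swapping the order of the two parts gives the "vice versa" variant.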

obsidian idol
#

BLIP + manual species/type data works well for me. "a cartoon character holding a stuffed animal in its arms and smiling at the camera with a smile on its face, Slowbro, WaterType, PsychicType"

white current
obsidian idol
#

well, let me rephrase -- I have a spreadsheet of numbered subjects and attributes (style, type, gender, region, etc) that I use to script into a string. I append that string to the BLIP. By "manual" I mean not BLIP.
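
Roughly what that could look like, as a sketch only. The column names (`number`, `name`, `type1`, `type2`) and the single sample row are invented for illustration; the BLIP caption is passed in as plain text.

```python
import csv
import io

# Illustrative stand-in for the spreadsheet, exported as CSV.
SAMPLE_CSV = """number,name,type1,type2
1,Slowbro,WaterType,PsychicType
"""


def attribute_string(row: dict, columns=("name", "type1", "type2")) -> str:
    """Join the non-empty attribute columns of one spreadsheet row."""
    values = ((row.get(column) or "").strip() for column in columns)
    return ", ".join(value for value in values if value)


def build_caption(blip_caption: str, row: dict) -> str:
    """Append the scripted attribute string to the BLIP caption."""
    parts = (blip_caption.strip(), attribute_string(row))
    return ", ".join(part for part in parts if part)


rows = {row["number"]: row for row in csv.DictReader(io.StringIO(SAMPLE_CSV))}
print(build_caption("a cartoon character holding a stuffed animal", rows["1"]))
```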

vale egret
#

You could use all sorts of things for training pokemon, like dex entries, or the dex species name, or maybe even base stats and abilities

vale egret
#

Then you can submit the results to the CAP project

obsidian idol
earnest aspen
#

Hello,

How are people doing Textual Inversion on Apple Silicon since it appears to not be working on latest version of InvokeAi?

I have tried several Google Colabs, including the official Hugging Face one, and can't get it to work.

obsidian idol
#

results for 2d have been really positive. able to merge species and cross-type species. ok maybe "really positive" isn't the right word, but it's .. within expectations.

#

results for 3d ... well,

little hollow
sudden isle
#

welp, I got 5 fingers

#

but not what i wanted lmfao

vale egret
#

Create-A-Pokemon run by smogon

sudden isle
#

I got it to be consistently 5 fingers, except its literal nightmare fuel

#

even the off fingers look decent

little hollow
#

@white current
Try to place them by a certain category, the more round ones
The more square ones or something

Anything you can think of, and if something has nothing in common with the rest - to the bin with you

sudden isle
#

any clue what caused this nightmare fuel with my training? I used a dataset of 11k hands

#

1e-6, 14,000 steps

--mixed_precision=fp16
--train_batch_size=1
--resolution=512
--gradient_accumulation_steps=1
--use_8bit_adam
--train_text_encoder

#

I did this twice

little hollow
#

@white current #1045349359044280360 message

Look at this example of how learning works for embeddings at least, the chat between me and sinister, he gave out a long explanation, it lasted ~1 hour so it's ~60-70 chat messages down

#

Some visual explanation from him as well, it really helped me to understand how to filter

#

From 400 pics i went to 30, and those 30 gave out 10x as much effect than the 400 could

obsidian idol
#

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 93: invalid continuation byte

#

Argh. Turns out I was converting the txt files to utf-16 during processing.

#

Headsup. They need to stay utf-8 apparently. heh
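
A sketch of one way to get stray caption files back to UTF-8. The detection is a heuristic and an assumption on my part: a UTF-16 byte-order mark is trusted, otherwise UTF-8 is tried, and anything else is guessed to be cp1252. `decode_caption` and `normalize_to_utf8` are made-up names; try it on a copy of the data first.

```python
import codecs
from pathlib import Path


def decode_caption(data: bytes) -> str:
    """Decode caption bytes, preferring UTF-16 (with BOM), then UTF-8."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the BOM selects the byte order
    try:
        return data.decode("utf-8-sig")  # also drops a UTF-8 BOM if present
    except UnicodeDecodeError:
        # Last resort, a guess: treat the bytes as Windows-1252 text.
        return data.decode("cp1252", errors="replace")


def normalize_to_utf8(folder: Path) -> int:
    """Rewrite every .txt file in the folder as UTF-8; return the file count."""
    count = 0
    for txt in sorted(folder.glob("*.txt")):
        text = decode_caption(txt.read_bytes())
        txt.write_text(text, encoding="utf-8")
        count += 1
    return count
```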

regal harbor
#

how does the text encoder learn new ideas?

#

if I give it an image with something it's never seen before, and a description, how does it understand what in the image the new thing is?

obsidian idol
regal harbor
#

I want to understand theoretically how it thinks, so I can make the right decisions when I curate / edit training data

dense flame
#

How many epochs would you recommend for a 60k image dataset on everydream?

storm parrot
#

Hi all!
I have read several guides on training a model with DreamBooth. They all say that file names must have a unique identifier. However, the examples in those guides are usually about training on a single subject.
I am having trouble figuring out the correct approach to naming image files in a dataset if I need to train a model on an artstyle rather than a specific subject.

I am asking for help with the file naming approach if I am training a model based on different subjects that are similar in style.

  1. Should I give descriptive names to the image files in the dataset?
  2. Should I give the files a unique text identifier and include a description of the image in a separate text file?

PS: When training a model on version 1.5, I gave the files descriptive names, separating the words with spaces. The results were good. However, on version 2, the results deteriorated sharply

prisma nacelle
#

Anyone know if adding mirrored images to the data set helps with training? If so, do i need to add to the description that the image is mirrored, or should i just name the file as SubjectA Mirror (#).ext ?

little hollow
prisma nacelle
little hollow
# prisma nacelle so those guides that have mirrored data sets for their photos of a person were n...

imagine someones face with a freckles on the left side, only the left side(from the perspective of someone looking at the person)
and the person having one arm only, lets say his right

you flip the pictures, now what do you have? - sometimes a person with freckles on the left, sometimes on the right, a hand on the right, sometimes on the left

ahh ok, so it must be a person with sometimes L/R freckles, and sometimes L/R hand
4 different variants arise

#

or worse - it might make the face fully symmetrical

#

what im saying is correct for 2.1 yeah?, not so sure about 1.4/5

#

should be quite similar
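
The freckles/arm example can be counted out in a tiny sketch. It assumes, as the explanation above does, that the model picks up each feature's side independently; that is a simplification for illustration and not a claim about how training really behaves.

```python
from itertools import product

FLIP = {"left": "right", "right": "left"}


def mirror(sample: dict) -> dict:
    """Return the horizontally flipped version of a feature description."""
    return {feature: FLIP[side] for feature, side in sample.items()}


def independent_combinations(samples: list) -> list:
    """All feature/side combinations if each feature is learned on its own."""
    features = sorted(samples[0])
    sides = [sorted({sample[f] for sample in samples}) for f in features]
    return [dict(zip(features, combo)) for combo in product(*sides)]


original = {"freckles": "left", "arm": "right"}
dataset = [original, mirror(original)]  # the original plus its flipped copy

print(len(dataset))                            # 2 kinds of image in the data
print(len(independent_combinations(dataset)))  # 4 variants the model may produce
```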

prisma nacelle
#

oh i see, my thoughts are if you specify that it is mirrored then the AI might take that into account.

little hollow
#

just take 4 pics and put a mirror
i think that 20 epochs are enough?
then try to see how well it performs

prisma nacelle
#

right, in order for me to test that was why i opened the original question. If i am to specify it is mirrored, would it be done in the text description or the file name?

little hollow
#

filename

prisma nacelle
#

alright, cool.

little hollow
#

the template might give the regular one the word mirrored sometimes

#

yo, if anyone thinks im wrong tell me, id be glad to learn something new

prisma nacelle
#

i'll give it a shot after i set up the data. the main reason is indeed because the data has asymmetrical aspects and i sometimes notice the AI doesn't see it as asymmetrical.

bright plank
#

Hey is anyone using runpod for stable diffusion 2-1? I can't seem to get dreambooth working on it

white current
bright plank
#

I've watched the tutorials I've found, the dreambooth tab just doesn't show up for me. The tutorials for the Joe Penna notebook don't work for 2-1

obsidian idol
#

For sample descriptions (filewords/captions), is it problematic to have superfluous language and punctuation? For example:

a squat, quadrupedal amphibian with bumpy, blue-green skin. It has small, circular red eyes and a short, blunt snout. Its mouth is wide with two pointed teeth in the upper jaw and four in the lower jaw. On top of its head are small, pointed ears with reddish pink insides. It has three clawed toes on each foot.

#

Not because I like to write verbose descriptions, but because there are some well-written descriptive language that I can reference, and ideally programmatically.

rain tapir
#

Even if you get those juicy yaml files, the training sucks as of now

#

It just doesnt train well on that model, there may be a way to do it, but I have yet to see someone do it successfully

ornate flare
#

I'm assuming you're all using dreambooth

#

this runs all the way but the final model is the exact same as the initial

#

The only weird thing is how it's "only" using 9gb of ram

#

I don't think it's training at all

#

i can change the learning rate to 1 and nothing changes

split acorn
#

where does that come from? alicatHm2

ornate flare
#

dreambooth notebook

#

If i try on colab, same exact settings

#

it does work

#

decently well actually

#

it's the default dog whatever

#

if i try it locally however it doesnt work

#

im on a 3080

ornate flare
#

i think the problem might be xformers

ornate flare
#

i installed xformers

#

does not fucking work it's driving me nuts

split acorn
#

or using the colab

ornate flare
#

lovely thanks

livid axle
#

Hi there! I know how to train for faces and I know how to train for styles... but how would I do that for specific body parts like a hair-style? 🤔

prisma nacelle
livid axle
#

I stick more with the v1.4 and 1.5 Versions 🙂

prisma nacelle
#

so from my understanding after trying to train a character with a specific hair style, you would need samples of the data set that has many people with the same hair style. you can probably make this yourself using img2img and inpainting.

after you have enough data for it just set it up with the right descriptions and it should be able to train it.

livid axle
prisma nacelle
#

good luck. lol i spent the last few days stuck with my training only to find out the build i was using was bugged.

ornate flare
#

I am about to xform so hard

dark pivot
#

I'm trying to train a textual inversion embedding for sd 2.1, but I keep getting the error Sizes of tensors must match except in dimension 0. Expected size 1024 but got size 768 for tensor number 1 in the list. Does anyone know what I'm doing wrong based on that error?

obsidian idol
#

are your samples uniform? and "resolution" set correctly?
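
One possible reading of that error, offered as a guess and not a confirmed diagnosis: 768 and 1024 match the text-embedding widths of the SD 1.x and SD 2.x model families, so a 768-wide embedding may be getting mixed in while a 2.1 model is loaded. The sketch below only illustrates that mismatch; `check_embedding_width` and the family labels are hypothetical.

```python
# Width of one token vector produced by each family's text encoder.
EMBEDDING_WIDTH = {
    "sd-1.x": 768,   # CLIP ViT-L/14 text encoder
    "sd-2.x": 1024,  # OpenCLIP ViT-H/14 text encoder
}


def check_embedding_width(model_family: str, vector_width: int) -> None:
    """Raise if an embedding's vector width does not fit the model family."""
    expected = EMBEDDING_WIDTH[model_family]
    if vector_width != expected:
        raise ValueError(
            f"{model_family} expects {expected}-wide vectors, got {vector_width}"
        )


check_embedding_width("sd-2.x", 1024)  # fits, so nothing happens
```

If this is the cause, moving older 1.x embeddings out of the embeddings folder while training on 2.1 would be one thing to try.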

jolly bear
#

Just the settings I use on StableTuner to train on a 12GB 3060 card. I also set sampling to more than the total number of steps to avoid any samples, I do sample epochs and save the epoch though.

#

If you don't train the text encoder you can set the batch size higher, I've used 4 successfully but I've heard people using higher numbers.

#

Train epochs can be set to as high as you want.

#

As each epoch is the total number of images seen once (each step=one image) it can take quite a while on a 3060 card, I usually set it to 5-20 epochs, but depending on your needs you might want to set it higher. I've got some pretty good results with 10 epochs and more.
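
The epoch/step relationship described above, as a small sketch. "Each step = one image" corresponds to a batch size of 1; with a larger batch each step covers several images. `steps_per_epoch` and `total_steps` are made-up helper names, and the 300-image figure is only an example.

```python
import math


def steps_per_epoch(num_images: int, batch_size: int = 1) -> int:
    """Steps needed to show every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)


def total_steps(num_images: int, epochs: int, batch_size: int = 1) -> int:
    """Total optimizer steps for a run of `epochs` passes over the dataset."""
    return steps_per_epoch(num_images, batch_size) * epochs


print(total_steps(300, 10))                # 3000 steps at batch size 1
print(total_steps(300, 10, batch_size=4))  # 750 steps at batch size 4
```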

restive orchid
robust urchin
#

@torpid oar over here for questions

#

about dreambooth

frank urchin
#

no idea why i cant run this on 8GB

#

are there any settings or something i messed up?

#

keep getting this

vale egret
#

That’s only half the error

prisma nacelle
#

anyone get dreamartist to work for training? i can't seem to get it to work and would like to experiment with it a little to see how it stacks up with other tuning methods.

stray kindle
#

Any tips for textual inversion with a photograph style?

vale egret
# prisma nacelle anyone get dreamartist to work for training? i can't seem to get it to work and ...

There’s a lot of discussion about it on the extension github page, with different people getting different results. I’d also be happy to learn the right way to do it, so lmk if you figure it out

https://github.com/7eu7d7/DreamArtist-sd-webui-extension/issues/18

GitHub

Hey guys, I am just wondering if anyone has successfully replicated the 1 image embedding and recreated similar results from 7eu7d7? Right now I have no luck testing it myself. Training time for th...

gloomy belfry
peak canopy
peak canopy
livid axle
#

Dreambooth and Textual Inversion should both work for this 🙂

peak canopy
#

just trying to learn all the tech behind the fine-tuning and training

livid axle
peak canopy
#

oh yeah, I've heard about it. I thought it was using a dreambooth way. But looks like it's a different one. It's helpful.

peak canopy
#

Is my approach right? any feedback?

livid axle
peak canopy
#

for sure, thank you so much for your answers 👍 🙏

#

It's really helpful and I'll also explore Everydream

full knot
#

for shiro dreambooth, the instance token is the instance_data_dir right ?

#

i mean, does it rely on the directory name ?

rain tapir
#

That's the subject

#

But yeah, I think they call it the token

full knot
#

thanks, so for the shivaro db is the only trigger the directory name, or the whole instance prompt ?

pure blade
full knot
#

yeah, i just want to know what is the instance "trigger" on shivaro db x)

#

since there is no specific field for it

pure blade
#

the prompts decide that

full knot
#

i see thanks

finite creek
#

Hello, anybody knows of a tutorial or document going through all the settings in Dreambooth A1111 webui?

split acorn
#

There is some information missing, but it's a really good start!

#

For best results, generally your dataset should be with different backgrounds, clothing, lighting, expression and different camera distance/angles. In the video they used an unideal dataset, but the rest is pretty good!

finite creek
# split acorn https://www.youtube.com/watch?v=9Nu5tUl2zQw

Thank you Alicat! I have followed it, it's pretty good. Having issues training an object (a car), I did it once and it came out pretty good, now I'm trying it again with a variation and it's not working so well. Not sure what went wrong...

cunning isle
#

just had a little discussion in #🌶|off-topic where someone asked "how many images do you need for finetuning" .. answers varied from 10 to 1000s ("2.1 can't be tuned on 10 with good results" "not dreambooth"...) what's the situation? I'd basically heard "fine tuning is possible with a few dozen", but apparently more can also help increase accuracy for a model narrowed to a more specific domain?

split acorn
#

I don't think there's an answer to that. It all depends on what you're trying to do and what you have available

#

How many images for training can vary from 1 to 1000s yeah

cunning isle
#

(haven't started on finetuning myself, I need a new PSU for a bigger GPU first.. but basically I'm interested in generating game art - textures, background wraps - and I have some hacks in mind to try and project onto scenery from keypoints - anyway a fine-tune on sci-fi film stills could help out I guess.. I wouldn't want it to replicate specific copyrighted things but just be better at making 'generic futuristic buildings' etc)

split acorn
#

a bit outdated, but the information is still relevant

finite creek
honest nexus
prime perch
#

Hi, I'm pretty new to this, but have been playing around with embeddings. I want to copy the style of art from the rules book of the RPG my group is playing. So I trained it on 50 character images like the one on the left for 100,000 steps and it produces... the right. Any idea how I train it to understand what this art style is like, or even just what a human face is? Using the embedding leads to all pictures turning into monstrosities like the one on the right. *Not sure if this is the right place as this is an embedding not a model, but I didn't see an embedding fine tune channel.*

finite ivy
#

I'm in pretty much the same boat as @prime perch, I've trained 8 different embeddings trying to generate images with my likeness, but I get results like this (second). This is trained with 18 images similar to the first image, trained with the colab notebook that @honest nexus posted a few messages ago. Any thoughts? Image gen info: 22 steps Euler a on protogenDragon. I've tried with the default SD1.5 model, but the results are even less coherent. (at least it gets my hair right though, ha)

honest nexus
finite ivy
# honest nexus try another model, sd is pretty bad for textual inversion. I suggest you elldret...

Ok will do! Is there a convenient way to find that model? Google leads me to a reddit post with this model
https://civitai.com/models/1259/elldreths-og-4060-mix

This mixed model is a combination of my all-time favorites. A genuine simple mix of a very popular anime model and the powerful and Zeipher's fantastic f222. What's it good at? Realistic portraits, stylized characters, landscapes, fantasy, sci-fi, anime, horror. It's an all-around easy-to-prompt general purpose semi-realistic to realistic model that cranks out ...

prime perch
steel eagle
#

Anyone used dreambooth to make a LORA? wondering how to use the .pt file it generates in models/LORA in A1111, the additional networks extension won't load those

honest nexus
#

I still think the best way to train textual inversion is from automatic1111, lowering the learning rate every 300 steps

little hollow
#

oi, ACCELERATE implemented into a1111?

#

i saw a few results from training using it, they were freaking top notch

#

even replicating everything that he did gave me about idk a third of his quality?

honest nexus
little hollow
winter apex
lethal totem
#

Can we train SD on 32x32 and 7x1 images for example?

cunning isle
limber peak
#

Is there a good low vram (sub 10gb) version of dreambooth out there currently?

little hollow
#

new au1111 has Gradient Clipping --modes: norm/value
default is 0.1 - any ideas what it is and what it does? seems like a cool new option but figuring it out is gonna take way too long alone

finite ivy
honest nexus
finite ivy
#

Auto1111, I couldn't quickly figure out how to get the model into the colab notebook you sent so I just went local

#

Here are the results at about 1050 steps, still takes some coaxing to get right. Might be some trouble with my dataset with multiple people, but this is pretty good!

#

First is generated, second is reference

lethal totem
#

and this will be awesome

#

pos is: photo_of_benclements_mod6_man_close_up_alone_sharp_focus

#

maybe negative is: lowres, low_resolution, bad_light, bad_shadows

finite ivy
#

Thanks! I appreciate the help!

lethal totem
#

and show me example

#

🙂

#

what will change

finite ivy
#

I'm away from my computer right now 🥲 I will keep the channel updated though. Also fwiw, this is on a 6gb card, so this embedding stuff can be accessible to more people than training a whole dreambooth model!

honest nexus
finite ivy
honest nexus
#

yep, 512x512 and xformers, but never tried --medvram

finite ivy
finite ivy
astral mica
#

Can anyone point me in the right direction? I'm trying to follow Aitrepreneur's instructions to create a textual inversion embedding in runpod using the Auto1111 UI and keep on getting a {} error when trying to preprocess images.
https://www.youtube.com/watch?v=4E459tlwquU&t=659s

astral mica
glass dove
#

hi, so i dreambooth trained using the colab for 2k steps, is it possible to go for more steps without having to start over?

#

results feel a little undertrained

full knot
#

you can start with your previously created diffuser model

#

instead of starting from 1.5 or whatever your base is

tidal cliff
#

anyone know why the TI templates have so many lines in them? like... painting of [name], rendering of [name], etc... what's the point of all this

#

im not sure how these templates are even used I guess

#

are they using them to create... potential images of your [name] while training... but then how does the algorithm determine if the resulting image is "good" or not... like... what's driving the loss function of the optimization routine

#

with each step of the training process... how does SD figure out if it's going in the right direction or not...

#

I had thought that it was using back propagation on your set of training images... using the captions that you write for each one as input and the actual photo as the correct answer

#

but in that case, what's the point of the TI templates

#

are they just there so that you produce a variety of different images for like... qualitative evaluation while the thing is running? but they serve no purpose in the actual training process ?

crisp cloak
finite ivy
# crisp cloak Is it possible to train embeddings using the google colab notebook?

Yes, but for high quality results of a person's likeness you will need to use a model other than SD 1.5, I posted a result a couple messages back with my results from that colab notebook and the results were subpar. I haven't done much looking into it but you need to figure out how to import a model other than SD 1.5. another user recommended me eldreths vivid model, and it's worked really well in my local installation, so I would recommend that.

crisp cloak
crisp cloak
crisp cloak
finite ivy
# crisp cloak https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/m...

Yes, ok, so there is a difference between that and textual inversion embeddings. What you linked is for training a dreambooth model, which, as you stated, creates a whole new 2GB .ckpt model of whoever you provide pictures of. This works really well, but ckpt files are large and you can only use one at a time. TI embeddings seem harder to train, but you can use them on top of nearly any model, so long as the version matches

#

No prob! I only found that a day or so ago thanks to another user. Again, you'll likely have to figure out how to import a different model into that notebook bc the default 1.5 model is not that great at it.

crisp cloak
finite ivy
#

@tidal cliff
The templates are used to give the embedding some 'context' to what you're training. The template file provided probably isn't suitable for plug and play into most trainings. I created a custom template file named 'custom_subject.txt' that only contains "a photo of [name], [filewords]". This is sufficient for training on a likeness of a person, in fact, for my own likeness, I created a template file that only contained "a photo of [name], a close up photo of a young man" to some pretty good results. The [filewords] is a caption that describes the image. This is useful because without it, the embedding will pick up on things in your training images. Say for example you have a lot of pictures of you in a black shirt with trees in the background, then the generations of your trained embedding will favor you in a black shirt and trees in the background, so by letting the network know what is NOT you in the image, the embedding becomes more versatile. I hope that makes sense?

finite ivy
crisp cloak
finite ivy
final patrol
#

Since LORA's been around for a bit, is there a general opinion on its usefulness? I know it's a smaller file, so ignoring that, does it have any particular weaknesses/strengths? I assume it's somewhere between TI and Hypernetwork?

split acorn
#

it's like dreambooth lite

#

can train with less VRAM and the added bonus of smaller file sizes

#

there's also a convenient webui extension that lets you try it out with various models and lets you adjust the strength too

#

With first impressions, I prefer TI and Dreambooth to LoRA, but I'm still pretty new to it

obtuse shard
empty shuttle
#

Anyone want to test (for free) a dreambooth service focused on video game characters? I mostly am looking for feedback on whether you like the results. Link is https://polymorf.me/ . Send me a DM and I'll send you a custom test link. We use a combination of diffusers + dreambooth + some Textual inversion + some img2img depending on character

white current
#

(very small part of a 2k image tank dataset)

ornate flare
#

Can someone explain me something

#

can i actually train stable diffusion on 1024x1024 images by selecting resolution = 1024?

#

or if i do that does it just downsample my input to 512x anyways?

white current
ornate flare
#

kk

#

it works on 768x768 on my 3080

#

1024 crashes just checked now

spare herald
#

can anyone offer advice on why a hypernetwork seems to have no effect?
I got good results during training but it doesn't seem to affect the image gen

agile wadi
#

So I've been successfully training Embeds with my 3070ti 8gb, but when I increase the batch size over 1, I get a CUDA out of memory error - I've seen people successfully train on 8gb with batch sizes of 4-6 before, any ideas why this is happening? I'm launching with --xformers --opt-split-attention --medvram

finite ivy
spare herald
#

thanks for your help, I did have to figure out that stuff but I think I did get it working

#

I think in easy sd the setting takes time to turn on and off, in automatic1111 it does seem to work when you enable and reset the ui

#

doing smash bros it's not easy

#

I'm having some success, need to do a lot more description and having more reference images will make it better but I'm already 500 img in smh

#

went from the machine having no idea who mr game and watch is to having a pretty good idea, 8 hour training tho rip

#

most people just trying to put in one face I wish it was that easy for me

finite ivy
#

Rip yeah. Local training is slow for me as well, I got embed training working on my 6gig card, but takes time. Can hyper networks do multiple subjects like that? I didn't know that was possible, if so, cool!

spare herald
#

yes

#

like I said most tutorials or examples is 1 face

#

so I wasn't sure if it would be ok with 89 subjects + their sidekicks and cohords

#

cohorts

#

and with some settings it does reach what I call the singularity where all the characters are mashed up but

#

my last training went like this

#

oh I can't post pic

#

imo

#

smarter than some of the irl people I've tried to explain smash bros to

#

it def has a hard time looking at 2 subjects in 1 pic that it doesn't know

#

so I'm having to break down like banjo and kazooie etc

#

the koopalings are a nightmare for it but I am helping

finite ivy
#

Interesting! I'd like to see how you make out with this!

spare herald
#

the training in most of the models for mario series subjects is just wack af and is hindering the process but we'll get there

#

if nintendo doesn't find out

#

don't tell em

finite ivy
#

Safe with me 🤐

spare herald
#

I had to do it cause it's like 98% anime girls and I'm like yo show me yoshi

#

and it's like some japanese guy

#

I cannot stand for this

#

I'm concerned about weights

#

like, for example the pikmin people there's 2 dudes and 5 pikmin and so I have multiple shots of the pikmin cause there's 5 colors of each and 3 stages of growth

#

it's rough being me

#

so like, is olimar and the pikmin gonna weigh more than say cloud who the machine already knows or mr game and watch who it's clueless about

#

we'll find out I guess

finite ivy
#

So long as they're labelled correctly in your dataset though that shouldn't be too huge right? Also with such a large and varied dataset I would set the LR really low

spare herald
#

uh

#

I'm new

#

what's lr

#

oh I got another question too

#

so

#

can I use this .pt in conjunction with deepdanbooru so I can stop labeling the shit at some point?

#

like can I feed my smash dataset another dataset of smash images and have it point out who's in the img once it knows

finite ivy
#

learning rate, by default it's set to .005, in my recent experience with embeddings, .005 becomes too fast around 800 steps or so, so I went down to .001 and then .0005

spare herald
#

I'm running .0000001

finite ivy
#

But it might be something you play with. If you end up with generations that look beyond screwed up then your LR is too high

spare herald
#

for 100000 step

#

yeah I've reached singularity before

finite ivy
#

I'm not overly familiar with LR for hyper networks, so I don't know if that's low or high but that's definitely one of the knobs to turn as you start to refine your process

spare herald
#

tbh around 30 or 40k it has it mostly figured

#

but like subjects that are really similar it doesn't have it figured by 100k

#

duck hunt is hard cause of the name

#

there's 2 roys

#

it's rough stuff

#

I think it's a worthy endeavour I hope at least, but I'm worried when I release it nintendo gonna shut the whole sd project down haha

tidal cliff
finite ivy
# tidal cliff thanks for your reply re: the template file. How many vector per token have you...

8 to 10 seems to be good for getting someone's likeness. As far as learning rate, you know it's gone too fast when you generate something with it and it looks otherworldly and not even close to sensible. When an embedding is undertrained, the output images will look reasonable and you can tell it's starting to get the idea of what your subject is, but it's not quite there yet. A learning rate that is too high is worse than one that is too low. A good schedule that I found is this: .005:700,.001:1000,.0005:2000,.0001

#

This is for batch size 1 and gradient accumulation 1. That schedule would change if those values were to increase, but I can't increase them because of VRAM
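
For anyone unsure how to read a stepped schedule like .005:700,.001:1000,.0005:2000,.0001, here's a rough sketch of how I understand it: each `rate:step` pair means "use this rate until that step", and a trailing rate with no step runs to the end. The helper names are mine, and whether the boundary step itself is inclusive is an assumption, so treat this as an illustration rather than what the webui does internally.

```python
def parse_schedule(text):
    """Parse 'rate:until_step, ..., rate' into (rate, until_step) pairs.

    until_step is None for a trailing rate that applies to all remaining steps.
    """
    schedule = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            rate, until = part.split(":")
            schedule.append((float(rate), int(until)))
        else:
            schedule.append((float(part), None))
    return schedule

def lr_at(schedule, step):
    """Return the learning rate in effect at a given step (boundary assumed inclusive)."""
    for rate, until in schedule:
        if until is None or step <= until:
            return rate
    # Past the last listed step with no trailing rate: keep the final rate.
    return schedule[-1][0]

SCHEDULE = parse_schedule(".005:700,.001:1000,.0005:2000,.0001")
for step in (1, 700, 701, 1500, 5000):
    print(step, lr_at(SCHEDULE, step))
```

So with that schedule the rate drops three times: after step 700, after step 1000, and after step 2000.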

finite ivy
#

I haven't verified this, but I hypothesize that if you're training a style instead of a person, you can set the learning rate a bit higher and train for fewer steps

crisp cloak
finite ivy
crisp cloak
plucky current
#

Hey, can anyone give me some suggestions on finetuning since I'm a newbie? I have about 350 instance images and 550 concept images, all images are high quality and 1024x1024, but it seems like I'm getting A LOT of failed ckpts, any advice? (I'm trying to fine tune on dreambooth colab)

peak canopy
#

Folks, what's the difference between diffusers and ckpt model types? I'm getting good results when dreambooth training using joepenna repo with ckpt file compared to the diffuser repo.

winter apex
plucky current
#

alright, thanks a lot for the advice, I'm currently trying again with 100 images, all captioned and for about 1000 steps (10x the number of instance images), what about concept images, any advice? does the rule of less is more also apply? thank you

final patrol
#

Has anyone here trained LORA? According to this guide, you can train many concepts at once, but you're not supposed to use the character name or series name, so that it would "become implicit to other tags". This is really confusing to me, because I want to add a cartoon series that has various characters in it. How would it know the difference between the style and a character? How am I to differentiate between characters?

For instance, for X-Men, on a given image, I want to be able to make Gambit hang out with Jubilee, or Storm with Wolverine, or Gambit with Storm. Maybe I just want one of the characters to hold a cat. If I can't use any character names, how would it implicitly know which setup I want when generating images?

https://rentry.org/lora_train#captions

split acorn
#

Because you're training on an instance token

#

The instance token is the replacement for the "character name" or "series" basically

final patrol
#

Got it. You are still giving each concept a name, but not the original one

split acorn
#

yeah, using a rare token to help it with the learning process

#

You can do it another way though

#

I'll find the video

final patrol
#

ohh thank you.

split acorn
#

I'll find the time stamp

#

but basically they trained on something the original model already knew how to make

#

Which is another valid way to do it

#

though, I'm not sure if it's better or worse

final patrol
#

Ah I watched several videos, but not this one. Thank you. I'll investigate.

split acorn
#

It's not necessarily better overall

#

but it can work CB_nod

final patrol
#

regarding what you were saying before, each character has to be a pre-existing token, right? I can't just do "xmenGambit" "xmenStorm", etc..?

#

I've seen models use phrases like that before.

#

But maybe not LORA tunings...

split acorn
#

mmm

#

Generate images with those tokens and if they look good, then it should work (to some extent)

final patrol
#

I'm sure they don't exist

#

it would likely be random

split acorn
#

mmm let me reword it

#

if it doesn't exist, then you're better off using a rare token

#

So if "Xmen Gambit" doesn't consistently produce good quality Gambit pictures, then you're better off with like "olis" or another rare token

final patrol
#

"Gambit" gives a gentle unsure whiff of the original character. So that would be a good candidate to train?

split acorn
#

I honestly don't know, but my gut feeling is no.

final patrol
#

okay

#

I'll experiment

split acorn
#

Rare tokens, so it's completely random.
Or well known tokens, so you have high quality token you can train off of

#

I'm not aware of anyone doing anything different alicatHm2 though, it'd be cool to see

final patrol
#

ah, all or nothing. Hmm

#

Thank you very much. I mainly wanted to know if LORA worked best with a fundamentally different approach (and it sounds like it doesn't). I'll experiment and figure something out.

median sun
#

anybody else getting an error when loading a 2.x model with automatic1111? I've tried to get it working 4 times now and I kinda dunno what to do anymore.

#

the yaml is in there renamed and auto1111 is updated

#

it just gets killed without an error and says error in the ui

full knot
#

does someone have an example of a captioned image for scenes please ?

stone bloom
#

Any clue what might be causing problems with lora dreambooth, whilst training the 512 version of SD 2.1?

768 seems to be developing normally, but 512 immediately turns into an abstract mess.

split acorn
#

I've had poor luck with it too, not sure why alicatHm2

stone bloom
#

It's so weird, I swear

split acorn
#

TI still works great, just not dreambooth

#

I'm not sure why

stone bloom
#

Yeah ti seemed to work fine for me as well

#

Dreambooth becomes literal black magic on the 512 base. Tried almost everything at this point. Non-EMA, 8bit, text encoder, non standard resolutions, preservation, learning rates. Nada.

Simply won't budge.

#

Meanwhile even 512 training on the 768 looks better 🤡

split acorn
#

I honestly wouldn't be surprised alicatKEK

stone bloom
#

At least I can vaguely recognize the shapes

#

Think I only haven't tried turning off xformers and fp16 in parameters.
Then again, the 512 base would output blank, brown images without using xformers, so what gives.

full knot
#

for the caption image names and text encoder, does every token count on its own or as a whole ?

#

like "An ARAV74 plane", will the whole sentence be trained word by word and pretty much destroy the "plane" token ?

#

or it will create a new entry in the model for the entire sentence only ?

#

or should I name the concept image file "ARAV74" only and specify after at prompting "An ARAV74 plane"

#

hmm i may be confused between concept and caption as well

#

i still don't even know where the text encoder is reading from : the filename or the instance prompt ? both ?

misty glacier
#

Anyone knows how to train stable diffusion inpainting?

#

I can only find [img2img inference] [img2img fine tune] [inpainting inference] sample code.

#

But I want to do [inpainting fine tune]

tacit wedge
#

Bit of a noob question here. If a model is trained using a specific sampler, does that mean the same sampler will deliver the best results when creating images. Or doesn't it necessarily work out like that?

torn turtle
#

which one is the best sampler that can produce optimal image with CFG Scale more than 10?

split acorn
#

Anime trained on DDIM can still look better using something like Euler A

#

Although my Euler A models turned out well so ChillBar_shrug

restive stream
#

Did anyone finetune inpainting model to generate backgrounds for transparent images?

plucky current
#

there are so many options to train right now, I wonder, what would be the best way to train something on a consistent style?

stone bloom
# plucky current there are so many options to train right now, I wonder, what would be the best w...

I'd say textual inversion is usually enough, since style is more of an adjustment in tone, rather than a totally foreign concept.

Dreambooth is kinda overkill for the most part, unless we're dealing with styles that focus on subjects completely unknown to the base model.

With ti for style, your best shot would prolly be with a [name], [filewords] template, having manually described content of each training image in their filename. Just avoid using adjectives, which are inherent to a given style.

plucky current
stone bloom
#

here's an example from the people behind it

#

so when you make a template, go for something like [filewords] in style of [name]

#

[name] will automatically get replaced by name of the embedding/textual inversion, and [filewords] will be replaced with whatever you wrote in filename of each image
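
To make that concrete, here's a tiny sketch of the substitution as I understand it. It's only an illustration, not the webui's actual code: the function names, the "my-style" embedding name, the file path, and turning underscores into spaces are all my own assumptions.

```python
import os

def caption_from_filename(path):
    """Use the file name (minus extension) as the caption for [filewords]."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Assumption: underscores in the file name stand in for spaces.
    return stem.replace("_", " ")

def fill_template(template, name, filewords):
    """Swap the [name] and [filewords] placeholders for their real values."""
    return template.replace("[name]", name).replace("[filewords]", filewords)

template = "[filewords] in style of [name]"
caption = caption_from_filename("imgs/a_castle_on_a_hill.png")
print(fill_template(template, "my-style", caption))
# prints: a castle on a hill in style of my-style
```

Each training image ends up with its own prompt, and the only part they all share is the embedding name.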

#

as for colab, can't help, since I'm only familiar with local 😅

#

@split acorn Did you use Lora, when you tried to dreambooth SD 2.X in 512?

split acorn
#

No, just dreambooth

#

Oh

#

I think maybe

#

Well I tried WD 1.4 as one of them

#

and that's based largely on 512 2.1

#

so yes! at least a model based on it

plucky current
stone bloom
restive bridge
#

in the case of fastDB, does text encoder training consist of blip auto-captioning? or does it just train the token name?

full knot
#

i hope to do some serious tests on caption / text encoder soon

stone bloom
#

You can use multiple embeds in one generation. Only requirement being, that they were trained on the same model.

#

one embed is basically one concept, be it person, shape, style, pose, composition, action, whatever

tender silo
#

How is Embedding and Textual Inversion different from using Dreambooth? I'm still new to this so I'm kinda clueless
I'm also seeing LoRA being thrown around

#

Let's say I wish to create a consistent output that shows a set of poses, a bunch of clothing from different eras in the style of Bloodborne
Basically I want to generate Lady Maria doing an A pose, T pose, and Flossing in various historical clothing, how would I best achieve this? Then would it be possible to change it to Queen Elizabeth II

#

Sorry for the cursed example, Im not trying to mock the British royal family, just trying to learn how they’re different from each other

little hollow
stone bloom
#

Same here. Just phrased it that way, since the example is easier to understand I hope.

little hollow
#

Sometimes adding
X, bad art, horrible art, bad painting, horrible painting, bad drawing, horrible drawing

#

Might eliminate the need to use negs (works like 20-40% of the time? Idk)

stone bloom
#

Oh? That's interesting. Never considered training in negatives per se

little hollow
#

What you caption is to be eliminated from the embedding

#

What is left is what makes the embedding

#

That's why it works at all

stone bloom
#

Makes sense, I guess that's why some of the default templates would use phrasing like "the weird X" or "dirty X"

little hollow
#

Yep

stone bloom
#

Just sounds counter intuitive to me, purposefully training it on qualities that would supposedly make a worse generation.

little hollow
#

No, what happens is that it lets you render all of the words of the org picture

What cannot be generated using those prompts is given to the embedding to regenerate it

#

Hopefully this clears it up a bit

#

And btw, id put the embedding somewhere close to the beginning of the caption, as it put more attention to the ones at the start

#

Too late and it might regen the entire original picture using only the prompts(caption) you given it

#

Lets say a dog in a forest, and you want only the dog

You caption
A forest, green, branches, leaves, sky etc...

But leave the dog out, uncaptioned

It will regenerate everything beside the dog

And here comes the embedding trying to regen the dog with random words

#

Once it gets better at making the doggo come back, the loss drops

#

At 0 loss, you get a replica of the dog

#

But, if there was a stop sign you didn't caption - it might try to regen it as well

#

This might interfere and slow down the training

#

And give you stop signs

stone bloom
#

Usually shuffle my tags, since I tend to train more abstract or generalised concepts. Something akin to generalised style guidelines, rather than a certain character/subject.

#

Not sure if it was for the better, but seemed to make sense to me, since it resembled training a checkpoint designed for a certain type of art.

#

Interesting. Someone basically had the opposite problem.
Just can't wrap my head around what might be causing this behaviour..

plucky current
#

is there any colab that I can use to train an embedding?

honest nexus
#

It needs an update definitely

plucky current
honest nexus
whole gorge
#

I am having difficulties with embeddings getting to the result I want

#

they always seem low res or out of scale

#

It's taking about 12gigs of GPU memory to train an embedding

#

it's either blurry or checkerboard sometimes like this

tough flame
#

When I fine tune a model in dream booth, should I be using the model name in the prompt?

stone bloom
#

Also looks like someone's been trying to embed space marines

high venture
#

Anybody had luck with StableTuner on 12G card?

stone bloom
# whole gorge Its taking about 12gigs of GPU memory to train an embedding

Shouldn't be that big.. My guess would be either:
-training with too big a batch size
-training in wrong resolution
-training with images that weren't resized and cropped to your model's resolution
Other than that, if you are using autos, look into settings under the training tab.
You should probably check "cross attention optimizations while training" and "Move VAE and CLIP to RAM when training if possible"

hard peak
#

Is it possible to use checkpoint merger to allow one to use danbooru tags with another model?

My line of thinking is that the LAION tags are pretty awful especially when compared to the danbooru tags of waifudiffusion. It would be interesting if you are able to merge WD with another checkpoint to allow using danbooru tags.

stone bloom
#

Technically speaking yes, it's just that they'll have lesser or greater impact, depending on weights of the mix.

#

Merging is quick and easy, so just give it a go, see for yourself. Try out different proportions, and note how the different tag systems interact.
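
If it helps to picture what the proportions do, here's a toy sketch of a weighted-sum merge. It uses plain lists of floats in place of real tensors, and the function name and the "skip mismatched keys" behaviour are my own assumptions; as far as I understand, the basic weighted-sum mode blends each weight as (1 - alpha) * A + alpha * B.

```python
def weighted_sum_merge(model_a, model_b, alpha):
    """Blend two toy 'state dicts' key by key: (1 - alpha) * A + alpha * B."""
    merged = {}
    for key, a in model_a.items():
        b = model_b.get(key)
        # Assumption: keys missing from B (or with a different size) are copied from A.
        if b is None or len(a) != len(b):
            merged[key] = list(a)
            continue
        merged[key] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged

# Toy example: 25% of model B mixed into model A.
model_a = {"w": [0.0, 2.0]}
model_b = {"w": [4.0, 2.0]}
print(weighted_sum_merge(model_a, model_b, 0.25))
```

The further alpha moves toward 1, the more the second model's tag vocabulary should dominate, which is why it's worth trying a few proportions.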

restive bridge
#

Has anyone found the ideal class images for training faces? with portraits being the intended outputs

hard peak
#

Yeah, the merging didn't seem to work terrifically. Tough luck for me

split acorn
#

That should work alicatHm2 just a matter of finding the right merge combination CB_nod

coarse hemlock
fading forge
bold dragon
#

Hello everyone, fairly new in training with stable diffusion. Hope this is not a noob question.

I trained a style with 30 images with hypernetwork. I tried different layer structures, using the preprocessed captions without further editing, and different learning rates down to 5e-8, 20000 steps.

The problem i have got is that, i was able to img2img and got a similar trained style, but the results were very messy. The lines were not straight, shading was not consistent, especially the eyes, it was totally messed up.

I wanted to find out which part had gone wrong.
Was my data sample not large enough? Or was the learning rate or number of steps not set to the right scale? Or is the model already well trained and I need to fine-tune the settings in img2img instead?

Not looking for an exact answer, discussion is also welcome, really need some new ideas on what to do next. Thank you guys.

prisma nacelle
# bold dragon Hello everyone, fairly new in training with stable diffusion. Hope this is not a...

having messed with hypernetwork training myself, and still struggling to get the ideal results i'm constantly trying to update the training data with anything that might be "slightly" off.

I've also used dreambooth models and tried TI embeds. Overall dreambooth learns fast and works well, but it also pretty much makes the model it is trained on unusable for anything else.

the hypernetwork issue you experienced is the same as mine, sometimes it just doesn't work at all and other times it works on img2img. Recent hypernetworks I've trained have been better, but there are still problems such as uneven outputs, inconsistent lines etc.

recently I've been trying to see if TI embeds can do any better, but they seem to take longer to train and it's hard to tell if they are any better than a hypernetwork.

what exactly are you training the hypernetwork to generate?

bold dragon
# prisma nacelle having messed with hypernetwork training myself, and still struggling to get the...

Hey shihiko, thanks for taking the time to reply.
For the TI embeds method you mentioned, is that "embedding"? Sorry if this is a stupid question, I'm not very familiar with the terms yet.

I was training on a set of chibi characters in a painting style similar to Fire Emblem Heroes' chibi characters.
I have trained both an embedding and a hypernetwork with the same set of images, and this is what I have experienced:

  1. The embedding could generate a much closer style in txt2img, but there is a hint of my original images' poses in all generated images, and when I ask for poses that never existed in my dataset, the results are broken. Not sure if that is because my prompts are not accurate enough or because of some other issue, I couldn't figure it out yet.

  2. After several tries with the embedding, the repeated poses made me give up on using prompts alone to get different compositions, so I thought of using img2img, hoping that from a simple draft I could get a stylized result. So I switched to training a hypernetwork (after reading articles saying HN is better at training style). It failed several times, but in the end I tried a setting of layer: 1,3, 0.75, 0.75, 0.75, 3, 1, LR: 5e-8:20000. I took one of the "alright" pt files and was able to generate something that I think looks around 50% like my expected style.

  3. So in order to improve the result, I tried adding the trained embedding from (1.) to the prompt when doing img2img together with the HN model from (2.). But no matter how I changed the CFG scale, steps, denoising, or sampling method, I could never get back to the 50% from (2.)
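
For reference, the `5e-8:20000` value in (2.) looks like the webui's `rate:until_step` learning rate schedule syntax, where several comma-separated entries can be chained. A rough sketch of how such a string can be read, assuming that syntax; `parse_lr_schedule` and `lr_at_step` are hypothetical helpers, not the webui's own code:

```python
def parse_lr_schedule(text):
    """Parse 'rate:until_step, rate:until_step, rate' into (rate, until_step) pairs.

    A trailing entry without ':step' applies until the end (until_step is None).
    """
    schedule = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            rate, step = part.split(":", 1)
            schedule.append((float(rate), int(step)))
        else:
            schedule.append((float(part), None))
    return schedule


def lr_at_step(schedule, step):
    """Return the learning rate in effect at a given training step."""
    for rate, until_step in schedule:
        if until_step is None or step <= until_step:
            return rate
    # Past the last listed step: nothing in the schedule covers it.
    return None


schedule = parse_lr_schedule("5e-8:20000")
print(schedule)
print(lr_at_step(schedule, 100))
```

Read this way, the setting above is one flat rate for all 20000 steps; a stepped schedule such as `0.005:100, 1e-3:1000, 1e-5` would start high and drop later.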

Thoughts in mind now:

  • Should I continue training with embedding but at a much lower learning rate? However, I know my GPU doesn't have enough VRAM to train.
  • If stick with HN, what other settings should I try?
  • Couldn't try with dreambooth, simply couldn't run the thing with my current GPU
#

omg, my message is so long, really sorry about that

prisma nacelle
# bold dragon Hey shihiko, thanks for the time in replying For the TI embeds method u mentione...

yeah so TI is referring to Textual Inversion embedding.

I also experience a lot of "pose biases"

I think the issue here is just not having enough training data to give the AI more examples to learn from. When it keeps learning from the small data set, it improves in detail but also becomes more biased towards what the data set contains.

If you are able to get the style you want to come from the img2img it might be a good idea to get the AI some more data from generated images that are closer to what you want. Which is what I am doing, it takes a long time and trial and error, but I feel it is the best way to go when trying to be specific with what you want.

bold dragon
split estuary
prisma nacelle
bold dragon
stone bloom
ornate flare
#

What does finetuning with 8-bit adam look like?

#

Is it noticeably worse?

#

also, weird question but if i wanted to finetune on about 10000 images

#

what learning rate would be most appropriate?

hard peak
#

Is 3-4 images truly enough to train a subject in dreambooth?

hard peak
#

Additionally, for dreambooth, should one provide mostly closeup portraits, full body, or a mixture of the two?

whole gorge
#

So I got a decent embedding but it's struggling on the faces/helmets

#

can I add additional reference images to my dataset that focuses just on those details and then it will improve the embedding or will it get confused if you have images that are only a "part" of the whole?

#

I made a space marines embedding and it works pretty well for their armor but the helmets are wrong

#

and it doesn't understand if I say I want one without a helmet

prisma nacelle
whole gorge
#

Don't trust your preview settings for how your embedding performs

#

always switch to a custom model and try generating using some of the similar keywords from your captions

#

and if hypernetwork is like embeddings copy them all over so you can try myembedding-1000, myembedding-1500 etc
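
A small sketch of that snapshot comparison, assuming the snapshots are saved as `name-STEP.pt` (the filenames, the prompt text and both helper names are made up for illustration):

```python
import re


def snapshot_step(filename):
    """Extract the step number from names like 'myembedding-1500.pt'."""
    match = re.search(r"-(\d+)\.pt$", filename)
    return int(match.group(1)) if match else None


def sort_snapshots(filenames):
    """Return snapshot files ordered by training step, skipping other files."""
    stepped = [(snapshot_step(f), f) for f in filenames]
    return [f for step, f in sorted(s for s in stepped if s[0] is not None)]


files = ["myembedding-1500.pt", "myembedding-500.pt", "notes.txt", "myembedding-1000.pt"]

# One test prompt per snapshot, using the file name (minus .pt) as the keyword.
for filename in sort_snapshots(files):
    keyword = filename.rsplit(".", 1)[0]
    print(f"a portrait of {keyword}, detailed armor")
```

Running the same prompt and seed against each snapshot makes it easier to see where the embedding starts to overfit.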

#

it's also better to train using a generic model like the SD 1.5 one or WD

#

how are the results of a hypernetwork different from textual inversion?

dapper prism
#

How much VRAM does EveryDream need to train a 768px SD 2.x model?

winter apex
#

has anyone tried training a person with LoRA and got decent results?

#

i tried on myself and it was a failure

hexed bloom
#

Anyone have a recommendation for a number of epochs using 30k+ images?

restive bridge
#

what's the visual difference between over/underfitted text encoder vs unet?
every time a model fails i have to experiment in both directions with both unet and text because no one seems to know the visual difference between under and overfitting of the text encoder vs. under or overfitting of the unet.

winter apex
#

will try it, thank you

#

file size is the least of my worries, i already have like 50gb worth of sd models

foggy fog
#

Hi all, I'm using dreambooth to train models. I need to keep a lot of fine-tuned models, but storage is an issue considering the ckpt files are 4 gigs each.

I saw a video that converted a tar file to a ckpt really fast. I downloaded a similar tar file and saw it's also 4 gigs, so that doesn't solve the problem.
https://www.youtube.com/watch?v=-6CA18MS0pY

Is there any way to just hold the weights in some smaller file format and then convert them to ckpt easily? The goal again is to massively reduce the storage size of the customization.
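
As a back-of-the-envelope check: file size is roughly parameter count times bytes per weight, so storing the weights in half precision instead of full precision should roughly halve each file. A rough sketch, where the parameter count is an assumption inferred from the 4 gig figure above (not a measured value) and `estimate_size_gb` is a hypothetical helper:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def estimate_size_gb(param_count, dtype):
    """Approximate checkpoint size in GB for a given weight precision."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: ~1 billion weights, which is what a 4 GB fp32 file implies.
params = 1_000_000_000
num_models = 20  # hypothetical number of fine-tuned models to keep

for dtype in ("fp32", "fp16"):
    per_model = estimate_size_gb(params, dtype)
    print(dtype, per_model, "GB each,", per_model * num_models, "GB total")
```

That is still gigabytes per model, so it eases the storage problem rather than making it go away.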

In this video, I am explaining how to run Stable Diffusion models that not provided in .ckpt file format. Moreover, I am step by step explaining how to convert these .bin training weight / model files into a .ckpt file to use in Automatic1111 Web UI and other interfaces. Furthermore, I am explaining how to use generated ckpt file to teach your f...

dim rampart
#

what's the best Colab for making those insanely amazing videos that I keep seeing on Instagram? I know a few but would like to get input from others as this field is ever changing.

wide meadow
#

I have used camenduru's Colab notebook and it works great, but it can't work with models larger than 7gb. I also use the nocrypt one, which can install a 7gb model but isn't very stable: with 2 or more images it can't output them, whereas camenduru's can output more than 2 images without any problem. So is there any easy-to-use notebook like camenduru's that can also load a 7gb ckpt model file?

wide meadow
#

Where can I find instructions?

hexed bloom
#

If I have images that aren't 1:1, does keeping center crop **unchecked** automatically force the images to be squished to 1:1?

#

I'm looking to squish images that are 512x512+ or 512+x512 when training

split acorn
#

It shouldn't if you're using the dreambooth webui extension version. They added aspect ratio bucketing about 2 weeks ago.
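
Roughly, bucketing means each image is resized into the training resolution whose aspect ratio is closest to its own, instead of being squished to a square. A minimal sketch of the general idea, not the extension's actual implementation; the bucket limits and both function names are assumptions:

```python
def make_buckets(max_area=512 * 512, step=64, min_side=256, max_side=1024):
    """Build (width, height) buckets whose area stays within max_area."""
    buckets = set()
    width = min_side
    while width <= max_side:
        # Largest height (multiple of `step`) that keeps the area in budget.
        height = min((max_area // width) // step * step, max_side)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))
        width += step
    return sorted(buckets)


def pick_bucket(width, height, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


buckets = make_buckets()
for size in [(512, 512), (1920, 1080), (600, 900)]:
    print(size, "->", pick_bucket(size[0], size[1], buckets))
```

Images in the same bucket can then be batched together at that resolution.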

novel pond
#

I've been working on a new embedding that I'm training on AnythingV3. But then:

#

RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 4.00 GiB total capacity; 2.13 GiB already allocated; 180.00 MiB free; 2.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
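
For what it's worth, the `max_split_size_mb` hint in that message only targets fragmentation, and the numbers in the error suggest fragmentation is small here. A quick sketch of the arithmetic plus how the setting would be applied; the value `128` is an arbitrary example, not a recommendation:

```python
import os

# Figures copied from the error message above, in GiB.
requested = 1.25
allocated = 2.13
reserved = 2.24

# Memory PyTorch has reserved but is not using: the part fragmentation tuning can recover.
fragmented = round(reserved - allocated, 2)
print("reserved but unused:", fragmented, "GiB")
print("enough to cover the request:", fragmented >= requested)

# The setting has to be in the environment before PyTorch initialises CUDA,
# so set it before importing torch (or in the shell that launches the webui).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

With only about 0.11 GiB reserved but unused against a 1.25 GiB request, this setting alone is unlikely to be enough on a 4 GiB card.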

whole gorge
#

anyone good with photoshop scripts?

novel pond
#

I'm using NVIDIA Geforce GTX 1650

#

and xformers...

whole gorge
#

trying to load an image, remove the background, add a solid black color layer, move the layer down, flatten the image, save as png

#

yes you ran out of VRAM

#

training in particular is very VRAM intensive

#

try checking this box if you haven't

novel pond
whole gorge
#

get a better graphics card

#

4gigs is very much low end for this type of thing

novel pond
#

like... RTX 3090?

whole gorge
#

anything Nvidia with more ram

novel pond
whole gorge
#

Depends on your budget

novel pond
#

like more than one RTX 3090?

whole gorge
#

I don't know card stats I just went from a 1080 to a 4080

#

and there are still things I can crash on with 16gigs of ram

novel pond
#

Damn. I was about to get ready to make some custom models...

whole gorge
#

oh you want to make models lol

#

even embeddings take 12 gigs of VRAM

novel pond
#

and embeddings

whole gorge
#

also preparing the images before training is very important I am finding

#

on my first attempt I just dumped images into a folder and it didn't go well

#

I tried to make a space marines one and it still won't get the helmets right and you won't be able to get specific poses or chapter colors etc

#

and I've trained it through different settings and sets of images a few times

novel pond
#

I was trying to mix Anythingv3 with some of that f222 along with some sd.1.4.

whole gorge
#

that's just merging models

novel pond
whole gorge
#

you can do that on the merge tab in automatic1111

novel pond
#

checkpoint merger you mean?

whole gorge
#

yes

novel pond
#

I already did that.

whole gorge
#

that's exactly what that does, it mixes checkpoints together

novel pond
#

But it doesn't have that realistic feel that I'm currently making...

whole gorge
#

well you are using an anime checkpoint?

#

try this one

#

I've had a lot of success with it

novel pond
#

AnythingV3 yeah that one.

#

I'll go check that out then...

#

I also honestly want to make my own textual inversion.

whole gorge
#

I am doing that right now

#

But it takes 12gigs of VRAM

novel pond
#

like in an anime + realistic way.

#

What do you have?

whole gorge
#

a 4080 which has 16gigs of ram

novel pond
#

the 12gigs of VRAM i mean

whole gorge
#

but I just upgraded from a 1080 which I think had 8

novel pond
#

let me guess, a GeForce RTX 4080 Graphics Card?

whole gorge
#

interesting the 3090 is more than my 4080 but has 24gigs of ram

novel pond
#

What about the
NVIDIA GeForce RTX 4090?

#

also $1,309.99??

#

Rtx 4090 £1,599.00?

#

Jesus...

whole gorge
#

I had not bought a new card since 2016 but I already wish I had more VRAM

novel pond
#

Looks like the road for me to make textual inversions has already ended in a matter of a few minutes...

whole gorge
#

trying to make one now

winter apex
novel pond
whole gorge
#

you have to like rent server time or something for that?

#

@novel pond I'm using my first Colab and I'm literally just clicking play buttons

#

the code is already written

winter apex
#

yes it's literally clicking buttons and adding what's missing

whole gorge
#

oh if it's a Colab that runs automatic1111 then it's practically the same

novel pond
#

Fine, is there a link for one with automatic1111?

#

for Google Colab I mean, cause I was usually on the webui.

full knot
#

does anyone know how the text encoder is set on the shivan Colab? I mean, which values are used?

whole gorge
#

time to test the embedding

#

fingers crossed

novel pond
whole gorge
#

I mean they do mostly look like Miatas

#

but in my experience you have to go back through the embedding, check the models saved every X iterations, and try using some different checkpoints

#

you want to train on the base checkpoint like 1.5 but switch when checking it out

novel pond
#

By the way does anyone know the DPM++ 2M Karras sampler? Cause I've seen others' prompts using that one. But I haven't seen anything like it on my webui.

whole gorge
#

there are loads, it's one of the defaults in automatic1111

novel pond
novel pond
whole gorge
#

fwiw I wouldn't worry about missing a sampler

novel pond
#

hmm.

whole gorge
#

see how largely similar they are for a given seed

#

even changing steps as well doesn't usually make a big difference

#

you are more likely to find what you want by tweaking your prompt, just generating lots to choose from and then going into img2img

#

the biggest advantage for AI art is being able to take hundreds of shots at it

#

all of the Karras ones were similar, and the first seed gave the same result with all samplers

novel pond
#

I doubt mine could reach 150 steps with the low memory I've got. But there are only a few changes around the prompts.

whole gorge
#

I don't think more steps use more memory

#

just takes more time