#🔧|finetune

wild totem
#

🔧

uneven flame
#

has anyone gotten fine tuning working on a 12gb card?

#

i can get it running, but i have to reduce the size to 256 😭

covert dragon
#

Is there any guidance on how to optimize the rate that stable diffusion generates images? I'd like to know what configuration options make the most difference.

#

After that I want to know what hardware setup is best for running it. I haven't found any articles on that, figured this might be a good place to ask that.

#

I'm pretty much a complete noob with it other than setting up the server on my computer and using it via a plugin in Krita

lilac helm
#

Innnnnnnnteresting, haven't seen or heard of that notebook before

random ocean
#

I'm trying to train a model to emulate my own illustration style. Textual inversion isn't really cutting it and dreambooth seems to be geared towards people and objects. I came across this pokemon model and I'm wondering if this method of fine-tuning is better to emulate styles. Thoughts? I'm a dummy when it comes to coding btw.

https://github.com/LambdaLabsML/examples/tree/main/stable-diffusion-finetuning

lilac helm
random ocean
#

how much is that, ballpark?

lilac helm
#

You're probably gonna have to do the pricing research yourself since there are a ton of different hosted GPU services out there. But if you read the link you shared you might have an idea, since it does mention how much that 6 hours did cost the author.

sinful venture
lilac helm
#

Whether $10 is a lot honestly differs from person to person and we shouldn't assume, but I'd say if someone's not familiar with coding or willing to get into the guts to troubleshoot, that $10 is almost certainly going to be burned away on six hours of frustrating waiting and problems to fix

tame aurora
#

Hey, is anyone able to continue training a model with the original CompVis main.py script with the --resume parameter? For me it seems to go well at first - the proper model is loaded from the checkpoint and the configs are merged as expected but once it gets to trainer.fit() ... it just returns, doesn't train for a single step and the script ends

hollow salmon
keen pasture
random ocean
hot breach
#

there's a personalized_style.py that should be there, and some slightly different CLI params. I think everyone is just doing objects instead of styles, so it's not as well tested

bright obsidian
#

is there a website where i can just submit some images and get it finetuned w/ dreambooth?

hollow salmon
#

dreambooth vs finetuning?

keen pasture
somber estuary
#

And it's almost as easy as drag, drop, and fire if you rent a GPU from runpod or vast.ai to do it semi-locally.

somber estuary
somber estuary
#

A cloud service that offers "true" training, like how we got the anime, furry, and pony finetunes, would be very interesting, and could charge in the hundreds to thousands of dollars depending on dataset size...

bright obsidian
jaunty spade
#

anyone around to provide guidance on how to train a batch of pictures using the Google Colab notebook? I'm getting an error message at the step that says "## Convert weights to ckpt to use in web UIs like AUTOMATIC1111."

bright obsidian
jaunty spade
jaunty spade
#

on the last step.. ahhh so close

bright obsidian
#

can you link the colab?

bright obsidian
#

did you run this cell?

#

this is where OUTPUT_DIR is defined

jaunty spade
#

just noticed there wasn't a green tick beside that play button. just ran it

bright obsidian
#

awesome

#

worst case you can just define OUTPUT_DIR= <put something here>
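If the cell that defines `OUTPUT_DIR` gets skipped, a fallback cell could look something like this minimal sketch. The path is a hypothetical placeholder, not the notebook's actual default; point it wherever you want the weights written.

```python
import os

# Hypothetical output location (not the notebook's default). A relative path
# resolves against the notebook's working directory (on Colab usually /content).
OUTPUT_DIR = os.path.abspath("stable_diffusion_weights/my_model")

# Create the folder up front so later cells that write into it don't fail.
os.makedirs(OUTPUT_DIR, exist_ok=True)
print(f"[*] Weights will be saved to {OUTPUT_DIR}")
```

Run it before the training and conversion cells, since they read `OUTPUT_DIR`.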

jaunty spade
#

does the tick have to be green on all the play icons?

bright obsidian
#

and then run the cell

jaunty spade
#

should that be run too?

bright obsidian
#

should be

jaunty spade
#

ok... looks like i skipped a few should be steps

bright obsidian
#

sent a friend req

jovial hemlock
#

does anybody know if "a character" is a valid training subject?

bright obsidian
#

if anybody is interested in using dreambooth to finetune people, i've found this notebook to work well

analog sinew
#

Is anyone aware of a finetune on just 64x64 images? I'm looking to generate low resolution images, and in my experience going far below 512x512 yields awful results.

bright obsidian
#

why do you want 64x64?

#

you can generate at 512x512 then downsample
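Downsampling 512x512 to 64x64 is an exact factor of 8, so a simple box filter (averaging each 8x8 block) works. This is a dependency-free sketch on a grayscale image stored as a list of rows; in practice an image library's resize function would do the same job on real files.

```python
def downsample(image, factor):
    """Box-filter downsample a grayscale image stored as a list of rows.

    Each output pixel is the mean of a factor x factor block of input pixels.
    Assumes height and width are exact multiples of `factor` (512 / 8 = 64).
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A synthetic 512x512 gradient stands in for a generated image.
img = [[(x + y) % 256 for x in range(512)] for y in range(512)]
small = downsample(img, 8)
print(len(small), len(small[0]))
```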

analog sinew
#

gonna attempt to implement dreamfusion3d

#

i imagine finetuning to lower resolutions shouldn't be too hard compared to other finetuning projects

vivid mural
#

This experiment is very insightful

bright obsidian
#

send me a friend req?

tawdry forge
#

before I dive into setting up an environment, I'm trying to understand if I should be trying to set up Dreambooth or textual inversion. my understanding is that textual inversion can use multiple trained concepts at once but Dreambooth can't?

#

also I'm running local on a 3080 so not sure if I can run dreambooth at all

ashen perch
#

I hope this is the right channel

#

I'm messing around with Textual Inversion. I used a bunch of town art from Heroes of Might and Magic 3: first I used styles.txt as the prompt template, then a custom one (styles_filewords.txt, but with [name] and [filewords] swapped) and fewer steps.
I'm not sure what I'm doing wrong. My first try (3000 and 8500 steps, plus homm3-style-old at 11000 steps) looks better but doesn't resemble the prompt. My second try looks worse, but closer to the prompt with the final embedding (homm3, 3000 steps).

the prompt was simple: london, 8k, highly detailed, homm3-500 with Prompt S/R replacing homm3-500 with homm3-500, homm3-1000, homm3, homm3-style-3000, homm3-style-8500, homm3-style-old
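The Prompt S/R comparison above boils down to "take the first value as the search string and swap in each value in turn". This hypothetical helper mimics that idea to preview the prompts a comparison grid would use; it is not the X/Y plot script's own code.

```python
def prompt_sr(prompt, values):
    """Return one prompt per value, in the spirit of Prompt S/R.

    The first value is the search string; each value (including the first)
    replaces every occurrence of it, giving one prompt per embedding.
    """
    search = values[0]
    if search not in prompt:
        raise ValueError(f"search string {search!r} not found in prompt")
    return [prompt.replace(search, value) for value in values]


base = "london, 8k, highly detailed, homm3-500"
embeddings = ["homm3-500", "homm3-1000", "homm3", "homm3-style-3000",
              "homm3-style-8500", "homm3-style-old"]
for p in prompt_sr(base, embeddings):
    print(p)
```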

vivid mural
stone garden
vivid mural
stone garden
vivid mural
#

No not much. I've just experimented with cartoon characters, and the results were... As horrible as possible 🥲

mossy oriole
# stone garden Did you experiment a lot on the number of tokens ? I'm not sure where the ratio ...

I judge it like Java memory for Minecraft: half the amount of tokens.

Depending on the (aesthetic) score, the more you pile on, the less will be recognized.

Example 1:
Subject, effect, whatever else, Peter Max - PM is 1661933.16.00

Example 2:
Subject, effect, whatever else, Jordan grimmer - JG is 749135.43.00

It's a big pain in the peach, to balance out artists - I honestly have no idea if I should consider evening out the scores to match high-scored artists.
The difference is 912,797.73 in score.
That would be between
Bartholomeus Strobel 911728.56.00
and
Emil Fuchs 913456.35.00

-> Eggplant, shine, by jordan grimmer, by Emil fuchs, by peter max
-> Eggplant, shine, by jordan grimmer, by Bartholomeus Strobel, by Emil fuchs, by peter max
Additionally, by including Bartholomeus, I start getting a lot more frames in my outputs. Perhaps there's an unintended upper limit on scores?

But that science project requires me to have more resources available, so I don't have to think about cost-performance stuff
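The score gap quoted above can be checked directly. This sketch uses the scores exactly as posted (trailing ".00" dropped; their source isn't stated) and finds which two artists' scores bracket the gap.

```python
# Scores as quoted in the message above (their source isn't stated).
scores = {
    "Peter Max": 1661933.16,
    "Jordan Grimmer": 749135.43,
    "Bartholomeus Strobel": 911728.56,
    "Emil Fuchs": 913456.35,
}

gap = round(scores["Peter Max"] - scores["Jordan Grimmer"], 2)
print(f"gap: {gap:,.2f}")

# Which artists' scores sit just below and just above the gap?
below = max((s, n) for n, s in scores.items() if s <= gap)
above = min((s, n) for n, s in scores.items() if s >= gap)
print(f"between {below[1]} and {above[1]}")
```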

stone garden
#

ok let's do some tests on that waldo 🙂

bleak swallow
#

tried using the huggingface textual inversion colab to get the style of first pic, ended up getting second pic instead, feels bad

#

problem is it would take at least another 3 hour training session to figure out if changing some settings can fix the results or not

#

now I remember why I wasn't able to get into machine learning...

stone garden
#

that was on how many learning steps?

bleak swallow
#

3000, I left all the parameters at the notebook default

#

honestly it wouldn't surprise me if the style is too weird for TI to even be able to locate it

#

I mean the style probably fails the filtering of the SD training data

stone garden
#

I'm still in the dark on TI... it requires testing to really get right, but that takes so much time

#

progress on the waldo training

#

getting closer

bleak swallow
#

definitely looks like a waldo

#

although I have a bad feeling that even as the overall picture looks waldo-like, the individual figures in it might stay indistinct blobs forever

stone garden
#

it could, especially since this is only TI, but the goal is not to make more "where is waldo", but to extract an embedding for "waldo style"

#

I'd like that intricate style with lots of characters in lots of other styles and subjects

#

I'm training it as a style there, not a subject

bleak swallow
#

I see, interesting idea

ashen perch
#

and the results are confusing for me

#

1000 steps looks so good, while the final 3000 steps look overcooked. my previous attempt also looks horrible at 3000, 8500 and 11k steps

ashen perch
#

Any suggestions on how I should restart training it?

bright obsidian
#

is there a browser version of webui?

#

for my frens that don't want to have to download anything

serene condor
#

i want to train on 35 photos, what settings should i choose and how long should i expect it to take?

#

should i run anything over 15,000 steps?

rough hamlet
#

I've found 12,200 steps and 3 vector tokens works pretty good for faces. haven't tried training any styles yet

#

Here are some samples of my friend. I don't have the original training images on this computer, but the first set of faces is pretty much the same as the training set. Trained with 22 close-ups of the face, 12200 steps, init word=man, vectors per token=3, learning rate of 0.005.

gilded crater
#

I would advise caution against posting anything child related on any public area. Even if it's your own child.

ESPECIALLY IF ITS YOUR OWN CHILD.

eternal wing
#

Where can I look for details on training in general rather than asking a bunch of questions? Checked pins, but don't see anything

vivid mural
lament moat
#

After a long night of futile research, I could use some help. I use the SD Automatic1111 UI on my PC. I have a single .pt textual inversion file. How do I run it (get it to work) on my version of Automatic1111?

hot breach
ashen perch
lament moat
lament moat
ashen perch
#

And yes, it works like that

ashen perch
#

I’m messing around with Papa Franku and I don’t know what’s wrong, but with 2 different trainings both look overcooked to me

#

Filthy-frank-6 used only 1500 steps and 6 vector tokens, the others used only 3, and I think I stopped somewhere between 5000-8000

stone garden
#

I did a waldo style with 30 tokens and trained it a little too much. the results are fun though

ashen perch
stone garden
ashen perch
#

It’s good to hear that I’m not the only one 😂

lilac helm
turbid patrol
#

fun to look at

stone garden
#

I think TI still has plenty of fun to show me; I'm just trying it out. Especially TI on a style, not a subject

turbid patrol
#

i haven't been able to replicate styles too well yet. let us know how your experiments go!

weak sonnet
#

i was trying to teach it how to draw a specific swimsuit
it learns to draw them well, could be better, could be worse
but in the process it includes the most horrible faces i've ever seen, like enormous cheekbones and lips, also shaved heads, something like Neanderthals
and i cannot remove these faces even with specific prompting. needless to say there is nothing like that in the initial image set
i tried everything: different vector sizes, initialization texts, different template files and captions, including only images with/without faces
the only thing that helps is lowering the learning rate, but that is not enough
most disturbing is that i get the most pleasant results at 200-500 steps of training. it's not perfect, it's still thinking about the initial prompt and not the pictures i included, but it's nice to look at. and then it only gets worse and worse and worse

stone garden
#

it's still struggling

#

I'll send a zip with all checkpoints here when I reach 15k

stone garden
mental hatch
#

It just has issues with R&M

sharp grove
half folio
#

Is that textual inversion?

high nest
#

/dream apple

half folio
#

Nevermind, just saw those files

stone garden
#

yep it is

rigid starBOT
#

@high nest

FAQ: I'm new here, how do I generate images ? Where is the bot ?

Welcome ! There is no bot currently to generate your images on discord. You may want to start by taking a look at the #1014939219904450590 channel. You can access Stable diffusion in different ways : 1️⃣ the official website, https://beta.dreamstudio.ai/. The easiest and fastest way to access Stable diffusion with 200 free credits. For any question on it, you can find help in the #1025467151206854736 channel. 2️⃣ Installing Stable diffusion on your computer. There are numerous projects that let you do that, and you will find help in the #🤝|tech-support channel. 3️⃣ Running Stable diffusion in the cloud, through rented GPU services, using notebooks. You can find lots of them shared and discussed over in the #1011228442399883294 channel.

half folio
#

hmm, this certainly looks interesting

viral jay
#

well, after a lot of trying, it seems like I found a good spot for face learning with textual inversion: 6 tokens and up to 20000 steps was the answer. with 1 token, above 15000 steps it was bringing random results; with 6 it's giving me realistic faces and is still kinda flexible for styles and stuff

whole ice
bright obsidian
#

finally finetuned a model with dreambooth, kinda painful

#

would anybody be interested in a website that lets you do dreambooth finetuning?

viral jay
#

guys what's the effects of increasing the learning rate?

#

lower rate means more accurate? or just slower?

half folio
#

lower rate means the training will be slower, yes

#

it controls the rate at which the model learns

#

I'd suggest not to increase it much if you're doing dreambooth finetuning as it will most likely make your model worse

#

you could increase it slightly if you're training on more images than recommended
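To see why a lower learning rate is slower and a much higher one can make things worse, here is a toy illustration: plain gradient descent on f(x) = x², not Stable Diffusion. The learning rates are arbitrary examples chosen to show the three regimes.

```python
def descend(lr, steps=20, x=5.0):
    """Run plain gradient descent on f(x) = x**2 (gradient 2x) from x=5."""
    for _ in range(steps):
        x -= lr * 2 * x
    return x


# 0.01 creeps slowly toward the minimum at 0, 0.1 gets close,
# 1.1 overshoots on every step and diverges.
for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: x after 20 steps = {descend(lr):.4f}")
```

Real training is far noisier, but the same trade-off (slow progress versus overshooting) is the intuition behind keeping the rate modest.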

viral jay
#

it's actually for textual inversion, I'm using around 20-25 images, is that a good amount?

#

I'm trying to get better results, it's for face learning. there's one training that gave great results but it's kinda rigid on styles, so I cropped the images to the face bounds, but then things got worse. extracting the best out of it isn't easy

half folio
#

I'm not sure how good textual inversion is for that, I've only trained it on styles and not faces or objects.

#

I think you could try increasing the learning rate a bit

viral jay
#

it did learn my face pretty well

half folio
#

that's pretty cool

viral jay
#

that's pretty accurate

half folio
#

yes, the quality is very decent

viral jay
#

the images above were generated without inpainting, so I think TI did very well, but maybe it's just efficient at learning my ugly face 😂

#

well discord stopped uploading the images 🤔

#

is there some way to finetune an embedding?

#

for example, I got a training with very nice results, but the chin has the wrong shape

hot breach
#

new version of what I posted the other day on reddit

viral jay
#

I just found the [face:.... face:0.5] parameter, are there more like that?
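In the AUTOMATIC1111 web UI, `[from:to:when]` is "prompt editing": sampling starts with `from` and switches to `to` partway through, and a `when` below 1 is read as a fraction of the total steps. This hypothetical helper sketches which side is active at a given step; the exact boundary step is an assumption.

```python
def active_text(from_text, to_text, when, step, total_steps):
    """Pick which side of a [from:to:when] prompt-editing tag is active.

    A `when` below 1 is treated as a fraction of total_steps, otherwise as a
    step number. Using `from_text` up to and including the switch step is an
    assumption; the UI's exact boundary handling may differ by one step.
    """
    switch_step = int(when * total_steps) if when < 1 else int(when)
    return from_text if step <= switch_step else to_text


# e.g. a hypothetical [face of a man:face of a woman:0.5] with 20 steps
for step in (1, 5, 10, 11, 20):
    print(step, active_text("face of a man", "face of a woman", 0.5, step, 20))
```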

tough gazelle
hot breach
#

use per image prompts on the training images instead of a global class word

tough gazelle
#

How do you do that?

hot breach
tough gazelle
#

Ah ok and that works well then? It doesn't merge features at all?

hot breach
#

I'm running it locally, not sure if the notebook was updated

#

nope doesn't merge

tough gazelle
#

Would you say if you're doing 2 characters you should do double the steps?

#

Also I wish I'd seen that earlier lol before I created this monstrosity.

hot breach
#

that model was ~7500 steps and >500 training images

tough gazelle
#

Ok I've been doing 4000 for 30 images for 1 character

hot breach
#

can get more than one character in frame, still sometimes mixes their features but that's a global issue with SD

#

I believe by adding more multi-character training images it can learn to separate them better though

tough gazelle
#

Ok, but I'm guessing if you use one of their subject words it just looks like them?

#

And you don't get, say, one person's outfit colours on another's?

tough gazelle
hot breach
#

I'm using blip to interrogate the images, and then fine tuning the prompts from there

#

adding my character names in place of "man" or "people" and such, tweaking things like duplicates or fixing incorrect prompt words

tough gazelle
#

Ok cool, thanks. I'll give it a go.

hot breach
#

there's a bit of style transfer here, Barret sort of looks like he's wearing Cloud's outfit more than his own, but it sorta knows they are separate characters

#

my theory is adding multi-character training images helps attention work better, this is much better than a previous attempt and I only went from like 35 group photos to about 55

tough gazelle
#

You think you need a lot of training images for it to work well, then?

#

I guess you could also use this to get multiple outfits for 1 character under different subject names as well

hot breach
#

yes, I've spent many hours assembling the training set, but I'm working on automating it; BLIP and txt2mask can probably help a lot with that

#

yes, I tried adding many outfits into the training data as well, and it actually helps with style transfer. it's easier now to get characters to wear other outfits from base SD by including many outfits in the training set. I think when it sees a character name associated with different outfits, it can attend to the face vs. the body

#

and also vice versa, putting base SD faces on characters

hot breach
#

my sis will love this, she's obsessed with labyrinth

tough gazelle
hot breach
#

yeah its a lot of hours of training

tough gazelle
#

Also got a bit confused as it seems to expect you to have the images in a directory called images, so I was pointing it to where the files were and was getting a "Not A directory" error. But I think I've got it going now.

#

Well I must have, it wouldn't start if it couldn't find any training or reg images

hot breach
#

this is how mine is organized

#

reg folder is the same

tough gazelle
#

I just have them all in 1 folder, but the files are named accordingly like it says on the GitHub page

#

So
SubjectA_001
SubjectB_001
SubjectC_001

And so on
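With everything in one folder, a quick sanity check before a long run is to group files by their subject prefix and confirm each subject has images. This is a hypothetical helper that assumes the `SubjectA_001` pattern shown above; it is not the repo's own parser.

```python
from collections import defaultdict


def group_by_subject(filenames):
    """Group training images by the subject prefix before the last '_'.

    Assumes names like SubjectA_001.jpg; the extension is optional.
    """
    groups = defaultdict(list)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]          # drop the extension, if any
        subject, sep, _index = stem.rpartition("_")
        groups[subject if sep else stem].append(name)
    return dict(groups)


files = ["SubjectA_001.jpg", "SubjectA_002.jpg", "SubjectB_001.png", "SubjectC_001"]
for subject, names in group_by_subject(files).items():
    print(subject, len(names))
```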

hot breach
#

yes

tough gazelle
#

Does this generate 3 test images at the intervals?

#

Or just the one

hot breach
#

it only seems to generate one pair

tough gazelle
#

Ok, would be nice to have all 3 so I can see that it's working properly.

hot breach
#

it also does some batch size hunting at the beginning I think because my sets are mismatched

tough gazelle
#

Probably not that hard to implement, but I'd have to find where it is in the code

hot breach
#

I'm not positive how it chooses what to generate, I guess whatever prompt/image it is on when the training image logger triggers

#

yeah

tough gazelle
#

Yeah I would assume that too. It must just look at the last training subject and class word and use that

hot breach
#

probably could use some better collation

tough gazelle
#

I suppose if it's doing that you could just feed the script the different names you've put in and tell it to do all of them at the interval

hot breach
#

my training images almost all have entirely unique names

#

ex "cloud strife turned away from the camera with a buster sword on his back"

#

cloud strife standing on a sidewalk, night, streel lights_ (15).webp

tough gazelle
#

Oh I've not gone that far with this test. Just using simple names for each character

hot breach
#

tifa lockhart in a purple dress holding her right hand up standing in front of a shelf with liquor bottles on it's shelves.webp

#

a lot of stuff I get out of blip

#

yeah just a suggestion, I think it will help with attention in training, and also maintaining the existing model knowledge
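When the filename is the prompt, as in the examples above, it can help to preview the caption each file will produce. This is a hypothetical helper; whether the training script strips duplicate counters like `_ (15)` itself is an assumption, so check against your repo.

```python
import os
import re


def caption_from_filename(filename):
    """Turn a training image filename into a caption string.

    Drops the extension and a trailing duplicate counter such as "_ (15)".
    """
    stem, _ext = os.path.splitext(filename)
    stem = re.sub(r"_?\s*\(\d+\)$", "", stem)
    return stem.strip()


print(caption_from_filename(
    "cloud strife standing on a sidewalk, night, streel lights_ (15).webp"))
```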

tough gazelle
#

Ok, well I'm just running this for a couple hours before I sleep to see what it's like.

#

That SD-Optimized one also provides more images, like it shows you what phrase it was using to get the image. This one unfortunately doesn't.

#

So I'm just going to have to assume it's working

hot breach
#

you can tweak the v1-finetuning.yaml and change the imagelogger interval

#
```yaml
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 400  # <--- change this interval
```
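As a rough mental model, `batch_frequency` is how many steps pass between sample-image dumps. This is a simplified sketch; the real `main.ImageLogger` callback may apply extra conditions, so treat it as an approximation.

```python
def should_log(global_step, batch_frequency=400):
    """Simplified check: log sample images every `batch_frequency` steps.

    The real main.ImageLogger callback may apply extra conditions (for
    example around the first steps), so this is only an approximation.
    """
    return global_step > 0 and global_step % batch_frequency == 0


print([s for s in range(1, 1201) if should_log(s)])
print([s for s in range(1, 1201) if should_log(s, batch_frequency=100)][:3])
```

A lower value gives more frequent previews at the cost of some extra time and disk space.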
tough gazelle
#

oh

#

it got to 1000 steps and died

#
```
Traceback (most recent call last):
  File "main.py", line 832, in <module>
    trainer.test(model, data)
  File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
    results = self._run(model, ckpt_path=self.tested_ckpt_path)
  File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1128, in _run
    verify_loop_configurations(self)
  File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 42, in verify_loop_configurations
    __verify_eval_loop_configuration(trainer, model, "test")
  File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 186, in __verify_eval_loop_configuration
    raise MisconfigurationException(f"No `{loader_name}()` method defined to run `Trainer.{trainer_method}`.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test`.
```
hot breach
#

you have to pass the number of max steps or set it >1000

#
```python
    # excerpt from the argument parser in main.py
    parser.add_argument(
        "--max_training_steps",
        type=int,
        required=False,
        default=12000,
        help="Number of iterations to run")
```
tough gazelle
#

I changed it in the v1-finetune_unfrozen file

#

Is it different on this one?

hot breach
#

yeah, it is overridden by the cmd line arg

tough gazelle
#

Ah ok

#

So I leave the file alone and pass that command line

hot breach
#

I just changed it in code there on the arg definition

#

but you can just add it to your arg list too, either works

#

I believe max_steps in the finetune.yaml is also respected

#

so whichever comes first
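The behaviour described here, with the `--max_training_steps` argument and `max_steps` in the YAML both active and the run stopping at whichever is hit first, amounts to taking the minimum of the limits that are set. A hypothetical helper, not part of the repo:

```python
def effective_max_steps(cli_max_steps=None, yaml_max_steps=None):
    """Return the step at which training stops when both limits are active.

    Either limit may be None (unset). Whichever is smaller wins.
    """
    limits = [s for s in (cli_max_steps, yaml_max_steps) if s is not None]
    return min(limits) if limits else None


print(effective_max_steps(cli_max_steps=1000, yaml_max_steps=12000))
```

So raising only one of the two limits isn't enough; both have to be above the step count you want.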

tough gazelle
#

I changed it in v1-finetune_unfrozen.yaml

#

And it didn't listen to that

#

I'll use the command line

hot breach
#

the command line arg inserts an interrupt, hence the error message. the one in the yaml, I believe, doesn't throw an error, but they behave the same and both are active limits

#

it should still dump the ckpt either way

tough gazelle
#

It did dump it, but at 1000 steps

#

Then stopped

hot breach
#

correct, as designed, you should set both limits higher

tough gazelle
#

Ok, you didn't need to set it in the command line on the other repo I've been using

#

It just looked at the file

hot breach
#

correct, kane added that for whatever reason I guess

#

you could just go into the argparse code and set the default value to 99999 or whatever if you want

#

above I set mine to 12000

slow mantle
# hot breach

Yay, I'm glad the model will be seen by further people

hot breach
#

she rides dressage and uses the labyrinth music for her routine, I sent her a photo of jareth on a horse 😆

tough gazelle
#

Ok so I only had time to run through it 3000 steps tonight to test it, but it's worked reasonably well

#

One of the characters doesn't turn into a blob like it did when I trained them back to back

#

There's still a lot of style transfer, but I think more training images and more steps will most likely help that

hot breach
#

yeah include a few with both characters in it, with a proper prompt, I used ~55 images out of 550 as group photos, even that seems to have helped a lot, I think a bit more may be better

tough gazelle
#

It seems like the class has bled into it a little too much as well

#

On a normal model, putting in the class doesn't really make a massive difference. On this, it starts making them look like the regularisation images

#

Maybe I used too many reg images

#

I'll have to give it another go tomorrow and leave it a lot more steps

#

But it's at least partially working at low steps

#

One of the models of the 3 is worse than the other

#

But it did have slightly fewer training images

#

It was Darkness that had fewer images and she doesn't come up much at all
I guess it could just be that low steps didn't give it time to train as well with fewer images.

coral mist
#

When training SD, what do I need to do to have it start from a previous checkpoint? I'm seeing references to --finetune_from and --resume and --resume_from_checkpoint

#

Which is it lol. I tried --finetune_from since that was referenced on the pokemon diffusion but it took me two epochs before I realized it was trying to start from scratch

fallen nova
#

textual inversion shenanigans with Yves Tanguy paintings as the input for training

#

if it works like that.. heard you can just drop it in an embeddings folder on the automatic1111 fork but i'm not sure if just works plug and play like that

#

oh damn guess it does

coral mist
#

Ran a few tests at different gpu configurations to see what made sense for training speed vs cost, maybe it will be useful to others

1x A6000 ($0.63/hr)
80 in 2 minutes

4x A6000 ($2.40/hr)
200 in 2 minutes
costs ~300% more than 1x A6000 and is only 150% faster

4x A100 ($3.20/hr)
252 in 2 minutes
costs 33% more than 4x A6000 and is only 26% faster
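The same figures can be turned into cost per unit of work. The post doesn't say what "80" etc. counts, so this sketch just calls it "units"; the prices and throughputs are the ones posted above.

```python
# Figures from the test above: ($ per hour, units done in 2 minutes).
configs = {
    "1x A6000": (0.63, 80),
    "4x A6000": (2.40, 200),
    "4x A100": (3.20, 252),
}

cost_per_1000 = {}
for name, (dollars_per_hour, per_2_min) in configs.items():
    per_hour = per_2_min * 30                 # 30 two-minute windows per hour
    cost_per_1000[name] = dollars_per_hour / per_hour * 1000
    print(f"{name}: {per_hour} units/hr, ${cost_per_1000[name]:.3f} per 1000 units")
```

That works out to roughly $0.26, $0.40 and $0.42 per 1000 units, so the single card is cheapest per unit; the multi-GPU setups buy wall-clock time rather than savings.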
stone garden
tame aurora
#

For example, I skimmed some of Lambda Labs' Pokemon images and BLIP did a decent job IMHO, so I concluded an accurate caption would benefit the fine-tuning

tame aurora
tidal orbit
#

How do you assemble a training set? Are there places with good libs for that? I would be interested to finetune on some fantasy/sci-fi image gallery

hot breach
#

a lot of it has been brute force data prep, but I'm working on workflow improvements

hot breach
#

I didn't fully blip prompt the entire set either, maybe like 30%, too time consuming for now but a lot of this can be automated

tulip comet
#

what do the parentheses stand for? like, what's the difference between highly detailed, (highly detailed) and ((highly detailed))
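In the AUTOMATIC1111 web UI, each pair of `( )` around a term multiplies its attention by 1.1 and each pair of `[ ]` divides it by 1.1 (other UIs may use different syntax). This simplified sketch only counts leading brackets, which is enough to compare the three variants asked about.

```python
def attention_weight(text):
    """Approximate emphasis multiplier for a term wrapped in () or [].

    Each enclosing ( ) multiplies attention by 1.1; each [ ] divides by 1.1.
    Only leading brackets are counted, which is a simplification.
    """
    parens = len(text) - len(text.lstrip("("))
    brackets = len(text) - len(text.lstrip("["))
    return 1.1 ** parens / 1.1 ** brackets


for term in ("highly detailed", "(highly detailed)",
             "((highly detailed))", "[highly detailed]"):
    print(f"{term!r}: x{attention_weight(term):.3f}")
```

So `((highly detailed))` is about 1.21x, not 2x.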

tame aurora
charred jackal
#

Is there a list of model weights and what they have been trained on? (I´m using Artroom stable diffusion, if that makes a difference)

rough hamlet
tulip comet
#

nice thanks both! @hot breach @rough hamlet

fallen nova
stone garden
# fallen nova you might want to change the classifier if you're going to do that lol, currentl...

I try to keep my identifiers a single word, preferably not already recognised too much by the AI, or at least not recognised as something else (don't use a nickname for your face, for example: I was called a monk for a long time and I used monk as part of the identifier, and learning went wild)
but with TI, if you use the identifier corresponding to an embedding, it's not the number of words in your identifier that matters, it's how many vectors per token you chose when you created the embedding.

#

like, waldo I shared earlier is 30 tokens on its own I think

fallen nova
#

damn wow

#

so yves tanguy painting would be a shit ton then

stone garden
#

it just depends how you create your embedding

#

I would choose a word like "yvestanguystyle"

#

and put about 8 tokens in to see how it works already

fallen nova
#

there u go then thanks

rose cosmos
#

does anyone know why my loss doesn't go down while training with dreambooth? tested at 5e-5 and 5e-6 with 20 instance pics and 200 regularization images, at 800, 1600 and 3200 steps

tawdry forge
#

made with a textual inversion watercolor training

delicate rock
#

is it possible to finetune on sizes that aren't powers of 2? i want to use data from drawception which has a fixed size of 600x500

viral jay
#

guys, what's the best method to train a whole person with textual inversion? I'm running a training now, but this time I've separated some good quality images of the head/face and the body, and I'm using subject_filewords so it uses the filename to guide it. is this a good approach?

sweet sand
#

Hey there!

#

Quick question: I would like to start training models in Automatic1111 and share them with friends. Any advice? My questions are:
1- Should I train on my RTX 3080 or spin up an AWS instance instead if I want to do mass training (cost is not really an issue, I have a pool of credits on AWS)?
2- Can a model trained in Automatic1111 output a .pck that I put back in the stable diffusion folder?
3- Any training I could test to learn? (Ex: a sample folder with X images that I could follow)
4- Anyone have some time to walk me through the process? Will gladly pay for the training 🙂

viral jay
#

I have a 3080 Ti too, with around 20-25 images it does approx. 10000 steps per hour (may take less time)

#

it does generate a few .pt files

#

those go inside the embeddings and textual inversion folders

sweet sand
#

In the main folder, I have models > Stable-Diffusion. This is where I put the models I find, for example: robo-diffusion-v1.ckpt

viral jay
#

result from training I did

viral jay
sweet sand
#

Wow, thanks for all the info. Let me know If I understand:

  1. Name: Whatever I want
  2. Initialization text: Description of the prompt for that image set?
  3. Number of tokens: 75?
  4. Source: Path to my folder with all the images
  5. Destination: Where I want the processed images
  6. I click Flip and Caption?
viral jay
#

with 6 tokens on textual inversion and only 8000 steps I could get some very nice results even mixing the style with real photos

sweet sand
#

This will create a nice 512x512 folder I can feed to the embedding part?

silent spear
# viral jay guys, what's best method to train a whole person with textual inversion? I'm run...

I am still working my way through the TI process, but my overall gist is (re: auto1111 TI interface, I should say):

  • yes, use filewords, but maybe make a new subject_filewords.txt file for humans, because when you use the default, the phrasing tends to be a bit wonky... you don't have a good photo of "a jane_doe", y'know? 🙂
  • I use more initialization text than I think may be standard: say, "person female woman actress". For some reason, "actress" really made a difference. I used to use very few words, but then I'd have the effect of "clear face, blurry everything else", which I suspect was SD saying "I know that jane_doe is this face, but I dunno WTF the rest of her looks like"... the extra words allow it to borrow from existing concepts, if that makes sense.
  • I use 2 vectors per token max. Any more and I can't use the character the way I want (different costumes, styles etc).
  • learning rate at 0.001 for about 3000-5000 steps. I actually usually save out 2 versions, one for "I need this to look like the character exactly" and one for "let's have some fun and be flexible".

Thus far, it's working pretty well. I refine it constantly, but I've made a handful of characters this way, and those are the general parameters I use.

sweet sand
#

A: Embedding: I guess this will have something once I did the first part?
B: Learning Rate: 0.015?
C: Dataset Directory: Any path I want?
D: Log - Ok
E: Prompt Word: uh?
F: Max Steps: 15 000?
G: Save image and save copy: disable or 500?

tough gazelle
#

If you guys are running that training repo that lets you do multiple characters I've noticed that it doesn't have the speed upgrades.
https://github.com/kanewallmann/Dreambooth-Stable-Diffusion

You can get the speed upgrades from
https://github.com/gammagec/Dreambooth-SD-optimized

Just replace the files in the kanewallmann repo with the ones from Dreambooth-SD-optimized. Increases training speed to 1.06s/it on a 3090, instead of around 1.5 you get currently.

Files to replace:
/ldm/modules/attention.py
/ldm/modules/diffusionmodules/model.py

On Linux I also had issues with an older version of PyTorch being used, which made it use more memory and run out.
To fix this, edit the Environment.yaml so it uses:

  • pytorch=1.11.0
  • torchvision=0.12.0
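Here is the arithmetic behind 1.06 vs ~1.5 s/it as a tiny sketch. Per step, the patched files save roughly 29% of the time (1 - 1.06/1.5). The 2000-step run length is a hypothetical example, not a figure from the thread, and real runs also spend time on loading and checkpointing.

```python
def training_hours(steps: int, sec_per_it: float) -> float:
    """Wall-clock hours for `steps` iterations at a constant seconds-per-iteration."""
    return steps * sec_per_it / 3600


# s/it figures quoted above for a 3090; the 2000-step run length is hypothetical.
for label, rate in [("unpatched", 1.5), ("patched", 1.06)]:
    print(f"{label}: {training_hours(2000, rate):.2f} h for 2000 steps")
```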
viral jay
# sweet sand Wow, thanks for all the info. Let me know If I understand: 1) Name: Whatever I w...
  1. yes, you put in the name you want, so when you use it in a prompt it will use the info learnt
  2. keep as is
  3. tokens: I use 4-12. More tokens means it can capture more details, but your prompt is limited to 75 tokens, so you will get more and more rigid results as you increase toward the max of 75 tokens
  4. source path is the folder with images (prefer to crop images to the subject you want to learn)
  5. this is where images will be output after you press the Process button, and it's what you enter as the dataset directory
  6. I don't use flip or caption. I tried with caption and my results got worse, so I didn't really try much with it. I don't use flip because faces are asymmetric
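Item 3 describes a token budget, sketched below. This assumes each embedding vector occupies one token position and that the whole prompt fits in a single 75-token chunk. CLIP's text encoder sees 77 positions, and two of them are the start and end markers. Some webui versions can split longer prompts into several chunks, so treat the hard 75 as describing the setup discussed here, not a universal limit. The function name is hypothetical.

```python
CLIP_CONTEXT = 77   # positions in the CLIP text encoder's context window
SPECIAL_TOKENS = 2  # start-of-text and end-of-text markers
PROMPT_BUDGET = CLIP_CONTEXT - SPECIAL_TOKENS  # the 75 usable prompt tokens


def remaining_budget(vectors_per_token: int, other_prompt_tokens: int) -> int:
    """Token slots still free after the embedding and the rest of the prompt.

    Assumes each embedding vector takes up one token position.
    """
    return PROMPT_BUDGET - vectors_per_token - other_prompt_tokens


print(remaining_budget(vectors_per_token=12, other_prompt_tokens=30))
```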
tidal orbit
viral jay
#

@silent spear gave some good tips; I'm still learning too. For the learning rate I use the default of 0.005. I have increased it but didn't notice any changes; might try decreasing it as he said. For steps, I've noticed that going above 5000 doesn't make a huge difference, but I always let it run up to 10-15k and then delete the higher step numbers from the folder if I notice degradation above a certain step count

hot breach
#

yeah I spent a lot more than an hour on it lol

tidal orbit
sour pond
#

Does the optimized version of Dreambooth work on a 3080?

hot breach
#

I was screenshotting the game, then I'd go back and resize/crop, then fix prompts

silent spear
#

Oh, one thing that I think is very important (for everyone, regardless): never feed TI an image that isn't "right". If you give it a photo of your subject that looks different or weird or non-standard, it will latch on to that sucker and reproduce all the wonky qualities you hate in every image you create from then on.

viral jay
#

That's very true, and it's very good at picking up bad details lol

#

a face I was training had a small shadow on the middle of the lower lip, and it made the algorithm think the lower lip was split...

#

got good once I removed that single image from the dataset

tough gazelle
# sour pond Does the optimized version of Dreambooth work on a 3080?

If you mean the Dreambooth-SD-Optimised one, no, it uses 23.3GB of VRAM.

If you mean the diffusers one, then sort of. It can work on just under 10GB of VRAM. However, Windows itself takes up some VRAM, so you'd probably run out; you'd likely have to run it from a Linux console for it to work.

sour pond
tough gazelle
#

You can rent 3090s for like $0.40 an hour if you really want to give it a go

sour pond
#

Its all good. 4090 releases next week and the price of a 3090 has fallen dramatically.

tough gazelle
#

I'm using Vast.ai as it has a $5 minimum credit amount, whereas RunPod's is $10

sweet sand
#

Nice thanks

#

i'll take a look

#

People who train and output a .ckpt (ex: the one who published robo-diffusion-v1.ckpt) - are they using another software?

tough gazelle
#

The main tools output ckpt files. It's only the diffusers packs that don't

#

I've never used the diffusers one as I've heard it's worse quality

viral jay
#

Can anyone explain to me what a token is exactly? Is it the training trying to cook information into a single word? I'm trying your params @silent spear, and with 2 tokens and 0.001 learning rate it's doing reasonably well, but 5k steps wasn't enough. I'm letting it do some more training; at 6k it's giving some nice results, but I will let it go up to 10k

#

what I'm noticing is that it's learning less detail; the character suit is less detailed. Let me grab some examples of it

#

This is Wraith from Apex Legends, with 2500 steps / 6 tokens / 0.005 learning rate

#

this is with 2500 steps / 2 tokens / 0.001 learning rate

#

2750 steps / 2 tokens / 0.001 learning rate; compared to the first image it's much less detailed

#

and this one is the real character from the game

#

will now do a test with 12 and 24 tokens to see how this affects quality and get some reference parameters, but I believe I will go back to an average of 4-6 tokens as they seem to give decent output

viral jay
#

2500 steps / 12 tokens / 0.001 learning rate, yeah not very good tbh

#

3750 steps / 12 tokens / 0.001 learning rate, starting to get interesting. It learns more details on the clothing, but the face seems better on the low-token one. Will run this up to 10k and see if there's a good improvement

viral jay
#

here's some comparison

#

2 tokens (top) vs 12 tokens (bottom)

#

same prompt and parameters on both images; it's easy to see that more tokens create a stronger bias toward the source images

viral jay
tough gazelle
#

No

#

24GB

tough gazelle
crimson sandal
#

This might be a naive question, but I successfully was able to fine tune SD using a set of images I provided. If I wanted to train that same model on another set of images, can I do that? Can someone link me to an explanation? For instance, if I trained it on <cat-toy> images and now I want to add <dog-toy> images, how can I make that happen?

tough gazelle
shy widget
#

like <cat-toy> AND <dog-toy>

plucky swan
#

There's a repo that's able to fine-tune DreamBooth on 8GB of VRAM, have you guys tried it?

stiff dust
#

if you are talking about the diffusers dreambooth with deepspeed, I made that PR. It works fine for me on 8GB VRAM, some other people have tested it too successfully

#

it's also not written anywhere yet, but replacing the Adam optimizer with Deepspeed version of Adam gives very substantial speed up

gloomy belfry
strange crest
#

I'm new here... can anyone direct me to where to learn better prompts? I keep getting images with blurred faces and extra limbs

plucky swan
#

Crazy that it had at least a degree of success anyway. Did you find the quality significantly worse, or is it mostly the same?

stiff dust
#

I don't think I did anything special, the instructions are included. Make sure to set batch size 1, gradient checkpointing and mixed_precision=fp16
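Those settings map onto flags of the diffusers `train_dreambooth.py` example script, assembled below as a command line. This is a sketch. The model id, paths and instance prompt are placeholders. The 8GB DeepSpeed offload itself is configured separately through `accelerate config` according to the example's README, not through these flags. The script changes often, so check the current README.

```python
import shlex

# Low-VRAM settings from above, as arguments to the diffusers
# train_dreambooth.py example script. Model id, paths and prompt are placeholders.
args = [
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path", "CompVis/stable-diffusion-v1-4",
    "--instance_data_dir", "./instance_images",
    "--instance_prompt", "a photo of sks person",
    "--output_dir", "./dreambooth_output",
    "--train_batch_size", "1",        # batch size 1
    "--gradient_checkpointing",       # recompute activations to save memory
    "--mixed_precision", "fp16",      # fp16 mixed precision
]

print(shlex.join(args))
```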

gloomy belfry
#

Yeah I did all that

#

Idk

plucky swan
#

Are there some sacrifices on quality for the 8gb repo?

stiff dust
#

no, but it requires a lot of CPU RAM

gloomy belfry
#

And it's slower

#

If you could share your Adam code @stiff dust, I'd like to try it again

plucky swan
#

Interesting, this makes the tech a lot more accessible, just by adding more system RAM

stiff dust
#
```python
from deepspeed.ops.adam import DeepSpeedCPUAdam
...
# swap the script's default Adam optimizer class for DeepSpeed's CPU Adam
optimizer_class = DeepSpeedCPUAdam
```
strange crest
stiff dust
#

it should work without that change too but using DeepSpeed Adam gives about 2x speedup

gloomy belfry
#

Nice

#

Thanks

stiff dust
#

also, that requires a CUDA toolchain that's the same version as the one PyTorch was built with

#

I had troubles with it and built pytorch from source to make it work

gloomy belfry
#

Ah

#

Well that may be too excessive

sullen zephyr
#

Has anyone tried fine-tuning the decoder?

half terrace
# viral jay guys what's the effects of increasing the learning rate?

Learning rate changes the step size for each update (how much it's allowed to adjust the network weights). Higher values make training faster and make it easier to get past local minima (values that the ANN thinks are good), but they decrease stability and increase the risk that you'll blast past the actually optimal values.
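The sketch below shows the step-size trade-off in isolation, using plain gradient descent on f(x) = x^2 with the update x <- x - lr * f'(x). This toy has a single minimum, so it only illustrates speed and overshoot, not escaping local minima. A tiny rate creeps toward 0 and a moderate one gets there quickly. With 0.9 it overshoots past 0 on every step while still shrinking, and with 1.1 it blows up.

```python
def descend(lr: float, steps: int, x0: float = 1.0) -> float:
    """Run gradient descent on f(x) = x**2 (gradient 2x, minimum at x = 0)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # update = learning rate * gradient
    return x


for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: x after 20 steps = {descend(lr, 20):+.4f}")
```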

ashen perch
#

does anyone have a sample prompt template for training specific people?
subject.txt and subject_filewords.txt have somewhat strange combinations

ashen perch
#

I'm using automatic1111's webui

austere wigeon
#

so I am trying to run the Dreambooth on a runpod https://github.com/JoePenna/Dreambooth-Stable-Diffusion I have a Runpod with 24gb VRAM which I thought should be enough based on info I found... but I am still faced with RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 23.68 GiB total capacity; 18.33 GiB already allocated; 39.56 MiB free; 18.41 GiB reserved in total by PyTorch) at step 1... any ideas?

half terrace
#

When running TI, is there any issue with using a longer initialization text than the number of tokens for an embedding?

sweet sand
#

I am running Arki's RunPod for training. When it's completed, do I download the last.ckpt in the logs (checkpoint) to put it in automatic1111?

viral jay
half terrace
viral jay
#

hmmm, what I've noticed is that it seems better if I throw in very different images of the same subject instead of lots of similar images that only differ in angle

ashen perch
#

what prompts were u using?

tough gazelle
#

I believe the Colab one does some pruning first to make it smaller, so make sure you read the instructions

sweet sand
#

And would you advise 2k, 6k or 15k training steps?

tough gazelle
tough gazelle
#

Just make sure your training images are of good quality and you have a decent amount of regularization images

#

I use around 20-30 training images and 300 regularization images.
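Regularization images matter because of DreamBooth's prior-preservation setup. Each step adds a second loss term computed on generated class images (e.g. "a photo of a person"), which discourages the model from forgetting what the class looks like in general. The minimal sketch below shows how the two terms combine, with short lists standing in for the predicted and target noise tensors. The numbers are made up. In the diffusers example script the weighting is exposed as `--prior_loss_weight`.

```python
def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    assert len(pred) == len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_noise, class_pred, class_noise,
                    prior_loss_weight: float = 1.0) -> float:
    """Loss on your subject's images plus a weighted prior-preservation term."""
    instance_loss = mse(instance_pred, instance_noise)  # learn the new subject
    prior_loss = mse(class_pred, class_noise)           # keep the generic class
    return instance_loss + prior_loss_weight * prior_loss


print(dreambooth_loss([0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]))
```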

grave carbon
#

Can using too many regularization images be bad?

tough gazelle
sweet sand
sweet sand
tough gazelle
inner rock
#

I'm trying out textual inversion in the automatic1111 webui on an AWS server. I used 5 training images, all head shots from different angles and backgrounds, cropped to 512x512. I edited the subject.txt file to match the photos, along the lines of "a close-up photo of a smiling [name] looking forward". I trained for 6,000 epochs, which was less than an hour. Is there something similar to what JoePenna describes on his dreambooth repo readme as "If you trained with joepenna under the class person, the model should only know your face as: joepenna person". I don't see a way to indicate the class in the webUI version. Is that needed in some way, or am I good to go with just using my token? TIA!

ashen perch
austere wigeon
#

can anyone help me with RunPod and automatic1111... the version of automatic1111 that the template starts is old... I am wondering how I get the new version...? I did a git pull, but I'm not sure how to restart everything... starting webui-user.sh doesn't seem to do anything

#

I have downloaded my custom dreambooth model and want to use it in the runpod environment now...

#

I have used it locally successfully, but I want to run it with the powerhouse GPU I rent with the runpod service 😄

dusky pecan
#

But there were some breaking changes for that today

austere wigeon
#

I mean it is already installed

#

it is just running an old version

dusky pecan
#

Yes, you have to connect to Jupyter and update it

#

```
git pull
git checkout 4999eb2ef9b30e8c42ca7e4a94d4bbffe4d1f015
```

#

In a new terminal in the webui dir

#

Then pip install the requirements again and then restart the pod with the little pen icon

#

You have to checkout a commit from earlier today.

austere wigeon
#

thanks!

#

that worked

austere wigeon
#

hm suddenly getting Sizes of tensors must match except in dimension 0. Expected size 82 but got size 77 for tensor number 1 in the list. even after restarting and even with default settings

#

huh okay I shortened the prompt which seems to have fixed it

austere wigeon
#

so wondering, can you use good outputs of dreambooth creations as training for a 2nd round of dreambooth training?

sweet sand
#

base_learning_rate : any advice, i have 1.0e-06

hot breach
#

I've seen people using lower value, like 5e-7 but the base 1e-6 has been serving me well on the mega model for ff7r so far
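Check one thing before comparing values like 1e-6 and 5e-7. Many of these DreamBooth repos derive from the CompVis `main.py`. There, when `scale_lr` is enabled, `base_learning_rate` is multiplied by the gradient-accumulation steps, GPU count and batch size, and the script logs a "Setting learning rate to ..." line at startup. The sketch below assumes your repo keeps this rule; check its main.py and that log line to be sure.

```python
def effective_lr(base_lr: float, batch_size: int = 1, num_gpus: int = 1,
                 accumulate_grad_batches: int = 1, scale_lr: bool = True) -> float:
    """Learning rate actually used under CompVis-style scaling (assumed rule).

    With scale_lr on, base_lr is multiplied by gradient-accumulation steps,
    GPU count and per-GPU batch size; with it off, base_lr is used as-is.
    """
    if not scale_lr:
        return base_lr
    return accumulate_grad_batches * num_gpus * batch_size * base_lr


print(f"{effective_lr(1.0e-6):.2e}")                             # 1 GPU, batch 1
print(f"{effective_lr(1.0e-6, batch_size=2, num_gpus=2):.2e}")  # 2 GPUs, batch 2
```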

stray idol
plucky swan
#

In diffusers DreamBooth especially, the class token is the most important thing to choose after the training images and the number of steps itself. A well-matched class can nicely fill in the gaps the training data didn't cover

#

The armor class token mostly gave me results like this

#

And warrior class token is able to create something like this

main ocean
#

Im having some ugly outlines that I can’t get rid of … with img2img . Any ideas. ? 🥹

obtuse shuttle
#

Quick question about fine tuning: I’ve just finished creating a CKPT via dreambooth, which can run in SD. But it seems as though the training has influenced everything from that ckpt file. Every single person now looks like the person I fed into the prompt. Is it possible to isolate the CKPT file to only come when explicitly prompted, as with Textual Inversions?

zenith fable
#

Anybody know if I can train an additional person onto an existing DreamBooth ckpt, i.e. put the 2GB model into the directory and train new faces with a new prompt onto it? It's a pain to swap between ckpt files when making images with different faces. Hope this makes sense.

tough gazelle
tough gazelle
bleak swallow
#

if you have any checkpoints from partway thru training you could try those 🤔 or you could try merging the vanilla sd model with it

zenith fable
tough gazelle
#

That repo has the functionality

#

Trained it with 3 characters and it worked well

#

Speaking with someone else: apparently you can put in training images where the characters are together, and it makes group shots more likely to work, rather than what it usually does, where you just get multiple clones

zenith fable
#

That's ace thanks! I'll give it a try

obtuse shuttle
long summit
#

Question about DreamBooth style training and regularization images.
When I train a new style, what do I put in the training and regularization images?

Training images: various class images in the specific style
Regularization images: photos of various classes, e.g. cat, car, dog, person, etc.

Is it ok?

visual atlas
#

Do you think that there is a way to finetune the model to give only good images to our liking (using a simple classification good versus bad of images output from normal model for retraining)?

gloomy hatch
#

Is it possible to combine AITemplate and xformers to get the benefits of both?

sturdy charm
#

Hi guys,
Which enhancer works best for anime faces? Like the GFPGAN but for anime o:

hot breach
#

anime-specific SR GANs or Real-ESRGAN

#

might help a bit at least

#

they are super-resolution GANs, but typically you can downsize then upsize to have it fix up the styling some

hot breach
#

was able to get some concepts of the city and style for midgar into latest multi-model for FF7R, along with a 5th character

silent spear
# visual atlas Do you think that there is a way to finetune the model to give only good images ...

I've been experimenting with that in a way. I created a TI style called "me-likey" based on 300 images I thought looked good, then just added it as a style influence the way I would any other style. Sadly, it probably takes more discipline and diversity than I have patience for... the definition of "good" clearly has some commonalities I didn't notice (unconscious bias), so the outputs were all kinda samey and eventually painfully repetitive. I'd probably have to start with a much bigger, much more random base to avoid that. But it's definitely possible, at least.

pulsar arrow
visual atlas
final matrix
shy widget
#

Sys VRAM: 12288/12288 MiB (100.0%)
wheew, no OOM

#

must have been really close

dreamy zenith
#

Are there any good rules of thumb for picking a TI step count? Is there a mathematical formula or something?

proper thorn
#

Hey, im looking for some help or direction on how to get started training my own model. if anyone could point me in the right direction that would be great

stuck arrow
#

I'm thinking of writing up my experiences and troubleshooting into a blog post sometime soon too

pulsar arrow
#

Anyone able to use the Shivam Shrirao Dreambooth Colab notebook on the premium Google GPUs? It only seems to work on standard GPUs for me.

proper thorn
#

i have an rtx 3070 and a r7 5800x

gloomy belfry
proper thorn
#

ahh ok

#

ill look into that

cursive obsidian
#

this was a badass reddit thread

#

after looking at all the results/timings, I don't really see a reason to use any of the dpm or heun samplers

stoic bough
#

For textual inversion is it better to have around 3-5 images or is it better to have more images of the object/style?

#

And if I were to increase the number of images, would I need to increase the training steps?

viral jay
#

from my newbie tests I found that it's better to have the subject being trained in different images with different lighting and backgrounds. Close-ups of faces worked worse for me than getting the whole face from farther away, but be careful, as it may learn other stuff around it too, like wallpaper or text (like the text on my chair). From experiments, it seems like having fewer images with more variation, while the subject you want to train stays consistent, is better than lots of pictures of the same thing with small changes

#

3-5 images was enough for some stuff; others I trained with 10-15 images... others with 20-25 images... it really depends on whether it's getting the results you want in the output, but more images didn't always mean more quality in my tests

#

steps are also weird: sometimes I got a well-trained result with 2000-3000 steps for one face, for another I had to run 6000-8000 steps, and in a few situations I had it running up to 30k steps. Above 10-15k I didn't notice a huge difference, but that's for what I've been training; maybe something other than a person's face could take advantage of more steps

stoic bough
#

I see, thanks for the info

#

Will just try a bunch of random stuff ig

limber peak
#

It uses deepspeed

gloomy belfry
#

seems like it OOM when pinning the memory for the optimizer even if I have plenty more free memory

proper thorn
#

yes its 8gb, but ill look into that to see if there are any updates then

plucky swan
#

Anyone trained subjects that have very limited training images (rare character or object)

limber peak
worn river
#

Is anybody making a horror/body horror kind of model, because that would be epic

fervent grail
#

Hey guys!! I am running Dreambooth on a 3060 and can't tell if it's doing 'good enough' or not. It very clearly has to do with settings - but I am getting a high loss this time. Should I nuke it and try something else? My first try I got like .04 or something like that - but I was extremely unsatisfied with the results, so I deleted the model and tried again

fervent grail
#

ahh! thats a great idea too! I saw a cool one with Studio Ghibli art style - was that you?

right now experimenting with my face, and it seems like it's working! doesn't feel 'exact' so I think I'll just have to fine tune it as I go along. My eyes were really bad the first time I tried it, and so were the teeth, but when I ran it again, it got better.

Thanks for the response! I was a little confused about that!

fringe shell
#

how did you get that "loss" data to show up? i'm only seeing this;

```
Total progress: 100%|████████| 30/30 [00:35<00:00,  1.19s/it]
100%|████████| 60/60 [01:13<00:00,  1.23s/it]
Total progress: 100%|████████| 60/60 [01:12<00:00,  1.22s/it]
```
fervent grail
#

3060 - 165 photos - only took 19 minutes. I was surprised.

hot breach
hot breach
wary stratus
#

Hey for those who might be interested, I modified the Dream Booth notebook, mostly so that all available parameters of the training function can be used. I also reorganized a bit because of that, and added the possibility to just put a path to a gdrive folder where your images are from the start. Oh and an auto-disconnect at the end so you can let it run and it will disconnect on its own without having to wait to be kicked out.
https://colab.research.google.com/github/Zinston/colab_notebooks/blob/main/DreamBooth_Stable_Diffusion_(advanced_settings).ipynb

hot breach
#

several styles and a bunch of characters all in one

muted relic
#

what setting am i missing? i just want more sky

gloomy belfry
#

created an issue on the diffusers repo for the 8gb version if anyone wants to chime in their experience

fervent grail
#

Anyone know if Dreambooth with over 2000 samples makes a difference? Tried 2000 after doing 800 and it was night and day.

Also - probably obvious - but restore faces has the unfortunate side effect of removing some features.

muted sierra
#

what's a good number to set the "image creation progress every N" to?

#

in the web UI

still shore
#

Found this in a post previously, anyone tried it? Allows fine tuning but isn’t dreambooth and looks like you can have captions with the images etc

#

I’ve run out of compute units for the month trying to get Textual Inversion working on Hugging Face’s Colab…

still shore
#

Or a hypernetworks finetune colab? I’d love to try an alternative to TI and compare.

limber peak
#

Is joepenna’s DB version still up to date, or are there better models for finetuning now?

hybrid pilot
ashen perch
#

i'm trying to achieve the style of Heroes of Might and Magic 3 with Textual Inversion, I've used these 11 images, maybe the descriptions are not the best

#

I've used 8 tokens and the initialization text was illustration style

#

I've used only this line in my custom prompt template:

```
[filewords], in style of [name]
```

#

After 5500 steps, loss is at 0.28 👀

#

what am I doing wrong?

#

should I rename my sample images?

#

the name of the embedding is homm3-v2-tk8-illustrationstyleinit-customtemp, is it a problem that I use homm3?
if I generate an image with the prompt homm3 it seems like SD has some knowledge of it, can it influence the result?

#

ok, i'll try it again

#

new descriptions

#

a new name for embedding: sksv3tk8styleinitcustomtemp

#

initialization text is style

wintry girder
#

Using auto1111, how do I know I'm generating images using the embed I created with textual inversion?

stray idol
#

I'm pleased to announce that the auto1111 repo now supports generating embeddings as shareable .png images:

#

Which can be dropped into your embeddings folder and loaded in just the same way a .pt can.

#

Uses the custom prompt from "Preview prompt" so you can choose what image is used as a representation of your work.

ashen perch
ashen perch
#

a castle, in style of sksv3tk8styleinitcustomtemp

#

with 5500, 4500, 3500, 2500, 1500 and 500 steps

#

same seed with simply a castle

#

what am i doing wrong? :/

stray idol
#

doesn't look like an unreasonable style from the images.

#

dilute the idea of 'castle' a bit in your prompt?

ashen perch
#

I don't know, this is with London, sksv3tk8styleinitcustomtemp
None of them look like London and I don't know where these characters come from

stone garden
#

pls zoom in on the face

#

does it look overtrained to u?

ashen perch
#

I don't know, you can't really tell from a single image

stone garden
#

how can I tell?

#

I have many images

ashen perch
#

I try different styles eg. comic

#

if it won't be anything like a comic, it's overtrained I think

#

but I might be wrong

stone garden
#

ok ill try

hot breach
stone garden
hot breach
#

usually with overtraining you get "cooked" faces, like sunburnt or too much contrast. You can sometimes get it to look good again by lowering the CFG scale, but it's best to use an earlier copy of your CKPT from earlier steps if you start to see that

stone garden
#

there is just something off about the face

hot breach
#

I know some repos/Colabs don't let you control it, but some let you keep copies of the CKPT every so many steps, or you can start training again on an existing trained CKPT just to add some steps

stone garden
#

like the features are too defined

#

ignoring the common issues with eyes and teeth, that's just SD

#

yea I used my own GPU I have all the checkpoints

silent spear
#

Feels like it's just a tiiiiiny bit overtrained. Though the "render as a comic" trick is a good way to tell.

stone garden
#

I did 100k steps on 800 images

hot breach
#

100k???

#

wow

stone garden
#

I'll do the comic thing

#

yea 100k was the default on automatic's webui

hot breach
#

oh

stone garden
#

so I just left it at that, its textual inversion not dreambooth

#

it took 7 hours

#

on an ampere a5000

hot breach
#

ok, yeah I'm using "dreambooth" and the most I've ever done is like 15k steps with 1400 images which is maybe 8-10 hours on a 3090

#

not quite comparable

stone garden
#

can u share the guide u followed

#

I tried to use dreambooth first but couldn't find proper docs

#

and was getting VRAM issues

#

I have 24GB

hot breach
stone garden
#

thanks

hot breach
#

I'm doing multi-character + style training all in one go

#

you can name your training images like "[whatever name] in a white dress holding her left hand up to her face.png" and such

#

example filename in training set "a food truck in the slums distrct of midgar city with people standing around_2.png"
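Here is a small sketch of turning descriptive filenames like these into caption text. It drops the extension and trailing de-duplication suffixes such as `_2` or `_ (31)`. The suffix patterns are assumptions based on these example names. Whether your trainer strips such suffixes itself depends on the repo, so this is an illustration, not any repo's actual code. A caption that genuinely ends in `_<digits>` would also be trimmed.

```python
import re
from pathlib import Path

# Trailing de-duplication suffixes such as "_2" or "_ (31)" (assumed patterns).
_SUFFIX = re.compile(r"_\s*(\(\d+\)|\d+)$")


def caption_from_filename(filename: str) -> str:
    """Turn a descriptive training-image filename into a caption string."""
    stem = Path(filename).stem          # drop the extension (.png, .webp, ...)
    return _SUFFIX.sub("", stem).strip()


print(caption_from_filename("a food truck in the slums distrct of midgar city with people standing around_2.png"))
```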

#

"a close up of barret wallace in a brown collared vest and a necklace around his neck with a concerned look on his face_ (31).webp"

silent spear
#

My tolerance for accuracy is a bit wonky, but generally speaking ~3000-9000 is pretty good on TI, I find. But with 800 images, that changes the math a bit. The best way to think of it, I find, is to imagine someone locked you in a room with a pile of images and said "tell me what all these images have in common" ... the longer you worked at it, the more delirious you'd get, and you'd start seeing patterns that weren't really there, and eventually you'd become convinced that this "subject" was all about the mole on her right cheek. You want to reduce the number of steps accordingly, to avoid the trainer losing its mind.

(this is also why the comic style test is useful, because if you've overdone it, it won't be able to fathom that person existing in any form except a photograph, because it's convinced the "photograph" aspect is vital to the definition)

stone garden
#

so I must have overtrained a lot then

silent spear
hot breach
#

try "a photo of [whatever name], 50mm f5.6" or something like that

#

without the brackets

silent spear
# stone garden so I must have overtrained a lot then

Sometimes the boundary between overtrained and not-overtrained is really slight. I have two versions of most of my models, so I can swap when the "strict" version doesn't work. For instance, one of my characters absolutely cannot wear shirts with collars using the "strict" model, but drop to the "flex" version and it has no problem at all. The difference between strict and flex is 500 steps. 3500 to 4000.

ashen perch
#

Is it possible to train TI with transparent sprites?

#

like this

#

all 256x256

stone garden
#

I guess yea you'd just have to upscale them to 512x512 for training I think

silent spear
#

I've trained it on ~300 transparent PNGs and it didn't have any issues relating to the transparency, at least. I just couldn't make it generate anything useful from the training. Might've been my source, though. It wasn't the most cohesive set.

ashen perch
#

Is it a nice practice to mirror the images to get more images for training?

#

I mean is it a good idea?

silent spear
#

I haven't had a problem with it myself, but apparently mirroring can cause problems if your subject NEEDS asymmetry, like one side is uniquely different than the other, and needs to stay that way. Again, I've never had the issue myself, but I've read that it can be an issue.
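Before adding flipped copies, you can roughly check how much mirroring would change a training image by comparing it with its horizontal flip. This is a heuristic sketch, and it assumes grayscale images stored as nested lists to keep it self-contained. A score near 0 means the flip adds little new information. A high score means the flip looks noticeably different, which matters if that asymmetry is part of what defines the subject. The score also picks up background asymmetry, so it's no substitute for looking at the subject itself.

```python
def mirror(img: list[list[float]]) -> list[list[float]]:
    """Horizontal flip of a row-major grayscale image: reverse each row."""
    return [row[::-1] for row in img]


def asymmetry(img: list[list[float]]) -> float:
    """Mean absolute difference between an image and its mirror (0 = symmetric)."""
    flipped = mirror(img)
    diffs = [abs(a - b) for r1, r2 in zip(img, flipped) for a, b in zip(r1, r2)]
    return sum(diffs) / len(diffs)


print(asymmetry([[0.1, 0.5, 0.1], [0.2, 0.9, 0.2]]))  # symmetric rows
print(asymmetry([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]))  # bright right edge only
```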

prisma kiln
#

What are the top ESRGAN models everyone is using for drawings, illustrations and photos?

stray idol
silent spear
stray idol
#

Nice, spookily similar style chosen.

silent spear
#

Oh, I used your embedding style for that. Just wanted to see how nicely it would play with my character embedding.

stray idol
#

Ahh, that makes much more sense! 🧠

ashen perch
stray idol
#

Just data as presented

#

If they're doing that, though, you might get weird colour edges where the image transitions to 100% transparency and the colour data drops out.
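If you train on transparent sprites, one way to avoid that edge problem is to flatten the alpha onto a known solid background first. Per the earlier suggestion, 256x256 sprites can then be upscaled to 512x512. The pure-Python sketch below assumes straight (non-premultiplied) alpha and images held as nested lists of pixel tuples; in practice you'd do the same with an image library. Nearest-neighbour scaling is chosen because it keeps pixel-art edges crisp. Nobody in this thread has tested whether it improves TI results.

```python
RGB = tuple[int, int, int]
RGBA = tuple[int, int, int, int]


def flatten_pixel(px: RGBA, bg: RGB = (255, 255, 255)) -> RGB:
    """Composite one straight-alpha RGBA pixel over a solid background colour."""
    r, g, b, a = px
    alpha = a / 255
    return tuple(round(c * alpha + k * (1 - alpha)) for c, k in zip((r, g, b), bg))


def flatten(img: list[list[RGBA]], bg: RGB = (255, 255, 255)) -> list[list[RGB]]:
    """Flatten a whole RGBA image (nested rows of pixels) onto `bg`."""
    return [[flatten_pixel(px, bg) for px in row] for row in img]


def upscale_nearest(img: list[list], factor: int = 2) -> list[list]:
    """Nearest-neighbour upscale, e.g. 256x256 -> 512x512 with factor=2."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


sprite = [[(200, 30, 30, 255), (0, 0, 0, 0)]]  # one opaque red pixel, one transparent
print(upscale_nearest(flatten(sprite)))
```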

fervent grail
gloomy belfry
#

changed the diffusers dreambooth to accept multiple subjects in one training session for anyone interested

alpine rose
#

https://youtu.be/7m__xadX0z0
Hey guys, I followed this tutorial to train the model on custom people, however I'm struggling to use them correctly.
It's as if I was very limited in creativity once I include my person in the prompt.
Is that something to expect given the quality and diversity of the training images?
My training pictures are mainly portraits and not very diverse in terms of composition (chest and above).
Also, I'm wondering if you can overfit the model? Is 4000 steps too much? I guess the number of steps to use depends on the number of training pictures?

gloomy belfry
ashen perch
fervent grail
spice solar
fervent grail
#

So - I ran 2 people in Dreambooth. When I ran the second one, it seems that it overwrote the first. Is there a way to... you know... not?

ashen perch
alpine rose
stray idol
ashen perch
#

oh i see it now

hoary slate
#

What is best for training the model to recognize a certain pose? Textual inversion, dreambooth or training a hypernetwork?

#

I have 50-100 images of the same pose, but it's not the same person in any of them. Would that be something that could be trained?

sullen apex
#

Probably. On that note, does anyone have a reference for a list of all of the classes? Like person, dog, etc.

hybrid pilot
ashen perch
#

Preprocessing with "Use BLIP for caption" used to name the images with the captions; now the captions are placed in a .txt file instead. Does anyone know why?

velvet glen
silk crystal
#

Hello, I am trying to finetune SD with Textual inversion and I am getting poor results
To be short, the training and val loss is oscillating during training, and the images generated in logs/images don't show any improvement even after 3h of training

I am using the InvokeAI colab notebook (https://github.com/invoke-ai/InvokeAI) which uses the hyperparameters of the original paper and I have 5 images in my dataset like recommended in it

#

Here are images in the training set :

#

And here is a sample of the results :

silent spear
velvet glen
silent spear
# silk crystal Hello, I am trying to finetune SD with Textual inversion and I am getting poor r...

I haven't used the InvokeAI training myself, but I find that with styles (especially more fantastical styles, or with fantastical subjects) you need more source images than for normal training. At 5 images, it will be picking up the gist of the style (which it seems to be doing well) but has no idea how to apply it more broadly. Depending on your tolerance for pain, I'd try adding images in batches of 5 to see what happens. There'll be a sweet spot in there somewhere.

silk crystal
#

Interesting, thanks

#

Also if you know an alternative way to finetune SD with textual inversion I would be glad to check it

silent spear
#

I've switched over to using Automatic1111's version lately. The settings are a bit easier to manage, and it gives fancy PNG-based embeddings, which are 900 kinds of awesome.

silk crystal
#

Thanks a lot, I am gonna take a look 😄

#

It looks very awesome 😄

silent spear
#

I used to run a more customized version based on some colabs (a few weeks ago) but Auto just moves so fast that anything else felt like I was missing out on the future. Though as a warning: you may wake up some days and discover everything has turned upside-down and nothing works anymore. Just wait a few minutes and there'll probably be a new version you can git pull.

nimble harness
silk crystal
silent spear
#

I think for style training you want to be around the 10ish range (someone can correct me if I'm wrong). Then again, your specific style is both visual and conceptual, so maybe a bit higher would work too. All I can say for sure is that I once trained a style on 50 images at 30 vectors/token and all it would produce was very strange garbage that looked similar to my source, but in a truly demented way. Heh. Not sure that helps 🙂

crimson wasp
#

If anybody wants to experiment with a version of textual inversion that limits embeddings to the range of weights seen in the original embeddings, I wrote some changes to Automatic's code with his help. You can replace modules\textual_inversion\textual_inversion.py with this, and you can play with the strength of the effect on line 265. The original author of the textual inversion paper said that, in theory, it should help retain the editability of an embedding and make it play nicer with other prompts.

#

The changes are just the function determine_embedding_distribution, and where it's called to get the floor/ceiling, and then where they are used
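The modified file itself isn't reproduced here, so this is only a minimal sketch of the idea in plain Python, with lists standing in for tensors. It assumes the floor/ceiling are per-dimension minimum and maximum values taken over the model's existing token embeddings. `embedding_bounds`, `constrain` and the `power` blend are hypothetical names, not the actual code of `determine_embedding_distribution`.

```python
def embedding_bounds(vocab_vectors):
    """Per-dimension (floor, ceiling) over the model's existing token embeddings."""
    dims = len(vocab_vectors[0])
    floor = [min(v[d] for v in vocab_vectors) for d in range(dims)]
    ceiling = [max(v[d] for v in vocab_vectors) for d in range(dims)]
    return floor, ceiling


def constrain(vector, floor, ceiling, power=1.0):
    """Pull each component of a trained embedding back toward [floor, ceiling].

    power=1.0 clamps fully, power=0.0 leaves the vector untouched.
    """
    result = []
    for x, lo, hi in zip(vector, floor, ceiling):
        clamped = min(max(x, lo), hi)
        result.append(x + power * (clamped - x))
    return result


floor, ceiling = embedding_bounds([[0.0, -1.0], [1.0, 1.0], [0.5, 0.0]])
print(constrain([2.0, 0.5], floor, ceiling, power=0.5))
```

Blending with `power` instead of hard clamping is one way to get an adjustable "strength" knob; whether the actual change does it this way isn't stated in the thread.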

hidden hatch
#

What went wrong with this embedding? Too many steps?

silent spear
#

Feels like it, yeah. How many steps vs source images?

hidden hatch
#

13 source images of my dog and 6k steps

#

At 1k steps, the animal didn't quite look like my dog; it looked very generic.

tribal rapids
#

text_filename = os.path.splitext(path)[0] + ".txt"

silent spear
# hidden hatch 13 source images of my dog and 6k steps

Hmm, yeah, the sweet spot is probably in the 3-4k range, I suspect. I'm finding there's a spot pretty early on (1.5-2k) where things start to look decent, and by 3k they look solidly recognizable... and then it goes downhill fast once you pass 5k. But then no two trainings seem to be exactly alike, so it's hard to pin down an absolute truth to all this. At least not yet. I'm still working out a process 🙂

tribal rapids
#

@ashen perch if it can find the text file, it uses the words in there; otherwise it takes them from the filename (the old way)
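A minimal sketch of that lookup order, built around the `text_filename` line quoted above and assuming captions live in a `.txt` file with the same base name as the image. The filename fallback here (dropping a leading number such as `00012-` and turning underscores into spaces) is an illustrative guess, not the webui's exact parsing; `caption_for` is a hypothetical name.

```python
import os
import re


def caption_for(path):
    """Caption for the image at `path`: sidecar .txt if present, else words from the file name."""
    text_filename = os.path.splitext(path)[0] + ".txt"
    if os.path.exists(text_filename):
        with open(text_filename, "r", encoding="utf8") as f:
            return f.read().strip()
    # Fallback (illustrative): drop directory and extension, strip a leading
    # number like "00012-", and turn underscores into spaces.
    stem = os.path.splitext(os.path.basename(path))[0]
    stem = re.sub(r"^\d+[-_ ]*", "", stem)
    return " ".join(stem.replace("_", " ").split())
```

So once a `.txt` exists next to an image, renaming the image no longer changes its caption.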

ashen perch
#

I already renamed them manually 😄

#

I might be doing something wrong 😦

#

I took some screenshots from Heroes 3 HD, cropped parts of them into separate images, rescaled them to 512x512 and named them

#

there are 40 images total, +40 mirrored

#

I think the descriptions might be the problem

#

at 1600 steps, it produced results like this (I saved an image every 100 steps)

#

at 2500

#

at 5k

#

and it only got worse; I ran it for 23,500 steps 😄

midnight knot
#

BLIP is kinda stupid... for like 90% of the pictures of me that I ran through it, it says I'm holding a pizza or a cellphone or a remote and wearing a backpack.

#

is there a better option than BLIP?

silk crystal
bleak swallow
#

try the new deepdanbooru

ashen perch
#

I used 30 tokens and * as initialization text

silk crystal
#

I am as ignorant as you unfortunately
We can just wait for people to live their lives before they answer 😛

ashen perch
#

I’ve been trying to make it work for a week and I’ve gotten about zero help here 😅 I don’t know if the problem is the file names, the images or the initialization text

silk crystal
#

In my tiny experience, initialization text can make a huge difference
Your file names look fine

ashen perch
#

And what should I enter if I wanna train styles?

silk crystal
#

For me entering the actual style I wanted to learn gave better results

"Sci-fi" or "space-opera"

#

I may be very wrong though

#

(and I may be confusing initialization text, embedding name and initializer words 😅)

final matrix
#

So I am trying to fine-tune a model using this Pokémon diffusion tutorial https://lambdalabs.com/blog/how-to-fine-tune-stable-diffusion-how-we-made-the-text-to-pokemon-model-at-lambda/ and this repo https://github.com/justinpinkney/stable-diffusion and I keep running into this error:

```
Found nested key 'state_dict' in checkpoint, loading this instead
/venv/lib/python3.8/site-packages/pytorch_lightning/loggers/test_tube.py:105: LightningDeprecationWarning: The TestTubeLogger is deprecated since v1.5 and will be removed in v1.7. We recommend switching to the pytorch_lightning.loggers.TensorBoardLogger as an alternative.
  rank_zero_deprecation(
Monitoring val/loss as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'logs/2022-10-13T08-13-33_pokemon/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': None, 'save_top_k': -1, 'every_n_train_steps': 2000}}
ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
/venv/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py:20: LightningDeprecationWarning: The pl.plugins.training_type.ddp.DDPPlugin is deprecated in v1.6 and will be removed in v1.8. Use pl.strategies.ddp.DDPStrategy instead.
  rank_zero_deprecation(
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:297: LightningDeprecationWarning: Passing Trainer(accelerator='ddp') has been deprecated in v1.5 and will be removed in v1.7. Use Trainer(strategy='ddp') instead.
  rank_zero_deprecation(
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:317: LightningDeprecationWarning: Passing <pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7fd1e0c3f790> strategy to the plugins flag in Trainer has been deprecated in v1.5 and will be removed in v1.7. Use Trainer(strategy=<pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7fd1e0c3f790>) instead.
  rank_zero_deprecation(
/venv/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py:92: PossibleUserWarning: max_epochs was not set. Setting it to 1000 epochs. To train without an epoch limit, set max_epochs=-1.
  rank_zero_warn(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

Traceback (most recent call last):
  File "main.py", line 846, in <module>
    data.prepare_data()
  File "/workspace/stable-diffusion/main.py", line 211, in prepare_data
    instantiate_from_config(data_cfg)
  File "/workspace/stable-diffusion/ldm/util.py", line 79, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/workspace/stable-diffusion/ldm/util.py", line 87, in get_obj_from_str
    return getattr(importlib.import_module(module, package=None), cls)
  File "/venv/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'ldm.data.local'
```

#

any ideas on how to fix that?

final matrix
midnight knot
final matrix
#

you can also change the tags (like 1girl to girl, or brown_hair to brown hair) in Notepad++ with Ctrl+Shift+F (Find in Files) and selecting the folder with your captions
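If you'd rather script it than use Notepad++, here is a small sketch. It assumes each caption file holds comma-separated tags (as deepdanbooru-style taggers produce), and `REPLACEMENTS` only contains the two examples from the message above; extend it as needed. Working tag by tag avoids accidentally rewriting `1girl` inside a longer tag such as `1girls`.

```python
from pathlib import Path

# Example tag replacements (booru-style tag -> plain-English phrase); extend as needed.
REPLACEMENTS = {
    "1girl": "girl",
    "brown_hair": "brown hair",
}


def rewrite_captions(folder, replacements=REPLACEMENTS):
    """Rewrite comma-separated tags in every .txt caption in `folder`.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in sorted(Path(folder).glob("*.txt")):
        tags = [t.strip() for t in path.read_text(encoding="utf8").split(",")]
        new_tags = [replacements.get(t, t) for t in tags]
        if new_tags != tags:
            path.write_text(", ".join(new_tags), encoding="utf8")
            changed += 1
    return changed
```

Back up the caption folder first: the files are rewritten in place.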

midnight knot
#

yeah

#

I gotta figure out if I'm doing things correctly. I tried training a hypernetwork but it didn't work at all

final matrix
midnight knot
#

try newer python

final matrix
#

but I also don't think that's going to fix the error, because it looks like a file is missing?

midnight knot
#

for ldm

#

3.10.6 is the one i work with

silk crystal
#

The issue might be the stable diffusion version

#

I didn't look in the details though

midnight knot
#

you might be better off looking for a more popular repo.

final matrix
silent spear
# silk crystal In your opinion is it better to have as much images as possible from the same ar...

It kinda depends, but I can put it this way: I loaded up a ton of images from an artist who does painting and pencil art (nicely shaded, but noticeably pencil). The results were insanely messy, because it seemed to be trying to reconcile the two visual effects at the same time (weirdly, eyeballs all came out very pencil-drawn, while the rest of the face was painted). So if it's mostly the color palettes in play, you might be able to mix and match, but in general I will load an artist's different styles as unique embeddings, just to keep them distinct.

(and, as with everything SD, don't show it anything you don't want it to learn from, because it will invariably obsess over the ONE image you didn't really want 🙂)

stray idol
silent spear
# ashen perch Please help me what I’m doing wrong

This one is gonna be tricky 🙂

So, as a kinda foundational concept, assume the AI is incredibly stupid and needs a lot of hand-holding. Give it a picture of a mausoleum and it will say "WTF is that?" Tell it it's a mausoleum and it will learn that all mausoleums look like that one image. Ask it to draw a mausoleum and it will spit out exactly what you showed it. That's why we use multiple images for each subject, to help it learn enough about a mausoleum that it can connect certain dots and make up its own stuff.

For TI, I like to give it some help by feeding some initialization text like "building" so it can blend its general knowledge of buildings with the image of the mausoleum, which lets it fill in gaps more easily.

BUT: you are actually feeding it multiple things at once. You've got a style + subjects. So you're basically saying "here's a picture of a castle in the middle of a forest" and the AI is learning and saying "OK, I am ready for another castle in the middle of a—" and then you give it a picture of a medieval building with a flag on the roof. And the AI is saying "WTF I don't see what any of these have in common..." and starts grasping at straws to find commonalities.

Now again, if the initialization text says "building" then it might have a bit more of a foundation (ha!) to build on, but in my experience it's gonna struggle either way. The longer you train it, the more desperate it will be to find ANYTHING that connects the source images, and you'll start to get really freaky images. I have become overly fond of the concept of locking the AI in a room with a set of photos and telling it the only way it can get out is if it figures out what they all have in common... the longer it's in there, the more delirious it's gonna be 🙂

(tbc)

untold peak
#

It might as well have used a single token.

silent spear
#

When I train styles, I like to give it very plain subjects that I know it will understand. Standard-issue humans, trees, bridges etc. Things I know for sure it will understand without much hassle. That way, it will be looking at the style, not the subject. Anything "unique" or "different" will send it down the wrong path.

The tricky part with your set (and this is out of your control, I know) is that you've got cool architecture and angles and overall CONCEPTUAL style, as well as the visual style. So the AI is going to be struggling to understand what it is you want it to learn. Especially in things like the mountains (which may not match any mountains the AI recognizes off the top of its head) or the graveyard (which is busy and probably hard for it to pick out individual features from). Imagine it's locked in a room and looking at that graveyard pic and you're saying "what do you notice about this photo of a graveyard?!" and it's scanning its memory trying to reconcile what it knows about graveyards with what you've given it. It's not going to focus on the art style, it's going to obsess on the wrong things.

All of which is to say: I would trim back your set to only include images that are fairly clear and distinct, where there is clearly a building that it might recognize as a building, with very little excess noise around it (so avoid shots where the BG has colors that match the foreground, or it may not recognize the object) and see how that works.

Then, once you get your style locked, you can use that style to generate new sub-classes of, say, architecture. So you can say "draw me a mausoleum in the style of style-123" and it will hopefully give you variations on a mausoleum that match the style you've built.

final matrix
#

any of you know ways to convert a ton of images from webp to jpg or png very fast?

final matrix
bleak swallow
#

Irfanview can do it but it needs a plugin to open webp

final matrix
#

ill try it thx

untold peak
#

oh sure, I just wanted to make sure you weren't wasting your time and resources doing something suboptimal.

#

also FFMPEG can convert webps

#

So you can write a batch script.
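A batch version of the single-file `ffmpeg -i image.webp image.png` command, sketched in Python rather than a .bat file: it builds one such call per `.webp` in a folder and skips files that already have a `.png` next to them. It assumes `ffmpeg` is on your PATH; the function names are made up for this example.

```python
import subprocess
from pathlib import Path


def webp_to_png_commands(folder):
    """Build one ffmpeg command per .webp file that doesn't have a .png yet."""
    commands = []
    for src in sorted(Path(folder).glob("*.webp")):
        dst = src.with_suffix(".png")
        if dst.exists():
            continue  # already converted on a previous run
        commands.append(["ffmpeg", "-i", str(src), str(dst)])
    return commands


def convert_folder(folder):
    """Run the conversions one after another; stops on the first ffmpeg error."""
    for cmd in webp_to_png_commands(folder):
        subprocess.run(cmd, check=True)
```

Usage would be something like `convert_folder("my_dataset")`, where the folder name is a placeholder.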

#

"ffmpeg -i image.webp image.png"

final matrix
#

on another server just now:
"i use irfan view"
"go learn ffmpeg cmd line tools"

ashen perch
silent spear
#

Yeah, I would stick to single buildings where possible, and add maaaaaybe "painting style" as the initialization text. Just so it knows what ballpark it's playing in.

final matrix
#

irfanview was easy, just had to install 2 exes

silent spear
#

I know this is assigning a personality where none exists, but still: I was testing a 3k step TI embedding just now: "photo of a woman played by b153, short hair" and it generates an image where the woman is too far away to accurately gauge the face. I keep trying, keep getting distant shots, almost like it's so uncertain about the face that it's AVOIDING drawing it.

"photo of a woman played by b153, head and shoulders, close up, short hair" --- and it generates image after image of the woman turned away from the camera so I can't see the face.

😐

"photo of a woman played by b153, head and shoulders, close up, short hair, (facing camera:1.3)"

...

Shot of the woman with long hair swept across her face.

viral jay
#

Hello guys, after some experimentation I think a hypernetwork can't be used alone, but it's a good complement to textual inversion. Here's a test I did: 3k steps of textual inversion at a 0.002 learning rate, then 3k steps of hypernetwork training at a 0.000002 learning rate. The last image is the control image.

silk crystal
#

Not Bad gg

fair perch
viral jay
#

the downside is that since it's more effective on style, most of the images produced with the hypernetwork give her a doll-like face

#

but it's producing some very nice results this way

alpine rose
stray idol
#

yeah, click as many times as you like for repeated crops.

alpine rose
#

so good 🤩

stray idol
#

Simplest thing that could possibly work 🧠

alpine rose
#

do you know of a tool to extract images from multiple videos?

#

let's say one image every second

stray idol
#

I have an ffmpeg script to do just that
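That script isn't shown, so here is one way it could look (a sketch, not that script): ffmpeg's `fps` video filter keeps one frame per second with `fps=1`, and a `%05d` pattern in the output name numbers the frames. The extension list, output naming and function names are assumptions; `ffmpeg` must be on your PATH.

```python
import subprocess
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mkv", ".webm", ".mov"}  # assumed set of formats to pick up


def frame_commands(video_dir, out_dir, fps=1):
    """One ffmpeg command per video, writing <video>_00001.png, <video>_00002.png, ..."""
    commands = []
    for video in sorted(Path(video_dir).iterdir()):
        if video.suffix.lower() not in VIDEO_EXTS:
            continue
        pattern = Path(out_dir) / f"{video.stem}_%05d.png"
        commands.append(["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", str(pattern)])
    return commands


def extract_frames(video_dir, out_dir, fps=1):
    """Create the output folder and run one ffmpeg call per video."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in frame_commands(video_dir, out_dir, fps):
        subprocess.run(cmd, check=True)
```

Prefixing each frame with its video's name keeps frames from different videos from overwriting each other in the same output folder.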

alpine rose
#

HAHA no way

#

i'll look into it

silk crystal
# silent spear This one is gonna be tricky 🙂 So, as a kinda foundational concept, assume the...

So you're basically saying "here's a picture of a castle in the middle of a forest" and the AI is learning and saying "OK, I am ready for another castle in the middle of a—" and then you give it a picture of a medieval building with a flag on the roof.
Very interesting
So if I have multiple characters in the same art style doing different things (shooting, looking through a microscope...), I shouldn't give those details in the prompts (like BLIP does)?
I should just put "a cartoon character in the style of X"?

silent spear
# silk crystal > So you're basically saying "here's a picture of a castle in the middle of a fo...

That's where you run into the very difficult question of "what will the AI get hung up on?" because if it's reasonably clear what's happening, the extra detail in the prompts seems to help it focus on the art, but for instance "looking through the microscope" has a decent chance of causing you trouble, because it may not know what a microscope is (with any certainty, anyway) and skew further away from the goal while trying to figure it out.

So yeah, I typically stay with "a cartoon character in the style of X" and hope for the best. But then I also don't usually have a fantastically diverse set of source images to begin with, so that probably hurts/helps too 🙂

silk crystal
#

Thanks a lot for those very useful insights 🙏

silent spear
stray idol
alpine rose
#

@stray idol atm

#

i was getting somewhere 🥺

stray idol
#

It was just laying around my old project junk.

silk crystal
#

It would be nice if there was a way to identify the images where training loss is clearly rising 🤔

stray idol
stray idol
fervent grail
#

Why would you try tagging everyone? You monster

silk crystal
stray idol
#

textual_inversion.py

#

set up a dict `entryLoss = {}` before `for i, entry in pbar:`, then push entries inside the loop after the loss is calculated: `entryLoss[entry.filename] = loss.item()`
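Building on that suggestion, here is a sketch of what to do with the numbers afterwards. Since each image is seen many times during training, this version keeps a running mean per file instead of only the last value (that's a variation, not part of the suggestion above). `LossTracker` is a hypothetical helper; in the real loop you'd call `add(entry.filename, loss.item())`.

```python
from collections import defaultdict


class LossTracker:
    """Collects per-image losses so images the embedding struggles with stand out."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def add(self, filename, loss):
        # In textual_inversion.py this would be add(entry.filename, loss.item())
        self.totals[filename] += loss
        self.counts[filename] += 1

    def worst(self, n=5):
        """(filename, mean loss) pairs with the highest mean loss first."""
        means = {name: self.totals[name] / self.counts[name] for name in self.totals}
        return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

A high mean loss only flags an image for a second look; it doesn't prove the image is hurting the embedding.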

#

yeah eventually maybe

ashen perch
#

these are my sample images

silk crystal
#

from my understanding of entropie's messages, your dataset looks very heterogeneous
I would keep only the castles

ashen perch
#

you mean these 6?

#

or the first 2 and the 4th?

silk crystal
#

everything on this picture except the last one

obtuse sleet
silk crystal
#

On my side, here are my current results

#

The training set
All images were captioned with "a cartoon man/woman"

#

1000th training step:

#

8000th training step:

#

21000th:

#

An attempt to generate something with the fine-tuned model, using the prompt "a woman in the style of mush characters":

#

(a training set sample to compare more closely)

ashen perch
#

what was your initialization text? did you use textual inversion?

silk crystal
#

initialization text was "sci-fi character" with 10 tokens and i am using textual inversion

#

I feel like I am close to getting very good results, but something is missing
Using more precise prompts gives very poor results, so, as the textual inversion paper suggests, I think there are too many images in the training set
poke @silent spear 👉 👈

next nimbus
#

hey guys, can we run dreambooth on windows without installing Linux?

fervent grail
next nimbus
#

And 2nd question: what if I run Colab on my PC with Jupyter?

fervent grail
#

I can't answer that one, I don't use colab

next nimbus
silk crystal
vale egret
#

In automatic 1111 how do styles work?

fervent grail
vale egret
#

There’s a create style button. What does it do, and how do you use it?

next nimbus
#

Do any of you guys know of a Colab that uses DeepSpeed so it will work on 8GB VRAM?

#

The one I found asks for 14GB :/

silk crystal
#

you can pay to use GPUs with more VRAM 🤷

next nimbus
vale egret
unborn basin
#

When training embeddings, does the model you train on matter? For example, if I use waifu diffusion to train an embedding, will that same embedding be as accurate as one trained on the regular model?

next nimbus
unborn basin
gloomy belfry
raven rain
#

Does anyone know if it's possible to resume training a Dreambooth model from where you left off? Or do I have to start from the beginning on a fresh SD 1.4 model?

#

I can't find this answer anywhere, sorry if it's been asked

crimson wasp
hot breach
#

I've resumed on 2gb models several times, no issues with the python cli from xavier and forks

hidden hatch
#

Is there a way to delete an embedding? Do I just have to revert to the basic SD model?

stray idol
#

just delete/move the embedding file, or don't use its special term.

hidden hatch
#

I found the embedding file but wasn't sure if deleting it would break anything. I guess if it does it's easy enough to start over.

stray idol
#

no that's safe, don't worry.

hot breach
#

onward to adding back laion data now

dreamy zenith
#

Should the total step count in TI be proportional to the amount or detail of input images?

ashen perch
#

actually it worked

#

what you suggested

ashen perch
#

not perfect; after 10k steps it still hasn't quite caught what I wanted

#

but I chose "render style" as the initialization text, the prompt was "a render of [filewords], [name] style", and the preview prompt was "Big Ben, homm3v11tk30renderstyle style"
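For reference, a sketch of how such a template presumably turns into a training prompt, assuming `[name]` is replaced with the embedding name and `[filewords]` with the image's caption words. This is an illustration, not the webui's code, and the caption is a made-up example.

```python
def expand_template(template, name, filewords):
    """Fill the [name] and [filewords] placeholders of a training prompt template."""
    return template.replace("[name]", name).replace("[filewords]", filewords)


prompt = expand_template(
    "a render of [filewords], [name] style",
    "homm3v11tk30renderstyle",
    "a castle on a hill",  # hypothetical caption taken from a file name or .txt
)
print(prompt)  # a render of a castle on a hill, homm3v11tk30renderstyle style
```

So the caption words sit right next to the embedding name in every training prompt, which is one reason the file names/captions matter so much.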

#

and it gave me Big Ben!!!

#

first time it actually gave me what I entered in the prompt

#

😄

#

or not exactly 🤔

#

preview images are good

#

but if I try anything with homm3v11tk30renderstyle style, it gives me a random old guy

#

e.g Budapest, homm3v11tk30renderstyle style

silk crystal
#

Try castle, building, mausoleum etc.

wintry girder
#

What's the difference between a hypernetwork and an embed, in a practical sense?

#

After searching a lot, I see that the question is asked a lot but never answered

silk crystal
#

I want to fine-tune SD on 50x50 images
Do I need to upscale them to 512x512? If yes, what is the best way to do it?

stone garden
#

Can someone explain hypernetworks and/or textual inversion and/or link to a website or article that explains it

crimson sandal
unborn basin
# stone garden Can someone explain hypernetworks and/or textual inversion and/or link to a webs...

I'm an amateur at this, but my understanding of textual inversion is this: the language model translates your prompt into vectors (one per token) that are used to guide the UNet from the random noise towards an image. TI learns the vector for a new token, for a specific concept the language model might not have a word for. Basically, TI is telling the network "when I say 'asadayo', I'm talking about images that look like this"
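A toy illustration of that idea, heavily simplified: a frozen linear map `W` stands in for the whole model (text encoder + UNet), `target` stands in for "images that look like this", and only the embedding vector `v` is updated by gradient descent. Starting `v` from `init` mirrors starting from the initialization text's embedding. Real TI optimizes the diffusion denoising loss over one or more vectors per token; none of these names come from an actual implementation.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def train_embedding(W, target, init, lr=0.1, steps=200):
    """Gradient descent on the embedding v only; the 'model' W stays frozen."""
    v = list(init)  # start from the initialization text's embedding
    for _ in range(steps):
        residual = [y - t for y, t in zip(matvec(W, v), target)]
        # Gradient of sum(residual**2) with respect to v is 2 * W^T @ residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(len(v))]
        v = [x - lr * g for x, g in zip(v, grad)]
    return v


W = [[1.0, 0.0], [0.0, 2.0]]
print(train_embedding(W, target=[3.0, 4.0], init=[0.0, 0.0]))  # close to [3.0, 2.0]
```

The point of the toy: the model never changes, only the new "word" does, which is why an embedding is a tiny file.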

#

I do not understand hypernetworks and haven't experimented with them yet.

#

I would also love if someone who actually understands this stuff could chime in. I have a feeling I'm missing something about the mechanics of TI

humble bramble
#

So, if I have understood correctly, TI/embeddings are instructions that tell the model "what" a word is, and hypernetworks are instructions that tell the model how to recreate a particular style? What about VAE?

hot breach
#

if 2 particular characters have a group photo, the likelihood you can get a good image of them together at inference time is a lot higher

#

I can do a pretty good job getting Cloud/Barret/Tifa in one image, because there are a couple of examples of all three of them in one image in the training set, plus a ton of training images of each of them individually (140+ each now). Mixing Aerith/Jessie together is much harder: there are no images of those two together (not even sure they meet in the game?), and Jessie has a slightly smaller training set herself (~90), which may affect that

#

worth noting jessie still looks spot on by herself with just the 90 solo images and a few with her and cloud/biggs/wedge

#

wedge/biggs have much smaller sets and don't look very good, I have more captures ready but trying to avoid just training again until I get laion data introduced back to replace regularization

feral lava
#

is there any resource or chart for explaining the differences in the "Samplers"

green flax
#

do any of you know of a tool that will grab every frame from a video that contains a specified character

#

it should be possible to use ai to do this somewhat reliably and it would be an excellent source of images for textual inversion

hot breach
#

only aware of speech models for "diarization", you'd need to have a model you could first train to recognize individual characters, certainly possible just don't know if something like that exists

sacred grail
hot breach
stone garden
hot breach
#

one interesting thing to test is the main characters vs. side characters, it will give you an idea of how the weight of training data impacts quality, i.e. "biggs ff7r" has much less data than "cloud strife" and looks worse for it, and "rufus shinra" has a tiny training sample

#

"red xiii" is also still pretty awful, barely 10 images out of 1400

hollow surge
#

how did Waifu Diffusion train a waifu model? Is that DreamBooth? Textual inversion? Or something else?

hot breach
#

I don't think they used dreambooth

#

they trained only on anime content afaik, and did not attempt to "protect" the existing model, so it can no longer produce "a photo of tom cruise" etc

#

I actually haven't tested, I don't have WD ckpt

hollow surge
#

it actually can, it's just more anime styled

hot breach
#

yeah

hollow surge
#

but yes it totally messed with every generation

hot breach
#

they stomped on the model so to speak, reformed it to be very specialized

#

I believe next step for fine tuning is introducing laion data back and dropping regularization

#

I'm working on that as we speak

#

once regularization is dropped, it's not really "dreambooth" anymore

hollow surge
#

no clue what that means but sounds good

#

regularization is the weird fact you have to generate like 300 class images right

hot breach
#

yes

#

it's used to keep the model from overtraining on the training images, to "protect" the model to not forget how to draw the stuff it knew before you try to train new things
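Roughly, in the DreamBooth paper's formulation (written here in the noise-prediction form that Stable Diffusion trainers use, so treat it as a sketch rather than any specific repo's code), each training step adds a second denoising loss on the class images:

$$
\mathcal{L} = \mathbb{E}\Big[\lVert \hat{\epsilon}_\theta(z_t, c) - \epsilon \rVert_2^2\Big] + \lambda\,\mathbb{E}\Big[\lVert \hat{\epsilon}_\theta(z'_{t'}, c_{\text{pr}}) - \epsilon' \rVert_2^2\Big]
$$

Here $z_t$ is a noised latent of one of your training images with prompt $c$ (e.g. "sks person"), $z'_{t'}$ is a noised latent of one of the generated class images with the plain class prompt $c_{\text{pr}}$ (e.g. "person"), $\epsilon$ and $\epsilon'$ are the added noise, and $\lambda$ sets how strongly the model is held to what it already knew. Dropping the second term is what makes it plain fine-tuning again.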

#

so the FF7R model can still do "a photo of tom cruise" and he doesn't look like a Final Fantasy video game character for instance

hollow surge
#

that makes sense

hot breach
#

he will if you prompt "tom cruise standing on the rooftops of midgar city slums district" however, as the style transfer happens when you start prompting "midgar city slums district" and soforth

#

the gist or the huggingface links have links to imgur with examples of that

#

yeah its on the gist if you scroll down, links there to imgur

hollow surge
#

❗❗ THE NUMBER ONE MISTAKE PEOPLE MAKE ❗❗
Prompting with just your token. ie "joepenna" instead of "joepenna person"

#

is this true?

#

i haven't been using the class name and i thought my results were excellent. i'll have to try adding the class

hot breach
#

I've loaded some models from other people and I've found person may not be required

#

it may learn enough on just "joepenna" to work, but keep in mind there are a variety of techniques being used

#

several of us have not been using "sks" this or that, or "person" for class_word on the older repos at all

#

I trained "john carmack" just like that, without "person" anywhere at all and it worked perfectly fine

#

richservo has a giant list of models trained this way as well, no "person" at the end, just the name

#

there's also no particular reason to train without spaces in names and such that I can tell, though really long names do use up tokens later when you want to prompt I guess

hollow surge
#

btw, i just noticed automatic1111 ui seems to have no tokens limit!? that's crazy, didn't expect that to happen so soon / ever

fervent grail
atomic lagoon
#

I am installing webui and my PC is lagging so much, I can't move anything, is it normal?

fervent grail
atomic lagoon
inner turtle
# atomic lagoon Ok, how much time did it take?

It depends on your disk/bandwidth speed the first time, since it’s downloading SD and creating a new environment. For me it took 4-6 minutes, but after the first time it takes 30-60 seconds

atomic lagoon
#

how do I reopen it after the installation?

north nest
#

Does anyone know what the VRAM requirement is for textual inversion training in the webui?

bitter matrix
#

OK, so I turned on live preview for sampling steps. How does this help me fix an issue I see during the render? For example, an incomplete weapon missing a hilt... I'm not sure how this benefits me.

maiden coyote
#

You can't force SD to fix something like missing body parts, incomplete weapons, etc., but you can at least see that it's not coming out how you want it and interrupt early to tweak settings and try again

unborn fulcrum
#

How good is hypernetwork? Anywhere I can see results?

viral jay
#

without hypernetwork

#

with hypernetwork

#

control

rose flame
rose flame
#

My most important discovery was I had to turn hypernetwork strength to 0.5 for it to work!

#

I was getting rubbish results nothing like my reference material until I did that
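One plausible reason, sketched under an assumption: if the hypernetwork's output is added as a residual scaled by the strength setting, i.e. `out = x + strength * layer(x)`, then strength 0 disables it and 0.5 halves its push, which can tame an over-trained network. The residual form is an assumption about how the webui applies it, not something confirmed in this thread, and `toy_layer` is a made-up stand-in for a trained layer.

```python
def apply_hypernetwork(context, layer, strength=1.0):
    """Residual blend (assumed form): out = x + strength * layer(x)."""
    delta = layer(context)
    return [x + strength * d for x, d in zip(context, delta)]


def toy_layer(vec):
    # Made-up stand-in for a trained layer: proposes a change equal to the input.
    return [x * 1.0 for x in vec]


for s in (0.0, 0.5, 1.0):
    print(s, apply_hypernetwork([1.0, 2.0], toy_layer, s))
```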

#

Are people combining ckpt, hypernetwork and textual inversion with good results, or is that considered overkill, or is one of them considered redundant at this point?

harsh viper
#

Good news for those who love to mess around with the hypernetworks or embeddings folder: you don’t need to restart the whole program for the dropdown to list your new .pt files. With my change (just merged to master), simply click the refresh button and you will see the new .pt files in the dropdown.

silk crystal
#

Hey, is there a playbook for style transfer learning? 🤔

sturdy willow
#

hypernetwork is great for styles, it does take some time to get it right

#

I trained a hypernetwork using some Mob psycho anime screenshots. Here are some comparisons

#

26k steps

#

started with e5, then e6, then back to e5 for the last 10k steps

#

same params, just changed the HN for each one

silk crystal
#

Did you use different characters in your training set ?

half folio
#

Has anyone tried finetuning the full SD model (not TI or DB) using the script in the diffusers repo?

#

It looks very straightforward to me, but I don't know if it works on Colab or if the results are any good, since it says the script is still experimental.

viral jay
#

what I've noticed is that hypernetworks let you use bigger image sizes than TI

#

I can train up to 2048x2048 on my 12gb card

#

So after a bit of testing, I figured out that a hypernetwork is actually able to learn faces too, but it works differently from TI

#

That's me (photo)

#

that's a test with hypernetwork, 2048x2048, 42 images, 0.000025 learning rate, 1500 steps, images have been labelled with BLIP

#

from my tests during training, I can say that a lower learning rate wasn't giving me the desired output: I trained for 15-20k steps and still got nothing. So I kept increasing the learning rate, and at 0.00005 it went crazy (noise texture) above 1200 steps, so I reduced the learning rate by half and it seems to work so far

sturdy willow
#

@silk crystal just in case you are interested, here are the images and metadata that I used for my HN. Keep in mind that I used the [redacted] model

#

another comparison using the trained HN at different strengths

viral jay
#

did some comparisons too. That's what I meant about it going crazy: with 0.000025, above 2500 steps it starts to get really bad, but the sweet spot is close to that, at 2000 steps, so it's kind of a delicate balance I would say

#

will do more testing with different learning rates, to see how it behaves at a lower rate

stone garden
#

I have 800 pics of a person and I want to do textual inversion. Any idea what the best settings are to have it learn a face/person without overtraining? Last time I used all the default settings in automatic1111's UI (100k steps) and it learned the photo style, so I couldn't then use it in a different style like oil painting or anime drawing.

sturdy willow
#

had to tweak some stuff and start over and over

#

but it's a good start to get an idea

stone garden
#

Also, is it better to have it learn on the full pics, or should I extract the faces only and train it on those?

sturdy willow
sturdy willow
stone garden
viral jay
#

I never tried 100k steps; I think it may be overkill. In my case, for TI I use between 10-20k steps, and sometimes it gets good even at lower step counts like 5-7k

#

something else that plays a big role is the token count: a higher token count takes away control, so I would keep it between 1-10 tokens only; higher makes it difficult to apply styles

stone garden
sturdy willow
#

I thought 10k was the default

#

aah you are talking about TI

hot breach
#

you can experiment running at a certain LR rate for X steps, then continue with more steps at a lower LR, assuming whatever code you're using allows you to continue training on your own bin/pt/ckpt
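The same idea written as a single schedule, as a sketch: `lr_at` is a hypothetical helper, and the rates and step counts below are only examples borrowed from the hypernetwork numbers posted earlier in this channel, not recommendations. Running stage by stage and resuming from the saved file, as described above, amounts to the same thing.

```python
def lr_at(step, schedule):
    """Piecewise-constant learning rate.

    `schedule` is a list of (lr, until_step) pairs in order; use None as the
    last until_step to mean "for the rest of training".
    """
    for lr, until in schedule:
        if until is None or step < until:
            return lr
    return schedule[-1][0]  # past the last boundary: keep the final rate


schedule = [(5e-5, 1200), (2.5e-5, 2500), (1e-5, None)]
print([lr_at(s, schedule) for s in (0, 1200, 5000)])
```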

stone garden
#

I think I'm just gonna try dreambooth instead

#

since it seems like the superior option

#

just gotta figure out how to run it on my 24GB VRAM without it crashing lol

#

would be nice if automatic adds dreambooth to the webui

viral jay
#

yeah, unfortunately I only have 12GB 😦 so I can't say

#

So, a bit more testing: hypernetwork with a 0.00001 learning rate, from 0 to 6k steps.

stone garden
hot breach
#

neat!

ashen perch
ashen perch
silk crystal
#

test prompt

ashen perch
silk crystal
#

well i was wrong then

tribal rapids
#

Hi, if I'm using dreambooth training, e.g. with token=jmp909 and class=man (so "jmp909 man"), is there a way to give more weight to the token to steer it towards picking up my likeness? E.g. "a photo of a [man:jmp909 man:0.3] on a beach holding an icecream" or whatever? Or "a photo of a (jmp909 man:1.3) on a........"
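As I understand the automatic1111 webui's documented prompt syntax, [from:to:when] is prompt editing: sampling starts with "from" and switches to "to" partway through, where a when below 1 is a fraction of the total steps and a whole number is a step count. The later the switch, the less influence "to" has. (text:1.3) is the separate attention-weighting syntax and is not modelled here. A simplified sketch of which text is active at each step; active_text is a hypothetical name and the real parser's boundary handling may differ.

```python
def active_text(step: int, total_steps: int, before: str, after: str, when: float) -> str:
    """Simplified model of [before:after:when] prompt editing.

    step is 1-based. If when < 1 it is treated as a fraction of total_steps,
    otherwise as the step number after which the switch happens."""
    switch_after = int(when * total_steps) if when < 1 else int(when)
    return before if step <= switch_after else after


# "[man:jmp909 man:0.3]" with 20 sampling steps
schedule = [active_text(s, 20, "man", "jmp909 man", 0.3) for s in range(1, 21)]
print(schedule.count("man"), "steps of 'man', then",
      schedule.count("jmp909 man"), "steps of 'jmp909 man'")
```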

silk crystal
#

But you will need to gather a bunch of images of buildings for regularization for your specific problem

ashen perch
#

My card only has 8GB of VRAM, so dreambooth is off the table for me, maybe through Colab.

ashen perch
wintry girder
#

What's the difference between a hypernetwork and an embed, in a practical sense?

#

I'm thinking that no one actually knows, is that accurate?

hot breach
#

if the game has a square aspect it might be better

#

If you can use caption training, you could add "with a toolbar on the right" only to the right-side crops; maybe it would learn the difference. Just an idea.
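A minimal sketch of that captioning idea, assuming a wide screenshot with the toolbar on the right edge, cut into a left and a right square crop. The boxes use (left, upper, right, lower) order, the same order Pillow's Image.crop takes; the function name, base caption and image size are illustrative, and where the caption ends up (filename or a sidecar text file) depends on your trainer.

```python
def square_crops_with_captions(width: int, height: int, base_caption: str):
    """Return [(box, caption), ...] for a left and a right square crop.

    Only the right crop, assumed to contain the toolbar, gets the extra phrase."""
    if width < height:
        raise ValueError("expected a landscape screenshot")
    left_box = (0, 0, height, height)
    right_box = (width - height, 0, width, height)
    return [
        (left_box, base_caption),
        (right_box, base_caption + ", with a toolbar on the right"),
    ]


for box, caption in square_crops_with_captions(1920, 1080, "a screenshot of the game"):
    print(box, caption)
```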

stone garden
wintry girder
wintry girder
#

Great, thanks. Is this from experience, or do you have a source?

stone garden
#

Based on the context I inferred from reading what other people who don't understand it either said

wintry girder
#

Gotcha

viral jay
#

Embeds let you create a word for some subject, while a hypernetwork seems to learn the overall information.
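A toy picture of the embedding side of that distinction, heavily simplified: textual inversion adds a new word whose vector(s) are learned while the rest of the model stays frozen, so only prompts that actually use the word change. The vocabulary, the word "my-face" and the vector values are made up for illustration.

```python
# Toy vocabulary: word -> list of vectors (one per token); values are made up.
base_vocab = {
    "a": [[0.1, 0.0]],
    "photo": [[0.0, 0.9]],
    "of": [[0.2, 0.2]],
    "man": [[0.5, 0.2]],
}

# Textual inversion: a new word with 2 learned vectors; nothing else changes.
learned = {"my-face": [[0.7, -0.3], [0.2, 0.4]]}
vocab = {**base_vocab, **learned}


def encode(prompt: str, vocab: dict) -> list:
    """Look up each word's vectors, i.e. what gets fed into the text encoder."""
    vectors = []
    for word in prompt.split():
        vectors.extend(vocab[word])
    return vectors


# Prompts without the new word encode exactly as before...
print(encode("a photo of a man", vocab) == encode("a photo of a man", base_vocab))  # True
# ...while using the word pulls in its learned vectors.
print(len(encode("a photo of my-face", vocab)))  # 5
```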

wintry girder
#

Does the hypernetwork not use a word also?

viral jay
#

Nope. If I write a prompt like "a man with mustache" with my hypernet enabled, any man will have my face.

wintry girder
#

Hmmm... So every generation is thereafter contaminated by the hypernetwork?

viral jay
#

if hypernetwork is enabled, yes

wintry girder
#

Do we get any control over it?

#

Right

viral jay
#

In the automatic webui you can control the strength and choose which hypernetwork to use.
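A simplified sketch of why a hypernetwork touches every prompt and what the strength setting does, assuming the residual form I understand the webui to use: each text-conditioning vector x fed to cross-attention becomes x + strength * f(x), where f is the small learned network. Here toy_f is a made-up stand-in, so the numbers are only illustrative.

```python
from __future__ import annotations


def toy_f(vec: list[float]) -> list[float]:
    """Stand-in for the small learned network inside a hypernetwork."""
    return [0.5 * v + 0.25 for v in vec]


def apply_hypernetwork(context: list[list[float]], strength: float) -> list[list[float]]:
    """Residual update scaled by strength, applied to every token's vector,
    whether or not the prompt mentions the trained subject."""
    return [[x + strength * d for x, d in zip(vec, toy_f(vec))] for vec in context]


context = [[2.0, 4.0], [1.0, 0.0]]  # made-up vectors for some prompt's tokens
for s in (0.0, 0.5, 1.0):
    print(s, apply_hypernetwork(context, s))
```

At strength 0 the conditioning is unchanged, which matches turning the hypernetwork off.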

wintry girder
#

Oh... Where do I find that in auto1111?

viral jay
#

I'm still testing it. For learning my face it seems to be doing a very good job now. I've taken some more photos of myself: I was using 40 and now I'm at 80. I changed my shirt and took some photos with and without glasses in different places around my house; I want to see if it stops showing my jacket in every generation lol

#

ah sorry

#

misunderstood your question

wintry girder
#

Maybe it appears after you create a hypernetwork? I can't actually seem to create one because it says "Error" 😂

viral jay
#

I have this problem sometimes too; you need to close and reopen it.

#

often happens after training a TI

wintry girder
#

Gotcha

viral jay
#

you can control the strength in Settings

#

or do what I do

#

in Settings > Quicksettings list, enter "sd_model_checkpoint, sd_hypernetwork, sd_hypernetwork_strength"

#

this adds those options to the top of the page, so it's faster to change them

wintry girder
#

I don't have quicksettings, maybe I have to update

viral jay
#

hm yeah, I think it's a recent addition

wintry girder
#

When I'm done with my current work I'll update and have a look, thanks for the info 🙂

viral jay
#

I'm getting very happy with the hypernet results now; with 80 photos it's producing less biased content.

wintry girder
#

What application are you thinking about for it?

viral jay
#

My idea is to make it learn the faces so I can render them in different styles and maybe create shirts, cups, etc. using those.

wintry girder
#

I see, and is there a reason you're choosing hypernetworks over embeds for that?

viral jay
#

I'm actually still experimenting with it. On my first try I wasn't getting anything good out of hypernetworks, but after a few tries I'm starting to get good outputs. I still need to test with other faces.

wintry girder
#

Cool. And have you already experimented with embeds?

viral jay
#

From comparisons, I think the HN is producing more accurate faces than embeddings.

#

yeah, with embeddings I had to generate several times to get something good out of them

wintry girder
#

Ok interesting

viral jay
#

lol I'm getting really good results with hypernetwork

#

what I can say is that it's producing more natural images compared to TI

tribal rapids
#

@viral jay what are your training settings? (Steps, images, etc.) Thanks

viral jay
#

6000 steps, 0.00001 learning rate, 73 photos (I had 80 but removed some out-of-focus ones), all of them captioned with BLIP
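To compare runs with different dataset sizes, it can help to turn steps into passes over the dataset. A quick calculation, assuming a batch size of 1 (not stated in the thread), so each step sees one image; passes_over_dataset is a hypothetical helper.

```python
def passes_over_dataset(steps: int, num_images: int, batch_size: int = 1) -> float:
    """Average number of times each image is seen; batch_size=1 is an assumption."""
    return steps * batch_size / num_images


print(f"{passes_over_dataset(6000, 73):.1f}")     # this HN run: 82.2
print(f"{passes_over_dataset(100_000, 800):.1f}")  # the 800-photo, 100k-step TI run: 125.0
```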

tribal rapids
#

Thanks