#🔧｜finetune | Stable Diffusion | Page 4

maiden grail Nov 10, 2022, 6:56 PM

#

Hmm, so merge them with different prompts? IE, I have a staff of power, and a staff of the archmage, with prompts for each?

The problem with that is now I have to make a model PER item. It multiplies the amount of training data I need. So 25 items becomes 125 images needed.

Or, I guess, maybe that's the only way to do it.

Another way to put it is that I am trying to train the model both on the concept of a "staff" and on the concept of "of power" or "of the archmage", and then combine those into one input.

So I guess it does seem like multiple models...

old igloo Nov 10, 2022, 7:00 PM

#

maiden grail Hmm, so merge them with different prompts? IE, I have a staff of power, and a s...

I think what I would do is try to use "staff" as the class, and train one model on each specific staff, so for example "staff_of_power staff", "staff_of_archmage staff", where each one is a separate model, and then merge B into A. Then if you make a third model, merge C into AB and so on.

#

As far as training data goes, if you want each different named staff to be represented differently visually, then you'd need images of each anyway, so I don't understand what you meant by it multiplying the number of images needed.

maiden grail Nov 10, 2022, 7:02 PM

#

Well, so that almost gets there. The idea is I want to be able to do something like "robes of power", and I would have a concept of a robe, the concept of a staff, and the concept of "of power"

#

Robes of power doesn't exist, let's say. So it would be newly generated.

old igloo Nov 10, 2022, 7:02 PM

#

Same thing, but you'd need to do a training for each class and object combination

#

"staff_of_power staff", "robe_of_power robe"

maiden grail Nov 10, 2022, 7:03 PM

#

Why would it be "robe_of_power robe" ?

Wouldn't it be "of_power robe"?

old igloo Nov 10, 2022, 7:03 PM

#

Actually in that sense, you're talking about defining the concept of "of power"

#

And as far as this goes, I don't know how one defines a concept that can be applied to any class

#

I don't know if, or how that would work.

maiden grail Nov 10, 2022, 7:05 PM

#

Yeah, there are 2 concepts here actually.

Concept 1: how to "define a descriptor" like "of power".

Concept 2: combine a descriptor model with a different object model. Because I don't want to use the SD baseline definition of a "staff". I want to finetune both.

old igloo Nov 10, 2022, 7:07 PM

#

Maybe the way to do this is to provide your own class images for "of_power"

#

Which would include samples of all kinds of things with "of_power" applied.

#

But I think you'd still end up having to define every object you want that to be applicable to.

maiden grail Nov 10, 2022, 7:09 PM

#

Gotcha, yeah this makes sense and was about what I was thinking. Will try it and see how it goes. Thanks!

old igloo Nov 10, 2022, 7:09 PM

#

The only other idea I have is that I know there's a different technique for training on a particular style

#

So if you consider "of power" to be a style, that might be another approach.

maiden grail Nov 10, 2022, 7:09 PM

#

That might work too, what is that?

old igloo Nov 10, 2022, 7:10 PM

#

I'm not sure, but I think it's similar to normal training, but you supply the class images.

old igloo Nov 10, 2022, 7:12 PM

#

maiden grail That might work too, what is that?

https://www.youtube.com/watch?v=7OnZ_I5dYgw

YouTube

Aitrepreneur

TRAIN Stable Diffusion With Your Own Art! Textual Inversion Made Si...

We can finally train Stable Diffusion using our own art and photos thanks to textual inversion! The hugging face team recently created 2 Google Colab docs that allow you to upload your own images and train the base 1.4 stable diffusion model on them. In this video, I will show you how you can use these Colab docs to understand the concept behind...


maiden grail Nov 10, 2022, 7:13 PM

#

Wait, is merging just done through the checkpoint merger tab? It only combines 3 models together... Are there problems doing merges iteratively? Like 100 of them? Will it bias toward the latest ones? Or I guess if you put a different prompt on each it shouldn't matter?

old igloo Nov 10, 2022, 7:13 PM

#

I'm not 100% clear on the differences between these concepts yet, but I think what you're talking about could be handled (or is handled) by textual inversion.

#

And I think if you go the Textual Inversion route, you wouldn't have to do any merging, because you'd be doing a single training on a batch of images representing the "of power" style.

maiden grail Nov 10, 2022, 7:15 PM

#

So the "of power" thing is fine. I think that would just be a regular model.

But, the issue is that I don't want an "of power" model alone. Instead, I also want a robe model, and a staff model, because the defaults for both of these are bad.

#

So would need to merge models, just because both parts need to be good

old igloo Nov 10, 2022, 7:17 PM

#

Yeah, I think that sounds right. You would first use textual inversion to train on the style, then you would use dreambooth to train on specific objects, merging those models into the style model. As far as I understand it, you can merge A into B, then C into AB then D into ABC then E into ABCD, and so on and it doesn't dilute with each additional merge, it just appends.
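On the question of whether iterative merges bias toward the latest model, here is a minimal sketch, assuming the checkpoint merger does a plain weighted sum of matching tensors, i.e. result = (1 - alpha) * A + alpha * B. The `merge` and `merge_sequentially` helpers and the toy one-hot "models" are hypothetical, for illustration only. Under that assumption, a fixed alpha shrinks the earlier models' share with each merge, while using alpha = 1/k when merging in the k-th model keeps the shares equal.

```python
def merge(a, b, alpha):
    """Weighted sum of two state dicts: (1 - alpha) * a + alpha * b."""
    return {key: (1 - alpha) * a[key] + alpha * b[key] for key in a}


def merge_sequentially(models, alpha_for):
    """Merge models[1], models[2], ... into models[0], one at a time.

    alpha_for(k) gives the weight of the k-th model (k starts at 2).
    """
    merged = models[0]
    for k, model in enumerate(models[1:], start=2):
        merged = merge(merged, model, alpha_for(k))
    return merged


# Toy stand-ins for checkpoints: one-hot "weights", so the merged dict
# directly shows how much each original model contributes.
names = ["A", "B", "C", "D"]
models = [{n: 1.0 if n == own else 0.0 for n in names} for own in names]

fixed = merge_sequentially(models, lambda k: 0.5)    # same slider value every time
equal = merge_sequentially(models, lambda k: 1 / k)  # shrink alpha as models pile up

print(fixed)  # {'A': 0.125, 'B': 0.125, 'C': 0.25, 'D': 0.5}
print(equal)  # each share is approximately 0.25
```

With a fixed 0.5 slider, the last model merged ends up with half the weight, so the order of merges matters if the merger works this way.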

stone garden Nov 10, 2022, 7:18 PM

#

Is there a way to train it so you can put an effect on an image for example, and then have images that are control-group and images that are the effect-group and tell it that when you generate any image you can have the style of the effect-group? Like a style transfer but more specifically trained with matching input control and effect images.

maiden grail Nov 10, 2022, 7:18 PM

#

Why did you suggest SD for the style part? Or is it that DreamBooth just can't make styles? I've found dreambooth to universally be better than the standard create embeddings.

stone garden Nov 10, 2022, 7:19 PM

#

Or is it just enough to train with lots of images already in the desired style?

maiden grail Nov 10, 2022, 7:20 PM

#

stone garden Is there a way to train it so you can put an effect on an image for example, and...

So, you are saying, have a base, and a solution image directory?

What even does this? I haven't seen anything that allows setting a start and a finish.

I thought it was only on a set of input images.

stone garden Nov 10, 2022, 7:22 PM

#

Yeah my specific use case will still have to wait a bit for really-high-res capabilities, but basically we should be able to train a system to be able to output files that have like a high-res color-halftone pattern to them so they can be easily printed in some processes.

#

Imagine like a style transfer, but more of a system that spits out images with certain criteria like color-count and in patterns pre-set for printing like halftones

#

Maybe with enough training on really good high-res datasets it can be done and just really specify the training text/image matching. Is dreambooth done through naming the images or it just uses the images themselves?

#

Trying to determine whether I should still look deeper into more custom fine-tuning methods

maiden grail Nov 10, 2022, 7:25 PM

#

stone garden Maybe with enough training on really good high-res datasets it can be done and j...

Was literally trying to figure out if there was a way to name images, and train on the names/words that you put in, yeah.

This other strat that is discussed could work for me, but I am super curious if you can name images, and give it text descriptors, and then train the images/texts together.

#

There are 2 fields, in the dreambooth tab, that look promising, but I don't know how they work.

"concept list" and "classification directory".

No idea what those do, or how they work, but they sound related.

#

Even "class prompt" could help? I feel like there are a bunch of fields already in these models, and people just don't know how to use them....

surreal mango Nov 10, 2022, 7:35 PM

#

okay so question

#

I have 17 images

#

how many training steps should I do?

#

for dreambooth

old igloo Nov 10, 2022, 7:37 PM

#

maiden grail Why did you suggest SD for the style part? Or is it that DreamBooth just can't ...

Textual inversion is a different concept than what Dreambooth does, as I understand it. The video I posted above explains. What you are trying to do sounds like two things- first train on a style with textual inversion (i.e. "of power" which I do not think Dreambooth can do) and then train on new subjects (i.e. "Staff", "Wand", "Whatever"), which Dreambooth does do.

old igloo Nov 10, 2022, 7:40 PM

#

surreal mango for dreambooth

As far as I understand it, there isn't a one size fits all formula to this. What I have been doing with my recent models is using steps of 5000, with checkpoints every 1000 steps. Then I can test which of the 5 checkpoints produces the best results. Usually some are over trained and some are undertrained and one or two is in the Goldilocks zone.

surreal mango Nov 10, 2022, 7:42 PM

#

wait I just realized

#

#

It says this

#

but Ill still take your advice

old igloo Nov 10, 2022, 7:48 PM

#

surreal mango wait I just realized

Right. Since you have 17 images, 5000 isn't a bad upper limit since 17*200 is 3400, you'd be going above and beyond the recommendation a bit, and by setting the checkpoints at every 1000 steps, you can test to see if you get better results from 3000, 4000, or 5000 steps. Could also just go big and set it at like 10000 steps, but you'll almost certainly not be happy with the results above a certain point.
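The arithmetic above can be sketched as follows, assuming the roughly 200-steps-per-image rule of thumb implied by the 17*200 figure. `step_budget` and `checkpoint_steps` are hypothetical helper names, and the per-image multiplier is a starting guess rather than a fixed rule.

```python
def step_budget(num_images, steps_per_image=200):
    """Rule-of-thumb total training steps for a dataset of this size."""
    return num_images * steps_per_image


def checkpoint_steps(max_steps, every):
    """Step counts at which a checkpoint would be saved."""
    return list(range(every, max_steps + 1, every))


recommended = step_budget(17)
candidates = checkpoint_steps(max_steps=5000, every=1000)

# Checkpoints from just below the rule-of-thumb budget upward are the
# ones worth comparing side by side.
worth_testing = [s for s in candidates if s > recommended - 1000]

print(recommended)    # 3400
print(candidates)     # [1000, 2000, 3000, 4000, 5000]
print(worth_testing)  # [3000, 4000, 5000]
```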

surreal mango Nov 10, 2022, 7:51 PM

#

old igloo Right. Since you have 17 images, 5000 isn't a bad upper limit since 17*200 is 34...

what resolution would you recommend?

old igloo Nov 10, 2022, 7:56 PM

#

I have not tried anything other than 512x512 because my understanding is that's what SD was initially trained on, and I think if you were to attempt to double the image size, you'd need double the vram.

surreal mango Nov 10, 2022, 7:57 PM

#

one last question should I leave this at default?

old igloo Nov 10, 2022, 8:06 PM

#

I'm not familiar with that setting. The notebook I use doesn't have that.

tough gazelle Nov 10, 2022, 8:10 PM

#

surreal mango one last question should I leave this at default?

When I tried to use that it tried to generate regularization images. And if you're using a free Colab GPU, it takes ages.

stone garden Nov 10, 2022, 8:18 PM

#

okey, some sample images from dreambooth run 1200 steps LR 2e-6 on 13 1024 images. It's a little bit too colourful compared to the dataset.

surreal mango Nov 10, 2022, 8:31 PM

#

stone garden okey, some sample images from dreambooth run 1200 steps LR 2e-6 on 13 1024 im...

try using some negative prompts

#

like strong colors or something

#

also maybe not be british

#

(im jk)

stone garden Nov 10, 2022, 10:29 PM

#

is xformers for dreambooth supposed to give a better quality of results or something? I see it gives me more VRAM to play with, but it has a really big cost on my it/s

icy olive Nov 11, 2022, 12:46 AM

#

stone garden is xformers for dreambooth supposed to give a better quality of results or somet...

I thought xformers was a speed optimization

hot breach Nov 11, 2022, 1:40 AM

#

Runpod notebook now available on EveryDream. Also recent updates include training on anything without bothering to crop or resize, mass autoprune all your checkpoints, and a tools repo is available to help you caption your images on a 4GB GPU or on colab/gdrive: https://github.com/victorchall/EveryDream-trainer/blob/main/README.md

GitHub

EveryDream-trainer/README.md at main · victorchall/EveryDream-trainer

General fine tuning for Stable Diffusion. Contribute to victorchall/EveryDream-trainer development by creating an account on GitHub.

warm gull Nov 11, 2022, 1:46 AM

#

icy olive I thought xformers was a speed optimization

that's correct

jovial iris Nov 11, 2022, 4:32 AM

#

do you have the dreambooth model?

surreal mango Nov 11, 2022, 4:57 AM

#

how would I apply an anime artstyle to a dreambooth model and still make it look like the face?

hot breach Nov 11, 2022, 5:06 AM

#

did you try "soandso drawn in the style of anime" ?

stone garden Nov 11, 2022, 8:52 AM

#

I'm a little ashamed to try out your EveryDream tool on a Rick roll dataset, but I'll do it. for science

#

last time I tried, it was my first ever DB, and I ended up with a "rollrick" token that was creating a tiger or a baby half of the time

terse cradle Nov 11, 2022, 12:28 PM

#

can you send me the model?

stone garden Nov 11, 2022, 1:45 PM

#

Hello, a question: loosely, about how many images does one need to finetune a generic model, not for a specific object or person but more like those stylistic models (trinart, Elden Ring, etc.)?

terse cradle Nov 11, 2022, 1:49 PM

#

iirc around 50 imgs

tired wind Nov 11, 2022, 4:19 PM

#

stone garden okey, some sample images from dreambooth run 1200 steps LR 2e-6 on 13 1024 im...

You might want to look at what the instance prompt was, it could be pulling in some brighter colors from that. Remember dreambooth is supposed to take an existing concept and piggyback on it. Like X dog becomes your specific dog, but it still needs to attach itself to the dog concept. So if you did X painting style, or X some_artist painting, then the bright colors might be leaking from there rather than your dataset.

stone garden Nov 11, 2022, 4:22 PM

#

tired wind You might want to look at what the instance prompt was, it could be pulling in s...

I'll do another run, but with f222 as base and some minor changes

viral jay Nov 11, 2022, 4:42 PM

#

jovial iris do you have the dreambooth model?

unfortunately I can't share it, It's trained with partner paid graphic content

viral jay Nov 11, 2022, 4:43 PM

#

terse cradle can you send me the model?

unfortunately I can't share it, It's trained with partner paid graphic content

terse cradle Nov 11, 2022, 4:45 PM

#

viral jay unfortunately I can't share it, It's trained with partner paid graphic content

Sadge, can you share the dataset tho? kekmad

tired wind Nov 11, 2022, 4:56 PM

#

stone garden I'll do another run, but with f222 as base and some minor changes

Also I forgot, try lowering the CFG scale and see what happens

tired wind Nov 11, 2022, 5:00 PM

#

viral jay unfortunately I can't share it, It's trained with partner paid graphic content

With your class prompts, you generated unique images that weren't specifically emojis. From what I understood of dreambooth the class images should have been for the original concept, e.g. if it was X dogs the class prompt images would have just been dog. Was there a reason you generated them as mostly unrelated images of groups of objects? It isn't clear to me that the class images are influencing the embedded concept, however I do see that they can shift the model output when the embedding token is excluded.

old igloo Nov 11, 2022, 5:55 PM

#

Can anyone help me better understand the factors and/or settings that can cause over-training? I trained on 30 images, with 5000 steps, checkpointing every 1000 steps, so I have 5 models (1000,2000,3000,4000,5000 step). I used a learning rate of 5e-7 on this training, and I suspect that setting is the one I need to adjust because even my 1000 step model is overtrained and I have to use a CFG of 5 or lower to get it to actually apply concepts from the prompt to the image and not just spit out images of the original subject.

#

Like if overtraining happened because of the learning rate of 5e-7, would I go up or down on that value to attempt to correct the issue?

cobalt sorrel Nov 11, 2022, 6:22 PM

#

old igloo Can anyone help me better understand the factors and/or settings that can cause ...

2e-6 or 1e-6

#

probably with these values you will have better results.

old igloo Nov 11, 2022, 7:13 PM

#

cobalt sorrel probably with these values you will have better results.

I also lowered my max steps to 1200 and checkpoints every 300 steps, and it seems I am getting much better results now with the 900 step checkpoint.

tired wind Nov 11, 2022, 7:14 PM

#

old igloo Can anyone help me better understand the factors and/or settings that can cause ...

It could be overtraining even past 1,000 steps. From my own experience with around 20 images it always overtrains by about 3,000. Also the images used are super important. The original dreambooth paper is for 3-5 images of the subject. I would look closely at what images are being reflected in the output too strongly and remove those.

old igloo Nov 11, 2022, 7:21 PM

#

tired wind It could be overtraining even past 1,000 steps. From my own experience with arou...

Thank you for the suggestion. It was hard to say which images were being over-represented, because it would generate unique looking images of the subject, but it just wouldn't apply transformations. I'd have to step down to 5 or lower on CFG, and then the subject would be poorly represented, so I couldn't really find a good balancing point. I did go through and remove the lowest resolution images from the set, so it went from 30 to 12 images, then I retrained it as above, and now I am getting much better results. Transformations are being applied and the subject is preserved at CFG 7.5, so I think this model is good now.

short ivy Nov 11, 2022, 9:12 PM

#

An official music video that I made for DJ Pone (with fine-tuning dreambooth) https://youtu.be/p3eygmzWJYI 😁

YouTube

DJ Pone

DJ Pone - Paradis (Clip Officiel)

New single available: https://bfan.link/djpone-paradis
Subscribe to the YouTube channel: https://bit.ly/djpone-yt

Music video directed by neb: https://www.instagram.com/nebsh83

"Paradis"
Produced by DJ Pone and Blasé
Lead Vocals by Disiz
Composed by DJ pone and Romain Hainaut
Written by Disiz
Arrangeme...


sand pine Nov 11, 2022, 10:31 PM

#

Is this model available anywhere?

honest saddle Nov 12, 2022, 1:44 AM

#

I have a dataset of around 500 different named subjects, each with 5 or so poses in the same style. I would like to train a model with this style, and think it would be cool to also retain the subjects. What would be the best way to do this?

#

If keeping the named subjects isn't possible, what's the best way to train a checkpoint with 2500 samples of a style?

stone garden Nov 12, 2022, 1:50 AM

#

tired wind You might want to look at what the instance prompt was, it could be pulling in s...

The instance prompt is --[artist name]... I'm pretty sure that's not it. I went from 1200 to 2400 steps, batch size 2 to 1, and I checked the scale learning rate option. And I used f222 (nsfw) instead of the original sd-v1-5. Well, got better hands now overall, but now I have to add some negative prompts for the extra nipples 😊

#

Results are pretty good

wise trench Nov 12, 2022, 2:16 AM

#

Anyone done a lot of toying with dreambooth and find out decent learning rates for digital drawings?

#

Can't seem to find a good middle ground: it's either overtrained and will only generate the original images, OR it's undertrained, misses details, and is quite noisy

somber shell Nov 12, 2022, 3:25 AM

#

Hi all. Does anyone know how to make sure the image is always fully in the frame? Example: a portrait.
Thanks much.

grave carbon Nov 12, 2022, 4:42 AM

#

I'm trying to dreambooth locally on 12GB VRAM

#

but running out of memory

#

anything I can do?

open oasis Nov 12, 2022, 5:51 AM

#

grave carbon but running out of memory

reduce your output file dimensions?

stone garden Nov 12, 2022, 7:00 AM

#

somber shell Hi all. Does anyone know how to make sure the image is always fully in the frame...

Since you are in the #🔧｜finetune channel, I would suggest DB. You can refine the "full body" concept with about 10 varied pictures (illustrations and photos) of someone framed the way you want, give it a keyword, and train on it. If you don't train too much and use prior preservation, you could make yourself a model with better framing for your tastes

stone garden Nov 12, 2022, 7:04 AM

#

wise trench Can't seem to find a good middle ground it's either over trained and will only g...

Like all subjects, I usually start at 1e-6 on constant. I cook it to 100 times the number of pictures in the training dataset, then I start saving checkpoints every 250 steps and compare the outputs, and refine at a lower LR from the best checkpoint before overtraining sets in.

I'm not sure if "digital painting" or anything like that changes the recommended params. From my tests, it's mostly the quality of your dataset that changes everything, then comes your choice of token, and lastly, whether there was anything related to your new concept in the model or not. Each of those will have an influence on the good training parameters imo

wise trench Nov 12, 2022, 7:05 AM

#

stone garden Like all subject, i usually start at 1e-6 on constant. I cook it to 100 times th...

Issue I'm having is when it does get to a point of being really good it only generates the input images. If I merge that with the og model though it turns out okay BUT loses a few features

#

I can try cooking it then lowering LR

#

What does scale learning rate do by the way if you know? And is SKS/Some other random phrase really needed? A paper I read on it said it wasn't, assuming your token is already something not used

stone garden Nov 12, 2022, 7:09 AM

#

wise trench Issue I'm having is when it does get to a point of being really good it only gen...

Take the last checkpoint before you hit this stage, lower your learning rate by half and do some steps on it to hit that sweet spot.
You may need to alter your dataset to get better end results here: test more extensively the things you are teaching it, see if some are already learned, and focus on the pictures you think deserve more attention in the dataset for the end of the training.

For example, I had a model that was too character-centric, generating the main character in each image. I did 250 more steps on a dataset where I removed most of the character pics, and it solved the balance

And if everything starts to look like your input, it may also be a problem with prior preservation, and a bigger class image collection could help too

stone garden Nov 12, 2022, 7:12 AM

#

wise trench What does scale learning rate do by the way if you know? And is SKS/Some other r...

I'm not sure about scale learning rate, I'm not on that install.

About the choice of token like SKS, it does matter. The theory is that you take a "neutral" token like SKS, not affiliated with any concept in the model, so your training is not tainted.

But if you take a token close to your concept (like the name of an actor resembling a subject in your dataset), you will:

1/ start the training from this "point" in the model, so closer to your results, and you may need fewer steps
2/ completely destroy the concept that was behind your chosen token in the initial model, of course

wise trench Nov 12, 2022, 7:15 AM

#

I'll have to try again soon

#

Really wish I could figure out what scale learning rate does couldn't find anything on it

stone garden Nov 12, 2022, 7:17 AM

#

Found it, in Shivam's code directly

#

parser.add_argument(
    "--scale_lr",
    action="store_true",
    default=False,
    help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
)

#

Not sure what it means in practice though
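In practice, here is a rough sketch of what the flag does, assuming the script follows the Hugging Face diffusers DreamBooth example, which multiplies the base learning rate by the gradient accumulation steps, the per-device batch size, and the number of processes (GPUs) when `--scale_lr` is set. `effective_learning_rate` is a hypothetical helper written for illustration, not a function from that script.

```python
def effective_learning_rate(
    base_lr,
    train_batch_size=1,
    gradient_accumulation_steps=1,
    num_processes=1,
    scale_lr=False,
):
    """Learning rate actually used by the optimizer.

    Without scale_lr the base value is used unchanged; with it, the
    value grows linearly with the effective batch size.
    """
    if not scale_lr:
        return base_lr
    return base_lr * gradient_accumulation_steps * train_batch_size * num_processes


# Batch size 2 on one GPU: the flag doubles the learning rate.
lr_batch_2 = effective_learning_rate(1e-6, train_batch_size=2, scale_lr=True)

# Batch size 1, one GPU, no accumulation: the flag changes nothing.
lr_batch_1 = effective_learning_rate(1e-6, train_batch_size=1, scale_lr=True)

print(lr_batch_2, lr_batch_1)
```

So going from batch size 2 to 1 while ticking the option also halves the scaled learning rate, which is worth keeping in mind when comparing runs.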

wise trench Nov 12, 2022, 7:21 AM

#

Yeah I'm using the new auto extension

#

Most user friendly thing I found. Previous dreambooth stuff needed WSL on Windows etc etc

#

Took up like 50GB so I uninstalled it after using it like once or twice

stone garden Nov 12, 2022, 7:23 AM

#

Hehe Dreambooth costs a lot of hard drive space

#

I need to curate my learning from last night

#

I have about 500GB of models to test

#

I'm using Freon's method, very nice, but yeah, you need a fridging high hardware for it

stone garden Nov 12, 2022, 7:42 AM

#

honest saddle I have a dataset of around 500 different named subjects, each with 5 or so poses...

Easiest way to do it, or at least try, would be to use Shivam, or Everydream. Both allow multi concepts.
In Shivam, you would prepare a training of 500 concepts, each one being 5 pictures of a single subject, with the instance prompt "MrJohns in Super style", if you were after learning MrJohns as one of the 500 characters, and "Super style" being the name of your style, for example.
Everydream would let you do the same by captioning all your pictures with a prompt fitting them, just like in Shivam but even more precise.
But for such a large set of subjects, with only 5 pics of each, you will have 2 problems:
1/ each character may not take well if the 5 pics aren't diversified enough
2/ with such a large corpus, you will need so many steps... With 2.5k pictures, I would start with a baseline of 250k steps! This feels insane

If you want to just go after the style, take 50 to 100 pictures of your dataset, presenting the most variety possible, and do a single concept DB on it, a lot easier
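A minimal sketch of preparing such a multi-concept run, assuming the concepts-list JSON format used by Shivam's fork (a list of entries with `instance_prompt`, `class_prompt`, `instance_data_dir`, and `class_data_dir`). The subject names, the "Super style" name, the class prompt, and the folder layout are made up for illustration, and `build_concepts` is a hypothetical helper.

```python
import json


def build_concepts(subjects, style, data_root, class_prompt, class_dir):
    """One concept entry per named subject, all sharing the same style name."""
    return [
        {
            "instance_prompt": f"{name} in {style}",
            "class_prompt": class_prompt,
            "instance_data_dir": f"{data_root}/{name}",
            "class_data_dir": class_dir,
        }
        for name in subjects
    ]


# Two placeholder subjects; the real list would have about 500 names.
subjects = ["MrJohns", "MsSmith"]
concepts = build_concepts(
    subjects,
    style="Super style",
    data_root="data/instances",
    class_prompt="illustration of a person",
    class_dir="data/class_person",
)
concepts_json = json.dumps(concepts, indent=2)
print(concepts_json)

# Baseline from the message above: about 100 steps per picture.
baseline_steps = 100 * 2500
print(baseline_steps)  # 250000
```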

honest saddle Nov 12, 2022, 7:46 AM

#

stone garden Easiest way to do it, or at least try, would be to use Shivam, or Everydream. Bo...

Thanks for the advice!

#

Is there a good guide for style training with dreambooth?

#

I know there are multiple repos for it now

stone garden Nov 12, 2022, 7:49 AM

#

Nitro's guide is quite decent on this
https://github.com/nitrosocke/dreambooth-training-guide/blob/main/README.md

GitHub

dreambooth-training-guide/README.md at main · nitrosocke/dreambooth...

Contribute to nitrosocke/dreambooth-training-guide development by creating an account on GitHub.

crimson wasp Nov 12, 2022, 1:28 PM

#

Somebody claims to have come up with a new version of textual inversion which looks like it blows previous methods out of the water and can do better than finetuning the model in some situations, haven't tried it yet though: https://github.com/7eu7d7/DreamArtist-sd-webui-extension

GitHub

GitHub - 7eu7d7/DreamArtist-sd-webui-extension: DreamArtist for Sta...

DreamArtist for Stable-Diffusion-webui extension. Contribute to 7eu7d7/DreamArtist-sd-webui-extension development by creating an account on GitHub.

north stream Nov 12, 2022, 1:36 PM

#

ah it's finally out, it was first a PR, then a standalone thing. he said he was going to make an extension, glad to see it's done

#

time to try it

solemn latch Nov 12, 2022, 1:48 PM

#

I'm wondering if this new technique works with less VRAM. I'd love to do some training with my 6GB RTX 2060

indigo inlet Nov 12, 2022, 2:40 PM

#

grave carbon I'm trying to dreambooth locally on 12GB VRAM

There are some parameters to reduce the VRAM used. Ex. --gradient_checkpointing and --use_8bit_adam

indigo inlet Nov 12, 2022, 2:44 PM

#

indigo inlet There are some parameters to reduce the VRAM used. Ex. --gradient_checkpointing...

I use Shivam's Diffusers fork; there's a README file listing the combinations of parameters and the VRAM each one uses.
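As a rough sketch, a launch command with those memory-saving flags could be assembled like this. The flags are ones the diffusers-style `train_dreambooth.py` script exposes as far as I know; the model ID, paths, prompt, and step counts are placeholder assumptions, `build_command` is a hypothetical helper, and the README's VRAM table should decide the actual combination.

```python
import shlex


def build_command(low_vram=True):
    """Argument list for a DreamBooth run; all values are placeholders."""
    cmd = [
        "accelerate", "launch", "train_dreambooth.py",
        "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
        "--instance_data_dir", "data/instance",
        "--output_dir", "output/model",
        "--instance_prompt", "photo of sks staff",
        "--resolution", "512",
        "--train_batch_size", "1",
        "--learning_rate", "1e-6",
        "--max_train_steps", "1200",
    ]
    if low_vram:
        # Trade speed for memory: recompute activations during the backward
        # pass and keep optimizer state in 8-bit.
        cmd += ["--gradient_checkpointing", "--use_8bit_adam"]
    return cmd


command = build_command()
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` from the training environment.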

crimson wasp Nov 12, 2022, 2:44 PM

#

solemn latch I'm wondering if this new technique works with less VRAM. I'd love to do some tr...

I think some implementations let you do it on CPU only, which is much slower but at least doable. I'm not 100% sure but I think you could put some tasks on the gpu and some on the cpu, like at least do preview image generation and data loading on the other to speed things up

grave carbon Nov 12, 2022, 2:51 PM

#

indigo inlet There are some parameters to reduce the VRAM used. Ex. --gradient_checkpointing...

Yes I managed to run it

#

it seems it installs some dependencies after 2 or 3 runs

#

but now I have a problem

#

I can't generate any style image

#

every generated image looks like a photo

#

what have i done wrong? maybe my class prompt or my instance prompt?

crimson wasp Nov 12, 2022, 3:06 PM

#

if you disable the DreamArtist extension you may need to replace /modules/ui.py with the original, since the extension changes it slightly, and then the webui doesn't start up without it (you may need to remove it from the extensions folder as well; disabling may not be enough)

crimson wasp Nov 12, 2022, 3:25 PM

#

See here if you need help removing it: https://github.com/7eu7d7/DreamArtist-sd-webui-extension/issues/1#issuecomment-1312504974

GitHub

RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (t...

Tried running a training session following the example on the site but I keep getting this error. Tried launching webui with --no-half option but does not change anything. Any idea? Training at rat...

indigo inlet Nov 12, 2022, 3:45 PM

#

grave carbon Yes I managed to run it

I don't know much about it, but I think if the problem occurs with the generated ckpt file, it could be the class prompt or the class training images. Another thing that comes to mind is the files pointed to by --pretrained_model_name_or_path

frozen bobcat Nov 12, 2022, 3:48 PM

#

When using dream artist training, I'm supposed to have one image on the dataset folder, right?

Can I include more than one image?

crimson wasp Nov 12, 2022, 3:52 PM

#

frozen bobcat When using dream artist training, I'm supposed to have one image on the dataset ...

I've tried both and not had luck with either, though don't have train with reconstruction on since it breaks my ui somehow

grave carbon Nov 12, 2022, 5:34 PM

#

all my model does is photo style

#

I tried like 3 models already

#

I can't generate any style

#

using dreambooth on A1111.

azure stump Nov 12, 2022, 6:22 PM

#

Hi everyone, new to this community and would really appreciate some input. I used Hugging Face's Stable Diffusion DreamBooth Colab notebook to fine-tune the model on 6 pictures of a cologne bottle, but when I ran inference the words from the original cologne bottle are not shown. I understand Stable Diffusion has some difficulty rendering text, so I would be grateful to learn about solutions to the text rendering problem! Thank you! 🙂

rough marten Nov 12, 2022, 7:24 PM

#

Can anyone explain to me what the difference between regularization images and training images is please? For context I'm a complete noob trying to play around with dreambooth and setting up my first style model

old igloo Nov 12, 2022, 7:28 PM

#

azure stump Hi everyone, new to this community and would really appreciate some input. I use...

It's just not what it was meant to do. You'll have better luck photoshopping text in later.

stone garden Nov 12, 2022, 7:50 PM

#

Could you pseudo-merge models at inference runtime by loading multiple unets and merging their noise_pred at each step? Similar to how DiscoDiffusion used to load multiple CLIP models and merge them

#

(assuming none of the other components were finetuned)

half folio Nov 13, 2022, 12:41 AM

#

It could be possible, just had a look at the diffusers pipeline

half folio Nov 13, 2022, 1:07 AM

#

yup, it is possible, I'm messing around with the pipeline right now

#

problem is, it needs a little bit too much vram

bitter matrix Nov 13, 2022, 6:24 AM

#

What does this mean: "/bin/bash: accelerate: command not found"? It's the error I got in the training section in dreambooth.

stone garden Nov 13, 2022, 10:36 AM

#

half folio It could be possible, just had a look at the diffusers pipeline

Thanks for the encouragement - got it working! https://gist.github.com/stevenwaterman/435daa09237cd01c08bb2057523a22ab

Gist

A Unet shim for stable diffusion diffusers that combines multiple m...

A Unet shim for stable diffusion diffusers that combines multiple models in varying amounts at runtime - multiUnet.py

#

Pretty jank but it works so I don't care 😂

half folio Nov 13, 2022, 10:53 AM

#

All I did last night was loading the unet from two different models and calculating the weighted mean of the two noise_pred tensors resulting from them, couldn't be bothered to do all that because it was late though 😆

stone garden Nov 13, 2022, 10:54 AM

#

That's all I'm doing too, just in a way where I can still use the pre-made diffusers pipelines

#

the important bit is __call__ at the bottom, which does exactly that
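
The weighted mean of noise predictions described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the linked gist: the names `merged_noise_pred`, `model_a` and `model_b` are hypothetical, and plain lists of floats stand in for the tensors a real unet would return.

```python
from typing import Callable, List, Sequence

# A "unet" here is any callable mapping (latents, timestep) -> predicted noise.
# Real unets return tensors; lists of floats keep this sketch dependency-free.
NoiseModel = Callable[[Sequence[float], int], List[float]]


def merged_noise_pred(
    models: Sequence[NoiseModel],
    weights: Sequence[float],
    latents: Sequence[float],
    timestep: int,
) -> List[float]:
    """Weighted mean of the noise predictions from several models."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Every model sees the same latents at the same timestep.
    preds = [model(latents, timestep) for model in models]
    return [
        sum(w * pred[i] for w, pred in zip(weights, preds)) / total
        for i in range(len(latents))
    ]


# Two hypothetical stand-in "models", for illustration only.
def model_a(latents, timestep):
    return [x * 1.0 for x in latents]


def model_b(latents, timestep):
    return [x * 3.0 for x in latents]


blended = merged_noise_pred([model_a, model_b], [0.75, 0.25], [1.0, 2.0], timestep=10)
print(blended)  # [1.5, 3.0]
```

Because every model has to run at every sampling step, memory and compute grow with the number of loaded unets, which matches the VRAM complaint earlier in the thread.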

half folio Nov 13, 2022, 11:03 AM

#

yep, I had a look at your code

fast gazelle Nov 13, 2022, 11:21 AM

#

I'm trying to use a checkpoint as a dreambooth source checkpoint but about half my models aren't showing up in the dreambooth dropdown box. Any thoughts on how to change this?

crimson wasp Nov 13, 2022, 12:44 PM

#

can anybody recommend an up-to-date finetuning repo which will work in under 12gb of vram on windows? I've almost finished creating an image tagger to create a novel-AI-like training dataset, and then just need to edit a finetuning repo's datasource to instead read the file locations/tags/clipping region from the database this image manager creates (and also to tag a ton of my artwork)

surreal mango Nov 13, 2022, 1:42 PM

#

what would happen if I merge 2 different dreambooth models that have the same subject together?

stone garden Nov 13, 2022, 2:38 PM

#

Can someone help me convert diffusers to a valid CKPT? It seems that I'm either using the official HF script wrong, or that it doesn't build the CKPT correctly enough to pass the unpickle test

#

https://huggingface.co/Guizmus/DarkSoulsDiffusion/discussions/2#6370d9657a5e5d8efdbde159

grand jay Nov 13, 2022, 2:48 PM

#

excellent

dapper prism Nov 13, 2022, 2:53 PM

#

grand jay excellent

I've since added new datasets for "cat", "kitty", "sexy athlete", "cyberpunk", "guy", "femme fatale", "bikini model", and more!

stone garden Nov 13, 2022, 2:54 PM

#

always good to share the results of processing power like that, thanks from the community and the planet

wide sky Nov 13, 2022, 3:00 PM

#

Is it possible to train specific styles of eyes? without including other facial features in the training. Tonight I wanted to make some tests but it's my first time with textual inversion or DB:)

austere wigeon Nov 13, 2022, 3:25 PM

#

I uninstalled the dreambooth extension and re-installed it, and now I am getting TypeError: start_training() takes 40 positional arguments but 41 were given any ideas?

raw wraith Nov 13, 2022, 4:26 PM

#

Has anyone tried training models on the Colossal-AI release?
https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion
Is there any quality degradation? I wonder if it speeds up inference as well

GitHub

Colossal-AI: A Unified Deep Learning System for Big Model Era - ColossalAI/examples/images/diffusion at main · hpcaitech/ColossalAI

vale egret Nov 13, 2022, 5:46 PM

#

I really want to know if anyone can get DreamArtist working. The results in the repo look sick, but everyone I ask who has tried it didn't get good results

tough gazelle Nov 13, 2022, 5:50 PM

#

The guy needs to do proper documentation

#

There's basically none

terse cradle Nov 13, 2022, 5:53 PM

#

vale egret I really want to know if anyone can get DreamArtist working. The results in the ...

Is DreamArtist just another repo like auto1111? Or something else?

vale egret Nov 13, 2022, 5:53 PM

#

It has a repo version called advanced prompt tuning, and an extension version https://github.com/7eu7d7/DreamArtist-sd-webui-extension

GitHub

DreamArtist for Stable-Diffusion-webui extension. Contribute to 7eu7d7/DreamArtist-sd-webui-extension development by creating an account on GitHub.

old igloo Nov 13, 2022, 6:35 PM

#

I've been having a hard time with one particular training. I'm using 20 headshots of my brother, taken with different cameras (some DSLR, some iphone), in different settings, with different clothes, all have good focus, cropped to 512x512. For the latest training I did 2000 steps at 1e-6 with 2000 class images in the class "man", with checkpoints every 500 steps. All of the checkpoints produce images that look like my brother with a prompt of "photo of xyz man" with CFG 7. But on all of the checkpoints, I can't get transformations to apply, which suggests they are all over-trained. If it's over-trained everywhere from 500 steps to 2000 steps, does that mean I need to reduce or increase the learning rate, or adjust some other setting? Would I be better off using the person class instead of the man class?

glossy rune Nov 13, 2022, 7:03 PM

#

old igloo I've been having a hard time with one particular training. I'm using 20 headshot...

Do you train based on SD 1.4 or 1.5? I found 1.4 an easier / more forgiving base to start from.

Do you train with the text encoder or without? With needs more VRAM but gives generally better results.

Do you have enough variation of photos from face and upper body, studio and outdoor settings?

I found the attached list a good guide for dataset creation.

vale egret Nov 13, 2022, 7:12 PM

#

How long does it take to make a dreambooth model? (I want to know iteration speed compared to inference & recommended step count)

old igloo Nov 13, 2022, 7:17 PM

#

glossy rune Do you train based on SD 1.4 or 1.5? I found 1.4 an easier / more forgiving base...

Thanks for responding!

I am using 1.5 from hugging face.
I am training with the text encoder
Most of the photos were cropped down to headshots. A few included more torso. None were full body. They were all taken in totally different surroundings, etc. I just culled my photo library for photos of him to work with, and it's a mix of indoor, outdoor, studio lighting, natural lighting, etc. But only 20 images total.

Thanks for that guide, I'll try those tips out.

random star Nov 13, 2022, 7:24 PM

#

idk if this counts as finetuning, but should i make my own textual inversion template prompts for a person subject?

#

the textual inversion subject template is kind of meh

#

does it actually affect the results?

vale egret Nov 13, 2022, 7:25 PM

#

random star does it actually affect the results?

yes. I use just a 1-liner "[name], [filewords]" as my template
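
To make the one-line template concrete, here is a minimal sketch of how such a template could be expanded into training prompts. It assumes `[name]` stands for the embedding name and `[filewords]` for the per-image caption text; `expand_templates` is a hypothetical helper, not the webui's own code.

```python
def expand_templates(template_text, name, filewords):
    """Expand every non-empty template line into a concrete training prompt."""
    prompts = []
    for line in template_text.splitlines():
        line = line.strip()
        if not line:
            continue
        prompts.append(line.replace("[name]", name).replace("[filewords]", filewords))
    return prompts


one_liner = "[name], [filewords]"
prompts = expand_templates(one_liner, "my-character", "close-up, three stripes on face")
print(prompts)  # ['my-character, close-up, three stripes on face']
```

With a single template line, every training image gets a prompt made of the embedding name plus its own caption, so the caption quality matters more than the template wording.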

random star Nov 13, 2022, 7:26 PM

#

thanks!

obtuse flint Nov 13, 2022, 7:29 PM

#

Is it possible to finetune multiple subject types? As in finetune myself along with a specific article of clothing

#

That way the finetune model can detect both

random star Nov 13, 2022, 8:21 PM

#

also, for the dataset, does more == better?

#

assuming all images are of good quality

#

and variation

random star Nov 13, 2022, 8:29 PM

#

wide sky Is it possible to train specific styles of eyes? without including other facial ...

this is what im wondering as well

#

im having trouble getting the number of stripes on a character's face to be correct in my embedding. it's always supposed to be three. is there a way to fix this?

stone garden Nov 14, 2022, 12:24 AM

#

stone garden Can someone help me convert diffusers to a valid CKPT? It seems that I'm either ...

the answer to "why does my brand new ckpt not pass an unpickle test" was "don't convert it with pytorch 1.13". I would never have found it myself, just logging the answer for anyone having the same difficulty

random star Nov 14, 2022, 1:20 AM

#

wide sky Is it possible to train specific styles of eyes? without including other facial ...

it seems possible

#

i tested it

#

with textual inversion

#

you have to inpaint the eyes tho

wide sky Nov 14, 2022, 1:26 AM

#

random star you have to inpaint the eyes tho

well at least it's a way that everyone can try and it's not too complicated

random star Nov 14, 2022, 1:27 AM

#

i trained it with close up images of eyes btw

wide sky Nov 14, 2022, 1:37 AM

#

so no full face images? Then I'll test that way too, for now I'll try to feed faces at 512 at the best quality I can, also upscaling

random star Nov 14, 2022, 1:41 AM

#

yeah

#

#

so this is the kinda image you want to train it with

#

for whatever style you're doing

final matrix Nov 14, 2022, 3:39 AM

#

is this what a too-high learning rate looks like? trying to figure out if it's worth continuing training with a lower learning rate from here or if i should start all over

10667-1234567-Korra_facial_closeup_smug_-outfit_-hairstyle_city_tlok_artstyle.png

10666-1234570-Korra_facial_closeup_soft_smile_-outfit_-hairstyle_city_tlok_artstyle.png

10787-3673354911-Korra_full-shot_-garments_-hairstyle_city_tlok_artstyle.png

wooden shuttle Nov 14, 2022, 5:49 AM

#

Which repo is this?

final matrix Nov 14, 2022, 6:31 AM

#

Joe's

#

from what i heard from others its overtrained from too high LR

crimson wasp Nov 14, 2022, 7:20 AM

#

for models trained on big, well-tagged data like I think Hentai Diffusion and Novel AI were, does anybody know if artists were used as just another prompt word or were separately handled to be added to the prompt like "by artistname" or "in the style of artistname"?

brave hedge Nov 14, 2022, 8:06 PM

#

Has anyone been able to successfully combine dreambooth with inpainting or img2img? I'm trying to apply a learned style to a real image. Wonder if this is possible.

tired wind Nov 14, 2022, 10:20 PM

#

brave hedge Has anyone been able to successfully combine dreambooth with inpainting or img2i...

Yes, this is possible. I would strongly suggest it since you'll have more control over composition.

sick hare Nov 14, 2022, 10:47 PM

#

brave hedge Has anyone been able to successfully combine dreambooth with inpainting or img2i...

I haven't used it personally, but Shivam's repo has a script to train a dreambooth inpainting model: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_inpainting_dreambooth.py

GitHub

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch - diffusers/train_inpainting_dreambooth.py at main · ShivamShrirao/diffusers

slim moon Nov 15, 2022, 1:07 AM

#

I made a basic GUI for manually captioning images. It's intended for use with the EveryDream repo, but should work with any system that pulls captions from a sidecar .txt file

#

https://gist.github.com/mstevenson/4a87230bf599e68cbd22e948484e7230

Gist

Manual image captioning tool for Stable Diffusion training - image_caption_gui.py
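
For readers unfamiliar with the sidecar convention mentioned above, a minimal sketch of reading such captions follows. It assumes each image has a same-named .txt file next to it (for example cat.png and cat.txt); `load_captions` and the suffix list are hypothetical and not taken from the gist or from any particular training repo.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the dataset contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def load_captions(folder):
    """Map each image file name to the text of its same-named .txt file."""
    pairs = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        sidecar = image.with_suffix(".txt")
        # Images without a caption file are skipped rather than given a guess.
        if sidecar.is_file():
            pairs[image.name] = sidecar.read_text(encoding="utf-8").strip()
    return pairs
```

Calling `load_captions("path/to/dataset")` returns a dict such as `{"cat.png": "a photo of a cat"}`, which a training script could then feed to its data loader.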

crimson wasp Nov 15, 2022, 4:46 AM

#

slim moon I made a basic GUI for manually captioning images. It's intended for use with th...

If you want a c# tool, I've almost finished making one for tagging, batch tagging, searching by tags, and selecting training regions in the images. I'm just finishing up getting mouse resizing of the training region working and will hopefully update the repo with that soon. It stores all images dragged into it in a text file with tags and training regions, so any training repo would need to read that for datasources. It's ideally for handling tens of thousands of training images and tagging them efficiently for training models with novel AI's quality. It also makes a pretty sweet image viewer with tagging and searching by tags https://github.com/CodeExplode/Image-Tagger

GitHub

Simple Image Viewer with ability to tag images, search by tags, and mark regions for AI training - GitHub - CodeExplode/Image-Tagger: Simple Image Viewer with ability to tag images, search by tags,...

vale egret Nov 15, 2022, 4:53 AM

#

https://drive.google.com/file/d/10YRLM8q_Ew0DVmrmzivPa3hG1tXb_5g3/view?usp=sharing
fast image region selector now with gui file select 🙂 no tagging though

Google Docs

ImagePicker.jar

#

source

📎 FastIdent.java

slim moon Nov 15, 2022, 5:35 AM

#

crimson wasp If you want a c# tool, I've almost finished making one for tagging, batch taggin...

Nice! I'll check it out, looks a lot more sophisticated

crimson wasp Nov 15, 2022, 6:04 AM

#

slim moon Nice! I'll check it out, looks a lot more sophisticated

it's a bit hacked together in truth but hopefully is stable enough to use. It might be a good idea to make backups of the database text file on occasion, just in case it gets messed up somehow and loses everything, or you accidentally save over it without loading it first (it doesn't ask for confirmation before overwriting, since the intended workflow is to load once and then hit save regularly after making changes)

crimson wasp Nov 15, 2022, 7:56 AM

#

Image Tagger now has mouse controlled resizing and moving of the training area, so should be pretty much ready to go now. Just need to update a repo to read the database file as a data source during training

manic estuary Nov 15, 2022, 11:03 AM

#

How many it/s are you all getting when finetuning (Dreambooth) on your 3090s?

crimson wasp Nov 15, 2022, 11:39 AM

#

added a few more updates to ImageTagger to fix some important things if anybody is using it, but now I think it's really really ready to be used 😅

bold seal Nov 15, 2022, 1:15 PM

#

I was wondering if anyone had experience with the following problem. So i have a big upscaled outpainted landscape. Now i want to go back in and correct small details in, say, photoshop. What i really want to do is zoom in on certain features (like distant faces or images), but it occurred to me that SD image to image or inpainting could accomplish the same thing with the original prompt in such a setting (as opposed to photoshop). Is there something that helps automate that? Usually it's the opposite direction (downscaled to upscaled)

crimson wasp Nov 15, 2022, 3:57 PM

#

bold seal I was wondering if anyone had experience in the following problem. So i have a ...

do you mean like inpainting at full resolution? It's a feature in Automatic's UI which is really helpful. e.g. inpaint a 40x40 area at 512x512 (SD's working resolution), then automatically scale it down and paste it in
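
The geometry behind that idea can be sketched as follows. This is not Automatic's actual implementation, just a hedged illustration: `full_res_crop`, its `padding` parameter and the square-crop choice are assumptions made for the example.

```python
def full_res_crop(mask_box, image_size, padding=32, work_size=512):
    """Return (crop_box, scale) for inpainting a small masked region.

    mask_box is (left, top, right, bottom) in image pixels.
    crop_box is the padded square region to cut out, kept inside the image.
    scale is the factor by which the crop is enlarged to reach work_size.
    """
    left, top, right, bottom = mask_box
    width, height = image_size
    # Grow the box by the padding so the model sees some surrounding context.
    left, top = left - padding, top - padding
    right, bottom = right + padding, bottom + padding
    # Use a square no larger than the image itself.
    side = min(max(right - left, bottom - top), width, height)
    cx, cy = (left + right) // 2, (top + bottom) // 2
    # Centre the square on the mask, then shift it back inside the image.
    left = min(max(cx - side // 2, 0), width - side)
    top = min(max(cy - side // 2, 0), height - side)
    return (left, top, left + side, top + side), work_size / side


crop_box, scale = full_res_crop((1000, 600, 1040, 640), (4096, 2048), padding=12)
print(crop_box, scale)  # (988, 588, 1052, 652) 8.0
```

The crop is enlarged by `scale` before inpainting at the working resolution, and the result is shrunk by the same factor before being pasted back at `crop_box`.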

bold seal Nov 15, 2022, 6:03 PM

#

yea. It would be great if we could do exactly that, all the while staying zoomed in . B/c its hard to spot little details. I have to put each image into photoshop after each inpaint to spot the little imperfections. You can't really spot the issue by eye sometimes. One could easily imagine using SD to 'correct' things at all scales.

dapper prism Nov 15, 2022, 6:07 PM

#

What's the best repo right now for actual finetuning (not DreamBooth)? Is it the EveryDream repo (I see it uses reg images, so is it not actual finetuning)?

tropic glacier Nov 15, 2022, 7:30 PM

#

Hey guys. I could really use some pointers, I just feel so discouraged after so much time... I'm so bad at inpainting. Can someone point me in the right direction? I usually run
Full resolution
Resize and fill
Restore faces
Mask 1.0
I just can't produce anything of any quality. Are there any settings I'm missing or just any tips would be nice. I also use the 1.5 inpainting model

prisma kiln Nov 15, 2022, 8:36 PM

#

These are fantastic, are you sharing your model?

ashen swan Nov 15, 2022, 10:41 PM

#

Just made this, hope people get some use out of it :)
https://github.com/antis0007/sd-webui-multiple-hypernetworks
Script that lets you apply multiple hypernetworks at once in auto's webui, if you like it consider giving the repo a star.

GitHub

Script that allows the use of multiple hypernetworks at once in AUTOMATIC1111's Stable Diffusion webui - GitHub - antis0007/sd-webui-multiple-hypernetworks: Script that allows the use of mu...

surreal mango Nov 16, 2022, 1:26 AM

#

ok so Im starting fresh for dreambooth
I got about 28 photos
I wanted to ask a few questions

Is ddalton a good token or should I use something different?
how many steps should I do?
what model should I use as base? (1.4, 1.5)

tired wind Nov 16, 2022, 1:30 AM

#

surreal mango ok so Im starting fresh for dreambooth I got about 28 photos I wanted to ask a f...

you can test your token and see if it does anything, otherwise I wouldn't use something similar to an existing word
Save it every 500 steps, but by 3,000 it will probably be overtrained
Start with 1.4, some have said it works better but I don't know for sure

surreal mango Nov 16, 2022, 1:33 AM

#

also how do I prevent colab from disconnecting?

#

without pro

tired wind Nov 16, 2022, 1:34 AM

#

Usually by making sure it doesn't idle. I had a lot more trouble getting colab to run than doing it locally

#

Maybe it's easier now, but I spent hours getting automatic1111 to work on colab and then the next day it would break. Locally, on a 3090ti+win11 everything worked on the first try including dreambooth plugin. Very painless and not what I was expecting.

surreal mango Nov 16, 2022, 1:36 AM

#

should I use this option?

#

I looked in the code and it just adds regularization images of men or women

#

should I be using it? or just continue without them

tired wind Nov 16, 2022, 1:39 AM

#

Not sure about that

surreal mango Nov 16, 2022, 1:40 AM

#

its a folder/file of just a bunch of pictures of men

#

faces

#

so I assume it's similar to deepfakes in a way maybe?

#

idk for now I just wont use them

surreal mango Nov 16, 2022, 2:05 AM

#

how long would it take to train a model on a gtx 1660

#

nvrmind I just realized

surreal mango Nov 16, 2022, 3:09 AM

#

ok so I trained my dreambooth model and every background is the same almost everytime

#

how many photos should I use instead?

hot breach Nov 16, 2022, 3:19 AM

#

are your backgrounds all the same in training images? can tend to happen

#

or clothing, etc can be hard to transfer for the same reason

surreal mango Nov 16, 2022, 3:26 AM

#

I don't have any other backgrounds other than my room so I did photoshop half of them to have a white wall behind

hot breach Nov 16, 2022, 6:40 PM

#

I'd go take some photos of yourself outside in natural light

hot breach Nov 17, 2022, 3:40 AM

#

https://huggingface.co/panopstor/EveryDream/blob/main/keir_v_15vae.jpg interesting results from some new code, download and zoom in

keir_v_15vae.jpg · panopstor/EveryDream at main

final matrix Nov 17, 2022, 5:09 AM

#

Now that I've changed the names of my captions, I can suddenly no longer put hairstyle X on outfit Y

really weird. i didn't do anything except add another 100 pictures of another outfit and reduce the token length of the outfit and hairstyle tokens from several russian letters to just one and remove the english names like outfit or garments.

Now I can tell the outfits apart better and I can also prompt other outfits that werent in the show, but the hairstyles are somehow almost hardcoded to the outfits.

even (outfit x:1.3) [hairstyle a:100] doesn't change anything. I have to prompt hairstyle b of outfit x or I won't get outfit x. but then I don't get hairstyle a either...

I can do (outfit x:1.3) hairstyle a hairstyle b but then I only get a hairstyle that is a mix of both

any ideas on how to fix this?

final matrix Nov 17, 2022, 6:28 AM

#

well if i cant fix it soon ill just release my model without the ability to change hairstyles yet

because at least it can prompt all outfits on their own with no overlap and also prompt any non show outfits without overlap. that should be good enough for a V1.0 release.

final matrix Nov 17, 2022, 8:53 AM

#

new idea

i tag it as "dressed like X" "wearing Y hairstyle"

bold seal Nov 17, 2022, 11:30 AM

#

Hello, i'm about to train a dreambooth model using a 'style'. Whats the consensus on learning rate and steps for say 30 training images. Also what sort of regularization images should i use. I downloaded a few thousand images of a random 'art style', would that work or should i have SD generate images itself.

hot breach Nov 17, 2022, 3:59 PM

#

if you are using real images with captions (ex "tulips by claude monet" or "a painting of a group of people sitting at a table in a room with a light hanging from the ceiling by Vincent van Gogh") you're better off

icy vale Nov 17, 2022, 4:49 PM

#

Hey, I trained this model with Justin Pinkney's repo for emojis and have a general question - training via this method took a large amount of time (5 hours on A100) and results in a checkpoint that specifically only creates emojis.

I see people using dreambooth to create a style - which I'm about to try out. But what is the benefit of doing it the way I did with Justin's repo versus Dreambooth, which seems much much faster and cheaper? I get the textual inversion and embedding a new "object" with Dreambooth, but what are the tradeoffs with training styles?

hot breach Nov 17, 2022, 4:51 PM

#

oof 5 hours of a100?

icy vale Nov 17, 2022, 4:51 PM

#

ya lol

#

~1200 training images

#

was that too many

tired wind Nov 17, 2022, 4:52 PM

#

I've done TI, fine tuning, and dreambooth, and dreambooth results were much better than the other two. Also usually can have a good model in 1,000-2,000 steps vs like 13,000+

#

I haven't seen any consensus on number of training images. The dreambooth paper stated they only used 3-5 images for training, but that seems to be for very specific subjects (a unique person or object) rather than style

icy vale Nov 17, 2022, 4:56 PM

#

then theres really no point of ever fine tuning?

#

versus dreambooth?

tired wind Nov 17, 2022, 4:56 PM

#

I'm not clear that there is

#

At least for TI, it can be useful for combining subjects, like "person X and person Y", though you can train dreambooth on multiple things at once (haven't tried)

#

I would suggest start with Dreambooth just given the training speed and test the other stuff after, which is the reverse of what I did. The dreambooth plugin for automatic1111 is pretty painless to use

icy vale Nov 17, 2022, 4:59 PM

#

cool, thanks!

vale egret Nov 17, 2022, 5:47 PM

#

Fine tuning is fine if you have a massive and varied dataset and tons of computational power since it will forget about everything you don’t reinforce

hot breach Nov 17, 2022, 5:49 PM

#

vale egret Fine tuning is fine if you have a massive and varied dataset and tons of computa...

take a look at this: https://huggingface.co/panopstor/EveryDream/blob/main/keir_v_15vae.jpg https://huggingface.co/panopstor/EveryDream/blob/main/keir_v_15vae_grid2.jpg

vale egret Nov 17, 2022, 5:50 PM

#

Why?

hot breach Nov 17, 2022, 5:50 PM

#

up to you! click, or don't

icy vale Nov 17, 2022, 6:29 PM

#

vale egret Fine tuning is fine if you have a massive and varied dataset and tons of computa...

what do you mean by "forget about everything you don’t reinforce"

vale egret Nov 17, 2022, 6:35 PM

#

When you train an AI with a certain dataset, it starts assuming that the dataset contains everything it will ever need to know. That’s why embeddings are so popular, they let you put your character into any scene you want, doing whatever you want. But if you fine tune, you can only create combinations of scenes that exist in the training data

Dreambooth gets around this problem somewhat, as it retains more of its previous knowledge

hot breach Nov 17, 2022, 6:39 PM

#

there's a lot of space in CLIP and LD to add new knowledge, it knows potentially millions of concepts, adding a few shouldn't necessarily ruin that if your training regime is intelligent

frozen bobcat Nov 17, 2022, 6:42 PM

#

What's TI? 👀

split acorn Nov 17, 2022, 7:06 PM

#

TI usually refers to Textual Inversion alicatPog

rough shoal Nov 17, 2022, 7:07 PM

#

Can you train hypernetworks on multiple people by adding their name in their training data? Like: (slash)(character name(slash)). Do they accurately show the different characters and clothing without any mixing?

frozen bobcat Nov 17, 2022, 7:07 PM

#

split acorn TI usually refers to Textual Inversion alicatPog

Oh! Hypernetworks????

split acorn Nov 17, 2022, 7:09 PM

#

Hypernetworks, at least atm, are only good for training one (as far as I'm aware). BUT I think you could train with a dataset of two people (with those two people in all the input images) with a hypernetwork, but you're going to always get them together alicatHm2 . I haven't tried it yet, but that should work. Would it be good? very likely not. Would it be flexible? No

sullen ravine Nov 17, 2022, 7:10 PM

#

Hey, so I've been messing around with dreambooth in A1111 to moderate degrees of success. My big question is, Does anyone have experience training multiple concepts with it yet? Would love some basic introductory help with how that works as there isn't anything I can find online about it given its only been a few days.

rough shoal Nov 17, 2022, 7:11 PM

#

split acorn Hypernetworks, at least atm, are only good for training one (as far as I'm aware...

Damn, do you think if I trained a style embedding I could get close to replicating the characters I use in training? Say for a certain anime for example.

split acorn Nov 17, 2022, 7:13 PM

#

sullen ravine Hey, so I've been messing around with dreambooth in A1111 to moderate degrees of...

The stable diffusion dreambooth server has more info on that, but they primarily use a diff repo, as far as I'm aware alicatHm2

split acorn Nov 17, 2022, 7:13 PM

#

rough shoal Damn, do you think if I trained a style embedding I could get close to replicati...

Yep! I did that and that's what happened

sullen ravine Nov 17, 2022, 7:13 PM

#

I'll check it out, I appreciate it

split acorn Nov 17, 2022, 7:14 PM

#

Though... it's like character adjacent alicatKEK2

#

dm'd!

rough shoal Nov 17, 2022, 7:16 PM

#

split acorn Yep! I did that and that's what happened

What does your embedding look like?

split acorn Nov 17, 2022, 7:17 PM

#

I'm reinstalling stable-diffusion atm (getting black image bug) but I can show some examples in a moment alicatPog

rough shoal Nov 17, 2022, 7:19 PM

#

I started getting that out of nowhere a while ago, had to use --no-half-vae

split acorn Nov 17, 2022, 7:20 PM

#

o, interesting NOTED

#

it did it with or without vae though alicatHm2

#

only on img2img with larger images

frozen bobcat Nov 17, 2022, 7:38 PM

#

Is there a decent guide for embeddings? @split acorn

#

Also, TI is hypernetworks? Is it also embeddings?

vale egret Nov 17, 2022, 7:39 PM

#

TI is embeddings

#

There’s a guide on the automatic1111 wiki

split acorn Nov 17, 2022, 7:48 PM

#

TI and HN are different and yeah, there's a guide on auto's wiki

#

although I only use DreamArtist for TI right now and everything else HN / DB

#

For HN, I like this guide:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670

#

has helped a lot

tough gazelle Nov 17, 2022, 7:51 PM

#

split acorn although I only use DreamArtist for TI right now and everything else HN / DB

How do you even get dreamartist to work?

I'm following his guide trying to do a one image training to see how it is and it looks nothing like the image.

split acorn Nov 17, 2022, 7:52 PM

#

3 positive tokens, 6 negative tokens, training was 0.003 and I used a model that could do something similar to what I was looking for

#

to mmmm how many steps was it, checking

tough gazelle Nov 17, 2022, 7:53 PM

#

Yeah that's basically what I'm doing.

Was trying to follow his example of a character. But for some reason it just makes loads of group images. Even though it's a single character.

rough shoal Nov 17, 2022, 7:53 PM

#

How much Vram do you have for DreamArtist?

tough gazelle Nov 17, 2022, 7:54 PM

#

It's working on my 10GB 3080

#

As in running

split acorn Nov 17, 2022, 7:54 PM

#

I have 8GB and it worked fine

tough gazelle Nov 17, 2022, 7:54 PM

#

Uses 9.8GB VRAM when I run it

split acorn Nov 17, 2022, 7:54 PM

#

HN + image previews takes up like full 8GB though. It works, it's just tight

#

I'm also using xformers

rough shoal Nov 17, 2022, 7:55 PM

#

Really? I couldn't get it working on my 2070 super not running preview images, I don't have xformers though, I like Win7 too much.

tough gazelle Nov 17, 2022, 7:55 PM

#

I just don't understand why the dreamartist stuff isn't working. It's not just me either. There's a bunch of other people that have registered an issue saying they have similar problems

split acorn Nov 17, 2022, 7:55 PM

#

I had to edit the code

tough gazelle Nov 17, 2022, 7:55 PM

#

rough shoal Really? I couldn't get it working on my 2070 super not running preview images, I...

You're just causing yourself a bunch of issues doing that really

split acorn Nov 17, 2022, 7:55 PM

#

also reconstruction is broken atm, iirc

#

so I left it unchecked

tough gazelle Nov 17, 2022, 7:56 PM

#

I'm not using reconstruction.

What code did you have to edit?

split acorn Nov 17, 2022, 7:59 PM

#

oh, the code edit was just part of the ui.py, it's really only important when uninstalling and they already fixed it

#

basically it'd edit the ui.py directly and it would break SD alicatKEK2

rough shoal Nov 17, 2022, 8:00 PM

#

How do you know when to stop training an embedding or hypernet?

split acorn Nov 17, 2022, 8:01 PM

#

image previews helps a lot

#

there's also people that will use the loss data to see the learning curve

#

and then base it off of whether or not the training is still learning or not

#

you can get that info through the csv (if enabled in the settings)

#

a couple people have made py's to convert the info
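
A rough sketch of that loss-curve approach: read the logged CSV, smooth the loss with a moving average, and check whether it is still improving. The column names `step` and `loss`, the window size and the tolerance are all assumptions for illustration, and `smoothed_losses` / `has_plateaued` are hypothetical helpers rather than any of the scripts mentioned above.

```python
import csv
import io


def smoothed_losses(csv_text, window=3):
    """Return [(step, mean loss over the last `window` rows), ...]."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    steps = [int(row["step"]) for row in rows]
    losses = [float(row["loss"]) for row in rows]
    smoothed = []
    for i in range(window - 1, len(losses)):
        chunk = losses[i - window + 1 : i + 1]
        smoothed.append((steps[i], sum(chunk) / window))
    return smoothed


def has_plateaued(smoothed, tolerance=0.01):
    """True when the latest smoothed loss improved by less than `tolerance`."""
    if len(smoothed) < 2:
        return False
    return smoothed[-2][1] - smoothed[-1][1] < tolerance


# Made-up numbers, only to show the shape of the data.
sample = """step,loss
500,0.30
1000,0.20
1500,0.16
2000,0.15
2500,0.15
3000,0.15
"""
curve = smoothed_losses(sample)
print(has_plateaued(curve))  # True
```

Per-step loss tends to be noisy in this kind of training, so a check like this is best treated as a hint to go look at the preview images, not as a stopping rule on its own.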

rough shoal Nov 17, 2022, 8:03 PM

#

So either train until the preview image looks bad and then train the previous checkpoint at a lower rate until nothing changes, or watch for the loss to go above a certain number? How much loss is too much loss?

tough gazelle Nov 17, 2022, 8:04 PM

#

@split acorn how many steps were you doing then? Maybe I'm not doing enough. Although I'm doing the same as what he put in his examples.

split acorn Nov 17, 2022, 8:05 PM

#

I went up to 3600

#

dreamartist worked pretty quickly though

tough gazelle Nov 17, 2022, 8:07 PM

#

His examples say 8000

#

And I'm at 8000 and it's still shit lol

split acorn Nov 17, 2022, 8:07 PM

#

Oh, you should notice it being pretty good a lot earlier than that

tough gazelle Nov 17, 2022, 8:07 PM

#

Are you trying to do a subject or a style, got any examples?

split acorn Nov 17, 2022, 8:07 PM

#

it might be better higher than 3600, that's just when I decided to stop

#

it was subject

#

ye one sec

tough gazelle Nov 17, 2022, 8:10 PM

#

Yeah I just get complete shite out of it

split acorn Nov 17, 2022, 8:10 PM

#

The middle is the original and the other two are from dream artist alicatHm2 this is a rough example though.

tough gazelle Nov 17, 2022, 8:10 PM

#

What model you use to train?

split acorn Nov 17, 2022, 8:11 PM

#

D&D merged with Any3

tough gazelle Nov 17, 2022, 8:11 PM

#

What about initialiser text

split acorn Nov 17, 2022, 8:11 PM

#

think I used halfling or 1boy alicatHm2 I don't recall

tough gazelle Nov 17, 2022, 8:12 PM

#

The "results" I get aren't even consistent

#

This was the input image

39-Chisato-Nishikigi-Wears-Japanese-Clothes-768x727.jpg

split acorn Nov 17, 2022, 8:12 PM

#

I used [filename] as the training txt

tough gazelle Nov 17, 2022, 8:12 PM

#

Yet the outputs are shit like this

split acorn Nov 17, 2022, 8:12 PM

#

and I used the sd tagger to tag all my images

#

ooo wow

#

that's not even close

tough gazelle Nov 17, 2022, 8:12 PM

#

Sometimes it does this

#

#

That's probably the closest it's gotten

split acorn Nov 17, 2022, 8:13 PM

#

I tagged them, see example:
solo, looking at viewer, smile, short hair, bangs, brown hair, 1boy, long sleeves, hair between eyes, brown eyes, jewelry, sitting, flower, ahoge, male focus, earrings, boots, outdoors, pointy ears, belt, pants, bag, grin, tree, depth of field, blurry background, brown footwear, knee boots, grass, ear piercing, pink flower, bracer, wall, brown pants, brick wall

tough gazelle Nov 17, 2022, 8:13 PM

#

As in you tagged the images?

split acorn Nov 17, 2022, 8:13 PM

#

and then the txt file only contains [filename]

#

yeah

#

example

📎 Hal5.txt

tough gazelle Nov 17, 2022, 8:14 PM

#

I guess I can try that

split acorn Nov 17, 2022, 8:14 PM

#

according to the person though you don't need to do that for it to work

rough shoal Nov 17, 2022, 8:14 PM

#

tough gazelle This was the input image

Have you tried sharpening this image and trying it again?

split acorn Nov 17, 2022, 8:14 PM

#

but it's just a habit from other TI, HN, and DB

tough gazelle Nov 17, 2022, 8:15 PM

#

rough shoal Have you tried sharpening this image and trying it again?

I wouldn't have thought it would make much of a difference

#

I've done Hypernetworks before with Similar images and it works perfectly fine

split acorn Nov 17, 2022, 8:15 PM

#

it might, it's kinda blurry

tough gazelle Nov 17, 2022, 8:15 PM

#

It's just this Dreamartist thing that doesn't work

split acorn Nov 17, 2022, 8:15 PM

#

the tagging / txt file though might make a big diff

tough gazelle Nov 17, 2022, 8:15 PM

#

Let me try get a clearer image

rough shoal Nov 17, 2022, 8:16 PM

#

I noticed with sharper anime screencaps the deepdanbooru picks up almost everything perfectly, maybe upping the clarity will be better for the ai to understand what it's training.

rough shoal Nov 17, 2022, 8:17 PM

#

tough gazelle I've done Hypernetworks before with Similar images and it works perfectly fine

Similar images, you're only training with one this time though right? It's got to be the cream of the crop of images.

tough gazelle Nov 17, 2022, 8:17 PM

#

It can't look that good as the resolution has to be low

tough gazelle Nov 17, 2022, 8:24 PM

#

split acorn I used [filename] as the training txt

[filename] or [name]? The examples have [name] in them

bold seal Nov 17, 2022, 8:26 PM

#

if i'm merging two checkpoints together.. one style and one person.. weighted sum seems like the obvious choice?

split acorn Nov 17, 2022, 8:27 PM

#

[filename]

#

that's what I did, at least

split acorn Nov 17, 2022, 8:30 PM

#

bold seal if i'm merging two checkpoints together.. one style and one person.. weighted su...

The other thing to try would be Primary: Model / Secondary: Person / Tertiary: The ckpt that "Person" was trained on. Add difference 1?
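For reference, the two merge modes mentioned here come down to simple arithmetic on the weights. The sketch below assumes the usual definitions (weighted sum: A * (1 - M) + B * M; add difference: A + (B - C) * M) and uses tiny dicts of floats in place of real checkpoint tensors. The function names and the example values are made up for illustration.

```python
def weighted_sum(a, b, m):
    """Blend two weight dicts: A * (1 - M) + B * M."""
    return {k: a[k] * (1 - m) + b[k] * m for k in a}


def add_difference(a, b, c, m):
    """Add what B learned on top of C into A: A + (B - C) * M."""
    return {k: a[k] + (b[k] - c[k]) * m for k in a}


# Toy stand-ins for checkpoints (hypothetical values, one "weight" each).
style_model = {"w": 1.0}    # Primary: the style model
person_model = {"w": 0.6}   # Secondary: the person model
person_base = {"w": 0.5}    # Tertiary: the ckpt the person model was trained on

blended = weighted_sum(style_model, person_model, 0.5)
merged = add_difference(style_model, person_model, person_base, 1.0)
print(blended, merged)
```

With add difference at multiplier 1, only the change between the person model and its base is added, which is why the tertiary slot should hold the checkpoint the person model started from.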

#

Still looking for a model merging guide

tough gazelle Nov 17, 2022, 8:32 PM

#

If I try use [filename] in the project template I get this error
AttributeError: 'NoneType' object has no attribute 'detach'

#

You sure it's not [filewords] ?

split acorn Nov 17, 2022, 8:33 PM

#

oh sorry, yeah

#

I was just double checking now

#

[filewords] yeah

#

and the input image file name and the txt file share the same name in the same folder

tough gazelle Nov 17, 2022, 8:34 PM

#

That's still giving me errors....

split acorn Nov 17, 2022, 8:34 PM

#

and then the txt file has all the tags

tough gazelle Nov 17, 2022, 8:34 PM

#

I can use the inbuilt subject one fine

#

But Filename or Filewords don't work

split acorn Nov 17, 2022, 8:34 PM

#

with your dataset folder, does it contain the image and the txt file?

tough gazelle Nov 17, 2022, 8:34 PM

#

Yeah

split acorn Nov 17, 2022, 8:34 PM

#

huh odd

#

I've done over a dozen like this and I've never seen that

tough gazelle Nov 17, 2022, 8:35 PM

#

His instructions say not to use filewords

split acorn Nov 17, 2022, 8:36 PM

#

and the Prompt template file is pointing to the txt that only has [filewords] in it?

tough gazelle Nov 17, 2022, 8:36 PM

#

Yeah

#

2 seconds there's an update to DreamArtist, let me try again

split acorn Nov 17, 2022, 8:36 PM

#

Well it works so I don't know what to tell them

#

maybe it works better if you don't use filewords but

#

this is more "something to try since the default isn't working" kind of thing

tough gazelle Nov 17, 2022, 8:37 PM

#

Well it doesn't work at all at the moment, so willing to try anything to see if it makes a difference.

split acorn Nov 17, 2022, 8:37 PM

#

yeah

tough gazelle Nov 17, 2022, 8:37 PM

#

are you using CFG Scale 3 on the training like he recommends?

#

Ok something isn't right. It works when I use the Subject_filewords.txt file. So it's not like it can't read the filewords.

#

Maybe I need to do [name] [filewords] otherwise it's going to complain the embedding word isn't in there

#

ok it runs when I put [name], [filewords]
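A rough sketch of the setup discussed above: a prompt template containing both [name] and [filewords], and a dataset folder where every image has a .txt caption file with the same base name. This is not the webui's actual code; the function names and example strings are hypothetical, and the substitution is only an approximation of what the trainer does.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def fill_template(template: str, name: str, filewords: str) -> str:
    """Substitute the embedding name and the caption text into one template line."""
    return template.replace("[name]", name).replace("[filewords]", filewords)


def images_missing_captions(folder):
    """Return image file names in `folder` that have no same-named .txt file."""
    folder = Path(folder)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and not p.with_suffix(".txt").exists()
    )


# Template that ran without errors in the discussion above.
prompt = fill_template("[name], [filewords]", "my_embedding", "1girl, blonde hair")
print(prompt)
```

Running images_missing_captions on the dataset folder before training is a quick way to rule out a missing or misnamed caption file.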

#

Lets see if this makes any difference and if it looks anything like the image.

rough shoal Nov 17, 2022, 8:41 PM

#

Hmm, looks like my suspicions were right: a single from above perspective image was squashing my results. A shame because the squashed version visually looks better. I hope I can recover it with a little more training but anything after 28000 gets blown out at 5e-06 while I used it throughout the whole 30k with the old ver.

02912-443435399-best_quality_masterpiece_1.3_1girl_albino_neck_bags_under_eyes_bangs_closed_mouth_white_eyelashes_colored_eyelashes.jpeg

#

The problem image.

split acorn Nov 17, 2022, 8:49 PM

#

tough gazelle are you using CFG Scale 3 on the training like he recommends?

I used 5

tough gazelle Nov 17, 2022, 8:51 PM

#

Hmm well the preview is looking better so far

#

Used this as the training image

#

Preview after 1000 steps

#

At least it's actually a girl with blonde hair this time

#

And not a temple

frozen bobcat Nov 17, 2022, 9:19 PM

#

Ok, so I've used Dreambooth to train on one specific character (using waifudiff3 model). The resulting ckpt NAILS the character, being able to pretty much apply any anime style to that character. Looks fabulous when it comes to drawing the original character in (for example) CLAMP style.

Created a hypernetwork using the same character images. When applied over the trained ckpt, I can almost replicate the exact style with that exact character with the hypernetwork effect turned down to .5.

What are the benefits of training with a TI? The resultant files are so tiny. And the effects of TI embeddings i downloaded have almost ZERO effect on anything.

Is there a decent guide for TI?

vale egret Nov 17, 2022, 10:45 PM

#

tough gazelle Hmm well the preview is looking better so far

So you figured out the problem?

tough gazelle Nov 17, 2022, 10:45 PM

#

It looked a little better, but still didn't look enough like the model.

#

Might as well just use Dreambooth, takes about the same amount of time and looks a lot better.

solemn notch Nov 17, 2022, 11:18 PM

#

frozen bobcat Ok, so I've used Dreambooth to train on one specific character (using waifudiff3...

To use a TI you have to use the keyword. In auto1111 the filename is the keyword: foobar.pt -> "painting by foobar" / "painting of foobar".

it's basically teaching SD what a particular word means in the context of everything it already knows; the advantage is that they should theoretically be mixable, if they were trained on the same starting model.

frozen bobcat Nov 17, 2022, 11:34 PM

#

solemn notch To use a TI you have to use the keyword. in auto1111 the filename is the keyword...

Thanks.
I'd like to create an embedding for one specific character.
I've chosen this as an initialization text: 1boy, demon boy, bat wing, golden horns, blue eyes, jumpsuit,
I have no idea what *Number of vectors per token * means.
Some people are training one character but they chose 10 tokens.
What does that mean?
I just want to be able to consistently reproduce my one character as close as possible.

solemn notch Nov 17, 2022, 11:51 PM

#

frozen bobcat Thanks. I'd like to create an embedding for one specific character. I've chosen...

So, when you type some words into a prompt for stable diffusion, the first step the computer does is translating those words into a series of tokens. For some common words and symbols, a word corresponds to one token. For other, less common words, it could be several tokens.

Textual Inversion basically trains a small set of new embedding vectors that SD uses whenever it sees your keyword. The number of vectors per token tells it how many token slots the keyword gets to store what it learned.

You have a limit on the number of tokens you can include in a prompt, which is why you can't just set it to 100 and assume it'll figure something out... but in theory, the more tokens you let it play with, the better the results should be.

#

like many things in machine learning, you may need to play with it a bit before getting great results.
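One way to read the "number of vectors per token" field is as a cost against the prompt budget. The sketch below assumes a window of 75 usable token slots (SD 1.x text encoder, 77 minus the start and end markers) and counts every vector of the embedding as one slot. The helper name and the example numbers are hypothetical.

```python
PROMPT_WINDOW = 75  # assumed usable slots per prompt chunk for SD 1.x


def remaining_slots(vectors_per_token: int, other_prompt_tokens: int,
                    window: int = PROMPT_WINDOW) -> int:
    """Slots left after the embedding keyword and the rest of the prompt."""
    used = vectors_per_token + other_prompt_tokens
    return window - used


# Same 20-token prompt, embeddings of different sizes.
for n in (1, 8, 10):
    print(n, "vectors ->", remaining_slots(n, other_prompt_tokens=20), "slots left")
```

So an embedding trained with 10 vectors gives training more room to capture the character, but it takes 10 slots every time the keyword appears in a prompt.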

final matrix Nov 18, 2022, 6:09 AM

#

final matrix Now that I've changed the names of my captions, I can suddenly no longer put hai...

okay the problem is probably that the model can't use Cyrillic or other non-English characters. other users noticed that too, and I couldn't prompt anything after I had switched everything to Cyrillic

I changed my captions to "wearing X outfit" where X is a two-token word, e.g. default outfit -> defa

that should work brilliantly. If it does, then I should have eliminated all the problems my model has ever given me except for the training itself, i.e. which learning rate and repeats are best. 1e-7 was still a bit too high for my 1100 pictures. I'm trying it now with 7.5e-8 and if that works I'll continue to train until I have something usable.

frozen bobcat Nov 18, 2022, 6:35 AM

#

solemn notch So, when you type some words into a prompt for stable diffusion, the first step ...

I am too stupid to connect what you said with how it directly relates to what I should put in that field when starting to train an embedding. But your kindness is appreciated nonetheless. Thank you.

peak ridge Nov 18, 2022, 1:36 PM

#

i want to train a new concept to SD, various actions (thinking martial arts) - what's the best way of doing it? embedding, hypernetwork or dreambooth?

wooden shuttle Nov 18, 2022, 3:38 PM

#

You mean like, martial arts poses?

stone garden Nov 18, 2022, 5:24 PM

#

split acorn The middle is the original and the other two are from dream artist...

what learning rate did you use and how many steps? did you also use "train with reconstruction"?

#

i really want to get good results with dreamartist

split acorn Nov 18, 2022, 5:25 PM

#

Train with reconstruction was broken when I trained it

#

not sure if it was fixed or not

#

I didn't use it

#

Steps was only 3600, I know they recommend more, but for what I was wanting 3600 seemed pretty good, might try longer but, I'd rather just hypernetwork with cherry picked generations

#

I used the recommended learning rate

stone garden Nov 18, 2022, 5:27 PM

#

hmm okay thank you! 😄

split acorn Nov 18, 2022, 5:28 PM

#

0.003 was the learning rate

#

CFG was 5

#

Looks like they have a bunch of examples now
https://github.com/7eu7d7/DreamArtist-sd-webui-extension#pre-trained-embeddings

vale egret Nov 18, 2022, 6:02 PM

#

split acorn 0.003 was the learning rate

do you remember what the issue was arron17 was having?

tough gazelle Nov 18, 2022, 6:23 PM

#

vale egret do you remember what the issue was arron17 was having?

My Issue was it just not giving me results that were any good

#

After tagging the image it was a little better. But it still was nowhere near as close as the devs examples

split acorn Nov 18, 2022, 6:57 PM

#

Could be the input image, it seems kinda hard to learn from

#

They have a bunch of examples; if you follow the various settings they list and it still doesn't produce similar results, it's probably the input image

peak ridge Nov 18, 2022, 8:27 PM

#

wooden shuttle You mean like, martial arts poses?

yea, martial arts poses

icy vale Nov 18, 2022, 8:42 PM

#

is there a good tutorial for training a style on dreambooth?

versed oriole Nov 18, 2022, 10:14 PM

#

split acorn Looks like they have a bunch of examples now...

yeah he's improved the documentation on that page quite a bit, I think i'll try it again now there are examples of LR and steps.

#

I just wonder if when he says to use EMAs just loading the "full weight" versions of the models is sufficient, or if I should also use a .cfg that actually loads them in (doubles ram usage)

#

it's heavy enough to prevent 512 training and reconstruction loss with 24gb lol.

golden moon Nov 19, 2022, 3:50 AM

#

Any idea if v1-5-pruned.ckpt is available with 840k VAE baked in?

#

I know there's a script that unpacks the checkpoint where you could include the VAE yourself, but it's redundant work if it is already up somewhere

steady heath Nov 19, 2022, 3:54 AM

#

Can anyone tell me where do i start if i wanted to train my own model using WaifuDiffusion as a base model?

final matrix Nov 19, 2022, 6:55 AM

#

final matrix okay the problem is probably that the model can't use Cyrillic or non-english ch...

i can confirm now that replacing the Cyrillic letters with a unique two-token English letter combination has worked and enabled me to distinguish all the outfits and hairstyles and freely combine them as i see fit

this is from a 35rpt ckpt hence it still looks very weird but one can clearly see the divergence

xy_grid-0001-628810579-Korra_wearing_runa_outfit_stada_hairstyle.png

xy_grid-0000-3364256803-Korra_wearing_defa_outfit_stada_hairstyle.png

#

and giving the outfits and hairstyles to other people, in this example Emma Watson, also works (again, only 35rpt hence it looks so bad)

xy_grid-0003-917138351-Emma_Watson_wearing_defa_outfit_stada_hairstyle.png

#

i just noticed an oopsie
i forgot to crop and resize the last images i added to my dataset
luckily it works fine if you dont use the exact way i captioned the images, but instead just make a more generic prompt like "photo of X wearing Y" etc
wont restart training for this though. its just 30 images and a minor issue. ill fix that in my first post-release version.

https://cdn.discordapp.com/attachments/1041753916381085717/1043423509910655006/grid-0070.png
https://cdn.discordapp.com/attachments/1041753916381085717/1043423510195863622/image.png
https://cdn.discordapp.com/attachments/1041753916381085717/1043423510535610438/grid-0072.png

stone garden Nov 19, 2022, 9:39 AM

#

I made a small UI for Shivam if anybody interested. "pip install easygui" in your local Shivam env should be enough to have it running

📎 train_ui.py

rough shoal Nov 19, 2022, 10:19 AM

#

What do you put in initialization text and vector tokens if you're training a style embedding?

stone garden Nov 19, 2022, 11:19 AM

#

rough shoal What do you put in initialization text and vector tokens if you're training a st...

initialization text should be the keyword you want to use. vector tokens will control how many tokens your keyword will cost when you use it in a prompt, but can also mean better quality. 8 is a commonly used value
More on this : https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Textual-Inversion#explanation-for-parameters

rough shoal Nov 19, 2022, 11:21 AM

#

stone garden initialization text should be the keyword you want to use. vector tokens will c...

So it doesn't strictly matter what it is if I was making a style? Just '(name of the artist)style' is fine?

stone garden Nov 19, 2022, 11:23 AM

#

rough shoal So it doesn't strictly matter what it is if I was making a style? Just '(name of...

it does matter a little in my experience :

you won't be able to use that keyword normally when that embedding is loaded, so if you choose "photo" as keyword, you'll have a hard time prompting without it
the training will be easier or harder depending on that keyword, at least a little. It won't change that much final results, but if you used "tree" as keyword, it will need more steps to understand it's not the same "tree" that it knew of. So using a keyword related to what you are training is easier usually

final matrix Nov 19, 2022, 12:37 PM

#

00080-2392364825-Korra_facial_closeup_shoa_hairstyle.png

00086-3132453613-full-shot_Emma_Watson_wearing_runa_outfit_shoa_hairstyle_standing_in_city_tlok_artstyle.png

00088-519272223-photo_full-shot_Emma_Watson_wearing_runa_outfit_shoa_hairstyle_standing_in_city.png

00092-1408001005-photo_1.3_full-shot_Korra_wearing_runa_outfit_shoa_hairstyle_standing_in_city.png

#

my legend of korra model is likely release-ready by tomorrow evening

i restarted training to fix some caption issues as well as add 10 more images of smug korra (lol) so that delays things somewhat but i think by tomorrow i should have a releaseable model.

some outfits also need more training still (the above screenshots are with an outfit i have more training data for so its easier to prompt) so the model needs to bake longer in the oven which further delays things. i am also very careful with repeats and learning rates here to not overtrain (too much). in certain cases cfg4 must be used but thats a sacrifice i am willing to make.

the entire model dataset now encompasses 1142 manually captioned images.

the release version will allow you to output anyone and anything in the legend of korra artstyle as well as any person in Korras outfits, as well as prompt Korra with any of her outfits as well as any other non show outfit as well as prompt her in any artstyle.

#

I have been working on this model for the past 4 weeks and spent hundreds of euros generating dozens upon dozens of ckpt's to test any kind of changes to repeats, learning rates, captions, classes, tokens, regs etc... as well as test different repos.

I have now settled on no class, no token, manually captioned images, JoePennas repo. Repeats and learning rates are still being tested but the above results were with an earlier model with 10 repeats at 3e-06, 25 repeats at 2e-06, 60 repeats at 1e-06. Some outfits still needed more training.

It took me at least 50 real working hours building up that dataset (despite using a bulk image downloader) and manually captioning it (despite using a bulk file rename tool).

I may create future versions adding the other characters from the show with better captions (for example location captions are extremely rudimentary right now).

I will release the dataset alongside the ckpt.
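The staged settings quoted in this message (10 repeats at 3e-06, 25 repeats at 2e-06, 60 repeats at 1e-06) can be written down as a simple lookup. This sketch assumes the stages run one after another, which is a reading of the message rather than something it states outright; the names are hypothetical and no particular training repo's API is implied.

```python
# (number of repeats, learning rate) per stage, as quoted in the message.
STAGES = [(10, 3e-06), (25, 2e-06), (60, 1e-06)]


def lr_at_repeat(repeat_index: int, stages=STAGES) -> float:
    """Learning rate for a 0-based repeat index, assuming consecutive stages."""
    end = 0
    for count, lr in stages:
        end += count
        if repeat_index < end:
            return lr
    raise IndexError("repeat index is beyond the end of the schedule")


schedule_length = sum(count for count, _ in STAGES)
print(schedule_length, lr_at_repeat(0), lr_at_repeat(10), lr_at_repeat(35))
```

Under that reading, a checkpoint saved after the first two stages has seen 35 repeats in total.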

stone garden Nov 19, 2022, 2:28 PM

#

final matrix I have been working on this model for the past 4 weeks and spent hundreds of eur...

Thanks a lot for this telling of your process/work. I feel less alone on the same kind of things too, captioning hours upon hours, scraping the web, filtering, ...
Joe's repo is really good. I use Everydream, in the same kind of process, if you like Joe you may like it too

Those results are really cool, though i can't really appreciate them at their full value, having not watched Legend of Korra.

I never tried doing such small epochs and micro managing the learning rate like that. I start on 1e-6 and stay there, monitoring the tensorboard to see when to stop, and epochs of 50 repeats right now, but maybe I'm on bigger datasets than you. I'm really impressed by the quality on so "few" steps tbh

Thanks again for this diary, keep it going if you feel like it, very instructive

final matrix Nov 19, 2022, 3:01 PM

#

stone garden Thanks a lot for this telling of your process/work. I feel less alone on the sam...

how big is your dataset?
how many repeats until you had a good result?

stone garden Nov 19, 2022, 3:06 PM

#

final matrix how big is your dataset? how many repeats until you had a good result?

I'm in full experimental phase, i have tested datasets of 10 to 250 pictures with full caption by hand.
I use very diverse datasets (part of it is illustration style, but I also teach other concepts at the same time, some photorealistic, some not), trying to get diversity in the datasets so they can act as their own regularisation and preserve the rest of the model.

I start getting ok results once i hit epoch 4 usually, but for the multi concepts to all take at the same time and mix together, i push toward epoch 12 (50 repeats)

My last test was 100 pictures with 3 diverse datasets, trained for 100k steps, didn't bleed over the rest of the model, and all concepts are learned from what my tests got me. First ok checkpoint was on epoch 12, and then 15 and 17. Best one was 15, so 750 repeats total

#

For example my last one had 3 concepts. Rick roll, Bad cosplay man and Magic powers. I can't get Rick roll with powers but the other cross over of concepts worked

final matrix Nov 19, 2022, 3:16 PM

#

stone garden I'm in full experimental phase, i have tested datasets of 10 to 250 pictures wit...

epoch 4 meaning 200 repeats at 1e-6 yes?

#

epoch 12 damn

#

mine would be long overtrained by then

stone garden Nov 19, 2022, 3:17 PM

#

I work a lot on the regularisation

#

Each picture is also captioned with random tokens that are present in it, like a fence, the keyword "portrait", a chair... I try to add more concepts in each so it can't overtrain easily

#

It makes it learn slower since the attention gets divided

#

But it prevents bleeding quite well

#

I was surprised on the e17 that didn't bleed all over tbh

#

That's 850 repeats already
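The repeat counts quoted in this exchange all follow from epochs of 50 repeats each. A tiny sketch of that arithmetic (the function name is made up):

```python
REPEATS_PER_EPOCH = 50  # "epochs of 50 repeats", as stated earlier in the thread


def repeats_after(epoch: int, repeats_per_epoch: int = REPEATS_PER_EPOCH) -> int:
    """Total repeats seen once the given epoch has finished."""
    return epoch * repeats_per_epoch


for epoch in (4, 12, 15, 17):
    print("epoch", epoch, "->", repeats_after(epoch), "repeats")
```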

final matrix Nov 19, 2022, 3:20 PM

#

yes my captions are usually like this

"Korra, defa outfit, stada hairstyle, smug, half-body; exterior, day; tlok artstyle"

i find it weird you need that extreme amount of repeats though for your concepts to work. at the aforementioned repeats setting i am already able to freely combine concepts as i want to. just the likeness still has some issues.
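The caption format shown above groups tags into subject, scene and style parts, with commas inside a group and semicolons between groups. A minimal sketch of building such captions consistently across a large dataset; the grouping into three parts is inferred from the single example, and the function name is hypothetical.

```python
def build_caption(subject_tags, scene_tags, style_tags):
    """Join tags with ', ' inside a group and '; ' between non-empty groups."""
    groups = [subject_tags, scene_tags, style_tags]
    return "; ".join(", ".join(group) for group in groups if group)


caption = build_caption(
    ["Korra", "defa outfit", "stada hairstyle", "smug", "half-body"],
    ["exterior", "day"],
    ["tlok artstyle"],
)
print(caption)
```

Generating captions from structured tags like this makes later renames (for example swapping one short outfit token for another) a one-line change instead of a bulk file edit.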

stone garden Nov 19, 2022, 3:21 PM

#

You are using one single direction though: Korra.
I'm pushing in diverse directions. Bad cosplay man has nothing to do with the other concepts, and is present in only 30% of the dataset. I wasn't surprised to not see it learned well very soon

final matrix Nov 19, 2022, 3:21 PM

#

but i also have 1142 images so that may be a factor. also batch size 5

stone garden Nov 19, 2022, 3:21 PM

#

I'm on batch size 6, but lot less images yeah

final matrix Nov 19, 2022, 3:22 PM

#

stone garden You are using one single direction though: Korra. I'm pushing in diverse directi...

well thats not true. i am training on 10+ outfits simultaneously + 300 diverse style images (random screenshots from the show)

stone garden Nov 19, 2022, 3:22 PM

#

Very instructive though, I'm getting so much info from our differences in workflow

stone garden Nov 19, 2022, 3:23 PM

#

final matrix well thats not true. i am training on 10+ outfits simultaneously + 300 diverse st...

Sorry true, but all of it pertains to the same universe and style no ?

final matrix Nov 19, 2022, 3:24 PM

#

i havent had enough repeats to test this yet but my latest model switched "photo" for "cosplay". my thinking here is that the training still understands that those images are photos, and thus they help with training, but captioning them as cosplay instead of photo will make it so that prompting "photo" doesnt just pull up the cosplay training data instead of random photos

i did the same with changing digital art to fanart and it worked

#

also changed figurine to merchandise

stone garden Nov 19, 2022, 3:24 PM

#

Here is the tensorboard for loss value on that last training

final matrix Nov 19, 2022, 3:25 PM

#

stone garden Sorry true, but all of it pertains to the same universe and style no ?

well 10-20% of my training data is photos and such

final matrix Nov 19, 2022, 3:26 PM

#

stone garden Here is the tensorboard for loss value on that last training

i find these loss metrics so useless. for me it seems to always stay at 0.100 loss no matter what i do. only in the first 20 or so repeats its at a very high number and then it just jumps down to a low loss number like 0.060 randomly for a few steps here and there

stone garden Nov 19, 2022, 3:26 PM

#

This may also be why you get overtraining too, 80% pics on the main subject?
By the way, why so many pics? You still manage to find diversity and new things to add in the 1000+th picture?

stone garden Nov 19, 2022, 3:27 PM

#

final matrix i find these loss metrics so useless. for me it seems to always stay at 0.100 lo...

I'm not entirely sure how to read them, and have not been able to find good info I'm able to understand that would help me make sense of it yet

final matrix Nov 19, 2022, 3:46 PM

#

stone garden This may also be why you get overtraining too, 80% pics on the main subject? By ...

yes

You still manage to find diversity and new things to add in the 1000+th picture

#

youll see when i upload the final dataset

#

00095-4129780592-facial_closeup_Korra_shoa_hairstyle_wearing_admirals_uniform_19th_century_oil_painting_style.png

final matrix Nov 19, 2022, 4:03 PM

#

with negative prompt of the artstyle

00096-3889473095-Korra_shoa_hairstyle_wearing_admirals_uniform_19th_century_oil_painting_style.png

final matrix Nov 19, 2022, 4:07 PM

#

final matrix

also is it just me or do those hands look better than vanilla SD? if so, it might be because i have quite a couple shots with visible hands in my training data.

stone garden Nov 19, 2022, 4:15 PM

#

I have had the same results : if I have at least 5% of my dataset with clear shots of hands, even a handshot or two, hands are drastically improved in the resulting model
I have a list of little concepts like that that I try to put in all my datasets, mostly hands and framing (headshot, fullbody shot, halfbody shot, training those keywords a little helps a LOT)
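A quick way to check a rule of thumb like "at least 5% of the dataset shows hands" is to count how many captions mention a tag. The sketch below assumes captions are already loaded as strings; the function name and the example captions are made up.

```python
def tag_coverage(captions, tag: str) -> float:
    """Fraction of captions that contain `tag` (case-insensitive substring match)."""
    if not captions:
        return 0.0
    hits = sum(1 for caption in captions if tag.lower() in caption.lower())
    return hits / len(captions)


# Hypothetical captions for illustration only.
captions = [
    "portrait of a man, headshot",
    "close shot of hands holding a cup",
    "fullbody shot, woman walking",
    "halfbody shot, man waving, hands visible",
]
print(f"hands: {tag_coverage(captions, 'hands'):.0%}")
print(f"fullbody: {tag_coverage(captions, 'fullbody'):.0%}")
```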

final matrix Nov 19, 2022, 4:24 PM

#

final matrix also is it just me or do those hands look better than vanilla SD? if so, it migh...

maybe not lol

final matrix Nov 19, 2022, 4:27 PM

#

stone garden I have had the same results : if I have at least 5% of my dataset with clear sho...

yes i added "full-shot", "half-body", closeup and facial closeup manually to all 1142 images

#

and "back"

final matrix Nov 19, 2022, 5:02 PM

#

i might have found a somewhat workable fix for the "overtrain" effect all my images have (despite not actually being overtrained)
literally just adding "photo" to the negative prompt as well as the positive prompt lol
example, first image is photo only as positive prompt, second image is photo also as negative prompt:

(the weird cropping is from an oopsie of forgetting to correctly crop some images, thats fixed in my newest version, but that one doesnt have enough repeats yet for testing)

humble crane Nov 19, 2022, 5:03 PM

#

Hey yall. I installed stable diffusion but when I go to the local url it says this site cant be reached

#

And when i run the webui-user.bat it says file is most likely corrupted

crimson wasp Nov 19, 2022, 7:00 PM

#

final matrix and giving the outfits and hairstyles to other people, in this example Emma Wats...

That is absolutely incredible. It took outfits and hairstyles only shown in 2D and made photorealistic versions of them

crimson wasp Nov 19, 2022, 7:05 PM

#

final matrix I have been working on this model for the past 4 weeks and spent hundreds of eur...

I made this tool to try to make captioning much faster when working with huge amounts of images, though currently it stores all the original filename locations / tagged text / selected image bounds in a text file and I haven't updated any repo to use that as a datasource yet https://github.com/CodeExplode/Image-Tagger

e.g. You could drag in all your images, click batch tag, and tag them all with Korra to have it added to the caption for every image, then search for images with the word Korra, and if you want to rename a token, search for all images with that tag and make a batch (ctrl b), and rename it with batch tagging enabled

hot breach Nov 19, 2022, 7:45 PM

#

everydream trainer now has some swanky jitter on cropping, accepting any aspect ratio images as long as you do NOT crop your images to 512x512, I'm seeing a pretty substantial increase in quality from this

#

Some examples of tests here training Keir Dullea and comparing it to the base 1.5 + new vae base model: https://huggingface.co/panopstor/EveryDream/blob/main/keir_v_15vae.jpg https://huggingface.co/panopstor/EveryDream/blob/main/keir_v_15vae_grid2.jpg

#

that was just a brief testing pounding Keir Dullea into the model with no sort of regularization or preservation data at all

frozen bobcat Nov 19, 2022, 7:58 PM

#

under the dreambooth tab, what does "half-model" mean???

dapper prism Nov 19, 2022, 10:37 PM

#

frozen bobcat under the dreambooth tab, what does "half-model" mean???

half precision, floating point 16. It means that it rounds the numbers and thus is less accurate
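To see the rounding described here, a value can be packed into a 16-bit float and read back. This small sketch uses only the Python standard library (the struct format code "e" is IEEE 754 half precision); the helper name is made up, and it illustrates the number format only, not what any training UI does internally.

```python
import struct


def to_fp16_and_back(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]


for value in (1.0, 0.1, 1e-06):
    rounded = to_fp16_and_back(value)
    print(f"{value!r} -> {rounded!r} (error {abs(rounded - value):.3e})")
```

Values like 1.0 survive unchanged, while 0.1 and very small numbers pick up a rounding error, which is the accuracy loss the answer refers to.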

frozen bobcat Nov 19, 2022, 10:55 PM

#

dapper prism half precision, floating point 16. It means that it rounds the numbers and thus ...

OMG FINALLY an answer!
thank you so much!!!!!

When I'm creating a new model under dreambooth, there is an option to make it fp16 under the advanced options.
But then there is ALSO a checkbox when selecting a model to load parameters from at the top.
What does that checkmark do?

#

is it just redundant?

dapper prism Nov 19, 2022, 10:57 PM

#

frozen bobcat OMG FINALLY an answer! thank you so much!!!!! Whe...

I don't use that UI to train DreamBooth models, so I'm not sure what that option does

dapper prism Nov 19, 2022, 10:57 PM

#

frozen bobcat is it just redundant?

possibly

frozen bobcat Nov 19, 2022, 10:58 PM

#

dapper prism possibly

https://tenor.com/view/namor-talocan-tlalocan-salute-greeting-gif-27029033

#

I thank you nonetheless.

woeful goblet Nov 20, 2022, 1:15 AM

#

is anyone else finding inpainting to be broken in automatic1111 ?

#

It duplicates and offsets my mask painting
https://i.imgur.com/PN10sWi.gif

plucky swan Nov 20, 2022, 2:06 AM

#

Do you guys notice some differences when using fp16 compared to fp32, mine seem to get more overfit after converting to fp16

stone garden Nov 20, 2022, 5:26 AM

#

Hi guys, am i able to dreambooth a character using the anything v3 model?

glossy rune Nov 20, 2022, 6:53 AM

#

plucky swan Do you guys notice some differences when using fp16 compared to fp32, mine seem ...

For inference (generating with auto1111 etc) we have not found a visual difference when the model is loaded in fp16. During training/dreambooth it is more likely to make a difference, but if so, still small from what we experienced. We convert every trained model to fp16 once done and have also successfully trained from fp16 checkpoints. But I’d recommend fp32 (or better tf32 or bf16) for training, if gpu allows

plucky swan Nov 20, 2022, 7:01 AM

#

glossy rune For inference (generating with auto1111 etc) we have not found a visual differen...

Im using Shivam's repo and noticed that, the diffusers based weights were in good fit on 32 img + 3.4k class + 6.4k steps, but the converted ckpt (fp16) ended up generating overfit results, i wonder why this could happen

glossy rune Nov 20, 2022, 7:21 AM

#

plucky swan Im using Shivam's repo and noticed that, the diffusers based weights were in go...

I’ve not had that experience and I don’t use Shivam. What first comes to mind:

checkpoint conversion script, I use the diffusers lib
when you compare diffusers weights and ckpt weights, do you use exactly the same setting? Different steps, scheduler, cfg scale can make a big difference.
Other than that, no idea..
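When comparing output from the diffusers weights against the converted ckpt, it helps to write both sets of generation settings down and diff them before blaming the conversion. A minimal sketch; the setting names and values are made up for illustration and do not correspond to any specific tool's config format.

```python
def diff_settings(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every setting that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical settings for the two runs being compared.
diffusers_run = {"steps": 30, "sampler": "DDIM", "cfg_scale": 7.5, "seed": 1234}
ckpt_run = {"steps": 20, "sampler": "Euler a", "cfg_scale": 7.5, "seed": 1234}

print(diff_settings(diffusers_run, ckpt_run))
```

An empty result means the two runs used the same listed settings, so any remaining difference is more likely to come from the weights themselves.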

plucky swan Nov 20, 2022, 7:33 AM

#

glossy rune I’ve not had that experience and I don’t use Shivram. What first comes to mind: ...

I tried as close as possible to the diffusers settings to replicate the issue and i used the conversion that is included in the repo,
Is the conversion script included on the huggingface diffusers repo?

glossy rune Nov 20, 2022, 7:34 AM

#

plucky swan I tried as close as possible to the diffusers settings to replicate the issue an...

https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_stable_diffusion.py

plucky swan Nov 20, 2022, 7:35 AM

#

Lemme try that later, thanks for the help 🙂

final matrix Nov 20, 2022, 7:55 AM

#

19198-4139231649-cosplay_photograph_full-shot_Korra_as_a_cop_wearing_police_uniform_stada_hairstyle.png

glossy rune Nov 20, 2022, 8:31 AM

#

stone garden I'm not entirely sure how to read them, and have not been able to find good info...

Not sure this helps, but you are looking at the tail end of something like this https://i.stack.imgur.com/kd2YE.png (the beginning is the 160k hrs of training by stability/runway/compvis).
You can probably only hope to interpret averages over many many steps/epochs.
And then it obviously depends on the repo and the respective details of the loss function, that is used in training. Not sure if that is consistent.

final matrix Nov 20, 2022, 9:01 AM

#

img2img test (and with Emma Watson also an inpainted face)

https://cdn.discordapp.com/attachments/1023293330601287711/1043813081979027476/00168-397052252-Korra_standing_city_street_wearing_defa_outfit_stada_hairstyle_tlok_artstyle.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043813082318774302/00169-2223496039-cosplay_photo_1.3_Korra_standing_city_street_wearing_defa_outfit_stada_hairstyle.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043813082645938187/00170-2696555625-photo_Emma_Watson_standing_city_street_wearing_dress.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043813082994049125/00176-2754479230-facial_closeup_Emma_Watson_standing_city_street_wearing_dress_tlok_artstyle.png

final matrix Nov 20, 2022, 11:34 AM

#

thanks to good captioning you can give Korra other clothes despite being trained hard on e.g. her series parka
https://cdn.discordapp.com/attachments/990008745981530253/1043851537115521074/20695-1158126018-cosplay_photo_Korra_wearing_parka_jeans_shoa_hairstyle.png

cloud yoke Nov 20, 2022, 11:39 AM

#

Anyone tried using multiple classes of regularization images, i.e an instance can have for example 10 reg classes?

glossy rune Nov 20, 2022, 11:54 AM

#

With EveryDream repo you stop thinking in classes and just train on good captions. Also train and reg are just all „training data“. That works well, but is not dreambooth anymore

final matrix Nov 20, 2022, 12:08 PM

#

you can make her younger and older, more or less muscular
https://cdn.discordapp.com/attachments/1023293330601287711/1043857639026741318/20740-744242808-Korra_as_a_very_old_woman_wearing_defa_outfit_shoa_hairstyle_tlok_artstyle.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043857639316135966/20716-794053591-cosplay_photo_young_teen_14_year_old_Korra_wearing_defa_outfit_shoa_hairstyle.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043857639660064848/00261-691938722-Korra_as_a_mature_40_years_old_woman_wearing_defa_outfit_shoa_hairstyle_tlok_artstyle.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043857639966265374/20751-1782472112-very_muscular_Korra_wearing_defa_outfit_stada_hairstyle_tlok_artstyle.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043857640251465878/20755-2679346538-very_weak_thin_Korra_wearing_defa_outfit_stada_hairstyle_tlok_artstyle.png

glossy rune Nov 20, 2022, 1:48 PM

#

I see a kid’s christmas present taking shape…

final matrix Nov 20, 2022, 1:56 PM

#

for a long time i had the issue of thinking my models are overtrained because especially photos would be so extremely fried. but today i have found the solution.
both images same prompt, cfg7

top image has negative prompt "white skin, tlok artstyle"
bottom image has negative prompt "white skin, blur, cosplay photo, vignette, instagram, tlok artstyle"

https://cdn.discordapp.com/attachments/1023293330601287711/1043887298737082408/grid-0222.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043887299076837386/grid-0221.png

glossy rune Nov 20, 2022, 2:05 PM

#

final matrix for a long time i had the issue of thinking my models are overtrained because es...

Pretty strong corrections possible through prompts… Steps, cfg scale etc can also do a lot. I always try to go for good visual results with very basic prompts and default settings.
My intuition would be to work on the training…

final matrix Nov 20, 2022, 2:07 PM

#

glossy rune Pretty strong corrections possible through prompts… Steps, cfg scale etc can als...

"I always try to go for good visual results with very basic prompts and default settings. My intuition would be to work on the training…"
that's the ideal, but it doesn't work in this case, because without enough training there won't be enough likeness

#

so i'd rather fix it through some negative prompts and in return have good likeness

glossy rune Nov 20, 2022, 2:08 PM

#

Yes, impressive results!

final matrix Nov 20, 2022, 3:27 PM

#

21482-753890848-realistic_detailed_pencil_drawing_Korra_wearing_defa_outfit_stada_hairstyle.png

rough shoal Nov 20, 2022, 4:34 PM

#

final matrix my legend of korra model is likely release-ready by tomorrow evening i restarte...

Nice work, but why TLoK when ATLA is superior? Did you add any of that in your model?

final matrix Nov 20, 2022, 4:38 PM

#

rough shoal Nice work, but why TLoK when ATLA is superior? Did you add any of that in your m...

because the TLOK artstyle is superior and higher quality and I like Korra more lol

final matrix Nov 20, 2022, 5:17 PM

#

https://cdn.discordapp.com/attachments/990008745981530253/1043922220814172180/21499-285405452-drawing_Korra_wearing_defa_outfit_stada_hairstyle.png
https://cdn.discordapp.com/attachments/990008745981530253/1043922221153923132/21503-3825959815-childrens_drawing_Korra_wearing_defa_outfit_stada_hairstyle.png
https://cdn.discordapp.com/attachments/990008745981530253/1043922221497851984/21513-199030591-Soviet_propaganda_mural_on_a_house_Korra_wearing_defa_outfit_stada_hairstyle.png
https://cdn.discordapp.com/attachments/990008745981530253/1043922221896302622/21536-461260207-antique_white_marble_bust_Korra_wearing_defa_outfit_stada_hairstyle.png

#

https://cdn.discordapp.com/attachments/990008745981530253/1043922222227664936/21557-366080050-cosplay_photo_of_a_stone_statue_of_Korra_in_a_park_wearing_defa_outfit_stada_hairstyle.png
https://cdn.discordapp.com/attachments/990008745981530253/1043922222579982457/21590-787217719-Korra_wearing_defa_outfit_stada_hairstyle_as_a_cute_Pixar_character.png
https://cdn.discordapp.com/attachments/990008745981530253/1043922222919712848/00327-2663151080-Korra_wearing_defa_outfit_stada_hairstyle_figurine.png
https://cdn.discordapp.com/attachments/990008745981530253/1043922223284629504/21617-1406750662-Korra_wearing_defa_outfit_stada_hairstyle_realistic_detailed_digital_art_by_Greg_Rutkowski.png
https://cdn.discordapp.com/attachments/990008745981530253/1043922223662112788/21704-3946447154-Korra_wearing_bafo_outfit_loes_hairstyle_art_by_Greg_Rutkowski.png

final matrix Nov 20, 2022, 6:06 PM

#

https://cdn.discordapp.com/attachments/1038282137545211946/1043950350807351467/00341-1601892205-facial_closeup_Emma_Watson_tlok_artstyle.png

glossy rune Nov 20, 2022, 8:24 PM

#

stone garden Here is the tensorboard for loss value on that last training

here's a tensorboard example with a few more steps and smoothing 0.9, so the dark line shows a moving average of the loss and the lighter line shows the loss value for each step. the step value fluctuates heavily (very likely due to the relatively small batch size), but the average is a bit easier to interpret in terms of progress.. but i heavily rely on visually inspecting samples from the model to judge.
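For anyone wondering what the smoothing slider does, here is a rough sketch of an exponential moving average with weight 0.9. The loss values are randomly generated stand-ins, and TensorBoard's own smoothing may differ in details (for example in how it treats the first few points), so treat this as an illustration of the idea only.

```python
import random

def ema_smooth(values, weight=0.9):
    """Exponential moving average: each output is
    weight * previous_smoothed + (1 - weight) * current_value."""
    smoothed = []
    last = values[0] if values else 0.0
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

random.seed(0)
# Hypothetical per-step losses: a slow downward trend buried in noise.
raw = [0.15 - 0.00005 * step + random.uniform(-0.08, 0.08) for step in range(500)]
avg = ema_smooth(raw, weight=0.9)

print(f"raw range:      {min(raw):.3f} .. {max(raw):.3f}")
print(f"smoothed range: {min(avg):.3f} .. {max(avg):.3f}")
```

The per-step line swings across the whole noise band, while the smoothed line moves in a much narrower band, which is why the trend is easier to read from it.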

stone garden Nov 20, 2022, 8:26 PM

#

glossy rune here's a tensorboard example with a few more steps and smoothing 0.9, so the dar...

Train loss: I would really need to look into the code more closely to understand how it is calculated.
I'm still missing any indicator during training for bleeding, e.g. a neutral prompt to see if my concepts start to affect it. I wish everydream gave a few more parameters for the image logging during training

glossy rune Nov 20, 2022, 8:28 PM

#

yes, that loss chart will not do that. i haven't looked into it, but wondered as well, if i can modify/control the image generation during training for everydream. should not be too complicated... i'll let you know, if i find something

#

train loss is basically the difference between the prediction (image generated from prompt) and the provided image. but since there are multiple models involved, i'm not sure about all the details.
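As far as I understand the common Stable Diffusion training scripts, the logged number is usually a mean squared error between the noise the model predicts and the noise that was actually added to the latent, rather than a comparison of finished images. A toy sketch of that calculation, with short made-up lists standing in for the real tensors:

```python
import random

def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

random.seed(0)
# Toy stand-ins: real training compares tensors shaped like the latent.
noise = [random.gauss(0, 1) for _ in range(16)]          # noise added to the latent
close_pred = [n + random.gauss(0, 0.1) for n in noise]   # a prediction near the truth
random_pred = [random.gauss(0, 1) for _ in noise]        # an unrelated guess

print("loss for a close prediction:", mse(close_pred, noise))
print("loss for a random guess:    ", mse(random_pred, noise))
```

Because each step draws a different image, timestep and noise sample, the value jumps around a lot from step to step, which fits the advice to read averages only.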

a validation/test set could be used to monitor performance on some benchmark (preservation) prompt/image combinations. then validation/test-loss would show if that becomes worse over time. but it would still not necessarily catch any bleeding effects

#

if i see this correctly, the sample images are the first (?) n predictions from the current training batch. so while everydream is looping over your training data, you can see on tensorboard, what the most recent predicted images looked like. if you have 1000s of images, this can be a bit more random, but if your training set is small, you should repeatedly see, if there is progress on the same input prompts.

#

i'm thinking of adding a "validation set" with only ~4 prompts and maybe even no images. you don't use it for loss monitoring, but for visual comparison... the trainer should be able to run a prediction on this validation set whenever it runs it on your current training batch. and you can monitor the results on your target prompts with this 🤔

final matrix Nov 20, 2022, 11:02 PM

#

I'm currently training my last model and then I'll do the final tests tomorrow and then I'll probably publish my model tomorrow evening.

I even set up a KoFi.

#

I will release my 1142 manually captioned dataset alongside the model.

And if I get to finish it in time I will also include a lengthy writeup of my process the past 4 weeks.

It will likely be the best model released for StableDiffusion yet with it being highly flexible and having high likeness. The only major shortcoming is the fact that 512x512 generations in the artstyle WILL look garbage without img2img to 1024x1024 or a higher resolution as well as inpainting the face.

Some outfits will also work better than others simply because some outfits just have massively more training data for them than other outfits.

The closer you generate a character the better it will look from the onset too. Full-shot images will sometimes need img2img up to a 1536x1536 (the maximum my GPU can do) resolution before looking okay.

The model works really well with img2img however and I recommend always using img2img to upgrade the initial 512x512 to a higher resolution or using a photo as a basis and transferring it into the artstyle using img2img.

More simple prompts also work better than more complex prompts, though the latter can work with prompt engineering.

Due to the nature of how I trained the model, prompts will also often, ironically, need the artstyle as well as some other negatives like instagram, vignette and blur in the negative prompt section in order to massively reduce the overtraining effect. You will see what I mean when I provide a guide on how to use the model correctly tomorrow.

final matrix Nov 21, 2022, 1:00 AM

#

two prompts from reddit that i tested with my model
https://cdn.discordapp.com/attachments/1039626924458258533/1044052433514537070/00351-2012633143-tlok_artstyle_Full_colour_Hollywood_publicity_shot_of_attractive_young_white_female_astronaut_with_single_low_ponytail_brown_hai.png
https://cdn.discordapp.com/attachments/1039626924458258533/1044052020061032469/00346-3958477208-tlok_artstyle_beautiful_detailed_realistic_portrait_furry_anthro_scary_dog_fursona_muscled_man_wearing_punk_clothes_in_the_stree.png

final matrix Nov 21, 2022, 1:18 AM

#

"Korra as a winged fairy with butterfly wings wearing bafo outfit loes hairstyle realistic digital art by Greg Rutkowski"
https://cdn.discordapp.com/attachments/1023293330601287711/1044059116173283338/00353-3303366631-Korra_as_a_winged_fairy_with_butterfly_wings_wearing_bafo_outfit_loes_hairstyle_realistic_digital_art_by_Greg_Rutkowski.png

viral jay Nov 21, 2022, 1:04 PM

#

frozen bobcat

what extension is that?

#

oh nvm it's the d8ahazard, that's weird. I was doing git pull but mine was still "old" and didn't have those options. I reinstalled it and now it shows the new options

slate vessel Nov 21, 2022, 3:19 PM

#

What I want as a finetuned model is an Enchanted/Disenchanted model since the style is Disney-like but not classic Disney animation

steep nova Nov 21, 2022, 6:16 PM

#

a painting of two men playing chess while sitting in a park, by rutkowski and artgerm, highly detailed, trending on artstation, movie concept art, cinematic lighting

last swallow Nov 21, 2022, 10:42 PM

#

I created a WebUI extension that can predict tags in a single or multiple image. Supports two tagging models: DeepDanbooru and Waifu Diffusion 1.4.

#

https://github.com/toriato/stable-diffusion-webui-wd14-tagger

leaden patio Nov 21, 2022, 11:05 PM

#

Has someone made a dreambooth of liminal spaces yet? Where can I find it?

final matrix Nov 22, 2022, 12:15 AM

#

I hereby finally release my All-In-One Legend of Korra artstyle + Korra character model to the public, including a 1142 manually captioned dataset!
Do note that this model is trickier to use than other models you may be accustomed to! I have a "How to" section on the Huggingface page for this exact reason.

Dataset download is also included at the bottom of the page there. The page also has example images and a short explanation of how I created this dataset and model.

https://huggingface.co/ai-characters/4elements-diffusion

#

since the thumbnail hasnt loaded yet, here are some examples of what the model can do
https://cdn.discordapp.com/attachments/1004159122335354970/1044406087371202640/gandr-collage.jpg

peak ridge Nov 22, 2022, 8:28 AM

#

guys is there a simple way to natively train SD? like with a script like dreambooth or something? i have a bunch of images manually tagged / captioned, different concepts involved, and i feel that dreambooth is kinda limited and destroys the original model... not sure where to begin, i found everydream on github, but not sure it's the right thing, as it seems to be based on dreambooth - and i need to teach sd new concepts, not replace existing ones...

plucky zinc Nov 22, 2022, 9:28 AM

#

peak ridge guys is there a simple way to native train SD? like with a script like dreamboot...

do you have prior preservation on?

#

anyone have experience training cartoon characters on dreambooth? what would you use as the word on a character that's somewhat humanoid but not really a human nor really resembling any kind of animal? just character? or creature maybe?

glossy rune Nov 22, 2022, 9:32 AM

#

glossy rune I see a kid’s christmas present taking shape…

final matrix Nov 22, 2022, 9:32 AM

#

my god

peak ridge Nov 22, 2022, 9:54 AM

#

plucky zinc do you have prior preservation on?

i don't even know where to start, i know what prior preservation is, but i don't even have a training setup / script

glossy rune Nov 22, 2022, 9:58 AM

#

i'd recommend dreambooth first (even though it might not be a perfect fit for your captioned dataset). everydream would be a better fit, but it's probably easier to get started with dreambooth.
auto1111 now seems to have dreambooth ui integration. if you know a bit about python, i'd recommend the diffusers library -> examples -> dreambooth as a simple script to start off. and then maybe proceed to more elaborate setups.

peak ridge Nov 22, 2022, 10:02 AM

#

oh yea, i already train with dreambooth, but it's very limited, i want to train it new concepts - while keeping the original model intact

glossy rune Nov 22, 2022, 10:03 AM

#

you can do some of that with dreambooth (dreambooth on dreambooth works). but then proceed to everydream

peak ridge Nov 22, 2022, 10:07 AM

#

yea, i've been reading about everydream, seems a step forward from db, one thing i'm not sure (with both of them) is captions / tags : i will add tags / captions manually for max accuracy, but i'm not sure if i would add them in human readable format (like "a man with 6 legs walking on the beach") or comma separated (man, 6, legs, walking, beach)

glossy rune Nov 22, 2022, 10:08 AM

#

for everydream you just caption your images in the file name (one of the options) in human readable format. think alt-tag

#

you can then add/replace some specific terms, where you dont want to just extend the training but define something new

#

"a person sitting on a chair" -> "characterxy sitting on a chair"
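A rough sketch of the file-name captioning and term replacement described above. The helper names are hypothetical, and stripping a trailing `_2`-style suffix is my own assumption for keeping duplicate captions apart on disk, not something taken from the EveryDream docs.

```python
import re
from pathlib import Path

def caption_from_filename(path: str) -> str:
    """Use the file name without its extension as the caption."""
    stem = Path(path).stem
    # Assumption: a trailing "_1", "_2", ... only disambiguates duplicate captions.
    return re.sub(r"_\d+$", "", stem).strip()

def swap_term(caption: str, old: str, new: str) -> str:
    """Replace a generic term with the new token, matching whole words only."""
    return re.sub(rf"\b{re.escape(old)}\b", new, caption)

caption = caption_from_filename("data/a person sitting on a chair_2.jpg")
print(caption)
print(swap_term(caption, "a person", "characterxy"))
```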

#

everydream tools comes with blip autocaption

peak ridge Nov 22, 2022, 10:10 AM

#

cool stuff, read about this, but wasn't 100% sure about it, thanks for the answer

#

also, when i interrogate clip / blip to get a description - because the images i want to train on are really sci-fi - it doesn't understand the concepts in there (because it doesn't know them) - should i use my own captions (that represent the exact reality of that picture - like man with 6 legs on the beach) or whatever clip / blip gets me (a man walking on the beach - but this is missing the main concept i want to train)?

glossy rune Nov 22, 2022, 10:15 AM

#

you will probably need a pretty big dataset (i'm using >=10k imgs for projects that would come closest to what you hint at), plus gpu and training budget, to teach that many new concepts. but yes, then i'd sure go with custom captions.

peak ridge Nov 22, 2022, 10:16 AM

#

i can get the images - currently at about 2k captioned - and i can also get gpu power

#

what i'm trying to achieve is, based on the "man with 6 legs" concept (and some other sci-fi stuff) - train SD to generate a random man with 6 legs when prompted to. what i currently get with dreambooth is the same man with 6 legs as in the training images - if there's a way of doing this with DB, i can try it, as it's fast and working ok. should i try captions like "man6leg, man with 6 legs"? and when i prompt man6leg it will generate the exact character in the training images, but when i prompt "man with 6 legs" it will generate a random man with 6 legs, in the same style? not sure how to approach this

glossy rune Nov 22, 2022, 10:28 AM

#

i see. you can modify a dreambooth concept. i trained a person with db ("sks person") and made them e.g. different dress sizes by prompting ("overweight sks person").
so i'd guess you can change the appearance of the upper body by prompting your dreambooth model. and if you get to a very good quality level, then you can generate more images of different looking 6-legged men and do a fresh dreambooth with them.

#

if it is only this one concept, i'd go with dreambooth and probably a dataset of 50-100 images.
if you have a lot of all-new concepts like this, then i would consider everydream and the big dataset

peak ridge Nov 22, 2022, 10:38 AM

#

sounds like a plan, one last question about DB (this whole discussion has been really helpful btw), if i train one concept, and then use the trained ckpt file to train it a new concept, will that work? i mean, will it know both concepts? could i train multiple concepts this way? one at a time? in case everydream doesn't work as expected (DB works fine at the moment)

glossy rune Nov 22, 2022, 10:43 AM

#

yes, i've done that for two iterations successfully (e.g. style and person). not sure how often before stuff fades or other problems occur

peak ridge Nov 22, 2022, 10:45 AM

#

going to give it a try with db first, then try the other thing...

final matrix Nov 22, 2022, 12:04 PM

#

final matrix I hereby finally release my All-In-One Legend of Korra artstyle + Korra characte...

Dataset has also been uploaded to Huggingface now https://huggingface.co/datasets/ai-characters/4elements-diffusion-captioned-dataset

final matrix Nov 22, 2022, 12:44 PM

#

i just noticed that tagging locations such as "city" or "exterior" may have been a huge mistake as it now always tries to refer to my training images when doing those prompts.
i will change my caption set and train a version 2.0 of my model on it and see what happens!

plucky zinc Nov 22, 2022, 3:03 PM

#

glossy rune i see. you can modify a dreambooth concept. i trained a person with db ("sks per...

noteworthy is that in sd 1.5 sks is a real life semi-automatic rifle, so you might want to use a different word

glossy rune Nov 22, 2022, 3:04 PM

#

thanks, i'm aware. but i still find it works better than some of what i tried.

split acorn Nov 22, 2022, 3:04 PM

#

yeah, I hope that trend of sks goes away

frosty wave Nov 22, 2022, 3:36 PM

#

Hi! I'm new to training, however I already tested hypernetworks and Dreambooth successfully on my local SD.
I wonder if it is possible to manipulate characteristics of main classes? Say for instance I want to create a model in which every time I prompt for a "child" it generates a child having only one eye in the middle of the forehead, instead of a normal child with two eyes. I mean, really by prompting "child" only, without adding "cyclops" before, that is, to manipulate the class "child" itself. Given that I do have a bunch of SciFi cyclops kids images to feed it...
Is there a strategy for this kind of thing using Dreambooth or any other tool?

glossy rune Nov 22, 2022, 3:41 PM

#

i'd not go down that path. with all the training the model has seen before your dreambooth, it has learned about children and faces and eyes (even if it can't count fingers...). that's a prior. if you want to modify that prior, you have to move the model weights by a lot, and that is likely to damage other things it learned in major ways. in dreambooth you create something new that should have almost no prior and give it a meaning, so the weights can learn that child and cyclops should be returned together when asked for the newly learned concept.

#

fwiw i once tried to modify brad pitt's face with prompting and it wouldn't show a black eye or bruises or anything with simple prompts. there's a strong prior. with major weight modifications on parts of the prompt, you could get some results, but you could feel how this was not a good approach. with a good approach, you get at least decent results pretty effortlessly

frosty wave Nov 22, 2022, 3:56 PM

#

I see. Now I noticed that when I teach SD a new instance/concept via Dreambooth while giving it as "class pictures" a database of images that have not been generated by SD, then it does have an impact on the class. Once the concept is learnt, the class itself seems to have changed a bit, in a way that more or less matches the biases of my database.
For instance I tried to teach it my own face with a few pictures of me, but using a "person" class picture database made of real photos of a bunch of people. However that database had a strong bias: a disproportionate share of older people (compared to a normal population). After that, I noticed that when I prompted for a person in general (and not only myself) it gave me old people more often than before.
So that's why I wonder if we can manipulate a class by learning concepts as usual, but using a biased image database as "class pictures".

glossy rune Nov 22, 2022, 4:17 PM

#

class images in dreambooth have the purpose of allowing you to train the instance without damaging the existing knowledge too much. i don't think this should be used to modify the class, but only the instance.
i prefer to add knowledge instead of overwriting/redefining. much easier and you dont work against a mountain of gpu hours of the original training.
in case you want to modify more than an instance, research everydream. that is not dreambooth, you dont train instance/class but just train the model with captioned data. allowing to add and modify concepts.

split acorn Nov 22, 2022, 4:25 PM

#

frosty wave Hi! I'm new to training, however I already tested hypernetworks and Dreambooth s...

If you train with pictures with only one eye and you don't use any regularization images on that class, then it should overfit into that class and create cyclopes. At least, that's how it worked with my anime tests with no regularization images. Though, this wouldn't work well with stuff it doesn't have knowledge of? I'd think you'd have a really really hard time with cyclopes, tbh

#

it's a bit more complicated than that though and there are a lot of other factors to consider as well (like the model / settings you're training with, including how much)

#

I'd say doing it that way isn't ideal though

split acorn Nov 22, 2022, 4:44 PM

#

I'd do a proof of concept to show you, but I can't locally train DB (I use runpod)

warm cloud Nov 22, 2022, 4:49 PM

#

is there a way to convert this pt file into ckpt? I would like to use this architecture model as checkpoint but just changing file extension does not work https://huggingface.co/jerostephan/Architecture_Diffusion_1.5M

final matrix Nov 22, 2022, 5:39 PM

#

I radically changed my dataset:

- removed full shot, half-body, closeup etc
- removed location tokens (e.g. exterior, interior, city street etc)
- removed all mentions of facial expressions but smug
- removed mentions of poses such as sitting, lying, etc except fighting pose
- removed all , and ;
- removed mentions of non-show clothing
- made the syntax the same for all images and added "with" in front of the hairstyle mentions

my thinking here:

my version 1.0 is very bad at locations and landscapes. not only do they look bad but i have noticed that its very focused on my training images. my belief here is that this is due to me tagging all locations within my images, like "city street" when an image has a city street background. so instead of applying the tlok artstyle to a random city street when prompting it, it will output the city street i trained it on. and because in some shots you would barely even see any of the city street it would now always only give me very zoomed in shots of a city street.

this is likely because each caption token is trained into the model as a token. so it trained my training image version of a city street into the city street token.

so now that i left it empty again it should still be able to learn what a street in the tlok artstyle will look like, but without training the specific street into the model.

and i have done this for all things now. so almost all my style images are now tagged as just "tlok artstyle" except for those images where i do want certain things to be trained into the model, e.g. when people wear earth kingdom clothing.

as for the other stuff: i think the whole full shot stuff was unnecessary and cluttered the captions too much. i should be able to prompt a full shot just as easily if i negative prompt closeup and the like.

i think the , ; are unnecessary and may even be bad? anyway lets see what happens without them.

i also removed mentions of any facial expressions the model seems to already know; e.g. all but smug.

similarly i removed all mentions of poses it should already know.

similarly to the city street thing i removed all mentions of non-show clothing and hairstyles such as a tshirt because i believe that the model can infer this information itself from the training but if i tag it as e.g. a tshirt it will now always try and give korra that tshirt.

and last but not least i had some images where it was "wearing X outfit Y hairstyle" but then some where it would be "wearing X outfit fighting pose Y hairstyle" and i have no idea if it influences anything but just to be sure i made the syntax the same for all now and made it grammatically correct, e.g. "wearing X outfit with Y hairstyle"
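A caption cleanup along these lines could be scripted roughly as follows. This is only a sketch: the phrase list is a hypothetical sample and not the vocabulary of the actual dataset, and it covers just the framing and location tokens, the , and ; removal, and the "with" before the hairstyle.

```python
import re

# Hypothetical sample of tokens to drop; the real list is not shown in the chat.
DROP_PHRASES = ["full-shot", "full shot", "half-body", "closeup",
                "city street", "exterior", "interior"]

def clean_caption(caption: str) -> str:
    # Remove all commas and semicolons.
    text = re.sub(r"[,;]", " ", caption)
    # Drop framing and location tokens.
    for phrase in DROP_PHRASES:
        text = re.sub(rf"\b{re.escape(phrase)}\b", " ", text)
    # Put "with" in front of "<name> hairstyle" unless it is already there.
    text = re.sub(r"(?<!with )\b(\w+ hairstyle)\b", r"with \1", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

print(clean_caption("full-shot, Korra wearing defa outfit stada hairstyle; city street"))
print(clean_caption("Korra wearing defa outfit fighting pose stada hairstyle"))
```

Running the function on an already cleaned caption leaves it unchanged, so it is safe to apply to a mixed dataset more than once.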

i have also upped the learning rate even further, from 3e-6 to 5e-6, and i am very curious if that will work. i am doing it for only 25 repeats right now and taking a ckpt every 5 repeats.

stone garden Nov 22, 2022, 5:52 PM

#

interesting changes. I'll look forward to the results.

The whole "training the street" part, I get and understand well, I made the same error I think. For most of those other tokens, using them once is nice, too many times and it trains them too much too, almost as much as the base concept we are teaching.

Having fewer tokens in the prompts will also make learning faster, I believe? It seemed so for me at least. I thought it was because the attention was higher on the main token I was teaching thanks to this.

And a big thank you from me and others of the community, very interesting and detailed analysis !

frosty wave Nov 22, 2022, 7:02 PM

#

glossy rune class images in dreamboth have the purpose of allowing to train the instance wit...

Thanks! Well I'm a player, I like to experiment and push the limits 🙂 But I will definitely look at that Everydream thing, I didn't know it. Any suggestion for a starting point or documentation on this? If I can use it with my local SD it would be great.

frosty wave Nov 22, 2022, 7:03 PM

#

split acorn I'd do a proof of concept to show you, but I can't locally train DB <:alicatCry:...

Thanks, I would be very interested!

glossy rune Nov 22, 2022, 7:06 PM

#

frosty wave Thanks! Well I'm a player, I like to experiment an push the limites 🙂 But I wil...

Yeah, with everything „you should do“, there is obviously always the path to just ignore it and try anyways. Should be fun 😃
I find everydream sufficiently documented on GitHub to get started and don’t know of other sources.

split acorn Nov 22, 2022, 7:21 PM

#

100% yeah

final matrix Nov 22, 2022, 7:31 PM

#

final matrix I radically changed my dataset: - removed full shot, half-body, closeup etc - r...

@stone garden
its working! still very early repeats but this is promising!
first image is new model, 2nd image is old model
while its easier to prompt full-body shots in the old model, the new model is clearly much better at basically everything lol

stone garden Nov 22, 2022, 7:34 PM

#

so over-captioning is bad too.... hum.... well I have some work to do too x)

final matrix Nov 22, 2022, 7:40 PM

#

i might just add the "full-shot" captions back in though.

stone garden Nov 22, 2022, 7:50 PM

#

don't add it to all the full-shot pics. 5 to 10% of the dataset is a lot for such a keyword: it's already mostly known by SD

tranquil river Nov 22, 2022, 8:49 PM

#

If I wanted to finetune a model on 32×32 pixel art (tiles from a game), should I first rescale them to 512×512? Or can I work with the small images, and is that better?

split acorn Nov 22, 2022, 8:52 PM

#

There are a couple examples out there, could use those as references to see what they did. Typically, they use sheets of 32x32 pixel characters

lament moat Nov 22, 2022, 8:53 PM

#

anyone have any tips or know of a good resource for blending models using the "Checkpoint" feature on the webGUI? I have had some luck, but it always seems to go one step too far. I am especially lost on how the third "tertiary" model fits in.

split acorn Nov 22, 2022, 8:54 PM

#

For sprites, here's an example:
https://huggingface.co/Onodofthenorth/SD_PixelArt_SpriteSheet_Generator

tranquil river Nov 22, 2022, 8:55 PM

#

that's great thank you! I'll check it out :)

split acorn Nov 22, 2022, 8:55 PM

#

I'll send another one sec alicatPog

#

There's this example:
https://publicprompts.art/pixel-art-v1-dreambooth-model/
https://publicprompts.art/all-in-one-pixel-art-dreambooth-model/

#

There are some models that do pixel art better than others, as well

#

I think the NAI does pixel art well, but they haven't added the ability to train yet (though, I imagine they will at some point? no idea, I know they did "Prompt Tuning" which works similarly but for GPT text based stuff)

crimson sandal Nov 22, 2022, 9:26 PM

#

I am sure someone posted here before so I apologize for re-asking the question. I am having a hell of a time training a stable diffusion model on multiple people. I see people have done it using dreambooth, but I am getting terrible results. Has anyone successfully done this? I would love to chat if so! These are real people and I want them all to be as distinct as possible but coming from the same model.

split acorn Nov 22, 2022, 9:33 PM

#

Lots of examples! There's a Genshin model trained with multiple which looks really good. There's a discord server specifically dedicated to Dreambooth that also has a channel dedicated to multiple subjects, I can DM if you'd like (I have no affiliation in any way)

sullen apex Nov 22, 2022, 9:58 PM

#

0_0

stone garden Nov 22, 2022, 10:00 PM

#

so, you wanted more details on "50 total pictures in the dataset, 160 repeats total each, over 4 Epoch on LR1e-6."
160 repeats means that each picture is trained on 160 times.
Epochs are a way to cut the training into parts. In this case, 4 epochs means that those 160 repeats were done in chunks of 40. After every 40 repeats of all 50 pictures (randomised), it makes a checkpoint. Each epoch also makes sure every picture is presented for its number of repeats.
LR is the learning rate. 1e-6 is 0.000001. This is the speed at which the model trains and learns things. Higher can go faster but sometimes brings lower-quality results. Lower gives you more control, in my opinion

sullen apex Nov 22, 2022, 10:02 PM

#

But there are 50 images so 50x160=8000 trainings so 2000 per epoch?

stone garden Nov 22, 2022, 10:02 PM

#

yep this is 8k steps

sullen apex Nov 22, 2022, 10:03 PM

#

Ok that makes more sense thanks

stone garden Nov 22, 2022, 10:03 PM

#

one last param I didn't show is batch size

#

it's how many pictures are trained on at the same time

#

more VRAM for sure, but higher speed, like in image generation, AND quite higher quality from what I can see

#

not 100% sure on quality, could be confirmation bias

#

still experimenting
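
The arithmetic from this exchange (50 pictures, 160 repeats each, 4 epochs) can be written out as a small helper. It assumes one training step consumes `batch_size` pictures, so the 8k-step figure quoted above corresponds to a batch size of 1. The function and key names are illustrative, not taken from any particular trainer.

```python
import math


def training_plan(num_images, repeats, epochs, batch_size=1):
    """Work out step counts from dataset size, repeats, epochs and batch size."""
    samples = num_images * repeats  # how many pictures are shown in total
    total_steps = math.ceil(samples / batch_size)
    return {
        "samples_seen": samples,
        "total_steps": total_steps,
        "repeats_per_epoch": repeats // epochs,
        "steps_per_epoch": total_steps // epochs,
    }


plan = training_plan(num_images=50, repeats=160, epochs=4)
print(plan)

# Same data with batch size 2: half as many steps for the same samples.
plan_b2 = training_plan(num_images=50, repeats=160, epochs=4, batch_size=2)
print(plan_b2["total_steps"])
```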

sullen apex Nov 22, 2022, 10:04 PM

#

Interesting!

stone garden Nov 22, 2022, 10:05 PM

#

those 2 models (character + creature), I did 4 trainings total, I spent around 100K steps total lol

#

200GB of checkpoints compared x)

#

I tried making only 1 model for both styles, but didn't have enough time to make it work

sullen apex Nov 22, 2022, 10:08 PM

#

Can you just weighted sum merge 0.5 at that point?

stone garden Nov 22, 2022, 10:12 PM

#

sullen apex Can you just weighted sum merge 0.5 at that point?

I never did merging myself, I don't exactly understand what you mean.
I know I had a friend merge my model and it still worked (sims model) so I think it should

#

you mean do a sum/average of all models ?

#

I lose quality on each concept when I do that. I prefer to train all concepts at once, but it requires a little more experimenting. I'm trying to see how I can add 2 datasets correctly... Like here, I didn't use any regularisation other than full captioning (which helped), but I've been very wary of overtraining, and tested mostly on that criterion

silent holly Nov 22, 2022, 10:25 PM

#

split acorn Lots of examples! There's a Genshin model trained with multiple which looks real...

I'm interested in the multi-subject training channel.

split acorn Nov 22, 2022, 10:27 PM

#

Model merging is amazing, would def recommend it

#

random use case, I downloaded the D&D model but it didn't do what I wanted it to do. Merged it with a high quality model (using add difference) and now it can produce results closer to what I was looking for
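
For readers unsure what the two merge modes mentioned above do, here is a toy sketch of the commonly described formulas: weighted sum interpolates between two models, while add difference adds to model A whatever model B learned relative to a third (tertiary) base model C. Real checkpoints hold large tensors; plain lists stand in for them here, and the function names and values are illustrative assumptions rather than the web UI's actual code.

```python
def weighted_sum(a, b, alpha):
    """Interpolate between A and B: alpha=0 gives A, alpha=1 gives B."""
    return {
        key: [(1 - alpha) * x + alpha * y for x, y in zip(a[key], b[key])]
        for key in a
    }


def add_difference(a, b, c, alpha):
    """Add to A what B changed relative to its base C: A + alpha * (B - C)."""
    return {
        key: [x + alpha * (y - z) for x, y, z in zip(a[key], b[key], c[key])]
        for key in a
    }


# Toy "checkpoints" with a single weight tensor each.
model_a = {"w": [1.0, 2.0]}
model_b = {"w": [3.0, 6.0]}
model_c = {"w": [2.0, 2.0]}  # assumed base model that B was trained from

half_merge = weighted_sum(model_a, model_b, 0.5)
diff_merge = add_difference(model_a, model_b, model_c, 1.0)
print(half_merge, diff_merge)
```

Under these formulas a 0.5 weighted sum dilutes both models equally, which would be consistent with the quality loss per concept reported above.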

crimson sandal Nov 22, 2022, 10:41 PM

#

split acorn Lots of examples! There's a Genshin model trained with multiple which looks real...

That would be amazing! Thank you!

karmic warren Nov 22, 2022, 11:07 PM

#

hey, i'd like to explore changing the dataset for DB training as i add more steps, has anyone experimented with it already ?!

#

my initial thought is to start the first 500 steps with a 50 50 mix of 2x2 3x3 grids and single pictures, then gradually fade out the grids until 2.5k steps and add another 2.5k steps with only single pictures

#

unless someone with experience advises against that, that's what i'll be trying in a couple hours 🤞
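
The proposed schedule can be sketched as a function of the training step. It assumes the fade from 500 to 2.5k steps is linear, which the message does not specify; the function and parameter names are hypothetical.

```python
def grid_fraction(step, hold_until=500, fade_until=2500, start=0.5):
    """Fraction of training samples drawn from 2x2 / 3x3 grids at a step.

    - before hold_until: constant 50/50 mix
    - hold_until..fade_until: linear fade to zero (an assumption)
    - after fade_until: single pictures only
    """
    if step < hold_until:
        return start
    if step >= fade_until:
        return 0.0
    remaining = (fade_until - step) / (fade_until - hold_until)
    return start * remaining


for step in (0, 500, 1500, 2500, 4000):
    print(step, grid_fraction(step))
```

A data loader could then draw a grid image with this probability at each step and a single picture otherwise.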

open abyss Nov 23, 2022, 1:08 AM

#

I'm fine-tuning to get a specific person, and I am fortunate enough to have a "plenty big" set of data -- 200+ images -- at very high resolution. The automatic1111 built-in crop-and-caption was being stupid and wasting a ton of space on backgrounds, rarely getting the face in-frame (even with face attention turned on!) so I'm manually cropping. But since I'm the one in charge, I realized I can crop both an upper-body portrait (showing the person's figure, fashion sense, and posture) but also go in close and get a 512px face portrait from the same source.

Does anyone know if having "repeats" of the same data at different scales will be beneficial or detrimental to my training?

split acorn Nov 23, 2022, 1:20 AM

#

I've done multiple TI/HN/DB to good success with that method. Seems to work well! I'd just be careful with flips and small datasets with only a few images.

#

I don't have anything to compare it against, so it could technically still be detrimental, but AG_Shrug

ocean patrol Nov 23, 2022, 1:44 AM

#

Has anyone tried to train a model to generate normal maps (or height maps)? I'm thinking img2img could be very useful for that.

#

Maybe it'd need to be a style?

frosty wave Nov 23, 2022, 1:53 AM

#

glossy rune Yeah, with everything „you should do“, there is obviously always the path to jus...

EveryDream: " is designed to run on a local 24GB Nvidia GPU, currently the 3090, 3090 Ti, 4090, or other various Quadrios and datacenter cards (A5500, A100, etc), or on Runpod with any of those GPUs." << ahem 😄 I will stick with Dreambooth for now then... Unless it has been tested with a smaller config? (I've a 3060 12gb...)

open abyss Nov 23, 2022, 2:12 AM

#

Thanks! I've mostly been doing sets of 25, and getting really solid results, so I'm excited to have a set of 250+ HD pictures to work with. Do you have any experience with setting the learning rate to something besides 0.005? I tend to drop mine by ~2 orders of magnitude once the outputs start to look like the original and get pretty good results... but even at 100K steps I still get some weird mutant/plastic outputs occasionally

karmic warren Nov 23, 2022, 3:21 AM

#

trying to make a publishable set of anime pictures for women characters :
first try used AnythingV3 with the prompt a woman character the result was definitely not publishable,
currently using Elysium_animeV2
img2img with the output of AnythingV3
positive prompt: a character
negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, missing nose, young, sexy, loli, lolita, children, child, kid, cat ears, horns, nudity, skimpy clothes, swim suit
DDIM 64 steps

#

doesn't seem like it's toning down the output too much, suggestions to get relatively neutral characters are appreciated

karmic warren Nov 23, 2022, 3:49 AM

#

it seems to work much better now, the trick was to not try to tone it down but to tone it up:
generated a bunch of realistic women of all ages and looks then asked AnythingV3 to img2img with
prompt: a woman character
and all the negative prompts above

#

should be done generating in two hours, is Mega alright here for uploads or is there a dedicated area for data ?

bold saffron Nov 23, 2022, 4:15 AM

#

karmic warren trying to make a publishable set of anime pictures for women characters : first ...

didn't occur to me that literally putting in "bad anatomy" for the negative prompts would actually work

karmic warren Nov 23, 2022, 4:18 AM

#

bold saffron didn't occur to me that literally putting in "bad anatomy" for the negative prom...

😄 took that from the readme here https://huggingface.co/hesw23168/SD-Elysium-Model then added a couple of my own

bold saffron Nov 23, 2022, 4:28 AM

#

welp, there you go I guess

final matrix Nov 23, 2022, 8:46 AM

#

final matrix @stone garden its working! still very early repeats but this is promisin...

@stone garden
ok finished model results:

this model is much better at

outfit likeness
scenes/backgrounds/landscapes
general flexibility

its worse at

overall artstyle and cohesion for some shots, particularly those i didnt label and which were hard for the AI to discern itself what the image is supposed to represent
full-shots
Korra wearing non-show outfits

i have also done some testing regarding "Y artstyle" vs. "image of X in the Y artstyle"
and my conclusion is that "image of X in the Y artstyle" is superior.

case in point: ukiyo-e.

1st image: car ukiyo-e
2nd image: car ukiyo-e style
3rd image: image of a car in the ukiyo-e style

the third image output a car or something car-adjacent the most often.

i will create a version 3.0 of my dataset now which will change the following things:

readd full-shot, but not half-body or closeup since the model does that already well on its own. i will also only add full-shot to images with a background, not to the character concept art with no background
readd captions to some style images like city streets etc, but less, more simple, and as "image of X in the Y artstyle"
readd captions for some but not all non-show outfits, e.g. tshirt
make the outfit tokens more unique and be only a single token
either remove some of the outfits or get better images of them

https://cdn.discordapp.com/attachments/1029222282511515678/1044896496673374208/grid-0585.png
https://cdn.discordapp.com/attachments/1029222282511515678/1044896497025683506/grid-0586.png
https://cdn.discordapp.com/attachments/1029222282511515678/1044896497344446494/grid-0587.png

final matrix Nov 23, 2022, 9:03 AM

#

some more extensive testing on the same seed:

car ukiyo-e
car ukiyo-e style
car in the ukiyo-e style
image of a car in the ukiyo-e style
https://cdn.discordapp.com/attachments/1038283818949414973/1044900859026341908/grid-0632.png
https://cdn.discordapp.com/attachments/1038283818949414973/1044900859370291241/grid-0633.png
https://cdn.discordapp.com/attachments/1038283818949414973/1044900859743588352/grid-0634.png
https://cdn.discordapp.com/attachments/1038283818949414973/1044900860058140672/grid-0635.png

#

also:
artstyle > style
artstyle is two tokens as opposed to style which is just one token, so in that aspect artstyle is worse, but in respect to the AIs understanding of what I want, artstyle is better

e.g.
1st image: image of Emma Watson in the lok style
2nd image: image of Emma Watson in the lok artstyle
https://cdn.discordapp.com/attachments/1029222282511515678/1044902789559631882/grid-0644.png
https://cdn.discordapp.com/attachments/1029222282511515678/1044902789911945236/grid-0643.png

#

a more radical example of why "image of X in the Y artstyle" is superior

1st image: city tlok artstyle
2nd image: image of a city in the tlok artstyle

https://cdn.discordapp.com/attachments/1029222282511515678/1044904828721180672/grid-0649.png
https://cdn.discordapp.com/attachments/1029222282511515678/1044904829107048448/grid-0650.png

#

for those who dont get what i am doing here:

i trained my captions as just "caption tlok artstyle" so far
and have issues with prompting certain things properly in just the style
so i have done this testing and turns out "image of X in the Y artstyle" is a superior way of captioning in order to teach the model that you want just the style, nothing else
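
A small sketch of the two caption formats being compared, useful when rewriting a whole dataset's captions. The helper names are made up; the "tlok" token and the phrasing come from the messages above.

```python
def old_caption(subject, style):
    """The original short format: '<subject> <style> artstyle'."""
    return f"{subject} {style} artstyle"


def style_caption(subject, style):
    """The longer format reported to isolate the style better."""
    return f"image of {subject} in the {style} artstyle"


for subject in ("a city", "a car"):
    print(old_caption(subject, "tlok"), "->", style_caption(subject, "tlok"))
```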

stone garden Nov 23, 2022, 10:26 AM

#

final matrix for those who dont get what i am doing here: i trained my captions as just "cap...

One thing that could be a bias here for the "artstyle > style" argument is that you trained on artstyle keyword, but unless I misunderstood, you didn't train on the style keyword to compare, just used the style token instead of artstyle while prompting, right ?

If so it would seem logical that the "artstyle" keyword is stronger than style, since it's what you trained on, and would respond the other way if trained on the other token

I wanted to do an example with the "AnimeChan Style" model trained on this "style" token instead of "artstyle", but to be honest, both keywords come out as well to me there. slightly different results but none is really closer to the dataset artstyle... Well, I'll try "artstyle" next training

your examples are quite striking in difference, mine is quite tame (I ran different samplers and seeds, it came out the same), so this artstyle method seems to have merits 🙂

final matrix Nov 23, 2022, 10:29 AM

#

stone garden One thing that could be a bias here for the "artstyle > style" argument is that ...

no those examples are all from vanilla SD

#

not from my trained model

#

so that the testing is pure

#

you can run those test prompts in your vanilla SD model right now and get the same results

stone garden Nov 23, 2022, 10:30 AM

#

ho ok sorry I didn't understand right

#

so style or artstyle acts a little like a "class" here
on one side, it should help the teaching process get there faster, since it starts from closer,
on the other side, it will train the artstyle token a little too, so maybe having a small set of pictures in other artstyles, tagged as such, could help prevent bleeding a little
That's super info. I'm not doing a model right now, but I'll try it with the PoW submission model tomorrow

viral sail Nov 23, 2022, 10:50 AM

#

Hey guys! I'm currently delving into the topic of textual inversion, and I often come across the term "DreamBooth". Would anyone mind explaining or pointing me to a resource to see what the difference is? Thanks!
As in: What is an embedding, what is a textual inversion and what is DreamBooth? Seems like they're all in the same vein

rich cipher Nov 23, 2022, 12:51 PM

#

Hi guys, I'm somewhat stuck with dreambooth... I trained a model based on 1.4 with 25 images of me, started with 2500 steps (learning rate at 0.000001 or 1e-6) and got results that don't really look like me. Added 500 steps multiple times now, and all I'm getting out of a simple prompt is a random Chinese dude with my beard and cheeks and tons of age spots like a 100 year old... Either I'm doing something fundamentally wrong or dreambooth just hates me...

timid fable Nov 23, 2022, 1:23 PM

#

hello, is there any tutorial on how to train a style with SD Dreambooth extension? I'm getting really weird results

split acorn Nov 23, 2022, 1:51 PM

#

rich cipher Hi guys, I'm somewhat stuck with dreambooth... I trained a model based on 1.4 wi...

it only took 2k for good results that look like my input alicatHm2 I'd suggest following along with some of the online dreambooth tutorials. There might be missing or outdated information, but they should give you a good hint on what's missing or what went wrong... or at least an idea alicatPog

open abyss Nov 23, 2022, 3:47 PM

#

viral sail Hey guys! I'm currently delving into the topic of textual inversion, and I ofte...

An embedding is a map from text to a set of tokens that will likely result in an image matching the text; when you do Textual Inversion, you create an embedding (saved as a small *.pt file) that is added to the list of things the model "knows". The training process involves tens or hundreds of samples where the text prompt is fed into one end, and your reference image into the other. The reference image is diffused into noise, then the model tries to find the embedding that results in the same target noise, so that when the diffusion is reversed, the image that comes out resembles the starting image (and therefore matches the prompt).

open abyss Nov 23, 2022, 3:51 PM

#

viral sail Hey guys! I'm currently delving into the topic of textual inversion, and I ofte...

Dreambooth, on the other hand, is fine-tuning the model. It does contrastive/ablative learning where it contrasts pairs of images from the "prior class" and your new target -- for example a dog and a Weimaraner dog -- and learns the difference in much the same way. But in order to do that, it's modifying the checkpoint model itself, essentially grafting the whole training process onto an existing .ckpt file so that the result is a checkpoint that has stored your concept in a rare/sparse token somewhere.
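
One way to picture the core difference: Textual Inversion leaves the model's weights untouched and only learns a new embedding vector, while DreamBooth updates the model's weights themselves. The toy below is a deliberately tiny stand-in, not either real algorithm. The "model" is a single multiplication, the loss is a squared error rather than a diffusion denoising loss, and DreamBooth's prior-preservation class images are left out. All names and numbers are hypothetical.

```python
def train(weight, embedding, target, update, lr=0.1, steps=200):
    """Gradient descent on (weight * embedding - target)**2.

    update="embedding": only the embedding moves (Textual Inversion-like).
    update="weight":    only the model weight moves (DreamBooth-like).
    """
    for _ in range(steps):
        error = weight * embedding - target
        if update == "embedding":
            embedding -= lr * 2 * error * weight
        elif update == "weight":
            weight -= lr * 2 * error * embedding
        else:
            raise ValueError(f"unknown update mode: {update}")
    return weight, embedding


# Same starting point for both runs: weight 2.0, embedding 0.5, target 3.0.
ti_weight, ti_embedding = train(2.0, 0.5, 3.0, update="embedding")
db_weight, db_embedding = train(2.0, 0.5, 3.0, update="weight")

print("TI-like:", ti_weight, ti_embedding)  # weight stays at 2.0
print("DB-like:", db_weight, db_embedding)  # embedding stays at 0.5
```

This matches the practical difference described in the two messages: the first approach yields a small embedding file that plugs into an unchanged model, the second yields a whole new checkpoint.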

viral sail Nov 23, 2022, 3:53 PM

#

open abyss Dreambooth, on the other hand, is fine-tuning the model. It does contrastive/abl...

awesome, thank you so much jurph!

#

May I use this info with citing you in my SD wiki? (https://jerostephan.notion.site/Stable-Diffusion-Reference-618eef6f74f049868886a2c300446425#108b8126689545a9948675837c804ad2)

Jerome Stephan's Notion on Notion

Stable Diffusion: Reference

This guide is meant to be a starting point for those who want to start using SD, findings from experiments and a possibility to learn more for everyone who is already experienced in SD. Inspired by Ethans incredible travellers guide to latent space for DD, with most of what you’ll read driven by the amazing community around this open source soft...

open abyss Nov 23, 2022, 3:56 PM

#

You may, but you probably want to go back to the original academic papers that the two are based on, and use the authors' exact words (or cite their papers). I'm certain that there are things in my descriptions that gloss over other (academically) important differences

final matrix Nov 23, 2022, 5:13 PM

#

im wondering how bad it would be to just straight up use this concept art shot instead of making each of those shots their own image like in the 2nd image
like say i caption this as "character concept art of" and prompt this character but with character concept art as a negative prompt.... what would happen? would i get one shot of the character or multiple?

stone garden Nov 24, 2022, 12:21 AM

#

I finished an installer for Shivam with xformers and 8bitadam for windows, with a working test run. I'm working on a UI to prepare, queue and test/compare learning sessions in it

stone garden Nov 24, 2022, 1:05 AM

#

the install and test run work nicely, the UI works but needs more tests. Windows and Nvidia only. Not sure about the minimum VRAM it requires: it runs on 16GB for sure, and it should theoretically run on 12GB but I couldn't test

📎 ShivamUI-v0.zip

stone garden Nov 24, 2022, 2:06 AM

#

confirmed to work on 12GB 🙂

open abyss Nov 24, 2022, 2:35 AM

#

stone garden I finished an installer for Shivam with xformers and 8bitadam for windows, with ...

What’s Shivam-and-all-that? I’ve got 12GB. Do I want it?

stone garden Nov 24, 2022, 2:40 AM

#

open abyss What’s Shivam-and-all-that? I’ve got 12GB. Do I want it?

Shivam is an implementation of dreambooth that can let you train models on new pictures
https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
It's really nice but to set it up it can take some time. I don't guarantee my zip will work on all computers though, I've tried on 2 for now

GitHub

diffusers/examples/dreambooth at main · ShivamShrirao/diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch - diffusers/examples/dreambooth at main · ShivamShrirao/diffusers

rigid sorrel Nov 24, 2022, 2:48 AM

#

QQ for you model pros. I finally have a model with good output, but it only does 1 thing. How do I tune it for more generality? AKA I'd like to produce something besides super amazing female portraits.

hot breach Nov 24, 2022, 3:04 AM

#

add some other stuff back into your training set

rigid sorrel Nov 24, 2022, 3:11 AM

#

other stuff like what?

atomic lagoon Nov 24, 2022, 3:25 AM

#

super fast

hot breach Nov 24, 2022, 3:31 AM

#

rigid sorrel other stuff like what?

the stuff that you messed up

#

if you want to tune more generally, add "general" images back

#

seeing examples of your problems would help

rigid sorrel Nov 24, 2022, 3:32 AM

#

I didn't have general images in the first place to remove. So you're saying I should find some general images close to this style?

#

They all look like this, please help!

hot breach Nov 24, 2022, 3:33 AM

#

not sure I fully understand your goals and problems achieving, may need more details

rigid sorrel Nov 24, 2022, 3:33 AM

#

its very difficult to get anything but a close-up portrait

hot breach Nov 24, 2022, 3:33 AM

#

plenty of fine tuners now allow you to caption images so you can train as many concepts, subjects, and styles at once as you wish

#

instead of class/token nonsense

#

you need labeled data but you can train tons of stuff at once if you don't mind the compute time

#

and the labeling effort

atomic lagoon Nov 24, 2022, 3:34 AM

#

rigid sorrel

where did you do this?

rigid sorrel Nov 24, 2022, 3:35 AM

#

atomic lagoon where did you do this?

on my computer!

atomic lagoon Nov 24, 2022, 3:35 AM

#

which model?

hot breach Nov 24, 2022, 3:36 AM

#

@rigid sorrel https://huggingface.co/panopstor/ff7r-stable-diffusion I made a model with 6-7 characters and a bunch of cityscape stuff in it all at once

#

its just a matter of labeling all the images and using a trainer that can take captions for each image for the most part, you can train 5 artstyles at once if you want

rigid sorrel Nov 24, 2022, 3:37 AM

#

atomic lagoon which model?

My model =]. I haven't shared it yet.

hot breach Nov 24, 2022, 3:38 AM

#

https://huggingface.co/panopstor/ff7r-stable-diffusion/blob/main/mega_test_characters01_sm.jpg this shows six of the characters

rigid sorrel Nov 24, 2022, 3:38 AM

#

hot breach @rigid sorrel https://huggingface.co/panopstor/ff7r-stable-diffusion I m...

Yeah, I def. want to do that. The 'booth version I have can do it.

stone garden Nov 24, 2022, 10:05 AM

#

Hey all, i need your wisdom 😇 Do you think it would be possible to train a model to produce only white/transparent-looking background images where the subject is always centered, isolated and full body? Would that work if i train it on a dataset of images with only these characteristics?

stone garden Nov 24, 2022, 11:03 AM

#

stone garden Hey all, i need your wisdom 😇 Do you think it would be possible to train a mode...

yes, it would work.
In fact, depending on what you use to train, using a secondary concept for white background, and a third concept for "fullbody" or centered shot, lets you even teach a main concept alongside.
Teaching "centered" and "white background" isn't very hard because it's mostly refining those existing concepts, already present in SD but not good enough

#

you can go multiple ways here

#

first you can teach all those concepts on their own, and be able to call for each one individually in the resulting model. This is the hardest option and can be more complicated to train well, but it gives the most possibilities in the end

#

or you could teach all those concepts at once, under a common token, like "photoshoot" or something. A single concept is easier to teach but you won't be able to prompt for only one of those characteristics you wanted, it will be all or nothing

#

if you want this "photoshoot" concept to work, you will need to teach it as "a style", so it can be used on a whole lot of types of subjects

#

here is nitrosocke's guide on training styles if you're interested https://github.com/nitrosocke/dreambooth-training-guide/blob/main/README.md
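
The two options above differ mainly in how the training images get captioned. The sketch below contrasts them with hypothetical helper names; the "photoshoot" token is the example from the message, while the trait wording and comma-separated layout are assumptions that depend on the trainer used.

```python
def caption_separate(subject, traits):
    """Option 1: each characteristic keeps its own promptable keyword."""
    return ", ".join([subject] + list(traits))


def caption_common(subject, token="photoshoot"):
    """Option 2: every characteristic is folded into one shared token."""
    return f"{subject}, {token}"


traits = ["white background", "centered", "full body"]
print(caption_separate("a wooden chair", traits))
print(caption_common("a wooden chair"))
```

With the first format any subset of the traits can be prompted later; with the second, the single token brings all of them at once.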

stone garden Nov 24, 2022, 11:24 AM

#

stone garden yes, it would work. In fact, depending on what you use to train, using a seconda...

Super detailed answer, thank you!! So i guess dreambooth is the way to go in this case. I think training it all under a common token would make sense for my usecase, my ideal concept would be a 3D Model/ asset preview.
Another question, can i also train them on a specific Resolution? I know SD is trained on 512x512, could i train it on 512 x 896 for example without it giving me duplicates?

stone garden Nov 24, 2022, 11:26 AM

#

stone garden Super detailed answer, thank you!! So i guess dreambooth is the way to go in thi...

training on a specific ratio/resolution is supported in very few Dreambooth implementations. I only know of EveryDream that does that, and it requires 24GB VRAM

#

but yes it would work

#

if you want to test it yourself on a local install and have at least 12GB VRAM, I tried to put together an installer yesterday, and I'd love some more testers. Only for Windows and NVIDIA cards, but still

#

#🔧｜finetune message

stone garden Nov 24, 2022, 11:37 AM

#

stone garden if you want to test yourself on local install and have at least 12GB VRAM, I tri...

Would love to try it but i sadly have only 6GB VRAM 😦

#

That means you still can, but on Colab

#

Lots of people in #🏞｜general-with-images post things they made with a model they trained that same day on Colab

crimson wasp Nov 24, 2022, 12:06 PM

#

Does anybody know what kind of parameters the huge multi-subject models like Novel AI/Waifu Diffusion/the furry ones which I think exist though may have dreamed it, use for training? e.g. Would you want a lower learning rate when training on a hundred different concepts at once? I'm currently trying to build a model for controlled posing and number of people with text by using a few thousand really well tagged images

final matrix Nov 24, 2022, 2:09 PM

#

final matrix a more radical example of why "image of X in the Y artstyle" is superior 1st im...

after a ton more testing with more different prompts and subjects i have come to the conclusion that "image of X by Y" is the best one

final matrix Nov 24, 2022, 2:49 PM

#

the differences are quite subtle though

final matrix Nov 24, 2022, 4:21 PM

#

ok so maybe i just need to be more specific about my captions
like i just did a test prompt in vanilla sd

"photo of a house in the ghibli style" vs. "photo of a house in the ghibli architectural style" vs. "(photo:2) of a house in the ghibli style"
or
"photo of a house in the art noveau style" vs. "photo of a house in the art noveau architectural style"
clearly being more specific helps a lot. right now i just captioned all show screenshots as "tlok artstyle" (not even "image of ... in the...") and the people in there as, say, "woman"

so if i make my captions more specific and detailed i may be able to better contain the infection. like say "screencap from the tlok anime of the woman Korra with the avatar ponytail hairstyle wearing a sleeveless avatar shirt and avatar fur skirt and avatar armband and avatar sleeves and standing in front of background art of a building in the tlok architectural style"
or something similar
though that would be a gigantic 49 tokens, the fantasy card model trained its model on images that all on average have around 50 tokens so it should be fine?
https://cdn.discordapp.com/attachments/1026983549154361425/1045373282385412166/grid-0264.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373282725134396/grid-0271.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373283006156871/grid-0270.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373283429793932/grid-0272.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373283773710336/grid-0273.png

unique berry Nov 24, 2022, 4:41 PM

#

If I have ~15k images that are well captioned, does it make sense to do dreambooth? I’ve been doing unet finetuning and the model doesn’t seem to learn the style in the dataset (white background, front view of the shoe in middle)

dapper prism Nov 24, 2022, 6:10 PM

#

unique berry If I have ~15k images that are well captioned, does it make sense to do dreamboo...

I think you may need to finetune the text encoder too, but I'm not sure EveryDream can do that yet

copper lagoon Nov 24, 2022, 8:44 PM

#

https://huggingface.co/nitrosocke/redshift-diffusion Does anybody know how to load the custom VAE from this model into auto1111? It's in the bin format, renaming to .vae.pt just throws an error.

onyx vault Nov 24, 2022, 10:57 PM

#

copper lagoon https://huggingface.co/nitrosocke/redshift-diffusion Does anybody know how to lo...

That model doesn’t come with a vae

copper lagoon Nov 24, 2022, 10:57 PM

#

well there's a bunch of custom parts in there

#

I can't reproduce the results 1:1 anyway, for some reason

#

with stated parameters and prompt

tiny wolf Nov 24, 2022, 10:58 PM

#

Anyone know any good models that are good for creating images of birds eye views of towns and cities?

frank mango Nov 24, 2022, 11:16 PM

#

Hey, is using multiple GPU (A5000) faster for dreambooth training ?

#

Also, is there any documentation about the "concept list" of dreambooth, the expected format for example ? I can only find such link with no answers : https://www.reddit.com/r/StableDiffusion/comments/ypxhw7/dreambooth_how_to_use_concept_list_or_prompt/

r/StableDiffusion - Dreambooth: How to use "concept list" or prompt...

1 vote and 0 comments so far on Reddit
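
Assuming the question is about the ShivamShrirao fork discussed earlier in the channel, its concept list is a JSON file holding one entry per concept, each with an instance prompt, a class prompt, and the two matching image folders; the file path is then passed to the training script with `--concepts_list`. The prompts and directory names below are placeholders, and other Dreambooth implementations may expect a different layout.

```python
import json

# One dictionary per concept; prompts and folders are placeholder values.
concepts_list = [
    {
        "instance_prompt": "photo of zwx dog",
        "class_prompt": "photo of a dog",
        "instance_data_dir": "data/zwx_dog",
        "class_data_dir": "data/dog",
    },
    {
        "instance_prompt": "photo of ukj person",
        "class_prompt": "photo of a person",
        "instance_data_dir": "data/ukj_person",
        "class_data_dir": "data/person",
    },
]

with open("concepts_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)

# Read it back to confirm the file is valid JSON with the expected keys.
with open("concepts_list.json") as f:
    loaded = json.load(f)
print(len(loaded), sorted(loaded[0]))
```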

heavy lynx Nov 24, 2022, 11:24 PM

#

Hey, what's the difference for the repos/colabs of ShivamShrirao and TheLastBen for dreambooth?

tacit bronze Nov 25, 2022, 7:57 AM

#

a first version dreambooth gen to create hd2d game screenshots (octopath/livalive/dq3)
"hd2d, a crowded desert marketplace in front of a pyramid"

pallid perch Nov 25, 2022, 9:17 AM

#

what would be the best method for training an action or activity, like fencing, acrobatics, etc.? dreambooth ckpt, TI, hypernetwork?

frosty wave Nov 25, 2022, 9:51 AM

#

stone garden Shivam is an implementation of dreambooth that can let you train models on new p...

Hi! What is the difference with the sd_dreambooth_extension that can be installed from the Automatic1111 web-ui? That one works well with my 12 Gb gpu but I can give a try to another if there are interesting differences

stone garden Nov 25, 2022, 9:52 AM

#

frosty wave Hi! What is the difference with the sd_dreambooth_extension that can be installe...

For now, not a lot. I mainly reorganised the params and made it a clean standalone install. The extension was buggy for me, and not very user-friendly.
But I'm still working on the UI to make something better.

#

Well main difference is that it trains on diffusers instead of ckpt

frosty wave Nov 25, 2022, 9:56 AM

#

^Ho ok sorry I'm still a bit new to all this but... what are diffusers, again? 🙂

grave carbon Nov 25, 2022, 11:24 AM

#

frosty wave ^Ho ok sorry I'm still a bit new to all this but... what are diffusers, again? ...

Diffusers I think are inside the ckpt. It needs a conversion. We did conversion all the time when dreamboothing so...

open abyss Nov 25, 2022, 11:50 AM

#

tiny wolf Anyone know any good models that are good for creating images of birds eye views...

This should be fairly easy to train because satellite images and maps are widely available and may even be in the training set. See if satellite image in the style of SPOT or MAXAR gets you results. Might need to use img2img on street maps to get good results because the structure of roads is a complex rule-based layout — like hands & text, a generative model will struggle and a human brain can recognize flaws in it

tough gazelle Nov 25, 2022, 1:21 PM

#

Using "Satellite photo of" works pretty well in SD 2.0

viral jay Nov 25, 2022, 1:59 PM

#

tough gazelle Using "Satellite photo of" works pretty well in SD 2.0

Does it generate good imagery? Any example?

dense hawk Nov 25, 2022, 3:17 PM

#

Hey I want to setup my own personal thumbs up thumbs down local image gallery so I could easily manually train my aesthetic. Does anyone know of any open source simple solution that tags the images in a way that works well?

tight cradle Nov 25, 2022, 3:20 PM

#

Hi Folks, So if I would like to train my SD on a certain aesthetic style of logo design and I collected a large amount of different images representing this style. How would I use these images to train my AI using automatic1111? Do I use dream booth for this or should I use the Train tab (which sub tab)? Are these even the right solution for what I want to achieve? It is not like I am trying to train on the face of a single person but I want my prompts to have a specific style text-image is not giving me at this moment. What do you recommend?

dapper prism Nov 25, 2022, 3:53 PM

#

grave carbon Diffusers I think are inside the ckpt. It needs a conversion. We did conversion ...

Diffusers is simply a different format for the model weights. The ckpt and diffusers weights are exactly the same, just stored in a different layout.

stone garden Nov 25, 2022, 4:14 PM

#

https://github.com/Guizmus/DreamboothSimpleUI
Dreambooth local install for windows, compatible with 2.0, delivered to you with 2 working examples, one on 1.5 and one on 2.0.
Requires conda and git
The 1.5 example should run under 12GB VRAM

tough gazelle Nov 25, 2022, 4:40 PM

#

viral jay Does it generate good imagery? Any example?

I can get some later, at work at the moment

rigid sorrel Nov 25, 2022, 5:47 PM

#

why do folks use "style" as a training set vs "art"?

#

"style" is a lot of fashion & magazine prints

#

sorry, as class images

viral jay Nov 25, 2022, 7:19 PM

#

tough gazelle I can get some later, at work at the moment

Thanks, I've actually installed SD 2.0 here for testing but for me it's still not good enough. I'll try training a model on sat imagery, hopefully I get better results with it

tough gazelle Nov 25, 2022, 7:20 PM

#

viral jay Thanks I've actually installed the sd 2.0 here for testing but for me it still n...

stuff like this, sometimes even with bing/google maps city name overlays on lol

viral jay Nov 25, 2022, 7:21 PM

#

lol yeah kinda same result I got here, I'm looking for better quality, maybe dreambooth can achieve it, will see, need some hq tiles

dapper prism Nov 25, 2022, 7:34 PM

#

How does EveryDream training time vary by dataset size? Can I train a small dataset in a matter of hours?

viral jay Nov 25, 2022, 8:45 PM

#

using some tiles for training I can get some higher quality images, but I still need to tune this thing, probably need captioning to describe where the fields, city, etc. are

stone garden Nov 25, 2022, 8:46 PM

#

dapper prism How does EveryDream training time vary by dataset size? Can I train a small data...

you can, yes.
EveryDream trains by counting "repeats" and not "steps". It will repeat each picture of the dataset X times. So the bigger the dataset, the longer the epoch (an epoch is a given number of repeats for each image)
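
To make the repeats-versus-steps point concrete, here is a rough sketch of the arithmetic. The function name, the batch size parameter and the example numbers are assumptions for illustration, not EveryDream's actual code or config keys.

```python
import math

def steps_per_epoch(num_images: int, repeats: int, batch_size: int = 1) -> int:
    """Optimizer steps needed to show every image `repeats` times (hypothetical helper)."""
    samples = num_images * repeats
    return math.ceil(samples / batch_size)

# Same repeats and batch size, different dataset sizes (made-up numbers).
small = steps_per_epoch(num_images=30, repeats=10, batch_size=2)
large = steps_per_epoch(num_images=3000, repeats=10, batch_size=2)
print(small, large)
```

With the repeats fixed, epoch length grows linearly with the number of images, which is why a small dataset can finish in hours.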

gloomy belfry Nov 25, 2022, 9:07 PM

#

anyone tried finetuning the 768 model?

#

after a few epochs all I get in the samples is this

#

#

(prompt is just a test, that's the image name as well)

#

works fine with the SD2-base model so I'm not sure what's wrong

sharp solstice Nov 25, 2022, 9:32 PM

#

gloomy belfry anyone tried finetuning the 768 model?

which repo is that?

gloomy belfry Nov 25, 2022, 9:33 PM

#

mine, but it's based on Shivam's which is diffusers

sharp solstice Nov 25, 2022, 9:34 PM

#

i found that applying an embedding from 1.x "works" on 2.x but you get the brown smudge like in your screenshot

#

automatic1111

gloomy belfry Nov 25, 2022, 9:34 PM

#

amm

sharp solstice Nov 25, 2022, 9:34 PM

#

i can't train based on 2.0 because the textual inversion stuff is written to work for the previous clip model

gloomy belfry Nov 25, 2022, 9:34 PM

#

the only difference between the two is that they used v-prediction training on it

sharp solstice Nov 25, 2022, 9:35 PM

#

it kinda sounds like you were training on the previous clip model and tried to use the result on the new one

sharp solstice Nov 25, 2022, 9:35 PM

#

gloomy belfry after a few epochs all I get in the samples is this

did it ever look ok?

gloomy belfry Nov 25, 2022, 9:35 PM

#

nah, I'm training and sampling on the same model from diffusers

gloomy belfry Nov 25, 2022, 9:35 PM

#

sharp solstice did it ever look ok?

at the start yes

sharp solstice Nov 25, 2022, 9:36 PM

#

ah okay, then nvm i'm not sure what i'm talking about

gloomy belfry Nov 25, 2022, 9:37 PM

#

actually it might just be the scheduler

#

but not sure

sharp solstice Nov 25, 2022, 9:38 PM

#

where can i try your repo?

gloomy belfry Nov 25, 2022, 9:38 PM

#

you can't, it's not online

#

I might share it at some point but I'm still changing it a lot

#

it's basically Everydream but based on diffusers, so I can do 24 batch_size on a 4090

sharp solstice Nov 25, 2022, 9:40 PM

#

is it notebook based?

gloomy belfry Nov 25, 2022, 9:41 PM

#

no

#

it's geared more towards large fine-tuning

weary knot Nov 25, 2022, 11:46 PM

#

Does dreambooth always involve a new model? Can you join two dreambooth finetunings into a single model?

hot breach Nov 26, 2022, 12:04 AM

#

mostly a new model, there are some model merging tools but I think they'll generally water down the stuff you train

weary knot Nov 26, 2022, 12:12 AM

#

thanks a lot

crimson wasp Nov 26, 2022, 2:43 AM

#

Just a heads up, the CLIP text encoder's vocabulary has a lot of words which end in </w>, often the same word twice with and without </w>. It seems to indicate the end of a word made up of 1 to n tokens and likely has some sort of marker in the embedding which SD understands.
apple = "apple</w>":3055
applesauce = "apple":8629 "sauce</w>":5520
computer = "compu":11639 "ter</w>":652
ahsoka = "ah":1772 "so":759 "ka</w>":1525
arnold schwarzenegger = "arnold</w>":13609 "schwarzenegger</w>":33860
emma watson = "emma</w>":7445 "watson</w>":9294
chadwick boseman = "chad":13095 "wick</w>":6479 "bo":647 "seman</w>":28378
https://huggingface.co/openai/clip-vit-base-patch32/resolve/main/vocab.json

It could be useful when picking tokens for training (e.g. using an existing single word token might be bad, using a new combination might be better), or for initializing embeddings (where more tokens is probably more unwieldy, and finding an init string with fewer tokens is probably ideal)

crimson wasp Nov 26, 2022, 3:11 AM

#

sks for example is "sks</w>":48136, which is probably why the issues with it were so pronounced. It's already an existing single word

if your token was applesauce you'd be working with 'apple sauce</w>' but if your token was 'apple sauce' you'd be working with 'apple</w> sauce</w>', so the space is important for whether the token is going to be treated as one joined word or multiple prompt words (presuming there's a word termination marker on those </w> embeddings which SD understands). Essentially just don't have spaces in names when training a model and I suspect you'll get better results

apple_sauce is treated as apple</w>_</w>sauce</w> so underscores won't work in place of spaces

If you want variations of a shirt, like v1shirt, v2shirt, you'd want to make sure those actually tokenize to always ending in the shirt</w> token, so that the first part is working as a modifier of the same final shirt concept (you can check with automatic's webui-tokenizer plugin)
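
A toy sketch of the splitting behaviour described above. This is NOT the real CLIP BPE algorithm: it is a greedy longest-match over a tiny hand-copied vocabulary (only tokens quoted in these messages, with "_</w>" implied by the apple_sauce example), and the function names are made up. It only illustrates how spaces and underscores change which pieces carry the </w> end-of-word marker.

```python
import re

# Toy vocabulary copied from the messages above; the real vocab.json is far larger.
TOY_VOCAB = {
    "apple</w>", "apple", "sauce</w>", "compu", "ter</w>",
    "ah", "so", "ka</w>", "sks</w>", "_</w>",
}

def tokenize_word(word):
    """Greedy longest-match split; only the final piece gets the </w> marker."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            candidate = piece + "</w>" if end == len(word) else piece
            if candidate in TOY_VOCAB:
                pieces.append(candidate)
                start = end
                break
        else:
            raise KeyError(f"toy vocab cannot cover {word[start:]!r}")
    return pieces

def tokenize(text):
    """Split on whitespace and punctuation first, then tokenize each word."""
    tokens = []
    for word in re.findall(r"[a-z]+|[^\sa-z]", text.lower()):
        tokens.extend(tokenize_word(word))
    return tokens

for name in ("applesauce", "apple sauce", "apple_sauce", "sks"):
    print(name, "->", tokenize(name))
```

For real names, check against the actual tokenizer (for example the webui tokenizer plugin mentioned above) rather than this toy.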

acoustic sluice Nov 26, 2022, 3:16 AM

#

gloomy belfry

I'm getting this too

karmic warren Nov 26, 2022, 8:19 AM

#

not sure if this'll do as a place to dump code but just put this script together from the Loopback script and the sd_upscaling script, with a smidge of xy_grid for good measure 😄 all credit goes to the authors of those

📎 loopback2.py

#

currently testing, seems to be working up to now 🤞

#

maybe a quick description, grab the file put it in the scripts folder and reboot the UI

#

and for the use: it's an img2img script that will sd_upscale the given image, downscale it back to the original resolution, then run another sd_upscale on that output with the next scheduler in the schedulers list, repeating up to the number of loops specified
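
The loop described above could be sketched roughly like this. It is not the contents of loopback2.py: the `upscale` and `downscale` callables are placeholders passed in by the caller, cycling through the list when there are more loops than entries is an assumption, and so is downscaling after the final round. The fake functions below only track sizes so the flow can be followed without a GPU.

```python
from itertools import cycle, islice

def loopback_upscale(image, samplers, loops, upscale, downscale):
    """Run `loops` rounds of upscale -> downscale, one list entry per round."""
    current = image
    for sampler in islice(cycle(samplers), loops):
        enlarged = upscale(current, sampler)  # SD upscale with this round's entry
        current = downscale(enlarged)         # back to the original resolution
    return current

# Stand-ins for the real image operations: an "image" is (width, height, history).
def fake_upscale(img, sampler):
    width, height, history = img
    return (width * 2, height * 2, history + [sampler])

def fake_downscale(img):
    width, height, history = img
    return (width // 2, height // 2, history)

result = loopback_upscale((512, 512, []), ["Euler a", "DDIM"], 3,
                          fake_upscale, fake_downscale)
print(result)
```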

#

a quick illustration with it

karmic warren Nov 26, 2022, 9:35 AM

#

regal harbor Nov 26, 2022, 10:12 AM

#

should I put all images of the same pose into a single folder? e.g. standing facing forward, standing facing sideways, and standing facing away? And train each separately? Or does it matter if I've BLIPed them all?

half folio Nov 26, 2022, 12:50 PM

#

So I have a question, is it possible to finetune on top of v2-velocity?

#

I know there's already dreambooth models trained on top of v2-base but velocity model appears to be a little bit different, no?

sharp solstice Nov 26, 2022, 6:18 PM

#

i have a feeling that when training subjects the composition of the training data has a substantial impact when training
for example if you train a subject up close, you'll have a harder time reproducing the subject from further away even though the angle is the same

#

with that in mind, i'm thinking maybe it helps to not only flip, but to rotate, scale and translate the subject around as well

#

maybe that could be aided with inpainting or something for varied backgrounds

dapper prism Nov 26, 2022, 7:39 PM

#

Anyone had any luck DreamBoothing or finetuning the 768 model yet?

sharp solstice Nov 26, 2022, 7:58 PM

#

dapper prism Anyone had any luck DreamBoothing or finetuning the 768 model yet?

i'm trying right now with embeddings in a111. it's kinda slow to use 768 though

hidden plinth Nov 26, 2022, 8:15 PM

#

anyone dreambooth/finetune the depth2img model yet?

slim sphinx Nov 27, 2022, 12:24 AM

#

tight cradle Hi Folks, So if I would like to train my SD on a certain aesthetic style of logo...

I have the same question

remote vapor Nov 27, 2022, 3:13 AM

#

how are people finetuning the new 2.0 in automatic1111 ....... i cant get it to work

hardy lantern Nov 27, 2022, 5:06 AM

#

how do you remove duplicate images from larger datasets, what software do you use?
None of the software I have tried has been satisfactory.
there are good python modules but they would require a GUI as they also produce false positives and I have no idea about GUI programming. 😭
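
For the simplest case, byte-identical copies can be found without a GUI, because there are effectively no false positives to review. A minimal sketch using only the standard library follows; the function name is made up, and it will NOT catch resized or re-encoded near-duplicates, which is the harder problem raised above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(folder):
    """Group files under `folder` whose bytes are identical (hypothetical helper)."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists files with the same content, so all but one per group can be dropped from the dataset.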

crimson wasp Nov 27, 2022, 6:48 AM

#

has anybody tried training with CLIP skip?

hardy lantern Nov 27, 2022, 7:03 AM

#

crimson wasp has anybody tried training with CLIP skip?

of course, if the model you want to fine tune on uses clipskip itself, you should do it too for better results

crimson wasp Nov 27, 2022, 7:05 AM

#

hardy lantern of course, if the model you want to fine tune on uses clipskip itself, you shoul...

I've been doing it from 1.5, and I'm not sure it's helping (with the things NovelAI saw it help with)

hardy lantern Nov 27, 2022, 7:06 AM

#

since 1.5 doesn't use clipskip you shouldn't either, unless you train a really huge dataset

crimson wasp Nov 27, 2022, 7:07 AM

#

hardy lantern since 1.5 doesn't use clipskip you shouldn't either, unless you train a really h...

yeah training with a 4.1k manually tagged dataset with about 15 tags per file, might just not have had enough steps yet at 22k with 5e-7 lr

hardy lantern Nov 27, 2022, 7:09 AM

#

i mean a lot more pictures, besides i wonder about the 5e-7

#

no clipskip and 5e-6 did not have the desired results?

#

how many epochs do you have?

#

which trainer?

#

4.1k manual - that's hardworking catwhaaa

delicate stream Nov 27, 2022, 11:25 AM

#

For anyone wondering why the loss is weird in the first 1k steps: this is constant LR at 1e-6. I noticed in all of my training runs that the loss decreases after 2k steps.

#

This is another attempt at training

#

So just because your loss rate is going up and down in the beginning, doesn't mean you did something wrong

#

#

Some of you might be thinking, isn't the loss rate supposed to be like this?

#

Nope

#

This is like Hypernetworks, where the loss will never go down like that, to be honest. It should alternate between a high and a low number but eventually go down, tho super slowly. The loss is more or less a way to know when your model is blowing up: if it keeps rising, it's dying. If it's just alternating between two numbers, it's fine. It's still learning, and as long as it doesn't keep going up it will be fine. So lower is not necessarily better.

#

example in some cases

#

when you train it might always say 0.169 or 0.165

#

but then you go and cancel and train again from where it left off and it will say 0.134 or 0.140

#

so there's no need for us to just see it go down, it's more of a number saying "Yup, im training, but don't let me get out of proportions."

#

and if your loss is 0

#

then....you fucked up
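
The rule of thumb above (alternating is fine, steadily rising is bad, zero is broken) could be turned into a quick check on logged loss values. This is only a sketch: the window size, the 5% tolerance and the function name are arbitrary assumptions, not settings from any trainer.

```python
from statistics import mean

def loss_status(losses, window=100, tolerance=0.05):
    """Compare the average of the last `window` losses with the window before it."""
    if len(losses) < 2 * window:
        return "too early to tell"  # early noise is expected
    recent = mean(losses[-window:])
    previous = mean(losses[-2 * window:-window])
    if recent == 0:
        return "collapsed"  # a loss of exactly 0 means something went wrong
    if recent > previous * (1 + tolerance):
        return "rising"     # keeps going up: the model may be blowing up
    return "ok"             # flat or alternating is fine

alternating = [0.169, 0.165] * 200
print(loss_status(alternating))
```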

weary knot Nov 27, 2022, 12:08 PM

#

delicate stream For anyone wondering why the Loss rate is weird in the first 1k steps. This is c...

thank you, very informative

delicate stream Nov 27, 2022, 12:11 PM

#

No problem, just trying to spread the word....so people don't waste weeks on a model like i did trying to figure out the perfect settings

#

In the end i finally figured out i just needed to wait, and just have a good dataset

weary knot Nov 27, 2022, 12:41 PM

#

that gets me every time "oh it's not working" and then you just need to wait a very long time

#

iterative development in deep learning requires so much of the patience I don't have xD

delicate stream Nov 27, 2022, 12:45 PM

#

weary knot that gets me every time "oh it's not working" and then you just need to wait a v...

i feel that, i have said so many times, "i'll just do 4k steps" but in reality something like 8k or more is a bit better, granted you have a varied dataset. In my case, 21 images and 8400 steps at 4 epochs

weary knot Nov 27, 2022, 12:46 PM

#

only 21 images? What technique are you using

delicate stream Nov 27, 2022, 12:46 PM

#

also i highly recommend [filewords] for when you have text files

weary knot Nov 27, 2022, 12:46 PM

#

textual inversion?

delicate stream Nov 27, 2022, 12:46 PM

#

Dreambooth

weary knot Nov 27, 2022, 12:47 PM

#

I thought dreambooth required more images... maybe I got it mixed up

delicate stream Nov 27, 2022, 12:47 PM

#

Nope, Dreambooth requires less, 3-5 depending on what the subject is, but you can add more

weary knot Nov 27, 2022, 12:48 PM

#

very cool. I never looked too much into dreambooth because it requires generating another ckpt

#

I love the textual inversion idea, that dude is a genius

#

but, alas, dreambooth is more precise in generating what you want

delicate stream Nov 27, 2022, 12:49 PM

#

Yup, i used to do Hypernetworks all the time but after dreambooth, man i really don't want to go back. But to be honest they are both good methods, tho dreambooth usually has a better understanding of your subject.

weary knot Nov 27, 2022, 12:50 PM

#

cool. How do you make it fit on a 3090 (assuming that's your card). iirc, dreambooth requires a lot of compute

#

from the paper, I mean. Haven't seen the implementations

delicate stream Nov 27, 2022, 12:50 PM

#

weary knot cool. How do you make it fit on a 3090 (assuming that's your card). iirc, dreamboo...

using a 3080Ti actually, 12Gb vram

#

i just use the Dreambooth extension for the Automatic1111

weary knot Nov 27, 2022, 12:51 PM

#

oh I see

delicate stream Nov 27, 2022, 12:51 PM

#

weary knot Nov 27, 2022, 12:51 PM

#

the community is amazing

delicate stream Nov 27, 2022, 12:51 PM

#

https://github.com/d8ahazard/sd_dreambooth_extension

#

huh....after updating the dreambooth extension and using the new sampler, i am watching a miracle

#

it GOES DOWN!

#

Still its going up again a bit

#

but like always im betting after 2k it will just keep decreasing

weary knot Nov 27, 2022, 12:55 PM

#

that's very good news

delicate stream Nov 27, 2022, 12:56 PM

#

hopefully, the only options yesterday was DDIM sampler training, now im trying the new Euler_a sampler for training.

weary knot Nov 27, 2022, 12:56 PM

#

I spent so much money renting an A6000 instance to develop my own code.... now that I made the proof of concept for what I wanted, I should use these repos to run on my local machine

weary knot Nov 27, 2022, 12:57 PM

#

delicate stream hopefully, the only options yesterday was DDIM sampler training, now im trying t...

huh it's even weird to me that it wasn't already available. I wonder if euler_a has any problem with gradients

#

maybe it was just a matter of plugging it in

delicate stream Nov 27, 2022, 12:58 PM

#

idk but until yesterday my only options were LMS, DDIM, PNDM and DPM

#

#

Now i have 2 more

weary knot Nov 27, 2022, 12:59 PM

#

cool

delicate stream Nov 27, 2022, 12:59 PM

#

if they provide better quality

#

How much Vram do you have?

#

if you have 10 GB you can run Dreambooth, tho you might need memory saving options

weary knot Nov 27, 2022, 1:03 PM

#

I use a 3060 with 12GB VRAM

delicate stream Nov 27, 2022, 1:03 PM

#

then you can easily run dreambooth

#

no Linux or docker or WSL needed

weary knot Nov 27, 2022, 1:04 PM

#

yeah. The thing is that I'm developing my own code, so I need to find a way for that to work with the optimizations. But these repositories should help a lot, ty

delicate stream Nov 27, 2022, 1:04 PM

#

Sure no problem

delicate stream Nov 27, 2022, 1:56 PM

#

this time the loss went down after 3k

#

#

but

#

the dreambooth extension has a bug

#

it was introduced today

#

#

No matter what i do, it wont generate the CKPT, so i have to wait till they fix it

#

i did a bug report but haven't gotten any replies

weary knot Nov 27, 2022, 2:16 PM

#

wow so sad xD all of this and it doesn't get saved

median sun Nov 27, 2022, 2:35 PM

#

has anyone used dreambooth together with a hypernetwork and textual inversion prompt keyword?

delicate stream Nov 27, 2022, 2:35 PM

#

weary knot wow so sad xD all of this and it doesn't get saved

no, its saved

#

it's just i cant generate the ckpt

#

so i have to wait till they fix the bug and then

#

just hit that

weary knot Nov 27, 2022, 2:36 PM

#

I see

weary knot Nov 27, 2022, 2:36 PM

#

median sun has anyone used dreambooth together with a hypernetwork and textual inversion pr...

well, it should work, I don't see any reason why not. Are you having any issues?

median sun Nov 27, 2022, 2:37 PM

#

nah just asking

delicate stream Nov 27, 2022, 2:37 PM

#

median sun has anyone used dreambooth together with a hypernetwork and textual inversion pr...

some people do yes

median sun Nov 27, 2022, 2:37 PM

#

I've seen someone use hypernetworks with TI with good results

delicate stream Nov 27, 2022, 2:37 PM

#

i personally haven't but i heard Dreambooth + Embeddings are good

median sun Nov 27, 2022, 2:40 PM

#

I still need to figure out how to get good training results

#

but as I'll be testing all 3 with the same source material I'll def. mix em up as test

#

what is good amount of source images?

#

I've just rendered out a 20 min cartoon as test

#

that gave me about 1k images at 1fps

#

there are a lot of similar looking images there, but they do have small differences like the characters in a different pose etc.

#

should I keep them or delete everything that is too similar?

delicate stream Nov 27, 2022, 2:46 PM

#

Are you training a HN or using Deforum?

median sun Nov 27, 2022, 2:47 PM

#

I'm trying TI HN and Dreambooth atm

#

deforum was for videos wasn't it?

delicate stream Nov 27, 2022, 2:47 PM

#

yeah but since you said 1fps i thought it was deforum

median sun Nov 27, 2022, 2:48 PM

#

nah I just rendered a video into images

delicate stream Nov 27, 2022, 2:49 PM

#

So you are using the video as a source to train the HN?

median sun Nov 27, 2022, 2:49 PM

#

I used 1fps to reduce the img count as 30fps would have created something like 30 to 50k pictures

#

yes

#

the dreambooth result is really messy

delicate stream Nov 27, 2022, 2:49 PM

#

ahh i understand now

weary knot Nov 27, 2022, 2:50 PM

#

I know of someone who did something similar

median sun Nov 27, 2022, 2:50 PM

#

prob. because it's not focused on one object

weary knot Nov 27, 2022, 2:50 PM

#

they animated a face then used dreambooth on it

delicate stream Nov 27, 2022, 2:50 PM

#

What are you training? a style, person or an object?

weary knot Nov 27, 2022, 2:50 PM

#

and it worked

median sun Nov 27, 2022, 2:50 PM

#

a cartoon series

#

so a style I guess

delicate stream Nov 27, 2022, 2:51 PM

#

The number of images for a style can generally be anywhere from 50 to 200 in most cases, but you have to understand it's better to choose quality over quantity

#

i haven't really trained a style yet, only people

median sun Nov 27, 2022, 2:52 PM

#

good I'll reduce it then

delicate stream Nov 27, 2022, 2:53 PM

#

as for the LR i think for styles 1e-6 is recommended or lower from what i heard.

alpine rose Nov 27, 2022, 2:59 PM

#

https://civitai.com/models/1097
published my samdoesarts model, check it out :)

00619-520424121-a_woman_by_samdoesarts_octane_houdini_vfx_render_detailed_4k___1.1.png

delicate stream Nov 27, 2022, 3:13 PM

#

alpine rose https://civitai.com/models/1097 published my samdoesarts model, check it out :)

Makes me want to train Belle Delphine

spring sun Nov 27, 2022, 3:19 PM

#

anyone know if hypernetworks should be working for 2.0 on auto1111?

median sun Nov 27, 2022, 3:40 PM

#

is textual inversion and hypernetwork training possible with 12gb vram?

#

for dreambooth I had to use fp16 and flash attention but it worked

delicate stream Nov 27, 2022, 3:43 PM

#

median sun is textual inversion and hypernetwork training possible with 12gb vram?

Hypernetwork IS textual inversion, same as embeddings

#

Dreambooth is just injecting the newly trained word and data that corresponds to generating the subject you trained. So technically dreambooth, HN and embeddings are textual inversion just in a different way.

median sun Nov 27, 2022, 3:45 PM

#

ic thx I'll call them embeddings then

spring sun Nov 27, 2022, 3:48 PM

#

I think that is possible with the Gradient accumulation, https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4886
but I don't know if it's already working on 2.0, I would like to know haha

delicate stream Nov 27, 2022, 3:48 PM

#

median sun ic thx I'll call them embeddings then

No i dont mean HN is embeddings

#

i mean this

#

#

you can train an embedding

#

or a hypernetwork

#

They are both textual inversion

median sun Nov 27, 2022, 3:49 PM

#

yes I got that

delicate stream Nov 27, 2022, 3:50 PM

#

median sun for dreambooth I had to use fp16 and flash attention but it worked

also if you have 12Gb all you need is these settings

#

no need for flash memory attention

#

use xformers

median sun Nov 27, 2022, 3:52 PM

#

where can I find those for TI?

delicate stream Nov 27, 2022, 3:53 PM

#

Thats the dreambooth tab

#

#

on advanced settings

median sun Nov 27, 2022, 3:54 PM

#

So the dreambooth settings work for embeddings and hypernetworks?

delicate stream Nov 27, 2022, 3:54 PM

#

No, dreambooth uses different settings

#

for HN

#

hm...in the past i used to do

median sun Nov 27, 2022, 3:55 PM

#

I know the dreambooth settings already

#

I can't find any for HN and EM tho

delicate stream Nov 27, 2022, 3:56 PM

#

5e-5:200, 5e-6:3000, 1e-6:8000, 1e-7
12k steps
Save an image every 100
And use [filewords]

#

For embeddings i really dont know

#

i never got good results so i never did them

#

#

for HN

#

#

other LR i tried

#

for styles

#

in my case i made a text file and named it 90s.txt and inside i just put 90s anime style

#

and then used that as the Prompt template file

#

for people

median sun Nov 27, 2022, 4:00 PM

#

I have no idea what the 5e-5: etc is

delicate stream Nov 27, 2022, 4:00 PM

#

i do *name of person*.txt

median sun Nov 27, 2022, 4:01 PM

#

are you sure that your settings decrease vram use?

delicate stream Nov 27, 2022, 4:01 PM

#

and inside *name of person*, [filewords]

delicate stream Nov 27, 2022, 4:01 PM

#

median sun I have no idea what the 5e-5: etc is

The learning rate

#

example

#

5e-5:200, 5e-6:3000, 1e-6:8000, 1e-7

#

from 0 - 200 the LR is 5e-5 (0.00005) and from 201 - 3000 is 5e-6 (0.000005)

#

etc
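
A small sketch of how a schedule string like the one above can be read. This is not the web UI's own parser; the function names are made up, and the boundary handling (steps up to and including 200 use the first rate) follows the explanation in the message above.

```python
def parse_lr_schedule(text):
    """Parse 'rate:last_step, ..., final_rate' into (rate, last_step) pairs."""
    schedule = []
    for part in text.split(","):
        rate, _, last_step = part.strip().partition(":")
        # An entry without ':step' applies until the end of training.
        schedule.append((float(rate), int(last_step) if last_step else None))
    return schedule

def lr_at(schedule, step):
    """Return the learning rate in effect at a given training step."""
    for rate, last_step in schedule:
        if last_step is None or step <= last_step:
            return rate
    return schedule[-1][0]  # past the last listed step: keep the final rate

schedule = parse_lr_schedule("5e-5:200, 5e-6:3000, 1e-6:8000, 1e-7")
for step in (100, 201, 5000, 12000):
    print(step, lr_at(schedule, step))
```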

median sun Nov 27, 2022, 4:02 PM

#

ahh ok

#

my settings aren't that different from yours I'll try decreasing img size

delicate stream Nov 27, 2022, 4:03 PM

#

median sun are you sure that your settings decrease vram use?

it doesn't decrease vram, it decreases the learning rate over time so it doesn't overtrain and die

delicate stream Nov 27, 2022, 4:03 PM

#

median sun my settings aren't that different from yours I'll try decreasing img size

i highly recommend img size is 512x512

median sun Nov 27, 2022, 4:04 PM

#

I'd like to keep the aspect ratio of the originals

#

as I'll also generate with that later

delicate stream Nov 27, 2022, 4:05 PM

#

But then you'd have the problem where you have different aspect ratios than what you generate, the AI is way better at 1:1 aspect ratios (512x512, 768x768, 1024x1024) than wide images

#

tho i suggest dont go over 512

median sun Nov 27, 2022, 4:05 PM

#

768x432 seems to work now

delicate stream Nov 27, 2022, 4:05 PM

#

for better results

median sun Nov 27, 2022, 4:05 PM

#

I'll test that later

#

also 512x512 backgrounds are kinda juck

delicate stream Nov 27, 2022, 4:06 PM

#

if you really want to get the entire image, i recommend cutting it in half so each piece fits into 512, so you'd have one landscape image split across two training images
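
The split could be computed like this, as a sketch. The function name is made up; the boxes use the (left, upper, right, lower) order that Pillow's Image.crop accepts. Only a 2:1 image gives two exact squares, anything else would still need cropping or padding to reach 512x512.

```python
def split_landscape(width, height):
    """Return crop boxes for the left and right halves of a landscape image."""
    middle = width // 2
    left_box = (0, 0, middle, height)
    right_box = (middle, 0, width, height)
    return [left_box, right_box]

# A 1024x512 frame becomes two 512x512 training images.
print(split_landscape(1024, 512))
```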

delicate stream Nov 27, 2022, 4:06 PM

#

median sun also 512x512 backgrounds are kinda juck

remember, you can generate them at any size later using the width and height