#🔧|finetune


maiden grail
#

Hmm, so merge them with different prompts? I.e., I have a staff of power and a staff of the archmage, with prompts for each?

The problem with that is that now I have to make a model PER item. It multiplies the amount of training data I need, so 25 items become 125 images.

Or, I guess, maybe that's the only way to do it.

Another way to put it is that I am trying to train the model both on the concept of a "staff" and on the concept of "of power" or "of the archmage", and then combine them into one input.

So I guess it does seem like multiple models...

old igloo
#

As far as training data goes, if you want each different named staff to be represented differently visually, then you'd need images of each anyway, so I don't understand what you meant by it multiplying the number of images needed.

maiden grail
#

Well, that almost gets there. The idea is that I want to be able to do something like "robes of power": I would have a concept of a robe, a concept of a staff, and a concept of "of power".

#

Let's say "robes of power" doesn't exist, so it would be newly generated.

old igloo
#

Same thing, but you'd need to do a separate training for each class/object combination

#

"staff_of_power staff", "robe_of_power robe"

maiden grail
#

Why would it be "robe_of_power robe" ?

Wouldn't it be "of_power robe"?

old igloo
#

Actually in that sense, you're talking about defining the concept of "of power"

#

And as far as this goes, I don't know how one defines a concept that can be applied to any class

#

I don't know if, or how that would work.

maiden grail
#

Yeah, there are 2 concepts here actually.

Concept 1: how to "define a descriptor" like "of power".

Concept 2: combining a descriptor model with a different object model, because I don't want to use the SD baseline definition of a "staff". I want to fine-tune both.

old igloo
#

Maybe the way to do this is to provide your own class images for "of_power"

#

Which would include samples of all kinds of things with "of_power" applied.

#

But I think you'd still end up having to define every object you want that to be applicable to.

maiden grail
#

Gotcha, yeah this makes sense and was about what I was thinking. Will try it and see how it goes. Thanks!

old igloo
#

The only other idea I have is that I know there's a different technique for training on a particular style

#

So if you consider "of power" to be a style, that might be another approach.

maiden grail
#

That might work too, what is that?

old igloo
#

I'm not sure, but I think it's similar to normal training, but you supply the class images.

old igloo
# maiden grail That might work too, what is that?

We can finally train Stable Diffusion using our own art and photos thanks to textual inversion! The hugging face team recently created 2 Google Colab docs that allow you to upload your own images and train the base 1.4 stable diffusion model on them. In this video, I will show you how you can use these Colab docs to understand the concept behind...

maiden grail
#

Wait, is merging just done through the Checkpoint Merger tab? It only combines 3 models at a time... Are there problems with doing merges iteratively, like 100 of them? Will it bias toward the latest ones? Or I guess if you put a different prompt on each, it shouldn't matter?

old igloo
#

I'm not 100% clear on the differences between these concepts yet, but I think what you're talking about could be handled (or is handled) by textual inversion.

#

And I think if you go the Textual Inversion route, you wouldn't have to do any merging, because you'd be doing a single training on a batch of images representing the "of power" style.

maiden grail
#

So the "of power" thing is fine. I think that would just be a regular model.

But, the issue is that I don't want an "of power" model alone. Instead, I also want a robe model, and a staff model, because the defaults for both of these are bad.

#

So I would need to merge models, just because both parts need to be good

old igloo
#

Yeah, I think that sounds right. You would first use textual inversion to train on the style, then use DreamBooth to train on specific objects, merging those models into the style model. As far as I understand it, you can merge A into B, then C into AB, then D into ABC, then E into ABCD, and so on, and it doesn't dilute with each additional merge, it just appends.
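For what it's worth, here is the arithmetic of chaining merges, assuming the Checkpoint Merger's weighted-sum mode computes (1 - M) * A + M * B for every weight tensor, where M is the multiplier slider. Plain floats stand in for tensors, and the helper names are made up for this sketch. Under that assumption, each merge scales everything merged so far by (1 - M), so with a fixed M the earliest models do get diluted and the latest ones dominate; choosing M = 1/k for the k-th model turns the chain into an equal running average instead.

```python
import math


def weighted_sum(a, b, m):
    """Weighted-sum merge of one weight value: (1 - m) * a + m * b."""
    return (1 - m) * a + m * b


def chain_shares(multipliers):
    """Share of each source model in the final weights when merging one model
    at a time (A with B, then C into AB, ...). multipliers[i] is the M used
    when model number i + 2 is merged in."""
    shares = [1.0]  # start from model A alone
    for m in multipliers:
        # Everything merged so far is scaled by (1 - m) ...
        shares = [weighted_sum(s, 0.0, m) for s in shares]
        # ... and the newly added model contributes m.
        shares.append(weighted_sum(0.0, 1.0, m))
    return shares


# Fixed M = 0.5 for four merges (A, B, C, D, E):
print(chain_shares([0.5] * 4))  # [0.0625, 0.0625, 0.125, 0.25, 0.5]

# M = 1/2, 1/3, 1/4, 1/5 keeps every model at an equal 1/5 share:
print(all(math.isclose(s, 0.2) for s in chain_shares([1 / 2, 1 / 3, 1 / 4, 1 / 5])))  # True
```

Other merge modes (such as add difference) follow different arithmetic, so this only covers weighted sum.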

stone garden
#

Is there a way to train it so you can put an effect on an image? For example, have images that are the control group and images that are the effect group, and tell it that any image you generate can have the style of the effect group? Like a style transfer, but trained specifically on matching control and effect images.

maiden grail
#

Why did you suggest SD for the style part? Or is it that DreamBooth just can't do styles? I've found DreamBooth to be universally better than the standard embedding training.

stone garden
#

Or is it just enough to train with lots of images already in the desired style?

maiden grail
stone garden
#

Yeah, my specific use case will still have to wait a bit for really-high-res capabilities, but basically we should be able to train a system to output files that have something like a high-res color-halftone pattern, so they can be easily printed with some processes.

#

Imagine a style transfer, but more of a system that outputs images meeting certain criteria, like color count, and in patterns preset for printing, like halftones

#

Maybe with enough training on really good high-res datasets it can be done, as long as the training text/image matching is very specific. Is DreamBooth done by naming the images, or does it just use the images themselves?

#

Trying to determine whether I should still look deeper into more custom fine-tuning methods

maiden grail
#

There are 2 fields in the DreamBooth tab that look promising, but I don't know how they work.

"concept list" and "classification directory".

No idea what those do, or how they work, but they sound related.

#

Even "class prompt" could help? I feel like there are a bunch of fields already in these models, and people just don't know how to use them....

surreal mango
#

okay so question

#

I have 17 images

#

how many training steps should I do?

#

for dreambooth

old igloo
old igloo
# surreal mango for dreambooth

As far as I understand it, there isn't a one-size-fits-all formula for this. What I have been doing with my recent models is using 5000 steps, with checkpoints every 1000 steps. Then I can test which of the 5 checkpoints produces the best results. Usually some are overtrained, some are undertrained, and one or two are in the Goldilocks zone.

surreal mango
#

wait I just realized

#

It says this

#

but Ill still take your advice

old igloo
# surreal mango wait I just realized

Right. Since you have 17 images and 17 × 200 = 3400, 5000 isn't a bad upper limit: you'd be going a bit beyond the recommendation, and by setting checkpoints every 1000 steps, you can test whether you get better results from 3000, 4000, or 5000 steps. You could also just go big and set it to something like 10000 steps, but you'll almost certainly not be happy with the results past a certain point.
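The "images × 200" rule of thumb is easier to compare across datasets as repeats per image. A small sketch, assuming batch size 1 (each step trains on one image) and the 1000-step checkpoint interval described above; `repeats_per_image` is just a name for this example.

```python
def repeats_per_image(steps, num_images, batch_size=1):
    """How many times each training image has been seen after `steps` steps,
    assuming the trainer cycles evenly through the dataset."""
    return steps * batch_size / num_images


num_images = 17
recommended = num_images * 200  # the rule of thumb: 3400 steps
checkpoints = range(1000, 5001, 1000)  # save every 1000 steps up to 5000

print(f"recommended: {recommended} steps")
for step in checkpoints:
    print(f"step {step}: ~{repeats_per_image(step, num_images):.0f} repeats per image")
```

So the 3000-step checkpoint sits just below the ~200 repeats the rule suggests (about 176), while 4000 and 5000 go past it (about 235 and 294). With a larger batch size, each step covers more images, so the same repeat count is reached in fewer steps.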

surreal mango
old igloo
#

I have not tried anything other than 512x512 because my understanding is that's what SD was initially trained on, and I think if you were to attempt to double the image size, you'd need double the vram.
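One nuance on the VRAM point: going from 512x512 to 1024x1024 quadruples the pixel count rather than doubling it. A rough sketch, assuming the usual Stable Diffusion v1 layout where the VAE downsamples by 8 into a 4-channel latent and the UNet's highest-resolution self-attention attends over every latent position. Real memory use also depends on the weights, optimizer state and whether memory-efficient attention (e.g. xformers) is enabled, so read these as growth factors, not VRAM figures.

```python
def sd_sizes(side, vae_factor=8, latent_channels=4):
    """Rough size figures for a square SD v1 training image of `side` pixels."""
    latent_side = side // vae_factor
    tokens = latent_side * latent_side  # positions seen by the top-level self-attention
    return {
        "pixels": side * side,
        "latent_elements": latent_channels * tokens,
        # per attention head, if the full attention matrix is materialized
        "attention_matrix_entries": tokens * tokens,
    }


base, big = sd_sizes(512), sd_sizes(1024)
for key in base:
    print(f"{key}: {big[key] / base[key]:.0f}x")
```

Pixels and latents grow 4x, and a naively computed attention matrix grows 16x, which is why memory-efficient attention matters so much at higher resolutions.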

surreal mango
#

One last question: should I leave this at default?

old igloo
#

I'm not familiar with that setting. The notebook I use doesn't have that.

tough gazelle
stone garden
#

Okay, some sample images from a DreamBooth run: 1200 steps, LR 2e-6, on 13 images at 1024. It's a little bit too colourful compared to the dataset.

surreal mango
#

like strong colors or something

#

also maybe not be british

#

(im jk)

stone garden
#

Is xformers for DreamBooth supposed to give better-quality results or something? I see it gives me more VRAM to play with, but it has a really big cost on my it/s.

icy olive
hot breach
#

Runpod notebook now available on EveryDream. Also recent updates include training on anything without bothering to crop or resize, mass autoprune all your checkpoints, and a tools repo is available to help you caption your images on a 4GB GPU or on colab/gdrive: https://github.com/victorchall/EveryDream-trainer/blob/main/README.md


warm gull
jovial iris
#

do you have the dreambooth model?

surreal mango
#

How would I apply an anime art style to a DreamBooth model and still make it look like the person's face?

hot breach
#

did you try "soandso drawn in the style of anime" ?

stone garden
#

I'm a little ashamed to try out your EveryDream tool on a Rick Roll dataset, but I'll do it, for science.

#

last time I tried, it was my first ever DB, and I ended up with a "rollrick" token that was creating a tiger or a baby half of the time

terse cradle
#

can you send me the model?

stone garden
#

Hello, a question: roughly how many images does one need to fine-tune a generic model, not for a specific object or person, but more like those stylistic models (trinart, Elden Ring, etc.)?

terse cradle
#

iirc around 50 imgs

tired wind
# stone garden okey, some sample images from dreambooth run 1200 steps LR 2e-6 on 13 1024 im...

You might want to look at what the instance prompt was, it could be pulling in some brighter colors from that. Remember dreambooth is supposed to take an existing concept and piggyback on it. Like X dog becomes your specific dog, but it still needs to attach itself to the dog concept. So if you did X painting style, or X some_artist painting, then the bright colors might be leaking from there rather than your dataset.

stone garden
viral jay
viral jay
terse cradle
tired wind
tired wind
# viral jay unfortunately I can't share it, It's trained with partner paid graphic content

With your class prompts, you generated unique images that weren't specifically emojis. From what I understood of dreambooth the class images should have been for the original concept, e.g. if it was X dogs the class prompt images would have just been dog. Was there a reason you generated them as mostly unrelated images of groups of objects? It isn't clear to me that the class images are influencing the embedded concept, however I do see that they can shift the model output when the embedding token is excluded.
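For context on how class images enter training: in the DreamBooth prior-preservation setup, each batch holds instance images (captioned with the unique token, e.g. "X dog") and class images (captioned with the plain class, e.g. "dog"), and the loss is the noise-prediction error on the instance half plus a weighted error on the class half. The sketch below uses NumPy arrays in place of the UNet's noise predictions; `prior_loss_weight` mirrors the option of that name in the diffusers DreamBooth example script, and the batch layout (instance samples first, class samples second) is an assumption of this sketch.

```python
import numpy as np


def dreambooth_loss(noise_pred, noise_target, prior_loss_weight=1.0):
    """Prior-preservation loss for a batch whose first half is instance images
    and whose second half is class (regularization) images."""
    pred_instance, pred_class = np.split(noise_pred, 2, axis=0)
    target_instance, target_class = np.split(noise_target, 2, axis=0)
    # How well the model denoises *your* subject under the unique-token prompt.
    instance_loss = np.mean((pred_instance - target_instance) ** 2)
    # How well it still denoises generic class images under the plain class prompt;
    # this term is what keeps "dog" from collapsing into "X dog".
    prior_loss = np.mean((pred_class - target_class) ** 2)
    return instance_loss + prior_loss_weight * prior_loss


rng = np.random.default_rng(0)
target = rng.standard_normal((4, 4, 64, 64))  # 2 instance + 2 class latents
pred = target + 0.1 * rng.standard_normal(target.shape)
print(dreambooth_loss(pred, target))
```

Setting `prior_loss_weight` to 0 reduces this to plain fine-tuning on the instance images, which is one way to see why the class images mainly affect what the model does without the token.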

old igloo
#

Can anyone help me better understand the factors and/or settings that can cause over-training? I trained on 30 images, with 5000 steps, checkpointing every 1000 steps, so I have 5 models (1000,2000,3000,4000,5000 step). I used a learning rate of 5e-7 on this training, and I suspect that setting is the one I need to adjust because even my 1000 step model is overtrained and I have to use a CFG of 5 or lower to get it to actually apply concepts from the prompt to the image and not just spit out images of the original subject.

#

Like if overtraining happened because of the learning rate of 5e-7, would I go up or down on that value to attempt to correct the issue?

cobalt sorrel
#

probably with these values you will have better results.

old igloo
tired wind
old igloo
# tired wind It could be overtraining even past 1,000 steps. From my own experience with arou...

Thank you for the suggestion. It was hard to say which images were being over-represented, because it would generate unique-looking images of the subject, but it just wouldn't apply transformations. I'd have to step down to CFG 5 or lower, and then the subject would be poorly represented, so I couldn't really find a good balancing point. I went through and removed the lowest-resolution images from the set, taking it from 30 to 12 images, then retrained it as above. Now I am getting much better results: transformations are being applied and the subject is preserved at CFG 7.5, so I think this model is good now.

short ivy
#

An official music video that I made for DJ Pone (with fine-tuning dreambooth) https://youtu.be/p3eygmzWJYI 😁

DJ Pone - Paradis (Official Music Video)
New single available: https://bfan.link/djpone-paradis
Subscribe to the YouTube channel: https://bit.ly/djpone-yt

Music video directed by neb: https://www.instagram.com/nebsh83

"Paradis"
Produced by DJ Pone and Blasé
Lead Vocals by Disiz
Composed by DJ pone and Romain Hainaut
Written by Disiz
Arrangeme...

sand pine
#

Is this model available anywhere?

honest saddle
#

I have a dataset of around 500 different named subjects, each with 5 or so poses in the same style. I would like to train a model with this style, and think it would be cool to also retain the subjects. What would be the best way to do this?

#

If keeping the named subjects isn't possible, what's the best way to train a checkpoint with 2500 samples of a style?

stone garden
#

Results are pretty good

wise trench
#

Has anyone done a lot of toying with DreamBooth and found decent learning rates for digital drawings?

#

Can't seem to find a good middle ground: it's either overtrained and will only generate the original images, or it's undertrained, misses details, and is quite noisy

somber shell
#

Hi all. Does anyone know how to make sure the subject is always fully in the frame? Example: a portrait.
Thanks much.

grave carbon
#

I'm trying to dreambooth locally on 12GB VRAM

#

but running out of memory

#

anything I can do?

open oasis
stone garden
stone garden
# wise trench Can't seem to find a good middle ground it's either over trained and will only g...

As with any subject, I usually start at 1e-6 on a constant schedule. I train it to 100 times the number of pictures in the dataset, then start saving checkpoints every 250 steps and comparing the outputs, and finally refine at a lower LR from the best checkpoint, before it gets overtrained.
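Turned into numbers, that workflow might look like the sketch below, assuming batch size 1. The 1e-6 constant LR, 100 steps per image and 250-step checkpoint interval come from the message above; the number of extra checkpoints and the 0.5 factor for the lower-LR refinement are assumptions (halving is what gets suggested further down the thread), and `plan_run` is a hypothetical helper, not a trainer option.

```python
def plan_run(num_images, base_lr=1e-6, steps_per_image=100,
             checkpoint_every=250, extra_checkpoints=4, refine_factor=0.5):
    """Sketch of the schedule: train to steps_per_image * num_images at a
    constant LR, then save a checkpoint every `checkpoint_every` steps, then
    refine from the best checkpoint at a reduced LR."""
    first_phase_steps = num_images * steps_per_image
    checkpoints = [first_phase_steps + i * checkpoint_every
                   for i in range(extra_checkpoints + 1)]
    return {
        "base_lr": base_lr,
        "checkpoints": checkpoints,
        "refine_lr": base_lr * refine_factor,
    }


plan = plan_run(20)
print(plan["checkpoints"])  # [2000, 2250, 2500, 2750, 3000]
print(plan["refine_lr"])
```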

I'm not sure if "digital painting" or anything like that changes the recommended params. From my tests, it's mostly the quality of your dataset that changes everything, then comes your choice of token, and lastly whether there was already anything related to your new concept in the model. Each of those will influence the right learning parameters, imo

wise trench
#

I can try cooking it, then lowering the LR

#

What does scale learning rate do, by the way, if you know? And is SKS or some other random phrase really needed? A paper I read on it said it wasn't, as long as your token is something not already used

stone garden
# wise trench Issue I'm having is when it does get to a point of being really good it only gen...

Take the last checkpoint before you hit this stage, lower your learning rate by half and do some steps on it to hit that sweet spot.
You may need to alter your dataset to get better end results here: test more extensively the things you are teaching it, see whether some are already learned, and for the end of the training, focus on the pictures you think deserve more attention.

For example, I had a model that was too character-centric, generating the main character in every image. I did 250 more steps on a dataset where I had removed most of the character pictures, and it fixed the balance.

And if everything starts to look like your input, it may also be a problem with prior preservation, and a bigger class image collection could help too

stone garden
# wise trench What does scale learning rate do by the way if you know? And is SKS/Some other r...

I'm not sure about scale learning rate, I'm not on that install.

About the choice of token like SKS: it matters. The theory is that you take a "neutral" token like SKS, not tied to any concept in the model, so your learning is not tainted.

But if you take a token close to your concept (like the name of an actor resembling a subject in your dataset), you will :

  • start the learning from this "point" in the model, so closer to your results and you may need less steps
  • destroy completely the concept that was behind your chosen token in the initial model, of course
wise trench
#

I'll have to try again soon

#

Really wish I could figure out what scale learning rate does couldn't find anything on it

stone garden
#

Found it, directly in Shivam's code:

#

parser.add_argument(
    "--scale_lr",
    action="store_true",
    default=False,
    help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
)

#

Not sure what it means in practice though
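In practice it multiplies the learning rate you pass in by the effective batch size. In the diffusers DreamBooth example script (which Shivam's repo is a fork of), the flag triggers learning_rate * gradient_accumulation_steps * train_batch_size * num_processes, to the best of my knowledge; the function below only reproduces that arithmetic, with argument names chosen for this sketch.

```python
def scaled_lr(learning_rate, train_batch_size=1, gradient_accumulation_steps=1,
              num_gpus=1, scale_lr=True):
    """Learning rate actually used when --scale_lr is set: the base rate times
    the number of images that contribute to each optimizer update."""
    if not scale_lr:
        return learning_rate
    return learning_rate * gradient_accumulation_steps * train_batch_size * num_gpus


print(scaled_lr(1e-6))  # single GPU, batch 1, no accumulation: unchanged
print(scaled_lr(1e-6, train_batch_size=2, gradient_accumulation_steps=2))  # 4x the base rate
```

So on a typical single-GPU run with batch size 1 and no gradient accumulation, the flag has no effect; it only matters once you raise the batch size or accumulate gradients.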

wise trench
#

Yeah I'm using the new auto extension

#

It's the most user-friendly thing I've found; previous DreamBooth stuff needed WSL on Windows, etc.

#

Took up like 50GB so I uninstalled it after using it like once or twice

stone garden
#

Hehe Dreambooth costs a lot of hard drive space

#

I need to curate my learning from last night

#

I have about 500GB of models to test

#

I'm using Freon's method, very nice, but yeah, you need a fridging high hardware for it

stone garden
# honest saddle I have a dataset of around 500 different named subjects, each with 5 or so poses...

The easiest way to do it, or at least to try, would be to use Shivam's repo or EveryDream. Both allow multiple concepts.
In Shivam's, you would prepare a training with 500 concepts, each one being 5 pictures of a single subject, with an instance prompt like "MrJohns in Super style" if you were teaching MrJohns as one of the 500 characters, "Super style" being the name of your style, for example.
EveryDream would let you do the same by captioning each of your pictures with a prompt that fits it, just like in Shivam's but even more precisely.
But for such a large set of subjects, with only 5 pictures each, you will have 2 problems:
1/ individual characters may not take well if their 5 pictures aren't diverse enough
2/ with such a large corpus, you will need a huge number of steps... for 2.5k pictures, I would start with a baseline of 250k steps! That feels insane

If you just want the style, take 50 to 100 pictures from your dataset showing as much variety as possible, and do a single-concept DB on them. A lot easier
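For the multi-concept route in Shivam's repo, concepts are passed as a JSON list via --concepts_list, where each entry (as in the repo's Colab notebook, to the best of my knowledge) has instance_prompt, class_prompt, instance_data_dir and class_data_dir. With 500 subjects you'd generate that file rather than write it by hand. A minimal sketch, assuming one folder per subject named after the subject; the folder layout, the "super style" wording, the class prompt and the shared class-image folder are all hypothetical placeholders.

```python
import json
from pathlib import Path


def build_concepts(dataset_root, style="super style",
                   class_prompt="illustration", class_dir="class_images"):
    """One concept per subject folder under dataset_root.

    Hypothetical layout: dataset_root/<SubjectName>/ holds that subject's
    ~5 pictures; every concept shares one class-image folder."""
    concepts = []
    for subject_dir in sorted(p for p in Path(dataset_root).iterdir() if p.is_dir()):
        concepts.append({
            "instance_prompt": f"{subject_dir.name} in {style}",
            "class_prompt": class_prompt,  # placeholder: choose one that fits your data
            "instance_data_dir": str(subject_dir),
            "class_data_dir": class_dir,
        })
    return concepts


def write_concepts(dataset_root, out_file="concepts_list.json", **kwargs):
    """Write the list to the JSON file you would pass via --concepts_list."""
    concepts = build_concepts(dataset_root, **kwargs)
    Path(out_file).write_text(json.dumps(concepts, indent=2), encoding="utf-8")
    return len(concepts)
```

Calling write_concepts("dataset") on a folder of 500 subject directories would produce 500 entries, one per character.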

honest saddle
#

Is there a good guide for style training with dreambooth?

#

I know there are multiple repos for it now

stone garden
crimson wasp
#

Somebody claims to have come up with a new version of textual inversion which looks like it blows previous methods out of the water and can do better than finetuning the model in some situations, haven't tried it yet though: https://github.com/7eu7d7/DreamArtist-sd-webui-extension

GitHub

DreamArtist for Stable-Diffusion-webui extension. Contribute to 7eu7d7/DreamArtist-sd-webui-extension development by creating an account on GitHub.

north stream
#

ah it's finally out, it was first a PR, then a standalone thing. he said he was going to make an extension, glad to see it's done

#

time to try it

solemn latch
#

I'm wondering if this new technique works with less VRAM. I'd love to do some training with my 6GB RTX 2060

indigo inlet
indigo inlet
crimson wasp
grave carbon
#

it seems it installs some dependencies after 2 or 3 runs

#

but now I have a problem

#

I can't generate any style image

#

every generated image looks like a photo

#

what have i done wrong? maybe my class prompt or my instance prompt?

crimson wasp
#

if you disable the DreamArtist extension, you may need to replace /modules/ui.py with the original, since the extension changes it slightly, and then the webui doesn't start up without it (you may need to remove it from the extensions folder as well; disabling may not be enough)

crimson wasp
indigo inlet
# grave carbon Yes I managed to run it

I don't know much about it, but I think if the problem occurs with the generated ckpt file, it could be the class prompt or the class training images. Another thing that comes to mind is the files pointed to by --pretrained_model_name_or_path

frozen bobcat
#

When using DreamArtist training, I'm supposed to have one image in the dataset folder, right?

Can I include more than one image?

crimson wasp
grave carbon
#

all my model does is photo style

#

I tried like 3 models already

#

I can't generate any style

#

using dreambooth on A1111.

azure stump
#

Hi everyone, new to this community and would really appreciate some input. I used Hugging Face's Stable Diffusion DreamBooth Colab notebook to fine-tune the model on 6 pictures of a cologne bottle, but when I ran inference, the words from the original cologne bottle are not shown. I understand Stable Diffusion has some difficulty rendering text; I would be grateful to learn about solutions to the text rendering problem! Thank you! 🙂

rough marten
#

Can anyone explain to me what the difference between regularization images and training images are please? For context i'm a complete noob trying to play around with dreambooth and setting up my first style model

old igloo
stone garden
#

Could you pseudo-merge models at inference time by loading multiple UNets and merging their noise_pred at each step? Similar to how Disco Diffusion used to load multiple CLIP models and merge them

#

(assuming none of the other components were finetuned)

half folio
#

It could be possible, just had a look at the diffusers pipeline

half folio
#

yup, it is possible, I'm messing around with the pipeline right now

#

problem is, it needs a little bit too much vram

bitter matrix
#

What does this mean? /bin/bash: accelerate: command not found. It's the error I got in the training section of DreamBooth.

stone garden
#

Pretty jank but it works so I don't care 😂

half folio
#

All I did last night was loading the unet from two different models and calculating the weighted mean of the two noise_pred tensors resulting from them, couldn't be bothered to do all that because it was late though 😆
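A minimal sketch of that idea with classifier-free guidance, using plain callables in place of the UNets and NumPy arrays in place of tensors (the function and argument names are hypothetical, not the diffusers API). It assumes all models share the same VAE, text encoder and scheduler, so only the noise predictions differ. Because guidance is linear, averaging the raw predictions first and then applying guidance gives the same result when every model uses the same guidance scale.

```python
import numpy as np


def merged_noise_pred(unets, weights, latents, t, cond, uncond, guidance_scale=7.5):
    """Weighted mean of the guided noise predictions from several UNets.

    Each entry in `unets` is any callable (latents, t, text_embedding) -> noise
    prediction with the same shape as `latents`."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the result stays a mean
    combined = np.zeros_like(latents)
    for unet, w in zip(unets, weights):
        noise_uncond = unet(latents, t, uncond)
        noise_text = unet(latents, t, cond)
        # Standard classifier-free guidance for this model alone.
        guided = noise_uncond + guidance_scale * (noise_text - noise_uncond)
        combined += w * guided
    return combined


# Toy stand-ins for two fine-tuned UNets.
def unet_a(x, t, emb):
    return x + emb


def unet_b(x, t, emb):
    return x - emb


latents = np.ones((1, 4, 8, 8))
out = merged_noise_pred([unet_a, unet_b], [0.5, 0.5], latents, t=10, cond=1.0, uncond=0.0)
```

Each extra model adds two more UNet calls per step (conditional and unconditional), so compute and VRAM grow with the number of models.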

stone garden
#

That's all I'm doing too, just in a way where I can still use the pre-made diffusers pipelines

#

the important bit is __call__ at the bottom, which does exactly that

half folio
#

yep, I had a look at your code

fast gazelle
#

I'm trying to use a checkpoint as a dreambooth source checkpoint but about half my models aren't showing up in the dreambooth dropdown box. Any thoughts on how to change this?

crimson wasp
#

can anybody recommend an up-to-date finetuning repo which will work in under 12gb of vram on windows? I've almost finished creating an image tagger to create a novel-AI-like training dataset, and then just need to edit a finetuning repo's datasource to instead read the file locations/tags/clipping region from the database this image manager creates (and also to tag a ton of my artwork)

surreal mango
#

What would happen if I merged 2 different DreamBooth models that have the same subject?

stone garden
#

Can someone help me convert diffusers to a valid CKPT? It seems that either I'm using the official HF script wrong, or it doesn't build the CKPT correctly enough to pass the unpickle test

grand jay
#

excellent

dapper prism
# grand jay excellent

I've since added new datasets for "cat", "kitty", "sexy athlete", "cyberpunk", "guy", "femme fatale", "bikini model", and more!

stone garden
#

always good to share the results of processing power like that, thanks from the community and the planet

wide sky
#

Is it possible to train specific styles of eyes without including other facial features in the training? Tonight I wanted to make some tests, but it's my first time with textual inversion or DB :)

austere wigeon
#

I uninstalled the dreambooth extension and re-installed it, and now I am getting TypeError: start_training() takes 40 positional arguments but 41 were given any ideas?

raw wraith
vale egret
#

I really want to know if anyone can get DreamArtist working. The results in the repo look sick, but everyone I ask who has tried it didn't get good results

tough gazelle
#

The guy needs to do proper documentation

#

There's basically none

terse cradle
vale egret
old igloo
#

I've been having a hard time with one particular training. I'm using 20 headshots of my brother, taken with different cameras (some DSLR, some iphone), in different settings, with different clothes, all have good focus, cropped to 512x512. For the latest training I did 2000 steps at 1e-6 with 2000 class images in the class "man", with checkpoints every 500 steps. All of the checkpoints produce images that look like my brother with a prompt of "photo of xyz man" with CFG 7. But on all of the checkpoints, I can't get transformations to apply, which suggests they are all over-trained. If it's over-trained everywhere from 500 steps to 2000 steps, does that mean I need to reduce or increase the learning rate, or adjust some other setting? Would I be better off using the person class instead of the man class?

glossy rune
vale egret
#

How long does it take to make a dreambooth model? (I want to know iteration speed compared to inference & recommended step count)

old igloo
# glossy rune Do you train based on SD 1.4 or 1.5? I found 1.4 an easier / more forgiving base...

Thanks for responding!

  1. I am using 1.5 from hugging face.
  2. I am training with the text encoder
  3. Most of the photos were cropped down to headshots. A few included more torso. None were full body. They were all taken in totally different surroundings, etc. I just culled my photo library for photos of him to work with, and it's a mix of indoor, outdoor, studio lighting, natural lighting, etc. But only 20 images total.

Thanks for that guide, I'll try those tips out.

random star
#

idk if this counts as finetuning, but should i make my own textual inversion template prompts for a person subject?

#

the textual inversion subject template is kind of meh

#

does it actually affect the results?

vale egret
random star
#

thanks!

obtuse flint
#

Is it possible to fine-tune multiple subject types? As in, fine-tune myself along with a specific article of clothing

#

That way the finetune model can detect both

random star
#

also, for the dataset, does more == better?

#

assuming all images are of good quality

#

and variation

random star
#

I'm having trouble getting the number of stripes on a character's face to be correct in my embedding. It's always supposed to be three. Is there a way to fix this?

stone garden
random star
#

i tested it

#

with textual inversion

#

you have to inpaint the eyes tho

wide sky
random star
#

i trained it with close up images of eyes btw

wide sky
#

so no full face images? Then I'll test that way too, for now I'll try to feed faces at 512 at the best quality I can, also upscaling

random star
#

yeah

#

so this is the kinda image you want to train it with

#

for whatever style you're doing

final matrix
#

Is this what too high a learning rate looks like? I'm trying to figure out if it's worth continuing training with a lower learning rate from here or if I should start all over

wooden shuttle
#

Which repo is this?

final matrix
#

Joes

#

From what I heard from others, it's overtrained from too high an LR

crimson wasp
#

for models trained on big, well-tagged data like I think Hentai Diffusion and Novel AI were, does anybody know if artists were used as just another prompt word or were separately handled to be added to the prompt like "by artistname" or "in the style of artistname"?

brave hedge
#

Has anyone been able to successfully combine dreambooth with inpainting or img2img? I'm trying to apply a learned style to a real image. Wonder if this is possible.

tired wind
sick hare
# brave hedge Has anyone been able to successfully combine dreambooth with inpainting or img2i...

I haven't used it personally, but Shivam's repo has a script to train a dreambooth inpainting model: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_inpainting_dreambooth.py


slim moon
#

I made a basic GUI for manually captioning images. It's intended for use with the EveryDream repo, but should work with any system that pulls captions from a sidecar .txt file
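For anyone wiring such captions into their own training script, the sidecar convention is simply one .txt file next to each image with the same base name (cat_01.png → cat_01.txt). A minimal reader under that assumption; the set of image extensions and the fallback to the filename when no caption exists are choices made for this sketch, not necessarily what EveryDream does.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def load_captions(folder):
    """Map each image path in `folder` to its caption from a sidecar .txt file.

    Falls back to the image's file stem (underscores as spaces) when no
    sidecar file exists."""
    captions = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skips the .txt sidecars themselves and anything else
        sidecar = image.with_suffix(".txt")
        if sidecar.exists():
            captions[image] = sidecar.read_text(encoding="utf-8").strip()
        else:
            captions[image] = image.stem.replace("_", " ")
    return captions
```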

crimson wasp
# slim moon I made a basic GUI for manually captioning images. It's intended for use with th...

If you want a C# tool, I've almost finished making one for tagging, batch tagging, searching by tags, and selecting training regions in the images. I'm just finishing up mouse resizing of the training region and will hopefully update the repo with that soon. It stores all images dragged into it in a text file with tags and training regions, so any training repo would need to read that as a data source. It's ideally for handling tens of thousands of training images and tagging them efficiently, to train models with NovelAI's quality. It also makes a pretty sweet image viewer with tagging and searching by tags: https://github.com/CodeExplode/Image-Tagger


vale egret
slim moon
crimson wasp
# slim moon Nice! I'll check it out, looks a lot more sophisticated

it's a bit hacked together in truth but hopefully is stable enough to use. It might be a good idea to make backups of the database text file on occasion just in case it gets messed up somehow and loses everything, or you accidentally save over it at the location without loading it first (it doesn't give saving over confirmation each time since ideally load then hit save regularly after making changes)

crimson wasp
#

Image Tagger now has mouse controlled resizing and moving of the training area, so should be pretty much ready to go now. Just need to update a repo to read the database file as a data source during training

manic estuary
#

How many it/s are you all getting when finetuning (Dreambooth) on your 3090s?

crimson wasp
#

added a few more updates to ImageTagger to fix some important things if anybody is using it, but now I think it's really really ready to be used 😅

bold seal
#

I was wondering if anyone has experience with the following problem. I have a big upscaled, outpainted landscape, and now I want to go back in and correct small details in, say, Photoshop. What I really want to do is zoom in on certain features (like distant faces or images), but it occurred to me that SD img2img or inpainting with the original prompt could accomplish the same thing in such a setting (as opposed to Photoshop). Is there something that helps automate that? Usually it's the opposite direction (downscaled to upscaled).

crimson wasp
bold seal
#

Yeah. It would be great if we could do exactly that, all while staying zoomed in, because it's hard to spot little details. I have to put each image into Photoshop after each inpaint to spot the little imperfections; sometimes you can't really spot the issue by eye. One could easily imagine using SD to 'correct' things at all scales.
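The zoom-in idea can be scripted: crop the region, enlarge it to a working resolution, run it through img2img or inpainting, shrink it back and paste it in place (roughly what the webui's full-resolution inpainting does for the masked area). A NumPy-only sketch; `refine` is a hypothetical stand-in for whatever img2img call you use, and the nearest-neighbour enlarge and block-average shrink are simplifications chosen to keep it dependency-free.

```python
import numpy as np


def refine_region(image, box, factor, refine):
    """Crop `box` = (top, left, height, width) from an H x W x C image, enlarge it
    by an integer `factor`, pass it through `refine`, shrink it back and paste it.

    `refine` must return an array of the same shape it receives."""
    top, left, h, w = box
    crop = image[top:top + h, left:left + w]
    # Nearest-neighbour enlarge so the model works on the detail at a usable size.
    big = np.repeat(np.repeat(crop, factor, axis=0), factor, axis=1)
    big = refine(big)  # e.g. img2img / inpainting with the original prompt
    # Block-average back down to the original crop size.
    small = big.reshape(h, factor, w, factor, -1).mean(axis=(1, 3))
    out = image.astype(float).copy()
    out[top:top + h, left:left + w] = small
    return out
```

In practice you'd use a proper resampler (for example Pillow's Image.resize with Lanczos) and blend the pasted edges so the seam doesn't show.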

dapper prism
#

What's the best repo right now for actual finetuning (not DreamBooth)? Is it the EveryDream repo (I see it uses reg images, so is it not actual finetuning)?

tropic glacier
#

Hey guys. I could really use some pointers; I just feel so discouraged after so much time... I'm so bad at inpainting. Can someone point me in the right direction? I usually run:
Full resolution
Resize and fill
Restore faces
Mask 1.0
I just can't produce anything of any quality. Are there any settings I'm missing or just any tips would be nice. I also use the 1.5 inpainting model

prisma kiln
#

These are fantastic, are you sharing your model?

ashen swan
#

Just made this, hope people get some use out of it :)
https://github.com/antis0007/sd-webui-multiple-hypernetworks
Script that lets you apply multiple hypernetworks at once in auto's webui, if you like it consider giving the repo a star.


surreal mango
#

ok so I'm starting fresh with dreambooth
I got about 28 photos
I wanted to ask a few questions

  1. Is ddalton a good token, or should I use something different?
  2. how many steps should I do?
  3. what model should I use as base? (1.4, 1.5)
surreal mango
#

also, how do I prevent Colab from disconnecting?

#

without pro

tired wind
#

Usually by making sure it doesn't idle. I had a lot more trouble getting colab to run than doing it locally

#

Maybe it's easier now, but I spent hours getting automatic1111 to work on colab and then the next day it would break. Locally, on a 3090ti+win11 everything worked on the first try including dreambooth plugin. Very painless and not what I was expecting.

surreal mango
#

should I use this option?

#

I looked in the code and it just adds regularization images of men or women

#

should I be using it? or just continue without them

tired wind
#

Not sure about that

surreal mango
#

its a folder/file of just a bunch of pictures of men

#

faces

#

so I assume it's similar to deepfakes in a way, maybe?

#

idk, for now I just won't use them

surreal mango
#

how long would it take to train a model on a gtx 1660

#

never mind, I just realized

surreal mango
#

ok so I trained my dreambooth model and the background is almost the same every time

#

how many photos should I use instead?

hot breach
#

are your backgrounds all the same in training images? can tend to happen

#

or clothing, etc can be hard to transfer for the same reason

surreal mango
#

I don't have any backgrounds other than my room, so I photoshopped half of them to have a white wall behind me

hot breach
#

I'd go take some photos of yourself outside in natural light

final matrix
#

Now that I've changed the names in my captions, I can suddenly no longer put hairstyle X on outfit Y

really weird. I didn't do anything except add another 100 pictures of another outfit, shorten the outfit and hairstyle tokens from several Russian letters to just one, and remove the English names like "outfit" or "garments".

Now I can tell the outfits apart better and I can also prompt other outfits that weren't in the show, but the hairstyles are somehow almost hardcoded to the outfits.

even (outfit x:1.3) [hairstyle a:100] doesn't change anything. I have to prompt hairstyle b of outfit x or I won't get outfit x. but then I don't get hairstyle a either...

I can do (outfit x:1.3) hairstyle a hairstyle b but then I only get a hairstyle that is a mix of both

any ideas on how to fix this?
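For anyone unsure what the weights in prompts like the ones above do, here is a minimal sketch of parsing the `(text:weight)` emphasis syntax into chunks. It is a simplified regex of my own, not AUTOMATIC1111's actual parser: it ignores nesting and plain `(text)` without a number. In that webui, square brackets with a colon and a number, as in `[hairstyle a:100]`, are (if I remember the syntax right) prompt editing rather than a weight, so this sketch leaves them untouched.

```python
import re

# Matches "(some text:1.3)", a simplified, non-nested form of the webui
# emphasis syntax. Bracketed "[text:number]" is deliberately not matched.
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unweighted text gets 1.0."""
    chunks = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            chunks.append((before, 1.0))
        chunks.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    rest = prompt[pos:].strip(" ,")
    if rest:
        chunks.append((rest, 1.0))
    return chunks
```

For example, `parse_weights("photo of korra, (outfit x:1.3) hairstyle a")` gives three chunks, with only "outfit x" boosted to 1.3.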

final matrix
#

well, if I can't fix it soon I'll just release my model without the ability to change hairstyles yet

because at least it can prompt all outfits on their own with no overlap and also prompt any non-show outfits without overlap. that should be good enough for a V1.0 release.

final matrix
#

new idea

I tag it as "dressed like X" "wearing Y hairstyle"

bold seal
#

Hello, I'm about to train a dreambooth model on a 'style'. What's the consensus on learning rate and steps for, say, 30 training images? Also, what sort of regularization images should I use? I downloaded a few thousand images of a random 'art style'; would that work, or should I have SD generate the images itself?

hot breach
#

if you are using real images with captions (e.g. "tulips by claude monet" or "a painting of a group of people sitting at a table in a room with a light hanging from the ceiling by Vincent van Gogh"), you're better off

icy vale
#

Hey, I trained this model with Justin Pinkney's repo for emojis and have a general question - training via this method took a large amount of time (5 hours on A100) and results in a checkpoint that specifically only creates emojis.

I see people using dreambooth to create a style, which I'm about to try out. But what is the benefit of doing it the way I did with Justin's repo versus Dreambooth, which seems much, much faster and cheaper? I get textual inversion and embedding a new "object" with Dreambooth, but what are the tradeoffs when training styles?

hot breach
#

oof 5 hours of a100?

icy vale
#

ya lol

#

~1200 training images

#

was that too many?

tired wind
#

I've done TI, fine tuning, and dreambooth, and dreambooth results were much better than the other two. Also usually can have a good model in 1,000-2,000 steps vs like 13,000+

#

I haven't seen any consensus on number of training images. The dreambooth paper stated they only used 3-5 images for training, but that seems to be for very specific subjects (a unique person or object) rather than style

icy vale
#

then there's really no point in ever fine tuning?

#

versus dreambooth?

tired wind
#

I'm not clear that there is

#

At least for TI, it can be useful for combining subjects, like "person X and person Y", though you can train dreambooth on multiple things at once (haven't tried)

#

I would suggest starting with Dreambooth given the training speed, and testing the other stuff after, which is the reverse of what I did. The dreambooth plugin for automatic1111 is pretty painless to use

icy vale
#

cool, thanks!

vale egret
#

Fine tuning is fine if you have a massive and varied dataset and tons of computational power since it will forget about everything you don’t reinforce

vale egret
#

Why?

hot breach
#

up to you! click, or don't

vale egret
#

When you train an AI with a certain dataset, it starts assuming that the dataset contains everything it will ever need to know. That’s why embeddings are so popular, they let you put your character into any scene you want, doing whatever you want. But if you fine tune, you can only create combinations of scenes that exist in the training data

Dreambooth gets around this problem somewhat, as it retains more of its previous knowledge
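For reference, the "retains more of its previous knowledge" part comes from DreamBooth's prior-preservation loss: alongside your instance images, it keeps training on class (regularization) images captioned with a plain class prompt. Below is a sketch of that objective in the noise-prediction form most Stable Diffusion trainers use. Notation is assumed here: `z_t` is a noised latent of an instance image with prompt `c`, `z'_{t'}` is a noised latent of a class image with class prompt `c_pr`, `epsilon_theta` is the UNet's noise prediction, and `lambda` is the prior-loss weight.

```latex
\mathcal{L}(\theta) =
  \mathbb{E}_{z, c, \epsilon, t}\!\left[\lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2\right]
  + \lambda \, \mathbb{E}_{z', c_{pr}, \epsilon', t'}\!\left[\lVert \epsilon' - \epsilon_\theta(z'_{t'}, t', c_{pr}) \rVert_2^2\right]
```

With `lambda = 0` (no class images), this reduces to plain fine-tuning on your own images, which is where the forgetting described above comes from.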

hot breach
#

there's a lot of space in CLIP and LD to add new knowledge, it knows potentially millions of concepts, adding a few shouldn't necessarily ruin that if your training regime is intelligent

frozen bobcat
#

What's TI? 👀

split acorn
#

TI usually refers to Textual Inversion

rough shoal
#

Can you train hypernetworks on multiple people by adding their name in their training data? Like: (slash)(character name(slash)). Do they accurately show the different characters and clothing without any mixing?

split acorn
#

Hypernetworks, at least atm, are only good for training one (as far as I'm aware). BUT I think you could train with a dataset of two people (with those two people in all the input images) with a hypernetwork, but you're going to always get them together. I haven't tried it yet, but that should work. Would it be good? Very likely not. Would it be flexible? No

sullen ravine
#

Hey, so I've been messing around with dreambooth in A1111 to moderate degrees of success. My big question is, Does anyone have experience training multiple concepts with it yet? Would love some basic introductory help with how that works as there isn't anything I can find online about it given its only been a few days.

sullen ravine
#

I'll check it out, I appreciate it

split acorn
#

Though... it's like character adjacent

#

dm'd!

split acorn
#

I'm reinstalling stable-diffusion atm (getting the black image bug), but I can show some examples in a moment

rough shoal
#

I started getting that out of nowhere a while ago, had to use --no-half-vae

split acorn
#

o, interesting NOTED

#

it did it with or without vae though

#

only on img2img with larger images

frozen bobcat
#

Is there a decent guide for embeddings? @split acorn

#

Also, TI is hypernetworks? Is it also embeddings?

vale egret
#

TI is embeddings

#

There’s a guide on the automatic1111 wiki

split acorn
#

TI and HN are different and yeah, there's a guide on auto's wiki

#

although I only use DreamArtist for TI right now and everything else HN / DB

#

has helped a lot

split acorn
#

3 positive tokens, 6 negative tokens, training was 0.003 and I used a model that could do something similar to what I was looking for

#

to mmmm how many steps was it, checking

tough gazelle
#

Yeah that's basically what I'm doing.

Was trying to follow his example of a character. But for some reason it just makes loads of group images. Even though it's a single character.

rough shoal
#

How much VRAM do you have for DreamArtist?

tough gazelle
#

It's working on my 10GB 3080

#

As in running

split acorn
#

I have 8GB and it worked fine

tough gazelle
#

Uses 9.8GB VRAM when I run it

split acorn
#

HN + image previews takes up like full 8GB though. It works, it's just tight

#

I'm also using xformers

rough shoal
#

Really? I couldn't get it working on my 2070 super not running preview images, I don't have xformers though, I like Win7 too much.

tough gazelle
#

I just don't understand why the dreamartist stuff isn't working. It's not just me either. There's a bunch of other people that have registered an issue saying they have similar problems

split acorn
#

I had to edit the code

split acorn
#

also reconstruction is broken atm, iirc

#

so I left it unchecked

tough gazelle
#

I'm not using reconstruction.

What code did you have to edit.

split acorn
#

oh, the code edit was just part of the ui.py, it's really only important when uninstalling and they already fixed it

#

basically it'd edit the ui.py directly and it would break SD

rough shoal
#

How do you know when to stop training an embedding or hypernet?

split acorn
#

image previews helps a lot

#

there's also people that will use the loss data to see the learning curve

#

and then base it off of whether or not the training is still learning or not

#

you can get that info through the CSV (if enabled in the settings)

#

a couple people have made Python scripts to convert the info
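A minimal sketch of the "is it still learning?" check from a loss CSV. The column name `loss` is an assumption (check the header of your own log), and the window size and threshold are arbitrary starting points, not recommended values.

```python
import csv
import io
from statistics import mean


def read_losses(csv_text: str, column: str = "loss") -> list[float]:
    """Pull one numeric column out of a training-log CSV (column name assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader if row.get(column)]


def still_improving(losses: list[float], window: int = 50, min_delta: float = 1e-3) -> bool:
    """Compare the mean of the last `window` losses with the window before it.

    Returns True if the newer window is lower by at least `min_delta`.
    Per-step losses are noisy, so only windowed averages are compared.
    """
    if len(losses) < 2 * window:
        return True  # not enough data to judge yet
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    return previous - recent >= min_delta
```

A flat or rising windowed mean is a hint to check the preview images, not a hard stop rule.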

rough shoal
#

So either train until the preview image looks bad and then train the previous version at a lower rate / until nothing changes, or watch for the loss to go above a certain number? How much loss is too much loss?

tough gazelle
#

@split acorn how many steps were you doing then? Maybe I'm not doing enough. Although I'm doing the same as what he put in his examples.

split acorn
#

I went up to 3600

#

dreamartist worked pretty quickly though

tough gazelle
#

His examples say 8000

#

And I'm at 8000 and it's still shit lol

split acorn
#

Oh, you should notice it being pretty good a lot earlier than that

tough gazelle
#

Are you trying to do a subject or a style, got any examples?

split acorn
#

it might be better higher than 3600, that's just when I decided to stop

#

it was subject

#

ye one sec

tough gazelle
#

Yeah I just get complete shite out of it

split acorn
#

The middle is the original and the other two are from DreamArtist; this is a rough example though.

tough gazelle
#

What model you use to train?

split acorn
#

D&D merged with Any3

tough gazelle
#

What about initialiser text

split acorn
#

think I used halfling or 1boy, I don't recall

tough gazelle
#

The "results" I get aren't even consistent

#

This was the input image

split acorn
#

I used [filename] as the training txt

tough gazelle
#

Yet the outputs are shit like this

split acorn
#

and I used the sd tagger to tag all my images

#

ooo wow

#

that's not even close

tough gazelle
#

Sometimes it does this

#

That's probably the closest it's gotten

split acorn
#

I tagged them, see example:
solo, looking at viewer, smile, short hair, bangs, brown hair, 1boy, long sleeves, hair between eyes, brown eyes, jewelry, sitting, flower, ahoge, male focus, earrings, boots, outdoors, pointy ears, belt, pants, bag, grin, tree, depth of field, blurry background, brown footwear, knee boots, grass, ear piercing, pink flower, bracer, wall, brown pants, brick wall

tough gazelle
#

As in you tagged the images?

split acorn
#

and then the txt file only contains [filename]

#

yeah

tough gazelle
#

I guess I can try that

split acorn
#

according to the person though you don't need to do that for it to work

split acorn
#

but it's just a habit from other TI, HN, and DB

tough gazelle
#

I've done Hypernetworks before with Similar images and it works perfectly fine

split acorn
#

it might, it's kinda blurry

tough gazelle
#

It's just this Dreamartist thing that doesn't work

split acorn
#

the tagging / txt file though might make a big diff

tough gazelle
#

Let me try get a clearer image

rough shoal
#

I noticed with sharper anime screencaps the deepdanbooru picks up almost everything perfectly, maybe upping the clarity will be better for the ai to understand what it's training.

tough gazelle
#

It can't look that good as the resolution has to be low

bold seal
#

if I'm merging two checkpoints together, one style and one person, weighted sum seems like the obvious choice?
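For reference, the two merge modes usually discussed are a weighted sum and an "add difference" (this is my understanding of AUTOMATIC1111's checkpoint merger, not its code). Below is a minimal sketch over plain dicts of weights. It works the same with tensors. Keys missing from the other checkpoint are simply kept from A here, which is an assumption.

```python
def weighted_sum(a: dict, b: dict, m: float) -> dict:
    """merged = A * (1 - m) + B * m; keys missing from B keep A's value."""
    return {k: a[k] * (1 - m) + b[k] * m if k in b else a[k] for k in a}


def add_difference(a: dict, b: dict, c: dict, m: float) -> dict:
    """merged = A + (B - C) * m: adds what B learned relative to its base C."""
    return {k: a[k] + (b[k] - c[k]) * m if k in b and k in c else a[k] for k in a}
```

A weighted sum blends both models and dilutes each at the same time. Add difference only transplants what B changed relative to C, so C should be the model B was trained from.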

split acorn
#

[filename]

#

that's what I did, at least

split acorn
#

Still looking for a model merging guide

tough gazelle
#

If I try use [filename] in the project template I get this error
AttributeError: 'NoneType' object has no attribute 'detach'

#

You sure it's not [filewords] ?

split acorn
#

oh sorry, yeah

#

I was just double checking now

#

[filewords] yeah

#

and the input image file name and the txt file share the same name in the same folder

tough gazelle
#

That's still giving me errors....

split acorn
#

and then the txt file has all the tags

tough gazelle
#

I can use the inbuilt subject one fine

#

But Filename or Filewords don't work

split acorn
#

with your dataset folder, does it contain the image and the txt file?

tough gazelle
#

Yeah

split acorn
#

huh odd

#

I've done over a dozen like this and I've never seen that

tough gazelle
#

His instructions say not to use filewords

split acorn
#

and the Prompt template file is pointing to the txt that only has [filewords] in it?

tough gazelle
#

Yeah

#

2 seconds there's an update to DreamArtist, let me try again

split acorn
#

Well, it works, so I don't know what to tell them

#

maybe it works better if you don't use filewords but

#

this is more "something to try since the default isn't working" kind of thing

tough gazelle
#

Well it doesn't work at all at the moment, so willing to try anything to see if it makes a difference.

split acorn
#

yeah

tough gazelle
#

are you using CFG Scale 3 on the training like he recommends?

#

Ok something isn't right. It works when I use the Subject_filewords.txt file. So it's not like it can't read the filewords.

#

Maybe I need to do [name] [filewords] otherwise it's going to complain the embedding word isn't in there

#

ok it runs when I put [name], [filewords]

#

Let's see if this makes any difference and if it looks anything like the image.

rough shoal
#

Hmm, looks like my suspicions were right: a single from above perspective image was squashing my results. A shame because the squashed version visually looks better. I hope I can recover it with a little more training but anything after 28000 gets blown out at 5e-06 while I used it throughout the whole 30k with the old ver.

#

The problem image.

tough gazelle
#

Hmm well the preview is looking better so far

#

Used this as the training image

#

Preview after 1000 steps

#

At least it's actually a girl with blonde hair this time

#

And not a temple

frozen bobcat
#

Ok, so I've used Dreambooth to train on one specific character (using waifudiff3 model). The resulting ckpt NAILS the character, being able to pretty much apply any anime style to that character. Looks fabulous when it comes to drawing the original character in (for example) CLAMP style.

Created a hypernetwork using the same character images. When applied over the trained ckpt, I can almost replicate the exact style with that exact character with the hypernetwork effect turned down to .5.

What are the benefits of training with a TI? The resultant files are so tiny. And the TI embeddings I downloaded have almost ZERO effect on anything.

Is there a decent guide for TI?

tough gazelle
#

It looked a little better, but still didn't look enough like the model.

#

Might as well just use Dreambooth, takes about the same amount of time and looks a lot better.

frozen bobcat
# solemn notch To use a TI you have to use the keyword. in auto1111 the filename is the keyword...

Thanks.
I'd like to create an embedding for one specific character.
I've chosen this as an initialization text: 1boy, demon boy, bat wing, golden horns, blue eyes, jumpsuit,
I have no idea what *Number of vectors per token* means.
Some people are training one character but they chose 10 tokens.
What does that mean?
I just want to be able to consistently reproduce my one character as close as possible.

solemn notch
# frozen bobcat Thanks. I'd like to create an embedding for one specific character. I've chosen...

So, when you type some words into a prompt for stable diffusion, the first step the computer does is translating those words into a series of tokens. For some common words and symbols, a word corresponds to one token. For other, less common words, it could be several tokens.

Textual Inversion embeddings basically train a new set of embedding vectors that SD substitutes for a particular keyword. The number of vectors per token is how many of those vectors the keyword expands into, i.e. how much room the embedding has to describe your subject.

You have a limit on the number of tokens you can include in a prompt, which is why you can't just set it to 100 and assume it'll figure something out... but in theory, the more tokens you let it play with, the better the results should be.

#

like many things in machine learning, you may need to play with it a bit before getting great results.
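Here is a toy model of the prompt budget described above, showing why a 10-vector embedding "costs" 10 prompt slots. Assumptions (not from the thread): CLIP's 77-token context leaves 75 usable slots after the start and end markers, and the naive tokenizer counts one token per word, whereas real BPE often splits words into several tokens. Some frontends split longer prompts into extra 75-token chunks instead of truncating them.

```python
# Toy model of how a Textual Inversion keyword consumes prompt slots.
USABLE_TOKENS = 75  # assumed: 77-token CLIP context minus start/end markers


def count_slots(prompt: str, embeddings: dict[str, int]) -> int:
    """Return how many slots a prompt occupies.

    `embeddings` maps each trained keyword to its number of vectors per token;
    every other word is (naively) counted as a single token.
    """
    slots = 0
    for word in prompt.replace(",", " ").split():
        slots += embeddings.get(word, 1)
    return slots


def fits(prompt: str, embeddings: dict[str, int]) -> bool:
    """True if the prompt fits in one chunk of usable slots."""
    return count_slots(prompt, embeddings) <= USABLE_TOKENS
```

For example, with `{"demonboy": 10}`, the prompt "photo of demonboy, smiling" takes 13 slots rather than 4.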

final matrix
# final matrix Now that I've changed the names of my captions, I can suddenly no longer put hai...

okay, the problem is probably that the model can't use Cyrillic or other non-English characters. Other users noticed that, and I couldn't prompt anything after I had switched everything to Cyrillic

I changed my captions to "wearing X outfit", where X is a short two-token word, e.g. default outfit -> defa

that should work brilliantly. If it does, then I should have eliminated all the problems my model has ever given me except for the training itself, i.e. which learning rate and repeats are best. 1e-7 was still a bit too high for my 1100 pictures. I'm trying it now with 7.5e-8 and if that works I'll continue to train until I have something usable.

peak ridge
#

I want to train a new concept into SD, various actions (thinking martial arts). What's the best way of doing it: embedding, hypernetwork or dreambooth?

wooden shuttle
#

You mean like, martial arts poses?

stone garden
#

I really want to get good results with DreamArtist

split acorn
#

Train with reconstruction was broken when I trained it

#

not sure if it was fixed or not

#

I didn't use it

#

Steps was only 3600. I know they recommend more, but for what I was wanting 3600 seemed pretty good. Might try longer, but I'd rather just hypernetwork with cherry-picked generations

#

I used the recommended learning rate

stone garden
#

hmm okay thank you! 😄

split acorn
#

0.003 was the learning rate

#

CFG was 5

tough gazelle
#

After tagging the image it was a little better. But it still was nowhere near as close as the dev's examples

split acorn
#

Could be the input image, it seems kinda hard to learn from

#

They have a bunch of examples, and if you follow the settings they list and it still doesn't produce similar results, it's probably the input image

icy vale
#

is there a good tutorial for training a style on dreambooth?

versed oriole
#

I just wonder whether, when he says to use EMAs, just loading the "full weight" versions of the models is sufficient, or if I should also use a .cfg that actually loads them in (doubles RAM usage)

#

it's heavy enough to prevent 512 training and reconstruction loss with 24gb lol.

golden moon
#

Any idea if v1-5-pruned.ckpt is available with 840k VAE baked in?

#

I know there's a script that unpacks the checkpoint where you could include the VAE yourself, but it's redundant work if it is already up somewhere
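Below is a minimal sketch of baking a VAE in yourself. It assumes the usual SD 1.x layout: both files hold a `state_dict`, the checkpoint stores its VAE under the `first_stage_model.` prefix, and a standalone VAE file stores the same tensors without that prefix. Check the keys of your own files before trusting this.

```python
# Sketch: replace an SD 1.x checkpoint's VAE tensors with a standalone VAE's.
PREFIX = "first_stage_model."  # assumed SD 1.x prefix for VAE weights


def bake_vae(model_sd: dict, vae_sd: dict, prefix: str = PREFIX) -> dict:
    """Return a copy of model_sd with its VAE tensors replaced by vae_sd's."""
    baked = dict(model_sd)  # shallow copy; the input dict is left unchanged
    replaced = 0
    for key, tensor in vae_sd.items():
        if key.startswith("loss."):
            continue  # training-only loss/discriminator weights, not needed for inference
        full_key = prefix + key
        if full_key in baked:
            baked[full_key] = tensor
            replaced += 1
    if replaced == 0:
        raise ValueError("no VAE keys matched; check the key prefix")
    return baked
```

With PyTorch installed, usage would look roughly like `ckpt = torch.load("v1-5-pruned.ckpt", map_location="cpu")`, `vae = torch.load("vae.ckpt", map_location="cpu")`, `ckpt["state_dict"] = bake_vae(ckpt["state_dict"], vae["state_dict"])`, then `torch.save(ckpt, "v1-5-pruned-vae.ckpt")`. The VAE and output filenames are placeholders.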

steady heath
#

Can anyone tell me where do i start if i wanted to train my own model using WaifuDiffusion as a base model?

final matrix
#

and giving the outfits and hairstyles to other people, in this example Emma Watson, also works (again, only 35 repeats, hence it looks so bad)

#

I just noticed an oopsie:
I forgot to crop and resize the last images I added to my dataset.
Luckily it works fine if you don't use the exact way I captioned the images, but instead just make a more generic prompt like "photo of X wearing Y" etc.
I won't restart training for this though. It's just 30 images and a minor issue. I'll fix that in my first post-release version.

https://cdn.discordapp.com/attachments/1041753916381085717/1043423509910655006/grid-0070.png
https://cdn.discordapp.com/attachments/1041753916381085717/1043423510195863622/image.png
https://cdn.discordapp.com/attachments/1041753916381085717/1043423510535610438/grid-0072.png
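Below is a minimal sketch of the missing crop step for a square 512x512 dataset: compute the largest centered square, then resize it. With Pillow (assumed installed), that would be `img.crop(center_square_box(*img.size)).resize((512, 512))`. A center crop can cut off a subject near the edge, so a manual crop is still safer for important images.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centered square as a (left, top, right, bottom) box."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)
```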

stone garden
#

I made a small UI for Shivam if anybody is interested. "pip install easygui" in your local Shivam env should be enough to have it running

rough shoal
#

What do you put in initialization text and vector tokens if you're training a style embedding?

stone garden
# rough shoal So it doesn't strictly matter what it is if I was making a style? Just '(name of...

it does matter a little in my experience:

  • you won't be able to use that keyword normally when the embedding is loaded, so if you choose "photo" as the keyword, you'll have a hard time prompting without it
  • the training will be easier or harder depending on that keyword, at least a little. It won't change the final results that much, but if you used "tree" as the keyword, it would need more steps to understand it's not the same "tree" it already knew. So using a keyword related to what you are training is usually easier
final matrix
#

my Legend of Korra model is likely release-ready by tomorrow evening

I restarted training to fix some caption issues as well as add 10 more images of smug Korra (lol), so that delays things somewhat, but I think by tomorrow I should have a releasable model.

Some outfits also still need more training (the above screenshots are with an outfit I have more training data for, so it's easier to prompt), so the model needs to bake longer in the oven, which further delays things. I am also very careful with repeats and learning rates here so as not to overtrain (too much). In certain cases CFG 4 must be used, but that's a sacrifice I am willing to make.

The entire model dataset now encompasses 1142 manually captioned images.

The release version will let you output anyone and anything in the Legend of Korra art style, put any person in Korra's outfits, prompt Korra in any of her outfits or in outfits not from the show, and prompt her in any art style.

#

I have been working on this model for the past 4 weeks and spent hundreds of euros generating dozens upon dozens of ckpts to test all kinds of changes to repeats, learning rates, captions, classes, tokens, regs etc., as well as to test different repos.

I have now settled on no class, no token, manually captioned images, and JoePenna's repo. Repeats and learning rates are still being tested, but the above results were with an earlier model with 10 repeats at 3e-06, 25 repeats at 2e-06, and 60 repeats at 1e-06. Some outfits still needed more training.
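The staged schedule above can be written as a small piecewise-constant lookup. This is a sketch that assumes the three stages run back to back and that one "repeat" is one pass over the dataset. Whether a given repo counts steps this way is also an assumption.

```python
import math

# Assumed reading of the message above: stages run back to back, in repeats.
STAGES = [(10, 3e-6), (25, 2e-6), (60, 1e-6)]


def lr_for_repeat(repeat: int, stages=STAGES) -> float:
    """Learning rate for a 0-based repeat index under a piecewise-constant schedule."""
    end = 0
    for length, lr in stages:
        end += length
        if repeat < end:
            return lr
    raise ValueError(f"repeat {repeat} is past the end of the schedule ({end} repeats)")


def steps_per_repeat(num_images: int, batch_size: int) -> int:
    """Optimizer steps for one pass over the dataset (last partial batch counted)."""
    return math.ceil(num_images / batch_size)
```

With the 1142 images and batch size 5 mentioned in this thread, one repeat is 229 steps, so the full 95-repeat schedule would be about 21,755 steps.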

It took me at least 50 real working hours building up that dataset (despite using a bulk image downloader) and manually captioning it (despite using a bulk file rename tool).

I may create future versions adding the other characters from the show with better captions (for example location captions are extremely rudimentary right now).

I will release the dataset alongside the ckpt.

stone garden
# final matrix I have been working on this model for the past 4 weeks and spent hundreds of eur...

Thanks a lot for this account of your process/work. I feel less alone on the same kind of things too: captioning for hours upon hours, scraping the web, filtering, ...
Joe's repo is really good. I use EveryDream with the same kind of process; if you like Joe's, you may like it too

Those results are really cool, though I can't really appreciate them at their full value, having not watched Legend of Korra.

I never tried doing such small epochs and micromanaging the learning rate like that. I start at 1e-6 and stay there, monitoring TensorBoard to see when to stop, with epochs of 50 repeats right now, but maybe I'm on bigger datasets than you. I'm really impressed by the quality on so "few" steps, tbh

Thanks again for this diary, keep it going if you feel like it, very instructive

stone garden
# final matrix how big is your dataset? how many repeats until you had a good result?

I'm in full experimental phase; I have tested datasets of 10 to 250 pictures, fully captioned by hand.
I use very diverse datasets (part of it is illustration style, but I also teach other concepts at the same time, some photorealistic, some not, ...), trying to get diversity in the datasets so they can act as their own regularisation and preserve the rest of the model.

I usually start getting OK results once I hit epoch 4, but for the multiple concepts to all take at the same time and mix together, I push toward epoch 12 (50 repeats per epoch).

My last test was 100 pictures with 3 diverse datasets, trained for 100k steps. It didn't bleed over the rest of the model, and all concepts were learned, as far as my tests show. The first OK checkpoint was at epoch 12, then 15 and 17. The best one was 15, so 750 repeats total

#

For example, my last one had 3 concepts: Rick roll, Bad cosplay man and Magic powers. I can't get Rick roll with powers, but the other crossovers of concepts worked

final matrix
#

epoch 12 damn

#

mine would be long overtrained by then

stone garden
#

I work a lot on the regularisation

#

Each picture is also captioned with random tokens for things present in it, like a fence, the keyword "portrait", a chair... I try to add more concepts to each so it can't overtrain easily

#

It makes it learn slower since the attention gets divided

#

But it prevents bleeding quite well

#

I was surprised that e17 didn't bleed all over, tbh

#

That's 850 repeats already

final matrix
#

yes my captions are usually like this

"Korra, defa outfit, stada hairstyle, smug, half-body; exterior, day; tlok artstyle"

I find it weird that you need that extreme amount of repeats for your concepts to work, though. At the aforementioned repeat settings I am already able to freely combine concepts as I want to; just the likeness still has some issues.
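The caption layout above can be generated instead of typed: subject tags, then setting, then style, with semicolons between groups. Below is a minimal sketch. The function name and the three-group split are assumptions based on that one example caption.

```python
def build_caption(subject: list[str], setting: list[str], style: str) -> str:
    """Comma-separated tags inside a group, '; ' between groups; empty groups are dropped."""
    groups = [", ".join(subject), ", ".join(setting), style]
    return "; ".join(g for g in groups if g)
```

Keeping the fields in fixed slots like this makes a later bulk rename (e.g. swapping one outfit token) a plain string replacement.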

stone garden
#

You are pushing in one single direction though: Korra.
I'm pushing in diverse directions. Bad cosplay man has nothing to do with the other concepts and is present in only 30% of the dataset. I wasn't surprised that it wasn't learned well early on

final matrix
#

but I also have 1142 images, so that may be a factor. Also batch size 5

stone garden
#

I'm on batch size 6, but a lot fewer images, yeah

stone garden
#

Very instructive though, I'm getting so much info from our differences in workflow

final matrix
#

I haven't had enough repeats to test this yet, but in my latest model I switched "photo" for "cosplay". My thinking is that the model still understands those images are photos, so they help with training, but captioning them as "cosplay" instead of "photo" means I won't just get the cosplay training data when I prompt random photos

I did the same when changing digital art to fanart, and it worked

#

also changed figurine to merchandise

stone garden
#

Here is the tensorboard for loss value on that last training

stone garden
#

This may also be why you get overtraining: 80% of the pics are of the main subject?
By the way, why so many pics? Do you still manage to find diversity and new things to add at the 1000th picture?

final matrix
#

you'll see when I upload the final dataset

final matrix
#

with negative prompt of the artstyle

final matrix
# final matrix

also, is it just me or do those hands look better than vanilla SD? If so, it might be because I have quite a few shots with visible hands in my training data.

stone garden
#

I have had the same results: if at least 5% of my dataset has clear shots of hands, even a handshot or two, hands are drastically improved in the resulting model.
I have a list of little concepts like that which I try to put in all my datasets, mostly hands and framing (headshot, full-body shot, half-body shot); training those keywords a little helps a LOT
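A quick way to check the "at least 5% of the dataset" rule of thumb above is to count how many captions contain each helper tag. Below is a minimal sketch. It assumes comma- or semicolon-separated captions and exact tag matches; the tag names are only examples.

```python
from collections import Counter


def tag_coverage(captions: list[str], tags: list[str]) -> dict[str, float]:
    """Fraction of captions containing each tag (exact match, case-insensitive)."""
    counts = Counter()
    for caption in captions:
        present = {t.strip().lower() for t in caption.replace(";", ",").split(",")}
        for tag in tags:
            if tag.lower() in present:
                counts[tag] += 1
    total = len(captions) or 1  # avoid division by zero on an empty dataset
    return {tag: counts[tag] / total for tag in tags}


def under_covered(captions: list[str], tags: list[str], threshold: float = 0.05) -> list[str]:
    """Tags that appear in fewer than `threshold` of the captions."""
    coverage = tag_coverage(captions, tags)
    return [t for t in tags if coverage[t] < threshold]
```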

final matrix
#

and "back"

final matrix
#

I might have found a somewhat workable fix for the "overtrained" look all my images have (despite not actually being overtrained):
literally just adding "photo" to the negative prompt as well as the positive prompt lol
example: the first image has "photo" only in the positive prompt, the second image has "photo" in the negative prompt too:

(the weird cropping is from an oopsie of forgetting to correctly crop some images; that's fixed in my newest version, but that one doesn't have enough repeats yet for testing)

humble crane
#

Hey y'all. I installed Stable Diffusion, but when I go to the local URL it says "This site can't be reached"

#

And when i run the webui-user.bat it says file is most likely corrupted

crimson wasp
# final matrix I have been working on this model for the past 4 weeks and spent hundreds of eur...

I made this tool to try to make captioning much faster when working with huge amounts of images, though currently it stores all the original filename locations / tagged text / selected image bounds in a text file and I haven't updated any repo to use that as a datasource yet https://github.com/CodeExplode/Image-Tagger

e.g. You could drag in all your images, click batch tag, and tag them all with Korra to have it added to the caption for every image, then search for images with the word Korra, and if you want to rename a token, search for all images with that tag and make a batch (ctrl b), and rename it with batch tagging enabled


hot breach
#

EveryDream trainer now has some swanky jitter on cropping and accepts images of any aspect ratio, so do NOT crop your images to 512x512. I'm seeing a pretty substantial increase in quality from this
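For anyone curious how training on non-512x512 images can work, here is a generic sketch of aspect-ratio bucketing with random crop jitter. It is not EveryDream's actual code. The bucket sizes (multiples of 64, at most 512x512 pixels of area) and the jitter strategy are assumptions.

```python
import math
import random

TARGET_AREA = 512 * 512


def make_buckets(step: int = 64, min_side: int = 256, max_side: int = 1024) -> list[tuple[int, int]]:
    """(width, height) pairs, multiples of `step`, with area <= TARGET_AREA."""
    buckets = []
    for w in range(min_side, max_side + 1, step):
        h = (TARGET_AREA // w) // step * step
        buckets.append((w, max(min_side, min(h, max_side))))
    return buckets


def pick_bucket(width: int, height: int, buckets: list[tuple[int, int]]) -> tuple[int, int]:
    """Bucket whose aspect ratio is closest to the image's (compared in log space)."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def jittered_crop(width: int, height: int, bucket: tuple[int, int], rng: random.Random):
    """Scale the image to cover the bucket, then pick a random crop offset.

    Returns (scaled_w, scaled_h, left, top); the crop itself is bucket-sized.
    """
    bw, bh = bucket
    scale = max(bw / width, bh / height)
    sw, sh = math.ceil(width * scale), math.ceil(height * scale)
    left = rng.randint(0, sw - bw)
    top = rng.randint(0, sh - bh)
    return sw, sh, left, top
```

A 1024x512 image lands in a 704x320 bucket and is scaled to 704x352, so each epoch crops a slightly different 32-pixel vertical window instead of always chopping it down to a fixed center square.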

#

that was just a brief test pounding Keir Dullea into the model with no regularization or preservation data at all

frozen bobcat
#

under the dreambooth tab, what does "half-model" mean???

frozen bobcat
#

is it just redundant?

woeful goblet
#

is anyone else finding inpainting to be broken in automatic1111?

plucky swan
#

Do you guys notice any differences when using fp16 compared to fp32? Mine seem to get more overfit after converting to fp16

stone garden
#

Hi guys, am i able to dreambooth a character using the anything v3 model?

glossy rune
# plucky swan Do you guys notice some differences when using fp16 compared to fp32, mine seem ...

For inference (generating with auto1111 etc) we have not found a visual difference when the model is loaded in fp16. During training/dreambooth it is more likely to make a difference, but if so, still a small one from what we experienced. We convert every trained model to fp16 once done and have also successfully trained from fp16 checkpoints. But I’d recommend fp32 (or better, tf32 or bf16) for training, if your GPU allows it
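A small illustration of why half precision matters more during training than at inference: Python's standard `struct` module can round-trip a float through IEEE 754 half precision (the `"e"` format), which shows a small weight update disappearing. This is one reason mixed-precision training keeps an fp32 copy of the weights. The update size below is an arbitrary example value.

```python
import struct


def to_fp16(x):
    """Round-trip a Python float through IEEE 754 half precision (fp16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


weight = 1.0
update = 1e-4  # arbitrary small update, e.g. learning rate times gradient

print(weight + update)           # full precision keeps the update
print(to_fp16(weight + update))  # fp16 rounds back to 1.0: the update is lost
print(to_fp16(1e-6))             # close to, but not exactly, 1e-6 (subnormal range)
```

The spacing between fp16 values near 1.0 is 2**-10 (about 0.001), so an update smaller than half that spacing is simply rounded away when the weights themselves are stored in fp16.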

plucky swan
#

Lemme try that later, thanks for the help 🙂

glossy rune
# stone garden I'm not entirely sure how to read them, and have not been able to find good info...

Not sure this helps, but you are looking at the tail end of something like this https://i.stack.imgur.com/kd2YE.png (the beginning is the 160k hrs of training by stability/runway/compvis).
You can probably only hope to interpret averages over many, many steps/epochs.
And then it obviously depends on the repo and the details of the loss function used in training. Not sure if that is consistent between repos.

cloud yoke
#

Anyone tried using multiple classes of regularization images, i.e. an instance can have, for example, 10 reg classes?

glossy rune
#

With the EveryDream repo you stop thinking in classes and just train on good captions. Also, train and reg images are all just "training data". That works well, but it is not dreambooth anymore

glossy rune
#

I see a kid’s Christmas present taking shape…

final matrix
#

for a long time i thought my models were overtrained, because photos especially would come out extremely fried. but today i found the solution.
both images same prompt, cfg7

top image has negative prompt "white skin, tlok artstyle"
bottom image has negative prompt "white skin, blur, cosplay photo, vignette, instagram, tlok artstyle"

https://cdn.discordapp.com/attachments/1023293330601287711/1043887298737082408/grid-0222.png
https://cdn.discordapp.com/attachments/1023293330601287711/1043887299076837386/grid-0221.png

final matrix
#

so i'd rather fix it with some negative prompts and in return keep good likeness

glossy rune
#

Yes, impressive results!

glossy rune
# stone garden Here is the tensorboard for loss value on that last training

here's a tensorboard example with a few more steps and smoothing 0.9, so the dark line shows a moving average of the loss and the lighter line shows the loss value for each step. the per-step value fluctuates heavily (very likely due to the relatively small batch size), but the average is a bit easier to interpret in terms of progress. still, i rely heavily on visually inspecting samples from the model to judge.
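The smoothing slider applies an exponential moving average to the logged values. The sketch below is a debiased EMA in the same spirit (not TensorBoard's own source), with `weight` playing the role of the 0.9 slider value; the loss numbers are made up for illustration.

```python
def smooth(values, weight=0.9):
    """Debiased exponential moving average of a sequence of loss values."""
    smoothed = []
    ema = 0.0
    for i, value in enumerate(values, start=1):
        ema = weight * ema + (1 - weight) * value
        # Divide out the bias toward the 0.0 starting value of the average.
        smoothed.append(ema / (1 - weight ** i))
    return smoothed


noisy_loss = [0.12, 0.31, 0.08, 0.25, 0.10, 0.28, 0.09, 0.22]  # made-up per-step losses
print([round(x, 3) for x in smooth(noisy_loss)])
```

The smoothed curve swings far less than the raw per-step values, which is why the dark line is the one to read for trends.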

glossy rune
#

yes, that loss chart won't show that. i haven't looked into it, but i've wondered as well whether i can modify/control the image generation during training for everydream. it shouldn't be too complicated... i'll let you know if i find something

#

train loss is basically the difference between the prediction (image generated from prompt) and the provided image. but since there are multiple models involved, i'm not sure about all the details.
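For reference, in the standard Stable Diffusion v1 training setup (epsilon prediction) the loss is not computed between a generated image and the training image. The training image is encoded into a latent, random noise is mixed in at a random timestep, and the UNet, conditioned on the caption, is scored on how well it predicts that noise, using mean squared error. A toy sketch with plain Python lists standing in for tensors; `alpha_bar` is the cumulative noise-schedule value for the chosen timestep.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion at one timestep: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * e
            for x, e in zip(x0, eps)]


def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and actual noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


rng = random.Random(0)
x0 = [rng.gauss(0, 1) for _ in range(8)]   # stand-in for the image latent
eps = [rng.gauss(0, 1) for _ in range(8)]  # noise actually added
x_t = add_noise(x0, eps, alpha_bar=0.5)    # what the UNet would receive

# A real UNet maps (x_t, timestep, caption embedding) to a noise estimate.
# A perfect estimate scores 0; predicting all zeros scores the mean squared noise.
print(noise_prediction_loss(eps, eps))
print(noise_prediction_loss([0.0] * len(eps), eps))
```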

a validation/test set could be used to monitor performance on some benchmark (preservation) prompt/image combinations. the validation/test loss would then show whether that gets worse over time, but it would still not necessarily catch any bleeding effects

#

if i see this correctly, the sample images are the first (?) n predictions from the current training batch. so while everydream is looping over your training data, you can see on tensorboard, what the most recent predicted images looked like. if you have 1000s of images, this can be a bit more random, but if your training set is small, you should repeatedly see, if there is progress on the same input prompts.

#

i'm thinking of adding a "validation set" with only ~4 prompts and maybe even no images. you don't use it for loss monitoring, but for visual comparison... the trainer should be able to run a prediction on this validation set whenever it runs it on your current training batch. and you can monitor the results on your target prompts with this 🤔
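A minimal sketch of the fixed validation-prompt idea, assuming a hypothetical trainer loop: every N steps, render the same few prompts with the same seeds, so differences between rounds come only from the changing weights. The prompts, seed and interval are placeholders, and the actual image generation call is left out.

```python
def validation_jobs(step, prompts, every_n_steps=500, base_seed=1234):
    """Return (prompt, seed) pairs to render at this step, or [] if none are due.

    Seeds depend only on the prompt's position, never on the step, so each
    round starts from identical noise and only the model weights differ.
    Step 0 is included to get a baseline from the untrained model.
    """
    if step % every_n_steps != 0:
        return []
    return [(prompt, base_seed + i) for i, prompt in enumerate(prompts)]


prompts = ["a photo of a person", "a city street", "a red car", "a dog"]  # placeholders
for step in (0, 250, 500):
    print(step, validation_jobs(step, prompts))
```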

final matrix
#

I'm currently training my last model and then I'll do the final tests tomorrow and then I'll probably publish my model tomorrow evening.

I even set up a KoFi.

#

I will release my dataset of 1142 manually captioned images alongside the model.

And if I get to finish it in time I will also include a lengthy writeup of my process the past 4 weeks.

It will likely be the best model released for Stable Diffusion yet, being highly flexible and having high likeness. The only major shortcoming is that 512x512 generations in the artstyle WILL look garbage without img2img to 1024x1024 or a higher resolution, as well as inpainting the face.

Some outfits will also work better than others simply because some outfits just have massively more training data for them than other outfits.

The closer up you generate a character, the better it will look from the outset too. Full-shot images will sometimes need img2img up to 1536x1536 resolution (the maximum my GPU can do) before looking okay.

The model works really well with img2img however and I recommend always using img2img to upgrade the initial 512x512 to a higher resolution or using a photo as a basis and transferring it into the artstyle using img2img.
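The img2img upscaling pass mostly comes down to picking a target size. A small helper, assuming you want to keep the aspect ratio, cap the longest side at what your GPU can handle (1536 here, the limit mentioned above), and round down to a multiple of 64 (Stable Diffusion itself needs multiples of 8; 64 is just a common, conservative choice). The function name is hypothetical.

```python
def img2img_target_size(width, height, scale=2.0, max_side=1536, multiple=64):
    """Upscaled (width, height) for an img2img pass, aspect ratio preserved."""
    w, h = width * scale, height * scale
    # Shrink uniformly if the longest side would exceed the GPU limit.
    factor = min(1.0, max_side / max(w, h))
    w, h = w * factor, h * factor
    # Round each side down to the nearest allowed multiple.
    return int(w // multiple) * multiple, int(h // multiple) * multiple


print(img2img_target_size(512, 512))             # -> (1024, 1024)
print(img2img_target_size(512, 512, scale=4.0))  # capped -> (1536, 1536)
```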

Simpler prompts also work better than more complex prompts, though the latter can work with prompt engineering.

Due to how I trained the model, prompts will also often, ironically, need the artstyle as well as some other terms like instagram, vignette and blur in the negative prompt section in order to massively reduce the overtraining effect. You will see what I mean when I provide a guide on how to use the model correctly tomorrow.

viral jay
#

oh nvm, it's the d8ahazard one. that's weird, I was doing git pull but mine was still "old" and didn't have those options. I reinstalled it and now it shows the new options

slate vessel
#

What I want as a finetuned model is an Enchanted/Disenchanted model, since the style is Disney-like but not classic Disney animation

steep nova
#

a painting of two men playing chess while sitting in a park, by rutkowski and artgerm, highly detailed, trending on artstation, movie concept art, cinematic lighting

last swallow
#

I created a WebUI extension that can predict tags for a single image or multiple images. It supports two tagging models: DeepDanbooru and Waifu Diffusion 1.4.

leaden patio
#

Has someone made a dreambooth of liminal spaces yet? Where can I find it?

final matrix
#

I hereby finally release my All-In-One Legend of Korra artstyle + Korra character model to the public, including a dataset of 1142 manually captioned images!
Do note that this model is trickier to use than other models you may be accustomed to! I have a "How to" section on the Huggingface page for this exact reason.

Dataset download is also included at the bottom of the page there. The page also has example images and a short explanation of how I created this dataset and model.

https://huggingface.co/ai-characters/4elements-diffusion

peak ridge
#

guys, is there a simple way to natively train SD? like with a script like dreambooth or something? i have a bunch of images manually tagged/captioned, with different concepts involved, and i feel that dreambooth is kind of limited and destroys the original model... not sure where to begin. i found everydream on github, but i'm not sure it's the right thing, as it seems to be based on dreambooth, and i need to teach sd new concepts, not replace existing ones...

plucky zinc
#

anyone have experience training cartoon characters on dreambooth? what would you use as the word on a character that's somewhat humanoid but not really a human nor really resembling any kind of animal? just character? or creature maybe?

final matrix
#

my god

glossy rune
#

i'd recommend dreambooth first (even though it might not be a perfect fit for your captioned dataset). everydream would be a better fit, but it's probably easier to get started with dreambooth.
auto1111 now seems to have dreambooth ui integration. if you know a bit about python, i'd recommend the diffusers library -> examples -> dreambooth as a simple script to start off. and then maybe proceed to more elaborate setups.

peak ridge
#

oh yea, i already train with dreambooth, but it's very limited, i want to train it new concepts - while keeping the original model intact

glossy rune
#

you can do some of that with dreambooth (dreambooth on dreambooth works). but then proceed to everydream

peak ridge
#

yea, i've been reading about everydream, it seems a step forward from db. one thing i'm not sure about (with both of them) is captions/tags: i will add tags/captions manually for max accuracy, but i'm not sure whether to write them in human-readable format (like "a man with 6 legs walking on the beach") or comma-separated (man, 6, legs, walking, beach)

glossy rune
#

for everydream you just caption your images in the file name (one of the options) in human-readable format. think alt-tag
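A minimal sketch of reading a caption back out of a file name, assuming a hypothetical convention where a trailing `_<number>` only keeps file names unique (EveryDream's exact rules may differ, so check its documentation):

```python
import os
import re


def caption_from_filename(path):
    """Turn 'dir/a man sitting on a chair_01.jpg' into a caption string."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Drop a trailing "_<digits>" de-duplication suffix (assumed convention).
    stem = re.sub(r"_\d+$", "", stem)
    return " ".join(stem.split())


print(caption_from_filename("train/a man sitting on a chair_01.jpg"))
# -> a man sitting on a chair
```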

#

you can then add/replace some specific terms where you don't want to just extend the training but define something new

#

"a person sitting on a chair" -> "characterxy sitting on a chair"

#

the everydream tools come with BLIP autocaptioning
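Applied across a whole caption set (for example, BLIP autocaptions), the "a person" to "characterxy" swap is a whole-word substitution. A sketch, where "characterxy" is the example token from above and the word boundaries keep longer words such as "personal" from being rewritten:

```python
import re


def replace_term(captions, old, new):
    """Replace whole-word occurrences of `old` with `new` in every caption."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return [pattern.sub(new, caption) for caption in captions]


captions = ["a person sitting on a chair", "a personal computer on a desk"]
print(replace_term(captions, "a person", "characterxy"))
# -> ['characterxy sitting on a chair', 'a personal computer on a desk']
```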

peak ridge
#

cool stuff, read about this, but wasn't 100% sure about it, thanks for the answer

#

also, when i interrogate clip / blip to get a description - because the images i want to train on are really sci-fi - it doesn't understand the concepts in there (because it doesn't know them) - should i use my own captions (that represent the exact reality of that picture - like man with 6 legs on the beach) or whatever clip / blip gets me (a man walking on the beach - but this is missing the main concept i want to train)?

glossy rune
#

you will probably need a pretty big dataset (i'm using >=10k imgs for the projects that come closest to what you hint at), plus the GPU and training budget, to teach that many new concepts. but yes, then i'd definitely go with custom captions.

peak ridge
#

i can get the images - currently at about 2k captioned - and i can also get gpu power

#

what i'm trying to achieve is, based on the "man with 6 legs" concept (and some other sci-fi stuff), to train SD to generate a random man with 6 legs when prompted to. what i currently get with dreambooth is the same man with 6 legs as in the training images. if there's a way of doing this with DB, i can try it, as it's fast and working ok. should i try captions like "man6leg, man with 6 legs"? so that when i prompt man6leg it generates the exact character in the training images, but when i prompt "man with 6 legs" it generates a random man with 6 legs in the same style? not sure how to approach this

glossy rune
#

i see. you can modify a dreambooth concept. i trained a person with db ("sks person") and made them e.g. different dress sizes by prompting ("overweight sks person").
so i'd guess you can change the appearance of the upper body by prompting your dreambooth model. and if you get to a very good quality level, then you can generate more images of different looking 6-legged men and do a fresh dreambooth with them.

#

if it is only this one concept, i'd go with dreambooth and probably a dataset of 50-100 images.
if you have a lot of all-new concepts like this, then i would consider everydream and the big dataset

peak ridge
#

sounds like a plan, one last question about DB (this whole discussion has been really helpful btw), if i train one concept, and then use the trained ckpt file to train it a new concept, will that work? i mean, will it know both concepts? could i train multiple concepts this way? one at a time? in case everydream doesn't work as expected (DB works fine at the moment)

glossy rune
#

yes, i've done that for two iterations successfully (e.g. style and person). not sure how often before stuff fades or other problems occur

peak ridge
#

going to give it a try with db first, then try the other thing...

final matrix
#

i just noticed that tagging locations such as "city" or "exterior" may have been a huge mistake as it now always tries to refer to my training images when doing those prompts.
i will change my caption set and train a version 2.0 of my model on it and see what happens!

glossy rune
#

thanks, i'm aware. but i still find it works better than some of what i tried.

split acorn
#

yeah, I hope that trend of sks goes away

frosty wave
#

Hi! I'm new to training, but I've already tested hypernetworks and Dreambooth successfully on my local SD.
I wonder if it is possible to manipulate the characteristics of main classes? Say, for instance, I want to create a model in which every time I prompt for a "child" it generates a child with only one eye in the middle of the forehead, instead of a normal child with two eyes. I mean really by prompting "child" only, without adding "cyclops" before it, that is, manipulating the class "child" itself. Given that I do have a bunch of sci-fi cyclops kid images to feed it...
Is there a strategy for this kind of thing using Dreambooth or any other tool?

glossy rune
#

i'd not go down that path. with all the training the model has seen before your dreambooth, it has learned about children and faces and eyes (even if it can't count fingers...). that's a prior. if you want to modify that prior, you have to move the model weights by a lot, and that is likely to damage other things it learned in major ways. in dreambooth you create something new that has almost no prior and give it meaning; the existing weights can then combine so that the model learns that a child and a cyclops should be returned when asked for the newly learned concept.

#

fwiw i once tried to modify brad pitt's face with prompting and it wouldn't show a black eye or bruises or anything with simple prompts. there's a strong prior. with heavy weighting on parts of the prompt you could get some results, but you could feel that this was not a good approach. with a good approach, you get at least decent results pretty effortlessly

frosty wave
#

I see. Now, I noticed that when I teach SD a new instance/concept via Dreambooth while giving it a database of images that were not generated by SD as "class pictures", it does have an impact on the class. Once the concept is learnt, the class itself seems to have changed a bit, in a way that more or less matches the biases of my database.
For instance, I tried to teach it my own face with a few pictures of me, using a "person" class picture database of real photos of a bunch of people. However, that database had a strong bias: a big disproportion of older people (statistically, compared to a normal population). I noticed afterwards that when I prompted for a person in general (and not only myself), it gave me old people more often than before, statistically.
So that's why I wonder if we can manipulate a class by learning concepts as usual, but using a biased image database as "class pictures".

glossy rune
#

class images in dreambooth have the purpose of allowing you to train the instance without damaging the existing knowledge too much. i don't think this should be used to modify the class, only the instance.
i prefer to add knowledge instead of overwriting/redefining. it's much easier and you don't work against a mountain of gpu hours of the original training.
in case you want to modify more than an instance, research everydream. that is not dreambooth: you don't train instance/class but just train the model with captioned data, which allows you to add and modify concepts.
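For context on how class images enter DreamBooth training: with prior preservation, each step adds a second loss term computed on the class images, weighted by a factor (exposed as `--prior_loss_weight` in the diffusers DreamBooth example script). A plain-Python sketch of just that combination, with lists standing in for the noise predictions and targets:

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def dreambooth_loss(inst_pred, inst_target, class_pred, class_target,
                    prior_loss_weight=1.0):
    """Instance loss plus weighted prior-preservation (class image) loss."""
    instance_loss = mse(inst_pred, inst_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


print(dreambooth_loss([1.0, 2.0], [1.0, 4.0], [0.0], [1.0]))  # 2.0 + 1.0 = 3.0
```

Because the class term also trains the model, class photos with a skew, like the older-person bias described above, can pull the class toward that skew.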

split acorn
#

it's a bit more complicated than that though, and there are a lot of other factors to consider as well (like the model/settings you're training with, including how much)

#

I'd say doing it that way isn't ideal though alicatKEK2

split acorn
#

I'd do a proof of concept to show you, but I can't locally train DB alicatCry (I use runpod)

final matrix
#

I radically changed my dataset:

  • removed full shot, half-body, closeup etc
  • removed location tokens (e.g. exterior, interior, city street etc)
  • removed all mentions of facial expressions except smug
  • removed mentions of poses such as sitting, lying, etc. except fighting pose
  • removed all commas and semicolons
  • removed mentions of non-show clothing
  • made the syntax the same for all images and added "with" in front of the hairstyle mentions
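Edits like these are easy to apply mechanically across a big caption set. A minimal sketch, where the banned phrases are examples taken from the list above and the function name is hypothetical:

```python
import re


def clean_caption(caption, banned_phrases):
    """Drop commas/semicolons and whole-word banned phrases, then tidy spaces."""
    text = caption.replace(",", " ").replace(";", " ")
    for phrase in banned_phrases:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return " ".join(text.split())


banned = ["full shot", "half-body", "closeup", "city street", "exterior"]
print(clean_caption("korra, full shot; city street, smug tlok artstyle", banned))
# -> korra smug tlok artstyle
```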

my thinking here:

my version 1.0 is very bad at locations and landscapes. not only do they look bad, but i have noticed that it's very focused on my training images. my belief here is that this is due to me tagging all locations within my images, like "city street" when an image has a city street background. so instead of applying the tlok artstyle to a random city street when prompting it, it will output the city street i trained it on. and because in some shots you would barely even see any of the city street, it now always gives me very zoomed-in shots of a city street.

this is likely because each caption token is trained into the model as a token. so it trained my training image version of a city street into the city street token.

so now that i left it empty again it should still be able to learn what a street in the tlok artstyle will look like, but without training the specific street into the model.

and i have done this for all things now. so almost all my style images are now tagged as just "tlok artstyle", except for those images where i do want certain things to be trained into the model, e.g. when people wear earth kingdom clothing.

as for the other stuff: i think the whole full shot stuff was unnecessary and cluttered the captions too much. i should be able to prompt a full shot just as easily if i negative prompt closeup and the like.

i think the commas and semicolons are unnecessary and may even be bad? anyway, let's see what happens without them.

i also removed mentions of any facial expressions the model already seems to know, i.e. all but smug.

similarly i removed all mentions of poses it should already know.

similarly to the city street thing, i removed all mentions of non-show clothing and hairstyles such as a tshirt, because i believe the model can infer this information itself from the training, but if i tag it as e.g. a tshirt it will always try to give korra that tshirt.

and last but not least, i had some images where it was "wearing X outfit Y hairstyle" but some where it would be "wearing X outfit fighting pose Y hairstyle". i have no idea if that influences anything, but just to be sure i made the syntax the same for all of them and made it grammatically correct, e.g. "wearing X outfit with Y hairstyle"

i have also upped the learning rate even further, from 3e-6 to 5e-6, and i am very curious whether that will work. i am doing only 25 repeats right now and taking a ckpt every 5 repeats.

stone garden
#

interesting changes. I'll look forward to the results.

The whole "training the street" part I get and understand well; I made the same error, I think. For most of those other tokens, using them a few times is nice, but too many times and it trains them too much too, almost as much as the base concept we are teaching.

Having fewer tokens in the prompts will also make learning faster, I believe? It seemed so for me at least; I thought it was because attention was more focused on the main token I was teaching.

And a big thank you from me and others in the community; very interesting and detailed analysis!

split acorn
#

100% yeah

stone garden
#

so over-captioning is bad too.... hmm.... well, I have some work to do too x)

final matrix
#

i might just add the "full-shot" captions back in though.

stone garden
#

don't add "full shot" to all the pics. 5 to 10% of the dataset is already a lot for such a keyword: SD mostly knows it already
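To check a keyword against that rough 5 to 10% guideline, you can count what share of captions contain it. A sketch with made-up captions:

```python
import re


def tag_share(captions, tag):
    """Fraction of captions containing `tag` as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(tag) + r"\b")
    return sum(1 for c in captions if pattern.search(c)) / len(captions)


captions = ["korra full shot tlok artstyle"] * 2 + ["korra tlok artstyle"] * 18
print(tag_share(captions, "full shot"))  # 2 of 20 captions -> 0.1
```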

tranquil river
#

If I wanted to fine-tune a model on 32×32 pixel art (tiles from a game), should I first rescale the tiles to 512×512? Or can I work with the small images, and is that better?
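If you do rescale 32x32 tiles to 512x512, one option that avoids blurring the pixel edges is nearest-neighbour scaling by a whole-number factor (32 × 16 = 512), so each original pixel becomes a crisp 16x16 block. Whether that trains better than the small originals isn't settled here; the sketch only shows the operation on a grid of palette indices.

```python
def upscale_nearest(pixels, factor):
    """Repeat every pixel `factor` times horizontally and every row vertically."""
    return [
        [value for value in row for _ in range(factor)]
        for row in pixels
        for _ in range(factor)
    ]


tile = [[1, 2],
        [3, 4]]
print(upscale_nearest(tile, 2))
# -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```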

split acorn
#

There are a couple of examples out there; you could use those as references to see what they did. Typically, they use sheets of 32x32 pixel characters

lament moat
#

anyone have any tips or know of a good resource for blending models using the "Checkpoint" feature in the web GUI? I have had some luck, but it always seems to go one step too far. I am especially lost on how the third "tertiary" model fits in.

tranquil river
#

that's great thank you! I'll check it out :)

split acorn
#

I'll send another one, one sec alicatPog

#

There are some models that do pixel art better than others, as well

#

I think NAI does pixel art well, but they haven't added the ability to train yet (though I imagine they will at some point? no idea. I know they did "Prompt Tuning", which works similarly but for GPT text-based stuff)

crimson sandal
#

I am sure someone posted here before so I apologize for re-asking the question. I am having a hell of a time training a stable diffusion model on multiple people. I see people have done it using dreambooth, but I am getting terrible results. Has anyone successfully done this? I would love to chat if so! These are real people and I want them all to be as distinct as possible but coming from the same model.

split acorn
#

Lots of examples! There's a Genshin model trained with multiple which looks really good. There's a discord server specifically dedicated to Dreambooth that also has a channel dedicated to multiple subjects, I can DM if you'd like (I have no affiliation in any way)

sullen apex
#

0_0

stone garden
#

so, you wanted more details on "50 total pictures in the dataset, 160 repeats total each, over 4 Epoch on LR1e-6."
160 repeats means that each picture is trained on 160 times.
An epoch is a way to cut the training into parts. In this case, 4 epochs means that those 160 repeats were done in chunks of 40. After every 40 repeats of all 50 pictures (randomised), it would make a checkpoint. Each epoch also makes sure every picture is presented for its number of repeats.
LR is learning rate. 1e-6 is 0.000001. This is the speed at which you train and learn things. Higher can be faster but sometimes brings lower-quality results; lower gives you more control, in my opinion

sullen apex
#

But there are 50 images so 50x160=8000 trainings so 2000 per epoch?

stone garden
#

yep this is 8k steps

sullen apex
#

Ok that makes more sense thanks

stone garden
#

one last param I didn't show is batch size

#

it's how many pictures are trained on at the same time

#

it needs more VRAM for sure, but gives higher speed, like in image generation, AND noticeably higher quality from what I can see

#

not 100% sure on quality, could be confirmation bias

#

still experimenting
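The step arithmetic from this exchange, as a small helper. Trainers differ in whether a "step" means one image or one batch, so both counts are returned; the function name is hypothetical.

```python
import math


def training_steps(num_images, repeats, epochs, batch_size=1):
    """Count image presentations and optimizer steps for a repeats/epochs setup."""
    samples = num_images * repeats
    per_epoch = samples // epochs  # assumes the repeats split evenly into epochs
    return {
        "images_total": samples,
        "images_per_epoch": per_epoch,
        "batches_per_epoch": math.ceil(per_epoch / batch_size),
        "batches_total": math.ceil(samples / batch_size),
    }


print(training_steps(50, 160, 4))                # 8000 total, 2000 per epoch
print(training_steps(50, 160, 4, batch_size=2))  # half as many batches
```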

sullen apex
#

Interesting!

stone garden
#

those 2 models (character + creature), I did 4 trainings total, I spent around 100K steps total lol

#

200GB of checkpoints compared x)

#

I tried making only 1 model for both styles, but didn't have enough time to make it work

sullen apex
#

Can you just weighted sum merge 0.5 at that point?

stone garden
#

you mean do a sum/average of all models ?

#

I lose quality on each concept when I do that; I prefer to train all concepts at once, but it requires a little more experimenting. I'm trying to see how I can combine 2 datasets correctly... Like here, I didn't use any regularisation beyond full captioning (which helped), but I've been very wary of overtraining, and tested mostly on that criterion

split acorn
#

Model merging is amazing, would def recommend it

#

random use case, I downloaded the D&D model but it didn't do what I wanted it to do. Merged it with a high quality model (using add difference) and now it can produce results closer to what I was looking for
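AUTOMATIC1111's checkpoint merger offers "weighted sum", which computes A * (1 - M) + B * M, and "add difference", which computes A + (B - C) * M. The third (tertiary) model C is what gets subtracted from B, so B - C isolates what a fine-tune added on top of its base. A sketch of those formulas applied per parameter, with small dicts of lists standing in for real checkpoints (which hold tensors keyed by parameter name); the example values are made up:

```python
def weighted_sum(a, b, m):
    """A * (1 - M) + B * M, parameter by parameter."""
    return {k: [(1 - m) * x + m * y for x, y in zip(a[k], b[k])] for k in a}


def add_difference(a, b, c, m):
    """A + (B - C) * M: add what B learned relative to its base C onto A."""
    return {k: [x + m * (y - z) for x, y, z in zip(a[k], b[k], c[k])] for k in a}


base = {"w": [1.0, 1.0]}      # C: the base model a fine-tune started from
finetune = {"w": [3.0, 1.5]}  # B: a model fine-tuned from that base
other = {"w": [2.0, 0.0]}     # A: the model that should receive the difference

print(weighted_sum(other, finetune, 0.5))
print(add_difference(other, finetune, base, 1.0))
```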

karmic warren
#

hey, i'd like to explore changing the dataset for DB training as i add more steps, has anyone experimented with it already ?!

#

my initial thought is to start the first 500 steps with a 50/50 mix of 2x2 and 3x3 grids and single pictures, then gradually fade out the grids until 2.5k steps, and add another 2.5k steps with only single pictures

#

unless someone with experience advises against that, that's what i'll be trying in a couple hours 🤞
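The proposed schedule can be written as a probability of drawing a grid image at each step: 50% for the first 500 steps, fading linearly to 0% at 2,500 steps, then single pictures only for the remaining 2,500. A sketch with hypothetical names, not a feature of any particular DreamBooth trainer:

```python
import random


def grid_probability(step, fade_start=500, fade_end=2500, initial=0.5):
    """Chance of drawing a grid image at `step` under the proposed schedule."""
    if step < fade_start:
        return initial
    if step >= fade_end:
        return 0.0
    # Linear fade from `initial` at fade_start down to 0 at fade_end.
    return initial * (fade_end - step) / (fade_end - fade_start)


def pick_image(step, grid_images, single_images, rng):
    """Draw one training image, choosing the pool by the scheduled probability."""
    pool = grid_images if rng.random() < grid_probability(step) else single_images
    return rng.choice(pool)


rng = random.Random(0)
for step in (0, 1500, 2500, 4000):
    print(step, grid_probability(step), pick_image(step, ["grid"], ["single"], rng))
```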

open abyss
#

I'm fine-tuning to get a specific person, and I am fortunate enough to have a "plenty big" set of data -- 200+ images -- at very high resolution. The automatic1111 built-in crop-and-caption was being stupid and wasting a ton of space on backgrounds, rarely getting the face in-frame (even with face attention turned on!) so I'm manually cropping. But since I'm the one in charge, I realized I can crop both an upper-body portrait (showing the person's figure, fashion sense, and posture) but also go in close and get a 512px face portrait from the same source.

Does anyone know if having "repeats" of the same data at different scales will be beneficial or detrimental to my training?

split acorn
#

I've done multiple TI/HN/DB to good success with that method. Seems to work well! I'd just be careful with flips and small datasets with only a few images.

#

I don't have anything to compare it against, so it could technically still be detrimental, but AG_Shrug

ocean patrol
#

Has anyone tried to train a model to generate normal maps (or height maps)? I'm thinking img2img could be very useful for that.

#

Maybe it'd need to be a style?

open abyss
#

Thanks! I've mostly been doing sets of 25, and getting really solid results, so I'm excited to have a set of 250+ HD pictures to work with. Do you have any experience with setting the learning rate to something besides 0.005? I tend to drop mine by ~2 orders of magnitude once the outputs start to look like the original and get pretty good results... but even at 100K steps I still get some weird mutant/plastic outputs occasionally
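Dropping the learning rate by about two orders of magnitude once the outputs look right amounts to a piecewise-constant schedule. A sketch, where the step at which the drop happens is a placeholder you would pick from your own samples:

```python
def piecewise_lr(step, schedule):
    """Return the learning rate for `step`.

    `schedule` is a list of (start_step, lr) pairs sorted by start_step;
    the last pair whose start_step <= step wins.
    """
    lr = schedule[0][1]
    for start_step, rate in schedule:
        if step >= start_step:
            lr = rate
    return lr


schedule = [(0, 0.005), (2000, 0.005 / 100)]  # 100x drop at a placeholder step
print(piecewise_lr(0, schedule), piecewise_lr(2500, schedule))
```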

karmic warren
#

trying to make a publishable set of anime pictures of female characters:
first try used AnythingV3 with the prompt "a woman character"; the result was definitely not publishable.
currently using Elysium_animeV2
img2img with the output of AnythingV3
positive prompt: a character
negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, missing nose, young, sexy, loli, lolita, children, child, kid, cat ears, horns, nudity, skimpy clothes, swim suit
DDIM 64 steps

#

doesn't seem like it's toning down the output too much, suggestions to get relatively neutral characters are appreciated

karmic warren
#

it seems to work much better now. the trick was not to try to tone it down but to tone it up:
generated a bunch of realistic women of all ages and looks then asked AnythingV3 to img2img with
prompt: a woman character
and all the negative prompts above

#

should be done generating in two hours, is Mega alright here for uploads or is there a dedicated area for data ?

bold saffron
#

welp, there you go I guess

final matrix
# final matrix <@456226577798135808> its working! still very early repeats but this is promisin...

@stone garden
ok finished model results:

this model is much better at

  • outfit likeness
  • scenes/backgrounds/landscapes
  • general flexibility

its worse at

  • overall artstyle and cohesion for some shots, particularly those i didn't label and where it was hard for the AI to work out by itself what the image is supposed to represent
  • full-shots
  • Korra wearing non-show outfits

i have also done some testing regarding "Y artstyle" vs. "image of X in the Y artstyle"
and my conclusion is that "image of X in the Y artstyle" is superior.

case in point: ukiyo-e.

1st image: car ukiyo-e
2nd image: car ukiyo-e style
3rd image: image of a car in the ukiyo-e style

the third image output a car or something car-adjacent the most often.

i will create a version 3.0 of my dataset now which will change the following things:

  • re-add full-shot, but not half-body or closeup since the model already does those well on its own. i will also only add full-shot to images with a background, not to the character concept art with no background
  • re-add captions to some style images like city streets etc, but fewer, simpler, and in the form "image of X in the Y artstyle"
  • re-add captions for some but not all non-show outfits, e.g. tshirt
  • make the outfit tokens more unique and a single token each
  • either remove some of the outfits or get better images of them

https://cdn.discordapp.com/attachments/1029222282511515678/1044896496673374208/grid-0585.png
https://cdn.discordapp.com/attachments/1029222282511515678/1044896497025683506/grid-0586.png
https://cdn.discordapp.com/attachments/1029222282511515678/1044896497344446494/grid-0587.png

final matrix
#

for those who dont get what i am doing here:

i trained my captions as just "caption tlok artstyle" so far
and have issues with prompting certain things properly in just the style
so i have done this testing and turns out "image of X in the Y artstyle" is a superior way of captioning in order to teach the model that you want just the style, nothing else

stone garden
# final matrix for those who dont get what i am doing here: i trained my captions as just "cap...

One thing that could bias the "artstyle > style" argument here is that you trained on the artstyle keyword, but unless I misunderstood, you didn't train on the style keyword to compare; you just used the style token instead of artstyle while prompting, right?

If so it would seem logical that the "artstyle" keyword is stronger than style, since it's what you trained on, and would respond the other way if trained on the other token

I wanted to do an example with the "AnimeChan Style" model, trained on the "style" token instead of "artstyle", but to be honest, both keywords come out about equally well for me there. slightly different results, but neither is really closer to the dataset's artstyle... Well, I'll try "artstyle" in my next training

the difference in your examples is quite striking, while mine is quite tame (I ran different samplers and seeds and it came out the same), so this artstyle method seems to have merits 🙂

final matrix
#

not from my trained model

#

so that the testing is pure

#

you can run those test prompts in your vanilla SD model right now and get the same results

stone garden
#

oh ok, sorry, I didn't understand correctly

#

so "style" or "artstyle" acts a little like a "class" here, then.
on one side, it should help the training get there faster, since it starts from closer,
on the other side, it will train the artstyle token a little too, so maybe having a small set of pictures in other artstyles, tagged as such, could help prevent bleeding a little
That's super info. I'm not doing a model right now, but I'll try it with the PoW submission model tomorrow

viral sail
#

Hey guys! I'm currently delving into the topic of textual inversion, and I often come across the term "DreamBooth". Would anyone mind explaining the difference or pointing me to a resource about it? Thanks!
As in: what is an embedding, what is textual inversion and what is DreamBooth? They all seem to be in the same vein

rich cipher
#

Hi guys, I'm somewhat stuck with dreambooth... I trained a model based on 1.4 with 25 images of me, started with 2500 steps (learning rate at 0.000001, i.e. 1e-6) and got results that don't really look like me. I've added 500 steps multiple times now, and all I'm getting out of a simple prompt is a random Chinese dude with my beard and cheeks and tons of age spots like a 100-year-old... Either I'm doing something fundamentally wrong or dreambooth just hates me...

timid fable
#

hello, is there any tutorial on how to train a style with SD Dreambooth extension? I'm getting really weird results

split acorn
open abyss
# viral sail Hey guys! I'm currently delving into the topic of textural inversion, and I ofte...

An embedding is a map from text to a set of tokens that will likely result in an image matching the text; when you do Textual Inversion, you create an embedding (saved as a small *.pt file) that is added to the list of things the model "knows". The training process involves tens or hundreds of samples where the text prompt is fed into one end, and your reference image into the other. The reference image is diffused into noise, then the model tries to find the embedding that results in the same target noise, so that when the diffusion is reversed, the image that comes out resembles the starting image (and therefore matches the prompt).

open abyss
# viral sail Hey guys! I'm currently delving into the topic of textural inversion, and I ofte...

Dreambooth, on the other hand, is fine-tuning the model. It does contrastive/ablative learning where it contrasts pairs of images from the "prior class" and your new target -- for example a dog and a Weimaraner dog -- and learns the difference in much the same way. But in order to do that, it's modifying the checkpoint model itself, essentially grafting the whole training process onto an existing .ckpt file so that the result is a checkpoint that has stored your concept in a rare/sparse token somewhere.

viral sail
#
Jerome Stephan's Notion on Notion

This guide is meant to be a starting point for those who want to start using SD, findings from experiments and a possibility to learn more for everyone who is already experienced in SD. Inspired by Ethans incredible travellers guide to latent space for DD, with most of what you’ll read driven by the amazing community around this open source soft...

open abyss
#

You may, but you probably want to go back to the original academic papers that the two are based on, and use the authors' exact words (or cite their papers). I'm certain that there are things in my descriptions that gloss over other (academically) important differences

final matrix
#

I'm wondering how bad it would be to just use this concept art shot as-is instead of making each of those shots its own image like in the 2nd image.
Say I caption this as "character concept art of" and then prompt this character with "character concept art" as a negative prompt... what would happen? Would I get one shot of the character or multiple?

stone garden
#

I finished an installer for Shivam with xformers and 8-bit Adam for Windows, with a working test run. I'm working on a UI to prepare, queue and test/compare learning sessions in it

stone garden
#

The install and test run work nicely; the UI works but needs more testing. Windows/NVIDIA only. I'm not sure about the minimum VRAM it requires: it runs on 16GB for sure and should theoretically run on 12GB, but I couldn't test that

stone garden
#

confirmed to work on 12GB 🙂

open abyss
stone garden
# open abyss What’s Shivam-and-all-that? I’ve got 12GB. Do I want it?

Shivam is an implementation of Dreambooth that lets you train models on new pictures
https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
It's really nice, but setting it up can take some time. I can't guarantee my zip will work on every computer, though; I've only tried it on 2 so far

rigid sorrel
#

Quick question for you model pros. I finally have a model with good output, but it only does 1 thing. How do I tune it for more generality? AKA I'd like to produce something besides super amazing female portraits.

hot breach
#

add some other stuff back into your training set

rigid sorrel
#

other stuff like what?

atomic lagoon
#

super fast

hot breach
#

if you want to tune more generally, add "general" images back

#

seeing examples of your problems would help

rigid sorrel
#

I didn't have general images in the first place to remove. So you're saying I should find some general images close to this style?

#

They all look like this, please help!

hot breach
#

not sure I fully understand your goals and what's stopping you from achieving them; I may need more details

rigid sorrel
#

its very difficult to get anything but a close-up portrait

hot breach
#

plenty of fine tuners now allow you to caption images so you can train as many concepts, subjects, and styles at once as you wish

#

instead of class/token nonsense

#

you need labeled data but you can train tons of stuff at once if you don't mind the compute time

#

and the labeling effort

atomic lagoon
rigid sorrel
atomic lagoon
#

which model?

hot breach
#

it's mostly just a matter of labeling all the images and using a trainer that can take a caption for each image; you can train 5 art styles at once if you want

rigid sorrel
hot breach
rigid sorrel
stone garden
#

Hey all, I need your wisdom 😇 Do you think it would be possible to train a model to produce only white/transparent-looking background images where the subject is always centered, isolated and full body? Would that work if I train it on a dataset of images with only these characteristics?

stone garden
#

you can go multiple ways here

#

first, you can teach all those concepts on their own and be able to call for each one individually in the resulting model. This is the hardest option and can be trickier to train well, but it gives the most possibilities in the end

#

you could teach all those concepts at once, under a common token like "photoshoot" or something. A single concept is easier to teach, but you won't be able to prompt for only one of those characteristics; it will be all or nothing

#

if you want this "photoshoot" concept to work, you will need to teach it as "a style", so it can be used on a whole lot of types of subjects

stone garden
stone garden
#

but yes it would work

#

if you want to test it yourself on a local install and have at least 12GB VRAM, I tried to put together an installer yesterday, and I'd love some more testers. Only for Windows and NVIDIA cards, but still

stone garden
#

That means you still can, but on Colab

crimson wasp
#

Does anybody know what kind of parameters the huge multi-subject models like Novel AI/Waifu Diffusion/the furry ones (which I think exist, though I may have dreamed them) use for training? E.g., would you want a lower learning rate when training on a hundred different concepts at once? I'm currently trying to build a model for controlled posing and number of people with text, using a few thousand really well-tagged images

final matrix
final matrix
#

the differences are quite subtle though

final matrix
#

Ok, so maybe I just need to be more specific with my captions.
Like, I just did a test prompt in vanilla SD:

"photo of a house in the ghibli style" vs. "photo of a house in the ghibli architectural style" vs. "(photo:2) of a house in the ghibli style"
or
"photo of a house in the art noveau style" vs. "photo of a house in the art noveau architectural style"
Clearly, being more specific helps a lot. Right now I've just captioned all the show screenshots as "tlok artstyle" (not even "image of ... in the..."), and the people in them as, say, "woman".

So if I make my captions more specific and detailed, I may be able to better contain the infection. Say, "screencap from the tlok anime of the woman Korra with the avatar ponytail hairstyle wearing a sleeveless avatar shirt and avatar fur skirt and avatar armband and avatar sleeves and standing in front of background art of a building in the tlok architectural style"
or something similar,
though that would be a gigantic 49 tokens. Then again, the fantasy card model was trained on images whose captions average around 50 tokens, so it should be fine?
https://cdn.discordapp.com/attachments/1026983549154361425/1045373282385412166/grid-0264.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373282725134396/grid-0271.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373283006156871/grid-0270.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373283429793932/grid-0272.png
https://cdn.discordapp.com/attachments/1026983549154361425/1045373283773710336/grid-0273.png

unique berry
#

If I have ~15k images that are well captioned, does it make sense to do Dreambooth? I’ve been doing UNet fine-tuning and the model doesn’t seem to learn the style in the dataset (white background, front view of the shoe in the middle)

dapper prism
copper lagoon
onyx vault
copper lagoon
#

well there's a bunch of custom parts in there

#

I can't reproduce the results 1:1 anyway, for some reason

#

with stated parameters and prompt

tiny wolf
#

Does anyone know any good models for creating bird's-eye views of towns and cities?

frank mango
#

Hey, is using multiple GPUs (A5000s) faster for Dreambooth training?

heavy lynx
#

Hey, what's the difference for the repos/colabs of ShivamShrirao and TheLastBen for dreambooth?

tacit bronze
#

A first-version Dreambooth model for generating HD-2D game screenshots (Octopath/Live A Live/DQ3)
"hd2d, a crowded desert marketplace in front of a pyramid"

pallid perch
#

What would be the best method for training an action or activity, like fencing, acrobatics, etc.? Dreambooth ckpt, TI, hypernetwork?

frosty wave
stone garden
#

Well main difference is that it trains on diffusers instead of ckpt

frosty wave
#

^Oh ok, sorry, I'm still a bit new to all this but... what are diffusers, again? 🙂

grave carbon
open abyss
# tiny wolf Anyone know any good models that are good for creating images of birds eye views...

This should be fairly easy to train because satellite images and maps are widely available and may even be in the training set. See if satellite image in the style of SPOT or MAXAR gets you results. Might need to use img2img on street maps to get good results because the structure of roads is a complex rule-based layout — like hands & text, a generative model will struggle and a human brain can recognize flaws in it

tough gazelle
#

Using "Satellite photo of" works pretty well in SD 2.0

viral jay
dense hawk
#

Hey I want to setup my own personal thumbs up thumbs down local image gallery so I could easily manually train my aesthetic. Does anyone know of any open source simple solution that tags the images in a way that works well?

tight cradle
#

Hi folks, so if I would like to train my SD on a certain aesthetic style of logo design and have collected a large number of different images representing this style, how would I use these images to train my AI using automatic1111? Do I use Dreambooth for this, or should I use the Train tab (and which sub-tab)? Are these even the right solution for what I want to achieve? I'm not trying to train on the face of a single person; I want my prompts to have a specific style that txt2img isn't giving me at the moment. What do you recommend?

dapper prism
stone garden
#

https://github.com/Guizmus/DreamboothSimpleUI
Dreambooth local install for windows, compatible with 2.0, delivered to you with 2 working examples, one on 1.5 and one on 2.0.
Requires conda and git.
The 1.5 example should run under 12GB VRAM

tough gazelle
rigid sorrel
#

why do folks use "style" as a training set vs "art"?

#

"style" is a lot of fashion & magazine prints

#

sorry, as class images

viral jay
tough gazelle
viral jay
#

lol yeah kinda same result I got here, I'm looking for better quality, maybe dreambooth can achieve it, will see, need some hq tiles

dapper prism
#

How does EveryDream training time vary by dataset size? Can I train a small dataset in a matter of hours?

viral jay
#

using some tiles for training I can achieve some higher-quality images, but I still need to tune this thing; I probably need captioning to describe where the fields, city, etc. are

stone garden
gloomy belfry
#

anyone tried finetuning the 768 model?

#

after a few epochs all I get in the samples is this

#

(prompt is just a test, that's the image name as well)

#

works fine with the SD2-base model so I'm not sure what's wrong

sharp solstice
gloomy belfry
#

mine, but it's based on Shivam's which is diffusers

sharp solstice
#

i found that applying an embedding from 1.x "works" on 2.x but you get the brown smudge like in your screenshot

#

automatic1111

gloomy belfry
#

amm

sharp solstice
#

i can't train based on 2.0 because the textual inversion stuff is written to work for the previous clip model

gloomy belfry
#

the only difference between the two is that they used v-prediction training on it

sharp solstice
#

it kinda sounds like you were training on the previous clip model and tried to use the result on the new one

sharp solstice
gloomy belfry
#

nah, I'm training and sampling per the model from diffusers

gloomy belfry
sharp solstice
#

ah okay, then nvm i'm not sure what i'm talking about

gloomy belfry
#

actually it might just be the scheduler

#

but not sure

sharp solstice
#

where can i try your repo?

gloomy belfry
#

you can't, it's not online

#

I might share it at some point but I'm still changing it a lot

#

it's basically Everydream but based on diffusers, so I can do 24 batch_size on a 4090

sharp solstice
#

is it notebook based?

gloomy belfry
#

no

#

it's geared more towards large fine-tuning

weary knot
#

Does dreambooth always involve a new model? Can you join two dreambooth finetunings into a single model?

hot breach
#

mostly a new model, there are some model merging tools but I think they'll generally water down the stuff you train

weary knot
#

thanks a lot

crimson wasp
#

Just a heads up, the CLIP text encoder's vocabulary has a lot of words which end in </w>, often the same word twice with and without </w>. It seems to indicate the end of a word made up of 1 to n tokens and likely has some sort of marker in the embedding which SD understands.
apple = "apple</w>":3055
applesauce = "apple":8629 "sauce</w>":5520
computer = "compu":11639 "ter</w>":652
ahsoka = "ah":1772 "so":759 "ka</w>":1525
arnold schwarzenegger = "arnold</w>":13609 "schwarzenegger</w>":33860
emma watson = "emma</w>":7445 "watson</w>":9294
chadwick boseman = "chad":13095 "wick</w>":6479 "bo":647 "seman</w>":28378
https://huggingface.co/openai/clip-vit-base-patch32/resolve/main/vocab.json

It could be useful when picking tokens for training (e.g. using an existing single-word token might be bad, while a new combination might be better), or for initializing embeddings (where more tokens are probably more unwieldy, so an init string with fewer tokens is probably ideal)

crimson wasp
#

sks for example is "sks</w>":48136, which is probably why the issues with it were so pronounced. It's already an existing single word

If your token was applesauce you'd be working with 'apple' + 'sauce</w>', but if your token was 'apple sauce' you'd be working with 'apple</w>' + 'sauce</w>', so the space decides whether the token is treated as one joined word or as multiple prompt words (presuming SD understands a word-termination marker in those </w> embeddings). Essentially, just don't put spaces in names when training a model, and I suspect you'll get better results
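To make the word-end marker concrete, here is a toy sketch. It is NOT CLIP's real tokenizer (which applies learned BPE merge rules); it is a greedy longest-match over a tiny hand-copied fragment of the vocab, using only the IDs quoted above, and it assumes whitespace is the only word separator. It is enough to show why "applesauce" and "apple sauce" end up as different token sequences.

```python
# Toy fragment of the CLIP vocab; token IDs copied from the examples above.
TOY_VOCAB = {
    "apple</w>": 3055,
    "apple": 8629,
    "sauce</w>": 5520,
    "sks</w>": 48136,
    "emma</w>": 7445,
    "watson</w>": 9294,
}


def toy_tokenize(prompt, vocab=TOY_VOCAB):
    """Greedy longest-match split (NOT real BPE): a piece that reaches the
    end of a whitespace-separated word must carry the </w> marker."""
    tokens = []
    for word in prompt.lower().split():
        rest = word
        while rest:
            for end in range(len(rest), 0, -1):
                piece = rest[:end]
                # Only a piece that finishes the word gets the end-of-word marker.
                key = piece + "</w>" if end == len(rest) else piece
                if key in vocab:
                    tokens.append((key, vocab[key]))
                    rest = rest[end:]
                    break
            else:
                raise KeyError(f"no toy-vocab piece matches {rest!r}")
    return tokens


print(toy_tokenize("applesauce"))   # [('apple', 8629), ('sauce</w>', 5520)]
print(toy_tokenize("apple sauce"))  # [('apple</w>', 3055), ('sauce</w>', 5520)]
```

For the real pieces, Hugging Face `transformers` should give them (with the `</w>` markers) via `CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32").tokenize("applesauce")`.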

apple_sauce is treated as apple</w>_</w>sauce</w> so underscores won't work in place of spaces

If you want variations of a shirt, like v1shirt, v2shirt, you'd want to make sure those actually tokenize to always ending in the shirt</w> token, so that the first part is working as a modifier of the same final shirt concept (you can check with automatic's webui-tokenizer plugin)
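Once you have the token lists for a few candidate names (e.g. copied from the tokenizer extension), the check described above is a one-liner. A minimal sketch; the function names are made up and the tokenizations in the demo are hypothetical, not real CLIP output.

```python
def ends_in_word(tokens, word):
    """True if the token list ends with the word-final token for `word`."""
    return bool(tokens) and tokens[-1] == f"{word}</w>"


def names_sharing_word(tokenized, word):
    """Keep candidate names whose tokenization ends in `word</w>`."""
    return [name for name, tokens in tokenized.items() if ends_in_word(tokens, word)]


# Hypothetical tokenizations for illustration only, NOT real CLIP output;
# paste the real ones from a tokenizer before relying on this.
candidates = {
    "zxshirt": ["zx", "shirt</w>"],
    "zxshirts": ["zx", "shirts</w>"],
}
print(names_sharing_word(candidates, "shirt"))  # ['zxshirt']
```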

acoustic sluice
karmic warren
#

Not sure if this is the place to dump code, but I just put this script together from the Loopback script and the sd_upscaling script, with a smidge of xy_grid for good measure 😄 All credit goes to the authors of those

#

currently testing, seems to be working up to now 🤞

#

A quick description: grab the file, put it in the scripts folder and restart the UI

#

As for usage: it's an img2img script that will SD-upscale the given image, downscale it back to the original resolution, then run another SD upscale with the next scheduler in the schedulers list on the output of the current loop, repeating up to the number of loops specified
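The loop described above, sketched as plain control flow. The three callables are hypothetical stand-ins for the real img2img/resize calls, and two details are assumptions not stated in the message: schedulers are reused round-robin if there are fewer of them than loops, and the final upscaled image is what gets returned.

```python
def loopback_upscale(image, schedulers, loops, sd_upscale, downscale, size_of):
    """Upscale, downscale back, and upscale again with the next scheduler.

    sd_upscale(image, scheduler), downscale(image, size) and size_of(image)
    are hypothetical stand-ins for the real calls.
    """
    if loops < 1 or not schedulers:
        raise ValueError("need at least one loop and one scheduler")
    original_size = size_of(image)
    current = image
    for i in range(loops):
        if i > 0:
            # Bring the previous loop's output back to the original resolution.
            current = downscale(current, original_size)
        current = sd_upscale(current, schedulers[i % len(schedulers)])
    return current


# Stub "images" as (size, history) tuples, just to trace the order of calls.
def fake_up(img, scheduler):
    return (img[0] * 2, img[1] + (f"up:{scheduler}",))


def fake_down(img, size):
    return (size, img[1] + ("down",))


print(loopback_upscale((512, ()), ["euler_a", "ddim"], 3, fake_up, fake_down, lambda img: img[0]))
# (1024, ('up:euler_a', 'down', 'up:ddim', 'down', 'up:euler_a'))
```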

karmic warren
regal harbor
#

should I put all images of the same pose into a single folder? e.g. standing facing forward, standing facing sideways, and standing facing away? And train each separately? Or does it matter if I've BLIPed them all?

half folio
#

So I have a question, is it possible to finetune on top of v2-velocity?

#

I know there's already dreambooth models trained on top of v2-base but velocity model appears to be a little bit different, no?

sharp solstice
#

I have a feeling that when training subjects, the composition of the training data has a substantial impact:
for example if you train a subject up close, you'll have a harder time reproducing the subject from further away even though the angle is the same

#

with that in mind, i'm thinking maybe it helps to not only flip, but to rotate, scale and translate the subject around as well

#

maybe that could be aided with inpainting or something for varied backgrounds

dapper prism
#

Anyone had any luck DreamBoothing or finetuning the 768 model yet?

sharp solstice
hidden plinth
#

anyone dreambooth/finetune the depth2img model yet?

remote vapor
#

How are people fine-tuning the new 2.0 in automatic1111? I can't get it to work

hardy lantern
#

How do you remove duplicate images from larger datasets? What software do you use?
None of the software I have tried has been satisfactory.
There are good Python modules, but since they also produce false positives I'd need a GUI to review the matches, and I have no idea about GUI programming. 😭

crimson wasp
#

has anybody tried training with CLIP skip?

hardy lantern
crimson wasp
hardy lantern
#

since 1.5 doesn't use clipskip you shouldn't either, unless you train a really huge dataset

crimson wasp
hardy lantern
#

i mean a lot more pictures, besides i wonder about the 5-7

#

no clipskip and 5-6 did not have the desired results?

#

how many epochs do you have?

#

which trainer?

#

4.1k manual, that's hardworking

delicate stream
#

For anyone wondering why the loss is weird in the first 1k steps: this is a constant LR of 1e-6, and I noticed in all of my trainings that the loss decreases after 2k steps.

#

This is another attempt at training

#

So just because your loss is going up and down in the beginning doesn't mean you did something wrong

#

Some of you might be thinking, isn't the loss supposed to look like this?

#

Nope

#

It's like with hypernetworks, where, to be honest, the loss will never go down like that. It should alternate between a high and a low value but eventually go down, though super slowly. The loss is more or less a way to know when your model is blowing up: if it keeps rising, it's dying. If it's just alternating between two values, it's fine; it's still learning, and as long as it doesn't keep going up it will be fine. So lower doesn't necessarily mean better.

#

example in some cases

#

when you train it might always say 0.169 or 0.165

#

but then you go and cancel and train again from where it left off and it will say 0.134 or 0.140

#

so there's no need for us to just see it go down, it's more of a number saying "Yup, im training, but don't let me get out of proportions."

#

and if your loss is 0

#

then....you fucked up
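The rule of thumb above (bouncing is fine, steady rising means it's blowing up, exactly 0 means something is broken) can be turned into a rough automatic check. A minimal sketch, assuming you can collect the per-step loss values; the smoothing factor `alpha` and the `patience` window are arbitrary assumptions, not values from any trainer.

```python
def loss_health(losses, alpha=0.1, patience=200):
    """Classify a loss history as 'ok', 'diverging' or 'zero-loss'.

    An exponential moving average (EMA) smooths out the normal bouncing; only
    an EMA that rises for `patience` consecutive steps counts as diverging.
    """
    ema = None
    rising = 0
    for loss in losses:
        if loss == 0:
            return "zero-loss"
        prev = ema
        ema = loss if ema is None else alpha * loss + (1 - alpha) * ema
        if prev is not None and ema > prev:
            rising += 1
            if rising >= patience:
                return "diverging"
        else:
            rising = 0
    return "ok"


print(loss_health([0.169, 0.165] * 500))                    # ok
print(loss_health([0.1 + 0.001 * i for i in range(500)]))  # diverging
print(loss_health([0.15, 0.0]))                             # zero-loss
```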

delicate stream
#

No problem, just trying to spread the word... so people don't waste weeks on a model like I did trying to figure out the perfect settings

#

In the end I finally figured out I just needed to wait, and to have a good dataset

weary knot
#

that gets me every time "oh it's not working" and then you just need to wait a very long time

#

iterative development in deep learning requires so much of the patience I don't have xD

delicate stream
weary knot
#

only 21 images? What technique are you using

delicate stream
#

Also, I highly recommend [filewords] when you have caption text files

weary knot
#

textual inversion?

delicate stream
#

Dreambooth

weary knot
#

I thought dreambooth required more images... maybe I got it mixed up

delicate stream
#

Nope, Dreambooth requires fewer: 3-5 images depending on what the subject is, but you can add more

weary knot
#

very cool. I never looked too much into dreambooth because it requires generating another ckpt

#

I love the textual inversion idea, that dude is a genius

#

but, alas, Dreambooth is more precise in generating what you want

delicate stream
#

Yup, I used to do hypernetworks all the time, but after Dreambooth, man, I really don't want to go back. To be honest they are both good methods, though Dreambooth usually has a better understanding of your subject.

weary knot
#

cool. How do you make it fit on a 3090 (assuming that's your card)? IIRC, Dreambooth requires a lot of compute

#

from the paper, I mean. Haven't seen the implementations

delicate stream
#

i just use the Dreambooth extension for the Automatic1111

weary knot
#

oh I see

delicate stream
weary knot
#

the community is amazing

delicate stream
#

huh....after updating the dreambooth extension and using the new sampler, i am watching a miracle

#

it GOES DOWN!

#

Still, it's going up again a bit

#

but like always im betting after 2k it will just keep decreasing

weary knot
#

that's very good news

delicate stream
#

Hopefully. Yesterday the only option was DDIM sampler training; now I'm trying the new Euler_a sampler for training.

weary knot
#

I spent so much money renting an A6000 instance to develop my own code... Now that I've made the proof of concept for what I wanted, I should use these repos to run on my local machine

weary knot
#

maybe it was just a matter of plugging it in

delicate stream
#

idk but until yesterday my only options were LMS, DDIM, PNDM and DPM

#

Now i have 2 more

weary knot
#

cool

delicate stream
#

if they provide better quality

#

How much Vram do you have?

#

if you have 10 you can run Dreambooth, tho you might need memory saving options

weary knot
#

I use a 3060 with 12GB VRAM

delicate stream
#

then you can easily run dreambooth

#

no Linux or docker or WSL needed

weary knot
#

yeah. The thing is that I'm developing my own code, so I need to find a way for that to work with the optimizations. But these repositories should help a lot, ty

delicate stream
#

Sure no problem

delicate stream
#

this time the loss went down after 3k

#

but

#

the dreambooth extension has a bug

#

it was introduced today

#

No matter what I do, it won't generate the CKPT, so I have to wait till they fix it

#

I filed a bug report but haven't gotten any replies

weary knot
#

wow so sad xD all of this and it doesn't get saved

median sun
#

has anyone used dreambooth together with a hypernetwork and textual inversion prompt keyword?

delicate stream
#

it's just i cant generate the ckpt

#

so i have to wait till they fix the bug and then

#

just hit that

weary knot
#

I see

weary knot
median sun
#

nah just asking

median sun
#

I've seen someone use hypernetworks with TI with good results

delicate stream
#

I personally haven't, but I heard Dreambooth + embeddings are good

median sun
#

I still need to figure out how to get good training results

#

but as I'll be testing all 3 with the same source material, I'll definitely mix them up as a test

#

what's a good amount of source images?

#

I've just rendered out a 20 min cartoon as test

#

that gave me about 1k images at 1fps

#

there are a lot of similar looking images there, but they do have small differences like the characters in a different pose etc.

#

should I keep them or delete everything that is too similar?
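For frames pulled from a video, near-duplicates are usually consecutive, so a simple sequential filter works: keep a frame only if it differs enough from the last frame you kept. A minimal sketch, assuming each frame already has a 64-bit perceptual hash (for example an average hash); the 6-bit threshold is an arbitrary assumption to tune by eye.

```python
def keep_distinct_frames(frame_hashes, min_distance=6):
    """frame_hashes: (name, 64-bit hash) pairs in playback order.

    A frame is kept only if its hash differs from the last *kept* frame's
    hash by at least `min_distance` bits (Hamming distance).
    """
    kept = []
    last = None
    for name, h in frame_hashes:
        if last is None or bin(h ^ last).count("1") >= min_distance:
            kept.append(name)
            last = h
    return kept


frames = [("f0", 0), ("f1", 1), ("f2", 0b111111), ("f3", 0b111110)]
print(keep_distinct_frames(frames))  # ['f0', 'f2']
```

Comparing against the last kept frame (rather than the previous frame) stops a slow pan from slipping through one tiny step at a time.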

delicate stream
#

Are you training a HN or using Deforum?

median sun
#

I'm trying TI, HN and Dreambooth atm

#

deforum was for videos wasn't it?

delicate stream
#

yeah, but since you said 1fps I thought it was Deforum

median sun
#

nah I just rendered a video into images

delicate stream
#

So you are using the video as a source to train the HN?

median sun
#

I used 1fps to reduce the img count as 30fps would have created something like 30 to 50k pictures

#

yes

#

the dreambooth result is really messy

delicate stream
#

ahh i understand now

weary knot
#

I know of someone who did something similar

median sun
#

prob. because it's not focused on one object

weary knot
#

they animated a face then used dreambooth on it

delicate stream
#

What are you training? a style, person or an object?

weary knot
#

and it worked

median sun
#

a cartoon series

#

so a style I guess

delicate stream
#

The number of images for a style can generally be from 50-200 in most cases, but you have to understand it's better to choose quality over quantity

#

i haven't really trained a style yet, only people

median sun
#

good I'll reduce it then

delicate stream
#

As for the LR, from what I've heard, 1e-6 or lower is recommended for styles.

alpine rose
delicate stream
spring sun
#

anyone know if hypernetworks should be working for 2.0 on auto1111?

median sun
#

is textual inversion and hypernetwork training possible with 12gb vram?

#

for dreambooth I had to use fp16 and flash attention but it worked

delicate stream
#

Dreambooth is just injecting the newly trained word and data that corresponds to generating the subject you trained. So technically dreambooth, HN and embeddings are textual inversion just in a different way.

median sun
#

ic thx, I'll call them embeddings then

spring sun
delicate stream
#

i mean this

#

you can train an embedding

#

or a hypernetwork

#

They are both textual inversion

median sun
#

yes I got that

delicate stream
#

no need for flash attention

#

use xformers

median sun
#

where can I find those for TI?

delicate stream
#

Thats the dreambooth tab

#

on advanced settings

median sun
#

So the dreambooth settings work for embeddings and hypernetworks?

delicate stream
#

No, dreambooth uses different settings

#

for HN

#

hm...in the past i used to do

median sun
#

I know the dreambooth settings already

#

I can't find any for HN and EM tho

delicate stream
#

5e-5:200, 5e-6:3000, 1e-6:8000, 1e-7
12k steps
Save an image every 100 steps
And use [filewords]

#

For embeddings i really dont know

#

i never got good results so i never did them

#

for HN

#

other LR i tried

#

for styles

#

In my case I made a text file named 90s.txt, and inside I just put "90s anime style"

#

and then used that as the Prompt template file

#

for people

median sun
#

I have no idea what the 5e-5: etc is

delicate stream
#

i do *name of person*.txt

median sun
#

are you sure that your settings decrease VRAM use?

delicate stream
#

and inside *name of person*, [filewords]
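What the template line does per training image, sketched under assumptions: this is not the webui's code, just the substitution idea. It assumes the caption lives in a same-named .txt next to the image and falls back to an empty caption when there is none (a simplification); the person name and captions in the demo are made up.

```python
from pathlib import Path


def read_caption(image_path):
    """Caption from a same-named .txt next to the image, or '' if missing (assumption)."""
    caption_file = Path(image_path).with_suffix(".txt")
    if caption_file.exists():
        return caption_file.read_text(encoding="utf-8").strip()
    return ""


def render_prompt(template_line, caption):
    """Fill one template line by replacing [filewords] with the image's caption."""
    return template_line.replace("[filewords]", caption)


# "alice" and the captions are made-up examples.
print(render_prompt("alice, [filewords]", "woman in a red coat, smiling"))
# alice, woman in a red coat, smiling
print(render_prompt("90s anime style", "two people on a rooftop"))
# 90s anime style  (no placeholder, so every image gets the same prompt)
```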

delicate stream
#

example

#

5e-5:200, 5e-6:3000, 1e-6:8000, 1e-7

#

from step 0-200 the LR is 5e-5 (0.00005), and from 201-3000 it is 5e-6 (0.000005)

#

etc
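The schedule string can be read as a list of "rate:last step" pairs, with a bare final rate meaning "until the end". A minimal sketch that follows the explanation above (step 200 still uses 5e-5, step 201 switches); the exact boundary handling inside the webui is an assumption here, as is holding the last rate if every entry has a step limit and training runs past it.

```python
def parse_lr_schedule(spec):
    """Parse '5e-5:200, 5e-6:3000, 1e-6:8000, 1e-7' into [(lr, last_step), ...].

    The final entry may omit ':step', meaning 'until the end of training' (None).
    """
    schedule = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            lr, until = part.split(":")
            schedule.append((float(lr), int(until)))
        else:
            schedule.append((float(part), None))
    return schedule


def lr_at(schedule, step):
    """Learning rate for a given step (boundaries inclusive, as described above)."""
    for lr, until in schedule:
        if until is None or step <= until:
            return lr
    return schedule[-1][0]  # assumption: hold the last rate past the end


schedule = parse_lr_schedule("5e-5:200, 5e-6:3000, 1e-6:8000, 1e-7")
print([lr_at(schedule, s) for s in (0, 200, 201, 3001, 12000)])
# [5e-05, 5e-05, 5e-06, 1e-06, 1e-07]
```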

median sun
#

ahh ok

#

my settings aren't that different from yours I'll try decreasing img size

delicate stream
delicate stream
median sun
#

I'd like to keep the aspect ratio of the originals

#

as I'll also generate with that later

delicate stream
#

But then you'd have the problem of training on different aspect ratios than what you generate; the AI is way better at 1:1 aspect ratios (512x512, 768x768, 1024x1024) than at wide images

#

though I suggest not going over 512

median sun
#

768x432 seems to work now

delicate stream
#

for better results

median sun
#

I'll test that later

#

also 512x512 backgrounds are kinda yucky

delicate stream
#

If you really want the entire image, I recommend cutting it in half so each part fits into 512, i.e. the landscape image gets split into two images
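The split above amounts to: scale the image so its height is 512, then take one 512x512 crop flush left and one flush right. A minimal sketch of the box arithmetic; it assumes a landscape image no wider than two squares after scaling, and note the two crops overlap when the scaled width is under 1024 (114 px for 768x432).

```python
def half_crops(width, height, side=512):
    """Return the scaled size and two side x side crop boxes (left, top, right, bottom).

    The image is scaled so its height equals `side`; one box is flush left and
    one flush right, so together they cover the whole scaled image.
    """
    if width < height:
        raise ValueError("this sketch assumes a landscape image")
    new_w = round(width * side / height)
    if new_w > 2 * side:
        raise ValueError("image too wide to cover with two square crops")
    boxes = [(0, 0, side, side), (new_w - side, 0, new_w, side)]
    return (new_w, side), boxes


print(half_crops(768, 432))  # ((910, 512), [(0, 0, 512, 512), (398, 0, 910, 512)])
```

With Pillow, each half would then be `img.resize(size).crop(box)` for the returned `size` and each `box`.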

delicate stream