hollow spruce Nov 13, 2023, 12:24 AM

#

kai has. I'm still with LoRA as I've not hit any limits yet. just a lot of effort x_x

tall condor Nov 13, 2023, 12:26 AM

#

maybe kai can give some tips on dreambooth on sdxl

#

and if you have larger datasets it's all a lot of effort

#

one thing tho. i have learned that if you use random crop you can train 5 models close to overfit and if you merge them later into one model the results get really neat

#

so for me the key to success was merging multiple models into one final

#

the difference compared to the single models was so big you wont believe it

#

also if you advance your model i like to add the new resulting model on top of the old or vice versa

#

especially for faces that makes a lot of difference

#

this way you can also add new stuff to your old models
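
A minimal sketch of the merge idea described above, assuming a plain weighted average over checkpoints that share the same parameter names. The toy "models" are dicts of flat lists; real checkpoints hold tensors, and the function name and the weights are made up for illustration, not taken from any merge tool.

```python
def merge_state_dicts(state_dicts, weights):
    """Weighted average of several checkpoints with identical parameter names."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise so the merged values stay on the same scale as the inputs.
    norm = [w / total for w in weights]

    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("models must have identical parameter names")

    merged = {}
    for key in state_dicts[0]:
        length = len(state_dicts[0][key])
        merged[key] = [
            sum(w * sd[key][i] for w, sd in zip(norm, state_dicts))
            for i in range(length)
        ]
    return merged


# Toy example: the newer model gets three times the weight of the older one.
old_model = {"unet.w": [0.0, 2.0]}
new_model = {"unet.w": [1.0, 4.0]}
merged = merge_state_dicts([old_model, new_model], weights=[1, 3])
```

The same function covers the "train 5 near-overfit models and merge them" case by passing five dicts with equal weights.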

hollow spruce Nov 13, 2023, 12:29 AM

#

tall condor one thing tho. i have learned that if you use random crop you can train 5 models...

oooooh. this is a cool Idea. 👍

tall condor Nov 13, 2023, 12:29 AM

#

trust me it works great

#

i recommend to start merging based on a very low learning rate model

#

it will give a very flexible result

#

the model you start your merge with makes huge difference

hollow spruce Nov 13, 2023, 12:30 AM

#

now I also need 4x 4090 XD

#

I'm working with a mere 1 of them

tall condor Nov 13, 2023, 12:30 AM

#

xDD

#

its a pain isnt it?

#

i started with one

#

then 2

#

and now 4

#

and i wish i had that a100 cluster you referred to

#

but if you have a progressive model that you advanced over time

#

try to merge it

#

the results are much better sometimes as long as you weight in the newer with enough weight

hollow spruce Nov 13, 2023, 12:32 AM

#

but this might be a good time to ask... does it actually double your generating speed? also can you actually run them in multi gpu mode, or do you just run 2 instances of your training XD

tall condor Nov 13, 2023, 12:32 AM

#

no it does not double

#

it doubles if you dont use random crop

stiff dust Nov 13, 2023, 12:33 AM

#

hollow spruce kai has. I'm still with LoRA as I've not hit any limits yet. just a lot of effor...

no, I only train lora

#

but you can merge loras, too

tall condor Nov 13, 2023, 12:33 AM

#

yes

#

merged loras are amazing too

stiff dust Nov 13, 2023, 12:33 AM

#

in principle there is nothing you can't do with lora that you could do with Dreambooth

tall condor Nov 13, 2023, 12:34 AM

#

so you lose about 20% speed with multiple gpus because of the synchronization

#

but its still much faster

hollow spruce Nov 13, 2023, 12:34 AM

#

tall condor it doubles if you dont use random crop

if you run a script to randomly crop them pre-training, and quintuple your dataset - wonder if that would help too

tall condor Nov 13, 2023, 12:35 AM

#

it would. but if you run 200 or 300 epochs with 2-10x repetition just imagine how many variants you will get
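
The pre-cropping idea from the exchange above could be sketched roughly like this. It only computes crop boxes; the tuples use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts. The function name, image size and crop size are assumptions for illustration.

```python
import random


def random_crop_boxes(width, height, crop_w, crop_h, n, seed=0):
    """Return n random (left, upper, right, lower) boxes that fit inside the image."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the image")
    rng = random.Random(seed)  # seeded so the crops are reproducible
    boxes = []
    for _ in range(n):
        left = rng.randint(0, width - crop_w)
        top = rng.randint(0, height - crop_h)
        boxes.append((left, top, left + crop_w, top + crop_h))
    return boxes


# Five fixed crops per image multiplies the dataset by five.
boxes = random_crop_boxes(1536, 1024, 1024, 1024, n=5)

# For comparison, cropping on the fly draws a fresh crop every time an image
# is shown: 200 epochs x 2 repeats = 400 crops per image instead of 5.
on_the_fly_crops = 200 * 2
```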

hollow spruce Nov 13, 2023, 12:35 AM

#

tall condor so you lose about 20% speed with multiple gpus because of the synchronization

which is fine.... might try it x_x then I can finally run vit-big-g without relying on hacks

tall condor Nov 13, 2023, 12:36 AM

#

for me random crop together with not downscaling buckets really did a lot for the quality

#

also flip augmentation

#

with those the models tend to overfit much later

#

and the resulting model produced way fewer closeups

#

its like 3x slower tho

stiff dust Nov 13, 2023, 12:38 AM

#

hm, I don't like flip augmentation.

#

regarding cropping: this problem does not exist anymore in sdxl

tall condor Nov 13, 2023, 12:38 AM

#

flip augmentation actually does not "flip". it only flips by a few percent

stiff dust Nov 13, 2023, 12:38 AM

#

I know, the thing is

#

faces are not symmetric

tall condor Nov 13, 2023, 12:39 AM

#

maybe i need to try that

stiff dust Nov 13, 2023, 12:39 AM

#

if the subject has certain facial features on the left side I don't want to flip them

tall condor Nov 13, 2023, 12:39 AM

#

i do have my issues with the faces

#

maybe removing the flip augmentation improves that

#

need to try

#

@stiff dust so you have not tried dreambooth on sdxl yet?

hollow spruce Nov 13, 2023, 12:44 AM

#

tall condor flip augmentation actually does not "flip". it only flips by a few percent

wait. so it isnt just a 100% horizontal flip?

tall condor Nov 13, 2023, 12:44 AM

#

no

hollow spruce Nov 13, 2023, 12:44 AM

#

never questioned that one due to the wording x_x

tall condor Nov 13, 2023, 12:45 AM

#

actually yea

#

but from my experience its just a vertical clip

#

never see any artifacts from horizontal flip

digital dune Nov 13, 2023, 12:49 AM

#

welp this is also why I can usually get nowhere in finding optimal parameters

#

everyone has their own opinion on what to do, what not to do

#

just when I thought I found the perfect params, someone tells me why I shouldnt use them

#

guess it all boils down to trial and error in the end. There just never is a consensus on most things around here.

hollow spruce Nov 13, 2023, 1:00 AM

#

digital dune just when I thought I found the perfect params, someone tells me why I shouldnt ...

settings work along context.
• dataset size matters
• how your lora is intended to be used. what is your target success rate? (like out of 10 images generated, how many out of those need to be good enough to share with others, for you to consider it a success)
• what you're training (style/face of a person/clothing/a pose/general composition/etc...)
• how much vram you have (since above 16, under 16gb, under 12, under 10 you need entirely different workflows to make it work on your system)
• how you do your tagging
• how small are the details that you need to capture/making sure your dataset doesn't have noise, since you dont want to train fine details and accidentally make a "youtube compression noise" lora by accident
• does your training include concepts that aren't known by sdxl (like nudity, facial accessories, new clothing articles, proprietary or new subjects that dont exist in public datasets)
• are you training eyes? (yeah. training eyes messes up more than you'd think. its a long conversation)

tall condor Nov 13, 2023, 1:00 AM

#

yes its alot like that

hollow spruce Nov 13, 2023, 1:04 AM

#

digital dune everyone has their own opinion on what to do, what not to do

only tip I can give, is don't go over rank 32, unless you know what happens at higher ranks, since your mistakes get more hidden at higher ranks, but dont actually disappear, and will lead you down endless rabbit holes that feel like you're just one setting away from fixing them

oblique adder Nov 13, 2023, 3:41 AM

#

how can i make it so adetailer doesnt entirely change my render? i only want the eyes to be better but i dont know what setting makes it so different

#

left is adetailer, right is how i want it to be

digital dune Nov 13, 2023, 4:36 AM

#

oblique adder how can i make it so adetailer doesnt entirely change my render? i only want the...

dont use adetailer

#

its way hyped up

#

nothing beats good ol cnet and inpainting

oblique adder Nov 13, 2023, 4:42 AM

#

can i make the eyes render better that way?

raw dirge Nov 13, 2023, 4:43 AM

#

no the eyes shape depends on the model

#

only way to change eye shape is either use another model or use an artist style lora

oblique adder Nov 13, 2023, 4:47 AM

#

its less about the shape but rather about the clearness (i guess?), i want it to look like this render

raw dirge Nov 13, 2023, 4:48 AM

#

it depends on the resolution and model some models add a lot of details but theres also models that are very flat and clean without much details

digital dune Nov 13, 2023, 4:48 AM

#

have you tried inpainting?

raw dirge Nov 13, 2023, 4:48 AM

#

you could also use a lora style for that

oblique adder Nov 13, 2023, 4:50 AM

#

digital dune have you tried inpainting?

like the inpainting under adetailer? ive seen it but dont know how to use it

digital dune Nov 13, 2023, 4:50 AM

#

nah dont use adetailer g

#

like I said, its unnecessary

oblique adder Nov 13, 2023, 4:50 AM

#

raw dirge you could also use a lora style for that

that'd be nice, do you by chance know any loras of that kind?

#

oh, i just saw inpainting under adetailer assumed it was that one

#

or at least, when i open the tab inpainting appears

raw dirge Nov 13, 2023, 4:55 AM

#

on civitai search for style and filter by lora/lycoris to see all of them theres a lot for 1.5 (more than 1k)

oblique adder Nov 13, 2023, 4:58 AM

#

okay, thanks i'll give it a go

normal ember Nov 14, 2023, 2:17 PM

#

tall condor and now 4

What does your rig look like to be able to fit 4x 4090s? I'm thinking about getting a rig at work.

#

I've looked at specs at vast.ai for 4x 4090s. But interested in your setup.

#

I guess you would need a large PSU or even multiple.

digital dune Nov 14, 2023, 5:08 PM

#

How exactly does kohya decide how many reg images to use per subject image? Is it the total subject images*repetitions divided by total reg images*repetitions? Im getting an issue with my images becoming a little too toasty during training and I'm ready to blame my reg images but can't narrow down what exactly kohya is doing with my regs.

#

This is a problem that I've had since the dawn of time and I remember being able to solve it by simply adding more reg images per subject image. Otherwise, I have to lower the CFG of my gens all the way down to 2 to avoid them looking overcooked af.

#

This wasn't a problem in A1111 dreambooth because you can explicitly say how many regs*subject you want but I cant go back to that since it has too many bugs. Halp plz.

latent charm Nov 15, 2023, 2:14 AM

#

How does the reg dataset work in fine tuning?

digital dune Nov 15, 2023, 4:34 PM

#

latent charm How does the reg dataset work in fine tuning?

the reg images will influence the output images of the whole model

#

as well as your class images ofc

#

when it's only a small influence, it's perfect for when you have super-detailed images on your reg images so they blend with your class images, thus your output images are super highly detailed

#

but if the influence is too much, the outputs are overfitted to the reg images and everyone starts looking like wrinkly grandmas. Finding the sweetspot is hard and I'm finally convinced it's impossible with current technology to train a model with more than 10 subjects that is decent.

hollow spruce Nov 15, 2023, 10:03 PM

#

digital dune How exactly does kohya decide how many reg images to use per subject image? Is i...

1:1 by default.
1:n where n is repetitions. its basically the only good reason to use repetitions in your datasets

#

any additional reg images in the folder just stay unused

#

in case of not enough reg images, it starts to repeat them until that equation checks out
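
The rule described above can be written down as a small calculation. This is a sketch of the behaviour as stated in the chat (1:1 matching against subject images times repeats), not a reading of kohya's actual code; the function name and the returned keys are made up.

```python
def reg_usage(num_subject_images, repeats, num_reg_images):
    """Model the 1:n reg matching described in the chat."""
    if num_reg_images <= 0:
        raise ValueError("need at least one reg image")
    # One reg slot per subject image per repeat.
    slots = num_subject_images * repeats
    unique_used = min(slots, num_reg_images)
    unused = max(0, num_reg_images - slots)
    # If there are too few regs, the set is cycled until every slot is filled.
    passes = -(-slots // num_reg_images)  # ceiling division
    return {
        "slots": slots,
        "unique_used": unique_used,
        "unused": unused,
        "passes": passes,
    }


# 20 subject images with 10 repeats against an 11,000 image reg folder:
# only 200 regs are ever touched, the other 10,800 stay unused.
big_folder = reg_usage(20, 10, 11000)

# The same subject set against only 50 regs: each reg is seen 4 times.
small_folder = reg_usage(20, 10, 50)
```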

digital dune Nov 15, 2023, 10:10 PM

#

hollow spruce 1:1 by default. 1:n where n is repetitions. its basically the only good reason t...

you don't use the technique of matching datasets when training multiple subjects who have different class images?

#

#

So if I'm training 4 subjects, one has 400 class images, the other 200, the other 50, 25. Wouldn't you train them as 1_A person, 2_B person, 8_C person and 16_D person?

hollow spruce Nov 15, 2023, 10:13 PM

#

digital dune So if I'm training 4 subjects, one has 400 class images, the other 200, the othe...

in 1.5, absolutely. in sdxl, it gets so much more complicated x_x
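
The repeat balancing from the question above is just a ratio against the largest subject. A minimal sketch, assuming the `<repeats>_<name>` folder naming used in the example; the function name is made up.

```python
def balanced_repeats(image_counts):
    """Pick repeats so every subject contributes roughly the same number of samples."""
    largest = max(image_counts.values())
    return {
        name: max(1, round(largest / count))
        for name, count in image_counts.items()
    }


counts = {"A person": 400, "B person": 200, "C person": 50, "D person": 25}
repeats = balanced_repeats(counts)

# Folder names in the "<repeats>_<name>" convention from the chat.
folders = [f"{rep}_{name}" for name, rep in repeats.items()]

# Every subject now contributes 400 samples per epoch.
samples_per_epoch = {name: counts[name] * repeats[name] for name in counts}
```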

digital dune Nov 15, 2023, 10:14 PM

#

yeah I aint touching xl in a year or two

#

but would that mean that one person gets 1 reggie, the other 2 reggies, the other 8 reggies etc. so they all get an uneven number of reggies?

#

I just want a solid 20 per subject goddangit. Why does kohya make things hard.

hollow spruce Nov 15, 2023, 10:15 PM

#

rule of thumb is treat everything with a tenfold amount of images. so when you go for sdxl, just remember to up your dataset by 10, to achieve a 1.5 equivalent result. (but obviously at the higher 1024px)

#

and once you get into 4k image territory, then standard rules of finetuning change. for the better even, but its a mess, since it depends on so many factors

hollow spruce Nov 15, 2023, 10:16 PM

#

digital dune but would that mean that one person gets 1 reggie, the other 2 reggies, the othe...

oh. I see what you mean. I do manual tagging for my reg images, so I dont have that issue. it just literally increases the quality of the resulting model

#

1 word/folder class type of reg images always had their own issues, so I stopped doing that :/

#

it basically messed up the text encoder, and only changed how the model was overfit, instead of protecting it from overfitting

#

not an issue if you're doing like a pose for women, or a face expression for men - since you're not gonna generate cats/dogs/cars etc., but it does matter if you're trying to do more general type of lora/full finetune without damaging the core sd model

digital dune Nov 15, 2023, 10:24 PM

#

hollow spruce oh. I see what you mean. I do manual tagging for my reg images, so I dont have t...

wait, manual tagging?

#

you mean the reg image filename? or you use captions for the reggie?

#

I have 11,000 reg images in my folder so that GPT autotagger better come out soon 💀

#

Thankfully this is only an issue when training multi-subject models, so I'll have to bite the bullet and train a lot of small models and just buy a new ssd I guess. Really curious to see if tagging all those reggies would work 🤔

hollow spruce Nov 15, 2023, 10:31 PM

#

digital dune I have 11,000 reg images in my folder so that GPT autotagger better come out soo...

it is already T.T but you're still limited by the normal gpt limits. unless you go api, which will probably run you dry

hollow spruce Nov 15, 2023, 10:31 PM

#

digital dune you mean the reg image filename? or you use captions for the reggie?

yep. I'

#

I'm currently at 6k manually edited images + manual tags 🤣

digital dune Nov 15, 2023, 10:32 PM

#

The manually edited images I believe, the tags for reggies I do not

#

And I thought I was no-lifing this shit 💀

#

I mean honestly, a generic reggie dreambooth with a tagged lora is already an insanely accurate combo

hollow spruce Nov 15, 2023, 10:34 PM

#

here's my woman subset 🤣

digital dune Nov 15, 2023, 10:35 PM

#

do you use boorudatasettagmanager or do you recommend something else?

hollow spruce Nov 15, 2023, 10:35 PM

#

assuming you're mad enough, then hydrus network is not only the best option, but also the only. Which makes me sad. Cause hydrus is terrible. But sorta required T.T

digital dune Nov 15, 2023, 10:35 PM

#

I've decided that the optimal path is untagged dreambooth + tagged lora. Anything else would only be good for huge multi-subject training but I sure as hell am not gonna test that theory just to have it blow up in my face with all that time tagging reggs

digital dune Nov 15, 2023, 10:36 PM

#

hollow spruce assuming you're mad enough, then hydrus network is not only the best option, but...

For manual tagging?

hollow spruce Nov 15, 2023, 10:36 PM

#

digital dune For manual tagging?

yep. though its also the no. 1 app for scraping boorus

#

in fact I think that's how it came to be

#

the manual tagging was an afterthought XD

hollow spruce Nov 15, 2023, 10:38 PM

#

digital dune I'm decided that the optimal path is untagged dreambooth + tagged lora. Anything...

there's always the option of just using them as proper training images, instead of reg images.

#

but at that point, might as well full finetune

digital dune Nov 15, 2023, 10:38 PM

#

Do you by any chance have a model that was trained with tagged reggies vs. untagged reggies and all other parameters the same that you can share on civitai? Even if it's weeb stuff I really only want to see what kind of effect tagging has on the reggies which up until today I have never heard of

hollow spruce Nov 15, 2023, 10:39 PM

#

digital dune Do you by any chance have a model that was trained with tagged reggies vs. untag...

only for sdxl - which is way way way overkill for 1.5

digital dune Nov 15, 2023, 10:39 PM

#

I wanna see

#

so you think it won't make a dif tagging them on 1.5 anyways right

hollow spruce Nov 15, 2023, 10:39 PM

#

digital dune so you think it won't make a dif tagging them on 1.5 anyways right

not negatively - but also not needed

#

like by far

#

you can get away with so much more in 1.5

digital dune Nov 15, 2023, 10:40 PM

#

my plan is to master all my subjects in 1.5 training, then if I wanna move them to XL, just use the best gens from 1.5 directly upscaled to 1024 and train those

hollow spruce Nov 15, 2023, 10:40 PM

#

I'd recommend you to use vit-big-g to autotag all your reg images with captions only

digital dune Nov 15, 2023, 10:40 PM

#

so that's also why I plan on waiting out XL

hollow spruce Nov 15, 2023, 10:40 PM

#

that way you get essentially the same result - but its 100% automated

digital dune Nov 15, 2023, 10:40 PM

#

Ty for the tip. I never really used anything besides boorutag

hollow spruce Nov 15, 2023, 10:42 PM

#

in webui, install Interrogator extension
then use the batch option with Vit-bigG enabled and set to caption to autotag all your reg images

#

it will give you natural language prompts that are about 80~90% accurate for all your images

#

warning, Vit-bigG requires a 3090/4090. not sure how much lower you can go

#

If you are low on resources, you can use Vit-H to caption them instead

#

not nearly as good - but good enough since its just for the reg images

digital dune Nov 15, 2023, 10:45 PM

#

hollow spruce warning, Vit-bigG requires a 3090/4090. not sure how much lower you can go

I dont speak the language of people with short vram. chad

#

seriously, thank you so much for the tip. I'm excited at the prospect of this leveling up my model game.

hollow spruce Nov 15, 2023, 10:46 PM

#

digital dune seriously, thank you so much for the tip. I'm excited at the prospect of this le...

feel free to @ me if you ever have questions ^^

harsh shoal Nov 17, 2023, 1:42 AM

#

I am taking a dress (on a mannequin) and creating an image of a model wearing it. I highlighted the details in this post https://www.reddit.com/r/StableDiffusion/comments/17x13pf/update_1_clothing_on_model/
Progress so far: https://imgur.com/a/ilZdU2D
Does anyone have ideas what I can do to improve image generation?


slow garden Nov 17, 2023, 7:03 PM

#

Hi, I'm not sure where is best to ask this, so I will try here.

My grandparents passed away this year and I would like to create a special gift for my mother. I have scanned old photos of them and I would like to try and upscale / restore them- but I'm unsure on what the best way to do it is. I tried using upscale with codeformer however it seemed to change the look of them too much that I could tell it wasn't my grandma. I also don't know how to remove the artifacts from the scanning, such as speckles and white dots.

#

If anyone could help point me in the right direction for me that would be very helpful. I'm hoping I can get it all done for Christmas

stone garden Nov 17, 2023, 8:00 PM

#

slow garden If anyone could help point me in the right direction for me that would be very h...

Hey, if the amount of photos isn't too much, I own a copy of https://www.topazlabs.com/topaz-photo-ai

I can offer to restore them for you, free of cost.

hollow spruce Nov 17, 2023, 10:46 PM

#

slow garden Hi, I'm not sure where is best to ask this, so I will try here. My grandparents...

yeah. agreed. topaz ai is your best bet for automatic, photoshop + photoshop knowledge is your best bet for manual.
Stable Diffusion, while powerful, is random. You won't be able to get the face details right by doing only SD upscaling

supple pollen Nov 19, 2023, 7:11 PM

#

hello! Im new to SD and im having trouble getting the juggernautXL model to work. I just get images like this, does anyone know a fix?

#

heres the image details beautiful lady, (freckles), big smile, green eyes, long hair, dark makeup, hyperdetailed photography, soft light, head and shoulders portrait, cover, random hair color, supermodel face lora:add_detail:1
Negative prompt: badhandv4, (worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur:1.3), (3D ,3D Game, 3D Game Scene, 3D Character:1.1), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)
Steps: 40, Sampler: DPM++ 3M SDE Exponential, CFG scale: 5, Seed: 3685031502, Size: 512x512, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, VAE hash: c6a580b13a, VAE: vae-ft-mse-840000-ema-pruned.ckpt, Denoising strength: 0.7, Token merging ratio: 0.2, Token merging ratio hr: 0.2, Hires upscale: 2, Hires steps: 15, Hires upscaler: Latent, Lora hashes: "add_detail: 7c6bad76eb54", Version: v1.6.0-2-g4afaaf8a

latent charm Nov 19, 2023, 7:12 PM

#

don't use 1.5 vae on sdxl model

supple pollen Nov 19, 2023, 7:12 PM

#

i dont understand lol

#

explain to me like im a small child

#

i think i understand now

latent charm Nov 19, 2023, 7:18 PM

#

Also, you might use 1024x1024 to get a better result rather than 512x512

supple pollen Nov 19, 2023, 7:18 PM

#

i turned off vae and still the same

#

i do hires upscale to 1024

latent charm Nov 19, 2023, 7:21 PM

#

SDXL doesn't use the same VAE that SD 1.5 uses. That is why your image gets the weird effect. SDXL is designed to generate images at 1024x1024 without hires fix.
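
The two problems named above (a 1.5 VAE on an SDXL model, and a 512x512 base size) can be spotted from the generation details line. A rough sketch, assuming the A1111-style `Key: value, ...` text pasted earlier; the "xl in the model name" and "VAE name starts with vae-ft-" checks are heuristics invented for this example, not something the webui does.

```python
import re


def check_sdxl_settings(infotext):
    """Return a list of warnings for an A1111-style 'Key: value, ...' line."""

    def field(name):
        # Match ", Model: foo" but not ", Model hash: foo".
        match = re.search(rf"(?:^|, ){re.escape(name)}: ([^,]+)", infotext)
        return match.group(1).strip() if match else None

    model = field("Model") or ""
    vae = field("VAE") or ""
    size = field("Size") or ""
    warnings = []

    # Assumption: SDXL checkpoints carry "xl" somewhere in their file name.
    if "xl" not in model.lower():
        return warnings

    # Assumption: the common SD 1.5 VAE files are named "vae-ft-...".
    if vae.lower().startswith("vae-ft-"):
        warnings.append("SD 1.5 VAE selected for an SDXL model")

    match = re.fullmatch(r"(\d+)x(\d+)", size)
    if match and max(int(match.group(1)), int(match.group(2))) < 1024:
        warnings.append("base size is below 1024, SDXL expects about 1024x1024")
    return warnings


details = (
    "Steps: 40, Sampler: DPM++ 3M SDE Exponential, CFG scale: 5, "
    "Size: 512x512, Model hash: 1fe6c7ec54, "
    "Model: juggernautXL_version6Rundiffusion, VAE hash: c6a580b13a, "
    "VAE: vae-ft-mse-840000-ema-pruned.ckpt"
)
problems = check_sdxl_settings(details)
```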

supple pollen Nov 19, 2023, 7:21 PM

#

it worked!

#

new result

#

thankyou for the help

tame otter Nov 19, 2023, 9:21 PM

#

kohya got a bunch of updates and is using a newer bitsandbytes now. so full bf16 training? whats that about?

digital dune Nov 19, 2023, 11:28 PM

#

hollow spruce yeah. agreed. topaz ai is your best bet for automatic, photoshop + photoshop kno...

yo caith

#

I tried ur autotagger and for some reason it tagged all my women as either "araf, araffed or arafed" and as far as I can search that isn't a real word

#

did it do that to yours? Should I replace that word?
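
For context, "arafed" and its variants are a widely reported quirk of BLIP-style captioners and not real words. If one decided to strip them, a cleanup pass over the captions could look like the sketch below; the function name is made up and whether removal is the right call is left open here.

```python
import re

# The three spellings mentioned in the chat, matched as whole words only.
_ARTIFACT = re.compile(r"\b(?:arafed|araffed|araf)\b\s*", re.IGNORECASE)


def clean_caption(caption):
    """Remove the nonsense 'araf*' tokens and tidy up leftover whitespace."""
    cleaned = _ARTIFACT.sub("", caption)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


example = clean_caption("arafed woman with long hair standing in a field")
```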

hearty bane Nov 20, 2023, 5:29 PM

#

I was watching my picture generate. I had that first image and I was like wow, what a nice city, it'll be perfect... I'll take a screenshot of that. But two seconds later I see this: a lower quality image without what I prompted in the background. So my question, since I've noticed this several times: why are my pictures sometimes better when they are not finished?

errant nest Nov 20, 2023, 8:21 PM

#

hearty bane I was watching my picture generating, i had that first timage: i was like wow wh...

hiresfix? if so it's probably the denoise strength

#

standard is 0.7 i think but i usually use 0.35

hearty bane Nov 20, 2023, 8:42 PM

#

Idk i never really changed it

errant nest Nov 20, 2023, 8:46 PM

#

if its at 0.7 it changes the look a lot

tame otter Nov 21, 2023, 12:14 AM

#

anyone know of a tool to edit safetensor metadata?

tame otter Nov 21, 2023, 12:29 AM

#

i found mention of one coming soon in the kohya-ss gui github page, but no sign of it
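
Until such a tool shows up, the metadata can be edited by hand, since a safetensors file is an 8-byte little-endian header length, a JSON header, and then the raw tensor bytes. The optional `__metadata__` entry in that header is a string-to-string map. A minimal sketch using only the standard library; the function names are made up, and it writes to a new file instead of editing in place.

```python
import json
import shutil
import struct


def read_metadata(path):
    """Return the __metadata__ dict of a safetensors file (empty if absent)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__") or {}


def update_metadata(src, dst, updates):
    """Copy src to dst with extra metadata entries. dst must differ from src."""
    with open(src, "rb") as fin:
        (header_len,) = struct.unpack("<Q", fin.read(8))
        header = json.loads(fin.read(header_len).decode("utf-8"))

        meta = dict(header.get("__metadata__") or {})
        # The format only allows string keys and string values here.
        meta.update({str(k): str(v) for k, v in updates.items()})
        header["__metadata__"] = meta

        raw = json.dumps(header, separators=(",", ":")).encode("utf-8")
        raw += b" " * (-len(raw) % 8)  # pad with spaces to an 8-byte boundary

        with open(dst, "wb") as fout:
            fout.write(struct.pack("<Q", len(raw)))
            fout.write(raw)
            # Tensor offsets are relative to the end of the header,
            # so the tensor bytes can be copied through unchanged.
            shutil.copyfileobj(fin, fout)


# Usage (paths are placeholders):
# update_metadata("my_lora.safetensors", "my_lora_tagged.safetensors",
#                 {"note": "trained on 30 images"})
```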

worthy hazel Nov 21, 2023, 5:33 AM

#

#🔧｜finetune just want to train the local sd to get this kind of result in img2img, can anyone guide me /??

digital dune Nov 22, 2023, 2:38 AM

#

worthy hazel #🔧｜finetune just want to train the local sd to get this kind of resul...

use an animated model + lora

#

or if u got the vram cojones for it, train a dreambooth on it instead

dusky urchin Nov 22, 2023, 4:35 AM

#

is it possible to train a lora on top of SDXL that

trains on a high quality dataset of captioned images
that have a mix of new terms, like more than one distinct character
and correctly generalizes on new captions that contain those characters

let's say my dataset contains pictures of two kinds of pokemon that CLIP has never seen before, Pokemon_X and Pokemon_Y. let's say they are never pictured together in this dataset. in order to correctly generate imagery for a photo of Pokemon_X and Pokemon_Y together which requires generalization, is it sufficient to (1) properly tokenize the captions to use an uninitialized pre-existing token / a new token (2) train the text encoder simultaneously with the unet? i am a sophisticated user.

worthy hazel Nov 22, 2023, 4:44 AM

#

digital dune use an animated model + lora

i have placed the model and tried to get result but not working, can you tell me how to set lora?

stiff dust Nov 22, 2023, 8:37 AM

#

dusky urchin is it possible to train a lora on top of SDXL that - trains on a high quality da...

I would say, yes, but I haven't tried it myself. However, SDXL tends to mix up concepts if you prompt for more than one subject. So quite often it's still better to use regional loras and regional prompts

#

actually, I tried it with two separate Loras and it works. So if you have a Lora on subject A and a lora on subject B and you use both of them together you can show them in one image (e.g. shaking hands).
However, it did not work equally well for all my loras. For some Loras it starts mixing up concepts, for others it could separate the subjects well

#

but if it works with separated loras, it should work even better within a single lora

sonic narwhal Nov 22, 2023, 10:08 AM

#

Anyone got a python script for using GPT Vision API for captioning?

#

That takes all images in input folder, creates txt for them and puts generated description in correct txt

opal jacinth Nov 22, 2023, 10:44 AM

#

sonic narwhal Anyone got a python script for using GPT Vision API for captioning?

https://github.com/psdwizzard/GPTVisionTrainer

This Python project is designed to prepare training data for Stable Diffusion models by generating detailed descriptions of images using OpenAI's GPT Vision API.

#

Didn't try it out yet, but saved it for later

sonic narwhal Nov 22, 2023, 10:45 AM

#

thank you. Ill test it today

sullen locust Nov 22, 2023, 1:45 PM

#

Hello sd users , i am getting this error while generating images

sonic narwhal Nov 22, 2023, 1:49 PM

#

opal jacinth https://github.com/psdwizzard/GPTVisionTrainer

It gives very good captions

opal jacinth Nov 22, 2023, 2:01 PM

#

sonic narwhal It gives very good captions

nice, great to hear

opal jacinth Nov 22, 2023, 2:01 PM

#

sonic narwhal It gives very good captions

you used it as it is?

opal jacinth Nov 22, 2023, 2:02 PM

#

sullen locust Hello sd users , i am getting this error while generating images

you might want to ask that in #🤝｜tech-support

sonic narwhal Nov 22, 2023, 2:05 PM

#

opal jacinth you used it as it is?

changed prompt slightly

#

"can you give me a description of this image content. Separate each sentence with commas and dont use dots"

normal ember Nov 22, 2023, 2:29 PM

#

Only issue with the script is that it will not handle rate limiting or any other error for that matter. But it can be improved.
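
One way the missing error handling could look: walk the folder, write one .txt per image, and retry with exponential backoff when the captioner fails. This is a sketch and not code from the linked repo. `caption_fn` is a hypothetical stand-in for whatever function calls the vision API, so no real API call is shown here.

```python
import time
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def caption_folder(folder, caption_fn, max_retries=5, base_delay=1.0,
                   sleep=time.sleep):
    """Write <image>.txt next to every image, retrying failed caption calls."""
    written = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        txt_path = image.with_suffix(".txt")
        if txt_path.exists():
            continue  # already captioned, so an interrupted run can resume

        for attempt in range(max_retries):
            try:
                caption = caption_fn(image)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                # Wait 1s, 2s, 4s, ... before trying again.
                sleep(base_delay * 2 ** attempt)

        txt_path.write_text(caption.strip() + "\n", encoding="utf-8")
        written.append(txt_path)
    return written
```

Skipping images that already have a caption file also keeps API costs down when a long run is restarted.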

digital dune Nov 22, 2023, 7:06 PM

#

why. are. my. fucking. prodigy. loras. so. fucking. random

#

kohya can suck a phat dick I aint even bullshittin

#

sometimes theyre overtrained at 200 steps, sometimes it takes 5000 steps, sometimes they won't train properly at all

#

do yall recommend shuffling tags and turning on weighted captions?

#

nvm I figured it out. I had some extreme closeups in the dataset that were tripping the lora up despite me tagging them with double weights that these are extreme closeups.

#

what does appear to be random is how many reps to do in the folder. Some train beautifully with 01, some only with 20+ despite being the same number of steps. I can't find any rhyme or reason to this other than trial and error

tall condor Nov 22, 2023, 11:11 PM

#

hey guys, do you recommend windows or linux for kohyass and Finetuning?

#

im setting up a new device and im wondering what the advantages of either are

tall condor Nov 22, 2023, 11:47 PM

#

digital dune why. are. my. fucking. prodigy. loras. so. fucking. random

dont turn on weighted captions unless you actually have weighted captions

#

also i recommend random crop as well as adding a noise offset of at least 0.03-0.05

#

regarding repeats i recommend balancing out the model with no more than 15 reps. i usually do no less than 3 and no more than 10 to 15

#

epochs really do the trick for me

#

also note that some concepts are much harder to learn than others. this depends a lot on the base model you train on. if the base model already knows the concept and you are finetuning it, it takes way fewer steps than if it is something the base model has never seen or heard of
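
The trade-off between repeats and epochs discussed above comes down to simple arithmetic. A minimal sketch, assuming the usual rule of thumb that steps per epoch are images times repeats divided by batch size; bucketing can shift the exact number, and the function name is made up.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size=1):
    """Approximate optimizer steps for a folder with the given repeats."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Same step count, very different schedules:
few_repeats = total_steps(num_images=30, repeats=1, epochs=100)   # many short epochs
many_repeats = total_steps(num_images=30, repeats=20, epochs=5)   # few long epochs
```

Both runs land on the same step count, yet per-epoch behaviour such as checkpoint saves happens 100 times in the first and 5 times in the second, so the two are not interchangeable even when the totals match.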

digital dune Nov 23, 2023, 2:44 AM

#

tall condor also note that some concepts are much harder to learn than others.this depends a...

oh shit that's a good observation

#

do you weigh your captions using (()) or :x.x?

#

my autotagger uses escape slashes before the parentheses and I'm wondering if that's affecting anything

tall condor Nov 23, 2023, 9:19 PM

#

i think the format needs to be correct
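
On the escape-slash question: in A1111-style prompt syntax a backslash before a parenthesis makes it a literal character instead of a weight marker. A small check like the one below can tell whether a caption file contains any real weighting at all. It assumes the trainer parses weights the same way, which is not confirmed in this thread, and the function name is made up.

```python
import re

# A parenthesis that is NOT preceded by a backslash.
_UNESCAPED_PAREN = re.compile(r"(?<!\\)[()]")


def has_weight_syntax(caption):
    """True if the caption contains unescaped parentheses, i.e. real weights."""
    return bool(_UNESCAPED_PAREN.search(caption))


escaped = has_weight_syntax(r"hatsune miku \(vocaloid\), 1girl")
weighted = has_weight_syntax("(extreme closeup:1.5), woman")
```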

#

anyone know if for finetuning sdxl in kohya ss i still need to use the sdxl branch or can i use the master branch by now?

normal ember Nov 23, 2023, 9:46 PM

#

it's been merged into master yes

dusky urchin Nov 23, 2023, 9:51 PM

#

stiff dust actually, I tried it with two separate Loras and it works. So if you have a Lora...

That has been my experience as well. We are having some success with one Lora multiple characters so far too

tall condor Nov 23, 2023, 9:56 PM

#

if you want to have multiple characters in one lora you probably have to train it as one lora and not as 2

tall condor Nov 23, 2023, 10:13 PM

#

anyone using kohyass with multiple gpus on windows?

dusky urchin Nov 23, 2023, 11:07 PM

#

tall condor anyone using kohyass with multiple gpus on windows?

yes, in order to increase the batch size

#

however i do not use the scripts directly, i use the code as a package and operate it directly from a proper python package

#

we authored types and documented all the args

#

however it apparently hasn't been uploaded yet

tall condor Nov 24, 2023, 1:23 AM

#

anyone using kohya ss gui with windows and multiple gpus? for some reason windows only uses the first gpu

#

however on linux it uses them all

vernal dock Nov 24, 2023, 3:30 PM

#

Hello! Does anyone offer freelance services to help train models?

slate marsh Nov 24, 2023, 7:07 PM

#

I've been trying to train an embedding recently, with textual inversion, both with a1111 and kohya. In both cases it failed. No idea why. I get an embedding at the end, but it doesn't look like the subject, especially after I switch from sd-v1-5 model to a custom model.
anyone can help me? (actual good links that explain it properly, for example)

digital dune Nov 24, 2023, 7:54 PM

#

vernal dock Hello! Anyone offers freelance services to help train models?

mememememe

#

as long as it fits in a single 3090

digital dune Nov 24, 2023, 8:03 PM

#

slate marsh I've been trying to train an embedding recently, with textual inversion, both wi...

try a lora. TIs are outdated

tall condor Nov 24, 2023, 9:11 PM

#

i solved the issue with multi gpu on windows, however i had to change the backend from nccl to gloo - does anyone know if there are any drawbacks to that?
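For context, PyTorch's Windows builds do not ship the NCCL backend, which is why a fallback is needed there. The usual trade-off is that gloo is mainly tuned for CPU tensors, so GPU collectives tend to be slower with it. A minimal sketch of the choice, assuming the result is what you would pass as `backend=` to `torch.distributed.init_process_group`; the helper name `pick_backend` is made up for illustration and is not part of kohya or accelerate:

```python
import sys


def pick_backend(platform: str = sys.platform, cuda_available: bool = True) -> str:
    """Return a torch.distributed backend name for the given platform."""
    # NCCL is not available in the Windows builds of PyTorch, and it needs CUDA,
    # so fall back to gloo in both of those cases.
    if platform.startswith("win") or not cuda_available:
        return "gloo"
    return "nccl"


if __name__ == "__main__":
    print(pick_backend())
```

How the backend is actually selected inside the kohya GUI or accelerate config is not shown here; this only illustrates the decision itself.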

restive bridge Nov 25, 2023, 2:14 AM

#

anyone get OOM trying to extract a Lora from an XL Dreambooth? seems others are doing it fine but I can't, even with 48gb ram and 24gb vram

stone garden Nov 25, 2023, 6:09 AM

#

Are there any completely free online ways to train models? Google colab is now paid 😦

stone garden Nov 25, 2023, 5:19 PM

#

Alternatively, how can I train custom models with a 30 to 100 img dataset on comfyui?

latent charm Nov 25, 2023, 6:16 PM

#

You don't train on comfyui

stone garden Nov 25, 2023, 8:25 PM

#

oh ok

#

Is it possible to use dreambooth on 6gb vram and about 100 training images?

digital dune Nov 25, 2023, 9:37 PM

#

stone garden Is it possible to use dreambooth on 6gb vram and about 100 training images?

6gb is pushing it too much

#

I'm gonna say no

#

You don't train on comfy, you train on either kohya or old school A1111

#

You can train a lora though, should be good enough for most purposes.

#

If you do want a dreambooth though, I can train one for you, no charge. This is one of the rare cases where I'm intentionally backing off of training just because I've been experimenting so much it's making my head spin.

stone garden Nov 25, 2023, 10:13 PM

#

digital dune You can train a lora though, should be good enough for most purposes.

How would i train a lora in a1111? If it helps, I'm trying to train it on unrealistic cartoony images. I can send an example if you want.

digital dune Nov 25, 2023, 10:54 PM

#

stone garden How would i train a lora in a1111? If it helps, I'm trying to train it on unreal...

lora is only for kohya

#

I would recommend training that on an animated model of your choice

#

this is the current training file I use and it's pretty neato mosquitarooni so far

#

📎 Super_Prodigy_Settings.json

#

just replace the folders with the ones with ur images and I suggest anywhere from 10-60 repetitions on the subject folder

haughty burrow Nov 26, 2023, 12:25 AM

#

Hi, my Lora training process stopped at epoch 10/10 98% because the Google Colab runtime stopped, what do I need to do with this existing epoch so that it becomes the final Lora?

digital dune Nov 26, 2023, 9:33 PM

#

haughty burrow Hi, my Lora training process stopped at epoch 10/10 98% because the Google Colab r...

just use the last saved one as the lora

haughty burrow Nov 27, 2023, 3:21 AM

#

digital dune just use the last saved one as the lora

ahh ok ok

sonic narwhal Nov 27, 2023, 1:49 PM

#

Why did the lora that I trained on a portrait generate rotated portraits at inference?

#

#

like this

#

None of my training data is rotated

stiff dust Nov 27, 2023, 3:07 PM

#

looks like a bug, lol

#

in SD they always specify images in (height, width) while in basically any graphics tool it is (width, height)

#

thus it might happen that scripts accidentally swap both values

torn dew Nov 27, 2023, 4:01 PM

#

I am just wondering. If I want to train say, an urban techwear style hoodie, but the patterns and designs aren't very varied in the images of it I have with it being worn, is it possible to include photos of different jackets or ones that aren't being worn to get it to take some of those pattern / design variations without it ruining the concept?

old quiver Nov 27, 2023, 4:10 PM

#

sonic narwhal Why did the lora that I trained on a portrait generate rotated portraits at inte...

what did you use to train it, kohya?

#

some trainers might rotate the data for more variety, though it's usually something you need to turn on yourself. kohya shouldn't do that automatically, though another trainer might

sonic narwhal Nov 27, 2023, 5:22 PM

#

I did in kohya

old quiver Nov 27, 2023, 5:35 PM

#

sonic narwhal I did in kohya

what settings did you use

dusky urchin Nov 27, 2023, 6:34 PM

#

is anyone familiar with an approach for LORA training with a text encoder training that

- adds a new token to CLIP, thereby changing its shape, initializing the embedding for that token appropriately
- correctly accounts for the change in CLIP's shape, and correctly does NOT change the interaction between clip and the conditional UNET, since that shape should NOT change
- "somehow" freezes the weights in pretrained clip, and only "trains" the new vocab token as it appears in the dataset as part of the broader LORA training approach
- achieves this training with the same complexity as LORA training, because such a vocab increase "only" increases the rank of CLIP weights "by 1" in "a few places"

the goal is to not rely on community-discovered "rare token" since nobody has done this in a scientifically rigorous way, and i plan to train a lot of new distinct characters / concepts

#

OR is someone aware of a scientifically rigorous discovery of "rare tokens" (meaning in the context of SDXL, "low impact" on the conditional unet when used)

stiff dust Nov 27, 2023, 7:50 PM

#

what you say is basically "textual inversion"

#

however, I assume you want something different, like training the complete text encoder on a new token without changing it for other tokens

hot breach Nov 28, 2023, 11:33 PM

#

sonic narwhal

your training software is not respecting EXIF orientation. this is common if you are using photos off your phone. Phones commonly record images in the same orientation as the CCD on the camera itself, then just put a tag in the EXIF info to tell the applications that open it to rotate it on load (based on your phone's inertial sensor, i.e. gravity). by default the PIL package won't fix this on loading and you have to add ImageOps.exif_transpose(my_image) somewhere in the code to fix it
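In Pillow the fix is the function `ImageOps.exif_transpose`, which returns a copy with the stored rotation applied. A minimal sketch, assuming Pillow is installed; `load_upright` and `make_rotated_sample` are illustrative helper names and not part of any trainer:

```python
from io import BytesIO

from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112  # standard EXIF "Orientation" tag id


def load_upright(source) -> Image.Image:
    """Open an image and apply its EXIF orientation to the pixel data."""
    image = Image.open(source)
    # exif_transpose returns a rotated/flipped copy when an orientation tag is set
    return ImageOps.exif_transpose(image)


def make_rotated_sample() -> BytesIO:
    """Build an in-memory 4x2 JPEG whose EXIF tag asks viewers for a quarter turn."""
    image = Image.new("RGB", (4, 2), "white")
    exif = image.getexif()
    exif[ORIENTATION_TAG] = 6  # 6 = stored sideways, needs a quarter turn on display
    buffer = BytesIO()
    image.save(buffer, format="JPEG", exif=exif.tobytes())
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    raw = Image.open(make_rotated_sample())
    print("as stored:", raw.size, "after exif_transpose:", load_upright(make_rotated_sample()).size)
```

A dataset loader that calls something like `load_upright` before resizing or bucketing would train on the same orientation that an image viewer shows.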

hot breach Nov 28, 2023, 11:39 PM

#

dusky urchin is anyone familiar with an approach for LORA training with a text encoder traini...

have you tried simply using the actual names for things? Like first+last name for people or characters. I agree, finding more rare tokens, then having some cheat sheet or lookup table to relate actual names to some rare token is a completely garbage workflow, and also it's never been needed, at least for fine tuning, where you can and should be using natural language and real words/names for things

dusky urchin Nov 29, 2023, 12:06 AM

#

stiff dust what you say is basically "textual inversion"

@stiff dust i am indeed describing textual inversion 🤦‍♂️

dusky urchin Nov 29, 2023, 12:06 AM

#

hot breach have you tried simply using the actual names for things? Like first+last name ...

my colleague and i are right now doing a scientifically reasonable, rigorous search for rare tokens, for a variety of definitions of rare

hot breach Nov 29, 2023, 12:10 AM

#

even if you find a bunch of rare tokens, constantly having a reference sheet for them or having to code around that is just a pain

dusky urchin Nov 29, 2023, 12:10 AM

#

hot breach even if you find a bunch of rare tokens, constantly having a reference sheet for...

yeah but we'd program the pipeline

hot breach Nov 29, 2023, 12:10 AM

#

give the text encoder some credit, it's not a 1:1 translation engine of tokens to embedding outputs

dusky urchin Nov 29, 2023, 12:10 AM

#

like it would be transparent

#

that's true, but it has been our experience that even the community's rare tokens, which were not rigorously discovered, work a lot better for loras with text encoder training enabled

#

i'm excited to share more on the work.

hot breach Nov 29, 2023, 12:12 AM

#

I've never been convinced it is worth seeking rare tokens vs using natural language and letting the text encoder do its job

dusky urchin Nov 29, 2023, 12:12 AM

#

stiff dust however, I assume you want something different, like training the complete text ...

i want the low computational complexity of a LORA, and also the ability to train CLIP together with the conditional unets, instead of "just" CLIP.

hot breach Nov 29, 2023, 12:12 AM

#

though I focus a lot more on fine tuning

dusky urchin Nov 29, 2023, 12:13 AM

#

hot breach though I focus a lot more on fine tuning

most opinions i've seen online are "the lora is your token"

#

but that approach does not tackle character X is holding hands with character Y

#

@swift loom

what you say is basically "textual inversion"

dusky urchin Nov 29, 2023, 12:19 AM

#

stiff dust what you say is basically "textual inversion"

why are there so few community textual inversion models for SDXL?

dusky urchin Nov 29, 2023, 12:24 AM

#

stiff dust what you say is basically "textual inversion"

are you familiar with textual inversion where i choose which tokens to adapt? it looks like it may make random choices

#

is it as simple as, i use the rare token in the image's caption, just like i would with a LORA?

random minnow Nov 29, 2023, 2:58 AM

#

Anyone know the ideal parameters for Lora training at 12gb vram?

digital dune Nov 29, 2023, 3:34 AM

#

is triton good for anything?

stiff dust Nov 29, 2023, 9:57 AM

#

in textual inversion you only train a single token which is newly introduced into the tokenizer (and, thus, is maximally rare ;)). So you don't need rare tokens when you use textual inversion. However, it has several disadvantages (which is why it is not used so often anymore):

- training is much slower than Lora. Computationally, you don't save that much time training only an embedding instead of a lora, so time per step is similar. But you need many more steps to train a textual inversion. In Lora you train many matrices at once, so each step makes bigger changes to the model, while in textual inversion you only change ~2k parameters.
- results are similar to or worse than a TE lora
- most of the downsides of TE loras also happen in textual inversion

You could think that textual inversion is less destructive, as it only changes a single token and not the rest of the caption. But that's not really the case: even if you don't train CLIP, it will propagate your token to the complete caption. During training textual inversion, the embedding is learnt such that it seems to get as much attention as possible. Textual inversions often tend to dominate the caption, such that you lose prompt understanding.

In SD 2.1 I used textual inversion a lot. In SDXL it never worked so well for me and I found TE loras always better (and even they are very tricky). Quite often, only training unet is the better option :/
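The "~2k parameters" figure can be checked with a little arithmetic: SDXL's two text encoders use embedding widths of 768 and 1280, so one new token vector in each adds up to 2048 values. A rough sketch comparing that with a LoRA; the layer list below is a made-up example for scale, not the real SDXL layer inventory:

```python
# Embedding widths of the two SDXL text encoders (CLIP ViT-L and OpenCLIP ViT-bigG).
SDXL_TEXT_ENCODER_DIMS = (768, 1280)


def textual_inversion_params(vectors_per_token: int = 1) -> int:
    """Trainable values when only the new token embeddings are learned."""
    return vectors_per_token * sum(SDXL_TEXT_ENCODER_DIMS)


def lora_params(layer_shapes, rank: int) -> int:
    """A rank-r LoRA adds two matrices per layer: (d_in x r) and (r x d_out)."""
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)


# Hypothetical example: 100 square projection layers of width 1280, rank 16.
example_layers = [(1280, 1280)] * 100

if __name__ == "__main__":
    print("textual inversion:", textual_inversion_params())
    print("LoRA (hypothetical layers):", lora_params(example_layers, rank=16))
```

Even with these invented layer counts the gap is about three orders of magnitude, which fits the point that each LoRA step can move the model much further than a textual inversion step.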

night estuary Nov 29, 2023, 8:38 PM

#

changing setting sd_model_checkpoint to ohwxBusB_step_2000.ckpt [136f107cjn]: AttributeError
Traceback (most recent call last):
File "D:\SD\stable-diffusion-webui\modules\options.py", line 140, in set
option.onchange()
File "D:\SD\stable-diffusion-webui\modules\call_queue.py", line 13, in f
res = func(*args, **kwargs)
File "D:\SD\stable-diffusion-webui\modules\initialize_util.py", line 170, in <lambda>
shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: sd_models.reload_model_weights()), call=False)
File "D:\SD\stable-diffusion-webui\modules\sd_models.py", line 741, in reload_model_weights
state_dict = get_checkpoint_state_dict(checkpoint_info, timer)
File "D:\SD\stable-diffusion-webui\modules\sd_models.py", line 315, in get_checkpoint_state_dict
res = read_state_dict(checkpoint_info.filename)
File "D:\SD\stable-diffusion-webui\modules\sd_models.py", line 301, in read_state_dict
sd = get_state_dict_from_checkpoint(pl_sd)
File "D:\SD\stable-diffusion-webui\modules\sd_models.py", line 245, in get_state_dict_from_checkpoint
pl_sd = pl_sd.pop("state_dict", pl_sd)
AttributeError: 'NoneType' object has no attribute 'pop'

I've trained 4 different models with the fast-dreambooth colab repo, but when I select one in A1111 this error occurs. What's wrong?

digital dune Nov 30, 2023, 1:14 AM

#

that pop error is hella ancient it feels like I havent seen it in years lol

remote storm Nov 30, 2023, 2:16 AM

#

I'm trying to train a model for a face on my own machine(16 GB VRAM). Would anyone recommend a model I can train relatively quickly? Also, what mechanism are people using to train SD models locally these days? Automatic1111?

ancient eagle Nov 30, 2023, 2:34 AM

#

I’m training Loras on sdxl of animals, and in general it’s working great. But for some animals, it randomly ignores their color with many prompts. Even simple prompts. Anyone have any idea how you might address this?

dusky urchin Nov 30, 2023, 4:39 AM

#

stiff dust in textual inversion you **only** train a single token which is newly introduced...

thanks for the detailed explanation. i'll follow up with a comparison of LoRA on very rare token versus textual inversion for various challenges.

#

we're polishing an ETL pipeline to make this reproducible. dagster & dask are pretty great with preprocessing, and our packagized kohya scripts are also really handy

stiff dust Nov 30, 2023, 10:33 AM

#

remote storm I'm trying to train a model for a face on my own machine(16 GB VRAM). Would anyo...

kohya's sd-scripts is the most common way of training Loras for SDXL. There are also some forks with UIs

stiff dust Nov 30, 2023, 10:34 AM

#

dusky urchin thanks for the detailed explanation. i'll follow up with a comparison of LoRA on...

nice, I'm looking forward to hearing more about this.
I myself compared TI+UNet, TI+TE, TI+TE+Unet and TE+Unet and found TE+Unet works best. TI often overfitted on style for me. However, I also found that these things depend heavily on your captions and your images, so results might be very different on other data

sonic narwhal Nov 30, 2023, 3:14 PM

#

Is it a bad idea to train a LoRA on ~1000 images?

stiff dust Nov 30, 2023, 4:17 PM

#

no, why?

dusky urchin Nov 30, 2023, 4:18 PM

#

stiff dust nice, I'm looking forward to hearing more about this. I myself compared TI+UNet, TI...

we are trying a lot of different captioning approaches from the literature. imo, the biggest problem with the community approaches is that people do not design the splits, they make random splits, and for so few images, often a concept does not appear in all three splits

#

that and all the default LR settings in kohya are trash 👀

stiff dust Nov 30, 2023, 4:18 PM

#

what do you mean by splits?

dusky urchin Nov 30, 2023, 4:19 PM

#

stiff dust what do you mean by splits?

if you want to learn about character X, a distinct example of it should be in the training, validation and test sets

#

clearly for a lora only on that one character, no problem

#

but if you depict the character in different wardrobe for example, and you bothered to caption it, tough cookie, you have to design the training, validation and test sets

stiff dust Nov 30, 2023, 4:20 PM

#

I never use test and validation sets... 😅

dusky urchin Nov 30, 2023, 4:20 PM

#

so part of our pipeline is to report the distribution of tokens into sets
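A report like that could look roughly like the sketch below. It assumes comma-separated tag captions and a plain dict of split name to caption list; `token_coverage` and `missing_tokens` are hypothetical helpers, not the pipeline being described:

```python
from collections import Counter


def token_coverage(splits):
    """Count in how many captions each token appears, per split.

    `splits` maps a split name to a list of comma-separated caption strings.
    """
    counts = {name: Counter() for name in splits}
    for name, captions in splits.items():
        for caption in captions:
            tokens = {t.strip() for t in caption.split(",") if t.strip()}
            counts[name].update(tokens)
    return counts


def missing_tokens(splits):
    """Return tokens that are absent from at least one split, with the splits they miss."""
    counts = token_coverage(splits)
    vocabulary = set().union(*counts.values())
    report = {}
    for token in sorted(vocabulary):
        absent = [name for name in splits if counts[name][token] == 0]
        if absent:
            report[token] = absent
    return report


if __name__ == "__main__":
    example = {
        "train": ["character x, red coat", "character x, armor"],
        "val": ["character x, red coat"],
        "test": ["character x"],
    }
    for token, absent in missing_tokens(example).items():
        print(f"{token!r} is missing from: {', '.join(absent)}")
```

With random splits on a small dataset, a wardrobe tag like the ones in the example can easily end up in only one set, which is the failure this kind of report is meant to surface.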

#

lol

stiff dust Nov 30, 2023, 4:20 PM

#

the loss is not a good metric anyways

#

I always use validation prompts and compare the results visually

dusky urchin Nov 30, 2023, 4:21 PM

#

i'll be honest i'm not sure yet how impactful all this stuff is. i will report soon

wintry girder Nov 30, 2023, 4:55 PM

#

I needed to improve my workflow for captioning datasets, so I made this GUI: https://github.com/MakingMadness/caption-fletcher/tree/main

I hope someone finds it useful.

Only tested on linux, so would appreciate feedback from windows users.

GitHub - MakingMadness/caption-fletcher: A simple caption editor for image datasets. Automatically create captions, and hone them to be right on target!

#

Here's a screenshot 🙂

latent charm Nov 30, 2023, 9:14 PM

#

I had done something similar. Haha

#

#

Something like this

silent saddle Nov 30, 2023, 11:15 PM

#

sonic narwhal Is it a bad idea to train a LoRA on ~1000 images?

300 is a good number. Above that is overkill. I see many loras with 1000-1500 images with really bad or poor results, and many good loras with 100-200. What I mean is: pick the best cherries (images). Good quality, low noise, in focus, different poses, different distances, lighting, etc... Study the training parameters by repeating the process. There is no bulletproof recipe because each dataset needs parameter adjustments. Try with a small dataset first for faster training time.

wintry girder Nov 30, 2023, 11:28 PM

#

latent charm

Is it published? If so I'd like to check it out 🙂

latent charm Dec 1, 2023, 2:36 AM

#

wintry girder Is it published? If so I'd like to check it out 🙂

Uploaded to git but haven't written the readme yet. It is still in development. https://github.com/lrzjason/DatasetManagement

GitHub - lrzjason/DatasetManagement: DatasetManagement is builded for easily manage image and captions.

wintry girder Dec 1, 2023, 7:44 PM

#

latent charm Uploaded to git but haven't written the readme yet. It is still in development. ht...

Cool, thanks, I took a look! Good luck with it 😄

hollow spruce Dec 3, 2023, 3:08 AM

#

sonic narwhal Is it a bad idea to train a LoRA on ~1000 images?

quality > quantity
but assuming they are all good + well captioned, then yeah, more is definitely better - but it's a question of whether you can keep up quality that long.
basically, the fewer images, the more you rely on overfitting. the more images you have, the more flexible your lora becomes.

it's a complex topic since it changes depending on what you're training.
One important tip is to fail fast. I'd suggest getting a 'barely working' lora with 50 images, so that you can run different settings/variations fast. See what parts overfit faster than others. Then fix those as best you can.
then double the dataset each time.

you dont need to do this each time you train a lora - because the lessons learned stay the same. the next time you can start with a more complex setup right out of the box.

#

fyi: my biggest lora is currently at 6k (manually tagged) images. so I say this from experience

valid sequoia Dec 3, 2023, 4:30 AM

#

Anyone know what the most likely cause would be for a lora that doesn't like negative prompts? Just finished my 5th training on SDXL and everything seems to work quite well for most of the testing but if I add any token to the negative prompt it completely overbakes, like more than bumping the CFG to 10+

#

I noticed the lora epochs themselves seem to burn easier the longer it trained, cfg 11 for epoch 2 versus cfg 6 for epoch 10 as an example, but even epoch 1 burns if I use a negative prompt at all

valid sequoia Dec 3, 2023, 6:17 AM

#

Just ran another training with 10 repeats instead of 20 and lowered learning rate from 0.0003 to 0.0001 and it did move the "finished" epoch from ~4 to 9 but the burning is still happening 😒
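Part of that shift is just arithmetic: in kohya-style training every image is shown `repeats` times per epoch, so halving the repeats halves the optimizer steps per epoch. A simplified sketch; the dataset size of 30 images and batch size of 1 are assumptions for illustration, since the actual numbers were not given, and bucketing details are ignored:

```python
import math


def steps_per_epoch(num_images: int, repeats: int, batch_size: int = 1) -> int:
    """Simplified count: every image is seen `repeats` times per epoch."""
    return math.ceil(num_images * repeats / batch_size)


if __name__ == "__main__":
    # Hypothetical dataset of 30 images, batch size 1.
    for repeats, epoch in ((20, 4), (10, 9)):
        per_epoch = steps_per_epoch(30, repeats)
        print(f"{repeats} repeats: {per_epoch} steps/epoch, {per_epoch * epoch} steps by epoch {epoch}")
```

Under those assumptions the "finished" point lands at a similar total step count in both runs, so the later epoch number does not by itself mean the lora trained more gently; the lower learning rate is a separate effect.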

#

Makes me wonder if it's the model(s) i'm testing on

#

or I'm missing something else

valid sequoia Dec 3, 2023, 7:44 AM

#

After a bit of checking the models I have it seems like there's a common problem with negative prompt causing the quality to tank, and one of them even requires clip skip -2. That's really annoying

stiff dust Dec 3, 2023, 11:15 AM

#

do you train text encoder?

valid sequoia Dec 3, 2023, 11:33 AM

#

I've tried both at this point

#

one video said not to, that lora is basically useless on most models

#

I think I may have finally figured it out. My lora training wasn't actually the problem, it was the models I was testing with (trained on base). Specifically, the AlbedoBaseXL model says "Leaving the negative prompt blank is the way to achieve the nice quality". It should say you can't use the negative prompt over CFG ~3 with a lora or it fries everything completely. At least the ones I've tried, not just my own. Multiple other models I was testing are based, likely heavily, on AlbedoBaseXL so the problem was widespread in my collection

stiff dust Dec 3, 2023, 11:49 AM

#

albedobasexl was probably trained with the text encoder, too

#

I agree that training unet only is not always an option. Training text encoder improves results often, but might also often have undesired side effects

valid sequoia Dec 3, 2023, 11:53 AM

#

I ran a massive x/y with 3 epochs (5, 7, 10) from each of the 7 loras I trained, against every XL model I have. Found all but 2 were burning when there was a negative prompt

#

did that for CFG 5,7,9

#

Then I tested all the same loras on a 'good' model with CFG scales 2-15 and found the model only burned on a couple of the loras

#

Most of them were the "finished" epoch 10 versions, multiple epoch 5-7 were fine

#

Turns out the best training I have was actually epoch 5 of the first lora I trained, but the eyes 'glow' if I specify an eye color, which is what started all this.. some negative prompting can clean that up though I guess

stiff dust Dec 3, 2023, 12:07 PM

#

valid sequoia Turns out the best training I have was actually epoch 5 of the first lora I trai...

oh, yes, I know that problem. Specifying eye color is not possible in the base model

#

to be fair: CFG is a hack. If you specify any CFG value over 1 the model is not doing what it was trained on
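The standard classifier-free guidance formula makes this concrete: the sampler extrapolates from the unconditional prediction past the conditional one, and only at scale 1 does it return exactly the conditional prediction. A toy sketch on plain lists of floats standing in for noise predictions; `cfg_combine` is an illustrative name:

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # prediction for the empty (or negative) prompt
    cond = [1.0, 3.0]    # prediction for the positive prompt
    for scale in (1.0, 2.0, 7.0):
        print(scale, cfg_combine(uncond, cond, scale))
```

In common UIs the negative prompt takes the place of the empty prompt in the unconditional branch, so it changes the direction being extrapolated along. That may be part of why some checkpoint and lora combinations burn as soon as a negative prompt is added.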

dire orchid Dec 4, 2023, 12:39 AM

#

hello guys, I'm new to finetuning sd and ran into a small problem

#

I get this error when trying to train with dreambooth: died with <Signals.SIGKILL: 9>.

#

is this a memory error? My dataset has 4k images, and my cpu ram is around 13 gbs

#

I can share my colab if needed, I'm using a V100 rn, but can go for an A100 if needed

torn dew Dec 4, 2023, 3:36 AM

#

If I am merging pruned models into my unpruned model, will it still be decent for lora training in the future? Or should I really only be merging unpruned if my goal is to use it for training?

gentle flame Dec 4, 2023, 4:33 AM

#

Does anybody know if finetunes using perturbed noise or min snr gamma produce noisy outputs? I think one of the two is causing issues, but I'm not sure which (think it might be the former, but I want to know if anybody knows for sure)

#

example

opal jacinth Dec 4, 2023, 9:53 AM

#

Which option do I have to adjust for training if the face comes out well in the aDetailer with the prompt "photo of ohwx woman" and not without?

#

so basically a prompt with "photo of ohwx woman" yields very good resemblance with the training data

#

but "portrait of (ohwx woman) as a viking, ...." has no resemblance without aDetailer, only if I heavily weight (ohwx woman:1.7)

#

I assume the training of TE 1+2 is out of balance? As I train unet + te1 + te2

stiff dust Dec 4, 2023, 11:35 AM

#

dire orchid I get this error when trying to train with dreambooth: died with <Signals.SIGKI...

this sounds rather like a RAM problem, not vram. Guess the process was killed because it took too much ram
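The "died with <Signals.SIGKILL: 9>" text is how Python's subprocess module reports a child process that was killed by a signal; on POSIX systems the return code is the negative signal number. A small sketch for reading such return codes; `describe_exit` is a hypothetical helper, and the out-of-memory hint is a common cause on Linux rather than something the return code proves:

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable hint (POSIX)."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    # A negative return code means the child was terminated by that signal number.
    name = signal.Signals(-returncode).name
    if name == "SIGKILL":
        return "killed by SIGKILL (on Linux often the out-of-memory killer; check system RAM)"
    return f"killed by {name}"


if __name__ == "__main__":
    for code in (0, 1, -9, -15):
        print(code, "->", describe_exit(code))
```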

dire orchid Dec 4, 2023, 1:30 PM

#

stiff dust this sounds rather like a RAM problem, not vram. Guess the process was killed be...

Thanks for the response

#

Yeah I upgraded and it worked

silent saddle Dec 4, 2023, 9:32 PM

#

torn dew If I am merging pruned models into my unpruned model, will it still be decent fo...

I prefer to train on sd1.5 base model.

torn dew Dec 4, 2023, 9:34 PM

#

There is no benefit to training on a tune or merge? Even if I want to use that lora with said merge pretty much primarily?

silent saddle Dec 4, 2023, 9:34 PM

#

Many discord friends do too, to avoid any unexpected effects.

silent saddle Dec 4, 2023, 9:37 PM

#

torn dew There is no benefit to training on a tune or merge? Even if I want to use that l...

On a tuned or merged model a lora will inherit some characteristics, good (e.g. style) or bad, but it's not possible to predict exactly what will happen.

#

On base 1.5 you will have a "healthy" lora.

#

You won't get any pluses, but you won't get any flaws either.

torn dew Dec 4, 2023, 9:40 PM

#

That does make sense. I will actually try both to compare. 🙂 But I guess I still am wondering if merging pruned and unpruned models has any issues. For training on em afterwards. Logic says it shouldn't

silent saddle Dec 4, 2023, 9:41 PM

#

It is good because it will be easier to correlate parameter adjustments with results. The point is: if there is something strange in my lora, is it because of my parameters, or is the checkpoint guilty?

silent saddle Dec 4, 2023, 9:43 PM

#

torn dew That does make sense. I will actually try both to compare. 🙂 But I guess I stil...

No problem merging pruned with full. With some checkpoints it won't work, they just don't merge, don't know why. So you will have to prune all of them. But most times it works perfectly.

torn dew Dec 4, 2023, 9:44 PM

#

Awesome. Thank you for your advice.

silent saddle Dec 4, 2023, 9:44 PM

#

I had only 3 cases where I was unable to merge. Rare.

torn dew Dec 4, 2023, 9:45 PM

#

I did a lot of MBW on my merges. It took forever to figure out what worked properly. I was worried I would have problems training on it lol.

silent saddle Dec 4, 2023, 9:46 PM

#

torn dew I did a lot of MBW on my merges. It took forever to figure out what worked prope...

Yeah. MBW is very time consuming but is the best way if you will merge many models
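For readers who have not used it, the difference between a normal merge and a block-weighted merge (MBW) is only where the ratio comes from: one alpha for everything, or one alpha per block. A toy sketch on dicts of float lists; the key names and block prefixes are invented for illustration and do not match real checkpoint keys:

```python
def merge_weighted(model_a, model_b, alpha):
    """Plain weighted-sum merge: the same ratio for every tensor."""
    return {
        key: [(1 - alpha) * a + alpha * b for a, b in zip(model_a[key], model_b[key])]
        for key in model_a
    }


def merge_block_weighted(model_a, model_b, block_alphas, default_alpha=0.5):
    """Block-weighted merge: the ratio depends on which block a key belongs to."""
    merged = {}
    for key, values_a in model_a.items():
        # Use the alpha of the first block prefix that matches this key.
        alpha = next(
            (a for prefix, a in block_alphas.items() if key.startswith(prefix)),
            default_alpha,
        )
        merged[key] = [(1 - alpha) * a + alpha * b for a, b in zip(values_a, model_b[key])]
    return merged


if __name__ == "__main__":
    model_a = {"text_encoder.w": [0.0, 2.0], "unet.in.w": [0.0, 2.0], "unet.out.w": [0.0, 2.0]}
    model_b = {"text_encoder.w": [4.0, 6.0], "unet.in.w": [4.0, 6.0], "unet.out.w": [4.0, 6.0]}
    print(merge_weighted(model_a, model_b, 0.5))
    # Alpha 0 keeps the text encoder entirely from model A; the UNet blocks get their own ratios.
    print(merge_block_weighted(model_a, model_b, {"text_encoder.": 0.0, "unet.in.": 0.25, "unet.out.": 1.0}))
```

Pinning a block's alpha to 0 or 1 takes that block wholly from one model instead of averaging it, which is one way to leave a text encoder unmixed.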

torn dew Dec 4, 2023, 9:47 PM

#

silent saddle Yeah. MBW is very time consuming but is the best way if you will merge many mode...

Maybe someday I will delve into Elemental merges. But I am not that crazy right now. 🙂

silent saddle Dec 4, 2023, 9:48 PM

#

For up to 3 merges with "healthy" checkpoints, a normal merge is ok.

silent saddle Dec 4, 2023, 9:48 PM

#

torn dew Maybe someday I will delve into Elemental merges. But I am not that crazy right ...

Hehehe.

torn dew Dec 4, 2023, 9:49 PM

#

I just notice normal merging just sorta smears everything together. Especially if I wanna mix realism and anime together.

#

That is why things look like plastic all the time.

silent saddle Dec 4, 2023, 9:50 PM

#

torn dew Maybe someday I will delve into Elemental merges. But I am not that crazy right ...

Many merges sometimes destroy the text encoder.

silent saddle Dec 4, 2023, 9:50 PM

#

torn dew That is why things look like plastic all the time.

True.

torn dew Dec 4, 2023, 9:50 PM

#

That is something I wanted to know. Is there a way to fix the text encoder?

#

Would I BASE merge a good one?

silent saddle Dec 4, 2023, 9:55 PM

#

torn dew That is something I wanted to know. Is there a way to fix the text encoder?

Man... I still haven't found a good way to fix a damaged TE. Don't import the base TE into your merge. It will make it dumber. What I do with some success is: make a set of prompts and create the images (all with the same parameters) with some checkpoints and with your damaged model. Next, analyze which checkpoint matches your model most closely. Export the TE from it and import it into yours.

torn dew Dec 4, 2023, 9:56 PM

#

That makes a lot of sense.

silent saddle Dec 4, 2023, 9:56 PM

#

It's not good enough, but it really helps

#

The best approach with MBW is to never mix the TE.

torn dew Dec 4, 2023, 9:57 PM

#

AFAIK I never touch base.

silent saddle Dec 4, 2023, 9:57 PM

#

Set 0 or 1.

torn dew Dec 4, 2023, 9:58 PM

#

The problem seems to stem when you touch models that have a lot of loras merged in, or furry models since they don't use danbooru tags.

#

It seems to be what I notice fractures the TE.

silent saddle Dec 4, 2023, 9:59 PM

#

Checkpoints are a zoo right now. XL is still healthy. But almost all 1.5 ones are "contaminated".

torn dew Dec 4, 2023, 10:00 PM

#

Very true. But I know some crazy person will release a tool or paper to fix all the contamination.

#

It's more of a when than an if.

silent saddle Dec 4, 2023, 10:00 PM

#

Example: a realistic model with perfect anime eyes injected. It is a very common problem

torn dew Dec 4, 2023, 10:01 PM

#

Yes. 100%.

#

But that is because people just do too many alpha merges.

#

I really do think Elemental Merges might be able to fix that. But with 600 parameters or whatever it is per layer, it's too hard to do by hand.

silent saddle Dec 4, 2023, 10:02 PM

#

torn dew Very true. But I know some crazy person will release a tool or paper to fix all ...

I've been polishing a model for 9 months, trying to remove many problems.

silent saddle Dec 4, 2023, 10:04 PM

#

torn dew But that is because people just do too many alpha merges.

Mixes of famous models made some problems spread quickly.

torn dew Dec 4, 2023, 10:04 PM

#

cough RevAnimated cough

#

Completely breaks eyes.

silent saddle Dec 4, 2023, 10:07 PM

#

Eyes, warped face, non symmetric body, long arms or fingers, old face but having young body, etc etc etc

#

But it's fun and we learn a lot trying to figure out and fix things

torn dew Dec 4, 2023, 10:08 PM

#

silent saddle Eyes, warped face, non symmetric body, long arms or fingers, old face but having...

Throw it all in the trash and go back to 1.5 base. Lol

silent saddle Dec 4, 2023, 10:09 PM

#

A few more months and migrating to XL will make more sense.

torn dew Dec 4, 2023, 10:09 PM

#

XL just feels very restrictive right now.

#

1.5 is so flexible.

silent saddle Dec 4, 2023, 10:10 PM

#

But I'm still in love with 1.5

torn dew Dec 4, 2023, 10:10 PM

#

Same.

silent saddle Dec 4, 2023, 10:11 PM

#

This model I've been working on for a long time can generate cool images with cfg 2 and steps 6 in 1 sec, while there is a lot of buzz about turbo xl.

#

And i prefer the non perfect (non crystal clear) mood of 1.5

torn dew Dec 4, 2023, 10:15 PM

#

silent saddle And i prefer the non perfect (non crystal clear) mood of 1.5

I agree.

torn dew Dec 4, 2023, 10:15 PM

#

silent saddle This model I've been working on for a long time can generate cool images with cfg 2 and step...

LCM is pretty magical.

dusky urchin Dec 4, 2023, 10:54 PM

#

latent charm Uploaded to git but haven't written the readme yet. It is still in development. ht...

can you give me some more info on this?

#

i am using the image metadata and Adobe Bridge / similar applications

latent charm Dec 5, 2023, 3:14 AM

#

dusky urchin can you give me some more info on this?

It is still in development. It contains a vue front end and a python api. You need to run npm run dev in the vue front end folder and python api.py in the python folder. After starting both programs, you can open the interface on localhost.

timid grail Dec 5, 2023, 7:19 PM

#

Hello everyone,
I'm having a real hard time understanding Lora training. I followed a tutorial to the letter, but I always get the same error.
Can someone please help ?

#

I'm using kohya_ss 22.2.2

#

Nevermind... Just found the solution, it seems

dawn oriole Dec 5, 2023, 10:29 PM

#

Does anyone have any idea how I could use CLIP models with the embedding size (77, 512) instead of (77, 768) for guidance in stable diffusion? Is it possible to do it?

dusky urchin Dec 6, 2023, 6:26 PM

#

dawn oriole Anyone has any idea how I could use CLIP models with the embedding size (77, 512...

what would the purpose be? there's no deterministic way to do that, you'd have to find a function to translate the embeddings, its own scientific problem

dusky urchin Dec 6, 2023, 6:43 PM

#

does anyone know which tokenizer concretely was used by Stability in training SDXL's CLIPs? ~~the vocabulary 2,496/34,483 of the vocab terms have word terminators, which are never produced by e.g. comfyui~~

stiff dust Dec 6, 2023, 9:04 PM

#

of course they are

#

the word terminators are implicitly there whenever you have a word ending with a blank or nonalphabetic character
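To illustrate the "implicitly there" part: in the reference CLIP byte-pair tokenizer, each word gets `</w>` appended to its last character before any merges are looked up, so the terminator never has to appear in the prompt text. A simplified sketch of only that first step; the regex is a reduced stand-in for the real pattern and no BPE merges are applied:

```python
import re


def to_bpe_symbols(text: str):
    """Split text into words, then into characters with '</w>' on each word's last symbol.

    This is only the starting point of byte-pair encoding, before any merges.
    """
    # Simplified word pattern: runs of letters, single digits, or runs of other symbols.
    words = re.findall(r"[a-z]+|[0-9]|[^\sa-z0-9]+", text.lower())
    return [list(word[:-1]) + [word[-1] + "</w>"] for word in words]


if __name__ == "__main__":
    print(to_bpe_symbols("a cat."))
```

Vocabulary entries that end in `</w>` are therefore reached whenever the merges reassemble a whole word, while entries without it cover word-internal pieces.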

tender hearth Dec 6, 2023, 9:27 PM

#

Hey folks! I've just fine tuned my first model following this tutorial. Model is saved on my Google Drive but now I'm wondering how can I run the inference on it? I don't have Colab Pro, so can't use AUTOMATIC1111 Stable Diffusion WebUI

dawn oriole Dec 6, 2023, 9:39 PM

#

dusky urchin what would the purpose be? there's no deterministic way to do that, you'd have t...

I found some pretrained CLIP models that I wanted to experiment with, and they have the embedding shape of 512 instead of 768, hence I asked. Also, does the 77 in (77, 768) matter more or can SDs be conditioned with a shape of say (x, 768)?

stiff dust Dec 6, 2023, 10:07 PM

#

dawn oriole I found some pretrained CLIP models that I wanted to experiment with, and they h...

if you change the text encoder, you basically have to train SD from scratch

#

regarding token count: in theory you can change the token count from 77 to something else, but results can be unpredictable

dusky urchin Dec 7, 2023, 12:53 AM

#

stiff dust the word terminators are implicitly there whenever you have a word ending with a...

i guess i have found other bugs. you can pass exact matches to the vocab in the clip tokenizer, and it will still break it apart

#

i have a feeling there are more

#

the large amount of mojibake in vocab.json is not reassuring

wheat rock Dec 7, 2023, 9:21 AM

#

Hello, I am developing a dreambooth model using 1 training image.
And I am not sure what the proper hyperparameters are to train a lora model.
I am currently using realistic_vision_5.1_no_vae as the base model.
And the lora rank is 4.
Please help me.
🙏

dire orchid Dec 7, 2023, 2:11 PM

#

anyone knows what's currently the best vae for sdxl?

#

is it just this one? https://huggingface.co/madebyollin/sdxl-vae-fp16-fix


valid sequoia Dec 7, 2023, 2:34 PM

#

dire orchid is it just this one? https://huggingface.co/madebyollin/sdxl-vae-fp16-fix

That's the only one i'm aware of that works properly, there are a couple others on civitai but they aren't fp16 so they take a lot more memory to use

dire orchid Dec 7, 2023, 2:35 PM

#

gotcha

#

and the openai decoder thing only works with normal sd, right?

waxen pine Dec 7, 2023, 4:04 PM

#

regarding text encoders in sdxl:
was sdxl trained with both encoders getting the same prompt?

#

is there any official statement on how they really relate to each other?

#

there are some tests on what might be the best way to use them, but I never found anything official

#

regarding training unet only vs textencoder:
if you feed the unet with image / caption pairs, the unet will have some knowledge about the captions?
the text encoder has its somewhat limited vocabulary file, but can also concat single letters to build unknown words?
-> then you only have to provide enough samples to learn a new concept?
why would you ever want to train the text encoder then? (maybe to associate an image with a caption faster)?

stiff dust Dec 7, 2023, 4:29 PM

#

waxen pine regarding text encoders in sdxl: was sdxl trained with both encoders getting the...

yes, both text encoders always get the same prompt

stiff dust Dec 7, 2023, 4:30 PM

#

waxen pine is there any official statement on how they really relate to each other?

to be honest: I assume that the CLIP-L is just there because the images look better with it during the experiments

waxen pine Dec 7, 2023, 4:30 PM

#

but was this also the case for the initial sdxl training?

stiff dust Dec 7, 2023, 4:30 PM

#

there were always complaints about CLIP-G being less effective for certain artists and styles

#

so they just used both

stiff dust Dec 7, 2023, 4:30 PM

#

waxen pine but was this also the case for the initial sdxl training?

yes

waxen pine Dec 7, 2023, 4:31 PM

#

aaaah i see 😄 thanks!

i did some tests with separate captions for each encoder during finetuning, and it turned out pretty well

stiff dust Dec 7, 2023, 4:31 PM

#

using different prompts for CLIP-L and CLIP-G is very problematic. I wrote about that multiple times

waxen pine Dec 7, 2023, 4:31 PM

#

so i was wondering

stiff dust Dec 7, 2023, 4:31 PM

#

it works and sometimes might make the results better, but in most cases it will make things worse

waxen pine Dec 7, 2023, 4:32 PM

#

https://github.com/kohya-ss/sd-scripts/issues/781

can you have a look at this 🙏 ?

GitHub

Feature request: separate captions for SDXL text encoders · Issue #...

From what I have read about the 2 text encoders that SDXL uses, the G CLIP encoder is better at understanding natural human language/full sentences, whereas the L CLIP encoder isn't as advanced...

#

i have the assumption that CLIP L works very well for a style tag and G for the rest

stiff dust Dec 7, 2023, 4:33 PM

#

people have assumed that since the beginning, because it fits this idea of clip-g being better at understanding and clip-l being better at styles

#

but there is not a single study that proves that

waxen pine Dec 7, 2023, 4:33 PM

#

but if sdxl was trained on both at the same time ... i think then separating them doesn't matter..

stiff dust Dec 7, 2023, 4:34 PM

#

whenever somebody really made a big case study with dozens of prompts and compared them, it was always better to use both with the same caption

#

but you can always make examples where some strategy works and the other not

waxen pine Dec 7, 2023, 4:34 PM

#

i see, you have a point there

stiff dust Dec 7, 2023, 4:34 PM

#

the reason why I don't like separate captions is the following:

#

the way clip+unet works is that your sentence consists of tokens (in the simplest case these are words) and each token is connected with parts of your image (like each latent pixel in the image gets a probability to belong to a certain token)

#

this way, a face in your image might be connected to the token "trump" and during diffusion process the pixels are changed in a way to look more like trump

#

when you have two clip encoders, the embeddings of the tokens are just concatenated

#

so say you have a caption "trump as pope" and a second caption "hyperrealistic photography"

#

then what sdxl gets as input is: the first token is ["trump"+"hyperrealistic"], the second token is ["as"+"photography"] and the third token is ["pope"+""]

#

this doesn't make any sense

waxen pine Dec 7, 2023, 4:38 PM

#

i think "when you have two clip encoders, the embeddings of the tokens are just concatenated" is the crucial point.
is this internal to sdxl? because in comfy for example you can feed the captions separately

stiff dust Dec 7, 2023, 4:38 PM

#

doesn't matter. They are concatenated as soon as they enter the unet

waxen pine Dec 7, 2023, 4:39 PM

#

damn 😭

#

xD

stiff dust Dec 7, 2023, 4:39 PM

#

concatenated means: the first token in CLIP-L is combined with the first token in CLIP-G
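
A toy sketch of that position-by-position pairing, for anyone who wants to see it spelled out. Everything here is made up for illustration: `fake_encode` is a hypothetical stand-in for a text encoder, the feature sizes 3 and 4 stand in for the 768 (CLIP-L) and 1280 (CLIP-G) features of the real encoders, and it ignores start/end tokens and the fact that real encoder outputs are contextual.

```python
# Toy illustration only: each "embedding" is a short list of floats.
# In real SDXL each position carries 768 + 1280 = 2048 features after joining.

def fake_encode(tokens, dim, length=5):
    """Hypothetical stand-in for a text encoder: one vector per position, padded to `length`."""
    padded = tokens + [""] * (length - len(tokens))
    return [[float(len(tok))] * dim for tok in padded]

def concat_per_position(emb_l, emb_g):
    """Join both encoders' outputs along the feature axis, position by position."""
    assert len(emb_l) == len(emb_g), "both encoders must yield the same number of positions"
    return [vec_l + vec_g for vec_l, vec_g in zip(emb_l, emb_g)]

caption_l = ["trump", "as", "pope"]              # caption sent to the first encoder
caption_g = ["hyperrealistic", "photography"]    # different caption sent to the second

emb_l = fake_encode(caption_l, dim=3)
emb_g = fake_encode(caption_g, dim=4)
combined = concat_per_position(emb_l, emb_g)

# Which words end up sharing a position when the captions differ:
pairs = list(zip(caption_l + [""] * 2, caption_g + [""] * 3))
for position, pair in enumerate(pairs):
    print(position, pair, "features per position:", len(combined[position]))
```

With the same caption on both sides, every position pairs a word with itself, which is the case the model saw during training.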

waxen pine Dec 7, 2023, 4:39 PM

#

yeah I understand that, but it's somehow a little bit sad xD

#

hmm thanks for all your thoughts

stiff dust Dec 7, 2023, 4:40 PM

#

don't get me wrong: there are many weird things that surprisingly work even though they don't make sense

waxen pine Dec 7, 2023, 4:40 PM

#

is there any paper or something like that how sdxl works under the hood?

stiff dust Dec 7, 2023, 4:40 PM

#

I don't say that it's forbidden to use different captions

waxen pine Dec 7, 2023, 4:40 PM

#

yeah, there are so many different parameters which influence the whole training process

stiff dust Dec 7, 2023, 4:40 PM

#

but I would never ever encourage people to do that without evaluating for themselves what works better

waxen pine Dec 7, 2023, 4:41 PM

#

that's one reason why I don't like the concept of shuffling captions, because it even adds more randomness to a random thing xD

stiff dust Dec 7, 2023, 4:41 PM

#

unfortunately, the claim that you should use linguistic prompts for CLIP-G and style prompts for CLIP-L is distributed and repeated very often without people ever questioning or evaluating it. Whenever such a discussion pops up, somebody comes with a single example where it makes things so much better and acts as if this were proof

stiff dust Dec 7, 2023, 4:42 PM

#

waxen pine that's one reason why I don't like the concept of shuffling captions, because it...

shuffling captions makes sense if you train many many epochs

#

you don't want the model start relying on the exact order of the content in the prompt

waxen pine Dec 7, 2023, 4:42 PM

#

hmm that makes sense

stiff dust Dec 7, 2023, 4:43 PM

#

in general: if you train many epochs you might want to add some randomization to the training process to avoid that the model starts memorizing things

#

this can be random cropping, flipping (dangerous), caption shuffling or dropout
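
For the caption shuffling part, a minimal sketch of the idea, assuming comma-separated tag captions. The function name and the `keep_tokens` parameter are illustrative (modelled on the "keep n tokens" style of setting that trainers expose), not any trainer's actual code.

```python
import random

def shuffle_caption(caption, keep_tokens=1, rng=random):
    """Shuffle comma-separated tags, leaving the first `keep_tokens` tags in place."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rest = list(rest)      # copy so the caller's data is never mutated
    rng.shuffle(rest)      # new order each time the caption is used
    return ", ".join(fixed + rest)

rng = random.Random(0)     # seeded only to make the demo repeatable
caption = "balloon_style, woman, sitting, indoor"
for epoch in range(3):
    print(epoch, shuffle_caption(caption, keep_tokens=1, rng=rng))
```

The trigger word stays in front while the remaining tags change order from epoch to epoch, so the model cannot rely on a fixed tag order.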

stiff dust Dec 7, 2023, 4:44 PM

#

waxen pine regarding training unet only vs textencoder: if you feed the unet with image / c...

the text encoder can do single letters. That it knows whole words is mostly a performance trick. Each token costs you a lot of time and memory, so you want to keep the number of tokens as low as possible

waxen pine Dec 7, 2023, 4:44 PM

#

ahhh 🤩

stiff dust Dec 7, 2023, 4:44 PM

#

so if you give the text encoder a word that is tokenized into single letters, it will still learn that the combination of these letters make up this word
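
A toy sketch of why an unknown word costs more tokens. Assumption, stated loudly: this uses greedy longest-match against a tiny made-up vocabulary, which is a simplification; the real CLIP tokenizer uses learned BPE merges, so its actual splits will differ.

```python
def greedy_tokenize(word, vocab):
    """Split `word` into the longest known pieces, falling back to single letters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:   # single letters are always allowed
                pieces.append(word[i:j])
                i = j
                break
    return pieces

vocab = {"balloon", "ball", "oon"}             # hypothetical vocabulary entries
known = greedy_tokenize("balloon", vocab)      # word is in the vocabulary
unknown = greedy_tokenize("ballon", vocab)     # misspelling, not in the vocabulary
print(known, len(known))
print(unknown, len(unknown))
```

The known word takes one token, the unknown one takes several, and the encoder then has to learn that this sequence of pieces belongs together.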

waxen pine Dec 7, 2023, 4:44 PM

#

ok ok that also makes sense

#

but would use more tokens

stiff dust Dec 7, 2023, 4:45 PM

#

yes

#

which is then also bad for SD, because the unet also has to use all these tokens

#

(which doesn't matter that much as the unet is using a fixed length of 77 tokens anyways)

waxen pine Dec 7, 2023, 4:45 PM

#

after some weeks of debugging kohya's scripts I'm wondering at every line: "why is this like this xD what happens if i change this to that xD"

#

i also don't understand why we need this super complex multi layer text encoder when the model was trained on alt tags

stiff dust Dec 7, 2023, 4:46 PM

#

because of that

#

if you have a french word it will not be contained in the dictionary

#

so the text encoder has to learn that this combination of tokens belongs to a single word

waxen pine Dec 7, 2023, 4:47 PM

#

i made some tests using the different layers as input (clip skip) but I'm no smarter than before

stiff dust Dec 7, 2023, 4:47 PM

#

which will take many layers

waxen pine Dec 7, 2023, 4:47 PM

#

i see 🤔

stiff dust Dec 7, 2023, 4:47 PM

#

but yes, alt tags are horrible. But this is the reason why CLIP is used, as CLIP is very robust to shitty captions

#

but it's also the reason why the text encoder is so extremely stupid in many cases. The alt tags are just too bad to learn something really useful

waxen pine Dec 7, 2023, 4:48 PM

#

i imagine something like this: "i have a really highly crafted vector embedding by clip which should be compared to an alt tag vector embedding"
how can they even be close to each other xD?

stiff dust Dec 7, 2023, 4:49 PM

#

CLIP-L has only 12 layers (and SDXL takes the output of the penultimate one, so effectively 11). That's actually a very small transformer given the amount of data it is trained on

waxen pine Dec 7, 2023, 4:49 PM

#

G has 32 or something like that i think

stiff dust Dec 7, 2023, 4:49 PM

#

yes. I'm sceptical about that xD

#

like it definitely has a better text understanding

#

but it's probably wasted on alt tags

waxen pine Dec 7, 2023, 4:49 PM

#

jap xD

#

i don't understand why, instead of using alt text, clip or blip wasn't used to caption the images beforehand?

stiff dust Dec 7, 2023, 4:51 PM

#

I mean, CLIP is older than BLIP

#

also BLIP makes a lot of errors

#

even newer models like llava or cogvlm make a lot of errors

waxen pine Dec 7, 2023, 4:51 PM

#

maybe as an addition to the alt text

stiff dust Dec 7, 2023, 4:51 PM

#

llava is sometimes not even able to distinguish a monochrome photo from a colorful photo

waxen pine Dec 7, 2023, 4:51 PM

#

hmmm

stiff dust Dec 7, 2023, 4:51 PM

#

but I agree that this is the way to go

#

Dall-E 3 and Pixart Alpha did that

#

they made automatic captions and trained on them

#

Pixart Alpha is almost as good as SDXL, but was trained on 1/100 of the data

waxen pine Dec 7, 2023, 4:52 PM

#

i have my eyes on pixart too xD

#

but currently so deep into sdxl so i want to learn everything about it first xD

#

can i ask another question xD? you have some serious knowledge

#

have you tried training with zsnr fix?

#

i read sdxl already implemented it

#

and kohya supports it

#

but my results are not getting really better with it

stiff dust Dec 7, 2023, 4:53 PM

#

haha, you should ask Pseudoterminalx about that

waxen pine Dec 7, 2023, 4:54 PM

#

the betas look so much different than the originals

#

and my math is worse haha

stiff dust Dec 7, 2023, 4:54 PM

#

I think zsnr fix in kohya is something different than zsnr

waxen pine Dec 7, 2023, 4:54 PM

#

at least kohya uses the little script which is used in the official paper

#

fix beta: zero terminal SNR

print(f"fix noise scheduler betas: https://arxiv.org/abs/2305.08891")

...

#

it's absurd that a few beta values can change the result that much 😅
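
For reference, a plain-Python sketch of the beta rescaling described in the linked paper (arXiv 2305.08891), written from the paper's algorithm and not copied from kohya. The schedule values in `scaled_linear_betas` are the ones commonly quoted for Stable Diffusion and are an assumption here.

```python
import math

def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """'Scaled linear' schedule: linear in sqrt(beta). Values are assumed SD defaults."""
    s, e = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(s + (e - s) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def enforce_zero_terminal_snr(betas):
    """Rescale betas so the last timestep is pure noise (alpha_bar = 0, SNR = 0)."""
    # Cumulative product of alphas gives alpha_bar at every timestep.
    alphas_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alphas_bar.append(prod)
    sqrt_ab = [math.sqrt(a) for a in alphas_bar]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value becomes 0, then scale so the first value is unchanged.
    sqrt_ab = [(x - last) * first / (first - last) for x in sqrt_ab]
    alphas_bar = [x * x for x in sqrt_ab]
    # Convert alpha_bar back to per-step alphas, then to betas.
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]

old_betas = scaled_linear_betas()
new_betas = enforce_zero_terminal_snr(old_betas)
print("last beta before:", old_betas[-1], "after:", new_betas[-1])
```

Only the shape near the end of the schedule changes much, but the final step now carries no image signal at all, which is a different training problem from the original one.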

stiff dust Dec 7, 2023, 4:56 PM

#

as I understand zsnr you cannot use it for epsilon prediction

waxen pine Dec 7, 2023, 4:56 PM

#

with v-pred I always get grey images

#

during training and even in comfy

#

don't know how to get this to work

stiff dust Dec 7, 2023, 4:57 PM

#

you cannot easily change the prediction type

waxen pine Dec 7, 2023, 4:57 PM

#

in kohya you can enable it with v_parameterization but he also writes to not use it with sdxl

stiff dust Dec 7, 2023, 4:57 PM

#

if a model is trained on epsilon, you cannot let it predict v

waxen pine Dec 7, 2023, 4:57 PM

#

ahhhhhhhhh

stiff dust Dec 7, 2023, 4:58 PM

#

you have to completely retrain the model for hundreds of thousands or even millions of steps
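
A small sketch of why the two prediction types are not interchangeable, using the standard definitions (x_t = a·x0 + s·noise, v = a·noise − s·x0, with a = sqrt(alpha_bar) and s = sqrt(1 − alpha_bar)). The function name and the toy values are illustrative only.

```python
import math

def training_targets(x0, noise, alpha_bar_t):
    """Return the noisy input plus the epsilon target and the v target for one timestep."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * x + s * n for x, n in zip(x0, noise)]        # what the unet sees
    eps_target = list(noise)                                 # epsilon prediction target
    v_target = [a * n - s * x for x, n in zip(x0, noise)]    # v prediction target
    return x_t, eps_target, v_target

x0 = [0.5, -0.25]       # toy "image" values
noise = [-1.0, 2.0]     # toy noise sample

# Mid-schedule: the two targets are simply different numbers.
print(training_targets(x0, noise, alpha_bar_t=0.5))

# Zero terminal SNR (alpha_bar = 0): the input IS the noise, so the epsilon
# target just repeats the input, while the v target is -x0.
print(training_targets(x0, noise, alpha_bar_t=0.0))
```

A network trained to output epsilon produces the wrong quantity when its output is read as v, and at zero terminal SNR the epsilon target tells the model nothing about the image, which fits the point that zsnr does not go with epsilon prediction.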

waxen pine Dec 7, 2023, 4:58 PM

#

😭 again xD

stiff dust Dec 7, 2023, 4:58 PM

#

pseudoterminalx is training an SDXL variant with v-pred

waxen pine Dec 7, 2023, 4:58 PM

#

ok, so i don't have to investigate this route any further

#

thanks 🥰

waxen pine Dec 7, 2023, 4:59 PM

#

stiff dust pseudoterminalx is training an SDXL variant with v-pred

woah cool xD looking forward to the results

#

thanks so much for your knowledge :D!!

stiff dust Dec 7, 2023, 5:00 PM

#

https://huggingface.co/ptx0/terminus-xl-gamma-v1


waxen pine Dec 7, 2023, 5:01 PM

#

woah 🤩

#

instant tryout now 😂

frozen island Dec 7, 2023, 6:05 PM

#

Im planning on trying to train my first lora on a comic book artist's style, are there things I should avoid for the training dataset, such as text?

dusky urchin Dec 7, 2023, 6:12 PM

#

stiff dust there were always complaints about CLIP-G being less effective for certain artist...

fwiw, i've found lots of big surprises with tokenization in transformers being really unoptimal

#

and it's almost certainly not what stability uses for its training

#

like the tokenization is... let's say weak oftentimes

#

it's just not using the vocabulary correctly

#

huggingface and comfyui both played telephone with https://github.com/openai/CLIP/blob/main/clip/simple_tokenizer.py and got something different

GitHub

CLIP/clip/simple_tokenizer.py at main · openai/CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image - openai/CLIP

#

which, obviously, this implementation isn't the one OpenAI used to train CLIP, and openclip... they also played telephone - https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/tokenizer.py

GitHub

open_clip/src/open_clip/tokenizer.py at main · mlfoundations/open_c...

An open source implementation of CLIP. Contribute to mlfoundations/open_clip development by creating an account on GitHub.

dusky urchin Dec 7, 2023, 6:35 PM

#

@stiff dust i think one of the reasons lora training with text encoding is going so poorly on sdxl is because the tokenization is broken. what people think are rare tokens are not really. stability, laion, whoever, almost certainly precomputed tokenization for openclip on its corpus, which to do so performantly means authoring BPE in something like the polars DSL. they probably didn't use any of the ordinary unoptimized python code at all.

silent saddle Dec 7, 2023, 10:10 PM

#

I saw somewhere they do only one pass through part (picked randomly) of laion db.

gray pine Dec 8, 2023, 9:03 AM

#

Image captioning for LoRa

Hi there,
I find image captioning a bit confusing as tutorials often have different approaches. My main problem is that they usually explain them on the example of subjects or styles. What I want to train is more of a concept. I want to train a LoRa that focuses on everything balloon related such as inflation and manipulation/handling by characters. I found that most models can’t produce any good results for this except for characters holding floating helium balloons.
So my questions are:

How should I caption the images, what to describe, what not to describe? Sentences or single terms or both?
Should I use a prefix? And if so should it be a term that the model is familiar with?
Should I use a unique identifier?
what Settings should I use for a concept like this? (Using Kohya SS)
should I use as many images as possible (I have about 300 rn) or only use a few?

Any other tips and hints are welcome.

I’m new to this so I hope I don’t ask stupid questions.

Thank you guys

silent saddle Dec 8, 2023, 3:21 PM

#

gray pine Image captioning for LoRa Hi there, I find image captioning a bit confusing as ...

Hey man, I don't know how to answer some questions for your case (looks more complex than what I usually have experience with). On my loras I caption terms not related to my object of interest. It doesn't need to be detailed: "a table with fruits" is ok, "a table with orange, grapes and bananas" is not necessary. Use a trigger not familiar to the model. You can test it beforehand to see what is generated. I prefer something with consonants and numbers together, like h4th4w4y or the shorter h4w4y. I like to train on the sd1.5 base model. Minimal influence from the model is better (my opinion). Settings are hard, there is no bulletproof recipe. Start with few images (the 20-30 best quality images) for faster training while you adjust your parameters. When satisfied, add 30 more, adjust again, add 30, etc... For your case I think more is better (300-500). For simple loras 30-50 is enough with a good dataset. Sometimes the 20 best images already do the magic.

gray pine Dec 8, 2023, 4:49 PM

#

silent saddle Hey man, I don't know how to answer some questions for your case (looks more com...

Hi! Thanks for your advice. It is definitely helpful and I will try some of these steps. I’m not even sure if a Lora is the right choice or if other training methods would be more fitting. What makes it difficult is that I don’t really teach a single object but make an existing object more defined, in this case balloons. So I’m afraid if I only describe everything else the model can’t distinguish enough. One idea I had was to make balloon related terms for captions. Like "small balloon", "uninflated balloon", "fully inflated balloon". Also actions like "inflating balloon", "holding balloon" etc. I’m not sure if this is the right approach, especially if I use pictures with multiple balloons in them: how should the model know what specific balloon I am describing?

silent saddle Dec 8, 2023, 5:09 PM

#

gray pine Hi! Thanks for your advice. It is definitely helpful and I will try some of thes...

Yeah. Your situation is really new to me. I think you should focus on the balloon object's condition. Most models should figure out the actions. If some action/pose is hard, my intuition says to train it in a separate lora and use it together with the balloon lora.

dire orchid Dec 9, 2023, 2:01 AM

#

Assuming that I have a Lora that I already trained with dreambooth and saved on HF, how do I continue training this lora from where I left off?

#

Any clear guides on that? Maybe use the HF link as the checkpoint path or smth?

hollow spruce Dec 9, 2023, 3:49 AM

#

gray pine Image captioning for LoRa Hi there, I find image captioning a bit confusing as ...

so in a nutshell, there are multiple approaches to this, that will all work, but have different downsides/side-effects
learning the different methods is obviously trial and error.

The easiest way to get it working, imho, would be to take the 50 best images of your set, and start with those. Basically get a lora working 'well enough' with only those images. (the full 300+ will be used later)

make sure that signatures are removed. You can crop to 1:1 to avoid certain pitfalls, but this is entirely optional. Ideally you want bucketing to work, since that also makes your lora better in the long run -but that's not the focus of this question.
once you have them, caption them using only caption style words.
"<trigger word>, <subject>, <action>, <location>" <- really depends on your dataset, but keep it simple, and be careful when captioning to not miss any/apply when not relevant. Add anything specific, that you want to be able to prompt for.
"balloon_style, woman, sitting, indoor" <- one potential image caption

(why not natural language style captions? <- is a long conversation. but with such small datasets, using captions will almost always get you better results)

trigger word always first. shuffle turned on. keep n tokens set to 1 (so that trigger word stays front)
min snr set to 5. (read up on min-snr if you're brave enough. but leave it on, as it helps train new subjects a lot faster) -> https://arxiv.org/abs/2303.09556
settings REALLY depend on your hardware. I could help you on a 3090/4090 setup, since that's what I have.
stuff that stays same:
• Leave offset noise at 0 <- can and usually will cause issues
• LR between 0.001 ~ 0.0005 + constant scheduler + adamw8bit <- for your first tests run at 0.001. While not ideal, it will still show you very very fast if you have issues somewhere else in your lora, that you need to fix first
• Network dim set to 32 <- 8 or 16 also work, and will make the final model more flexible, but will be harder to get right if you're not used to making loras. Do not go above 64 unless you know what that means, as 64~256 will cause significant damage to base sdxl, unless you know how to fix that.
• use all the performance gains you need to make it work (depends on hardware -> stuff like xformers, bitsandbytes, full bf16, etc... - there are other guides on performance gains)
• you can either try a lower LR, and get it working just right, or a higher LR and then use the final lora at like 0.5 strength. (getting it right means more flexibility - but takes like 7 hours to train VS training for 10~25 minutes, but working with getting results via overfitting to some extent)
• enable text encoder training <- while definitely not always recommended, for this specific scenario, if you roughly follow my captioning example, will work really well. Just don't use any words that pertain to anatomy (mostly nsfw stuff), cause that will break your entire training very very quickly

when you're done, run an "infinity grid" to compare results from without vs with lora, and to test different epochs, and to test different cfg + step values + different prompts
(infinity grid can be found in A1111 extensions, or stableswarmui has native support for it)

when you're happy with your lora, remake it with all images you have available, keep all settings the same except learning rate, which you should tone down to around 0.0005, and then generate your truly final final™ LoRA (expect this to run between 5~15 hours on 24gb vram)
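
For the curious, a minimal sketch of what the min-SNR setting does to the loss, following the weighting from the linked paper (arXiv 2303.09556) for epsilon prediction: each timestep's loss is multiplied by min(SNR, gamma) / SNR. The function name is illustrative, not a trainer's API.

```python
def min_snr_weight(alpha_bar_t, gamma=5.0):
    """Loss weight for one timestep under min-SNR weighting (epsilon prediction)."""
    snr = alpha_bar_t / (1.0 - alpha_bar_t)   # signal-to-noise ratio at this timestep
    return min(snr, gamma) / snr              # 1.0 for noisy steps, smaller for clean ones

# Toy alpha_bar values from "almost clean" to "almost pure noise".
for alpha_bar in (0.99, 0.9, 0.5, 0.1):
    print(alpha_bar, round(min_snr_weight(alpha_bar, gamma=5.0), 4))
```

Timesteps with very little noise (high SNR) get their loss scaled down, so they stop dominating the gradient; noisier timesteps keep a weight of 1.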

silent saddle Dec 9, 2023, 3:19 PM

#

@hollow spruce Good point about text encoder. ✨

dire orchid Dec 9, 2023, 3:54 PM

#

hey guys, I'm fairly new to finetuning sdxl (tho I have some experience finetuning LLMs), can someone guide me to where I should look at to get up to speed with everyone?

#

I'm trying to train sdxl with a dalle 3 style captions (have around 5k images), but I feel like I'm missing a lot of knowledge that messes up my attempts

valid sequoia Dec 9, 2023, 4:46 PM

#

dire orchid I'm trying to train sdxl with a dalle 3 style captions (have around 5k images), ...

I'm not sure how dalle 3 captions are, but you want your captions to work on the model you're training (unless it's a completely new context) and depending on what you're training, 5k images is way too many to start with

dire orchid Dec 9, 2023, 4:47 PM

#

valid sequoia I'm not sure how dalle 3 captions are, but you want your captions to work on the...

dalle captions are highly descriptive natural language captions

valid sequoia Dec 9, 2023, 4:48 PM

#

In most cases you're better off with maybe 50 images, I train faces with 10-20 usually

#

but training SD, especially SDXL, is a lot of trial and error. Start with a small set and see if it works at all and then adjust from there

dire orchid Dec 9, 2023, 4:49 PM

#

but like I'm trying to make it learn complex composition and who is where in the image, would a small dataset really give it that understanding?

#

stuff like this guy is in the right and this guy is in the left

valid sequoia Dec 9, 2023, 4:50 PM

#

SDXL in particular is good with natural language prompts, but you don't want your captions to be too long or overly descriptive

#

I'm not sure training a lora is the best option for things like composition

dire orchid Dec 9, 2023, 4:51 PM

#

so full finetuning?

latent charm Dec 9, 2023, 4:51 PM

#

It is quite complicated. I think some model on civitai already used dalle3 like descriptive training.

valid sequoia Dec 9, 2023, 4:51 PM

#

controlnet would be way easier and likely more successful

dire orchid Dec 9, 2023, 4:51 PM

#

latent charm It is quite complicated. I think some model on civitai already used dalle3 like ...

oh really

#

any link please?

#

would love to talk to the guy and see what he learned

latent charm Dec 9, 2023, 4:52 PM

#

I am not sure which. I think pixelwave might be one of them.

#

The author of pixelwave is in this server. You might ask him in sdxl or here

dire orchid Dec 9, 2023, 4:54 PM

#

huh, interesting

#

ah can I ping?

#

@ruby pond Hey man, I'm fairly new to sdxl finetuning and would really love to learn. I tried to finetune sdxl with a dataset captioned by cogvlm (similar to chatgpt4vision), and I had to hack into the dreambooth lora code quite a bit to make it accept long prompts (longer than the native 77 tokens). Would love to learn how you did it

hollow spruce Dec 9, 2023, 5:16 PM

#

@dire orchid
Try to make a text encoder only LoRA as a proof of concept. I can guarantee that that works up to a certain degree.

Assuming you wanna go all the way with your plan, you'll need around 30~50k images with descriptions. Keep in mind they need to be under about 77 tokens, since SDXL is not dalle 3 and can't keep up with such long captions.
If you pull off such a dataset, then your next step is either a cheap finetune, where you finetune on an A6000 or A100 with however big of a batch size you can fit into it + keep accumulation high.
If you have money to spare or a company behind you, then get an 8-stack of A100s or even more, and do either a full finetune or a multi-step finetune (first TE -> freeze TE -> then UNET)

#

in either case, you'll need a good dataset with prompts that arent biased towards any specific subject, since you'll be hardcoding that bias into the TE

dire orchid Dec 9, 2023, 5:17 PM

#

ah, okay, that helps a lot actually

#

Im not sure of my hack and Im sure it fucks up a lot of things

dire orchid Dec 9, 2023, 5:20 PM

#

hollow spruce @dire orchid Try to make a text encoder only LoRA as a proof of concep...

any easy way to train the text encoder?

hollow spruce Dec 9, 2023, 5:20 PM

#

dire orchid any easy way to train the text encoder?

yes
what gpu do you have?

dire orchid Dec 9, 2023, 5:20 PM

#

hollow spruce yes what gpu do you have?

using google colab A100

#

40 gb vram

hollow spruce Dec 9, 2023, 5:20 PM

#

ah neat

#

then you're more than good to go.

use kohya-ss -> enable TE training -> disable unet training
do a few test runs to see how high you can push your batch size. the bigger the better (cause it adds context to how the prompts relate/differ from one another)
then let it run for 5 epochs. (save each epoch) for LR feel free to stay around 0.0008 <- way too high, but this is just for testing it
on an a100 this should be damn fast. far less than a hour.

download your safetensor files. run locally and compare to base. you should see significant changes around epoch 5.

run a few tests like that to eliminate any possible errors in your settings.
when you're happy, run it with LR of 0.0001 to get your first final version.

as your dataset grows, you can slowly increase the network dim, and decrease the LR even further <-- will require a lot of testing to get it just right

dire orchid Dec 9, 2023, 5:27 PM

#

okay, right now I have a super biased dataset that always starts with "This image is", quirk of cogvlm

#

I can easily change it with a simple LLM script

hollow spruce Dec 9, 2023, 5:27 PM

#

for starting, network dim 32 ~ 64 will probably be ideal

#

yeah cogvlm be like that XD

#

I usually limit its space, which gets rid of it. Probably not ideal though, so 🤷‍♂️


dire orchid Dec 9, 2023, 5:37 PM

#

hollow spruce then you're more than good to go. use kohya-ss -> enable TE training -> disable...

does starting with just 5k help or do I need to reach the 50k mark before it has any effect?

hollow spruce Dec 9, 2023, 5:37 PM

#

dire orchid does starting with just 5k help or do I need to reach the 50k mark before it has...

3k is roughly the minimum to make a proper impact. so you're more than ready

dire orchid Dec 9, 2023, 5:38 PM

#

amazing

#

do I need kohya or does dreambooth allow for training just the text encoder?

#

also looking into the dreambooth code, I'm not sure where it saves the trained encoder

hollow spruce Dec 9, 2023, 5:41 PM

#

unless I missed something, kohya was the only one to have proper sdxl LoRA support
(with exception of diffusers - if you wanna go that route)

dire orchid Dec 9, 2023, 5:41 PM

#

ah gotcha

#

then kohya it is

#

since I no longer need to do some random compel stuff to extend the max length, this should be easy

hollow spruce Dec 9, 2023, 5:42 PM

#

ah, you gonna try to extend the token limit and average it?

dire orchid Dec 9, 2023, 5:43 PM

#

I was doing that

#

no longer gonna do that

hollow spruce Dec 9, 2023, 5:43 PM

#

did it work? xD cause it shouldn't... but I really wanted to try it anyway...

dire orchid Dec 9, 2023, 5:43 PM

#

ehh, it did train

#

Im not sure if it trained well

#

I can share the code and you take a look at it

hollow spruce Dec 9, 2023, 5:44 PM

#

I mean the resulting file?
(kohya has native support for that - you can extend up to 225 tokens)

#

so it's not a question of if you can, just if you should 🤣

dire orchid Dec 9, 2023, 5:45 PM

#

ahh

#

wat the hell

#

I spent a week trying to put that into dreambooth

hollow spruce Dec 9, 2023, 5:46 PM

#

latent charm Dec 9, 2023, 5:48 PM

#

all llm generated captions have hallucination. You might filter or fix your dataset before.

dire orchid Dec 9, 2023, 5:49 PM

#

true, I should look through it manually

dire orchid Dec 9, 2023, 5:53 PM

#

hollow spruce I mean the resulting file? (kohya has native support for that - you can extend u...

yo do you have a simple kohya implementation on colab?

#

the ones I see are big for no reason

hollow spruce Dec 9, 2023, 5:54 PM

#

dire orchid yo do you have a simple kohya implementation on colab?

I run stuff locally, so sadly not ^^'

dire orchid Dec 9, 2023, 5:54 PM

#

ah rip

dire orchid Dec 9, 2023, 6:07 PM

#

hollow spruce I run stuff locally, so sadly not ^^'

do you use this gui?

#

or do u just code it up in python

hollow spruce Dec 9, 2023, 6:11 PM

#

dire orchid do you use this gui?

I've done both. but lately only the gui since I've gotten lazy xD

#

use the LoRA tab though

dire orchid Dec 9, 2023, 6:13 PM

#

hollow spruce use the LoRA tab though

I'm looking at the Lora tab and there is so much stuff

#

not sure what to click/change

gentle flame Dec 9, 2023, 6:13 PM

#

gentle flame Does anybody know if finetunes using perturbed noise or min snr gamma produce no...

For anyone that's interested, it WAS perturbation noise. Not my model, but the person finetuning saw the problem lessen once perturbed noise was reduced.

#

If you have a noisy model and are training with perturbed noise, then consider reducing it.

hollow spruce Dec 9, 2023, 6:16 PM

#

dire orchid not sure what to click/change

https://hoshikat-hatenablog-com.translate.goog/entry/2023/05/26/223229?_x_tr_sl=sv&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp

#

here's a good link for what all the settings do

dire orchid Dec 9, 2023, 6:16 PM

#

I'm seeing this and considering just dropping the learning rate to 0 for unet, but that seems extra stupid

dire orchid Dec 9, 2023, 6:17 PM

#

hollow spruce https://hoshikat-hatenablog-com.translate.goog/entry/2023/05/26/223229?_x_tr_sl=...

nice, thank you!

#

I will go through all of it later

hollow spruce Dec 9, 2023, 6:18 PM

#

dire orchid I'm seeing this and considering just dropping the learning rate to 0 for unet, b...

while it would work, it will still use up resources as if it were training x_x

#

@stiff dust do you know the param argument to disable unet training? (so it can be used for Text encoder only)

dire orchid Dec 9, 2023, 6:23 PM

#

maybe just this?

#

something like --train-unet = False

hollow spruce Dec 9, 2023, 6:28 PM

#

I cry everytime I want to find a specific kohya setting 🤣

#

--network_train_text_encoder_only

#

dear god that took way too long to find

#

and yep, it goes into that box
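
A rough sketch of how the flag above might be passed when launching kohya's sd-scripts from Python instead of the GUI box. The script name `sdxl_train_network.py`, the folder names, and the checkpoint filename are assumptions for illustration; check the script's `--help` output for the options your version actually accepts.

```python
import shlex


def build_train_command(train_data_dir, output_dir, base_model,
                        text_encoder_only=False, max_token_length=None):
    """Assemble an argument list for a kohya sd-scripts LoRA run (illustrative only)."""
    cmd = [
        "accelerate", "launch", "sdxl_train_network.py",  # assumed entry point
        f"--pretrained_model_name_or_path={base_model}",
        f"--train_data_dir={train_data_dir}",
        f"--output_dir={output_dir}",
        "--network_module=networks.lora",
    ]
    if text_encoder_only:
        # The flag found in the chat: skip the U-Net, train only the text encoder LoRA.
        cmd.append("--network_train_text_encoder_only")
    if max_token_length is not None:
        # Extended caption length mentioned earlier in the chat (up to 225 tokens).
        cmd.append(f"--max_token_length={max_token_length}")
    return cmd


# Hypothetical paths and model file, purely for illustration.
cmd = build_train_command("train_img", "output", "sd_xl_base_1.0.safetensors",
                          text_encoder_only=True, max_token_length=225)
print(shlex.join(cmd))
```

Building the list first and printing it makes it easy to paste the same extra arguments into the GUI's additional-parameters box.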

#

@dire orchid found it

#

also you should hide that link again ^^

dire orchid Dec 9, 2023, 7:14 PM

#

hollow spruce also you should hide that link again ^^

Thanks for telling me lol

dire orchid Dec 9, 2023, 7:15 PM

#

hollow spruce dear god that took way too long to find

Thank you man!

#

Did you get that from parse args?

#

I should have done that work sorry

ruby pond Dec 9, 2023, 9:32 PM

#

dire orchid <@383243561044410378> Hey man, I'm fairly new to sdxl finetuning and would reall...

I train LoRAs with the number of tokens set to 225 in kohya 🙂

dire orchid Dec 9, 2023, 9:32 PM

#

ah, nice

#

you use kohya_ss?

ruby pond Dec 9, 2023, 9:32 PM

#

Yeah

dire orchid Dec 9, 2023, 9:33 PM

#

hmm, I wanted to ask a question about that, can you pass kohya an hf dataset that it can parse on its own?

#

rather than download the images and the captions and send the local folder

#

this is the dataset in question: https://huggingface.co/datasets/Yorth/dalleTestDataFiltered2

Yorth/dalleTestDataFiltered2 · Datasets at Hugging Face

ruby pond Dec 9, 2023, 9:34 PM

#

I haven't tried that, only done training on local data

dire orchid Dec 9, 2023, 9:34 PM

#

ah gotcha

#

so, how do you make sure that the images and caption pairs don't get randomly scrambled?

#

That's kinda what Im afraid of here

ruby pond Dec 9, 2023, 9:35 PM

#

Scrambled?

dire orchid Dec 9, 2023, 9:35 PM

#

as in a caption is linked with the wrong image

#

they're two folders right?

ruby pond Dec 9, 2023, 9:36 PM

#

Each image has a txt file with a matching name in the same folder
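
Since the image/caption link is just a matching filename, a quick check for stray files can rule out mismatched pairs before training. This is a minimal sketch; the extension list and the example filenames are assumptions, not something kohya prescribes.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed set of image extensions


def find_unpaired(folder):
    """Return (images without a caption, captions without an image), by filename stem."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    images = {p.stem for p in files if p.suffix.lower() in IMAGE_EXTS}
    captions = {p.stem for p in files if p.suffix.lower() == ".txt"}
    return sorted(images - captions), sorted(captions - images)


# Demo on a throwaway folder with one good pair and two strays.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0001.jpeg", "0001.txt", "0002.jpeg", "0003.txt"):
        (Path(tmp) / name).touch()
    missing_captions, orphan_captions = find_unpaired(tmp)

print(missing_captions, orphan_captions)
```

Naming files by dataset id, as suggested in the chat, keeps the stems unique so this check stays meaningful.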

dire orchid Dec 9, 2023, 9:36 PM

#

ah

#

so that's how the link is done, gotcha

#

imma just use id then for the name

#

and do you just specify the image folder here and thats it?

#

or is there like a caption thing somewhere else where I need to pass the same folder

#

sorry for asking too much btw

ruby pond Dec 9, 2023, 9:39 PM

#

That folder should have a folder or multiple folders that are named starting with a number which is the number of repeats per epoch, e.g. 1_photo

#

The name isn't used unless you don't provide captions for each image

dire orchid Dec 9, 2023, 9:41 PM

#

so, something like 1_1.jpeg and 1_1.txt could work?

#

for like the first image for example

#

this way I keep the names unique

ruby pond Dec 9, 2023, 9:51 PM

#

dire orchid so, something like 1_1.jpeg and 1_1.txt could work?

The filenames don't need the number at the start, just the folder the files are in. E.g. I usually have a train_img folder with multiple folders like 1_photo 1_painting

#

Then select the train_img folder
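
The `N_name` folder convention described above can be sketched as a small parser. This only illustrates the naming rule from the chat (leading number = repeats per epoch); it is not kohya's own parsing code.

```python
import re


def parse_repeat_folder(name):
    """Split a folder name like '1_photo' into (repeats, label); None if it doesn't match."""
    match = re.match(r"^(\d+)_(.+)$", name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


for folder in ("1_photo", "1_painting", "photo"):
    print(folder, "->", parse_repeat_folder(folder))
```

A folder without the numeric prefix returns `None` here, which is a handy way to spot subfolders that don't follow the convention.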

dire orchid Dec 9, 2023, 9:51 PM

#

so these subfolders are for classes?

#

or something like that?

#

or do they just hold image text pairs?

#

I'm not sure what's their use

ruby pond Dec 9, 2023, 10:25 PM

#

dire orchid I'm not sure what's their use

It uses the folder name if the images aren't captioned. Otherwise just for organising or to balance how many times the images are seen because the number is how many repeats per epoch

dire orchid Dec 9, 2023, 10:27 PM

#

ruby pond It uses the folder name if the images aren't captioned. Otherwise just for organ...

did you do many repeats per epoch?

#

I'm thinking no repeat 5 epochs, unet plus text encoder training

#

over 5k images currently

ruby pond Dec 9, 2023, 10:30 PM

#

Depends. If it's more of a general training on lots of photos, then 1 repeat 30 epochs. If training on a handful of images for a specific thing then I might do 10 repeats for 10 epochs
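
To get a feel for how repeats and epochs translate into compute, here is a back-of-the-envelope step count. The batch sizes and the 20-image figure are assumptions for illustration, and real step counts can differ slightly (for example with bucketing).

```python
import math


def total_steps(folders, epochs, batch_size):
    """folders: list of (num_images, repeats) tuples, one per 'N_name' subfolder."""
    samples_per_epoch = sum(images * repeats for images, repeats in folders)
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch * epochs


# "General" run: ~5k images, 1 repeat, 30 epochs (batch size 4 is an assumption).
general = total_steps([(5000, 1)], epochs=30, batch_size=4)
# "Specific thing" run: a handful of images, 10 repeats, 10 epochs (20 images, batch 2 assumed).
specific = total_steps([(20, 10)], epochs=10, batch_size=2)
print(general, specific)
```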

dire orchid Dec 9, 2023, 10:31 PM

#

ah gotcha

#

trying to do general

#

that will prolly take a lot of compute then

ruby pond Dec 9, 2023, 10:34 PM

#

I usually just run overnight while I'm sleeping 🙂

dire orchid Dec 9, 2023, 10:35 PM

#

yeah that make sense lol

#

also, wanted to ask, did you train text encoder too?

#

seems like it would help with model understanding too

ruby pond Dec 9, 2023, 10:50 PM

#

dire orchid also, wanted to ask, did you train text encoder too?

I did, but then ended up putting the original sdxl clip model back into PixelWave as it started to behave weirdly. So I'd probably only train it if absolutely necessary, otherwise don't train it

dire orchid Dec 9, 2023, 10:52 PM

#

ah gotcha

#

I trained my text encoder and it seemed to help

#

at least compositionwise

#

@ruby pond from my evaluation, the base model and the run without text encoder training both mixed up right and left about half the time

#

sample size was 20 images

#

but then again, I trained it using dreambooth and I hacked into that code a lot

#

so not sure if that was better or worse than the kohya implementation at least for this case

ruby pond Dec 10, 2023, 12:12 AM

#

dire orchid <@383243561044410378> it messed up right from left half the time as base and no ...

lots of trial and error with training 🙂 even when you have something that works well with the data and captions, it will not work well on a different set of data, and could be the captions at fault, could be the images, could be the settings

dusky urchin Dec 11, 2023, 12:18 AM

#

gray pine Image captioning for LoRa Hi there, I find image captioning a bit confusing as ...

can you give some examples of captions you would like to render?

dusky urchin Dec 11, 2023, 12:18 AM

#

dire orchid I trained my text encoder and it seemed to help

a textual inversion may be easier and simpler to train

dusky urchin Dec 11, 2023, 12:23 AM

#

hollow spruce

where is the source code for this approach?

sacred grail Dec 12, 2023, 8:01 AM

#

what's the best auto-captioner out now? any progress on that lately?

lavish stream Dec 13, 2023, 2:51 PM

#

sacred grail what's the best auto-captioner out now? any progress on that lately?

Blip has been quite good for me.

#

Now training 4500/1000 steps with around 170 images. Using Adafactor with linear LR scheduler, using 3e-5 as the LR. Is the quality supposed to get better? I thought linear is going down from that initial value?

#

I'm running the Kohya-trainer-XL script, not sure if this is the latest, found it via a direct Google link. It isn't linked on Linaqruf's GitHub.

#

relative_step has to be False, but doesn't that break the Adafactor LR adjustment? With it set to True, the script says warmup must be 0, but even if I set it to 0, training does not run. The Adafactor scheduler never works, I get the same error every time. I thought the Adafactor scheduler would be the best match for the Adafactor optimiser?

#

Speaking of the linear scheduler, where does it head for the LR? Absolute zero?
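
For reference, the usual linear schedule (as in the common "linear with warmup" schedulers) decays the learning rate to exactly zero at the final step. A minimal sketch, assuming that standard formula; the trainer in question may implement it differently.

```python
def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Using the 3e-5 starting LR from the message above over a hypothetical 1000 steps.
for step in (0, 250, 500, 750, 1000):
    print(step, linear_lr(step, base_lr=3e-5, total_steps=1000))
```

So with this schedule the late steps barely change the weights, which is why running far past the planned step count does not help.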

latent charm Dec 13, 2023, 3:16 PM

#

Planned to train a lora with Pick a pic v2 dataset. 1k image pairs proof of concept testing. Next step is to train with 10k image pairs.

#

Original, 200 steps lora with 1k image pairs

rotund cargo Dec 13, 2023, 5:53 PM

#

Greetings, fellow creators. I want to ask you if any of you know how to maintain consistency in a face without using ReActor or any other faceswappers out there. Should I just train a checkpoint for my model, try specific prompts (I'd rather not), any other ideas? I appreciate your help.

grave carbon Dec 14, 2023, 6:00 AM

#

cat

stiff dust Dec 14, 2023, 11:43 AM

#

lavish stream Now training 4500/1000 steps with around 170 images. Using Adafactor with linear...

I would say, if the image is destroyed like this it can just get worse. What is your base model? Some custom models like juggernaut use really weird noise settings and can be difficult to finetune.

stiff dust Dec 14, 2023, 11:45 AM

#

sacred grail what's the best auto-captioner out now? any progress on that lately?

BLIP is really bad in my opinion. LLaVA is much better, you have to use quantization, though, if you want to use the biggest model. Then there is CogVLM, which I think is the best autocaptioning model (except for ChatGPT4 maybe)

regal harbor Dec 15, 2023, 7:25 AM

#

everyone says to use constant, but it seems more logical to me to use linear... wouldn't it prevent over training by doing smaller steps near the end?
I also don't really understand what 'loss' means

lone zenith Dec 15, 2023, 7:36 AM

#

During training an AI is asked to produce something and that is compared against what the AI is being trained to produce, and the difference is the loss. During training, the model is slowly tweaked to minimize the loss.
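
As a concrete illustration of that definition: in the common noise-prediction setup for diffusion models, the loss is typically the mean squared error between the noise the model predicts and the noise that was actually added. A toy sketch with made-up numbers:

```python
def mse_loss(predicted, target):
    """Mean squared error between two equal-length lists of numbers."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


true_noise = [0.5, -1.0, 0.25, 0.0]       # noise actually added to the image (made up)
predicted_noise = [0.4, -0.8, 0.25, 0.1]  # what the model guessed (made up)
loss = mse_loss(predicted_noise, true_noise)
print(loss)  # roughly 0.015; a perfect guess would give 0.0
```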

icy valley Dec 15, 2023, 7:55 AM

#

Hi what s the best way to train sdxl model?

stone garden Dec 16, 2023, 7:01 AM

#

I am trying to figure out lora training and I have 10 epochs that I am running through, about half of them seem to ignore the character I trained the lora on and fulfil everything else in the prompt, I get the same image from 5 epochs that has nothing to do with the lora I want. Does that mean those epochs were overtrained or is there something in the way I am testing it?

stone garden Dec 16, 2023, 7:02 AM

#

icy valley Hi what s the best way to train sdxl model?

I am a newbie to all this, but three ways I am aware of: 1) downloading and using Kohya SS, 2) using a Google Colab of Kohya SS, 3) using the Civitai system. The first two didn't work for me, so I am using the third, but you have to pay for it.

stiff dust Dec 16, 2023, 9:29 AM

#

stone garden I am trying to figure out lora training and I have 10 epochs that I am running t...

what are the captions of your training data?

opal jacinth Dec 16, 2023, 1:23 PM

#

stiff dust BLIP is really bad in my opinion. LLaVA is much better, you have to use quantiza...

CogVLM just released something new yesterday

#

https://github.com/THUDM/CogVLM#introduction-to-cogagent

coral creek Dec 17, 2023, 9:35 PM

#

hey everyone, I'm working on fine-tuning a stable diffusion 1.5 model. was curious if anyone has a good way of seeing if the model is improving?

the problem I have is that because the noise + train timesteps are random the loss is very "jumpy" and doesn't give much signal. does anyone have experience with metrics that help indicate it is learning? or does the fact that it is jumpy mean the network is not learning?
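
One common way to read a jumpy loss curve is to smooth it with an exponential moving average and watch the trend instead of individual steps. A minimal sketch; the `alpha` value and the sample losses are arbitrary assumptions.

```python
def ema(values, alpha=0.1):
    """Exponential moving average; smaller alpha means heavier smoothing."""
    smoothed = []
    average = None
    for value in values:
        average = value if average is None else alpha * value + (1 - alpha) * average
        smoothed.append(average)
    return smoothed


# Made-up noisy losses that trend slightly downward.
raw_losses = [0.30, 0.10, 0.28, 0.09, 0.25, 0.08, 0.22, 0.07]
smoothed_losses = ema(raw_losses, alpha=0.2)
print([round(x, 3) for x in smoothed_losses])
```

Logging the smoothed value alongside the raw one, or evaluating on a fixed set of images with fixed noise and timesteps, tends to give a steadier signal than the per-step training loss.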

normal ember Dec 17, 2023, 10:17 PM

#

A/B testing is the way most do. Stability seems to have taken it a bit further with ELO scores.

coral creek Dec 17, 2023, 10:32 PM

#

normal ember A/B testing is the way most do. Stability seems to have taken it a bit further w...

thanks for the reply, for more context I'm trying to replicate animate anyone completely from scratch. the images generated are 99% still noise so I'm trying to debug if something is wrong with my code / architecture. I don't think I'm anywhere near being able to a/b test effectively and was curious if there is something that can be done much closer to the start of the training process

lavish stream Dec 18, 2023, 5:46 AM

#

stiff dust I would say, if the image is destroyed like this it can just get worse. What is ...

It was the base SDXL 1.0. The problem was the linear scheduler, and maybe a bad LR. I have to use constant with warmup and a constant rate, which kind of defeats the purpose of a dynamic optimiser, but the results are OK at least.

#

It's weird how almost none of the optimiser/scheduler settings even run in the original Kohya Colab script, and also Prodigy is missing :/ The GUI version is much better in this sense.

latent charm Dec 18, 2023, 5:49 AM

#

coral creek thanks for the reply, for more context I'm trying to replicate animate anyone co...

you might try constant lr to see does the loss go down.

dusky urchin Dec 18, 2023, 6:34 PM

#

does anyone have experience resuming training on a LoRA that was distributed on civitai? aka no gradients or related data, just weights. i understand that kohya supports resuming training, but on its own little data bundle. my expectation is that this has to be recreated with undefined data for the training snapshot, but with trained weights

dusky urchin Dec 18, 2023, 6:35 PM

#

stone garden I am trying to figure out lora training and I have 10 epochs that I am running t...

you have to use a rare token for character names, not the character names themselves, in a caption

boreal elk Dec 18, 2023, 7:24 PM

#

Does anyone know how to make the transitions between batches smoother in comfyui? I’m doing 150 batches and you can see the cut between them at 150,300,450 frames and so on. Is there a node that saves the last 5 frames from the previous batch so it will use that map for the new batch? If so what is the name and where does it go? Thanks! 🙏 (unless that node increases render time, then does anyone know how to make it smooth with davinci resolve?)

stiff dust Dec 19, 2023, 10:17 AM

#

dusky urchin you have to use a rare token for character names, not the character names themse...

rare tokens are not necessary in all cases. In particular, if the character already has a unique name

stiff dust Dec 19, 2023, 10:18 AM

#

dusky urchin does anyone have experience resuming training on a LoRA that was distributed o...

you always can resume training. The first steps might be a bit destructive without the gradient statistics, but you could use learning rate warmup for the first epoch
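
The warmup idea can be sketched as a "constant with warmup" schedule: ramp the learning rate up from zero over the first steps so the optimizer can rebuild its statistics before taking full-size updates. The step counts and learning rate below are placeholders, not recommendations.

```python
def constant_with_warmup_lr(step, base_lr, warmup_steps):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


# Hypothetical: warm up over the first 100 steps of the resumed run.
for step in (0, 50, 100, 500):
    print(step, constant_with_warmup_lr(step, base_lr=1e-4, warmup_steps=100))
```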

steel tide Dec 19, 2023, 8:19 PM

#

I've been creating SDXL LORAs of cartoon characters which for the most part work great at replicating the characters and style - However I'd like them to be more flexible so that I can change the overall style to something that it wasn't trained on as currently they're following the training images a bit too closely. Should I add some sort of captioning for the style?

bob belcher, wearing a white shirt, mouth open, sat in a bank

rancid tartan Dec 20, 2023, 12:37 AM

#

I'm trying to train a LoRA of a specific building. I'm testing the LoRA now and it can consistently generate the general shapes of the building. However, very often the proportions are off and/or some parts of the building go on the wrong place or are just missing. And the building repeats everywhere. The building is a bit complicated, but I wonder if there is a way to make it more consistent

robust urchin Dec 20, 2023, 12:46 AM

#

Am I wrong, or is it that the more description you give in your .txt when you train your lora, the more freedom you get to change elements?

#

For example I couldn't get a desert background until I described all my backgrounds in my database

#

It seems like if you don't describe it in your database, your lora will generate it automatically. But if you do, it will give you the choice to modify it or not.

jade hornet Dec 20, 2023, 2:14 AM

#

robust urchin Am I wrong or more you give description in your .txt when you train your lora, t...

This is true, you describe anything you want to be able to change later. If you describe their hair, you can prompt for other colors. If you describe their clothing, you can prompt for other outfits. There's also an element of how overtrained it is... To some degree you can get away with not describing things, until you train a certain number of steps

robust urchin Dec 20, 2023, 8:39 AM

#

jade hornet This is true, you describe anything you want to be about to change later. If yo...

Oh ok thank you. I try to describe clothing now. In order to see if I can change them easily.

Also, the gender is easier to change when you say it is a man. Instead of nothing.

And, I'm not sure but it seems a lower learning rate and repetition may change the way of changing things.

For example with a higher learning rate of 0.0004 + repetition of 10, the lora becomes overtrained quickly. Like around 20-25 epoch.

When at 0.0001 or 0.00005 and repetition of 2. After 100 epochs, lora stays flexible.

#

But sdxl lora are impressive and catch easier what you want. For example the spear the jaffa is holding. I put more Ma'tok staff images in my database and described it like a spear.

And it starts to look more like the thing from the show than a regular spear:

#

Same for the tissue strips attached to the helmet. They started to be generated when I described them in my .txt.

potent rain Dec 20, 2023, 7:52 PM

#

Hi all! Where do you put / find your finetuned SD models? I'm looking for a good existing anime-centric finetune of 1.5 . I figure there must be a dozen options for this, but can't seem to find any.

stray mulch Dec 20, 2023, 8:56 PM

#

Hello, I need help with Inpaint and hands.. How can I solve this issue?

#

How do I set it up to fix the hands? :/

#

I am new with intune

robust urchin Dec 21, 2023, 2:25 AM

#

Even in describing the clothes, it is really hard to change them. Like I cannot get my jaffa in a suit with a tie...

#

But the Matok shaft is awesome now...

robust urchin Dec 21, 2023, 2:45 AM

#

and I can control the eye color....

#

oh yeah....

#

start understanding how to train my lora...

sacred grail Dec 21, 2023, 3:59 PM

#

robust urchin Even in describing the clothes, it is really hard to change them. Like I cannot ...

If you describe what its wearing in the training you’ll probably be able to control it
For example if you put desert background in the training image it will know that the background is a desert so it’ll be able to keep it out if you want to, otherwise it will just train it in and assume it should be there by default

robust urchin Dec 21, 2023, 4:00 PM

#

sacred grail If you describe what its wearing in the training you’ll probably be able to cont...

I put green bodysuit + green and purple loincloth + antique belt

#

But it doesn't seem to change it...

robust urchin Dec 21, 2023, 6:00 PM

#

I train with armor. Not easy to modify the clothes but I start getting something

#

I remove the head and use that in my data base

feral orbit Dec 21, 2023, 9:13 PM

#

Hey all, pretty new to the space so sorry if this is a stupid question. I just made a dreambooth finetune using epicphotogasm, but all the images coming out of it are a bit fried. I used 15 images to train it. Also I've run this same prompt like 10x, and just about every single one has the same pose, facial expression, angle, etc. Does this mean it's overfitted? Any pointers on what I'm likely doing wrong here, or where I should look to get my answers? I've been googling for a while and haven't quite found anything clear yet.

robust urchin Dec 22, 2023, 2:17 AM

#

It works !!!!!!

#

not perfect but much better than my last tries

#

they look intimidating now...

#

god bless sdxl lora...

robust urchin Dec 22, 2023, 3:33 AM

#

And I keep flexibility for gender and background....

#

ok I will try few last things and start the ultimate lora about stargate !!!

#

https://tenor.com/view/stargate-sg1-gif-19263898

Tenor

#

damn... it really works....

dusky urchin Dec 22, 2023, 7:41 PM

#

robust urchin damn... it really works....

lol i am reading this post on reddit now. good job

robust urchin Dec 22, 2023, 8:22 PM

#

dusky urchin lol i am reading this post on reddit now. good job

https://tenor.com/view/jethroe-kurt-russell-countach-stargate-gif-18077845

Tenor

#

A new ai god is born

robust urchin Dec 22, 2023, 10:10 PM

#

Also, I'm not sure yet but I think it is important to remove the background when we can... SDXL Lora is really powerful and if you have a desert background, it will learn it at 100% and replace the random desert background from the sdxl model with the one from the sdxl lora.

#

they should call SDXL Lora, CTRL + C, CTRL + V...

#

weird because previous Loras from 1.5 and 2.1 needed a background. Otherwise, you got only empty background everywhere....

robust urchin Dec 23, 2023, 1:19 AM

#

I think I have the perfect parameters now...

stone garden Dec 23, 2023, 12:10 PM

#

What is fine tuning?

robust urchin Dec 23, 2023, 3:26 PM

#

stone garden What is fine tuning?

subtle art of creating lora, embedding, hypernetwork , etc...

#

And I found a guide on captioning. The guy uses almost everything I use, but in more detail. I didn't know if I was making a mistake in using danbooru tags. But it seems good.

#

https://www.reddit.com/r/StableDiffusion/comments/118spz6/captioning_datasets_for_training_purposes/

From the StableDiffusion community on Reddit: Captioning Datasets f...

Explore this post and more from the StableDiffusion community

stone garden Dec 23, 2023, 3:27 PM

#

robust urchin subtle art of creating lora, embedding, hypernetwork , etc...

What do any of those mean?

robust urchin Dec 23, 2023, 3:27 PM

#

stone garden What do any of those mean?

huuh

#

https://civitai.com/models

Civitai Models | Discover Free Stable Diffusion Models

Browse from thousands of free Stable Diffusion models, spanning unique anime art styles, immersive 3D renders, stunning photorealism, and more

#

checkpoints are general models

#

lora are some kind of style or you could call that flavor to the general model

#

for example, if you think the base Sdxl model is not able to generate a correct baby yoda. You will train it with a data base to have the one you want:

#

But most of Lora trained on civitai are useless. People are just bad at prompt engineering. Therefore, they create a lora. Since you can get a good baby yoda from the base model without training if you use the correct prompt.

#

But in my case, I try to create a stargate universe lora. And sdxl do not know that. Because sdxl dev probably do not know what stargate is or copyrights issue. Therefore, I have to create a data base and train a lora in order to get that:

#

https://tenor.com/view/stargate-atlantis-stargateportal-gif-14618984

Tenor

#

Shortly, you just try to make your model learn something it doesn't know. And that is a lora, embedding, hypernetwork, fine tuning...

topaz needle Dec 24, 2023, 11:48 AM

#

robust urchin But in my case, I try to create a stargate universe lora. And sdxl do not know t...

I am finding this trouble too with warhammer, are there any guides or tutorials to make your own lora?

swift kite Dec 24, 2023, 11:49 AM

#

/img 3d

jade hornet Dec 24, 2023, 3:01 PM

#

topaz needle I am finding this trouble too with warhammer, are there any guides or tutorials ...

So many. You really need 2-3 tutorials. One that focuses on proper captioning - Google for a reddit post named "captioning datasets for training purposes". Another that will explain the settings of your chosen trainer, kohya scripts or kohya_ss wrapper, everydream 2, etc. and optionally one to show how to use cloud compute, unless you happen to have a top of the line card with lots of VRAM. Here's one that combines some of those https://aituts.com/sdxl-lora/

AiTuts

Yubin Ma

How to train an SDXL LoRA (Koyha with Runpod) - AiTuts

This guide will cover training an SDXL LoRA. It's meant to get you to a high-quality LoRA that you can use with SDXL models as fast as possible. "Fast" is relative of course. Gathering a high quality training dataset will take quite a bit of time. The actual model training will also take time, but ... Read more

robust urchin Dec 24, 2023, 3:04 PM

#

topaz needle I am finding this trouble too with warhammer, are there any guides or tutorials ...

There are guides on reddit. Type sdxl lora reddit. Or on youtube.

#

And you can train your lora locally or on colab

jade hornet Dec 24, 2023, 3:07 PM

#

Colab pro, vast.ai,runpod. Colab free not recommended

robust urchin Dec 24, 2023, 3:07 PM

#

Locally = use your own graphic card
Colab = use a cloud

#

https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-trainer-XL.ipynb

Google Colaboratory

jade hornet Dec 24, 2023, 3:07 PM

#

Unless you want crappy Lora, and piss off Google at the same time

robust urchin Dec 24, 2023, 3:09 PM

#

Colab free works great for me. I opened 10 accounts and switch from one to another.

Only problem is my training must last less than 4h or I get kick out...

#

But it is ok. Instead of creating one big lora of 400 images. I just split it in 20 images

jade hornet Dec 24, 2023, 3:10 PM

#

robust urchin Colab free works great for me. I opened 10 accounts and switch from one to anoth...

Not enough resources to do it correctly

robust urchin Dec 24, 2023, 3:13 PM

#

Yep sdxl requires a lot of GPU ram... But Im not here to sell anything or create the ultimate sdxl lora. It is just to experiment and see some sexy jaffa

stiff dust Dec 24, 2023, 3:41 PM

#

there are so many tutorials and each claims to be the ultimate guide. Don't take them too seriously. If you have enough vram (12 gb is good) you should rather experiment yourself

robust urchin Dec 24, 2023, 4:39 PM

#

what is frustrating is I need a lora around 40 epoch to get good details and flexibility. Also, I need to keep batch size at 3-4 and repetition around 5. If I reduce my database I lose control too. So I need at least 20 images.
So to train my lora it is something between 3h and 4h...

#

I got disconnected after 2h30 for 2 days....

#

last one stopped at 35 epoch, I was so close to get something really good...

#

when I think about all these guys with their 4090 just making nudes... Give me the same thing and I flood the world with real ai art !

robust urchin Dec 24, 2023, 5:10 PM

#

some stuffs dont work but damn I was close...

rare holly Dec 24, 2023, 7:52 PM

#

I want to fix the collar and name tag in this picture.

#

I'm trying to use inpaint sketch, but it's changing everything for the worse.

#

What am I doing wrong here?

stiff dust Dec 24, 2023, 8:18 PM

#

you can try better prompt. Add "detailed, collar" into negative prompt

rare holly Dec 24, 2023, 8:26 PM

#

stiff dust you can try better prompt. Add "detailed, collar" into negative prompt

I can't get the spots on the nose to work with my current prompt

#

I was able to add those with inpaint sketch

#

I'm just not getting consistent results with inpaint sketch probably because I don't understand it very well

stiff dust Dec 24, 2023, 8:33 PM

#

inpaint Sketch is identical to just changing the image with any graphics program before doing inpainting

rare holly Dec 24, 2023, 8:35 PM

#

stiff dust inpaint Sketch is identical to just changing the image with any graphics program...

I'm not sure what you mean

stiff dust Dec 24, 2023, 8:35 PM

#

rare holly I can't get the spots on the nose to work with my current prompt

with inpainting you can just change the neck

#

inpaint sketch is the same as normal inpainting. But if you draw a red dot on the image and inpaint it it is more likely it gets something red

rare holly Dec 24, 2023, 9:02 PM

#

stiff dust inpaint sketch is the same as normal inpainting. But if you draw a red dot on th...

Thanks for the input

red hornet Dec 24, 2023, 11:10 PM

#

With ControlNet Inpainting, how is the behavior different with and without an input image in the ControlNet panel?

topaz needle Dec 25, 2023, 3:07 AM

#

Is it okay if I only install SDXL through civitAI as a checkpoint?

#

Or do I have to do anything else

#

I am soft in the head

robust urchin Dec 25, 2023, 4:21 AM

#

topaz needle Is it okay if I only install SDXL through civitAI as a checkpoint?

install sdxl turbo

#

faster and same quality

#

https://www.youtube.com/watch?v=2r3uM_b3zA8&t=544s

YouTube

Nerdy Rodent

How to Install ComfyUI in 2023 - Ideal for SDXL!

In this ComfyUI tutorial I show how to install ComfyUI and use it to generate amazing AI generated images with SDXL! ComfyUI is especially useful for SDXL as poor old Automatic1111 can have a hard time using it - especially if you try to use the refiner! Also works great for Stable Diffusion 1.5!

It's really easy to install - especially with th...

#

Me Im blocked on SDXL + Controlnet...

#

I try to use my Lora with controlnet but it doesn't seem to work in Comfyui...

dusky urchin Dec 25, 2023, 6:12 AM

#

robust urchin when I think about all these guys with their 4090 just making nudes... Give me t...

i can give you access to more resources

wheat rock Dec 25, 2023, 6:32 AM

#

I am trying to train dreambooth lora sdxl with diffusers

#

But doesn't work well for me.

#

I am using this code.

#

https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_DreamBooth_LoRA_.ipynb

Google Colaboratory

#

When I change the image directory, it does work

sacred grail Dec 25, 2023, 7:20 AM

#

Could it be possible to merge a specific prompt and seed into another model?

#

Im wondering the same, it could be really sick to do a bot that has all custom models and then asks which is the best image and merges the best prompt+seed combo into the base model

latent charm Dec 25, 2023, 8:17 AM

#

sacred grail Im wondering the same, it could be really sick to do a bot that has all custom m...

you are generally asking for Reinforcement Learning from Human Feedback. The recently released DPO paper made use of the pickapicv2 dataset for such a thing.

#

Merging with layers doesn't directly affect a specific concept. You might try the slider repo prompt training to extract that. https://sliders.baulab.info/

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Precise and continuous control over concepts through low-rank model parameter directions

stiff dust Dec 25, 2023, 11:56 AM

#

in general, the first and last layers encode fine details and the middle layer encodes composition

#

there is no general formula. Use validation images and train until the images look good. Often it's better to slightly overtrain the lora.

wise dust Dec 25, 2023, 12:45 PM

#

I asked that almost a year ago 😭

stiff dust Dec 25, 2023, 12:47 PM

#

lol, sry, my discord is really buggy at the moment 😂

robust urchin Dec 25, 2023, 3:12 PM

#

dusky urchin i can give you access to more resources

I think some people need to have handicap. Because they are so talented, the world is not ready. If with 8GB I can create that, I will probably destroy the competition with something more powerful. And merry christmas !

robust urchin Dec 25, 2023, 6:23 PM

#

it finally works....

#

robust urchin Dec 25, 2023, 6:53 PM

#

robust urchin Dec 26, 2023, 6:56 AM

#

dusky urchin Dec 28, 2023, 1:18 AM

#

i'm experimenting with some effects like adding and removing shadows. is there a way i can query clip for captions whose embedding is -shadow? or is there a way to train with negative conditioning in the text encoder?

south narwhal Dec 28, 2023, 6:58 AM

#

hey guys, can anyone help? I keep getting ValueError: '/data/DuaLipa/DL 3.jpg' is not in list when I'm trying to train model in dreambooth, anyone know why?

astral island Dec 28, 2023, 9:41 AM

#

when training dreambooth or lora, is there any mechanism to try to steer the model towards only modifying the relevant parts?
for example, when I have a dataset that consists of images of "lamborghini super car driving on a mountain road", is there any mechanism that helps the training algorithm to only modify the neurons responsible for "lamborghini","super car","driving" and "mountain road"?

full remnant Dec 29, 2023, 4:49 AM

#

Wanna train a model for JAV-style images (SDXL finetuned on movies snapshots and posters), too bad there is no suitable dataset for that. 😢

dusky urchin Dec 29, 2023, 5:55 AM

#

astral island when training dreambooth or lora, is there any mechanism to try to steer the mod...

text encoders will learn contrastive examples best. so you'd need images of
a ford pinto car driving on a mountain road (lamborghini contrast)
a Lamborghini motorcycle driving on a mountain road (car contrast)
a lamborghini super car parked on a mountain road (driving contrast. observe this may be indistinguishable from driving, which should illuminate for you why driving is hard to depict: it is not something you can see)
...

stiff dust Dec 29, 2023, 1:45 PM

#

dusky urchin i'm experimenting with some effects like adding and removing shadows. is there a...

you can interpolate embeddings, e.g. take the two captions "a tree near a house" and "a tree near a house with shadow" and then for each token compute the difference between the embeddings of both captions. You can then add the difference multiplied by some scaling factor to increase or decrease the shadow
there are lora sliders that train the unet to increase or decrease a certain term, see https://github.com/rohitgandikota/sliders/tree/main
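
A rough sketch of the embedding interpolation described above, assuming both captions have already been encoded and padded to the same token length (as CLIP-style text encoders usually do). The function name `shift_embedding` and the toy vectors are made up for illustration; real embeddings would come from the text encoder.

```python
def shift_embedding(base, with_concept, scale):
    """Move `base` along the per-token direction (with_concept - base).

    base, with_concept: lists of token vectors (lists of floats), same shape.
    scale > 0 strengthens the concept, scale < 0 pushes away from it.
    """
    if len(base) != len(with_concept):
        raise ValueError("both captions must be padded to the same token length")
    shifted = []
    for tok_a, tok_b in zip(base, with_concept):
        shifted.append([a + scale * (b - a) for a, b in zip(tok_a, tok_b)])
    return shifted


# Toy 2-token, 3-dimensional "embeddings" (hypothetical numbers).
plain = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]    # "a tree near a house"
shadow = [[0.0, 1.0, 2.0], [2.0, 0.0, 1.0]]   # "... with shadow"

more_shadow = shift_embedding(plain, shadow, 1.5)   # overshoot toward shadow
less_shadow = shift_embedding(plain, shadow, -1.0)  # move away from shadow
```

With `scale=1.0` this returns the "with shadow" embedding, and with `scale=0.0` the plain one, so the scale works like a slider between and beyond the two captions.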

dusky urchin Dec 29, 2023, 4:50 PM

#

stiff dust - you can interpolate embeddings, e.g. take the two captions "a tree near a hous...

i guess shadows are complex because there's "shadow" in the sense of an explicit image of a shadow, and shadow in the sense of the intrinsic rules of lighting. i am trying to express that the image is missing the intrinsic rule of lighting that causes shadow, and that we want it back. slider lora sounds promising - to me this sounds like colorizing a black and white photograph, like something that needs a distinct approach to fix.

dusky urchin Dec 31, 2023, 5:32 AM

#

is there any community with deepfloyd loras?

lament zodiac Jan 1, 2024, 10:08 AM

#

I am fairly new to training stable diffusion. i have trained a couple of lora models for a specific use case, and now i want to do it at large scale with a big dataset.
i used the kohya ss gui to train the lora. now i am using a dataset with 260000 images to train a lora, but i am unable to start it

I am using runpod with an rtx 4090 gpu, using batch-size: 1

Any idea how to do training or finetuning on a large dataset?

stiff dust Jan 1, 2024, 11:52 AM

#

what's your dim and optimizer?

lament zodiac Jan 1, 2024, 1:16 PM

#

optimizer: AdamW
no idea about dim

stiff dust Jan 1, 2024, 3:27 PM

#

AdamW should be fine...

dim/rank is one of the most important parameters, you should check that. With so many images you can probably use a high rank, but it should still be below 64
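
To give a feel for what dim/rank controls, here is a small sketch of how the number of trainable LoRA parameters grows with rank for a single linear layer. The 1280x1280 layer size is only an example value, and `lora_param_count` is a hypothetical helper, not part of kohya.

```python
def lora_param_count(in_features, out_features, rank):
    """Trainable parameters of one LoRA pair on a linear layer.

    The down projection is (rank x in_features) and the up projection is
    (out_features x rank), so the count grows linearly with rank.
    """
    return rank * (in_features + out_features)


full = 1280 * 1280  # weights in the frozen layer itself (example size)
for rank in (8, 32, 64):
    n = lora_param_count(1280, 1280, rank)
    print(f"rank {rank}: {n} trainable params ({n / full:.1%} of the layer)")
```

Doubling the rank doubles the LoRA's capacity and file size, which is why it is worth knowing what value the GUI is actually using.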

latent charm Jan 2, 2024, 9:18 PM

#

You might try ShareGpt4v https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V instead of llava. It is closer to gpt4v in captioning.

jade hornet Jan 4, 2024, 4:29 AM

#

Why not just start a GitHub project vs polling a discord channel for interest, seems inefficient

#

Plus most people would prefer that link vs DM for more information

latent charm Jan 4, 2024, 8:52 AM

#

I released my filtered 4k image pairs human preference dataset, a subset of pickapicv2. https://mega.nz/file/fgsxhbIa#QSNcjVxm4vY2f68PyOzmlIMHQCQOe93EyyFK1rmRkEc
More information: https://civitai.com/articles/3620

Human Preference Training (Utilize concept slider and pickapicv2) |...

Pickapicv2 dataset https://huggingface.co/datasets/yuvalkirstain/pickapic_v2 Collected 10k subset from pickapicv2 training split with hpsv2 score f...

6.21 GB file on MEGA

gray pine Jan 4, 2024, 1:50 PM

#

Hey Guys,
I want to do some training and not sure if should use LoRa, Dreambooth, Textual Inversion or something completely different. Let’s do it on the example of big feet. So most models do know what feet are but can’t really distinguish between sizes. My approach would be to gather many images of people with big feet, use the WaifuDiffusion Tags to describe them, i.e. 1girl, sitting, …., feet, big feet. Then I would train a LoRa with those images and the descriptions. I’m not sure if this is the best approach since I don’t want to train every aspect of the pictures but just have the training focus on the „big feet“ aspect. Any advice? Thx

latent charm Jan 4, 2024, 2:04 PM

#

To control a specific concept, you might try concept sliders to train a lora

wind garden Jan 4, 2024, 3:01 PM

#

Hello, can anyone help me?

#

I am trying to create an sdxl lora model based on 6 product images

#

Every image has white background

#

I wanna keep the original color but after I finish training, it struggles with color

#

It sometimes generates the product in a random color even when it captures the exact shape of the product

#

can anyone help me?

jade hornet Jan 4, 2024, 3:08 PM

#

wind garden I wanna keep the original color but after I finish training, it struggles with c...

Are you using captions? If so provide an example

wind garden Jan 4, 2024, 3:33 PM

#

These are captions I am using.
{"file_name": "9535401033758.jpg", "prompt": "a photo of TOK, chanel quilted flap bag"}
{"file_name": "14590ch_chanel_classic_double_flap_medium_front_strap.jpg", "prompt": "a photo of TOK, chanel classic flap bag in black"}
{"file_name": "9536196280350.jpg", "prompt": "a photo of TOK, chanel black classic flap bag"}
{"file_name": "9535399264286.jpg", "prompt": "a photo of TOK, chanel black quilted flap bag"}
{"file_name": "9535401164830.jpg", "prompt": "a photo of TOK, chanel classic flap bag in black quilted leather"}
{"file_name": "9535397691422.jpg", "prompt": "a photo of TOK, chanel quilted flap bag with gold chain"}
{"file_name": "9535401099294.jpg", "prompt": "a photo of TOK, chanel black quilted flap bag with gold chain"}

#

These are the product images

#

14590ch_chanel_classic_double_flap_medium_front_strap.jpg

latent charm Jan 4, 2024, 3:46 PM

#

remove "black" from the caption, so that it learns black as part of the bag
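
If you want to try this on a metadata file like the one above, a minimal sketch could look like the following. It assumes one JSON object per line with a `prompt` field, as in the posted captions; the helper names and the `COLOR_WORDS` list are made up for illustration.

```python
import json
import re

COLOR_WORDS = ("black",)  # hypothetical list, extend as needed


def strip_color_words(caption, colors=COLOR_WORDS):
    """Remove color words from a caption and tidy the leftover spacing."""
    alternation = "|".join(re.escape(c) for c in colors)
    # Drop a trailing "in <color>" so no dangling "in" is left behind.
    caption = re.sub(rf"\s+in\s+(?:{alternation})\s*(?=,|$)", "", caption,
                     flags=re.IGNORECASE)
    # Drop remaining standalone color words.
    caption = re.sub(rf"\b(?:{alternation})\b", "", caption, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", caption).strip()


def clean_metadata_line(line, colors=COLOR_WORDS):
    """Rewrite the "prompt" field of one JSONL metadata line."""
    record = json.loads(line)
    record["prompt"] = strip_color_words(record["prompt"], colors)
    return json.dumps(record)


line = '{"file_name": "9536196280350.jpg", "prompt": "a photo of TOK, chanel black classic flap bag"}'
print(clean_metadata_line(line))
```

Whether removing the color actually fixes the problem is a separate question; this only makes the experiment quick to run.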

dusky urchin Jan 4, 2024, 4:44 PM

#

wind garden These are captions I am using. {"file_name": "9535401033758.jpg", "prompt": "a p...

you will simply need a better training set and better captions. is this for a commercial project?

wind garden Jan 4, 2024, 4:55 PM

#

Yeah.
How can I improve captions?

stiff dust Jan 4, 2024, 4:55 PM

#

can you show an example image where the color went wrong?

wind garden Jan 4, 2024, 4:56 PM

#

latent charm remove "black" from the caption, so that it learns black as part of the bag

I tried that but it didn't work

wind garden Jan 4, 2024, 4:58 PM

#

stiff dust can you show an example image where the color went wrong?

#

This is the example image

stiff dust Jan 4, 2024, 4:59 PM

#

wow, that's weird 😅

#

colors should be easy to learn...
do you use noise offset and if so how much?

wind garden Jan 4, 2024, 5:00 PM

#

I just use the default

stiff dust Jan 4, 2024, 5:01 PM

#

maybe the word "gold" is somewhat bleeding over the image. What happens if you generate the image with very low CFG (e.g. cfg 2-3)?

wind garden Jan 4, 2024, 5:05 PM

#

With cfg=2 at inference, it still goes wrong sometimes.

#

stiff dust Jan 4, 2024, 5:06 PM

#

maybe the color is "too easy" such that it does not learn it. I don't think it's a problem with the caption. You can try using a higher noise offset and/or training with min-snr-gamma
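
For context, min-snr-gamma reweights the per-timestep loss so that low-noise (high SNR) steps do not dominate. A minimal sketch of the weight for epsilon-prediction, assuming you already have the signal-to-noise ratio of the sampled timestep; `min_snr_weight` is an illustrative name, and gamma 5.0 is just a commonly used value.

```python
def min_snr_weight(snr, gamma=5.0):
    """Loss weight for epsilon-prediction: min(snr, gamma) / snr.

    Timesteps with snr <= gamma keep weight 1.0; easier, low-noise
    timesteps (snr > gamma) are down-weighted.
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    return min(snr, gamma) / snr


for snr in (0.5, 5.0, 20.0, 50.0):
    print(snr, min_snr_weight(snr))
```

The per-sample MSE loss is multiplied by this weight before averaging over the batch.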

wind garden Jan 4, 2024, 5:08 PM

#

I trained the model with the following parameter set.
accelerate launch train_dreambooth_lora_sdxl_advanced.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="./in" \
  --instance_prompt="a photo of a TOK" \
  --validation_prompt="a photo of a TOK, a beautiful young woman holding black chanel bag" \
  --output_dir="out" \
  --caption_column="prompt" \
  --mixed_precision="bf16" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --optimizer="prodigy" \
  --learning_rate=1.0 \
  --text_encoder_lr=1.0 \
  --max_train_steps=720 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --train_text_encoder_ti \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --rank=32 \
  --checkpointing_steps=2000 \
  --seed="0"

#

I also tried this set.
accelerate launch train_dreambooth_lora_sdxl_advanced.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="./in" \
  --instance_prompt="a photo of a TOK" \
  --validation_prompt="a photo of a TOK, a beautiful young woman holding chanel bag" \
  --output_dir="out" \
  --caption_column="prompt" \
  --mixed_precision="bf16" \
  --resolution=1024 \
  --train_batch_size=3 \
  --repeats=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=1.0 \
  --text_encoder_lr=1.0 \
  --adam_beta2=0.99 \
  --optimizer="prodigy" \
  --train_text_encoder_ti \
  --train_text_encoder_ti_frac=0.5 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --rank=32 \
  --max_train_steps=1000 \
  --checkpointing_steps=2000 \
  --seed="0"

#

But neither works well.

normal ember Jan 4, 2024, 5:31 PM

#

They (huggingface) combine some new stuff in that training method along with existing things like the prodigy optimizer. They are trying to find something that doesn't require as much adjustment of the hyperparameters.

#

Not had much luck with adaptive optimizers myself.

#

I’ve seen suggestions that it would be better to stick with fp16 instead of bf16 too.

regal harbor Jan 4, 2024, 7:45 PM

#

jade hornet Jan 4, 2024, 9:05 PM

#

wind garden I tried that but it didn't work

his advice was good. if you caption the color of your subject, in this case a purse, you will get other colors at inference. Only caption the things you dont want the training to learn as part of the subject. You can caption distance: some of your images are "close-up", some are "full shot", some are "medium shot". You can caption orientation: some of yours are "profile", some are "frontal". Doing that should ideally let you pull those out later. In general, just provide a token to be able to reference it later, "xyzpdq" ...or pick something less cryptic, maybe include a designer "xyzdesigner" and a class "purse". In your training, with shuffle captions on, put your special token first, your class "purse" 2nd, and then for the rest just caption what you dont want, like maybe the white background (unless you want that in your results). Use "keep tokens" of 2, assuming you're using kohya, which keeps those first two tokens at the front while the rest are shuffled
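
To make the "shuffle captions" plus "keep tokens" idea concrete, here is a small sketch of the behaviour as described above: the first N comma-separated tags stay in place and the rest are shuffled each time the caption is used. This is an illustration of the concept, not kohya's actual code, and the caption is a made-up example.

```python
import random


def shuffle_caption(caption, keep_tokens=2, rng=random):
    """Shuffle comma-separated tags but keep the first `keep_tokens` fixed."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # in-place shuffle of the non-fixed tags only
    return ", ".join(fixed + rest)


caption = "xyzdesigner, purse, white background, close-up, frontal"
print(shuffle_caption(caption, keep_tokens=2, rng=random.Random(0)))
```

The special token and the class always lead the caption, while "white background", "close-up" and "frontal" appear in a different order from step to step.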

#

one thing I'll say here about having all images against a white background, unless that's desired, even with captions that will be hard. you're honestly better off doing some hack jobs to put the purse in some diverse backgrounds...or maybe you can do some creative things with controlnet. like generate a lora with the images you have, but then use that lora, in addition to controlnet to create a new image dataset as a source for your next generation of lora that's far more diverse.

#

you can do all black background vs all white, then in controlnet generate some depth images, canny, etc, and with those you'll have more power and flexibility to put the products in any kind of background you want

dusky urchin Jan 4, 2024, 10:07 PM

#

wind garden Yeah. How can I improve captions?

you aren't going to make progress with the assets you have. it will not generalize to coloring the bag differently. you will have to modify the bag in photoshop to generate more data; or, the best option is they mail you the variety of bags they want you to incorporate, and you photograph it and caption it yourself. otherwise, most of your choices for the arguments look flawed to me.

dusky urchin Jan 4, 2024, 10:12 PM

#

wind garden

is the expectation that it looks flawlessly like the chanel purse? i think the media managers for this kind of thing are pixel peepers

dusky urchin Jan 4, 2024, 11:49 PM

#

let's say I want to train a lora A against SDXL + another Lora B (lora B is frozen) aka instead of my weights looking like

SDXL (frozen) . LoRA A (backpropagation changes it in training)
i want SDXL . Lora B = SDXL'
and now i want to train
SDXL' (frozen) . LoRA A (backpropagation changes it in training)

is this possible?
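
Arithmetically, at least, the setup described above can be sketched like this: fold LoRA B into the base weight once to get a frozen SDXL', then add LoRA A's delta on top and only update A. This is a toy pure-Python illustration on a 2x2 matrix with made-up numbers, not a training script; whether a given trainer exposes this workflow is a separate question.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, up, down, scale=1.0):
    """Return weight + scale * (up @ down) without modifying `weight`."""
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


base = [[1.0, 0.0], [0.0, 1.0]]          # stands in for one SDXL weight matrix

# LoRA B (frozen): merged once, giving SDXL' = SDXL + B.
b_up, b_down = [[1.0], [0.0]], [[0.0, 2.0]]
base_prime = merge_lora(base, b_up, b_down)

# LoRA A (trainable): the up matrix starts at zero, so training starts from SDXL'.
a_up, a_down = [[0.0], [0.0]], [[0.5, -0.5]]
effective = merge_lora(base_prime, a_up, a_down)

# Pretend one optimizer step changed only LoRA A.
a_up = [[1.0], [0.0]]
effective_after = merge_lora(base_prime, a_up, a_down)
```

Only `a_up` and `a_down` would receive gradients; `base_prime` stays fixed throughout.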

wind garden Jan 5, 2024, 7:33 AM

#

jade hornet one thing I'll say here about having all images against a white background, unle...

I tried to augment the background.
But it seems like during training the model also learns the background.

#

https://github.com/camenduru/background-replacement-hf

#

I used this code to replace the background.
But this code didn't work well, and the final result was not good

wind garden Jan 5, 2024, 7:39 AM

#

jade hornet his advice was good, if you caption the color of your subject, in this case a pu...

One question about captioning: if I use a color word in the caption, is it possible to capture the exact color of the original product after training?
For example, the exact color of the product is black brown.
But I use black in the captions.
After I train the model, I simply use a prompt like this.
"a black chanel handbag on the table"
The question is, will the color word black in the prompt capture the exact color of the original bag (black brown)? Or will it generate a plain black bag?

jade hornet Jan 5, 2024, 2:39 PM

#

you're telling it that the bag is black, so it will ignore the color entirely...I wouldnt describe the bag at all, if you want it to learn it as it was presented in your images, then let it learn it without trying to tell it what it sees

dusky urchin Jan 5, 2024, 4:00 PM

#

wind garden I tried to augment background. But it seems like during the training model also ...

having just done what you are trying to do, i think you should maybe go back to the folks you are working with and ask for more images like i said

dusky urchin Jan 5, 2024, 4:01 PM

#

jade hornet you're telling it that the bag is black, so it will ignore the color entirely......

i am not sure if this is true

jade hornet Jan 5, 2024, 4:03 PM

#

uncertainty noted

dusky urchin Jan 5, 2024, 4:03 PM

#

jade hornet uncertainty noted

for example, what is "it"? what does "ignore" mean?

jade hornet Jan 5, 2024, 4:04 PM

#

well there is subject training and style training, "it" refers to the subject

dusky urchin Jan 5, 2024, 4:04 PM

#

there are guides online that say things like subject and style training

jade hornet Jan 5, 2024, 4:04 PM

#

I thought it was pretty obvious the bag was the subject in the conversation

dusky urchin Jan 5, 2024, 4:04 PM

#

but all of that stuff is folksy wisdom

#

i mean some of it can be useful, but it isn't robust or generalizable. it's usually 1 person's experience doing like 10 things and reading 100 reddit posts. and maybe the 1 thing of the 10 that guy does works, and he writes about it

#

writing a caption "black chanel bag" in my opinion doesn't make the text encoder in CLIP ignore, or focus on, anything, in a generalized sense. but looking at the dataset, it would be more accurate to say that without an example of a white bag, a text encoder LoRA isn't going to separate associating blackness with the shape of the bag and the color of the bag

jade hornet Jan 5, 2024, 4:07 PM

#

you casting shade over all the guides out there isnt really constructive in this case, I guess take what I said at face value, or dismiss it with all the other folklore then

dusky urchin Jan 5, 2024, 4:07 PM

#

jade hornet you casting shade over all the guides out there isnt really constructive in this...

yeah yeah, i mean the problem is the poor user way overpromised what can be done

#

@wind garden so you better break out photoshop or get more images

#

so you are right that the user should remove "black" from the caption, but only because there are no examples of white bags. with the current dataset, black would become associated with every aspect of the image, which is undesirable. it's the Contrastive in CLIP that is missing.

jade hornet Jan 5, 2024, 4:11 PM

#

if he had a white bag, and he captioned white bag, I'm saying that would be a bad idea if he wanted it to be white, because by captioning it thus, you essentially turn the subjects color into a variable

dusky urchin Jan 5, 2024, 4:11 PM

#

i think the user also misunderstands the purpose of LoRA. it isn't a general subject or style learning approach that works with small datasets, it's computational trickery to make diffusion model fine tuning require way less resources

dusky urchin Jan 5, 2024, 4:11 PM

#

jade hornet if he had a white bag, and he captioned white bag, I'm saying that would be a ba...

hmm, if he had images of both a black and a white bag, and he wanted to eventually generate a green bag, he should definitely include the color in the caption.

jade hornet Jan 5, 2024, 4:12 PM

#

you can certainly still prompt for a white bag, but as you just pointed out, prompts can be flawed and using a color like that in a prompt may and almost certainly will bleed into other items in the image

dusky urchin Jan 5, 2024, 4:12 PM

#

it's dumb luck that it works well with styles and subjects undertrained with small datasets

#

i'm sure tencent is working on the exact application the user needs right now.

jade hornet Jan 5, 2024, 4:13 PM

#

so all I am suggesting is to let the training discern the color, dont try and force it with captioning

#

this is something I know to be true, if you have 10 images of someone with blonde hair, you'll always get blonde hair in your inference, unless you caption the hair

dusky urchin Jan 5, 2024, 7:06 PM

#

does there exist an idea of pretraining for loras? i.e., i would always resume lora training from a pretrained lora checkpoint, with its gradients and everything?

latent charm Jan 5, 2024, 7:16 PM

#

what benefit does it have? train faster or better result?

dusky urchin Jan 5, 2024, 7:31 PM

#

latent charm what benefit does it have? train faster or better result?

i would assume better results. a pretraining for example might take SDXL's pre-existing dataset, and train a lora against it, which ought to produce no visible difference in the results. but such an "initialized" lora will have good gradients and weights for the unet & text encoder, even a learning rate, that may result in improvements when you resume that lora with your training set

latent charm Jan 5, 2024, 7:51 PM

#

You might do an experiment and share the results. I think if you are going to train several similar loras that share the same style but have different characters, it might be helpful. But in general, everyone trains with a different dataset and pushes the lora close to that dataset. A general pretrain might pull in a contradictory direction.

autumn jay Jan 5, 2024, 8:13 PM

#

Hi everybody, i'm trying to make a lora but after many tries the result isn't good. My character is always malformed. I have read and listened to many tutorials, but it seems that none of them was made by a person who clearly understands the concept. Would somebody answer my questions about captions, images...? I have missed something, but what?

jade hornet Jan 6, 2024, 12:04 AM

#

dusky urchin does there exist an idea of pretraining for loras? i.e., i would always resume l...

You could use a save, but what makes more sense is to merge the lora back into the base and just use that checkpoint

#

Or just generate a full checkpoint in the first place

#

Checkpoints are big so you'll want to keep it in huggingface or gdrive

wind garden Jan 6, 2024, 2:54 AM

#

I wanna generate a background for a product image.
What is the best way to generate a background for an object?

regal harbor Jan 6, 2024, 8:31 AM

#

wind garden https://github.com/camenduru/background-replacement-hf

if you replace too many backgrounds during fine-tune, wouldn't the model become worse at making the subject look realistic in the room/environment they're in? Because of lighting / color mismatches?

autumn jay Jan 6, 2024, 10:12 AM

#

I had a question about photo quality. i need to resize photos to 512*512 for SD training. I downloaded high quality images of my character from the internet, but when i resize them (even if i crop them first to keep only the portrait) i lose information (it's logical, because i go from 1024 to 512 for example, so there is half the resolution). Is there a way to keep extremely high quality when i zoom in on the 512 picture?

regal harbor Jan 6, 2024, 4:42 PM

#

autumn jay I had a question about photo quality, i need to resize photo to 512*512 for SD t...

you don't need to crop the images at all. In kohya simply choose max-size 512x512 and it'll auto-crop everything.

autumn jay Jan 6, 2024, 5:25 PM

#

ok, so i can crop the image to get the part that i want to train, and no matter the size (1024x1024, 2048x2048, ...) kohya resizes it to 512. With good quality? because i used a tool to crop all my images, but the result is lower quality than when i use gimp and do it by hand
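
For anyone doing the crop step by hand, here is a small sketch of the geometry: pick the largest centered square, then downscale that square to 512x512. The helper name is made up; the returned box uses the (left, top, right, bottom) convention that Pillow's `Image.crop` accepts, and a Lanczos filter in the following resize is a common choice for downscaling.

```python
def center_square_box(width, height):
    """Largest centered square inside a width x height image.

    Returns (left, top, right, bottom) in pixel coordinates.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


box = center_square_box(1920, 1080)
print(box)  # the 1080x1080 region in the middle of a 1920x1080 frame
```

A tool that resizes the whole image first and crops afterwards, or that uses a nearest-neighbour filter, can give visibly softer results than cropping first and resampling once.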

old birch Jan 6, 2024, 7:45 PM

#

Hey guys, I need help with how to add certain objects or specific body parts to an image. Lets say I have a portrait of a man, and I want to add a military hat and badges to him. So I inpaint the forehead and hair area for the hat, and the chest area for the badges, but the results are poor. The color and style dont match well and it looks bad. The badges arent even the type I wanted.
So my question is, is there any way to use my own resources (google images) to add it and keep the consistency? I've read that lora works well for characters, but not sure if it works well with objects and faces

#

Not sure if new model, lora or embedding would do the best job for that

grand stag Jan 6, 2024, 8:17 PM

#

Lora's should work well enough yea as it limits the training data a bit more to what the lora is trained on instead of the whole model

old birch Jan 6, 2024, 8:59 PM

#

So i just need to train lora with the object i want and should be fine?

dusky urchin Jan 6, 2024, 9:46 PM

#

old birch So i just need to train lora with the object i want and should be fine?

what is the purpose of the art? is it a commercial or video game art or... helps us understand what quality you need

old birch Jan 6, 2024, 9:51 PM

#

dusky urchin what is the purpose of the art? is it a commercial or video game art or... helps...

Its hobby haha

#

I just like sharp and high res stuff, but at least that looks realistic or in the context

#

for instance, this image from the user kuromi, how would you add a military hat to the main subject?

dusky urchin Jan 6, 2024, 10:11 PM

#

old birch for instance, this image from the user kuromi, how would you add a militar hat t...

generally, it is extremely hard to add a specific, coherent element to a pre-existing image. you must use inpainting, which means you must create a mask, which means you must use a real art tool

#

in this particular instance, there's no chance a diffusion model will correctly add a hat that is cropped by the top of the frame. you would need to outpaint some margin at the top of the image, then inpaint a hat, then crop it again.
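
The outpaint, inpaint, crop sequence above mostly comes down to bookkeeping of coordinates. A minimal sketch of that bookkeeping, with hypothetical helper names and example sizes; the actual outpainting and inpainting would be done in whatever UI or pipeline you use.

```python
def outpaint_plan(width, height, top_margin):
    """Geometry for extending an image at the top only.

    Returns (canvas_size, paste_offset, crop_box):
    - canvas_size: size of the enlarged canvas to outpaint
    - paste_offset: where the original image sits on that canvas
    - crop_box: (left, top, right, bottom) that restores the original frame
    """
    if top_margin < 0:
        raise ValueError("top_margin must be non-negative")
    canvas_size = (width, height + top_margin)
    paste_offset = (0, top_margin)
    crop_box = (0, top_margin, width, height + top_margin)
    return canvas_size, paste_offset, crop_box


def shift_mask_box(box, top_margin):
    """Move a mask box from original-image coordinates onto the canvas."""
    left, top, right, bottom = box
    return (left, top + top_margin, right, bottom + top_margin)


canvas, offset, crop = outpaint_plan(1024, 1024, 256)
# A hat region that sticks out above the original frame (negative top).
hat_mask = shift_mask_box((300, -120, 700, 150), 256)
```

After inpainting the hat inside `hat_mask` on the enlarged canvas, cropping with `crop` gives back the original framing with the hat cut off by the top edge.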

#

there are some InstructPix2Pix-style solutions but none that are really easy for the community to use, nor robust enough for whatever problem you throw at them

old birch Jan 6, 2024, 10:16 PM

#

okay i get it

#

So better to generate similar image directly, instead of adding stuff right?

dusky urchin Jan 6, 2024, 11:49 PM

#

old birch So better to generate similar image directly, instead of adding stuff right?

yes. for open source work, deepfloyd is the best model in terms of text comprehension, and then you can use its outputs for later workflows in stable diffusion for aesthetics and control

stiff dust Jan 7, 2024, 6:33 AM

#

you can always add the hat yourself with a graphics program (e.g. cut the hat from one of the soldiers, increase its size and copy it to the guy) and then do inpainting afterwards to make the poor copy-pasted hat into a hat that fits to the image

tame otter Jan 7, 2024, 9:59 PM

#

dusky urchin yes. for open source work, deepfloyd is the best model in terms of text comprehe...

DF is not an open license. Research only restricted licensing

#

https://huggingface.co/spaces/DeepFloyd/deepfloyd-if-license

Deepfloyd If License - a Hugging Face Space by DeepFloyd

lilac egret Jan 8, 2024, 4:05 AM

#

Hello, can anyone help me fix this? when i use roop or Reactor, this happens to the face. i tried changing Restore Face Visibility and CodeFormer Weight (Fidelity), and the same thing happens. and when i use ip adapter to change the face, it does not recognize the face.

dusky urchin Jan 8, 2024, 5:39 PM

#

old birch So better to generate similar image directly, instead of adding stuff right?

it really depends on your goals because like kaibioinfo and i are saying, if you can use a tool that you work with by hand, like photoshop or something, to draw masks and to copy and paste items, this is achievable. if you want to do this without any human intervention, it is impracticable

tame otter Jan 8, 2024, 8:35 PM

#

you can draw masks directly in a few UIs: Fooocus, InvokeAI, automatic1111, vladmandic

#

i'm not sure if comfy or swarmui have anything setup for live masking

#

there's also great extensions like segment anything or inpaint anything, for automatic1111, which make creating masks a breeze

stiff dust Jan 8, 2024, 9:23 PM

#

my point was not so much about drawing the mask (yes, thats possible in comfyui and any other ui I know), but about making a rough sketch what you want

#

if you want to inpaint a sword into the hand of a character, it's often easier to just copy&paste a sword into the hands and then inpaint on that

tame otter Jan 8, 2024, 9:34 PM

#

that technique is used tons in the past. before diffusion. its tried and true. good call. i've heard it called photobashing a lot

frigid pier Jan 10, 2024, 9:02 AM

#

Hi guys, new to here. 🙂

I've created a couple of images of ppl posing in different locations. The images otherwise look quite nice, but they lack detail and are somewhat soft. I've tried numerous ways to enhance the images: several upscalers, img2img to get more detail, ADetailer, LoRAs, ControlNet, combinations of these, different noising strengths, sampling steps and methods, tweaked prompts etc. Still the results are far from good. - Ideas what to try next?

I'm on a Mac Studio Max M1, A1111

dusky urchin Jan 10, 2024, 4:54 PM

#

frigid pier Hi guys, new to here. 🙂 I've created couple images of ppl posing in different ...

are you using a fine tune or lora you're developing yourself?

frigid pier Jan 10, 2024, 7:44 PM

#

dusky urchin are you using a fine tune or lora you're developing yourself?

Everything I’m using is downloaded from hugging or civitai

dusky urchin Jan 10, 2024, 10:09 PM

#

frigid pier Everything I’m using is downloaded from hugging or civitai

maybe try the #1072238304042438758

restive bridge Jan 11, 2024, 4:52 AM

#

has anyone tried a batch size of 2 when training XL with dreambooth? SEcourses config is batch 1 but I can fit another one on 24gb. cant really tell what it's doing to quality since I havent tested both on the same person yet.