tall condor Jun 8, 2023, 9:15 PM

#

faces are really tricky, for me front shots work very well but as soon as it's a side view and stuff it gets really complicated

#

you should add a prompt that requests things like "laying in the grass" or side view to evaluate the faces better

#

or "spinning around, side view"

wise seal Jun 8, 2023, 9:18 PM

#

Hi, I need some help with lora.
I noticed that there are some loras trained on 2.x models that are compatible with the standard auto1111 syntax <lora:filename:multiplier>.
How do you train them? Is there any way to create them with kohya_ss?
The reason is that I use openoutpaint a lot and it is not compatible with kohya's additional networks extension, the loras can only be activated by the standard syntax

tall condor Jun 8, 2023, 9:20 PM

#

did you install the extension "LyCORIS"

#

which lora type did you create in kohya_ss?

#

@wise seal i think if you create a "Standard" Lora model you should be able to just load it

#

if you create a LyCoris Model you need that extension in auto 1111

#

@surreal lagoon are you still running with noise offset 0.02?

surreal lagoon Jun 8, 2023, 9:32 PM

#

no

#

i don't think i'll bother using that anymore

tall condor Jun 8, 2023, 9:32 PM

#

ok

surreal lagoon Jun 9, 2023, 12:18 AM

#

#

man, i love this style

steady heath Jun 9, 2023, 3:00 AM

#

Already in ED? 🙏

hot breach Jun 9, 2023, 3:00 AM

#

yes

#

there are some coordination issues with inference apps though

steady heath Jun 9, 2023, 3:01 AM

#

Does it work with 2.x models?

steady heath Jun 9, 2023, 3:01 AM

#

hot breach there are some coordination issues with inference apps though

Oh?

hot breach Jun 9, 2023, 3:01 AM

#

and its not playing nice with SD1.x for now because it doesn't use v_prediction

steady heath Jun 9, 2023, 3:01 AM

#

I see

hot breach Jun 9, 2023, 3:01 AM

#

I'd say it basically only works on SD2.x 768 models that already use v_prediction?

steady heath Jun 9, 2023, 3:02 AM

#

I see

hot breach Jun 9, 2023, 3:02 AM

#

it might be able to finetune SD1.5 with v_prediction long enough that it accepts the change though

#

someone said SD2.1-768 was based on 512 which would've been epsilon then switched

steady heath Jun 9, 2023, 3:03 AM

#

There was something about V-pred models that they have triple the loss or something? Which kohya said he fixed, afaik

hot breach Jun 9, 2023, 3:03 AM

#

their loss is higher during training but its not comparable
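
A minimal sketch of why the two loss values are not comparable: epsilon-prediction and v-prediction regress onto different targets built from the same noised sample. The signal level `alpha_bar` and the array shapes are illustrative assumptions, not values from any trainer discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8, 8))    # stand-in for a clean latent
eps = rng.normal(size=x0.shape)    # the noise that gets added

alpha_bar = 0.3                    # illustrative cumulative signal level
a, s = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)

x_t = a * x0 + s * eps             # noised sample the model sees

target_eps = eps                   # epsilon-prediction target
target_v = a * eps - s * x0        # v-prediction target

# A perfect v prediction still recovers the clean sample:
x0_from_v = a * x_t - s * target_v
print(np.allclose(x0_from_v, x0))
```

Because the targets differ, a model trained against one will not line up with a sampler that expects the other, which matches the SD1.x vs SD2.x 768 mismatch described above.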

#

its not something i'd worry so much about

steady heath Jun 9, 2023, 3:04 AM

#

I wasn't quite sure sd1.5 LRs would work well with 2.x models since id imagined they would require a higher LR

hot breach Jun 9, 2023, 3:08 AM

#

SD2.1 768 trains well with customized optimizer on the text encoder

steady heath Jun 9, 2023, 3:09 AM

#

lemme yoink me config for optimizer and i wanna ask if there's anything you'd change

#

oh wait a sec theres 2

#

SD21 and regular optimizer

#

does ED auto select if it notices a 2.1 V model?

hot breach Jun 9, 2023, 3:18 AM

#

TE and unet get separate optimizer instances now in ED2 so you can use a different LR, different LR schedule, completely different type of optimizer, etc

#

TE has some layer freezing setup, it seems values like -2 or -6, which just unfreeze the last 2 or 6 layers, work well for SD2.1

#

TE is very sensitive in SD2.x, needs a light touch so to speak

steady heath Jun 9, 2023, 3:21 AM

#

I see

#

I wasn't quite sure what the freezing did so i just put false iirc

hot breach Jun 9, 2023, 3:23 AM

#

we're still trying to figure out ideal settings, but freeze embeddings true, layers -6, and freeze final layer norm false seems to work well

steady heath Jun 9, 2023, 3:23 AM

#

ill keep that in mind

#

also thanks for implementing the snr freq 🤝

#

I shall try training again asap

#


hot breach Jun 9, 2023, 3:25 AM

#

```
"text_encoder_freezing": {
    "freeze_embeddings": true,
    "freeze_front_n_layers": -2,
    "freeze_final_layer_norm": false
}
```
-2 seems to do well for small scale stuff like training one or a few characters?
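
A rough sketch of how a negative `freeze_front_n_layers` could be read, based only on the description above that -2 or -6 "just unfreeze the last 2 or 6 layers". The helper `trainable_layers`, the meaning given to positive values, and the layer count of 23 are assumptions for illustration, not taken from the trainer's code.

```python
def trainable_layers(num_layers: int, freeze_front_n_layers: int) -> list[int]:
    """Return indices of text encoder layers left trainable.

    Assumed semantics (hypothetical helper):
      n >= 0 -> freeze the first n layers
      n <  0 -> freeze everything except the last |n| layers
    """
    if freeze_front_n_layers >= 0:
        first_trainable = min(freeze_front_n_layers, num_layers)
    else:
        first_trainable = max(num_layers + freeze_front_n_layers, 0)
    return list(range(first_trainable, num_layers))


NUM_LAYERS = 23  # assumed layer count, for illustration only
print(trainable_layers(NUM_LAYERS, -2))
print(trainable_layers(NUM_LAYERS, -6))
```

Under this reading, -2 touches far fewer weights than -6, which fits the "light touch" advice for the SD2.x text encoder.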

steady heath Jun 9, 2023, 3:26 AM

#

I see

chrome breach Jun 9, 2023, 6:02 AM

#

Anyone tried freezing text_encoder??

#

I have no idea how it affects the model quality... But would love to know some results about it

rugged cobalt Jun 9, 2023, 6:35 AM

#

Do you have any ideas on how to best train the appearance of a specific piece of furniture? I've tried different LORAs, but the furniture isn't being represented correctly.

chrome breach Jun 9, 2023, 7:03 AM

#

Haven't tried LORAs myself... So cant suggest u on that

#

U might wanna try dreambooth training if you only want the model to learn one kind of furniture

#

That might work for u

surreal lagoon Jun 9, 2023, 2:23 PM

#

hot breach ```"text_encoder_freezing": { "freeze_embeddings": true, "freeze...

-6 is too much for large scale. and -2 isnt enough for general fine tune without balanced concepts. it forgets stuff

#

basically the data you use is almost more important than which layers you freeze.

midnight tusk Jun 9, 2023, 3:09 PM

#

Hello, how to train images without using lora with 8gb VRAM? thanks

#

Can anyone help? I'm about to generate images like this but the results are not so good.

#

when I'm generating images, it doesn't really look good. I don't need extra objects or lines that are not recognizable.

#

I want it look like a human-made. Without flaws or blemishes, smudges, and so on...

surreal lagoon Jun 9, 2023, 4:12 PM

#

midnight tusk Hello, how to train images without using lora with 8gb VRAM? thanks

a large number of gradient accumulations, most likely.

#

make sure you use DeepSpeed and CPU offload, and then you'll be running that training job for like 3 weeks.

midnight tusk Jun 9, 2023, 4:15 PM

#

3 weeks?

surreal lagoon Jun 9, 2023, 4:15 PM

#

the less video ram you have, the more the system has to spend time moving data around between CPU and GPU

#

it's a substantial loss in performance

midnight tusk Jun 9, 2023, 4:16 PM

#

so, there's no other way or option?

#

How about DeepSpeed and CPU offload, what are those?

surreal lagoon Jun 9, 2023, 4:16 PM

#

i'm on an 80GB GPU and i'm training for 3 days just to have it run in a way that i don't actually destroy the model while training it. when it learns too quickly with too little context, it over-corrects and destroys the coherence of the text encoder.

#

if i had an 8GB GPU, to train as well as i am on an 80GB GPU, it would take probably 3 weeks to get where i've gotten in 3 days.

#

when you successfully use less memory while training without extending the training runtime, you're giving the model less context on each iteration. and this works okay for short training sessions on a diverse, and well-captioned set of data. but if you want the model to learn everything from your training data, you have to give it more steps. and that results in more incoherence, due to the reduced context

#

it doesn't actually cost that much to rent time on an 80GB GPU. like $3 an hour, and you can get quite far in 24 hours. less than $100 to fine-tune a model. depends on how much data you have. the more it needs to learn, the more expensive it becomes.

midnight tusk Jun 9, 2023, 4:21 PM

#

Thanks, do you use deepspeed?

surreal lagoon Jun 9, 2023, 4:21 PM

#

no

#

i use large GPUs so i can avoid stuff like that, as it influences training in ways i don't understand and don't wish to spend time figuring out

midnight tusk Jun 9, 2023, 4:23 PM

#

So, in conclusion, I won't be able to train without using Lora?

surreal lagoon Jun 9, 2023, 4:24 PM

#

you can do textual inversion, lora, possibly others. but yeah, training the full model incl text encoder is going to be difficult-to-impossible on a small GPU.

#

this is pretty important to accept, so that you don't waste time trying to make it work

#

i would say 24GB is probably the point to start with for fine-tuning

midnight tusk Jun 9, 2023, 4:25 PM

#

So I won't be able to get a perfect result with these?

surreal lagoon Jun 9, 2023, 4:26 PM

#

the point of dreambooth is to cleanly integrate a subject into a model, and if you can't tune the text encoder properly with a large enough context, it loses the ability to do most things other than the subject you trained into it

#

i can show you the results of some experiments! they're not pretty.

midnight tusk Jun 9, 2023, 4:27 PM

#

May I see, please? Thank you

surreal lagoon Jun 9, 2023, 4:27 PM

#

this is supposed to be a tuxedo cat

#

we call this "catastrophic forgetting" and i'm not trying to make a cat pun, sorry

#

this is supposed to be a politician whose name i don't wish to say, but it doesn't matter anyway, because it's just noisy garbage

midnight tusk Jun 9, 2023, 4:29 PM

#

Yah

surreal lagoon Jun 9, 2023, 4:29 PM

#

pretty neat results but very useless for text guided diffusion.

midnight tusk Jun 9, 2023, 4:29 PM

#

In my case, I want to generate a character just like a human-made drawing

surreal lagoon Jun 9, 2023, 4:30 PM

#

same model, but prompted with a class it still understands

#

you end up needing to provide like thousands and thousands of super high quality images to the training on a low VRAM system. and it takes longer because of that, while still potentially destroying the text encoder

#

you just need to make a LoRA. why do you not want to make one? they can be used on top of any compatible SD model.

#

the benefits far outweigh the downsides. @north valley please chime in on this

north valley Jun 9, 2023, 4:32 PM

#

I have been peenged

#

never been in this channel before

surreal lagoon Jun 9, 2023, 4:32 PM

#

it's what you were just talkin about

north valley Jun 9, 2023, 4:32 PM

#

waow

surreal lagoon Jun 9, 2023, 4:32 PM

#

LoRA > Dreambooth

#

and go

north valley Jun 9, 2023, 4:33 PM

#

Oh yes, LoRA's are massively beneficial

surreal lagoon Jun 9, 2023, 4:33 PM

#

apparently you can merge the LoRA into a base model's unet but you said before when i'd asked you what that would do, "it would be a waste of time"?

north valley Jun 9, 2023, 4:33 PM

#

they can be trained in literal single digit minutes on any half decent GPU, they can provide dreambooth quality results with far less work, can be weighted individually and in tandem, across any model you want, assuming its adjacent enough

north valley Jun 9, 2023, 4:34 PM

#

surreal lagoon apparently you can merge the LoRA into a base model's unet but you said before w...

I honestly don't remember that, personally

#

I have been looking for ways to finetune by merging LoRA's into models for a good moment now

surreal lagoon Jun 9, 2023, 4:34 PM

#

we were working on your commissioned network

#

so, forever-ago in AI time

midnight tusk Jun 9, 2023, 4:34 PM

#

Lora is for low vram right?

north valley Jun 9, 2023, 4:34 PM

#

hmm... Maybe I misunderstood what you asked, I have been wanting to combine LoRA's into a model

surreal lagoon Jun 9, 2023, 4:34 PM

#

LoRA isn't just for low VRAM, no

north valley Jun 9, 2023, 4:35 PM

#

midnight tusk Lora is for low vram right?

its for any VRAM, but it does have the benefit of being much faster and easier to train

surreal lagoon Jun 9, 2023, 4:35 PM

#

it's a low overhead network that alters the weights of the base model's unet / text encoder via cross-attention

#

because they take like 5-15 minutes to train you can quickly see what you're doing wrong and adjust hyperparameters and see results in the same hour

midnight tusk Jun 9, 2023, 4:36 PM

#

I'm sorry. I don't really understand these things in Stable Diffusion.

north valley Jun 9, 2023, 4:36 PM

#

basically, its a little tweak you put ontop of the model, which is much smaller, faster, and works with any model thats at least decently adjacent

surreal lagoon Jun 9, 2023, 4:36 PM

#

it's fine, a lot of what i say won't make sense now but it might later

#

when Sytan says "adjacent" he means the training state of the model should be similar enough that the LoRA still behaves as expected when layered on top.

north valley Jun 9, 2023, 4:37 PM

#

You can achieve DB quality results with a decent LoRA, in much less time, with a much smaller data set, and it has the benefit of being usable on different models

midnight tusk Jun 9, 2023, 4:37 PM

#

But, can lora give good images? or maybe the problem is the model I've been training

surreal lagoon Jun 9, 2023, 4:37 PM

#

LoRA is most likely higher quality than a Dreambooth with the same dataset.

north valley Jun 9, 2023, 4:37 PM

#

so if you train a real person into a realism model, putting that ontop of an anime model should be able to translate them into that style really good

north valley Jun 9, 2023, 4:38 PM

#

midnight tusk But, can lora give good images? or maybe the problem is the model I've been trai...

absolutely, I make money off of selling LoRA's

#

let me get some of my examples really fast

surreal lagoon Jun 9, 2023, 4:38 PM

#

LoRAs feel like cheat mode.

north valley Jun 9, 2023, 4:38 PM

#

this was one of my first LoRA's

#

this is what this character actually looks like in the show he is from

#

#

and now granted, this LoRA is not that good compared to what I can do now, but you will get the point

#

#

I made that in about 20 minutes with only 7 images of him

midnight tusk Jun 9, 2023, 4:39 PM

#

I wanted to generate more images of a robot with this kind of design. I want it to look like a human-made drawing

north valley Jun 9, 2023, 4:39 PM

#

fast forward to LoRA's that I did a much better job with, and you get results like these

#

surreal lagoon Jun 9, 2023, 4:40 PM

#

for comparison, i've been trying to fine-tune a base model for a month and a half with more than 30,000 high quality images and it's cost me about $3200. i'd be done already if i cared to use a LoRA. but i'm in it for the challenge, not the results.

north valley Jun 9, 2023, 4:40 PM

#

As you can see, LoRA's can get fantastic results, and even these are honestly a little lackluster compared to what I can do now, with even less, in less time 😅

#

LoRA's are arguably better for specific things, like a specific style, or a specific character/concept

#

for example, I trained a LoRA on Funko Pops, with just 10 images, and about 7 minutes of training

surreal lagoon Jun 9, 2023, 4:41 PM

#

rely on other people for fine-tuning a base model and just make LoRAs and you'll be way happier.

#

@north valley have you seen LoHa yet?

midnight tusk Jun 9, 2023, 4:42 PM

#

Yah, thanks to you all. maybe I'll have to do more research first. Thank you very much. Appreciated your insights and advices.

north valley Jun 9, 2023, 4:42 PM

#

Here, you can see some of the results I got from an extremely small data set, with extremely fast training

#

only took 10 images, and about 7 minutes training to create a LoRA that can do this style across countless models reliably

midnight tusk Jun 9, 2023, 4:42 PM

#

Can you give me datasets for the images I sent?

surreal lagoon Jun 9, 2023, 4:43 PM

#

Sytan, GPT4 explained to me how LoRA benefits from regularization images by "helping it learn representations that are core to the base model's unet and text encoder, resulting in more coherent results, as the weights merge at runtime more effectively."

north valley Jun 9, 2023, 4:43 PM

#

surreal lagoon @north valley have you seen LoHa yet?

yes, however there are soooo many LoRA adaptations, that I genuinely can't find the interest in learning every minute difference lol

surreal lagoon Jun 9, 2023, 4:43 PM

#

same, and i'm a developer who doesn't even sell this shit, but i'm in it for the knowledge and learning

#

i totally appreciate that someone who is calculating the return on investment wouldn't find the time to look into it

surreal lagoon Jun 9, 2023, 4:45 PM

#

north valley yes, however there are soooo many LoRA adaptations, that I genuinely can't find ...

thanks for explaining by the way, an example image is worth a million words lmao

north valley Jun 9, 2023, 4:46 PM

#

All good, I have a lot of experience with many facets of LoRA's, always glad to help haha

#

especially for when its between DB and LoRA's

#

99/100, LoRA is the way to go

surreal lagoon Jun 9, 2023, 4:48 PM

#

yep, take it from the one who has destroyed like 150 models in a few months here..

#

by the way, my current fix is working great. you just need to use a batch size of 150!

#

large batch size = very gradual learning, with more coherence

#

seems like having many repeats over training data with a large batch size has a much reduced impact on eg. overfitting

#

apocalyptic wasteland for example, i just love this. it looks to me like a real photo from an exploring channel on youtube

north valley Jun 9, 2023, 4:50 PM

#

that looks dope :>

surreal lagoon Jun 9, 2023, 4:50 PM

#

thanks, nat geo!

#

same prompt without the midjourney keyword

#

why does midjourney word make some photos so real looking? no idea. mysteries of life

north valley Jun 9, 2023, 4:51 PM

#

hmm, interesting

#

MJ is likely a huge network of many models and LoRA's that are all triggered off of prompt nuance, as we have previously suggested

surreal lagoon Jun 9, 2023, 4:52 PM

#

and here's the stupid reason i consider this training run a success. it kept the mediocre version of Robin Williams from the base 2.1 model intact. it even fixed some of his artifacts

#

remember my whole initial goal of this was to add some of the flavour of MJ without looking like MJ, while retaining most of the base 2.1

#

the first half is really easy. the second half, not so much

#

i was comparing my model against OpenJourney last night. i used to love that model so much, lmao it was my go-to. it is total garbage to me now

#

i'm curious how 1.5 goes once i finish this 2.1 model up, so that's my next step is to switch to base 1.5 and train it again on v_prediction loss and terminal SNR, while paying more attention to the results and trying to tune things so it works better. two days ago i tried a brief 1.5 training session on an A100-80G over 24hrs and it cost $72 and resulted in pretty poor images compared to my 2.1 attempts

#

i've likely tuned hyperparameters too far in favour of 2.1, so i'll have to find the switches to flip defaults for, and make sure that's possible to do with a single --option-at-runtime

#

oh, i added 10,000 images of hands to my training data last night, too. that has been in there for a couple checkpoints at least, by now

#

couldn't do faces of children reliably before. fixed now

wise seal Jun 9, 2023, 5:01 PM

#

tall condor @wise seal i think if you create a "Standard" Lora model you should b...

Thank you! That works fine with 1.5 loras but 2.x needs a separate extension to activate them. I saw however some loras on civitai that doesn't need the extension, I just don't know how they were created, or maybe there's some detail I'm missing.

surreal lagoon Jun 9, 2023, 5:01 PM

#

can you link me the extension? i can check how it loads them for you

gentle osprey Jun 9, 2023, 6:24 PM

#

https://github.com/d8ahazard/sd_dreambooth_extension/discussions/547

GitHub

[Pre-Final] A large research of the main training parameters *Updat...

NEW UPDATE Updated formulas for calculating frame resolutions. Also updates with permissions were carried out in the table with mathematics. The custom lr_scheduler has been moved to a separate doc...

#

Great write up on Dreambooth training

#

Way more detail than most guides

surreal lagoon Jun 9, 2023, 7:19 PM

#

gentle osprey Great write up on Dreambooth training

there's some issues with it.

However, when changing LR, there is a problem that when generating with high CFG values, images contain distortions, that is, elements on them begin to break, and the lower lr, the lower the CFG value at which this begins to develop. This is most likely due to the deviation during training of the LR value from the base value of LR during the initial training of the model, since the same problem is observed when the frame resolution is increased, only there it is expressed differently, but the principle is similar.

for example, this isn't true.

#

"this is most likely" precedes statements that are pulled out of thin air as a guess.

gentle osprey Jun 9, 2023, 7:25 PM

#

honestly, haven't found a guide that doesn't have some inaccuracies. there's sooooooooooo much contradictory information flying around

surreal lagoon Jun 9, 2023, 7:26 PM

#

yes. all of the research referenced here is just a momentary stepping stone of understanding on our way to the next truth

#

prior loss preservation is such a basic concept that is implemented poorly by everyone

#

using a single token for it, ugh. captioned regularization images is where it's at

surreal lagoon Jun 9, 2023, 7:54 PM

#

it says that gradient checkpointing doesn't help image quality, but the fact that you can increase the batch size via that, does help substantially. people focus on how long it'll take to train, too much.

#

$3.18/hr plus storage costs

#

i ran on an 8x A100 80G system for a few days just to see what happens

#

that was like $40/hr

#

my takeaway is that the quality boost from many GPUs is great but too expensive for me to justify, so i simply emulate it with a single 80G GPU now, and resort to gradient accumulations to boost it to the batch size i had on the 8x A100 system.
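
A small sketch of the gradient accumulation idea, using a toy linear regression in NumPy rather than a diffusion model. It only shows that averaging gradients over equal-sized micro-batches reproduces the gradient of one large batch; all sizes and names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # 32 samples stand in for one large batch
y = rng.normal(size=32)
w = np.zeros(4)


def mse_grad(w, X, y):
    # Gradient of mean((X @ w - y) ** 2) with respect to w.
    return 2.0 * X.T @ (X @ w - y) / len(y)


# One step on the full batch (what the multi-GPU run would see).
full_batch_grad = mse_grad(w, X, y)

# Same data fed as 8 micro-batches of 4, accumulated before stepping.
accum_steps = 8
accumulated = np.zeros_like(w)
for Xb, yb in zip(np.split(X, accum_steps), np.split(y, accum_steps)):
    accumulated += mse_grad(w, Xb, yb) / accum_steps

print(np.allclose(full_batch_grad, accumulated))
```

The trade-off is wall-clock time: each optimizer step now costs `accum_steps` forward and backward passes on the single GPU.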

#

i did buy a 4090

sonic narwhal Jun 9, 2023, 9:16 PM

#

north valley onlyt took 10 images, and about 7 minutes training to create a LoRA that can do ...

Bro my lora training on kohya_ss takes 6 hours on rtx 3070 anything im doing wrong?

tall condor Jun 9, 2023, 9:27 PM

#

only 6 hours? xD

#

mine take like a week

jaunty wadi Jun 9, 2023, 10:14 PM

#

gentle osprey honestly, haven't found a guide that doesn't have some inaccuracies. there's soo...

Preach, I've been struggling with that, especially regarding Dadapt and networkdim/alpha + learning rates

#

By the way, are there any DAdapt users or Lycoris model creators I could have a discussion with, I've been trying to create a Lycoris that emphasizes larger outlines/cel-shading and It can work, but I feel there are things I can do to innately optimize it, I just need to know what I should do training/captioning wise in a few different questions areas.

For instance,

When creating said lycoris or lora in the process of making a style; are you supposed to include certain stylistic keywords such as thick outline, white outline, black outline, celshading, or is that best left pruned from tags? If you keep the style tags, where would they be placed start middle end?
What kind of prompting should someone utilize in testing said style Lora? (for reference I can get my lycoris to work quite well with some prompts including outlines or celshading, but on its own it doesn't unless weight is applied extensively, 1.25 or 1.5)
For DAdapt I've seen some really contrary info regarding both whether to use Dadapt or Dadaptadam, and also what dims and alphas should be used.

Any insight is appreciated!

I've put some examples of what it can cook at just basic 512x512, but struggles with consistently maintaining outlines (especially if not in prompt).

💀 edit: by the way it can output more than women just tested with such as a consistent output 💀

jaunty grove Jun 9, 2023, 11:57 PM

#

sonic narwhal Bro my lora training on kohya_ss takes 6 hours on rtx 3070 anything im doing wro...

I started with AI art, and Stable Diffusion about 3 weeks ago, read time, and still learning.

With kohya_ss and a 4080, my Lora training for people/celebs is taking around 15 mins for say 1700 steps.

I've done about 8 trainings now, all but one are working amazingly well. I'm astounded and amazed with myself how well they've turned out.

The results are spot on, and the ai art is looking like the person in the real pics I trained on.

I don't know how fast a 3070 should be, but training for me on a 4080 with around 22-36 images, 25-30 repeats, around 10 epochs, batch size 5-7 and a learning rate of 0.00005 is taking 15-20 mins.

I try to aim for total steps between 1500-2500 using the formula:

steps = (images * repeats * epochs) / batch size
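
The formula above as a quick worked example. The function name and the rounding up are assumptions; the input values are picked from inside the ranges quoted in this message.

```python
import math


def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    # steps = (images * repeats * epochs) / batch size, rounded up
    # (the rounding is an assumption; trainers may round differently).
    return math.ceil(images * repeats * epochs / batch_size)


print(total_steps(images=30, repeats=25, epochs=10, batch_size=5))  # 1500
```

With 30 images this lands at the low end of the 1500-2500 target; raising the batch size without adding repeats or epochs lowers the step count.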

jaunty wadi Jun 10, 2023, 12:01 AM

#

sonic narwhal Bro my lora training on kohya_ss takes 6 hours on rtx 3070 anything im doing wro...

I'll say it certainly depends on settings, I come from the perspective of a 3080ti, if I accidentally were to add a large amount of epochs while say running dadapt, then it would take that long. Meanwhile, if I were to run it on Adamw8bit, it might be a fair margin easier/faster to get the lora baking, but for instance the model im asking for assistance with took like 2 hours for 4 epochs because its running DAdaptation/DAdaptADAM and a high network alpha/dim, so my answer is it depends lol

jaunty wadi Jun 10, 2023, 12:03 AM

#

jaunty wadi By the way, are there any DAdapt users or Lycoris model creators I could have a ...

in regard to my own message ive been looking through training data on civitai for some other styles and its like 60% use a trigger word and 40% don't with pretty insane levels of repeats, they be repeating 100 images like 15 times 💀, I'd be curious as to others captioning/training styles

jaunty grove Jun 10, 2023, 12:09 AM

#

I've made 8 character / person Lora's in recent days, all with amazing results, bar one, which was just a shade off of perfect.

I'm looking to train a style Lora next, are there any tips, anything different I need to do in terms of number of images to learn the style.

I assume I just need a variety of images in the style, and maybe more than the 22-36 character images I've been using for character training?

Do I need to aim for any increase in epochs, repeats, lower the LR etc?

Thx

jaunty grove Jun 10, 2023, 12:12 AM

#

jaunty wadi in regard to my own message ive been looking through training data on civitai fo...

My Lora's use a trigger word that is the same name as the Lora. Repeats are about 20-25 ish on maybe 30 images, batch size 5, epochs 10.

Lr is 0.00005

I try and keep total steps between 1500 and 3000

Captions on the images are using wd14 tagger, and no regularisation images (yet)

jaunty wadi Jun 10, 2023, 12:15 AM

#

jaunty grove I've made 8 character / person Lora's in recent days, all with amazing results, ...

I know I've been talking about questions with styles lora, but I'll try to give out some info I've discovered in my research (to be honest a lot of sources contradict each other)*

use kohya_ss

captioning requires a different kind of pruning, instead of character traits, some either don't bother or will comb out style details (aka artists names, line size, shading style, character names) this pruning is one im trying to figure currently

I find Lycoris to be far better at capturing styles, different parameters but the results have captured it better imo,

I hear people use at least 100-150+ for styles, sometimes far more,

epochs/repeats/LR are all really dependent on what you've got thus far, depending on your system you can run DADAPT and it tends to provide better output but takes more system requirements, otherwise yeah the classic 5e-5 with adamw8bit,

jaunty grove Jun 10, 2023, 12:22 AM

#

jaunty wadi I know I've been talking about questions with styles lora, but I'll try to give ...

Thx Pappas,

I'll definitely be using Kohya, as I am already for my character Lora's.

The caption pruning makes sense, given you don't really want the Lora to worry about character specifics.

I'll have to experiment, and see if it's better pruned or with no captions at all.

Image wise I can easily get 150+ as its a video game style I want to train.

Lycoris I saw some on Civitai, but wasn't sure about them, and saw I needed an extension for support.

I might kick off my first ever style training tomorrow, and experiment.

jaunty wadi Jun 10, 2023, 12:23 AM

#

jaunty grove Thx Pappas, I'll definitely be using Kohya, as I am already for my character Lo...

from what I've seen 80% of people use captions and those end up better, so I'd stick with it IMO, but thats up 2 u, it is an extension for support ,but its not a complicated extension fortunately, plug + play in that regard

#

the only thing I'll say that could even be deemed annoying about it is less documentation/ civitai autoupdater doesn't update it automatically

#

otherwise execution wise it inputs the same as lora except it'd be lyco: instead

jaunty grove Jun 10, 2023, 12:26 AM

#

Thx. Yeah captioning makes sense to me. It's how the model learns the association between the text and the image, be it character or style

I've a few extensions installed in Automatic1111, so one more won't hurt lol

Time to experiment over the weekend 😊

jaunty wadi Jun 10, 2023, 12:27 AM

#

execution wise should be really similar to loras even with training, most guides ive seen treat it identical, loras better for characters, lyco better style wise, lyco I've seen a fair bit of people lower the dim/alpha, but i mean if you run dadapt it doesnt matter as much

#

one thing im still struggling to see with these style loras/locons is whether I should retain art-style prompts in the tags, like if the style is really defined by thick outlines, do I retain that, or try to let it absorb without tags, in the model's current rendition it can add said outlines with either adding 1.5 weight or adding the tag "outline"

jaunty grove Jun 10, 2023, 12:32 AM

#

What does the dim/alpha do?

One thing I've found when experimenting with Lora's, is they can be weighted too high by default and I had to make them a 0.7 or 0.6 attention for them to look ok.

I resolved that issue eventually, but believe my original Lora's were over fitted, would that be right given me having to lower the attention.

Yeah I guess in some ways the training should be just learning the style, without you needing to specify 'outlines'. If the style has outlines, then I'd kinda expect the model to just learn it as part of the overall aesthetic

jaunty wadi Jun 10, 2023, 12:37 AM

#

jaunty grove What does the dim/alpha do? One thing I've found when experimenting with Lora's...

yeah I thought that last part would apply, but perhaps I haven't applied enough weight, dim and alpha are defined as "for lora weight scaling" with DIM always being higher or equal to alpha, "https://rentry.org/59xed3" this being one of the better guides, discussing said settings

jaunty grove Jun 10, 2023, 12:40 AM

#

jaunty wadi yeah I thought that last part would apply, but perhaps I haven't applied enough ...

I've already got 15 or so tabs open in my browser, as I've a backlog of reading to do on various SD and Automatic1111 topics. I'll be adding that link to the list 😊

Thx

north valley Jun 10, 2023, 3:01 AM

#

sonic narwhal Bro my lora training on kohya_ss takes 6 hours on rtx 3070 anything im doing wro...

uhhh, yeah, likely a ton

#

my LoRA's took like 10 minutes on a 3060ti. Likely you are doing a TON of images, with wayyy too many steps

#

if you are following like... anything from AItrepreneurs video on making LoRA's, then that certainly explains something. I do not recommend listening to anything from his LoRA video

#

its very bad information, and you will get very bad results from it

#

this was after like 5 days of trying his stuff to get a Na'vi LoRA

#

and these were about 30 minutes after I learned how to actually make LoRA's lol

sonic narwhal Jun 10, 2023, 8:17 AM

#

north valley and these were about 30 minutes after I learned how to actually make LoRA's lol

yeah I used around 100 images and I am following some stuff from aitrepreneurs video. What is the "actual way" of making a lora that you have learned?

tall condor Jun 10, 2023, 8:30 AM

#

any way to run more than a batch of 1 with rtx4090 when training on v2 models?

#

also is there any drawback of using memory efficient attention?

stiff dust Jun 10, 2023, 10:26 AM

#

jaunty grove What does the dim/alpha do? One thing I've found when experimenting with Lora's...

dim is the rank of the matrix factorization. You can think of LoRA as "compressed training": you train your model, but you compress the changes you made to the model to keep them small in memory. The rank is then more or less the compression strength. Higher rank = less compression = the model is more able to finetune. If your rank is as big as the matrices you change, LoRA becomes more or less equivalent to Dreambooth.
alpha is a scaling factor you multiply your LoRA weights with. It is divided by the dim internally, so you have to set alpha equal to dim to get an effective scaling factor of 1. The idea behind alpha is that LoRA needs a lower learning rate the higher the rank is (I mean, people train LoRA with a lr of 1e-4 to 1e-5, while Dreambooth for example is rather 1e-6 to 1e-7. If you used full-rank LoRA you would also have to use very small learning rates). So instead of manually changing the learning rate whenever you try a different dim, they scale down the LoRA when you increase the dim and provide an alpha parameter so that you can control this downweighting
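To make the dim/alpha relationship concrete, here is a minimal pure-Python sketch of the weight change a LoRA contributes, assuming the common convention that the low-rank product is multiplied by `alpha / dim`. The matrix values and the helper names (`matmul`, `lora_delta`) are made up for illustration and are not taken from any particular trainer.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(down, up, alpha):
    """Return the weight change (alpha / dim) * (up @ down).

    down has shape dim x in_features, up has shape out_features x dim.
    """
    dim = len(down)        # rank of the factorization
    scale = alpha / dim    # alpha == dim gives an effective scale of 1
    return [[scale * v for v in row] for row in matmul(up, down)]


# Toy rank-2 update for a 3x3 weight matrix (illustrative values only).
down = [[1.0, 0.0, 2.0],
        [0.0, 1.0, 1.0]]
up = [[1.0, 0.0],
      [0.0, 2.0],
      [1.0, 1.0]]

full = lora_delta(down, up, alpha=2)  # alpha == dim      -> scale 1.0
half = lora_delta(down, up, alpha=1)  # alpha == dim / 2  -> scale 0.5
```

With the same trained matrices, halving alpha halves the update, which is why changing alpha behaves a lot like changing the learning rate.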

stiff dust Jun 10, 2023, 10:27 AM

#

jaunty grove What does the dim/alpha do? One thing I've found when experimenting with Lora's...

many people found that downweighting the lora at inference time improves results. My theory is that downweighting the lora has a similar effect to EMA or to merging models (which also improve quality at inference time)

#

@jaunty grove have you tried LoRA on photorealism? I still struggle a little bit with that (not particularly with LoRA, I'm also not happy with dreambooth results).

main breach Jun 10, 2023, 1:23 PM

#

ok so hear me out: how cool would it be if you could make a mask for LoRA training or SD finetuning or dreambooth or whatever, where for each training image you could have various captions for the masked area. I've found especially in LoRA training, the ai sometimes cannot tell what's a piece of clothing or what is in the background, and features get baked in despite being very well described in the caption.

stiff dust Jun 10, 2023, 1:46 PM

#

oh, that's possible without problems

#

I implemented that myself for my lora training - however, I used it for training on subject without the background

tall condor Jun 10, 2023, 1:49 PM

#

why cant you just crop the image accordingly and caption the cropped parts?

stiff dust Jun 10, 2023, 1:51 PM

#

I don't think that works as well as using a mask

tall condor Jun 10, 2023, 2:01 PM

#

anyone here using dreambooth with kohya ss to train 2.1 models?

#

for some reason i can not do more than batch of 1

tall condor Jun 10, 2023, 2:20 PM

#

is there any drawback of using memory efficient attention?

jaunty grove Jun 10, 2023, 3:04 PM

#

stiff dust @jaunty grove do you tried LORA on photorealism? I still struggle a litt...

All 8 of the LoRAs I've trained so far have been photorealistic, and of real people. All have turned out amazingly well, bar one which is ok, but just not quite as good as the other 7.

I didn't do anything special. 15-35 images, 10-13 epochs, 20-28 repeats, and a batch size of 5.

I aim to get total steps between 1500-3000. Learning rate is 0.00005
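A rough way to check whether a given combination lands in that 1500-3000 window is sketched below. It assumes the usual Kohya-style accounting, where every image is seen `repeats` times per epoch and one step consumes one batch, and it ignores regularization images and gradient accumulation. The function name `total_steps` is just illustrative.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size):
    """Optimizer steps for a run where each image is seen `repeats` times per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Example values picked from the ranges quoted above.
print(total_steps(num_images=30, repeats=25, epochs=10, batch_size=5))  # 1500
```

If the number comes out too high, lowering repeats or epochs is usually the simplest knob to turn.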

gentle osprey Jun 10, 2023, 3:07 PM

#

Anyone got a top tier Lora training guide?

stiff dust Jun 10, 2023, 3:10 PM

#

maybe it's just the quality of my input photos 🤷‍♂️ they are quite jpeg-ish. When I train the unet for too long it starts to adapt to the graininess and not to the facial features

#

do you use textual inversion first, or do you use "sks person" style captions?

surreal lagoon Jun 10, 2023, 4:15 PM

#

i wonder if it's even possible to fix all of the faces always

#

@stiff dust so this is after tuning only the last 2 layers, do you think maybe i need to tune 3 deep?

stiff dust Jun 10, 2023, 4:18 PM

#

as said, I found the text encoder training rather helpful. I froze everything except the last 6 layers because this was recommended everywhere

#

but overfitting on the texture happened first in the unet

surreal lagoon Jun 10, 2023, 4:19 PM

#

well i didn't have texture overfitting when i froze more than 6 layers, but i did when i did what you suggested

#

this is at 5400 steps which would be pretty toasty otherwise by now

#

i was considering freezing the unet as an experiment until it worked out so well to use massive gradient accumulations on just 2 layers of the TE

stiff dust Jun 10, 2023, 4:22 PM

#

I don't really know what you are referring to.
I just said that for subject training I observed overfitting often rather in the unet than in the text encoder.

#

when you train lora you can switch off each layer after training to check how much it contributes to the produced image

surreal lagoon Jun 10, 2023, 4:23 PM

#

that alters the output in a Schrodinger type way

#

the math changes a lot, it's not like photoshop layers where you can disable one and see what it does more obviously

stiff dust Jun 10, 2023, 4:24 PM

#

there I found that

text encoder did most of the work
cross attention improved results slightly, but it also introduced overfitting on the jpeg-ness of the input
self attention did nothing

stiff dust Jun 10, 2023, 4:24 PM

#

surreal lagoon the math changes a lot, it's not like photoshop layers where you can disable one...

of course you can. That's why lora and model merging works

surreal lagoon Jun 10, 2023, 4:25 PM

#

model merging is done in many different ways, and often averages the weights together or does a 50/50 "one from here, one from there" (or other ratio)

stiff dust Jun 10, 2023, 4:25 PM

#

of course, the longer you train, the more interdependencies between the layer changes are introduced. But for subject training you only do a few epochs

surreal lagoon Jun 10, 2023, 4:25 PM

#

are you talking about making the lora a new layer on the base model?

stiff dust Jun 10, 2023, 4:26 PM

#

the lora is changing the base model

#

it's a delta you add to the weights

#

and of course you can just switch off the lora for any layer
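The "switch off the lora for a layer" idea can be sketched as follows. This is a toy illustration, not any real implementation: weights are flattened to short lists, and the layer names (`text_encoder.layer0`, `unet.cross_attn`, `unet.self_attn`) and the `apply_lora` helper are hypothetical.

```python
def apply_lora(base, deltas, enabled, weight=1.0):
    """Return base + weight * delta for enabled layers; other layers keep base weights."""
    out = {}
    for name, w in base.items():
        if name in deltas and enabled(name):
            out[name] = [b + weight * d for b, d in zip(w, deltas[name])]
        else:
            out[name] = list(w)
    return out


# Hypothetical layers with made-up values.
base = {
    "text_encoder.layer0": [1.0, 2.0],
    "unet.cross_attn": [0.5, 0.5],
    "unet.self_attn": [3.0, 4.0],
}
deltas = {
    "text_encoder.layer0": [0.5, -0.5],
    "unet.cross_attn": [0.25, 0.25],
    "unet.self_attn": [1.0, 1.0],
}

# Keep only the text encoder part of the LoRA and compare the resulting images.
te_only = apply_lora(base, deltas, enabled=lambda n: n.startswith("text_encoder."))
```

Generating the same seed with different `enabled` filters is one way to see which group of layers contributes what.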

surreal lagoon Jun 10, 2023, 4:26 PM

#

in automatic1111 it is merged into the base model weights at runtime but that's not how Diffusers is going to be doing it

stiff dust Jun 10, 2023, 4:27 PM

#

in Dreambooth/Fine-tuning the equivalent would be to set some layers back to their original weights after training

stiff dust Jun 10, 2023, 4:27 PM

#

surreal lagoon in automatic1111 it is merged into the base model weights at runtime but that's ...

I implemented my own lora and its doing it at runtime

#

but honestly, that's just implementation detail. It doesn't matter if you add lora on your weight matrix or on the input

#

math is the same. It's just a question if you want to change the loras frequently
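A small check of why the two approaches give the same result: merging the low-rank product into the weight matrix, `(W + up @ down) x`, equals adding the LoRA branch at runtime, `W x + up (down x)`. The sketch below uses toy values and hypothetical helper names.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


W = [[1.0, 2.0], [3.0, 4.0]]   # base weight
down = [[1.0, 1.0]]            # rank-1 down projection (1 x 2)
up = [[2.0], [0.0]]            # rank-1 up projection (2 x 1)
x = [1.0, 2.0]                 # layer input

# Option 1: merge the delta into the weights once, then run the layer.
delta = matmul(up, down)
merged_W = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
merged = matvec(merged_W, x)

# Option 2: keep W untouched and add the LoRA branch on the input at runtime.
runtime = [a + b for a, b in zip(matvec(W, x), matvec(up, matvec(down, x)))]

print(merged, runtime)  # [11.0, 11.0] [11.0, 11.0]
```

Merging costs nothing extra per forward pass, while the runtime branch makes it cheap to swap or reweight LoRAs.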

surreal lagoon Jun 10, 2023, 4:31 PM

#

aye

#

the dolphins are now birds Sad

#

it never made sense that they were dolphins but i loved it

#

tall condor Jun 10, 2023, 5:19 PM

#

whenever i train with kohya on 2.1 models as base my results are completely broken, i see color lines and there are no pictures actually existing

#

any idea why

stiff dust Jun 10, 2023, 5:21 PM

#

probably a bug in kohya

#

maybe they use the wrong scheduler or epsilon prediction instead of v prediction

tall condor Jun 10, 2023, 5:25 PM

#

anyone here run into that issue before?

#

v2 require v_parameterization?

sonic narwhal Jun 10, 2023, 5:53 PM

#

tall condor v2 require v_parameterization?

yes

tall condor Jun 10, 2023, 8:05 PM

#

maybe thats the issue

#

getting better now but the result still looks very broken, anything else special when training 2.1 rather than 1.5?

stiff dust Jun 10, 2023, 8:17 PM

#

768 is native resolution, DDPM is the default sampler I think.

#

learning rate should be less than in 1.5

surreal lagoon Jun 10, 2023, 8:24 PM

#

tall condor getting better now but the result still looks very broken, anything else special...

for dreambooth, lora, Textual inversion, or general fine-tune?

tall condor Jun 10, 2023, 8:24 PM

#

dreambooth

#

the images it generates are really bad

warm agate Jun 10, 2023, 9:54 PM

#

Which dataset was used to train SDXL.

surreal lagoon Jun 10, 2023, 11:13 PM

#

their own custom one

hot breach Jun 11, 2023, 4:18 AM

#

tall condor v2 require v_parameterization?

the 768 models use v_pred

#

the 512 ones do not IIRC
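A hedged sketch of how that rule of thumb could be encoded when building training arguments. The flag names `--v2`, `--v_parameterization` and `--resolution` are what I recall from kohya's sd-scripts, so double-check them against your installed version. The helper `sd2_training_flags` is hypothetical, and the "768 means v-prediction" rule is only the claim made above, not something the code can verify for a given checkpoint.

```python
def sd2_training_flags(base_resolution):
    """Build extra CLI flags for an SD 2.x base model.

    Assumes 768 base models use v-prediction and 512 base models do not.
    """
    flags = ["--v2", f"--resolution={base_resolution},{base_resolution}"]
    if base_resolution == 768:
        flags.append("--v_parameterization")
    return flags


print(sd2_training_flags(768))
print(sd2_training_flags(512))
```

Training a v-prediction checkpoint without the matching setting is one plausible cause of the "color lines, no picture" results described earlier.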

surreal lagoon Jun 11, 2023, 9:36 PM

#

@stiff dust i put the unet from 2.1 back on top of my burned 7650 step checkpoint, and using only the fine-tuned text encoder fixed some texture issues

#

so maybe i'll fully freeze the unet during training

stiff dust Jun 11, 2023, 9:36 PM

#

its so weird, cause its the total opposite of what people always suggest

#

"freeze the textencoder or everything will overfit!"

surreal lagoon Jun 11, 2023, 9:37 PM

#

😅

#

#

it way cleaned the images up

stiff dust Jun 11, 2023, 9:37 PM

#

but I found that with super simple textual inversion I can already learn different styles and concepts

#

so it seems that 2.1 unet is already very powerful and just needs the correct text tuning to get things right

surreal lagoon Jun 11, 2023, 9:38 PM

#

yeah i agree that's a great approach. i just wanted a good base model to do that on

stiff dust Jun 11, 2023, 9:38 PM

#

I hope you find a solution without freezing the unet, though ;D

#

maybe training it with lower learning rate?

#

or maybe you really need something like EMA on the unet to get good results

#

anyways, I go to bed, good night

surreal lagoon Jun 11, 2023, 9:53 PM

#

burned unet

#

#

text encoder at 7650 steps vs unet at 4200 steps ^

surreal lagoon Jun 11, 2023, 11:31 PM

#

how do i fine-tune the VAE?

vast dome Jun 12, 2023, 4:42 AM

#

guys I am training 1024 max pixel on 4090, and I am getting 1.00 it/s and 2 batch size

#

is this normal?

nocturne vale Jun 12, 2023, 7:03 AM

#

I'm about to train my first style LoRA out of an animated series from YouTube. Is there anything I'll have to think about in terms of aspect ratio and resolution of the dataset?

stiff dust Jun 12, 2023, 7:47 AM

#

surreal lagoon how do i fine-tune the VAE?

the trick is to only finetune the decoder, so that the finetuned vae is still compatible with the unet

#

besides that it's normal variational autoencoder training I guess

median ocean Jun 12, 2023, 8:29 AM

#

Hello, i would like to ask, if I want to train 1000 ~ 2000 datasets with captions using Dreambooth, may I know what is the most recommended parameters? (epoch, optimizer, scheduler, learning rate, mixed precision, warmup steps, text encoder, weight)
my goal is to create a style checkpoint similar to MJ.

#

so far the best setting for me is still Lion, bf16 precision, 1e-7, constant with warmup. but would like to know if there are other recommendations 😄

surreal lagoon Jun 12, 2023, 1:12 PM

#

stiff dust the trick is to only finetune the decoder, so that the finetuned vae is still co...

https://github.com/CompVis/stable-diffusion/issues/409

GitHub

Autoencoder training details · Issue #409 · CompVis/stable-diffusion

In the paper not many details are given regarding the autoencoder training for txt-to-image, and those would be very helpful! Can we get some answers? Which dataset the autoencoder is trained on? I...

#

theres no info

clever kayak Jun 12, 2023, 2:23 PM

#

Anyone had success using Lora from your own service?

twilit cradle Jun 12, 2023, 7:18 PM

#

Hey everyone! Can someone point me to a good source (or yourself if you know) where I can find info on the relationship between number of repeats, epochs and dataset images for LoRA training in Kohya? Like when should I do more repeats vs more epochs or vice versa, and what are the trade-offs?

hot breach Jun 13, 2023, 3:34 AM

#

surreal lagoon https://github.com/CompVis/stable-diffusion/issues/409

the yaml has hints, it shows you the loss function setup, most of what is needed is there, the target classes have more information if you go dive into the code

#

people have tuned the VAE using that code more or less, or like dreambooth forks of it, you'd need to look at the data loader class (fullopenimagestrain) and see what it is loading, many of the dataloaders in there are sort of one-offs for specific datasets and do some translation

#

it's also possible those yamls are just the last thing that was checked in when they published, and they may have used other datasets

#

🤷‍♂️

reef agate Jun 13, 2023, 4:48 AM

#

hey so im trying to use controlnet to convert a photo of 2 people into a cartoon version of themselves, however my generations always mess up the face of one of the 2 people like this

#

#

so i tried photoshopping the guys face and using img2img and controlnet to fix his face but then it doesnt really blend in with the environment around it

#

#

pls help

fair totem Jun 13, 2023, 6:15 AM

#

reef agate hey so im trying to use controlnet to convert a photo of a 2 people into a carto...

This is not the help channel

reef agate Jun 13, 2023, 6:49 AM

#

which is

surreal lagoon Jun 13, 2023, 1:22 PM

#

i just compile the model.

#

i get about 180 it per sec on a 4090

#

5800x3d

brittle ore Jun 13, 2023, 10:55 PM

#

reef agate pls help

Try inpainting?

vast dome Jun 13, 2023, 11:01 PM

#

hello guys how do I use regularization images? I couldn't find any resources online. Any help or instructions on how to use regularization images would be appreciated

velvet grove Jun 14, 2023, 3:17 PM

#

Hi, apologies if this has already been asked and answered but I couldn't find anything in the FAQ and I'm not really sure how to search for my question.

My question is - Would it work to train a stable diffusion model on different training data for a character's face and body and would I then be able to diffuse images of the character combining the relevant face and body?

Context - I'm developing a Visual Novel which uses art assets generated using AI. There are lots of benefits obviously but one of the main challenges is getting consistent results for the same character.

I've therefore chosen my favourite SD model and am going to be training a new model with that as a base to merge. However the training data I've collected for the character is going to be split between the face and the body of each character I'm training the model on.

And after 100s of hours of work on this I started to get a bad feeling that I'm being an idiot and that maybe this wouldn't work. Any advice would be greatly appreciated.

stiff dust Jun 14, 2023, 6:54 PM

#

I would say that it should work. Use textual inversion on the face to train token1 and on the body to train token2 and then use both tokens in the prompt

velvet grove Jun 14, 2023, 8:39 PM

#

stiff dust I would say that it should work. Use textual inversion on the face to train toke...

That's a relief. That was my plan. Thank you so much for the advice.

#

w00t

surreal lagoon Jun 14, 2023, 10:20 PM

#

@stiff dust my unet ~~has cleared up upon continuing to train and train~~ is still trash, i mis-read the filename

clear solstice Jun 15, 2023, 8:47 PM

#

hi, please what is the best approach to merge multiple models?

livid otter Jun 15, 2023, 9:06 PM

#

Links for a good guide on training photorealistic LoRA with Kohya SS? 🙏🏻

turbid ledge Jun 16, 2023, 5:42 AM

#

livid otter Links for a good guide on training photorealistic LoRA with Kohya SS? 🙏🏻

Did you find any ?

livid otter Jun 16, 2023, 5:47 AM

#

turbid ledge Did you find any ?

Nope

turbid ledge Jun 16, 2023, 5:47 AM

#

livid otter Nope

I just use all the default settings, except the model, which I changed to sd1.5

#

Everything is fine except the ✋

livid otter Jun 16, 2023, 6:30 AM

#

turbid ledge Everything is fine except the ✋

I guess hands are not strictly LoRA dependent right? If the base model sucks at hands your lora will suck at hands too. Just asking

turbid ledge Jun 16, 2023, 6:32 AM

#

livid otter I guess hands are not strictly LoRA dependent right? If the base model sucks at ...

Yeah you are right, the model plays a huge part too

unborn wind Jun 16, 2023, 10:04 AM

#

surreal lagoon how do i fine-tune the VAE?

Check out the blessup extension. It only allows fixing of brightness and contrast. I'd love to know if there are any other tools out there for vae editing.

https://github.com/sALTaccount/VAE-BlessUp

GitHub

GitHub - sALTaccount/VAE-BlessUp: A tool to easily modify a Stable ...

A tool to easily modify a Stable Diffusion VAE. Contribute to sALTaccount/VAE-BlessUp development by creating an account on GitHub.

surreal lagoon Jun 16, 2023, 1:24 PM

#

livid otter I guess hands are not strictly LoRA dependent right? If the base model sucks at ...

use the hands dataset

livid otter Jun 16, 2023, 2:07 PM

#

surreal lagoon use the hands dataset

For training? Which one specifically?

surreal lagoon Jun 16, 2023, 2:25 PM

#

Hands. it has 10k images in it

#

just google "10,000 images of hands dataset"

blissful vine Jun 16, 2023, 5:19 PM

#

livid otter Links for a good guide on training photorealistic LoRA with Kohya SS? 🙏🏻

https://www.youtube.com/watch?v=j-So4VYTL98

YouTube

Olivio Sarikas

LORA + Checkpoint Model Training GUIDE - Get the BEST RESULTS super...

This LORA + Checkpoint Model Training Guide explains the full process to you. Learn how to select the best images. How to key word tag the Images for Lora and Checkpoint Training. How many steps and Epochs to use in Training. How to Merge Models to get better results.

Link from my Video

Join my Discord: https://discord.gg/XKAk7GUzAW
Bu...

fierce egret Jun 16, 2023, 7:49 PM

#

Do you guys know of any tools similar to stable tuner that give you a graphical interface to help you fine tune a model?

surreal lagoon Jun 16, 2023, 8:18 PM

#

what would the interface do?

daring plume Jun 16, 2023, 10:18 PM

#

@waxen solar what kind of Data do you plan to use for your model?

waxen solar Jun 16, 2023, 10:19 PM

#

daring plume @waxen solar what kind of Data do you plan to use for your model?

as in the training set? i wanted to train a model on the character of Simon from Gurren Lagann haha

daring plume Jun 16, 2023, 10:22 PM

#

waxen solar as in the training set? i wanted to train a model on the character of simon from...

Get some frames of the Character and cut him out. Afterwards look at this and try to create a LORA. should be pretty simple.

https://civitai.com/models/22530/guide-make-your-own-loras-easy-and-free

[Guide] Make your own Loras, easy and free - colabs | Stable Diffus...

You don't need to download anything, this is a guide with online tools. Click "Show more" below. 🏭 Preamble Even if you don't know where to start o...

waxen solar Jun 16, 2023, 10:26 PM

#

daring plume Get some frames of the Character and cut him out. Afterwards look at this and tr...

hm i see, thank you for the link! what would you recommend to use as a source checkpoint/class images if i was trying to train a checkpoint model?

#

i have around 30 images in my training dataset

#

but i'm not quite sure what to use for the class dataset as it is an animated character?

surreal lagoon Jun 17, 2023, 12:12 AM

#

wow, unfreezing another layer of OpenCLIP after 2 weeks of fine-tuning is like punching it in the face with knowledge

#

it's amazing

ancient mural Jun 17, 2023, 2:27 AM

#

Hey, is anyone aware of an extension that will convert mp4 to jpg images?

#

I want to resize them and use them to train. Hoping I can do it locally

stiff dust Jun 17, 2023, 4:43 AM

#

surreal lagoon wow, unfreezing another layer of OpenCLIP after 2 weeks of fine-tuning is like p...

haha, so I'm curious what your conclusions look like in the end

#

using more clip layers than recommended?
repeatedly freezing and unfreezing parts during training?

surreal lagoon Jun 17, 2023, 4:44 AM

#

you can try the checkpoints from HF as ptx0/pseudo-real and ptx0/pseudo-real-beta

#

the former is 4.2k steps and the latter is 17.6k steps and at 15.6k i thawed another layer of the TE

#

the 17.6k ckpt with the 4.2k step unet

stiff dust Jun 17, 2023, 4:46 AM

#

yeah, the question is: was it a wrong decision freezing so many text layers at the beginning?

surreal lagoon Jun 17, 2023, 4:46 AM

#

i'm starting to see improvements in fine details like small faces

#

nope

#

allows major improvements to happen gradually and then i can kick the model in the face by opening layer 21

#

it's starting to diverge a lot, but that's the goal now

#

i'm considering putting together a very high quality 2048x2048 dataset for some more unet training to bring that back in line with the new text encoder

stiff dust Jun 17, 2023, 4:47 AM

#

I wonder if the unet sometimes does not keep up with the te and so it's good to freeze the te from time to time and unfreeze it after training the unet

surreal lagoon Jun 17, 2023, 4:47 AM

#

i've tried that before. it seems to work better the other way around

stiff dust Jun 17, 2023, 4:47 AM

#

like in the early days of deep learning when people had to train layer for layer 😅

stiff dust Jun 17, 2023, 4:48 AM

#

surreal lagoon i've tried that before. it seems to work better the other way around

what is the other way around?

surreal lagoon Jun 17, 2023, 4:48 AM

#

to train the text encoder with an old unet

#

in the -beta repo i've updated the TE but not the unet

#

it was working really well and producing very clean images around 4200 steps of fine-tuning, so i decided to keep that for a while and focus on the text encoder's representations

#

the thing is, i have kept training the unet and my inference validations do all of the combinations

#

i can see with the new TE with old unet, new unet with old TE, and then, both fully trained components together

#

the unet is so weird. it clearly influences the composition in ways i do not understand. it can introduce deformity. and it introduces more detail. but not necessarily coherently. this detail can manifest as the artifacting

#

both the TE and unet
the unet
the TE

#

imo it makes sense to periodically bring the unet weights up and see if it can remain clear, and if it starts to 'dirty up', go back to a ckpt where it was clean again and freeze it

stone garden Jun 17, 2023, 10:28 AM

#

anyone know how automatic vae works? Like I'd like to use the orangemixvae only for that while using the nai vae for everything else

#

I'd like it to be done automatically

vast dome Jun 17, 2023, 10:53 AM

#

jru

#

anyone using dreambooth? I am getting some weird result patterns, all my samples have this weird cut

#

#

#

is this because of dynamic image normalization

vast dome Jun 17, 2023, 11:26 AM

#

or is it because I am training it on 512x512

vast dome Jun 17, 2023, 2:10 PM

#

hello

smoky umbra Jun 17, 2023, 2:40 PM

#

Hey there! I'm having a bit of trouble with image generation. Whenever I try to generate an image, it just turns out completely black. I've tried a few different solutions, but nothing seems to be working. Do you happen to have any ideas on what might be causing this issue? Any help would be greatly appreciated

gentle osprey Jun 17, 2023, 4:26 PM

#

smoky umbra Hey there! I'm having a bit of trouble with image generation. Whenever I try to ...

Wrong channel

surreal lagoon Jun 17, 2023, 5:13 PM

#

http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html

for anyone who was curious what the aesthetics scores look like in practice when they see "LAION subset with an aesthetic score >6"

gentle osprey Jun 17, 2023, 6:21 PM

#

https://youtube.com/@machinelearningatberkeley8868

YouTube

Machine Learning at Berkeley

Machine Learning at Berkeley empowers passionate students to solve real world data-driven problems through collaboration with companies and internal research. To find out more, check out our website http://ml.berkeley.edu where you can sign up for our newsletter and apply for our project teams.

#

For people looking to get in the weeds

surreal lagoon Jun 17, 2023, 6:46 PM

#

@undone fable is the unet responsible for learning resolutions or is the text encoder? or both?

undone fable Jun 18, 2023, 12:26 AM

#

surreal lagoon @undone fable is the unet responsible for learning resolutions or is the...

The attention layers in the unet do the heavy lifting in regards to resolutions and aspects

surreal lagoon Jun 18, 2023, 12:31 AM

#

thank you 🙂

#

does it make sense to f/t the openclip on it to some extent?

#

also if i were to try and freeze the gradients on some piece of the unet to prevent the texture crisp, which would you suggest

surreal lagoon Jun 18, 2023, 3:04 PM

#

damn, loss is as low as i've ever seen it when training with multiple high-res aspect ratios

#

0.165 on average

tulip dagger Jun 18, 2023, 3:07 PM

#

#

do you know how i can fix this inpainting mistake?

#

#

#

What the heck is even this!?

gentle osprey Jun 18, 2023, 3:24 PM

#

Your denoising strength is way too high

stiff dust Jun 18, 2023, 3:31 PM

#

use the inpainting control net.

#

without inpainting model or control net you can only use low denoising strength

#

also you should use "original" instead of "latent noise" if you just want to make smaller changes

surreal lagoon Jun 18, 2023, 5:18 PM

#

114/4025 [22:36<34:44:08, 31.97s/it, loss=0.146, lr=4.07e-9]

#

i like these loss values here

tired atlas Jun 18, 2023, 6:14 PM

#

Howdy all...does anyone have any tutorials/resources on how to get the best resemblance results for training a model? I know the process, but most tutorials are entry-level and just gloss over the quality of the data set like, "Just get a collection of 10-30 images of the person from different angles, different light, different focal lengths." Does anyone have a resource that dives into that a little deeper? I want to dramatically improve my results.

surreal lagoon Jun 18, 2023, 8:30 PM

#

it is true, guides toward what a good and bad dataset look like are slim

#

JPEG style artifacts are a huge deal. any kind of image noise gets "focused on" it seems, by the convolutional neural network, aka "the unet"

#

images that are too similar and only a few pixels off are another problem, eg. think a couple of random crops of the same image where the face shows up offset. this could cause blurring/double vision. so you do truly want your images to be "varied", more than "similar".

#

aspect ratio bucketing is also a big deal
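For anyone unfamiliar with the term, a minimal sketch of the core of aspect ratio bucketing: each image is assigned to the training resolution whose aspect ratio is closest to its own, so that batches can be built from same-shaped images instead of cropping everything square. The bucket list and the `pick_bucket` helper are hypothetical; real trainers generate their buckets from a pixel budget.

```python
import math

# Hypothetical (width, height) buckets with roughly similar pixel counts.
BUCKETS = [(512, 768), (576, 704), (640, 640), (704, 576), (768, 512)]


def pick_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest (in log space) to the image's."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


print(pick_bucket(1920, 1080))  # (768, 512)
print(pick_bucket(1000, 1000))  # (640, 640)
print(pick_bucket(1080, 1920))  # (512, 768)
```

Comparing ratios in log space treats "twice as wide" and "twice as tall" symmetrically.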

sturdy rune Jun 19, 2023, 10:58 AM

#

surreal lagoon `114/4025 [22:36<34:44:08, 31.97s/it, loss=0.146, lr=4.07e-9]`

31.97s/it?! How?

surreal lagoon Jun 19, 2023, 12:59 PM

#

1920x1080

#

batch size 150

surreal lagoon Jun 19, 2023, 6:36 PM

#

221.09s/it, loss=0.123

#

worth it lmao

unborn wind Jun 19, 2023, 9:07 PM

#

surreal lagoon `221.09s/it, loss=0.123`

Are you using Scale v prediction loss? I'm getting similar loss rates with it. If you're not, you might be able to get something even better.

#

Another difference I'm finding, however, is how fast the model becomes over-trained with it selected. What used to take 22 epochs now gets overtrained in 8.

surreal lagoon Jun 19, 2023, 9:13 PM

#

that's 2.1-v

#

it's always v-prediction loss

unborn wind Jun 19, 2023, 9:15 PM

#

do you mean by selecting it, it's doing it twice?

surreal lagoon Jun 19, 2023, 9:16 PM

#

i have no idea what that option does 😅

#

what are your loss values like without this?

#

mine are typically .3-.5 with spikes up to .7-.9, and i do not use prior preservation loss

unborn wind Jun 19, 2023, 9:17 PM

#

usually .3-.4

surreal lagoon Jun 19, 2023, 9:17 PM

#

interesting that a lower loss causes more burning for you. it is the other way around

unborn wind Jun 19, 2023, 9:18 PM

#

https://xrg.hatenablog.com/entry/2023/06/02/202418

Hatena Blog

xrg

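Loss of noise_prediction models and v_prediction models - 勾配降下党青年局

The Stable Diffusion v1 models predict the noise added to an image, but some of the v2 models predict something called velocity. The two use different loss functions, so their loss values cannot be compared directly. Empirically, the loss of a v_prediction model tends to be about 3 times larger; the post goes on to check this mathematically. For a noised image, the original image and the noise are combined by a formula that gives the image with noise added at a given timestep. The original image here is the latent output by the VAE encoder, so it follows a normal distribution with mean 0 and variance 1. The noise is, by implementation, normally distributed with mean 0 and variance 1 to begin with. A simplifying assumption is then made for convenience. Then the image…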
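For reference, a sketch of the two training targets being compared, written in the commonly used notation. The symbols ($x_0$ for the clean latent, $\epsilon$ for the noise, $\bar\alpha_t$ for the cumulative noise schedule) are the standard ones and are an assumption here, not necessarily the ones the linked post uses.

```latex
% Noised latent at timestep t
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Noise-prediction (epsilon) objective, used by SD 1.x
\mathcal{L}_{\epsilon} = \mathbb{E}\left[ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \right]

% Velocity target and v-prediction objective, used by the SD 2.x 768-v models
v_t = \sqrt{\bar\alpha_t}\, \epsilon - \sqrt{1 - \bar\alpha_t}\, x_0,
\qquad
\mathcal{L}_{v} = \mathbb{E}\left[ \lVert v_t - v_\theta(x_t, t) \rVert^2 \right]
```

Because the two objectives regress different targets, a loss of 0.15 on one says nothing directly about a loss of 0.15 on the other.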

surreal lagoon Jun 19, 2023, 9:18 PM

#

loss is typically fed into the backwards pass

`accelerator.backward(loss)`

#

oh, well that sounds interesting

#

"improvement of details" is fucking vague

#

i will say that not every research paper has valid conclusions for their proposed reasoning

unborn wind Jun 19, 2023, 9:20 PM

#

yup lol. i'm messing around with it. dunno if it's good or not due to how fast my models get over-trained before I feel it has sufficiently learned the concepts.

surreal lagoon Jun 19, 2023, 9:23 PM

#

yeah that sucks and maybe you can go nuts and download LAION Aesthetics locally for like 20,000 images and give that a try for training

#

see if the dataset size helps at all with this early burning issue

#

alternatively i would request that you put your batch size up as high as you can and use a lot of gradient accumulations. think like, batch size 5 and 30 gradient accumulations

unborn wind Jun 19, 2023, 9:24 PM

#

I trained a LoHa on a dataset of about 700 images and it burnt out at 8 epochs. I then cut it in half and trained again and it seems overtrained at 6 epochs.

surreal lagoon Jun 19, 2023, 9:24 PM

#

you can increase your batch size, and it will effectively lower your learning rate

#

but it does so in a more coherent way that is less likely to get stuck in local minima

#

so it will take longer to train, and you can have more repeats before problems appear

#

you might need to still adjust your LR

unborn wind Jun 19, 2023, 9:26 PM

#

I left them at the default in Kohya

surreal lagoon Jun 19, 2023, 9:29 PM

#

i honestly have no idea what LoHA is and how you train it

#

if you're seeing "burnt" models it's because the unet is being fucked

#

the unet can be fucking wild, man lmao

#

step 100 of training

#

step 125

unborn wind Jun 19, 2023, 9:31 PM

#

LoHa: https://openreview.net/pdf?id=d71n4ftoCBy

surreal lagoon Jun 19, 2023, 9:31 PM

#

that's just the unet being trained

#

i'm quite liking this amount of learning it does every 25 steps tbh

#

5e-7 is my constant scheduler i'm using to train the unet

#

i would recommend you try that with the batch size changes i mentioned

#

your batch_size itself likely can't exceed 4 due to GPU memory, but take whatever batch_size you can run with and work out what you have to multiply it by to get to 150, then put that number as your gradient accumulation steps. eg. 30 for batch_size 5
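The arithmetic behind that suggestion, as a small sketch. The target of 150 comes from the messages above; the function name `accumulation_steps` is just illustrative, and rounding up is my own choice for batch sizes that do not divide the target evenly.

```python
import math


def accumulation_steps(target_effective_batch, per_step_batch):
    """Gradient accumulation steps needed to reach (at least) the target effective batch."""
    return math.ceil(target_effective_batch / per_step_batch)


for batch_size in (5, 4, 1):
    steps = accumulation_steps(150, batch_size)
    print(batch_size, steps, batch_size * steps)
# 5 30 150
# 4 38 152
# 1 150 150
```

The effective batch is `batch_size * accumulation steps`, so a batch size of 4 overshoots slightly to 152.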

unborn wind Jun 19, 2023, 9:34 PM

#

ok I'll give it a try

surreal lagoon Jun 19, 2023, 9:37 PM

#

i wonder if tuning the text encoder and unet together is just really damaging in general

#

it's not done a whole lot in production teams, eg. stable diffusion 2.1 was made on a frozen text encoder

#

i've had really good results tuning the TE on its own, or the unet on their own but not both together

unborn wind Jun 19, 2023, 9:38 PM

#

is there a way in kohya to do one but not the other? would you set lr of one of them to 0?

surreal lagoon Jun 19, 2023, 9:38 PM

#

both combined -> just the TE -> just the unet

#

well i'm doing aspect bucketing now too

#

the text encoder is being put through total hell with that

#

i just don't understand why combining them makes the image all poopy

#

@stiff dust any idea ?

#

both -> text encoder -> unet

#

the unet is picking up widescreen details but the text encoder is not

unborn wind Jun 19, 2023, 9:45 PM

#

Is it possible to just train one or another in kohya?

surreal lagoon Jun 19, 2023, 9:46 PM

#

no clue you would have to look at their source code

#

or open an issue to ask

#

dem fingers tho

#

text encoder = two dudes lmao

#

unet = one dude

#

he beat him, took his guitar, and stole the show and our hearts

#

i wish the contrast issue weren't there, but, one thing at a time

#

faces 👁️👄 👁️

#

@undone fable what was the thing with your contrast issue recently? seeing the same thing here with the zero scaled SNR

surreal lagoon Jun 20, 2023, 4:51 AM

#

seem to have mostly eliminated the proliferation issue here 🙂 that's pretty great

#

that one image at the bottom still doing it but that seems to be a cursed seed

sturdy rune Jun 20, 2023, 2:15 PM

#

surreal lagoon i just compile the model.

Can you elaborate on this? I have a 4090 and I'm getting like 4.9it/s when training

surreal lagoon Jun 20, 2023, 2:20 PM

#

that's really fast

#

i get 229 seconds per iteration or so

#

the high it/sec was when i was training 512x512 on 1.5

#

it was also with a batch size of 1, which isn't good for training

#

if you're getting 4.9it/sec and you're doing a general finetune you probably need to slow it down with a higher batch size, which sounds stupid, but it works

sturdy rune Jun 20, 2023, 2:31 PM

#

Oh lower is better?

#

On a side note, can anyone point me to a Lora guide or video that DOESN'T just teach you how to overfit a Lora... There are several popular ones I've seen on YouTube where all they're doing is teaching you how to make an extremely overfit lora of yourself... While those were good to get my feet wet, I'm trying to train concepts like break dancing, tipping hats, etc... And following those guides, the results either end up ugly or just completely overfit. I'm fine having to make multiple attempts to get a good Lora... But I'm currently just stuck in an endless loop, because there's so much conflicting information

surreal lagoon Jun 21, 2023, 5:33 AM

#

idk about LoRAs but i finally broke this damn family up with fine-tuning on high res

#

there used to be a mom

#

and like 6 other kids

coral summit Jun 21, 2023, 11:19 PM

#

could someone help me fix this hand?

raven shore Jun 22, 2023, 12:20 AM

#

Have you tried inpainting the hand?

mental cliff Jun 22, 2023, 3:15 PM

#

Hey guys ! hand again. I've tried it all and cannot fix the hands on this one. Any idea how to proceed.
I just photobashed the hand from my 3D base and tried to use the depth map but its having a hard time showing hands following my map.

#

idk why but it shows everything but proper hands

surreal lagoon Jun 22, 2023, 4:25 PM

#

this isn't the "help me fix hands" channel

dull snow Jun 23, 2023, 11:45 AM

#

coral summit could someone help me fix this hand?

thanos

#

Shes about to snap

fallow pier Jun 23, 2023, 11:50 AM

#

I've created a custom model and reduced down to two checkpoints I can't decide on. I was wondering, if I merge both the models will I get something in-between? or will it just mess them up?

stiff dust Jun 23, 2023, 12:07 PM

#

just try it. Usually, merging improves model quality

#

messing up is rather unlikely. It's often surprisingly robust

vernal dock Jun 23, 2023, 12:28 PM

#

Hello! Has anyone been able to train clothing with patterns or drawings to generate it 1:1 (for example training a LORA of a dress (object only) and using it in generations so that models can wear it)?

#

Thanks!

surreal lagoon Jun 23, 2023, 3:01 PM

#

@stiff dust so i'm trying to train on frozen text encoder considering i've seen others have good success with it and i'm getting a disappointing amount of noise in the images

#

#

the images look fine at a glance but zooming in, ugh

stiff dust Jun 23, 2023, 3:03 PM

#

could be worse. But yeah, so far I've had better experiences with training the text encoder than training the unet

surreal lagoon Jun 23, 2023, 3:04 PM

#

but i need to train the unet to fix the proliferation issues

#

i tried training the text encoder and the unet with different rates and it still doesn't seem to work very well and i think it's because the text encoder's weights are shifting while the unet is trying to match its representations

#

so my current theory is, going back to base 2.1-v to tune the unet only, on larger aspect ratios, and then, bring the text encoder around a bit and hopefully clear the image up

#

that's what this image above is, 450 steps into that

#

the contrast changes from the noise schedule are probably still getting worked out too

#

the left is the original ckpt and the right is now

stiff dust Jun 23, 2023, 3:09 PM

#

yeah, I also train them consecutively, but I haven't evaluated how much that helps. I'm training the text encoder first, then the unet

surreal lagoon Jun 23, 2023, 3:10 PM

#

when i did that i noticed coherence issues starting to arise in some of my test prompts. i have one asking for a mountain bike on a mountain road and the bike disappeared

#

it's still there at a different resolution but it's odd

#

it's not necessarily a show-stopper but it makes me wonder what will happen if i continue, and things just kinda got worse across other prompts

stiff dust Jun 23, 2023, 3:11 PM

#

nah, I found this happening all the time without meaning anything

surreal lagoon Jun 23, 2023, 3:11 PM

#

when you use laion data, are you using the text field as a caption?

stiff dust Jun 23, 2023, 3:12 PM

#

sometimes an image just flips between two different outcomes even with the smallest change on the weights

surreal lagoon Jun 23, 2023, 3:12 PM

#

i am wondering if that's hurting me

stiff dust Jun 23, 2023, 3:12 PM

#

if you train further the bike might come back 🤷‍♂️

surreal lagoon Jun 23, 2023, 3:13 PM

#

here's an example of a ckpt that made me want to give up vs some improvements in it that i'm like " pikaOMG it's not over yet"

#

the original isn't so great tho either

#

i'm not using snr gamma or offset noise on this run because it was causing issues before, a little over halfway through training

#

but maybe i need a little bit of it

#

this noise in the water is what i was able to eliminate before with a bit of concurrent unet/text encoder tuning

hot breach Jun 23, 2023, 4:58 PM

#

surreal lagoon i'm not using snr gamma or offset noise on this run because it was causing issue...

I think min snr and offset are essentially dead. All the "diffusion is flawed" stuff, which includes zero terminal snr with cfg rescaling, is the way to go. Zero terminal works right out of the box on sd2.1 and doesn't cause divergence, and it works fairly well even if cfg rescale isn't applied at inference, though it certainly helps

surreal lagoon Jun 23, 2023, 4:58 PM

#

SAI used offset noise with epsilon prediction instead for SDXL

hot breach Jun 23, 2023, 4:59 PM

#

I saw their loss.py didn't include zero term, just l1, l2 and lpips for the vae, but I swore someone said they used zero terminal

surreal lagoon Jun 23, 2023, 4:59 PM

#

Joe Penna did but i think he was just messing with people

hot breach Jun 23, 2023, 4:59 PM

#

they may have done something else in noise schedulers

#

lol

surreal lagoon Jun 23, 2023, 4:59 PM

#

they're using standard schedulers like DDIM

#

nothing changed there

#

i slapped the text encoder from pseudo-journey-v2 (15.6k steps fine-tuned) and the noise is gone in most images i try

hot breach Jun 23, 2023, 5:03 PM

#

surreal lagoon Joe Penna did but i think he was just messing with people

@cobalt nova 🤨 catwhaaa

surreal lagoon Jun 23, 2023, 5:03 PM

#

they might have tried it and then abandoned it

#

the huggingface guy Patrick said he didn't think terminal SNR was so groundbreaking

#

at my own expense, i quickly tuned a model about 30k steps at batch size 150 for them to get the fixes working inside diffusers, otherwise the work was going to stall

#

that model knows what darkness is but it takes things too damn far

#

i found lower FID scores corresponded with higher cfg rescaling around 0.7 but the aesthetics scorer had the highest score around rescaling at 0.3 so i split the difference and settled on 9.2 guidance and 0.3 rescale, as it has the best intersection point between the two scores

#

the aesthetics score is highest when there's no cfg rescaling done but to my actual human eyes that shit is blown out and hyper contrasted

#

@stiff dust i hoped VAE tiling would improve faces, does it not work that way?

#

or the representation going in i guess is still very small

stiff dust Jun 23, 2023, 5:25 PM

#

just from the name I thought vae tiling is just splitting an image and decoding the parts independently 🤷‍♂️ for convolution it doesn't matter so much, but attention gets expensive if you process a large image at once

surreal lagoon Jun 23, 2023, 5:52 PM

#

ohhh... hmmmmm

#

it actually seems to look better 😐

#

#

ah i like this a lot now actually

surreal lagoon Jun 23, 2023, 6:19 PM

#

exotic musk Jun 24, 2023, 12:18 AM

#

what tips do u guys have for deciding a good network dim and alpha based on the dataset?

narrow kraken Jun 24, 2023, 12:19 AM

#

@exotic musk there is a paper that was released for this

exotic musk Jun 24, 2023, 12:19 AM

#

is there a link u can give me?

narrow kraken Jun 24, 2023, 12:20 AM

#

type   network_dim  network_alpha  conv_dim  conv_alpha
LoRA   32           16
LoCon  16           8              8         1
LoHa   8            4              4         1

#

i'll look for the article

exotic musk Jun 24, 2023, 12:21 AM

#

but, i am trying to figure out how to decide and switch up the networks based on my dataset size

narrow kraken Jun 24, 2023, 12:22 AM

#

i know, you want a correlation between the 2

#

it's really hard to determine since there is no real baseline to compare against

#

i'll search for that paper and get back to you

exotic musk Jun 24, 2023, 12:24 AM

#

ok

#

thank you appreciate it.

acoustic tapir Jun 24, 2023, 12:51 AM

#

Hi! Could anyone please help me a bit with training? I was trying to train on my own images and tried dreambooth as well as embeddings, but they all seem to build on another existing model and the results seem pretty off. Are there other recommended ways to do it? Thank you!

cobalt nova Jun 24, 2023, 4:17 AM

#

surreal lagoon Joe Penna did but i think he was just messing with people

We tried it.

#

Wasn't better.

#

Made some darker images, but messed up a lot of stuff.

#

Will give it another go when we have some time.

surreal lagoon Jun 24, 2023, 4:20 AM

#

but that wasn't with v-pred?

#

it might just be something that people choose to fine-tune in. i would love to research that, i did apply for access to the weights.

#

i did see mcmonkey say that offset noise is being used and i assume snr gamma, in which case the terminal SNR stuff seems to 'fight' with it and it makes the darks splotchy

#

and 'too' dark

surreal lagoon Jun 24, 2023, 7:48 AM

#

i can get a white background now

stiff dust Jun 24, 2023, 8:35 AM

#

acoustic tapir Hi! Could anyone please help me a bit with training? I was trying to train my ow...

I'm also not super happy with my results so far, but I found:

textual inversion is extremely helpful
even better is a lora on the text encoder
training the unet is necessary to achieve real photorealism, but should be done carefully. For drawing anime pictures of yourself, textual Inversion or text encoder is usually enough

surreal lagoon Jun 24, 2023, 5:25 PM

#

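import open_clip
import torch
from PIL import Image

# load the CoCa captioning model and its matching image preprocessing transform
model, _, transform = open_clip.create_model_and_transforms(
  model_name="coca_ViT-L-14",
  pretrained="mscoco_finetuned_laion2B-s13B-b90k"
)

# preprocess one image and add a batch dimension
im = Image.open("cat.jpg").convert("RGB")
im = transform(im).unsqueeze(0)

# generate caption tokens without tracking gradients
with torch.no_grad(), torch.cuda.amp.autocast():
  generated = model.generate(im)

# decode the tokens and strip the start/end markers
print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))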

think how well does this work?

#

is it better than clip_interrogator?

#

@stiff dust might be a dumb question but can you use a textual inversion on the text encoder during training?

stiff dust Jun 24, 2023, 6:19 PM

#

yes, if the embedding layer is not frozen it happens automatically

#

or do you mean "using" an already trained TI? But same answer

opal spoke Jun 24, 2023, 6:47 PM

#

Guys is there any tutorial to train LoRA on Google Colab

https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-dreambooth.ipynb#scrollTo=8VT6NLv-2u6q

surreal lagoon Jun 24, 2023, 7:31 PM

#

i think i will put multi-checkpoint comparison into my discord bot somehow

#

currently i can switch between models and do a single batch for one, but i could just have a config for which models to compare and then return one image from each, and to steal an idea from sdxl training, have buttons to tell it which is the best, and keep score for each model

surreal lagoon Jun 24, 2023, 10:21 PM

#

stiff dust yes, if the embedding layer is not freezed it happens automatically

this stuff is still a struggle for me to understand even as far as i've gotten with it in practical terms. i am beginning to understand after reading the CLIP paper by OpenAI that the text embeds are essentially a condensed representation of a string-image pair in latent space, right? it does make sense to me that the academics behind this stuff would be confused why we want to fine-tune it. but i am not sure what we're changing or benefitting when we fine-tune it. is it, to ensure it more closely follows prompts?

#

the text encoder seems to be important for image clarity, but i'm not sure how it's achieving that. eg. a textual inversion's goal is to find the most optimal vector that produces the least noisy result

stiff dust Jun 24, 2023, 10:44 PM

#

I would say TI is quite intuitive. You introduce a new token, not known to the text encoder, and set its weights such that it represents/describes what you see in the images

surreal lagoon Jun 24, 2023, 10:55 PM

#

am i mixing up embeds and inversions?

stiff dust Jun 24, 2023, 10:55 PM

#

dunno, embeds is a rather generic term

surreal lagoon Jun 24, 2023, 10:56 PM

#

i'm thinking of the negative prompt embeds you see like 'nfixer'

stiff dust Jun 24, 2023, 10:56 PM

#

yes, those are usually TIs

surreal lagoon Jun 24, 2023, 10:57 PM

#

i basically want to make that somehow the default weights of the text encoder rather than have to prompt for it

stiff dust Jun 24, 2023, 10:58 PM

#

that tuning the text encoder is so powerful is surprising, but I would not think of the text encoder as a single text-image embedding. It is a transformer, so EACH word in the text is given to the unet. The text encoder contextualizes the words, such that they somehow better align to the image space. So I assume that the sentence "anime movie of a castle floating in the sky" is contextualized in a way that the word "castle" is not just connected to images of castles, but to images of floating castles and anime movies. Thus the unet "knows" how to relate that word to the pixels in the image

#

now you can either train the unet to better connect words to the image, or the text encoder, such that the contextualization better fits to what the unet already learned

surreal lagoon Jun 24, 2023, 10:59 PM

#

yeah i thought of that too. eg. "this word is likely to appear with these other words"

#

eg. that's how you get a human with hands without asking for a hand

stiff dust Jun 24, 2023, 11:00 PM

#

surreal lagoon i basically want to make that somehow the default weights of the text encoder ra...

maybe train a prior of high quality images without a caption in 5% of the cases

surreal lagoon Jun 24, 2023, 11:00 PM

#

oh i definitely don't drop captions randomly though i did read that helps with CFG

stiff dust Jun 24, 2023, 11:01 PM

#

I mean if you do not provide a negative prompt, the "" prompt is used

surreal lagoon Jun 24, 2023, 11:02 PM

#

#

current tuning results

#

i've got clear images, low residual noise and true blacks

#

multi-aspect support

#

but i just want all of the images this clear Sad

surreal lagoon Jun 24, 2023, 11:20 PM

#

by the way it seems to do better "small faces" at higher resolutions now that i've trained it on 1766x1024 and 1024x1766 and 1024x1024 (plus two other aspects i forget)

#

so i can get a crowd of people with better faces because each face is actually larger

surreal lagoon Jun 24, 2023, 11:28 PM

#

stiff dust I mean if you do not provide a negative prompt, the "" prompt is used

i'm supposed to have negative captions for images too? 😮

#

so i've got some conditions in my training dataloader where it is possible their captions get completely reduced to "", just an empty string, which i was kind of happy about, because i just assumed it would help with CFG

#

but it was unintentional and just kind of sporadic. it would be nice to have a minimum dropout threshold

#

@hot breach what kind of improvements have you all seen from consistently applied caption dropout

#

i don't do image flipping either as i don't want to damage text coherence

hot breach Jun 24, 2023, 11:40 PM

#

conditional dropout should in theory improve CFG scale response, you can get funny results if you use super high values like 0.5 or 50%

surreal lagoon Jun 24, 2023, 11:41 PM

#

i don't have any values to consider so for me it'll just be a % i'm implementing here

hot breach Jun 24, 2023, 11:41 PM

#

the publication from the SD release said they used 10%

surreal lagoon Jun 24, 2023, 11:42 PM

#

yeah that's where i remember reading this

#

hmmmm okay so not using enough caption dropout is reported to result in an overreliance on prompts and basically overfitting TO prompts, memorizing their vocabulary and damaging the ability to generalise on unseen captions

hot breach Jun 24, 2023, 11:45 PM

#

I think I usually use 0.02 to 0.05 the most, higher values can sometimes have an effect of forcing a style into generations without asking for the style, you can get some interesting results, not sure its super useful to use super high values but you might want to try it out just so you can see how it differs, on a small scale
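A rough sketch of the caption dropout being discussed, assuming it is applied per sample in the dataloader. The helper name and the `rng` parameter are made up for illustration; 0.1 matches the 10% figure mentioned above and 0.02 to 0.05 would be the lower range:

```python
import random

def maybe_drop_caption(caption, dropout_rate=0.1, rng=random):
    """Return an empty caption with probability dropout_rate, else the caption unchanged.

    The empty string is the same "" prompt used for the unconditional
    branch of classifier-free guidance at inference time.
    """
    if rng.random() < dropout_rate:
        return ""
    return caption

# Example: count how many of 10,000 captions get dropped at a 10% rate.
rng = random.Random(0)
captions = ["a wizard in a forest"] * 10_000
dropped = sum(1 for c in captions if maybe_drop_caption(c, 0.1, rng) == "")
print(dropped / len(captions))
```

Using an explicit rate like this gives a consistent dropout floor instead of captions only occasionally collapsing to "" by accident.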

surreal lagoon Jun 24, 2023, 11:46 PM

#

ok interesting so i should try training the lord of the rings dataset without any captions at all then

hot breach Jun 24, 2023, 11:49 PM

#

I expect if you do that for long enough, you'll see lord of the rings characters pop up in generations despite not asking for them, particularly at lower cfg scale settings

surreal lagoon Jun 24, 2023, 11:49 PM

#

that's kind of the goal with that model 😄

#

it's just a toy because i messed with it in a small scale and managed to get a downhill mountain bike sitting against a mountain in one of the lord of the rings style forests and i was like POGGERS

#

also when i prompt for wizards now it brings up Gandalf KEKL

#

friggen Coco knows about hobbits and wizards as well as blip does

#

PepoG

#

i guess we're into a new era, something's happening KEKL

#

Trust the Process ™️

raven shore Jun 25, 2023, 5:29 AM

#

Is there any reliable way to upscale a busy image with a lot of small buildings in the distance?

dark egret Jun 25, 2023, 12:11 PM

#

raven shore Is there any reliable way to upscale a busy image with a lot of small buildings ...

I've had mixed success in these cases (at least using this model https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler)

surreal lagoon Jun 25, 2023, 2:49 PM

#

raven shore Is there any reliable way to upscale a busy image with a lot of small buildings ...

this is the wrong channel for questions like that. but you should ask about controlnet in #1072238304042438758 or #💬｜general-chat or #🏞｜general-with-images

raven shore Jun 25, 2023, 2:50 PM

#

surreal lagoon this is the wrong channel for questions like that. but you should ask about cont...

Sorry about that, my mistake! Thanks

surreal lagoon Jun 25, 2023, 2:53 PM

#

np

surreal lagoon Jun 25, 2023, 5:36 PM

#

@hot breach have you noticed the breakdown of terminal SNR capabilities for darkness at some point?

hot breach Jun 25, 2023, 5:37 PM

#

no

#

it seems stable long term, unlike offset noise

surreal lagoon Jun 25, 2023, 5:37 PM

#

this is about the best i can get for solid black background and it only really works at the 1:1 aspect ratio the model was trained on

#

left side is terminal SNR and right is offset noise at 1280x720

1687712951.9933171597a8ddc7b4f4a3c6e576f8eac60322.png

1687713789.3587084e6b6af2cd38760ac840c68d4601e44d8.png

hot breach Jun 25, 2023, 5:38 PM

#

maybe at some certain scale it could, beyond what I've tested but I've done many 10k steps on a 25k image dataset of just random assortments of things and not noticed issues

surreal lagoon Jun 25, 2023, 5:38 PM

#

hm, interesting

#

and you just grab the trained betas from DDIM and pass into DDPM?

#

for training noise scheduler

hot breach Jun 25, 2023, 5:39 PM

#

my test data is not very concentrated on bright or dark in particular

#

I use the algorithm that was in the paper to create trained_betas and pass that in to the noise schedule as trained_betas which diffusers accepts

#

DDIMScheduler.from_pretrained(model_root_folder, subfolder="scheduler", trained_betas=trained_betas)
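For reference, a sketch of how such `trained_betas` could be computed, assuming "the paper" is the zero terminal SNR one and that the starting point is SD's scaled-linear schedule (1000 steps, beta_start 0.00085, beta_end 0.012). Those defaults and both function names are assumptions here, and this is plain Python rather than the torch version anyone in this thread actually runs:

```python
import math

def scaled_linear_betas(n=1000, beta_start=0.00085, beta_end=0.012):
    """Betas that are linear in sqrt-space (assumed SD-style schedule)."""
    s, e = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(s + (e - s) * i / (n - 1)) ** 2 for i in range(n)]

def enforce_zero_terminal_snr(betas):
    """Shift and rescale sqrt(alpha_bar) so the final timestep has SNR exactly 0."""
    # sqrt of the cumulative product of alphas
    alphas_bar_sqrt = []
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
        alphas_bar_sqrt.append(math.sqrt(prod))

    first, last = alphas_bar_sqrt[0], alphas_bar_sqrt[-1]
    # shift so the last value becomes 0, then scale so the first value is unchanged
    scale = first / (first - last)
    shifted = [(a - last) * scale for a in alphas_bar_sqrt]

    # convert back from alpha_bar to per-step betas
    alphas_bar = [a * a for a in shifted]
    new_betas = [1.0 - alphas_bar[0]]
    for prev, cur in zip(alphas_bar, alphas_bar[1:]):
        new_betas.append(1.0 - cur / prev)
    return new_betas

trained_betas = enforce_zero_terminal_snr(scaled_linear_betas())
print(trained_betas[0], trained_betas[-1])
```

The last beta comes out as exactly 1.0, meaning the final timestep is pure noise. A list like this is the kind of value that would go into the `trained_betas` argument shown above.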

hot breach Jun 25, 2023, 6:01 PM

#

samples from fairly early on, it can do both light and dark great

#

well discord unfortunately cuts the previews, click and look at the left sample (cfg 7)

#

"cg render of a tree ent on a few small branches with leaves on its body, with a black background" this was not even 1 epoch in

#

many more epochs in still stable on bright and dark images

#

#

high contrast, faces still good, etc

#

logo is potato quality for lack of data (just a random sample I grabbed from middle of training a bunch of random data), but it nails the contrast at least and colors look good to at least illustrate the zero terminal snr works well

surreal lagoon Jun 25, 2023, 6:57 PM

#

try other resolutions

#

i was trying prompt solid black background from CFG of 3 to 9 and rescale of 0 to 0.7 and the closest i get is at like 9 CFG and 0 rescale

#

in a square resolution. in a widescreen one i get a washed out, uniform look

#

i can get images with bright scenes, no problem

#

#

the model still makes great images at a square res

#

this is what i get in widescreen for a nighttime prompt

#

hm i guess it's not great in square either lol

surreal lagoon Jun 25, 2023, 9:23 PM

#

50 steps with offset noise and it's so much better

surreal lagoon Jun 26, 2023, 12:54 AM

#

@undone fable i needed a bit of offset noise and perturbation to bring darkness in line, but the results with CFG rescale are interesting

#

1687740641.2191603092570e331754969fcb0b3000ebaccc5.png

1687740683.1617575ce78d208c02cf97a0a16652cf55b3ed9.png

#

left is cfg rescale at 0.7 and right is 0.0

#

the cfg rescale change to "prevent washed-out images" seems to also prevent images from becoming too dark
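A small sketch of what the rescale value does, assuming the formulation from the same "diffusion is flawed" paper: the guided prediction is rescaled to the standard deviation of the conditional prediction, then blended back with the plain CFG result. It works on flat lists of floats for readability, and the function and argument names are made up:

```python
from statistics import pstdev

def cfg_with_rescale(pos, neg, guidance_scale, rescale):
    """Classifier-free guidance followed by std rescaling.

    pos / neg: conditional and unconditional predictions (flat lists of floats).
    rescale:   0.0 leaves plain CFG untouched, 1.0 fully matches the std of pos.
    """
    cfg = [n + guidance_scale * (p - n) for p, n in zip(pos, neg)]
    # factor that brings the guided output back to the conditional output's std
    factor = pstdev(pos) / pstdev(cfg)
    return [rescale * (c * factor) + (1.0 - rescale) * c for c in cfg]

pos = [0.5, -0.2, 0.1, 0.9]
neg = [0.1, 0.0, -0.1, 0.3]
for phi in (0.0, 0.7):
    out = cfg_with_rescale(pos, neg, guidance_scale=9.2, rescale=phi)
    print(phi, pstdev(out))
```

Since the rescale pulls the output's overall spread back toward the conditional prediction, it plausibly limits extremes in both directions, which would fit the observation that it also stops images from getting too dark.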

surreal lagoon Jun 26, 2023, 1:53 AM

#

1024x1536 native gens 😄

surreal lagoon Jun 26, 2023, 4:41 AM

#

surreal lagoon Jun 26, 2023, 5:48 AM

#

https://huggingface.co/ptx0/pseudo-flex-base/blob/main/README.md @stiff dust

stone garden Jun 26, 2023, 11:32 AM

#

Hello everyone,
I'm looking for a small fine-tuned stable diffusion model that outputs max 256x256 images, do you know any?

surreal lagoon Jun 26, 2023, 12:51 PM

#

deepfloyd kinda?

final matrix Jun 26, 2023, 1:07 PM

#

i compared my model's photorealism vs. that of deliberate, dreamshaper, and realistic vision
mine is arguably worse, but also doesn't have any noise fix applied and i would argue my model is more diverse regarding the output, e.g. the clothing, faces, backgrounds, etc

#

the grid doesnt show the negative prompt but it was:
anime, cartoon, digital art, cgi, render, 3d, drawing, sketch, instagram, pastel, dada, zombie, ugly, surreal, text, watermark, abstract, old, fat, jpeg, black and white, vintage, amateur, film grain, evil, damaged, concept, unfinished, model, cover, clay, figure, toy, pixelated, bad, inexperienced, illogical, random, oversaturated, overexposed, rough, fake, unrealistic, sloppy, artificial, low budget, unprofessional, cropped, out of frame, low-quality, poorly drawn, deformed, bad proportions, malformed, imperfect, unnatural, extra, rushed, weird

viscid condor Jun 26, 2023, 1:49 PM

#

I was trying to train a Lora but some photos in the dataset are 1080x1920 and others are different sizes. I wanted to keep the full images in the dataset without resizing or cropping out parts as they were important for the lora training to be what i want to create. Is this possible or would everything need to be 512x512

surreal lagoon Jun 26, 2023, 2:12 PM

#

you can condition them so the smaller side is 512 and preserve aspect ratio
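Roughly what that looks like as arithmetic, as a minimal sketch. The function name is made up, and real trainers may also snap the result to a multiple of 8 or 64 for bucketing, which is not done here:

```python
def resize_shorter_side(width, height, target=512):
    """Scale (width, height) so the shorter side equals target, keeping aspect ratio."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)

# A 1080x1920 portrait image keeps its shape and becomes 512 wide.
print(resize_shorter_side(1080, 1920))  # (512, 910)
```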

viscid condor Jun 26, 2023, 3:05 PM

#

is this through colab lora training?

surreal lagoon Jun 26, 2023, 6:07 PM

#

@undone fable you're right about the 1920x1080 not holding up even if it "(kinda) learns how to do it"

#

it'll generate dupes about 20% of the time, and the rest of the time, when it doesn't produce a duplicate, you get a weirdo like that

#

but this model works way better with hires fix than one that i trained only on square 768

undone fable Jun 26, 2023, 6:56 PM

#

surreal lagoon @undone fable you're right about the 1920x1080 not holding up even if it...

yea at a certain point the lowest level feature size as well as the attn blocks alone are going to have a bad time

surreal lagoon Jun 26, 2023, 6:57 PM

#

by the former part you mean when you make a 512x512 image and it's all artifacted?

#

yeah as far as i can tell that's because the attn layers learn to expect a large res image with details that can't be expressed

exotic musk Jun 27, 2023, 1:58 AM

#

is 2e-05 too low of a learning rate for a 108 image dataset? i did 200 steps and it was super undertrained

surreal lagoon Jun 27, 2023, 2:08 AM

#

that might be too little info, can you elaborate on what style of training and what you're training on

#

it sounds like you're doing a LoRA?

exotic musk Jun 27, 2023, 2:11 AM

#

oh yea

#

lora my bad

#

and 1.5

exotic musk Jun 27, 2023, 2:12 AM

#

surreal lagoon that might be too little info, can you elaborate on what style of training and w...

i am trying to make a model of an artist i like who works in prosthetics, here for example is one of the images

surreal lagoon Jun 27, 2023, 2:13 AM

#

man that's hard to look at

#

i think you need to do like 800 steps but it depends on your batch size. if it's very high, it'll need a higher LR. if you're using regularization / class images (prior preservation loss technique) it'll also take longer to train

#

just a warning that if your images are all blurred and pixelated like that one you're going to have a bad time

#

there's other parameters for Lora i'm not versed in, like rank etc

exotic musk Jun 27, 2023, 2:17 AM

#

surreal lagoon man that's hard to look at

ok ty

#

this was helpful

surreal lagoon Jun 27, 2023, 2:18 AM

#

i mostly do general finetuning. i have not messed with lora. be sure to update us with your progress!

1687832238.228052473929611e070c0c87249ba54e73544e2.png

exotic musk Jun 27, 2023, 2:23 AM

#

surreal lagoon i mostly do general finetuning. i have not messed with lora. be sure to update u...

got it. what ratio for checkpoints do u recommend? because it could overtrain before 800

surreal lagoon Jun 27, 2023, 2:23 AM

#

ratioooo think what is this

#

oh

#

12.5%

exotic musk Jun 27, 2023, 2:24 AM

#

screenshot-colab.research.google.com-2023.06.26-22_23_59.png

#

ok ty

final geyser Jun 27, 2023, 12:12 PM

#

inpainted his head but idk if the head's size matches

surreal lagoon Jun 27, 2023, 2:57 PM

#

try textual inversions for your dresses and subjects. you can train them in not-very-long, and they result in very clean and noise-free images, especially on 2.1

#

textual inversions are going to become the tool of choice moving forward, the SAI devs have stated that TIs in SD 2.1 are as powerful as LoRAs were in 1.5, and TIs in SDXL are as powerful as 2.1 LoRAs, etc

#

everything shifts up in power and capability as the number of parameters in the model increases

#

SDXL will likely not require massive fine-tunes to be amazing, just a few tiny textual inversions.

solid axle Jun 27, 2023, 4:30 PM

#

surreal lagoon try textual inversions for your dresses and subjects. you can train them in not-...

Oh, great, thanks a lot, I haven't investigated this yet. Before I do some research and test - shall I just drop the idea to train on 'style' or instance/class for that particular case? I just wanted to make sure I am not trying to reinvent the wheel 🙂

surreal lagoon Jun 27, 2023, 5:37 PM

#

@stiff dust can we (you and i) somehow make a new VAE for SD 2.1?

1687887211.9632316c2ee5ec0f94ab47baf3713032e88900c.png

#

it is the main issue now that my noise issues are resolved (this model looks smooth like DALL-E 2)

crimson meteor Jun 28, 2023, 12:04 AM

#

Hiring a stable diffusion developer,

I'm currently launching a startup that uses stable diffusion as a base for image generation. So far I have trained my own models and LoRAs, but I need someone who's more advanced in model training, for example someone who can help come up with workflows to get specific results using features like controlnet, automate higher quality captioning, and squeeze out more quality than my average dreambooth style trainings.

I'm looking forward to starting with a few paid gigs and then we can agree on a long-term arrangement.
feel free to shoot me a DM if interested.

stone garden Jun 28, 2023, 9:39 AM

#

crimson meteor Hiring a stable diffusion developer, I'm currently launching a startup that us...

I'm not nearly there yet; but a post in #1011228667659178055 might help.

stone garden Jun 28, 2023, 10:24 AM

#

Hello everyone, I was running into a NaN loss when creating a Lora/DyLora. I've checked all the usual suspects: No nans in input data for the given step; reasonable captions present, etc. Does anyone have any tips to resolve the NaN loss? I've tried kohya-ss/sd-scripts as well as cloneofsimo/lora and both create this issue. I was on an AMD GPU so I thought it might be related but I debugged deep and it seems to be overflows happening somewhere.

#

In fact, the unet produces all NaNs for non-nan inputs.

surreal lagoon Jun 28, 2023, 1:12 PM

#

on amd use fp32

stone garden Jun 28, 2023, 1:13 PM

#

I tried that too -- It NaNs out, only at a later stage. 😦 I also rented a paperspace machine with Nvidia, it still NaNs out there as well with fp16.

surreal lagoon Jun 28, 2023, 1:14 PM

#

make sure you arent using snr gamma

stone garden Jun 28, 2023, 1:21 PM

#

So, set it to zero? Thank you so much for the guidance, btw.

surreal lagoon Jun 28, 2023, 2:31 PM

#

yeah snr gamma seems to be broken these days tbh

#

it causes NaNs here every time but i'm on v_prediction models like 2.0-v and 2.1-v
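One plausible reason for this, not confirmed anywhere in this thread: the epsilon-prediction form of the Min-SNR weight divides by the SNR, and with a zero terminal SNR schedule the last timestep has SNR 0. The Min-SNR paper gives a different weight for v-prediction that divides by SNR + 1 instead. A sketch with a made-up function name:

```python
def min_snr_weight(snr, gamma=5.0, v_prediction=False):
    """Min-SNR-gamma loss weight for a single timestep.

    epsilon-prediction: min(snr, gamma) / snr        (undefined at snr == 0)
    v-prediction:       min(snr, gamma) / (snr + 1)  (finite at snr == 0)
    """
    clipped = min(snr, gamma)
    if v_prediction:
        return clipped / (snr + 1.0)
    return clipped / snr

print(min_snr_weight(0.0, v_prediction=True))  # 0.0
try:
    min_snr_weight(0.0)
except ZeroDivisionError:
    # in tensor code this 0 / 0 would silently become NaN rather than raise
    print("epsilon form is undefined at snr == 0")
```

If a trainer applies the epsilon form to a v-prediction model with zero terminal SNR, a single such timestep in a batch would be enough to turn the loss into NaN.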

exotic musk Jun 28, 2023, 5:42 PM

#

surreal lagoon i mostly do general finetuning. i have not messed with lora. be sure to update u...

At 800 steps it didn’t learn the 4 eyes still

surreal lagoon Jun 28, 2023, 5:42 PM

#

try a textual inversion as well

#

you might need both

exotic musk Jun 28, 2023, 5:45 PM

#

How do i train a textual inversion?

#

Sorry

#

I have never tried it

exotic musk Jun 28, 2023, 5:47 PM

#

surreal lagoon try a textual inversion as well

Can you help me or show me how to train a textual inversion

surreal lagoon Jun 28, 2023, 5:52 PM

#

unfortunately i've not got time to

stone garden Jun 28, 2023, 6:07 PM

#

surreal lagoon try a textual inversion as well

Yeah I'm training both U-Net and TI as I need to inject a few keywords.

whole crypt Jun 28, 2023, 10:22 PM

#

Any tips on fine tuning a subject with glasses and a beard?
I have had great luck with training on women subjects:

no classification images
30~ clear varied images
50~ steps (around 1600 total steps)
text files with classifications and tokens (xyz a woman wearing a tank top, standing outside)
lion optimizer
Using these settings (plus a couple of others) i can train a female subject almost every time.
However, male subjects with glasses and a beard give me bad results. Deformed faces, terrible eyes, etc.
Any tips?
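A rough sketch of the step arithmetic in the settings above, assuming "50~ steps" means steps per image and a batch size of 1 (neither is stated in the message). The function name is hypothetical.

```python
import math


def total_steps(num_images, steps_per_image, batch_size=1):
    """Total optimizer steps when every image is seen steps_per_image times."""
    return math.ceil(num_images * steps_per_image / batch_size)


steps_30 = total_steps(30, 50)  # 30 images at 50 steps each
steps_32 = total_steps(32, 50)  # 32 images at 50 steps each
```

Under these assumptions, about 30 images at about 50 steps each lands near the quoted "around 1600 total steps".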

tall condor Jun 29, 2023, 12:45 PM

#

you will need much more steps

#

also make sure that your captions are correct

gentle osprey Jun 29, 2023, 2:10 PM

#

Anyone mess around with training GANs? Playing around with the idea of using a GAN to produce SD training data but wasn't sure how easy it was, if even possible, to teach a GAN concepts like a specific face.

surreal lagoon Jun 29, 2023, 3:12 PM

#

GANs are definitely used to generate or improve training data. the CelebA-HQ dataset comes with a processing script that takes each downloaded celeb picture and upscales it using RealESRGAN

surreal lagoon Jun 29, 2023, 9:48 PM

#

something i've discovered now is that if you train the text encoder on its own on a set of captioned data and then, separately train the unet on top of that frozen text encoder, the results are really powerful

#

training them together sucks ass and should not be done

#

when you train them concurrently, the text encoder will learn to represent the relations between concepts before the unet has a chance to learn how to represent them faithfully

#

seems that the unet picks the concepts up a lot sooner with the altered procedure

hot breach Jun 30, 2023, 12:06 AM

#

messed with open-flamingo a bit the last day, I made a script to run locally in ED2 so you can bulk caption with it:
python caption_fl.py --data_root input --min_new_tokens 20 --max_new_tokens 35 --num_beams 3 --model "openflamingo/OpenFlamingo-9B-vitl-mpt7b" --temperature 1.0
it will take hints from the example image/caption pairs you supply (2 seems to be enough) and then caption your images, seems to be accurate and know a lot of proper names

#

the authors of open-flamingo itself have a demo up on HF space you can use to try it out without loading anything: https://huggingface.co/spaces/openflamingo/OpenFlamingo

#

it uses example image/caption pairs to prime the model (here, the first two), then you provide the novel image for captioning (the third one here). their demo uses the hummus and sign images, already captioned; then I gave it the image of cloud on a ladder and it captioned it for me, without me telling it who cloud strife is at all
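A minimal sketch of how the few-shot priming described above can be laid out as a text prompt. The `<image>` and `<|endofchunk|>` markers follow the interleaved format documented in the OpenFlamingo README at the time; treat them, the function name, and the example captions as assumptions and check them against the version you run. This is not the code of `caption_fl.py`.

```python
def build_fewshot_prompt(example_captions, prefix="An image of"):
    """Interleave one <image> marker per example caption, then leave the last caption open."""
    shots = "".join(f"<image>{caption}<|endofchunk|>" for caption in example_captions)
    return f"{shots}<image>{prefix}"


# Hypothetical captions for the two priming images.
prompt = build_fewshot_prompt(["a bowl of hummus.", "a stop sign."])
```

The model continues the text after the final `<image>` marker, so the style of the example captions steers the style of the generated one.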

tall condor Jun 30, 2023, 8:07 AM

#

cant wait for kohya to support the new sdxl 0.9

surreal lagoon Jun 30, 2023, 2:45 PM

#

it already does

#

i don't think it supports the new pieces of training but no one seems to care about the aesthetics score input lol

surreal lagoon Jul 1, 2023, 3:17 AM

#

@hot breach teaching 2.0-v terminal snr and widescreen. there's a bit of oscillation to it figuring out the noise schedule where it seems to get it early and then really starts to pick it up

#

it's supposed to be just the glowing TV in a dark room

meager tangle Jul 1, 2023, 7:53 AM

#

Hi, I hope I'm in the right channel to ask this: I want to add a lesser-known celebrity to the absolutereality model. Can I achieve that with dreambooth?

torpid mason Jul 1, 2023, 5:26 PM

#

hey guys, may I know if there is any channel here where I could find regularization images for LoRA training? (looking for realistic women)

hot breach Jul 1, 2023, 7:45 PM

#

torpid mason hey guys, may I know if there is any channel here where I could find regularizat...

FFHQ and imdb-wiki perhaps?

surreal lagoon Jul 2, 2023, 4:41 AM

#

torpid mason hey guys, may I know if there is any channel here where I could find regularizat...

LAION-Faces

quick onyx Jul 2, 2023, 2:30 PM

#

whole crypt Any tips on fine tuning a subject with glasses and a beard? I have had great luc...

Are there any good guides on how to build lora?

quick onyx Jul 2, 2023, 2:32 PM

#

whole crypt Any tips on fine tuning a subject with glasses and a beard? I have had great luc...

When you say clear images do they need to be a certain resolution?

bright solar Jul 2, 2023, 2:38 PM

#

hello 🙂

what can cause loss to be > 0.4?

training lora with kohya on SD-2.1 model with pytorch 2.0

sleek fossil Jul 2, 2023, 2:51 PM

#

When you're finetuning a model to a specific person's face via Dreambooth training, should the class be included in both the instance and class prompts? (& if so, should the class also always be included when prompting with the unique token?)

Example:
Instance: "a photo of XYZ123 person"
**Class:** "a photo of a person"

I typically see some version of the above in the majority of tutorials/guides, but then in the same content - I'll see the prompting examples they provide after completion exclude the "person" part of the unique ID.

"a highly realistic photo of XYZ123,….." **vs.** "a highly realistic photo of XYZ123 person,….."

I recently read that the exclusion is a common mistake, but with how fast things change and how much conflicting information I see across the resources - I feel like it'd be a better idea to get the answer here.

whole crypt Jul 2, 2023, 2:52 PM

#

quick onyx When you say clear images do they need to be a certain resolution?

They are going to be resized to 512x512, so bigger than that. Otherwise they just need to be sharp, and not blurry.
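A small sketch of the size check implied above, assuming a centered square crop before the resize to 512x512 (the actual crop and resize behaviour of the trainer is not stated here). Function names are hypothetical; the returned box uses the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
def big_enough(width, height, target=512):
    """True if the shorter side is at least the training resolution."""
    return min(width, height) >= target


def center_square_box(width, height):
    """Box (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


box = center_square_box(1024, 768)
```

Images that fail `big_enough` would have to be upscaled to reach 512x512, which is where the blur comes from.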

jade hinge Jul 2, 2023, 5:08 PM

#

Here best Realism fine tuning

surreal lagoon Jul 2, 2023, 5:16 PM

#

jade hinge Here best Realism fine tuning

please don't spam here. you spam on huggingface forum too. just stop spamming

quick onyx Jul 3, 2023, 3:00 AM

#

whole crypt They are going to be resized to 512x512, so bigger than that. Otherwise they jus...

So smaller resolution photos I shouldn’t bother with? I’m attempting to do letters/words/logo. It’s for logo which is just words. I want to use it on products like a coffee packaging.

How would you go about arranging photos for this?

quick onyx Jul 3, 2023, 7:58 AM

#

I'm using Booru tags. I want a word to be the focus; which tags would be best to use?

#

for instance i want to use a logo with only words on it, and attach that (word/logo) onto coffee cups, t-shirts, signs, and other products

sonic narwhal Jul 3, 2023, 8:55 PM

#

Openflamingo works really well for captioning

gentle osprey Jul 3, 2023, 11:17 PM

#

quick onyx for instance i want to use a logo with only words on it, and attach that (word/l...

I'm pretty sure StableDiffusion is terrible with words. Think deepfloyd is more geared towards what you're trying to do.

quick onyx Jul 4, 2023, 12:14 AM

#

gentle osprey I'm pretty sure StableDiffusion is *terrible* with words. Think deepfloyd is mor...

I’ve seen the monster energy drink logo on stable diffusion. I’ll look into the one you're talking about, thanks

quick onyx Jul 4, 2023, 7:34 AM

#

whole crypt They are going to be resized to 512x512, so bigger than that. Otherwise they jus...

Once i downloaded it the issue disappeared

https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local

NVIDIA Developer

CUDA Toolkit 11.7 Downloads

Get the latest feature updates to NVIDIA's proprietary compute stack.

quick onyx Jul 4, 2023, 7:59 AM

#

gentle osprey I'm pretty sure StableDiffusion is *terrible* with words. Think deepfloyd is mor...

Is that an add-on I can plug into stable diffusion or a completely separate ai?

quick onyx Jul 4, 2023, 8:05 AM

#

gentle osprey I'm pretty sure StableDiffusion is *terrible* with words. Think deepfloyd is mor...

#

https://civitai.com/models/19043?modelVersionId=22626

MONSTER ENERGY - v2 | Stable Diffusion LoRA | Civitai

only holding (still morphing characters in monsters)

#

It's not perfect but it looks decent

quick onyx Jul 4, 2023, 8:23 AM

#

@quiet moat Is there anything in the pipeline to improve words and text in images?

vast dome Jul 4, 2023, 11:45 AM

#

guys

#

what is "dreambooth dynamic image normalization"

#

what does it do?

#

does it resize my images?

gentle osprey Jul 4, 2023, 2:45 PM

#

quick onyx Is that an add-on I can plug into stable diffusion or a completely separate ai?

Separate AI

dusty pawn Jul 4, 2023, 4:29 PM

#

im making a lora about diamondhead from ben 10

#

https://tenor.com/view/diamondhead-fault-i-dont-care-angry-ben10-gif-21875243

#

hes a very complicated character for the ai

#

so far i made a lora but

#

it looks

#

well.........

#

01457-4261821129-diamondhead_mugshot_masterpiece_best_quality__daomeng_inception_Mirror_City.png

01462-4100559970-diamondhead_masterpiece_best_quality__daomeng_inception_Mirror_City.png

01442-1107560943-diamondhead_full_body_portrait_masterpiece_best_quality__daomeng_inception_Mirror_City.png

01447-3863722367-diamondhead_sitting_on_bench_outdoors_masterpiece_best_quality__daomeng_inception_Mirror_City.png

01449-540789961-diamondhead_sitting_on_bench_floating_in_space_masterpiece_best_quality__daomeng_inception_Mirror_City.png

01450-3929108870-diamondhead_floating_in_space_masterpiece_best_quality__daomeng_inception_Mirror_City.png

01451-823430445-diamondhead_wearing_orange_hoodie_masterpiece_best_quality__daomeng_inception_Mirror_City.png

01452-432834633-diamondhead_wearing_orange_hoodie_masterpiece_best_quality__daomeng_inception_Mirror_City.png

#

these are the best ones

#

that look like him

#

and aren't just green diamonds

#

what do you advise me to change?

#

i have 144 images

#

all from the original show

#

no fan art

#

i have some promotional images of him in different artstyles so idk about that

#

i used the Google Colab one by hollow strawberry

quick onyx Jul 4, 2023, 11:43 PM

#

gentle osprey Separate AI

Wonder why they made a separate one it’s from the same company?!?

dusty pawn Jul 5, 2023, 5:08 AM

#

dusty pawn

anybody ?

#

ive been told it has something to do with the learning process

gentle osprey Jul 5, 2023, 5:05 PM

#

quick onyx Wonder why they made a separate one it’s from the same company?!?

That answer is probably more technical than it's worth getting into

#

That being said, I would look into using ControlNet rather than training LoRAs

quick onyx Jul 5, 2023, 10:38 PM

#

gentle osprey That being said, I would look into using ControlNet rather than training LoRAs

Wait what, Can you explain why?

quick onyx Jul 5, 2023, 10:40 PM

#

gentle osprey That answer is probably more technical than it's worth getting into

I’m not following that one does art as well not only words, what would be the major difference? If it’s turning noise into a picture

#

I’m trying to understand this if anyone feels like a breakdown or even a link

gentle osprey Jul 5, 2023, 10:43 PM

#

quick onyx Wait what, Can you explain why?

honestly, if you're new to this, the answer to that isn't going to make sense and isn't going to provide you with any useful information that you can apply to what you're working on

#

like if you're at the "how do I train a LoRA stage", the answer to "why is deepfloyd better at text than stable diffusion" isn't something that's worth you exploring unless you already have a good understanding of how the whole diffusion process works

#

but you can start here for foundational stuff: https://arxiv.org/pdf/2205.11487.pdf

quick onyx Jul 5, 2023, 10:51 PM

#

Thanks!

regal harbor Jul 6, 2023, 8:37 AM

#

which algo are you using for captions? What's the best one, especially to tag details like dirt on skin, water droplets, clothes, lighting, noise

dusty pawn Jul 6, 2023, 12:11 PM

#

https://tenor.com/view/hello-hello-everyone-hello-everybody-im-here-ben10diamondhead-gif-19775842

surreal lagoon Jul 6, 2023, 3:53 PM

#

regal harbor which algo are you using for captions? What's the best one, especially to tag de...

use ViT-bigG-14

regal harbor Jul 6, 2023, 5:06 PM

#

surreal lagoon use ViT-bigG-14

any idea how it compares to BLIP2?

surreal lagoon Jul 6, 2023, 5:06 PM

#

BLIP2 is just a way of using these text embeddings

dusty pawn Jul 6, 2023, 5:06 PM

#

brother

surreal lagoon Jul 6, 2023, 5:06 PM

#

the text encoder provides embeddings

dusty pawn Jul 6, 2023, 5:06 PM

#

i need advice

regal harbor Jul 6, 2023, 5:07 PM

#

surreal lagoon BLIP2 is just a way of using these text embeddings

oh, I thought BLIP2 could create captions / tags for images

surreal lagoon Jul 6, 2023, 5:07 PM

#

it does

dusty pawn Jul 6, 2023, 5:07 PM

#

does my role make me invisible

surreal lagoon Jul 6, 2023, 5:07 PM

#

@dusty pawn you haven't asked a question yet.

dusty pawn Jul 6, 2023, 5:07 PM

#

surreal lagoon @dusty pawn you haven't asked a question yet.

i did

regal harbor Jul 6, 2023, 5:07 PM

#

I want to tag my dataset accurately, and there are too many images to do it all manually

dusty pawn Jul 6, 2023, 5:08 PM

#

dusty pawn well.........

here

#

im making a lora

surreal lagoon Jul 6, 2023, 5:08 PM

#

oh, from days back

dusty pawn Jul 6, 2023, 5:08 PM

#

yes

surreal lagoon Jul 6, 2023, 5:08 PM

#

well i don't train LoRAs so, unfortunately i can't answer

dusty pawn Jul 6, 2023, 5:08 PM

#

damn

#

well thanks anyways

surreal lagoon Jul 6, 2023, 5:08 PM

#

it looks complicated, the goal

#

i would recommend trying a textual inversion

#

they can be much simpler

dusty pawn Jul 6, 2023, 5:09 PM

#

this is the best one so far

01538-3396645663-_diamondhead_masterpiece_best_quality.png

regal harbor Jul 6, 2023, 5:09 PM

#

so I guess the question is BLIP2 vs. ViT-bigG-14?

which one will create tags/captions which include objects, clothes, lighting, and info about image quality (blurry, grainy, noisy, jpeg etc.), because I don't want these elements to be 'baked in' to the Lora

dusty pawn Jul 6, 2023, 5:09 PM

#

surreal lagoon i would recommend trying a textual inversion

how do i do that

surreal lagoon Jul 6, 2023, 5:10 PM

#

also not a thing i do, but i understand how it works. as such i don't know which tools support it or how to prepare a dataset for it

regal harbor Jul 6, 2023, 5:12 PM

#

surreal lagoon use ViT-bigG-14

I'm assuming 'best' is best? Or maybe caption?

surreal lagoon Jul 6, 2023, 5:21 PM

#

you'll have to test

fallen kestrel Jul 7, 2023, 4:55 AM

#

Hello! I would like to create a model based on my own characters and signature style. I have completed a crude model successfully with Dreambooth, but I have read in a few places that it may be replaced by EveryDream? Specifically because I would like to train multiple characters, I've read that it has advantages over Dreambooth for that purpose. Can anyone shed some light on this? Or provide any resources where I could learn more?

torpid sinew Jul 8, 2023, 10:05 PM

#

If you are training unet only is it able to learn new tokens? I havent really noticed my activation token making a difference in my prompts

steep arch Jul 9, 2023, 3:43 PM

#

does anyone have any scripts/tools they use to automatically check a prompt or lora on different models, different loras, or with a scripted change/iteration in the prompt?

frozen citrus Jul 9, 2023, 11:30 PM

#

torpid sinew If you are training unet only is it able to learn new tokens? I havent really no...

Hope someone can correct me if this is wrong, but the U-Net actually gets a text representation from the diffusion model's text encoder, so the U-Net isn't seeing individual tokens the way the text model does. Fine-tuning the U-Net can still lead to some gains if new tokens have been introduced or the text encoder has been changed in any way

uneven ermine Jul 10, 2023, 1:46 AM

#

steep arch does anyone have any scripts/tools the use to automatically check a prompt or lo...

you mean the "x/y/z plot" script?
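A minimal sketch of the kind of scripted iteration asked about above: it only builds the prompt strings for every combination and does not call any inference API. The `<lora:filename:multiplier>` syntax is the standard auto1111 one; the LoRA names, weights and prompt fragments are hypothetical.

```python
from itertools import product


def prompt_grid(base_prompt, loras, weights, variations):
    """One prompt per (lora, weight, variation) combination."""
    jobs = []
    for lora, weight, variation in product(loras, weights, variations):
        jobs.append(f"{base_prompt}, {variation} <lora:{lora}:{weight}>")
    return jobs


jobs = prompt_grid(
    "photo of xyz",
    loras=["styleA", "styleB"],
    weights=[0.6, 1.0],
    variations=["front view", "side view"],
)
```

Each string in `jobs` can then be pasted into, or sent to, whatever frontend you use; the x/y/z plot script covers the same idea inside the web UI.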

regal harbor Jul 10, 2023, 11:16 AM

#

we can use weights in tags? like

(jpeg:1.5)

vast dome Jul 10, 2023, 11:38 AM

#

I am trying to dreambooth with 4000 images, I am running out of vram even when 24GB 4090

#

I can do fine with 2000 images

#

r u guys experiencing this

thin mantle Jul 11, 2023, 11:01 AM

#

I think the voting should work like midjourney's did. It made it easy to rate a lot of images. Rating in discord is inconvenient, therefore fewer people will vote

lost idol Jul 11, 2023, 4:29 PM

#

@pseudo tulip Hi! Wonder if you could implement the adaptive offset noise function to the kohya colab?

surreal lagoon Jul 11, 2023, 4:44 PM

#

https://github.com/huggingface/diffusers/pull/4041

single-file ckpt files got a major boost in Diffusers today.

GitHub

Improve single loading file by patrickvonplaten · Pull Request #404...

What does this PR do?
This PR improves the single model loading file so that it now works very well for:

SD 1.5 (both normal and inpainting)
SD 2.0 and 2.1
SDXL

All unnecessary downloads are remo...

neat oxide Jul 11, 2023, 7:15 PM

#

#

anyone know what this means?

wintry girder Jul 11, 2023, 11:28 PM

#

When training a LoRA, how do you get it to keep an eye patch on the correct eye in every generation? Any tips?

jade hornet Jul 12, 2023, 12:29 PM

#

Better off just controlling it during generation using inpaint sketch or controlnet. Or make sure your training set has that specific orientation/pose

wintry girder Jul 12, 2023, 7:42 PM

#

jade hornet Better off just controlling it during generation using inpaint sketch or control...

Thanks for the reply. Not really feasible to inpaint with Deforum gens though. My training set always has the eyepatch on the correct side and I'm not making flipped copies

#

Has anyone had any luck making a symmetrical body part asymmetrical in a consistent way? E.g. a robot arm that's always on the left arm?

restive bridge Jul 12, 2023, 7:54 PM

#

aye guys where can i find some kohya training configs for XL?

hollow spruce Jul 13, 2023, 1:39 PM

#

restive bridge aye guys where can i find some kohya training configs for XL?

#✨｜sdxl message

and use this program for an easy ui (uses original kohya ss in background)
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts/tree/SDXL

#

based on testing since then - these settings still yield the best results

lost idol Jul 13, 2023, 2:25 PM

#

lost idol @pseudo tulip Hi! Wonder if you could implement the adaptive offset nois...

@pseudo tulip would you reply? Do you even reply on the GitHub for the kohya colab?

rugged narwhal Jul 13, 2023, 2:34 PM

#

How do you train or something similar with amd gpus?

glacial comet Jul 15, 2023, 7:42 PM

#

anyone had any luck with Dreambooth and SDXL? I have trained a few in the last day or so using a face like usual and it seems like it works as far as generating images but it will pretty much only generate images almost exactly like the training images. Prompts barely change anything at all

tulip fern Jul 17, 2023, 6:33 AM

#

I would try editing the image as best as I can and then merging it with i2i, though it usually helps to give both parts of the image the same style

remote crow Jul 17, 2023, 4:46 PM

#

hello, is there a simple tutorial on finetuning for beginners? i want to create a minecraft texture pack model but i don't think a lora will consistently be able to create 16x16 pixel art

jade hinge Jul 18, 2023, 11:16 PM

#

First Ever SDXL Training With Kohya LoRA - Stable Diffusion XL Training Will Replace Older Models : https://youtu.be/AY6DMBCIZ3A

YouTube

SECourses

First Ever SDXL Training With Kohya LoRA - Stable Diffusion XL Trai...

How to install #Kohya SS GUI trainer and do #LoRA training with Stable Diffusion XL (#SDXL) this is the video you are looking for. I have shown how to install Kohya from scratch. The best parameters to do LoRA training with SDXL. How to use Kohya SDXL LoRAs with ComfyUI. How to do checkpoint comparison with SDXL LoRAs and many more cool stuff.

...

regal harbor Jul 19, 2023, 10:35 AM

#

I got so many noisy pics. I can denoise them, but then they start to look anime. Any way to get the best of both worlds?

jade hornet Jul 19, 2023, 12:48 PM

#

regal harbor I got so many noisy pics. I can denoise them, but then they start to look anime....

By that you mean low quality? You can try adding that to the captions, "low quality photo, grainy"... See if doing that helps prevent it from learning that

regal harbor Jul 19, 2023, 12:59 PM

#

jade hornet By that you mean low quality? You can try adding that to the captions, "low qua...

I did that, but it didn't work out. Hence I'm working on denoising now. The photos look a bit cartoonish after denoising... hopefully it doesn't learn that.

It seems to be that any photos have some noise/jpeg, even if I go on a stock image website, there are almost no 'perfect' photos... so how are others not having this problem?

jade hornet Jul 19, 2023, 3:16 PM

#

Maybe you're just overtraining, and the noise in the photo isn't the issue, hard to say

oblique hamlet Jul 19, 2023, 4:46 PM

#

a bird

kind elk Jul 19, 2023, 6:27 PM

#

Has anyone gotten it down pat to get a product into scenes with a LoRA? I have seen cars and a few things. But I have yet to see ... say for instance .... a microwave or a certain type of dress with 100% accuracy. In theory it should be quite possible

#

(for photoreal)

#

Planning to render a 3d Object at multiple angles and lighting styles and hard train a lora into it. It should be quite possible. Just haven't seen it in the wild. Curious if anyone has seen

kind elk Jul 19, 2023, 11:37 PM

#

https://giphy.com/gifs/bueller-ferris-buellers-day-off-vHwGAMZfWj3mU

autumn obsidian Jul 20, 2023, 4:50 PM

#

Hi guys what would you suggest for finetuning on 100+ objects? or even 1000+?

blazing scarab Jul 20, 2023, 6:57 PM

#

autumn obsidian Hi guys what would you suggest for finetuning on 100+ objects? or even 1000+?

Do you mean you want the model to be able to learn 1000+ objects or you have 1000+ images of an object? If it's the former, you're probably better off with textual inversion embeddings, and for the latter, you can do whatever you want (1000 is a lot of images for finetuning!): LoRA, Dreambooth or Textual Inversion embedding

hidden flame Jul 22, 2023, 1:43 PM

#

Would clip captioning be good enough for attempting to train multiple styles into a model? or would using multiple keywords/concepts as well work better?

abstract merlin Jul 23, 2023, 1:42 AM

#

are there any resources for generating regularization images for SDXL? I'm finding it impossible to run kohya_ss full-finetune/dreambooth training on my 24GB card, so I'm preparing a jupyter notebook to run on a rented host. I'd like to be able to run the training with regularization images, but I'm falling short on tools that can use SDXL to build that imageset. Maybe I'll just have to write a script that calls the SDXL inference script in kohya and build it that way?

#

I've got Comfy running too, but I recall regularization images should be generated from the same prompts as the training image captions, if I'm not mistaken. In that case, just typing in the class prompt and generating a ton of images with SDXL in comfyui wouldn't quite be the same, right?

fierce nova Jul 24, 2023, 4:31 PM

#

abstract merlin I've got Comfy running too, but I recall regularization images should use the pr...

not sure about sdxl but the best outputs I've had from 1.5/2.1 dreambooth was when I used canny controlnet + prompt for the regularization images vs just prompt alone.

#

I've seen this happen if you are using noise offsets

#

you can also counter this a bit by using a noise offset inverse of what you were using before

abstract merlin Jul 24, 2023, 5:54 PM

#

Has anyone else run into this error while running kohya training?
File "/content/kohya-trainer/library/train_util.py", line 1190, in __getitem__
example["latents"] = torch.stack(latents_list) if latents_list[0] is not None else None
RuntimeError: stack expects each tensor to be equal size, but got [4, 104, 128] at entry 0 and [4, 72, 128] at entry 1
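The error above says two latents of different spatial size ended up in the same batch. Assuming the usual 8x VAE downscale, [4, 104, 128] and [4, 72, 128] would correspond to 1024x832 and 1024x576 pixel images. A minimal sketch of grouping items by latent shape before batching follows; it illustrates the idea only and is not kohya's bucketing code, and the actual cause in this run is not established here.

```python
from collections import defaultdict


def group_by_shape(latent_shapes, batch_size):
    """Return batches of item indices in which every latent has the same shape."""
    buckets = defaultdict(list)
    for index, shape in enumerate(latent_shapes):
        buckets[tuple(shape)].append(index)

    batches = []
    for indices in buckets.values():
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches


# Shapes taken from the error message, plus one more item of the first shape.
shapes = [(4, 104, 128), (4, 72, 128), (4, 104, 128)]
batches = group_by_shape(shapes, batch_size=2)
```

With batches built this way, a stack over any single batch only ever sees tensors of one size.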

torpid ledge Jul 24, 2023, 10:12 PM

#

Anyone has done a successful TI of SDXL? If so could you please share the settings you used?

peak comet Jul 25, 2023, 5:45 PM

#

I recently spent some time to break down HyperDreambooth and all the diffusion training methods. If you've ever wanted to know how LoRA or Dreambooth actually works, check this out: https://open.substack.com/pub/aibits/p/paper-break-hyperdreambooth?r=2lknwd&utm_campaign=post&utm_medium=web

Paper Break: HyperDreambooth

You can now train your face on a model 25x faster

olive elbow Jul 26, 2023, 4:57 PM

#

Does anyone have experience training a model for product photography that can provide enough detail that it can accurately reproduce the information on a label? Is it possible?

warm fog Jul 26, 2023, 6:09 PM

#

peak comet I recently spent some time to break down HyperDreambooth and all the diffusion t...

great stuff, subscribed for more

a pity there is no opensource implementation yet.

#

anyone doing dreambooth or loras on apple silicon with mps? It seems mixed precision support is an issue everywhere. I have lot of RAM though, so thats not a limitation but training is slooow. I’ve tried to do a dreambooth on sdxl using https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md

GitHub

diffusers/examples/dreambooth/README_sdxl.md at main · huggingface/...

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch - huggingface/diffusers

peak comet Jul 26, 2023, 6:18 PM

#

warm fog great stuff, subscribed for more a pity there is no opensource implementation y...

Thank you! 🙏🏾

fossil nova Jul 26, 2023, 7:49 PM

#

what captioning method is recommended for SDXL? 🙂

arctic solar Jul 26, 2023, 8:30 PM

#

torpid ledge Anyone has done a successful TI of SDXL? If so could you please share the settin...

Why TI and not dreambooth?

fossil nova Jul 26, 2023, 9:42 PM

#

Anyone knows why my lora is not working for SD XL? 🙂
RuntimeError: The size of tensor a (768) must match the size of tensor b (640) at non-singleton dimension 1

torpid ledge Jul 26, 2023, 10:39 PM

#

arctic solar Why TI and not dreambooth?

I’m using a custom module to train subjects that best work with TI

arctic solar Jul 26, 2023, 10:47 PM

#

torpid ledge I’m using a custom module to train subjects that best work with TI

Do you find TI better than Dreambooth when it comes to the resemblance of faces to original images?

candid ledge Jul 26, 2023, 11:18 PM

#

Anyone able to train LoRAs on 12gb ?

torpid ledge Jul 26, 2023, 11:24 PM

#

Dreambooth is always better than just TI, but for any kind of training, training the text embeddings always helps tremendously

junior moss Jul 27, 2023, 12:20 AM

#

fossil nova Anyone knows why my lora is not working for SD XL? 🙂 RuntimeError: The size of ...

I guess you're using a LoRA made for SD 1.5. SD 1.5 and SDXL are different neural networks, so the LoRAs are incompatible.

open merlin Jul 27, 2023, 7:59 AM

#

I'm using Kohya_ss to finetune SDXL for the first time. But I get stuck because of a runtime error where NaN is detected in the latents. What can I do about this?

#

I also struggle to install triton, which might be the problem. Any tips on how to get that working?

raw wraith Jul 27, 2023, 9:58 AM

#

open merlin I'm using Kohya_ss to finetune SDXL for the first time. But I get stuck because...

are you on windows?

#

if you're on windows you don't need to use triton, it's incompatible

open merlin Jul 27, 2023, 9:59 AM

#

raw wraith are you on windows?

Yes, thanks. Then it's something else. Ill just wait for some more tutorials to be released then

fossil nova Jul 27, 2023, 12:45 PM

#

junior moss I guess you're using a LoRA made for SD 1.5. SD 1.5 and SDXL are different neura...

i was just stupid and tired in the evening and downloaded the refiner and not the base i was training it on 🙂 . All working as expected.

unkempt wigeon Jul 27, 2023, 2:29 PM

#

Hey. New here, but have been tinkering with SD for a while now. My technical knowledge on it is very limited, so forgive me if I say something dumb here. I'm looking to render images in Blender to act as training data for a LoRA, but I'm not entirely sure what I should be aiming to render out. I'm aiming to train one LoRA on a set of clothes, and a second one on a particular character. Should I make face and head shots a focus for characters? Is there anything I should try to avoid doing?
Anything at all really or a link to any sort of guide on this topic would be much appreciated.

young crater Jul 27, 2023, 5:35 PM

#

unkempt wigeon Hey. New here, but have been tinkering with SD for a while now. My technical kno...

For the character, I recommend using 25+ shots, mostly of just the head, plus a few that are more zoomed out (I got solid results with one or two images that were upper body and the rest shoulders and head).
Since you are doing a cg character, you can even swap out HDRIs and fiddle with the face controls / general pose a bit for each image. Try to get a few angle shots as well.

Avoid shots where something is overlapping the face.
Allegedly dramatic lighting is bad too, but I included a few in my training data while still getting solid results.

unkempt wigeon Jul 27, 2023, 5:38 PM

#

Would it be wise to include some facial expression changes per shot, or would a static expression help keep it focused on the face details themselves?

young crater Jul 27, 2023, 5:40 PM

#

unkempt wigeon Would it be wise to include some facial expression changes per shot, or would a ...

If your character's default expression is unique, it's probably best to use that in most images. (Just include each unique expression, like a Disney character reference sheet)
But if they have a regular resting face, then I would recommend facial expression changes, yes.

unkempt wigeon Jul 27, 2023, 5:42 PM

#

Any particular minimum number of shots you'd recommend?

tough flame Jul 27, 2023, 5:43 PM

#

Are we just plugging SDXL into regular DreamBooth? for example, training on LastBen's dreambooth Colab works?

young crater Jul 27, 2023, 5:45 PM

#

unkempt wigeon Any particular minimum number of shots you'd recommend?

25-40

unkempt wigeon Jul 27, 2023, 5:53 PM

#

Thank you for your suggestions, they're much appreciated! Oh, I was also going to ask: last time I tried LoRAs, I ran into an issue of them having a very grainy, low-detail look, like it was painted in large brush strokes on a very noisy and rough canvas. Is this the "overcooking" that I've seen mentioned when googling around, and are there any changes I should make to my training parameters/data set in order to alleviate it?

young crater Jul 27, 2023, 5:55 PM

#

unkempt wigeon Thank you for your suggestions, they're much appreciated! Oh, I was also going t...

For 1.5: https://civitai.com/articles/391/tutorial-dreambooth-lora-training-using-kohyass
This guide has served me well. The only way I differ is by cropping all my shots to a square 512x512 with: https://www.presize.io/

For SDXL: I have no idea. I can't get it to run faster than 10s/it 😦

unkempt wigeon Jul 27, 2023, 5:57 PM

#

Oh, I've not seen this guide, thank you. I haven't touched SDXL yet as I'm still a little too attached to what's familiar, ha

#

oh, so wait my training images don't have to be 512x512? Should I be rendering them higher?

young crater Jul 27, 2023, 6:00 PM

#

unkempt wigeon oh, so wait my training images don't have to be 512x512? Should I be rendering t...

You don't have to have 512x512 images, but I have found it performs better when you do.

For rendering, it would probably save the most time to just render them at 512x512, framed on just what you need; then there's no need for presize.io

stone garden Jul 28, 2023, 12:36 AM

#

how much slower is fine-tuning SDXL compared to SD 1.5? approximately, on an A100 80GB for example

#

only base without refiner

storm juniper Jul 28, 2023, 5:27 PM

#

I have a question about how the SDXL uses the CLIP model as the paper is a little hard to follow.

Does SDXL concatinate the output of the two CLIP model into a 2048 vector; as seams to be implied via the table; or is it doing something more complex as implied by this paragaph (which i'm finding hard to follow):

"we use OpenCLIP ViT-bigG [19] in combination with CLIP ViT-L [34], where we concatenate the
penultimate text encoder outputs along the channel-axis [1]. Besides using cross-attention layers to
condition the model on the text-input, we follow [30] and additionally condition the model on the
pooled text embedding from the OpenCLIP model. These changes result in a model size of 2.6B
parameters in the UNet, see Tab. 1. The text encoders have a total size of 817M parameters"

The reason for asking is that I'm wondering if I can avoid finetuning by doing manipulations in the CLIP embedding space
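
A shape-only sketch may help read the quoted paragraph. It loads no models and uses assumed widths (768 for CLIP ViT-L, 1280 for OpenCLIP ViT-bigG, 77 tokens); these numbers are not in the quote, but they do add up to the 2048 mentioned in the question. The point it tries to show is that the concatenation happens per token along the channel axis, while the pooled OpenCLIP embedding is a separate single vector.

```python
# Shape-only sketch of SDXL text conditioning (no real models are loaded).
# Assumed widths: CLIP ViT-L = 768, OpenCLIP ViT-bigG = 1280, 77 tokens.
TOKENS = 77
CLIP_L_WIDTH = 768
OPENCLIP_G_WIDTH = 1280


def fake_hidden_states(tokens, width, fill):
    """Stand-in for a text encoder's penultimate-layer output: tokens x width."""
    return [[fill] * width for _ in range(tokens)]


def concat_channels(a, b):
    """Concatenate two per-token sequences along the channel (last) axis."""
    assert len(a) == len(b), "both encoders must see the same number of tokens"
    return [row_a + row_b for row_a, row_b in zip(a, b)]


clip_l = fake_hidden_states(TOKENS, CLIP_L_WIDTH, 0.1)
openclip_g = fake_hidden_states(TOKENS, OPENCLIP_G_WIDTH, 0.2)

# Cross-attention context: one 2048-wide vector per token, not a single vector.
context = concat_channels(clip_l, openclip_g)

# Pooled OpenCLIP embedding: a single vector that is fed in separately.
pooled = [0.2] * OPENCLIP_G_WIDTH

print(len(context), len(context[0]), len(pooled))
```

Under these assumptions, any manipulation in embedding space would have to deal with both the per-token context and the pooled vector.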

blissful dragon Jul 29, 2023, 3:49 AM

#

In the kohya colab, one of the preparation sections said that

All 175 images have captions
No tags found for any of the 175 images
So what's the difference between a caption and a tag? Thanks

real sundial Jul 29, 2023, 4:31 AM

#

anyone know what i should do with a dataset of 4k images

#

it's too small for a model but too big for

#

anything else really

remote vapor Jul 29, 2023, 5:04 AM

#

I'm wondering how I can reduce the size of my output LoRA when training with Kohya on SDXL 1.0.
All the LoRA checkpoints come out at around 1.3 GB and I'd like to compress them a bit

dull bramble Jul 29, 2023, 4:48 PM

#

fine tuning goes wrong 😮

real sundial Jul 29, 2023, 6:15 PM

#

rip

tall condor Jul 29, 2023, 6:56 PM

#

i have created a lora with multiple concepts. if i write a prompt that addresses this lora, how are the sub concepts addressed? just via the same prompt?

#

also is it possible to merge a lora directly into a model?

brave bronze Jul 29, 2023, 7:31 PM

#

are there any tutorials for locally fine tuning using SDXL? How much VRAM do you need? (I got 96 GB)

tall condor Jul 29, 2023, 8:05 PM

#

you have 96 on multiple cards i guess, right?

brave bronze Jul 29, 2023, 8:12 PM

#

tall condor you have 96 on multiple cards i guess, right?

2xA6000

tall condor Jul 29, 2023, 8:18 PM

#

so from what i have seen you can run around a batch of 1 on 24gb but only with some tweaks, so i would expect 96 to handle around 3-4

tall condor Jul 29, 2023, 8:42 PM

#

how are lora weights applied to a model when i add it to the prompt? does adding a lora to a model give the same result as training the model directly?

dull bramble Jul 29, 2023, 9:22 PM

#

stone garden Jul 29, 2023, 9:54 PM

#

dull bramble

Hello, very nice anime, I like it very much.

dull bramble Jul 30, 2023, 2:46 AM

#

stone garden Hello, very nice anime, I like it very much.

thank you

#

wicked perch Jul 30, 2023, 6:20 AM

#

how do i fix the fingers and feet of my characters?

ivory pine Jul 30, 2023, 6:30 AM

#

wicked perch how do i fix the fingers and feet of my characters?

use negative prompts like: bad hands, bad feet, etc. That did it for me most of the time, although I get the occasional bad ones

wicked perch Jul 30, 2023, 6:31 AM

#

ivory pine use negative prompts like: bad hands, bad feet, etc. That did it for me most of ...

do you know how many sampling steps would be generally good?

#

and the cfg scale?

ivory pine Jul 30, 2023, 6:32 AM

#

I use 20+ steps, usually 20-50 depending on the model

#

and cfg I like to keep between 7-10, usually just 7.5

wicked perch Jul 30, 2023, 6:32 AM

#

alr, ill use that in the inpainting process

young dust Jul 30, 2023, 6:48 AM

#

have you messed with cfg scale scheduling and mimic cfg scale?

regal harbor Jul 30, 2023, 9:15 AM

#

would you finetune datasets organizing photos by folders? e.g. "front facing", "from behind", "side view", "laying down", "2 people". Maybe it makes sense to train those concepts separately, merge them, then train over them at a low LR?

open merlin Jul 30, 2023, 1:27 PM

#

brave bronze are there any tutorials for locally fine tuning using SDXL? How much VRAM do you...

You need network alpha on 1, very important, otherwise you overtrain. Network dim can be 264, but you can go higher if you have more VRAM. Lately I'm having a lot of success using ~100+ total steps per image, meaning steps per image multiplied by epochs. For the learning rate I use a cosine schedule (which decays from the starting value to 0) with Adafactor. Starting at 0.0004 was ok, now I'm trying 0.001. Regularisation like dropout also seems to work really well. Finally, the Weights & Biases API is really useful, highly recommend it.

brave bronze Jul 30, 2023, 1:48 PM

#

open merlin You need network alpha on 1, very important otherwise you overtrain. Network dim...

appreciate the tips, but at the moment I'm not even sure where to start. Do I just use the same diffusers library that can be used to train SD1.5?

open merlin Jul 30, 2023, 1:49 PM

#

I use kohya ss, then put the base xl model as the one to train

#

I can make a quick word document with links and send it if you want

brave bronze Jul 30, 2023, 1:49 PM

#

that would be super helpful, I'm sure to more people than just me

open merlin Jul 30, 2023, 2:52 PM

#

Here is a first draft. Maybe it makes sense to make this a web page or something, if anyone wants to help we can crowdsource the best settings. These are just the ones I figured out the last few days.

📎 startup_guide.docx

tepid sundial Jul 30, 2023, 4:02 PM

#

Anyone have information or links to reading on SD training with non-DDPM noise scheduler?

#

I haven't used KohyaSS, but I've been reading the code for sd-scripts, and the training script doesn't make assumptions about image size; so as long as the dataset loader doesn't either, it should work.

median ocean Jul 30, 2023, 4:27 PM

#

hi guys, does anyone know where to download, or does anyone have, a repository of good regularization images of real-life female / male pictures? i plan to train using SDXL and good reg images could improve the quality a lot

torn mica Jul 30, 2023, 6:58 PM

#

Working on my first lora. I couldn't find a set of regularization images that were good enough, so I've been building my own. I have 135 decent 1024x1024 images (and about 31 training images). Is 135 enough for regularization?

#

I'd skim some youtube tutorial, but I get the sense they all, at best, have a link to an image reg dataset that doesn't fit my usecase well enough.

#

Rather than building their own (to find out how many is enough to get by)

#

searched the discord server for "how many regularization"

looks like I might be ready to progress

open merlin Jul 30, 2023, 10:19 PM

#

torn mica Working on my first lora. I couldn't find a set of regularization images that we...

135 is more than enough. Depending on your video card your compute time will be massive. I'm talking days if you have a multiple of that in training images..

torn mica Jul 30, 2023, 10:55 PM

#

open merlin 135 is more than enough. Depending on your video card your compute time will be ...

Computer time was pretty long but each step was taking like 5 min instead of 1.5s like the tutorial. I changed the bfloat to uncapped experimental and it’s running 10k steps while I’m here at the gym. 1.2k finished by the time I showered.

torn mica Jul 30, 2023, 10:59 PM

#

open merlin 135 is more than enough. Depending on your video card your compute time will be ...

Almost forgot, thank you for the answer

jade hornet Jul 31, 2023, 12:19 AM

#

median ocean hi guys, does anyone know where to download or own any repository of good regula...

Just generate them with the model. Why do I say this? You want the AI to learn what's different or special about your subject vs the 'class'. If you have regularization images, it will compare your subject images against them to discern what's different and learn that. Therefore, you want your subject images to be what stands out, if that makes sense. For that reason, I wouldn't even spend a ton of time curating them. However, if your subject has a particular body type, or you want the model to draw it better, you could train multiple subjects into the lora, like a body type and a likeness. Think of it like a game: you say "what is a woman", it spits out an example, and you say "no, no..." and point it to a fine tune example.

spring sun Jul 31, 2023, 1:28 AM

#

Any suggestions for training a lora on sprite sheets, which usually have lower dimensions (96x128)?
I would like to train a lora for sprite sheets on SDXL.

Will I be able to produce 96x128 on XL?
Should I scale everything up to train and generate, then scale down?

median ocean Jul 31, 2023, 4:15 AM

#

jade hornet Just generate them with the model. Why do I say this? You want the AI to learn...

i got your point, thanks for the help 😄

#

hi guys, another question: has anyone ever tried to train a Lora using the SDXL refiner? can i know what GPU you guys used? RTX 3090 or A6000?

dusky aurora Jul 31, 2023, 6:39 AM

#

does anyone have experience in upscaling into very high resolutions ?

hollow spruce Jul 31, 2023, 7:55 AM

#

For SDXL LoRA Training

kohya gui, main branch, and use this config file
for anyone wanting to make loras

epoch and max epoch need to be adjusted. 40 for normal loras, 80 for very complex loras (big dataset/faces/anatomy).
(also obviously adjust batch size, to whatever your card can handle)
repeat on dataset folders = 1

There'll be a comprehensive guide eventually on what all the settings do - but for now: that config will work as long as your dataset isn't too bad

expect this to run around 7 min per 10 epochs on a normal card. 16gb vram guaranteed - 12gb vram should work if your overhead is low enough. <- lower batch size to whatever your hardware can handle. 12gb vram should be able to handle batch 1~2
24gb vram can handle up to 8~12
if you have 24gb vram card -> run at batch 3 (only a minute slower) -> then you can continue using comfy during training, to test your checkpoints and see if you're happy and can stop training early.

also, just in case anyone's curious about file size. the full details & nuance of a face fit in dim 1 - we just use 8 cause we want the 8:1 ratio of dim to alpha.
setting it higher usually won't give better results, as dim 8 is big enough that I've fit the concepts of 100 complete dresses + faces + anatomy into it. So unless you're doing a full finetune level of lora with more than 5k images, dim 8 will be good enough

📎 kohya_gui_adamw8bitPreset.json
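
To make the dim/alpha and file-size remarks concrete, here is a minimal sketch. It assumes the common LoRA formulation in which each adapted layer gets a down and an up matrix of rank `network_dim`, and the learned update is scaled by `network_alpha / network_dim`. The 1280x1280 layer size is a made-up example, not a measured SDXL layer.

```python
def lora_scale(network_alpha, network_dim):
    """Multiplier applied to the learned low-rank update (alpha / dim)."""
    return network_alpha / network_dim


def lora_extra_params(in_features, out_features, network_dim):
    """Parameters added for one linear layer: down (dim x in) + up (out x dim)."""
    return network_dim * (in_features + out_features)


# Hypothetical 1280 x 1280 projection layer, used only for illustration.
small = lora_extra_params(1280, 1280, network_dim=8)
large = lora_extra_params(1280, 1280, network_dim=264)

print("scale at dim 8, alpha 1:", lora_scale(1, 8))
print("params per layer at dim 8:", small)
print("params per layer at dim 264:", large)
print("size ratio:", large / small)
```

Parameter count grows linearly with dim, which is consistent with dim 264 producing much larger files than dim 8 for the same set of layers.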

#

For captioning, all old 1.5 rules still apply.
Dataset size of 50~100 is recommended to avoid the typical training pitfalls. 10~30 works if you know what you're doing. less than 10 can work if you really know what you're doing, and have your captioning down to a science.
Datasets above 100 definitely improve the model, on the condition that you can keep up proper captioning (better to have 50 well-captioned images than 500 badly captioned ones). Quick and/or automated methods of captioning for sdxl will be in the guide

in an ideal world you'll have a .txt file for each image with tagging like this:
<trigger word>, caption, caption, caption, caption, caption, caption, <background description>
(seriously, don't forget background descriptions unless you want your trigger word to also affect the background)
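
A small sketch of that caption layout, assuming one `.txt` file sitting next to each image with the same base name. The helper names, the trigger word `mychar` and the tags are all made up for illustration.

```python
from pathlib import Path


def build_caption(trigger_word, tags, background):
    """Join trigger word, tags and background description into one caption line."""
    parts = [trigger_word, *tags, background]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def caption_path_for(image_path):
    """Caption file lives next to the image, same name, .txt extension."""
    return Path(image_path).with_suffix(".txt")


def write_caption(image_path, caption):
    """Write the caption file for one image and return its path."""
    path = caption_path_for(image_path)
    path.write_text(caption + "\n", encoding="utf-8")
    return path


# Hypothetical example values; nothing is written to disk here.
caption = build_caption(
    "mychar",
    ["1girl", "white hair", "side view"],
    "grassy field background",
)
print(caption_path_for("img/1_mychar woman/001.png"), "->", caption)
```

Keeping the background as its own trailing tag is what stops the trigger word from absorbing the background as well.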

open merlin Jul 31, 2023, 9:30 AM

#

I just worked out my findings from training Lora's the past few days: https://medium.com/@berd0stad/training-sd-xl-1-0-loras-22cc1daa20b

Medium

Training SD XL 1.0 LoRA’s

Have been training a few LoRA’s for Stable Diffusion XL 1.0 these past days. Here a short start up guide with a step by step process.

open merlin Jul 31, 2023, 9:33 AM

#

hollow spruce # For SDXL LoRA Training **kohya gui, main branch, and use this config file** fo...

Nice to see you have different settings. Why do you use --network_train_unet_only? And constant instead of cosine learning rate, especially with such a high LR. Is a network dim of 8 enough? I use 264 and it makes the file sizes huge. Also why not use any of the dropout functions? I find they really help generalise the models.

#

Also why not use full training with bf16? It frees up a lot of VRAM. I could sneak in an increase in batch size by enabling it.

#

What tradeoff do you make between number of epochs and steps per image?

hollow spruce Jul 31, 2023, 10:08 AM

#

open merlin Nice to see you have different settings. Why do you use --network_train_unet_onl...

I mean I literally said in the message that I trained 100concepts into dim8 (42mb LoRA) - yes dim 8 is enough as long as you stay under 5000 image dataset. If you go above that, then these settings won't really help you much, as you're already aware of what you're doing

#

unet only is a complicated topic, that can't be properly explained in a few sentences. I'll be mentioning that in full in the guide - as well as everything you need to change about dataset + settings if you do plan on training the open clip layer.
In short without much explanation: sdxl now has 2 clip models, and they are set up in a complicated way. If you train clip, then in 99% of cases you are actually damaging the whole sdxl model while doing so. If you are making waifudiffusionXL, then that is fine, as you don't really care about the abilities the clip used to have - but in our case of normal LoRA training, we really really don't want to cause damage to the model. Especially when it takes longer to train, to often receive worse training data. The only downside of not training clip is that your trigger word should be close to what you're training, else you'll need twice the epochs to achieve good results.

open merlin Jul 31, 2023, 10:15 AM

#

Thank you, look forward to your guide!

hollow spruce Jul 31, 2023, 10:16 AM

#

but also to justify it a bit without proper proof, here is a screenshot from the readme of kohya himself
(can be found here -> https://github.com/kohya-ss/sd-scripts/tree/sdxl )

fierce nova Jul 31, 2023, 10:17 AM

#

has anyone tried doing layer-wise training on sdxl?

#

http://proceedings.mlr.press/v97/belilovsky19a/belilovsky19a.pdf

hollow spruce Jul 31, 2023, 10:25 AM

#

open merlin Nice to see you have different settings. Why do you use --network_train_unet_onl...

dropout & cosine can be used to improve your training - but they aren't fool proof. Essentially I want the json to be able to be used by anyone who's training for the first time, and achieve good results.
If you understand the changes that schedulers make - then you should eventually move over to cosine with restarts, once you have a grasp on when the model starts to deteriorate. But do keep in mind that this is no longer SD1.5, and a lot of knowledge about the precision settings that was applicable to training before, is now no longer correct.

hollow spruce Jul 31, 2023, 10:28 AM

#

open merlin Also why not use full training with bf16? It frees up a lot of VRAM. I could sne...

I saw a lot of warnings about compatibility whenever it was mentioned by the various UI devs for kohya - so I've held off on trying it, since I can only test on my 4090.

fierce nova Jul 31, 2023, 10:30 AM

#

i use bf16 exclusively and its fine, just cast it to fp16 when you are done

hollow spruce Jul 31, 2023, 10:31 AM

#

open merlin What tradeoff do you make between number of epochs and steps per image?

?
not sure if you're referring to repeat vs epochs, or generally what happens once you hit high epochs
steps = image count * repeat * epochs
just a matter of where the math happens

if your steps go high enough, you will either start to overfit the model, or have it break down completely. Due to 8/1 ratio this doesn't happen till fairly late though.
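
The formula above as a tiny calculator. Dividing by batch size (rounded up per epoch) is my assumption about how the trainer counts optimizer steps; the example numbers are only illustrative.

```python
import math


def total_steps(image_count, repeats, epochs, batch_size=1):
    """steps = image count * repeat * epochs, counted in batches."""
    steps_per_epoch = math.ceil(image_count * repeats / batch_size)
    return steps_per_epoch * epochs


# A 39-image dataset with repeat = 1, at 40 and 80 epochs.
print(total_steps(39, 1, 40))
print(total_steps(39, 1, 80))

# Same dataset at 40 epochs with batch size 4: fewer, larger steps.
print(total_steps(39, 1, 40, batch_size=4))
```

Moving a factor from repeats to epochs leaves the total unchanged, which is the "where the math happens" point.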

midnight vale Jul 31, 2023, 11:32 AM

#

Does KohyaSS support training images of any aspect ratio? I want to train an SD XL model with images with a broad range of aspect ratios

hollow spruce Jul 31, 2023, 11:40 AM

#

midnight vale Does KohyaSS support training images of any aspect ratio? I want to train an SD ...

yep. #🔧｜finetune message does it with as little vram as possible. if you have 24gb vram, you can increase the number of aspect ratios by switching the bucket size from 128 to 64.
(i'd still recommend leaving it at 128 though, as the benefit is minor, but vram issues can occur if you have too many aspect ratios)
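
To see why a smaller bucket step gives more aspect ratios, here is a simplified bucket enumerator. It is not kohya's actual implementation; the area cap and the min/max side lengths are assumed values chosen for illustration.

```python
def make_buckets(max_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) buckets whose sides are multiples of `step`."""
    buckets = set()
    width = min_side
    while width <= max_side:
        # Largest height on the step grid that keeps width * height within area.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # same bucket in the other orientation
        width += step
    return sorted(buckets)


coarse = make_buckets(step=128)
fine = make_buckets(step=64)

print("buckets with step 128:", len(coarse))
print("buckets with step 64:", len(fine))
```

More buckets means images are cropped or resized less to fit, at the cost of more distinct tensor shapes during training.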

sonic narwhal Jul 31, 2023, 12:32 PM

#

Training an SDXL lora in kohya_ss and it is incredibly slow compared to previous runs. I'm on an RTX 3090. Anyone else experience this?

open merlin Jul 31, 2023, 12:35 PM

#

How many seconds per iteration are you running?

hollow spruce Jul 31, 2023, 1:07 PM

#

sonic narwhal Training SDXL lora in kohya_ss and it is incredibly slow compared to previous ru...

use this. shouldn't take more than 30 min to train a complete lora, with room to spare.
#🔧｜finetune message

dull bramble Jul 31, 2023, 1:12 PM

#

why, when I train a lora, do the sample images look absolutely disgusting

#

but when I try the same lora on automatic1111 webui, it still doesn't look good but it's way better

#

(this is supposed to be ramlethal from guilty gear strive)

midnight vale Jul 31, 2023, 1:17 PM

#

hollow spruce # For SDXL LoRA Training **kohya gui, main branch, and use this config file** fo...

Wait doesn't "additional_parameters": "--network_train_unet_only", stop accompanying caption files from affecting training?

hollow spruce Jul 31, 2023, 1:39 PM

#

dull bramble why do when I train a lora, the sampling images look absolutely disgusting

because kohya T.T
if you have enough vram, you can do training + comfyui at the same time, to test your checkpoints. other than that, no good solution at the moment

hollow spruce Jul 31, 2023, 1:39 PM

#

midnight vale Wait doesn't `"additional_parameters": "--network_train_unet_only",` stop accomp...

nop. it means that the clip model won't be affected, but the unet model still relies on captions to shift weights

dull bramble Jul 31, 2023, 1:40 PM

#

hollow spruce because kohya T.T if you have enough vram, you can do training + comfyui at the ...

thank you! What I am doing is generating the sample images with auto1111 webui in full cpu mode during training; it takes about 20 minutes to generate a 1024x1024 image with 20 steps

warm agate Jul 31, 2023, 1:52 PM

#

@hollow spruce do you know any LLM finetuners, i want to finetune a model to generate SD prompts

normal pike Jul 31, 2023, 2:21 PM

#

Hey guys. How much VRAM do I need to finetune SDXL model? Would a 4090 24gb be enough?

dull bramble Jul 31, 2023, 2:26 PM

#

normal pike Hey guys. How much VRAM do I need to finetune SDXL model? Would a 4090 24gb be e...

I am fine tuning with 16Gb VRAM so yes

normal pike Jul 31, 2023, 2:31 PM

#

dull bramble I am fine tuning with 16Gb VRAM so yes

Nice, good to know. Thanks for the answer!

robust skiff Jul 31, 2023, 2:32 PM

#

Hello!
Which video card is better: GIGABYTE GeForce RTX 3060 Ti EAGLE OC (LHR) 8G or GIGABYTE GeForce RTX 3060 GAMING OC 12G?

open merlin Jul 31, 2023, 2:34 PM

#

more ram is more better

normal pike Jul 31, 2023, 2:36 PM

#

robust skiff Hello! Which video card is better for GIGABYTE GeForce RTX 3060 Ti EAGLE OC (LH...

Only go for non-ti versions if they have more vram

#

3060 ti is barely better than the 3060, but since it has 4gb less VRAM, the 3060 wins by miles

sonic narwhal Jul 31, 2023, 2:54 PM

#

hollow spruce # For SDXL LoRA Training **kohya gui, main branch, and use this config file** fo...

on 5000 steps estimated time is almost 300 hours

#

Now I'm using different settings and estimated time is 36 hours for 6240 steps

#

did a 1.5 Lora in 18 minutes in between these

young crater Jul 31, 2023, 2:56 PM

#

sonic narwhal on 5000 steps estimated time is almost 300 hours

how many images do you have?

sonic narwhal Jul 31, 2023, 2:56 PM

#

39

young crater Jul 31, 2023, 2:57 PM

#

Caith's is assuming 1 sample per image, and 40-80 epochs

#

so it should be roughly 1560-3120 steps (39 images x 40-80 epochs)

#

far less assuming a batch size over one

#

my last 28 image lora training was 320 steps following caith's settings

sonic narwhal Jul 31, 2023, 2:59 PM

#

1 sample per image? Is that a setting in kohya?

young crater Jul 31, 2023, 2:59 PM

#

sonic narwhal 1 sample per image? Is that a setting in kohya?

in your /img/ folder, the subfolder name has a number before the lora name

#

it should be 1_[name] [class]
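
A sketch of how that folder name can be read, assuming the convention is `<repeats>_<name> <class>`. The parser and the folder names are illustrative, not taken from kohya's code.

```python
import re

# Leading integer, underscore, then the rest ("name class").
FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")


def parse_dataset_folder(folder_name):
    """Return (repeats, concept) from a folder named like '1_mychar woman'."""
    match = FOLDER_PATTERN.match(folder_name)
    if match is None:
        raise ValueError(f"expected '<repeats>_<name> <class>', got {folder_name!r}")
    return int(match.group(1)), match.group(2)


print(parse_dataset_folder("1_mychar woman"))
print(parse_dataset_folder("10_mychar woman"))
```

A folder left at a high repeat count multiplies the step count by that number, which would fit the very long time estimates mentioned earlier.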

sonic narwhal Jul 31, 2023, 3:00 PM

#

ok

#

Is training lora on SDXL supposed to create .npz files in the dataset folder?

young crater Jul 31, 2023, 3:00 PM

#

sonic narwhal Is training lora on SDXL supposed to create .npz files in the dataset folder?

yes, those are your latent caches
