#LoRA_Easy_Training_Scripts


humble axle
#

you should consider that "don't train the conv part to learn denoise"

normal charm
#

More precise?

humble axle
#

and actually generating images at a different resolution is a totally different task

#

so you should use the resolution that your base model was trained on

#

if your base model is something merged and you want to train a lora on it
I'd recommend disabling the conv layers with conv_dim=0

vivid python
#

I'll throw in some parameters that I used for training a locon style lora of tab_head, just to give another setup.

4/1 linear
8/1 conv
5e-4
1200 steps
768x
Cosine
Warmup ratio of 0.05
Batch size 2

Got pretty much the entire style.

humble axle
#

BTW

#

even with conv_dim=0
loha/locon will train more layers than kohya_ss' lora

ivory crescent
#

Clyde censored my hands images Agony

normal charm
humble axle
#

basically only need to change the settings about locon/loha

humble axle
#

res and other is ok

ivory crescent
vivid python
#

Hands be too sexy man

shut siren
#

Interesting, I haven’t had issues with using high alpha in my training runs

#

High alpha meaning alpha=dim or basically no scaling

normal charm
vivid python
#

Reference yes, each dataset is different

shut siren
#

When I tried low alpha in the past, it required giga cranking up the LR to learn anything

ivory crescent
normal charm
#

All the math i learned in school and shit still make me feel smooth brained

vivid python
#

But seems like training at 16/8 or 8/16 then resizing using the dynamic resizing might be better

normal charm
#

Thanks us education system

vivid python
#

Those are dims btw

ivory crescent
vivid python
#

Yeah

#

The lora resize script can resize locon

#

And it has a few dynamic resizing modes

normal charm
#

Oh that

#

It only lowers the size correct?

vivid python
#

Yes

#

But usually it can lower the size without breaking things

shut siren
#

Maybe I need to try low alpha again

#

But it always seemed undercooked when I tried it

vivid python
#

I've generally had success at low alphas

shut siren
#

End up having to use giga learning rate to compensate

vivid python
#

Almost all of my lora are either dim16 with an alpha of 8, or dim 8 with an alpha of 1

normal charm
#

Wait
I shouldve asked this earlier
But are we talking about the base dim/alpha or just the dim/alpha for lycoris training

vivid python
#

Both

ivory crescent
#

what warmup ratio you use?

shut siren
#

Back when sdscripts first implemented alpha and the default became 1, I was wondering why none of my loras were learning anything lmao

vivid python
humble axle
#

I don't change the ratio(?

#

I always use 100 warm up step

normal charm
vivid python
#

Some styles may need more

ivory crescent
#

it's ratio in derrian's argslist DrollFrozen

vivid python
#

Which is why I set default to 32

ivory crescent
#

but probably could calculate it for 100 steps

humble axle
#

yeah

vivid python
#

Just steps directly
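A quick way to go back and forth between a warmup ratio and a fixed warmup step count, as discussed above. This is only a sketch: the function names are made up for illustration, and rounding to the nearest step is an assumption, not necessarily how the training scripts do the conversion.

```python
def warmup_steps_from_ratio(ratio: float, total_steps: int) -> int:
    """Convert a warmup ratio into a step count (rounded to the nearest step)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return round(ratio * total_steps)


def warmup_ratio_from_steps(warmup_steps: int, total_steps: int) -> float:
    """Convert a fixed warmup step count into the equivalent ratio."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return warmup_steps / total_steps


total = 1200  # step count from the locon setup quoted earlier in the chat
print(warmup_steps_from_ratio(0.05, total))           # ratio -> steps
print(round(warmup_ratio_from_steps(100, total), 4))  # 100 steps -> ratio
```

With a fixed 100 warmup steps, the ratio grows as the total step count shrinks, which is why a short run can end up warming up for most of an epoch.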

shut siren
#

I’m dying from seeing loras I want to try out on civitai and then they’re like 256dim or something

vivid python
#

We have the means

normal charm
#

Oh another thing
Would lycoris models have anything to do with why additional networks extension isnt loading them for me LucaWat

shut siren
#

What settings do you prefer for dynamic resize?

vivid python
#

Yeah, provided it was a loha it doesn't load at all

vivid python
ivory crescent
#

I guess also 512 res with that batch size

#

or A100 KoiSee

vivid python
#

And you can only load it using the built in method

#

Not additional networks

normal charm
#

I…thought i already installed it?

vivid python
#

Loha weren't integrated into additional networks though

humble axle
#

with gradient checkpoint

ivory crescent
#

I would have to use cache latents too probably

#

I like random crop NotLikeKogasa

normal charm
#

That not it?

vivid python
#

And it's up to date?

normal charm
#

I just updated an hour ago

ivory crescent
#

I "only" have 3090, but probably only thing that matters is VRAM

vivid python
normal charm
#

Yeah its only the prompt box that understands

#

Additional networks is just borked

vivid python
#

It's not broken

#

It just doesn't support loha

normal charm
#

Can it support loha?

vivid python
#

Nope

#

Not without a rewrite

shut siren
#

do yall adjust learning rate for batch size?

normal charm
vivid python
normal charm
#

Not for batch size, not me

vivid python
#

So not really

normal charm
#

I used to do batch size 3, but since using locon or loha i had to lower it to 2

ivory crescent
#

what buckets do you use?

shut siren
#

im a batch size 1 user lol

humble axle
#

default in kohya

#

I m bs1 also (in llama
but I use grad acc step 64 (X

ivory crescent
#

I remember showing Salt my batch size when making softprompt for LLM, and he thought it was for WD Agony

shut siren
#

gonna try dim 8/4 with 1 alpha for both

#

and see what happens

#

considering i was alr using lr 4e-4 with alpha=dim this might be undertrained

ivory crescent
#

batch size 400 and 64 grad steps Agony

shut siren
#

is there a way to tell if your lora/locon underflows with high alpha?

humble axle
#

but you can check the weight of lora_up

#

for loha I need to check(

shut siren
#

cuz my results have always been usable with alpha=dim

humble axle
#

just check if any weight is super small

#

and has lot of zero

shut siren
#

but i dont know if there is any problem with it lol

humble axle
#

and in sometimes

#

my friend actually meet something like:
"All zero in the weight"

#

super underflow
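The check described above (look at the lora_up weights for lots of zeros or super small values) could be sketched like this. It assumes the weights are already loaded into a dict of plain float lists; loading the file is left out, and the key name, the function name and the "tiny" threshold (the smallest normal fp16 magnitude) are illustrative choices, not anything from the training scripts.

```python
from typing import Dict, Iterable

# Smallest normal fp16 magnitude (about 6.1e-5); used here as an assumed "tiny" cutoff.
FP16_MIN_NORMAL = 2.0 ** -14


def underflow_report(
    weights: Dict[str, Iterable[float]],
    tiny: float = FP16_MIN_NORMAL,
) -> Dict[str, Dict[str, float]]:
    """For each tensor, return the fraction of exact zeros and of near-zero values."""
    report = {}
    for name, values in weights.items():
        vals = list(values)
        if not vals:
            continue  # nothing to measure for an empty tensor
        zeros = sum(1 for v in vals if v == 0.0)
        small = sum(1 for v in vals if 0.0 < abs(v) < tiny)
        report[name] = {
            "zero_frac": zeros / len(vals),
            "tiny_frac": small / len(vals),
        }
    return report


# Toy values standing in for a flattened lora_up tensor (hypothetical key name).
toy = {"example_block.lora_up.weight": [0.0, 0.0, 1e-6, 0.5]}
for name, stats in underflow_report(toy).items():
    print(name, stats)
```

A lora_up tensor that is still entirely zero after training would match the "all zero in the weight" case mentioned above.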

normal charm
#

I never touch weight decay
Idk if i should

shut siren
#

i nudged it up to 0.1

normal charm
#

Thats the default no?

humble axle
#

You should use weight decay if you are using small dataset

shut siren
#

default is 0.01 iirc

humble axle
#

and actually amount<10k is small dataset

#

(some people will say <100K is small BTW XD)

ivory crescent
normal charm
#

10k? Images?

shut siren
#

some of my subject datasets are like 20-30 lol

humble axle
#

I've also trained a locon with 5 images total

shut siren
humble axle
#

and just do some crop aug on it

#

and get good result

#

super good result

shut siren
#

ive been using random crop

normal charm
#

Who the hell has a more than 10K dataset

humble axle
shut siren
#

to try to get a pseudo larger dataset lol

humble axle
ivory crescent
normal charm
humble axle
#

umamusume 60k dataset
after repeats to balance the number of images per character

#

the size becomes 350k
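One plausible way to pick per-character repeats so that every character ends up with roughly the same number of samples. This is a guess at the general idea, not the scheme actually used for the dataset above; the character names, counts and function name are hypothetical.

```python
from typing import Dict


def balancing_repeats(image_counts: Dict[str, int]) -> Dict[str, int]:
    """Give each character enough repeats to roughly match the largest character."""
    if not image_counts:
        return {}
    if any(count <= 0 for count in image_counts.values()):
        raise ValueError("every character needs at least one image")
    largest = max(image_counts.values())
    return {name: max(1, round(largest / count)) for name, count in image_counts.items()}


# Hypothetical counts, only to show how the effective dataset size grows.
counts = {"char_a": 3000, "char_b": 500, "char_c": 120}
repeats = balancing_repeats(counts)
effective_size = sum(counts[name] * repeats[name] for name in counts)
print(repeats)
print(effective_size)
```

The effective size can end up several times the raw image count, which matches the kind of jump described above.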

normal charm
#

Beefy ass pcs u got

ivory crescent
#

what does pc have to do with dataset size lol

#

maybe training time chenThink

normal charm
#

If i train on large sizes i get prone to crash after awhile

#

Too large sizes

ivory crescent
#

@humble axle I'm about to test your settings but I don't think batch size 8 is good for 38 images chenThink

shut siren
#

is there any drawback to high alpha besides the risk of underflowing?

humble axle
#

just add 10x repeat on it XD

humble axle
#

just a scale for output

ivory crescent
#

but warmup still 100 steps? it will be like 1 epoch 💀

#

also adamW8bit right?

vivid python
shut siren
#

normal default is 0.01 iirc

normal charm
#

Noted

ivory crescent
#

how many steps do you do for small datasets?

humble axle
#

lower than 1k but I forgot

#

maybe 300~600

#

or if you can read the metadata from pt file(

ivory crescent
#

I, in fact, know how to read

humble axle
#

good

#

here is it

ivory crescent
#

yes, being able to read is good

humble axle
#

train with 5img (22 after aug)

#

and get super good result

#

I also using this as style model BTW

#

the light/shadow is good for me

ivory crescent
#

by aug you mean cropping, modyfing dataset by yourself

#

or flip/crop args?

humble axle
#

crop + flip

#

I manually modify it

ivory crescent
#

yea, I do that too for less than 30 images or even more

shut siren
#

alpha=1 seems kinda undertrained if i keep the rest of the settings the same

#

not getting the outfit details as precisely

humble axle
#

try little bit higher lr

shut siren
#

im already at 4e-4 which seems kinda high

humble axle
#

not

#

with alpha=1 and cosine scheduler you can just push to 8e-4 (

#

basically depends on your dataset

shut siren
#

any change to the text encoder lr?

humble axle
#

WAIT

#

you have different LR for unet/te?

shut siren
#

yeah

#

te lower

humble axle
#

just use both 4e-4

shut siren
#

than unet

humble axle
#

just use both 4e-4

shut siren
#

you think maybe the TE is undertrained?

ivory crescent
#

I still don't know what TE does in single subject LoRAs chenThink

#

I only noticed it when I did like 8 characters

shut siren
#

ive been using a lower TE lr than unet just because those were the defaults ive seen

ivory crescent
#

When I had to increase TE

shut siren
#

not because of any theoretical reason

normal charm
#

Never bothered messing with either of those

shut siren
#

you just used the same for both?

humble axle
#

yes

#

I just cannot understand why people use different(

shut siren
#

i think the thought is the TE doesnt need as much training for this

humble axle
#

when you are doing some f/t like this

#

actually the te learns much more than the unet

#

undertrained is almost always = te undertrained

shut siren
#

i heard TE can blow up if the LR is too high

humble axle
#

yeah it will

#

but you have alpha=1

#

no need to be afraid of that

#

alpha=1 for dim=8 means 1/8 grad(

#

(BTW actually all the thing will blow up if LR is too high)
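The "alpha=1 for dim=8 means 1/8" point comes from the usual LoRA scaling, where the low-rank update is multiplied by alpha / dim. A minimal sketch of that arithmetic for the dim/alpha pairs mentioned in this chat (the function name is just for illustration):

```python
def lora_scale(alpha: float, dim: int) -> float:
    """Multiplier applied to the low-rank update: alpha / dim."""
    if dim <= 0:
        raise ValueError("dim must be positive")
    return alpha / dim


# (dim, alpha) pairs that come up in the conversation
for dim, alpha in [(8, 1), (16, 8), (128, 128), (128, 1)]:
    print(f"dim={dim:>3} alpha={alpha:>3} -> scale={lora_scale(alpha, dim):.4f}")
```

At the same learning rate, a smaller scale means smaller effective updates, which is consistent with low alpha needing a higher LR and with alpha=dim behaving like "no scaling".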

shut siren
#

the old colab i used months ago for DB did this method where it would train just the TE first, and then freeze the TE and train unet after that

humble axle
#

(UNet is not more stable, just in the past you only train the transformer block in it)

#

Oh I cannot say UNet is not more stable

#

It is more stable (In some other experiments)

#

but it will blow up

humble axle
#

between make sense and not make sense(

normal charm
#

Surely theres a dictionary for all the training terminology there is, right?

humble axle
#

:deadsmile:

#

oh I need to use another one

shut siren
#

major differences between adamw and adamw8bit?

humble axle
#

8bit is way smaller

shut siren
#

is there a difference in speed or quality?

humble axle
#

speed... not that significant
quality is also not that significant

humble axle
shut siren
#

ive been avoiding increasing batch size because my datasets are giga small

humble axle
#

no need

#

pro-tip

#

Ideal situation is:
batch size = dataset size

shut siren
#

also because it sometimes gives weird step numbers

#

if an aspect ratio bucket has a # of images that isnt a multiple of batch size
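The "weird step numbers" can be reproduced with a small calculation, assuming each aspect ratio bucket is batched on its own so that a bucket whose image count is not a multiple of the batch size ends with a partial batch. That batching behaviour is an assumption about the trainer, and the bucket sizes below are made up.

```python
import math
from typing import Dict


def steps_per_epoch(bucket_counts: Dict[str, int], batch_size: int, repeats: int = 1) -> int:
    """Sum of per-bucket batch counts, each bucket rounded up separately."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return sum(math.ceil(n * repeats / batch_size) for n in bucket_counts.values())


# Hypothetical 20-image dataset split over three buckets.
buckets = {"512x768": 13, "768x512": 5, "640x640": 2}
total_images = sum(buckets.values())

print(steps_per_epoch(buckets, batch_size=2))  # per-bucket rounding
print(math.ceil(total_images / 2))             # naive estimate ignoring buckets
```

The two numbers differ whenever more than one bucket has a leftover image, which is where the unexpected step counts come from.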

humble axle
#

oh right

ivory crescent
#

batch size 16k lets go

humble axle
#

the step number...

shut siren
#

bleh the results from my unet/te both 4e-4 run still seem worse than my old alpha=dim locon

humble axle
#

use higher lr

#

maybe your dataset need higher lr

#

I use 5e-4 for dataset with trigger word only(?
(remove all tags about the trigger word)

#

If you use tag/caption + trigger word

#

you will need higher lr

shut siren
#

oh yeah this is trigger word + tags

normal charm
#

Thats all my datasets ever are

humble axle
#

trigger word + relative tags need higher lr

shut siren
#

when i tried trigger word only the first time i trained lora, it exploded

#

even at 1e-4 lol

normal charm
humble axle
#

all

#

te is more important than unet(

#

(Just considering about TI/HYN, actually modify things for TE)

#

(And anime diffusion also just change the cond layer only and get totally different style)

#

oh wait

#

HYN is for unet transformer

#

I'm wrong sorry

#

(or something also for TE?)

normal charm
#

The only lr ive ever seen anyone use for te is 5e-5
Or 1e-5 (mightve been unet, unsure)

#

Most times no one has it set

shut siren
#

maybe i just go back to what i was doing before at alpha=dim, since i found LR that works for that lol

humble axle
#

if you are always using the same dim

#

consider always using that alpha

#

fixed alpha ratio should not work(

shut siren
#

the alpha itself matters more than the alpha relative to the dim?

#

and yeah i basically always use the same dim size

#

8 linear 4 conv for locon

#

4/2 for loha

humble axle
#

yes

humble axle
#

if you fixed alpha

#

means your ratio is related to your size

#

which is good for NN

shut siren
#

the loras i have on civitai are 128dim/128alpha lol

humble axle
#

LOL

shut siren
#

since i know the old behavior of sdscripts before 0.4.0 was basically alpha=dim

#

cuz there was no scaling

#

hmmm interesting that the details im getting on this character's thighhighs are wayyy better when the character is standing than sitting lol

#

i guess its just somewhat unreliable with the sitting pose

humble axle
#

XD

shut siren
#

maybe alpha=1 just needs 8e-4

#

double the LR of alpha=8

humble axle
#

make sense

shut siren
#

8e-4 text encoder makes me uneasy lol

humble axle
#

XD

shut siren
#

thats more than an order of magnitude higher than my previous

normal charm
#

Trying another training using scraps of everything we just talked about
I wont say i know what im doing now, but at least i know…somewhat more than before?

shut siren
#

i feel like ppl use widely different settings and still manage to arrive at a usable result

normal charm
#

Well yes civitai is proof of that

#

Its more a matter of efficiency than usability

#

Well

shut siren
#

im just trying to figure out if there are better settings than what im using that will more consistently give a usable result

normal charm
#

Those are kinda the same thing

#

Somewhat

shut siren
#

like my old 128dim/128alpha loras

#

were usable

#

i made some pretty cool gens with them

humble axle
#

all the settings depend on your dataset, your task, your model... depends on everything

shut siren
#

but now im a low dim believer

humble axle
#

I'm 1dim believer (X

shut siren
#

save disk space

normal charm
#

Theres an extension that helps view the details of other models people have trained (not additional net)

shut siren
#

theres some 256dim users on civitai

humble axle
#

I also received an issue about a LoHA blowing up

#

and that guy just use

#

384dim loha

#

I don't know why

shut siren
#

surely more dim = better result

humble axle
#

higher than the dim of some layer in the UNet

shut siren
humble axle
#

so we should use 100B stable diffusion(O

normal charm
#

Highest ive ever gone was 128
I was under the impression low dims were for characters and higher were for styles

humble axle
#

I actually use 4dim for style XD

#

but since I use 1dim for character
so higher for style may be correct?

shut siren
#

damn 1dim

humble axle
#

Like this
4dim loha

#

Fuzi style

shut siren
#

i mean i guess ppl make TIs for characters and those are even smaller than 1dim loras

humble axle
#

Yeah

normal charm
#

I figured higher dim did something like account for more of the details in the image or something

#

Hence better for styles

normal charm
#

Or at least as far as ive seen

shut siren
#

you use 1dim for linear and conv?

normal charm
#

I didn’t know what to set it to at first, so i tried keeping them close to the base alpha and dim i had set

#

Like if i had 8/16 locon would be 9/18 or something

shut siren
#

you were using alpha higher than dim?

normal charm
#

Rarely

#

I only ever kept the alpha to between 8 and 16, and MAYBE 32

#

On occasions where things werent working

shut siren
#

damn its competent when the char is standing, but cant do this when the char is sitting lol

normal charm
#

Such perfect ai symmetry makes me cry of joy

proper ember
#

What's her name?

normal charm
#

Target concept:

#

Result:

#

@humble axle

humble axle
#

If you can get good result with normal lora
Just continue using it
If you get bad result with all the things

You need to adjust your dataset

#

And I need to sleep sry(

normal charm
shut siren
#

I expanded some of my datasets from 20 to 30 images, but I don’t feel that it’s necessarily an improvement for all of them

vivid python
#

I usually do 60-100 images for a dataset

#

50-150 is what I consider good enough

normal charm
#

Im trying to train 24 img dataset

#

Its missed twice Agony

proper ember
#

I would consider remaining with old version of both scripts personally with Linux

#

Wasn't a good idea updating both to latest version

#

I too had the same bad accuracy problem as his

normal charm
#

3rd miss

normal charm
#

4th miss

normal charm
#

5th miss
But this time, it retained more details after i changed the base alpha from 1 to 4

#

All the others had it at one

#

So so far alpha=1 aint lookin good

#

Does that need higher steps or lr?

vivid python
#

long story short, setting alpha to 1 only really works at low dims

#

higher than 8 set it to half

normal charm
#

But it was at 8 lol

#

8/1

#

Are we talking lower than that even?

shut siren
#

Low alpha needs higher LR

#

When they introduced alpha and it dropped my 128/128 loras (at the time) to 128/1, it essentially learned nothing on the same settings

ivory crescent
#

trying to use gradient checkpointing

#

no way this is working, I have batch size 8 at 768 res

vivid python
#

higher batch size isn't necessarily better, not at our scale

shut siren
#

my friend asked me to try to train with a single image - and its a fkin discord emoji

#

pretty sure this will result in absolute junk

vivid python
#

i'd be surprised if it works

shut siren
#

it made some absolute junk but we got a few laughs

#

doesnt help that the emoji is 64x64

normal charm
#

Two additional misses in the past few hours since last update

#

8th attempt in progress

normal charm
#

8th attempt ended up retaining the most details with these parameters:
4/8 linear
8/1 conv
5e-4 lr
0.05 warmup
1200 steps
This is when trained as a locon
The outputs still fuck up body anatomy fairly bad, but it showed signs of getting the concept right, so this is next checkpoint i think

#

Next ill try doing loha and see how that fares

ivory crescent
#

@humble axle tested your settings with higher/lower TE/LE and 512/768 and they work pretty well, will have to make LoRA to compare the results but at least they work, so thanks for the help

humble axle
normal charm
#

Tried training my 9th attempt with the same settings as my 8th, but as a loha this time. Result was another complete miss

#

Loha needs more focus or something on it right?

vivid python
#

Loha usually take more to train, because they compress their dims

normal charm
vivid python
#

Or higher lr

worn locust
#

1500 is definitely too low

normal charm
#

How much then

vivid python
#

For loha, either a high lr or like 3k steps

#

The way I train in general is start at 1e-3 for 800 steps

#

Figure out if it's good enough, if not then figure out what needs to be changed

#

If it's baked but didn't learn anything, lower lr increase steps

#

If it learned most things but not everything and isn't baked, increase steps a bit

#

If it learned nothing and wasn't baked, increase lr

#

Though I never had to do that one

#

I usually end up with something like 5e-4 for 1600 steps

#

Also. I generally keep TE at 1e-4 in pretty much every case
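The rule of thumb above can be written down as a tiny lookup. This only encodes the cases spelled out in the chat; the function name and the labels for how much was learned are invented for the sketch, and anything not covered is reported as such rather than guessed.

```python
def next_adjustment(baked: bool, learned: str) -> str:
    """Suggest the next change after a quick test run (e.g. 1e-3 for 800 steps).

    learned is one of "nothing", "most", "everything" (labels made up for this sketch).
    """
    if learned not in ("nothing", "most", "everything"):
        raise ValueError("learned must be 'nothing', 'most' or 'everything'")
    if baked and learned == "nothing":
        return "lower lr, increase steps"
    if not baked and learned == "most":
        return "increase steps a bit"
    if not baked and learned == "nothing":
        return "increase lr"
    if not baked and learned == "everything":
        return "good enough, keep settings"
    return "case not covered by the rule of thumb"


print(next_adjustment(baked=True, learned="nothing"))
print(next_adjustment(baked=False, learned="most"))
```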

vivid python
worn locust
#

That's solid advice

vivid python
#

I learned it from a dude with a masters in datascience

#

I picked his brain a lot

#

At least I'm pretty sure he said masters

#

It was a few months ago at this point

normal charm
vivid python
#

Might be the dataset

#

Ah right, I should be asleep

#

I've gotta wake up early

normal charm
#

Np

#

Ill just refer back to old trainings and improve from there ig

normal charm
#

@vivid python im the only one who frequents this place bc of endless questions and issues Agony

anyway, i updated to v6

vivid python
#

That's the first time I've seen the venv creation just straight up fail

#

I literally just updated my install of sd-scripts to torch 2.1.0 as well

#

So I know it exists

#

The v6 installer is meant to be in its own folder

#

Because it installs everything of course

#

But the torch_update.bat is just supposed to nuke the venv then reinstall everything with installing the new torch

normal charm
#

Had to download and install an older python version tho, but i digress, i suppose

vivid python
#

I only tested everything on 3.10.6

#

That's why

normal charm
normal charm
sterile bolt
#

yo, somebody training locon/lyco in LoRA_Easy_Training_Scripts local repo? how can i understand what exactly trains right now? and if anyone can, please send me configuration file for analysis, thanks in advance

normal charm
#

Id tell u if i could train rn but

sterile bolt
#

I'll wait

#

does this mean i'm training lycoris?

digital kite
normal charm
vivid python
#

not sure

#

the torch_update.bat should just re-create the venv

normal charm
normal charm
#

Welp
Guess im disabled now

normal charm
normal charm
#

Cant reinstall v5 either

#

Same issue

#

When did the program suddenly decide i dont have xformers installed

normal charm
#

@vivid python idk what happened, but as stated above, i cant seem to reinstall v5 either
It all seems tied to a torch error

vivid python
#

v5 shouldn't have any issues with installation

#

the only difference in v6 is the option for the new torch version

#

and proper checks on python version

normal charm
vivid python
#

v4 cannot work

#

v5 was when I completely changed the code

normal charm
#

Guess i can delete that then
But that still doesnt explain the v6 and v5 issue

vivid python
#

it doesn't you're right

#

but it's likely an issue with your computer

#

rather than my scripts

#

as I haven't had any other issues with it, nor any other reports of issues

normal charm
#

That makes it even more obscure

#

Since everything i need should already be installed

vivid python
#

which is only python 3.10.6

#

and git

normal charm
#

Yep
I have git and 3.10.6 installed

vivid python
#

i'm running through the installer again

#

so far no problems

normal charm
vivid python
#

how do you have 3.10.6 installed?

normal charm
#

Wdym by how

vivid python
#

you can get it either through the app store or from the website

#

the app store version is dogshit

normal charm
#

Oh no i went directly to the site

vivid python
#

and I'm assuming you added it to path?

#

do you have any other versions of python?

normal charm
vivid python
#

is it still installed?

normal charm
#

I only downloaded 3.10.6 after this error started appearing
I assumed installing 3.10.6 would overwrite any and all 3.10.9 stuff

vivid python
#

not at all

normal charm
#

Should I uninstall and reinstall then?

vivid python
#

just uninstall python 3.10.9

normal charm
#

Wait i also have a 3.11 too…?

#

Wtf

vivid python
#

that's 100% the issue then

#

3.11 doesn't work at all

#

that I know for sure
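When several Python versions are installed side by side, it helps to see which interpreter a script is actually running under. A small check, run with whatever `python` the installer picks up; treating every 3.10.x as acceptable is an assumption here, since only 3.10.6 is stated as tested, and the function name is made up.

```python
import sys
from typing import Tuple


def is_tested_python(version: Tuple[int, ...]) -> bool:
    """True for 3.10.x (assumption: other 3.10 patch releases behave like 3.10.6)."""
    return tuple(version[:2]) == (3, 10)


print("interpreter:", sys.executable)
print("version:    ", sys.version.split()[0])
if is_tested_python(sys.version_info):
    print("this is a 3.10.x interpreter")
else:
    print("this is not 3.10.x; the installer may fail under this interpreter")
```

If the printed path points at a 3.11 install, that interpreter is the one being found first on PATH.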

normal charm
#

Yeah theres actually 3 different python versions here now
Im not sure how it was working before

vivid python
#

some things seem to want to install 3.11 for some reason

#

i've had it happen too

normal charm
vivid python
#

might be better to just uninstall all python versions then reinstall then

normal charm
vivid python
#

I don't know how your python got fucked up, but I'm also not surprised

#

python is honestly really shitty

normal charm
#

Its a miracle (though i was unaware), that it even worked in the past at all

normal charm
vivid python
#

what does it say above?

normal charm
vivid python
#

looks like diffusers is throwing a fit

normal charm
#

Do i use one of the update bats?

normal charm
#

I would like to state for the record that I do try what I can to fix stuff before I go pinging you, which is why ur not getting a ping every 3-5 minutes @vivid python

vivid python
#

Oh I get it, I'm not annoyed or anything, I just don't have notifications on

#

That being said

#

Try running torch_update.bat

#

And installing 1.12.1

#

The original torch and see if it runs

normal charm
#

just keep using it with 1.12.1 then?

vivid python
#

Yeah, seems like torch 2 isn't working for you which is odd

#

I haven't had somebody have issues with it

normal charm
vivid python
#

guess so

#

and it sucks too

#

because torch 2.1.0 made my bake times go from 2.5 hours to 1.5 hours

#

and i'm on a 3060

normal charm
vivid python
#

they are alright, less vram than me

normal charm
#

Mhm
Maybe i shouldnt be too surprised

quiet notch
#

I was trying to get this trainer up and running, but I was only able to get it working on torch 1.12.1, not on >2.0. I have a GTX1070, so I'm just randomly assuming it's not supported.

#

Oh, I remember my issue now

#

It was something about bitsandbytes saying that there was no kernel available for execution for my device

#

That said, I don't know if I can ever actually run with torch >2.0 cirnoHelpImDyingInside

#

that speedup sounds nice

#

it looks like there may be solution for me to try in here...

quiet notch
#

nope, still wasn't able to get it up and running

#

There's that dreaded RuntimeError: CUDA error: no kernel image is available for execution on the device

#

Running 1070 on win10

shut siren
#

try turning off 8bit adam

#

or errr

#

using regular adamw instead of adamw8bit

#

i know adamw8bit isnt compatible with GTX 1080 and will throw that error

#

@quiet notch

quiet notch
#

i will give that a try

#

hopefully ram isn't an issue...

shut siren
#

does it run?

quiet notch
#

nope

#

same issue

#

switched over as you said

#

i noticed that from earlier messages, the easy_training folder is supposed to have a venv?

#

i'm not running in venv (didn't do an earlier step correctly perhaps?), could that be part of the issue?

#

@shut siren i might have to sleep soon so we can resume this some other time (unless you want me to troubleshoot rn fast)

#

i am using loha settings, btw

shut siren
#

well "no kernel image is available for execution on your device" implies that something is incompatible with your hardware

quiet notch
#

gtx 1070 8gb

#

guess i'll be having an early christmas soon

shut siren
#

have you tried using base sdscripts

quiet notch
#

i have yet to touch it directly

shut siren
#

it may be torch 2.0 as you mentioned before?

#

i know that bitsandbytes doesnt like GTX 1080

#

but you shouldnt need bitsandbytes unless youre trying to run adamw8bit

#

so theres definitely something else

#

i had someone in another discord also run into that kernel image error, but it specifically indicated bitsandbytes, and he was using sdscripts with the bmaltais wrapper on GTX 1080 - fixed immediately when he switched to regular adamw

quiet notch
shut siren
#

you got things working on older torch version?

quiet notch
#

yeah, it works on 1.12.1

shut siren
#

it might just be that torch 2.0 doesnt like your hardware
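One way to narrow down a "no kernel image is available" error on the torch side is to compare the GPU's compute capability with the architectures the installed torch build was compiled for. The sketch below uses `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`; the helper name is made up, the matching rule (same major version, compiled minor not above the device's) ignores PTX fallback, and none of this says anything about bitsandbytes, which ships its own binaries.

```python
from typing import Iterable, Tuple


def arch_is_covered(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """True if a compiled sm_XY target shares the device's major version with Y <= device minor."""
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as compute_XY
        digits = arch[3:]
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip suffixed or malformed entries
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", cap)
            print("compiled archs:   ", archs)
            print("covered by this build:", arch_is_covered(cap, archs))
        else:
            print("no CUDA device visible to torch")
```

A GTX 1070 reports capability (6, 1), so a build whose list has no sm_60 or sm_61 entry would be a likely suspect.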

quiet notch
#

Do you know if gradient acc steps is properly supported? I was getting this warning...

#

None of the inputs have requires_grad=True. Gradients will be None

#

not too sure if this is specific to the easy training scripts or lora in general

normal charm
#

Im pretty sure 2.0 didnt work for me either, and im on 3070

#

I was just instructed to use 1.12

vivid python
# quiet notch gtx 1070 8gb

10 series cards require a specific patch, I'm gonna assume you applied it, I had believed that it should just work, but perhaps the main.py file changed for torch2.1.0, which actually makes sense.

#

Perhaps it might actually be that the dlls need to be updated to support torch 2.1.0, which means, likely, that until a new version gets built, 10 series cards cannot use torch 2

quiet notch
#

I did hear something about building from scratch

normal charm
#

Im surprised i never asked this question before but if epoch count is ignored when max steps is set, are the number of repeats also ignored?

quiet notch
#

Speaking of multiplying inputs, I noticed that when the repeat config multiplies the input, the "epoch" then becomes based on that multiplied dataset rather than the original.

#

if I understood my readings correctly, epoch is when the AI looks over your images once and has updated parameters. But with the repeat variable, when you look over all of your data once * repeat, it only counts as 1 epoch instead of what the repeat is.

#

i'm even more confused when i take into account gradient acc steps and the config parameter epoch, because the shown epoch during training can be higher than what you originally intended for it to be, and then on top of that, since input is multiplied, the real epoch is actually the shown epoch multiplied by repeats?

normal charm
quiet notch
normal charm
#

That was my understanding of max steps at least

#

So what it sounds like is it just affects how many images are processed during training

#

However long its specified to go

quiet notch
#

though...

#

assuming max steps refers to optimization steps (the step in which parameters are updated)

#

hm

#

nevermind

#

i was thinking that if a "step" referred to an image training iteration, then if you set repeats to like 10 and max steps to 10, then you would only go through 1 image out of your entire dataset
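The relationship between repeats, shown epochs, gradient accumulation and max steps can be worked through with a bit of arithmetic. The sketch assumes that an "epoch" is one pass over the repeated dataset and that a "step" is one optimizer update; both are assumptions about how the trainer counts, and the function name and example numbers are only for illustration.

```python
import math
from typing import Dict, Optional


def training_plan(
    num_images: int,
    repeats: int,
    batch_size: int,
    grad_acc_steps: int = 1,
    max_steps: Optional[int] = None,
    epochs: Optional[int] = None,
) -> Dict[str, float]:
    """Work out updates per epoch, shown epochs and passes over each original image."""
    if max_steps is None and epochs is None:
        raise ValueError("set either max_steps or epochs")
    samples_per_epoch = num_images * repeats                  # repeats inflate the epoch
    batches_per_epoch = math.ceil(samples_per_epoch / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_acc_steps)
    if max_steps is not None:                                 # max steps wins over epochs
        total_updates = max_steps
        shown_epochs = math.ceil(max_steps / updates_per_epoch)
    else:
        total_updates = updates_per_epoch * epochs
        shown_epochs = epochs
    return {
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "shown_epochs": shown_epochs,
        # one shown epoch sees every original image `repeats` times
        "passes_over_each_image": total_updates / updates_per_epoch * repeats,
    }


# Hypothetical run: 24 images, 10 repeats, batch size 2, capped at 1200 steps.
print(training_plan(24, 10, 2, max_steps=1200))
```

Under these assumptions, repeats are not ignored when max steps is set: they change how many steps one shown epoch takes, so the same step cap covers fewer shown epochs but the same number of images overall.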

normal charm
#

okay so ik for a FACT this sample folder did not exist before, despite the date modified column saying its been here since a few weeks ago
i have really really wanted an output folder for samples so i could better track the training progress, but from what I can tell, theres no path argument or whatever that can be assigned for that

#

Theres the option to tell it how often to dropout samples, but apparently the output directory for the lora isnt enough for that (according to the error i get when using that function)

Theres no way im that blind bc i constantly check this folder after training things, downloading things, etc

#

No way i wouldnt have noticed this, but if it can actually be done then where do i specify where it drops the samples?

worn locust
#

you just got mandela effect'd by derrian

vivid python
#

nani

#

I don't think there is an option to set where samples go

#

at least, there isn't an arg for it

#

yeah, I don't see one

#

it is supposed to output to the outputs folder

#

I can only see the problem being that there wasn't a proper txt file being pointed to for samples

#

but dunno

#

anyways, gonna sleep now

#

if you still have questions

#

I'll answer in the morning

normal charm
#

The samples i found in that folder were from a lora i trained awhile back
And i have only ever trained loras using ur script, at least, as far as i can remember

#

Which makes this all the more confusing

#

Since any time i turn the samples arg on, it just gives me path errors during training

vivid python
#

Not sure then, there's no way to change the folder for samples, but the path error can be it looking for the txt file and not able to find it

vivid python
#

The text file that has all your prompts

#

It doesn't just pull random prompts from the dataset
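Since the path error is suspected to come from a missing prompt text file, a quick pre-flight check before enabling samples might look like this. The file name is hypothetical, and skipping blank lines and lines starting with `#` is an assumption about how the prompt file is read.

```python
from pathlib import Path
from typing import List


def read_sample_prompts(path: str) -> List[str]:
    """Return the usable prompt lines, or raise if the file is missing or empty."""
    prompt_file = Path(path)
    if not prompt_file.is_file():
        raise FileNotFoundError(f"sample prompt file not found: {prompt_file}")
    lines = prompt_file.read_text(encoding="utf-8").splitlines()
    prompts = [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
    if not prompts:
        raise ValueError(f"no prompts found in {prompt_file}")
    return prompts


try:
    print(read_sample_prompts("sample_prompts.txt"))  # hypothetical file name
except (FileNotFoundError, ValueError) as err:
    print(f"fix this before enabling samples: {err}")
```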

normal charm
#

U mean the sample prompt txt?

#

Is that what it needs? I do remember i used it for that lora

normal charm
#

@vivid python

vivid python
normal charm
vivid python
#

Depends on the dataset

#

I usually follow

#

The idea of 1e-3 to see how it bakes really quickly

#

Then adjust based on those outputs

#

Dim doesn't affect lr a ton, from what I know

normal charm
#

Ok, thank u

sterile bolt
#

so i use a 41 pic dataset with 5 repeats and these settings for lora training (local LoRA_Easy_Training_Scripts):
self.optimizer_type: str = "AdamW8bit"
self.scheduler: str = "cosine_with_restarts"
self.cosine_restarts: Union[int, None] = 3
self.learning_rate: Union[float, None] = 1e-4
self.unet_lr: Union[float, None] = 1e-4
self.text_encoder_lr: Union[float, None] = 5e-5
self.net_dim: int = 128
self.alpha: float = 128
self.train_resolution: int = 768
self.batch_size: int = 6
self.num_epochs: int = 12

then i tried training LyCORIS with the same dataset and the same number of repeats, with these changes:
i changed self.net_dim: int = from 128 to 16
i changed self.alpha: float = from 128 to 1
self.lyco: bool = True
self.locon_dim: Union[int, None] = 8
self.locon_alpha: Union[int, None] = 1
self.locon: bool = True
added self.network_args: Union[dict[str:str], None] = {
    "algo": "lora",
    "conv_dim": "8",
    "conv_alpha": "1",
    "disable_conv_cp": "True"
}

as a result, i get output images worse than lora. am i dumb?

sterile bolt
#

for lora 128, for lycoris 16

vivid python
#

eh, lora don't need to be dim128

#

also update pushed

vivid python
#

and locon is just not better for characters because of style bleed

#

and lower dims require higher steps

#

or higher lr

sterile bolt
vivid python
#

I usually never use dim128 lora

#

either I resize them

sterile bolt
#

can i increase number of steps by increasing repeats of dataset? like from 5 to 20

vivid python
#

or just don't use them

sterile bolt
#

or how i can try increase amount of steps
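For reference, a rough sketch of how the step count follows from these settings, assuming the usual behaviour where one epoch sees every image `repeats` times and one step consumes one batch. The function name is made up for illustration, and bucketing or gradient accumulation can shift the real number slightly.

```python
import math

def total_steps(num_images: int, repeats: int, batch_size: int, epochs: int) -> int:
    """Approximate optimizer steps: images seen per epoch, split into batches, times epochs."""
    images_per_epoch = num_images * repeats
    steps_per_epoch = math.ceil(images_per_epoch / batch_size)
    return steps_per_epoch * epochs

# 41 images and 12 epochs, as in the settings posted above
print(total_steps(41, 5, 6, 12))   # 5 repeats, batch 6
print(total_steps(41, 20, 6, 12))  # 20 repeats, batch 6
print(total_steps(41, 5, 2, 12))   # 5 repeats, batch 2
```

By this estimate more repeats, more epochs, or a smaller batch all raise the step count; setting max train steps directly sidesteps the arithmetic.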

vivid python
#

better that way

sterile bolt
#

self.net_dim and self.alpha:
do these need to change when training lycoris, or will only the conv values be used?

vivid python
#

it will use both

sterile bolt
#

which one is better to train - lycoris or locon?

vivid python
#

depends

#

actually, it doesn't

#

lycoris is locon

#

they are the same

sterile bolt
#

self.lyco: bool = True # turn on if you want to use the new locon architecture

#

so, different architectures

#

which one is better for now

vivid python
#

between LoRA, LoCon, and LoHa

#

LoHa is trash

#

the other two have their uses

#

neither one is better than the other

#

LoCon learn styles much better

#

because they also train the conv layers, so they cover more of the model

#

LoRA are better for characters, because they don't learn style easily

#

and you can certainly get exactly everything right about a character at dim16

#

just, don't do 16/1

#

that just doesn't work

#

16/8 works

#

8/1 works

#

16/1 doesn't

sterile bolt
#

Sheffield has a lot of details, so i want to try to do it as accurately as possible, lora can't convey all the details

normal charm
vivid python
sterile bolt
#

lora?

vivid python
#

Hatakaze has a really complex pattern on her kimono, dim16 got that as well

#

lora

sterile bolt
#

i cant reach all details on dim 128

vivid python
#

train better?

sterile bolt
#

what i need to change? all my settings are above

vivid python
#

eh, all of it

#

high batch size lends itself to learning less often

#

1e-4 works well at 16/8, if you set that for all and go like 3k steps

sterile bolt
vivid python
#

bad usually

#

the way batch size works is that it averages the gradients from that many images into one update

#

then takes a single step on that average

#

means it learns less small details usually
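A hedged sketch of what a larger batch does in standard minibatch training: each image keeps its own latent and its own loss, and the optimizer takes one step on the mean gradient, so any single image has less individual pull per step. This is a toy scalar model with made-up names, not the trainer's actual code.

```python
def per_image_grad(w: float, x: float, y: float) -> float:
    """Gradient of the squared error (w*x - y)**2 with respect to w, for one sample."""
    return 2.0 * (w * x - y) * x

def batch_grad(w: float, batch: list[tuple[float, float]]) -> float:
    """One step's gradient: the mean of the per-sample gradients in the batch."""
    return sum(per_image_grad(w, x, y) for x, y in batch) / len(batch)

w = 0.0
samples = [(1.0, 1.0), (1.0, 3.0)]   # two "images" that pull w by different amounts
print(batch_grad(w, samples[:1]))    # batch of 1: only the first sample's gradient
print(batch_grad(w, samples))        # batch of 2: the average of both
```

With the same number of epochs, a bigger batch also means fewer optimizer steps overall, which is the "learning less often" part.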

#

I train at batch 2

sterile bolt
#

then why i bought 3090

vivid python
#

speed

sterile bolt
#

nah

vivid python
#

training takes a 5th the time it does on my 3060

#

regardless, you do you

sterile bolt
vivid python
#

I've stopped giving concrete advice because people don't usually actually follow it

sterile bolt
#

and how u calculate ur amount of steps?

vivid python
#

I don't

#

I just set steps

sterile bolt
#

but how it works

vivid python
#

I use the variable to set max train steps

#

kohya handles the rest

#

I pretty much never use repeats

#

unless the dataset is really small

sterile bolt
#

41 is small?

vivid python
#

3 repeats probably

#

epochs save instantly

#

so I see no reason not to just use more epochs

sterile bolt
#

k i will try ur settings rn

vivid python
#

you usually need to bake a bunch of attempts and tweak settings

#

this will be true regardless of dim

sterile bolt
#

but i won't get 12 epochs in 8 minutes due to batch size change cry

vivid python
#

speed is the enemy of accuracy, I've found

#

anyways

#

not gonna be looking at this chat for the next whole day

#

gonna be away from computer

sterile bolt
#

which scheduler do you use?

kindred belfry
#

To claim something, it is probably better to show some evidence, as I have always been doing Drool
(providing dataset composition, hyperparameters, trained networks, xyz grids, etc.)

vivid python
vivid python
#

I also don't plan on getting into this

kindred belfry
#

That's your choice of not switching to loha, and I understand it. In the end they are not that different, but claiming loha is trash is totally misleading.

vivid python
#

Loha don't improve over locon

kindred belfry
#

Could be. It depends on many factors. I don't claim any of them to be an improvement in the end.

vivid python
#

The problem is how much must be done to make them work

#

And even then, they usually end up having a larger file size from my experience
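One plausible reason for the size difference, assuming the Hadamard-product formulation that LyCORIS describes for LoHa: at the same dim, LoHa stores two low-rank pairs per layer where LoRA/LoCon store one. A rough parameter count for a single linear layer (the function names and the 768x768 layer are illustrative only):

```python
def lora_params(in_features: int, out_features: int, dim: int) -> int:
    """Plain LoRA on a linear layer: one down matrix (dim x in) and one up matrix (out x dim)."""
    return dim * (in_features + out_features)

def loha_params(in_features: int, out_features: int, dim: int) -> int:
    """LoHa: two low-rank pairs whose products are multiplied element-wise."""
    return 2 * lora_params(in_features, out_features, dim)

print(lora_params(768, 768, 16), loha_params(768, 768, 16))
```

So a LoHa usually needs a lower dim than a LoCon to land at the same file size.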

kindred belfry
#

It always works for me

#

Maybe for some reason it doesn't work well with your configuration

#

For some others they find loha to be better

vivid python
#

I've had them work, but I've only seen them worth messing with when making something huge

sterile bolt
kindred belfry
#

Personally I think they are on par. We really need to see the dataset, the training hyperparameters, and the results to be able to say something

vivid python
#

Like the entirety of umamusume

vivid python
#

I'm not gonna help

sterile bolt
vivid python
#

I don't help people anymore

#

Because I've continually gotten "fuck you" from people

sterile bolt
#

but i dont say that

vivid python
#

Sure, you probably don't, but I've just stopped wanting to help people because of how often it does happen

#

That being said, I'm also phone posting, which makes it annoying to type

vivid python
#

In general

vivid python
sterile bolt
#

it means thanks for help anyway

vivid python
#

I see

#

BTW, just as a thing to mention

#

Could be the dataset

normal charm
vivid python
#

Also technical questions are usually less of an issue

#

It's more or less actual training questions that I struggle to find the will to help with now

worn locust
vivid python
worn locust
#

yee

shut siren
#

You’re going to need a much higher LR going from 128/128 to 16/1
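The reason is the alpha/dim scaling: in the common LoRA formulation (which, as far as I know, sd-scripts follows), the learned update is multiplied by `alpha / dim` before it is added to the base weights. A minimal sketch; the helper name is just for illustration.

```python
def lora_scale(dim: int, alpha: float) -> float:
    """Multiplier applied to the LoRA update: alpha / dim."""
    return alpha / dim

for dim, alpha in [(128, 128), (16, 8), (8, 1), (16, 1)]:
    print(f"{dim}/{alpha}: scale = {lora_scale(dim, alpha):.4f}")
```

128/128 gives a scale of 1.0 while 16/1 gives 0.0625, so the same LR moves the effective weights far less at 16/1.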

sterile bolt
worn locust
#

So what's min_snr_gamma

vivid python
#

a thing

#

no clue what it actually does

#

but apparently it improves training

normal charm
#

xyz chart time

shut siren
#

it adjusts for the fact that loss is inversely proportional to noise timestep

#

kinda normalizes it
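A minimal sketch of that weighting, assuming the Min-SNR formulation for noise-prediction training, where each timestep's loss is multiplied by `min(SNR, gamma) / SNR`. The function name and the default of 5 are illustrative assumptions, not taken from the script.

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Per-timestep loss weight min(SNR, gamma) / SNR for noise-prediction training."""
    return min(snr, gamma) / snr

# High SNR = lightly noised timestep, low SNR = heavily noised timestep
for snr in [100.0, 20.0, 5.0, 1.0, 0.1]:
    print(f"SNR {snr:>6}: weight {min_snr_weight(snr):.3f}")
```

Timesteps with SNR above gamma get down-weighted, and everything at or below gamma keeps a weight of 1.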

worn locust
normal charm
normal charm
#

Anyone know how to format the negative prompt in the sample prompt txt file?
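As far as I recall, the sd-scripts sample prompt file takes one prompt per line with inline flags, where `--n` starts the negative prompt and `--w`, `--h`, `--d`, `--l`, `--s` set width, height, seed, CFG scale and steps. Treat those flag names as an assumption to check against the sd-scripts docs; the helper below is hypothetical and only builds such a line.

```python
def sample_prompt_line(prompt: str, negative: str = "", width: int = 512,
                       height: int = 512, steps: int = 28, cfg: float = 7.0,
                       seed: int = 1) -> str:
    """Build one line of a sample prompt file using inline --flags."""
    parts = [prompt]
    if negative:
        parts.append(f"--n {negative}")
    parts += [f"--w {width}", f"--h {height}", f"--d {seed}", f"--l {cfg}", f"--s {steps}"]
    return " ".join(parts)

line = sample_prompt_line("1girl, solo, smile", negative="lowres, bad hands",
                          width=768, height=768)
print(line)
```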

quiet notch
#

Does anyone know if the warmup_lr_ratio is based on optimization steps or epochs? Or maybe even total image iterations...?

vivid python
vivid python
#

Basically it's a ratio of the total number of steps
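In other words, something like the following, assuming the ratio is simply multiplied by the max train steps and truncated. The exact rounding the script uses is not confirmed here, and the function name is made up.

```python
def warmup_steps(warmup_ratio: float, max_train_steps: int) -> int:
    """Number of warmup steps, assuming the ratio is applied to the total step count."""
    return int(warmup_ratio * max_train_steps)

print(warmup_steps(0.25, 1200))
```

So a ratio of 0.05 on a 1200 step run would warm up for roughly the first 60 steps.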

vivid python
#

Has anybody managed to get loha working? I want to replicate a training setup to see if I can provide a decent starting point for loha, because they are pretty different from training locon or lora

normal charm
#

Every time i trained loha, it took way too many tries to get it right

vivid python
#

Seems really finicky

#

But I was thinking

#

What if we can get more accurate styles out of roughly the same space as locon

#

I say this because, unlike locon, which don't work with CP decomposition

#

Loha seem to work well with it

#

So we might be able to reduce the size of loha to match that of dim16 conv dim8 locon

#

With better results than dim16 conv dim8 locon

#

But I'm only thinking about this in a purely hypothetical context

#

Because I haven't managed to bake a loha that I was entirely happy with

normal charm
#

Ive only managed to once I believe

#

And I haven’t been able to reproduce the results

#

The thing that sucks about training imo is that even if one model's training settings worked well, they can't be reproduced to work for another

#

At least in my experience

vivid python
#

Unfortunate

normal charm
#

Even similar dataset sizes don’t promise anything

vivid python
#

It might not be possible to have good defaults then

normal charm
#

I would agree with that assessment, yes.
More likely, it’s better to have defaults to work from rather than work with.

vivid python
#

Well yes, but I don't think I've had an instance of loha working without 20 or so bakes

#

Which means I have no clue what would make sense

normal charm
#

Again, loha takes work to get right. I already struggle enough with normal locon training, cant get anything without training it at least 5 times. Analyzing how loha works is not in my best interest currently

#

For that, I’ll wait for more knowledgeable people to take the reins

kindred belfry
vivid python
#

I mean, if you managed to consistently get good results I'll look through your setup

#

I should mention though, most of the loha on civitai are trained poorly, so I don't plan on using them as basis

kindred belfry
vivid python
#

Well I have some questions, primarily, why train on ACertainty?

#

Actually, also why clip skip 1?

kindred belfry
#

No difference between clip skip 1 and clip skip 2 in my experiment. 金Goldkoron#9929 from Mynefactory said he found the model to be better if trained on clip skip 1. As sd is initially trained on clip skip 1, I see no real reason to train on clip skip 2.

vivid python
#

But why train on ACertainty?

#

Over nai

kindred belfry
#

Training on acertainty is basically as good as nai

#

And I don't want to say I train on nai. That's all.

vivid python
#

But mixes have a track record of destroying lora

kindred belfry
#

See my experiment for that

vivid python
#

Especially anythingv3

kindred belfry
#

Acertainty is basically like nai in terms of how the trained Lora performs

#

Acertainty is not a mix

vivid python
#

Also, while that looks fine, it's not exactly the use case most will use loha for

kindred belfry
#

I cannot help in that case

#

I only train model for multiple concepts at a time

vivid python
#

People will likely use loha to train one character or one style

#

So I need to have a setup that is decent in that case

kindred belfry
#

Though my task is supposedly harder so I don't see why it would not work for them if it works for me

vivid python
#

I think it's a result of you having more data actually

kindred belfry
#

Oh yeah. Probably.

vivid python
#

Because I've had very poor results with smaller concepts

kindred belfry
#

It has been a long time since I trained on a dataset of 30 images

#

The last time I did it was probably in November

vivid python
#

Perhaps I should make it very clear in my scripts that loha should only be used when you are training a bunch of concepts at once

#

I don't mean a small dataset btw

kindred belfry
#

Or sufficient images for one concept

vivid python
#

I mean a small amount of different concepts

kindred belfry
#

I don't think that matters

vivid python
#

I think it does

kindred belfry
#

On the other extreme you also have loha of blueleaf

#

Wait

#

Trained on 5 images

vivid python
#

Because with only one concept, for example, it seems to not actually fill everything

#

I had this issue when I tried to train a loha on unicorn

kindred belfry
#

Alright it was locon, then I don't know

#

What do you mean by fill everything

vivid python
#

Seems like it just has some data that doesn't really get trained

#

In the case of unicorn, her China dress would periodically be the incorrect color

#

Which was more common than I wanted

#

Which meant that her dress wasn't learned as much as it should have at the amount of steps I'm used to baking

#

This was true for all three of her outfits

kindred belfry
#

Could be.

#

I cannot say how you compromise training speed and quality with all the hyperparameters and captioning technique

#

This is too complex to investigate

#

I probably train much longer than most people anyway

vivid python
#

3k steps is usually the furthest I go

#

Unless I have a particularly large dataset

#

500+ images

kindred belfry
#

Like the one above is 40k steps with batch 8

vivid python
#

I use batch 2

#

I don't really have the vram to go higher

kindred belfry
#

So just not the same scale lol

vivid python
#

Not at all

#

So long story short

#

Loha requires a ton more training

#

Unlikely that people will use it given that it seems to be a huge increase in time spent training

kindred belfry
#

Cannot say

vivid python
#

Judging by the fact that I haven't gotten it to work at lower step count, and you have at a factor of over 10x I'm going to say it's likely this is the case

kindred belfry
#

In the end it may just depend on the habits and the use case of each user

vivid python
#

I can't see most making use of loha because of the step counts

#

As most come from training lora

#

Which usually take 20 minutes to train something decent, if you don't mind screwed up backgrounds, eyes, and hands

#

(That's the old, and honestly very bad, dim128 training)

kindred belfry
#

An interesting fact is that the picture of anisphia is probably seen like 40000 times when I trained that loha. I never know how many is enough. I just train for sufficiently long time and check if I have some good results.

#

If it's overbaked I can always use intermediate checkpoint but I never find the final checkpoint to be really unusable.

#

On the other hand for the mother of anisphia it is around 5000 times.

kindred belfry
vivid python
#

at least in the case of Unicorn, that was not the case; it had just about the same amount of bleeding, if not more, but that might have been related to the dataset

kindred belfry
#

I don't know. This is just what I mean: each person will find it more or less useful depending on their dataset and what they want to achieve.

normal charm
#

1 character
4 concepts
111 images
28-32 images for each concept

What settings would anyone recommend for that
yeah ik “it depends on the dataset” but i havent gotten it right still so i need help

#

I have so much conflicting knowledge about training shit that i cant even say ik what im doing anymore

vivid python
normal charm
vivid python
normal charm
vivid python
#

👍

worn locust
#

was it bad this whole time just cause it wasn't working right? lol

vivid python
#

network_weights was created for loading hypernets

#

so it's still pretty much not useful

normal charm
# vivid python 👍

Tried those settings, nothing changed much.
To be specific about the issue, it keeps merging the first two outfits of the character, while mostly missing the third (maybe the most complex one). Theres a swimsuit outfit also, and go figure, it manages to get that working fine for the most part, and that has fewer images to use than any of the others.

The first two outfits are similar, so its hard to differentiate them too much using the available tags. The third is completely different, but its missed the concept and ignores the activation token each time ive trained it

vivid python
#

it worked for the three outfit unicorn Loha I made, though it did have some bleeding, which I believe was more or less a tagging issue

normal charm
shut siren
#

got the swimsuit but couldnt differentiate the main and alt costumes

normal charm
shut siren
#

not really but some of the tags might have been similar

#

based on straight up booru tagging

normal charm
#

It is very annoying when you train something 10 times in a row, and it only nails the bikini each time flawlessly, is all imma say

shut siren
#

its ok

normal charm
vivid python
#

AHHHHHHHH

#

that is all

normal charm
#

I should be saying that

vivid python
#

I have a lot of work ahead of me

shut siren
#

hmm lycoris added a new algorithm as well

#

not sure of the details

normal charm
#

groan

shut siren
#

surely thats not a bad thing

normal charm
#

Maybe not
If you dont already struggle enough with the systems we do have already like me

#

Dont mind me, tho, im just a walking skill issue

vivid python
#

all I know from what I read, it's basically LoHa but different

normal charm
#

Different in that its easier to work with?

#

Right? Right?

vivid python
#

NO FUCKING CLUE

normal charm
#

The best kind of clue

vivid python
#

not like it matters anyways

#

sd-scripts just broke compatibility with it anyways

normal charm
#

That sounds terrific lol

vivid python
#

yeah, the block weight thing breaks compatibility with LyCORIS

#

granted it's up to Kohaku to update to sd-scripts

#

not the other way around

shut siren
#

it says its like 300kb files

vivid python
#

300kb lora huh? I'll let other people use it

#

I'm not even gonna try and touch it

#

Had enough trying to get loha to work

#

Don't want to tinker with this

worn locust
worn locust
kindred belfry
#

It is just that lycoris does not support blockwise learning rate for the moment. I don't know if kohaku plans to add it anytime soon.

#

As for IA3, I cannot say what its use case will be. For now it trains the same part as lora, in terms of how it trains it is similar to a mini hypernetwork, and in terms of result it is more like ti but for the style.

#

Its small size also indicates it's probably not as good as other methods in general, but if your style is simple enough it should be fine. It trains faster for the same number of steps (about half the time of loha), but it is unclear to me whether we would need more or fewer steps to get something that is ok for the user.

vivid python
vivid python
kindred belfry
#

I don't think you have anything to add in your script for ia3. The user only needs to specify it in the network argument part, so in the end you probably don't need to touch the lycoris part in your update.

shut siren
#

The documentation says it’s less transferable between models so that probably limits use cases

kindred belfry
#

In fact the block wise thing that kohya implements is also just another network argument so I am not sure if the easy learning script really needs to be modified for that.

#

And yes I guess ia3 at its current stage may just become an argument that no one uses lol

vivid python
kindred belfry
#

I see. I never used the popup version because I only work remotely

vivid python
#

a majority of my users use the popups

#

so I need to make sure it is possible to use with them

quiet notch
#

I only use popups to generate the configure file, then I just use the configure file

vivid python
#

so do I, but that's still using the popups lel

normal charm
#

Should i update ur script? Havent done it in awhile @vivid python

vivid python
#

you can, lots of smaller updates happened probably

#

soon there will be the update for sd-scripts

#

which introduced block weight training

#

but I need to set up some stuff for it

#

namely, a proper popup for it

normal charm
#

Ill just wait til updating is a must

#

I dont wanna suddenly throw something into disarray

#

Like last time deadsmile

cold orchid
#

I just backup argslist for my own brain dead reasons, and just edit that for each

quiet notch
#

i dont even know what argslist even is

vivid python
#

It's the python file that has all the args

vivid python
#

Finally got around to adding all of the block weight training stuff that kohya introduced

#

It took way too long to make sure everything was working as expected

#

but anyways

#

it's done

#

people can update through the update.bat if they already have it

#

or use the v6 installer

#
GitHub

A set of two training scripts written in python for use in Kohya's SD-Scripts repository. - GitHub - derrian-distro/LoRA_Easy_Training_Scripts: A set of two training scripts written in pyth...

GitHub

Complete re-write of the installer to be a python script. does everything the previous installers did as well as allows installation of torch 2.0.0 or 2.1.0 as well as triton for those versions. Ad...

shut siren
#

theres a LoKr module in lycoris now

vivid python
#

yeah

#

seems like kohaku suggests using D-Adaption for it because it's a bit finicky to train

normal charm
#

Until theres another game breaking way to train stuff i aint changing shit

#

That aside, i barely even understand what block merging is, even having looked at that one rentry page for it

vivid python
#

uh...

#

this is that game breaking way

normal charm
#

Awesome

#

More struggling

#

Love to hear it

vivid python
#

block weight training will allow you a ton of control

#

... or you could completely ignore it and continue like you have

#

this update doesn't change the ability to train like before

#

just adds a new way

normal charm
#

I could ignore it
But if i start seeing everyone switching up and talking about “oh yeah i used block merging for this” then fomo will get me

vivid python
#

oof

normal charm
#

As i am not immune to that

vivid python
#

I can't say many will

#

its very complex

#

like very complex

#

like 125 values complex

normal charm
#

That would explain how i still didnt really understand it

vivid python
#

well, technically its only 25

#

but you can set the weight, dims and alphas

#

per layer

#

which means 25 per, or 125 different possible inputs
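A rough sketch of where the 125 comes from, assuming the SD1.x U-Net is split into 12 down, 1 mid and 12 up blocks. The key names below are my recollection of the sd-scripts block weight options and should be checked against its docs; the values are placeholders.

```python
NUM_BLOCKS = 25  # assumed split: 12 down + 1 mid + 12 up

def csv(values) -> str:
    """Join values into the comma-separated form used for list-style arguments."""
    return ",".join(str(v) for v in values)

# Hypothetical values; key names are assumed, not confirmed by this chat
network_args = {
    "down_lr_weight": csv([1.0] * 12),
    "mid_lr_weight": "1.0",
    "up_lr_weight": csv([1.0] * 12),
    "block_dims": csv([16] * NUM_BLOCKS),
    "block_alphas": csv([8] * NUM_BLOCKS),
    "conv_block_dims": csv([8] * NUM_BLOCKS),
    "conv_block_alphas": csv([1] * NUM_BLOCKS),
}

# 25 lr weights, plus dims and alphas for both linear and conv layers
total_inputs = 12 + 1 + 12 + 4 * NUM_BLOCKS
print(total_inputs)
```

Leaving all of these out should keep the old behaviour of one dim and one alpha for every block.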

#

anyways

#

sleep time for me

ancient lynx
#

Now we just need auto mbw for lora training troll_handsome

cyan orbit
#

Weighted captions are apparently a thing now, kohya just added them

shut siren
#

Yeah one of the guys in unstable was working on it for a while

vivid python
#

I'll add it once it's out of dev branch, I'm a bit burnt out after this update

#

So I'm gonna step away from it for a few days

worn locust
vivid python
#

well, there are presets for the weights

vivid python
#

but literally only that

worn locust
#

GOD damn

normal charm
#

Too much tech for brein

normal charm
#

So no new args besides the huggingface ones?

vivid python
#

And all of the stuff for block weights

#

But that goes into the network_args

shut siren
#

ill let someone test out the block weights lol

normal charm
#

@vivid python eh?

#

i used the update file, but ig it didnt fully work?