#LoRA_Easy_Training_Scripts
More precise?
and actually, generating images at a different resolution is a totally different task
so you should use the resolution that your base model was trained at
if your base model is something merged and you want to train a lora on it
I would recommend disabling the conv layers with conv_dim=0
I'll throw in some parameters that I used for training a locon style lora of tab_head, just to give another setup.
4/1 linear
8/1 conv
5e-4
1200 steps
768x
Cosine
Warmup ratio of 0.05
Batch size 2
Got pretty much the entire style.
Clyde censored my hand images 
I always train with nai, do i need to change it for that too?
you basically only need to change the locon/loha settings
res and the rest are ok

Hands be too sexy man
Interesting, I haven’t had issues with using high alpha in my training runs
High alpha meaning alpha=dim or basically no scaling
This does ofc vary across datasets and styles, needless to say. Im guessing this is best used as a reference?
Reference yes, each dataset is different
When I tried low alpha in the past, it required giga cranking up the LR to learn anything
literally had to make imgur album because I couldn't post them https://imgur.com/a/ZUf57UA
All the math i learned in school and shit still make me feel smooth brained
But seems like training at 16/8 or 8/16 then resizing using the dynamic resizing might be better
Thanks us education system
Those are dims btw
Lel
Hands too sexy for imgur

Dynamic resizing?
Yeah
The lora resize script can resize locon
And it has a few dynamic resizing modes
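Assuming the resize script mentioned here is sd-scripts' `networks/resize_lora.py`, its dynamic modes (`--dynamic_method` set to `sv_ratio`, `sv_fro` or `sv_cumulative`, tuned by `--dynamic_param`) pick a rank per layer from that layer's singular values instead of using one fixed rank. The sketch below illustrates the three selection rules on a hypothetical list of singular values. It is a simplified reimplementation of the idea, not the script's own code, and the script's exact thresholds may differ.

```python
import math

def rank_sv_ratio(svals, ratio):
    """Keep singular values within a factor `ratio` of the largest one."""
    threshold = svals[0] / ratio
    return max(1, sum(1 for s in svals if s >= threshold))

def rank_sv_cumulative(svals, target):
    """Smallest rank whose singular values sum to at least `target` of the total."""
    total = sum(svals)
    running = 0.0
    for i, s in enumerate(svals, start=1):
        running += s
        if running / total >= target:
            return i
    return len(svals)

def rank_sv_fro(svals, target):
    """Smallest rank that retains at least `target` of the Frobenius norm."""
    total = math.sqrt(sum(s * s for s in svals))
    running = 0.0
    for i, s in enumerate(svals, start=1):
        running += s * s
        if math.sqrt(running) / total >= target:
            return i
    return len(svals)

# Hypothetical singular values of one dim-16 layer, sorted in descending order.
svals = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25] + [0.1] * 10
print(rank_sv_ratio(svals, 10))        # 4
print(rank_sv_cumulative(svals, 0.9))  # 5
print(rank_sv_fro(svals, 0.99))        # 3
```

A layer whose energy is concentrated in a few directions gets a small rank, while a "flatter" layer keeps more. That is why a dynamically resized file can be much smaller than a 128-dim original without a uniform cut.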
I've generally had success at low alphas
End up having to use giga learning rate to compensate
Almost all of my lora are either dim16 with an alpha of 8, or dim 8 with an alpha of 1
Wait
I shouldve asked this earlier
But are we talking about the base dim/alpha or just the dim/alpha for lycoris training
Both
what warmup ratio you use?
Back when sdscripts first implemented alpha and the default became 1, I was wondering why none of my loras were learning anything lmao
This one is about lora specifically though
Thats where i get further confused, so when is it necessary to raise the dim past 16?
Generally... never
Some styles may need more
it's ratio in derrian's argslist 
Which is why I set default to 32
but probably could calculate it for 100 steps
yeah
Yeah, sd-scripts doesn't have ratio
Just steps directly
I’m dying from seeing loras I want to try out on civitai and then they’re like 256dim or something

Resize them down
We have the means
Oh another thing
Would lycoris models have anything to do with why additional networks extension isnt loading them for me
What settings do you prefer for dynamic resize?
Yeah, provided it was a loha it doesn't load at all
"It depends"
It was
You need to use the locon webui extension to load loha
And you can only load it using the built in method
Not additional networks
I…thought i already installed it?
Loha weren't integrated into additional networks though
Only the webui way, you know, the lora:name:1 thing
actually you can use 4090 with bs12~16 on 640*640
with gradient checkpoint
And it's up to date?
I just updated an hour ago
I "only" have 3090, but probably only thing that matters is VRAM
Then it should load loha fine, provided you use the prompt box for it
Can it support loha?
do yall adjust learning rate for batch size?

I never change my batch size from 2
Not for batch size, not me
So not really
I used to do batch size 3, but since using locon or loha i had to lower it to 2
what buckets do you use?
im a batch size 1 user lol
I remember showing Salt my batch size when making softprompt for LLM, and he thought it was for WD 
gonna try dim 8/4 with 1 alpha for both
and see what happens
considering i was alr using lr 4e-4 with alpha=dim this might be undertrained
batch size 400 and 64 grad steps 
is there a way to tell if your lora/locon underflows with high alpha?
XD
no
but you can check the weight of lora_up
for loha I need to check(
cuz my results have always been usable with alpha=dim
but i dont know if there is any problem with it lol
and sometimes
my friend actually ran into something like:
"All zero in the weight"
super underflow
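One way to make "underflow" concrete: if weights are computed or saved in fp16, the smallest positive (subnormal) fp16 value is about 6e-8, so anything much smaller rounds to exactly zero. The sketch below uses only the standard library (`struct`'s `'e'` half-precision format) to round-trip values through fp16 and report the share of exact zeros in each `lora_up` tensor. The tensor names and values are hypothetical. For a real file you would first load it (for example with `safetensors.torch.load_file`) and flatten each tensor into a list.

```python
import struct

def to_fp16(x):
    """Round-trip a Python float through IEEE half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def zero_fraction(values):
    """Share of entries that are exactly 0.0 after fp16 rounding."""
    rounded = [to_fp16(v) for v in values]
    return sum(1 for v in rounded if v == 0.0) / len(rounded)

def report_underflow(state):
    """Return {name: zero fraction} for every tensor whose name contains 'lora_up'."""
    return {name: zero_fraction(vals) for name, vals in state.items() if "lora_up" in name}

# Hypothetical flattened weights: one healthy layer, one collapsing toward zero.
state = {
    "lora_unet_attn1_to_q.lora_up.weight": [1e-3, -2e-4, 5e-4, 3e-3],
    "lora_unet_attn1_to_k.lora_up.weight": [1e-8, -2e-9, 3e-4, 0.0],
    "lora_unet_attn1_to_q.lora_down.weight": [1e-9, 1e-9],
}
print(report_underflow(state))
# {'lora_unet_attn1_to_q.lora_up.weight': 0.0, 'lora_unet_attn1_to_k.lora_up.weight': 0.75}
```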
I never touch weight decay
Idk if i should
i nudged it up to 0.1
Thats the default no?
You should use weight decay if you are using a small dataset
default is 0.01 iirc
and actually, fewer than 10k images counts as a small dataset
(some people will say <100K is small BTW XD)
me when my subject has 12 images
10k? Images?
some of my subject datasets are like 20-30 lol
I have also trained a locon with 5 images total

ive been using random crop
Who the hell has a more than 10K dataset
to try to get a pseudo larger dataset lol
me
me

umamusume 60k dataset
after using repeats to balance the number of images per character
the size became 350k
Beefy ass pcs u got
@humble axle I'm about to test your settings but I don't think batch size 8 is good for 38 images 
is there any drawback to high alpha besides the risk of underflowing?
I've used batch size 8 for 22 img (after aug) before
just add 10x repeat on it XD
higher grad, more unstability?
I don't know
just a scale for output
0.1 is the default on my scripts, because I found it worked better than the normal default
normal default is 0.01 iirc
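For reference, PyTorch's `torch.optim.AdamW` does default to `weight_decay=0.01`. AdamW applies decay "decoupled" from the gradient: every step first shrinks the weight by `lr * weight_decay * w`, then applies the Adam update. The scalar sketch below (pure Python, one parameter, illustrative numbers borrowed from this thread: 5e-4 LR, 1200 steps) follows that published update rule. It is not any trainer's exact code, and it shows only how hard 0.1 versus 0.01 pulls a weight toward zero when the gradient is zero.

```python
import math

def adamw_step(w, grad, m, v, t, lr=5e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter; returns (w, m, v)."""
    b1, b2 = betas
    w = w - lr * weight_decay * w           # decoupled weight decay
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def decay_only(w, steps, weight_decay, lr=5e-4):
    """Weight after `steps` AdamW steps with zero gradient (only the decay term acts)."""
    m = v = 0.0
    for t in range(1, steps + 1):
        w, m, v = adamw_step(w, 0.0, m, v, t, lr=lr, weight_decay=weight_decay)
    return w

print(decay_only(1.0, 1200, 0.01))  # about 0.994
print(decay_only(1.0, 1200, 0.1))   # about 0.942
```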
Noted
how many steps do you do for small datasets?
lower than 1k but I forgot
maybe 300~600
or if you can read the metadata from pt file(
I, in fact, know how to read
good
Yog Sothoth Trained with LoRA and LoCon, Detail for locon: KohakuBlueleaf/LoCon: LoRA for convolution network ( github.com ) Extension for using lo...
here is it
yes, being able to read is good
trained with 5 img (22 after aug)
and got super good results
I also use this as a style model BTW
the light/shadow is good for me
yea, I do that too for less than 30 images or even more
alpha=1 seems kinda undertrained if i keep the rest of the settings the same
not getting the outfit details as precisely
try a little bit higher lr
im already at 4e-4 which seems kinda high
not
with alpha=1 and cosine scheduler you can just push to 8e-4 (
basically depends on your dataset
any change to the text encoder lr?
just use both 4e-4
than unet
just use both 4e-4
you think maybe the TE is undertrained?
I still don't know what TE does in single subject LoRAs 
I only noticed it when I did like 8 characters
ive been using a lower TE lr than unet just because those were the defaults ive seen
When I had to increase TE
not because of any theoretical reason
you just used the same for both?
i think the thought is the TE doesnt need as much training for this
when you are doing some f/t like this
actually te learns more than unet
undertrained is almost always = te undertrained
i heard TE can blow up if the LR is too high
yeah it will
but you have alpha=1
no need to afraid that
alpha=1 for dim=8 means 1/8 grad(
(BTW actually everything will blow up if LR is too high)
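A minimal sketch of why alpha=1 at dim=8 means "1/8 grad": in the usual LoRA formulation the branch output is `scale * up(down(x))` with `scale = alpha / dim`, so the gradient reaching `up` (and `down`) is multiplied by the same factor. The pure-Python example uses one scalar input and a hand-derived gradient (no autograd). The values are illustrative assumptions, and `up` starts at zero because that is the common LoRA initialization.

```python
def lora_forward(x, down, up, alpha):
    """Rank-`dim` LoRA branch on a scalar input: scale * (up . (down * x))."""
    dim = len(down)
    scale = alpha / dim
    hidden = [d * x for d in down]                   # down projection
    return scale * sum(u * h for u, h in zip(up, hidden)), scale

def grad_up(x, down, up, alpha, target):
    """Gradient of 0.5 * (y - target)^2 with respect to each up weight, by hand."""
    y, scale = lora_forward(x, down, up, alpha)
    return [(y - target) * scale * d * x for d in down]

dim = 8
down = [0.1 * (i + 1) for i in range(dim)]    # illustrative values
up = [0.0] * dim                              # up starts at zero
g_full = grad_up(1.0, down, up, alpha=8, target=1.0)   # alpha = dim -> scale 1
g_low = grad_up(1.0, down, up, alpha=1, target=1.0)    # alpha = 1   -> scale 1/8
print([round(a / b, 6) for a, b in zip(g_low, g_full)])  # every ratio is 0.125
```

One caveat: Adam-style optimizers normalize away most of a constant gradient factor, so per-step weight changes stay similar. What remains is the forward multiplication by `scale`: the same weight change moves the output only 1/8 as much. That is consistent with the advice in this thread that low alpha needs a higher LR.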
the old colab i used months ago for DB did this method where it would train just the TE first, and then freeze the TE and train unet after that
(UNet is not more stable, it's just that in the past you only trained the transformer blocks in it)
Oh I cannot say UNet is not more stable
It is more stable (In some other experiments)
but it will blow up
ummm
between make sense and not make sense(
major differences between adamw and adamw8bit?
8bit is way smaller
is there a difference in speed or quality?
speed... not that significant
quality is also not that significant
trained with adamw8bit
ive been avoiding increasing batch size because my datasets are giga small
also because it sometimes gives weird step numbers
if an aspect ratio bucket has a # of images that isnt a multiple of batch size
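Why batch size gives "weird" step counts with aspect-ratio buckets: to my understanding, each batch in sd-scripts is drawn from a single bucket, so every bucket contributes `ceil(images / batch_size)` steps on its own and leftovers are not combined across buckets. A sketch under that assumption, with hypothetical bucket resolutions and counts:

```python
import math

def steps_per_epoch(bucket_counts, batch_size):
    """Steps for one pass when every batch must come from a single bucket."""
    return sum(math.ceil(n / batch_size) for n in bucket_counts.values())

# Hypothetical 30-image dataset split across three (width, height) buckets.
buckets = {(768, 768): 17, (640, 896): 9, (896, 640): 4}
for bs in (1, 2, 3, 4):
    naive = math.ceil(sum(buckets.values()) / bs)    # as if there were no buckets
    print(bs, naive, steps_per_epoch(buckets, bs))
# 1 30 30
# 2 15 16
# 3 10 11
# 4 8 9
```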
oh right
batch size 16k lets go
the step number...
bleh the results from my unet/te both 4e-4 run still seem worse than my old alpha=dim locon
use higher lr
maybe your dataset need higher lr
I use 5e-4 for dataset with trigger word only(?
(remove all tags about the trigger word)
If you use tag/caption + trigger word
you will need higher lr
oh yeah this is trigger word + tags
Thats all my datasets ever are
trigger word + relative tags need higher lr
when i tried trigger word only the first time i trained lora, it exploded
even at 1e-4 lol
Base lr or te
all
te is more important than unet(
(Just consider TI/HYN, which actually modify things for the TE)
(And anime diffusion also changes only the cond layer and gets a totally different style)
oh wait
HYN is for unet transformer
I'm wrong sorry
(or something also for TE?)
The only lr ive ever seen anyone use for te is 5e-5
Or 1e-5 (mightve been unet, unsure)
Most times no one has it set
maybe i just go back to what i was doing before at alpha=dim, since i found LR that works for that lol
if you are always using the same dim
consider always using that same alpha
a fixed alpha ratio should not work(
the alpha itself matters more than the alpha relative to the dim?
and yeah i basically always use the same dim size
8 linear 4 conv for locon
4/2 for loha
yes
ratio = scale
but fixed scale means you need to tune lr for different dim(size)
if you fixed alpha
means your ratio is related to your size
which is good for NN
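The two conventions being contrasted, as plain arithmetic: with a fixed ratio (for example alpha = dim/2) the scale `alpha / dim` never changes, while with a fixed alpha the scale shrinks as dim grows, so the LR that works at one dim no longer matches another. The dims are illustrative; the last line compares 128/128 with 128/1.

```python
def lora_scale(dim, alpha):
    """LoRA output multiplier: alpha / dim."""
    return alpha / dim

for dim in (4, 8, 16, 32, 128):
    fixed_alpha = lora_scale(dim, 1)         # alpha pinned at 1
    fixed_ratio = lora_scale(dim, dim / 2)   # alpha always half of dim
    print(f"dim={dim:>3}  alpha=1 -> {fixed_alpha:.4f}   alpha=dim/2 -> {fixed_ratio:.2f}")

print(lora_scale(128, 128), lora_scale(128, 1))  # 1.0 0.0078125
```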
the loras i have on civitai are 128dim/128alpha lol
LOL
since i know the old behavior of sdscripts before 0.4.0 was basically alpha=dim
cuz there was no scaling
hmmm interesting that the details im getting on this character's thighhighs are wayyy better when the character is standing than sitting lol
i guess its just somewhat unreliable with the sitting pose
XD
makes sense
8e-4 text encoder makes me uneasy lol
XD
thats more than an order of magnitude higher than my previous
Trying another training using scraps of everything we just talked about
I wont say i know what im doing now, but at least i know…somewhat more than before?
i feel like ppl use widely different settings and still manage to arrive at a usable result
Well yes civitai is proof of that
Its more a matter of efficiency than usability
Well
im just trying to figure out if there are better settings than what im using that will more consistently give a usable result
like my old 128dim/128alpha loras
were usable
i made some pretty cool gens with them
just remember
all the settings depend on your dataset, your task, your model... it depends on everything
but now im a low dim believer
I'm 1dim believer (X
save disk space
Theres an extension that helps view the details of other models people have trained (not additional net)
theres some 256dim users on civitai
I also received an issue about a LoHA blowing up
and that guy just used
384dim loha
I don't know why
surely more dim = better result
higher than the dim of some layer in the UNet

so we should use 100B stable diffusion(O
Highest ive ever gone was 128
I was under the impression low dims were for characters and higher were for styles
I actually use 4dim for style XD
but since I use 1dim for character
so higher for style may be correct?
damn 1dim
i mean i guess ppl make TIs for characters and those are even smaller than 1dim loras
Yeah
I figured higher dim did something like account for more of the details in the image or something
Hence better for styles
That would be true except everyones switched to loras now
Or at least as far as ive seen
you use 1dim for linear and conv?
I didn’t know what to set it to at first, so i tried keeping them close to the base alpha and dim i had set
Like if i had 8/16 locon would be 9/18 or something
you were using alpha higher than dim?
Rarely
I only ever kept the alpha to between 8 and 16, and MAYBE 32
On occasions where things werent working
damn its competent when the char is standing, but cant do this when the char is sitting lol
Such perfect ai symmetry makes me cry of joy
That character has nice thighhighs
What's her name?
If you can get good result with normal lora
Just continue using it
If you get bad result with all the things
You need to adjust your dataset
And I need to sleep sry(


I expanded some of my datasets from 20 to 30 images, but I don’t feel that it’s necessarily an improvement for all of them
Personally, I would consider staying on the old versions of both scripts on Linux
Wasn't a good idea updating both to latest version
I too had the same bad accuracy problem as his
5th miss
But this time, it retained more details after i changed the base alpha from 1 to 4
All the others had it at one
So so far alpha=1 aint lookin good
Does that need higher steps or lr?
long story short, setting alpha to 1 only really works at low dims
above 8, set it to half the dim
Low alpha needs higher LR
When they introduced alpha and it dropped my 128/128 loras (at the time) to 128/1, it essentially learned nothing on the same settings

trying to use gradient checkpointing
no way this is working, I have batch size 8 at 768 res
higher batch size isn't necessarily better, not at our scale
my friend asked me to try to train with a single image - and its a fkin discord emoji
pretty sure this will result in absolute junk
i'd be surprised if it works
it made some absolute junk but we got a few laughs
doesnt help that the emoji is 64x64
Two additional misses in the past few hours since last update
8th attempt in progress
8th attempt ended up retaining the most details with these parameters:
4/8 linear
8/1 conv
5e-4 lr
0.05 warmup
1200 steps
This is when trained as a locon
The outputs still fuck up body anatomy fairly bad, but it showed signs of getting the concept right, so this is next checkpoint i think
Next ill try doing loha and see how that fares
@humble axle tested your settings with higher/lower TE/LE and 512/768 and they work pretty well, will have to make LoRA to compare the results but at least they work, so thanks for the help


Tried training my 9th attempt with the same settings as my 8th, but as a loha this time. Result was another complete miss
Loha needs more focus or something on it right?
Loha usually take more to train, because they compress their dims
Does that equate to setting it for more steps then? Ive been doing 1500 so far to keep it safe
Or higher lr
1500 is definitely too low
How much then
For loha, either a high lr or like 3k steps
The way I train in general is start at 1e-3 for 800 steps
Figure out if it's good enough, if not then figure out what needs to be changed
If it's baked but didn't learn anything, lower lr increase steps
If it learned most things but not everything and isn't baked, increase steps a bit
If it learned nothing and wasn't baked, increase lr
Though I never had to do that one
I usually end up with something like 5e-4 for 1600 steps
Also. I generally keep TE at 1e-4 in pretty much every case
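The bake-and-adjust loop described above can be written as a small decision function. The rules come from the messages; the multipliers (halve or double the LR, add 25% or 50% more steps) are arbitrary placeholders chosen for illustration, not values anyone here recommended. The final fallback branch is also an assumption, since the messages don't cover that case.

```python
def next_attempt(lr, steps, baked, learned):
    """Suggest (lr, steps) for the next bake.

    baked:   True if the outputs look overcooked/fried.
    learned: "nothing", "most" or "all" for how much of the concept came through.
    All multipliers are hypothetical placeholders.
    """
    if learned == "all" and not baked:
        return lr, steps                    # good enough, keep it
    if baked and learned == "nothing":
        return lr / 2, int(steps * 1.5)     # lower LR, increase steps
    if learned == "most" and not baked:
        return lr, int(steps * 1.25)        # increase steps a bit
    if learned == "nothing" and not baked:
        return lr * 2, steps                # increase LR
    return lr / 2, steps                    # baked but learned something (assumption)

# Starting point from the messages above: 1e-3 for 800 steps.
print(next_attempt(1e-3, 800, baked=True, learned="nothing"))  # (0.0005, 1200)
```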
I just realized your name is 5e-4, that's perfect
That's solid advice
I learned it from a dude with a masters in datascience
I picked his brain a lot
At least I'm pretty sure he said masters
It was a few months ago at this point
This is what i had to fall back to and that didnt help, so im thinking the issue is something else
@vivid python im the only one who frequents this place bc of endless questions and issues 
anyway, i updated to v6
That's the first time I've seen the venv creation just straight up fail
I literally just updated my install of sd-scripts to torch 2.1.0 as well
So I know it exists
The v6 installer is meant to be in its own folder
Because it installs everything of course
But the torch_update.bat is just supposed to nuke the venv, then reinstall everything with the new torch
I did do this, on desktop
Had to download and install an older python version tho, but i digress, i suppose
So how do i fix the venv issue?
yo, is anybody training locon/lyco with a local LoRA_Easy_Training_Scripts repo? how can i tell what exactly is being trained right now? and if anyone can, please send me a configuration file for analysis, thanks in advance
lycoris with lora algo is locon
@vivid python

Welp
Guess im disabled now
Cant reinstall v5 either
Same issue
When did the program suddenly decide i dont have xformers installed

@vivid python idk what happened, but as stated above, i cant seem to reinstall v5 either
It all seems tied to a torch error
v5 shouldn't have any issues with installation
the only difference in v6 is the option for the new torch version
and proper checks on python version
Running the kohya v4 installer also gives the same error about not finding a satisfactory requirement
Guess i can delete that then
But that still doesnt explain the v6 and v5 issue
it doesn't you're right
but it's likely an issue with your computer
rather than my scripts
as I haven't had any other issues with it, nor any other reports of issues
Yep
I have git and 3.10.6 installed
how do you have 3.10.6 installed?
Wdym by how
you can get it either through the app store or from the website
the app store version is dogshit
Oh no i went directly to the site
I checked the add to path option in the installer im certain
And before this version, i actually had 10.9 or something installed, with which training sessions never encountered issues.
is it still installed?
I only downloaded 10.6 after this error started appearing
I assumed installing 10.6 would overwrite any and all 10.9 stuff
not at all
Should I uninstall and reinstall then?
just uninstall python 3.10.9
Yeah theres actually 3 different python versions here now
Im not sure how it was working before
Wow. That didn’t fix it ._.
might be better to just uninstall all python versions then reinstall then
Managed to collect torch, so far so good
I don't know how your python got fucked up, but I'm also not surprised
python is honestly really shitty
Its a miracle (though i was unaware), that it even worked in the past at all
almost had it
what does it say above?
Just a repeat of the same distribution line
looks like diffusers is throwing a fit
Do i use one of the update bats?
I would like to state for the record that I do try what I can to fix stuff before I go pinging you, which is why ur not getting a ping every 3-5 minutes @vivid python
Oh I get it, I'm not annoyed or anything, I just don't have notifications on
That being said
Try running torch_update.bat
And installing 1.12.1
The original torch and see if it runs
it did
just keep using it with 1.12.1 then?
Yeah, seems like torch 2 isn't working for you which is odd
I haven't had somebody have issues with it
Just remember me as the guy who is problem prone

guess so
and it sucks too
because torch 2.1.0 made my bake times go from 2.5 hours to 1.5 hours
and i'm on a 3060
3070
Ive heard these arent the best with training
they are alright, less vram than me
Mhm
Maybe i shouldnt be too surprised
I was trying to get this trainer up and running, but I was only able to get it working on torch 1.12.1, not on >2.0. I have a GTX1070, so I'm just randomly assuming it's not supported.
Oh, I remember my issue now
It was something about bitsandbytes saying that there was no kernel available for execution for my device
That said, I don't know if I can ever actually run with torch >2.0 
that speedup sounds nice
it looks like there may be a solution for me to try in here...
nope, still wasn't able to get it up and running
There's that dreaded RuntimeError: CUDA error: no kernel image is available for execution on the device
Running 1070 on win10
try turning off 8bit adam
or errr
using regular adamw instead of adamw8bit
i know adamw8bit isnt compatible with GTX 1080 and will throw that error
@quiet notch
does it run?
nope
same issue
switched over as you said
i noticed that from earlier messages, the easy_training folder is supposed to have a venv?
i'm not running in venv (didn't do an earlier step correctly perhaps?), could that be part of the issue?
@shut siren i might have to sleep soon so we can resume this some other time (unless you want me to troubleshoot rn fast)
i am using loha settings, btw
well "no kernel image is available for execution on your device" implies that something is incompatible with your hardware
have you tried using base sdscripts
i have yet to touch it directly
it may be torch 2.0 as you mentioned before?
i know that bitsandbytes doesnt like GTX 1080
but you shouldnt need bitsandbytes unless youre trying to run adamw8bit
so theres definitely something else
i had someone in another discord also run into that kernel image error, but it specifically indicated bitsandbytes, and he was using sdscripts with the bmaltais wrapper on GTX 1080 - fixed immediately when he switched to regular adamw

you got things working on older torch version?
yeah, it works on 1.12.1
it might just be that torch 2.0 doesnt like your hardware
Do you know if gradient acc steps is properly supported? I was getting this warning...
None of the inputs have requires_grad=True. Gradients will be None
not too sure if this is specific to the easy training scripts or lora in general
Im pretty sure 2.0 didnt work for me either, and im on 3070
I was just instructed to use 1.12
10 series cards require a specific patch; I'm gonna assume you applied it. I had believed that it should just work, but perhaps the main.py file changed for torch 2.1.0, which actually makes sense.
Perhaps it might actually be that the dlls need to be updated to support torch 2.1.0, which means, likely, that until a new version gets built, 10 series cards cannot use torch 2
I did hear something about building from scratch
Im surprised i never asked this question before but if epoch count is ignored when max steps is set, is the number of repeats also ignored?
From what I understand, number of repeats merely multiplies your inputs. It shouldn't have any effect on what you set your max steps to be (assuming max steps is the point at which the model is forcefully stopped)
Speaking of multiplying inputs, I noticed that when the repeat config multiplies the input, the "epoch" then becomes based on that multiplied dataset rather than the original.
if I understood my readings correctly, epoch is when the AI looks over your images once and has updated parameters. But with the repeat variable, when you look over all of your data once * repeat, it only counts as 1 epoch instead of what the repeat is.
i'm even more confused when i take into account gradient acc steps and the config parameter epoch, because the shown epoch during training can be higher than what you originally intended for it to be, and then on top of that, since input is multiplied, the real epoch is actually the shown epoch multiplied by repeats?

so to properly answer this question, i dont think it should because num repeats doesn't affect the point at which you stop, assuming max steps is the training cutoff
That was my understanding of max steps at least
So what it sounds like is it just affects how many images are processed during training
However long its specified to go
though...
assuming max steps refers to optimization steps (the step in which parameters are updated)
hm
nevermind
i was thinking that if a "step" referred to an image training iteration, then if you set repeats to like 10 and max steps to 10, then you would only go through 1 image out of your entire dataset
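A sketch of how these pieces relate, assuming kohya-style counting: repeats multiply the images seen per epoch, batch size (and gradient accumulation) turn that into optimizer steps, and `max_train_steps`, when set, is the hard stop, so the epoch count is derived from it rather than the other way round. A "step" here is one optimizer update covering a whole batch, not one image. Bucketing is ignored. The first two calls use illustrative numbers; the last uses the 41-image, 5-repeat, batch-6, 12-epoch setup posted in this channel.

```python
import math

def training_plan(num_images, repeats, batch_size, grad_accum=1, epochs=None, max_train_steps=None):
    """Return (images per epoch, optimizer steps per epoch, epochs, total steps)."""
    images_per_epoch = num_images * repeats                  # repeats enlarge each epoch
    batches_per_epoch = math.ceil(images_per_epoch / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    if max_train_steps is not None:                          # max steps overrides epochs
        epochs = math.ceil(max_train_steps / steps_per_epoch)
        total = max_train_steps
    else:
        total = steps_per_epoch * epochs
    return images_per_epoch, steps_per_epoch, epochs, total

print(training_plan(20, repeats=1, batch_size=2, max_train_steps=1200))   # (20, 10, 120, 1200)
print(training_plan(20, repeats=10, batch_size=2, max_train_steps=1200))  # (200, 100, 12, 1200)
print(training_plan(41, repeats=5, batch_size=6, epochs=12))              # (205, 35, 12, 420)
```

With max steps set, raising repeats from 1 to 10 leaves the total at 1200 steps and only makes each "epoch" ten times longer, so fewer epochs are shown.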
okay so ik for a FACT this sample folder did not exist before, despite the date modified column saying its been here since a few weeks ago
i have really really wanted an output folder for samples so i could better track the training progress, but from what I can tell, theres no path argument or whatever that can be assigned for that
Theres the option to tell it how often to dropout samples, but apparently the output directory for the lora isnt enough for that (according to the error i get when using that function)
Theres no way im that blind bc i constantly check this folder after training things, downloading things, etc
No way i wouldnt have noticed this, but if it can actually be done then where do i specify where it drops the samples?
you just got mandela effect'd by derrian
nani
I don't think there is an option to set where samples go
at least, there isn't an arg for it
yeah, I don't see one
it is supposed to output to the outputs folder
I can only see the problem being that there wasn't a proper txt file being pointed to for samples
but dunno
anyways, gonna sleep now
if you still have questions
I'll answer in the morning
The samples i found in that folder were from a lora i trained awhile back
And i have only ever trained loras using ur script, at least, as far as i can remember
Which makes this all the more confusing
Since any time i turn the samples arg on, it just gives me path errors during training
Not sure then, there's no way to change the folder for samples, but the path error can be it looking for the txt file and not able to find it
Wait what txt file
The text file that has all your prompts
It doesn't just pull random prompts from the dataset
U mean the sample prompt txt?
Is that what it needs? I do remember i used it for that lora
@vivid python
Yep, it's required
I see. Also, ‘nother quick question: how high do u reckon my lr should be if i train at, say, 5 dim?
Depends on the dataset
I usually follow
The idea of 1e-3 to see how it bakes really quickly
Then adjust based on those outputs
Dim doesn't affect lr a ton, from what I know
Ok, thank u
so i use a 41-pic dataset with 5 repeats and these settings for lora training (local LoRA_Easy_Training_Scripts):
self.optimizer_type: str = "AdamW8bit"
self.scheduler: str = "cosine_with_restarts"
self.cosine_restarts: Union[int, None] = 3
self.learning_rate: Union[float, None] = 1e-4
self.unet_lr: Union[float, None] = 1e-4
self.text_encoder_lr: Union[float, None] = 5e-5
self.net_dim: int = 128
self.alpha: float = 128
self.train_resolution: int = 768
self.batch_size: int = 6
self.num_epochs: int = 12
then i tried training LyCORIS with the same dataset and the same number of repeats, adding the following settings:
i changed self.net_dim from 128 to 16
i changed self.alpha from 128 to 1
self.lyco: bool = True
self.locon_dim: Union[int, None] = 8
self.locon_alpha: Union[int, None] = 1
self.locon: bool = True
added:
self.network_args: Union[dict[str, str], None] = {
    "algo": "lora",
    "conv_dim": "8",
    "conv_alpha": "1",
    "disable_conv_cp": "True",
}
as a result, i get output images worse than the lora. am i dumb?
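For comparison with plain sd-scripts: LyCORIS is loaded there with `--network_module lycoris.kohya`, and each `network_args` entry is passed to `--network_args` as a `key=value` string. The helper below is hypothetical (not part of either project) and simply renders the dict from the config above in that form. `--network_dim`/`--network_alpha` are standard sd-scripts flags, and the 16/1 values are the ones from that config, not a recommendation.

```python
import shlex

def network_args_to_cli(network_args):
    """Render a network_args dict as sd-scripts style `key=value` arguments (hypothetical helper)."""
    return ["--network_args"] + [f"{k}={v}" for k, v in network_args.items()]

network_args = {
    "algo": "lora",
    "conv_dim": "8",
    "conv_alpha": "1",
    "disable_conv_cp": "True",
}
cmd = ["accelerate", "launch", "train_network.py",
       "--network_module", "lycoris.kohya",
       "--network_dim", "16", "--network_alpha", "1",
       *network_args_to_cli(network_args)]
print(shlex.join(cmd))
```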
base dim 128?
for lora 128, for lycoris 16
eh, lora don't need to be dim128
also update pushed
also, 16/8 is better than 16/1
and locon is just not better for characters because of style bleed
and lower dims require higher steps
or higher lr
i've always used 128/128 for lora and i like the output results, as in the preview above
can i increase number of steps by increasing repeats of dataset? like from 5 to 20
or just don't use them
more epochs
or how else can i increase the number of steps
better that way
self.net_dim and self.alpha
do these need to change when training lycoris, or will only the conv values be used?
it will use both
self.lyco: bool = True # turn on if you want to use the new locon architecture
so, different architectures
which one is better for now
between LoRA, LoCon, and LoHa
LoHa is trash
the other two have their uses
neither one is better than the other
LoCon learn styles much better
because they train on the whole model
LoRA are better for characters, because they don't learn style easily
and you can certainly get exactly everything right about a character at dim16
just, don't do 16/1
that just doesn't work
16/8 works
8/1 works
16/1 doesn't
Sheffield has a lot of details, so i want to try to do it as accurately as possible, lora can't convey all the details
I managed to get Shun + Shun small into dim16, who has just as many small details
lora?
train better?
what do i need to change? all my settings are above
eh, all of it
high batch size lends itself to learning less often
1e-4 works well at 16/8, if you set that for all and go like 3k steps
is it good or bad
bad usually
the way batch size works is that it merges that many images into one latent
then trains on that latent
means it learns less small details usually
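For reference, in standard minibatch training (which, to my understanding, is how PyTorch-based trainers like sd-scripts work) the images in a batch are not merged into one latent. Each image keeps its own latent and loss, and the per-image losses are averaged into one gradient for a single optimizer step. What a bigger batch does change is that the same data produces fewer, smoother updates. The toy pure-Python sketch below uses a one-parameter model and illustrative data where every sample agrees on w = 2.

```python
def grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)^2 with respect to w for one sample."""
    return (w * x - y) * x

def train(data, batch_size, lr=0.1, epochs=1, w=0.0):
    """Plain minibatch SGD; returns (final w, number of optimizer steps)."""
    steps = 0
    for _ in range(epochs):
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            g = sum(grad(w, x, y) for x, y in batch) / len(batch)  # mean over the batch
            w -= lr * g                                            # one update per batch
            steps += 1
    return w, steps

data = [(1.0, 2.0), (0.5, 1.0), (2.0, 4.0), (1.5, 3.0)]
for bs in (1, 2, 4):
    w, steps = train(data, bs, epochs=5)
    print(bs, steps, round(w, 3))
```

With the LR held fixed, batch size 4 takes 5 steps over 5 epochs and ends noticeably further from w = 2 than batch size 1 with 20 steps. This is the "learns less often" effect, which is usually offset by raising the LR or the step count.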
I train at batch 2
then why i bought 3090
speed
nah
this settings for lora, right?
I've stopped giving concrete advice because people don't usually actually follow it
and how u calculate ur amount of steps?
but how it works
I use the variable to set max train steps
kohya handles the rest
I pretty much never use repeats
unless the dataset is really small
41 is small?
3 repeats probably
epochs save instantly
so I see no reason to just use more epochs
k i will try ur settings rn
you usually need to bake a bunch of attempts and tweak settings
this will be true regardless of dim
but i won't get 12 epochs in 8 minutes due to batch size change 
speed is the enemy of accuracy, I've found
anyways
not gonna be looking at this chat for the next whole day
gonna be away from computer
which scheduler do you use?
Or you probably don't know the right hyperparameter to use?
To claim something, it is probably better to show some evidence, as I have always been doing 
(Providing dataset composition, hyperparameters, trained networks, xyz grids etc.)
Usually cosine or cosine with restarts
Not worth it, I don't plan on spending the hours to train loha when locon have gotten better results
I also don't plan on getting into this
That's your choice of not switching to loha, and I understand it. In the end they are not that different, but claiming loha is trash is totally misleading.
Loha don't improve over locon
Could be. It depends on many factors. I don't claim any of them to be an improvement in the end.
The problem is how much must be done to make them work
And even then, they usually end up having a larger file size from my experience
It always works for me
Maybe for some reason it doesn't work well with your configuration
For some others they find loha to be better
I've had them work, but I've only seen them worth messing with when making something huge
im using cosine with 3 restarts, so, idk, your settings (16 dim, 8 alpha, 1e-4, 2 batch size, 3 repeats and 3k steps) dont seem to be good (the lora with your settings is on the right)
Personally I think they are on par. We really need to see the dataset, the training hyperparameters, and the results to be able to say something
Like the entirety of umamusume
Meh, you do you
I'm not gonna help

but i dont say that
Sure, you probably don't, but I've just stopped wanting to help people because of how often it does happen
That being said, I'm also phone posting, which makes it annoying to type
In general
I have literally no idea what that means in this case
it means thanks for help anyway
Dw about that with me, i already feel like enough of a burden if i ask for help more than 3 times
Your questions are very basic, doesn't usually take much time to answer
Also technical questions are usually less of an issue
It's more or less actual training questions that I struggle to find the will to help with now
If you go from dim 128 to dim 16 and don't increase the learning rate, then obviously the result will be trash
I always do 5e-4, try that
you can do 1e-4, but you need a ton of steps
yee
You’re going to need a much higher LR going from 128/128 to 16/1
i will try, thanks, you set 5e-4 for all settings? (lr, unet, text encoder)
text encoder 1e-4
So what's min_snr_gamma
xyz chart time
it adjusts for the fact that loss is inversely proportional to noise timestep
kinda normalizes it
Denoising diffusion models have been a mainstream approach for image generation, however, training these models often suffers from slow convergence. In this paper, we discovered that the slow convergence is partly due to conflicting optimization directions between timesteps. To address this issue, we treat the diffusion training as a multi-task ...
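To make "kinda normalizes it" concrete: for epsilon-prediction, Min-SNR-γ weights each timestep's loss by `min(SNR(t), γ) / SNR(t)`, where `SNR(t) = alpha_bar_t / (1 - alpha_bar_t)`. Low-noise timesteps (very high SNR) are strongly down-weighted, while noisy ones keep weight 1. The sketch assumes Stable Diffusion's "scaled_linear" beta schedule (0.00085 to 0.012 over 1000 steps) and γ = 5, the default suggested in the paper. Both are stated assumptions, and this is not sd-scripts' implementation of `--min_snr_gamma`.

```python
import math

def scaled_linear_alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta_t) for a 'scaled_linear' beta schedule."""
    s, e = math.sqrt(beta_start), math.sqrt(beta_end)
    betas = [(s + (e - s) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def min_snr_weight(alpha_bar, gamma=5.0):
    """Min-SNR-gamma loss weight for epsilon prediction: min(SNR, gamma) / SNR."""
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, gamma) / snr

alphas_cumprod = scaled_linear_alphas_cumprod()
for t in (0, 50, 200, 500, 999):
    print(t, round(min_snr_weight(alphas_cumprod[t]), 4))
```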


Anyone know how to format the negative prompt in the sample prompt txt file?
Does anyone know if the warmup_lr_ratio is based on optimization steps or epochs? Or maybe even total image iterations...?
Do your normal prompt first then, on the same line --n then your negative prompt, might actually be only one dash though
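As far as I know, sd-scripts documents the sample prompt file as one prompt per line, with options introduced by ` --`: `--n` negative prompt, `--w`/`--h` size, `--d` seed, `--l` CFG scale, `--s` steps. So it is two dashes. The parser below is a simplified illustration of that format, not the script's own parser. It only shows where the negative prompt ends up.

```python
def parse_sample_line(line):
    """Split 'prompt --n negative --w 768 ...' into a dict (simplified sketch)."""
    parts = line.strip().split(" --")
    result = {"prompt": parts[0].strip()}
    keys = {"n": "negative_prompt", "w": "width", "h": "height",
            "d": "seed", "l": "cfg_scale", "s": "steps"}
    for part in parts[1:]:
        key, _, value = part.partition(" ")
        if key in keys:
            result[keys[key]] = value.strip()
    return result

line = "1girl, thighhighs, standing --n lowres, bad anatomy --w 768 --h 768 --d 1 --l 7.5 --s 28"
print(parse_sample_line(line))
```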
It's based on whatever you use to calculate your steps, if you give a step count it will use that, if you give it epochs then it will calc the step count and use that
Basically it's the ratio of total amount of steps
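The step arithmetic behind that answer, as a sketch: with epochs, steps per epoch are roughly images times repeats divided by batch size, and the warmup length is the ratio times the total. `total_steps` and `warmup_steps` are hypothetical helpers; how the real script rounds (and how bucketing changes steps per epoch) may differ.

```python
import math


def total_steps(num_images, repeats, batch_size, epochs=None, max_steps=None):
    """Training steps from an explicit step count, or derived from epochs."""
    if max_steps is not None:
        return max_steps
    if epochs is None:
        raise ValueError("give either max_steps or epochs")
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def warmup_steps(ratio, steps):
    """Warmup length as a fraction of all training steps (rounded to the nearest step)."""
    return round(ratio * steps)


print(warmup_steps(0.05, total_steps(40, 10, 2, max_steps=1200)))  # step count given directly
print(warmup_steps(0.05, total_steps(40, 10, 2, epochs=10)))       # steps derived from epochs
```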
Has anybody managed to get loha working? I want to replicate a training setup to see if I can provide a decent starting point for loha, because they are pretty different from training locon or lora
Every time I trained loha, it took way too many tries to get it right
Seems really finicky
But I was thinking
What if we can get more accurate styles out of roughly the same space as locon
I say this because, unlike locon, which doesn't work with CP decomposition,
loha seems to work well with it
So we might be able to reduce the size of loha to match that of dim16/conv dim8 locon
With better results than dim16 conv dim8 locon
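Rough parameter arithmetic behind the "match dim16 locon size" idea, for a single linear layer only (conv layers and CP decomposition are left out). It assumes the usual forms: LoRA stores a down matrix (dim x in) and an up matrix (out x dim); LoHa stores two such pairs and multiplies their products elementwise, so its rank can reach dim squared. The helper names are hypothetical.

```python
def lora_params(in_f, out_f, dim):
    """LoRA on one linear layer: down (dim x in) + up (out x dim)."""
    return dim * (in_f + out_f)


def loha_params(in_f, out_f, dim):
    """LoHa: two LoRA-style pairs whose products are combined by a Hadamard product."""
    return 2 * dim * (in_f + out_f)


def loha_max_rank(in_f, out_f, dim):
    # rank(A * B elementwise) <= rank(A) * rank(B), and never above the matrix size
    return min(dim * dim, in_f, out_f)


print(lora_params(320, 320, 16), loha_params(320, 320, 8), loha_max_rank(320, 320, 8))
```

So on a 320x320 layer, LoHa at dim 8 costs the same as LoRA at dim 16 (10,240 values each) while allowing rank up to 64 instead of 16; whether that extra rank actually buys better styles is exactly the open question here.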
But I'm only thinking about this in a purely hypothetical context
Because I haven't managed to bake a loha that I was entirely happy with
Ive only managed to once I believe
And I haven’t been able to reproduce the results
The thing that sucks about training imo is that even if one model's training settings worked well, they can't be reproduced to work for another
At least in my experience
Unfortunate
Even similar dataset sizes don’t promise anything
It might not be possible to have good defaults then
I would agree with that assessment, yes.
More likely, it’s better to have defaults to work from rather than work with.
Well yes, but I don't think I've had an instance of loha working without 20 or so bakes
Which means I have no clue what would make sense
Again, loha takes work to get right. I already struggle enough with normal locon training; I can't get anything without training it at least 5 times. Analyzing how loha works isn't in my best interest right now
For that, I'll wait for more knowledgeable people to take the reins
I suppose you are asking someone other than me, since I did the entire series of my experiments in loha. Still, I'll drop the message here just in case. Otherwise you can check the lohas posted on civitai (like the umamusume ones by mht, or probably the one trained by the person who asked for loha support to be added in ComfyUI).
I mean, if you managed to consistently get good results, I'll look through your setup
I should mention though, most of the loha on civitai are trained poorly, so I don't plan on using them as basis
I don't know what you mean by a good model. You can try mine anyway https://civitai.com/models/21305/tenten-character-lohafullckpt
All the intermediate checkpoints can be found in https://huggingface.co/alea31415/tenten-characters The base model is ACertainty Trained at clip sk...
Well I have some questions, primarily, why train on ACertainty?
Actually, also why clip skip 1?
No difference between clip skip 1 and clip skip 2 in my experiment. 金Goldkoron#9929 from Mynefactory said he found the model to be better if trained on clip skip 1. As SD was originally trained on clip skip 1, I see no real reason to train on clip skip 2.
Training on acertainty is basically as good as nai
And I don't want to say I train on nai. That's all.
But mixes have a track record of destroying lora
See my experiment for that
Especially anythingv3
Acertainty is basically like nai in terms of how the trained Lora performs
Acertainty is not mix
Also, while that looks fine, it's not exactly the use case most will use loha for
People will likely use loha to train one character or one style
So I need to have a setup that is decent in that case
Though my task is supposedly harder so I don't see why it would not work for them if it works for me
I think it's a result of you having more data actually
Oh yeah. Probably.
Because I've had very poor results with smaller concepts
It has been a long time since I last trained on a dataset of 30 images
The last time I did was probably in November
Perhaps I should make it very clear in my scripts that loha should only be used when you are training a bunch of concepts at once
I don't mean a small dataset btw
Or sufficient images for one concept
I mean a small amount of different concepts
I don't think that matters
I think it does
Because with only one concept, for example, it seems to not actually fill everything
I had this issue when I tried to train a loha on unicorn
Seems like it just has some data that doesn't really get trained
In the case of unicorn, her China dress would periodically be the incorrect color
Which was more common than I wanted
Which meant that her dress wasn't learned as much as it should have at the amount of steps I'm used to baking
This was true for all three of her outfits
Could be.
I cannot say how you balance training speed and quality across all the hyperparameters and captioning techniques
This is too complex to investigate
I probably train much longer than most people anyway
3k steps is usually the furthest I go
Unless I have a particularly large dataset
500+ images
Like the one above is 40k steps with batch 8
So just not the same scale lol
Not at all
So long story short
Loha requires a ton more training
Unlikely that people will use it given that it seems to be a huge increase in time spent training
Cannot say
Judging by the fact that I haven't gotten it to work at lower step counts, and you have only at over 10x the steps, I'm going to say it's likely this is the case
In the end it may just depend on the habits and the use case of each user
I can't see most making use of loha because of the step counts
As most come from training lora
Which usually take 20 minutes to train something decent, if you don't mind screwed up backgrounds, eyes, and hands
(That's the old, and honestly very bad, dim128 training)
An interesting fact is that the picture of anisphia is probably seen like 40000 times when I trained that loha. I never know how many is enough. I just train for sufficiently long time and check if I have some good results.
If it's overbaked I can always use intermediate checkpoint but I never find the final checkpoint to be really unusable.
On the other hand for the mother of anisphia it is around 5000 times.
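To compare runs like "40k steps at batch 8" with "3k steps", it can help to count how often each image is actually drawn. A sketch assuming kohya-style repeats (each image appears `repeats` times per epoch) and uniform sampling, ignoring bucketing leftovers; the concept names and counts in the example are hypothetical.

```python
def epoch_size(concepts):
    """Images drawn per epoch: sum of image_count * repeats over all concepts."""
    return sum(images * repeats for images, repeats in concepts.values())


def times_seen(steps, batch_size, repeats, total_per_epoch):
    """Expected number of draws of one image with the given repeats over the whole run."""
    return steps * batch_size * repeats / total_per_epoch


# hypothetical dataset: concept name -> (image count, repeats)
concepts = {"concept_a": (50, 2), "concept_b": (100, 1)}
total = epoch_size(concepts)
print(total)                           # images drawn per epoch
print(times_seen(3000, 2, 2, total))   # one concept_a image over 3000 steps at batch 2
print(times_seen(3000, 2, 1, total))   # one concept_b image over the same run
print(40000 * 8)                       # total image draws for 40k steps at batch 8
```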
@vivid python https://www.canva.com/design/DAFeAteHW18
I have no idea whether this is true, but someone found that you get less bleeding with loha
at least in the case of Unicorn, that was not the case it had just about the same amount of bleeding if not more, but that might have been related to dataset
I don't know. This is just what I mean: each person will find it more or less useful depending on their dataset and what they want to achieve.
1 character
4 concepts
111 images
28-32 images for each concept
What settings would anyone recommend for that?
Yeah, I know “it depends on the dataset,” but I still haven't gotten it right, so I need help
I have so much conflicting knowledge about training shit that I can't even say I know what I'm doing anymore
all 4 are roughly equal in size?
If you mean the number of images, then yes
alright, give each 2 repeats, and do dim32 to start with, you can dynamically resize down later, alpha16 works well probably, 5e-4 unet, 1e-4 TE for 1600 steps
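On "dynamically resize down later": kohya's LoRA resize script can pick a rank per layer from the singular values of that layer's weight update, with modes such as `sv_ratio`, `sv_fro`, and `sv_cumulative` (as I recall the script's options). The functions below are a simplified sketch of those three rules on an already-computed, descending list of singular values; the real script's indexing and clamping may differ, and the example values are made up.

```python
import math


def rank_sv_ratio(s, ratio):
    """Keep singular values larger than s_max / ratio."""
    cutoff = s[0] / ratio
    return max(1, sum(1 for v in s if v > cutoff))


def rank_sv_cumulative(s, target):
    """Smallest rank whose singular values sum to at least `target` of the total."""
    total, running = sum(s), 0.0
    for k, v in enumerate(s, start=1):
        running += v
        if running / total >= target:
            return k
    return len(s)


def rank_sv_fro(s, target):
    """Smallest rank that retains at least `target` of the Frobenius norm."""
    total, running = sum(v * v for v in s), 0.0
    for k, v in enumerate(s, start=1):
        running += v * v
        if math.sqrt(running / total) >= target:
            return k
    return len(s)


singular_values = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25]  # hypothetical, sorted descending
print(rank_sv_ratio(singular_values, 10))
print(rank_sv_cumulative(singular_values, 0.9))
print(rank_sv_fro(singular_values, 0.95))
```

Because each layer gets its own rank, a dim32 training run can shrink a lot in layers that barely used their capacity, which is why starting high and resizing afterwards is a common approach.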
Alright, I'll give it a go after the current training finishes (I doubt it'll work, but just in case)
👍
network_weights was created for loading hypernets
so it's still pretty much not useful
Tried those settings, nothing changed much.
To be specific about the issue, it keeps merging the first two outfits of the character, while mostly missing the third (maybe the most complex one). There's a swimsuit outfit too, and go figure, it manages to get that working fine for the most part, even though it has fewer images than any of the others.
The first two outfits are similar, so it's hard to differentiate them much using the available tags. The third is completely different, but it's missed the concept and ignored the activation token every time I've trained it
not sure then, that usually works for me
it worked for the three outfit unicorn Loha I made, though it did have some bleeding, which I believe was more or less a tagging issue
I had an experience like this when I tried a lora with 3 different outfits
got the swimsuit but couldn't differentiate the main and alt costumes
Were the others similar in appearance?
not really but some of the tags might have been similar
based on straight up booru tagging
It is very annoying when you train something 10 times in a row, and it only nails the bikini each time flawlessly, is all imma say
AHHHHHHHH
that is all
I have a lot of work ahead of me
groan
surely that's not a bad thing
Maybe not
If you don't already struggle enough with the systems we already have, like me
Don't mind me, tho, I'm just a walking skill issue
all I know from what I read, it's basically LoHa but different
NO FUCKING CLUE
That sounds terrific lol
yeah, the block weight thing breaks compatibility with LyCORIS
granted it's up to Kohaku to update to sd-scripts
not the other way around
it says it's like 300kb files
300kb lora huh? I'll let other people use it
I'm not even gonna try and touch it
Had enough trying to get loha to work
Don't want to tinker with this
I'll just wait for 4chan to find the perfect ratio
I don't think this is the case. I can run with both sd-scripts and lycoris at the most recent versions.
It is just that lycoris does not support blockwise learning rates for the moment. I don't know if kohaku plans to add it anytime soon.
As for IA3, I cannot say what its use case will be. For now it trains the same part as lora; in terms of how it trains, it is similar to a mini hypernetwork, and in terms of results it is more like TI (textual inversion) but for style.
Its small size also suggests it's probably not as good as other methods in general, but if your style is simple enough it should be fine. It trains faster for the same number of steps (about half the time of loha), but it is unclear to me whether we would need more or fewer steps to get something that is OK for the user.
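Why IA3 files are so small: instead of adding a low-rank matrix, (IA)^3 keeps the layer frozen and learns one scaling vector that multiplies the layer's activations elementwise. A toy pure-Python sketch, assuming the vector rescales the layer output (LyCORIS may scale inputs or outputs depending on the layer); all helper names are hypothetical.

```python
def matvec(W, x):
    """Plain matrix-vector product: one dot product per output row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ia3_forward(W, x, scale):
    """Frozen W; the output is rescaled elementwise by a learned vector (initialized to ones)."""
    return [s * y for s, y in zip(scale, matvec(W, x))]


def ia3_params(out_f):
    # one learned value per output channel
    return out_f


def lora_params(in_f, out_f, dim):
    # down (dim x in) + up (out x dim), for comparison
    return dim * (in_f + out_f)


W = [[1.0, 2.0], [3.0, 4.0]]
print(ia3_forward(W, [1.0, 1.0], [1.0, 0.5]))
print(ia3_params(320), lora_params(320, 320, 16))
```

On a 320x320 layer that is 320 values versus 10,240 for a dim16 LoRA, 32x fewer, which fits both the tiny file sizes and the "only fine if your style is simple enough" caveat.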
I didn't test it yet myself, was just going off of what kohya said
Yeah, definitely not gonna even try to use this. I'll just implement it and let others fuck with it. Not worth my time, not gonna learn how to train it.
I don't think you have anything to add in your script for ia3. The user only needs to specify it in the network argument part, so in the end you probably don't need to touch the lycoris part in your update.
The documentation says it’s less transferable between models so that probably limits use cases
In fact the block wise thing that kohya implements is also just another network argument so I am not sure if the easy learning script really needs to be modified for that.
And yes, I guess ia3 at its current stage may just become an argument that no one uses lol
the popups needed to be modified to allow for it
I see. I never used the popup version because I only work remotely
a majority of my users use the popups
so I need to make sure it is possible to use with them
I only use popups to generate the configure file, then I just use the configure file
so do I, but that's still using the popups lel
Should I update your script? Haven't done it in a while @vivid python
you can, lots of smaller updates happened probably
soon there will be the update for sd-scripts
which introduced block weight training
but I need to set up some stuff for it
namely, a proper popup for it
I'll just wait till it's a must
I don't wanna suddenly throw something into disarray
Like last time 
Yeah, but you can just edit argslist
I just backup argslist for my own brain dead reasons, and just edit that for each
It's the python file that has all the args
Finally got around to adding all of the block weight training stuff that kohya introduced
It took way too long to make sure everything was working as expected
but anyways
it's done
people can update through the update.bat if they already have it
or use the v6 installer
links just to make it easier to get to:
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts/releases/tag/installers-v6
yeah
seems like kohaku suggests using D-Adaptation for it because it's a bit finicky to train
Until there's another game-breaking way to train stuff, I ain't changing shit
That aside, I barely even understand what block merging is, even having looked at that one rentry page for it
block weight training will allow you a ton of control
... or you could completely ignore it and continue like you have
this update doesn't change the ability to train like before
just adds a new way
I could ignore it
But if I start seeing everyone switching up and talking about “oh yeah I used block merging for this” then FOMO will get me
oof
As I am not immune to that
I can't say many will
it's very complex
like very complex
like 125 values complex
That would explain how I still didn't really understand it
well, technically it's only 25
but you can set the weight, dims and alphas
per layer
which means 25 per, or 125 different possible inputs
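To make the 25/125 count concrete: the U-Net is split into 12 down blocks, 1 mid block, and 12 up blocks, and weight, dim, alpha, conv dim, and conv alpha can each be set per block. The sketch below only assembles `--network_args` strings; the argument names (`down_lr_weight`, `mid_lr_weight`, `up_lr_weight`, `block_dims`, `block_alphas`, `conv_block_dims`, `conv_block_alphas`) are as I recall them from sd-scripts' LoRA docs and should be checked against your version. `block_weight_args` is hypothetical.

```python
def fmt(values):
    """Comma-separated numbers with trailing zeros dropped (1.0 -> '1')."""
    return ",".join(f"{v:g}" for v in values)


def block_weight_args(down, mid, up, dims, alphas, conv_dims=None, conv_alphas=None):
    """Build per-block network_args strings (12 down + 1 mid + 12 up = 25 blocks)."""
    if len(down) != 12 or len(up) != 12:
        raise ValueError("down/up weights need 12 values each")
    per_block = [dims, alphas] + [v for v in (conv_dims, conv_alphas) if v is not None]
    if any(len(v) != 25 for v in per_block):
        raise ValueError("dims/alphas need 25 values, one per block")
    args = [
        f"down_lr_weight={fmt(down)}",
        f"mid_lr_weight={mid:g}",
        f"up_lr_weight={fmt(up)}",
        f"block_dims={fmt(dims)}",
        f"block_alphas={fmt(alphas)}",
    ]
    if conv_dims is not None:
        args.append(f"conv_block_dims={fmt(conv_dims)}")
    if conv_alphas is not None:
        args.append(f"conv_block_alphas={fmt(conv_alphas)}")
    return args


# hypothetical setup: full weight on down/mid, half on up, dim 8 / alpha 4 everywhere
for arg in block_weight_args([1.0] * 12, 1.0, [0.5] * 12, [8] * 25, [4] * 25, [4] * 25, [1] * 25):
    print(arg)
```

Filling in every list gives 25 values for each of the five settings, i.e. the 125 inputs mentioned above.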
anyways
sleep time for me
Now we just need auto mbw for lora training 
Weighted captions is apparently a thing now, kohya just added
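A rough idea of what weighted captions look like, assuming the A1111-style `(text:weight)` syntax that sd-scripts' `--weighted_captions` option is meant to follow. This sketch only handles explicit `(text:weight)` groups (no nested parentheses or `[ ]`), and `parse_weighted_caption` is hypothetical, not kohya's parser.

```python
import re

# explicit "(some text:1.3)" groups only; everything else gets weight 1.0
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weighted_caption(caption):
    """Split a caption into (text, weight) chunks."""
    chunks, pos = [], 0
    for m in _WEIGHTED.finditer(caption):
        if m.start() > pos:
            chunks.append((caption[pos:m.start()], 1.0))
        chunks.append((m.group(1), float(m.group(2))))
        pos = m.end()
    if pos < len(caption):
        chunks.append((caption[pos:], 1.0))
    return chunks


print(parse_weighted_caption("1girl, (china dress:1.3), red hair"))
```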
Yeah one of the guys in unstable was working on it for a while
I'll add it once it's out of dev branch, I'm a bit burnt out after this update
So I'm gonna step away from it for a few days
That's an unreasonable amount, surely we can figure out an optimal setup with math or something
well, there are presets for the weights
god DAMN it
but literally only that
GOD damn
So no new args besides the huggingface ones?
I'll let someone test out the block weights lol