#LoRA_Easy_Training_Scripts
updated my scripts to support dylora https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
Dylora
From what I know, it's a way to make low dim lora work better? I haven't thoroughly tested it
With the ever-growing size of pre-trained models (PMs), fine-tuning them has
become more expensive and resource-hungry. As a remedy, low-rank adapters
(LoRA) keep the main pre-trained weights of the model frozen and just introduce
some learnable truncated SVD modules (so-called LoRA blocks) to the model.
While LoRA blocks are parameter efficient...
They state it can train x7 faster than lora...?!

👀

More like 7x slower 
I tried reading the paper but it sounds like dylora is gonna be useless
If it was 7x faster that would be epic but it wasn't when derrian tested it
I do prefer a good speed
actually, I didn't really test it entirely, just tested to make sure it actually trained, it's possible that it actually is, I just haven't had the time to test it myself
though, I did find that it trains about as fast in terms of iteration speed, just ran out of vram that first time
so it didn't count
So the jury is still out
what kind of settings were tested for dylora?
based on the paper, it seems like the purpose of dylora is that you can do inference at different ranks

my dylora seems super undertrained for the same settings as locon
Which version of dylora did you use? Kohaku's is different from kohya's
And because of that, whether dylora can be used at all depends on the mode
If kohya's, then you have to use additional networks
not x7 faster per iteration, but x7 faster to get to optimal results. did you cut training time by 7?
oh man... so many new implementations of lora training while i was busy
well, mainly ia3 and lokr and dylora

and then this block weight training thing
i haven't even looked into what optimizers to use, besides that adam8 is the "best"
I should've considered this more seriously
You should test dylora out of all the new things
i'm doing that right now 
setting up a json, but i won't be able to train until a bit later
I'm waiting for someone to figure out an optimal setup and roll with that
i tried kohaku's dylora
adamw8bit I guess
4e-4 unet with 5e-5 text encoder learned like basically nothing
at ~900 steps
what dims were ppl testing on dylora
batch size?
supposedly the idea is that you can do inference at a diff rank than its trained at based on the paper?
im always batch 1
basically stochastic lol
like, generating images
oh, that's interesting... i'm reading now that it's adaptive at inference time
i was under the impression that training is adaptive in determining rank (dim)
hence, was confused why you would want to pick a dim, since dylora would optimize the dim anyways
from my understanding of the paper, the supposed benefit of dylora is to avoid having to do multiple training runs at different rank
to find the optimal rank
since you can just select the rank used at inference
now, im not sure what the dim settings on dylora do
maybe its the maximum rank?
rip, neither kohya's nor kohaku's repo has english documentation on how to use it kek
ok im just gonna run kohya's documentation through deepL lol
"Features of DyLoRA in this Repository
After training, DyLoRA model files are compatible with LoRA. LoRAs of multiple dims below a specified dim(rank) can be extracted from the model file."
so i think the rank specified for dylora is like the max rank
it will simultaneously train for all ranks below that
"According to the paper, higher ranks of LoRA are not necessarily better, but it is necessary to find the appropriate rank depending on the model, dataset, task, etc. Using DyLoRA, LoRA is trained simultaneously at various ranks below a specified dim(rank). This saves time in learning and searching for the optimal rank for each."
"Also, specify a unit for --network_args, for example --network_args "unit=4", where unit is a unit to divide ranks. For example, --network_dim=16 --network_args "unit=4" where unit is a divisible value of network_dim (network_dim is a multiple of unit)."
so you can specify how to divide them
based on this i think you can do like dim16 with training also at dim12/8/4 if you set unit=4
in kohaku's its called block size iirc
"For example, training with dim=16 and unit=4 (see below) will train and extract LoRA for 4, 8, 12, and 16 ranks. By generating images with each of the extracted models and comparing them, the LoRA with the best rank can be selected."
basically dylora is to avoid having to retrain multiple times to find the ideal dim size
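A minimal sketch of how dim and unit map to the extractable ranks, going only by the translated docs quoted above. The function name `dylora_ranks` is made up for illustration and is not part of kohya's or kohaku's code.

```python
def dylora_ranks(network_dim: int, unit: int) -> list[int]:
    """Ranks trained together, per the translated docs: every multiple of
    `unit` up to and including `network_dim` (hypothetical helper)."""
    if network_dim <= 0 or unit <= 0 or network_dim % unit != 0:
        raise ValueError("network_dim must be a positive multiple of unit")
    return list(range(unit, network_dim + 1, unit))


print(dylora_ranks(16, 4))  # [4, 8, 12, 16]
print(dylora_ranks(8, 4))   # [4, 8]
```

So a low dim with a large unit leaves very few ranks to compare, while a smaller unit gives more candidates to extract.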
interesting... 
i wonder how increasing dim affects training time
cause lately i've been trying to train style at 8 dim, and with default unit 4, then dylora would only train 4, 8, which doesn't seem like it would be an improvement
i haven't experimented with dim/alpha at all tbh, so i don't know too much about how they affect results
but i guess with dylora and extraction, it would be easier to extract lower rank dims and compare them
i know there's already comparison grids of dim/alpha, but it's a different kind of learning if you do it yourself with your own dataset that you're familiar with
not sure why with kohaku's it seems to need either a higher LR or more steps
than locon
i didnt have any turn out well
i'm thinking about trying this dylora with dadaptation
ha, i'm an idiot. i put too many 0's on my unet_lr, so I was training x10 less than usual


oof
good
any guide on how to use these scripts?
or even link a message in a convo of someone explaining it
feeling really stupid rn
you just need to follow the popups
once they are installed using the installer
you can run them by running the run_popup.bat
once loaded it will ask you a bunch of questions sequentially
if you know what settings you want, it's pretty quick
if not, then it might be a bit confusing
I'm working on an overhaul of the UI right now, as in, I'm making a whole UI right now
what are you having an issue with in particular?
Honestly, using the json file with notepad is my UI and honestly that's all I need.
The arglist.py (i think) is a good reference as well, albeit a bit hidden.
Hey, could someone help me out with the following or give me tips on how I can successfully make a lora out of these images:
I know these are quite limited, but I cant figure out how to do this properly. I am getting mixed results with Kohya, would your easy training script help? Like normalizing.
Going to check it out rn though
the easy training scripts also uses kohya on the back end
so it's likely you won't get better results if you were getting bad results before
that being said, that is far outside of what I normally train, so I can't really help you
just some quality of life features i'd like to see implemented.
- if the output folder doesn't exist, just create it
- allow the provided json name to be used as the name of the output folder, log prefix, and output name (togglable functionality)
- maybe have the same functionality with im/reg folder path, but enforce suffixes to keep naming ordering consistent (togglable functionality)
- let custom schedulers take the "num_warmup_steps" and "num_training_steps" as arguments for kwargs (my custom schedulers are a similar implementation of built-in schedulers)
my workflow is currently as follows
- generate a template json config script through json
- edit the json config. i find myself redundantly editing the output folder, output name, and log prefix to the same name
- copy json file to create variants, usually adjusting one hyperparameter, but also changing the output folder, output name, and log prefix
- create output folders
- run multiple json training
- go away for a long time and hope training didn't stop because i made a typo or forgot to create a folder or something
it's very fiddly but very powerful, i like it
it's just... after doing 50+ trainings... it kinda gets to you
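A rough sketch of automating the variant step of that workflow: copy a base config, change one hyperparameter, reuse the variant name for the output folder, output name, and log prefix, and create the folder up front. The key names (`output_dir`, `output_name`, `log_prefix`, `unet_lr`) and the helper `write_variant` are assumptions for illustration, not the actual json schema the scripts use.

```python
import copy
import json
import tempfile
from pathlib import Path


def write_variant(base_config: dict, name: str, overrides: dict,
                  config_dir: Path, output_root: Path) -> Path:
    """Write `<name>.json` with `overrides` applied and all naming fields
    set to `name`; also creates the output folder (hypothetical keys)."""
    cfg = copy.deepcopy(base_config)
    cfg.update(overrides)

    out_dir = output_root / name
    out_dir.mkdir(parents=True, exist_ok=True)  # no more missing-folder failures

    cfg["output_dir"] = str(out_dir)
    cfg["output_name"] = name
    cfg["log_prefix"] = name

    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"{name}.json"
    path.write_text(json.dumps(cfg, indent=2))
    return path


# Demo in a throwaway directory: two variants that differ only in unet_lr.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    base = {"unet_lr": 1e-4, "network_dim": 16}
    for lr in (1e-4, 4e-4):
        variant = write_variant(base, f"style_unet{lr:g}", {"unet_lr": lr},
                                root / "configs", root / "output")
        print(variant.name)
```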
this is probably super extra and probably not needed on main branch, but if in the json file name, i put something like e12, it would know that this json is meant to be run for 12 epochs, and will run for 12 epochs regardless of what's in the file itself
that's probably something i would have to do for my own personal workflow, but just something cool to bring up, i guess

other examples could be Ux# to multiply lr_unet by #, or Tx# to multiply lr_textencoder. all separated by spaces
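For a personal workflow, the file-name idea could look something like the sketch below. It only covers the three tokens mentioned (`e#`, `Ux#`, `Tx#`, separated by spaces); the function name and the returned key names are hypothetical, and nothing like this exists in the scripts.

```python
import re
from pathlib import Path

# e12 -> epochs, Ux2 -> unet lr multiplier, Tx0.5 -> text encoder lr multiplier
TOKEN_RE = re.compile(
    r"^(?:e(?P<epochs>\d+)|(?P<target>[UT])x(?P<factor>\d+(?:\.\d+)?))$"
)


def overrides_from_name(json_path: str) -> dict:
    """Collect overrides encoded in a json file name (hypothetical scheme).
    Tokens that don't match the pattern are ignored."""
    result = {}
    for token in Path(json_path).stem.split():
        match = TOKEN_RE.match(token)
        if match is None:
            continue
        if match.group("epochs") is not None:
            result["epochs"] = int(match.group("epochs"))
        elif match.group("target") == "U":
            result["unet_lr_multiplier"] = float(match.group("factor"))
        else:
            result["text_encoder_lr_multiplier"] = float(match.group("factor"))
    return result


print(overrides_from_name("style_test e12 Ux2 Tx0.5.json"))
```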
ah, when i tried running that after installing with v5 quite simply nothing happened
Sometimes it takes a while for everything to initialize, this is seemingly an issue with tkinter, not sure if I can change it
Creating an output folder is doable. I'd have to rewrite some of my json code, but allowing the use of the json name is possible. Not entirely sure what you mean by enforcing suffixes. Pretty sure custom schedulers already have that ability in the scripts, but I don't use custom schedulers, nor does anybody else I've talked to, so I didn't feel the need to care about the implementation. Either way, that one would probably be really annoying to account for. To be entirely honest, seems like you just kinda go... way too far per bake?
That being said, development time is currently being spent on making a UI
Oh, and about the name being used for arguments, that kinda defeats the purpose of the json files in the first place. And it would be a lot of work making a parser for a system very few would use
pressing enter helps sometimes. it could have just paused on its own. happened a few times for me
Oh very true, that's unfortunately a quirk of command line
well somehow my output with easy training seems a bit different from just doing it through powershell. probably just me
i do like the easy features and .json
a custom ui/gui would be ideal and great
uh so what now
A popup should have appeared, sometimes it doesn't appear on top of everything else so just alt + tab to find it
There might be a real reason for that, by default weight decay is something like 0.01, I have it set to 0.1 by default on my scripts because I've found it generally produces better results
ah i see, I didn't expect one that doesn't show up on the taskbar
ty
Yeah, quirk of tkinter. I was hoping to not be using it anymore at this point, but that's not how it panned out. Good that it's working for you now though
What does weight decay do
It's basically the rate at which a model forgets something, so a higher weight decay can help reduce bad training early on
Or, if it's too high, the model might not learn anything at all
Granted, 0.1 is give or take a good spot to be in
If you have a really high lr, the weight decay can actually fix some of the issues that come with that
Usually
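To put rough numbers on the "forgetting" idea: with AdamW-style decoupled decay (an assumption here, other optimizers apply decay differently), each step multiplies a weight by `1 - lr * weight_decay` before the gradient part is applied. The helper below is illustrative only and looks at the decay term in isolation, with zero gradient.

```python
def decay_only_factor(lr: float, weight_decay: float, steps: int) -> float:
    """Fraction of a weight that remains after `steps` updates if the
    gradient were zero and only decoupled weight decay acted on it."""
    return (1.0 - lr * weight_decay) ** steps


# Compare the two defaults mentioned above at lr=1e-4 over 1000 steps.
for wd in (0.01, 0.1):
    print(wd, decay_only_factor(lr=1e-4, weight_decay=wd, steps=1000))
```

Since the decay is scaled by the lr, a higher lr also makes the same weight decay pull harder.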
That makes sense. Ya. I have my LR low. As in, 1e-4 to 1.4e-4 max for characters. (Lion too)
0.1 does sound a bit too much in some cases
I'm trying to add dadaptation as an option to my colab but I got this error
Setting different lr values in different parameter groups is only supported for values of 0
I have these settings
```toml
[additional_network_arguments]
unet_lr = 1.0
text_encoder_lr = 0.5
network_dim = 16
network_alpha = 16
network_module = "networks.lora"
[optimizer_arguments]
learning_rate = 1.0
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 35
optimizer_type = "DAdaptation"
optimizer_args = [ "decouple=True", "weight_decay=0.02",]
```
I don't know what a parameter group is in this context
I think the parameters groups in question are unet and text encoder
But other people can set them differently just fine
I had to pip install dadaptation manually, maybe that's why? But what else can I do
D-adapt can't have separate unet and te learning rates
It's based on adam not adamw, so it's all on one lr
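Going by the wording of that error message, the rule seems to be that every parameter group must share one lr, with 0 as the only allowed exception. The check below is a guess at that rule written for illustration, not the dadaptation library's actual code; the function name is made up.

```python
def check_dadapt_group_lrs(group_lrs: dict[str, float]) -> None:
    """Raise if the groups use different non-zero lrs (assumed rule,
    inferred from the error message, not copied from the library)."""
    shared = max(group_lrs.values())
    for name, lr in group_lrs.items():
        if lr not in (shared, 0.0):
            raise ValueError(
                f"{name} has lr={lr}, expected {shared} or 0: "
                "different lr values per group are not supported"
            )


check_dadapt_group_lrs({"unet": 1.0, "text_encoder": 1.0})  # fine
check_dadapt_group_lrs({"unet": 1.0, "text_encoder": 0.0})  # fine, te lr is 0

try:
    check_dadapt_group_lrs({"unet": 1.0, "text_encoder": 0.5})  # settings above
except ValueError as err:
    print(err)
```

Under that reading, setting `unet_lr` and `text_encoder_lr` to the same value would avoid the error.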
0.1 is not really too much, in 99% of cases, even 1e-4 (which isn't really that low) works fine, as that's what I used to train at
i thought decouple would change that
decouple decouples weight decay
that's it
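For anyone wondering what "decoupled" changes in practice, here is a single-weight Adam step written out by hand. It's a generic textbook-style sketch, not the D-Adaptation implementation: with `decouple=False` the decay is folded into the gradient and gets rescaled by Adam's normalization, with `decouple=True` it is applied directly to the weight.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-3, wd=0.1, decouple=True,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight (illustrative only)."""
    if not decouple:
        grad = grad + wd * w              # L2 style: decay goes through Adam's scaling
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decouple:
        w_new -= lr * wd * w              # AdamW style: decay applied directly
    return w_new, m, v


# First step, zero task gradient, so only the decay term moves the weight.
decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decouple=True)
coupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decouple=False)
print(decoupled, coupled)
```

In this toy case the decoupled step shrinks the weight by about `lr * wd * w`, while the coupled step moves it by about `lr` no matter how small `wd` is, because Adam normalizes the gradient.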
True
Should i run the update again

Also is there anywhere i can read about the schedulers?
so i want to train only the out/up layers, but when i put a weight of 0 for the middle layer, it just asks me to cancel, .1 says not an integer, it only accepted 1
oh i should try updating..
I have a question: I train lora on colab and the results are pretty good (probably), but the lora file is only about 10-20 mb (or maybe I trained on too few images). Can anyone help me improve it? Out of about 10 exported lora files, only 1 seems to be fine
That sounds odd... let me double check the code
Ok yep, that was my mistake, I forgot to set its mode to float. I'll fix it soon
What exactly do you mean?
i don't know if my lora is really ok. i tried many versions and the results vary a lot and sometimes error out, and my file seems very small compared to some other lora (the ones that are 80-150 mb in size)
All of my lora are either 16-ish mb or 30-40mb depending on if I'm using a lora or locon
So it's not an issue, just means you are using a smaller dim size
It might just be your training parameters causing problems here.
Because a small dim size won't break them
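The file size tracks dim because a LoRA pair on a linear layer adds `rank * (in_features + out_features)` parameters: a `rank x in` down matrix plus an `out x rank` up matrix. A quick sketch of that arithmetic; the 768x768 layer shape and the fp16 assumption (2 bytes per parameter) are just illustrative.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA pair on a linear layer:
    down (rank x in_features) plus up (out_features x rank)."""
    return rank * (in_features + out_features)


def approx_bytes(param_count: int, bytes_per_param: int = 2) -> int:
    """Rough storage cost, assuming fp16 weights (2 bytes each)."""
    return param_count * bytes_per_param


for rank in (16, 32, 128):
    n = lora_param_count(768, 768, rank)
    print(rank, n, approx_bytes(n))
```

Doubling the dim doubles the size of every adapted layer, so a dim 128 file being several times larger than a dim 16 or 32 one is expected. Locon also adapts conv layers, which fits it producing bigger files at the same dim.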
do you use colab to train or not?
I don't use colab to train, no. I'm the one who makes the easy training scripts
@errant wraith I updated the scripts, it should be fine now
Wouldn't #1092821901430227085 be a better channel for this?
Like why not move there?
Sick, thanks 
this is among the oldest posts probably, the "guides and resources" section didn't exist when this was created. I saw no reason to move over as usually there isn't much talking that happens here, that being said once the UI is done I'll probably create a new thread over there


