#LoRA Easy Training Scripts Redux
random shit that you don't have to worry about
more or less just torch yelling at itself
maybe cuz I set power limit to 75%
in addition to the undervolt I already had
or it was random
either way I could finish the last one I did
peaked at 90c for memory
ok got 110 images for mutsuki dress
Btw seems like tag and toml doesn't save
Eh, 107 images with 2 repeat and 4 batch gives like 70 epochs for 2k steps
I'll just save every 200
Hm, I set it to save only every 200 steps but it shows 11 epoch saves
Oh I forgot to change sample to per step
I'm only at 600 steps now but the epochs around 200 already seemed fine
Oh for dev? Yeah I know, I need to fix it
When I was working on the complete rewrite, I just flat out neglected to recreate those systems lel


Oh no
didnt go that well this time, even the last epoch doesnt get it right every time 
oh well I'll redo it later
Oof
interesting though
batch 4 repeat 2 2222 step run
batch 2 repeat 2 run so like 1122 steps
the 4 batch one took longer to do something and still isnt great at last epoch

wtf
I ran the trainer while sd was launched with an xl model loaded
💀
150 mb away from oom
150 isn't oom though lel
sadge
maybe I left in some shit pics
meh
dumbshits tagged different type of dresses for her
with the same tags
dumb af
ok should be fine
but had to delete a bunch
only left with 101 images now
I mean, it doesn't matter how much I try to help him
like, I've tried way more than the average person would even attempt to
I don't know what to tell him

mf literally made a personal branch for lil bro
💀
my latest favorite is this
I won't go back 😠
oh no how will we ever survive
oh god...
you know what? I would love for him to never come back
note to self, NEVER MAKE WHAT HE WANTS POSSIBLE
from the furry server
I use 0.41.1 though
Finally, bitsandbytes 0.43 supports Windows
https://github.com/TimDettmers/bitsandbytes/releases/tag/0.43.0
Good, finally can update it
can someone help me to use this correctly?
I have it set to 118 because there was some compatibility issue, but it's probably resolved at this point, so I'll change it to 121 and see
sup
short question
how many steps should u have with 500imgs?
8k or 200
i got 2 configs and im a bit confused on the high difference
I didnt know how to do torch 2.0 with 121 so I just downloaded the latest stable
no xformers is the downside
doesn't 2.2.0 have support for 121 though?
dunno, I just didn't know how to download it 
but I didnt see any performance being worse with sdpa
so was fine for me
btw what would you normally use deepspeed for
isn't deepspeed for multi gpu?
yeah, so not really useful for the average person
gonna get 3 more 3090s soon 
when using lora trainer, and being over the limit, what happens to the additional tokens? The settings have a max token length of 225, but my file has 400 tokens in total
It truncates to 225 tokens
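A rough sketch of what that truncation amounts to, assuming the caption is already a list of token ids and that the 225 limit is handled as three 75-token chunks (an assumption about how the trainer treats max token length, not something confirmed in this chat). The helper name is made up for illustration.

```python
def truncate_and_chunk(token_ids, max_token_length=225, chunk_size=75):
    """Drop tokens past max_token_length, then split the rest into chunks."""
    kept = token_ids[:max_token_length]  # anything after the limit is discarded
    return [kept[i:i + chunk_size] for i in range(0, len(kept), chunk_size)]


caption_tokens = list(range(400))  # stand-in for a 400-token caption file
chunks = truncate_and_chunk(caption_tokens)
print(len(chunks), sum(len(c) for c in chunks))
```

So with a 400-token file, the last 175 tokens never reach the text encoder; put the tags that matter most at the front.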
oh thank you, do you also know by chance what may be causing this blur? I have it in almost all epochs
ya looking at dev branch commits it should be working as of at least 2 days ago in za UI

Kek
I've been testing Dora since it was fixed lel
Honestly, it seems worth the hype
Still figuring that out, they are slightly different than normal lora, so I've been messing around with testing them. I have a very prototype toml I can send you once I get home from work
Don't worry, I know you've been wanting it too lel
Wasn't that before the fix?
Yeah, it was
The fix was 4 days ago
The "way smaller lr" was 100% related to a bug in the code
Considering I still nan'd even with lr 1e-6 lel
bruh
what would be good to try for finetuning?
like not singular topics/concepts but a bunch
First time using it through the backend feature and it works flawlessly, sasuga Derrian
I think printing the local port that uvicorn binds to could be a good idea though unless it confuses the newbs
and maybe a way to change the port through like an ENV var 
Np, lots of work went into making it easy to install and use lel
Oh yeah, that was actually something I was thinking about
I have a config.json, so I can just add that as a config option
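A minimal sketch of how the port could be resolved, assuming an env var wins over config.json and 8000 stays the fallback. The env var name `BACKEND_PORT`, the `port` key, and the helper itself are all hypothetical, not the UI's actual config schema.

```python
import json
import os
from pathlib import Path


def resolve_port(config_path="config.json", env_var="BACKEND_PORT", default=8000):
    """Pick the port: env var first, then config.json, then the default."""
    env_value = os.environ.get(env_var)  # hypothetical variable name
    if env_value is not None:
        return int(env_value)
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text())
        if "port" in config:  # hypothetical config key
            return int(config["port"])
    return default


port = resolve_port()
print(f"backend would bind to 127.0.0.1:{port}")
```

Printing the resolved value at startup would also cover the "what port was it again" case without adding another setting.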

I also need to look into adding more tunnel options too, right now it's ngrok and cloudflared, but I also want to add zrok and localtunnel

Zrok might be annoying though lel
Btw, for most use cases, the user is either using it locally and likely doesn't even realize there is a backend running as a separate process (in which case they use the pre-populated 127.0.0.1:8000), or they are using a tunnel, which doesn't have a port to enter
So printing the port is generally not very useful
yee I just forgor uvi port off the top of my head
Fair
It seems less limited than ngrok, but it also seems like a pain to install
yeah it was a pain, I think I had to re-do it at one point
Either way, glad the backend is working fine for you
I don't even know if it's working now on my vps
Oof
imma be honest I don't even know why I did it
coz I have nginx and shit
do I have dementia
I might have dementia
That bad
yeah I absolutely cant tell why would I ever need it
AH OKAY I GOT IT
nvm
I was seeing if I can tunnel something to the world using zrok like that, from my local pc
Ah I see
would you filter out multi char pics for char loras? I have like one with 500 images but half of the pics have like 2-3 other chars on it. saw u did that duo blue archive green hair train chars shit so maybe u have some insight? 
I usually try not to include art which has multiple characters, unless it explicitly has both of the characters in it, and I explicitly tag both
when I run grabber, I usually make sure to have 1girl, and neg the other characters I am training, so that the images I do get in that search result are explicitly only the expected character
trying lion and rex 
forgot to update
whatever only doing same run as last time
:\LoRA_Easy_Training_Scripts\backend\sd_scripts\venv\lib\site-packages\diffusers\models\attention_processor.py:1259: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)

steps: 1%|▍ | 24/2596 [01:50<3:17:10, 4.60s/it, avr_loss=0.0734]
if i disable gradient checkpointing I'll oom
even tho it only uses 12gb with it on
Huh
Are you training these 3?
Because if so... I have a version of them trained for ponyxl...
Or are you only training ein?
what I do is I have a folder with those
and basically overprune the tags quite a lot
just so the chars are reimu, cirno, pose, location and outfit
and I set lower repeats on those
Interestingly, I do the opposite, I usually make sure all of their tags are there
And it usually works well enough if you have those images in their own folder with less repeats
hmm, I can see that working as well, yeh
I could maybe run some comparisons at some point
just 1
Humu, yeah I would only include images of that one character
Any image that has more than 1 of them is an immediate toss
Actually, I'm the madman that trained all 3 into one lora in 1.x lel
Fair enough
I kept all of my datasets with the expectation that I would be using them again eventually
Oof
I've only trained on pony, so I only have one model to worry about
don't use pony a lot so I train for all the other xl models instead
I'd gen a blank char in pony in the setting I want and then inpaint over it in other models instead
at least for the face
hmm
what card do u train on again
A6000
I don't gen really in general, outside of lora tests, so I just train on what most people are using
how much was it
4k
With checkpointing though... it's 11gb
You use oft?
yea
I didn't bother with it because it didn't actually look like an improvement
To be fair, dora is the first alternative algo I have used
i'll try it later as well 
Locon, and dora when it came out
I havent updated yet
It's only on dev, and it's not a selectable algo
y 
It's a modifier on lycoris locon, loha, and lokr
Because kohaku


hmm
epoch 1 vs 2
why buba
wasnt sure what to adjust for lion and rex scheduler
so I just ran same shit as before 
2e-5 unet and 1e-6 minimum
Do it
dont have 4k now
limbs where
lion huh?
ye
wait
I confused it
with the other one u told me about

FUCK
I didnt see the fucking scroll bar
meh
last epoch sample
Ein has really odd eyes, so I would not expect it to learn perfectly
could be nice if the artstyle was cleaner
or adetailer
need to alternate outfit to be able to send full pic

beta1, beta2, beta3 = group['betas']
ValueError: not enough values to unpack (expected 3, got 2)
3rd one is 0.999?
Consider setting beta3 between 0.9995 and 0.99995 if beta1,beta2 are set to 0.9,0.999.
Beta3 (often denoted as (\beta_3)) is specific to the CAME optimizer.
It determines the confidence level for the adaptive memory update.
I'll just do 0.9995
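The error above is plain tuple unpacking: the optimizer code reads three betas, and an Adam-style pair only supplies two. A small reproduction, using a stand-in function rather than the real CAME class (the function name is made up for illustration):

```python
def unpack_came_betas(betas):
    """Mimic the `beta1, beta2, beta3 = group['betas']` line from the traceback."""
    beta1, beta2, beta3 = betas  # raises ValueError if only two values are given
    return beta1, beta2, beta3


try:
    unpack_came_betas((0.9, 0.999))  # Adam-style pair, one value short
except ValueError as err:
    print(f"two betas: {err}")

print(unpack_came_betas((0.9, 0.999, 0.9995)))  # three values unpack fine
```

So either pass all three betas or drop the betas arg entirely and let the optimizer use its own defaults.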
similar memory use
11,800mb
I just use the default betas for came
I had a feeling you would say that
it didnt change for me
I mean, I just deleted the arg from the optimizer args
ah
and use whatever is packed into came
oh I don't use diag oft, so I can't give you any params
I just use whatever 
I meant if u train te for diag oft it wont work in webui
so gotta disable it anyways
oh yea its not even close
came diag oft epoch 1
1girl, ein \(blue archive\), blue archive, by quasarcake, [by ssambatea], solo, black background, simple background, best quality, masterpiece, absurdres <lora:ein3-000001:1>
bruh this artist tag forces muscle bodies
eh, so close on the eyes
gonna check tmrw which one gets the eye right the most
using artist tags lel
yeah, nah, it's definitely me that's the weird one for absolutely refusing to use artist tags
maybe, i dont realy give a shit personally, I'll do almost anything to get good looking images
if I really cared about artist tags I wouldn't even be using ai in the first place since the entire thing is a mix of artist tags
sec
well, I should say, I don't use artist tags when testing lora
oh yeah
but then, I don't gen outside of testing lora, so I never use artist tags

I just enjoy training
heh
it is nice being able to do batch 2 with a dora without checkpointing
a solid 41gb of vram when not training te
dunno dont even play games much so its not like the resources are needed elsewhere
but I couldnt gen while training or do llm
when i'm not training, i'm working on the UI, and when i'm not working on the UI, then i'm playing games with friends
how is it that much
I dunno, dora just uses a ton of vram, but somehow compresses down to 11gb with checkpointing
I don't make the rules
oh u added a toggle button for dora?
wait does higher batch size mean slower train
It can if you have to use a bunch of grad acc and checkpointing, but usually it's faster
On dev, yeah
didn't see it so probably didn't update properly
It's just a checkbox on the network args I believe
You aren't on dev lel
Either that or you aren't on the newest dev
`45bf41ff988e99304cbe716ad1642571288712e4 branch 'dev' of https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
f595d1e5430e1037688cf1d09efa3c0733840c1f not-for-merge branch 'main' of https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
9855d2ee1a2ea9ed82df0b35cb17a6f723ff5365 not-for-merge branch 'old-scripts' of https://github.com/derrian-distro/LoRA_Easy_Training_Scripts`

I used the update bat
i'll just redo it
If you are swapping from main branch you have to do a complete reinstall
I'm on dev
D:\LoRA_Easy_Training_Scripts dev git status
On branch dev
Your branch is up to date with 'origin/dev'.
no
I just cloned it at that point
and then used update bat
in backend and in the base folder
You only need to touch the base folder, and the update folder expects you to have already run the install.bat
(In the root folder, it handles both)
nice
Yeah, you shouldn't ever need to go into the backend
I only did to use a different torch version and just saw there was also an update bat there so I ran it

It installs torch 2.2.1 I think, which is only one version behind
yeah this was back before 2.2.0
And I think I swapped it to using cu121 at some point too
I think I might change the update files to just nuke the venvs and reinstall from scratch everytime
Easier than having people ask questions
does it come up often
Enough times I have people who don't actually run the installer before running the updater
Or they installed everything then realized they weren't on dev
I mean it kinda makes sense that u only run the installer when you're first setting up so from that angle I get it 
I still have a ton of work to do on the UI before I want to move it to main
But time makes it so my work is somewhat sporadic
Yes lel
kek
FAMST
welp that didn't work 
What exactly were you trying to do?
wanted to try if boft would work since it seemed to use the same parameters as diag oft but I blew it up 
can I just type the names of these for lycoris preset in the preset box?
https://github.com/KohakuBlueleaf/LyCORIS/blob/main/lycoris/config.py

tried but didnt do much
oh well done for now, was worth a try either way
oh yeah, it will be fine with those
#1198937785961291808 message
ok it works after turning off rescale
I don't have access to that server
ah
its just the lycoris server
but its not relevant anymore, since it works
seems like boft is more prone to pick up style with same settings compared to oft
but tbh i'M not sure what settings are good for it, 0 info pretty much anywhere
thats the downside of brand new or not much used stuff
ok epoch 1 of the boft is really fucking good

lmao char is just defaulting to nude with empty prompt, based
boft deez nut
Ah I see, yeah I'm not in that server because I don't actually use lycoris stuff often, at least not until dora came out
bofa?
what is bofa
Ligma, ligma what you ask? Bofa, bofa what you ask? Sugma, sugma what you ask?
uhh
I think I know why boft took so long
just 10gb worth of checkpoints
and I only saved every 2 epochs
oft with same settings was 92mb
the fuck
4x size
that's not getting added until it's in kohya
i'm not touching that with a ten foot pole
(i've already tried to implement it)
fair
Factor?
It's worth it compared to full finetune
But I don't think it's a setting for just lokr
They're just calling it a different name than what's in the UI probably
It's what was used for the kohaku models so that's enough for a result for me to try 
hmm, I guess we look for quality in different ways, because I don't really consider those models quality
Yeah I don't consider pony quality either yet here we are 
neither me, but at least it's trainable
big reason why I don't use animagine is that it's annoying to train
What's wrong with animagine training
it's just really annoying to train on
it's too "fiddily" I guess
not very easy to get good results from it
I wonder if kohaku fixed oft and boft training
time to try
wtf didnt realize there was an option for textual inversion
Yeah, some random furry pr'd it a while ago now to add support for it
I just decided to clean it up a bit and add it in
I'm only on 3.0.0dev4 so it's not the most up to date version
I need to get around to updating everything
I updated to latest and it worked fine
Didn't get to test though, ran a setup I know worked, undertrained though 
D:\LoRA_Easy fork\LoRA_Easy_Training_Scripts\backend\sd_scripts\venv\lib\site-packages\torch\utils\checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
steps: 0%| | 5/3456 [01:30<17:15:28, 18.00s/it, avr_loss=0.0604]
is that warning okay
wanted to try different setup
batch size 4 accumulation step 2 128 epochs is not the way to go for boft

actually not sure, maybe more effect than last time but also changed other settings 
dataset has some really pretty pictures tho
can u fit in the option for boft sometime? I think it just needs an entry in the algo list and ui and that's about it 
Probably can, yeah. I also have another arg I need to add at the request of somebody else anyways. I might actually add another drop down to allow adding of random args people might want to use but doesn't make sense anywhere in the UI
BOFT is added to the algo list
cool 
Today I learned REX uses d=0.9 by default and not 0.5 as noted in the paper 
Why was 0.9 chosen as the default rather than 0.5? Was it personal preference 
I also noticed that d isn't a parameter that you can adjust in scheduler args
perhaps i'll do a pull request for it to be allowed 
d is a scheduler parameter for the custom scheduler REX
it affects how aggressive the curve fall-off is (or not aggressive?)
do u have a link? I couldnt find it
Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning r...
The effects of d on the curve of the scheduler. The paper sets d=0.5
I'm also just speculating, but I think the REX scheduler works because the overall learning rate is higher, and that at later steps, it's at such a low learning rate that it literally does nothing to the model.
So perhaps derrian chose d=0.9 to aggressively cut off the useless and low learning rate that isn't used, and that d=0.5 isn't enough. I personally chose d=0.7. I might pull request for my changes to allow users to set d values as scheduler args.
It was the default in the implementation that kohya didn't accept the pr for, I didn't read the paper, and it worked fine, so I never bothered to change it
I actually didn't do any number crunching or theorycrafting for this one. I was asked to implement X version, so I just did lel
I actually couldn't quite figure out what d was doing from the code alone, so I didn't touch it
I already have plans on updating the UI, I'll probably do that this weekend
d is how biased the scheduler is towards enforcing higher LR towards the end/middle of learning.
first image is when d=0
second image is when d=0.5
third image is when d=0.9
implemented default is 0.9, paper default is 0.5. I think something around 0.7 (fourth image) is maybe a bit better, but i idk
when I get the chance I'll make the changes and make a PR. it just exposes d to the init function of the scheduler.
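For anyone without the images, here is a sketch of the curve with d exposed as a parameter. It assumes the multiplier has the form z / ((1 - d) + d·z) with z = 1 - step/total, which reduces to the paper's schedule at d=0.5 and to a linear decay at d=0; whether the scheduler in the UI is written exactly this way is an assumption, and min lr and warmup are left out. The function name is made up for illustration.

```python
def rex_factor(progress, d=0.9):
    """LR multiplier at `progress` in [0, 1] for a REX-style curve (0 <= d < 1)."""
    z = 1.0 - progress  # fraction of training still remaining
    return z / ((1.0 - d) + d * z)


for d in (0.0, 0.5, 0.7, 0.9):
    row = ", ".join(f"{rex_factor(p, d):.3f}" for p in (0.0, 0.25, 0.5, 0.75, 1.0))
    print(f"d={d}: {row}")
```

Higher d keeps the multiplier near 1 for longer and pushes the drop towards the very end, which is the difference between the 0.5 and 0.9 plots.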
No need, I'll be working on the UI today, so I can add it as an arg
But I've had no issues with 0.9
It works very well
cool beans, thank you 
0.9 is probably still a good value at medium-low learning rates
Well, it also works nicely for longer training runs
though if you use a rather high lr, it never decreases enough to learn and finalize the bake
i think 0.9 has its uses, yes
I train at 7e-5 normally
i did a few trainings at 1e-3 
I would never think of using that high of an lr even when I wasn't using rex lel
i was trying out a very specific technique of copier loras
essentially, you overfit a lora on exactly one image, then merge it, and train a new lora on the difference from the merge
for style training... it's a mixed bag, but i've seen it work for consistent image filters (e.g. greyscale)
Yeah, I know of them. I thought the lr for them was smaller
they can be, i think it's just a longer train to overfit. i'm sure that high of an LR has implications on the final lora even if it's just to lock image generations
i'm still trying to wrap my head around it... namely when training the second LoRA; if it should have exactly the same params as the initial train or train as if it's a normal LoRA
or if, again, that high of an LR on the initial copier lora has implications on the final result
i'll have to formally run a trial to test this
nice, it's weekend from tomorrow 
It'll be the weekend starting in an hour or two for me
So I'll be working on it soon

Libadwaita update?
what the fuck is that
Just Gnome’s GTK 4 layer
No?
And I have done the Libadwaita joke to a few Windows users
oh, alright then
libtard waiter
Yes I forgot you were delayed in French sorry

Was debias estimation implemented yet?
Extra args
Basically, any arg I didn't know where to place, or didn't feel needed to be a default value got shoved to the extra args, basically just put the arg and the variable in and it will load it into the args
👍
Xformers doesn't, no
You have to use sdpa
I'll just use that then 
I don't think I have too much of a diff between xformers and sdpa in terms of performance, idk
there isn't
enable fp8 training.
Traceback (most recent call last):
File "D:\LoRA_Easy_Training_Scripts\backend\sd_scripts\flux_train_network.py", line 408, in <module>
trainer.train(args)
File "D:\LoRA_Easy_Training_Scripts\backend\sd_scripts\train_network.py", line 594, in train
t_enc.text_model.embeddings.requires_grad_(True)
File "D:\LoRA_Easy_Training_Scripts\backend\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1729, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'T5EncoderModel' object has no attribute 'text_model'
the watface

guess I'll try a different one
yeah, you need the T5xxl_fp16 from SD3 medium
oof 9gb
nvm
just gonna disable it, would be a tight fit anyways

maybe my config from before is overwriting something
gonna go through a bunch of settings and barebones
ok I forgot to change main args
was on boft
do you need to load t5xxl even when you disable te training
ok yeah thats not it
happens even when I use the sd3 one
but I checked and you're already up to date with that pr I linked so I don't get it
apparently a workaround is cache te outputs but then I get another error 
yeah not sure
alright got a working setup I think
https://github.com/bmaltais/kohya_ss/issues/2701 was reading this

oof
this time it will work 
ok it went back to optimizer got empty param list
after I removed relative_step=False scale_parameter=False warmup_init=False
so it's probably missing a setting
or I need to try the other optimizers
are you using came? came doesn't have those args
I was using came for my tests, so I know it works
I don't get why it says that then 
no clue
do you have any of your test configs?
ok I got it I think
got further
Traceback (most recent call last):
File "D:\LoRA_Easy_Training_Scripts\backend\sd_scripts\flux_train_network.py", line 408, in <module>
trainer.train(args)
File "D:\LoRA_Easy_Training_Scripts\backend\sd_scripts\train_network.py", line 1099, in train
noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
File "D:\LoRA_Easy_Training_Scripts\backend\sd_scripts\flux_train_network.py", line 323, in get_noise_pred_and_target
assert network.train_blocks == "single", "train_blocks must be single for split mode"
AssertionError: train_blocks must be single for split mode
prob most of the problems were related to trying to use lycoris modes
That should be handled by my UI
Also yes, lycoris is completely not supported
yeah that was on lora

disabled split mode and now I'm stuck at 24gb
Yeah, that's related to the split_mode arg
There was a bug related to it like 4 hours ago
But I fixed that quickly
Well, to be fair, it's experimental for the biggest reason of it can break at any moment and I've no clue why
I'll give you a config that worked for me when I get home and finish eating
Keep in mind you'll have to tweak it
Because it's built for 48gb of vram
pokemon
yeah
No good
uh
Load the bf16 version directly from the flux repo
I get to this point with both fp8 and bf16 ver
and both give this error
I'll change huber to something else
alright it's running
changed loss to l2
oh yeah, l2 loss only
changed to batch size 2 since didn't make much of a difference being on 1

shouldve put saving per steps instead of epoch with 2k images but whateva

oh yeah, this happened to me
the defaults are really bad lel
best to setup the flux args to be something close to this
np
was running this lul
wonder if I can speed it up compared to what I did last time
was like 30 minutes for 100 steps
huh, my files and sample images aren't saving to the designated folder/wandb and I don't see any warnings or error messages 
Odd, things save to the expected folders for me
yeah just randomly happened 
hopefully not on 2nd epoch
welp
ggs
dunno how that happened
-1 hour I guess 
!configs?
so long as kohaku hasn't updated lycoris for kohya, there is nothing for me to add
Make easy training a multi backend project
I would love to see how you would do that
you turn it into swarm
i'm not learning C#

too java like
is this factor thing a setting somewhere?
fuq it düd
ok yeahh it doesnt matter what I set it to
as long as factor is -1
that's a big dim 👀
That is new, I'm pretty sure, and considering I don't use lokr at all, I didn't notice the error
not just lokr, I see it in others too
So long as it isn't locon or lora, I haven't used it at all to know something was wrong lel
(And clearly nobody else has either, considering my absolute lack of any sort of bug report)
I did ask once before 
Gonna be honest, I tend to forget if it's not made an issue on the github
I didn't know it was an issue 
I never tried lokr either, it only came up in boft warnings before
I still don't know if it's an actual customizable option, but from reading the lycoris server it seems to come up fairly often when talking about settings for networks
and it doesn't make sense that lokr is locked at -1
it gives a 10mb file even if I set dim to 10000
ok I thought there were only like 30 new teri pics that I got from boorus
but on cn forums there's like 3 times that in just a month

I have no clue who that is or where they come from
issue or not issue
On most git hosts, making an issue helps track suggestions, bugs and other stuff
I think i've seen some art of her. I definitely dont know her though
lost mask only to become zorro again 
turned up lr and dim a bit but maybe it's too much, quality is ass now 
overtraining now, making the plushies without even prompting 
overbaked sadge
@magic peak 

Is it hard to set up remote use?
no, just install the backend on the remote server + a tunnel if you need one (it has 2 to choose from, but you can just use your own if you prefer) and then copy paste the url to the frontend
(which is installed on your computer)
Pog
what gpu do u use for training flux?
3090 in my pc
how many it/s do u get on that?
depends on the setup 
I think 3 hours for 200 steps
with 23gb
vram use
again depends on setup
oh well i was looking for some other settings to speed it up on a 3060 12gb but idk if 10 seconds per it is the fastest u can get on kohya using adafactor/constant
not really a good metric to use for comparisons unless you compare against someone running the same exact settings
what's your vram usage?
8.9 peak
try telling ai your exact settings with that amount of vram and ask what to increase to get more speed out of it
do u think increasing resolution of 512 vs 1024 or batch size would increase speed? or maybe switching to lion?
worth a try
all I know is that increasing from 512 x768 to 1024
made mine use 10gb more vram
for some reason
but when u trained loras for sdxl/pony do u notice that training speed on flux is 2x slower or 3x times slower?
well the models are bigger
its not much slower for me, just uses more vram. I was used to 3 hours~ for sdxl
well ill just wait maybe they will optimize scripts later
I had triton for it for a while, never actually worked, so I won't be adding this
I mean it's just a package in the end, so people can install it themselves, if it ever gets onto pypi then I'll think about it
I don't want to manage auto downloading the correct wheel
Especially because the updates will continue to change the names
just linked cuz saw it
not that you have to add it
how do the samples work if you train lora on vpred model?
no clue, that's definitely a kohya question

left mutsuki dress prompt


