#LoRA Easy Training Scripts Redux
yeah tried both 
I don't think you should use gradient checkpointing with SD1.5
but you could give it a try.
was initially using python 3.11 and i even changed it to 3.10
thinking maybe that was it
I think it starts when I Enable it but it takes like 10 hours
and I take you have swap enabled and you're not getting OOM for ram, right?
yea i have swap
this is about the speed I got with my 6800xt
what's the diff between running fp16 and bf16?
precision, but afaik RDNA2 cards can't do bf16
I read somewhere that
the default memory attention setting
after torch 2.0.0 uses sdp
or whatever the fuck
iirc you need a MI100 or a MI200 to get BFLOAT with AMD rn
dunno if that was a hoax but I know that sdp ooms compared to other stuff in sd
but that would explain the dumb vram usage
sdp should work with RDNA2
it works but it just ooms out
compared to invokeai or doggetex or whatever else option you can select
Like
`[00:02<00:00, 9.03it/s] 512x512 doggetx
A: 3.08 GB, R: 3.56 GB, Sys: 4.1/15.9844 GB (25.5%)
[00:22<00:00, 1.07it/s] 2x esrgan .45
A: 10.28 GB, R: 14.72 GB, Sys: 15.3/15.9844 GB (95.8%)
[00:05<00:00, 3.69it/s] 544x960
A: 6.12 GB, R: 8.53 GB, Sys: 9.2/15.9844 GB (57.3%)
[01:03<00:00, 2.76s/it] 2x esrgan .45
A: 10.49 GB, R: 14.91 GB, Sys: 15.6/15.9844 GB (97.4%)
[00:02<00:00, 9.07it/s] sdp
A: 3.08 GB, R: 3.56 GB, Sys: 4.1/15.9844 GB (25.5%)
OutOfMemoryError: HIP out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 15.98 GiB of which 4.73 GiB is free.
again
[00:02<00:00, 9.17it/s] invoke 512x512
A: 3.08 GB, R: 3.56 GB, Sys: 4.1/15.9844 GB (25.5%)
[00:27<00:00, 1.16s/it] 2x esrgan .45
A: 10.22 GB, R: 14.72 GB, Sys: 15.3/15.9844 GB (95.8%)
[00:05<00:00, 3.71it/s] 544x960
A: 6.11 GB, R: 8.53 GB, Sys: 9.2/15.9844 GB (57.3%)
[01:00<00:00, 2.62s/it] 2x esrgan .45
A: 10.75 GB, R: 12.55 GB, Sys: 13.2/15.9844 GB (82.6%)
[00:02<00:00, 7.05it/s] sub quad 512x512
A: 2.48 GB, R: 2.94 GB, Sys: 3.5/15.9844 GB (21.6%)
[00:48<00:00, 2.19s/it] 2x esrgan .45
A: 3.85 GB, R: 5.33 GB, Sys: 5.9/15.9844 GB (37.0%)
[00:07<00:00, 2.65it/s] 544x960
A: 2.93 GB, R: 3.82 GB, Sys: 4.4/15.9844 GB (27.8%)
[02:13<00:00, 6.23s/it] 2x esrgan .45
A: 5.67 GB, R: 7.75 GB, Sys: 8.4/15.9844 GB (52.7%)`
sub quad still king for rdna2 lol
yea half ram usage is nuts
oh not bad, i could start a 768 run
pc pretty much unusable tho
and if it spikes higher than 200 mb I crash
yeah just trying stuff out
on the colab trainer I've been doing 1024 runs sometimes

well, nvidia, can't do much there
most folks chose rdna2 due to energy consumption vs ampere
but almost gave up AI in the process lol
yeah I got it before I got into ai stuff
had a titan xp before and was like amd is better fuck nvidia
and instead I got fucked 
honestly
if you have the money, just rent some 3090 in vast.ai
it should be cheaper than paying the energy bill for your 6800xt
it's not that bad now I guess, maybe if that 1 not even an amd employee guy finishes the miopen stuff I can finally use all this stuff on windows
well not the sd part, or is it
I know koboldcpp has a rocm port but the other important ai stuff didn't get ported over yet
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/9512 according to this.
it should be running
slow, but running
I think this is directml
didn't bother reading it, sorry.
yeah
you can test the loras in your pc
while it trains
and it's quite cheap, like 0.25$ hour
same stuff in tensordock iirc.
https://colab.research.google.com/github/hollowstrawberry/kohya-colab/blob/main/Lora_Trainer.ipynb? I've just been using this before, takes like 2 hours for 2k steps on 768-1024 tho so not bad tbh, just wanted to test if it had an issue with regularization image usage compared to derrian stuff
i borked a lora so hard it looks cooked even at epoch 1 0.6
result of less than perfect ai gen dataset
hay hair

Someone told me to upgrade WebUI to 1.6.0 (I have 1.1.0) if I want to use my trained LoRAs properly with his scripts (I'm using old version)
How do I do that besides git pulling?
the answer is so obvious that its not even worth answering....
I'm sorry but I'm not aware of it, sorry if I sound like a noob
u have to git pull regardless
I did but it tells me already up to date
dont see the reason for not updating
Also I use a custom frontend for WebUI
Not the vanilla one, I replaced it with a better one
because the version is over 1000 commits behind
easy way is to git reset. or even better. nuke your venv
git reset --hard
Will I lose everything by doing that?
your lora directory will be fine
its just resetting everything so that you can update to the new commit
so yes. you will lose your custom frontend
you don't lose anything you need
used pillow to add white bg
should I consolidate an outfit to a single tag if I dont want it to be modular?
I ended up git cloning original commit of WebUI and pasting it over my personal one instead of git resetting like someone told me
I hear some people do that, yeah
Seems webui-ux is 665 commits behind AUTOMATIC1111:master
That might explain why it can't update further than 1.1.0
I didn't like vanilla one since I couldn't zoom in the pictures after I generate 'em
Also on mobile it's kind of worse to navigate with
Since some sliders are sensitive and move even when scrolling if you touch one accidentally
ui-ux already came with zoom feature
Indeed
It works but I get an XFormers warning
WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for: PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.1.0+cu121) Python 3.10.11 (you have 3.10.9) Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers) Memory-efficient attention, SwiGLU, sparse and more won't be available. Set XFORMERS_MORE_DETAILS=1 for more details
@trail tree
What's the version I need to install for PyTorch 2.1.0?
But I do like its new tabs for choosing LoRAs and hypernetworks instantly
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
or
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
no clue, I have amd, maybe ask in ai art
also the default setup should work
random 124 124/2x2x17 2108
magical 77 77/2x3x17 1963
casual 32 32/2x7x17 1904
herrscher 17 17/2x14x17 2023
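The four lines above read as `name, image count, images/batch x repeats x epochs, total steps`. A minimal sketch of that arithmetic, assuming batch size 2 and 17 epochs as in the pasted numbers (the function name is made up here, and the real trainer may round per bucket, so treat the result as an estimate):

```python
def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Rough step count for one subset: images / batch * repeats * epochs, floored."""
    return int(images / batch_size * repeats * epochs)


# (image count, repeats) per subset, copied from the chat lines above
subsets = {
    "random": (124, 2),
    "magical": (77, 3),
    "casual": (32, 7),
    "herrscher": (17, 14),
}

for name, (images, repeats) in subsets.items():
    print(name, total_steps(images, repeats, epochs=17, batch_size=2))
```

The repeats were picked so every subset lands near 2000 steps, which is why the 17-image folder ends up at 14 repeats.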
hmm
oh well gonna try this
if it's burned I'll leave all on 2 repeats (it was burned)
Seems too high both for U-net and TE
besides, you're only doing 3 restarts over 17 epochs, it might stay at too high of a LR for too long
for U-net use something like 2e-4 instead of 5e-4 and 5e-5 for the TE
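To see why fewer restarts can mean long stretches at a high LR, here is a rough sketch of a cosine schedule with warmup and restarts. This is a generic textbook version, not the actual `CosineAnnealingWarmupRestarts` code from the scripts; the function names, the 2000 step total, and the "above half of max LR" threshold are all assumptions for illustration.

```python
import math


def lr_at(step, total_steps, cycles, max_lr, min_lr=0.0, warmup_ratio=0.05):
    """Learning rate at `step` for a cosine schedule that restarts `cycles` times."""
    cycle_len = total_steps // cycles
    pos = step % cycle_len                      # position inside the current cycle
    warmup = int(cycle_len * warmup_ratio)
    if pos < warmup:                            # linear warmup at the start of each cycle
        return min_lr + (max_lr - min_lr) * pos / warmup
    progress = (pos - warmup) / (cycle_len - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


def longest_high_lr_run(total_steps, cycles, max_lr, threshold_ratio=0.5):
    """Longest run of consecutive steps with LR above threshold_ratio * max_lr."""
    longest = current = 0
    for step in range(total_steps):
        if lr_at(step, total_steps, cycles, max_lr) > threshold_ratio * max_lr:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


for cycles in (3, 6):
    print(cycles, "cycles:", longest_high_lr_run(2000, cycles, max_lr=2e-4), "steps in a row above half LR")
```

More cycles does not lower the peak, it just shortens each stretch spent near it, so a lower `unet_lr` is still the more direct fix if epoch 1 is already cooked.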
so basically I just change it to prodigy or lower steps
I only made it until the 9th epoch
and first one was already overfit
I have to lower steps either way
cuz I can only train for 4 hours
and 4 hours is only enough for 9 epochs under these settings
ok I'll try higher restarts and your learning rate
tried 2e-4 still overcooked on epoch1
no, just
overfit
even with prodigy
14 repeats too high for 17 images
trying this now
other than that if nothing works I'm breaking up the subsets and doing combined, easier to keep track of
well wait I only just saw that you're doing multi outfit
so there's 3 actual outfits and a bunch of random shit?
bring the random down a bit if possible, but make it 1 repeat
make magical 1 repeat as well
casual can be 2-3
herrscher can be 4-5 yeah
your epochs just seem too big but also since every single dataset is learning her face
I did combined before just wanted to see how it goes if I take it apart
you don't need as high repeats
make sure all images are quality and varied
if you have some that are too similar just choose to yeet 1
is it gonna die even if it has 1-2 that are similar
but you can even drop dim/alpha and unet lr to idk 8e-5
well it's not gonna die usually from that but multi char/outfit loras are finicky
I'd keep the wide one
just out of the principle that most of dan is taller rather than wider
when it's 1girl
5Head
idk if that's an option in holo's trainer or linn's but since you have 4 datasets you're doing 600 steps (or 300 post batch division) per epoch
as in derrian's you can save every x steps
I'd combine them
meh
how good is the compression on the wide one?
if it's still overcooked and since it's already learning her face and stuff I'd drop the repeats even further
because the face and heterochromia and shit I bet is shared right
nvm, no heterochromia
tall is 1080x2340 wide is 1080p
the scripts will handle cropping and stuff
yeah so that can add up
at least its not implemented
it always handles different resolutions through either cropping or bucketing
wdym
you dont need to manually crop stuff unless you're doing very specific shit
bucketing should be in there right?
like eye loras
there is bucketing
yeah you should be fine
yeah so that will handle stuff for ya
just get high quality stuff
she doesn't have a whole lot of pictures on dan from what I see but just make sure that every picture is something that you'd say is nice
as in 212 dan pictures should still allow you to pick and choose
I finally accomplished what I did in April
I finally nailed a character using settings from April with previous LoRAs
This time I used Prodigy instead of DAdaptation as optimizer
Hands look off tho
It was trained with 7880 steps
Previously three folders (kuro, cocktail_observer and spirit_trap) had 4, 3 & 5 repeats respectively which trained at 4700 steps
Which were values Holo suggested to me
With this recent training I increased repeats of those three folders to 8, 7 & 9 respectively again, which is why it ended up with 7880 steps
Tomorrow I'll retrain her again, this time with those repeat values for those 3 folders Holo suggested to me to see if anything changes
bucket 0: resolution (576, 1280), count: 4
bucket 1: resolution (640, 1280), count: 2
bucket 2: resolution (704, 1280), count: 29
bucket 3: resolution (768, 1280), count: 13
bucket 4: resolution (832, 1216), count: 25
bucket 5: resolution (896, 1152), count: 19
bucket 6: resolution (960, 1088), count: 16
bucket 7: resolution (1024, 1024), count: 37
bucket 8: resolution (1088, 960), count: 16
bucket 9: resolution (1152, 896), count: 11
bucket 10: resolution (1216, 832), count: 7
bucket 11: resolution (1280, 512), count: 2
bucket 12: resolution (1280, 576), count: 2
bucket 13: resolution (1280, 640), count: 2
bucket 14: resolution (1280, 704), count: 105
bucket 15: resolution (1280, 768), count: 9
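The bucket list above is consistent with buckets built in 64 px steps under a 1024x1024 area cap, with sides between 512 and 1280. A hedged sketch of that idea follows; it is written from scratch for illustration, the limits are guessed from the log, and the names `make_buckets` and `pick_bucket` are not the trainer's own.

```python
def make_buckets(max_area=1024 * 1024, min_size=512, max_size=1280, step=64):
    """Enumerate (width, height) buckets whose area stays at or under max_area."""
    buckets = set()
    width = min_size
    while width <= max_size:
        # tallest height (multiple of `step`) that keeps the area under the cap
        height = min(max_size, (max_area // width) // step * step)
        if height >= min_size:
            buckets.add((width, height))
            buckets.add((height, width))        # mirrored landscape/portrait bucket
        width += step
    return sorted(buckets)


def pick_bucket(width, height, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


buckets = make_buckets()
print(pick_bucket(1920, 1080, buckets))   # a 1080p screenshot
print(pick_bucket(1080, 2340, buckets))   # a tall phone screenshot
```

Under these assumptions every 1080p image goes to the same landscape bucket, which would fit one bucket in the list holding 105 images.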
big buckets
still too much 
gonna put all of it in 1 folder like before
not really benefitting from this
the last one where I didn't split it into folders was pretty good I just had 1 issue that I wanted to see if this would fix
honestly I'd say, 1 repeat and just yolo it via more epochs
if you find something decent during those epochs
just resume from it
and iterate over it
This is anecdotal
but I've had more issues when I separate outfits into folders and then add repeats over it
either the outfits bleed into each other
or something from one outfit overrides stuff in others
I tried it only once before, but there the reason it sucked was cuz the dataset was shit so dunno, here it's not even bad and still getting ass results 
do you mind sharing the dataset and allow me or someone else to check it and try it?
the only thing that changed is that I added like 80 3d images and around 10 ai gens compared to when I trained on combined
but I've done stuff with 3d images before and it didn't make the result worse
so dont think it's the dataset
unless it's the fact that bucketing splits it into like 15
didn't change tags either, same as the ones that I made these pics with (outside of the outfit specific folder tags) 
and 5e-5 for TE
this was the last good one
and use the epochs as checkpoints
for your training
if you find one at that LR
then resume from that epoch
instead of starting from scratch
eh
why 1024x1024 and nai?
again, anecdotal but I have had tons of issues when I go that high
cuz wanna learn these
oh hov sirin's pupils
as for nai, what else? 
I mean
you don't need to use that high resolution with nai
768x768 will do just fine
yeah that's what I usually do, unless theres some small detail that I wanna learn, didn't really compare 768 to 1024 for those tho so can't say if 1024 got it better
this is again, anecdotal
but for this
I trained her eyes as a separate concept
hakari_eye
and in there I put closeups of her face, and closeups of her eyes
at 768x768 I'd like to believe that it got quite close to the anime style
nah, just for these parts
that is a first for me ngl
but I don't think the u-net nor the TE can "understand" where the brooch goes
without a full picture
I have full pictures as well
specially if you don't have pictures with and without the brooch.
just added some closeups
otherwise it only generated shit like this
Why did the nsfw filter turn off on its own?
I wanna bleach my eyes
no idea, ngl.
back to that hakari lora I made, her hair ornaments, I trained them as something else
hakari_flower iirc
well its kinda the same thing just tagging something as brooch if it doesnt actually know what it is no?
but the first token for the brooch is... brooch or is it something like
"sirin_brooch"
I mean the brooch pics
yea I did sirin, brooch only on the closeups
theres no way I can do inpaint on eyes on non hires images
well the
you have a neck, collarbone, and a brooch
its just a cutout of existing images
"sirin" isn't there.
so it saw the full
my point being, you might find more success with it if you use a token for it
tried anyways
I'll try that approach next time
thanks 
good enough 
for hakari and her eyes
I put the special token
hakari_eye in images that had close ups
and the eyes were detailed enough
but not as the first token
since the folder 1_hakari_eye had it as the first token
can only do so much on 544x960
you know what, let me go and start the pc and share the dataset with you
that might give you more ideas lol
that would be pog
thanks
altho going to bed in a bit
so I'll check it tmrw
does mess up the halterneck tho
I think its cuz
on booru tagged images
brooch is on a
bowtie
like a jewel
yeah I dont like that it takes away the halterneck
so you either describe it down to every detail, or just use a special token for it
well its just a small detail
and it doesnt look like the actual thing anyways
got the pm
gotcha
ok turns out
the problem is the dataset
somehow
top is dataset with ai and 3d pics, bottom is only drawings
I'll remove the ai images
oh and 2nd one is 15 epochs and 15th still looks more normal than epoch 1 of the first row
like
it even plateaus out without being burned
I guess its probably the fact that I added like 80 3d screenshots and all white bg 1080p
ok maybe not
I did one where I removed the 3d images but kept the ai images and it looks worse than the ones where I removed the ai images but kept the 3d ones
ai, no3d
no ai, 3d
bruh why do so few ai images have such a strong effect
maybe it's not "clean" enough or dunno
like its too rough
would need to find a model that has a "general" look
maybe anylora or whateverthefuck
ok so it s not bad this time
but why does it get the fucking hair ornament so bad
I'll try training on amedira once
fu
maybe I can train for longer
doesnt really look cooked at 11
and the outfit is still not perfect
tomorrow last day of training this, I either figure out the hair ornament or work around it with other prompts somehow
most likely the model I'm using is the one at fault for the multi hair shit
yea its not the loras fault
@foggy comet think last training went alright, but I couldn't get rid of the hairband fuckery completely, still kinda rng, in the end I just went with 2 2 2 1 repeats and 2e-4, 2k steps 15 epochs
but I think it's less training related and more just the models and overall tag, see without the lora
nyaruhodo

ok back to the drawing board
I'm including crops of both only hairband and only hair ornament
and see if that helps
after that I'll start replacing these 2 tags with made up shit
@magic peak Is there a way to automatically convert a kohya json config to a easy training toml config?
no, there isn't, the toml files I create for loading and saving are fundamentally different from the kohya_ss way of doing it, and I'm not willing to put the effort into it
ok
I just tried to brute force it by putting all the arguments under each subset but then you run into duplicate key errors and stuff.
well yeah, they are fundamentally incompatible
I tried that but it wouldn't really understand which argument goes into which subset.
sadge
@magic peak yoyo
AttributeError: partially initialized module 'triton' has no attribute '_C' (most likely due to a circular import)
Failed to train because of error:
is there a fix for this on windows? or not needed?
I installed this but seems like it didn't work
pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl
Not needed, and triton isnt available on windows
ah
print("installing xformers")
if reply in {"2", "1"}:
    xformers = "xformers==0.0.20"
else:
    xformers = "https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl"
subprocess.check_call(f"{python} install -U -I --no-deps {xformers}".split(" "))
# if reply in {"1", "2"}:
#     reply = None
#     while reply not in ("y", "n"):
#         reply = input(
#             "Do you want to install the triton built for torch 2? (y/n): "
#         ).casefold()
#     if reply == "y":
#         subprocess.check_call(
#             f"{python} install -U -I --no-deps {os.path.join('..', 'installables', 'triton-2.0.0-cp310-cp310-win_amd64.whl')}".split(
#                 " "
#             )
#         )
yeah
In the newest version, that section is actually just gone lel

I tried that one version but it said something about circular imports etc
and I know nothing about that so

and I dont even know what it does anyways
glad it was a lot easier to get to work than on rocm 
what's a normal speed for 1024x1024 sdxl training on 3090?
steps: 0%|▏ | 9/2563 [02:19<11:01:38, 15.54s/it, loss=0.112] 
ahh its cuz I'm maxed out on memory
`[general_args.args]
pretrained_model_name_or_path = "D:/stable-diffusion-webui/models/Stable-diffusion/reproductionSDXL_v87.safetensors"
mixed_precision = "bf16"
seed = 218
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
max_token_length = 225
prior_loss_weight = 1.0
sdxl = true
xformers = true
cache_latents = true
max_train_epochs = 11
vae = "D:/stable-diffusion-webui/models/VAE/sdxl_vae.safetensors"
[general_args.dataset_args]
resolution = [ 1024, 1024,]
batch_size = 2
[network_args.args]
network_dim = 8
network_alpha = 1.0
min_timestep = 0
max_timestep = 1000
[optimizer_args.args]
optimizer_type = "AdamW8bit"
lr_scheduler = "cosine"
learning_rate = 0.0005
max_grad_norm = 1.0
lr_scheduler_type = "LoraEasyCustomOptimizer.CustomOptimizers.CosineAnnealingWarmupRestarts"
lr_scheduler_num_cycles = 3
unet_lr = 0.0005
text_encoder_lr = 0.0001
warmup_ratio = 0.05
min_snr_gamma = 5`
jesus
think I'm getting 24gb ram usage cuz of no triton?
batch 1 is 20gb
guess I'll just do batch 1
dunno how ppl are doing training with 8gb vram tho
I was able to get, without triton, just over 8gb vram usage for sdxl training. Though I definitely haven't tried again since
But the triton wheel was causing issues on windows, so I got rid of it.
Oh, I think I see. You are training the TE right?
Not caching latents or the te outputs and training te will do that
Yep, considering there is 2 of them
for now I'm finishing this batch 1 training since its halfway through anyways
I unticked te learning rate and its still like this
1/2563 [00:14<10:31:18, 14.78s/it, loss=0.111]

eh
I'll try without venv thing
alright
nah
I installed with install.bat instead and same behavior
jumps from 9gb to 24.5 after this
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
its at 9gb after caching latents
got it, 13gb with my initial settings with the addition of gradient checkpointing enabled
yeah, checkpointing will do that
it's nice, haven't had to use it before since I didn't try sdxl loras
yeah
D:\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\utils\checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
D:\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\xformers\ops\fmha\flash.py:339: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
why
the training runs tho
13gb vram used
how long do I train a lora with 20k images 
base model is shit at some words and I wanna try fixing it
I'm gonna try sdxl lora training with a 3060 😭
just delete them afterwards
no I mean it only saves an epoch every 1k steps
dunno how trained that is
I'm not gonna let it run for 35 hours
works but can train tenc
I did batch 2 with gradient checkpointing
oh so I need to put unet only?
I think what's his face, uhh lemme look

adamw but also adafactor should work
[optimizer_args.args]
optimizer_type = "AdamW8bit"
lr_scheduler = "cosine"
learning_rate = 0.0001
max_grad_norm = 1.0
lr_scheduler_type = "LoraEasyCustomOptimizer.CustomOptimizers.CosineAnnealingWarmupRestarts"
lr_scheduler_num_cycles = 5
unet_lr = 0.0001
text_encoder_lr = 0.0002
warmup_ratio = 0.05
min_snr_gamma = 5
scale_weight_norms = 1.0


I just realized I'm an idiot who put backslashes into my caption txts
idk if it makes a difference but yeah
gonna hope clip just ignores that stuff
I just downloaded the tags along with the gelbooru image
had a bunch of shit that I had to remove
and had to change encoding to utf 8
oh I didn't download the captions, just autotagged and cleaned that
10 GB of VRAM looks like it works
Gonna try with batch 2
Seems slightly faster
Batch size 2 is at 10.7
I'm guessing it'll go up depending on bucket size
Some of the bucket sizes it made aren't the sdxl ones that we're supposed to use, idk how much that'll impact things
Yeah but I saw the fooocus guy say that bc of positional encoding it's actually important to follow the sdxl sizes
Idk what that means for training tho
If I have a 1.5 Lora and an xl Lora, it should be possible to transplant the te that matches right
Also it's possible to train the SDXL Unet and TEs on a 3060, but it's batch size 1
sounds fine, just might take longer
Idk what one trainer is doing differently but it's running both te and unet training on sdxl with prodigy and batch size 3 under 12 GB vram
25% speed up to go through one epoch? Might be bc it's lora and not locon idk
huh interesting
what I was saying was from a few months back since I wasnt training stuff on sdxl in 2 months or so
maybe there's some optims
Maybe my pics are smaller than 1024x1024 bucketing
I didnt experience any difference between enabling te and disabling it with the same settings, or maybe just didn't notice it
also tried with 18k pictures and it didn't add much more ram use, maybe a few hundred give or take
what other trainer?
does it have all the settings?
I only tried bmaltais and some other shit but I didn't like the
layout
onetrainer doesn't have locon
only lora or full
trying to finetune the text encoder on a lora doesn't even work for sdxl that well bc you can't load the 2nd TE in a1111 or comfy or sd.next
I guess no one is doing it idk
I can see it
I tried it but can't really tell how different it performed compared to default te values
Oh ok I only use comfy with sdxl
I only tried once so I can't really say if it's better or worse
Could also depend on your dataset

@magic peak I read something about resolutions for training stuff for sdxl, heard anything about changing bucketing resolution having a negative effect or anything?
not that I know of
Maybe it was just a prank
Alright ty
I've been changing it and didn't notice anything off so
yeah, I don't think there's any problem
his profile
always the weirdest fucks attack the most
Oh yeah, already got all of those removed and excluded, just because he wants to lewd them doesn't mean everybody does
At least it was just that one person
What do you guys think about IP-Adapter, and how it's called the "instant LoRA"? Thoughts on their performance compared to a real LoRA?
they are absolutely not better
lora is still better in pretty much every way
i do find it strange that some people marketed it as an "instant lora"
i was very confused 
if u guys trained sdxl loras so far
did you need to raise stuff compared to your 1.5 settings?
feels like stuff is slightly undertrained for sdxl
like I can use 1.6 weight and it will be closer to what 1 used to be or even 0.8
gonna try .0005 instead of .0002
marketing
yeah my character loras are less accurate, 0.0001 lr?
I haven't trained any since I updated my scripts to support sdxl, but I believe in general it requires more to learn
Are there any resources on how many chairs can fit in one Lora at various dims and shit?
probably depends on the type of chair
if they're really fancy they would require more than basic ikea chairs
a finetune might be better
like I was doing uhh
I think only 5 chars at once and it gets real messy
unless you wanna do 20 retrains I don't recommend :v
I didnt think about it
if I can do fine tune on sdxl with my card
5k steps takes like 3-4 hours when I last tried training a big lora
seems so
full bf16 uses less vram?
think so
might be slightly worse quality too
hmm
yea either 64 32 32 16 was too much or the .0002 for te
I'll do 1 repeat and 32 16 16 8 and .0001 te next time, and 1 repeat
It cuts off data on a few layers, so that would be correct, it degrades quality
Bf16 or lora vs locon
Lol
But yeah, some layers require larger precision than bf16 can provide, so it will cut off some stuff
So like gradient issues?
That sounds about right
I believe it also sometimes means that a value will be too large and either wrap or get scaled, not sure which happens
Huh I thought bf16 should have the full range cuz you don't chop off exponent bits
I think I saw an example of bf16 training vs full before though and it had more artifacts
Nah, what gets cut off the most is precision I believe. Bf16 and fp32 are calculated entirely differently so it's fairly easy to assume that it can have a different precision
https://moocaholic.medium.com/fp64-fp32-fp16-bfloat16-tf32-and-other-members-of-the-zoo-a1ca7897d407
There are many floating point formats you can hear about in the context of deep learning. Here is a summary of what are they about and…
This explains the difference between the bit ranges
Namely, the largest thing that gets cut off is the precision lel
It says fp32 and bf16 have the same number of exponent bits
So I figured the range is about the same
Just not precise
Yes, its range is fairly similar, but I don't believe it to be the exact same
Either way, precision is where things get fucky anyways
Lel
Range: ~1.18e-38 … ~3.40e38
On that site it's the same range
But 3 digit precision vs 6-9
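A quick way to poke at the range vs precision point without a GPU: bf16 is the top 16 bits of an fp32 value (same 8 exponent bits, 7 mantissa bits instead of 23). The sketch below simply truncates the low bits; real hardware usually rounds to nearest instead, so this is an approximation, and `to_bf16` is a name made up for this example.

```python
import struct


def to_bf16(x: float) -> float:
    """Approximate bf16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # float32 bit pattern
    truncated = bits & 0xFFFF0000                          # drop the low 16 mantissa bits
    return struct.unpack(">f", struct.pack(">I", truncated))[0]


for value in (1.0, 3.14159, 0.0002, 1e38):
    print(value, "->", to_bf16(value))
```

Large values like 1e38 survive (fp16 would overflow above 65504), but only about 2 to 3 significant decimal digits are kept, which lines up with the range and precision figures quoted above.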
Well anyway I would like to see a locon trained on both fp32 and bf16 and see what the difference is
If it's tiny it's probably acceptable
On kohya thing he said for finetune to use full bf16 I think
Might have been for being able to do it on 24gb tho
whats a good setting for char stuff where you can learn the outfit parts and other characteristics without forcing them on all the time?
reinstall it or something 
ah good
Traceback (most recent call last):
File "C:\SD\training\LoRA_Easy_Training_Scripts\sd_scripts\train_network.py", line 993, in <module>
args = train_util.read_config_from_file(args, parser)
File "C:\SD\training\LoRA_Easy_Training_Scripts\sd_scripts\library\train_util.py", line 3344, in read_config_from_file
config_dict = toml.load(f)
File "C:\Users\sound\miniconda3\lib\site-packages\toml\decoder.py", line 156, in load
return loads(f.read(), _dict, decoder)
File "C:\Users\sound\miniconda3\lib\site-packages\toml\decoder.py", line 514, in loads
raise TomlDecodeError(str(err), original, pos)
toml.decoder.TomlDecodeError: Reserved escape sequence used (line 36 column 1 char 1025)
Failed to train because of error:
Command '['C:\Users\sound\miniconda3\python.exe', 'sd_scripts\train_network.py', '--config_file=runtime_store\config.toml', '--dataset_config=runtime_store\dataset.toml']' returned non-zero exit status 1.
Nvm again, i hadn't tried training yet. I get this error which seems to be caused by some xformers issue
no that's a toml decoding error
the TOML module is really annoying, and I don't know why it fails half the time
well, I don't know why it fails all the time
it feels like it fails just because
I have set to auto save toml files, is that the issue?
no, that shouldn't be an issue
this is the xformers one
I…honestly dunno what happened. I havent trained in awhile
how did you launch the program?
The run.bat file. I updated beforehand, nothing seemed wrong there.
I’ve also been messing with other Ai stuff so its possible one of those programs required another python version
But ultimately i really dont know
ah, yeah, you need to delete the venv and reinstall
delete it from where?
sd_scripts folder
pepeshrug
should i just delete it all and reinstall the whole program?
even a full reinstall wont cut it, it definitely wants me to upgrade to 3.10 it seems...
yeah, I knew that. my scripts really don't like any other version
which is why it's odd that the installer ever worked, as there is a 3.10 requirement on it
gonna conclude that at some point another program i installed downgraded me a version
@magic peak aight it doesnt matter what i try now, i cant install it again. i reinstalled python several times, still says i dont have 3.10 installed
can you activate the venv then type python for me?
its just "venv activate" right?
in the root folder of my UI, open up a cmd and type sd_scripts\venv\Scripts\activate
just says it cant find the path specified
also checked python version, still says 3.9. I've just been trying to install over it which doesn't seem to be working
I see. then you have to uninstall all versions of python you have installed, then reinstall python 3.10
because I can't figure out if you even have 3.10 installed
give me a minute
@magic peak It doesnt make sense, bc the specific version its detecting is 3.9.5, according to powershell. But the only version i see and that itll allow me to uninstall is 3.10.6
yeah that's definitely odd
well you can edit the paths so that 3.10.6 is the default
you can get to your path by going to environment variables
in the Path variable, you need to edit it and just move the python 3.10 path to the top
I launched the installer again and checked add to path, now it reads 3.10
I forgot that needed to be checked, but anyways it running the bat file now, ill get back to you in a few minutes
alright
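For anyone hitting the same thing, a small check of which interpreter actually runs can save a few reinstalls. This is only a sketch using the standard library; the 3.10 requirement comes from the conversation above, and `is_supported` is a hypothetical helper, not part of the scripts.

```python
import sys

REQUIRED = (3, 10)  # the version the scripts are said to expect


def is_supported(version=None, required=REQUIRED):
    """True if the (major, minor) part of `version` equals `required`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == required


if __name__ == "__main__":
    # sys.executable shows which install the PATH lookup actually picked
    print(sys.executable)
    print(sys.version.split()[0], "ok" if is_supported() else "not 3.10")
```

If the printed path points at an old install, that install is earlier in the Path variable than 3.10.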
Ok the only issue now is that toml thing
one of your file paths, comments, or names are causing that to happen. make sure to not have any form of ", ', or , in them
Okay seems fixed now, thank you yet again 🙏
np
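One common way to get "Reserved escape sequence used" is a Windows path with backslashes inside a double-quoted TOML string, since TOML only allows a few backslash escapes there. Whether that was the cause here is a guess, because line 36 of the config is not shown. The sketch below flags suspicious escapes and shows the forward-slash workaround; both helper names are invented for this example.

```python
import re

# Escapes TOML 1.0 accepts inside a double-quoted (basic) string
VALID_ESCAPES = set('btnfr"\\uU')


def find_bad_escapes(value: str) -> list[str]:
    """Return backslash sequences in `value` that TOML would reject."""
    return [
        m.group(0)
        for m in re.finditer(r"\\(.)", value)
        if m.group(1) not in VALID_ESCAPES
    ]


def to_forward_slashes(path: str) -> str:
    """Windows accepts forward slashes, and they need no escaping in TOML."""
    return path.replace("\\", "/")


path = r"D:\stable-diffusion-webui\models"
print(find_bad_escapes(path))
print(to_forward_slashes(path), find_bad_escapes(to_forward_slashes(path)))
```

Note that a path like `C:\SD\training` is sneakier: `\S` is rejected, but `\t` is a valid escape and would silently turn into a tab.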
whats a good size for easy concepts? 
Same as characters, dim16 or so
Shout out from mcmonkey #1134546054872842290 message
Oh yeah, he told me that he did lel
derrian disco
Yeah, I know of leco, it's a type of lora that is trained off of the idea of erasing a concept vs using images to add a concept
It works fairly well, but it does have its limitations, namely, the model already has to have an idea of what the thing you are erasing is
Oh yeah, I definitely don't think that leco would help very much for that lel
It works kinda but ruins the model when not using the remapped tags
Kohaku said to try to remap into "empty" instead of a different tag, so I'll try that I guess
Yeah, I can see that happening lel
Leco is really odd
might have figured out why
used same target as neutral point
so it took the already learned knowledge
instead of incorporating what I wanted
basically negating the goal
there's just no documentation sadly
lot of comments in the scripts but all japanese
can figure out kinda with gpt
Ah yeah, that sounds about right lel
10 hour leco with schizo config
loss moving between 1 and 150
lets see if time well spent 
I'll blame it on kohaku if not
bitchass
Use gradient checkpointing and accumulation
And accumulation? Isn't it only one or the other?
unless i'm supposed to edit the toml to have both checkpointing and accumulation

Update the UI, I made it so you can use both, after I found out kohya fixed that
ooh, there's an update i see
update...?
may 21st, v1?
oh yeah the update script
aight lets give it a try
darn, still slightly over 
Lower your batch size and increase your grad acc steps
takes 13gb with gradient accumulation
But if you have both checkpointing and accumulation you can get it to be just under 12gb
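A rough sketch of why "lower batch size, more grad acc steps" keeps the same effective batch: averaging the gradients of several small micro-batches gives the same update direction as one big batch, while only one micro-batch is processed at a time. This is a toy one-weight model in plain Python, not the trainer's code; all names and numbers are made up for illustration.

```python
def grad(w, x, y):
    # d/dw of the squared error (w*x - y)**2 for a single sample
    return 2.0 * (w * x - y) * x


def full_batch_grad(w, samples):
    # mean gradient over the whole batch at once
    return sum(grad(w, x, y) for x, y in samples) / len(samples)


def accumulated_grad(w, samples, micro_batch_size):
    # assumes len(samples) is divisible by micro_batch_size
    steps = len(samples) // micro_batch_size
    total = 0.0
    for i in range(steps):
        chunk = samples[i * micro_batch_size:(i + 1) * micro_batch_size]
        micro = sum(grad(w, x, y) for x, y in chunk) / len(chunk)
        total += micro / steps  # scale so the sum is a mean over all samples
    return total


samples = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]  # illustrative data
w = 0.5
big = full_batch_grad(w, samples)                         # batch 4, 1 accumulation step
small = accumulated_grad(w, samples, micro_batch_size=2)  # batch 2, 2 accumulation steps
print(abs(big - small) < 1e-12)
```

So batch 2 with 2 accumulation steps behaves like batch 4 for the update, just slower per optimizer step. Checkpointing is a separate saving on top of that, which fits the 13gb vs under 12gb numbers above.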
what model are people using to train sdxl nowadays?
Animagine v3 or ponyxl
i might be using a larger model, that's why
though it's still 6.46gb large 
oh i figured it out
i didn't save the gradient settings when i loaded 
does the size of the dataset affect the amount of vram used?
it seems as though I can only train with 5 images 
likewise, does caching latents increase the vram used as well (even if saved to disk)


saving and loading toml files does not seem to properly apply gradient settings
even though it's checked on the ui, it actually doesn't seem to apply it
i have to turn it off then turn it back on
going to confirm this suspicion
yeah... it doesn't apply it on load from toml
well that answers my inconsistencies
i'm kinda low on sleep so it could just be me going insane 
do you guys change bucketing resolution when training on sdxl? or default
no, it was a bug lel
I already pushed it to main
and yeah, I use 512x2048
wonder if these actually work
https://github.com/wkpark/triton/actions/runs/7518654030/artifacts/1168090483
Probably not, it seems like windows xformers disables triton regardless of whether it exists
I heard it was a thing for linux and that windows doesnt actually use it
yeah, pretty sure that's the case
that was very recent I see
not even merged yet
Ah fun. I'll wait for it to be merged then, I don't like supporting experimental things until they have at least been tested
ppl said it works
imo you can wait, it's not merged yet
the guy has uploaded it like 3 weeks ago
its prob not getting merged for like 6 months
don't think u need to do anything tho
"works" sounds like it just works
I'll leave it as an optional install step that you can manually install, either by adding it as a question in the installer, or just saying something about it in the readme
I'm not going to install it standard until it gets merged

Nice
Still gonna be a bit before I can make any changes though, currently doing a rewrite with the intention of separating the front and backend
This is one that I think most people probably wouldn't need though lel
Is it just me, or does adding double quotes and new lines to the comment section in general args result in an improper toml at the start of training?

is it not escaped/encoded properly at runtime?
Seems to save to a toml properly, but if you just run it as is it seems to throw the invalid toml error
Loading settings from runtime_store\config.toml...
Traceback (most recent call last):
File "E:\Software\Applications\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\toml\decoder.py", line 511, in loads
ret = decoder.load_line(line, currentlevel, multikey,
File "E:\Software\Applications\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\toml\decoder.py", line 778, in load_line
value, vtype = self.load_value(pair[1], strictly_valid)
File "E:\Software\Applications\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\toml\decoder.py", line 849, in load_value
raise ValueError("Found tokens after a closed " +
ValueError: Found tokens after a closed string. Invalid TOML.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\Software\Applications\LoRA_Easy_Training_Scripts\sd_scripts\train_network.py", line 993, in <module>
args = train_util.read_config_from_file(args, parser)
File "E:\Software\Applications\LoRA_Easy_Training_Scripts\sd_scripts\library\train_util.py", line 3344, in read_config_from_file
config_dict = toml.load(f)
File "E:\Software\Applications\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\toml\decoder.py", line 156, in load
return loads(f.read(), _dict, decoder)
File "E:\Software\Applications\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\toml\decoder.py", line 514, in loads
raise TomlDecodeError(str(err), original, pos)
toml.decoder.TomlDecodeError: Found tokens after a closed string. Invalid TOML. (line 13 column 1 char 424)
Failed to train because of error:
Command '['E:\\Software\\Applications\\LoRA_Easy_Training_Scripts\\sd_scripts\\venv\\Scripts\\python.exe', 'sd_scripts\\train_network.py', '--config_file=runtime_store\\config.toml', '--dataset_config=runtime_store\\dataset.toml']' returned non-zero exit status 1.
might throw it on the github as an open issue
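For reference, a minimal sketch of what escaping a comment for TOML could look like, assuming the value gets written as a basic (double-quoted) string. `toml_basic_string` is a hypothetical helper, not a function from the UI or from the `toml` package; it just shows which characters need a backslash so the decoder does not see "tokens after a closed string".

```python
def toml_basic_string(value: str) -> str:
    """Return value as a double-quoted TOML basic string with escapes applied."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in value:
        if ch in escapes:
            out.append(escapes[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # remaining control characters must be written as \uXXXX
            out.append(f"\\u{ord(ch):04X}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


comment = 'trained on "v3"\nsecond line'  # illustrative comment text
print(f"comment = {toml_basic_string(comment)}")
```

An unescaped `"` inside the comment closes the string early, and a raw newline is not allowed in a single-line basic string, which would explain the error only showing up with quotes and new lines.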
@magic peak have u tried oft
I have not, I added it but I didn't have time to test it myself
ok I'll try it
Humu

holy shit
no error
I installed triton, deepspeed, and ran accelerate config with deepspeed as well
but this is slow as fuck
and only 12gb vram used out of 24

steps: 2%|▉ | 36/2200 [02:33<2:33:51, 4.27s/it, avr_loss=0.0868]
I wonder if I can do something to make it a bit faster without going over 24gb
maybe I'll try higher batch size next run
from what I remember not using gradient checkpointing made me crash but that was around the time you first put sdxl into it
havent really touched training since then
e1 e2 
thought her dress was black not grey
alright its gonna be fine
dress went from white to grey to black

shouldnt have used my experimental broken merge to train
but maybe its only the samples
how often does tag dropout drop tags
like lets say with 0.1
ah
well
my lora doesnt do anything in the webui for some reason
used same prompt as for samples for the epochs
and it doesnt do shit in webui

git pulled
prayge
so maybe dropped a tag every 40 images or some shit
network oft updated, more chances that it works
loading network D:\stable-diffusion-webui\models\Lora\SDXL\kayokodressoft\kayokodress-000001.safetensors: AttributeError
Traceback (most recent call last):
File "D:\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 280, in load_networks
net = load_network(name, network_on_disk)
File "D:\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 219, in load_network
net_module = nettype.create_module(net, weights)
File "D:\stable-diffusion-webui\extensions-builtin\Lora\network_oft.py", line 9, in create_module
return NetworkModuleOFT(net, weights)
File "D:\stable-diffusion-webui\extensions-builtin\Lora\network_oft.py", line 44, in __init__
self.rescale = self.rescale.reshape(-1, *[1]*(self.org_module[0].weight.dim() - 1))
File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'MultiheadAttention' object has no attribute 'weight'
really bruh
rip
I think it's for each token or comma separated sequence, there's a 10% chance of dropping it? but idk what you're using
the easy trainer script I showed, I'll ask derrian later as well 
they must know
it is black
oof
Yeah, I can see that
Oof
Depending on which tag dropout you have enabled, if it's the first instance, then it is x% chance per tag to drop out
Per epoch means all tags are dropped once every x epochs, I believe
And the third one... I don't remember, never once used it lel
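A small sketch of the first case (x% chance per tag), assuming captions are comma separated and every tag is rolled independently each time the caption is used. This is an illustration, not the trainer's actual code; `drop_tags` is a made-up name.

```python
import random


def drop_tags(caption, rate, rng):
    """Drop each comma-separated tag independently with probability `rate`."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    # rng.random() is in [0, 1), so rate=0 keeps everything and rate=1 drops everything
    kept = [tag for tag in tags if rng.random() >= rate]
    return ", ".join(kept)


rng = random.Random(0)  # fixed seed only so reruns are repeatable
caption = "1girl, solo, black dress, long hair, looking at viewer"  # illustrative
for _ in range(3):
    print(drop_tags(caption, rate=0.1, rng=rng))
```

Under that assumption, 0.1 means roughly one tag in ten is missing each time a caption is read, so a 20-tag caption loses about 2 tags per pass on average, which is a lot more often than once every 40 images.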

well, the descriptions are helpful
"default rate that any 1 tag gets dropped"
man
I really want to do an oft lora
I wonder if disabling te will still be good
I don't get it
What's the point of training te if not training it still learns shit fine
I've personally found that training the te helps with generalization
f, shouldnt have used my broken merge to train on
but at least the method works fine
so it was good for a test
I'll try normal animagine next
without broken shits involved
or maybe its cuz I did full bf16
cuz while using the model (that I trained on) in the webui this issue is not present anymore
steps: 8%|████▎ | 180/2200 [13:42<2:33:46, 4.57s/it, avr_loss=0.0781]
could be worse
gradient checkpointing batch 4 and 2 repeats
Yeah that's actually not bad
I try not to use full bf16, I don't believe it to be very useful tbh
I dont know what the difference is between using it and not using it
unless it makes it slower
its a bit slower actually than batch 2 repeat 1
somehow
Larger batch sizes equals more processing time per step, makes sense to me lel
Because of how checkpointing works
guess I'll stick with batch 2 then, only wanted to try higher to see if it would be fast
I find that batch 4 is much better for generalization, which is the real reason to use a higher batch size tbh
really
thought it only impacted performance
what the fuck is going on?
this cant just be happening to me
its less visible but
its still there
with this logic this looks fine too cuz its only visible when u full screen it
I want none of that shit
filter it afterwards
how 
idk how to get rid of the noise 100% but it'll remove some of it
well, luckily its only lora so far
lora trained on animagine
I can always just train it on sdxl base
or apply the same thing to the lora as I did to the merge but dunno how, we'll see
Yeah, it helps in generalizing the concept more, because it effectively trains on the average of the latents
It also technically means it needs more steps, but... well
It only means more steps in terms of the original number of images
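One way to read the "more steps" point, as a rough sketch: at the same epochs, a bigger batch means fewer optimizer steps, so matching the step count takes more passes over the images. This assumes steps per epoch is just images times repeats divided by batch size, rounded up; bucketing in a real trainer can change the exact number, and the figures below are made up.

```python
import math


def steps_per_epoch(num_images, repeats, batch_size):
    """Optimizer steps in one epoch, ignoring bucketing (an assumption)."""
    return math.ceil(num_images * repeats / batch_size)


def epochs_for_steps(target_steps, num_images, repeats, batch_size):
    """Epochs needed to reach at least target_steps optimizer steps."""
    return math.ceil(target_steps / steps_per_epoch(num_images, repeats, batch_size))


# illustrative dataset: 100 images, 2 repeats, aiming for 1000 steps
for batch_size in (2, 4, 8):
    per_epoch = steps_per_epoch(100, 2, batch_size)
    epochs = epochs_for_steps(1000, 100, 2, batch_size)
    print(batch_size, per_epoch, epochs)
```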
thats interesting
so higher is better as long as ur pc can handle it? 
Up to a point
I saw big finetunes train with batch 16 etc
I usually say don't go higher than 8 for lora, but sometimes higher works
Yes, finetunes want very high batch sizes in comparison
I usually recommend 4 or 8

I like the game, every time I think "man I had to use so much to roll"
2 weeks later I'm almost back where I was
Yeah, the game is very generous
yeah it's my fav after azur lane and thats on its own category
I'm currently sitting on like... 500 rolls or something lel
real
Yeah, I've sort of expected that, it's always like that lel
If ba isn't your favorite game, it's either al or arknights
oh
I meant like
in rolling terms
gacha
but that too
arknights is actually very low for me
actual hobo playthrough when you're saving up for someone
Heh, fair
And yeah, 100% al is the best for rolling
maybe its better now 
but in the first year I remember not rolling for months and still being like 100 or more rolls away from the pity on limited banner
unfunnest shit
goofy ahh match holding
seems like its not present on the latest one I trained
maybe lucky seed or some shit
yeah the animagine one seems fine
no glitches so far
maybe it was only on the samples during training

I mean makes sense
cuz it used animagine for inference I think
she tilts more with training
must be a league player
oof
😭
what would u recommend for multiple char or outfit stuff? full is last resort but kinda worried about altering style with scuffed params 
I usually just use locon
locondeeznuts
this is what I used for the kayoko outfit
just turned off te cuz of that issue
I think I was seeing decent results around 1k steps but ended up using the one from 2k
more consistent in feel
I can see it
when it comes to characters, I'm still not entirely sure how to train them on SDXL
styles I have an idea, not really characters though
yeah I only tried chars with same settings or a bit more steps/higher lr than on 1.5
this doesnt really count cuz the model already knows the char
understandable
but thats why it should be easy to train in the new outfits and stuff
but what I really hate about merging is not knowing what exactly you're losing
for the new information
yeah that's fair
loha is a compressed locon
oh locon is lora
and I personally don't see a use in it if you aren't training like an entire anime into it
locon is lora with the conv layers
also saw in leco theyre called c3lier and lierla
good thing kohya didnt name it lolicon
yeah, those are kohya's names for them
or whatever he wanted to meme with
no, lolicon was the correct meme

I would have found it absolutely hilarious