#🔧|finetune

warm agate
#

ok

stiff dust
#

usually you can't use higher values than, say, 4

warm agate
#

How to check the highest possible, just through trial and error?

stiff dust
#

yes

warm agate
#

ok

#

Which one do you suggest for Landscape photography training?

stiff dust
#

just use ED2

#

don't overthink it. In the end, every training dataset behaves a bit differently anyway

warm agate
#

@stiff dust How to check VRAM usage?

#

Can we upload images of different resolutions?

surreal lagoon
#

well on my multi-A100 system i can really only tolerate a batch size of 7 with 64 gradient accumulation steps

#

it gets too slow and training takes 10x longer
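
For reference, gradient accumulation trades wall-clock time for a larger effective batch. A rough sketch of the arithmetic behind the numbers above; the GPU count of 8 is purely an assumption, since the message only says "multi a100":

```python
def effective_batch_size(per_gpu_batch: int, accumulation_steps: int, num_gpus: int = 1) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_gpu_batch * accumulation_steps * num_gpus

# Batch size 7 with 64 accumulation steps, as quoted in the chat.
per_gpu = effective_batch_size(7, 64)

# num_gpus=8 is an assumed value for illustration, not stated in the chat.
total = effective_batch_size(7, 64, num_gpus=8)

print(per_gpu, total)
```

Each optimizer step then needs 64 forward/backward passes per GPU, which is consistent with the slowdown described.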

stiff dust
#

I don't believe in gradient accumulation. Why should it help?

surreal lagoon
#

i was just trying different things to get a more gradual and cohesive training session

#

i have been observing pretty great results now that i've frozen the text encoder by decaying its learning rate to zero as the unet learning rate rises

#

4k steps to train the text encoder and then another 6.5k are nearly done now, with just the unet

#

do you think i'll have good luck if i do one epoch of final training with the TE unfrozen after the unet training completes?

mild mortar
#

Hi everyone, I was just wondering what the recommended way is to upscale images 4x while retaining realistic skin and face texture, so the result stays crisp and not too smooth. Thank you 😇

surreal lagoon
#

odd

#

i do use a super low learning rate. it starts at 1e-10 and rises to 9e-8
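
A minimal sketch of the two learning-rate curves described here: the unet rate rising from 1e-10 to 9e-8 while the text encoder rate decays to zero. The linear shape, the 4000-step window, and the text encoder's starting rate are assumptions for illustration; the chat mentions a polynomial scheduler but does not show its settings.

```python
def linear_ramp(step: int, total_steps: int, start: float, end: float) -> float:
    """Interpolate linearly from start to end over total_steps, then hold at end."""
    if total_steps <= 0:
        return end
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress

RAMP_STEPS = 4000    # assumed: matches the "4k steps" text encoder phase
TE_START_LR = 9e-8   # assumed: the text encoder's initial rate is not given

def unet_lr(step: int) -> float:
    # Rises from 1e-10 to 9e-8, the values quoted in the chat.
    return linear_ramp(step, RAMP_STEPS, 1e-10, 9e-8)

def text_encoder_lr(step: int) -> float:
    # Decays to zero, which effectively freezes the text encoder afterwards.
    return linear_ramp(step, RAMP_STEPS, TE_START_LR, 0.0)

for step in (0, 2000, 4000, 6000):
    print(step, unet_lr(step), text_encoder_lr(step))
```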

#

my loss averages 0.244 with my current training method, likely due to ADAM 8bit optimizer spikes, but this is so much lower than the .546 avg i could get with the 'official guidance' for fine-tuning 2.1

#

i was wondering how i was breaking everything for a few days before i realised that almost not training it at all on each iteration has the best results

#

freezing the text encoder fixed the 'memory loss' problem i was having. for instance, it would forget how to make a gecko, since geckos weren't in my training data and i couldn't figure out a way to generate reg images that would represent them plus everything else i wanted to preserve

#

example: the term "leopard gecko" started out lizard-like, pretty close to a real one. but then it started to add whiskers and fur, and then an actual leopard's head on a gecko body, before finally it was just a leopard. at the next checkpoint the leopard was gone, and it was a "leopard tank", i.e. a vehicle used in war, with little soldiers standing around and smoke in the background

#

the model i published was from about 2,000 steps before this loss became so noticeable in my test matrix of prompts. but for stuff outside my test matrix, it's apparent there was still some loss. it's just an acceptable amount, and in things i generally didn't care to preserve, e.g. celebrities

hot breach
#

from another conversation, qwerty only has 74 images and was trying to set batch size to 35, this will likely cause issues, wouldn't suggest setting batch size more than like 10% of your total image count

stiff dust
#

a high batch size is good even if you have only a single image

#

as you sample the image at different noise timesteps

hot breach
#

the issue comes with runt batches and aspect bucketing

#

if you have 30 images and a batch size of 25, you end up with two steps, one of 25 images and another of 5; these remainders become an issue
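
The runt-batch arithmetic can be sketched as follows. With aspect bucketing, each bucket is batched separately, so every bucket can leave its own remainder. The bucket names and counts below are hypothetical.

```python
def batch_sizes(num_images: int, batch_size: int, drop_last: bool = False) -> list[int]:
    """Sizes of the batches produced when num_images are split into batch_size chunks."""
    full, remainder = divmod(num_images, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the "runt" batch
    return sizes

# The example from the chat: 30 images, batch size 25.
print(batch_sizes(30, 25))

# Hypothetical aspect buckets: each one is batched on its own.
buckets = {"512x512": 18, "512x768": 12}
per_bucket = {name: batch_sizes(count, 10) for name, count in buckets.items()}
print(per_bucket)
```

Whether the runt is dropped, padded, or trained on as-is depends on how the trainer implements bucketing.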

stiff dust
#

but this is just an implementation issue

#

depends how you implemented the bucketing

tall condor
#

so my first try with regularisation images was a complete fail

#

all my model is producing is regularisation images

#

and there is basically nothing left from any of my concepts, what did i do wrong?

#

i have 240 regularisation images per concept, each concept has like 10 images, and i run those 10 images like 20-30 times per epoch, so in theory i have around 1 regularisation image per image i run per epoch

#

any recommendation?
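
The ratio described above works out as follows. This is only the arithmetic from the message; how a given trainer actually pairs regularisation images with training images is implementation-specific and not covered here.

```python
def reg_ratio(reg_images: int, instance_images: int, repeats: int) -> float:
    """Regularisation images per training sample seen in one epoch."""
    return reg_images / (instance_images * repeats)

# 240 reg images, 10 concept images, repeated 20 to 30 times per epoch.
low_repeats = reg_ratio(240, 10, 20)
high_repeats = reg_ratio(240, 10, 30)
print(low_repeats, high_repeats)
```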

stiff dust
#

like one seed produces a gecko, another more of a leopard, and a third one a tank. When training your model, it might switch between these three interpretations of the image, but it would also switch if you just used a different seed

#

but in general the text encoder is surprisingly the workhorse. Training it mostly determines the outcome of the image, while the unet takes way more time to train and easily overfits on texture instead of the structure or shape

tall condor
#

well without regularisation images all works kinda ok except the overfitting

surreal lagoon
#

the gecko never came back at any seed and eventually that model stopped working very well

hot breach
#

damian did a bunch of testing and put a PR in so you can even just choose to train the final X layers of the text encoder, which seems to be really good for SD2.x with the newer 24 layer encoder, makes it train more like SD1.x

stiff dust
#

yes, I also always freeze the first 16 layers of text encoder

#

also I train a low-rank LoRA for the text encoder

surreal lagoon
#

i'm just using the train_dreambooth.py script with slight modifications. can those improvements be backported to that script?

finite creek
#

Thank you! Appreciate the info 👍🏻👍🏻

surreal lagoon
#

@finite creek see above about freezing the layers of 2.1's text encoder to better train it without catastrophic loss in lower layers

#

my understanding is that when OpenCLIP was trained, LAION went stage by stage gradually freezing subsequent layers of the text encoder in order to preserve the foundational features it learnt

#

so we kind of have to do the same thing to avoid disrupting the connections and structure of those layers too

normal pike
#

Hey there. So, I've got an issue. No matter how I train the LoRA, it does everything pretty decently EXCEPT the character's eyes... No matter how much I try. What could I do?

#

(my dataset has 34 images, including close-ups of the character's face, body and eyes; unet LR is 1e-4 and text encoder LR is 5e-5)

surreal lagoon
#

certainly. i'm using the normal train_dreambooth.py script, modified to use the filename of an image as its prompt (with some added cleanup etc)

that script has the option to train (or not) the text encoder. the process goes something like:

  • analyze training data, retrieving all keywords and count their frequency
  • prepare a subset of training data that contains the least commonly used keywords, as they're least likely to be known by the encoder
  • at this time you can also remove any outlier data or look at the most commonly used keywords and either modify or remove them to ensure you're populating the segments you actually want to
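
The data-preparation steps above could look roughly like this. It is a sketch under assumptions, not the speaker's script: the cleanup rule (underscores to spaces, lowercasing), the helper names, and the example filenames are all hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

def filename_to_caption(filename: str) -> str:
    """Use the file name (without extension) as the prompt, with light cleanup."""
    stem = Path(filename).stem
    return re.sub(r"[_\s]+", " ", stem).strip().lower()

def keyword_counts(filenames: list[str]) -> Counter:
    """Count in how many files each keyword appears."""
    counts = Counter()
    for name in filenames:
        counts.update(set(filename_to_caption(name).split()))
    return counts

def rare_subset(filenames: list[str], counts: Counter, max_count: int) -> list[str]:
    """Files that contain at least one keyword seen in at most max_count files."""
    return [
        name for name in filenames
        if any(counts[word] <= max_count for word in filename_to_caption(name).split())
    ]

# Hypothetical file names for illustration.
files = [
    "red_bike_on_road.png",
    "red_bike_on_grass.png",
    "blue_bike_on_road.png",
    "leopard_gecko_on_grass.png",
]
counts = keyword_counts(files)
print(counts.most_common(3))
print(rare_subset(files, counts, max_count=1))
```

Inspecting `counts.most_common()` is also a quick way to spot over-represented keywords before deciding what to modify or remove.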

once your data is ready,

  • an initial training run at a supremely low learning rate using polynomial learning rate scheduler on the text encoder and unet, simultaneously, for a certain number of steps. you likely want to make a ckpt every 50-100 steps when you're in the "toy model" phase just to see how your test prompt output is changing.
  • i'm not sure whether prior preservation is useful here. if you're doing Dreambooth for a single subject, probably it is mandatory. if you're doing a general fine-tune, it seems to be incredibly harmful.
  • select/cherrypick from your checkpoints for the one that has the most pleasing results. this can be quite subjective. it helps to have a wide array of prompts generated from each checkpoint in a way that you can compare them easily. you want to select a checkpoint that didn't change the output much, but be sure to check the results of prompts containing your pre-training keywords, so that you can more easily see the early changes that training is applying.

Honestly if it's changing too much between each ckpt, your learning rate might be too high.

Once you've got the text encoder trained,

  • use save_pretrained on the pipeline for that checkpoint in an inference script, to save it as a complete model

  • begin another training run, this time on your full subset of data, and your full step count, and without the --train_text_encoder option

  • this can have a much higher learning rate, but since i had a large number of images to process, i kept it low

  • you can use save_pretrained on the checkpoint that is most appealing to you, from these results. I save checkpoint every 1000 steps when training the unet alone, but if your LR is higher than mine, you might need every 500.

  • once you have that complete model saved again, you can go back to the text encoder training step, this time, on your full subset of data.

disclaimer: this is my process i'm doing currently and not what i think a lot of other people are doing. if you can, at all, use the new dreambooth code instead, that will use separate learning rates for TE vs unet, since they benefit from that. additionally, the new code has the ability to actually freeze the more important layers of the text encoder so that it is harder to damage.

tall condor
#

hi guys, im still having issues with the number of regularisation images, how many regularisation images per concept shall i have?

chrome breach
#

I guess trying 5, 10 and 15 and then evaluating each of the resulting models would be helpful

tall condor
#

5-15 per image?

chrome breach
#

Yes

tall condor
#

im using kohya ss, so my images are run between 10 and 40 times per epoch

#

does that mean that i also need to get 5-15*10-40 reg images?

#

or shall i still stick with 5-15 per image?

surreal lagoon
#
                if args.with_prior_preservation:
                    # Chunk the noise and model_pred into two parts and compute the loss on each part separately.
                    model_pred, model_pred_prior = torch.chunk(model_pred, 2, dim=0)
                    target, target_prior = torch.chunk(target, 2, dim=0)

                    # Compute instance loss
                    loss = F.mse_loss(
                        model_pred.float(), target.float(), reduction="mean"
                    )

                    # Compute prior loss
                    prior_loss = F.mse_loss(
                        model_pred_prior.float(), target_prior.float(), reduction="mean"
                    )

                    # Add the prior loss to the instance loss.
                    loss = loss + args.prior_loss_weight * prior_loss
                else:
                    loss = F.mse_loss(
                        model_pred.float(), target.float(), reduction="mean"
                    )
#

so this is code for prior preservation and i kind of see what it's doing, but, why is it doing that?

it makes the loss value appear much higher than it is without prior preservation. i see now how the weight is applied to the prior loss, which explains why the loss is lower when the prior is weighted less.

but how does this actually direct the process or change its result?

stiff dust
#

it's just training on the regularization images and the training images

#

you don't need any special loss for that. You could also just put the regularization images into your training data.

#

however, the idea of regularization images is that they are only seen once in training (ideally). So you cannot overfit on reg images as they are trained only for one epoch
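
To make the effect of the weight concrete, here is a toy version of the prior-preservation loss from the snippet above, using plain Python lists instead of tensors. The numbers are made up; only the structure (split the batch in half, add the weighted prior term) follows the pasted code.

```python
def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error over two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def dreambooth_loss(pred, target, prior_loss_weight=1.0):
    """First half of the batch is instance data, second half is prior (reg) data."""
    half = len(pred) // 2
    instance_loss = mse(pred[:half], target[:half])
    prior_loss = mse(pred[half:], target[half:])
    total = instance_loss + prior_loss_weight * prior_loss
    return total, instance_loss, prior_loss

# Toy values: the model is closer on the instance half than on the prior half.
pred = [0.0, 0.0, 1.0, 1.0]
target = [0.5, 0.5, 0.0, 0.0]

for weight in (0.0, 0.5, 1.0):
    total, instance_loss, prior_loss = dreambooth_loss(pred, target, weight)
    print(weight, total, instance_loss, prior_loss)
```

The reported total grows with the weight, and so does the share of the gradient that comes from the regularisation images, which is why a higher weight pulls training toward the class prior and away from the instance data.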

tall condor
#

so they are not applied every epoch?

surreal lagoon
#

oh

#

so the backwards pass uses the loss value to determine how much error to resolve

#

so why does SD 2.1 just start out with insane loss values on the regularization data when i feed only those through?

tall condor
#

hi guys, i see a major difference between repeating images within an epoch and running multiple epochs, anyone know why?

tall condor
#

also anyone tried the difference between random crop and center crop?

surreal lagoon
#

depends on your source material, how much source material, how it's tagged, etc.

#

i like the partly-frozen TE

tall condor
#

my source material is all captioned, however i have a lot of concepts that are mixed

#

the source material is sometimes very limited per concept, sometimes maybe 5 pics only

#

sometimes 100

surreal lagoon
#

not sure, my best results were with about 3000 images so far

tall condor
#

does it make sense to train half the time with random crop and half the time with center crop or so?

surreal lagoon
#

are you just doing style transfer

tall condor
#

no its not only style, its objects with details

surreal lagoon
#

then you want to manually crop your images

#

that's not very many and it will be easy

tall condor
#

but i have like 7k images xDD

surreal lagoon
#

idk, i don't see the point of just trying to train like 5 images of something and hundreds of others. it likely won't learn the lesser-frequent concepts

tall condor
#

well with kohya_ss what you can do is define per concept how often the images are repeated, however that also causes issues for me, like creating strange patterns

#

and i still havent found a way to tackle this

hot breach
#

duplicating the rare examples can help a bit, just don't try to fully equalize, the one with all the duplicates will overfit
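
One way to duplicate rare concepts without fully equalizing is to scale repeats by the square root of the size ratio and cap them. This heuristic and the function name are assumptions for illustration, not something the chat prescribes; the concept sizes (5 and 100) are the ones mentioned earlier.

```python
import math

def repeat_counts(concept_sizes: dict[str, int], max_repeats: int = 10) -> dict[str, int]:
    """Partial rebalancing: repeats grow with sqrt(largest / size), capped at max_repeats."""
    largest = max(concept_sizes.values())
    return {
        name: max(1, min(max_repeats, round(math.sqrt(largest / size))))
        for name, size in concept_sizes.items()
    }

sizes = {"rare_concept": 5, "common_concept": 100}
repeats = repeat_counts(sizes)
print(repeats)

# Effective samples per epoch after repeating.
effective = {name: sizes[name] * repeats[name] for name in sizes}
print(effective)
```

Full equalization would repeat the 5-image concept 20 times; the square-root rule stops well short of that, which is the point of the advice.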

uncut vapor
#

Hello, I'm wondering if anyone can point me in the right direction. I want to remove speech bubbles from images of comic panels without giving prompts. Dataset in the thousands. I think I can train something like meta's new SAM to segment, YOLO to ID, then SD to inpaint? Is that the SOTA? Can SD tools help in the ID part at all?

surreal lagoon
#
    # Load scheduler and models
    noise_scheduler = DDPMScheduler.from_pretrained(
        args.pretrained_model_name_or_path, subfolder="scheduler"
    )
    text_encoder = text_encoder_cls.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="text_encoder",
        revision=args.revision,
    )
    first_frozen_layer = 0
    last_frozen_layer = 0  # highest layer index frozen so far
    total_count = 0  # number of parameter tensors visited
    for name, param in text_encoder.named_parameters():
        total_count += 1
        # Transformer block parameters are named like
        # "text_model.encoder.layers.<index>.<...>"; skip everything else.
        pieces = name.split(".")
        if len(pieces) < 4 or pieces[1] != "encoder" or pieces[2] != "layers":
            print(f"Ignoring non-encoder layer: {name}")
            continue
        print(f'Pieces: {pieces}')
        current_layer = int(pieces[3])
        if (
            current_layer >= first_frozen_layer and current_layer < 21
        ):  # choose whatever you like to freeze, here (this freezes layers 0 through 20)
            last_frozen_layer = current_layer
            if hasattr(param, 'requires_grad'):
                # Exclude this parameter from gradient updates.
                param.requires_grad = False
                print(f'Froze layer: {name}')
            else:
                print(f'Ignoring layer that does not mark as gradient capable: {name}')
#

this has me training just the last 2 layers

hot breach
#

ED2 allows you to freeze first/last n layers in text encoder, it seems pretty good for SD2.x models with the newer openclip, though for SD1.5 I just set the learning rate of the text encoder lower than for unet, seems to work well, like 1/5th to 1/2 or so, or use cosine schedule on text encoder only

#

here only training last 6 layers, using different LR for text encoder

serene widget
#

Hi folks (again) ! I really need help here. I installed the plugin to my photoshop - but it says that "Failed to load stable art"
I did all that was in the instruction
I have few versions of photoshop. And the plugin doesn't work in any of them.

tall condor
#

will a higher or a lower prior loss weight cause more overfitting?

#

as far as i read, a lower value will cause more overfitting, but from the source code it appears a higher one will

#

can someone confirm that a higher prior loss weight will cause the model to overfit less

dull snow
#

can anyone introduce me to the settings for training an embedding

dull snow
#

forget it

stiff dust
#

a higher prior loss weight will reduce the impact of the training data

#

just set it to 1:1

untold moss
#

hi people, how many max steps do you use for faces? is 3000 too much or too little?

#

it's taking me 4 or 5 hours for each embedding, it's actually maddening.

surreal lagoon
#

3k steps for a face seems like a lot

#

i finally have enough sunlight for today (for now) to do some training on my 4090 here 😄

surreal lagoon
#

mine, or theirs?

surreal lagoon
#

i should freeze some of the unet too?

hot breach
#

I've not tried, im sure if you selectively froze stuff in general you may see interesting behaviors

#

kohya did experiments with merging two unets with different weights per layer between A and B models, it produced different results, but all his examples were anime and harder for me to judge, certainly interesting differences by using different layer weights

surreal lagoon
#

i tried to merge models but those seem to be for 1.5 and the weights aren't named the same now

ionic gulch
#

hey
I'm trying to create a data set to train my lora which is pretty much just to create this animation

#

i have 2 options. one: use all the data i have from the internet, which is pretty much all cartoonish art.
two: try to create it in WebUI and then use that as a data set.
would the first option cause results to turn into cartoon art? i want to create semi-realistic art with it

wispy elbow
#

Anyone know how I can get txt2img to generate red hair with blonde tips? I've tried all manner of prompts and weights but it always chooses one or the other. I've got multicolored hair before, but this is my first time wanting a specific coloration.

surreal lagoon
#

so with Terminal SNR and training on 2.1 i can get loss down to 0.11

#

@hot breach have you experimented with the alternate noise schedules? or @stiff dust

hot breach
#

I just added the zero terminal snr thing but need to run experiments, I've messed with offset noise quite a bit

surreal lagoon
#

it isn't working very well for me

hot breach
#

offset noise requires smaller amounts the longer you train, and it is not very stable since it has to be modified based on length of training, or turned on only for some portion of training

#

for 20k steps you can try 0.01 or 0.02. the original blog post suggested 0.10, but that only works well for training hundreds or a few thousand steps
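
For readers unfamiliar with offset noise: the idea is to add one random constant per channel on top of the usual per-pixel Gaussian noise, scaled by the strength values discussed above. The sketch below uses plain Python lists for a single sample; in tensor code this is usually written as broadcasting a (batch, channels, 1, 1) random tensor over the noise. The function name and shapes are illustrative assumptions.

```python
import random

def offset_noise(channels: int, pixels: int, strength: float, rng: random.Random):
    """Per-pixel Gaussian noise plus a per-channel constant shift scaled by strength."""
    noise = []
    for _ in range(channels):
        # One random value shared by every pixel in this channel.
        shift = strength * rng.gauss(0.0, 1.0)
        noise.append([rng.gauss(0.0, 1.0) + shift for _ in range(pixels)])
    return noise

# 0.02 for long runs, per the suggestion above; 4 channels x 16 pixels is arbitrary.
sample = offset_noise(channels=4, pixels=16, strength=0.02, rng=random.Random(0))
print(len(sample), len(sample[0]))
```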

surreal lagoon
#

i mean terminal snr

#

it's just destroying the model immediately

hot breach
#

ah

surreal lagoon
#

it might come back

untold moss
#

what is the min dataset size you recommend for 3000 steps? 20, 30 ?

tall condor
#

pseudoterminalx: i am having similar issues, your learning rate is too high and you are using too many steps

#

try a learning rate of 5e-7 or 8e-7

dull snow
#

every time i try using kohya ss i get exit code 0

#

😭

#

even if i do everything right

dull snow
#

returned non-zero exit status 1.

#

i followed whole tutorial

untold moss
#

what causes the error "AttributeError: 'FreeTypeFont' object has no attribute 'read'" in stable diffusion training each time it has to save a png?

surreal lagoon
#

and 100 steps

#

how is that too much

#

4600 steps in...

#

should i just keep going?

#

i can kind of see the output improving. but also feels super broken

dull snow
#

dying rn

surreal lagoon
#

like im having to totally re-teach it how predictions work with a new algorithm

surreal lagoon
#

seems to be figuring something out

tall condor
#

pseudoterminalx can i see some of your input data

#

and how many images are you training on?

#

are you running images multiple times in an epoch?

surreal lagoon
#

i have 22,000 images

#

and no

tall condor
#

you should not get that result with 22k images

#

are they captioned?

surreal lagoon
#

i'm switching the model to Terminal SNR, so

tall condor
#

did you tag the images properly

surreal lagoon
#

of course

tall condor
#

can i see some input samples?

surreal lagoon
#

i have good results without Terminal SNR, though they're not exactly what i want the model to do, usually..

tall condor
#

ok that's quite a big variation of input types, can i see some captions with the corresponding images?

surreal lagoon
#

they're the filenames for each

tall condor
#

pizza_au_fromage_et__la_confiture_confiture_dabricot_fromage_comt_la_pizza_est_saupoudre_de_persil_cinmatique_hyper_dtaille_dtai.png

#

are you captioning in non-english?

surreal lagoon
#

it's french

#

it's a mix of languages

#

i think there's some Hindi, Russian, Japanese

tall condor
#

i don't believe it works if it's not english. also just to make sure, you have the caption in a .caption file and it is being used, right?

surreal lagoon
#

i have a custom dreambooth script

tall condor
#

which script are you training with?

#

ah ok

surreal lagoon
#

the filename itself is the tag/caption

#

i like this approach better than what anyone else does

tall condor
#

first of all i believe you need to normalize the data to be all the same language, english

surreal lagoon
#

i don't believe that to be the case at all

tall condor
#

i dont think you can train in non english

surreal lagoon
#

like i said i have good results without Terminal SNR.

#

the OpenCLIP model already understands other languages, sir 😛

#

my efforts are to improve the model across the board incl more language comprehension

tall condor
#

the problem is that the text model training will be screwed up if you don't, and as far as i understand it's the most important part

#

if the base model you train on is in english you are basically introducing new "words" to the model and it will probably only link those words to your images

#

so if your cheese is now "fromage" it will only know your one "fromage" image as "fromage" but not cheese

#

at least that's how i understand it

#

which means that it will not be able to recognize your "fromage" as cheese and also not recognize cheese as "fromage"

#

i believe what you want to do is to train in english and then convert your input prompt from whatever language you enter to english before generating an image

surreal lagoon
#

man it's really improving still. i am going to leave it running

#

@tall condor i don't think you fully understand what's happening here

#

this is stabilityai/stable-diffusion-2-1 output with my current settings and that same prompt

#

i'm fine-tuning the model to use terminal SNR

tall condor
#

sorry i don't know anything about that. i thought you were doing regular training

surreal lagoon
#

it's okay 👍🏽 usually i am but last night i started down the rabbit hole of implementing a research paper

#

i wish there were more people that have done this specific transition

#

i have no idea if what i'm seeing is correct

tall condor
#

what exactly is the difference

surreal lagoon
#

it improves the contrast balance of the image

#

the typical noise schedule of SD means that the overall average colour grade of the image is gray

#

offset noise was a workaround to help with this issue but apparently it's a hackish fix and terminal SNR is "the right way"
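
A sketch of what "terminal SNR" means at the schedule level, written in plain Python: the cumulative signal level is shifted and rescaled so that the last timestep is pure noise, and the betas are recomputed from it. This follows the commonly described zero-terminal-SNR rescaling as far as I understand it and is not the speaker's actual code. The schedule constants (1000 steps, 0.00085 to 0.012, linear in the square root of beta) are the usual Stable Diffusion values and are assumed here.

```python
import math

def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Assumed SD-style schedule: linear in sqrt(beta)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def sqrt_alphas_cumprod(betas):
    """sqrt of the running product of (1 - beta): how much signal survives at each step."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(math.sqrt(running))
    return out

def rescale_zero_terminal_snr(betas):
    abar_sqrt = sqrt_alphas_cumprod(betas)
    first, last = abar_sqrt[0], abar_sqrt[-1]
    # Shift so the final value is exactly 0, then scale so the first value is unchanged.
    abar_sqrt = [(a - last) * first / (first - last) for a in abar_sqrt]
    abar = [a * a for a in abar_sqrt]
    # Recover per-step alphas from the cumulative product, then betas.
    alphas = [abar[0]] + [abar[i] / abar[i - 1] for i in range(1, len(abar))]
    return [1.0 - a for a in alphas]

betas = scaled_linear_betas()
fixed = rescale_zero_terminal_snr(betas)
print("signal left at last step, original:", sqrt_alphas_cumprod(betas)[-1])
print("signal left at last step, rescaled:", sqrt_alphas_cumprod(fixed)[-1])
```

With the original schedule a little of the image survives even at the final timestep, so the model never learns to start from pure noise; that leftover signal is the usual explanation for the mid-gray average mentioned above.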

#

4500 steps

tall condor
#

so its basically a replacement for lowering the applied noise?

surreal lagoon
#

5700 steps

#

i don't know if i'm doing this right but i'm okay with what's happening now at least.

tall condor
#

would be interesting to see where it's going

tall condor
#

the images still have very high contrast tho

#

i wonder if that will come down towards the end

surreal lagoon
#

they were washed out before

#

the contrast is a fix in progress as far as i can tell

#

you know what's crazy, the researcher leading this team only graduated from university about a year ago

#

that said, he obtained a master's degree in CS so, a bit more than what i did KEK

tall condor
#

cool

#

what i really hate about training is that there is so little description of the impact of certain settings

#

for me it's mostly trial and error

surreal lagoon
#

yeah people hold these cards too closely to their chest

surreal lagoon
#

loss=0.0743

surreal lagoon
#

on the off chance anyone else has done this in here, did you freeze the text encoder partly, fully, or not at all?

tall condor
#

anyone played with color augmentation yet?

surreal lagoon
#

well i'm going to restart this with a fix pulled from ED2

#

i was definitely at the very least, doing inference wrong

#

way better

tall condor
#

is that from the same model as before or did you restart?

surreal lagoon
#

this is the restarted training

tall condor
#

what did you change this time?

surreal lagoon
#

scheduler and config for it

#

this time i kept the SD2.1 scheduler config and overloaded values into the betas

#

previously, i just used the default scheduler config with overloads

#

my understanding now is that SD2.1's config is pretty different from the way the schedulers are used out of the box

tall condor
#

looks much better now but is it doing what you expect it to do?

surreal lagoon
#

i didn't expect it to work this well, so i'm not sure how to answer that

#

it still has photoreal issues but i can fix those

tall condor
#

maybe its not doing anything at all? xD

surreal lagoon
#

nope, the image quality is +++ compared to baseline

tall condor
#

cool

surreal lagoon
#

the contrast is changing a lot

surreal lagoon
#

ok so bikes look super good

#

i'm going to nuke my learning rate because it's too high

untold moss
#

any guide for fine tuning or faq or pro tips?

surreal lagoon
#

it's in quite a bit of disarray

#

are you a developer?

stiff dust
#

interesting. Should try the scheduler fix, too

surreal lagoon
#

scheduler fix?

#

i'm so overwhelmed, lol

surreal lagoon
#

yeah i know that now because i've fixed the scheduler config and now terminal SNR works out of the box

#

this is baseline without any fine-tuning

#

nothing special about the prompt and there's no negatives

#

btw, ignore the nsfw prompts in my inference script, they're only there so i can stop training if i begin to introduce any by accident

#

they do really weird stuff LUL like paint the swiss alps, a cabin in the woods. because there's zero concept of those in 2.1

surreal lagoon
#

i'm using photos as class data but i've got it feeding the class data's filename in as its prompt instead of a single token, and i've used BLIP to label them

surreal lagoon
#

i'm not going to go rush to implement stuff in applications i never use, that's up to you

tall condor
#

i saw. thanks!

tall condor
#

so i have 2 identical images, one with an orange tint and one with a blue tint. i would expect sd to average them out to a neutral tint, but for some reason the blue one is very strongly dominating. why is that?

surreal lagoon
#

likely because of how the denoising process works

#

but i don't know if anyone knows exactly why that would happen, are you using any special techniques like offset noise or SNR fixes?

main blade
#

Hello. I'm trying to train sd with a lora with this art style, but so far I'm not getting good results. I think it's partly because my English is not very good and I'm not describing the image properly in the text file. Can you help me with the caption text of this image for example? It's a bullfighter's jacket, I guess I should describe as well as possible the ornaments, the embroidery, the brooches...

surreal lagoon
#

watercolor

surreal lagoon
#

well, training has completed Blurry_eyes

#

first test prompt outside of my sanity check prompt list

lilac topaz
#

I am trying to train a model to use this specific mask. If you pause at any given frame most of them look like the mask I'm trying to fine tune but each frame scrambles the colors/features. I tried retraining with images of a single mask but it still scrambles the color... any thoughts on getting consistency? Not 100% sure if it's a training issue or a settings issue

surreal lagoon
#

can you share more details about your training setup?

lilac topaz
#

this dataset you see here was trained on about 25 pictures of a variety of these masks (no 2 are exactly the same). I can't remember the exact settings but it was a standard how to vid on youtube, I believe training steps were 1600 at 2e-6.

I then trained it on a single mask but used 50 photos same settings, this one was muddled and looked too much like a generic mask.

I then used a dataset of a single mask at 25 photos which seems slightly more consistent however it requires a lot more fine tuning in the settings.

My approach here was to treat it like a face and give it a modest dataset at 1600steps, 2e-6 which is what most people say faces should be trained at

surreal lagoon
#

you're doing Lora?

#

or Dreamboothing a keyword?

lilac topaz
#

I'm using Dreambooth on Google Colab. My GPU is too slow to use it locally

surreal lagoon
#

i would suggest training a Lora

#

dreambooth is kind of iffy and is a heavier process, and a Lora will sit on top of any compatible model

lilac topaz
#

thank you! I will try Lora

warm agate
#

@surreal lagoon to caption a face which all traits/features should we mention?

stiff dust
#

I would always first do textual inversion to learn the facial features / assign them a token. Then you can use that token for captioning

#

caption can be "photo of <mytoken>". Better is to randomize it a bit, "<mytoken>, photography", "a photo of <mytoken> made by smartphone camera" and so on
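
The caption randomization suggested here can be sketched in a few lines. The templates are the ones quoted in the message; the function name and the fixed seed are illustrative assumptions.

```python
import random

# Templates taken from the message above; add more variations as needed.
TEMPLATES = [
    "photo of {token}",
    "{token}, photography",
    "a photo of {token} made by smartphone camera",
]

def random_caption(token: str, rng: random.Random) -> str:
    """Pick one template at random and insert the learned token."""
    return rng.choice(TEMPLATES).format(token=token)

rng = random.Random(0)  # seeded only so the example is repeatable
captions = [random_caption("<mytoken>", rng) for _ in range(4)]
for caption in captions:
    print(caption)
```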

stiff dust
#

same character. I thought that was the question

surreal lagoon
#

to me that led to improvement across general keyword use, eg. a woman or a man will come back by default as more complete-looking

#

i like using BLIP for training captions because it essentially has the text encoder tell me where it wants those images to be

#

like "oh, i recognise those features! we have these keywords for them." and then i provide high quality images for those keywords and it learns to do them better

hot ether
#

Heya, i have 1.3k images and I want to train a LoRA or a full model (don't know yet which one is more suitable). looking for some advice and guides on where to start :)

stiff dust
#

I use Blip, too, but still find its captions way too short.

surreal lagoon
#

you can do it with a higher temperature and return multiple possibilities and merge them

#

🙂

surreal lagoon
#

is the vae fine-tuned in the script from diffusers?

hot ether
#

is there a way to run dreambooth fully locally?

stiff dust
#

yeah, of course, you should have 8gb vram at least, better more

hot ether
#

i have 12

#

tried to use colab but it disconnects from time to time, ruining everything

finite echo
#

Im looking at making a "pixel perfect" 32x32 pixelart model, do I need to finetune everything into it (knights, dragons, zombies, swords, skeletons, trees, bushes, etc.) or is there a way to do this in 1 go or does it need to be done 1 at a time?

surreal lagoon
#

if you clone the diffusers repo into your google drive on colab (GPT can help you with that) you can find the examples directory, and in there are some dreambooth, fine-tuning, etc. scripts. you can pick whichever you want and follow the directions in the huggingface hub tutorials for it - that's how i've done it anyway, i'm sure there are different approaches

surreal lagoon
#

like this

#

i fine-tune on 2.1 only because i love its compatibility with higher resolutions and SNR fixes, as well as the penultimate clip handling

#

it's not as "easy" though, it takes a fair bit more understanding of what you're trying to accomplish

hot ether
#

i just want to give it a word like " mouse" and get a mouse in that image style:D

surreal lagoon
#

that sounds like you want a LoRA

hot ether
#

maybe

#

i found it out just today, didnt manage to get into it tho

surreal lagoon
#

or textual inversion

hot ether
#

this idk what is:D

surreal lagoon
#

like a positive/negative prompt, on steroids

hot ether
#

can i feed it with my own images to make it "understand" what i want

trim portal
#

What do you think i shud use for this use case: i have 1000+ images of one specific meme in diff styles. I want to create a model that lets u input any text and get that version of the meme (a batman version, spongebob version etc.)

surreal lagoon
#

a meme in different styles?

#

you mean like pepe with his identity crisis?

trim portal
#

Exactly yea

#

Like this would be trump pepe and the other drooling pepe. I already have captions but have been struggling to get good results finetuning

#

Obv my dataset is not pepe but a diff meme but similar idea

surreal lagoon
#

i think you want a LoRA

tall condor
#

does anyone know if there is a way in auto 1111 to download all dependencies at once?

tall condor
#

@hot ether you may want to take a look at kohya_ss

trim portal
warm agate
#

@surreal lagoon can I DM?

#

I want to discuss it in DMs rather than here

surreal lagoon
#

i'm about to go to sleep but ok

tall condor
#

what is the minimum resolution you guys recommend for training?

#

is 1000x1000 ok?

surreal lagoon
#

it only works at 512 or 768 square

#

the aspect bucket stuff is really poorly implemented, manually crop and centre everything

#

downsampling high res images to the right size will artifact the image too

#

better to crop instead
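A small sketch of the "crop instead of downsample" advice for the 1000x1000 case asked about above. `center_crop_box` is a hypothetical helper; it only computes the crop rectangle, in the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts. Whether a centred crop keeps the subject in frame depends on the image, so treat this as a starting point.

```python
def center_crop_box(width, height, size=512):
    """Return (left, top, right, bottom) of a centred size x size crop."""
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)


print(center_crop_box(1000, 1000))       # 512 square from a 1000x1000 source
print(center_crop_box(1000, 1000, 768))  # 768 square from the same source
```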

warm agate
#

@surreal lagoon I have DM'ed

#

Kindly please check your inbox

warm agate
surreal lagoon
#

nope not impossible

#

i will answer here. i think using a llm to caption images is a waste of resources and money

#

you already have a Clip encoder that can caption, and it can be fine tuned on new captions

#

unless you are training an encoder from scratch, i see no point

warm agate
surreal lagoon
#

it means you are making BLIP work better

#

openclip was trained on like 2 or 3 billion image caption pairs

#

i doubt theres a lot that can be improved on that with a llm

warm agate
surreal lagoon
#

first, demonstrate to me that small captions dont work well

#

dont simply assume

warm agate
#

Lemme show you the output

surreal lagoon
#

why do you need the details captioned at all

#

you are tuning its current vocabulary

#

it will be fine and you can use shorter prompts to get good results then

#

if you want new keywords added then you will have to add them to each caption yourself

#

its tedious but not the end of the world. you arent training on 10k or 100k images, so

warm agate
#

I am away from my pc

#

I'll share the images and their captions generated by ED2 captioning(blip2)

surreal lagoon
#

ok you seem to be taking on a project you arent prepared for at all

warm agate
warm agate
#

What are the things that I have to check for?

surreal lagoon
#

start small

#

observe what different parameters change for an end result

#

your 100k images will take more than a week on a single gpu to train effectively

#

imagine finishing that and realising it wont work

#

i started with 300 images, which weren't enough. 3k images were okay, but training starts to take long enough that the text encoder could be damaged. the most recent try was 30k images and it took a lot of work to figure out

#

100k images to me would require a much smarter approach to training the text encoder

#

smarter doesn't mean faster: it spends some extra compute to optimize training, but layers can converge faster

#

so in a roundabout way, it is faster overall, but with fewer iterations per second

warm agate
surreal lagoon
#

i mean the text encoder

warm agate
surreal lagoon
#

time to google

warm agate
surreal lagoon
#

it does most of the work

warm agate
surreal lagoon
#

and more

#

do it and find out lol

warm agate
#

@surreal lagoon which training method is best suitable for landscape photography?

#

And which one for faces?

surreal lagoon
#

these all feel like questions that google can easily answer and im not trying to be rude but it feels slightly rude to keep asking someone stuff that is so easily discovered. like, i am not a search engine, ya know?

#

i will say good luck training on the word city or downtown because new york times square is overfitted something fierce and likely cant be fixed

warm agate
#

Np

#

Will check through Google

rose bridge
#

Is it possible to train lora's with an m1 apple? i tried googling but it doesn't seem supported yet by most GUI i seen.

tall condor
#

@surreal lagoon my model is producing strange patterns after a while of training while other things keep training well. is there a way to find out what causes those patterns? for example it looks like skin is being ripped off and stuff. it feels like a single image is becoming too dominant or so. any idea how to tackle that?

surreal lagoon
#

are you training the text encoder? if so, are you freezing any of it?

#

shapes, textures and patterns are pretty strong features in the lower-to-mid layers of the text encoder

tall condor
#

what do you mean when you say freezing?

#

do you think it makes sense to stop the text encoder training at the point where i start seeing patterns?

tall vault
#

How can I create images with a specific person's face in it? I know I can try to get the description of the image. But can I train the model to know a person's face?

#

ok so I guess dreambooth can do this but I only have 6GB of vram

#

so, not possible?

valid coral
#

Heyyyy

So I did this tute:
https://youtu.be/3uzCNrQao3o

Took me almost 6 hours to get through it

In the end, results were AWFUL

How to install famous Kohya SS LoRA GUI on RunPod IO pods and do training on cloud seamlessly as in your PC. Then use Automatic1111 Web UI to generate images with your trained LoRA files. Everything is explained step by step and amazing resource GitHub file is provided with necessary commands. If you want to use Kohya's Stable Diffusion trainers...

#

Is this still the best or is there a better one? I've heard aitrepreneur makes good ones?

#

Also wondering if I had the wrong python version when I did training ......

surreal lagoon
#

it really might depend on your images that you're using

valid coral
#

And then wondering if that wrong python version is resulting in the loras not getting loaded properly in a1111

surreal lagoon
#

try with like 15 images and no class data and about 4000 steps at a learning rate of 1e-4

valid coral
#

I have previously used the exact same images (13 of them) with Shivam's Dreambooth colab thingy, got great results

surreal lagoon
#

oh

#

well dreambooth captions and lora stuff are different

valid coral
#

Yeahh ... .. I dunno anything now 😄 my info is all from November, forever in the AI world

surreal lagoon
#

i have never actually messed with lora

#

sounds messy

valid coral
#

ahh? yeah see the terms are all new and I really don't know what is what

#

dbooth was destructive etc and lora is supposedly not?

#

But how do people make their civitai stuff? 😛

#

So basically I wanna add some faces to a civitai model ......

#

What's your personal favorite way of doing that?

surreal lagoon
#

dreambooth doesn't have to be destructive

#

freeze about half the text encoder

#

do as few steps as it requires to actually get the results you seek

#

you want to have a few validation prompts like different celebs or 'a random european man' kind of thing to ensure you are not breaking it

#

i usually check these:

        "woman": "a woman, hanging out on the beach",
        "man": "a man playing guitar in a park",
        "child": "a child flying a kite on a sunny day",
        "alien": "an alien exploring the Mars surface",
        "robot": "a robot serving coffee in a cafe",
        "knight": "a knight protecting a castle",
        "menn": "a group of men",
        "bicycle": "a bicycle, on a mountainside, on a sunny day",
        "cosmic": "cosmic entity, sitting in an impossible position, quantum reality, colours",
        "wizard": "a mage wizard, bearded and gray hair, blue  star hat with wand and mystical haze",
        "wizarddd": "digital art, fantasy, portrait of an old wizard, detailed",
        "macro": "a dramatic city-scape at sunset or sunrise",
        "micro": "RNA and other molecular machinery of life",
        "gecko": "a leopard gecko stalking a cricket"
valid coral
#

And you enter these into what or where 😄 an a1111 extension? A separate thing?
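One way a prompt table like the one above could be used, sketched under assumptions: every time a checkpoint is saved, each prompt is rendered with the same fixed seed so the images can be compared across steps. `validation_jobs`, the seed value and the filename pattern are all hypothetical, and the actual image generation call is deliberately left out. Only two of the prompts are repeated here to keep it short.

```python
VALIDATION_PROMPTS = {
    "woman": "a woman, hanging out on the beach",
    "gecko": "a leopard gecko stalking a cricket",
}


def validation_jobs(prompts, step, seed=42):
    """Yield one job per prompt with a fixed seed and a step-tagged filename."""
    for name, prompt in sorted(prompts.items()):
        yield {
            "prompt": prompt,
            "seed": seed,  # same seed at every checkpoint, so changes come from the weights
            "file": f"step{step:06d}_{name}.png",
        }


for job in validation_jobs(VALIDATION_PROMPTS, step=400):
    print(job["file"], "<-", job["prompt"])
```

If the gecko or the beach scene starts degrading from one checkpoint to the next, that is the signal that training is damaging things outside the dataset.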

surreal lagoon
#

in fact my dreambooth script there is the least destructive i know of but it's more like a fine-tuning script now...

valid coral
#

OK ........ that's not as straightforward/obvious as I was hoping 😄

#

But perhaps an avenue I shall explore

#

For now, I am neglecting my child and must get back to daddy duty 😛

valid coral
surreal lagoon
untold moss
#

what is the recommended resolution in ppi for training? is 72 enough?

stiff dust
#

cool thing on LORA is: you can train it and then afterwards disable certain layers to check what happened

#

for example, I found that when training on my face it was the unet that made trouble (my photos are rather low quality android photos and the unet very quickly learned the graininess of the photos). So scaling down the unet and relying more on the text encoder fixed that for me

#

similarly, you can disable the first k layers of the text encoder and check how that affects your results

#

you can do all that afterwards and find out what went wrong in your training. Next, you retrain the lora or dreambooth but this time with removing the layers that caused harm

#

@valid coral if you don't want to write your own code, there is a tool called EveryDream2trainer that can do DreamBooth and that already comes with tuned parameters

#

like it already freezes the first 16 layers of text encoders in Sd 2.1 and so on
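For anyone wondering what "freezing" means in practice: a frozen parameter is simply excluded from gradient updates. A minimal sketch of the selection logic, assuming parameter names follow the `text_model.encoder.layers.<index>.` pattern used by the Hugging Face CLIP text model; `should_freeze` is a hypothetical helper, and the cut-off of 16 is just the number mentioned above.

```python
import re

_LAYER_RE = re.compile(r"encoder\.layers\.(\d+)\.")


def should_freeze(param_name, first_n_layers, freeze_embeddings=True):
    """True if the parameter belongs to the embeddings or to one of the first N layers."""
    if freeze_embeddings and ".embeddings." in param_name:
        return True
    match = _LAYER_RE.search(param_name)
    return match is not None and int(match.group(1)) < first_n_layers


# With a PyTorch text encoder loaded (not done here), this would be applied as:
#   for name, param in text_encoder.named_parameters():
#       param.requires_grad_(not should_freeze(name, 16))
for name in (
    "text_model.embeddings.token_embedding.weight",
    "text_model.encoder.layers.3.self_attn.q_proj.weight",
    "text_model.encoder.layers.20.mlp.fc1.weight",
):
    print(name, "->", "frozen" if should_freeze(name, 16) else "trainable")
```

Keeping the decision in a plain function makes it easy to try different cut-offs between runs without touching the training loop.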

valid coral
#

venv fixed. Seems the dbooth extension for a1111 has been broken for a while. Saw a couple fixes on various forums that didn't seem to solve my issue. SO FORGET THAT IDEA

Now back to seeing if I can get dbooth to train in kohya_ss.

If not ... EveryDream Trainer 2.0 💪

valid coral
#

OK! I finally got all my ducks in a row and hit the Train button for Dreambooth in kohya_ss.

Shortly after, I got a memory error 😄
(I have 12 GB)
So that's an official "no" to running Dbooth on my machine? There's no config to edit or slower way to do it?

"try setting max_split_size_mb to avoid fragmentation" -- not an option?

🙏
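The `max_split_size_mb` hint in that error refers to PyTorch's `PYTORCH_CUDA_ALLOC_CONF` environment variable. A minimal sketch of setting it from Python; `configure_allocator` is a hypothetical helper and the value 128 is an arbitrary starting point, not a recommendation from the trainer. It can reduce fragmentation, but it will not help if the job genuinely needs more than 12 GB.

```python
import os


def configure_allocator(max_split_size_mb=128, env=None):
    """Set PYTORCH_CUDA_ALLOC_CONF so the CUDA caching allocator avoids splitting large blocks."""
    env = os.environ if env is None else env
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return env["PYTORCH_CUDA_ALLOC_CONF"]


# This has to run before torch makes its first CUDA allocation,
# so in practice before the training script is imported or launched.
print(configure_allocator())
```

The same variable can also be set in the shell before launching the trainer, which avoids editing any script.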

warm agate
valid coral
#

This is a heck of a place to ask that 😄

warm agate
valid coral
#

Not the worst place. There are people here that might know. But are they awake right now ....

valid coral
# warm agate Not sure

Are you familiar with Text Generation WebUI? There's a discord, I've gotten some great support with related stuff there...

#

I just tried sharing the link but it was blocked 😛

#

Easy enough to google it

valid coral
#

Indeed!

valid coral
warm agate
#

@valid coral

#

What might be the error?

valid coral
#

Just looks like a timeout error .... tried installing too many github things? Happened to me once. Go to the main github website in a browser and see if it has you on timeout...

#

also not sure what the "jllllll" thing is about

warm agate
valid coral
#

And you're using the one-click installer?

rare niche
#

Is it better to train on a celebrity with a smaller set of images (40) or larger set of images (160)? How do you decide how much is too much? I heard that 100 steps per image is correct. Is that true?

tall vault
#

I'm trying to figure out how to train for specific faces. My graphics card only has 6GB vram so I can't use Dreambooth so I am trying to use Kohya to create a lora but everything I try with Kohya causes errors and nothing works.

What should I do?

warm agate
#

@valid coral Same error when I ran pip install -r requirements.txt through the normal installation method

jaunty grove
#

Hi All,

Noob to AI Art here, having started generating with early access to the Leonardo AI cloud platform 2 weeks ago, where I generated a fair few images using different models and even trained a couple of models as well (albeit through a very simple, user-friendly GUI)

I swiftly moved on to installing Automatic1111, and SD, and have been doing some great local generations, and upscales using different models and LOra's from Civitai etc.

Today I installed Kohya_SS GUI, after much frustration as the install doesn't "just work" when running Setup.bat, or at least didn't for me.

I set about doing my first training in the Kohya GUI, and provided a set of 130 images, and used WD1.4 to tag them. Tags look good enough to me, so I went ahead with the training, and got my first character/person LoRA out of it. It kind of worked, and it's had an effect, but it's not strong enough.

I just don't know what I'm doing with the Repeat, Batch Size, and Epoch settings 😦 I ran that first Lora on defaults:

Repeats: 40
Batch Size: 1
Epoch: 1
Optimizer: AdamW8bit
Text Encoder Learning Rate: 5e-5
UNet Learning Rate: 5e-5

I have no idea how this correlates to the number of steps, how often I should output a sample image etc

I understand an epoch is a complete pass over the data set, where each image is trained "repeat" times as part of the epoch. What's a good number of steps to aim for to train a model? How does the number of repeats / epochs affect things?

Any advice appreciated, or pointers to some good resources.

Thx

jaunty grove
#

I think I just answered some of my own questions lol:

I wanted to know how image repeat, epochs etc related to steps, and then saw this in the Kohya output:

#

That tells me what I want to know, but what I don't know is, is 26K steps good? Is there any guidance around this, and also the batch size.

I know what batch size is when doing generations. Is it the same here? So if I set batch size to 4, I'd end up with 26K x 4 steps, and the idea being that each pass over an image, per epoch, will generate 4 images to train on
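On the batch size question: in training, a larger batch does not multiply the step count or generate extra images. Each optimizer step simply processes several training images at once, so the number of steps goes down. A rough sketch of the usual arithmetic, assuming steps are counted as images x repeats / batch size per epoch (bucketing can shift the exact figure a trainer reports); `total_steps` is a hypothetical helper.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size):
    """Approximate optimizer steps for a repeats/epochs style training config."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# The default run described above: 130 images, 40 repeats, 1 epoch, batch size 1.
print(total_steps(130, 40, 1, 1))
# Same data with batch size 4: a quarter of the steps, four images per step.
print(total_steps(130, 40, 1, 4))
```

Because each step at batch size 4 sees four images, the total number of images shown to the model is unchanged; only the number of weight updates differs.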

#

I'm running a training now, and have set it to output an image every 100 steps. That's gonna be a crap ton of sample images as it trains

tall vault
#

@valid coral did you have any luck with training? I'm also trying to figure out how to do it

jaunty grove
# tall vault @valid coral did you have any luck with training? I'm also trying to fi...

Well, I took out a few images that I thought would pollute the data set because in one the person was wearing a hat, and a couple of others they had weird colour contacts in.

I tried with batch size 4, epoch 10, repeats 30.

Took an hour, and I could see it was getting closer to representing the person, but wasn't quite there. I was watching the sample pics being generated.

I'm doing another run, this time batch 4, epoch 20, repeats 40 which is likely to take 3 hours.

My learning rate for both is 5e-4.

I'm new to it all, both AI art with SD, and training. Fingers crossed I'll get something good this time.

I'm doing lots of reading to try and understand everything

valid coral
#

The Dreambooth extension is broken with a recent update of A1111 ... and the lora I trained with kohya_ss was also totally unusable (generated errors).

tall vault
#

mama mia, has anyone been able to get this to work?

valid coral
#

Of course! Just nobody that's here right now 😄

#

And I just hit the part in the tutorial video where the guy shows it running -- it's using 22 GB of VRAM ... so that's the end of EveryDream2trainer for meeee!

valid coral
tall vault
#

Ya I think I might need to do the training on something that's cloud hosted. Then once I have the lora I'll be good?

#

if I'm doing cloud hosting maybe I should use dreambooth? does dreambooth produce lora's too?

jaunty grove
jaunty grove
valid coral
#

Yeah I'm gonna rewatch the kohya_ss tutorial video and do exactly what the guy did, instead of branching off with my own model...

valid coral
tall vault
#

then it says " Ignore above cudart dlerror if you do not have a GPU set up on your machine."

#

but I do have a GPU

#

I have a laptop with integrated graphics and a GTX 2060. So maybe its accidentally using the integrated one?

tall condor
#

anyone know if it a problem that i have buckets with only 1 image?

#

also if i increase the bucket size does that mean that my images are also handled in bigger chunks or is the only difference the grouping?

jaunty grove
# tall vault then it says " Ignore above cudart dlerror if you do not have a GPU set up on yo...

I have the solution for that, as I was installing kohya for the first time today and had that error.

What I did was copy that file from the Automatic1111 directory to the root folder of kohya.

I then got errors about my xformers version and PyTorch version.

To fix this, from a command prompt in the kohya venv root dir, I ran the following to install the latest xformers and PyTorch+cu118 (kohya wasn't running at this time):

pip install -U xformers

pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

After that, it amazingly works.

Why the setup.bat of the kohya GUI just doesn't work is a mystery.

Took me a while to figure out, as whilst I am a dev, I know nothing of python.

If you need the cudart DLL let me know

tall condor
#

found this, IMO it's quite useful

#

it describes the kohya_ss parameters quite well

tall condor
#

if i don't "upscale bucket resolution" (don't downscale my images to fit the max size) in combination with random crop, does that mean that each time the image is picked for learning, a random 512x512 section of the image is selected, but without scaling the image down?
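To make the behaviour being asked about concrete, here is a sketch of what "random 512x512 section, no scaling" would look like. This illustrates the idea only and is not a confirmation of what kohya_ss actually does with those options. `random_crop_box` is a hypothetical helper that returns a crop rectangle in `(left, top, right, bottom)` order.

```python
import random


def random_crop_box(width, height, size=512, rng=random):
    """Pick a random size x size window inside the image, leaving the pixels unscaled."""
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - size)
    top = rng.randint(0, height - size)
    return (left, top, left + size, top + size)


rng = random.Random(0)  # seeded only so the demo is repeatable
for _ in range(3):
    print(random_crop_box(2048, 1536, rng=rng))
```

With a large source image, each window covers only a small part of the picture, so the caption may describe things that are not visible in the crop the model is shown.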

surreal lagoon
#

400 steps into trying to burn a model lol

tall condor
#

if its only 400 steps xD

surreal lagoon
#

2.1 can burn like ice

tall condor
#

all my 2.1 trials failed hard so i stick with 1.5 for now xD

surreal lagoon
jaunty grove
# tall vault THAT WORKED

Glad to help out. I really struggled this morning to get Kohya GUI to work, had too many issues, but glad I got it sorted eventually, and also that I could spread word of how to get it working for others 🙂

My own lora works nicely (though I need to re-train to get a closer look to the person I trained on), but what I'm finding is that when combining with other lora's, the prompt weights seem off. it's like my lora's weight is too heavy, and I have to really up the prompt weight of the other lora

tall condor
#

maybe your model is overfitting too much

surreal lagoon
#

the bicycle with lord of the rings i am trying to get transferred into the model

tall condor
#

yea i see no ring xD

surreal lagoon
#

the seat is beige

#

very hobbitsy, you see

tall condor
#

xD

surreal lagoon
#

look at the blurry, soft focus

tall condor
#

come back when it can ride in mordor - then we talk!

surreal lagoon
#

look at its gandalf

#

hes so smol

tall condor
#

please have gandalf ride that thingy

surreal lagoon
#

the ballrog?

tall condor
#

no the bicycle

surreal lagoon
tall condor
#

@surreal lagoon can you answer this: if i don't "upscale bucket resolution" (don't downscale my images to fit the max size) in combination with random crop, does that mean that each time the image is picked for learning, a random 512x512 section of the image is selected, but without scaling the image down?

surreal lagoon
#

omg it doesn't do it

#

oh i typed basketball rog

tall condor
#

that gandog?

surreal lagoon
#

gandog the gray

surreal lagoon
#

you know my feelings on this bucketing nonsense

tall condor
#

yea i know but if the behavior is as i describe i can actually even see a benefit

#

because it would learn way more details on a model this way

surreal lagoon
#

it's possible

#

it's also possible it'll break it

tall condor
#

btw my patterns are gone after latent upscaling all the images

#

i guess it can only break it if the images are too big and the sections are too small

#

but i see your point

#

probably creating those detail crops beforehand and then captioning them properly would be the best option

surreal lagoon
#

baseline's understanding of a leopard gecko man

tall condor
#

but i just have too many images for that

#

lizmouse?

surreal lagoon
#

leopard gecko

tall condor
#

make a cacco? xD cat gecko?

surreal lagoon
tall condor
#

also its way more gecko than man

surreal lagoon
#

man isn't in the prompt!!!

#

you

#

you monster.

tall condor
#

but still nice result

surreal lagoon
#
        "woman": "a woman, hanging out on the beach",
        "man": "a man playing guitar in a park",
        "child": "a child flying a kite on a sunny day",
        "alien": "an alien exploring the Mars surface",
        "robot": "a robot serving coffee in a cafe",
        "knight": "a knight protecting a castle",
        "menn": "a group of men",
        "bicycle": "a bicycle, on a mountainside, on a sunny day",
        "cosmic": "cosmic entity, sitting in an impossible position, quantum reality, colours",
        "wizard": "a mage wizard, bearded and gray hair, blue  star hat with wand and mystical haze",
        "wizarddd": "digital art, fantasy, portrait of an old wizard, detailed",
        "macro": "a dramatic city-scape at sunset or sunrise",
        "micro": "RNA and other molecular machinery of life",
        "gecko": "a leopard gecko stalking a cricket"

tall condor
#

very much and clear details

surreal lagoon
#

it's the gecko prompt

#

yeah when you train 2.1 on terminal SNR it goes amazingly clear and crisp

#

SAI needs to update it with this baked in

tall condor
#

my main issue with my models atm is that the training sometimes picks up the colors and patterns more than the objects. is there a way to tackle that?

surreal lagoon
#

yeah freeze the text encoder more

tall condor
#

also can i see your "cosmic"

#

my next try will be to stop the text encoder at 50% and see where it takes me

surreal lagoon
tall condor
#

looks cool

surreal lagoon
#

pretty cool eh

#

way better than 2.1 without SNR fixes

tall condor
#

and lol for "menn"

surreal lagoon
tall condor
#

ill copy that xD

#

it was my idea

#

and if it sucks it was yours

#

some of my concepts have very few pics (maybe 2-3) - do you think the batch size of 6 will kill those concepts?

#

im quite happy that i could get rid of those patterns as they were a real PITA

surreal lagoon
#

i broke it. my cosmic prompt

tall condor
#

how?

surreal lagoon
#

successfully regressed Stable Diffusion's capabilities by like 3 years KEK

surreal lagoon
tall condor
#

some of my concepts have very few pics (maybe 2-3) - do you think the batch size of 6 will kill those concepts?

surreal lagoon
#

no idea

tall vault
#

I was trying to see if I could successfully create a lora with kohya so I just made one with 2 images. It's still taking a while, downloading a bunch of stuff it seems, and I gotta take my laptop with me to the airport in like 10 min. If I cancel the operation will it screw up my kohya installation?

tall condor
#

likely yes

#

pseudoterminalx: does it make sense to train a 1.5 model with images greater than 512px?

jaunty grove
surreal lagoon
#

broke it again, this is supposed to be a wizard

tall condor
#

i mean, kinda looks like a wizard xD

tall vault
surreal lagoon
#

100 steps earlier tho

tall vault
#

What are you trying to make?

surreal lagoon
#

i've extracted frames from The Hobbit and i'm transferring its style

#

experimenting with different text encoder layers training

#

so far it seems like the 16th layer is too early still

#

too many fundamentals you can break

tall vault
#

Very interesting. Where did you learn about layers?

surreal lagoon
#

by breaking them

jaunty grove
# tall vault I didn’t change any of the defaults. Maybe it downloaded stuff because it was th...

I've a 900Mbps connection, so maybe I didn't notice the downloads lol, as I was doing other stuff at the time as well.

Hope you get a lora created. Any problems let me know, and I'll see if I can help.

I'm new to kohya as well, as of this morning. I'm on my second training now.

It's going to take 3 hrs total on my 4080. Around 24K steps to train in total.

Same one I did earlier, that came out pretty good. But it wasn't quite close enough, so I'm re-running with more epochs and repeats

#

@surreal lagoon Are you using kohya?

How do you go about selecting the layer to train?

surreal lagoon
#

i'm using a script i'm developing as i go

#

i love this one

#

looks so damn bootleg

tall vault
#

Thanks @jaunty grove I’ll definitely give you a ping if I run into more trouble. I’m trying to figure out if I can create a Lora with someone’s face and use it in combination with the studio ghibli Lora to make ghibli themed profile pics

tall vault
#

@surreal lagoon do you have a background in data science?

surreal lagoon
#

lol i just keep breaking it god damn that's supposed to be a bicycle

#

same ckpt

#

sigh

valid coral
valid coral
surreal lagoon
#

this training session is going much better.

#

i've obviously bumped the weights of "people" up immensely. a scene that just describes a robot serving coffee, now serves it to a person

#

this one, in other training sessions with different dataset, would result in an animated wizard

warm agate
#

are vast.ai gpu prices cheaper than runpod?

#

As its just 1 cent/h and on run pod its $1.4/h

surreal lagoon
#

that's so cheap. why

warm agate
surreal lagoon
#

i would honestly do performance tests. it might be overshared

warm agate
#

Its 4x3090

#

It shows VRAM of 24GB

#

so does it have 24gb vram per gpu or 24gb vram collectively?

surreal lagoon
#

hmm

#

24 per

#

that's priced linearly above the 48gb system too

#

damn so i can run like a cluster over there for what i pay now KEK

warm agate
#

so we get 96gb vram

surreal lagoon
#

seems like it

warm agate
surreal lagoon
#

yeah

warm agate
#

Runpod is exactly twice the price of vast.ai

surreal lagoon
#

that's a more common price

#

still really good price

warm agate
#

Is vast.ai displaying low prices for new users

surreal lagoon
#

good question

warm agate
#

SO they can acquire the customers?

surreal lagoon
#

well let me know if you find out lmao

warm agate
#

Ok

#

I am contacting the customer support

#

see this

surreal lagoon
#

the woman hanging out at the beach has always been a difficult prompt for 2.1 for some reason but now she even has all the right number of fingers. they're just the wrong ones

warm agate
surreal lagoon
#

hm that shoulder tho

warm agate
surreal lagoon
#

i'm just continuing training it lmao

#

you guys inpaint whatever ya want

surreal lagoon
#

not sure

surreal lagoon
#

@warm agate the price increases when you allocate more than 10gb of storage

warm agate
surreal lagoon
#

and also it's just random people's computers i think

warm agate
surreal lagoon
#

so the datacenter prices are still good

#

makes longterm prices much higher tho, about 2x the price for 60gb of space

warm agate
#

We have a reliability score, so we can easily estimate

#

@surreal lagoon do you know how we can add minigpt4 into text-generation-webui?

surreal lagoon
#

nope.

warm agate
#

@valid coral how to add minigpt4 into text-generation-webui, I have asked in their discord, but they seem unresponsive.
i have added minigpt into the pipeline, but how to work with the model?

surreal lagoon
#

not really the right channel for any of that

valid coral
warm agate
#

@valid coral Can you debug this

valid coral
#

Debug? It looks fine to me...

But no, not really, I have probably 3% knowledge when it comes to Python.

#

I see you were getting a bunch of replies on the other Discord, but you were in the "dev" channel and not a support channel.

#

Asking all over the Interwebs is only gonna get you banz0red 🙂

So .... yes it's frustrating and confusing, but that's where we're at, this stuff is very much in its infancy.

warm agate
tall condor
#

@surreal lagoon: what can i expect from the UNet training after freezing the TE? what should i focus on to see if training is still improving?

surreal lagoon
#

i would expect it to start taking on the textures of your images more than their contents

#

at least for sd2.1 it can kind of improve the model to keep training the unet but if your source material isn't truly high quality it just ruins it

#

the text encoder is most worthwhile to train

#

it's also the hardest to 😦

#

currently i'm testing offset noise for the first time and i'm just not expecting to see stuff like this in my results. is that what it does at first before swinging back and making more sense?

surreal lagoon
#

well

#

tuning with multiple GPUs apparently you can't freeze the text encoder during training, at least, not the way i've done it

tall condor
#

for me when i reduce the noise offset to 0.02 the results are getting way better
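For context on what that number controls: offset noise adds one extra random constant per channel to the usual Gaussian training noise, scaled by the noise offset, which lets the model learn overall brightness shifts. A dependency-free sketch of the idea follows; real trainers do this on batched latent tensors, and `offset_noise` here is a hypothetical stand-in that uses nested lists for a single sample.

```python
import random


def offset_noise(channels, height, width, noise_offset=0.02, seed=0):
    """Gaussian noise plus one random constant shift per channel."""
    rng = random.Random(seed)
    noise = []
    for _ in range(channels):
        # One extra draw per channel, scaled by noise_offset, shifts the whole channel.
        shift = noise_offset * rng.gauss(0.0, 1.0)
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                      for _ in range(height)])
    return noise


plain = offset_noise(4, 8, 8, noise_offset=0.0)
shifted = offset_noise(4, 8, 8, noise_offset=0.02)
for c in range(4):
    # Same seed, so the only difference is the constant per-channel shift.
    print(c, round(shifted[c][0][0] - plain[c][0][0], 4))
```

A smaller offset means smaller per-channel shifts, which fits the observation that 0.02 behaves more gently than larger values.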

#

also it got rid of a lot of weird stuff for me

#

what value did you pick?

surreal lagoon
#

GPT4 is telling me i can't have offset noise trained in and also freeze the text encoder so early 😄

#

it says it is not going to work

surreal lagoon
#

oh whoa, GPT4 was right LMAO

#

god damn it

#

i hate it when the robot is right

tall condor
#

0.02

#

try that

#

works quite well even for dark pics

#

also you need to increase your epochs with low noise

surreal lagoon
#

thank you

surreal lagoon
#

love how thorough the notes from LAION are on OpenCLIP

#
H/14 with big batch size works well, but unstable, and very hard to recover

Planned for 256 * 135M epochs from 2B-en
Spike at epoch 122. Tried a lot of stuff to recover
Only one thing worked: decreasing lr fast for 8 epoch, got 74% that way
Batch size 79k, starting lr 5e-4
1 week to train + many days to try to figure it out
800gpus
Doing 8 epoch with batch size 158k gave 75.4%

Finished up to 256 at batch size 79k in bfloat16 and reached 78.0%
#

so trying to fine-tune 2.1 on a huge cluster of GPUs is harder than training on a small group

#

i hadn't seen this page before now but it's fun how my results mesh with theirs

#

kinda wish they'd started from scratch once they figured it out again

surreal lagoon
tall condor
#

i had the same experience

#

especially for very dark and very bright. and for me after adding that low noise value the contrast got much better in general

surreal lagoon
#

omg omg omg

#

it's happeeennnningggg

#

ALL schedulers in diffusers will do zero terminal SNR now

tall condor
#

great work man

#

cant wait to see this feature in kohya

surreal lagoon
#

oh Max is the one who did all the work, i just trained a model that allowed them to test it

tall condor
#

@surreal lagoon can you friend/pm me, i have 1 question regarding hardware

surreal lagoon
#

i hate doin that tbh i get a huge friends list full of people i don't know lmao

tall condor
#

np. can you recommend any AI workstation with 2-4 4090?

#

what hardware setup are you on?

surreal lagoon
#

2 to 4 of them? 😮

#

threadripper, to start with

#

dual power supplies.. it's a lot

#

like, my 5800X3D and the ASUS X570p (AM4) are capable of having two GPUs but the 4090 uses three slots

tall condor
#

things i have seen so far have capabilities for 2x 4090

surreal lagoon
#

so get one 😛

tall condor
#

i already have 2 workstations with 1 each but it's soooo darn slow

surreal lagoon
#

it kind of doesn't go faster with two

#

you just get higher batch size

tall condor
#

i am a bit concerned that high batch sizes mess up my concepts

surreal lagoon
#

you want your entire dataset absorbed in a single shot if possible

#

that is the best, but, no one can do that

#

that's the only reason we batch stuff

#

so, the higher the better

tall condor
#

i see

#

it just still doesnt make sense to me if i mix up the learning result of 2 different things in one update that it can still learn both the things you know

#

and mixing 6 of them IMO cant make it any better

#

just makes no sense in my head

surreal lagoon
#

i just do what the AI tells me

surreal lagoon
#

i now have a pretty good National Geographic dataset

tall condor
#

cool

#

i found a way to add another rtx4090 to one of the workstations xD

#

@surreal lagoon do you have any other useful tips for training larger datasets? i still have some issues where the details in the pictures are not picked up very well

surreal lagoon
#

nay

tall condor
#

also any tips on how i can prolong the training and improve the dataset in the long run?

#

so far 100-200 epochs are getting me somewhere but some concepts are still generating very badly

surreal lagoon
#

well

#

you might need to up the learning rate and freeze more layers for 1 epoch

#

and then, go back to old settings

#

or unfreeze more layers

#

it's a game about tricking the model into a new space that is in the direction you want and then slowing down and refining it

tall condor
#

if i reduce the LR the concepts mostly won't get created at all

surreal lagoon
#

that's why the polynomial learning rate has a really high learning rate for a number of warm up steps

#

so it can move the model into a new zone that it needs to clean up the output of

#

then the learning rate decays and slows down
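A rough sketch of that schedule shape, assuming the usual formulation: the rate ramps linearly up to its peak over the warm-up steps, then decays polynomially toward an end value. The function name `lr_at` and the example numbers are made up for illustration, they're not anyone's actual settings.

```python
def lr_at(step, peak_lr, warmup_steps, total_steps, end_lr=0.0, power=1.0):
    """Learning rate at a given step: linear warm-up, then polynomial decay.

    Hypothetical helper for illustration; real trainers ship their own schedulers.
    """
    if step < warmup_steps:
        # Warm-up: climb from 0 to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    if step >= total_steps:
        return end_lr
    # Fraction of the decay phase already completed (0.0 right after warm-up).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return (peak_lr - end_lr) * (1.0 - progress) ** power + end_lr


if __name__ == "__main__":
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at(s, peak_lr=1e-6, warmup_steps=100, total_steps=1000))
```

With `power=1.0` the decay is a straight line; higher powers drop off faster early on and flatten out toward the end.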

tall condor
#

im currently using constant scheduler

#

but i tried cosine and tbh i didnt see much difference

dark gale
#

Hello! anyone using Runpod? I want to know if it's worth using

surreal lagoon
#

@tall condor ever seen this pixelation?

surreal lagoon
#

hm it's not always there

stone garden
#

/imagine, living room, parquet floor, large French window, sunset, country-style furnishings, sofa, fireplace

tall condor
#

looks like it failed to create the depth of field

#

which is ok IMO

surreal lagoon
surreal lagoon
#

😄

#

progress

surreal lagoon
#
Meta key: Title values: ["RHODES"]
Meta key: Creator values: ["Cushman, Charles W., 1896-1972"]
Meta key: Date modified values: ["02\/03\/2022"]
Meta key: Subject values: ["Towers","Spires","Seas","Forts & fortifications","Coastlines","Buildings","Clouds","Waterfronts","Islands","Rhodes (Greece : Island)"]
Meta key: Roll Number values: ["4-65"]
Meta key: Date Created values: ["1965-04-04"]
Meta key: Source values: ["P13980"]
Meta key: Holding Location values: ["Bloomington - University Archives<br \/>Wells Library E460<br \/>1320 E 10th St.<br \/>Bloomington, IN 47405<br \/>Contact at <a href=\"mailto:archives@indiana.edu\">archives@indiana.edu<\/a>, <a href=\"tel:812-855-1127\">812-855-1127<\/a>"]
Meta key: Alternate ID values: ["465.37"]
Meta key: Campus values: ["IU Bloomington"]
Meta key: City values: ["Rhodes"]
Meta key: State/Province values: ["Aegean Islands"]
Meta key: Country values: ["Greece"]
Meta key: Genre values: ["Seascapes","Cityscape photographs"]
Meta key: Call Number values: ["P13980"]
Meta key: Frame Number values: ["37"]
Meta key: County values: ["Sporades"]
Meta key: Persistent URL values: ["http:\/\/purl.dlib.indiana.edu\/iudl\/archives\/cushman\/P13980"]
Meta key: Cushman Identifier values: ["P13980"]

generates the caption:

Generated title for image: rhodes towers spires seas forts fortifications coastlines buildings clouds waterfronts islands (greece island) aegean seascapes cityscape photographs sporades county
surreal lagoon
#

this is amazing

sturdy dagger
#

I would like to run some dreambooth training with ShivamShrirao repo, two questions:

  • Is it mandatory to provide a class_data_dir folder?
  • Is there somewhere I could find good quality regularization photos (for men and women)?
tall condor
#

gabinino: i recommend you start without regularisation

#

and see where it takes you. as far as i understand you cannot just take any regularisation images, they need to be made with the model you train on

jade hinge
#

I have discovered a workflow that has never been explored before, which allows for studio-quality realism beyond expectations using Stable Diffusion DreamBooth / LoRA training. To achieve this workflow, it required an exceptionally high-quality dataset of classification / regularization images. Additionally, I developed a script capable of autom...

#

can be used for fine tuning

sturdy dagger
jaunty grove
# tall vault Thanks @jaunty grove I’ll definitely give you a ping if I run into more ...

Hey Snubber,

I now know why you were getting all those downloads in Kohya when you kicked off training that first time. It's because you had the default model path in "Source model", that causes Kohya to go off and download all the checkpoints from the runwayML git.

Probably a bit late now, but what you need to do, and what I did, is press the paper icon button and open an existing checkpoint/safetensor model e.g. SD1.5, or RPGv4, or any other model that you likely have installed in Automatic1111

I just set the path to a model that is in my Automatic1111 directory, and then Kohya just uses that and doesn't download anything

tall vault
#

Okay cool! Good to know thanks for the info @jaunty grove !!

jaunty grove
# tall vault Okay cool! Good to know thanks for the info @jaunty grove !!

Did you get any Lora's trained in the end? I've done two Lora's for two different people. I had to try a few times with different learning rates, epochs, repeat etc, as I've read that for a person around 1500-3000 steps are enough.

I found with too low a learning rate for the unet, and main learning rate, it just didn't take. Too high, and it ended up just looking really bad. I'm still trying to figure it all out. Through trial and error I got my two people Lora's to work, but the weighting seems really "heavy". By that I mean my Lora's default weight compared to other keywords is too high. I've no idea how to change the weightings that get baked into the Lora as part of the learning 😦

tall vault
#

civitai isn't loading for me but if you check out the example in here https://civitai.com/models/6526/studio-ghibli-style-lora
One example they use a zelda lora and the studio ghibli lora to make an image of zelda in ghibli style. And you can see they have different weights for the different loras

#

here it finally loaded

jaunty grove
#

Yeah I can put weights on when I use it, but the default weighting feels too heavy, so in use I end up putting a low weight on it to bring it down a bit e.g. <lora:mylora:0.6>

tall vault
#

ohh ok, interesting

jaunty grove
#

I'm sure it's me missing something. Let me know if you find the same when you get a chance to try 🙂

tall vault
#

will do

hot ether
#

which option to choose guys?

#

3060ti

stiff dust
#

honestly, you don't need accelerator if you have only a single gpu

stiff dust
#

can you share your learning rate, number of steps, and number of images, as well as the rank?

tall condor
#

so it appears running with 2 GPUs is not halving the time for training, anyone know why?

#

also it shows like it is running for 400 epochs even tho i specify 2

#

anyone know why

surreal lagoon
surreal lagoon
#

you just get to run a larger batch size but everything is limited by the main system doing the training

tall condor
#

actually i think that the time is halving but for some reason if i specify 200 epochs it runs 200 epochs per gpu

#

so it will run 400 epochs

#

same for batch. i specify 6 batch but it does 12

#

however 1 epoch does run much faster

#

but what i'm not sure of is if i can just say 100 epochs if i want 200

surreal lagoon
#

yeah it'll scale learning rate too

#

so it'll destroy model faster

tall condor
#

can you elaborate?

#

not sure if i understand

surreal lagoon
#

?

#

the learning rate is impacted by batch size

#

rescaled CFG is scary

tall condor
#

but i thought higher batch size means more stable learning?

#

thus i could even increase the learning rate

#

or did i misunderstand that part

stiff dust
#

no, thats correct

tall condor
#

i wonder what parallel mechanism is used by dreambooth

stiff dust
tall condor
#

if i merge 2 models somehow the result is different if i merge A+B and B+A - size of the first is 5.8GB, size of the 2nd is 7.8GB

#

anyone know why?

#

is the merge not combining both models into one and apply a weight if they both have the same key?

surreal lagoon
stiff dust
#

dunno, I thought it's also just unet.compile(). But to be honest: I didn't notice any performance improvement by compiling (besides waiting minutes until the compiler's done)

surreal lagoon
#

when you compile it and try and do certain operations it has to be recompiled and it breaks if you try and recompile it when it's already done

sturdy falcon
#

hey so I'm not sure if this belongs here or in #📝|prompting-help let me know if this isn't the right channel for this please :)

I'm trying to take a seamless tiling image and upscale it with Controlnet 1.11 tiling resampler, following this guide: https://stable-diffusion-art.com/controlnet-upscale/

Environment Info:

  • A1111 webui
  • dreamshaper model
  • Controlnet v1.1 sd15_tile model
  • Ultimate SD Upscale script

I'm getting it to generate nice upscale details, but it's not seamless at the edges of the image. It's generating abrupt lines where it repeats even though I am selecting the tiling setting under the img2img settings at the top.

**Pictures of settings attached: **

I'm using the website: https://www.pycheung.com/checker/ to check that the images tile seamlessly, here's a comparison of the input and output images:

Images in the tiling checker

Does anyone know how to fix this?

#

I'm gonna post example 1 again so it embeds these two but not everything else, I don't want to clog up the channel lol

visual lichen
#

Sorry if this is the wrong place if so I'll delete:
Looking for guidance on where to go to learn to train "styles" as embeddings or loras or whatever where it won't affect the content much but will affect colors and lighting to make it match a style independent of content.

Also want to learn to train specific concepts better. I have been trying to make monuments into enormous fishtanks with mixed results. Is there a best practices here? I've trained hypernetworks and textual inversions with mixed success

#

Currently I am training by masking out the alpha on everything except my subject. That works ok but I don't have enough control. And I can't make the monuments (such as the leaning tower of pisa) into fishtanks well.

surreal lagoon
surreal lagoon
surreal lagoon
#

you're saying the seams are always detected?

sturdy falcon
surreal lagoon
#

you can try inpainting the seams maybe

#

unfortunately i don't think they will ever be truly invisible. the way it works is by inpainting them already

sturdy falcon
#

hmm I can try inpainting but I doubt it'll work, when I was inpainting images before without upscaling it was messing up the seams

#

I've gotten Topaz Photo AI to upscale them and keep the seamless tiling effect, but it doesn't keep generating the image with fine details like SD Controlnet does, it just kind of makes the low res image sharper and smoother looking

#

yeah, unfortunately inpainting the seams made it worse

#

its like the controlnet is ignoring the tiling setting at the top

surreal lagoon
#

i've managed to reintroduce the idea of smoking 😄

#

they're super smoky smokers now

#

i've upped the resolution of all my validations to see if i've managed to train out the model's tendency toward duplicate subjects, and voila, 1152x768

wispy elbow
#

When running hires fix under latent, it always ruins hands that were previously perfect for me, I've been trying different settings but can't quite seem to get it. Anyone know the issue? Also, anyone able to explain the difference between the different Latent upscalers like Latent Nearest and Latent Nearest Exact? I've tried googling all these things to no avail, so I resort to bugging people here.

surreal lagoon
sturdy falcon
#

So it looks like the stable diffusion upscaler doesn't actually support seamless tiling images, but I might have found a workaround

  1. generate your texture with the tileable setting turned on (result: 512x512 image)
  2. tile the resulting image 2 by 2; meaning 2 tiles in X and Y direction = 4 tiles in total (result: 1024x1024 image)
  3. upscale the 2-by-2 tiled image as much as you like (result: for example 4096x4096 for 4x upscaling)
  4. crop the center part of the upscaled image (result in this case a 2048x2048 image)
  5. check that the center crop is seamlessly tileable - which it usually is...
GitHub

Is there an existing issue for this? I have searched the existing issues and checked the recent builds/commits What would your feature do ? I wish to be able to use SD upscaling on seamless texture...
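
The geometry of that workaround can be sketched like this. It's only the arithmetic for the crop box, assuming a square tile and a 2x2 tiling as in the steps above; `center_crop_box` is a made-up helper name, and the actual tiling, upscaling and cropping would be done in whatever image tool you use.

```python
def center_crop_box(tile_size: int, upscale: int, grid: int = 2):
    """Return (left, top, right, bottom) of the center crop.

    tile_size: side of the original seamless tile, e.g. 512
    upscale:   upscaling factor applied to the tiled image, e.g. 4
    grid:      tiles per side before upscaling (2 means a 2x2 tiling)
    Hypothetical helper for illustration only.
    """
    full = tile_size * grid * upscale   # side of the upscaled, tiled image
    crop = tile_size * upscale          # side of one upscaled tile
    left = (full - crop) // 2           # center the crop
    return (left, left, left + crop, left + crop)


if __name__ == "__main__":
    # 512px tile, tiled 2x2 (1024px), upscaled 4x (4096px), center 2048px kept
    print(center_crop_box(512, 4))  # (1024, 1024, 3072, 3072)
```

The crop lands in the middle of the tiled image, where every edge of the kept region was upscaled with its neighbouring tile content around it, which is why the result usually still tiles.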

surreal lagoon
#

have you tried mixture diffusers

wispy elbow
#

Not even sure what that is, so no.

surreal lagoon
#

seamless tiling

#

it's in the multidiffusion extension

visual lichen
surreal lagoon
#

prompt alternating?

#

maybe

tall vault
#

@jaunty grove first attempt at creating a lora based on me. It's pretty cursed. I did 1500 steps. I can increase the steps but are there any other settings I should try changing? (some people mentioned epochs, what does that do?)

stray pecan
#

When using the Tiling option (in auto11), does it change the Lora layers too, or only the base models?

#

I want to train a Lora that is optimized for Tiling and I don't know if I should enable Tiling (changing Conv2d layers' padding to circular) just for the base model or also for the lora network during training

jaunty grove
# tall vault @jaunty grove first attempt at creating a lora based on me. It's pretty ...

Good first attempt. Can't say mine was much better lol.

So from all the reading I've been doing around 1500 - 3000 steps total is good for training on a person. My two that I've done that have worked out ok, and present the person in the AI art accurately were around 1900 steps total I think.

So there are a few elements that you need to consider when working out the total steps:

Total Steps = (Number of Images * Image Repeats * Epochs) / Batch Size

Number of images = Self explanatory
Image Repeats = Number of times the source image is shown to the model
Batch Size = Number of source images taken as a batch, which are presented to the model for training
Epoch = A single training pass over the whole dataset (Number of Images * Image Repeats, fed through in batches)

Think of each epoch as a training run, the more epochs you do, you're kind of reinforcing the model by going over it all again and again. I've seen recommendations that around 10 epochs is good for training.

For learning rates, there are 3 of them. LR Rate, Text Encoder Learning Rate, and UNet Learning Rate. I've literally no idea of the specifics of these, but I do know that the larger the value the faster the learning rate (which isn't good), the lower the value the learning rate slows, but make it too slow and it doesn't learn a lot. It's a huge balancing act, and I'm doing it through trial and error.

#

@tall vault I set all 3 learning rates to the same value. If you put in something like 0.005 that will be too big a number, the learning rate is too fast, and it won't learn. I used 0.000005 at one point, and that was crap, it didn't learn a lot. Eventually I used 0.00005, and that worked out great, but I do feel it will depend on your images.

So for my first character that I trained I had 122 images. (I had more, but dropped them because the person had face paint on, and weird contacts, which was messing it up.) So 122 images total

To get around 1500-3000 steps total, I used the following:

(122 images * 15 repeats * 10 epochs) / batch size 6 = 3050 steps total

3050 was near enough for me. I was happy with that so then I set about training with the learning rates mentioned above. Once I tried 0.00005 it was good for me, for that particular training data set.
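
A minimal sketch of that step arithmetic, using the formula quoted above. `total_steps` is just an illustrative helper name, and the integer division is an assumption (a trainer may round partial batches differently).

```python
def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Total Steps = (Number of Images * Image Repeats * Epochs) / Batch Size."""
    # Every image is shown `repeats` times per epoch, grouped into batches.
    return (num_images * repeats * epochs) // batch_size


if __name__ == "__main__":
    # 122 images, 15 repeats, 10 epochs, batch size 6
    print(total_steps(122, 15, 10, 6))  # 3050
```

Handy for working backwards too: pick the step budget you want (say 1500-3000) and try repeats/epochs until the number lands in range.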

Another tip for person training is to make sure you have images of the person from different angles, wearing different clothes, different expressions etc, or the model will get over-fitted (basically baked in), and when rendering with your lora your AI person will always have the same kind of pose etc. Under-fitting is where it hardly looks like the character you trained.

Apparently you can get away with 10 - 15 images to train on a person. I used 122 for that first person. Second person I used 19 images, and it worked ok (though not as good as the first one)

tall vault
#

OKay awesome! thank you so much for all the info! I am going to give it another shot tonight

jaunty grove
#

Also I've been reading up on, and watching lots of videos on AI upscaling techniques. The best one yet, uses a combo of the ControlNet, and Ultimate SD Upscaler extensions to Automatic1111

Here's some examples where I took an original 512 x 768 render, fixed up the eyes via inpainting, and took the image in 2x scaling increments to a whopping 8192 x 12288. So four lots of 2x upscalings to get there :-).

The small pic, and the first pixelated face are the original image 512 x 768 image (eyes fixed up via inpainting), followed by ones with increasing detail.

tall condor
#

i recommend to add a noise offset of 0.02 to your lora and use a LR Scheduler with 10% warmup (constant with warmup or cosine)

stiff dust
stiff dust
#

like it is super easy to train on your face and generate nice anime images of you. But making it photorealistic is difficult. use a CFG of 3 or 4

tall vault
#

do you think this is even possible?

stiff dust
#

yes, I found that works straight away

#

even textual inversion is often good enough for that

#

the funny thing is that I also got extremely photorealism anime portraits of myself without problems (like they fit my face in super high details) but as soon as I want photorealism things get hard

#

reducing CFG to an extremely low value helps a lot, though. Like I got almost good photorealism with that

tall vault
#

how did you get it get your face details? training a lora?

#

or just textual inversion?

stiff dust
#

for best details you need a lora

#

but for an anime character (where you don't need all wrinkles and other details ;)) a textual inversion is enough

#

I have to say I found it easier to train on SD 2.1 than on SD 1.5 (in contrast to what most people say)

tall vault
#

ok cool, any tips on settings/steps for anime?

surreal lagoon
#

hmm so at batch size 18 i'm seeing 2.23 seconds per iter and at batch size 6 i'm seeing 1.2 seconds per iter. how much faster is BS=18?

#

a train leaves los angeles at <x> miles per hour, ...

#

using a batch size of 18 processes samples approximately 1.614 times faster than using a batch size of 6 (8.07 divided by 5 equals 1.614)

#

For batch size 18: 2.41 hours * $3.18/hour = approximately $7.67

For batch size 6: 3.89 hours * $3.18/hour = approximately $12.37
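
The throughput comparison above boils down to samples per second. A small sketch using the timings from the question; `samples_per_second` is a made-up helper name:

```python
def samples_per_second(batch_size: int, seconds_per_iter: float) -> float:
    """Images processed per second at a given batch size and iteration time."""
    return batch_size / seconds_per_iter


bs18 = samples_per_second(18, 2.23)   # about 8.07 samples/s
bs6 = samples_per_second(6, 1.2)      # 5.0 samples/s
speedup = bs18 / bs6                  # about 1.614x

if __name__ == "__main__":
    print(round(bs18, 2), round(bs6, 2), round(speedup, 3))
```

So each iteration is slower at batch size 18, but it pushes through more images per second, which is where the shorter wall-clock time and lower rental cost come from.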

surreal lagoon
#

still waiting for the model to be able to make a white background

hot breach
#

some random samples from a training I ran last night using zero terminal SNR, it gets very close to white/black backgrounds

#

black and white backgrounds

#

settings, this was just sort of a big random dump of training data I have lying around, 13k images

surreal lagoon
#

doesn't help me much because i'm using diffusers implementation 😛

hot breach
#

this is diffusers

#

I think there are some things that still need to happen in auto1111 or whatever since it mostly uses ldm code and patches, some issues with getting the extensions that supposedly do the CFG rescaling to look right

surreal lagoon
#

well you're only training the last 2 layers of the TE

#

damn, you ran 20 epochs of training? how many samples?

#

you're using EveryDream2, not diffusers

surreal lagoon
#

i wish it would be better at faces already, how many do i have to show it

#

definitely understands the overall concept

surreal lagoon
#

this is about 5000 steps earlier in training. it did not want to do a white background at all

#

i'm assuming that it's going to take a while to fully train all of this new noise schedule i'm applying

warm agate
#

@surreal lagoon can you please explain what generated images mean from this?
does it mean they were artificially generated using the training images fed into the algo?
they are completely different from the ones that were used to train it?
https://github.com/NVlabs/stylegan

surreal lagoon
#

i don't know

warm agate
hot breach
#

given what I'm seeing here I think offset is not required with zero terminal snr, I think that was one of the points of the paper as well, offset noise is not very stable over time

#

that training was on 13k images, fairly random assortment of things

surreal lagoon
#

offset noise helps it converge more quickly

#

might have to stop applying it at some point but it does help still, even with trailing and rescaled zero SNR betas

hot breach
#

at least to me, it looks like a stable version of offset noise. with offset noise you need different amounts of it based on how long you train, like offsetnoise*0.1, the 0.1 is too much if you train more than a few thousand steps and the model will turn into splotchy figures on black etc

#

if you hand tune offset noise down its more stable for longer periods, but not entirely stable
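
For anyone unfamiliar with the term, here's a toy sketch of what offset noise does, assuming the common formulation: regular Gaussian noise plus one extra random shift per channel, scaled by the offset strength (the 0.1 mentioned above). It uses plain Python lists so it runs anywhere; a real trainer would do this on tensors, and `offset_noise` is a hypothetical name.

```python
import random


def offset_noise(channels, height, width, offset=0.1, rng=None):
    """Gaussian noise with a per-channel constant shift added.

    offset: strength of the shift; 0.0 gives ordinary noise.
    rng:    optional random.Random instance for reproducibility.
    Illustrative only, not taken from any trainer's code.
    """
    rng = rng or random.Random()
    noise = []
    for _ in range(channels):
        # One shift per channel moves the whole channel brighter or darker.
        shift = offset * rng.gauss(0.0, 1.0)
        noise.append(
            [[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
             for _ in range(height)]
        )
    return noise


if __name__ == "__main__":
    sample = offset_noise(4, 8, 8, offset=0.1, rng=random.Random(0))
    print(len(sample), len(sample[0]), len(sample[0][0]))  # 4 8 8
```

The strength is a hand-tuned knob, which fits the point above about having to turn it down for longer runs.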

surreal lagoon
#

well i trained at 30k steps without offset noise and the terminal SNR stuff didn't help anywhere near as much as both together did

hot breach
#

diffusers handles it more elegantly than changing both betas and alphas manually, just easier, but you need a schedule/timestep curve to run through the code snippet from the paper to "correct" it, so step 1 is load normally, then load again with the corrected schedule and discard the temporary scheduler instance

surreal lagoon
#
        pipeline.scheduler = DDIMScheduler.from_pretrained(
            model_id,
            subfolder="scheduler",
            rescale_betas_zero_snr=True,
            guidance_rescale=0.3,
            timestep_scaling="trailing"
        )
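
For reference, a dependency-free sketch of the beta "correction" being discussed, following the rescaling idea from the zero terminal SNR paper: shift sqrt(alpha_bar) so the last timestep hits exactly zero, and rescale so the first timestep stays where it was. The scaled-linear defaults (0.00085, 0.012, 1000 steps) are assumed to match Stable Diffusion's usual scheduler config, and both function names are made up here.

```python
import math


def scaled_linear_betas(n=1000, beta_start=0.00085, beta_end=0.012):
    """Beta schedule that is linear in sqrt(beta) (assumed SD-style defaults)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]


def rescale_zero_terminal_snr(betas):
    """Rescale betas so the final timestep has zero signal-to-noise ratio."""
    # sqrt of the cumulative product of alphas (alpha = 1 - beta)
    sqrt_ab, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        sqrt_ab.append(math.sqrt(prod))
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value is 0, scale so the first value is unchanged.
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Convert cumulative alpha_bar back into per-step alphas, then betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    fixed = rescale_zero_terminal_snr(scaled_linear_betas())
    print(fixed[0], fixed[-1])
```

The last beta comes out as 1.0, meaning the final timestep is pure noise, which is the whole point of the fix.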
hot breach
#

I'm a few subversions behind, not sure that was in

#

there are a few of us hacking on it regardless, its sort of a backdoor feature until we sort it all out and document

surreal lagoon
#

it's still a WIP pull request i've merged on my fork and have been testing

hot breach
#

ah cool

surreal lagoon
#

my knight is becoming more handsome either way

#

how it starts, roughly

hot breach
#

at least so far I think zero term works, several of us getting great samples but I think there may be issues in the auto1111 whatevers that handle inference side, but works like a charm on diffusers since the trained_betas actually get saved right in the scheduler_config.json, so works on invoke, sdgrate, samples from actual trainer, etc

surreal lagoon
#

perfection

hot breach
#

d-adaptation adam also seems to be working well for some people but unfortunately not very efficient right now

#

should have AdamA in soon

surreal lagoon
#

well the offset noise has done what i've wanted so now i've removed it, at 12k steps

#

let's hope he comes down to earth and improves on the next ckpt

tall condor
#

as for faces it needs to see the same face at least 500-1000 times for it to reproduce it

#

my concepts that run very little times suck very hard with the faces

sturdy falcon
tall condor
#

is there any tool that can convert regular text captions into tokens/tags?

hot breach
#

tokens and tags are different things, what is it you're actually trying to accomplish?

tall condor
#

well if i use wd14 or any other clip captioner i do not get tokens

#

like car, red, open window

#

its more like "a red car with open windows"

#

and i am wondering if there is a tool that can convert that into tokens

surreal lagoon
#

those are tokens

#

people get caught up on this stuff and think that something little is going to solve their problem when it's not even close to being the issue

#

like, what problem are you trying to solve with that

ancient mural
#

How much quality/accuracy is lost if you merge multiple models together?

hot breach
# tall condor its more like "a red car with open windows"

textcap models like blip write sentences and phrases, but some caption utilities do something they call "clip flavors" which is just trying to figure out if your image is visually close to a bunch of words in a dictionary, i.e. tags

#

some caption utilities do both, use blip to create "a man standing in a park" then clip flavors would add something like "claude monet, daytime, oil on canvas, outdoor"

surreal lagoon
surreal lagoon
#

@hot breach ok so offset noise breaks things a lot more now that terminal SNR is in there

#

without it, training goes better

hot breach
#

offset noise is unstable, ztnr should be stable

#

getting things to play nice in auto1111 may be a challenge, diffusers handles sharing the data about the updated beta schedule better since it can be explicitly shared in the scheduler_config.json

#

that's ztnr only, no offset noise

surreal lagoon
#

i don't use automatic

#

once i hit 10k steps of training though, the thing starts screwing up

#

10k -> 12k -> 14k

hot breach
#

above is 30k or so steps at batch 15

surreal lagoon
#

i'm at batch 12

hot breach
#

also grad accum 6 so effective batch size close to 100
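
Quick sketch of how the effective batch size falls out of those settings, assuming the usual definition (per-device batch x gradient accumulation steps x number of GPUs). `effective_batch_size` is an illustrative helper, not part of any trainer.

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int = 1,
                         num_gpus: int = 1) -> int:
    """Samples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_gpus


if __name__ == "__main__":
    # batch 15 with 6 gradient accumulation steps
    print(effective_batch_size(15, grad_accum_steps=6))  # 90
    # batch 6 on two GPUs, as in the earlier multi-GPU question
    print(effective_batch_size(6, num_gpus=2))           # 12
```

Some training scripts can also scale the learning rate by this same product, so it's worth checking what your trainer does before adding GPUs or accumulation steps.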

surreal lagoon
#

oh wow

#

that's a lot higher than mine, i have zero gradient accumulations in use

#

i'll restart from 10k steps with a higher batch and gradient size since i'm not interested in speeding through this training

hot breach
#

unet LR 3.5e-6 constant, TE only unfreezing last 2 layers with lr 2e-6 cosine schedule

#

some of those settings are somewhat haphazard as I experiment but they're not far off

surreal lagoon
#

i accidentally unfroze my whole text encoder for a few hundred steps once

#

it was about halfway into 30k steps

hot breach
#

works with brighter stuff too

surreal lagoon
#

i've unfrozen a couple more layers of the TE to see whether this helps bring the weights up or whether it makes it worse shrug

#

we'll see, i guess

#

my assumption is that it makes it worse

jaunty wadi
#

Okay maybe not the greatest example, but it should suffice. This is what I was referring to with these. (excuse the cat example, just wanted a really obvious choice its not trained on cats) These are tested with ((masterpiece)), outline, cel_shading cat, <lyco:CelShading-000001:0.75>,1 and ((masterpiece)), cat, <lyco:CelShading-000001:0.75> (with xyz plot on epochs/weight), for style loras, I've always heard it should be avoided if possible to make sure that you don't have to enter anything into the prompt style-wise. How could I avoid this? Would I prune all variants of "outline" or "cel_shading" from my training data?

surreal lagoon
#

so damn close

woeful goblet
#

How do i inpaint a fist in this pose, with the palm facing towards the viewer?
https://i.imgur.com/60pjsKM.png

i cannot seem to do it, i keep getting fists pointed the opposite way, with the back of the hand facing the viewer, even when inpainting over an image like this.
i've tried using variations on "palm facing viewer" and having knuckles in the negative prompt. But all i'm getting is either high quality inverted fists, or garbled flesh spaghetti

i have even tried inverting the colors of the fist in photoshop, and it made no difference

surreal lagoon
#

wrong channel @woeful goblet

woeful goblet
chrome breach
surreal lagoon
chrome breach
#

Mind sharing more details about how u did this fine-tuning??

#

I am currently trying to get my fine tuned model to give at least some level of realistic faces... but well, uk 1.5😂

#

Ig i'll try those configuration settings on 1.5... see how things go with that

#

Try using a lower cfg

#

Damn

worthy orchid
#

does LORA add new info, or does it just tune your prompt to get the best result, like embeddings do

stiff dust
#

LORA is more or less same as dreambooth. However, it depends a bit on the implementation you use

worthy orchid
#

is dreambooth the same as embeddings 😛

stiff dust
#

okay 😜
Dreambooth finetunes the complete model.
Lora finetunes large parts of the model, depending on the used implementation.

#

so it's not just the embedding but the text encoder and the unet

hot breach
#

hella contrast with zero terminal snr only, no offset noise

surreal lagoon
#

fine-tuning progress going super well this time

tall condor
#

still quite a lot of contrast tho

surreal lagoon
#

that's my prompt asking for it

#

for the end of the training run i've added more faces to the dataset. that collection worked well before but if i used it for too long it started picking up watermarks

#

hoping that resolves this munchkin face issue

#

it did on a separate training run 🤞🏽

stiff dust
#

can we expect one of these models, that you invest so many gpu hours in, to be online and downloadable soon? ;D