#🔧｜finetune


whole gorge
#

But I have been messing with this for about a month+ now and have seen little use for steps over 50

novel pond
#

oh, i mean the batch size and count.

#

get a little side-tracked.

#

Cause I was testing out the anythingv3 model to see if I can get highly detailed eyes like the ones from others I've been inspired by

#

↑ (Not mine)
But nothing of the sort yet, just a horrible-quality mess of eyes...
↓ (Mine)

whole gorge
#

thats a resolution issue

#

youll never get great faces or eyes on a 512x512 image

#

that's why there is a whole channel here called upscaling

#

the ones that inspire you have probably been upscaled at least once if not twice

#

I don't start to see faces look really good until 1024x1024 or higher

novel pond
#

That's the thing, I used Highres fix to increase the resolution. I even used the img2img section to upscale the images with ESRGAN_4x and also R-ESRGAN 4x+ Anime6B.

#

But it got worse...

whole gorge
#

highres fix does not add detail

#

it just resizes the same image

#

take your image from text to image, send it to img2img

#

under script select SD upscale,

#

set your denoising strength between .2 and .3

#

since its img2img and you wont lose your original feel free to play with denoising and try a few times

novel pond
#

This is what I currently have around, For the moment. (Just took it days ago.)

whole gorge
#

down there at the bottom where it says script none

#

you drag an image up top

#

or send it from the txt2img buttons

#

click where it says none and there is one called SD upscale

#

that will actually add details not just change the resolution

novel pond
#

SD upscale?

whole gorge
#

yes

novel pond
#

And should I change the width and height?

#

well a little bigger?

whole gorge
#

no

#

when you do the SD upscale it already says what scale factor

#

and its 2 by default

#

so its going to make a 1024by1024

novel pond
#

Well the Image is 512x768 actually.

whole gorge
#

thats fine

#

it will be 1024x1536 or whatever

#

with enough resources you could then plug your 1024x1536 thing into the image to image and run SD upscale again and get a 2048x3072 image etc

#

it usually falls apart on me when I try to do that though
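For reference, the size arithmetic here is just one multiplication by the scale factor per pass. A minimal sketch, assuming the default factor of 2 mentioned above (the function name is made up for illustration and is not part of the webui):

```python
def upscaled_size(width, height, scale=2, passes=1):
    """Return (width, height) after applying the scale factor `passes` times."""
    for _ in range(passes):
        width, height = width * scale, height * scale
    return width, height


print(upscaled_size(512, 768))            # (1024, 1536)
print(upscaled_size(512, 768, passes=2))  # (2048, 3072)
```

Each 2x pass quadruples the pixel count, which is one reason a second pass is so much heavier than the first.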

novel pond
#

Should I do this many times to find the right pair of detailed eyes, like in img2img and Inpaint?

#

Along with the batch count and size?

whole gorge
#

I don't usually batch the upscales as they are so heavy

#

but you might try making sure your prompt has information about eyes in it

#

and for anime generally its the danbooru tags they are trained on

#

so you can go to danbooru and search for some images with eye tags you like

novel pond
#

Oh, one thing. Does SD upscale work the same for txt2img?

whole gorge
#

I dont think you can do it on txt2img because what is it denoising then

#

unless it generates an image and then denoise on that image

novel pond
#

Ah, ok then. It's taking its time upscaling right now...

whole gorge
#

Im generating batch img2img for another video

novel pond
#

And it just upscaled into multiple images, like 7 images in one!

#

and I mean it in a serious fashion.

whole gorge
#

did you change any of the settings

#

it usually tiles the image with some overlap but stitches the tiles together

novel pond
#

You mean the tiling one or the 64 one?

whole gorge
#

either

#

are you on img2img?

novel pond
#

I didn't use the Tiling but I did use the SD upscale tile maps around 64.

#

I just left it there.

#

Anyhow, I should hit the hay. Thanks for the help/information man, I'll try to keep learning around the webui and stuff!

#

And you keep doing what you're doing!

obsidian idol
#

Anyone got insight into the distinction of concepts in dreambooth? Like are all 3 datasets smashed together into the same latent space? Or are they trained sequentially, in that order? I ask because It seems like certain scenarios have different training results depending on the order of samples trained, and it'd be nice to treat this as a queue/sequencer.

tidal cliff
#

on the 1.5 webpage it says the v1-5-pruned is "suitable for fine-tuning", this is the larger 8 gig model. Does that mean it's better for textual inversion? or when they say fine tuning do they mean, like, training the model for additional iterations using the original 1.5 source training data?

#

is the 1.5-pruned model better at creating textual inversions than the 1.5-pruned-emaonly model?

tall pawn
#

Did you manage to fix the problem? I try to train with google colab but got the same error

terse wren
#

hi guys! 🙂 I'm working on a way to quickly train different people's faces to swap into images that are already created... what is the best way to do so, hypernetwork or embeddings?

#

what is the difference?

#

Also, whatever I do i can't train an embedding on sd 2.1, I keep getting black pictures.. what can be the problem?

#

How can I create such a matrix with different sampling methods?

whole gorge
#

at the bottom of your webui

#

click where it says script and there is one called X Y plot

#

pick sampler for one of your plots and then type like Euler a, Euler, DDIM, Heun etc

#

its both case and spelling sensitive

terse wren
#

Great, Thanks! 🙂

oblique blade
#

is there any way i can change the setting of stable diffusion in the UI to make it use my RAM to work more efficient?

terse wren
whole gorge
#

you need a yaml file to use the SD 2.1 model i know that

#

I assume you have generated images with SD 2.1 successfully then?

terse wren
#

yes

#

and i have the yaml file

stuck parrot
#

can anyone help me learn how to upscale without this sort of result?

#

or am i stuck with skin teeth? 🤣

whole gorge
#

Your skin looks weird anyway

#

What upscale settings are you using

stuck parrot
whole gorge
#

havent used that one

#

whats your base image look like

#

a lot of times for me the images fall apart when I go from 1024x1024 (or 1024x1500 or whatever) to 2k by 2k

#

because the "scale" the model is trained on doesnt think of things being that large and ends up making extra heads and stuff

#

so I usually only do 1 2x upscale using the SD upscale script I don't have that ultimate one

stuck parrot
#

cropped to be more sfw

#

@whole gorge ^

whole gorge
#

Do you have the normal sd upscale script

stuck parrot
#

guess i should try it? recommend any settings?

dapper prism
#

Is it possible to extract only a random subset of key frames in a video with ffmpeg?

#

I am using: ffmpeg -skip_frame nokey -i video.mp4 -vsync 0 -ss 00:03:56 -t 00:00:36 -f image2 frames/f1-%06d.png

#

but that is crazy slow for something like a short film or movie
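One way this could be approached, sketched under some assumptions: list the keyframe timestamps with `ffprobe` (which only reads packets and does not decode frames), pick a random subset, and grab each one with `-ss` placed before `-i` so ffmpeg seeks in the input instead of decoding everything up to that point. The function names below are hypothetical, and the parser assumes ffprobe prints `pts_time` before `flags` in its csv output, which is worth checking on your build.

```python
import os
import random
import subprocess


def probe_command(video):
    # Lists every video packet's timestamp and flags without decoding frames.
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "packet=pts_time,flags", "-of", "csv=p=0", video]


def parse_keyframe_times(ffprobe_csv):
    """Keep timestamps of packets whose flags start with 'K' (keyframes)."""
    times = []
    for line in ffprobe_csv.splitlines():
        parts = line.strip().split(",")
        if len(parts) >= 2 and parts[1].startswith("K"):
            try:
                times.append(float(parts[0]))
            except ValueError:
                pass  # skip packets without a usable timestamp, e.g. "N/A"
    return times


def pick_random(times, count, start=0.0, end=float("inf"), seed=None):
    """Randomly pick up to `count` timestamps inside [start, end], sorted."""
    window = [t for t in times if start <= t <= end]
    rng = random.Random(seed)
    return sorted(rng.sample(window, min(count, len(window))))


def extract_command(video, t, out_path):
    # -ss before -i is input seeking, so ffmpeg jumps close to t directly.
    return ["ffmpeg", "-v", "error", "-ss", f"{t:.6f}", "-i", video,
            "-frames:v", "1", "-y", out_path]


def extract_random_keyframes(video, count, out_dir="frames",
                             start=0.0, end=float("inf")):
    """Probe the video, then write `count` random keyframes as PNG files."""
    os.makedirs(out_dir, exist_ok=True)
    probe = subprocess.run(probe_command(video), capture_output=True,
                           text=True, check=True)
    times = pick_random(parse_keyframe_times(probe.stdout), count, start, end)
    for i, t in enumerate(times, start=1):
        out_path = os.path.join(out_dir, f"f1-{i:06d}.png")
        subprocess.run(extract_command(video, t, out_path), check=True)
    return times
```

For the window in the command above (start 00:03:56, length 36 seconds) the call would look like `extract_random_keyframes("video.mp4", 20, start=236, end=272)`. Seeking to a keyframe timestamp should normally land on that keyframe, but it is worth spot-checking a few outputs.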

whole gorge
# stuck parrot yep

Ive had really good luck with default settings, denoise .3-.4 and LDSR model

#

txt2img

#

the 2x upscale LDSR default settings

#

the clips, the crispness of the button/light on the back of the arm etc

#

see what happens when I try to upscale again lol

stuck parrot
#

oy

whole gorge
#

worked this time

#

denoise .4 notice things like the green thing on the shoulder became like a badge etc

stuck parrot
#

nice!

#

here's what i just got on a diff image

#

upscaled

#

original

fading forge
#

If I wanted to train on a bunch of images that are naturally and very distinctly not square, like 16:9 game screenshots (where capturing the UI that appears in the corners is important), would it be better to add white padding above/below to get to 1:1, or cut it into two slightly overlapping squares?

#

Or, hell, maybe I’ll just train it at 16:9 and report back, just to see
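To make the two options concrete, here is a small sketch of the geometry only (no image library involved; both function names are made up for illustration). It computes the padding offsets for a centred 1:1 canvas and the crop boxes for two overlapping squares.

```python
def pad_to_square(width, height):
    """Return (side, left, top): canvas side and the offset that centres the image."""
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2


def two_square_crops(width, height):
    """Return two (left, top, right, bottom) boxes and their overlap in pixels."""
    if width < height or width > 2 * height:
        raise ValueError("expects a landscape image no wider than 2:1")
    side = height
    left_box = (0, 0, side, side)
    right_box = (width - side, 0, width, side)
    overlap = 2 * side - width
    return left_box, right_box, overlap


print(pad_to_square(1920, 1080))     # (1920, 0, 420)
print(two_square_crops(1920, 1080))  # ((0, 0, 1080, 1080), (840, 0, 1920, 1080), 240)
```

For a 1920x1080 screenshot, padding adds 420 px above and below, while the two crops share a 240 px strip in the middle and each keeps two of the four corners.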

whole gorge
#

AI seems really bad at text

#

so UI seems like it would be sort of a terrible thing to train

#

I think you'd be better off having the UI be layers you add over the generated images?

fading forge
#

Oh for sure, I’m cool with the text being absolutely junk, I’m just interested in the UI art, layout patterns, etc. So if I asked it to show me a mock-up of a fantasy character selection screen it’ll get it in the ballpark, with some AI garbled text where the name and stats are, and that would be enough

#

I've got it doing some interesting things with text2img already just with an embedding created from a few 1:1 screenshots from one game and then upscaling, generating at 904x512 with sd1.5

obsidian idol
#

Wracking my brain since yesterday on why my tuning wasn't working. It'd basically ignore half my samples. Gave up, screwed around in image2image. Dropped one of my samples in, and ... it failed? Had a bad TIFF header or something. Half my samples did too. The half that were the result of my hasty upscaling.

#

So at least I've found a solution, but man

tacit bronze
#

training tip: when training pixel art, be sure to scale it up to fit in 512x512, making sure you dont lose the quality in the process. keeping it small (like 60x60 or even a clean divisible 64) leads to blurring (which a lot of software tends to do with really small images)

hot breach
#

yeah upscale using nearest neighbor
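A toy sketch of what nearest-neighbor scaling does, in plain Python on a grid of pixel values: every source pixel becomes a solid block, so no new in-between colours appear. In practice an image library would do this (in Pillow, `Image.resize` with `Image.NEAREST`); the helper names here are hypothetical.

```python
def integer_scale(size, target=512):
    """Largest whole-number factor that keeps `size` within `target` pixels."""
    return max(1, target // size)


def upscale_nearest(pixels, factor):
    """Turn each pixel of a row-major grid into a factor x factor block."""
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


print(integer_scale(64))  # 8 (64 * 8 = 512 exactly)
print(integer_scale(60))  # 8 (60 * 8 = 480, pad the rest)
print(upscale_nearest([[1, 2], [3, 4]], 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Sticking to a whole-number factor is what keeps every pixel the same size; a 60x60 sprite would land at 480x480 and need padding to reach 512x512.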

finite creek
#

Can anybody tell me the difference between the v1-5-pruned-emaonly and the v1-5-pruned?

#

I read the second is better for training, is this true?

obsidian idol
#

Is there a benefit to using xformers on a 24GB setup? I turned it on, and seeing some unpredictable results.

split acorn
carmine zinc
#

I'm trying to make an embedding that makes a character like the one in the first picture, but after training they don't come out properly and I get these results, any advice?

slow badger
#

Should I use CLIP or deepbooru for making an embedding or hypernetwork? Is there any case when one is superior to the other?

#

Also, how many images would you say are necessary for a good embedding/hypernetwork? Should I go with 140 CLIP annotated images or something like 25 but with manually corrected annotations?

slow badger
#

More questions:

  • When I launch the training for an embedding or hypernet automatic1111 is training it using the model that has been selected right?
  • Is it possible to train (using a 512*512 model if the above is true) an embedding in 256*256 or even 128*128 without encountering issues?
whole gorge
whole gorge
#

for training you want to use a generic model like the base SD 1.5 or WD 1.3 etc, don't use more limited custom models while training. however, do not trust the auto generated images while training: set your checkpoint saving to every 200-500 steps, then put all those embeddings into your embedding folder and test each one on your custom model after training

#

also I am not really sure about training things for smaller than 512 by 512 images most of the models are trained off 512 by 512 tho

slow badger
slow badger
whole gorge
#

so are you wanting to learn to make like a yearbook page as a style?

#

or to learn how to make portraits

slow badger
#

Actually it's for a game I love, I'd like to make more portrait for pilots in Starsector

#

They look like this

whole gorge
#

so you need to be training each of those portraits as its own image

#

not blurry

slow badger
#

I have 140 of them from the game, deepbooru generated weird annotations so I went with CLIP but I was wondering if I should reduce that number to a few dozen only and make the annotations myself (or at least modify them)

whole gorge
#

preferably on a white or black background

#

and you should try to make them 512x512 even if in the end you will downscale

slow badger
whole gorge
#

the captions are more about what other things it might describe, like if you want to call your embedding together with "brown hair"

#

Right and if you are ingesting those right now you are confusing the model

slow badger
whole gorge
#
  1. make sure you use 512 by 512 diffusion
#
  2. make sure they are all not blurry
whole gorge
#
  3. make them on white or black not transparent backgrounds
slow badger
#

alright, I'll run another training on sd 1.5 then, thank you!

whole gorge
#
  4. use the base model checkpoint like v1-5-pruned
#

make sure you have NO vae set

slow badger
#

erf 😄

#

pretty sure I did

whole gorge
#

set it to 4000 steps, learning rate default .005 save image every X to 200-500

#

ignore what you see in the training outputs

slow badger
#

Is this the right one?

whole gorge
#

when its done go to textual inversion/image embeddings and copy those to your embeddings folder

#

dunno I have this one

#

waifu diffusion for whatever reason is also a good training model

#

I never really use those two for image generation but they work well for training

slow badger
whole gorge
#

You are making each head one image or sets of 2 heads?

#

I think you want to use style_filewords for what you want

#

like Starsector_style

slow badger
whole gorge
#

put whatever you want to use to activate it in the ini text on the create embeddings tab

#

its often the same as the name for me

#

emb_star_sector

#

or whatever

#

your textual inversion directories will spit out a bunch

#

copy these into the main embeddings folder, swap to your custom model and try a prompt like "a portrait of an asian general, emb_star_sector-200"

slow badger
#

Yup, I already made some embeddings but did them naively before

whole gorge
#

then you can keep changing the number to try the different amounts of learning

#

usually for me at 4000 steps, 4000 steps is not my "best" one

serene flicker
whole gorge
#

somewhere in the 2k-3k range will be best
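To avoid retyping the step number by hand, the list of saved checkpoint names can be generated. A small sketch, assuming the `name-steps` naming seen above (`emb_star_sector-200`) and a save interval of your choosing; the function names are hypothetical. The comma-separated output is meant for something like the X/Y plot's prompt search-and-replace field, if I remember that option correctly, where the first entry is the text to look for in the prompt.

```python
def checkpoint_names(base, every, max_steps):
    """Names of the embeddings saved every `every` steps up to `max_steps`."""
    return [f"{base}-{step}" for step in range(every, max_steps + 1, every)]


def comma_separated(base, every, max_steps):
    """Join the checkpoint names into one comma-separated line."""
    return ", ".join(checkpoint_names(base, every, max_steps))


print(comma_separated("emb_star_sector", 500, 2000))
# emb_star_sector-500, emb_star_sector-1000, emb_star_sector-1500, emb_star_sector-2000
```

With the prompt written against the first name in the list, each column of the grid would then show the same seed at a different amount of training.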

slow badger
whole gorge
#

yeah could totally matrix a prompt

#

I usually also X/Y checkpoints to try my embedding on a bunch of models

slow badger
#

"Number of vectors per token", I've seen 8 and 10 used, is that a decent value?

whole gorge
#

I use 8

slow badger
#

great

whole gorge
#

It's the "number of keywords" you want your embedding to reference sorta? as far as I understand

#

So if you were just making like a "neon" style it doesn't really need to know other words

#

but if you are describing a portrait of a person you expect more adjectives

slow badger
#

alright, sounds right to use 8 then, glasses, helmet, jacket, not too many variations on that in my training set (nor in the results I want to get)

whole gorge
#

well if you don't care about specifying them anyway when you do a batch prompt its not that important

#

I made this one. could be better but its hard in the sense NA miatas have popup headlights so it gets confused

slow badger
#

lol, even at 200 steps it already looks way better than what I had before

#

(that's from the preview but still)

whole gorge
#

yeah so generally it should be better than it is in the preview in the end

slow badger
#

great!
I've also read in a guide on rentry.org that I should use either deterministic or random here (random being slower), do you agree?

whole gorge
#

I've used both once and deterministic I can't really tell what it does

#

Although it just came to me its probably like which reference image it uses?

#

like does it go through each once, or similar ones, or random?

slow badger
#

Oh ok, I checked and it's about VAE apparently (I unloaded mine)

"Ensure you have "Choose latent sampling method" set to deterministic or random (random could be better but incurs a performance penalty, deterministic will suffice), do NOT have it set to "Once" or the VAE will not be correctly sampled which could lead to hypernetwork death when training over many iterations"

whole gorge
#

Its also talking about hypernetwork and not textual inversion?

slow badger
slow badger
#

I'm still unsure what the difference between those two is in practice

whole gorge
#

[–]HerbertWest 3 points 21 days ago

I'm also interested in the answer to this. Mostly because I've tried training with all methods and my best results are still with "once," which everyone else has said sucks. It seems to work the best for me, which is bizarre. Maybe it has to do with my dataset in some way.


[–]PervertoEco[S] 1 point 19 days ago

Thank you, can confirm.

"Once" gives better results, with greater subject similitude between generations.

#

"All tutorials recommend "Deterministic" over "Once" or "Random", but do they actually do?"

slow badger
#

ah! given how quick 4k steps are I'll try both

whole gorge
#

yeah let me know if you see any noticeable difference

#

I tried adding about 10 more images specifically of helmets to my space marine embedding and it still struggles

slow badger
slow badger
whole gorge
#

I noticed a lot of autocaptions will use the word warhammers but I don't use that

slow badger
#

It also tends to produce images of a warhammer game board, I managed to get rid of that but it's still an issue sometimes

whole gorge
#

well thats why I was training an embedding

#

and only used images that are not of figures

slow badger
#

those are old generations, I'm better at it nowadays

whole gorge
#

yeah looks more like a knight than a space marine

#

almost everything is pretty good obviously not chapter colors except the damn helmets lol

slow badger
slow badger
whole gorge
#

yeah

slow badger
whole gorge
#

I see gradient accumulation steps but no idea what it does

#

sounds worth trying tho

slow badger
#

yeah same, I've never seen anyone talk about it and I'm curious what training so quickly can do

slow badger
whole gorge
#

I mean it does it automatically with the preprocess tab so why would you

#

I also don't know the answer however

slow badger
#

Sometimes the captions are very wrong, adding stuff that's "incompatible" with the pictures

#

like this one for example, CLIP gave me "a digital painting of a man in a suit and helmet with a helmet on his head and a gun in his hand"

#

Given they are all portraits and I don't want any gun there (and no portrait in the training set has one) maybe it would be better to just leave out the captions, at least that's what I'm wondering about

slow badger
#

With 30 images, gradient accumulation at 20, 4 tokens and 0.1 learning rate the training has slowed dramatically.

#

It went from 4it/s to 10s/it

#

after 90, 110 and 100 steps, ahem:

#

I'll try again with fewer steps, I forgot to enable the preview before

split acorn
#

Gradient accumulation steps just allows for higher batch size without the increase of vram but at the cost of speed/ram

#

For the webui dreambooth extension, it's Batch Size x GA = Equivalent batch size
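A toy illustration of why that product counts as an equivalent batch size, using a one-parameter linear model in plain Python (nothing here is the extension's actual code, and all names are made up): averaging the gradients of several small batches before a single update gives the same gradient as one large batch.

```python
def effective_batch_size(batch_size, accumulation_steps):
    """Batch Size x GA, as described above."""
    return batch_size * accumulation_steps


def grad_mse(w, batch):
    """d/dw of mean((w*x - y)^2) over the (x, y) pairs in the batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, batch_size, accumulation_steps):
    """Average the gradients of several micro-batches before one update."""
    total = 0.0
    for i in range(accumulation_steps):
        micro = data[i * batch_size:(i + 1) * batch_size]
        total += grad_mse(w, micro)
    return total / accumulation_steps


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full = grad_mse(0.5, data)                  # one batch of 4
accum = accumulated_grad(0.5, data, 1, 4)   # batch size 1, 4 accumulation steps
print(abs(full - accum) < 1e-9)             # True
```

Only one micro-batch has to be in memory at a time, which is the VRAM saving, but every update now costs `accumulation_steps` forward and backward passes. That would fit the slowdown reported above when accumulation was set to 20.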

slow badger
#

oh ok, I had no idea, I'm just playing with the settings a bit randomly now

#

The selected image is pretty good, I'll try inference with this one

slow badger
#

This last experiment didn't work well it seems, I'm not getting anything close to a portrait most of the time

serene flicker
slow badger
whole gorge
#

or I should say its a lower vram option than straight batching?

split acorn
#

It allows some freedom to have higher batch sizes without the need for that extra VRAM, yep

#

Though, I'm not sure how much RAM it eats up alicatHm2

whole gorge
#

ah I don't care too much about a larger batch size just for faster training

#

well usually my VRAM is full and my RAM is at like 50%

split acorn
#

And bigger batch size isn't necessarily better and all that CB_nod

whole gorge
#

and here I was hoping for a secret technique that would make more details (like the stupid helmets) get learned into my embedding

#

I even tried training with some lower learning rates wondering if lower rate would mean more "detail"

split acorn
#

Right right

#

yeah, lower learning rate could just mean it doesn't learn anything

#

There was a really cool visual with learning rates, I'll see if I can find it

slow badger
#

What kind of batch size can I use with 12GB? Is that the same memory consumption as the inference batch size?

#

About learning rates there is this article which includes some good x/y plots

carmine zinc
lusty thistle
#

hey newbie question, I found some LoRA embeddings in .safetensors format. Using google colab, I loaded one in exactly like a .ckpt model but it just doesn't work as intended. How can I properly use LoRA, preferably on google colab? TYSM!!!

finite creek
#

I updated Dreambooth in A1111, I cant train anymore. Get the following message: Returning result: Exception training model: ''CheckpointInfo' object is not subscriptable'.

#

Any ideas?

tepid sundial
whole gorge
#

have you manually captioned any of your images?

tepid sundial
#

I've been trying to train a LoRa (using the pti scripts in cloneofsimo/lora). But I'm facing an issue when I try to train SD 2.1, the textual inversion works fine, but the model training always results in nan loss. Been trying to tune the parameters available in the CLI, but to no avail. Anybody faced anything similar?

#

@split onyx ^ ?

split onyx
#

Hi.

#

Oh.

#

Have you been using 512 512 images?

#

Oh hold on, ive not yet trained with SD2.1 since v0.1.0

#

So it might not work. I'll work on it!

tepid sundial
#

I can train the LoRa just fine using 768x768 using SD 2, and can train using 512x512 using SD 2.1 Base

split onyx
#

Wait so its specifically sd2.1?

#

that cant be...

tepid sundial
#

Appears so, no idea why

split onyx
#

huh, sd2.1 and sd2.0 has tiny parameter difference i think?

#

Hmm...

tepid sundial
#

I'll run some more tests, but I'll compile a matrix of what works and not

carmine zinc
whole gorge
#

Is she always in that outfit?

#

For example looking at Danbooru your prompt should have lace-trimmed dress, and detached sleeves or wrist cuffs in it

#

I think you need to work on your prompt I can get this without even having an embedding

#

@carmine zinc

#

"a photo of a young girl wearing a red white and blue (lace-trimmed dress), (bustle:1.3), (wrist cuffs), hair bow, blue thighhighs, solo focus, looking at viewer, hand up, masterpiece, best quality, ultra detailed, (full body), (anime:1.1), cowboy shot, "

#

I wrote this prompt in the last couple minutes and have not even really refined it or added negative prompts

#

try putting the above with your embedding

carmine zinc
#

No she isn't always wearing that but the hairstyle is always the same

#

the pigtail is one of the characteristic traits

whole gorge
#

thats not a pigtail

#

looks like a version of a high ponytail

#

have you searched danbooru for the character?

#

and looked at the tags used to describe her?

#

ok so its hayasaki mei from idoly pride

#

most of these images she is wearing a school girl outfit so makes sense the embedding learned that

#

for a very specific model that you just want to copy this is what dreambooth is relatively good for

#

if you really want to train an embedding for her I would suggest 40-50 images, not blurry, 512by512 use empty canvas to make them 512 wide since she is likely mostly not in square images

#

and put her on a plain white or black background

#

since she is an anime character I would use subject_filewords and deepbooru tags

#

and train it using the wd_v1-3-float32 or similar waifu diffusion checkpoint

spare herald
#

has anyone had any experience with negative prompt training? I'm using some fork that says it supports neg prompts using flags but it seems to suck ass

#

I tried using stable tuner also but it just eats up all the memory and dies immediately

tepid sundial
#

Is there some recourse to track fine tuned models that are trained on large datasets? Say if someone fine tunes sd1.5 on a large dataset in a bespoke manner, I'd like to identify that as opposed to someone that has done more simple dreambooth or lora training?

#

It seems quite rare for people to publish what datasets they have finetuned with.

split acorn
#

Very common in the LoRA community, but yeah, others not so much

tepid sundial
#

I'm especially interested in finding efforts of finetuning that's been started at a base of say sd1.5 but have been trained with a decent and public dataset. Seems hard to find via searching, perhaps because it doesn't exist.

spare herald
#

danbooru has a little bit more than anime but yeah a main focus for sure

tepid sundial
#

I am interested in fine tuning for illustrations (think corporate modern clipart), so anime is not too far out. But I'm worried that datasets like danbooru will skew the generation too much into risky territory.

#

Enough fine tuning might make that a non-issue however. So a fine tune on that dataset that has fine tuned the entire model, might perform very well for illustrations. I might have to test.

split acorn
#

Risky data in, risky data out. Clean data in, clean data out

#

You can filter it out CB_nod just know that if you train on a specific model and it's geared towards NSFW, then it won't matter alicatKEK

#

Corporate modern clipart might work best on 2.1

#

which has an INCREDIBLY strict NSFW filter on it

tepid sundial
#

That's probably a very good starting point, yes - 2.1

wintry girder
#

Are there any models or finetunes that are good for getting alien skintones? Green, red, blue etc?

whole gorge
#

I haven't had an issue just saying like "blue skin" in the prompt

wintry girder
full knot
#

for embeddings, is it better "once" or "deterministic" for the sampling method ?

whole gorge
#

general opinion is deterministic or random, but some prefer once

#

best to test both

#

you should not need more than 4000 steps or you are doing something wrong so training should be relatively quick

#

if your embedding is off by a lot its probably your source images

full knot
#

thanks for infos !

#

i'm wondering if i should write the caption as close as possible to how it should look visually

#

like A rendering of a skull head, octane render, sharp focus etc

#

or only A skull head

whole gorge
#

make sure its not blurry and has a clean white or black background will have a much larger effect than the caption

tepid sundial
whole gorge
#

I've never needed more than 4k even with 60-90 images but ok

split acorn
#

The original paper used "random"

full knot
#

does a hypernetwork need the same base models for generation as embeddings?

split acorn
#

Nope

#

But you will run into issues with ones trained on 2.0 models mixed with ones trained on 1.5 models

#

I don't think they're compatible, as far as I'm aware

#

Also some may not work as well with different models

full knot
#

yeah i guess so thanks

true portal
#

hello

hazy schooner
#

Hello everyone!

Do captions for dreambooth models for example serve the same purpose as in TI? What I've learned is that captioning for embeddings basically separates the style you want to train from the subjects in the images, but I'm making my first tiny steps into model training right now so this is all very new

split acorn
#

Yep!

#

It can work either way and of course your results may vary, but it seems to work better, in some cases

hexed bloom
#

I've noticed that after the checkpoint I set (5%), my VRAM usage drops by 8 GB and stays that way. Would I be able to increase the batch size, y'all reckon? Even if the first epoch gets throttled, it should hopefully resume its speed after the first checkpoint, right? (as long as the VRAM usage drops again)

full knot
#

are vectors related to what I write in captions for embeddings? A more descriptive caption = more vectors?

hexed bloom
hazy schooner
hazy schooner
#
split acorn
#

As Joppe indicates in his comment, the amount of vectors required can loosely correlate to how complex your initialization text is
Huh, is that true? alicatHm2 I suppose it'd be easy enough to test

#

I thought it was just related to the embedding itself

split acorn
#

Yeah, is related to how complex the embedding is itself

hazy schooner
#

but if the concept is complicated, very detailed and precise, lots of things to describe- more vectors

split acorn
#

I just think it's misleading, because people might think the vector token count relates to the tokens in the initialization text alicatHm2 although I definitely see the point and I do think it's generally a good rule of thumb

hazy schooner
split acorn
#

Gradient Accumulation Steps should be a number that, multiplied by your batch, equals the total number of training images.
I think that works pretty well! But I wouldn't say "should" alicatHm2 rest seems good alicatPog
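If you want to try that rule of thumb, the candidate combinations are simply the factor pairs of the image count. A quick sketch (the helper name and the `max_batch` cap are assumptions for illustration; the cap stands in for whatever batch size fits in VRAM):

```python
def batch_ga_pairs(num_images, max_batch):
    """(batch size, accumulation steps) pairs whose product equals num_images."""
    return [(b, num_images // b)
            for b in range(1, max_batch + 1)
            if num_images % b == 0]


print(batch_ga_pairs(30, 8))
# [(1, 30), (2, 15), (3, 10), (5, 6), (6, 5)]
```

With any of these pairs, one weight update sees the whole dataset once. As noted above, that is a guideline rather than a requirement.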

#

That's a lot of solid info in there alicatPog

#

[filewords], [name]? or [name], [filewords]? There's more weight to the first token in the prompts

hazy schooner
split acorn
#

LOL true that, but with any machine learning, they're going to be sad when reality hits em

#

because there's no one fits all for anything CB_nod

#

BUT

#

rules of thumb or guidelines are fine

#

like "generally these settings work" kinda thing

hazy schooner
split acorn
#

Do you do tag shuffling?

hazy schooner
split acorn
#

gotcha gotcha NOTED

#

I think tag shuffling works really well for more anime stuff? because it works with danbooru tags and those are tag shuffle friendly

hazy schooner
split acorn
#

interesting! And is a good idea to try [filewords], [name] when I don't want as strong an effect alicatHm2

hazy schooner
#

also I'm not into anime so booru tags won't work well for my stuff kek

split acorn
#

It might though

hazy schooner
split acorn
#

Yeah, but the idea would be to separate concepts using natural language in the caption itself

#

because there's a LARGE effect on whatever is closest to the start of the prompt

#

so by shuffling it, you better distribute it
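A minimal sketch of what tag shuffling amounts to for a comma-separated caption, assuming you may want to pin the first few tags in place (for example the embedding name). This is an illustration, not the trainer's own implementation, and `keep_first` is a made-up parameter.

```python
import random


def shuffle_tags(caption, keep_first=0, seed=None):
    """Shuffle comma-separated tags, leaving the first `keep_first` in place."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    head, tail = tags[:keep_first], tags[keep_first:]
    random.Random(seed).shuffle(tail)
    return ", ".join(head + tail)


caption = "1girl, hair bow, wrist cuffs, looking at viewer"
for step in range(3):
    print(shuffle_tags(caption, keep_first=1, seed=step))
```

Over many training steps each movable tag spends time near the front of the prompt, so no single one keeps the extra weight of the leading position.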

hazy schooner
#

Makes a lot of sense- might play around with it in the near future when I have the time to train something again

split acorn
#

Yeah, Auto1111 webui training is pretty slow, 10k steps at 40 minutes (w/ a 3090) is honnaDespair

hazy schooner
#

Usually the captions are just long sentences with maybe one comma for me

split acorn
#

If you try that method, you could try with more commas, like separating concepts. I don't have a whole lot of knowledge with non-anime so I'm not entirely sure GoatUppies

hazy schooner
split acorn
#

Like, basically what you'd look for is a prompt that produces good results while including commas, or produces results that are very similar to your own pictures (accurate). Then you'll be golden

#

if commas break things, then it probably won't work that well but alicatHm2

full knot
#

I saw some tutorials where the initialization text should be unique, it's only for subjects unknown to sd no ?

split acorn
#

mmm I wouldn't say unique

hazy schooner
split acorn
#

Describing the character traits or what you ALWAYS want your subject to be related to, seems to work really well

#

And then captions is what you DON'T want your embedding to learn, seems to work well

hazy schooner
#

initialization text is pretty much where you want your concept or subject to start training

split acorn
#

yep

full knot
#

so its the same for styles, if i want to train a new pointillism style the init would be pointillism and not "test5478" right ?

hazy schooner
#

pointillism art style or something

full knot
#

god thanks i got misled pretty badly haha

hazy schooner
split acorn
#

The "unique" token is for the instance token

#

unique meaning rare CB_nod

#

as in, the model doesn't know what that token is

hazy schooner
#

When I started I would just use the embed name again, but at some point I switched to using it like a starting point and the difference is immense if you look at the generated training images

split acorn
#

mm mm!

#

I've done 32 embeddings with the same dataset

full knot
#

and for hypernetwork, there is no init how does it work ?

split acorn
#

and the ones with the initialization text describing the character turned out a LOT better

full knot
#

i see thanks

hazy schooner
#

my knowledge on HNs is pretty much zero for now so I can't help you there unfortunately

split acorn
#

I haven't really touched HN

#

Your customization comes mostly from the hypernetwork layer structure, the activation function, and the layer weights initialization

#

deeper being better for subjects, iirc

#

and wider being better for styles?

#

I don't recall GoatUppies

#

deep meaning like 1,3,1
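
#

Here's how I read the layer structure numbers, as a sketch and not the actual webui code. I'm assuming each number is a multiplier on the base dimension (768 is just an example value), so the count of numbers sets how many layers you get and the values set how wide each one is. `layer_shapes` is a made-up helper:

```python
def layer_shapes(structure, base_dim):
    """Turn a layer structure like "1, 3, 1" into (in, out) sizes per linear layer."""
    multipliers = [float(part) for part in structure.split(",")]
    sizes = [int(base_dim * m) for m in multipliers]
    # Each consecutive pair of sizes is one linear layer.
    return list(zip(sizes[:-1], sizes[1:]))

print(layer_shapes("1, 3, 1", 768))
print(layer_shapes("1, 2, 2, 1", 768))
```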

#

Guide ^

stuck parrot
#

can anyone offer some advice on dreambooth captions? i'm using TLB-FastDreambooth and i'm just struggling to get the output i'm looking for. i'm using a keymash token (ohwx), naming my images ohwx-1.png, ohwx-2.png, etc. (total of 12 images). the dataset is properly varied. some medium, some wide, some close. different backgrounds, expressions, looks, etc. captions are descriptive, but don't include the ohwx token. (i tried including the token in the caption and instead it was getting setting and pose but not her face). example caption: "a photo of a woman in a dress standing in a room with a door and a clock on the wall behind her". i'm not using regularization images based on TLB's advice to skip the step if training a face. i set a text encoder step value, but it doesn't run ever (i assume because i'm not doing regularization). i've tried 15k steps with a dataset of 82 images and it was generating her face well but was locked into selfies because too many of the 82 were selfies. am i missing something?

split acorn
#

15k steps is a LOT for a subject with the idea of learning a face

rain tapir
#

Has anyone gotten an embedding training to look just as good as a dreambooth training for a specific subject, not an artstyle? I am currently training mine but thinking it's a waste of time because I don't think it looks as good as dream booth.

split acorn
#

82 images is also a lot for just face learning

rain tapir
#

Not to mention to actually make it look like the subject it takes 10 times longer since each step takes a lot more

split acorn
#

you could tone it down to like 5-30 and see better results, tbh

#

Most of them should be closeups and you have the right idea with having them varied CB_nod

rain tapir
split acorn
#

and yep you can just skip regularization if you're fine with the model only generating that face

rain tapir
split acorn
#

You don't need them if you don't care about generating other things

stuck parrot
#

i'm fine with the model only doing her face

#

its a model just of my wife, so no biggie

rain tapir
#

Of course, you want her to only do her face, that's the point of training a subject

split acorn
#

That might be "better" but for someone starting out, I really don't think they need it

rain tapir
#

With Reggies, not only do you get more flexibility, but you also point the model in the right direction. For example, high-quality Reggies increase your chances of getting a high-quality image if that's what you're going for

#

Midjourney style Reggies will push the style towards a midjourney style generation, etc. I honestly don't see the point of not using Reggies

split acorn
#

Do you have any examples, that's super interesting alicatPog

stuck parrot
#

well, if this train i've got going doesn't produce anything usable i'll go back to reggies

split acorn
#

Like regularization is typically just random images (of the class) generated from the model you want to train on, so it's interesting to see people not doing that and it having a positive effect on the outputs

#

So comparison examples would be super interesting to see

rain tapir
split acorn
#

LOL that's a mood

#

and yeah, no worries!

rain tapir
#

And see for yourself

#

In fact, I'm going to be generating my own set of Reggies based on the new protogen model, since I already have a bunch of them for other models

#

I will upload it to hugging face and you can download it from there and use them, and you can see the difference yourself and post results for research

split acorn
#

Though I will firmly say that it's more user friendly to not worry about them to start off with. Like I'm confident you can get usable results without them. They might not be as high quality, but it is less work and less variables to worry about, imo

#

but if it makes it easier to get usable results, I suppose I would just be wrong alicatKEK

full knot
#

accumulation steps is the number of my total pictures if the batch size is 1 ?

hazy schooner
#

I'm all for high quality though

split acorn
#

that's an option, yes

#

You could also leave it at 1

full knot
#

really, it doesn't matter?

rain tapir
#

I think it means gradient accumulation steps

full knot
#

i would like to do batch 2 but i have 47 pictures x)

split acorn
#

You can get away with either, it'll just change how long it takes to train

rain tapir
#

Which you should leave at one unless you have any reason to change it, which is usually Vram limitations

split acorn
#

and LR might need to be adjusted

hazy schooner
split acorn
#

Batch is VRAM and GA doesn't care about VRAM

full knot
#

rtx 5000 (16)

split acorn
#

so you keep stuff on batch if you can and then GA if you have VRAM limitations

rain tapir
split acorn
#

right

rain tapir
#

To try to imitate the same result. I don't remember the exact process by which gradient accumulation works, but it is inferior to using the full batch

split acorn
#

and sometimes less is more

hazy schooner
rain tapir
#

It has to carryover data to the next epoch, or something like that when you do gradient accumulation, instead of passing the entire data at once so there is a negative effect if used unnecessarily

hazy schooner
#

batch size of 1 would be a waste of time I think- as in, you have the vram

full knot
#

if i cut some images to meet the calculation requirement i can set the batch higher

split acorn
#

yeah, pretty much

#

You could argue Batch 1 is easier to train? alicatHm2 even if it might take longer to do so?

full knot
#

so 44 images with a batch 4 i set the gradient to 11 ?

split acorn
#

Or maybe, easier to not overtrain, is the better way to word that

full knot
#

for example

#

haha yea

split acorn
hazy schooner
split acorn
#

I'm more so referring to the learning rate

hazy schooner
#

I personally do 1 epoch every step so it's easier to keep track of progress

split acorn
#

You're speeding up the learning process, right?

#

Which makes it easier to overshoot or overlearn

#

or something like that

rain tapir
#

Lets say u have 60 images

hazy schooner
#

Something like that- although I feel like the lr isn't something that will be accurate anyways on a first try

rain tapir
#

You ideally want to train 60 batch size, 1GA

split acorn
#

yeah

rain tapir
#

But u cant fit 60, so u can the try 30 BS, 2 GA

split acorn
#

right right

rain tapir
#

Still doesnt fit, 20 BS, 3 GA
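
#

Quick sketch of that arithmetic. GA has to be a whole number, so it's easiest to pick a batch size that divides the image count. This assumes you want batch × GA to equal the dataset size, like in the 60 image example; `batch_ga_pairs` is just a helper name I made up:

```python
def batch_ga_pairs(num_images, max_batch):
    """List (batch size, accumulation steps) pairs where batch * GA == num_images."""
    pairs = []
    for batch in range(min(num_images, max_batch), 0, -1):
        if num_images % batch == 0:
            pairs.append((batch, num_images // batch))
    return pairs

# max_batch stands in for whatever your VRAM can actually fit.
print(batch_ga_pairs(60, max_batch=30))
print(batch_ga_pairs(44, max_batch=4))
```

Pick the first pair that fits in memory, since that keeps GA as low as possible.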

#

The higher the GA, the shittier your results will be

split acorn
#

I don't know if that's true, is it?

#

GA is just a matter of time, right?

#

like just taking longer but same results
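
#

Toy check of why I'd expect that, at least on paper. This is a one-parameter model with made-up data and plain gradient averaging, so it says nothing about what a real trainer does with optimizers, mixed precision and so on:

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-parameter model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
w = 0.5

# Batch 6, GA 1: one gradient over all samples.
full_batch = grad_mse(w, xs, ys)

# Batch 2, GA 3: average three micro-batch gradients before a single update.
micro = [grad_mse(w, xs[i:i + 2], ys[i:i + 2]) for i in range(0, 6, 2)]
accumulated = sum(micro) / len(micro)

print(full_batch, accumulated)
```

In this setup the two numbers agree up to floating point rounding, which is why I'd guess GA mostly costs time.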

rain tapir
#

But since VRAM is worth its weight in gold and it is scarce then you have to deal with it

#

I remember hearing that it does cause a loss of accuracy but I need to research it. These topics are super complicated unless you have a degree in machine learning which I want but can't afford.

#

Because all you hear are super technical answers

split acorn
rain tapir
#

So when someone says yeah, it causes a loss of accuracy, usually the reason behind it is explained with a bunch of more jargon so it's hard to fit the entire idea in your mind

#

I'm a take a read

split acorn
#

Also, easy enough to test out and compare alicatKEK I should do that at some point

#

mmm there's another one, one sec

finite creek
#

Has anybody tried a full tuning with Everydream or similar?

split acorn
#

Yep!

#

I'd argue it's one of the easiest ways to train

#

since you're just learning on the captions

finite creek
#

Why would you use it as opposed to dream booth ?

split acorn
#

depending on what you're looking for, I suppose?

#

If you want to learn multiple things, finetuning/caption training works well

#

or if you don't want to worry about rare tokens or

#

just being overall easier alicatHm2

#

I think it's easier to not overfit?

#

it doesn't have the same level of overfitting that DB can have

finite creek
#

Let’s say I want to train a model for a very specific purpose. Designing a car Rim, but I don’t want it to use generic wheels that are in the model. I’d like to train it with a specific data set so the results are more interesting. Would it be better to go with full tuning ?

split acorn
#

I'm not sure which one works better for something like that alicatHm2 mmm, unfortunately the community doesn't share a whole lot of info on how they trained specific things

#

some people do but yeah alicatKEK

#

mm there are a couple good example starting points

#

one sec

finite creek
#

Thanks Alicat!

split acorn
#

A really solid model AND has a lot of info on how it was made

#

you could do that for rims

#

I think alicatHm2

#

oh oops, that's an embedding

finite creek
split acorn
#

nope! Sorry I'll grab a different one

finite creek
distant patio
#

Using Dreambooth,
I see some using a single keyword in Prompt instance (zkz, ohwx,...),
others on the other hand use an additional class word (e.g. ohwx woman)
while some use a whole prompt (e.g. Photo of a beautiful ohwx woman, award winning photography).

What is the right entry among these 3 methods?
I don't understand how this works.

split acorn
#

Instance token is like ohwx. Class token is "woman".
Instance prompt is like "Photo of ohwx woman"
Class prompt is "Photo of woman"

#

When prompting you would use "ohwx woman"

#

for the strongest effect

#

is one way to do it

#

So "Photo of a beautiful ohwx woman, award winning photography" should work well for an example prompt when using the model
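
#

Same thing as a tiny sketch, in case it helps to see how the pieces fit. `dreambooth_prompts` and the template are made up for illustration, they aren't fields from any particular trainer:

```python
def dreambooth_prompts(instance_token, class_token, template="photo of {subject}"):
    """Build the instance and class prompts from the two tokens."""
    instance_prompt = template.format(subject=f"{instance_token} {class_token}")
    class_prompt = template.format(subject=class_token)
    return instance_prompt, class_prompt

instance_prompt, class_prompt = dreambooth_prompts("ohwx", "woman")
print(instance_prompt)  # photo of ohwx woman
print(class_prompt)     # photo of woman
```

The only difference between the two prompts is the rare token, which is the thing being trained.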

distant patio
#

I saw many videos where they just leave the filewords entries blank and it works. That's what puzzles me here.

split acorn
#

What was the entry called?

#

[filewords] can go in like Instance Prompt or like Class Prompt

#

or do you mean, the text files were blank?

distant patio
split acorn
#

Ooohh

distant patio
#

I often see this parameter blank in "tutorial" videos

split acorn
#

Yeah if you leave those blank, you would just be doing caption training

#

also known as finetuning

#

and then it's ONLY learning on the filewords

#

is how I understand it
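
#

Roughly what I picture happening with [filewords], as a sketch. `build_prompt` is a made-up helper and not the extension's code; the assumption is only that [filewords] gets swapped for the caption from the image's .txt file:

```python
def build_prompt(prompt_field, caption):
    """Replace [filewords] in a prompt field with the caption from the image's .txt file."""
    return prompt_field.replace("[filewords]", caption).strip()

caption = "a woman in a dress standing in a room"
with_token = build_prompt("photo of ohwx woman, [filewords]", caption)
caption_only = build_prompt("[filewords]", caption)
print(with_token)
print(caption_only)
```

With no instance or class token in the mix, the caption is all the model trains on.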

#

which tutorial are you watching? there are a couple solid ones out there now

distant patio
#

I am explaining from scratch to very advanced level how to use #Automatic1111 Web UI and D8ahazard #DreamBooth extension to teach new subjects, e.g. your face into a model. Moreover, I am showing how to inject your taught face into a completely new model e.g. Protogen x3.4 to produce awesome quality images without wasting too much time on findin...

Support us on Patreon: https://www.patreon.com/entagma
https://www.entagma.com

After installing the stable diffusion webui (https://youtu.be/cL_ZYdkIqBU), Mo goes over how to train an AI model to generate portraits of your face using dreambooth.

Download Automatic1111's WebUI:
https://github.com/AUTOMATIC1111/stable-diffusion-webui

Installing...

Dreambooth local training has finally been implemented into Automatic 1111's Stable Diffusion repository, meaning that you can now use this amazing Google’s AI technology to train a stable diffusion model with your own images. You can train a character, an object, a style, or anything you want! There is also a new option that allows you to use D...

DreamBooth for Automatic 1111 is very easy to install with this guide. With DreamBooth for Automatic 1111 you can train yourself or any other subject. Use your own trained Model to create images in your styles or of yourself. The DreamBooth training in for Automatic 1111 takes only around 30-40 minutes with a good GPU.

LINKS From Video ##...

rain tapir
#

@halcyon linden Team Aitrepreneur FTW. Kay do you have your own discord? If not, make one

distant patio
#

For me, the one from SE Courses is what gave the best result, although hardly usable at the moment.

split acorn
#

mm

#

Yeah you can do that, the difference between the Class prompt and Instance prompt is the token you're training on

#

essentially

#

Also a huge fan of SE Courses

distant patio
#

Thank you.
The descriptions of the fields are often haphazard (succinctly, what is listed on the github of Dreambooth extension A1111).

split acorn
#

Yeahhh

distant patio
#

plenty of magical numbers too 🙂

#

Entagma's is the clearest of all (you can feel the teaching professional behind it), but the video is a bit dated now; things have evolved quickly on this side.

split acorn
#

It's still the new dreambooth UI too, and the video is solid! CB_nod

hazy schooner
finite creek
hazy schooner
#

I'm only just getting into making models

finite creek
#

Oh ok, yeah was wondering how to make a full tuned model. Looking around my deduction is that you need more input images and perhaps no reg images…somebody correct me if I’m wrong.

split acorn
#

GoatUppies yeah is a good embedding

crystal schooner
#

Hi, I'm about to make a TI of a specific body type, should I go for style or object?

full knot
#

in init text if i do : inittext1, initext2, initext3 is that the training spread across three or the whole sequence ?

#

embedding

hexed bloom
#

How do I enable/disable bucketing on AI Dreambooth training? I can't seem to find the option anywhere

split acorn
#

depending on the repo, it can be automatic

#

Resolution being 512, does:

```
bucket 1: resolution (256, 896), count: 0
bucket 2: resolution (256, 960), count: 0
bucket 3: resolution (256, 1024), count: 0
bucket 4: resolution (320, 704), count: 0
bucket 5: resolution (320, 768), count: 0
bucket 6: resolution (384, 640), count: 10
bucket 7: resolution (448, 576), count: 10
bucket 8: resolution (512, 512), count: 90
bucket 9: resolution (576, 448), count: 0
bucket 10: resolution (640, 384), count: 10
bucket 11: resolution (704, 320), count: 0
bucket 12: resolution (768, 320), count: 0
bucket 13: resolution (832, 256), count: 0
bucket 14: resolution (896, 256), count: 0
bucket 15: resolution (960, 256), count: 0
bucket 16: resolution (1024, 256), count: 0
```
for example
#

automatically (at least that's how kohya's works; I'm pretty sure the DB extension does that too as of 2-3 weeks ago)

#

"disabling" it would just be using only 512x512 images

#

then no bucketing would happen
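
#

If it helps, here's a simplified sketch of the idea. It's my own approximation and not kohya's or the extension's actual code; the 64 px step and the 256 to 1024 side limits are assumptions picked to resemble the log above:

```python
def make_buckets(base=512, step=64, min_side=256, max_side=1024):
    """Enumerate (width, height) buckets whose area stays at or under base * base."""
    max_area = base * base
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest height on the step grid that keeps the area within budget.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))
    return sorted(buckets)

def pick_bucket(width, height, buckets):
    """Choose the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))

buckets = make_buckets()
print(buckets)
print(pick_bucket(512, 768, buckets))
```

A dataset of only square images always lands in (512, 512), which is the "disabled" case.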

stuck parrot
rain tapir
#

Since I'm in the process of comparing embed vs db

stuck parrot
#

how many reggies would you recommend i use for face training a woman?

rain tapir
#

rule of thumb is 10 reggies per sample img

stuck parrot
#

kk

hexed bloom
#

And I just pulled yesterday, and the console does show that bucketing exists

#

I'm just not sure how to enable it

split acorn
#

odd! yeah not sure

rain tapir
#

You would never want to feed anything other than 512

#

Never rely on dreambooth resizing

#

is what I mean

#

or birme, that thing compresses like a mf

hexed bloom
split acorn
#

Do you have any comparisons? With the auto resizing from dreambooth versus manually? I'd love to see those alicatPog

#

like a 1 to 1 comparison with the only diff being that?

#

Intuitively I feel like that'd be right, but

hexed bloom
#

I usually do it all in imagemagick myself at 95 quality

rain tapir
#

I swapped to photoshop for that exact reason

#

Im probably training a dreambooth model soon, I have soooo many models on queue

split acorn
#

I've been using Kohya and they auto bucket, but if that seriously hurts quality, I'll just manually resize them

hexed bloom
#

i really doubt that would hurt the quality

#

magick is so fast as it is

rain tapir
#

Sadly few programs have respect for quality

#

Also this might be superstition

#

But if they are outputting anything other than PNG there is always a loss, even if you cant see it

hexed bloom
#

Jpg loss is less than 1% at that size by the way

rain tapir
#

If a png is 2mb and the jpg is 300kb, where did that 1.7mb go? It's data you can't see with the naked eye, but a computer can probably see it.

rain tapir
#

My png's are about 10 times bigger than jpgs

hexed bloom
#

Size doesn't mean more quality

#

Just uncompressed
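
#

Small demo of what I mean, using zlib from the Python standard library (PNG's compression is DEFLATE, the same family). The "image" is fake pixel data I made up, not a real file:

```python
import zlib

# Fake "image": 512x512 RGB pixels with two large flat areas, like a simple drawing.
raw_pixels = bytes([200, 30, 30]) * (512 * 256) + bytes([30, 30, 200]) * (512 * 256)

compressed = zlib.compress(raw_pixels, level=9)
restored = zlib.decompress(compressed)

print(len(raw_pixels), len(compressed))
print(restored == raw_pixels)
```

The compressed copy is far smaller yet decompresses to the exact same bytes, so file size alone doesn't tell you whether anything was lost. JPEG is different because it is lossy by design.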

#

My tiffs are 10 times bigger than PNGs, should we be using those instead? Probably but we aren't using machine learning to detect stars at the edge of the observable universe

#

We just making picshures

rain tapir
#

tiff is lossy

#

oh wait no, it is analog to bmp I think

hexed bloom
#

tiff can be lossless or lossy

rain tapir
#

I mean, ideally you would, but these models are running on 512 anyways, 768 for 2.💩

#

Or 1024 if ur a maniac like me 🤪 You wouldnt believe some results that Ive gotten. But you would have to hit the lottery to get something coherent. INSANE detail but low coherence

hexed bloom
#

Yeah I train 512 and 1024 as well

#

I still prefer 512 at the end of the day

stuck parrot
#

anyone know why TLB's fast-dreambooth leaves instance prompt blank?

full knot
#

wow just tried my embedding on my own machine with the same "name" of the model and it obviously didn't work

#

looks like the sha is different

oak girder
#

Quick question regarding DreamBooth with SD v2.1 (sorry if this is the wrong place to ask).

I'm comparing different pre-trained models (SD v1.4, v1.5, and v2.1) in the DreamBooth method and notice that v2.1 has a greater loss during fine tuning (normally just under 0.5) compared to v1.4 and v1.5 (normally just under 0.2), but v2.1 still produces high quality images for my particular application. I know v2.1 is trained (at least in part) using a different loss function, but that shouldn't affect the loss here as I'm using the same training script regardless of model version. Any idea where the difference is coming from? Thanks

lean needle
#

Can anyone who is experienced in hypernetwork training help me train a style from a specific artist?

I have the captions in the dataset directory but couldn't find how it is related with the training since the prompts are coming from prompt template file.

crisp python
#

Remake the past

rustic lava
rain tapir
#

People pay for advice? Dayum

rain tapir
rustic lava
#

yeah man that's what a consultant is

rain tapir
#

1.5 is still the alpha chad master race of models

#

Oof. Wait until they hear about chatGPT

#

I swear chatGPT is my therapist, consultant, tech support, and I shit you not, once realistic female androids come out - wife. Just plug that shit into an android and voila. The perfect human companion.

oak girder
rain tapir
#

If all u do is draw cute pictures of bunnies though I think it might be better

#

just cuz of 768. Once we get a 1.5 1024 model tho, the world will implode on itself.

oak girder
split acorn
#

Well there's still celeb content, but the names aren't attached to them to the same extent anymore.

Human content was HUGELY nerfed due to the insanely strict filter on 2.0. They attempted to fix it by continuing to train on it by introducing a moderately strict filter, but it could only do so much.

#

InvisibleParrot
It's good for landscapes and non-humans though!

hazy schooner
#

Question: what could I be doing wrong? I'm using the a1111 DB to train 2.1 512, but this is the garbage I'm getting no matter the sampler or the settings. I feel like I did everything right that I could be doing right, and 1.5 training with exactly the same settings, samples etcetera worked like a charm

#

Is there something about training 2.x that is so fundamentally different?

#

Or should I not use a1111's DB for this but rather EveryDream or the main diffuser based repo

full knot
#

looks like you don't have a correct yaml loaded for the generations

mellow meteor
#

I had weird colors like these when I lowered the image resolution way way down in Stable Diffusion 2.1, idk why.

hazy schooner
hazy schooner
mellow meteor
#

Yeah I know. No wonder why I have to stay on 512x512 all the time. 384x384 seems to work correctly though.

hazy schooner
#

It had a yaml already, and even with the original SD2 yaml the problem persists

full knot
#

hmm i heard that a specific model i used to train on it lemme check

hazy schooner
full knot
#

weird

#

i'm sure i got it working with shivaro db

hazy schooner
full knot
#

haha the "what" has so many variable in there indeed

#

wondering why in my embeddings samples i asked a simple "microwave in mystyle" and i got a volcano

#

all my txt captions are empty

#

we don't even looks for something evident first, it can be "everything" x)

hazy schooner
full knot
#

hmmm no that's weird, captions are there

#

100 steps at 0.0025 lr

#

An intricate microwave, art by xxxx

#

same dataset worked fine on 1.5 don't really understand

#

hmm maybe the 2.1 embeddings need to be trained on 768 ?

#

768 resolution dataset ?

coarse idol
#

hey guys, i want to train a style with dreambooth. i tried a small dataset of about 20 images and the results were great, but when i train with 200 the results weren't that good. i searched around and found EveryDream, but it seems my gpu isn't enough for it. so i was thinking: what if i train with dreambooth in parts, 20 images at a time, and merge the results into one model? how would that turn out?

full knot
#

for 2.1 "which" models i should use for train embeddings ?

#

got really nonsense results even at very low steps with the 2.1-768 nonema

swift void
#

here's a tip for fine tuning

#

use the config file v1-finetune_style.yaml or v1-finetune.yaml or v1-m1-finetune.yaml

hoary stone
#

Hey Crew!

So I am finally getting into fine tuning. Looking for some dataset setup advice. To begin with I am using fast-dreambooth.

I want to train on a specific vehicle, in this case a pickup truck.

There is a new model of the pickup truck coming out and there are limited photos of it.

The photos I have are just of one colour. However, the new model is only slightly different (has a longer chassis).

If I wanted to give dreambooth more context, is it okay to give it photos of the old models of all different colours, but perhaps label them differently, like pickup-long, and pickup-short.

Penny for your thoughts?

swift void
#

you can also start collecting 512x512 pictures of what you want to train on and make your own model

#

try learning how to code up a web scraper, and use something like Google's Custom Search API

#

this will allow you to automate the process of gathering the images

#

also prob want to add a script to resize and convert images to .png

#

I'll post some examples/scripts I have

#

APIKEYS n such are fake, I did not post my keys 😉

#

example of using ffmpeg to resize and convert image
ffmpeg -i input.jpg -vf scale=320:240 output_320x240.png

#
  • Note: The scale filter can also automatically calculate a dimension while preserving the aspect ratio: scale=320:-1, or scale=-1:240
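
#

For 512x512 training images you'd usually scale the short side to 512 and then center crop. Here's a sketch of just the math; `resize_then_crop` is a made-up helper and the numbers would still have to be handed to whatever tool does the actual resizing:

```python
def resize_then_crop(width, height, target=512):
    """Return the scaled size and the centered crop box for a target x target square."""
    # Scale so the shorter side lands exactly on the target.
    scale = target / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)

print(resize_then_crop(1920, 1080))
print(resize_then_crop(600, 900))
```

The crop box is in (left, top, right, bottom) order. Center cropping can cut off a subject that sits near the edge, so it's worth eyeballing the results.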
hoary stone
#

Cool, thanks. I am a python coder actually. The tough thing is the pickup truck I am talking about isn't released yet, so there are only a few "sightings" of it in the community. I only have about 10 images.

So I am going to pepper in some images of the front and back of an older model which those parts of the vehicle look the same.

swift void
#

Now just put all that together, and add some nerdy programmer magic and booyakasha

#

tbh, I hate python

#

but I have to use it so, I use it.

#

the output of that google script is json, so you'll just prob need to add a regexp to filter out image links and pipe that to something like wget keeping it simple
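
#

Since it's json anyway, you could skip the regexp and parse it directly. Sketch below with a made-up response; the "items" / "link" layout is my assumption about what the search script returns, so check it against your real output:

```python
import json

# Made-up response; the "items"/"link" layout is an assumption about the search API.
sample_response = """
{"items": [
  {"title": "truck front", "link": "https://example.com/truck_front.jpg"},
  {"title": "forum thread", "link": "https://example.com/thread.html"},
  {"title": "truck rear", "link": "https://example.com/truck_rear.PNG"}
]}
"""

def image_links(response_text, extensions=(".jpg", ".jpeg", ".png")):
    """Return the links that end in an image extension."""
    data = json.loads(response_text)
    links = [item.get("link", "") for item in data.get("items", [])]
    return [link for link in links if link.lower().endswith(extensions)]

for link in image_links(sample_response):
    print(link)
```

Each printed link can then go to wget or any downloader.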

swift void
#

try adding this python snippet to the google search script

#

urllib.request.urlretrieve(url.replace('"', ''), "tmp/images/full/fish")  # Python 3; needs: import urllib.request

#

or
print('\n'.join(jsonpath.jsonpath(parsed_input, "$..URLs[?('unica' in @)]")))  # Python 3 print(); needs: import jsonpath
or

import urllib.request, json 
with urllib.request.urlopen("http://maps.googleapis.com/maps/api/geocode/json?address=google") as url:
    data = json.loads(url.read().decode())
    print(data)
#

just ideas of what one could add to it, idk, try shit, see what works n what fails

swift void
#

once you figure this part out, all you'll need to do is just change what it searches for online, wait for it to finish and have a folder populated with all the 512x512 images you need to train a model with

#

🤓

hoary stone
#

Good thinking

swift void
#

grep -Poza '(?:\G(?!^)",|"groups":\s*\[)\s*"\K[^"]+'

#
  • Options
    • P - use the PCRE engine to parse the pattern
    • o - output only the matches found
    • z - slurp the whole file, treat the file as a whole single string
    • a - treat the file as a text file (it [should be used](https://stackoverflow.com/questions/152708/how-can-i-search-for-*a-multiline-pattern-in-a-file#comment44086821_152755) because the -z switch may trigger grep's binary data behaviour, which changes the return values)
  • Pattern
    • (?:\G(?!^)",|"groups":\s*\[) - either the end of the previous match (\G(?!^)) and then a ", substring, or (|) the literal text "groups":, 0+ whitespaces (\s*) and a [ char (\[)
    • \s*" - 0+ whitespaces and a " char
    • \K - match reset operator, discarding the whole text matched so far
    • [^"]+ - 1+ chars other than "
  • As you see, this expression finds "groups": [", omits that text and matches each value inside "s only after that text.
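
#

For the python folks: the standard `re` module has no \G or \K, so that exact pattern won't port over. A rough equivalent in two steps, using a made-up sample string (if the input is valid json, `json.loads` is the sturdier option):

```python
import re

sample = '{"name": "demo", "groups": [\n  "admins",\n  "editors", "viewers"\n], "other": ["x"]}'

def group_values(text):
    """Return the quoted values inside the "groups": [...] array."""
    # Step 1: grab everything between the brackets that follow "groups":
    match = re.search(r'"groups":\s*\[(.*?)\]', text, re.DOTALL)
    if match is None:
        return []
    # Step 2: pull out each quoted value from that slice.
    return re.findall(r'"([^"]+)"', match.group(1))

print(group_values(sample))
```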
#

@hoary stone

#

example pulling IP addresses out of a file with grep
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt

#

Just use your imagination, and write it out in code, (or find someone that already has online, lol)

#

or since the google script outputs in json, just pipe it into curl using a GET request

#

hmmmm so many ways 😁

#

This enables you to be the creator of your own models rather than submitting to an authority figure (someone else) that you have to wait on to solve your problem for you (windows users, -- cough -- lol)

hoary stone
#

Haha, wow man. You didn't have to go that far! Thanks for your input, much appreciated.

swift void
#

it was no problem

#

i just get carried away sometimes 🙃

#

and there may be others reading this and taking notes

#

i was writing this for everyone, tho you were the focal point

#

✌️

full knot
#

hmm just love how my embedding for 2.1 turned out

bitter adder
#

Does anyone know how to add negative prompts to the openjourney model?

#

I don't have a specific negative_prompt field and the usual [] or ::-1 isn't working at all in the regular prompt

willow mason
#

hey question: can i allocate more ram to SD? because it looks like mine is hard locked at half

#

also is there like a guide to the UI

#

like what everything does and setting n shit

stuck parrot
#

someone was telling me you can use LORA to compare two models and extract the difference as a pt but I can't find anything about this in Google. can anyone steer me in the right direction?

rain tapir
#

Dreambooth is a Google AI technique that allows you to train a stable diffusion model using your own pictures. This Imagen-based technology makes it possible for you to insert any subject you want into a stable diffusion model. I made a similar video in November showing you how to use the Dreambooth extension in automatic1111 but since then a lo...

#

Updated guide that fixes the current broken extension. Finally

distant patio
#

Mixing trained embeddings AND dreambooth of the subject seems to give very good results.

#

My embedding model is terrible, though, even when blurring and noising background.
My Dreambooth model is "okay" alone, but without embedding I can hardly get the subject at all when introducing style and scenery.

#

I guess my dataset could be better

rain tapir
#

Are you talking about an embedding of one subject AND the dreambooth of the subject? That legit sounds dope af

#

I JUST got my embedding to work and look like my subject after 27,000 steps of training.

#

Dreambooth has always worked like a charm for me, if you are using the current dreambooth model, it will not work, you have to use the models provided in that video

distant patio
rain tapir
#

oo shit I havent finished watching the video lmao

#

I got stuck halfway trying to fix his old install which didnt work right off the bat for me

#

I am legit gonna become his patreon, the guy's tuts are better than anyone else's out there

distant patio
#

SE Courses helped me learn the ropes though:
Textual inversion :
https://www.youtube.com/watch?v=dNOpWt-epdQ
& Dreambooth :
https://www.youtube.com/watch?v=Bdl-jWR3Ukc

In this video, I am explaining almost every aspect of Stable Diffusion Textual Inversion (TI) / Text Embeddings. I am demonstrating a live example of how to train a person face with all of the best settings including technical details.

If I have been of assistance to you and you would like to show your support for my work, please consider bec...

I am explaining from scratch to very advanced level how to use #Automatic1111 Web UI and D8ahazard #DreamBooth extension to teach new subjects, e.g. your face into a model. Moreover, I am showing how to inject your taught face into a completely new model e.g. Protogen x3.4 to produce awesome quality images without wasting too much time on findin...

#

They both have slightly different settings, nothing extraordinary though.

#

lol, thanks for the thumb down

#

a pleasure

rain tapir
#

I hate those guys, him and olivio

#

Ppl always come here cuz their tuts dont work

#

Team Aitrepreneur ftw

distant patio
#

They both badly use the X/Y plot script though.

#

Prompt S/R should not be used like
token:keyword (and then a value = keyword, 1.0, 1.5, 2.0,...), which would make the first column of data useless,
but rather
token:1.0 (and then a value = token:1.0,token:1.5,token:2.0,...) which would save you one full column of data
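
#

To illustrate, here's a sketch of how I understand Prompt S/R to behave: the first value is the search text, and it gets replaced by each value in the list in turn. `prompt_sr` is my own stand-in and not the script's code:

```python
def prompt_sr(prompt, values):
    """Search for the first value in the prompt and replace it with each value in turn."""
    search = values[0]
    return [prompt.replace(search, value) for value in values]

prompts = prompt_sr(
    "photo of (token:1.0) in a forest",
    ["token:1.0", "token:1.5", "token:2.0"],
)
for p in prompts:
    print(p)
```

Every column is then a meaningful strength value, including the first one, which is just the unchanged prompt.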

dapper prism
#

Has anyone ever tried training different aspect ratios?

steel egret
#

has anyone managed to successfully run tensorboard on either colab free or kaggle free ?

split acorn
brazen osprey
#

Would anyone have any advice on an approach for training a style in the vein of the “lego city adventures” tv series? At the moment I am attempting it via a hypernetwork with about 200 images from the show. At about 10k iterations (still early I guess) I'm getting concerned that perhaps it’s not learning very well. I’ve manually edited all the classifications so that they say “lego man” or “lego woman” if there is a character in the shot. And prefixed the text classifications with “a 3D render of…”.

distant patio
#

Do you also have completely grotesque faces with Textual inversion? Even after 2000 steps and a strength lower than 0.2?

dapper prism
#

Are there any Dreambooth projects capable of training on datasets using non-square images?

distant patio
dapper tundra
#

to train an embedding, for example on different kinds of dragons (ice, fire, black, undead etc.), is it better to set it to character or style?

spiral sleet
#

Is there any way to upscale the image on a weak graphics card? I want this, but I get: RuntimeError: Not enough memory, use lower resolution (max approx. 960x960). Need: 2.0GB free, Have:0.5GB free

stuck parrot
spiral sleet
#

I'm too stupid to use this

hot breach
next vale
#

Hello, can anyone tell me which finetuning technique is best if we want to train our model on a large dataset: dreambooth, lora, textual inversion, or Everydream2Training?

serene flicker
next vale
serene flicker
#

I haven't done much training myself, and I don't know what that last option on your list is, but Dreambooth is probably your best option if you want to train on a large dataset.

hot breach
#

dreambooth is good for training a face or a couple of things but tends not to scale well, not sure if LORA is really meant to capture many different things, but if its all one style or person maybe it works well and sometimes TI can as well

next vale
hot breach
#

interesting

next vale
#

actually at this moment we are supposed to go with dreambooth, but not sure whether it is the best fit for us or not!

hot breach
#

I don't even know what regularization class you'd use for food but I'm personally not that keen on the regularization process anyway

lofty hinge
#

Hi. Kinda noob when it comes to finetuning / training models. There’s a character I like but he’s so obscure / unpopular there’s almost no fan-art / official art of him lol.

I want to make a Dreambooth model with about 20 images of him. Which would be the best Google Colab repository / settings for this? I already tried with TheLastBen’s fast Colab and got acceptable results on my first try, but it seems to be overtraining the model on my following attempts. Any help would be appreciated. Thanks in advance!

stoic mason
#

hey all. i'm looking to train a SD model on a particular art style (not specific subject). should i be using dreambooth or textual inversion?

split acorn
#

I recommend LoRA CB_thumbs_up

dense sparrow
#

are the new lora updates integrated into automatic1111 yet?

weak hearth
#

don't think so but kohya's extension is about as fast as built in loras now and much easier to use since you don't have to touch the prompt box

restive bridge
#

speaking of Lora, how are results lately for training (real) faces, compared to DB? my goal is photorealism and style-ability

covert gazelle
#

I heard the AUTO1111 dreambooth extension got broken recently, has it been fixed yet?

wide meadow
#

@hot breach is it possible to train the model on colab, sorry to bother you

hot breach
#

yes

#

link is there, hopefully the descriptions on the cells are enough to get you going

#

just need the low-tier GPU instance (16GB T4) and it's mostly set up in a way that should work out of the box

warm crane
median sun
#

is there a way to keep SD from cutting object in parts like this:

#

Tiling is on but I'd rather have the big snowflake in the middle of the texture

final matrix
#

Any of you know a software with which I can batch-remove watermarks? Like say load up a thousand images into the program, then switch with arrow keys between them and adjust the watermark removal mask individually for each image?

I have found programs where I can edit an image that way individually, but I have to load each image individually and then save before moving on to the next one which takes a lot of extra time. It would be a lot faster if I could load up all images simultaneously, then just switch between them via arrow keys or whatever, and adjust the mask for each image, and then save them all at once.

Kind of like how https://www.birme.net/ works.

I am willing to pay a small sum for a program that can do this but only in the paid version.

wise dust
#

does anyone know a formula for how many repeats i should use for x number of images when training a lora?

#

or at least some examples of how many repeats for how many images people have used successfully

woeful sphinx
#

When it comes to fine tuning a model on different concepts with 2.1 and DreamBooth, can you train the model on different steps for each concept or does it have to be universal for all data?

frank ibex
# rain tapir Ppl always come here cuz their tuts dont work

And they have zero clue what they are talking about and they just throw in 100,000 for epoch LonFlushed They just race to get a video on youtube for the ads and they even try to advertise their videos on github and huggingface wikis. That to me means they don't know jack shit lol

crimson wasp
#

has anybody looked into merging specific parts of the unet from different model sources to capture different elements such as composition, lighting, texturing, etc, which the unet presumably focuses on at different scales? Some current mix-models have apparently gotten much better results doing that, with this extension being built for it: https://github.com/bbc-mc/sdweb-merge-block-weighted-gui

The orange mixs model https://huggingface.co/WarriorMama777/OrangeMixs creator says:

2.Added IN deep layers (IN06-11) to the layer merging from the realistic model (BasilMix).

It is said that the IN deep layer (IN06-11) is the layer that determines composition, etc., but perhaps light, reflections, skin texture, etc., may also be involved. It is like "Global Illumination," "Normal Map," and "Ambient Occlusion" in 3DCG.
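
A minimal sketch of the block-weighted merging idea above, using one float per key instead of real tensors. The key prefix follows the usual SD 1.x checkpoint layout (`model.diffusion_model.input_blocks.N.`), but the helper names, the toy weights and the 0.5 ratio are assumptions for illustration. This is not the extension's code and not the OrangeMixs recipe.

```python
import re

# Matches UNet input-block keys such as
# "model.diffusion_model.input_blocks.7.0.weight" and captures the block index.
IN_BLOCK = re.compile(r"^model\.diffusion_model\.input_blocks\.(\d+)\.")


def block_alpha(key, in_alphas, default=0.0):
    """Return how much of model B to mix in for this key (0 = keep A, 1 = take B)."""
    m = IN_BLOCK.match(key)
    if m is None:
        return default
    return in_alphas.get(int(m.group(1)), default)


def merge_block_weighted(a, b, in_alphas, default=0.0):
    """Per-key linear interpolation: (1 - alpha) * A + alpha * B."""
    merged = {}
    for key, wa in a.items():
        alpha = block_alpha(key, in_alphas, default)
        merged[key] = (1.0 - alpha) * wa + alpha * b[key]
    return merged


# Toy "checkpoints": one scalar per key stands in for a whole tensor.
model_a = {
    "model.diffusion_model.input_blocks.1.0.weight": 1.0,
    "model.diffusion_model.input_blocks.7.0.weight": 1.0,
    "model.diffusion_model.out.2.weight": 1.0,
}
model_b = {key: 3.0 for key in model_a}

# Blend only the deep input blocks IN06-IN11 from model B, at a hypothetical 0.5.
deep_in = {i: 0.5 for i in range(6, 12)}
merged = merge_block_weighted(model_a, model_b, deep_in)
print(merged)
```

Only the key for input block 7 changes here; everything outside IN06-11 stays identical to model A, which is the point of merging per block rather than with one global ratio.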

pine gazelle
#

reasons to use '512-base-ema.ckpt' vs 'v2-1_512-ema-pruned.ckpt' as a vae? what does pruning really do, if anyone could explain please.
do you want pruned for generation, and full for when you're using it to do training?

next vale
#

Hello, can anyone help me understand the diagram of textual inversion and how textual inversion works?

median wren
#

How would I describe uncommon scenes when training? I want to train images for zombies (which are not recognizable as such, they are part of my game and might still look like normal people) penetrating other people with their hands. This is totally new, no model knows this. How do I write prompts that describe the scene? Should I write "A 40-year-old man puts his hand into another mans chest"?

median wren
amber musk
#

Excuse me, if I may ask. This is first time I trained Dreambooth in Colab. So far is okay, but could I convert it to LORA?

#

Uh sorry. I found LORA in different colab

rain tapir
#

Does anyone know if training 2 separate subjects in dreambooth and merging the models would work just as fine as training 2 subjects in a single model?

#

I ask because multiple subject training appears to be broken in my dreambooth commit.

dusky smelt
#

Let’s say i have 20-30 artworks like this sample (a concept drawing by the artist Maniani), and i want to use them as a dataset to train a model, maybe for both style and characters together. Is that possible?

I have tried the methods of embeddings and hypernetworks in stable diffusion (not yet tested in dreambooth due to hardware limitations, although lora released recently), but the results are not fairly nice.

For hypernetworks, basically, the style can be trained: when i use txt2img to generate some images, the results are acceptable in art style. But, for example, it cannot generate a new pose or action that does not appear in the training images; most likely the result is a combination of various limbs, bodies and legs extracted from different training images. When i use prompts for objects that do not exist in the training images, it mostly generates ambiguous shapes loosely related to the keywords.
In addition, when i use img2img to generate images, the results are also not ideal. My goal is for the trained hypernetwork to turn my input image (rough sketches, like line art filled with basic color without shading) into the desired style with details.

I don’t know if the dreambooth training method would give me better results, so i will try it after an upgrade of my graphics card.

So, whether embeddings/hypernetworks or dreambooth are used, i would like to know if i can train for more specific items, like “eyes” or “mouth” only, or merely the facial expressions; or train the “pose” or “action”. I am not sure if this way of training is possible?

I hope that the questions above are not too much and hope anyone is kind enough to provide some directions to me to let me find a nice solution.

lofty hinge
#

Anyone has some tips on training a character model on TheLastBen’s Dreambooth colab? I keep either overfitting or getting weird generations :/

silk hare
fast epoch
#

@silk hare A short tutorial on how you made this model would be much appreciated! Keep up the good work!

rain tapir
#

Wouldn't a dreambooth model only affect those that use the same prompts you used for training?

#

For every other image it would still be 512, I know because I tried one with 1024

tepid sundial
fickle haven
#

can anyone teach me how to load a github model to finetune it in thelastben dreambooth???

fickle haven
#

i dont know how to train this in a new character

winter apex
fickle haven
winter apex
#

@ me if you need help

fickle haven
#

didnt know what link was

winter apex
#

good luck then i hope it gives good results

fickle haven
hexed bloom
#

If I have an image thats 512x1000 and I train it with Center Crop disabled, does it squish the image?

rain tapir
#

No, it will probably just end up looking like crap

#

Not to mention, it can crash the training. I accidentally had a few smaller pictures in my subject folder.

rain tapir
#

By the way, if anyone is having issues with dreambooth, giving the CPU allocator issue, I found out the reason.

#

It's because Aitrepreneur recommended you toggle on an option in the settings to transfer some stuff from VRAM to RAM. Even with 32 GB of RAM this was causing my training to crash when generating samples.

#

Turn that shit off and problem solved. Took me weeks to figure out. Needless to say, I am officially no longer going to be his Patreon.

brazen osprey
#

wondering if anyone has had any success training an image classifier. Looking for some advice on transfer learning, need to classify 40-50k images of an animated tv series

tepid sundial
#

You should be able to finetune a CLIP fairly easily, is that not what you're looking for?

bronze igloo
#

any idea how he's doing it?

split acorn
#

LoRA and LEAP can be around that fast

#

I imagine they're borrowing code from LoRA, would be my guess

#

where they're only training weights and not the full model

final oracle
#

Any idea how many class regularization images i need for a model based on 300 images?

brisk swan
#

Are there any good resources for fine-tuning stable diffusion inpainting?

hidden nymph
#

would like to know this as well if any one could help^

gloomy pike
#
  1. Is it more beneficial to use a character's name rather than describing their gender in the image prompt text files generated from BLIP for training?

  2. For the text file prompts for the individual photos being trained, can I type one prompt that is vague and then a second prompt that is a copy of the first with more detail? for example, "woman wearing green leotard, red hat, posed with back to camera giving thumbs up," and then, "cammy white doing victory pose," will that help it associate characters and outfits to names?

  3. Do I need to use underscores to combine words like the BLIP and DeepBooru results do, or can I type any number of things without underscores anywhere in the text file, separated by commas? Must the main prompt be first and keywords last?

  4. Should I use the BLIP results even if they are inaccurate, or am I doing the proper thing by reviewing and correcting them?

  5. Will changing any images in the database botch the end results or can it be used strategically?

  6. Same question as before but in regard to the training prompt. Do I need to keep the prompt I train it on the same through the entire session?

... continued

#
  7. Checking the results of my training can be done by dragging the .pt file from my hypernetwork into a directory called embeddings and then calling the file name in the prompt without the file extension in txt2img and img2img, correct? I have not started training, still reviewing my BLIP results for a lot of images.

  8. is over 500 images too much? or am I gonna have excellent results after running my computer for 36 hours?

  9. Should I use hypernetwork or the other options, and I can use the .pt from hypernetwork in embeddings the same as a file from textual inversion, right?

I plan to eventually finish the documentation, I just want to quickly know these things so I can get started training though. Thank you!

split acorn
#

For anime models:

  1. Generally recommended to not use names. DB you would train on the rare instance token. Finetuning you would train on the various tags/captions and the character result would be implicit with the use of some of those tags. Including gender (e.g. 1girl) is recommended. For DB, that's what people use for the class token.

  3. Depends on the repo. Many don't use underscores at all. I think the only one that did was the earlier WD models. You could get away with no commas at all, but "tag shuffling" can help training and that looks for things separated by commas. "Keep token" allows for the first X tokens to be kept when shuffling. This is important since the beginning tokens are weighted heavier than the ending tokens. Typically you'd 'keep' the rare token.

  4. Don't use inaccurate tags, is a good rule of thumb. This just comes down to time/cost. For smaller datasets it's easy to correct, but for larger ones it might just not be worth it.

  5. Probably depends on the repo and training method. People typically don't recommend it as it can mess with the learning, but I have no idea if there's a specific way to utilize that, since most people avoid doing that.
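
A rough sketch of the "tag shuffling" plus "keep token" idea described above. It treats each comma-separated tag as one unit, which is an assumption (real trainers may count tokens differently), and `shuffle_tags` is a made-up helper name, not any repo's API.

```python
import random


def shuffle_tags(caption, keep=1, rng=None):
    """Shuffle comma-separated tags, leaving the first `keep` tags in place."""
    rng = rng or random
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    head, tail = tags[:keep], tags[keep:]
    rng.shuffle(tail)  # only the tags after the kept prefix move around
    return ", ".join(head + tail)


# Hypothetical caption: the rare token and class tag come first and are kept.
caption = "sks, 1girl, green leotard, red hat, thumbs up"
shuffled = shuffle_tags(caption, keep=2, rng=random.Random(0))
print(shuffled)
```

Calling it once per training step gives the model the same tags in a different order each time, while the kept prefix always stays at the front where it is weighted most.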

#

6) Same as 5. People generally don't since it can interfere with training. I've done it before and it broke the model I was training.

  8. A good rule of thumb is to fail smarter / fail softer. You should really be starting with a small dataset and getting the hang of it before doing a large one, but I can't stop you. I can say that you can have really good results with under a tenth of that size.

For questions 9 and 2 I don't understand the questions

short python
#

you should be able to learn enough doing that to know how to structure your larger dataset and prompts more generally

gloomy pike
short python
#

i can answer 2), somewhat. CLIP is very weird, it won't behave like you'd uh, expect. it bizarrely seems to "know" things that are surprising, and then it is very stupid about other things. i told it that a bunch of photos of a VW Polo car were photos of "an sks car" and when i prompted "sks car" after training i'd get a VW Polo 100% of the time, but when i prompted "sks" i got a gun and "car" gave me some generic car

gloomy pike
short python
#

you probably want "cammy white wearing a green leotard and a red hat, posed with her back to camera, doing a victory pose with a thumbs up". i haven't done much testing with training humans but i'd expect you'd get training data attaching itself in useful ways to all of those terms.

short python
#

basically fully english sentences work better than tag lists, the training process does (somehow, unreasonably) break up the prompt into chunks that actually make semantic sense when you try and prompt them later

#

don't think about it too much, just describe the picture

#

and then test what you're doing

#

and based on the test, try doing another pass of "don't think about it too much, just describe the picture"

gloomy pike
# short python you probably want "cammy white wearing a green leotard and a red hat, posed with...

I'm worried about having to use the same prompt to train. Does it matter how vague or specific the prompt is? like should i have a more narrow goal or is woman posing for picture good enough? Should I go ahead and get rid of the single word tags or keep them in the training txt files because they help? ive been spending a lot of time correcting the generated prompts but trying to keep them resembling what I generally get.

#

Im changing to a smaller database, im gonna have to get more of a feel for this as I go on, thanks for your help!

short python
#

an insight that might help: when you train SD you're sort of doing img2img generations

#

every step is an img2img. the trainer takes your input image, does a one-step img2img on it with a random strength %, compares the generated image with the input image, and then pulls the model in the direction of your image based on the difference.

#

so anything you can put in a prompt that would make an img2img generation better will help the training, with the caveat that any difference between what that img2img would produce and your image will alter the terms in the prompt you use to do the img2img

#

so if you train "cammy white wearing a green leotard" it will be more easy later to prompt "cammy white wearing a yellow business suit", however at the same time if you prompt "boris johnson wearing a green leotard" the green leotard he's wearing is going to look more like cammy's

#

oh, i'm assuming dreambooth training - if you're just doing a TI then only the TI term itself will get trained, but you should still caption the other details because it will help
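
A toy sketch of the "every step is a one-step img2img" picture above, in plain Python. Everything here is a stand-in: the "image" is three floats, the "model" is just a vector `theta` holding its idea of the image, and the noise strength is drawn uniformly. Real trainers work on latents with a UNet, captions and a noise schedule, so treat this as an illustration of the loop only.

```python
import random


def train_step(theta, image, lr, rng):
    """One toy step: noise the image, let the 'model' guess the noise, nudge theta."""
    s = rng.uniform(0.3, 0.9)  # random noise strength for this step
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    noisy = [(1 - s) * x + s * e for x, e in zip(image, noise)]

    # The toy model's noise guess: what is left after removing its idea of the image.
    pred = [(y - (1 - s) * t) / s for y, t in zip(noisy, theta)]

    # Mean squared error between guessed and true noise, and its gradient w.r.t. theta.
    n = len(image)
    loss = sum((p - e) ** 2 for p, e in zip(pred, noise)) / n
    grad = [2 * (p - e) * (-(1 - s) / s) / n for p, e in zip(pred, noise)]
    new_theta = [t - lr * g for t, g in zip(theta, grad)]
    return new_theta, loss


rng = random.Random(0)
image = [0.2, 0.8, 0.5]  # stand-in for a training image
theta = [0.0, 0.0, 0.0]  # stand-in for the model's weights
for step in range(500):
    theta, loss = train_step(theta, image, lr=0.1, rng=rng)
print([round(t, 3) for t in theta], loss)
```

Each step shrinks the gap between `theta` and the image a little, which is the "pulls the model in the direction of your image" part; the caption conditioning that decides which prompt terms absorb that pull is left out of this toy entirely.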

fast epoch
#

Yo

#

I have made the "Dreambooth" extension to work in the webui on google colab

#

I think I'll share it soon

fallen cloud
#

Really? Haha, that was exactly what i went in here just a minute ago to look if someone also had problems getting to work 😂

fast epoch
#

Yea

#

After many tries

#

and I'm not even a coder :)))

#

But I used my logical thinking

fallen cloud
#

Haha, know the feeling. Was doing the same thing all night last night. Went to bed at 5AM reluctantly, but had to get to work at 7 so had to accept being defeated by that task. Just sat down again getting into it ^_^

fast epoch
fallen cloud
#

Nice!

fast epoch
#

Editing images with only one prompt

fallen cloud
#

I have a new model i need to train, then i was planning to check that one out as well 🙂

fast epoch
#

The proof

#

I even tried it with some cat training

#

It works

#

but I canceled, because the training was taking 30-40 mins

fallen cloud
#

im not getting lastben's fast-dreambooth to work, so i was hoping i could get DB to run in the webui instead and hopefully it will accept all the captions for my model.

#

Sweet! 😁

fast epoch
#

Found a way to make some extensions to work on google colab webui

fallen cloud
#

Was it hard to solve?

fast epoch
#

like "prompt generator", "model converter" etc.

#

Idk. It took some time

#

knowing that I'm not a coder 🙂

fallen cloud
#

Haha, me neither 😂 but with logic and google on your side anything is solvable

#

Im gonna try fast-db once again. I have chores to do, and i was hoping to have a model in training meanwhile 😂 #has put everything on hold at home while trying to be able to train a model first, for the last two days 😂

fast epoch
#

From what I have tried, the extension is better

#

I mean it really showed images like the one from the training set

#

And I didn't even change the settings

#

I used the default ones

fallen cloud
#

Yeah, also heard that. But noticed yesterday that the extension wasnt working anymore in google colab 😅

fast epoch
#

Done

#

Made that extension to work too

#

the one with editing

#

:))

fallen cloud
#

Great work!

fast epoch
#

Thanks!

fallen cloud
#

Feels good to know that i wasnt the only one with the problem, and also that it actually IS solvable 😁 👍

gloomy pike
fast epoch
#

Hypernetworks are the least precise and the worst

#

Use textual inversion, dreambooth or lora training

fallen cloud
#

I've been working mostly with human models. Now im trying to get deeper, more into also catching body patterns, facial expressions, postures, body sizes, every trait that resembles that specific person. Glad i have a girlfriend that bears with me on this 😂

#

Just been using dreambooth though. Tried some hypernetwork, but didnt get a good result at all. And haven't even tried textual inversion yet. Just read about Lora today though, so was thinking of trying to figure that one out.

fast epoch
#

From what I heard, dreambooth is the best atm

#

followed by textual inversion/lora

#

Promptgen working

fallen cloud
#

Nice! 😁

#

Freaking hell.. why wont fast-db's external captions work 🤬

winter apex
#

like 1 hour ago

#

i had to write captions manually

fallen cloud
#

yeah, i had done that also. but in the files 😅 2967 images, and i had edited them all by hand. Reeeeally dont feel like doing it in the browser also..

#

#grumpy

#

Well.. dont have time to fix that issue now. Just have to run the training without my captions and have a test-run.

#

there.. now its at least running. Training time 7 h, 48 min.. without the captions.. 😅 wish me luck!

winter apex
#

yeah, impossible to write manually but if i recall correctly in the gdrive theres a folder inside fast-dreambooth that says "captions"

#

also a zip file

fallen cloud
#

Hmm... damnit. now i have to try once more

#

perhaps i can just manually place the captions in the session folder, and edit the captions.zip manually, and it could work 🤔

winter apex
#

it should be the same, better than training without captions

fallen cloud
#

It looked like dreambooth was confusing the txt files with the png files. But this way they really get separated. Might work!
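
When placing caption files by hand like this, a quick check that every image has a same-named .txt (and vice versa) can save a wasted run. A small sketch, assuming images and captions live in two separate folders and are matched by file name; the folder names below are placeholders, not the colab's guaranteed layout.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def find_unpaired(image_dir, caption_dir):
    """Return (images without a caption, captions without an image), matched by file stem."""
    images = {p.stem for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_EXTS}
    captions = {p.stem for p in Path(caption_dir).iterdir() if p.suffix.lower() == ".txt"}
    return sorted(images - captions), sorted(captions - images)


# Demo on a throwaway folder; point the two paths at your own session instead.
with tempfile.TemporaryDirectory() as root:
    img_dir = Path(root, "instance_images")  # hypothetical folder name
    cap_dir = Path(root, "captions")         # hypothetical folder name
    img_dir.mkdir()
    cap_dir.mkdir()
    for name in ("a.png", "b.png"):
        (img_dir / name).touch()
    (cap_dir / "a.txt").write_text("a photo of sks person")
    missing, orphans = find_unpaired(img_dir, cap_dir)

print("images without captions:", missing)
print("captions without images:", orphans)
```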

#

Done, lets see if DB gets along with it!

#

Wooh!

#

I think it actually worked!

winter apex
#

well see in 8 hours then

fallen cloud
#

well.. right now, with the captions it says 45 h and 48 min 😂

#

Guess i have time to do the dishes now atleast

gloomy pike
#

I keep getting this error when I try starting up the learning

storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
RuntimeError: PytorchStreamReader failed locating file data/0: file not found

#

i remade the pt file

#

it solved it!

flat depot
#

question, how do i change the res of generated classification images in the DB extension? im trying to make a model using 2.1

brazen osprey
#

is dreambooth really the best? it seems subjective. alot of people are saying everydream v2 is better for fine tuning. i have only done a few hypernetworks, and a couple of everydream models. so havent yet tried dreambooth.

fallen cloud
#

hmm.. just heard about everydream today actually, but from what i saw now it seems to be worth a try later on though! Especially when working with larger image sets 🤔

brazen osprey
#

yeah, my last attempt using was with about 300 images and was "ok".. so trying to get a dataset classified now with about 30 - 40k.

#

i had some really poor results yesterday with a hypernetwork and the character "admiral ackbar" from star wars.. so might try again with dreambooth.

fallen cloud
#

Oooh.. nice! That would be interesting to see the result from when you've tried 🙂

#

Damn.. 5.13 AM today again 😭 really need to turn my day rhythm back to normal again. Good night!

brazen osprey
#

pretty bad - but will give dreambooth a shot later today instead..

short python
brazen osprey
#

That was 11 images. Limited resources. I’ve had good results with hyper networks before actually. But the reference I was allowed to use was quite poor.

#

Shouldn’t just assume my captions are rubbish

#

The learning rate was .00005

#

But I’ve used hyper networks enough to see when it’s not learning the way i want it to. Hence I thought I’d give dreambooth a try.

#

The top image above was def over training - after 30 steps. The bottom was after 10k

#

So Obv I was testing dif amounts of training also

#

I can’t show the reference I was using because of an NDA - but I figured there wouldn’t be an issue with these results given how off they were

short python
# brazen osprey The learning rate was .00005

idk about hypernetworks or TI but that’s a very high learning rate for everydream2, at least 10-15 times higher than what’s recommended. does your trainer have a proper validation split? if it doesn’t you’re going to have a hard time finding the correct LR
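
For anyone unsure what a validation split means here: hold some images out of training and watch the loss on those while trying different learning rates. A minimal sketch, assuming a plain list of image files; the 20% fraction and the `split_dataset` name are illustrative, and whether a given trainer can consume such a split is not something this snippet covers.

```python
import random


def split_dataset(items, val_fraction=0.1, seed=0):
    """Shuffle once with a fixed seed, then hold out a fraction for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_fraction))  # always keep at least one held-out item
    return items[n_val:], items[:n_val]


# Hypothetical file names for an 11-image dataset.
files = [f"img_{i:03d}.png" for i in range(11)]
train, val = split_dataset(files, val_fraction=0.2)
print(len(train), "training images,", len(val), "held out")
```

With a dataset this small the held-out set is tiny, so the validation loss will be noisy; it still gives a signal that the training loss alone cannot.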

brazen osprey
#

It isn’t every dream 2

short python
#

can you post some sample captions? not assuming they’re bad they just might not be the best for what you’re trying to train

brazen osprey
#

.00005 learning rate is fine, then when it starts to overtrain you lower the rate and continue, that’s standard practice
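
The "start at one rate, drop it later" approach can be written as a stepped schedule. A sketch only: the 5e-5 comes from the message above, while the switch point at step 2000 and the lower 5e-6 rate are made-up values, and `lr_at` is a hypothetical helper rather than any trainer's API.

```python
def lr_at(step, schedule):
    """Look up the learning rate for a step.

    schedule is a list of (rate, last_step) pairs in increasing step order;
    last_step=None means "use this rate until training ends".
    """
    for rate, last_step in schedule:
        if last_step is None or step <= last_step:
            return rate
    return schedule[-1][0]  # past the final boundary: keep the last rate


# 5e-5 up to step 2000, then a lower rate for the rest of the run.
schedule = [(5e-5, 2000), (5e-6, None)]
for step in (1, 2000, 2001, 10000):
    print(step, lr_at(step, schedule))
```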

#

The main issue is the reference. They are of the actor from rotj in his suit from when they were making the film

short python
#

you’re trying to train his face? are your captions describing the background and what he’s doing?

brazen osprey
#

If it was just his face that wouldn’t be a problem. Need the full body.

#

And then the goal is to put him in the stance with his arms and hips also from the film

#

I don’t need the background so I’ve left it pretty vague

short python
#

that’s a mistake

brazen osprey
#

Also, for the record he is captured on a white background

short python
#

SD needs you to tell it the background so it knows what to ignore

brazen osprey
#

When they were doing the costume tests for the film

#

I’ve said on a white background

short python
#

ok i can’t help without sample captions and you seem petulant/resistant to my advice and experience, goodbye

brazen osprey
#

Lol. Im not at my computer so no I can’t provide a caption. No need to be so irritable on a forum.

#

I also clearly described what the background was, given it was white from the photos they took off set.

short python
#

yet you are getting bad results. ergo something must be wrong. you don’t know what, i suggest some things that might be a problem, such as too high learning rate. you shrug it off as “nah that’s what everyone uses”. idk what to say mate

brazen osprey
#

Dude, I said it was the reference. And said I was not going to use hypernetworks next time. Maybe you didn’t read the context above the images. You also said you don’t know hypernetworks so maybe you don’t really know?

short python
#

just try a lower LR and see what happens.

brazen osprey
#

Did you not read the part where I said I lower it once it looks like it’s overtraining? When I am giving dreambooth a go I will be using a totally different learning rate

short python
#

try lowering it from the beginning

#

once it looks like it’s overtraining it is already overtrained

#

that might be your problem

brazen osprey
#

I save out the hypernetwork every 100 iterations. So I go back to before it was overtraining

#

But this is irrelevant - since I won’t be using hypernetworks on it

#

Next time

short python
#

yes but if you start with a too high LR the network has already baked bad data into itself

brazen osprey
#

Ok. Well if i go back to hypernetworks next time - if dreambooth doesn’t work well - then I will start with a lower rate

#

But I’m fairly sure the results will be better with dreambooth

short python
#

sure, if you pick the right LR and have a good captioning strategy and don’t overtrain 🙈

brazen osprey
#

Yes, with those things in mind..

short python
#

these systems are all the same under the hood

#

different degrees of flexibility, that’s all

brazen osprey
#

Yeah, well above (before the images) I was merely saying I was looking forward to trying it out in dreambooth for comparison. I’ve had good results w hypernetworks in the past, and good results with every dream