#🔧｜finetune | Stable Diffusion | Page 21

tropic iron Mar 4, 2024, 7:49 PM

#

And then when I use my lora, instead of using the old checkpoint and the lora I instead use the sdxl base and the lora

#

Because the lora should in theory take the place of the checkpoint?

dusky urchin Mar 4, 2024, 7:53 PM

#

do a new fine tuning on SDXL.

#

do whatever you have RAM and data for. you can start by trying a LoRA fine tuning on SDXL with the same data you used for your other fine tunings.

tropic iron Mar 4, 2024, 7:54 PM

#

Rockin

tropic iron Mar 4, 2024, 8:11 PM

#

What happens if I use a LoRA trained with SDXL alongside a checkpoint based in 1.5?

dusky urchin Mar 4, 2024, 8:19 PM

#

tropic iron What happens if I use a LoRA trained with SDXL alongside a checkpoint based in 1...

they are not cross compatible

#

you can always turn 1.5 latents into pixels, then into XL latents

#

and run img2img

#

there isn't really a point in that though. simpler workflows, in my experience, have always been better, whereas improving prompts yields better results

stiff dust Mar 4, 2024, 8:44 PM

#

yes that works. But I would do that only if you don't have proper training data for sdxl.

tropic iron Mar 4, 2024, 10:22 PM

#

stiff dust yes that works. But I would do that only if you don't have proper training data ...

It isn't that I don't have proper training data

#

It's that the only checkpoint (for furry art) I've found which produces anything other than crap has 1.5 as a base model

tropic iron Mar 5, 2024, 2:34 AM

#

Hey finetuning people

#

I'm back with 1000 questions

#

I'm trying to understand the machinery. That's how I've always done best. So okay...

#

Y'all have made it pretty clear that if I'm training on a checkpoint based on SD 1.5, which has a native resolution of 512x512, then all my images should have that resolution. I can see in my mind's eye (I think) why - the base library has x output neurons, where x is 512x512

#

Assuming that's correct - what happens when I ask a 1.5 based model to give me something with a different resolution? Does it distort? How does "native resolution" become "arbitrary resolution"?

steady seal Mar 5, 2024, 3:46 AM

#

Hi everyone. I am looking for someone who can train a LoRA model to convert a selfie into an AI grillz image. Please DM me if you are available

dusky urchin Mar 5, 2024, 6:01 AM

#

tropic iron Assuming that's correct - what happens when I ask a 1.5 based model to give me s...

i think the single most important thing to realize is that you don't need high resolution generations; that you can get very far with a square aspect ratio; and if a square aspect ratio isn't enough, then a 16:9 ratio is good, and then maybe a 9:16 one, and that's it

neat fox Mar 5, 2024, 6:43 AM

#

tropic iron Assuming that's correct - what happens when I ask a 1.5 based model to give me s...

Distortion yeah

#

Ask for much beyond the standard resolutions and aspect ratios and you get mutated limbs and duplicated objects

#

Generally 768x512 is fine, 768x768 is usually just over the line

lime ivy Mar 5, 2024, 7:11 AM

#

Hi everyone. I finetuned SD1.5 on a dataset with mostly half-body images. The clothes trained well, but the faces are kinda distorted. Can I improve this by further training on a face-only or close-up dataset (like a regularization)? Or is there any other way to improve the face quality?

worn imp Mar 5, 2024, 10:13 AM

#

Hi, I'm trynna train a model on art from Yabujin. In total I wanna do training on drain gang too, stuff like my profile picture basically, but I wanted to take things slowly first and see how it goes. But I'm already having problems obviously, so if anybody could help me out to get some good results, any help would be appreciated. :)

#

this is my code

#

https://files.catbox.moe/yzp3oq.zip

#

and this is the current training data

#

I used to have a way larger one, like 200 images basically, but I didn't caption them all so well. I thought the ai would figure out the style itself, without much captioning from me, due to the sheer size Lol

simple wave Mar 5, 2024, 12:18 PM

#

Hi all!

I am learning to finetune a model on dreambooth. I just need a small dataset for getting through basics. If anyone has any resource where I can find small image datasets of the same object, pls share them

Ex: 25 images of the same animal/thing/place

jade hornet Mar 5, 2024, 12:53 PM

#

why would you be doing finetuning without knowing what you are finetuning? figure out what you want to do, then solve the problem

worn imp Mar 5, 2024, 12:55 PM

#

jade hornet why would you be doing finetuning without knowing what you are finetuning? figur...

you are not talking to me right

jade hornet Mar 5, 2024, 12:56 PM

#

no

worn imp Mar 5, 2024, 12:58 PM

#

but can you help me

#

opa ╱|、
(˚ˎ 。7
|、˜〵
じしˍ,)ノ

stiff dust Mar 5, 2024, 1:17 PM

#

tropic iron Y'all have made pretty clear that if I'm using training upon a checkpoint based ...

it doesn't work that way.
In general, SD supports any resolution. But the way objects are composed and arranged relative to each other (probably by the convolution layers) is learned at a specific resolution. The model knows how to place a face in 512x512. If you give it 1024x1024 it will get confused and create multiple faces. That's why you should rather stick to the resolution it was trained for.
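
A rough way to see why resolution is not a fixed count of "output neurons": the model denoises a latent grid rather than pixels, and the convolutional layers accept whatever grid size they are handed. The sketch below is only illustrative arithmetic. It assumes the usual SD 1.5 VAE layout (8x downsampling per side, 4 latent channels), and `latent_shape` is a made-up helper name, not part of any library.

```python
def latent_shape(width: int, height: int, factor: int = 8, channels: int = 4) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a requested image size.

    Assumption: the VAE downsamples each side by `factor` and uses `channels`
    latent channels, as in SD 1.5.
    """
    if width % factor or height % factor:
        raise ValueError(f"width and height should be multiples of {factor}")
    return (channels, height // factor, width // factor)


# The same network runs on each of these grids; only the composition it
# learned at the training resolution stops matching as the grid grows.
for size in [(512, 512), (768, 512), (1024, 1024)]:
    print(size, "->", latent_shape(*size))
```

So a 1024x1024 request simply gives the network four times as many latent positions as it saw during training, which fits the "multiple faces" behaviour described above.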

jade hornet Mar 5, 2024, 2:48 PM

#

there are ways to be clever and maybe outpaint the image, combined with controlnet or other tools to achieve the original intended result...or just use XL which was intended for higher resolutions

craggy sierra Mar 5, 2024, 7:31 PM

#

Is there a discord community for training loras and such?

jade hornet Mar 5, 2024, 8:16 PM

#

I've found a couple others, but frankly they're pretty low volume

#

the r/StableDiffusion subreddit has a discord, for example

craggy sierra Mar 7, 2024, 5:21 AM

#

My dataset has images like 1.jpg, 1.txt, 2.jpg, 2.txt, and so on. The .txt files contain tags for each image. Do I need to set this setting to .txt, or are .caption files something else?

ruby pond Mar 7, 2024, 7:10 AM

#

craggy sierra My dataset has images like 1.jpg, 1.txt, 2.jpg, 2.txt, and so on. The .txt files...

put .txt in that box
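
For a layout like `1.jpg` / `1.txt`, a quick sanity check before training is to list images that have no caption file with the chosen extension. This is a minimal sketch, not part of any trainer: `images_missing_captions` and the extension list are hypothetical, and it only looks at one flat folder.

```python
from pathlib import Path

# Assumption: these are the image types in the dataset folder.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def images_missing_captions(dataset_dir: str, caption_extension: str = ".txt") -> list[str]:
    """Return names of images that have no sibling caption file."""
    missing = []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.suffix.lower() in IMAGE_EXTENSIONS:
            # 1.jpg is expected to sit next to 1.txt (or 1.caption, etc.)
            if not path.with_suffix(caption_extension).exists():
                missing.append(path.name)
    return missing
```

Calling `images_missing_captions("path/to/dataset", ".txt")` should return an empty list when every image is paired. If the setting were left at `.caption` for this dataset, every image would show up as missing.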

craggy sierra Mar 7, 2024, 7:10 AM

#

ruby pond put .txt in that box

alright, thanks!

viral jackal Mar 7, 2024, 2:22 PM

#

having a lot of trouble getting cascade to train

stiff dust Mar 7, 2024, 2:58 PM

#

me, too. My feeling is that Cascade is really bad for training.

hollow spruce Mar 7, 2024, 10:07 PM

#

same :/
like it works with high effort. but it doesn't ever work perfectly. Some things are easy to train, but most are hard.
really depends on what you're aiming for. Just dont ever aim for face/person loras XD

somber thorn Mar 8, 2024, 6:48 AM

#

We created an index for the datacomp-12.8M dataset using Fondant and published it on the huggingface hub. You can find more details and info on how to use it in this short post.
You could use the dataset to fine-tune your own controlnet models.

errant scarab Mar 8, 2024, 8:31 AM

#

Hello Everyone,
I need some help getting started with training my own Dreambooth or Lora.
I have a good local system with 24GB GPU Vram and know how to use ComfyUI (& automatic).

I want to train on a very old comic book style and I have a couple of those comic books with me.
This is what I found on the resources section on discord https://huggingface.co/docs/diffusers/training/lora
Is there a video you'd recommend I follow to get started on this journey?
It would help me a lot. Much appreciated.

neat fox Mar 8, 2024, 8:44 AM

#

errant scarab Hello Everyone, I need some help getting started with training my own Dreambooth...

I've had good luck with OneTrainer. It's pretty easy to use and a little lighter on vram than kohya. Also faster

quick anchor Mar 8, 2024, 10:12 AM

#

Hello -- hope this is the right place for this question: I am attempting to run a py script that is calling on tokenizer/tokenizer_config from https://huggingface.co/base_model/resolve/main/tokenizer/tokenizer_config.json but it is returning a 404 error, and checking the link leads to a "Repository not found" page. Is this likely a temporary outage with hf, or did I do something wrong / overlook something else? Log snippet, if helpful:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/base_model/resolve/main/tokenizer/tokenizer_config.json
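
One way to read that URL, as a sketch: huggingface.co file links follow the pattern `<repo id>/resolve/<revision>/<file path>`, so the failing request can be split into its parts to see which repository the script actually asked for. `split_resolve_url` is a hypothetical helper written for this example. The script itself is not shown, so treating `base_model` as an unfilled placeholder is an assumption.

```python
from urllib.parse import urlparse


def split_resolve_url(url: str) -> dict[str, str]:
    """Split a huggingface.co '<repo>/resolve/<revision>/<file>' URL into its parts."""
    path = urlparse(url).path.strip("/")
    repo_id, _, rest = path.partition("/resolve/")
    revision, _, filename = rest.partition("/")
    return {"repo_id": repo_id, "revision": revision, "filename": filename}


url = "https://huggingface.co/base_model/resolve/main/tokenizer/tokenizer_config.json"
print(split_resolve_url(url))
```

The repo id here comes out as the literal string `base_model`. That looks more like a variable or config value that never received a real model name than like an outage, so checking where the script sets its base model would be a reasonable first step.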

errant scarab Mar 8, 2024, 5:56 PM

#

neat fox I've had good luck with OneTrainer. It's pretty easy to use and a little lighter...

thanks for the heads up 👍🏼

neat fox Mar 8, 2024, 5:56 PM

#

errant scarab thanks for the heads up 👍🏼

if you need help getting started lemme know

#

i'm far from an expert but i've gotten a few pretty decent ones in my limited experience

dusky urchin Mar 8, 2024, 6:55 PM

#

errant scarab Hello Everyone, I need some help getting started with training my own Dreambooth...

do you have a shot from the comic book that you are talking about?

thick bear Mar 9, 2024, 2:57 PM

#

If anyone can share their lora settings for onetrainer I'd love to see them, especially for training a style. The UI is so much nicer than kohya's mess of nested tabs but there are almost no resources for it

#

I tried to replicate prodigy settings I've seen around for kohya but it didn't do too well in OT

neat fox Mar 9, 2024, 4:35 PM

#

thick bear I tried to replicate prodigy settings I've seen around for kohya but it didn't d...

i haven't done a style, but i could share ones that have worked reasonably well for a character

thick bear Mar 9, 2024, 9:53 PM

#

That'd help, I think the biggest difference in character vs style is in captioning anyway. I've actually had okish results just using the XL default settings in OT, but I don't think they're entirely appropriate for pony models which is my problem

hollow spruce Mar 10, 2024, 3:18 AM

#

thick bear If anyone can share their lora settings for onetrainer I'd love to see, especial...

how much vram you got? if you're at 16 or 24 I can help

#

For everyone who has captioning issues:
https://github.com/jhc13/taggui
added moondream1 as a model for auto captioning.
Definitely not the best model, but it runs on a toaster

so as long as you have 6gb vram or more, you can do (relatively good) auto tagging
(if you're at 14 or more... use cogvlm!)

dusky urchin Mar 10, 2024, 3:40 AM

#

hollow spruce For everyone who has captioning issues: https://github.com/jhc13/taggui added mo...

did you author this?

hollow spruce Mar 10, 2024, 3:41 AM

#

dusky urchin did you author this?

nop. just use it a lot.

thick bear Mar 10, 2024, 3:55 AM

#

hollow spruce how much vram you got? if you're at 16 or 24 I can help

24 here

hollow spruce Mar 10, 2024, 3:58 AM

#

thick bear 24 here

[[subsets]]
num_repeats = 1
caption_extension = ".txt"
shuffle_caption = false
flip_aug = false
is_reg = false
image_dir = "A:/Datasets/npcp/source"
keep_tokens = 0

[noise_args]

[sample_args]

[logging_args]

[general_args.args]
pretrained_model_name_or_path = "B:/SD_models/checkpoints/sdxl/sd_xl_base_1.0_0.9vae.safetensors"
mixed_precision = "bf16"
seed = 23
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
max_token_length = 225
prior_loss_weight = 1.0
sdxl = true
xformers = true
cache_latents = true
cache_latents_to_disk = true
no_half_vae = true
gradient_checkpointing = true
max_train_epochs = 60

[general_args.dataset_args]
resolution = 1024
batch_size = 7

[network_args.args]
network_dim = 64
network_alpha = 1.0
min_timestep = 0
max_timestep = 1000

[optimizer_args.args]
optimizer_type = "AdamW"
lr_scheduler = "constant_with_warmup"
learning_rate = 0.0001
max_grad_norm = 1.0
text_encoder_lr = 5e-5
warmup_ratio = 0.05
min_snr_gamma = 5

[saving_args.args]
output_dir = "A:/Datasets/npcp/output"
save_precision = "bf16"
save_model_as = "safetensors"
output_name = "npcportrait_v2"
save_every_n_epochs = 5
save_last_n_epochs_state = 1
save_state = true
save_toml = true

[bucket_args.dataset_args]
enable_bucket = false
min_bucket_reso = 512
max_bucket_reso = 2048
bucket_reso_steps = 64

[optimizer_args.args.optimizer_args]
weight_decay = "0.1"
betas = "0.9,0.99"

#

my settings for derrian distro. but they translate 1:1 into onetrainer
Was my settings for -> https://civitai.com/models/336145/npc-portrait-xl-for-basedreamshaperlightning
which was also a pretty high effort style lora. so I can vouch that these settings work for 3090/4090 users

NPC/DND Portrait XL (For Base/Dreamshaper/Lightning) - v1.0 | Stabl...

This is a pen & paper npc character portrait generator lora. Its mainly optimized for DND, Pathfinder, and equivalent fantasy games, but it can...

#

I had 1.2k images for my dataset. (but the settings barely/dont change unless you have a sub 100 dataset)
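
To get a feel for what those settings mean in optimizer steps, here is a rough calculation using values from the config above (`batch_size = 7`, `max_train_epochs = 60`, `num_repeats = 1`, `warmup_ratio = 0.05`). It assumes "1.2k" means about 1200 images, no gradient accumulation, and that a partial last batch still counts as a step. `training_steps` is a hypothetical helper, not a trainer function.

```python
import math


def training_steps(num_images: int, batch_size: int, epochs: int, num_repeats: int = 1) -> tuple[int, int]:
    """Return (steps_per_epoch, total_steps), counting a partial final batch as a step."""
    steps_per_epoch = math.ceil(num_images * num_repeats / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total = training_steps(num_images=1200, batch_size=7, epochs=60)
warmup = round(total * 0.05)  # warmup_ratio = 0.05 from the config
print(steps_per_epoch, total, warmup)
```

Under those assumptions the run is roughly ten thousand steps, with checkpoints every 5 epochs (`save_every_n_epochs = 5`) giving a dozen candidates to compare.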

#

npcp, simple background, round ears, human, caucasian, girl, a human with a contemplative expression draped in a blue scarf gazing into the distance. <- had captions like this, to reinforce the style (basically do a trigger word, then tag everything that is happening in the image, and nothing about the style)

thick bear Mar 10, 2024, 4:06 AM

#

Thanks I'll test it out on the next run, probably be more like a 50-100 dataset though so I'll fiddle

#

For style captions in kohya I had pretty good results having only "by artistname" as the caption, but I'll try adding some non style caps like you did as well

hollow spruce Mar 10, 2024, 4:08 AM

#

thick bear Thanks I'll test it out on the next run, probably be more like a 50-100 dataset ...

100/400/3k are the big "breakpoints" in dataset size where quality significantly changes, and where you can do captioning things that dont work on smaller sets. keep those numbers in mind

thick bear Mar 10, 2024, 4:08 AM

#

Llava 1.6 is pretty amazing for describing images

#

Looks like taggui doesn't have 1.6 support yet but I was really impressed testing it out in comfy, I'll def try tagging a dataset with that soon

dense abyss Mar 10, 2024, 6:19 AM

#

ha

polar smelt Mar 10, 2024, 11:12 AM

#

Hi guys, I want to use the openai clip model and a classification model to format a simple image description (for example a user who wants to create an image) into an sd-prompt.

My thought was to use clip to capture the text features and then train the classification model on the features and labels (which would be the sd-prompts).
Could anybody with more experience tell me if this is a viable way to achieve my goal?

foggy inlet Mar 10, 2024, 12:10 PM

#

@polar smelt just use CogVLM or ShareGPT4V, or check out some other models available at vision arena (https://huggingface.co/spaces/WildVision/vision-arena) - some of them are pretty good, and you'll probably get better results if you just use one (or more) of those and figure out best prompts for them to get what you want, instead of trying to create new model from scratch

last panther Mar 10, 2024, 1:34 PM

#

Morning folks. Does anyone pls know any good tutorial on how to set the right parameters for Dreambooth SDXL on Diffusers "train_dreambooth_lora_sdxl.py"? My training is working but the results are not great 🙂

sacred grail Mar 11, 2024, 12:14 AM

#

does anyone know how I can do Aesthetic score finetuning?

slim gyro Mar 11, 2024, 1:00 AM

#

sacred grail does anyone know how I can do Aesthetic score finetuning?

Set time machine to 1992 and refuel the Delorean with bananas.

hollow spruce Mar 11, 2024, 4:26 AM

#

probably some dumb mistake like using .caption instead of .txt for captions... or some pretty basic setting that always needs to be enabled... but wasnt

#

any reason you want to use dreambooth specifically? rather than lora?
if you're determined to use dreambooth original or kohya implementation, you're in for a bit of troubleshooting ^^'

#

ah yeah. then it checks out.
there's a million things that can go wrong if you're doing finetuning.
best start with an existing preset, to test your dataset if its working at all. then adjust from there
(Onetrainer has a basic finetune preset for each major sd version, to get you started)

#

if you dont wanna switch trainers, then just take inspiration from the preset, and redo it in your trainer

#

oh. ooooohhhh.
if you're dealing with datasets under 10k images, then best stick with LoRA

if anything, then you should just take a simple lora preset that people recommend, and spend 80% of your time on making your captions better and better

raw dirge Mar 11, 2024, 4:50 AM

#

a lora would work better for u

#

not enough images for you to train a whole helicopter checkpoint

#

with people is easier because the model already knows how ppl look

#

but for machinery you would probably need 10k to 50k

#

since it already knows the basics of helicopters a lora would be better,u could get like 50 imgs of each heli model and train a lora for each one of them

#

id say give lora a try,it will save u time and energy if u dont like it u can always train the checkpoint with lots of imgs

#

for 1.5 i used to do it on kohya but idk whats changed its been a while

silver dawn Mar 11, 2024, 9:32 AM

#

https://www.youtube.com/watch?v=gt_E-ye2irQ. Haven't tried it yet but I thought I'd help a bit

YouTube

AIFuzz

Lora Training using only ComfyUI!!

We show you how to train Loras exclusively in ComfyUI

Github
https://github.com/LarryJane491/Lora-Training-in-Comfy

minor shale Mar 12, 2024, 12:02 PM

#

Got an error when training a lora

📎 Error.txt

jade hornet Mar 12, 2024, 12:25 PM

#

that output looks normal, some of it was missing though

minor shale Mar 12, 2024, 12:33 PM

#

wdym

minor shale Mar 12, 2024, 12:34 PM

#

jade hornet that output looks normal, some of it was missing though

at the end it says returned non-zero exit status 1.

jade hornet Mar 12, 2024, 2:57 PM

#

Nevermind, I see it now. It threw an attribute error, which means one of your package versions is wrong. I'd compare your installed packages vs the ones it needs

#

I can't tell from that which one

minor shale Mar 12, 2024, 2:58 PM

#

amazing...

#

what command do i run to check

jade hornet Mar 12, 2024, 3:07 PM

#

I'm a Linux guy, try pip list? You might have to go to the tech support channel. The requirements text files have the versions you need

minor shale Mar 12, 2024, 3:12 PM

#

jade hornet I'm a Linux guy, try pip list? You might have to go to the tech support channel....

thanks

#

im on macos but it's pretty similar to linux

#

based on unix kernel

jade hornet Mar 12, 2024, 3:16 PM

#

There's a pip command to force reinstall and you can point it at those version files, it'd be easier... Something like pip install --force-reinstall -r file.txt... Google it though, that was from memory

tame vortex Mar 12, 2024, 3:24 PM

#

(and you'd need to use the venv's pip, not your system's one, assuming whatever you're using has a venv)
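
Before force-reinstalling everything, it can help to see which pins actually differ. The sketch below compares `name==version` lines from a requirements file against installed versions. Both function names are hypothetical, only exact `==` pins are checked, and package-name normalization (hyphen vs underscore) is deliberately ignored. Run it with the venv's Python so it sees the same packages the trainer does.

```python
from importlib.metadata import distributions


def installed_versions() -> dict[str, str]:
    """Map lowercase package name -> installed version for the current interpreter."""
    return {
        dist.metadata["Name"].lower(): dist.version
        for dist in distributions()
        if dist.metadata["Name"]
    }


def find_version_mismatches(requirements_text: str, installed: dict[str, str]) -> list[str]:
    """Compare 'name==version' pins against installed versions.

    Other specifiers (>=, ~=, URLs, option lines starting with '-') are skipped.
    """
    problems = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line or line.startswith("-"):
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        wanted = wanted.split(";", 1)[0].strip()  # drop environment markers
        have = installed.get(name.lower())
        if have is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif have != wanted:
            problems.append(f"{name}: installed {have}, want {wanted}")
    return problems
```

A typical call would be `find_version_mismatches(open("requirements.txt").read(), installed_versions())`, where `requirements.txt` stands in for whichever version file the trainer ships.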

dusky urchin Mar 12, 2024, 4:16 PM

#

i think you've been at it for a while. didn't we discuss that you can already achieve this pretty straightforwardly?

#

you don't have to fine tune at all

#

for stuff like this: it isn't meaningful. think about what is visible, not what it's called

#

if it can't be seen it's not going to fine tune either

#

if the differences are subtle it's not going to show up in fine tuning unless it is similar to something that has already existed in sdxl.

hoary ember Mar 12, 2024, 9:56 PM

#

I have about 2.5 million adult photos (mostly 1280x720 resolution) that I scraped from a large pic gallery site. All of these images include metadata such as a one-line text description of the scene and categories/tags. What would be the best software for making a fine-tune with this scale of dataset?

#

(I would ideally like to train something based on SDXL or a similarly fast model, because part of what I want to do is video generation, but I'm open to alternatives)

dusky urchin Mar 12, 2024, 10:49 PM

#

hoary ember (I would ideally like to train something based on SDXL or a similarly fast model...

SVD takes an input image that can be created from any model, so it isn't specific to sdxl

hoary ember Mar 12, 2024, 11:22 PM

#

Do I need video samples for finetuning, or can it work with just images? I could also easily scrape a video site, but the dataset size would be massively larger (my image set takes up nearly 500GB). I could always compress/convert the video to lower-res, but wonder how this would affect training/output quality

#

Also, does SVD let me use large image sets (millions) or does it just take a single image?

hollow spruce Mar 13, 2024, 9:56 AM

#

hoary ember I have about 2.5 million adult photos (mostly 1280x720 resolution) that I scrap...

A.) invest hundreds of hours, to learn the skills to take on a project of this magnitude
B.) invest thousands of dollars, to pay someone/or a group/ who already has the skills, to make a finetune of this level for you, using this dataset

what you're asking is the equivalent to "I got a hold of lots of spare car pieces and raw metal. How do I use this to tune my car to look better and go faster. Preferably I want to use less gas as well, since I want to compete with supercars"

its not that there's an issue with the ambition, but you're best off starting small, and learning how to make a small lora based on 400 images. then on 3k images. then on 10k images. then take what you've learned, and start from new again, since genuine full finetuning is even more destructive/hard to do right, and work on making 10k finetunes until you get good enough, to slowly scale up.

Expect your final finetune on 2.5 million images to cost several thousand dollars and to need dedicated rented cloud hardware (since batch size matters, meaning you'll need an A100 cluster to do it in a reasonable amount of time, or else you'll wait 3 months for a single A100 to get through it), and you only get one chance unless you're willing to invest that kind of money again and again

hoary ember Mar 13, 2024, 6:24 PM

#

hollow spruce A.) invest hundreds of hours, to learn the skills to take on a project of this m...

I understand that this is a big project, and that I will need to learn skills, and that this will take time. I'm fine with that.

I'm just trying to determine which skills I need to learn. I am trying to determine specifically which tools would be used for this type of project, so that I can learn how to use them. I am planning on starting out small as you said, and working towards the larger goal (I wasn't planning on just immediately trying it with 2.5m images without trying small batches first lol)

Also, as far as hardware, is there really no way that this could be done on a local machine (AMD 7970X CPU, 512 GB DDR5, and 2 x RTX 4090)? Given this hardware, what scale of training would be possible / in what kind of time period?

hollow spruce Mar 13, 2024, 6:46 PM

#

hoary ember I understand that this is a big project, and that I will need to learn skills, a...

well you're in a bit of trouble.
hardware wise you're good to go including standard full finetuning (unet only). so thats about 80% of the way.

But your main issue will be:
A.) Guides <- lots of guides say different things, while saying "this is the best way". 95% of them, are in fact, geared towards ultra small datasets. Things change significantly once you hit the 400 & 3k dataset size mark.
B.) Captioning <- while there are tools that help automate this, you also gain their bias. (Example: cogvlm always saying "serene", or LLava often mixing up arm locations, which results in wrong arm anatomy if you train for too long on many images)
C.) Dataset management. Anything beyond 1k images needs a dedicated tool for dataset management. Currently there exists no truly good free or even paid dataset management software. You'll definitely find many if you google, but you'll also despair once you get more and more images...
basically, there exists "Dataset architecture" which is genuinely complicated. And this becomes unavoidable once you hit 100k

and you might think... well do I really need that? I can just autocaption everything and accept the bias from x or y tool.
Which would then result in your training not actually improving in quality beyond a certain level. Meaning you'd benefit more from a trimmed dataset that is well managed. <- downloading huge datasets of millions of images is easy... but this is the core reason why no one actually trains that large. its for the simple reason that improperly managed datasets only cost more to train, but dont infinitely improve quality nor adaptability

#

if you just want to start training, while being ok that this might be too big of a goal, I can heartily recommend:
• onetrainer <- for the easiest training of sdxl
• taggui <- for tagging of your images
• hydrus network <- for real (but very very painful) dataset management

hoary ember Mar 13, 2024, 7:00 PM

#

hollow spruce well you're in a bit of trouble. hardware wise you're good to go including stand...

Thanks so much for the recommendations! That's a great starting point 🙂 ... as far as dataset management, I'm very comfortable with Python and wrote the tools to scrape the images myself, and made sure to generate JSON metadata for all of the images (title, descriptions, keywords, names, etc) that makes it pretty easy to work with. Basically, the entire dataset is already tagged / organized quite well.

... I'm glad you brought up CogVLM because that is something I had actually been looking into. I have one line image descriptions (~100 chars / 10-15 words) and 5-10 tag words that I scraped along with each image, but I was considering using CogVLM to expand on these descriptions even more. But I am hearing what you're saying re: biasing the dataset. ... Maybe I could work on making a fine-tune of CogVLM first, and then work with it?

hollow spruce Mar 13, 2024, 7:03 PM

#

hoary ember Thanks so much for the recommendations! That's a great starting point 🙂 ... as ...

keep in mind the 77 token limit. its one of the core issues why we cant describe images better during training

hoary ember Mar 13, 2024, 7:04 PM

#

Oh damn, I didn't realize the limit was that short ... that suuucks ... welp, I guess I won't bother spending the time with all the CogVLM stuff then, because my idea had been to try to generate a detailed paragraph describing each image to append to the end of the original one line description

hollow spruce Mar 13, 2024, 7:10 PM

#

hoary ember Oh damn, I didn't realize the limit was that short ... that suuucks ... welp, I ...

one thing that works well for me, since I have complex custom tags for all my datasets, is to make a custom prompt for each image for cogagent. (or in your case, for llava 1.6 in order to support anatomy knowledge, as cogvlm is 100% sfw)

#

obviously you'd extend this to make the most use of your existing information

#

this also helps you avoid most vlm issues, as you reduce hallucinations to an absolute minimum

#

this will give you natural language captions, which don't work well for small dataset lora training, but really shine if you do 4k images or more
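
The per-image prompt idea could be sketched roughly like this: take the scraped one-line description and tags, and hand them to the captioning model as known facts so it has less room to guess. The wording of the instruction and the name `build_caption_prompt` are hypothetical, and this is not the prompt that was actually used here.

```python
def build_caption_prompt(description: str, tags: list[str]) -> str:
    """Build a per-image instruction for a vision-language captioner (hypothetical wording)."""
    known = ", ".join(tags) if tags else "none"
    return (
        "Describe this image in one or two sentences. "
        f"Known description (trust this over your own guess): {description.strip()} "
        f"Known tags: {known}. "
        "Do not mention anything that is not visible."
    )


print(build_caption_prompt("A woman in a red coat walks a dog.", ["red coat", "dog", "street"]))
```

The resulting string would be sent along with the image to whichever captioner is in use. How that call looks depends on the tool and is left out here.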

hoary ember Mar 13, 2024, 7:17 PM

#

Thanks so much for taking the time to explain all of that! ✨ I'm gonna go do some research into what you've told me so far, and mess around with some test batches and see what I can come up with.

jade hornet Mar 13, 2024, 9:01 PM

#

hollow spruce this will give you natural language captions which dont work well for small data...

what do you recommend for small datasets? Ive typically done wd tagger

#

<50 images

hollow spruce Mar 13, 2024, 9:07 PM

#

jade hornet what do you recommend for small datasets? Ive typically done wd tagger

are we talking SD1.5, any of NAI (anime) based checkpoints?
or are we talking sdxl base, or sdxl ponyXL?

#

different answers depending on which one you're training

jade hornet Mar 13, 2024, 9:45 PM

#

for me it would be XL, realistic

hollow spruce Mar 13, 2024, 10:03 PM

#

jade hornet for me it would be XL, realistic

option A.) <trigger word>, ask cog to generate a short caption of everything you dont want your lora to learn.
option B.) since its under 50, <trigger word>, then manually keyword tag everything you dont want the model to learn. do like 1 ~2 word descriptions. <- then enable shuffle captions + keep 1 token. enable TE training.
never mention any word related to anatomy, like "neck", "boobs", "stomach", "hands", "arm", "feet" etc... <- these will always make your lora worse since you dont have enough examples for it to learn an actual improvement, meaning you will most likely cause a senseless offset, unless it is the actual concept you're training. (and even then, you'll probably have to rely on pure overfitting... as sub 400 images isnt enough to make an actual positive contribution to anatomy knowledge)

depending on situation make use of the mask feature in onetrainer

option b will usually give better results, since its more targeted
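
For option B, "shuffle captions + keep 1 token" roughly means the trigger word stays first while the remaining tags are reordered each time the image is seen. The sketch below illustrates that idea on a comma-separated caption. It is not the trainer's own code, and `mychar` is a placeholder trigger word.

```python
import random
from typing import Optional


def shuffle_caption(caption: str, keep_tokens: int = 1, rng: Optional[random.Random] = None) -> str:
    """Shuffle comma-separated tags, leaving the first `keep_tokens` tags in place."""
    rng = rng or random.Random()
    tags = [tag.strip() for tag in caption.split(",") if tag.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # only the tags after the kept ones change order
    return ", ".join(fixed + rest)


caption = "mychar, simple background, red scarf, looking away"
print(shuffle_caption(caption, keep_tokens=1, rng=random.Random(0)))
```

Keeping the trigger word pinned while everything else moves is meant to stop the model from tying the concept to any one tag position.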

jade hornet Mar 13, 2024, 10:06 PM

#

cool, regarding anatomy knowledge, I've found some derived models that already have that trained in work somewhat better for nsfw type training...but of course those are easier to overfit or already are in some cases

hollow spruce Mar 13, 2024, 10:08 PM

#

jade hornet cool, regarding anatomy knowledge, I've found some derived models that already h...

true. but if done right, then you'll end up with a lora trained on base, which will work flawlessly on every non-foundational XL model out there

#

my lora (of a specific dnd character in my campaign, that I generated in dalle3) + base | dreamshaper turbo. taught just the face without messing up the hands XD (and no background bias or weird skin color offset)
trained on 8 images. works on every model except pony & other foundational ones

2040-artwork20of20aline1320short20messy20red20-sdxlbasesd_xl_base_10_09vaesafetens-3.png

1739-aline15red20hairshort20messy20straigh-sdxldreamshaperdreamshaperXL_v21TurboD-288889718.png

#

one of the 8 training images + mask I added

0f093b30-38ea-45fb-8404-207a575525ec-Recovered-masklabel.png

pale hawk Mar 14, 2024, 1:25 AM

#

I have a question about my first attempt at full fine-tuning SDXL 1.0. Here's what I did:

Used 370 high-quality, advertisement-style text-image pairs with the kohya sd-script.
Set the batch size to 16 and the learning rate to 3e-4, leaving other parameters at default.
Observation during training:
Instead of gradually adapting the existing SDXL 1.0 outputs to fit the custom dataset style, the image generation process seemed to start from scratch. The images began in a distorted state and slowly formed over time.

Generated Images:
Below, you'll find the image generation results for a given prompt, captured every 2000 steps from 0 to 18,000 steps.

prompts:
A bottle of Paul Medison White Musk shampoo is prominently featured against a soft purple backdrop, complemented by an elegantly draped white chiffon fabric. The vibrant red bottle with white and black text stands out, highlighting the product's sophisticated appearance and suggesting a luxurious hair cleansing experience.

I'm new to the Discord community culture, and I want to ensure I'm respecting the rules and norms here. If my question is not appropriate for this community, please let me know, and I'll promptly delete it. Thank you.

#

the leftmost image is the ground truth image

jade hornet Mar 14, 2024, 1:41 AM

#

the question is appropriate, though I must have missed what the question actually was. I saw your method and results

pale hawk Mar 14, 2024, 1:50 AM

#

jade hornet the question is appropriate, though I must have missed what the question actuall...

I'm curious about the typical process of full fine-tuning for SDXL. Is it normal for it not to gradually modify existing SDXL outputs to match the desired custom dataset style, but instead to start from a distorted state as shown in the attachment?

From other research, I've noticed that fine-tuning and quality-tuning often stop around 15K~30K steps. I'm wondering if it's okay to continue training beyond this point.

jade hornet Mar 14, 2024, 2:09 AM

#

with the caveat that my understanding is very basic and high level, full finetuning is retraining all the parameters, and typically works better with thousands of images. dreambooth is more well suited for a smaller dataset. as for how many steps in general, it'll start to overtrain at some point and the results will degrade. It's difficult to say in advance where that will happen

pale hawk Mar 14, 2024, 4:29 AM

#

Got it, thanks for the response.

full remnant Mar 14, 2024, 9:12 AM

#

How long would it take to finetune SD3 on one P100?

hollow spruce Mar 14, 2024, 12:20 PM

#

full remnant How long would it take to finetune SD3 on one P100?

A.) we dont have access to SD3, so we know close to nothing, other than what the paper has told us
B.) that's not how that works. You can have all the compute in the world... what you need is a dataset and a good understanding of dataset architecture & captioning
C.) due to this being the official SAI server, talking about nsfw topics or how to circumvent censoring isnt allowed on this server.

full remnant Mar 14, 2024, 12:22 PM

#

Oh, sorry then

dusky urchin Mar 15, 2024, 5:17 AM

#

pale hawk I'm curious about the typical process of full fine-tuning for SDXL. Is it normal...

your goal cannot be achieved

pale hawk Mar 15, 2024, 5:39 AM

#

dusky urchin your goal cannot be achieved

Do you know anything about this? Can you point out if there's anything wrong? Are there only methods like LoRA or DreamBooth for fine-tuning the SDXL model?

dusky urchin Mar 15, 2024, 5:42 AM

#

pale hawk Do you know anything about this? Can you point out if there's anything wrong? Ar...

sdxl isn't capable of generating fine typography like this. it will be a bajillion times simpler to composite the label on

#

it doesn't really even make sense as an application. you are only going to use like 5 creatives for an ad campaign, which would take less than an hour to make.

pale hawk Mar 15, 2024, 6:00 AM

#

dusky urchin sdxl isn't capable of generating fine typography like this. it will be a bajilli...

Thank you for the insightful comments. However, what I’m curious about is, I understand that models like LoRA or ControlNet, which freeze the backbone model and only train the adapters, maintain the capabilities of the backbone model from the beginning and gradually proceed with the generation process towards the style of the custom dataset. On the other hand, I’m wondering if, in the case of full fine-tuning, the original model’s capabilities are lost and the image generation starts off in a disrupted state from the beginning.

dusky urchin Mar 15, 2024, 6:07 AM

#

pale hawk Thank you for the insightful comments. However, what I’m curious about is, I und...

both reduce capabilities in the sense of text to image prompting generally.

pale hawk Mar 15, 2024, 7:04 AM

#

dusky urchin both reduce capabilities in the sense of text to image prompting generally.

thanks for the reply

sonic narwhal Mar 15, 2024, 8:48 AM

#

Which is best finetuning SDXL vs Cascade when it comes to realism?

stiff dust Mar 15, 2024, 10:41 AM

#

pale hawk Thank you for the insightful comments. However, what I’m curious about is, I und...

Hi,
you have to use different learning rates for Lora and Fine-Tuning. Your learning rate of 3e-4 is totally fine for Lora training, but WAY too high for full fine tune. That's why the model breaks in the beginning. Take the square of your Lora learning rate to obtain a proper full finetuning learning rate. In your case, instead of 3e-4 use its square 9e-8 (or simply 1e-7).

#

also you should always use a few warmup iterations. The AdamW optimizer is in an unstable state in the beginning and needs some time to adapt to the data. With a warmup of, say, 50 steps, you let the learning rate gradually increase to your desired value over the first 50 steps, giving the optimizer time to collect statistics
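A minimal sketch of the two suggestions above in plain Python: squaring the LoRA learning rate as a rule of thumb for the full fine-tune rate, and a linear warmup over the first 50 steps. The function name `warmup_lr` is made up for illustration; real trainers expose warmup through their own scheduler settings, so treat this as a picture of the schedule rather than how any particular tool implements it.

```python
def warmup_lr(step: int, target_lr: float, warmup_steps: int = 50) -> float:
    """Linear warmup: ramp up to target_lr over warmup_steps, then hold it."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    # step is 0-based, so the first step already uses 1/warmup_steps of the target
    return target_lr * (step + 1) / warmup_steps


lora_lr = 3e-4
full_ft_lr = lora_lr ** 2  # rule of thumb from the message above, roughly 9e-8

for step in (0, 24, 49, 50, 1000):
    print(step, warmup_lr(step, full_ft_lr))
```

The learning rate starts at one fiftieth of the target and reaches the full value at the end of the warmup window, after which it stays constant.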

#

beyond that there is no huge difference between Lora and full finetuning (beyond Lora being more parameter efficient).

pale hawk Mar 15, 2024, 11:04 AM

#

stiff dust Hi, you have to use different learning rates for Lora and Fine-Tuning. Your lear...

Thank you so much for the answer. I was curious because the capability of the existing SDXL was so distorted that it was unrecognizable even before many steps of fine-tuning had passed, and I now understand that 3e-4 was a large value. As you suggested, I will try again with 1e-7. Thank you very much.

stone garden Mar 15, 2024, 11:38 AM

#

Nice to meet you, I'm Japanese.
Let me ask you a question.

I decided to give style learning a try, and I did so while referring to this site.
https://romptn.com/article/22757
The SD Ver is local environment WebUI 1.5, Windows 11-64, memory 16GB, GPU: GeForce RTX 3060Ti 8GB, but even if I press the learning button, it stays in "standby" and no images are created. (This is the "Train 'embedding'" part of the above site)
I restarted my PC and closed unnecessary programs, but if there is any solution, please let me know.

*If possible, it would be easier to notice if you write in a reply.

romptn Magazine

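How to use "DreamArtist", a Stable Diffusion extension that can generate high-quality images from additional training on a single image!

"DreamArtist" lets you create an original "embedding". This article walks through the whole process, from training a character embedding on a single image of "Zundamon" to generating "Zundamon-style" images.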

bronze igloo Mar 15, 2024, 3:04 PM

#

Hello all. I'm working on training a ControlNet to remove furniture from furnished room photos. Wondering if anyone has done anything similar - but after training for about 5 days, it seems to have plateaued. I've posted details here in case anyone can help: https://github.com/lllyasviel/ControlNet/issues/659

GitHub

Training a ControlNet to generate furnished room -> empty room (and...

I'm working on a project to take images of furnished rooms and remove all the furniture. I've got a large dataset of image pairs. I'm not using any preprocessing on the images so as to ...

tame vortex Mar 15, 2024, 3:12 PM

#

stone garden Nice to meet you, I'm Japanese. Let me ask you a question. I decided to give st...

it would help if you provide the console log.

shut pike Mar 15, 2024, 7:38 PM

#

So SD3 got captioned with CogVLM - is there a source for good captioning prompts (that detail the image, subjects, their clothes, pose etc. & also judges the image quality) ?

jade hornet Mar 15, 2024, 9:33 PM

#

bronze igloo Hello all. I'm working on training a ControlNet to remove furniture from furnish...

what you're trying to do is pretty complex. I'm actually impressed with the results you've achieved. No idea how to help, but nice work

#

I wouldnt have thought SD would be a suitable application for that

#

somehow the AI needs to understand the difference between what constitutes the empty room and the "stuff"

stone garden Mar 15, 2024, 11:59 PM

#

tame vortex it would help if you provide the console log.

Thank you for answering.
What is the console screen? Maybe you mean this black window?

tame vortex Mar 16, 2024, 12:00 AM

#

yes

#

but the whole log please.

#

the whole text

#

copy paste it in a .txt. Then drop that file in here

stone garden Mar 16, 2024, 1:14 AM

#

tame vortex yes

Thank you, I understand.
(If you reply with a quote, you will receive a notification, so it will be easier to notice, so please help us)

Next time I teach the style, I will paste the log.

stone garden Mar 16, 2024, 2:35 AM

#

tame vortex yes

I have copied the text from the console screen, so please check it.

I think the last one, "AssertionError: Training models with lowvram not possible" is probably the cause, but even after searching, I don't know what to do.

📎 venv_Fstable-diffusion-webuivenvScr.txt

hollow spruce Mar 16, 2024, 4:59 AM

#

stone garden I have copied the text from the console screen, so please check it. I think the...

| AssertionError: Training models with lowvram not possible

The final line here describes the issue.

Here's an explanation for why this is not working, on the PC you are currently using:
You are using a GeForce RTX 3060Ti 8GB.

8GB VRAM is not a lot when it comes to AI Image generation. It is basically the bare minimum to generate.
While generating, the program uses tricks to reduce the amount of VRAM that it needs, which allows you to generate.

For training, it needs all parts of the model loaded at the same time, in order to train it.
8GB are not enough to do this while using stable-diffusion-webui

There are ways to still do it on your pc, by using the program kohya or onetrainer. But it wont be easy nor very fun due to the issues you will face on an 8GB vram card.

16GB VRAM will let you do all kinds of trainings efficiently.

Else, you can also use online service to train your dataset for you. (Civitai has a free version of this. Some other sites also provide such services.)

hollow spruce Mar 16, 2024, 5:04 AM

#

bronze igloo Hello all. I'm working on training a ControlNet to remove furniture from furnish...

this needs a few more examples, using new pictures which weren't in your training dataset, to see what your model has learned so far.
There's a chance it's working. But there's also a chance that it learned to work only on your training dataset... which is another way to say, you've overfit it to hell and back.
(If however it is working as intended, then there are a lot of things which can be done to improve from your current situation.)

hollow spruce Mar 16, 2024, 5:09 AM

#

shut pike So SD3 got captioned with CogVLM - is there a source for good captioning prompts...

no one-fits-all solution sadly.

there are a few basic prompts, which will work on all images, at the cost of worse captions.

but you'll be much better off if you can segment your datasets into categories.
For example this is the prompt I used to generate captions of all images that have one woman in them: Caption the woman, pose and background
its given me the best results.
(Protip: use cogagent! better results than with cogvlm, barely more vram used)
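One way to picture the "segment your dataset into categories" advice is a small lookup from category to captioning prompt. This is only a sketch: the category name `one_woman` and the helper `prompt_for` are hypothetical, and the fallback string is just an assumed example of a generic prompt.

```python
# Hypothetical category names; the "one_woman" prompt is the one quoted above.
CAPTION_PROMPTS = {
    "one_woman": "Caption the woman, pose and background",
}

# Assumed generic prompt: usable on any image, at the cost of weaker captions.
FALLBACK_PROMPT = "caption this image"


def prompt_for(category: str) -> str:
    """Return the category-specific prompt, or the generic one if none exists."""
    return CAPTION_PROMPTS.get(category, FALLBACK_PROMPT)


print(prompt_for("one_woman"))
print(prompt_for("landscape"))
```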

stone garden Mar 16, 2024, 5:11 AM

#

hollow spruce | AssertionError: Training models with lowvram not possible The final line here...

Thank you for answering.
There just wasn't enough memory...😭

onetrainer
Is this it?
https://github.com/Nerogar/OneTrainer

It would be helpful if you could paste the URL of the free online service.

GitHub

GitHub - Nerogar/OneTrainer: OneTrainer is a one-stop solution for ...

OneTrainer is a one-stop solution for all your stable diffusion training needs. - Nerogar/OneTrainer

hollow spruce Mar 16, 2024, 5:15 AM

#

stone garden Thank you for answering. There just wasn't enough memory...😭 >onetrainer Is t...

yes!
Just be careful with tutorials, as most of them assume that you have 12 or 16gb vram

stone garden Mar 16, 2024, 5:17 AM

#

hollow spruce yes! Just be careful with tutorials, as most of them assume that you have 12 or ...

thank you.
Either way, my computer doesn't have enough memory.

Does the "free version" you mentioned earlier use the same amount of memory?

hollow spruce Mar 16, 2024, 5:18 AM

#

stone garden Thank you for answering. There just wasn't enough memory...😭 >onetrainer Is t...

https://rentry.org/59xed3
this tutorial is pretty accurate on most things.
use "AdaFactor" as that will work well on 8gb vram

and good luck!

stone garden Mar 16, 2024, 5:20 AM

#

hollow spruce https://rentry.org/59xed3 this tutorial is pretty accurate on most things. use "...

Thank you, I'll try studying!

shut pike Mar 16, 2024, 8:35 AM

#

hollow spruce no one-fits-all solution sadly. there are a few basic prompts, which will work ...

Interesting. I tend to use more complicated prompts than that... and for me CogVLM always performed better than CogAgent (which just got fine-tuned on Screenshots of UI's).

bronze igloo Mar 16, 2024, 12:53 PM

#

hollow spruce this needs a few more examples, using new pictures which weren't in your trainin...

LOL, love the way you put it. for sure, i will try to post some samples from validation set.
in the case it hasn’t been overfit to hell and back, what are some things you think are worth trying?

unreal trench Mar 16, 2024, 6:05 PM

#

Hi! is there a place like Civit where I could download captioned datasets for Dreambooth LoRA? How to create regularisation datasets?

jade hornet Mar 16, 2024, 10:42 PM

#

unreal trench Hi! is there a place like Civit where I could download captioned datasets for Dr...

Datasets are just images of whatever you want to train, which is anything under the sun. I know of no such place, and even if it existed, what are the odds that they would have exactly what you need? As for reg images, you generate them from the model you want to train on, using a class prompt (ie "a photo of a dog","a photo of a man", etc)

hollow spruce Mar 17, 2024, 2:09 AM

#

unreal trench Hi! is there a place like Civit where I could download captioned datasets for Dr...

yes. kinda. but emphasis on "dataset" meaning all kinds of datasets... (like text, audio, images, video, etc...)
https://huggingface.co/datasets

Hugging Face – The AI community building the future.

#

but use the filter and find just what you need. 120000 datasets currently exist. common topics will be easy to find. niche topics require luck, or you can be the difference you want to see!

hollow spruce Mar 17, 2024, 2:10 AM

#

unreal trench Hi! is there a place like Civit where I could download captioned datasets for Dr...

https://huggingface.co/datasets/ptx0/photo-concept-bucket/viewer/default/train?row=1
this one specifically, is good to use to build regularization sets. but it's 500000 images, so you need to filter it down to the categories you want to create reg datasets for
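A rough sketch of that filtering step in plain Python, assuming each dataset row is a dict with a `caption` field. That field name is an assumption for illustration, so check the actual column names in the dataset viewer. It matches whole words so that "man" does not also pull in "woman".

```python
import re


def caption_words(caption: str) -> set[str]:
    """Lowercased set of whole words in a caption."""
    return set(re.findall(r"[a-z0-9]+", caption.lower()))


def filter_by_class(rows, class_words, caption_key="caption"):
    """Keep rows whose caption contains at least one of the class words."""
    wanted = {w.lower() for w in class_words}
    return [row for row in rows
            if wanted & caption_words(str(row.get(caption_key, "")))]


# Toy rows standing in for the real dataset.
rows = [
    {"caption": "a photo of a dog on a beach"},
    {"caption": "a woman reading in a cafe"},
    {"caption": "a man walking a dog"},
]
dog_rows = filter_by_class(rows, ["dog"])
man_rows = filter_by_class(rows, ["man"])
print(len(dog_rows), len(man_rows))
```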

restive bobcat Mar 17, 2024, 8:46 AM

#

I would like to try and replicate the green screen LoRA https://civitai.com/models/240019/green-screen, for gaining training experience, further model generalization, and own use cases.

The end goal would be to have a green screen LoRA to generate multiple characters, in any pose, on a green screen background.

For training I would rent GPU VMs on Google Cloud, perhaps with a budget of ~ $100 per month. I have for example a 16 GB VRAM instance (stopped atm), just for occasional testing generation/ port forwarding for local Web UIs etc. I do know a bit of Python.

From all the super cool discussions in this channel, I understand that I need to start from curating a good dataset, with good captioning. It should be feasible to collect enough green screen images from Shutterstock etc, in addition to generating from the mentioned LoRA. Based on a minimum of 400 (?) images, I would try and generate prompts with Cog-something. Then I should caption the images like grScr anime old man in coat standing, basically describing all features I don't want the model to learn.

Am I somehow on the right track? Based on my budget, how many images should I realistically aim for in my training dataset?

EDIT: most green screen stock photo is actually video. Could I split each video into its still images, with each image from the video given identical caption describing everything but the background, and use all of those images in the training data set? The background is "all" the model needs to learn, right?

Example: 2 seconds of 25 fps video of woman dancing in front of green screen. Split video to 50 images with caption gr_scr a woman in casual clothing dancing.
Too easy?
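The bookkeeping for the example above could look like this. It only pairs frame filenames with one shared caption; extracting the frames from the video is assumed to happen elsewhere, and the filename pattern and the helper name `frame_caption_pairs` are hypothetical. Whether that many near-identical frames actually helps training is the open question in the message, which this sketch does not answer.

```python
def frame_caption_pairs(video_stem: str, seconds: float, fps: int,
                        caption: str) -> list[tuple[str, str]]:
    """Pair every frame of a clip with the same caption."""
    n_frames = round(seconds * fps)
    return [(f"{video_stem}_{i:05d}.png", caption) for i in range(n_frames)]


pairs = frame_caption_pairs(
    "dance_clip", seconds=2, fps=25,
    caption="gr_scr a woman in casual clothing dancing",
)
print(len(pairs))  # 50
print(pairs[0])
```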

dusky urchin Mar 18, 2024, 8:45 PM

#

restive bobcat I would like to try and replicate the green screen LoRA https://civitai.com/mode...

what is the use case?

shut pike Mar 18, 2024, 10:19 PM

#

I think I'll go crazy for SD3 and use CogVLM to caption my images with several natural language tags for different areas (like subject, composition/lighting, etc.) - what do you think? Too much or a good idea for a Model with a 512 token limit?

foggy inlet Mar 18, 2024, 10:24 PM

#

@shut pike you might want to try out MoAI, too - it got pretty nice results in benchmarks (I posted info about it few days ago, on #1003207327203209236)

shut pike Mar 18, 2024, 10:33 PM

#

foggy inlet @shut pike you might want to try out MoAI, too - it got pretty nice r...

Thanks. I'll consider it... though at the moment I'm learning how to prompt CogVLM and getting better at it. (might be better than to swap Models every day)

What do you think about having several long natural language "tokens" (and even some descriptive old-school tokens) for SD3 fine-tuning?

foggy inlet Mar 18, 2024, 10:35 PM

#

yeah, figuring out good prompt might be really important - that will result in accurate description of both the content, as well as all the important visual aspects

#

just mind even the best of currently available VLMs can struggle with some cases, even simple ones:

shut pike Mar 18, 2024, 10:39 PM

#

foggy inlet just mind even the best of currently available VLMs can struggle with some cases...

Oh I am absolutely aware. I figured I have "all the time in the world" until Sd3 goes public and am using cogVLM as a starting point and iterating / manually curating from there.

#

I would never trust auto-tags.

foggy inlet Mar 18, 2024, 10:40 PM

#

so might be worth to figure out some safety system for that - probably best automated (for example ask both CogVLM and MoAI for the description, and then GPT-4 or Claude if output from both models describes the same image)
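The safety check described above could be wired up roughly like this. The three callables are placeholders: how you actually call CogVLM, MoAI, or a judging LLM is not shown, and the stub functions below exist only so the sketch runs.

```python
from typing import Callable, Optional

Captioner = Callable[[str], str]     # image path -> caption
Judge = Callable[[str, str], bool]   # two captions -> same image described?


def cross_checked_caption(image_path: str, captioner_a: Captioner,
                          captioner_b: Captioner, judge: Judge) -> Optional[str]:
    """Caption with two models; keep the caption only if the judge agrees."""
    caption_a = captioner_a(image_path)
    caption_b = captioner_b(image_path)
    if judge(caption_a, caption_b):
        return caption_a
    return None  # disagreement: flag the image for manual review


# Stubs standing in for real model calls.
def stub_a(path: str) -> str:
    return "a black and white drawing of the number 3"


def stub_b(path: str) -> str:
    return "a drawing of the number 3"


def stub_judge(a: str, b: str) -> bool:
    return "number 3" in a and "number 3" in b


result = cross_checked_caption("digit.png", stub_a, stub_b, stub_judge)
print(result)
```

Returning `None` on disagreement keeps the manual curation step explicit instead of silently picking one model's output.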

#

and if you want to do good finetuning don't forget about regularization - there are multiple models on CivitAI that suck terribly at that (turning males into females, making all faces look the same, etc.)

dusky urchin Mar 19, 2024, 12:13 AM

#

shut pike Thanks. I'll consider it... though at the moment I'm learning *how* to prompt Co...

ask for a detailed description of the image. that's it.

#

you also need a 24GB GPU if you want to run CogVLM at 4bit, or get an 80GB GPU. it doesn't perform as well quantized.

hollow spruce Mar 19, 2024, 4:35 AM

#

foggy inlet just mind even the best of currently available VLMs can struggle with some cases...

the lowest tier VLM I currently use, is moondream. which runs on a toaster.
it answered:
A black and white drawing of the number 3.

#

cogvlm, with the prompt: caption this image

hollow spruce Mar 19, 2024, 4:42 AM

#

dusky urchin you also need a 24GB GPU if you want to run CogVLM at 4bit, or get an 80GB GPU. ...

4bit + 1 beam goes down to around 14ish gb vram.
so 16gb vram cards can run it with minimal settings.

for a 24gb,48,or 80gb vram gpu, it still makes more sense to instead increase beams, rather than load it unquantized. (At least according to our testing on around 50k images)

#

noteworthy mention. this is specifically in regards to using cogvlm for captioning datasets. If you wanna talk with it, or iterate on a conversation... then yeah. 4bit is terrible

restive bobcat Mar 19, 2024, 7:14 AM

#

dusky urchin what is the use case?

My use case is to generate characters and backgrounds separately, and blend them together in a video editing program like DaVinci Resolve. I believe this process is called chroma keying? I just got a hunch that it should work, but how good I don't know.

foggy inlet Mar 19, 2024, 7:54 AM

#

@hollow spruce I just tested moondream2 (https://huggingface.co/spaces/vikhyatk/moondream2) and indeed it handled digits test nicely. I had to change your prompt a bit to produce better captions for more complex images, but I like this model - it seems it might be ready for SD3 era (at least for simple use cases) - thx for sharing it 🙂

foggy inlet Mar 19, 2024, 8:15 AM

#

(phi-1.5 might be the limiting factor for more complex scenes. something similar to it, but based on phi-2 or T5-XXL + CLIP + OpenCLIP could increase compatibility with SD3 style prompting)

broken mulch Mar 19, 2024, 9:15 AM

#

Hey everyone, I'm curious about something. If I use specific keywords and elements to train an SD Lora for creating images, and then later change up these keywords to design clothes, do you think the designs and elements on the clothes will come out consistently? Has anyone experimented with this kind of thing before?

drifting mirage Mar 19, 2024, 11:35 AM

#

Hi! It's my first time preparing a dataset for Kohya SS. This will be a photo style, for realistic portraits. The set includes photos from 750 x 1000 to 2500 x 3700. Please help me understand a few points:

What resolution is optimal for Lora SDXL? Maximum quality is important to me. Obviously, for 750 x 1000 there needs to be an upscale, and 2500 x 3700 needs to be downscaled, but to what extent?
Should the dimensions be multiples of 16/32/32? Does it matter?
Is it worth even at x1 to increase the detail, clarity, and remove jpeg artifacts, for example, using SUPIR?
Is it worth compressing with 100% quality using some jpeg optimizer? The dataset is large, 200+ photos, this may affect the speed of training.

dusky urchin Mar 19, 2024, 5:57 PM

#

restive bobcat My use case is to generate characters and backgrounds separately, and blend them...

you want to use layer diffusion in comfyui

#

and an sd 1.5 model. use the "joint" workflow

dusky urchin Mar 19, 2024, 5:58 PM

#

broken mulch Hey everyone, I'm curious about something. If I use specific keywords and elemen...

you have to use the workflows specific to clothing if you want to change outfits on models. everything else is kind of a waste of time

dusky urchin Mar 19, 2024, 6:00 PM

#

broken mulch Hey everyone, I'm curious about something. If I use specific keywords and elemen...

this is SOTA: https://github.com/levihsu/OOTDiffusion

GitHub

GitHub - levihsu/OOTDiffusion: Official implementation of OOTDiffus...

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on - levihsu/OOTDiffusion

#

fine tuning will generally cause outputs to be less creative, not more.

dusky urchin Mar 19, 2024, 6:03 PM

#

foggy inlet (phi-1.5 might be the limiting factor for more complex scenes. something similar...

the underlying issue is that it's very expensive to produce captioning datasets that only and correctly describe what is visible in the scene and where, and nothing else

#

you cannot "see" unease and mystery

#

looking critically at how many of the words it uses are not visible things, and how few actually are, it is not good at all at the task you need it to do

#

"but i don't know though"

foggy inlet Mar 19, 2024, 8:16 PM

#

dusky urchin you cannot "see" unease and mystery

Some models can see "unease" - here you have sample caption from llava-v1.6-34b (last paragraph):

In the image, there's a character with a futuristic appearance, seated in a contemplative pose on a rocky outcropping. The character is wearing a black body armor with pink lights that suggest technological functions. The armor's design is sleek and polished, with a headpiece that includes a visor and what appears to be a communication device.

The character is facing towards the right side of the image, where a large, towering structure looms in the distance. This structure is complex and appears to be a fusion of organic and mechanical elements, with tentacles extending outward. It stands against a backdrop of a turbulent sky, where dark clouds or perhaps an otherworldly atmosphere gathers.

The ground is rugged and strewn with debris, suggesting a place that has been through significant events or perhaps a battle. The entire scene is awash in a palette of dark, moody colors with accents of pink and purple, contributing to a somber and mysterious atmosphere.

The art style is detailed and realistic with a touch of surrealism, given the fantastical elements present. The image quality is high, with a texture that suggests a digital painting. The level of detail is impressive, from the individual strands of the character's hair to the intricate patterns on the armor.

The overall atmosphere of the image is one of solitude and introspection, with a sense of anticipation or unease. The juxtaposition of the character's calm demeanor with the chaotic and threatening environment creates a powerful visual narrative.

#

Prompt was

Please describe this image, so another person could imagine the same picture. Include all the relevant information about the content, artistic style, image quality, interesting visual aspects, and the general atmosphere of the image. Be accurate, and concise.

#

hm, but even the tiny moondream2 noticed the sense of unease in this image - I am not sure why you've said it cannot be seen

dusky urchin Mar 19, 2024, 9:28 PM

#

foggy inlet hm, but even the tiny moondream2 noticed the sense of unease in this image - I a...

you can use words to describe the concrete things that are visible

#

and you can call that unease

#

and i'm trying to tell you that using the concrete visible words makes a useful caption

#

for these purposes, but also for the purposes of making visual art, and prompting
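A crude way to audit captions along these lines is to flag mood words that do not name anything visible. The word list below is a hypothetical starter set taken from the llava caption quoted earlier, not an established vocabulary, and a plain word match will miss plenty of cases.

```python
import re

# Hypothetical starter list of mood words; extend it for your own dataset.
ABSTRACT_WORDS = {
    "unease", "mystery", "mysterious", "somber",
    "solitude", "introspection", "anticipation",
}


def flag_abstract_words(caption: str) -> list[str]:
    """Return the abstract mood words found in a caption, sorted."""
    words = re.findall(r"[a-z]+", caption.lower())
    return sorted(set(words) & ABSTRACT_WORDS)


sample = ("The overall atmosphere of the image is one of solitude and "
          "introspection, with a sense of anticipation or unease.")
flagged = flag_abstract_words(sample)
print(flagged)
```

Flagged captions can then be rewritten by hand, or sent back to the captioning model with a request for concrete visual elements only.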

dusky urchin Mar 19, 2024, 9:29 PM

#

foggy inlet Some models can see "unease" - here you have sample caption from llava-v1.6-34b ...

in my opinion this performs even worse from the point of view of the answer to the prompt

#

it gave you the opposite of concise

foggy inlet Mar 19, 2024, 9:31 PM

#

not really - it was as concise as it could to describe the things I've asked for

#

and the T5-XXL in SD3 should be able to comprehend long prompts like that

#

and with SD3 model trained on longer and more detailed captions from CogVLM I would expect SD3 to be able to generate visualizations of more complex and abstract ideas pretty well, too - we'll probably see in a month or so, when SAI will publish the weights and we'll be able to experiment with various parts of the model

dusky urchin Mar 19, 2024, 9:40 PM

#

if you wanted to use this for training, maybe a caption would be

a digital airbrush illustration of a rear wide angle shot of a skinny brunette woman wearing black spandex, black science fiction futuristic skinny body armor with pink energy coming from a window on the armor in the back; pink fringed neon lights on her armor; illuminated pink headphones with a pink band behind instead of on top of her head with short antenna; she is in a "thunderbolt" pose seated on her knees with her legs more apart than the traditional pose, and her legs appear fused into black webbing and tissue with an octopus tentacle tip going from her butt to the foreground bottom edge of the frame; the black tissue webbing is fused with red grass and branches of a large black tentacle form evocative of a tree, with black and purple branches above her at the top of the frame; a pink waxlike substance is melting from these branches like cables hanging. in the midground is a series of low mountaintop or wave crest forms behind and to the right of the woman; and in the upper right corner deep in the background is a platform superstructure building high in the sky, with curved cable forms as piers into the mountaintop/ocean elements. it is illuminated by small red lights; the structure appears to be a few large boxy platforms with darker greebles at the top, using a pink-purple-black palette. the sky is gesturally rendered clouds illuminated by a mostly set sun in the background, the atmosphere is light orange.

#

i think science fiction futuristic skinny body armor is not good

#

i would have to think of a better way to describe what she is wearing but honestly it's kind of vague in the image too

#

@foggy inlet do you see my point?

#

these captions are helpful because if you want the model to generalize from conditioning correctly, it has to recognize when "the same thing" appears in different contexts

#

so if you wanted to generate an image of something out of sample - technically you always want to do this

#

okay, let's say the model has never seen "spiderman" before

#

like that word has never been used

#

it would be nice if it had seen a lot of examples of different costumed people correctly captioned with the elements of the costume

#

merely saying "batman" does not help you at all tackle the problem of rendering "spiderman"

#

am i making sense?

#

you can't "see" batman either, in the same way you can't "see" a person who isn't a celebrity

#

batman is a name for a collection of real visual artifacts.

#

so listen, you can "see" unease, but it's not helpful for training or generating out of sample images, which is like, the problem you are trying to solve

#

take it or leave it, this is my professional opinion

#

you could ask the LLM to "describe the collection of concrete visual elements used to express the emotions in this text"

#

i don't know how llava will work with that, but that's what the goal would be

foggy inlet Mar 19, 2024, 9:45 PM

#

well I agree model should see "the same thing" in a lot of different context to understand it, but I am not sure if we'd need that kind of description like you provided for that.

imagine you'd be talking to an artist - would you describe to him what you want in every tiny detail, or rather describe the high level concept and composition, atmosphere, style etc. and then let him do his job as a domain expert and figure out the details based on his vast experience?

dusky urchin Mar 19, 2024, 9:46 PM

#

foggy inlet well I agree model should see "the same thing" in a lot of different context to ...

| imagine you'd be talking to an artist - would you describe to him what you want in every tiny detail, or rather describe the high level concept and composition, atmosphere, style etc. and then let him do his job as a domain expert and figure out the details based on his vast experience?
i do this a lot professionally too and you and i both know that's a pretty complex question lol

#

i hear what you are saying

#

i don't think the model is creative, in the same way ChatGPT struggles to be creative

foggy inlet Mar 19, 2024, 9:46 PM

#

SD3 is getting there

dusky urchin Mar 19, 2024, 9:48 PM

#

foggy inlet SD3 is getting there

they can only go so far with the resources they have. for all the things they set up in unreal engine, such as millions of images of different three object juxtapositions and placements, it doesn't help with five objects. the model is totally capable of correctly generating images with five objects, but the state of the art approach to this stuff is limited by the generalizability of the conditioning

#

i think this is also why dall-e3 has such a "look" whereas SDXL does not

foggy inlet Mar 19, 2024, 9:49 PM

#

yeah, but I've seen MJ responding pretty well to longer descriptions even a long time ago in v4 times - so it's surely possible

dusky urchin Mar 19, 2024, 9:49 PM

#

it is "undertrained" but also less conditioned

foggy inlet Mar 19, 2024, 9:49 PM

#

and current alpha SD3 revisions can respond pretty well to poetry, too:
https://twitter.com/thibaudz/status/1768009402970263667

Thibaud Zamora ∞ (@thibaudz) on X

long prompt on sd3: Tomorrow, at dawn, at the hour when the countryside whitens,
I will set out. You see, I know that you wait for me.
I will go by the forest, I will go by the mountain.
I can no longer remain far from you.

I will walk with my eyes fixed on my thoughts,
Seeing…

dusky urchin Mar 19, 2024, 9:49 PM

#

i think we have a similar experience

dusky urchin Mar 19, 2024, 9:51 PM

#

foggy inlet yeah, but I've seen MJ responding pretty well to longer description even long ti...

probably the multi-modal models will have the best chance of being a "pretrained" object used for conditioning in "SD4"

#

they're just so hard to make and train

#

someone will have to publish it first

#

lots of work

#

dall-e3 is going to be SOTA for a while. they have telemetry. they know what people prompted for dall-e2 and 1, and hence could generate training data in whatever way improves results for those prompts

#

sd3 does not

#

stability has no meaningful telemetry

foggy inlet Mar 19, 2024, 9:52 PM

#

true, but there are also new methods to increase speed of training, like BTX - stuff like that will help

dusky urchin Mar 19, 2024, 9:52 PM

#

time will tell, but midjourney and dall-e4 have better odds

dusky urchin Mar 19, 2024, 9:52 PM

#

foggy inlet true, but there are also new methods to increase speed of training, like BTX - s...

i wonder why they didn't release a pixart based model

#

they also need to transition to pixel diffusion like IF

foggy inlet Mar 19, 2024, 9:52 PM

#

then BitNet 1.58 - but that needs quantization during training to work, if I recall correctly

dusky urchin Mar 19, 2024, 9:52 PM

#

lots of things that need to happen

#

yeah

#

i don't know if midjourney published on their model

#

i figure they use pixel diffusion because of how well it recreates scenes from movies

#

the movies it is trained on lol

#

you know that meme where it's like a woman sending an email "look it wrote a whole email from a bullet point" and the recipient is saying "look it turned a whole email into a bullet point"?

#

that's midjourney

foggy inlet Mar 19, 2024, 9:54 PM

#

progressive training is done in the realm of pixels, 256x256, then 512x512 etc. - why can't we do the same in the realm of complexity (simple tasks, medium difficulty tasks, hard tasks), like we train our own brains (no one asks toddlers to learn quantum physics, right?)

dusky urchin Mar 19, 2024, 9:54 PM

#

it's people saying something vague, and then they're really happy that midjourney is remixing scenes from famous movies

dusky urchin Mar 19, 2024, 9:54 PM

#

foggy inlet progressive training is done in the realm of pixels 256x256, then 512x512 etc. -...

yeah i think all imagen family models are this idea in a nutshell

#

and wuerstchen too, just not pixels

foggy inlet Mar 19, 2024, 9:59 PM

#

there are also a lot of interesting papers flying around every day, more than I can read - I guess we'll soon need AIs to read all of this with comprehension (at least at a high level), and then help us pick the most promising ideas potentially giving the biggest improvements - maybe not GPT-4, but somewhat smarter models like Claude 3 Opus or GPT-5, with properly working bigger contexts - those might be able to help us with such tasks, too 🙂

#

(or maybe something like that could do? model designed to support research papers analysis, published 4 days ago: https://arxiv.org/abs/2403.10301)

hollow spruce Mar 20, 2024, 8:41 AM

#

drifting mirage Hi! It's my first time preparing a dataset for Kohya SS. This will be a photo st...

Optimal resolution is as close to 1 Megapixel as possible.
That could be 1024x1024, or 1153x768, or similar.

Kohya and onetrainer have this built in natively if you enable resizing + buckets. Then you don't need to worry about cropping or sizes in any way, as long as its over 1 Megapixel, as it will just be scaled down to optimal size

If you wanna go absolute quality, then do the resizing yourself via Photoshop automated process, and save as a png file, so that it's lossless. (that way you avoid jpg artifacts. It's a complicated topic... But the quality increase is incredibly small if you do all this extra work manually)
Better captioning and datasets are always more effective at increasing quality
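
To make the "as close to 1 Megapixel as possible" rule concrete, here is a minimal sketch that lists width/height pairs near 1 MP. The 64-pixel step and the 512 to 2048 side limits are assumptions chosen for illustration; they are not values read out of Kohya or OneTrainer, whose bucket settings are configurable.

```python
TARGET_AREA = 1024 * 1024  # roughly 1 megapixel, as suggested above


def candidate_buckets(step=64, min_side=512, max_side=2048, target_area=TARGET_AREA):
    """Return (width, height) pairs on a step grid whose area stays within target_area.

    step, min_side and max_side are illustrative assumptions.
    """
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest height on the step grid that keeps width * height <= target_area.
        height = (target_area // width) // step * step
        if min_side <= height <= max_side:
            buckets.add((width, height))
            buckets.add((height, width))  # the same shape in the other orientation
    return sorted(buckets)


for w, h in candidate_buckets():
    print(f"{w}x{h}  {w * h / 1e6:.2f} MP  aspect {w / h:.2f}")
```

Every pair stays at or just under 1 MP, which is why a trainer with buckets enabled can accept mixed aspect ratios without manual cropping.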

drifting mirage Mar 20, 2024, 1:02 PM

#

hollow spruce Optimal resolution is as close to 1 Megapixel as possible. That could be 1024x10...

Interesting. Thank you! It turns out that it's not just a matter of resizing one side, but then I need to crop the other side. Pre-sort all the photos by their aspect ratio. Considering I have over 250 photos in my dataset right now, that sounds a bit complicated. I could try to do an automatic workflow in comfy though.

dusky urchin Mar 20, 2024, 3:18 PM

#

drifting mirage Hi! It's my first time preparing a dataset for Kohya SS. This will be a photo st...

can you show me an example of the best work you've made so far, in a comfyui workflow for example, the one that made you say "I need a better photographic style"

jade hornet Mar 20, 2024, 3:30 PM

#

heh, yah more photo realistic seems to be like an endless journey, doesnt it

small eagle Mar 20, 2024, 5:27 PM

#

does anyone have a current, up-to-date version of this, but for local (instead of Google Colab)?

#

https://stable-diffusion-art.com/train-lora-sdxl/

#

have a pair of 4090 24G cards with 64 GB RAM, i wanna give training an sdxl lora a try

small eagle Mar 20, 2024, 6:39 PM

#

got kohya setup, same machine i have comfy installed, seems like the cuda version is different
can kohya force the version comfy is using? or am i gonna need to fully upgrade everything to get kohya to run?

```
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.9.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary /home/vender3d/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
libcusparse.so.11: cannot open shared object file: No such file or directory
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name
```
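
The log above complains about a non-existent directory on the library path and about `libcusparse.so.11` not being found. A small standard-library sketch like the following can report the same two things before launching a trainer. The function name `cuda_library_report` is made up for this example, and `ctypes.util.find_library` only searches the way the current process would, so treat the result as a hint rather than a diagnosis.

```python
import ctypes.util
import os
from pathlib import Path


def cuda_library_report(env=None):
    """List LD_LIBRARY_PATH entries that do not exist and look up two CUDA libraries."""
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    return {
        "ld_library_path": dirs,
        "missing_dirs": [d for d in dirs if not Path(d).is_dir()],
        # find_library uses the real process environment, not the env argument.
        "libcudart": ctypes.util.find_library("cudart"),
        "libcusparse": ctypes.util.find_library("cusparse"),
    }


print(cuda_library_report())
```

If `libcusparse` comes back as `None` while PyTorch reports CUDA 11.8, the CUDA 11 runtime libraries are probably not visible to the process, which matches the error in the log.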

drifting mirage Mar 20, 2024, 7:55 PM

#

dusky urchin can you show me an example of the best work you've made so far, in a comfyui wor...

What do you mean? Just that xl doesn't need a portrait style and you can do a cool style just with a prompt? I agree. But only if we are talking about one portrait. I want to make 100+ for my project, of real people, with likeness, so I want to fix the style with a LoRA so that the whole hundred are consistent.

dusky urchin Mar 20, 2024, 8:44 PM

#

drifting mirage What you mean? Just because xl doesn't need a portrait style and you can do a co...

have you tried creating a diffused image with the likeness of a non-celebrity without using fine tuning?

dusky urchin Mar 20, 2024, 8:45 PM

#

small eagle have a pair of 4090 24G cards with 64GRam, i wanna give training an sdxl lora a ...

you can simply follow train network from kohya with sdxl and prodigy. that's it. fill out your dataset

#

in terms of interacting with python and completing basic programming and IT tasks, it's better to ask chatgpt

drifting mirage Mar 20, 2024, 9:30 PM

#

dusky urchin have you tried creating a diffused image with the likeness of a non-celebrity wi...

Yes. InstantID with 1-3 photos. Not perfect but works. Its doable but takes effort, sometimes a lot of effort

dusky urchin Mar 20, 2024, 9:45 PM

#

drifting mirage Yes. InstantID with 1-3 photos. Not perfect but works. Its doable but takes effo...

okay so what specifically do you imagine will be improved?

#

you need a lot of photographs and processing per person if you want to recreate a non-celebrity likeness in SDXL flexibly. are these going to be photos of real people that you take on a stage? do you have hundreds of photos per person? did you caption it?

neon quail Mar 20, 2024, 10:41 PM

#

Hi guys, sdxl finetune question: i have 5000+ large high resolution images and i don't want to crop and lose part of the image, so i resized them to fit 1024×1024 and now i have a black border frame. will this frame be in the final result or will it be ignored during training?

#

Example

dusky urchin Mar 21, 2024, 1:32 AM

#

neon quail Hi guys, sdxl finetune question, i have 5000+ large high resolution images and ...

what is your goal?

#

the black border frame, all else being equal, will 100% appear in anything created by your fine tuning

neon quail Mar 21, 2024, 1:52 AM

#

My goal is training on the full image without losing any part, because the images are mostly tall and not square, and i will lose the body or the clothes if i crop them to 1024×1024. For now i resized all the images to fit in 1024×1024 but with black borders. Will buckets work if i resize only the height?

ruby pond Mar 21, 2024, 2:56 AM

#

neon quail My goal training the full image without losing any part , because the images mo...

You don't need to make the images square. If you enable bucket ratios in kohya it will crop the images to the 1MP resolution, but they don't have to be square. I use xnviewmp to resize my images, setting the resolution to 1.05MP

neon quail Mar 21, 2024, 2:58 AM

#

Thank you 🙏 🌹

dusky urchin Mar 21, 2024, 3:16 AM

#

neon quail My goal training the full image without losing any part , because the images mo...

i meant what is the application

neon quail Mar 21, 2024, 5:45 AM

#

dusky urchin i meant what is the application

Kohya ss

dusky urchin Mar 21, 2024, 5:45 AM

#

neon quail Kohya ss

what are you trying to train?

neon quail Mar 21, 2024, 5:58 AM

#

dusky urchin what are you trying to train?

People, in traditional clothes, asian, arabs, culture, building, cities; one of the images' dimensions is 4014×5017

#

So i will try to resize the height to 1024 and let the buckets deal with width 🤔
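
As a rough illustration of what bucketing does with the 4014×5017 image mentioned above, the sketch below picks the bucket closest in aspect ratio, scales the image to cover it, and reports how many pixels get trimmed. The bucket list is a small hypothetical set, and the "cover then crop" behaviour is an assumption about how a typical trainer works; the exact rules in Kohya may differ.

```python
import math

# A small hypothetical set of ~1 MP buckets (width, height); a trainer builds its own list.
BUCKETS = [(1024, 1024), (896, 1152), (832, 1216), (768, 1344),
           (1152, 896), (1216, 832), (1344, 768)]


def fit_to_bucket(width, height, buckets=BUCKETS):
    """Pick the bucket closest in aspect ratio, scale to cover it, report the crop."""
    aspect = width / height
    bw, bh = min(buckets, key=lambda b: abs(math.log((b[0] / b[1]) / aspect)))
    scale = max(bw / width, bh / height)  # cover the bucket so no padding is needed
    new_w, new_h = round(width * scale), round(height * scale)
    return {
        "bucket": (bw, bh),
        "resized": (new_w, new_h),
        "cropped_px": (new_w - bw, new_h - bh),  # pixels trimmed along each axis
    }


print(fit_to_bucket(4014, 5017))
```

Under these assumptions the tall photo lands in the 896×1152 bucket and loses only a thin strip of width, instead of carrying black borders into training.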

#

👆

gentle flame Mar 21, 2024, 5:06 PM

#

https://github.com/huggingface/diffusers/issues/7365

GitHub

Provided pooled_prompt_embeds is overwritten via prompt_embeds[0] ·...

diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py Line 386 in 25caf24 pooled_prompt_embeds = prompt_embeds[0] Simple fix: pooled_prompt_embeds = prompt_embeds[0]...

river umbra Mar 21, 2024, 5:47 PM

#

Hey guys, need some help
I'm training my stable diffusion model with Dreambooth. I have 100 images and I go for 20 steps per image, so I have 20600 steps in total and it tells me that it will take 50 hours. How can I decrease the time that the training will take? Is it better to get rid of some images or decrease the amount of steps? Thanks in advance!

#

#

I'm creating my own architecture model

dusky urchin Mar 21, 2024, 7:54 PM

#

river umbra I'm creating my own architecture model

what does that mean?

#

you mean architecture as in Architecture

#

designing buildings

#

share the output of nvidia-smi

jade hornet Mar 22, 2024, 12:48 AM

#

river umbra Hey guys, need some help I'm training my stable diffusion model with Dreambooth....

8s/it? I'd think a 4090 would get like 1it/s or less and be able to chew through that in like 20 mins, but having said that, you have 10 epochs there, you can spit out a checkpoint every epoch and stop when it's done, ie. test each checkpoint as it's produced and if you're happy, you can stop the training
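
The numbers in this exchange can be checked with simple arithmetic. This sketch only uses the figures quoted in the messages (20600 steps, 50 hours, 10 epochs); the 1 s/it rate is the ballpark mentioned above, not a measurement.

```python
def seconds_per_step(total_hours, total_steps):
    """Average seconds per iteration implied by a total time estimate."""
    return total_hours * 3600 / total_steps


def hours_needed(total_steps, sec_per_step):
    """Wall-clock hours for a given step count at a given speed."""
    return total_steps * sec_per_step / 3600


steps = 20600         # step count reported in the question
reported_hours = 50   # time estimate reported in the question

print(f"{seconds_per_step(reported_hours, steps):.1f} s/it")                      # 8.7 s/it
print(f"{hours_needed(steps, 1.0):.1f} h at 1 s/it")                              # 5.7 h
print(f"{hours_needed(steps // 10, 1.0):.1f} h for one of 10 epochs at 1 s/it")   # 0.6 h
```

So the 50 hour estimate corresponds to roughly 8.7 s/it, and saving a checkpoint per epoch lets you stop after a fraction of the full run.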

charred socket Mar 22, 2024, 8:45 AM

#

I'm training a style lora for sdxl, which preset should i use? I'm a beginner T-T

hollow spruce Mar 22, 2024, 9:09 AM

#

river umbra Hey guys, need some help I'm training my stable diffusion model with Dreambooth....

VRAM overflow. You're not using any optimization techniques, which is why you're out of vram, which is why it's taking 50 hours instead of 20 minutes

#

Try using kohya or onetrainer. They built in a lot of optimizations, so it will run a lot faster.

jade hornet Mar 22, 2024, 1:07 PM

#

charred socket Im training a style lora for sdxl, which preset should i use Im a beginner T-T

adafactor or adamw should be fine, and remember to follow best practices for style training. maybe don't even bother with captions until you know what you're doing

celest pier Mar 24, 2024, 7:45 AM

#

Hi everyone! I want to finetune stable diffusion but after loading its components separately from huggingface. I have loaded them separately and am using them for inference but I am struggling with how to finetune it now

#

as far as my intuition goes, i am supposed to freeze the vae and text encoder and train the unet?

#

📎 message.txt

#

This is the inference pipeline from their documentation

#

but how do i finetune this now
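
The "freeze the VAE and text encoder, train the UNet" intuition can be sketched as a single training step. Everything named `Toy...` below is a tiny stand-in module invented for this example so that it runs without downloading weights; real code would load the actual VAE, text encoder, UNet and noise scheduler instead, and the linear beta schedule here is likewise an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class ToyVAE(nn.Module):  # stand-in for the real VAE encoder (assumption)
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 4, kernel_size=8, stride=8)  # 64x64 image -> 8x8 latent

    def encode(self, images):
        return self.enc(images)


class ToyTextEncoder(nn.Module):  # stand-in for the CLIP text encoder (assumption)
    def __init__(self, vocab=100, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, token_ids):
        return self.emb(token_ids).mean(dim=1)  # (batch, dim)


class ToyUNet(nn.Module):  # stand-in for the conditional UNet (assumption)
    def __init__(self, dim=16):
        super().__init__()
        self.cond = nn.Linear(dim + 1, 4)
        self.conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, noisy_latents, t, text_emb):
        t_feat = (t.float() / 1000).unsqueeze(1)
        bias = self.cond(torch.cat([text_emb, t_feat], dim=1))
        return self.conv(noisy_latents) + bias[:, :, None, None]


vae, text_encoder, unet = ToyVAE(), ToyTextEncoder(), ToyUNet()
vae.requires_grad_(False)           # frozen
text_encoder.requires_grad_(False)  # frozen
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)  # only the UNet is optimized

betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


def train_step(images, token_ids):
    """One noise-prediction step: encode, add noise, predict it, update the UNet."""
    with torch.no_grad():
        latents = vae.encode(images)
        text_emb = text_encoder(token_ids)
    noise = torch.randn_like(latents)
    t = torch.randint(0, 1000, (latents.shape[0],))
    signal = alphas_cumprod[t].sqrt()[:, None, None, None]
    sigma = (1 - alphas_cumprod[t]).sqrt()[:, None, None, None]
    noisy = signal * latents + sigma * noise
    loss = F.mse_loss(unet(noisy, t, text_emb), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


print(train_step(torch.randn(2, 3, 64, 64), torch.randint(0, 100, (2, 5))))
```

The structure is the point here: the frozen parts run under `torch.no_grad()`, and the optimizer only ever sees the UNet parameters.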

hollow spruce Mar 25, 2024, 2:18 AM

#

celest pier but how do i finetune this now

google kohya or onetrainer.
thats the easiest way to get into it

little dust Mar 25, 2024, 2:47 AM

#

hollow spruce google kohya or onetrainer. thats the easiest way to get into it

Does that work for stable video diffusion?

hollow spruce Mar 25, 2024, 12:30 PM

#

little dust Does that work for stable video diffusion?

community sentiment goes more towards animatediff, since it's compatible with most existing workflows and app pipelines

check that out if you wanna help improve video generation.

if your goal is actually SVD specifically, then there are few groups and companies that actively try it. It's not easy, nor plug & play. You probably won't find much help online, as it's too complicated to casually walk into, and also requires significant hardware & compute to even make sense

little dust Mar 25, 2024, 12:33 PM

#

hollow spruce community sentiment goes more towards animatediff, since its compatible with mos...

I have the compute. I'm unable to find any guides, so I will post one as soon as I get it running. One thing I'm confused about is how different the input is from regular stable diffusion. I can't get the same results just calling SD for the 30 frames in the future I want, correct?

hollow spruce Mar 25, 2024, 12:36 PM

#

little dust I have the compute. I'm not unable to find any guides so I will post one as soon...

your best bet is to contact the few people that actively pursue SVD, and see if they can get you into any research groups
google "DragNUWA - svd". it's the only finetune I'm currently aware of, other than some github contributors

little dust Mar 25, 2024, 1:22 PM

#

I mean I can just load it in huggingface and run a pipeline from stability's gen-AI, no? @hollow spruce

bold geyser Mar 25, 2024, 5:19 PM

#

Any tips for epoch number / batch size / repeats? Could any of these settings be the reason I'm getting a json file instead of checkpoints or s_ ? I am training a small model on a few project-based photos, around 20. Any link with settings? I have an enormous 128 GB of RAM on the latest M3 chip, base model LoRA?

jade hornet Mar 25, 2024, 5:36 PM

#

the json file is normal, every time you run training it'll output the config you used for that specific batch

#

if you want a beginner level kohya_ss tutorial I can link a decent one

#

try this: it's not super new, but it's very thorough https://youtu.be/xXNr9mrdV7s?si=xAfvABoxq1VzWndY

YouTube

Not4Talent

LORA training EXPLAINED for beginners

LORA training guide/tutorial so you can understand how to use the important parameters on KohyaSS.


jade hornet Mar 25, 2024, 5:43 PM

#

bold geyser Any tips for epoch number/ batch size/ repeats- could any of these settings be t...

forgot to tag you

bold geyser Mar 25, 2024, 5:44 PM

#

jade hornet forgot to tag you

Thanks, I'll check… I mean I'm not getting any other output besides the json… no model found

drifting mirage Mar 25, 2024, 6:00 PM

#

Hi! I am trying to train a Lora for the first time. I seem to have done everything according to the Aitrepreneur tutorial, but Kohya stops right at the beginning. Honestly, I don't even understand which line exactly is the error here. Please help me understand what I'm doing wrong (there's a lot of repetition between the 2nd and 3rd screenshots).

hollow spruce Mar 25, 2024, 6:45 PM

#

drifting mirage Hi! I am trying to train Lora for the first time. I seem to have done everything...

you did enable the sdxl checkbox, right?

#

next to where you set the path to the base model

queen matrix Mar 25, 2024, 6:46 PM

#

That was my guess too.

drifting mirage Mar 25, 2024, 6:47 PM

#

hollow spruce you did enable the sdxl checkbox, right?

Yes, I already found the checkbox, but Kohya still stops

#

On my win11 I have CUDA 12.4 installed over 11.8, does this affect Kohya? Kohya was installed a week ago, clean latest version with no modifications. It has its own CUDA inside and is not connected to the global one, or is it?

hollow spruce Mar 25, 2024, 6:58 PM

#

it does look like a cuda error tbh x_x

#

I'd first try onetrainer, see if that works. so you dont have to mess with your installs

#

assuming you're just getting started, use the included preset for sdxl lora. that one is nearly flawless for starting out

drifting mirage Mar 25, 2024, 9:16 PM

#

hollow spruce it does look like a cuda error tbh x_x

I did setup.bat options 2 and 4 one more time and now it works

regal harbor Mar 26, 2024, 11:44 AM

#

so I wonder how well SDXL understands relativity. Say I'm training an IRL Homer, so I'm finding various men who look like Homer might if he were real.

In one photo, he looks like homer, but not as fat. Should I tag it "skinny" even though he's not really skinny? But he's skinny compared to Homer

dusky urchin Mar 26, 2024, 3:41 PM

#

regal harbor so I wonder how well SDXL understands relativity. Say I'm training an IRL Homer,...

you have to define what is essential about "an in real life homer," which is full of valid subjective judgements about the art that you have to express formally.

adding the skinny caption will improve the performance of the fine tuning. imagine there are two ways for the training process to "learn" how to generate your image:

the caption closely resembles your image. this means that the image it generates "starts out" looking like your image early on. from then on, only small changes to the parameters are needed.
there is no caption. the training process will make a lot of changes to a lot of parameters. if you don't have other training data that depends on those parameters, those changes will stick, along with whatever changes actually improve its ability to generate your particular image.

so under the theory that only a small subset of parameters are needed to improve generation of your images (true), poorly described captions will cause a "loss of generalization" aka lots of other spurious parameter changes are "kept" along with the small number of changes that actually improve performance.

#

as long as you are using captions that describe what you can concretely see, you will get good results.

#

when training styles, people omit the captions because they want a lot of changes to a lot of parameters.

#

a full fine tune versus lora fine tune also helps. a lora fine tuning has so many fewer parameters that the effect of having bad captions or no captions is diminished.
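
To put rough numbers on "so many fewer parameters", here is the standard LoRA formulation for one linear layer. The layer size $d_{\text{in}} = d_{\text{out}} = 1280$ and the rank $r = 32$ are illustrative assumptions, not values taken from this thread.

```latex
W' = W_0 + \frac{\alpha}{r} B A, \qquad
B \in \mathbb{R}^{d_{\text{out}} \times r}, \quad
A \in \mathbb{R}^{r \times d_{\text{in}}}

\text{full layer: } d_{\text{out}} \, d_{\text{in}} = 1280 \cdot 1280 = 1{,}638{,}400 \text{ trainable parameters}

\text{LoRA, } r = 32: \quad r \, (d_{\text{in}} + d_{\text{out}}) = 32 \cdot 2560 = 81{,}920 \text{ trainable parameters} \approx 5\%
```

Only $A$ and $B$ are trained while $W_0$ stays fixed, so there is simply less room for spurious changes to accumulate.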

dusky urchin Mar 26, 2024, 3:44 PM

#

regal harbor so I wonder how well SDXL understands relativity. Say I'm training an IRL Homer,...

so the punchline is that there are concrete, scientific explanations for the behavior the community observes. "relativity" is more like, well can someone see skinny versus fat? i think so, so the conditioning that ships with sdxl (aka "CLIP" and the "conditional UNET") will correctly speed up training when you use those keywords

drifting mirage Mar 26, 2024, 6:05 PM

#

Hey guys! What is your Kohya train speed on a 4090? On windows. Yesterday I was able to run Kohya and trained a couple of models for the first time, everything works ok but the speed... 2.30-2.50s/it on XL training, xformers, batch size 5. It's not okay, right?

hollow spruce Mar 26, 2024, 6:15 PM

#

drifting mirage Hey guys! What is your Kohya train speed on 4090? On windows. Yesterday I was ab...

sounds about right

#

I get about 5.5s/it using batch 8 + adamW (no normalizing, on windows with overhead)

there are a bunch of settings that make it go slightly faster or slower. but they're marginal. so your speed looks pretty normal
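
Seconds per iteration depends on batch size, so a fairer comparison is images per second. This sketch just normalizes the two figures quoted above; 2.4 s/it is taken as the midpoint of the reported 2.30 to 2.50 range, and the two runs differ in other settings too, so the comparison is only approximate.

```python
def samples_per_second(sec_per_it, batch_size):
    """Convert seconds per iteration into images processed per second."""
    return batch_size / sec_per_it


runs = {
    "batch 5 at 2.4 s/it": (2.4, 5),  # midpoint of the reported 2.30-2.50 s/it
    "batch 8 at 5.5 s/it": (5.5, 8),
}
for label, (sec_per_it, batch) in runs.items():
    print(f"{label}: {samples_per_second(sec_per_it, batch):.2f} images/s")
# batch 5 at 2.4 s/it: 2.08 images/s
# batch 8 at 5.5 s/it: 1.45 images/s
```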

regal harbor Mar 26, 2024, 7:34 PM

#

dusky urchin a full fine tune versus lora fine tune also helps. a lora fine tuning has so man...

I was thinking of e.g.

"Homer sitting in Moe's_Tavern drinking beer wearing a pink shirt".

I don't tag "bald, beard, fat" because these elements are essential to Homer. However, if he had thick hair in that image, I might add "brown hair".

#

Are you saying tagging like that will screw the model?

dusky urchin Mar 26, 2024, 8:21 PM

#

regal harbor Are you saying tagging like that will screw the model?

tagging like what?

#

you are asking for a "brother, i "just" need an answer" answer, which is impossible

#

if your captions accurately describe what can be seen in the image, your model training will occur "faster" in the sense that the fine tuning will be able to create the images in your dataset with fewer iterations of backpropagation aka in less time

#

homer simpson is already in the sdxl training dataset

#

you are not teaching it a new concept

#

sdxl already knows what homer simpson specifically is. it might not know "the complete collection of concrete visual elements that make up homer simpson" are related to "homer simpson" 100%, but it might

dusky urchin Mar 26, 2024, 8:27 PM

#

regal harbor I was thinking of e.g. "Homer sitting in Moe's_Tavern drinking beer wearing a p...

> I don't tag "bald, beard, fat" because these elements are essential to Homer.
CLIP already knows what "homer simpson" is. however the distance between bald, beard and fat to homer simpson is probably larger than you would assume. you could probably improve the speed at which the model trains by including bald beard and fat. but since you are using the word essential, i think you are still dancing around the hard task of deciding what subjectively defines "an in real life depiction of homer simpson"

#

if you spend 5 minutes writing down the concrete visual elements that describe an in real life homer simpson, you will be able to make much better captions

primal pawn Mar 27, 2024, 1:21 PM

#

Hi !
Hope everyone's fine.

So, I'm developing a diffusion model for a project that converts text inputs into image outputs (Text to layouts). The stable diffusion model seems to be the most suitable option for this task. My datasets consist of 4003, 256x256 images, each accompanied by detailed captions (Roughly 250 words) in text format. These datasets are hosted on Hugging Face : https://huggingface.co/datasets/jkanishkha0305/text-based-layout-generation-dataset.

However, during training (using the Keras implementation: https://keras.io/examples/generative/finetune_stable_diffusion/), the model encounters an issue related to CLIP embedding, specifically mentioning a "ValueError" due to a shape mismatch. The error message states: "Cannot assign value to variable 'clip_embedding_1/embedding_3/embeddings:0': Shape mismatch. The variable shape (1000, 768), and the assigned value shape (77, 768) are incompatible." This problem, I guess, arises because my captions are very detailed, containing roughly 250 words each.

Additionally, when attempting to train the model with a simpler dataset on platforms like Colab or Kaggle, I encounter "OOM" (Out Of Memory) issues, likely due to limited GPU memory (15GB).

I have additionally tried the method specified here: https://github.com/huggingface/diffusers/tree/main/examples/text_to_image. But it runs into "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 114.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 101.06 MiB is free." So can someone suggest better ways, please? Maybe I should go with a TPU v3?

I need assistance in resolving these issues, so any help or guidance related to fine tuning a stable diffusion model using custom text (captions) with an image dataset would be greatly appreciated.

Thank you.

primal pawn Mar 27, 2024, 1:22 PM

#

primal pawn Hi ! Hope everyones fine. So, I'm developing a diffusion model for a project th...

Should i go for the SDXL model instead, or is SD v1.5 good enough for my task?

stiff dust Mar 27, 2024, 2:32 PM

#

that's a CLIP issue, which can only deal with 77 tokens at max

#

also, I don't think that CLIP can deal with your images at all...

#

I would say a general diffusion model is just not the right tool for your task

#

your problem has nothing to do with image generation at all. You want to extract room geometries from a text prompt. So instead of making images, just make geometries like <roomtype, x, y, width, height>. So either train a custom model on top of a text foundation model or train an instruct model that generates these geometries from a text
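
The `<roomtype, x, y, width, height>` idea can be sketched as a small data format that a text model would be trained to emit. The `Room` class, the helper names, and the sample rooms below are all made up for illustration; the units and coordinate origin are assumptions as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    roomtype: str
    x: float
    y: float
    width: float
    height: float


def format_layout(rooms):
    """Serialize rooms as '<roomtype, x, y, width, height>' lines, one per room."""
    return "\n".join(
        f"<{r.roomtype}, {r.x:g}, {r.y:g}, {r.width:g}, {r.height:g}>" for r in rooms
    )


def parse_layout(text):
    """Parse the format above back into Room objects, skipping blank lines."""
    rooms = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not (line.startswith("<") and line.endswith(">")):
            raise ValueError(f"not a room tuple: {line!r}")
        roomtype, x, y, w, h = [part.strip() for part in line[1:-1].split(",")]
        rooms.append(Room(roomtype, float(x), float(y), float(w), float(h)))
    return rooms


def overlaps(a, b):
    """True if two axis-aligned rooms share interior area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


# Hypothetical sample layout, not taken from the dataset discussed above.
layout = [Room("living_room", 0, 0, 5, 4), Room("kitchen", 5, 0, 3, 4)]
text = format_layout(layout)
print(text)
```

A format like this is also easy to validate (for example with `overlaps`) before rendering it into an image, which a diffusion model's output would not allow.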

regal harbor Mar 27, 2024, 4:10 PM

#

dusky urchin > I don't tag "bald, beard, fat" because these elements are essential to Homer. ...

But baldness is intrinsic to Homer, isn't tagging bald redundant?

You don't tag in every photo "person, man, Homer, eyes, nose, mouth, lips, teeth, beard, chin, neck, shoulders, arms, hands, fingers" etc, right?

#

However, if he were missing an arm, I might tag "amputated" or whatever descriptive word for that the model already knows.

gentle flame Mar 27, 2024, 5:29 PM

#

new lora thing for XL models
https://b-lora.github.io/B-LoRA/

Implicit Style-Content Separation using B-LoRA

gentle flame Mar 27, 2024, 6:34 PM

#

Below seems very relevant for auto captioning efforts
https://github.com/IDEA-Research/T-Rex

GitHub

GitHub - IDEA-Research/T-Rex: T-Rex2: Towards Generic Object Detect...

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy - IDEA-Research/T-Rex

dusky urchin Mar 27, 2024, 7:59 PM

#

regal harbor But baldness is intrinsic to Homer, isn't tagging bald redundant? You don't tag...

> You don't tag in every photo "person, man, Homer, eyes, nose, mouth, lips, teeth, beard, chin, neck, shoulders, arms, hands, fingers" etc, right?
you kind of should. it has nothing to do with being intrinsic or extrinsic to homer. it's that when you train on a dataset of billions of images from the public, and for hundreds to tens of thousands of epochs, there are going to be weights in the unet that are "reused" both for generating homer and for generating bald men with a complete set of organs and limbs. when you are training for some BS number of epochs on a vanishingly small number of parameters and very little data, you are hijacking preexisting stuff and "multiplying" it a little bit to bias the whole, complex process towards making copies of your image. if you want the process to find those multiplying coefficients faster, you should say person, man, homer, etc., because an image with those details is more likely to look like your goal per step of denoising.

dusky urchin Mar 27, 2024, 7:59 PM

#

regal harbor But baldness is intrinsic to Homer, isn't tagging bald redundant? You don't tag...

> You don't tag in every photo
no, the community doesn't tag all these mundane details in every photo. but I do.

#

you still haven't said what an IRL homer should look like

#

until you specify that, you are just stabbing in the dark following guides

hollow spruce Mar 27, 2024, 9:22 PM

#

regal harbor But baldness is intrinsic to Homer, isn't tagging bald redundant? You don't tag...

That's the logic of tagging "what is unique to this image"

Pangloss' method of tagging every feature is similar to how Anime models are trained. Every feature is mentioned. And while prompting is more work afterwards, they are undeniably accurate, and ignore fewer keywords, while also hallucinating less.

For all of my big datasets, I tag every damn feature. Due to that, they are consistently better at following prompts, and have significantly less token bleed in general.

For super small datasets, the most efficient way is to have a trigger word, and write a short sentence description of everything visible.

There are models where training becomes different, harder, or easier.
But for all base sdxl derivatives, this holds true
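
The two captioning styles discussed here can be written down as two small helpers. The trigger word `irl_homer` and the tag lists are hypothetical, and the output is just a comma-separated string; how a given trainer reads captions from disk is not covered.

```python
def caption_unique_only(trigger, unique_tags):
    """'What is unique to this image' style: trigger word plus only the unusual details."""
    return ", ".join([trigger, *unique_tags])


def caption_everything(trigger, description, feature_tags):
    """Exhaustive style: trigger word, a short sentence, then every visible feature."""
    seen, ordered = set(), []
    for tag in feature_tags:
        key = tag.strip().lower()
        if key and key not in seen:  # drop blanks and duplicates, keep first-seen order
            seen.add(key)
            ordered.append(tag.strip())
    return ", ".join([trigger, description, *ordered])


print(caption_unique_only("irl_homer", ["brown hair", "pink shirt"]))
print(caption_everything(
    "irl_homer",
    "a man sitting in a tavern drinking beer",
    ["person", "man", "bald", "beard", "fat", "pink shirt", "Bald"],
))
```

Keeping both helpers around makes it easy to caption the same images twice and compare the two approaches on the same dataset.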

#

btw, training via 'what's unique' is a pretty easy and good way of training. But there's a definite "quality ceiling" you'll hit with it.
So it's not wrong to train like that, but it is important that once you want to achieve certain details, or want to be able to train multiple concepts into a single lora, you change your method of captioning to one which has a higher quality ceiling

regal harbor Mar 28, 2024, 5:34 AM

#

gentle flame new lora thing for XL models https://b-lora.github.io/B-LoRA/

I want to see this in reverse, making cartoon characters real

regal harbor Mar 28, 2024, 5:38 AM

#

dusky urchin you still haven't said what an IRL homer should look like

Fat bald man with a bad combover, goatee,

regal harbor Mar 28, 2024, 5:42 AM

#

hollow spruce btw, training via 'whats unique' is a pretty easy and good way of training. But ...

I'm wanting to make a checkpoint more than a lora, ideally.

But I was just thinking, I want fat to bleed into Homer. If I tag every Homer pic with "fat", then when I prompt "Homer" won't I also be required to prompt "fat"?

#

I suppose after training the model/lora, I could also train a TI, in order to get all those Homer details without using so many tokens

primal pawn Mar 28, 2024, 8:36 AM

#

stiff dust your problem has nothing to do with image generation at all. You want to extract...

Thank you so so much, this was a really helpful suggestion. So my approach could be: text captions to <roomtype, x, y, width, height>, and then use an instruct model to generate layouts, right?
Also, do you have any suggestions for instruct models?
Is using a seq2seq approach for text to <roomtype, x, y, width, height> a good idea?

mighty magnet Mar 28, 2024, 10:21 AM

#

I have now trained a couple of loras, but always have the issue that I can't continue the training. what is there to do to make sure I can continue training a lora?

#

I want to do a first few epochs on very basic captioning and then continue training with a set of very detailed captions

dusky urchin Mar 28, 2024, 4:37 PM

#

gentle flame new lora thing for XL models https://b-lora.github.io/B-LoRA/

where did you find this?

gentle flame Mar 28, 2024, 4:50 PM

#

I saw it in another discord

hollow spruce Mar 29, 2024, 3:24 AM

#

mighty magnet I have trained now a couple of loras, but always ahve the issue, that I can't co...

kohya & onetrainer both have the option to "continue" from an existing lora.
you do need to keep the core parameters like net rank the same though. but just changing captions shouldn't be an issue (remember to not save captions to disk! as you'll want them to be generated new, when you switch them up later)
(if you have a different folder, with the images again, just different captions, then saving captions to disk is fine though)
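
One way to see why the rank has to stay the same when continuing: the shapes of the saved LoRA matrices depend on it. The sketch below compares shapes for one linear layer; the 1280 layer width is an assumed example value, and the check says nothing about whether a particular trainer can resize weights on its own.

```python
def lora_shapes(d_in, d_out, rank):
    """Shapes of the LoRA down and up matrices for one linear layer."""
    return {"down": (rank, d_in), "up": (d_out, rank)}


def can_resume(saved, new):
    """Weights load without any resizing only if every matrix has the same shape."""
    return saved == new


saved = lora_shapes(d_in=1280, d_out=1280, rank=32)
same_rank = lora_shapes(d_in=1280, d_out=1280, rank=32)
bigger_rank = lora_shapes(d_in=1280, d_out=1280, rank=128)

print(can_resume(saved, same_rank))    # True
print(can_resume(saved, bigger_rank))  # False
```

Changing captions does not touch any of these shapes, which is why swapping the caption set between runs is safe.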

#

here is what it looks like in onetrainer

dusky urchin Mar 29, 2024, 5:51 AM

#

gentle flame I saw it in another discord

which one

dusky urchin Mar 29, 2024, 5:52 AM

#

hollow spruce kohya & onetrainer both have the option to "continue" from an existing lora. you...

does onetrainer resize a base model to the new lora rank?

#

so if it's rank 32 base, will it resize to rank 128 in that field?

#

or does it error out

shut pike Mar 29, 2024, 11:05 AM

#

gentle flame new lora thing for XL models https://b-lora.github.io/B-LoRA/

And how exactly do we need to prepare datasets for the separation?

gentle flame Mar 29, 2024, 11:54 AM

#

dusky urchin which one

laion

mighty magnet Mar 29, 2024, 12:13 PM

#

hollow spruce kohya & onetrainer both have the option to "continue" from an existing lora. you...

thanks, I finally found the option and could make good use of it

#

https://civitai.com/models/373459/randommaxx-jamesjean-loha

Randommaxx JamesJean Loha - v1.0 | Stable Diffusion LyCORIS | Civitai

After improving my training and tagging I finally took on the challenge to create a Loha that captures the essence of the wonderful and fantastic s...

stone garden Mar 29, 2024, 10:55 PM

#

good morning.
I tried to create a LORA from a photo, but I can't.
I can't understand the meaning even if I translate the command prompt text, so could you please explain it to me in a simple way?

📎 Windows_PowerShell.txt

stone garden Mar 29, 2024, 11:03 PM

#

stone garden good morning. I tried to create a LORA from a photo, but I can't. I can't unders...

Supplement.
When replying, you will receive a notification if you use a quote, so it will be easier to understand.

exotic cipher Mar 30, 2024, 2:23 PM

#

stone garden good morning. I tried to create a LORA from a photo, but I can't. I can't unders...

Which trainer are you using?
Kohya, one trainer or some other trainer?

jade hornet Mar 30, 2024, 9:09 PM

#

stone garden Supplement. When replying, you will receive a notification if you use a quote, s...

realistically, you should have at least 10 photos from different angles. If you can only get 1, then go with face swap/controlnet vs a lora

#

and you can cheat, using controlnet to generate photos to train a lora too

stone garden Mar 30, 2024, 10:43 PM

#

exotic cipher Which trainer are you using? Kohya, one trainer or some other trainer?

Good morning, thank you for your reply.
I use this method when creating LORA.
https://www.youtube.com/watch?v=N1tXVR9lplM

YouTube

テルルとロビン【てるろび】旧やすらぼ

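The easiest-to-understand LoRA training in Japan! From installing sd-scripts to running the training! Let's make a Tohoku Zunko LoRA! [Stable Dif...

🐸 This video explains how to create your own training data for use with the image generation AI "Stable Diffusion webUI AUTOMATIC1111" (local version). It is based on the specifications as of April 2023.
🐸 This video was created for technical research purposes. It shows an example of how to use the software and is not a recommendation to use it.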
This video is edited for technical research purposes.
It only explains how to install the software and its features, and is NOT intended as a recommendation to use itself.
...

stone garden Mar 30, 2024, 10:46 PM

#

jade hornet realistically, you should have at least 10 photos from different angles. If you...

Thank you for answering.
A total of 13 photos of people were used: front, diagonal, side, and back.

exotic cipher Mar 30, 2024, 10:49 PM

#

stone garden Good morning, thank you for your reply. I use this method when creating LORA. ht...

I am installing it right now
I personally use kohya ss gui which is the front end of this

stone garden Mar 30, 2024, 10:50 PM

#

exotic cipher I am installing it right now I personally use kohya ss gui which is the front en...

https://github.com/bmaltais/kohya_ss this?

GitHub

GitHub - bmaltais/kohya_ss

Contribute to bmaltais/kohya_ss development by creating an account on GitHub.

exotic cipher Mar 30, 2024, 10:51 PM

#

stone garden https://github.com/bmaltais/kohya_ss this?

yes

stone garden Mar 30, 2024, 10:51 PM

#

exotic cipher yes

thank you.
First, read the page description.🙂

#

Sorry, I have an additional question, but when using photos of real people, does it affect learning if the image size is too large? Please let me know if you have a suitable image size.

exotic cipher Mar 30, 2024, 11:05 PM

#

I'm sorry, I can't get it to work. All the Python files just die on me the second I open them.
Just try kohya ss gui though. It's slightly easier to use since it has an interface instead of being a command-line tool.
My apologies for not being able to help with your issue.
The installation process should be similar enough, with the key difference that you launch the setup.bat file and afterwards the gui.bat file.

stone garden Mar 30, 2024, 11:25 PM

#

exotic cipher im sorry i cant get it to work all the python files just die on me the second i ...

No, don't worry about it.🙂
I'm still working on something else and haven't read the page description yet, but is kohya ss gui an extension of Stable Diffusion?
Or is it another application?

exotic cipher Mar 30, 2024, 11:28 PM

#

it is another application
I would suggest making a new empty folder on your home screen
and follow the instructions described on the github page

stone garden Mar 31, 2024, 1:49 AM

#

exotic cipher it is another application I would suggest making a new empty folder on your home...

It was no good...
"During the installation process, ensure that you select the option to add Python to the 'PATH' environment variable."
At this point, I no longer understand.😫

exotic cipher Mar 31, 2024, 9:18 AM

#

stone garden It was no good... "During the installation process, ensure that you select the o...

Oh that’s an easy fix
Uninstall Python and reinstall it. Make sure that when you install it you check the "Add Python to PATH" box.

stone garden Mar 31, 2024, 10:28 AM

#

exotic cipher Oh that’s an easy fix Uninstall python and reinstall it make sure that when you...

thank you.
I currently have Python 3.10.6 installed on drive F of my computer, where I use Stable Diffusion 1.5.
I'm planning to put kohya ss on another drive. Do I still need to uninstall Python 3.10.6?

exotic cipher Mar 31, 2024, 10:31 AM

#

yes uninstall python and reinstall it.
the changes should all apply to each drive / disk on your computer
when you are reinstalling python make sure these boxes have been checked

stone garden Apr 1, 2024, 9:01 AM

#

exotic cipher yes uninstall python and reinstall it. the changes should all apply to each driv...

thank you.
If Python is successfully reinstalled, can I use Stable Diffusion without any problems?

exotic cipher Apr 1, 2024, 9:02 AM

#

stone garden thank you. If Python is successfully reinstalled, can I use Stable Diffusion wit...

Yes

stone garden Apr 1, 2024, 9:02 AM

#

(Sorry I didn't notice your reply)

exotic cipher Apr 1, 2024, 9:03 AM

#

I had to migrate from 3.10.6 to 3.10.9 and I had no issues with stable diffusion or any other program that uses python
What matters is that python is added to PATH so you can run the venv and the pip install lines
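A quick way to confirm the PATH point above is to ask Python itself which commands resolve from PATH. This is only a minimal sketch: `find_on_path` is a hypothetical helper name, and the command names checked (`python`, `python3`, `pip`) are assumptions about what an installer script would call.

```python
import shutil
import sys


def find_on_path(names=("python", "python3", "pip")):
    """Map each command name to its resolved location, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # The interpreter running this script (found even if PATH is not set up).
    print("Running interpreter:", sys.executable)
    for name, path in find_on_path().items():
        print(f"{name}: {path or 'NOT on PATH'}")
```

If `python` and `pip` both show a location, scripts such as setup.bat should be able to create the venv and run their pip install lines.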

celest pier Apr 1, 2024, 9:08 AM

#

hi

#

anyone has experience

#

with finetuning stable diffusion

#

and is willing to help me out with a project?

exotic cipher Apr 1, 2024, 9:50 AM

#

I only have experience with kohya ss gui doing lora training
https://github.com/bmaltais/kohya_ss
finetuning is similar to training LoRAs but takes a lot longer and needs a lot more data (about 10-100x that of a LoRA dataset), from what I have heard
I suggest you start with training a few LoRAs before you move on to full finetuning

#

@celest pier

stone garden Apr 1, 2024, 9:51 AM

#

exotic cipher I had to migrate from 3.10.6 to 3.10.9 and I had no issues with stable diffusion...

thank you.
I don't have much time on weekdays, so I'll try again on the weekend.

upper smelt Apr 3, 2024, 10:06 AM

#

Hello there! I get a strange result when training a LoRA on top of SDXL. When I use the LoRA, images are oversaturated, with banding artifacts.

#

I don't have this issue with SD 1.5

#

I tried playing with the lora's weight, with no luck

#

Is there anyone with an idea?

unique belfry Apr 3, 2024, 12:59 PM

#

I have a question about this: https://civitai.com/models/4201?modelVersionId=130072 Are the "no vae" checkpoints ones that require no additional VAE, or ones that contain no real VAE and require the user to use a separate VAE? And are there any VAE files on the page? If so, which ones are the VAEs and which ones are the checkpoints?

jade hornet Apr 3, 2024, 7:58 PM

#

upper smelt Hello there ! I have a strange result training a Lora above SDXL. when I use the...

using kohya_ss? perhaps try one of the presets for sdxl. make sure you select sdxl on first lora page, and that your resolution is set to 1024x1024 vs 512x512. same for your sample images if you're doing those during the training, noticed this one you shared is a strange resolution

upper smelt Apr 3, 2024, 8:03 PM

#

jade hornet using kohya_ss? perhaps try one of the presets for sdxl. make sure you select s...

Yes, with Kohya. Should my samples be at 1024? And if my training images are 512, is that OK? (I shared a screenshot, I think.)

jade hornet Apr 3, 2024, 8:30 PM

#

sdxl does not generate very well at resolutions less than 1024x1024

amber shard Apr 3, 2024, 10:50 PM

#

Hi everyone, I am very new to stable diffusion and looking to fine tune a model so that I can do image to image style transfer for many images, with a consistent pencil sketch style. I've tried DreamBooth, but found that the style I created is not what I am looking for. Does anyone know of any resources that I can look into?

this is the script I found and ran:
https://colab.research.google.com/drive/1hMXWO1f9Q344XiixDirZAPqbEXT2pRGF?usp=sharing

Google Colaboratory

jade hornet Apr 4, 2024, 12:36 PM

#

dreambooth style training is what I'd recommend, but style training can be tricky. Maybe look at this reddit post as a reference? https://www.reddit.com/r/StableDiffusion/comments/14rcr7t/kohya_ui_settings_as_asked_stylecharacter_training/

From the StableDiffusion community on Reddit: Kohya UI settings as ...

Explore this post and more from the StableDiffusion community

jade hornet Apr 4, 2024, 12:37 PM

#

amber shard Hi everyone, I am very new to stable diffusion and looking to fine tune a model ...

forgot to tag you above

#

With subject training you describe the scene, the pose, etc. With style training it's normally the opposite. Maybe try no captions except "ohwx style" or some custom keyword you choose.

amber shard Apr 4, 2024, 6:49 PM

#

jade hornet subject training you are describing the scene, the pose, etc, with style trainin...

Thank you for the link! I'm trying dreambooth but I will look into configuring it differently

latent charm Apr 5, 2024, 6:26 AM

#

Is 1 epoch with 30 repeats equal to 30 epochs with 1 repeat?

silver dawn Apr 5, 2024, 7:52 AM

#

Hello,
I have a quick question: any tips on training a model on reflective objects, such as stainless steel bottles? I've captured the dataset myself, but the issue is that reflections on the bottle are visible in all the data, and the model seems to be generating them after training

jade hornet Apr 5, 2024, 8:49 AM

#

latent charm Does 1epoch 30repeat equal to 1repeat 30epoch?

Mathematically yes, but there is a subtle difference, as a completed epoch describes the point at which all images in the dataset have been processed.
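The "mathematically yes" part can be checked with a few lines of arithmetic. This is a rough sketch, assuming a hypothetical dataset of 20 images, batch size 1, and a trainer that can save one checkpoint per completed epoch; the function names are illustrative and do not come from any trainer.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size=1):
    """Total optimizer steps: each epoch sees every image `repeats` times."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def checkpoints_saved(epochs, save_every_n_epochs=1):
    """How many per-epoch checkpoints end up on disk (assumed save policy)."""
    return epochs // save_every_n_epochs


# Same amount of training either way...
print(total_steps(20, repeats=30, epochs=1))  # 1 epoch x 30 repeats
print(total_steps(20, repeats=1, epochs=30))  # 30 epochs x 1 repeat

# ...but a different number of points where a checkpoint can be saved.
print(checkpoints_saved(epochs=1), checkpoints_saved(epochs=30))
```

Both runs take the same number of steps. The difference is only in how often an epoch boundary is reached, and therefore how many intermediate checkpoints are available to compare.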

latent charm Apr 5, 2024, 10:03 AM

#

jade hornet Mathematically yes, but there is a subtle difference, as a completed epoch desc...

Thanks, so it should be the same, as I thought.

coral canopy Apr 5, 2024, 10:09 AM

#

latent charm Thank, it should be the same as I think.

Epochs produce more trash on your drive, but you can better pick "best trained model/Lora" candidates. Find a nice in-between.

latent charm Apr 5, 2024, 10:12 AM

#

coral canopy Epochs produce more trash on your drive, but you can better pick "best trained m...

I am implementing my own training script. I could skip saving some epochs to save disk space.

stone garden Apr 5, 2024, 1:17 PM

#

22、

dusky urchin Apr 5, 2024, 6:57 PM

#

does anyone have any opinions on how to render dice? this seems exceptionally hard

dense sequoia Apr 7, 2024, 9:53 AM

#

dusky urchin does anyone have any opinions on how to render dice? this seems exceptionally ha...

I'm late, but couldn't you use ControlNet to influence its shape and layout? Though you'd need a jpeg for the dice frame

neon quail Apr 8, 2024, 12:38 AM

#

hello, help me not mess this up, because I'm in the middle of captioning 11k images of people

#

do I have to caption and describe only the subject and the clothing, with the background or without the background?

#

what about tagging, do I add the tags after describing the image, like: (man wearing cowboy hat on a farm with big mustache, man, mustache, cowboy, 4k, sunset)

#

I want the model to be flexible and focus on the subject, not the background, but with 11k images maybe it's OK to caption the background?

dusky urchin Apr 8, 2024, 4:40 AM

#

neon quail i want the model to be flexible and focus on the subject not the background but ...

what is your goal?

#

oh yeah

#

i remember now

#

sophisticated users of diffusion models describe everything that is concretely visible in the scene, in unambiguous terms.

#

let's say you have this image:

a-new-spider-man-far-from-home-trailer-reveals-where-spidey-gets-his-new-red-and-black-suit-social.png

#

worst: spiderman

bad: spiderman on top of a skyscraper in london

bad-okay: spiderman, squatting, high in the air. behind him is the big ben in london on a sunny day.

#

okay: spiderman, a man dressed in a red and black spiderman costume, squatting with legs outstretched on top of girder high in the air, medium closeup level angle, backgrounded by an in focus shot of the big ben in london far away down below

#

best:

spiderman: a fit young adult male wearing a skintight nylon costume fully covering his body. the costume is red all over with black sections: black from the waist to below the knees, from the bicep to the elbows on both arms, from the tricep to the base of the fingers on his lower arms, with a widely spaced black grid on the red areas evocative of a spider web; a black arachnid symbol in the center of the chest about 2 inches in diameter; and graphic eye paint about three times as large as ordinary human eyes that resemble upside down almond shapes, with a white fill and a thick black stroke and curved accents of the stroke at the distant corner of the eye evocative of spider compound eyes. he is posed with his right knee bent and his left leg bent and outstretched in a low squat, right arm forward, three quarters profile, and left arm out in the [insert the tai chi stance that they are using here, i don't know what it's called]. he is squatting on an architectural, cropped triangular element that resembles 14 inch steel pipes welded together at the top of skyscrapers. he and the architecture are composed on the right side of the frame. behind him deep in the distance is a shot of [the london neighborhood with the big ben], showing the big ben near the bottom of the frame, westminster, then [more description of the concrete visible city elements of london]. the sky is an overexposed bright haze on the left and some sunset illuminated clouds on the right, it appears to be near dusk on a cloudy day.

#

@neon quail do you see

#

this isn't practicable

#

but this is how you get Ideogram quality results

neon quail Apr 8, 2024, 4:58 AM

#

Thank you, this is really helpful. I spent six days, 12 hours every day, separating the images into folders and sorting them by men, women, clothes, then captioning them and manually editing the captions. Now I'm on the final 1000 images, and I got scared and confused by other captioning tutorials 🌹 🌹

dusky urchin Apr 8, 2024, 5:00 AM

#

yeah

#

focus on stuff that matters to you

#

ultimately if you have a bad learning rate for your text encoder, you're not even going to use the captions correctly

#

you should test on 5 images and see what kind of results you get first

#

and try to understand what all the parameters do

#

if you plan to use prodigy, you are wasting your time captioning

#

are you doing a full fine tune or lora fine tune?

neon quail Apr 8, 2024, 5:05 AM

#

I tested on 500 images, learning rate 0.00001, 40 repeats, 1 epoch, and the result is good

#

full fine tune

dusky urchin Apr 8, 2024, 5:06 AM

#

okay

#

then i don't think your captions matter

neon quail Apr 8, 2024, 5:08 AM

#

captions don't matter for full fine tune ?

#

is 10 epoch 1 repeats good for 11k images ? or should I go higher with epoch 🤖

real citrus Apr 8, 2024, 6:51 AM

#

dusky urchin if you plan to use prodigy, you are wasting your time captioning

I'm curious, what is it about prodigy that means captioning is waste of time compared to other optimizers?

real citrus Apr 8, 2024, 6:55 AM

#

dusky urchin are you doing a full fine tune or lora fine tune?

Is the approach to captioning different for a lora vs ft? (assuming same number of images). Is it possible to train a lora with 11k images and get good results?

slim gyro Apr 8, 2024, 10:33 AM

#

does kohya delete image files for the fun of it

#

where are the regularization images?

dusky urchin Apr 8, 2024, 3:32 PM

#

real citrus I'm curious, what is it about prodigy that means captioning is waste of time com...

every training run is different

#

there's no generalizable advice

#

i get that generalized advice exists, but that doesn't mean it's correct in any given case

#

i don't know, for @neon quail 's particular problem, whether he could achieve most of what he wants if he "just" prompted better or "just" used clipvision

real citrus Apr 8, 2024, 5:54 PM

#

dusky urchin every training run is different

Maybe I misunderstood your statement. Is there any difference with prodigy compared to other optimizers regarding captioning?

twilit cradle Apr 8, 2024, 8:39 PM

#

Hey everyone! so.. did kohya newest updates break lora training? all of a sudden now when I make new Loras, they do not work (same settings as before)

full remnant Apr 10, 2024, 3:45 PM

#

Anyone here who is experienced at LoRA training?
I am looking for anybody who can train and upload a Ponydiffusion LoRA with best parameters, using a high-quality and diverse synthetic dataset I will provide. (No need to credit me)

jade hornet Apr 10, 2024, 11:18 PM

#

real citrus Maybe I misunderstood your statement. Is there any difference with prodigy compa...

I'm curious too. I know prodigy is pretty aggressive. I've had terrible results with it on sdxl, which is probably down to a setting, but I never figured out which one. It worked great on 1.5. @dusky urchin

#

I know that's not specific, basically it burns out too fast

woven sequoia Apr 11, 2024, 11:33 AM

#

Hello i'm trying to train a model with my mother's artworks, I have a small dataset (to start) of 52 artworks
I would like to let the model hallucinate without any prompts and let it generate new artworks based on the one it has been trained on
Could you help me?

woven sequoia Apr 11, 2024, 12:23 PM

#

🙏

coral canopy Apr 11, 2024, 2:37 PM

#

woven sequoia Hello i'm trying to train a model with my mother's artworks, I have a small data...

What kind of art is it? I suppose you would have more luck with abstract art if you don't prompt.

woven sequoia Apr 11, 2024, 2:37 PM

#

https://discord.com/channels/1002292111942635562/1227943020603899976

#

It's abstract yes

woven sequoia Apr 11, 2024, 2:40 PM

#

coral canopy What kind of art is it? I suppose you would have more luck with abstract art if ...

If you can, please give me some help to start. I use automatic1111 but can learn to use ComfyUI as well
I had in mind to start from scratch and fully train a model on my dataset, but I've been told that I could train a LoRA and use random weights

coral canopy Apr 11, 2024, 2:42 PM

#

woven sequoia If you can please give me some help to start, I use automatic1111 but can learn ...

What I would try is to make a LoRA and caption every image with "art by {whateverName}". Then use that at inference as the prompt and check what happens.

woven sequoia Apr 11, 2024, 2:43 PM

#

so the prompt would be empty or "art by name"?

coral canopy Apr 11, 2024, 2:43 PM

#

Art by name

woven sequoia Apr 11, 2024, 2:43 PM

#

you're talking about the caption for the embedding right?

coral canopy Apr 11, 2024, 2:44 PM

#

Caption for each image for training the Lora, yes
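The single-caption approach suggested above can be scripted so that every image gets the same caption file. This is only a sketch under assumptions: the trainer is assumed to read one text file per image with a matching filename, the `.txt` extension and the list of image extensions are guesses, and `write_fixed_captions` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed set of image types; adjust to whatever the dataset actually contains.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def write_fixed_captions(image_dir, caption, caption_extension=".txt"):
    """Write the same caption next to every image in image_dir.

    Returns the list of caption files that were written.
    """
    written = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() in IMAGE_EXTENSIONS:
            caption_path = image_path.with_suffix(caption_extension)
            caption_path.write_text(caption, encoding="utf-8")
            written.append(caption_path)
    return written
```

For example, `write_fixed_captions("dataset/artworks", "art by whateverName")` would create one caption file per artwork; the folder name and the artist token here are placeholders.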

woven sequoia Apr 11, 2024, 2:44 PM

#

okay i'll try that

#

how long does it generally take to train a LoRA? I have a dataset of 52 pictures to start from

coral canopy Apr 11, 2024, 2:44 PM

#

What graphics card?

woven sequoia Apr 11, 2024, 2:44 PM

#

3080

#

10gigs vram

coral canopy Apr 11, 2024, 2:45 PM

#

I would say not more than 20 min

woven sequoia Apr 11, 2024, 2:45 PM

#

okay nice!

#

thanks for the help i'll try that

coral canopy Apr 11, 2024, 2:46 PM

#

If you have problems you can DM me.

woven sequoia Apr 11, 2024, 2:46 PM

#

thanks FoxHeart

worldly ledge Apr 11, 2024, 8:15 PM

#

Does someone know the best way to fine-tune for an art style? Is it indeed by using DreamBooth?

rugged mountain Apr 13, 2024, 2:55 PM

#

大家好

#

Hello everyone

swift urchin Apr 14, 2024, 8:57 AM

#

what kind of regularization images should be used for a style?

jade hornet Apr 14, 2024, 5:10 PM

#

you dont need them, imo

ebon sable Apr 15, 2024, 8:29 PM

#

so is it just me or do LoRAs seem to overfit really easily on generated images?

jade hornet Apr 15, 2024, 10:14 PM

#

well, overfitting is not necessarily bad, unless you mean something like artifacts. when training a lora, sometimes it should be overfit, meaning that you don't want a bunch of variation, you want exactly one specific output

#

the other thing to consider is that some non-base checkpoints don't work well with certain loras. the reason everyone loves juggernaut so much is that in general, it's extremely forgiving to use with loras

#

having said that, I downloaded a likeness lora from civit recently, and it had artifacts even at like .5 strength, so not sure what that creator was thinking when they uploaded it

tall condor Apr 18, 2024, 12:40 AM

#

hi guys, I spent almost the last 2 weeks trying to get kohya_ss up and running
I tried Windows 10 as well as Ubuntu 22, however I can't seem to get it fully working with the GPUs
does anyone have proper instructions, especially on driver versions, OS versions and CUDA versions?
I basically tried CUDA 11.8, which apparently is almost uninstallable on Ubuntu 22 by now because the 520 drivers won't build
with 535 the GPU is utilized, but I can't get any real speed
what Ubuntu version do you recommend?
and what CUDA and driver version?

jade hornet Apr 18, 2024, 3:23 AM

#

tall condor hi huys, i spent allomost the last 2 weeks trying to get kohya_ss up and running...

you didnt even mention your card, but you might want to post in tech-support room in any case

ebon sable Apr 18, 2024, 5:04 AM

#

jade hornet well overfit is not bad necessarily, unless you mean like artifacts. when trai...

what I mean is like... when training with generated images it seems a lot harder to get it to pin down the intended concepts/styles without learning every single thing in the images

#

I mean that's natural for training yes, and normally captioning everything besides the main concept works fine with non-generated images. but every time I've tried training on my own generated images, captioned or not it always learns minute details to the extent that every resulting LoRA looks like an overcrowded mess. the concept is usually learned perfectly fine, but... with everything else along with it

#

in the past I could always skirt around it by simply training with the model used to generate the dataset, so it only learned the difference. but right now I'm in a conundrum where my generated image dataset was made with a combo of multiple base models and LoRAs per each image, so that.. doesn't work too well

tall condor Apr 18, 2024, 7:25 AM

#

@jade hornet im on RTX4090

tall condor Apr 18, 2024, 10:31 AM

#

has anyone recently installed kohya_ss on Ubuntu 22 or Ubuntu 20?

#

it appears it's impossible

#

my next install with CUDA 11.8 and 520 drivers failed with a blank display

#

this is getting really ridiculous

tall condor Apr 19, 2024, 5:36 AM

#

anyone can help with this

restive sun Apr 19, 2024, 6:03 PM

#

Hey, I am trying to fine-tune SDXL with 30 sets of images, each set having a specific building, I am thinking of using Dreambooth for this. If I am not wrong I think for each set I have to fine-tune the model separately. So, I am thinking of fine-tuning the model with one set of building images and then saving the model and using the saved model to do the next set of images and so on. Do you think doing so will cause issues with the model? Is there a better way for doing this?

tough bison Apr 20, 2024, 5:45 PM

#

if i wanted to do a batch of images, can i set it up so that i can choose which images out of the batch to upscale (in the same workflow)?
for example, a batch of 5 images takes 20 seconds on the first step, but the next step of upscaling takes about 2 minutes per image.
i dont want to upscale all 5 images, but choose images before proceeding to the next step. is it possible to set the workflow to "pause" and wait for me to choose images for the next step?

#

i guess the alternative is to just do the batch, save the files you want, and upscale them in a different workflow?

stable coyote Apr 20, 2024, 11:03 PM

#

tall condor anyone recently installed kohyass on ubuntu 22 or ubuntu20?

sudo ubuntu-drivers install nvidia:525
in kohya_ss venv:
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 xformers==0.0.23.post1 --extra-index-url https://download.pytorch.org/whl/cu121

tall condor Apr 21, 2024, 4:28 PM

#

I think that there is a general problem in kohya with multi GPU
I tested 4 versions: 24.x.x, 22.x.x and 21.x.x
same machine, same CUDA, same driver, same GPUs (4x4090)
kohya 21 takes 1:50, kohya 22 and 24 take ~28:00
it's like 15 times slower
any ideas?
I tested under Ubuntu 22.04, Ubuntu 20.04 and Windows 10. no matter what I do, I cannot get the speeds back to the speeds of 21
I also tested with CUDA 12 and CUDA 11
anyone got any ideas?
I even tested on 2 systems, one with 4 GPUs and one with 2 GPUs, one Intel, one AMD
so I'm quite sure that everybody will run into this issue
tested gloo and nccl

#

jade hornet Apr 21, 2024, 8:38 PM

#

I had no idea multi gpu was supported

oblique jay Apr 22, 2024, 1:47 AM

#

I am training a LoRA in OneTrainer with 450 images and 150 epochs at batch size 2, and it consumes 10.4 GB of the 12 GB of VRAM that my RTX 3060 has. The problem is that it is very slow. Is there any way to increase the batch size while using the same amount of VRAM, or at most the full 12 GB, to make the training faster?

jade hornet Apr 22, 2024, 3:33 AM

#

is it working? if so rejoice. why stress about the speed, you got a hot date coming up?

oblique jay Apr 22, 2024, 4:37 AM

#

jade hornet is it working? if so rejoice. why stress about the speed, you got a hot date c...

Is it normal that it takes 36 hours to do that training? It's SDXL, 450 images, 150 epochs, batch size 2, 12 GB VRAM and the prodigy optimizer

jade hornet Apr 22, 2024, 1:06 PM

#

oblique jay Is it normal that it takes 36 hours to do that training? es sdxl 450 images, 150...

that seems long yah, I'd think a 3090 would get through that in about 30 mins. I often do training on a cloud gpu for that reason

stiff dust Apr 22, 2024, 1:54 PM

#

no, that's normal. You train way too many epochs. In total you have 33750 steps that could take between 20-40 hours depending on your gpu
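The step count quoted above follows directly from the numbers given earlier (450 images, 150 epochs, batch size 2), and the reported 36 hours then implies a per-step time. A small sketch of that arithmetic, assuming 1 repeat per image; the 30-epoch comparison at the end is a hypothetical value chosen only for illustration.

```python
def training_steps(num_images, epochs, batch_size, repeats=1):
    """Optimizer steps for a run, assuming each image is seen `repeats` times per epoch."""
    return (num_images * repeats * epochs) // batch_size


def seconds_per_step(total_hours, steps):
    """Average time per step implied by a total run time."""
    return total_hours * 3600 / steps


def hours_for(steps, sec_per_step):
    """Projected run time at a given pace."""
    return steps * sec_per_step / 3600


steps = training_steps(num_images=450, epochs=150, batch_size=2)
pace = seconds_per_step(total_hours=36, steps=steps)

# Hypothetical shorter run: 30 epochs at the same pace.
shorter = hours_for(training_steps(450, 30, 2), pace)

print(steps, round(pace, 2), round(shorter, 1))  # 33750 3.84 7.2
```

At roughly 3.84 seconds per step, the run time scales linearly with the epoch count, so cutting epochs is the most direct way to shorten it.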

dusky urchin Apr 22, 2024, 3:56 PM

#

oblique jay I am training a Lora in One trainer with 450 images and 150 epochs with 2 batch ...

what are you trying to generate?

dusky urchin Apr 22, 2024, 4:03 PM

#

tall condor i thik that there is a general problem in kohya with multi gpu i tested 4 versio...

hmm what do you mean you tested under the different operating systems? what is your goal?

dusky urchin Apr 22, 2024, 4:04 PM

#

tall condor i thik that there is a general problem in kohya with multi gpu i tested 4 versio...

use kohya 21 since it at least works, and then use geohot's p2p patch for the 4090

#

don't waste your time on updates in kohya

oblique jay Apr 22, 2024, 4:44 PM

#

dusky urchin what are you trying to generate?

a concept

dusky urchin Apr 22, 2024, 5:15 PM

#

oblique jay a concept

can you be more specific?

oblique jay Apr 22, 2024, 5:19 PM

#

dusky urchin can you be more specific?

It is a specific style of photography with an aesthetic atmosphere in the landscape and some people posing in the background in an aesthetic way.

dusky urchin Apr 22, 2024, 5:25 PM

#

oblique jay It is a specific style of photography with an aesthetic atmosphere in the landsc...

hmm that is pretty vague

#

do you like the results you are getting?

oblique jay Apr 22, 2024, 5:27 PM

#

dusky urchin do you like the results you are getting?

The training is not over yet, there are 21 hours left

dusky urchin Apr 22, 2024, 5:28 PM

#

oblique jay The entertainment is not over yet, there are 21 hours left

without any more details about what you are trying to do, it is hard to say how to make things go faster

oblique jay Apr 22, 2024, 6:18 PM

#

images similar to this and sometimes people appear, they are not just landscapes

tall condor Apr 22, 2024, 8:19 PM

#

geohot's p2p patch seems interesting but im currently on windows

#

however this with nccl could give quite a speed bump.

#

can anyone share his settings for dreambooth finetune sdxl from scratch

#

especially with ppl in it

arctic venture Apr 22, 2024, 9:47 PM

#

neon quail thank you this is really helpful, I spent six days and 12 hours every day to sep...

Final 1000!!

how many are the complete set?

dusky urchin Apr 23, 2024, 5:38 AM

#

oblique jay images similar to this and sometimes people appear, they are not just landscapes

lol that's pretty fun

#

can you remove the text? then, do unet only training.

oblique jay Apr 23, 2024, 7:04 AM

#

dusky urchin can you remove the text? then, do unet only training.

Yes, that's what I did. I removed all the text from each image

covert pagoda Apr 23, 2024, 7:19 AM

#

has anyone here tried using SDPA instead of xformers for cross attention, in Kohya training? https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html?

digital glade Apr 24, 2024, 7:15 AM

#

Hi there

#

Is anyone comfortable with finetuning with kohya ss? (full finetune)

charred sleet Apr 25, 2024, 2:27 AM

#

Does anyone know if 225 is a hard limit for max token length for lora training? I've been wanting to try out natural language captions for lora training and some of mine are a fair bit over that limit. kohya_ss seems to cap it at 225, but I don't know if that's just a UI limit

latent charm Apr 25, 2024, 7:22 AM

#

75 is the clip limit
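The two numbers in this exchange are related: 225 is three windows of 75 tokens. The sketch below only does that arithmetic. It assumes the trainer handles long captions by splitting them into 75-token windows (CLIP's 77-token context minus the start and end tokens) and dropping anything past the configured cap; whether kohya_ss does exactly this has not been verified here, and both function names are hypothetical.

```python
import math

# Usable tokens per CLIP window: 77-token context minus start and end tokens.
CLIP_CHUNK_TOKENS = 75


def chunks_needed(num_caption_tokens):
    """Number of 75-token windows needed to hold a caption (assumed chunking scheme)."""
    return max(1, math.ceil(num_caption_tokens / CLIP_CHUNK_TOKENS))


def tokens_dropped(num_caption_tokens, max_token_length=225):
    """Tokens that would be cut off if the trainer truncates at max_token_length."""
    return max(0, num_caption_tokens - max_token_length)


for n in (60, 225, 300):
    print(n, chunks_needed(n), tokens_dropped(n))
```

Under these assumptions a 300-token caption would lose its last 75 tokens at a cap of 225, so the most important details are safest near the start of the caption.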

viscid mica Apr 25, 2024, 9:24 AM

#

I have installed the following extension in Stable Diffusion, but no LoRA is generated.

https://github.com/liasece/sd-webui-train-tools

As shown in the attachment, nothing is generated. [Ending job] is displayed in the command prompt, but in train-tools there is no change even an hour after "A: 4.34 GB, R: 4.48 GB, Sys: 6.1/23.9883 GB (25.4%)" is displayed.

What I confirmed
・Stable Diffusion version change
(1.8.0→1.6.0→1.5.1→1.7.0)

・Can the Train base model be used to generate Lora unless it is adapted?

GitHub

GitHub - liasece/sd-webui-train-tools: The stable diffusion webui t...

The stable diffusion webui training aid extension helps you quickly and visually train models such as Lora. - liasece/sd-webui-train-tools

copper tangle Apr 25, 2024, 4:58 PM

#

hi all. i've lost my patience with Kohya. in my images folder i have pngs with corresponding .txt files that match filenames (UTF-8, permissions are fine). for whatever reason when i begin training, the caption files show up as missing. anyone else experience this? any solutions? training goes fine otherwise.

dusky urchin Apr 25, 2024, 6:47 PM

#

copper tangle hi all. i've lost my patience with Kohya. in my images folder i have pngs with c...

can you be more specific? what error do you see

dusky urchin Apr 25, 2024, 6:48 PM

#

viscid mica I have installed the following extensions in Stable Diffusion, but Lora is not g...

what is your goal?

copper tangle Apr 25, 2024, 7:44 PM

#

dusky urchin can you be more specific? what error do you see

In Terminal I'm getting this message: "WARNING No caption file found for 140 images. Training will continue without captions for these images. If class token exists, it will be used." (logged from train_util.py:1459) even though there are indeed caption files for each of the images living in that same folder with the same filenames.

dusky urchin Apr 25, 2024, 7:46 PM

#

copper tangle In Terminal I'm getting this message: " WARNING No caption file found for 140...

what are you trying to train?

charred sleet Apr 25, 2024, 7:46 PM

#

copper tangle In Terminal I'm getting this message: " WARNING No caption file found for 140...

do you have the caption extension specified as ".txt"? the default setting is ".caption"

copper tangle Apr 25, 2024, 8:21 PM

#

dusky urchin what are you trying to train?

training a set of illustrations (testing really, I don't have faith the style will work). the illustrations are pngs accompanied by txt files.

copper tangle Apr 25, 2024, 8:30 PM

#

charred sleet do you have the caption extension specified as ".txt"? the default setting is "....

GAH that's it. thank you so much!

copper tangle Apr 25, 2024, 8:41 PM

#

charred sleet do you have the caption extension specified as ".txt"? the default setting is "....

I updated caption.py so the default is .txt but still not working. Must be hardcoded somewhere else

copper tangle Apr 25, 2024, 9:11 PM

#

Updated this file too: "merge_captions_to_metadata.py" but still didn't fix the problem :/

charred sleet Apr 25, 2024, 9:14 PM

#

copper tangle I updated caption.py so the default is .txt but still not working. Must be hardc...

are you not using the gui? There's a setting in the gui to choose which extension to use

#

it's also a command line arg if you're using sd-scripts I think

#

I believe caption.py is used for kohya_ss's caption-file generating tool, so it wouldn't affect training for imagesets with existing captions. I'd have to take a closer look to see what merge_captions_to_metadata does, but from a skim, I think it is used to help generate the output lora metadata, so not directly involved in training

#

it's best not to modify the repo files in general. Changes will make your local copy out of sync with the repo, possibly introducing bugs that are hard to diagnose

#

(the caption extension parameter is under the "Parameters" section with the name "Caption file extension". The command line arg is --caption_extension)
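A quick way to check a dataset folder before training is to look for images that have no caption file with the configured extension. The sketch below is a standalone helper, not part of kohya_ss. The function name and the set of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image types; adjust to whatever the dataset contains.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def find_missing_captions(folder, caption_extension=".txt"):
    """Return names of images that lack a same-named caption file."""
    missing = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        # photo.png -> photo.txt (or photo.caption, depending on the setting)
        if not image.with_suffix(caption_extension).exists():
            missing.append(image.name)
    return missing
```

Calling it once with `".txt"` and once with `".caption"` shows which extension the captions in the folder actually use, which is the mismatch behind the warning discussed above.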

dusky urchin Apr 25, 2024, 10:00 PM

#

copper tangle training a set of illustrations (testing really, I don't have faith the style wi...

can you share one?

#

and a corresponding caption

#

have you tried an online service for this? a LoRA training costs like $0.50

viscid mica Apr 26, 2024, 12:37 AM

#

dusky urchin what is your goal?

I would like to be able to generate Lora with train-tools.
But if it doesn't seem to work now, I'll consider another Lora generation tool.

copper tangle Apr 26, 2024, 1:27 PM

#

charred sleet I believe caption.py is used for kohya_ss's caption-file generating tool, so it ...

Omg, can't believe I missed that. You're right, I see it now under Parameters in the GUI. Guess I should undo my file edits! Thank you so much for your help 💚

copper tangle Apr 26, 2024, 1:29 PM

#

dusky urchin can you share one?

Solved! Can't share one bc it's for a work project. Trying to learn to do everything on my own for learning's sake. Otherwise I'd give up and use something online 🙂

charred sleet Apr 26, 2024, 1:43 PM

#

copper tangle Omg, can't believe I missed that. You're right, I see it now under Parameters in...

you're welcome!

drifting portal Apr 27, 2024, 8:07 AM

#

Hey everyone 😊. Does anyone know if I can use 300dpi image to train LoRa? Or can point to some documentation that goes over size and resolution? Thanks

charred sleet Apr 27, 2024, 12:52 PM

#

what matters for lora training is resolution and quality. sd1.5 loras should be trained on images with at least 512x512 resolution, and 768x768 is often recommended. sdxl should be trained on 1024x1024. You can train on higher res images and it'll be fine (they'll be bucketed by aspect ratio and scaled down to hover around the training resolution), but lower res images can be bad for the output lora quality.
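To make the bucketing remark concrete, here is a rough sketch of how a trainer might pick a bucket size for an image: keep the aspect ratio approximately, keep the pixel count at or below the training resolution squared, and snap both sides to a grid. The function name, the 64-pixel step, and the rounding choices are assumptions for illustration. Actual trainers differ in the details.

```python
import math


def bucket_size(width, height, resolution=1024, step=64):
    """Pick a (w, h) bucket on a `step` grid with area <= resolution**2."""
    aspect = width / height
    max_area = resolution * resolution
    # Ideal width for the target area at this aspect ratio, rounded down to the grid.
    bucket_w = max(step, int(math.sqrt(max_area * aspect)) // step * step)
    # Largest grid-aligned height that keeps the area within the budget.
    bucket_h = max(step, int(max_area / bucket_w) // step * step)
    return bucket_w, bucket_h
```

With these assumptions a 3000x3000 scan and a 1024x1024 image land in the same 1024x1024 bucket, which is why the pixel dimensions matter for training and the DPI value stored in the file does not.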

#

for documentation, I dunno if there's any one guide that's been helpful for everything, but there're a lot of videos and articles online covering some basics. https://www.reddit.com/r/StableDiffusion/comments/11vw5k3/lora_training_guide_version_3_i_go_more_indepth/ is a popular one, and https://rentry.co/59xed3 for a bit more in-depth details. https://civitai.com/articles/3522/valstrixs-crash-course-guide-to-lora-and-lycoris-training seems nice too

From the StableDiffusion community on Reddit: LoRA training guide V...

Explore this post and more from the StableDiffusion community

THE OTHER LoRA TRAINING RENTRY

Stable Diffusion LoRA training science and notes
By yours truly, The Other LoRA Rentry Guy.
This is not a how to install guide, it is a guide about how to improve your results, describe what options do, and hints on how to train characters using bad or few images.
Due to the higher prevalence of...

Valstrix's Crash-Course Guide to LoRA (& LyCORIS) Training | Civitai

After copy-pasting a guide I wrote in discord several times, I think it's time I consolidated and expanded on it here on Civit. With so many guides...

covert pagoda Apr 27, 2024, 5:23 PM

#

does anyone have experience training with lion optimiser for style models?

charred sleet Apr 28, 2024, 4:28 PM

#

anyone know if there's a trainer that allows multiple text captions for the same image? I'm experimenting with natural language captioning and the regular "keep tokens" and "shuffle caption" functionality won't work for that. I know you can just copy the image and write new captions for each image, but that messes with the repeats and adds more images to be cached

EDIT: nvm, figured out the option: sd-scripts has --enable_wildcard which looks like it can do this.
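The idea of several captions per image can be sketched independently of any trainer. The example assumes a caption file holds one alternative caption per line and may contain `{a|b}` choices, which is my understanding of what the `--enable_wildcard` option mentioned above accepts. Check the sd-scripts documentation for the exact rules, since this is not its implementation and `pick_caption` is a hypothetical name.

```python
import random
import re

# Matches one {option|option|...} group without nested braces.
_WILDCARD = re.compile(r"\{([^{}]*)\}")


def pick_caption(caption_text, rng=random):
    """Choose one non-empty line at random, then resolve {a|b} choices in it."""
    lines = [ln.strip() for ln in caption_text.splitlines() if ln.strip()]
    if not lines:
        return ""
    line = rng.choice(lines)
    return _WILDCARD.sub(lambda m: rng.choice(m.group(1).split("|")), line)
```

Because the choice is made each time the image is loaded, the image is cached once and the repeat count stays unchanged, which avoids the duplication problem described in the message.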

copper tangle Apr 28, 2024, 6:58 PM

#

stupid question when someone has a moment. is it possible to train simple 2D vector icons (after converting them to png)? i'm talking like, very simple as in graphic of a globe/smartphone/flower/envelope, on white/black backgrounds, etc. All in the same color palette and not very detailed. they're not truly "images" so i assume not...

charred sleet Apr 28, 2024, 8:16 PM

#

copper tangle stupid question when someone has a moment. is it possible to train simple 2D vec...

I'm not sure what you mean by "not truly images". icon loras are out there though: https://civitai.com/models/49021/minimalist-icons
https://civitai.com/models/141066/game-icon

Minimalist icons - v1.0 | Stable Diffusion LoRA | Civitai

This is a Lora-model for creating minimalist icons. You will no longer have copyright issues. Just generate icons and use them! Trained on the Deli...

Game icon - v1.0 | Stable Diffusion LoRA | Civitai

Prompt 2d icon. {your prompt}. lora:game_icon_v1.0:1 The shorter the prompt, the better. Use the "2d icon" modifier at the beginning. Then ...

wet wadi Apr 28, 2024, 9:22 PM

#

I'm using Kohya, trying various types of LoRAs. So usually, the sample images are at best, a rough indication of how the training is going and a means to tell when the model is over fitted. We expect them to be pretty bad.

What do you do when the training samples look more like the target than anything you can generate in A1111 or Comfy?
Just to be clear, the training samples look rough, but they get the face right. She has an uncommon face, but the sample images look like bad pictures of her, while in A1111 and Comfy, the facial structure, lips, nose just isn't right. Like it's been overly normalized by the rest of the model.

wet wadi Apr 28, 2024, 9:32 PM

#

charred sleet anyone know if there's a trainer that allows multiple text captions for the same...

Thanks. I've been wanting to try something like that.

charred sleet Apr 29, 2024, 4:20 AM

#

wet wadi I'm using Kohya, trying various types of LoRAs. So usually, the sample images a...

I'd say double-check that the checkpoint you trained the lora on is the same as the one you're generating with (default training checkpoint is base 1.5). After that, check that the sampler, cfg scale, etc. in the sample settings are the same as the ones you're using in automatic/comfy.

copper tangle Apr 29, 2024, 2:34 PM

#

charred sleet I'm not sure what you mean by "not truly images". icon loras are out there thoug...

thanks, those are some great references. "not truly images" meaning they are just flat solid color 2d svgs. nothing really photorealistic about them. but seeing there are icon loras out there gives me some hope that it's possible!

jade hornet Apr 29, 2024, 3:28 PM

#

wet wadi I'm using Kohya, trying various types of LoRAs. So usually, the sample images a...

Just more steps then, and if it burns out before it gets it right, you need to either slow down the LR or tweak your data set to have better angles

dusky urchin Apr 29, 2024, 5:49 PM

#

jade hornet Just more steps then, and if it burns out before it gets it right, you need to e...

i've been reading some hero's thread about improving the loss function for fine tuning, which is definitely appearing to work well for my training

#

i don't think arbitrarily stopping the training is a good idea in general and it's been limiting the capability of fine tuning for a long time

hoary ember Apr 29, 2024, 9:14 PM

#

I am trying to create a fine-tuned model based on SDXL (using either KohyaSS or Huggingface libraries). I have a captioned set of ~500k images in ~400 categories that I want to train on to create an initial checkpoint.

I have a couple of questions regarding how to prepare images as far as cropping/resizing the image dataset to prepare it for training:

All of my images are available in full 1MP resolution but in a variety of different aspect ratios (e.g. 1024x1024, 1280x720). Are square images generally best? Should I train on both the uncropped full version and a version that is cropped to square? For cropping to square, is it better to use random cropping, center cropping, or to use object segmentation and try to crop around important subjects?

#

Also, is it beneficial to train on smaller copies of images, so that the model is getting trained on generating the subject at all resolutions? For instance, if I have a 1024x1024 image, is there any benefit to also creating a 512x512 and 256x256 copy of this image and training on those too? I was thinking this might improve generation of the subject at lower resolutions, but was worried about it overfitting with repeated use of same image.

stone garden Apr 29, 2024, 9:14 PM

#

hoary ember I am trying to create a fine-tuned model based on SDXL (using either KohyaSS or ...

jesus chill tf out

#

Stop writing bibles here

hoary ember Apr 29, 2024, 9:15 PM

#

lol, it's you that needs to chill my dude. You're spending too much time on twitter if you think writing a couple short paragraphs about a complex issue is a "bible"

stone garden Apr 29, 2024, 9:15 PM

#

hoary ember lol, it's you that needs to chill my dude. You're spending too much time on twit...

Do something about it big guy

hoary ember Apr 29, 2024, 9:15 PM

#

I don't need to do anything, you're just an angry little yapper with nothing better to do than be an annoyance to someone seeking info on a technical question. Get a life.

stone garden Apr 29, 2024, 9:16 PM

#

let's fight

#

you can't do anything exactly, shhhh now

#

🤫

latent charm Apr 30, 2024, 10:55 AM

#

hoary ember I am trying to create a fine-tuned model based on SDXL (using either KohyaSS or ...

kohya_ss implements bucketing for a variety of aspect ratios, but you could also do the bucketing with your own implementation before feeding your dataset to the kohya_ss script. Different cropping strategies make very little difference. The default is center crop, and again, if you implement your own bucket script, you can try different strategies.
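For anyone writing their own preprocessing as suggested here, the two simplest square-crop strategies from the question reduce to choosing a crop box. A minimal sketch follows, with hypothetical function names. The boxes use the (left, top, right, bottom) convention that image libraries such as Pillow accept for cropping.

```python
import random


def center_crop_box(width, height):
    """Largest centered square, as (left, top, right, bottom)."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def random_crop_box(width, height, rng=random):
    """Largest square at a random offset along the longer axis."""
    side = min(width, height)
    left = rng.randint(0, width - side)
    top = rng.randint(0, height - side)
    return left, top, left + side, top + side
```

Both variants discard the same amount of the image; they differ only in which part is kept, so a subject-aware crop would need a detector to choose the offset instead.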

latent charm May 3, 2024, 10:58 AM

#

I have an idea. I want to transfer the pose from an anime style image to a realistic style image. Both images are created by the same model with the same prompt but a different style. Does anyone have an idea how to achieve that via training?

humble holly May 3, 2024, 3:58 PM

#

Hi, I would like to learn to fine tune a model to modify existing images. Where can I start to learn to do this?

jade hornet May 3, 2024, 8:40 PM

#

latent charm I have an idea. I want to transfer the pose from the anime style to realistic st...

You can train a pose using subject style captions, but why not just use controlnet with canny and openpose

latent charm May 4, 2024, 1:43 AM

#

jade hornet You can train a pose using subject style captions, but why not just use controln...

I have done a training run with some img2img and the result is pretty good.

pine night May 4, 2024, 8:27 PM

#

Hello. Are the stable diffusion models not available for fine-tuning through the API?

jade hornet May 4, 2024, 9:15 PM

#

they should be, if you're into that it should be on modelslab or whatever. but if there's an issue, you dont need the api

pine night May 4, 2024, 9:26 PM

#

Thanks for the reply. Do you mean that we don't need the api to fine tune the diffusion models?

jade hornet May 4, 2024, 9:34 PM

#

correct

pine night May 4, 2024, 10:51 PM

#

I found that we could fine tune the models through HuggingFace. But I am supposed to do it on a huge dataset. That is why I was looking for a solution where I don't have to write the code to do the distributed training and manage the necessary compute myself.

gentle flame May 5, 2024, 2:53 AM

#

sfw danbooru dataset
https://huggingface.co/datasets/CaptionEmporium/anime-caption-danbooru-2021-sfw-5m-hq

dusky urchin May 5, 2024, 3:05 AM

#

copper tangle thanks, those are some great references. "not truly images" meaning they are jus...

are you trying to say training on the noun project stuff?

#

what is your goal?

stable cloud May 5, 2024, 3:11 PM

#

Hi guys, I am new here. Can anybody help me prevent text from being generated on images from Stable Diffusion?

Tell me if we can finetune SDXL to not produce text on any image. Stable Diffusion is very bad at producing text, so can we somehow stop SDXL from producing any kind of text on the generated image? Currently I am using a negative prompt:

NEGATIVE PROMPT = "text, fonts,words,3d, cartoon, anime, (deformed eyes, nose, ears, nose), bad quality,bad anatomy, ugly"

but it does not follow the negative prompt very well.
I want to generate Canva templates for Christmas, Halloween, etc. with no text on them, but it always adds text with wrong spelling.

empty horizon May 6, 2024, 6:31 AM

#

Hello everyone. Can anyone suggest a tool with which I can create layers through AI, similar to Photoshop?

stone garden May 6, 2024, 10:20 AM

#

I think I was able to install kohya_ss, but it seems like the version is wrong and the installation URL is displayed.
Which one should I download?

stone garden May 6, 2024, 10:46 AM

#

Please let me ask you an additional question.
After updating to Python 3.10.11, Stable Diffusion no longer starts...
I was prompted to "Press any key" on the command prompt screen, so I pressed the enter key, but the screen just closed and the SD did not start.

📎 SD.txt

stone garden May 6, 2024, 11:29 AM

#

I will write down my machine specs.

Model number: ILeDEs-M07M-A134-SASXB
CPU: Intel Core i5-13400
Memory: 16GB
HDD: 8TB
Graphics card: GeForce RTX 3060 Ti 8GB

If you receive a reply with a quote, you will receive a notification and it will be easier to notice.

empty horizon May 7, 2024, 12:45 PM

#

Hello can anyone suggest same feature that Runway uses for Erase and replace (ai-tools/erase-and-replace) in Stable Diffusion sdxl? I have used inpainting but i cannot replicate the same through prompt which runway does.

jade hornet May 7, 2024, 6:07 PM

#

Inpainting is the answer, despite it working differently, that's the workflow you would use

ocean dune May 7, 2024, 7:02 PM

#

drifting portal Hey everyone 😊. Does anyone know if I can use 300dpi image to train LoRa? Or ca...

Iirc you just want to downsize it to say 1024x1024 along with the rest of the images if you desire the model to be SDXL, or use OG resolution tall, if you desire it to natively be higher res

drifting portal May 7, 2024, 7:22 PM

#

ocean dune Iirc you just want to downsize it to say 1024x1024 along with the rest of the im...

Thank you 😊

stable cloud May 9, 2024, 2:06 PM

#

Hi guys, can anyone help me? I want to finetune SDXL so it generates every image with a "solid color empty background" where I can put text later. I want Stable Diffusion to give me a result like this:

#

I generated this image with the prompt "solid color background, christmas sales template,soft lightning,8k", but this type of prompt does not work if I want to make a Father's Day template or a Halloween template. I need this kind of result for every image generation, with room for the text on the image.

#

Is it possible to finetune SDXL to get this type of result? And one more thing: there should not be any text on any image.

#

Would finetuning a LoRA be better, or a full finetune?

dusky urchin May 9, 2024, 2:26 PM

#

is there a captioning tool or web based UI that anyone likes?

dusky urchin May 9, 2024, 2:27 PM

#

stable cloud Hi Guyz can any one help me i want to finetune sdxl so it generate every image w...

you do not need to train this. have you tried layered diffusion?

stable cloud May 9, 2024, 2:28 PM

#

dusky urchin you do not need to train this. have you tried layered diffusion?

no i never tried layered diffusion

#

can you tell me what is layered diffusion ?

#

is it the type of finetuning method

dusky urchin May 9, 2024, 2:47 PM

#

stable cloud can you tell me what is layered diffusion ?

did you try googling it?

dusky urchin May 9, 2024, 2:48 PM

#

stable cloud Hi Guyz can any one help me i want to finetune sdxl so it generate every image w...

also have you tried ideogram? what is this for?

stable cloud May 9, 2024, 2:49 PM

#

seriously i don't even know ideogram, i am a new guy, i only know lora finetuning

dusky urchin May 9, 2024, 2:51 PM

#

stable cloud seripously i dont even know ideogram, i am new giuy i onlyknow lora finetuning

what is your goal?

#

i mean why do you want to generate greeting cards?

stable cloud May 9, 2024, 3:43 PM

#

@dusky urchin i want to generate greeting cards and put text on these types of templates so i can use them anywhere, like in my shop to display a christmas discount offer or anything

pliant drift May 10, 2024, 1:40 AM

#

this ic_light model might change the game for synthetic dataset creation

ruby moth May 10, 2024, 7:10 AM

#

So blip 3 just dropped, looks like another step up for captioning.

https://twitter.com/CaimingXiong/status/1788745828007645578

Caiming Xiong (@CaimingXiong) on X

We introduce #BLIP3, a series of large multimodal models (LMMs) developed by Salesforce AI Research.

#BLIP3 is a new SOTA model under 5B on few-shot learning and multimodal benchmarks.
Check our first HF release at https://t.co/uyZmC33zak, and stay tuned for the coming technical…

stone garden May 11, 2024, 1:58 AM

#

I installed Forge, but the following problem occurs.
・“Error Connection errored out.” occurs frequently.
・I installed sd_xl_base_1.0.safetensors and Pony Diffusion V6 XL, but LORA does not appear (F:\webui_forge\webui\models\Lora)

I've been struggling for about a week now.
help me! (It will be easier to understand if you reply with a quote)

📎 Python_3.10.6_tagsv3.10.69c7b4bd_.txt

livid rapids May 11, 2024, 7:51 AM

#

Who do I have to beg to try an implementation of this? https://twitter.com/rasbt/status/1758502685995589698

I noticed it months ago but haven't seen any support for it in training repos and don't have the skill to implement it myself

Sebastian Raschka (@rasbt) on X

While everyone is talking about Sora, there's a potential successor to LoRA (low-rank adaptation) called DoRA. Here's a closer look at the "DoRA: Weight-Decomposed Low-Rank Adaptation" paper: https://t.co/Mmjhy3xTpd

LoRA is probably the most widely used parameter-efficient…

livid rapids May 11, 2024, 11:51 AM

#

nvm it's actually been supported for a while I just missed it

drifting mirage May 11, 2024, 8:07 PM

#

Hi! Is it possible to pause and resume train in OneTrainer? I would like to pause the training and test the model in a real workflow and if it is not trained enough, then resume training from the same place

fallen halo May 12, 2024, 4:20 AM

#

Hi everyone! I need some help fine tuning stable diffusion for a product. Is there anyone that can help? Appreciate you all!

dapper prism May 12, 2024, 7:53 PM

#

I just added a standalone version of the greedy search bad caption detection script that my CogVLM captioning tool uses: https://github.com/ProGamerGov/VLM-Captioning-Tools/blob/main/bad_caption_finder.py
You can use the script to determine the extent of the issue in your own datasets!

#

Note that the greedy search caption failure issue is present in all automatic captioning tools to varying degrees, and it can impact up to 3% or more of your total dataset

dapper prism May 12, 2024, 9:24 PM

#

For those who don't know what greedy search is, all you have to know is that the greedy search caption failure shows up as captions with endlessly repeating letters, characters, phrases, or sentences. Greedy search is used in all VLMs and captioning models currently available
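A simple heuristic for spotting this kind of failure is to count how often the same short word sequence recurs in a caption. The sketch below is only an illustration of that idea and is not the method used by the linked bad_caption_finder.py script. The function name and the default thresholds are assumptions that would need tuning per dataset.

```python
from collections import Counter


def has_runaway_repetition(caption, ngram=3, max_repeats=4):
    """Flag captions where any word n-gram occurs max_repeats times or more."""
    words = caption.lower().split()
    if len(words) < ngram:
        return False
    counts = Counter(
        tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1)
    )
    return max(counts.values()) >= max_repeats
```

This word-level check does not catch repeated single characters inside one token (for example a long run of the same letter), so a character-level pass would be needed for that case.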

fallen halo May 13, 2024, 5:34 PM

#

Hey! 🆘 I'm working on a project with stable diffusion Finetuning and ControlNet and need some HELP. If you're experienced with these, I’d appreciate your input. Thanks!

gentle flame May 16, 2024, 8:05 AM

#

GPT4O is a lot less restrictive with what it's willing to caption

gentle flame May 20, 2024, 3:40 PM

#

new cogvlm is out
https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B

gentle flame May 22, 2024, 5:47 PM

#

https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

torpid basalt May 22, 2024, 6:06 PM

#

Does anyone know a guide on dreambooth for artstyle? I want to do the fine-tuning with some real paintings and the trained model must match the artist's painting technique!

knotty pivot May 22, 2024, 6:37 PM

#

Does anyone know how to create exact variants of a particular image. For example, if I want to create exact variants of a shirt in this format:

#

When remixing, I have not been able to get the exact orientation of the template

#

Please PM me!

hybrid shard May 23, 2024, 7:01 PM

#

knotty pivot Does anyone know how to create exact variants of a particular image. For example...

Maybe start with a blank one and use controlnet to generate the remixes? That'd be my first thought.

dapper prism May 23, 2024, 9:16 PM

#

gentle flame new cogvlm is out https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B

still has all the same issues with lack of batch inputs, CCP license, and others

hollow spruce May 27, 2024, 10:03 PM

#

@naive slate need a mod for this ^^

#

thank you to whichever mod removed it ❤️

pliant charm May 30, 2024, 2:03 AM

#

give me a office task for employee working and decrease heat

rocky yew May 31, 2024, 2:57 AM

#

I am curious, are there any good tutorials on how to fine tune a lora? I am in the process of redoing an old one and I want to fix it up, how do I go about this? It was trained on Kohya at basically the defaults, 22 images, no regularization images, and 4400 steps. I would like to know how to fix it up so it looks a bit cleaner (at the moment it looks quite whack). The images used were large and clear

jade hornet May 31, 2024, 3:19 PM

#

There are many things that could have caused the training to not produce what you intended. Maybe a different training model. Maybe it just needed more steps. Maybe some of the photos are too far apart in concept and the AI was confused about what it was supposed to learn. Maybe your captions need tweaking so there is more guidance on the particular scenarios that you wish to produce later. There's probably not a guide you will find that tells you exactly how to improve your specific scenario

polar nova Jun 2, 2024, 4:34 AM

#

Hey, I was wondering if it is possible to train a lora to a person( full body, close ups, etc.) and if so, how to go about it and what checkpoint is better at realistic photography for this job.

jade hornet Jun 4, 2024, 1:37 AM

#

polar nova Hey, I was wondering if it is possible to train a lora to a person( full body, c...

1. yes it's possible
2. you would find a tutorial on kohya_ss, I can find you a decent one if you struggle
3. I'd say juggernaut is pretty good at people and realism and responds well to training

torpid basalt Jun 6, 2024, 11:22 PM

#

what's the max number of images i can use to train a style using dreambooth?

stiff dust Jun 7, 2024, 9:12 AM

#

torpid basalt what the max number of images do i can use to train a style using dreambooth?

infinite?

stiff dust Jun 7, 2024, 9:13 AM

#

polar nova Hey, I was wondering if it is possible to train a lora to a person( full body, c...

I always train on sdxl_base. As long as you don't use any extremely fancy model you can use your base loras on other checkpoints, too. In particular with Juggernaut I had really bad experiences, training it didn't work at all. But maybe that changed with newer versions. At least old versions of juggernaut used some stupid pyramidal noise that fucked up any finetuning attempts

hot breach Jun 7, 2024, 10:41 PM

#

torpid basalt what the max number of images do i can use to train a style using dreambooth?

at some point for a single subject you may have diminishing returns, but if you just use supervised finetuning (labeled images without the "regularization" images proposed by dreambooth) you can train as many things at once as you want

torpid basalt Jun 7, 2024, 11:48 PM

#

hot breach at somepoint for a single subject you may have diminishing returns, but if you j...

Thanks! I separated some paintings into several 512x512 samples for the model to focus on the artist's technique, but I have like 700 samples to use in the finetuning

empty horizon Jun 10, 2024, 6:54 AM

#

Hello everyone, can you suggest me a good model for open pose ?

white ocean Jun 10, 2024, 7:09 AM

#

torpid basalt Thanks! I separated some paintings into several 512x512 samples for the model to...

Have you considered doing the 700 samples as a lora?

full trellis Jun 10, 2024, 9:35 AM

#

Hello, I have a problem with kohya_ss. I started making my first model today, I did everything as in the guide, and my model was ready in 3 seconds while in others it takes up to 9 hours

my monitor is small so sorry for these logs :(

#

📎 New_Text_Document.txt

torpid basalt Jun 10, 2024, 3:57 PM

#

white ocean Have you considered doing the 700 samples as a lora?

i will try it but i will train on a good pc, so probably i will try to train using the 700 samples in normal way too

valid sentinel Jun 12, 2024, 3:41 PM

#

Hi, I'm using an ID-preserving pipeline with ControlNet (canny|pose) and SDXL for a use case that involves only one reference image for the pose and a face image to interpolate that face onto the pose. What is a good approach to fine-tune my SDXL or any other components in the pipeline to achieve better throughput? We are inferring on an image with the same prompt, so the model should generate only one use case but with a consistent pose.

tulip walrus Jun 13, 2024, 2:08 PM

#

i'm having trouble opening stable diffusion I keep getting this message

stable coyote Jun 13, 2024, 3:25 PM

#

I was able to train an SD3 LoRA using the example code train_dreambooth_lora_sd3.py, but the generated file seems to be in diffusers format, not webui/kohya. In ComfyUI I get errors like:

lora key not loaded: transformer_transformer_blocks_6_attn_to_v.alpha
lora key not loaded: transformer_transformer_blocks_6_attn_to_v.lora_down.weight
lora key not loaded: transformer_transformer_blocks_6_attn_to_v.lora_up.weight
lora key not loaded: transformer_transformer_blocks_7_attn_to_k.alpha
lora key not loaded: transformer_transformer_blocks_7_attn_to_k.lora_down.weight
lora key not loaded: transformer_transformer_blocks_7_attn_to_k.lora_up.weight
So I tried convert_diffusers_sdxl_lora_to_webui.py, but it doesn't seem to convert it properly. Any ideas or tips on how to proceed?

This model: https://civitai.com/models/512239/pixel-art-medium-128 seems to have the same issue: its file header looks similar to my generated file. I bet it was trained with the same method.

Pixel Art Medium 128 - v0.1 | Stable Diffusion LoRA | Civitai

This is an early version of "Pixel Art Medium" for SD3 Medium. Outputs 128x128 pixel art, grid-aligned images. Tips Always use "pixel art style" at...

earnest barn Jun 13, 2024, 5:09 PM

#

i started using kohya to finetune sdxl on a dataset of 343 images of my art. the resulting checkpoint kinda had the style i was looking for but lacked the details i'm hoping for, it seemed like it might require more training. since i only did 10 epochs, i'd like to train it with another 10 to see if that helps. to pick up where i left off, would i choose the first version of the model as the source model instead of sdxl base? if not, how would i go about training more on top of what i already did?

small eagle Jun 13, 2024, 9:49 PM

#

are there any newer tools for training SDXL lora's that are more simplistic than Kohya or Invoke-Training?

cold wyvern Jun 14, 2024, 5:10 AM

#

small eagle are there any newer tools for training SDXL lora's that are more simplistic than...

OneTrainer

small eagle Jun 14, 2024, 5:31 AM

#

cold wyvern OneTrainer

was just reading about that one, will check it out more

#

thanks!

river lake Jun 14, 2024, 12:24 PM

#

How are people training sd3 loras already? Can anyone tell me what they are using to train them?

cold wyvern Jun 14, 2024, 12:56 PM

#

river lake How are people training sd3 lora already can anyone tell me where they are train...

The diffusers github repo has a script

#

You will need a 3090/4090 to make a lora, and an A100 or H100 to do a full finetune

rigid matrix Jun 14, 2024, 3:38 PM

#

cold wyvern The diffusers github repo has a script

Is there any tuning sample dataset available?

#

Apart from those dog and other small ones. I would like to see a full scale one.

white narwhal Jun 15, 2024, 11:57 AM

#

Bro, is there a way to train an SD3 LoRA?

#

Waiting online, it's urgent

#

Or a tutorial document

stone garden Jun 15, 2024, 12:15 PM

#

cold wyvern The diffusers github repo has a script

damn i'll have to learn how to do that

white narwhal Jun 15, 2024, 1:06 PM

#

waow

#

Look at me

regal shore Jun 15, 2024, 4:02 PM

#

Hello everyone, is it possible to locally finetune an SDXL model on an RTX 4060 ti 8 gb GPU if the dataset has 300 images? If this is possible, then approximately how long will it take?

cold wyvern Jun 15, 2024, 10:04 PM

#

regal shore Hello everyone, is it possible to locally finetune an SDXL model on an RTX 4060 ...

I dont think you can make a lora for sdxl in 8gb, let alone a full finetune

tulip plover Jun 16, 2024, 4:20 AM

#

white narwhal Look at me

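(Replying a day late 😅)

About SD3 LoRA: right now it's really only good for an early taste. So far the only openly available training method is the diffusers one, and it has a lot of limitations, including only supporting training on a single concept. That said, the SimpleTuner developer now supports SD3 LoRA training. The prerequisite is that you have a 3090 or a card with even more VRAM. The diffusers training method is pretty weak.

The Chinese-speaking community doesn't have much good documentation on SD3 either. But if you really do have a 3090, you can run this document (official Hugging Face) through Google Translate, or go to SimpleTuner ( https://github.com/bghira/SimpleTuner )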
https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sd3.md

hollow spruce Jun 16, 2024, 1:05 PM

#

cold wyvern I dont think you can make a lora for sdxl in 8gb, let alone a full finetune

at this point you can. well kinda
OneTrainer added fused back pass support. it means the absolute minimum for actual sdxl finetuning went down to 6.3gb

its still a lot of work though, and takes quite literally forever to train. but just wanted to point out that its at least possible now

hollow spruce Jun 16, 2024, 1:08 PM

#

regal shore Hello everyone, is it possible to locally finetune an SDXL model on an RTX 4060 ...

^ In onetrainer it's possible.
but its a lot of effort to set up correctly, and will prob take an entire day to train for each attempt. (like 12~24h)

So if you really really wanna do it. it's possible. but I would recommend 16gb vram to actually have fun with training, and to avoid time consuming frustrations due to vram limits

#

LoRA would be easier to train with your limits. and will take like 1~7h to train depending on your settings. will still be a lot of effort to get it working the first time though. (but after first time, settings barely change - so if it works, it works)

river lake Jun 16, 2024, 6:35 PM

#

is it possible to use sdxl lora with sd3?

cold wyvern Jun 16, 2024, 7:18 PM

#

river lake is it possible to use sdxl lora with sd3?

You can only use the clip from the lora

river lake Jun 16, 2024, 7:28 PM

#

cold wyvern You can only use the clip from the lora

and btw is diffusers only cloud based? (newbie question)

#

also can you give me an article if there's one or a quick guide on how to install it

#

https://tenor.com/view/shenmue-shenmue-please-shenmue-sparkling-shenmue-sparkle-sparkling-gif-26169593

Tenor

cold wyvern Jun 16, 2024, 7:50 PM

#

river lake and btw is diffusers only cloud based? (newbie question)

Diffusers is a python library

river lake Jun 16, 2024, 7:51 PM

#

so I use it inside a terminal right?

river lake Jun 16, 2024, 7:52 PM

#

cold wyvern Diffusers is a python library

and the dataset required for making sd3 lora should be only about 5 images or we can add more like 50?

white ocean Jun 17, 2024, 11:08 AM

#

river lake and the dataset required for making sd3 lora should be only about 5 images or we...

I find that lora will happily accept lots of images

river lake Jun 17, 2024, 11:09 AM

#

white ocean I find that lora will happily accept lots of images

alright thanks

viral swan Jun 17, 2024, 8:45 PM

#

How much do captions help with training lora? Are they important? Mostly for style not learning a new concept

stiff dust Jun 17, 2024, 9:48 PM

#

always use captions describing the image. You can/should still use a small caption dropout
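
For anyone wondering what "caption dropout" means in practice, here is a minimal sketch, assuming the simplest form of it: with a small probability the caption is swapped for an empty string, so the model also sees some images unconditioned. The function name and the 5% rate are illustrative assumptions, not a setting taken from any specific trainer.

```python
import random

def apply_caption_dropout(caption: str, dropout_rate: float, rng: random.Random) -> str:
    """Return an empty caption with probability `dropout_rate`, else the caption unchanged."""
    if rng.random() < dropout_rate:
        return ""
    return caption

# Hypothetical dataset: the same caption repeated, just to count how often it gets dropped.
rng = random.Random(0)
captions = ["a watercolor fox in a forest"] * 1000
dropped = sum(1 for c in captions if apply_caption_dropout(c, 0.05, rng) == "")
print(f"{dropped} of {len(captions)} captions dropped")
```

Trainers usually expose this as a single rate setting, so you rarely need to write it yourself; the sketch only shows what the setting does.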

gentle flame Jun 18, 2024, 5:24 PM

#

https://ai.meta.com/blog/meta-fair-research-new-releases/?utm_source=twitter&utm_medium=organic_social&utm_content=video&utm_campaign=fair

Meta AI

Sharing new research, models, and datasets from Meta FAIR

Meta FAIR is releasing several new research artifacts. Our hope is that the research community can use them to innovate, explore, and discover new ways to apply AI at scale.

regal shore Jun 19, 2024, 11:37 AM

#

Hey guys. Can it happen that the style from the dataset isn't picked up in fine-tuning at low epoch counts (1-2 epochs)?

empty horizon Jun 20, 2024, 5:27 AM

#

Hello everyone, can anyone suggest me why my generation of mask is not applying on the left side of the image. I am keeping the dimensions - 1016 x 504 but there is this left black patch that is coming again and again

thin mantle Jun 20, 2024, 5:39 AM

#

stiff dust always use captions describing the image. You can/should still use a small capti...

What if i go really crazy on the descriptions using gpt4o

#

Like super descriptive. Will this help or is a simple obvious description all you need

stiff dust Jun 20, 2024, 6:40 AM

#

thin mantle What if i go really crazy on the descriptions using gpt4o

use the description you want to prompt with when using the model

river lake Jun 20, 2024, 6:06 PM

#

How do I extract the parameters from a lora? like comfyui images have workflow
I trained a lora once and I can't remember its settings now

white ocean Jun 21, 2024, 2:52 AM

#

thin mantle Like super descriptive. Will this help or is a simple obvious description all you...

i've been using blip2, which is pretty accurate and verbose ... i think it helps to have lengthy descriptions. I add my own tags in, as well as wd14 tags

thin mantle Jun 21, 2024, 3:03 AM

#

white ocean i've been using blip2, which is pretty accurate and verbose ... i think it helps...

Do these captioning models beat out llm vision models like gpt4o and claude 3.5 sonnet?

#

Also is this the blip-2 you are referring to: https://replicate.com/smartinezbragado/salesforce-blip2

smartinezbragado/salesforce-blip2 – Replicate

BLIP2 model trained on blip2-flan-t5-xl-coco dataset

thin mantle Jun 21, 2024, 3:36 AM

#

#

For example, which would be a better caption and why? Afterwards I'll provide you with more images to show the kinds of images I'm wanting to train for

#

Llava-13b:
The image is a colorful illustration featuring a woman with pink hair, wearing a yellow raincoat and a frog hat. She appears to be staring at the viewer with a somewhat angry expression. The woman is also wearing a nose ring, adding to her unique appearance. The illustration is likely created using a digital medium, as it has a vibrant and detailed style. The combination of the frog hat, pink hair, and yellow raincoat gives the image a whimsical and quirky vibe.

#

Gpt4o:

#

The image depicts a person with vibrant, neon-pink hair, accentuating a striking and bold fashion statement. They wear a distinctive frog hat with large eyes, combined with pink-tinted glasses and a red clown-like nose, enhancing the unique and eclectic style. The individual dons a yellow and black jacket with a high collar, further contributing to the bold and modern aesthetic. The background is bright yellow with white Japanese characters, creating a vivid and eye-catching contrast. The overall style is reminiscent of modern digital illustration and graphic art, heavily influenced by Japanese pop culture and cyberpunk elements. The neon colors and street fashion sensibilities give the portrait a contemporary and edgy feel.

#

These are the type of images i want to tune for:

#

zapfska_a_retro-futuristic_Rihanna_4bce12bb-2599-496e-96e7-02b40d021890.png

zapfska_a_neo-digital_retrofusion_woman_e80f3686-284a-4751-be1b-5e568932b500.png

spiral kettle Jun 21, 2024, 4:00 AM

#

hey there is there any actual way right now to fine tune stable diffusion with 2 input images to give out one output image??

#

I want to combine some aspects of each input image to give an output if that makes sense

mossy oriole Jun 21, 2024, 6:00 AM

#

thin mantle For example which would be a better caption and why. After ill provide you with ...

From personal experience, SD/SDXL doesn't understand a majority of the LLM caption junk.
What I would take from Llava, then Gpt4o.

colorful illustration, colorful, illustration, woman, pink hair, yellow raincoat, frog hat, looking at the viewer, angry expression, nose ring, vibrant, whimsical and quirky vibe.
-> whimsical and quirky vibe might be a terrible choice, as these prompts already have a pre-existing function.

vibrant hair, neon-pink hair, frog hat, pink-tinted glasses, pink glasses, yellow jacket, black jacket, black and yellow jacket, bright yellow background, digital illustration, cyberpunk, neon colors, street fashion, portrait
-> You can consider removing the "bright yellow background"

mossy oriole Jun 21, 2024, 6:07 AM

#

spiral kettle hey there is there any actual way right now to fine tune stable diffusion with 2...

Depending on the images you have, if they're side-facing, consider making a copy of them & flipping them horizontally (mirroring), so you don't create a bias

white ocean Jun 21, 2024, 6:45 AM

#

thin mantle Also is this the blip-2 you are referring to: https://replicate.com/smartinezbra...

that's the one, it's also integrated into Kohya_SS utilities tab now ... so you just point that to the directory of images, and let it rip, and it will create caption files

white ocean Jun 21, 2024, 6:50 AM

#

thin mantle Do these captioning models beat out llm vision models like gpt4o and claude 3.5 ...

gpt4o and claude 3.5 sonnet are the latest hotness ... but blip2 is pretty good at describing what is in the image, that's what it was designed for and it's optimized for doing that, so it may be faster ... whereas gpt4o and claude are general purpose. Also blip2 is free and already integrated into Kohya. I haven't compared them head to head, but I'm guessing they are similar ... i'm not sure if Claude or gpt4o use a separate vision model to ingest images

thin mantle Jun 21, 2024, 6:51 AM

#

Thanks guys, I receive my training PC tmrw so I'm excited to hit the ground running with your tips

white ocean Jun 21, 2024, 6:57 AM

#

thin mantle Thanks guys i recieve my training pc tmrw so im excited to hit the ground runnin...

My process is: 1) use blip2 in Kohya_ss, then 2) I run exiftool from the command line to extract the EXIF tags that I put in the images via Lightroom; these are appended to the .txt caption file. Then 3) back to kohya_ss and run the wd_14 tagger with the append option, which puts the wd_14 tags on the end. This last one tends to produce some nsfw images, but also seems to pick up a lot of creative and artwork anime characters as well.
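
The three-step process above boils down to concatenating a sentence caption with two tag lists. A rough sketch of that merge step, assuming you already have the three pieces as strings; the function name and sample values are made up for illustration, and this is not what Kohya_ss does internally:

```python
def merge_caption_parts(blip_caption: str, exif_tags: list[str], wd14_tags: list[str]) -> str:
    """Join a sentence caption with two tag lists, dropping duplicate tags (case-insensitive)."""
    parts = [blip_caption.strip()]
    seen = {blip_caption.strip().lower()}
    for tag in exif_tags + wd14_tags:
        cleaned = tag.strip()
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            parts.append(cleaned)
    return ", ".join(p for p in parts if p)

# Hypothetical inputs standing in for the blip2 caption, exiftool keywords, and wd14 tags.
caption = merge_caption_parts(
    "a woman standing on a beach at sunset",
    ["portrait", "golden hour"],
    ["1girl", "beach", "Portrait"],
)
print(caption)
```

Deduplicating matters because the same tag can arrive from more than one source, and repeated tags eat into the caption length limit.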

#

Also really helpful ... if you are training a lora of a single person, it helps to use the "starbyface" website ... this will give a random celebrity that looks a lot like your person. Include this celebrity doppelganger name in the caption file ... it will put your subject in more scenes and give more character.

white ocean Jun 21, 2024, 7:05 AM

#

thin mantle

I missed this post, but these captions seem far better than blip2, although it is only set to write a short caption by default; I haven't set it to write more than just one or two sentences. Keep an eye on your caption size limit for training.

native iron Jun 21, 2024, 7:49 AM

#

thin mantle Thanks guys i recieve my training pc tmrw so im excited to hit the ground runnin...

specs?

thin mantle Jun 21, 2024, 1:02 PM

#

native iron specs?

Intel I914400k, nvidia rtx 4090 24gb vram, 64gb ram, 4t ssd

#

So pretty much the standard consumer build

cunning zinc Jun 21, 2024, 1:18 PM

#

Hi guys, i want to try out finetuning and i am following the following:
https://next.platform.stability.ai/docs/features/fine-tuning

however I get a ModuleNotFoundError: No module named 'stability_sdk.finetune' even after installing stability-sdk.
Is there another package that needs to be installed?

#

#

stability-sdk version is 0.8.6

river lake Jun 25, 2024, 5:04 PM

#

if I increase my batch size in kohya_ss lora training from 1 to 4 on a 3090, will it affect the output quality of the lora?

jade hornet Jun 26, 2024, 12:49 PM

#

try it? I tend to run with a batch size of 4

#

in theory no it's just how many you're doing at once, some swear it makes it better. maybe there's more blending when you're doing them at the same time, hard to say.
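
One thing that does change with batch size is the number of weight updates: each optimizer step averages the gradient over the whole batch, so a batch of 4 gives a quarter as many (less noisy) updates for the same data. A small sketch of that arithmetic, with made-up dataset numbers; `repeats` here assumes a kohya-style per-image repeat count:

```python
import math

def optimizer_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Number of weight updates for a run: each step consumes `batch_size` samples."""
    samples_per_epoch = num_images * repeats
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch * epochs

# Hypothetical run: 100 images, 10 repeats, 5 epochs.
for bs in (1, 4):
    print(f"batch size {bs}: {optimizer_steps(100, 10, 5, bs)} optimizer steps")
```

That difference in update count is one reason people sometimes adjust the learning rate or the epoch count when they change batch size, which may be what "some swear it makes it better" comes down to.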

scarlet rune Jun 29, 2024, 7:11 AM

#

Hello everyone! I have a question and wonder if anybody knows how to do this. I am an architect and want to fine-tune a model for a specific case study: a tower ruin where the designer is required to create a new design within it, such as adding new structures to it (adaptive reuse). I have photos of the tower ruin and 300 images of design alternative renders for the tower (there was a competition and I took the images from there). So the question is: can I fine-tune Stable Diffusion so that it generates new design solutions that keep the site and ruins but add new structures to them?

#

Which model would be best for it

fading sail Jul 1, 2024, 5:12 AM

#

@tired wind Hi, I saw your repo https://github.com/ratulrafsan/Comfyui-SAL-VTON but my question is, does this only work for upper body garments? and also, does it work only on women? thanks for your awesome work 🙂

tired wind Jul 1, 2024, 5:25 AM

#

@fading sail Hi, thank you, but all credit goes to the original author. My node is just a wrapper around their implementation.
https://openaccess.thecvf.com/content/CVPR2023/papers/Yan_Linking_Garment_With_Person_via_Semantically_Associated_Landmarks_for_Virtual_CVPR_2023_paper.pdf
Their paper only discusses finding landmarks on upper body garments. The VITON-HD dataset (https://www.kaggle.com/datasets/marquis03/high-resolution-viton-zalando-dataset/data) is upper body only as well. So the current implementation & model won't work for lower body garments.
And it should work for both men and women, as long as the model can detect the landmarks appropriately.

VITON-HD

High-Resolution VITON-Zalando Dataset

#

You might want to take a look at this as well
https://rlawjdghek.github.io/StableVITON/

fading sail Jul 1, 2024, 3:35 PM

#

ah thanks for replying, I'll check it out. thx 🙂

#

I'm mainly looking for something that does all the body parts, so like upper, lower or full dress. I think IDM VTON does it and the magicclothing one, but they seem to require a lot of VRAM, so I'll keep searching

thin mantle Jul 2, 2024, 7:52 AM

#

Whats your guys workflow for captioning many images and making sure they are correct. What app do you guys use? And what features are your favorites?

#

Specifically to train loras and checkpoints

#

Cuz vision models often get things wrong

hoary geyser Jul 2, 2024, 10:38 AM

#

hey all, i'm attempting to train a sd(1.5) u-net from scratch, on a small (~2000 images) dataset that's not varied (specific subject).
my theory is i can use kohya_ss modified to re-initialize the weights at the start of the training loop to effectively reset the u-net.

technically this is working, but the output images aren't sensical yet. Wondering if there's anyone here who I can talk to, to explore this further.

jade hornet Jul 2, 2024, 2:49 PM

#

thin mantle Whats your guys workflow for captioning many images and making sure they are cor...

I guess you would have to define many. If I was doing thousands, I would put them in categorized folders so I could tag them by similar groups and styles. I like wd14 convnext v2 even for realistic, and I tend to use either dataset tag editor or taggui. If I'm doing realistic, I get rid of certain tags like 1girl that are strictly danbooru
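
Stripping danbooru-only tags for a realistic dataset is easy to script once the tagger has written its output. A minimal sketch, assuming comma-separated tag lines; only `1girl` comes from the message above, the rest of the blocklist is a hypothetical example to adjust per dataset:

```python
# "1girl" is from the discussion; the other entries are illustrative guesses.
DANBOORU_ONLY_TAGS = {"1girl", "1boy", "solo"}

def filter_tags(tag_line: str, blocklist: set[str]) -> str:
    """Remove blocklisted tags from a comma-separated tag line, keeping the original order."""
    tags = [t.strip() for t in tag_line.split(",")]
    kept = [t for t in tags if t and t.lower() not in blocklist]
    return ", ".join(kept)

print(filter_tags("1girl, solo, long hair, outdoors, photo", DANBOORU_ONLY_TAGS))
```

Tools like dataset tag editor or taggui offer this kind of bulk removal in their UI, so a script is mainly useful when you want the step to be repeatable.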

#

There are lots of attempts to make something completely automated, but forget that, dataset prep is where you should be spending the most time

stiff dust Jul 3, 2024, 5:59 AM

#

hoary geyser hey all, i'm attempting to train a sd(1.5) u-net from scratch, on a small (~2000...

I mean, you cannot train anything reasonable with 2000 images. The fact that it doesn't even memorize the data might be because transformers are very unstable at the beginning and need a long training time and a low LR at the start
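
A "low LR at the start" usually means a warmup schedule. Here is a minimal sketch of linear warmup, assuming a constant learning rate afterwards; the base rate and step counts are placeholder values, not a recommendation for from-scratch training:

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linearly ramp the learning rate from near zero to `base_lr`, then hold it constant."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps

# Placeholder values: 1e-4 base LR reached after 500 warmup steps.
for step in (0, 249, 499, 1000):
    print(step, warmup_lr(step, base_lr=1e-4, warmup_steps=500))
```

Most trainers ship an equivalent scheduler option, so in practice this is a setting to switch on rather than code to write.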

neat rover Jul 4, 2024, 5:25 PM

#

How hard is it for a beginner to create a pixel art model that will output results similar to this? I can get thousands of graphics in a similar style and the exact same resolution and format to train it.

hG1P52bcVSJkNk8EosgR14mAhH15D8SIGzpITlsxPLt950aPE8qVHfepTHxAAOw.png

nocturne sierra Jul 5, 2024, 12:14 PM

#

Hello everyone,

Does anybody know any good SDXL checkpoint training guide?

We have a nice dataset of 5,000 images in a specific style, but we just can't find any tutorials and articles about checkpoint training (not LORA)

blissful quest Jul 6, 2024, 5:28 PM

#

/imagane Real style, exterior, day, school building in long shot, a group of four 11-year-old children are standing on the far balcony, on the second floor by a railing, the children are talking excitedly. The first child is fair skinned with curly black hair and is wearing a red t-shirt and gray pants And talking to Solo, the second boy is tanned with straight blond hair he is wearing a green button up shirt and gray pants he is talking to the first boy, the third boy is light skinned with straight black hair he is wearing a light orange t-shirt and gray pants he is looking at Solo
The boy in the center: solo, tanned skin tone devilish blond wavy hair, blue t-shirt, short jeans and a gray school bag on his back
.

thin mantle Jul 7, 2024, 5:49 AM

#

I just finished training my first lora. I am now left with 800 safetensor files. How do i test them all to see which is most efficient?

latent charm Jul 7, 2024, 5:49 AM

#

just test the last. If it ok, boom.

thin mantle Jul 7, 2024, 5:50 AM

#

Ok lol

empty horizon Jul 8, 2024, 11:18 AM

#

Hello everyone, what should be the input prompt to extend the below images. I am trying to automate the extend feature so I require one specific prompt for the extension of both the images.
for patterned images, it's working absolutely fine. But for plain background images, it's adding some background which is not matching with the original image bg.

#

Prompt used: Generate creative background scene matching original image. Environmental scene, city life, dressing style, nature, building, non-living objects.
Neg prompt used: Blurry, bordered, zoomed, solid color, monotonic background, disfigured, human figure, living objects, gore, dead, hazy, dull.

thin mantle Jul 9, 2024, 3:44 AM

#

Ty for anyone that helped answering my questions. My first lora came out great. I just posted in #✨｜sdxl now im wanting to mix 2 models together because i find myself having to make an image then denoise that image with another model to get the result i want. Would merging the models save me from having to do this?

deft solstice Jul 9, 2024, 10:31 AM

#

Imagine a sleek, modern laptop displaying a vibrant and futuristic website interface. The screen showcases innovative design elements like smooth animations, interactive features, and bold typography. In the background, there are creative tools scattered around, symbolizing the process of crafting cutting-edge digital experiences. The scene conveys a sense of forward-thinking design, blending creativity with technology to shape the future of web development.

cursive condor Jul 9, 2024, 10:32 AM

#

thin mantle For example which would be a better caption and why. After ill provide you with ...

what is the prompt you gave to Llava and gpt4?

#

anyone here has information about fine-tuning? I have no idea how much more/less data it needs. the example I had from dreambooth dataset was like 10 images per object/class.

deft solstice Jul 9, 2024, 10:33 AM

#

#🔧｜finetune Imagine a sleek, modern laptop displaying a vibrant and futuristic website interface. The screen showcases innovative design elements like smooth animations, interactive features, and bold typography. In the background, there are creative tools scattered around, symbolizing the process of crafting cutting-edge digital experiences. The scene conveys a sense of forward-thinking design, blending creativity with technology to shape the future of web development.

deft solstice Jul 9, 2024, 12:11 PM

#

Prompt used: Generate creative background scene matching original image. Environmental scene, city life, dressing style, nature, building, non-living objects.
Neg prompt used: Blurry, bordered, zoomed, solid color, monotonic background, disfigured, human figure, living objects, gore, dead, hazy, dull.

thin mantle Jul 9, 2024, 12:37 PM

#

cursive condor anyone here has information about fine-tuning? I have no idea how much more/les...

I never done a checkpoint b4 but perplexity said about 200. Anything less should be a lora

gentle flame Jul 9, 2024, 5:11 PM

#

https://github.com/ROCm/aotriton/issues/16#issuecomment-2216077119

GitHub

[Feature]: Memory Efficient Flash Attention for gfx1100 (7900xtx) ·...

Suggestion Description Started using torchlearn to train models in pytorch using my gfx1100 card but get a warning that 1toch was not compiled with memory efficient flash attention. I see there is ...

valid sentinel Jul 11, 2024, 8:16 AM

#

https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py

Hi, I'm fine-tuning my own style (with a fixed pose and background) using about 30K images. These images were generated using the SDXL model. Now, I want to fine-tune the SD15 model to adapt to that style for better performance.

All of my samples in this dataset use the same prompts. However, the results after one epoch are bad. The model seems too vibrant. I don't know if this is due to my dataset preparation (one prompt for all) or something else. Has any developer struggled with the same issue?

GitHub

diffusers/examples/text_to_image/train_text_to_image_lora.py at mai...

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX. - huggingface/diffusers

woven viper Jul 11, 2024, 12:54 PM

#

I need help setting up a workflow to train a Lora. I can get the text description files to generate but It will not generate the actual Lora file at the end. Is anyone familiar with doing this?

graceful nova Jul 12, 2024, 5:26 PM

#

Hi everyone
I'm trying to train a Canny ControlNet to generate clothing images. My training set (of 100K images) contains only clothing items, but at inference SD 1.5 draws the clothing along with humans. I'm using HF diffusers to train my ControlNet.

Is it normal for SD 1.5 to generate humans even if there are none in the training set?
Currently I am not using any negative embedding. Can a negative embedding help remove the unwanted humans?

elfin valve Jul 13, 2024, 2:39 AM

#

Hey friends! I'm looking for paid lessons from someone with experience in training masked multi-resolution SDXL fine-tuning using OneTrainer.

I've been attempting to create a photo-realistic fine-tune to generate images of Miranda Kerr, the Australian model. For my base model, I used epicrealismXL_v7FinalDestination. This model can produce good pictures at both 1024x1024 and 1024x1280 resolutions. I thought that training on a mix of images in these resolutions might improve the fine-tune, but the results have been disappointing, indicating I might be missing something crucial.

Here's what I did:

I prepared 20 images at 1024x1024 resolution (half face close-ups and half-body shots) and 38 images at 1024x1280 resolution (a mix of half-body and full-body shots).
All images were masked with SAM masking, and captions were generated using "WD14 VIT v2," with "mirandakerr" as the first word.
For the first training concept, which used only 1024x1024 images, I set "Resolution Override" to "1024x1024". For the second concept, which used only 1024x1280 images, I set "Resolution Override" to "1024x1280".
I enabled "Aspect Ratio Bucketing" during training, hoping for at least decent results.

Despite this, the output quality is poor. Starting from epoch 1, I see degraded quality, and by epoch 5, the quality resembles a painting style (really bad).

The dataset used:

Concept 1 (20 masked images): 1024x1024 dataset: Google Drive Link
Concept 2 (38 masked images): 1024x1280 dataset: Google Drive Link

In total, I trained with 58 masked images per epoch.

I tried training for 100 epochs, and it can be seen that after the first 5 epochs, the quality degrades and stays at the degraded level (paint-like style). I'm attaching 4 samples of 1024x1024 resolution (epochs: 0, 5, 50, 100) and 4 samples of 1024x1280 resolution (same epochs: 0, 5, 50, 100). It is clear that the image quality at epoch 0 is good, but anything after the first epoch is unusable due to the really bad quality. Learning rate used "3e-6". Learning rate "2e-6" gives similar results (but slower training).

I'm also attaching the training preset "Tuned SDXL FineTune BFloat16 3.json." I have 16GB VRAM and used ADAFACTOR with the bfloat16 data type. However, I can rent a better machine if you believe the results will be better on a bigger machine.

Note, "epicrealismXL_v7FinalDestination" can already produce "Miranda Kerr". However, I'm trying to teach the model to recognize her using the unique trigger word "mirandakerr".
In fact, I'm not interested in training this specific model and I'm not interested in the final checkpoint. My goal is to learn how to train photo-realistic images of real people.
Currently, I'm concerned about the poor quality (fried skin).
The reason for using masked training is that I need to train the model on photo studio face shoots where the background is always white. In such cases, I don't want the model to learn to reproduce a white background in all photos.

In the training dataset, you may find black bars on the sides - these were created to match the training resolution. However, these black bars are masked and thus not included in the training.
I have never used Kohya_ss, so I can't say if it gives better results, but I think I will experiment with Kohya_ss as well because currently, I'm a bit stuck with OneTrainer.
I would really appreciate any help or advice on effectively training a photo-realistic model using OneTrainer.

Disclaimer: "Voplica" is the project where I'm conducting my experiments. The final goal is to create a service for photo-realistic model training. I am personally a software developer. All content at Voplica is AI-generated.

Looking forward to finding a teacher and potential partner.

Thank you!

📎 Tunned_SDXL_FineTune_BFloat16_3.json

Google Docs

1024_1024.zip

Google Docs

1024_1280.zip
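
For readers unfamiliar with the "Aspect Ratio Bucketing" mentioned in the post above: the idea is to group images by shape so each batch holds a single resolution. A rough sketch of the assignment step, assuming buckets are given as (width, height) pairs. The bucket list is illustrative, and this is not OneTrainer's actual implementation:

```python
# Illustrative buckets as (width, height); real trainers generate a longer list.
BUCKETS = [(1024, 1024), (1024, 1280), (1280, 1024)]

def nearest_bucket(width: int, height: int, buckets=BUCKETS) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))

# Hypothetical source image sizes before resizing.
for size in [(2000, 2000), (1600, 2000), (3000, 2400)]:
    print(size, "->", nearest_bucket(*size))
```

After assignment, each image is resized and cropped to its bucket's resolution, which is why padding with black bars is usually unnecessary when bucketing is enabled.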

frail moth Jul 13, 2024, 2:23 PM

#

Does anyone have advice for Logo prompts and how high the CFG should be?
Been using Redmonds Logo Lora

#

with prompts like this

#

logo, claw reaching out of the screen (animal paw with claws), neon claw, 3d realistic <lora:LogoRedmondV2-Logo-LogoRedmAF:1>

#

but i get mostly shit

#

using fp8 to not run out of vram idk if thats an issue

#

and sdxl

latent tiger Jul 14, 2024, 8:02 PM

#

Hey all! Been a while since I did any finetuning, so I was wondering if you know any great guides on finetuning and dataset building (mostly how to make a quality dataset). Thanks for your help!

tame otter Jul 16, 2024, 7:17 PM

#

are there any good guides, like not tutorials, just in depth information regarding the building and structuring of a large dataset of thousands or 100s of thousands of images

minor island Jul 17, 2024, 4:30 PM

#

Hey all, asked this question over in tech support and we could not land on an answer... Maybe you guys can figure it out..

I am training a lora and saving samples during the process. The samples show a very clear influence of the training data, so the lora is for sure seeing the results from the training. Fast forward to using the lora: no matter what I do, the image generated with the lora is no different from the image generated with the same prompt/seed without the lora.

Using other loras, this is not the case, it only seems to impact the ones I have trained. I see the lora listed in the prompt details so A1111 is picking up the lora is there but it just seems to have zero effect.

Not sure where else to look at this point. Any thoughts or advice is greatly appreciated.

elfin valve Jul 18, 2024, 8:33 AM

#

minor island Hey all, asked this question over in tech support and we could not land on an an...

Could be that you are training the Lora on a different type of base model than the one you are using in A1111. For example, if you are training on SD1.5 but using SDXL, or vice versa.

jade hornet Jul 19, 2024, 12:29 AM

#

minor island Hey all, asked this question over in tech support and we could not land on an an...

Does the meta data of the images you generate acknowledge the Lora? Does the console record the Lora loading successfully or does it throw errors? does the size of the Lora seem right from your training parameters?

minor island Jul 19, 2024, 7:57 PM

#

elfin valve Could be that you are training Lora on the other type of base model than the one...

I am training on the same model I am using to produce the images, both 1.5 model.

minor island Jul 19, 2024, 7:59 PM

#

jade hornet Does the meta data of the images you generate acknowledge the Lora? Does the c...

Yes, the metadata shows a "Lora hashes:" section that is populated with the lora I trained and a hash. I don't see any errors in the console loading the lora. The lora is just over 2Gb.

elfin valve Jul 19, 2024, 11:57 PM

#

Update: The position is now closed. Thank you everyone who applied ❤️

Leaving the below information for reference. This position is now filled, but we will notify when we have new positions. Moreover, we are always open for collaboration, partnerships, or just meet other tech enthusiasts in this field.

Voplica is a startup specializing in AI model training, image generation, and image enhancement. We are seeking a Python Engineer with strong expertise in Stable Diffusion (SD) models for a full-time position.

Job Responsibilities:

Develop automated model training processes that require minimal human oversight.
Develop pipelines for dataset preparation, including cropping, masking, and image analysis using various models.
Create inference processes utilizing fine-tuned Stable Diffusion models.
Build pipelines for image restoration and enhancement, such as upscaling and fixing details in hands, feet, and faces.
Contribute to the kohya_ss/sd_scripts project by automating parameter tuning and improving model training processes. This includes enhancing training on non-diverse datasets, such as masked training for photo studio datasets with consistent backgrounds.
Quickly learn and integrate new AI technologies into our projects.

Qualifications:

Strong expertise in Python programming.
Extensive experience with Stable Diffusion models.
Knowledge of data structures and algorithms (Big O notation, space and time optimization, hash maps, trees, heaps).
Preferred: Experience with SDXL hyperparameters tuning, learning rate loss analysis using TensorBoard.
Preferred: Knowledge of microservices architecture, developing Python workers for job queues (RabbitMQ, Kafka), experience with S3 storage or other object storages (OpenStack Swift, Ceph, etc.), gRPC.
Preferred: Experience with context switching optimizations for LLMs.

If you are passionate about advancing AI technology and improving image processing capabilities, we would love to hear from you.

Best regards,
Alex

elfin valve Jul 20, 2024, 8:21 PM

#

Hey friends,

Some of you have sent me DMs here, but your privacy settings don't allow me to reply or send you a friend request. Please ensure your privacy settings are adjusted, or send me an email instead.

Best regards,
Alex

hollow spruce Jul 24, 2024, 6:19 AM

#

minor island Hey all, asked this question over in tech support and we could not land on an an...

probably an issue when the lora is saved

#

either you messed up a setting, in how it is saved in kohya/onetrainer/diffusers
or your venv/torch got messed up.

I'd recommend for you to check the settings for how your lora is saved, and if you dont see any obvious issues, then wipe your current venv (virtual python environment), then let it install fresh, then try again.
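
One quick way to check how a lora was saved is to read the header of the `.safetensors` file, which lists every tensor and may carry a metadata block. The sketch below relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header). Treat the expectation that training settings are stored in `__metadata__` as an assumption: kohya's sd-scripts typically writes `ss_`-prefixed keys there, but other trainers may write nothing.

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Read only the JSON header of a .safetensors file (no tensor data is loaded)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
        return json.loads(f.read(header_len).decode("utf-8"))

def summarize(path: str) -> None:
    """Print the tensor count, a few tensor names with shapes, and any stored metadata."""
    header = read_safetensors_header(path)
    metadata = header.pop("__metadata__", {})
    print(f"{len(header)} tensors")
    for name in sorted(header)[:5]:
        print("  ", name, header[name]["shape"])
    for key, value in sorted(metadata.items()):
        print(f"{key} = {value}")

# Usage (hypothetical path): summarize("my_lora.safetensors")
```

If the tensor names or the tensor count look nothing like those of a lora that does work in A1111, that points at the save settings rather than the training itself.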

hollow spruce Jul 24, 2024, 6:33 AM

#

tame otter are there any good guides, like not tutorials, just in depth information regardi...

nop 😦

but based on the large scale finetuners (not companies) I've met, they all fall into 3 camps:
Amazon S3 Buckets. Example: https://huggingface.co/datasets/ptx0/photo-concept-bucket which is then precomputed and hosted on S3 and loaded via a json file
JSON only Datasets: https://huggingface.co/datasets/CaptionEmporium/furry-e621-sfw-7m-hq which include multiple captions for each image, enabling multi-caption style training (very effective, but also costly to train)
For complete local storage, but datasets of above 100k images, you either do the lots of folders + json solution, or go the hydrus network route. Both are equally painful.
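
To make the "JSON + multiple captions" idea concrete, here is a minimal sketch of a JSON-lines manifest where each image carries several captions and one is sampled each time the dataset is traversed. The field names (`image`, `captions`) and the sample records are hypothetical and are not the schema of the datasets linked above.

```python
import json
import random

# Hypothetical manifest: one JSON object per line, several captions per image.
MANIFEST_JSONL = """\
{"image": "images/0001.png", "captions": ["a red fox in snow", "fox, snow, winter, outdoors"]}
{"image": "images/0002.png", "captions": ["a lighthouse at dusk"]}
"""

def load_manifest(text: str) -> list[dict]:
    """Parse a JSON-lines manifest into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def sample_pairs(records: list[dict], rng: random.Random) -> list[tuple[str, str]]:
    """Pick one caption per image; calling this once per epoch varies the pairing."""
    return [(r["image"], rng.choice(r["captions"])) for r in records]

records = load_manifest(MANIFEST_JSONL)
rng = random.Random(0)
for epoch in range(2):
    print(epoch, sample_pairs(records, rng))
```

Keeping captions in a manifest rather than in per-image .txt files is what makes it practical to store long and short caption variants side by side.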

#

If you were talking about the balancing act of said dataset, then there's just no agreed upon solution.
and most companies are just winging it (terribly) - which is why we have these extreme biases to begin with.

hollow spruce Jul 24, 2024, 7:09 AM

#

elfin valve Hey friends! I'm looking for paid lessons from someone with experience in traini...

another victim of SECourses/Furkan? 🥲

You've prob already found someone to help you.
But just wanted to point out key things:

• Training photorealistic is very very different from general training. You actually wanna avoid training things like details, but instead the concept of how a person looks.
Example I trained using 9 images, to make a point of this: https://civitai.com/models/349773/oc-aline
• a "full finetune" opens up more possibilities for training. but that's not always a good thing. Unless you at least roughly know what you're doing, a full finetune will often do more harm than good. A simple lora training in 10~30 minutes will often do the trick just fine.
• Experiences cannot be transferred 1:1 for training models. Training a lora of a white woman, will usually have a similar pipeline. But training one of an indian or chinese woman or man will have significant differences. (Hence why fully automated solutions don't work, unless you have a genuinely balanced base model)

to recognize her using the unique trigger word "mirandakerr"
trigger words, especially using real names, are a very complex topic which I've literally witnessed kill an AI startup.
tokens already have meanings assigned to them. Meaning there is no "one size fits all" fully automated solution.
For the sake of simplicity, you need to train both the unet and the clip models correctly without overfitting either, if you want custom trigger words to work.
(It's not that custom trigger words dont work - THEY DO! just that they sometimes work better, and sometimes worse, and that's something you need to be aware of. Sometimes a lora fails completely, simply due to the chosen triggerword, so you have to pick a different one)

Currently, I'm concerned about the poor quality (fried skin)
Your training is picking up the fine details before picking up the general concept. Min SNR 5 is your friend here. Also, doing only a net rank 32 lora will help as well, since you affect fewer parameters. (meaning fewer details can be messed up)
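
For context on the "Min SNR 5" suggestion: as I understand the Min-SNR weighting strategy, each timestep's loss is scaled by min(SNR, gamma) / SNR for a noise-prediction objective, with gamma = 5 here. Low-noise timesteps (high SNR), where fine details are learned, get down-weighted. A minimal sketch, assuming plain epsilon prediction; the function name is made up for illustration and this is not the actual kohya or OneTrainer code:

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Loss weight for an epsilon-prediction objective: min(SNR, gamma) / SNR."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    return min(snr, gamma) / snr


# High-SNR (low-noise) timesteps are scaled down; noisy timesteps keep weight 1.
for snr in (0.1, 1.0, 5.0, 50.0):
    print(f"SNR {snr}: weight {min_snr_weight(snr):.3f}")
```

In a trainer this weight would multiply the per-sample loss before averaging, so the detail-heavy timesteps contribute less to each update.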

Kohya and Onetrainer give fairly similar results if you've got a grasp on the parameters. Masked training, while useful, does also have negative side-effects. so relying on it completely isn't a good idea either.

The final goal is to create a service for photo-realistic model training
If you wanna match the currently existing services (phone apps and a few online sites), that's not too hard. They rely mostly on overfitting a fair bit, and thus don't bother with the typical issues that might pop up. This can be automated fairly simply.
If you wanna beat them in quality, then that's genuinely hard, due to the issues I mentioned earlier, which will all occur if you don't cheat while training.

The Pareto principle really applies here. You can get 80% of the way, with 20% of the effort. But the closer you wanna get to 100%, the amount of time and effort you invest will rise exponentially.

OC: Aline - v1.0 | Stable Diffusion LoRA | Civitai

A LoRA of an original character named "Aline" Use with: 832x1216px The Standard checkpoint is compatible with base, as well as all sdxl based finet...

elfin valve Jul 24, 2024, 8:04 AM

#

hollow spruce another victim of SECourses/Furkan? 🥲 You've prob already found someone to he...

Wow. So much useful information you pointed here. Thank you so much!
I haven’t checked SNR yet, but will definitely take a look at it.
Regarding masked training. Many datasets I’m working with may be photo studio shots with a constant background (white background) and I’m afraid the model will learn the white background during training (which I don’t want it to do).
I noticed you trained DoRa models as well. Have you noticed differences in quality between LoRa, DoRa and full fine tunes by any chance?

hollow spruce Jul 24, 2024, 8:19 AM

#

and I’m afraid the model will learn white background during the training
• With a very simple photoshop script, you can automatically replace the background <- good enough to add random backgrounds to images, so that none of it gets picked up by training
• "simple background, white background" can be added to the captions of the whole dataset, then used as a negative during inference. that also gets rid of the background. <- change white to whatever background color you actually have
• masks can be used, but doing so means that model loses the context of "where" to generate your new data. <- also works, but isnt less work than the other 2 options. best to try all 3 for your unique scenario
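
The second option (adding background tags to every caption) is easy to script. A minimal sketch, assuming comma-separated captions stored as one .txt file per image in a single folder; the function names and the folder layout are assumptions, not something any trainer requires:

```python
from pathlib import Path

# Change "white" to whatever background color the dataset actually has.
BACKGROUND_TAGS = ("simple background", "white background")


def add_tags(caption: str, tags=BACKGROUND_TAGS) -> str:
    """Append tags that are not already present in a comma-separated caption."""
    existing = [t.strip() for t in caption.split(",") if t.strip()]
    seen = {t.lower() for t in existing}
    for tag in tags:
        if tag.lower() not in seen:
            existing.append(tag)
            seen.add(tag.lower())
    return ", ".join(existing)


def tag_dataset(folder: str, tags=BACKGROUND_TAGS) -> int:
    """Rewrite every .txt caption in `folder`; return how many files changed."""
    changed = 0
    for path in sorted(Path(folder).glob("*.txt")):
        old = path.read_text(encoding="utf-8").strip()
        new = add_tags(old, tags)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed += 1
    return changed
```

At inference time the same tags would then go into the negative prompt, as described above.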

Have you noticed differences in quality between LoRa, DoRa and full fine tunes by any chance
Yeah. A lot.
DoRA is a massive improvement to everything, on the same scale as full finetuning, then extracting a lora. But it does come with the same possible downfalls of full finetuning, where more can get messed up. But once you have experience with captioning & a basic preset that works, DoRA is basically as easy to use as a LoRA, just better in a lot of ways.

regal shore Jul 26, 2024, 1:58 PM

#

Hi everyone,can you advise for 370 images, how many minimum epochs and steps are needed for average quality?

hollow spruce Jul 26, 2024, 2:36 PM

#

regal shore Hi everyone,can you advise for 370 images, how many minimum epochs and steps are...

sdxl base, sdxl pony, sd1.5?

regal shore Jul 26, 2024, 2:36 PM

#

hollow spruce sdxl base, sdxl pony, sd1.5?

sdxl pony

hollow spruce Jul 26, 2024, 2:36 PM

#

using kohya?

regal shore Jul 26, 2024, 2:36 PM

#

hollow spruce using kohya?

no,one trainer

hollow spruce Jul 26, 2024, 2:38 PM

#

regal shore no,one trainer

then you can just copy these values to onetrainer

#

#

names are pretty much identical in onetrainer, so should be fairly easy

#

just remember to turn on min snr 5 in onetrainer, since that ones hidden away in a dropdown

#

epoch 100 will be your target

#

I usually let it run to 150, in case I prefer a slight overfitting

#

https://civitai.com/models/597892/juno-overwatch-for-pony-properly-trained
and
https://civitai.com/models/596487/sonoshee-mclaren-for-pony-redline

were both trained with those settings, on pony sdxl. so you can look at those for what quality to expect from that preset

Juno - Overwatch (for Pony) - properly trained - v1.0 | Stable Diff...

Core Tags ( with suit ): juno overwatch, purple hair, short hair, gloves, bodysuit, covered navel, breasts, medium breasts, blue gloves, multicolor...

Sonoshee Mclaren (for Pony) (Redline) - v1.0 | Stable Diffusion DoR...

Core Tags: sonoshee mclaren, 1girl, solo, green hair, green eyes, multicolored hair, pink hair, breasts, necklace, hairclip, large breasts, sunglas...

regal shore Jul 26, 2024, 2:42 PM

#

Thank you

hollow spruce Jul 26, 2024, 2:45 PM

#

regal shore Thank you

you're welcome. and remember to adjust learning rate in case you change the batch size
your learning rate / batch size = 0.0001
thats why the preset has 0.0008 with a batch size of 8
works if you have a 3090 or 4090.

if you have a smaller gpu, then just adjust it using that simple formula
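
The rule above, written out as a small helper. This is only a sketch of the linear relationship stated in the message (learning rate / batch size = 0.0001); the function name is made up for illustration:

```python
# Per-sample learning rate from the preset: 0.0008 at batch size 8.
LR_PER_SAMPLE = 1e-4


def scaled_learning_rate(batch_size: int, lr_per_sample: float = LR_PER_SAMPLE) -> float:
    """Return the learning rate for a batch size under the linear rule."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return lr_per_sample * batch_size


# Smaller GPUs usually mean smaller batches, so the learning rate shrinks too.
for bs in (1, 2, 4, 8):
    print(f"batch size {bs}: learning rate {scaled_learning_rate(bs):.4f}")
```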

hollow spruce Jul 27, 2024, 4:48 PM

#

@unborn wind

#

I think you got the wrong channel

unborn wind Jul 27, 2024, 4:49 PM

#

Lol I actually didn't. thank you!

remote creek Jul 27, 2024, 10:17 PM

#

Hi, could you please advise me on how to format/partition? I got a new 4tb SSD. I want to install Linux and automatic1111 on it and use it to store all my big files (models, loras, etc), AND use it sometimes for webuis in windows as well (SD.next, forge). So I want to reuse the same directory for loras and models across windows and Linux dual boot. The issue is, sharing requires ntfs, and I understand having the models on an ntfs partition will slow down the Linux perf of automatic1111? Is that correct?

unkempt perch Jul 28, 2024, 5:46 AM

#

https://www.patreon.com/posts/kohya-gui-colab-108956809?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link

Patreon

KOHYA GUI COLAB FREE | Sahil Shah

Get more from Sahil Shah on Patreon

lime pawn Jul 29, 2024, 8:49 AM

#

Little chubby guy eats fruits

hollow spruce Jul 29, 2024, 12:27 PM

#

remote creek Hi could you please advise me on how to format/partition? I got a new 4tb SSD I ...

for linux distro, use:
pop!os 22.04 LTS Nvidia
^ that solves your drivers + cuda + torch issues

On that, set up a folder for "AI"

#

under models, is where I have all checkpoints, loras and stuff saved. its also where I automatically store things I train myself

#

And every application has some way of being able to point at a different directory for models.
For A1111, you edit the webui-user.sh
export COMMANDLINE_ARGS="--data-dir '/mnt/md/0/AI/A1111/stable-diffusion-webui' --ckpt-dir '/mnt/md/0/AI/MODELS/checkpoints' --lora-dir '/mnt/md/0/AI/MODELS/lora' --embeddings-dir '/mnt/md/0/AI/MODELS/embeddings' --listen --port 16999"

comfy, swarmUI, and all the other generators can also have different locations set up - you just point them at that models folder

#

as for windows and linux sharing a drive - there's info on that to be found online, and is not specific to stable diffusion. You can basically just "mount" the shared partition in linux. pop!os makes that fairly simple - so read up on their site or ask on their discord

hollow vortex Jul 31, 2024, 9:46 PM

#

Hi, I have a question about finetuning stable diffusion for inpainting.

I came across this paper which is similar to the work we are doing: https://arxiv.org/abs/2312.03606. They finetuned the Stable Diffusion model to create synthetic satellite images. After reading this paper I was wondering it would be possible to finetune the Stable Diffusion Inpainting model as well. For instance, we could mask out an area in an image and then our finetuned model could fill in that area with a road, river, or something of that nature.

I currently have input images and masks of land cover features on hand. I do not have any prompts but I am under the assumption I could possibly create some myself if needed. What would be the best way to go about finetuning the stable diffusion for inpainting model on my dataset (~10000 images)?

For reference, I have trained a custom UNet2DConditionModel for inpainting from scratch with simple labels such as grass, trees, water, etc... using my dataset that I posted about here: https://discuss.huggingface.co/t/custom-pipeline-inference-speed-extremely-slow/89642 and was able to get some decent results (I posted this because the inference speed was slow but that has now been fixed). I pulled the off-the-shelf stable diffusion inpainting model and attempted to inpaint certain features into the images and it did pretty well but could definitely use some work. With that being said, I was hoping that finetuning the stable diffusion inpainting model could outperform my current model

Hugging Face Forums

Custom pipeline inference speed extremely slow

Hi, I have been using Diffusers recently and was able to create a custom inpainting model. The model is based off the Palette repo: GitHub - Janspiry/Palette-Image-to-Image-Diffusion-Models: Unofficial implementation of Palette: Image-to-Image Diffusion Models by Pytorch. My model is able to take an image, mask, and label as input and inpaint th...

pseudo sail Aug 7, 2024, 3:50 PM

#

Hi all, looking to make my first dataset. Are there any web-based or mobile-friendly tagging tools? had a look at taggui but it's desktop only from what i can tell

stone garden Aug 9, 2024, 3:21 AM

#

any 4090 guys here trained flux yet?

stiff dust Aug 9, 2024, 2:42 PM

#

still trying. It works but results are not really good yet

golden quail Aug 12, 2024, 12:51 PM

#

Has anyone tried applying ReFT to any diffusion model yet? Seems to beat DoRA and with much fewer parameters for LLM benchmarks https://arxiv.org/abs/2404.03592

arXiv.org

ReFT: Representation Finetuning for Language Models

Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing ...

torpid basalt Aug 13, 2024, 3:00 AM

#

which method is the best for training an artistic style with a lot of images? (Lora, Dreambooth, Custom Diffusion, etc)

cloud ice Aug 13, 2024, 11:37 AM

#

hey not sure if this is the right place to ask such a question, but i am currently trying to finetune sd1.5 with python and i dont get any errors but my output test images are exactly the same (pixel comparison). also i compared the .safetensors from the unet with cmd: "fc /b file1 file2" and no changes were detected.
my idea for the dummy data was to just use the same image several times with the same description as a proof of concept. I then use exactly the same prompt to generate test images, hoping to see some resemblance to my dummy input image
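
The same byte-level check as "fc /b" can be done from Python, which makes it easy to run right after the save step of a training script. A minimal sketch using only the standard library; the function names are made up for illustration:

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a: str, path_b: str) -> bool:
    """True when both files have the same SHA-256, i.e. the same bytes."""
    return file_sha256(path_a) == file_sha256(path_b)
```

If the UNet file before and after training is identical, the saved weights were never updated. Without seeing train.py this is only a guess, but it may be worth checking that the optimizer actually steps on the UNet parameters and that the trained UNet is the object being saved.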

any help would be highly appreciated ❤️

📎 train.py

cloud ice Aug 13, 2024, 12:23 PM

#

also tips on where to ask would help 😄

jade hornet Aug 15, 2024, 12:12 AM

#

why 1.5? Old model. Anyway maybe use kohya vs trying to write it yourself

cedar birch Aug 17, 2024, 2:14 PM

#

Hello!

Has anyone tried training on this version of Juggernaut XL? https://civitai.com/models/133005?modelVersionId=456194

I did, but the results were not good enough. The dataset was captioned with the same tool as jugger, around 300 images.

Can you give me some advice how to improve it?

Some examples of captions:

A cocktail, amber fluid color, old-fashioned glass, ice cubes 75% ice, lemon slice garnish in glass, reflective surface, vibrant surroundings, illuminated counter, glistening drink

A cocktail, vibrant magenta fluid color, coupe glass, lime wheel garnish on the rim, rose petals around glass, blurred background with bokeh lights, indoor setting

📎 config.toml

Juggernaut XL - Jugg_X_by RunDiffusion | Stable Diffusion Checkpoin...

For business inquires, commercial licensing, custom models, and consultation contact me under juggernaut@rundiffusion.com Join Juggernaut now on X/...

stone garden Aug 17, 2024, 2:55 PM

#

cedar birch Hello! Does anyone trying to train Juggernaut XL that version https://civitai.c...

Its a very good model, and always had a lot of respect for it. But of recent they seem to be slipping from the number one spot with the training .

cedar birch Aug 17, 2024, 2:56 PM

#

stone garden Its a very good model, and always had a lot of respect for it. But of recent the...

Yeah it's a good model, but it looks like it's not good for lora training

stone garden Aug 17, 2024, 2:57 PM

#

it used to be, I trained a lot of my first SDXL loras with jugg and copax

#

back before I was training checkpoints

cedar birch Aug 17, 2024, 2:59 PM

#

stone garden it used to be, I trained a lot of my first SDXL loras with jugg and copax

Cool, what did I do wrong with jugg? Any ideas why my loras give such bad results?

stone garden Aug 17, 2024, 3:23 PM

#

if you want a model to train a lora really well use this SDXL lightning https://tensor.art/models/751519943066912725/LIGHTNING-Dream-Diffusion-By-DICE-v1

LIGHTNING - Dream Diffusion - By DICE - v1 | Stable Diffusion Model...

1.5K runs, 9 stars, 17 downloads. ⚡ LIGHTNING HAS LANDED FOR SDXL ⚡Welcome to the new wave of Stable DiffusionI'd like to start off with a HUUUGE thank you t...

cedar birch Aug 17, 2024, 3:26 PM

#

stone garden if you want a model to train a lora really well use this SDXL lightning https://...

I'd prefer to use jugg, it knows more concepts related to cocktails. Also it's trained on gpt4v captions.

I want to train lora on jugg for now.

stone garden Aug 17, 2024, 3:29 PM

#

ok but lightning models are the best to train with as you need a lot of images and lightning runs CFG 1 and steps 2

cedar birch Aug 17, 2024, 3:30 PM

#

stone garden ok but lightning models are the best to train with as you need a lot of images a...

I'm fine with renting an h100 for 16 hours.

stone garden Aug 17, 2024, 3:31 PM

#

there is a render with them settings

#

even more reason to use it if you're renting

cedar birch Aug 17, 2024, 3:31 PM

#

Try to render a cocktail, sdxl knows fewer concepts for cocktails.

stone garden Aug 17, 2024, 3:32 PM

#

what a cocktail drink?

cedar birch Aug 17, 2024, 3:33 PM

#

stone garden what a cocktail drink?

A close-up of a champagne flute on a marble surface, filled with sparkling champagne. A lemon twist spirals around the inner rim of the glass. The background is dark, emphasizing the clarity and bubbles in the drink, creating an elegant and sophisticated atmosphere.

#

Something like that

#

https://www.liquor.com/recipes/champagne-cocktail/

Liquor.com

The Elegant Simplicity of the Champagne Cocktail

The Champagne Cocktail, a simple combination of sparkling wine, bitters and sugar, is an easy way to alter a flute of bubbly.

#

I'm aiming for something like that, but without the bottle

stone garden Aug 17, 2024, 3:35 PM

#

i have just written this and its rendering now on the beach in the sand is a cocktail glass with a colourful drink inside, with ice cubes and a light condensation on the glass, in the back ground is a stunning sunset on the ocean horizon,

#

cedar birch Aug 17, 2024, 3:36 PM

#

The main problem is something else. I need an image for a recipe, so I need full control over glass type, fluid color, soft drink or not, with/without ice, garnish etc

stone garden Aug 17, 2024, 3:37 PM

#

tell me what glass and colour drink

cedar birch Aug 17, 2024, 3:37 PM

#

glass: champagne flute
fluid color: transparent golden

extras:
without ice
with lemon twist garnish on the glass rim

stone garden Aug 17, 2024, 3:39 PM

#

if you're training, you need every type of colour, not necessarily the brand of drink, as the checkpoint will know that. you just need primary colours and different glasses

#

ice, garnish is all handled by the checkpoint

cedar birch Aug 17, 2024, 3:41 PM

#

Some of my captions

A cocktail, amber fluid color, old-fashioned glass, ice cubes 75% ice, lemon slice garnish in glass, reflective surface, vibrant surroundings, illuminated counter, glistening drink

A cocktail, pink fluid color, lowball glass, crushed ice 90%, cucumber slices in glass, grapefruit slice on rim, mint sprig on rim, tray surface, daytime

A cocktail, white fluid color with foam on top, coupe glass, cinnamon stick garnish on rim, pink and yellow background

#

334 image total

stone garden Aug 17, 2024, 3:41 PM

#

the lora is a very basic guide for more custom characters for the newer models, as the new models are so well trained they dont really need a lora

#

ill put those prompts into flux and you will see no need for a lora

#

ok used this one A cocktail, pink fluid color, lowball glass, crushed ice 90%, cucumber slices in glass, grapefruit slice on rim, mint sprig on rim, tray surface, daytime

#

spot on ,,,, no need for a lora or neg prompt

cedar birch Aug 17, 2024, 3:44 PM

#

cucumber is not a real.

#

Real photo

stone garden Aug 17, 2024, 3:45 PM

#

could make it more real with 1 word added to your prompt

#

thats not what your prompt ask for tho lol

cedar birch Aug 17, 2024, 3:45 PM

#

Flux 😄

stone garden Aug 17, 2024, 3:46 PM

#

that cant be the same prompt i used

#

they are too different

cedar birch Aug 17, 2024, 3:46 PM

#

cedar birch Real photo

that's another example: a different prompt, so the generation should be different. It's not the pink one

stone garden Aug 17, 2024, 3:46 PM

#

oh

cedar birch Aug 17, 2024, 3:47 PM

#

stone garden ok used this one A cocktail, pink fluid color, lowball glass, crushed ice 90%,...

The source image for that prompt was:

cedar birch Aug 17, 2024, 3:48 PM

#

stone garden ok used this one A cocktail, pink fluid color, lowball glass, crushed ice 90%,...

At least that fluid color is not pinkish enough

#

Flux works better, it's true.

cedar birch Aug 17, 2024, 3:49 PM

#

stone garden ok used this one A cocktail, pink fluid color, lowball glass, crushed ice 90%,...

see cucumber slice, in the generation is citrus+cucumber slice.

stone garden Aug 17, 2024, 3:50 PM

#

rendering for your red drink

cedar birch Aug 17, 2024, 3:51 PM

#

stone garden rendering for your red drink

cedar birch Aug 17, 2024, 3:51 PM

#

cedar birch Flux works better, it's true.

Flux works better, it's true.

stone garden Aug 17, 2024, 3:51 PM

#

my flux hyper on the left your google image to the right

#

A cocktail, red fluid color, tumbler glass, medium ice cubes 90%, lime slices in glass, rosemary, pommegranite, white marble table, daytime

cedar birch Aug 17, 2024, 3:52 PM

#

What it did with the pomegranate xD

stone garden Aug 17, 2024, 3:52 PM

#

omg i just noticed that lol

#

😆

cedar birch Aug 17, 2024, 3:53 PM

#

Flux better, but you see )

stone garden Aug 17, 2024, 3:53 PM

#

well trained it even knew what i ment from that shambles

#

running it correctly now

#

#

wah lah better

#

lol

cedar birch Aug 17, 2024, 3:55 PM

#

the next problem is the layers (real photo below)

#

A cocktail, amber and green gradient fluid color, highball glass, crushed ice, 80% ice, mint sprig garnish on top, tequila bottle in background, ice on plate, bar setting

stone garden Aug 17, 2024, 3:57 PM

#

ok here goes

#

#

A cocktail, amber and green gradient fluid color, highball glass, crushed ice, mint sprig garnish on top, tequila bottle in background, ice on plate, bar setting, UHD, microscopic photography, magnified, molecular, unseen worlds revealed, scientific exploration, capturing molecular details, professional imaging techniques, precise focusing, revealing hidden beauty, scientific discovery, artistic interpretation, nano scale, revealing the wonders of the unseen

cedar birch Aug 17, 2024, 5:41 PM

#

stone garden

img2img looks like cheating

stone garden Aug 17, 2024, 5:42 PM

#

ip adapter

cedar birch Aug 17, 2024, 6:09 PM

#

stone garden ip adapter

not too much difference. I can't do it in my case. Only txt2img.

dull wigeon Aug 18, 2024, 2:32 PM

#

Mia Khalifa serving food in a restaurant wearing a protective revealing leather outfit

granite rover Aug 28, 2024, 12:45 AM

#

Does someone know If I can caption my images with llama 3.1 to create some loras?, (in comfy ui) (I will just use 20-30 images)

sharp basalt Aug 28, 2024, 2:37 PM

#

Commercial photography, powerful yellow powder explosion, fried chicken, black background, bright environment, white lighting, studio lighting, OC rendering, super detail, solid color isolation platform, professional photography, color gradinging About Midjourney Parameters --ar 9:16 --v 5.2 --s 750 --c 0 --q 1

stark veldt Sep 1, 2024, 4:17 AM

#

hi guys, i learned recently a bit about finetuning and i have some questions about how dynamic it is.

im building an app where one could input their biz/startup/game idea and go through steps where they will be generating context about it (objective, target audience, biz model, etc)

every time you generate something, it is used as context for next time you interact with an AI

rn im also working on a step to generate a logo for that brand.

what im doing currently is im dumping the brand context into GPT and asking it to create a prompt for SDXL, which i then insert with some other prompt keywords to make sure it looks like a proper logo

the issue is GPT's prompts are kind of trash.

i was wondering how well fine tuning would work if i:

Gotten a bunch of pairs of brand context dump -> ideal logo image
Trained a SDXL model on it

i.e would it work well if the "prompt" is a huge context about a brand, and not rather some small trigger keyword?

or is that not how it works?

compact mango Sep 2, 2024, 10:47 AM

#

I fine-tuned a model with images of an actor from an old movie. All of them have a characteristic look due to the camera quality. Now the model generates images in that particular style, not with the face that I wanted to achieve. What did I do wrong? Should I improve captioning or prompting? I don't want to replicate the style, but face.

rare fern Sep 5, 2024, 4:28 PM

#

Hey everyone
i want to train an LLM SDXL fine tune model with about 100k images. i have trained a model on 84k images so far but haven't gotten any better results yet. can anyone tell me how to start?

cloud matrix Sep 5, 2024, 9:10 PM

#

rare fern Hey everyone i want to train a LLM SDXL fine tune model with about 100k images ...

what's an LLM SDXL model? like, you use captions from an LLM?

#

don't use an LLM with vision capabilities. they can be good but they're overkill for flux training. This is out. Florence 2 is a Vision Language Model that's more lightweight and specialized. Look into that.

full trellis Sep 13, 2024, 1:34 PM

#

why, when i installed controlnet in stable diffusion, didn't i see a tab with controlnet

#

#

stone garden Sep 14, 2024, 9:15 PM

#

yo i got a question about dataset preparation.
i am currently distilling a lora by making a big amount of wildcard prompt based images and simply discarding low quality and bad character features.
the idea i'm having to make the lora less biased towards one style is to try and generate multiple different styles using the already existing lora.
so i'll make an equal amount of different style images of the character using the old lora and sdxl, take the best images and then train a flux lora on those images

#

how many images does it usually require for a flux lora and how can i make it learn the characters features abstractly and not directly associate it to one particular style

jade hornet Sep 15, 2024, 8:16 PM

#

You mention style and you mention character. Normally a character training would use subject captioning, where you describe as much in the image as possible except the character, because you want the ai to learn the character. Style training is different, you want to describe as little as possible, because you want it to attempt to learn the style, which is a more abstract concept.

stone garden Sep 17, 2024, 12:38 AM

#

jade hornet You mention style and you mention character, normally a character training would...

yeah i dont describe the features of the character, just what the character is doing and emotions and environment

#

i used joycaption to caption the dataset and then removed things that describe the recurring features of the character

jade hornet Sep 17, 2024, 1:05 AM

#

stone garden i used joycaption to caption the dataset and then removed things that describe t...

nevertheless, I feel like your goals are diametrically opposed. style training and character training are different methods. Either try to train with no captions and just see what happens, or do them separately, or with a multiple concept lora with one trying to capture the style and one trying to capture the character. different triggers for both. or you could do 2 separate loras worst case. remember that flux is very new to all of us still, so any advice is to be taken with a grain of salt

stone garden Sep 17, 2024, 1:06 AM

#

huh i just want to capture the character and nothing else, no style

#

i want a token to be associated to the character

#

i dont want to have to describe my character in detail just to get him

#

as it is with the pure joy caption loras where character features havent been purged

#

i did a new dataset and new captions and this time i'm just referring to the character as if the model already knew it

#

using the token i chose

#

the description that has the most content is the background and style of the image

jade hornet Sep 17, 2024, 1:09 AM

#

I see, well character training is something flux should handle easily, even with few images

stone garden Sep 17, 2024, 1:09 AM

#

because i want a lora that can portray the character in different styles

#

not just in the original one it was made in

cloud matrix Sep 19, 2024, 6:05 PM

#

stone garden not just in the original one it was made in

there's lots of strategies here. If your dataset is consistent enough, you can get away with just using one trigger word to train. If the dataset is varied, you'll want to use captioning. This guy has been writing tons of training diaries. this and his other writings about flux are insightful

https://civitai.com/articles/6868/flux-character-caption-differences-training-diary

Flux Character Caption Differences - Training Diary | Civitai

I wanted to do more Flux training experiments, and I got some ⚡⚡⚡ buzz donated ⚡⚡⚡ from a user to run some character experiments, so run the experi...

gentle flame Sep 20, 2024, 7:43 PM

#

if anyone wants to try it out, here's a new optimizer. Modified adamw iirc.
https://github.com/lodestone-rock/compass_optimizer

#

I'm not technical enough to explain it properly, but the author described it as rmsprop with a low pass filter ontop.

amber cypress Sep 22, 2024, 11:10 PM

#

Hi:) I just finished a dataset of around 40gb of architectural images. Tagged by scraping the original captions, extracting keywords + florence2 descriptions.

#

The goal is ideally to make a full fine tune of Flux, since with so many images a lora might not make sense. Any tips, guides?

#

Around 75k images

jade hornet Sep 23, 2024, 8:44 PM

#

you would typically be right with that approach. 75k images dataset probably not suitable for dreambooth/lora. The problem is flux is so new I dont know anyone that has done a full finetune on it yet. basically, you're treading into new territory here

twilit cradle Sep 26, 2024, 9:59 AM

#

hey hey quick question if anyone knows! kinda new to this. I am using kohya to train loras, but want to queue up the trainings to test different parameters (when one training ends, the other starts up without me having to manually start it). All i have found so far is to print the training command for each configuration, and then paste them into a script or bat? is first question. 🙂

twilit cradle Sep 26, 2024, 10:26 AM

#

twilit cradle hey hey quick question if anyone knows! kinda new to this. I am using kohya to t...

nvm i just figured it out! in case anyone else asks, after hitting print training command, your terminal will show a line that starts with "kohya_ss\venv\Scripts\accelerate.EXE launch ...." and ends with your .toml file (any arguments you have will show after the toml, for example, --network_train_unet_only)
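
For anyone who would rather queue the runs from Python than from a .bat file, here is a minimal sketch. It assumes each printed training command has been split into a list of arguments; the function name and the placeholder commands are made up, and nothing here is part of kohya itself:

```python
import subprocess


def run_queue(commands, runner=subprocess.run):
    """Run each command to completion, in order, and collect return codes.

    `commands` is a list of argument lists (one per training run).
    `runner` is injectable so the loop can be dry-run without training.
    """
    results = []
    for command in commands:
        args = list(command)
        completed = runner(args)  # blocks until this run exits
        results.append((args, completed.returncode))
    return results


# Hypothetical usage; paste the parts of your own printed command instead.
# queue = [
#     ["accelerate", "launch", "train_script.py", "run_a.toml"],
#     ["accelerate", "launch", "train_script.py", "run_b.toml"],
# ]
# for args, code in run_queue(queue):
#     print(code, " ".join(args))
```

A failed run (non-zero return code) does not stop the queue in this sketch, so one bad config does not waste the rest of the night.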

pine relic Sep 26, 2024, 10:18 PM

#

hi all, I am a complete beginner to finetuning SD models. I have tried to set up the Automatic1111 web ui on my mac but failed badly as I was getting some error related to device type being mps which I was unable to fix.

do you have any advice on where to start if I am a complete beginner with not much knowledge in software development and need an easy way to finetune the sd model? The use case is teaching it to generate images of a specific product. Thank you in advance. And sorry if this is written anywhere, I was unable to find it

jade hornet Sep 26, 2024, 10:27 PM

#

pine relic hi all, I am a complete beginner to finetuning SD models. I have tried to set up...

1. dont do it on mac if you can avoid it. what matters most is gpu, and unfortunately apple does not shine here.
2. automatic1111 wont help with finetuning, it's only for doing image generation locally.
3. check out this link for a good training app. maybe you should look into doing it on a cloud service such as vast.ai, if you have limited compute choices locally. https://github.com/bmaltais/kohya_ss

pine relic Sep 27, 2024, 7:05 AM

#

jade hornet 1. dont do it on mac if you can avoid it. what matters most is gpu, and unfortun...

thank you! Have you ever tried running the colab notebook? I am getting this error and no idea what to do with it:
ImportError: cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub' (/usr/local/lib/python3.10/dist-packages/huggingface_hub/__init__.py)
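
An ImportError of this shape often means the installed `huggingface_hub` is older than what the library importing it expects. That is an assumption here, since nobody in the thread confirmed the cause. A small diagnostic sketch that could be run in a notebook cell; the helper names `module_exports` and `installed_version` are made up for illustration.

```python
import importlib
from importlib import metadata


def module_exports(module_name, symbol):
    """Return True if the module imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("huggingface_hub version:", installed_version("huggingface_hub"))
    print(
        "exports split_torch_state_dict_into_shards:",
        module_exports("huggingface_hub", "split_torch_state_dict_into_shards"),
    )
```

If the second line prints `False`, upgrading the package (for example `pip install -U huggingface_hub`) and restarting the runtime is a common remedy, though it was not tested against this particular notebook.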

jade hornet Sep 27, 2024, 2:41 PM

#

I stopped using colab when they changed their EULA to forbid it on their free tier. you can pay for colab pro, but honestly vast is better imo

mighty sedge Sep 27, 2024, 11:04 PM

#

first time I've managed to make my loss graph look this beautiful lmao

jade hornet Sep 27, 2024, 11:31 PM

#

.08? wow...hopefully the actual results match that

mighty sedge Sep 27, 2024, 11:32 PM

#

nah, it overfit a long while ago lmao

#

this is actual

jade hornet Sep 27, 2024, 11:33 PM

#

umm, well depending on what you're going for... 😄

mighty sedge Sep 27, 2024, 11:33 PM

#

I think best epoch was this one:

#

I'm trying to revive a dead artist style, but sdxl can't keep up with original, I've been trying for weeks

#

this is the source, look at those patterns.. wish sdxl could do it lmao

jade hornet Sep 27, 2024, 11:35 PM

#

that's a lot of detail

mighty sedge Sep 27, 2024, 11:36 PM

#

yah.. might have better luck with flux, but I don't got the hardware/patience for it

jade hornet Sep 27, 2024, 11:37 PM

#

hard to say, I'm still working on flux loras, and kohya only just recently added the ability to train the text encoder.

mighty sedge Sep 27, 2024, 11:37 PM

#

any luck with text encoder training? I'm used to training unet only, not sure if text enc training benefits style use cases like this

jade hornet Sep 27, 2024, 11:38 PM

#

well it converges quicker I can say that, but my current training is still in progress, so I'm not satisfied with it yet we'll say

mighty sedge Sep 27, 2024, 11:38 PM

#

I see, pretty nice to share insights into this black box that is training lmao

jade hornet Sep 27, 2024, 11:39 PM

#

I'm doing one with multiple concepts just to see if it works...when I get strange defects I can't tell if it's going off course or if I just need more steps, so onward

mighty sedge Sep 27, 2024, 11:40 PM

#

yah, too many params, overfitting can take many forms. I tried the 2 layer flux training on civitai and it created carbon copies of source material, tried again burning a heap of buzz, 32dim,16 alpha, and got decent results
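
For context on the "32dim, 16 alpha" settings: in kohya-style LoRA the low-rank update is scaled by alpha / dim before being added to the frozen weight, so these values apply the learned update at half strength. Whether civitai's trainer follows exactly this convention is an assumption. A small NumPy sketch, with made-up layer sizes and hypothetical function names:

```python
import numpy as np


def lora_scale(alpha, dim):
    """Scale factor applied to the low-rank update (alpha / dim)."""
    return alpha / dim


def merged_weight(weight, down, up, alpha):
    """Return weight + (alpha / dim) * (up @ down).

    weight: (out_features, in_features) frozen base weight
    down:   (dim, in_features) trained down-projection
    up:     (out_features, dim) trained up-projection
    """
    dim = down.shape[0]
    return weight + lora_scale(alpha, dim) * (up @ down)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Layer sizes are illustrative only; dim and alpha match the message above.
    out_features, in_features, dim, alpha = 64, 48, 32, 16
    weight = rng.normal(size=(out_features, in_features))
    down = rng.normal(size=(dim, in_features))
    up = rng.normal(size=(out_features, dim))
    print("scale:", lora_scale(alpha, dim))
    print("merged shape:", merged_weight(weight, down, up, alpha).shape)
```

Under this convention, halving alpha at a fixed dim has an effect similar to dampening the update, which is one reason alpha and learning rate tend to be tuned together.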

#

can't beat good ol adamw with low learning rate

mighty sedge Sep 28, 2024, 1:06 AM

#

I kinda like it, but not good enough for me..

#

Maybe lowering learning rate could help with the patterns and details?

#

already running LoHa at rank 32, so pretty high

mighty sedge Sep 28, 2024, 6:11 PM

#

Yah.. gave it a shot on flux..

jade hornet Sep 29, 2024, 4:36 PM

#

I'll say that's definitely an interesting and complex style to try to mimic, it's not bad even if it misses the mark

mighty sedge Sep 29, 2024, 11:30 PM

#

yah.. I got some interesting results after training.. much closer to source

#

just released it on civitai, "ayahuasca dreams"

#

but ofc there's no soul.... the original artist had intention behind each detail

charred ferry Sep 30, 2024, 12:49 AM

#

Hey there everyone!

I want to develop an application which converts hand drawn sketches to images of clothes/garment etc

Can i achieve this by finetuning StableDiffusion? If yes, how so?
I would appreciate resources on this

Moreover, my model will have image+text input

Is there any other better approach than stable diffusion?

Open to all suggestions, thank you!

mighty sedge Sep 30, 2024, 1:25 AM

#

SD can already do this using img2img. Take a look into controlnets, "canny" and "lineart"

#

by using prompt + img2img + lower-strength controlnets you can achieve this

#

Finetuning or LoRAs are used in case you want to teach the models new concepts, but there are already a bunch of models with photorealistic or fantasy styles, so unless you need something very specific you wouldn't need it.

#

Something of note is that there are "inpaint" models, which work better in some img2img scenarios. From personal experience, "pony" models work best too.

#

this is a pretty bad example using canny only, not even img2img with a paint sketch