#🔧｜finetune


mighty sedge
#

for your use case:
the ControlNet strength dictates how faithful the output is to the drawing

#

same prompt, different strength


charred ferry
#

My goal is to only render real-life garments from input sketches

I think finetuning would be needed here, as I need to teach the model to generate Eastern-specific real-world clothes from sketches

These clothes should have realistic texture, embroidery patterns and much more

mighty sedge
#

There are 2 base models that are viable for this:
SDXL -> can be finetuned or used to create a LoRA. You can finetune with 16GB VRAM or more; more is always better. It can run on 8GB cards, but 12GB is recommended for all the upscaling and ControlNets.
FLUX -> can create LoRAs; finetuning is still in its infancy. Minimum 16GB VRAM for any serious work, 24GB is best. Not sure how much VRAM you need to create a LoRA, let alone finetune.

If you don't wanna invest in hardware, you can create a LoRA on Civitai for about 2 dollars (SDXL) or 5 dollars (Flux) per run.

A LoRA is easier to train: you can use about 15-30 images per concept and use it with the base model.

Finetuning is a different beast; the only way to train without investing in hardware is renting an online GPU.

On top of that, it is not an exact science: there is a lot of trial and error involved, and only hands-on experience can help.

The hurdles are:
1- Selecting the dataset: choosing the images for training. A single bad image can ruin training, and you need to know how to spot them.

2- Captioning: you need to caption each image accordingly, setting a trigger word for the concept you want to capture. Example: you want to capture a cross-pattern skirt in the training dataset. You caption everything in the photo except the skirt, then add a trigger word for the skirt. The trigger word is chosen carefully so it does not bleed into known words and concepts; you could choose, for example, "sk1rt".

3- Training parameters: this is a huge can of worms, there are just too many params: learning rate, rank, alpha, normalization and so on. There are guides for all those concepts online; none of them are very good, but they serve as a great starting point.
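
The captioning step in point 2 can be sketched roughly as follows. This is a minimal sketch, assuming sidecar `.txt` caption files next to each image; the `build_caption` and `write_caption` helpers, the tag list and the file name are hypothetical and only illustrate the idea of describing everything except the concept.

```python
from pathlib import Path
import tempfile

TRIGGER = "sk1rt"  # trigger word taken from the example above


def build_caption(trigger: str, other_tags: list[str]) -> str:
    """Put the trigger first, then describe everything except the concept."""
    # The concept itself (the cross-pattern skirt) is deliberately not described,
    # so the trigger word is the only token left to absorb it.
    return ", ".join([trigger, *other_tags])


def write_caption(image_path: Path, caption: str) -> Path:
    """Write the caption next to the image as <image stem>.txt."""
    caption_path = image_path.with_suffix(".txt")
    caption_path.write_text(caption + "\n", encoding="utf-8")
    return caption_path


# Hypothetical tags for one training photo
caption = build_caption(TRIGGER, ["woman standing", "white blouse", "city street", "daylight"])

with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "img_001.jpg"  # hypothetical image file name
    saved = write_caption(image, caption)
    print(saved.name, "->", saved.read_text(encoding="utf-8").strip())
```

Whether tags or natural-language sentences work better depends on the base model, so treat the tag style here as a placeholder.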

jade hornet
#

Well said. I'd add that captioning is another one of those trial-and-error categories, and different models respond differently to different captioning techniques. How to caption is really something that, well, maybe in several years we'll understand better, but right now I feel like many of us are doing our own thing and figuring out what works.

eternal snow
#

Hi everyone, I’m using the free tier of Google Colab and was able to fine-tune an SDXL model with five images of Emma D'Arcy from House of the Dragon using AutoTrain DreamBooth with the following parameters:

!autotrain dreambooth \
--model 'stabilityai/stable-diffusion-xl-base-1.0' \
--project-name ${PROJECT_NAME} \
--image-path Images/ \
--prompt "Photo of Emma D'Arcy" \
--resolution 1024 \
--batch-size 1 \
--num-steps 500 \
--gradient-accumulation 4 \
--lr 1e-4 \
--mixed-precision fp16

However, when I try to fine-tune an SDXL model on a logo, the model isn’t able to learn it. Does anyone know why this might be happening, and could you suggest a solution to fix it?

mighty sedge
#

do you just want to reproduce the logo, or put it on clothes and so on?

#

if it's the second option, you will need to add mockups to your dataset

eternal snow
#

I want to fine-tune the model by adding a specific logo into its vocabulary, using a trigger word like C4MP_L0G0. My goal is for the model to generate images based on prompts such as "A tall building with C4MP_L0G0 on it" or "People running a marathon wearing C4MP_L0G0 branded t-shirts," where the logo appears in the generated images accordingly.

mighty sedge
#

Yeah, you need to create mockups. They don't need to be perfect, but the dataset must contain a few

#

vectorize the logo on photopea and put it on some shirts and so on to create dataset

#

Also, C4MP_L0G0 is a bad trigger word

#

C4 is a known concept for the model, LOG too

#

even if it is all a single word it bleeds concepts

placid dune
#

training a Flux dev LoRA on a 48GB VRAM machine, but it's only consuming 25GB of VRAM. any solution?

neat fox
#

I mean... You could increase the rank lol

#

You could also fire up comfyui and load a model and cancel the job to eat up the extra vram if that helps you feel better lol

valid sentinel
jade hornet
lament rock
#

Hey all! I have a very niche requirement: I would like to inpaint a very specific background type, and would like to finetune a model so that, when an image of an object is input, it inpaints that specific type of background. Do you have any guides on how I can finetune a model to achieve this?

Example of the aforementioned background type (a gradient background):

#

The idea is that I can pass the image below (with a transparent background), and the model will then inpaint the transparent area with a gradient background along with the shadow:

mighty sedge
#

No need to finetune I guess, you just need a comfyUI workflow

lament rock
#

Thanks - the reason for fine tuning is because if I specify “gradient background” in the prompt, the model doesn’t inpaint a gradient background with shadows

mighty sedge
#

juggernaut model

#

not perfect as is but if you adjust the params might get closer

lament rock
#

Thank you for sharing - yes I’ve actually been able to do this, but what I’m looking for is a specific type of background that is consistent. The problem I have with a non fine tuned model is that the prompt doesn’t return the gradient background consistently.

The goal is if I’m able to enter a token “gradientbackgroundBlackWhite” in the prompt, it would return the image but with the gradient background (as well as the relevant shadows) consistently

mighty sedge
#

Just train a lora then

#

tho I think it is easier to just photoshop at this point

#

since it is very specific

lament rock
#

Thanks - I’ve been trying to search for how to train a LoRA for inpainting and the types of datasets to use, but either I'm looking in the wrong places or it’s not there. Are there any guides out there?

mighty sedge
#

Only trial and error tbh, hyperparameter tuning takes a long time

#

but for the dataset you'll need some mockups

#

like various objects in this background

#

then caption the objects and leave the background with the trigger word

lament rock
#

Thank you @mighty sedge!

mighty sedge
#

try your first loras on civitai

#

you can train about 5 sdxl loras with 5 dollars

#

the default settings are usually "okay" and give you a starting point if you wanna go local training

sage plover
#

Hey guys, any advice? I trained a Flux LoRA on this b/w anime style, but it isn't absorbing the trigger word properly. It works fine when I crank the strength to 1.3-1.4 and use prompting that explains the style.

#

Should I increase dims and drop LR? This was already trained on 3k total steps, 16/16, batch size 2 and natural captions without style description.

desert fulcrum
#

Lora training: I trained a man, but he has a female v*… (Pony training). Any idea how to fix that with another training?

mighty sedge
#

either you add some naked photos to the dataset or adjust captioning. Pony is heavily biased toward women

desert fulcrum
#

Okay, thanks.

charred ferry
#

Hey there everyone!
Could anyone provide me a guide for LoRA finetuning of SDXL or Flux?
I have prepared a dataset that contains pairs of sketches and their corresponding realistic images,
along with a caption for each realistic image.

My model will take a sketch as input and convert it into a realistic image using a prompt.

How do I move forward with LoRA finetuning on this custom dataset?

stray pecan
#

aside from that, am looking for Flux training guides myself, I haven't managed to get it to work on a 3090 yet

pine relic
#

Hi, I am trying to use dreambooth to finetune SDXL base 1.0 on google console, using the NVIDIA L4 for GPUs.
When I select my parameters, the training either does not run and returns an error, or it runs but when I load the resulting safetensors into AUTOMATIC1111, it produces very weird images (like the one attached; the prompt was something with a man standing in nature). Has anyone experienced this? What am I doing wrong? Would anyone be willing to share an example of training parameters that actually work?

mighty sedge
#

Do you generate samples while training? gotta check those every 100 steps or so and check when this starts happening

#

I've never had this problem, but it looks like model collapse?

#

or it literally happens every epoch you try to load?

celest prism
rancid scarab
#

🔥🔥 any recommendations to finetune 3.5? Code, configs...? 🔥🔥 (No LoRA, finetune)

pallid hollow
#

I have created a LoRA based on the Shakker-Labs/AWPortrait-FL pretrained model and runwayml/stable-diffusion-v1-5 unet and vae.

Is it possible to upload the LoRA to some service and use it as pay-per-use? So I don't have to rent a gpu?

What service should I use for this and how?

storm blade
#

hello everyone, I want to fine-tune sd1.5-inpainting for humans, which VAE model should I choose? waow

latent charm
#

sd1.5 vae

candid ledge
#

Was anyone already able to train SD 3.5 on KohyaSS ?
He released a new repo named 3-3.5-FLUX ?
Anyone ?

mighty sedge
#

it's beta, works but barely

cunning heath
rancid scarab
#

Is the HG code for SD3 working with 3.5?
Hope SAI can release official training and fine-tuning code for the model/CN, maybe LoRA/CN LoRA. Providing the tools to the community would help the devs a lot 😉

rancid scarab
#

About 3.5 controlnet, is there an approximate release date? 🥳

eternal snow
#

Hi everyone,

I've been fine-tuning an SDXL model on a dataset of 15 images featuring a logo. This dataset includes the logo in different colors and placements, such as on billboards, booths, and walls. For each image, I used varied prompts like “a photo of sks logo, [caption generated by Florence 2].”

When I generated images using prompts like “a photo of sks logo on top of a building” or “a photo of sks logo on a t-shirt,” the logo appeared distorted.

To troubleshoot, I created mockups of t-shirts with the logo and fine-tuned an SDXL model using instance_prompt="photo of sks shirt." This worked well for prompts like “photo of a boy wearing an sks shirt,” but it didn’t yield results for prompts such as “sks logo on top of a tall building.”

I read that DreamBooth LoRA performs better with instance prompts in the format “a photo of [unique identifier] [class].” In the second case, when "t-shirt" was the class, the model performed well for that context. However, using "logo" as the class in the first model didn’t produce accurate results.

Is there an approach I can try to generate the sks logo in varied contexts (like on buildings, t-shirts, etc.) more reliably?

mighty sedge
#

This is partly due to how lossy the VAE is: it compresses the image, so everything that is "far away" in the view and has fewer pixels to work with will look distorted. You should upscale/inpaint and see if the results are better
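
A rough bit of arithmetic behind the VAE point above. This sketch assumes the usual 8x spatial downscale of the SD 1.5 / SDXL VAE; the logo sizes are hypothetical and `latent_cells` is an illustrative helper, not part of any library.

```python
VAE_DOWNSCALE = 8  # SD 1.5 and SDXL VAEs shrink each spatial side by 8x


def latent_cells(width_px: int, height_px: int, factor: int = VAE_DOWNSCALE) -> tuple[int, int]:
    """Return how many latent cells cover a region of the given pixel size."""
    return max(1, width_px // factor), max(1, height_px // factor)


# Hypothetical logo sizes inside a 1024x1024 render
for label, size in [("close-up on a t-shirt", 400), ("far away on a building", 48)]:
    w, h = latent_cells(size, size)
    print(f"{label}: {size}px logo -> {w}x{h} latent cells")
```

A logo that only spans a handful of latent cells has very little room for fine shapes, which is one reason an upscale or inpaint pass on that region can help.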

steady pumice
#

Hi! Where can I find the ultimate guide to LORA training? Not a simple guide that explains what to do - but an advanced guide, which explains what to do when you're not happy with the results..?

mighty sedge
#

If you need assistance you can ping me

digital scaffold
#

Are there any instructions on when to use which option in Forge's parameters, for example, the "Diffusion in low bits"?

torpid basalt
#

do i need to use lora or dreambooth? in my case i want to do a finetuning using paintings

frigid wigeon
#

hello guys im trying to full finetune sdxl on 100k images any recommendation what should i use ?simpletuner or kohyass or anything else

crude holly
#

I have trained more than 100 LoRAs for SDXL using kohya. I assume it works for finetuning too.

#

@frigid wigeon

#

Anyone gotten kohya working on Ubuntu 24.04? I had it working great on 22.04. Can't get it running on 24.

lofty iris
tawny storm
#

de-distill, libre flux, or fluxdev-2pro; which is the best model for finetuning flux?

gaunt cosmos
#

getting some weird loss

#

idk why its happening, this config has worked before

#

help

gray karma
#

Your learning rate is set to 1

#

set it to 0.0001 at most

#

In general, if your loss explodes, it's probably because your LR is too high
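
A toy illustration of why a learning rate of 1 blows up while 0.0001 does not. This is plain gradient descent on a one-dimensional quadratic, not a diffusion model; the curvature value is an arbitrary assumption chosen so that the effect is visible.

```python
def gradient_descent(lr: float, steps: int = 20, curvature: float = 4.0, x0: float = 1.0) -> list[float]:
    """Minimise f(x) = curvature * x**2 / 2 and return the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        grad = curvature * x      # derivative of f at x
        x = x - lr * grad         # plain gradient descent update
        losses.append(curvature * x * x / 2)
    return losses


for lr in (1.0, 0.0001):
    losses = gradient_descent(lr)
    print(f"lr={lr}: first loss {losses[0]:.4g}, last loss {losses[-1]:.4g}")
```

On this toy problem any learning rate above 2 / curvature overshoots further on every step, so the loss grows instead of shrinking. Real networks have no single curvature, but the same overshoot is the usual reason a loss curve explodes.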

slow moon
#

Yooo everyone. Does anyone have tips on training a 2.1 lora for a specific art style (painter) using around 100 sample images in different formats using Kohya or another tool??

#

I feel like the loras i train are either heavily under or overtrained:(

neat geode
#

Hello, does anyone know what multires noise discount and multires noise iterations do? From what I read, the values range from 0.1 to 1.0 and from 1 to 10 respectively, and with higher values the model will pay more attention to fine details like face, eyes and skin. Is that true?
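
For what the two settings control mechanically, here is a simplified sketch of pyramid-style multi-resolution noise in pure Python. It is an assumption-laden illustration, not the trainer's code: it halves the resolution at each level and skips any final renormalisation, and all function names are hypothetical.

```python
import random


def randn_grid(h, w, rng):
    """A grid of independent standard-normal samples."""
    return [[rng.gauss(0, 1) for _ in range(w)] for _ in range(h)]


def upsample_nearest(grid, h, w):
    """Stretch a small grid to h x w by repeating values."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[y * gh // h][x * gw // w] for x in range(w)] for y in range(h)]


def multires_noise(h, w, iterations, discount, seed=0):
    rng = random.Random(seed)
    noise = randn_grid(h, w, rng)  # ordinary per-pixel noise
    for i in range(1, iterations + 1):
        lh, lw = max(1, h // 2**i), max(1, w // 2**i)
        coarse = upsample_nearest(randn_grid(lh, lw, rng), h, w)
        weight = discount**i  # each coarser layer counts for less
        for y in range(h):
            for x in range(w):
                noise[y][x] += weight * coarse[y][x]
        if lh == 1 or lw == 1:
            break  # cannot go any coarser
    return noise


for discount in (0.1, 0.8):
    grid = multires_noise(8, 8, iterations=3, discount=discount)
    flat = [v for row in grid for v in row]
    print(f"discount={discount}: value range {min(flat):.2f} .. {max(flat):.2f}")
```

In this sketch, iterations is the number of coarser noise layers stacked on top of the base noise, and discount sets how quickly their weight falls off, so a larger discount mixes in more large-scale (low-frequency) noise.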

calm owl
#

gud question

main breach
#

Ok how do people train LoKR with network_dim of 100000000? I have 80gb of RAM and that's not enough to train a LoKR with even 10000 dim. Also despite the huge dim they're only 64mb? What am I not understanding here?
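
One possible explanation for the small file size, sketched as parameter arithmetic. LoKr stores a weight update as a Kronecker product of two small matrices, so the count below is just the sizes of those factors. The rule for when the second factor is kept as a full matrix is an assumption here (check the docs of the trainer you use), and the 3072 x 3072 layer and factor of 16 are hypothetical.

```python
def lokr_param_count(out_dim: int, in_dim: int, factor: int, dim: int) -> int:
    """Count parameters of W ~ kron(A, B) for one linear layer.

    A is (factor x factor); B is (out_dim/factor x in_dim/factor).
    Assumption: B is split into a rank-`dim` pair only when that is smaller
    than storing B in full; otherwise B is kept as a full matrix.
    """
    if out_dim % factor or in_dim % factor:
        raise ValueError("factor must divide both dimensions")
    b_rows, b_cols = out_dim // factor, in_dim // factor
    a_params = factor * factor
    low_rank_b = dim * (b_rows + b_cols)
    full_b = b_rows * b_cols
    return a_params + min(low_rank_b, full_b)


full_weight = 3072 * 3072  # hypothetical layer size
for dim in (8, 100_000_000):
    count = lokr_param_count(3072, 3072, factor=16, dim=dim)
    print(f"dim={dim}: {count} params vs {full_weight} in the full weight")
```

Under that assumption, an enormous dim acts as a switch meaning "do not low-rank the second factor", which would explain why the result stays small. A trainer that instead allocates matrices of the requested dim literally would run out of memory, so it is worth checking which behaviour your tool has.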

ocean dune
charred ferry
#

any guide on training wan2.1 i2v lora?

hot breach
ocean dune
#

How much training data/frames can one fit in a motion LoRA? I'm currently accumulating tons of clips that I plan to build a large motion LoRA from, but I don't know how much I can stuff in before it becomes "too much dataset". Or is there no such thing as too large a dataset? Haha

ocean dune
#

Wew, didn't realize rank 128 lora would be so damn huge lol
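
Rough arithmetic for why rank 128 gets large: each adapted linear layer gets a down matrix and an up matrix, so the parameter count grows linearly with rank. The layer count and the 3072 x 3072 shapes below are hypothetical, and 2 bytes per parameter assumes fp16/bf16 storage.

```python
def lora_params(in_dim: int, out_dim: int, rank: int) -> int:
    """A LoRA pair is down (rank x in_dim) plus up (out_dim x rank)."""
    return rank * (in_dim + out_dim)


def lora_size_mb(layers: list[tuple[int, int]], rank: int, bytes_per_param: int = 2) -> float:
    """Approximate file size, assuming 2 bytes per parameter by default."""
    total = sum(lora_params(i, o, rank) for i, o in layers)
    return total * bytes_per_param / 1024**2


# Hypothetical model: 200 adapted linear layers of 3072 x 3072
layers = [(3072, 3072)] * 200
for rank in (16, 32, 128):
    print(f"rank {rank}: ~{lora_size_mb(layers, rank):.0f} MB")
```

Going from rank 16 to rank 128 multiplies the size by 8 regardless of the exact layer shapes.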

gaunt cosmos
#

how do i avoid training a style while making a concept lora?
the dataset is relatively balanced in terms of art styles, but the end result overpowers any other style loras, and the style itself just does not look good
IDK if i can use reg images, its a niche concept that the base model cant do

torpid basalt
#

For LoRA finetuning on SDXL, which is recommended: diffusers, kohya or OneTrainer?

last dove
#

Stylized compass integrated into a TV screen or antenna

gentle flame
#

@thorny briar

wind sorrel
#

Hi! I recently opened TensorBoard to have a glance at the performance of my trained LoRAs. I can't really figure out what the loss function represents. Is it a distance between generated images and the dataset? Or something else? I can't really find sources online.
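
On the loss question: with the common noise-prediction objective, the logged loss is the mean squared error between the noise that was added to a training latent and the model's guess of that noise, not a comparison between finished images and the dataset. A minimal sketch of that idea, with stand-in lists instead of tensors; other objectives (v-prediction, flow matching) regress a different target, so check which one your trainer uses.

```python
import math
import random


def add_noise(latent, noise, alpha_bar):
    """Forward diffusion: mix the clean latent with Gaussian noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * n for x, n in zip(latent, noise)]


def mse(pred, target):
    """Mean squared error, the value typically logged as 'loss'."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


random.seed(0)
latent = [random.uniform(-1, 1) for _ in range(16)]  # stand-in for an encoded training image
noise = [random.gauss(0, 1) for _ in range(16)]      # the noise the model must recover
noisy = add_noise(latent, noise, alpha_bar=0.5)      # what the model would receive as input

perfect_prediction = list(noise)       # a hypothetical ideal model
lazy_prediction = [0.0] * len(noise)   # a model that always predicts "no noise"

print("loss, perfect model:", mse(perfect_prediction, noise))
print("loss, lazy model:   ", round(mse(lazy_prediction, noise), 3))
```

Because the target is random noise at a random timestep, the curve is noisy by nature and a lower value does not map cleanly onto better-looking samples.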

ocean dune
terse wharf
#

Hi everyone
I am trying to fine-tune a Flux + LoRA model on RunPod.
I've deployed an RTX 4090 with the fluxgym template and filled in all the UI fields.
But I've run into some errors related to Hugging Face. I've already logged in with huggingface-cli using a read-permission token.
How can I resolve this?

jolly urchin
hoary ember
#

Have any of you managed to successfully fine tune Qwen-Image on consumer GPUs? I have a 2 x RTX 4090 machine with 512 GB system RAM and even with all blocks swapped to RAM, I can't get it to fine tune Qwen-Image with musubi-tuner, even using tiny 256px images

dawn schooner
#

Hello,
I’m looking for a multiview lipsync model. I know a few, but I'm having trouble running them on my 5080 in a virtual machine. I've worked with LatentSync, KeySync, and GeneFace++, but I'm encountering either a CUDA sm_120 error or issues with multiview. Does anyone know of a suitable model?

grizzled folio
#

I've been out of the loop for a while and getting back into it now, trying to train some Wan 2.2 LoRAs and have some noob questions. So the I2V LoRAs won't work with T2V LoRAs and vice versa, right? How about the Wan Fun Control models? Are they based on I2V or T2V and will LoRAs trained on one of those work with it?

inner parcel
#

Can anyone guide me to some resources on how to train an img2img lora?

#

I want the model to add Christmas lights to houses consistently. img2img models give inconsistent results with just prompting and reference images, but they can remove Christmas lights perfectly. I could easily create a dataset by removing the lights from images and then training the LoRA with the houses with lights as the target and the houses without lights as the input images.
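
The dataset idea above could be organised roughly like this: one folder of control images (lights removed) and one of targets (original photos), matched by file name. A minimal sketch; the folder names, file names and the `pair_by_stem` helper are hypothetical, and the layout a given trainer expects may differ.

```python
from pathlib import Path
import tempfile


def pair_by_stem(control_dir: Path, target_dir: Path) -> list[tuple[Path, Path]]:
    """Match control and target images that share a file stem."""
    targets = {p.stem: p for p in target_dir.iterdir() if p.is_file()}
    pairs = []
    for control in sorted(control_dir.iterdir()):
        if control.is_file() and control.stem in targets:
            pairs.append((control, targets[control.stem]))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    no_lights = Path(tmp) / "no_lights"      # control: lights removed
    with_lights = Path(tmp) / "with_lights"  # target: original photo
    no_lights.mkdir()
    with_lights.mkdir()
    for name in ("house_01.png", "house_02.png"):
        (no_lights / name).touch()
    (with_lights / "house_01.png").touch()   # house_02 has no target on purpose
    pairs = pair_by_stem(no_lights, with_lights)
    print([(c.name, t.name) for c, t in pairs])
```

Dropping unmatched files up front avoids silently training on a control image with the wrong target.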

dapper prism
#

Has anyone done any work on figuring out what layers are responsible for grid artifacts / screen door artifacts in Qwen-Image?

grave berry
#

Hi, I'm testing Qwen edit, and I see in nodes_qwen.py that the CLIP tokenizer accepts a list of images (tensors):

```python
conditioning = clip.encode_from_tokens_scheduled(tokens)
if ref_latent is not None:
    conditioning = node_helpers.conditioning_set_values(conditioning, {"reference_latents": [ref_latent]}, append=True)
return (conditioning, )
```

But if I send a list of reference images (I don't want to concatenate them), the result always matches only the first image. `images` and `ref_latent` are simply list variables built with `append()`. Is it possible to send a list of tensors (and latents) to the conditioning so that all input images are processed without merging them?

sullen lagoon
#

Hey, If I'm cropping a bunch of images for a character lora, should I be adding random backgrounds so the model doesn't pick up the background as part of the lora?

#

Maybe something like this, but I pan around so no background is the same?

ocean dune
#

To make a successful motion only lora, what should i adjust? Using 1 clip, 33 frames long, tried 200 repeats, 1e-5 learning rate, rank 16, after 10 epochs, still no closer to any resemblance of motion, but if i do 2e-5, 150 repeats and rank 32, it gets the motion, but 80% of the result is of the character of said motion Thonk Using diffusion pipe with said parameters altered from the stock wan dataset parameters.

rugged cedar
#

Hello everyone. Has anyone tried fine-tuning SD 3.5 Medium? I've been struggling for days to get it working, but so far I've only noticed that the AdamW optimizer is killing it. It drops to pure noise even at ultra-low LR (1e-7) after a thousand steps. Adafactor seems more stable, but no matter what I do (low LR, high LR) the quality just degrades. The dataset is excellent, very large and high quality, and I use kohya sd-scripts. Could there be something wrong with my settings? Maybe SD 3.5 needs a special optimizer? I even tried setting max grad norm to 0.1, but it only slowed down the degradation. Even lower: NaN.

#

accelerate launch --multi_gpu --num_cpu_threads_per_process=8 sd3_train.py --pretrained_model_name_or_path="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/sd3.5_medium.safetensors" --clip_l="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/text_encoders/clip_l.safetensors" --clip_g="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/text_encoders/clip_g.safetensors" --t5xxl="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/text_encoders/t5xxl_fp16.safetensors" --train_data_dir="/workspace/Pictures-and-tags" --output_dir="/workspace/output" --output_name="Q84" --resolution="1024,1024" --enable_bucket --min_bucket_reso=448 --max_bucket_reso=4096 --bucket_reso_steps=64 --train_batch_size=13 --max_train_steps=65000 --learning_rate=9e-6 --max_grad_norm=1.0 --lr_scheduler="constant_with_warmup" --lr_warmup_steps=1500 --mixed_precision="bf16" --gradient_checkpointing --optimizer_type="Adafactor" --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" --sdpa --save_every_n_steps=550 --save_precision="bf16" --save_model_as="safetensors" --sample_every_n_steps=100 --sample_prompts="/workspace/sample_prompts.txt" --sample_sampler="euler" --logging_dir="/workspace/logs" --log_with="wandb" --max_data_loader_n_workers=6 --caption_extension=".txt" --seed=16426 --caption_dropout_rate=0.05 --resize_interpolation area --ddp_gradient_as_bucket_view --loss_type "l2" --vae=/workspace/Vae-checkpoints/vae_clean.safetensors --training_shift 3.0 --weighting_scheme="logit_normal" --gradient_accumulation_steps 2
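
A quick bit of arithmetic on the command above, since batch size interacts with learning rate. This sketch assumes 2 GPUs (the command says `--multi_gpu` but not how many) and assumes `max_train_steps` counts optimizer steps; both are assumptions, and the helper names are hypothetical.

```python
def effective_batch_size(per_gpu_batch: int, grad_accum: int, num_gpus: int) -> int:
    """Images contributing to each optimizer step."""
    return per_gpu_batch * grad_accum * num_gpus


def images_seen(max_steps: int, batch: int) -> int:
    """Total training samples processed, assuming max_steps counts optimizer steps."""
    return max_steps * batch


NUM_GPUS = 2  # assumption: the GPU count is not stated in the command
batch = effective_batch_size(per_gpu_batch=13, grad_accum=2, num_gpus=NUM_GPUS)
print("effective batch size:", batch)
print("images seen over 65000 steps:", images_seen(65000, batch))
print("warmup share of training:", round(1500 / 65000, 3))
```

Knowing the effective batch size makes it easier to compare learning rates against other people's configs, which are usually quoted for much smaller batches.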

#

Degradation over steps. First this black-and-white noise, then latent noise

#

It just degrades, even at very low lr. I increased the batch size to 256-512, but it didn't help. The lower the lr, the slower the degradation. Increasing it to 5e-5 simply speeds it up; visually, the degradation is the same. First, the image is zoomed in and cropped, then it dissolves.
Training shift 1.0/3.0. Degradation everytime.
What I've tried so far: max grad norm from a minimum of 0.01 up to completely disabled; I couldn't notice much of a difference. AdamW and Adafactor: I described my experience above. I tried Prodigy, but it crashes to NaN after a couple of steps if I set the lr to 1e-6 or higher. If I set it lower, it simply doesn't increase the lr, the model doesn't move, and the samples remain at roughly the same low level.

The problem isn't with vae. Using a custom one isn't the reason. I removed this parameter completely and used the vae from the model, but the problem persisted. I'd appreciate any tips or help. Surprisingly, the loss, despite such degradation, either plateaus or decreases very slowly. But the visual quality remains very low, despite changing the validation sampler, replacing the sampler manually with comfyui, 30-400 steps, and 3-18 cfg. In any case, the quality doesn't improve. If this helps, I can provide logs and wandb metrics.

#

I suspected the shuffle tags argument might be the cause, perhaps because Clip or T5 with mmdit can't shuffle tags. Unfortunately, even using the tags in a consistent order didn't help at all.

molten turret
#

Hey everyone. I have a request for the community here:

I am working on some research in school about failed SD models: collapsed, overfit, etc to kind of identify and document some different characteristics.

I can't seem to find many in the wild because obviously people don't upload their failures, so I wanted to see if anyone here has some particularly bad or weird models they would be down to share that I could run some tests on. Having the training parameters would be great but not necessary. Any base model is fine, I'm kind of doing an inventory and will share the results. If you'd like to share I won't post anything publicly without permission besides a few outputs on the document.

Here is an example from a trash SD2.1 model of art museum photos that I kind of love because the outputs are so weird.

vague surge
#

🙂

restive loom
#

hey,

i help teams fine-tune models for their specific use cases. whether it's adapting llama 3, mistral, or a smaller specialized model, i focus on making it actually improve your workflow, not just look cool in a demo.

i handle the data prep, training loops, eval, and deployment so you don't have to wrestle with the details.

if you've got a dataset and a goal, but aren't sure how to bridge the gap, hit me up. happy to review your approach or jump in end-to-end.

grave berry
#

Hi, in my new Comfy node I have presets for analog color films (CF), black-and-white films (BWF) and CCD sensors (CCD), but I don't really know whether they are accurate or just hallucinated by the vibe-coding tool I used to create the presets. Can somebody test it? I would like to know whether the results are really similar or just something random.

dense sail
#

Hey, not sure if this is the right place for this, feel free to point me elsewhere if not.
I don't think it counts as self-promotion, because it is just an open-source framework I'm making available in case it helps someone.

I’ve been working with LoRAs for a while and got frustrated with how inconsistent they can feel depending on prompt/setup, so I built a local workbench to evaluate them more systematically from ComfyUI workflows.

It stores measured outputs + build context, and lets you review runs side-by-side with metrics instead of just eyeballing results.

Main thing I found: there isn’t really a universal “score” for LoRA impact; a lot of them behave very differently depending on conditions, so the tool focuses on capturing evidence and letting you inspect it.

Thought it might be useful to others here:
https://github.com/Gyropilot2/lora-evaluation-project


grave berry
#

Hi, Comfy users! Check out the preview of my new post-processing node here: https://youtu.be/ya1Kq8U3EWg.
For developers: One simple JavaScript controls the separation of features by titles, labels, and feature preview pictures.

Increase the realism of generated images, simulate analog films, CCDs, phones, photo papers, and apply LUT .cube files. Rasterix is part of the Primere nodepack. Get the Primere nodepack from GitHub.