#finetune
My goal is to only render real-life garments from input sketches
I think finetuning would be needed here, as I need to teach the model to generate eastern-specific real-world clothes from sketches
These clothes should have realistic texture, embroidery patterns and much more
There's 2 base models that are viable for this:
SDXL -> can finetune or create LoRA, you can finetune with 16GB vram or more, more is always better. Can run in 8GB cards, recommended 12GB for all the upscaling and controlnets.
FLUX -> can create LoRAs, finetuning still in its infancy. minimum 16GB vram for any serious work, 24 best. Not sure how much vram you need to create lora let alone finetune.
If you don't wanna invest in hardware you can create lora on civitai for about 2 dollars SDXL or 5 dollars flux each run.
Lora is easier to train, you can use about 15-30 images per concept and use it with the base model.
Finetuning is a different beast, only way to train it without investing is renting online gpu.
on top of that it is not an exact science, there is a lot of trial and error involved and only hands on experience can help.
The hurdles are:
1- Selecting dataset: selecting the images for training, a single bad image can ruin training and you need to know how to spot em.
2- Captioning: You need to caption each image accordingly, setting a trigger word for the concept you want to capture. Ex: you want to capture a cross-pattern skirt in the training dataset. You caption everything in the photo except the skirt, then add a trigger word for the skirt. The trigger word is chosen carefully so it doesn't bleed into known words and concepts; you could choose, for example, "sk1rt".
3- Training parameters: This is a huge can of worms, there's just too many params. Learning rate, rank, alpha, normalization and so on. There's guides for all those concepts online, none of them are very good but serve as a great starting point.
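A minimal sketch of the captioning step from the list above, assuming the common convention of one `.txt` caption file per image with the same file stem. `build_caption` and `write_caption` are hypothetical helper names, not part of any trainer:

```python
from pathlib import Path


def build_caption(trigger: str, context_tags: list[str]) -> str:
    """Put the trigger word first, then describe everything EXCEPT the concept."""
    tags = [trigger] + [t.strip() for t in context_tags if t.strip()]
    return ", ".join(tags)


def write_caption(image_path: Path, trigger: str, context_tags: list[str]) -> Path:
    """Write the caption next to the image as <image stem>.txt (assumed convention)."""
    caption_path = image_path.with_suffix(".txt")
    caption_path.write_text(build_caption(trigger, context_tags), encoding="utf-8")
    return caption_path


if __name__ == "__main__":
    # The skirt itself is not described; the trigger word "sk1rt" stands in for it.
    print(build_caption("sk1rt", ["woman standing", "white blouse", "street background"]))
```

The point of the design is that whatever is left undescribed gets absorbed by the trigger word, so the context tags should cover everything else in the photo.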
well said, I'd add that captioning is another one of those trial-and-error type categories, and the different models respond differently to different captioning techniques. How to caption is really something that, well, maybe in several years we'll have some kind of better understanding, but right now I feel like many of us out there are doing our own thing and figuring out what works
Hi everyone, I'm using the free tier of Google Colab and was able to fine-tune an SDXL model with five images of Emma D'Arcy from House of the Dragon using Autotrain Dreambooth with the following parameters:
!autotrain dreambooth \
--model 'stabilityai/stable-diffusion-xl-base-1.0' \
--project-name ${PROJECT_NAME} \
--image-path Images/ \
--prompt "Photo of Emma D'Arcy" \
--resolution 1024 \
--batch-size 1 \
--num-steps 500 \
--gradient-accumulation 4 \
--lr 1e-4 \
--mixed-precision fp16
However, when I try to fine-tune an SDXL model on a logo, the model isn't able to learn it. Does anyone know why this might be happening, and could you suggest a solution to fix it?
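For the Autotrain settings above, it can help to work out how often each image is actually seen before comparing the portrait run with the logo run. This is only a back-of-the-envelope sketch; it assumes `--num-steps` counts optimizer updates (after gradient accumulation), which I have not verified for Autotrain. `training_exposure` is a hypothetical helper:

```python
def training_exposure(num_images: int, batch_size: int, grad_accum: int, num_steps: int) -> dict:
    """Estimate how many samples a run consumes and how often each image repeats."""
    effective_batch = batch_size * grad_accum   # samples per optimizer update
    samples_seen = effective_batch * num_steps  # total samples over the run
    return {
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
        "passes_per_image": samples_seen / num_images,
    }


if __name__ == "__main__":
    # Values taken from the command above: 5 images, batch 1, accumulation 4, 500 steps.
    print(training_exposure(num_images=5, batch_size=1, grad_accum=4, num_steps=500))
```

Under that assumption, five images at these settings means each image is repeated hundreds of times, so a different dataset size needs a matching change in step count.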
you want just to regenerate the logo or put it in clothes and so on?
if it is second option you will need to add mockups on your dataset
I want to fine-tune the model by adding a specific logo into its vocabulary, using a trigger word like C4MP_L0G0. My goal is for the model to generate images based on prompts such as "A tall building with C4MP_L0G0 on it" or "People running a marathon wearing C4MP_L0G0 branded t-shirts," where the logo appears in the generated images accordingly.
Yeh.. you need to create mockups, dont need to be perfect, but dataset must contain a few
vectorize the logo on photopea and put it on some shirts and so on to create dataset
Also, C4MP_L0G0 is a bad trigger word
C4 is known concept for the model, LOG too
even if it is all a single word it bleeds concepts
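A rough way to screen a trigger word for this kind of bleed, as a sketch only. It checks for known words hiding inside the trigger, both as typed and after undoing simple digit-for-letter swaps. The word list is a made-up example; a real check would look at how the model's own tokenizer splits the string. `risky_fragments` is a hypothetical helper and deliberately conservative:

```python
# Digit-for-letter substitutions that people commonly use in trigger words.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def risky_fragments(trigger: str, known_words: set[str]) -> list[str]:
    """Return known words that appear inside the trigger, raw or with digits undone."""
    raw = trigger.lower()
    plain = raw.translate(LEET)
    return sorted(w for w in known_words if w in raw or w in plain)


if __name__ == "__main__":
    # Hypothetical vocabulary sample; not taken from any real tokenizer.
    vocab_sample = {"c4", "log", "logo", "camp", "shirt"}
    print(risky_fragments("C4MP_L0G0", vocab_sample))
```

An empty result does not prove a trigger is safe; it only means none of the words you listed were found.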
training a flux dev lora on 48gb vram machine but its only consuming 25gb vram any solution?
I mean... You could increase the rank lol
You could also fire up comfyui and load a model and cancel the job to eat up the extra vram if that helps you feel better lol
Try to increase your micro-batch size.
Also FYI training on the free tier of colab is against Google's TOS. You don't have to care but it could get your account disabled
Hey all! I have a very niche requirement where I would like to inpaint a very specific background type and would like to finetune a model that, when an image of an object is inputted, It inpaints that aforementioned specific type of background. Do you have any guides as to how I can finetune a model to achieve what I want to do?
Example of the aforementioned background type (a gradient background):
The idea is that I can pass the below image (with a transparent background), and the model will then inpaint the transparent area with a gradient background along with the shadow:
No need to finetune I guess, you just need a comfyUI workflow
Thanks - the reason for fine tuning is because if I specify "gradient background" in the prompt, the model doesn't inpaint a gradient background with shadows
juggernaut model
not perfect as is but if you adjust the params might get closer
Thank you for sharing - yes I've actually been able to do this, but what I'm looking for is a specific type of background that is consistent. The problem I have with a non fine tuned model is that the prompt doesn't return the gradient background consistently.
The goal is if I'm able to enter a token "gradientbackgroundBlackWhite" in the prompt, it would return the image but with the gradient background (as well as the relevant shadows) consistently
Just train a lora then
tho I think it is easier to just photoshop at this point
since it is very specific
Thanks - I've been trying to search how to train a Lora for inpainting and the types of datasets to use, but I'm either looking in the wrong places or it's not there. Are there any guides out there?
Only trial and error tbh, hyperparameter tuning takes a long time
but for the dataset you'll need some mockups
like various objects in this background
then caption the objects and leave the background with the trigger word
Thank you @mighty sedge!
try your first loras on civitai
you can train about 5 sdxl loras with 5 dollars
the default settings are usually "okay" and give you a starting point if you wanna go local training
Hey guys, any advice? I trained a Flux LoRA on this b/w anime style, but it isn't absorbing the trigger word properly. It works fine when I crank the strength to 1.3-1.4 and use prompting that explains the style.
Should I increase dims and drop LR? This was already trained on 3k total steps, 16/16, batch size 2 and natural captions without style description.
Lora training: I trained a man, but he has a female v*… (Pony training). Any idea how to fix that with another training?
either you add some naked photos to the dataset or adjust captioning. Pony is heavily biased toward women
Okay, thanks.
Hey there everyone!
Could anyone provide me a guide for LoRA finetuning of SDXL or Flux?
I have readied a dataset that contains pairs of sketch and its corresponding realistic images
along with caption for each realistic image
my model will take in sketch input and convert it into realistic image using prompt
how do i move forward to LoRA finetuning on this custom dataset?
image -> image is not really standard Lora training, you probably have to edit existing training repos on your own for that; kinda sounds like making your own ControlNet
aside from that, am looking for Flux training guides myself, I haven't managed to get it to work on a 3090 yet
Hi, I am trying to use dreambooth to finetune SDXL base 1.0 on google console, using the NVIDIA L4 for GPUs.
When I select my parameters, the training either does not run and returns an error, or runs but when I load the resulting safetensors into AUTOMATIC1111, it produces very weird images (like the one attached, the prompt was something with a man standing in nature). Has anyone experienced this? What am I doing wrong? Would anyone be willing to share an example of their training parameters that actually work?
Do you generate samples while training? gotta check those every 100 steps or so and check when this starts happening
I've never had this problem, but it looks like model collapse?
or it literally happens every epoch you try to load?
any recommendations to finetune 3.5? Code, configs...? (No lora, finetune)
I have created a LoRA based on the Shakker-Labs/AWPortrait-FL pretrained model and runwayml/stable-diffusion-v1-5 unet and vae.
Is it possible to upload the LoRA to some service and use it as pay-per-use? So I don't have to rent a gpu?
What service should I use for this and how?
hello everyone, i want to fine-tune sd1.5-inpainting for humans, which vae model should i choose?
sd1.5 vae
Was anyone already able to train SD 3.5 on KohyaSS ?
He released a new repo named 3-3.5-FLUX ?
Anyone ?
its beta, works but barely
One trainer has much better support for SD3.5 I would recommend you to use that instead
Thanks mate, I have been on Simple Tuner afterward, good trainer. Require linux experience but more well documented repo
Is the HG code for SD3 working with 3.5 ?
Hope SAI could release official training and fine-tuning code for model/CN, maybe Lora/CNLora. Providing the tool to the community will help the devs a lot
About 3.5 controlnet, is there an approximate release date ?
Hi everyone,
I've been fine-tuning an SDXL model on a dataset of 15 images featuring a logo. This dataset includes the logo in different colors and placements, such as on billboards, booths, and walls. For each image, I used varied prompts like "a photo of sks logo, [caption generated by Florence 2]."
When I generated images using prompts like "a photo of sks logo on top of a building" or "a photo of sks logo on a t-shirt," the logo appeared distorted.
To troubleshoot, I created mockups of t-shirts with the logo and fine-tuned an SDXL model using instance_prompt="photo of sks shirt." This worked well for prompts like "photo of a boy wearing an sks shirt," but it didn't yield results for prompts such as "sks logo on top of a tall building."
I read that DreamBooth LoRA performs better with instance prompts in the format "a photo of [unique identifier] [class]." In the second case, when "t-shirt" was the class, the model performed well for that context. However, using "logo" as the class in the first model didn't result in accurate contexts.
Is there an approach I can try to generate the sks logo in varied contexts (like on buildings, t-shirts, etc.) more reliably?
This is partly due to how lossy the VAE is: it compresses the image, so everything that is "far away" in the view and has fewer pixels to work with will look distorted. You should upscale/inpaint and see if the results are better
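To put a number on the VAE point: the SD/SDXL VAE downsamples each spatial side by a factor of 8, so a small logo ends up with very few latent cells to describe it. A small sketch (`latent_footprint` is a hypothetical helper; the logo sizes are made-up examples):

```python
def latent_footprint(width_px: int, height_px: int, downsample: int = 8) -> tuple[int, int]:
    """Return the (width, height) in latent cells covered by a pixel region."""
    return width_px // downsample, height_px // downsample


if __name__ == "__main__":
    # A full 1024x1024 SDXL image versus a distant logo that is only 64x32 pixels.
    print(latent_footprint(1024, 1024))
    print(latent_footprint(64, 32))
```

A 64x32 pixel logo is left with only an 8x4 patch of latent cells, which is why upscaling or inpainting the logo region at a larger size tends to help.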
Hi! Where can I find the ultimate guide to LORA training? Not a simple guide that explains what to do - but an advanced guide, which explains what to do when you're not happy with the results..?
There isn't one. The professional thing to do, once you're 100% sure the dataset is fine, would be to do a hyperparameter search, like any fine-tuning. You're likely to get a feeling for it tho, like knowing when to increase rank, lower LR, and such. But it is mostly trial and error. Most of the time the problem lies in the dataset instead of the hyperparameters.
If you need assistance you can ping me
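A hyperparameter search like the one mentioned above can start as a plain grid of configs that you train one by one. This is a sketch only; the parameter names and values are placeholders, not recommendations, and `build_grid` is a hypothetical helper:

```python
from itertools import product


def build_grid(search_space: dict[str, list]) -> list[dict]:
    """Expand {name: [values]} into one config dict per combination."""
    keys = list(search_space)
    return [dict(zip(keys, values)) for values in product(*(search_space[k] for k in keys))]


if __name__ == "__main__":
    # Placeholder search space: 3 ranks x 2 learning rates = 6 training runs.
    grid = build_grid({"rank": [8, 16, 32], "learning_rate": [1e-4, 5e-5]})
    for config in grid:
        print(config)
```

Keeping the dataset and seed fixed across the grid makes it easier to tell which change actually caused a difference.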
Are there any instructions on when to use which option in Forge's parameters, for example, the "Diffusion in low bits"?
do i need to use lora or dreambooth? in my case i want to do a finetuning using paintings
hello guys im trying to full finetune sdxl on 100k images any recommendation what should i use ?simpletuner or kohyass or anything else
I have trained more than 100 lora for sdxl using kohya. Assume it works for finetune too.
@frigid wigeon
Anyone gotten kohya working on Ubuntu 24.04? I had it working great on 22.04. Can't get it running on 24.
New technique / AI Flipbook style animation - https://www.youtube.com/watch?v=e-F7rtctxHs
Technique consisting of a new synthetically trained AI model [FLUX.D LORA], a little bit of Python, and some human-made editing.
You can access this LORA as of today through @civitai, and full project files [1760 images + prompts + Py files] through: https://linktr.ee/uisato
#animation #ai #design
de-distill, libre flux, or fluxdev-2pro; which is the best model for finetuning flux?
getting some weird loss
idk why its happening, this config has worked before
help
Your learning rate is set to 1
set it to 0.0001 at most
In general, if your loss explodes, it's probably because your LR is too high
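A toy illustration of why a too-high learning rate makes the loss blow up. It runs plain gradient descent on a one-dimensional quadratic, which is nothing like a diffusion model but shows the same mechanism. The curvature value is an arbitrary assumption chosen so that a step size of 1 overshoots:

```python
def run_gd(lr: float, steps: int = 20, curvature: float = 4.0, x0: float = 1.0) -> list[float]:
    """Gradient descent on loss(x) = 0.5 * curvature * x**2; returns the loss at each step."""
    x = x0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * x * x)
        x -= lr * curvature * x  # the gradient of the loss is curvature * x
    return losses


if __name__ == "__main__":
    high = run_gd(lr=1.0)
    low = run_gd(lr=0.0001)
    print("lr=1.0    first/last loss:", high[0], high[-1])
    print("lr=0.0001 first/last loss:", low[0], low[-1])
```

With `lr=1.0` every step overshoots the minimum by more than it started with, so the loss grows each step; with `lr=0.0001` it shrinks slowly.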
Yooo everyone. Does anyone have tips on training a 2.1 lora for a specific art style (painter) using around 100 sample images in different formats using Kohya or another tool??
I feel like the loras i train are either heavily under or overtrained:(
Hello, does anyone know what multires noise discount and multires noise iterations do? From what I read, the values range from 0.1 to 1.0 and 1 to 10 respectively, and with higher values the model will pay more attention to fine details like face, eyes and skin details. Is that true?
gud question
Ok how do people train LoKR with network_dim of 100000000? I have 80gb of RAM and that's not enough to train a LoKR with even 10000 dim. Also despite the huge dim they're only 64mb? What am I not understanding here?
Would you say this linked way of making hunyuan video loras is the most effective? Or onetrainer?
https://civitai.com/articles/9798/training-a-lora-for-hunyuan-video-on-windows
any guide on training wan2.1 i2v lora?
i used diffusion-pipe to train t2v and looks like it supports i2v, just follow instructions mostly
How much training data/frames can one fit in a motion lora? I'm currently accumulating tonnes of clips that I plan to create a large motion lora of, but I don't know how much I can stuff in before it becomes "too much dataset". Or is there no such thing as too large a dataset?
Wew, didn't realize rank 128 lora would be so damn huge lol
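The size jump follows from how LoRA stores its weights: each adapted linear layer gets two matrices of shape (in, rank) and (rank, out), so parameters grow linearly with rank. A rough estimate, where the layer shapes below are made-up placeholders rather than any real model's layout:

```python
def lora_param_count(layer_shapes: list[tuple[int, int]], rank: int) -> int:
    """Sum rank * (d_in + d_out) over every adapted linear layer."""
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)


def size_mb(params: int, bytes_per_param: int = 2) -> float:
    """Approximate file size in megabytes (2 bytes per parameter for fp16/bf16)."""
    return params * bytes_per_param / 1e6


if __name__ == "__main__":
    # Hypothetical model: 100 adapted layers, each 3072 -> 3072.
    layers = [(3072, 3072)] * 100
    for rank in (16, 128):
        params = lora_param_count(layers, rank)
        print(f"rank {rank}: {params} params, about {size_mb(params):.1f} MB")
```

Going from rank 16 to rank 128 multiplies the file size by 8 for the same set of layers.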
how do i avoid training a style while making a concept lora?
the dataset is relatively balanced in terms of art styles, but the end result overpowers any other style loras, and the style itself just does not look good
IDK if i can use reg images, its a niche concept that the base model cant do
for lora finetuning on sdxl, which is recommended? diffusers, kohya or OneTrainer?
Stylized compass integrated into a TV screen or antenna
@thorny briar
Hi! I recently opened TensorBoard to have a glance at the performance of my trained LoRAs. I can't really figure out what the loss function represents. Is it a distance between generated images and the dataset? Or something else? I can't really find sources online
The loss, iirc, is the resemblance between what you train and what the model already has in terms of information/context.
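For reference, as far as I know the loss logged by most trainers for noise-prediction models (SD 1.5, SDXL) is the mean squared error between the noise that was added to a training latent and the noise the model predicts, not a direct image-versus-dataset distance. Flux and SD3 use a flow-matching target instead, but the idea of "prediction versus known target" is the same. A toy sketch with plain lists, where `predict_noise` is a hypothetical stand-in for the model:

```python
import random


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def training_loss(predict_noise, clean_latent: list[float], noise_level: float, rng: random.Random) -> float:
    """One training-style loss evaluation: add known noise, ask the model to recover it."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean_latent]
    keep = (1.0 - noise_level) ** 0.5   # how much of the clean latent survives
    mix = noise_level ** 0.5            # how much noise is mixed in
    noisy = [keep * x + mix * n for x, n in zip(clean_latent, noise)]
    predicted = predict_noise(noisy, noise_level)
    return mse(predicted, noise)


if __name__ == "__main__":
    latent = [0.5, -1.0, 2.0, 0.0]
    # A "model" that always predicts zero noise; its loss is the mean squared noise.
    print(training_loss(lambda noisy, level: [0.0] * len(noisy), latent, 0.5, random.Random(0)))
```

Because the noise and noise level are random each step, the curve is noisy by nature, which is part of why it is hard to read quality off the TensorBoard loss alone.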
Hi everyone
I am trying to fine-tune flux+lora model on runpod.
I've deployed an RTX 4090 with the fluxgym template and set all the fields in the UI.
But I've encountered some errors which are related with huggingface. I've already login huggingface-cli with read permission token.
How can i resolve this?
I recommend using this subordinate.
pinokio
Have any of you managed to successfully fine tune Qwen-Image on consumer GPUs? I have a 2 x RTX 4090 machine with 512 GB system RAM and even with all blocks swapped to RAM, I can't get it to fine tune Qwen-Image with musubi-tuner, even using tiny 256px images
Hello,
I'm looking for a multiview lipsync model. I know a few, but I'm having trouble running them on my 5080 in a virtual machine. I've worked with LatentSync, KeySync, and GeneFace++, but I'm encountering either a CUDA sm_120 error or issues with multiview. Does anyone know of a suitable model?
I've been out of the loop for a while and getting back into it now, trying to train some Wan 2.2 LoRAs and have some noob questions. So the I2V LoRAs won't work with T2V LoRAs and vice versa, right? How about the Wan Fun Control models? Are they based on I2V or T2V and will LoRAs trained on one of those work with it?
Can anyone guide me to some resources on how to train an img2img lora?
I want the model to add christmas lights to houses consistently. img2img models give inconsistent results with just prompting and reference images, but they can remove the christmas lights perfectly. I could easily create a dataset by removing the lights from images and then training the lora with the houses with lights as the target and the houses with no lights as the inputs
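The paired dataset described above could be assembled by matching files with the same name stem across two folders. This is a sketch under assumptions: the folder names `no_lights/` and `with_lights/` are made up, and the format your trainer expects for pairs may differ. `pair_by_stem` is a hypothetical helper:

```python
from pathlib import PurePath


def pair_by_stem(inputs: list[str], targets: list[str]) -> list[tuple[str, str]]:
    """Match each input image to the target image that shares its file stem."""
    target_by_stem = {PurePath(t).stem: t for t in targets}
    return [
        (i, target_by_stem[PurePath(i).stem])
        for i in sorted(inputs)
        if PurePath(i).stem in target_by_stem  # skip inputs with no matching target
    ]


if __name__ == "__main__":
    inputs = ["no_lights/house2.png", "no_lights/house1.png", "no_lights/orphan.png"]
    targets = ["with_lights/house1.jpg", "with_lights/house2.jpg"]
    for source, target in pair_by_stem(inputs, targets):
        print(source, "->", target)
```

Unmatched files are dropped silently here; for a real dataset it is worth logging them so missing pairs get noticed.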
Has anyone done any work on figuring out what layers are responsible for grid artifacts / screen door artifacts in Qwen-Image?
Hi, I'm testing qwen edit, and I see in nodes_qwen.py that the clip tokenizer accepts a list of images (tensors):
```
conditioning = clip.encode_from_tokens_scheduled(tokens)
if ref_latent is not None:
    conditioning = node_helpers.conditioning_set_values(conditioning, {"reference_latents": [ref_latent]}, append=True)
return (conditioning, )
```
But if I send a list of reference images (I don't want to concat them), the result always matches only the first image. The `images` and `ref_latent` variables are simply lists created by `append()`. Is it possible to send a list of tensors (and latents) to the conditioning to process all input images without merging or conditioning?
Hey, If I'm cropping a bunch of images for a character lora, should I be adding random backgrounds so the model doesn't pick up the background as part of the lora?
Maybe something like this, but I pan around so no background is the same?
To make a successful motion only lora, what should i adjust? Using 1 clip, 33 frames long, tried 200 repeats, 1e-5 learning rate, rank 16, after 10 epochs, still no closer to any resemblance of motion, but if i do 2e-5, 150 repeats and rank 32, it gets the motion, but 80% of the result is of the character of said motion
Using diffusion pipe with said parameters altered from the stock wan dataset parameters.
Hello everyone. Has anyone tried fine-tuning SD 3.5 Medium? I've been struggling for days to get it working, but so far I've only noticed that the AdamW optimizer is killing it. It drops to pure noise even at ultra-low LR (1e-7) after a thousand steps. Adafactor seems more stable, but no matter what I do (low LR, high LR) the quality just degrades. The dataset is excellent, very large and high quality, and I use kohya sd-scripts. Could there be something wrong with my settings? Maybe SD 3.5 needs a special optimizer? I even tried setting max grad norm to 0.1, but it only slowed down the degradation. Even lower gives NaN.
accelerate launch --multi_gpu --num_cpu_threads_per_process=8 sd3_train.py --pretrained_model_name_or_path="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/sd3.5_medium.safetensors" --clip_l="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/text_encoders/clip_l.safetensors" --clip_g="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/text_encoders/clip_g.safetensors" --t5xxl="/root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/text_encoders/t5xxl_fp16.safetensors" --train_data_dir="/workspace/Pictures-and-tags" --output_dir="/workspace/output" --output_name="Q84" --resolution="1024,1024" --enable_bucket --min_bucket_reso=448 --max_bucket_reso=4096 --bucket_reso_steps=64 --train_batch_size=13 --max_train_steps=65000 --learning_rate=9e-6 --max_grad_norm=1.0 --lr_scheduler="constant_with_warmup" --lr_warmup_steps=1500 --mixed_precision="bf16" --gradient_checkpointing --optimizer_type="Adafactor" --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" --sdpa --save_every_n_steps=550 --save_precision="bf16" --save_model_as="safetensors" --sample_every_n_steps=100 --sample_prompts="/workspace/sample_prompts.txt" --sample_sampler="euler" --logging_dir="/workspace/logs" --log_with="wandb" --max_data_loader_n_workers=6 --caption_extension=".txt" --seed=16426 --caption_dropout_rate=0.05 --resize_interpolation area --ddp_gradient_as_bucket_view --loss_type "l2" --vae=/workspace/Vae-checkpoints/vae_clean.safetensors --training_shift 3.0 --weighting_scheme="logit_normal" --gradient_accumulation_steps 2
Degradation over steps. After this comes black-and-white noise, then latent noise
It just degrades, even at very low lr. I increased the batch size to 256-512, but it didn't help. The lower the lr, the slower the degradation. Increasing it to 5e-5 simply speeds it up; visually, the degradation is the same. First, the image is zoomed in and cropped, then it dissolves.
Training shift 1.0/3.0. Degradation everytime.
What I've tried so far: max grad norm from a minimum of 0.01 up to completely disabled; I couldn't notice much of a difference. AdamW and Adafactor: I described my experience above. I tried Prodigy, but it crashes to NaN after a couple of steps if I set the lr to 1e-6 or higher. If I set it lower, it simply doesn't increase the lr, the model doesn't move, and the samples remain at roughly the same low level.
The problem isn't with vae. Using a custom one isn't the reason. I removed this parameter completely and used the vae from the model, but the problem persisted. I'd appreciate any tips or help. Surprisingly, the loss, despite such degradation, either plateaus or decreases very slowly. But the visual quality remains very low, despite changing the validation sampler, replacing the sampler manually with comfyui, 30-400 steps, and 3-18 cfg. In any case, the quality doesn't improve. If this helps, I can provide logs and wandb metrics.
I suspected the shuffle tags argument might be the cause, perhaps because Clip or T5 with mmdit can't shuffle tags. Unfortunately, even using the tags in a consistent order didn't help at all.
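On the max grad norm setting mentioned above: as far as I understand, it is global-norm gradient clipping, where all gradients are scaled by the same factor whenever their combined norm exceeds the limit. The sketch below mirrors that idea with plain lists and is not the trainer's actual code:

```python
import math


def clip_by_global_norm(grads: list[float], max_norm: float) -> tuple[list[float], float]:
    """Scale all gradients down together if their L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The small epsilon avoids division by zero when every gradient is 0.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    grads = [3.0, 4.0]  # L2 norm is 5.0
    for limit in (10.0, 1.0, 0.1):
        clipped, norm = clip_by_global_norm(grads, limit)
        print(f"max_norm={limit}: norm before={norm}, clipped={clipped}")
```

Clipping only rescales the update and never changes its direction, so a very small limit mostly makes each step smaller. That would be consistent with the degradation slowing down rather than stopping, although adaptive optimizers may react differently to rescaled gradients.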
Hey everyone. I have a request for the community here:
I am working on some research in school about failed SD models: collapsed, overfit, etc to kind of identify and document some different characteristics.
I can't seem to find many in the wild because obviously people don't upload their failures, so I wanted to see if anyone here has some particularly bad or weird models they would be down to share that I could run some tests on. Having the training parameters would be great but not necessary. Any base model is fine, I'm kind of doing an inventory and will share the results. If you'd like to share I won't post anything publicly without permission besides a few outputs on the document.
Here is an example from a trash SD2.1 model of art museum photos that I kind of love because the outputs are so weird.
hey,
i help teams fine-tune models for their specific use cases. whether it's adapting llama 3, mistral, or a smaller specialized model, i focus on making it actually improve your workflow, not just look cool in a demo.
i handle the data prep, training loops, eval, and deployment so you don't have to wrestle with the details.
if you've got a dataset and a goal, but aren't sure how to bridge the gap, hit me up. happy to review your approach or jump in end-to-end.
Hi, in my new Comfy node I have presets for analog color (CF) and black-and-white (BWF) films and CCD sensors (CCD), but I don't really know whether they are realistic or just hallucinated by the vibe coder I used to create the presets. Can somebody test it? I would like to know whether the results are really similar or just something random.
Hey, not sure if this is the right place for this, feel free to point me elsewhere if not.
I don't think it counts as self-promotion, because it is just a open-source framework I'm making available if it helps someone.
I've been working with LoRAs for a while and got frustrated with how inconsistent they can feel depending on prompt/setup, so I built a local workbench to evaluate them more systematically from ComfyUI workflows.
It stores measured outputs + build context, and lets you review runs side-by-side with metrics instead of just eyeballing results.
Main thing I found: there isn't really a universal "score" for LoRA impact. A lot of them behave very differently depending on conditions, so the tool focuses on capturing evidence and letting you inspect it.
Thought it might be useful to others here:
https://github.com/Gyropilot2/lora-evaluation-project
Hi, Comfy users! Check out the preview of my new post-processing node here: https://youtu.be/ya1Kq8U3EWg.
For developers: One simple JavaScript controls the separation of features by titles, labels, and feature preview pictures.
Increase the realism of generated images, simulate analog films, CCDs, phones, and photo papers, and apply LUT .cube files. Rasterix is part of the Primere nodepack. Get the Primere nodepack from GitHub.