#StableDiffusion is too big to fine-tune

gritty tendon
#

I need to fine-tune a text-to-image model. I'm trying to do it with Stable Diffusion, but 16 GB of VRAM doesn't seem to be enough, even for 64x64 images and a batch size of 1.

Is this normal, or am I doing something wrong?
Do you know of a lighter version of Stable Diffusion?
How could I configure SD so that it has fewer parameters?

Thank you!

sacred goblet
#

DreamBooth likes ~18-25 GB of VRAM, but there are variations that use less, I believe. You're best off using Colab, to be honest.
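As a rough sanity check on why 16 GB gets tight regardless of image size, here is a back-of-the-envelope sketch. It assumes plain fp32 training with Adam and uses approximate parameter counts for Stable Diffusion v1 (about 860M for the UNet, 123M for the text encoder, 84M for the VAE). The helper names are made up for illustration, and activations, CUDA overhead, and any memory-saving tricks are ignored.

```python
# Rough VRAM estimate for fine-tuning with Adam in fp32.
# Parameter counts are approximate figures for Stable Diffusion v1 (assumption).
UNET_PARAMS = 860e6          # denoising UNet, the part that is usually trained
TEXT_ENCODER_PARAMS = 123e6  # CLIP text encoder, usually frozen
VAE_PARAMS = 84e6            # autoencoder, usually frozen

BYTES_FP32 = 4


def training_bytes(trainable, frozen, bytes_per_value=BYTES_FP32):
    """Weights + gradients + two Adam moment buffers; activations are ignored."""
    weights = (trainable + frozen) * bytes_per_value
    grads = trainable * bytes_per_value
    adam_state = 2 * trainable * bytes_per_value
    return weights + grads + adam_state


def to_gib(n_bytes):
    return n_bytes / 2**30


# Case 1: train only the UNet, keep the VAE and text encoder frozen.
unet_only = training_bytes(UNET_PARAMS, TEXT_ENCODER_PARAMS + VAE_PARAMS)
# Case 2: train every component.
everything = training_bytes(UNET_PARAMS + TEXT_ENCODER_PARAMS + VAE_PARAMS, 0)

print(f"UNet trainable, rest frozen: {to_gib(unet_only):.1f} GiB")
print(f"All components trainable:   {to_gib(everything):.1f} GiB")
```

Under these assumptions the fixed cost alone is somewhere around 14 to 16 GiB before a single activation is stored, so running out of memory at 64x64 with a batch size of 1 is not surprising. The lower-memory variants generally attack the gradient and optimizer terms (fewer trainable parameters, lower precision, or a lighter optimizer state).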

gritty tendon
sacred goblet
#

I'm not actually sure. Something akin to DreamBooth, but with a higher learning rate. I'm not sure whether it freezes anything besides the VAE components and text encoder, but there are repos that help you fine-tune those too.
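For what freezing looks like in practice, here is a minimal PyTorch sketch. The three `nn.Linear` layers are toy stand-ins for the real VAE, text encoder, and UNet, and the learning rate is just a placeholder, so treat this as an illustration of the idea rather than what any particular DreamBooth repo does.

```python
import torch
from torch import nn

# Toy stand-ins for the three Stable Diffusion components (hypothetical sizes).
vae = nn.Linear(8, 8)
text_encoder = nn.Linear(8, 8)
unet = nn.Linear(8, 8)

# Freeze the VAE and text encoder: no gradients are computed for them,
# so they need no gradient buffers and no optimizer state.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)

# Only the UNet parameters are handed to the optimizer.
# The learning rate here is a placeholder, not a recommendation.
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)


def count_trainable(module):
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


trainable = {
    name: count_trainable(module)
    for name, module in [("vae", vae), ("text_encoder", text_encoder), ("unet", unet)]
}
print(trainable)
```

Unfreezing the text encoder as well would mean dropping its `requires_grad_(False)` call and adding its parameters to the optimizer, at the cost of extra gradient and optimizer memory.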

However, I'm not sure exactly why you'd want to. Given a semi-large dataset, you can use some variants of DreamBooth to fine-tune SD into a very different, if not more capable, model. Examples of these trained endpoints are available and are very good. Getting SD-level results from scratch would cost you tens of thousands in compute, I'd assume. Plus the 10 TB of space for the dataset.

gritty tendon
#

Do you know of a reliable small text-to-image model?

sacred goblet
#

That is essentially Stable Diffusion.