I need to fine-tune a text-to-image model. I'm trying to do it with Stable Diffusion, but 16 GB of VRAM does not seem to be enough, even for 64x64 images and a batch size of 1.
Is this normal, or am I doing something wrong?
Do you know of a lighter version of Stable Diffusion?
How could I configure SD so that it has fewer parameters?
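For context on the memory question, here is a rough back-of-envelope sketch of how parameter count maps to training memory. It assumes full fine-tuning in fp32 with an Adam-style optimizer (weights, gradients, and two optimizer moments per parameter) and ignores activations entirely. The figure of roughly 860 million parameters for the SD v1.x UNet is an approximation, and the function name `training_memory_gib` is only illustrative.

```python
def training_memory_gib(num_params, bytes_per_param=4, optimizer_states=2):
    """Rough lower bound on training memory in GiB.

    Counts weights + gradients + optimizer states only.
    Activations, the VAE, and the text encoder are NOT included.
    """
    copies = 1 + 1 + optimizer_states  # weights, gradients, Adam moments
    return num_params * bytes_per_param * copies / 1024**3


# Assumption: approximate parameter count of the SD v1.x UNet.
unet_params = 860_000_000

print(f"Full model: {training_memory_gib(unet_params):.1f} GiB")
# Hypothetical smaller model with half the parameters, for comparison.
print(f"Half-size:  {training_memory_gib(unet_params // 2):.1f} GiB")
```

Under these assumptions the fixed cost alone is about 12.8 GiB, so image size and batch size would have little effect on whether training fits in 16 GB.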
Thank you!