#Recommended tips to mitigate VRAM usage?

main fiber
#

I recently bought a 3090 24GB, but seem to still be running out of vram. I am using a quantized t5 as well as ae, clip-l, and clip-g, and i'm using about 18 GB of vram. Any recommendations to mitigate the vram usage? As soon as i throw in a 2 GB lora, i'm getting OOM errors

azure cloud
#

messing with the text encoders and vae is barking up the wrong tree

#

don't mess with them, leave them default

#

you want to swap the main model itself (presumably a Flux model from what you're describing) for a smaller variant
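
For a rough sense of why the main model is where the memory goes, here is a back-of-the-envelope sketch of weight size at different precisions. It assumes about 12 billion parameters (the published size of FLUX.1 dev; the model in this thread is only presumed to be Flux) and approximate bits-per-weight figures for the GGUF formats. Names like `weight_gib` are made up for illustration, and real usage will be higher once activations, text encoders, VAE, and LoRAs are counted.

```python
# Rough weight-memory estimate for a diffusion model at different precisions.
# ASSUMPTION: 12e9 parameters (published size of FLUX.1 dev).
# ASSUMPTION: bits-per-weight values for GGUF formats are approximate;
# real files differ because some layers are kept at higher precision.

PARAMS = 12e9

BITS_PER_WEIGHT = {
    "fp16/bf16": 16.0,
    "fp8": 8.0,
    "Q8_0 (approx)": 8.5,
    "Q4_K_S (approx)": 4.5,
}


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GiB, ignoring activations and overhead."""
    return params * bits_per_weight / 8 / 2**30


def estimate_all(params: float = PARAMS) -> dict:
    """Map each format name to its estimated weight size in GiB."""
    return {name: weight_gib(params, bits) for name, bits in BITS_PER_WEIGHT.items()}


if __name__ == "__main__":
    for name, gib in estimate_all().items():
        print(f"{name:>16}: ~{gib:.1f} GiB")
```

Under these assumptions a 4-bit quant of the main model is several times smaller than the fp16 weights, which is a far larger saving than anything available from the text encoders or VAE.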

main fiber
#

the model i made is a kohya dreambooth model of myself, is it possible to quantize it to make it smaller?

azure cloud
main fiber
#

yeah i could do that, but the lora doesn't really capture styles nearly as well as the dreambooth version

azure cloud
main fiber
#

Thank you

main fiber
#

ok so i quantized my model, but when i try to load it, i'm getting "All available backends failed to load the model '/mnt/Kodi_Backup/Applications/StableSwarmUI/SwarmUI/Models/diffusion_models/Flux/Zono/Zono_1-000150-Q4_K_S.gguf'."

azure cloud
#

post the output of server->logs->pastebin button

main fiber
#

i think i got it, i remember a long time ago with gguf you had to set the metadata for it to flux.dev

azure cloud
#

yes

#

which is what it tells you there: "backend #0 failed to load model with error: Model loader for Flux/Zono/Zono_1-000150-Q4_K_S.gguf didn't work - architecture ID is missing. Please click Edit Metadata on the model and apply a valid architecture ID."

main fiber
#

yeah the quantization didn't work, i'm getting a black box at inference time. i'm also using arch linux. i did find some commands that converted to fp16 and then to fp8, but that's not working inference wise. I'll keep looking
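
When a freshly quantized file misbehaves, one cheap first check is whether the file is a well-formed GGUF container at all. The sketch below reads only the fixed header (magic bytes, version, tensor count, metadata count) and assumes the GGUF v2+ layout with 64-bit counts. The function name `read_gguf_header` is hypothetical. It cannot detect bad weight values, so a file can pass this check and still render black images.

```python
import struct


def read_gguf_header(path):
    """Return basic header fields of a GGUF file, or raise ValueError.

    ASSUMPTION: GGUF v2+ layout, little-endian:
    4-byte magic, uint32 version, uint64 tensor count, uint64 metadata count.
    """
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24:
        raise ValueError("file is too short to contain a GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack("<4sIQQ", raw)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic bytes were {magic!r})")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Calling `read_gguf_header("model.gguf")` (a placeholder path) on a truncated or mis-converted file raises `ValueError`. A tensor count of zero or far below what the source checkpoint has would suggest the conversion step dropped data.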