#Recommended tips to mitigate VRAM usage?
messing with the text encoders and vae is barking up the wrong tree
don't mess with them, leave them default
you want to swap the main model itself (presumably a Flux model from what you're describing) for a smaller variant
the model i made is a kohya dreambooth model of myself, is it possible to quantize it to make it smaller?
well, A, if it's a small flux training run, use a lora, not a full checkpoint omg
yeah i could do that, but the lora doesn't really capture styles nearly as well as the dreambooth version
B, there are gguf quant scripts in https://github.com/city96/ComfyUI-GGUF/tree/main/tools
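For a rough sense of what quantizing buys you, here is a minimal back-of-the-envelope sketch. The parameter count (about 12B for a Flux dev-sized model) and the ~4.5 bits per weight for a Q4-class quant are assumptions for illustration, not measured values, and real files come out somewhat larger because some tensors are kept at higher precision.

```python
def estimated_size_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB.

    Ignores file metadata and any tensors kept at higher precision,
    so treat the result as a lower bound.
    """
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    # Assumption: roughly 12 billion parameters for a Flux dev-sized model.
    flux_params = 12e9
    # Assumption: ~4.5 bits per weight as a stand-in for a Q4-class quant.
    for name, bpw in [("fp16", 16), ("fp8", 8), ("Q4-class (assumed ~4.5 bpw)", 4.5)]:
        print(f"{name}: ~{estimated_size_gib(flux_params, bpw):.1f} GiB")
```

This only covers the weights of the main model; the text encoders, VAE, and activations take VRAM on top of that.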
Thank you
ok so i quantized my model, but when i try to load it, i'm getting "All available backends failed to load the model '/mnt/Kodi_Backup/Applications/StableSwarmUI/SwarmUI/Models/diffusion_models/Flux/Zono/Zono_1-000150-Q4_K_S.gguf'."
post the output of server->logs->pastebin button
Content of Swarm Debug Log Paste #129159: SwarmUI v0.9.4.0 Server Log - 2024-12-18 08:58:26... pasted 2024/12/18 05:58:27 UTC-08:00, Paste length: 91741 characters across 738 lines, Content: 2024-12-18 08:52:16.656 [Init] === SwarmUI v0.9.4.0 Starting at 2024-12-18 08:52:16 ===2024-12-18 08...
i think i got it, i remember a long time ago with gguf you had to set the metadata for it to flux.dev
yes
which is what it tells you there: "backend #0 failed to load model with error: Model loader for Flux/Zono/Zono_1-000150-Q4_K_S.gguf didn't work - architecture ID is missing. Please click Edit Metadata on the model and apply a valid architecture ID."
yeah the quantization didn't work, i'm getting a black box at inference time. i'm also on arch linux. i did find some commands that convert to fp16 and then to fp8, but the result doesn't work for inference either. I'll keep looking
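One cheap sanity check before digging further is to confirm the output file is structurally a GGUF at all. The sketch below reads only the fixed header, assuming a little-endian GGUF version 2 or later (where the two counts are 64-bit); `read_gguf_header` is a hypothetical helper name, not part of SwarmUI or ComfyUI-GGUF. It won't explain a black output on its own, but a bad magic or a tensor count of zero would point at the conversion step rather than the loader.

```python
import struct


def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version, tensor and metadata counts.

    Assumes a little-endian file using GGUF version 2 or later,
    where both counts are stored as unsigned 64-bit integers.
    """
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    version, tensor_count, kv_count = struct.unpack("<IQQ", head[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_gguf.py path/to/model.gguf
    if len(sys.argv) > 1:
        print(read_gguf_header(sys.argv[1]))
```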