CUDA OOM error on 3b model while using zero3, qlora, fp16 AND 4 a6000 GPUs!! | Learn AI Together | Page 1

I know this error is like beating a dead horse, but I'm really, really, really stuck (I've been trying to solve this for the past 2 WEEKS) and don't know what's wrong. I'm trying to SFT Qwen2.5-VL-3B-Instruct on only 500 image-and-text samples but keep getting CUDA OOM even though I'm using every single trick I can find.

There are posts about initializing it before calling .from_pretrained (did that, didn't change anything). I've also used accelerate, batch size 1, gradient checkpointing and everything, but I just can't get this to work. Here are my train, ds_config and model_loader files. It's only ~1M trainable parameters and each A6000 should have 48GB of VRAM... It's a bit of a tedious thing to debug, so I'm willing to tip/buy an e-coffee for anyone who can give me advice on this @-@

train: https://pastebin.com/D4g7DXbN
ds_config: https://pastebin.com/9iSqNS3c
model_loader: https://pastebin.com/TnepKhkQ
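
For context on why this OOM is surprising, here is a rough back-of-envelope sketch of the static memory involved. It is not taken from the linked files: the round 3 billion parameter count, the 16 bytes per trainable parameter rule of thumb (fp16 weight and gradient plus fp32 master copy and two Adam moments), and the helper name `static_memory_gib` are all assumptions for illustration. It ignores quantization overhead, CUDA context, and above all activations, which grow with image resolution and sequence length.

```python
def static_memory_gib(n_params, bytes_per_param, n_gpus=1):
    """Weights-only memory per GPU in GiB, assuming even sharding across n_gpus."""
    return n_params * bytes_per_param / n_gpus / 2**30


# Numbers from the post, rounded (assumptions, not measured values).
N_PARAMS = 3e9       # "3b" model, taken as a round 3 billion parameters
N_TRAINABLE = 1e6    # ~1M trainable parameters
N_GPUS = 4           # 4x A6000
VRAM_GIB = 48        # per-GPU memory

# Frozen base weights under three scenarios.
fp16_full = static_memory_gib(N_PARAMS, 2)             # fp16, whole model on one GPU
int4_full = static_memory_gib(N_PARAMS, 0.5)           # 4-bit, ignoring quantization constants
fp16_sharded = static_memory_gib(N_PARAMS, 2, N_GPUS)  # fp16, evenly sharded (ZeRO-3 style)

# Trainable parameters: weight + gradient + optimizer state,
# using the assumed 16 bytes per parameter rule of thumb.
trainable_state = static_memory_gib(N_TRAINABLE, 16)

print(f"fp16 weights, unsharded:   {fp16_full:.2f} GiB")
print(f"4-bit weights, unsharded:  {int4_full:.2f} GiB")
print(f"fp16 weights, 4-way shard: {fp16_sharded:.2f} GiB")
print(f"trainable params + state:  {trainable_state:.3f} GiB")
print(f"per-GPU budget:            {VRAM_GIB} GiB")
```

Under these assumptions the weights and optimizer state take up only a small fraction of a single 48GB card, so whatever is filling the memory is something this estimate does not count.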

