# CUDA OOM error on 3B model while using ZeRO-3, QLoRA, fp16 AND 4 A6000 GPUs!!

eternal folio

I know this error is like beating a dead horse, but I'm really, really, really stuck (I've been trying to solve this for the past 2 WEEKS) and don't know what's wrong. I'm trying to SFT Qwen2.5-VL-3B-Instruct on only 500 image+text samples, but I keep getting CUDA OOM even though I'm using every single trick I can find.

There are posts about initializing it before calling .from_pretrained (did that, didn't change anything). I've also used accelerate, batch size 1, gradient checkpointing and everything else, but just can't get this to work. Here are my train, ds_config and model_loader files. It's only ~1M trainable parameters and each A6000 should have 48GB of VRAM... It's a bit of a tedious thing to debug, so I'm willing to tip/buy an e-coffee for anyone who can give me advice on this @-@

train: https://pastebin.com/D4g7DXbN
ds_config: https://pastebin.com/9iSqNS3c
model_loader: https://pastebin.com/TnepKhkQ
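
A quick sanity check on the "~1M trainable parameters, 48GB per GPU" point: the sketch below estimates memory for weights and trainable state only, leaving out activations. It assumes (none of this is taken from the posted files) ~3e9 base parameters going by the model name, ~1e6 trainable LoRA parameters, and an Adam-style optimizer that keeps an fp32 master copy plus two fp32 moments per trainable parameter. The helper names are hypothetical.

```python
# Back-of-envelope memory for weights + trainable state only (no activations).
# Assumptions, not taken from the posted files: ~3e9 base parameters (from the
# model name), ~1e6 trainable LoRA parameters, and an Adam-style optimizer that
# keeps an fp32 master copy and two fp32 moments per trainable parameter.
# Quantization constants and any layers left unquantized are ignored.

GIB = 2**30


def base_weights_gib(n_params: float, bits_per_param: float) -> float:
    """Frozen base weights at a given precision (16 = fp16, 4 = 4-bit)."""
    return n_params * bits_per_param / 8 / GIB


def lora_state_gib(n_trainable: float) -> float:
    """fp16 weight + fp16 grad (2 + 2 bytes) + fp32 master + two fp32 moments (4 + 4 + 4 bytes)."""
    return n_trainable * (2 + 2 + 4 + 4 + 4) / GIB


if __name__ == "__main__":
    fp16 = base_weights_gib(3e9, 16)
    four_bit = base_weights_gib(3e9, 4)
    lora = lora_state_gib(1e6)
    # Even unsharded on a single GPU, these are far below 48 GB.
    print(f"fp16 base: {fp16:.2f} GiB | 4-bit base: {four_bit:.2f} GiB | LoRA state: {lora:.3f} GiB")
```

Under these assumptions it prints about 5.59 GiB (fp16), 1.40 GiB (4-bit) and 0.015 GiB (LoRA state). ZeRO-3 shards parameters, gradients and optimizer state across the 4 GPUs, which shrinks these further, but by itself it does not partition the activations of each GPU's micro-batch. Those grow with sequence length, so if the weights are this small the OOM is more likely coming from how long each sample is.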

old kindle

I assume it's trying to use the whole context window at once.
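
One rough way to test that theory: Qwen2.5-VL turns each image into a variable number of visual tokens, so a single high-resolution image can make one "batch size 1" sample very long. The sketch below approximates that count. It assumes each token covers a 28×28-pixel area (14-pixel patches merged 2×2) and loosely mirrors the processor's resize-to-a-pixel-budget behaviour; `visual_tokens` is a hypothetical helper, not part of any library, and the real processor may differ in edge cases.

```python
import math
from typing import Optional

# Approximate visual-token count for one image in Qwen2.5-VL.
# Assumptions: one token per 28x28-pixel area (14-px patches merged 2x2), and the
# image is resized so each side is a multiple of 28 and its area fits the pixel budget.
FACTOR = 28


def visual_tokens(height: int, width: int,
                  min_pixels: int = 4 * FACTOR * FACTOR,
                  max_pixels: Optional[int] = None) -> int:
    """Hypothetical helper: estimated visual tokens after resizing (no cap if max_pixels is None)."""
    # Snap each side to the nearest multiple of FACTOR (at least one token per side).
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if max_pixels is not None and h * w > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt(height * width / max_pixels)
        h = max(FACTOR, math.floor(height / beta / FACTOR) * FACTOR)
        w = max(FACTOR, math.floor(width / beta / FACTOR) * FACTOR)
    elif h * w < min_pixels:
        # Too small: grow both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return (h // FACTOR) * (w // FACTOR)


if __name__ == "__main__":
    print(visual_tokens(3024, 4032))                             # uncapped phone photo
    print(visual_tokens(3024, 4032, max_pixels=1280 * 28 * 28))  # capped pixel budget
```

With these assumptions a 4032×3024 phone photo comes out at 15,552 visual tokens uncapped versus 1,230 with `max_pixels=1280*28*28`. As far as I know, the Qwen2.5-VL model card shows passing `min_pixels`/`max_pixels` to `AutoProcessor.from_pretrained` to bound this, so it may be worth checking what model_loader sets.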