I know this error is like beating a dead horse, but I'm really, really, really stuck (I've been trying to solve this for the past 2 WEEKS) and don't know what's wrong. I'm trying to SFT Qwen2.5-VL-3B-Instruct on only 500 samples of images and text, but I keep getting CUDA OOM even though I'm using every single trick I can find.
There are posts about initializing it before calling .from_pretrained (did that, it didn't change anything). I've also used accelerate, batch size 1, gradient checkpointing and everything else, but I just can't get this to work. Here are my train, ds_config and model_loader files. It's only ~1M trainable parameters and each A6000 should have 48GB of VRAM... It's a bit of a tedious thing to debug, so I'm willing to tip/buy an e-coffee for anyone who can give me advice on this @-@
train: https://pastebin.com/D4g7DXbN
ds_config: https://pastebin.com/9iSqNS3c
model_loader: https://pastebin.com/TnepKhkQ
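For what it's worth, here's the back-of-envelope math behind my "this should fit" claim. It's only a rough sketch: it assumes bf16 weights, Adam with an fp32 master copy plus two fp32 moments (12 bytes per trainable parameter), and a nominal 3e9 total parameters taken from the "3B" in the model name. The function name is made up for illustration, and it ignores activations, image tokens and CUDA overhead entirely.

```python
def estimate_static_memory_gib(total_params, trainable_params,
                               weight_bytes=2, grad_bytes=2,
                               optim_bytes_per_param=12):
    """Rough static memory in GiB: weights + gradients + optimizer state.

    Assumptions (not taken from my actual config):
      - weights stored in bf16 (2 bytes each)
      - gradients kept only for trainable params, in bf16
      - Adam keeping an fp32 master copy and two fp32 moments (12 bytes)
    Activations and framework overhead are NOT included.
    """
    weights = total_params * weight_bytes
    grads = trainable_params * grad_bytes
    optim = trainable_params * optim_bytes_per_param
    return (weights + grads + optim) / 1024**3


if __name__ == "__main__":
    # Nominal 3e9 total params, ~1M trainable params (as in my setup).
    static_gib = estimate_static_memory_gib(3_000_000_000, 1_000_000)
    print(f"static memory estimate: {static_gib:.2f} GiB of 48 GiB")
```

If that's roughly right, the static part is only a few GiB per GPU, so the OOM is presumably coming from activations (e.g. long image token sequences) or from how the model gets loaded/sharded, which is the part I can't figure out.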