#4bit model creation

jaunty saddle
Hi! Is there an easy way to convert any model to 4-bit NF4 (bitsandbytes) for Unsloth, to save space?

I want to try out some Llama 3 and Mistral finetunes that aren't provided and I'd like to not have to store the full precision models on disk.

I know I could write some code using the bitsandbytes and safetensors libs to do it, but I'd prefer something easier and more plug-and-play, preferably without requiring changes to the Unsloth code itself.

If it's not supported, I think it'd be a neat feature for Unsloth to save the converted 4-bit weights to disk (replacing the full-precision model), via a function, an optional argument, or whatever.
Perhaps I could contribute it myself with some direction.

Thanks!

chrome root
wait, as in: once the model finishes downloading, remove the 16-bit model

and keep the 4-bit one?

jaunty saddle
makes sense, no? I, and I assume others, might only ever want to train 4-bit QLoRAs (hardware constraints being a common reason). no reason to waste many GBs of disk space (and I assume the full-precision models load slower too)
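To put a rough number on the "many GBs" above, here is a back-of-the-envelope sketch. The 8e9 parameter count is an assumption chosen for illustration, and the estimate treats every weight as quantized. NF4 in bitsandbytes stores 4 bits per weight plus one scaling constant per block of 64 weights (fp32 when double quantization is off), so roughly 4.5 bits per weight. Real 4-bit checkpoints usually keep embeddings, norms, and the output head in 16-bit, so they come out somewhat larger than this.

```python
def checkpoint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9  # assumption: an 8B-parameter model, for illustration only

# 16-bit weights: 16 bits per parameter.
fp16_gb = checkpoint_gb(N_PARAMS, 16)

# NF4 without double quantization: 4 bits per weight plus one
# fp32 absmax (32 bits) shared by each block of 64 weights.
nf4_bits = 4 + 32 / 64
nf4_gb = checkpoint_gb(N_PARAMS, nf4_bits)

print(f"fp16: {fp16_gb:.1f} GB, nf4: {nf4_gb:.1f} GB, saved: {fp16_gb - nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit copy is a bit under a third of the 16-bit size.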

jaunty saddle
there are Q4-quantized models on HF, but those are GGUF, which I don't think Unsloth can read

chrome root
yeah, sadly GGUF doesn't work yet
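For reference, the conversion asked about at the top of the thread can be sketched without touching Unsloth's code, using the `transformers` and `bitsandbytes` libraries directly. This is a minimal sketch, not something confirmed in the thread: the function names `nf4_kwargs` and `convert_to_nf4` are hypothetical, the bf16 compute dtype and double quantization are assumed defaults, and saving 4-bit weights needs reasonably recent versions of both libraries plus a CUDA GPU.

```python
def nf4_kwargs() -> dict:
    """Keyword arguments for transformers.BitsAndBytesConfig (NF4 settings)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,  # assumption: also quantize the scaling constants
    }


def convert_to_nf4(model_id: str, out_dir: str) -> None:
    """Hypothetical helper: load model_id in NF4 and save the 4-bit weights to out_dir."""
    # Heavy imports are kept inside the function so this file can be
    # imported on a machine without torch or a GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    config = BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.bfloat16, **nf4_kwargs())

    # Weights are quantized on the fly while loading.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )

    # Writes the already-quantized weights, so out_dir holds the 4-bit copy.
    model.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(model_id).save_pretrained(out_dir)
```

Note that the 16-bit weights are still downloaded into the Hugging Face cache first, so they have to be deleted from the cache separately to actually reclaim the space. Pointing `FastLanguageModel.from_pretrained(model_name=out_dir, load_in_4bit=True)` at the saved directory should then in principle load the 4-bit copy, though that is an assumption here rather than something verified in this thread.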