# Does Unsloth have quantized versions of optimizers other than AdamW?
Unsloth appears to support quantized optimizers mainly for AdamW, through bitsandbytes (e.g., 8-bit AdamW). The data sources make no explicit mention of quantized versions of other optimizers, such as Adafactor or Muon Clip, being available in Unsloth. The Hugging Face Transformers documentation notes that Adafactor can reduce memory requirements, but it describes quantized (8-bit/4-bit) variants only for AdamW, and Unsloth's documentation and issues do not mention quantized Adafactor or Muon Clip optimizers either.
If you want to use quantized optimizers beyond AdamW, you may need to look for external implementations or request this feature from the Unsloth team. Would you like more detail on optimizer support or memory-efficient training strategies in Unsloth?
Sources:
- https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one#mixed-precision
- https://github.com/unslothai/unsloth/issues/2884
Note: I wasn’t able to find highly relevant data sources, but above are a few potentially relevant links.
For that, we depend on Hugging Face Transformers to implement it. AFAIK Muon is still not supported in HF, and I am not aware of a quantized version.
Also, I'm not sure how Muon works, but I don't think it keeps momentum weights, right? In that case there would be nothing to quantize.
Oh, it has to store one momentum state instead of the two states Adam keeps.
So the VRAM savings wouldn't be huge then.
Well, a saving is still a saving. It still cuts at least 50%.
We have an 8-bit Lion implementation in bitsandbytes. There is also AdEMAMix, though if you're looking for memory savings, it uses more memory than 8-bit AdamW.
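To put rough numbers on the memory discussion above, here is a minimal back-of-the-envelope sketch. The per-parameter state counts are assumptions taken from the usual formulations (two states for AdamW, one momentum state for Lion and for Muon as described in this thread, three for AdEMAMix), the helper name `optimizer_state_gib` and the 7B parameter count are made up for illustration, and the estimate ignores quantization constants, paging, and any parameters a given optimizer hands off to a different one.

```python
# Rough optimizer-state memory estimate.
# State counts per parameter are assumptions based on the usual
# formulation of each optimizer, not measurements from any library.
STATES_PER_PARAM = {
    "adamw": 2,     # first and second moment
    "lion": 1,      # momentum only
    "muon": 1,      # momentum only, as described in the thread
    "ademamix": 3,  # two gradient EMAs plus a second moment
}


def optimizer_state_gib(num_params: int, optimizer: str, bits: int) -> float:
    """Approximate optimizer state size in GiB.

    Ignores per-block quantization constants and paging overhead,
    so real usage will be somewhat higher.
    """
    bytes_per_value = bits / 8
    total_bytes = num_params * STATES_PER_PARAM[optimizer] * bytes_per_value
    return total_bytes / 2**30


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical count of trainable parameters
    for name in STATES_PER_PARAM:
        for bits in (32, 8):
            size = optimizer_state_gib(n, name, bits)
            print(f"{name:9s} {bits:2d}-bit: {size:6.1f} GiB")
```

Under these assumptions, a single-state optimizer needs about half the state memory of AdamW at the same precision, and an 8-bit AdEMAMix needs about 1.5x the state memory of 8-bit AdamW, which matches the points made in the replies. With LoRA-style fine-tuning only the adapter parameters carry optimizer state, so the absolute numbers would be far smaller.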