Does Unsloth have quantized versions of optimizers other than AdamW? | Unsloth AI

lean flume Sep 18, 2025, 2:58 AM

Does Unsloth have quantized versions of optimizers other than AdamW? I was looking into other optimizers and want to try Adafactor / Muon Clip, but I can't use the 32-bit versions on my hardware, so I was wondering whether Unsloth has other ones.

cerulean acorn (BOT) Sep 18, 2025, 2:58 AM

It seems likely that Unsloth primarily supports quantized optimizers for AdamW, especially via bitsandbytes (e.g., 8-bit AdamW). However, the data sources make no explicit mention of quantized versions of other optimizers, such as Adafactor or Muon Clip, being available in Unsloth. The Hugging Face Transformers documentation notes that Adafactor can reduce memory requirements, but it describes quantized (8-bit/4-bit) versions only for AdamW, not for Adafactor or Muon Clip. Unsloth's documentation and issues do not mention support for quantized Adafactor or Muon Clip optimizers either.
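The memory argument here comes down to how much optimizer state is kept per parameter. Below is a back-of-the-envelope sketch under stated assumptions: AdamW keeps two fp32 buffers per parameter; 8-bit AdamW keeps the same two buffers at one byte each (ignoring bitsandbytes' small per-block quantization constants); and Adafactor runs without momentum (`beta1=None`) and factors the second moment into row and column statistics the way the Hugging Face `Adafactor` implementation does, as I understand it. The function names are hypothetical helpers, not Unsloth or Transformers APIs, and only optimizer state is counted (not weights or gradients).

```python
from math import prod

FP32_BYTES = 4  # bytes per 32-bit float
INT8_BYTES = 1  # bytes per 8-bit quantized value


def adamw_fp32_state_bytes(shape):
    """AdamW: two full-size fp32 buffers (exp_avg, exp_avg_sq) per parameter."""
    return 2 * FP32_BYTES * prod(shape)


def adamw_8bit_state_bytes(shape):
    """8-bit AdamW: the same two buffers, one byte per element each.
    Assumption: ignores the per-block quantization constants."""
    return 2 * INT8_BYTES * prod(shape)


def adafactor_state_bytes(shape):
    """Adafactor without momentum (assumed beta1=None).

    For tensors with 2+ dims, the second moment is factored into a 'row'
    statistic of shape shape[:-1] and a 'column' statistic of shape
    shape[:-2] + shape[-1:]; 1D tensors keep a full fp32 second moment.
    """
    if len(shape) >= 2:
        row = prod(shape[:-1])
        col = prod(shape[:-2]) * shape[-1]
        return FP32_BYTES * (row + col)
    return FP32_BYTES * prod(shape)


if __name__ == "__main__":
    shape = (4096, 4096)  # hypothetical square projection matrix
    for name, fn in [
        ("AdamW (fp32 states)", adamw_fp32_state_bytes),
        ("AdamW (8-bit states)", adamw_8bit_state_bytes),
        ("Adafactor (factored, no momentum)", adafactor_state_bytes),
    ]:
        print(f"{name:36s} {fn(shape) / 2**20:10.3f} MiB")
```

Under these assumptions a 4096×4096 weight needs about 128 MiB of AdamW state in fp32, 32 MiB with 8-bit states, and about 32 KiB for factored Adafactor, which is why Adafactor is already memory-light even without quantization.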

If you want to use quantized optimizers beyond AdamW, you may need to look for external implementations or request this feature from the Unsloth team. Would you like more detail on optimizer support or memory-efficient training strategies in Unsloth?

Sources:

https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one#mixed-precision
https://github.com/unslothai/unsloth/issues/2884
Note: I wasn’t able to find highly relevant data sources, but above are a few potentially relevant links.

lean flume Sep 18, 2025, 3:00 AM

Aw man

That kinda sucks

bitter silo Sep 18, 2025, 7:10 AM

For that, we depend on Hugging Face Transformers to implement it. AFAIK Muon is still not supported in HF, and I am not aware of a quantized version.

https://github.com/huggingface/transformers/pull/39541
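Since Muon is not in Transformers, trying it means bringing your own implementation. For intuition about what it does differently from AdamW, here is a minimal NumPy sketch of Muon's core step as described in Keller Jordan's public reference implementation: keep one momentum buffer per 2D weight and replace it with an approximately orthogonal matrix via a quintic Newton-Schulz iteration before applying it. Simplifications (assumptions): plain rather than Nesterov momentum, no shape-dependent learning-rate scaling, float64 instead of bfloat16, and no MuonClip-style QK-clipping. `newton_schulz_orthogonalize` and `muon_step` are hypothetical helper names, not an Unsloth or Transformers API.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the public Muon reference
# implementation; they are tuned constants, not derived values.
A, B, C = 3.4445, -4.7750, 2.0315


def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately replace a 2D matrix by U @ V.T from its SVD
    (singular values pushed toward 1), as Muon does to its momentum."""
    x = g / (np.linalg.norm(g) + eps)  # Frobenius norm puts singular values in [0, 1]
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # work on the wide orientation so the Gram matrix is smaller
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = A * x + (B * gram + C * (gram @ gram)) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon-style update for a single 2D weight (in place).
    Assumption: plain momentum, no Nesterov, no lr scaling by shape."""
    momentum *= beta
    momentum += grad
    weight -= lr * newton_schulz_orthogonalize(momentum)
    return weight, momentum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal((64, 32))
    s = np.linalg.svd(newton_schulz_orthogonalize(g), compute_uv=False)
    print("singular values after Newton-Schulz:", s.min(), s.max())
```

The iteration does not drive singular values exactly to 1; the reference coefficients trade exactness for speed and leave them roughly in a band around 1. For the 2D weights it handles, Muon stores one momentum buffer instead of AdamW's two, so it already halves optimizer state for those weights even without quantization.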