hyperparameter choice | Unsloth AI | Page 1

upper pollen Jul 22, 2024, 5:03 PM

#

Hello all! I was curious what literature exists on hyperparameter choice that can help build an intuition…

I know some ideas, like rsLoRA, lower LR for higher ranks, large batches for generalization, etc., but each time I finetune on some new data, I feel like I’m shooting in the dark to see what sticks, and I can’t help but feel there’s a better way to start this kind of parameter sweep.

So, I’d be grateful if someone could link / talk about some relevant content regarding:

Epoch choice, LR scheduler choice, batch size, packing vs no packing vs neat packing, dropouts, etc., and which choices suit which kinds of data
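
To make the rsLoRA point above concrete: as far as I understand it, standard LoRA scales the adapter update by alpha / r, while rsLoRA scales it by alpha / sqrt(r), so the update does not shrink as quickly when the rank grows. A minimal sketch of just that arithmetic (the helper name `lora_scale` and the alpha value are made up for illustration, not any library's API):

```python
import math

def lora_scale(alpha: float, rank: int, rank_stabilized: bool = False) -> float:
    """Scaling factor applied to the low-rank update (hypothetical helper)."""
    if rank_stabilized:
        # rsLoRA: alpha / sqrt(r)
        return alpha / math.sqrt(rank)
    # Standard LoRA: alpha / r
    return alpha / rank

# Compare how the two factors change as rank grows, with alpha held fixed.
for r in (8, 64, 256):
    print(r, lora_scale(16, r), lora_scale(16, r, rank_stabilized=True))
```

With alpha fixed, the standard factor falls off linearly in the rank while the rank-stabilized one falls off with its square root, which is one reason the LR that works at one rank may not carry over to another.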

#

Linking @paper canyon / @patent sand, because I feel like you guys might know best

empty elm Jul 22, 2024, 8:33 PM

#

I'm pretty new, but now that I've read a few papers, I'm starting to think half of the research out there is trying to answer this question

hearty lily Jul 24, 2024, 9:55 AM

#

I would suggest studying Andrej Karpathy's gpt2 from scratch repository. The repo teaches you how the parameters affect the PyTorch model

acoustic pelican Jul 24, 2024, 11:10 AM

#

top advice

#

his YouTube series on the same topic is top too

worthy roost Jul 24, 2024, 11:16 AM

#

There's this video in which he teaches how to search for the best LR, for example
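
Not the exact code from the video, just a rough sketch of the general idea of an LR search: try learning rates spaced evenly in log space, train briefly with each one from the same starting point, and compare the resulting losses. Everything here is an assumption for illustration: the toy quadratic stands in for a real training loss, and `final_loss` / `lr_sweep` are made-up names.

```python
import math

def final_loss(lr: float, steps: int = 20) -> float:
    """Run plain gradient descent on a toy quadratic and return the final loss."""
    # Toy objective (assumption, stands in for a real model's loss):
    # loss(w) = 0.5 * (1.0 * w0^2 + 10.0 * w1^2)
    curv = (1.0, 10.0)
    w = [1.0, 1.0]  # same starting point for every learning rate
    for _ in range(steps):
        # Gradient of the quadratic is curv_i * w_i for each coordinate.
        w = [wi - lr * c * wi for wi, c in zip(w, curv)]
    loss = 0.5 * sum(c * wi * wi for wi, c in zip(w, curv))
    return loss if math.isfinite(loss) else float("inf")

def lr_sweep(low_exp: float = -4.0, high_exp: float = 0.0, num: int = 9):
    """Evaluate learning rates 10**e for exponents spaced evenly in [low_exp, high_exp]."""
    exps = [low_exp + i * (high_exp - low_exp) / (num - 1) for i in range(num)]
    return [(10.0 ** e, final_loss(10.0 ** e)) for e in exps]

results = lr_sweep()
best_lr, best_loss = min(results, key=lambda t: t[1])
for lr, loss in results:
    print(f"lr={lr:.0e} loss={loss:.3e}")
print("best lr:", best_lr)
```

On a real finetune you would swap the toy loop for a short training run per candidate, and usually pick something a bit below the point where the loss starts to blow up rather than the single best value.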
