#hyperparameter choice


upper pollen
#

Hello all! I was curious what literature exists on hyperparameter choice that can help build an intuition…

I know some ideas, like rsLoRA, lower LR for higher ranks, large batches for generalization, etc., but each time I fine-tune on some new data, I feel like I'm shooting in the dark to see what sticks, and I can't help but feel there's a better way to start this kind of parameter sweep.

So, I’d be grateful if someone could link / talk about some relevant content regarding:

Epoch count, LR scheduler choice, batch size, packing vs. no packing vs. neat packing, dropout, etc., and which choices suit which kinds of data

#

Linking @paper canyon / @patent sand, because I feel like you guys might know best

empty elm
#

I'm pretty new, but now that I've read a few papers, I'm starting to think half of the research out there is trying to answer this question

hearty lily
#

I would suggest studying Andrej Karpathy's GPT-2 from-scratch repository. The repo teaches you how the parameters affect the PyTorch model

acoustic pelican
#

top advice

#

His YouTube series on the same topic is top too

worthy roost
#

There's this video in which he shows how to search for the best LR, for example
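
For anyone wondering what such an LR search can look like, here is a minimal sketch of a learning-rate range test. It is not taken from the video. The idea is to take one gradient step per candidate LR, raise the LR exponentially, and watch where the loss bottoms out before it blows up. The toy quadratic loss, the names `lr_range_test` and `suggest_lr`, the divergence threshold, and the "divide by 10" rule of thumb are all assumptions made for illustration.

```python
import math


def lr_range_test(loss_and_grad, w0, lr_min=1e-4, lr_max=10.0, steps=100):
    """Take one gradient step per candidate LR, raising the LR exponentially."""
    w = list(w0)
    history = []  # (lr, loss measured just before the step taken with that lr)
    for i in range(steps):
        # Exponentially spaced candidates between lr_min and lr_max.
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        loss, grad = loss_and_grad(w)
        history.append((lr, loss))
        # Assumed stopping rule: quit once the loss is far above where it started.
        if not math.isfinite(loss) or loss > 4 * history[0][1]:
            break
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return history


def suggest_lr(history):
    """Rule of thumb (an assumption): one order of magnitude below the best LR."""
    best_lr, _ = min(history, key=lambda pair: pair[1])
    return best_lr / 10


def toy_loss_and_grad(w):
    """Toy stand-in for a model: 0.5 * (w0^2 + 10 * w1^2) and its gradient."""
    loss = 0.5 * (w[0] ** 2 + 10 * w[1] ** 2)
    grad = [w[0], 10 * w[1]]
    return loss, grad


history = lr_range_test(toy_loss_and_grad, w0=[1.0, 1.0])
suggested = suggest_lr(history)
print(f"tried {len(history)} learning rates, suggested LR = {suggested:.4f}")
```

With a real model you would swap `toy_loss_and_grad` for a forward/backward pass on a mini-batch and plot `history` on a log-scaled x-axis. The suggested value is only a starting point for a sweep, not a final answer.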