Hello all! I'm curious what literature exists on hyperparameter choice that can help build an intuition…
I know some of the ideas, like rsLoRA, a lower LR for higher ranks, large batches for generalization, etc., but each time I fine-tune on some new data, I feel like I'm shooting in the dark to see what sticks, and I can't help but feel there's a better way to start this kind of parameter sweep.
So, I'd be grateful if someone could link to or talk about some relevant content regarding:
Epoch count, LR scheduler choice, batch size, packing vs. no packing vs. neat packing, dropout, etc., and which choices suit which kinds of data.