Hi, I am trying to train a transformer-based AI model. I just doubled my dataset size because I wanted to soften the effect of the bad data mixed in. I came to this conclusion after recently running my model and finding that it spoke a lot of slang (ye, boi, uhuh, pp, etc.). So I added some NLP Q&A data which I thought was small (48 MB compressed), but it turned out to be 80,000 samples, which is double my current amount, and I have added it to my main training dataset. So I just wanted to ask for your opinion on whether
VOCAB_SIZE = 30522 # for DistilBERT
DIMMENTIONS = 512 # Hidden dimension size
HIDDEN_LAYERS = 24 # Number of hidden layers
HEAD_LAYERS = 8 ...
is enough for it to effectively learn all the data. Should I consider increasing the hidden layers from 24 to 32 and going from 512 to 768 neurons per layer? I'd need to rent a GPU though...
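
For reference, here is a rough way to compare the size of the two configurations. This is only a back-of-the-envelope sketch and assumes a standard encoder layout that may not match my actual model: a feed-forward block 4x the hidden size, tied input/output embeddings, and positional embeddings ignored. The function name `estimate_params` is made up for illustration.

```python
def estimate_params(vocab_size: int, hidden_dim: int, num_layers: int) -> int:
    """Rough parameter count for a standard transformer encoder stack.

    Assumptions (not confirmed for the actual model): feed-forward width is
    4 * hidden_dim, embeddings are tied, positional embeddings are ignored.
    """
    # Token embedding table.
    embedding = vocab_size * hidden_dim

    # Attention: Q, K, V and output projections, each hidden_dim x hidden_dim.
    attention_weights = 4 * hidden_dim * hidden_dim
    attention_biases = 4 * hidden_dim

    # Feed-forward: hidden_dim -> 4*hidden_dim -> hidden_dim.
    ffn_weights = 2 * hidden_dim * (4 * hidden_dim)
    ffn_biases = 4 * hidden_dim + hidden_dim

    # Two layer norms per layer, each with a scale and a shift vector.
    layer_norms = 2 * 2 * hidden_dim

    per_layer = (attention_weights + attention_biases
                 + ffn_weights + ffn_biases + layer_norms)
    return embedding + num_layers * per_layer


current = estimate_params(vocab_size=30522, hidden_dim=512, num_layers=24)
proposed = estimate_params(vocab_size=30522, hidden_dim=768, num_layers=32)

print(f"current  (512 x 24): {current / 1e6:.1f}M parameters")
print(f"proposed (768 x 32): {proposed / 1e6:.1f}M parameters")
print(f"ratio: {proposed / current:.2f}x")
```

Under those assumptions the current setup comes out at roughly 91M parameters and the proposed one at roughly 250M, so the bigger config is about 2.7x the size, which is why I'd have to rent a GPU for it.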