#(ML) Why can't we avoid overfitting by drip-feeding the model more data?

sonic umbra

Hey guys, I have a question about training models and overfitting.

My understanding is that overfitting happens because the model begins memorizing the training data, and that the training data is a subset of the total data? E.g. we have house prices from 2000-2025 and give it only 2000-2015 as training data.

I was wondering, would it be possible (or why would it not be a good idea) to drip-feed the model more data step by step, and throw in some incorrect data every once in a while?

Like, if I give a model training data of house prices from 2000-2015, then ask it to predict 2016, then reveal 2016, then do the same for 2017, and so on? If the model somehow memorizes that a particular street address always corresponds to a high price from 2000-2015, and this street address dropped heavily in price in 2016, can't we reveal this to the model in the next round and have it account for that when it's predicting 2017?

Sorry if I'm misunderstanding, still trying to piece together how all these things fit together.

fickle phoenix

Your training data is always a subset of the total data

overfitting isn't just memorizing the dataset

it's learning patterns that don't generalize
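
To make "patterns that don't generalize" concrete, here is a rough sketch in plain Python. Everything in it is an assumption made up for illustration: the toy data (true relationship `y = 2 * x`, with every training label off by +3 or -3), and the helper names `memorizer`, `fit_line` and `mse`. The memorizer is a 1-nearest-neighbour lookup, so it reproduces the training labels perfectly, errors included, while the straight line ignores them.

```python
# Toy data (made up): the true relationship is y = 2 * x, but every
# training label carries an error of +3 or -3. Test labels are noise-free.
train_x = [float(i) for i in range(20)]
train_y = [2 * x + (3 if i % 2 == 0 else -3) for i, x in enumerate(train_x)]
test_x = [i + 0.25 for i in range(19)]
test_y = [2 * x for x in test_x]


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def memorizer(x):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(xs, ys):
    """Ordinary least squares with a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


line = fit_line(train_x, train_y)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(
        f"{name:10s} train MSE = {mse(model, train_x, train_y):6.2f}   "
        f"test MSE = {mse(model, test_x, test_y):6.2f}"
    )
```

On this toy data the memorizer has zero training error but a much larger test error than the line, because the +3/-3 wiggle it learned is a pattern that only exists in the training set.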

Why would you throw in incorrect data though? That'll just make your predictions significantly worse (also how will you come up with the fake prices)

drip-feeding also doesn't really solve the overfitting problem: the model could still overfit to the sequence or to the smaller batches, or it could forget the long-term patterns because it's so focused on short-term ones

Some keywords for further research: online learning, rolling retraining
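
For anyone looking those up, the "predict 2016, reveal 2016, predict 2017" loop from the question is roughly what rolling retraining (also called walk-forward evaluation) looks like. Below is a minimal sketch in plain Python, not a reference implementation. The yearly prices are made up (a steady rise through 2015, then flat at 200), and `walk_forward`, `fit_line`, `start` and `window` are hypothetical names. With `window=None` the model is refit on all revealed years; with `window=3` it only sees the last three.

```python
def fit_line(xs, ys):
    """Ordinary least squares with a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def walk_forward(years, prices, start, window=None):
    """Refit before each step, predict the next year, then reveal it.

    Returns a list of (year, prediction, actual) tuples.
    window=None uses every revealed year; window=k uses only the last k.
    """
    results = []
    for i in range(start, len(years)):
        first = 0 if window is None else max(0, i - window)
        model = fit_line(years[first:i], prices[first:i])
        results.append((years[i], model(years[i]), prices[i]))
    return results


# Made-up data: prices rise by 10 per year until 2015, then sit at 200.
years = list(range(2000, 2026))
prices = [100 + 10 * (y - 2000) if y <= 2015 else 200 for y in years]

expanding = walk_forward(years, prices, start=16)           # all past years
rolling = walk_forward(years, prices, start=16, window=3)   # last 3 years only

for (year, exp_pred, actual), (_, roll_pred, _) in zip(expanding, rolling):
    print(f"{year}: actual={actual}  expanding={exp_pred:.1f}  rolling={roll_pred:.1f}")
```

In this toy run both variants miss the 2016 drop, since nothing in 2000-2015 hints at it. The short window then adapts within a few years, while the expanding window keeps getting dragged up by the old trend. The flip side is the point made above: a short window forgets long-term patterns, so this loop measures how well the model generalizes over time rather than preventing overfitting by itself.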
