I have a time series dataset with exogenous data, but some of the dates in the series are missing. This works fine when training a forecasting model such as ARIMA, but when I try to evaluate it on the test set I get a size mismatch, because the model also expects values for the in-between dates, which I don't have. What is the best way to handle this?
#How to forecast a time series with missing days?
A common strategy is to just do forward fill https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ffill.html
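For reference, a minimal sketch of what forward fill looks like on a series with gaps. The column names `y` and `exog` and the values are made up for illustration, and daily frequency is an assumption; the idea is to reindex onto a complete calendar first so the gaps become explicit NaN rows, then fill them.

```python
import pandas as pd

# Hypothetical data: Jan 3 and Jan 4 are missing from the series.
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"])
df = pd.DataFrame(
    {"y": [10.0, 12.0, 15.0, 14.0], "exog": [1.0, 2.0, 3.0, 4.0]},
    index=dates,
)

# Reindex onto a complete daily calendar; missing days show up as NaN rows.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="D")
df_full = df.reindex(full_index)

# Forward fill: each gap takes the last observed value before it.
df_filled = df_full.ffill()
print(df_filled)
```

After this the frame has one row per day, so the target and the exogenous columns have the length a regularly spaced model expects.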
But I would suggest first understanding whether the missingness itself is meaningful. For example, if the data is about patient visits, there are papers that specifically tackle this problem by learning a representation of visit frequency.
With imputation in general, you want to fill with something that looks as generic as possible. For time series, forward fill is an option, and different types of weighted averages can also work well.
Look at Cell #15 here: https://github.com/DrDub/artfeateng/blob/master/Chapter7.ipynb
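To make the averaging idea concrete, here is a rough sketch of two variants in pandas. This is not the code from the linked notebook; the data is invented and the window size of 3 is an arbitrary assumption. Time interpolation weights the two neighbouring observations by their distance in time, while a trailing mean only uses past values.

```python
import pandas as pd

# Hypothetical daily series with Jan 3 and Jan 4 missing (NaN after reindexing).
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"])
s = pd.Series([10.0, 12.0, 15.0, 14.0], index=dates)
s_full = s.reindex(pd.date_range(s.index.min(), s.index.max(), freq="D"))

# Option 1: time-weighted interpolation between the neighbouring observations.
# Note that this looks at the next observed value, i.e. it uses future data.
s_interp = s_full.interpolate(method="time")

# Option 2: trailing mean over the last 3 calendar days (window size is an assumption).
# NaNs inside the window are skipped, so only past observed values are averaged.
trailing_mean = s_full.rolling(window=3, min_periods=1).mean()
s_trailing = s_full.fillna(trailing_mean)

print(pd.DataFrame({"interp": s_interp, "trailing": s_trailing}))
```

Since interpolation peeks at future values, the trailing variant is probably the safer choice if the filled series is then used to evaluate forecasts.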
I don't think you should impute during test.
Just drop the missing data during test.
What do you mean "drop the missing data"? Like drop the incomplete rows during test? That'd give you a very bad appraisal of the behaviour of the trained model if it is going to be used in an environment that features missing data.
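One possible middle ground, sketched under assumptions: let the model forecast every calendar day in the test period, but compute the error only on the days that actually have ground truth. The `forecast` values below are placeholders standing in for whatever the fitted model returns, and the dates are invented.

```python
import pandas as pd

# Hypothetical test-period actuals; Jan 9 and Jan 10 were never recorded.
actual = pd.Series(
    [20.0, 22.0, 25.0],
    index=pd.to_datetime(["2024-01-08", "2024-01-11", "2024-01-12"]),
)

# Placeholder model output: one forecast per calendar day in the test period.
forecast = pd.Series(
    [19.0, 20.0, 21.0, 23.0, 24.0],
    index=pd.date_range("2024-01-08", "2024-01-12", freq="D"),
)

# Align on dates so only days with ground truth are scored.
forecast_on_observed = forecast.reindex(actual.index)
mae = (forecast_on_observed - actual).abs().mean()
print(f"MAE on observed days: {mae:.2f}")
```

This avoids the size mismatch without scoring the model against imputed targets. It does not settle how missing exogenous values should be handled at prediction time, which is the concern raised above.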