You can segment the validation to be data after a certain date, and train on data before that date. You get an accurate sense of how well the model will perform in the real world, as long as you make sure the data never borrows from the future.
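A minimal sketch of that kind of temporal split, assuming a pandas DataFrame with a "date" column and a "target" column (all names here are hypothetical):

    import pandas as pd

    def temporal_split(df: pd.DataFrame, cutoff: str):
        """Train on rows strictly before `cutoff`, validate on rows at/after it.

        Because the split is by time rather than a random shuffle, no
        information from the validation period can leak into training.
        """
        df = df.sort_values("date")
        train = df[df["date"] < cutoff]
        valid = df[df["date"] >= cutoff]
        return train, valid

    # Example usage with a made-up cutoff date:
    # train, valid = temporal_split(df, cutoff="2007-01-01")

The key is that every feature is computed using only information available as of each row's date, so the model never "sees" the future it is evaluated on.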



That only ensures your model is accurate assuming real-world conditions remain the same, which, again, leaves it prone to overfitting.

To use a real-world example, financial models of mortgage-backed securities were the root cause of the financial crisis, because they were based on decades of mortgages that were fundamentally different from the ones they were actually trying to model. Even if someone had constructed a model by training on data from, say, 1957-1996, and validating on 1997-2006, they would have failed to predict the collapse, because the underlying factors that caused the recession (the housing bubble, the prevalence of adjustable-rate mortgages, the lack of verification in applications) were essentially unseen in the decades of data prior to that.

Validation protects against overfitting only to a degree, and only to the extent that the underlying data-generating process never changes, which, in the real world, is generally a terrible assumption.
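A toy illustration of the failure mode, with an entirely made-up "regime change" in synthetic data: a clean temporal split scores well on validation and still falls apart the moment the underlying process shifts.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)

    def simulate(years, slope):
        """One noisy observation per year: y = slope * x + noise."""
        x = rng.normal(size=(len(years), 1))
        y = slope * x[:, 0] + rng.normal(scale=0.1, size=len(years))
        return x, y

    # Stable regime: 1957-2006 all share the same underlying relationship.
    x_train, y_train = simulate(range(1957, 1997), slope=2.0)   # train: 1957-1996
    x_valid, y_valid = simulate(range(1997, 2007), slope=2.0)   # validate: 1997-2006
    # Regime change: after 2006 the relationship flips.
    x_deploy, y_deploy = simulate(range(2007, 2012), slope=-2.0)

    model = LinearRegression().fit(x_train, y_train)

    print("validation MSE:", mean_squared_error(y_valid, model.predict(x_valid)))
    print("deployment MSE:", mean_squared_error(y_deploy, model.predict(x_deploy)))

The temporal split passes with flying colors, because the training and validation windows happen to share a regime; the deployed model is still badly wrong once the process shifts, and no amount of backtesting on the old regime could have told you so.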


I'd probably put fraud ahead of models as the root cause. The entire purpose of those securities was to obscure the weakness of their fundamentals.



