You can segment the validation to be data after a certain date, and train on data before that date. You get an accurate sense of how well the model will perform in the real world, as long as you make sure the data never borrows from the future.
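A minimal sketch of that kind of temporal split, assuming a pandas DataFrame with a "date" column and a "target" column (all names here are hypothetical):

    import pandas as pd

    def temporal_split(df: pd.DataFrame, cutoff: str):
        """Train on rows strictly before `cutoff`, validate on rows at/after it.

        Because the split is by time rather than a random shuffle, no
        information from the validation period can leak into training.
        """
        df = df.sort_values("date")
        train = df[df["date"] < cutoff]
        valid = df[df["date"] >= cutoff]
        return train, valid

    # Example usage with a made-up cutoff date:
    # train, valid = temporal_split(df, cutoff="2007-01-01")

The key is that every feature is computed using only information available as of each row's date, so the model never "sees" the future it is evaluated on.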



That only ensures your model is accurate assuming real-world conditions remain the same, which, again, leaves it prone to overfitting.

To use a real-world example, financial models of mortgage-backed securities were the root cause of the financial crisis, because they were based on decades of mortgages that were fundamentally different from the ones they were actually trying to model. Even if someone had constructed a model by training on data from, say, 1957-1996, and validating on 1997-2006, they would have failed to predict the collapse, because the underlying factors that caused the recession (the housing bubble, the prevalence of adjustable-rate mortgages, the lack of verification in applications) were essentially unseen in the decades of data prior to that.

Validation protects against overfitting only to a degree, and only to the extent that the underlying data-generating process never changes, which, in the real world, is generally a terrible assumption.
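A toy illustration of the failure mode, with an entirely made-up "regime change" in synthetic data: a clean temporal split scores well on validation and still falls apart the moment the underlying process shifts.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)

    def simulate(years, slope):
        """One noisy observation per year: y = slope * x + noise."""
        x = rng.normal(size=(len(years), 1))
        y = slope * x[:, 0] + rng.normal(scale=0.1, size=len(years))
        return x, y

    # Stable regime: 1957-2006 all share the same underlying relationship.
    x_train, y_train = simulate(range(1957, 1997), slope=2.0)   # train: 1957-1996
    x_valid, y_valid = simulate(range(1997, 2007), slope=2.0)   # validate: 1997-2006
    # Regime change: after 2006 the relationship flips.
    x_deploy, y_deploy = simulate(range(2007, 2012), slope=-2.0)

    model = LinearRegression().fit(x_train, y_train)

    print("validation MSE:", mean_squared_error(y_valid, model.predict(x_valid)))
    print("deployment MSE:", mean_squared_error(y_deploy, model.predict(x_deploy)))

The temporal split passes with flying colors, because the training and validation windows happen to share a regime; the deployed model is still badly wrong once the process shifts, and no amount of backtesting on the old regime could have told you so.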


I'd probably put fraud ahead of models as the root cause. The entire purpose of those securities was to obscure the weakness of their fundamentals.



