
Having only hundreds of training examples and fitting hundreds of models seems like a good recipe for overfitting. Have you noticed any issues with overfitting? Or maybe there is something that I am missing.



Oh, that's definitely something we worry about a lot! One thing to note is that each record contains a lot of data, so it's not as though there are just 100 data points to go on. A 24-hour record has ~20 million data points; they're just highly correlated with each other.

Cross-validation helps to some extent. If you're fitting enough models, though, you can still trick yourself into thinking that some models are not overfitting when they just happen to do well on your validation set by chance. But one of the things I look for is a smooth transition from underfitting to overfitting as the model capacity increases.
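
A minimal sketch of that capacity sweep, assuming scikit-learn and a toy i.i.d. dataset; the model family (random forests), the swept hyperparameter (max_depth), and the data are stand-ins for illustration, not the commenter's actual pipeline:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Toy data: 300 i.i.d. rows. Real records would be long, highly
    # correlated signals, so the splits would need to respect that.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 20))
    y = X[:, 0] + 0.1 * rng.normal(size=300)

    # Sweep model capacity (tree depth here) and compare train vs.
    # validation scores, looking for a smooth transition from
    # underfitting to overfitting.
    for depth in [1, 2, 4, 8, 16, 32]:
        model = RandomForestRegressor(n_estimators=100, max_depth=depth, random_state=0)
        val = cross_val_score(model, X, y, cv=5).mean()
        train = model.fit(X, y).score(X, y)
        print(f"depth={depth:2d}  train R^2={train:.3f}  val R^2={val:.3f}")

The thing to watch is how the gap between train and validation scores opens up as capacity grows: a smooth widening is expected, while erratic jumps suggest the validation estimate itself can't be trusted.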

Then the other thing we do is probe the models so that we can understand what they're doing as much as we can. For example, I like to generate synthetic data and look at the transfer function of the model as I vary a particular parameter in my synthetic dataset.
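
A rough illustration of that kind of probe, under made-up assumptions: the noisy-sinusoid generator, the swept amplitude parameter, and the Ridge stand-in for the trained model are all invented for the example; the point is just sweeping one generator knob and reading off the model's response:

    import numpy as np
    from sklearn.linear_model import Ridge

    def make_record(amplitude, n=2000, noise=0.05, rng=None):
        # Hypothetical generator: a noisy sinusoid whose amplitude we control.
        if rng is None:
            rng = np.random.default_rng(0)
        t = np.linspace(0, 1, n)
        return amplitude * np.sin(2 * np.pi * 5 * t) + noise * rng.normal(size=n)

    # Stand-in for the already-trained model under investigation.
    rng = np.random.default_rng(1)
    amps = rng.uniform(0, 2, size=200)
    model = Ridge().fit(np.stack([make_record(a, rng=rng) for a in amps]), amps)

    # Sweep one parameter of the synthetic data and look at the model's
    # output: this traces its transfer function with respect to amplitude.
    # Kinks, saturation, or non-monotonic jumps show where it misbehaves.
    for a in np.linspace(0.0, 2.0, 11):
        response = model.predict(make_record(a)[None, :])[0]
        print(f"amplitude={a:.2f}  model output={response:.3f}")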



