
I don't think that is correct. The numbers on the leaderboard are not the scores on the training set.

The Netflix Prize uses three datasets: the training set, the leaderboard set, and the test set. The training set is distributed to everyone. The test set is totally secret. The only access to the leaderboard set is by submitting predictions (at most once per 24 hours) and looking at the resulting score. It is not at all trivial to "overfit" to this leaderboard set. (It could be done, e.g., by submitting results with slight tweaks to the algorithm parameters and keeping whichever tweak scores best, but this would take a lot of time since you can only submit once every 24 hours. Also, you would basically have to do it consciously.)
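To make the "slight tweaks" scenario concrete, here is a rough simulation sketch, not anything from the actual contest. Everything in it is an assumption for illustration: the function names, the set size, the uniformly random 1-5 ratings, and the idea that every tweak is equally good, so that any apparent leaderboard gain is pure selection luck. It picks the submission with the best leaderboard RMSE and reports that submission's RMSE on the secret test set alongside it.

```python
import math
import random


def rmse(pred, truth):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def simulate(num_submissions, set_size=1000, seed=0):
    """Return (leaderboard_rmse, test_rmse, day) for the best-on-leaderboard tweak.

    Hypothetical setup: one submission per day, and every tweak has the same
    true quality (prediction error is Gaussian noise with sd 1).
    """
    rng = random.Random(seed)
    # Made-up hidden ratings; these are not real Netflix Prize data.
    leaderboard_truth = [rng.uniform(1, 5) for _ in range(set_size)]
    test_truth = [rng.uniform(1, 5) for _ in range(set_size)]

    best = None
    for day in range(1, num_submissions + 1):
        # One "tweak" per day: fresh noise on both hidden sets.
        lb_pred = [t + rng.gauss(0, 1) for t in leaderboard_truth]
        test_pred = [t + rng.gauss(0, 1) for t in test_truth]
        lb_score = rmse(lb_pred, leaderboard_truth)
        test_score = rmse(test_pred, test_truth)
        # Keep the tweak that looks best on the leaderboard only.
        if best is None or lb_score < best[0]:
            best = (lb_score, test_score, day)
    return best


if __name__ == "__main__":
    for days in (1, 30, 365):
        lb, test, day = simulate(days)
        print(f"{days:3d} submissions: leaderboard {lb:.4f}, test {test:.4f} (day {day})")
```

Under these assumptions the selected leaderboard score can only stay the same or improve as the number of submissions grows, while the test score of the chosen tweak has no reason to follow it. At one submission per day, though, each extra try costs a day, which is the point about how slow this would be.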


