Hacker News

I hope you mean "having more data doesn't lead to better prediction automatically", because, used correctly, it always gives more accurate predictions (e.g. prediction intervals[1] are narrower, one can make better determinations about the distribution of the underlying population, etc.).

[1]: https://en.wikipedia.org/wiki/Prediction_interval
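To make the prediction-interval point concrete, here is a minimal Python sketch. It assumes i.i.d. normal observations with a known standard deviation sigma, with the mean estimated from n samples; the function name prediction_half_width is just illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prediction_half_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width of a prediction interval for one new observation,
    assuming i.i.d. normal data with known standard deviation sigma
    and the mean estimated from n samples."""
    if n < 1:
        raise ValueError("need at least one sample")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    # Var(new observation - sample mean) = sigma^2 + sigma^2 / n
    return z * sigma * sqrt(1 + 1 / n)

for n in (1, 10, 100, 10_000):
    print(n, round(prediction_half_width(sigma=1.0, n=n), 3))
```

The half-width falls from about 2.77σ at n = 1 toward z·σ ≈ 1.96σ at the 95% level. It never reaches zero, because a single new observation still carries its own noise. With σ unknown you would use a Student t critical value instead, which makes small-n intervals wider still.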




Let me correct my statement. More data does not necessarily lead to better predictions. I should also add that it is not always a matter of how the data is used: in some cases, the data is simply insufficient to make any kind of prediction.


The only data that is insufficient to make a prediction is either no data at all, or data suspected to be wrong (e.g. faulty observation equipment).

Even given a single sample, one can make a prediction of what the next observation will be: the same value. Once you get more samples and more knowledge about the subject, you can bring stronger statistical tools to bear to get error estimates, prediction intervals, improved models, etc.
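A rough Python sketch of that progression, assuming the observations are i.i.d. draws from a fixed distribution (the helper predict_next is hypothetical): with one sample the best guess is that value and there is no spread estimate; from two samples on, a spread estimate becomes available.

```python
from math import sqrt
from statistics import mean, stdev

def predict_next(samples):
    """Return (point prediction, estimated prediction std. dev.) for the
    next observation, assuming i.i.d. draws from a fixed distribution."""
    if not samples:
        raise ValueError("no data: nothing to base a prediction on")
    point = mean(samples)  # with a single sample, this is that sample's value
    n = len(samples)
    if n < 2:
        return point, None  # one value gives no estimate of spread
    # sample std. dev. inflated by sqrt(1 + 1/n) for the uncertainty in the mean
    return point, stdev(samples) * sqrt(1 + 1 / n)

print(predict_next([4.0]))  # → (4.0, None)
print(predict_next([4.0, 6.0, 5.0]))
```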


> Once you get more samples and more knowledge about the subject, you can bring stronger statistical tools to bear to get error estimates, prediction intervals, improved models, etc

As I said earlier, an increase in data does not always lead to improved models.


If you have enough data, you have a census and then there is no model, just reality...
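Under simple random sampling without replacement (an assumption; real polls are messier), the census limit shows up in the finite population correction: the standard error of the sample mean is (σ/√n)·√((N−n)/(N−1)), which is exactly zero when n = N. A minimal sketch, with a hypothetical function name:

```python
from math import sqrt

def standard_error_fpc(sigma, n, N):
    """Standard error of the sample mean for a simple random sample of n
    units drawn without replacement from a population of N units.
    sigma is the population standard deviation (divisor N)."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if N == 1:
        return 0.0  # the only unit is the whole population
    fpc = sqrt((N - n) / (N - 1))  # finite population correction
    return sigma / sqrt(n) * fpc

for n in (10, 500, 990, 1000):
    print(n, standard_error_fpc(sigma=1.0, n=n, N=1000))
```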

In polls like this, the more data the better as long as it is unbiased (i.e. as long as it is DATA).
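The "as long as it is unbiased" caveat can be illustrated with a small simulation. The response rates are made up for illustration: true support is 50%, supporters answer 60% of the time and opponents 40%. More contacts shrink the random error, but the biased estimate settles near 0.3 / (0.3 + 0.2) = 0.6 rather than 0.5. The function poll is a hypothetical helper.

```python
import random

def poll(n, p_support, p_respond_support, p_respond_oppose, rng):
    """Share of respondents who support, when n people are contacted.
    Supporters reply with probability p_respond_support, opponents with
    p_respond_oppose; equal rates give an unbiased sample of respondents."""
    yes = replies = 0
    for _ in range(n):
        supports = rng.random() < p_support
        p_reply = p_respond_support if supports else p_respond_oppose
        if rng.random() < p_reply:
            replies += 1
            yes += supports  # True counts as 1
    return yes / replies

rng = random.Random(42)
for n in (1_000, 100_000):
    unbiased = poll(n, 0.5, 0.5, 0.5, rng)
    biased = poll(n, 0.5, 0.6, 0.4, rng)
    print(n, round(unbiased, 3), round(biased, 3))
```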



