
How do you test a system like this for accuracy? Is this done by simulating millions of unique requests?



The algorithm's accuracy is known. From the wiki[1]:

    The HyperLogLog algorithm is able to estimate 
    cardinalities of > 10^9 with a typical error rate of 2%
[1] https://en.wikipedia.org/wiki/HyperLogLog
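The 2% figure follows from the standard-error bound in the original HyperLogLog paper (Flajolet et al.), roughly 1.04/sqrt(m) for m registers. As a worked check, assuming m = 2^11 registers of 6 bits each (the register count and width are my assumption, not something stated above):

```latex
\mathrm{SE} \approx \frac{1.04}{\sqrt{m}}, \qquad
m = 2^{11} = 2048 \;\Rightarrow\; \mathrm{SE} \approx \frac{1.04}{45.25} \approx 0.023,
\qquad 2048 \times 6\ \text{bits} = 12288\ \text{bits} = 1536\ \text{bytes} \approx 1.5\ \text{kB}
```

Since this is a standard error, not a hard bound, roughly two-thirds of estimates should fall within about ±2.3% of the true count; doubling accuracy costs four times the registers.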


But what about the implementation accuracy? :)


Tests against both historical and synthetic datasets.
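A minimal sketch of a synthetic-dataset test, assuming a toy HyperLogLog written from the paper's description (the class `HyperLogLog`, the helper `relative_error`, the SHA-1 hash choice and the "user-N" IDs are all hypothetical, not Reddit's code). It feeds a known number of distinct IDs, each repeated a few times like repeat views, and compares the estimate to the exact count:

```python
import hashlib
import math
import random


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative, not production)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # alpha_m bias-correction constant (formula valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash taken from SHA-1 (assumption: any well-mixed hash works).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


def relative_error(n_unique: int, p: int = 12, seed: int = 0) -> float:
    """Insert n_unique distinct IDs (with repeats) and compare to the truth."""
    rng = random.Random(seed)
    hll = HyperLogLog(p)
    for i in range(n_unique):
        # Each ID is "viewed" 1-3 times; duplicates must not inflate the count.
        for _ in range(rng.randint(1, 3)):
            hll.add(f"user-{i}")
    return abs(hll.count() - n_unique) / n_unique


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(f"{n} unique: relative error {relative_error(n):.2%}")
```

With p = 12 the theoretical standard error is about 1.04/64 ≈ 1.6%, so a real test would run many seeds and check that the spread of errors matches that figure, rather than asserting on a single run.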


Reddit probably has enough analytics to be able to show mathematically that it will be accurate without simulating any requests.


Can't you just use Apache Benchmark and some proxies?