
I'm not sure you could keep the data statistically meaningful and still have enough false positives to prevent deanonymizing an ID. I think you're basically suggesting randomly grouping the IDs so that each grouped ID covers X real IDs on average. At least if you did it randomly instead of by hashing, there would be no danger of a dictionary attack.
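For concreteness, here is a minimal sketch of what random grouping might look like. The function name `random_groups`, the `avg_per_group` parameter, and the example addresses are all made up for illustration. The point is that the group label comes from a random draw stored in a table, not from the ID itself, so someone without the table cannot recompute it from a guessed ID.

```python
import random

def random_groups(ids, avg_per_group, seed=None):
    """Assign each ID to a random group, averaging avg_per_group IDs per group.

    Hypothetical helper: the returned mapping has to be stored, because the
    group label cannot be recomputed from the ID alone.
    """
    rng = random.Random(seed)
    n_groups = max(1, len(ids) // avg_per_group)
    return {identifier: rng.randrange(n_groups) for identifier in ids}

# Made-up IDs purely for the example.
ids = [f"user{i}@example.com" for i in range(1_000)]
mapping = random_groups(ids, avg_per_group=10, seed=0)
n_groups = len(ids) // 10
print(f"{len(mapping)} IDs spread over {n_groups} group labels")
```

The trade-off is that you now have a lookup table to keep around, which a hash does not need.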


The expectation is that a brute-force attack would try orders of magnitude more IDs than you actually have. If a random ID is 90% likely to have a unique hash and 10% likely to collide with one of your real IDs, then your real data won't have that many collisions. However, if someone does a brute-force check of (for example) a million email addresses, they'll get 100 000 positive responses, the vast majority of which will be false positives.
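A rough simulation of that arithmetic, as a sketch only: it assumes the "hash" is SHA-256 reduced to a small number of buckets, and the names `bucket`, `false_positive_rate`, and the bucket count are illustrative choices, not anything from a real system. With 1,000 real IDs and 10,000 buckets, a bit under 10% of buckets are occupied, so a wrong guess should look like a match roughly that often.

```python
import hashlib

def bucket(identifier: str, n_buckets: int) -> int:
    """Map an ID to one of n_buckets by truncating a SHA-256 digest."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

def false_positive_rate(real_ids, guesses, n_buckets):
    """Fraction of guesses that are NOT real IDs yet land in an occupied bucket."""
    real = set(real_ids)
    occupied = {bucket(r, n_buckets) for r in real}
    wrong_guesses = [g for g in guesses if g not in real]
    hits = sum(1 for g in wrong_guesses if bucket(g, n_buckets) in occupied)
    return hits / len(wrong_guesses)

# Made-up data: 1,000 real IDs, 100,000 brute-force guesses that are all wrong.
real_ids = [f"user{i}@example.com" for i in range(1_000)]
guesses = [f"guess{i}@example.org" for i in range(100_000)]
rate = false_positive_rate(real_ids, guesses, n_buckets=10_000)
print(f"{rate:.1%} of wrong guesses look like a match")
```

Since the attacker's guesses vastly outnumber the real IDs, almost all of the "matches" they see are these false positives.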


That's a reasonable point but doesn't explain why you're using hashes instead of random groupings in the first place.



