I'm not sure you could make the data statistically meaningful and have too many ...

PeterisP · on May 1, 2018

The expectation is that a brute-force attack would try orders of magnitude more IDs than you actually have. Suppose a random ID is 90% likely to have a unique hash and 10% likely to map to one of your real IDs. Your real data then won't have many collisions, but someone who brute-forces (for example) a million email addresses will get 100,000 positive responses, the vast majority of which are false positives.
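
To make the arithmetic concrete, here is a rough sketch of that effect. It is not the scheme under discussion, just an illustration: the `bucket` and `false_positive_rate` helpers, the choice of SHA-256, the example addresses, and the bucket count are all assumptions, picked so that real IDs occupy roughly 10% of the hash space.

```python
import hashlib


def bucket(identifier: str, num_buckets: int) -> int:
    """Map an ID to a small bucket by truncating a SHA-256 digest (assumed scheme)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def false_positive_rate(num_real: int, num_buckets: int, num_guesses: int) -> float:
    """Fraction of guessed IDs, none of them real, that land in an occupied bucket."""
    # Buckets occupied by the (hypothetical) real IDs.
    occupied = {bucket(f"user{i}@example.com", num_buckets) for i in range(num_real)}
    # Guesses use a different domain, so every hit is a false positive.
    hits = sum(
        bucket(f"guess{j}@example.org", num_buckets) in occupied
        for j in range(num_guesses)
    )
    return hits / num_guesses


# 1,000 real IDs spread over 10,000 buckets: about 10% of buckets are occupied.
rate = false_positive_rate(num_real=1_000, num_buckets=10_000, num_guesses=100_000)
print(f"false positive rate: {rate:.3f}")
```

Under these assumptions roughly one guess in ten comes back positive even though none of the guessed addresses is in the data set, so an attacker scaling up to a million guesses would be buried in about 100,000 false matches.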

nebulous1 · on May 1, 2018

That's a reasonable point, but it doesn't explain why you're using hashes instead of random groupings in the first place.