> replaces every string in the customer's database with a cryptographic hash > S...

neeleshs · on Nov 12, 2023

I think of this as a graph of interconnected data and metadata. Unless the entire graph is anonymized, it's not really anonymized.

Deducing relationships between metadata elements (the city field and the purchase store, say) ends up being the tricky part, and it is highly domain-specific.

Hashes with salts make it a bit harder too.
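To make the "graph" point concrete, here is a rough sketch of a linkage attack. It assumes a table where the city is hashed but a related field (the purchase store) is left in the clear, and that store locations are public. The salt, store IDs, and store-to-city mapping are all hypothetical. Because a deterministic hash keeps equal inputs equal, the salt does nothing here: the unhashed neighbour in the graph gives the value away.

```python
import hashlib

# Hypothetical single salt for the whole DB (illustration only).
SALT = b"per-db-salt"


def h(value: str) -> str:
    """Deterministic salted hash: equal inputs still give equal outputs."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


# Hypothetical "anonymized" rows: city is hashed, purchase store is not.
rows = [
    {"city": h("Chicago"), "store": "store-17"},
    {"city": h("Chicago"), "store": "store-17"},
    {"city": h("Boston"), "store": "store-42"},
]

# Hypothetical public knowledge: which city each store is in.
store_city = {"store-17": "Chicago", "store-42": "Boston"}


def relink(rows, store_city):
    """Map each city hash back to plaintext via the unhashed store field."""
    recovered = {}
    for row in rows:
        city = store_city.get(row["store"])
        if city is not None:
            recovered[row["city"]] = city
    return recovered


recovered = relink(rows, store_city)
```

Nothing about the hash was broken; the relationship between fields did all the work.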

lukifer · on Nov 12, 2023

Let’s at least grant the benefit of the doubt, that the poster knew to salt the hash. We can take it as given that incompetence makes for poor anonymization.

Your example of “transactions in Chicago” is much more salient; there’s clearly a cat-and-mouse dynamic where data can be de-anonymized, especially if the dataset is public. How much de-anonymization is actually possible will depend on the data in question, but the risk is non-zero. There’s certainly a case that no amount of obfuscation is sufficient if a user has not explicitly consented to their data being used this way.

mindslight · on Nov 12, 2023

> Let’s at least grant the benefit of the doubt, that the poster knew to salt the hash. We can take it as given that incompetence makes for poor anonymization.

Actually, let's not, because adding a salt doesn't get it right either. Rather, it's just another easily broken system that is only good enough to fool its own designer. It's still trivial to run through a list of the most common city names and recover nearly all of the entries. And if there is just a single "salt" per DB, which would be necessary for the apparent requirement that matching city names stay matching, even cycling through all combinations of letters is nearly practical. There just isn't enough starting entropy to make hashing meaningful.
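A minimal sketch of the dictionary attack described above, assuming one salt shared by the whole database and known to the attacker (it has to be shared for matching cities to keep matching hashes, and in practice it usually sits next to the data). The salt value, city list, and function names are made up for illustration.

```python
import hashlib

# Assumption: one DB-wide salt, known to the attacker.
DB_SALT = b"hypothetical-db-salt"


def anonymize(city: str) -> str:
    """The 'anonymization' step: salted SHA-256 of the city name."""
    return hashlib.sha256(DB_SALT + city.encode("utf-8")).hexdigest()


def dictionary_attack(hashes, candidates, salt):
    """Hash every candidate once with the shared salt, then look up each stored hash."""
    table = {
        hashlib.sha256(salt + c.encode("utf-8")).hexdigest(): c for c in candidates
    }
    return {h: table[h] for h in hashes if h in table}


anonymized = [anonymize(c) for c in ["Chicago", "Springfield", "Chicago", "Tulsa"]]
common_cities = ["Chicago", "Springfield", "Tulsa", "Houston", "Phoenix"]
recovered = dictionary_attack(anonymized, common_cities, DB_SALT)
```

With a single salt the attacker builds one lookup table and reuses it for every row, so the cost is just "hash every plausible city name once". Real city names are a small search space, which is the low-entropy problem in a nutshell.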