
How would you combine HIPAA de-identified data with another data source to identify an individual? I'm not suggesting it can't be done, just wondering how one might do it. Linking identifying data to a de-identified data set would only be possible if the original data was not properly de-identified, right?



There is no such thing as "proper de-identification" in general; it's all a matter of which other data sets the re-identifying party has at its disposal.

Consider the following de-identified data sets:

- [date, time, clinic, procedure or test being done, insurer] - as collected by the clinic chain so that it can get money from insurers

- [month, clinic, test name, test result] - for all tests made in the last year, collected for statistical purposes

- [date, time, latitude, longitude, phone number] - because AFAIR telcos sell this data

- [name, surname, phone number, ...] - some insurance company's list of customers

If you can get your hands on these data sets, you can trivially re-identify patients and even assign test results to them with high probability (how high depends on how many tests of a given type are made in any given clinic per the unit of time used to group the second data set).

Real-world data sets may be less clear-cut than this, but there are more of them, and you can apply statistical methods to find correlations. You don't need to be 100% sure customer X has diabetes for the information to be useful to you; 70% or 60% is useful too.
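To make the linkage concrete, here is a minimal Python sketch of joining the four data sets listed above. Every record, name, phone number and coordinate is made up, and the `clinic_coords` lookup (clinic addresses being public) and the exact-timestamp match are simplifying assumptions of mine, not part of any real data set.

```python
from collections import Counter, defaultdict

# All records are invented for illustration.
billing = [  # [date, time, clinic, procedure or test, insurer]
    ("2024-03-05", "10:30", "clinic_a", "HbA1c", "insurer_x"),
    ("2024-03-12", "14:00", "clinic_a", "HbA1c", "insurer_y"),
]
results = [  # [month, clinic, test name, test result]
    ("2024-03", "clinic_a", "HbA1c", "elevated"),
    ("2024-03", "clinic_a", "HbA1c", "elevated"),
    ("2024-03", "clinic_a", "HbA1c", "normal"),
]
pings = [  # [date, time, latitude, longitude, phone number]
    ("2024-03-05", "10:30", 52.2301, 21.0101, "555-0100"),
    ("2024-03-05", "10:30", 52.4000, 21.3000, "555-0199"),
    ("2024-03-12", "14:00", 52.2299, 21.0099, "555-0142"),
]
customers = [  # [name, surname, phone number]
    ("Alice", "Example", "555-0100"),
    ("Bob", "Sample", "555-0142"),
]
# Assumption: clinic addresses are public, so their coordinates are known.
clinic_coords = {"clinic_a": (52.2300, 21.0100)}


def near(lat, lon, clinic, tol=0.001):
    """True if the ping falls within a small box around the clinic."""
    clat, clon = clinic_coords[clinic]
    return abs(lat - clat) <= tol and abs(lon - clon) <= tol


def reidentify():
    """Return (full name, test, {result: probability}) for each linked visit."""
    names = {phone: f"{first} {last}" for first, last, phone in customers}

    # Count results per (month, clinic, test) group from the statistics set.
    by_group = defaultdict(Counter)
    for month, clinic, test, result in results:
        by_group[(month, clinic, test)][result] += 1

    linked = []
    for date, time, clinic, test, _insurer in billing:
        counts = by_group.get((date[:7], clinic, test), Counter())
        total = sum(counts.values())
        probs = {r: n / total for r, n in counts.items()} if total else {}
        # Phones seen at the clinic at the moment the test was billed.
        for d, t, lat, lon, phone in pings:
            if d == date and t == time and near(lat, lon, clinic) and phone in names:
                linked.append((names[phone], test, probs))
    return linked


for name, test, probs in reidentify():
    print(name, test, probs)
```

With this toy data each linked person gets a two-in-three chance of an elevated result, which is the "not 100% sure, but still useful" situation described above. The fewer tests of a given type a clinic runs per month, the sharper that estimate becomes.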


Section 164.514(b)

"The following identifiers of the individual or of relatives, employers, or household members of the individual, are removed:

...

(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code

(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older

... "

This is the "Safe Harbor" method.
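As a rough illustration of what the two quoted rules, (B) and (C), do to a record, here is a small Python sketch. The field names (`zip`, `birth_date`, `age` and so on) are hypothetical, and it covers only the two quoted rules, so it is nowhere near a complete Safe Harbor implementation.

```python
from datetime import date


def safe_harbor_generalize(record):
    """Apply only the two quoted rules, (B) and (C), to one record dict."""
    out = dict(record)

    # (B): drop geography smaller than a state; keep the first three ZIP digits.
    for field in ("street", "city", "county"):
        out.pop(field, None)
    if "zip" in out:
        out["zip3"] = out.pop("zip")[:3]

    # (C): keep only the year of dates directly related to the individual.
    for field in ("birth_date", "admission_date", "discharge_date"):
        if field in out:
            out[field.replace("_date", "_year")] = out.pop(field).year

    # (C): ages over 89 are aggregated into one category, and a birth year
    # that would reveal such an age is removed as well.
    if "age" in out and out["age"] > 89:
        out["age"] = "90+"
        out.pop("birth_year", None)

    return out


example = {
    "zip": "02139",
    "city": "Cambridge",
    "birth_date": date(1930, 5, 1),
    "admission_date": date(2024, 3, 5),
    "age": 93,
    "diagnosis": "E11",
}
print(safe_harbor_generalize(example))
```

Note what survives: a three-digit ZIP prefix and a year. An exact date, time and clinic attached to a procedure, as in the first data set above, would not pass these two rules.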

You could use the "Expert Determination" method. However, date + time + location attached to health information in your first data set definitely doesn't meet the criteria. I'll eat my hat if you find a supposed "non-PHI" data set with those.

In fact, the criterion for expert determination is literally that re-identification cannot be performed (without already having PHI-type information).


Yeah, this was my impression too. I've worked with HIPAA data, and I usually had to remove far more than just a name for it to count as de-identified.



Genomic data is by definition identified.



