
While interesting, this is not surprising. One of the most commonly used datasets for learning ML is the Boston Housing dataset - https://www.kaggle.com/c/boston-housing

In it there's a problematic feature labeled simply "black", defined as the proportion of Black residents by town.

Any pricing model built on this dataset is inherently racially biased, because the data was collected and the feature tagged - but what's the alternative? Not collecting the information at all? Or collecting it but ignoring this feature entirely?
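The "ignore this feature" option amounts to dropping the column before training. A minimal sketch using a synthetic stand-in for the dataset (the values below are made up for illustration, not the real data):

```python
import pandas as pd

# Synthetic stand-in for a few Boston Housing columns (illustrative values,
# not real data); "black" is the racially sensitive feature discussed above.
df = pd.DataFrame({
    "crim":  [0.006, 0.027, 0.027],   # per-capita crime rate
    "rm":    [6.575, 6.421, 7.185],   # average rooms per dwelling
    "black": [396.9, 396.9, 392.8],   # the problematic feature
    "medv":  [24.0, 21.6, 34.7],      # median home value (target)
})

# One "alternative": drop the sensitive column before training.
features = df.drop(columns=["black", "medv"])
print(list(features.columns))  # → ['crim', 'rm']
```

As the reply below argues, this alone may not be enough if other features act as proxies.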



Sensitive features should not be collected or used in applications where they introduce bias.

For example, a medical screening NN may find race to be a valuable feature for predicting illness; but a health insurance assessor should not use it.


Due to spurious correlations, it could still be helpful for the insurance assessor to have the variable so that they can apply bias mitigation techniques. Otherwise, the model might learn something from zip code or another proxy that leads to the same outcome as having race as an input variable. Simply removing a sensitive variable does not suffice to prevent unwanted bias.


Then you are conflating two kinds of bias: data bias and racial bias. If you remove the data bias, you are by definition introducing a racial bias by imposing your will on reality.



