The intent behind this tool seems good, but I don't think it's a good idea. To actually anonymize data requires semantic understanding of that data and an understanding of what sort of data, harmless by itself, is transmuted into identifying data when provided in the context of other otherwise harmless data.
This tool doesn't help you with any of that. It seems to be a glorified awk script. My concern is that helping the user with the easiest part of anonymizing data stands to encourage the user to go full steam ahead without slowing down to stop and think very carefully about what they're doing.
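To make the "harmless by itself" point concrete, here is a minimal Python sketch. The records and the `smallest_group` helper are made up for illustration and have nothing to do with the tool itself. It counts how many rows share the same values for a set of columns; a smallest group of 1 means someone is singled out.

```python
from collections import Counter

# Hypothetical toy records: names already stripped, only "harmless" columns left.
records = [
    {"zip": "02138", "birth_year": 1961, "sex": "F"},
    {"zip": "02138", "birth_year": 1961, "sex": "M"},
    {"zip": "02138", "birth_year": 1975, "sex": "F"},
    {"zip": "02139", "birth_year": 1975, "sex": "M"},
    {"zip": "02139", "birth_year": 1961, "sex": "F"},
    {"zip": "02139", "birth_year": 1975, "sex": "M"},
]


def smallest_group(rows, columns):
    """Size of the smallest group of rows that share the same values in `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())


# Each column alone hides every person among at least three rows...
print(smallest_group(records, ["zip"]))         # 3
print(smallest_group(records, ["birth_year"]))  # 3
print(smallest_group(records, ["sex"]))         # 3

# ...but the combination singles out individual rows.
print(smallest_group(records, ["zip", "birth_year", "sex"]))  # 1
```

Deciding which columns belong in that combined check is exactly the part that needs semantic understanding of the data.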
Hey! I'm one of the co-maintainers of the project here. I've posted a very similar reply to a very similar comment below at [1], but to replay the main points:
We absolutely agree this tool only solves the easiest part of anonymising data, and internally we rely on our team of data scientists to do the difficult parts. This tool is absolutely not up to the task of anonymising a dataset well enough for it to be made public. For us, it's about risk management vs effort: from a security perspective there are scenarios where we can use samples of data that have gone through this process and, without significant effort, substantially decrease the risk of holding data internally in multiple places. If we were to go on to make any of these datasets public, we'd be looking for a better suited tool (e.g. ARX [2]).
Regarding one part of your comment:
> My concern is that helping the user with the easiest part of anonymizing data stands to encourage the user to go full steam ahead without slowing down to stop and think very carefully about what they're doing.
We're going to try to add something to the README addressing this exact question from both of you, as it's one I anticipate we're going to get asked a lot, and one that carries risk if it's not made obvious from the outset. Thanks for the constructive line of questioning: it really will help us, and the people who choose to use this tool, make a decision that's right for them and their use-cases.