
Love the article! My team at NVIDIA recently released a GPU-accelerated version of the fuzzy deduplication algorithm it describes, and I figured this community might be interested.

Here's the repo: https://github.com/NVIDIA/NeMo-Curator/

Some documentation on the fuzzy dedup scripts: https://docs.nvidia.com/nemo-framework/user-guide/latest/dat...

And a Python example: https://github.com/NVIDIA/NeMo-Curator/blob/main/examples/fu...

Would be interested in any feedback from the folks here.


