
No need for so many commands :)

    $ cat duplicates.txt
    abc  7   4
    food toy ****
    abc  7   4
    test toy 123
    good toy ****

    $ awk '!seen[$2]++' duplicates.txt
    abc  7   4
    food toy ****

    $ awk '!seen[$2]++{cnt++} END{print +cnt}' duplicates.txt
    2


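On a very large file with many duplicates, the counter in seen[x]++ can in principle outgrow what awk represents exactly: awk numbers are normally double-precision floats, so increments stop registering past 2^53, unless you're using GNU Awk with bignums (gawk -M).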
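If the counter is a concern, one way around it is to test membership with `in` and never increment anything. A minimal sketch, assuming the same duplicates.txt as above (recreated here in a temp file); the function names `dedupe_col2` and `count_col2` are made up for illustration:

```shell
# Sample input, same content as duplicates.txt in the parent comment.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
abc  7   4
food toy ****
abc  7   4
test toy 123
good toy ****
EOF

# Print the first line seen for each value of column 2.
# ($2 in seen) only tests whether the key exists; the bare reference
# seen[$2] then creates the key, so no counter is ever incremented.
dedupe_col2() { awk '!($2 in seen) { seen[$2]; print }' "$1"; }

# Count distinct values of column 2 the same way.
# +cnt forces a numeric 0 when the input is empty.
count_col2() { awk '!($2 in seen) { seen[$2]; cnt++ } END { print +cnt }' "$1"; }

dedupe_col2 "$tmp"
count_col2 "$tmp"
```

The trade-off is a slightly longer one-liner. In practice `!seen[$2]++` only cares whether the value is zero or not, so the short form should keep deduplicating correctly even if the counter stops growing.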


That's a good point.

I'll add a note, thanks :)



