I've been a TA for a MATLAB course at my university for three semesters, and a lot of students cheat on assignments. I thought it would be interesting to see how the submissions group into clusters, so I made a visualization page (and a bunch of scripts). It turns out there are quite a few clusters even when the similarity threshold is set to 90%.
https://github.com/songgao/AlikeSubmissions
I wonder what this would look like for Google Code Jam submissions, although similar submissions don't necessarily imply cheating.
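
For anyone curious about the general idea of clustering with a similarity threshold, here is a minimal Python sketch. It is not the code from the repository above. It assumes `difflib.SequenceMatcher` as the similarity measure and single-linkage grouping (if A is similar to B and B to C, all three land in one cluster); the function names and the sample data are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two source strings."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_submissions(submissions: Dict[str, str],
                        threshold: float = 0.9) -> List[List[str]]:
    """Group submission names whose pairwise similarity reaches the threshold.

    Hypothetical helper: uses a small union-find so that similarity is
    treated as transitive (single-linkage clustering).
    """
    parent = {name: name for name in submissions}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair once and merge clusters that are similar enough.
    for a, b in combinations(submissions, 2):
        if similarity(submissions[a], submissions[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in submissions:
        groups.setdefault(find(name), []).append(name)

    # Only clusters with at least two members are interesting here.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    # Made-up sample data, not real submissions.
    sample = {
        "alice": "x = 1;\ny = x + 2;\ndisp(y);\n",
        "bob": "x = 1;\ny = x + 2;\ndisp(y);\n",
        "carol": "for i = 1:10\n  fprintf('%d\\n', i);\nend\n",
    }
    print(cluster_submissions(sample, threshold=0.9))
```

Comparing every pair is quadratic in the number of submissions, which is fine for a class but would need something smarter for a contest the size of Code Jam. A plain text ratio is also easy to fool by renaming variables, so a real checker would probably normalize or tokenize the code first.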