The underlying dilemma is that so many of these stories are not really "related". They're just the same story, rewritten from a press release. The ideal system would pick out clusters, but also have sub-clusters within each cluster containing articles on the same subject but with different information.
It is helpful that the same story is often rewritten from PA or AP articles and thus includes pretty much the same info - but it doesn't change the fact that stories on the same subject almost always include the same set of key words unique to that subject, whether or not they were rewritten from a press release. That's the beauty of TF.IDF weighting - it lets you cluster on the words that are distinctively important to an article, rather than the ones that show up everywhere.
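To make the TF.IDF point concrete, here is a rough sketch in plain Python. It is not anyone's production clusterer: the whitespace tokenizer, the `log(N / df)` form of IDF, the function names and the three toy headlines are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    # Naive tokenisation (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy headlines, invented for this example.
docs = [
    "the senate passed the budget bill on tuesday",
    "the budget bill cleared the senate after a long debate",
    "the striker scored twice in the cup final",
]
vecs = tfidf_vectors(docs)
same_story = cosine(vecs[0], vecs[1])
different_story = cosine(vecs[0], vecs[2])
```

The first and third headlines only share "the", which appears in every document and so gets an IDF of zero, leaving `different_story` at 0. The first two share "senate", "budget" and "bill", so `same_story` comes out positive. A real system would feed these pairwise scores into whatever clustering step it uses.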
Sometimes they're just rewrites of a press release - and those ones are easy to get right - but a lot of the time they really are totally different articles about the same event. Go back and look at the three sets of example clusters and you'll see what I mean.
They also have access to the link graph, so they probably use that for clustering instead of looking at the text. Pages that share a lot of inbound linkers (i.e. are linked to from many of the same pages) are likely to be similar.
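For what it's worth, the shared-inbound-linkers idea can be sketched with a simple set overlap. This is only a guess at the general shape, not a description of what they actually run: Jaccard similarity is one possible overlap measure among several, and the function names and the link data below are hypothetical.

```python
def inbound_sets(links):
    """Map each target page to the set of pages that link to it.

    `links` is an iterable of (source, target) pairs.
    """
    inbound = {}
    for source, target in links:
        inbound.setdefault(target, set()).add(source)
    return inbound


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up link graph: which blogs link to which stories.
links = [
    ("blogA", "story1"), ("blogB", "story1"), ("blogC", "story1"),
    ("blogA", "story2"), ("blogB", "story2"), ("blogD", "story2"),
    ("blogE", "story3"),
]
inbound = inbound_sets(links)
close_pair = jaccard(inbound["story1"], inbound["story2"])
far_pair = jaccard(inbound["story1"], inbound["story3"])
```

Here story1 and story2 share two of four distinct linkers, so `close_pair` is 0.5, while story1 and story3 have no linkers in common and `far_pair` is 0. Those pairwise scores could then be thresholded or clustered the same way text similarities would be.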