I think it's the other way around. The similarity metric determines which repos have edges (possibly weighted?)
And then some clustering algorithm makes sense of this giant graph by laying out sets of nodes that have a lot of edges to each other, close to each other
The closeness is just layout, the edges is the data structure that determines closeness.