Merging is pretty simple, and could probably use a little more TLC. When a new document comes into the system and its similarity score is above some threshold for documents in two different clusters, we consider merging those clusters. We then make the yes/no decision by comparing random documents from both clusters and averaging the scores. The threshold we use here is a bit lower than the one for non-merging decisions, since we have the additional information that this new document does a good job of linking the clusters.
This doesn't work when two (and only two) similar documents are ingested into the system as new singleton clusters at the same time. That case is very rare, so I'd guess it is not a big issue for you.
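
To make the merge decision a bit more concrete, here is a rough Python sketch of it. It is not our actual code: the function names, the `similarity` callable, the sample size, and both threshold values are made up for illustration. It also assumes that "linking" means the new document scores above the threshold against at least one document in each of the two clusters.

```python
import random
from itertools import combinations, product
from typing import Callable, Dict, List, Sequence, Tuple


def merge_candidates(
    new_doc,
    clusters: Dict[str, Sequence],
    similarity: Callable[[object, object], float],
    link_threshold: float = 0.8,  # hypothetical value
) -> List[Tuple[str, str]]:
    """Return pairs of cluster ids that the new document links together."""
    # A cluster is "linked" if the new document is similar enough to
    # at least one of its members (an assumption about the rule).
    linked = [
        cid
        for cid, docs in clusters.items()
        if any(similarity(new_doc, d) > link_threshold for d in docs)
    ]
    return list(combinations(linked, 2))


def should_merge(
    cluster_a: Sequence,
    cluster_b: Sequence,
    similarity: Callable[[object, object], float],
    merge_threshold: float = 0.6,  # hypothetical; lower than link_threshold
    sample_size: int = 5,          # hypothetical
    rng: random.Random = None,
) -> bool:
    """Average the similarity of randomly sampled cross-cluster pairs."""
    rng = rng or random.Random()
    sample_a = rng.sample(list(cluster_a), min(sample_size, len(cluster_a)))
    sample_b = rng.sample(list(cluster_b), min(sample_size, len(cluster_b)))
    scores = [similarity(a, b) for a, b in product(sample_a, sample_b)]
    if not scores:  # one of the clusters was empty
        return False
    return sum(scores) / len(scores) > merge_threshold
```

You would call `merge_candidates` once per incoming document, then run `should_merge` on each pair it returns. The lower `merge_threshold` is where the "this document already links the two clusters" information gets used.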