I'm sort of in the same boat, but with the sentence,
> To help this situation, we developed two tools, cdc_rsync
Why not use rsync?
(It does seem they ended up faster than rsync, so perhaps that's "why", but that seems more like a post-hoc justification.)
The "we use variable chunk windows" bit is intriguing, but the example GIF sort of just presupposes that the local & remote chunks match up. That could have happened in the rsync case/GIF too, but that wasn't the case considered, so it's an apples-to-oranges comparison. (Or, how is it that the local manages to be clairvoyant enough to choose the same windows?)
>> Or, how is it that the local manages to be clairvoyant enough to choose the same windows?
Yes! In content-defined chunking, the chunk boundaries depend on the content, in our case on a 64-byte window. If the local and the remote files contain the same 64-byte sequence anywhere, and that sequence has some magic pattern of 0s and 1s, both sides will place a chunk boundary there. A chunk is the range of data between two chunk boundaries, so if N consecutive chunk boundaries match, then N-1 consecutive chunks match.
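To make the "no clairvoyance needed" point concrete, here is a minimal sketch of content-defined chunking in general, not cdc_rsync's actual code. It assumes a "gear" rolling hash of the kind used by FastCDC: `h = (h << 1) + GEAR[byte]` kept to 64 bits, which by construction depends only on the last 64 bytes, because older contributions get shifted out of the word. The gear table, its seed, `avg_bits`, and the "top bits are zero" boundary test are all illustrative assumptions, and real implementations also enforce minimum and maximum chunk sizes, which are omitted here.

```python
import random

MASK64 = (1 << 64) - 1  # keep the rolling hash in 64 bits

# Hypothetical gear table: 256 pseudo-random 64-bit values. Both sides must use
# the same table (here fixed by a seed) so they compute identical hashes.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk_boundaries(data: bytes, avg_bits: int = 8) -> list[int]:
    """Return the end offset of each content-defined chunk.

    h = sum(GEAR[data[i - k]] * 2**k) mod 2**64, so bytes more than 64
    positions back no longer affect h. A boundary is declared where the
    top `avg_bits` bits of h are zero, roughly once every 2**avg_bits
    bytes on random data.
    """
    shift = 64 - avg_bits
    h = 0
    ends = []
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        if h >> shift == 0:
            ends.append(i + 1)
    if data and (not ends or ends[-1] != len(data)):
        ends.append(len(data))  # the tail is always a (possibly short) chunk
    return ends


def chunks(data: bytes, avg_bits: int = 8) -> list[bytes]:
    """Split data at its content-defined boundaries."""
    start, out = 0, []
    for end in chunk_boundaries(data, avg_bits):
        out.append(data[start:end])
        start = end
    return out


# Usage: insert bytes at the front of a file. With fixed-size blocks every
# block would shift; here the boundaries move with the content.
remote = random.Random(1).randbytes(20_000)
local = b"a few new bytes at the front" + remote
remote_chunks = set(chunks(remote))
local_chunks = chunks(local)
reused = sum(1 for c in local_chunks if c in remote_chunks)
print(f"{reused} of {len(local_chunks)} local chunks already exist remotely")
```

Each side chunks its own file independently; only the chunk hashes need to be compared, and everything after the first matching boundary lines up again, which is exactly the "N matching boundaries give N-1 matching chunks" argument.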
I was wondering the same thing. It isn't explicit in the write-up, but it's because rsync has no native Windows implementation (only rsync under Cygwin), so they developed this to achieve the same thing, and it turned out to be faster than rsync.