I wonder how using tee to compute the hash in parallel would affect the overall ...

mrb · on March 12, 2024

On GigE or even 2.5G it shouldn't slow things down, as "sha1sum" on my 4-year-old CPU can process at ~400 MB/s (~3.2 Gbit/s). But I don't bother to use tee to compute the hash in parallel because after the disk image has been written to the destination machine, I like to re-read from the destination disk to verify the data was written with integrity. So after the copy I will run sha1sum /dev/XXX on the destination machine. And while I wait for this command to complete I might as well run the same command on the source machine, in parallel. Both commands complete in about the same time so you would not be saving wall clock time.

Fun fact: "openssl sha1" on a typical x86-64 machine is actually about twice faster than "sha1sum" because their code is more optimized.

Another reason I don't bother to use tee to compute the hash in parallel is that it writes with a pretty small block size by default (8 kB) so for best performance you don't want to pass /dev/nvme0nX as the argument to tee, instead you would want to use fancy >(...) shell syntax to pass a file descriptor as an argument to tee which is sha1sum's stdin, then pipe the data to dd to give it the opportunity to buffer writes in 1MB block to the nvme disk:

  $ nc -l -p 1234 | tee >(sha1sum >s.txt) | dd bs=1M of=/dev/XXX

But rescue disks sometimes have a basic shell that doesn't support fancy >(...) syntax. So in the spirit of keeping things simple I don't use tee.

usr1106 · on March 12, 2024

It's over 10 years ago that I had to do such operations regularly with rather unreliable networks to Southeast Asia and/or SD cards, so calculating the checksum every time on the fly was important.

Instead of the "fancy" syntax I used

   mkfifo /tmp/cksum
   sha1sum /tmp/cksum &
   some_reader | tee /tmp/cksum | some_writer

Of course under the conditions mentioned throughputs were moderate compared to what was discussed above. So I don't know how it would perform with a more performant source and target. But the important thing is that you need to pass the data through the slow endpoint only once.

Disclaimer: From memory and untested now. Not.at the keyboard.