Diff algorithms are astoundingly widely applicable.
curses had the task of updating your 2400-baud terminal screen to look like a TUI program's in-memory image of what should be on it. (The TUI might be vi, nethack, a menu system, a database entry screen, ntalk, whatever.) The simple solution is to repaint the whole screen. But 80×24 is nearly 2000 characters, which takes about 8 seconds to transmit at 2400 baud, and users won't use a program that takes 8 seconds to respond to their latest keystroke. So curses uses a diff algorithm, together with the terminal capabilities database, to figure out a near-minimal set of updates to send over the serial line. (Some terminals have an escape sequence to insert or delete a line or a character, which can save a lot of data transmission, but others don't.) React's virtual DOM is the same idea.
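A minimal sketch of the core idea, ignoring terminal capabilities and cursor-motion costs (the real curses update logic is considerably more sophisticated): compare the old and new screen contents and emit only the spans that changed.

```python
def screen_updates(old, new):
    """Yield (row, col, text) updates needed to turn `old` into `new`.

    `old` and `new` are lists of equal-length strings, one per screen row.
    This is a toy version of what curses does: send only the runs of
    characters that differ, instead of repainting the whole screen.
    """
    for row, (old_line, new_line) in enumerate(zip(old, new)):
        col = 0
        while col < len(new_line):
            if old_line[col] == new_line[col]:
                col += 1
                continue
            start = col
            while col < len(new_line) and old_line[col] != new_line[col]:
                col += 1
            yield (row, start, new_line[start:col])

old = ["hello world         ", "status: OK          "]
new = ["hello there         ", "status: OK          "]
for row, col, text in screen_updates(old, new):
    print(f"move to ({row}, {col}), write {text!r}")
# -> move to (0, 6), write 'there'
```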
If you run `du -k > tmp.du` you can later `du -k | diff -u tmp.du -` to see which directories have changed in size. This can be very useful when something is rapidly filling up your disk and you need to stop it, but you don't know what it is.
If you `sha1sum $(find . -type f) > good.shas` you can later run `sha1sum $(find . -type f) | diff -u good.shas -` in the same way to see which files, if any, have changed.
rsync uses rolling hashes to compute the diff between files on two computers at opposite ends of a slow or expensive connection, so it can bring an out-of-date copy up to date without transmitting too much data. It's like curses, but for files, and not limited by terminal capabilities.
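The key trick is a weak rolling checksum that can be slid along a file one byte at a time in constant work per byte, so the side holding the newer data can find blocks matching the other side's copy at any byte alignment. A toy sketch in the spirit of rsync's weak checksum (not a faithful reimplementation):

```python
M = 1 << 16

def weak_checksum(block):
    """Weak checksum of a block of bytes, in the Adler-32 style rsync uses."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right: drop `out_byte`, append `in_byte`."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])
for i in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[i - 1], data[i - 1 + n], n)
    # Rolling gives the same answer as recomputing the window from scratch,
    # but in O(1) per byte instead of O(block size).
    assert (a, b) == weak_checksum(data[i:i + n])
```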
rdiff uses the rsync algorithm to compute compact deltas between files at rest, which makes it efficient to store many versions of a file, and the file doesn't in general have to contain source code. Deduplicating filesystems can use this kind of approach, though often they don't; it is common in data backup systems like Restic.
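A minimal sketch of the storage side of that idea, using fixed-size blocks addressed by their hash; real backup tools like Restic cut chunks at content-defined boundaries (found with a rolling hash) so that an insertion near the start of a file doesn't shift every later chunk.

```python
import hashlib

store = {}      # block hash -> block contents; shared by every stored file
BLOCK = 4096

def save(data: bytes) -> list:
    """Store `data` and return the list of block hashes that reconstructs it."""
    hashes = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)        # identical blocks are stored only once
        hashes.append(h)
    return hashes

def restore(hashes) -> bytes:
    return b"".join(store[h] for h in hashes)

v1 = b"A" * 10000
v2 = b"A" * 10000 + b"appended log line\n"   # a second version with data appended
f1, f2 = save(v1), save(v2)
assert restore(f2) == v2
print(len(store), "unique blocks for", len(f1) + len(f2), "block references")
```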
It's common for unit tests to assert that the result of some expression equals some large data structure, and sometimes they fail by producing a slightly different large data structure. Diff algorithms are very valuable for understanding why such a test failed; py.test does this by default.
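You can do the same thing by hand with Python's standard library; a small sketch of diffing the pretty-printed forms of the expected and actual structures, which is roughly the kind of output a test framework shows you:

```python
import difflib
import pprint

expected = {"name": "kragen", "roles": ["admin", "editor"], "quota": 100}
actual   = {"name": "kragen", "roles": ["admin", "viewer"], "quota": 100}

# Pretty-print both structures, then diff the printed forms line by line.
diff = difflib.unified_diff(
    pprint.pformat(expected, width=30).splitlines(),
    pprint.pformat(actual, width=30).splitlines(),
    fromfile="expected", tofile="actual", lineterm="",
)
print("\n".join(diff))
```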
Genomic analysis works by finding which parts of two versions of a genome are the same and which parts are different; BLAST, whose authors include Gene Myers, is a common algorithm for this. This is useful for an enormous variety of things, such as understanding the evolutionary history of species, identifying cancer-causing mutations in DNA from tumors, or discovering who your great-grandmother cheated on her husband with. It's maybe reasonable to say that all of bioinformatics consists of real-world use cases of diff algorithms. These algorithms have to handle difficult pathological cases like transpositions and long repeated sequences.
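A toy sketch of just the "seeding" step of that kind of search; real tools like BLAST extend and score these seed hits using substitution matrices and handle repeat-heavy regions far more carefully.

```python
def seed_matches(query, subject, k=11):
    """Find every position where a length-k substring of `query` occurs in `subject`.

    This is the seed-finding step of BLAST-style alignment: index all k-mers of
    one sequence, then look up each k-mer of the other. Matching seeds would
    then be extended into longer alignments; here we only report the seeds.
    """
    index = {}
    for i in range(len(subject) - k + 1):
        index.setdefault(subject[i:i + k], []).append(i)
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            yield j, i    # (position in query, position in subject)

query   = "ACGTACGTTAGCCGATTACA"
subject = "GGGACGTACGTTAGCCGATTACATTT"
for j, i in seed_matches(query, subject):
    print(f"query[{j}:{j+11}] == subject[{i}:{i+11}]")
```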
Video codecs like H.264 work to a great extent by "motion estimation", where the compressor figures out which part of the previous frame is most similar to each part of the current one and encodes only the differences. This is a more difficult generalization of the one-dimensional source-code-diff problem, because as pixel values move across the image they can also get brighter, dimmer, blurrier, or sharper, or rotate; also, it's in two dimensions. It's pretty similar to the curses problem in a way: you want to minimize the bandwidth required to update a screen image by sending deltas from the image currently on the screen. Xpra actually works this way.
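A toy sketch of block-matching motion estimation under simplifying assumptions (no brightness change, rotation, or sub-pixel motion; real encoders search much more cleverly): for each block of the current frame, find the offset into the previous frame with the smallest sum of absolute differences, then encode that offset plus the residual.

```python
def best_motion_vector(prev, cur, bx, by, block=8, search=4):
    """Return the (dx, dy) minimising the sum of absolute differences between
    the block of `cur` at (bx, by) and the block of `prev` at (bx+dx, by+dy).
    Frames are lists of rows of 8-bit pixel values."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if not (0 <= x <= w - block and 0 <= y <= h - block):
                continue
            sad = sum(
                abs(cur[by + r][bx + c] - prev[y + r][x + c])
                for r in range(block) for c in range(block)
            )
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    return best[1], best[2]

# A flat grey frame with a bright 8x8 square that moves 2 pixels right, 1 down.
prev = [[16] * 64 for _ in range(64)]
cur  = [[16] * 64 for _ in range(64)]
for r in range(8):
    for c in range(8):
        prev[20 + r][20 + c] = 200
        cur[21 + r][22 + c] = 200

print(best_motion_vector(prev, cur, bx=22, by=21))   # -> (-2, -1): points back into prev
```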