But once you introduce a piece of software into the middle to make this usable, what's the actual difference between this and just using VNC?
At that point it doesn't really matter if the screen is a file or not -- you need a compressor that can easily provide the output on a network socket, and a client that can perform the decoding.
You're right that it doesn't matter if it's a file or not per se.
What matters is introducing the concept of a generic, pluggable, chainable API. The file interface gets you that, though you can get the same thing in many other ways.
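To make "pluggable, chainable" a bit more concrete, here's a rough sketch in Python. Everything in it is made up for illustration: `fake_screen` is just an in-memory buffer standing in for a screen device, `wire` stands in for the network socket, and `pipe`, `compress` and `decompress` are hypothetical names, not anyone's real API. The only point is that the stages don't know or care what sits on either side of them.

```python
import io
import zlib


def read_chunks(src, size=4096):
    """Yield fixed-size chunks from any file-like object until EOF."""
    while True:
        chunk = src.read(size)
        if not chunk:
            break
        yield chunk


def compress(chunks):
    """Stage: zlib-compress a stream of byte chunks."""
    comp = zlib.compressobj()
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()


def decompress(chunks):
    """Stage: inverse of compress()."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail


def pipe(src, dst, *stages):
    """Read from src, run the data through each stage in order, write to dst.

    src and dst only need read() and write(), so a file, a socket wrapper
    or an in-memory buffer all work. Returns the number of bytes written.
    """
    stream = read_chunks(src)
    for stage in stages:
        stream = stage(stream)
    total = 0
    for chunk in stream:
        dst.write(chunk)
        total += len(chunk)
    return total


# Assumption: a blank 640x480 32-bit frame stands in for the screen "file".
fake_screen = io.BytesIO(bytes(640 * 480 * 4))
wire = io.BytesIO()  # stand-in for the network socket

sent = pipe(fake_screen, wire, compress)

# The "client" side: same plumbing, different stage.
wire.seek(0)
restored = io.BytesIO()
pipe(wire, restored, decompress)
```

Swapping zlib for a real video codec, or `wire` for something like `socket.makefile("wb")`, shouldn't require touching `pipe` at all, which is the property I'm arguing for. Whether the endpoints happen to be files is incidental.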