But I suspect that the analysis would look very similar for the entire file. Just calculate the average offset for a file of a given size: it won't be smaller than the file itself.
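A quick way to check this intuition is to simulate it. The sketch below assumes that the bytes of Pi behave like uniformly random bytes (Pi's normality is unproven), so it uses a seeded pseudo-random stream as a stand-in. `average_offset` is a hypothetical helper, not part of any existing tool. A single byte first shows up around offset 255 on average, and a 2-byte pattern around offset 65,535. Storing such an offset takes roughly as many bits as the data it replaces.

```python
import math
import random


def average_offset(n_bytes: int, trials: int = 500,
                   stream_len: int = 2_000_000, seed: int = 1) -> float:
    """Average 0-based offset at which a random n_bytes pattern first
    appears in a stream of uniformly random bytes (stand-in for Pi)."""
    rng = random.Random(seed)
    # Assumption: Pi's bytes look like uniform random bytes.
    stream = rng.randbytes(stream_len)
    offsets = []
    for _ in range(trials):
        target = rng.randbytes(n_bytes)
        off = stream.find(target)
        if off != -1:  # skip a pattern that never occurs in this finite stream
            offsets.append(off)
    return sum(offsets) / len(offsets)


for n in (1, 2):
    avg = average_offset(n)
    print(f"{n}-byte pattern: mean offset ~{avg:,.0f} "
          f"(~{math.log2(avg + 1):.1f} bits to store vs {8 * n} bits of data)")
```

Each extra byte of pattern multiplies the expected offset by about 256, which adds about 8 bits to the offset. That is why the whole-file case looks the same.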
I guess that is mostly true. In theory, you could come up with a more efficient way to store the indexes, such as run-length encoding. For a file consisting of 1M copies of whatever the first byte of Pi is, you could run-length encode the 1M zero addresses. You could also imagine a scheme such as a 2-bit varint encoding of the indexes, or a tally system that uses only 1 bit per offset to store a zero.
...of course, you are still better off just compressing the file consisting of a single repeated byte directly.
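A small sketch of the comparison above, using Python's `itertools.groupby` for run-length encoding. `PI_FIRST_BYTE` is a hypothetical choice: 0x24 is the first fractional byte of Pi in hex (3.243F...), but the actual indexing convention may differ. The offset list collapses to a single `(0, 1000000)` pair. Run-length encoding the file itself gives an equally small `(value, count)` pair and skips the Pi lookup entirely. A 1-bit-per-offset tally still costs 125,000 bytes.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]


PI_FIRST_BYTE = 0x24  # hypothetical: first fractional byte of Pi in hex
n = 1_000_000
data = bytes([PI_FIRST_BYTE]) * n

# Pi-offset scheme: every byte matches Pi at offset 0.
offsets = [0] * n
print(run_length_encode(offsets))   # [(0, 1000000)]

# Direct scheme: run-length encode the file itself.
print(run_length_encode(data))      # [(36, 1000000)]

# Tally scheme: 1 bit per zero offset, packed into bytes.
tally_bytes = (n + 7) // 8
print(tally_bytes)                  # 125000
```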