Sure, those ideas might hold up for objects far away from the eyes, but for nearby objects there can be a pretty big difference between what each eye sees. I think the human brain would quickly call BS on an image processed through that kind of compression, and it would not be very immersive or realistic.
There is also a difference between what successive frames of a video depict. Yet video compression relies heavily on encoding just the differences between successive frames, and that is very effective.
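To make the point concrete, here is a minimal sketch of difference encoding, assuming a toy one-row "image" of pixel values. The function names and the sample data are made up for illustration; real codecs add motion compensation, quantization and entropy coding on top of this. The thing to notice is that storing only the difference does not throw away the difference: the second view is reconstructed exactly.

```python
def encode_delta(reference, target):
    """Return the per-pixel differences (target - reference)."""
    if len(reference) != len(target):
        raise ValueError("both views must have the same number of pixels")
    return [t - r for r, t in zip(reference, target)]


def decode_delta(reference, delta):
    """Rebuild the target view from the reference and the stored differences."""
    return [r + d for r, d in zip(reference, delta)]


def nonzero_count(delta):
    """Count the pixels that actually differ between the two views."""
    return sum(1 for d in delta if d != 0)


# Hypothetical data: a bright object on a dark background, seen one pixel
# further left in the second view (a later frame, or the other eye).
left = [10, 10, 10, 200, 200, 10, 10, 10]
right = [10, 10, 200, 200, 10, 10, 10, 10]

delta = encode_delta(left, right)
restored = decode_delta(left, delta)

print(delta)                 # mostly zeros, which compress well
print(restored == right)     # True: the second view is recovered exactly
print(nonzero_count(delta))  # number of pixels that had to be stored
```

The bigger the difference between the two views, the more nonzero entries there are and the less you save, but the reconstruction stays exact in this lossless version. Any quality loss would come from how aggressively the differences are quantized afterwards, not from the differencing itself.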