I think one of the most important aspects of human vision, and one that seems to get overlooked a lot, is that it's active. We aren't just sitting in a dark room watching a video feed our whole lives; we actually live in and interact with the world.
Our eyes are active in that they move freely and can focus at different distances. We also happen to have two of them and our brains have a model for how far apart they are. These two features (active focusing and binocular vision) give us incredible depth perception.
Our brains use this depth information to separate objects from the background, something a machine learning algorithm can't do the same way if all you're feeding it is a training set of a billion labeled photos.
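To make the depth point a bit more concrete, here's a rough sketch of the textbook pinhole stereo relation, where depth is focal length times the distance between the two cameras (the baseline) divided by the disparity between matched points. Everything here is illustrative and assumed by me: the function names, the 800 px focal length, the 6.5 cm baseline (roughly a human eye spacing), and the 2 m cutoff. It isn't a model of what the brain actually does, just the simplest version of "two eyes plus a known spacing gives you depth, and depth gives you a foreground".

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation: depth Z = f * B / d (assumed idealised setup)."""
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


def foreground_mask(disparities, focal_px, baseline_m, max_depth_m):
    """Mark points closer than max_depth_m as foreground."""
    return [
        depth_from_disparity(d, focal_px, baseline_m) <= max_depth_m
        for d in disparities
    ]


# Hypothetical values chosen only for illustration.
focal_px = 800.0      # focal length in pixels
baseline_m = 0.065    # spacing between the two viewpoints, in metres
disparities = [52.0, 50.0, 4.0, 0.0]  # pixel shift of each matched point

mask = foreground_mask(disparities, focal_px, baseline_m, max_depth_m=2.0)
print(mask)
```

The first two points have large disparities, so they come out close and get marked as foreground; the last two are far away or at infinity and fall into the background. A single labeled photo has no disparity to work with, which is the gap I'm pointing at.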
The brain also makes decisions very early and updates them as it has time to reconsider the data. We've all probably had cases where we saw a person sitting down, then realised it was just a jacket draped over a chair.
At least in my own experience, the brain is very biased too. It seems the more tired we are, the more likely we are to mistake immobile objects for people or animals at a glance.