So they are right in assuming that people are searching more and more for answers to questions rather than looking up documents containing keywords. They are wrong in assuming that all of these questions ask what some physical object, location, etc. is. But I think having something like this is useful in other ways. One application that comes to mind: they could gather all the submitted answers to build an automatic object/scene recognizer. Probably in a few years, Jelly could automatically tag objects in the photo you just took.
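Roughly what I have in mind, as a very rough sketch: treat each answered photo as a labeled training example and fine-tune a pretrained image classifier on those (photo, answer) pairs. Everything below is made up for illustration (the jelly_pairs/ folder layout, the one-directory-per-answer-label scheme, the choice of ResNet-18 in PyTorch); it is not anything Jelly actually exposes.

    # Sketch only: assumes answered photos have been dumped into
    # jelly_pairs/<answer_label>/<photo>.jpg, one folder per answer label.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    # Standard ImageNet preprocessing for a pretrained backbone.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    dataset = datasets.ImageFolder("jelly_pairs", transform=preprocess)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # Start from an ImageNet-pretrained ResNet-18 and retrain only the last layer.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

    model.train()
    for epoch in range(5):
        total_loss = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch}: loss {total_loss / len(loader):.4f}")

Even a simple last-layer fine-tune like this would probably go a reasonably long way once there are enough answered photos per label, though cleaning the free-text answers into a usable label set is its own problem.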
True, but I would think that since photographs taken with Jelly would a) most likely be close-ups and b) often carry extra annotations (e.g. a circled object, or a hint in the question), this would result in a training set that better suits real-world input. I'm not really sure, to be honest; it's just a thought.