True, but I would think that since photographs taken by Jelly would a) most likely be close-ups and b) might also have some extra annotations (e.g. circled objects, hints in the question), this would result in a training set that better suits real-world input. I am not really sure, to be honest; it's just a thought.