Exactly. The large amount of faith currently placed in probabilistic models that have no common-sense way of eliminating factors which are extremely unlikely to be causal (like the presence of a ruler causing cancer) disturbs me. There is something humans do that we have not quite figured out how to teach computers yet, at least as far as I can tell, which is to evaluate whether their model is not just compatible with their observations, but also with other models and with prior knowledge about the objects being observed.
I think we’ll get there at some point, and I’m not exposed to the most cutting-edge AI research, but it seems like AI is currently very overhyped and deeply flawed for many of the applications people would like to use it for.
Because these algorithms don't know that they are classifying cancer. The labels they see are just 1s and 0s. For all they know, based on their inputs, you might want them to classify ruler/non-ruler images.
To achieve what you want, the labels would need to carry semantic structure rather than being bare categorical values.
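To make that concrete, here's a minimal toy sketch (entirely synthetic data, hypothetical feature names, nothing medical about it) of what happens when the only supervision is a bare 0/1 label and a spurious "ruler present" feature happens to track that label in the training set:

    # Toy sketch: a classifier given only 0/1 labels will latch onto whatever
    # feature predicts the label best, including a spurious one.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000
    y = rng.integers(0, 2, size=n)

    # Feature 0: a weak, noisy "real" signal.
    real_signal = y + rng.normal(0, 2.0, size=n)
    # Feature 1: a "ruler present" flag that agrees with the label 95% of the
    # time in this particular training set.
    ruler_flag = np.where(rng.random(n) < 0.95, y, 1 - y)
    X = np.column_stack([real_signal, ruler_flag])

    clf = LogisticRegression().fit(X, y)
    print("learned weights:", clf.coef_)        # the ruler flag dominates
    print("train accuracy:", clf.score(X, y))   # looks great

    # At "deployment" the spurious correlation is gone: rulers appear at random.
    y_test = rng.integers(0, 2, size=n)
    real_signal_test = y_test + rng.normal(0, 2.0, size=n)
    ruler_flag_test = rng.integers(0, 2, size=n)
    X_test = np.column_stack([real_signal_test, ruler_flag_test])
    print("test accuracy:", clf.score(X_test, y_test))  # drops sharply

Nothing in the label tells the model which of the two features it was "supposed" to use; it just optimizes the numbers it was given.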
Assuming we have a sane AI that now knows it's looking for cancer, knows what that means (from digesting medical textbooks, papers and generic text corpora), can detect rulers, and knows that a ruler does not cause cancer, we could make the model output "dataset diagnostics", like "Warning! The cancer label in this dataset is implausibly correlated with the visual presence of a ruler." Or "Warning: 99% of your hotdog images show a human hand. Evaluation on this dataset will barely penalize errors on hotdog images without hands!"
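You can approximate a crude version of that diagnostic today, without the "sane AI" part, by running auxiliary attribute detectors over the dataset and flagging attributes whose presence is suspiciously correlated with the label. A minimal sketch, where the detector functions (detect_ruler, detect_hand) are hypothetical stand-ins for whatever detectors you actually have:

    # Sketch of a "dataset diagnostics" pass: warn when an auxiliary attribute
    # (ruler visible, hand visible, ...) tracks the target label too closely.
    def attribute_label_correlation(attrs, labels):
        """Phi coefficient between a binary attribute and a binary label."""
        n11 = sum(1 for a, y in zip(attrs, labels) if a and y)
        n10 = sum(1 for a, y in zip(attrs, labels) if a and not y)
        n01 = sum(1 for a, y in zip(attrs, labels) if not a and y)
        n00 = sum(1 for a, y in zip(attrs, labels) if not a and not y)
        denom = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
        return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

    def dataset_diagnostics(samples, labels, detectors, threshold=0.5):
        """Return warnings for attributes implausibly correlated with the label."""
        warnings = []
        for name, detect in detectors.items():
            attrs = [detect(s) for s in samples]
            phi = attribute_label_correlation(attrs, labels)
            if abs(phi) > threshold:
                warnings.append(
                    f"Warning: label is implausibly correlated with '{name}' "
                    f"(phi={phi:+.2f}); a model may learn this shortcut."
                )
        return warnings

    # Hypothetical usage, assuming detect_ruler/detect_hand exist elsewhere:
    # for w in dataset_diagnostics(images, labels,
    #                              {"ruler visible": detect_ruler,
    #                               "hand visible": detect_hand}):
    #     print(w)

Whether the flagged correlation is actually spurious still takes the prior knowledge you mention (a ruler can't cause cancer), but even the raw warning would catch a lot of datasets before anyone trains on them.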
Context does matter though. If there's a bit of orange fluff on a tree trunk, the AI is right to look at the environment and infer it's a squirrel.