I don't think things are quite as dire as that. I have experience as a machine learning scientist; I've never actually worked on the kinds of papers you're talking about, but I have helped apply machine learning to scientific problems: first in a systems biology lab for tasks upstream of drug discovery, and then in industry for credit risk modeling. I've mostly used classical or Bayesian machine learning models, though.
My interest has always been in using machine learning as a tool to help understand some underlying phenomena, not in trying to push to the top of the leaderboard on benchmark datasets. I think this kind of attitude isn't uncommon in academic labs focused on doing real science, though the methods used aren't necessarily state of the art, and ML typically plays only a supporting role.
Coincidentally, I'm presenting at a conference later today, where I'll give a toy example showing why those leaderboards are not necessarily reliable for distinguishing between the quality of different models.