I was very keen on machine learning for some time; I started working with ML in the mid-'90s. The work I did definitely could have been replaced with a far less mathematically principled approach, but I wanted to learn ML because it was sexy and I assumed that at some point in the future we'd have a technological singularity due to ML research.
I didn't really understand the technology (gradient descent) underlying the training, so I went to grad school and spent 7 years learning gradient descent and other optimization techniques. Didn't get any chances to work in ML after that because... well, ML had a terrible rep in all the structural biology fields and even the best models were at most 70% accurate. Not enough data, not enough training methods, not enough CPU time.
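For readers who haven't seen it, the core idea that took me those years to internalize can be sketched in a few lines: gradient descent just repeatedly nudges a parameter against the gradient of a loss. This is my own minimal illustration (the toy data and learning rate are invented for the example, not anything from the author's work):

```python
# Minimal gradient-descent sketch: fit the slope w of y = w * x
# by stepping against the gradient of the mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with true slope w = 2

w = 0.0    # initial guess
lr = 0.01  # learning rate (step size)
for _ in range(1000):
    # gradient of (1/n) * sum((w*x - y)^2) with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # converges toward 2.0
```

Everything beyond this toy (momentum, second-order methods, and the rest of what fills seven years of grad school) is refinement of that one update step.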
Eventually I landed at Google in Ads and learned about their ML system, Smartass. I had to go back and learn a whole different approach to ML (Smartass is a weird system) and then wait years for Google to discover GPU-based machine learning (they have Vincent Vanhoucke to thank; he sat near Jeff Dean and stuffed 8 GPUs into a workstation to prove that he could do training faster than thousands of CPUs in prod) and deep neural networks.
Fast forward a few years, and I'm an expert in ML, and the only suggestion I have is that everybody should read and internalize: https://research.google/pubs/pub43146/
So little of success in ML comes from the sexy algorithms and so much just comes from ensuring a bunch of boring details get properly saved in the right place.