So did you check whether the simple solution outperformed it?
In the real world there's often more noise and variance, and part of the benefit of using those techniques is that you can arrive at solutions that are about as good without having to be an expert in every single domain.
I'm sympathetic, since this is a showcase: if their general method performed about as well, it does show the approach can learn the data for comparable problems that don't have easy closed-form solutions. I often test my models on verifiable problems as a sanity check.
Bingo. The kinds of problems OP talks about are fairly common, but train speed estimation is not one of them.
Also worth discussing is what happens when you use 30 sensors instead of one to improve your estimate of the speed. Good luck deriving the closed-form Doppler expression in that case (you technically _can_ use Kalman filtering, but a naive filter assumes each sensor's noise is independent - it wouldn't be; the sensors would be correlated based on their spatial locations and their proximity to the train).
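To make that correlation point concrete, here's a minimal sketch (not from the original post, all names and values are illustrative assumptions) of a single Kalman measurement update where the 30 microphones share a non-diagonal noise covariance R. The off-diagonal terms of R are exactly what an "independent sensors" assumption throws away.

```python
# Hedged sketch: one Kalman measurement update with spatially correlated sensors.
import numpy as np

n_sensors = 30
H = np.ones((n_sensors, 1))      # toy model: every mic observes the same scalar speed
x = np.array([[20.0]])           # prior speed estimate, m/s
P = np.array([[4.0]])            # prior variance

# Correlated measurement noise: covariance decays with inter-sensor distance (assumed model)
positions = np.linspace(0.0, 29.0, n_sensors)
R = 0.5 * np.exp(-np.abs(positions[:, None] - positions[None, :]) / 5.0)

# Simulated measurement vector drawn with that correlated noise
z = 20.0 + np.random.multivariate_normal(np.zeros(n_sensors), R).reshape(-1, 1)

# Standard Kalman update; using a diagonal R here would overweight clustered mics
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)
x_post = x + K @ (z - H @ x)
P_post = (np.eye(1) - K @ H) @ P
```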
With deep learning, all you need to extend your 1-microphone solution to 30 is a little bit of PyTorch code to widen the input layer and some plumbing to pass in 30 audio streams - that's it (see the sketch below).
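Something like this (a hypothetical toy architecture, not the post's actual model): the only structural change from 1 mic to 30 is the input channel count of the first layer; everything else, including the training loop, stays the same.

```python
# Hedged sketch: scaling a speed-estimation model from 1 to 30 microphones.
import torch
import torch.nn as nn

class SpeedEstimator(nn.Module):
    def __init__(self, n_mics: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mics, 32, kernel_size=9, stride=4),  # n_mics audio channels in
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, 1),                                 # scalar speed out
        )

    def forward(self, audio):  # audio: (batch, n_mics, samples)
        return self.net(audio)

model = SpeedEstimator(n_mics=30)          # was n_mics=1 for the single-sensor case
speed = model(torch.randn(8, 30, 16000))   # e.g. a batch of 8 one-second 16 kHz clips
```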
Not to mention extensions to more complicated scenarios - people talking nearby, cars passing, etc. With deep learning you probably won't even need to modify any code; just throw more training data at it (assuming your original model architecture is well designed).