
I recall attending a technical talk given by a team of senior ML scientists from a prestigious SV firm (that I shall not name here). The talk was given to an audience of scientists at a leading university.

The problem was estimating an incoming train speed from an embedded microphone sensor near the train station. The ML scientists used the latest techniques in deep learning to process the acoustic time series. The talk session was two hours long. This project was their showcase.

I guess no one on the prestigious ML team knew about the Doppler shift and its closed-form expression, typically taught in a high school physics class. A simple formula you can calculate by hand: no need for a GPU cluster.
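For reference, that closed-form expression is a one-liner. This is a minimal sketch for the simplest textbook case, assuming the source frequency is known and the train approaches a stationary microphone head-on (real train noise is broadband, so knowing the source frequency is a big assumption):

```python
# Estimate an approaching train's speed from the observed Doppler shift.
# Assumes a known source frequency and a source moving straight toward
# a stationary microphone.

V_SOUND = 343.0  # speed of sound in air at ~20 °C, in m/s

def train_speed(f_observed: float, f_source: float) -> float:
    """Source approaching a stationary observer:
    f_observed = f_source * V_SOUND / (V_SOUND - v_train),
    solved for v_train."""
    return V_SOUND * (1.0 - f_source / f_observed)

# A 1000 Hz source heard at 1030 Hz implies roughly 10 m/s (~36 km/h).
print(train_speed(1030.0, 1000.0))
```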



So did you check if the simple solution outperformed it?

In the real world there's often more noise and variance, and part of the benefit of using those techniques is that you can arrive at solutions that are about as good without being an expert in every single thing.

I'm sympathetic, since this is a showcase: if their general method performed as well, that shows it can learn the data for other comparable problems without easy solutions. I often test my models on verifiable problems as a sanity check.


Bingo. The kind of problems OP talks about are fairly frequent, but this problem of train speed estimation is not it.

Also worth discussing is what happens when, instead of one sensor, you deploy 30 to improve your estimate of the speed. Good luck deriving the closed-form Doppler expression in that case (you technically _can_ use Kalman filtering, but that assumes each sensor is independent; they would not be, they would be correlated based on their spatial location and proximity to the train).

With deep learning, all you need to extend your one-microphone solution to 30 is a little bit of PyTorch code to add more neurons and some plumbing to pass in 30 audio streams, and that's it.
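To make that concrete, here's a hypothetical sketch of what "a bit of plumbing" looks like: a toy speed regressor where the microphone count is just a constructor argument (the architecture, layer sizes, and names are all made up for illustration):

```python
# Hypothetical sketch: a spectral/waveform speed regressor parameterized
# by the number of microphones. Going from 1 mic to 30 is a constructor
# argument, not a redesign.
import torch
import torch.nn as nn

class SpeedEstimator(nn.Module):
    def __init__(self, n_mics: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mics, 32, kernel_size=5, padding=2),  # mics = input channels
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time
            nn.Flatten(),
            nn.Linear(32, 1),         # scalar speed estimate
        )

    def forward(self, x):  # x: (batch, n_mics, n_samples)
        return self.net(x)

model = SpeedEstimator(n_mics=30)
batch = torch.randn(4, 30, 16000)  # 4 clips, 30 mics, 1 s at 16 kHz
print(model(batch).shape)          # torch.Size([4, 1])
```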

Not to mention extensions to more complicated scenarios: people talking nearby, cars passing, etc. With deep learning you probably won't even need to modify any code, just throw in training data (assuming your original model architecture is well designed).


But you might need a lot of training data, which in some cases you might not have.


Well, the main sound you hear when a train arrives in a station is the sound of brakes. Its frequency and volume change as the train slows down. You'd need to analyze the physics of that before extracting a Doppler shift from it.

Also, depending on the track used, there may be trains passing by without braking, so you will need at least a classifier to sort these two cases.

I'd argue that using ML to build such a classifier is almost always a time saver.

And if you have the ML pipeline there, why not try to train it to recognize the speed while you're at it? It will likely discover the Doppler shift on its own, but also do things that would take ages to code manually:

- Use volume levels and volume-level differences

- Use the clicks at rail junctions to evaluate the speed

- Recognize the intensity of the braking/engine running

- Use cues like rail vibration at certain speeds

- Adjust for air pressure differences when it hears rain

All of that for free. Nowadays, going ML first is becoming a pretty good idea actually.


I remember the first 15 years of my life getting woken up by trains, and it certainly wasn't the brakes that I heard first.


They brake only when they stop at the station. If you were sleeping next to the tracks but not next to a station, you probably did not hear them much.


If the train had a loudspeaker on the front emitting a pure sine wave of known frequency, louder than anything else in the environment, you could probably just use a frequency counter and the Doppler formula.

Given just some microphones picking up whatever sound the train makes on its own, it's not obvious to me that there's a simple solution.


A doppler shift thingamajig might work in a lab, but not in the real world.

I guess, such a calculation could have been one of the inputs to the system.

I do get your point that an ML system for such a thing is overkill. I'd guess there are more reliable and rugged methods to get the speed of the incoming train (sensors that need not be mounted on the train).


Doppler shift wouldn't help much in this case.

The clues required are in how the thousands of waveforms are affected by the environment, how they change as the train passes different features, how their volumes change over time, and other features we can't know in advance. Probably the clicks as the wheels pass joints between tracks are the most telling clues about speed.

The microphone doesn't give a sine wave.


> The clues required are in the [random bits of physics we can't know in advance]

If we can't know in advance, how can you expect a glorified Markov Chain to magically figure it out? If it could - and it can't, but if it could - how would you know it did it correctly?

Fortunately, we know enough about physics to be able to deal with it without a divination server.

I get it. The train operator wants a solution, but realizes figuring this out is too hard, so it's better to pay someone else to do it. That's normal. It used to be that this someone else would do the actual work necessary. But thinking is hard and electricity is cheap, so some figure it's better to just light up a GPU farm and wait until a solution forms in the primordial soup of repurposed vertex shaders. That too, perhaps, would be OK in principle - if the technology was there. But it's not there yet. We're still better off doing the actual thinking.

> The microphone doesn't give a sine wave.

No, it gives an infinite number of sine waves added up together. Which become a finite number of sine waves after passing through ADC, and then a finite sequence of sine waves after a Fourier transform.
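A quick illustration of that decomposition: a sampled signal breaks into a finite set of sinusoids via the discrete Fourier transform, and a Doppler shift just moves the peaks. Here a synthetic tone that has been shifted from 1000 Hz to 1030 Hz shows up at the shifted bin:

```python
# A sampled microphone signal decomposes into a finite set of sinusoids
# via the DFT. A Doppler shift moves the spectral peaks.
import numpy as np

fs = 8000                               # sample rate, Hz
t = np.arange(fs) / fs                  # 1 second of samples
signal = np.sin(2 * np.pi * 1030 * t)   # the Doppler-shifted tone

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
peak = freqs[np.argmax(spectrum)]
print(peak)  # → 1030.0
```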


I had an internship project which was a simpler cousin of this, where I needed to determine the approximate location of a WiFi-enabled device based on received signal strengths from several access points. Normally this would be trivial, but this demo was meant to simulate an environment highly reflective to 2.4GHz RF. So while it took only a day or two to demonstrate relatively poor performance using simple triangulation (actually, trilateration is the better word here), I spent several weeks collecting data and putting it through a support vector machine. With a simple moving average filter on top of that SVM, around 98-99% accuracy was pretty easily achievable in classification (I believe my prediction classes were 2 x # of rooms, so quite coarse but good enough for the task).

The main advantage over a physics-based modeling approach - which, with enough information, could surely have reached practically 100% accuracy - is that the SVM didn't rely on knowing anything about the location of the access points or the geometry of the space. The signal strength training data was available for free as a byproduct of another device, so this solution had very low cost in the form of manual effort/precise measurement, both of which would have dwarfed a few weeks of intern time.
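The shape of that approach can be sketched in a few lines of scikit-learn. This uses synthetic RSSI vectors (the real data came from the deployment) and a sliding majority vote as the smoothing step; all names and numbers are illustrative:

```python
# Sketch: classify a device's room from per-AP signal strengths with an
# SVM, then smooth the prediction stream with a sliding majority vote.
# The RSSI data here is synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_aps, n_rooms = 4, 3
centers = rng.uniform(-90, -30, size=(n_rooms, n_aps))  # per-room mean RSSI (dBm)

def sample(room, n):
    """Noisy RSSI vectors for a device sitting in `room`."""
    return centers[room] + rng.normal(0, 3, size=(n, n_aps))

X = np.vstack([sample(r, 200) for r in range(n_rooms)])
y = np.repeat(np.arange(n_rooms), 200)

clf = SVC(kernel="rbf").fit(X, y)

# Smooth a stream of per-sample predictions with a sliding majority vote.
stream = clf.predict(sample(room=1, n=50))
window = 5
smoothed = [np.bincount(stream[max(0, i - window + 1): i + 1]).argmax()
            for i in range(len(stream))]
```

The majority vote plays the role of the moving average filter: isolated misclassifications get outvoted by their neighbors in time.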


> If we can't know in advance, how can you expect a glorified Markov Chain to magically figure it out? If it could - and it can't, but if it could - how would you know it did it correctly?

We might not know anything about them in advance, but the patterns are there and could maybe be extracted from some training data. If only you had a statistical model that was flexible enough to find them…

Validation is then as easy as running the model on some examples outside the training set.

> No, it gives an infinite number of sine waves added up together.

Yeah, and after Doppler shift it is still an infinite number of sine waves - no immediate information gained.

Of course, if there are characteristics in the original noise and its frequency distribution, you could try to find those in the doppler-shifted signal. How would you determine the characteristics? From a dataset of examples, I guess. So now the problem is: recognize a pattern from examples and try to find it in new instances. Sounds like the kind of problem ML has found success in. (If you're now thinking "we don't need ML, just some advanced statistics"… Well ML is often basically a statistical model with lots and lots of parameters.)
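One classical (non-ML) version of "find those characteristics in the shifted signal" is to brute-force the Doppler scale factor that best aligns the observed spectrum with a reference spectrum. A toy sketch with two synthetic tones (real train noise is far messier, which is the whole point of the thread):

```python
# Recover a multiplicative Doppler factor by scanning scale factors and
# correlating the observed spectrum against a rescaled reference spectrum.
import numpy as np

fs, n = 8000, 8000
t = np.arange(n) / fs

def spectrum(sig):
    return np.abs(np.fft.rfft(sig))

# Reference: the source's spectrum at rest (two tones at 500 and 1200 Hz).
reference = spectrum(np.sin(2*np.pi*500*t) + np.sin(2*np.pi*1200*t))

shift = 1.03  # true Doppler factor we want to recover
observed = spectrum(np.sin(2*np.pi*500*shift*t) + np.sin(2*np.pi*1200*shift*t))

freqs = np.fft.rfftfreq(n, d=1/fs)
candidates = np.arange(0.95, 1.06, 0.001)
scores = [np.dot(observed, np.interp(freqs, freqs*s, reference))
          for s in candidates]
best = candidates[np.argmax(scores)]
print(round(best, 3))  # → 1.03
```

The catch, of course, is that this assumes you have a clean reference spectrum, which for a real train you'd have to learn from examples anyway.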


> Validation is then as easy as running the model on some examples outside the training set.

Only if you can trust the data gathered from that validation to be representative. You can do that easily when you understand the statistics your model is doing - which is the case with an "old-school" ML solution, but not so with DNNs.

This gets worse the more complex your problem is. I can expect a DNN to pick up the correct frequency patterns in audio time series quickly, as it stands out in the solution space - but with more variables, more criteria, we know it takes ludicrous amounts of data for the model to start returning good results, and it still often fixates on dubious variables.

And then you have to ask yourself - what are your error bars? With a classical approach to estimating train velocity from sound, your results will be reasonably bounded, and won't surprise you. With a DNN, all bets are off.

> How would you determine the characteristics? From a dataset of examples, I guess.

And physics. In this case, a human can apply their understanding of physics to determine what characteristics to expect, verify they exist in the dataset, and encode that knowledge in the solution. A DNN will have to figure this out on its own, and we have no good way to verify it did it correctly (and isn't just overfit on something that's strongly but incidentally correlated).

I agree there are plenty of problems where we don't have a good "first principles" solution - where we're just looking for correlations. DNNs automate this nicely. But such models belong to the category of untrusted ones - they might seem to work now, but because of their opaqueness, we can't treat past performance as a strong indicator of reliability.

> Well ML is often basically a statistical model with lots and lots of parameters.

Yes. But I think it matters if people know what those parameters do.


Ha! A friend sent me this comment when he recognized this project. Unless there happens to be another firm who did the exact same thing we did, I was a part of this project (see this blog post https://www.svds.com/introduction-to-trainspotting/).

You misunderstood the point of the presentation. The company was a consulting firm that specialized in data science and engineering. Our clients wanted to kick the tires and see what our technical chops were before hiring us but they didn't want to let us use their proprietary and confidential data for our own tech demos.

We didn't want to just use the same open source datasets everyone else did, so we got to thinking about novel datasets we could create that might have applications for industries we sold our services to. From this, the Trainspotting project was born.

Many of us commuted via the Caltrain, which was right next to our office, and we were frequently frustrated with the unreliability (this was in ~2016 or so when car and pedestrian strikes were happening seemingly every week), so we made an app that tried to provide more accurate scheduling.

We used the official API for station:train arrival times, but we found that it was unreliable, so we wanted some ground truth data on whether a train was passing. Since our office was right next to the Castro MTV station, I had the idea to use a microphone (attached to a raspberry pi) to just listen for when the train went by. In addition to ground-truth data for validating arrival times, this gave us a chance to show off some IoT applications. It actually worked pretty well, but it had false positives (e.g. the garbage truck would set it off). So we added a camera.

We pointed it at the tracks and started streaming data off of it. At first we used very simple techniques, processing the raw stream on-device with classic computer vision algos (e.g. Haar cascades) in OpenCV. We discovered that the VTA, which had a track parallel to the Caltrain and was "behind" the Caltrain in our camera's shot, could cause false positives. Gradually we used more and more complex techniques like deep learning, but the raspberry pi couldn't handle it (IIRC it could only process a single frame in something like 6 seconds). So we used a two-stage validation: the simpler, faster detectors ran on the raw stream in real time, and when one detected a positive, we'd send a single frame through the deep learning model.
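That two-stage pattern is worth spelling out, since it's a common trick for squeezing an expensive model onto weak hardware. A structural sketch with stub detectors standing in for the real ones (the frame format and function names here are invented for illustration):

```python
# Two-stage detection: a cheap detector runs on every frame in real time;
# only frames it flags get handed to the slow, accurate model.
def fast_detector(frame) -> bool:
    """Stand-in for an on-device classical detector (e.g. a Haar cascade)."""
    return frame.get("motion", False)

def slow_accurate_model(frame) -> bool:
    """Stand-in for the deep learning model (~seconds per frame on a Pi)."""
    return frame.get("is_train", False)

def process_stream(frames):
    confirmed = []
    for frame in frames:
        if fast_detector(frame):            # stage 1: cheap, every frame
            if slow_accurate_model(frame):  # stage 2: expensive, rare
                confirmed.append(frame["id"])
    return confirmed

frames = [
    {"id": 0, "motion": False},
    {"id": 1, "motion": True, "is_train": False},  # e.g. a garbage truck
    {"id": 2, "motion": True, "is_train": True},
]
print(process_stream(frames))  # → [2]
```

The fast stage keeps latency low and filters out the easy negatives; the slow stage only pays its cost on the rare candidate frames.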

TL;DR: The whole point was to be a tech demo, not to gauge the speed. The trains were either stopping or pulling out of the station, so speed would have been useless.


Really enjoyed this post and explanation, thank you! I work in ML and used to live on Alma St in Palo Alto so it really hit home for me :).

I also acutely enjoy the notion that a pithy critique of people who refused to simplify the problem they were solving is in itself grossly oversimplified!


Sorry, but this seems too strange to be true. Are you sure you didn't miss anything?

Particularly strange since a moving train (i.e. a vehicle) is about the most common way the Doppler effect is explained in textbooks; it's not like you need any big "eureka" moment to get to this solution either.


Analyzing the doppler shift to calculate speed only works if you know what the unshifted audio spectrum should be. Trains generate a ton of noise at a wide range of frequencies and that noise probably varies significantly based on a bunch of factors.


If you put the microphone directly against the track, I would bet the friction and movement of the wheels against the track generates vibration that is fairly consistent for a given speed. Maybe a sensor that better detects slight vibration would be better than a microphone for this use case.

Additionally, diesel-electric train engines run as generators to power the wheels, which means they're likely running at a consistent RPM or within a set range of RPMs. This could be listened for.


A sufficiently large RLCDNN would reinvent the Doppler effect from data, right?


You could also let Tom the train driver sit there and have him guesstimate the speed.


Or just have two switches on the train tracks


Radar exists too.

The need might be for a sensor local to the platform, as a backup to give warning of a train that's traveling too fast? In which case a sensor that mimics the old cowboy-film favourite of putting one's ear to the track seems like a reasonable thing to try.


Ah, we're talking about practical solutions here? Should've warned me. A chain of laser reflective sensors might be even better, because there is less mechanical wear + you can use them to know where the train currently is and where it isn't.

But this is very likely a well-researched area, and there are definitely train people who can point out a flaw in this idea (dirt?).


Or some other type of sensor and minimal gear added to each locomotive...


When all you have is a hammer...


Or maybe in this case: When you have a shiny new hammer, and not enough fitting nails.


But what about noise? Is it really accurate in a real-world environment?


OK, a compromise: let's do ML on the power spectral density of the train audio. Or just use lidar.



