I am 100% abundantly positive that signal processing code could do this, and in fact by signal processing standards it's not even particularly hard. You may need more time than the app takes, though I am inclined to think that the app doesn't exactly need a number to twelve significant digits and you could get a lock to within a few percent pretty quickly. Even at 33⅓ RPM, three or four revolutions take only about five to seven seconds.
I am much less confident about my claim that it is something you could bash together in "normal code" from an FFT, without advanced math, but it still seems likely to me. You have huge stonking correlations between the frequencies that you can exploit. Imagine a normal FFT chart like you've seen any number of times. Now, take that same thing and wave it up and down quite visibly on a sine wave. Nice and big in your imagination so you can see it. You think that's not something that could be picked up? Now, scaling it down to where you can't hear it anymore may make it harder to believe, but the same code will pick it up. To a computer it would still be clear as day. This is one of those things microphones pick up much, much better than human ears, just like microphones trivially pick up a quiet 1001 Hz tone next to a loud 1000 Hz tone even though we can't hear it at all.
Compared to, say, recognizing a voice and extracting words from it, this is pretty trivial stuff.
> Now, scaling it down to where you can't hear it anymore may make it harder to believe, but the same code will pick it up.
That's where you're wrong. FFT frequency bands are surprisingly wide. You can make them narrower but with the tradeoff of losing temporal resolution. And it gets worse the lower the frequency gets.
There is absolutely no way you're going to detect a near-0.555 Hz effect from a few seconds of audio and determine whether it's off the nominal frequency by 0.1% or even 1%.
Like I said, sure if you're dealing with a pure sine wave. But not a complex signal using FFT.
Or to put it another way -- a 1,000 Hz signal? Absolutely. But a 0.5 Hz signal? Absolutely not.
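To put rough numbers on that objection, here is a back-of-the-envelope sketch. The 5-second window is just an assumed example length, and the function names are made up for illustration.

```python
def rotation_hz(rpm: float) -> float:
    """Rotation frequency of the platter in Hz."""
    return rpm / 60.0

def bin_width_hz(window_seconds: float) -> float:
    """Spacing between DFT bins for a window of this length (fs / N = 1 / T)."""
    return 1.0 / window_seconds

wow_hz = rotation_hz(100.0 / 3.0)   # 33 1/3 RPM
window = 5.0                        # assumed: 5 seconds of audio

print(f"rotation frequency:      {wow_hz:.4f} Hz")
print(f"revolutions in window:   {wow_hz * window:.2f}")
print(f"DFT bin spacing:         {bin_width_hz(window):.3f} Hz")
print(f"0.1% of rotation freq.:  {wow_hz * 0.001:.6f} Hz")
```

With those assumptions the bin spacing is 0.2 Hz, while a 0.1% shift of the rotation frequency is well under a thousandth of a hertz.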
There are various DFT-based algorithms for high-precision pitch detection.
Two common algorithms are cepstrum analysis and autocorrelation, both of which involve taking the DFT or inverse DFT of a function of the magnitude of the DFT of the signal (its logarithm for the cepstrum, its square for autocorrelation).
Find the peaks of the result, then fit a parabola to the peak and the bins on either side, and calculate where the parabola reaches its maximum. The position at which the maximum occurs gives the period (the inverse of the frequency), which can then be converted to pitch.
Both algorithms produce results that are accurate to within 0.1 cents. You do have to tweak buffer sizes and windowing depending on what pitch ranges you are interested in, and do some post-filtering to skip over transients.
The temporal resolution problem is solved by calculating the result on overlapping frames.
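A minimal sketch of the autocorrelation variant with peak interpolation, under some loudly stated assumptions: it computes the autocorrelation directly in the time domain instead of via the DFT (to stay dependency-free), it fits a parabola through the peak and its two neighbours, and it is fed a clean synthetic 220 Hz tone. The function name `autocorr_pitch` and all parameter values are hypothetical, chosen only for illustration.

```python
import math

def autocorr_pitch(samples, sample_rate, fmin, fmax):
    """Estimate the fundamental of `samples` in Hz, searching fmin..fmax."""
    n = len(samples)
    lo = int(sample_rate / fmax)   # shortest period (in samples) to consider
    hi = int(sample_rate / fmin)   # longest period (in samples) to consider

    def r(lag):
        # Unbiased autocorrelation estimate at a single lag.
        total = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        return total / (n - lag)

    # One extra lag on each side so the peak always has two neighbours.
    acf = {lag: r(lag) for lag in range(lo - 1, hi + 2)}
    k = max(range(lo, hi + 1), key=acf.get)

    # Parabola through lags k-1, k, k+1; its vertex refines the period.
    a, b, c = acf[k - 1], acf[k], acf[k + 1]
    denom = a - 2.0 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (k + shift)

fs = 8000
tone = [math.sin(2 * math.pi * 220.0 * i / fs) for i in range(2000)]
estimate = autocorr_pitch(tone, fs, fmin=150.0, fmax=400.0)
print(f"{estimate:.2f} Hz, {1200 * math.log2(estimate / 220.0):+.2f} cents off")
```

The search range is kept narrower than an octave on the low side on purpose: a pure tone has equally strong autocorrelation peaks at every multiple of its period, so a wider range invites octave errors. This toy version should land within a fraction of a hertz on a clean tone; getting anywhere near the accuracy quoted above would take the windowing, buffer-size tuning and post-filtering already mentioned.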
Sure but the problem remains: you can't do that with only a few oscillations of a weak signal against a loud noisy complex signal.
You simply can't detect an inaudible-to-human-ears 0.5 Hz signal from 3 or 5 seconds of complex normal-volume audio, down to the accuracy of cents, much less 0.1 cents.
As I said above: a 1,000 Hz signal? Absolutely. But a 0.5 Hz signal? Absolutely not. There just isn't enough signal for that level of precision. No matter what tool you're using.
But you could easily detect frequency modulation of a 220 Hz signal by a 0.5 Hz sine wave, which would have sidebands about 4 cents either side of the carrier. This is conceptually similar to heterodyning. Wow in the source material ends up creating sidebands of the source material in a frequency range that is more amenable to signal analysis. Whether this works or not depends on how much wow an actual record player has. But a back-of-the-envelope calculation seems to suggest that very tiny amounts of wow should create detectable sidebands.
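The sideband spacing is easy to check. A small sketch, assuming the first sidebands sit at the carrier plus and minus the modulation frequency (the helper name `cents` is just for illustration):

```python
import math

def cents(f_ref: float, f: float) -> float:
    """Interval from f_ref to f in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)

carrier = 220.0   # Hz, the tone being modulated
wow = 0.5         # Hz, the assumed modulation rate

upper = cents(carrier, carrier + wow)
lower = cents(carrier, carrier - wow)
print(f"upper sideband: {upper:+.2f} cents")
print(f"lower sideband: {lower:+.2f} cents")
```

Both come out at roughly 3.9 cents from the carrier, which is where the "4 cents" figure comes from.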
My suspicion is that OP assumed that the source material was accurately tuned to A=440, which is not a safe assumption, but is probably true for any source material that has a keyboard instrument, which will almost always be tuned to A=440. Calculate the reference pitch for the source material, and you can tell how much the speed of the turntable is off. (And as others have pointed out, this approach may be completely buggered by common mastering practices, and by Original Instrument recordings of classical music using pitch references other than A=440.)
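The reference-pitch idea boils down to a ratio, since playback speed scales every frequency by the same factor. A sketch under the assumption that the recording really was tuned to A=440; the measured value of 441.32 Hz is a made-up example, not real data, and the function names are hypothetical.

```python
def speed_error_percent(measured_a_hz: float, reference_a_hz: float = 440.0) -> float:
    """Turntable speed error in percent, assuming the source was tuned to reference_a_hz."""
    return (measured_a_hz / reference_a_hz - 1.0) * 100.0

def actual_rpm(nominal_rpm: float, measured_a_hz: float, reference_a_hz: float = 440.0) -> float:
    """Platter speed implied by the measured reference pitch."""
    return nominal_rpm * measured_a_hz / reference_a_hz

measured = 441.32   # hypothetical measurement of the recording's A
print(f"speed error: {speed_error_percent(measured):+.2f}%")
print(f"implied speed: {actual_rpm(100.0 / 3.0, measured):.2f} RPM")
```

If the recording was actually tuned to something other than 440, the same arithmetic silently reports a speed error that isn't there, which is the caveat raised above.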
But it doesn't seem implausible that you could use analysis of wow in the source signal too.