For an article containing a lot of "well, if you knew signal processing..." there are two fairly major oversights:
1) Any well-designed system is going to have headroom. Period. Even if 48kHz can theoretically capture every frequency humans can hear, it's always good to have a little wiggle room. This comes into play even more in interactive situations: humans are particularly sensitive to jitter. Having an "overkill" sample rate lets you seamlessly sync things without anyone noticing.
2) 192kHz comes with an additional benefit besides higher frequencies: it also means more granular timing for the start and stop of transients. More accurate reverb would be the obvious example. I don't know if the human ear can discern the difference between 0.03ms and 0.005ms but it's something I don't see mentioned often.
2) increased sampling rate does not improve timing. This also has been researched in detail (because it sounds like it could possibly be true given that the ears can phase match to much greater granularity than the sample clock). It was found false in practice, and in retrospect, the sampling theorem explains why. The Griesinger link discusses this with illustrations, and provides a bibliography.
48kHz already has enough 'wiggle room'. How many people do you personally know that can hear a 24kHz sine tone?
> more granular timing for the start and stop of transients.
... it's something I don't see mentioned often.
Probably because it doesn't make sense. Human ears cannot hear frequencies above 24kHz and Nyquist tells us that 48kHz is enough to completely capture all the detail of a signal at that frequency and below.
> Having an "overkill" sample rate lets you seamlessly
> sync things without anyone noticing.
You can get the same theoretical benefit by oversampling on playback. And a lot of audio equipment does just that.
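For the curious, here's a minimal sketch of what "oversampling on playback" looks like in software, using numpy/scipy; the 1kHz test tone and the 4x factor are just my own illustrative choices, not what any particular piece of equipment does:

```python
# Upsample a 48 kHz signal to 192 kHz with a polyphase filter. The upsampled
# stream has four times as many points per second but carries no new
# information: its spectrum below 24 kHz matches the original's.
import numpy as np
from scipy.signal import resample_poly

fs = 48_000
t = np.arange(fs) / fs                  # 1 second of audio
x = np.sin(2 * np.pi * 1_000 * t)       # a 1 kHz tone, well below Nyquist

x_hi = resample_poly(x, 4, 1)           # 48 kHz -> 192 kHz
print(len(x), len(x_hi))                # 48000 192000
```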
> 192kHz ... also means more granular timing for the
> start and stop of transients.
Not really -- unless you're talking about glitch music, transients are unlikely to ever be so sudden that the difference between 0.03ms and 0.005ms could possibly matter.
I'm pretty sure that #2 isn't true; signal processing folks will be able to phrase this better than I can, but I think that if you have enough information to capture the waveform at a given frequency, you also have enough information to precisely place it in time - phasing errors are more likely due to quantization error, which is about bit depth, not sample rate. No?
This is completely incorrect, by Shannon (http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_samplin...). The sampling frequency determines the maximum frequency that can be captured, not the temporal resolution. That said, a transient containing higher frequencies will be sharper than a transient that doesn't, but its onset time resolution will not be determined at all by the sample rate.
Said another way, two band limited pulse signals with different onset times, no matter how arbitrarily close, will result in different sampled signals.
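A quick numerical sketch of that claim (the 20 kHz bandwidth, 44.1 kHz rate, and 1 microsecond offset are arbitrary picks, and an ideal sinc stands in for a band-limited impulse):

```python
# Two pulses band-limited to 20 kHz whose onsets differ by only 1 microsecond
# (well under the ~22.7 us sample period) still produce different samples.
import numpy as np

fs = 44_100
B = 20_000                               # pulse bandwidth
n = np.arange(-256, 256)                 # sample indices around the onset

def sampled_pulse(onset_seconds):
    t = n / fs - onset_seconds
    return np.sinc(2 * B * t)            # unit-peak band-limited impulse

a = sampled_pulse(0.0)
b = sampled_pulse(1e-6)
print(np.max(np.abs(a - b)))             # clearly nonzero
```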
> two band limited pulse signals with different onset times, no matter how arbitrarily close, will result in different sampled signals.
This is true, but different from what I am arguing. You're saying that a listener, over time, will be able to tell that the two signals differ. I am saying that, with the higher sample rate, a listener would be able to determine this within a fraction of a wavelength.
It's similar to dithering a high dynamic range signal onto a lower bit depth: at the lower rate, more than two samples are required as "evidence" of two different signals, while sampling at a high enough rate will tell you this almost instantly.
Again, I don't know if human ears are able to detect this, just that I haven't seen it addressed in these discussions.
As a thought experiment, let's consider a pulse that has been band-limited to 20kHz. Are you arguing that the analog output of a (filtered, idealized) DAC would look different depending on whether the dac was running at 44.1kHz vs 192kHz? If so, I don't think many people would agree with you.
Any difference in the "timing" of the output wave would have to come from energy above the Nyquist frequency of the slower sample rate. So, while I agree with you that the timing would be sharper, this is exactly caused by "higher frequencies", not by some other sort of timing improvement.
> Are you arguing that the analog output of a (filtered, idealized) DAC would look different depending on whether the dac was running at 44.1kHz vs 192kHz?
No. I'm arguing this: take a 44.1kHz signal and upsample it to 192kHz. It's the same signal, same bandwidth and everything. Duplicate the stream and add a 1-sample delay to one of the channels. When you hit play, that delay would be there. If you downsampled the streams back to 44.1kHz after applying the delay to one of the channels, you would hear almost the same thing. The difference is that you could not detect the difference between the signals until after a few samples. With the 192kHz stream it would be unambiguous after 2.
Remember, Nyquist-Shannon holds if you have an infinite number of samples. If your ears could look into the future then what you say is perfectly correct, but they need time to collect enough samples to identify any timing discrepancies.
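For anyone who wants to try the experiment described above, here is a rough sketch (the 1 kHz tone and the buffer length are my own arbitrary choices):

```python
# Upsample 44.1 kHz audio to 192 kHz, delay one copy by a single 192 kHz
# sample (~5.2 us), then bring both copies back down to 44.1 kHz and compare.
import numpy as np
from scipy.signal import resample_poly

fs = 44_100
t = np.arange(fs // 10) / fs                 # 100 ms of audio
x = np.sin(2 * np.pi * 1_000 * t)            # 1 kHz tone

up = resample_poly(x, 640, 147)              # 44.1 kHz -> 192 kHz
delayed = np.concatenate(([0.0], up[:-1]))   # one-sample delay at 192 kHz

back = resample_poly(delayed, 147, 640)      # both back to 44.1 kHz
ref = resample_poly(up, 147, 640)

# The ~5.2 us delay is smaller than the 44.1 kHz sample period, yet it shows
# up right away as a per-sample difference (a fractional-delay phase shift).
print(np.max(np.abs(back - ref)))
```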
I think what jaylevitt is referring to is that there is interpolation going on in the DAC. That could mean (I'm no DAC expert, so not sure) that the DAC could place the start points (of a transient, e.g.) more finely than the sample rate would seem to allow.
But the question for me is how exact that guessing is.
Correct me if I'm wrong, but that interpolation happens twice: when recording by the ADC and on playback by the DAC.
So a lot of this whole discussion (yeah, finally something about acoustics :) depends on how accurately interpolation works in ADCs and DACs.
This is the core secret of the sampling theorem. It says that if you have signals of a particular type (bandlimited) you can do a certain kind of interpolation and recover the original exactly. This is no more surprising than the fact that you can recover the coefficients of a degree-N polynomial from any N+1 points on it, though the computation is easier.
It turns out that if you reproduce a digital signal using stair steps you get an infinite number of harmonics, but _all_ of them are above the Nyquist frequency. The frequencies below Nyquist are undisturbed. Apply a lowpass filter to remove these harmonics (after all, we said at the start that the signal was bandlimited) and you get the original back unmolested.
Because analog filters are kinda sucky (and because converters with high bit depth aren't very linear), modern ADCs and DACs are oversampling: they internally resample the signal to a few MHz and apply those reconstruction filters digitally with stupidly high precision. Then they only need a very simple analog filter to cope with their much higher frequency sampling.
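As a toy illustration of that digital reconstruction step (the 3 kHz tone, the 4x factor, and the 255-tap filter are arbitrary; real converters use far higher factors and better filters):

```python
# Zero-stuff 48 kHz samples onto a 4x grid, then lowpass at the original
# Nyquist. This is the digital half of what an oversampling DAC does; only a
# very gentle analog filter is needed after it.
import numpy as np
from scipy.signal import firwin, lfilter

fs, L = 48_000, 4
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 3_000 * t)            # 3 kHz tone

stuffed = np.zeros(len(x) * L)
stuffed[::L] = x                             # spectral images now sit above 24 kHz

h = firwin(255, cutoff=fs / 2, fs=fs * L)    # digital reconstruction lowpass
y = L * lfilter(h, 1.0, stuffed)             # gain of L makes up for the zeros
# y is the same 3 kHz tone, now on a 192 kHz grid.
```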
But at a given sample rate, if I'm sampling at bit depth 2, doesn't that quantization error end up temporally shifting the sine wave I'm reconstructing?
It's not the timing differences, it's the phase differences. The ear is exceptionally sensitive to phase differences between the ears below 1kHz. This information is captured exactly (to well beyond the naive precision of the sampling clock) for any frequency below Nyquist.
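A small sketch of that point, with a made-up 10 us interaural delay on a 500 Hz tone (both numbers are my own picks; 10 us is about half of one 48 kHz sample period):

```python
# The delay between the two channels is well under one sample period, yet it
# is recovered exactly from the sampled phase difference at 500 Hz.
import numpy as np

fs, f, itd = 48_000, 500.0, 10e-6
t = np.arange(fs) / fs                       # 1 second per channel -> 1 Hz FFT bins
left = np.sin(2 * np.pi * f * t)
right = np.sin(2 * np.pi * f * (t - itd))    # same tone, 10 us later

phase_l = np.angle(np.fft.rfft(left)[int(f)])
phase_r = np.angle(np.fft.rfft(right)[int(f)])

recovered = (phase_l - phase_r) / (2 * np.pi * f)
print(recovered * 1e6)                       # ~10 microseconds
```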
0.03ms corresponds to 33kHz - you can't, no matter how much you want to, get timing granularity finer than one cycle of the highest frequency you are using. 0.005ms is 200kHz, BTW.
This isn't true. Sample a bandlimited impulse: the exact timing is encoded in the Gibbs oscillations of the signal. So long as you have a high enough SNR you can have timing as precise as you want. (And because the ear doesn't work with ultrasonics -- it is itself bandlimited -- it uses the same phenomenon for timing.)
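A quick sketch of that as well (the fractional onset position and the 64x interpolation factor are purely illustrative):

```python
# A band-limited impulse centred at a non-integer sample position. Its
# sub-sample timing is recoverable from the sampled ripples alone.
import numpy as np
from scipy.signal import resample_poly

true_onset = 100.37                      # impulse centre, in samples (non-integer)
n = np.arange(512)
x = np.sinc(n - true_onset)              # impulse band-limited to half the sample rate

up = resample_poly(x, 64, 1)             # interpolate onto a 64x finer grid
recovered = np.argmax(up) / 64.0
print(true_onset, recovered)             # agree to a small fraction of a sample
```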
Humans are sensitive to jitter, but jitter isn't a major problem with modern digital electronics and reclocking strategies. This ArsT thread hashed out these issues a couple of months ago: http://arstechnica.com/civis/viewtopic.php?f=6&t=1164451...