What's Going On Up There? Letters part 2

Transient information and ultrasonic spectra

Editor: In the penultimate paragraph of his article in the October 2000 Stereophile, John Atkinson wrote: "Something other than pure frequency response must be going on."

Well, yes, John, it is. That's because we don't hear music as a frequency-domain phenomenon, our recording/reproduction chains don't process it as one, and it isn't created as one. It certainly isn't a steady-state frequency-domain phenomenon, which is what would be required for the Fourier analysis that gives rise to "frequency response" to be even a good approximation.

Music is a highly dynamic, sharply transient time-domain phenomenon. What's important (to us as listeners) for those cymbals and drums is not their spectral content, but the sharpness of the rise in sound intensity as a function of time. Now, it is true that electrical engineers learn in college/university that such an intensity-vs-time signal can be transformed into an amplitude-vs-frequency graph giving its spectral content, using the Fourier Transform. But the Fourier Transform is truly correct only for continuous signals that have the same waveform repetitively over an infinite time period. It is only approximately true for signals that are "continuous and have the same waveform repetitively" over a time interval that is very long compared to the period of the lowest "frequency" (repetition) of the signal. Neither of these is true for those dynamic transient cymbal sounds, so the Fourier Transform does not accurately relate the "spectral content" of those cymbal sounds to their actual time-domain existence.
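To make the time/frequency relationship under discussion concrete, here is a minimal Python sketch. It is illustrative only and makes no claim about how the ear works. It builds two decaying bursts that differ only in how sharply they rise, takes a direct discrete Fourier transform of each, and reports how much of the computed spectral energy lies above 20kHz. The sample rate, decay time, attack times, and the names `burst` and `energy_fraction_above` are all assumptions chosen for the example, not values from the article.

```python
import cmath
import math

FS = 96_000   # assumed sample rate in Hz (illustrative)
N = 512       # number of samples analysed
TAU = 48      # assumed decay time constant in samples (0.5 ms at 96 kHz)

def burst(rise):
    """Envelope that ramps linearly to 1 over `rise` samples, then decays exponentially."""
    return [n / rise if n < rise else math.exp(-(n - rise) / TAU) for n in range(N)]

def energy_fraction_above(x, cutoff_hz):
    """Fraction of DFT energy in bins 0..N/2 that lies above cutoff_hz."""
    total = high = 0.0
    for k in range(N // 2 + 1):
        X = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        e = abs(X) ** 2
        total += e
        if k * FS / N > cutoff_hz:
            high += e
    return high / total

sharp = energy_fraction_above(burst(rise=1), 20_000)   # one-sample attack
soft = energy_fraction_above(burst(rise=48), 20_000)   # 0.5 ms attack
print(f"energy above 20 kHz: sharp attack {sharp:.4%}, soft attack {soft:.4%}")
```

Under these assumptions the sharper attack puts more of its computed energy above 20kHz than the softer one does. Whether that computed spectrum describes what a listener hears is exactly the point in dispute.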

What this means is that the human listener can hear the complete dynamic transient characteristics of those cymbal sounds, even though a frequency-domain analysis of human hearing suggests that the "spectral content" of those sounds includes frequencies that the human ear is unable to respond to—the human ear simply isn't hearing those sounds in the frequency domain!

That doesn't mean those cymbal sounds don't have a meaningful spectral content, or that JA didn't measure something real (assuming he used a measuring instrument that does an actual spectral scan, not one doing a Fast Fourier Transform of a time-domain signal). It just means that what he measured has nothing to do with how a human hears the music.

What JA measured does have to do with what is required for an audio system (and particularly a digitally sampled system) to reproduce those sounds correctly, however. His spectral-content analysis shows the flaws inherent in concluding that the necessary frequency bandwidth and sampling rates of audio systems can be determined simply by analyzing the frequency response of the human ear. Because the Fourier Transform isn't valid for those dynamic, transient musical sounds and resulting signals, the assumption simply isn't so. That's why even completely analog systems have always had to have frequency bandwidths much higher than the range of human hearing. It's also why the original CD sampling rate of 44.1kHz is too low for the proper reproduction of those transient sounds from drums, cymbals, etc.

There's nothing mysterious going on. All that is required is a recognition of the strict limitations of Fourier analysis, and its inapplicability to the capture and reproduction of transient, highly dynamic musical sounds.—Don Winter, Hermosa Beach, CA, donwinter@earthlink.net

Ultrasonic spectra and better sound quality

Editor: Why do high-sample-rate audio and upsampled audio both sound better than 44.1 or 48kHz data? It's actually quite simple. I'm convinced that human hearing above 20kHz is not responsible, or at least, if it occurs at all, is not the principal factor. This is borne out by the fact that upsampled 16-bit/44.1kHz data sounds almost as good as true 24/96. My hypothesis depends on the opposite conjecture, which is that humans are virtually deaf above 20kHz. Indeed, if this were not so, 44.1kHz data upsampled to 88.2kHz, 96kHz, or higher sample rates would sound terrible. This would be so because of the large image energy above 20kHz that upsampling leaves in place, as opposed to oversampling, in which a sharp digital filter removes all the images above 20kHz.
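The image energy referred to here can be illustrated with a short Python sketch. It assumes the simplest possible case: a 5kHz tone sampled at 44.1kHz is upsampled 2x by zero-stuffing with no interpolation filter at all. Real upsamplers apply some filtering, so this shows the mechanism rather than any particular product. The tone frequency, block length, and the helper `magnitude_at` are hypothetical choices made for the example.

```python
import cmath
import math

FS_IN = 44_100    # original sample rate (Hz)
N = 441           # 10 ms of audio, so DFT bins fall on exact 100 Hz steps
TONE_HZ = 5_000   # illustrative test tone

x = [math.cos(2 * math.pi * TONE_HZ * n / FS_IN) for n in range(N)]

# Assumption: 2x upsampling by zero-stuffing, with no interpolation filter.
y = []
for sample in x:
    y.extend([sample, 0.0])
FS_OUT = 2 * FS_IN

def magnitude_at(signal, freq_hz, fs):
    """Magnitude of the DFT bin nearest freq_hz."""
    n_total = len(signal)
    k = round(freq_hz * n_total / fs)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * n / n_total)
                   for n, s in enumerate(signal)))

tone = magnitude_at(y, TONE_HZ, FS_OUT)            # the original 5 kHz tone
image = magnitude_at(y, FS_IN - TONE_HZ, FS_OUT)   # its image at 39.1 kHz
gap = magnitude_at(y, 30_000, FS_OUT)              # a frequency between the two
print(f"5 kHz: {tone:.1f}   39.1 kHz image: {image:.1f}   30 kHz: {gap:.1e}")
```

With no filter, the image at 39.1kHz is as large as the tone itself, while the spectrum between them is empty. A sharp oversampling filter would remove that image.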

Many have conjectured that the superiority of upsampling is due to less time-smearing because there is no sharp anti-imaging filter. However, if time-smearing were a factor, then upsampling could not work. The reason is that there must be a decimation filter to downconvert a 96kHz master to 44.1kHz for distribution on the CD format. This decimation filter is necessarily sharp and introduces uncorrectable time-smearing, which precedes the upsampler.

I have had many e-mail exchanges with Mike Story of dCS, who was mentioned in John Atkinson's article on the subject in the October issue. It seems that Story is stuck on the time-smearing theory. As is well known, the time-smearing introduced by most speakers is many orders of magnitude larger than the time-smearing caused by digital filters. Yet Mr. Story and many researchers ignore this fact, and are quite content with non-time-coherent speakers for their listening tests.

The explanation for the superior sound quality of upsampling is not related to time-smearing. It is simply that the large amount of ultrasonic image energy presented to the output DAC serves to dither it in a way that does not degrade the overall signal/noise ratio. I call this "signal-dependent dither." When there is no signal there are no ultrasonic images, and thus no dither. With a large signal the ultrasonic images are also large, and so the dither is large.
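The general dithering mechanism this hypothesis appeals to can be sketched in a few lines of Python. This is only an illustration of how an added signal lets a quantizer resolve levels smaller than one step. It does not test whether ultrasonic images behave this way in a real DAC, and it uses a fixed-amplitude dither rather than one that scales with the signal. The input level, the sawtooth dither, and the `quantize` helper are assumptions made for the example.

```python
import math

def quantize(v):
    """Ideal quantizer with a step of 1 LSB (rounds half up)."""
    return math.floor(v + 0.5)

LEVEL = 0.3   # a steady input of 0.3 LSB, below the quantizer's resolution
M = 1000      # number of output samples averaged

# Hypothetical dither: a sawtooth sweeping evenly over +/-0.5 LSB.
dither = [(i + 0.5) / M - 0.5 for i in range(M)]

undithered = sum(quantize(LEVEL) for _ in range(M)) / M
dithered = sum(quantize(LEVEL + d) for d in dither) / M
print(f"average output without dither: {undithered}, with dither: {dithered}")
```

Without dither the 0.3 LSB input always quantizes to zero and is lost. With the dither added, the averaged output recovers the input level, which is the sense in which dither linearizes a quantizer.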

This explains why upsampling sounds good even with inferior DACs. They are super-linearized by the signal-dependent dither generated by upsampling. The ear acts as the final analog filter to filter out the ultrasonic energy.

Even straight 24/96 provides signal-dependent dither. As JA showed in his article, there is a lot of ultrasonic energy captured on high-sample-rate recordings. But we cannot hear it. It serves only to dither the output DAC and turn it into a nearly analog-like device by dithering away its differential nonlinearity.

The test of any hypothesis is its ability to explain a wide range of empirical data. I believe that mine is the only one out there able to meet that test.—Douglas Rife, DRA Labs, dra@gte.net