The Law of Averages Page 2
For readers not au fait with the conventional downsampling (decimation) process and the concept of aliasing, a few words of explanation are due at this point.
In order for digital encoding to capture a signal's content accurately, the signal must not contain any frequency component above half the sampling rate—the famous Nyquist Criterion. When a continuous signal is converted into a discrete (ie, digital) equivalent within an analog-to-digital converter, this condition is ensured by first passing the signal through a low-pass filter that removes signal components above half the sampling rate. If this is not done, then above-Nyquist components will cause aliasing distortion, so named because they are misidentified as frequencies below the Nyquist limit.
Take, as an example, a slowly swept sinewave, and let's assume the sampling rate is CD's 44.1kHz. As the signal frequency rises toward half the sampling rate, 22.05kHz, all is well—it will be correctly represented in the digital encoding. But as the signal frequency rises above 22.05kHz (we are considering what happens without input filtering, remember), an odd thing happens. When the Nyquist limit is busted, the encoded signal frequency appears to bounce off a glass ceiling at half the sampling frequency, so while the actual input frequency continues to rise, the encoded frequency now falls. A 30kHz signal is encoded as 14.1kHz (44,100-30,000). As the input passes through 44.1kHz, something similar happens: the encoded frequency now bounces off the 0Hz limit and begins rising again. A 58.2kHz signal is again encoded as 14.1kHz (58,200-44,100). And so on and so forth.
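The "bouncing" described above can be put into a few lines of code. This is only a sketch for an ideal sampler with no input filter; the function name `alias_frequency` is my own label, not anything from a converter datasheet.

```python
def alias_frequency(f_in_hz, fs_hz):
    """Frequency at which an unfiltered input tone appears after sampling."""
    nyquist = fs_hz / 2
    f = f_in_hz % fs_hz                      # the folding pattern repeats every fs
    return f if f <= nyquist else fs_hz - f  # reflect off the Nyquist "ceiling"


# The examples from the text, at CD's 44.1kHz sampling rate
for f_in in (10_000, 30_000, 58_200):
    print(f_in, "Hz is encoded as", alias_frequency(f_in, 44_100), "Hz")
```

The 10kHz tone comes through unchanged, while both the 30kHz and 58.2kHz inputs land on 14.1kHz, as in the worked figures above.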
This aliasing behavior can give rise to notably unpleasant distortion on music program, so appropriate input filtering is usually applied to prevent it. (Having said which, if you study the filter responses of most audio ADCs, you will discover that the attenuation provided by the input filtering is typically only 10 or 20dB down by the Nyquist frequency. So a little aliasing is actually permitted. In the case of 44.1kHz sampling, this premeditated infringement is useful because it prevents the filter curtailing the response below 20kHz. Provided the filter achieves high attenuation by 24.1kHz, aliasing products will be kept out of the nominal audio passband below 20kHz.)
When downsampling a digital signal from a higher sampling rate to a lower one, exactly the same requirement applies: to prevent signal components above half the new sampling frequency being aliased, they need to be removed by low-pass filtering. So the textbook block diagram of the decimation process always shows it being achieved in two stages: first the digital signal is low-pass filtered, then the sampling frequency is adjusted to the new value. In the case of downsampling from 176.4kHz to 44.1kHz, the latter simply involves retaining every fourth sample and trashing the remainder.
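For anyone who wants to see the textbook two-stage process in code, here is a minimal sketch. The Hann-windowed sinc filter, its 101 taps, and the 20kHz cutoff are illustrative assumptions of mine, not the design used by any particular editor or converter. The filter output is computed only at the samples that will be retained, which gives the same result as filtering everything and then discarding three samples in four.

```python
import math


def lowpass_taps(cutoff_hz, fs_hz, num_taps=101):
    """Hann-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz                 # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        sinc = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [h / total for h in taps]


def decimate(samples, factor, taps):
    """Low-pass filter, then keep every `factor`-th output sample."""
    last_start = len(samples) - len(taps)
    return [
        sum(h * samples[start + k] for k, h in enumerate(taps))
        for start in range(0, last_start + 1, factor)
    ]


FS_IN = 176_400
taps = lowpass_taps(20_000, FS_IN)


def peak_after_decimation(freq_hz):
    tone = [math.sin(2 * math.pi * freq_hz * n / FS_IN) for n in range(4000)]
    return max(abs(v) for v in decimate(tone, 4, taps))


peak_1k = peak_after_decimation(1_000)     # in-band tone should pass
peak_30k = peak_after_decimation(30_000)   # above-Nyquist tone should be removed
print(peak_1k, peak_30k)
```

In this sketch the 1kHz tone survives at essentially full level, while the 30kHz tone, which would otherwise alias to 14.1kHz, is suppressed before the sample rate is reduced.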
As Tony Faulkner has described, the downsampling method he hit upon serendipitously works rather differently. No low-pass filtering is applied, at least not ostensibly. But rather than simply extracting one in every four samples to construct the downsampled signal—which results in obvious aliasing—he averages each block of four adjacent samples.
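The difference between the two approaches is easy to show in code. What follows is a minimal sketch of block averaging as described in the paragraph above, not a copy of anyone's production software; the function names, and the choice to drop any incomplete block at the end of the file, are my assumptions.

```python
def pick_every_nth(samples, factor=4):
    """Naive decimation: keep one sample in every `factor`, discard the rest."""
    return list(samples[::factor])


def faulkner_downsample(samples, factor=4):
    """Replace each block of `factor` adjacent samples with its average."""
    usable = len(samples) - len(samples) % factor   # ignore a trailing partial block
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(pick_every_nth(data))        # [1, 5]
print(faulkner_downsample(data))   # [2.5, 6.5]
```

Use `factor=4` for 176.4kHz sources and `factor=2` for 88.2kHz sources when the target is 44.1kHz.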
Give this averaging process a little thought and you'll appreciate that it amounts to a filtering of sorts. Imagine what occurs at a frequency one quarter the input sampling rate (44.1kHz for 176.4kHz sampling). Each cycle is sampled exactly four times, the samples symmetrically disposed so that each has a counterpart of equal amplitude but opposite polarity in the other half-cycle. Averaging the four samples therefore gives a result of zero (forgetting, for convenience, any small departures resulting from dither). Likewise, at half the input sampling rate (88.2kHz for 176.4kHz sampling) there will be two equal-amplitude, opposite-polarity samples per cycle, which again will cancel when averaged.
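The cancellation is simple to check numerically. This sketch averages one block of four samples of a sinewave; the arbitrary starting phase of 0.3 radians is my choice, included only to show that the result does not depend on where in the cycle the block begins.

```python
import math


def block_average_of_tone(freq_hz, fs_hz, phase=0.3, block=4):
    """Average of `block` consecutive samples of a unit-amplitude sinewave."""
    samples = [
        math.sin(2 * math.pi * freq_hz * n / fs_hz + phase)
        for n in range(block)
    ]
    return sum(samples) / block


FS = 176_400
quarter_rate = block_average_of_tone(FS / 4, FS)   # 44.1kHz: four samples per cycle
half_rate = block_average_of_tone(FS / 2, FS)      # 88.2kHz: two samples per cycle
low_tone = block_average_of_tone(1_000, FS)        # well within the audio band
print(quarter_rate, half_rate, low_tone)
```

The first two averages come out at zero to within floating-point rounding, while the 1kHz tone passes through almost untouched.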
This begins to sound something like a comb filter, which indeed it is. If you plot the frequency response of the 176.4kHz four-sample averaging filter, it looks like fig.1, while that for the 88.2kHz two-sample equivalent looks like fig.2. In the first instance, the in-band response is 0.7dB down at 10kHz and 2.9dB down by 20kHz; in the second, the figures are 0.6dB and 2.4dB at the same frequencies—not unlike what we're used to seeing from Wadia CD players.
Fig. 1 Faulkner "averaging" of 176.4kHz-sampled data comb-filters the input signal. (20dB/vertical div.)
Fig. 2 Faulkner "averaging" of 88.2kHz-sampled data has a similar filtering effect. (20dB/vertical div.)
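The in-band droop figures can be checked by treating the averager as an FIR filter with equal coefficients and evaluating its response directly. This is a sketch under that assumption; the function name and the threshold used to report a null as minus infinity are mine.

```python
import cmath
import math


def averaging_response_db(freq_hz, fs_hz, block):
    """Magnitude response, in dB, of a `block`-sample averager at freq_hz."""
    h = sum(
        cmath.exp(-2j * math.pi * freq_hz * k / fs_hz) for k in range(block)
    ) / block
    magnitude = abs(h)
    # Report exact nulls as -inf rather than a rounding-dependent large number
    return -math.inf if magnitude < 1e-12 else 20 * math.log10(magnitude)


for fs, block in ((176_400, 4), (88_200, 2)):
    for f in (10_000, 20_000):
        print(fs, block, f, round(averaging_response_db(f, fs, block), 2))
```

The results agree with the figures quoted above to within about a tenth of a decibel, and the function returns minus infinity at the comb's nulls (44.1kHz and 88.2kHz for the four-sample case).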
Clearly, this filtering is much more gentle than that of a typical "brick wall" filter (fig.3). More significant, perhaps, it also has a clean impulse response, unlike that of the brick-wall item (fig.4), which shows the characteristic pre- and post-ringing of an abrupt linear phase filter. But manifestly, the averaging filter will not prevent aliasing. So how come the end result—the 44.1kHz file generated using the Faulkner method—doesn't make your hair stand on end and your teeth grind?
Fig. 3 Conventional "brick wall" low-pass filtering of 176.4kHz-sampled data applied prior to downsampling to 44.1kHz. (20dB/vertical div.)
Fig. 4 Conventional "brick-wall" low-pass filter, impulse response.
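To see how little protection the averaging filter offers against aliasing, consider a full-scale 30kHz test tone sampled at 176.4kHz. The sketch below block-averages it down to 44.1kHz and then measures what turns up at 14.1kHz using a single-bin DFT. The test tone, the helper names, and the block length of 1470 output samples (chosen so that the alias completes a whole number of cycles) are illustrative assumptions of mine.

```python
import cmath
import math

FS_IN = 176_400
FACTOR = 4
FS_OUT = FS_IN // FACTOR   # 44,100Hz


def tone(freq_hz, fs_hz, count):
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]


def block_average(samples, factor):
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


def tone_amplitude(samples, freq_hz, fs_hz):
    """Single-bin DFT estimate of the amplitude of a component at freq_hz."""
    acc = sum(
        s * cmath.exp(-2j * math.pi * freq_hz * m / fs_hz)
        for m, s in enumerate(samples)
    )
    return 2 * abs(acc) / len(samples)


downsampled = block_average(tone(30_000, FS_IN, FACTOR * 1470), FACTOR)
alias_level = tone_amplitude(downsampled, 14_100, FS_OUT)
print(round(20 * math.log10(alias_level), 1), "dB relative to the input tone")
```

The alias arrives at 14.1kHz only a little under 8dB below the level of the original tone, which is why the method depends on the source having very little energy above 22.05kHz in the first place.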
The answer lies in the spectrum of the input signal. Tony provided me with WAV-file copies of two excerpts from 176.4kHz masters he's downsampled for CD using the averaging technique. One was a five-minute section from Tchaikovsky's Eugene Onegin (Owain Arwel Hughes, Royal Philharmonic Orchestra), the other a longer excerpt from the aforementioned Symphonie Fantastique. Trawling through these using real-time FFT, the worst spectrum I could find—in the sense of threatening the severest aliasing—was that shown in fig.5. As you can see, the frequency content is already about 35dB down by 22.05kHz relative to 1kHz, and dives into the noise floor before 50kHz. So even in this worst case, the level of aliasing distortion "folded back" into the passband will be relatively low—perhaps low enough to be masked by the genuine in-band signal components.
Fig. 5 Spectrum, 1kHz-100kHz, of a Faulkner classical orchestral music recording, sampled at 176.4kHz. (20dB/vertical div.)
To check this out, I first wrote a piece of software to downsample Wave files using the Faulkner method, then enlisted the help of Syntrillium's Cool Edit Pro software editor (CEP) [now Adobe Audition—Ed.]. Using its Scientific Filter option, first I high-pass-filtered one of the 176.4kHz masters using an 18th-order Chebyshev alignment set at a corner frequency of 22kHz. Next I downsampled this to 44.1kHz using the software I'd written, then upsampled it back to 176.4kHz using CEP. What this palaver created was a 176.4kHz file containing the aliasing distortion generated by the Faulkner downsampling process. I also used CEP to downsample the 176.4kHz original to 44.1kHz the conventional way, which allowed me to assemble the following tracks for burning as a DVD-A to DVD-R (using Minnetonka Audio Software's discWelder Chrome authoring package):
1) the 24-bit/176.4kHz original
2) the 24-bit/44.1kHz Faulkner downsampled version
3) the 24-bit/44.1kHz CEP downsampled version
4) the left channel of the 24/176.4 original with its Faulkner aliasing distortion on the right channel
5) the right channel of the 24/176.4 original with its Faulkner aliasing distortion on the left channel.
(This is one of the unsung benefits of DVD-Audio for inveterate tinkerers like me: You can record files of different sampling rates to the same disc and replay it on the same player, thereby eliminating many of the variables that would otherwise afflict such an exercise.)
When I played the resulting disc using an Arcam DV89 DVD player, the results unequivocally justified Tony Faulkner's faith in his downsampling method when used with this type of source material. Although the Faulkner-downsampled track wasn't as vibrant and airy as the 176.4kHz original, there was no question it outshone the conventionally downsampled equivalent, which had a pervasive grayness to it and was obviously "pinched," both spatially and dynamically. Listening to the last two tracks confirmed that aliasing in the Faulkner downsampled file was at a low, probably inaudible level. When replaying these tracks, even at realistic levels, I had to put my ear to within a couple of inches of the tweeter on the aliasing distortion channel to hear anything from that side—an occasional ffff, ffff, ffff modulation of the noise level, synchronized with the music's pulse.
If anyone would like to try the Faulkner downsampling method for themselves, I've posted FaulknerDownsample.zip on the freeware page of my website, where you will also find other software I've written in the context of articles published elsewhere. I appreciate that few of you will have access to 88.2kHz or 176.4kHz masters, but those who do may like to point the software at some of them and report their findings in "Letters." (Please note that the software may not be used for commercial purposes.)
If, having tried it, you agree that a little aliasing is indeed sonically preferable to steep low-pass filtering, the question forming on your lips will be: Why? Explanations have been offered that relate to the ringing of high-rate filters and the "energy smear" they cause (this is the view Tony Faulkner subscribes to), but I'm not utterly convinced of this, for reasons I may regale you with another time. (Note that I'm not talking here of the discrete pre- and post-echoes that afflict filters with passband ripple—that's a different matter.) The test of this idea is, or ought to be: Has anyone demonstrated this putative energy smear in a music-like signal subject to steep low-pass filtering? Not that I'm aware of, although I'm sure you'll let me know if you reckon otherwise. (Hint: Impulse testing doesn't qualify.)
If energy smear isn't the answer, then I have one other possibility up my sleeve, but there it stays until I've had longer to establish whether I believe it myself. Sorry to be a tease. In the meantime, we don't have to know why to appreciate that this is another means of wringing the best from CD. It will never work with source material having energetic high-frequency content (rock cymbals, for example), where the aliasing would be unacceptable, but for a range of other musical forms it could be just the ticket. I wonder if anyone else has the guts to do as Tony Faulkner has done: stand digital etiquette on its ear in the cause of improved sound.