Give CD a Chance

According to a recent newsletter sent to its regular contributors, our "competition"—The Absolute Sound—sees "controversy and confrontation" as the core of its editorial policy. By contrast, Stereophile sees as its modus schtickus an unflagging devotion to, and pursuit of, truth, reason, all of the eternal verities (including some you never heard of), and the intelligent exchange of informed ideas. In honor of all of the above-mentioned precepts (as well as some I didn't mention), this issue of Stereophile is largely devoted to the confrontation between knowledgeable writers for whom the widely proclaimed perfection of the Compact Disc remains a controversial issue.

Every technological advance in sound reproduction has been hailed as "unmusical," "unnatural," and "contrary to God's law." The first electrical recordings were condemned (by those who cared about sound at all) as "shrill," "steely" (footnote 1) and "unmusical." The first stereo discs were castigated by most sonically-aware critics on precisely the same grounds, except that two new cavil criteria had been added: inner-groove distortion and mistracking. Could we really have expected CD to be greeted with any less skepticism?

Several of digital's critics point out that PCM has unleashed new and unfamiliar forms of distortion on reproduced sound. They then proceed to explain these in terms of PCM's sampling rate (too low) and 16-bit encoding (not enough bits). JA explains elsewhere in this issue why the CD's 44.1kHz sampling rate is not (in theory) the disaster that CD's critics claim it to be. To that I will now add the reasons why I do not feel 16-bit encoding to be a liability either.

The number of bits (BInary DigITS) comprising each parcel of sampled information (Word) determines how many different numerical quantities can be expressed. One bit has only two states: 0=Zero (Off), 1=One (On), so it can express only two amplitude levels. Two bits can be used to express 4 values: 00=Zero, 01=One, 10=Two, and 11=Three. 16 bits, as used in most of today's PCM systems, including CD, allow us to encode 2 to the 16th power (65,536) amplitude levels. The question is, is that enough for music reproduction?
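As a quick check of the counting above, here is a minimal Python sketch. The function name `amplitude_levels` is invented for illustration; the only thing it relies on is that a word of n bits can take 2 to the nth power distinct values.

```python
def amplitude_levels(bits: int) -> int:
    """Number of distinct values a word of `bits` binary digits can express."""
    return 2 ** bits

# The word lengths mentioned in the text: 1 bit, 2 bits, and CD's 16 bits.
for bits in (1, 2, 16):
    print(f"{bits:2d} bits -> {amplitude_levels(bits):,} levels")

# The four 2-bit codes listed above, written out as binary strings.
two_bit_codes = [format(value, "02b") for value in range(amplitude_levels(2))]
print(two_bit_codes)
```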

Only a madman would care to listen to an audio system at a level of more than 115dB, which is only 5dB below the level required to produce actual physical pain in the ears. Most audiophiles, even when not constrained by considerations of neighborliness, rarely listen at levels higher than 105dB (even a loud soundtrack explosion in a well-equipped movie theater with Dolby sound rarely exceeds 105dB). No sound at all is, of course, considered to be 0dB, and it is almost impossible to find such a quiet environment. Even the best sound-isolated anechoic chamber may have a noise floor of 5dB, and a concert hall's ambient noise is rarely less than 25dB. But let's assume, just as a worst-possible case, that the recording venue had an ambient noise floor of 20dB, that we can hear sounds 15dB below that noise floor, down to a level of 5dB (which we can), and that we're going to say to Hell with the neighbors and listen at peak levels of 115dB. The dynamic range we will need then is 110dB, and we will usually need much less than that.

With so-called linear encoding, those 65,535 recordable volume increments are all of equal size, and if the encoding system were capable of recording a dynamic range of 110dB, each level step would have a magnitude of less than 0.01dB. Since not even the most golden-eared perfectionists claim to be able to hear a change of much less than 0.1dB, it is clear that 16 bits are more than we need to provide what sounds like a continuous (analog-type) change of signal level.
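The arithmetic behind those two figures can be sketched in a few lines of Python. This follows the simplification used above, spreading the 110dB range evenly over the 65,535 increments; it is a back-of-envelope average, not a model of how a real converter spaces its levels. The constant names are invented for illustration.

```python
PEAK_LISTENING_DB = 115    # loudest playback level considered above
QUIETEST_AUDIBLE_DB = 5    # 15dB below the assumed 20dB venue noise floor
STEP_COUNT = 2 ** 16 - 1   # increments between 65,536 levels

# Dynamic range needed in the worst case described above.
required_range_db = PEAK_LISTENING_DB - QUIETEST_AUDIBLE_DB

# Average size of one increment if that range is divided evenly.
average_step_db = required_range_db / STEP_COUNT

print(f"required range: {required_range_db}dB")
print(f"average step:   {average_step_db:.5f}dB")
```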

But the quantization is not that precise. Only when the signal level falls precisely at a quantizing step point will it be encoded with perfect accuracy. If it lies about midway between two adjacent step points, the A/D converter can encode it either way: at the upper level or at the level below. Either way, the quantization will be inaccurate, by up to half the difference between those two quantizing levels.
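The half-step limit on the error is easy to see in a small Python sketch. The `quantize` helper and the sample values are hypothetical, chosen only to illustrate rounding to the nearest step; a real converter works on voltages, not tidy decimals.

```python
def quantize(sample: float, step: float) -> float:
    """Round `sample` to the nearest multiple of the quantizing `step`."""
    return round(sample / step) * step

step = 1.0
samples = [0.0, 0.25, 0.49, 0.75, 1.0, 2.4]

# Error between each sample and the level it is encoded as.
errors = [abs(s - quantize(s, step)) for s in samples]

for s, e in zip(samples, errors):
    print(f"sample {s:4.2f} -> level {quantize(s, step):3.1f}, error {e:.2f}")

# Samples that sit exactly on a step point have no error at all;
# the rest are off by no more than half a step.
print("largest error:", max(errors))
```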

The sum of such errors is called quantization noise or distortion and, if gross, can be heard as a hiss that fluctuates in accordance with the signal level (footnote 2). In a perfect 16-bit system, it occurs at 1/131,070 the level of the highest recordable signal. If you care to look that up in a decibel table (for power ratios), you'll see that it represents 98dB, which is also precisely the dynamic range which can be encoded by the CD system.
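For readers who would rather compute than consult a table, the 98dB figure agrees with the usual textbook estimate for an ideal quantizer, roughly 6.02dB per bit plus 1.76dB. The sketch below derives it under two assumptions that are not stated above: the reference signal is a full-scale sine wave, and the rounding error is spread uniformly over one step.

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Ratio of a full-scale sine to quantization noise, in dB (ideal case)."""
    # RMS of a sine whose peak-to-peak swing spans all 2**bits steps.
    signal_rms = (2 ** bits / 2) / math.sqrt(2)
    # RMS of an error uniformly distributed over one step.
    noise_rms = 1 / math.sqrt(12)
    return 20 * math.log10(signal_rms / noise_rms)

print(f"16 bits: {ideal_dynamic_range_db(16):.2f}dB")
print(f"rule of thumb: {6.02 * 16 + 1.76:.2f}dB")
```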

That's quite a bit shy of the 110dB that we figured we'll need for perfect music reproduction, but CD's promoters undoubtedly assumed that it was more than would be necessary for the mass-market system that CD was intended to be. Even if a recording actually had 98dB of dynamic range on it (which very few have), it was reasonably assumed that most people would never listen louder than 90dB, so they would never hear the quantization noise at the system's cutoff point. It would be 8dB below the 0dB threshold of normal hearing.
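The subtraction is simple enough to do in one's head, but a short Python sketch makes it easy to try other listening levels. It takes the 98dB figure at face value and treats 0dB as the hearing threshold, as above; the function name is invented for illustration, and the 105dB and 115dB peaks are the levels mentioned earlier.

```python
CD_DYNAMIC_RANGE_DB = 98  # figure quoted above

def noise_floor_re_threshold_db(peak_level_db: int) -> int:
    """Where the quantization floor lands relative to the 0dB threshold."""
    return peak_level_db - CD_DYNAMIC_RANGE_DB

for peak in (90, 105, 115):
    floor = noise_floor_re_threshold_db(peak)
    side = "below" if floor < 0 else "above"
    print(f"peaks at {peak}dB -> floor {abs(floor)}dB {side} threshold")
```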

But to an audiophile, 90dB is almost considered to be a high background-music level. And at 105dB on peaks, CD's modulation floor is 7dB above the hearing threshold, where it may or may not be masked by ambient noise in the listening room. At best, we may hear a rough quality to the weakest musical overtones; at worst, we'll hear an irritating hiss riding on the softest sounds. And we might also find that the hall reverb cuts off abruptly just above the point where it should fade to silence. In other words, it appears that the CD system can't meet the needs of the perfectionist. And we all know that the format standards for CD are so rigid they can't be modified to improve its performance, right?

No, wrong!

In analog recording on tape, the magnetic properties of the oxide particles behave in a very erratic fashion in response to a weak magnetic field. Some will change polarity, others won't, and the result is very poor tracking of low-level signal amplitude changes, resulting in gross distortion of moderate-level signals and a total loss of the quietest ones. This problem was solved by mixing an ultrasonic "bias" (70-200 kHz) signal in with the audio signal, which keeps the magnetism on the tape alternating continuously in polarity at a high enough level that the particles' residual magnetism is held above the nonlinear region. Being ultrasonic, the bias is inaudible. The audio signal is simply superimposed on the bias, allowing low-distortion encoding of low-level material.

The cure for CD's modulation-floor limitation is something analogous to tape bias. Instead of an ultrasonic signal, a PCM system uses white noise, at a level of just a dB or so above the modulation floor. White noise, which sounds like a sibilant hiss, is a complex signal consisting of random frequencies at random amplitudes (footnote 3), and spanning the entire audio range or beyond. Covering the modulation floor, its random energy spikes add to the intensities of the lowest-level signals to permit them to be encoded in a linear fashion, although at intervals which are far enough apart so as not to make them, in playback, any louder than they were originally.

The subjective effect of this noise "biasing," more correctly termed "dithering," is dramatic. Not only does it eliminate quantizing-error noise at very low signal levels, it also extends the effective modulation floor by a full 15dB or so below what an undithered system can record! This bestows upon our "rigidly standardized" CD system a usable dynamic range of 113dB, which is 3dB more than the 110 we figured as the most extreme requirement.
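The mechanism can be illustrated with a toy simulation in Python. It is a sketch under stated assumptions, not a model of any real recorder: the dither here is uniform random noise one step wide (the text describes white noise at about a dB above the floor), the test tone has a peak of only 0.3 of a quantizing step, and averaging many dithered conversions stands in for the ear's integration over time. All names are hypothetical.

```python
import math
import random

def quantize(x: float) -> int:
    """Round to the nearest whole quantizing step."""
    return math.floor(x + 0.5)

def dithered_quantize(x: float, rng: random.Random) -> int:
    """Add noise spanning one step (assumed dither) before rounding."""
    return quantize(x + rng.uniform(-0.5, 0.5))

rng = random.Random(0)  # fixed seed so the run is repeatable
AMPLITUDE = 0.3         # peak level in steps, well below half a step
signal = [AMPLITUDE * math.sin(2 * math.pi * n / 16) for n in range(16)]

# Without dither every sample rounds to zero and the tone is lost.
undithered = [quantize(s) for s in signal]

# With dither each conversion is 0 or +/-1, but the long-run average
# of many conversions tracks the original sample value.
TRIALS = 20000
averaged = [
    sum(dithered_quantize(s, rng) for _ in range(TRIALS)) / TRIALS
    for s in signal
]
worst_error = max(abs(a - s) for a, s in zip(averaged, signal))

print("undithered:", undithered)
print("worst averaged error (steps):", round(worst_error, 4))
```

The undithered list is all zeros, while the averaged dithered values follow the sine wave closely. The price is the added noise itself, which is the trade described above.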

Finally, it must be acknowledged that, although dithering is now generally recognized as an important element in PCM recording, it is still not universally designed into recording systems. Few mastering recorders have dither "built in," although nearly all of them have it inadvertently, as a result of residual background noise in their audio input signal or circuitry. Practically all CDs, therefore, are dithered, by design or otherwise.

So, if neither sampling rate nor number of bits is sabotaging the CD, why do so many people dislike its sound? I think it's due to a number of things.

First, the whole idea of digital—the chopping up of music into little pieces, and reconstituting it like powdered orange juice—is offensive to some people. Others are offended by the idea of measuring time—which is the measure of music—as quanta rather than as a continuum. (Yet their "non-digital" wristwatch has an escapement which goes tick, tick, tick, in a most discontinuous fashion.)

Many of the complaints about CD sound are justified, however. We now know that a CD player's audio electronics and D/A conversion accuracy have a great effect on the sound. And as long as the sound of CD players continues to improve, we cannot truthfully say that we know, yet, what a CD really sounds like, or how bad or good "CD" is. And CD player refinement is obviously just the start of a long evolution of the kind that the system's detractors feared was impossible because of CD's rigid standardization.

Certainly, the audio input and A/D conversion circuitry of PCM recorders is long overdue for the kind of attention being lavished now on playback machines. What about DC-coupled audio circuits, isolated and regulated power supplies, and oversampled A/D converters in the machines on which CDs are mastered? What about the development of lower-distortion mixing consoles, and getting rid of all those signal processors, and using lower-distortion, smoother-response mikes in the recording studio and concert hall, as mentioned by James Boyk later in this issue?

What about giving CD the 28-year chance to prove itself that we gave the LP?

Footnote 1: The first use of this term that I know of dates back to 1914! Much of the language of "subjective" audio assessment predates the era of "perfectionist audio."—J. Gordon Holt

Footnote 2: The live Lorin Maazel Beethoven 5 recording, on CBS/Sony, has some of the most beautiful examples of audible quantizing distortion that one could wish to hear. You hear it as a granular fuzz that rides on the envelope of the instrumental sound. Buy this CD, if only to educate your ears to the new sound of digital.—John Atkinson

Footnote 3: Strictly speaking, so is "pink noise," but their spectral energy distributions are different. Pink noise has equal energy through each octave, white noise has equal energy at each frequency. And there are a lot more frequencies between 1000 and 10,000 Hz than between 100 and 1000.—J. Gordon Holt
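The distinction drawn in footnote 3 can be put in numbers with a short Python sketch. It idealizes white noise as having a flat energy density per Hz and pink noise as having a density proportional to 1/f; the function names and the unit scaling are assumptions made for illustration.

```python
import math

def white_energy(lo_hz: float, hi_hz: float) -> float:
    """Energy in a band for a flat density of 1 unit per Hz."""
    return hi_hz - lo_hz

def pink_energy(lo_hz: float, hi_hz: float) -> float:
    """Energy in a band for a density proportional to 1/f."""
    return math.log(hi_hz / lo_hz)

# White noise: the 1000-10,000 Hz decade holds ten times the energy
# of the 100-1000 Hz decade, because it is ten times as wide.
print(white_energy(100, 1000), white_energy(1000, 10000))

# Pink noise: equal energy in each decade, and in each octave.
print(pink_energy(100, 1000), pink_energy(1000, 10000))
print(pink_energy(100, 200), pink_energy(200, 400))
```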