A Question of Scale
To take an example at random from our own pages, if the improvement afforded by Armor All was "a revelation" to Sam Tellig, the Audio Anarchist, what language would he need to employ when describing the difference between the Esoteric P2 CD transport and a conventional CD player—an improvement that, to my ears, is ten times greater than the benefit provided by Armor All and green paint combined?
At Canada's National Research Council in Ottawa, Floyd Toole's speaker listening tests employ a variety of grading scales that culminate in a numerical ranking system. The sound of each loudspeaker is ranked in terms of several subjective parameters (brightness, bass extension, resolution of detail, depth imaging, and so on). For each parameter, listeners rate the speaker on a linear scale whose divisions are marked with adjectives (for example: very bright, moderately bright, slightly bright, neutral, slightly dull, moderately dull, very dull). Each rating scale can later be converted to numbers in order to enable computer-averaging of the ratings by many listeners, or to compare an individual listener's consistency of judgment from day to day. This is not a new idea; similar ranking schemes are widely used in psychoacoustic studies and other research involving sensory differences.
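The adjective-to-number conversion described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Toole's actual procedure; the numeric values assigned to the adjectives here are my own assumption.

```python
# Illustrative mapping from adjective ratings to numbers, so that
# many listeners' judgments can be computer-averaged. The specific
# scale values are assumed for the sake of the example.
BRIGHTNESS_SCALE = {
    "very bright": 3, "moderately bright": 2, "slightly bright": 1,
    "neutral": 0,
    "slightly dull": -1, "moderately dull": -2, "very dull": -3,
}

def average_rating(ratings):
    """Convert a group of listeners' adjective ratings to numbers
    and return their average."""
    numbers = [BRIGHTNESS_SCALE[r] for r in ratings]
    return sum(numbers) / len(numbers)

# Three listeners rate one speaker's brightness:
print(average_rating(["slightly bright", "neutral", "moderately bright"]))
```

Once the ratings are numbers, the same machinery serves the other purpose mentioned: comparing one listener's scores for the same speaker on different days.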
At the conclusion of every listening test, each listener assigns to each speaker a judgment of overall fidelity, using a numerical rating scale. The end points of the fidelity scale are calibrated thus: 10 is the sound of live music (remembered or imagined), while 0 is voice-grade sound quality like that delivered by a telephone or small transistor radio—just adequate for intelligible speech.
Toole has found that trained listeners with good hearing are able to produce remarkably consistent judgments. When I participated with several other US reviewers in a speaker-judging session at the NRC a few years ago, I was impressed by how well our judgments correlated with each other, despite our varied listening habits and musical preferences. We didn't all assign the same numbers; some used consistently high numbers, some low. But after adjustment for that scaling bias, we all ranked the differences in about the same way, agreeing as to which speakers ranked high, which low, why we ranked them that way, and how large the difference was.
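The "adjustment for that scaling bias" can be pictured with a minimal sketch: center each listener's scores on that listener's own mean, so that a habitually generous grader and a stingy one become directly comparable. Again, this is an assumed illustration of the principle, not the NRC's actual statistical method.

```python
# Sketch of removing scaling bias: subtract each listener's mean
# score from that listener's ratings. Two listeners who rank the
# speakers identically, but use different parts of the scale,
# produce identical adjusted scores.

def remove_bias(scores):
    """Center one listener's speaker scores on that listener's mean."""
    mean = sum(scores) / len(scores)
    return [round(s - mean, 2) for s in scores]

generous = [8.5, 8.0, 7.5]   # consistently high numbers
stingy   = [7.5, 7.0, 6.5]   # consistently low numbers, same ranking
print(remove_bias(generous))  # [0.5, 0.0, -0.5]
print(remove_bias(stingy))    # [0.5, 0.0, -0.5]
```

After the adjustment, what remains is exactly what the reviewers agreed on: which speakers ranked high, which low, and by how much.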
On Toole's 10-point scale good hi-fi speakers generally rank between about 7.0 and 8.5; thus the world of serious hi-fi occupies a span of only about 1.5 points, roughly one-sixth of the overall scoring range for sound. Anything below 7.0 falls into the mass-market mid-fi category and can safely be ignored by serious audiophiles, while the finest high-end speakers score around 8.5. The gap that remains between the top rating and 10 reflects the consensus that the best two-speaker stereo still doesn't sound quite like live music.
I think it's safe to assume that speakers good enough to qualify for inclusion in Stereophile's "Recommended Components" listing would occupy only the upper part of the "good" range, say from about 7.7 to 8.5, which amounts to a total span of 0.8. If this is true, each Class (A, B, C, D) in our list of Recommended loudspeakers would correspond to just 0.2 point on Toole's scale. I'm just guessing here, of course, but the numbers look reasonable. Toole has not endorsed any attempt to correlate his scale with judgments made elsewhere.
With this in mind, how would we score other differences, for example those between CD players or the (generally smaller) differences between amplifiers? To some extent, of course, we're comparing apples with oranges here. Differences between CD players are not only smaller in degree than those between speakers, but usually different in kind as well. But I think it's fair to make the attempt, in order to put into some sort of consistent perspective the subjective size of the differences we hear.
Speaking only for myself, I would guess that the largest differences I've heard between competent CD players (for example, between the Esoteric P2 transport and a low-cost Philips, or between the Adcom GCD-575 and my previous player) would correspond to about 0.1 point on the scale. I said earlier that the benefit of Armor All and green paint seemed only one-tenth as large as the improvement afforded by the P2. Therefore the combined effect of Armor All and green paint would make about 0.01 point of difference on Toole's 10-point scale of sound quality.
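The back-of-the-envelope arithmetic behind these estimates, using only the figures quoted in the text (all of which are admitted guesses, not measurements), can be checked in a few lines:

```python
# Checking the arithmetic in the text. Every input here is an
# estimate quoted in the article, not a measurement.

recommended_span = 8.5 - 7.7           # span of "Recommended" speakers
class_width = recommended_span / 4     # four Classes: A, B, C, D
cd_difference = 0.1                    # largest CD-player difference (est.)
tweak_difference = cd_difference / 10  # Armor All + green paint, one-tenth

print(round(class_width, 2))       # 0.2 point per Class
print(round(tweak_difference, 2))  # 0.01 point for the tweaks
```

So each Recommended-Components Class spans about 0.2 point, and the tweak lands around 0.01 point, consistent with the figures above.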
Don't get me wrong; I'm not seriously proposing that Stereophile reviewers should start using this (or any other) system of numerical rankings in their reviews. Nor do I mean to suggest that a subtle (0.01 point) improvement in sound is either trivial or unimportant. That judgment, as always, must be an individual one: what is important to me may be trivial to you, and vice versa. Moreover, a series of tweaks that are individually subtle can add up to an overall improvement in musicality that we all would find satisfying.
I do think that a numerical scale is a useful tool to keep in the back of our heads, as a way of reminding ourselves that we should distinguish small changes from large ones in our writing. If the best wine critics can do it, why not we? The world of the audiophile is like that of the oenophile: both involve expensive products that differ from each other (and from mass-market versions) in ways that only customers who have sharpened their tastes and perceptions can appreciate.
The craft of describing these differences is one that we must constantly strive to refine. We write for this magazine because we have a lifelong passion for music wonderfully reproduced. As lovers, we fall too easily into the trap of overpraising what we like and over-damning what we don't. When a wine critic complains that a particular '82 Chardonnay tastes like mouthwash, you probably shouldn't take him literally either.