The Blind leading the Deaf

As the person who "invented" subjective testing, I have followed with great interest the many articles in the mainstream audio press which purport to prove that none of us can really hear all the differences we claim to hear, particularly those between amplifiers. My reaction has usually been: "Why didn't they invite me to participate? I would have heard the differences under their double-blind listening conditions." I could make that assertion with supreme confidence because I had never been involved in any such test.

A few weeks ago, however, I was involved in such a test. It happened during Bob Carver's visit to Santa Fe, in response to our report on his M1.0t amplifier, which he claimed sounded identical to a well-known perfectionist tube amplifier. Following our extended auditioning of both amplifiers, John Atkinson and I had declared that they didn't sound the same, so Carver insisted that we prove we were hearing differences, a demand which JA and I felt to be completely unwarranted and unreasonable. (Doesn't he trust us?)

Both JA and I report in full on the results of these further tests on the Carver on p.117, but I was anticipating no problems whatsoever. After all, the differences I had heard at one point during my preliminary listening were great enough for me to describe them as "dramatic." Certainly, any "dramatic" differences would be immediately audible under any conditions of comparison.

Before the blind tests began, I had been listening "informally" for about an hour to both amps, but attributed the fact that I had not been hearing "dramatic" differences to the additions to the system of the lightweight wiring and the switch which Bob had rigged up to allow instant comparisons. I assumed that, when we went back to hard-wired conditions, those "dramatic" differences would re-emerge. To my surprise and chagrin, they didn't.

I was, in fact, hard put to hear any differences between the amplifiers. That I was able to rack up four successful calls out of five was, I believe, only because I was listening for the two most conspicuous sonic differences I had heard previously, on program material which I had personally chosen to reveal those differences as obviously as possible. The number of blind trials, limited by time, was too small to make an airtight case for my ability to distinguish one amp from the other, but the results nonetheless indicated that I could. And, it must be noted, the objective differences between the Carver M1.0t and its reference amp were by no means unmeasurable. On Carver's own null tests, nulling between the two amps was 11dB or so less than the 50dB that he had claimed would result in an inaudible difference. Thus, there should have been an audible difference. But what bothered me was why differences which I had previously described as "dramatic" should suddenly become "very small" under the conditions of a blind listening test. Why, in fact, do all blind listening tests seem suddenly to deprive trained, normally perceptive, listeners of their powers of discrimination?
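To put rough numbers on the two figures above, here is a minimal Python sketch. It rests on assumptions that are not in the text: that each blind call would be an independent 50/50 guess if no difference were audible, that the null figures are voltage (amplitude) ratios, and that "11dB or so" can be treated as exactly 11dB. The function names are illustrative only.

```python
from math import comb


def p_at_least(hits: int, trials: int, p_guess: float = 0.5) -> float:
    """Chance of scoring `hits` or more out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def null_residual(null_db: float) -> float:
    """Difference-signal amplitude relative to the signal, for a null depth in dB."""
    return 10 ** (-null_db / 20)


# Four correct calls out of five, if every call were a coin toss.
print(f"chance of 4+ of 5 by guessing: {p_at_least(4, 5):.4f}")

# Claimed inaudibility threshold (50dB) versus the null actually reached (about 39dB).
print(f"residual at 50dB null: {null_residual(50):.4%}")
print(f"residual at 39dB null: {null_residual(50 - 11):.4%}")
```

Under those assumptions, a pure guesser would score four or more out of five a little under one time in five (0.1875), which is why five trials cannot make an airtight case. A 39dB null leaves a difference signal of roughly 1.1% of the signal amplitude, against roughly 0.3% at the 50dB target.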

The skeptic's viewpoint, of course, is that the differences reviewers claim to hear are due to nothing more than autosuggestion. We expect a tube amplifier to sound a certain way, so that's what we hear. The hard evidence to support that skeptical view is scant but persuasive. The evidence to refute it is abundant, but almost entirely "anecdotal"; that is, "a lot of people have reported it, but no one has proven it." It is appalling that, after more than 100 years of sound reproduction, during most of which time anecdotal evidence of audible differences was practically all we had to spur on technological advances, there should still be serious questions about the validity of observational data. So-called subjective testing, today, is still viewed by most of the "scientific community" as being in the same category as psychic phenomena: not proven, and thus the province of crackpots.

The ability to hear these small differences does seem uncomfortably akin to extrasensory perception. Both seem stubbornly resistant to scientific corroboration, although there have certainly been enough attempts to verify both. Some tests have almost conclusively proven that listeners cannot distinguish between objectively similar components: that, under carefully controlled tests, the ability to make such distinctions simply evaporates. A few tests have suggested that, perhaps, under some conditions, some people may be hearing inexplicable differences. But hard, incontrovertible evidence for the latter continues to elude researchers (footnote 1). On the other hand, to those of us who do hear these things, the findings of some of those controlled listening tests have been laughable. Witness the one reported recently in Stereo Review, wherein listeners were unable to prove their ability to reliably tell a $219 Pioneer receiver from a $12,000 NYAL OTL-1, surely two of the most different-sounding components one could find today. (NYAL's Harvey Rosenberg responds to that test in "Letters" on p.30.)

Actually, these "controlled" tests have always had several obvious shortcomings. The listeners are always a "cross-sample" of audiophiles who claim to hear differences between products, rather than people who appear to have demonstrated an ability to hear such differences. The tests are invariably conducted in a room unfamiliar to most of the panelists, through loudspeakers equally unknown. (Most recording engineers, who eschew today's state-of-the-art loudspeakers in favor of their old, familiar monitors, argue that, in making small quality discriminations, familiarity with a system is more important than the ultimate in resolving power.) In addition, it is extremely rare for such tests to be conducted with just one listener present, seated in the optimum seat, and making the decisions when to switch. Without that "luxury," JA assures me that, from his experience, panelists can very quickly become hopelessly confused.

Most such tests have utilized "the best" signal sources—master tapes or outstandingly good CDs—rather than ones whose distortion content might reveal differences in how much such content is exaggerated by different products. But perhaps most important of all, the conditions for making the discriminations in a formal, controlled test are entirely unfamiliar to people who have honed their listening skills through long-term listening. Instead of leisurely, unpressured listening, which is how most of us make these allegedly impossible discriminations at home, the panelists for these tests are being called on to make quick decisions under decidedly pressured conditions.

Time, of course, is one of their sources of pressure. Another, more important source is the desire to succeed—to prove that they actually hear, when it's important, what they claim to hear when it isn't. Most of us know all too well what pressure to perform can do to something as natural and mundane as sexual prowess; it is hardly surprising that pressure to perform might also adversely affect an acquired and definitely unnatural skill like assessing audio performance.

JA has suggested what strikes me as the most likely explanation for why "controlled testing" doesn't seem to work. His hypothesis is that the two conditions of listening—leisurely, unpressured experience of listening to music in the home, and controlled, high-pressure listening as part of a panel—call on different parts of our brain: the right cerebral lobe, which controls the motor functions for the left side of the body, for holistic impressions and emotional responses; and the left "brain," which controls the right side of the body, for serial processing of data and making logical comparisons and analyses.

It is well known that these functional divisions between the left and right brain exist: that the right deals with sensory input on an intuitive level, while the left specializes in the cognitive and analytical treatment of sensory information. We know that the ability to make fine sonic discriminations is learned, often over a period of many years. And all of us do 99.99% of our listening over time, under relaxed conditions, which allows time for us to form holistic impressions about the sound of a component. So it is the right brain which we train to detect and react to small sonic differences. The right brain functions as the information receptor; the left brain then analyzes these impressions in a logical manner to yield specifics about the sound that reviewers such as myself report on in Stereophile. But how much opportunity do we have to train the left brain as the primary information receptor? Very little, because that isn't the way we normally listen. So, naturally, when the left, logical, unemotional cerebral hemisphere is called upon to detect sonic differences using music as a test signal, all of us become untrained listeners, incapable of distinguishing anything less than the grossest differences.

Scientific evidence for the seemingly arcane talent of being able to hear differences continues to elude us, but much of the evidence for it is more than merely anecdotal. Why, for example, do reviewers' comments about products which they have auditioned independently so often coincide? Why, when a reviewer misjudges something, do so many readers agree about the thing(s) he miscalled? Why do subjective reviewers so often describe sonic flaws which are only later found to be the result of hitherto-unmeasurable objective flaws? Clearly, the issue is by no means a dead one. But equally clearly, we can no longer take seriously the traditional A/B comparison test as proof, one way or the other, of what we believe to be the truth. I propose an alternative approach: true double-blind, controlled testing, under leisurely, unpressured conditions. How could this be done? Here's how.

Obtain three (or more) power amplifiers widely acknowledged to be quite different in sound but similar in power output. These might be, for example, a Conrad-Johnson Premier 5, an Electron Kinetics Eagle 2, and a Sansui AU-G99X. Place each in a sealed black box with external input and output receptacles, and identical AC cords. Add mass to the two lighter boxes so that all three weigh the same. Provide adequate bottom ventilation holes (with external baffles to prevent peeking) for the amp which will run the hottest (the C-J, probably), duplicate these holes in the other two boxes, and add a (baffled) cooling fan at the top of each box. Add a heating element to the two cooler-running amps, so that all three will throw out the same amount of hot air from the top of the box. Finally, mark the boxes 1, 2, and 3.

Then send the three amplifiers, in turn, to each of a number of volunteer subjects. Each should then listen to his numbered box for as long as he wishes (up to a point), write a Stereophile-type review of it, and submit the report to whoever is organizing the test. He should then return the box, and another of the test amplifiers could, at the discretion of the test organizer, be substituted. This would scotch any attempt at collusion between participants.
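The bookkeeping for such a test could be sketched along the following lines. This is only an illustration under stated assumptions: the function names, the listener labels, and the idea of giving every listener the boxes in an independent random order are hypothetical additions, not part of the proposal above. The box-to-amplifier key would be held by the organizer alone and never shown to participants.

```python
import random


def make_blind_key(amps, rng):
    """Randomly map box numbers 1..N to amplifier names (organizer's eyes only)."""
    shuffled = list(amps)
    rng.shuffle(shuffled)
    return {box: amp for box, amp in enumerate(shuffled, start=1)}


def make_schedule(listeners, boxes, rng):
    """Give each listener every box once, in an independent random order."""
    return {name: rng.sample(boxes, k=len(boxes)) for name in listeners}


rng = random.Random(1987)  # fixed seed only so the sketch is repeatable
amps = [
    "Conrad-Johnson Premier 5",
    "Electron Kinetics Eagle 2",
    "Sansui AU-G99X",
]
key = make_blind_key(amps, rng)  # hypothetical key, kept sealed until all reviews are in
schedule = make_schedule(["listener A", "listener B"], sorted(key), rng)

for name, order in schedule.items():
    print(name, "receives boxes in order", order)
```

Because each listener's order is drawn separately, two participants comparing notes could not assume they had heard the boxes in the same sequence, which is in the spirit of the anti-collusion point above.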

I'm not volunteering to organize such a test, because it would be difficult to conduct and, I feel, even unnecessary. The results would prove nothing to those of us who already know what the outcome would be, nor would they sway those whose right brain has never been opened to the gestalt of reproduced sound. It would be like trying to prove the existence of green to a colorblind person who sees both green and red as shades of gray. He might believe it intellectually, but he could never accept it emotionally. Perhaps, ultimately, it isn't important that we prove our point to the deaf. Audio has continued to advance despite the protests of the selectively deaf, who claim that perfection was achieved with the advent of stereo. Many of them have since come over to our side, and their products have improved accordingly. As long as this continues to happen, there is no reason for us to continue trying to persuade nonbelievers that what we hear is more real than ESP.

They'll just have to find it out for themselves.—J. Gordon Holt



Footnote 1: There have been blind tests performed indicating that differences between amplifiers have been audible, particularly when a large enough number of listeners has been involved to make statistical analysis more rigorous than in the Stereo Review tests referred to by JGH. (See HFN/RR, May/July 1986, and Stereophile, passim.) But the audio engineering establishment has remained curiously quiet about such tests, leading to the situation where an apparently uninformed, but extremely influential, columnist such as Hans Fantel (Sunday New York Times, February 1987) can declare, with reference to the SR tests, that "Statistically, these hotly debated differences [between amplifiers] didn't exist."—John Atkinson
