As We See It

Audio, Meet Science

Jim Austin Mar 07, 2011

For a field based on science, high-end audio has a regrettably complex relationship with its parent discipline. Even as they enjoy science's technological fruits, many audiophiles reject the very methods (scientific testing) that made audio in the home possible. That seems strange to me.

Of course, there are reasons—not necessarily good ones—why some people in audio have a low opinion of science. The entire hobby/industry suffered a grievous blow a generation or so ago, when too much focus was put on something—how an amplifier measured—that resembled science, and too little on how such products actually sounded. A little later, the Compact Disc's promise of technical "perfection" yielded dubious sonic benefits when compared with an older, simpler medium, the LP. And there has been, over the years, no shortage of self-identified scientific types manipulating purportedly scientific tests, or ignoring inconvenient test results, to support their preconceptions.

Subjectivists, meanwhile, sometimes seem to intentionally hold themselves up for ridicule. A few audio writers, especially for the online 'zines, seem eager to prostitute themselves for the latest preposterous product—the Intelligent Chip, the Shakti Hallograph Soundfield Optimizer, the Machina Dynamica Brilliant Pebbles, the Marigo Audio Lab Dots, the Tice Clock. Meanwhile, some prominent industry folks have consistently failed, in my opinion, to maintain a sufficiently skeptical posture toward such products. It doesn't help that in audio there has been a long tradition—a dogma, even—of "trusting your ears," despite abundant evidence from neuroscience and cognitive psychology that our ears and other senses, though supremely acute, are supremely unreliable.

Yet science—and scientific testing—has much to offer audio. Audiophiles ought to embrace scientific methods with the same healthy skepticism with which they embrace sighted, subjective equipment evaluations: as a tool that, though subject to misuse, is invaluable in the hands of honest people with the right set of skills.

One form of testing that's especially important—and especially controversial—is the use of rigorous methods to validate apparent perceptions: Did you really hear what you thought you heard? Is the sound really different with that new amplifier/cable/CD player from what it was with the old one? Does putting that photo in the freezer really change the way the system sounds?

Rigorous tests can offer scientific validation of subjective observations (footnote 1). Such validation doesn't come easily, but when it comes it's a force for truth, justice, and the American way.

Such tests have two possible outcomes: Either you establish scientifically that there's a difference between A and B, or you fail to establish a difference. And here's a crucial point that's often overlooked by people on the objectivist side: While a positive result establishes the reality of a perception to a certain level of confidence, a null result—a failure to reliably detect a difference—does not indicate the nonexistence of that difference.

To cite an oft-quoted phrase that's sometimes attributed to Carl Sagan, "absence of evidence is not evidence of absence" (footnote 2). Or, to cite Les Leventhal in an AES paper, "Properly speaking, a statistical conclusion about H₀ from a significance test cannot justify 'accepting' the scientific hypothesis that differences are inaudible" (footnote 3).

In simpler language: Differences we think we hear but that testing fails to validate may nonetheless be real. An experiment that fails to show a difference between the sounds produced by two amplifiers does not indicate that no audible difference exists.
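To put rough numbers on why a null result proves so little, here is a minimal sketch of the arithmetic behind a forced-choice listening test. The specifics are my assumptions for illustration, not figures from any particular experiment: 16 trials, a 5% significance level, and a listener whose "true hit rate" (the fraction of trials on which he or she would answer correctly in the long run) is somewhere between 0.6 and 0.9. The function names are hypothetical.

```python
from math import comb


def binom_tail(n, k, p):
    """Probability of k or more correct answers in n trials, each correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_score(n, alpha=0.05):
    """Smallest score that a pure guesser (p = 0.5) reaches with probability <= alpha."""
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1  # no achievable score is significant at this alpha


def power(n, p_true, alpha=0.05):
    """Chance that a listener with true hit rate p_true reaches the critical score."""
    return binom_tail(n, critical_score(n, alpha), p_true)


# Illustrative numbers only: 16 trials at the 5% significance level.
n_trials = 16
print("score needed to pass:", critical_score(n_trials))
for p_true in (0.6, 0.7, 0.8, 0.9):
    print(f"true hit rate {p_true:.1f}: passes with probability {power(n_trials, p_true):.3f}")
```

Under these assumptions the listener must score at least 12 of 16, and someone who genuinely hears the difference 60% of the time will pass only about one time in six. Most runs of such a test would come up null even though the difference is real.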

Many of the rigorous listening tests that are relevant to audio are difficult and tiring to perform, requiring serious concentration, many repetitions, and sometimes heavy lifting. Things can be made easier by using many listeners at once, but then the only conclusions you can draw are about the average characteristics of the group. The group average may not permit distinguishing cable A from cable B, but that doesn't mean a particularly golden-eared member of the group can't.
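The point about group averages can be sketched the same way. The panel below is entirely hypothetical: nine listeners who are purely guessing, plus one golden-eared listener with an assumed hit rate of 0.9, each taking 16 trials. The code compares the chance of a significant result when all 160 trials are pooled with the chance when the golden ear is tested alone, both at an assumed 5% significance level.

```python
from math import comb


def binom_pmf(n, p):
    """Probabilities of 0..n correct answers in n trials with hit rate p."""
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent scores."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def critical_score(n, alpha=0.05):
    """Smallest score that pure guessing reaches with probability <= alpha."""
    guessing = binom_pmf(n, 0.5)
    for k in range(n + 1):
        if sum(guessing[k:]) <= alpha:
            return k
    return n + 1  # no achievable score is significant at this alpha


# Hypothetical panel: all three numbers are assumptions for illustration.
TRIALS = 16
GUESSERS = 9
GOLDEN_HIT_RATE = 0.9

# The golden-eared listener tested alone.
solo_scores = binom_pmf(TRIALS, GOLDEN_HIT_RATE)
solo_power = sum(solo_scores[critical_score(TRIALS):])

# The same listener's trials pooled with those of nine guessers.
pooled_scores = convolve(binom_pmf(GUESSERS * TRIALS, 0.5), solo_scores)
pooled_trials = (GUESSERS + 1) * TRIALS
pooled_power = sum(pooled_scores[critical_score(pooled_trials):])

print(f"golden ear alone passes with probability {solo_power:.3f}")
print(f"pooled panel passes with probability     {pooled_power:.3f}")
```

With these made-up numbers the golden ear passes nearly every time when tested alone, while the pooled panel reaches significance well under half the time. Nine guessers are enough to bury one reliable listener in the average.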

One of the beautiful things about science is that often you can make your experiments more sensitive by applying new technologies. The room left for an undetected effect can be squeezed and squeezed until it approaches zero, and you can begin to feel sure that there's really nothing happening. But in audio, there's not a whole lot you can do to make your tests more sensitive. Your measuring instruments are limited not by technology, but by the ear/brain system of very human listeners.

The dullness of our tests—their insensitivity—leaves a space where reviewers can roam free. Within that space, reviewers may wax poetic about palpability and sweetness and air and light without fear of contradiction by science. It's a space where much could be happening—apparently is happening—but where it's impossible to be sure whether it's happening or not. A lot of life is like that.

Of course, this is not the only space where audio reviewers operate. They may—and frequently do—roam outside this safe habitat, making observations about easily audible things that no doubt could, with a bit of work, be scientifically verified. But there is little incentive for audio writers to take such tests, especially when they are already sure of what they've heard.

If my argument so far seems to favor the subjectivist side, it's now time to rebalance things. On reading a paper by a young colleague, the great physicist Wolfgang Pauli (1900–1958) commented, "It's not even wrong." He meant that the ideas proposed in the paper could not even be tested. For Pauli, this was the ultimate insult.

Luckily for us, the human population is diverse. Not everyone feels as Pauli did. Yet a science-based activity without scientific constraints, in which tweaks that appear to be nothing more than snake oil, well-designed amplifiers, and speakers with good dispersion characteristics are distinguished only by the vicissitudes of personal aural experience, makes me uncomfortable. I find myself craving some certainty, if only to put a little more space between the creations of a skilled audio designer and, say, a jar of pretty rocks.

Footnote 1: Perhaps the best example is the testing of audio codecs and lossy compression schemes intended for the delivery of digital audio. While these tests almost certainly underestimate audibility of some compression artifacts, they routinely establish the audibility of others.

Footnote 2: Carl Sagan, The Demon-Haunted World: Science as a Candle in the Dark (1995).

Footnote 3: Les Leventhal, "How Conventional Statistical Analyses Can Prevent Finding Audible Differences in Listening Tests," presented at the October 1985 AES Convention in New York, Preprint 2275. See also his article in Stereophile. H₀ is the null hypothesis, the hypothesis that the effect under test is inaudible. Leventhal is saying, in other words, that a failure to detect audibility is not evidence of inaudibility.