Deeper Meanings Letters part 4

Value judgments & experiments
Editor: As a lifelong lover of serious music and the author of more than 50 scientific papers, I am well acquainted with both "subjective" and "objective" approaches to knowledge. I also have the good fortune to be married to a professional musician, a violinist, and have witnessed many times the manner in which musical judgments are made. Thus I was much interested in Robert Harley's thoughtful piece on the evaluation of audio equipment (Stereophile, Vol.13 No.7). Herewith a few comments stimulated by Harley's remarks:

First of all, I was surprised that Harley did not attach the most obvious meaning to Prof. Lipshitz's reply to John Atkinson. Needless to say, I did not overhear this seminal conversation, nor for that matter have I ever met Lipshitz or heard him speak. Nevertheless, the context strongly suggests that when the professor asked, "Ah, but how do you know what is good?" he merely left unsaid the (to him) self-evident qualifying clause, "unless of course you measure it."

In this light the question is no more than a rhetorical device. I doubt that Lipshitz had any intention of opening a deep philosophical inquiry; he was merely reaffirming the objectivist's habitual mistrust of raw, unquantified sensory evidence. Harley may well argue that such skepticism is inappropriate in realms demanding refined aesthetic judgment, but it is nevertheless a cornerstone of the scientific method, and as reflexive as a knee-jerk among scientists. Perhaps it is fortunate for all of us that Harley overlooked (or at least neglected to mention) this simple interpretation of Lipshitz's question; otherwise we might have been deprived of the inquiry it provoked.

In reply, Harley takes his text from Robert Pirsig's Zen and the Art of Motorcycle Maintenance. I agree that Zen is a memorable book, with valuable things to say about self-discovery and self-knowledge. But I am not aware that it has anything to say about the design of experiments, which is the true subject of Harley's piece. If reading assignments are to be made, let me recommend instead a classic paper on experimental design, "Mathematics of a Lady Tasting Tea," by Sir Ronald Fisher, one of the founding fathers of modern statistical theory. Here is a paper that ought to be required reading for all audio equipment reviewers. The original publication is not easy to find, but it has been reprinted in James R. Newman's anthology The World of Mathematics, which in turn has recently reappeared in paperback.

The paper concerns a lady who asserts that her surpassing delicacy of taste permits her to tell whether the tea or the milk was first added to the cup when her tea was brewed. (Parallels to the claims made by certain reviewers will immediately suggest themselves.) How shall her claim be tested? In a mere 10 pages Fisher lays out with lapidary clarity the principles which underlie the design of experiments, establishes a test protocol suitable to this case, examines the significance of all possible outcomes, and discusses various modifications and elaborations of the test procedure. No mathematical skills beyond elementary arithmetic are required to follow the argument.

One of the points which Fisher emphasizes most strongly is that only an exact hypothesis can be tested. The hypothesis in this instance (Fisher calls it the "null hypothesis") is that the lady lacks the power of discrimination she claims, in which case the number of teacups she successfully identifies will eventually and inevitably approach the number attainable by chance alone. This is, of course, a limiting operation, and demands in principle an experiment of infinite duration. The hypothesis can be disproved, however, in relatively few trials, by the attainment of a score sufficiently remote from a chance outcome.
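The arithmetic behind "a score sufficiently remote from a chance outcome" can be sketched in a few lines. The sketch below assumes an eight-cup design, four cups prepared each way, with the lady told to pick out the four milk-first cups; the letter itself does not state the cup count, and the function names are illustrative only.

```python
from math import comb

def chance_distribution(cups=8, milk_first=4):
    """Probability of k correct milk-first picks if the lady is purely guessing."""
    total = comb(cups, milk_first)      # number of ways to choose which cups to call milk-first
    other = cups - milk_first           # number of tea-first cups
    return {
        k: comb(milk_first, k) * comb(other, milk_first - k) / total
        for k in range(milk_first + 1)
    }

def p_value(correct, cups=8, milk_first=4):
    """Chance of scoring at least `correct` under the null hypothesis (no discrimination)."""
    dist = chance_distribution(cups, milk_first)
    return sum(p for k, p in dist.items() if k >= correct)

if __name__ == "__main__":
    for k, p in chance_distribution().items():
        print(f"{k} correct: probability {p:.4f} by chance")
    print(f"perfect score by luck alone: {p_value(4):.4f}")
    print(f"three or more by luck alone: {p_value(3):.4f}")
```

Under these assumptions a perfect score arises by luck only once in 70 trials, whereas three or more correct arises about one time in four, so only the perfect score is remote enough from chance to count against the null hypothesis.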

The point which almost all lay persons (and I dare say many scientists as well) fail to grasp is that if the null hypothesis is disproved, its opposite is not thereby proved. This appears to contravene common sense; surely if the lady makes a highly improbable number of correct identifications, she is likely to possess some power of discrimination. Indeed she probably does, but this is an inexact hypothesis and therefore admits at most a statistical interpretation, not a proof.

The only other exact hypothesis is that she possesses unfailing power of discrimination, and it is once again clear that this hypothesis can be disproved by a single error of judgment, but can never be proved by any finite amount of experimentation. Einstein clearly illustrated this principle when he said, "No amount of experimentation can ever prove me right. A single experiment at any time can prove me wrong." It is the everlasting falsifiability of hypotheses which distinguishes genuine science from, say, creationism.

It is worth noting that only extremely simple judgments are involved in the foregoing example, those with answers which contain at most a few bits of information. Some questions in the audio business are of this type ("Do amplifier A and amplifier B sound the same?"), but most—including those of greatest importance—are not. ("Is amplifier A or amplifier B a better amplifier?")

I wish Harley had drawn this distinction more clearly, because the two types of question demand very different procedures for arriving at an answer. In particular, blind testing, which Harley deplores, is clearly essential to answer questions of the first type, but may or may not be appropriate in answering questions of the second type. On the other hand, Harley makes a point too often ignored, which is that comparative value judgments enter every stage of the recording business, from the choice of performers and venue to the choice of processing plant, and it seems inconsistent (or at least needlessly restrictive) to condemn them when applied to the choice of playback equipment.
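For a question of the first type, a blind trial series can be scored against chance in the same spirit. The following is a minimal sketch, assuming independent trials in which a listener who hears no difference guesses correctly half the time; the trial counts are hypothetical and are not drawn from any actual listening test.

```python
from math import comb

def binomial_tail(correct, trials, p_guess=0.5):
    """Probability of at least `correct` right answers in `trials` if the listener only guesses."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

if __name__ == "__main__":
    trials = 16  # hypothetical number of blind presentations
    for correct in (9, 12, 14, 16):
        print(f"{correct}/{trials} correct: chance probability {binomial_tail(correct, trials):.4f}")
```

A small tail probability counts against the hypothesis that the two amplifiers sound the same; it says nothing about which one is better, which is a question of the second type.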

I can contribute some anecdotal evidence which supports Harley's views on the importance of "subjective" reviewing. Over the years I have watched my wife's progress from her original mass-produced Mittenwald student violin to her present Italian master violin, a 1762 Carlo Antonio Testore. I can attest that, to a professional musician, the choice of a performing instrument dwarfs all other decisions in life except possibly the choice of a spouse. The process takes many weeks. The candidate violins are tested first in the luthier's workshop, then at home, then in concert, then at home again. They are tested with scales and finger exercises, then with Vivaldi and Bach, then with Mozart and Paganini, then with Tchaikovsky, Bruch, and Berg. Strings are replaced, bridges are exchanged, sound posts are tweaked this way or that. Variations in humidity, temperature, ambience, mood, and fatigue are taken into account. Other musicians are solicited for their opinions. Agonies of vacillation and indecision are suffered until at last, with trembling heart and crossed fingers, a final choice is made and the prospective purchaser turns to the task of obtaining a second mortgage on the house.

What I find remarkable about this process is the extent to which it resembles the evaluation of a major high-end component or system. Substitute Krell for Cremona and you have a fairly accurate description of the behavior of an obsessed audiophile. The comparison is not intended to disparage the audiophile; on the contrary, to my mind it legitimizes or validates his behavior. He is behaving exactly as a musician would under the circumstances.

It might be thought that the musician has no alternative, that there exist no "objective" criteria for the evaluation of violins, but this is not true. Thanks to the researches of the Catgut Acoustical Society and others, one can distinguish good violins from bad with near-perfect certainty in the laboratory. Nevertheless, it is inconceivable that a musician would choose an instrument solely on the basis of laboratory measurements, without having heard it "under the ear." Between a very good violin and a superlative one there exist differences which measurement cannot yet reveal. The trained ear is capable of levels of discrimination far exceeding anything that can be caught in the coarse net of available diagnostic techniques.

This does not, however, excuse us from the task of endeavoring to refine those techniques. Much more needs to be said about this, and Harley scarcely touches upon it. Forty years ago, when I was learning electroacoustics at the feet of F. V. Hunt and B. B. Drisko, it was indelibly impressed on me that music lives in its transients, and nothing I have seen or heard since has caused me to change that opinion. In the absence of the initial "ictus" or consonant of speech, one can scarcely distinguish a softly blown trumpet from a flute. Yet almost all the laboratory tests contained in a typical "objective" review are performed in the steady state and presented in the frequency domain. Only rarely is anything done in a transient mode and presented in the time domain. To be sure, a plot of impulse response has become a more or less standard feature of loudspeaker reviews, but usually only as a stepping stone to the derivation of frequency response by Fourier transform. Similarly, photos of squarewave response often accompany amplifier reviews, but only to illustrate the behavior under reactive load in qualitative fashion.

In all essential respects the repertoire of laboratory measurements is no larger today than it was half a century ago, when a pair of 2A3s represented the acme of high-fidelity amplification and the attainment of flat response to the extremes of human hearing was the principal goal of designers. In the realm of measurement at least, audio engineers have proved almost as resistant to change as automotive engineers, who still employ the system of units established by James Watt in 1770.

What accounts for this reluctance to devise newer and more revealing diagnostic techniques? I don't know, but I recall a time when it was not so. Around 1950, when the McIntosh amplifiers first appeared—the celebrated 50W-2 and 20W-2 on inverted chassis—they quickly drove the competition from the marketplace because of their evident superiority. A mere glance at the McIntosh patent showed how cleverly McIntosh had solved the problem of attaining adequate output-transformer bandwidth, the bane of plate-coupled vacuum-tube amplifiers. (Remember that in this era the majority of prospective purchasers knew how to read a circuit diagram.) It was a "technically sweet" solution, and instantly recognizable as such. In consequence, there ensued a sort of underground contest among amplifier designers to contrive a test which the McIntosh would fail, or at least one on which it would perform badly. Improvement of the breed had nothing to do with this effort. The contest was motivated purely by professional envy and the stakes were competitive advantage; ie, the prospect of running full-page ads saying "Try this with your McIntosh!" One of the fruits of this contest was the interrupted sinewave test: four cycles of sinewave followed by an equal period of zero input. During the nominally silent period a sensitive recorder was gated on, and the RMS output of the amplifier was recorded. Frequency and amplitude of the input were the independent variables. I do not in fact recall whether McIntosh amplifiers performed well or badly under this test, but the test itself would seem to be the perfect tool for the quantification of intertransient silence—a concept much in vogue these days—and for the investigation of transient bias shifts under dynamic load, the plague of vacuum-tube and transistor amplifiers alike. Why has it vanished from the armamentarium of the technical reviewer?
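The interrupted sinewave test described above is easy to simulate. The sketch below generates four cycles of sinewave followed by an equal period of zero input, then takes the RMS of the output only while the gate is open, that is, during the nominally silent period. The sample rate, the burst count, and above all the `toy_amplifier` model (which simply adds a slowly decaying residue of past input as a stand-in for a bias shift) are assumptions made for illustration; they do not model any real amplifier, McIntosh or otherwise.

```python
import math

def interrupted_sine(freq_hz, amplitude, sample_rate=48000, cycles_on=4, bursts=3):
    """Return (samples, gate): bursts of `cycles_on` cycles, each followed by an equal silence.

    gate[i] is True during the nominally silent periods, when the recorder is on.
    """
    n_on = round(sample_rate * cycles_on / freq_hz)
    samples, gate = [], []
    for _ in range(bursts):
        for n in range(n_on):
            samples.append(amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate))
            gate.append(False)
        samples.extend([0.0] * n_on)   # equal period of zero input
        gate.extend([True] * n_on)     # recorder gated on
    return samples, gate

def gated_rms(output, gate):
    """RMS of the output over the samples where the gate is open."""
    silent = [y for y, g in zip(output, gate) if g]
    return math.sqrt(sum(y * y for y in silent) / len(silent))

def toy_amplifier(samples, decay=0.99):
    """Hypothetical device under test: output carries a decaying residue of recent input level."""
    state, out = 0.0, []
    for x in samples:
        state = decay * state + (1 - decay) * abs(x)
        out.append(x + 0.1 * state)
    return out

if __name__ == "__main__":
    # Frequency and amplitude of the input are the independent variables.
    for freq in (100, 1000, 10000):
        for amp in (0.1, 1.0):
            samples, gate = interrupted_sine(freq, amp)
            residue = gated_rms(toy_amplifier(samples), gate)
            print(f"{freq:>6} Hz, amplitude {amp}: gated RMS {residue:.6f}")
```

An ideal amplifier would return exactly zero during the gated period; any residue is output the device produces in the absence of input.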

Despite these comments, I find Harley's efforts to reconcile the two schools of evaluation praiseworthy on the whole. Although I do not agree with him on every point, he has consistently tried to contribute light rather than heat to the discussion, a quality all too rare among the hypertrophied egos in this field. I hope his article marks the beginning of a continuing dialog on the fundamentals of the reviewer's art in Stereophile.—Edward A. Fagen, Newark, DE