The 2011 Richard C. Heyser Memorial Lecture: "Where Did the Negative Frequencies Go?"

Measuring Sound Quality

Table 1: What is heard vs what is measurable

This is a table I prepared for my 1997 AES paper on measuring loudspeakers. On the left are the typical measurements I perform in my reviews; on the right are the areas of subjective judgment. It is immediately obvious that there is no direct mapping between any specific measurement and what we perceive. Not one of the parameters in the first column appears to bear any direct correlation with one of the subjective attributes in the second column. If, for example, an engineer needs to measure a loudspeaker's perceived "transparency," there isn't any single two- or three-dimensional graph that can be plotted to show "objective" performance parameters that correlate with the subjective attribute. Everything a loudspeaker does affects the concept of transparency to some degree or other. You need to examine all the measurements simultaneously.

This was touched on by Richard Heyser in his 1986 presentation to the London AES. While developing Time Delay Spectrometry, he became convinced that traditional measurements, where one parameter is plotted against another, fail to provide a complete picture of a component's sound quality. What we hear is a multidimensional array of information in which the whole is greater than the sum of the routinely measured parts.

And this is without considering that all the measurements listed examine changes in the voltage or pressure signals in just one of the information channels. Yet the defects of recording and reproduction systems affect not just one of those channels but both simultaneously. We measure in mono but listen in stereo, where such matters as directional unmasking—where the aberration appears to come from a different point in the soundstage than the acoustic model associated with it, thus making it more audible than a mono-dimensional measurement would predict—can have a significant effect. (This was a subject discussed by Richard Heyser.)

Most important, the audible effect of measurable defects is not heard as their direct effect on the signals but as changes in the perceived character of the oh-so-fragile acoustic models. And that is without considering the higher-order constructs that concern the music that those acoustic models convey, and the even higher-order constructs involving the listener's relationship to the musical message. The engineer measures changes in a voltage or pressure wave; the listener is concerned with abstractions based on constructs based on models!

Again, this was something I first heard described by Richard Heyser in 1986. He gave, as an example of these layers of abstraction, something with which we are all familiar yet which cannot be measured: the concept of "Chopin-ness." Any music student can churn out a piece of music that a human listener will recognize as being similar to what Chopin would have written; it is hard to conceive of a set of audio measurements that a computer could use to come to the same conclusion.

Once you are concerned with a model-based view of sound quality, this leads to the realization that the nature of what a component does wrong is of greater importance than the level of what it does wrong: 1% of one kind of distortion can be innocuous, even musically appropriate, whereas 0.01% of a different kind of distortion can be musical anathema.
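The single figure most often quoted for distortion shows the difference between level and nature. Below is a minimal Python sketch, not taken from the lecture, of conventional total harmonic distortion (THD): the RMS sum of the harmonic amplitudes relative to the fundamental. The spectra are hypothetical and chosen only for illustration. Because harmonic order never enters the calculation, a pure second harmonic, a pure ninth harmonic, and a second-plus-third mixture all come out at the same 1% figure.

```python
import math

def thd_percent(harmonics: dict[int, float]) -> float:
    """Total harmonic distortion in percent.

    `harmonics` maps harmonic order (2, 3, ...) to amplitude, expressed
    relative to a fundamental amplitude of 1.0. The order (the dict key)
    is ignored: only the amplitudes enter the RMS sum.
    """
    return 100.0 * math.sqrt(sum(a * a for a in harmonics.values()))

# Hypothetical spectra, all relative to a fundamental of 1.0
second_only = {2: 0.01}             # pure second harmonic
ninth_only = {9: 0.01}              # pure ninth harmonic
mixed_low = {2: 0.006, 3: 0.008}    # second + third, RMS sum 0.01

for name, spectrum in [("2nd", second_only), ("9th", ninth_only), ("2nd+3rd", mixed_low)]:
    print(f"{name:8s} THD = {thd_percent(spectrum):.3f}%")
```

Each line prints a THD of 1.000%, so the number alone cannot say which of these spectra a listener would find innocuous and which objectionable.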

Consider the sounds of the clarinet I was playing in that 1975 album track. You hear it unambiguously as a clarinet, which means that enough of the small wrinkles in its original live sound that identify it as a clarinet are preserved by the recording and playback systems. Without those wrinkles in the sound, you would be unable to perceive that a clarinet was playing at that point in the music, yet those wrinkles represent a tiny proportion of the total energy that reaches your ears. System distortions that may be thought to be inconsequential compared with the total sound level can become enormously significant when referenced to the stereo signal's "clarinet-ness" content, if you will: the only way to judge whether or not they are significant is to listen.

But what if you are not familiar with the sound of the clarinet? From the acoustic-model–based view, it seems self-evident that the listener can construct an internal model only from what he or she is already familiar with. When the listener is presented with truly novel data, the internal models lose contact with reality. For example, in 1915 Edison conducted a live vs recorded demonstration between the live voice of soprano Anna Case and his Diamond Disc Phonograph. To everyone's surprise, reported Ms. Case, "Everybody, including myself, was astonished to find that it was impossible to distinguish between my own voice, and Mr. Edison's re-creation of it."

Much later, Anna Case admitted that she had toned down her voice to better match the phonograph. Still, the point is not that those early audiophiles were hard of hearing or just plain dumb, but that, without prior experience of the phonograph, the failings we would now find so obvious just didn't fit into the acoustic model those listeners were constructing of Ms. Case's voice.

I had a similar experience back in early 1983, when I was auditioning an early orchestral CD with the late Raymond Cooke, founder of KEF. I remarked that the CD sounded pretty good to me—no surface noise or tracing distortion, the speed stability, the clarity of the low frequencies—when Raymond metaphorically shook me by the shoulders: "Can't you hear that quality of high frequencies? It sounds like grains of rice being dropped onto a taut paper sheet." And up to that point, no, I had not noticed anything amiss with the high frequencies (footnote 3). My internal models were based on my decades of experience of listening to LPs. I had yet to learn the signature of the PCM system's failings—all I heard was the absence of the all-too-familiar failings of the LP. Until Raymond opened the door for me, I had no means of constructing a model that allowed for the failings of the CD medium.

An apparently opposite example: In a public lecture in November 1982, I played both an all-digital CD of Rimsky-Korsakov's Scheherazade and Beecham's 1957 LP of the same work with the Royal Philharmonic Orchestra, without telling the audience which was which. (Actually, to avoid the "Clever Hans" effect, an assistant behind a curtain played the discs.) When I asked the listeners to tell me, by a show of hands, which they thought was the CD, they overwhelmingly voted for what turned out to be the analog LP as being the sound of the brave new digital world!

I went home puzzled by the conflict between what I knew must be the superior medium and what the audience preferred. Of course, the LP is based on an elegant concept: RIAA equalization. As Bob Stuart has explained, this results in the LP having better resolution than CD where it is most important (in the presence region, where the ear is most sensitive) but not as good where it doesn't matter, in the top or bottom octaves. But with hindsight, it was clear that I had asked the wrong question: instead of asking what the listeners preferred, I had asked them to identify which they thought was the new medium. They had voted for the presentation with which they were most familiar, the one that allowed them to construct their internal models more easily, and that ease had led them to the wrong conclusion.
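For readers unfamiliar with RIAA equalization, here is a minimal Python sketch of the standard playback (de-emphasis) curve, built from its three published time constants of 3180µs, 318µs, and 75µs and normalized to 0dB at 1kHz. It shows only the shape of the curve: bass boosted and treble cut on playback, mirroring the treble boost and bass cut applied when the disc is cut. It does not attempt to model Bob Stuart's resolution argument.

```python
import math

# Standard RIAA time constants, in seconds
T1, T2, T3 = 3180e-6, 318e-6, 75e-6

def _riaa_magnitude(f_hz: float) -> float:
    """|(1 + sT2) / ((1 + sT1)(1 + sT3))| evaluated at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * f_hz
    return abs((1 + s * T2) / ((1 + s * T1) * (1 + s * T3)))

def riaa_playback_db(f_hz: float) -> float:
    """RIAA playback gain in dB, normalized to 0dB at 1kHz."""
    return 20 * math.log10(_riaa_magnitude(f_hz) / _riaa_magnitude(1000.0))

for f in (20, 100, 1000, 3000, 10000, 20000):
    print(f"{f:>6} Hz: {riaa_playback_db(f):+6.2f} dB")
```

Relative to 1kHz, this gives roughly +19.3dB at 20Hz and -19.6dB at 20kHz on playback, the familiar textbook values; the recording curve is its mirror image.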

When people say they like or dislike what they are hearing, therefore, you can't discard this information, or say that their preference is wrong. The listeners are describing the fundamental state of their internal constructs, and that is real, if not always useful, data. This makes audio testing very complex, particularly when you consider that the brain will construct those internal acoustic models with incomplete data (footnote 4).

So how do you test how effectively a change in the external stimulus facilitates the construction of those internal models?

In his keynote address at the London AES Conference in 2007, for example, Peter Craven discussed how the sound quality of a digital transfer of a 78rpm disc, a live electrical recording of an aria from Puccini's La Bohème, improved when the sample rate was increased from 44.1 to 192kHz. Even 16-bit PCM is overkill for the 1926 recording's limited dynamic range, and though the original's bandwidth was surprisingly wide, given its vintage, 44.1kHz sampling would be more than enough to capture everything in the music, according to conventional information theory.
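The "conventional information theory" argument reduces to two textbook rules, sketched minimally in Python below: a sample rate fs can represent frequencies up to fs/2 (the Nyquist limit), and an ideal N-bit quantizer gives a full-scale sine-wave signal/noise ratio of about 6.02N + 1.76dB. These are theoretical ceilings, not measurements of the transfers Craven described.

```python
import math

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable without aliasing: half the sample rate."""
    return sample_rate_hz / 2

def ideal_snr_db(bits: int) -> float:
    """Full-scale sine-wave SNR of an ideal N-bit quantizer (about 6.02N + 1.76 dB)."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for fs in (44_100, 192_000):
    print(f"{fs / 1000:g}kHz sampling: bandwidth up to {nyquist_hz(fs) / 1000:g}kHz")
print(f"16-bit ideal SNR: {ideal_snr_db(16):.1f}dB")
```

For 44.1kHz this gives a 22.05kHz bandwidth and, for 16 bits, about 98dB. Both ceilings comfortably exceed what a 1926 shellac disc contains, which is why this arithmetic alone predicts no benefit from the higher rate.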

But as Peter pointed out, with such a recording there is more to the sound than only the music. Specifically, there is the surface noise of the original shellac disc. With the high-sample-rate transfer, the improvement in sound quality took the form of this noise appearing to float more independently of the music; at lower sample rates, it sounded more integrated into the music, and thus degraded it more.

Peter offered a hypothesis to explain this perception: "the ear as detective." "A police detective searches for clues in the evidence; the ear/brain searches for cues in the recording," he explained, referring to the Barry Blesser paper I mentioned earlier. Given that audio reproduction is, almost by definition, "partial input," Peter wondered whether the reason listeners respond positively to higher sample rates and greater bit depths is that these better preserve the cues that aid listeners in the creation of internal models of what they perceive. If that is so, then it becomes easier for listeners to distinguish between desired acoustic objects (the music) and unwanted objects (noise and distortion). And if these can be more easily differentiated, they can then be more easily ignored.

Once you have wrapped your head around the internal-model–based view of perception, it becomes clear why quick-switched blind testing so often produces null results. Such blind tests can differentiate between sounds, but they are not efficient at differentiating the quality of the first-, second-, and third-order internal constructs outlined earlier, particularly if the listener is not in control of the switch.

I'll give an example: Your partner has the TV's remote control; she flashes up the program guide, but before you can make sense of the screen, she scrolls down, leaving you confused. And so on. In other words, you have been presented with a sensory stimulus but have not been given enough time to form the appropriate internal model. Many of the blind tests in which I have participated echo this problem: the proctor switches faster than you have time to form a model, which in the end produces a result that is no different from chance.
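The arithmetic of chance shows why a short, rushed run of trials so easily ends in a null result. Below is a minimal Python sketch, assuming a forced-choice trial in which a listener who is only guessing is right half the time; the seven-trial count is illustrative, not taken from any particular test. It gives the probability of scoring at least k correct out of n by guessing alone.

```python
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """One-sided binomial tail: probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# With only seven trials, even 6 of 7 correct happens by guessing about 6% of the time
for k in range(5, 8):
    print(f"{k}/7 or better by chance: {p_at_least(k, 7):.4f}")
```

With so few trials, a listener must be nearly perfect before a score stands clear of guessing, so any confusion introduced by the procedure pushes the outcome toward chance.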

The fact that the listener is therefore in a different state of mind in a quick-switched blind test than he would be when listening to music becomes a significant interfering variable. Rigorous blind testing, if it is to produce valid results, thus becomes a lengthy and time-consuming affair using listeners who are experienced and comfortable with the test procedure.

There is also the problem that when it comes to forming an internal model, everything matters, including the listener's cultural expectations and experience of the test itself. The listener in a blind test develops expectations based on previous trials, and the test designer needs to take those expectations into account.

For example, in 1989 I organized a large-scale blind comparison of two amplifiers using the attendees at a Stereophile Hi-Fi Show as my listeners. We carried out 56 tests, each consisting of seven forced-choice A/B comparisons in which the amplifiers were either the Same or Different. To decide the Sames and Differents, I used a random number generator. However, if you think about this, sequences with seven Sames or Differents in a row will not be uncommon. Concerned that, presented with such a sequence, my listeners would stop trusting their ears and start to guess, I discarded any session for which the random number generator indicated six or seven consecutive Differents or Sames. Think about it: If you took part in a listening test and you got seven presentations where the amplifiers appeared to be the same, wouldn't you start to doubt what you were hearing?
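The session-selection rule described above amounts to rejection sampling. Below is a minimal Python sketch, assuming each presentation was an independent 50/50 draw and reading "six or seven consecutive" as a run of at least six identical presentations; the function names are illustrative and are not the software actually used in 1989. Enumerating all 2^7 = 128 equally likely sessions shows that 6 of them (about 4.7%) would be rejected.

```python
import itertools
import random

def longest_run(seq: str) -> int:
    """Length of the longest run of identical consecutive symbols."""
    return max(len(list(group)) for _, group in itertools.groupby(seq))

def draw_session(rng: random.Random, n: int = 7, max_run: int = 5) -> str:
    """Draw n Same ('S') / Different ('D') presentations at random,
    redrawing the whole session whenever it contains a run longer than max_run."""
    while True:
        session = "".join(rng.choice("SD") for _ in range(n))
        if longest_run(session) <= max_run:
            return session

# Count how many of the 128 possible seven-presentation sessions the rule rejects
all_sessions = ["".join(p) for p in itertools.product("SD", repeat=7)]
rejected = [s for s in all_sessions if longest_run(s) >= 6]
print(len(rejected), "of", len(all_sessions), "sessions rejected")

rng = random.Random(1989)
print(draw_session(rng))
```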

I felt it important to reduce this history effect in each test. However, this inadvertently subjected the listeners to more Differents than Sames—224 vs 168—which I didn't realize until the weekend's worth of tests was over. As critics pointed out, this in itself became an interfering variable.

The best blind test, therefore, is when the listener is not aware he is taking part in a test. A mindwipe before each trial, if not actually illegal, would inconvenience the listeners—what would you do with the army of zombies that you had created?—but an elegant test of hi-rez digital performed by Philip Hobbs at the 2007 AES Conference in London achieved just this goal.

To cut a long story short, the listeners in Hobbs's test believed that they were being given a straightforward demo of his hi-rez digital recordings. However, while the music started out at 24-bit word lengths and 88.2kHz sample rates, it was sequentially degraded while preserving the format until, at the end, we were listening to a 16-bit MP3 version sampled at 44.1kHz at a 192kbps bit rate.
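The scale of that degradation is easy to put into numbers. Below is a minimal Python sketch, assuming two-channel stereo throughout (the channel count is not stated above), comparing the raw PCM data rate of the 24-bit/88.2kHz starting point and of 16-bit/44.1kHz data with the 192kbps MP3 at the end of the sequence.

```python
def pcm_kbps(bits: int, sample_rate_hz: int, channels: int = 2) -> float:
    """Uncompressed PCM data rate in kilobits per second."""
    return bits * sample_rate_hz * channels / 1000

hi_rez = pcm_kbps(24, 88_200)   # 4233.6 kbps
cd = pcm_kbps(16, 44_100)       # 1411.2 kbps
mp3 = 192.0                     # kbps, bit rate of the final MP3 version

print(f"24/88.2 PCM: {hi_rez:.1f}kbps, {hi_rez / mp3:.2f}x the MP3 rate")
print(f"16/44.1 PCM: {cd:.1f}kbps, {cd / mp3:.2f}x the MP3 rate")
```

By this reckoning the final MP3 carried about 1/22 of the starting data rate and about 1/7 of the rate of 16-bit/44.1kHz PCM.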

This was a cannily designed test. Not only was the fact that it was a test concealed from the listeners, but organizing the presentation so that the best-sounding version of the data was heard first, followed by progressively degraded versions, worked against the usual tendency of listeners hearing an unfamiliar system in an unfamiliar room: to like the sound more the longer they listen to it. The listeners in Philip's demo would thus become aware of their own cognitive dissonance. Which, indeed, we did.

Philip's test worked with his listeners' internal models, not with the sound, which is why I felt it elegant. And, as a publisher and writer of audio component reviews, I am interested only peripherally in "sound" as such (footnote 5); what matters more is the quality of the reviewer's internal constructs. And how do you test the quality of those constructs?

The Art of Reviewing

That 1982 test of LP vs CD preference forced me to examine what exactly it is that reviewers do. When people say they like something, they are being true to their feelings, and that like or dislike cannot be falsified by someone else's incomplete description of "reality." My fundamental approach to reviewing since then has been, in effect, to have the reviewer answer the binary question "Do you like this component, yes or no?" Of course, he is then obliged to support that answer. I insist that my reviewers include all relevant information because, as I have said, when it comes to someone's ability to construct his or her internal model of the world outside, everything matters.

For example: in a recent study of wine evaluation, when people were told they were drinking expensive wine, they didn't just say they liked it more than the same wine when they were told it was cheap; brain scans showed that the pleasure centers of their brains lit up more. Some have interpreted the results of this study as meaning that the subjects were being snobs: that they decided that if the wine cost more, it must be better. But what I found interesting about this study was that this wasn't a conscious decision; instead, the low-level functioning of the subjects' brains was affected by their knowledge of the price. In other words, the perceptive process itself was being changed. When it comes to perception, everything matters; nothing can safely be discarded.

In my twin careers in publishing and recorded music, the goal is to produce something that people will want to buy. This is not pandering, but a reality of life—if you produce something that is theoretically perfect, but no one wants it or appreciates it enough to fork over their hard-earned cash, you become locked in a solipsistic bubble. The problem is that you can't persuade people that they are wrong to dislike something. Instead, you have to find out why they like or dislike something. Perhaps there is something you have overlooked.

For the second part of this lecture, I will examine some "case studies" in which the perception doesn't turn out as expected from theory. I will start with recording and microphone techniques, an area in which I began as a dyed-in-the-wool purist but have since become more pragmatic.



Footnote 3: For a long time, I've felt that the difference between an "objectivist" and a "subjectivist" is that the latter has had, at one time in his or her life, a mentor who could show them what to listen for. Raymond was just one of the many from whom I learned what to listen for.

Footnote 4: This is a familiar problem in publishing, where it is well known that the writer of an article will be that article's worst proofreader. The author knows what he meant to write and what he meant to say, and will actually perceive words to be there that are not there, and miss words that are there but shouldn't be. The ideal proofreader is someone with no preconceptions of what the article is supposed to say.

Footnote 5: My use of the word sound here is meant to describe the properties of the stimulus. But strictly speaking, sound implies the existence of an observer. As the philosophical saw asks, "If a tree falls in the forest without anyone to observe it falling, does it make a sound?" Siegfried Linkwitz offered the best answer to this question on his website: "If a tree falls in the forest, does it make any sound? No, except when a person is nearby that interprets the change in air particle movement at his/her ear drums as sound coming from a falling tree. Perception takes place in the brain in response to changing electrical stimuli coming from the inner ears. Patterns are matched in the brain. If the person has never heard or seen a tree falling, they are not likely to identify the sound. There is no memory to compare the electrical stimuli to."

COMMENTS
GeorgeHolland

No, you may address me as George or Mr Holland. Georgie is an attempt at making fun but then again that's about all you know to do anyway.

Please ask a relevant question or just shut it.

The lamb would first have to prove they can hear a difference with a dbt

The shepherd can buy whatever he wants.

The big bad wolf tried to convince both that the expensive boutique amp was the one to buy

They told him to shove it and bought a less expensive but well built amp and they all lived happily ever after except for Mr Big Bad who soon folded his shop due to no sales.

ChrisS

So Georgie,

Let's make this question relevant....

Let's say we're testing two amplifiers with two listeners. The first listener is an 18 year old young lady who is trained in classical piano at a Grade 10 music conservatory level. She can hear that Amp A has an excellent range 16Hz-40kHz through the test system, but even though Amp B doesn't have the same range, she likes the "sound" of it better. The second listener is the shepherd boy who's grown up now. He's 54 years old, likes big band jazz, but his hearing has been damaged by working with heavy machinery without hearing protection. He can't hear a difference between the two amps.

Which amplifier should the ex-shepherd buy?

GeorgeHolland

The 18 year old can hear to 40kHz? Were her parents bats?

"Like" doesn't have anything to do with blind testing. You don't pick which one you "like" you see if you can tell WHICH amp is playing. You don't even know how a dbt test is run I can see already.

Who cares which amp they buy? Maybe the people selling them do but that is completely irrelevant to dbt. You have no clue as to what blind testing is all about.

ChrisS

Georgie,

Are you a real person or a computer generated figment from JRusskie's Russian clone of an old IBM PC? Do you know anyone with normal hearing? Do you know how real people shop or do you isolate yourself in the sanctuary of your closed mind and order everything on-line after reading extensive reviews in Consumer Reports?

ChrisS

You call those proper Double Blind studies?

GeorgeHolland

Oh so you are an expert on double or single blind studies. The ABX system is a proven dbt way to do things. Just because the results have you so upset, you claim the people doing the testing are doing it wrong? Laughable. Tell me some more jokes.

ChrisS

...Swinging down the street so fancy-free..."

In fact, Georgie, my major(s) for my undergraduate degree were in Developmental Psychology (including Perception) and Statistics (including Research Methodology). So yes, the set up and methodology shown in those links are crap and the results are laughable...

ChrisS

Get thee to a local college and enroll in a first year research methodology course. Have fun learning!

GeorgeHolland

Either address me as George or STFU you stupid little boy. I think you majored in being a twit and smart ass. Who can take anything you say as serious? Grow the fuck up already. You act out like a lil boy with the IQ of a rock.

Go tell the people who make the ABX test system what you just said and see how they laugh you out of the room. You bring nothing to this discussion other than what you don't agree with, with zero facts to back up your claims. Come on show us all how the ABX test methods aren't any good or why the blind testing done in the other link was faulty. Better yet tell Harman Kardon that their blind testing techniques are faulty and not worth doing.

ChrisS

Georgie,

You must be running out of neurons if you don't trust your own eyes and ears. Yes, the facts are out there.

JohnnyR

.is what SBT and DBT are all about. Using your eyes to test audio products? Well yes I can SEE that YOU would have to look so you would know which one is "better".

ChrisS

Whose ears? Why?

John Atkinson

GeorgeHolland wrote:
I find it sad that Stereophile keeps saying that DBT or even SBT are not a valid way to test those claims.

Please do not put words in my mouth. That is not what I have said. What I _have_ written is that to design a blind test that limits the variables to just that which you are interested in and that produces valid results when there is a small but real audible difference is complicated and time-consuming. The literature is full of poorly designed and performed blind tests that have been proclaimed by audio skeptics as "proving" that there are no audible differences. Such people demonstrate both their ignorance of the Scientific Method and their unquestioning faith in "Scientism."

John Atkinson

Editor, Stereophile

GeorgeHolland

JohnnyR was right more EXCUSES.

It's pretty simple Mr Atkinson but then having the will or gumption to put a dbt into the line of testing is the first thing you have to have,

You obviously don't have that or just don't care so it's a moot point anyways.

JohnnyR

.nor ever will when money is involved.

dalethorn

I would hate to see magazines and websites like Stereophile become intimidated by naysayers who demand "proof" of everything they say, in advance or after the fact. There's a lot of that in mainstream media, due no doubt to controversial topics and false information being fed to reporters. But come on, people - this isn't a mainstream news outfit reporting life or death stories. We have here an incredibly rich article full of facts that can be researched and questioned with references that are well established over time. Instead we have people questioning the author's motives or his pursuit of truth? I think people who are looking for "The truth" should be looking in a religious forum, not a hi-fi forum. There's very little you can "prove" on these topics - the value here is the very informed opinion that costs you nothing.

Ariel Bitran

nice comment Dale. as an attendee of this lecture, i can tell you it was surely enlightening.

GeorgeHolland

I think you have it all backwards there friend.....

"I think people who are looking for "The truth" should be looking in a religious forum, not a hi-fi forum. There's very little you can "prove" on these topics - the value here is the very informed opinion that costs you nothing."

Religion is based upon belief and subjectivists cling to their belief, they don't go looking for proof or the scientific method. No you can't prove very much when the reviewers use subjective say so instead of actually measuring the units. Opinions are a dime a dozen or even less than that and worth next to nothing. Just look at all the opinions here.

dalethorn

No, George, you don't get it. You're still stuck in religion, looking for proof of something. Here you get 'information' only, and if you want 'proof' of something, you have to do the work in proving it to yourself. What *you* believe, outside of yourself, is purely opinion. Perhaps all these people looking for truth or proof are just lazy, and trying to intimidate others into doing the work for them. Like bullies.

JohnnyR

......the facts there pal. Your "argument" is FLAT. The only "info" Stereophile shows us is what they WANT to show us. Cables, power cords, magic bowls are off the list of even testing them in anyway what-so-ever. The reason? Ohhhhh we really don't know how to test those duhhhhh. How lame an EXCUSE is that? Stereophile is SUPPOSED to be a magazine for information NOT excuses.

"Perhaps all these people looking for truth or proof are just lazy, and trying to intimidate others into doing the work for them. Like bullies."

BWAHAHAHAHAH!!! that has to be on my "Top 10 WTF Things of 2012"

You DIDN'T just say that did you???? So let me get this straight, Stereophile's job as you see it is to just fling out "say so" and it's up to the readers to wade through the muck and mire of those reviews to try and grasp one little bit of truth? Yeah right, you must really love it then because they rarely show any truth at all.

Ariel Bitran

why is it so hard to accept that double-blind listening tests are difficult to achieve as JA has explained in his lecture?

the fact that our existences are commandeered by individual perception based on thousands of variables makes it very easy for me to understand, just as how one person may enjoy spicy foods but not grapefruit or where some may hear too much bass and others not enough. so many VARIABLES!!! culture, upbringing, what sounds you are surrounded by, traffic signals, your genetic structure, your actual physical position when listening. perception is a learned skill that we do not choose to accept, it just happens and it is different for every single person.

i think THESE are the sort of differences between individuals that make DBT difficult: everyone hears differently. there is no absolute sound.

the best example of how an ear and sonic preference can change is in the study of language and sounds. the chinese language has a completely different set of sounds to that of the english language, thus their speaking intonation, laughter, and music reflect their cultural and sonic inclinations. eastern and western and andean and greek and celtic and ... and ... all use completely different scales based on their preferences of sound learned over time through language and their environments.

Thus, i often wonder do hi-fi listeners across the globe prefer different sounding systems based on their installed sonic memory? or is there a constant in terms of preference across the globe? probably not. or even more interestingly, can one find similarities in preferences in sound based on linguistic sounds of an individual region? are the frequencies accented in the german language more easily noticed by a german in his hi-fi? DBTs are a waste of time. instead of focusing why not, it is much more fun to focus on the why.

the heart of all of this lies within JA's question: where do the negative frequencies go? there are aspects to our perception of sound that simply cannot be measured because they are based on individual perception which is different for every single one of us.

GeorgeHolland

If blind testing is so difficult then how did the people that I linked to manage to do so?  Harman Kardon does blind testing at the drop of a hat. Go ask them how they do it so easily. Mr Atkinson's refusal to do so is simply an excuse as to not have to bring up why cables, power cords and other snake oil is indeed snake oil. He can merrily go along his way as he has for years now ignoring such products and letting his reviewers say whatever BS they want about the sham products and not have to worry one bit. He just doesn't care is the bottom line.

ChrisS

Georgie Porgie,

That you cite these links as authoritative sources indicates the level of your understanding of testing methodology.

JohnnyR

......nothing to back up whatever it is you are trying to say but it is amusing.

ChrisS

Research methodology courses are taught in colleges and universities all over the world, even Russia...

Let us know when you and Georgie take one.

JohnnyR

Nothing to cite other than your own wandering silly posts? Thought so.

Regadude

Well little Johnny, at least Chris is A REAL psychologist. He's not a, you know, a hobbyist like yourself...

JohnnyR

........credentials from ChrisSy just say so. Oh lets see I think I'll be a nuclear scientist now just because I say I am on the forums. There now it's a done deal. Besides how he acts out is more like a 3 year old than an adult. Some professional he is and tell us all again just what your expertise is? Trolling perhaps?

Regadude

...credentials from Johnny the hobbyist speaker designer. Let's see some pictures of your Johnny brand speakers! Post some pics, or provide a link to a site where we can see these speakers of yours.

I demand to see these speakers of yours! 

ChrisS

If JRusskie has a misguided and limited understanding of DBT, and....

Georgie has a misguided and limited understanding of DBT, then....

Are JRusskie and Georgie one and the same person?

Has anyone seen them in the same room together? Hmmmm.

Please, one of you (I guess it doesn't matter which...) ask Harman Kardon how they do their DBT's and how they use the results.

Thank you.
