RobertSlavin
The scientific method and this month's As We See It

I agree with Jim Austin about the need for introducing a more scientific approach to testing in the audio review field.

In July 2008 Stereophile posted an interesting interview with Kevin Voecks, the chief designer for Revel loudspeakers. Among other things he said:

[Double-blind] listening tests over the past 10 years have taught us [at Revel] one other thing. Above the midprice range of loudspeakers, there is no correlation between the sound quality and the loudspeaker's price. Although many high-priced loudspeakers do perform adequately in our listening tests, the most expensive speaker in a given double-blind listening test may be the least preferred by our listening panel.

--http://www.stereophile.com/interviews/608kev/index.html

In his recent book Sound Reproduction, Floyd Toole relates the experience of Harman International loudspeaker researchers with double-blind testing. They found that people ranked speakers differently when they could see them. He also notes that many expensive and very well-reviewed speakers rate only so-so when tested under double-blind conditions; these speakers generally also have measurable shortcomings. (pages 357-362, 396-398)

While double-blind or even simple blind testing is difficult, Stereophile should make an effort to do these sorts of tests in some circumstances. I believe Stereophile's late founder, J. Gordon Holt, called for blind testing in an interview with Editor John Atkinson published as an As We See It a few years ago.

Catch22
I agree that it would be entertaining

But that's about as far as its worth would go. Sure, large and obvious differences would easily be discerned, but there are just too many subtleties that take long listening sessions over long periods of time to reveal themselves, and that would make a blind test impractical.

Oftentimes, when I'm comparing components that I'm considering for upgrades to my system, I will go many days or weeks before a certain song or a certain recording that I'm very familiar with suddenly sounds different. This could be after hearing many other recordings over that period that didn't reveal a certain improvement or distraction. And that's when the "clue" leads me to compare with more carefully chosen music.

Anyway, the bottom line is that it would be great to poke fun at the reviewers who have to endure this sort of thing, but it wouldn't be a very valuable tool for people who genuinely want a thorough review of a particular product.

Audio_newb
This topic should get an entire issue

An interesting read that, at least to me, fell a little flat in its aim. But let me unpack that statement a bit and then add to the above comments. Yes, there are many audiophiles who seem skeptical of the scientific method. And perhaps an equal share who are skeptical of audiophile manufacturers and the audiophile press for not being scientific enough. After finishing the article, I was still at a loss as to who its target audience was.

I love me some science and I know that the editors at Stereophile do as well. I would never accuse them of shoddy testing methodology or have them change their evaluation methods. Having said that, the audiophile community as a whole certainly has a bit of a hill to climb in justifying itself scientifically to the non audiophile. Some of this has to do with products (stones, blocks, anything with quantum in the name) seen by many as snake oil, much of it has to do with a certain populist resentment at luxury goods, and I suspect much of it has to do with a near universal refusal to conduct and report double blind tests.

Few things rile audiophiles into a frenzy like the concept of the double-blind test, but I for one would love nothing more than an issue dedicated to it. Specifically, there are several topics of concern I would love to see explored. The first goes back to the oft-repeated "when in doubt, trust your ears." At least in my case, however, these are connected to my brain (and eyes), which sometimes act suspiciously. I'm more suspicious still after having read Malcolm Gladwell's "Blink," which I would highly recommend. But back to the matter at hand: it is well documented that not only do we trust our eyes more than our ears, but our brains often play tricks on us. Testing the power of psychology versus the power of what we think we can hear would be not only profoundly interesting, but critically useful for an audiophile. How does my perception of sound change when I think I'm listening to an expensive system? Or when I think I'm listening to a bookshelf speaker versus a floorstander?

Secondly, it would be interesting (and useful) to explore the concept of the golden ear. Certainly some of us have more acute hearing than others. Furthermore, I'm certain that those who are trained to listen for the slightest of audible differences (a Stereophile editor, for instance) have what we would consider much keener hearing than the audio layman. But how great are these differences?
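An aside on how such a question might actually be tested: psychoacoustics labs usually turn "how great are these differences" into a threshold measurement using an adaptive staircase. The sketch below is purely illustrative; the listener is simulated and every number is invented, not anything Stereophile does.

```python
# Toy "golden ear" threshold test: a 1-up/2-down adaptive staircase that
# converges near the level difference a listener detects ~71% of the time.
# The listener is simulated; procedure and numbers are illustrative only.

def staircase(can_hear, start_db=3.0, step_db=0.25, n_reversals=8):
    """Estimate the smallest detectable difference (dB) as the mean
    level at the points where the track changes direction."""
    level, streak, turns, last_dir = start_db, 0, [], None
    while len(turns) < n_reversals:
        if can_hear(level):                 # correct response
            streak += 1
            if streak == 2:                 # two in a row: make it harder
                streak = 0
                if last_dir == "up":
                    turns.append(level)     # direction change = reversal
                level, last_dir = level - step_db, "down"
        else:                               # miss: make it easier
            streak = 0
            if last_dir == "down":
                turns.append(level)
            level, last_dir = level + step_db, "up"
    return sum(turns) / len(turns)

# An idealized listener with a hard 1.0dB detection threshold:
print(staircase(lambda db: db >= 1.0))  # → 0.875
```

The 1-up/2-down rule is a standard transformed staircase; with a real listener the responses are noisy, so many reversals would be averaged over repeated runs.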

And lastly, some plain old-fashioned double-blind testing on components, compression rates, etc. Truly interesting would be to retest gear (unbeknownst to those in the experiment) that the listeners have already evaluated, to see whether the double-blind results come out the same.

I won't say that long listening sessions and other concerns are not relevant to auditioning audio gear. But an in depth discussion on psychoacoustics is long overdue. I've no claims to any side in the discussion (other than that of a rigorous scientific methodology), but am truly curious to see the results that would follow.

JimAustin
A different use for blind testing

Robert,

Thanks for your note. The points you make are sound. But I thought I would point out that the application of blind testing you're describing, where listener preference is tested under conditions in which (e.g.) speaker size or amplifier faceplates can be factored out, is quite different from the kind of testing I was referring to in the column. My goal would be much more modest than trying to figure out which loudspeaker was better; I was suggesting using such testing simply to confirm that a difference was heard. Consider it the first, hugely important step toward credibility.

Many of the posts on that other hi-fi forum make it clear that many audiophiles refuse to accept the well-established scientific fact that we're very suggestible, that we routinely hear things that aren't there. I think we can overcome this, at least partially, with training. But never completely.

Some effects that we hear are very subtle. But others (if we writers are to be trusted) are quite large; we express a great deal of certainty. If some of these effects are as large as we sometimes say they are (and if they're real), we should have no trouble discerning a difference under blind testing conditions. It's not something we need to do routinely. We just need to establish, once and for all, under controlled, audited, rigorous conditions, that we can hear the difference: between, say, two sets of speaker cables (with near-identical electrical characteristics), or a gold and a standard CD "pressing" of the same recording. Do it once, show it can be done (and therefore that the difference is real and not imagined), and then happily go back to our subjective testing.
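Jim doesn't spell out a protocol here, but this kind of one-shot confirmation is typically scored as a forced-choice trial sequence, ABX for example, with the listener's hit count checked against chance. A minimal sketch with hypothetical numbers:

```python
# Sketch: scoring a blind "can you hear it at all?" test.
# All numbers are hypothetical; this is not a Stereophile protocol.
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided p-value: chance of getting at least `correct` right
    out of `trials` if the listener is only guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# A listener who truly hears a large difference should clear this easily:
print(round(abx_p_value(14, 16), 4))  # → 0.0021
```

Sixteen trials is arbitrary; the point is only that a listener who genuinely hears a "large" effect should clear the chance line easily, while a guesser almost never will.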

Of course, many of the most radical anti-subjectivists still won't believe the result; they'll assume fraud. But many less ideological audiophiles will be convinced: speaker cables really DO sound different, because the trained listeners at Stereophile showed that they do.

Of course, such testing carries a substantial risk of failure, and the consequences of (especially public) failure could be severe.

Jim

Drtrey3
No interest whatsoever

in blind listening tests. I have done research and am trained in science; I can read a scientific journal. The articles I have read that reported on blind listening tests were dry and boring. So please, none of this garbage for me. I am quite entertained and informed by the wonderful subjective and objective reporting that currently informs Stereophile.

Trey

Stephen Mejias
impressive
Audio_newb wrote:
Testing the power of psychology versus the power of what we think we can hear would be not only profoundly interesting, but critically useful for an audiophile. How does my perception of sound change when I think I'm listening to an expensive system? Or when I think I'm listening to a bookshelf speaker versus a floorstander?
This is something I think about from time to time. I know that I am generally less impressed by expensive systems, but I do wonder if my expectations drop when I'm confronted by less expensive gear. I constantly remind myself to focus on the music and my emotional response to that music.

Recently, I was at JA's house, listening to some music. When I walked into the listening room, I simply sat down in the listening chair and put my head down. I knew that the electronics were all first-rate, but I wasn't sure which speakers were in the room -- I had assumed that they were something massive and crazy-expensive. But, when I finally looked up, I found that JA's original pair of 1978 Rogers LS3/5A were (at least, partly) responsible for the awesome stereo imaging and soundstaging.

That was impressive.

Audio_newb
Another JA story

Here's another recent JA story for you. Also quite unscientific, but to my mind amusingly interesting. I was fortunate enough to sit in on one of John's presentations at a Music Matters event in Seattle. The speakers were the quite impressive GoldenEar Triton Twos, and John was playing a choral piece that he had mastered at 24-bit/88.2kHz. Or so we thought. In reality the piece had been stepping down from 24/88.2 all the way to a 128kbps MP3 as it played. John even played the beginning again and asked what we thought. Nothing, nada. To be fair, I'm sure this wasn't the result in all of the sessions, but it gave me a chuckle.

Now don't think I'm claiming 24/88.2 sounds the same as a 128kbps MP3, because I don't think it does. (A recent Gizmodo article claiming that possible hi-def downloads on iTunes would be bad for consumers made me ill. Well, not really, but I'm all for progress.) What the demo did show me, however, was the importance of context in evaluating audible differences. I'm sure John never expected us to fail his test, and I certainly wouldn't fault him if he had said the differences were stark to him. But played as one uninterrupted piece, with all of us enjoying the equipment and ambiance, the differences were lost on us. Such is the power of the mind and the power of the ear.

absolutepitch
Listening and testing
Audio_newb wrote:

... Having said that, the audiophile community as a whole certainly has a bit of a hill to climb in justifying itself scientifically to the non audiophile. Some of this has to do with products (stones, blocks, anything with quantum in the name) seen by many as snake oil, much of it has to do with a certain populist resentment at luxury goods, and I suspect much of it has to do with a near universal refusal to conduct and report double blind tests.

... But back to the matter at hand, it is well documented that not only do we trust our eyes more than our ears, but our brains often play tricks on us. Testing the power of psychology versus the power of what we think we can hear would be not only profoundly interesting, but critically useful for an audiophile. How does my perception of sound change when I think I'm listening to an expensive system? Or when I think I'm listening to a bookshelf speaker versus a floorstander?

Secondly, it would be interesting (and useful) to explore the concept of the golden ear. Certainly some of us have more acute hearing than others. Furthermore, I'm certain that those who are trained to listen for the slightest of audible differences (a Stereophile editor for instance) have what we would consider much keener hearing then the audio layman. But how great are these differences?

And lastly, some plain ole fashion double blind testing on components, compression rates, etc. Truly interesting would be to retest gear (unbeknownst to those in the experiment) that has already been tested by those listening to see if the double blind results are the same.

...

I hope I have quoted and excerpted your post correctly.

Those who promote products that others may classify in the 'snake oil' category have the responsibility to demonstrate that the difference is audible in well-designed, scientifically controlled tests. IMHO, too often their rebuttals lack sufficient evidence. If they are reporting what they hear, then that's fine; it's their opinion. But when that opinion is held up as fact, and those who do not hear audible differences are called various names, it does not further discussion or promote confidence in their products.

Your 'Golden Ear' topic would be interesting. I'm not sure how we would go about judging that, should a test be performed.

We read, in nearly every issue, equipment reviews where the reviewer mentions that this component sounds a little 'dry' compared to another one, or more 'liquid,' or uses other such terms. If that's what they hear, then that's certainly O.K.; I report what I hear to others (in this forum too). But how does a reader take that information? Was the difference really there, and was it due to the equipment's interaction with other components? If it was interaction, the sound may not be inherent to that piece of gear but may vary with different combinations of gear. A double-blind test may pick out what a sighted test already told us in this case, but it does not settle the claim that 'what was heard' is the sound of the equipment under review, when it may not be due purely to that particular equipment.

IMO, this topic is not an easy one to cover in a commercial publication. Look at how much work Dr. Toole described in his book. I can see myself designing and performing tests just as Dr. Toole did. But would they be as rigorous as Dr. Toole's work? No, because I am not expert or experienced enough in this field to do audio research of that caliber. But I can report my opinion and make careful claims (or conjectures) with the caveats clearly stated.

Glotz
Stereophile already did this in 1992...

The extensive group loudspeaker review that JGH participated in was inconclusive at best. They provided a TON of information: ratings by the staff over several days, with different program material. Each reviewer's findings for the same speaker changed from day to day, leaving no conclusive information whatsoever. Every reviewer disagreed with the others, reminding us that this IS ALL SUBJECTIVE.

The biggest problem is the listening material. Every recording is utterly different (and also changes as it moves through time: bass parts, dynamic movements here and then there), and each speaker sounded subjectively more or less attractive as the recordings changed. It seems obvious when stated, but it means a judgment about a speaker holds only for the ONE recording (and really the one system of components influencing that recording as well) used to make it. To be objective about those findings, one can only draw conclusions about THOSE particular recordings being served up, and nothing more.

Any other recording hypothetically introduced after those listening sessions would have scored differently, and no consistent law of averages or conclusions about those speakers could be drawn. Reminder: we're listening to music, not test tones, and it changes constantly, not to mention our less-than-objective heads, with fatigue, attitude, time of day, daily power-grid issues, etc. There are simply WAY too many variables in this blind-panel method to come up with solid conclusions, and the reviewers' scores attest to this.

Keeping the scientific method in the subjective review process DOES have huge value (and Mr. Fremer proves this monthly), but only in the context of ONE system, and to ONE reviewer. The review is merely a guidepost for us to seek out what WE will find individually for our subjective selves, and nothing more. If we find differently, it DOES NOT invalidate what MF heard, or AD, or EL, or JA. It was a report of THEIR SYSTEM, and THEIR EARS!!!!! The infusion of the scientific method in the subjective review process is critical, but to THAT INDIVIDUAL LISTENER ONLY.

This is why, when MF pans a particular cable, I don't freak out and curse him to hell; I remind myself that this is what MF heard in HIS SYSTEM. If he says the cable is sucked out in the midrange, maybe MY system could use a dose of reticence instead of midrange presence. My ears and needs for my system are DIFFERENT. Perhaps these cables are still vastly better than the ones I used to have, and the crutch they provide might be a better fix for me than another cable. BUT, the most important thing we can pull from reviews is how a given product TENDS to sound, so we can listen for those qualities (or flaws) ourselves. I can't hear it? Keep listening. I still don't hear it? Tough shit: find a way TO hear it, with a change in system components or media. Or just try losing one's sense of jaded mistrust. THAT goes a long way.

How can a purely 'objective' process be successful, and to what end? It can't, other than in the pure vacuum of that moment or assembled system.

All objectivists (much like politicians) are selfish fools trying to pretend they are right.

Audio_newb
Some more thoughts

Having read the above posts and given the topic some more thought, I'm prepared to wade into the breach again. I fear I might ruffle some more feathers (gently, I hope), but bear with me. In my initial post I mentioned Malcolm Gladwell's book "Blink"; let me elaborate on that reference a bit. At one point in the book, Gladwell explores the work of professional food tasters. Note that this is not the same as a food critic, although their skill sets are not altogether dissimilar. Where a food critic focuses on the big picture, however, the food taster is more concerned with minute fluctuations between very similar foods. They are primarily employed by large manufacturers of consumer foods to help refine their products. As such, the language they use is, out of necessity, much more precise. Gladwell describes dimensions of flavor, texture, appearance, scales of taste (sweet, sour, salty, spicy), and several other measured qualities.

Ask a food taster whether they like vanilla or chocolate ice cream better and you will get a range of answers. The question is subjective: there are as many differing tastes as there are food tasters. Ask them to describe a specific vanilla ice cream, however, and you may be shocked to find how similar their descriptions can be. Furthermore, chemists and chefs are keenly aware of the makeup of their foods. There are subtleties to their craft they must learn, but a skilled chef can alter sweetness and texture to their liking.

That's all well and good, you say, but what does this have to do with audio? I would suggest that the gourmand and the audiophile are not wholly dissimilar creatures, nor are the objects of their affections. Food and music are subjective in that each of us responds to identical stimuli in differing ways, but that does not change the fact that in both cases the stimulus, whether it be a glass of wine or a classical concerto played through a specific system, is a measurable constant.

Audio reviewers also use specific language to describe what they hear, evaluating gear on soundstaging, tonality, frequency response, timing, detail, and many other dimensions. But I believe the effort is a bit more piecemeal here. Some reviewers focus more on the big picture, how gear makes them feel, and I can hardly blame them; we are in this to enjoy the music, after all. Also, one gets the impression that there is little benchmarking among reviewers (although perhaps this reflects a lack of knowledge on my part): it's not clear that they could get together, listen to a certain piece on a specific system, and agree that this is a 10 on dynamic range but a 5 on soundstaging.

Perhaps, though, this is harder for sound than for food. There may very well be greater subjective variance in how we hear than in how we taste. This, however, raises my other point: as technology has improved, have we perhaps not made the fullest use of measurement equipment? This past year there was an interesting announcement from Nordost and Vertex on a new measurement system. I've heard little about it since, but the time seems right for a new generation of measurement techniques. For if there are audible (and measurable) differences in cables and stands, as I believe to be the case, then certainly the same holds true for components.

As the previous poster noted, it's hard to make judgments about how a specific component will behave in a different system. But perhaps this needn't always be the case. If there is value in measuring equipment on the test bench, it seems there would be as much value, if not more, in measuring changes in the system using a sophisticated microphone array at the listening position. By comparing listening tests with such precision measurements, a clearer picture could emerge as to how, for instance, the time-domain behavior of a system goes into creating a stable and coherent image. That knowledge might then be used to look at differences between specific components, or at how much of a part the room plays versus the components themselves. What are the sonic signatures that give tube amps their sound? How does jitter manifest itself in the final sonic signature?
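The mic-array idea is speculative, but one ingredient of it, recovering a time offset from two captured signals, is routine signal processing. A sketch with synthetic signals; no real measurement rig or product is implied:

```python
# Sketch of one ingredient of the mic-array idea: recovering the time
# offset between two captured channels by cross-correlation.
# Signals are synthetic; no real measurement rig is implied.
import numpy as np

def estimate_delay_ms(a: np.ndarray, b: np.ndarray, fs: int) -> float:
    """Delay of b relative to a, in milliseconds (positive = b lags)."""
    corr = np.correlate(b, a, mode="full")
    lag = int(np.argmax(corr)) - (len(a) - 1)
    return 1000.0 * lag / fs

fs = 48_000
t = np.arange(fs) / fs                        # one second of "capture"
click = np.exp(-((t - 0.1) * 2000.0) ** 2)    # sharp transient at 100ms
delayed = np.roll(click, 48)                  # same transient 1ms later
print(estimate_delay_ms(click, delayed, fs))  # → 1.0
```

Swapping components while holding the mic positions fixed would, in principle, let such timing differences be compared; the hard part the poster alludes to, relating them to what we hear, remains open.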

Some will argue that such an exploration is by necessity dry and boring, but I don't think that need be the case. I think that by combining listening tests with measurements you can give description to the numbers and graphs, making each more valuable in the process. I eagerly await such an exploration.

Catch22
Now that, I agree with completely

Remove the "gotcha" aspect of designing these sorts of tests and you have something that can be a reasonable start.

dlb
Great Idea!

Well said, Audio_newb. A special issue evaluating an exhaustive list of Class A components, using a double-blind panel of randomly selected audiophiles from the Stereophile collective. I, for one, am ready... Engage!! There will be beer and snacks, right?

Drtrey3
as far as I am concerned

this is a bad idea. It would be boring.

The scientific method is really a great tool, but it requires measurements. Rankings really don't cut it, words don't cut it, ya gotta have numbers for it to really work.

Trey

geoffkait
The scientific method and measurements
Drtrey3 wrote:

The scientific method is really a great tool, but it requires measurements. Rankings really don't cut it, words don't cut it, ya gotta have numbers for it to really work.

Trey

Actually, that's not true. The scientific method seeks truth in any fashion; it can use measurements, but other means are also available, such as our senses, i.e., hearing and sight. Also, measurements can be deceptive at times and might not lead to the truth of the matter. Case in point: the Japanese amplifiers of the 1970s and 1980s that measured exceptionally low THD (total harmonic distortion) but generally sounded "bad" compared to tube amplifiers of the era, whose measured THD was generally much higher, in fact orders of magnitude higher!
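For scale, THD is simply the energy in the harmonics taken relative to the fundamental. A toy computation on a synthetic waveform, with all values invented for illustration:

```python
# Toy THD computation on a synthetic signal (all values invented).
import numpy as np

def thd_percent(signal: np.ndarray, fs: int, f0: float, n_harm: int = 5) -> float:
    """RMS of harmonics 2..n_harm relative to the fundamental, in percent."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    bin_of = lambda f: int(round(f * len(signal) / fs))
    fund = spectrum[bin_of(f0)]
    harm = np.sqrt(sum(spectrum[bin_of(k * f0)] ** 2 for k in range(2, n_harm + 1)))
    return 100.0 * harm / fund

fs, f0 = 48_000, 1_000.0
t = np.arange(fs) / fs                       # exactly 1s of signal → 1Hz FFT bins
wave = np.sin(2*np.pi*f0*t) + 0.01*np.sin(2*np.pi*2*f0*t)  # 1% 2nd harmonic
print(round(thd_percent(wave, fs, f0), 3))   # → 1.0
```

The "orders of magnitude" point is about exactly this number: 0.01% versus 1% is two orders of magnitude, yet the figure by itself says nothing about which device sounds better.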

Furthermore, a predicament has arisen in the past 20 years or so, especially in the last five years, concerning the so-called outrageous tweaks, some of which JA referred to in "As We See It."

The predicament can be stated thus: consider highly dubious tweaks such as the Intelligent Chip, the Schumann-frequency generator, the "tiny bowl" resonators, clocks that plug into the wall (Tice) or do not plug into the wall (Clever Little Clock), and the Red X Coordinate Pen, among others; in particular, the devices that apparently DO NOT AFFECT THE AUDIO SIGNAL anywhere in the system (not the power from the wall, not the audio signal in the electronics or cables, and not the acoustic waves arriving at the listener's ears). HOW can measurements even be made? Or, if measurements might be made, as in the case of the Intelligent Chip, which ones? Does this mean the scientific method is flummoxed?

Geoff Kait
Machina Dynamica

Drtrey3
Actually,

it is spot on true mate!

The good statistics that can be used for prediction utilize real, hard numbers. Not rankings, not ratings, no Likert scales: real, hard numbers with a good, reliable zero.

As a licensed clinical psychologist, I had to take lots of graduate stats. I was also exposed to "measurements" of every kind, including sand tray assessments and extensive individual evaluations that used no psychometrics whatsoever. I actually appreciate correlational studies and such for what they are, but they are not strong math.

Now I am happy to do comparative listening, sounds like a party! But it is hyperbole to call it hard science.

Cheers, and listen to some tunes! I have Rise and Shine by The Bears going. I can recommend it.

Trey
