Blind Listening
As suggested in one of this month's "Letters," I had actually considered restaging the 1985 "Carver Challenge," where a Carver M1.0t could be compared with a pair of Conrad-Johnson Premier Fives. In the event, however, I decided that as Stereophile had just bought a recent Adcom GFA-555 to compare with an early sample (footnote 3), it would be one of the test amplifiers. (The $750 '555 has always been a safe Class C recommendation in Stereophile's "Recommended Components.") The other amplifier, as different in design as possible, would be the similarly powerful VTL 300W monoblock ($4900/pair), which had sufficiently impressed J. Gordon Holt last October to merit a Class A recommendation.
As well as comparing what the magazine's reviewers consider a sound beyond criticism with one that is excellent at the price, the test would thus contrast solid-state with tube; dual-mono construction with a power supply shared between the channels; and an affordable with a cost-no-object component.
The methodology involved paired comparisons with a forced choice. The listeners would hear a piece of music played twice; the amplifier would either be the same for both presentations or different. At the end of the second presentation, each listener had to decide whether the amplifier had remained the same or not and mark the score sheet accordingly. If they could not hear a difference, then the answer had to be "The Same"; otherwise it was "Different." Not marking the score sheet was therefore not an option.
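The chance baseline implied by this forced-choice design is worth making explicit: a listener who hears no difference at all and simply guesses will still be "right" about half the time. A minimal Python sketch (the trial count and seed are invented for illustration, not taken from the actual test):

```python
import random

# Hypothetical illustration: a listener who cannot hear any difference and
# guesses "same" or "different" on every presentation scores ~50% correct
# in a forced-choice design, purely by chance.
rng = random.Random(0)
trials = 10_000
presentations = [rng.choice(["same", "different"]) for _ in range(trials)]
guesses = [rng.choice(["same", "different"]) for _ in range(trials)]

correct = sum(g == p for g, p in zip(guesses, presentations))
chance_rate = correct / trials  # hovers around 0.5
```

Any real discrimination ability therefore has to show up as scoring significantly *above* this 50% floor.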
It is essential in such a test to reduce the questions being asked of the listeners to just one: "Was there a difference?" Otherwise, the results will not be reliable. For example, if you ask the listeners to identify what they think they heard—"Was Amplifier A the Adcom or the VTL? Was Amplifier B the Adcom or the VTL?"—you are now asking them to perform three tasks under blind conditions: to decide whether the amplifier was the same or different, but also to assign an identity to each of the two presentations. This, in my experience, is sufficiently complicated that the resultant stress reduces the listeners' scoring to no better than they would achieve by chance alone. You have first to make a hypothesis concerning the identity of the first amplifier. Then, when you hear the music repeated, you have to make a second hypothesis about the second amplifier's identity and test it against your memory of the first. All the time you are doing this, the one thing you are not doing is listening.
Similarly, other tests have asked the listener "Which is better, A or B?," which again produces random results because "better" is a subjective parameter that varies from person to person. Over a long series of tests, listener X might consistently feel that amplifier A was better than amplifier B, while listener Y might consistently feel that B was better than A. Without question, both reliably heard a difference between the two amplifiers, but lump their scores together and the result could be presented as "No, they didn't hear a difference. Amplifier A was preferred as often as Amplifier B." You might think this example trivial, but I have taken part in tests apparently producing null results that featured exactly this kind of illogical absurdity (or should that be "absurd illogicality"?).
In addition, using preference as a probe question leaves the listener the option of not making a decision if he or she had no preference either way, raising the overall "noise" level in the results. (True identifications become submerged in a sea of genuine non-identification and deliberate non-differentiation.)
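The pooling fallacy described above is easy to demonstrate numerically. An invented toy example, not data from the actual test: two listeners who each hear a perfectly reliable difference, but with opposite preferences.

```python
# Listener X prefers amplifier A on every trial; listener Y prefers B.
trials = 20
listener_x = ["A"] * trials
listener_y = ["B"] * trials

# Individually, each listener is perfectly consistent...
consistency_x = listener_x.count("A") / trials  # 1.0
consistency_y = listener_y.count("B") / trials  # 1.0

# ...but pool the votes and the preference vanishes into a 50/50 split,
# which a naive analysis would report as "no audible difference."
pooled = listener_x + listener_y
share_a = pooled.count("A") / len(pooled)       # 0.5
print(consistency_x, consistency_y, share_a)    # 1.0 1.0 0.5
```

The "same or different?" question sidesteps this, because consistent discrimination counts as a hit regardless of which amplifier a listener happens to prefer.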
The question as to whether the amplifiers would be the same or different for each presentation was decided by a table of random numbers generated by computer. In addition, Will Hammond decided which amplifier would be presented first, when they were different, by flipping a coin. Thus the presentations would be random, and there would be no trends to give listeners additional clues. There was one departure from true randomness which should be explained: whenever the random-number table indicated that a session of seven presentations would contain six or seven "differents" or "sames," I discarded it. Though it is quite possible to get a series of six or seven "heads" by chance, I felt that this would confuse the listeners too much and that they would stop trusting their ears and start to guess. Think about it: if you took part in a listening test and got six presentations in a row where the amplifiers always appeared to be the same, wouldn't you start to doubt what you were hearing? Again, my motive was to reduce stress on the listeners, to prevent them from trying to second-guess what they were hearing.
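The schedule construction described above—random same/different draws, with any session containing six or seven identical decisions discarded, and a coin flip deciding the order of "different" pairs—can be sketched like this. This is a hypothetical reconstruction in Python, not the actual random-number table used at the event:

```python
import random

def make_session(rng, n=7):
    """Draw a session of n same/different decisions, rejecting any draw
    with six or seven of one kind (the one departure from randomness)."""
    while True:
        session = [rng.choice(["same", "different"]) for _ in range(n)]
        if max(session.count("same"), session.count("different")) <= 5:
            return session

def assign_amplifiers(session, rng):
    """For 'same' trials, one amplifier plays both times; for 'different'
    trials, a coin flip decides which amplifier comes first."""
    pairs = []
    for decision in session:
        if decision == "same":
            amp = rng.choice(["Adcom", "VTL"])
            pairs.append((amp, amp))
        else:
            first = rng.choice(["Adcom", "VTL"])  # the coin flip
            second = "VTL" if first == "Adcom" else "Adcom"
            pairs.append((first, second))
    return pairs

rng = random.Random(1989)  # arbitrary seed for illustration
schedule = assign_amplifiers(make_session(rng), rng)
```

The rejection loop guarantees each seven-trial session contains between two and five of each decision type, which is exactly the constraint the text describes.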
Overall, the 56 separate presentations split up into 30 where the amplifiers were different and 26 where they were the same (13 VTL/VTL, 13 Adcom/Adcom). Of the 30 "different" presentations, 17 were Adcom/VTL, 13 were VTL/Adcom.
The room available to us at the show venue, the Dunfey San Mateo hotel, could hold about 55 seated people without crowding; as it turned out, the tests were so popular that we had to squeeze in additional people who were content to stand or sit on the floor (footnote 4). Since only about 10% of the listeners would have received any semblance of a stereo soundfield, and the room was particularly lively, especially in the upper midrange and treble, I felt that differences in soundstaging performance between the test amplifiers would not have contributed to any subjective differences. In addition, there was often a problem with breakthrough from the adjacent room, despite Jeff Rowland keeping his sound pressures to mainly reasonable levels.
The loudspeakers used were B&W Matrix 801s (footnote 5), mounted on Arcici's dedicated stands (filled with dry sand), both to give listeners at the rear a chance of hearing the contribution of the tweeter/midrange heads and to give the smoothest mid-to-upper bass transition. For source, we used CD exclusively, with first an Adcom GCD-575 player giving service, then a Marantz CD-94 after the Adcom decided that it didn't want to play in our game anymore. The CD players were connected via a 1m pair of Monster M1000 interconnects to a Hafler Iris preamplifier (the actual sample that I reviewed last month). I chose to use the Hafler because it has a reasonably neutral line section, but more importantly, because it features a superb IR remote with what is effectively an analog volume control. The line-level bass equalizer for the B&Ws was inserted in the Iris's External Processor Loop with short lengths of MIT 330 cable.
The Hafler was connected with a 1m pair of AudioQuest Lapis interconnects to a small aluminum splitter box so that both the power amplifiers under test would be driven all the time—plugging and unplugging power amplifier inputs fitted with phono jacks is a certain route to sudden amplifier and/or loudspeaker death. The splitter box had one pair of direct outputs, to feed the less sensitive of the two amplifiers, and another pair which could be attenuated with a conductive-plastic Bourns stereo 25k potentiometer. In this manner, I could match the output levels of the two amplifiers to within the resolving power of the digital AC voltmeter (one part in 4000) without compromising the sound quality too much. The only experimental difference between the two test situations would be the presence of the audiophile-quality pot in the feed to the VTL. Otherwise, all cables and the number of contacts would be identical.
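To put that meter resolution in perspective: one part in 4000 corresponds to a worst-case residual level mismatch of roughly 0.002dB, far below the ~0.1dB figure usually quoted as the threshold for audible level differences. (A back-of-envelope check; the 0.1dB audibility figure is a common rule of thumb, not from the text.)

```python
import math

# The digital AC voltmeter resolves one part in 4000, so the residual
# voltage mismatch between the two amplifiers is at most about one count.
resolution = 1 / 4000
worst_case_db = 20 * math.log10(1 + resolution)
print(f"{worst_case_db:.4f} dB")  # prints "0.0022 dB"
```

With level differences this small, any preference the listeners expressed could not plausibly be attributed to one amplifier simply playing louder than the other.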
Footnote 3: Robert Harley will be reporting on this comparison between old and new Adcom GFA-555s in a future issue of Stereophile.
Footnote 4: Richard Lehnert and Robert Deutsch did an exceptional job, in my opinion, in handling the people-moving with the minimum of fuss and bother.
Footnote 5: My thanks to Discrete Technology's Sal DeMicco and to Scott Rundle of B&W, who effected an eleventh-hour crossoverectomy on one of the Stereophile-owned speakers which had gotten damaged in shipping.