Watching the Detectives
I didn't know the listening room, but I've loved Handel's Messiah since I made my debut as a boy soprano in a 1959 school performance; the DVD-Audio recording that engineer Philip Hobbs had burned from one of his SACD masters (footnote 1) was living up to my expectations.
The second refrain began. I shifted in my seat. The sound now wasn't quite as spacious as in the first refrain, not quite as silky smooth in the highs. Perhaps, as I grew more familiar with the room, I was beginning to recognize its defects. But it was the Center for Digital Music's listening room at London's Queen Mary University, and it had been designed by experienced acousticians; I wasn't expecting there to be any obvious defects.
We were now listening to the chorus' third refrain. This was strange. My initial impression had been of a wide, deep, well-defined soundstage, with superb separation between the images of the singers and instruments. What I was now hearing was flatter and thicker. I was sure I hadn't been wrong, but there was no doubt that my initial impressions had been way too optimistic, colored perhaps by my respect for Philip's work in the excellent-sounding SACDs he's engineered for Linn Records.
Fourth refrain. Okay, this was getting silly. Why hadn't I noticed that coarse quality in the upper midrange before? And the strings are just shrill and grainy. This really is a poor recording. I glanced to my sides. My fellow listeners appeared to be equally uncomfortable with what they were hearing.
The music stopped. "What did you think?" asked Philip.
We tried to be polite, but it was clear that all of us had had a similar experience: despite our initially positive reaction, the sound of Philip's Messiah had ultimately proved very disappointing.
Then I noticed that Philip was grinning. It turned out that we'd been unwittingly involved in a blind listening test. The DVD-A was a ringer. Philip had chosen a Handel chorus in which the same music is heard four times. He had prepared four versions of the chorus—the original 24-bit/88.2kHz data transcoded straight from the DSD master; a version sample-rate–converted and decimated to 16/44.1 CD data; an MP3 version at 320kbps; and, finally, an MP3 version at 192kbps—and spliced them together in that order. The last three versions had been subsequently upsampled back to 24/88.2 so that the DAC's performance would not be a variable. The peak and average levels were the same for all four versions; the only difference we would hear would be the reductions in bandwidth and resolution.
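For readers who want to see what "the same peak and average levels" means in practice, here is a minimal sketch of such a check. It is not Philip's actual procedure, which the demo did not disclose. It assumes each version is available as a list of non-silent samples normalized to the range -1.0 to +1.0, treats RMS as the "average" level, and uses a 0.1dB tolerance; the function names and the tolerance are all hypothetical choices made for illustration.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (assumes non-silent input)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak)

def rms_dbfs(samples):
    """RMS ("average") level in dB relative to full scale (assumes non-silent input)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

def levels_match(versions, tolerance_db=0.1):
    """True if every version's peak and RMS levels agree within tolerance_db.

    The 0.1dB default is an assumed tolerance, not a figure from the demo.
    """
    peaks = [peak_dbfs(v) for v in versions]
    rmss = [rms_dbfs(v) for v in versions]
    return (max(peaks) - min(peaks) <= tolerance_db
            and max(rmss) - min(rmss) <= tolerance_db)

# Illustration only: a 1kHz tone at 88.2kHz and a copy rounded to 16-bit steps.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 88_200) for n in range(882)]
quantized = [round(s * 32767) / 32767 for s in tone]
half_gain = [0.5 * s for s in tone]

print(levels_match([tone, quantized]))   # levels agree closely
print(levels_match([tone, half_gain]))   # about 6dB apart, so no match
```

With levels matched this way, a listener cannot fall back on loudness as a cue, so any difference heard has to come from the data reduction itself.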
This was a cannily designed test. Not only was the fact that it was a test concealed from the listeners—as I've often argued in the past, the listeners' awareness that they're taking part in a formal test is an interfering variable because it changes their state of mind—but organizing the presentation so that the best-sounding version of the data was heard first, followed by progressively degraded versions, worked against the usual tendency of listeners to a strange system in a strange room to increasingly like the sound the more they hear of it. The listeners in Philip's demo would thus become aware of their own cognitive dissonance. Which, indeed, we did.
So when you read in the popular press that 128kbps MP3s are indistinguishable from CDs, or that satellite radio, which runs at around 64kbps for two channels, is of "CD quality," think of the implications of Philip's demo. Not the least of these, of course, was that we were aware of the degradation from 24/88.2 to "Red Book" CD data, despite the proclamations from some pundits that the CD medium is audibly transparent.
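To put those bit rates in perspective, the arithmetic can be sketched as follows. The raw PCM rate is simply sample rate times bit depth times channel count; the coded rates are the ones quoted above. The function names are mine, and the ratios describe only how much data is discarded, not how audible the loss is.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Raw (uncompressed) PCM data rate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

def reduction_ratio(source_kbps, coded_kbps):
    """How many times smaller the coded stream is than the source stream."""
    return source_kbps / coded_kbps

cd_kbps = pcm_bitrate_kbps(44_100, 16)      # "Red Book" CD: 1411.2kbps
hirez_kbps = pcm_bitrate_kbps(88_200, 24)   # 24-bit/88.2kHz: 4233.6kbps

print(f"CD: {cd_kbps:.1f}kbps, hi-rez: {hirez_kbps:.1f}kbps")
for coded_kbps in (320, 192, 128, 64):
    ratio = reduction_ratio(cd_kbps, coded_kbps)
    print(f"{coded_kbps}kbps is about 1/{ratio:.1f} of the CD data rate")
```

By this count a 128kbps MP3 carries roughly one-eleventh of the CD's data, and a 64kbps stream about one-twenty-second.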
As I mentioned in last month's "As We See It," I was in London in June for the "New Directions in High Resolution Audio" conference, organized by the Audio Engineering Society. Reader Lloyd Klibert raises the subject of blind testing in this issue's "Letters" (p.9), and there was much discussion at the conference of why formal blind tests tend to miss differences that can be audible under normal listening conditions.
In his keynote address, for example, Peter Craven demonstrated the improvement in sound quality of a digital transfer of a 78rpm disc of a live electrical recording of an aria from Puccini's La Bohème when the sample rate was increased from 44.1 to 192kHz. Even 16-bit PCM is overkill for the 1926 recording's limited dynamic range, and though the original's bandwidth was surprisingly wide, given its vintage, 44.1kHz sampling would be more than enough to capture everything in the music, according to conventional information theory. Those same skeptical pundits would therefore claim that any perceived improvements must be delusional.
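The conventional argument rests on two textbook figures, sketched below: the theoretical dynamic range of an ideal N-bit quantizer driven by a full-scale sinewave (roughly 6.02N + 1.76dB), and the Nyquist limit of half the sample rate. These are idealized numbers under those stated assumptions, not measurements of the 78rpm transfer, and the function names are mine.

```python
import math

def ideal_dynamic_range_db(bits):
    """Theoretical SNR of an ideal quantizer with a full-scale sinewave input.

    Equivalent to the familiar 6.02 * bits + 1.76dB rule of thumb.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2

for bits in (16, 24):
    print(f"{bits}-bit: about {ideal_dynamic_range_db(bits):.1f}dB")
for rate in (44_100, 192_000):
    print(f"{rate}Hz sampling: bandwidth up to {nyquist_hz(rate):.0f}Hz")
```

On paper, then, 16 bits offers about 98dB and 44.1kHz sampling reaches 22.05kHz, which is why the skeptics regard anything more as wasted on a shellac disc.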
But of course, as Peter pointed out, with such a recording there is more to the sound than only the music. Specifically, there is the surface noise of the original shellac disc. The improvement in sound quality resulting from the use of a high-sampling-rate transfer involved this noise appearing to float more free of the music; with lower sample rates, it sounded more integrated into the music, and thus degraded it more.
Peter offered an interesting hypothesis to explain this perception: "the ear as detective." "A police detective searches for clues in the evidence; the ear/brain searches for cues in the recording," he explained, offering as support AES Fellow Barry Blesser's tutorial on the perception of reverberation in the October 2001 issue of The Journal of the AES (p.886). "The auditory system...," Blesser wrote, "attempts to build an internal model of the external world with partial input. The perceptual system is designed to work with grossly insufficient data...."
Given that audio reproduction is, almost by definition, "partial input," Peter wondered whether the reason listeners respond positively to higher sample rates and greater bit depths is that these better preserve the cues that aid listeners in the creation of internal models of what they perceive. If that is so, then it becomes easier for listeners to distinguish between desired acoustic objects (the music) and unwanted objects (noise and distortion). And if these can be more easily differentiated, they can then be more easily ignored.
This has been my own experience. I've been recording in high resolution since pianist Hyperion Knight's performance of Gershwin's Rhapsody in Blue, in 1997 (Stereophile STPH010-2), and a constant observation has been that undesirable aspects of the sound that I felt were at or below threshold with the original hi-rez files become more annoying, and less readily resolved, when the data are mastered for the commercial CD release. This was the case with my recording of Robert Silverman performing Beethoven's complete piano sonatas, in which the early reflections from the walls of the small recital hall had more of a deleterious effect with the "Red Book" data than with the original hi-rez data (see Stereophile, January 2001, pp.99–107), thus mandating a remix.
This is currently the case with my most recent recording of the vocal group Cantus: after I'd done all the mixing and equalization at 88.2kHz, the CD versions sounded more muddy and less refined than I'd expected, given the care with which I'd downsampled and noiseshaped the hi-rez data. The mix and EQ choices I'd made at 88.2kHz were not optimal for the 44.1kHz versions. My detectives had been misled by the clues. As a result, the release of the CD is horribly late.
"Who needs high-resolution audio?" Peter Craven asked at the end of his presentation. His answer: "We all do!" I know I do.
Footnote 1: Handel, Messiah (Dublin Version, 1742), Dunedin Consort, SACD, Linn CKD 285; also available as 24-bit/88.2kHz (lossless-compressed with FLAC or WMA), "Red Book," and MP3 downloads.