The Blind, the Double Blind, and the Not-So Blind
The 1991 AES Workshop on Data Compression
The 91st Convention of the Audio Engineering Society, held in New York on October 4–8, 1991, had an unusual theme: "Audio Fact and Fantasy: Reckoning with the Realities." Although anti-audiophile expression occurs at every AES function, this was the first time an entire convention appeared to be devoted to it. The event seemed like a kind of showdown where the audio engineering establishment would once and for all discredit those who use their ears to judge reproduced audio quality.
However, I was encouraged by the tenor of the debate on digital audio data compression. This technique reduces the amount of data needed to represent an audio signal not only by more efficiently encoding the data but also by throwing away musical information judged to be "inaudible." The motivation behind such schemes—some of which use less than one-tenth the amount of data used in 16-bit linear PCM representation found on CDs—is pure economics coupled with convenience. Digital audio data compression affects all music lovers: Many future audio formats—Philips's Digital Compact Cassette (DCC), Sony's Mini Disc (MD), Digital Audio Broadcasting (DAB), and possibly even some professional master recorders—are based on these low–bit-rate encoding techniques. Previous AES discussions have been surprisingly unconcerned about what these systems do to music (footnote 1).
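To put the "less than one-tenth" figure in perspective, here is a minimal arithmetic sketch. It assumes the standard CD parameters (44.1kHz sampling, two channels, 16 bits per sample); the function and variable names are illustrative only and do not come from any codec's documentation.

```python
# Raw data rate of 16-bit linear PCM as found on CD.
# Assumed parameters: 44.1 kHz sampling, stereo, 16 bits per sample.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
one_tenth = cd_rate / 10  # the ceiling implied by "less than one-tenth"

print(f"CD PCM rate: {cd_rate / 1000:.1f} kbit/s")
print(f"One-tenth:   {one_tenth / 1000:.1f} kbit/s")
```

Under these assumptions, a system using less than one-tenth of the CD data rate must represent both channels of music in under roughly 141 kbit/s.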
This workshop was different. Many highly respected audio professionals raised questions about the wisdom of implementing these systems without much more extensive listening evaluations. Workshop chairman Ken Pohlmann called data compression "one of the most important topics facing us today." John Eargle, recording engineer and author of several university-level textbooks on audio, began his presentation with a series of questions we should ask about data compression. These included "Can the [encoding] algorithm be subject to impartial scrutiny before it is released? If not, why not?"
With this question, he identified a fundamental problem with these systems: The developers finalize their encoding algorithms without evaluation by a wide range of trained listeners. He went on to suggest that we should "pick the very best ears available" for evaluating these systems. He was also concerned about implementation of a proprietary data-compression standard—such as PASC encoding used in Philips's DCC—without any evaluation by those outside the company.
Further, Eargle expressed dismay that official evaluations of these systems haven't used naturally miked acoustic music (see my "Industry Update" last month). Consequently, certain audible problems in the encoding algorithm may not be detected. Eargle asked: "How can we ensure that we won't be limping along with something that has problems that are acknowledged later on?" Finally, Eargle suggested that the original psychoacoustic research on which all data-compression systems are based should be "reinvented" and "rediscovered." Overall, his tone was one of caution: These systems may have a place in audio, but it's unwise at this early stage of development to lock into standards we may be stuck with for decades. I couldn't agree more.
Bart Locanthi, a well-respected member of the audio community and a Fellow of both the AES and Acoustical Society of America, couldn't attend the workshop, but sent a tape that was replayed to the audience. Mr. Locanthi is chairman of an AES ad hoc committee formed to study digital audio data-compression systems and perform listening tests. Unfortunately, his recorded speech didn't make it onto the cassettes of the workshop, so I'll have to rely on my memory and notes of the event. His comments were the most critical public stance taken on data compression by anyone within the audio engineering establishment.
First, Locanthi related his experience of listening to a DAT tape that contained examples of low–bit-rate encoded music. He had requested the tape from Swedish Radio, the organization which conducted the official listening tests of these systems (footnote 2). Almost immediately Locanthi heard several peculiar sounds in the music, the most obvious being an idle tone at 1.5kHz.
When Locanthi informed Swedish Radio of this problem, they were surprised that they had not discovered it, but they did hear the 1.5kHz artifact after it was pointed out to them. When Locanthi asked how such an obvious flaw could go undetected, the response was that he "knew what to listen for."
It is ironic that Swedish Radio's extensive listening tests, with over 20,000 separate trials and 60 "expert listeners," failed to detect a flaw immediately apparent to a single listener. Their listening-test methodology—called "hidden reference, double-blind, triple stimulus"—was beyond scientific reproach. Yet a single listener in "unscientific" listening conditions immediately identified this fundamental problem. A paper by Michael Gerzon described later in this report comments peripherally on this issue of double-blind listening-test protocols not revealing the very flaws they are designed to detect.
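For readers unfamiliar with the term, the following is a rough sketch of what a "hidden reference, double-blind, triple stimulus" trial generally involves. It is not Swedish Radio's actual procedure, which this report does not detail. The sketch assumes that stimulus A is the known reference, that B and C conceal the reference and the coded signal in random order, and that the listener simply names the coded slot (real tests typically ask for a graded impairment rating instead). All names are hypothetical.

```python
import random


def make_trial(rng):
    """Build one trial: A is the known reference; B and C hide the
    reference and the coded signal in random order."""
    hidden = ["reference", "codec"]
    rng.shuffle(hidden)
    return {"A": "reference", "B": hidden[0], "C": hidden[1]}


def score_trial(trial, listener_pick):
    """Return True if the listener named the slot holding the codec."""
    return trial[listener_pick] == "codec"


rng = random.Random(1991)  # fixed seed so the sketch is repeatable
trials = [make_trial(rng) for _ in range(10)]

# A listener who hears no difference can only guess between B and C.
guesses = [rng.choice(["B", "C"]) for _ in trials]
correct = sum(score_trial(t, g) for t, g in zip(trials, guesses))
print(f"{correct} of {len(trials)} correct by guessing")
```

Because the assignment of B and C is random and hidden from both listener and operator, a score well above chance is evidence of an audible difference. The reverse does not follow: a score near chance shows only that these listeners did not identify the difference under these conditions.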
Swedish Radio had previously concluded that "Both codecs [data compression encoder/decoder systems] have now reached a level of performance where they fulfill the EBU [European Broadcasting Union] requirements for a distribution codec." In other words, the system in which Locanthi discovered the flaws had already been officially proclaimed sonically acceptable as the replacement for AM and FM radio broadcasting—a replacement that will likely be in place for many decades.
Locanthi went on to urge caution about implementing these systems, suggesting that standards should not yet be set in such a new field. He has personally auditioned six different low–bit-rate codecs and has heard artifacts in all six. The claims that these systems are "transparent" and equal to CD quality are obviously optimistic at best.
The technical aspects of various low–bit-rate encoding systems were explained by their designers (summarized in last month's report from the London AES Conference). Curiously, one panelist suggested that the best ears for evaluating data-compression systems are those of the developers themselves. He also believed that "synthetically generated sequences" of music should be used during listening evaluations "rather than rely only on naturally produced examples."
Unfortunately, the workshop was concluded without the chance for audience discussion. These debates after the formal presentations are often the most enlightening. In addition to airing opposing points of view, they elicit more honest, spontaneous responses from the presenters.—Robert Harley
Footnote 1: See "Industry Update," Vol.14 No.4; "As We See It," Vol.14 No.5; and "Industry Update" in Vol.14 No.12 for reports on and discussions of data compression. See also this issue's "Letters" column.
Footnote 2: The December 1991 "Industry Update" has a full report on the Swedish Radio tests.