Watermarking, High-Resolution Audio Big Topics at AES

For the 109th convention of the Audio Engineering Society, the main floor of the L.A. Convention Center was transformed into a bazaar of new tools for audio professionals—but the panel discussions upstairs were where the real action took place. On Friday, September 22—just an hour before researchers Dr. Stanley Lipshitz and John Vanderkooy of Ontario's University of Waterloo presented a paper offering a mathematical proof for the "imperfectability" of one-bit delta-sigma recording systems—Sony Corporation issued a clarification of the technical standards for its Direct Stream Digital technology, the basis of the Super Audio Compact Disc. DSD, it now appears, is a one-bit technique as it applies to consumer playback systems, but uses a multi-bit PCM quantizer [presumably within a delta-sigma converter negative-feedback loop; see an article on this subject in the forthcoming November issue of Stereophile—Ed.] at the recording and mastering ends of the business. (The Lipshitz/Vanderkooy paper is available as AES preprint #5188.)

Not enormously significant for consumers, this revelation prompted some engineers in attendance to voice what they had long suspected—that DSD and DVD-Audio are not as technically incompatible as Sony/Philips publicity had previously indicated. In a tangential discussion during Saturday afternoon's session on high-resolution audio, Jim Johnston of AT&T Research speculated that DSD and DVD-A datastreams might be able to co-exist if output from different points within the same microprocessor. Optical disc players that will operate seamlessly across a number of platforms are therefore easily conceivable. As it turns out, in late September, Cirrus Logic announced an inexpensive multi-format chip, the CS4392, which will decode both DSD and DVD-A. (It wasn't clear if Johnston had prior knowledge of the new chip or was simply demonstrating his prescience.)

Chaired by Malcolm O.J. Hawksford of the University of Essex, the panel of experts discussed several obstacles to achieving higher levels of resolution in audio recording and playback. Mike Story, of dCS, Ltd., made a strong case for promoting ultra-wide audio bandwidth, perhaps as much as 100kHz, as a hi-rez standard. While measurable hearing in normal adults rarely extends beyond 16kHz, many experiments have demonstrated that acoustic energy above this range can have a pronounced effect on the perceived realism of reproduced music. Story said that dCS has conducted some carefully controlled studies in which acoustic energy remains constant out to 30kHz while the energy in the 30–90kHz band varies. "The degree of focus and localization correlates quite well with the amount of energy in this band," he reported. "Current theory is inadequate," he said of the standard engineering belief that a 20Hz–20kHz frequency response is all that is necessary for quality playback.

Pioneer Corporation's Takeo Yamamoto agreed that bandwidth affects "the perceived depth of an acoustic image." A fascinating discussion ensued in which an alternate model of human hearing was offered as a possible explanation of why high-resolution audio sounds better. Instead of simply detecting tones, the hearing system might also detect impulses, or "clicks": localization cues that arrive at the ears within a 10-microsecond window. These impulses necessarily lie above the bandwidth for tones, in the energy band studied by Story and his dCS colleagues. In the wild, hearing is the body's "early warning system," as one panelist put it, and under this "wideband target locator" hypothesis, high-resolution audio sounds better because it gets the cues right.

The human brain is "a most amazing pattern-recognition processor," said legendary loudspeaker and phase-relations researcher Siegfried Linkwitz. (The Linkwitz-Riley crossover formulation bears his name.) "Minimizing false cues should be one of our primary objectives," he said, drawing attention to the fact that a live trumpet played down the hall and around the corner is immediately perceived for what it is, and that very few people, expert listeners or not, would mistake a recording of the same instrument for the real thing. Linkwitz does not necessarily agree that a center-channel speaker is needed to produce a believable soundstage, seeing promise in two alternate six-channel surround configurations: one with three channels up front, two on the sides, and one in the center rear, and another (as proposed by David Chesky) with two front, two side, and two rear speakers, all of them full-range—and "at least two derived low-frequency channels."

Linkwitz pointed to two major obstacles in the quest for hi-rez audio in the home: the speakers and the room. The power response of most loudspeakers favors the bottom end, he noted, with too much output in the bass and not enough in the midrange and high frequencies. Unpredictable room behavior, as all audiophiles know, makes integrating any loudspeaker into any room a delicate exercise in chance and skill. Linkwitz proposed standardizing all woofer levels as a good starting point toward a repeatable audio experience, and would like to see all low-frequency crossovers standardized at 100Hz with a 6dB/octave slope.
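As a rough illustration of what a 100Hz crossover with a 6dB/octave slope would mean for the woofer feed, the sketch below evaluates the textbook magnitude response of an ideal first-order low-pass filter. The first-order model, the function name lowpass_gain_db, and the sample frequencies are assumptions chosen for illustration; the talk, as reported here, specified only the corner frequency and the slope.

```python
import math


def lowpass_gain_db(freq_hz, corner_hz=100.0):
    """Gain in dB of an ideal first-order low-pass filter at freq_hz.

    Uses |H(f)| = 1 / sqrt(1 + (f / fc)^2), the standard first-order response.
    corner_hz defaults to the 100Hz figure quoted in the article.
    """
    ratio = freq_hz / corner_hz
    return -10.0 * math.log10(1.0 + ratio * ratio)


if __name__ == "__main__":
    # Each step doubles the frequency, i.e. moves up one octave.
    for freq in (50, 100, 200, 400, 800, 1600, 3200):
        print(f"{freq:5d} Hz: {lowpass_gain_db(freq):7.2f} dB")
```

The response is about 3dB down at the 100Hz corner, and well above the corner each doubling of frequency costs close to another 6dB, which is what a 6dB/octave slope refers to.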

The good-natured Linkwitz, of course, harbors no hope of ever getting loudspeaker manufacturers to agree on anything, and in the remainder of his talk discussed problems in recording studios that might limit the resolution of their products. He presented some rather dismaying statistics about the level of RFI (radio-frequency interference) found in typical studios, and explained how it can cause residual noise, hum, and various types of nonlinear distortions. "Not enough attention is paid to RFI in the studio," Linkwitz said, noting that " . . . a length of speaker wire makes a very good antenna at 30MHz."

Mastering engineer Bob Katz, of Digital Domain in Orlando, Florida, inveighed against cheap digital recording devices, going so far as to present a chart showing that, below a certain price point (he specifically mentioned $1500 all-in-one 32-channel recording consoles), tried-and-true analog technology is better. Reflecting the experience of many music-lovers on the playback side, Katz said, "Inexpensive analog is better than cheap digital. DSP is not all equal." He blamed quantization distortion, caused by forcing a single Motorola 56001 DSP chip to its limits, for the fact that so much of the raw material he receives sounds "edgy, hard, cold, and distorted." Top-quality digital technology is superior to its analog equivalent, he feels, but is far beyond the budgets of most aspiring musicians. "Just say no to cheap DSP," he advised.

Bucking yet another of the engineering community's sacred cows, AT&T's Jim Johnston discussed the lower limits of human hearing. He first alluded to the "noise floor of air, at 4–6dB sound pressure level," and compared it to the upper limit of the threshold of pain. Given to working mathematical formulae aloud, Johnston noted evidence that some people can clearly identify sounds as delicate as "25dB below the white-noise level." In apparent agreement with statements that have been made by Meridian's Bob Stuart, Johnston feels that a word depth of 24 bits sufficiently "covers it all."
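For a sense of scale behind the 24-bit figure, the sketch below applies the standard textbook estimate for an ideal N-bit PCM quantizer driven by a full-scale sine wave, roughly 6.02N + 1.76dB. This is not the arithmetic Johnston worked through at the session, which the report does not reproduce; the formula choice and the function name ideal_dynamic_range_db are assumptions made for illustration.

```python
import math


def ideal_dynamic_range_db(bits):
    """Theoretical signal-to-quantization-noise ratio, in dB, of an ideal
    N-bit quantizer with a full-scale sine-wave input (about 6.02N + 1.76)."""
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)


if __name__ == "__main__":
    for bits in (16, 20, 24):
        print(f"{bits} bits: {ideal_dynamic_range_db(bits):.1f} dB")
```

On this idealized estimate, 24 bits spans a little over 146dB, comfortably wider than the gap between the 4–6dB SPL noise floor of air cited above and the threshold of pain, which is often quoted at roughly 120 to 140dB SPL. Real converters fall short of the ideal figure.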

Attending a discussion such as this makes one thing glaringly apparent: Audio technology is far from reaching "maturity." While the general public may believe that all that can be known has already been discovered and incorporated into the latest products, scientists and engineers continue to explore the ever-receding frontier of knowledge.

The most hotly contested frontier du jour is the watermarking issue. How is digital music to be identified? Who will control how it is played and when, if at all, it may be copied? On Saturday morning, this topic was addressed with fervor by a panel chaired by mastering engineer Tony Faulkner, of London's Green Room Productions. Paul Jessop of the International Federation of the Phonographic Industry (IFPI) explained why his organization feels that watermarking is a necessity: to ensure the proper compensation of artists and copyright holders for the public playing of music on the radio and over the Internet, and to prevent the proliferation of copies by consumers and professional pirates. Jessop stressed that watermarking is not intended to deprive music-lovers of their rights to "fair use" of recordings they have legally purchased, and mentioned that the inclusion of watermarks both weak and strong is a strictly voluntary matter, to be left to the discretion of artists, their managers, and their recording labels.

According to Jessop, all watermark-compliant playback devices will have a gate, or "pinch point," through which the datastream must pass in order to be decoded. Watermark-free recordings, such as those made by home recordists or musicians without contracts, will play freely on all devices, which should alleviate the fears of some audiophiles that they could buy players that will not play "unauthorized" discs. Copy-protected discs will play, but may not allow the creation of digital copies. Others may allow one copy, but not allow copies to be made from the copy.
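The copy rules Jessop described can be pictured as a small decision table applied at the "pinch point." The sketch below is purely illustrative: the state names, the CopyState enum, and the functions may_play and copy_result are hypothetical and are not taken from any watermarking or copy-protection specification.

```python
from enum import Enum


class CopyState(Enum):
    """Hypothetical labels for the three cases described in the text."""
    UNMARKED = "unmarked"    # no watermark, e.g. a home recording
    COPY_ONCE = "copy_once"  # one generation of digital copies allowed
    NO_COPY = "no_copy"      # plays, but digital copying is refused


def may_play(state):
    """Per the panel's description, every disc plays, marked or not."""
    return True


def copy_result(state):
    """Return the state carried by a digital copy, or None if copying is refused."""
    if state is CopyState.UNMARKED:
        return CopyState.UNMARKED   # copies freely, copy is also unmarked
    if state is CopyState.COPY_ONCE:
        return CopyState.NO_COPY    # the copy itself cannot be copied again
    return None                     # NO_COPY: no digital copy is made


if __name__ == "__main__":
    for state in CopyState:
        print(state.name, "->", copy_result(state))
```

In this toy model a copy of a copy-once disc comes out marked no-copy, which is how "one copy, but no copies from the copy" would be enforced under these assumptions.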

Al McPherson of Warner Music Group emphasized that "transparency" is a primary concern for his industry, and stated repeatedly that they have no interest in imposing any in-band signal that will "compromise the integrity of the music." Former AES president and Stanford professor Elizabeth Cohen questioned the definition of "integrity," noting that many audio formats compromise the quality of original recordings. McPherson agreed, but insisted that watermarks within the audioband be robust enough to survive even AM broadcasts. Out-of-band watermarks will not work, he said, "because they are easily stripped out."

Prof. Malcolm Hawksford presented a number of concerns: How does watermarking affect noiseshaping? Is it equally applicable to DSD and multi-bit recordings? How does it affect a recording's spatial signature? How can multiple watermarks be unobtrusively implemented? Will they be coherent in all channels, appearing as a mono signal in the midst of the soundstage? Might it be possible to create a "computer inversion" of a watermark and thereby remove it from a recording? Hawksford acknowledged that some of his questions did not lend themselves to easy answers, but advised all present to think hard on them before rushing headlong into any watermarking technology.

Mastering engineer Glenn Meadows of Nashville's Masterfonics also had many questions, but of a practical rather than a theoretical nature. Meadows wanted to know if the music industry would insist on a serial number for each disc, as some music fans have feared, or if a simple ID for one title would suffice. He wondered how a watermark or ID could be extracted if it were inserted in error. Engineers must have some way of getting back to the original uncontaminated recording, he insisted. "Years of working in this business have taught me that, as often as not, the label is going to call us back at the last moment and say, 'Oh, we sent you the wrong numbers. Use these instead.' Human beings make mistakes. We need some sort of failsafe procedure in place when that happens." To the question of serializing DVDs, Al McPherson replied that it was simply "unworkable." For the near future at least, no one will be able to track you down through the purchase of a disc.

At the outset, Jessop included traceability as one of the music industry's goals with watermarking. "We need to be able to trace the origins of pirated recordings," Jessop said. Karlheinz Brandenburg, of Germany's Fraunhofer IIS-A AEMT, discussed what he called "forensic rather than block watermarking," and said that more testing is needed. "It's impossible to say that it's not audible," Brandenburg asserted.

Several panelists agreed that a watermark undetectable in a short ABX test might become irritatingly obvious in long-term listening. Despite the many arguments launched against watermarking, Verance Corporation president David Leibowitz stood by it as the best solution for the music industry in an age when digital high technology enters new hands daily.

Perhaps not coincidentally, on the opening day of the AES convention, Sonic Solutions announced that it is ready to support the final version of the new DVD-Audio specification, Version 1.2. "Using the new capabilities defined by the expanded specification and incorporated in Sonic DVD Creator AV, DVD-Audio producers will now be able to include copy protection and content watermarking in their DVD-Audio titles," stated the official press release. "Version 1.2 of the specification defines a new copy-protection method called CPPM (Copy Protection for Prerecorded Media), developed by 4CEntity, LLC. The CPPM specification defines a renewable cryptographic method for protecting entertainment content when recorded on physical media. . . . Sonic will also integrate Verance watermarking technology as an option with SonicStudio HD for secure content delivery in DVD media," the announcement continued.

The controversy is far from resolved. It's perhaps cold comfort to audiophiles, but the always-trenchant David Chesky reminded all in attendance that record labels like his, which specialize in jazz and classical recordings, are as unlikely to use watermarking as pirates are to mass-produce copies of such labels' recordings. As he put it, the stuff that's going to be hammered hard with watermarks is so nasty that "you could mark it with a lawnmower and never hear the difference."