New Audio Standard for Personal Computers and the Internet Slated for MPEG-4

Everyone claims "CD-quality" sound over the Internet these days, but the reality always seems far short of that promise. As a result, work continues to develop an encoding scheme worthy of the CD-quality title. Recently we reported on developments at AT&T regarding the a2b format, and both Liquid Audio and RealNetworks compete on a weekly basis to grab headlines for their audio technology announcements.

Thus, we read with interest news coming out of MIT about their new approach to sound processing, called "Structured Audio," that is to be incorporated in the new MPEG-4 international standard. But before readers get too excited, there's a catch: the format's "Structured Audio Orchestra Language" can "describe" and sequence synthesizer-like sounds, but doesn't handle real voices or recorded sound directly. This is similar in many ways to the Beatnik approach, recently incorporated into the Java standard.

MPEG, the Moving Picture Experts Group, is part of the International Organization for Standardization (ISO), and is chartered with the development of industry standards for the compression, processing, coding, and transmission of audio and video. These standards are used worldwide as a blueprint for the design, development, and manufacturing of audio software and hardware components (for example, MPEG-2, used for DVD and digital TV).

The MPEG-4 standard, which is intended for multimedia applications, will be released in October 1998 and will formally become an international standard in December 1998. The Final Committee Draft was completed last month, which indicates that all parts of the specification, including the MIT Media Lab's contributions, will proceed into the final standard. The current draft is expected to change little before completion.

"The contributions the Media Lab has made to MPEG-4 are a crucial part of the audio tool set, and represent a fundamental advance in audio standardization," said Leonardo Chiariglione, MPEG convener and chairman.

According to MIT, Structured Audio is a set of specifications for the description and transmission of sound. While existing audio standards represent sound as a stream of bits, in Structured Audio, content is stored and delivered as a computer program in a flexible language, then translated into sound on the user's computer. Because a program that describes a sound is usually far more compact than the sampled audio it produces (compare the bandwidth required to transmit a PCM-encoded recording of Beethoven's Symphony No. 9 with that needed for the MIDI instructions to play back the same symphony), this method can increase the quality and efficiency with which sound is delivered.
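To make the size argument concrete, here is a minimal Python sketch of the idea of sending a description and rendering it at the receiving end. It is an illustration only, not SAOL and not the MPEG-4 bitstream: the three-note score, the 12-byte-per-note wire format, and the plain sine-wave "synthesizer" are all assumptions invented for this example.

```python
import math
import struct

SAMPLE_RATE = 44_100   # CD sampling rate in Hz
CHANNELS = 1           # mono, to keep the sketch small

# Hypothetical score: (start_seconds, duration_seconds, frequency_hz)
SCORE = [
    (0.0, 0.5, 440.00),
    (0.5, 0.5, 493.88),
    (1.0, 1.0, 523.25),
]


def encode_score(score):
    """Pack each note as three 32-bit floats (an invented wire format)."""
    return b"".join(struct.pack("<fff", *note) for note in score)


def render(score):
    """Render the note list to 16-bit mono PCM using plain sine tones."""
    total_seconds = max(start + dur for start, dur, _ in score)
    n = round(total_seconds * SAMPLE_RATE)
    samples = [0.0] * n
    for start, dur, freq in score:
        first = round(start * SAMPLE_RATE)
        count = round(dur * SAMPLE_RATE)
        for i in range(count):
            idx = first + i
            if idx < n:
                samples[idx] += 0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
    # Clip to [-1, 1] and convert to signed 16-bit little-endian samples.
    return b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )


description = encode_score(SCORE)   # what would travel over the modem
pcm = render(SCORE)                 # what the listener's computer plays
ratio = len(pcm) / len(description)

print(f"description: {len(description)} bytes")
print(f"rendered PCM: {len(pcm)} bytes")
print(f"PCM is {ratio:.0f}x larger")
```

In this toy case the description is 36 bytes and the two seconds of rendered audio are 176,400 bytes. A real orchestra description would be much larger than three notes, and the achievable ratio depends entirely on the material, so the figure should be read as an illustration of the principle rather than a measurement of Structured Audio.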

Eric Scheirer from MIT also points out that "it's true that Structured Audio by itself doesn't have built-in primitives for voice encoding. However, the Structured Audio method is powerful enough to encapsulate and transmit a voice codec. You don't get a bandwidth savings with this method, but you're guaranteed performance at least as good as any existing and known method. This is not true of model-based synthesizers like Beatnik or Yamaha XG.

"The Structured Audio work isn't meant to stand on its own. We have carefully developed the tools to be a well-integrated part of the overall MPEG-4 standard. And of course MPEG-4 has the state-of-the-art voice and recorded-sound methods in addition to Structured Audio."

The Structured Audio method, developed by researchers in the Media Lab's Machine Listening Group, comprises more than 20% of the MPEG-4 Audio standard. This submission, which includes software, technical documentation, and testing methods, was evaluated and verified by MPEG and found to meet the requirements of the standards body.

The Media Lab's Structured Audio method is designed to integrate seamlessly with the other components of MPEG-4. These include methods for the transmission of speech, recorded music, computer graphics, and compressed digital video. All of these tools may be combined in a single MPEG-4 presentation.

A statement from MIT says, "The Media Lab has executed its current standardization work in an open arena, free of patent and copyright restrictions, in order to encourage advances in multimedia for all computer users and technology companies. All of the computer tools developed by the Media Lab in the Structured Audio project have been freely donated to the Internet, and the Media Lab maintains no control or veto power over the direction of the standard."

"Structured Audio points the way to a more powerful common platform for sound processing," said Professor Barry Vercoe, head of the Media Lab's Machine Listening Group and leader of the Structured Audio research project. "By incorporating these findings into an accepted international standard, we can ensure that musicians, producers, and PC users around the world can benefit from this research."

MIT reports that "CD-quality" (their words) stereo audio data can be transmitted and received easily via a normal computer modem. "The performance levels achieved through the MPEG-4 Structured Audio method enable significant new composition and commerce models. Composers of popular music styles such as house music, rave music, techno, and electronica will be able to efficiently sell high-quality compositions directly to listeners via the Internet.

"Interactive movies and virtual-reality experiences containing music, sound effects, and dialog will likewise be able to envelop the listener in a 3-D world of sound. MPEG-4 also allows the creation of 'virtual karaoke' songs, where the music actually slows down and speeds up to follow the singer.

"Structured Audio will also have an impact on the music-composition process itself. Composers are free to create new 'virtual synthesizers' at will, so their creativity is no longer limited by the capabilities of the fixed hardware synthesizers they own. A composer's PC system incorporating MPEG-4 Structured Audio technology can replace an entire studio of synthesizers, effects processors, and mixing consoles. The standard unifies a growing marketplace in 'software synthesizers,' which overcome some of these limitations, but until now have been hampered by restricted features, data incompatibility, and a small user base."