Features

Audio Engineering: the Next 40 Years

John La Grou Mar 13, 2014

By 2035, the way we produce and consume media will be entirely different from how we experience it now. Today there is still a "fourth wall" between us and the media we consume; within three decades, that line between reality and its recreation will all but disappear. Our media experiences will become fully immersive, from spherical audio and video that track our body's movements, to gestural computing, to physical-feedback devices, and more. With tomorrow's technology, our children and grandchildren may find it difficult to distinguish the real thing from its reproduction.

Technology Improvements Follow a Trend
More than 60 years ago, people thought Alan Turing was crazy. The father of algorithmic computing, Turing predicted in 1950 that computers would employ about one gigabit (1Gb) of data storage by the turn of the century. He was right. In 1965, Intel's Gordon Moore famously speculated that the number of transistors on an integrated circuit (IC) would double every two years. He was right, too, though his prediction turned out to be a tad conservative.

The growth of nearly every other technology follows a similarly predictable slope. For example, since 1990 the cost-performance efficiency (CPE) of wireless devices has doubled every seven months. Since 1980, the CPE of video-display technology has doubled every 18 months (footnote 1). And since the early 1950s, magnetic-storage bits-per-dollar has doubled every 18 months (fig.1).

Fig.1 The trend in magnetic storage in bits per dollar.

Since 1970, power consumption per data instruction has halved every 18 months. The cost of DNA sequencing has halved every 10 months since 1990. (NEC is now shipping a portable crime-scene DNA analyzer that takes just 25 minutes.) The cost of transistors has halved every 16 months since 1970; one transistor now costs less than the ink for one letter of newsprint (fig.2).

Fig.2 The trend in transistor manufacturing cost.

Similar CPE slopes are seen for dynamic RAM since 1970 (18-month doublings), CPU calculations-per-second since 1950 (24-month doublings), CPU million instructions per second (MIPS) per-dollar since 1950 (22-month doublings), Internet global backbone bits-per-second (14-month doublings), Internet data traffic (7-month doublings), and growth in supercomputer floating-point operations per second (FLOPS) since 1990 (14-month doublings). The list continues for scores of technologies, including audio technologies.
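Because each of these trends is exponential, a doubling period converts directly into a growth factor: after t months of steady doubling every T months, capability has grown by 2^(t/T). The Python sketch below applies that arithmetic to a few of the periods quoted above. The function name `growth_factor` is hypothetical, the sketch assumes each trend holds perfectly steadily (the real curves are bumpier), and a cost that halves every T months is treated as performance-per-dollar doubling every T months.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative improvement after `years` of steady doubling every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Doubling (or cost-halving) periods quoted in the text, in months
doubling_periods = {
    "wireless-device CPE": 7,
    "video-display CPE": 18,
    "transistors per dollar": 16,
    "CPU MIPS per dollar": 22,
}

for name, months in doubling_periods.items():
    factor = growth_factor(10, months)
    print(f"{name}: about {factor:,.0f}x per decade")
```

A seven-month doubling compounds to a factor of more than 100,000 in a decade, while a 22-month doubling yields well under 100x over the same span, which is why modest differences in doubling period dominate long-range forecasts.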

Audio Dynamic-Range Innovations Follow a Trend
At the beginning of recorded sound, in 1890, we achieved a systemic dynamic range of 15–20dB, which is equivalent to 3 bits. By the 1930s, vacuum tubes, condenser microphones, and electric cutter heads had improved dynamic range to 35–40dB (6 bits). Magnetic tape gave us a 60–70dB range and more, especially once noise-reduction technologies like Dolby SR were available (12 bits). With the advent of commercial digital recording in the 1970s and '80s, early digital systems were capable of a dynamic range of about 90dB (15 bits).

Today, we can achieve a best-case, unweighted, systemic dynamic range of 110–115dB (19 bits) from concert hall to home playback, but only under pristine controlled conditions. (A typical high-quality home system playing better-than-average program material delivers around 16 bits.)
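The bit figures above follow the usual rule of thumb for linear PCM: each bit adds 20·log10(2), or about 6.02dB, of dynamic range. A minimal Python sketch of that conversion follows; the helper names are hypothetical, and the sketch ignores the extra ~1.76dB often quoted for a full-scale sine wave as well as any effects of dither or noise shaping.

```python
import math

# Dynamic range contributed by each bit of linear PCM: 20*log10(2), about 6.02 dB
DB_PER_BIT = 20 * math.log10(2)


def db_to_bits(db: float) -> float:
    """Approximate equivalent bit depth for a dynamic range given in dB."""
    return db / DB_PER_BIT


def bits_to_db(bits: float) -> float:
    """Approximate dynamic range in dB for a given bit depth."""
    return bits * DB_PER_BIT


# Upper end of several dynamic-range figures quoted in the text, in dB
milestones = {
    "1890, acoustic": 20,
    "1930s, electric": 40,
    "early digital": 90,
    "best case today": 115,
}

for era, db in milestones.items():
    print(f"{era}: {db} dB is about {db_to_bits(db):.1f} bits")

print(f"A 16-bit system: about {bits_to_db(16):.1f} dB")
```

Applied to the milestones above, this lands close to the quoted bit depths: 90dB comes out near 15 bits and 115dB near 19.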

I've visualized the history of audio dynamic range on a growth graph (fig.3):

Fig.3 The history of whole-system, unweighted, dynamic range capability.

Two things should be noted. First, looking at technology growth with too narrow a time frame obscures the long-term trendline. For instance, from 1885 through 1925, acoustic dynamic range didn't improve much—it took the breakthrough innovation of electric recording to significantly improve dynamic range. Second, economic incentive drives innovation and improvement. Generally, those technologies with the greatest economic incentives improve the fastest.

If we "average" (or "smooth") 120 years of dynamic-range data, we see that its growth is predictable. Since the beginning of audio recording, the dynamic range of commercial audio formats has improved by roughly 0.8dB annually, or about one bit every seven years. We can extend this growth slope into the future and expect the trend to continue until audio systems and recording media can reproduce real-world dynamic range without technical or economic limits.
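The smoothed slope can be turned into a simple linear projection. The Python sketch below uses hypothetical function names, takes the 115dB best-case figure and the 0.8dB-per-year slope quoted above, anchors them at the article's 2014 date, and assumes the trend continues unchanged. The 130dB target in the last line is an arbitrary illustrative value, not a figure from the article.

```python
DB_PER_YEAR = 0.8      # long-run smoothed slope quoted in the text
DB_PER_BIT = 6.02      # approximate dynamic range per bit of linear PCM
BASELINE_YEAR = 2014   # assumption: anchor the projection at the article's date
BASELINE_DB = 115.0    # best-case systemic dynamic range quoted in the text


def projected_range_db(year: int) -> float:
    """Linear extrapolation of whole-system dynamic range, assuming the slope never changes."""
    return BASELINE_DB + DB_PER_YEAR * (year - BASELINE_YEAR)


def year_reaching(target_db: float) -> float:
    """Year in which the smoothed trend would reach target_db."""
    return BASELINE_YEAR + (target_db - BASELINE_DB) / DB_PER_YEAR


print(f"Years per added bit: {DB_PER_BIT / DB_PER_YEAR:.1f}")
for year in (2024, 2034, 2054):
    print(f"{year}: about {projected_range_db(year):.0f} dB")

# Hypothetical target, chosen only for illustration
print(f"130 dB reached around {year_reaching(130.0):.0f}")
```

At 0.8dB per year, each additional bit takes about 7.5 years, consistent with the rough "one bit every seven years" estimate.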

Trends Predict the Future
The economic engines driving the next decades of media technology will be gaming, film, and television, which now have a combined global revenue of almost $500 billion.

The future of audio/video could be called the first-person-shooter era of media production. The world of high-end audiophile and videophile products—less than 2% of media markets—will not be the primary driver of these emerging technologies. Instead, it will be the beneficiary of this massive investment in innovation.

Thus, to better understand the future of A/V, we need to explore a number of emerging technologies and their possible futures over the next 40 years. Then we will bring these threads together into a single vision for media creation and delivery, especially A/V creation and postproduction.

Gestural Control
Remember the big, gesture-controlled video screen Tom Cruise used in Minority Report, released in 2002? As envisioned for the film by John Underkoffler, of MIT's Media Lab, the real-world equivalent would likely have cost more than $1 million in 2001. Today, for less than $100, we have consumer gesture devices that do more than Tom Cruise could. Samsung televisions respond to hand gestures while you sit on the couch. Hewlett-Packard notebook computers are currently shipping with the Leap Motion Controller. (Visit Leap's website to see a video of its significant capabilities.)

How soon will free-air gestural control replace the mouse? When will gestural control become the de facto human/machine interface? Consider this: Today, a company called Microchip sells an e-field gestural-control chip for about $4. That IC comes fully equipped with no fewer than five paralleled A-to-D converters, onboard positional tracking, flash memory, and a powerful DSP engine that interprets myriad forms of 3D human gestures, flicks, angulars, and symbolics. The chip has a 3D spatial resolution of 150 positions per inch, and can track at 200 positions per second.
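The two tracking figures quoted for the chip translate into more intuitive units: 150 positions per inch is a step of roughly 0.17mm, and 200 positions per second means a new position report every 5ms. A small Python sketch of that arithmetic follows, using only the figures quoted above; the variable names are illustrative, and the 1m/s hand speed is an assumed example, not a chip specification.

```python
MM_PER_INCH = 25.4

positions_per_inch = 150     # spatial resolution quoted in the text
positions_per_second = 200   # tracking rate quoted in the text

step_mm = MM_PER_INCH / positions_per_inch       # smallest resolvable movement
interval_ms = 1000 / positions_per_second        # time between position reports

# Assumed example: a brisk hand movement of 1 m/s (not a chip specification)
hand_speed_mm_per_s = 1000
travel_per_report_mm = hand_speed_mm_per_s / positions_per_second

print(f"Spatial step: {step_mm:.3f} mm")
print(f"Update interval: {interval_ms:.1f} ms")
print(f"Hand travel between reports at 1 m/s: {travel_per_report_mm:.1f} mm")
```

Under that assumed hand speed, the hand moves about 5mm between reports, far more than the 0.17mm spatial step, so for fast gestures the update rate, not the spatial resolution, is the tighter limit.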

At $4 a chip, the migration from hardware control to free-air control has begun. One- and two-fingered gestures, different kinds of taps and swipes—our mobile devices and tablets have trained us well. We have become deeply familiar and entirely comfortable with gesture control on hard surfaces. The leap to gestures made in free air is a natural evolution.

Early adopters are already replacing their mice and touch interfaces with gestures. How long before free-air gestures become the standard? Look at the technology's growth slope (fig.4):

Fig.4 The growth of free-space gesture control.

Footnote 1: Display technology cost-performance weighted sum of Resolution, Color Depth, Dynamic Range, Latency, Dot Pitch, Refresh, Contrast Ratio, Viewing Angle, Brightness, and Energy Use . . . vs cost.
