Audio Engineering: the Next 40 Years

By 2035, the way we produce and consume media will be entirely different from how we experience it now. Today there is still a "fourth wall" between us and the media we consume: within three decades, that line between reality and its recreation will all but disappear. Our media experiences will become fully immersive, from spherical audio and video that tracks with our body's movements, to gestural computing, to physical-feedback devices, and more. Using tomorrow's technology, our children and grandchildren may find it difficult to distinguish the real thing from its reproduction.

Technology Improvements Follow a Trend
Fifty years ago, people thought Alan Turing was crazy. The father of algorithmic computing, Turing predicted that computers would employ about one gigabit (1Gb) of data storage by the turn of the century. He was right. In 1965, Intel's Gordon Moore famously speculated that the number of transistors on an integrated circuit (IC) would double every two years. He was right, too, though his prediction turned out to be a tad conservative.

The growth of nearly every other technology describes a similarly predictable slope. For example: Since 1990, the cost-performance efficiency (CPE) of wireless devices has doubled every seven months. Since 1980, the CPE of video-display technology has doubled every 18 months (footnote 1). And since the early 1950s, magnetic-storage bits-per-dollar has doubled every 18 months (fig.1).


Fig.1 The trend in magnetic storage in bits per dollar.

Since 1970, power consumption per data instruction has halved every 18 months. The cost of DNA sequencing has halved every 10 months since 1990. (NEC is now shipping a portable crime-scene DNA analyzer that takes just 25 minutes.) The cost of transistors has halved every 16 months since 1970. One transistor now costs less than the ink for one letter of newsprint (fig.2).


Fig.2 The trend in transistor manufacturing cost.

Similar CPE slopes are seen for dynamic RAM since 1970 (18-month doublings), CPU calculations-per-second since 1950 (24-month doublings), CPU million instructions per second (MIPS) per-dollar since 1950 (22-month doublings), Internet global backbone bits-per-second (14-month doublings), Internet data traffic (7-month doublings), and growth in supercomputer floating-point operations per second (FLOPS) since 1990 (14-month doublings). The list continues for scores of technologies, including audio technologies.
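
To compare these doubling periods on a common footing, it can help to convert each one to an equivalent yearly growth factor. The following is a minimal Python sketch, assuming smooth exponential growth; the function names are illustrative and do not come from the article.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Yearly multiplier implied by a doubling period given in months."""
    return 2.0 ** (12.0 / doubling_months)


def growth_over_years(doubling_months: float, years: float) -> float:
    """Total multiplier accumulated over a span of years."""
    return 2.0 ** (12.0 * years / doubling_months)


# Doubling periods quoted in the text above, in months.
doubling_periods = {
    "dynamic RAM": 18,
    "CPU calculations per second": 24,
    "CPU MIPS per dollar": 22,
    "Internet backbone bits per second": 14,
    "Internet data traffic": 7,
    "supercomputer FLOPS": 14,
}

for name, months in doubling_periods.items():
    print(f"{name}: x{annual_growth_factor(months):.2f} per year")
```

On this reading, an 18-month doubling works out to roughly 1.59x per year, and a 24-month doubling to about 1.41x per year.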

Audio Dynamic-Range Innovations Follow a Trend
At the beginning of recorded sound, in 1890, we achieved a systemic dynamic range of 15–20dB, which is equivalent to 3 bits. By the 1930s, vacuum tubes, condenser microphones, and electric cutter heads had improved dynamic range to 35–40dB (6 bits). Magnetic tape gave us a 60–70dB range and more, especially once noise-reduction technologies like Dolby SR were available (12 bits). With the advent of commercial digital recording in the 1970s and '80s, early digital systems were capable of a dynamic range of about 90dB (15 bits).

Today, we can achieve a best-case, unweighted, systemic dynamic range of 110–115dB (19 bits) from concert hall to home playback, but only under pristine controlled conditions. (A typical high-quality home system playing better-than-average program material delivers around 16 bits.)
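
The bit equivalents quoted above appear to follow the common rule of thumb that each bit of resolution contributes about 6dB of dynamic range (20 x log10(2), or roughly 6.02dB). Here is a minimal Python sketch of that conversion under that assumption. The helper names are illustrative, and the article's figures are rounded estimates rather than exact outputs of this formula.

```python
import math

# Rule of thumb (an assumption here): one bit is worth 20*log10(2) dB.
DB_PER_BIT = 20.0 * math.log10(2.0)


def db_to_bits(dynamic_range_db: float) -> float:
    """Approximate bit equivalent of a dynamic range given in dB."""
    return dynamic_range_db / DB_PER_BIT


def bits_to_db(bits: float) -> float:
    """Approximate dynamic range in dB for a given number of bits."""
    return bits * DB_PER_BIT


# Two of the figures quoted in the text.
for label, db in [("early digital", 90.0), ("best case today", 115.0)]:
    print(f"{label}: {db:.0f} dB is about {db_to_bits(db):.1f} bits")
```

By this rule, 90dB lands just under 15 bits and 115dB just over 19 bits, in line with the rounded values given in the text.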

I've visualized the history of audio dynamic range on a growth graph (fig.3):


Fig.3 The history of whole-system, unweighted, dynamic range capability.

Two things should be noted. First, looking at technology growth with too narrow a time frame obscures the long-term trendline. For instance, from 1885 through 1925, acoustic dynamic range didn't improve much—it took the breakthrough innovation of electric recording to significantly improve dynamic range. Second, economic incentive drives innovation and improvement. Generally, those technologies with the greatest economic incentives improve the fastest.

If we "average" (or "smooth") 120 years of dynamic range, we see that its growth is predictable. From the beginning of audio recording, the dynamic range of commercial audio formats has improved by roughly 0.8dB annually, or about one bit every seven years. We can easily extend this growth slope into the future and expect the trend to continue until the reproduction by audio systems and recording media of real-world dynamic range is no longer limited by technical or economic factors.
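
As a rough check on that slope, the endpoint figures given earlier can be turned into a straight-line estimate. This is a minimal sketch, not the author's method. It assumes the midpoints of the quoted ranges (17.5dB in 1890, 112.5dB today), takes "today" to be 2014 (an assumption), and uses the rule of thumb of about 6.02dB per bit.

```python
import math

# Rule of thumb (an assumption here): one bit is worth 20*log10(2) dB.
DB_PER_BIT = 20.0 * math.log10(2.0)

# Midpoints of the ranges quoted in the text; END_YEAR is an assumption.
START_YEAR, START_DB = 1890, 17.5
END_YEAR, END_DB = 2014, 112.5


def slope_db_per_year() -> float:
    """Average improvement in dB per year between the two endpoints."""
    return (END_DB - START_DB) / (END_YEAR - START_YEAR)


def years_per_bit() -> float:
    """Years needed to gain one bit of dynamic range at that slope."""
    return DB_PER_BIT / slope_db_per_year()


def projected_db(year: int) -> float:
    """Straight-line extrapolation of dynamic range to a given year."""
    return END_DB + slope_db_per_year() * (year - END_YEAR)


print(f"slope: {slope_db_per_year():.2f} dB per year")
print(f"one bit every {years_per_bit():.1f} years")
print(f"projected for 2054: {projected_db(2054):.0f} dB")
```

With these assumed endpoints the slope comes out near 0.77dB per year, or one bit a little under every eight years, which is in the same range as the figures quoted above. A straight-line extrapolation says nothing about when physical or economic limits will flatten the curve.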

Trends Predict the Future
The economic engines driving the next decades of media technology will be gaming, film, and television, which now have a combined global revenue of almost $500 billion.

The future of audio/video could be called the first-person-shooter era of media production. The world of high-end audiophile and videophile products—less than 2% of media markets—will not be the primary driver of these emerging technologies. Instead, it will be the beneficiary of this massive investment in innovation.

Thus, to better understand the future of A/V, we need to explore a number of emerging technologies and their possible futures over the next 40 years. Then we will converge our exploration into a singular vision for media creation and delivery—especially A/V creation and postproduction.

Gestural Control
Remember the big, gesture-controlled video screen Tom Cruise used in Minority Report, which was released in 2002? As envisioned for the film by John Underkoffler, of MIT's Media Lab, the actual technology would have likely cost more than $1 million in 2001. Today, we have consumer gesture devices that do more than Tom Cruise's screen could, for less than $100. Samsung televisions respond to hand gestures while you sit on the couch. Hewlett-Packard notebook computers are currently shipping with the Leap Motion Controller. (Visit Leap's website to see a video of its significant capabilities.)

How soon will free-air gestural control replace the mouse? When will gestural control become the de facto human/machine interface? Consider this: Today, a company called Microchip sells an e-field gestural-control chip for about $4. That IC comes fully equipped with no fewer than five paralleled A-to-D converters, onboard positional tracking, flash memory, and a powerful DSP engine that interprets myriad forms of 3D human gestures, flicks, angulars, and symbolics. The chip has a 3D spatial resolution of 150 positions per inch, and can track at 200 positions per second.

At $4 a chip, the migration from hardware control to free-air control has begun. One- and two-fingered gestures, different kinds of taps and swipes—our mobile devices and tablets have trained us well. We have become deeply familiar and entirely comfortable with gesture control on hard surfaces. The leap to gestures made in free air is a natural evolution.

Early adopters are already replacing their mice and touch interfaces with gestures. How long before free-air gestures become the standard? Look at the technology's growth slope (fig.4):


Fig.4 The growth of free-space gesture control.

Footnote 1: Display technology cost-performance weighted sum of Resolution, Color Depth, Dynamic Range, Latency, Dot Pitch, Refresh, Contrast Ratio, Viewing Angle, Brightness, and Energy Use . . . vs cost.

IgAK's picture

The "progress" is no surprise, but the thought of the constant stream of expenditures to replace what will no longer work, or no longer be "supported", or just to keep up with everyone else really isn't all that attractive! There is such a rush to get everything to market before it obsoletes or loses sales appeal nowadays that nothing works properly and everything is patched on top of patches and then all too soon has to be replaced entirely - long before "they" even get it right in the first place. I just hope those gimmicky free-air gestural controls understand emphatic one-fingered gestures when the control interfaces work as poorly as the mechanical ones do now to let the designers know what we think of the half-baked breakware and horrendously poorly thought-out controls they keep sticking us with.

I'm just glad I stuck with the still-best audio technology around for stability - my vinyl gear - while a constant revolving-door stream of that wonderful new digital stuff keeps coming and going through my listening room as it obsoletes every few months. Technology has become so terrific that the freshest breath of air is ironically what does not have to be replaced so darn often! That's what has become a really novel pleasure anymore. Call me old if you like, but I yearn for the stability of those days when things didn't change so much. Dynamics are wonderful, but at near 120 dB, we will have as much as we can possibly go deaf if that's sustained. Broke, too.

dalethorn's picture

I like the idea of tech that removes redundant and unnecessary physical motions and other tedium. But I would prefer my music played as it is live. Somehow the idea of "immersive" sound seems like a gimmick - a distraction - unless the recording were engineered for that sound experience right from the start. Undoubtedly there will be wonderful recordings done for immersive listening, but for audiophiles who are still in stereo in 2014, the interesting thing will be partly how much the availability of great stereo recordings will diminish with time, partly how many tinkered and remixed versions of older recordings will appear that push the prior versions aside on the virtual shelves, and how many high-quality remasterings of great classic recordings won't get done due to prioritizing the 'new' sound. The second thing I mentioned is very important for persons who are exploring things that are new to them, and will have to wade through ever greater lists of junk to get to the better recordings. Stereophile may serve a greater purpose in the future, as an invaluable filter to help get people to those better recordings.

Alan Tomlinson's picture

Changing technology is a mixed blessing. When digital audio came out, it sounded truly repulsive (see Bop Til You Drop). Now there are genuinely excellent digital recordings and reproduction systems. One of the primary reasons that digital recording improved (as opposed to digital reproduction) is that a few excellent, highly-skilled recording engineers (e.g. George Massenburg) said 'this sounds like crap, how can I make it better?'. The reason that these engineers were able to change digital recording is that they knew how great recordings sounded, and they knew how great recordings sounded because they had worked in environments where it was possible to record bands together and therefore they were able to hear how those bands sounded in a favorable acoustic environment.

Many of these crucial aspects of fine engineering are gone. There are few good-sounding recording rooms left, which has led to the audio industry producing much less top-end equipment. More importantly though, there are few(er) people working in recording who actually know what a band sounds like in a good room and, crucially, how to capture the sound of a band in a good room. Music lives through the connection of musicians playing together in the same place at the same time. That's not to say that it's impossible to do it without that, just that it's harder.

I have found very few recordings in the last 20 years that sounded really good that were made by anyone who hadn't come up in recording studios when there were a good many of them still around. Where will the quality recording engineers of the future come from when they have no idea how to make things sound good?



Alan Tomlinson

Rick Tomaszewicz's picture

...can't wait for ultra realistic, all immersive future performances that are BORING.  Too bad new music isn't evolving at the same rate as its recording and playback technologies. 

Maybe we expect less from new music than we do from its delivery mode.  Do we really want to hear Bach, Miles, Frank and the Beatles etc. through yet another new and improved playback system?  And, each of these technical improvements often requires the music to be repurchased in a new format at a higher price.  

These changes are driven by profit, technical and artistic imperatives/opportunities.  The last, but most important of these, is diminishing in importance.

In the distant past, only the wealthy and sophisticated heard the best music because it had to be heard live.  Artists strove to satisfy and impress the highest common denominator.  Once music could be recorded, more people had access to it.  And, once music became portable, everyone could hear it and reset market expectations.  Most artists try to deliver what will sell.  (They gotta eat and pay rent.)  

There's been plenty written about what sells today; they call it aural wallpaper.  ITM, I'll keep thrift store hunting for $1 vinyl and $2 CDs and supporting new artists who aim up.  

Rick Tomaszewicz's picture

Just got back from the Salvation Army Thrift Shop.  Big sale on CD's; three for a buck. Seventy classical CD's for $23!  Sorry, high-resolution-download-seeking crowd, that would have cost you $1,260.  

Stop worrying about having the latest technology.  Just enjoy the music.


corrective_unconscious's picture

I think those dynamic range figures are largely theoretical, even in the cases where qualified by descriptors such as "pristine conditions." I also think those dynamic range figures are not including the limits imposed by ambient sound in the listening environment - I think they pertain (supposedly) only to the recording and playback chain itself.

And the ultimate, real world dynamic range capability of a given medium is far removed from how much of that capability actually gets used by a particular recording.

As others have mentioned, no one could think those early digitally recorded CDs sounded better than most of the even fairly well recorded and produced LP records. No one could think where the market has gone, in bulk, compressed, lossy digital files, offers better sound than Redbook CDs.

Music_Guy's picture

Reviewers talk about systems and content so good that they take the listener into the concert hall or into the studio...

Funny, but for me, the experience I prefer is one of bringing the performance into my space.  When I sit down to listen to pre-recorded music, I want it to be clear, dynamic and musical.  While I look for improvements in clarity and dynamics in my system, I do not yearn for more spatial cues. It is just fine for me that the sound blends with my own acoustic space.  It is uncanny for me how well a good two channel setup does this.

I know the gamers and the videophiles will be pushing the industry towards total immersion.  Digital signal processing is amazing!  (price/performance)  I sure wish there would be similar advances in improving clarity of two-channel reproduction as well, in both source material and reproduction equipment.

(And, as others have said, I wish that there was improvement in the dynamic flow encoded in the source material without such a big price or analog..)


wozwoz's picture

Moore's Law, doubling of computing power etc is all very nice ... and yet, in the last 15 years, the quality of music as consumed by most people has gone dramatically backwards --- not forwards. For most, it has got worse --- not better. Oh sure ... for the audiophile elite, we have hi-rez SACD and DSD ... but the simple fact is most consumers are now streaming at 100th of the bit-rate of the studio masters over their iPhones, and if they even have standalone speakers, they are often  'effectively' mono docking stations, or USB sticks attached to a computer. 

Perhaps for the first time in audio history, technology is causing sound quality to decline, not improve. Sadly, I don't think things will change much in 50 years. 

ppgr's picture

Who needs dynamic range when commercial music rarely exceeds 12 dB and is getting more and more compressed?